# Mandelson Files API — agent guide

This is a read-only JSON API over the UK Cabinet Office "Humble Address" document
releases concerning Lord Peter Mandelson's appointment as HM Ambassador to
Washington (Volume I = HC 1774-I, March 2026, 31 docs; Volume II = HC 2, June 2026,
497 docs). It serves (a) journalist "leads" — written analyses with citations —
and (b) the full text of all 528 source documents, with search.

This instance serves the **full** dataset.
- full dataset API:    https://mandelson-palantir-api.apps.autonomy.work
- general dataset API: https://mandelson-files-api.apps.autonomy.work

The two differ only in which leads are included: the Global Counsel and Palantir
leads are present in `full` and removed in `general`. The documents are identical.

## IMPORTANT — AI-assisted analysis
The leads were generated with AI assistance and may contain mistakes. Treat them as
pointers, not facts. Before relying on or publishing anything, verify it against the
raw source document text (the `text` / `/text` endpoints) and the original PDFs
(`pdf_url`). Every lead and finding carries document references so you can check them.
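
One way to make that check mechanical is to test whether a passage quoted in a lead actually occurs in the source text. The sketch below is illustrative only: `normalize` and `quote_in_source` are hypothetical helper names, and it assumes that collapsing whitespace and ignoring case is enough to get past OCR line breaks. It will not catch OCR misreadings, so a failed match means "look at the PDF", not "the lead is wrong".

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so OCR line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_source(quote: str, source_text: str) -> bool:
    """Return True if the quoted passage occurs in the source text after normalization."""
    needle = normalize(quote)
    # An empty quote would trivially match everything, so treat it as a miss.
    return bool(needle) and needle in normalize(source_text)


if __name__ == "__main__":
    # Made-up sample text, not taken from any document in the dataset.
    sample = "alpha beta\n   gamma delta"
    print(quote_in_source("Beta gamma", sample))
```

A `True` result only confirms the wording is present in the OCR'd text; the surrounding context still needs to be read.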

## Data model
- A **lead** has: slug, volume (1|2), title, dek (summary line), tags, priority
  (bool; true = Global Counsel/Palantir focus), doc_refs (list of ints), and a
  markdown `body` with inline citations. Citations link to documents.
- A **document** is identified by (volume, ref) e.g. volume=2 ref=360. It has:
  date, title (sender/recipients/type), an extractive `summary`, full `text`
  (OCR'd; redactions shown as `***`, `Personal`, or `JCS`), and `pdf_url`.
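
The two record types above could be modelled client-side roughly as follows. This is a sketch, not the API's published schema: the field names are taken from this guide, but the Python types, the defaults, and the assumption that a lead's `doc_refs` point into the lead's own volume are guesses to be checked against `/openapi.json`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Document:
    volume: int          # 1 or 2
    ref: int             # document number within the volume
    date: str = ""       # assumption: delivered as a string
    title: str = ""
    summary: str = ""
    text: str = ""
    pdf_url: str = ""

    @property
    def path(self) -> str:
        """Path of the single-document endpoint listed in this guide."""
        return f"/documents/v{self.volume}/{self.ref}"


@dataclass
class Lead:
    slug: str
    volume: int
    title: str
    dek: str = ""
    tags: List[str] = field(default_factory=list)
    priority: bool = False
    doc_refs: List[int] = field(default_factory=list)
    body: str = ""

    def document_keys(self) -> List[Tuple[int, int]]:
        # Assumption: doc_refs refer to documents in the lead's own volume.
        return [(self.volume, ref) for ref in self.doc_refs]


if __name__ == "__main__":
    # Hypothetical lead used purely for illustration.
    lead = Lead(slug="example-lead", volume=2, title="Example", doc_refs=[360])
    for volume, ref in lead.document_keys():
        print(Document(volume=volume, ref=ref).path)
```

Keeping `(volume, ref)` as the key everywhere avoids confusing two documents that share a ref number across volumes.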

## Endpoints
- `GET /`  or  `GET /guide`            → this guide (text/markdown)
- `GET /stats`                          → counts
- `GET /leads`                          → all leads (briefs)
- `GET /leads/{slug}`                 → one lead incl. markdown body
- `GET /documents`                      → list documents (briefs).
       query params: `volume` (1|2), `q` (substring filter), `limit` (default 100),
       `offset`, `cited` (true → only docs cited by a lead)
- `GET /documents/v{volume}/{ref}`  → one document incl. full `text`
- `GET /documents/v{volume}/{ref}/text` → raw document text (text/plain)
- `GET /search`                         → search. query params:
       `q` (required), `type` = leads | documents | all (default all),
       `fuzzy` = true|false (default true), `limit` (default 20)
       Documents are matched by exact substring on full text (ranked by hit count)
       plus, when fuzzy=true, fuzzy matching on title+summary (catches typos).
       Leads are matched fuzzily across title, dek, tags and body.
- Interactive OpenAPI docs: `GET /docs`   (machine-readable schema at `/openapi.json`)
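
As a rough illustration of the query parameters listed above, the helpers below build request URLs with the standard library only. The function names (`search_url`, `documents_url`, `page_offsets`) are hypothetical, and the sketch assumes that `offset` is zero-based and that booleans are sent as the literal strings `true` and `false`; neither is stated explicitly in this guide.

```python
from urllib.parse import urlencode

BASE_URL = "https://mandelson-palantir-api.apps.autonomy.work"


def search_url(q, kind="all", fuzzy=True, limit=20, base_url=BASE_URL):
    """Build a /search URL; `kind` maps to the `type` query parameter."""
    if kind not in ("leads", "documents", "all"):
        raise ValueError("kind must be 'leads', 'documents' or 'all'")
    params = {
        "q": q,
        "type": kind,
        "fuzzy": "true" if fuzzy else "false",
        "limit": limit,
    }
    return f"{base_url}/search?{urlencode(params)}"


def documents_url(volume=None, q=None, limit=100, offset=0, cited=False,
                  base_url=BASE_URL):
    """Build a /documents listing URL, omitting filters that were not requested."""
    params = {}
    if volume is not None:
        params["volume"] = volume
    if q:
        params["q"] = q
    params["limit"] = limit
    params["offset"] = offset  # assumption: zero-based
    if cited:
        params["cited"] = "true"
    return f"{base_url}/documents?{urlencode(params)}"


def page_offsets(total, limit=100):
    """Offsets needed to walk `total` documents in pages of `limit`."""
    return list(range(0, total, limit))


if __name__ == "__main__":
    print(search_url("Palantir", kind="documents"))
    # 528 documents at the default page size of 100 need six requests.
    for offset in page_offsets(528):
        print(documents_url(offset=offset))
```

Using `urlencode` rather than string concatenation means queries containing spaces or `&` are escaped correctly.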

## Examples
  curl 'https://mandelson-palantir-api.apps.autonomy.work/search?q=Palantir&type=documents'
  curl 'https://mandelson-palantir-api.apps.autonomy.work/search?q=Witkof&fuzzy=true'   # typo-tolerant; quote URLs containing & or ?
  curl https://mandelson-palantir-api.apps.autonomy.work/documents/v2/360/text
  curl https://mandelson-palantir-api.apps.autonomy.work/leads/v2_01_palantir_state_visit

Suggested workflow for an agent: GET / (this) → GET /leads to scan analyses →
GET /search?q=... to find relevant documents → GET /documents/vN/REF/text to read
the source → cite the pdf_url. Always fact-check the leads against the source text.
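
The last steps of that workflow (read the source, then check a claim against it) might look like the sketch below. It is a minimal illustration under stated assumptions: `text_url`, `fetch_text` and `term_in_sources` are hypothetical names, the plain-text response is assumed to be UTF-8, and the fetch function is injectable so the logic can be exercised without network access.

```python
from urllib.request import urlopen

BASE_URL = "https://mandelson-palantir-api.apps.autonomy.work"


def text_url(volume, ref, base_url=BASE_URL):
    """URL of the raw-text endpoint for one document."""
    return f"{base_url}/documents/v{volume}/{ref}/text"


def fetch_text(volume, ref):
    """Download the raw text of one document (assumes a UTF-8 body)."""
    with urlopen(text_url(volume, ref)) as response:
        return response.read().decode("utf-8")


def term_in_sources(term, keys, fetch=fetch_text):
    """Map each (volume, ref) to whether `term` appears in its text, ignoring case."""
    wanted = term.lower()
    return {(volume, ref): wanted in fetch(volume, ref).lower()
            for volume, ref in keys}


if __name__ == "__main__":
    # Offline demonstration with a stand-in fetcher and made-up text.
    def fake_fetch(volume, ref):
        return "Example body mentioning PALANTIR once."

    print(term_in_sources("palantir", [(2, 360)], fetch=fake_fetch))
```

Passing `fetch=fetch_text` (the default) performs real requests; any document whose entry is `False` deserves a manual look at the PDF before the lead's claim is repeated.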