Provenance — every claim traces to raw bytes
If you ask the agent “where did you get this?” it walks the chain back to the bytes. This page documents that chain.
The chain
    Final story ──► final-story.md
         │  cites story:42
         ▼
    Story       ──► story.md
         │  cites note_paths[postit:12345, postit:12346]
         ▼
    Post-it     ──► agent_postits row
         │  source_row_ids=[emails:7891, conversation_turns:55432]
         ▼
    Evidence    ──► incidents:7891 → file:///incidents/raw/2026-05-18_INC-2026-0142.jsonl
                    conversation_turns:55432 → session.jsonl:421:198432
         ▼
    Raw bytes   ──► absolute ground truth, never moves

Five layers, each citing the layer below. The bottom layer (raw bytes) is append-only and immutable: original files are never rewritten, only superseded.
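As a rough illustration of what "walking the chain" means, here is a minimal sketch that follows citations from a final story down to raw bytes. The `Node` class, the `STORE` dictionary, and the `walk` function are hypothetical names invented for this example; only the identifiers (`story:42`, `postit:12345`, `conversation_turns:55432`, `session.jsonl:421:198432`) come from the diagram above. A real system would resolve references against files and DB rows rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One item in the chain (hypothetical structure, for illustration)."""
    ref: str                                        # identifier, e.g. "story:42"
    layer: str                                      # which layer the item belongs to
    cites: list[str] = field(default_factory=list)  # refs in the layer below


# Hypothetical in-memory stand-in for the files and DB rows.
STORE = {
    "final-story": Node("final-story", "final story", ["story:42"]),
    "story:42": Node("story:42", "story", ["postit:12345"]),
    "postit:12345": Node("postit:12345", "post-it", ["conversation_turns:55432"]),
    "conversation_turns:55432": Node(
        "conversation_turns:55432", "evidence", ["session.jsonl:421:198432"]
    ),
    "session.jsonl:421:198432": Node("session.jsonl:421:198432", "raw bytes"),
}


def walk(ref: str, store: dict[str, Node]) -> list[tuple[str, str]]:
    """Return (layer, ref) pairs depth-first from `ref` down to raw bytes.

    A missing reference raises KeyError, which surfaces a dangling citation.
    """
    node = store[ref]
    trail = [(node.layer, node.ref)]
    for child in node.cites:
        trail.extend(walk(child, store))
    return trail


if __name__ == "__main__":
    for layer, ref in walk("final-story", STORE):
        print(f"{layer:12} {ref}")
```

Raising on a missing reference is a deliberate choice in this sketch: a claim whose citation cannot be resolved should fail loudly rather than be silently skipped.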
Why files are canonical (not DB columns)
Stories, post-its, and final stories live on disk as .md files first. The DB holds:
- An index for fast retrieval
- A hash of the file contents for tamper-detection
- Cross-references between layers
But the canonical truth is the file. Eight reasons:
- Human-readable without DB access
- Git-able (every story has a history)
- Survives DB corruption
- Greppable
- Easy to back up
- Easy to inspect during incidents
- No ORM impedance
- Same shape across cloud and local modes
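The arrangement described above, where the file is canonical and the DB stores a hash of its contents, can be sketched as a read that fails when the two disagree. This is a minimal sketch under assumptions: the page does not name the hash algorithm, so SHA-256 is used here as a stand-in, and `file_sha256` and `read_verified` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex digest of the file's bytes (SHA-256 is an assumption, not confirmed)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def read_verified(path: Path, stored_hash: str) -> str:
    """Read a .md file, raising if its hash differs from the one held in the DB.

    `stored_hash` stands in for the value the DB index would return.
    """
    data = path.read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if actual != stored_hash:
        raise ValueError(f"hash mismatch for {path}: file changed since it was indexed")
    return data.decode("utf-8")
```

Hashing the same bytes that are returned to the caller, rather than reading the file twice, avoids a window in which the file could change between the check and the read.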
What this enables
| Need | What provenance gives you |
|---|---|
| “Why did the agent say this?” | Walk back to source: see exactly which post-it, which email, which sentence |
| “Has this been tampered with?” | Hash check: DB hash vs file hash on every read |
| “What did the system know on date X?” | Archive layer: every story has dated archives |
| “Should I trust this claim?” | Importance scale on each post-it + source row depth |
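The "What did the system know on date X?" row amounts to an as-of lookup over a story's dated archives. The sketch below shows one way to do that with a binary search. The archive paths and the `archive_as_of` function are hypothetical; the page does not describe how archives are named or stored, so a date-sorted list of `(date, path)` pairs is assumed.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional


def archive_as_of(archives: list[tuple[date, str]], when: date) -> Optional[str]:
    """Return the path of the latest archive dated on or before `when`.

    `archives` must be sorted by date ascending. Returns None if every
    archive is newer than `when`.
    """
    dates = [d for d, _ in archives]
    i = bisect_right(dates, when)  # number of archives dated <= when
    return archives[i - 1][1] if i else None


# Hypothetical archive listing for one story.
ARCHIVES = [
    (date(2026, 5, 1), "archive/story-42/2026-05-01.md"),
    (date(2026, 5, 10), "archive/story-42/2026-05-10.md"),
    (date(2026, 5, 18), "archive/story-42/2026-05-18.md"),
]

if __name__ == "__main__":
    print(archive_as_of(ARCHIVES, date(2026, 5, 12)))
```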
What it costs
| Layer | Cost characteristic |
|---|---|
| Final story | One file, daily rewrite, hash stored in DB |
| Story | One file per casefile, rebuilt on new post-it cluster, prior versions archived |
| Post-it | One row per analyzer-lens per source, immutable once written |
| Evidence | One row per ingested source (email, conversation turn, photo, ERP row) |
| Raw | Append-only — original files stay in ~/inbox/, ~/library/raw/ |
Storage scales linearly with ingestion. At ~100 emails/day across an office, ingestion totals ~10 MB/day, so a year of full provenance (about 3.65 GB) fits in 4 GB.
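The yearly figure follows directly from the daily one. A small sketch of the arithmetic, assuming a 365-day year, decimal units (1 GB = 1000 MB), and a constant daily rate; `yearly_storage_gb` is a hypothetical helper name.

```python
def yearly_storage_gb(mb_per_day: float, days: int = 365) -> float:
    """Linear projection of storage: daily volume times days, in decimal GB."""
    return mb_per_day * days / 1000


if __name__ == "__main__":
    # ~10 MB/day is the figure quoted in the text.
    print(f"{yearly_storage_gb(10):.2f} GB per year")
```

Because the relationship is linear, doubling ingestion doubles the yearly footprint, so an office ingesting about 20 MB/day would need roughly 7.3 GB per year under the same assumptions.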
Read next
- Worked example · One AI incident: the chain exercised end-to-end on one AI safety incident.
- Office library: the full library pipeline that feeds this chain.
- Three feeders: how provenance feeds into agent cold-start.