All data types
Anything readable becomes evidence.
L0 raw on disk. L1 readable for agents. L2 structured when you need it. Same URL threaded through all three.
The office reads in many languages. PDF, Excel, photo, voice memo, JSONL chat log — all end up as readable text an agent can grep, with a thread back to the original byte.
L0 — Raw files, ground truth
The bytes that arrived. Never touched.
Every kind of file the office sees lands on disk first. The raw byte is preserved exactly as it came in — original encoding, original timestamp, original everything.
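One way to make "never touched" checkable is to record a content hash when the file lands. The sketch below is an illustration only: `land_raw`, `sha256_of`, and the library directory layout are hypothetical names, not part of the system described here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large raw files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def land_raw(incoming: Path, library_dir: Path) -> tuple[Path, str]:
    """Copy an incoming file into the library byte for byte (hypothetical helper)."""
    library_dir.mkdir(parents=True, exist_ok=True)
    dest = library_dir / incoming.name
    if dest.exists():
        raise FileExistsError(dest)  # never overwrite an existing raw file
    shutil.copy2(incoming, dest)  # copy2 also tries to keep the original mtime
    digest = sha256_of(dest)
    if digest != sha256_of(incoming):
        raise IOError("copy does not match the incoming bytes")
    return dest, digest
```

Storing the returned digest alongside the file lets anyone confirm later that the raw bytes are still the ones that arrived.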
L0 → L1 — Convert once, read forever
A markdown sidecar next to every raw file.
PDFs are not text to an agent. Photos are not text. Audio is not text. A small, cheap model reads each raw file once and writes a readable .md sidecar that lives next to it on disk.
The file arrives
2026-05-20-vendor.pdf · bytes preserved as they came.
A converter reads it
Small, cheap model. PDF text + tables. Image captions. Audio transcripts.
Sidecar is written
2026-05-20-vendor.md sits next to the raw file. Frontmatter points to the byte.
Agents read the .md
Grep-able · citable · easy to reason over. The raw is one click away if anyone asks.
URL preserved
Frontmatter source_url carries back to the raw byte. Always verifiable.
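The steps above can be sketched as a small sidecar writer. This is a minimal sketch under stated assumptions: the converter model is not shown (its output is passed in as `text`), the frontmatter is a simple `key: value` block, and `source_file` and `converted_at` are hypothetical fields. Only `source_url` comes from the description above.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(raw_path: Path, text: str, source_url: str) -> Path:
    """Write a .md sidecar next to raw_path; the raw file itself is not modified."""
    sidecar = raw_path.with_suffix(".md")
    if sidecar == raw_path:
        raise ValueError("raw file is already .md; pick another sidecar name")
    frontmatter = "\n".join([
        "---",
        f"source_url: {source_url}",
        f"source_file: {raw_path.name}",  # hypothetical field
        f"converted_at: {datetime.now(timezone.utc).isoformat()}",  # hypothetical field
        "---",
        "",
    ])
    sidecar.write_text(frontmatter + text + "\n", encoding="utf-8")
    return sidecar


def read_source_url(sidecar: Path) -> str:
    """Follow the thread back: return source_url from the sidecar frontmatter."""
    lines = sidecar.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("sidecar has no frontmatter")
    for line in lines[1:]:
        if line == "---":
            break  # end of frontmatter
        key, _, value = line.partition(": ")
        if key == "source_url":
            return value
    raise ValueError("frontmatter has no source_url")
```

An agent that cites a line from the sidecar can call `read_source_url` to hand the reader the address of the raw file.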
L1 → L2 (optional) — Structured analysis on top
Structured extraction for the cases that need it.
Some files reward a second pass. A vessel report has a table of fuel figures; an invoice has line items; a CT scan has measurements. L2 captures these as structured .json sitting next to the readable .md. Skip the layer when prose is enough.
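As an illustration of the second pass, the sketch below lifts the first pipe table out of a readable .md and writes it as a .json next to it, carrying the same source_url. It assumes the converter rendered the table in markdown pipe syntax; `parse_markdown_table`, `write_l2`, and the JSON shape are hypothetical, and a real extraction step may well use a model instead of a parser.

```python
import json
from pathlib import Path


def parse_markdown_table(md_text: str) -> list[dict[str, str]]:
    """Return the first pipe table in md_text as a list of row dicts."""
    rows: list[list[str]] = []
    for line in md_text.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
        elif rows:
            break  # the first table has ended
    if len(rows) < 3:
        return []  # need a header, a separator, and at least one data row
    header, data = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in data]


def write_l2(md_path: Path, source_url: str) -> Path:
    """Write <name>.json next to <name>.md with the same source_url (hypothetical shape)."""
    records = parse_markdown_table(md_path.read_text(encoding="utf-8"))
    out = md_path.with_suffix(".json")
    payload = {"source_url": source_url, "rows": records}
    out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out
```

Values stay as strings here; converting quantities or fuel figures to numbers is left to whichever consumer needs them.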
source_url. Same chain. No break.
The agent’s own chat is also Layer 0
The agent’s own conversations — what was asked, what was answered, what tools fired, what files were read — flow into the same library. They are just another file type at L0.
The agent’s work log is evidence too.
Claude Code writes each session to a .jsonl file — append-only, one line per message. A daemon mirrors lines into the library DB. The story-builder later mixes agent-chat post-its with email post-its without caring about the source. One pipeline. Same provenance.
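The mirroring step could look roughly like the following. This is a sketch, not the actual daemon: it uses SQLite for the library DB, the `messages` table and `mirror_jsonl` are hypothetical, and nothing is assumed about the fields inside each session line beyond it being valid JSON. A daemon would call the function on a timer or on file-change events.

```python
import json
import sqlite3
from pathlib import Path


def mirror_jsonl(jsonl_path: Path, db: sqlite3.Connection) -> int:
    """Copy lines not yet mirrored into the messages table; return how many were added."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "source_file TEXT, line_no INTEGER, body TEXT, "
        "PRIMARY KEY (source_file, line_no))"
    )
    (last,) = db.execute(
        "SELECT COALESCE(MAX(line_no), 0) FROM messages WHERE source_file = ?",
        (str(jsonl_path),),
    ).fetchone()
    added = 0
    with jsonl_path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.endswith("\n"):
                break  # partial line still being written; pick it up next pass
            if line_no <= last or not line.strip():
                continue  # already mirrored, or blank
            json.loads(line)  # raises on a malformed line instead of storing it
            db.execute(
                "INSERT INTO messages VALUES (?, ?, ?)",
                (str(jsonl_path), line_no, line.rstrip("\n")),
            )
            added += 1
    db.commit()
    return added
```

Because the session file is append-only, the file path plus line number is a stable key, so repeated passes add only the new lines and each row points back to one exact line of the raw log.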