
All data types

L0 · Raw | L1 · Readable | L2 · Analysis (opt) | L3 · Post-its | L4 · Stories | L5 · Memory

Anything readable becomes evidence.

L0 raw on disk. L1 readable for agents. L2 structured when you need it. Same URL threaded through all three.

The office reads in many languages. PDF, Excel, photo, voice memo, JSONL chat log — all end up as readable text an agent can grep, with a thread back to the original byte.

L0 — Raw files, ground truth

LAYER 0 · CANONICAL

The bytes that arrived. Never touched.

Every kind of file the office sees lands on disk first. The raw byte is preserved exactly as it came in — original encoding, original timestamp, original everything.

EMAIL
Gmail · Outlook · SMTP. Headers, body, attachments — kept together.
PDF
Contracts, invoices, manuals, class certificates.
DOCX
Letters, proposals, minutes-of-meeting, internal memos.
XLSX
Books, schedules, voyage data, KPIs.
CSV · JSON
Exports, API payloads, ERP feeds — already structured.
IMAGE
Photos, screenshots, scanned forms. OCR + caption at L1.
AUDIO
Voice notes, call recordings. Transcribed at L1.
VIDEO
Walkthroughs, demos, CCTV. Transcript + keyframes at L1.
CHAT
WhatsApp · Slack · Teams · Line. Threaded conversations.
ERP ROW
Mirrored from the source system. PMS, accounting, crew.
RECEIPT
Bills, bank statements, GST documents.
AGENT CHAT
The agent’s own work log. Treated as evidence too.
Raw bytes stay where they live — in the inbox, the documents folder, the photos folder, the chat log directory. Never moved. Never rewritten. They are the ground truth of the office.

L0 → L1 — Convert once, read forever

LAYER 1 · READABLE

A markdown sidecar next to every raw file.

PDFs are not text to an agent. Photos are not text. Audio is not text. A small, cheap model reads each raw file once and writes a readable .md sidecar that lives next to it on disk.

RAW

The file arrives

2026-05-20-vendor.pdf · bytes preserved as they came.

READ

A converter reads it

A small, cheap model. PDF text + tables. Image captions. Audio transcripts.

.MD

Sidecar is written

2026-05-20-vendor.md sits next to the raw file. Frontmatter points to the byte.

USE

Agents read the .md

Grep-able · citeable · easy to reason over. The raw is one click away if anyone asks.

URL preserved

Frontmatter source_url carries back to the raw byte. Always verifiable.
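
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the product's actual converter: the model call is left out (the readable text is passed in as a string), and apart from `source_url` every name here (`write_sidecar`, `read_source_url`, the `source_file` and `layer` frontmatter keys) is an assumption made for the example.

```python
from pathlib import Path
from typing import Optional


def sidecar_path(raw: Path) -> Path:
    """Return the .md path that sits next to the raw file."""
    return raw.with_suffix(".md")


def write_sidecar(raw: Path, text: str, source_url: str) -> Path:
    """Write a readable sidecar next to the raw file; the raw bytes are never opened for writing."""
    md = sidecar_path(raw)
    if md.exists():
        return md  # convert once: an existing sidecar is left alone
    frontmatter = (
        "---\n"
        f"source_url: {source_url}\n"   # thread back to the raw byte
        f"source_file: {raw.name}\n"    # hypothetical extra key
        "layer: L1\n"                   # hypothetical extra key
        "---\n\n"
    )
    md.write_text(frontmatter + text, encoding="utf-8")
    return md


def read_source_url(md: Path) -> Optional[str]:
    """Read source_url back out of the sidecar's frontmatter, if present."""
    lines = md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0] != "---":
        return None
    for line in lines[1:]:
        if line == "---":
            break  # end of frontmatter
        key, _, value = line.partition(":")
        if key.strip() == "source_url":
            return value.strip()
    return None
```

An agent that greps the `.md` and wants to cite the original would call `read_source_url` on the sidecar and follow the result back to the raw file.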

L1 → L2 (optional) — Structured analysis on top

LAYER 2 · ANALYSIS · OPTIONAL

Structured extraction for the cases that need it.

Some files reward a second pass. A vessel report has a table of fuel figures; an invoice has line items; a CT scan has measurements. L2 captures these as structured .json sitting next to the readable .md. Skip the layer when prose is enough.

JSON tables
Tabular data lifted out of XLSX, PDF, photo into structured rows.
OCR text
Scanned forms and handwritten notes turned into searchable strings.
Transcripts
Audio · video with speaker tags, timestamps, segments.
Summaries
A short paragraph at the top of long readable files. Reading-time first.
Entities
People · companies · vessels · acts mentioned. Tagged for the graph.
URL preserved
L2 files also carry source_url. Same chain. No break.
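
As a rough sketch of the optional L2 pass, the snippet below lifts a pipe table out of a readable `.md` sidecar and writes it as `.json` next to it, carrying the same `source_url`. It assumes the sidecar holds a single markdown pipe table; the function names and the `layer` and `tables` keys are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List


def parse_pipe_table(markdown: str) -> List[Dict[str, str]]:
    """Turn the pipe-table lines in the text into row dicts (assumes one table)."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in markdown.splitlines()
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Drop the |---|---| separator row if present.
    body = [r for r in body if not all(set(c) <= set("-: ") for c in r)]
    return [dict(zip(header, r)) for r in body]


def write_analysis(md: Path, source_url: str) -> Path:
    """Write structured .json next to the readable .md, keeping the same source_url."""
    out = md.with_suffix(".json")
    payload = {
        "source_url": source_url,  # same chain as L1
        "layer": "L2",             # hypothetical key
        "tables": [parse_pipe_table(md.read_text(encoding="utf-8"))],
    }
    out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out
```

Cell values are kept as strings on purpose; deciding which columns are numbers, dates, or units is left to whoever consumes the JSON.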

The agent’s own chat is also Layer 0

The agent’s own conversations — what was asked, what was answered, what tools fired, what files were read — flow into the same library. They are just another file type at L0.

L0 · AGENT CHAT

The agent’s work log is evidence too.

Claude Code writes each session to a .jsonl file — append-only, one line per message. A daemon mirrors lines into the library DB. The story-builder later mixes agent-chat post-its with email post-its without caring about the source. One pipeline. Same provenance.
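
One way the mirroring step could work is sketched below: a single pass that reads only the lines appended since the last run and copies them into SQLite. The table layout, the function names, and the choice of SQLite are assumptions for illustration; the page does not say how the library DB is built, and nothing here depends on the fields inside each message.

```python
import json
import sqlite3
from pathlib import Path


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) tables: mirrored lines plus a per-file read offset."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_lines ("
        "source_url TEXT, line_no INTEGER, body TEXT, "
        "PRIMARY KEY (source_url, line_no))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets ("
        "source_url TEXT PRIMARY KEY, byte_offset INTEGER, line_no INTEGER)"
    )


def mirror_new_lines(jsonl: Path, conn: sqlite3.Connection) -> int:
    """Copy lines appended since the last pass; return how many were stored."""
    source_url = jsonl.resolve().as_uri()
    row = conn.execute(
        "SELECT byte_offset, line_no FROM offsets WHERE source_url = ?",
        (source_url,),
    ).fetchone()
    offset, line_no = row if row else (0, 0)
    added = 0
    with jsonl.open("rb") as f:  # read-only: the raw log is never rewritten
        f.seek(offset)
        for raw_line in f:
            if not raw_line.endswith(b"\n"):
                break  # partial line still being written; pick it up next pass
            offset += len(raw_line)
            line_no += 1
            try:
                json.loads(raw_line)  # keep only lines that parse as JSON
                body = raw_line.decode("utf-8")
            except ValueError:
                continue
            conn.execute(
                "INSERT OR IGNORE INTO chat_lines VALUES (?, ?, ?)",
                (source_url, line_no, body),
            )
            added += 1
    conn.execute(
        "INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
        (source_url, offset, line_no),
    )
    conn.commit()
    return added
```

A daemon would call `mirror_new_lines` on a timer or on a file-change event. Because the log is append-only, remembering a byte offset is enough to avoid re-reading old messages, and each stored row keeps the `source_url` and line number needed to trace it back.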

Layer 0 · raw on disk · .md sidecar · .json structured · frontmatter · source_url · JSONL chat · agent chat = evidence