Document processing
Get any PDF in; get any office doc out.
Most maritime documents arrive as PDFs — many of them scans with no text layer — and traditional extractors flatten tables, drop diagrams, and lose the page a line came from. Document processing covers both directions: vision-based ingestion that keeps structure intact, and office generation for the DOCX / PDF / XLSX / PPTX a report has to ship as.
Two directions
Ingest: PDF → vision → structured markdown + search index
Produce: data → docx · pdf · xlsx · pptx
- Ingest — render each page, run a layout-aware vision model, stitch structured markdown back together with page images embedded.
- Produce — generate and edit office documents (tracked changes, comments, formulas) via the Anthropic document skills.
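As a rough sketch of the last ingest step, the Python below stitches per-page results back into one markdown document in page order, keeping each page's image and a page marker next to its text. The `PageResult` shape, the `<!-- page N -->` marker and the image paths are assumptions for illustration, not the tool's actual output format. Rendering and the vision call itself are out of scope here.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    page_number: int  # 1-based page number in the source PDF
    markdown: str     # structured markdown returned by the vision model for this page
    image_path: str   # rendered page image (hypothetical naming scheme)


def stitch_pages(pages: list[PageResult]) -> str:
    """Join per-page markdown into one document, sorted by page number.

    Each page gets a marker comment (so every line can be traced back to
    its source page) and its rendered image embedded before its text.
    """
    parts = []
    for page in sorted(pages, key=lambda p: p.page_number):
        parts.append(f"<!-- page {page.page_number} -->")
        parts.append(f"![Page {page.page_number}]({page.image_path})")
        parts.append(page.markdown.strip())
    return "\n\n".join(parts) + "\n"


# Pages may finish out of order when processed in parallel; stitching re-sorts them.
doc = stitch_pages([
    PageResult(2, "## Intervals\n| Item | Hours |", "pages/p2.png"),
    PageResult(1, "# Circular 12", "pages/p1.png"),
])
print(doc)
```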
Key concept — why vision-based ingestion
Text-layer PDFs lose their layout the moment a traditional tool extracts them, and a large share of maritime documents are scans with no text layer at all. Running vision over the rendered page captures tables, diagrams and forms regardless of how the PDF was produced — so an engineer can still read the drawing in the converted output.
Two PDF extractors — bulk vs precision
| | PDF to Markdown | PDF Vision Extractor |
|---|---|---|
| Purpose | Bulk indexing for search | Accurate Q&A on specific pages |
| Processing | Whole-page vision, parallel | Tiled vision with overlap |
| Accuracy | Good — fast at scale | Highest — recovers fine detail |
| Output | Per-page markdown + search index | One consolidated markdown |
| Use when | “Index this folder of circulars” | “What does this diagram show?” |
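The “parallel” in the bulk column amounts to fanning files out to concurrent workers. A minimal sketch, assuming a hypothetical `convert_pdf` placeholder stands in for the real render-and-vision step: it finds every PDF under a folder recursively and converts them with a thread pool. Threads are a reasonable fit when each job mostly waits on a remote model rather than on the local CPU.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def convert_pdf(path: Path) -> str:
    # Hypothetical placeholder for the real per-file conversion
    # (render pages, run vision, stitch markdown); returns a stub here.
    return f"# {path.stem}\n"


def convert_tree(root: Path, max_workers: int = 8) -> dict[Path, str]:
    """Find every PDF under root (recursively, any case of .pdf) and convert them concurrently."""
    pdfs = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with pdfs
        return dict(zip(pdfs, pool.map(convert_pdf, pdfs)))


# Demo on a throwaway folder: one nested PDF, one non-PDF that must be skipped.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ["a.pdf", "sub/b.PDF", "notes.txt"]:
        (root / name).write_bytes(b"")
    converted = convert_tree(root)
    names = sorted(p.name for p in converted)
    print(names)
```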
Worked example — circulars in, answer out
- “Index this folder” → PDF to Markdown converts every PDF in the tree in parallel and builds one keyword index.
- “What are the maintenance intervals on page 7?” → PDF Vision Extractor renders that page at high resolution, tiles it, and reads the dense table other extractors garble.
- The converted library then feeds Search Indexed Documents for evidence-first retrieval.
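The keyword index can be pictured as an inverted index that remembers which document and page each word came from, which is what lets a hit point back to its source page. This is a minimal sketch with hypothetical names (`build_index`, `search`) and a deliberately naive tokenizer; the real index format is not described here.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, list[str]]) -> dict[str, set[tuple[str, int]]]:
    """Map each word to the (document, page) pairs it appears on.

    docs maps a document name to its per-page markdown, page 1 first.
    """
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for doc_name, pages in docs.items():
        for page_number, text in enumerate(pages, start=1):
            for word in tokenize(text):
                index[word].add((doc_name, page_number))
    return dict(index)


def search(index: dict[str, set[tuple[str, int]]], query: str) -> list[tuple[str, int]]:
    """Return sorted (document, page) hits that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


index = build_index({
    "circular_12.md": ["Ballast water management", "Maintenance intervals: 500 hours"],
    "circular_14.md": ["Maintenance of lifeboats"],
})
hits = search(index, "maintenance intervals")
print(hits)
```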
Under the hood
What ingestion preserves
- Tables — column structure and cell alignment, not flattened text
- Diagrams — embedded as page images so engineers can still read them
- Forms — field labels and values kept paired
- Layout — heading hierarchy, lists, captions
- Page numbers — every line traces back to its source page
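To make “column structure, not flattened text” concrete: once the vision model has recovered a table's cells, they can be written out as a markdown pipe table instead of being run together as prose. The sketch assumes cells arrive as a list of rows with the header first; `to_markdown_table` is a hypothetical helper, not part of either extractor.

```python
def to_markdown_table(rows: list[list[str]]) -> str:
    """Render table cells as a GitHub-style pipe table.

    The first row is the header. Short rows are padded with empty cells so
    every row has the same column count, and literal pipes are escaped so
    they cannot split a cell.
    """
    width = max(len(r) for r in rows)

    def fmt(row: list[str]) -> str:
        cells = [c.replace("|", "\\|").strip() for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(r) for r in body]
    return "\n".join(lines)


table = to_markdown_table([
    ["Component", "Interval", "Ref"],
    ["Main engine lube oil", "500 h"],  # missing cell gets padded, column kept
    ["Fuel filter", "1000 h", "A|3"],   # pipe inside a cell gets escaped
])
print(table)
```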
Modes
- PDF to Markdown: single PDF · folder (recursive, unified index) · index-only rebuild.
- PDF Vision Extractor: single page · page range · query mode (extract and answer a focused question in one step).
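The single-page and page-range modes need some way to name pages. Assuming a hypothetical spec syntax such as `7` or `3-5,9` (the tools' actual argument format is not documented here), a parser could expand and validate it like this:

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand a page spec like "7", "3-5" or "3-5,9" into sorted 1-based page numbers.

    Hypothetical syntax. Raises ValueError for malformed parts, reversed
    ranges, or pages outside 1..page_count.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} outside 1..{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_pages("7", page_count=10))
print(parse_pages("3-5, 9", page_count=10))
```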
Tiling is what buys the precision: whole-page vision tends to summarise and miss small text, fine cells and drawing dimensions. Splitting into overlapping tiles forces the model to attend to local detail — slower, but acceptable for a few critical pages.
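The overlap keeps a line of small text or a dimension label from being lost at a tile boundary: anything narrower than the overlap that is clipped at one tile's edge appears whole in the neighbouring tile. Below is a minimal sketch of computing the tile boxes, with tile size and overlap as assumed parameters (the extractor's actual values are not stated). Boxes are in (left, top, right, bottom) order, the same order Pillow's `Image.crop` expects.

```python
def tile_boxes(width: int, height: int, tile: int, overlap: int) -> list[tuple[int, int, int, int]]:
    """Split a width x height page image into overlapping tile x tile boxes.

    Neighbouring tiles share at least `overlap` pixels. The last row and
    column are shifted back so they end exactly at the page edge (their
    overlap may then be larger). Pages smaller than a tile get one clipped box.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap

    def starts(length: int) -> list[int]:
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # final tile flush with the edge
        return positions

    # Row-major order: top row left to right, then the next row down
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(2000, 1000, tile=1000, overlap=200)
print(boxes)  # three tiles across a single row
```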
Office generation — the Anthropic document skills
The “produce” half wraps the official Anthropic docx · pdf · pptx · xlsx skills — read, modify and generate office files with tracked changes, comments and formulas intact.