Document processing

Get any PDF in; get any office doc out.

Most maritime documents arrive as PDFs (many of them scans with no text layer), and traditional extractors flatten tables, drop diagrams and lose track of which page a line came from. Document processing covers both directions: vision-based ingestion that keeps structure intact, and office generation for the DOCX, PDF, XLSX or PPTX file a finished report has to ship as.

vision OCR · table-preserving · tiled extraction · search index

docx · pdf · xlsx · pptx

Two directions

Ingest:   PDF  →  vision  →  structured markdown + search index
Produce:  data →  docx · pdf · xlsx · pptx

Ingest — render each page, run a layout-aware vision model, stitch structured markdown back together with page images embedded.
Produce — generate and edit office documents (tracked changes, comments, formulas) via the Anthropic document skills.
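A minimal sketch of the ingest stitching step, in Python. It assumes each page has already been rendered and run through the vision model; `PageResult`, `stitch_pages` and the `<!-- page N -->` marker are hypothetical names chosen for illustration, not the tool's actual API or output format. Because pages are processed in parallel they can finish out of order, so the sketch sorts by page number before joining.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    # Hypothetical container for one page's vision output.
    page_number: int  # 1-based page number in the source PDF
    markdown: str     # structured markdown returned by the vision model
    image_path: str   # path to the rendered page image


def stitch_pages(pages: list[PageResult]) -> str:
    """Join per-page markdown in page order, tagging each page and embedding its image."""
    parts = []
    for page in sorted(pages, key=lambda p: p.page_number):
        # Hypothetical marker so every block can be traced back to its page.
        parts.append(f"<!-- page {page.page_number} -->")
        # Embed the rendered page so diagrams stay readable in the output.
        parts.append(f"![Page {page.page_number}]({page.image_path})")
        parts.append(page.markdown.strip())
    return "\n\n".join(parts) + "\n"
```

Keeping the page image next to its markdown means a diagram the model could only describe in words is still there for an engineer to look at.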

Key concept — why vision-based ingestion

Text-layer PDFs lose their layout the moment a traditional tool extracts them, and a large share of maritime documents are scans with no text layer at all. Running vision over the rendered page captures tables, diagrams and forms regardless of how the PDF was produced — so an engineer can still read the drawing in the converted output.

Two PDF extractors — bulk vs precision

	PDF to Markdown	PDF Vision Extractor
Purpose	Bulk indexing for search	Accurate Q&A on specific pages
Processing	Whole-page vision, parallel	Tiled vision with overlap
Accuracy	Good — fast at scale	Highest — recovers fine detail
Output	Per-page markdown + search index	One consolidated markdown
Use when	“Index this folder of circulars”	“What does this diagram show?”

Worked example — circulars in, answer out

“Index this folder” → PDF to Markdown converts every PDF in the tree in parallel and builds one keyword index.
“What are the maintenance intervals on page 7?” → PDF Vision Extractor renders that page at high resolution, tiles it, and reads the dense table other extractors garble.
The converted library then feeds Search Indexed Documents for evidence-first retrieval.
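The keyword index can be pictured as a plain inverted index from terms to source pages. This is a sketch under assumptions: the tokeniser, the hypothetical `build_index` / `search` names and the AND-only query semantics are illustrative choices, since the actual index format isn't described here.

```python
import re
from collections import defaultdict

# Assumed tokeniser: lowercase alphanumeric runs.
TOKEN = re.compile(r"[a-z0-9]+")


def build_index(pages):
    """pages: iterable of (source_file, page_number, markdown_text).
    Returns term -> sorted list of (source_file, page_number) postings."""
    index = defaultdict(set)
    for source, page_number, text in pages:
        for term in TOKEN.findall(text.lower()):
            index[term].add((source, page_number))
    return {term: sorted(postings) for term, postings in index.items()}


def search(index, query):
    """Return the (file, page) pairs that contain every query term."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)
```

Because every posting carries a page number, a hit points straight at the page to cite, which is what evidence-first retrieval needs.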

Under the hood

What ingestion preserves

Tables — column structure and cell alignment, not flattened text
Diagrams — embedded as page images so engineers can still read them
Forms — field labels and values kept paired
Layout — heading hierarchy, lists, captions
Page numbers — every line traces back to its source page
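One way to keep every line traceable is to leave a page marker in the stitched markdown and resolve provenance by scanning for it. The `<!-- page N -->` convention and the helper names are hypothetical, a sketch rather than the converter's real output format.

```python
import re

# Hypothetical marker convention: an HTML comment opening each page's content.
PAGE_MARKER = re.compile(r"^<!-- page (\d+) -->$")


def lines_with_pages(markdown: str):
    """Yield (page_number, line) for every non-empty content line.
    Lines that appear before the first marker get page None."""
    page = None
    for line in markdown.splitlines():
        match = PAGE_MARKER.match(line.strip())
        if match:
            page = int(match.group(1))
            continue
        if line.strip():
            yield page, line


def find_source_page(markdown: str, needle: str):
    """Return the page of the first line containing needle, or None if absent."""
    for page, line in lines_with_pages(markdown):
        if needle in line:
            return page
    return None
```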

Modes

PDF to Markdown — single PDF · folder (recursive, unified index) · index-only rebuild
PDF Vision Extractor — single page · page range · query mode (extract and answer a focused question in one step)
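A page-range argument has to be normalised into concrete pages before any rendering happens. The sketch assumes a hypothetical comma-separated, 1-based spec such as `3-5,9`; `parse_page_spec` is an illustrative name and the real tool's argument syntax isn't documented here.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Parse a 1-based spec such as "7" or "3-5,9" into sorted unique page numbers.
    Raises ValueError for malformed parts or pages outside 1..page_count."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start > end:
            raise ValueError(f"descending range: {part}")
        if start < 1 or end > page_count:
            raise ValueError(f"page out of range: {part}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

A single page is just the degenerate range, so both modes can share one code path.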

Tiling is what buys the precision: whole-page vision tends to summarise and miss small text, fine cells and drawing dimensions. Splitting into overlapping tiles forces the model to attend to local detail — slower, but acceptable for a few critical pages.
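The tiling itself reduces to computing overlapping crop boxes over the rendered page image. A minimal sketch, assuming square tiles; `tile_boxes` and the example sizes are illustrative, not the extractor's actual parameters. Each box is a `(left, top, right, bottom)` tuple, the same order Pillow's `Image.crop` expects.

```python
def tile_boxes(width: int, height: int, tile: int, overlap: int):
    """Return (left, top, right, bottom) boxes covering a width x height image
    with square tiles of side `tile` that overlap their neighbours by at least `overlap` px."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap

    def starts(length: int) -> list[int]:
        if length <= tile:
            return [0]  # one tile spans the whole axis
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

With the overlap at most half the tile size, a word or table cell narrower than the overlap that gets cut at one tile's edge lands whole in the neighbouring tile. The last tile in each row and column is pushed flush against the page edge, so it may overlap its neighbour by more than `overlap`; that costs a little extra vision work but avoids a sliver tile with almost no context.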

Office generation — the Anthropic document skills

The “produce” half wraps the official Anthropic docx · pdf · pptx · xlsx skills — read, modify and generate office files with tracked changes, comments and formulas intact.