Document processing

Get any PDF in; get any office doc out.

Most maritime documents arrive as PDFs — many of them scans with no text layer — and traditional extractors flatten tables, drop diagrams, and lose the page a line came from. Document processing covers both directions: vision-based ingestion that keeps structure intact, and office generation for the DOCX / PDF / XLSX / PPTX a report has to ship as.

vision OCR · table-preserving · tiled extraction · search index · docx · pdf · xlsx · pptx

Two directions

Ingest: PDF → vision → structured markdown + search index
Produce: data → docx · pdf · xlsx · pptx
  • Ingest — render each page, run a layout-aware vision model, stitch structured markdown back together with page images embedded.
  • Produce — generate and edit office documents (tracked changes, comments, formulas) via the Anthropic document skills.
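The ingest direction can be sketched as a loop over rendered pages. This is a minimal illustration, not the tool's actual implementation: `ingest`, the `read_page` callable (standing in for the layout-aware vision model), and the `<!-- page N -->` marker convention are all assumptions made for the example.

```python
from typing import Callable, Iterable


def ingest(page_images: Iterable[str], read_page: Callable[[str], str]) -> str:
    """Stitch per-page markdown into one document.

    page_images: paths to already-rendered page images, in page order.
    read_page:   hypothetical stand-in for the vision model; takes an image
                 path and returns markdown for that page.
    """
    sections = []
    for number, image_path in enumerate(page_images, start=1):
        body = read_page(image_path).strip()
        # Assumed convention: a comment marker records the source page, and
        # the page image is embedded so diagrams remain readable.
        sections.append(
            f"<!-- page {number} -->\n![Page {number}]({image_path})\n\n{body}"
        )
    return "\n\n".join(sections) + "\n"
```

Passing the reader in as a callable keeps the stitching logic testable without a model: any function that maps an image path to markdown will do.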

Key concept — why vision-based ingestion

Text-layer PDFs lose their layout the moment a traditional tool extracts them, and a large share of maritime documents are scans with no text layer at all. Running vision over the rendered page captures tables, diagrams and forms regardless of how the PDF was produced — so an engineer can still read the drawing in the converted output.

Two PDF extractors — bulk vs precision

| | PDF to Markdown | PDF Vision Extractor |
|---|---|---|
| Purpose | Bulk indexing for search | Accurate Q&A on specific pages |
| Processing | Whole-page vision, parallel | Tiled vision with overlap |
| Accuracy | Good; fast at scale | Highest; recovers fine detail |
| Output | Per-page markdown + search index | One consolidated markdown |
| Use when | “Index this folder of circulars” | “What does this diagram show?” |

Worked example — circulars in, answer out

  1. “Index this folder” → PDF to Markdown converts every PDF in the tree in parallel and builds one keyword index.
  2. “What are the maintenance intervals on page 7?” → PDF Vision Extractor renders that page at high resolution, tiles it, and reads the dense table other extractors garble.
  3. The converted library then feeds Search Indexed Documents for evidence-first retrieval.
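The source does not say how the keyword index is built. As one plausible shape, the sketch below uses a plain inverted index from terms to (document, page) pairs; `build_index`, `search`, and the input layout are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(pages):
    """Map each lowercase term to the set of (document, page) keys containing it.

    pages: dict of {(document_name, page_number): markdown_text} (assumed layout).
    """
    index = defaultdict(set)
    for key, text in pages.items():
        for term in set(TOKEN.findall(text.lower())):
            index[term].add(key)
    return index


def search(index, query):
    """Return the sorted (document, page) keys that contain every query term."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)
```

Because every hit is a (document, page) pair, a result can be handed straight to a page-level extractor for a closer read.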

Under the hood

What ingestion preserves
  • Tables — column structure and cell alignment, not flattened text
  • Diagrams — embedded as page images so engineers can still read them
  • Forms — field labels and values kept paired
  • Layout — heading hierarchy, lists, captions
  • Page numbers — every line traces back to its source page
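To illustrate the page-number guarantee, here is a small sketch that pairs each line of converted markdown with its source page. It assumes pages are delimited by `<!-- page N -->` comment lines; that marker format and the function name `lines_with_pages` are hypothetical, since the source does not describe how page provenance is stored.

```python
import re

# Assumed marker format; the real output may record pages differently.
PAGE_MARKER = re.compile(r"<!-- page (\d+) -->")


def lines_with_pages(markdown):
    """Return (page_number, line) pairs for every non-blank content line.

    Lines that appear before any marker are paired with None.
    """
    current_page = None
    traced = []
    for line in markdown.splitlines():
        marker = PAGE_MARKER.fullmatch(line.strip())
        if marker:
            current_page = int(marker.group(1))
            continue
        if line.strip():
            traced.append((current_page, line))
    return traced
```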

Modes
  • PDF to Markdown: single PDF · folder (recursive, unified index) · index-only rebuild
  • PDF Vision Extractor: single page · page range · query mode (extract and answer a focused question in one step)

Tiling is what buys the precision: whole-page vision tends to summarise and miss small text, fine cells and drawing dimensions. Splitting into overlapping tiles forces the model to attend to local detail — slower, but acceptable for a few critical pages.
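The overlapping-tile idea can be sketched as a function that computes crop boxes for a rendered page. The tile size and overlap defaults below are illustrative assumptions, not values taken from the extractor, and `tile_boxes` is a hypothetical name.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover a width x height page.

    Neighbouring tiles share at least `overlap` pixels, so text that falls on
    a tile edge appears whole in at least one tile.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")

    def starts(length):
        # A page that fits inside one tile needs a single start position.
        if length <= tile:
            return [0]
        step = tile - overlap
        positions = list(range(0, length - tile, step))
        # Align the last tile with the page edge so nothing is cut off.
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

The boxes use the (left, top, right, bottom) order that common imaging libraries accept for cropping. The number of tiles, and so the number of model calls, grows with page area, which is why this path suits a few critical pages rather than bulk indexing.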

Office generation — the Anthropic document skills

The “produce” half wraps the official Anthropic docx · pdf · pptx · xlsx skills — read, modify and generate office files with tracked changes, comments and formulas intact.