Supported modes
- Basic Single Column: Standard extraction for single-column PDF layouts.
- Basic Multi Column: Standard extraction for multi-column PDF layouts.
- Advanced: OCR for scanned PDFs and OCR-aware extraction for complex digital layouts. In beta, each Advanced run is limited to 100 pages.
Beta limits
- Maximum file size: 25 MB
- Advanced mode page limit: 100 pages per run
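
A client can check both limits before submitting a file. The following is a minimal sketch, not part of the documented contract: the function name, the `"advanced"` mode label, and the reading of 25 MB as 25 × 1024 × 1024 bytes are all assumptions.

```python
from typing import List

MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" is read as 25 MiB
ADVANCED_MAX_PAGES = 100           # Advanced mode page limit per run (beta)


def check_beta_limits(size_bytes: int, page_count: int, mode: str) -> List[str]:
    """Return human-readable problems; an empty list means the run fits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; the beta limit is {MAX_FILE_BYTES}"
        )
    # "advanced" is a hypothetical mode label used only for illustration.
    if mode == "advanced" and page_count > ADVANCED_MAX_PAGES:
        problems.append(
            f"{page_count} pages exceeds the Advanced limit of {ADVANCED_MAX_PAGES}"
        )
    return problems
```

The page check applies only to the Advanced mode, because the limits above do not state a page cap for the Basic modes.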
Top-level envelope
Document fields
| Field | Notes |
|---|---|
| source_file | Source file name. |
| format | Always pdf. |
| format_family | Always pdf. |
| title, author, subject, company | Document metadata fields. |
| created_at, modified_at | Timestamps when available. |
| page_count | Page count when available. |
| source_mode | Public extraction mode label. |
| source_engine | Engine identity. |
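
Because several document fields are only present "when available", client code should read them defensively. The sketch below assumes the document fields arrive as a plain dictionary and that an unavailable field is either missing or null; the function name and the sample values in the usage note are hypothetical.

```python
from typing import Any, Dict, Optional


def summarize_document(doc: Dict[str, Any]) -> str:
    """Build a one-line summary from the document fields."""
    # title is metadata and may be empty; fall back to the source file name.
    title = doc.get("title") or doc.get("source_file") or "(untitled)"
    # page_count is only present when available.
    pages: Optional[int] = doc.get("page_count")
    page_part = f"{pages} pages" if pages is not None else "page count unavailable"
    return f"{title} ({doc.get('format', 'pdf')}, {page_part})"
```

For example, `summarize_document({"source_file": "scan.pdf", "format": "pdf"})` falls back to the file name and reports that the page count is unavailable.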
Universal content/block shape
Every public content record uses the same base shape.

| Field | Notes |
|---|---|
| id | Stable record id. |
| type | Public record type. |
| path | Public structural path. |
| parent_id | Parent record id. |
| depth | Structural depth. |
| page_number | Page number. |
| order | Deterministic order. |
| bbox | Required bounding box. |
| source_ref | Provenance object. |
| is_inferred | Inference marker. |
| chunk_hint | Present only when you request it. |
| text | Searchable text projection. |
| attributes | PDF-specific structured data. |
Public record types: paragraph, heading, list, table, image, and header_footer.
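
The id, parent_id, and order fields are enough to rebuild the document hierarchy on the client. The following is a minimal sketch under stated assumptions: records are plain dictionaries, top-level records have a null or missing parent_id, and order values are comparable numbers. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def children_by_parent(
    blocks: List[Dict[str, Any]],
) -> Dict[Optional[str], List[str]]:
    """Map each parent id to its child record ids, kept in record order."""
    tree: Dict[Optional[str], List[str]] = defaultdict(list)
    for block in sorted(blocks, key=lambda b: b["order"]):
        # Assumption: top-level records carry a null or missing parent_id,
        # so they are grouped under the key None.
        tree[block.get("parent_id")].append(block["id"])
    return dict(tree)
```

Looking up the key `None` in the result yields the top-level records; looking up a heading's id yields the records nested under it.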
Warnings
- warnings is always present.
- Use plain strings.
- Keep warnings human-readable.
- Use warnings for client-facing notes, not for hidden parser behavior.
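
A client can enforce these rules when it reads a response. The sketch below assumes that warnings is a top-level key of the response envelope and holds a list; the text above only guarantees that it is always present and made of plain strings. The function name is hypothetical.

```python
from typing import Any, Dict, List


def read_warnings(envelope: Dict[str, Any]) -> List[str]:
    """Return the warnings, rejecting anything that is not a list of strings."""
    warnings = envelope.get("warnings")
    if not isinstance(warnings, list) or not all(
        isinstance(item, str) for item in warnings
    ):
        raise ValueError("expected 'warnings' to be a list of plain strings")
    return warnings
```

An empty list passes the check, which matches the rule that the field is present even when there is nothing to report.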
What clients can rely on
- Page order stays stable.
- Section structure stays explicit.
- bbox and source_ref stay attached to each block.
- attributes carries PDF-specific details.
- The public contract does not expose internal transport or worker details.
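
Stable page order and deterministic record order make it possible to reassemble plain text per page. The following is a minimal sketch, assuming each record is a dictionary whose page_number and order values are comparable and whose text is a string; the function name is hypothetical.

```python
from itertools import groupby
from typing import Any, Dict, List


def text_by_page(blocks: List[Dict[str, Any]]) -> Dict[int, str]:
    """Join record text per page, following page_number and then order."""
    ordered = sorted(blocks, key=lambda b: (b["page_number"], b["order"]))
    return {
        page: "\n".join(block["text"] for block in group)
        for page, group in groupby(ordered, key=lambda b: b["page_number"])
    }
```

Sorting before grouping matters here, because `itertools.groupby` only merges adjacent items that share a key.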

