MsOffice

Use this contract for Microsoft Office and web-text results. It is designed for everyday document intelligence workflows such as knowledge extraction, document QA, editorial review, report ingestion, and retrieval-augmented generation (RAG) over Word, Excel, PowerPoint, and web-text content.

Top-level envelope

{
  "schema_version": "1.0",
  "document": {},
  "warnings": [],
  "content": []
}
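A client can check the envelope before reading anything else. The Python sketch below assumes all four top-level keys are required, although the contract states this outright only for warnings. The names check_envelope and ENVELOPE_KEYS are hypothetical and not part of any TrueParser SDK.

```python
import json

# Assumption: all four envelope keys are required; the contract states
# this explicitly only for "warnings".
ENVELOPE_KEYS = ("schema_version", "document", "warnings", "content")


def check_envelope(payload: dict) -> None:
    """Raise ValueError if a parsed result is missing an envelope key
    or a key holds the wrong JSON type."""
    missing = [key for key in ENVELOPE_KEYS if key not in payload]
    if missing:
        raise ValueError(f"missing envelope keys: {missing}")
    if not isinstance(payload["document"], dict):
        raise ValueError("document must be a JSON object")
    for key in ("warnings", "content"):
        if not isinstance(payload[key], list):
            raise ValueError(f"{key} must be a JSON array")


raw = '{"schema_version": "1.0", "document": {}, "warnings": [], "content": []}'
check_envelope(json.loads(raw))  # the empty envelope passes
```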

Document fields

Field                                   Notes
source_file                             Source file name.
format                                  Concrete source format.
format_family                           One of word, excel, powerpoint, or web_text.
title, author, subject, company         Document metadata fields.
created_at, modified_at                 Timestamps when available.
page_count, sheet_count, slide_count    Family-specific summary counts.

Universal content/block shape

Every public content record uses the same base shape.
Field          Notes
id             Stable record id.
type           Public record type.
path           Public structural path.
parent_id      Parent record id.
depth          Structural depth.
page_number    Page, sheet, or slide reference when applicable.
source_ref     Provenance object.
is_inferred    Inference marker.
chunk_hint     Present only when you request it.
text           Searchable text projection.
attributes     Office-specific structured data.

Common record types include word_metadata, section, heading, paragraph, list, table, table_row, table_cell, image, header_footer, bookmark, footnote, endnote, toc, field, hyperlink, word_formula, word_chart, word_smartart, and page_break.
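Since every record shares one base shape, a client can model it once. The Python sketch below is a hypothetical ContentRecord class, not a TrueParser type. It treats only id and type as required and gives every other field a default. That split is an assumption based on the rule that optional fields may be omitted. The shape of chunk_hint is not specified, so it is typed as Any.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class ContentRecord:
    """Client-side view of one content record (hypothetical class).

    Only id and type are treated as required here; that split is an
    assumption, since optional fields may be omitted when they do not apply.
    """

    id: str
    type: str
    path: list = field(default_factory=list)
    parent_id: Optional[str] = None
    depth: int = 0
    page_number: Optional[int] = None
    source_ref: dict = field(default_factory=dict)
    is_inferred: bool = False
    chunk_hint: Any = None  # present only when requested; shape not specified
    text: str = ""
    attributes: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, obj: dict) -> "ContentRecord":
        # Drop keys this class does not model so unexpected fields do not
        # break parsing.
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in obj.items() if key in known})


heading = ContentRecord.from_json({"id": "blk_0001", "type": "heading", "text": "Coverage"})
print(heading.type, heading.depth)  # heading 0
```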

Warnings

  • warnings is always present, even when the list is empty.
  • Each warning is a plain string.
  • Warnings are written to be human-readable.
  • Warnings are not a substitute for missing fields; check the fields themselves.
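In practice, this means warnings are surfaced to people, and missing data is detected from the fields themselves. The Python sketch below shows both. The logger name and the report_warnings and has_title helpers are hypothetical.

```python
import logging

logger = logging.getLogger("msoffice_client")  # hypothetical logger name


def report_warnings(payload: dict) -> list:
    """Log each warning string and return the list unchanged."""
    warnings = payload["warnings"]  # always present, possibly empty
    for message in warnings:
        logger.warning("parser warning: %s", message)
    return warnings


def has_title(payload: dict) -> bool:
    # Check the field itself rather than parsing warning text.
    return bool(payload["document"].get("title"))
```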

What clients can rely on

  • Record order in content is deterministic.
  • Structure is explicit, expressed through path, parent_id, and depth.
  • attributes holds family-specific details.
  • Optional fields may be omitted when they do not apply.
  • The public contract does not expose internal transport or worker details.
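With a deterministic order and explicit parent links, a client can rebuild the hierarchy in a single pass without re-sorting. The Python sketch below uses .get for parent_id, depth, and text because optional fields may be omitted. The helper names are hypothetical. The sample records are made up for illustration, and their depth values are an assumption about how depth relates to nesting.

```python
from collections import defaultdict


def children_by_parent(content: list) -> dict:
    """Map each parent_id to its child record ids in document order.

    Records whose parent_id is null or omitted are grouped under None.
    """
    children = defaultdict(list)
    for record in content:  # iteration follows the deterministic content order
        children[record.get("parent_id")].append(record["id"])
    return dict(children)


def outline(content: list) -> list:
    """Indent each record's text by its depth for a quick visual check."""
    return ["  " * record.get("depth", 0) + record.get("text", "") for record in content]


records = [  # made-up records for illustration
    {"id": "blk_0001", "type": "heading", "parent_id": None, "depth": 0, "text": "Coverage"},
    {"id": "blk_0002", "type": "paragraph", "parent_id": "blk_0001", "depth": 1, "text": "Scope of cover."},
]
print(children_by_parent(records))  # {None: ['blk_0001'], 'blk_0001': ['blk_0002']}
```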

Example

{
  "schema_version": "1.0",
  "document": {
    "source_file": "sample.docx",
    "format": "docx",
    "format_family": "word",
    "title": "Insurance Policy",
    "author": "Jane Smith",
    "page_count": 4
  },
  "warnings": [],
  "content": [
    {
      "id": "blk_0001",
      "type": "heading",
      "path": ["Section 1"],
      "parent_id": null,
      "depth": 0,
      "page_number": 1,
      "source_ref": {
        "page": 1
      },
      "is_inferred": false,
      "text": "Coverage",
      "attributes": {
        "level": 1
      }
    }
  ]
}
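For RAG, the text projection and provenance fields in this example are often all that a client indexes. The Python sketch below turns each record that has non-empty text into a passage and keeps source_file, page_number, and path alongside it. The rag_passages name and the passage layout are hypothetical choices, not part of the contract.

```python
import json


def rag_passages(payload: dict) -> list:
    """Turn records with non-empty text into passages that keep provenance."""
    source_file = payload["document"].get("source_file")
    passages = []
    for record in payload["content"]:
        text = record.get("text")
        if not text:
            continue  # skip records without a text projection
        passages.append(
            {
                "id": record["id"],
                "text": text,
                "source_file": source_file,
                "page_number": record.get("page_number"),
                "path": record.get("path", []),
            }
        )
    return passages


# A trimmed copy of the example above, keeping only the fields used here.
raw = """
{"document": {"source_file": "sample.docx"},
 "warnings": [],
 "content": [{"id": "blk_0001", "text": "Coverage",
              "page_number": 1, "path": ["Section 1"]}]}
"""
for passage in rag_passages(json.loads(raw)):
    print(passage["source_file"], passage["page_number"], passage["text"])  # sample.docx 1 Coverage
```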
Last modified on April 28, 2026