Intelligent Format Identification
The engine identifies most documents automatically by inspecting their binary signature and structure (e.g., PDFs, email, SQL). However, ZIP-based formats (GIS Shapefiles, MapInfo) require an explicit format hint at ingestion. Because a ZIP file is an opaque container, the system cannot tell whether a payload holds a CAD archive, a GIS dataset, or a generic compressed archive unless you specify it with the documentType parameter.
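The sketch below illustrates why signature sniffing works for PDFs but stalls on ZIP files. It checks well-known leading bytes ("magic numbers"); the function name `sniff_signature` and its return labels are hypothetical and are not part of TrueParser's API.

```python
# Well-known file signatures ("magic numbers"). The labels returned here are
# hypothetical and only illustrate why a ZIP header is not enough on its own.
PDF_MAGIC = b"%PDF-"
ZIP_MAGICS = (
    b"PK\x03\x04",  # local file header (ordinary archive)
    b"PK\x05\x06",  # end-of-central-directory record (empty archive)
    b"PK\x07\x08",  # spanned-archive marker
)


def sniff_signature(head: bytes) -> str:
    """Classify a payload by its first few bytes."""
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(ZIP_MAGICS):
        # The header proves "this is a ZIP" and nothing more: a zipped
        # Shapefile, a MapInfo bundle, and a plain backup archive all start
        # with the same bytes, so a format hint is still needed.
        return "zip-ambiguous"
    return "unknown"
```

Inspecting the archive's member names (for example with Python's `zipfile.ZipFile.namelist()`) can narrow the possibilities, but a file listing is still a guess rather than a declaration, which is why an explicit hint is the reliable option.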
The Routing Decision
When a document enters the system, TrueParser determines the optimal engine using the following priority:
1. Explicit Declaration
If you provide a documentType in your request, TrueParser honors that routing immediately. This is the recommended approach for production pipelines where the source format is known.
2. Automatic Detection
If no type is provided, the platform uses a combination of filename heuristics, MIME types, and signature sniffing to identify the format. Once identified, the document is routed to the corresponding family (e.g., TrueParserGis for geospatial data, TrueParserMsOffice for Word/Excel).
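The two-step priority can be sketched as a small routing function: an explicit declaration wins outright, and detection runs only when none is given. Everything here is an illustrative assumption. `route`, `detect_format`, the lookup tables, and the order in which detection consults the signature, MIME type, and extension are hypothetical, and `document_type` merely mirrors the documented `documentType` parameter. Only the family names come from this page.

```python
import os
from typing import Optional

# Illustrative tables only: the family names come from this page, but the
# format labels and extension lists are assumptions, not TrueParser internals.
EXTENSION_TO_FORMAT = {
    ".pdf": "pdf",
    ".dwg": "cad", ".dxf": "cad",
    ".sql": "sql",
    ".eml": "email", ".pst": "email", ".ost": "email",
    ".docx": "office", ".xlsx": "office",
}
MIME_TO_FORMAT = {
    "application/pdf": "pdf",
    "message/rfc822": "email",
}
FORMAT_TO_FAMILY = {
    "pdf": "TrueParserPdf",
    "gis": "TrueParserGis",
    "cad": "TrueParserCad",
    "sql": "TrueParserSql",
    "email": "TrueParserMailKit",
    "office": "TrueParserMsOffice",
}


def detect_format(filename: str, mime_type: Optional[str], head: bytes) -> Optional[str]:
    """Stand-in for automatic detection; the order of checks is an assumption."""
    if head.startswith(b"%PDF-"):  # signature sniffing
        return "pdf"
    if mime_type in MIME_TO_FORMAT:  # declared MIME type
        return MIME_TO_FORMAT[mime_type]
    ext = os.path.splitext(filename)[1].lower()  # filename heuristic
    return EXTENSION_TO_FORMAT.get(ext)


def route(filename: str, head: bytes, mime_type: Optional[str] = None,
          document_type: Optional[str] = None) -> str:
    """Pick an engine family: explicit declaration first, detection second."""
    if document_type is not None:
        fmt: Optional[str] = document_type  # 1. explicit declaration
    else:
        fmt = detect_format(filename, mime_type, head)  # 2. automatic detection
    family = FORMAT_TO_FAMILY.get(fmt) if fmt is not None else None
    if family is None:
        raise ValueError(f"cannot route {filename!r}; declare a document type explicitly")
    return family
```

With these assumptions, `route("parcels.zip", b"PK\x03\x04")` raises because nothing identifies the archive's contents, while `route("parcels.zip", b"PK\x03\x04", document_type="gis")` returns `"TrueParserGis"`.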
Case Study: The CSV Dilemma
CSV is a unique format because it can represent two very different kinds of data:
- GIS/Spatial: Coordinate lists and point layers.
- Tabular/Office: Standard spreadsheets and reports.
Because the file contents alone cannot tell these apart, specify csvRoute at ingestion time:
- Spatial: Routes to the GIS family.
- Tabular: Routes to the Office family.
Specialized Engine Families
TrueParser orchestrates the following specialized engine families:
- TrueParserPdf: Technical and vision-based PDF extraction.
- TrueParserGis: High-throughput vector spatial ingestion.
- TrueParserCad: DWG/DXF normalization and world-coordinate flattening.
- TrueParserSql: Multi-dialect SQL statement analysis and logic extraction.
- TrueParserMailKit: Email forensics for PST/OST/EML archives.
- TrueParserMsOffice / TrueParserOpenDoc: Enterprise document processing.
Benefits of Decoupled Routing
Separating routing from the core API provides the following benefits:
- Scalable Processing: The system can scale processing capacity independently for different format families based on your actual traffic patterns.
- Version Isolation: You can benefit from engine-specific updates (e.g., a new SQL dialect) without any changes to your ingestion code.
- Reliability: A failure in one engine family does not impact the rest of the parsing platform.
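To make the scaling and reliability points concrete, here is a minimal in-process sketch assuming each engine family is served by its own worker pool. `FamilyDispatcher` and the pool sizes are hypothetical. In a real deployment the families would more likely be separate services, and threads inside one process cannot contain hard crashes such as a segfault, so treat this only as an illustration of the isolation idea.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class FamilyDispatcher:
    """Hypothetical sketch: one independently sized worker pool per engine family."""

    def __init__(self, pool_sizes: Dict[str, int]) -> None:
        # Each family gets its own pool, so capacity can be tuned per family.
        self._pools = {
            family: ThreadPoolExecutor(max_workers=size, thread_name_prefix=family)
            for family, size in pool_sizes.items()
        }

    def submit(self, family: str, job: Callable[[], str]) -> Future:
        # An exception raised by `job` is stored on the returned Future;
        # it does not stop this pool or any other family's pool.
        return self._pools[family].submit(job)

    def shutdown(self) -> None:
        for pool in self._pools.values():
            pool.shutdown(wait=True)


def crashing_sql_job() -> str:
    raise RuntimeError("SQL engine failure")


if __name__ == "__main__":
    dispatcher = FamilyDispatcher({"TrueParserSql": 1, "TrueParserPdf": 4})
    sql = dispatcher.submit("TrueParserSql", crashing_sql_job)
    pdf = dispatcher.submit("TrueParserPdf", lambda: "pdf parsed")
    print(pdf.result())  # the PDF job completes despite the SQL failure
    print(repr(sql.exception()))
    dispatcher.shutdown()
```

Giving the busier family a larger pool (here four workers for PDF versus one for SQL) is the in-process analogue of scaling each family to its own traffic.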

