Parquet

Use this contract for Parquet dataset results. It is designed for data engineering and analytics workflows where teams need schema inspection, dataset profiling, lakehouse ingestion, feature extraction, and validation over columnar data.

Top-level envelope

{
  "schema_version": "1.0",
  "document": {},
  "warnings": [],
  "content": []
}
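A minimal sketch of how a client might check the envelope before reading further, assuming the result has already been parsed from JSON into a Python dict. The helper name check_envelope is hypothetical and not part of the contract; the expected types are inferred from the envelope shown above.

```python
import json

# Top-level keys and the JSON types shown in the envelope above.
EXPECTED_TYPES = {
    "schema_version": str,
    "document": dict,
    "warnings": list,
    "content": list,
}


def check_envelope(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the envelope looks well-formed."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in result:
            problems.append(f"missing key: {key}")
        elif not isinstance(result[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


envelope = json.loads(
    '{"schema_version": "1.0", "document": {}, "warnings": [], "content": []}'
)
print(check_envelope(envelope))
```

Returning a list of problems rather than raising on the first one lets a caller log everything that is wrong with a malformed result in one pass.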

Document fields

| Field | Notes |
| --- | --- |
| source_file, document_name | Source file and display name. |
| format | Always parquet. |
| format_family | Always parquet_file. |
| document_id, content_hash | Optional stable identifiers. |
| is_partial | Public completeness flag. |
| metadata | Additive document metadata. |
| record_count, schema_node_count, leaf_column_count, row_group_count, row_record_count, column_chunk_count, statistics_count, key_value_metadata_count, semantic_tag_count, diagnostic_count | Summary counts when available. |
| row_count, compressed_size, uncompressed_size, confidence, status, completeness | Public summary values when available. |
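Because the summary counts are only present "when available", a client probably should not index them directly. The sketch below shows one defensive way to read them. The function name available_counts is hypothetical, and the code assumes an unavailable count is either omitted or set to null; the contract does not say which, so both cases are handled.

```python
# Summary count fields listed in the document table above.
COUNT_FIELDS = [
    "record_count", "schema_node_count", "leaf_column_count", "row_group_count",
    "row_record_count", "column_chunk_count", "statistics_count",
    "key_value_metadata_count", "semantic_tag_count", "diagnostic_count",
]


def available_counts(document: dict) -> dict:
    """Collect only the summary counts that are present and not null."""
    return {
        name: document[name]
        for name in COUNT_FIELDS
        if document.get(name) is not None
    }


# Sample document object with only two of the counts supplied.
document = {
    "format": "parquet",
    "format_family": "parquet_file",
    "source_file": "orders.parquet",
    "row_group_count": 1,
    "record_count": 2,
    "metadata": {},
}
print(available_counts(document))
```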

Universal content/block shape

Every public content record uses the same base shape.

| Field | Notes |
| --- | --- |
| id | Stable record id. |
| type | Public record type. |
| order | Deterministic order. |
| path | Public structural path. |
| parent_id | Parent record id. |
| depth | Structural depth. |
| source_ref | Provenance object. |
| is_inferred | Inference marker. |
| warnings | Record-local notes. |
| content_hash | Optional hash. |
| text | Searchable text projection. |
| attributes | Parquet-specific structured data. |

Common record types include schema_node, column, row_group, row, column_chunk, statistics, key_value_metadata, semantic_tag, and diagnostic.
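Since every record carries id, type, and parent_id, a client can index the flat content list without knowing anything Parquet-specific. The following is a sketch only: index_records is a hypothetical helper, and the sample records (including the ids schema_0002 and rg_0001) are invented for illustration.

```python
from collections import defaultdict


def index_records(content: list[dict]):
    """Group record ids by type and map each parent id to its child ids."""
    by_type = defaultdict(list)
    children = defaultdict(list)
    for record in content:
        by_type[record["type"]].append(record["id"])
        # Records with a null parent_id are filed under the key None as roots.
        children[record.get("parent_id")].append(record["id"])
    return dict(by_type), dict(children)


# Invented sample records; only the fields needed for indexing are shown.
content = [
    {"id": "schema_0001", "type": "schema_node", "parent_id": None},
    {"id": "schema_0002", "type": "schema_node", "parent_id": "schema_0001"},
    {"id": "rg_0001", "type": "row_group", "parent_id": None},
]
by_type, children = index_records(content)
print(by_type)
print(children)
```

Both indexes preserve the order in which records appear in content, so the order guarantees of the contract carry through to the grouped views.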

Warnings

  • warnings is always present.
  • Each warning is a plain string.
  • Warnings are human-readable.
  • Recoverable issues are surfaced as warnings without changing the public shape.
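Warnings can appear both at the top level of the envelope and on individual records, so a client that wants a single report has to gather both. A minimal sketch, assuming a parsed result dict; collect_warnings is a hypothetical helper and the warning messages in the sample are invented.

```python
def collect_warnings(result: dict) -> list[tuple[str, str]]:
    """Return (scope, message) pairs; scope is 'envelope' or a record id."""
    found = [("envelope", message) for message in result.get("warnings", [])]
    for record in result.get("content", []):
        # Record-local warnings are read defensively in case the key is absent.
        for message in record.get("warnings", []):
            found.append((record["id"], message))
    return found


# Invented sample: one envelope-level warning and one record-local warning.
result = {
    "warnings": ["footer statistics unavailable"],
    "content": [
        {"id": "schema_0001", "warnings": []},
        {"id": "rg_0001", "warnings": ["row group truncated"]},
    ],
}
for scope, message in collect_warnings(result):
    print(f"{scope}: {message}")
```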

What clients can rely on

  • Schema order stays stable.
  • Row-group and row order stay stable.
  • Raw metadata and summary values remain separate.
  • Additive fields do not break older clients.
  • The public contract does not expose internal transport or worker details.
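One way a client can take advantage of these guarantees is to read only the fields it knows and ignore everything else, so that additive fields pass through harmlessly. The sketch below illustrates that pattern; read_record and in_order are hypothetical helpers, and future_field is an invented stand-in for a field added in a later contract version.

```python
# Base fields from the universal content/block shape.
KNOWN_FIELDS = (
    "id", "type", "order", "path", "parent_id", "depth", "source_ref",
    "is_inferred", "warnings", "content_hash", "text", "attributes",
)


def read_record(record: dict) -> dict:
    """Keep the fields this client understands; missing ones become None."""
    return {name: record.get(name) for name in KNOWN_FIELDS}


def in_order(content: list[dict]) -> list[dict]:
    """Return records sorted by their order value."""
    return sorted(content, key=lambda record: record["order"])


# Invented sample: records arrive out of order, one with an unknown field.
content = [
    {"id": "schema_0002", "type": "schema_node", "order": 2, "future_field": "x"},
    {"id": "schema_0001", "type": "schema_node", "order": 1},
]
records = [read_record(record) for record in in_order(content)]
print([record["id"] for record in records])
```

Selecting known fields by name, rather than rejecting unknown keys, is what keeps an older client working when the contract grows.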

Example

{
  "schema_version": "1.0",
  "document": {
    "format": "parquet",
    "format_family": "parquet_file",
    "source_file": "orders.parquet",
    "row_group_count": 1,
    "record_count": 2,
    "metadata": {}
  },
  "warnings": [],
  "content": [
    {
      "id": "schema_0001",
      "type": "schema_node",
      "order": 1,
      "path": ["schema_node", "schema-order-id"],
      "parent_id": null,
      "depth": 0,
      "source_ref": {
        "record_id": "schema-order-id",
        "record_type": "schema_node",
        "source_file": "orders.parquet"
      },
      "is_inferred": false,
      "warnings": [],
      "text": "order.id",
      "attributes": {}
    }
  ]
}
Last modified on April 28, 2026