nv_ingest_api.internal.schemas.meta package

Submodules

nv_ingest_api.internal.schemas.meta.base_model_noext module

pydantic model nv_ingest_api.internal.schemas.meta.base_model_noext.BaseModelNoExt

Bases: BaseModel

JSON schema:
{
   "title": "BaseModelNoExt",
   "type": "object",
   "properties": {},
   "additionalProperties": false
}

Config:
  • extra: str = forbid
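
The `extra = forbid` setting corresponds to `"additionalProperties": false` in the JSON schema: any input key that is not a declared field is rejected. The following is a minimal, stdlib-only sketch of that rule. The helper `unknown_keys` is hypothetical and not part of nv_ingest_api; in the real models the check is performed by pydantic during validation.

```python
def unknown_keys(payload, declared):
    """Return the keys that a schema with additionalProperties=false would reject."""
    return sorted(set(payload) - set(declared))


# BaseModelNoExt declares no properties, so every key counts as an extra.
base_declared = set()

print(unknown_keys({}, base_declared))          # []
print(unknown_keys({"foo": 1}, base_declared))  # ['foo']
```

Subclasses inherit the same behaviour, with `declared` growing to include their own fields.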

nv_ingest_api.internal.schemas.meta.ingest_job_schema module

pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestJobSchema

Bases: BaseModelNoExt

JSON schema:
{
   "title": "IngestJobSchema",
   "type": "object",
   "properties": {
      "job_payload": {
         "$ref": "#/$defs/JobPayloadSchema"
      },
      "job_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Job Id"
      },
      "tasks": {
         "items": {
            "$ref": "#/$defs/IngestTaskSchema"
         },
         "title": "Tasks",
         "type": "array"
      },
      "tracing_options": {
         "anyOf": [
            {
               "$ref": "#/$defs/TracingOptionsSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      }
   },
   "$defs": {
      "ContentTypeEnum": {
         "description": "Enum for representing various content types.\n\nNote: Content type declares the broad category of the content, such as text, image, audio, etc.\nThis is not equivalent to the Document type, which is a specific file format.\n\nAttributes\n----------\nAUDIO : str\n    Represents audio content.\nEMBEDDING : str\n    Represents embedding content.\nIMAGE : str\n    Represents image content.\nINFO_MSG : str\n    Represents an informational message.\nPAGE_IMAGE : str\n    Represents a full-page image rendered from a document.\nSTRUCTURED : str\n    Represents structured content.\nTEXT : str\n    Represents text content.\nUNSTRUCTURED : str\n    Represents unstructured content.\nVIDEO : str\n    Represents video content.",
         "enum": [
            "audio",
            "chart",
            "embedding",
            "image",
            "infographic",
            "info_message",
            "none",
            "page_image",
            "structured",
            "table",
            "text",
            "unknown",
            "video"
         ],
         "title": "ContentTypeEnum",
         "type": "string"
      },
      "DocumentTypeEnum": {
         "description": "Enum for representing various document file types.\n\nNote: Document type refers to the specific file format of the content, such as PDF, DOCX, etc.\nThis is not equivalent to the Content type, which is a broad category of the content.\n\nAttributes\n----------\nBMP: str\n    BMP image format.\nDOCX: str\n    Microsoft Word document format.\nHTML: str\n    HTML document.\nJPEG: str\n    JPEG image format.\nPDF: str\n    PDF document format.\nPNG: str\n    PNG image format.\nPPTX: str\n    PowerPoint presentation format.\nSVG: str\n    SVG image format.\nTIFF: str\n    TIFF image format.\nTXT: str\n    Plain text file.\nMP3: str\n    MP3 audio format.\nWAV: str\n    WAV audio format.",
         "enum": [
            "bmp",
            "docx",
            "html",
            "jpeg",
            "pdf",
            "png",
            "pptx",
            "svg",
            "tiff",
            "text",
            "text",
            "mp3",
            "wav",
            "unknown"
         ],
         "title": "DocumentTypeEnum",
         "type": "string"
      },
      "IngestTaskAudioExtraction": {
         "additionalProperties": false,
         "properties": {
            "auth_token": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Auth Token"
            },
            "grpc_endpoint": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Grpc Endpoint"
            },
            "http_endpoint": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Http Endpoint"
            },
            "infer_protocol": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Infer Protocol"
            },
            "function_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Function Id"
            },
            "use_ssl": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Use Ssl"
            },
            "ssl_cert": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Ssl Cert"
            },
            "segment_audio": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Segment Audio"
            }
         },
         "title": "IngestTaskAudioExtraction",
         "type": "object"
      },
      "IngestTaskCaptionSchema": {
         "additionalProperties": false,
         "properties": {
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Api Key"
            },
            "endpoint_url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Endpoint Url"
            },
            "prompt": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Prompt"
            },
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Model Name"
            }
         },
         "title": "IngestTaskCaptionSchema",
         "type": "object"
      },
      "IngestTaskChartExtraction": {
         "additionalProperties": false,
         "properties": {
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskChartExtraction",
         "type": "object"
      },
      "IngestTaskDedupParams": {
         "additionalProperties": false,
         "properties": {
            "filter": {
               "default": false,
               "title": "Filter",
               "type": "boolean"
            }
         },
         "title": "IngestTaskDedupParams",
         "type": "object"
      },
      "IngestTaskDedupSchema": {
         "additionalProperties": false,
         "properties": {
            "content_type": {
               "$ref": "#/$defs/ContentTypeEnum",
               "default": "image"
            },
            "params": {
               "$ref": "#/$defs/IngestTaskDedupParams",
               "default": {
                  "filter": false
               }
            }
         },
         "title": "IngestTaskDedupSchema",
         "type": "object"
      },
      "IngestTaskEmbedSchema": {
         "additionalProperties": false,
         "properties": {
            "endpoint_url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Endpoint Url"
            },
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Model Name"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Api Key"
            },
            "filter_errors": {
               "default": false,
               "title": "Filter Errors",
               "type": "boolean"
            },
            "text_elements_modality": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Text Elements Modality"
            },
            "image_elements_modality": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Image Elements Modality"
            },
            "structured_elements_modality": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Structured Elements Modality"
            },
            "audio_elements_modality": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Audio Elements Modality"
            }
         },
         "title": "IngestTaskEmbedSchema",
         "type": "object"
      },
      "IngestTaskExtractSchema": {
         "additionalProperties": false,
         "properties": {
            "document_type": {
               "$ref": "#/$defs/DocumentTypeEnum"
            },
            "method": {
               "title": "Method",
               "type": "string"
            },
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "required": [
            "document_type",
            "method"
         ],
         "title": "IngestTaskExtractSchema",
         "type": "object"
      },
      "IngestTaskFilterParamsSchema": {
         "additionalProperties": false,
         "properties": {
            "min_size": {
               "default": 128,
               "title": "Min Size",
               "type": "integer"
            },
            "max_aspect_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "default": 5.0,
               "title": "Max Aspect Ratio"
            },
            "min_aspect_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "default": 0.2,
               "title": "Min Aspect Ratio"
            },
            "filter": {
               "default": false,
               "title": "Filter",
               "type": "boolean"
            }
         },
         "title": "IngestTaskFilterParamsSchema",
         "type": "object"
      },
      "IngestTaskFilterSchema": {
         "additionalProperties": false,
         "properties": {
            "content_type": {
               "$ref": "#/$defs/ContentTypeEnum",
               "default": "image"
            },
            "params": {
               "$ref": "#/$defs/IngestTaskFilterParamsSchema",
               "default": {
                  "min_size": 128,
                  "max_aspect_ratio": 5.0,
                  "min_aspect_ratio": 0.2,
                  "filter": false
               }
            }
         },
         "title": "IngestTaskFilterSchema",
         "type": "object"
      },
      "IngestTaskInfographicExtraction": {
         "additionalProperties": false,
         "properties": {
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskInfographicExtraction",
         "type": "object"
      },
      "IngestTaskSchema": {
         "additionalProperties": false,
         "properties": {
            "type": {
               "$ref": "#/$defs/TaskTypeEnum"
            },
            "task_properties": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/IngestTaskSplitSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskExtractSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskStoreEmbedSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskStoreSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskEmbedSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskCaptionSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskDedupSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskFilterSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskVdbUploadSchema"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskAudioExtraction"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskTableExtraction"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskChartExtraction"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskInfographicExtraction"
                  },
                  {
                     "$ref": "#/$defs/IngestTaskUDFSchema"
                  }
               ],
               "title": "Task Properties"
            },
            "raise_on_failure": {
               "default": false,
               "title": "Raise On Failure",
               "type": "boolean"
            }
         },
         "required": [
            "type",
            "task_properties"
         ],
         "title": "IngestTaskSchema",
         "type": "object"
      },
      "IngestTaskSplitSchema": {
         "additionalProperties": false,
         "properties": {
            "tokenizer": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Tokenizer"
            },
            "chunk_size": {
               "default": 1024,
               "exclusiveMinimum": 0,
               "title": "Chunk Size",
               "type": "integer"
            },
            "chunk_overlap": {
               "default": 150,
               "minimum": 0,
               "title": "Chunk Overlap",
               "type": "integer"
            },
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskSplitSchema",
         "type": "object"
      },
      "IngestTaskStoreEmbedSchema": {
         "additionalProperties": false,
         "properties": {
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskStoreEmbedSchema",
         "type": "object"
      },
      "IngestTaskStoreSchema": {
         "additionalProperties": false,
         "properties": {
            "structured": {
               "default": true,
               "title": "Structured",
               "type": "boolean"
            },
            "images": {
               "default": false,
               "title": "Images",
               "type": "boolean"
            },
            "method": {
               "title": "Method",
               "type": "string"
            },
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "required": [
            "method"
         ],
         "title": "IngestTaskStoreSchema",
         "type": "object"
      },
      "IngestTaskTableExtraction": {
         "additionalProperties": false,
         "properties": {
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskTableExtraction",
         "type": "object"
      },
      "IngestTaskUDFSchema": {
         "additionalProperties": false,
         "properties": {
            "udf_function": {
               "title": "Udf Function",
               "type": "string"
            },
            "udf_function_name": {
               "title": "Udf Function Name",
               "type": "string"
            },
            "phase": {
               "anyOf": [
                  {
                     "maximum": 5,
                     "minimum": 1,
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Phase"
            },
            "run_before": {
               "default": false,
               "description": "Execute UDF before the target stage",
               "title": "Run Before",
               "type": "boolean"
            },
            "run_after": {
               "default": false,
               "description": "Execute UDF after the target stage",
               "title": "Run After",
               "type": "boolean"
            },
            "target_stage": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Name of the stage to target (e.g., 'image_dedup', 'text_extract')",
               "title": "Target Stage"
            }
         },
         "required": [
            "udf_function",
            "udf_function_name"
         ],
         "title": "IngestTaskUDFSchema",
         "type": "object"
      },
      "IngestTaskVdbUploadSchema": {
         "additionalProperties": false,
         "properties": {
            "bulk_ingest": {
               "default": false,
               "title": "Bulk Ingest",
               "type": "boolean"
            },
            "bulk_ingest_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Bulk Ingest Path"
            },
            "params": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Params"
            },
            "filter_errors": {
               "default": true,
               "title": "Filter Errors",
               "type": "boolean"
            }
         },
         "title": "IngestTaskVdbUploadSchema",
         "type": "object"
      },
      "JobPayloadSchema": {
         "additionalProperties": false,
         "properties": {
            "content": {
               "items": {
                  "anyOf": [
                     {
                        "type": "string"
                     },
                     {
                        "format": "binary",
                        "type": "string"
                     }
                  ]
               },
               "title": "Content",
               "type": "array"
            },
            "source_name": {
               "items": {
                  "type": "string"
               },
               "title": "Source Name",
               "type": "array"
            },
            "source_id": {
               "items": {
                  "anyOf": [
                     {
                        "type": "string"
                     },
                     {
                        "type": "integer"
                     }
                  ]
               },
               "title": "Source Id",
               "type": "array"
            },
            "document_type": {
               "items": {
                  "type": "string"
               },
               "title": "Document Type",
               "type": "array"
            }
         },
         "required": [
            "content",
            "source_name",
            "source_id",
            "document_type"
         ],
         "title": "JobPayloadSchema",
         "type": "object"
      },
      "TaskTypeEnum": {
         "description": "Enum for representing various task types.\n\nAttributes\n----------\nCAPTION : str\n    Represents a caption task.\nDEDUP : str\n    Represents a deduplication task.\nEMBED : str\n    Represents an embedding task.\nEXTRACT : str\n    Represents an extraction task.\nFILTER : str\n    Represents a filtering task.\nSPLIT : str\n    Represents a splitting task.\nSTORE : str\n    Represents a storing task.\nSTORE_EMBEDDING : str\n    Represents a task for storing embeddings.\nVDB_UPLOAD : str\n    Represents a task for uploading to a vector database.\nAUDIO_DATA_EXTRACT : str\n    Represents a task for extracting audio data.\nTABLE_DATA_EXTRACT : str\n    Represents a task for extracting table data.\nCHART_DATA_EXTRACT : str\n    Represents a task for extracting chart data.\nINFOGRAPHIC_DATA_EXTRACT : str\n    Represents a task for extracting infographic data.\nUDF : str\n    Represents a user-defined function task.",
         "enum": [
            "audio_data_extract",
            "caption",
            "chart_data_extract",
            "dedup",
            "embed",
            "extract",
            "filter",
            "infographic_data_extract",
            "split",
            "store_embedding",
            "store",
            "table_data_extract",
            "udf",
            "vdb_upload"
         ],
         "title": "TaskTypeEnum",
         "type": "string"
      },
      "TracingOptionsSchema": {
         "additionalProperties": false,
         "properties": {
            "trace": {
               "default": false,
               "title": "Trace",
               "type": "boolean"
            },
            "ts_send": {
               "title": "Ts Send",
               "type": "integer"
            },
            "trace_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Trace Id"
            }
         },
         "required": [
            "ts_send"
         ],
         "title": "TracingOptionsSchema",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "job_payload",
      "job_id",
      "tasks"
   ]
}

Config:
  • extra: str = forbid

Fields:
field job_id: str | int [Required]
field job_payload: JobPayloadSchema [Required]
field tasks: List[IngestTaskSchema] [Required]
field tracing_options: TracingOptionsSchema | None = None
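
To make the shape of a job concrete, here is a sketch of a payload laid out according to the JSON schema above, with a stdlib-only check of the `required` lists. All values (`job-0001`, `example.pdf`, `example_method`, the content placeholder) are illustrative assumptions, and `missing_keys` is a hypothetical helper, not part of nv_ingest_api. Real validation is done by the pydantic model and covers types and enum values as well.

```python
import json

# Required keys, copied from the "required" lists in the JSON schema above.
REQUIRED_JOB_KEYS = ("job_payload", "job_id", "tasks")
REQUIRED_PAYLOAD_KEYS = ("content", "source_name", "source_id", "document_type")
REQUIRED_TASK_KEYS = ("type", "task_properties")


def missing_keys(obj, required):
    """Return the required keys that are absent from obj."""
    return [key for key in required if key not in obj]


job = {
    "job_id": "job-0001",  # str or int; illustrative value
    "job_payload": {
        # Each field is a list; one entry per document is assumed here.
        "content": ["<document content>"],  # placeholder
        "source_name": ["example.pdf"],
        "source_id": ["example.pdf"],
        "document_type": ["pdf"],
    },
    "tasks": [
        {
            "type": "extract",  # a TaskTypeEnum value
            "task_properties": {
                "document_type": "pdf",  # a DocumentTypeEnum value
                "method": "example_method",  # hypothetical method name
                "params": {},
            },
        }
    ],
    # tracing_options is optional and defaults to null, so it is omitted.
}

missing = missing_keys(job, REQUIRED_JOB_KEYS)
missing += missing_keys(job["job_payload"], REQUIRED_PAYLOAD_KEYS)
for task in job["tasks"]:
    missing += missing_keys(task, REQUIRED_TASK_KEYS)

print(missing)  # []
print(json.dumps(job, indent=2))
```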

pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskAudioExtraction

Bases: BaseModelNoExt

JSON schema:
{
   "title": "IngestTaskAudioExtraction",
   "type": "object",
   "properties": {
      "auth_token": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Auth Token"
      },
      "grpc_endpoint": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Grpc Endpoint"
      },
      "http_endpoint": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Http Endpoint"
      },
      "infer_protocol": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Infer Protocol"
      },
      "function_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Function Id"
      },
      "use_ssl": {
         "anyOf": [
            {
               "type": "boolean"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Use Ssl"
      },
      "ssl_cert": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Ssl Cert"
      },
      "segment_audio": {
         "anyOf": [
            {
               "type": "boolean"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Segment Audio"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field auth_token: str | None = None
field function_id: str | None = None
field grpc_endpoint: str | None = None
field http_endpoint: str | None = None
field infer_protocol: str | None = None
field segment_audio: bool | None = None
field ssl_cert: str | None = None
field use_ssl: bool | None = None
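
Every field of this model is optional and defaults to None, so a task may set only the options it needs. The sketch below builds such a set of task properties with the standard library. The helper `audio_task_properties` and the endpoint value are hypothetical, and pairing these properties with the `audio_data_extract` task type is an assumption based on the names in `TaskTypeEnum`.

```python
# Field names copied from the Fields list above.
AUDIO_FIELDS = (
    "auth_token",
    "function_id",
    "grpc_endpoint",
    "http_endpoint",
    "infer_protocol",
    "segment_audio",
    "ssl_cert",
    "use_ssl",
)


def audio_task_properties(**overrides):
    """Return all audio fields, using None for anything not overridden."""
    unknown = sorted(set(overrides) - set(AUDIO_FIELDS))
    if unknown:
        # Mirrors extra = forbid: undeclared keys are an error.
        raise ValueError(f"unknown fields: {unknown}")
    props = dict.fromkeys(AUDIO_FIELDS)  # every value starts as None
    props.update(overrides)
    return props


task = {
    "type": "audio_data_extract",  # assumed pairing with this model
    "task_properties": audio_task_properties(
        grpc_endpoint="localhost:50051",  # placeholder endpoint
        use_ssl=False,
    ),
}
print(task["task_properties"]["use_ssl"])     # False
print(task["task_properties"]["auth_token"])  # None
```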

pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskCaptionSchema

Bases: BaseModelNoExt

JSON schema:
{
   "title": "IngestTaskCaptionSchema",
   "type": "object",
   "properties": {
      "api_key": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Api Key"
      },
      "endpoint_url": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Endpoint Url"
      },
      "prompt": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Prompt"
      },
      "model_name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Model Name"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field api_key: str | None = None
field endpoint_url: str | None = None
field model_name: str | None = None
field prompt: str | None = None

pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskChartExtraction

Bases: BaseModelNoExt

JSON schema:
{
   "title": "IngestTaskChartExtraction",
   "type": "object",
   "properties": {
      "params": {
         "additionalProperties": true,
         "title": "Params",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field params: dict [Optional]

pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskDedupParams

Bases: BaseModelNoExt

JSON schema:
{
   "title": "IngestTaskDedupParams",
   "type": "object",
   "properties": {
      "filter": {
         "default": false,
         "title": "Filter",
         "type": "boolean"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field filter: bool = False

pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskDedupSchema

Bases: BaseModelNoExt

JSON schema:
{
   "title": "IngestTaskDedupSchema",
   "type": "object",
   "properties": {
      "content_type": {
         "$ref": "#/$defs/ContentTypeEnum",
         "default": "image"
      },
      "params": {
         "$ref": "#/$defs/IngestTaskDedupParams",
         "default": {
            "filter": false
         }
      }
   },
   "$defs": {
      "ContentTypeEnum": {
         "description": "Enum for representing various content types.\n\nNote: Content type declares the broad category of the content, such as text, image, audio, etc.\nThis is not equivalent to the Document type, which is a specific file format.\n\nAttributes\n----------\nAUDIO : str\n    Represents audio content.\nEMBEDDING : str\n    Represents embedding content.\nIMAGE : str\n    Represents image content.\nINFO_MSG : str\n    Represents an informational message.\nPAGE_IMAGE : str\n    Represents a full-page image rendered from a document.\nSTRUCTURED : str\n    Represents structured content.\nTEXT : str\n    Represents text content.\nUNSTRUCTURED : str\n    Represents unstructured content.\nVIDEO : str\n    Represents video content.",
         "enum": [
            "audio",
            "chart",
            "embedding",
            "image",
            "infographic",
            "info_message",
            "none",
            "page_image",
            "structured",
            "table",
            "text",
            "unknown",
            "video"
         ],
         "title": "ContentTypeEnum",
         "type": "string"
      },
      "IngestTaskDedupParams": {
         "additionalProperties": false,
         "properties": {
            "filter": {
               "default": false,
               "title": "Filter",
               "type": "boolean"
            }
         },
         "title": "IngestTaskDedupParams",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field content_type: ContentTypeEnum = ContentTypeEnum.IMAGE#
field params: IngestTaskDedupParams = IngestTaskDedupParams(filter=False)#
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskEmbedSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskEmbedSchema",
   "type": "object",
   "properties": {
      "endpoint_url": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Endpoint Url"
      },
      "model_name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Model Name"
      },
      "api_key": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Api Key"
      },
      "filter_errors": {
         "default": false,
         "title": "Filter Errors",
         "type": "boolean"
      },
      "text_elements_modality": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Text Elements Modality"
      },
      "image_elements_modality": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Image Elements Modality"
      },
      "structured_elements_modality": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Structured Elements Modality"
      },
      "audio_elements_modality": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Audio Elements Modality"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field api_key: str | None = None#
field audio_elements_modality: str | None = None#
field endpoint_url: str | None = None#
field filter_errors: bool = False#
field image_elements_modality: str | None = None#
field model_name: str | None = None#
field structured_elements_modality: str | None = None#
field text_elements_modality: str | None = None#
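Every field in this schema has a default, and unknown keys are rejected (`additionalProperties: false`). The sketch below illustrates that behavior with only the standard library, reading the defaults from an abbreviated copy of the JSON schema above. It does not import nv_ingest_api; `apply_defaults` is a hypothetical helper, and the model name in the usage line is a placeholder.

```python
import json

# Abbreviated copy of the IngestTaskEmbedSchema JSON schema shown above
# (three of the eight properties, to keep the sketch short).
EMBED_SCHEMA = json.loads("""
{
  "title": "IngestTaskEmbedSchema",
  "type": "object",
  "properties": {
    "endpoint_url": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null},
    "model_name": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null},
    "filter_errors": {"type": "boolean", "default": false}
  },
  "additionalProperties": false
}
""")


def apply_defaults(schema, payload):
    """Hypothetical helper: reject unknown keys, then fill in schema defaults."""
    allowed = schema["properties"]
    unknown = sorted(set(payload) - set(allowed))
    if unknown and schema.get("additionalProperties") is False:
        # Mirrors "additionalProperties": false (Config extra = forbid).
        raise ValueError(f"unexpected fields: {unknown}")
    merged = {name: spec["default"] for name, spec in allowed.items() if "default" in spec}
    merged.update(payload)
    return merged


# "example-embedding-model" is a placeholder, not a real model name.
print(apply_defaults(EMBED_SCHEMA, {"model_name": "example-embedding-model"}))
```

The same helper should work for any of the flat schemas on this page, since it only relies on the `properties`, `default`, and `additionalProperties` keys.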
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskExtractSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskExtractSchema",
   "type": "object",
   "properties": {
      "document_type": {
         "$ref": "#/$defs/DocumentTypeEnum"
      },
      "method": {
         "title": "Method",
         "type": "string"
      },
      "params": {
         "additionalProperties": true,
         "title": "Params",
         "type": "object"
      }
   },
   "$defs": {
      "DocumentTypeEnum": {
         "description": "Enum for representing various document file types.\n\nNote: Document type refers to the specific file format of the content, such as PDF, DOCX, etc.\nThis is not equivalent to the Content type, which is a broad category of the content.\n\nAttributes\n----------\nBMP: str\n    BMP image format.\nDOCX: str\n    Microsoft Word document format.\nHTML: str\n    HTML document.\nJPEG: str\n    JPEG image format.\nPDF: str\n    PDF document format.\nPNG: str\n    PNG image format.\nPPTX: str\n    PowerPoint presentation format.\nSVG: str\n    SVG image format.\nTIFF: str\n    TIFF image format.\nTXT: str\n    Plain text file.\nMP3: str\n    MP3 audio format.\nWAV: str\n    WAV audio format.",
         "enum": [
            "bmp",
            "docx",
            "html",
            "jpeg",
            "pdf",
            "png",
            "pptx",
            "svg",
            "tiff",
            "text",
            "text",
            "mp3",
            "wav",
            "unknown"
         ],
         "title": "DocumentTypeEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "document_type",
      "method"
   ]
}

Config:
  • extra: str = forbid

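Fields:
  • document_type (DocumentTypeEnum)
  • method (str)
  • params (dict)

Validators:
  • case_insensitive_document_type » document_type

field document_type: DocumentTypeEnum [Required]#
Validated by:
  • case_insensitive_document_type

field method: str [Required]#
field params: dict [Optional]#
validator case_insensitive_document_type  »  document_type[source]#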
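The validator name suggests that `document_type` is matched without regard to case. The following is a minimal standard-library sketch of that idea, assuming the validator lower-cases string input before the enum lookup; this page does not confirm that behavior. The helper names and the `"example_method"` value are hypothetical.

```python
# Values copied from DocumentTypeEnum in the JSON schema above.
DOCUMENT_TYPES = {
    "bmp", "docx", "html", "jpeg", "pdf", "png", "pptx",
    "svg", "tiff", "text", "mp3", "wav", "unknown",
}


def normalize_document_type(value):
    """Assumed behavior: lower-case the input, then check it against the enum."""
    if not isinstance(value, str):
        raise ValueError(f"document_type must be a string, got {type(value).__name__}")
    lowered = value.lower()
    if lowered not in DOCUMENT_TYPES:
        raise ValueError(f"unsupported document_type: {value!r}")
    return lowered


def extract_task_properties(document_type, method, params=None):
    """Hypothetical helper: build a dict with the two required fields plus params."""
    return {
        "document_type": normalize_document_type(document_type),
        "method": method,
        "params": dict(params or {}),
    }


# "example_method" is a placeholder, not a documented extraction method.
print(extract_task_properties("PDF", "example_method"))
```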
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskFilterParamsSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskFilterParamsSchema",
   "type": "object",
   "properties": {
      "min_size": {
         "default": 128,
         "title": "Min Size",
         "type": "integer"
      },
      "max_aspect_ratio": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "integer"
            }
         ],
         "default": 5.0,
         "title": "Max Aspect Ratio"
      },
      "min_aspect_ratio": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "integer"
            }
         ],
         "default": 0.2,
         "title": "Min Aspect Ratio"
      },
      "filter": {
         "default": false,
         "title": "Filter",
         "type": "boolean"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field filter: bool = False#
field max_aspect_ratio: float | int = 5.0#
field min_aspect_ratio: float | int = 0.2#
field min_size: int = 128#
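This page lists the filter parameters and their defaults but not how they are applied. The sketch below shows one plausible reading, as an assumption only: an image is kept when the average of its width and height is at least `min_size` and its width/height ratio lies between `min_aspect_ratio` and `max_aspect_ratio`. The function name `passes_image_filter` is hypothetical, and the actual filter stage may use different comparisons.

```python
def passes_image_filter(width, height, min_size=128,
                        max_aspect_ratio=5.0, min_aspect_ratio=0.2):
    """Assumed semantics: keep images that are large enough and not too elongated."""
    if width <= 0 or height <= 0:
        return False
    average_size = (width + height) / 2  # assumption: size means the mean side length
    aspect_ratio = width / height        # assumption: ratio is width over height
    return (
        average_size >= min_size
        and min_aspect_ratio <= aspect_ratio <= max_aspect_ratio
    )


# Defaults are the ones documented above (128, 5.0, 0.2).
for size in [(256, 256), (64, 64), (1000, 100), (100, 1000)]:
    print(size, passes_image_filter(*size))
```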
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskFilterSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskFilterSchema",
   "type": "object",
   "properties": {
      "content_type": {
         "$ref": "#/$defs/ContentTypeEnum",
         "default": "image"
      },
      "params": {
         "$ref": "#/$defs/IngestTaskFilterParamsSchema",
         "default": {
            "min_size": 128,
            "max_aspect_ratio": 5.0,
            "min_aspect_ratio": 0.2,
            "filter": false
         }
      }
   },
   "$defs": {
      "ContentTypeEnum": {
         "description": "Enum for representing various content types.\n\nNote: Content type declares the broad category of the content, such as text, image, audio, etc.\nThis is not equivalent to the Document type, which is a specific file format.\n\nAttributes\n----------\nAUDIO : str\n    Represents audio content.\nEMBEDDING : str\n    Represents embedding content.\nIMAGE : str\n    Represents image content.\nINFO_MSG : str\n    Represents an informational message.\nPAGE_IMAGE : str\n    Represents a full-page image rendered from a document.\nSTRUCTURED : str\n    Represents structured content.\nTEXT : str\n    Represents text content.\nUNSTRUCTURED : str\n    Represents unstructured content.\nVIDEO : str\n    Represents video content.",
         "enum": [
            "audio",
            "chart",
            "embedding",
            "image",
            "infographic",
            "info_message",
            "none",
            "page_image",
            "structured",
            "table",
            "text",
            "unknown",
            "video"
         ],
         "title": "ContentTypeEnum",
         "type": "string"
      },
      "IngestTaskFilterParamsSchema": {
         "additionalProperties": false,
         "properties": {
            "min_size": {
               "default": 128,
               "title": "Min Size",
               "type": "integer"
            },
            "max_aspect_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "default": 5.0,
               "title": "Max Aspect Ratio"
            },
            "min_aspect_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "default": 0.2,
               "title": "Min Aspect Ratio"
            },
            "filter": {
               "default": false,
               "title": "Filter",
               "type": "boolean"
            }
         },
         "title": "IngestTaskFilterParamsSchema",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field content_type: ContentTypeEnum = ContentTypeEnum.IMAGE#
field params: IngestTaskFilterParamsSchema = IngestTaskFilterParamsSchema(min_size=128, max_aspect_ratio=5.0, min_aspect_ratio=0.2, filter=False)#
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskInfographicExtraction[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskInfographicExtraction",
   "type": "object",
   "properties": {
      "params": {
         "additionalProperties": true,
         "title": "Params",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field params: dict [Optional]#
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskSchema",
   "type": "object",
   "properties": {
      "type": {
         "$ref": "#/$defs/TaskTypeEnum"
      },
      "task_properties": {
         "anyOf": [
            {
               "$ref": "#/$defs/IngestTaskSplitSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskExtractSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskStoreEmbedSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskStoreSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskEmbedSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskCaptionSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskDedupSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskFilterSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskVdbUploadSchema"
            },
            {
               "$ref": "#/$defs/IngestTaskAudioExtraction"
            },
            {
               "$ref": "#/$defs/IngestTaskTableExtraction"
            },
            {
               "$ref": "#/$defs/IngestTaskChartExtraction"
            },
            {
               "$ref": "#/$defs/IngestTaskInfographicExtraction"
            },
            {
               "$ref": "#/$defs/IngestTaskUDFSchema"
            }
         ],
         "title": "Task Properties"
      },
      "raise_on_failure": {
         "default": false,
         "title": "Raise On Failure",
         "type": "boolean"
      }
   },
   "$defs": {
      "ContentTypeEnum": {
         "description": "Enum for representing various content types.\n\nNote: Content type declares the broad category of the content, such as text, image, audio, etc.\nThis is not equivalent to the Document type, which is a specific file format.\n\nAttributes\n----------\nAUDIO : str\n    Represents audio content.\nEMBEDDING : str\n    Represents embedding content.\nIMAGE : str\n    Represents image content.\nINFO_MSG : str\n    Represents an informational message.\nPAGE_IMAGE : str\n    Represents a full-page image rendered from a document.\nSTRUCTURED : str\n    Represents structured content.\nTEXT : str\n    Represents text content.\nUNSTRUCTURED : str\n    Represents unstructured content.\nVIDEO : str\n    Represents video content.",
         "enum": [
            "audio",
            "chart",
            "embedding",
            "image",
            "infographic",
            "info_message",
            "none",
            "page_image",
            "structured",
            "table",
            "text",
            "unknown",
            "video"
         ],
         "title": "ContentTypeEnum",
         "type": "string"
      },
      "DocumentTypeEnum": {
         "description": "Enum for representing various document file types.\n\nNote: Document type refers to the specific file format of the content, such as PDF, DOCX, etc.\nThis is not equivalent to the Content type, which is a broad category of the content.\n\nAttributes\n----------\nBMP: str\n    BMP image format.\nDOCX: str\n    Microsoft Word document format.\nHTML: str\n    HTML document.\nJPEG: str\n    JPEG image format.\nPDF: str\n    PDF document format.\nPNG: str\n    PNG image format.\nPPTX: str\n    PowerPoint presentation format.\nSVG: str\n    SVG image format.\nTIFF: str\n    TIFF image format.\nTXT: str\n    Plain text file.\nMP3: str\n    MP3 audio format.\nWAV: str\n    WAV audio format.",
         "enum": [
            "bmp",
            "docx",
            "html",
            "jpeg",
            "pdf",
            "png",
            "pptx",
            "svg",
            "tiff",
            "text",
            "text",
            "mp3",
            "wav",
            "unknown"
         ],
         "title": "DocumentTypeEnum",
         "type": "string"
      },
      "IngestTaskAudioExtraction": {
         "additionalProperties": false,
         "properties": {
            "auth_token": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Auth Token"
            },
            "grpc_endpoint": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Grpc Endpoint"
            },
            "http_endpoint": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Http Endpoint"
            },
            "infer_protocol": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Infer Protocol"
            },
            "function_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Function Id"
            },
            "use_ssl": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Use Ssl"
            },
            "ssl_cert": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Ssl Cert"
            },
            "segment_audio": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Segment Audio"
            }
         },
         "title": "IngestTaskAudioExtraction",
         "type": "object"
      },
      "IngestTaskCaptionSchema": {
         "additionalProperties": false,
         "properties": {
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Api Key"
            },
            "endpoint_url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Endpoint Url"
            },
            "prompt": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Prompt"
            },
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Model Name"
            }
         },
         "title": "IngestTaskCaptionSchema",
         "type": "object"
      },
      "IngestTaskChartExtraction": {
         "additionalProperties": false,
         "properties": {
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskChartExtraction",
         "type": "object"
      },
      "IngestTaskDedupParams": {
         "additionalProperties": false,
         "properties": {
            "filter": {
               "default": false,
               "title": "Filter",
               "type": "boolean"
            }
         },
         "title": "IngestTaskDedupParams",
         "type": "object"
      },
      "IngestTaskDedupSchema": {
         "additionalProperties": false,
         "properties": {
            "content_type": {
               "$ref": "#/$defs/ContentTypeEnum",
               "default": "image"
            },
            "params": {
               "$ref": "#/$defs/IngestTaskDedupParams",
               "default": {
                  "filter": false
               }
            }
         },
         "title": "IngestTaskDedupSchema",
         "type": "object"
      },
      "IngestTaskEmbedSchema": {
         "additionalProperties": false,
         "properties": {
            "endpoint_url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Endpoint Url"
            },
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Model Name"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Api Key"
            },
            "filter_errors": {
               "default": false,
               "title": "Filter Errors",
               "type": "boolean"
            },
            "text_elements_modality": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Text Elements Modality"
            },
            "image_elements_modality": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Image Elements Modality"
            },
            "structured_elements_modality": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Structured Elements Modality"
            },
            "audio_elements_modality": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Audio Elements Modality"
            }
         },
         "title": "IngestTaskEmbedSchema",
         "type": "object"
      },
      "IngestTaskExtractSchema": {
         "additionalProperties": false,
         "properties": {
            "document_type": {
               "$ref": "#/$defs/DocumentTypeEnum"
            },
            "method": {
               "title": "Method",
               "type": "string"
            },
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "required": [
            "document_type",
            "method"
         ],
         "title": "IngestTaskExtractSchema",
         "type": "object"
      },
      "IngestTaskFilterParamsSchema": {
         "additionalProperties": false,
         "properties": {
            "min_size": {
               "default": 128,
               "title": "Min Size",
               "type": "integer"
            },
            "max_aspect_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "default": 5.0,
               "title": "Max Aspect Ratio"
            },
            "min_aspect_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "default": 0.2,
               "title": "Min Aspect Ratio"
            },
            "filter": {
               "default": false,
               "title": "Filter",
               "type": "boolean"
            }
         },
         "title": "IngestTaskFilterParamsSchema",
         "type": "object"
      },
      "IngestTaskFilterSchema": {
         "additionalProperties": false,
         "properties": {
            "content_type": {
               "$ref": "#/$defs/ContentTypeEnum",
               "default": "image"
            },
            "params": {
               "$ref": "#/$defs/IngestTaskFilterParamsSchema",
               "default": {
                  "min_size": 128,
                  "max_aspect_ratio": 5.0,
                  "min_aspect_ratio": 0.2,
                  "filter": false
               }
            }
         },
         "title": "IngestTaskFilterSchema",
         "type": "object"
      },
      "IngestTaskInfographicExtraction": {
         "additionalProperties": false,
         "properties": {
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskInfographicExtraction",
         "type": "object"
      },
      "IngestTaskSplitSchema": {
         "additionalProperties": false,
         "properties": {
            "tokenizer": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Tokenizer"
            },
            "chunk_size": {
               "default": 1024,
               "exclusiveMinimum": 0,
               "title": "Chunk Size",
               "type": "integer"
            },
            "chunk_overlap": {
               "default": 150,
               "minimum": 0,
               "title": "Chunk Overlap",
               "type": "integer"
            },
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskSplitSchema",
         "type": "object"
      },
      "IngestTaskStoreEmbedSchema": {
         "additionalProperties": false,
         "properties": {
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskStoreEmbedSchema",
         "type": "object"
      },
      "IngestTaskStoreSchema": {
         "additionalProperties": false,
         "properties": {
            "structured": {
               "default": true,
               "title": "Structured",
               "type": "boolean"
            },
            "images": {
               "default": false,
               "title": "Images",
               "type": "boolean"
            },
            "method": {
               "title": "Method",
               "type": "string"
            },
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "required": [
            "method"
         ],
         "title": "IngestTaskStoreSchema",
         "type": "object"
      },
      "IngestTaskTableExtraction": {
         "additionalProperties": false,
         "properties": {
            "params": {
               "additionalProperties": true,
               "title": "Params",
               "type": "object"
            }
         },
         "title": "IngestTaskTableExtraction",
         "type": "object"
      },
      "IngestTaskUDFSchema": {
         "additionalProperties": false,
         "properties": {
            "udf_function": {
               "title": "Udf Function",
               "type": "string"
            },
            "udf_function_name": {
               "title": "Udf Function Name",
               "type": "string"
            },
            "phase": {
               "anyOf": [
                  {
                     "maximum": 5,
                     "minimum": 1,
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Phase"
            },
            "run_before": {
               "default": false,
               "description": "Execute UDF before the target stage",
               "title": "Run Before",
               "type": "boolean"
            },
            "run_after": {
               "default": false,
               "description": "Execute UDF after the target stage",
               "title": "Run After",
               "type": "boolean"
            },
            "target_stage": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Name of the stage to target (e.g., 'image_dedup', 'text_extract')",
               "title": "Target Stage"
            }
         },
         "required": [
            "udf_function",
            "udf_function_name"
         ],
         "title": "IngestTaskUDFSchema",
         "type": "object"
      },
      "IngestTaskVdbUploadSchema": {
         "additionalProperties": false,
         "properties": {
            "bulk_ingest": {
               "default": false,
               "title": "Bulk Ingest",
               "type": "boolean"
            },
            "bulk_ingest_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Bulk Ingest Path"
            },
            "params": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Params"
            },
            "filter_errors": {
               "default": true,
               "title": "Filter Errors",
               "type": "boolean"
            }
         },
         "title": "IngestTaskVdbUploadSchema",
         "type": "object"
      },
      "TaskTypeEnum": {
         "description": "Enum for representing various task types.\n\nAttributes\n----------\nCAPTION : str\n    Represents a caption task.\nDEDUP : str\n    Represents a deduplication task.\nEMBED : str\n    Represents an embedding task.\nEXTRACT : str\n    Represents an extraction task.\nFILTER : str\n    Represents a filtering task.\nSPLIT : str\n    Represents a splitting task.\nSTORE : str\n    Represents a storing task.\nSTORE_EMBEDDING : str\n    Represents a task for storing embeddings.\nVDB_UPLOAD : str\n    Represents a task for uploading to a vector database.\nAUDIO_DATA_EXTRACT : str\n    Represents a task for extracting audio data.\nTABLE_DATA_EXTRACT : str\n    Represents a task for extracting table data.\nCHART_DATA_EXTRACT : str\n    Represents a task for extracting chart data.\nINFOGRAPHIC_DATA_EXTRACT : str\n    Represents a task for extracting infographic data.\nUDF : str\n    Represents a user-defined function task.",
         "enum": [
            "audio_data_extract",
            "caption",
            "chart_data_extract",
            "dedup",
            "embed",
            "extract",
            "filter",
            "infographic_data_extract",
            "split",
            "store_embedding",
            "store",
            "table_data_extract",
            "udf",
            "vdb_upload"
         ],
         "title": "TaskTypeEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "type",
      "task_properties"
   ]
}

Config:
  • extra: str = forbid

Fields:
Validators:
  • case_insensitive_task_type  »  type
  • check_task_properties_type  »  all fields

field raise_on_failure: bool = False#
Validated by:
  • check_task_properties_type

field task_properties: IngestTaskSplitSchema | IngestTaskExtractSchema | IngestTaskStoreEmbedSchema | IngestTaskStoreSchema | IngestTaskEmbedSchema | IngestTaskCaptionSchema | IngestTaskDedupSchema | IngestTaskFilterSchema | IngestTaskVdbUploadSchema | IngestTaskAudioExtraction | IngestTaskTableExtraction | IngestTaskChartExtraction | IngestTaskInfographicExtraction | IngestTaskUDFSchema [Required]#
Validated by:
  • check_task_properties_type

field type: TaskTypeEnum [Required]#
Validated by:
  • case_insensitive_task_type
  • check_task_properties_type

validator case_insensitive_task_type  »  type[source]#
validator check_task_properties_type  »  all fields[source]#
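
The validator names above suggest that type is matched against TaskTypeEnum without regard to case. The following is a minimal standard-library sketch of that idea, not the library's implementation: the enum values are copied from the JSON schema above, while normalize_task_type is a hypothetical helper and its lowercasing logic is an assumption inferred from the validator's name.

```python
from enum import Enum


class TaskTypeEnum(str, Enum):
    """Local mirror of the enum values listed in the JSON schema above."""

    AUDIO_DATA_EXTRACT = "audio_data_extract"
    CAPTION = "caption"
    CHART_DATA_EXTRACT = "chart_data_extract"
    DEDUP = "dedup"
    EMBED = "embed"
    EXTRACT = "extract"
    FILTER = "filter"
    INFOGRAPHIC_DATA_EXTRACT = "infographic_data_extract"
    SPLIT = "split"
    STORE_EMBEDDING = "store_embedding"
    STORE = "store"
    TABLE_DATA_EXTRACT = "table_data_extract"
    UDF = "udf"
    VDB_UPLOAD = "vdb_upload"


def normalize_task_type(value: str) -> TaskTypeEnum:
    """Hypothetical helper: lowercase the input, then look it up by value.

    Assumption: this is one way a case-insensitive match could work; the
    real validator body is not shown in this reference.
    """
    return TaskTypeEnum(value.lower())  # raises ValueError for unknown types


print(normalize_task_type("SPLIT").value)
```

Looking the member up by value keeps the set of accepted strings tied to the enum, so an unknown task type fails with a ValueError instead of passing through silently.
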
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskSplitSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskSplitSchema",
   "type": "object",
   "properties": {
      "tokenizer": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Tokenizer"
      },
      "chunk_size": {
         "default": 1024,
         "exclusiveMinimum": 0,
         "title": "Chunk Size",
         "type": "integer"
      },
      "chunk_overlap": {
         "default": 150,
         "minimum": 0,
         "title": "Chunk Overlap",
         "type": "integer"
      },
      "params": {
         "additionalProperties": true,
         "title": "Params",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
Validators:
  • check_chunk_overlap  »  chunk_overlap

field chunk_overlap: int = 150#
Constraints:
  • ge = 0

Validated by:
  • check_chunk_overlap

field chunk_size: int = 1024#
Constraints:
  • gt = 0

field params: dict [Optional]#
field tokenizer: str | None = None#
validator check_chunk_overlap  »  chunk_overlap[source]#
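
To make the documented defaults and bounds concrete, here is a minimal standard-library sketch that applies them to a plain dictionary. check_split_properties is a hypothetical stand-in, not part of nv_ingest_api. It covers only what the schema above states (defaults, gt = 0, ge = 0, no extra fields); the empty-dict default for params is an assumption, and the body of check_chunk_overlap is not shown in this reference, so it is not reproduced.

```python
from typing import Any, Dict

# Defaults copied from the IngestTaskSplitSchema JSON schema above.
SPLIT_DEFAULTS: Dict[str, Any] = {
    "tokenizer": None,
    "chunk_size": 1024,
    "chunk_overlap": 150,
}


def check_split_properties(props: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for validating split task properties."""
    allowed = set(SPLIT_DEFAULTS) | {"params"}
    unknown = set(props) - allowed
    if unknown:
        # Mirrors extra = forbid ("additionalProperties": false).
        raise ValueError(f"unexpected fields: {sorted(unknown)}")

    # Assumption: params falls back to an empty dict when omitted.
    merged = {**SPLIT_DEFAULTS, "params": {}, **props}
    if merged["chunk_size"] <= 0:  # exclusiveMinimum: 0 (gt = 0)
        raise ValueError("chunk_size must be greater than 0")
    if merged["chunk_overlap"] < 0:  # minimum: 0 (ge = 0)
        raise ValueError("chunk_overlap must be at least 0")
    return merged


print(check_split_properties({"chunk_size": 512}))
```
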
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskStoreEmbedSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskStoreEmbedSchema",
   "type": "object",
   "properties": {
      "params": {
         "additionalProperties": true,
         "title": "Params",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field params: dict [Optional]#
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskStoreSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskStoreSchema",
   "type": "object",
   "properties": {
      "structured": {
         "default": true,
         "title": "Structured",
         "type": "boolean"
      },
      "images": {
         "default": false,
         "title": "Images",
         "type": "boolean"
      },
      "method": {
         "title": "Method",
         "type": "string"
      },
      "params": {
         "additionalProperties": true,
         "title": "Params",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "method"
   ]
}

Config:
  • extra: str = forbid

Fields:
field images: bool = False#
field method: str [Required]#
field params: dict [Optional]#
field structured: bool = True#
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskTableExtraction[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskTableExtraction",
   "type": "object",
   "properties": {
      "params": {
         "additionalProperties": true,
         "title": "Params",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field params: dict [Optional]#
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskUDFSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskUDFSchema",
   "type": "object",
   "properties": {
      "udf_function": {
         "title": "Udf Function",
         "type": "string"
      },
      "udf_function_name": {
         "title": "Udf Function Name",
         "type": "string"
      },
      "phase": {
         "anyOf": [
            {
               "maximum": 5,
               "minimum": 1,
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Phase"
      },
      "run_before": {
         "default": false,
         "description": "Execute UDF before the target stage",
         "title": "Run Before",
         "type": "boolean"
      },
      "run_after": {
         "default": false,
         "description": "Execute UDF after the target stage",
         "title": "Run After",
         "type": "boolean"
      },
      "target_stage": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Name of the stage to target (e.g., 'image_dedup', 'text_extract')",
         "title": "Target Stage"
      }
   },
   "additionalProperties": false,
   "required": [
      "udf_function",
      "udf_function_name"
   ]
}

Config:
  • extra: str = forbid

Fields:
Validators:
  • validate_stage_targeting  »  all fields

field phase: int | None = None#
Constraints:
  • ge = 1
  • le = 5

Validated by:
  • validate_stage_targeting

field run_after: bool = False#

Execute UDF after the target stage

Validated by:
  • validate_stage_targeting

field run_before: bool = False#

Execute UDF before the target stage

Validated by:
  • validate_stage_targeting

field target_stage: str | None = None#

Name of the stage to target (e.g., ‘image_dedup’, ‘text_extract’)

Validated by:
  • validate_stage_targeting

field udf_function: str [Required]#
Validated by:
  • validate_stage_targeting

field udf_function_name: str [Required]#
Validated by:
  • validate_stage_targeting

validator validate_stage_targeting  »  all fields[source]#

Validate that stage targeting configuration is consistent
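
As an illustration of the fields above, the following standard-library sketch assembles a task_properties dictionary for a UDF task. build_udf_properties is a hypothetical helper and the argument values are placeholders. It enforces only the documented phase bounds; the consistency rules applied by validate_stage_targeting are not shown in this reference, so whether a given combination of phase, target_stage, run_before, and run_after is accepted remains an open assumption.

```python
from typing import Any, Dict, Optional


def build_udf_properties(
    udf_function: str,
    udf_function_name: str,
    phase: Optional[int] = None,
    target_stage: Optional[str] = None,
    run_before: bool = False,
    run_after: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles task_properties for a UDF task."""
    if phase is not None and not 1 <= phase <= 5:
        # Documented constraints on phase: ge = 1, le = 5.
        raise ValueError("phase must be between 1 and 5")
    return {
        "udf_function": udf_function,
        "udf_function_name": udf_function_name,
        "phase": phase,
        "run_before": run_before,
        "run_after": run_after,
        "target_stage": target_stage,
    }


# Placeholder values: the expected form of udf_function is not specified here.
props = build_udf_properties(
    udf_function="<UDF source or reference>",
    udf_function_name="my_udf",
    target_stage="text_extract",
    run_after=True,
)
print(props["target_stage"], props["run_after"])
```
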

pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.IngestTaskVdbUploadSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "IngestTaskVdbUploadSchema",
   "type": "object",
   "properties": {
      "bulk_ingest": {
         "default": false,
         "title": "Bulk Ingest",
         "type": "boolean"
      },
      "bulk_ingest_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Bulk Ingest Path"
      },
      "params": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Params"
      },
      "filter_errors": {
         "default": true,
         "title": "Filter Errors",
         "type": "boolean"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field bulk_ingest: bool = False#
field bulk_ingest_path: str | None = None#
field filter_errors: bool = True#
field params: dict | None = None#
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.JobPayloadSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "JobPayloadSchema",
   "type": "object",
   "properties": {
      "content": {
         "items": {
            "anyOf": [
               {
                  "type": "string"
               },
               {
                  "format": "binary",
                  "type": "string"
               }
            ]
         },
         "title": "Content",
         "type": "array"
      },
      "source_name": {
         "items": {
            "type": "string"
         },
         "title": "Source Name",
         "type": "array"
      },
      "source_id": {
         "items": {
            "anyOf": [
               {
                  "type": "string"
               },
               {
                  "type": "integer"
               }
            ]
         },
         "title": "Source Id",
         "type": "array"
      },
      "document_type": {
         "items": {
            "type": "string"
         },
         "title": "Document Type",
         "type": "array"
      }
   },
   "additionalProperties": false,
   "required": [
      "content",
      "source_name",
      "source_id",
      "document_type"
   ]
}

Config:
  • extra: str = forbid

Fields:
field content: List[str | bytes] [Required]#
field document_type: List[str] [Required]#
field source_id: List[str | int] [Required]#
field source_name: List[str] [Required]#
pydantic model nv_ingest_api.internal.schemas.meta.ingest_job_schema.TracingOptionsSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "TracingOptionsSchema",
   "type": "object",
   "properties": {
      "trace": {
         "default": false,
         "title": "Trace",
         "type": "boolean"
      },
      "ts_send": {
         "title": "Ts Send",
         "type": "integer"
      },
      "trace_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Trace Id"
      }
   },
   "additionalProperties": false,
   "required": [
      "ts_send"
   ]
}

Config:
  • extra: str = forbid

Fields:
field trace: bool = False#
field trace_id: str | None = None#
field ts_send: int [Required]#
nv_ingest_api.internal.schemas.meta.ingest_job_schema.validate_ingest_job(job_data: Dict[str, Any]) → IngestJobSchema[source]#

Validates a dictionary representing an ingest job against IngestJobSchema.

Parameters:
  • job_data: Dictionary representing an ingest job.

Returns:
  • IngestJobSchema: The validated ingest job.

Raises:
  • ValidationError: If the input data does not conform to IngestJobSchema.
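
The sketch below shows what a job_data dictionary might look like, based on the IngestJobSchema, JobPayloadSchema, and TracingOptionsSchema definitions in this module. All values are placeholders (the unit of ts_send in particular is not stated here), and the structural check is a hypothetical pre-flight step, not part of the library. It uses only the standard library, so the call to validate_ingest_job appears as a comment only.

```python
import json
from typing import Any, Dict

job_data: Dict[str, Any] = {
    "job_payload": {
        # All four payload fields are required lists per JobPayloadSchema.
        "content": ["<document content>"],
        "source_name": ["example.pdf"],
        "source_id": ["example.pdf"],
        "document_type": ["pdf"],
    },
    "job_id": "job-0001",  # string or integer
    "tasks": [
        {
            "type": "split",
            "task_properties": {"chunk_size": 512, "chunk_overlap": 50},
        }
    ],
    # ts_send is required when tracing_options is given; value is a placeholder.
    "tracing_options": {"trace": True, "ts_send": 1700000000000000000},
}

# Hypothetical pre-flight check of the payload's required list fields.
payload = job_data["job_payload"]
required = ("content", "source_name", "source_id", "document_type")
missing = [key for key in required if key not in payload]
assert not missing, f"missing payload fields: {missing}"
assert all(isinstance(payload[key], list) for key in required)

# With nv_ingest_api installed, the dictionary would then be passed to
# validate_ingest_job(job_data), which returns an IngestJobSchema or
# raises ValidationError.
print(json.dumps(job_data, indent=2))
```
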

nv_ingest_api.internal.schemas.meta.metadata_schema module#

pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.AudioMetadataSchema[source]#

Bases: BaseModelNoExt

The schema for extracted audio content.

Show JSON schema
{
   "title": "AudioMetadataSchema",
   "description": "The schema for extracted audio content.",
   "type": "object",
   "properties": {
      "audio_transcript": {
         "default": "",
         "title": "Audio Transcript",
         "type": "string"
      },
      "audio_type": {
         "default": "",
         "title": "Audio Type",
         "type": "string"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field audio_transcript: str = ''#

A transcript of the audio content.

field audio_type: str = ''#

The type or format of the audio, such as mp3, wav.

field custom_content: Dict[str, Any] | None = None#
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.ChartMetadataSchema[source]#

Bases: BaseModelNoExt

The schema for extracted chart content.

Show JSON schema
{
   "title": "ChartMetadataSchema",
   "description": "The schema for extracted chart content.",
   "type": "object",
   "properties": {
      "caption": {
         "default": "",
         "title": "Caption",
         "type": "string"
      },
      "table_format": {
         "$ref": "#/$defs/TableFormatEnum"
      },
      "table_content": {
         "default": "",
         "title": "Table Content",
         "type": "string"
      },
      "table_content_format": {
         "anyOf": [
            {
               "$ref": "#/$defs/TableFormatEnum"
            },
            {
               "type": "string"
            }
         ],
         "default": "",
         "title": "Table Content Format"
      },
      "table_location": {
         "default": [
            0,
            0,
            0,
            0
         ],
         "items": {},
         "title": "Table Location",
         "type": "array"
      },
      "table_location_max_dimensions": {
         "default": [
            0,
            0
         ],
         "items": {},
         "title": "Table Location Max Dimensions",
         "type": "array"
      },
      "uploaded_image_uri": {
         "default": "",
         "title": "Uploaded Image Uri",
         "type": "string"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "TableFormatEnum": {
         "description": "Enum for representing table formats.\n\nAttributes\n----------\nHTML : str\n    Represents HTML table format.\nIMAGE : str\n    Represents image table format.\nLATEX : str\n    Represents LaTeX table format.\nMARKDOWN : str\n    Represents Markdown table format.\nPSEUDO_MARKDOWN : str\n    Represents pseudo Markdown table format.\nSIMPLE : str\n    Represents simple table format.",
         "enum": [
            "html",
            "image",
            "latex",
            "markdown",
            "pseudo_markdown",
            "simple"
         ],
         "title": "TableFormatEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "table_format"
   ]
}

Config:
  • extra: str = forbid

Fields:
field caption: str = ''#

The caption for the chart.

field custom_content: Dict[str, Any] | None = None#
field table_content: str = ''#

Extracted text content, formatted according to chart_metadata.table_format.

field table_content_format: TableFormatEnum | str = ''#
field table_format: TableFormatEnum [Required]#

The format of the table: either structured (dataframe / lists of rows and columns), or serialized as markdown, html, latex, or simple (cells separated by spaces).

field table_location: tuple = (0, 0, 0, 0)#

The bounding box of the chart, in the format (x1,y1,x2,y2).

field table_location_max_dimensions: tuple = (0, 0)#

The maximum dimensions of the bounding box of the chart, in the format (x_max,y_max).

field uploaded_image_uri: str = ''#

A mirror of source_metadata.source_location.

pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.ContentHierarchySchema[source]#

Bases: BaseModelNoExt

Schema for the extracted content hierarchy.

Show JSON schema
{
   "title": "ContentHierarchySchema",
   "description": "Schema for the extracted content hierarchy.",
   "type": "object",
   "properties": {
      "page_count": {
         "default": -1,
         "title": "Page Count",
         "type": "integer"
      },
      "page": {
         "default": -1,
         "title": "Page",
         "type": "integer"
      },
      "block": {
         "default": -1,
         "title": "Block",
         "type": "integer"
      },
      "line": {
         "default": -1,
         "title": "Line",
         "type": "integer"
      },
      "span": {
         "default": -1,
         "title": "Span",
         "type": "integer"
      },
      "nearby_objects": {
         "$ref": "#/$defs/NearbyObjectsSchema",
         "default": {
            "text": {
               "bbox": [],
               "content": [],
               "type": []
            },
            "images": {
               "bbox": [],
               "content": [],
               "type": []
            },
            "structured": {
               "bbox": [],
               "content": [],
               "type": []
            }
         }
      }
   },
   "$defs": {
      "NearbyObjectsSchema": {
         "additionalProperties": false,
         "description": "Schema to hold types of related extracted objects.",
         "properties": {
            "text": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            },
            "images": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            },
            "structured": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            }
         },
         "title": "NearbyObjectsSchema",
         "type": "object"
      },
      "NearbyObjectsSubSchema": {
         "additionalProperties": false,
         "description": "Schema to hold related extracted object.",
         "properties": {
            "content": {
               "items": {
                  "type": "string"
               },
               "title": "Content",
               "type": "array"
            },
            "bbox": {
               "items": {
                  "items": {},
                  "type": "array"
               },
               "title": "Bbox",
               "type": "array"
            },
            "type": {
               "items": {
                  "type": "string"
               },
               "title": "Type",
               "type": "array"
            }
         },
         "title": "NearbyObjectsSubSchema",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field block: int = -1#
field line: int = -1#
field nearby_objects: NearbyObjectsSchema = NearbyObjectsSchema(text=NearbyObjectsSubSchema(content=[], bbox=[], type=[]), images=NearbyObjectsSubSchema(content=[], bbox=[], type=[]), structured=NearbyObjectsSubSchema(content=[], bbox=[], type=[]))#
field page: int = -1#
field page_count: int = -1#
field span: int = -1#
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.ContentMetadataSchema[source]#

Bases: BaseModelNoExt

Data extracted from a source; generally Text or Image.

Show JSON schema
{
   "title": "ContentMetadataSchema",
   "description": "Data extracted from a source; generally Text or Image.",
   "type": "object",
   "properties": {
      "type": {
         "$ref": "#/$defs/ContentTypeEnum"
      },
      "description": {
         "default": "",
         "title": "Description",
         "type": "string"
      },
      "page_number": {
         "default": -1,
         "title": "Page Number",
         "type": "integer"
      },
      "hierarchy": {
         "$ref": "#/$defs/ContentHierarchySchema",
         "default": {
            "page_count": -1,
            "page": -1,
            "block": -1,
            "line": -1,
            "span": -1,
            "nearby_objects": {
               "images": {
                  "bbox": [],
                  "content": [],
                  "type": []
               },
               "structured": {
                  "bbox": [],
                  "content": [],
                  "type": []
               },
               "text": {
                  "bbox": [],
                  "content": [],
                  "type": []
               }
            }
         }
      },
      "subtype": {
         "anyOf": [
            {
               "$ref": "#/$defs/ContentTypeEnum"
            },
            {
               "type": "string"
            }
         ],
         "default": "",
         "title": "Subtype"
      },
      "start_time": {
         "default": -1,
         "title": "Start Time",
         "type": "integer"
      },
      "end_time": {
         "default": -1,
         "title": "End Time",
         "type": "integer"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "ContentHierarchySchema": {
         "additionalProperties": false,
         "description": "Schema for the extracted content hierarchy.",
         "properties": {
            "page_count": {
               "default": -1,
               "title": "Page Count",
               "type": "integer"
            },
            "page": {
               "default": -1,
               "title": "Page",
               "type": "integer"
            },
            "block": {
               "default": -1,
               "title": "Block",
               "type": "integer"
            },
            "line": {
               "default": -1,
               "title": "Line",
               "type": "integer"
            },
            "span": {
               "default": -1,
               "title": "Span",
               "type": "integer"
            },
            "nearby_objects": {
               "$ref": "#/$defs/NearbyObjectsSchema",
               "default": {
                  "text": {
                     "bbox": [],
                     "content": [],
                     "type": []
                  },
                  "images": {
                     "bbox": [],
                     "content": [],
                     "type": []
                  },
                  "structured": {
                     "bbox": [],
                     "content": [],
                     "type": []
                  }
               }
            }
         },
         "title": "ContentHierarchySchema",
         "type": "object"
      },
      "ContentTypeEnum": {
         "description": "Enum for representing various content types.\n\nNote: Content type declares the broad category of the content, such as text, image, audio, etc.\nThis is not equivalent to the Document type, which is a specific file format.\n\nAttributes\n----------\nAUDIO : str\n    Represents audio content.\nEMBEDDING : str\n    Represents embedding content.\nIMAGE : str\n    Represents image content.\nINFO_MSG : str\n    Represents an informational message.\nPAGE_IMAGE : str\n    Represents a full-page image rendered from a document.\nSTRUCTURED : str\n    Represents structured content.\nTEXT : str\n    Represents text content.\nUNSTRUCTURED : str\n    Represents unstructured content.\nVIDEO : str\n    Represents video content.",
         "enum": [
            "audio",
            "chart",
            "embedding",
            "image",
            "infographic",
            "info_message",
            "none",
            "page_image",
            "structured",
            "table",
            "text",
            "unknown",
            "video"
         ],
         "title": "ContentTypeEnum",
         "type": "string"
      },
      "NearbyObjectsSchema": {
         "additionalProperties": false,
         "description": "Schema to hold types of related extracted objects.",
         "properties": {
            "text": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            },
            "images": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            },
            "structured": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            }
         },
         "title": "NearbyObjectsSchema",
         "type": "object"
      },
      "NearbyObjectsSubSchema": {
         "additionalProperties": false,
         "description": "Schema to hold related extracted object.",
         "properties": {
            "content": {
               "items": {
                  "type": "string"
               },
               "title": "Content",
               "type": "array"
            },
            "bbox": {
               "items": {
                  "items": {},
                  "type": "array"
               },
               "title": "Bbox",
               "type": "array"
            },
            "type": {
               "items": {
                  "type": "string"
               },
               "title": "Type",
               "type": "array"
            }
         },
         "title": "NearbyObjectsSubSchema",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "type"
   ]
}

Config:
  • extra: str = forbid

Fields:
field custom_content: Dict[str, Any] | None = None#
field description: str = ''#

A text description of the content object.

field end_time: int = -1#

The timestamp of the end of a piece of audio content.

field hierarchy: ContentHierarchySchema = ContentHierarchySchema(page_count=-1, page=-1, block=-1, line=-1, span=-1, nearby_objects=NearbyObjectsSchema(text=NearbyObjectsSubSchema(content=[], bbox=[], type=[]), images=NearbyObjectsSubSchema(content=[], bbox=[], type=[]), structured=NearbyObjectsSubSchema(content=[], bbox=[], type=[])))#

The location or order of the content within the source.

field page_number: int = -1#

The page number of the content in the source.

field start_time: int = -1#

The timestamp of the start of a piece of audio content.

field subtype: ContentTypeEnum | str = ''#

The type of the content for structured data types, such as table or chart.

field type: ContentTypeEnum [Required]#

The type of the content, for example text, image, structured, table, or chart.

pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.ErrorMetadataSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "ErrorMetadataSchema",
   "type": "object",
   "properties": {
      "task": {
         "$ref": "#/$defs/TaskTypeEnum"
      },
      "status": {
         "$ref": "#/$defs/StatusEnum"
      },
      "source_id": {
         "default": "",
         "title": "Source Id",
         "type": "string"
      },
      "error_msg": {
         "title": "Error Msg",
         "type": "string"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "StatusEnum": {
         "description": "Enum for representing status messages.\n\nAttributes\n----------\nERROR : str\n    Represents an error status.\nSUCCESS : str\n    Represents a success status.",
         "enum": [
            "error",
            "success"
         ],
         "title": "StatusEnum",
         "type": "string"
      },
      "TaskTypeEnum": {
         "description": "Enum for representing various task types.\n\nAttributes\n----------\nCAPTION : str\n    Represents a caption task.\nDEDUP : str\n    Represents a deduplication task.\nEMBED : str\n    Represents an embedding task.\nEXTRACT : str\n    Represents an extraction task.\nFILTER : str\n    Represents a filtering task.\nSPLIT : str\n    Represents a splitting task.\nSTORE : str\n    Represents a storing task.\nSTORE_EMBEDDING : str\n    Represents a task for storing embeddings.\nVDB_UPLOAD : str\n    Represents a task for uploading to a vector database.\nAUDIO_DATA_EXTRACT : str\n    Represents a task for extracting audio data.\nTABLE_DATA_EXTRACT : str\n    Represents a task for extracting table data.\nCHART_DATA_EXTRACT : str\n    Represents a task for extracting chart data.\nINFOGRAPHIC_DATA_EXTRACT : str\n    Represents a task for extracting infographic data.\nUDF : str\n    Represents a user-defined function task.",
         "enum": [
            "audio_data_extract",
            "caption",
            "chart_data_extract",
            "dedup",
            "embed",
            "extract",
            "filter",
            "infographic_data_extract",
            "split",
            "store_embedding",
            "store",
            "table_data_extract",
            "udf",
            "vdb_upload"
         ],
         "title": "TaskTypeEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "task",
      "status",
      "error_msg"
   ]
}

Config:
  • extra: str = forbid

Fields:
field custom_content: Dict[str, Any] | None = None#
field error_msg: str [Required]#
field source_id: str = ''#
field status: StatusEnum [Required]#
field task: TaskTypeEnum [Required]#
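
For illustration, the sketch below assembles a dictionary with the shape of ErrorMetadataSchema using only the standard library. make_error_metadata is a hypothetical helper, not part of nv_ingest_api. The accepted task and status strings are copied from the TaskTypeEnum and StatusEnum definitions above, and the example message and source identifier are placeholders.

```python
from typing import Any, Dict, Optional

# Values copied from the StatusEnum and TaskTypeEnum definitions above.
STATUS_VALUES = frozenset({"error", "success"})
TASK_VALUES = frozenset({
    "audio_data_extract", "caption", "chart_data_extract", "dedup", "embed",
    "extract", "filter", "infographic_data_extract", "split",
    "store_embedding", "store", "table_data_extract", "udf", "vdb_upload",
})


def make_error_metadata(
    task: str,
    status: str,
    error_msg: str,
    source_id: str = "",
    custom_content: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that builds an ErrorMetadataSchema-shaped dict."""
    if task not in TASK_VALUES:
        raise ValueError(f"unknown task type: {task!r}")
    if status not in STATUS_VALUES:
        raise ValueError(f"unknown status: {status!r}")
    return {
        "task": task,
        "status": status,
        "source_id": source_id,
        "error_msg": error_msg,
        "custom_content": custom_content,
    }


record = make_error_metadata(
    "extract", "error", "failed to parse page 3", source_id="example.pdf"
)
print(record["task"], record["status"])
```
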
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.ImageMetadataSchema[source]#

Bases: BaseModelNoExt

The schema for the extracted image content.

Show JSON schema
{
   "title": "ImageMetadataSchema",
   "description": "The schema for the extracted image content.",
   "type": "object",
   "properties": {
      "image_type": {
         "anyOf": [
            {
               "$ref": "#/$defs/DocumentTypeEnum"
            },
            {
               "type": "string"
            }
         ],
         "title": "Image Type"
      },
      "structured_image_type": {
         "$ref": "#/$defs/ContentTypeEnum",
         "default": "none"
      },
      "caption": {
         "default": "",
         "title": "Caption",
         "type": "string"
      },
      "text": {
         "default": "",
         "title": "Text",
         "type": "string"
      },
      "image_location": {
         "default": [
            0,
            0,
            0,
            0
         ],
         "items": {},
         "title": "Image Location",
         "type": "array"
      },
      "image_location_max_dimensions": {
         "default": [
            0,
            0
         ],
         "items": {},
         "title": "Image Location Max Dimensions",
         "type": "array"
      },
      "uploaded_image_url": {
         "default": "",
         "title": "Uploaded Image Url",
         "type": "string"
      },
      "width": {
         "default": 0,
         "title": "Width",
         "type": "integer"
      },
      "height": {
         "default": 0,
         "title": "Height",
         "type": "integer"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "ContentTypeEnum": {
         "description": "Enum for representing various content types.\n\nNote: Content type declares the broad category of the content, such as text, image, audio, etc.\nThis is not equivalent to the Document type, which is a specific file format.\n\nAttributes\n----------\nAUDIO : str\n    Represents audio content.\nEMBEDDING : str\n    Represents embedding content.\nIMAGE : str\n    Represents image content.\nINFO_MSG : str\n    Represents an informational message.\nPAGE_IMAGE : str\n    Represents a full-page image rendered from a document.\nSTRUCTURED : str\n    Represents structured content.\nTEXT : str\n    Represents text content.\nUNSTRUCTURED : str\n    Represents unstructured content.\nVIDEO : str\n    Represents video content.",
         "enum": [
            "audio",
            "chart",
            "embedding",
            "image",
            "infographic",
            "info_message",
            "none",
            "page_image",
            "structured",
            "table",
            "text",
            "unknown",
            "video"
         ],
         "title": "ContentTypeEnum",
         "type": "string"
      },
      "DocumentTypeEnum": {
         "description": "Enum for representing various document file types.\n\nNote: Document type refers to the specific file format of the content, such as PDF, DOCX, etc.\nThis is not equivalent to the Content type, which is a broad category of the content.\n\nAttributes\n----------\nBMP: str\n    BMP image format.\nDOCX: str\n    Microsoft Word document format.\nHTML: str\n    HTML document.\nJPEG: str\n    JPEG image format.\nPDF: str\n    PDF document format.\nPNG: str\n    PNG image format.\nPPTX: str\n    PowerPoint presentation format.\nSVG: str\n    SVG image format.\nTIFF: str\n    TIFF image format.\nTXT: str\n    Plain text file.\nMP3: str\n    MP3 audio format.\nWAV: str\n    WAV audio format.",
         "enum": [
            "bmp",
            "docx",
            "html",
            "jpeg",
            "pdf",
            "png",
            "pptx",
            "svg",
            "tiff",
            "text",
            "text",
            "mp3",
            "wav",
            "unknown"
         ],
         "title": "DocumentTypeEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "image_type"
   ]
}

Config:
  • extra: str = forbid

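Fields:
  • caption (str)
  • custom_content (Dict[str, Any] | None)
  • height (int)
  • image_location (tuple)
  • image_location_max_dimensions (tuple)
  • image_type (DocumentTypeEnum | str)
  • structured_image_type (ContentTypeEnum)
  • text (str)
  • uploaded_image_url (str)
  • width (int)

Validators:
  • clamp_non_negative » width, height
  • validate_image_type » image_type
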
field caption: str = ''#

A caption or subheading associated with the image.

field custom_content: Dict[str, Any] | None = None#
field height: int = 0#

The height of the image.

Validated by:
  • clamp_non_negative

field image_location: tuple = (0, 0, 0, 0)#

The bounding box of the image, in the format (x1,y1,x2,y2).

field image_location_max_dimensions: tuple = (0, 0)#

The maximum dimensions of the bounding box of the image, in the format (x_max,y_max).

field image_type: DocumentTypeEnum | str [Required]#

The type of the image, such as structured, natural, hybrid, and others.

Validated by:
  • validate_image_type

field structured_image_type: ContentTypeEnum = ContentTypeEnum.NONE#

The type of the content for structured data types, such as bar chart, pie chart, and others.

field text: str = ''#

Extracted text from a structured chart.

field uploaded_image_url: str = ''#

A mirror of source_metadata.source_location.

field width: int = 0#

The width of the image.

Validated by:
  • clamp_non_negative

validator clamp_non_negative  »  width, height[source]#
validator validate_image_type  »  image_type[source]#
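
The constraints that the ImageMetadataSchema JSON schema expresses (one required key, no additional properties, defaults for everything else) can be illustrated without the library. The following is a minimal stdlib-only sketch, not the model's implementation: `check_image_metadata` is a hypothetical helper, and replacing negative sizes with 0 is an assumption inferred only from the validator name `clamp_non_negative`.

```python
from typing import Any, Dict

# Defaults copied from the ImageMetadataSchema JSON schema.
DEFAULTS: Dict[str, Any] = {
    "structured_image_type": "none",
    "caption": "",
    "text": "",
    "image_location": (0, 0, 0, 0),
    "image_location_max_dimensions": (0, 0),
    "uploaded_image_url": "",
    "width": 0,
    "height": 0,
    "custom_content": None,
}
REQUIRED = {"image_type"}


def check_image_metadata(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: apply defaults, reject missing or unknown keys."""
    missing = REQUIRED - set(payload)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    extra = set(payload) - REQUIRED - set(DEFAULTS)
    if extra:
        # Mirrors "additionalProperties": false (extra = forbid).
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    result = {**DEFAULTS, **payload}
    # ASSUMPTION: negative sizes become 0, guessed from the validator name.
    for key in ("width", "height"):
        result[key] = max(0, int(result[key]))
    return result


record = check_image_metadata({"image_type": "png", "width": 640, "height": -5})
print(record["width"], record["height"])
```

With the real model, the same checks would presumably surface as a pydantic validation error rather than a plain `ValueError`.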
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.InfoMessageMetadataSchema[source]#

Bases: BaseModelNoExt

Show JSON schema
{
   "title": "InfoMessageMetadataSchema",
   "type": "object",
   "properties": {
      "task": {
         "$ref": "#/$defs/TaskTypeEnum"
      },
      "status": {
         "$ref": "#/$defs/StatusEnum"
      },
      "message": {
         "title": "Message",
         "type": "string"
      },
      "filter": {
         "title": "Filter",
         "type": "boolean"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "StatusEnum": {
         "description": "Enum for representing status messages.\n\nAttributes\n----------\nERROR : str\n    Represents an error status.\nSUCCESS : str\n    Represents a success status.",
         "enum": [
            "error",
            "success"
         ],
         "title": "StatusEnum",
         "type": "string"
      },
      "TaskTypeEnum": {
         "description": "Enum for representing various task types.\n\nAttributes\n----------\nCAPTION : str\n    Represents a caption task.\nDEDUP : str\n    Represents a deduplication task.\nEMBED : str\n    Represents an embedding task.\nEXTRACT : str\n    Represents an extraction task.\nFILTER : str\n    Represents a filtering task.\nSPLIT : str\n    Represents a splitting task.\nSTORE : str\n    Represents a storing task.\nSTORE_EMBEDDING : str\n    Represents a task for storing embeddings.\nVDB_UPLOAD : str\n    Represents a task for uploading to a vector database.\nAUDIO_DATA_EXTRACT : str\n    Represents a task for extracting audio data.\nTABLE_DATA_EXTRACT : str\n    Represents a task for extracting table data.\nCHART_DATA_EXTRACT : str\n    Represents a task for extracting chart data.\nINFOGRAPHIC_DATA_EXTRACT : str\n    Represents a task for extracting infographic data.\nUDF : str\n    Represents a user-defined function task.",
         "enum": [
            "audio_data_extract",
            "caption",
            "chart_data_extract",
            "dedup",
            "embed",
            "extract",
            "filter",
            "infographic_data_extract",
            "split",
            "store_embedding",
            "store",
            "table_data_extract",
            "udf",
            "vdb_upload"
         ],
         "title": "TaskTypeEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "task",
      "status",
      "message",
      "filter"
   ]
}

Config:
  • extra: str = forbid

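Fields:
  • custom_content (Dict[str, Any] | None)
  • filter (bool)
  • message (str)
  • status (StatusEnum)
  • task (TaskTypeEnum)
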
field custom_content: Dict[str, Any] | None = None#
field filter: bool [Required]#
field message: str [Required]#
field status: StatusEnum [Required]#
field task: TaskTypeEnum [Required]#
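
As an illustration of what a conforming InfoMessageMetadataSchema payload looks like, here is a small stdlib-only sketch that checks a plain dictionary against the constraints in the JSON schema above. `check_info_message` is a hypothetical helper written for this example; it checks types strictly and does not reproduce any coercion the pydantic model may perform.

```python
import json
from typing import Any, Dict

# Enum values copied from the StatusEnum and TaskTypeEnum definitions above.
STATUS_VALUES = {"error", "success"}
TASK_VALUES = {
    "audio_data_extract", "caption", "chart_data_extract", "dedup", "embed",
    "extract", "filter", "infographic_data_extract", "split",
    "store_embedding", "store", "table_data_extract", "udf", "vdb_upload",
}
REQUIRED = ("task", "status", "message", "filter")


def check_info_message(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: check a payload against the schema's constraints."""
    missing = [key for key in REQUIRED if key not in payload]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    extra = set(payload) - set(REQUIRED) - {"custom_content"}
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    if payload["task"] not in TASK_VALUES:
        raise ValueError(f"unknown task: {payload['task']!r}")
    if payload["status"] not in STATUS_VALUES:
        raise ValueError(f"unknown status: {payload['status']!r}")
    if not isinstance(payload["message"], str):
        raise ValueError("message must be a string")
    if not isinstance(payload["filter"], bool):
        raise ValueError("filter must be a boolean")
    # custom_content is the only optional key and defaults to None.
    return {"custom_content": None, **payload}


info = check_info_message(
    {"task": "dedup", "status": "success", "message": "duplicate image dropped", "filter": True}
)
print(json.dumps(info, indent=2))
```
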
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.MetadataSchema[source]#

Bases: BaseModelNoExt

The primary container schema for extraction results.
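
Every nested section of this container is optional and defaults to null, but a section that is present must carry its own required keys. The sketch below illustrates that rule on a plain dictionary, using only the standard library. `missing_required` and the sample values are hypothetical, and the required-key table covers only the sections whose `required` lists appear in the JSON schema that follows.

```python
from typing import Any, Dict, List

# Required keys per nested section, copied from the "required" lists
# in the JSON schema for MetadataSchema.
REQUIRED_BY_SECTION = {
    "source_metadata": ("source_name", "source_id", "source_type"),
    "content_metadata": ("type",),
    "image_metadata": ("image_type",),
    "chart_metadata": ("table_format",),
    "error_metadata": ("task", "status", "error_msg"),
    "info_message_metadata": ("task", "status", "message", "filter"),
}


def missing_required(record: Dict[str, Any]) -> List[str]:
    """Return dotted paths of required keys absent from populated sections."""
    problems: List[str] = []
    for section, keys in REQUIRED_BY_SECTION.items():
        value = record.get(section)
        if value is None:  # an absent or null section is allowed
            continue
        problems.extend(f"{section}.{key}" for key in keys if key not in value)
    return problems


# Hypothetical record for a text chunk extracted from a PDF.
record = {
    "content": "Quarterly revenue grew 12%.",
    "source_metadata": {
        "source_name": "report.pdf",
        "source_id": "report.pdf",
        "source_type": "pdf",
    },
    "content_metadata": {"type": "text", "page_number": 3},
}
print(missing_required(record))  # → []
```

A record with an empty `image_metadata` section would instead report `image_metadata.image_type` as missing.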

Show JSON schema
{
   "title": "MetadataSchema",
   "description": "The primary container schema for extraction results.",
   "type": "object",
   "properties": {
      "content": {
         "default": "",
         "title": "Content",
         "type": "string"
      },
      "content_url": {
         "default": "",
         "title": "Content Url",
         "type": "string"
      },
      "embedding": {
         "anyOf": [
            {
               "items": {
                  "type": "number"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Embedding"
      },
      "source_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/SourceMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "content_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/ContentMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "audio_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/AudioMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "text_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/TextMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "image_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/ImageMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "table_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/TableMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "chart_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/ChartMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "error_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/ErrorMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "info_message_metadata": {
         "anyOf": [
            {
               "$ref": "#/$defs/InfoMessageMetadataSchema"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "debug_metadata": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Debug Metadata"
      },
      "raise_on_failure": {
         "default": false,
         "title": "Raise On Failure",
         "type": "boolean"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "AccessLevelEnum": {
         "description": "Note\n----\nThis is for future use, and currently has no functional use case.\n\nEnum for representing different access levels.\n\nAttributes\n----------\nLEVEL_1 : int\n    Represents access level 1.\nLEVEL_2 : int\n    Represents access level 2.\nLEVEL_3 : int\n    Represents access level 3.",
         "enum": [
            -1,
            1,
            2,
            3
         ],
         "title": "AccessLevelEnum",
         "type": "integer"
      },
      "AudioMetadataSchema": {
         "additionalProperties": false,
         "description": "The schema for extracted audio content.",
         "properties": {
            "audio_transcript": {
               "default": "",
               "title": "Audio Transcript",
               "type": "string"
            },
            "audio_type": {
               "default": "",
               "title": "Audio Type",
               "type": "string"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "title": "AudioMetadataSchema",
         "type": "object"
      },
      "ChartMetadataSchema": {
         "additionalProperties": false,
         "description": "The schema for extracted chart content.",
         "properties": {
            "caption": {
               "default": "",
               "title": "Caption",
               "type": "string"
            },
            "table_format": {
               "$ref": "#/$defs/TableFormatEnum"
            },
            "table_content": {
               "default": "",
               "title": "Table Content",
               "type": "string"
            },
            "table_content_format": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/TableFormatEnum"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "",
               "title": "Table Content Format"
            },
            "table_location": {
               "default": [
                  0,
                  0,
                  0,
                  0
               ],
               "items": {},
               "title": "Table Location",
               "type": "array"
            },
            "table_location_max_dimensions": {
               "default": [
                  0,
                  0
               ],
               "items": {},
               "title": "Table Location Max Dimensions",
               "type": "array"
            },
            "uploaded_image_uri": {
               "default": "",
               "title": "Uploaded Image Uri",
               "type": "string"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "required": [
            "table_format"
         ],
         "title": "ChartMetadataSchema",
         "type": "object"
      },
      "ContentHierarchySchema": {
         "additionalProperties": false,
         "description": "Schema for the extracted content hierarchy.",
         "properties": {
            "page_count": {
               "default": -1,
               "title": "Page Count",
               "type": "integer"
            },
            "page": {
               "default": -1,
               "title": "Page",
               "type": "integer"
            },
            "block": {
               "default": -1,
               "title": "Block",
               "type": "integer"
            },
            "line": {
               "default": -1,
               "title": "Line",
               "type": "integer"
            },
            "span": {
               "default": -1,
               "title": "Span",
               "type": "integer"
            },
            "nearby_objects": {
               "$ref": "#/$defs/NearbyObjectsSchema",
               "default": {
                  "text": {
                     "bbox": [],
                     "content": [],
                     "type": []
                  },
                  "images": {
                     "bbox": [],
                     "content": [],
                     "type": []
                  },
                  "structured": {
                     "bbox": [],
                     "content": [],
                     "type": []
                  }
               }
            }
         },
         "title": "ContentHierarchySchema",
         "type": "object"
      },
      "ContentMetadataSchema": {
         "additionalProperties": false,
         "description": "Data extracted from a source; generally Text or Image.",
         "properties": {
            "type": {
               "$ref": "#/$defs/ContentTypeEnum"
            },
            "description": {
               "default": "",
               "title": "Description",
               "type": "string"
            },
            "page_number": {
               "default": -1,
               "title": "Page Number",
               "type": "integer"
            },
            "hierarchy": {
               "$ref": "#/$defs/ContentHierarchySchema",
               "default": {
                  "page_count": -1,
                  "page": -1,
                  "block": -1,
                  "line": -1,
                  "span": -1,
                  "nearby_objects": {
                     "images": {
                        "bbox": [],
                        "content": [],
                        "type": []
                     },
                     "structured": {
                        "bbox": [],
                        "content": [],
                        "type": []
                     },
                     "text": {
                        "bbox": [],
                        "content": [],
                        "type": []
                     }
                  }
               }
            },
            "subtype": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/ContentTypeEnum"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "",
               "title": "Subtype"
            },
            "start_time": {
               "default": -1,
               "title": "Start Time",
               "type": "integer"
            },
            "end_time": {
               "default": -1,
               "title": "End Time",
               "type": "integer"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "required": [
            "type"
         ],
         "title": "ContentMetadataSchema",
         "type": "object"
      },
      "ContentTypeEnum": {
         "description": "Enum for representing various content types.\n\nNote: Content type declares the broad category of the content, such as text, image, audio, etc.\nThis is not equivalent to the Document type, which is a specific file format.\n\nAttributes\n----------\nAUDIO : str\n    Represents audio content.\nEMBEDDING : str\n    Represents embedding content.\nIMAGE : str\n    Represents image content.\nINFO_MSG : str\n    Represents an informational message.\nPAGE_IMAGE : str\n    Represents a full-page image rendered from a document.\nSTRUCTURED : str\n    Represents structured content.\nTEXT : str\n    Represents text content.\nUNSTRUCTURED : str\n    Represents unstructured content.\nVIDEO : str\n    Represents video content.",
         "enum": [
            "audio",
            "chart",
            "embedding",
            "image",
            "infographic",
            "info_message",
            "none",
            "page_image",
            "structured",
            "table",
            "text",
            "unknown",
            "video"
         ],
         "title": "ContentTypeEnum",
         "type": "string"
      },
      "DocumentTypeEnum": {
         "description": "Enum for representing various document file types.\n\nNote: Document type refers to the specific file format of the content, such as PDF, DOCX, etc.\nThis is not equivalent to the Content type, which is a broad category of the content.\n\nAttributes\n----------\nBMP: str\n    BMP image format.\nDOCX: str\n    Microsoft Word document format.\nHTML: str\n    HTML document.\nJPEG: str\n    JPEG image format.\nPDF: str\n    PDF document format.\nPNG: str\n    PNG image format.\nPPTX: str\n    PowerPoint presentation format.\nSVG: str\n    SVG image format.\nTIFF: str\n    TIFF image format.\nTXT: str\n    Plain text file.\nMP3: str\n    MP3 audio format.\nWAV: str\n    WAV audio format.",
         "enum": [
            "bmp",
            "docx",
            "html",
            "jpeg",
            "pdf",
            "png",
            "pptx",
            "svg",
            "tiff",
            "text",
            "text",
            "mp3",
            "wav",
            "unknown"
         ],
         "title": "DocumentTypeEnum",
         "type": "string"
      },
      "ErrorMetadataSchema": {
         "additionalProperties": false,
         "properties": {
            "task": {
               "$ref": "#/$defs/TaskTypeEnum"
            },
            "status": {
               "$ref": "#/$defs/StatusEnum"
            },
            "source_id": {
               "default": "",
               "title": "Source Id",
               "type": "string"
            },
            "error_msg": {
               "title": "Error Msg",
               "type": "string"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "required": [
            "task",
            "status",
            "error_msg"
         ],
         "title": "ErrorMetadataSchema",
         "type": "object"
      },
      "ImageMetadataSchema": {
         "additionalProperties": false,
         "description": "The schema for the extracted image content.",
         "properties": {
            "image_type": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/DocumentTypeEnum"
                  },
                  {
                     "type": "string"
                  }
               ],
               "title": "Image Type"
            },
            "structured_image_type": {
               "$ref": "#/$defs/ContentTypeEnum",
               "default": "none"
            },
            "caption": {
               "default": "",
               "title": "Caption",
               "type": "string"
            },
            "text": {
               "default": "",
               "title": "Text",
               "type": "string"
            },
            "image_location": {
               "default": [
                  0,
                  0,
                  0,
                  0
               ],
               "items": {},
               "title": "Image Location",
               "type": "array"
            },
            "image_location_max_dimensions": {
               "default": [
                  0,
                  0
               ],
               "items": {},
               "title": "Image Location Max Dimensions",
               "type": "array"
            },
            "uploaded_image_url": {
               "default": "",
               "title": "Uploaded Image Url",
               "type": "string"
            },
            "width": {
               "default": 0,
               "title": "Width",
               "type": "integer"
            },
            "height": {
               "default": 0,
               "title": "Height",
               "type": "integer"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "required": [
            "image_type"
         ],
         "title": "ImageMetadataSchema",
         "type": "object"
      },
      "InfoMessageMetadataSchema": {
         "additionalProperties": false,
         "properties": {
            "task": {
               "$ref": "#/$defs/TaskTypeEnum"
            },
            "status": {
               "$ref": "#/$defs/StatusEnum"
            },
            "message": {
               "title": "Message",
               "type": "string"
            },
            "filter": {
               "title": "Filter",
               "type": "boolean"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "required": [
            "task",
            "status",
            "message",
            "filter"
         ],
         "title": "InfoMessageMetadataSchema",
         "type": "object"
      },
      "LanguageEnum": {
         "description": "Enum for representing various language codes.\n\nAttributes\n----------\nAF : str\n    Afrikaans language code.\nAR : str\n    Arabic language code.\nBG : str\n    Bulgarian language code.\nBN : str\n    Bengali language code.\nCA : str\n    Catalan language code.\nCS : str\n    Czech language code.\nCY : str\n    Welsh language code.\nDA : str\n    Danish language code.\nDE : str\n    German language code.\nEL : str\n    Greek language code.\nEN : str\n    English language code.\nES : str\n    Spanish language code.\nET : str\n    Estonian language code.\nFA : str\n    Persian language code.\nFI : str\n    Finnish language code.\nFR : str\n    French language code.\nGU : str\n    Gujarati language code.\nHE : str\n    Hebrew language code.\nHI : str\n    Hindi language code.\nHR : str\n    Croatian language code.\nHU : str\n    Hungarian language code.\nID : str\n    Indonesian language code.\nIT : str\n    Italian language code.\nJA : str\n    Japanese language code.\nKN : str\n    Kannada language code.\nKO : str\n    Korean language code.\nLT : str\n    Lithuanian language code.\nLV : str\n    Latvian language code.\nMK : str\n    Macedonian language code.\nML : str\n    Malayalam language code.\nMR : str\n    Marathi language code.\nNE : str\n    Nepali language code.\nNL : str\n    Dutch language code.\nNO : str\n    Norwegian language code.\nPA : str\n    Punjabi language code.\nPL : str\n    Polish language code.\nPT : str\n    Portuguese language code.\nRO : str\n    Romanian language code.\nRU : str\n    Russian language code.\nSK : str\n    Slovak language code.\nSL : str\n    Slovenian language code.\nSO : str\n    Somali language code.\nSQ : str\n    Albanian language code.\nSV : str\n    Swedish language code.\nSW : str\n    Swahili language code.\nTA : str\n    Tamil language code.\nTE : str\n    Telugu language code.\nTH : str\n    Thai language code.\nTL : str\n    Tagalog language code.\nTR : str\n    Turkish language code.\nUK : str\n    Ukrainian language code.\nUR : str\n    Urdu language code.\nVI : str\n    Vietnamese language code.\nZH_CN : str\n    Chinese (Simplified) language code.\nZH_TW : str\n    Chinese (Traditional) language code.\nUNKNOWN : str\n    Represents an unknown language.",
         "enum": [
            "af",
            "ar",
            "bg",
            "bn",
            "ca",
            "cs",
            "cy",
            "da",
            "de",
            "el",
            "en",
            "es",
            "et",
            "fa",
            "fi",
            "fr",
            "gu",
            "he",
            "hi",
            "hr",
            "hu",
            "id",
            "it",
            "ja",
            "kn",
            "ko",
            "lt",
            "lv",
            "mk",
            "ml",
            "mr",
            "ne",
            "nl",
            "no",
            "pa",
            "pl",
            "pt",
            "ro",
            "ru",
            "sk",
            "sl",
            "so",
            "sq",
            "sv",
            "sw",
            "ta",
            "te",
            "th",
            "tl",
            "tr",
            "uk",
            "ur",
            "vi",
            "zh-cn",
            "zh-tw",
            "unknown"
         ],
         "title": "LanguageEnum",
         "type": "string"
      },
      "NearbyObjectsSchema": {
         "additionalProperties": false,
         "description": "Schema to hold types of related extracted objects.",
         "properties": {
            "text": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            },
            "images": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            },
            "structured": {
               "$ref": "#/$defs/NearbyObjectsSubSchema",
               "default": {
                  "content": [],
                  "bbox": [],
                  "type": []
               }
            }
         },
         "title": "NearbyObjectsSchema",
         "type": "object"
      },
      "NearbyObjectsSubSchema": {
         "additionalProperties": false,
         "description": "Schema to hold related extracted object.",
         "properties": {
            "content": {
               "items": {
                  "type": "string"
               },
               "title": "Content",
               "type": "array"
            },
            "bbox": {
               "items": {
                  "items": {},
                  "type": "array"
               },
               "title": "Bbox",
               "type": "array"
            },
            "type": {
               "items": {
                  "type": "string"
               },
               "title": "Type",
               "type": "array"
            }
         },
         "title": "NearbyObjectsSubSchema",
         "type": "object"
      },
      "SourceMetadataSchema": {
         "additionalProperties": false,
         "description": "Schema for the knowledge base file from which content\nand metadata is extracted.",
         "properties": {
            "source_name": {
               "title": "Source Name",
               "type": "string"
            },
            "source_id": {
               "title": "Source Id",
               "type": "string"
            },
            "source_location": {
               "default": "",
               "title": "Source Location",
               "type": "string"
            },
            "source_type": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/DocumentTypeEnum"
                  },
                  {
                     "type": "string"
                  }
               ],
               "title": "Source Type"
            },
            "collection_id": {
               "default": "",
               "title": "Collection Id",
               "type": "string"
            },
            "date_created": {
               "default": "2025-09-17T20:05:30.782143",
               "title": "Date Created",
               "type": "string"
            },
            "last_modified": {
               "default": "2025-09-17T20:05:30.782152",
               "title": "Last Modified",
               "type": "string"
            },
            "summary": {
               "default": "",
               "title": "Summary",
               "type": "string"
            },
            "partition_id": {
               "default": -1,
               "title": "Partition Id",
               "type": "integer"
            },
            "access_level": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/AccessLevelEnum"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "default": -1,
               "title": "Access Level"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "required": [
            "source_name",
            "source_id",
            "source_type"
         ],
         "title": "SourceMetadataSchema",
         "type": "object"
      },
      "StatusEnum": {
         "description": "Enum for representing status messages.\n\nAttributes\n----------\nERROR : str\n    Represents an error status.\nSUCCESS : str\n    Represents a success status.",
         "enum": [
            "error",
            "success"
         ],
         "title": "StatusEnum",
         "type": "string"
      },
      "TableFormatEnum": {
         "description": "Enum for representing table formats.\n\nAttributes\n----------\nHTML : str\n    Represents HTML table format.\nIMAGE : str\n    Represents image table format.\nLATEX : str\n    Represents LaTeX table format.\nMARKDOWN : str\n    Represents Markdown table format.\nPSEUDO_MARKDOWN : str\n    Represents pseudo Markdown table format.\nSIMPLE : str\n    Represents simple table format.",
         "enum": [
            "html",
            "image",
            "latex",
            "markdown",
            "pseudo_markdown",
            "simple"
         ],
         "title": "TableFormatEnum",
         "type": "string"
      },
      "TableMetadataSchema": {
         "additionalProperties": false,
         "description": "The schema for the extracted table content.",
         "properties": {
            "caption": {
               "default": "",
               "title": "Caption",
               "type": "string"
            },
            "table_format": {
               "$ref": "#/$defs/TableFormatEnum"
            },
            "table_content": {
               "default": "",
               "title": "Table Content",
               "type": "string"
            },
            "table_content_format": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/TableFormatEnum"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "",
               "title": "Table Content Format"
            },
            "table_location": {
               "default": [
                  0,
                  0,
                  0,
                  0
               ],
               "items": {},
               "title": "Table Location",
               "type": "array"
            },
            "table_location_max_dimensions": {
               "default": [
                  0,
                  0
               ],
               "items": {},
               "title": "Table Location Max Dimensions",
               "type": "array"
            },
            "uploaded_image_uri": {
               "default": "",
               "title": "Uploaded Image Uri",
               "type": "string"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "required": [
            "table_format"
         ],
         "title": "TableMetadataSchema",
         "type": "object"
      },
      "TaskTypeEnum": {
         "description": "Enum for representing various task types.\n\nAttributes\n----------\nCAPTION : str\n    Represents a caption task.\nDEDUP : str\n    Represents a deduplication task.\nEMBED : str\n    Represents an embedding task.\nEXTRACT : str\n    Represents an extraction task.\nFILTER : str\n    Represents a filtering task.\nSPLIT : str\n    Represents a splitting task.\nSTORE : str\n    Represents a storing task.\nSTORE_EMBEDDING : str\n    Represents a task for storing embeddings.\nVDB_UPLOAD : str\n    Represents a task for uploading to a vector database.\nAUDIO_DATA_EXTRACT : str\n    Represents a task for extracting audio data.\nTABLE_DATA_EXTRACT : str\n    Represents a task for extracting table data.\nCHART_DATA_EXTRACT : str\n    Represents a task for extracting chart data.\nINFOGRAPHIC_DATA_EXTRACT : str\n    Represents a task for extracting infographic data.\nUDF : str\n    Represents a user-defined function task.",
         "enum": [
            "audio_data_extract",
            "caption",
            "chart_data_extract",
            "dedup",
            "embed",
            "extract",
            "filter",
            "infographic_data_extract",
            "split",
            "store_embedding",
            "store",
            "table_data_extract",
            "udf",
            "vdb_upload"
         ],
         "title": "TaskTypeEnum",
         "type": "string"
      },
      "TextMetadataSchema": {
         "additionalProperties": false,
         "description": "The schema for the extracted text content.",
         "properties": {
            "text_type": {
               "$ref": "#/$defs/TextTypeEnum"
            },
            "summary": {
               "default": "",
               "title": "Summary",
               "type": "string"
            },
            "keywords": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "items": {
                        "type": "string"
                     },
                     "type": "array"
                  },
                  {
                     "additionalProperties": true,
                     "type": "object"
                  }
               ],
               "default": "",
               "title": "Keywords"
            },
            "language": {
               "$ref": "#/$defs/LanguageEnum",
               "default": "en"
            },
            "text_location": {
               "default": [
                  0,
                  0,
                  0,
                  0
               ],
               "items": {},
               "title": "Text Location",
               "type": "array"
            },
            "text_location_max_dimensions": {
               "default": [
                  0,
                  0
               ],
               "items": {},
               "title": "Text Location Max Dimensions",
               "type": "array"
            },
            "custom_content": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Custom Content"
            }
         },
         "required": [
            "text_type"
         ],
         "title": "TextMetadataSchema",
         "type": "object"
      },
      "TextTypeEnum": {
         "description": "Enum for representing different types of text segments.\n\nAttributes\n----------\nBLOCK : str\n    Represents a text block.\nBODY : str\n    Represents body text.\nDOCUMENT : str\n    Represents an entire document.\nHEADER : str\n    Represents a header text.\nLINE : str\n    Represents a single line of text.\nNEARBY_BLOCK : str\n    Represents a block of text in close proximity to another.\nOTHER : str\n    Represents other unspecified text type.\nPAGE : str\n    Represents a page of text.\nSPAN : str\n    Represents an inline text span.",
         "enum": [
            "block",
            "body",
            "document",
            "header",
            "line",
            "nearby_block",
            "other",
            "page",
            "span"
         ],
         "title": "TextTypeEnum",
         "type": "string"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
Validators:
  • check_metadata_type » all fields
field audio_metadata: AudioMetadataSchema | None = None#

Specific metadata for audio content. Automatically set to None if content_metadata.type is not AUDIO.

Validated by:
  • check_metadata_type
field chart_metadata: ChartMetadataSchema | None = None#

Specific metadata for chart content. Automatically set to None if content_metadata.type is not STRUCTURED.

Validated by:
  • check_metadata_type
field content: str = ''#

The actual textual content extracted from the source.

Validated by:
  • check_metadata_type
field content_metadata: ContentMetadataSchema | None = None#

General metadata about the extracted content itself.

Validated by:
  • check_metadata_type
field content_url: str = ''#

A URL that points to the location of the content, if applicable.

Validated by:
  • check_metadata_type
field custom_content: Dict[str, Any] | None = None#
Validated by:
  • check_metadata_type
field debug_metadata: Dict[str, Any] | None = None#

A dictionary for storing any arbitrary debug information.

Validated by:
  • check_metadata_type
field embedding: List[float] | None = None#

An optional numerical vector representation (embedding) of the content.

Validated by:
  • check_metadata_type
field error_metadata: ErrorMetadataSchema | None = None#

Metadata that describes any errors encountered during processing.

Validated by:
  • check_metadata_type
field image_metadata: ImageMetadataSchema | None = None#

Specific metadata for image content. Automatically set to None if content_metadata.type is not IMAGE.

Validated by:
  • check_metadata_type
field info_message_metadata: InfoMessageMetadataSchema | None = None#

Informational messages related to the processing.

Validated by:
  • check_metadata_type
field raise_on_failure: bool = False#

If True, indicates that processing should halt on failure.

Validated by:
  • check_metadata_type
field source_metadata: SourceMetadataSchema | None = None#

Metadata about the original source of the content.

Validated by:
  • check_metadata_type
field table_metadata: TableMetadataSchema | None = None#

Specific metadata for tabular content. Automatically set to None if content_metadata.type is not STRUCTURED.

Validated by:
  • check_metadata_type
field text_metadata: TextMetadataSchema | None = None#

Specific metadata for text content. Automatically set to None if content_metadata.type is not TEXT.

Validated by:
  • check_metadata_type
validator check_metadata_type  »  all fields[source]#
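
The field descriptions above say that each type-specific metadata block is set to None when content_metadata.type does not match. The following is a minimal standard-library sketch of that rule on plain dictionaries, not the library's validator. The helper name clear_mismatched_metadata, the REQUIRED_TYPE table, and the lowercase type strings ("audio", "image", "text", "structured") are assumptions made for illustration.

```python
from typing import Any, Dict

# Assumed mapping from each type-specific field to the content type it needs.
# The lowercase strings are assumed enum values, not taken from the reference.
REQUIRED_TYPE = {
    "audio_metadata": "audio",
    "image_metadata": "image",
    "text_metadata": "text",
    "table_metadata": "structured",
    "chart_metadata": "structured",
}


def clear_mismatched_metadata(values: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which metadata for other content types is None."""
    content_type = (values.get("content_metadata") or {}).get("type")
    cleaned = dict(values)  # shallow copy, the input is left untouched
    for field, required in REQUIRED_TYPE.items():
        if content_type != required:
            cleaned[field] = None
    return cleaned


record = {
    "content": "hello",
    "content_metadata": {"type": "text"},
    "text_metadata": {"text_type": "body"},
    "image_metadata": {"caption": "stale"},
}
cleaned = clear_mismatched_metadata(record)
print(cleaned["text_metadata"])   # kept, because the type is "text"
print(cleaned["image_metadata"])  # None, because the type is not "image"
```

Note that this sketch also writes an explicit None for type-specific fields that were absent from the input, which matches the documented default of None.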
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.NearbyObjectsSchema[source]#

Bases: BaseModelNoExt

Schema to hold types of related extracted objects.

Show JSON schema
{
   "title": "NearbyObjectsSchema",
   "description": "Schema to hold types of related extracted objects.",
   "type": "object",
   "properties": {
      "text": {
         "$ref": "#/$defs/NearbyObjectsSubSchema",
         "default": {
            "content": [],
            "bbox": [],
            "type": []
         }
      },
      "images": {
         "$ref": "#/$defs/NearbyObjectsSubSchema",
         "default": {
            "content": [],
            "bbox": [],
            "type": []
         }
      },
      "structured": {
         "$ref": "#/$defs/NearbyObjectsSubSchema",
         "default": {
            "content": [],
            "bbox": [],
            "type": []
         }
      }
   },
   "$defs": {
      "NearbyObjectsSubSchema": {
         "additionalProperties": false,
         "description": "Schema to hold related extracted object.",
         "properties": {
            "content": {
               "items": {
                  "type": "string"
               },
               "title": "Content",
               "type": "array"
            },
            "bbox": {
               "items": {
                  "items": {},
                  "type": "array"
               },
               "title": "Bbox",
               "type": "array"
            },
            "type": {
               "items": {
                  "type": "string"
               },
               "title": "Type",
               "type": "array"
            }
         },
         "title": "NearbyObjectsSubSchema",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field images: NearbyObjectsSubSchema = NearbyObjectsSubSchema(content=[], bbox=[], type=[])#
field structured: NearbyObjectsSubSchema = NearbyObjectsSubSchema(content=[], bbox=[], type=[])#
field text: NearbyObjectsSubSchema = NearbyObjectsSubSchema(content=[], bbox=[], type=[])#
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.NearbyObjectsSubSchema[source]#

Bases: BaseModelNoExt

Schema to hold related extracted object.

Show JSON schema
{
   "title": "NearbyObjectsSubSchema",
   "description": "Schema to hold related extracted object.",
   "type": "object",
   "properties": {
      "content": {
         "items": {
            "type": "string"
         },
         "title": "Content",
         "type": "array"
      },
      "bbox": {
         "items": {
            "items": {},
            "type": "array"
         },
         "title": "Bbox",
         "type": "array"
      },
      "type": {
         "items": {
            "type": "string"
         },
         "title": "Type",
         "type": "array"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field bbox: List[tuple] [Optional]#
field content: List[str] [Optional]#
field type: List[str] [Optional]#
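
As a rough illustration of the shape described above, the sketch below builds the three NearbyObjectsSubSchema lists as plain dictionaries. It assumes that the entries at the same index of content, bbox, and type describe the same object, which the reference does not state. The helper names and the "caption" type label are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def empty_nearby() -> Dict[str, List[Any]]:
    """Mirror the documented default: three empty lists."""
    return {"content": [], "bbox": [], "type": []}


def add_nearby(sub: Dict[str, List[Any]], content: str,
               bbox: Tuple[float, ...], kind: str) -> None:
    """Append one object, keeping the three lists the same length."""
    sub["content"].append(content)
    sub["bbox"].append(bbox)
    sub["type"].append(kind)


# The three keys match the NearbyObjectsSchema fields.
nearby = {
    "text": empty_nearby(),
    "images": empty_nearby(),
    "structured": empty_nearby(),
}
add_nearby(nearby["text"], "Figure 2: throughput", (10, 20, 200, 40), "caption")
print(nearby["text"])
```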
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.SourceMetadataSchema[source]#

Bases: BaseModelNoExt

Schema for the knowledge base file from which content and metadata is extracted.

Show JSON schema
{
   "title": "SourceMetadataSchema",
   "description": "Schema for the knowledge base file from which content\nand metadata is extracted.",
   "type": "object",
   "properties": {
      "source_name": {
         "title": "Source Name",
         "type": "string"
      },
      "source_id": {
         "title": "Source Id",
         "type": "string"
      },
      "source_location": {
         "default": "",
         "title": "Source Location",
         "type": "string"
      },
      "source_type": {
         "anyOf": [
            {
               "$ref": "#/$defs/DocumentTypeEnum"
            },
            {
               "type": "string"
            }
         ],
         "title": "Source Type"
      },
      "collection_id": {
         "default": "",
         "title": "Collection Id",
         "type": "string"
      },
      "date_created": {
         "default": "2025-09-17T20:05:30.782143",
         "title": "Date Created",
         "type": "string"
      },
      "last_modified": {
         "default": "2025-09-17T20:05:30.782152",
         "title": "Last Modified",
         "type": "string"
      },
      "summary": {
         "default": "",
         "title": "Summary",
         "type": "string"
      },
      "partition_id": {
         "default": -1,
         "title": "Partition Id",
         "type": "integer"
      },
      "access_level": {
         "anyOf": [
            {
               "$ref": "#/$defs/AccessLevelEnum"
            },
            {
               "type": "integer"
            }
         ],
         "default": -1,
         "title": "Access Level"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "AccessLevelEnum": {
         "description": "Note\n----\nThis is for future use, and currently has no functional use case.\n\nEnum for representing different access levels.\n\nAttributes\n----------\nLEVEL_1 : int\n    Represents access level 1.\nLEVEL_2 : int\n    Represents access level 2.\nLEVEL_3 : int\n    Represents access level 3.",
         "enum": [
            -1,
            1,
            2,
            3
         ],
         "title": "AccessLevelEnum",
         "type": "integer"
      },
      "DocumentTypeEnum": {
         "description": "Enum for representing various document file types.\n\nNote: Document type refers to the specific file format of the content, such as PDF, DOCX, etc.\nThis is not equivalent to the Content type, which is a broad category of the content.\n\nAttributes\n----------\nBMP: str\n    BMP image format.\nDOCX: str\n    Microsoft Word document format.\nHTML: str\n    HTML document.\nJPEG: str\n    JPEG image format.\nPDF: str\n    PDF document format.\nPNG: str\n    PNG image format.\nPPTX: str\n    PowerPoint presentation format.\nSVG: str\n    SVG image format.\nTIFF: str\n    TIFF image format.\nTXT: str\n    Plain text file.\nMP3: str\n    MP3 audio format.\nWAV: str\n    WAV audio format.",
         "enum": [
            "bmp",
            "docx",
            "html",
            "jpeg",
            "pdf",
            "png",
            "pptx",
            "svg",
            "tiff",
            "text",
            "text",
            "mp3",
            "wav",
            "unknown"
         ],
         "title": "DocumentTypeEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "source_name",
      "source_id",
      "source_type"
   ]
}

Config:
  • extra: str = forbid

Fields:
Validators:
  • validate_fields » last_modified, date_created
field access_level: AccessLevelEnum | int = AccessLevelEnum.UNKNOWN#

The role-based access control for the source.

field collection_id: str = ''#

The ID of the collection in which the source is contained.

field custom_content: Dict[str, Any] | None = None#
field date_created: str = '2025-09-17T20:05:30.782143'#

The date the source was created.

Validated by:
  • validate_fields
field last_modified: str = '2025-09-17T20:05:30.782152'#

The date the source was last modified.

Validated by:
  • validate_fields
field partition_id: int = -1#

The offset of this data fragment within a larger set of fragments.

field source_id: str [Required]#

The ID of the source file.

field source_location: str = ''#

The URL, URI, or pointer to the storage location of the source file.

field source_name: str [Required]#

The name of the source file.

field source_type: DocumentTypeEnum | str [Required]#

The type of the source file, such as pdf, docx, pptx, or txt.

field summary: str = ''#

A summary of the source.

validator validate_fields  »  last_modified, date_created[source]#
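
The reference does not say what validate_fields checks. Because both defaults are ISO 8601 timestamps, one plausible reading is that the two date fields must parse as ISO 8601. The sketch below checks that, together with the three required fields from the JSON schema, on a plain dictionary. The function names are hypothetical, and datetime.fromisoformat accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime
from typing import Any, Dict, List

# Required fields, as listed in the JSON schema above.
REQUIRED = ("source_name", "source_id", "source_type")


def is_iso8601(value: str) -> bool:
    """True if the standard library can parse the string as a timestamp."""
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_source_metadata(meta: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in meta]
    for key in ("date_created", "last_modified"):
        if key in meta and not is_iso8601(meta[key]):
            problems.append(f"{key} is not an ISO 8601 timestamp")
    return problems


good = {
    "source_name": "report.pdf",
    "source_id": "report-001",
    "source_type": "pdf",
    "date_created": "2025-09-17T20:05:30.782143",
}
bad = {"source_name": "report.pdf", "date_created": "yesterday"}
print(check_source_metadata(good))
print(check_source_metadata(bad))
```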
pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.TableMetadataSchema[source]#

Bases: BaseModelNoExt

The schema for the extracted table content.

Show JSON schema
{
   "title": "TableMetadataSchema",
   "description": "The schema for the extracted table content.",
   "type": "object",
   "properties": {
      "caption": {
         "default": "",
         "title": "Caption",
         "type": "string"
      },
      "table_format": {
         "$ref": "#/$defs/TableFormatEnum"
      },
      "table_content": {
         "default": "",
         "title": "Table Content",
         "type": "string"
      },
      "table_content_format": {
         "anyOf": [
            {
               "$ref": "#/$defs/TableFormatEnum"
            },
            {
               "type": "string"
            }
         ],
         "default": "",
         "title": "Table Content Format"
      },
      "table_location": {
         "default": [
            0,
            0,
            0,
            0
         ],
         "items": {},
         "title": "Table Location",
         "type": "array"
      },
      "table_location_max_dimensions": {
         "default": [
            0,
            0
         ],
         "items": {},
         "title": "Table Location Max Dimensions",
         "type": "array"
      },
      "uploaded_image_uri": {
         "default": "",
         "title": "Uploaded Image Uri",
         "type": "string"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "TableFormatEnum": {
         "description": "Enum for representing table formats.\n\nAttributes\n----------\nHTML : str\n    Represents HTML table format.\nIMAGE : str\n    Represents image table format.\nLATEX : str\n    Represents LaTeX table format.\nMARKDOWN : str\n    Represents Markdown table format.\nPSEUDO_MARKDOWN : str\n    Represents pseudo Markdown table format.\nSIMPLE : str\n    Represents simple table format.",
         "enum": [
            "html",
            "image",
            "latex",
            "markdown",
            "pseudo_markdown",
            "simple"
         ],
         "title": "TableFormatEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "table_format"
   ]
}

Config:
  • extra: str = forbid

Fields:
field caption: str = ''#

The caption for the table.

field custom_content: Dict[str, Any] | None = None#
field table_content: str = ''#

Extracted text content, formatted according to table_metadata.table_format.

field table_content_format: TableFormatEnum | str = ''#
field table_format: TableFormatEnum [Required]#

The format of the table: either structured (a dataframe or lists of rows and columns), or serialized as markdown, html, latex, or simple (cells separated by spaces).

field table_location: tuple = (0, 0, 0, 0)#

The bounding box of the table, in the format (x1,y1,x2,y2).

field table_location_max_dimensions: tuple = (0, 0)#

The maximum dimensions of the bounding box of the table, in the format (x_max,y_max).

field uploaded_image_uri: str = ''#

A mirror of source_metadata.source_location.
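
The two location fields can be combined to express a bounding box as fractions of the page. The sketch below shows the arithmetic, assuming that table_location is in the same units as table_location_max_dimensions and that the default (0, 0) means the dimensions are unknown. Neither assumption is stated in the reference, and normalize_bbox is a hypothetical helper.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def normalize_bbox(location: BBox, max_dimensions: Tuple[float, float]) -> BBox:
    """Scale (x1, y1, x2, y2) by (x_max, y_max) into the 0..1 range."""
    x1, y1, x2, y2 = location
    x_max, y_max = max_dimensions
    if x_max <= 0 or y_max <= 0:
        # The documented default (0, 0) cannot be used as a divisor.
        raise ValueError("max dimensions must be positive")
    return (x1 / x_max, y1 / y_max, x2 / x_max, y2 / y_max)


print(normalize_bbox((100, 200, 300, 400), (1000, 800)))
# (0.1, 0.25, 0.3, 0.5)
```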

pydantic model nv_ingest_api.internal.schemas.meta.metadata_schema.TextMetadataSchema[source]#

Bases: BaseModelNoExt

The schema for the extracted text content.

Show JSON schema
{
   "title": "TextMetadataSchema",
   "description": "The schema for the extracted text content.",
   "type": "object",
   "properties": {
      "text_type": {
         "$ref": "#/$defs/TextTypeEnum"
      },
      "summary": {
         "default": "",
         "title": "Summary",
         "type": "string"
      },
      "keywords": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "items": {
                  "type": "string"
               },
               "type": "array"
            },
            {
               "additionalProperties": true,
               "type": "object"
            }
         ],
         "default": "",
         "title": "Keywords"
      },
      "language": {
         "$ref": "#/$defs/LanguageEnum",
         "default": "en"
      },
      "text_location": {
         "default": [
            0,
            0,
            0,
            0
         ],
         "items": {},
         "title": "Text Location",
         "type": "array"
      },
      "text_location_max_dimensions": {
         "default": [
            0,
            0
         ],
         "items": {},
         "title": "Text Location Max Dimensions",
         "type": "array"
      },
      "custom_content": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Custom Content"
      }
   },
   "$defs": {
      "LanguageEnum": {
         "description": "Enum for representing various language codes.\n\nAttributes\n----------\nAF : str\n    Afrikaans language code.\nAR : str\n    Arabic language code.\nBG : str\n    Bulgarian language code.\nBN : str\n    Bengali language code.\nCA : str\n    Catalan language code.\nCS : str\n    Czech language code.\nCY : str\n    Welsh language code.\nDA : str\n    Danish language code.\nDE : str\n    German language code.\nEL : str\n    Greek language code.\nEN : str\n    English language code.\nES : str\n    Spanish language code.\nET : str\n    Estonian language code.\nFA : str\n    Persian language code.\nFI : str\n    Finnish language code.\nFR : str\n    French language code.\nGU : str\n    Gujarati language code.\nHE : str\n    Hebrew language code.\nHI : str\n    Hindi language code.\nHR : str\n    Croatian language code.\nHU : str\n    Hungarian language code.\nID : str\n    Indonesian language code.\nIT : str\n    Italian language code.\nJA : str\n    Japanese language code.\nKN : str\n    Kannada language code.\nKO : str\n    Korean language code.\nLT : str\n    Lithuanian language code.\nLV : str\n    Latvian language code.\nMK : str\n    Macedonian language code.\nML : str\n    Malayalam language code.\nMR : str\n    Marathi language code.\nNE : str\n    Nepali language code.\nNL : str\n    Dutch language code.\nNO : str\n    Norwegian language code.\nPA : str\n    Punjabi language code.\nPL : str\n    Polish language code.\nPT : str\n    Portuguese language code.\nRO : str\n    Romanian language code.\nRU : str\n    Russian language code.\nSK : str\n    Slovak language code.\nSL : str\n    Slovenian language code.\nSO : str\n    Somali language code.\nSQ : str\n    Albanian language code.\nSV : str\n    Swedish language code.\nSW : str\n    Swahili language code.\nTA : str\n    Tamil language code.\nTE : str\n    Telugu language code.\nTH : str\n    Thai language code.\nTL : str\n    Tagalog language code.\nTR : str\n    Turkish language code.\nUK : str\n    Ukrainian language code.\nUR : str\n    Urdu language code.\nVI : str\n    Vietnamese language code.\nZH_CN : str\n    Chinese (Simplified) language code.\nZH_TW : str\n    Chinese (Traditional) language code.\nUNKNOWN : str\n    Represents an unknown language.",
         "enum": [
            "af",
            "ar",
            "bg",
            "bn",
            "ca",
            "cs",
            "cy",
            "da",
            "de",
            "el",
            "en",
            "es",
            "et",
            "fa",
            "fi",
            "fr",
            "gu",
            "he",
            "hi",
            "hr",
            "hu",
            "id",
            "it",
            "ja",
            "kn",
            "ko",
            "lt",
            "lv",
            "mk",
            "ml",
            "mr",
            "ne",
            "nl",
            "no",
            "pa",
            "pl",
            "pt",
            "ro",
            "ru",
            "sk",
            "sl",
            "so",
            "sq",
            "sv",
            "sw",
            "ta",
            "te",
            "th",
            "tl",
            "tr",
            "uk",
            "ur",
            "vi",
            "zh-cn",
            "zh-tw",
            "unknown"
         ],
         "title": "LanguageEnum",
         "type": "string"
      },
      "TextTypeEnum": {
         "description": "Enum for representing different types of text segments.\n\nAttributes\n----------\nBLOCK : str\n    Represents a text block.\nBODY : str\n    Represents body text.\nDOCUMENT : str\n    Represents an entire document.\nHEADER : str\n    Represents a header text.\nLINE : str\n    Represents a single line of text.\nNEARBY_BLOCK : str\n    Represents a block of text in close proximity to another.\nOTHER : str\n    Represents other unspecified text type.\nPAGE : str\n    Represents a page of text.\nSPAN : str\n    Represents an inline text span.",
         "enum": [
            "block",
            "body",
            "document",
            "header",
            "line",
            "nearby_block",
            "other",
            "page",
            "span"
         ],
         "title": "TextTypeEnum",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "text_type"
   ]
}

Config:
  • extra: str = forbid

Fields:
field custom_content: Dict[str, Any] | None = None#
field keywords: str | List[str] | Dict = ''#

Keywords, named entities, or other phrases.

field language: LanguageEnum = 'en'#

The language of the content.

field summary: str = ''#

An abbreviated summary of the content.

field text_location: tuple = (0, 0, 0, 0)#

The bounding box of the text, in the format (x1,y1,x2,y2).

field text_location_max_dimensions: tuple = (0, 0)#

The maximum dimensions of the bounding box of the text, in the format (x_max,y_max).

field text_type: TextTypeEnum [Required]#

The type of the text, such as header or body.
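
Because keywords accepts a string, a list of strings, or a dictionary, downstream code may want a single shape. The sketch below is one way to flatten all three into a list. Treating dictionary keys as the keywords is an assumption, since the reference does not say how a dictionary should be interpreted, and keywords_to_list is a hypothetical helper.

```python
from typing import Any, Dict, List, Union


def keywords_to_list(keywords: Union[str, List[str], Dict[str, Any]]) -> List[str]:
    """Flatten the three accepted shapes into a list of strings."""
    if isinstance(keywords, str):
        # The documented default is the empty string, meaning no keywords.
        return [keywords] if keywords else []
    if isinstance(keywords, dict):
        # Assumption: the dictionary keys are the keywords.
        return [str(key) for key in keywords]
    return list(keywords)


print(keywords_to_list(""))
print(keywords_to_list("gpu"))
print(keywords_to_list(["gpu", "inference"]))
print(keywords_to_list({"org": ["NVIDIA"]}))
```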

nv_ingest_api.internal.schemas.meta.metadata_schema.validate_metadata(metadata: Dict[str, Any]) → MetadataSchema[source]#

Validates the given metadata dictionary against the MetadataSchema.

Parameters:
  • metadata: A dictionary representing metadata to be validated.

Returns:
  • An instance of MetadataSchema if validation is successful.

Raises:
  • ValidationError: If the metadata does not conform to the schema.
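
A caller that prefers an error message to an exception could wrap the call as sketched below. The wrapper try_validate and the stand-in fake_validator are hypothetical and use only the standard library. The sketch relies on pydantic's ValidationError being a subclass of ValueError, and the stand-in imitates only the extra = forbid rule, using the field names listed for MetadataSchema.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Top-level field names documented for MetadataSchema.
ALLOWED_FIELDS = {
    "audio_metadata", "chart_metadata", "content", "content_metadata",
    "content_url", "custom_content", "debug_metadata", "embedding",
    "error_metadata", "image_metadata", "info_message_metadata",
    "raise_on_failure", "source_metadata", "table_metadata", "text_metadata",
}


def try_validate(
    metadata: Dict[str, Any],
    validator: Callable[[Dict[str, Any]], Any],
) -> Tuple[Optional[Any], Optional[str]]:
    """Return (result, None) on success or (None, error message) on failure."""
    try:
        return validator(metadata), None
    except ValueError as exc:  # pydantic's ValidationError subclasses ValueError
        return None, str(exc)


def fake_validator(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in that rejects unknown top-level keys only."""
    extra = sorted(set(metadata) - ALLOWED_FIELDS)
    if extra:
        raise ValueError(f"extra fields not permitted: {extra}")
    return metadata


ok, ok_error = try_validate({"content": "hi"}, fake_validator)
rejected, rejected_error = try_validate({"contnet": "hi"}, fake_validator)
print(ok, ok_error)
print(rejected, rejected_error)

# With the library installed, the real function can be passed in instead:
# from nv_ingest_api.internal.schemas.meta.metadata_schema import validate_metadata
# result, error = try_validate({"content": "hi"}, validate_metadata)
```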

nv_ingest_api.internal.schemas.meta.udf module#

pydantic model nv_ingest_api.internal.schemas.meta.udf.UDFStageSchema[source]#

Bases: BaseModel

Schema for UDF stage configuration.

The UDF function string should be provided in the task config. If no UDF function is provided and ignore_empty_udf is True, the message is returned unchanged. If ignore_empty_udf is False, an error is raised when no UDF function is provided.

Show JSON schema
{
   "title": "UDFStageSchema",
   "description": "Schema for UDF stage configuration.\n\nThe UDF function string should be provided in the task config. If no UDF function\nis provided and ignore_empty_udf is True, the message is returned unchanged.\nIf ignore_empty_udf is False, an error is raised when no UDF function is provided.",
   "type": "object",
   "properties": {
      "ignore_empty_udf": {
         "default": false,
         "description": "If True, ignore UDF tasks without udf_function and return message unchanged. If False, raise error.",
         "title": "Ignore Empty Udf",
         "type": "boolean"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

Fields:
field ignore_empty_udf: bool = False#

If True, ignore UDF tasks without udf_function and return message unchanged. If False, raise error.
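
The behavior described for ignore_empty_udf can be sketched as a small decision function. This is an illustration on plain dictionaries under stated assumptions, not the stage's implementation: resolve_empty_udf is a hypothetical name, the task config is assumed to be a dictionary with an optional udf_function key, and the error type is assumed to be ValueError.

```python
from typing import Any, Dict, Tuple


def resolve_empty_udf(
    message: Any,
    task_config: Dict[str, Any],
    ignore_empty_udf: bool = False,
) -> Tuple[Any, bool]:
    """Return (message, should_run_udf), or raise if the UDF is missing."""
    if task_config.get("udf_function"):
        return message, True  # a UDF string is present, so the stage would run it
    if ignore_empty_udf:
        return message, False  # no UDF: pass the message through unchanged
    raise ValueError("UDF task has no udf_function and ignore_empty_udf is False")


message = {"payload": 1}
print(resolve_empty_udf(message, {}, ignore_empty_udf=True))
print(resolve_empty_udf(message, {"udf_function": "def f(m): return m"}))
```

With the default ignore_empty_udf=False, calling resolve_empty_udf(message, {}) raises instead of returning.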

Module contents#