VSS Blueprint Integration#

NVIDIA’s Video Search and Summarization Blueprint (VSS) integrates the Context Aware RAG library for data ingestion and retrieval. VSS makes it easy to get started building and customizing video analytics AI agents for video search and summarization, all powered by generative AI and vision language models (VLMs). A VLM processes video chunks into descriptive captions that encapsulate key visual and contextual details; these captions are then ingested as documents into our library for processing and retrieval.

For more information on Context Aware RAG integration with VSS, please refer to the VSS Context Aware RAG Integration.

Workflow#

  1. Video Processing:

    • Video segments (chunks) generated by the stream handler are decoded, frames are selected, and a vision-language model (VLM) combined with a caption prompt generates a detailed caption for each chunk.

  2. Document Ingestion:

    • The generated captions are ingested into our system as documents.

    • Each document is assigned a unique identifier (doc_index) and metadata (doc_meta) such as stream ID and timestamp.

    • The ingestion process is asynchronous and parallel, ensuring efficient handling of high-volume and out-of-order data.

  3. Data Retrieval with Graph-RAG:

    • Graph Extraction identifies entities and relationships from the captions, building a comprehensive knowledge graph.

    • Graph Retrieval leverages this graph to fetch contextually relevant documents in response to queries, enabling Q&A over video content.
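The ingestion steps above (captions arriving asynchronously, in parallel, and possibly out of order, each tagged with a `doc_index` and `doc_meta`) can be sketched as follows. This is a minimal illustration, not the library's actual API: `CaptionStore`, `add_doc`, and `DocMeta` are hypothetical names chosen for the example.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class DocMeta:
    """Hypothetical per-document metadata: stream ID and chunk timestamps."""
    stream_id: str
    start_ts: float
    end_ts: float

@dataclass
class CaptionStore:
    """Toy stand-in for the ingestion service: accepts captions
    concurrently and keys them by doc_index so the timeline can be
    reassembled regardless of arrival order."""
    docs: dict = field(default_factory=dict)

    async def add_doc(self, caption: str, doc_index: int, doc_meta: DocMeta):
        # Simulate variable processing latency so documents complete
        # out of order, as in a real parallel pipeline.
        await asyncio.sleep(0.01 * (doc_index % 3))
        self.docs[doc_index] = (caption, doc_meta)

    def ordered_captions(self):
        # Reassemble the captions in timeline order via doc_index.
        return [self.docs[i][0] for i in sorted(self.docs)]

async def ingest(store, captions, stream_id="cam-01"):
    # Fire all ingestion requests in parallel (10-second chunks assumed).
    await asyncio.gather(*(
        store.add_doc(cap, i, DocMeta(stream_id, i * 10.0, (i + 1) * 10.0))
        for i, cap in enumerate(captions)
    ))

captions = [
    "A forklift moves pallets near the loading dock.",
    "A worker in a yellow vest inspects the shelves.",
    "A truck backs into bay 3 and stops.",
]
store = CaptionStore()
asyncio.run(ingest(store, captions))
print(store.ordered_captions()[0])
```

Even though the three `add_doc` calls finish at different times, sorting on `doc_index` restores the original chunk order for downstream summarization.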