Experimental Features#

The following early-stage features ship with CA-RAG.

Model Context Protocol (MCP)#

The Model Context Protocol (MCP) is an open standard and open-source framework created to standardize how AI models connect to external data sources and tools, like databases or applications, using a client-server architecture. CA-RAG provides MCP tools that enable AI agents to use CA-RAG as a data source for iterative and deep research use cases involving video content analysis.

CA-RAG exposes five specialized MCP tools designed for comprehensive video content analysis and retrieval. These tools range from complex question answering to specific event detection, providing AI agents with the capability to perform sophisticated video analysis workflows. Each tool is optimized for different use cases, from finding specific incidents to generating executive summaries of video content.

The tools leverage CA-RAG’s advanced retrieval capabilities, including GraphRAG and VectorRAG techniques, to provide accurate and contextually relevant responses. They return rich metadata that includes temporal information, source tracking, and content structure markers, enabling precise analysis and multi-stream video processing.
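
For orientation, MCP clients invoke server tools through JSON-RPC 2.0 requests using the tools/call method. The sketch below only builds such a request body for the find_event tool. It assumes the keywords argument name from the tool listing in this section, and the helper name build_tool_call is hypothetical. A real client (Cursor or an MCP SDK) also handles the initialization handshake and the transport, which are omitted here.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to requests.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request body for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument name "keywords" is taken from the find_event tool description.
request = build_tool_call("find_event", {"keywords": "forklift accident"})
print(json.dumps(request, indent=2))
```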

After the CA-RAG service is started as described in Retrieval Service, Cursor can be configured to connect to the CA-RAG retrieval service. Here’s an example mcp.json, where ${VSS_CTX_PORT_RET} is the retrieval service port:

{
    "mcpServers": {
      "via-ctx-rag": {
        "url": "http://localhost:${VSS_CTX_PORT_RET}/mcp"
      }
    }
}

For more configuration options, refer to the Cursor MCP Docs.
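
Whether ${VSS_CTX_PORT_RET} is expanded inside mcp.json depends on the client, so it can be safer to write the resolved port into the file. The following is a minimal sketch that builds the same configuration from the environment variable. The helper name build_mcp_config and the fallback port 8000 are illustrative assumptions, not values taken from CA-RAG.

```python
import json
import os


def build_mcp_config(port: str, server_name: str = "via-ctx-rag") -> dict:
    """Return an mcp.json structure pointing at the CA-RAG retrieval service."""
    return {"mcpServers": {server_name: {"url": f"http://localhost:{port}/mcp"}}}


# Assumption: 8000 is only an illustrative fallback when the variable is unset.
port = os.environ.get("VSS_CTX_PORT_RET", "8000")
config_text = json.dumps(build_mcp_config(port), indent=2)
print(config_text)
```

The printed JSON can then be saved as the mcp.json file that your MCP client reads.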

Available MCP Tools#

  • query:

    • Answers complex analytical questions about video content using advanced RAG techniques (GraphRAG or VectorRAG)

    • Returns the response generated by the LLM configured in config.yaml.

    • Input: question: "What safety violations occurred in the warehouse?"

    • Output:

    {"response": "Analysis of safety violations...", "error": null}
    
  • find_event:

    • Finds specific events in videos by keyword search and returns structured source documents for the matching video clips, along with metadata for further processing.

    • Returns a list of SourceDocs with page_content (event descriptions) and metadata (timestamps, stream IDs, file paths).

    • Input: keywords: "forklift accident"

    • Output:

    [{"page_content": "Event description...", "metadata": {"timestamp": "2024-01-01T10:30:00", "stream_id": "stream123", "file_path": "/path/to/video.mp4"}}]
    
  • find_object:

    • Locates events containing specific objects using entity-based retrieval, ideal for tracking particular equipment, people, or safety items

    • Returns a list of SourceDocs with object detection events and complete temporal and source metadata (timestamp, stream ID, file path).

    • Input: object_name: "worker with yellow vest"

    • Output:

    [{"page_content": "Worker wearing yellow safety vest observed in warehouse area", "metadata": {"timestamp": "2024-01-01T10:30:00", "stream_id": "stream123", "file_path": "/path/to/video.mp4"}}]
    
  • find_event_formatted:

    • Retrieves events or documents as a list of plain formatted strings, without metadata, for use in reports

    • Returns a list of strings with a “Document start/Content/Document end” structure

    • Input: keywords: "equipment malfunction"

    • Output:

    ["Document start\nEquipment malfunction detected...\nDocument end"]
    
  • summary_retriever:

    • Generates summaries of videos or documents within a specified time range or for a specific video stream

    • Returns a summary text describing events and activities in the specified time period or stream

    • Input: start_time: 100.5, end_time: 200.0, uuid: "stream123"

    • Output:

    "Summary of activities between 100.5s and 200.0s in stream123..."
    
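The structured results are straightforward to post-process on the client side. The sketch below groups find_event-style SourceDocs by stream and orders them by time, and strips the markers from find_event_formatted strings. It assumes the output shapes shown above, ISO 8601 timestamps, and exactly the "Document start" and "Document end" marker lines. The helper names and the sample event descriptions are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def group_by_stream(source_docs: list[dict]) -> dict[str, list[dict]]:
    """Group SourceDocs by stream_id, each group sorted by timestamp."""
    grouped = defaultdict(list)
    for doc in source_docs:
        grouped[doc["metadata"]["stream_id"]].append(doc)
    for docs_in_stream in grouped.values():
        docs_in_stream.sort(
            key=lambda d: datetime.fromisoformat(d["metadata"]["timestamp"])
        )
    return dict(grouped)


def strip_markers(formatted: str) -> str:
    """Return the content between 'Document start' and 'Document end'."""
    lines = formatted.split("\n")
    if lines and lines[0] == "Document start":
        lines = lines[1:]
    if lines and lines[-1] == "Document end":
        lines = lines[:-1]
    return "\n".join(lines)


# Illustrative sample data in the documented find_event output shape.
docs = [
    {"page_content": "Forklift reverses into shelf",
     "metadata": {"timestamp": "2024-01-01T10:30:00", "stream_id": "stream123",
                  "file_path": "/path/to/video.mp4"}},
    {"page_content": "Forklift enters aisle",
     "metadata": {"timestamp": "2024-01-01T10:25:00", "stream_id": "stream123",
                  "file_path": "/path/to/video.mp4"}},
]
for stream_id, events in group_by_stream(docs).items():
    print(stream_id, [event["page_content"] for event in events])

print(strip_markers("Document start\nEquipment malfunction detected...\nDocument end"))
```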

Note: To use the MCP tools, make sure config.yaml contains the following summary_retriever section:

functions:
    # ... existing functions
    summary_retriever:
        type: summary_retriever
        params:
            summarization_prompt: "Given the following dense captions of events that happened in a video, filtered by the time range, summarize the following events: {context}. If you don't see any events, return exactly 'No events found'."
        tools:
            db: graph_db
            llm: nvidia_llm

context_manager:
    functions:
        # ... other functions
        - summary_retriever
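
A quick sanity check can catch a missing entry before the service starts. The sketch below assumes config.yaml has already been loaded into a dictionary (for example with a YAML parser) and that its layout matches the fragment above. check_summary_retriever is a hypothetical helper, not part of CA-RAG, and the {context} check is an assumption based on the documented prompt.

```python
def check_summary_retriever(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    functions = config.get("functions") or {}
    if "summary_retriever" not in functions:
        problems.append("functions.summary_retriever is not defined")
    else:
        params = functions["summary_retriever"].get("params") or {}
        prompt = params.get("summarization_prompt", "")
        # Assumption: the prompt needs the {context} placeholder, as in the
        # documented example, so retrieved captions can be injected.
        if "{context}" not in prompt:
            problems.append("summarization_prompt has no {context} placeholder")
    enabled = (config.get("context_manager") or {}).get("functions") or []
    if "summary_retriever" not in enabled:
        problems.append("summary_retriever is not listed under context_manager.functions")
    return problems


# Dictionary equivalent of the YAML fragment (prompt shortened for brevity).
config = {
    "functions": {
        "summary_retriever": {
            "type": "summary_retriever",
            "params": {"summarization_prompt": "Summarize the following events: {context}."},
            "tools": {"db": "graph_db", "llm": "nvidia_llm"},
        }
    },
    "context_manager": {"functions": ["summary_retriever"]},
}
print(check_summary_retriever(config))
```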

Structured Response#

CA-RAG supports returning structured responses in JSON format instead of plain text. This feature enables applications to receive machine-readable outputs that can be processed programmatically and used in downstream tasks. It is available in both library and service modes.

Configuration Parameters#

When making retrieval requests either as a library or service, you can specify the following parameters to control response formatting:

response_method#

Controls how the response should be formatted. Supported values:

  • "text" (default): Returns a plain text response

  • "json_mode": Returns a JSON-formatted response (requires the “json” keyword in the question). The LLM chooses the JSON structure itself, so the response is guaranteed to be JSON, but its structure can vary between calls.

  • "function_calling": Returns structured output that strictly follows a provided JSON schema. When response_method is "function_calling", response_schema must be provided.

response_schema#

A valid JSON schema that defines the structure of the expected response. Required when using the "function_calling" method; ignored for other methods.

Usage Examples#

Refer to Document Retrieval for full context_manager setup and usage.

Structured response mode is also available when CA-RAG runs as a retrieval service, as described in Retrieval Service: pass the additional state values response_method and response_schema in the request.

1. Default Text Response#

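import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# cm is the context manager set up as described in Document Retrieval.
result = cm.call(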
    {
        "retriever_function": {
            "question": "What safety violations occurred in the warehouse?",
            "response_method": None,  # or "text"
            "response_schema": None,
            "is_live": False,
            "is_last": False,
        }
    }
)
logger.info(f"Response {result['retriever_function']['response']}")

Output:

"Multiple safety violations were observed including workers not wearing proper PPE..."

2. JSON Mode Response#

result = cm.call(
    {
        "retriever_function": {
            "question": "Summarize the warehouse incidents in JSON format",
            "response_method": "json_mode",
            "response_schema": None,
            "is_live": False,
            "is_last": False,
        }
    }
)
logger.info(f"Response {result['retriever_function']['response']}")

Output:

{
    "incidents": [
        {
            "type": "safety_violation",
            "description": "Worker without hard hat in construction zone",
            "severity": "high"
        }
    ],
    "total_count": 1
}
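
Because json_mode fixes only the format and not the structure, downstream code should read fields defensively. The sketch below assumes the response arrives as a JSON string (if your setup already returns a parsed object, skip the json.loads step). The incidents key is taken from the sample output above and may be absent in practice, and extract_incidents is a hypothetical helper.

```python
import json


def extract_incidents(raw_response: str) -> list[dict]:
    """Parse a json_mode response and return its incidents, or [] if absent."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return []  # not valid JSON; treat as no structured data
    if not isinstance(data, dict):
        return []
    incidents = data.get("incidents", [])
    if not isinstance(incidents, list):
        return []
    # Keep only object entries; the LLM chooses the structure freely.
    return [item for item in incidents if isinstance(item, dict)]


raw = '{"incidents": [{"type": "safety_violation", "severity": "high"}], "total_count": 1}'
for incident in extract_incidents(raw):
    print(incident.get("type", "unknown"), incident.get("severity", "unknown"))
```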

3. Function Calling with Schema#

result = cm.call(
    {
        "retriever_function": {
            "question": "What safety violations occurred in the warehouse?",
            "response_method": "function_calling",
            "response_schema": {
                "title": "SafetyReport",
                "description": "Safety violation analysis report",
                "type": "object",
                "properties": {
                    "summary": {
                        "type": "string",
                        "description": "Brief summary of findings"
                    },
                    "violations": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "type": {"type": "string"},
                                "severity": {"type": "string", "enum": ["low", "medium", "high"]},
                                "location": {"type": "string"},
                                "timestamp": {"type": "string"}
                            },
                            "required": ["type", "severity"]
                        }
                    },
                    "recommendations": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["summary", "violations"]
            },
            "is_live": False,
            "is_last": False,
        }
    }
)
logger.info(f"Response {result['retriever_function']['response']}")

Output:

{
    "summary": "Analysis identified 3 safety violations requiring immediate attention",
    "violations": [
        {
            "type": "PPE violation",
            "severity": "high",
            "location": "warehouse floor",
            "timestamp": "2024-01-01T10:30:00"
        }
    ],
    "recommendations": [
        "Implement mandatory PPE checks",
        "Increase safety training frequency"
    ]
}
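
Even with a schema, it can be worth checking the result before passing it downstream. The following is a minimal hand-rolled check of the required fields and the severity enum from the schema above, assuming the response is available as a Python dictionary. A full JSON Schema validator would cover more cases. check_safety_report is a hypothetical helper.

```python
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def check_safety_report(report: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the report passed."""
    problems = []
    for key in ("summary", "violations"):  # top-level "required" fields
        if key not in report:
            problems.append(f"missing required field: {key}")
    violations = report.get("violations", [])
    if not isinstance(violations, list):
        return problems + ["violations is not a list"]
    for index, violation in enumerate(violations):
        if not isinstance(violation, dict):
            problems.append(f"violations[{index}] is not an object")
            continue
        for key in ("type", "severity"):  # per-item "required" fields
            if key not in violation:
                problems.append(f"violations[{index}] missing {key}")
        severity = violation.get("severity")
        if severity is not None and severity not in ALLOWED_SEVERITIES:
            problems.append(f"violations[{index}] has invalid severity: {severity}")
    return problems


report = {
    "summary": "Analysis identified 3 safety violations requiring immediate attention",
    "violations": [{"type": "PPE violation", "severity": "high"}],
}
print(check_safety_report(report))
```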

Error Handling#

Common validation errors:

  • "Invalid response_method": Unsupported response_method value provided

  • "JSON mode requires 'json' in the question": Using json_mode without the “json” keyword in the question

  • "schema must be specified": Using function_calling without providing response_schema

  • "OpenAI function format or valid JSON schema": Invalid schema format provided
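
These rules can also be checked on the client side before a request is sent. The sketch below mirrors the documented constraints. It is not CA-RAG's own validation code: the function name is hypothetical, the case-insensitive keyword check is an assumption, and the messages simply reuse the error strings listed above.

```python
SUPPORTED_METHODS = {None, "text", "json_mode", "function_calling"}


def validate_request(question, response_method=None, response_schema=None):
    """Raise ValueError if the arguments break a documented constraint."""
    if response_method not in SUPPORTED_METHODS:
        raise ValueError("Invalid response_method")
    # Assumption: the "json" keyword is matched case-insensitively.
    if response_method == "json_mode" and "json" not in question.lower():
        raise ValueError("JSON mode requires 'json' in the question")
    if response_method == "function_calling" and not response_schema:
        raise ValueError("schema must be specified")


# Passes: json_mode with the keyword present in the question.
validate_request("Summarize the warehouse incidents in JSON format", "json_mode")

# Fails: function_calling without a schema.
try:
    validate_request("What happened?", "function_calling")
except ValueError as error:
    print(f"Rejected: {error}")
```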