Ingestion
This guide explains how to add documents to the Context-Aware RAG system.
Adding Documents
Documents are added to the system by sending POST requests to the /add_doc endpoint of the Data Ingestion Service. Each request carries one document together with its metadata.
Request Format
The request body is a JSON object:
```json
{
  "document": "Your document text here",
  "doc_index": 0,
  "doc_metadata": {
    "streamId": "unique_stream_id",
    "chunkIdx": 0,
    "file": "source_file.txt",
    "is_first": true,
    "is_last": false,
    "uuid": "your_session_uuid"
  }
}
```
is_first is required for the first document in a stream, and is_last is required for the last document in a stream.
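When documents are generated by a program, the request body can be assembled by a small function instead of by hand. The sketch below uses a hypothetical helper, make_add_doc_payload, that is not part of the service; it only mirrors the field layout above and always includes both flags as booleans, as the first- and last-document examples in this guide do.

```python
import json


def make_add_doc_payload(document, doc_index, stream_id, chunk_idx, file,
                         uuid, is_first=False, is_last=False):
    """Build a request body with the field layout used by /add_doc.

    Hypothetical helper: the field names come from the request format,
    but the function itself is illustrative only.
    """
    return {
        "document": document,
        "doc_index": doc_index,
        "doc_metadata": {
            "streamId": stream_id,
            "chunkIdx": chunk_idx,
            "file": file,
            "is_first": is_first,
            "is_last": is_last,
            "uuid": uuid,
        },
    }


payload = make_add_doc_payload("Your document text here", 0, "unique_stream_id",
                               0, "source_file.txt", "your_session_uuid",
                               is_first=True)
# json.dumps shows the JSON text that requests sends when called with json=payload.
print(json.dumps(payload, indent=2))
```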
Metadata Flags
is_first: Set to true for the first document in a stream.
is_last: Set to true for the last document in a stream.
At least one document must have is_first: true, and one must have is_last: true.
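The flag rule can be checked locally before any request is sent. This is a sketch with a hypothetical function name, check_stream_flags; it tests only the rule stated above and treats a flag as set only when it is the boolean true, so a missing flag or a string such as "true" does not count.

```python
def check_stream_flags(payloads):
    """Return problems with the is_first/is_last flags of one stream's payloads.

    A flag counts only if it is the boolean True; a missing flag counts as unset.
    """
    metas = [p["doc_metadata"] for p in payloads]
    problems = []
    if not any(m.get("is_first") is True for m in metas):
        problems.append("no document has is_first: true")
    if not any(m.get("is_last") is True for m in metas):
        problems.append("no document has is_last: true")
    return problems


# Flags as used in the stream1 example in this guide (other fields omitted).
stream = [
    {"doc_metadata": {"streamId": "stream1", "chunkIdx": 0, "is_first": True, "is_last": False}},
    {"doc_metadata": {"streamId": "stream1", "chunkIdx": 1}},
    {"doc_metadata": {"streamId": "stream1", "chunkIdx": 2, "is_first": False, "is_last": True}},
]
print(check_stream_flags(stream))      # → []
print(check_stream_flags(stream[:2]))  # → ['no document has is_last: true']
```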
Example: Adding Multiple Documents
The following requests add three documents, in order, to a single stream named stream1.
First document:
```python
import requests

url = "http://localhost:8001/add_doc"
headers = {"Content-Type": "application/json"}
data = {
    "document": "First document content",
    "doc_index": 0,
    "doc_metadata": {
        "streamId": "stream1",
        "chunkIdx": 0,
        "file": "doc.txt",
        "is_first": True,  # this document opens the stream
        "is_last": False,
        "uuid": "your_session_uuid"
    }
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```
Middle document:
```python
import requests

url = "http://localhost:8001/add_doc"
headers = {"Content-Type": "application/json"}
data = {
    "document": "Middle document content",
    "doc_index": 1,
    "doc_metadata": {
        "streamId": "stream1",
        "chunkIdx": 1,
        "file": "doc.txt",
        # is_first and is_last are omitted for a document in the middle of the stream
        "uuid": "your_session_uuid"
    }
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```
Last document:
```python
import requests

url = "http://localhost:8001/add_doc"
headers = {"Content-Type": "application/json"}
data = {
    "document": "Last document content",
    "doc_index": 2,
    "doc_metadata": {
        "streamId": "stream1",
        "chunkIdx": 2,
        "file": "doc.txt",
        "is_first": False,
        "is_last": True,  # this document closes the stream
        "uuid": "your_session_uuid"
    }
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```
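Rather than writing one request per document by hand, a stream can be sent in a loop that sets the flags from each document's position. The sketch below is illustrative only: build_stream and ingest are hypothetical helper names, the URL is the localhost address used in the examples, and it assumes the service reports failures with HTTP error status codes, so raise_for_status stops the loop on a 4xx or 5xx response. A stream with a single document gets is_first and is_last on the same document, which we assume the service accepts.

```python
ADD_DOC_URL = "http://localhost:8001/add_doc"  # address used in the examples


def build_stream(chunks, stream_id, file, uuid):
    """Build one /add_doc request body per chunk, in order.

    doc_index and chunkIdx both count from 0; is_first marks the first
    chunk and is_last marks the final one.
    """
    last = len(chunks) - 1
    payloads = []
    for i, text in enumerate(chunks):
        payloads.append({
            "document": text,
            "doc_index": i,
            "doc_metadata": {
                "streamId": stream_id,
                "chunkIdx": i,
                "file": file,
                "is_first": i == 0,
                "is_last": i == last,
                "uuid": uuid,
            },
        })
    return payloads


def ingest(chunks, stream_id, file, uuid, url=ADD_DOC_URL):
    """POST each chunk to /add_doc in order; stop at the first HTTP error."""
    # Imported here so build_stream can be used without requests installed.
    import requests

    for payload in build_stream(chunks, stream_id, file, uuid):
        # json= serializes the body and sets Content-Type: application/json.
        response = requests.post(url, json=payload)
        response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
```

For instance, calling ingest(["First document content", "Middle document content", "Last document content"], "stream1", "doc.txt", "your_session_uuid") sends requests equivalent to the three above, except that the middle document carries explicit false flags instead of omitting them. Sending one request at a time, in order, keeps the flags aligned with the order in which the service receives the documents.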
Best Practices
Document Structure
Keep each document between 100 and 1,000 words for optimal retrieval
Use clear, well-formatted text
Include relevant metadata, such as the source file name
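To keep documents within this range, a long text can be split by word count before ingestion. The sketch below, split_by_words, is a hypothetical helper rather than part of the service: it splits on whitespace, so boundaries may fall mid-sentence, and it spreads words evenly so chunk sizes differ by at most one word. With the default max_words=1000, any text longer than 1,000 words yields chunks of at least 500 words each; a text shorter than 100 words stays a single, short chunk.

```python
import math


def split_by_words(text, max_words=1000):
    """Split text into chunks of at most max_words words.

    Words are spread evenly, so chunk sizes differ by at most one word.
    Splitting is on whitespace only; sentence boundaries are not respected.
    """
    words = text.split()
    if not words:
        return []
    n_chunks = math.ceil(len(words) / max_words)
    base, extra = divmod(len(words), n_chunks)  # first `extra` chunks get one more word
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(" ".join(words[start:end]))
        start = end
    return chunks


chunks = split_by_words("one two three four five", max_words=2)
print(chunks)  # → ['one two', 'three four', 'five']
```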
Document Indexing
Use sequential doc_index and chunkIdx values starting from 0
Keep indexing consistent within a stream (all documents that share a streamId)
Include relevant metadata for better context
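Index consistency can also be checked locally. The sketch below, find_index_problems (a hypothetical name), assumes, as in the stream1 example, that doc_index and chunkIdx both start at 0 for each stream and increase by one in the order the requests are sent; it returns the index pairs of any stream that breaks that pattern.

```python
from collections import defaultdict


def find_index_problems(payloads):
    """Return {streamId: [(doc_index, chunkIdx), ...]} for every stream whose
    indices are not 0, 1, 2, ... in send order (payloads listed in send order)."""
    by_stream = defaultdict(list)
    for p in payloads:
        meta = p["doc_metadata"]
        by_stream[meta["streamId"]].append((p["doc_index"], meta["chunkIdx"]))
    problems = {}
    for stream_id, pairs in by_stream.items():
        expected = [(i, i) for i in range(len(pairs))]
        if pairs != expected:
            problems[stream_id] = pairs
    return problems


# A stream that skips index 1.
payloads = [
    {"doc_index": 0, "doc_metadata": {"streamId": "stream1", "chunkIdx": 0}},
    {"doc_index": 2, "doc_metadata": {"streamId": "stream1", "chunkIdx": 2}},
]
print(find_index_problems(payloads))  # → {'stream1': [(0, 0), (2, 2)]}
```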
Performance Optimization
Batch similar documents together
Use chunk sizes in line with the 100 to 1,000 word guideline under Document Structure
Monitor system resources