Document Processing Pipeline
Learn how your documents are transformed from files and web pages into searchable knowledge for your AI voice agent.
View status: Open your Knowledge Base to see document processing status and chunk counts.
Processing Overview
When you upload a document or crawl a URL, it goes through this pipeline:
Input
 ├── PDF/DOCX Upload ─────┐
 └── URL Crawl ───────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │     Parsing     │  Extract text from source
                  └────────┬────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │    Chunking     │  Split into ~500-token pieces
                  └────────┬────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │    Embedding    │  Convert to vectors (OpenAI)
                  └────────┬────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │    Indexing     │  Store in Pinecone
                  └─────────────────┘
Pipeline Stages
1. Parsing
Purpose: Extract raw text from the source.
| Source | Parser | Output |
|---|---|---|
| PDF | PDF.js / pdfparse | Plain text |
| DOCX | mammoth | Plain text + structure |
| TXT/MD | Direct read | Plain text |
| HTML (URL) | Cheerio | Cleaned text |
What's extracted:
- Main content text
- Headings and structure
- Lists and tables (as text)
What's ignored:
- Images (no OCR currently)
- JavaScript/CSS
- Navigation elements
- Headers/footers
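The HTML cleanup step can be sketched as follows. This is a simplified, regex-based illustration of what gets kept and what gets dropped; the real pipeline uses Cheerio, and the `cleanHtml` name and sample page are invented for this example.

```typescript
// Simplified sketch of HTML cleanup (the production parser uses Cheerio).
// Scripts, styles, and navigation/header/footer elements are dropped;
// the text content of everything else is kept.
function cleanHtml(html: string): string {
  return html
    // Remove whole elements the pipeline ignores
    .replace(/<(script|style|nav|header|footer)[\s\S]*?<\/\1>/gi, "")
    // Strip remaining tags, keeping their text content
    .replace(/<[^>]+>/g, " ")
    // Collapse whitespace
    .replace(/\s+/g, " ")
    .trim();
}

const page = `<html><head><style>p{color:red}</style></head>
<body><nav>Home | About</nav><h1>Return Policy</h1>
<p>Returns accepted within 30 days.</p>
<script>trackPageView()</script></body></html>`;

console.log(cleanHtml(page)); // "Return Policy Returns accepted within 30 days."
```

Note that only the heading and paragraph text survive; the nav links, stylesheet, and tracking script are gone.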
2. Chunking
Purpose: Split text into smaller, retrievable pieces.
| Parameter | Value | Description |
|---|---|---|
| Chunk Size | ~500 tokens | Approximate size per chunk |
| Overlap | ~50 tokens | Context preserved between chunks |
| Boundary | Sentences | Chunks end at sentence boundaries |
Why chunk?
- LLM context limits: Entire documents won't fit in the LLM's context window
- Precision: Smaller chunks = more targeted retrieval
- Relevance: Retrieve only relevant sections
Example chunking:
Original document (1500 words):
┌────────────────────────────────────────┐
│ Section 1: Introduction (300 words)    │
│ Section 2: Features (500 words)        │
│ Section 3: Pricing (400 words)         │
│ Section 4: FAQ (300 words)             │
└────────────────────────────────────────┘
After chunking (4 chunks):
┌────────────────┐  ┌────────────────┐
│ Chunk 1        │  │ Chunk 2        │
│ Intro + start  │  │ Features       │
│ of Features    │  │ continued     │
└────────────────┘  └────────────────┘
┌────────────────┐  ┌────────────────┐
│ Chunk 3        │  │ Chunk 4        │
│ Pricing        │  │ FAQ            │
└────────────────┘  └────────────────┘
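The chunking step above can be sketched as a simple loop: accumulate sentences until the chunk budget is reached, then carry the last ~50 tokens forward as overlap. This is an illustrative sketch, not the production implementation; it approximates one word as one token for simplicity.

```typescript
// Illustrative chunker: split at sentence boundaries into ~500-token
// chunks with ~50 tokens of overlap. Tokens are approximated as words.
const CHUNK_TOKENS = 500;
const OVERLAP_TOKENS = 50;

function approxTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length; // rough: one word ≈ one token
}

function chunkText(text: string): string[] {
  // Naive sentence splitter: break after ., !, or ? followed by whitespace
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const sentence of sentences) {
    const t = approxTokens(sentence);
    if (currentTokens + t > CHUNK_TOKENS && current.length > 0) {
      chunks.push(current.join(" "));
      // Carry trailing sentences forward until ~50 tokens of overlap
      const overlap: string[] = [];
      let overlapTokens = 0;
      for (let i = current.length - 1; i >= 0 && overlapTokens < OVERLAP_TOKENS; i--) {
        overlap.unshift(current[i]);
        overlapTokens += approxTokens(current[i]);
      }
      current = overlap;
      currentTokens = overlapTokens;
    }
    current.push(sentence);
    currentTokens += t;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

Because chunks end at sentence boundaries, actual chunk sizes vary around the 500-token target rather than matching it exactly.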
3. Embedding
Purpose: Convert text chunks to vectors for semantic search.
| Setting | Value |
|---|---|
| Model | OpenAI text-embedding-3-small |
| Dimensions | 1536 |
| Batch Size | 100 chunks |
How it works:
Text: "Our return policy allows returns within 30 days"
              │
              ▼
    OpenAI Embedding API
              │
              ▼
Vector: [0.023, -0.156, 0.089, ..., 0.045]  (1536 dimensions)
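Since chunks are embedded in batches of 100, the pipeline first groups them. A minimal sketch of that batching (the actual embedding API call, one request per batch, is omitted):

```typescript
// Group items into batches of up to `batchSize`. Each batch would become
// one embeddings API request in the real pipeline.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 250 chunks → 3 requests: 100 + 100 + 50
console.log(toBatches(Array.from({ length: 250 }, (_, i) => `chunk ${i}`)).length); // 3
```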
Why embeddings?
- Similar meaning = similar vectors
- Enables semantic search (understanding, not just keywords)
- "return policy" matches "refund rules"
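"Similar meaning = similar vectors" is measured with cosine similarity. The toy 4-dimensional vectors below are made up for illustration (real embeddings have 1536 dimensions), but the comparison works the same way:

```typescript
// Cosine similarity: 1.0 means same direction (same meaning),
// values near 0 mean unrelated content.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical embeddings: "return policy" and "refund rules" point in
// nearly the same direction; "pricing plans" points elsewhere.
const returnPolicy = [0.9, 0.1, 0.0, 0.1];
const refundRules  = [0.8, 0.2, 0.1, 0.1];
const pricingPlans = [0.1, 0.0, 0.9, 0.2];

console.log(cosineSimilarity(returnPolicy, refundRules));  // high (~0.98)
console.log(cosineSimilarity(returnPolicy, pricingPlans)); // low  (~0.13)
```

This is why a question about "refund rules" retrieves the return-policy chunk even though the words don't match.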
4. Indexing
Purpose: Store vectors for fast similarity search.
| Setting | Value |
|---|---|
| Database | Pinecone |
| Namespace | Per knowledge base |
| Metadata | Document ID, text, source |
Stored metadata per chunk:
{
  "documentId": "doc_abc123",
  "documentName": "Return Policy",
  "knowledgeBaseId": "kb_xyz789",
  "chunkIndex": 0,
  "text": "Our return policy allows...",
  "wordCount": 150,
  "sourceType": "file",
  "sourceUrl": null
}
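A chunk plus its embedding could be packaged into a vector record like this. The metadata fields mirror the example above; the `toRecord` helper and the ID scheme are illustrative (the actual upsert via the Pinecone client is omitted):

```typescript
// Sketch: package one chunk + embedding as a vector record for indexing.
interface ChunkRecord {
  id: string;
  values: number[]; // 1536-dimensional embedding
  metadata: {
    documentId: string;
    documentName: string;
    knowledgeBaseId: string;
    chunkIndex: number;
    text: string;
    wordCount: number;
    sourceType: "file" | "url";
    sourceUrl: string | null;
  };
}

function toRecord(
  documentId: string,
  documentName: string,
  knowledgeBaseId: string,
  chunkIndex: number,
  text: string,
  embedding: number[],
): ChunkRecord {
  return {
    // Deterministic ID so re-processing a document overwrites old vectors
    id: `${documentId}#${chunkIndex}`,
    values: embedding,
    metadata: {
      documentId,
      documentName,
      knowledgeBaseId,
      chunkIndex,
      text,
      wordCount: text.split(/\s+/).filter(Boolean).length,
      // Fixed to the file-upload case in this sketch
      sourceType: "file",
      sourceUrl: null,
    },
  };
}
```

Storing the chunk text in metadata is what lets retrieval return readable passages, not just vector IDs.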
Document Status
Each document displays a status badge:
| Status | Icon | Description | Duration |
|---|---|---|---|
| Pending | Clock | Queued for processing | Seconds |
| Processing | Spinner | Pipeline in progress | Seconds to minutes |
| Completed | Checkmark | Ready for retrieval | - |
| Failed | X | Error occurred | - |
| Stuck | Warning | Processing timed out (>10 min) | - |
Status Flow
PENDING ──► PROCESSING ──► COMPLETED
                │
                ├──► FAILED (error)
                │
                └──► STUCK (timeout)
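The status flow above can be sketched as an allowed-transitions map. The status names match the badges in the table; the retry transitions (FAILED/STUCK back to PENDING) are an assumption based on the Retry behavior described below, not a confirmed implementation detail:

```typescript
// Illustrative state machine for document status.
type Status = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED" | "STUCK";

const TRANSITIONS: Record<Status, Status[]> = {
  PENDING: ["PROCESSING"],
  PROCESSING: ["COMPLETED", "FAILED", "STUCK"],
  // Assumed: Retry re-queues failed or stuck documents
  FAILED: ["PENDING"],
  STUCK: ["PENDING"],
  COMPLETED: [],
};

function canTransition(from: Status, to: Status): boolean {
  return TRANSITIONS[from].includes(to);
}
```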
Handling Failed Documents
When a document fails:
- View error: Click the info icon to see the error message
- Fix issue: Address the cause (file format, content, etc.)
- Retry: Click the "Retry" button
Common failures:
- Invalid file format
- Empty content
- Embedding API timeout
- Network issues
See Troubleshooting for solutions.
Handling Stuck Documents
Documents stuck in "Processing" for over 10 minutes are marked as "Stuck":
- Identify: Look for yellow "Stuck" badge
- Retry: Click the "Retry" button
- Monitor: Watch for successful completion
Why documents get stuck:
- Server restart during processing
- Embedding API timeout
- Database connection lost
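The 10-minute stuck check amounts to a simple elapsed-time test. The field names (`status`, `processingStartedAt`) are illustrative, not the actual schema:

```typescript
// Sketch: flag documents still PROCESSING past the 10-minute timeout.
const STUCK_TIMEOUT_MS = 10 * 60 * 1000;

interface DocumentRow {
  status: string;
  processingStartedAt: number; // epoch milliseconds
}

function isStuck(doc: DocumentRow, now: number = Date.now()): boolean {
  return doc.status === "PROCESSING" && now - doc.processingStartedAt > STUCK_TIMEOUT_MS;
}
```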
Retrieval Process
When your agent needs information during a call:
User: "What's your return policy?"
                │
                ▼
          ┌───────────┐
          │   Embed   │  Convert question to vector
          │   Query   │
          └─────┬─────┘
                │
                ▼
          ┌───────────┐
          │  Vector   │  Find similar chunks in Pinecone
          │  Search   │
          └─────┬─────┘
                │
                ▼
          ┌───────────┐
          │  Rank &   │  Apply minScore filter,
          │  Filter   │  return top K results
          └─────┬─────┘
                │
                ▼
          ┌───────────┐
          │    LLM    │  Generate response using
          │ Response  │  retrieved context
          └───────────┘
Configuration options:
| Setting | Default | Description |
|---|---|---|
| topK | 5 | Number of chunks to retrieve |
| minScore | 0.7 | Minimum relevance score (0-1) |
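The Rank & Filter step with these two settings reduces to a short function. The `Match` shape is illustrative; the real vector store returns its own result format:

```typescript
// Sketch of rank-and-filter: drop matches below minScore,
// then keep the topK highest-scoring chunks.
interface Match {
  text: string;
  score: number; // similarity reported by the vector store, 0-1
}

function rankAndFilter(matches: Match[], topK = 5, minScore = 0.7): Match[] {
  return matches
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Raising minScore trades recall for precision: fewer chunks reach the LLM, but each is more likely to be relevant.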
Learn more: RAG Integration
Processing Time
Typical processing times:
| Document Type | Size | Processing Time |
|---|---|---|
| Small PDF | <1 MB | 5-15 seconds |
| Large PDF | 1-10 MB | 30-90 seconds |
| Single URL | - | 10-30 seconds |
| Sitemap (10 pages) | - | 2-5 minutes |
| Sitemap (50 pages) | - | 10-15 minutes |
Factors affecting time:
- File size and complexity
- Number of pages/chunks
- Current system load
- Embedding API response time
Monitoring Processing
Document List View
In your knowledge base, see:
- Document name and source
- Processing status
- Chunk count (when completed)
- Error messages (when failed)
Chunk Count
The chunk count indicates how many pieces your document was split into:
| Chunk Count | Document Size |
|---|---|
| 1-5 | Small document |
| 6-20 | Medium document |
| 21-50 | Large document |
| 51+ | Very large document |
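A rough chunk-count estimate follows from the chunking parameters: ~500-token chunks with ~50-token overlap give an effective stride of ~450 tokens per additional chunk. Treating one word as roughly one token (a simplification), the sketch below matches the 1500-word example above landing at 4 chunks:

```typescript
// Rough estimate only: actual counts vary with sentence boundaries
// and the real tokenizer.
function estimateChunks(wordCount: number): number {
  const tokens = wordCount; // simplification: one word ≈ one token
  if (tokens <= 500) return 1;
  return 1 + Math.ceil((tokens - 500) / 450);
}

console.log(estimateChunks(300));  // 1
console.log(estimateChunks(1500)); // 4
```

If a document's chunk count is far off from this estimate, the parse may have dropped or duplicated content.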
Error Details
For failed documents:
- Click the info icon
- View full error message
- See processing metadata
- Check source URL (for crawled content)
Best Practices
For Better Chunking
- Use clear headings (H1, H2, H3)
- Write in paragraphs, not walls of text
- Keep related content together
- Avoid very long sentences
For Better Retrieval
- One topic per document
- Use specific, descriptive titles
- Include common question phrases
- Avoid abbreviations without definitions
For Monitoring
- Check processing status after uploads
- Retry failed documents promptly
- Review chunk counts for expected values
- Test retrieval with sample questions
Next Steps
- RAG Integration - Configure retrieval settings
- Troubleshooting - Fix processing issues
- Upload Documents - Import files
- Web Crawling - Import web content
Questions about processing? Check Troubleshooting or contact support.