Architecture Deep Dive
Every answer is grounded in your documents. Here's exactly how we make that happen.
RAG (Retrieval-Augmented Generation) is the technique that keeps AI hallucination in check. Instead of asking a model to answer from memory, we first retrieve the most relevant passages from your documents, then ask the model to synthesize an answer using only those passages.
Documents are split into overlapping chunks using a sliding window approach. Each chunk is ~512 tokens with ~50 token overlap to preserve context at chunk boundaries.
```
// Chunking configuration
chunk_size: 512 tokens
overlap: 50 tokens
strategy: "sliding_window"
boundary_aware: true  // respects paragraph/section breaks
```
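A minimal sketch of the sliding-window split described above, using whitespace-separated words as a stand-in for tokens (the production chunker counts real tokens and respects paragraph/section breaks):

```python
def chunk_words(words, chunk_size=512, overlap=50):
    """Yield overlapping windows of `words` (a sliding-window chunker sketch)."""
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        yield words[start:start + chunk_size]
        if start + chunk_size >= len(words):
            break  # final window already covers the tail

# 1100 "tokens" -> three chunks; adjacent chunks share the 50-word overlap,
# which preserves context at chunk boundaries.
words = [f"w{i}" for i in range(1100)]
chunks = list(chunk_words(words))
```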
Each chunk is transformed into a 768-dimensional vector using a multilingual embedding model. These vectors capture the semantic meaning of the text, enabling similarity search that understands concepts, not just keywords.
```
// Embedding specification
model: "multilingual-e5-large"
dimensions: 768
index: "HNSW"  // Hierarchical Navigable Small World
distance: "cosine_similarity"
```
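Under the hood, similarity search ranks chunks by cosine similarity between vectors. A brute-force sketch with toy 3-dimensional vectors; the HNSW index approximates this same ranking over 768-dimensional vectors without comparing the query against every chunk:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=5):
    """Exact nearest-neighbour search; HNSW trades exactness for speed at scale."""
    ranked = sorted(index, key=lambda cid: cosine_similarity(query_vec, index[cid]),
                    reverse=True)
    return ranked[:top_k]

# Toy 3-d "embeddings" standing in for 768-d vectors.
index = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}
results = search([1.0, 0.05, 0.0], index, top_k=2)
```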
When you query, we run two search strategies in parallel:
Vector search finds semantically similar passages. It understands that "termination clause" and "ending the agreement" mean the same thing.
Keyword (BM25) search finds exact term matches. Critical for proper nouns, case numbers, statute references, and technical terms.
Results from both strategies are fused using Reciprocal Rank Fusion (RRF), producing a single ranked list of the most relevant passages.
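RRF scores each passage by summing 1 / (k + rank) over the result lists it appears in, where k is a smoothing constant (60 is a common default). Passages that rank well in both lists float to the top. A sketch:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_in_list(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["p3", "p1", "p7"]   # vector-search ranking
sparse = ["p1", "p9", "p3"]   # keyword-search ranking
fused = rrf_fuse([dense, sparse])
```

"p1" wins because it ranks highly in both lists, even though it tops neither.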
A single query can search across multiple corpora simultaneously. Each corpus maintains its own vector index and BM25 index. Results are fused across corpora before being passed to the LLM.
```
// Multi-corpus query
POST /api/v2/chat
{
  "message": "What are the GDPR requirements for AI systems?",
  "corpus_ids": ["internal-policies", "eu-ai-act", "gdpr-guidance"]
}
```
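One way to picture the cross-corpus fusion: search each corpus independently, then merge the per-corpus rankings with the same reciprocal-rank scoring before prompting the LLM. The `keyword_search` helper below is a hypothetical stand-in for the real per-corpus hybrid search:

```python
def keyword_search(query, corpus):
    """Hypothetical per-corpus search: rank passages by query-term overlap."""
    terms = set(query.lower().split())
    hits = [p for p in corpus if terms & set(p.lower().split())]
    return sorted(hits, key=lambda p: len(terms & set(p.lower().split())), reverse=True)

def fuse_across_corpora(query, corpora, search_fn, k=60):
    """Search every corpus, then fuse the per-corpus rankings into one list."""
    scores = {}
    for corpus_id, corpus in corpora.items():
        for rank, passage in enumerate(search_fn(query, corpus), start=1):
            key = (corpus_id, passage)
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

corpora = {
    "gdpr-guidance": ["GDPR requires a lawful basis for processing",
                      "Cookies need consent"],
    "eu-ai-act": ["High-risk AI systems require conformity assessment"],
}
results = fuse_across_corpora("GDPR AI requirements", corpora, keyword_search)
```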
The LLM receives the top-ranked passages as context and generates an answer. Every claim in the answer is annotated with a reference back to the specific source passage — document name, page number, and the relevant text extract.
If the retrieved passages don't contain enough information to answer the question, the system explicitly says so rather than fabricating an answer.
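The grounding step can be sketched as prompt assembly: number each retrieved passage with its source metadata so the model can cite it, instruct the model to answer only from those passages, and skip generation entirely when nothing was retrieved. The prompt wording and passage format here are illustrative assumptions, not the system's exact prompt:

```python
def build_prompt(question, passages):
    """Assemble a grounded prompt; return None when there is nothing to ground on."""
    if not passages:
        return None  # caller answers "not enough information" without calling the LLM
    context = "\n".join(
        f"[{i}] ({p['doc']}, p. {p['page']}) {p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using ONLY the passages below. Cite each claim as [n]. "
        "If the passages do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )

passages = [{"doc": "dpa.pdf", "page": 4, "text": "Data must be processed lawfully."}]
prompt = build_prompt("What does the DPA require?", passages)
```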