Technical Reference
Architecture, pipelines, protocols, and performance. Everything your engineering team needs to evaluate, integrate, and deploy.
01
Three-tier topology on dedicated bare-metal servers. No cloud dependencies. Each layer is independently scalable and geographically isolated within EU/EEA data centres.
S1 — Hetzner, Finland
S4 — WireGuard VPN
Hippo + S2 — Germany
WireGuard VPN Mesh
S1 (App) ↔ S4 (AI) — 10.0.0.x
S1 (App) ↔ Hippo (GPU) — 10.8.0.x
S1 (App) ↔ S2 (Qdrant) — 10.0.0.x
Data Centre Locations
S1: Hetzner, Helsinki, Finland — Web, DB, MCP
S2: Hetzner, Falkenstein, Germany — Qdrant
S4: Hetzner, Falkenstein, Germany — Ollama
Hippo: Hetzner, Falkenstein, Germany — RTX PRO 6000
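The mesh above can be illustrated with a minimal WireGuard peer definition on S1. This is a sketch only: the interface file name, keys, and endpoint hostname are placeholders; only the 10.0.0.x tunnel subnet comes from the topology described here.

```ini
# /etc/wireguard/wg0.conf on S1 (sketch: keys and endpoint are placeholders)
[Interface]
Address = 10.0.0.1/24
PrivateKey = <S1_PRIVATE_KEY>
ListenPort = 51820

[Peer]
# S4 (Ollama) is reachable only over the encrypted tunnel
PublicKey = <S4_PUBLIC_KEY>
AllowedIPs = 10.0.0.4/32
Endpoint = s4.example.internal:51820
PersistentKeepalive = 25
```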
02
Six-stage ingestion pipeline transforms raw documents into searchable, citation-ready knowledge. Fully automated — from PDF upload to queryable vector index.
Scraper or upload acquires raw document (PDF, DOCX, HTML, TXT).
Text extraction preserving headings, tables, and footnotes. Section hierarchy detected.
Heading-aware semantic chunking. Max 600 words, min 50, 75-word overlap between chunks.
768-dimensional vectors via nomic-embed-text. CPU-only, no GPU required for ingestion.
Dual storage: Qdrant (HNSW vector index) + MariaDB (metadata + fallback keyword search).
Deduplication check, chunk count verification, Qdrant point confirmation. Bad data rejected.
| Parameter | Value | Notes |
|---|---|---|
| Max chunk size | 600 words | Optimised for 768-dim embedding window |
| Min chunk size | 50 words | Prevents degenerate single-sentence chunks |
| Overlap | 75 words | Cross-boundary context preservation |
| Heading detection | Markdown H1–H4 + font-weight heuristics | Section-aware boundaries |
| Section title | Extracted & stored per chunk | Displayed in citations |
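The chunking parameters above (600-word max, 50-word min, 75-word overlap) can be sketched as a simple sliding window. The function name and plain word-based splitting are illustrative; the production chunker is heading-aware, as described above.

```python
def chunk_words(words, max_len=600, min_len=50, overlap=75):
    """Split a word list into overlapping chunks (illustrative sketch).

    Consecutive chunks share `overlap` words so context that spans a
    chunk boundary is preserved in both chunks.
    """
    step = max_len - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + max_len]
        if len(chunk) < min_len and chunks:
            # A tail shorter than min_len is already covered by the
            # 75-word overlap of the previous chunk, so drop it.
            break
        chunks.append(chunk)
    return chunks

words = [f"w{i}" for i in range(1300)]
chunks = chunk_words(words)
```

With 1,300 words this yields three chunks of 600, 600, and 250 words, each pair sharing a 75-word overlap.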
03
Eight-step retrieval-augmented generation pipeline. Hybrid search combines vector similarity with keyword matching via reciprocal rank fusion.
User question is embedded into 768-dim vector space using the same nomic-embed-text model used for document chunks.
Qdrant HNSW approximate nearest-neighbour search returns top-k candidate chunks by cosine similarity. Default k=8.
Parallel BM25-style keyword search against MariaDB full-text index for exact term matching and rare token coverage.
Reciprocal Rank Fusion (RRF) merges vector and keyword results. Private chunks boosted 1.5× vs shared corpus chunks.
Top-ranked chunks assembled into a prompt context window with document titles, section headers, and jurisdiction metadata.
Prompt + context sent to fine-tuned LLM (bnl-legal 7B or qwen2.5:72b). Model generates answer grounded in retrieved chunks.
Every claim mapped back to source document. Output includes document title, chunk section, relevance score, and jurisdiction.
Structured JSON response with answer text, source citations, model used, and response time. Streaming supported via SSE.
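The fusion step above can be sketched in a few lines. The k=60 smoothing constant and the function shape are illustrative assumptions, not the production code; the 1.5× private-chunk boost matches the figure stated above.

```python
def rrf_fuse(vector_ranked, keyword_ranked, private_ids=(), k=60, boost=1.5):
    """Merge two ranked lists of chunk ids via Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per chunk; chunks from the
    tenant's private corpus get their fused score multiplied by `boost`.
    """
    scores = {}
    for ranked in (vector_ranked, keyword_ranked):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    for chunk_id in private_ids:
        if chunk_id in scores:
            scores[chunk_id] *= boost
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["a", "b", "c"], ["b", "d"], private_ids={"d"})
```

Note how the boost changes the outcome: without it, "d" (keyword rank 2 only) would sit below "a" (vector rank 1 only); with the 1.5× boost it moves ahead.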
768
Embedding dimensions
1.5×
Private chunk boost factor
RRF
Reciprocal Rank Fusion
04
Model Context Protocol server exposing 9 tools for AI agent integration. Compatible with Claude Desktop, Claude Code, Cursor, and any MCP-compliant client.
Built with FastMCP (Python). Transports: SSH stdio (Claude Code) and SSE port 8001 (Claude Desktop).
| Tool | Description | Key Parameters |
|---|---|---|
| search_corpus | Semantic vector search across shared corpuses | query, corpus, limit, language, jurisdiction, category |
| list_corpuses | Inventory of all corpuses with doc/chunk counts | — |
| get_document | Full document with all chunks in reading order | document_id, corpus |
| get_categories | Category taxonomy tree for a corpus | corpus |
| search_keywords | Metadata filtering (no embedding needed) | title_contains, tags, jurisdiction, authority_type, year_from/to |
| search_private | Tenant-isolated search of client documents | client_id, query, limit, category |
| ask | Full RAG: search → context → LLM answer with citations | question, corpus, model, temperature, max_tokens |
| ingest_text | Chunk + embed + store a text document | title, content, corpus, category, language, jurisdiction |
| corpus_stats | Analytics: breakdowns by category, language, year | corpus, client_id |
Claude Desktop — SSE transport
{
  "mcpServers": {
    "caveauai": {
      "url": "https://ai.bluenotelogic.com/mcp",
      "auth": {
        "type": "bearer",
        "token": "$API_KEY"
      }
    }
  }
}
Claude Code — SSH stdio transport
{
  "mcpServers": {
    "legal-knowledge": {
      "command": "ssh",
      "args": [
        "user@server",
        "python server.py --stdio"
      ]
    }
  }
}
05
Standard REST endpoints authenticated via X-API-Key header. JSON request/response. HTTPS only.
| Method | Endpoint | Description | Auth |
|---|---|---|---|
| GET | /api/v2/documents | List documents with pagination & filters | API Key |
| POST | /api/v2/documents | Upload document (multipart) with metadata | API Key |
| DELETE | /api/v2/documents | Remove document + chunks + Qdrant points | API Key |
| POST | /api/v2/chat | RAG chat with streaming (SSE) support | API Key |
| GET | /api/v2/search | Vector search across client corpus | API Key |
| POST | /api/v2/feedback | Submit feedback on AI responses | API Key |
curl -X POST https://ai.bluenotelogic.com/api/v2/documents \
  -H "X-API-Key: YOUR_KEY" \
  -F "file=@contract.pdf" \
  -F "title=Employment Contract" \
  -F "category=contracts" \
  -F "author=Legal Department" \
  -F "tags=employment,hr,compliance"

// Response:
{
  "success": true,
  "document_id": 1842,
  "chunks_created": 23,
  "processing_time_ms": 4200
}
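For JSON endpoints such as /api/v2/chat, the same X-API-Key authentication pattern applies. The sketch below uses only the Python standard library; the request body field names ("message", "stream") are assumptions for illustration, not the documented schema.

```python
import json
import urllib.request

API_BASE = "https://ai.bluenotelogic.com/api/v2"

def build_chat_request(api_key, question, stream=False):
    """Build an authenticated POST for the RAG chat endpoint (sketch).

    Body field names ("message", "stream") are illustrative assumptions.
    """
    body = json.dumps({"message": question, "stream": stream}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/chat",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("YOUR_KEY", "What notice period applies under Norwegian law?")
# urllib.request.urlopen(req) would send it; with stream=True, read the
# response incrementally to consume the SSE event stream.
```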
06
Three model tiers covering embedding, domain-specific reasoning, and enterprise-grade analysis. All self-hosted on EU bare-metal. No third-party API calls.
nomic-embed-text
| Parameters | 137M |
| Dimensions | 768 |
| Max tokens | 8,192 |
| Similarity | Cosine |
| Hardware | CPU only |
| Location | S4 |
Qwen 2.5 7B — Q5_K_M
| Parameters | 7.6B |
| Quantisation | Q5_K_M |
| Fine-tuned on | Norwegian law |
| Method | QLoRA (r=64) |
| Location | S4 + Hippo |
Qwen 2.5 72B — Q8_0
| Parameters | 72B |
| Quantisation | Q8_0 (77GB) |
| VRAM required | 80+ GB |
| Hardware | RTX PRO 6000 |
| Location | Hippo |
Teacher models (NorwAI-24B + Qwen-72B) generate domain-specific Q&A pairs from corpus documents. These are curated, scored, and used to QLoRA fine-tune the 7B student model. Result: expert-level answers at 7B inference cost.
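The curation step in that distillation loop can be sketched as a score filter with deduplication. The 0.8 threshold, record shape, and function name are illustrative assumptions, not the production pipeline.

```python
def curate_pairs(pairs, min_score=0.8):
    """Keep teacher-generated Q&A pairs scoring above a quality threshold,
    deduplicated by normalised question text (illustrative sketch)."""
    seen = set()
    kept = []
    for pair in sorted(pairs, key=lambda p: p["score"], reverse=True):
        question = pair["question"].strip().lower()
        if pair["score"] >= min_score and question not in seen:
            seen.add(question)
            kept.append(pair)
    return kept

pairs = [
    {"question": "What is a tariff?", "answer": "...", "score": 0.91},
    {"question": "what is a tariff?", "answer": "...", "score": 0.85},
    {"question": "Define consent under GDPR.", "answer": "...", "score": 0.60},
]
curated = curate_pairs(pairs)
```

Here only the highest-scoring variant of the duplicated question survives, and the low-scoring pair is dropped before fine-tuning.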
07
GDPR compliance is an architectural decision, not an afterthought. Every layer enforces tenant isolation, data sovereignty, and access control.
Each client’s documents, chunks, and vectors are stored with a client_id filter. Qdrant queries include mandatory payload filters. No cross-tenant data leakage is possible at the query level.
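In Qdrant's REST query syntax, such a mandatory payload filter looks roughly like the helper below. This is a sketch of the pattern, not the production code; the field name client_id comes from the description above.

```python
def with_tenant_filter(query_body, client_id):
    """Attach a mandatory client_id payload filter to a Qdrant search
    body so a query can never cross tenant boundaries (sketch)."""
    query_body = dict(query_body)
    query_body["filter"] = {
        "must": [{"key": "client_id", "match": {"value": client_id}}]
    }
    return query_body

body = with_tenant_filter({"vector": [0.1] * 768, "limit": 8}, client_id=42)
```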
Client-submitted scraper URLs are validated against private IP ranges (RFC 1918, link-local, loopback). DNS resolution is re-checked after each redirect, blocking TOCTOU (time-of-check to time-of-use) rebinding attacks.
New scraper sources require admin approval before activation. URL or schedule changes trigger re-approval. Deactivated sources cannot ingest until approved.
All servers in EU/EEA data centres (Hetzner, Germany & Finland). No US cloud providers. No data export. No third-party AI API calls. WireGuard encrypted transit.
Platform: session-based with CSRF tokens. API: per-client API keys via X-API-Key header. Role-based access: owner, admin, editor, viewer. Bcrypt password hashing.
All database queries use PDO prepared statements with parameterised bindings. No string concatenation in SQL. XSS prevention via htmlspecialchars() on output.
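The platform's stack uses PDO in PHP; the same parameterised-binding pattern is shown below with Python's sqlite3 as a self-contained sketch of why bound parameters defeat injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, client_id INTEGER)"
)
conn.execute(
    "INSERT INTO documents (title, client_id) VALUES (?, ?)",
    ("Employment Contract", 42),
)

# User input is passed as a bound parameter, never concatenated into SQL,
# so a value like "x' OR '1'='1" is treated as data, not query syntax.
user_input = "x' OR '1'='1"
rows = conn.execute(
    "SELECT title FROM documents WHERE title = ? AND client_id = ?",
    (user_input, 42),
).fetchall()
```

The injection attempt matches nothing, because the driver binds the whole string as a literal title value.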
08
Measured on production infrastructure with real corpus data. Numbers reflect typical workloads, not synthetic benchmarks.
< 50ms
Vector Search
Qdrant HNSW, 146K+ points
1.5–3s
RAG Answer (7B)
Full pipeline, 8-chunk context
3–8s
RAG Answer (72B)
Complex multi-doc reasoning
< 200ms
Embedding
Single query, nomic-embed-text
< 5s
Document Ingest
Average 10-page PDF
60s
Batch 100 docs
Parallel chunking + embedding
99.5%
Uptime SLA
Production (Corporate+ plans)
0
Data Export
Zero bytes leave EU jurisdiction
| Collection | Corpus | Documents | Vectors | Index |
|---|---|---|---|---|
| bnl_chunks | Norwegian Law | 8,665 | 81,367 | HNSW (m=16, ef=128) |
| corpus_eu_ai | EU AI & Digital Law | 710 | 24,938 | HNSW (m=16, ef=128) |
| corpus_telecom | Telecom Regulation | 979 | 19,784 | HNSW (m=16, ef=128) |
| corpus_co2 | CO2 & Climate | 874 | 14,818 | HNSW (m=16, ef=128) |
| bnl_client_chunks | Private Client Docs | — | — | HNSW + client_id filter |
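A collection with those index parameters could be created with a Qdrant REST body along these lines, sent as `PUT /collections/bnl_chunks`. This assumes the ef value above refers to ef_construct; treat it as a sketch, not the exact production configuration:

```json
{
  "vectors": { "size": 768, "distance": "Cosine" },
  "hnsw_config": { "m": 16, "ef_construct": 128 }
}
```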
Connect your AI agents to CaveauAI in under 2 minutes. Full API access on Professional plans and above.