Technical Reference

Under the Hood of
CaveauAI

Architecture, pipelines, protocols, and performance. Everything your engineering team needs to evaluate, integrate, and deploy.

PHP 8.x MariaDB Qdrant Ollama Python Tailwind CSS

01

System Architecture

Three-tier topology on dedicated bare-metal servers. No cloud dependencies. Each layer scales independently and runs in physically separate EU/EEA data centres.

Application Tier

S1 — Hetzner, Finland

  • PHP 8.x application server
  • MariaDB 10.11 (platform + RAG)
  • MCP server (Python, FastMCP)
  • Apache 2.4 + mod_php
  • REST API endpoints

AI Gateway

S4 — WireGuard VPN

  • Ollama inference server
  • Embedding generation (nomic-embed)
  • LLM inference (7B fine-tuned)
  • Model routing & load balancing
  • Health monitoring

GPU & Vector Store

Hippo + S2 — Germany

  • NVIDIA RTX PRO 6000 (96 GB VRAM)
  • 72B model inference (Q8_0)
  • Qdrant vector DB (HNSW index)
  • 768-dim cosine similarity
  • Fine-tuning pipeline (QLoRA)

Network Topology

WireGuard VPN Mesh

S1 (App) ↔ S4 (AI) — 10.0.0.x

S1 (App) ↔ Hippo (GPU) — 10.8.0.x

S1 (App) ↔ S2 (Qdrant) — 10.0.0.x

Data Centre Locations

S1: Hetzner, Helsinki, Finland — Web, DB, MCP

S2: Hetzner, Falkenstein, Germany — Qdrant

S4: Hetzner, Falkenstein, Germany — Ollama

Hippo: Hetzner, Falkenstein, Germany — RTX PRO 6000

02

Corpus Pipeline

Six-stage ingestion pipeline transforms raw documents into searchable, citation-ready knowledge. Fully automated — from PDF upload to queryable vector index.

1

Source

Scraper or upload acquires raw document (PDF, DOCX, HTML, TXT).

2

Parse

Text extraction preserving headings, tables, and footnotes. Section hierarchy detected.

3

Chunk

Heading-aware semantic chunking. Max 600 words, min 50, 75-word overlap between chunks.

4

Embed

768-dimensional vectors via nomic-embed-text. CPU-only, no GPU required for ingestion.

5

Index

Dual storage: Qdrant (HNSW vector index) + MariaDB (metadata + fallback keyword search).

6

Validate

Deduplication check, chunk count verification, Qdrant point confirmation. Bad data rejected.
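The deduplication check in the validation stage can be sketched as a single hash-set pass. This is a minimal illustration; the normalisation and hashing choices here are assumptions, not the production logic:

```python
import hashlib

def dedupe_chunks(chunks):
    """Drop chunks whose whitespace-normalised, lowercased text hashes to a
    value already seen. Assumed normalisation; illustrative only."""
    seen, kept = set(), []
    for text in chunks:
        digest = hashlib.sha256(" ".join(text.split()).lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept
```

Hashing keeps memory bounded: the set holds fixed-size digests rather than full chunk texts.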

Chunking Parameters

Parameter | Value | Notes
Max chunk size | 600 words | Optimised for 768-dim embedding window
Min chunk size | 50 words | Prevents degenerate single-sentence chunks
Overlap | 75 words | Cross-boundary context preservation
Heading detection | Markdown H1–H4 + font-weight heuristics | Section-aware boundaries
Section title | Extracted & stored per chunk | Displayed in citations
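The chunking stage under these parameters can be sketched as follows. The heading regex and the output shape are illustrative assumptions; the documented limits (600/50-word bounds, 75-word overlap, H1–H4 detection) come from the table above:

```python
import re

MAX_WORDS, MIN_WORDS, OVERLAP = 600, 50, 75

def chunk_section(words, max_words=MAX_WORDS, overlap=OVERLAP):
    """Split one section's words into overlapping windows."""
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(words[start:end])
        if end == len(words):
            break
        start = end - overlap  # 75-word cross-boundary overlap
    return chunks

def chunk_document(text):
    """Heading-aware chunking: split on Markdown H1-H4, then window each
    section. Each chunk keeps its section title for citation display."""
    parts = re.split(r"(?m)^(#{1,4} .*)$", text)
    out, title = [], "Untitled"
    for part in parts:
        if re.match(r"^#{1,4} ", part):
            title = part.lstrip("# ").strip()
            continue
        for piece in chunk_section(part.split()):
            if len(piece) >= MIN_WORDS:  # drop degenerate fragments
                out.append({"section": title, "text": " ".join(piece)})
    return out
```

The overlap means the last 75 words of one chunk reappear at the start of the next, so a sentence straddling a boundary is always fully contained in at least one chunk.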

03

RAG Pipeline

Eight-step retrieval-augmented generation pipeline. Hybrid search combines vector similarity with keyword matching via reciprocal rank fusion.

1

Query Embedding

User question is embedded into 768-dim vector space using the same nomic-embed-text model used for document chunks.

2

Vector Search

Qdrant HNSW approximate nearest-neighbour search returns top-k candidate chunks by cosine similarity. Default k=8.
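Conceptually, the ranking that HNSW approximates is a brute-force cosine top-k, which can be sketched as below. This is illustrative only; the point of the HNSW index is precisely to avoid scoring every one of the 146K+ vectors:

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = dot(a, b) / (|a| * |b|) -- the metric Qdrant ranks by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=8):
    """Exact top-k by cosine score over (chunk_id, vector) pairs."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunks]
    return sorted(scored, reverse=True)[:k]
```

In production the vectors are 768-dimensional; the 2-dimensional examples in a test are just easier to reason about.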

3

Keyword Search

Parallel BM25-style keyword search against MariaDB full-text index for exact term matching and rare token coverage.

4

Rank Fusion

Reciprocal Rank Fusion (RRF) merges vector and keyword results. Private chunks boosted 1.5× vs shared corpus chunks.
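A sketch of RRF with the private-chunk boost. The RRF constant (k=60 is the conventional choice) and the exact point at which the 1.5× boost is applied are assumptions; only the fusion formula and the boost factor come from the text:

```python
def rrf_fuse(vector_hits, keyword_hits, k=60, private_boost=1.5, private_ids=()):
    """Reciprocal Rank Fusion: score(d) = sum over result lists of 1/(k + rank).
    Private-corpus chunks get a 1.5x multiplier on their fused score."""
    scores = {}
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    for doc_id in scores:
        if doc_id in private_ids:
            scores[doc_id] *= private_boost
    return sorted(scores, key=scores.get, reverse=True)
```

A document appearing in both lists accumulates two reciprocal-rank terms, which is why chunks found by both vector and keyword search float to the top.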

5

Context Assembly

Top-ranked chunks assembled into a prompt context window with document titles, section headers, and jurisdiction metadata.

6

LLM Inference

Prompt + context sent to fine-tuned LLM (bnl-legal 7B or qwen2.5:72b). Model generates answer grounded in retrieved chunks.

7

Citation Extraction

Every claim mapped back to source document. Output includes document title, chunk section, relevance score, and jurisdiction.

8

Response Delivery

Structured JSON response with answer text, source citations, model used, and response time. Streaming supported via SSE.
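An illustrative response shape assembling the fields named above and in the citation step. The key names are assumptions, not the exact schema:

```json
{
  "answer": "…",
  "citations": [
    {
      "document_title": "…",
      "section": "…",
      "relevance_score": 0.87,
      "jurisdiction": "NO"
    }
  ],
  "model": "bnl-legal",
  "response_time_ms": 1840
}
```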

768

Embedding dimensions

1.5×

Private chunk boost factor

RRF

Reciprocal Rank Fusion

04

MCP Protocol

Model Context Protocol server exposing 9 tools for AI agent integration. Compatible with Claude Desktop, Claude Code, Cursor, and any MCP-compliant client.

Built with FastMCP (Python). Transports: SSH stdio (Claude Code) and SSE port 8001 (Claude Desktop).

MCP Tools Reference

Tool | Description | Key Parameters
search_corpus | Semantic vector search across shared corpuses | query, corpus, limit, language, jurisdiction, category
list_corpuses | Inventory of all corpuses with doc/chunk counts | (none)
get_document | Full document with all chunks in reading order | document_id, corpus
get_categories | Category taxonomy tree for a corpus | corpus
search_keywords | Metadata filtering (no embedding needed) | title_contains, tags, jurisdiction, authority_type, year_from/to
search_private | Tenant-isolated search of client documents | client_id, query, limit, category
ask | Full RAG: search → context → LLM answer with citations | question, corpus, model, temperature, max_tokens
ingest_text | Chunk + embed + store a text document | title, content, corpus, category, language, jurisdiction
corpus_stats | Analytics: breakdowns by category, language, year | corpus, client_id

Claude Desktop — SSE transport

{
  "mcpServers": {
    "caveauai": {
      "url": "https://ai.bluenotelogic.com/mcp",
      "auth": {
        "type": "bearer",
        "token": "$API_KEY"
      }
    }
  }
}

Claude Code — SSH stdio transport

{
  "mcpServers": {
    "legal-knowledge": {
      "command": "ssh",
      "args": [
        "user@server",
        "python server.py --stdio"
      ]
    }
  }
}

05

REST API

Standard REST endpoints authenticated via X-API-Key header. JSON request/response. HTTPS only.

Method | Endpoint | Description | Auth
GET | /api/v2/documents | List documents with pagination & filters | API Key
POST | /api/v2/documents | Upload document (multipart) with metadata | API Key
DELETE | /api/v2/documents | Remove document + chunks + Qdrant points | API Key
POST | /api/v2/chat | RAG chat with streaming (SSE) support | API Key
GET | /api/v2/search | Vector search across client corpus | API Key
POST | /api/v2/feedback | Submit feedback on AI responses | API Key
POST /api/v2/documents
curl -X POST https://ai.bluenotelogic.com/api/v2/documents \
  -H "X-API-Key: YOUR_KEY" \
  -F "file=@contract.pdf" \
  -F "title=Employment Contract" \
  -F "category=contracts" \
  -F "author=Legal Department" \
  -F "tags=employment,hr,compliance"

// Response:
{
  "success": true,
  "document_id": 1842,
  "chunks_created": 23,
  "processing_time_ms": 4200
}

06

AI Models

Three model tiers covering embedding, domain-specific reasoning, and enterprise-grade analysis. All self-hosted on EU bare-metal. No third-party API calls.

Embedding

nomic-embed-text

137M parameters

Dimensions: 768
Max tokens: 8,192
Similarity: Cosine
Hardware: CPU only
Location: S4
Domain Expert

bnl-legal

Qwen 2.5 7B — Q5_K_M

Parameters: 7.6B
Quantisation: Q5_K_M
Fine-tuned on: Norwegian law
Method: QLoRA (r=64)
Location: S4 + Hippo
Enterprise

qwen2.5:72b

Qwen 2.5 72B — Q8_0

Parameters: 72B
Quantisation: Q8_0 (77 GB)
VRAM required: 80+ GB
Hardware: RTX PRO 6000
Location: Hippo

Knowledge Distillation Pipeline

Teacher models (NorwAI-24B + Qwen-72B) generate domain-specific Q&A pairs from corpus documents. These are curated, scored, and used to QLoRA fine-tune the 7B student model. Result: expert-level answers at 7B inference cost.

07

Security & Compliance

GDPR compliance is an architectural decision, not an afterthought. Every layer enforces tenant isolation, data sovereignty, and access control.

Tenant Isolation

Each client’s documents, chunks, and vectors are stored with a client_id filter. Qdrant queries include mandatory payload filters. No cross-tenant data leakage is possible at the query level.
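The mandatory payload filter can be illustrated with Qdrant's search endpoint, `POST /collections/{collection}/points/search`. The collection name matches the table in section 08; the client_id value and the truncated vector are placeholders:

```json
{
  "vector": [0.012, -0.094, "… 768 values …"],
  "limit": 8,
  "filter": {
    "must": [
      { "key": "client_id", "match": { "value": 42 } }
    ]
  }
}
```

Because the filter is attached server-side to every query, a malformed or malicious request cannot widen the search beyond its own tenant.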

SSRF Protection

Client-submitted scraper URLs are validated against private IP ranges (RFC 1918, link-local, loopback). DNS resolution is re-checked after each redirect, blocking TOCTOU (time-of-check/time-of-use) attacks such as DNS rebinding.
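A minimal sketch of this validation using only the Python standard library. The function name and exact range checks are illustrative assumptions; the key point is resolving the hostname and testing every returned address, and re-running the check after each redirect:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url):
    """Reject scraper targets that resolve to private, loopback, link-local,
    or reserved addresses. Call again after every redirect so a swapped DNS
    record cannot slip past the initial check (TOCTOU)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Checking the resolved addresses rather than the hostname string is what defeats tricks like `http://localtest.me/` or decimal-encoded IPs.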

Approval Workflows

New scraper sources require admin approval before activation. URL or schedule changes trigger re-approval. Deactivated sources cannot ingest until approved.

EU Data Sovereignty

All servers in EU/EEA data centres (Hetzner, Germany & Finland). No US cloud providers. No data export. No third-party AI API calls. WireGuard encrypted transit.

Authentication

Platform: session-based with CSRF tokens. API: per-client API keys via X-API-Key header. Role-based access: owner, admin, editor, viewer. Bcrypt password hashing.

Prepared Statements

All database queries use PDO prepared statements with parameterised bindings. No string concatenation in SQL. XSS prevention via htmlspecialchars() on output.
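The PDO pattern described above, illustrated here in Python with sqlite3 (an analogy, not the platform's actual code): values travel as bound parameters and are never concatenated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, client_id INTEGER, title TEXT)")
conn.execute("INSERT INTO documents VALUES (1, 42, 'Employment Contract')")

def find_documents(conn, client_id, title_fragment):
    """Placeholders (?) keep attacker-controlled input out of the SQL text,
    so injection payloads are matched literally instead of executed."""
    return conn.execute(
        "SELECT id, title FROM documents WHERE client_id = ? AND title LIKE ?",
        (client_id, f"%{title_fragment}%"),
    ).fetchall()
```

An input like `' OR '1'='1` simply fails to match any title, because it is bound as data rather than parsed as SQL.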

08

Performance Benchmarks

Measured on production infrastructure with real corpus data. Numbers reflect typical workloads, not synthetic benchmarks.

< 50ms

Vector Search

Qdrant HNSW, 146K+ points

1.5–3s

RAG Answer (7B)

Full pipeline, 8-chunk context

3–8s

RAG Answer (72B)

Complex multi-doc reasoning

< 200ms

Embedding

Single query, nomic-embed-text

< 5s

Document Ingest

Average 10-page PDF

60s

Batch 100 docs

Parallel chunking + embedding

99.5%

Uptime SLA

Production (Corporate+ plans)

0

Data Export

Zero bytes leave EU jurisdiction

Qdrant Vector Collections

Collection | Corpus | Documents | Vectors | Index
bnl_chunks | Norwegian Law | 8,665 | 81,367 | HNSW (m=16, ef=128)
corpus_eu_ai | EU AI & Digital Law | 710 | 24,938 | HNSW (m=16, ef=128)
corpus_telecom | Telecom Regulation | 979 | 19,784 | HNSW (m=16, ef=128)
corpus_co2 | CO2 & Climate | 874 | 14,818 | HNSW (m=16, ef=128)
bnl_client_chunks | Private Client Docs | — | — | HNSW + client_id filter

Ready to integrate?

Connect your AI agents to CaveauAI in under 2 minutes. Full API access on Professional plans and above.