Rich Metadata — CaveauAI by Blue Note Logic

The Problem

Vector Search Alone Isn't Enough

Pure vector search finds semantically similar text — but it can't distinguish between a 2024 Supreme Court ruling and a 1998 district court opinion. Both might discuss the same legal concept, but only one is the authority you need.

Metadata solves this. By attaching structured fields (jurisdiction, date, authority type, case number) to every document, you give the retrieval engine filters that narrow the scope before semantic search even begins.

The result: fewer chunks to search, higher relevance, and answers grounded in the right sources.

// Without metadata: searches everything
query:   "custody rights after divorce"
→ 2,847 chunks across all jurisdictions, all years
→ Mixed court levels, outdated rulings

// With metadata: targeted retrieval
query:   "custody rights after divorce"
filters: jurisdiction=norway, year>=2020, authority_rank=supreme_court
→ 23 chunks from relevant rulings
→ Current law, highest authority
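The comparison above can be illustrated with a few lines of shell. This is a minimal sketch that assumes a hypothetical in-memory index of five chunks stored as comma-separated rows (id, jurisdiction, year, authority_rank). The `prefilter` function is invented for this example and is not part of the CaveauAI API. It covers only the filtering stage: the ids it returns are the candidates a semantic ranking step would then score.

```shell
#!/bin/sh
# Hypothetical sample index: id,jurisdiction,year,authority_rank.
# A real index lives inside the retrieval engine, not in a shell variable.
CHUNKS='c1,norway,2023,supreme_court
c2,norway,1998,district_court
c3,eu,2021,supreme_court
c4,norway,2021,court_of_appeal
c5,norway,2024,supreme_court'

# prefilter JURISDICTION MIN_YEAR RANK
# Print the ids of chunks whose metadata matches every filter. Only these
# ids would be handed on to the (not shown) semantic similarity step.
prefilter() {
  printf '%s\n' "$CHUNKS" |
    awk -F ',' -v j="$1" -v y="$2" -v r="$3" \
      '$2 == j && $3 + 0 >= y + 0 && $4 == r { print $1 }'
}

prefilter norway 2020 supreme_court   # prints c1 and c5
```

Here five candidates shrink to two before any embedding comparison happens, which is the same effect the example above describes at a larger scale.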

Standard Fields

Attach Context to Every Document

Every document can carry these standard metadata fields. Set them on upload, or let scrapers populate them automatically.

Author

Who created or published the document. Populated from scraper source or manual input.

Oslo District Court, European Commission, Dr. Anna Larsen

Source URL

Original URL of the document. Linked in citations so users can verify the source.

https://lovdata.no/dokument/NL/lov/1981-04-08-7

Publication Date

When the document was published. Enables time-range filtering and "most recent" sorting.

2024-03-15, 2023-01-01

Language

ISO 639-1 language code. Enables monolingual or cross-lingual retrieval.

nb, en, sv, de, fr
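Because these fields drive filtering and sorting, it can help to sanity-check them before upload. The sketch below assumes dates use the YYYY-MM-DD form shown in the examples. `is_iso639_1` and `is_iso_date` are hypothetical client-side helpers that check shape only: a two-letter code is not looked up in the ISO 639-1 list, and an impossible date such as 2024-02-31 still passes.

```shell
#!/bin/sh
# Hypothetical client-side checks for two standard fields (not a CaveauAI API).

# is_iso639_1 CODE: succeed if CODE is two lowercase letters, the shape of an
# ISO 639-1 code such as nb, en or sv. The official code list is not consulted.
is_iso639_1() {
  case $1 in
    [a-z][a-z]) return 0 ;;
    *) return 1 ;;
  esac
}

# is_iso_date DATE: succeed if DATE looks like YYYY-MM-DD with a month of
# 01-12 and a day of 01-31. Calendar validity (e.g. 2024-02-31) is not checked.
# The body runs in a subshell so month/day do not leak into the caller.
is_iso_date() (
  case $1 in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) exit 1 ;;
  esac
  month=${1#*-}       # "2024-03-15" -> "03-15"
  month=${month%-*}   # "03-15"      -> "03"
  day=${1##*-}        # "2024-03-15" -> "15"
  [ "$month" -ge 1 ] && [ "$month" -le 12 ] && [ "$day" -ge 1 ] && [ "$day" -le 31 ]
)

# Example: check a language code and a publication date before uploading.
if is_iso639_1 nb && is_iso_date 2024-03-15; then
  echo "metadata looks well-formed"
fi
```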

Auto-Extracted

Domain-Specific Intelligence

When our scrapers ingest content from legal and regulatory sources, they automatically extract structured metadata that goes far beyond basic fields:

Case Number — Detected patterns: HR-2020-1789-A, Rt-1990-1274, C-311/18
Authority Type — Court, regulator, legislature, or expert body. Extracted from source domain and document structure.
Authority Rank — Supreme court, court of appeal, district court, regulatory body. Enables hierarchical weighting in retrieval.
Jurisdiction — Norway, EU, international, ECHR. Determined from source URL and document metadata.
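Case-number detection of the kind listed above can be approximated with regular expressions. The patterns below are assumptions inferred from the three example formats (HR-2020-1789-A, Rt-1990-1274, C-311/18). They are not the scraper's actual rules, and `extract_case_numbers` is a hypothetical name. `grep -o` is supported by GNU and BSD grep, though POSIX does not require it.

```shell
#!/bin/sh
# extract_case_numbers: read text on stdin and print each substring matching
# one of three assumed case-number shapes, one per line:
#   HR-2020-1789-A  Høyesterett style (year, number, letter suffix)
#   Rt-1990-1274    Norsk Retstidende reference style (year, page)
#   C-311/18        CJEU style (case number / two-digit year)
# There are no word-boundary anchors, so "ABC-12/34" would yield "C-12/34".
extract_case_numbers() {
  grep -oE 'HR-[0-9]{4}-[0-9]+-[A-Z]|Rt-[0-9]{4}-[0-9]+|C-[0-9]+/[0-9]{2}'
}

printf '%s\n' 'See HR-2020-1789-A and Rt-1990-1274; compare C-311/18.' |
  extract_case_numbers
```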

// Auto-extracted from Lovdata scraper
{
  "case_number": "HR-2024-00412-A",
  "authority": "Norges Høyesterett",
  "authority_type": "court",
  "authority_rank": "supreme_court",
  "jurisdiction": "norway",
  "date": "2024-06-21",
  "category": "court-ruling",
  "tags": ["barneloven", "§36"]
}

// Auto-extracted from CJEU scraper
{
  "case_number": "C-311/18",
  "authority": "CJEU Grand Chamber",
  "jurisdiction": "eu",
  "tags": ["schrems-ii", "GDPR"]
}
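The `authority_rank` values above are what make hierarchical weighting possible. One simple way to picture it is a multiplier applied to each chunk's semantic similarity score. The weights and the `rank_weight` and `weighted_score` helpers below are hypothetical and serve only to illustrate the idea; they do not describe how CaveauAI actually scores results.

```shell
#!/bin/sh
# rank_weight RANK: print a multiplier for an authority_rank value.
# These weights are illustrative placeholders, not CaveauAI's real scheme.
rank_weight() {
  case $1 in
    supreme_court)   echo 1.5 ;;
    court_of_appeal) echo 1.3 ;;
    district_court)  echo 1.1 ;;
    *)               echo 1.0 ;;  # regulatory bodies and unknown ranks
  esac
}

# weighted_score SIMILARITY RANK: multiply a similarity score by the rank
# weight and print the result with three decimals.
weighted_score() {
  awk -v s="$1" -v w="$(rank_weight "$2")" 'BEGIN { printf "%.3f\n", s * w }'
}

# A Supreme Court chunk with lower raw similarity can outrank a district court one.
weighted_score 0.80 supreme_court    # 1.200
weighted_score 0.90 district_court   # 0.990
```

A multiplicative boost keeps relevance as the primary signal. A real system might instead use rank as a tie-breaker or a hard filter, as the search example below does with `authority_rank`.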

In Practice

Metadata-Filtered Queries

Pass metadata filters alongside your natural language query. The retrieval engine narrows scope before semantic search begins.

POST /api/v2/search: search with metadata filters

curl -X POST https://ai.bluenotelogic.com/api/v2/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "corpus_id": "corp_abc123",
    "query": "custody rights when parents disagree on relocation",
    "filters": {
      "jurisdiction": "norway",
      "authority_rank": ["supreme_court", "court_of_appeal"],
      "date_from": "2020-01-01",
      "category": "court-ruling"
    },
    "top_k": 10
  }'
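If you send many searches from a script, building the JSON body in one place helps avoid quoting mistakes. This is a sketch built on several assumptions: `build_search_payload` and `json_escape` are hypothetical names, only the `jurisdiction` and `date_from` filters from the request above are included, and the escaping handles backslashes and double quotes but not control characters such as newlines.

```shell
#!/bin/sh
# json_escape STRING: escape backslashes and double quotes for use inside a
# JSON string. Control characters (newlines, tabs) are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# build_search_payload CORPUS_ID QUERY JURISDICTION DATE_FROM TOP_K
# Print a request body for POST /api/v2/search using two of the filters
# shown above. TOP_K must be an integer.
build_search_payload() {
  printf '{"corpus_id":"%s","query":"%s","filters":{"jurisdiction":"%s","date_from":"%s"},"top_k":%d}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" \
    "$(json_escape "$3")" "$(json_escape "$4")" "$5"
}

payload=$(build_search_payload corp_abc123 \
  "custody rights when parents disagree on relocation" norway 2020-01-01 10)
printf '%s\n' "$payload"

# To send it (requires network access and a real key in API_KEY):
# curl -X POST https://ai.bluenotelogic.com/api/v2/search \
#   -H "Authorization: Bearer $API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```

For arbitrary user input, a dedicated JSON tool is a safer choice than hand-rolled escaping.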

POST /api/v2/documents/upload: upload with metadata

curl -X POST https://ai.bluenotelogic.com/api/v2/documents/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@ruling-HR-2024-00412.pdf" \
  -F "corpus_id=corp_abc123" \
  -F "metadata[case_number]=HR-2024-00412-A" \
  -F "metadata[authority]=Norges Høyesterett" \
  -F "metadata[authority_rank]=supreme_court" \
  -F "metadata[jurisdiction]=norway" \
  -F "metadata[category]=court-ruling" \
  -F "metadata[tags]=barneloven,custody,relocation"
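For batches of files, the `-F` fields can be generated from key=value pairs instead of being typed by hand. This is a minimal POSIX shell sketch. `upload_document`, `DRY_RUN`, and the `API_KEY` variable are hypothetical names introduced here, and the form-field layout simply mirrors the request above. With `DRY_RUN=1` set, the function prints the curl arguments one per line instead of sending anything.

```shell
#!/bin/sh
# upload_document FILE CORPUS_ID [KEY=VALUE ...]
# Turn each KEY=VALUE pair into a -F "metadata[KEY]=VALUE" form field, then
# call curl. With DRY_RUN=1 the arguments are printed instead of sent.
# The body runs in a subshell so its variables do not leak.
upload_document() (
  file=$1
  corpus=$2
  shift 2
  n=$#
  # Append one -F field per pair (split at the first "="), then drop the
  # original pairs from the front of the argument list.
  for pair in "$@"; do
    set -- "$@" -F "metadata[${pair%%=*}]=${pair#*=}"
  done
  shift "$n"
  set -- -X POST "https://ai.bluenotelogic.com/api/v2/documents/upload" \
    -H "Authorization: Bearer ${API_KEY:-YOUR_API_KEY}" \
    -F "file=@$file" \
    -F "corpus_id=$corpus" \
    "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    curl "$@"
  fi
)

DRY_RUN=1 upload_document ruling-HR-2024-00412.pdf corp_abc123 \
  case_number=HR-2024-00412-A jurisdiction=norway
```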

You Don't Have to Tag Manually

When documents are ingested via our web scrapers, metadata is extracted automatically from URLs, page titles, document headers, and content patterns. The BaseScraper framework knows how to parse:

Norwegian case numbers
EU case numbers
Publication dates
Authority names
Jurisdiction from URL
Document categories
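Two of these items, jurisdiction and dates, can be partly read from the URL alone. The host table below is an assumption: only lovdata.no appears in this page's examples, while curia.europa.eu, eur-lex.europa.eu, and hudoc.echr.coe.int are the public CJEU, EUR-Lex, and ECHR sites. Both helpers are hypothetical and are not BaseScraper code. The date helper reads the date embedded in a Lovdata identifier such as 1981-04-08-7. That is the date in the document's identifier, which is not necessarily its publication date.

```shell
#!/bin/sh
# jurisdiction_from_url URL: map the URL's host to a jurisdiction value.
# The table is illustrative; a real scraper would also use document metadata.
jurisdiction_from_url() (
  host=${1#*://}     # drop the scheme
  host=${host%%/*}   # drop the path
  case $host in
    lovdata.no|*.lovdata.no)           echo norway ;;
    curia.europa.eu|eur-lex.europa.eu) echo eu ;;
    hudoc.echr.coe.int)                echo echr ;;
    *)                                 echo unknown ;;
  esac
)

# date_from_lovdata_url URL: print the YYYY-MM-DD date embedded in a
# Lovdata-style identifier such as .../lov/1981-04-08-7 (nothing if absent).
date_from_lovdata_url() {
  printf '%s\n' "$1" |
    grep -oE '/[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]+' |
    head -n 1 |
    cut -c 2-11        # keep the 10 date characters after the leading "/"
}

url=https://lovdata.no/dokument/NL/lov/1981-04-08-7
jurisdiction_from_url "$url"   # norway
date_from_lovdata_url "$url"   # 1981-04-08
```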

Manual uploads can also include metadata fields — but for scraped content, it's all automatic.

Build Your Knowledge Base

Start with a free sandbox. Upload documents with metadata, or let our scrapers extract it automatically. Smarter metadata means smarter AI.

