Structured Retrieval
Metadata filters narrow your search before the AI even starts looking. The result: faster answers, fewer false positives, and retrieval that understands context.
The Problem
Pure vector search finds semantically similar text — but it can't distinguish between a 2024 Supreme Court ruling and a 1998 district court opinion. Both might discuss the same legal concept, but only one is the authority you need.
Metadata solves this. By attaching structured fields — jurisdiction, date, authority type, case number — to every document, you give the retrieval engine pre-filters that narrow scope before semantic search even begins.
The result: fewer chunks to search, higher relevance, and answers grounded in the right sources.
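The filter-then-search order described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the chunk layout, the toy two-dimensional vectors, the `district_court` value, and the helper names `matches` and `search` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def matches(metadata, filters):
    """True if metadata satisfies every filter; a list filter means 'any of'."""
    for key, wanted in filters.items():
        value = metadata.get(key)
        if isinstance(wanted, (list, tuple, set)):
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True

def search(chunks, query_vector, filters, top_k=10):
    # Step 1: the metadata pre-filter shrinks the candidate set.
    candidates = [c for c in chunks if matches(c["metadata"], filters)]
    # Step 2: semantic ranking runs only over the surviving candidates.
    ranked = sorted(candidates,
                    key=lambda c: cosine(c["vector"], query_vector),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical chunks: the district court opinion is the closest semantic
# match, but it is not the authority the query asks for.
chunks = [
    {"id": "sc-2024", "vector": [0.9, 0.1],
     "metadata": {"jurisdiction": "norway", "authority_rank": "supreme_court"}},
    {"id": "dc-1998", "vector": [0.95, 0.05],
     "metadata": {"jurisdiction": "norway", "authority_rank": "district_court"}},
    {"id": "ca-2021", "vector": [0.2, 0.8],
     "metadata": {"jurisdiction": "norway", "authority_rank": "court_of_appeal"}},
]
filters = {"jurisdiction": "norway",
           "authority_rank": ["supreme_court", "court_of_appeal"]}
result_ids = [c["id"] for c in search(chunks, [1.0, 0.0], filters)]
print(result_ids)  # ['sc-2024', 'ca-2021']
```

The 1998 district court chunk never reaches the ranking step, even though its vector is the closest to the query.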
Standard Fields
Every document can carry these standard metadata fields. Set them on upload, or let scrapers populate them automatically.
Who created or published the document. Populated from scraper source or manual input.
Examples: Oslo District Court, European Commission, Dr. Anna Larsen
Freeform labels for categorisation. Filterable in queries and the platform UI.
Examples: family-law, custody, barneloven, GDPR
Original URL of the document. Linked in citations so users can verify the source.
Example: https://lovdata.no/dokument/NL/lov/1981-04-08-7
When the document was published. Enables time-range filtering and "most recent" sorting.
Examples: 2024-03-15, 2023-01-01
Document type classification. Maps to your corpus taxonomy.
Examples: court-ruling, legislation, guidance, report
ISO 639-1 language code. Enables monolingual or cross-lingual retrieval.
Examples: nb, en, sv, de, fr
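As a rough illustration of what the publication date and language fields make possible, the sketch below applies a time-range filter, an optional language filter, and "most recent" sorting to a few in-memory records. The keys `published` and `language` and the function names are assumptions made for this example; they are not confirmed field names of the platform.

```python
from datetime import date

def in_date_range(published, date_from=None, date_to=None):
    """Check an ISO 8601 date string against optional inclusive bounds."""
    day = date.fromisoformat(published)
    if date_from and day < date.fromisoformat(date_from):
        return False
    if date_to and day > date.fromisoformat(date_to):
        return False
    return True

def most_recent(docs, date_from=None, date_to=None, language=None):
    """Filter by date range and language, then sort newest first."""
    selected = [
        d for d in docs
        if in_date_range(d["published"], date_from, date_to)
        and (language is None or d["language"] == language)
    ]
    return sorted(selected,
                  key=lambda d: date.fromisoformat(d["published"]),
                  reverse=True)

# Hypothetical records using the example values from the field list.
docs = [
    {"id": "a", "published": "2023-01-01", "language": "en"},
    {"id": "b", "published": "2024-03-15", "language": "nb"},
    {"id": "c", "published": "2019-06-30", "language": "nb"},
]
recent_ids = [d["id"] for d in most_recent(docs, date_from="2020-01-01")]
recent_nb_ids = [d["id"] for d in most_recent(docs, date_from="2020-01-01", language="nb")]
print(recent_ids)     # ['b', 'a']
print(recent_nb_ids)  # ['b']
```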
Auto-Extracted
When our scrapers ingest content from legal and regulatory sources, they automatically extract structured metadata that goes well beyond the standard fields, such as case numbers:
HR-2020-1789-A, Rt-1990-1274, C-311/18
In Practice
Pass metadata filters alongside your natural language query. The retrieval engine narrows scope before semantic search begins.
curl -X POST https://ai.bluenotelogic.com/api/v2/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "corpus_id": "corp_abc123",
    "query": "custody rights when parents disagree on relocation",
    "filters": {
      "jurisdiction": "norway",
      "authority_rank": ["supreme_court", "court_of_appeal"],
      "date_from": "2020-01-01",
      "category": "court-ruling"
    },
    "top_k": 10
  }'
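The same search request can be assembled from Python using only the standard library. This is a sketch that mirrors the curl call above: the URL, headers, and payload keys are copied from it, while the helper name `build_search_request` is hypothetical. The request is built but not sent, because the response format is not described here.

```python
import json
import urllib.request

SEARCH_URL = "https://ai.bluenotelogic.com/api/v2/search"

def build_search_request(api_key, corpus_id, query, filters=None, top_k=10):
    """Build (but do not send) a POST request with query and metadata filters."""
    payload = {
        "corpus_id": corpus_id,
        "query": query,
        "filters": filters or {},
        "top_k": top_k,
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_search_request(
    api_key="YOUR_API_KEY",
    corpus_id="corp_abc123",
    query="custody rights when parents disagree on relocation",
    filters={
        "jurisdiction": "norway",
        "authority_rank": ["supreme_court", "court_of_appeal"],
        "date_from": "2020-01-01",
        "category": "court-ruling",
    },
)
# To send it: urllib.request.urlopen(request)
```

Keeping the filters in a plain dictionary makes it easy to add or drop a field per query without touching the rest of the request.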
curl -X POST https://ai.bluenotelogic.com/api/v2/documents/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@ruling-HR-2024-00412.pdf" \
  -F "corpus_id=corp_abc123" \
  -F "metadata[case_number]=HR-2024-00412-A" \
  -F "metadata[authority]=Norges Høyesterett" \
  -F "metadata[authority_rank]=supreme_court" \
  -F "metadata[jurisdiction]=norway" \
  -F "metadata[category]=court-ruling" \
  -F "metadata[tags]=barneloven,custody,relocation"
When documents are ingested via our web scrapers, metadata is extracted automatically from URLs, page titles, document headers, and content patterns, with parsing handled by the BaseScraper framework.
Manual uploads can also include metadata fields — but for scraped content, it's all automatic.
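To give a feel for what extraction from content patterns can look like, here is a small regex sketch that pulls case numbers in the three formats shown on this page out of free text. It is an illustration only and does not reflect how BaseScraper is actually implemented; the pattern names, the regular expressions, and the `extract_case_numbers` helper are assumptions.

```python
import re

# Hypothetical patterns, derived only from the example case numbers on this page.
CASE_PATTERNS = {
    "hr": re.compile(r"\bHR-\d{4}-\d{1,5}-[A-Z]\b"),  # e.g. HR-2020-1789-A
    "rt": re.compile(r"\bRt-\d{4}-\d{1,5}\b"),        # e.g. Rt-1990-1274
    "c": re.compile(r"\bC-\d{1,4}/\d{2}\b"),          # e.g. C-311/18
}

def extract_case_numbers(text):
    """Return case numbers found in text, in order of appearance."""
    found = []
    for kind, pattern in CASE_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), kind, match.group()))
    found.sort()  # order by position in the text
    return [{"kind": kind, "case_number": number} for _, kind, number in found]

sample = "See HR-2020-1789-A and Rt-1990-1274, then C-311/18."
case_numbers = [hit["case_number"] for hit in extract_case_numbers(sample)]
print(case_numbers)  # ['HR-2020-1789-A', 'Rt-1990-1274', 'C-311/18']
```

Each hit could then be stored in a metadata field such as `case_number`, the key used in the upload example.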
Start with a free sandbox. Upload documents with metadata, or let our scrapers extract it automatically. Smarter metadata means smarter AI.