The Core of CorpusAI

Your Corpus, Your Competitive Edge

Every business has knowledge that exists nowhere else — contracts, policies, case files, internal procedures. A corpus turns that knowledge into an AI that understands your world.

Why Your Internal Corpus Changes Everything

Generic AI knows the internet. Your corpus makes AI know your business.

Institutional Memory

When senior employees leave, their knowledge walks out the door. A corpus captures decades of institutional knowledge — decisions, precedents, rationale — and makes it instantly searchable for every team member.

Competitive Moat

Your competitors can use the same AI models you do. They can't replicate your data. An internal corpus gives you answers grounded in your contracts, your case history, your internal policies — knowledge no competitor has access to.

Consistent Answers

A general-purpose chatbot like ChatGPT can give a different answer each time you ask the same question. Your corpus grounds every answer in specific documents and cites them, so your team gets consistent, verifiable information.

Time Recovery

Knowledge workers spend an estimated 20% of their time searching for information. A corpus gives instant answers from your document base, so there's no more digging through folders and email threads or asking colleagues.

Onboarding Accelerator

New hires take months to learn your systems, processes, and history. With a corpus, they can ask questions in natural language and get answers from your full document base on day one.

Growing Intelligence

Every document you add makes your corpus smarter. Unlike a static database, a corpus keeps improving: new precedents, updated policies, and fresh case files all become part of what your AI can draw on.


Knowledge That Stays

Your Best People Won't Be Here Forever

When a senior partner retires or an experienced engineer leaves, decades of institutional knowledge walk out the door. A corpus captures that expertise — decisions, precedents, rationale — and makes it searchable for every team member.

Every document you add makes your corpus smarter. Unlike static databases, this knowledge grows and compounds over time.

Three Ways to Build Your Knowledge Stack

Choose the strategy that fits your business. Start simple, scale up as you grow.

Internal Only

Upload your own documents. Build a completely private CorpusAI corpus that only your team can access. Perfect for companies with proprietary content that shouldn't be mixed with external sources.

Ideal for

Internal policies & procedures
Client case files & contracts
Product documentation & SOPs
HR handbooks & training materials

Example query

“What is our policy on remote work for employees in the Nordic region?”

Answers from: Employee Handbook v4.2, Nordic Office Policy 2025

Most powerful

Internal + Marketplace

Combine your private documents with expert-curated marketplace corpora. Your internal knowledge meets industry-grade reference material — all queryable in a single question.

Ideal for

Law firms: case files + legislation
Compliance teams: policies + GDPR corpus
Consultants: client data + regulation
Tech companies: docs + EU AI Act

Example query

“Does our client contract comply with barneloven § 48 regarding the child’s best interest?”

Your files: Client Contract #4821  ·  Marketplace: barneloven § 48, HR-2020-1843-A
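Answering one question from two sources means merging two result lists while keeping track of where each hit came from. A minimal sketch, assuming both corpora return comparable similarity scores (for example, because they share one embedding model); `Hit`, `merge_hits`, `format_citations`, and the scores below are illustrative, not CorpusAI internals.

```python
from dataclasses import dataclass


# Hypothetical types and helpers for illustration; not CorpusAI's implementation.
@dataclass(frozen=True)
class Hit:
    source: str   # "internal" or "marketplace"
    doc: str      # name shown in the citation
    score: float  # similarity score returned by that corpus's search


def merge_hits(internal: list[Hit], marketplace: list[Hit], k: int = 5) -> list[Hit]:
    """Rank hits from both corpora together and keep the top k."""
    return sorted(internal + marketplace, key=lambda h: h.score, reverse=True)[:k]


def format_citations(hits: list[Hit]) -> str:
    """Group cited documents by source, internal files first, in score order."""
    parts = []
    for source, label in (("internal", "Your files"), ("marketplace", "Marketplace")):
        docs = [h.doc for h in hits if h.source == source]
        if docs:
            parts.append(f"{label}: {', '.join(docs)}")
    return "  ·  ".join(parts)


# Illustrative scores only.
hits = merge_hits(
    [Hit("internal", "Client Contract #4821", 0.82)],
    [Hit("marketplace", "barneloven § 48", 0.88), Hit("marketplace", "HR-2020-1843-A", 0.71)],
)
print(format_citations(hits))
# prints: Your files: Client Contract #4821  ·  Marketplace: barneloven § 48, HR-2020-1843-A
```

If the two indexes used different embedding models, their raw scores would not be directly comparable, and a rank-based merge such as reciprocal rank fusion would be the safer choice.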

Marketplace Only

Don't have documents to upload yet? Start with expert-curated corpora from the marketplace. Get instant access to industry knowledge — legislation, regulations, case law — without uploading a single file.

Ideal for

Quick legal research access
Startups needing regulation context
Solo practitioners exploring AI
Education & academic research

Example query

“What are the key obligations under the EU AI Act for high-risk systems?”

Answers from: EU AI Act Corpus — Articles 6-49, Annex III

How You Feed Your Corpus

Three paths to get your documents indexed. Pick one or combine all three — your AI gets smarter with every document.

Drag & Drop Upload

Upload PDF, Word, plain-text, HTML, or Markdown files, one at a time or fifty at once. Drag them in, add metadata, and watch them get chunked, embedded, and indexed in real time. No technical skills required.

Batch upload — multiple files at once
Per-file metadata: author, tags, source URL
OCR for scanned PDFs included

PDF, DOCX, TXT, HTML, MD — up to 50 MB each
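As a rough illustration of the chunking step, here is a minimal Python sketch. CorpusAI's actual chunk sizes and splitting rules aren't described on this page, so the fixed-size character windows, the `chunk_text` function, and its `size` and `overlap` defaults are all assumptions; many pipelines split on sentences or tokens instead.

```python
from dataclasses import dataclass


# Hypothetical chunk record for illustration; not CorpusAI's data model.
@dataclass
class Chunk:
    doc_id: str
    index: int  # position of the chunk within its document
    start: int  # character offset where the chunk begins
    text: str


def chunk_text(doc_id: str, text: str, size: int = 800, overlap: int = 100) -> list[Chunk]:
    """Split text into overlapping fixed-size character windows.

    size and overlap are assumed defaults, not CorpusAI settings.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining text is already covered by the previous window's overlap.
    starts = range(0, max(len(text) - overlap, 1), step)
    return [Chunk(doc_id, i, s, text[s:s + size]) for i, s in enumerate(starts)]


# Each chunk would then be embedded and stored alongside the document's metadata.
for chunk in chunk_text("handbook-v4.2", "abcdefghij", size=4, overlap=1):
    print(chunk.index, chunk.start, chunk.text)
# prints: 0 0 abcd / 1 3 defg / 2 6 ghij
```

Overlap reduces the risk that a passage cut at a chunk boundary becomes hard to retrieve; the trade-off is some duplicated text in the index.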

Automated Web Scrapers

Point a scraper at any public data source. Configure it to run daily or weekly, and your corpus stays current automatically. Every new regulation, guideline, or publication gets indexed without lifting a finger.

Website, PDF collection, or RSS feed
Scheduled updates: daily or weekly
Admin-approved for security

SSRF-protected — configurable depth & page limits
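SSRF protection generally means refusing to fetch URLs that point at internal infrastructure. The sketch below shows one common check using only Python's standard library; `is_safe_url` is a hypothetical helper, not CorpusAI's code, and it covers address validation only, not the depth and page limits.

```python
import ipaddress
import socket
from urllib.parse import urlparse


# Hypothetical helper for illustration; not CorpusAI's scraper code.
def is_safe_url(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve every address the hostname maps to; one non-public address is enough to reject.
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or None)
        addresses = {ipaddress.ip_address(info[4][0]) for info in infos}
    except (OSError, ValueError):
        return False
    # is_global is False for loopback, private (10/8, 192.168/16, ...), link-local
    # (including the 169.254.169.254 cloud metadata address) and other reserved ranges.
    return bool(addresses) and all(addr.is_global for addr in addresses)
```

On its own, this check is not sufficient: a hardened scraper would also validate every redirect target and connect to the address it already checked, since resolving the hostname a second time at fetch time leaves room for DNS rebinding.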

REST API

Integrate with your existing systems. Push documents programmatically, tag them with rich metadata, and let the pipeline handle the rest. Perfect for CMS integrations, document management workflows, and CI/CD pipelines.

Upload, list, delete via API key
Rich metadata: author, tags, dates
Synchronous ingestion: results returned in the response

X-API-Key auth — plan-gated document limits
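A push from an existing system might look like the sketch below. Only the `X-API-Key` header comes from this page; the base URL, the `/documents` route, the JSON field names, and `build_upload_request` are placeholders for illustration, and the real request format may differ.

```python
import json
import urllib.request

# Hypothetical base URL and route: CorpusAI's real endpoint paths are not documented here.
API_BASE = "https://api.corpusai.example/v1"


def build_upload_request(api_key: str, title: str, content: str, metadata: dict) -> urllib.request.Request:
    """Build, but do not send, a JSON document-upload request authenticated by X-API-Key."""
    payload = {"title": title, "content": content, "metadata": metadata}  # assumed field names
    return urllib.request.Request(
        url=f"{API_BASE}/documents",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_upload_request(
    api_key="YOUR_API_KEY",
    title="Nordic Office Policy 2025",
    content="Full policy text goes here.",
    metadata={"author": "HR", "tags": ["remote-work", "nordic"], "category": "policy"},
)
# Sending it would be: urllib.request.urlopen(req)
```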

Rich Metadata Makes Your AI Smarter

Every document can carry metadata: author, tags, source URL, publication date, category, and language. This isn't just filing. It's what lets your AI filter, prioritize, and cite the right source for the right question.

Author · Tags · Source URL · Publication Date · Category · Language
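To make "filter and prioritize" concrete, here is a minimal sketch of metadata filtering applied before any text ranking. The fields mirror the list above, but `DocMeta`, `filter_docs`, and the sample documents are hypothetical; a production system would likely push these filters into the vector store query itself.

```python
from dataclasses import dataclass, field
from datetime import date


# Hypothetical metadata record for illustration; not CorpusAI's schema.
@dataclass
class DocMeta:
    title: str
    author: str
    category: str
    language: str
    published: date
    source_url: str = ""
    tags: set[str] = field(default_factory=set)


def filter_docs(docs, *, category=None, language=None, tag=None, published_after=None):
    """Keep documents whose metadata matches every filter that is set, newest first."""
    matches = [
        d for d in docs
        if (category is None or d.category == category)
        and (language is None or d.language == language)
        and (tag is None or tag in d.tags)
        and (published_after is None or d.published > published_after)
    ]
    # Sorting by date puts the newest version of a policy first, so it is cited first.
    return sorted(matches, key=lambda d: d.published, reverse=True)


docs = [
    DocMeta("Policy A v1", "HR", "policy", "en", date(2023, 5, 1), tags={"remote-work"}),
    DocMeta("Policy A v2", "HR", "policy", "en", date(2025, 1, 15), tags={"remote-work"}),
    DocMeta("Contract #1", "Legal", "contract", "no", date(2024, 3, 9)),
]
print([d.title for d in filter_docs(docs, category="policy", tag="remote-work")])
# prints: ['Policy A v2', 'Policy A v1']
```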

Amplified Knowledge

Your Data + Industry Regulations

Your internal documents contain the answers to "what did we do?" Marketplace corpora contain the answers to "what should we do?" Combined, they let you ask questions that span both worlds.

Check your contracts against current legislation. Verify your policies meet GDPR requirements. Cross-reference your case files with Supreme Court precedent — all in a single query.

Internal Data Alone vs. Combined Intelligence

See how adding marketplace corpora amplifies the value of your internal documents.


Internal Corpus Alone

Answer from your data

Grounded in your contracts, policies, case files

Full privacy

Zero external data mixing

Exact document citations

Page numbers, paragraphs, file names

No regulatory context

Your data doesn't include the laws it's subject to

No cross-referencing

Can't check compliance against external standards

Recommended

Internal + Marketplace Corpora

Everything above, plus…

All internal corpus benefits included

Regulatory cross-referencing

Check your docs against legislation automatically

Compliance gap analysis

"Does our privacy policy cover all GDPR Article 13 requirements?"

Multi-source citations

Answers cite both your docs and relevant regulations

Always up-to-date regulations

Marketplace corpora maintained by domain experts

Enterprise & Custom Plans

From Corpus to Your Own AI Model

The ultimate competitive advantage: a fine-tuned AI model trained specifically on your corpus. Not just retrieving your documents — thinking like your organisation.

1. Build Corpus

Upload documents, configure scrapers, grow your corpus

2. Train Model

We fine-tune an AI model on your corpus using GPU infrastructure

3. Validate

Test against your domain questions, iterate until quality matches expectations

4. Deploy

Your model goes live — faster, cheaper, and smarter than generic AI

Why Fine-Tune?

3× faster responses

On its own, a fine-tuned model skips the search step at query time: your domain knowledge is encoded in the model weights instead of being looked up.

Lower per-query cost

No vector search overhead per query. On your domain, a smaller specialised model can outperform a larger generic one.

Domain vocabulary mastery

The model learns your industry's jargon, abbreviations, and terminology. It understands that "BL § 48" refers to barneloven (the Norwegian Children Act), not a product code.

Your model, your control

The fine-tuned model runs on EU infrastructure. You own the output. It can be deployed privately or integrated via API into your existing systems.

RAG + Fine-Tuning: Better Together

Fine-tuning and RAG (Retrieval-Augmented Generation) aren't competing approaches — they're complementary. Use both for the best results.

RAG alone (all plans)

Searches your corpus at query time. Always up-to-date. Great for fact-finding, document lookup, and citation-heavy queries. Works immediately after upload.

Fine-tuned model alone

Knows your domain deeply. Faster responses. Better at reasoning, analysis, and synthesis. Requires periodic retraining when your corpus changes significantly.

RAG + Fine-tuned model (best)

The fine-tuned model understands your domain vocabulary and reasoning patterns, while RAG ensures it always cites the latest documents. Deep understanding + real-time accuracy.
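The retrieval step that RAG contributes can be sketched generically: embed the question, rank stored passages by cosine similarity, and hand the top passages to the model with numbered citations. This is a textbook illustration, not CorpusAI's pipeline: the toy two-dimensional vectors stand in for real embeddings, and `retrieve` and `build_prompt` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical helpers for illustration; not CorpusAI's retrieval code.
def retrieve(query_vec, index, k=2):
    """Return the k (name, text) passages whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return [(name, text) for name, text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Number each passage so the model can cite it as [1], [2], and so on."""
    sources = "\n".join(f"[{i}] {name}: {text}" for i, (name, text) in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )


# Toy 2-dimensional "embeddings"; a real index would use vectors from an embedding model.
index = [
    ("Employee Handbook v4.2", "(passage text)", [0.9, 0.1]),
    ("Nordic Office Policy 2025", "(passage text)", [1.0, 0.0]),
    ("Expense Guidelines", "(passage text)", [0.0, 1.0]),
]
passages = retrieve([1.0, 0.05], index, k=2)
prompt = build_prompt("What is our remote work policy in the Nordic region?", passages)
print(prompt)
```

In the combined setup, a prompt built this way would go to the fine-tuned model, so its citations track the latest indexed documents even between retraining runs.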

Already using CorpusAI? Your existing corpus is automatically the training data for fine-tuning. No extra preparation needed — just contact us when you're ready to take the next step.

Real-World Scenarios

See how different businesses use corpus strategies to solve real problems.

Law Firm

Family law practice, 12 employees

Handles custody, divorce, and child welfare cases across Norway.

Corpus Setup

3,200 client case files (internal)
Norwegian Family Law (free)
EU AI Act & GDPR (free, coming soon)

Result

Lawyers ask one question and get answers citing both their own case history and relevant legislation. New associates become productive in days, not months.

SaaS Company

B2B platform, 45 employees

Expanding to EU market, needs to ensure AI Act and GDPR compliance.

Corpus Setup

Product docs & privacy policies (internal)
EU AI Act & GDPR (free, coming soon)

Result

Compliance team can instantly check if product features meet EU AI Act obligations. Gap analysis that took weeks now takes minutes.

Enterprise + Fine-Tuning

Insurance company, 200+ employees

Processes thousands of claims and needs fast, consistent assessments.

Corpus Setup

50,000+ claim records (internal)
Policy manuals & guidelines (internal)
Fine-tuned model (deployed)

Result

A dedicated AI model trained on their claim history. Assessors get instant, consistent recommendations. Processing time reduced by 60%. The model speaks their exact terminology.

Continue exploring

Ready to turn your documents into a competitive edge?

Start with a free corpus. Grow with marketplace corpora. Graduate to a fine-tuned model. Your data, your pace.