How It Works
Two-corpus RAG with a live Box integration. Here's what's actually happening under the hood.
The pipeline
Scrape FDA enforcement corpus
870+ CDER/CBER warning letters (2019–present) scraped from FDA.gov, categorized into 10 violation areas using Claude, chunked, embedded, and stored in Pinecone.
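As a rough illustration of the chunking step, here is a minimal sketch that splits a categorized letter into overlapping word windows before embedding. The `Chunk` fields, the function names, and the `size` and `overlap` defaults are all assumptions made for this example; the page does not state the real chunk size or record shape.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One embeddable passage plus the metadata needed to cite it (hypothetical shape)."""
    letter_id: str
    category: str
    index: int
    text: str


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words with the previous window."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return windows


def letter_to_chunks(letter_id: str, category: str, text: str,
                     size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Attach letter and category metadata to every window of a letter."""
    return [
        Chunk(letter_id, category, i, window)
        for i, window in enumerate(chunk_words(text, size, overlap))
    ]
```

The overlap keeps a sentence that straddles a window boundary retrievable from either side, at the cost of embedding some words twice.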
Connect to Box via JWT
Internal quality documents live in a Box folder. The server-to-server JWT connector downloads files on demand — no migration, no export.
Embed internal documents
Each document is chunked with section context preserved and embedded into a separate Pinecone namespace. Box webhooks trigger automatic re-ingestion when files change.
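One way to read "section context preserved" is that each chunk carries the heading it came from. The sketch below assumes numbered headings such as `4.2 Procedure` and prefixes every chunk with the document title and section name. The heading pattern, the `max_chars` limit, and the function name are hypothetical, and real SOPs may need a different heading detector.

```python
import re
import textwrap

# Assumption: headings look like "1. Purpose" or "4.2 Procedure".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)$")


def chunk_by_section(doc_title: str, text: str, max_chars: int = 800) -> list[dict]:
    """Split a document at numbered headings and prefix each chunk with its section."""
    section = "Preamble"  # label for text that appears before the first heading
    buffer: list[str] = []
    chunks: list[dict] = []

    def flush() -> None:
        body = " ".join(buffer)
        buffer.clear()
        # textwrap.wrap breaks on whitespace, so long sections split between words.
        for piece in textwrap.wrap(body, width=max_chars):
            chunks.append({
                "section": section,
                "text": f"{doc_title} > {section}\n{piece}",
            })

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            section = f"{match.group(1)} {match.group(2)}"
        else:
            buffer.append(line)
    flush()
    return chunks
```

Because the section label is part of the embedded text, a query about a procedure can match the heading even when the body never repeats it.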
Cross-corpus retrieval
For each violation category, semantic search runs against both corpora in parallel — retrieving the most relevant warning letter passages and internal document sections.
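The parallel lookup can be pictured with a toy in-memory index standing in for the vector store. Everything named here is an assumption for illustration: the namespace names, the two-dimensional vectors, and the `search` helper, which takes the place of whatever query call the real store exposes.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical namespaces holding (id, vector, text) triples.
INDEX = {
    "warning-letters": [
        ("wl-1", [0.9, 0.1], "Failure to review audit trails."),
        ("wl-2", [0.1, 0.9], "Inadequate cleaning validation."),
    ],
    "internal-docs": [
        ("sop-7", [0.8, 0.2], "Audit trail review procedure."),
        ("sop-9", [0.0, 1.0], "Equipment cleaning procedure."),
    ],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(namespace: str, query_vec: list[float], top_k: int = 2) -> list[tuple]:
    """Rank one namespace by similarity to the query (stand-in for a vector store query)."""
    scored = [(cosine(query_vec, vec), doc_id, text)
              for doc_id, vec, text in INDEX[namespace]]
    return sorted(scored, reverse=True)[:top_k]


def cross_corpus_search(query_vec: list[float], top_k: int = 2) -> dict[str, list[tuple]]:
    """Query both namespaces concurrently and return the hits keyed by namespace."""
    with ThreadPoolExecutor(max_workers=len(INDEX)) as pool:
        futures = {ns: pool.submit(search, ns, query_vec, top_k) for ns in INDEX}
        return {ns: future.result() for ns, future in futures.items()}
```

Keeping the two result lists separate, rather than merging them into one ranking, is what lets a later step pair an enforcement passage with the internal section it bears on.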
Risk signal generation
Claude analyzes the enforcement patterns and document evidence to produce a structured signal: enforcement frequency, document coverage assessment, and a specific review prompt for the team.
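A structured signal of this kind could be validated with a small schema before it reaches the UI. The sketch below is only a guess at the shape: the field names, the three coverage levels, and the assumption that the model returns JSON are all hypothetical, chosen to mirror the three parts named above.

```python
import json
from dataclasses import dataclass

# Hypothetical coverage vocabulary; the page does not list the real levels.
COVERAGE_LEVELS = {"covered", "partial", "gap"}


@dataclass(frozen=True)
class RiskSignal:
    category: str
    enforcement_frequency: int  # e.g. number of letters citing this category
    coverage: str               # one of COVERAGE_LEVELS
    review_prompt: str          # the question handed to the quality team


def parse_signal(raw: str) -> RiskSignal:
    """Parse a JSON model response and reject values outside the expected schema."""
    data = json.loads(raw)
    signal = RiskSignal(
        category=str(data["category"]),
        enforcement_frequency=int(data["enforcement_frequency"]),
        coverage=str(data["coverage"]),
        review_prompt=str(data["review_prompt"]),
    )
    if signal.coverage not in COVERAGE_LEVELS:
        raise ValueError(f"unknown coverage level: {signal.coverage!r}")
    if signal.enforcement_frequency < 0:
        raise ValueError("enforcement_frequency must be non-negative")
    return signal
```

Rejecting malformed responses at this boundary means a bad generation surfaces as one failed category instead of a broken results page.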
Stream results in real time
Signals appear as they complete — 10 categories processed in parallel batches, streamed via SSE so users see results progressively rather than waiting for the full scan.
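The batching and streaming behavior might look like the following sketch, which runs each batch concurrently and emits one Server-Sent Events frame per finished category. The `analyze` coroutine is a placeholder for the real retrieval and generation work, and the event names and `batch_size` default are assumptions.

```python
import asyncio
import json
from typing import AsyncIterator


def sse_event(event: str, data: dict) -> str:
    """Format one SSE frame: an event name, a JSON data line, and a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def analyze(category: str) -> dict:
    """Placeholder for per-category retrieval and signal generation."""
    await asyncio.sleep(0)
    return {"category": category, "status": "done"}


async def stream_signals(categories: list[str], batch_size: int = 5) -> AsyncIterator[str]:
    """Process categories in concurrent batches, yielding each result as soon as it finishes."""
    for start in range(0, len(categories), batch_size):
        batch = categories[start:start + batch_size]
        tasks = [asyncio.create_task(analyze(c)) for c in batch]
        for finished in asyncio.as_completed(tasks):
            yield sse_event("signal", await finished)
    yield sse_event("done", {"count": len(categories)})
```

A web framework would write each yielded string to a response with the `text/event-stream` content type. Results within a batch arrive in completion order, not list order, so each payload carries its own category name.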
Architecture
Who uses it and why
Same engine. Different use cases.
The two-corpus RAG architecture adapts to any domain where external reference data needs to be cross-referenced against internal documents.
| | Pharma Intelligence | Compliance Copilot | Rules Expert |
|---|---|---|---|
| Document corpus | FDA warning letters + quality SOPs | 21 CFR Part 11 + policy documents | 2023 Rules of Golf |
| Retrieval | Two-namespace cross-corpus | Single-namespace requirement matching | Hybrid vector + BM25 |
| Output | Risk signals with coverage assessment | Gap analysis with requirement status | Cited rule answers |
| External integration | Box JWT connector | Static document upload | None |
| Streaming | SSE (signal-by-signal) | SSE (requirement-by-requirement) | UI message stream |
Frequently asked questions
What does the Pharma Intelligence Copilot do?
It analyzes FDA warning letters and cross-references them against a company's internal quality documents, surfacing where the issues regulators are citing elsewhere might apply to you — with the supporting citations and enforcement-trend context.
What is two-corpus RAG?
Most retrieval systems search one body of documents. This system reasons across two at once — the external corpus of FDA warning letters and your internal quality documents — so it can connect an external enforcement pattern to the specific internal document it bears on. Both sides of every finding are cited.
Can this be adapted to our documents and regulators?
Yes. The demo uses FDA warning letters and a sample company's documents, but the same two-corpus architecture applies to other regulators, standards bodies, or external sources cross-referenced against your internal material. Discovery scopes the corpora and integrations.
How does it connect to where our documents live?
The demo integrates with Box to read documents from a real document store. We build against the systems your documents actually live in rather than requiring you to export and upload everything by hand.
Ready to explore?
Start with enforcement trends or run a full risk scan against Meridian's documents.