RAG Systems · Healthcare

Clinical Knowledge Base for 10,000+ Clinicians

Accurate, cited answers from 40 years of protocols

RAG · Healthcare · Enterprise · Knowledge Base
~7.6× improvement in retrieval precision (12% to 91%)
<2s average query response time
0 hallucinations in 6-month audit
The Challenge
40 Years of Protocols, Inaccessible in Practice

The hospital network had accumulated 40 years of clinical protocols, drug interaction databases, treatment guidelines, care pathway documents, and formulary updates. The corpus comprised 2.4 million documents totaling over 8 terabytes, housed across 14 separate systems with inconsistent metadata and no unified search. Some 12,000 clinicians (physicians, nurses, pharmacists, and clinical specialists) needed to access this information at the point of care. The existing solution was a SharePoint-based search that returned only keyword matches, which made it functionally unusable for clinical queries that are phrased in natural language and require semantic understanding.

The Hallucination Risk in Healthcare

The system could not hallucinate. In the hospital's threat model, a response that confidently cited an incorrect drug interaction or a superseded dosing protocol was categorically unacceptable. Unlike consumer AI applications where a wrong answer is an inconvenience, in a clinical setting a wrong answer is a patient safety event. This constraint shaped every architectural decision. The system needed to retrieve and cite, not to generate and infer. Every response needed source attribution with enough specificity — document name, section, page — that a clinician could verify the source in under 30 seconds if they chose to.

"A response that confidently cites an incorrect drug interaction is not an inconvenience — it is a patient safety event. The system could not hallucinate."

Our Solution
Hybrid Retrieval Architecture

We built a retrieval system that combines BM25 keyword search with dense semantic retrieval. The two result lists are merged with reciprocal rank fusion and then re-ranked by a cross-encoder trained on clinical query-document pairs labeled by clinical informaticists. The dual retrieval approach handles two distinct query types that clinical staff issue: keyword-anchored factual queries ("metformin contraindications CKD stage 3"), where BM25 excels, and natural language clinical reasoning queries ("what's the protocol for chest pain presentation in a patient with prior CABG on anticoagulation"), where dense retrieval is required. Fusing the two rankings lets each retriever cover the query type the other handles poorly, so neither query type degrades the other.
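To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion in Python. It is an illustration under stated assumptions, not the production implementation: the function name, the document IDs, and the constant k=60 (a commonly used default) are all assumed for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from the case study.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers for one query.
bm25_hits = ["doc_metformin_ckd", "doc_formulary_2021", "doc_renal_dosing"]
dense_hits = ["doc_renal_dosing", "doc_metformin_ckd", "doc_cabg_pathway"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still survives into the candidate set that the cross-encoder re-ranks.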

Citation Infrastructure

Every response includes structured citations: document title, version date, section heading, page number, and a direct link to the source document in the document management system. The citation is generated alongside the response — not as an afterthought, but as a first-class output of the retrieval pipeline. Clinicians can verify any response in under 30 seconds by clicking through to the source. The system's behavior in uncertain cases is to surface multiple relevant documents and let the clinician synthesize, rather than to generate a synthesized answer that obscures the source uncertainty.
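One way to treat citations as a first-class output is to make them part of the data type a response is built from, so an uncited passage cannot be rendered at all. The sketch below assumes hypothetical class names, field names, and a placeholder URL; only the five citation fields themselves come from the description above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    document_title: str
    version_date: date
    section_heading: str
    page_number: int
    source_url: str  # link into the document management system

    def render(self) -> str:
        return (
            f"{self.document_title} (version {self.version_date.isoformat()}), "
            f"section '{self.section_heading}', p. {self.page_number}, "
            f"{self.source_url}"
        )


@dataclass(frozen=True)
class CitedPassage:
    text: str  # verbatim excerpt from the source document
    citation: Citation


def build_response(passages):
    """Assemble a response from cited passages only."""
    if not passages:
        # Nothing retrievable to cite: the caller should escalate instead.
        raise ValueError("no cited passages available")
    return "\n\n".join(
        f"{p.text}\n  Source: {p.citation.render()}" for p in passages
    )


# Placeholder values for illustration only.
passage = CitedPassage(
    text="(excerpt from the source document)",
    citation=Citation(
        document_title="Example Renal Dosing Protocol",
        version_date=date(2023, 1, 15),
        section_heading="Dose adjustment",
        page_number=4,
        source_url="https://dms.example.org/documents/renal-dosing#page=4",
    ),
)
response = build_response([passage])
print(response)
```

Returning several cited passages side by side, rather than one merged answer, matches the behavior described for uncertain cases: the clinician sees each source and does the synthesis.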

Confidence Thresholds and Escalation

Queries that fall below a confidence threshold — measured by retrieval score distribution and cross-encoder confidence — are routed to a human clinical informatics team rather than returned with a low-confidence automated response. The threshold was calibrated by reviewing 400 queries where the system's confidence was borderline and having clinical informaticists label the correct handling. This escalation path is not a failure mode — it's a designed feature. In the system's first 6 months, 2.3% of queries were escalated. Clinician feedback on escalated queries was uniformly positive: they preferred a transparent "I'm not confident, here's a human" to a confidently wrong automated answer.
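The routing logic can be sketched as a small decision function. Everything specific here is an assumption made for illustration: the function and field names, the two threshold values, and the choice of "top score minus the mean of the rest" as the measure of the retrieval score distribution. The calibrated thresholds used in the deployed system are not given in this write-up.

```python
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    escalate: bool
    reason: str


def route_query(retrieval_scores, cross_encoder_scores,
                min_rerank_score=0.7, min_score_margin=0.05):
    """Decide whether to answer automatically or escalate to a human.

    Both thresholds are placeholder values, not the calibrated ones.
    """
    if not retrieval_scores or not cross_encoder_scores:
        return RoutingDecision(True, "no candidates retrieved")
    if max(cross_encoder_scores) < min_rerank_score:
        return RoutingDecision(True, "re-ranker confidence below threshold")
    ranked = sorted(retrieval_scores, reverse=True)
    rest = ranked[1:]
    mean_rest = sum(rest) / len(rest) if rest else 0.0
    # A flat score distribution means no document stands out from the rest.
    if ranked[0] - mean_rest < min_score_margin:
        return RoutingDecision(True, "retrieval scores too flat")
    return RoutingDecision(False, "confident")


clear_case = route_query([0.9, 0.4, 0.3], [0.95, 0.2, 0.1])
flat_case = route_query([0.5, 0.49, 0.48], [0.9, 0.8, 0.7])
weak_case = route_query([0.9, 0.2], [0.4, 0.1])
print(clear_case, flat_case, weak_case, sep="\n")
```

Keeping the reason string alongside the decision gives the informatics team context on why a query reached them, and makes the borderline cases easy to collect for later recalibration.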

"2.3% of queries were escalated to human review in the first 6 months. Clinicians preferred transparent uncertainty to confident errors — every time."

Results
Retrieval Precision and Speed

Precision at k=5 (did the top five retrieved results contain the information needed to answer the query?) improved from 12% with the previous SharePoint system to 91% with the hybrid retrieval system, a roughly 7.6× improvement in the metric that most directly predicts clinical utility. Average query response time is 1.8 seconds end-to-end, including retrieval, re-ranking, and response generation. Clinicians report that this is fast enough to use at the point of care without disrupting workflow; they had set that threshold at under 3 seconds in user research conducted before the engagement.
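As defined above, the metric asks one yes/no question per query (did the top five results contain what was needed?) and averages the answers. A minimal sketch of that calculation follows; the function name, the query and document IDs, and the toy labels are all hypothetical and are not the evaluation data from this engagement.

```python
def hit_rate_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose top-k results include a relevant document.

    retrieved: dict mapping query ID -> ranked list of document IDs
    relevant:  dict mapping query ID -> set of documents that answer it
    """
    if not retrieved:
        return 0.0
    hits = 0
    for query_id, ranked_docs in retrieved.items():
        wanted = relevant.get(query_id, set())
        if any(doc in wanted for doc in ranked_docs[:k]):
            hits += 1
    return hits / len(retrieved)


# Toy labels for illustration only.
retrieved = {
    "q1": ["d3", "d7", "d1"],
    "q2": ["d9", "d8", "d2", "d4", "d6", "d5"],
    "q3": ["d4", "d2"],
}
relevant = {"q1": {"d1"}, "q2": {"d5"}, "q3": {"d2", "d6"}}

score = hit_rate_at_k(retrieved, relevant, k=5)
print(f"{score:.2f}")
```

In the toy data, q2's only relevant document sits at rank 6, so it counts as a miss at k=5 even though the retriever did find it.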

The Zero Hallucination Audit

The hospital's clinical informatics team conducted a structured audit over 6 months: 3,200 system responses were reviewed by clinical staff against source documents. Zero fabricated facts were identified. In every case, the system's response was either directly grounded in a retrieved document or was an escalation to human review. This outcome reflects the architectural decision to retrieve-and-cite rather than generate-and-synthesize. The system does not produce clinical conclusions — it surfaces relevant clinical documentation and attributes it accurately. Clinical judgment remains with the clinician.

Implementation
Process & Timeline
01 · Corpus Audit & Indexing
Catalogued 2.4M documents across 14 systems. Built normalization pipeline and metadata enrichment. Established document versioning and freshness tracking.

02 · Retrieval Pipeline
Built BM25 and dense retrieval systems, reciprocal rank fusion layer, and cross-encoder re-ranker. Training data for re-ranker built with clinical informaticists.

03 · Citation System
Built source attribution pipeline, document link resolution, and clinician-facing citation display. Extensive UX testing with clinical staff.

04 · Confidence & Escalation
Calibrated confidence thresholds with clinical review of borderline cases. Built escalation routing and human informatics team workflow integration.

05 · Epic Integration & Rollout
Integrated query interface into Epic EHR as a contextual panel. Phased rollout by department, starting with pharmacy and ICU.
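The document versioning mentioned in step 01 matters because a superseded protocol must never be served as current. A minimal sketch of selecting the current version of each document before indexing is shown below. The record fields and function name are assumptions for illustration; the real metadata schema is not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocumentVersion:
    doc_id: str        # stable identifier shared by all versions
    version_date: date
    title: str


def current_versions(versions):
    """Keep only the most recent version of each document."""
    latest = {}
    for v in versions:
        best = latest.get(v.doc_id)
        if best is None or v.version_date > best.version_date:
            latest[v.doc_id] = v
    return latest


# Hypothetical records for illustration only.
catalogue = [
    DocumentVersion("protocol-17", date(2015, 3, 1), "Example Protocol v1"),
    DocumentVersion("protocol-17", date(2022, 9, 12), "Example Protocol v2"),
    DocumentVersion("guideline-4", date(2019, 6, 30), "Example Guideline"),
]

to_index = current_versions(catalogue)
print(sorted(v.title for v in to_index.values()))
```

Older versions can still be retained in storage for audit purposes; the point of the filter is only that the retrieval index sees one current version per document.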
Technology Stack
BM25 (Elasticsearch) · OpenAI text-embedding-3-large · Cross-Encoder Re-ranker · Python · FastAPI · Elasticsearch · PostgreSQL · AWS · React · Epic EHR Integration · Prometheus