Healthcare AI · Retrieval

RAG clinical intelligence that actually reads the rules.

A production retrieval pipeline that answers domain-specific clinical queries across three distinct reimbursement models — the kind of problem generic RAG quietly fails at.

Role

PM & builder

Client

Pillbox Health

Timeline

2025 — present

Stack

Python · ChromaDB · Mistral 7B

[ HERO IMAGE / SYSTEM DIAGRAM — 1080 × 420 ]

01 /Context

Pillbox Health operates an RTM (Remote Therapeutic Monitoring) platform reimbursed under three separate CPT billing models. Providers needed fast, accurate answers about [...]

Off-the-shelf RAG returned plausible-sounding text that was often wrong on the fine print. Fine print is the whole business.

02 /Approach

Hybrid retrieval + cross-encoder reranking + CPT metadata boosting. Then — only then — the LLM.

I built a hybrid pipeline that combines lexical (BM25) and semantic (vector) retrieval, reranks with a cross-encoder, and boosts chunks whose CPT-code metadata matches the query. A local [...]

# pipeline query → BM25 ┐ → Vector ┤→ hybrid merge → Rerank → CPT boost → Mistral 7B → Metadata┘

03 /Decisions

1Local model, not API. PHI-adjacent data meant no external calls by default.
2Hybrid over pure vector. CPT codes are lexical by nature — exact-match recall matters.
3Metadata boosting beats prompt tricks. We encoded billing-model info into chunk metadata and let retrieval do the work.
4Audit log on every answer. Traceability is non-negotiable in healthcare.

04 /Outcome

The system ships clinical answers grounded in the right billing model, with a citation trail providers can inspect. Document scrubbing and audit logging satisfied internal compliance rev[...]

reimbursement models covered end-to-end

Local

Mistral 7B — no external API calls

Audit

Every answer traceable to source chunks

05 /What I learned

Most "RAG in production" war stories are really retrieval stories. The LLM is the last 10% — the first 90% is getting the right three chunks in front of it. For a domain like this, met[...]

As a PM, the interesting work was defining what "grounded" meant for this product, and getting clinicians to agree on acceptance criteria before we wrote the retrieval code.