Case Study2026-03-30 · 5 min

LocalRx — Offline-First Medical Reference for Low-Connectivity Clinics

A deterministic search engine for healthcare workers where the internet isn't a given. Hybrid retrieval, server-side filters, no LLM in the response path, zero network calls during query. Built for the Actian VectorAI DB Build Challenge.

Problem

WHO data: 50% of the global population lacks reliable access to essential health services. The connectivity story tracks: rural clinics may have intermittent internet at best. Cloud-hosted medical references — UpToDate, BMJ Best Practice — are subscription-walled and require always-on connectivity. When the internet drops, the reference drops with it.

Constraint

Zero network calls in the query path. The system must boot, ingest, and serve search entirely from one machine.
Deterministic, not generative. No LLM hallucinating dosages. Every result traceable to a verified WHO record.
Sub-second latency. Clinical context, not casual browsing.
One-command setup. A nurse or trainer should be able to bring it up with docker compose up on an ARM Raspberry Pi or an x86 laptop.

Decisions

Actian VectorAI DB as the core engine, not a sidecar

The DB does the heavy lifting: vector storage, BM25-style keyword search, RRF fusion, and metadata filters — all in one API call, all server-side, all in a Docker container. The FastAPI backend is a ~200 LOC orchestration layer. Most vector DBs would have forced maintaining a separate BM25 index and fusing in application code.

Hybrid retrieval with Reciprocal Rank Fusion (k=60)

Semantic alone fails on exact drug names. Keyword alone fails on conceptual symptom descriptions. RRF over both signals is robust to score scale mismatch (cosine [0,1] vs BM25 [0,∞]) because it operates on ranks, not raw scores. For medical retrieval where a single strong keyword match can't be allowed to dominate the ranking, that property is load-bearing.

Server-side filters with a typed builder

Six filter dimensions — category, specialty, age_group, risk_level, region, availability — applied during the search inside the DB engine, not as a Python post-filter. availability is the killer feature: a clinic configures what's actually on the shelf, and diagnoses requiring unavailable medicines don't rank.

Sentence-transformers MiniLM-L6-v2, CPU only

384-dimensional embeddings, ~80MB model, ~50ms per query on CPU. No GPU required. Loaded once at startup, kept warm.

Outcome

Four collections, 70 verified records: drugs (30 WHO Essential Medicines), guidelines (15 clinical protocols), conditions (15 disease descriptions), interactions (10 critical drug interactions).
~12ms total query time after warmup.
Frontend renders ranked result cards with score breakdown (RRF, semantic, keyword) so the clinician can interrogate the ranking — not a black box.
Ships as a Docker compose stack: VectorAI DB container + FastAPI app. Auto-ingests on first startup.

Lessons

Native fusion changes the architecture. Choosing a DB that does RRF natively collapsed an entire application-layer concern. Maintaining two parallel indices and fusing in Python would have been 3x the code and a much slower critical path.
The right anti-goal sharpens the design. Saying “this is not a chatbot” up front forced every design decision toward traceability and ranking transparency.
Surface the score breakdown. Showing per-signal scores in the UI builds trust in a way that “here's an answer” can't. Clinical users want to see why a result ranked where it did.