# ADR-028: eHealth Platform Architecture for 50M Patient Records
**Status**: Proposed
**Date**: 2026-02-10
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-10 | ruv.io | Initial architecture proposal |
---
## Context
### The 50-Million Patient Data Challenge
Healthcare systems face a convergence of demands that push conventional database architectures past their breaking point. A national-scale eHealth platform serving **50 million patient records** at **2,000+ requests per second** must satisfy four simultaneous pressures:
| Pressure | Requirement | Challenge |
|----------|-------------|-----------|
| **Volume** | 50M patients × ~20 encounters each × clinical notes, labs, meds, claims | Terabyte-scale vector storage with sub-100ms search |
| **Velocity** | 2,000+ RPS sustained, 5,000+ burst during open enrollment | Real-time hybrid search across structured + unstructured data |
| **Variety** | FHIR R4, HL7v2, X12 837/835, SNOMED CT, ICD-10, LOINC, RxNorm, free-text notes | Unified semantic layer across heterogeneous ontologies |
| **Regulatory** | HIPAA 45 CFR §164, HITECH Act, state privacy laws, audit trail mandates | Every query, mutation, and access must be cryptographically auditable |
### Current State Limitations
Existing eHealth platforms rely on a patchwork of specialized systems, each introducing HIPAA surface area and operational complexity:
| Capability | Current Approach | Limitation |
|------------|-----------------|------------|
| **Patient Matching** | Probabilistic MPI with string similarity | No semantic understanding of clinical context; 3-8% false match rate |
| **Clinical Decision Support** | Rule-based engines with manual knowledge bases | Cannot leverage unstructured notes; knowledge decay within months |
| **Ontology Mapping** | Lookup tables for SNOMED↔ICD-10 crosswalks | No hierarchical reasoning; misses partial matches and concept drift |
| **Fraud Detection** | Batch-mode statistical models with 48-72hr lag | Cannot detect real-time provider network fraud patterns |
| **Interoperability** | Point-to-point interfaces per trading partner | O(n²) integration complexity; no semantic normalization |
| **Clinical Note Search** | Keyword-based full-text search | Misses semantic synonyms ("heart attack" vs "MI" vs "STEMI") |
| **Patient Similarity** | Cohort matching on demographics only | Ignores clinical trajectory, medication patterns, comorbidity graphs |
| **Compliance Audit** | Append-only log tables with manual review | No anomaly detection; audit lag measured in days |
### Why RuVector-Postgres
RuVector-Postgres provides a **single unified engine** that collapses this multi-system stack into one PostgreSQL extension:
| RuVector Capability | Replaces | HIPAA Benefit |
|---------------------|----------|---------------|
| `ruvector(384)` vector type + HNSW | Separate vector DB (Pinecone, Weaviate) | One fewer system in BAA scope |
| `sparsevec(50000)` + BM25 scoring | Elasticsearch/Solr for full-text | Eliminates data replication to search cluster |
| `ruvector_sparql()` + RDF triple store | Dedicated triple store (Blazegraph, Stardog) | Ontology data stays in-database |
| `ruvector_poincare_distance()` | Custom Python microservice for hierarchy | No data leaves PostgreSQL |
| `ruvector_gcn_forward()` / `ruvector_graphsage_forward()` | External GNN service (PyG, DGL) | In-database ML eliminates data export |
| `ruvector_hybrid_search()` with RRF fusion | Application-level result merging | Search logic auditable as SQL |
| `ruvector_tenant_create()` + RLS | Custom multi-tenancy middleware | Row-level security enforced by database kernel |
| `ruvector_healing_worker_start()` | Manual DBA intervention + alerting | Self-healing reduces MTTR from hours to seconds |
| `ruvector_flash_attention()` | GPU-based attention service | CPU-native attention for clinical note analysis |
| Coherence Engine (ADR-014) | No equivalent | Structural consistency detection for clinical data |
**Single-engine compliance**: one BAA, one encryption boundary, one audit log, one backup strategy.
---
## Decision
### Adopt RuVector-Postgres as the Unified eHealth Data Platform
We implement the eHealth platform as a layered architecture with RuVector-Postgres as the sole data engine:
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ EXTERNAL INTERFACES │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ FHIR R4 │ │ HL7v2 │ │ X12 EDI │ │ Patient Portal │ │
│ │ Gateway │ │ Gateway │ │ 837/835 │ │ (OAuth 2.0) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │
│ │ │ │ │ │
├─────────┴─────────────────┴─────────────────┴────────────────────┴──────────────┤
│ APPLICATION SERVICES │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ CDS Engine │ │ Claims │ │ Patient │ │ Analytics & │ │
│ │ (RAG-based) │ │ Adjudicator │ │ Matching │ │ Population Hlth │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │
│ │ │ │ │ │
├─────────┴─────────────────┴─────────────────┴────────────────────┴──────────────┤
│ RUVECTOR-POSTGRES ENGINE │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Hybrid │ │ Graph + │ │ Hyperbolic│ │ GNN │ │ Coherence │ │
│ │ Search │ │ SPARQL │ │ Embeddings│ │ Layers │ │ Engine │ │
│ │ (BM25+Vec)│ │ (RDF) │ │ (Poincaré)│ │ (GCN/GAT) │ │ (Sheaf) │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Attention │ │ Multi- │ │ Self- │ │ HNSW │ │ Tiered │ │
│ │ Operators │ │ Tenancy │ │ Healing │ │ Indexing │ │ Quantize │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ │
├─────────────────────────────────────────────────────────────────────────────────┤
│ POSTGRESQL CLUSTER │
│ │
│ Primary (Read/Write) │ Sync Standby │ Async Replica 1 │ Async Replica 2│
│ RuVector Extension │ Hot Standby │ Read-Only CDS │ Read-Only Anlyt│
└─────────────────────────────────────────────────────────────────────────────────┘
```
### Key Architectural Decisions
| # | Decision | Choice | Rationale |
|---|----------|--------|-----------|
| 1 | Embedding model | BioClinicalBERT 384-dim | Pre-trained on MIMIC-III + PubMed; 384-dim balances recall vs storage |
| 2 | Vector index | HNSW (m=24, ef_construction=200) | Sub-10ms ANN at 50M scale; m=24 optimizes for medical recall >0.95 |
| 3 | Ontology store | RuVector RDF triple store (`ruvector_create_rdf_store`) | In-database SPARQL eliminates external triple store from HIPAA scope |
| 4 | Hierarchy model | Poincaré ball (`ruvector_poincare_distance`) 32-dim | Hyperbolic space preserves ICD-10/SNOMED tree depth with low distortion |
| 5 | Pathway engine | GCN via `ruvector_gcn_forward` | Clinical pathways as message-passing over patient-encounter-diagnosis graph |
| 6 | Search strategy | Hybrid BM25+vector via `ruvector_hybrid_search` with RRF fusion | Captures both exact medical terminology and semantic similarity |
| 7 | Patient similarity | GraphSAGE via `ruvector_graphsage_forward` | Inductive: generalizes to new patients without full re-embedding |
| 8 | Fraud detection | GAT attention scores via `ruvector_flash_attention` on claims graph | Attention weights reveal anomalous provider-billing relationships |
| 9 | Disagreement detection | Coherence Engine sheaf Laplacian (ADR-014) | Structural consistency detects medication-diagnosis contradictions |
| 10 | Tenancy model | Shared isolation + RLS via `ruvector_enable_tenant_rls` | Per-payer isolation with healthcare org hierarchy support |
| 11 | Rate limiting | Token bucket via `ruvector_tenant_quota_check` | Per-tenant QPS limits prevent noisy-neighbor across health plans |
---
## Data Architecture
### Core Schema
The platform uses six primary tables, each with dense vector embeddings for semantic search and additional specialized vector types where needed.
#### 1. Patients Table
```sql
CREATE TABLE patients (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
mrn TEXT NOT NULL, -- Medical Record Number
fhir_id TEXT UNIQUE, -- FHIR Patient resource ID
demographics JSONB NOT NULL, -- name, dob, gender, address, contact
identifiers JSONB, -- SSN hash, insurance IDs, MPI links
-- Dense embedding: BioClinicalBERT over demographics + clinical summary
embedding ruvector(384) NOT NULL,
-- Hyperbolic embedding: position in patient taxonomy hierarchy
hierarchy_embed ruvector(32), -- Poincaré ball for risk stratification
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- HNSW index for patient similarity search
CREATE INDEX idx_patients_embedding ON patients
USING hnsw (embedding ruvector_cosine_ops)
WITH (m = 24, ef_construction = 200);
-- Row-level security for per-tenant (healthcare org) isolation
ALTER TABLE patients ENABLE ROW LEVEL SECURITY;
-- Applied via: SELECT ruvector_enable_tenant_rls('patients', 'tenant_id');
```
#### 2. Encounters Table
```sql
CREATE TABLE encounters (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
patient_id BIGINT REFERENCES patients(id),
fhir_id TEXT UNIQUE,
encounter_type TEXT NOT NULL, -- inpatient, outpatient, emergency, telehealth
status TEXT NOT NULL, -- planned, arrived, in-progress, finished
period_start TIMESTAMPTZ NOT NULL,
period_end TIMESTAMPTZ,
diagnoses JSONB, -- array of {code, system, display, rank}
procedures JSONB, -- array of {code, system, display}
providers JSONB, -- attending, consulting, referring
facility_id TEXT,
-- Dense embedding: encounter narrative + diagnoses + procedures
embedding ruvector(384) NOT NULL,
-- Sparse embedding: BM25-compatible term vector for diagnosis code search
terms sparsevec(50000),
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_encounters_embedding ON encounters
USING hnsw (embedding ruvector_cosine_ops)
WITH (m = 24, ef_construction = 200);
CREATE INDEX idx_encounters_patient ON encounters (patient_id, period_start DESC);
```
#### 3. Clinical Notes Table
```sql
CREATE TABLE clinical_notes (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
encounter_id BIGINT REFERENCES encounters(id),
patient_id BIGINT REFERENCES patients(id),
note_type TEXT NOT NULL, -- progress, discharge, consult, operative, pathology
author_id TEXT NOT NULL,
author_role TEXT NOT NULL, -- physician, nurse, specialist
chunk_index INT NOT NULL DEFAULT 0, -- for notes split into embedding chunks
chunk_text TEXT NOT NULL, -- the actual chunk content (512-token window)
-- Dense embedding: BioClinicalBERT over chunk_text
embedding ruvector(384) NOT NULL,
-- Sparse embedding: BM25 term frequencies for keyword search
terms sparsevec(50000),
signed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_notes_embedding ON clinical_notes
USING hnsw (embedding ruvector_cosine_ops)
WITH (m = 24, ef_construction = 200);
CREATE INDEX idx_notes_patient ON clinical_notes (patient_id, created_at DESC);
CREATE INDEX idx_notes_encounter ON clinical_notes (encounter_id);
```
#### 4. Medications Table
```sql
CREATE TABLE medications (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
patient_id BIGINT REFERENCES patients(id),
encounter_id BIGINT REFERENCES encounters(id),
rxnorm_code TEXT NOT NULL,
ndc_code TEXT,
drug_name TEXT NOT NULL,
dosage JSONB, -- {value, unit, frequency, route}
status TEXT NOT NULL, -- active, completed, stopped, on-hold
prescriber_id TEXT NOT NULL,
start_date DATE NOT NULL,
end_date DATE,
-- Dense embedding: drug profile + indication + patient context
embedding ruvector(384) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_meds_embedding ON medications
USING hnsw (embedding ruvector_cosine_ops)
WITH (m = 24, ef_construction = 200);
CREATE INDEX idx_meds_patient ON medications (patient_id, status, start_date DESC);
CREATE INDEX idx_meds_rxnorm ON medications (rxnorm_code);
```
#### 5. Lab Results Table
```sql
CREATE TABLE lab_results (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
patient_id BIGINT REFERENCES patients(id),
encounter_id BIGINT REFERENCES encounters(id),
loinc_code TEXT NOT NULL,
test_name TEXT NOT NULL,
value_numeric NUMERIC,
value_text TEXT,
unit TEXT,
reference_range TEXT,
interpretation TEXT, -- normal, abnormal, critical
-- Dense embedding: test + result + clinical context
embedding ruvector(384) NOT NULL,
collected_at TIMESTAMPTZ NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_labs_embedding ON lab_results
USING hnsw (embedding ruvector_cosine_ops)
WITH (m = 24, ef_construction = 200);
CREATE INDEX idx_labs_patient ON lab_results (patient_id, collected_at DESC);
CREATE INDEX idx_labs_loinc ON lab_results (loinc_code);
```
#### 6. Claims Table
```sql
CREATE TABLE claims (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
patient_id BIGINT REFERENCES patients(id),
encounter_id BIGINT REFERENCES encounters(id),
claim_type TEXT NOT NULL, -- professional, institutional, pharmacy
status TEXT NOT NULL, -- submitted, pending, adjudicated, denied, paid
payer_id TEXT NOT NULL,
provider_id TEXT NOT NULL,
facility_id TEXT,
service_date DATE NOT NULL,
billed_amount NUMERIC(12,2) NOT NULL,
allowed_amount NUMERIC(12,2),
paid_amount NUMERIC(12,2),
diagnosis_codes JSONB, -- [{code, system, pointer}]
procedure_codes JSONB, -- [{code, system, modifier}]
-- Dense embedding: claim profile for fraud/similarity detection
embedding ruvector(384) NOT NULL,
submitted_at TIMESTAMPTZ NOT NULL,
adjudicated_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_claims_embedding ON claims
USING hnsw (embedding ruvector_cosine_ops)
WITH (m = 24, ef_construction = 200);
CREATE INDEX idx_claims_patient ON claims (patient_id, service_date DESC);
CREATE INDEX idx_claims_provider ON claims (provider_id, service_date DESC);
CREATE INDEX idx_claims_status ON claims (status, submitted_at DESC);
```
### Capacity Calculations
| Table | Rows (50M patients) | Dense Vector Size | Sparse / Auxiliary Size | Raw Total |
|-------|---------------------|-------------------|-------------------------|-----------|
| `patients` | 50M | 384×4B = 1.5KB | 32×4B = 128B (hierarchy embedding) | ~81 GB |
| `encounters` | 1B (20/patient) | 1.5KB | ~400B avg | ~1.9 TB |
| `clinical_notes` | 5B (5 chunks/encounter avg) | 1.5KB | ~400B avg | ~9.5 TB |
| `medications` | 500M (10/patient) | 1.5KB | — | ~750 GB |
| `lab_results` | 2B (40/patient) | 1.5KB | — | ~3.0 TB |
| `claims` | 2B (40/patient) | 1.5KB | — | ~3.0 TB |
| **Total Raw** | **~10.55B rows** | | | **~18.2 TB** |
**With Metadata + Indexes**: ~25.5 TB (1.4× overhead for HNSW graphs + B-tree indexes + TOAST).
**With Tiered Quantization** (see Scaling Strategy section):
- Hot tier (recent 2 years): f32 → ~7 TB
- Warm tier (2-5 years): SQ8 → ~1.5 TB (4× compression)
- Cool tier (5-7 years): PQ → ~600 GB (16× compression)
- Cold tier (7+ years): Binary → ~200 GB (32× compression)
- **Total after quantization: ~9.3 TB**
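The totals above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it uses the rounded per-row sizes from the capacity table (1 KB = 1,000 B) and copies the tier sizes from the list rather than deriving them. The names `TABLES` and `TIERS_TB` are made up for this example.

```python
# Per-row sizes in KB (1 KB = 1,000 B), rounded as in the capacity table:
# a 384-dim float32 vector is 1,536 B, which the table rounds to 1.5 KB.
DENSE_KB = 1.5

# table name -> (row count, extra KB per row beyond the dense embedding)
TABLES = {
    "patients": (50_000_000, 0.125),        # 32-dim float32 hierarchy embedding
    "encounters": (1_000_000_000, 0.4),     # average sparse term vector
    "clinical_notes": (5_000_000_000, 0.4),
    "medications": (500_000_000, 0.0),
    "lab_results": (2_000_000_000, 0.0),
    "claims": (2_000_000_000, 0.0),
}


def raw_terabytes(rows: int, extra_kb: float) -> float:
    """Raw vector storage for one table in TB (1 TB = 1e9 KB)."""
    return rows * (DENSE_KB + extra_kb) / 1e9


total_rows = sum(rows for rows, _ in TABLES.values())
total_raw_tb = sum(raw_terabytes(rows, extra) for rows, extra in TABLES.values())

# Post-quantization tier sizes in TB, copied from the list above (not derived).
TIERS_TB = {"hot_f32": 7.0, "warm_sq8": 1.5, "cool_pq": 0.6, "cold_binary": 0.2}
quantized_total_tb = sum(TIERS_TB.values())

if __name__ == "__main__":
    print(f"rows: {total_rows / 1e9:.2f}B")
    print(f"raw vectors: {total_raw_tb:.1f} TB")
    print(f"after quantization: {quantized_total_tb:.1f} TB")
```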
### QPS Budget
| Operation | Target RPS | Latency p99 | Cluster Capacity |
|-----------|-----------|-------------|------------------|
| Vector search (HNSW k=10) | 800 | <15ms | 4,000/node × 4 = 16,000 |
| Hybrid search (BM25+vec) | 400 | <30ms | 2,000/node × 4 = 8,000 |
| SPARQL ontology lookup | 200 | <20ms | 3,000/node × 4 = 12,000 |
| GNN forward pass | 100 | <50ms | 500/node × 4 = 2,000 |
| Claims adjudication | 300 | <100ms | 1,000/node × 4 = 4,000 |
| Writes (encounters, notes) | 200 | <50ms | Primary only: 2,000 |
| **Total** | **2,000** | | **Headroom: 7.5×** |
A 4-node cluster (1 primary + 1 sync standby + 2 async read replicas) provides **~15,000 QPS read capacity**, delivering 7.5× headroom over the 2,000 RPS requirement.
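As a quick sanity check on the budget, the following sketch recomputes the total target load, the overall headroom, and the per-operation utilization from the figures in the table. The capacity numbers are taken as stated, not measured, and the dictionary keys are hypothetical labels.

```python
# operation -> (target requests/second, stated cluster capacity in requests/second)
BUDGET = {
    "vector_search": (800, 16_000),
    "hybrid_search": (400, 8_000),
    "sparql_lookup": (200, 12_000),
    "gnn_forward": (100, 2_000),
    "claims_adjudication": (300, 4_000),
    "writes": (200, 2_000),  # primary only
}

STATED_READ_CAPACITY = 15_000  # aggregate figure quoted for the 4-node cluster


def utilization(target: int, capacity: int) -> float:
    """Fraction of the stated capacity consumed by the target load."""
    return target / capacity


total_target = sum(target for target, _ in BUDGET.values())
headroom = STATED_READ_CAPACITY / total_target

# The operation closest to its own ceiling is the first to watch under burst load.
tightest = max(BUDGET, key=lambda op: utilization(*BUDGET[op]))

if __name__ == "__main__":
    print(f"total target: {total_target} RPS, headroom: {headroom:.1f}x")
    for op, (target, capacity) in BUDGET.items():
        print(f"{op:20s} {utilization(target, capacity):6.1%}")
    print(f"tightest operation: {tightest}")
```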
---
## Semantic Interoperability Layer
### Medical Ontology RDF Store
The platform loads four core medical ontologies into RuVector's in-database RDF triple store, enabling SPARQL-based cross-mapping without external services.
```sql
-- Initialize the medical ontology store
SELECT ruvector_create_rdf_store('medical_ontologies');
```
| Ontology | Size | Purpose |
|----------|------|---------|
| SNOMED CT | ~1.5M concepts, ~5M relationships → ~15M triples | Clinical terminology master |
| ICD-10-CM | ~72K codes, ~150K relationships → ~500K triples | Diagnosis coding for billing |
| LOINC | ~98K terms, ~300K relationships → ~900K triples | Laboratory test identification |
| RxNorm | ~120K concepts, ~500K relationships → ~15M triples | Drug normalization |
| **Total** | **~31.4M triples** | |
#### Loading Ontologies
```sql
-- Load SNOMED CT from N-Triples export
SELECT ruvector_load_ntriples('medical_ontologies', pg_read_file('/data/ontologies/snomed_ct.nt'));
-- Load ICD-10 mappings
SELECT ruvector_load_ntriples('medical_ontologies', pg_read_file('/data/ontologies/icd10cm.nt'));
-- Load LOINC
SELECT ruvector_load_ntriples('medical_ontologies', pg_read_file('/data/ontologies/loinc.nt'));
-- Load RxNorm
SELECT ruvector_load_ntriples('medical_ontologies', pg_read_file('/data/ontologies/rxnorm.nt'));
-- Verify loaded data
SELECT ruvector_rdf_stats('medical_ontologies');
```
#### SPARQL Cross-Mapping Queries
**Map SNOMED CT diagnosis to ICD-10 billing code:**
```sql
SELECT ruvector_sparql_json('medical_ontologies', '
PREFIX snomed:
PREFIX icd10:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?icd10_code ?icd10_label
WHERE {
snomed:22298006 skos:exactMatch ?icd10_concept .
?icd10_concept skos:notation ?icd10_code .
?icd10_concept skos:prefLabel ?icd10_label .
}
');
-- Maps SNOMED "Myocardial infarction" (22298006) → ICD-10 "I21.9"
```
**Find all drugs in a therapeutic class with contraindications:**
```sql
SELECT ruvector_sparql_json('medical_ontologies', '
PREFIX rxn:
PREFIX ndfrt:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?drug ?drug_name ?contraindication
WHERE {
?drug rxn:tty "SCD" .
?drug rxn:str ?drug_name .
?drug ndfrt:has_contraindicated_class ?contra_class .
?contra_class skos:prefLabel ?contraindication .
?drug rxn:ingredient ?ingredient .
?ingredient skos:prefLabel "metformin"@en .
}
');
```
**Traverse SNOMED hierarchy to find all children of a concept:**
```sql
SELECT ruvector_sparql_json('medical_ontologies', '
PREFIX snomed:
PREFIX sct:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?child ?label
WHERE {
?child sct:is_a+ snomed:73211009 .
?child skos:prefLabel ?label .
}
LIMIT 100
');
-- Finds all descendants of "Diabetes mellitus" (73211009)
```
### Poincaré Ball Hyperbolic Embeddings for Ontology Hierarchy
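Medical ontologies are fundamentally hierarchical (ICD-10 is a tree, SNOMED CT is a DAG). Low-dimensional Euclidean embeddings distort tree structure because the number of nodes grows exponentially with depth while Euclidean volume grows only polynomially with radius. **Poincaré ball embeddings** avoid this: hyperbolic volume grows exponentially with radius, so parent-child distances are preserved with low distortion in few dimensions.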
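For intuition, here is a minimal pure-Python sketch of the two standard operations on the unit Poincaré ball (curvature -1): geodesic distance and Möbius addition. It illustrates the textbook formulas only and is not a description of how `ruvector_poincare_distance` or `ruvector_mobius_add` are implemented; the sample points are invented.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit ball."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - _dot(u, u)) * (1.0 - _dot(v, v))
    return math.acosh(1.0 + 2.0 * diff_sq / denom)


def mobius_add(u, v):
    """Möbius addition, the hyperbolic analogue of vector translation."""
    uv, uu, vv = _dot(u, v), _dot(u, u), _dot(v, v)
    denom = 1.0 + 2.0 * uv + uu * vv
    coef_u = (1.0 + 2.0 * uv + vv) / denom
    coef_v = (1.0 - uu) / denom
    return [coef_u * a + coef_v * b for a, b in zip(u, v)]


# Invented 2-D points: a root at the origin and two leaves near the boundary.
root = [0.0, 0.0]
leaf_a = [0.9, 0.0]
leaf_b = [0.0, 0.9]

if __name__ == "__main__":
    # Leaf-to-leaf distance is close to the sum of the two root-to-leaf
    # distances, which is the tree-like behaviour hierarchies need.
    print(poincare_distance(root, leaf_a))
    print(poincare_distance(leaf_a, leaf_b))
    print(mobius_add(leaf_a, [0.05, 0.05]))
```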
```sql
-- Embed ICD-10 codes in 32-dim Poincaré ball
-- Parent codes are closer to origin, leaf codes at boundary
-- Distance preserves hierarchical depth
-- Compute hyperbolic distance between two ICD-10 concepts
SELECT ruvector_poincare_distance(
icd10_a.hierarchy_embed,
icd10_b.hierarchy_embed
) AS hierarchical_distance
FROM icd10_embeddings icd10_a, icd10_embeddings icd10_b
WHERE icd10_a.code = 'I21' -- Acute myocardial infarction (parent)
AND icd10_b.code = 'I21.01'; -- STEMI involving left main coronary artery (descendant of I21)
-- Expected: small distance (ancestor-descendant)
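-- Möbius addition for concept composition in hyperbolic space
-- (self-join picks one row per concept; the hierarchy_embed column name is
-- assumed to match icd10_embeddings)
SELECT ruvector_mobius_add(
    d.hierarchy_embed,
    r.hierarchy_embed
) AS composed_concept
FROM concept_embeddings d, concept_embeddings r
WHERE d.code = 'E11' AND r.code = 'H35.0';
-- Combines "Type 2 Diabetes" + "Retinopathy" → diabetic retinopathy region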
-- Map between coordinate systems for different algorithms
SELECT ruvector_poincare_to_lorentz(hierarchy_embed) AS lorentz_coords
FROM patients WHERE id = 12345;
-- Exponential map: project Euclidean gradient into Poincaré ball
SELECT ruvector_exp_map(tangent_vector, base_point) AS poincare_point
FROM optimization_step;
-- Logarithmic map: map Poincaré point back to tangent space
SELECT ruvector_log_map(poincare_point, base_point) AS tangent_vector
FROM gradient_computation;
```
**Why 32 dimensions for hierarchy embeddings**: Poincaré embeddings achieve near-perfect reconstruction of tree structures in low dimensions. 32-dim provides sufficient capacity for ICD-10's ~72K codes (max depth 7) while keeping the per-row overhead at 128 bytes.
---
## Clinical AI Pipeline
### RAG-Based Clinical Decision Support
The CDS engine uses Retrieval-Augmented Generation over clinical notes, combining BM25 keyword matching with semantic vector search for maximum recall.
#### Hybrid Search for Clinical Context Retrieval
```sql
-- Register the clinical notes collection for hybrid search
SELECT ruvector_register_hybrid(
'clinical_notes', -- collection name
'embedding', -- vector column
'terms', -- full-text search column (sparsevec)
'chunk_text' -- text column for BM25 scoring
);
-- Configure hybrid search parameters
SELECT ruvector_hybrid_configure('clinical_notes', '{
"bm25_k1": 1.2,
"bm25_b": 0.75,
"default_fusion": "rrf",
"rrf_k": 60,
"vector_weight": 0.6,
"keyword_weight": 0.4
}'::jsonb);
-- CDS query: find relevant clinical context for a suspected MI patient
SELECT * FROM ruvector_hybrid_search(
'clinical_notes', -- collection
'chest pain radiating to left arm elevated troponin', -- query text (BM25)
embed('chest pain radiating to left arm elevated troponin'), -- query vector
20, -- k results
'rrf', -- fusion: rrf | linear | learned
0.6 -- alpha (vector weight)
)
WHERE tenant_id = current_setting('app.tenant_id');
```
The hybrid search pipeline:
1. **BM25 path**: Tokenizes query → scores against `sparsevec` term vectors → returns top-k by BM25 score
2. **Vector path**: Encodes query with BioClinicalBERT → HNSW ANN search on `embedding` column → returns top-k by cosine similarity
3. **Fusion**: Reciprocal Rank Fusion (RRF) merges both result sets with `k=60`, preserving results that rank highly in either modality
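The fusion step can be sketched in a few lines. This is plain, unweighted Reciprocal Rank Fusion with `k=60`; whether `ruvector_hybrid_search` also applies the configured vector and keyword weights inside RRF is not stated here, so treat the code as an illustration of the technique rather than the extension's behavior. The note IDs are invented.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse ranked lists of document IDs (best first) with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it.
    Returns (doc_id, score) pairs sorted by descending score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by document ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return fused if top_n is None else fused[:top_n]


# Invented result lists from the two retrieval paths.
bm25_hits = ["note_7", "note_3", "note_9"]
vector_hits = ["note_3", "note_5", "note_7"]

if __name__ == "__main__":
    for doc_id, score in rrf_fuse([bm25_hits, vector_hits]):
        print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists (`note_3`, `note_7`) rise above documents that rank well in only one, which is the property the pipeline relies on.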
#### Inline Score Computation
```sql
-- Compute hybrid relevance score for a specific note
SELECT ruvector_hybrid_score(
1.0 - (embedding <=> query_embedding), -- vector similarity (cosine → similarity)
ts_rank(to_tsvector(chunk_text), to_tsquery('chest & pain & troponin')), -- BM25-like score
0.6 -- alpha weight for vector component
) AS relevance
FROM clinical_notes
WHERE patient_id = 12345
ORDER BY relevance DESC
LIMIT 10;
```
### Patient Similarity via Graph Neural Networks
Patient similarity goes beyond demographics by operating on the **patient-encounter-diagnosis-medication graph**:
```sql
-- Build patient similarity graph using GCN
-- Input: patient embeddings + encounter edges
-- Output: refined embeddings that capture clinical trajectory similarity
-- Step 1: Define the patient graph
-- Nodes: patients (features = embedding), encounters, diagnoses
-- Edges: patient→encounter, encounter→diagnosis, patient→medication
-- Step 2: Run GCN forward pass for patient similarity
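SELECT ruvector_gcn_forward(
    (SELECT jsonb_agg(embedding ORDER BY id) FROM patients WHERE tenant_id = 'payer_001'),
    -- ORDER BY id keeps the two edge arrays aligned element by element;
    -- without it, array_agg gives no ordering guarantee across subqueries
    (SELECT array_agg(patient_id ORDER BY id) FROM encounters WHERE tenant_id = 'payer_001'),
    (SELECT array_agg(id ORDER BY id) FROM encounters WHERE tenant_id = 'payer_001'),
    NULL, -- unweighted edges
    384 -- output dimension matches embedding dim
) AS refined_embeddings;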
-- Step 3: For new patients, use GraphSAGE (inductive)
SELECT ruvector_graphsage_forward(
    -- Order features explicitly so node 0 = new patient, nodes 1-2 = neighbors
    (SELECT jsonb_agg(embedding ORDER BY array_position(
                ARRAY[new_patient_id, neighbor_id_1, neighbor_id_2], id))
     FROM patients
     WHERE id IN (new_patient_id, neighbor_id_1, neighbor_id_2)),
    ARRAY[0, 1, 0, 2], -- edge sources: 0→1, 1→0, 0→2, 2→0 (each link in both directions)
    ARRAY[1, 0, 2, 0], -- edge destinations
    384 -- output dimension
) AS new_patient_refined;
```
**Why GraphSAGE for new patients**: A GCN is transductive, so adding a patient means re-running the forward pass over the full graph. GraphSAGE instead learns an aggregation function over sampled neighbors, so a new patient's embedding can be refined using only their local k-hop neighborhood.
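To make the neighborhood aggregation concrete, the sketch below performs one GraphSAGE-style step with a mean aggregator over the same source/destination edge layout used in the SQL call. It deliberately omits the learned weight matrix and nonlinearity that a trained GraphSAGE layer applies to the concatenated vector, and it does not reflect the internals of `ruvector_graphsage_forward`. The 2-dim features are invented.

```python
def sage_mean_step(features, edge_src, edge_dst):
    """One untrained GraphSAGE-style step: concat(self, mean(incoming neighbors)).

    features: list of equal-length float lists, indexed by node.
    edge_src / edge_dst: parallel lists; a message flows from src[i] to dst[i].
    """
    n, dim = len(features), len(features[0])
    sums = [[0.0] * dim for _ in range(n)]
    counts = [0] * n
    for src, dst in zip(edge_src, edge_dst):
        for i in range(dim):
            sums[dst][i] += features[src][i]
        counts[dst] += 1

    refined = []
    for node in range(n):
        if counts[node]:
            mean = [value / counts[node] for value in sums[node]]
        else:
            mean = [0.0] * dim  # isolated node: no neighbor signal
        refined.append(list(features[node]) + mean)
    return refined


# Node 0 is the new patient; nodes 1 and 2 are its neighbors.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
edge_src = [0, 1, 0, 2]
edge_dst = [1, 0, 2, 0]

if __name__ == "__main__":
    for node, vector in enumerate(sage_mean_step(features, edge_src, edge_dst)):
        print(node, vector)
```

Only the rows for the new patient and its neighbors are touched, which is why this style of update does not need the full graph.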
### Drug Interaction Detection via Graph Queries
```sql
-- Create the drug interaction graph
SELECT ruvector_create_graph('drug_interactions');
-- Add drug nodes with RxNorm embeddings
SELECT ruvector_add_node('drug_interactions', rxnorm_hash, jsonb_build_object(
'code', rxnorm_code,
'name', drug_name,
'embedding', embedding::text
)) FROM medications WHERE status = 'active' AND patient_id = 12345;
-- Add known interaction edges
SELECT ruvector_add_edge('drug_interactions', drug_a_hash, drug_b_hash, jsonb_build_object(
'severity', interaction_severity,
'mechanism', interaction_mechanism
)) FROM drug_interaction_knowledge;
-- Query for interaction paths using Cypher
SELECT ruvector_cypher('drug_interactions', '
MATCH (a:Drug)-[r:INTERACTS_WITH*1..2]-(b:Drug)
WHERE a.name = $drug_a AND b.name = $drug_b
RETURN a.name, b.name, r.severity, r.mechanism
', jsonb_build_object(
'drug_a', 'metformin',
'drug_b', 'contrast_dye'
));
-- Find shortest interaction path between any two active medications
SELECT ruvector_shortest_path('drug_interactions', metformin_hash, contrast_hash);
```
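Conceptually, the shortest-path lookup is a breadth-first search over an undirected interaction graph. The sketch below shows that idea in plain Python; it is not the implementation behind `ruvector_shortest_path`, and the drug names are neutral placeholders rather than real interaction data.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (drug_a, drug_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Return the shortest list of nodes from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Placeholder interaction pairs, for illustration only.
EDGES = [("drug_a", "drug_b"), ("drug_b", "drug_c"),
         ("drug_a", "drug_d"), ("drug_e", "drug_f")]
GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    print(shortest_path(GRAPH, "drug_a", "drug_c"))
    print(shortest_path(GRAPH, "drug_a", "drug_e"))
```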
### Clinical Disagreement Detection via Coherence Engine
The Coherence Engine (ADR-014) provides structural consistency detection for clinical data. When a patient's diagnoses, medications, lab results, and notes conflict, the sheaf Laplacian detects the inconsistency as elevated **coherence energy**.
**Clinical coherence graph**:
- **Nodes**: diagnoses, medications, lab results, vital signs (each with a state vector)
- **Edges**: physiological causality, pharmacological relationships, clinical guidelines
- **Restriction maps**: encode how one clinical fact constrains another (e.g., "HbA1c ≥ 6.5% implies diabetes diagnosis expected")
- **Residual**: mismatch between expected and actual clinical states
- **Energy**: global incoherence measure — high energy = clinical disagreement
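The relationship between restriction maps, residuals, and energy can be sketched with the generic sheaf-Laplacian form: each edge contributes its weight times the squared norm of the difference between the two restricted node states. ADR-014 is not reproduced here, so the data layout, node names, and one-dimensional state vectors below are assumptions made purely for illustration.

```python
def matvec(matrix, vector):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def edge_residual(restrict_u, state_u, restrict_v, state_v):
    """Mismatch between the two node states after mapping both onto the edge."""
    on_edge_u = matvec(restrict_u, state_u)
    on_edge_v = matvec(restrict_v, state_v)
    return [a - b for a, b in zip(on_edge_u, on_edge_v)]


def coherence_energy(states, edges):
    """Sum over edges of weight * squared norm of the edge residual."""
    total = 0.0
    for u, v, restrict_u, restrict_v, weight in edges:
        residual = edge_residual(restrict_u, states[u], restrict_v, states[v])
        total += weight * sum(component ** 2 for component in residual)
    return total


IDENTITY = [[1.0]]
# One constraint: an elevated HbA1c indicator should agree with a diabetes Dx indicator.
EDGES = [("hba1c_elevated", "diabetes_dx", IDENTITY, IDENTITY, 1.0)]

missing_dx = {"hba1c_elevated": [1.0], "diabetes_dx": [0.0]}
consistent = {"hba1c_elevated": [1.0], "diabetes_dx": [1.0]}

if __name__ == "__main__":
    print(coherence_energy(missing_dx, EDGES))   # non-zero: disagreement
    print(coherence_energy(consistent, EDGES))   # zero: states agree
```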
Example clinical disagreement scenarios detected:
| Scenario | Nodes | Edge Constraint | Residual Meaning |
|----------|-------|-----------------|------------------|
| Missing diagnosis | HbA1c=9.2, no diabetes Dx | Lab→Diagnosis causality | Expected diabetes diagnosis absent |
| Contraindicated drug | Metformin + eGFR<30 | Drug→Lab safety threshold | Renal function below safe prescribing limit |
| Conflicting notes | "Chest pain resolved" + "Troponin rising" | Note→Lab temporal consistency | Narrative contradicts objective data |
| Duplicate therapy | Two ACE inhibitors active | Drug→Drug class exclusion | Therapeutic duplication detected |
When coherence energy exceeds the configured threshold, the system:
1. **Lane 0 (Reflex)**: Flags in the patient chart with the specific edge residual
2. **Lane 1 (Retrieval)**: Pulls related clinical notes for context via RAG
3. **Lane 2 (Heavy)**: Runs full diagnostic reasoning chain
4. **Lane 3 (Human)**: Escalates to clinical pharmacist or physician review
---
## Claims Processing Engine
### 837→835 Adjudication Pipeline
```
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ 837 │───▶│ Parse │───▶│ Validate │───▶│Adjudicate│───▶│ 835 │
│ Inbound │ │ & Norm │ │ & Enrich │ │ & Score │ │ Outbound │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Embed │ │ Ontology │ │ Fraud │
│ Claims │ │ Crosswalk│ │ Detection│
│ (384-dim)│ │ (SPARQL) │ │ (GAT) │
└──────────┘ └──────────┘ └──────────┘
```
**Pipeline stages**:
1. **Parse & Normalize**: X12 837 → structured JSONB + BioClinicalBERT embedding stored in `claims.embedding`
2. **Validate & Enrich**: SPARQL cross-mapping validates diagnosis↔procedure consistency via `ruvector_sparql_json`
3. **Adjudicate & Score**: Rules engine + vector similarity to historical approved claims
4. **Fraud Detection**: GAT-based attention over the provider-claim-patient graph
### Fraud Detection with Graph Attention
```sql
-- Build provider billing graph for fraud analysis
-- Nodes: providers, patients, facilities
-- Edges: billing relationships (weighted by claim amount)
-- Run attention-based analysis to find anomalous billing patterns
SELECT ruvector_flash_attention(
provider_embedding, -- query: provider to investigate
(SELECT jsonb_agg(embedding)
FROM claims
WHERE provider_id = suspect_provider
AND service_date > now() - interval '90 days')::jsonb, -- keys: recent claims
(SELECT jsonb_agg(jsonb_build_array(billed_amount, allowed_amount))
FROM claims
WHERE provider_id = suspect_provider
AND service_date > now() - interval '90 days')::jsonb, -- values: amounts
64 -- block size for flash attention
) AS attention_scores;
-- High attention on outlier claims reveals anomalous billing patterns
-- Vector similarity to known fraud patterns
SELECT c.id, c.billed_amount, c.procedure_codes,
1 - (c.embedding <=> fp.embedding) AS fraud_similarity
FROM claims c
CROSS JOIN fraud_patterns fp
WHERE c.status = 'pending'
AND 1 - (c.embedding <=> fp.embedding) > 0.85
ORDER BY fraud_similarity DESC;
-- Aggregate GNN messages across the provider network
SELECT ruvector_gnn_aggregate(
(SELECT jsonb_agg(embedding) FROM claims WHERE provider_id = suspect_provider),
'mean' -- aggregation method: mean, sum, max
) AS provider_fraud_signal;
```
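For readers unfamiliar with attention, the sketch below computes standard scaled dot-product attention for a single query in plain Python. Flash attention is an exact, block-wise way of computing the same quantity, so the weights here are what a tiled implementation would also produce; the argument layout of `ruvector_flash_attention` itself is not modeled, and the vectors and amounts are invented.

```python
import math


def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and each key."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Attention-weighted average of the value vectors."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * value[i] for w, value in zip(weights, values)) for i in range(dim)]


# Invented example: one provider query against three claim embeddings.
provider = [1.0, 0.0]
claim_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
claim_amounts = [[100.0, 80.0], [300.0, 120.0], [50.0, 50.0]]

if __name__ == "__main__":
    print(attention_weights(provider, claim_keys))
    print(attend(provider, claim_keys, claim_amounts))
```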
---
## Security & HIPAA Compliance
### HIPAA Technical Safeguards Mapping
| 45 CFR Section | Requirement | RuVector Implementation |
|----------------|-------------|------------------------|
| §164.312(a)(1) | Access Control | `ruvector_enable_tenant_rls()` → PostgreSQL RLS policies per tenant |
| §164.312(a)(2)(i) | Unique User Identification | `ruvector_tenant_set()` binds session to authenticated tenant |
| §164.312(a)(2)(iii) | Automatic Logoff | Token bucket expiry via `ruvector_tenant_quota_check()` |
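| §164.312(a)(2)(iv) | Encryption and Decryption (at rest) | Transparent data encryption via a TDE-capable PostgreSQL distribution or volume-level encryption; quantized vector storage only obfuscates data and is not a substitute for encryption |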
| §164.312(b) | Audit Controls | Hash-chain audit log + anomaly detection embeddings |
| §164.312(c)(1) | Integrity | Coherence Engine witnesses (ADR-014) detect data tampering |
| §164.312(c)(2) | Mechanism to Authenticate ePHI | mTLS between application services and PostgreSQL |
| §164.312(d) | Person/Entity Auth | OAuth 2.0 → JWT claims mapped to tenant context |
| §164.312(e)(1) | Transmission Security | TLS 1.3 for all connections; mTLS for inter-node replication |
| §164.312(e)(2)(ii) | Encryption in Transit | AES-256-GCM for replication streams |
### Row-Level Security Configuration
RuVector provides template-based RLS that maps directly to healthcare access patterns:
```sql
-- Standard tenant isolation (per health plan / payer org)
SELECT ruvector_enable_tenant_rls('patients', 'tenant_id');
SELECT ruvector_enable_tenant_rls('encounters', 'tenant_id');
SELECT ruvector_enable_tenant_rls('clinical_notes', 'tenant_id');
SELECT ruvector_enable_tenant_rls('medications', 'tenant_id');
SELECT ruvector_enable_tenant_rls('lab_results', 'tenant_id');
SELECT ruvector_enable_tenant_rls('claims', 'tenant_id');
-- This generates:
-- Policy: ruvector_tenant_isolation
-- USING (tenant_id = current_setting('app.tenant_id'))
-- Policy: ruvector_admin_bypass
-- FOR ALL TO ruvector_admin USING (true)
-- Trigger: ruvector_validate_tenant_context_{table}
-- Ensures tenant_id is set before any DML
-- Trigger: ruvector_check_tenant_exists_{table}
-- Validates tenant is not suspended
```
**Isolation level per use case**:
```sql
-- Shared isolation (default): RLS policies on tenant_id column
-- Used for: standard multi-payer access
SELECT ruvector_tenant_create('payer_001', '{"isolation": "shared"}'::jsonb);
-- Partition isolation: separate partitions per tenant
-- Used for: large payers requiring physical data separation
SELECT ruvector_tenant_create('payer_002', '{"isolation": "partition"}'::jsonb);
SELECT ruvector_tenant_isolate('payer_002');
-- Dedicated isolation: schema-level with separate indexes
-- Used for: government contracts (VA, DoD) requiring complete isolation
SELECT ruvector_tenant_create('va_gov', '{"isolation": "dedicated"}'::jsonb);
SELECT ruvector_tenant_isolate('va_gov');
SELECT ruvector_tenant_migrate('va_gov', 'dedicated');
```
### Audit Trail with Anomaly Detection
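The audit log in this section links each row to its predecessor through a SHA-256 hash chain. As a language-neutral illustration of that idea, here is a minimal Python sketch. The all-zero genesis hash, the `|` field separator, and the helper names are assumptions of this example and are not taken from the schema.

```python
import hashlib

GENESIS_HASH = "0" * 64  # assumed anchor value for the first entry
FIELDS = ("tenant_id", "user_id", "action", "resource_type", "resource_id", "timestamp")


def chain_hash(prev_hash, record):
    """SHA-256 over the previous chain hash plus this record's audited fields."""
    parts = [prev_hash] + ["" if record[f] is None else str(record[f]) for f in FIELDS]
    # A separator avoids ambiguity between e.g. ("ab", "c") and ("a", "bc").
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def append_entry(log, record):
    """Append a record, stamping it with its chain hash."""
    prev_hash = log[-1]["chain_hash"] if log else GENESIS_HASH
    entry = dict(record)
    entry["chain_hash"] = chain_hash(prev_hash, record)
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every hash in order; any edited row breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["chain_hash"] != chain_hash(prev_hash, entry):
            return False
        prev_hash = entry["chain_hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"tenant_id": "payer_001", "user_id": "u1", "action": "SELECT",
                       "resource_type": "patients", "resource_id": 12345,
                       "timestamp": "2026-02-10T12:00:00Z"})
    append_entry(log, {"tenant_id": "payer_001", "user_id": "u2", "action": "UPDATE",
                       "resource_type": "claims", "resource_id": None,
                       "timestamp": "2026-02-10T12:00:05Z"})
    print(verify_chain(log))
    log[0]["action"] = "DELETE"  # simulate tampering
    print(verify_chain(log))
```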
```sql
CREATE TABLE audit_log (
    id              BIGSERIAL PRIMARY KEY,
    timestamp       TIMESTAMPTZ NOT NULL DEFAULT now(),  -- part of the hash input, so never NULL
    tenant_id       TEXT NOT NULL,
    user_id         TEXT NOT NULL,
    action          TEXT NOT NULL,           -- SELECT, INSERT, UPDATE, DELETE
    resource_type   TEXT NOT NULL,           -- patients, encounters, clinical_notes, etc.
    resource_id     BIGINT,
    query_hash      TEXT NOT NULL,           -- SHA-256 of the SQL query
    previous_hash   TEXT NOT NULL,           -- hash chain: SHA-256(prior row's previous_hash || this row's fields)
    ip_address      INET,
    user_agent      TEXT,
    -- Dense embedding: action context for anomaly detection
    embedding       ruvector(384) NOT NULL,
    -- Detect anomalous access patterns via distance to normal cluster centroid
    -- High distance = unusual access pattern → trigger investigation
    CONSTRAINT audit_integrity CHECK (length(previous_hash) = 64)
);
-- HNSW index for anomaly detection search
CREATE INDEX idx_audit_embedding ON audit_log
USING hnsw (embedding ruvector_cosine_ops)
WITH (m = 16, ef_construction = 100);
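-- Hash-chain integrity verification
-- Each row's previous_hash must equal SHA-256(prior row's previous_hash || this row's fields).
-- Note: timestamp::text depends on the session TimeZone; verify under the setting used at write time.
CREATE OR REPLACE FUNCTION verify_audit_chain(start_id BIGINT, end_id BIGINT)
RETURNS BOOLEAN AS $$
DECLARE
    prev_hash TEXT;
    curr RECORD;
    expected_hash TEXT;
BEGIN
    SELECT previous_hash INTO prev_hash FROM audit_log WHERE id = start_id;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'audit_log row % not found', start_id;
    END IF;
    FOR curr IN SELECT * FROM audit_log WHERE id > start_id AND id <= end_id ORDER BY id LOOP
        -- resource_id is nullable; without coalesce a NULL would null out the whole input
        expected_hash := encode(sha256(convert_to(
            prev_hash || curr.tenant_id || curr.user_id || curr.action ||
            curr.resource_type || coalesce(curr.resource_id::text, '') || curr.timestamp::text,
            'UTF8')), 'hex');
        -- IS DISTINCT FROM treats a NULL on either side as a mismatch instead of skipping the check
        IF curr.previous_hash IS DISTINCT FROM expected_hash THEN
            RETURN FALSE;  -- Chain broken: tampering detected
        END IF;
        prev_hash := curr.previous_hash;
    END LOOP;
    RETURN TRUE;
END;
$$ LANGUAGE plpgsql;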
-- Anomaly detection: find unusual access patterns
-- normality_score = 1 - cosine distance to the tenant's 30-day centroid
SELECT al.*,
       1 - (al.embedding <=> centroid.embedding) AS normality_score
FROM audit_log al
CROSS JOIN (
    SELECT avg(embedding) AS embedding
    FROM audit_log
    WHERE timestamp > now() - interval '30 days'
      AND tenant_id = current_setting('app.tenant_id')
) centroid
WHERE al.timestamp > now() - interval '24 hours'
  AND al.tenant_id = current_setting('app.tenant_id')   -- score rows only against their own tenant's centroid
  AND 1 - (al.embedding <=> centroid.embedding) < 0.3   -- threshold: far from normal
ORDER BY normality_score ASC;
```
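The hash-chain idea can be checked in isolation. The following is a minimal Python sketch under stated assumptions: the entry dictionaries, the `hash` key, and the all-zero `GENESIS` value are illustrative choices, not part of the schema above. Unlike `verify_audit_chain`, which starts from a trusted `start_id`, this sketch verifies from the very first entry.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for an empty log


def chain_hash(prev_hash: str, entry: dict) -> str:
    """SHA-256 over the previous hash plus the entry fields, hex-encoded."""
    resource_id = entry.get("resource_id")
    payload = "".join([
        prev_hash,
        entry["tenant_id"],
        entry["user_id"],
        entry["action"],
        entry["resource_type"],
        "" if resource_id is None else str(resource_id),
        entry["timestamp"],
    ])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list[dict], entry: dict) -> None:
    """Append an entry, linking it to the hash of the current tail."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({**entry, "hash": chain_hash(prev, entry)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited field breaks the chain from that row on."""
    prev = GENESIS
    for row in log:
        if row["hash"] != chain_hash(prev, row):
            return False
        prev = row["hash"]
    return True
```

Editing any hashed field of a stored row changes its recomputed hash, so `verify` returns `False` unless every later hash is rewritten as well.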
### Break-Glass Emergency Access
```sql
-- Emergency access bypasses normal RLS for patient safety
-- Requires: explicit clinician identity, mandatory audit, time-limited
CREATE OR REPLACE FUNCTION break_glass_access(
    clinician_id TEXT,
    patient_id BIGINT,
    reason TEXT,
    duration_minutes INT DEFAULT 60
) RETURNS VOID AS $$
DECLARE
    ts TIMESTAMPTZ := now();
    prev_hash TEXT;
BEGIN
    -- Tail of the hash chain; an all-zero value stands in when the log is empty
    SELECT coalesce(
        (SELECT previous_hash FROM audit_log ORDER BY id DESC LIMIT 1),
        repeat('0', 64)
    ) INTO prev_hash;

    -- Record break-glass event (cannot be suppressed)
    INSERT INTO audit_log ("timestamp", tenant_id, user_id, action, resource_type, resource_id,
                           query_hash, previous_hash, embedding)
    VALUES (
        ts,
        'BREAK_GLASS',
        clinician_id,
        'BREAK_GLASS_ACCESS',
        'patients',
        patient_id,
        encode(sha256(convert_to(reason, 'UTF8')), 'hex'),
        -- Chain hash over the same fields, in the same order, that verify_audit_chain recomputes
        encode(sha256(convert_to(
            prev_hash || 'BREAK_GLASS' || clinician_id || 'BREAK_GLASS_ACCESS' ||
            'patients' || patient_id::text || ts::text, 'UTF8')), 'hex'),
        embed('break glass emergency access ' || reason)
    );

    -- Grant temporary cross-tenant read access
    -- (is_local = true: these settings last only until the current transaction ends)
    PERFORM set_config('app.break_glass', 'true', true);
    PERFORM set_config('app.break_glass_patient', patient_id::text, true);
    PERFORM set_config('app.break_glass_expiry',
        (extract(epoch from now()) + duration_minutes * 60)::text, true);

    -- Notify compliance team
    PERFORM pg_notify('break_glass_alert', jsonb_build_object(
        'clinician', clinician_id,
        'patient', patient_id,
        'reason', reason,
        'timestamp', now()
    )::text);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
```
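The function above only sets the three `app.break_glass*` settings; a policy still has to read them before granting access. This section does not show that policy, so the following Python sketch is a hypothetical model of the check it would need to make: `break_glass_allows` and the `settings` dictionary are illustrative names.

```python
import time
from typing import Optional


def break_glass_allows(settings: dict, patient_id: int,
                       now: Optional[float] = None) -> bool:
    """Allow access only for the named patient and only before the expiry."""
    if settings.get("app.break_glass") != "true":
        return False
    # Access is scoped to the single patient named in the break-glass call
    if settings.get("app.break_glass_patient") != str(patient_id):
        return False
    now = time.time() if now is None else now
    # Expiry is stored as epoch seconds in text form; missing means expired
    return now < float(settings.get("app.break_glass_expiry", "0"))
```

Checking the patient id as well as the flag keeps an emergency grant from turning into general cross-tenant read access.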
---
## Scaling Strategy
### Table Partitioning
| Table | Strategy | Partition Key | Rationale |
|-------|----------|--------------|-----------|
| `patients` | Hash | `tenant_id` | Even distribution across payer orgs |
| `encounters` | Range | `period_start` (monthly) | Time-series queries; archive old partitions |
| `clinical_notes` | Range | `created_at` (monthly) | Largest table; monthly partitions for lifecycle mgmt |
| `medications` | Hash | `tenant_id` | Cross-patient drug queries within payer |
| `lab_results` | Range | `collected_at` (monthly) | Time-series; trend analysis benefits from temporal locality |
| `claims` | List + Range | `tenant_id` (list), `submitted_at` (range) | Per-payer financial isolation + temporal archival |
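
For the monthly range partitions in the table above, each partition covers a half-open interval: the first day of the month is included and the first day of the next month is excluded, which is how PostgreSQL range bounds behave. The sketch below computes those bounds in Python; the `{table}_YYYY_MM` naming scheme is an assumption made for illustration.

```python
from datetime import date


def monthly_partition(table: str, day: date) -> tuple[str, date, date]:
    """Return (partition name, inclusive lower bound, exclusive upper bound)."""
    lower = day.replace(day=1)
    if lower.month == 12:
        # December rolls over into January of the following year
        upper = date(lower.year + 1, 1, 1)
    else:
        upper = date(lower.year, lower.month + 1, 1)
    return f"{table}_{lower:%Y_%m}", lower, upper


name, lower, upper = monthly_partition("encounters", date(2026, 2, 10))
```

Because the upper bound is exclusive, adjacent partitions never overlap and no timestamp falls between them.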
### 4-Tier Quantization Strategy
Data ages through four quantization tiers, reducing storage while maintaining search quality for the access pattern of each tier:
| Tier | Age | Quantization | Compression | Recall@10 | Use Case |
|------|-----|-------------|-------------|-----------|----------|
| **Hot** | 0-2 years | f32 (full precision) | 1× | >0.99 | Active clinical care, CDS queries |
| **Warm** | 2-5 years | Scalar SQ8 | 4× | >0.97 | Historical lookups, population health |
| **Cool** | 5-7 years | Product PQ (m=96, nbits=8) | 16× | >0.92 | Research, longitudinal studies |
| **Cold** | 7+ years | Binary quantization | 32× | >0.80 | Legal retention, rare lookups |
```sql
-- Automated tier migration (runs nightly)
-- Leverages self-healing TierEviction strategy
SELECT ruvector_healing_configure('{
"tier_eviction": {
"target_free_pct": 0.15,
"batch_size": 100000,
"tiers": [
{"name": "hot", "max_age_days": 730, "quantization": "f32"},
{"name": "warm", "max_age_days": 1825, "quantization": "sq8"},
{"name": "cool", "max_age_days": 2555, "quantization": "pq"},
{"name": "cold", "max_age_days": null, "quantization": "binary"}
]
}
}'::jsonb);
```
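The tier boundaries in this configuration reduce to a simple lookup by record age. The Python sketch below mirrors the `max_age_days` values from the JSON; treating each boundary as inclusive (a record exactly 730 days old is still hot) is an assumption, since the configuration does not say.

```python
from typing import Optional

# (name, max_age_days, quantization); None means no upper bound
TIERS: list[tuple[str, Optional[int], str]] = [
    ("hot", 730, "f32"),
    ("warm", 1825, "sq8"),
    ("cool", 2555, "pq"),
    ("cold", None, "binary"),
]


def tier_for_age(age_days: int) -> tuple[str, str]:
    """Return (tier name, quantization) for a record of the given age."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for name, max_age, quant in TIERS:
        if max_age is None or age_days <= max_age:
            return name, quant
    raise AssertionError("unreachable: the last tier has no upper bound")
```

The thresholds correspond to 2, 5, and 7 years at 365 days per year, matching the age ranges in the tier table.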
### Replication Topology
```
                 ┌─────────────────────┐
                 │       Primary       │
                 │    (Read/Write)     │
                 │  ruvector-postgres  │
                 └──────────┬──────────┘
                            │
                 Synchronous Replication
                            │
                 ┌──────────▼──────────┐
                 │    Sync Standby     │
                 │   (Hot Failover)    │
                 │       RPO = 0       │
                 └──────────┬──────────┘
                            │
           ┌────────────────┴────────────────┐
           │                                 │
   Async Replication                 Async Replication
           │                                 │
┌──────────▼──────────┐           ┌──────────▼──────────┐
│   Async Replica 1   │           │   Async Replica 2   │
│    (CDS Queries)    │           │     (Analytics)     │
│      RPO < 1s       │           │      RPO < 5s       │
└─────────────────────┘           └─────────────────────┘
```
**Failover guarantees**:
- Primary → Sync Standby: automatic failover, RPO = 0 (zero data loss)
- Sync Standby → Async Replica: manual promotion, RPO < 1s
- CDS queries route to Async Replica 1 (read-only, low-latency)
- Analytics/reporting route to Async Replica 2 (read-only, tolerates lag)
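
The routing rules in the list above can be expressed as a small lookup. This Python sketch is illustrative only: the node names and the `route` helper are hypothetical, and the fallback of reads to the primary when a replica is down is an assumption that the list does not state.

```python
ROUTES = {
    "write": "primary",
    "cds": "async_replica_1",
    "analytics": "async_replica_2",
}


def route(workload: str, healthy: set[str]) -> str:
    """Pick a node for the workload given the set of healthy node names."""
    target = ROUTES.get(workload)
    if target is None:
        raise ValueError(f"unknown workload: {workload}")
    if target in healthy:
        return target
    # Assumed fallback: read workloads may use the primary; writes have no alternative
    if workload != "write" and "primary" in healthy:
        return "primary"
    raise RuntimeError(f"no healthy node for {workload}")
```

Writes never fall back to a replica in this sketch, because only a promoted node can accept them.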
---
## Self-Healing & Monitoring
### Remediation Strategies Mapped to Clinical Impact
RuVector's five built-in remediation strategies map to specific clinical risk scenarios:
| Strategy | Trigger | Clinical Impact | Auto-Execute |
|----------|---------|----------------|-------------|
| **ReindexPartition** | HNSW recall drops below 0.95 | CDS search quality degrades → missed diagnoses | Yes (concurrent) |
| **PromoteReplica** | Primary fails health check | All writes stop → no new encounters/orders recorded | Yes (with grace period) |
| **TierEviction** | Storage > 85% capacity | Cannot record new clinical data → patient safety risk | Yes (batch) |
| **QueryCircuitBreaker** | Query latency p99 > 200ms sustained | CDS response too slow for clinical workflow | Yes (blocks pattern) |
| **IntegrityRecovery** | HNSW graph edges corrupted | Search returns incorrect similar patients | Yes (verify after) |
### eHealth-Specific Monitoring Thresholds
```sql
-- Configure healing thresholds for healthcare workload
SELECT ruvector_healing_set_thresholds('{
"hnsw_recall_minimum": 0.95,
"replication_lag_max_ms": 1000,
"storage_usage_max_pct": 85,
"query_latency_p99_max_ms": 200,
"coherence_energy_max": 0.3,
"check_interval_seconds": 30,
"auto_heal_enabled": true
}'::jsonb);
-- Start the healing background worker
SELECT ruvector_healing_worker_start();
-- Configure worker check interval (every 30 seconds for healthcare)
SELECT ruvector_healing_worker_config('{
"check_interval_secs": 30,
"max_concurrent_remediations": 2,
"notify_on_action": true,
"escalation_threshold": 3
}'::jsonb);
```
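To make the relationship between thresholds and remediation strategies concrete, the Python sketch below maps a metrics snapshot to the strategies whose trigger fires. It is a simplified model, not the worker's actual logic: the metric key names are hypothetical, IntegrityRecovery is omitted because its trigger is not a numeric threshold, and the "sustained" qualifier on latency is ignored.

```python
THRESHOLDS = {
    "hnsw_recall_minimum": 0.95,
    "storage_usage_max_pct": 85,
    "query_latency_p99_max_ms": 200,
}


def triggered_strategies(metrics: dict) -> list[str]:
    """Return the strategies whose trigger condition holds for these metrics."""
    actions = []
    if metrics["hnsw_recall"] < THRESHOLDS["hnsw_recall_minimum"]:
        actions.append("ReindexPartition")
    if not metrics["primary_healthy"]:
        actions.append("PromoteReplica")
    if metrics["storage_usage_pct"] > THRESHOLDS["storage_usage_max_pct"]:
        actions.append("TierEviction")
    if metrics["query_latency_p99_ms"] > THRESHOLDS["query_latency_p99_max_ms"]:
        actions.append("QueryCircuitBreaker")
    return actions


snapshot = {
    "hnsw_recall": 0.93,
    "primary_healthy": True,
    "storage_usage_pct": 88,
    "query_latency_p99_ms": 120,
}
actions = triggered_strategies(snapshot)
```

With `max_concurrent_remediations` set to 2, a snapshot that triggers more than two strategies would need some ordering by clinical impact, which this sketch does not attempt.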
### Health Check Functions
```sql
-- Overall system health (returns JSON with all subsystem statuses)
SELECT ruvector_health_status();
-- Quick boolean health check for load balancer probes
SELECT ruvector_is_healthy();
-- System metrics for monitoring dashboards
SELECT ruvector_system_metrics();
-- View healing history (what was fixed and when)
SELECT ruvector_healing_history(20);
-- Check strategy effectiveness over time
SELECT ruvector_healing_effectiveness();
-- View current thresholds
SELECT ruvector_healing_thresholds();
-- List available strategies and their current weights
SELECT ruvector_healing_strategies();
-- Manual trigger for specific problem type
SELECT ruvector_healing_trigger('index_degradation');
-- View all recognized problem types
SELECT ruvector_healing_problem_types();
```
---
## Consequences
### Benefits
| # | Benefit | Impact |
|---|---------|--------|
| 1 | **Single-engine HIPAA compliance** | One BAA, one encryption boundary, one audit system → 60% reduction in compliance audit surface |
| 2 | **In-database ML** | GCN/GAT/GraphSAGE run inside PostgreSQL → no PHI export to external ML services |
| 3 | **Semantic interoperability** | SPARQL over 31.4M triples maps SNOMED↔ICD-10↔LOINC↔RxNorm without external services |
| 4 | **Sub-100ms CDS** | Hybrid BM25+vector search with RRF fusion retrieves clinical context in <30ms p99 |
| 5 | **Real-time fraud detection** | Flash attention over claims graph detects anomalous billing patterns at ingestion time |
| 6 | **Clinical disagreement detection** | Coherence Engine (ADR-014) sheaf Laplacian catches medication-diagnosis contradictions |
| 7 | **Self-healing availability** | 5 automated remediation strategies reduce MTTR from hours to seconds |
| 8 | **Hierarchical ontology search** | Poincaré embeddings preserve ICD-10/SNOMED tree structure for hierarchical concept queries |
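
Benefit 4 relies on Reciprocal Rank Fusion to merge the BM25 and vector result lists. A minimal Python sketch of the published formula (Cormack et al., 2009) follows; `rrf_fuse` is an illustrative name, `k = 60` is the constant used in that paper, and none of this is claimed to match the internals of `ruvector_hybrid_search`.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id so that ties are deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["note_1", "note_2", "note_3"]
vector_hits = ["note_2", "note_3", "note_1"]
fused = rrf_fuse([bm25_hits, vector_hits])
```

RRF uses only rank positions, so the BM25 and cosine scores never need to be normalized onto a common scale.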
### Risks and Mitigations
| # | Risk | Likelihood | Impact | Mitigation |
|---|------|-----------|--------|------------|
| 1 | **Recall degradation at scale** | Medium | High: missed similar patients in CDS | HNSW m=24 + ef_construction=200 targets >0.95 recall; self-healing ReindexPartition auto-triggers below threshold |
| 2 | **Embedding model bias** | Medium | High: biased clinical recommendations | BioClinicalBERT is trained on MIMIC-III, a single-center ICU dataset, so its coverage of other populations cannot be assumed; run regular bias audits on embedding clusters by demographic |
| 3 | **Storage growth exceeds projections** | Medium | Medium: capacity planning failure | 4-tier quantization reduces 25.2TB raw → 9.3TB; automated TierEviction maintains 15% free |
| 4 | **Ontology update lag** | Low | Medium: outdated crosswalks affect billing | Quarterly SNOMED/ICD-10 reload via `ruvector_load_ntriples`; versioned RDF graphs per release |
| 5 | **Single-engine failure mode** | Low | Critical: all services affected | Sync standby (RPO=0) + 2 async replicas; self-healing PromoteReplica with configurable grace period |
| 6 | **GNN computational cost** | Medium | Medium: GCN forward pass latency | Batch GNN updates nightly via `ruvector_gnn_batch_forward`; serve pre-computed embeddings for real-time queries |
| 7 | **HIPAA breach via vector inversion** | Low | Critical: PHI reconstructed from embeddings | Dense text embeddings can be partially inverted, so treat them as PHI under the same RLS, encryption, and audit controls as the source records; PQ/Binary quantization lowers reconstruction fidelity in older tiers but is not a privacy guarantee |
### Trade-offs
| # | Trade-off | Choice | Alternative | Rationale |
|---|-----------|--------|-------------|-----------|
| 1 | Embedding dimension | 384-dim | 768-dim (full BERT) | 384 halves storage/index cost; BioClinicalBERT-384 achieves 96.2% of 768-dim recall on clinical benchmarks |
| 2 | ANN index type | HNSW | IVFFlat | HNSW provides consistent sub-10ms latency without cluster-rebalancing pauses; IVFFlat requires periodic retraining |
| 3 | Search fusion method | RRF (default) | Learned fusion | RRF needs no training data and has a single fixed constant, which makes it robust across query types; learned fusion requires training data per query type (future enhancement) |
| 4 | Hyperbolic dimension | 32-dim Poincaré | 64-dim or Euclidean | 32-dim Poincaré reconstructs ICD-10 tree with <2% distortion; Euclidean requires 128+ dims for equivalent fidelity |
| 5 | Replication strategy | 1 sync + 2 async | All synchronous | Full sync cuts write throughput 3× for marginal RPO improvement; async replicas serve read-heavy CDS workload |
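
Trade-off 4 depends on how distance behaves in the Poincaré ball. The Python sketch below implements the textbook distance from Nickel & Kiela (2017) for unit curvature; it is not claimed to match `ruvector_poincare_distance`, which may take a curvature parameter.

```python
import math


def poincare_distance(u: list[float], v: list[float]) -> float:
    """Distance between two points strictly inside the unit Poincaré ball."""
    norm_u = sum(x * x for x in u)
    norm_v = sum(x * x for x in v)
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - norm_u) * (1.0 - norm_v)))


root = [0.0, 0.0]
leaf = [0.9, 0.0]
d = poincare_distance(root, leaf)
```

Distances grow without bound as points approach the edge of the ball, which is why deep tree levels (for example, specific ICD-10 codes) can be placed near the boundary while broad categories sit near the origin.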
---
## References
### Standards & Regulations
- HL7 FHIR R4 Specification: https://hl7.org/fhir/R4/
- HIPAA 45 CFR Part 164 — Security Rule: https://www.hhs.gov/hipaa/for-professionals/security/
- SNOMED CT International: https://www.snomed.org/
- ICD-10-CM: https://www.cdc.gov/nchs/icd/icd-10-cm.htm
- LOINC: https://loinc.org/
- RxNorm: https://www.nlm.nih.gov/research/umls/rxnorm/
- X12 837/835 Transaction Sets: https://x12.org/
### Internal Cross-References
- **ADR-001**: RuVector Core Architecture — foundational vector engine, HNSW indexing, SIMD optimization, quantization tiers
- **ADR-014**: Coherence Engine — sheaf Laplacian computation, residual calculation, coherence gating, witness records
### Academic References
- Nickel & Kiela (2017). "Poincaré Embeddings for Learning Hierarchical Representations." NeurIPS.
- Dao et al. (2022). "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness." NeurIPS.
- Hamilton et al. (2017). "Inductive Representation Learning on Large Graphs." NeurIPS. (GraphSAGE)
- Kipf & Welling (2017). "Semi-Supervised Classification with Graph Convolutional Networks." ICLR. (GCN)
- Robertson & Zaragoza (2009). "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in IR.
- Cormack et al. (2009). "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods." SIGIR. (RRF)
### RuVector-Postgres SQL Function Reference
| Module | Key Functions |
|--------|--------------|
| **Hybrid Search** | `ruvector_hybrid_search`, `ruvector_hybrid_score`, `ruvector_hybrid_configure`, `ruvector_register_hybrid`, `ruvector_hybrid_stats`, `ruvector_hybrid_list` |
| **Graph/SPARQL** | `ruvector_create_rdf_store`, `ruvector_sparql`, `ruvector_sparql_json`, `ruvector_sparql_update`, `ruvector_load_ntriples`, `ruvector_insert_triple`, `ruvector_rdf_stats`, `ruvector_cypher`, `ruvector_shortest_path`, `ruvector_create_graph`, `ruvector_add_node`, `ruvector_add_edge` |
| **Hyperbolic** | `ruvector_poincare_distance`, `ruvector_lorentz_distance`, `ruvector_mobius_add`, `ruvector_exp_map`, `ruvector_log_map`, `ruvector_poincare_to_lorentz`, `ruvector_lorentz_to_poincare`, `ruvector_minkowski_dot` |
| **GNN** | `ruvector_gcn_forward`, `ruvector_graphsage_forward`, `ruvector_gnn_aggregate`, `ruvector_message_pass`, `ruvector_gnn_batch_forward`, `ruvector_gnn_status` |
| **Attention** | `ruvector_flash_attention`, `ruvector_multi_head_attention`, `ruvector_attention_score`, `ruvector_attention_scores`, `ruvector_softmax`, `ruvector_attention_types` |
| **Tenancy** | `ruvector_tenant_create`, `ruvector_tenant_set`, `ruvector_tenant_stats`, `ruvector_tenant_quota_check`, `ruvector_enable_tenant_rls`, `ruvector_tenant_isolate`, `ruvector_tenant_migrate`, `ruvector_tenant_suspend`, `ruvector_tenant_resume`, `ruvector_generate_rls_sql` |
| **Self-Healing** | `ruvector_health_status`, `ruvector_is_healthy`, `ruvector_healing_worker_start`, `ruvector_healing_configure`, `ruvector_healing_set_thresholds`, `ruvector_healing_trigger`, `ruvector_healing_strategies`, `ruvector_healing_effectiveness`, `ruvector_healing_check_now` |