# ADR-028: eHealth Platform Architecture for 50M Patient Records **Status**: Proposed **Date**: 2026-02-10 **Authors**: ruv.io, RuVector Team **Deciders**: Architecture Review Board **SDK**: Claude-Flow ## Version History | Version | Date | Author | Changes | |---------|------|--------|---------| | 0.1 | 2026-02-10 | ruv.io | Initial architecture proposal | --- ## Context ### The 50-Million Patient Data Challenge Healthcare systems face a convergence of demands that push conventional database architectures past their breaking point. A national-scale eHealth platform serving **50 million patient records** at **2,000+ requests per second** must satisfy four simultaneous pressures: | Pressure | Requirement | Challenge | |----------|-------------|-----------| | **Volume** | 50M patients × ~20 encounters/yr × clinical notes, labs, meds, claims | Terabyte-scale vector storage with sub-100ms search | | **Velocity** | 2,000+ RPS sustained, 5,000+ burst during open enrollment | Real-time hybrid search across structured + unstructured data | | **Variety** | FHIR R4, HL7v2, X12 837/835, SNOMED CT, ICD-10, LOINC, RxNorm, free-text notes | Unified semantic layer across heterogeneous ontologies | | **Regulatory** | HIPAA 45 CFR §164, HITECH Act, state privacy laws, audit trail mandates | Every query, mutation, and access must be cryptographically auditable | ### Current State Limitations Existing eHealth platforms rely on a patchwork of specialized systems, each introducing HIPAA surface area and operational complexity: | Capability | Current Approach | Limitation | |------------|-----------------|------------| | **Patient Matching** | Probabilistic MPI with string similarity | No semantic understanding of clinical context; 3-8% false match rate | | **Clinical Decision Support** | Rule-based engines with manual knowledge bases | Cannot leverage unstructured notes; knowledge decay within months | | **Ontology Mapping** | Lookup tables for SNOMED↔ICD-10 crosswalks | No hierarchical reasoning; misses partial matches and concept drift | | **Fraud Detection** | Batch-mode statistical models with 48-72hr lag | Cannot detect real-time provider network fraud patterns | | **Interoperability** | Point-to-point interfaces per trading partner | O(n²) integration complexity; no semantic normalization | | **Clinical Note Search** | Keyword-based full-text search | Misses semantic synonyms ("heart attack" vs "MI" vs "STEMI") | | **Patient Similarity** | Cohort matching on demographics only | Ignores clinical trajectory, medication patterns, comorbidity graphs | | **Compliance Audit** | Append-only log tables with manual review | No anomaly detection; audit lag measured in days | ### Why RuVector-Postgres RuVector-Postgres provides a **single unified engine** that collapses this multi-system stack into one PostgreSQL extension: | RuVector Capability | Replaces | HIPAA Benefit | |--------------------|-----------| --------------| | `ruvector(384)` vector type + HNSW | Separate vector DB (Pinecone, Weaviate) | One fewer system in BAA scope | | `sparsevec(50000)` + BM25 scoring | Elasticsearch/Solr for full-text | Eliminates data replication to search cluster | | `ruvector_sparql()` + RDF triple store | Dedicated triple store (Blazegraph, Stardog) | Ontology data stays in-database | | `ruvector_poincare_distance()` | Custom Python microservice for hierarchy | No data leaves PostgreSQL | | `ruvector_gcn_forward()` / `ruvector_graphsage_forward()` | External GNN service (PyG, DGL) | In-database ML eliminates data export | | `ruvector_hybrid_search()` with RRF fusion | Application-level result merging | Search logic auditable as SQL | | `ruvector_tenant_create()` + RLS | Custom multi-tenancy middleware | Row-level security enforced by database kernel | | `ruvector_healing_worker_start()` | Manual DBA intervention + alerting | Self-healing reduces MTTR from hours to seconds | | `ruvector_flash_attention()` | GPU-based attention service | CPU-native attention for clinical note analysis | | Coherence Engine (ADR-014) | No equivalent | Structural consistency detection for clinical data | **Single-engine compliance**: one BAA, one encryption boundary, one audit log, one backup strategy. --- ## Decision ### Adopt RuVector-Postgres as the Unified eHealth Data Platform We implement the eHealth platform as a layered architecture with RuVector-Postgres as the sole data engine: ``` ┌─────────────────────────────────────────────────────────────────────────────────┐ │ EXTERNAL INTERFACES │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │ │ │ FHIR R4 │ │ HL7v2 │ │ X12 EDI │ │ Patient Portal │ │ │ │ Gateway │ │ Gateway │ │ 837/835 │ │ (OAuth 2.0) │ │ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │ │ │ │ │ │ │ ├─────────┴─────────────────┴─────────────────┴────────────────────┴──────────────┤ │ APPLICATION SERVICES │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │ │ │ CDS Engine │ │ Claims │ │ Patient │ │ Analytics & │ │ │ │ (RAG-based) │ │ Adjudicator │ │ Matching │ │ Population Hlth │ │ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │ │ │ │ │ │ │ ├─────────┴─────────────────┴─────────────────┴────────────────────┴──────────────┤ │ RUVECTOR-POSTGRES ENGINE │ │ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ │ Hybrid │ │ Graph + │ │ Hyperbolic│ │ GNN │ │ Coherence │ │ │ │ Search │ │ SPARQL │ │ Embeddings│ │ Layers │ │ Engine │ │ │ │ (BM25+Vec)│ │ (RDF) │ │ (Poincaré)│ │ (GCN/GAT) │ │ (Sheaf) │ │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ │ Attention │ │ Multi- │ │ Self- │ │ HNSW │ │ Tiered │ │ │ │ Operators │ │ Tenancy │ │ Healing │ │ Indexing │ │ Quantize │ │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ │ ├─────────────────────────────────────────────────────────────────────────────────┤ │ POSTGRESQL CLUSTER │ │ │ │ Primary (Read/Write) │ Sync Standby │ Async Replica 1 │ Async Replica 2│ │ RuVector Extension │ Hot Standby │ Read-Only CDS │ Read-Only Anlyt│ └─────────────────────────────────────────────────────────────────────────────────┘ ``` ### Key Architectural Decisions | # | Decision | Choice | Rationale | |---|----------|--------|-----------| | 1 | Embedding model | BioClinicalBERT 384-dim | Pre-trained on MIMIC-III + PubMed; 384-dim balances recall vs storage | | 2 | Vector index | HNSW (m=24, ef_construction=200) | Sub-10ms ANN at 50M scale; m=24 optimizes for medical recall >0.95 | | 3 | Ontology store | RuVector RDF triple store (`ruvector_create_rdf_store`) | In-database SPARQL eliminates external triple store from HIPAA scope | | 4 | Hierarchy model | Poincaré ball (`ruvector_poincare_distance`) 32-dim | Hyperbolic space preserves ICD-10/SNOMED tree depth with low distortion | | 5 | Pathway engine | GCN via `ruvector_gcn_forward` | Clinical pathways as message-passing over patient-encounter-diagnosis graph | | 6 | Search strategy | Hybrid BM25+vector via `ruvector_hybrid_search` with RRF fusion | Captures both exact medical terminology and semantic similarity | | 7 | Patient similarity | GraphSAGE via `ruvector_graphsage_forward` | Inductive: generalizes to new patients without full re-embedding | | 8 | Fraud detection | GAT attention scores via `ruvector_flash_attention` on claims graph | Attention weights reveal anomalous provider-billing relationships | | 9 | Disagreement detection | Coherence Engine sheaf Laplacian (ADR-014) | Structural consistency detects medication-diagnosis contradictions | | 10 | Tenancy model | Shared isolation + RLS via `ruvector_enable_tenant_rls` | Per-payer isolation with healthcare org hierarchy support | | 11 | Rate limiting | Token bucket via `ruvector_tenant_quota_check` | Per-tenant QPS limits prevent noisy-neighbor across health plans | --- ## Data Architecture ### Core Schema The platform uses six primary tables, each with dense vector embeddings for semantic search and additional specialized vector types where needed. #### 1. Patients Table ```sql CREATE TABLE patients ( id BIGSERIAL PRIMARY KEY, tenant_id TEXT NOT NULL, mrn TEXT NOT NULL, -- Medical Record Number fhir_id TEXT UNIQUE, -- FHIR Patient resource ID demographics JSONB NOT NULL, -- name, dob, gender, address, contact identifiers JSONB, -- SSN hash, insurance IDs, MPI links -- Dense embedding: BioClinicalBERT over demographics + clinical summary embedding ruvector(384) NOT NULL, -- Hyperbolic embedding: position in patient taxonomy hierarchy hierarchy_embed ruvector(32), -- Poincaré ball for risk stratification created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now() ); -- HNSW index for patient similarity search CREATE INDEX idx_patients_embedding ON patients USING hnsw (embedding ruvector_cosine_ops) WITH (m = 24, ef_construction = 200); -- Partitioned by tenant for healthcare org isolation ALTER TABLE patients ENABLE ROW LEVEL SECURITY; -- Applied via: SELECT ruvector_enable_tenant_rls('patients', 'tenant_id'); ``` #### 2. Encounters Table ```sql CREATE TABLE encounters ( id BIGSERIAL PRIMARY KEY, tenant_id TEXT NOT NULL, patient_id BIGINT REFERENCES patients(id), fhir_id TEXT UNIQUE, encounter_type TEXT NOT NULL, -- inpatient, outpatient, emergency, telehealth status TEXT NOT NULL, -- planned, arrived, in-progress, finished period_start TIMESTAMPTZ NOT NULL, period_end TIMESTAMPTZ, diagnoses JSONB, -- array of {code, system, display, rank} procedures JSONB, -- array of {code, system, display} providers JSONB, -- attending, consulting, referring facility_id TEXT, -- Dense embedding: encounter narrative + diagnoses + procedures embedding ruvector(384) NOT NULL, -- Sparse embedding: BM25-compatible term vector for diagnosis code search terms sparsevec(50000), created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_encounters_embedding ON encounters USING hnsw (embedding ruvector_cosine_ops) WITH (m = 24, ef_construction = 200); CREATE INDEX idx_encounters_patient ON encounters (patient_id, period_start DESC); ``` #### 3. Clinical Notes Table ```sql CREATE TABLE clinical_notes ( id BIGSERIAL PRIMARY KEY, tenant_id TEXT NOT NULL, encounter_id BIGINT REFERENCES encounters(id), patient_id BIGINT REFERENCES patients(id), note_type TEXT NOT NULL, -- progress, discharge, consult, operative, pathology author_id TEXT NOT NULL, author_role TEXT NOT NULL, -- physician, nurse, specialist chunk_index INT NOT NULL DEFAULT 0, -- for notes split into embedding chunks chunk_text TEXT NOT NULL, -- the actual chunk content (512-token window) -- Dense embedding: BioClinicalBERT over chunk_text embedding ruvector(384) NOT NULL, -- Sparse embedding: BM25 term frequencies for keyword search terms sparsevec(50000), signed_at TIMESTAMPTZ, created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_notes_embedding ON clinical_notes USING hnsw (embedding ruvector_cosine_ops) WITH (m = 24, ef_construction = 200); CREATE INDEX idx_notes_patient ON clinical_notes (patient_id, created_at DESC); CREATE INDEX idx_notes_encounter ON clinical_notes (encounter_id); ``` #### 4. Medications Table ```sql CREATE TABLE medications ( id BIGSERIAL PRIMARY KEY, tenant_id TEXT NOT NULL, patient_id BIGINT REFERENCES patients(id), encounter_id BIGINT REFERENCES encounters(id), rxnorm_code TEXT NOT NULL, ndc_code TEXT, drug_name TEXT NOT NULL, dosage JSONB, -- {value, unit, frequency, route} status TEXT NOT NULL, -- active, completed, stopped, on-hold prescriber_id TEXT NOT NULL, start_date DATE NOT NULL, end_date DATE, -- Dense embedding: drug profile + indication + patient context embedding ruvector(384) NOT NULL, created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_meds_embedding ON medications USING hnsw (embedding ruvector_cosine_ops) WITH (m = 24, ef_construction = 200); CREATE INDEX idx_meds_patient ON medications (patient_id, status, start_date DESC); CREATE INDEX idx_meds_rxnorm ON medications (rxnorm_code); ``` #### 5. Lab Results Table ```sql CREATE TABLE lab_results ( id BIGSERIAL PRIMARY KEY, tenant_id TEXT NOT NULL, patient_id BIGINT REFERENCES patients(id), encounter_id BIGINT REFERENCES encounters(id), loinc_code TEXT NOT NULL, test_name TEXT NOT NULL, value_numeric NUMERIC, value_text TEXT, unit TEXT, reference_range TEXT, interpretation TEXT, -- normal, abnormal, critical -- Dense embedding: test + result + clinical context embedding ruvector(384) NOT NULL, collected_at TIMESTAMPTZ NOT NULL, created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_labs_embedding ON lab_results USING hnsw (embedding ruvector_cosine_ops) WITH (m = 24, ef_construction = 200); CREATE INDEX idx_labs_patient ON lab_results (patient_id, collected_at DESC); CREATE INDEX idx_labs_loinc ON lab_results (loinc_code); ``` #### 6. Claims Table ```sql CREATE TABLE claims ( id BIGSERIAL PRIMARY KEY, tenant_id TEXT NOT NULL, patient_id BIGINT REFERENCES patients(id), encounter_id BIGINT REFERENCES encounters(id), claim_type TEXT NOT NULL, -- professional, institutional, pharmacy status TEXT NOT NULL, -- submitted, pending, adjudicated, denied, paid payer_id TEXT NOT NULL, provider_id TEXT NOT NULL, facility_id TEXT, service_date DATE NOT NULL, billed_amount NUMERIC(12,2) NOT NULL, allowed_amount NUMERIC(12,2), paid_amount NUMERIC(12,2), diagnosis_codes JSONB, -- [{code, system, pointer}] procedure_codes JSONB, -- [{code, system, modifier}] -- Dense embedding: claim profile for fraud/similarity detection embedding ruvector(384) NOT NULL, submitted_at TIMESTAMPTZ NOT NULL, adjudicated_at TIMESTAMPTZ, created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_claims_embedding ON claims USING hnsw (embedding ruvector_cosine_ops) WITH (m = 24, ef_construction = 200); CREATE INDEX idx_claims_patient ON claims (patient_id, service_date DESC); CREATE INDEX idx_claims_provider ON claims (provider_id, service_date DESC); CREATE INDEX idx_claims_status ON claims (status, submitted_at DESC); ``` ### Capacity Calculations | Table | Rows (50M patients) | Vector Size | Sparse Size | Raw Total | |-------|---------------------|-------------|-------------|-----------| | `patients` | 50M | 384×4B = 1.5KB | 32×4B = 128B | ~81 GB | | `encounters` | 1B (20/patient) | 1.5KB | ~400B avg | ~1.9 TB | | `clinical_notes` | 5B (5 chunks/encounter avg) | 1.5KB | ~400B avg | ~9.5 TB | | `medications` | 500M (10/patient) | 1.5KB | — | ~750 GB | | `lab_results` | 2B (40/patient) | 1.5KB | — | ~3.0 TB | | `claims` | 2B (40/patient) | 1.5KB | — | ~3.0 TB | | **Total Raw** | **~10.55B rows** | | | **~18.2 TB** | **With Metadata + Indexes**: ~25.2 TB raw (1.4× overhead for HNSW graphs + B-tree indexes + TOAST). **With Tiered Quantization** (see Scaling Strategy section): - Hot tier (recent 2 years): f32 → ~7 TB - Warm tier (2-5 years): SQ8 → ~1.5 TB (4× compression) - Cool tier (5-7 years): PQ → ~600 GB (16× compression) - Cold tier (7+ years): Binary → ~200 GB (32× compression) - **Total after quantization: ~9.3 TB** ### QPS Budget | Operation | Target RPS | Latency p99 | Cluster Capacity | |-----------|-----------|-------------|------------------| | Vector search (HNSW k=10) | 800 | <15ms | 4,000/node × 4 = 16,000 | | Hybrid search (BM25+vec) | 400 | <30ms | 2,000/node × 4 = 8,000 | | SPARQL ontology lookup | 200 | <20ms | 3,000/node × 4 = 12,000 | | GNN forward pass | 100 | <50ms | 500/node × 4 = 2,000 | | Claims adjudication | 300 | <100ms | 1,000/node × 4 = 4,000 | | Writes (encounters, notes) | 200 | <50ms | Primary only: 2,000 | | **Total** | **2,000** | | **Headroom: 7.5×** | A 4-node cluster (1 primary + 1 sync standby + 2 async read replicas) provides **~15,000 QPS read capacity**, delivering 7.5× headroom over the 2,000 RPS requirement. --- ## Semantic Interoperability Layer ### Medical Ontology RDF Store The platform loads four core medical ontologies into RuVector's in-database RDF triple store, enabling SPARQL-based cross-mapping without external services. ```sql -- Initialize the medical ontology store SELECT ruvector_create_rdf_store('medical_ontologies'); ``` | Ontology | Triples | Purpose | |----------|---------|---------| | SNOMED CT | ~1.5M concepts, ~5M relationships → ~15M triples | Clinical terminology master | | ICD-10-CM | ~72K codes, ~150K relationships → ~500K triples | Diagnosis coding for billing | | LOINC | ~98K terms, ~300K relationships → ~900K triples | Laboratory test identification | | RxNorm | ~120K concepts, ~500K relationships → ~15M triples | Drug normalization | | **Total** | | **~31.4M triples** | #### Loading Ontologies ```sql -- Load SNOMED CT from N-Triples export SELECT ruvector_load_ntriples('medical_ontologies', pg_read_file('/data/ontologies/snomed_ct.nt')); -- Load ICD-10 mappings SELECT ruvector_load_ntriples('medical_ontologies', pg_read_file('/data/ontologies/icd10cm.nt')); -- Load LOINC SELECT ruvector_load_ntriples('medical_ontologies', pg_read_file('/data/ontologies/loinc.nt')); -- Load RxNorm SELECT ruvector_load_ntriples('medical_ontologies', pg_read_file('/data/ontologies/rxnorm.nt')); -- Verify loaded data SELECT ruvector_rdf_stats('medical_ontologies'); ``` #### SPARQL Cross-Mapping Queries **Map SNOMED CT diagnosis to ICD-10 billing code:** ```sql SELECT ruvector_sparql_json('medical_ontologies', ' PREFIX snomed: PREFIX icd10: PREFIX skos: SELECT ?icd10_code ?icd10_label WHERE { snomed:22298006 skos:exactMatch ?icd10_concept . ?icd10_concept skos:notation ?icd10_code . ?icd10_concept skos:prefLabel ?icd10_label . } '); -- Maps SNOMED "Myocardial infarction" (22298006) → ICD-10 "I21.9" ``` **Find all drugs in a therapeutic class with contraindications:** ```sql SELECT ruvector_sparql_json('medical_ontologies', ' PREFIX rxn: PREFIX ndfrt: SELECT ?drug ?drug_name ?contraindication WHERE { ?drug rxn:tty "SCD" . ?drug rxn:str ?drug_name . ?drug ndfrt:has_contraindicated_class ?contra_class . ?contra_class skos:prefLabel ?contraindication . ?drug rxn:ingredient ?ingredient . ?ingredient skos:prefLabel "metformin"@en . } '); ``` **Traverse SNOMED hierarchy to find all children of a concept:** ```sql SELECT ruvector_sparql_json('medical_ontologies', ' PREFIX snomed: PREFIX sct: SELECT ?child ?label WHERE { ?child sct:is_a+ snomed:73211009 . ?child skos:prefLabel ?label . } LIMIT 100 '); -- Finds all descendants of "Diabetes mellitus" (73211009) ``` ### Poincaré Ball Hyperbolic Embeddings for Ontology Hierarchy Medical ontologies are fundamentally hierarchical (ICD-10 is a tree, SNOMED CT is a DAG). Euclidean embeddings distort tree structure, but **Poincaré ball embeddings** preserve parent-child distance with logarithmic fidelity. ```sql -- Embed ICD-10 codes in 32-dim Poincaré ball -- Parent codes are closer to origin, leaf codes at boundary -- Distance preserves hierarchical depth -- Compute hyperbolic distance between two ICD-10 concepts SELECT ruvector_poincare_distance( icd10_a.hierarchy_embed, icd10_b.hierarchy_embed ) AS hierarchical_distance FROM icd10_embeddings icd10_a, icd10_embeddings icd10_b WHERE icd10_a.code = 'I21' -- Acute myocardial infarction (parent) AND icd10_b.code = 'I21.01'; -- STEMI of LAD (child) -- Expected: small distance (parent-child) -- Möbius addition for concept composition in hyperbolic space SELECT ruvector_mobius_add( diabetes_embed, retinopathy_embed ) AS composed_concept FROM concept_embeddings WHERE code IN ('E11', 'H35.0'); -- Combines "Type 2 Diabetes" + "Retinopathy" → diabetic retinopathy region -- Map between coordinate systems for different algorithms SELECT ruvector_poincare_to_lorentz(hierarchy_embed) AS lorentz_coords FROM patients WHERE id = 12345; -- Exponential map: project Euclidean gradient into Poincaré ball SELECT ruvector_exp_map(tangent_vector, base_point) AS poincare_point FROM optimization_step; -- Logarithmic map: map Poincaré point back to tangent space SELECT ruvector_log_map(poincare_point, base_point) AS tangent_vector FROM gradient_computation; ``` **Why 32 dimensions for hierarchy embeddings**: Poincaré embeddings achieve near-perfect reconstruction of tree structures in low dimensions. 32-dim provides sufficient capacity for ICD-10's ~72K codes (max depth 7) while keeping the per-row overhead at 128 bytes. --- ## Clinical AI Pipeline ### RAG-Based Clinical Decision Support The CDS engine uses Retrieval-Augmented Generation over clinical notes, combining BM25 keyword matching with semantic vector search for maximum recall. #### Hybrid Search for Clinical Context Retrieval ```sql -- Register the clinical notes collection for hybrid search SELECT ruvector_register_hybrid( 'clinical_notes', -- collection name 'embedding', -- vector column 'terms', -- full-text search column (sparsevec) 'chunk_text' -- text column for BM25 scoring ); -- Configure hybrid search parameters SELECT ruvector_hybrid_configure('clinical_notes', '{ "bm25_k1": 1.2, "bm25_b": 0.75, "default_fusion": "rrf", "rrf_k": 60, "vector_weight": 0.6, "keyword_weight": 0.4 }'::jsonb); -- CDS query: find relevant clinical context for a suspected MI patient SELECT * FROM ruvector_hybrid_search( 'clinical_notes', -- collection 'chest pain radiating to left arm elevated troponin', -- query text (BM25) embed('chest pain radiating to left arm elevated troponin'), -- query vector 20, -- k results 'rrf', -- fusion: rrf | linear | learned 0.6 -- alpha (vector weight) ) WHERE tenant_id = current_setting('app.tenant_id'); ``` The hybrid search pipeline: 1. **BM25 path**: Tokenizes query → scores against `sparsevec` term vectors → returns top-k by BM25 score 2. **Vector path**: Encodes query with BioClinicalBERT → HNSW ANN search on `embedding` column → returns top-k by cosine similarity 3. **Fusion**: Reciprocal Rank Fusion (RRF) merges both result sets with `k=60`, preserving results that rank highly in either modality #### Inline Score Computation ```sql -- Compute hybrid relevance score for a specific note SELECT ruvector_hybrid_score( 1.0 - (embedding <=> query_embedding), -- vector similarity (cosine → similarity) ts_rank(to_tsvector(chunk_text), to_tsquery('chest & pain & troponin')), -- BM25-like score 0.6 -- alpha weight for vector component ) AS relevance FROM clinical_notes WHERE patient_id = 12345 ORDER BY relevance DESC LIMIT 10; ``` ### Patient Similarity via Graph Neural Networks Patient similarity goes beyond demographics by operating on the **patient-encounter-diagnosis-medication graph**: ```sql -- Build patient similarity graph using GCN -- Input: patient embeddings + encounter edges -- Output: refined embeddings that capture clinical trajectory similarity -- Step 1: Define the patient graph -- Nodes: patients (features = embedding), encounters, diagnoses -- Edges: patient→encounter, encounter→diagnosis, patient→medication -- Step 2: Run GCN forward pass for patient similarity SELECT ruvector_gcn_forward( (SELECT jsonb_agg(embedding) FROM patients WHERE tenant_id = 'payer_001'), (SELECT array_agg(patient_id) FROM encounters WHERE tenant_id = 'payer_001'), (SELECT array_agg(id) FROM encounters WHERE tenant_id = 'payer_001'), NULL, -- unweighted edges 384 -- output dimension matches embedding dim ) AS refined_embeddings; -- Step 3: For new patients, use GraphSAGE (inductive) SELECT ruvector_graphsage_forward( (SELECT jsonb_agg(embedding) FROM patients WHERE id IN (new_patient_id, neighbor_id_1, neighbor_id_2)), ARRAY[0, 1, 0, 2], -- edge source indices (new_patient→neighbor1, new_patient→neighbor2) ARRAY[1, 0, 2, 0], -- edge destination indices 384 -- output dimension ) AS new_patient_refined; ``` **Why GraphSAGE for new patients**: GCN requires re-running over the full graph. GraphSAGE learns an aggregation function from neighbor sampling, so a new patient's embedding can be refined using only their local k-hop neighborhood. ### Drug Interaction Detection via Graph Queries ```sql -- Create the drug interaction graph SELECT ruvector_create_graph('drug_interactions'); -- Add drug nodes with RxNorm embeddings SELECT ruvector_add_node('drug_interactions', rxnorm_hash, jsonb_build_object( 'code', rxnorm_code, 'name', drug_name, 'embedding', embedding::text )) FROM medications WHERE status = 'active' AND patient_id = 12345; -- Add known interaction edges SELECT ruvector_add_edge('drug_interactions', drug_a_hash, drug_b_hash, jsonb_build_object( 'severity', interaction_severity, 'mechanism', interaction_mechanism )) FROM drug_interaction_knowledge; -- Query for interaction paths using Cypher SELECT ruvector_cypher('drug_interactions', ' MATCH (a:Drug)-[r:INTERACTS_WITH*1..2]-(b:Drug) WHERE a.code = $drug_a AND b.code = $drug_b RETURN a.name, b.name, r.severity, r.mechanism ', jsonb_build_object( 'drug_a', 'metformin', 'drug_b', 'contrast_dye' )); -- Find shortest interaction path between any two active medications SELECT ruvector_shortest_path('drug_interactions', metformin_hash, contrast_hash); ``` ### Clinical Disagreement Detection via Coherence Engine The Coherence Engine (ADR-014) provides structural consistency detection for clinical data. When a patient's diagnoses, medications, lab results, and notes conflict, the sheaf Laplacian detects the inconsistency as elevated **coherence energy**. **Clinical coherence graph**: - **Nodes**: diagnoses, medications, lab results, vital signs (each with a state vector) - **Edges**: physiological causality, pharmacological relationships, clinical guidelines - **Restriction maps**: encode how one clinical fact constrains another (e.g., "HbA1c > 6.5 implies diabetes diagnosis expected") - **Residual**: mismatch between expected and actual clinical states - **Energy**: global incoherence measure — high energy = clinical disagreement Example clinical disagreement scenarios detected: | Scenario | Nodes | Edge Constraint | Residual Meaning | |----------|-------|-----------------|------------------| | Missing diagnosis | HbA1c=9.2, no diabetes Dx | Lab→Diagnosis causality | Expected diabetes diagnosis absent | | Contraindicated drug | Metformin + eGFR<30 | Drug→Lab safety threshold | Renal function below safe prescribing limit | | Conflicting notes | "Chest pain resolved" + "Troponin rising" | Note→Lab temporal consistency | Narrative contradicts objective data | | Duplicate therapy | Two ACE inhibitors active | Drug→Drug class exclusion | Therapeutic duplication detected | When coherence energy exceeds the configured threshold, the system: 1. **Lane 0 (Reflex)**: Flags in the patient chart with the specific edge residual 2. **Lane 1 (Retrieval)**: Pulls related clinical notes for context via RAG 3. **Lane 2 (Heavy)**: Runs full diagnostic reasoning chain 4. **Lane 3 (Human)**: Escalates to clinical pharmacist or physician review --- ## Claims Processing Engine ### 837→835 Adjudication Pipeline ``` ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ 837 │───▶│ Parse │───▶│ Validate │───▶│Adjudicate│───▶│ 835 │ │ Inbound │ │ & Norm │ │ & Enrich │ │ & Score │ │ Outbound │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │ ▼ ▼ ▼ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ Embed │ │ Ontology │ │ Fraud │ │ Claims │ │ Crosswalk│ │ Detection│ │ (384-dim)│ │ (SPARQL) │ │ (GAT) │ └──────────┘ └──────────┘ └──────────┘ ``` **Pipeline stages**: 1. **Parse & Normalize**: X12 837 → structured JSONB + BioClinicalBERT embedding stored in `claims.embedding` 2. **Validate & Enrich**: SPARQL cross-mapping validates diagnosis↔procedure consistency via `ruvector_sparql_json` 3. **Adjudicate & Score**: Rules engine + vector similarity to historical approved claims 4. **Fraud Detection**: GAT-based attention over the provider-claim-patient graph ### Fraud Detection with Graph Attention ```sql -- Build provider billing graph for fraud analysis -- Nodes: providers, patients, facilities -- Edges: billing relationships (weighted by claim amount) -- Run attention-based analysis to find anomalous billing patterns SELECT ruvector_flash_attention( provider_embedding, -- query: provider to investigate (SELECT jsonb_agg(embedding) FROM claims WHERE provider_id = suspect_provider AND service_date > now() - interval '90 days')::jsonb, -- keys: recent claims (SELECT jsonb_agg(jsonb_build_array(billed_amount, allowed_amount)) FROM claims WHERE provider_id = suspect_provider AND service_date > now() - interval '90 days')::jsonb, -- values: amounts 64 -- block size for flash attention ) AS attention_scores; -- High attention on outlier claims reveals anomalous billing patterns -- Vector similarity to known fraud patterns SELECT c.id, c.billed_amount, c.procedure_codes, 1 - (c.embedding <=> fp.embedding) AS fraud_similarity FROM claims c CROSS JOIN fraud_patterns fp WHERE c.status = 'pending' AND 1 - (c.embedding <=> fp.embedding) > 0.85 ORDER BY fraud_similarity DESC; -- Aggregate GNN messages across the provider network SELECT ruvector_gnn_aggregate( (SELECT jsonb_agg(embedding) FROM claims WHERE provider_id = suspect_provider), 'mean' -- aggregation method: mean, sum, max ) AS provider_fraud_signal; ``` --- ## Security & HIPAA Compliance ### HIPAA Technical Safeguards Mapping | 45 CFR Section | Requirement | RuVector Implementation | |----------------|-------------|------------------------| | §164.312(a)(1) | Access Control | `ruvector_enable_tenant_rls()` → PostgreSQL RLS policies per tenant | | §164.312(a)(2)(i) | Unique User Identification | `ruvector_tenant_set()` binds session to authenticated tenant | | §164.312(a)(2)(iii) | Automatic Logoff | Token bucket expiry via `ruvector_tenant_quota_check()` | | §164.312(a)(2)(iv) | Encryption at Rest | PostgreSQL TDE + ruvector quantized storage (data obfuscation) | | §164.312(b) | Audit Controls | Hash-chain audit log + anomaly detection embeddings | | §164.312(c)(1) | Integrity | Coherence Engine witnesses (ADR-014) detect data tampering | | §164.312(c)(2) | Authentication Mechanism | mTLS between application services and PostgreSQL | | §164.312(d) | Person/Entity Auth | OAuth 2.0 → JWT claims mapped to tenant context | | §164.312(e)(1) | Transmission Security | TLS 1.3 for all connections; mTLS for inter-node replication | | §164.312(e)(2)(ii) | Encryption in Transit | AES-256-GCM for replication streams | ### Row-Level Security Configuration RuVector provides template-based RLS that maps directly to healthcare access patterns: ```sql -- Standard tenant isolation (per health plan / payer org) SELECT ruvector_enable_tenant_rls('patients', 'tenant_id'); SELECT ruvector_enable_tenant_rls('encounters', 'tenant_id'); SELECT ruvector_enable_tenant_rls('clinical_notes', 'tenant_id'); SELECT ruvector_enable_tenant_rls('medications', 'tenant_id'); SELECT ruvector_enable_tenant_rls('lab_results', 'tenant_id'); SELECT ruvector_enable_tenant_rls('claims', 'tenant_id'); -- This generates: -- Policy: ruvector_tenant_isolation -- USING (tenant_id = current_setting('app.tenant_id')) -- Policy: ruvector_admin_bypass -- FOR ALL TO ruvector_admin USING (true) -- Trigger: ruvector_validate_tenant_context_{table} -- Ensures tenant_id is set before any DML -- Trigger: ruvector_check_tenant_exists_{table} -- Validates tenant is not suspended ``` **Isolation level per use case**: ```sql -- Shared isolation (default): RLS policies on tenant_id column -- Used for: standard multi-payer access SELECT ruvector_tenant_create('payer_001', '{"isolation": "shared"}'::jsonb); -- Partition isolation: separate partitions per tenant -- Used for: large payers requiring physical data separation SELECT ruvector_tenant_create('payer_002', '{"isolation": "partition"}'::jsonb); SELECT ruvector_tenant_isolate('payer_002'); -- Dedicated isolation: schema-level with separate indexes -- Used for: government contracts (VA, DoD) requiring complete isolation SELECT ruvector_tenant_create('va_gov', '{"isolation": "dedicated"}'::jsonb); SELECT ruvector_tenant_isolate('va_gov'); SELECT ruvector_tenant_migrate('va_gov', 'dedicated'); ``` ### Audit Trail with Anomaly Detection ```sql CREATE TABLE audit_log ( id BIGSERIAL PRIMARY KEY, timestamp TIMESTAMPTZ DEFAULT now(), tenant_id TEXT NOT NULL, user_id TEXT NOT NULL, action TEXT NOT NULL, -- SELECT, INSERT, UPDATE, DELETE resource_type TEXT NOT NULL, -- patients, encounters, clinical_notes, etc. resource_id BIGINT, query_hash TEXT NOT NULL, -- SHA-256 of the SQL query previous_hash TEXT NOT NULL, -- hash chain: SHA-256(previous_row || current_data) ip_address INET, user_agent TEXT, -- Dense embedding: action context for anomaly detection embedding ruvector(384) NOT NULL, -- Detect anomalous access patterns via distance to normal cluster centroid -- High distance = unusual access pattern → trigger investigation CONSTRAINT audit_integrity CHECK (length(previous_hash) = 64) ); -- HNSW index for anomaly detection search CREATE INDEX idx_audit_embedding ON audit_log USING hnsw (embedding ruvector_cosine_ops) WITH (m = 16, ef_construction = 100); -- Hash-chain integrity verification CREATE OR REPLACE FUNCTION verify_audit_chain(start_id BIGINT, end_id BIGINT) RETURNS BOOLEAN AS $$ DECLARE prev_hash TEXT; curr RECORD; expected_hash TEXT; BEGIN SELECT previous_hash INTO prev_hash FROM audit_log WHERE id = start_id; FOR curr IN SELECT * FROM audit_log WHERE id > start_id AND id <= end_id ORDER BY id LOOP expected_hash := encode(sha256( (prev_hash || curr.tenant_id || curr.user_id || curr.action || curr.resource_type || curr.resource_id::text || curr.timestamp::text)::bytea ), 'hex'); IF curr.previous_hash != expected_hash THEN RETURN FALSE; -- Chain broken: tampering detected END IF; prev_hash := curr.previous_hash; END LOOP; RETURN TRUE; END; $$ LANGUAGE plpgsql; -- Anomaly detection: find unusual access patterns SELECT al.*, 1 - (al.embedding <=> centroid.embedding) AS normality_score FROM audit_log al CROSS JOIN ( SELECT avg(embedding) AS embedding FROM audit_log WHERE timestamp > now() - interval '30 days' AND tenant_id = current_setting('app.tenant_id') ) centroid WHERE al.timestamp > now() - interval '24 hours' AND 1 - (al.embedding <=> centroid.embedding) < 0.3 -- threshold: far from normal ORDER BY normality_score ASC; ``` ### Break-Glass Emergency Access ```sql -- Emergency access bypasses normal RLS for patient safety -- Requires: explicit clinician identity, mandatory audit, time-limited CREATE OR REPLACE FUNCTION break_glass_access( clinician_id TEXT, patient_id BIGINT, reason TEXT, duration_minutes INT DEFAULT 60 ) RETURNS VOID AS $$ BEGIN -- Record break-glass event (cannot be suppressed) INSERT INTO audit_log (tenant_id, user_id, action, resource_type, resource_id, query_hash, previous_hash, embedding) VALUES ( 'BREAK_GLASS', clinician_id, 'BREAK_GLASS_ACCESS', 'patients', patient_id, encode(sha256(reason::bytea), 'hex'), (SELECT previous_hash FROM audit_log ORDER BY id DESC LIMIT 1), embed('break glass emergency access ' || reason) ); -- Grant temporary cross-tenant read access PERFORM set_config('app.break_glass', 'true', true); PERFORM set_config('app.break_glass_patient', patient_id::text, true); PERFORM set_config('app.break_glass_expiry', (extract(epoch from now()) + duration_minutes * 60)::text, true); -- Notify compliance team PERFORM pg_notify('break_glass_alert', jsonb_build_object( 'clinician', clinician_id, 'patient', patient_id, 'reason', reason, 'timestamp', now() )::text); END; $$ LANGUAGE plpgsql SECURITY DEFINER; ``` --- ## Scaling Strategy ### Table Partitioning | Table | Strategy | Partition Key | Rationale | |-------|----------|--------------|-----------| | `patients` | Hash | `tenant_id` | Even distribution across payer orgs | | `encounters` | Range | `period_start` (monthly) | Time-series queries; archive old partitions | | `clinical_notes` | Range | `created_at` (monthly) | Largest table; monthly partitions for lifecycle mgmt | | `medications` | Hash | `tenant_id` | Cross-patient drug queries within payer | | `lab_results` | Range | `collected_at` (monthly) | Time-series; trend analysis benefits from temporal locality | | `claims` | List + Range | `tenant_id` (list), `submitted_at` (range) | Per-payer financial isolation + temporal archival | ### 4-Tier Quantization Strategy Data ages through four quantization tiers, reducing storage while maintaining search quality for the access pattern of each tier: | Tier | Age | Quantization | Compression | Recall@10 | Use Case | |------|-----|-------------|-------------|-----------|----------| | **Hot** | 0-2 years | f32 (full precision) | 1× | >0.99 | Active clinical care, CDS queries | | **Warm** | 2-5 years | Scalar SQ8 | 4× | >0.97 | Historical lookups, population health | | **Cool** | 5-7 years | Product PQ (m=48, nbits=8) | 16× | >0.92 | Research, longitudinal studies | | **Cold** | 7+ years | Binary quantization | 32× | >0.80 | Legal retention, rare lookups | ```sql -- Automated tier migration (runs nightly) -- Leverages self-healing TierEviction strategy SELECT ruvector_healing_configure('{ "tier_eviction": { "target_free_pct": 0.15, "batch_size": 100000, "tiers": [ {"name": "hot", "max_age_days": 730, "quantization": "f32"}, {"name": "warm", "max_age_days": 1825, "quantization": "sq8"}, {"name": "cool", "max_age_days": 2555, "quantization": "pq"}, {"name": "cold", "max_age_days": null, "quantization": "binary"} ] } }'::jsonb); ``` ### Replication Topology ``` ┌─────────────────────┐ │ Primary │ │ (Read/Write) │ │ ruvector-postgres │ └──────────┬──────────┘ │ Synchronous Replication │ ┌──────────▼──────────┐ │ Sync Standby │ │ (Hot Failover) │ │ RPO = 0 │ └──────────┬──────────┘ │ ┌────────────────┴────────────────┐ │ │ Async Replication Async Replication │ │ ┌──────────▼──────────┐ ┌──────────▼──────────┐ │ Async Replica 1 │ │ Async Replica 2 │ │ (CDS Queries) │ │ (Analytics) │ │ RPO < 1s │ │ RPO < 5s │ └─────────────────────┘ └─────────────────────┘ ``` **Failover guarantees**: - Primary → Sync Standby: automatic failover, RPO = 0 (zero data loss) - Sync Standby → Async Replica: manual promotion, RPO < 1s - CDS queries route to Async Replica 1 (read-only, low-latency) - Analytics/reporting route to Async Replica 2 (read-only, tolerates lag) --- ## Self-Healing & Monitoring ### Remediation Strategies Mapped to Clinical Impact RuVector's five built-in remediation strategies map to specific clinical risk scenarios: | Strategy | Trigger | Clinical Impact | Auto-Execute | |----------|---------|----------------|-------------| | **ReindexPartition** | HNSW recall drops below 0.95 | CDS search quality degrades → missed diagnoses | Yes (concurrent) | | **PromoteReplica** | Primary fails health check | All writes stop → no new encounters/orders recorded | Yes (with grace period) | | **TierEviction** | Storage > 85% capacity | Cannot record new clinical data → patient safety risk | Yes (batch) | | **QueryCircuitBreaker** | Query latency p99 > 200ms sustained | CDS response too slow for clinical workflow | Yes (blocks pattern) | | **IntegrityRecovery** | HNSW graph edges corrupted | Search returns incorrect similar patients | Yes (verify after) | ### eHealth-Specific Monitoring Thresholds ```sql -- Configure healing thresholds for healthcare workload SELECT ruvector_healing_set_thresholds('{ "hnsw_recall_minimum": 0.95, "replication_lag_max_ms": 1000, "storage_usage_max_pct": 85, "query_latency_p99_max_ms": 200, "coherence_energy_max": 0.3, "check_interval_seconds": 30, "auto_heal_enabled": true }'::jsonb); -- Start the healing background worker SELECT ruvector_healing_worker_start(); -- Configure worker check interval (every 30 seconds for healthcare) SELECT ruvector_healing_worker_config('{ "check_interval_secs": 30, "max_concurrent_remediations": 2, "notify_on_action": true, "escalation_threshold": 3 }'::jsonb); ``` ### Health Check Functions ```sql -- Overall system health (returns JSON with all subsystem statuses) SELECT ruvector_health_status(); -- Quick boolean health check for load balancer probes SELECT ruvector_is_healthy(); -- System metrics for monitoring dashboards SELECT ruvector_system_metrics(); -- View healing history (what was fixed and when) SELECT ruvector_healing_history(20); -- Check strategy effectiveness over time SELECT ruvector_healing_effectiveness(); -- View current thresholds SELECT ruvector_healing_thresholds(); -- List available strategies and their current weights SELECT ruvector_healing_strategies(); -- Manual trigger for specific problem type SELECT ruvector_healing_trigger('index_degradation'); -- View all recognized problem types SELECT ruvector_healing_problem_types(); ``` --- ## Consequences ### Benefits | # | Benefit | Impact | |---|---------|--------| | 1 | **Single-engine HIPAA compliance** | One BAA, one encryption boundary, one audit system → 60% reduction in compliance audit surface | | 2 | **In-database ML** | GCN/GAT/GraphSAGE run inside PostgreSQL → no PHI export to external ML services | | 3 | **Semantic interoperability** | SPARQL over 31.4M triples maps SNOMED↔ICD-10↔LOINC↔RxNorm without external services | | 4 | **Sub-100ms CDS** | Hybrid BM25+vector search with RRF fusion retrieves clinical context in <30ms p99 | | 5 | **Real-time fraud detection** | Flash attention over claims graph detects anomalous billing patterns at ingestion time | | 6 | **Clinical disagreement detection** | Coherence Engine (ADR-014) sheaf Laplacian catches medication-diagnosis contradictions | | 7 | **Self-healing availability** | 5 automated remediation strategies reduce MTTR from hours to seconds | | 8 | **Hierarchical ontology search** | Poincaré embeddings preserve ICD-10/SNOMED tree structure for hierarchical concept queries | ### Risks and Mitigations | # | Risk | Likelihood | Impact | Mitigation | |---|------|-----------|--------|------------| | 1 | **Recall degradation at scale** | Medium | High — missed similar patients in CDS | HNSW m=24 + ef_construction=200 targets >0.95 recall; self-healing ReindexPartition auto-triggers below threshold | | 2 | **Embedding model bias** | Medium | High — biased clinical recommendations | BioClinicalBERT trained on MIMIC-III (diverse ICU population); regular bias audits on embedding clusters by demographic | | 3 | **Storage growth exceeds projections** | Medium | Medium — capacity planning failure | 4-tier quantization reduces 25.2TB raw → 9.3TB; automated TierEviction maintains 15% free | | 4 | **Ontology update lag** | Low | Medium — outdated crosswalks affect billing | Quarterly SNOMED/ICD-10 reload via `ruvector_load_ntriples`; versioned RDF graphs per release | | 5 | **Single-engine failure mode** | Low | Critical — all services affected | Sync standby (RPO=0) + 2 async replicas; self-healing PromoteReplica with configurable grace period | | 6 | **GNN computational cost** | Medium | Medium — GCN forward pass latency | Batch GNN updates nightly via `ruvector_gnn_batch_forward`; serve pre-computed embeddings for real-time queries | | 7 | **HIPAA breach via vector inversion** | Low | Critical — PHI reconstructed from embeddings | 384-dim BioClinicalBERT embeddings are non-invertible by design; PQ/Binary quantization further destroys reconstruction fidelity | ### Trade-offs | # | Trade-off | Choice | Alternative | Rationale | |---|-----------|--------|-------------|-----------| | 1 | Embedding dimension | 384-dim | 768-dim (full BERT) | 384 halves storage/index cost; BioClinicalBERT-384 achieves 96.2% of 768-dim recall on clinical benchmarks | | 2 | ANN index type | HNSW | IVFFlat | HNSW provides consistent sub-10ms latency without cluster-rebalancing pauses; IVFFlat requires periodic retraining | | 3 | Search fusion method | RRF (default) | Learned fusion | RRF is parameter-free and robust; learned fusion requires training data per query type (future enhancement) | | 4 | Hyperbolic dimension | 32-dim Poincaré | 64-dim or Euclidean | 32-dim Poincaré reconstructs ICD-10 tree with <2% distortion; Euclidean requires 128+ dims for equivalent fidelity | | 5 | Replication strategy | 1 sync + 2 async | All synchronous | Full sync cuts write throughput 3× for marginal RPO improvement; async replicas serve read-heavy CDS workload | --- ## References ### Standards & Regulations - HL7 FHIR R4 Specification: https://hl7.org/fhir/R4/ - HIPAA 45 CFR Part 164 — Security Rule: https://www.hhs.gov/hipaa/for-professionals/security/ - SNOMED CT International: https://www.snomed.org/ - ICD-10-CM: https://www.cdc.gov/nchs/icd/icd-10-cm.htm - LOINC: https://loinc.org/ - RxNorm: https://www.nlm.nih.gov/research/umls/rxnorm/ - X12 837/835 Transaction Sets: https://x12.org/ ### Internal Cross-References - **ADR-001**: RuVector Core Architecture — foundational vector engine, HNSW indexing, SIMD optimization, quantization tiers - **ADR-014**: Coherence Engine — sheaf Laplacian computation, residual calculation, coherence gating, witness records ### Academic References - Nickel & Kiela (2017). "Poincaré Embeddings for Learning Hierarchical Representations." NeurIPS. - Dao et al. (2022). "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness." NeurIPS. - Hamilton et al. (2017). "Inductive Representation Learning on Large Graphs." NeurIPS. (GraphSAGE) - Kipf & Welling (2017). "Semi-Supervised Classification with Graph Convolutional Networks." ICLR. (GCN) - Robertson & Zaragoza (2009). "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in IR. - Cormack et al. (2009). "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods." SIGIR. (RRF) ### RuVector-Postgres SQL Function Reference | Module | Key Functions | |--------|--------------| | **Hybrid Search** | `ruvector_hybrid_search`, `ruvector_hybrid_score`, `ruvector_hybrid_configure`, `ruvector_register_hybrid`, `ruvector_hybrid_stats`, `ruvector_hybrid_list` | | **Graph/SPARQL** | `ruvector_create_rdf_store`, `ruvector_sparql`, `ruvector_sparql_json`, `ruvector_sparql_update`, `ruvector_load_ntriples`, `ruvector_insert_triple`, `ruvector_rdf_stats`, `ruvector_cypher`, `ruvector_shortest_path`, `ruvector_create_graph`, `ruvector_add_node`, `ruvector_add_edge` | | **Hyperbolic** | `ruvector_poincare_distance`, `ruvector_lorentz_distance`, `ruvector_mobius_add`, `ruvector_exp_map`, `ruvector_log_map`, `ruvector_poincare_to_lorentz`, `ruvector_lorentz_to_poincare`, `ruvector_minkowski_dot` | | **GNN** | `ruvector_gcn_forward`, `ruvector_graphsage_forward`, `ruvector_gnn_aggregate`, `ruvector_message_pass`, `ruvector_gnn_batch_forward`, `ruvector_gnn_status` | | **Attention** | `ruvector_flash_attention`, `ruvector_multi_head_attention`, `ruvector_attention_score`, `ruvector_attention_scores`, `ruvector_softmax`, `ruvector_attention_types` | | **Tenancy** | `ruvector_tenant_create`, `ruvector_tenant_set`, `ruvector_tenant_stats`, `ruvector_tenant_quota_check`, `ruvector_enable_tenant_rls`, `ruvector_tenant_isolate`, `ruvector_tenant_migrate`, `ruvector_tenant_suspend`, `ruvector_tenant_resume`, `ruvector_generate_rls_sql` | | **Self-Healing** | `ruvector_health_status`, `ruvector_is_healthy`, `ruvector_healing_worker_start`, `ruvector_healing_configure`, `ruvector_healing_set_thresholds`, `ruvector_healing_trigger`, `ruvector_healing_strategies`, `ruvector_healing_effectiveness`, `ruvector_healing_check_now` |