Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
docs/postgres/v2/00-overview.md (new file, 645 lines)
# RuVector Postgres v2 - Architecture Overview

<!-- Last reviewed: 2025-12-25 -->

## What We're Building

Most databases, including vector databases, are **performance-first systems**. They optimize for speed, recall, and throughput, then bolt on monitoring. Structural safety is assumed, not measured.

RuVector does something different.

We give the system a **continuous, internal measure of its own structural integrity**, and the ability to **change its own behavior based on that signal**.

This puts RuVector in a very small class of systems.

---

## Why This Actually Matters

### 1. From Symptom Monitoring to Causal Monitoring

Everyone else watches outputs: latency, errors, recall.

We watch **connectivity and dependence**, which are upstream causes.

By the time latency spikes, the graph has already weakened. We detect that weakening while everything still looks healthy.

> **This is the difference between a smoke alarm and a structural stress sensor.**

### 2. Mincut Is a Leading Indicator, Not a Metric

Mincut answers a question no metric answers:

> *"How close is this system to splitting?"*

Not how slow it is. Not how many errors. **How close it is to losing coherence.**

That is a different axis of observability.

### 3. An Algorithm Becomes a Control Signal

Most people use graph algorithms for analysis. We use mincut to **gate behavior**.

That makes it a **control plane**, not analytics.

Very few production systems have mathematically grounded control loops.

### 4. Failure Mode Changes Class

| Without Integrity Control | With Integrity Control |
|---------------------------|------------------------|
| Fast → stressed → cascading failure → manual recovery | Fast → stressed → scope reduction → graceful degradation → automatic recovery |

Changing failure mode is what separates hobby systems from infrastructure.

### 5. Explainable Operations

The **witness edges** are huge.

When something slows down or freezes, we can say: *"Here are the exact links that would have failed next."*

That is gold in production, audits, and regulated environments.

---

## Why Nobody Else Has Done This

Not because it's impossible. Because:

1. **Most systems don't model themselves as graphs** — we do
2. **Mincut was too expensive dynamically** — we use contracted graphs (~1000 nodes, not millions)
3. **Ops culture reacts, it doesn't preempt** — we preempt
4. **Survivability isn't a KPI until after outages** — we measure it continuously
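To see why contraction makes the cost acceptable: on an operational graph of roughly a thousand nodes, even a simple randomized contraction in the style of Karger's algorithm is cheap enough to run periodically. This is an illustrative sketch, not the production controller's algorithm:

```python
import random

def karger_min_cut(edges, n_nodes, trials=200, seed=0):
    """Estimate the minimum cut of a connected undirected graph by repeated
    random edge contraction (Karger-style). Practical at contracted-graph
    scale (~1000 nodes); the shipped controller may use a different method."""
    rng = random.Random(seed)
    best = len(edges)
    for _ in range(trials):
        parent = list(range(n_nodes))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        components = n_nodes
        pool = edges[:]
        rng.shuffle(pool)
        for u, v in pool:
            if components == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:          # contract the edge (merge super-nodes)
                parent[ru] = rv
                components -= 1
        # Edges crossing the two remaining super-nodes form a cut.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best

# Two dense clusters joined by a single bridge edge: true mincut is 1.
cluster_a = [(0, 1), (1, 2), (0, 2)]
cluster_b = [(3, 4), (4, 5), (3, 5)]
bridge = [(2, 3)]
print(karger_min_cut(cluster_a + cluster_b + bridge, 6))  # -> 1
```

The bridge edge is exactly the kind of "witness edge" the controller reports: the link whose loss splits the system.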
---

## The Honest Framing

Will this get applause from model benchmarks or social media? No.

Will this make systems boringly reliable and therefore indispensable? Yes.

Those are the ideas that end up everywhere.

**We're not making vector search faster. We're making vector infrastructure survivable.**

---

## What This Is, Concretely

RuVector Postgres v2 is a **PostgreSQL extension** (built with pgrx) that provides:

- **100% pgvector compatibility** — drop-in replacement: change the extension name and queries work unchanged
- **Architecture separation** — PostgreSQL handles ACID/joins, RuVector handles vectors/graphs/learning
- **Dynamic mincut integrity gating** — the control plane described above
- **Self-learning pipeline** — GNN-based query optimization that improves over time
- **Tiered storage** — automatic hot/warm/cool/cold management with compression
- **Graph engine with Cypher** — property graphs with SQL joins

---
## Architecture Principles

### Separation of Concerns

```
+------------------------------------------------------------------+
|                       PostgreSQL Frontend                        |
|  (SQL Parsing, Planning, ACID, Transactions, Joins, Aggregates)  |
+------------------------------------------------------------------+
                                |
                                v
+------------------------------------------------------------------+
|                    Extension Boundary (pgrx)                     |
|  - Type definitions (vector, sparsevec, halfvec)                 |
|  - Operator overloads (<->, <=>, <#>)                            |
|  - Index access method hooks                                     |
|  - Background worker registration                                |
+------------------------------------------------------------------+
                                |
                                v
+------------------------------------------------------------------+
|                      RuVector Engine (Rust)                      |
|  - HNSW/IVFFlat indexing                                         |
|  - SIMD distance calculations                                    |
|  - Graph storage & Cypher execution                              |
|  - GNN training & inference                                      |
|  - Compression & tiering                                         |
|  - Mincut integrity control                                      |
+------------------------------------------------------------------+
```

### Core Design Decisions

| Decision | Rationale |
|----------|-----------|
| **pgrx for extension** | Safe Rust bindings, modern build system, well-maintained |
| **Background worker pattern** | Long-lived engine, avoids per-query initialization |
| **Shared memory IPC** | Bounded request queue with explicit payload limits (see [02-background-workers](02-background-workers.md)) |
| **WAL as source of truth** | Leverages Postgres replication and durability guarantees |
| **Contracted mincut graph** | Never compute on the full similarity graph; use the contracted operational graph |
| **Hybrid consistency** | Synchronous hot tier, async background ops (see [10-consistency-replication](10-consistency-replication.md)) |

---

## System Architecture

### High-Level Components

```
              +-----------------------+
              |  Client Application   |
              +-----------+-----------+
                          |
              +-----------v-----------+
              |      PostgreSQL       |
              |  +-----------------+  |
              |  | Query Executor  |  |
              |  +--------+--------+  |
              |           |           |
              |  +--------v--------+  |
              |  |  RuVector SQL   |  |
              |  |  Surface Layer  |  |
              |  +--------+--------+  |
              +-----------|-----------+
                          |
     +--------------------+--------------------+
     |                                         |
+----------v----------+            +-----------v-----------+
|   Index AM Hooks    |            |  Background Workers   |
|   (HNSW, IVFFlat)   |            |  (Maintenance, GNN)   |
+----------+----------+            +-----------+-----------+
     |                                         |
     +--------------------+--------------------+
                          |
              +-----------v-----------+
              |     Shared Memory     |
              |     Communication     |
              +-----------+-----------+
                          |
              +-----------v-----------+
              |    RuVector Engine    |
              |  +-------+ +-------+  |
              |  | Index | | Graph |  |
              |  +-------+ +-------+  |
              |  +-------+ +-------+  |
              |  |  GNN  | | Tier  |  |
              |  +-------+ +-------+  |
              |  +-------------------+|
              |  |  Integrity Ctrl   ||
              |  +-------------------+|
              +-----------------------+
```
### Component Responsibilities

#### 1. SQL Surface Layer
- **pgvector type compatibility**: `vector(n)`, operators `<->`, `<#>`, `<=>`
- **Extended types**: `sparsevec`, `halfvec`, `binaryvec`
- **Function catalog**: `ruvector_*` functions for advanced features
- **Views**: `ruvector_nodes`, `ruvector_edges`, `ruvector_hyperedges`

#### 2. Index Access Methods
- **ruhnsw**: HNSW index with configurable M, ef_construction
- **ruivfflat**: IVF-Flat index with automatic centroid updates
- **Scan hooks**: Route queries to the RuVector engine
- **Build hooks**: Incremental and bulk index construction

#### 3. Background Workers
- **Engine Worker**: Long-lived RuVector engine instance
- **Maintenance Worker**: Tiering, compaction, statistics
- **GNN Training Worker**: Periodic model updates
- **Integrity Worker**: Mincut sampling and state updates

#### 4. RuVector Engine
- **Index Manager**: HNSW/IVFFlat in-memory structures
- **Graph Store**: Property graph with Cypher support
- **GNN Pipeline**: Training data capture, model inference
- **Tier Manager**: Hot/warm/cool/cold classification
- **Integrity Controller**: Mincut-based operation gating

---

## Feature Matrix

### Phase 1: pgvector Compatibility (Foundation)

| Feature | Status | Description |
|---------|--------|-------------|
| `vector(n)` type | Core | Dense vector storage |
| `<->` operator | Core | L2 (Euclidean) distance |
| `<=>` operator | Core | Cosine distance |
| `<#>` operator | Core | Negative inner product |
| HNSW index | Core | `CREATE INDEX ... USING hnsw` |
| IVFFlat index | Core | `CREATE INDEX ... USING ivfflat` |
| `vector_l2_ops` | Core | Operator class for L2 |
| `vector_cosine_ops` | Core | Operator class for cosine |
| `vector_ip_ops` | Core | Operator class for inner product |

### Phase 2: Tiered Storage & Compression

| Feature | Status | Description |
|---------|--------|-------------|
| `ruvector_set_tiers()` | v2 | Configure tier thresholds |
| `ruvector_compact()` | v2 | Trigger manual compaction |
| Access frequency tracking | v2 | Background counter updates |
| Automatic tier promotion/demotion | v2 | Policy-based migration |
| SQ8/PQ compression | v2 | Transparent quantization |
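Tier promotion and demotion is policy-based, driven by the access-frequency counters the background workers maintain. A minimal sketch of how such a classifier could map counters to tiers; the threshold values and the `TierPolicy` name are illustrative, not the shipped defaults:

```python
from dataclasses import dataclass

@dataclass
class TierPolicy:
    # Minimum accesses per hour to sit in each tier (illustrative values).
    hot: float = 100.0
    warm: float = 10.0
    cool: float = 1.0

def classify_tier(accesses_per_hour: float, policy: TierPolicy = TierPolicy()) -> str:
    """Map an access-frequency counter to a storage tier."""
    if accesses_per_hour >= policy.hot:
        return "hot"
    if accesses_per_hour >= policy.warm:
        return "warm"
    if accesses_per_hour >= policy.cool:
        return "cool"
    return "cold"

print(classify_tier(250.0))  # -> hot
print(classify_tier(0.2))    # -> cold
```

In the real system the thresholds come from the tier policy table configured via `ruvector_set_tiers()`, and migration between tiers happens asynchronously in the maintenance worker.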
### Phase 3: Graph Engine & Cypher

| Feature | Status | Description |
|---------|--------|-------------|
| `ruvector_cypher()` | v2 | Execute Cypher queries |
| `ruvector_nodes` view | v2 | Graph nodes as relations |
| `ruvector_edges` view | v2 | Graph edges as relations |
| `ruvector_hyperedges` view | v2 | Hyperedge support |
| SQL-graph joins | v2 | Mix Cypher with SQL |

### Phase 4: Integrity Control Plane

| Feature | Status | Description |
|---------|--------|-------------|
| `ruvector_integrity_sample()` | v2 | Sample contracted graph |
| `ruvector_integrity_policy_set()` | v2 | Configure policies |
| `ruvector_integrity_gate()` | v2 | Check operation permission |
| Integrity states | v2 | normal/stress/critical |
| Signed audit events | v2 | Cryptographic audit trail |

---

## Data Flow Patterns

### Vector Search (Read Path)

```
1. Client: SELECT ... ORDER BY embedding <-> $query LIMIT k

2. PostgreSQL Planner:
   - Recognizes index on embedding column
   - Generates Index Scan plan using ruhnsw

3. Index AM (amgettuple):
   - Submits search request to shared memory queue
   - Engine worker receives request

4. RuVector Engine:
   - Checks integrity gate (normal state: proceed)
   - Executes HNSW greedy search
   - Applies post-filter if needed
   - Returns top-k with distances

5. Index AM:
   - Fetches results from shared memory
   - Returns TIDs to executor

6. PostgreSQL Executor:
   - Fetches heap tuples
   - Applies remaining WHERE clauses
   - Returns to client
```

### Vector Insert (Write Path)

```
1. Client: INSERT INTO items (embedding) VALUES ($vec)

2. PostgreSQL Executor:
   - Assigns TID, writes heap tuple
   - Generates WAL record

3. Index AM (aminsert):
   - Checks integrity gate (normal: proceed, stress: throttle)
   - Submits insert to engine queue

4. RuVector Engine:
   - Integrates vector into HNSW graph
   - Updates tier counters
   - Writes to hot tier

5. WAL Writer:
   - Persists operation for durability

6. Replication (if configured):
   - Streams WAL to replicas
   - Replicas apply via engine
```

### Integrity Gating

```
1. Background Worker (periodic):
   - Samples contracted operational graph
   - Computes lambda_cut (minimum cut value) on contracted graph
   - Optionally computes lambda2 (algebraic connectivity) as drift signal
   - Updates integrity state in shared memory

2. Any Operation:
   - Reads current integrity state
   - normal (lambda > T_high): allow all
   - stress (T_low < lambda < T_high): throttle bulk ops
   - critical (lambda < T_low): freeze mutations

3. On State Change:
   - Logs signed integrity event
   - Notifies waiting operations
   - Adjusts background worker priorities
```
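In practice the state transitions want hysteresis, so the controller does not flap between states while lambda hovers near a threshold (see [04-integrity-events.md](04-integrity-events.md) for the actual policy). A sketch of the idea; threshold values, the margin, and the operation classes here are illustrative:

```python
def next_state(current: str, lam: float, t_low: float = 2.0,
               t_high: float = 5.0, margin: float = 0.5) -> str:
    """Integrity state transition with hysteresis: escalate immediately,
    de-escalate only once lambda clears the threshold by `margin`."""
    if lam < t_low:
        return "critical"
    if lam < t_high:
        # Leaving critical requires clearing t_low by the margin.
        if current == "critical" and lam < t_low + margin:
            return "critical"
        return "stress"
    # Leaving stress/critical requires clearing t_high by the margin.
    if current in ("stress", "critical") and lam < t_high + margin:
        return "stress"
    return "normal"

def gate(state: str, op: str) -> bool:
    """Operation gating by state: critical freezes mutations,
    stress throttles bulk operations."""
    if state == "critical":
        return op == "read"
    if state == "stress":
        return op in ("read", "insert")  # bulk ops deferred
    return True

print(next_state("normal", 1.5))    # -> critical
print(next_state("critical", 2.2))  # -> critical (inside hysteresis band)
print(next_state("critical", 5.2))  # -> stress
print(gate("critical", "insert"))   # -> False
```

Escalation is immediate because a weakening graph should bite fast; de-escalation is sticky so a single good sample cannot unfreeze the system prematurely.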
## Deployment Modes

### Mode 1: Single Postgres Embedded

```
+--------------------------------------------+
|            PostgreSQL Instance             |
|  +--------------------------------------+  |
|  |          RuVector Extension          |  |
|  |  +--------+ +---------+ +-------+    |  |
|  |  | Engine | | Workers | | Index |    |  |
|  |  +--------+ +---------+ +-------+    |  |
|  +--------------------------------------+  |
|                                            |
|  +--------------------------------------+  |
|  |            Data Directory            |  |
|  |   vectors/  graphs/  indexes/  wal/  |  |
|  +--------------------------------------+  |
+--------------------------------------------+
```

**Use case**: Development, small-medium deployments (< 100M vectors)

### Mode 2: Postgres + RuVector Cluster

```
+------------------+      +------------------+
|   PostgreSQL 1   |      |   PostgreSQL 2   |
|    (Primary)     |      |    (Replica)     |
+--------+---------+      +--------+---------+
         |                         |
         | WAL Stream              | WAL Apply
         |                         |
+--------v-------------------------v---------+
|              RuVector Cluster              |
|  +-------+ +-------+ +-------+ +------+    |
|  | Node1 | | Node2 | | Node3 | | ...  |    |
|  +-------+ +-------+ +-------+ +------+    |
|                                            |
|  Distributed HNSW | Sharded Graph | GNN    |
+--------------------------------------------+
```

**Use case**: Production, large deployments (100M+ vectors)

### v2 Cluster Mode Clarification

```
+------------------------------------------------------------------+
|                   CLUSTER DEPLOYMENT DECISION                    |
+------------------------------------------------------------------+

v2 cluster mode is a SEPARATE SERVICE with a stable RPC API.
The Postgres extension acts as a CLIENT to the cluster.

ARCHITECTURE OPTIONS:

Option A: SIDECAR (per Postgres instance)
  • RuVector cluster node co-located with each Postgres
  • Pros: Low latency, simple networking
  • Cons: Resource contention, harder to scale independently
  • Use when: Latency-sensitive, moderate scale

Option B: SHARED SERVICE (separate cluster)
  • Dedicated RuVector cluster serving multiple Postgres instances
  • Pros: Independent scaling, resource isolation
  • Cons: Network latency, requires service discovery
  • Use when: Large scale, multi-tenant

PROTOCOL:
  • gRPC with protobuf serialization
  • mTLS for authentication
  • Connection pooling in extension

PARTITION ASSIGNMENT:
  • Consistent hashing for shard routing
  • Automatic rebalancing on node join/leave
  • Partition map cached in extension shared memory

PARTITION MAP VERSIONING AND FENCING:
  • partition_map_version: monotonic counter incremented on any change
  • lease_epoch: obtained from cluster leader, prevents split-brain
  • Extension rejects stale map updates unless epoch matches current
  • On leader failover:
    1. New leader increments epoch
    2. Extensions must re-fetch map with new epoch
    3. Stale-epoch operations return ESTALE, client retries

v2 RECOMMENDATION:
  Start with Mode 1 (embedded). Add cluster mode only when:
  • Dataset exceeds single-node memory
  • Need independent scaling of compute/storage
  • Multi-region deployment required

+------------------------------------------------------------------+
```
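The versioning and fencing rules above reduce to a small client-side check plus consistent-hash routing. The field names mirror the description (`partition_map_version`, `lease_epoch`), but the classes and method names here are illustrative, not the extension's actual implementation:

```python
from dataclasses import dataclass
import bisect
import hashlib

@dataclass
class PartitionMap:
    version: int   # partition_map_version: monotonic counter
    epoch: int     # lease_epoch from the cluster leader
    ring: list     # sorted (hash, node) pairs for consistent hashing

class StaleEpoch(Exception):
    """Corresponds to the ESTALE result: the client must re-fetch the map."""

class MapClient:
    def __init__(self, initial: PartitionMap):
        self.current = initial

    def apply_update(self, update: PartitionMap) -> None:
        # Reject maps from a deposed leader; within an epoch the
        # version must move forward.
        if update.epoch < self.current.epoch:
            raise StaleEpoch("update from a deposed leader")
        if update.epoch == self.current.epoch and update.version <= self.current.version:
            return  # duplicate or older map; ignore
        self.current = update

    def route(self, key: bytes) -> str:
        """Consistent hashing: first ring entry at or after the key hash."""
        h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
        hashes = [p for p, _ in self.current.ring]
        i = bisect.bisect_left(hashes, h) % len(self.current.ring)
        return self.current.ring[i][1]

ring = sorted((int.from_bytes(hashlib.sha256(n.encode()).digest()[:8], "big"), n)
              for n in ["node1", "node2", "node3"])
client = MapClient(PartitionMap(version=1, epoch=1, ring=ring))
client.apply_update(PartitionMap(version=2, epoch=2, ring=ring))  # new leader: accepted
try:
    client.apply_update(PartitionMap(version=9, epoch=1, ring=ring))
except StaleEpoch:
    print("rejected stale epoch")  # -> rejected stale epoch
```

Because the epoch check happens before the version check, a deposed leader can never push a "newer" map: its epoch has already been superseded.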
---

## Consistency Contract

### Heap-Engine Relationship

```
+------------------------------------------------------------------+
|                       CONSISTENCY CONTRACT                       |
+------------------------------------------------------------------+
|                                                                  |
|  PostgreSQL Heap is AUTHORITATIVE for:                           |
|    • Row existence and visibility (MVCC xmin/xmax)               |
|    • Transaction commit status                                   |
|    • Data integrity constraints                                  |
|                                                                  |
|  RuVector Engine Index is EVENTUALLY CONSISTENT:                 |
|    • Bounded lag window (configurable, default 100ms)            |
|    • Never returns invisible tuples (heap recheck)               |
|    • Never resurrects deleted vectors                            |
|                                                                  |
|  v2 HYBRID MODEL:                                                |
|    • SYNCHRONOUS: Hot tier mutations, primary HNSW inserts       |
|    • ASYNCHRONOUS: Compaction, tier moves, graph maintenance     |
|                                                                  |
+------------------------------------------------------------------+
```

See [10-consistency-replication.md](10-consistency-replication.md) for the full specification.
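The "never returns invisible tuples" guarantee falls out of rechecking every index candidate against the authoritative heap. A minimal sketch of that recheck loop, with hypothetical names (`heap_visible` stands in for the MVCC xmin/xmax visibility check):

```python
def search_with_heap_recheck(index_candidates, heap_visible, k):
    """The index is eventually consistent, so every candidate TID is
    rechecked against heap visibility before being returned. Deleted
    rows that still linger in the index are silently dropped."""
    results = []
    for tid, distance in index_candidates:  # already distance-ordered
        if heap_visible(tid):               # authoritative MVCC check
            results.append((tid, distance))
            if len(results) == k:
                break
    return results

# The index lags the heap: tid 7 was deleted but still appears in the index.
candidates = [(3, 0.10), (7, 0.12), (9, 0.15), (4, 0.21)]
live = {3, 9, 4}
print(search_with_heap_recheck(candidates, lambda tid: tid in live, k=3))
# -> [(3, 0.1), (9, 0.15), (4, 0.21)]
```

Note the asymmetry: the index may over-report (stale TIDs, filtered by recheck) but must never under-report within the bounded lag window, which is why hot-tier mutations are synchronous.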
---

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Query latency (p50) | < 5ms | 1M vectors, top-10 |
| Query latency (p99) | < 20ms | 1M vectors, top-10 |
| Insert throughput | > 10K/sec | Bulk mode |
| Index build | < 30min | 10M 768-dim vectors |
| Recall@10 | > 95% | HNSW default params |
| Compression ratio | 4-32x | Tier-dependent |
| Memory overhead | < 2x | Compared to pgvector |

### Benchmark Specification

Performance targets must be validated against a defined benchmark suite:

```
+------------------------------------------------------------------+
|                     BENCHMARK SPECIFICATION                      |
+------------------------------------------------------------------+

VECTOR CONFIGURATIONS:
  • Dimensions: 768 (typical text embeddings), 1536 (large embedding models)
  • Row counts: 1M, 10M, 100M
  • Data type: float32

QUERY PATTERNS:
  • Pure vector search (no filter)
  • Vector + metadata filter (10% selectivity)
  • Vector + metadata filter (1% selectivity)
  • Batch query (100 queries)

HARDWARE BASELINE:
  • CPU: 8 cores (AMD EPYC or Intel Xeon)
  • RAM: 64GB
  • Storage: NVMe SSD (3GB/s read)
  • Single node, no replication

CONCURRENCY:
  • Single thread baseline
  • 8 concurrent queries (parallel)
  • 32 concurrent queries (stress)

RECALL MEASUREMENT:
  • Brute-force baseline on 10K sampled queries
  • Report recall@1, recall@10, recall@100
  • Calculate 95th percentile recall

INDEX CONFIGURATIONS:
  • HNSW: M=16, ef_construction=200, ef_search=100
  • IVFFlat: nlist=sqrt(N), nprobe=10

TIER-SPECIFIC TARGETS:
  • Hot tier: exact float32, recall > 98%
  • Warm tier: exact or float16, recall > 96%
  • Cool tier: approximate + rerank, recall > 94%
  • Cold tier: approximate only, recall > 90%

+------------------------------------------------------------------+
```
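Recall is measured against a brute-force baseline on sampled queries, as specified above. A self-contained sketch of recall@k over such a sample:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact (brute-force) top-k also present in the
    approximate top-k for a single query."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a set of sampled queries."""
    vals = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(vals) / len(vals)

# Two sampled queries; each approximate result misses one exact neighbor.
approx = [[1, 2, 3, 5], [9, 8, 7, 6]]
exact  = [[1, 2, 3, 4], [9, 8, 6, 5]]
print(mean_recall(approx, exact, k=4))  # -> 0.75
```

The 95th percentile figure in the spec is then just the 95th percentile of the per-query `recall_at_k` values rather than their mean.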
---

## Security Considerations

### Integrity Event Signing

All integrity state changes are cryptographically signed:

```rust
struct IntegrityEvent {
    timestamp: DateTime<Utc>,
    event_type: IntegrityEventType,
    previous_state: IntegrityState,
    new_state: IntegrityState,
    lambda_cut: f64,
    witness_edges: Vec<EdgeId>,
    signature: Ed25519Signature,
}
```
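A sketch of producing and verifying such a signed event. To stay dependency-free this uses an HMAC as a stand-in for the Ed25519 signature in the struct above; the actual extension signs with Ed25519 (via `ed25519-dalek`), and the key handling here is purely illustrative:

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # stand-in: the engine holds an Ed25519 private key

def sign_event(event: dict, key: bytes = SECRET) -> dict:
    """Serialize the event canonically and attach a signature."""
    payload = json.dumps(event, sort_keys=True).encode()
    sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {**event, "signature": sig}

def verify_event(signed: dict, key: bytes = SECRET) -> bool:
    """Recompute the signature over everything except the signature field."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expect = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, signed["signature"])

ev = sign_event({
    "event_type": "state_change",
    "previous_state": "normal",
    "new_state": "stress",
    "lambda_cut": 3.2,
    "witness_edges": [17, 42],
})
print(verify_event(ev))   # -> True
ev["new_state"] = "normal"  # tampering breaks verification
print(verify_event(ev))   # -> False
```

Canonical serialization (sorted keys) matters: signer and verifier must agree byte-for-byte on the payload, or valid events fail verification.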
### Access Control

- Leverages PostgreSQL GRANT/REVOKE
- Separate roles for:
  - `ruvector_admin`: Full access
  - `ruvector_operator`: Maintenance operations
  - `ruvector_user`: Query and insert only

### Audit Trail

- All administrative operations logged
- Integrity events stored in `ruvector_integrity_events`
- Optional export to external SIEM

---

## Implementation Roadmap

### Phase 1: Foundation (Weeks 1-4)
- [ ] Extension skeleton with pgrx
- [ ] Collection metadata tables
- [ ] Basic HNSW integration
- [ ] pgvector compatibility tests
- [ ] Recall/performance benchmarks

### Phase 2: Tiered Storage (Weeks 5-8)
- [ ] Access counter infrastructure
- [ ] Tier policy table
- [ ] Background compactor
- [ ] Compression integration
- [ ] Tier report functions

### Phase 3: Graph & Cypher (Weeks 9-12)
- [ ] Graph storage schema
- [ ] Cypher parser integration
- [ ] Relational bridge views
- [ ] SQL-graph join helpers
- [ ] Graph maintenance

### Phase 4: Integrity Control (Weeks 13-16)
- [ ] Contracted graph construction
- [ ] Lambda cut computation
- [ ] Policy application layer
- [ ] Signed audit events
- [ ] Control plane testing

---

## Dependencies

### Rust Crates

| Crate | Purpose |
|-------|---------|
| `pgrx` | PostgreSQL extension framework |
| `parking_lot` | Fast synchronization primitives |
| `crossbeam` | Lock-free data structures |
| `serde` | Serialization |
| `ed25519-dalek` | Signature verification |

### PostgreSQL Features

| Feature | Minimum Version |
|---------|-----------------|
| Background workers | 9.4+ |
| Custom access methods | 9.6+ |
| Parallel query | 9.6+ |
| Logical replication | 10+ |
| Partitioning | 10+ (native) |

---

## Related Documents

| Document | Description |
|----------|-------------|
| [01-sql-schema.md](01-sql-schema.md) | Complete SQL schema |
| [02-background-workers.md](02-background-workers.md) | Worker specifications with IPC contract |
| [03-index-access-methods.md](03-index-access-methods.md) | Index AM details |
| [04-integrity-events.md](04-integrity-events.md) | Event schema, policies, hysteresis, operation classes |
| [05-phase1-pgvector-compat.md](05-phase1-pgvector-compat.md) | Phase 1 specification with incremental AM path |
| [06-phase2-tiered-storage.md](06-phase2-tiered-storage.md) | Phase 2 specification with tier exactness modes |
| [07-phase3-graph-cypher.md](07-phase3-graph-cypher.md) | Phase 3 specification with SQL join keys |
| [08-phase4-integrity-control.md](08-phase4-integrity-control.md) | Phase 4 specification (mincut + λ₂) |
| [09-migration-guide.md](09-migration-guide.md) | pgvector migration |
| [10-consistency-replication.md](10-consistency-replication.md) | Consistency contract, MVCC, WAL, recovery |
Other files added in this commit (diffs suppressed because they are too large):

| File | Lines |
|------|-------|
| docs/postgres/v2/01-sql-schema.md | 1293 |
| docs/postgres/v2/02-background-workers.md | 1405 |
| docs/postgres/v2/03-index-access-methods.md | 1141 |
| docs/postgres/v2/04-integrity-events.md | 1544 |
| docs/postgres/v2/05-phase1-pgvector-compat.md | 1237 |
| docs/postgres/v2/06-phase2-tiered-storage.md | 1490 |
| docs/postgres/v2/07-phase3-graph-cypher.md | 1522 |
| docs/postgres/v2/08-phase4-integrity-control.md | 1511 |

docs/postgres/v2/09-migration-guide.md (new file, 656 lines)
# RuVector Postgres v2 - Migration Guide

## Overview

This guide provides step-by-step instructions for migrating from pgvector to RuVector Postgres v2. The migration is designed to be **non-disruptive**, with zero data loss and minimal downtime.

---

## Migration Approaches

### Approach 1: In-Place Extension Swap (Recommended)

Swap the extension while keeping data in place. Fastest option, with zero data copy.

- **Downtime**: < 5 minutes
- **Risk**: Low

### Approach 2: Parallel Run with Gradual Cutover

Run both extensions simultaneously, gradually shifting traffic.

- **Downtime**: Zero
- **Risk**: Very Low

### Approach 3: Full Data Migration

Export and re-import all data. Use this when the schema changes significantly.

- **Downtime**: Proportional to data size
- **Risk**: Medium

---
## Pre-Migration Checklist

### 1. Verify Compatibility

```sql
-- Check pgvector version
SELECT extversion FROM pg_extension WHERE extname = 'vector';

-- Check PostgreSQL version (RuVector requires 14+)
SELECT version();

-- List tables with vector columns, their sizes, and approximate row counts
SELECT
    c.relname AS table_name,
    pg_size_pretty(pg_relation_size(c.oid)) AS size,
    c.reltuples::bigint AS approx_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND EXISTS (
      SELECT 1 FROM pg_attribute a
      JOIN pg_type t ON a.atttypid = t.oid
      WHERE a.attrelid = c.oid AND t.typname = 'vector'
  );

-- List vector indexes
SELECT
    i.relname AS index_name,
    t.relname AS table_name,
    am.amname AS index_type,
    pg_size_pretty(pg_relation_size(i.oid)) AS size
FROM pg_index ix
JOIN pg_class i ON ix.indexrelid = i.oid
JOIN pg_class t ON ix.indrelid = t.oid
JOIN pg_am am ON i.relam = am.oid
WHERE am.amname IN ('hnsw', 'ivfflat');
```
### 2. Backup

```bash
# Full database backup
pg_dump -Fc -f backup_before_migration.dump mydb

# Or only the tables with vector data (pattern-matched by table name)
pg_dump -Fc --table='*embedding*' -f vector_tables.dump mydb
```

### 3. Test Environment

```bash
# Restore to a test environment
createdb mydb_test
pg_restore -d mydb_test backup_before_migration.dump

# Install the RuVector extension for testing
psql mydb_test -c "CREATE EXTENSION ruvector"
```

---
## Approach 1: In-Place Extension Swap

### Step 1: Install RuVector Extension

```bash
# Option A: Build and install from source
cd ruvector-postgres
cargo pgrx install --release

# Option B: From a package (when available)
apt install postgresql-16-ruvector
```

### Step 2: Stop Application Writes

```sql
-- Optional: put tables in read-only mode
BEGIN;
LOCK TABLE items IN EXCLUSIVE MODE;
-- Keep the transaction open to block writes
```
### Step 3: Drop pgvector Indexes

```sql
-- Save index definitions so they can be recreated later
SELECT indexdef
FROM pg_indexes
WHERE indexname IN (
    SELECT i.relname
    FROM pg_index ix
    JOIN pg_class i ON ix.indexrelid = i.oid
    JOIN pg_am am ON i.relam = am.oid
    WHERE am.amname IN ('hnsw', 'ivfflat')
);

-- Drop the indexes (only after saving the DDL above)
DO $$
DECLARE
    idx RECORD;
BEGIN
    FOR idx IN
        SELECT i.relname AS index_name
        FROM pg_index ix
        JOIN pg_class i ON ix.indexrelid = i.oid
        JOIN pg_am am ON i.relam = am.oid
        WHERE am.amname IN ('hnsw', 'ivfflat')
    LOOP
        EXECUTE format('DROP INDEX IF EXISTS %I', idx.index_name);
    END LOOP;
END $$;
```

### Step 4: Swap Extensions

```sql
-- Drop pgvector. CASCADE also drops objects that depend on pgvector's
-- types; verify this step in a test environment before running in production.
DROP EXTENSION vector CASCADE;

-- Create RuVector
CREATE EXTENSION ruvector;
```
### Step 5: Recreate Indexes

```sql
-- Recreate the HNSW index (same syntax as pgvector)
CREATE INDEX idx_items_embedding ON items
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);
```

RuVector-specific index options are covered in [03-index-access-methods.md](03-index-access-methods.md).

### Step 6: Verify

```sql
-- Check the extension
SELECT * FROM pg_extension WHERE extname = 'ruvector';

-- Test query
EXPLAIN ANALYZE
SELECT id, embedding <-> '[0.1, 0.2, ...]' AS distance
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'
LIMIT 10;

-- Compare recall (optional):
-- run the same query with and without the index
SET enable_indexscan = off;
-- Query without index (exact)
SET enable_indexscan = on;
-- Query with index (approximate)
```

### Step 7: Resume Application

```sql
-- Release the lock taken in Step 2
ROLLBACK;  -- if you started a transaction for locking
```
---

## Approach 2: Parallel Run

### Step 1: Install RuVector (Different Schema)

```sql
-- Create a schema for RuVector
CREATE SCHEMA ruvector_new;

-- Install RuVector in the new schema
CREATE EXTENSION ruvector WITH SCHEMA ruvector_new;
```

### Step 2: Create Shadow Tables

```sql
-- Create a shadow table with the same structure
CREATE TABLE ruvector_new.items AS
SELECT * FROM items WHERE false;

-- Convert the vector column to the RuVector type
ALTER TABLE ruvector_new.items
ALTER COLUMN embedding TYPE ruvector_new.vector(768);

-- Copy data
INSERT INTO ruvector_new.items
SELECT * FROM items;

-- Create index
CREATE INDEX ON ruvector_new.items
USING hnsw (embedding ruvector_new.vector_l2_ops)
WITH (m = 16, ef_construction = 64);
```
|
||||
|
||||
### Step 3: Set Up Triggers for Sync

```sql
-- Sync inserts
CREATE OR REPLACE FUNCTION sync_to_ruvector()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO ruvector_new.items VALUES (NEW.*);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_insert
AFTER INSERT ON items
FOR EACH ROW EXECUTE FUNCTION sync_to_ruvector();

-- Sync updates (delete-then-insert keeps the shadow row current;
-- assumes items has an id primary key)
CREATE OR REPLACE FUNCTION sync_to_ruvector_update()
RETURNS TRIGGER AS $$
BEGIN
    DELETE FROM ruvector_new.items WHERE id = OLD.id;
    INSERT INTO ruvector_new.items VALUES (NEW.*);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_update
AFTER UPDATE ON items
FOR EACH ROW EXECUTE FUNCTION sync_to_ruvector_update();

-- Sync deletes
CREATE OR REPLACE FUNCTION sync_to_ruvector_delete()
RETURNS TRIGGER AS $$
BEGIN
    DELETE FROM ruvector_new.items WHERE id = OLD.id;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_delete
AFTER DELETE ON items
FOR EACH ROW EXECUTE FUNCTION sync_to_ruvector_delete();
```

### Step 4: Gradual Cutover

```python
# Application code with gradual cutover
import random

def search_embeddings(query_vector, use_ruvector_pct=0):
    """
    Gradually shift traffic to RuVector.
    Start with 0%, increase to 100% over time.
    """
    if random.random() * 100 < use_ruvector_pct:
        # Use RuVector
        return db.execute("""
            SELECT id, embedding <-> %s AS distance
            FROM ruvector_new.items
            ORDER BY embedding <-> %s
            LIMIT 10
        """, [query_vector, query_vector])
    else:
        # Use pgvector
        return db.execute("""
            SELECT id, embedding <-> %s AS distance
            FROM items
            ORDER BY embedding <-> %s
            LIMIT 10
        """, [query_vector, query_vector])
```
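
Something has to drive `use_ruvector_pct` upward over the rollout. A time-based linear ramp is one option; the helper below is hypothetical glue code (the seven-day schedule is illustrative, not a RuVector API):

```python
from datetime import datetime, timedelta

def rollout_pct(started_at: datetime, now: datetime, ramp_days: float = 7.0) -> float:
    """Linear ramp from 0% to 100% over ramp_days, then hold at 100%."""
    elapsed_days = (now - started_at).total_seconds() / 86400.0
    return max(0.0, min(100.0, 100.0 * elapsed_days / ramp_days))

start = datetime(2025, 1, 1)
assert rollout_pct(start, start) == 0.0
assert rollout_pct(start, start + timedelta(days=3, hours=12)) == 50.0
assert rollout_pct(start, start + timedelta(days=30)) == 100.0
```

The result feeds straight into `search_embeddings(query_vector, use_ruvector_pct=rollout_pct(start, datetime.now()))`, and the ramp can be paused on any recall or latency regression.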

### Step 5: Complete Migration

Once 100% of traffic is on RuVector with no issues:

```sql
-- Clean up sync triggers and functions
DROP TRIGGER trg_sync_insert ON items;
DROP TRIGGER trg_sync_update ON items;
DROP TRIGGER trg_sync_delete ON items;
DROP FUNCTION sync_to_ruvector CASCADE;

-- Swap tables: the shadow table is already named items,
-- so moving it into public is enough
ALTER TABLE items RENAME TO items_pgvector_backup;
ALTER TABLE ruvector_new.items SET SCHEMA public;

-- Drop the backup table first, then pgvector
-- (dropping the extension with CASCADE while the backup still
-- exists would take its vector column with it)
DROP TABLE items_pgvector_backup;
DROP EXTENSION vector CASCADE;
```

---

## Approach 3: Full Data Migration

### Step 1: Export Data

```sql
-- Export to CSV
\copy (SELECT id, embedding::text, metadata FROM items) TO 'items_export.csv' CSV;

-- Or to binary format
\copy items TO 'items_export.bin' BINARY;
```

### Step 2: Switch Extensions

```sql
DROP EXTENSION vector CASCADE;
CREATE EXTENSION ruvector;
```

### Step 3: Recreate Tables

```sql
-- Recreate with RuVector type
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    embedding vector(768),
    metadata JSONB
);

-- Import data
\copy items FROM 'items_export.csv' CSV;

-- Advance the id sequence past the imported ids
SELECT setval(pg_get_serial_sequence('items', 'id'), (SELECT max(id) FROM items));

-- Create index
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
```

---

## Query Compatibility Reference

### Identical Syntax (No Changes Needed)

```sql
-- Vector type declaration
CREATE TABLE items (embedding vector(768));

-- Distance operators
SELECT * FROM items ORDER BY embedding <-> query LIMIT 10;  -- L2
SELECT * FROM items ORDER BY embedding <=> query LIMIT 10;  -- Cosine
SELECT * FROM items ORDER BY embedding <#> query LIMIT 10;  -- Inner product

-- Index creation
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Operator classes
--   vector_l2_ops
--   vector_cosine_ops
--   vector_ip_ops

-- Utility functions
SELECT vector_dims(embedding) FROM items LIMIT 1;
SELECT vector_norm(embedding) FROM items LIMIT 1;
```

### Extended Syntax (RuVector Only)

```sql
-- New distance operators
SELECT * FROM items ORDER BY embedding <+> query LIMIT 10;  -- L1/Manhattan

-- Collection registration
SELECT ruvector_register_collection(
    'my_embeddings',
    'public',
    'items',
    'embedding',
    768,
    'l2'
);

-- Advanced search options
SELECT * FROM ruvector_search(
    'my_embeddings',
    query_vector,
    10,     -- k
    100,    -- ef_search
    FALSE,  -- use_gnn
    '{"category": "electronics"}'  -- filter
);

-- Tiered storage
SELECT ruvector_set_tiers('my_embeddings', 24, 168, 720);
SELECT ruvector_tier_report('my_embeddings');

-- Graph integration
SELECT ruvector_graph_create('knowledge_graph');
SELECT ruvector_cypher('knowledge_graph', 'MATCH (n) RETURN n LIMIT 10');

-- Integrity monitoring
SELECT ruvector_integrity_status('my_embeddings');
```

---

## GUC Parameter Mapping

| pgvector | RuVector | Notes |
|----------|----------|-------|
| `ivfflat.probes` | `ruvector.probes` | Same behavior |
| `hnsw.ef_search` | `ruvector.ef_search` | Same behavior |
| N/A | `ruvector.use_simd` | Enable/disable SIMD |
| N/A | `ruvector.max_index_memory` | Memory limit |

```sql
-- Set runtime parameters (same syntax)
SET ruvector.ef_search = 100;
SET ruvector.probes = 10;
```
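
The renames in the table above are mechanical, so scripted `SET` statements can be rewritten with a small throwaway helper (hypothetical glue code, not part of either extension):

```python
# pgvector GUC name -> RuVector equivalent (from the mapping table above)
GUC_MAP = {
    "ivfflat.probes": "ruvector.probes",
    "hnsw.ef_search": "ruvector.ef_search",
}

def translate_set_statement(stmt: str) -> str:
    """Rewrite a pgvector 'SET <guc> = <value>;' statement for RuVector."""
    for old, new in GUC_MAP.items():
        stmt = stmt.replace(old, new)
    return stmt

assert translate_set_statement("SET hnsw.ef_search = 100;") == "SET ruvector.ef_search = 100;"
assert translate_set_statement("SET ivfflat.probes = 10;") == "SET ruvector.probes = 10;"
```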

---

## Common Migration Issues

### Issue 1: Type Mismatch After Migration

```sql
-- Error: operator does not exist: ruvector.vector <-> public.vector
-- Solution: ensure all tables use the new type

SELECT
    c.relname AS table_name,
    a.attname AS column_name,
    t.typname AS type_name,
    n.nspname AS type_schema
FROM pg_attribute a
JOIN pg_class c ON a.attrelid = c.oid
JOIN pg_type t ON a.atttypid = t.oid
JOIN pg_namespace n ON t.typnamespace = n.oid
WHERE t.typname = 'vector';

-- Fix by recreating the column
ALTER TABLE items ALTER COLUMN embedding TYPE ruvector.vector(768);
```

### Issue 2: Index Not Using RuVector AM

```sql
-- Check which AM is being used
SELECT
    i.relname AS index_name,
    am.amname AS access_method
FROM pg_index ix
JOIN pg_class i ON ix.indexrelid = i.oid
JOIN pg_am am ON i.relam = am.oid;

-- Rebuild index with correct AM
DROP INDEX old_index;
CREATE INDEX new_index ON items USING hnsw (embedding vector_l2_ops);
```

### Issue 3: Different Recall/Performance

```sql
-- RuVector may have different default parameters;
-- adjust ef_search for recall
SET ruvector.ef_search = 200;  -- Higher for better recall

-- Check actual ef being used
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM items ORDER BY embedding <-> query LIMIT 10;
```

### Issue 4: Extension Dependencies

```sql
-- Check what depends on the vector extension
SELECT
    dependent.relname AS dependent_object,
    dependent.relkind AS object_type
FROM pg_depend d
JOIN pg_extension e ON d.refobjid = e.oid
JOIN pg_class dependent ON d.objid = dependent.oid
WHERE e.extname = 'vector';

-- You may need to drop dependent objects first
```

---

## Rollback Procedure

If the migration fails, roll back to pgvector:

```bash
# Restore from backup
pg_restore -d mydb --clean backup_before_migration.dump

# Or manually:
```

```sql
-- Drop RuVector
DROP EXTENSION ruvector CASCADE;

-- Reinstall pgvector
CREATE EXTENSION vector;

-- Restore schema (from saved DDL)
-- Recreate indexes (from saved DDL)
```

---

## Performance Validation

### Compare Query Performance

```python
import time
import psycopg2
import numpy as np

def benchmark_extension(conn, query_vector, n_queries=100):
    """Benchmark query latency"""
    latencies = []

    for _ in range(n_queries):
        start = time.time()
        with conn.cursor() as cur:
            cur.execute("""
                SELECT id, embedding <-> %s AS distance
                FROM items
                ORDER BY embedding <-> %s
                LIMIT 10
            """, [query_vector, query_vector])
            cur.fetchall()
        latencies.append((time.time() - start) * 1000)

    return {
        'p50': np.percentile(latencies, 50),
        'p95': np.percentile(latencies, 95),
        'p99': np.percentile(latencies, 99),
        'mean': np.mean(latencies),
    }

# Run before migration (pgvector)
pgvector_results = benchmark_extension(conn, query_vec)

# Run after migration (RuVector)
ruvector_results = benchmark_extension(conn, query_vec)

print(f"pgvector p50: {pgvector_results['p50']:.2f}ms")
print(f"RuVector p50: {ruvector_results['p50']:.2f}ms")
```

### Compare Recall

```python
def measure_recall(conn, query_vectors, k=10):
    """Measure recall@k against brute force"""
    recalls = []

    for query in query_vectors:
        # Index scan result
        with conn.cursor() as cur:
            cur.execute("""
                SELECT id FROM items
                ORDER BY embedding <-> %s
                LIMIT %s
            """, [query, k])
            index_results = set(row[0] for row in cur.fetchall())

        # Brute force (disable index)
        with conn.cursor() as cur:
            cur.execute("SET enable_indexscan = off")
            cur.execute("""
                SELECT id FROM items
                ORDER BY embedding <-> %s
                LIMIT %s
            """, [query, k])
            exact_results = set(row[0] for row in cur.fetchall())
            cur.execute("SET enable_indexscan = on")

        recall = len(index_results & exact_results) / k
        recalls.append(recall)

    return np.mean(recalls)
```
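
At its core, `measure_recall` computes a set overlap per query; that piece can be sanity-checked without a database:

```python
def recall_at_k(index_ids, exact_ids, k):
    """Fraction of the exact top-k that the index scan also returned."""
    return len(set(index_ids) & set(exact_ids)) / k

# The index found 8 of the 10 true nearest neighbours
assert recall_at_k(range(1, 11), range(3, 13), k=10) == 0.8
# Perfect agreement (order does not matter)
assert recall_at_k([1, 2, 3], [3, 2, 1], k=3) == 1.0
```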

---

## Post-Migration Steps

### 1. Register Collections (Optional but Recommended)

```sql
-- Register for RuVector-specific features
SELECT ruvector_register_collection(
    'items_embeddings',
    'public',
    'items',
    'embedding',
    768,
    'l2'
);
```

### 2. Enable Tiered Storage (Optional)

```sql
-- Configure tiers
SELECT ruvector_set_tiers('items_embeddings', 24, 168, 720);
```

### 3. Set Up Integrity Monitoring (Optional)

```sql
-- Enable integrity monitoring
SELECT ruvector_integrity_policy_set('items_embeddings', 'default', '{
    "threshold_high": 0.8,
    "threshold_low": 0.3
}'::jsonb);
```

### 4. Update Application Code

```python
# Minimal changes needed for basic operations

# No change needed:
cursor.execute("SELECT * FROM items ORDER BY embedding <-> %s LIMIT 10", [vec])

# Optional: use new features
cursor.execute("SELECT * FROM ruvector_search('items_embeddings', %s, 10)", [vec])
```

---

## Support

- GitHub Issues: https://github.com/ruvnet/ruvector/issues
- Documentation: https://ruvector.dev/docs
- Migration Support: migration@ruvector.dev

826
docs/postgres/v2/10-consistency-replication.md
Normal file
@@ -0,0 +1,826 @@

# RuVector Postgres v2 - Consistency and Replication Model

## Overview

This document specifies the consistency contract between PostgreSQL heap tuples and the RuVector engine: MVCC interaction, WAL and logical decoding strategy, crash recovery, replay order, and idempotency guarantees.

---

## Core Consistency Contract

### Authoritative Source of Truth

```
+------------------------------------------------------------------+
|                      CONSISTENCY HIERARCHY                       |
+------------------------------------------------------------------+
|                                                                  |
|  1. PostgreSQL Heap is AUTHORITATIVE for:                        |
|     - Row existence                                              |
|     - Visibility rules (MVCC xmin/xmax)                          |
|     - Transaction commit status                                  |
|     - Data integrity constraints                                 |
|                                                                  |
|  2. RuVector Engine Index is EVENTUALLY CONSISTENT:              |
|     - Bounded lag window (configurable, default 100ms)           |
|     - Reconciled on demand                                       |
|     - Never returns invisible tuples                             |
|     - Never resurrects deleted embeddings                        |
|                                                                  |
+------------------------------------------------------------------+
```

### Consistency Guarantees

| Property | Guarantee | Enforcement |
|----------|-----------|-------------|
| **No phantom reads** | Index never returns invisible tuples | Heap visibility check on every result |
| **No zombie vectors** | Deleted vectors never return | Delete markers + tombstone cleanup |
| **No stale updates** | Updated vectors show new values | Version-aware index entries |
| **Bounded staleness** | Max lag from commit to searchable | Configurable, default 100ms |
| **Crash consistency** | Recoverable to last WAL checkpoint | WAL-based recovery |

---

## Consistency Mechanisms

### Option A: Synchronous Index Maintenance

```
INSERT/UPDATE Transaction:
+------------------------------------------------------------------+
|                                                                  |
|  1. BEGIN                                                        |
|  2. Write heap tuple                                             |
|  3. Call engine (synchronous)                                    |
|     └─ If engine rejects → ROLLBACK                              |
|  4. Append to WAL                                                |
|  5. COMMIT                                                       |
|                                                                  |
+------------------------------------------------------------------+

Pros:
- Strongest consistency
- Simple mental model
- No reconciliation needed

Cons:
- Higher latency per operation
- Engine failure blocks writes
- Reduces write throughput
```

### Option B: Asynchronous Maintenance with Reconciliation

```
INSERT/UPDATE Transaction:
+------------------------------------------------------------------+
|                                                                  |
|  1. BEGIN                                                        |
|  2. Write heap tuple                                             |
|  3. Write to change log table OR trigger logical decoding        |
|  4. Append to WAL                                                |
|  5. COMMIT                                                       |
|                                                                  |
|  Background (continuous):                                        |
|  6. Engine reads change log / logical replication stream         |
|  7. Applies changes to index                                     |
|  8. Index scan checks heap visibility for every result           |
|                                                                  |
+------------------------------------------------------------------+

Pros:
- Lower write latency
- Engine failure doesn't block writes
- Higher throughput

Cons:
- Bounded staleness window
- Requires visibility rechecks
- More complex recovery
```

### v2 Hybrid Model (Recommended)

```
+------------------------------------------------------------------+
|                   v2 HYBRID CONSISTENCY MODEL                    |
+------------------------------------------------------------------+
|                                                                  |
|  SYNCHRONOUS (Hot Tier):                                         |
|  - Primary HNSW index mutations                                  |
|  - Hot tier inserts/updates                                      |
|  - Visibility-critical operations                                |
|                                                                  |
|  ASYNCHRONOUS (Background):                                      |
|  - Compaction and tier moves                                     |
|  - Graph edge maintenance                                        |
|  - GNN training data capture                                     |
|  - Cold tier updates                                             |
|  - Index optimization/rewiring                                   |
|                                                                  |
+------------------------------------------------------------------+
```

---

## Implementation Details

### Visibility Check Protocol

```rust
/// Check heap visibility for index results
pub fn check_visibility(
    snapshot: &Snapshot,
    results: &[IndexResult],
) -> Vec<IndexResult> {
    results.iter()
        .filter(|r| {
            // Fetch heap tuple header
            let htup = heap_fetch_tuple_header(r.tid);

            // Check MVCC visibility
            htup.map_or(false, |h| {
                heap_tuple_satisfies_snapshot(h, snapshot)
            })
        })
        .cloned()
        .collect()
}

/// Index scan must always recheck heap
impl IndexScan {
    fn next(&mut self) -> Option<HeapTuple> {
        loop {
            // Get next candidate from index
            let candidate = self.index.next()?;

            // CRITICAL: Always verify against heap
            if let Some(tuple) = self.heap_fetch_visible(candidate.tid) {
                return Some(tuple);
            }
            // Invisible tuple, try next
        }
    }
}
```

### Incremental Candidate Paging API

The engine must support incremental candidate paging so the executor can skip MVCC-invisible rows and request more until k visible results are produced.

```rust
/// Search request with cursor support for incremental paging
#[derive(Debug)]
pub struct SearchRequest {
    pub collection_id: i32,
    pub query: Vec<f32>,
    pub want_k: usize,           // Desired visible results
    pub cursor: Option<Cursor>,  // Resume from previous batch
    pub max_candidates: usize,   // Max to return per batch (default: want_k * 2)
}

/// Search response with cursor for pagination
#[derive(Debug)]
pub struct SearchResponse {
    pub candidates: Vec<Candidate>,
    pub cursor: Option<Cursor>,  // None if exhausted
    pub total_scanned: usize,
}

/// Cursor token for resuming search
#[derive(Debug, Clone)]
pub struct Cursor {
    pub ef_search_position: usize,
    pub last_distance: f32,
    pub visited_count: usize,
}

/// Engine returns batches with cursor tokens
impl Engine {
    pub fn search_batch(&self, req: SearchRequest) -> SearchResponse {
        let start_pos = req.cursor.map(|c| c.ef_search_position).unwrap_or(0);

        // Continue HNSW search from cursor position
        let (candidates, next_pos, exhausted) = self.hnsw.search_continue(
            &req.query,
            req.max_candidates,
            start_pos,
        );

        // Build the cursor before moving `candidates` into the response
        let cursor = if exhausted {
            None
        } else {
            Some(Cursor {
                ef_search_position: next_pos,
                last_distance: candidates.last().map(|c| c.distance).unwrap_or(f32::MAX),
                visited_count: start_pos + candidates.len(),
            })
        };
        let total_scanned = start_pos + candidates.len();

        SearchResponse { candidates, cursor, total_scanned }
    }
}

/// Executor uses incremental paging
fn execute_vector_search(query: &[f32], k: usize, snapshot: &Snapshot) -> Vec<HeapTuple> {
    let mut results = Vec::with_capacity(k);
    let mut cursor = None;

    loop {
        // Request batch from engine
        let response = engine.search_batch(SearchRequest {
            collection_id,
            query: query.to_vec(),
            want_k: k - results.len(),
            cursor,
            max_candidates: (k - results.len()) * 2, // Over-fetch
        });

        // Check visibility and collect visible tuples
        for candidate in response.candidates {
            if let Some(tuple) = heap_fetch_visible(candidate.tid, snapshot) {
                results.push(tuple);
                if results.len() >= k {
                    return results;
                }
            }
        }

        // Check if exhausted
        match response.cursor {
            Some(c) => cursor = Some(c),
            None => break, // No more candidates
        }
    }

    results
}
```

### Change Log Table (Async Mode)

```sql
-- Change log for async reconciliation
CREATE TABLE ruvector._change_log (
    id BIGSERIAL PRIMARY KEY,
    collection_id INTEGER NOT NULL,
    operation CHAR(1) NOT NULL CHECK (operation IN ('I', 'U', 'D')),
    tuple_tid TID NOT NULL,
    vector_data BYTEA,  -- NULL for deletes
    source_xid BIGINT NOT NULL,  -- from txid_current(); "xmin" itself collides with a system column
    committed BOOLEAN DEFAULT FALSE,
    applied BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);

CREATE INDEX idx_change_log_pending
ON ruvector._change_log(collection_id, id)
WHERE NOT applied;

-- Trigger to capture changes
CREATE FUNCTION ruvector._log_change() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
        SELECT collection_id, 'I', NEW.ctid, NEW.embedding, txid_current()
        FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
        SELECT collection_id, 'U', NEW.ctid, NEW.embedding, txid_current()
        FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
        SELECT collection_id, 'D', OLD.ctid, NULL, txid_current()
        FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

### Logical Decoding (Alternative)

```rust
/// Logical decoding output plugin for RuVector
pub struct RuVectorOutputPlugin;

impl OutputPlugin for RuVectorOutputPlugin {
    fn begin_txn(&mut self, xid: TransactionId) {
        self.current_xid = Some(xid);
        self.changes.clear();
    }

    fn change(&mut self, relation: &Relation, change: &Change) {
        // Only process tables with vector columns
        if !self.is_vector_table(relation) {
            return;
        }

        match change {
            Change::Insert(new) => {
                self.changes.push(VectorChange::Insert {
                    tid: new.tid,
                    vector: extract_vector(new),
                });
            }
            Change::Update(old, new) => {
                self.changes.push(VectorChange::Update {
                    old_tid: old.tid,
                    new_tid: new.tid,
                    vector: extract_vector(new),
                });
            }
            Change::Delete(old) => {
                self.changes.push(VectorChange::Delete {
                    tid: old.tid,
                });
            }
        }
    }

    fn commit_txn(&mut self, xid: TransactionId, commit_lsn: XLogRecPtr) {
        // Apply all changes atomically
        self.engine.apply_changes(&self.changes, commit_lsn);
    }
}
```

---

## MVCC Interaction

### Transaction Visibility Rules

```rust
/// Snapshot-aware index search
pub fn search_with_snapshot(
    collection_id: i32,
    query: &[f32],
    k: usize,
    snapshot: &Snapshot,
) -> Vec<SearchResult> {
    // Get more candidates than k to account for invisible tuples
    let over_fetch_factor = 2.0;
    let candidates = engine.search(
        collection_id,
        query,
        (k as f32 * over_fetch_factor) as usize,
    );

    // Filter by visibility
    let visible: Vec<_> = candidates.into_iter()
        .filter(|c| is_visible(c.tid, snapshot))
        .take(k)
        .collect();

    // If we don't have enough, fetch more
    if visible.len() < k {
        // Recursive fetch with larger over_fetch
        return search_with_larger_pool(...);
    }

    visible
}

/// Check tuple visibility against snapshot
fn is_visible(tid: TupleId, snapshot: &Snapshot) -> bool {
    let htup = unsafe { heap_fetch_tuple(tid) };

    match htup {
        Some(tuple) => {
            // HeapTupleSatisfiesVisibility equivalent
            let xmin = tuple.t_xmin;
            let xmax = tuple.t_xmax;

            // Inserted by a committed transaction visible to this snapshot
            let xmin_visible = xmin < snapshot.xmax &&
                !snapshot.xip.contains(&xmin) &&
                pg_xact_status(xmin) == XACT_STATUS_COMMITTED;

            // Not deleted, or deleted by a transaction not visible to us
            let not_deleted = xmax == InvalidTransactionId ||
                snapshot.xmax <= xmax ||
                snapshot.xip.contains(&xmax) ||
                pg_xact_status(xmax) != XACT_STATUS_COMMITTED;

            xmin_visible && not_deleted
        }
        None => false, // Tuple vacuumed away
    }
}
```
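
The predicate above can be modelled outside Postgres in a few lines (a simplified sketch: transaction ids are plain integers, and a dictionary stands in for `pg_xact_status`):

```python
COMMITTED = "committed"

def tuple_is_visible(xmin, xmax, snap_xmax, in_progress, xact_status):
    """Simplified model of the snapshot check: the inserting transaction
    must be committed and visible; any deleting transaction must not be."""
    inserted_visible = (
        xmin < snap_xmax
        and xmin not in in_progress
        and xact_status.get(xmin) == COMMITTED
    )
    not_deleted = (
        xmax is None                 # never deleted
        or xmax >= snap_xmax         # deleted "after" the snapshot
        or xmax in in_progress       # deleter still running
        or xact_status.get(xmax) != COMMITTED
    )
    return inserted_visible and not_deleted

status = {100: COMMITTED, 105: COMMITTED}
# Inserted by committed xid 100, never deleted: visible
assert tuple_is_visible(100, None, 110, set(), status)
# Deleted by committed xid 105 before the snapshot: invisible
assert not tuple_is_visible(100, 105, 110, set(), status)
# Deleting transaction still in progress: still visible
assert tuple_is_visible(100, 105, 110, {105}, status)
```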

### HOT Update Handling

```rust
/// Handle Heap-Only Tuple updates
pub fn handle_hot_update(old_tid: TupleId, new_tid: TupleId, new_vector: &[f32]) {
    // HOT updates may change ctid without changing the embedding
    if vectors_equal(get_vector(old_tid), new_vector) {
        // Only ctid changed, update TID mapping
        engine.update_tid_mapping(old_tid, new_tid);
    } else {
        // Vector changed, full update needed
        engine.delete(old_tid);
        engine.insert(new_tid, new_vector);
    }
}
```

---

## WAL and Recovery

### WAL Record Types

```rust
/// Custom WAL record types for RuVector
#[repr(u8)]
pub enum RuVectorWalRecord {
    /// Vector inserted into index
    IndexInsert = 0x10,
    /// Vector deleted from index
    IndexDelete = 0x11,
    /// Index page split
    IndexSplit = 0x12,
    /// HNSW edge added
    HnswEdgeAdd = 0x20,
    /// HNSW edge removed
    HnswEdgeRemove = 0x21,
    /// Tier change
    TierChange = 0x30,
    /// Integrity state change
    IntegrityChange = 0x40,
}

impl RuVectorWalRecord {
    /// Write WAL record
    pub fn write(&self, data: &[u8]) -> XLogRecPtr {
        unsafe {
            let rdata = XLogRecData {
                data: data.as_ptr() as *mut c_char,
                len: data.len() as u32,
                next: std::ptr::null_mut(),
            };

            XLogInsert(RM_RUVECTOR_ID, self.to_u8(), &rdata)
        }
    }
}
```

### Crash Recovery

```rust
/// Redo function for crash recovery
pub extern "C" fn ruvector_redo(record: *mut XLogReaderState) {
    let info = unsafe { (*record).decoded_record.as_ref() };

    match RuVectorWalRecord::from_u8(info.xl_info) {
        Some(RuVectorWalRecord::IndexInsert) => {
            let insert_data: IndexInsertData = deserialize(info.data);
            engine.redo_insert(insert_data);
        }
        Some(RuVectorWalRecord::IndexDelete) => {
            let delete_data: IndexDeleteData = deserialize(info.data);
            engine.redo_delete(delete_data);
        }
        Some(RuVectorWalRecord::HnswEdgeAdd) => {
            let edge_data: HnswEdgeData = deserialize(info.data);
            engine.redo_edge_add(edge_data);
        }
        // ... other record types
        _ => {
            pgrx::warning!("Unknown RuVector WAL record type");
        }
    }
}

/// Startup recovery sequence
pub fn startup_recovery() {
    pgrx::log!("RuVector: Starting crash recovery");

    // 1. Load last consistent checkpoint
    let checkpoint = load_checkpoint();

    // 2. Rebuild in-memory structures
    engine.load_from_checkpoint(&checkpoint);

    // 3. Replay WAL from checkpoint
    let wal_reader = WalReader::from_lsn(checkpoint.redo_lsn);
    for record in wal_reader {
        ruvector_redo(&record);
    }

    // 4. Reconcile with heap if needed
    if checkpoint.needs_reconciliation {
        reconcile_with_heap();
    }

    pgrx::log!("RuVector: Recovery complete");
}
```

### Replay Order Guarantees

```
WAL Replay Order Contract:
+------------------------------------------------------------------+
|                                                                  |
|  1. WAL records replayed in LSN order (guaranteed by PostgreSQL) |
|                                                                  |
|  2. Within a transaction:                                        |
|     - Heap insert before index insert                            |
|     - Index delete before heap delete (for visibility)           |
|                                                                  |
|  3. Cross-transaction:                                           |
|     - Commit order preserved                                     |
|     - Visibility respects commit timestamps                      |
|                                                                  |
|  4. Recovery invariant:                                          |
|     - After recovery, index matches committed heap state         |
|     - No uncommitted changes in index                            |
|                                                                  |
+------------------------------------------------------------------+
```

---

## Idempotency and Ordering Rules

**CRITICAL**: If the WAL is the source of truth, these invariants are what keep replay from gradually corrupting the index.

### Explicit Replay Rules

```
+------------------------------------------------------------------+
|                     ENGINE REPLAY INVARIANTS                     |
+------------------------------------------------------------------+

RULE 1: Apply operations in LSN order
  - Each operation carries its source LSN
  - Engine rejects out-of-order operations
  - Crash recovery replays from last checkpoint LSN

RULE 2: Store last applied LSN per collection
  - Persisted in ruvector.collection_state.last_applied_lsn
  - Updated atomically after each operation
  - Skip operations with LSN <= last_applied_lsn

RULE 3: Delete wins over insert for same TID
  - If a TID is inserted then deleted, the final state is deleted
  - Replay order handles this naturally if LSN-ordered
  - Edge case: TID reuse after VACUUM requires checking xmin

RULE 4: Update = Delete + Insert
  - Updates decompose to delete old, insert new
  - Both carry the same transaction LSN
  - Applied atomically

RULE 5: Rollback handling
  - Uncommitted operations are not in the WAL (crash safe)
  - For explicit ROLLBACK during runtime:
    - Synchronous mode: engine notified, reverts in-memory state
    - Async mode: change log entry marked rollback, skipped on apply

+------------------------------------------------------------------+
```

### Conflict Resolution

```rust
/// Handle conflicts during replay (method on the engine)
pub fn apply_with_conflict_resolution(
    &mut self,
    op: WalOperation,
) -> Result<(), ReplayError> {
    // Check LSN ordering
    let last_lsn = self.lsn_tracker.get(op.collection_id);
    if op.lsn <= last_lsn {
        // Already applied, skip (idempotent)
        return Ok(());
    }

    match op.kind {
        OpKind::Insert { tid, vector } => {
            if self.index.contains_tid(tid) {
                // TID exists - check if this is TID reuse after VACUUM
                let existing_lsn = self.index.get_lsn(tid);
                if op.lsn > existing_lsn {
                    // Newer insert wins - delete old, insert new
                    self.index.delete(tid);
                    self.index.insert(tid, &vector, op.lsn);
                }
                // else: stale insert, skip
            } else {
                self.index.insert(tid, &vector, op.lsn);
            }
        }
        OpKind::Delete { tid } => {
            // Delete always wins if LSN is newer
            if self.index.contains_tid(tid) {
                let existing_lsn = self.index.get_lsn(tid);
                if op.lsn > existing_lsn {
                    self.index.delete(tid);
                }
            }
            // If not present, already deleted - idempotent
        }
        OpKind::Update { old_tid, new_tid, vector } => {
            // Atomic delete + insert
            self.index.delete(old_tid);
            self.index.insert(new_tid, &vector, op.lsn);
        }
    }

    self.lsn_tracker.update(op.collection_id, op.lsn);
    Ok(())
}
```

### Idempotent Operations

```rust
/// All engine operations must be idempotent for safe replay
impl Engine {
    /// Idempotent insert - safe to replay
    pub fn redo_insert(&mut self, data: IndexInsertData) {
        // Check if already present
        if self.index.contains_tid(data.tid) {
            // Already inserted, skip
            return;
        }

        // Insert with LSN tracking
        self.index.insert_with_lsn(data.tid, &data.vector, data.lsn);
    }

    /// Idempotent delete - safe to replay
    pub fn redo_delete(&mut self, data: IndexDeleteData) {
        // Check if already deleted
        if !self.index.contains_tid(data.tid) {
            // Already deleted, skip
            return;
        }

        // Delete with tombstone
        self.index.delete_with_lsn(data.tid, data.lsn);
    }

    /// Idempotent edge add - safe to replay
    pub fn redo_edge_add(&mut self, data: HnswEdgeData) {
        // HNSW edge insertion is idempotent by nature
        self.hnsw.add_edge(data.from, data.to, data.lsn);
    }
}
```

### LSN-Based Deduplication

```rust
/// Track the applied LSN per collection
pub struct LsnTracker {
    applied_lsn: HashMap<i32, XLogRecPtr>,
}

impl LsnTracker {
    /// Check if an operation should be applied
    pub fn should_apply(&self, collection_id: i32, lsn: XLogRecPtr) -> bool {
        match self.applied_lsn.get(&collection_id) {
            Some(&last_lsn) => lsn > last_lsn,
            None => true,
        }
    }

    /// Mark an operation as applied
    pub fn mark_applied(&mut self, collection_id: i32, lsn: XLogRecPtr) {
        self.applied_lsn.insert(collection_id, lsn);
    }
}
```
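
A minimal, self-contained sketch of how the tracker makes replay idempotent. `XLogRecPtr` is modeled here as a plain `u64`, and the `new` constructor is added for illustration; neither is the shipped definition.

```rust
use std::collections::HashMap;

/// Stand-in for LsnTracker above; XLogRecPtr modeled as u64.
pub struct LsnTracker {
    applied_lsn: HashMap<i32, u64>,
}

impl LsnTracker {
    pub fn new() -> Self {
        Self { applied_lsn: HashMap::new() }
    }

    /// An operation applies only if its LSN is strictly newer
    /// than the last applied LSN for that collection.
    pub fn should_apply(&self, collection_id: i32, lsn: u64) -> bool {
        match self.applied_lsn.get(&collection_id) {
            Some(&last_lsn) => lsn > last_lsn,
            None => true,
        }
    }

    pub fn mark_applied(&mut self, collection_id: i32, lsn: u64) {
        self.applied_lsn.insert(collection_id, lsn);
    }
}
```

Replaying the same WAL segment twice is then a no-op: once LSN 100 is marked for a collection, operations at LSN 90 or 100 are skipped while 101 still applies, and other collections are unaffected.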

---

## Replication Strategies

### Physical Replication (Streaming)

```
Primary → Standby streaming with RuVector:

Primary:
  1. Write heap + index changes
  2. Generate WAL records
  3. Stream to standby

Standby:
  1. Receive WAL stream
  2. Apply heap changes (PostgreSQL)
  3. Apply index changes (RuVector redo)
  4. Engine state matches primary
```

### Logical Replication

```
Publisher → Subscriber with RuVector:

Publisher:
  1. Changes captured via logical decoding
  2. RuVector output plugin extracts vector changes
  3. Publishes to replication slot

Subscriber:
  1. Receives logical changes
  2. Applies them to the local heap
  3. Local RuVector engine indexes the changes
  4. Independent index structures
```

---

## Configuration

```sql
-- Consistency configuration
ALTER SYSTEM SET ruvector.consistency_mode = 'hybrid';   -- 'sync', 'async', 'hybrid'
ALTER SYSTEM SET ruvector.max_lag_ms = 100;              -- Max staleness window
ALTER SYSTEM SET ruvector.visibility_recheck = true;     -- Always recheck heap
ALTER SYSTEM SET ruvector.wal_level = 'logical';         -- For logical replication

-- Recovery configuration
ALTER SYSTEM SET ruvector.checkpoint_interval = 300;     -- Checkpoint every 5 min
ALTER SYSTEM SET ruvector.wal_buffer_size = '64MB';      -- WAL buffer
ALTER SYSTEM SET ruvector.recovery_target_timeline = 'latest';
```

---

## Monitoring

```sql
-- Consistency lag monitoring
SELECT
    c.name AS collection,
    s.last_heap_lsn,
    s.last_index_lsn,
    pg_wal_lsn_diff(s.last_heap_lsn, s.last_index_lsn) AS lag_bytes,
    s.lag_ms,
    s.pending_changes
FROM ruvector.consistency_status s
JOIN ruvector.collections c ON s.collection_id = c.id;

-- Visibility recheck statistics
SELECT
    collection_name,
    total_searches,
    visibility_rechecks,
    invisible_filtered,
    (invisible_filtered::float / NULLIF(visibility_rechecks, 0) * 100)::numeric(5,2) AS invisible_pct
FROM ruvector.visibility_stats
ORDER BY invisible_pct DESC;

-- WAL replay status
SELECT
    pg_last_wal_receive_lsn() AS receive_lsn,
    pg_last_wal_replay_lsn() AS replay_lsn,
    ruvector_last_applied_lsn() AS ruvector_lsn,
    pg_wal_lsn_diff(pg_last_wal_replay_lsn(), ruvector_last_applied_lsn()) AS ruvector_lag_bytes;
```

---

## Testing Requirements

### Unit Tests

- Visibility check correctness
- Idempotent operation replay
- LSN tracking accuracy
- MVCC snapshot handling

### Integration Tests

- Crash recovery scenarios
- Concurrent transaction visibility
- Replication lag handling
- HOT update handling

### Chaos Tests

- Primary failover
- Network partition during replication
- Partial WAL replay
- Checkpoint corruption recovery

---

## Summary

The v2 consistency model ensures:

1. **Heap is authoritative** - All visibility decisions defer to the PostgreSQL heap
2. **Bounded staleness** - The index catches up within a configurable lag window
3. **Crash safe** - WAL-based recovery with idempotent replay
4. **Replication compatible** - Works with both streaming and logical replication
5. **MVCC aware** - Respects transaction isolation guarantees

---

<!-- docs/postgres/v2/11-hybrid-search.md -->

# RuVector Postgres v2 - Hybrid Search (BM25 + Vector)

## Why Hybrid Search Matters

Vector search finds semantically similar content. Keyword search finds exact matches.

Neither is sufficient alone:

- **Vector-only** misses exact keyword matches (product SKUs, error codes, names)
- **Keyword-only** misses semantic similarity ("car" vs "automobile")

Every production RAG system needs both. pgvector doesn't have this. We do.

---

## Design Goals

1. **Single query, both signals** — No application-level fusion
2. **Configurable blending** — RRF, linear, learned weights
3. **Integrity-aware** — The hybrid index participates in the contracted graph
4. **PostgreSQL-native** — Leverages `tsvector` and GIN indexes

---

---

## Architecture

```
                 +------------------+
                 |   Hybrid Query   |
                 | "error 500 fix"  |
                 +--------+---------+
                          |
          +---------------+----------------+
          |                                |
 +--------v--------+            +----------v--------+
 |  Vector Branch  |            |  Keyword Branch   |
 |   (HNSW/IVF)    |            |  (GIN/tsvector)   |
 +--------+--------+            +----------+--------+
          |                                |
          | top-100 by cosine              | top-100 by BM25
          |                                |
          +---------------+----------------+
                          |
                 +--------v--------+
                 |  Fusion Layer   |
                 |  (RRF / Linear) |
                 +--------+--------+
                          |
                 +--------v--------+
                 |   Final top-k   |
                 +--------+--------+
                          |
                 +--------v--------+
                 | Optional Rerank |
                 +-----------------+
```

---

## SQL Interface

### Basic Hybrid Search

```sql
-- Simple hybrid search with default RRF fusion
SELECT * FROM ruvector_hybrid_search(
    'documents',                 -- collection name
    query_text   := 'database connection timeout error',
    query_vector := $embedding,
    k            := 10
);

-- Returns: id, content, vector_score, keyword_score, hybrid_score
```

### Configurable Fusion

```sql
-- RRF (Reciprocal Rank Fusion) - default, robust
SELECT * FROM ruvector_hybrid_search(
    'documents',
    query_text   := 'postgres replication lag',
    query_vector := $embedding,
    k            := 20,
    fusion       := 'rrf',
    rrf_k        := 60           -- RRF constant (default 60)
);

-- Linear blend with alpha
SELECT * FROM ruvector_hybrid_search(
    'documents',
    query_text   := 'postgres replication lag',
    query_vector := $embedding,
    k            := 20,
    fusion       := 'linear',
    alpha        := 0.7          -- 0.7 * vector + 0.3 * keyword
);

-- Learned fusion weights (from query patterns)
SELECT * FROM ruvector_hybrid_search(
    'documents',
    query_text   := 'postgres replication lag',
    query_vector := $embedding,
    k            := 20,
    fusion       := 'learned'    -- Uses GNN-trained weights
);
```

### Operator Syntax (Advanced)

```sql
-- Using the hybrid scoring function in ORDER BY
SELECT id, content,
       ruvector_hybrid_score(
           embedding <=> $query_vec,
           ts_rank_cd(fts, plainto_tsquery($query_text)),
           alpha := 0.6
       ) AS score
FROM documents
WHERE fts @@ plainto_tsquery($query_text)   -- Pre-filter
   OR embedding <=> $query_vec < 0.5        -- Or similar vectors
ORDER BY score DESC
LIMIT 10;
```

---

## Schema Requirements

### Collection with Hybrid Support

```sql
-- Create a table with both vector and FTS columns
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1536) NOT NULL,
    fts tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    metadata JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector index
CREATE INDEX idx_documents_embedding
    ON documents USING ruhnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 100);

-- FTS index
CREATE INDEX idx_documents_fts
    ON documents USING gin (fts);

-- Register for hybrid search
SELECT ruvector_register_hybrid(
    collection    := 'documents',
    vector_column := 'embedding',
    fts_column    := 'fts',
    text_column   := 'content'   -- For BM25 stats
);
```

### Hybrid Registration Table

```sql
-- Internal: tracks hybrid-enabled collections
CREATE TABLE ruvector.hybrid_collections (
    id SERIAL PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES ruvector.collections(id),
    vector_column TEXT NOT NULL,
    fts_column TEXT NOT NULL,
    text_column TEXT NOT NULL,

    -- BM25 parameters (computed from the corpus)
    avg_doc_length REAL,
    doc_count BIGINT,
    k1 REAL DEFAULT 1.2,
    b REAL DEFAULT 0.75,

    -- Fusion settings
    default_fusion TEXT DEFAULT 'rrf',
    default_alpha REAL DEFAULT 0.5,
    learned_weights JSONB,

    -- Stats
    last_stats_update TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

---

## BM25 Implementation

### Why Not Just ts_rank?

PostgreSQL's `ts_rank` is not true BM25. It doesn't account for:

- Document length normalization
- IDF weighting across the corpus
- Term frequency saturation

We implement proper BM25 in the engine.

### BM25 Scoring
|
||||
|
||||
```rust
|
||||
// src/hybrid/bm25.rs
|
||||
|
||||
/// BM25 scorer with corpus statistics
|
||||
pub struct BM25Scorer {
|
||||
k1: f32, // Term frequency saturation (default 1.2)
|
||||
b: f32, // Length normalization (default 0.75)
|
||||
avg_doc_len: f32, // Average document length
|
||||
doc_count: u64, // Total documents
|
||||
idf_cache: HashMap<String, f32>, // Cached IDF values
|
||||
}
|
||||
|
||||
impl BM25Scorer {
|
||||
/// Compute IDF for a term
|
||||
fn idf(&self, doc_freq: u64) -> f32 {
|
||||
let n = self.doc_count as f32;
|
||||
let df = doc_freq as f32;
|
||||
((n - df + 0.5) / (df + 0.5) + 1.0).ln()
|
||||
}
|
||||
|
||||
/// Score a document for a query
|
||||
pub fn score(&self, doc: &Document, query_terms: &[String]) -> f32 {
|
||||
let doc_len = doc.term_count as f32;
|
||||
let len_norm = 1.0 - self.b + self.b * (doc_len / self.avg_doc_len);
|
||||
|
||||
query_terms.iter()
|
||||
.filter_map(|term| {
|
||||
let tf = doc.term_freq(term)? as f32;
|
||||
let idf = self.idf_cache.get(term)?;
|
||||
|
||||
// BM25 formula
|
||||
let numerator = tf * (self.k1 + 1.0);
|
||||
let denominator = tf + self.k1 * len_norm;
|
||||
|
||||
Some(idf * numerator / denominator)
|
||||
})
|
||||
.sum()
|
||||
}
|
||||
}
|
||||
```
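
To make the IDF term concrete, here is the same formula as a free function with a worked example over a hypothetical corpus of 1,000 documents (corpus numbers are illustrative, not from RuVector):

```rust
/// IDF exactly as in BM25Scorer::idf above, as a free function.
fn idf(doc_count: u64, doc_freq: u64) -> f32 {
    let n = doc_count as f32;
    let df = doc_freq as f32;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}
```

Rare terms dominate: in a 1,000-document corpus a term in 1 document scores about 6.50, one in 10 documents about 4.56, and one in 900 documents about 0.11. This is why an exact SKU or error code outranks common words in the keyword branch.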

### Corpus Statistics Update

```sql
-- Update BM25 statistics (run periodically or after bulk inserts)
SELECT ruvector_hybrid_update_stats('documents');

-- Stats are stored in the hybrid_collections table,
-- computed via a background worker or on demand
```

```rust
// Background worker updates corpus stats
pub fn update_bm25_stats(collection_id: i32) -> Result<(), Error> {
    Spi::run(|client| {
        // Get the average document length
        let avg_len: f64 = client.select(
            "SELECT AVG(LENGTH(content)) FROM documents",
            None, &[]
        )?.first().unwrap().get(1)?;

        // Get the document count
        let doc_count: i64 = client.select(
            "SELECT COUNT(*) FROM documents",
            None, &[]
        )?.first().unwrap().get(1)?;

        // Update term frequencies (using tsvector stats)
        // ... compute IDF cache ...

        client.update(
            "UPDATE ruvector.hybrid_collections
             SET avg_doc_length = $1, doc_count = $2, last_stats_update = NOW()
             WHERE collection_id = $3",
            None,
            &[avg_len.into(), doc_count.into(), collection_id.into()]
        )
    })
}
```

---

## Fusion Algorithms

### Reciprocal Rank Fusion (RRF)

The default and most robust option; it works without score calibration.

```rust
// src/hybrid/fusion.rs

/// RRF fusion: score = sum(1 / (k + rank_i))
pub fn rrf_fusion(
    vector_results: &[(DocId, f32)],   // (id, distance)
    keyword_results: &[(DocId, f32)],  // (id, bm25_score)
    k: usize,                          // RRF constant (default 60)
    limit: usize,
) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();

    // Vector ranking (lower distance = higher rank)
    for (rank, (doc_id, _)) in vector_results.iter().enumerate() {
        *scores.entry(*doc_id).or_default() += 1.0 / (k + rank + 1) as f32;
    }

    // Keyword ranking (higher BM25 = higher rank)
    for (rank, (doc_id, _)) in keyword_results.iter().enumerate() {
        *scores.entry(*doc_id).or_default() += 1.0 / (k + rank + 1) as f32;
    }

    // Sort by fused score
    let mut results: Vec<_> = scores.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results.truncate(limit);
    results
}
```
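
A toy run makes the rank arithmetic visible. This simplified variant (plain `u64` IDs, already-ranked ID lists, no `limit`) is an illustration of the same formula, not the shipped code:

```rust
use std::collections::HashMap;

/// Simplified RRF over two ranked ID lists (best first).
fn rrf(vector_ranked: &[u64], keyword_ranked: &[u64], k: usize) -> Vec<(u64, f32)> {
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for ranked in [vector_ranked, keyword_ranked] {
        for (rank, id) in ranked.iter().enumerate() {
            // rank is 0-based, so the best item contributes 1/(k+1)
            *scores.entry(*id).or_default() += 1.0 / (k + rank + 1) as f32;
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

With k = 60, vector ranking [1, 2, 3] and keyword ranking [3, 2, 4]: doc 3 (ranks 3 and 1) scores 1/63 + 1/61 and narrowly beats doc 2 (ranks 2 and 2) at 2/62. Documents appearing in both lists dominate those appearing in only one, which is exactly the behavior hybrid search wants.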

### Linear Fusion

A simple weighted combination; it requires score normalization.

```rust
/// Linear fusion: score = alpha * vec_score + (1 - alpha) * kw_score
pub fn linear_fusion(
    vector_results: &[(DocId, f32)],
    keyword_results: &[(DocId, f32)],
    alpha: f32,
    limit: usize,
) -> Vec<(DocId, f32)> {
    // Normalize vector scores (convert distance to similarity)
    let vec_scores = normalize_to_similarity(vector_results);

    // Normalize BM25 scores to [0, 1]
    let kw_scores = min_max_normalize(keyword_results);

    // Combine
    let mut combined: HashMap<DocId, f32> = HashMap::new();

    for (doc_id, score) in vec_scores {
        *combined.entry(doc_id).or_default() += alpha * score;
    }

    for (doc_id, score) in kw_scores {
        *combined.entry(doc_id).or_default() += (1.0 - alpha) * score;
    }

    let mut results: Vec<_> = combined.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results.truncate(limit);
    results
}
```
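
The helpers `normalize_to_similarity` and `min_max_normalize` are referenced above but not shown. A plausible sketch follows; the cosine-distance range of [0, 2] and the `u64` doc IDs are both assumptions, not the shipped implementation:

```rust
/// Map cosine distances in [0, 2] to similarities in [0, 1]
/// (assumes the vector branch reports cosine distance).
fn normalize_to_similarity(results: &[(u64, f32)]) -> Vec<(u64, f32)> {
    results.iter().map(|&(id, dist)| (id, 1.0 - dist / 2.0)).collect()
}

/// Min-max scale raw BM25 scores to [0, 1].
fn min_max_normalize(results: &[(u64, f32)]) -> Vec<(u64, f32)> {
    let min = results.iter().map(|r| r.1).fold(f32::INFINITY, f32::min);
    let max = results.iter().map(|r| r.1).fold(f32::NEG_INFINITY, f32::max);
    let range = (max - min).max(f32::EPSILON); // guard against a constant list
    results.iter().map(|&(id, s)| (id, (s - min) / range)).collect()
}
```

Unlike RRF, linear fusion is only meaningful after both branches are on the same scale; min-max scaling is per result set, so scores are comparable within a query but not across queries.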

### Learned Fusion

Uses query characteristics to select weights dynamically.

```rust
/// Learned fusion using GNN-predicted weights
pub fn learned_fusion(
    query_embedding: &[f32],
    query_terms: &[String],
    vector_results: &[(DocId, f32)],
    keyword_results: &[(DocId, f32)],
    model: &FusionModel,
    limit: usize,
) -> Vec<(DocId, f32)> {
    // Query features
    let features = QueryFeatures {
        embedding_norm: l2_norm(query_embedding),
        term_count: query_terms.len(),
        avg_term_idf: compute_avg_idf(query_terms),
        has_exact_match: detect_exact_match_intent(query_terms),
        query_type: classify_query_type(query_terms), // navigational, informational, etc.
    };

    // Predict the optimal alpha for this query
    let alpha = model.predict_alpha(&features);

    linear_fusion(vector_results, keyword_results, alpha, limit)
}
```

---

## Integrity Integration

Hybrid search participates in the integrity control plane.

### Contracted Graph Nodes

```sql
-- The hybrid index adds nodes to the contracted graph
INSERT INTO ruvector.contracted_graph (collection_id, node_type, node_id, node_name, health_score)
SELECT
    c.id,
    'hybrid_index',
    h.id,
    'hybrid_' || c.name,
    CASE
        WHEN h.last_stats_update > NOW() - INTERVAL '1 day' THEN 1.0
        WHEN h.last_stats_update > NOW() - INTERVAL '7 days' THEN 0.7
        ELSE 0.3  -- Stale stats degrade health
    END
FROM ruvector.hybrid_collections h
JOIN ruvector.collections c ON h.collection_id = c.id;
```

### Integrity-Aware Hybrid Search

```rust
/// Hybrid search with integrity gating
pub fn hybrid_search_with_integrity(
    collection_id: i32,
    query: &HybridQuery,
) -> Result<Vec<HybridResult>, Error> {
    // Check the integrity gate
    let gate = check_integrity_gate(collection_id, "hybrid_search");

    match gate.state {
        IntegrityState::Normal => {
            // Full hybrid: both branches
            execute_full_hybrid(query)
        }
        IntegrityState::Stress => {
            // Degrade gracefully: prefer the branch the query leans on
            if query.alpha > 0.5 {
                // Vector-heavy query: use vector only
                execute_vector_only(query)
            } else {
                // Keyword-heavy query: use keyword only
                execute_keyword_only(query)
            }
        }
        IntegrityState::Critical => {
            // Minimal: keyword only (cheapest)
            execute_keyword_only(query)
        }
    }
}
```

---

## Performance Optimization

### Pre-filtering Strategy

```sql
-- Hybrid search with a pre-filter (faster for selective filters)
SELECT * FROM ruvector_hybrid_search(
    'documents',
    query_text   := 'error handling',
    query_vector := $embedding,
    k            := 10,
    filter       := 'category = ''backend'' AND created_at > NOW() - INTERVAL ''30 days'''
);
```

```rust
// Execution strategy selection
fn choose_strategy(filter_selectivity: f32, corpus_size: u64) -> HybridStrategy {
    if filter_selectivity < 0.01 {
        // Very selective: pre-filter, then run hybrid on the small set
        HybridStrategy::PreFilter
    } else if filter_selectivity < 0.1 && corpus_size > 1_000_000 {
        // Moderately selective, large corpus: hybrid first, post-filter
        HybridStrategy::PostFilter
    } else {
        // Not selective: full hybrid
        HybridStrategy::Full
    }
}
```

### Parallel Execution

```rust
/// Execute vector and keyword branches in parallel
pub async fn parallel_hybrid(query: &HybridQuery) -> HybridResults {
    let (vector_results, keyword_results) = tokio::join!(
        execute_vector_branch(&query.embedding, query.prefetch_k),
        execute_keyword_branch(&query.text, query.prefetch_k),
    );

    fuse_results(vector_results, keyword_results, query.fusion, query.k)
}
```

### Caching

```rust
/// Cache BM25 scores for repeated terms
pub struct HybridCache {
    term_doc_scores: LruCache<(String, DocId), f32>,
    idf_cache: HashMap<String, f32>,
    ttl: Duration,
}
```

---

## Configuration

### GUC Parameters

```sql
-- Default fusion method
SET ruvector.hybrid_fusion = 'rrf';      -- 'rrf', 'linear', 'learned'

-- Default alpha for linear fusion
SET ruvector.hybrid_alpha = 0.5;

-- RRF constant
SET ruvector.hybrid_rrf_k = 60;

-- Prefetch size for each branch
SET ruvector.hybrid_prefetch_k = 100;

-- Enable parallel branch execution
SET ruvector.hybrid_parallel = true;
```

### Per-Collection Settings

```sql
SELECT ruvector_hybrid_configure('documents', '{
    "default_fusion": "learned",
    "prefetch_k": 200,
    "bm25_k1": 1.5,
    "bm25_b": 0.8,
    "stats_refresh_interval": "1 hour"
}'::jsonb);
```

---

## Monitoring

```sql
-- Hybrid search statistics
SELECT * FROM ruvector_hybrid_stats('documents');

-- Returns:
-- {
--   "total_searches": 15234,
--   "avg_vector_latency_ms": 4.2,
--   "avg_keyword_latency_ms": 2.1,
--   "avg_fusion_latency_ms": 0.3,
--   "cache_hit_rate": 0.67,
--   "last_stats_update": "2024-01-15T10:30:00Z",
--   "corpus_size": 1250000,
--   "avg_doc_length": 542
-- }
```

---

## Testing Requirements

### Correctness Tests

- BM25 scoring matches a reference implementation
- RRF fusion produces the expected rankings
- Linear fusion respects the alpha parameter
- Learned fusion adapts to query type

### Performance Tests

- Hybrid search < 2x single-branch latency
- Parallel execution shows a speedup
- Cache hit rate > 50% for repeated queries

### Integration Tests

- Integrity degradation triggers graceful fallback
- Stats updates don't block queries
- Large corpora (10M+ docs) scale

---
## Example: RAG Application

```sql
-- Complete RAG retrieval with hybrid search
WITH retrieved AS (
    SELECT
        id,
        content,
        hybrid_score,
        metadata
    FROM ruvector_hybrid_search(
        'knowledge_base',
        query_text   := $user_question,
        query_vector := $question_embedding,
        k            := 5,
        fusion       := 'rrf',
        filter       := 'status = ''published'''
    )
)
SELECT
    string_agg(content, E'\n\n---\n\n') AS context,
    array_agg(id) AS source_ids
FROM retrieved;

-- Pass the context to the LLM for answer generation
```

---

<!-- docs/postgres/v2/12-multi-tenancy.md -->

# RuVector Postgres v2 - Multi-Tenancy Model

## Why Multi-Tenancy Matters

Every SaaS application needs tenant isolation. Without native support, teams build:

- Separate databases per tenant (operational nightmare)
- Manual partition schemes (error-prone)
- Application-level filtering (security risk)

RuVector provides **first-class multi-tenancy** with:

- Tenant-isolated search (data never leaks)
- Per-tenant integrity monitoring (one bad tenant doesn't sink the others)
- Efficient shared infrastructure (cost-effective)
- Row-level security integration (PostgreSQL-native)

---

## Design Goals

1. **Zero data leakage** — Tenant A never sees Tenant B's vectors
2. **Per-tenant integrity** — Stress in one tenant doesn't affect the others
3. **Fair resource allocation** — No noisy-neighbor problems
4. **Transparent to queries** — SET the tenant, then run normal SQL
5. **Efficient storage** — Shared indexes where safe, isolated where needed

---

---

## Architecture

```
+------------------------------------------------------------------+
|                          Application                             |
|   SET ruvector.tenant_id = 'acme-corp';                          |
|   SELECT * FROM embeddings ORDER BY vec <-> $q LIMIT 10;         |
+------------------------------------------------------------------+
                               |
+------------------------------------------------------------------+
|                      Tenant Context Layer                        |
|   - Validates tenant_id                                          |
|   - Injects tenant filter into all operations                    |
|   - Routes to tenant-specific resources                          |
+------------------------------------------------------------------+
                               |
               +---------------+----------------+
               |                                |
      +--------v--------+            +----------v--------+
      |  Shared Index   |            |  Tenant Indexes   |
      | (small tenants) |            |  (large tenants)  |
      +--------+--------+            +----------+--------+
               |                                |
               +---------------+----------------+
                               |
+------------------------------------------------------------------+
|                      Per-Tenant Integrity                        |
|   - Separate contracted graphs                                   |
|   - Independent state machines                                   |
|   - Isolated throttling policies                                 |
+------------------------------------------------------------------+
```

---

## SQL Interface

### Setting Tenant Context

```sql
-- Set the tenant for the session (required before any operation)
SET ruvector.tenant_id = 'acme-corp';

-- Or per-transaction
BEGIN;
SET LOCAL ruvector.tenant_id = 'acme-corp';
-- ... operations ...
COMMIT;

-- Verify the current tenant
SELECT current_setting('ruvector.tenant_id');
```

### Tenant-Transparent Operations

```sql
-- Once the tenant is set, all operations are automatically scoped
SET ruvector.tenant_id = 'acme-corp';

-- Insert only sees/affects acme-corp data
INSERT INTO embeddings (content, vec) VALUES ('doc', $embedding);

-- Search only returns acme-corp results
SELECT * FROM embeddings ORDER BY vec <-> $query LIMIT 10;

-- Delete only affects acme-corp
DELETE FROM embeddings WHERE id = 123;
```

### Admin Operations (Cross-Tenant)

```sql
-- Superusers can query across tenants
SET ruvector.tenant_id = '*';  -- Wildcard (admin only)

-- View all tenants
SELECT * FROM ruvector_tenants();

-- View tenant stats
SELECT * FROM ruvector_tenant_stats('acme-corp');

-- Migrate a tenant to a dedicated index
SELECT ruvector_tenant_isolate('acme-corp');
```

---

## Schema Design

### Tenant Registry

```sql
CREATE TABLE ruvector.tenants (
    id TEXT PRIMARY KEY,
    display_name TEXT,

    -- Resource limits
    max_vectors BIGINT DEFAULT 1000000,
    max_collections INTEGER DEFAULT 10,
    max_qps INTEGER DEFAULT 100,

    -- Isolation level
    isolation_level TEXT DEFAULT 'shared' CHECK (isolation_level IN (
        'shared',     -- Shared index with tenant filter
        'partition',  -- Dedicated partition in shared index
        'dedicated'   -- Separate physical index
    )),

    -- Integrity settings
    integrity_enabled BOOLEAN DEFAULT true,
    integrity_policy_id INTEGER REFERENCES ruvector.integrity_policies(id),

    -- Metadata
    metadata JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    suspended_at TIMESTAMPTZ,  -- Non-null = suspended

    -- Stats (updated by a background worker)
    vector_count BIGINT DEFAULT 0,
    storage_bytes BIGINT DEFAULT 0,
    last_access TIMESTAMPTZ
);

CREATE INDEX idx_tenants_isolation ON ruvector.tenants(isolation_level);
CREATE INDEX idx_tenants_suspended ON ruvector.tenants(suspended_at) WHERE suspended_at IS NOT NULL;
```

### Tenant-Aware Collections

```sql
-- Collections can be tenant-specific or shared
CREATE TABLE ruvector.collections (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    tenant_id TEXT REFERENCES ruvector.tenants(id),  -- NULL = shared

    -- ... other columns from 01-sql-schema.md ...

    UNIQUE (name, tenant_id)  -- The same name is allowed for different tenants
);

-- Tenant-scoped view
CREATE VIEW ruvector.my_collections AS
SELECT * FROM ruvector.collections
WHERE tenant_id = current_setting('ruvector.tenant_id', true)
   OR tenant_id IS NULL;  -- Shared collections are visible to all
```

### Tenant Column in Data Tables

```sql
-- User tables include a tenant_id column
CREATE TABLE embeddings (
    id BIGSERIAL PRIMARY KEY,
    tenant_id TEXT NOT NULL DEFAULT current_setting('ruvector.tenant_id'),
    content TEXT,
    vec vector(1536),
    created_at TIMESTAMPTZ DEFAULT NOW(),

    CONSTRAINT fk_tenant FOREIGN KEY (tenant_id)
        REFERENCES ruvector.tenants(id) ON DELETE CASCADE
);

-- Partial index per tenant (for dedicated isolation)
CREATE INDEX idx_embeddings_vec_tenant_acme
    ON embeddings USING ruhnsw (vec vector_cosine_ops)
    WHERE tenant_id = 'acme-corp';

-- Or a shared index for shared isolation
CREATE INDEX idx_embeddings_vec_shared
    ON embeddings USING ruhnsw (vec vector_cosine_ops);
-- The engine internally filters by tenant_id
```

---

## Row-Level Security Integration

### RLS Policies

```sql
-- Enable RLS on data tables
ALTER TABLE embeddings ENABLE ROW LEVEL SECURITY;

-- Tenant isolation policy
CREATE POLICY tenant_isolation ON embeddings
    USING (tenant_id = current_setting('ruvector.tenant_id', true))
    WITH CHECK (tenant_id = current_setting('ruvector.tenant_id', true));

-- Admin bypass policy
CREATE POLICY admin_access ON embeddings
    FOR ALL
    TO ruvector_admin
    USING (true)
    WITH CHECK (true);
```

### Automatic Policy Creation

```sql
-- Helper function to set up RLS for a table
CREATE FUNCTION ruvector_enable_tenant_rls(
    p_table_name TEXT,
    p_tenant_column TEXT DEFAULT 'tenant_id'
) RETURNS void AS $$
BEGIN
    -- Enable RLS
    EXECUTE format('ALTER TABLE %I ENABLE ROW LEVEL SECURITY', p_table_name);

    -- Create isolation policy
    EXECUTE format(
        'CREATE POLICY tenant_isolation ON %I
            USING (%I = current_setting(''ruvector.tenant_id'', true))
            WITH CHECK (%I = current_setting(''ruvector.tenant_id'', true))',
        p_table_name, p_tenant_column, p_tenant_column
    );

    -- Create admin bypass
    EXECUTE format(
        'CREATE POLICY admin_bypass ON %I FOR ALL TO ruvector_admin USING (true)',
        p_table_name
    );
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT ruvector_enable_tenant_rls('embeddings');
SELECT ruvector_enable_tenant_rls('documents');
```

---

## Isolation Levels

### Shared (Default)

All tenants share one index. The engine filters results by tenant_id.

```
Pros:
+ Most memory-efficient
+ Fastest for small tenants
+ Simple management

Cons:
- Some cross-tenant cache pollution
- Shared integrity state

Best for: < 100K vectors per tenant
```

### Partition

Tenants get dedicated partitions within a shared index structure.

```
Pros:
+ Better cache isolation
+ Per-partition integrity
+ Easy promotion to dedicated

Cons:
- Some overhead per partition
- Still shares top-level structure

Best for: 100K - 10M vectors per tenant
```

### Dedicated

The tenant gets a completely separate physical index.

```
Pros:
+ Complete isolation
+ Independent scaling
+ Custom index parameters

Cons:
- Higher memory overhead
- More management complexity

Best for: > 10M vectors, enterprise tenants, compliance requirements
```

### Automatic Promotion

```sql
-- Configure auto-promotion thresholds (counts are vectors per tenant;
-- note JSON literals cannot carry inline comments)
SELECT ruvector_tenant_set_policy('{
    "auto_promote_to_partition": 100000,
    "auto_promote_to_dedicated": 10000000,
    "check_interval": "1 hour"
}'::jsonb);
```

```rust
// Background worker checks and promotes
pub fn check_tenant_promotion(tenant_id: &str) -> Option<IsolationLevel> {
    let stats = get_tenant_stats(tenant_id)?;
    let policy = get_promotion_policy()?;

    if stats.vector_count > policy.dedicated_threshold {
        Some(IsolationLevel::Dedicated)
    } else if stats.vector_count > policy.partition_threshold {
        Some(IsolationLevel::Partition)
    } else {
        None
    }
}
```

---

## Per-Tenant Integrity

### Separate Contracted Graphs

```sql
-- Each tenant gets its own contracted graph
CREATE TABLE ruvector.tenant_contracted_graph (
    tenant_id TEXT NOT NULL REFERENCES ruvector.tenants(id),
    collection_id INTEGER NOT NULL,
    node_type TEXT NOT NULL,
    node_id BIGINT NOT NULL,
    -- ... same as contracted_graph ...

    PRIMARY KEY (tenant_id, collection_id, node_type, node_id)
);
```

### Independent State Machines

```rust
// Per-tenant integrity state
pub struct TenantIntegrityState {
    tenant_id: String,
    state: IntegrityState,
    lambda_cut: f32,
    consecutive_samples: u32,
    last_transition: Instant,
    cooldown_until: Option<Instant>,
}

// Tenant stress doesn't affect other tenants
pub fn check_tenant_gate(tenant_id: &str, operation: &str) -> GateResult {
    let state = get_tenant_integrity_state(tenant_id);
    apply_policy(state, operation)
}
```

### Tenant-Specific Policies

```sql
-- Each tenant can have custom thresholds
INSERT INTO ruvector.integrity_policies (tenant_id, name, threshold_high, threshold_low)
VALUES
    ('acme-corp', 'enterprise', 0.6, 0.3),   -- Stricter
    ('startup-xyz', 'standard', 0.4, 0.15);  -- Default
```

---

## Resource Quotas

### Quota Enforcement

```sql
-- Quota table
CREATE TABLE ruvector.tenant_quotas (
    tenant_id TEXT PRIMARY KEY REFERENCES ruvector.tenants(id),
    max_vectors BIGINT NOT NULL DEFAULT 1000000,
    max_storage_gb REAL NOT NULL DEFAULT 10.0,
    max_qps INTEGER NOT NULL DEFAULT 100,
    max_concurrent INTEGER NOT NULL DEFAULT 10,

    -- Current usage (updated by triggers/workers)
    current_vectors BIGINT DEFAULT 0,
    current_storage_gb REAL DEFAULT 0,

    -- Rate limiting state
    request_count INTEGER DEFAULT 0,
    window_start TIMESTAMPTZ DEFAULT NOW()
);

-- Check quota before insert
CREATE FUNCTION ruvector_check_quota() RETURNS TRIGGER AS $$
DECLARE
    v_quota RECORD;
BEGIN
    SELECT * INTO v_quota
    FROM ruvector.tenant_quotas
    WHERE tenant_id = NEW.tenant_id;

    IF v_quota.current_vectors >= v_quota.max_vectors THEN
        RAISE EXCEPTION 'Tenant % has exceeded vector quota', NEW.tenant_id;
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER check_quota_before_insert
    BEFORE INSERT ON embeddings
    FOR EACH ROW EXECUTE FUNCTION ruvector_check_quota();
```

### Rate Limiting

```rust
// Token bucket rate limiter per tenant
pub struct TenantRateLimiter {
    buckets: DashMap<String, TokenBucket>,
}

impl TenantRateLimiter {
    pub fn check(&self, tenant_id: &str, tokens: u32) -> RateLimitResult {
        let bucket = self.buckets.entry(tenant_id.to_string())
            .or_insert_with(|| TokenBucket::new(
                get_tenant_qps_limit(tenant_id),
            ));

        if bucket.try_acquire(tokens) {
            RateLimitResult::Allowed
        } else {
            RateLimitResult::Limited {
                retry_after_ms: bucket.time_to_refill(tokens),
            }
        }
    }
}
```
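The `TokenBucket` type used above is not defined in this document. A minimal Python sketch of the usual refill-on-demand bucket, assuming capacity equals the per-tenant QPS limit (the names and the injectable `clock` parameter are illustrative, not the engine's API):

```python
import time

class TokenBucket:
    """Refill-on-demand token bucket: tokens accrue continuously at
    `rate_per_sec`, capped at `capacity` (defaults to one second's worth)."""

    def __init__(self, rate_per_sec: float, capacity: float = None,
                 clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity if capacity is not None else rate_per_sec
        self.tokens = self.capacity   # start full
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens if available; never blocks."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def time_to_refill_ms(self, n: int = 1) -> float:
        """How long until n tokens are available (for Retry-After hints)."""
        self._refill()
        deficit = max(0.0, n - self.tokens)
        return deficit / self.rate * 1000.0
```

Lazy refill keeps the hot path to a clock read and two arithmetic ops, which is why it suits a per-request gate.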

### Fair Scheduling

```rust
// Weighted fair queue for search requests
pub struct FairScheduler {
    queues: HashMap<String, VecDeque<SearchRequest>>,
    weights: HashMap<String, f32>,  // Based on tier/quota
}

impl FairScheduler {
    pub fn next(&mut self) -> Option<SearchRequest> {
        // Weighted round-robin across tenants
        // Prevents one tenant from monopolizing resources
        let total_weight: f32 = self.weights.values().sum();

        for (tenant_id, queue) in &mut self.queues {
            let weight = self.weights.get(tenant_id).unwrap_or(&1.0);
            let share = weight / total_weight;

            // Probability of selecting this tenant's request
            if rand::random::<f32>() < share {
                if let Some(req) = queue.pop_front() {
                    return Some(req);
                }
            }
        }

        // Fallback: any available request
        self.queues.values_mut()
            .find_map(|q| q.pop_front())
    }
}
```

---

## Tenant Lifecycle

### Create Tenant

```sql
SELECT ruvector_tenant_create('new-customer', '{
    "display_name": "New Customer Inc.",
    "max_vectors": 5000000,
    "max_qps": 200,
    "isolation_level": "shared",
    "integrity_enabled": true
}'::jsonb);
```

### Suspend Tenant

```sql
-- Suspend (stops all operations, keeps data)
SELECT ruvector_tenant_suspend('bad-actor');

-- Resume
SELECT ruvector_tenant_resume('bad-actor');
```

### Delete Tenant

```sql
-- Soft delete (marks for cleanup)
SELECT ruvector_tenant_delete('churned-customer');

-- Hard delete (immediate, for compliance)
SELECT ruvector_tenant_delete('churned-customer', hard := true);
```

### Migrate Isolation Level

```sql
-- Promote to dedicated (online, no downtime)
SELECT ruvector_tenant_migrate('enterprise-customer', 'dedicated');

-- Status check
SELECT * FROM ruvector_tenant_migration_status('enterprise-customer');
```

---

## Shared Memory Layout

```rust
// Per-tenant state in shared memory
#[repr(C)]
pub struct TenantSharedState {
    tenant_id_hash: u64,            // Fast lookup key
    integrity_state: u8,            // 0=normal, 1=stress, 2=critical
    lambda_cut: f32,                // Current mincut value
    request_count: AtomicU32,       // For rate limiting
    last_request_epoch: AtomicU64,  // Rate limit window
    flags: AtomicU32,               // Suspended, migrating, etc.
}

// Tenant lookup table
pub struct TenantRegistry {
    states: [TenantSharedState; MAX_TENANTS],  // Fixed array in shmem
    index: HashMap<String, usize>,             // Heap-based lookup
}
```
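The function behind `tenant_id_hash` is not specified here. As one stable, dependency-free choice, a 64-bit FNV-1a over the tenant id's UTF-8 bytes can be sketched in Python (the constants are the standard FNV-1a parameters; the engine may well use a different hash):

```python
# Standard 64-bit FNV-1a parameters
FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3
MASK_64 = 0xFFFFFFFFFFFFFFFF

def tenant_id_hash(tenant_id: str) -> int:
    """64-bit FNV-1a over the UTF-8 bytes of the tenant id.
    Deterministic across processes, so it is safe to store in shmem."""
    h = FNV_OFFSET
    for byte in tenant_id.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64
    return h
```

Determinism across backends is the key property: any process attaching to the shared segment must compute the same key for the same tenant.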

---

## Monitoring

### Per-Tenant Metrics

```sql
-- Tenant dashboard
SELECT
    t.id,
    t.display_name,
    t.isolation_level,
    tq.current_vectors,
    tq.max_vectors,
    ROUND(100.0 * tq.current_vectors / tq.max_vectors, 1) AS usage_pct,
    ts.integrity_state,
    ts.lambda_cut,
    ts.avg_search_latency_ms,
    ts.searches_last_hour
FROM ruvector.tenants t
JOIN ruvector.tenant_quotas tq ON t.id = tq.tenant_id
JOIN ruvector.tenant_stats ts ON t.id = ts.tenant_id
ORDER BY tq.current_vectors DESC;
```

### Prometheus Metrics

```
# Per-tenant metrics
ruvector_tenant_vectors{tenant="acme-corp"} 1234567
ruvector_tenant_integrity_state{tenant="acme-corp"} 1
ruvector_tenant_lambda_cut{tenant="acme-corp"} 0.72
ruvector_tenant_search_latency_p99{tenant="acme-corp"} 15.2
ruvector_tenant_qps{tenant="acme-corp"} 45.3
ruvector_tenant_quota_usage{tenant="acme-corp",resource="vectors"} 0.62
```
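Rendering these samples is plain string templating in the Prometheus exposition format. A hypothetical Python helper (the metric names follow the samples above; the function itself is not part of RuVector, and extra labels such as `resource` are omitted for brevity):

```python
def render_tenant_metrics(tenant: str, stats: dict) -> str:
    """Render per-tenant gauges as Prometheus exposition lines:
    ruvector_tenant_<name>{tenant="<id>"} <value>"""
    lines = []
    for name, value in stats.items():
        lines.append(f'ruvector_tenant_{name}{{tenant="{tenant}"}} {value}')
    return "\n".join(lines)
```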

---

## Security Considerations

### Tenant ID Validation

```rust
// Validate tenant_id before any operation
pub fn validate_tenant_context() -> Result<String, Error> {
    let tenant_id = get_guc("ruvector.tenant_id")?;

    // Check not empty
    if tenant_id.is_empty() {
        return Err(Error::NoTenantContext);
    }

    // Check tenant exists and is not suspended
    let tenant = get_tenant(&tenant_id)?;
    if tenant.suspended_at.is_some() {
        return Err(Error::TenantSuspended);
    }

    Ok(tenant_id)
}
```

### Audit Logging

```sql
-- Tenant operations audit log
CREATE TABLE ruvector.tenant_audit_log (
    id BIGSERIAL PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    operation TEXT NOT NULL,  -- search, insert, delete, etc.
    user_id TEXT,             -- Application user
    details JSONB,
    ip_address INET,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Enabled via GUC
SET ruvector.audit_enabled = true;
```

### Cross-Tenant Prevention

```rust
// Engine-level enforcement (defense in depth)
pub fn execute_search(request: &SearchRequest) -> Result<SearchResults, Error> {
    let context_tenant = validate_tenant_context()?;

    // Double-check the request matches the session context
    if let Some(req_tenant) = &request.tenant_id {
        if req_tenant != &context_tenant {
            // Log security event
            log_security_event("tenant_mismatch", &context_tenant, req_tenant);
            return Err(Error::TenantMismatch);
        }
    }

    // Execute with tenant filter
    execute_search_internal(request, &context_tenant)
}
```

---

## Testing Requirements

### Isolation Tests
- Tenant A cannot see Tenant B's data
- Tenant A's stress doesn't affect Tenant B's operations
- Suspended tenant cannot perform any operations
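
The first requirement can be modeled in-memory before writing a full database test. A Python sketch that only mirrors the `tenant_isolation` policy's visibility rule (the real test would go through RLS; the helper and row shapes here are illustrative):

```python
def visible_rows(rows, session_tenant):
    """In-memory model of the tenant_isolation RLS policy: a row is
    visible only when its tenant_id equals the session's
    ruvector.tenant_id setting."""
    return [r for r in rows if r["tenant_id"] == session_tenant]

rows = [
    {"id": 1, "tenant_id": "tenant-a"},
    {"id": 2, "tenant_id": "tenant-b"},
]
```

An RLS-backed version of the same assertions would connect twice with different `ruvector.tenant_id` settings and compare result sets.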

### Performance Tests
- Shared isolation: < 5% overhead vs single-tenant
- Dedicated isolation: equivalent to single-tenant
- Rate limiting adds < 1ms latency

### Scale Tests
- 1000+ tenants on shared infrastructure
- 100+ tenants with dedicated isolation
- Tenant migration under load
---

## Example: SaaS Application

```python
# Application code
class VectorService:
    def __init__(self, db_pool):
        self.pool = db_pool

    def search(self, tenant_id: str, query_vec: list, k: int = 10):
        with self.pool.connection() as conn:
            # Set tenant context (SET cannot take bind parameters,
            # so use set_config instead)
            conn.execute(
                "SELECT set_config('ruvector.tenant_id', %s, false)",
                [tenant_id])

            # Search (automatically scoped to tenant)
            results = conn.execute("""
                SELECT id, content, vec <-> %s AS distance
                FROM embeddings
                ORDER BY vec <-> %s
                LIMIT %s
            """, [query_vec, query_vec, k])

            return results.fetchall()

    def insert(self, tenant_id: str, content: str, vec: list):
        with self.pool.connection() as conn:
            conn.execute(
                "SELECT set_config('ruvector.tenant_id', %s, false)",
                [tenant_id])

            # Insert (tenant_id auto-populated from context)
            conn.execute("""
                INSERT INTO embeddings (content, vec)
                VALUES (%s, %s)
            """, [content, vec])
```