# RuVector Postgres v2 - Consistency and Replication Model
## Overview
This document specifies the consistency contract between PostgreSQL heap tuples and the RuVector engine, MVCC interaction, WAL and logical decoding strategy, crash recovery, replay order, and idempotency guarantees.
---
## Core Consistency Contract
### Authoritative Source of Truth
```
+------------------------------------------------------------------+
| CONSISTENCY HIERARCHY |
+------------------------------------------------------------------+
| |
| 1. PostgreSQL Heap is AUTHORITATIVE for: |
| - Row existence |
| - Visibility rules (MVCC xmin/xmax) |
| - Transaction commit status |
| - Data integrity constraints |
| |
| 2. RuVector Engine Index is EVENTUALLY CONSISTENT: |
| - Bounded lag window (configurable, default 100ms) |
| - Reconciled on demand |
| - Never returns invisible tuples |
| - Never resurrects deleted embeddings |
| |
+------------------------------------------------------------------+
```
### Consistency Guarantees
| Property | Guarantee | Enforcement |
|----------|-----------|-------------|
| **No phantom reads** | Index never returns invisible tuples | Heap visibility check on every result |
| **No zombie vectors** | Deleted vectors are never returned | Delete markers + tombstone cleanup |
| **No stale updates** | Updated vectors reflect their new values | Version-aware index entries |
| **Bounded staleness** | Maximum lag between commit and searchability | Configurable, default 100ms |
| **Crash consistency** | Recoverable to last WAL checkpoint | WAL-based recovery |
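To make the bounded-staleness row concrete, here is a minimal sketch (hypothetical names: `ConsistencyStatus`, `ensure_bounded_staleness`, `force_catchup`) of how a search entry point could enforce the configurable lag window before serving results from the asynchronous path.
```rust
/// Hypothetical staleness guard for the async path. `lag_ms` mirrors the
/// ruvector.consistency_status view; `max_lag_ms` mirrors ruvector.max_lag_ms.
pub struct ConsistencyStatus {
    /// Milliseconds between the newest committed heap change and the newest
    /// change applied to the engine index.
    pub lag_ms: u64,
}

pub fn ensure_bounded_staleness(
    status: &ConsistencyStatus,
    max_lag_ms: u64,               // default 100
    force_catchup: impl FnOnce(),  // synchronously drain pending changes
) {
    if status.lag_ms > max_lag_ms {
        // Lag exceeded the contract: catch up before answering the query,
        // so the "bounded staleness" guarantee holds for this search.
        force_catchup();
    }
}
```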
---
## Consistency Mechanisms
### Option A: Synchronous Index Maintenance
```
INSERT/UPDATE Transaction:
+------------------------------------------------------------------+
| |
| 1. BEGIN |
| 2. Write heap tuple |
| 3. Call engine (synchronous) |
| └─ If engine rejects → ROLLBACK |
| 4. Append to WAL |
| 5. COMMIT |
| |
+------------------------------------------------------------------+
Pros:
- Strongest consistency
- Simple mental model
- No reconciliation needed
Cons:
- Higher latency per operation
- Engine failure blocks writes
- Reduces write throughput
```
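A minimal sketch of the synchronous hook, assuming a fallible `Engine::insert_checked` and a hypothetical `report_error_and_abort` that raises an error inside the current transaction; the point from the steps above is that an engine rejection surfaces before COMMIT and forces a ROLLBACK.
```rust
/// Synchronous index maintenance: called between the heap write (step 2)
/// and the WAL append (step 4). Helper names here are illustrative only.
pub fn on_heap_insert(engine: &mut Engine, tid: TupleId, vector: &[f32]) {
    if let Err(e) = engine.insert_checked(tid, vector) {
        // Step 3: the engine rejected the mutation, so the whole transaction
        // must roll back; nothing reaches the WAL or becomes visible.
        report_error_and_abort(&format!("ruvector engine rejected insert: {e:?}"));
    }
    // Otherwise steps 4-5 (WAL append, COMMIT) proceed as usual.
}
```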
### Option B: Asynchronous Maintenance with Reconciliation
```
INSERT/UPDATE Transaction:
+------------------------------------------------------------------+
| |
| 1. BEGIN |
| 2. Write heap tuple |
| 3. Write to change log table OR trigger logical decoding |
| 4. Append to WAL |
| 5. COMMIT |
| |
| Background (continuous): |
| 6. Engine reads change log / logical replication stream |
| 7. Applies changes to index |
| 8. Index scan checks heap visibility for every result |
| |
+------------------------------------------------------------------+
Pros:
- Lower write latency
- Engine failure doesn't block writes
- Higher throughput
Cons:
- Bounded staleness window
- Requires visibility rechecks
- More complex recovery
```
### v2 Hybrid Model (Recommended)
```
+------------------------------------------------------------------+
| v2 HYBRID CONSISTENCY MODEL |
+------------------------------------------------------------------+
| |
| SYNCHRONOUS (Hot Tier): |
| - Primary HNSW index mutations |
| - Hot tier inserts/updates |
| - Visibility-critical operations |
| |
| ASYNCHRONOUS (Background): |
| - Compaction and tier moves |
| - Graph edge maintenance |
| - GNN training data capture |
| - Cold tier updates |
| - Index optimization/rewiring |
| |
+------------------------------------------------------------------+
```
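The split above can be expressed as a simple dispatch rule. The sketch below uses a hypothetical `EngineOp` enum and helper methods (`hnsw_insert_or_update`, `mark_deleted`, `background_queue`) to show which operations run in the foreground transaction and which are deferred to background workers.
```rust
/// Hypothetical operation kinds, mirroring the split in the box above.
pub enum EngineOp {
    HotInsert { tid: TupleId, vector: Vec<f32> },   // synchronous
    HotUpdate { tid: TupleId, vector: Vec<f32> },   // synchronous
    Delete { tid: TupleId },                        // synchronous (visibility-critical)
    Compaction { collection_id: i32 },              // asynchronous
    TierMove { tid: TupleId, to_cold: bool },       // asynchronous
    GraphRewire { collection_id: i32 },             // asynchronous
}

impl Engine {
    /// Apply visibility-critical work inline; queue the rest so foreground
    /// write latency stays close to the purely asynchronous model.
    pub fn dispatch(&mut self, op: EngineOp) {
        match op {
            EngineOp::HotInsert { tid, vector } | EngineOp::HotUpdate { tid, vector } => {
                self.hnsw_insert_or_update(tid, &vector);   // hypothetical helper
            }
            EngineOp::Delete { tid } => {
                self.mark_deleted(tid);                     // tombstone, never resurrected
            }
            other => {
                self.background_queue.push(other);          // drained by maintenance workers
            }
        }
    }
}
```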
---
## Implementation Details
### Visibility Check Protocol
```rust
/// Check heap visibility for index results
pub fn check_visibility(
snapshot: &Snapshot,
results: &[IndexResult],
) -> Vec<IndexResult> {
results.iter()
.filter(|r| {
// Fetch heap tuple header
let htup = heap_fetch_tuple_header(r.tid);
// Check MVCC visibility
htup.map_or(false, |h| {
heap_tuple_satisfies_snapshot(h, snapshot)
})
})
.cloned()
.collect()
}
/// Index scan must always recheck heap
impl IndexScan {
fn next(&mut self) -> Option<HeapTuple> {
loop {
// Get next candidate from index
let candidate = self.index.next()?;
// CRITICAL: Always verify against heap
if let Some(tuple) = self.heap_fetch_visible(candidate.tid) {
return Some(tuple);
}
// Invisible tuple, try next
}
}
}
```
### Incremental Candidate Paging API
The engine must support incremental candidate paging so the executor can skip MVCC-invisible rows and request more until k visible results are produced.
```rust
/// Search request with cursor support for incremental paging
#[derive(Debug)]
pub struct SearchRequest {
pub collection_id: i32,
pub query: Vec<f32>,
pub want_k: usize, // Desired visible results
pub cursor: Option<Cursor>, // Resume from previous batch
pub max_candidates: usize, // Max to return per batch (default: want_k * 2)
}
/// Search response with cursor for pagination
#[derive(Debug)]
pub struct SearchResponse {
pub candidates: Vec<Candidate>,
pub cursor: Option<Cursor>, // None if exhausted
pub total_scanned: usize,
}
/// Cursor token for resuming search
#[derive(Debug, Clone)]
pub struct Cursor {
pub ef_search_position: usize,
pub last_distance: f32,
pub visited_count: usize,
}
/// Engine returns batches with cursor tokens
impl Engine {
    pub fn search_batch(&self, req: SearchRequest) -> SearchResponse {
        let start_pos = req.cursor.map(|c| c.ef_search_position).unwrap_or(0);
        // Continue HNSW search from the cursor position
        let (candidates, next_pos, exhausted) = self.hnsw.search_continue(
            &req.query,
            req.max_candidates,
            start_pos,
        );
        // Compute cursor fields before `candidates` is moved into the response
        let last_distance = candidates.last().map(|c| c.distance).unwrap_or(f32::MAX);
        let total_scanned = start_pos + candidates.len();
        SearchResponse {
            candidates,
            cursor: if exhausted {
                None
            } else {
                Some(Cursor {
                    ef_search_position: next_pos,
                    last_distance,
                    visited_count: total_scanned,
                })
            },
            total_scanned,
        }
    }
}
/// Executor uses incremental paging
fn execute_vector_search(
    engine: &Engine,
    collection_id: i32,
    query: &[f32],
    k: usize,
    snapshot: &Snapshot,
) -> Vec<HeapTuple> {
let mut results = Vec::with_capacity(k);
let mut cursor = None;
loop {
// Request batch from engine
let response = engine.search_batch(SearchRequest {
collection_id,
query: query.to_vec(),
want_k: k - results.len(),
cursor,
max_candidates: (k - results.len()) * 2, // Over-fetch
});
// Check visibility and collect visible tuples
for candidate in response.candidates {
if let Some(tuple) = heap_fetch_visible(candidate.tid, snapshot) {
results.push(tuple);
if results.len() >= k {
return results;
}
}
}
// Check if exhausted
match response.cursor {
Some(c) => cursor = Some(c),
None => break, // No more candidates
}
}
results
}
```
### Change Log Table (Async Mode)
```sql
-- Change log for async reconciliation
CREATE TABLE ruvector._change_log (
id BIGSERIAL PRIMARY KEY,
collection_id INTEGER NOT NULL,
operation CHAR(1) NOT NULL CHECK (operation IN ('I', 'U', 'D')),
tuple_tid TID NOT NULL,
vector_data BYTEA, -- NULL for deletes
    source_xid BIGINT NOT NULL,  -- txid_current(); "xmin" would collide with the system column name
committed BOOLEAN DEFAULT FALSE,
applied BOOLEAN DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX idx_change_log_pending
ON ruvector._change_log(collection_id, id)
WHERE NOT applied;
-- Trigger to capture changes
CREATE FUNCTION ruvector._log_change() RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
SELECT collection_id, 'I', NEW.ctid, NEW.embedding, txid_current()
FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
SELECT collection_id, 'U', NEW.ctid, NEW.embedding, txid_current()
FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
SELECT collection_id, 'D', OLD.ctid, NULL, txid_current()
FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```
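Each collection table also needs an `AFTER INSERT OR UPDATE OR DELETE ... FOR EACH ROW` trigger bound to `ruvector._log_change()` (omitted here). A background worker then drains unapplied rows in `id` order and marks them applied; in the sketch below, `Db`, `ChangeLogRow`, and `decode_vector` are stand-ins for whatever SPI or client layer and serialization the worker actually uses.
```rust
/// One reconciliation pass over ruvector._change_log (async mode). Rows are
/// applied in `id` order, which is close enough to commit order for the
/// idempotent replay rules described later in this document.
pub fn reconcile_once(db: &mut Db, engine: &mut Engine, batch: i64) -> anyhow::Result<()> {
    let rows: Vec<ChangeLogRow> = db.query(
        "SELECT id, operation, tuple_tid, vector_data \
           FROM ruvector._change_log \
          WHERE NOT applied \
          ORDER BY id \
          LIMIT $1",
        &[&batch],
    )?;
    for row in rows {
        match row.operation {
            'I' | 'U' => {
                // vector_data is BYTEA in the log; decode_vector stands in for
                // whatever serialization the extension actually uses
                let vector = decode_vector(row.vector_data.as_deref().unwrap_or(&[]));
                engine.insert(row.tuple_tid, &vector);
            }
            'D' => engine.delete(row.tuple_tid),
            other => log::warn!("unknown change-log operation {other:?}"),
        }
        // Mark the row applied so the next pass skips it
        db.execute(
            "UPDATE ruvector._change_log SET applied = TRUE WHERE id = $1",
            &[&row.id],
        )?;
    }
    Ok(())
}
```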
### Logical Decoding (Alternative)
```rust
/// Logical decoding output plugin for RuVector
pub struct RuVectorOutputPlugin;
impl OutputPlugin for RuVectorOutputPlugin {
fn begin_txn(&mut self, xid: TransactionId) {
self.current_xid = Some(xid);
self.changes.clear();
}
fn change(&mut self, relation: &Relation, change: &Change) {
// Only process tables with vector columns
if !self.is_vector_table(relation) {
return;
}
match change {
Change::Insert(new) => {
self.changes.push(VectorChange::Insert {
tid: new.tid,
vector: extract_vector(new),
});
}
Change::Update(old, new) => {
self.changes.push(VectorChange::Update {
old_tid: old.tid,
new_tid: new.tid,
vector: extract_vector(new),
});
}
Change::Delete(old) => {
self.changes.push(VectorChange::Delete {
tid: old.tid,
});
}
}
}
fn commit_txn(&mut self, xid: TransactionId, commit_lsn: XLogRecPtr) {
// Apply all changes atomically
self.engine.apply_changes(&self.changes, commit_lsn);
}
}
```
---
## MVCC Interaction
### Transaction Visibility Rules
```rust
/// Snapshot-aware index search
pub fn search_with_snapshot(
collection_id: i32,
query: &[f32],
k: usize,
snapshot: &Snapshot,
) -> Vec<SearchResult> {
// Get more candidates than k to account for invisible tuples
let over_fetch_factor = 2.0;
let candidates = engine.search(
collection_id,
query,
(k as f32 * over_fetch_factor) as usize,
);
// Filter by visibility
let visible: Vec<_> = candidates.into_iter()
.filter(|c| is_visible(c.tid, snapshot))
.take(k)
.collect();
// If we don't have enough, fetch more
if visible.len() < k {
// Recursive fetch with larger over_fetch
return search_with_larger_pool(...);
}
visible
}
/// Check tuple visibility against snapshot (simplified HeapTupleSatisfiesMVCC)
fn is_visible(tid: TupleId, snapshot: &Snapshot) -> bool {
    let htup = unsafe { heap_fetch_tuple(tid) };
    match htup {
        Some(tuple) => {
            let xmin = tuple.t_xmin;
            let xmax = tuple.t_xmax;
            // Inserting transaction must be committed and visible to this
            // snapshot: it began before the snapshot's xmax and was not
            // still in progress when the snapshot was taken
            let xmin_visible = xmin < snapshot.xmax &&
                !snapshot.xip.contains(&xmin) &&
                pg_xact_status(xmin) == XACT_STATUS_COMMITTED;
            // Tuple is still visible if never deleted, or if the deleting
            // transaction is not visible to this snapshot
            let not_deleted = xmax == InvalidTransactionId ||
                snapshot.xmax <= xmax ||
                snapshot.xip.contains(&xmax) ||
                pg_xact_status(xmax) != XACT_STATUS_COMMITTED;
            xmin_visible && not_deleted
        }
        None => false, // Tuple vacuumed away
    }
}
```
### HOT Update Handling
```rust
/// Handle Heap-Only Tuple updates
pub fn handle_hot_update(old_tid: TupleId, new_tid: TupleId, new_vector: &[f32]) {
// HOT updates may change ctid without changing embedding
if vectors_equal(get_vector(old_tid), new_vector) {
// Only ctid changed, update TID mapping
engine.update_tid_mapping(old_tid, new_tid);
} else {
// Vector changed, full update needed
engine.delete(old_tid);
engine.insert(new_tid, new_vector);
}
}
```
---
## WAL and Recovery
### WAL Record Types
```rust
/// Custom WAL record types for RuVector
#[repr(u8)]
pub enum RuVectorWalRecord {
/// Vector inserted into index
IndexInsert = 0x10,
/// Vector deleted from index
IndexDelete = 0x11,
/// Index page split
IndexSplit = 0x12,
/// HNSW edge added
HnswEdgeAdd = 0x20,
/// HNSW edge removed
HnswEdgeRemove = 0x21,
/// Tier change
TierChange = 0x30,
/// Integrity state change
IntegrityChange = 0x40,
}
impl RuVectorWalRecord {
/// Write WAL record
pub fn write(&self, data: &[u8]) -> XLogRecPtr {
unsafe {
let rdata = XLogRecData {
data: data.as_ptr() as *mut c_char,
len: data.len() as u32,
next: std::ptr::null_mut(),
};
XLogInsert(RM_RUVECTOR_ID, self.to_u8(), &rdata)
}
}
}
```
### Crash Recovery
```rust
/// Redo function for crash recovery
pub extern "C" fn ruvector_redo(record: *mut XLogReaderState) {
let info = unsafe { (*record).decoded_record.as_ref() };
match RuVectorWalRecord::from_u8(info.xl_info) {
Some(RuVectorWalRecord::IndexInsert) => {
let insert_data: IndexInsertData = deserialize(info.data);
engine.redo_insert(insert_data);
}
Some(RuVectorWalRecord::IndexDelete) => {
let delete_data: IndexDeleteData = deserialize(info.data);
engine.redo_delete(delete_data);
}
Some(RuVectorWalRecord::HnswEdgeAdd) => {
let edge_data: HnswEdgeData = deserialize(info.data);
engine.redo_edge_add(edge_data);
}
// ... other record types
_ => {
pgrx::warning!("Unknown RuVector WAL record type");
}
}
}
/// Startup recovery sequence
pub fn startup_recovery() {
pgrx::log!("RuVector: Starting crash recovery");
// 1. Load last consistent checkpoint
let checkpoint = load_checkpoint();
// 2. Rebuild in-memory structures
engine.load_from_checkpoint(&checkpoint);
// 3. Replay WAL from checkpoint
let wal_reader = WalReader::from_lsn(checkpoint.redo_lsn);
for record in wal_reader {
ruvector_redo(&record);
}
// 4. Reconcile with heap if needed
if checkpoint.needs_reconciliation {
reconcile_with_heap();
}
pgrx::log!("RuVector: Recovery complete");
}
```
### Replay Order Guarantees
```
WAL Replay Order Contract:
+------------------------------------------------------------------+
| |
| 1. WAL records replayed in LSN order (guaranteed by PostgreSQL) |
| |
| 2. Within a transaction: |
| - Heap insert before index insert |
| - Index delete before heap delete (for visibility) |
| |
| 3. Cross-transaction: |
| - Commit order preserved |
| - Visibility respects commit timestamps |
| |
| 4. Recovery invariant: |
| - After recovery, index matches committed heap state |
| - No uncommitted changes in index |
| |
+------------------------------------------------------------------+
```
---
## Idempotency and Ordering Rules
**CRITICAL**: If the WAL is the source of truth, these invariants are what keep the index from drifting into "eventual corruption".
### Explicit Replay Rules
```
+------------------------------------------------------------------+
| ENGINE REPLAY INVARIANTS |
+------------------------------------------------------------------+
RULE 1: Apply operations in LSN order
- Each operation carries its source LSN
- Engine rejects out-of-order operations
- Crash recovery replays from last checkpoint LSN
RULE 2: Store last applied LSN per collection
- Persisted in ruvector.collection_state.last_applied_lsn
- Updated atomically after each operation
- Skip operations with LSN <= last_applied_lsn
RULE 3: Delete wins over insert for same TID
- If TID inserted then deleted, final state is deleted
- Replay order handles this naturally if LSN-ordered
- Edge case: TID reuse after VACUUM requires checking xmin
RULE 4: Update = Delete + Insert
- Updates decompose to delete old, insert new
- Both carry same transaction LSN
- Applied atomically
RULE 5: Rollback handling
- Uncommitted operations not in WAL (crash safe)
- For explicit ROLLBACK during runtime:
- Synchronous mode: engine notified, reverts in-memory state
- Async mode: change log entry marked rollback, skipped on apply
+------------------------------------------------------------------+
```
### Conflict Resolution
```rust
/// Handle conflicts during replay
impl Engine {
    pub fn apply_with_conflict_resolution(
        &mut self,
        op: WalOperation,
    ) -> Result<(), ReplayError> {
// Check LSN ordering
let last_lsn = self.lsn_tracker.get(op.collection_id);
if op.lsn <= last_lsn {
// Already applied, skip (idempotent)
return Ok(());
}
match op.kind {
OpKind::Insert { tid, vector } => {
if self.index.contains_tid(tid) {
// TID exists - check if this is TID reuse after VACUUM
let existing_lsn = self.index.get_lsn(tid);
if op.lsn > existing_lsn {
// Newer insert wins - delete old, insert new
self.index.delete(tid);
self.index.insert(tid, &vector, op.lsn);
}
// else: stale insert, skip
} else {
self.index.insert(tid, &vector, op.lsn);
}
}
OpKind::Delete { tid } => {
// Delete always wins if LSN is newer
if self.index.contains_tid(tid) {
let existing_lsn = self.index.get_lsn(tid);
if op.lsn > existing_lsn {
self.index.delete(tid);
}
}
// If not present, already deleted - idempotent
}
OpKind::Update { old_tid, new_tid, vector } => {
// Atomic delete + insert
self.index.delete(old_tid);
self.index.insert(new_tid, &vector, op.lsn);
}
}
self.lsn_tracker.update(op.collection_id, op.lsn);
        Ok(())
    }
}
```
### Idempotent Operations
```rust
/// All engine operations must be idempotent for safe replay
impl Engine {
/// Idempotent insert - safe to replay
pub fn redo_insert(&mut self, data: IndexInsertData) {
// Check if already exists
if self.index.contains_tid(data.tid) {
// Already inserted, skip
return;
}
// Insert with LSN tracking
self.index.insert_with_lsn(data.tid, &data.vector, data.lsn);
}
/// Idempotent delete - safe to replay
pub fn redo_delete(&mut self, data: IndexDeleteData) {
// Check if already deleted
if !self.index.contains_tid(data.tid) {
// Already deleted, skip
return;
}
// Delete with tombstone
self.index.delete_with_lsn(data.tid, data.lsn);
}
/// Idempotent edge add - safe to replay
pub fn redo_edge_add(&mut self, data: HnswEdgeData) {
// HNSW edges are idempotent by nature
self.hnsw.add_edge(data.from, data.to, data.lsn);
}
}
```
### LSN-Based Deduplication
```rust
use std::collections::HashMap;

/// Track applied LSN per collection
pub struct LsnTracker {
applied_lsn: HashMap<i32, XLogRecPtr>,
}
impl LsnTracker {
/// Check if operation should be applied
pub fn should_apply(&self, collection_id: i32, lsn: XLogRecPtr) -> bool {
match self.applied_lsn.get(&collection_id) {
Some(&last_lsn) => lsn > last_lsn,
None => true,
}
}
/// Mark operation as applied
pub fn mark_applied(&mut self, collection_id: i32, lsn: XLogRecPtr) {
self.applied_lsn.insert(collection_id, lsn);
}
}
```
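Rule 2 above requires the applied LSN to survive restarts. Below is a minimal sketch of that persistence step, assuming the `ruvector.collection_state.last_applied_lsn` column from Rule 2 and a hypothetical `Db` handle with helper methods; on startup the tracker is rehydrated from the same table before any replay begins.
```rust
impl LsnTracker {
    /// Persist the high-water mark so replay after a restart can skip
    /// already-applied operations (Rule 2). `Db` and its helpers are
    /// stand-ins for the actual SPI/catalog access layer.
    pub fn persist(&self, db: &mut Db) -> anyhow::Result<()> {
        for (&collection_id, &lsn) in &self.applied_lsn {
            // UPDATE ruvector.collection_state
            //    SET last_applied_lsn = $2 WHERE collection_id = $1
            db.update_collection_lsn(collection_id, lsn)?;
        }
        Ok(())
    }

    /// Rebuild the in-memory map on startup, before any WAL replay begins.
    pub fn load(db: &mut Db) -> anyhow::Result<Self> {
        // SELECT collection_id, last_applied_lsn FROM ruvector.collection_state
        let rows: Vec<(i32, XLogRecPtr)> = db.load_collection_lsns()?;
        Ok(Self { applied_lsn: rows.into_iter().collect() })
    }
}
```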
---
## Replication Strategies
### Physical Replication (Streaming)
```
Primary → Standby streaming with RuVector:
Primary:
1. Write heap + index changes
2. Generate WAL records
3. Stream to standby
Standby:
1. Receive WAL stream
2. Apply heap changes (PostgreSQL)
3. Apply index changes (RuVector redo)
4. Engine state matches primary
```
### Logical Replication
```
Publisher → Subscriber with RuVector:
Publisher:
1. Changes captured via logical decoding
2. RuVector output plugin extracts vector changes
3. Publishes to replication slot
Subscriber:
1. Receives logical changes
2. Applies to local heap
3. Local RuVector engine indexes changes
4. Independent index structures
```
---
## Configuration
```sql
-- Consistency configuration
ALTER SYSTEM SET ruvector.consistency_mode = 'hybrid'; -- 'sync', 'async', 'hybrid'
ALTER SYSTEM SET ruvector.max_lag_ms = 100; -- Max staleness window
ALTER SYSTEM SET ruvector.visibility_recheck = true; -- Always recheck heap
ALTER SYSTEM SET wal_level = 'logical';                 -- core setting, required for logical decoding
-- Recovery configuration
ALTER SYSTEM SET ruvector.checkpoint_interval = 300; -- Checkpoint every 5 min
ALTER SYSTEM SET ruvector.wal_buffer_size = '64MB'; -- WAL buffer
ALTER SYSTEM SET ruvector.recovery_target_timeline = 'latest';
```
---
## Monitoring
```sql
-- Consistency lag monitoring
SELECT
c.name AS collection,
s.last_heap_lsn,
s.last_index_lsn,
pg_wal_lsn_diff(s.last_heap_lsn, s.last_index_lsn) AS lag_bytes,
s.lag_ms,
s.pending_changes
FROM ruvector.consistency_status s
JOIN ruvector.collections c ON s.collection_id = c.id;
-- Visibility recheck statistics
SELECT
collection_name,
total_searches,
visibility_rechecks,
invisible_filtered,
(invisible_filtered::float / NULLIF(visibility_rechecks, 0) * 100)::numeric(5,2) AS invisible_pct
FROM ruvector.visibility_stats
ORDER BY invisible_pct DESC;
-- WAL replay status
SELECT
pg_last_wal_receive_lsn() AS receive_lsn,
pg_last_wal_replay_lsn() AS replay_lsn,
ruvector_last_applied_lsn() AS ruvector_lsn,
pg_wal_lsn_diff(pg_last_wal_replay_lsn(), ruvector_last_applied_lsn()) AS ruvector_lag_bytes;
```
---
## Testing Requirements
### Unit Tests
- Visibility check correctness
- Idempotent operation replay (see the test sketch after this list)
- LSN tracking accuracy
- MVCC snapshot handling
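A minimal test sketch for the idempotent-replay item above: it applies the same WAL operation twice through `apply_with_conflict_resolution` and asserts that the second application is a no-op (constructor and accessor names such as `Engine::new_for_test`, `TupleId::new`, and `index_len` are assumptions).
```rust
#[cfg(test)]
mod replay_tests {
    use super::*;

    /// Replaying the same WAL operation twice must leave the index unchanged.
    #[test]
    fn insert_replay_is_idempotent() {
        let mut engine = Engine::new_for_test();        // hypothetical constructor
        let make_op = || WalOperation {
            collection_id: 1,
            lsn: 0x1000,
            kind: OpKind::Insert {
                tid: TupleId::new(1, 1),                // hypothetical constructor
                vector: vec![0.1, 0.2, 0.3],
            },
        };

        engine.apply_with_conflict_resolution(make_op()).unwrap();
        let len_after_first = engine.index_len();       // hypothetical accessor

        // Same LSN again: the LSN check must skip it, not double-insert.
        engine.apply_with_conflict_resolution(make_op()).unwrap();
        assert_eq!(engine.index_len(), len_after_first);
    }
}
```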
### Integration Tests
- Crash recovery scenarios
- Concurrent transaction visibility
- Replication lag handling
- HOT update handling
### Chaos Tests
- Primary failover
- Network partition during replication
- Partial WAL replay
- Checkpoint corruption recovery
---
## Summary
The v2 consistency model ensures:
1. **Heap is authoritative** - All visibility decisions defer to PostgreSQL heap
2. **Bounded staleness** - Index catches up within configurable lag window
3. **Crash safe** - WAL-based recovery with idempotent replay
4. **Replication compatible** - Works with streaming and logical replication
5. **MVCC aware** - Respects transaction isolation guarantees