Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/examples/delta-behavior/research/WASM-DELTA-ARCHITECTURE.md
+++ b/vendor/ruvector/examples/delta-behavior/research/WASM-DELTA-ARCHITECTURE.md
@@ -0,0 +1,362 @@
+# WASM Delta Computation Research Report
+
+## Executive Summary
+
+This research analyzes the existing ruvector WASM infrastructure and designs a novel delta computation architecture optimized for vector database incremental updates. The proposed system leverages WASM SIMD128, shared memory protocols, and the WASM component model to achieve sub-100µs delta application latency.
+
+---
+
+## 1. Current WASM Infrastructure Analysis
+
+### 1.1 Existing WASM Crates in RuVector
+
+| Crate | Purpose | Delta Relevance |
+|-------|---------|-----------------|
+| `ruvector-wasm` | Core VectorDB bindings | Memory protocol foundation |
+| `ruvector-gnn-wasm` | Graph Neural Networks | Node embedding deltas |
+| `ruvector-graph-wasm` | Graph database | Structure deltas |
+| `ruvector-learning-wasm` | MicroLoRA training | Weight deltas |
+| `ruvector-mincut-wasm` | Graph partitioning | Partition deltas |
+| `ruvector-attention-wasm` | Attention mechanisms | KV cache deltas |
+
+### 1.2 Key Patterns Identified
+
+**Memory Layout Protocol** (from `ruvector-wasm/src/kernel/memory.rs`):
+- 64KB page-aligned allocations
+- 16-byte SIMD alignment
+- Region-based memory validation
+- Zero-copy tensor descriptors
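
The page and alignment rules listed above come down to a few lines of arithmetic. The following is a minimal sketch, not the code in `memory.rs`: the names `align_up`, `pages_for`, and `validate_region` are hypothetical and chosen only for illustration.

```rust
/// WebAssembly linear memory grows in 64 KiB pages.
pub const WASM_PAGE_SIZE: usize = 64 * 1024;
/// A `v128` value is 16 bytes wide.
pub const SIMD_ALIGN: usize = 16;

/// Rounds `offset` up to the next multiple of `align`.
/// `align` must be a power of two (hypothetical helper).
pub fn align_up(offset: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Number of whole 64 KiB pages needed to hold `bytes`.
pub fn pages_for(bytes: usize) -> usize {
    align_up(bytes, WASM_PAGE_SIZE) / WASM_PAGE_SIZE
}

/// Region-style validation: `[offset, offset + len)` must start on a
/// 16-byte boundary and end inside a memory of `memory_len` bytes.
pub fn validate_region(offset: usize, len: usize, memory_len: usize) -> bool {
    offset % SIMD_ALIGN == 0
        && offset
            .checked_add(len)
            .map_or(false, |end| end <= memory_len)
}
```

Using `checked_add` keeps the bounds test correct even when `offset + len` would overflow, which matters on `wasm32` where `usize` is 32 bits.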
+
+**Batch Operations** (from `ruvector-mincut/src/optimization/wasm_batch.rs`):
+- TypedArray bulk transfers
+- Operation batching to minimize FFI overhead
+- 64-byte AVX-512 alignment for SIMD compatibility
+
+**SIMD Distance Operations** (from `simd_distance.rs`):
+- WASM SIMD128 intrinsics for parallel min/max
+- Batch relaxation for Dijkstra-style updates
+- Scalar fallback for non-SIMD environments
+
+---
+
+## 2. WASM Delta Primitives Design
+
+### 2.1 WIT Interface Definition
+
+```wit
+// delta-streaming.wit
+package ruvector:delta@0.1.0;
+
+/// Delta operation types for incremental updates
+enum delta-operation {
+    insert,
+    update,
+    delete,
+    batch-update,
+    reindex-layers,
+}
+
+/// Delta header for streaming protocol
+record delta-header {
+    sequence: u64,
+    operation: delta-operation,
+    vector-id: option<string>,
+    timestamp: u64,
+    payload-size: u32,
+    checksum: u64,
+}
+
+/// Delta payload for vector operations
+record vector-delta {
+    id: string,
+    changed-dims: list<u32>,
+    new-values: list<f32>,
+    metadata-delta: list<tuple<string, string>>,
+}
+
+/// HNSW index delta for graph structure changes
+record hnsw-delta {
+    layer: u8,
+    add-edges: list<tuple<u32, u32, f32>>,
+    remove-edges: list<tuple<u32, u32>>,
+    entry-point-update: option<u32>,
+}
+
+/// Delta stream interface for producers
+interface delta-capture {
+    init-capture: func(db-id: string, config: capture-config) -> result<capture-handle, delta-error>;
+    start-capture: func(handle: capture-handle) -> result<_, delta-error>;
+    poll-deltas: func(handle: capture-handle, max-batch: u32) -> result<list<delta-header>, delta-error>;
+    get-payload: func(handle: capture-handle, sequence: u64) -> result<list<u8>, delta-error>;
+    checkpoint: func(handle: capture-handle) -> result<checkpoint-marker, delta-error>;
+}
+
+/// Delta stream interface for consumers
+interface delta-apply {
+    init-apply: func(db-id: string, config: apply-config) -> result<apply-handle, delta-error>;
+    apply-delta: func(handle: apply-handle, header: delta-header, payload: list<u8>) -> result<u64, delta-error>;
+    apply-batch: func(handle: apply-handle, deltas: list<tuple<delta-header, list<u8>>>) -> result<batch-result, delta-error>;
+    current-position: func(handle: apply-handle) -> result<u64, delta-error>;
+    seek: func(handle: apply-handle, sequence: u64) -> result<_, delta-error>;
+}
+```
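
On the consumer side, a `vector-delta` record could be mirrored in Rust and applied roughly as follows. This is a sketch under assumptions: the report does not define `delta-error`, so `ApplyError` and its variants are hypothetical, and `metadata-delta` is omitted for brevity.

```rust
/// Rust mirror of the WIT `vector-delta` record (metadata omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct VectorDelta {
    pub id: String,
    pub changed_dims: Vec<u32>,
    pub new_values: Vec<f32>,
}

/// Hypothetical error type; the WIT `delta-error` is not specified.
#[derive(Debug, PartialEq)]
pub enum ApplyError {
    LengthMismatch,
    DimOutOfRange(u32),
}

/// Writes each `new_values[i]` into `vector[changed_dims[i]]` and returns
/// the number of dimensions written.
pub fn apply_vector_delta(
    vector: &mut [f32],
    delta: &VectorDelta,
) -> Result<usize, ApplyError> {
    if delta.changed_dims.len() != delta.new_values.len() {
        return Err(ApplyError::LengthMismatch);
    }
    // Validate every index first so a bad delta leaves the vector untouched.
    if let Some(&bad) = delta
        .changed_dims
        .iter()
        .find(|&&dim| dim as usize >= vector.len())
    {
        return Err(ApplyError::DimOutOfRange(bad));
    }
    for (&dim, &value) in delta.changed_dims.iter().zip(&delta.new_values) {
        vector[dim as usize] = value;
    }
    Ok(delta.changed_dims.len())
}
```

Validating before writing means a rejected delta never leaves a vector half-updated, which keeps a failed apply cheap to recover from.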
+
+### 2.2 Memory Layout for Delta Structures
+
+```
+Delta Ring Buffer Memory Layout (64KB pages):
+------------------------------------------------------------------+
+| Pages 0-3: Delta Headers (256KB total)                           |
+| +--------------------------------------------------------------+ |
+| | Header 0     | Header 1     | Header 2     | ...              | |
+| | [64 bytes]   | [64 bytes]   | [64 bytes]   |                  | |
+| +--------------------------------------------------------------+ |
+|                                                                  |
+| Header Structure (64 bytes, cache-line aligned):                 |
+| +--------------------------------------------------------------+ |
+| | sequence: u64          | 8 bytes                              | |
+| | operation: u8          | 1 byte                               | |
+| | flags: u8              | 1 byte                               | |
+| | reserved: u16          | 2 bytes                              | |
+| | vector_id_hash: u32    | 4 bytes                              | |
+| | timestamp: u64         | 8 bytes                              | |
+| | payload_offset: u32    | 4 bytes                              | |
+| | payload_size: u32      | 4 bytes                              | |
+| | checksum: u64          | 8 bytes                              | |
+| | prev_sequence: u64     | 8 bytes (for linked list)           | |
+| | padding: [u8; 16]      | 16 bytes (to 64)                     | |
+| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
+| Pages 4-N: Delta Payloads (variable)                             |
+| +--------------------------------------------------------------+ |
+| | Compressed delta data                                         | |
+| | [SIMD-aligned, 16-byte boundary]                              | |
+| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
+```
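
The 64-byte header described above maps directly onto a `#[repr(C)]` struct. The sketch below assumes the header region spans four 64 KiB pages (pages 0-3); `HEADER_SLOTS` and `header_offset` are hypothetical names, and the compile-time assertions check that the listed fields really sum to one cache line.

```rust
use std::mem::{align_of, size_of};

/// One ring-buffer header: 64 bytes, cache-line aligned.
/// Field order follows the layout table, so no implicit padding is inserted.
#[repr(C, align(64))]
#[derive(Debug, Clone, Copy, Default)]
pub struct DeltaHeader {
    pub sequence: u64,       // offset 0
    pub operation: u8,       // offset 8
    pub flags: u8,           // offset 9
    pub reserved: u16,       // offset 10
    pub vector_id_hash: u32, // offset 12
    pub timestamp: u64,      // offset 16
    pub payload_offset: u32, // offset 24
    pub payload_size: u32,   // offset 28
    pub checksum: u64,       // offset 32
    pub prev_sequence: u64,  // offset 40
    pub padding: [u8; 16],   // offset 48, pads to 64
}

// Fail the build if the layout ever drifts from 64 bytes.
const _: () = assert!(size_of::<DeltaHeader>() == 64);
const _: () = assert!(align_of::<DeltaHeader>() == 64);

pub const PAGE_SIZE: usize = 64 * 1024;
/// Assumption: headers occupy pages 0-3.
pub const HEADER_PAGES: usize = 4;
pub const HEADER_SLOTS: usize = HEADER_PAGES * PAGE_SIZE / size_of::<DeltaHeader>();

/// Byte offset of the header slot used by `sequence` (wraps around the ring).
pub fn header_offset(sequence: u64) -> usize {
    (sequence % HEADER_SLOTS as u64) as usize * size_of::<DeltaHeader>()
}
```

Under that assumption the header region holds 4096 slots, so sequence 4096 reuses the slot of sequence 0.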
+
+---
+
+## 3. Novel WASM Delta Architecture
+
+### 3.1 Architecture Diagram
+
+```
+=====================================================================+
+|                     DELTA HOST RUNTIME                               |
+=====================================================================+
+|                                                                      |
+|  +-------------------------+     +-----------------------------+     |
+|  |   Change Capture        |     |     Delta Apply Engine      |     |
+|  |   (Producer Side)       |     |     (Consumer Side)         |     |
+|  +-------------------------+     +-----------------------------+     |
+|  | - Vector write hooks    |     | - Sequence validation       |     |
+|  | - HNSW mutation capture |     | - Conflict detection        |     |
+|  | - Batch accumulation    |     | - Parallel application      |     |
+|  | - Compression pipeline  |     | - Index maintenance         |     |
+|  +-------------------------+     +-----------------------------+     |
+|           |                                    |                     |
+|           v                                    v                     |
+|  +===========================================================+      |
+|  |            SHARED DELTA MEMORY (WebAssembly.Memory)        |      |
+|  +===========================================================+      |
+|  |  +-------------+  +-------------+  +-------------------+   |      |
+|  |  | Capture     |  | Process     |  | Apply             |   |      |
+|  |  | WASM Module |  | WASM Module |  | WASM Module       |   |      |
+|  |  +-------------+  +-------------+  +-------------------+   |      |
+|  |  | - Intercept |  | - Filter    |  | - Decompress      |   |      |
+|  |  | - Serialize |  | - Transform |  | - SIMD apply      |   |      |
+|  |  | - Compress  |  | - Route     |  | - Index update    |   |      |
+|  |  +-------------+  +-------------+  +-------------------+   |      |
+|  |       |               |                  |                 |      |
+|  |       v               v                  v                 |      |
+|  |  +===========================================================+   |
+|  |  |                 DELTA RING BUFFER                         |   |
+|  |  +===========================================================+   |
+|  +===========================================================+      |
+|                                                                      |
+=====================================================================+
+```
+
+### 3.2 Three-Stage Delta Pipeline
+
+```rust
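+use std::sync::atomic::AtomicU64;
+use wasm_bindgen::prelude::*;
+
+/// Stage 1: Capture WASM Module
+#[wasm_bindgen]
+pub struct DeltaCaptureModule {
+    sequence: AtomicU64,
+    pending: RingBuffer<DeltaHeader>,
+    compressor: LZ4Compressor,
+    stats: CaptureStats,
+}
+
+impl DeltaCaptureModule {
+    /// SIMD-accelerated diff computation.
+    /// Returns `(dimension, new value)` for every dimension that changed by more than 1e-6.
+    #[cfg(target_feature = "simd128")]
+    fn compute_diff(&self, old: &[f32], new: &[f32]) -> Vec<(u32, f32)> {
+        use core::arch::wasm32::*;
+
+        const EPSILON: f32 = 1e-6;
+        assert_eq!(old.len(), new.len(), "vectors must have the same dimension");
+
+        let mut changes = Vec::new();
+        let epsilon = f32x4_splat(EPSILON);
+        let simd_len = old.len() / 4 * 4;
+
+        for base in (0..simd_len).step_by(4) {
+            // SAFETY: `base + 4 <= len` for both slices, and `v128_load`
+            // accepts unaligned addresses.
+            let (old_v, new_v) = unsafe {
+                (
+                    v128_load(old.as_ptr().add(base) as *const v128),
+                    v128_load(new.as_ptr().add(base) as *const v128),
+                )
+            };
+
+            let abs_diff = f32x4_abs(f32x4_sub(new_v, old_v));
+            let mask = f32x4_gt(abs_diff, epsilon);
+
+            // Fast path: skip the whole chunk when no lane changed.
+            if v128_any_true(mask) {
+                for idx in base..base + 4 {
+                    if (old[idx] - new[idx]).abs() > EPSILON {
+                        changes.push((idx as u32, new[idx]));
+                    }
+                }
+            }
+        }
+
+        // Scalar tail for dimensions that do not fill a 4-lane chunk.
+        for idx in simd_len..old.len() {
+            if (old[idx] - new[idx]).abs() > EPSILON {
+                changes.push((idx as u32, new[idx]));
+            }
+        }
+        changes
+    }
+}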
+
+// Stage 2 (Process: filter, transform, route) is not shown here.
+
+/// Stage 3: Apply WASM Module
+impl DeltaApplyModule {
+    /// Apply a single sparse delta, returning the elapsed time in microseconds.
+    ///
+    /// WASM SIMD128 has no scatter-store instruction, so writes to
+    /// non-contiguous dimensions are scalar stores.
+    ///
+    /// # Safety
+    /// `vector_ptr` must be valid for writes at every index in `dim_indices`.
+    #[cfg(target_feature = "simd128")]
+    pub unsafe fn apply_vector_delta_simd(
+        &mut self,
+        vector_ptr: *mut f32,
+        dim_indices: &[u32],
+        new_values: &[f32],
+    ) -> Result<u64, DeltaError> {
+        assert_eq!(dim_indices.len(), new_values.len(), "one value per changed dimension");
+
+        // NOTE: `std::time::Instant` panics on `wasm32-unknown-unknown`; this
+        // assumes a target with a clock (e.g. WASI) or a host-provided timer.
+        let start = std::time::Instant::now();
+
+        for (&dim, &value) in dim_indices.iter().zip(new_values) {
+            // SAFETY: the caller guarantees `dim` is in bounds for `vector_ptr`.
+            unsafe { *vector_ptr.add(dim as usize) = value; }
+        }
+
+        Ok(start.elapsed().as_micros() as u64)
+    }
+}
+```
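
For environments without `simd128`, the capture-side diff needs a scalar path that produces the same result. A minimal sketch, assuming the same 1e-6 threshold as the SIMD version; `compute_diff_scalar` and `apply_diff` are hypothetical names used only to show the round trip.

```rust
/// Change threshold, matching the SIMD path.
const EPSILON: f32 = 1e-6;

/// Portable scalar diff: returns `(dimension, new value)` for every
/// dimension whose value changed by more than `EPSILON`.
pub fn compute_diff_scalar(old: &[f32], new: &[f32]) -> Vec<(u32, f32)> {
    assert_eq!(old.len(), new.len(), "vectors must have the same dimension");
    old.iter()
        .zip(new)
        .enumerate()
        .filter(|(_, (o, n))| (**o - **n).abs() > EPSILON)
        .map(|(i, (_, n))| (i as u32, *n))
        .collect()
}

/// Applies a sparse diff in place. Panics if an index is out of bounds.
pub fn apply_diff(vector: &mut [f32], changes: &[(u32, f32)]) {
    for &(idx, value) in changes {
        vector[idx as usize] = value;
    }
}
```

Applying the diff of `old` and `new` to a copy of `old` reproduces `new` up to the threshold, which is a convenient property to check in a unit test for both the scalar and SIMD paths.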
+
+---
+
+## 4. Performance Benchmark Targets
+
+### 4.1 Delta Operation Latency Targets
+
+| Operation | Target Latency | Notes |
+|-----------|---------------|-------|
+| Single vector insert | <50µs | Zero-copy path |
+| Single vector update (dense) | <30µs | Full vector replacement |
+| Single vector update (sparse) | <10µs | <10% dimensions changed |
+| Vector delete | <20µs | Mark deleted + async cleanup |
+| HNSW edge add (single) | <15µs | Per layer |
+| HNSW edge remove (single) | <10µs | Per layer |
+| Batch insert (100 vectors) | <2ms | Amortized 20µs/vector |
+| Batch update (100 vectors) | <1ms | Amortized 10µs/vector |
+
+### 4.2 Throughput Targets
+
+| Metric | Target | Configuration |
+|--------|--------|---------------|
+| Delta capture rate | 50K deltas/sec | Single producer |
+| Delta apply rate | 100K deltas/sec | 4 parallel workers |
+| Delta compression ratio | 4:1 | Typical vector updates |
+| Ring buffer throughput | 200MB/sec | Shared memory path |
+
+---
+
+## 5. Lock-Free Ring Buffer
+
+```rust
+use std::sync::atomic::{AtomicU64, Ordering};
+
+/// Lock-free SPSC ring buffer for delta streaming
+#[repr(C, align(64))]
+pub struct DeltaRingBuffer {
+    capacity: u32,
+    mask: u32,
+    read_pos: AtomicU64,  // Cache-line padded
+    _pad1: [u8; 56],
+    write_pos: AtomicU64, // Cache-line padded
+    _pad2: [u8; 56],
+    headers: *mut DeltaHeader,
+    payloads: *mut u8,
+}
+
+impl DeltaRingBuffer {
+    #[inline]
+    pub fn try_reserve(&self, payload_size: u32) -> Option<ReservedSlot> {
+        let write = self.write_pos.load(Ordering::Relaxed);
+        let read = self.read_pos.load(Ordering::Acquire);
+
+        // Full when the producer is a whole buffer ahead of the consumer.
+        if write.wrapping_sub(read) >= self.capacity as u64 {
+            return None;
+        }
+
+        // Strong compare-exchange: the weak variant may fail spuriously, which
+        // would report a full buffer even when a slot is free.
+        match self.write_pos.compare_exchange(
+            write, write.wrapping_add(1), Ordering::AcqRel, Ordering::Relaxed,
+        ) {
+            Ok(_) => Some(ReservedSlot { sequence: write, /* ... */ }),
+            Err(_) => None,
+        }
+    }
+}
+```
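
With exactly one producer and one consumer, the write position needs no compare-exchange at all: the producer fills the slot first and then publishes it with a `Release` store. The following self-contained sketch illustrates that protocol. It is not the `DeltaRingBuffer` above; `SpscRing` is a hypothetical type that stores plain `u64` values (for example payload offsets) so that it stays free of `unsafe`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Minimal single-producer, single-consumer ring of `u64` values.
pub struct SpscRing {
    mask: u64,
    slots: Vec<AtomicU64>,
    read_pos: AtomicU64,
    write_pos: AtomicU64,
}

impl SpscRing {
    /// `capacity` must be a power of two so positions can be masked.
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        Self {
            mask: capacity as u64 - 1,
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            read_pos: AtomicU64::new(0),
            write_pos: AtomicU64::new(0),
        }
    }

    /// Producer side. Returns `false` when the ring is full (backpressure).
    pub fn try_push(&self, value: u64) -> bool {
        let write = self.write_pos.load(Ordering::Relaxed);
        let read = self.read_pos.load(Ordering::Acquire);
        if write.wrapping_sub(read) >= self.slots.len() as u64 {
            return false;
        }
        self.slots[(write & self.mask) as usize].store(value, Ordering::Relaxed);
        // Release: the slot write above becomes visible before the new position.
        self.write_pos.store(write.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side. Returns `None` when the ring is empty.
    pub fn try_pop(&self) -> Option<u64> {
        let read = self.read_pos.load(Ordering::Relaxed);
        let write = self.write_pos.load(Ordering::Acquire);
        if read == write {
            return None;
        }
        let value = self.slots[(read & self.mask) as usize].load(Ordering::Relaxed);
        // Release: the slot is handed back to the producer only after it was read.
        self.read_pos.store(read.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

Publishing the position only after the slot is written avoids the window in which a consumer could observe a reserved but still unwritten slot. A reserve-then-commit design would need a separate committed position to close that window.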
+
+---
+
+## 6. Performance Projections
+
+| Scenario | Current (no delta) | With Delta System | Improvement |
+|----------|-------------------|-------------------|-------------|
+| Single vector update | ~500µs | <30µs | **16x** |
+| Batch 100 vectors | ~50ms | <2ms | **25x** |
+| HNSW reindex | ~10ms | <1ms (incremental) | **10x** |
+| Memory overhead | 0 | +1MB per database | Acceptable |
+
+---
+
+## 7. Recommended Implementation Order
+
+1. **Phase 1**: Implement `DeltaRingBuffer` and basic capture in `ruvector-wasm`
+2. **Phase 2**: Add SIMD-accelerated apply module with sparse update path
+3. **Phase 3**: Integrate with `ruvector-graph-wasm` for structure deltas
+4. **Phase 4**: Add WIT interfaces for component model support
+5. **Phase 5**: Implement parallel application with shared memory workers
+
+---
+
+## 8. Integration with Δ-Behavior
+
+The WASM delta system directly supports Δ-behavior enforcement:
+
+| Δ-Behavior Property | WASM Implementation |
+|---------------------|---------------------|
+| **Local Change** | Sparse updates, bounded payload sizes |
+| **Global Preservation** | Coherence check in apply stage |
+| **Violation Resistance** | Ring buffer backpressure, validation |
+| **Closure Preference** | Delta compaction toward stable states |
+
+The three-stage pipeline naturally implements the three enforcement layers:
+- **Capture** → Energy cost (compression overhead)
+- **Process** → Scheduling (filtering, routing)
+- **Apply** → Memory gate (validation, commit)