git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
14 KiB
14 KiB
WASM Delta Computation Research Report
Executive Summary
This research analyzes the existing ruvector WASM infrastructure and designs a novel delta computation architecture optimized for vector database incremental updates. The proposed system leverages WASM SIMD128, shared memory protocols, and the WASM component model to achieve sub-100µs delta application latency.
1. Current WASM Infrastructure Analysis
1.1 Existing WASM Crates in RuVector
| Crate | Purpose | Delta Relevance |
|---|---|---|
ruvector-wasm |
Core VectorDB bindings | Memory protocol foundation |
ruvector-gnn-wasm |
Graph Neural Networks | Node embedding deltas |
ruvector-graph-wasm |
Graph database | Structure deltas |
ruvector-learning-wasm |
MicroLoRA training | Weight deltas |
ruvector-mincut-wasm |
Graph partitioning | Partition deltas |
ruvector-attention-wasm |
Attention mechanisms | KV cache deltas |
1.2 Key Patterns Identified
Memory Layout Protocol (from ruvector-wasm/src/kernel/memory.rs):
- 64KB page-aligned allocations
- 16-byte SIMD alignment
- Region-based memory validation
- Zero-copy tensor descriptors
Batch Operations (from ruvector-mincut/src/optimization/wasm_batch.rs):
- TypedArray bulk transfers
- Operation batching to minimize FFI overhead
- 64-byte AVX-512 alignment for SIMD compatibility
SIMD Distance Operations (from simd_distance.rs):
- WASM SIMD128 intrinsics for parallel min/max
- Batch relaxation for Dijkstra-style updates
- Scalar fallback for non-SIMD environments
2. WASM Delta Primitives Design
2.1 WIT Interface Definition
// delta-streaming.wit
package ruvector:delta@0.1.0;
/// Delta operation types for incremental updates
enum delta-operation {
insert,
update,
delete,
batch-update,
reindex-layers,
}
/// Delta header for streaming protocol
record delta-header {
sequence: u64,
operation: delta-operation,
vector-id: option<string>,
timestamp: u64,
payload-size: u32,
checksum: u64,
}
/// Delta payload for vector operations
record vector-delta {
id: string,
changed-dims: list<u32>,
new-values: list<f32>,
metadata-delta: list<tuple<string, string>>,
}
/// HNSW index delta for graph structure changes
record hnsw-delta {
layer: u8,
add-edges: list<tuple<u32, u32, f32>>,
remove-edges: list<tuple<u32, u32>>,
entry-point-update: option<u32>,
}
/// Delta stream interface for producers
interface delta-capture {
init-capture: func(db-id: string, config: capture-config) -> result<capture-handle, delta-error>;
start-capture: func(handle: capture-handle) -> result<_, delta-error>;
poll-deltas: func(handle: capture-handle, max-batch: u32) -> result<list<delta-header>, delta-error>;
get-payload: func(handle: capture-handle, sequence: u64) -> result<list<u8>, delta-error>;
checkpoint: func(handle: capture-handle) -> result<checkpoint-marker, delta-error>;
}
/// Delta stream interface for consumers
interface delta-apply {
init-apply: func(db-id: string, config: apply-config) -> result<apply-handle, delta-error>;
apply-delta: func(handle: apply-handle, header: delta-header, payload: list<u8>) -> result<u64, delta-error>;
apply-batch: func(handle: apply-handle, deltas: list<tuple<delta-header, list<u8>>>) -> result<batch-result, delta-error>;
current-position: func(handle: apply-handle) -> result<u64, delta-error>;
seek: func(handle: apply-handle, sequence: u64) -> result<_, delta-error>;
}
2.2 Memory Layout for Delta Structures
Delta Ring Buffer Memory Layout (64KB pages):
+------------------------------------------------------------------+
| Page 0-3: Delta Headers (64KB total) |
| +--------------------------------------------------------------+ |
| | Header 0 | Header 1 | Header 2 | ... | |
| | [64 bytes] | [64 bytes] | [64 bytes] | | |
| +--------------------------------------------------------------+ |
| |
| Header Structure (64 bytes, cache-line aligned): |
| +--------------------------------------------------------------+ |
| | sequence: u64 | 8 bytes | |
| | operation: u8 | 1 byte | |
| | flags: u8 | 1 byte | |
| | reserved: u16 | 2 bytes | |
| | vector_id_hash: u32 | 4 bytes | |
| | timestamp: u64 | 8 bytes | |
| | payload_offset: u32 | 4 bytes | |
| | payload_size: u32 | 4 bytes | |
| | checksum: u64 | 8 bytes | |
| | prev_sequence: u64 | 8 bytes (for linked list) | |
| | padding: [u8; 16] | 16 bytes (to 64) | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Pages 4-N: Delta Payloads (variable) |
| +--------------------------------------------------------------+ |
| | Compressed delta data | |
| | [SIMD-aligned, 16-byte boundary] | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
3. Novel WASM Delta Architecture
3.1 Architecture Diagram
+=====================================================================+
| DELTA HOST RUNTIME |
+=====================================================================+
| |
| +-------------------------+ +-----------------------------+ |
| | Change Capture | | Delta Apply Engine | |
| | (Producer Side) | | (Consumer Side) | |
| +-------------------------+ +-----------------------------+ |
| | - Vector write hooks | | - Sequence validation | |
| | - HNSW mutation capture | | - Conflict detection | |
| | - Batch accumulation | | - Parallel application | |
| | - Compression pipeline | | - Index maintenance | |
| +-------------------------+ +-----------------------------+ |
| | | |
| v v |
| +===========================================================+ |
| | SHARED DELTA MEMORY (WebAssembly.Memory) | |
| +===========================================================+ |
| | +-------------+ +-------------+ +-------------------+ | |
| | | Capture | | Process | | Apply | | |
| | | WASM Module | | WASM Module | | WASM Module | | |
| | +-------------+ +-------------+ +-------------------+ | |
| | | - Intercept | | - Filter | | - Decompress | | |
| | | - Serialize | | - Transform | | - SIMD apply | | |
| | | - Compress | | - Route | | - Index update | | |
| | +-------------+ +-------------+ +-------------------+ | |
| | | | | | |
| | v v v | |
| | +===========================================================+ |
| | | DELTA RING BUFFER | |
| | +===========================================================+ |
| +===========================================================+ |
| |
+=====================================================================+
3.2 Three-Stage Delta Pipeline
/// Stage 1: Capture WASM Module
#[wasm_bindgen]
pub struct DeltaCaptureModule {
sequence: AtomicU64,
pending: RingBuffer<DeltaHeader>,
compressor: LZ4Compressor,
stats: CaptureStats,
}
impl DeltaCaptureModule {
/// SIMD-accelerated diff computation
#[cfg(target_feature = "simd128")]
fn compute_diff(&self, old: &[f32], new: &[f32]) -> Vec<(u32, f32)> {
use core::arch::wasm32::*;
let mut changes = Vec::new();
let epsilon = f32x4_splat(1e-6);
for (i, chunk) in old.chunks_exact(4).enumerate() {
let old_v = v128_load(chunk.as_ptr() as *const v128);
let new_v = v128_load(new[i*4..].as_ptr() as *const v128);
let diff = f32x4_sub(new_v, old_v);
let abs_diff = f32x4_abs(diff);
let mask = f32x4_gt(abs_diff, epsilon);
if v128_any_true(mask) {
for j in 0..4 {
let idx = i * 4 + j;
if (old[idx] - new[idx]).abs() > 1e-6 {
changes.push((idx as u32, new[idx]));
}
}
}
}
changes
}
}
/// Stage 3: Apply WASM Module
impl DeltaApplyModule {
/// Apply single delta with SIMD acceleration
#[cfg(target_feature = "simd128")]
pub fn apply_vector_delta_simd(
&mut self,
vector_ptr: *mut f32,
dim_indices: &[u32],
new_values: &[f32],
) -> Result<u64, DeltaError> {
use core::arch::wasm32::*;
let start = std::time::Instant::now();
// Process 4 updates at a time using SIMD
let chunks = dim_indices.len() / 4;
for i in 0..chunks {
let idx_base = i * 4;
let val_v = v128_load(new_values[idx_base..].as_ptr() as *const v128);
for j in 0..4 {
let idx = dim_indices[idx_base + j] as usize;
unsafe { *vector_ptr.add(idx) = new_values[idx_base + j]; }
}
}
// Handle remainder
for i in (chunks * 4)..dim_indices.len() {
let idx = dim_indices[i] as usize;
unsafe { *vector_ptr.add(idx) = new_values[i]; }
}
Ok(start.elapsed().as_micros() as u64)
}
}
4. Performance Benchmarks Targets
4.1 Delta Operation Latency Targets
| Operation | Target Latency | Notes |
|---|---|---|
| Single vector insert | <50µs | Zero-copy path |
| Single vector update (dense) | <30µs | Full vector replacement |
| Single vector update (sparse) | <10µs | <10% dimensions changed |
| Vector delete | <20µs | Mark deleted + async cleanup |
| HNSW edge add (single) | <15µs | Per layer |
| HNSW edge remove (single) | <10µs | Per layer |
| Batch insert (100 vectors) | <2ms | Amortized 20µs/vector |
| Batch update (100 vectors) | <1ms | Amortized 10µs/vector |
4.2 Throughput Targets
| Metric | Target | Configuration |
|---|---|---|
| Delta capture rate | 50K deltas/sec | Single producer |
| Delta apply rate | 100K deltas/sec | 4 parallel workers |
| Delta compression ratio | 4:1 | Typical vector updates |
| Ring buffer throughput | 200MB/sec | Shared memory path |
5. Lock-Free Ring Buffer
/// Lock-free SPSC ring buffer for delta streaming
#[repr(C, align(64))]
pub struct DeltaRingBuffer {
capacity: u32,
mask: u32,
read_pos: AtomicU64, // Cache-line padded
_pad1: [u8; 56],
write_pos: AtomicU64, // Cache-line padded
_pad2: [u8; 56],
headers: *mut DeltaHeader,
payloads: *mut u8,
}
impl DeltaRingBuffer {
#[inline]
pub fn try_reserve(&self, payload_size: u32) -> Option<ReservedSlot> {
let write = self.write_pos.load(Ordering::Relaxed);
let read = self.read_pos.load(Ordering::Acquire);
if write.wrapping_sub(read) >= self.capacity as u64 {
return None;
}
match self.write_pos.compare_exchange_weak(
write, write + 1, Ordering::AcqRel, Ordering::Relaxed,
) {
Ok(_) => Some(ReservedSlot { sequence: write, /* ... */ }),
Err(_) => None,
}
}
}
6. Performance Projections
| Scenario | Current (no delta) | With Delta System | Improvement |
|---|---|---|---|
| Single vector update | ~500µs | <30µs | 16x |
| Batch 100 vectors | ~50ms | <2ms | 25x |
| HNSW reindex | ~10ms | <1ms (incremental) | 10x |
| Memory overhead | 0 | +1MB per database | Acceptable |
7. Recommended Implementation Order
- Phase 1: Implement
DeltaRingBufferand basic capture inruvector-wasm - Phase 2: Add SIMD-accelerated apply module with sparse update path
- Phase 3: Integrate with
ruvector-graph-wasmfor structure deltas - Phase 4: Add WIT interfaces for component model support
- Phase 5: Implement parallel application with shared memory workers
8. Integration with Δ-Behavior
The WASM delta system directly supports Δ-behavior enforcement:
| Δ-Behavior Property | WASM Implementation |
|---|---|
| Local Change | Sparse updates, bounded payload sizes |
| Global Preservation | Coherence check in apply stage |
| Violation Resistance | Ring buffer backpressure, validation |
| Closure Preference | Delta compaction toward stable states |
The three-stage pipeline naturally implements the three enforcement layers:
- Capture → Energy cost (compression overhead)
- Process → Scheduling (filtering, routing)
- Apply → Memory gate (validation, commit)