Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
vendor/ruvector/examples/delta-behavior/research/WASM-DELTA-ARCHITECTURE.md
@@ -0,0 +1,362 @@
# WASM Delta Computation Research Report

## Executive Summary

This research analyzes the existing ruvector WASM infrastructure and designs a novel delta computation architecture optimized for incremental updates to a vector database. The proposed system uses WASM SIMD128, shared memory protocols, and the WASM component model, and targets sub-100µs delta application latency.

---

## 1. Current WASM Infrastructure Analysis

### 1.1 Existing WASM Crates in RuVector

| Crate | Purpose | Delta Relevance |
|-------|---------|-----------------|
| `ruvector-wasm` | Core VectorDB bindings | Memory protocol foundation |
| `ruvector-gnn-wasm` | Graph Neural Networks | Node embedding deltas |
| `ruvector-graph-wasm` | Graph database | Structure deltas |
| `ruvector-learning-wasm` | MicroLoRA training | Weight deltas |
| `ruvector-mincut-wasm` | Graph partitioning | Partition deltas |
| `ruvector-attention-wasm` | Attention mechanisms | KV cache deltas |

### 1.2 Key Patterns Identified

**Memory Layout Protocol** (from `ruvector-wasm/src/kernel/memory.rs`):
- 64KB page-aligned allocations
- 16-byte SIMD alignment
- Region-based memory validation
- Zero-copy tensor descriptors
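
The alignment rules in this list can be sketched with a few helpers. This is an illustrative sketch only: `align_up`, `pages_for`, and `region_is_valid` are hypothetical names and are not taken from `ruvector-wasm/src/kernel/memory.rs`; the only facts assumed are the 64KB WASM page size and the 16-byte width of a `v128` value.

```rust
/// WebAssembly linear memory grows in 64 KiB pages.
pub const WASM_PAGE_SIZE: usize = 64 * 1024;
/// A `v128` SIMD value is 16 bytes wide.
pub const SIMD_ALIGN: usize = 16;

/// Rounds `offset` up to the next multiple of `align`.
/// `align` must be a power of two.
pub fn align_up(offset: usize, align: usize) -> usize {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    (offset + align - 1) & !(align - 1)
}

/// Number of whole 64 KiB pages needed to hold `bytes`.
pub fn pages_for(bytes: usize) -> usize {
    align_up(bytes, WASM_PAGE_SIZE) / WASM_PAGE_SIZE
}

/// Hypothetical region check: `[offset, offset + len)` must start on a
/// SIMD boundary and lie entirely inside a region of `region_len` bytes.
pub fn region_is_valid(offset: usize, len: usize, region_len: usize) -> bool {
    offset % SIMD_ALIGN == 0
        && offset
            .checked_add(len) // guards against overflow on hostile inputs
            .map_or(false, |end| end <= region_len)
}
```

Using `checked_add` in the region check means an offset/length pair that overflows `usize` is rejected instead of wrapping into a seemingly valid range.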

**Batch Operations** (from `ruvector-mincut/src/optimization/wasm_batch.rs`):
- TypedArray bulk transfers
- Operation batching to minimize FFI overhead
- 64-byte AVX-512 alignment for SIMD compatibility

**SIMD Distance Operations** (from `simd_distance.rs`):
- WASM SIMD128 intrinsics for parallel min/max
- Batch relaxation for Dijkstra-style updates
- Scalar fallback for non-SIMD environments

---

## 2. WASM Delta Primitives Design

### 2.1 WIT Interface Definition

```wit
// delta-streaming.wit
package ruvector:delta@0.1.0;

// NOTE: capture-config, capture-handle, checkpoint-marker, apply-config,
// apply-handle, batch-result, and delta-error are referenced below but are
// not defined in this excerpt.

/// Delta operation types for incremental updates
enum delta-operation {
    insert,
    update,
    delete,
    batch-update,
    reindex-layers,
}

/// Delta header for streaming protocol
record delta-header {
    sequence: u64,
    operation: delta-operation,
    vector-id: option<string>,
    timestamp: u64,
    payload-size: u32,
    checksum: u64,
}

/// Delta payload for vector operations
record vector-delta {
    id: string,
    changed-dims: list<u32>,
    new-values: list<f32>,
    metadata-delta: list<tuple<string, string>>,
}

/// HNSW index delta for graph structure changes
record hnsw-delta {
    layer: u8,
    add-edges: list<tuple<u32, u32, f32>>,
    remove-edges: list<tuple<u32, u32>>,
    entry-point-update: option<u32>,
}

/// Delta stream interface for producers
interface delta-capture {
    init-capture: func(db-id: string, config: capture-config) -> result<capture-handle, delta-error>;
    start-capture: func(handle: capture-handle) -> result<_, delta-error>;
    poll-deltas: func(handle: capture-handle, max-batch: u32) -> result<list<delta-header>, delta-error>;
    get-payload: func(handle: capture-handle, sequence: u64) -> result<list<u8>, delta-error>;
    checkpoint: func(handle: capture-handle) -> result<checkpoint-marker, delta-error>;
}

/// Delta stream interface for consumers
interface delta-apply {
    init-apply: func(db-id: string, config: apply-config) -> result<apply-handle, delta-error>;
    apply-delta: func(handle: apply-handle, header: delta-header, payload: list<u8>) -> result<u64, delta-error>;
    apply-batch: func(handle: apply-handle, deltas: list<tuple<delta-header, list<u8>>>) -> result<batch-result, delta-error>;
    current-position: func(handle: apply-handle) -> result<u64, delta-error>;
    seek: func(handle: apply-handle, sequence: u64) -> result<_, delta-error>;
}
```
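
To make the `vector-delta` record concrete, the following is a minimal sketch of how a consumer might apply one to an in-memory vector. The Rust types are hand-written mirrors, not generated bindings: `VectorDelta` omits `metadata-delta` for brevity, and `ApplyError` is a hypothetical stand-in for `delta-error`, which the WIT above does not define.

```rust
/// Hand-written mirror of the `vector-delta` record (metadata omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct VectorDelta {
    pub id: String,
    pub changed_dims: Vec<u32>,
    pub new_values: Vec<f32>,
}

/// Hypothetical error type standing in for the undefined `delta-error`.
#[derive(Debug, PartialEq)]
pub enum ApplyError {
    LengthMismatch { dims: usize, values: usize },
    DimOutOfRange { dim: u32, len: usize },
}

/// Applies a sparse delta in place and returns the number of dimensions
/// written. All validation happens before the first write, so a rejected
/// delta leaves `vector` untouched.
pub fn apply_vector_delta(
    vector: &mut [f32],
    delta: &VectorDelta,
) -> Result<usize, ApplyError> {
    if delta.changed_dims.len() != delta.new_values.len() {
        return Err(ApplyError::LengthMismatch {
            dims: delta.changed_dims.len(),
            values: delta.new_values.len(),
        });
    }
    if let Some(&dim) = delta
        .changed_dims
        .iter()
        .find(|&&d| d as usize >= vector.len())
    {
        return Err(ApplyError::DimOutOfRange { dim, len: vector.len() });
    }
    for (&dim, &value) in delta.changed_dims.iter().zip(&delta.new_values) {
        vector[dim as usize] = value;
    }
    Ok(delta.changed_dims.len())
}
```

Validating before writing is a design choice made here for illustration; it keeps a malformed delta from leaving a vector half-updated.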

### 2.2 Memory Layout for Delta Structures

```
Delta Ring Buffer Memory Layout (64KB pages):
+------------------------------------------------------------------+
| Page 0-3: Delta Headers (64KB total)                             |
| +--------------------------------------------------------------+ |
| | Header 0   | Header 1   | Header 2   | ...                   | |
| | [64 bytes] | [64 bytes] | [64 bytes] |                       | |
| +--------------------------------------------------------------+ |
|                                                                  |
| Header Structure (64 bytes, cache-line aligned):                 |
| +--------------------------------------------------------------+ |
| | sequence: u64       | 8 bytes                                | |
| | operation: u8       | 1 byte                                 | |
| | flags: u8           | 1 byte                                 | |
| | reserved: u16       | 2 bytes                                | |
| | vector_id_hash: u32 | 4 bytes                                | |
| | timestamp: u64      | 8 bytes                                | |
| | payload_offset: u32 | 4 bytes                                | |
| | payload_size: u32   | 4 bytes                                | |
| | checksum: u64       | 8 bytes                                | |
| | prev_sequence: u64  | 8 bytes (for linked list)              | |
| | padding: [u8; 16]   | 16 bytes (to 64)                       | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Pages 4-N: Delta Payloads (variable)                             |
| +--------------------------------------------------------------+ |
| | Compressed delta data                                        | |
| | [SIMD-aligned, 16-byte boundary]                             | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
```
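
The header layout above maps directly onto a `#[repr(C)]` struct. The sketch below is an assumption about how it could be declared, with field names copied from the diagram; the compile-time assertions confirm that the fields sum to exactly 64 bytes with no implicit padding. `headers_per_page` and `header_offset` are hypothetical helpers. Note that one 64KB page holds 1024 headers of this size.

```rust
use std::mem::{align_of, size_of};

/// One ring-buffer header, laid out exactly as in the diagram.
#[repr(C, align(64))]
#[derive(Debug, Clone, Copy, Default)]
pub struct DeltaHeader {
    pub sequence: u64,        // offset 0
    pub operation: u8,        // offset 8
    pub flags: u8,            // offset 9
    pub reserved: u16,        // offset 10
    pub vector_id_hash: u32,  // offset 12
    pub timestamp: u64,       // offset 16
    pub payload_offset: u32,  // offset 24
    pub payload_size: u32,    // offset 28
    pub checksum: u64,        // offset 32
    pub prev_sequence: u64,   // offset 40
    pub padding: [u8; 16],    // offset 48, pads the struct to 64 bytes
}

// Compile-time layout checks: the build fails if the struct drifts.
const _: () = assert!(size_of::<DeltaHeader>() == 64);
const _: () = assert!(align_of::<DeltaHeader>() == 64);

pub const WASM_PAGE_SIZE: usize = 64 * 1024;

/// Number of headers that fit in one 64 KiB WASM page.
pub const fn headers_per_page() -> usize {
    WASM_PAGE_SIZE / size_of::<DeltaHeader>()
}

/// Byte offset of header `slot` from the start of the header region.
pub const fn header_offset(slot: usize) -> usize {
    slot * size_of::<DeltaHeader>()
}
```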

---

## 3. Novel WASM Delta Architecture

### 3.1 Architecture Diagram

```
+=====================================================================+
|                         DELTA HOST RUNTIME                          |
+=====================================================================+
|                                                                     |
|  +-------------------------+    +-----------------------------+     |
|  | Change Capture          |    | Delta Apply Engine          |     |
|  | (Producer Side)         |    | (Consumer Side)             |     |
|  +-------------------------+    +-----------------------------+     |
|  | - Vector write hooks    |    | - Sequence validation       |     |
|  | - HNSW mutation capture |    | - Conflict detection        |     |
|  | - Batch accumulation    |    | - Parallel application      |     |
|  | - Compression pipeline  |    | - Index maintenance         |     |
|  +-------------------------+    +-----------------------------+     |
|               |                                |                    |
|               v                                v                    |
|  +===========================================================+      |
|  |         SHARED DELTA MEMORY (WebAssembly.Memory)          |      |
|  +===========================================================+      |
|  |  +-------------+  +-------------+  +-------------------+  |      |
|  |  | Capture     |  | Process     |  | Apply             |  |      |
|  |  | WASM Module |  | WASM Module |  | WASM Module       |  |      |
|  |  +-------------+  +-------------+  +-------------------+  |      |
|  |  | - Intercept |  | - Filter    |  | - Decompress      |  |      |
|  |  | - Serialize |  | - Transform |  | - SIMD apply      |  |      |
|  |  | - Compress  |  | - Route     |  | - Index update    |  |      |
|  |  +-------------+  +-------------+  +-------------------+  |      |
|  |         |                |                   |            |      |
|  |         v                v                   v            |      |
|  |  +=====================================================+  |      |
|  |  |                  DELTA RING BUFFER                  |  |      |
|  |  +=====================================================+  |      |
|  +===========================================================+      |
|                                                                     |
+=====================================================================+
```
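
The "Sequence validation" step on the consumer side can be illustrated with a small gate that admits deltas only in strict order. This is a sketch under stated assumptions: `SequenceGate` and `SequenceError` are hypothetical names, and the position is interpreted as "next sequence number expected", which the report does not specify for `current-position`.

```rust
use std::cmp::Ordering;

/// Hypothetical outcome of rejecting a delta by sequence number.
#[derive(Debug, PartialEq)]
pub enum SequenceError {
    /// The delta was already applied; a replaying consumer can skip it.
    Duplicate { got: u64, next: u64 },
    /// One or more deltas are missing between `expected` and `got`.
    Gap { expected: u64, got: u64 },
}

/// Admits deltas only in strictly increasing, gap-free order.
#[derive(Debug, Default)]
pub struct SequenceGate {
    next: u64,
}

impl SequenceGate {
    /// Accepts `sequence` only if it is exactly the next expected value.
    pub fn admit(&mut self, sequence: u64) -> Result<(), SequenceError> {
        match sequence.cmp(&self.next) {
            Ordering::Equal => {
                self.next += 1;
                Ok(())
            }
            Ordering::Less => Err(SequenceError::Duplicate {
                got: sequence,
                next: self.next,
            }),
            Ordering::Greater => Err(SequenceError::Gap {
                expected: self.next,
                got: sequence,
            }),
        }
    }

    /// Next sequence number the gate will accept.
    pub fn current_position(&self) -> u64 {
        self.next
    }

    /// Repositions the gate, for example after restoring a checkpoint.
    pub fn seek(&mut self, sequence: u64) {
        self.next = sequence;
    }
}
```

Distinguishing duplicates from gaps lets a caller skip replayed deltas silently while treating a gap as a signal to re-fetch or resynchronize.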

### 3.2 Three-Stage Delta Pipeline

```rust
use std::sync::atomic::AtomicU64;
use wasm_bindgen::prelude::*;

// `RingBuffer`, `DeltaHeader`, `LZ4Compressor`, `CaptureStats`,
// `DeltaApplyModule`, and `DeltaError` are not defined in this report.

/// Stage 1: Capture WASM Module
#[wasm_bindgen]
pub struct DeltaCaptureModule {
    sequence: AtomicU64,
    pending: RingBuffer<DeltaHeader>,
    compressor: LZ4Compressor,
    stats: CaptureStats,
}

impl DeltaCaptureModule {
    /// SIMD-accelerated diff computation
    #[cfg(target_feature = "simd128")]
    fn compute_diff(&self, old: &[f32], new: &[f32]) -> Vec<(u32, f32)> {
        use core::arch::wasm32::*;

        const EPSILON: f32 = 1e-6;
        assert_eq!(old.len(), new.len(), "vectors must have the same dimension");

        let mut changes = Vec::new();
        let epsilon = f32x4_splat(EPSILON);

        for (i, chunk) in old.chunks_exact(4).enumerate() {
            // SAFETY: both slices hold at least four f32s starting at i * 4
            // (lengths are asserted equal above), and `v128_load` accepts
            // unaligned pointers.
            let (old_v, new_v) = unsafe {
                (
                    v128_load(chunk.as_ptr() as *const v128),
                    v128_load(new[i * 4..].as_ptr() as *const v128),
                )
            };

            let diff = f32x4_sub(new_v, old_v);
            let abs_diff = f32x4_abs(diff);
            let mask = f32x4_gt(abs_diff, epsilon);

            // Inspect individual lanes only when at least one lane changed
            if v128_any_true(mask) {
                for j in 0..4 {
                    let idx = i * 4 + j;
                    if (old[idx] - new[idx]).abs() > EPSILON {
                        changes.push((idx as u32, new[idx]));
                    }
                }
            }
        }

        // Scalar tail: dimensions not covered by a full 4-lane chunk
        for idx in (old.len() / 4 * 4)..old.len() {
            if (old[idx] - new[idx]).abs() > EPSILON {
                changes.push((idx as u32, new[idx]));
            }
        }
        changes
    }
}

/// Stage 3: Apply WASM Module
impl DeltaApplyModule {
    /// Apply a single sparse delta, four updates per loop iteration.
    ///
    /// WASM SIMD128 has no scatter store, so the writes themselves are scalar.
    ///
    /// # Safety
    /// `vector_ptr` must point to a live vector whose length exceeds every
    /// index in `dim_indices`.
    #[cfg(target_feature = "simd128")]
    pub unsafe fn apply_vector_delta_simd(
        &mut self,
        vector_ptr: *mut f32,
        dim_indices: &[u32],
        new_values: &[f32],
    ) -> Result<u64, DeltaError> {
        assert_eq!(dim_indices.len(), new_values.len(), "one value per index");

        // NOTE: `std::time::Instant::now()` panics on `wasm32-unknown-unknown`;
        // this assumes a target with a clock (such as WASI) or a host timer.
        let start = std::time::Instant::now();

        // Process 4 updates per iteration
        let chunks = dim_indices.len() / 4;

        for i in 0..chunks {
            let idx_base = i * 4;
            for j in 0..4 {
                let idx = dim_indices[idx_base + j] as usize;
                unsafe { *vector_ptr.add(idx) = new_values[idx_base + j]; }
            }
        }

        // Handle remainder
        for i in (chunks * 4)..dim_indices.len() {
            let idx = dim_indices[i] as usize;
            unsafe { *vector_ptr.add(idx) = new_values[i]; }
        }

        Ok(start.elapsed().as_micros() as u64)
    }
}
```
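
Section 1.2 lists a scalar fallback for non-SIMD environments. A portable counterpart to `compute_diff` might look like the sketch below. The names `compute_diff_scalar` and `is_sparse` are hypothetical, and the 10% threshold is borrowed from the "sparse" row of the latency targets in Section 4.1, not from existing code.

```rust
/// Portable diff: returns `(dimension, new_value)` for every dimension whose
/// absolute change exceeds `epsilon`. Runs on any target, SIMD or not.
pub fn compute_diff_scalar(old: &[f32], new: &[f32], epsilon: f32) -> Vec<(u32, f32)> {
    assert_eq!(old.len(), new.len(), "vectors must have the same dimension");

    let mut changes = Vec::new();
    for (idx, (&o, &n)) in old.iter().zip(new).enumerate() {
        if (n - o).abs() > epsilon {
            changes.push((idx as u32, n));
        }
    }
    changes
}

/// Hypothetical routing rule: treat an update as sparse when fewer than 10%
/// of the dimensions changed.
pub fn is_sparse(changed: usize, dims: usize) -> bool {
    changed * 10 < dims
}
```

A caller could compute the diff once and use `is_sparse` to choose between shipping only the changed dimensions and replacing the full vector.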

---

## 4. Performance Benchmark Targets

### 4.1 Delta Operation Latency Targets

| Operation | Target Latency | Notes |
|-----------|---------------|-------|
| Single vector insert | <50µs | Zero-copy path |
| Single vector update (dense) | <30µs | Full vector replacement |
| Single vector update (sparse) | <10µs | <10% dimensions changed |
| Vector delete | <20µs | Mark deleted + async cleanup |
| HNSW edge add (single) | <15µs | Per layer |
| HNSW edge remove (single) | <10µs | Per layer |
| Batch insert (100 vectors) | <2ms | Amortized 20µs/vector |
| Batch update (100 vectors) | <1ms | Amortized 10µs/vector |

### 4.2 Throughput Targets

| Metric | Target | Configuration |
|--------|--------|---------------|
| Delta capture rate | 50K deltas/sec | Single producer |
| Delta apply rate | 100K deltas/sec | 4 parallel workers |
| Delta compression ratio | 4:1 | Typical vector updates |
| Ring buffer throughput | 200MB/sec | Shared memory path |

---

## 5. Lock-Free Ring Buffer

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// `DeltaHeader` and `ReservedSlot` are not defined in this report.

/// Lock-free SPSC ring buffer for delta streaming
#[repr(C, align(64))]
pub struct DeltaRingBuffer {
    capacity: u32,
    mask: u32,
    read_pos: AtomicU64,  // Cache-line padded
    _pad1: [u8; 56],
    write_pos: AtomicU64, // Cache-line padded
    _pad2: [u8; 56],
    headers: *mut DeltaHeader,
    payloads: *mut u8,
}

impl DeltaRingBuffer {
    /// Reserves the next header slot, or returns `None` when the buffer is
    /// full (backpressure to the producer).
    #[inline]
    pub fn try_reserve(&self, payload_size: u32) -> Option<ReservedSlot> {
        let write = self.write_pos.load(Ordering::Relaxed);
        let read = self.read_pos.load(Ordering::Acquire);

        if write.wrapping_sub(read) >= self.capacity as u64 {
            return None;
        }

        // Strong compare-exchange: the `_weak` variant may fail spuriously,
        // which outside a retry loop would report a free slot as unavailable.
        // NOTE: advancing `write_pos` makes the slot visible to the consumer,
        // so the header must be marked complete before it is read.
        match self.write_pos.compare_exchange(
            write, write.wrapping_add(1), Ordering::AcqRel, Ordering::Relaxed,
        ) {
            Ok(_) => Some(ReservedSlot { sequence: write, /* ... */ }),
            Err(_) => None,
        }
    }
}
```
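
For comparison, here is a self-contained SPSC ring written in safe Rust. It is a simplified sketch, not the `DeltaRingBuffer` above: `SpscRing` is a hypothetical type whose slots carry a single `u64` each (for example a delta sequence number) instead of headers and payloads, and it publishes a slot only after the slot has been written.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Minimal single-producer, single-consumer ring of `u64` values.
pub struct SpscRing {
    mask: u64,
    slots: Vec<AtomicU64>,
    read_pos: AtomicU64,
    write_pos: AtomicU64,
}

impl SpscRing {
    /// `capacity` must be a power of two so positions can be masked.
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        Self {
            mask: capacity as u64 - 1,
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            read_pos: AtomicU64::new(0),
            write_pos: AtomicU64::new(0),
        }
    }

    /// Producer side. Returns `Err(value)` when the ring is full, which is
    /// the backpressure signal.
    pub fn try_push(&self, value: u64) -> Result<(), u64> {
        let write = self.write_pos.load(Ordering::Relaxed);
        let read = self.read_pos.load(Ordering::Acquire);
        if write.wrapping_sub(read) >= self.slots.len() as u64 {
            return Err(value);
        }
        self.slots[(write & self.mask) as usize].store(value, Ordering::Relaxed);
        // Release: the slot write becomes visible before the new position.
        self.write_pos.store(write.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Consumer side. Returns `None` when the ring is empty.
    pub fn try_pop(&self) -> Option<u64> {
        let read = self.read_pos.load(Ordering::Relaxed);
        let write = self.write_pos.load(Ordering::Acquire);
        if read == write {
            return None;
        }
        let value = self.slots[(read & self.mask) as usize].load(Ordering::Relaxed);
        // Release: the slot is handed back to the producer after the read.
        self.read_pos.store(read.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

With exactly one producer and one consumer, each position has a single writer, so plain `store` operations suffice and no compare-exchange is needed.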

---

## 6. Performance Projections

| Scenario | Current (no delta) | With Delta System | Improvement |
|----------|-------------------|-------------------|-------------|
| Single vector update | ~500µs | <30µs | **16x** |
| Batch 100 vectors | ~50ms | <2ms | **25x** |
| HNSW reindex | ~10ms | <1ms (incremental) | **10x** |
| Memory overhead | 0 | +1MB per database | Acceptable |

---

## 7. Recommended Implementation Order

1. **Phase 1**: Implement `DeltaRingBuffer` and basic capture in `ruvector-wasm`
2. **Phase 2**: Add SIMD-accelerated apply module with sparse update path
3. **Phase 3**: Integrate with `ruvector-graph-wasm` for structure deltas
4. **Phase 4**: Add WIT interfaces for component model support
5. **Phase 5**: Implement parallel application with shared memory workers

---

## 8. Integration with Δ-Behavior

The WASM delta system directly supports Δ-behavior enforcement:

| Δ-Behavior Property | WASM Implementation |
|---------------------|---------------------|
| **Local Change** | Sparse updates, bounded payload sizes |
| **Global Preservation** | Coherence check in apply stage |
| **Violation Resistance** | Ring buffer backpressure, validation |
| **Closure Preference** | Delta compaction toward stable states |

The three-stage pipeline naturally implements the three enforcement layers:
- **Capture** → Energy cost (compression overhead)
- **Process** → Scheduling (filtering, routing)
- **Apply** → Memory gate (validation, commit)