Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/docs/research/rvf/benchmarks/acceptance-tests.md
+++ b/vendor/ruvector/docs/research/rvf/benchmarks/acceptance-tests.md
@@ -0,0 +1,341 @@
+# RVF Acceptance Tests and Performance Targets
+
+## 1. Primary Acceptance Test
+
+> **Cold start on a 10 million vector file: load and answer the first query with a
+> useful result (recall@10 >= 0.70) after reading no more than the last 4 MB of the
+> file, then converge to full quality (recall@10 >= 0.95) as more segments are
+> progressively mapped.**
+
+### Test Parameters
+
+```
+Dataset:         10 million vectors
+Dimensions:      384 (sentence embedding size)
+Base dtype:      fp16 (768 bytes per vector)
+Raw file size:   ~7.2 GB (vectors only)
+With index:      ~10-12 GB total
+Query set:       1000 queries from held-out test set
+Ground truth:    Brute-force exact k-NN (k=10)
+Metric:          L2 distance
+```
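The size figures in the parameter block follow from simple arithmetic. The sketch below recomputes them; the constant names are illustrative and not part of the RVF spec. It suggests that the "~7.2 GB" figure is closer to a binary (GiB) value than a decimal one, which is an inference from the numbers rather than something the document states.

```python
# Back-of-the-envelope size check for the test parameters above.
# Constant names are illustrative, not taken from any RVF tooling.
NUM_VECTORS = 10_000_000
DIMENSIONS = 384
BYTES_PER_COMPONENT = 2  # fp16

bytes_per_vector = DIMENSIONS * BYTES_PER_COMPONENT
raw_bytes = NUM_VECTORS * bytes_per_vector

raw_gb = raw_bytes / 10**9   # decimal gigabytes
raw_gib = raw_bytes / 2**30  # binary gibibytes

print(f"{bytes_per_vector} bytes per vector")
print(f"raw vectors: {raw_gb:.2f} GB = {raw_gib:.2f} GiB")
```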
+
+### Success Criteria
+
+| Phase | Time Budget | Data Read | Min Recall@10 | Description |
+|-------|------------|-----------|---------------|-------------|
+| Boot | < 5 ms | 4 KB (Level 0) | N/A | Parse root manifest |
+| First query | < 50 ms | <= 4 MB | >= 0.70 | Layer A + hot cache |
+| Working quality | < 500 ms | <= 200 MB | >= 0.85 | Layer A + B |
+| Full quality | < 5 s | <= 4 GB | >= 0.95 | Layers A + B + C |
+| Optimized | < 30 s | Full file | >= 0.98 | All layers + hot tier |
+
+### Measurement Methodology
+
+```
+1. Create RVF file from 10M vector dataset
+   - Build full HNSW index (M=16, ef_construction=200)
+   - Compute temperature tiers (default: all warm initially)
+   - Write with all segment types
+
+2. Cold start measurement
+   - Drop filesystem cache (as root): sync; echo 3 > /proc/sys/vm/drop_caches
+   - Open file, start timer
+   - Read Level 0 (4 KB), record time T_boot
+   - Read hotset data, record time T_hotset
+   - Execute first query, record time T_first_query and recall@10
+   - Continue progressive loading
+   - At each milestone: record time, data read, recall@10
+
+3. Throughput measurement (warm)
+   - After full load, execute 1000 queries
+   - Measure queries per second (QPS)
+   - Measure p50, p95, p99 latency
+   - Measure recall@10 average
+
+4. Streaming ingest measurement
+   - Start with empty file
+   - Ingest 10M vectors in streaming mode
+   - Measure ingest rate (vectors/second)
+   - Measure file size over time
+   - Verify crash safety (kill -9 at random points, verify recovery)
+```
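The methodology relies on two measurements: recall@10 against brute-force ground truth, and latency percentiles. A minimal sketch of both follows, assuming results and ground truth are lists of vector IDs per query. The function names and the nearest-rank percentile definition are choices made for this example, not something the spec prescribes.

```python
import math

def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    hits = len(set(retrieved[:k]) & set(ground_truth[:k]))
    return hits / k

def mean_recall_at_k(all_retrieved, all_ground_truth, k=10):
    """Average recall@k over a query set."""
    scores = [recall_at_k(r, g, k) for r, g in zip(all_retrieved, all_ground_truth)]
    return sum(scores) / len(scores)

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Toy data: two queries, the first misses two of its ten true neighbors.
truth = [list(range(10)), list(range(10, 20))]
found = [[0, 1, 2, 3, 4, 5, 6, 7, 90, 91], list(range(10, 20))]
latencies_ms = [0.10, 0.12, 0.11, 0.30, 0.15, 0.90, 0.13, 0.14, 0.20, 0.16]

print("mean recall@10:", mean_recall_at_k(found, truth))
print("p50 ms:", percentile(latencies_ms, 50))
print("p99 ms:", percentile(latencies_ms, 99))
```

With only 1000 queries, p99 rests on roughly ten samples, so tail latencies from a single run should be read with some caution.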
+
+## 2. Performance Targets
+
+### Query Latency (10M vectors, 384 dim, fp16)
+
+| Hardware | QPS (single thread) | p50 Latency | p95 Latency | p99 Latency |
+|----------|-------------------|-------------|-------------|-------------|
+| Desktop (AVX-512) | 5,000-15,000 | 0.1 ms | 0.3 ms | 1.0 ms |
+| Desktop (AVX2) | 3,000-8,000 | 0.2 ms | 0.5 ms | 2.0 ms |
+| Laptop (NEON) | 2,000-5,000 | 0.3 ms | 1.0 ms | 3.0 ms |
+| WASM (browser) | 500-2,000 | 1.0 ms | 3.0 ms | 10.0 ms |
+| Cognitum tile | 100-500 | 2.0 ms | 5.0 ms | 15.0 ms |
+
+### Streaming Ingest Rate
+
+| Hardware | Vectors/Second | Bytes/Second | Notes |
+|----------|---------------|-------------|-------|
+| NVMe SSD | 200K-500K | 150-380 MB/s | fsync every 1000 vectors |
+| SATA SSD | 50K-100K | 38-76 MB/s | fsync every 1000 vectors |
+| HDD | 10K-30K | 7-23 MB/s | Sequential append |
+| Network (1 Gbps) | 50K-100K | 38-76 MB/s | Streaming over network |
+
+### Progressive Load Times
+
+| Phase | NVMe SSD | SATA SSD | HDD | Network |
+|-------|----------|----------|-----|---------|
+| Boot (4 KB) | < 0.1 ms | < 0.5 ms | < 10 ms | < 50 ms |
+| First query (4 MB) | < 2 ms | < 10 ms | < 100 ms | < 500 ms |
+| Working quality (200 MB) | < 100 ms | < 500 ms | < 5 s | < 20 s |
+| Full quality (4 GB) | < 2 s | < 10 s | < 120 s | < 400 s |
+
+### Space Efficiency
+
+| Configuration | Bytes/Vector | File Size (10M) | Ratio vs Raw |
+|--------------|-------------|-----------------|-------------|
+| Raw fp32 | 1,536 | 14.3 GB | 1.0x |
+| RVF uniform fp16 | 768 + overhead | 8.0 GB | 0.56x |
+| RVF adaptive (equilibrium) | ~300 avg | 3.2 GB | 0.22x |
+| RVF aggressive (binary cold) | ~100 avg | 1.1 GB | 0.08x |
+
+## 3. Crash Safety Tests
+
+### Test 1: Kill During Vector Ingest
+
+```
+1. Start ingesting 1M vectors
+2. After 500K vectors: kill -9 the writer
+3. Verify: file is readable
+4. Verify: latest valid manifest is found
+5. Verify: all vectors referenced by latest manifest are intact
+6. Verify: no data corruption (all segment hashes valid)
+```
+
+**Pass criteria**: Zero data loss for committed segments. At most the
+last incomplete segment is lost (bounded by fsync interval).
+
+### Test 2: Kill During Manifest Write
+
+```
+1. Create file with 1M vectors
+2. Trigger manifest rewrite (add metadata, trigger compaction)
+3. kill -9 the writer during manifest write
+4. Verify: file falls back to previous valid manifest
+5. Verify: all queries work correctly with previous manifest
+```
+
+**Pass criteria**: Automatic fallback to previous manifest. No manual
+recovery needed.
+
+### Test 3: Kill During Compaction
+
+```
+1. Create file with 1M vectors across 100 small VEC_SEGs
+2. Trigger compaction
+3. kill -9 the writer during compaction
+4. Verify: file is readable (old segments still valid)
+5. Verify: partial compaction output is safely ignored
+```
+
+**Pass criteria**: Old segments remain valid. Incomplete compaction
+output has no manifest reference and is safely orphaned.
+
+### Test 4: Bit Flip Detection
+
+```
+1. Create valid RVF file
+2. Flip random bits in various locations
+3. Verify: corruption detected by hash/CRC checks
+4. Verify: specific corrupted segment identified
+5. Verify: other segments still readable
+```
+
+**Pass criteria**: 100% detection of single-bit flips. Corruption
+isolated to affected segment.
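A bit-flip test of this shape can be sketched as follows. This document does not say which hash or CRC RVF segments use, so CRC-32 from Python's `zlib` stands in here as an assumption; the segment layout and helper names are likewise hypothetical. CRC-32 is guaranteed to detect any single-bit error, which is what the pass criteria require.

```python
import zlib

def segment_checksums(segments):
    """Record a CRC-32 per segment (stand-in for the real segment hash)."""
    return [zlib.crc32(seg) for seg in segments]

def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)

def find_corrupted(segments, expected):
    """Indices of segments whose checksum no longer matches."""
    return [i for i, seg in enumerate(segments) if zlib.crc32(seg) != expected[i]]

# Four toy 64-byte segments; corrupt one bit in segment 2 only.
segments = [bytes([i]) * 64 for i in range(4)]
expected = segment_checksums(segments)
damaged = list(segments)
damaged[2] = flip_bit(damaged[2], 137)

print(find_corrupted(damaged, expected))  # only segment 2 is reported
```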
+
+## 4. Scalability Tests
+
+### Test: 1 Billion Vectors
+
+```
+Dataset:     1B vectors, 384 dimensions, fp16
+File size:   ~700 GB (raw) -> ~200 GB (adaptive RVF)
+Hardware:    Server with 256 GB RAM, NVMe array
+
+Verify:
+  - Boot time < 10 ms
+  - First query < 100 ms
+  - Full quality convergence < 60 s
+  - Recall@10 >= 0.95 at full quality
+  - Streaming ingest sustained at 100K+ vectors/second
+```
+
+### Test: High Dimensionality
+
+```
+Dataset:     1M vectors, 4096 dimensions (LLM embeddings)
+File size:   ~8 GB (fp16)
+
+Verify:
+  - PQ compression to 5-bit achieves >= 10x compression
+  - Recall@10 >= 0.90 with PQ
+  - Query latency < 5 ms (p95) with PQ + HNSW
+```
+
+### Test: Multi-File Sharding
+
+```
+Dataset:     100M vectors across 10 shard files
+Verify:
+  - Transparent query across all shards
+  - Shard addition without full rebuild
+  - Individual shard compaction
+  - Shard removal with manifest update only
+```
+
+## 5. WASM Performance Tests
+
+### Browser Environment
+
+```
+Runtime:     Chrome V8 / Firefox SpiderMonkey
+SIMD:        WASM v128
+Memory:      Limited to 4 GB WASM heap
+
+Test: Load 1M vector RVF file via fetch()
+  - Boot time < 50 ms
+  - First query < 200 ms (after boot)
+  - QPS >= 500 (single thread)
+  - Memory usage < 500 MB
+```
+
+### Cognitum Tile Simulation
+
+```
+Runtime:     wasmtime with memory limits
+Code limit:  8 KB
+Data limit:  8 KB
+Scratch:     64 KB
+
+Test: Process 1000 blocks via hub protocol
+  - Distance computation matches reference implementation
+  - Top-K results match brute-force within quantization tolerance
+  - No memory access out of bounds
+  - Tile recovers from simulated faults
+```
+
+## 6. Interoperability Tests
+
+### Round-Trip Test
+
+```
+1. Create RVF file from numpy arrays
+2. Read back with independent implementation
+3. Verify: all vectors bit-identical
+4. Verify: all metadata preserved
+5. Verify: index produces same results
+```
+
+### Profile Compatibility Test
+
+```
+1. Create RVDNA file with genomic data
+2. Create RVText file with text embeddings
+3. Read both with generic RVF reader
+4. Verify: generic reader can access vectors and metadata
+5. Verify: profile-specific features degrade gracefully
+```
+
+### Version Forward Compatibility Test
+
+```
+1. Create RVF file with version 1
+2. Add segments with hypothetical version 2 features (unknown tags)
+3. Read with version 1 reader
+4. Verify: version 1 reader skips unknown segments/tags
+5. Verify: version 1 data is fully accessible
+```
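The forward-compatibility behavior amounts to a reader that steps over records it does not recognize. The sketch below uses a made-up tag-length-value layout (1-byte tag, 4-byte little-endian length, payload) and made-up tag values; the real RVF segment header is defined elsewhere and will differ.

```python
import struct

# Hypothetical record header: 1-byte tag + 4-byte little-endian payload length.
HEADER = struct.Struct("<BI")
KNOWN_TAGS = {0x01: "VEC", 0x02: "MANIFEST"}  # illustrative values only

def encode_record(tag, payload):
    return HEADER.pack(tag, len(payload)) + payload

def read_known_records(buf):
    """Return (records, skipped): known records in order, plus a count of unknown ones."""
    records, skipped, offset = [], 0, 0
    while offset + HEADER.size <= len(buf):
        tag, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if offset + length > len(buf):
            raise ValueError("truncated record")
        if tag in KNOWN_TAGS:
            records.append((KNOWN_TAGS[tag], buf[offset:offset + length]))
        else:
            skipped += 1  # unknown tag: use the length field to step past it
        offset += length
    return records, skipped

# A "version 2" writer adds tag 0x7F, which this "version 1" reader has never seen.
stream = (encode_record(0x01, b"abc")
          + encode_record(0x7F, b"future-feature")
          + encode_record(0x02, b"m"))
print(read_known_records(stream))
```

The design point is that the length field must be interpretable without understanding the tag, otherwise an old reader cannot find the next record.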
+
+## 7. Security Tests
+
+### Signature Verification
+
+```
+1. Create signed RVF file (ML-DSA-65)
+2. Verify all segment signatures
+3. Modify one byte in a signed segment
+4. Verify: modification detected
+5. Verify: other segments still valid
+```
+
+### Encryption Round-Trip
+
+```
+1. Create encrypted RVF file (ML-KEM-768 + AES-256-GCM)
+2. Decrypt with correct key
+3. Verify: plaintext matches original
+4. Attempt decrypt with wrong key
+5. Verify: decryption fails (GCM auth tag mismatch)
+```
+
+### Key Rotation
+
+```
+1. Create file signed with key A
+2. Rotate to key B (write CRYPTO_SEG rotation record)
+3. Write new segments signed with key B
+4. Verify: old segments valid with key A
+5. Verify: new segments valid with key B
+6. Verify: cross-signature in rotation record is valid
+```
+
+## 8. Benchmark Harness
+
+### Recommended Tools
+
+| Purpose | Tool | Notes |
+|---------|------|-------|
+| Latency measurement | criterion (Rust) / benchmark.js | Statistical rigor |
+| Recall measurement | Custom recall@K computation | Against brute-force ground truth |
+| Memory profiling | valgrind massif / Chrome DevTools | Peak and sustained |
+| I/O profiling | blktrace / iostat | Verify read patterns |
+| SIMD verification | Intel SDE / ARM emulator | Correct SIMD codegen |
+| Crash testing | Custom harness with kill -9 | Random timing |
+
+### Report Format
+
+Each benchmark run produces a report:
+
+```json
+{
+  "test_name": "cold_start_10m",
+  "dataset": {
+    "vector_count": 10000000,
+    "dimensions": 384,
+    "dtype": "fp16",
+    "file_size_bytes": 10737418240
+  },
+  "hardware": {
+    "cpu": "Intel Xeon w5-3435X",
+    "simd": "AVX-512",
+    "ram_gb": 256,
+    "storage": "NVMe Samsung 990 Pro"
+  },
+  "results": {
+    "boot_ms": 0.08,
+    "first_query_ms": 12.3,
+    "first_query_recall_at_10": 0.73,
+    "working_quality_ms": 340,
+    "working_quality_recall_at_10": 0.87,
+    "full_quality_ms": 3200,
+    "full_quality_recall_at_10": 0.96,
+    "steady_state_qps": 8500,
+    "steady_state_p50_ms": 0.12,
+    "steady_state_p95_ms": 0.28,
+    "steady_state_p99_ms": 0.85,
+    "data_read_first_query_mb": 3.2,
+    "data_read_working_quality_mb": 180
+  }
+}
+```
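A report in this format can be checked mechanically against the Success Criteria table in section 1. The sketch below is one way to do that; the `CRITERIA` list and `check_report` helper are hypothetical names for this example, with thresholds copied from that table (the "Optimized" phase is omitted because the sample report has no matching fields).

```python
import json
import operator

# (report field, comparison, limit), taken from the Success Criteria table.
CRITERIA = [
    ("boot_ms", "<", 5),
    ("first_query_ms", "<", 50),
    ("data_read_first_query_mb", "<=", 4),
    ("first_query_recall_at_10", ">=", 0.70),
    ("working_quality_ms", "<", 500),
    ("data_read_working_quality_mb", "<=", 200),
    ("working_quality_recall_at_10", ">=", 0.85),
    ("full_quality_ms", "<", 5000),
    ("full_quality_recall_at_10", ">=", 0.95),
]
OPS = {"<": operator.lt, "<=": operator.le, ">=": operator.ge}

def check_report(report):
    """Return a list of (field, op, limit, actual) for every criterion that fails."""
    results = report["results"]
    return [
        (field, op, limit, results[field])
        for field, op, limit in CRITERIA
        if not OPS[op](results[field], limit)
    ]

sample = json.loads("""
{
  "results": {
    "boot_ms": 0.08,
    "first_query_ms": 12.3,
    "first_query_recall_at_10": 0.73,
    "working_quality_ms": 340,
    "working_quality_recall_at_10": 0.87,
    "full_quality_ms": 3200,
    "full_quality_recall_at_10": 0.96,
    "data_read_first_query_mb": 3.2,
    "data_read_working_quality_mb": 180
  }
}
""")

failures = check_report(sample)
print("PASS" if not failures else failures)
```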