# RVF Acceptance Tests and Performance Targets

## 1. Primary Acceptance Test

> **Cold start on a 10 million vector file: load and answer the first query with a
> useful result (recall@10 >= 0.70) without reading more than the last 4 MB, then
> converge to full quality (recall@10 >= 0.95) as it progressively maps more segments.**

### Test Parameters

```
Dataset: 10 million vectors
Dimensions: 384 (sentence embedding size)
Base dtype: fp16 (768 bytes per vector)
Raw file size: ~7.2 GB (vectors only)
With index: ~10-12 GB total
Query set: 1000 queries from held-out test set
Ground truth: Brute-force exact k-NN (k=10)
Metric: L2 distance
```

### Success Criteria

| Phase | Time Budget | Data Read | Min Recall@10 | Description |
|-------|-------------|-----------|---------------|-------------|
| Boot | < 5 ms | 4 KB (Level 0) | N/A | Parse root manifest |
| First query | < 50 ms | <= 4 MB | >= 0.70 | Layer A + hot cache |
| Working quality | < 500 ms | <= 200 MB | >= 0.85 | Layers A + B |
| Full quality | < 5 s | <= 4 GB | >= 0.95 | Layers A + B + C |
| Optimized | < 30 s | Full file | >= 0.98 | All layers + hot tier |
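
The recall@10 figures above reduce to a set intersection against the brute-force ground truth. A minimal pure-Python sketch (function names are illustrative, not from an RVF codebase):

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbors present in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

def mean_recall(all_retrieved, all_truth, k=10):
    """Average recall@k over the whole query set."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)
```

A phase passes when `mean_recall` over the 1000-query set meets the table's threshold.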

### Measurement Methodology

```
1. Create RVF file from 10M vector dataset
   - Build full HNSW index (M=16, ef_construction=200)
   - Compute temperature tiers (default: all warm initially)
   - Write with all segment types

2. Cold start measurement
   - Drop filesystem cache: echo 3 > /proc/sys/vm/drop_caches
   - Open file, start timer
   - Read Level 0 (4 KB), record time T_boot
   - Read hotset data, record time T_hotset
   - Execute first query, record time T_first_query and recall@10
   - Continue progressive loading
   - At each milestone: record time, data read, recall@10

3. Throughput measurement (warm)
   - After full load, execute 1000 queries
   - Measure queries per second (QPS)
   - Measure p50, p95, p99 latency
   - Measure recall@10 average

4. Streaming ingest measurement
   - Start with empty file
   - Ingest 10M vectors in streaming mode
   - Measure ingest rate (vectors/second)
   - Measure file size over time
   - Verify crash safety (kill -9 at random points, verify recovery)
```
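
The milestone bookkeeping in step 2 amounts to recording cumulative wall-clock time after each loading stage. A minimal sketch, with stand-in lambdas where a real harness would call hypothetical loader stages:

```python
import time

def measure_milestones(steps):
    """Run ordered (name, fn) stages; record cumulative elapsed ms at each.

    `steps` might be [("boot", read_level0), ("first_query", run_query), ...]
    where the callables are hypothetical loader stages.
    """
    elapsed_ms = {}
    start = time.perf_counter()
    for name, fn in steps:
        fn()
        elapsed_ms[name] = (time.perf_counter() - start) * 1000.0
    return elapsed_ms

# Stand-in stages. A real cold-start run drops the page cache first
# (sync; echo 3 > /proc/sys/vm/drop_caches, as root) so timings are cold.
timings = measure_milestones([
    ("boot", lambda: None),
    ("first_query", lambda: time.sleep(0.005)),
])
```

Because times are cumulative from a single start point, each milestone in the report is directly comparable to the phase budgets in the Success Criteria table.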

## 2. Performance Targets

### Query Latency (10M vectors, 384 dim, fp16)

| Hardware | QPS (single thread) | p50 Latency | p95 Latency | p99 Latency |
|----------|---------------------|-------------|-------------|-------------|
| Desktop (AVX-512) | 5,000-15,000 | 0.1 ms | 0.3 ms | 1.0 ms |
| Desktop (AVX2) | 3,000-8,000 | 0.2 ms | 0.5 ms | 2.0 ms |
| Laptop (NEON) | 2,000-5,000 | 0.3 ms | 1.0 ms | 3.0 ms |
| WASM (browser) | 500-2,000 | 1.0 ms | 3.0 ms | 10.0 ms |
| Cognitum tile | 100-500 | 2.0 ms | 5.0 ms | 15.0 ms |

### Streaming Ingest Rate

| Hardware | Vectors/Second | Bytes/Second | Notes |
|----------|----------------|--------------|-------|
| NVMe SSD | 200K-500K | 150-380 MB/s | fsync every 1000 vectors |
| SATA SSD | 50K-100K | 38-76 MB/s | fsync every 1000 vectors |
| HDD | 10K-30K | 7-23 MB/s | Sequential append |
| Network (1 Gbps) | 50K-100K | 38-76 MB/s | Streaming over network |
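
The "fsync every 1000 vectors" policy trades durability granularity for throughput: crash loss is bounded by the batch size. A minimal append-writer sketch (the record encoding is assumed, not the actual RVF segment format):

```python
import os
import tempfile

def append_vectors(path, vectors, fsync_every=1000):
    """Append encoded vector records, fsyncing every `fsync_every` records."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    written = 0
    try:
        for vec in vectors:            # each `vec` is an encoded record (bytes)
            os.write(fd, vec)
            written += 1
            if written % fsync_every == 0:
                os.fsync(fd)           # durability point: bounds crash loss
        os.fsync(fd)                   # flush the tail batch
    finally:
        os.close(fd)
    return written

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "vectors.rvf")
    n = append_vectors(path, [b"\x00" * 768] * 10, fsync_every=4)
    size = os.path.getsize(path)
```

Raising `fsync_every` moves a configuration up this table at the cost of a larger worst-case torn tail after kill -9.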

### Progressive Load Times

| Phase | NVMe SSD | SATA SSD | HDD | Network |
|-------|----------|----------|-----|---------|
| Boot (4 KB) | < 0.1 ms | < 0.5 ms | < 10 ms | < 50 ms |
| First query (4 MB) | < 2 ms | < 10 ms | < 100 ms | < 500 ms |
| Working quality (200 MB) | < 100 ms | < 500 ms | < 5 s | < 20 s |
| Full quality (4 GB) | < 2 s | < 10 s | < 120 s | < 400 s |

### Space Efficiency

| Configuration | Bytes/Vector | File Size (10M) | Ratio vs Raw |
|---------------|--------------|-----------------|--------------|
| Raw fp32 | 1,536 | 14.3 GB | 1.0x |
| RVF uniform fp16 | 768 + overhead | 8.0 GB | 0.56x |
| RVF adaptive (equilibrium) | ~300 avg | 3.2 GB | 0.22x |
| RVF aggressive (binary cold) | ~100 avg | 1.1 GB | 0.08x |
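
The file sizes follow from the bytes-per-vector column; the sketch below reproduces the raw-payload arithmetic (the RVF rows in the table additionally carry index and segment overhead, which is why their totals exceed the bare payload):

```python
N = 10_000_000                     # vectors
DIM = 384

def gib(nbytes):
    """Bytes to binary gigabytes, matching the table's GB figures."""
    return nbytes / 2**30

raw_fp32 = N * DIM * 4             # 1,536 bytes per vector
raw_fp16 = N * DIM * 2             # 768 bytes per vector

fp32_gib = round(gib(raw_fp32), 1)  # 14.3, the table's raw-fp32 size
fp16_gib = round(gib(raw_fp16), 2)  # 7.15, the "~7.2 GB" raw figure above
ratio = raw_fp16 / raw_fp32         # 0.5 before overhead
```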

## 3. Crash Safety Tests

### Test 1: Kill During Vector Ingest

```
1. Start ingesting 1M vectors
2. After 500K vectors: kill -9 the writer
3. Verify: file is readable
4. Verify: latest valid manifest is found
5. Verify: all vectors referenced by latest manifest are intact
6. Verify: no data corruption (all segment hashes valid)
```

**Pass criteria**: Zero data loss for committed segments. At most the
last incomplete segment is lost (bounded by fsync interval).
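
The recovery behavior this test demands can be illustrated with length-prefixed, CRC-protected records; the sketch simulates the kill -9 by truncating the byte stream mid-record (the framing is illustrative, not the actual RVF segment layout):

```python
import io
import struct
import zlib

def write_record(buf, payload):
    """Length-prefixed, CRC-protected record: [len u32][crc u32][payload]."""
    buf.write(struct.pack("<II", len(payload), zlib.crc32(payload)))
    buf.write(payload)

def recover(data):
    """Scan committed records; stop at the first truncated or corrupt one."""
    out, off = [], 0
    while off + 8 <= len(data):
        length, crc = struct.unpack_from("<II", data, off)
        payload = data[off + 8: off + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break                      # torn tail: discard it, keep the prefix
        out.append(payload)
        off += 8 + length
    return out

# Simulate kill -9 mid-write by truncating the file image:
buf = io.BytesIO()
for i in range(5):
    write_record(buf, b"vec%d" % i)
data = buf.getvalue()
assert len(recover(data)) == 5        # clean shutdown: everything survives
assert len(recover(data[:-3])) == 4   # torn tail record is dropped cleanly
```

Loss is bounded exactly as the pass criteria require: every record before the torn tail is recovered intact.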

### Test 2: Kill During Manifest Write

```
1. Create file with 1M vectors
2. Trigger manifest rewrite (add metadata, trigger compaction)
3. Kill -9 during manifest write
4. Verify: file falls back to previous valid manifest
5. Verify: all queries work correctly with previous manifest
```

**Pass criteria**: Automatic fallback to previous manifest. No manual
recovery needed.

### Test 3: Kill During Compaction

```
1. Create file with 1M vectors across 100 small VEC_SEGs
2. Trigger compaction
3. Kill -9 during compaction
4. Verify: file is readable (old segments still valid)
5. Verify: partial compaction output is safely ignored
```

**Pass criteria**: Old segments remain valid. Incomplete compaction
output has no manifest reference and is safely orphaned.

### Test 4: Bit Flip Detection

```
1. Create valid RVF file
2. Flip random bits in various locations
3. Verify: corruption detected by hash/CRC checks
4. Verify: specific corrupted segment identified
5. Verify: other segments still readable
```

**Pass criteria**: 100% detection of single-bit flips. Corruption
isolated to affected segment.
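
Per-segment checksums are enough to satisfy both requirements: CRC32 detects every single-bit flip, and checking segments independently isolates the damage. A sketch (the segment layout is illustrative; a production format would likely pair CRCs with a cryptographic hash):

```python
import zlib

def checksum_segments(segments):
    """Compute the stored CRC for each segment at write time."""
    return [zlib.crc32(s) for s in segments]

def find_corrupt(segments, checksums):
    """Return indices of segments whose stored CRC no longer matches."""
    return [i for i, (s, c) in enumerate(zip(segments, checksums))
            if zlib.crc32(s) != c]

segs = [b"segment-a", b"segment-b", b"segment-c"]
crcs = checksum_segments(segs)
segs[1] = bytes([segs[1][0] ^ 0x01]) + segs[1][1:]   # flip one bit
assert find_corrupt(segs, crcs) == [1]               # isolated to segment 1
```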

## 4. Scalability Tests

### Test: 1 Billion Vectors

```
Dataset: 1B vectors, 384 dimensions, fp16
File size: ~700 GB (raw) -> ~200 GB (adaptive RVF)
Hardware: Server with 256 GB RAM, NVMe array

Verify:
- Boot time < 10 ms
- First query < 100 ms
- Full quality convergence < 60 s
- Recall@10 >= 0.95 at full quality
- Streaming ingest sustained at 100K+ vectors/second
```

### Test: High Dimensionality

```
Dataset: 1M vectors, 4096 dimensions (LLM embeddings)
File size: ~8 GB (fp16)

Verify:
- PQ compression to 5-bit achieves >= 10x compression
- Recall@10 >= 0.90 with PQ
- Query latency < 5 ms (p95) with PQ + HNSW
```

### Test: Multi-File Sharding

```
Dataset: 100M vectors across 10 shard files

Verify:
- Transparent query across all shards
- Shard addition without full rebuild
- Individual shard compaction
- Shard removal with manifest update only
```
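
Transparent cross-shard querying reduces to merging each shard's local top-k by distance, which is why shards can be added or removed with only a manifest update: no shard's results depend on any other's. A sketch over per-shard (distance, vector_id) candidate lists:

```python
import heapq

def query_shards(shard_results, k=10):
    """Merge per-shard top-k candidates into a global top-k.

    Each shard contributes its own (distance, vector_id) hits; keeping the
    k smallest distances overall gives the same answer as a single index.
    """
    return heapq.nsmallest(k, heapq.merge(*[sorted(r) for r in shard_results]))

shard_a = [(0.2, 7), (0.9, 3)]
shard_b = [(0.1, 42), (0.5, 8)]
assert query_shards([shard_a, shard_b], k=3) == [(0.1, 42), (0.2, 7), (0.5, 8)]
```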

## 5. WASM Performance Tests

### Browser Environment

```
Runtime: Chrome V8 / Firefox SpiderMonkey
SIMD: WASM v128
Memory: Limited to 4 GB WASM heap

Test: Load 1M vector RVF file via fetch()
- Boot time < 50 ms
- First query < 200 ms (after boot)
- QPS >= 500 (single thread)
- Memory usage < 500 MB
```

### Cognitum Tile Simulation

```
Runtime: wasmtime with memory limits
Code limit: 8 KB
Data limit: 8 KB
Scratch: 64 KB

Test: Process 1000 blocks via hub protocol
- Distance computation matches reference implementation
- Top-K results match brute-force within quantization tolerance
- No memory access out of bounds
- Tile recovers from simulated faults
```

## 6. Interoperability Tests

### Round-Trip Test

```
1. Create RVF file from numpy arrays
2. Read back with independent implementation
3. Verify: all vectors bit-identical
4. Verify: all metadata preserved
5. Verify: index produces same results
```
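
Bit-identical round-tripping of fp16 payloads can be checked with Python's native half-precision struct format; this stands in for the independent-implementation comparison (values below are chosen to be exactly representable in fp16):

```python
import struct

def to_fp16_bytes(values):
    """Encode a float list as little-endian IEEE 754 half precision."""
    return struct.pack(f"<{len(values)}e", *values)

def from_fp16_bytes(blob):
    """Decode half-precision bytes back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))

vec = [0.5, -1.25, 3.0]             # exactly representable in fp16
blob = to_fp16_bytes(vec)
back = from_fp16_bytes(blob)
assert to_fp16_bytes(back) == blob  # bit-identical round trip
assert back == vec
```

A second implementation passes the test when re-encoding its decoded vectors reproduces the original bytes exactly.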

### Profile Compatibility Test

```
1. Create RVDNA file with genomic data
2. Create RVText file with text embeddings
3. Read both with generic RVF reader
4. Verify: generic reader can access vectors and metadata
5. Verify: profile-specific features degrade gracefully
```

### Version Forward Compatibility Test

```
1. Create RVF file with version 1
2. Add segments with hypothetical version 2 features (unknown tags)
3. Read with version 1 reader
4. Verify: version 1 reader skips unknown segments/tags
5. Verify: version 1 data is fully accessible
```
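
Forward compatibility of this kind usually falls out of a tag-length-value layout: the reader walks segment headers and skips bodies whose tag it does not recognize. A sketch with an assumed `[tag u16][len u32]` header (not the actual RVF wire format):

```python
import struct

KNOWN_TAGS = {1, 2, 3}               # tags a version-1 reader understands

def read_segments(blob):
    """Parse [tag u16][len u32][body] segments, skipping unknown tags."""
    out, off = [], 0
    while off + 6 <= len(blob):
        tag, length = struct.unpack_from("<HI", blob, off)
        body = blob[off + 6: off + 6 + length]
        off += 6 + length
        if tag in KNOWN_TAGS:        # unknown tags are skipped, not errors
            out.append((tag, body))
    return out

blob = (struct.pack("<HI", 1, 3) + b"abc" +
        struct.pack("<HI", 99, 2) + b"zz" +    # hypothetical v2 segment
        struct.pack("<HI", 2, 1) + b"q")
assert read_segments(blob) == [(1, b"abc"), (2, b"q")]
```

Because the length prefix is honored even for unknown tags, version-1 data stays fully accessible regardless of what a later version appends.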

## 7. Security Tests

### Signature Verification

```
1. Create signed RVF file (ML-DSA-65)
2. Verify all segment signatures
3. Modify one byte in a signed segment
4. Verify: modification detected
5. Verify: other segments still valid
```
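
The detect-and-isolate behavior can be sketched with per-segment authentication tags; HMAC-SHA256 stands in here for the ML-DSA-65 signatures the format actually specifies, and the key and segment contents are illustrative:

```python
import hashlib
import hmac

KEY = b"demo-signing-key"            # stand-in; real RVF signs with ML-DSA-65

def sign(segment):
    """Produce a per-segment authentication tag."""
    return hmac.new(KEY, segment, hashlib.sha256).digest()

def verify(segment, tag):
    """Constant-time check of a segment against its stored tag."""
    return hmac.compare_digest(sign(segment), tag)

segs = [b"seg-0", b"seg-1"]
tags = [sign(s) for s in segs]
segs[0] = b"Xeg-0"                   # modify one byte
assert verify(segs[0], tags[0]) is False   # tampering detected
assert verify(segs[1], tags[1]) is True    # other segment still valid
```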

### Encryption Round-Trip

```
1. Create encrypted RVF file (ML-KEM-768 + AES-256-GCM)
2. Decrypt with correct key
3. Verify: plaintext matches original
4. Attempt decrypt with wrong key
5. Verify: decryption fails (GCM auth tag mismatch)
```

### Key Rotation

```
1. Create file signed with key A
2. Rotate to key B (write CRYPTO_SEG rotation record)
3. Write new segments signed with key B
4. Verify: old segments valid with key A
5. Verify: new segments valid with key B
6. Verify: cross-signature in rotation record is valid
```

## 8. Benchmark Harness

### Recommended Tools

| Purpose | Tool | Notes |
|---------|------|-------|
| Latency measurement | criterion (Rust) / benchmark.js | Statistical rigor |
| Recall measurement | Custom recall@K computation | Against brute-force ground truth |
| Memory profiling | valgrind massif / Chrome DevTools | Peak and sustained |
| I/O profiling | blktrace / iostat | Verify read patterns |
| SIMD verification | Intel SDE / ARM emulator | Correct SIMD codegen |
| Crash testing | Custom harness with kill -9 | Random timing |
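
The percentile and QPS fields in the report below can be computed with nearest-rank percentiles over the recorded per-query latencies; for a single thread, QPS is simply the query count over total elapsed time. A sketch (latency values are illustrative):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

latencies_ms = [0.1, 0.2, 0.1, 0.9, 0.3, 0.2, 0.1, 0.4, 0.2, 5.0]
p50 = percentile(latencies_ms, 50)          # 0.2
p95 = percentile(latencies_ms, 95)          # 5.0
# Single-thread QPS: queries divided by total busy time in seconds.
qps = len(latencies_ms) / (sum(latencies_ms) / 1000.0)
```

Note how a single slow query dominates p95/p99 while barely moving p50, which is why the report records all three.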

### Report Format

Each benchmark run produces a report:

```json
{
  "test_name": "cold_start_10m",
  "dataset": {
    "vector_count": 10000000,
    "dimensions": 384,
    "dtype": "fp16",
    "file_size_bytes": 10737418240
  },
  "hardware": {
    "cpu": "Intel Xeon w5-3435X",
    "simd": "AVX-512",
    "ram_gb": 256,
    "storage": "NVMe Samsung 990 Pro"
  },
  "results": {
    "boot_ms": 0.08,
    "first_query_ms": 12.3,
    "first_query_recall_at_10": 0.73,
    "working_quality_ms": 340,
    "working_quality_recall_at_10": 0.87,
    "full_quality_ms": 3200,
    "full_quality_recall_at_10": 0.96,
    "steady_state_qps": 8500,
    "steady_state_p50_ms": 0.12,
    "steady_state_p95_ms": 0.28,
    "steady_state_p99_ms": 0.85,
    "data_read_first_query_mb": 3.2,
    "data_read_working_quality_mb": 180
  }
}
```