Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
This commit is contained in:
436
vendor/ruvector/docs/benchmarks/BENCHMARKING_GUIDE.md
vendored
Normal file
436
vendor/ruvector/docs/benchmarks/BENCHMARKING_GUIDE.md
vendored
Normal file
@@ -0,0 +1,436 @@
|
||||
# Benchmarking Guide
|
||||
|
||||
This guide explains how to run, interpret, and contribute benchmarks for Ruvector.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Running Benchmarks](#running-benchmarks)
|
||||
2. [Benchmark Suite](#benchmark-suite)
|
||||
3. [Interpreting Results](#interpreting-results)
|
||||
4. [Performance Targets](#performance-targets)
|
||||
5. [Comparison Methodology](#comparison-methodology)
|
||||
6. [Contributing Benchmarks](#contributing-benchmarks)
|
||||
|
||||
## Running Benchmarks
|
||||
|
||||
### Quick Start
|
||||
|
||||
```bash
|
||||
# Run all benchmarks
|
||||
cargo bench
|
||||
|
||||
# Run specific benchmark
|
||||
cargo bench distance_metrics
|
||||
cargo bench hnsw_search
|
||||
cargo bench batch_operations
|
||||
|
||||
# With flamegraph profiling
|
||||
cargo flamegraph --bench hnsw_search
|
||||
|
||||
# With criterion reports
|
||||
cargo bench -- --save-baseline main
|
||||
git checkout feature-branch
|
||||
cargo bench -- --baseline main
|
||||
```
|
||||
|
||||
### Benchmark Crates
|
||||
|
||||
```bash
|
||||
# Core benchmarks
|
||||
cd crates/ruvector-bench
|
||||
cargo bench
|
||||
|
||||
# Comparison benchmarks
|
||||
cargo run --release --bin comparison_benchmark
|
||||
|
||||
# Memory benchmarks
|
||||
cargo run --release --bin memory_benchmark
|
||||
|
||||
# Latency benchmarks
|
||||
cargo run --release --bin latency_benchmark
|
||||
```
|
||||
|
||||
### SIMD Optimization
|
||||
|
||||
Enable SIMD for maximum performance:
|
||||
|
||||
```bash
|
||||
RUSTFLAGS="-C target-cpu=native" cargo bench
|
||||
|
||||
# Or specific features
|
||||
RUSTFLAGS="-C target-feature=+avx2,+fma" cargo bench
|
||||
```
|
||||
|
||||
## Benchmark Suite
|
||||
|
||||
### 1. Distance Metrics Benchmark
|
||||
|
||||
**File**: `crates/ruvector-core/benches/distance_metrics.rs`
|
||||
|
||||
**What it measures**: Raw distance calculation performance
|
||||
|
||||
**Metrics**:
|
||||
- Euclidean (L2) distance
|
||||
- Cosine similarity
|
||||
- Dot product
|
||||
- Manhattan (L1) distance
|
||||
- SIMD vs scalar implementations
|
||||
|
||||
**Run**:
|
||||
```bash
|
||||
cargo bench distance_metrics
|
||||
```
|
||||
|
||||
**Expected results**:
|
||||
```
|
||||
euclidean_128d/simd time: [45.234 ns 45.456 ns 45.678 ns]
|
||||
euclidean_128d/scalar time: [312.45 ns 315.23 ns 318.91 ns]
|
||||
↑ 7x slower
|
||||
cosine_128d/simd time: [52.123 ns 52.345 ns 52.567 ns]
|
||||
dotproduct_128d/simd time: [38.901 ns 39.123 ns 39.345 ns]
|
||||
```
|
||||
|
||||
### 2. HNSW Search Benchmark
|
||||
|
||||
**File**: `crates/ruvector-core/benches/hnsw_search.rs`
|
||||
|
||||
**What it measures**: End-to-end search performance
|
||||
|
||||
**Metrics**:
|
||||
- Search latency (p50, p95, p99)
|
||||
- Queries per second (QPS)
|
||||
- Recall accuracy
|
||||
- Different dataset sizes (1K, 10K, 100K, 1M vectors)
|
||||
- Different ef_search values (50, 100, 200, 500)
|
||||
|
||||
**Run**:
|
||||
```bash
|
||||
cargo bench hnsw_search
|
||||
```
|
||||
|
||||
**Expected results**:
|
||||
```
|
||||
search_1M_vectors_k10_ef100
|
||||
time: [845.23 µs 856.78 µs 868.45 µs]
|
||||
thrpt: [1,151 queries/s]
|
||||
recall: [95.2%]
|
||||
|
||||
search_1M_vectors_k10_ef200
|
||||
time: [1.678 ms 1.689 ms 1.701 ms]
|
||||
thrpt: [587 queries/s]
|
||||
recall: [98.7%]
|
||||
```
|
||||
|
||||
### 3. Batch Operations Benchmark
|
||||
|
||||
**File**: `crates/ruvector-core/benches/batch_operations.rs`
|
||||
|
||||
**What it measures**: Throughput for bulk operations
|
||||
|
||||
**Metrics**:
|
||||
- Batch insert throughput
|
||||
- Parallel vs sequential inserts
|
||||
- Different batch sizes (100, 1K, 10K)
|
||||
|
||||
**Run**:
|
||||
```bash
|
||||
cargo bench batch_operations
|
||||
```
|
||||
|
||||
**Expected results**:
|
||||
```
|
||||
batch_insert_1000_parallel
|
||||
time: [45.234 ms 46.123 ms 47.012 ms]
|
||||
thrpt: [21,271 vectors/s]
|
||||
|
||||
batch_insert_1000_sequential
|
||||
time: [234.56 ms 238.91 ms 243.27 ms]
|
||||
thrpt: [4,111 vectors/s]
|
||||
↑ 5x slower
|
||||
```
|
||||
|
||||
### 4. Quantization Benchmark
|
||||
|
||||
**File**: `crates/ruvector-core/benches/quantization_bench.rs`
|
||||
|
||||
**What it measures**: Quantization performance and accuracy
|
||||
|
||||
**Metrics**:
|
||||
- Quantization time
|
||||
- Dequantization time
|
||||
- Distance calculation with quantized vectors
|
||||
- Recall impact
|
||||
|
||||
**Run**:
|
||||
```bash
|
||||
cargo bench quantization
|
||||
```
|
||||
|
||||
**Expected results**:
|
||||
```
|
||||
scalar_quantize_128d time: [234.56 ns 236.78 ns 239.01 ns]
|
||||
product_quantize_128d time: [1.234 µs 1.245 µs 1.256 µs]
|
||||
|
||||
search_with_scalar_quant time: [678.90 µs 685.12 µs 691.34 µs]
|
||||
recall: [97.3%]
|
||||
|
||||
search_with_product_quant time: [523.45 µs 528.67 µs 533.89 µs]
|
||||
recall: [92.8%]
|
||||
```
|
||||
|
||||
### 5. Comprehensive Benchmark
|
||||
|
||||
**File**: `crates/ruvector-core/benches/comprehensive_bench.rs`
|
||||
|
||||
**What it measures**: End-to-end system performance
|
||||
|
||||
**Run**:
|
||||
```bash
|
||||
cargo bench comprehensive
|
||||
```
|
||||
|
||||
## Interpreting Results
|
||||
|
||||
### Criterion Output
|
||||
|
||||
```
|
||||
test_name time: [lower_bound mean upper_bound]
|
||||
thrpt: [throughput]
|
||||
change: [% change from baseline]
|
||||
```
|
||||
|
||||
**Example**:
|
||||
```
|
||||
search_100K_vectors time: [234.56 µs 238.91 µs 243.27 µs]
|
||||
thrpt: [4,111 queries/s]
|
||||
change: [-5.2% -3.8% -2.1%] (faster)
|
||||
```
|
||||
|
||||
**Interpretation**:
|
||||
- Mean: 238.91 µs
|
||||
- 95% confidence interval: [234.56 µs, 243.27 µs]
|
||||
- Throughput: ~4,111 queries/second
|
||||
- 3.8% faster than baseline
|
||||
|
||||
### Latency Percentiles
|
||||
|
||||
```bash
|
||||
cargo run --release --bin latency_benchmark
|
||||
```
|
||||
|
||||
**Output**:
|
||||
```
|
||||
Latency percentiles (100K queries):
|
||||
p50: 0.85 ms
|
||||
p90: 1.23 ms
|
||||
p95: 1.67 ms
|
||||
p99: 3.45 ms
|
||||
p999: 8.91 ms
|
||||
```
|
||||
|
||||
**Interpretation**:
|
||||
- 50% of queries complete in < 0.85ms
|
||||
- 95% of queries complete in < 1.67ms
|
||||
- 99% of queries complete in < 3.45ms
|
||||
|
||||
### Memory Usage
|
||||
|
||||
```bash
|
||||
cargo run --release --bin memory_benchmark
|
||||
```
|
||||
|
||||
**Output**:
|
||||
```
|
||||
Memory usage (1M vectors, 128D):
|
||||
Vectors (full): 512.0 MB
|
||||
Vectors (scalar): 128.0 MB (4x compression)
|
||||
HNSW graph: 640.0 MB
|
||||
Metadata: 50.0 MB
|
||||
──────────────────────────────
|
||||
Total: 818.0 MB
|
||||
```
|
||||
|
||||
## Performance Targets
|
||||
|
||||
### Search Latency
|
||||
|
||||
| Dataset | Target p50 | Target p95 | Target QPS |
|
||||
|---------|-----------|-----------|-----------|
|
||||
| 10K vectors | < 100 µs | < 200 µs | 10,000+ |
|
||||
| 100K vectors | < 500 µs | < 1 ms | 2,000+ |
|
||||
| 1M vectors | < 1 ms | < 2 ms | 1,000+ |
|
||||
| 10M vectors | < 2 ms | < 5 ms | 500+ |
|
||||
|
||||
### Insert Throughput
|
||||
|
||||
| Operation | Target |
|
||||
|-----------|--------|
|
||||
| Single insert | 1,000+ ops/sec |
|
||||
| Batch insert (1K) | 10,000+ vectors/sec |
|
||||
| Batch insert (10K) | 50,000+ vectors/sec |
|
||||
|
||||
### Memory Efficiency
|
||||
|
||||
| Configuration | Target Memory per Vector |
|
||||
|---------------|-------------------------|
|
||||
| Full precision | 512 bytes (128D) |
|
||||
| Scalar quant | 128 bytes (4x compression) |
|
||||
| Product quant | 16-32 bytes (16-32x compression) |
|
||||
|
||||
### Recall Accuracy
|
||||
|
||||
| Configuration | Target Recall |
|
||||
|---------------|---------------|
|
||||
| ef_search=50 | 85%+ |
|
||||
| ef_search=100 | 90%+ |
|
||||
| ef_search=200 | 95%+ |
|
||||
| ef_search=500 | 99%+ |
|
||||
|
||||
## Comparison Methodology
|
||||
|
||||
### Against FAISS
|
||||
|
||||
```bash
|
||||
cargo run --release --bin comparison_benchmark -- --system faiss
|
||||
```
|
||||
|
||||
**Metrics compared**:
|
||||
- Search latency (same dataset, same k)
|
||||
- Memory usage
|
||||
- Build time
|
||||
- Recall@10
|
||||
|
||||
**Example output**:
|
||||
```
|
||||
Benchmark: 1M vectors, 128D, k=10
|
||||
|
||||
Ruvector FAISS Speedup
|
||||
────────────────────────────────────────────────
|
||||
Build time 245s 312s 1.27x
|
||||
Search (p50) 0.85ms 2.34ms 2.75x
|
||||
Search (p95) 1.67ms 4.56ms 2.73x
|
||||
Memory 818MB 1,245MB 1.52x
|
||||
Recall@10 95.2% 95.8% ~same
|
||||
```
|
||||
|
||||
### Versioned Benchmarks
|
||||
|
||||
Track performance over time:
|
||||
|
||||
```bash
|
||||
# Save baseline
|
||||
git checkout v0.1.0
|
||||
cargo bench -- --save-baseline v0.1.0
|
||||
|
||||
# Compare to new version
|
||||
git checkout v0.2.0
|
||||
cargo bench -- --baseline v0.1.0
|
||||
```
|
||||
|
||||
## Contributing Benchmarks
|
||||
|
||||
### Adding a New Benchmark
|
||||
|
||||
1. Create benchmark file:
|
||||
```rust
|
||||
// crates/ruvector-core/benches/my_benchmark.rs
|
||||
use criterion::{black_box, criterion_group, criterion_main, Criterion};
|
||||
use ruvector_core::*;
|
||||
|
||||
fn my_benchmark(c: &mut Criterion) {
|
||||
let db = setup_test_db();
|
||||
|
||||
c.bench_function("my_operation", |b| {
|
||||
b.iter(|| {
|
||||
// Operation to benchmark
|
||||
db.my_operation(black_box(&input))
|
||||
})
|
||||
});
|
||||
}
|
||||
|
||||
criterion_group!(benches, my_benchmark);
|
||||
criterion_main!(benches);
|
||||
```
|
||||
|
||||
2. Register in `Cargo.toml`:
|
||||
```toml
|
||||
[[bench]]
|
||||
name = "my_benchmark"
|
||||
harness = false
|
||||
```
|
||||
|
||||
3. Run and verify:
|
||||
```bash
|
||||
cargo bench my_benchmark
|
||||
```
|
||||
|
||||
### Benchmark Best Practices
|
||||
|
||||
1. **Use `black_box`**: Prevent compiler optimizations
|
||||
```rust
|
||||
b.iter(|| db.search(black_box(&query)))
|
||||
```
|
||||
|
||||
2. **Measure what matters**: Focus on user-facing operations
|
||||
|
||||
3. **Realistic workloads**: Use representative data sizes
|
||||
|
||||
4. **Multiple iterations**: Criterion handles this automatically
|
||||
|
||||
5. **Isolate variables**: Benchmark one thing at a time
|
||||
|
||||
6. **Document context**: Explain what's being measured
|
||||
|
||||
7. **CI integration**: Run benchmarks in CI to catch regressions
|
||||
|
||||
### Profiling
|
||||
|
||||
```bash
|
||||
# Flamegraph
|
||||
cargo flamegraph --bench hnsw_search
|
||||
|
||||
# perf (Linux)
|
||||
perf record -g cargo bench hnsw_search
|
||||
perf report
|
||||
|
||||
# Cachegrind (memory profiling)
|
||||
valgrind --tool=cachegrind cargo bench hnsw_search
|
||||
```
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
### GitHub Actions
|
||||
|
||||
``yaml
|
||||
- name: Run benchmarks
|
||||
run: |
|
||||
cargo bench --bench distance_metrics -- --save-baseline main
|
||||
|
||||
- name: Compare to baseline
|
||||
run: |
|
||||
cargo bench --bench distance_metrics -- --baseline main
|
||||
```
|
||||
|
||||
### Performance Regression Detection
|
||||
|
||||
Fail CI if performance regresses > 5%:
|
||||
|
||||
```rust
|
||||
// In benchmark code
|
||||
let previous_mean = load_baseline("main");
|
||||
let current_mean = measure_current();
|
||||
let regression = (current_mean - previous_mean) / previous_mean;
|
||||
|
||||
assert!(regression < 0.05, "Performance regression > 5%");
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
- [Criterion.rs documentation](https://bheisler.github.io/criterion.rs/book/)
|
||||
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)
|
||||
- [Benchmarking Rust programs](https://doc.rust-lang.org/cargo/commands/cargo-bench.html)
|
||||
- [ANN-Benchmarks](http://ann-benchmarks.com/) - Standard vector search benchmarks
|
||||
|
||||
## Questions?
|
||||
|
||||
Open an issue: https://github.com/ruvnet/ruvector/issues
|
||||
Reference in New Issue
Block a user