Ruvector Benchmark Suite Documentation
Comprehensive benchmarking tools for measuring and analyzing Ruvector's performance across various workloads and configurations.
Table of Contents
- Overview
- Installation
- Benchmark Tools
- Quick Start
- Detailed Usage
- Understanding Results
- Performance Targets
- Advanced Topics
- Troubleshooting
- Results Analysis
- Contributing
- Support
Overview
The Ruvector benchmark suite provides:
- ANN-Benchmarks Compatibility: Standard SIFT1M, GIST1M, Deep1M testing
- AgenticDB Workloads: Reflexion episodes, skill libraries, causal graphs
- Latency Analysis: p50, p95, p99, p99.9 percentile measurements
- Memory Profiling: Usage at various scales with quantization effects
- System Comparison: Ruvector vs other implementations
- Performance Profiling: CPU flamegraphs and hotspot analysis
Installation
Prerequisites
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Optional: HDF5 for loading real ANN benchmark datasets
# Ubuntu/Debian
sudo apt-get install libhdf5-dev
# macOS
brew install hdf5
# Optional: Profiling tools
sudo apt-get install linux-perf # Linux only
Build Benchmarks
cd crates/ruvector-bench
# Standard build
cargo build --release
# With profiling support
cargo build --release --features profiling
# With HDF5 dataset support
cargo build --release --features hdf5-datasets
Benchmark Tools
1. ANN Benchmark (ann-benchmark)
Tests standard ANN benchmark datasets with configurable HNSW parameters.
Features:
- SIFT1M (128D, 1M vectors)
- GIST1M (960D, 1M vectors)
- Deep1M (96D, 1M vectors)
- Synthetic dataset generation
- Recall-QPS curves at 90%, 95%, 99%
- Multiple ef_search values
2. AgenticDB Benchmark (agenticdb-benchmark)
Simulates agentic AI workloads.
Workloads:
- Reflexion episode storage/retrieval
- Skill library search
- Causal graph queries
- Learning session throughput (mixed read/write)
3. Latency Benchmark (latency-benchmark)
Measures detailed latency characteristics.
Tests:
- Single-threaded latency
- Multi-threaded latency (configurable thread counts)
- Effect of ef_search on latency
- Effect of quantization on latency/recall tradeoff
4. Memory Benchmark (memory-benchmark)
Profiles memory usage at scale.
Tests:
- Memory at 10K, 100K, 1M vectors
- Effect of quantization (none, scalar, binary)
- Index overhead analysis
- Memory per vector calculation
5. Comparison Benchmark (comparison-benchmark)
Compares Ruvector against other systems.
Comparisons:
- Ruvector (optimized)
- Ruvector (no quantization)
- Simulated Python baseline
- Simulated brute-force search
6. Profiling Benchmark (profiling-benchmark)
Generates performance profiles.
Outputs:
- CPU flamegraphs (SVG)
- Profiling reports
- Hotspot identification
- SIMD utilization analysis
Quick Start
Run All Benchmarks
# Full benchmark suite
./scripts/run_all_benchmarks.sh
# Quick mode (smaller datasets)
./scripts/run_all_benchmarks.sh --quick
# With profiling
./scripts/run_all_benchmarks.sh --profile
Run Individual Benchmarks
# ANN benchmarks
cargo run --release --bin ann-benchmark -- \
--dataset synthetic \
--num-vectors 100000 \
--queries 1000
# AgenticDB workloads
cargo run --release --bin agenticdb-benchmark -- \
--episodes 10000 \
--queries 500
# Latency profiling
cargo run --release --bin latency-benchmark -- \
--num-vectors 50000 \
--threads "1,4,8,16"
# Memory profiling
cargo run --release --bin memory-benchmark -- \
--scales "1000,10000,100000"
# System comparison
cargo run --release --bin comparison-benchmark -- \
--num-vectors 50000
# Performance profiling
cargo run --release --features profiling --bin profiling-benchmark -- \
--flamegraph
Detailed Usage
ANN Benchmark Options
cargo run --release --bin ann-benchmark -- --help
Options:
-d, --dataset <DATASET> Dataset: sift1m, gist1m, deep1m, synthetic [default: synthetic]
-n, --num-vectors <NUM_VECTORS> Number of vectors [default: 100000]
-q, --queries <NUM_QUERIES> Number of queries [default: 1000]
--dimensions <DIMENSIONS> Vector dimensions [default: 128]
-k, --k <K> K nearest neighbors [default: 10]
-m, --m <M> HNSW M parameter [default: 32]
--ef-construction <VALUE> HNSW ef_construction [default: 200]
--ef-search-values <VALUES> HNSW ef_search values (comma-separated) [default: 50,100,200,400]
-o, --output <OUTPUT> Output directory [default: bench_results]
--metric <METRIC> Distance metric [default: cosine]
--quantization <QUANT> Quantization: none, scalar, binary [default: scalar]
AgenticDB Benchmark Options
cargo run --release --bin agenticdb-benchmark -- --help
Options:
--episodes <EPISODES> Number of episodes [default: 10000]
--skills <SKILLS> Number of skills [default: 1000]
-q, --queries <QUERIES> Number of queries [default: 500]
-o, --output <OUTPUT> Output directory [default: bench_results]
Latency Benchmark Options
cargo run --release --bin latency-benchmark -- --help
Options:
-n, --num-vectors <NUM_VECTORS> Number of vectors [default: 50000]
-q, --queries <QUERIES> Number of queries [default: 1000]
-d, --dimensions <DIMENSIONS> Vector dimensions [default: 384]
-t, --threads <THREADS> Thread counts to test [default: 1,4,8,16]
-o, --output <OUTPUT> Output directory [default: bench_results]
Understanding Results
Output Files
Each benchmark generates three output files:
- JSON (`{benchmark}_benchmark.json`): Raw data for programmatic analysis
- CSV (`{benchmark}_benchmark.csv`): Tabular data for spreadsheet analysis
- Markdown (`{benchmark}_benchmark.md`): Human-readable report
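For programmatic analysis, the JSON file can be deserialized directly. A minimal sketch using `serde` and `serde_json`; the field names here are illustrative assumptions, so check the actual JSON output for the real schema:

```rust
use serde::Deserialize;

// Hypothetical record shape -- adjust the fields to match the real
// ann_benchmark.json schema before use.
#[derive(Debug, Deserialize)]
struct BenchResult {
    ef_search: usize,
    qps: f64,
    p99_ms: f64,
    recall_at_10: f64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = std::fs::read_to_string("bench_results/ann_benchmark.json")?;
    // Assumes the top-level JSON value is an array of result records.
    let results: Vec<BenchResult> = serde_json::from_str(&data)?;
    for r in &results {
        println!(
            "ef_search={} qps={:.0} p99={:.2}ms recall@10={:.1}%",
            r.ef_search, r.qps, r.p99_ms, r.recall_at_10 * 100.0
        );
    }
    Ok(())
}
```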
Key Metrics
QPS (Queries Per Second)
- Higher is better
- Measures throughput
- Target: >10,000 QPS for 100K vectors
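QPS is wall-clock throughput: completed queries divided by elapsed time. A minimal sketch of the measurement, assuming the index is already built and warmed up:

```rust
use std::time::Instant;

// Runs `num_queries` search closures and returns queries per second.
// A sketch: real harnesses also warm up caches and repeat the run.
fn measure_qps<F: FnMut()>(num_queries: usize, mut run_query: F) -> f64 {
    let start = Instant::now();
    for _ in 0..num_queries {
        run_query();
    }
    num_queries as f64 / start.elapsed().as_secs_f64()
}
```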
Latency Percentiles
- p50: Median latency (typical user experience)
- p95: 95th percentile (captures most outliers)
- p99: 99th percentile (worst-case for most users)
- p99.9: 99.9th percentile (extreme outliers)
- Lower is better
- Target: <5ms p99 for 100K vectors
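Percentiles are read from the sorted per-query latencies. A minimal nearest-rank sketch (the suite's exact interpolation method may differ):

```rust
// Nearest-rank percentile over latency samples in milliseconds.
// Assumes `samples` is non-empty and p is in (0, 100].
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

// percentile(&mut latencies, 50.0) -> p50; 99.0 -> p99; 99.9 -> p99.9
```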
Recall
- Recall@1: Percentage of times the true nearest neighbor is found
- Recall@10: Percentage of true top-10 neighbors found
- Recall@100: Percentage of true top-100 neighbors found
- Higher is better
- Target: >95% recall@10
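Recall@k is computed against brute-force ground truth: the fraction of the true top-k IDs that appear in the returned top-k. A minimal sketch:

```rust
use std::collections::HashSet;

// Fraction of the true top-k neighbors found in the returned top-k.
// Assumes `ground_truth` holds at least k exact-neighbor IDs.
fn recall_at_k(returned: &[u64], ground_truth: &[u64], k: usize) -> f64 {
    let truth: HashSet<&u64> = ground_truth.iter().take(k).collect();
    let hits = returned
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / k as f64
}
```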
Memory
- Total memory usage in MB
- Memory per vector in KB
- Compression ratio with quantization
- Target: <2KB per vector with quantization
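Bytes per vector follow directly from dimensionality and quantization mode: a 128-dimensional f32 vector is 512 bytes raw, 128 bytes with scalar (u8) quantization (4x), and 16 bytes with binary quantization (32x), all before index overhead. A quick back-of-the-envelope sketch:

```rust
// Raw storage per vector for each quantization mode, excluding index
// overhead (graph links, metadata), which is why measured memory per
// vector is always somewhat higher.
fn bytes_per_vector(dims: usize, quantization: &str) -> f64 {
    match quantization {
        "none" => dims as f64 * 4.0,   // f32: 4 bytes per component
        "scalar" => dims as f64,       // u8: 1 byte per component
        "binary" => dims as f64 / 8.0, // 1 bit per component
        q => panic!("unknown quantization: {q}"),
    }
}

fn main() {
    let raw = bytes_per_vector(128, "none");
    for q in ["scalar", "binary"] {
        let b = bytes_per_vector(128, q);
        println!("{q}: {b} B/vector, {:.0}x compression", raw / b);
    }
}
```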
Reading Benchmark Reports
Example output interpretation:
| ef_search | QPS | p50 (ms) | p99 (ms) | Recall@10 | Memory (MB) |
|---|---|---|---|---|---|
| 50 | 15234 | 0.05 | 0.12 | 92.5% | 156.2 |
| 100 | 12456 | 0.06 | 0.15 | 96.8% | 156.2 |
| 200 | 8932 | 0.08 | 0.20 | 98.9% | 156.2 |
Analysis:
- Increasing ef_search improves recall but reduces QPS
- ef_search=100 offers a good balance (96.8% recall at ~12K QPS)
- Memory usage is constant across ef_search values, since ef_search only affects query-time traversal, not the index itself
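A common way to act on such a table is to pick the smallest ef_search that meets a recall target, which is also the highest-QPS point that is still accurate enough. A minimal sketch over (ef_search, QPS, recall) tuples:

```rust
// Lowest ef_search whose recall meets the target; returns None when no
// configuration is accurate enough.
fn pick_operating_point(
    points: &[(usize, f64, f64)], // (ef_search, qps, recall@10)
    min_recall: f64,
) -> Option<(usize, f64, f64)> {
    points
        .iter()
        .filter(|p| p.2 >= min_recall)
        .min_by_key(|p| p.0)
        .copied()
}

fn main() {
    let points = [
        (50, 15234.0, 0.925),
        (100, 12456.0, 0.968),
        (200, 8932.0, 0.989),
    ];
    // With a 95% recall target, ef_search=100 is selected.
    println!("{:?}", pick_operating_point(&points, 0.95));
}
```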
Performance Targets
AgenticDB Replacement Goals
Ruvector targets a 10-100x performance improvement over AgenticDB, with per-workload goals as follows:
| Metric | AgenticDB (Python) | Ruvector (Target) | Speedup |
|---|---|---|---|
| Reflexion Retrieval | ~100 QPS | >5,000 QPS | 50x |
| Skill Search | ~50 QPS | >2,000 QPS | 40x |
| Index Build Time | ~60s/10K | <5s/10K | 12x |
| Memory Usage | ~500MB/100K | <100MB/100K | 5x |
ANN-Benchmarks Targets
Competitive with state-of-the-art implementations:
| Dataset | Recall@10 | QPS Target | Latency p99 |
|---|---|---|---|
| SIFT1M | >95% | >10,000 | <1ms |
| GIST1M | >95% | >5,000 | <2ms |
| Deep1M | >95% | >15,000 | <0.5ms |
Advanced Topics
Profiling with Flamegraphs
Generate CPU flamegraphs to identify performance bottlenecks:
cargo run --release --features profiling --bin profiling-benchmark -- \
--flamegraph \
--output bench_results/profiling
# View flamegraph
firefox bench_results/profiling/flamegraph.svg
Interpreting Flamegraphs:
- Width = CPU time spent
- Height = call stack depth
- Look for wide plateaus (hotspots)
- Focus optimization on top 20% of time
Custom Benchmark Scenarios
Create custom benchmarks by modifying the tools:
// Example: sweep vector dimensionality. `bench_custom` is a hypothetical
// helper that builds an index and runs queries at the given dimension.
let dimensions = vec![64, 128, 256, 512, 768, 1024];
let mut results = Vec::new();
for dim in dimensions {
    let result = bench_custom(dim)?;
    results.push(result);
}
Continuous Benchmarking
Integrate with CI/CD:
# .github/workflows/benchmark.yml
name: Benchmarks
on: [push]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmarks
        run: |
          cd crates/ruvector-bench
          ./scripts/run_all_benchmarks.sh --quick
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: crates/ruvector-bench/bench_results/
Troubleshooting
Common Issues
"HDF5 not found"
# Install HDF5 development libraries
sudo apt-get install libhdf5-dev # Ubuntu/Debian
brew install hdf5 # macOS
# Or build without HDF5 support
cargo build --release --no-default-features
"Out of memory"
# Reduce dataset size
cargo run --release --bin ann-benchmark -- --num-vectors 10000
# Or use quick mode
./scripts/run_all_benchmarks.sh --quick
"Profiling not working"
# Ensure profiling feature is enabled
cargo build --release --features profiling
# Linux: May need perf permissions
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
"Benchmarks taking too long"
# Use quick mode
./scripts/run_all_benchmarks.sh --quick
# Or run individual benchmarks
cargo run --release --bin latency-benchmark -- --queries 100
Performance Debugging
If benchmarks show unexpectedly slow results:
- Check CPU governor:
# Linux: Use performance mode
sudo cpupower frequency-set -g performance
- Verify release build:
cargo build --release # Not a debug build!
- Check system load:
htop # Ensure no other heavy processes
- Review HNSW parameters:
- Reduce ef_construction for faster indexing
- Reduce ef_search for faster queries (at cost of recall)
Results Analysis
Comparing Runs
# Compare two benchmark runs
diff -u bench_results_old/ann_benchmark.csv bench_results_new/ann_benchmark.csv
# Plot results with Python
python3 scripts/plot_results.py bench_results/
Statistical Significance
For reliable benchmarks:
- Run multiple iterations (3-5 times)
- Use appropriate dataset sizes (>10K vectors)
- Ensure consistent system load
- Record system specs in metadata
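To compare runs meaningfully, report the mean and spread of repeated measurements rather than a single number. A minimal aggregation sketch:

```rust
// Mean and population standard deviation of repeated measurements.
// Assumes a non-empty sample set.
fn mean_stddev(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

fn main() {
    // e.g. QPS from five repeated runs of the same benchmark
    let qps_runs = [12456.0, 12390.0, 12511.0, 12298.0, 12480.0];
    let (mean, sd) = mean_stddev(&qps_runs);
    println!("QPS: {mean:.0} ± {sd:.0}");
}
```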
Contributing
To add new benchmarks:
- Create a new binary in `src/bin/` (a skeleton is sketched below)
- Use the `ruvector_bench` utilities
- Output results in the standard JSON/CSV/Markdown format
- Update this documentation
- Add the benchmark to `run_all_benchmarks.sh`
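A minimal skeleton for a new benchmark binary. The helpers below are hypothetical placeholders, not the real `ruvector_bench` API; swap them for the crate's actual dataset and reporting utilities:

```rust
// src/bin/my-benchmark.rs -- hypothetical skeleton; adapt to the real
// ruvector_bench utilities before use.
use std::time::Instant;

fn main() {
    let num_vectors = 10_000;
    let num_queries = 500;

    // 1. Build the workload (replace with ruvector_bench dataset helpers).
    let vectors = generate_vectors(num_vectors, 128);

    // 2. Time the phase under test.
    let start = Instant::now();
    run_queries(&vectors, num_queries);
    let qps = num_queries as f64 / start.elapsed().as_secs_f64();

    // 3. Emit results in the standard JSON/CSV/Markdown format.
    println!("{{\"benchmark\": \"my-benchmark\", \"qps\": {qps:.1}}}");
}

// Placeholder data generator so the skeleton compiles standalone.
fn generate_vectors(n: usize, dims: usize) -> Vec<Vec<f32>> {
    (0..n).map(|i| vec![(i % 251) as f32; dims]).collect()
}

// Placeholder query loop; replace with real index searches.
fn run_queries(_vectors: &[Vec<f32>], _num_queries: usize) {}
```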
Support
For issues or questions:
- GitHub Issues: https://github.com/ruvnet/ruvector/issues
- Documentation: https://github.com/ruvnet/ruvector/docs
Last updated: 2025-11-19