RuVector Benchmark Suite
Comprehensive benchmarks comparing ruvector vs pgvector across multiple dimensions.
Overview
This benchmark suite provides:
- Rust Benchmarks - Low-level performance testing using Criterion
- SQL Benchmarks - Realistic PostgreSQL workload testing
- Automated CI - GitHub Actions workflow for continuous benchmarking
Quick Start
Run All Benchmarks
cd crates/ruvector-postgres
bash benches/scripts/run_benchmarks.sh
Run Individual Benchmarks
# Distance function benchmarks
cargo bench --bench distance_bench
# HNSW index benchmarks
cargo bench --bench index_bench
# Quantization benchmarks
cargo bench --bench quantization_bench
# Quantized distance benchmarks
cargo bench --bench quantized_distance_bench
Run SQL Benchmarks
# Setup database
createdb ruvector_bench
psql -d ruvector_bench -c 'CREATE EXTENSION ruvector;'
psql -d ruvector_bench -c 'CREATE EXTENSION vector;'  # pgvector's extension is named "vector"
# Quick benchmark (10k vectors)
psql -d ruvector_bench -f benches/sql/quick_benchmark.sql
# Full workload (1M vectors)
psql -d ruvector_bench -f benches/sql/benchmark_workload.sql
Benchmark Categories
1. Distance Function Benchmarks (distance_bench.rs)
Tests distance calculation performance across different vector dimensions:
- L2 (Euclidean) Distance: Scalar vs SIMD implementations
- Cosine Distance: Normalized similarity measurement
- Inner Product: Dot product for maximum inner product search
- Batch Operations: Sequential vs parallel processing
Dimensions tested: 128, 384, 768, 1536, 3072
Key metrics:
- Single operation latency
- Throughput (ops/sec)
- SIMD speedup vs scalar
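For reference, the scalar forms of the three distances are small functions like the ones below; the benchmarks compare implementations of these same quantities against their SIMD counterparts (a minimal sketch, not the crate's actual code):

```rust
/// Euclidean (L2) distance.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Cosine distance: 1 - (a.b) / (|a| |b|).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

/// Negated inner product, so that smaller = more similar (MIPS convention).
fn inner_product_distance(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}
```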
2. HNSW Index Benchmarks (index_bench.rs)
Tests Hierarchical Navigable Small World graph index:
Build Benchmarks
- Index construction time vs dataset size (1K, 10K, 100K, 1M vectors)
- Impact of ef_construction parameter (16, 32, 64, 128, 256)
- Impact of M parameter (8, 12, 16, 24, 32, 48)
Search Benchmarks
- Query latency vs dataset size
- Impact of ef_search parameter (10, 20, 40, 80, 160, 320)
- Impact of k (number of neighbors: 1, 5, 10, 20, 50, 100)
Recall Accuracy
- Recall@10 vs ef_search values
- Ground truth comparison
Memory Usage
- Index size vs dataset size
- Memory per vector overhead
Dimensions tested: 128, 384, 768, 1536
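Recall@k itself is simple to compute once ground-truth neighbors are available from an exact scan. A hedged sketch of the metric (the suite's actual harness may differ):

```rust
use std::collections::HashSet;

/// Fraction of the true k nearest neighbors that the index returned.
fn recall_at_k(ground_truth: &[u64], retrieved: &[u64], k: usize) -> f64 {
    let truth: HashSet<&u64> = ground_truth.iter().take(k).collect();
    let hits = retrieved.iter().take(k).filter(|id| truth.contains(id)).count();
    hits as f64 / k.min(ground_truth.len()) as f64
}
```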
3. Quantization Benchmarks (quantization_bench.rs)
Tests vector compression and quantized search:
Scalar Quantization (SQ8)
- Encoding/decoding speed
- Distance calculation speedup
- Recall vs exact search
- Memory reduction (4x compression)
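A minimal illustration of the SQ8 idea, quantizing against a per-vector min/max range (per-dimension ranges are also common; this variant is an assumption made for brevity, not ruvector's exact scheme):

```rust
/// Scalar quantization to 8 bits: map each component into [0, 255].
/// Returns the codes plus the (min, max) needed for decoding.
fn sq8_encode(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let codes = v.iter().map(|&x| ((x - min) * scale).round() as u8).collect();
    (codes, min, max)
}

/// Reconstruct approximate floats from the 8-bit codes.
fn sq8_decode(codes: &[u8], min: f32, max: f32) -> Vec<f32> {
    let step = (max - min) / 255.0;
    codes.iter().map(|&c| min + c as f32 * step).collect()
}
```

Each f32 shrinks to one byte, hence the 4x compression; the decode error is bounded by half a quantization step.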
Binary Quantization
- Encoding speed
- Hamming distance calculation (SIMD)
- Massive compression (32x for f32)
- Re-ranking strategies
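Binary quantization keeps one bit per dimension (here, the sign), and distance becomes XOR plus popcount over packed words. A sketch of both halves:

```rust
/// Pack a float vector into bits: 1 where the component is positive.
fn binary_quantize(v: &[f32]) -> Vec<u64> {
    v.chunks(64)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u64, |word, (i, &x)| {
                if x > 0.0 { word | (1u64 << i) } else { word }
            })
        })
        .collect()
}

/// Hamming distance; count_ones() lowers to POPCNT where available.
fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```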
Product Quantization (PQ)
- ADC (Asymmetric Distance Computation)
- SIMD vs scalar lookup
- Configurable compression ratios
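The ADC trick: for each query, precompute one small table of subspace distances per subquantizer, then score every compressed code with M table lookups instead of a full float distance. A toy sketch (the centroid layout here is an assumption for illustration, not ruvector's storage format):

```rust
/// Per-subquantizer lookup tables of squared distances from the query
/// to every centroid. codebooks[m][c] is centroid c of subquantizer m.
fn adc_tables(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let sub_dim = query.len() / codebooks.len();
    codebooks.iter().enumerate().map(|(m, book)| {
        let q_sub = &query[m * sub_dim..(m + 1) * sub_dim];
        book.iter()
            .map(|c| q_sub.iter().zip(c).map(|(x, y)| (x - y) * (x - y)).sum::<f32>())
            .collect()
    }).collect()
}

/// Approximate distance to a PQ code: just M table lookups.
fn adc_distance(tables: &[Vec<f32>], code: &[u8]) -> f32 {
    code.iter().enumerate().map(|(m, &c)| tables[m][c as usize]).sum()
}
```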
Key metrics:
- Speedup vs exact search
- Recall@10 accuracy
- Compression ratio
- Throughput improvement
4. SQL Workload Benchmarks
Realistic PostgreSQL scenarios:
Quick Benchmark (quick_benchmark.sql)
- 10,000 vectors, 768 dimensions
- Sequential scan baseline
- HNSW index build
- Index search performance
- Distance function comparisons
Full Workload (benchmark_workload.sql)
- 1,000,000 vectors, 1536 dimensions
- 1,000 queries for statistical significance
- P50, P99 latency measurements
- Memory usage analysis
- Recall accuracy testing
- ruvector vs pgvector comparison
Understanding Results
Criterion Output
Distance/euclidean/scalar/768
time: [2.1234 µs 2.1456 µs 2.1678 µs]
thrpt: [354.23 Melem/s 357.89 Melem/s 361.55 Melem/s]
- time: Execution time estimate with lower and upper confidence bounds
- thrpt: Throughput (here, millions of elements processed per second)
Comparing Implementations
# Set baseline
cargo bench --bench distance_bench -- --save-baseline main
# Make changes, then compare
cargo bench --bench distance_bench -- --baseline main
SQL Benchmark Interpretation
p50_ms | p99_ms | avg_ms | min_ms | max_ms
--------+--------+--------+--------+--------
0.856 | 1.234 | 0.912 | 0.654 | 2.456
- p50: Median latency (50th percentile)
- p99: 99th percentile latency (worst 1%)
- avg: Average latency
- min/max: Best and worst case
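For intuition, here is a nearest-rank percentile over a latency sample (one common definition; PostgreSQL's percentile_cont interpolates, so its values can differ slightly):

```rust
/// Nearest-rank percentile of a latency sample (sorts in place).
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty());
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Rank of the smallest sample covering p percent of the data.
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.max(1).min(samples.len()) - 1]
}
```

Note that a stable p99 needs enough samples; with the full workload's 1,000 queries, p99 is determined by the slowest 10.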
Performance Targets
Distance Functions
| Operation | Dimension | Target Throughput |
|---|---|---|
| L2 (SIMD) | 768 | > 400 Mops/s |
| L2 (SIMD) | 1536 | > 200 Mops/s |
| Cosine | 768 | > 300 Mops/s |
| Inner Product | 768 | > 500 Mops/s |
HNSW Index
| Dataset Size | Build Time | Search Latency | Recall@10 |
|---|---|---|---|
| 100K | < 30s | < 1ms | > 0.95 |
| 1M | < 5min | < 2ms | > 0.95 |
| 10M | < 1hr | < 5ms | > 0.90 |
Quantization
| Method | Compression | Speedup | Recall@10 |
|---|---|---|---|
| SQ8 | 4x | 2-3x | > 0.95 |
| Binary | 32x | 10-20x | > 0.85 |
| PQ(8) | 16x | 5-10x | > 0.90 |
Continuous Integration
The GitHub Actions workflow runs automatically on:
- Pull requests touching benchmark code
- Pushes to main and develop branches
- Manual workflow dispatch
Results are:
- Posted as PR comments
- Stored as artifacts (30 day retention)
- Tracked over time on main branch
- Compared against baseline
Triggering Manual Runs
# From GitHub UI: Actions → Benchmarks → Run workflow
# Or using gh CLI
gh workflow run benchmarks.yml
Enabling SQL Benchmarks in CI
SQL benchmarks are disabled by default (too slow). Enable via workflow dispatch:
gh workflow run benchmarks.yml -f run_sql_benchmarks=true
Advanced Usage
Profiling with Criterion
# Profile each benchmark for 5s (requires a profiler hook such as pprof to emit flamegraphs)
cargo bench --bench distance_bench -- --profile-time=5
# Output to specific format
cargo bench --bench distance_bench -- --output-format bencher
Custom Benchmark Parameters
Edit benchmark files to adjust:
- Vector dimensions
- Dataset sizes
- Number of queries
- HNSW parameters (M, ef_construction, ef_search)
- Quantization settings
Comparing with pgvector
Ensure pgvector is installed:
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
Then run SQL benchmarks for side-by-side comparison.
Interpreting Regressions
Performance Degradation Alert
If CI fails due to performance regression:
- Check the comparison: Review the baseline vs current results
- Validate the change: Ensure it's not due to measurement noise
- Profile the code: Use flamegraphs to identify bottlenecks
- Consider trade-offs: Sometimes correctness > speed
Common Causes
- SIMD disabled: Check compiler flags
- Debug build: Ensure --release mode
- Thermal throttling: CPU overheating in CI
- Cache effects: Different data access patterns
Contributing
When adding benchmarks:
- Add to the appropriate *_bench.rs file
- Update this README
- Ensure benchmarks complete in < 5 minutes
- Use black_box() to prevent optimization
- Test both small and large inputs
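black_box (stable in std::hint) keeps the optimizer from constant-folding inputs or discarding unused results, either of which would make a benchmark loop measure nothing:

```rust
use std::hint::black_box;

fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let a = vec![1.0f32; 768];
    let b = vec![2.0f32; 768];
    for _ in 0..1_000 {
        // Opaque inputs and an opaque result: the call can be neither
        // hoisted out of the loop nor eliminated as dead code.
        black_box(l2_sq(black_box(&a), black_box(&b)));
    }
}
```

Criterion's own iter() closures apply the same treatment to return values, but inputs still need explicit black_box.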
License
Same as the ruvector project: MIT.