Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

# RuVector Benchmark Suite
Comprehensive benchmarks comparing ruvector against pgvector across distance functions, HNSW indexing, quantization, and SQL workloads.
## Overview
This benchmark suite provides:
1. **Rust Benchmarks** - Low-level performance testing using Criterion
2. **SQL Benchmarks** - Realistic PostgreSQL workload testing
3. **Automated CI** - GitHub Actions workflow for continuous benchmarking
## Quick Start
### Run All Benchmarks
```bash
cd crates/ruvector-postgres
bash benches/scripts/run_benchmarks.sh
```
### Run Individual Benchmarks
```bash
# Distance function benchmarks
cargo bench --bench distance_bench
# HNSW index benchmarks
cargo bench --bench index_bench
# Quantization benchmarks
cargo bench --bench quantization_bench
# Quantized distance benchmarks
cargo bench --bench quantized_distance_bench
```
### Run SQL Benchmarks
```bash
# Setup database
createdb ruvector_bench
psql -d ruvector_bench -c 'CREATE EXTENSION ruvector;'
# pgvector registers itself as the "vector" extension
psql -d ruvector_bench -c 'CREATE EXTENSION vector;'
# Quick benchmark (10k vectors)
psql -d ruvector_bench -f benches/sql/quick_benchmark.sql
# Full workload (1M vectors)
psql -d ruvector_bench -f benches/sql/benchmark_workload.sql
```
## Benchmark Categories
### 1. Distance Function Benchmarks (`distance_bench.rs`)
Tests distance calculation performance across different vector dimensions:
- **L2 (Euclidean) Distance**: Scalar vs SIMD implementations
- **Cosine Distance**: Normalized similarity measurement
- **Inner Product**: Dot product for maximum inner product search
- **Batch Operations**: Sequential vs parallel processing

**Dimensions tested**: 128, 384, 768, 1536, 3072

**Key metrics**:
- Single operation latency
- Throughput (ops/sec)
- SIMD speedup vs scalar
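As a point of reference, the three metrics can be written as plain scalar Rust, the kind of baseline that SIMD variants are compared against. This is a minimal sketch, not ruvector's implementation: the function names are hypothetical, inputs are assumed to be equal-length `f32` slices, and the batch helper simply splits the dataset across threads with `std::thread::scope` to illustrate the sequential-vs-parallel comparison.

```rust
use std::thread;

/// Euclidean (L2) distance: sqrt(sum((a_i - b_i)^2)).
pub fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Inner (dot) product: sum(a_i * b_i). Maximum inner product search ranks by the largest value.
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine distance: 1 - (a . b) / (|a| * |b|). Returns 1.0 if either vector is all zeros.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = inner_product(a, a).sqrt();
    let norm_b = inner_product(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0;
    }
    1.0 - inner_product(a, b) / (norm_a * norm_b)
}

/// L2 distance from one query to every row, split across `threads` scoped threads.
pub fn batch_l2(query: &[f32], data: &[Vec<f32>], threads: usize) -> Vec<f32> {
    let threads = threads.max(1);
    // Rows per thread, rounded up; at least 1 so chunks() never receives 0.
    let chunk = ((data.len() + threads - 1) / threads).max(1);
    let mut out = vec![0.0f32; data.len()];
    thread::scope(|s| {
        for (rows, dists) in data.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for (row, d) in rows.iter().zip(dists.iter_mut()) {
                    *d = l2_distance(query, row);
                }
            });
        }
    });
    out
}
```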
### 2. HNSW Index Benchmarks (`index_bench.rs`)
Tests the Hierarchical Navigable Small World (HNSW) graph index:
#### Build Benchmarks
- Index construction time vs dataset size (1K, 10K, 100K, 1M vectors)
- Impact of `ef_construction` parameter (16, 32, 64, 128, 256)
- Impact of `M` parameter (8, 12, 16, 24, 32, 48)
#### Search Benchmarks
- Query latency vs dataset size
- Impact of `ef_search` parameter (10, 20, 40, 80, 160, 320)
- Impact of `k` (number of neighbors: 1, 5, 10, 20, 50, 100)
#### Recall Accuracy
- Recall@10 vs `ef_search` values
- Ground truth comparison
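Recall@10 measures what fraction of the true 10 nearest neighbors, found by exact brute-force search, the index actually returns. A minimal sketch, assuming squared-L2 ranking and results given as vector IDs; `exact_knn` and `recall_at_k` are hypothetical helpers, not functions from the benchmark suite.

```rust
use std::collections::HashSet;

/// Exact k-NN by brute force: IDs of the k rows with the smallest squared L2 distance.
pub fn exact_knn(query: &[f32], data: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = data
        .iter()
        .enumerate()
        .map(|(id, row)| {
            let d: f32 = query.iter().zip(row).map(|(a, b)| (a - b) * (a - b)).sum();
            (d, id)
        })
        .collect();
    // total_cmp gives f32 a total order; ties are broken by ID for determinism.
    scored.sort_by(|x, y| x.0.total_cmp(&y.0).then(x.1.cmp(&y.1)));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}

/// Recall@k = |approx ∩ truth| / k, using the first k entries of each list.
pub fn recall_at_k(approx: &[usize], truth: &[usize], k: usize) -> f64 {
    assert!(k > 0, "k must be positive");
    let truth_set: HashSet<usize> = truth.iter().take(k).copied().collect();
    let hits = approx.iter().take(k).filter(|id| truth_set.contains(*id)).count();
    hits as f64 / k as f64
}
```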
#### Memory Usage
- Index size vs dataset size
- Memory per vector overhead

**Dimensions tested**: 128, 384, 768, 1536
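A back-of-envelope estimate is useful for sanity-checking measured index sizes. The sketch below assumes an hnswlib-style layout, which may not match ruvector's: raw `f32` vectors stored alongside the graph, 4-byte neighbor IDs, up to `2*M` links on layer 0 and up to `M` on each upper layer, and the level multiplier `1/ln(M)` suggested in the HNSW paper, under which a node appears on `1/(M-1)` upper layers on average. `hnsw_bytes_per_vector` is a hypothetical helper and ignores per-node headers and allocator overhead.

```rust
/// Rough bytes per vector for an HNSW index, under an assumed hnswlib-style layout.
pub fn hnsw_bytes_per_vector(dim: usize, m: usize) -> f64 {
    assert!(m >= 2, "M must be at least 2");
    let m = m as f64;
    let id_bytes = 4.0; // assumed u32 neighbor IDs
    let vector_bytes = dim as f64 * 4.0; // raw f32 components
    let layer0_links = 2.0 * m * id_bytes; // up to 2*M neighbors on layer 0
    // With level multiplier 1/ln(M), P(level >= l) = M^-l, so a node sits on
    // sum_{l>=1} M^-l = 1/(M-1) upper layers on average, each with up to M links.
    let upper_links = m * id_bytes / (m - 1.0);
    vector_bytes + layer0_links + upper_links
}
```

For example, 768 dimensions with `M = 16` gives about 3,204 bytes per vector, roughly 3.2 GB for 1M vectors, almost all of it the raw vectors.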
### 3. Quantization Benchmarks (`quantization_bench.rs`)
Tests vector compression and quantized search:
#### Scalar Quantization (SQ8)
- Encoding/decoding speed
- Distance calculation speedup
- Recall vs exact search
- Memory reduction (4x compression)
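SQ8 stores each `f32` component as one byte, which is where the 4x figure comes from (32 bits down to 8). A minimal sketch, assuming a single min/max range per vector (implementations may use per-dimension ranges instead); `Sq8` is a hypothetical type.

```rust
/// A scalar-quantized (SQ8) vector: one byte per component plus the range to decode it.
pub struct Sq8 {
    pub codes: Vec<u8>,
    pub min: f32,
    pub scale: f32, // (max - min) / 255
}

impl Sq8 {
    /// Map [min, max] linearly onto 0..=255 and round each component to the nearest step.
    pub fn encode(v: &[f32]) -> Sq8 {
        let min = v.iter().copied().fold(f32::INFINITY, f32::min);
        let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        // A constant (or empty) vector has no range; any nonzero scale works.
        let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
        let codes = v
            .iter()
            .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
            .collect();
        Sq8 { codes, min, scale }
    }

    /// Reconstruct approximate f32 values; per-component error is about scale / 2.
    pub fn decode(&self) -> Vec<f32> {
        self.codes.iter().map(|&c| self.min + c as f32 * self.scale).collect()
    }
}
```

Storing `min` and `scale` adds 8 bytes per vector, so the effective ratio is slightly below 4x, most noticeably at small dimensions.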
#### Binary Quantization
- Encoding speed
- Hamming distance calculation (SIMD)
- Massive compression (32x for f32)
- Re-ranking strategies
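Binary quantization keeps only the sign of each component, one bit instead of 32, and compares codes by Hamming distance: XOR the packed words and count the set bits. `u64::count_ones` compiles to a hardware popcount instruction when the target enables one (for example, x86-64 with the `popcnt` feature). Re-ranking is shown in its simplest form: shortlist by Hamming distance, then re-score the shortlist with exact distances on the original vectors. A minimal sketch; the function names and the zero threshold are assumptions.

```rust
/// Pack the sign of each component into bits (1 if x > 0.0), 64 components per word.
pub fn binarize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance: number of differing bits between two packed codes.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "codes must have the same length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Shortlist `candidates` rows by Hamming distance, then re-rank that shortlist
/// by exact squared L2 distance on the original vectors and keep the best `k`.
pub fn search_with_rerank(
    query: &[f32],
    data: &[Vec<f32>],
    codes: &[Vec<u64>],
    candidates: usize,
    k: usize,
) -> Vec<usize> {
    let q_code = binarize(query);
    let mut ids: Vec<usize> = (0..data.len()).collect();
    ids.sort_by_key(|&i| hamming(&q_code, &codes[i]));
    ids.truncate(candidates);
    let sq_l2 = |i: usize| -> f32 {
        query.iter().zip(&data[i]).map(|(a, b)| (a - b) * (a - b)).sum()
    };
    ids.sort_by(|&i, &j| sq_l2(i).total_cmp(&sq_l2(j)));
    ids.truncate(k);
    ids
}
```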
#### Product Quantization (PQ)
- ADC (Asymmetric Distance Computation)
- SIMD vs scalar lookup
- Configurable compression ratios

**Key metrics**:
- Speedup vs exact search
- Recall@10 accuracy
- Compression ratio
- Throughput improvement
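The ADC step listed under Product Quantization keeps the query uncompressed: for each subspace it precomputes the squared distance from the query sub-vector to every centroid, after which scoring a database vector costs one table lookup and one addition per subspace. A minimal sketch, assuming `m` equal-width subspaces, at most 256 centroids per subspace (one `u8` code each), and codebooks that have already been trained (for example, by k-means); all names are hypothetical.

```rust
/// Squared L2 distance between two equal-length slices.
fn sq_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// `codebooks[s][c]` is centroid `c` of subspace `s`; each centroid has length dim / m.
/// Returns one code (index of the nearest centroid) per subspace.
pub fn pq_encode(v: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let sub = v.len() / codebooks.len();
    codebooks
        .iter()
        .enumerate()
        .map(|(s, book)| {
            let part = &v[s * sub..(s + 1) * sub];
            let (best, _) = book
                .iter()
                .enumerate()
                .map(|(c, centroid)| (c, sq_l2(part, centroid)))
                .min_by(|a, b| a.1.total_cmp(&b.1))
                .expect("codebook must not be empty");
            best as u8 // assumes at most 256 centroids per subspace
        })
        .collect()
}

/// ADC lookup table: table[s][c] = squared distance from query sub-vector s to centroid c.
pub fn adc_table(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let sub = query.len() / codebooks.len();
    codebooks
        .iter()
        .enumerate()
        .map(|(s, book)| {
            let part = &query[s * sub..(s + 1) * sub];
            book.iter().map(|centroid| sq_l2(part, centroid)).collect()
        })
        .collect()
}

/// Approximate squared L2 distance to an encoded vector: m lookups and additions.
pub fn adc_distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
    code.iter().enumerate().map(|(s, &c)| table[s][c as usize]).sum()
}
```

Because the table is built once per query, its cost is amortized over every vector scanned, and the remaining inner loop is a natural target for SIMD lookups.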
### 4. SQL Workload Benchmarks
Realistic PostgreSQL scenarios:
#### Quick Benchmark (`quick_benchmark.sql`)
- 10,000 vectors, 768 dimensions
- Sequential scan baseline
- HNSW index build
- Index search performance
- Distance function comparisons
#### Full Workload (`benchmark_workload.sql`)
- 1,000,000 vectors, 1536 dimensions
- 1,000 queries for statistical significance
- P50, P99 latency measurements
- Memory usage analysis
- Recall accuracy testing
- ruvector vs pgvector comparison
## Understanding Results
### Criterion Output
```
Distance/euclidean/scalar/768
                        time:   [2.1234 µs 2.1456 µs 2.1678 µs]
                        thrpt:  [354.23 Melem/s 357.89 Melem/s 361.55 Melem/s]
```
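- **time**: Lower bound, point estimate, and upper bound of the confidence interval (95% by default) for the time per iteration
- **thrpt**: The same interval expressed as throughput. `Melem/s` counts elements per second, which here appear to be vector components rather than distance computations (768 elements / 2.1456 µs ≈ 358 M/s); divide by the dimension to get computations per second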
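Converting between element throughput and distance computations per second is simple arithmetic. A small sketch, assuming the benchmark declares `criterion::Throughput::Elements` with the vector dimension, so that each element is one vector component; `throughput` is a hypothetical helper.

```rust
/// Convert a time-per-iteration estimate (in microseconds) into throughput figures.
/// Returns (million elements per second, distance computations per second).
pub fn throughput(dim: u64, time_us: f64) -> (f64, f64) {
    let secs = time_us * 1e-6;
    let melem_per_s = dim as f64 / secs / 1e6; // each iteration touches `dim` elements
    let ops_per_s = 1.0 / secs; // one distance computation per iteration
    (melem_per_s, ops_per_s)
}
```

For the 768-dimension example this works out to about 357.9 Melem/s, or roughly 466,000 distance computations per second.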
### Comparing Implementations
```bash
# Set baseline
cargo bench --bench distance_bench -- --save-baseline main
# Make changes, then compare
cargo bench --bench distance_bench -- --baseline main
```
### SQL Benchmark Interpretation
```
 p50_ms | p99_ms | avg_ms | min_ms | max_ms
--------+--------+--------+--------+--------
  0.856 |  1.234 |  0.912 |  0.654 |  2.456
```
- **p50**: Median latency (50th percentile)
- **p99**: 99th percentile latency; 99% of queries finish at or below this value, and only the slowest 1% exceed it
- **avg**: Average latency
- **min/max**: Best and worst case
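The percentile columns can be reproduced from raw per-query latencies. A minimal sketch using the nearest-rank method; this is an assumption about how the SQL scripts compute them, and PostgreSQL's `percentile_cont` interpolates between samples, so its results can differ slightly on small samples. `percentile` is a hypothetical helper.

```rust
/// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
pub fn percentile(samples: &[f64], p: f64) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    assert!((0.0..=100.0).contains(&p), "p must be in [0, 100]");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // 1-based rank; p = 0 maps to the minimum.
    let rank = (p * sorted.len() as f64 / 100.0).ceil().max(1.0) as usize;
    sorted[rank - 1]
}
```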
## Performance Targets
### Distance Functions
| Operation | Dimension | Target Throughput |
|-----------|-----------|-------------------|
| L2 (SIMD) | 768 | > 400 Mops/s |
| L2 (SIMD) | 1536 | > 200 Mops/s |
| Cosine | 768 | > 300 Mops/s |
| Inner Product | 768 | > 500 Mops/s |
### HNSW Index
| Dataset Size | Build Time | Search Latency | Recall@10 |
|--------------|------------|----------------|-----------|
| 100K | < 30s | < 1ms | > 0.95 |
| 1M | < 5min | < 2ms | > 0.95 |
| 10M | < 1hr | < 5ms | > 0.90 |
### Quantization
| Method | Compression | Speedup | Recall@10 |
|---------|-------------|---------|-----------|
| SQ8 | 4x | 2-3x | > 0.95 |
| Binary | 32x | 10-20x | > 0.85 |
| PQ(8) | 16x | 5-10x | > 0.90 |
## Continuous Integration
The GitHub Actions workflow runs automatically on:
- Pull requests touching benchmark code
- Pushes to `main` and `develop` branches
- Manual workflow dispatch

Results are:
- Posted as PR comments
- Stored as artifacts (30-day retention)
- Tracked over time on main branch
- Compared against baseline
### Triggering Manual Runs
```bash
# From GitHub UI: Actions → Benchmarks → Run workflow
# Or using gh CLI
gh workflow run benchmarks.yml
```
### Enabling SQL Benchmarks in CI
SQL benchmarks are disabled by default because they are too slow for routine CI runs. Enable them via workflow dispatch:
```bash
gh workflow run benchmarks.yml -f run_sql_benchmarks=true
```
## Advanced Usage
### Profiling with Criterion
```bash
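# Run each benchmark for ~5 s without analysis or saved results, e.g. under an
# external profiler (a flamegraph needs a profiler such as perf or a Criterion profiler hook)
cargo bench --bench distance_bench -- --profile-time=5
# Print results in the libtest-style "bencher" output format
cargo bench --bench distance_bench -- --output-format bencher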
```
### Custom Benchmark Parameters
Edit benchmark files to adjust:
- Vector dimensions
- Dataset sizes
- Number of queries
- HNSW parameters (M, ef_construction, ef_search)
- Quantization settings
### Comparing with pgvector
Ensure pgvector is installed:
```bash
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
```
Then run SQL benchmarks for side-by-side comparison.
## Interpreting Regressions
### Performance Degradation Alert
If CI fails due to performance regression:
1. **Check the comparison**: Review the baseline vs current results
2. **Validate the change**: Ensure it's not due to measurement noise
3. **Profile the code**: Use flamegraphs to identify bottlenecks
4. **Consider trade-offs**: A slowdown can be acceptable when it is the price of a correctness fix
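Step 2 can be made more mechanical by comparing confidence intervals instead of point estimates. A minimal sketch, assuming you have the lower and upper bounds printed on the `time:` line for both the baseline and the current run; the `Interval` type, the `is_regression` helper, and the 5% threshold are hypothetical, not part of the CI workflow.

```rust
/// Confidence interval for a benchmark's time estimate, in nanoseconds.
#[derive(Clone, Copy, Debug)]
pub struct Interval {
    pub lower: f64,
    pub upper: f64,
}

/// Flag a regression only when the whole current interval lies above the baseline
/// interval inflated by `threshold` (0.05 = 5%). Overlapping intervals count as noise.
pub fn is_regression(baseline: Interval, current: Interval, threshold: f64) -> bool {
    current.lower > baseline.upper * (1.0 + threshold)
}
```

This check is deliberately conservative. Criterion's own `--baseline` comparison already applies a statistical test plus a noise threshold, so treat this as a supplementary filter for CI alerts.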
### Common Causes
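- **SIMD disabled**: Check compiler flags and target features (for example, whether `-C target-cpu` or `-C target-feature` enables AVX2 on the benchmark machine)
- **Debug build**: `cargo bench` already uses an optimized profile, but make sure the extension installed for the SQL benchmarks was also built in release mode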
- **Thermal throttling**: CPU overheating in CI
- **Cache effects**: Different data access patterns
## Contributing
When adding benchmarks:
1. Add to appropriate `*_bench.rs` file
2. Update this README
3. Ensure benchmarks complete in < 5 minutes
4. Wrap inputs and results in `black_box()` so the compiler cannot optimize away the code being measured
5. Test both small and large inputs
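Without `black_box`, the optimizer can see that a benchmarked result is unused, or that its inputs are constants, and remove the work, producing impossibly fast timings. The sketch below uses the standard library's `std::hint::black_box` (stable since Rust 1.66) in a hand-rolled timing loop purely to show where the calls go; `time_dot` is a hypothetical helper, not a Criterion API.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Time `iters` dot products and return the last result with the elapsed time.
pub fn time_dot(a: &[f32], b: &[f32], iters: u32) -> (f32, Duration) {
    let mut last = 0.0;
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the inputs hides their values from the optimizer, and
        // black_box on the output forces the result to be treated as used.
        last = black_box(dot(black_box(a), black_box(b)));
    }
    (last, start.elapsed())
}
```

Per-iteration time is then `elapsed / iters`; Criterion's harness adds warm-up, sampling, and outlier analysis on top of this basic pattern.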
## Resources
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [HNSW Paper](https://arxiv.org/abs/1603.09320)
- [Product Quantization Paper](https://ieeexplore.ieee.org/document/5432202)
- [pgvector Repository](https://github.com/pgvector/pgvector)
## License
Same as ruvector project - MIT