RuVector Benchmark Suite

Benchmarks comparing ruvector with pgvector across distance functions, HNSW indexing, quantization, and SQL workloads.

Overview

This benchmark suite provides:

  1. Rust Benchmarks - Low-level performance testing using Criterion
  2. SQL Benchmarks - Realistic PostgreSQL workload testing
  3. Automated CI - GitHub Actions workflow for continuous benchmarking

Quick Start

Run All Benchmarks

cd crates/ruvector-postgres
bash benches/scripts/run_benchmarks.sh

Run Individual Benchmarks

# Distance function benchmarks
cargo bench --bench distance_bench

# HNSW index benchmarks
cargo bench --bench index_bench

# Quantization benchmarks
cargo bench --bench quantization_bench

# Quantized distance benchmarks
cargo bench --bench quantized_distance_bench

Run SQL Benchmarks

# Setup database
createdb ruvector_bench
psql -d ruvector_bench -c 'CREATE EXTENSION ruvector;'
# pgvector registers its extension under the name "vector"
psql -d ruvector_bench -c 'CREATE EXTENSION vector;'

# Quick benchmark (10k vectors)
psql -d ruvector_bench -f benches/sql/quick_benchmark.sql

# Full workload (1M vectors)
psql -d ruvector_bench -f benches/sql/benchmark_workload.sql

Benchmark Categories

1. Distance Function Benchmarks (distance_bench.rs)

Tests distance calculation performance across different vector dimensions:

  • L2 (Euclidean) Distance: Scalar vs SIMD implementations
  • Cosine Distance: Normalized similarity measurement
  • Inner Product: Dot product for maximum inner product search
  • Batch Operations: Sequential vs parallel processing

Dimensions tested: 128, 384, 768, 1536, 3072

Key metrics:

  • Single operation latency
  • Throughput (ops/sec)
  • SIMD speedup vs scalar
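
For reference, the three metrics listed above can be written as scalar baselines along these lines. This is a minimal std-only sketch, not the code in distance_bench.rs; the function names are illustrative assumptions, and the SIMD variants the suite measures are not shown.

```rust
/// Euclidean (L2) distance between two vectors of the same dimension.
pub fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Inner (dot) product, the score used for maximum inner product search.
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// Cosine distance: 1 minus the cosine of the angle between the vectors.
/// Returns NaN if either vector has zero length (norm of 0).
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot = inner_product(a, b);
    let norm_a = inner_product(a, a).sqrt();
    let norm_b = inner_product(b, b).sqrt();
    1.0 - dot / (norm_a * norm_b)
}
```

A scalar loop like this is the baseline against which a SIMD speedup is typically reported.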

2. HNSW Index Benchmarks (index_bench.rs)

Tests the Hierarchical Navigable Small World (HNSW) graph index:

Build Benchmarks

  • Index construction time vs dataset size (1K, 10K, 100K, 1M vectors)
  • Impact of ef_construction parameter (16, 32, 64, 128, 256)
  • Impact of M parameter (8, 12, 16, 24, 32, 48)

Search Benchmarks

  • Query latency vs dataset size
  • Impact of ef_search parameter (10, 20, 40, 80, 160, 320)
  • Impact of k (number of neighbors: 1, 5, 10, 20, 50, 100)

Recall Accuracy

  • Recall@10 vs ef_search values
  • Ground truth comparison
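
Recall@k is the fraction of the true k nearest neighbors that the approximate search returns. A minimal sketch of that calculation follows; the function name and the use of u64 IDs are assumptions for illustration, and it assumes each result list contains unique IDs.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k neighbor IDs that also appear in the
/// approximate top-k result. Assumes IDs within each list are unique.
pub fn recall_at_k(ground_truth: &[u64], approx: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = ground_truth.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}
```

The ground-truth list would normally come from an exact (sequential scan) search over the same data.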

Memory Usage

  • Index size vs dataset size
  • Memory per vector overhead

Dimensions tested: 128, 384, 768, 1536

3. Quantization Benchmarks (quantization_bench.rs)

Tests vector compression and quantized search:

Scalar Quantization (SQ8)

  • Encoding/decoding speed
  • Distance calculation speedup
  • Recall vs exact search
  • Memory reduction (4x compression)
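
The 4x figure follows from storing each 4-byte f32 component as a single byte. One common way to do this is min/max scaling per vector, sketched below; the struct and function names are hypothetical, and the actual ruvector encoding may differ.

```rust
/// A vector compressed to one byte per component (illustrative layout).
pub struct Sq8 {
    pub min: f32,
    pub scale: f32,
    pub codes: Vec<u8>,
}

/// Map each component linearly from [min, max] onto 0..=255.
pub fn sq8_encode(v: &[f32]) -> Sq8 {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    // Avoid dividing by zero when all components are equal.
    let scale = if range > 0.0 { range / 255.0 } else { 1.0 };
    let codes = v
        .iter()
        .map(|&x| ((x - min) / scale).round() as u8)
        .collect();
    Sq8 { min, scale, codes }
}

/// Reconstruct an approximation of the original vector.
pub fn sq8_decode(q: &Sq8) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

The reconstruction error per component is bounded by roughly half of `scale`, which is why recall is measured against exact search.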

Binary Quantization

  • Encoding speed
  • Hamming distance calculation (SIMD)
  • Massive compression (32x for f32)
  • Re-ranking strategies
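
Binary quantization keeps one bit per f32 component, which is where the 32x ratio comes from. The sketch below assumes a sign-based threshold (bit set when the component is positive) and packs bits into u64 words; both choices and the function names are illustrative assumptions rather than the suite's implementation.

```rust
/// Pack one bit per component into u64 words: bit i is set when v[i] > 0.
pub fn binarize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance: number of bit positions where the two codes differ.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "codes must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x ^ y).count_ones())
        .sum::<u32>()
}
```

A 768-dimensional vector becomes 12 words (96 bytes) instead of 3,072 bytes. Because the codes are coarse, candidates found by Hamming distance are usually re-ranked with an exact distance.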

Product Quantization (PQ)

  • ADC (Asymmetric Distance Computation)
  • SIMD vs scalar lookup
  • Configurable compression ratios
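
In asymmetric distance computation, the query stays uncompressed: its distance to every centroid of every subspace is computed once, and each encoded vector's distance is then a sum of table lookups. The following is a minimal scalar sketch under assumed data layouts (`codebooks[j][c]` is centroid `c` of subspace `j`, one u8 code per subspace); it is not the suite's implementation.

```rust
/// Build the lookup table of squared L2 distances between each query
/// sub-vector and each centroid of the matching subspace.
pub fn adc_table(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let m = codebooks.len();
    assert!(m > 0 && query.len() % m == 0, "dimension must split evenly");
    let sub_dim = query.len() / m;
    codebooks
        .iter()
        .enumerate()
        .map(|(j, centroids)| {
            let q_sub = &query[j * sub_dim..(j + 1) * sub_dim];
            centroids
                .iter()
                .map(|c| {
                    q_sub
                        .iter()
                        .zip(c)
                        .map(|(x, y)| (x - y) * (x - y))
                        .sum::<f32>()
                })
                .collect()
        })
        .collect()
}

/// Approximate squared distance to an encoded vector: one lookup per subspace.
pub fn adc_distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
    assert_eq!(table.len(), code.len(), "one code byte per subspace");
    table
        .iter()
        .zip(code)
        .map(|(row, &c)| row[c as usize])
        .sum::<f32>()
}
```

The table is built once per query, so scanning many encoded vectors costs only `m` lookups and additions each.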

Key metrics:

  • Speedup vs exact search
  • Recall@10 accuracy
  • Compression ratio
  • Throughput improvement

4. SQL Workload Benchmarks

Realistic PostgreSQL scenarios:

Quick Benchmark (quick_benchmark.sql)

  • 10,000 vectors, 768 dimensions
  • Sequential scan baseline
  • HNSW index build
  • Index search performance
  • Distance function comparisons

Full Workload (benchmark_workload.sql)

  • 1,000,000 vectors, 1536 dimensions
  • 1,000 queries for statistical significance
  • P50, P99 latency measurements
  • Memory usage analysis
  • Recall accuracy testing
  • ruvector vs pgvector comparison

Understanding Results

Criterion Output

Distance/euclidean/scalar/768
                        time:   [2.1234 µs 2.1456 µs 2.1678 µs]
                        thrpt: [354.23 Melem/s 357.89 Melem/s 361.55 Melem/s]
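
  • time: Estimated time per iteration, shown as the lower bound, point estimate, and upper bound of the confidence interval
  • thrpt: Throughput over the same interval; in this example, elements (vector components) processed per second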
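
The two lines are related by simple arithmetic. Assuming the benchmark declares the vector dimension (768 here) as its element count, the sample output works out as in this small sketch; the helper names are illustrative.

```rust
/// Millions of elements per second for one iteration that processes
/// `elements` items in `time_us` microseconds.
pub fn melem_per_sec(elements: u64, time_us: f64) -> f64 {
    // Elements per microsecond is numerically equal to Melem/s.
    elements as f64 / time_us
}

/// Whole operations (distance calls) per second at the same latency.
pub fn ops_per_sec(time_us: f64) -> f64 {
    1_000_000.0 / time_us
}
```

With the sample estimate of 2.1456 µs, `melem_per_sec(768, 2.1456)` is roughly 358 Melem/s, in line with the thrpt line, and `ops_per_sec(2.1456)` is roughly 466,000 distance computations per second.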

Comparing Implementations

# Set baseline
cargo bench --bench distance_bench -- --save-baseline main

# Make changes, then compare
cargo bench --bench distance_bench -- --baseline main

SQL Benchmark Interpretation

 p50_ms | p99_ms | avg_ms | min_ms | max_ms
--------+--------+--------+--------+--------
  0.856 |  1.234 |  0.912 |  0.654 |  2.456

  • p50: Median latency; half of the queries finish at or below this value
  • p99: 99th percentile latency; only the slowest 1% of queries exceed it
  • avg: Mean latency across all queries
  • min/max: Fastest and slowest single query
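
To make the percentile columns concrete, here is a minimal sketch using the nearest-rank method on a list of per-query latencies. The function names are illustrative, and the SQL scripts may compute percentiles differently (for example with interpolation), so small differences from their output are expected.

```rust
/// Return a sorted copy of the latency samples (milliseconds).
pub fn sorted_latencies(samples_ms: &[f64]) -> Vec<f64> {
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(f64::total_cmp);
    sorted
}

/// Nearest-rank percentile of an already sorted, non-empty slice.
/// `p` is in the range 0..=100.
pub fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    assert!(!sorted_ms.is_empty(), "need at least one sample");
    assert!((0.0..=100.0).contains(&p), "p must be between 0 and 100");
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    // Rank is 1-based; clamp so p = 0 maps to the smallest sample.
    sorted_ms[rank.max(1) - 1]
}
```

A large gap between p50 and p99 usually points to occasional slow queries rather than a uniformly slow index.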

Performance Targets

Distance Functions

 Operation     | Dimension | Target Throughput
---------------+-----------+-------------------
 L2 (SIMD)     |       768 | > 400 Mops/s
 L2 (SIMD)     |      1536 | > 200 Mops/s
 Cosine        |       768 | > 300 Mops/s
 Inner Product |       768 | > 500 Mops/s

HNSW Index

 Dataset Size | Build Time | Search Latency | Recall@10
--------------+------------+----------------+-----------
 100K         | < 30s      | < 1ms          | > 0.95
 1M           | < 5min     | < 2ms          | > 0.95
 10M          | < 1hr      | < 5ms          | > 0.90

Quantization

 Method | Compression | Speedup | Recall@10
--------+-------------+---------+-----------
 SQ8    | 4x          | 2-3x    | > 0.95
 Binary | 32x         | 10-20x  | > 0.85
 PQ(8)  | 16x         | 5-10x   | > 0.90

Continuous Integration

The GitHub Actions workflow runs automatically on:

  • Pull requests touching benchmark code
  • Pushes to main and develop branches
  • Manual workflow dispatch

Results are:

  • Posted as PR comments
  • Stored as artifacts (30 day retention)
  • Tracked over time on main branch
  • Compared against baseline

Triggering Manual Runs

# From GitHub UI: Actions → Benchmarks → Run workflow

# Or using gh CLI
gh workflow run benchmarks.yml

Enabling SQL Benchmarks in CI

SQL benchmarks are disabled by default because they are too slow for routine CI runs. Enable them via workflow dispatch:

gh workflow run benchmarks.yml -f run_sql_benchmarks=true

Advanced Usage

Profiling with Criterion

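# Run each benchmark for about 5 seconds with no analysis, for use under a
# profiler (a flamegraph still requires a profiler to be attached or configured)
cargo bench --bench distance_bench -- --profile-time=5

# Print results in the bencher (libtest-style) output format
cargo bench --bench distance_bench -- --output-format bencher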

Custom Benchmark Parameters

Edit benchmark files to adjust:

  • Vector dimensions
  • Dataset sizes
  • Number of queries
  • HNSW parameters (M, ef_construction, ef_search)
  • Quantization settings

Comparing with pgvector

Ensure pgvector is installed:

git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

Then run SQL benchmarks for side-by-side comparison.

Interpreting Regressions

Performance Degradation Alert

If CI fails due to performance regression:

  1. Check the comparison: Review the baseline vs current results
  2. Validate the change: Ensure it's not due to measurement noise
  3. Profile the code: Use flamegraphs to identify bottlenecks
  4. Consider trade-offs: A correctness fix can justify a slower result

Common Causes

  • SIMD disabled: Check compiler flags
  • Debug build: Ensure the extension is built with --release (cargo bench already uses an optimized profile)
  • Thermal throttling: CPU overheating in CI
  • Cache effects: Different data access patterns

Contributing

When adding benchmarks:

  1. Add to appropriate *_bench.rs file
  2. Update this README
  3. Ensure benchmarks complete in < 5 minutes
  4. Use black_box() to keep the compiler from optimizing away the code being measured
  5. Test both small and large inputs
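
The black_box() guideline can be illustrated without Criterion. The sketch below uses only the standard library's `std::hint::black_box` and a hand-rolled timing loop; the measured function and helper names are made up for illustration, and a real benchmark in this suite would use Criterion's harness instead.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Example workload: sum of squared components.
pub fn sum_of_squares(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>()
}

/// Average wall-clock time per call over `iters` iterations.
pub fn time_per_call(input: &[f32], iters: u32) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the input hides its value from the optimizer;
        // black_box on the result keeps the call from being removed.
        black_box(sum_of_squares(black_box(input)));
    }
    start.elapsed() / iters
}
```

Without the two black_box calls, an optimizing build may hoist or delete the loop body, and the benchmark would report an unrealistically small time.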

Resources

License

MIT, the same as the ruvector project.