# RuVector Benchmark Suite

Comprehensive benchmarks comparing ruvector vs pgvector across multiple dimensions.

## Overview

This benchmark suite provides:

1. **Rust Benchmarks** - Low-level performance testing using Criterion
2. **SQL Benchmarks** - Realistic PostgreSQL workload testing
3. **Automated CI** - GitHub Actions workflow for continuous benchmarking

## Quick Start

### Run All Benchmarks

```bash
cd crates/ruvector-postgres
bash benches/scripts/run_benchmarks.sh
```

### Run Individual Benchmarks

```bash
# Distance function benchmarks
cargo bench --bench distance_bench

# HNSW index benchmarks
cargo bench --bench index_bench

# Quantization benchmarks
cargo bench --bench quantization_bench

# Quantized distance benchmarks
cargo bench --bench quantized_distance_bench
```

### Run SQL Benchmarks

```bash
# Setup database
createdb ruvector_bench
psql -d ruvector_bench -c 'CREATE EXTENSION ruvector;'
psql -d ruvector_bench -c 'CREATE EXTENSION vector;'  # pgvector's extension is named "vector"

# Quick benchmark (10k vectors)
psql -d ruvector_bench -f benches/sql/quick_benchmark.sql

# Full workload (1M vectors)
psql -d ruvector_bench -f benches/sql/benchmark_workload.sql
```

## Benchmark Categories

### 1. Distance Function Benchmarks (`distance_bench.rs`)

Tests distance calculation performance across different vector dimensions:

- **L2 (Euclidean) Distance**: Scalar vs SIMD implementations
- **Cosine Distance**: Normalized similarity measurement
- **Inner Product**: Dot product for maximum inner product search
- **Batch Operations**: Sequential vs parallel processing

**Dimensions tested**: 128, 384, 768, 1536, 3072

**Key metrics**:
- Single operation latency
- Throughput (ops/sec)
- SIMD speedup vs scalar
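
When checking results or reading SIMD speedup numbers, it helps to have a plain scalar reference for each metric. The sketch below is a minimal Rust version, assuming `f32` vectors of equal length. The function names are hypothetical, and this is not the crate's SIMD implementation. The batch variant splits the dataset across threads with `std::thread::scope` (Rust 1.63+), which is one way to model the sequential vs parallel comparison.

```rust
use std::thread;

/// Euclidean (L2) distance, scalar reference version.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Inner (dot) product; larger means more similar in maximum inner product search.
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine distance = 1 - cos(angle). Zero-norm inputs return 1.0 (an assumption).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let norms = inner_product(a, a).sqrt() * inner_product(b, b).sqrt();
    if norms == 0.0 {
        return 1.0;
    }
    1.0 - inner_product(a, b) / norms
}

/// Distances from one query to many vectors, split across `threads` workers.
fn batch_l2_parallel(query: &[f32], vectors: &[Vec<f32>], threads: usize) -> Vec<f32> {
    let threads = threads.max(1);
    // Ceiling division; max(1) because chunks(0) would panic on an empty input.
    let chunk = ((vectors.len() + threads - 1) / threads).max(1);
    let mut out = vec![0.0f32; vectors.len()];
    thread::scope(|s| {
        for (vs, os) in vectors.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for (v, o) in vs.iter().zip(os.iter_mut()) {
                    *o = l2_distance(query, v);
                }
            });
        }
    });
    out
}

fn main() {
    let a = [1.0f32, 0.0, 0.0];
    let b = [0.0f32, 1.0, 0.0];
    println!("l2 = {}", l2_distance(&a, &b));
    println!("ip = {}", inner_product(&a, &b));
    println!("cos = {}", cosine_distance(&a, &b));
    let data: Vec<Vec<f32>> = (0..8).map(|i| vec![i as f32; 3]).collect();
    println!("batch = {:?}", batch_l2_parallel(&a, &data, 4));
}
```

Comparing SIMD output against a scalar reference like this (within a small tolerance, since a different summation order changes rounding) is a cheap correctness check to pair with the speed numbers.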

### 2. HNSW Index Benchmarks (`index_bench.rs`)

Tests Hierarchical Navigable Small World graph index:

#### Build Benchmarks
- Index construction time vs dataset size (1K, 10K, 100K, 1M vectors)
- Impact of `ef_construction` parameter (16, 32, 64, 128, 256)
- Impact of `M` parameter (8, 12, 16, 24, 32, 48)

#### Search Benchmarks
- Query latency vs dataset size
- Impact of `ef_search` parameter (10, 20, 40, 80, 160, 320)
- Impact of `k` (number of neighbors: 1, 5, 10, 20, 50, 100)

#### Recall Accuracy
- Recall@10 vs `ef_search` values
- Ground truth comparison

#### Memory Usage
- Index size vs dataset size
- Memory per vector overhead

**Dimensions tested**: 128, 384, 768, 1536
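
Recall@10 compares the IDs returned by the index with exact (brute-force) results. A minimal sketch, assuming squared L2 as the metric and `u64` row IDs (both assumptions; the function names are hypothetical):

```rust
use std::collections::HashSet;

/// Exact top-k row IDs by squared L2 distance (brute force), used as ground truth.
fn exact_top_k(query: &[f32], data: &[Vec<f32>], k: usize) -> Vec<u64> {
    let mut scored: Vec<(f32, u64)> = data
        .iter()
        .enumerate()
        .map(|(id, v)| {
            let d: f32 = query.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
            (d, id as u64)
        })
        .collect();
    scored.sort_by(|x, y| x.0.total_cmp(&y.0));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}

/// Recall@k: fraction of the true top-k IDs present in the first k approximate results.
fn recall_at_k(approx: &[u64], exact: &[u64], k: usize) -> f64 {
    let k = k.min(exact.len());
    assert!(k > 0, "need at least one ground-truth neighbor");
    let truth: HashSet<u64> = exact[..k].iter().copied().collect();
    let hits = approx.iter().take(k).filter(|id| truth.contains(*id)).count();
    hits as f64 / k as f64
}

fn main() {
    let data: Vec<Vec<f32>> = vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![5.0, 5.0]];
    let truth = exact_top_k(&[0.9, 0.0], &data, 2);
    // Pretend the index returned IDs 1 and 2: only one of the two true neighbors.
    println!("recall@2 = {}", recall_at_k(&[1, 2], &truth, 2));
}
```

Averaging `recall_at_k` over all benchmark queries gives the Recall@10 figure. Ground truth only needs computing once per query set, since brute force costs O(n·d) per query.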

### 3. Quantization Benchmarks (`quantization_bench.rs`)

Tests vector compression and quantized search:

#### Scalar Quantization (SQ8)
- Encoding/decoding speed
- Distance calculation speedup
- Recall vs exact search
- Memory reduction (4x compression)
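
One common SQ8 scheme maps each `f32` component to a byte using the vector's own min/max range, which is where the 4x figure comes from (1 byte instead of 4 per dimension). The sketch assumes per-vector ranges; the crate may use per-dimension or global ranges instead, and `Sq8`, `sq8_encode`, and `sq8_decode` are hypothetical names.

```rust
/// SQ8-encoded vector: one byte per dimension plus the range needed to decode it.
struct Sq8 {
    codes: Vec<u8>,
    min: f32,
    scale: f32, // width of one quantization step: (max - min) / 255
}

fn sq8_encode(v: &[f32]) -> Sq8 {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Constant (or empty) vectors get an arbitrary non-zero step to avoid dividing by zero.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v
        .iter()
        .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
        .collect();
    Sq8 { codes, min, scale }
}

fn sq8_decode(q: &Sq8) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}

fn main() {
    let v = [-1.0f32, 0.0, 0.5, 1.0];
    let q = sq8_encode(&v);
    println!("codes = {:?}", q.codes);
    println!("decoded = {:?}", sq8_decode(&q));
}
```

The decoding error per component is about half a quantization step (`scale / 2`), which bounds how far quantized distances can drift from exact ones.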

#### Binary Quantization
- Encoding speed
- Hamming distance calculation (SIMD)
- 32x compression relative to f32 (1 bit per dimension)
- Re-ranking strategies
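
Binary quantization keeps one bit per dimension, so a 768-dim `f32` vector shrinks from 3,072 to 96 bytes (32x). Hamming distance is then XOR plus popcount, and a re-ranking pass recomputes exact distances for a short candidate list. A minimal sketch; the sign-bit threshold, the shortlist size, and all names are assumptions, not necessarily what the crate does:

```rust
/// Binary quantization: one bit per dimension (set when the component is positive,
/// an assumed threshold), packed into 64-bit words.
fn binary_encode(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance: XOR the words and count the differing bits.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Shortlist by Hamming distance, then re-rank the shortlist with exact L2.
fn search_with_rerank(
    query: &[f32],
    data: &[Vec<f32>],
    codes: &[Vec<u64>],
    k: usize,
    shortlist: usize,
) -> Vec<usize> {
    let q = binary_encode(query);
    let mut candidates: Vec<usize> = (0..data.len()).collect();
    candidates.sort_by_key(|&i| hamming(&q, &codes[i]));
    candidates.truncate(shortlist.max(k));
    candidates.sort_by(|&i, &j| {
        l2_squared(query, &data[i]).total_cmp(&l2_squared(query, &data[j]))
    });
    candidates.truncate(k);
    candidates
}

fn main() {
    let data: Vec<Vec<f32>> = vec![vec![1.0, 1.0], vec![1.0, 0.9], vec![-1.0, -1.0]];
    let codes: Vec<Vec<u64>> = data.iter().map(|v| binary_encode(v)).collect();
    println!("top-1 = {:?}", search_with_rerank(&[1.0, 0.92], &data, &codes, 1, 2));
}
```

`count_ones()` can compile to a single popcount instruction when the target CPU supports it (for example with `-C target-cpu=native` on a recent x86-64 machine), which is where much of the Hamming speedup comes from.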

#### Product Quantization (PQ)
- ADC (Asymmetric Distance Computation)
- SIMD vs scalar lookup
- Configurable compression ratios
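
With ADC, the query stays in full precision while each database vector is stored as `m` one-byte codes, each indexing a codebook of up to 256 centroids for one subspace. Per query, a table of subspace distances is built once; after that, every database vector costs `m` lookups and additions. Storage is `m` bytes per vector versus `4 * dim` for `f32`, so the compression ratio is `4 * dim / m` (codebooks excluded). A minimal scalar sketch, assuming pre-trained codebooks and squared L2; the layout and names are hypothetical, and codebook training (k-means) is omitted:

```rust
/// Per-query lookup table: `table[j][c]` = squared L2 distance between the query's
/// j-th sub-vector and centroid `c` of subspace `j`.
/// Assumed layout: `codebooks[j][c]` is a centroid of length `dim / m`.
fn adc_table(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let m = codebooks.len();
    assert!(m > 0 && query.len() % m == 0, "dim must be divisible by m");
    let sub = query.len() / m;
    codebooks
        .iter()
        .enumerate()
        .map(|(j, centroids)| {
            let q = &query[j * sub..(j + 1) * sub];
            centroids
                .iter()
                .map(|c| q.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum::<f32>())
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// Approximate squared L2 distance to an encoded vector: one lookup per subspace.
fn adc_distance(table: &[Vec<f32>], codes: &[u8]) -> f32 {
    table.iter().zip(codes).map(|(row, &c)| row[c as usize]).sum()
}

fn main() {
    // Toy setup: dim = 4, m = 2 subspaces, 2 centroids each (real PQ uses up to 256).
    let codebooks: Vec<Vec<Vec<f32>>> = vec![
        vec![vec![0.0, 0.0], vec![1.0, 1.0]],
        vec![vec![0.0, 0.0], vec![2.0, 2.0]],
    ];
    let table = adc_table(&[1.0, 1.0, 2.0, 2.0], &codebooks);
    for codes in [[0u8, 0], [1, 0], [1, 1]] {
        println!("{:?} -> {}", codes, adc_distance(&table, &codes));
    }
}
```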

**Key metrics**:
- Speedup vs exact search
- Recall@10 accuracy
- Compression ratio
- Throughput improvement

### 4. SQL Workload Benchmarks

Realistic PostgreSQL scenarios:

#### Quick Benchmark (`quick_benchmark.sql`)
- 10,000 vectors, 768 dimensions
- Sequential scan baseline
- HNSW index build
- Index search performance
- Distance function comparisons

#### Full Workload (`benchmark_workload.sql`)
- 1,000,000 vectors, 1536 dimensions
- 1,000 queries, enough for stable P50/P99 estimates
- P50, P99 latency measurements
- Memory usage analysis
- Recall accuracy testing
- ruvector vs pgvector comparison

## Understanding Results

### Criterion Output

```
Distance/euclidean/scalar/768
                        time:   [2.1234 µs 2.1456 µs 2.1678 µs]
                        thrpt: [354.23 Melem/s 357.89 Melem/s 361.55 Melem/s]
```

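- **time**: Lower bound, point estimate, and upper bound of the confidence interval (95% by default) for the time per iteration
- **thrpt**: Throughput in elements per second (Melem/s = millions of elements per second), derived from the element count the benchmark declares per iteration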
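
Criterion derives `thrpt` from the element count a benchmark declares with `Throughput::Elements(n)`, divided by the time per iteration. Assuming the distance benchmarks declare `n` equal to the vector dimension and run one distance per iteration (the sample output is consistent with this: 768 elements / 2.1456 µs ≈ 358 Melem/s), the conversions look like this sketch; both helper names are hypothetical:

```rust
/// Criterion-style element throughput: elements per iteration / time per iteration.
/// Elements per microsecond is numerically equal to millions of elements per second.
fn melem_per_sec(elements_per_iter: u64, time_per_iter_us: f64) -> f64 {
    elements_per_iter as f64 / time_per_iter_us
}

/// Complete distance computations per second, assuming one iteration = one distance.
fn distances_per_sec(time_per_iter_us: f64) -> f64 {
    1e6 / time_per_iter_us
}

fn main() {
    // Middle estimate from the sample output above (768 dimensions, 2.1456 µs).
    println!("{:.2} Melem/s", melem_per_sec(768, 2.1456));
    println!("{:.0} distances/s", distances_per_sec(2.1456));
}
```

Under those assumptions, roughly 358 Melem/s at 768 dimensions corresponds to about 466,000 distance computations per second.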

### Comparing Implementations

```bash
# Set baseline
cargo bench --bench distance_bench -- --save-baseline main

# Make changes, then compare
cargo bench --bench distance_bench -- --baseline main
```

### SQL Benchmark Interpretation

```
 p50_ms | p99_ms | avg_ms | min_ms | max_ms
--------+--------+--------+--------+--------
  0.856 |  1.234 |  0.912 |  0.654 |  2.456
```

- **p50**: Median latency (50th percentile)
- **p99**: 99th percentile latency: 99% of queries completed at or below this time, so it reflects the slowest 1%
- **avg**: Mean latency
- **min/max**: Fastest and slowest observed query
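
The summary columns can be reproduced from raw per-query latencies. This sketch uses the nearest-rank percentile definition; the SQL scripts may instead use PostgreSQL's `percentile_cont`, which interpolates between samples, so values can differ slightly. `LatencySummary`, `percentile`, and `summarize` are hypothetical names.

```rust
/// Latency summary in milliseconds, matching the columns of the SQL output.
#[derive(Debug)]
struct LatencySummary {
    p50_ms: f64,
    p99_ms: f64,
    avg_ms: f64,
    min_ms: f64,
    max_ms: f64,
}

/// Nearest-rank percentile of an ascending-sorted, non-empty slice; `p` in (0, 100].
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    assert!(!sorted_ms.is_empty() && p > 0.0 && p <= 100.0);
    // Multiply before dividing so integer p and n give an exact 1-based rank.
    let rank = (p * sorted_ms.len() as f64 / 100.0).ceil() as usize;
    sorted_ms[rank - 1]
}

fn summarize(mut samples_ms: Vec<f64>) -> LatencySummary {
    samples_ms.sort_by(|a, b| a.total_cmp(b));
    let n = samples_ms.len() as f64;
    LatencySummary {
        p50_ms: percentile(&samples_ms, 50.0),
        p99_ms: percentile(&samples_ms, 99.0),
        avg_ms: samples_ms.iter().sum::<f64>() / n,
        min_ms: samples_ms[0],
        max_ms: samples_ms[samples_ms.len() - 1],
    }
}

fn main() {
    let samples: Vec<f64> = (1..=1000).map(|i| 0.5 + i as f64 / 1000.0).collect();
    println!("{:?}", summarize(samples));
}
```

With 1,000 queries, p99 is determined by roughly the 10 slowest, so a handful of slow outliers can move it noticeably while p50 stays comparatively stable.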

## Performance Targets

### Distance Functions

| Operation | Dimension | Target Throughput |
|-----------|-----------|-------------------|
| L2 (SIMD) | 768       | > 400 Mops/s     |
| L2 (SIMD) | 1536      | > 200 Mops/s     |
| Cosine    | 768       | > 300 Mops/s     |
| Inner Product | 768   | > 500 Mops/s     |

### HNSW Index

| Dataset Size | Build Time | Search Latency | Recall@10 |
|--------------|------------|----------------|-----------|
| 100K         | < 30s      | < 1ms          | > 0.95    |
| 1M           | < 5min     | < 2ms          | > 0.95    |
| 10M          | < 1hr      | < 5ms          | > 0.90    |

### Quantization

| Method  | Compression | Speedup | Recall@10 |
|---------|-------------|---------|-----------|
| SQ8     | 4x          | 2-3x    | > 0.95    |
| Binary  | 32x         | 10-20x  | > 0.85    |
| PQ(8)   | 16x         | 5-10x   | > 0.90    |

## Continuous Integration

The GitHub Actions workflow runs automatically on:

- Pull requests touching benchmark code
- Pushes to `main` and `develop` branches
- Manual workflow dispatch

Results are:
- Posted as PR comments
- Stored as artifacts (30-day retention)
- Tracked over time on main branch
- Compared against baseline

### Triggering Manual Runs

```bash
# From GitHub UI: Actions → Benchmarks → Run workflow

# Or using gh CLI
gh workflow run benchmarks.yml
```

### Enabling SQL Benchmarks in CI

SQL benchmarks are disabled by default (too slow). Enable via workflow dispatch:

```bash
gh workflow run benchmarks.yml -f run_sql_benchmarks=true
```

## Advanced Usage

### Profiling with Criterion

```bash
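# Profiling mode: run each benchmark for 5 seconds without statistical analysis.
# Attach an external profiler (e.g. perf) to this run to produce a flamegraph.
cargo bench --bench distance_bench -- --profile-time=5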

# Output to specific format
cargo bench --bench distance_bench -- --output-format bencher
```

### Custom Benchmark Parameters

Edit benchmark files to adjust:

- Vector dimensions
- Dataset sizes
- Number of queries
- HNSW parameters (M, ef_construction, ef_search)
- Quantization settings

### Comparing with pgvector

Ensure pgvector is installed:

```bash
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
```

Then run SQL benchmarks for side-by-side comparison.

## Interpreting Regressions

### Performance Degradation Alert

If CI fails due to performance regression:

1. **Check the comparison**: Review the baseline vs current results
2. **Validate the change**: Ensure it's not due to measurement noise
3. **Profile the code**: Use flamegraphs to identify bottlenecks
4. **Consider trade-offs**: Sometimes correctness > speed
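
To separate a real regression from noise, compare a robust statistic from both runs against a tolerance. A minimal sketch that compares medians against a hypothetical relative threshold; the CI workflow's actual rule is not specified here, and Criterion's own `--baseline` comparison uses a statistical test together with a noise threshold instead:

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    NoChange,
    Regressed,
}

fn median(xs: &[f64]) -> f64 {
    assert!(!xs.is_empty(), "need at least one sample");
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let n = v.len();
    if n % 2 == 1 {
        v[n / 2]
    } else {
        (v[n / 2 - 1] + v[n / 2]) / 2.0
    }
}

/// Compare median timings; relative changes within `threshold` (e.g. 0.05 = 5%)
/// are treated as noise. The threshold value is a hypothetical choice.
fn classify(baseline_ns: &[f64], current_ns: &[f64], threshold: f64) -> Verdict {
    let base = median(baseline_ns);
    let change = (median(current_ns) - base) / base;
    if change > threshold {
        Verdict::Regressed
    } else if change < -threshold {
        Verdict::Improved
    } else {
        Verdict::NoChange
    }
}

fn main() {
    let baseline = [100.0, 101.0, 99.0];
    let current = [110.0, 111.0, 109.0];
    println!("{:?}", classify(&baseline, &current, 0.05));
}
```

Set the threshold above the run-to-run variation you observe when benchmarking the same commit twice; shared CI runners are usually noisier than a dedicated machine.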

### Common Causes

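- **SIMD disabled**: Check compiler flags (e.g. `RUSTFLAGS`, `-C target-cpu`) and that the CI CPU supports the expected instruction sets
- **Unoptimized build**: Ensure benchmarks run with optimizations; `cargo bench` uses the `bench` profile, which inherits from `release`, so check for profile overrides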
- **Thermal throttling**: CPU overheating in CI
- **Cache effects**: Different data access patterns

## Contributing

When adding benchmarks:

1. Add to appropriate `*_bench.rs` file
2. Update this README
3. Ensure benchmarks complete in < 5 minutes
4. Use `black_box()` on inputs and results so the compiler cannot optimize the measured work away
5. Test both small and large inputs
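
Outside Criterion, the same `black_box` idea looks like this: `std::hint::black_box` (stable since Rust 1.66) hides values from the optimizer so that constant inputs are not folded and unused results are not discarded. A minimal std-only sketch; `time_per_iter` and `dot` are hypothetical helpers, and Criterion already handles warm-up, iteration counts, and statistics for the real benchmarks:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Average wall-clock time per call of `f` over `iters` calls.
/// black_box on the result keeps the call from being removed as dead code.
fn time_per_iter<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    assert!(iters > 0);
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    let a = vec![1.0f32; 768];
    let b = vec![0.5f32; 768];
    // black_box on the inputs keeps the compiler from treating them as known constants.
    let t = time_per_iter(10_000, || dot(black_box(&a), black_box(&b)));
    println!("dot/768: {:?} per call", t);
}
```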

## Resources

- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [HNSW Paper](https://arxiv.org/abs/1603.09320)
- [Product Quantization Paper](https://ieeexplore.ieee.org/document/5432202)
- [pgvector Repository](https://github.com/pgvector/pgvector)

## License

Same as ruvector project - MIT