Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
307
crates/ruvector-postgres/benches/README.md
Normal file
@@ -0,0 +1,307 @@
# RuVector Benchmark Suite

Comprehensive benchmarks comparing ruvector with pgvector across distance functions, indexing, and quantization.

## Overview

This benchmark suite provides:

1. **Rust Benchmarks** - Low-level performance testing using Criterion
2. **SQL Benchmarks** - Realistic PostgreSQL workload testing
3. **Automated CI** - GitHub Actions workflow for continuous benchmarking

## Quick Start

### Run All Benchmarks

```bash
cd crates/ruvector-postgres
bash benches/scripts/run_benchmarks.sh
```

### Run Individual Benchmarks

```bash
# Distance function benchmarks
cargo bench --bench distance_bench

# HNSW index benchmarks
cargo bench --bench index_bench

# Quantization benchmarks
cargo bench --bench quantization_bench

# Quantized distance benchmarks
cargo bench --bench quantized_distance_bench
```

### Run SQL Benchmarks

```bash
# Set up the database
createdb ruvector_bench
psql -d ruvector_bench -c 'CREATE EXTENSION ruvector;'
psql -d ruvector_bench -c 'CREATE EXTENSION vector;'  # pgvector's extension is named "vector"

# Quick benchmark (10k vectors)
psql -d ruvector_bench -f benches/sql/quick_benchmark.sql

# Full workload (1M vectors)
psql -d ruvector_bench -f benches/sql/benchmark_workload.sql
```

## Benchmark Categories

### 1. Distance Function Benchmarks (`distance_bench.rs`)

Tests distance calculation performance across different vector dimensions:

- **L2 (Euclidean) Distance**: Scalar vs SIMD implementations
- **Cosine Distance**: Normalized similarity measurement
- **Inner Product**: Dot product for maximum inner product search
- **Batch Operations**: Sequential vs parallel processing

**Dimensions tested**: 128, 384, 768, 1536, 3072

**Key metrics**:
- Single-operation latency
- Throughput (ops/sec)
- SIMD speedup over scalar

### 2. HNSW Index Benchmarks (`index_bench.rs`)

Tests the Hierarchical Navigable Small World graph index:

#### Build Benchmarks
- Index construction time vs dataset size (1K, 10K, 100K, 1M vectors)
- Impact of the `ef_construction` parameter (16, 32, 64, 128, 256)
- Impact of the `M` parameter (8, 12, 16, 24, 32, 48)

#### Search Benchmarks
- Query latency vs dataset size
- Impact of the `ef_search` parameter (10, 20, 40, 80, 160, 320)
- Impact of `k` (number of neighbors: 1, 5, 10, 20, 50, 100)

#### Recall Accuracy
- Recall@10 across `ef_search` values
- Ground truth comparison
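The recall measurement boils down to set overlap between the approximate result IDs and the exact (ground-truth) top-k. A minimal sketch of such a helper (hypothetical, not the actual `index_bench.rs` code):

```rust
use std::collections::HashSet;

/// Fraction of the true top-k neighbors that the approximate search returned.
fn recall_at_k(approx: &[u64], exact: &[u64], k: usize) -> f32 {
    // Ground-truth set: the k exact nearest-neighbor IDs.
    let truth: HashSet<&u64> = exact.iter().take(k).collect();
    // Count how many of the approximate top-k are actually in the truth set.
    let hits = approx.iter().take(k).filter(|id| truth.contains(id)).count();
    hits as f32 / k.min(exact.len()) as f32
}
```

Averaging this over many queries gives the Recall@10 figures reported below.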

#### Memory Usage
- Index size vs dataset size
- Per-vector memory overhead

**Dimensions tested**: 128, 384, 768, 1536

### 3. Quantization Benchmarks (`quantization_bench.rs`)

Tests vector compression and quantized search:

#### Scalar Quantization (SQ8)
- Encoding/decoding speed
- Distance calculation speedup
- Recall vs exact search
- Memory reduction (4x compression)

#### Binary Quantization
- Encoding speed
- Hamming distance calculation (SIMD)
- Massive compression (32x for f32)
- Re-ranking strategies

#### Product Quantization (PQ)
- ADC (Asymmetric Distance Computation)
- SIMD vs scalar lookup
- Configurable compression ratios

**Key metrics**:
- Speedup vs exact search
- Recall@10 accuracy
- Compression ratio
- Throughput improvement
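For intuition on the SQ8 numbers: scalar quantization maps each `f32` to one byte via an affine min/max scale, giving the 4x compression. The round-trip below is an illustrative sketch only; the actual quantizer in `quantization_bench.rs` may use a different scaling scheme:

```rust
/// Quantize a vector to u8 codes plus the (min, max) needed to decode.
/// Assumes per-vector min/max scaling -- an illustrative choice.
fn sq8_encode(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let codes = v.iter().map(|x| ((x - min) * scale).round() as u8).collect();
    (codes, min, max)
}

/// Reconstruct approximate f32 values from the u8 codes.
fn sq8_decode(codes: &[u8], min: f32, max: f32) -> Vec<f32> {
    let step = if max > min { (max - min) / 255.0 } else { 0.0 };
    codes.iter().map(|&c| min + c as f32 * step).collect()
}
```

The reconstruction error per component is bounded by half a quantization step, which is why recall stays high at 4x compression.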

### 4. SQL Workload Benchmarks

Realistic PostgreSQL scenarios:

#### Quick Benchmark (`quick_benchmark.sql`)
- 10,000 vectors, 768 dimensions
- Sequential-scan baseline
- HNSW index build
- Index search performance
- Distance function comparisons

#### Full Workload (`benchmark_workload.sql`)
- 1,000,000 vectors, 1536 dimensions
- 1,000 queries for statistical significance
- P50 and P99 latency measurements
- Memory usage analysis
- Recall accuracy testing
- ruvector vs pgvector comparison

## Understanding Results

### Criterion Output

```
Distance/euclidean/scalar/768
time: [2.1234 µs 2.1456 µs 2.1678 µs]
thrpt: [354.23 Melem/s 357.89 Melem/s 361.55 Melem/s]
```

- **time**: Lower bound, point estimate, and upper bound of the mean execution time
- **thrpt**: Throughput (here, vector elements processed per second)

### Comparing Implementations

```bash
# Set baseline
cargo bench --bench distance_bench -- --save-baseline main

# Make changes, then compare
cargo bench --bench distance_bench -- --baseline main
```

### SQL Benchmark Interpretation

```sql
 p50_ms | p99_ms | avg_ms | min_ms | max_ms
--------+--------+--------+--------+--------
  0.856 |  1.234 |  0.912 |  0.654 |  2.456
```

- **p50**: Median latency (50th percentile)
- **p99**: 99th-percentile latency (the slowest 1% of queries)
- **avg**: Mean latency
- **min/max**: Best and worst case
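The percentile columns can be reproduced from raw latency samples with a nearest-rank calculation; note that PostgreSQL's `percentile_cont` interpolates between samples instead, so values may differ slightly. A rough sketch:

```rust
/// Nearest-rank percentile: sorts the samples, then takes the value at
/// rank ceil(p/100 * n). Differs from interpolating estimators like
/// PostgreSQL's percentile_cont.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1)]
}
```

With 1,000 queries (the full workload), p99 is backed by the 10 slowest samples, which is why the suite uses that many queries for statistical significance.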

## Performance Targets

### Distance Functions

| Operation     | Dimension | Target Throughput |
|---------------|-----------|-------------------|
| L2 (SIMD)     | 768       | > 400 Mops/s      |
| L2 (SIMD)     | 1536      | > 200 Mops/s      |
| Cosine        | 768       | > 300 Mops/s      |
| Inner Product | 768       | > 500 Mops/s      |

### HNSW Index

| Dataset Size | Build Time | Search Latency | Recall@10 |
|--------------|------------|----------------|-----------|
| 100K         | < 30s      | < 1ms          | > 0.95    |
| 1M           | < 5min     | < 2ms          | > 0.95    |
| 10M          | < 1hr      | < 5ms          | > 0.90    |

### Quantization

| Method | Compression | Speedup | Recall@10 |
|--------|-------------|---------|-----------|
| SQ8    | 4x          | 2-3x    | > 0.95    |
| Binary | 32x         | 10-20x  | > 0.85    |
| PQ(8)  | 16x         | 5-10x   | > 0.90    |

## Continuous Integration

The GitHub Actions workflow runs automatically on:

- Pull requests touching benchmark code
- Pushes to the `main` and `develop` branches
- Manual workflow dispatch

Results are:
- Posted as PR comments
- Stored as artifacts (30-day retention)
- Tracked over time on the main branch
- Compared against the baseline

### Triggering Manual Runs

```bash
# From the GitHub UI: Actions → Benchmarks → Run workflow

# Or using the gh CLI
gh workflow run benchmarks.yml
```

### Enabling SQL Benchmarks in CI

SQL benchmarks are disabled by default because they take too long for routine CI runs. Enable them via workflow dispatch:

```bash
gh workflow run benchmarks.yml -f run_sql_benchmarks=true
```

## Advanced Usage

### Profiling with Criterion

```bash
# Profile each benchmark for 5 seconds (pair with a profiler hook
# such as pprof to produce flamegraphs)
cargo bench --bench distance_bench -- --profile-time=5

# Emit results in the `bencher` output format
cargo bench --bench distance_bench -- --output-format bencher
```

### Custom Benchmark Parameters

Edit the benchmark files to adjust:

- Vector dimensions
- Dataset sizes
- Number of queries
- HNSW parameters (`M`, `ef_construction`, `ef_search`)
- Quantization settings

### Comparing with pgvector

Ensure pgvector is installed:

```bash
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
```

Then run the SQL benchmarks for a side-by-side comparison.

## Interpreting Regressions

### Performance Degradation Alert

If CI fails due to a performance regression:

1. **Check the comparison**: Review the baseline vs current results
2. **Validate the change**: Make sure the difference is not measurement noise
3. **Profile the code**: Use flamegraphs to identify bottlenecks
4. **Consider trade-offs**: Sometimes correctness matters more than speed

### Common Causes

- **SIMD disabled**: Check compiler flags and target features
- **Debug build**: Ensure `--release` mode (`cargo bench` uses it by default)
- **Thermal throttling**: CPU overheating on the CI runner
- **Cache effects**: Different data access patterns

## Contributing

When adding benchmarks:

1. Add them to the appropriate `*_bench.rs` file
2. Update this README
3. Ensure the benchmarks complete in under 5 minutes
4. Use `black_box()` to prevent the compiler from optimizing the measured work away
5. Test both small and large inputs
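On point 4: `black_box` (re-exported by Criterion, and available in the standard library as `std::hint::black_box`) hides a value from the optimizer, so the compiler cannot constant-fold the measured computation away. A minimal sketch of the pattern:

```rust
use std::hint::black_box;

/// Plain scalar L2 distance, the kind of work a benchmark would measure.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// One measured iteration. Inside a Criterion closure this would be
/// `bench.iter(|| l2(black_box(a), black_box(b)))`; without black_box the
/// compiler may precompute the result and the benchmark measures nothing.
fn bench_once(a: &[f32], b: &[f32]) -> f32 {
    l2(black_box(a), black_box(b))
}
```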

## Resources

- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [HNSW Paper](https://arxiv.org/abs/1603.09320)
- [Product Quantization Paper](https://ieeexplore.ieee.org/document/5432202)
- [pgvector Repository](https://github.com/pgvector/pgvector)

## License

Same as the ruvector project: MIT
565
crates/ruvector-postgres/benches/distance_bench.rs
Normal file
@@ -0,0 +1,565 @@
//! Comprehensive distance function benchmarks
//!
//! Compare SIMD vs scalar implementations across different vector sizes
//! and distance metrics (L2, cosine, inner product, Manhattan).
//!
//! Dimensions tested: 128, 384, 768, 1536, 3072
//! This covers common embedding sizes:
//! - 128: compact sentence-embedding models
//! - 384: all-MiniLM-L6-v2
//! - 768: BERT base, RoBERTa
//! - 1536: OpenAI text-embedding-ada-002
//! - 3072: OpenAI text-embedding-3-large

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use rand::prelude::*;
use rand_chacha::ChaCha8Rng;
use rayon::prelude::*;

// ============================================================================
// Distance Implementations
// ============================================================================

mod distance_impl {
    /// Scalar Euclidean distance
    pub fn euclidean_scalar(a: &[f32], b: &[f32]) -> f32 {
        a.iter()
            .zip(b.iter())
            .map(|(x, y)| {
                let diff = x - y;
                diff * diff
            })
            .sum::<f32>()
            .sqrt()
    }

    /// Scalar cosine distance
    pub fn cosine_scalar(a: &[f32], b: &[f32]) -> f32 {
        let mut dot = 0.0f32;
        let mut norm_a = 0.0f32;
        let mut norm_b = 0.0f32;

        for (x, y) in a.iter().zip(b.iter()) {
            dot += x * y;
            norm_a += x * x;
            norm_b += y * y;
        }

        let denominator = (norm_a * norm_b).sqrt();
        if denominator == 0.0 {
            return 1.0;
        }

        1.0 - (dot / denominator)
    }

    /// Scalar inner product distance (negated, so smaller means closer)
    pub fn inner_product_scalar(a: &[f32], b: &[f32]) -> f32 {
        -a.iter().zip(b.iter()).map(|(x, y)| x * y).sum::<f32>()
    }

    /// Scalar Manhattan distance
    pub fn manhattan_scalar(a: &[f32], b: &[f32]) -> f32 {
        a.iter()
            .zip(b.iter())
            .map(|(x, y)| (x - y).abs())
            .sum::<f32>()
    }

    /// AVX2 Euclidean distance
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2", enable = "fma")]
    pub unsafe fn euclidean_avx2(a: &[f32], b: &[f32]) -> f32 {
        use std::arch::x86_64::*;

        let n = a.len();
        let mut sum = _mm256_setzero_ps();

        let chunks = n / 8;
        for i in 0..chunks {
            let offset = i * 8;
            let va = _mm256_loadu_ps(a.as_ptr().add(offset));
            let vb = _mm256_loadu_ps(b.as_ptr().add(offset));
            let diff = _mm256_sub_ps(va, vb);
            sum = _mm256_fmadd_ps(diff, diff, sum);
        }

        // Horizontal sum of the 8 lanes
        let mut result = horizontal_sum_avx2(sum);

        // Handle remainder elements
        for i in (chunks * 8)..n {
            let diff = a[i] - b[i];
            result += diff * diff;
        }

        result.sqrt()
    }

    /// AVX2 cosine distance
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2", enable = "fma")]
    pub unsafe fn cosine_avx2(a: &[f32], b: &[f32]) -> f32 {
        use std::arch::x86_64::*;

        let n = a.len();
        let mut dot_sum = _mm256_setzero_ps();
        let mut norm_a_sum = _mm256_setzero_ps();
        let mut norm_b_sum = _mm256_setzero_ps();

        let chunks = n / 8;
        for i in 0..chunks {
            let offset = i * 8;
            let va = _mm256_loadu_ps(a.as_ptr().add(offset));
            let vb = _mm256_loadu_ps(b.as_ptr().add(offset));

            dot_sum = _mm256_fmadd_ps(va, vb, dot_sum);
            norm_a_sum = _mm256_fmadd_ps(va, va, norm_a_sum);
            norm_b_sum = _mm256_fmadd_ps(vb, vb, norm_b_sum);
        }

        // Horizontal sums
        let mut dot = horizontal_sum_avx2(dot_sum);
        let mut norm_a = horizontal_sum_avx2(norm_a_sum);
        let mut norm_b = horizontal_sum_avx2(norm_b_sum);

        // Handle remainder elements
        for i in (chunks * 8)..n {
            dot += a[i] * b[i];
            norm_a += a[i] * a[i];
            norm_b += b[i] * b[i];
        }

        let denom = (norm_a * norm_b).sqrt();
        if denom == 0.0 {
            return 1.0;
        }
        1.0 - (dot / denom)
    }

    /// AVX2 inner product (negated)
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2", enable = "fma")]
    pub unsafe fn inner_product_avx2(a: &[f32], b: &[f32]) -> f32 {
        use std::arch::x86_64::*;

        let n = a.len();
        let mut sum = _mm256_setzero_ps();

        let chunks = n / 8;
        for i in 0..chunks {
            let offset = i * 8;
            let va = _mm256_loadu_ps(a.as_ptr().add(offset));
            let vb = _mm256_loadu_ps(b.as_ptr().add(offset));
            sum = _mm256_fmadd_ps(va, vb, sum);
        }

        let mut result = horizontal_sum_avx2(sum);

        // Handle remainder elements
        for i in (chunks * 8)..n {
            result += a[i] * b[i];
        }

        -result
    }

    /// Reduce the 8 lanes of an AVX2 register to a single f32 sum
    #[cfg(target_arch = "x86_64")]
    #[inline]
    unsafe fn horizontal_sum_avx2(v: std::arch::x86_64::__m256) -> f32 {
        use std::arch::x86_64::*;
        let sum_high = _mm256_extractf128_ps(v, 1);
        let sum_low = _mm256_castps256_ps128(v);
        let sum128 = _mm_add_ps(sum_high, sum_low);
        let sum64 = _mm_add_ps(sum128, _mm_movehl_ps(sum128, sum128));
        let sum32 = _mm_add_ss(sum64, _mm_shuffle_ps(sum64, sum64, 1));
        _mm_cvtss_f32(sum32)
    }

    // Scalar fallbacks so non-x86_64 targets still compile; kept `unsafe`
    // to match the AVX2 signatures at call sites.
    #[cfg(not(target_arch = "x86_64"))]
    pub unsafe fn euclidean_avx2(a: &[f32], b: &[f32]) -> f32 {
        euclidean_scalar(a, b)
    }

    #[cfg(not(target_arch = "x86_64"))]
    pub unsafe fn cosine_avx2(a: &[f32], b: &[f32]) -> f32 {
        cosine_scalar(a, b)
    }

    #[cfg(not(target_arch = "x86_64"))]
    pub unsafe fn inner_product_avx2(a: &[f32], b: &[f32]) -> f32 {
        inner_product_scalar(a, b)
    }
}

// ============================================================================
// Test Data Generation
// ============================================================================

fn generate_vectors(dims: usize, seed: u64) -> (Vec<f32>, Vec<f32>) {
    let mut rng = ChaCha8Rng::seed_from_u64(seed);
    let a: Vec<f32> = (0..dims).map(|_| rng.gen_range(-1.0..1.0)).collect();
    let b: Vec<f32> = (0..dims).map(|_| rng.gen_range(-1.0..1.0)).collect();
    (a, b)
}

fn generate_normalized_vectors(dims: usize, seed: u64) -> (Vec<f32>, Vec<f32>) {
    let (mut a, mut b) = generate_vectors(dims, seed);

    // Normalize to unit length
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();

    for x in &mut a {
        *x /= norm_a;
    }
    for x in &mut b {
        *x /= norm_b;
    }

    (a, b)
}

fn generate_vector_dataset(n: usize, dims: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut rng = ChaCha8Rng::seed_from_u64(seed);
    (0..n)
        .map(|_| (0..dims).map(|_| rng.gen_range(-1.0..1.0)).collect())
        .collect()
}

// ============================================================================
// Euclidean Distance Benchmarks
// ============================================================================

const DIMENSIONS: [usize; 5] = [128, 384, 768, 1536, 3072];

fn bench_euclidean(c: &mut Criterion) {
    let mut group = c.benchmark_group("Euclidean Distance");

    for dims in DIMENSIONS.iter() {
        let (a, b) = generate_vectors(*dims, 42);

        group.throughput(Throughput::Elements(*dims as u64));

        group.bench_with_input(BenchmarkId::new("scalar", dims), dims, |bench, _| {
            bench.iter(|| distance_impl::euclidean_scalar(black_box(&a), black_box(&b)))
        });

        #[cfg(target_arch = "x86_64")]
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            group.bench_with_input(BenchmarkId::new("avx2", dims), dims, |bench, _| {
                bench
                    .iter(|| unsafe { distance_impl::euclidean_avx2(black_box(&a), black_box(&b)) })
            });
        }
    }

    group.finish();
}

// ============================================================================
// Cosine Distance Benchmarks
// ============================================================================

fn bench_cosine(c: &mut Criterion) {
    let mut group = c.benchmark_group("Cosine Distance");

    for dims in DIMENSIONS.iter() {
        let (a, b) = generate_vectors(*dims, 42);

        group.throughput(Throughput::Elements(*dims as u64));

        group.bench_with_input(BenchmarkId::new("scalar", dims), dims, |bench, _| {
            bench.iter(|| distance_impl::cosine_scalar(black_box(&a), black_box(&b)))
        });

        #[cfg(target_arch = "x86_64")]
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            group.bench_with_input(BenchmarkId::new("avx2", dims), dims, |bench, _| {
                bench.iter(|| unsafe { distance_impl::cosine_avx2(black_box(&a), black_box(&b)) })
            });
        }
    }

    group.finish();
}

// ============================================================================
// Cosine Distance for Pre-Normalized Vectors
// ============================================================================

fn bench_cosine_normalized(c: &mut Criterion) {
    let mut group = c.benchmark_group("Cosine Distance (Normalized)");

    for dims in DIMENSIONS.iter() {
        let (a, b) = generate_normalized_vectors(*dims, 42);

        group.throughput(Throughput::Elements(*dims as u64));

        // For unit vectors, cosine distance reduces to 1 - dot(a, b)
        group.bench_with_input(BenchmarkId::new("scalar_dot", dims), dims, |bench, _| {
            bench.iter(|| {
                let dot: f32 = a.iter().zip(&b).map(|(x, y)| x * y).sum();
                1.0 - black_box(dot)
            })
        });

        #[cfg(target_arch = "x86_64")]
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            group.bench_with_input(BenchmarkId::new("avx2_dot", dims), dims, |bench, _| {
                // inner_product_avx2 returns -dot, so 1.0 + (-dot) == 1.0 - dot
                bench.iter(|| unsafe {
                    1.0 + distance_impl::inner_product_avx2(black_box(&a), black_box(&b))
                })
            });
        }
    }

    group.finish();
}

// ============================================================================
// Inner Product Benchmarks
// ============================================================================

fn bench_inner_product(c: &mut Criterion) {
    let mut group = c.benchmark_group("Inner Product");

    for dims in DIMENSIONS.iter() {
        let (a, b) = generate_vectors(*dims, 42);

        group.throughput(Throughput::Elements(*dims as u64));

        group.bench_with_input(BenchmarkId::new("scalar", dims), dims, |bench, _| {
            bench.iter(|| distance_impl::inner_product_scalar(black_box(&a), black_box(&b)))
        });

        #[cfg(target_arch = "x86_64")]
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            group.bench_with_input(BenchmarkId::new("avx2", dims), dims, |bench, _| {
                bench.iter(|| unsafe {
                    distance_impl::inner_product_avx2(black_box(&a), black_box(&b))
                })
            });
        }
    }

    group.finish();
}

// ============================================================================
// Manhattan Distance Benchmarks
// ============================================================================

fn bench_manhattan(c: &mut Criterion) {
    let mut group = c.benchmark_group("Manhattan Distance");

    for dims in DIMENSIONS.iter() {
        let (a, b) = generate_vectors(*dims, 42);

        group.throughput(Throughput::Elements(*dims as u64));

        group.bench_with_input(BenchmarkId::new("scalar", dims), dims, |bench, _| {
            bench.iter(|| distance_impl::manhattan_scalar(black_box(&a), black_box(&b)))
        });
    }

    group.finish();
}

// ============================================================================
// Batch Distance Benchmarks (1000 vectors)
// ============================================================================

fn bench_batch_sequential(c: &mut Criterion) {
    let mut group = c.benchmark_group("Batch Distance (Sequential, 1000 vectors)");

    for dims in [128, 384, 1536].iter() {
        let query = generate_vectors(*dims, 42).0;
        let vectors = generate_vector_dataset(1000, *dims, 123);

        group.throughput(Throughput::Elements(1000));

        group.bench_with_input(BenchmarkId::new("euclidean", dims), dims, |bench, _| {
            bench.iter(|| {
                vectors
                    .iter()
                    .map(|v| distance_impl::euclidean_scalar(black_box(&query), black_box(v)))
                    .collect::<Vec<_>>()
            })
        });

        group.bench_with_input(BenchmarkId::new("cosine", dims), dims, |bench, _| {
            bench.iter(|| {
                vectors
                    .iter()
                    .map(|v| distance_impl::cosine_scalar(black_box(&query), black_box(v)))
                    .collect::<Vec<_>>()
            })
        });

        group.bench_with_input(BenchmarkId::new("inner_product", dims), dims, |bench, _| {
            bench.iter(|| {
                vectors
                    .iter()
                    .map(|v| distance_impl::inner_product_scalar(black_box(&query), black_box(v)))
                    .collect::<Vec<_>>()
            })
        });
    }

    group.finish();
}

fn bench_batch_parallel(c: &mut Criterion) {
    let mut group = c.benchmark_group("Batch Distance (Parallel, 1000 vectors)");

    for dims in [128, 384, 1536].iter() {
        let query = generate_vectors(*dims, 42).0;
        let vectors = generate_vector_dataset(1000, *dims, 123);

        group.throughput(Throughput::Elements(1000));

        group.bench_with_input(
            BenchmarkId::new("euclidean_rayon", dims),
            dims,
            |bench, _| {
                bench.iter(|| {
                    vectors
                        .par_iter()
                        .map(|v| distance_impl::euclidean_scalar(black_box(&query), black_box(v)))
                        .collect::<Vec<_>>()
                })
            },
        );

        group.bench_with_input(BenchmarkId::new("cosine_rayon", dims), dims, |bench, _| {
            bench.iter(|| {
                vectors
                    .par_iter()
                    .map(|v| distance_impl::cosine_scalar(black_box(&query), black_box(v)))
                    .collect::<Vec<_>>()
            })
        });
    }

    group.finish();
}

// ============================================================================
// Large Batch Benchmarks (10K vectors)
// ============================================================================

fn bench_large_batch(c: &mut Criterion) {
    let mut group = c.benchmark_group("Large Batch Distance (10K vectors)");
    group.sample_size(10);

    for dims in [384, 768, 1536].iter() {
        let query = generate_vectors(*dims, 42).0;
        let vectors = generate_vector_dataset(10_000, *dims, 123);

        group.throughput(Throughput::Elements(10_000));

        group.bench_with_input(BenchmarkId::new("sequential", dims), dims, |bench, _| {
            bench.iter(|| {
                vectors
                    .iter()
                    .map(|v| distance_impl::euclidean_scalar(black_box(&query), black_box(v)))
                    .collect::<Vec<_>>()
            })
        });

        group.bench_with_input(BenchmarkId::new("parallel", dims), dims, |bench, _| {
            bench.iter(|| {
                vectors
                    .par_iter()
                    .map(|v| distance_impl::euclidean_scalar(black_box(&query), black_box(v)))
                    .collect::<Vec<_>>()
            })
        });

        #[cfg(target_arch = "x86_64")]
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            group.bench_with_input(BenchmarkId::new("parallel_avx2", dims), dims, |bench, _| {
                bench.iter(|| {
                    vectors
                        .par_iter()
                        .map(|v| unsafe {
                            distance_impl::euclidean_avx2(black_box(&query), black_box(v))
                        })
                        .collect::<Vec<_>>()
                })
            });
        }
    }

    group.finish();
}

// ============================================================================
// SIMD Speedup Comparison
// ============================================================================

fn bench_simd_speedup(c: &mut Criterion) {
    let mut group = c.benchmark_group("SIMD Speedup Analysis");

    #[cfg(target_arch = "x86_64")]
    if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
        for dims in DIMENSIONS.iter() {
            let (a, b) = generate_vectors(*dims, 42);

            // Euclidean
            group.bench_with_input(
                BenchmarkId::new("euclidean_scalar", dims),
                dims,
                |bench, _| {
                    bench.iter(|| distance_impl::euclidean_scalar(black_box(&a), black_box(&b)))
                },
            );

            group.bench_with_input(
                BenchmarkId::new("euclidean_avx2", dims),
                dims,
                |bench, _| {
                    bench.iter(|| unsafe {
                        distance_impl::euclidean_avx2(black_box(&a), black_box(&b))
                    })
                },
            );

            // Cosine
            group.bench_with_input(BenchmarkId::new("cosine_scalar", dims), dims, |bench, _| {
                bench.iter(|| distance_impl::cosine_scalar(black_box(&a), black_box(&b)))
            });

            group.bench_with_input(BenchmarkId::new("cosine_avx2", dims), dims, |bench, _| {
                bench.iter(|| unsafe { distance_impl::cosine_avx2(black_box(&a), black_box(&b)) })
            });
        }
    }

    group.finish();
}

criterion_group!(
    benches,
    bench_euclidean,
    bench_cosine,
    bench_cosine_normalized,
    bench_inner_product,
    bench_manhattan,
    bench_batch_sequential,
    bench_batch_parallel,
    bench_large_batch,
    bench_simd_speedup,
);

criterion_main!(benches);
782
crates/ruvector-postgres/benches/e2e_bench.rs
Normal file
@@ -0,0 +1,782 @@
//! End-to-end benchmarks for the RuVector PostgreSQL extension
//!
//! Comprehensive benchmarks for:
//! - Full query pipeline latency
//! - Insert throughput
//! - Concurrent query scaling
//! - Memory usage under load
//! - pgvector comparison baselines

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use rand::prelude::*;
use rand_chacha::ChaCha8Rng;
use rayon::prelude::*;
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering as AtomicOrdering};
use std::sync::Arc;
use std::time::{Duration, Instant};

// ============================================================================
// Simulated Vector Index (Full Pipeline)
// ============================================================================

mod index {
    use dashmap::DashMap;
    use parking_lot::RwLock;
    use rand::prelude::*;
    use rand_chacha::ChaCha8Rng;
    use rayon::prelude::*;
    use std::cmp::Ordering;
    use std::collections::{BinaryHeap, HashMap, HashSet};
    use std::sync::atomic::{AtomicUsize, Ordering as AtomicOrdering};

    /// Full-featured HNSW index for benchmarking
    pub struct HnswIndex {
        pub nodes: DashMap<u64, Vec<f32>>,
        pub neighbors: DashMap<u64, Vec<Vec<u64>>>,
        pub entry_point: RwLock<Option<u64>>,
        pub max_layer: AtomicUsize,
        pub m: usize,
        pub m0: usize,
        pub ef_construction: usize,
        pub ef_search: usize,
        pub dimensions: usize,
        next_id: AtomicUsize,
        rng: RwLock<ChaCha8Rng>,
    }

    impl HnswIndex {
        pub fn new(
            dimensions: usize,
            m: usize,
            ef_construction: usize,
            ef_search: usize,
            seed: u64,
        ) -> Self {
            Self {
                nodes: DashMap::new(),
                neighbors: DashMap::new(),
                entry_point: RwLock::new(None),
                max_layer: AtomicUsize::new(0),
                m,
                m0: m * 2,
                ef_construction,
                ef_search,
                dimensions,
                next_id: AtomicUsize::new(0),
                rng: RwLock::new(ChaCha8Rng::seed_from_u64(seed)),
            }
        }

        pub fn len(&self) -> usize {
            self.nodes.len()
        }

        /// Draw a layer from the standard HNSW level distribution:
        /// level = floor(-ln(U) * mL) with mL = 1 / ln(M)
        fn random_level(&self) -> usize {
            let ml = 1.0 / (self.m as f64).ln();
            let mut rng = self.rng.write();
            let r: f64 = rng.gen();
            ((-r.ln() * ml).floor() as usize).min(32)
        }

        fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
            a.iter()
                .zip(b.iter())
                .map(|(x, y)| (x - y).powi(2))
                .sum::<f32>()
                .sqrt()
        }

        pub fn insert(&self, vector: Vec<f32>) -> u64 {
            let id = self.next_id.fetch_add(1, AtomicOrdering::Relaxed) as u64;
            let level = self.random_level();

            // Initialize neighbor lists for all layers
            let mut neighbor_lists = Vec::with_capacity(level + 1);
            for _ in 0..=level {
                neighbor_lists.push(Vec::new());
            }

            self.nodes.insert(id, vector.clone());
            self.neighbors.insert(id, neighbor_lists);

            let current_entry = *self.entry_point.read();

            if current_entry.is_none() {
                *self.entry_point.write() = Some(id);
                self.max_layer.store(level, AtomicOrdering::Relaxed);
                return id;
            }

            // Simplified insertion: link the new node only to the entry point
            let entry_id = current_entry.unwrap();

            if self.nodes.get(&entry_id).is_some() {
                let max_conn = if level == 0 { self.m0 } else { self.m };

                if let Some(mut neighbors) = self.neighbors.get_mut(&id) {
                    neighbors[0].push(entry_id);
                }

                if let Some(mut entry_neighbors) = self.neighbors.get_mut(&entry_id) {
                    if entry_neighbors[0].len() < max_conn {
                        entry_neighbors[0].push(id);
                    }
                }
            }

            if level > self.max_layer.load(AtomicOrdering::Relaxed) {
                *self.entry_point.write() = Some(id);
                self.max_layer.store(level, AtomicOrdering::Relaxed);
            }

            id
        }

        pub fn insert_batch(&self, vectors: &[Vec<f32>]) -> Vec<u64> {
            vectors.iter().map(|v| self.insert(v.clone())).collect()
        }

        pub fn insert_batch_parallel(&self, vectors: &[Vec<f32>]) -> Vec<u64> {
            // Parallel insertion via rayon
            vectors.par_iter().map(|v| self.insert(v.clone())).collect()
        }

        pub fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
            // Brute force for simplicity in benchmarks
            let mut results: Vec<(u64, f32)> = self
                .nodes
                .iter()
                .map(|entry| {
                    let dist = self.distance(query, entry.value());
(*entry.key(), dist)
|
||||
})
|
||||
.collect();
|
||||
|
||||
results.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
|
||||
results.truncate(k);
|
||||
results
|
||||
}
|
||||
|
||||
pub fn search_parallel(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
|
||||
let mut results: Vec<(u64, f32)> = self
|
||||
.nodes
|
||||
.iter()
|
||||
.collect::<Vec<_>>()
|
||||
.par_iter()
|
||||
.map(|entry| {
|
||||
let dist = self.distance(query, entry.value());
|
||||
(*entry.key(), dist)
|
||||
})
|
||||
.collect();
|
||||
|
||||
results.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
|
||||
results.truncate(k);
|
||||
results
|
||||
}
|
||||
|
||||
pub fn memory_usage(&self) -> usize {
|
||||
let vector_bytes = self.nodes.len() * self.dimensions * 4;
|
||||
let neighbor_bytes: usize = self
|
||||
.neighbors
|
||||
.iter()
|
||||
.map(|entry| entry.value().iter().map(|l| l.len() * 8).sum::<usize>())
|
||||
.sum();
|
||||
vector_bytes + neighbor_bytes
|
||||
}
|
||||
}
|
||||
}
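`random_level` above maps a uniform draw r in (0, 1) to a layer via floor(-ln(r) / ln(M)), the standard HNSW geometric layer assignment. A minimal std-only sketch of that mapping (hypothetical free function mirroring the formula, not part of the index above):

```rust
/// Layer assignment used by HNSW: floor(-ln(r) * ml) with ml = 1 / ln(M),
/// capped at 32 as in the index above.
fn level_for_draw(r: f64, m: usize) -> usize {
    let ml = 1.0 / (m as f64).ln();
    ((-r.ln() * ml).floor() as usize).min(32)
}

fn main() {
    // Most draws land on layer 0; only r < 1/M promotes to layer 1 or higher.
    assert_eq!(level_for_draw(0.5, 16), 0);
    assert_eq!(level_for_draw(0.01, 16), 1);
    assert_eq!(level_for_draw(1e-300, 16), 32); // the cap kicks in
}
```

With M = 16, roughly 1/16 of nodes reach layer 1, 1/256 reach layer 2, and so on, which is what keeps the upper layers sparse.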

use index::HnswIndex;

// ============================================================================
// Test Data Generation
// ============================================================================

fn generate_random_vectors(n: usize, dims: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut rng = ChaCha8Rng::seed_from_u64(seed);
    (0..n)
        .map(|_| (0..dims).map(|_| rng.gen_range(-1.0..1.0)).collect())
        .collect()
}

fn generate_normalized_vectors(n: usize, dims: usize, seed: u64) -> Vec<Vec<f32>> {
    let vectors = generate_random_vectors(n, dims, seed);
    vectors
        .into_iter()
        .map(|v| {
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            v.into_iter().map(|x| x / norm).collect()
        })
        .collect()
}
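`generate_normalized_vectors` divides each component by the L2 norm, so every output vector lies on the unit sphere. A quick std-only check of that invariant (hypothetical helper name, same arithmetic):

```rust
/// Scale a vector to unit L2 norm, as generate_normalized_vectors does per row.
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    let unit = normalize(&[3.0, 4.0]); // norm is 5.0
    assert!((unit[0] - 0.6).abs() < 1e-6);
    assert!((unit[1] - 0.8).abs() < 1e-6);
    // The normalized vector has unit length
    let len_sq: f32 = unit.iter().map(|x| x * x).sum();
    assert!((len_sq - 1.0).abs() < 1e-6);
}
```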

// ============================================================================
// Full Query Pipeline Benchmarks
// ============================================================================

fn bench_query_pipeline(c: &mut Criterion) {
    let mut group = c.benchmark_group("Query Pipeline");

    for &dims in [128, 384, 768, 1536].iter() {
        for &n in [10_000, 100_000].iter() {
            let vectors = generate_random_vectors(n, dims, 42);
            let query = vectors[0].clone();

            let index = HnswIndex::new(dims, 16, 64, 40, 42);
            index.insert_batch(&vectors);

            group.throughput(Throughput::Elements(1));

            // Full pipeline: search + post-process
            group.bench_with_input(BenchmarkId::new(format!("{}d", dims), n), &n, |bench, _| {
                bench.iter(|| {
                    // Search
                    let results = index.search(&query, 10);

                    // Post-process (e.g., fetch metadata, rerank)
                    let processed: Vec<_> = results
                        .iter()
                        .map(|(id, dist)| {
                            // Simulate metadata lookup
                            let metadata = id.to_string();
                            (*id, *dist, metadata)
                        })
                        .collect();

                    black_box(processed)
                })
            });
        }
    }

    group.finish();
}

fn bench_query_pipeline_parallel(c: &mut Criterion) {
    let mut group = c.benchmark_group("Query Pipeline (Parallel)");

    let dims = 768;
    let n = 100_000;
    let vectors = generate_random_vectors(n, dims, 42);
    let queries: Vec<Vec<f32>> = generate_random_vectors(100, dims, 999);

    let index = HnswIndex::new(dims, 16, 64, 40, 42);
    index.insert_batch(&vectors);

    group.throughput(Throughput::Elements(100));

    group.bench_function("sequential", |bench| {
        bench.iter(|| {
            queries
                .iter()
                .map(|q| index.search(q, 10))
                .collect::<Vec<_>>()
        })
    });

    group.bench_function("parallel_queries", |bench| {
        bench.iter(|| {
            queries
                .par_iter()
                .map(|q| index.search(q, 10))
                .collect::<Vec<_>>()
        })
    });

    group.bench_function("parallel_search_internal", |bench| {
        bench.iter(|| {
            queries
                .iter()
                .map(|q| index.search_parallel(q, 10))
                .collect::<Vec<_>>()
        })
    });

    group.bench_function("full_parallel", |bench| {
        bench.iter(|| {
            queries
                .par_iter()
                .map(|q| index.search_parallel(q, 10))
                .collect::<Vec<_>>()
        })
    });

    group.finish();
}

// ============================================================================
// Insert Throughput Benchmarks
// ============================================================================

fn bench_insert_throughput(c: &mut Criterion) {
    let mut group = c.benchmark_group("Insert Throughput");
    group.sample_size(10);

    for &dims in [128, 384, 768, 1536].iter() {
        for &n in [1_000, 10_000, 100_000].iter() {
            let vectors = generate_random_vectors(n, dims, 42);

            group.throughput(Throughput::Elements(n as u64));

            group.bench_with_input(
                BenchmarkId::new(format!("{}d", dims), n),
                &vectors,
                |bench, vecs| {
                    bench.iter(|| {
                        let index = HnswIndex::new(dims, 16, 64, 40, 42);
                        index.insert_batch(vecs);
                        black_box(index.len())
                    })
                },
            );
        }
    }

    group.finish();
}

fn bench_insert_throughput_parallel(c: &mut Criterion) {
    let mut group = c.benchmark_group("Insert Throughput (Parallel)");
    group.sample_size(10);

    let dims = 768;

    for &n in [10_000, 100_000].iter() {
        let vectors = generate_random_vectors(n, dims, 42);

        group.throughput(Throughput::Elements(n as u64));

        group.bench_with_input(
            BenchmarkId::new("sequential", n),
            &vectors,
            |bench, vecs| {
                bench.iter(|| {
                    let index = HnswIndex::new(dims, 16, 64, 40, 42);
                    index.insert_batch(vecs);
                    black_box(index.len())
                })
            },
        );

        group.bench_with_input(BenchmarkId::new("parallel", n), &vectors, |bench, vecs| {
            bench.iter(|| {
                let index = HnswIndex::new(dims, 16, 64, 40, 42);
                index.insert_batch_parallel(vecs);
                black_box(index.len())
            })
        });
    }

    group.finish();
}

fn bench_insert_batching(c: &mut Criterion) {
    let mut group = c.benchmark_group("Insert Batch Sizes");
    group.sample_size(10);

    let dims = 768;
    let n = 10_000;
    let vectors = generate_random_vectors(n, dims, 42);

    for &batch_size in [1, 10, 100, 1000, 10000].iter() {
        group.throughput(Throughput::Elements(n as u64));

        group.bench_with_input(
            BenchmarkId::from_parameter(batch_size),
            &batch_size,
            |bench, &bs| {
                bench.iter(|| {
                    let index = HnswIndex::new(dims, 16, 64, 40, 42);

                    for chunk in vectors.chunks(bs) {
                        index.insert_batch(chunk);
                    }

                    black_box(index.len())
                })
            },
        );
    }

    group.finish();
}

// ============================================================================
// Concurrent Query Scaling
// ============================================================================

fn bench_concurrent_scaling(c: &mut Criterion) {
    let mut group = c.benchmark_group("Concurrent Query Scaling");
    group.sample_size(10);

    let dims = 768;
    let n = 100_000;
    let vectors = generate_random_vectors(n, dims, 42);
    let queries = generate_random_vectors(1000, dims, 999);

    let index = Arc::new(HnswIndex::new(dims, 16, 64, 40, 42));
    index.insert_batch(&vectors);

    for &num_threads in [1, 2, 4, 8, 16].iter() {
        group.throughput(Throughput::Elements(1000));

        group.bench_with_input(
            BenchmarkId::from_parameter(num_threads),
            &num_threads,
            |bench, &threads| {
                let pool = rayon::ThreadPoolBuilder::new()
                    .num_threads(threads)
                    .build()
                    .unwrap();

                bench.iter(|| {
                    pool.install(|| {
                        queries.par_iter().for_each(|q| {
                            black_box(index.search(q, 10));
                        });
                    })
                })
            },
        );
    }

    group.finish();
}

fn bench_mixed_workload(c: &mut Criterion) {
    let mut group = c.benchmark_group("Mixed Read/Write Workload");
    group.sample_size(10);

    let dims = 768;
    let n = 50_000;
    let vectors = generate_random_vectors(n, dims, 42);
    let queries = generate_random_vectors(100, dims, 999);
    let new_vectors = generate_random_vectors(1000, dims, 123);

    let index = Arc::new(HnswIndex::new(dims, 16, 64, 40, 42));
    index.insert_batch(&vectors);

    // Read-heavy (90% reads, 10% writes)
    group.bench_function("read_heavy", |bench| {
        let idx = index.clone();
        bench.iter(|| {
            // 90 reads
            for q in queries.iter().take(90) {
                black_box(idx.search(q, 10));
            }
            // 10 writes
            for v in new_vectors.iter().take(10) {
                black_box(idx.insert(v.clone()));
            }
        })
    });

    // Balanced (50% reads, 50% writes)
    group.bench_function("balanced", |bench| {
        let idx = index.clone();
        bench.iter(|| {
            for (q, v) in queries.iter().take(50).zip(new_vectors.iter().take(50)) {
                black_box(idx.search(q, 10));
                black_box(idx.insert(v.clone()));
            }
        })
    });

    // Write-heavy (10% reads, 90% writes)
    group.bench_function("write_heavy", |bench| {
        let idx = index.clone();
        bench.iter(|| {
            // 10 reads
            for q in queries.iter().take(10) {
                black_box(idx.search(q, 10));
            }
            // 90 writes
            for v in new_vectors.iter().take(90) {
                black_box(idx.insert(v.clone()));
            }
        })
    });

    group.finish();
}

// ============================================================================
// Memory Usage Under Load
// ============================================================================

fn bench_memory_growth(c: &mut Criterion) {
    let mut group = c.benchmark_group("Memory Growth");
    group.sample_size(10);

    let dims = 768;

    for &n in [1_000, 10_000, 50_000, 100_000].iter() {
        let vectors = generate_random_vectors(n, dims, 42);

        group.bench_with_input(BenchmarkId::from_parameter(n), &vectors, |bench, vecs| {
            bench.iter(|| {
                let index = HnswIndex::new(dims, 16, 64, 40, 42);
                index.insert_batch(vecs);

                let memory = index.memory_usage();
                let per_vector = memory as f64 / n as f64;

                black_box((memory, per_vector))
            })
        });
    }

    group.finish();
}

fn bench_memory_efficiency(c: &mut Criterion) {
    let mut group = c.benchmark_group("Memory Efficiency (M parameter)");
    group.sample_size(10);

    let dims = 768;
    let n = 10_000;
    let vectors = generate_random_vectors(n, dims, 42);

    for &m in [8, 12, 16, 24, 32, 48].iter() {
        group.bench_with_input(BenchmarkId::from_parameter(m), &m, |bench, &m_val| {
            bench.iter(|| {
                let index = HnswIndex::new(dims, m_val, 64, 40, 42);
                index.insert_batch(&vectors);

                let memory = index.memory_usage();
                let per_vector = memory as f64 / n as f64;

                black_box(per_vector)
            })
        });
    }

    group.finish();
}

// ============================================================================
// Latency Distribution
// ============================================================================

fn bench_latency_distribution(c: &mut Criterion) {
    let mut group = c.benchmark_group("Latency Distribution");
    group.sample_size(10);

    let dims = 768;
    let n = 100_000;
    let vectors = generate_random_vectors(n, dims, 42);
    let queries = generate_random_vectors(1000, dims, 999);

    let index = HnswIndex::new(dims, 16, 64, 40, 42);
    index.insert_batch(&vectors);

    group.bench_function("collect_percentiles", |bench| {
        bench.iter(|| {
            let mut latencies: Vec<Duration> = Vec::with_capacity(queries.len());

            for query in &queries {
                let start = Instant::now();
                black_box(index.search(query, 10));
                latencies.push(start.elapsed());
            }

            latencies.sort();

            let p50 = latencies[latencies.len() / 2];
            let p95 = latencies[(latencies.len() as f64 * 0.95) as usize];
            let p99 = latencies[(latencies.len() as f64 * 0.99) as usize];
            let p999 = latencies[(latencies.len() as f64 * 0.999) as usize];

            black_box((p50, p95, p99, p999))
        })
    });

    group.finish();
}
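The percentile extraction above indexes a sorted latency vector at floor(len * q). A std-only sketch of the same indexing (hypothetical helper; it adds a clamp so q close to 1.0 cannot index past the end, which the inline version above relies on len = 1000 to avoid):

```rust
/// Nearest-rank percentile over an already-sorted slice, clamped to the last element.
fn percentile(sorted: &[u64], q: f64) -> u64 {
    let idx = ((sorted.len() as f64 * q) as usize).min(sorted.len() - 1);
    sorted[idx]
}

fn main() {
    let latencies: Vec<u64> = (1..=100).collect(); // already sorted
    assert_eq!(percentile(&latencies, 0.50), 51); // index 50, the upper median
    assert_eq!(percentile(&latencies, 0.95), 96); // index 95
    assert_eq!(percentile(&latencies, 0.999), 100); // clamped to the last element
}
```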

// ============================================================================
// Dimension Scaling
// ============================================================================

fn bench_dimension_scaling(c: &mut Criterion) {
    let mut group = c.benchmark_group("Dimension Scaling");
    group.sample_size(10);

    let n = 10_000;

    for &dims in [64, 128, 256, 384, 512, 768, 1024, 1536, 2048, 3072].iter() {
        let vectors = generate_random_vectors(n, dims, 42);
        let query = vectors[0].clone();

        let index = HnswIndex::new(dims, 16, 64, 40, 42);
        index.insert_batch(&vectors);

        group.bench_with_input(BenchmarkId::new("search", dims), &dims, |bench, _| {
            bench.iter(|| black_box(index.search(&query, 10)))
        });
    }

    group.finish();
}

// ============================================================================
// pgvector Comparison Baselines
// ============================================================================

fn bench_baseline_brute_force(c: &mut Criterion) {
    let mut group = c.benchmark_group("Baseline Brute Force");
    group.sample_size(10);

    for &dims in [128, 384, 768, 1536].iter() {
        for &n in [1_000, 10_000, 100_000].iter() {
            let vectors = generate_random_vectors(n, dims, 42);
            let query = vectors[0].clone();

            group.throughput(Throughput::Elements(n as u64));

            // Sequential brute force
            group.bench_with_input(
                BenchmarkId::new(format!("{}d_seq", dims), n),
                &vectors,
                |bench, vecs| {
                    bench.iter(|| {
                        let mut distances: Vec<(usize, f32)> = vecs
                            .iter()
                            .enumerate()
                            .map(|(i, v)| {
                                let dist: f32 = query
                                    .iter()
                                    .zip(v.iter())
                                    .map(|(a, b)| (a - b).powi(2))
                                    .sum::<f32>()
                                    .sqrt();
                                (i, dist)
                            })
                            .collect();

                        distances.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
                        distances.truncate(10);
                        black_box(distances)
                    })
                },
            );

            // Parallel brute force
            group.bench_with_input(
                BenchmarkId::new(format!("{}d_par", dims), n),
                &vectors,
                |bench, vecs| {
                    bench.iter(|| {
                        let mut distances: Vec<(usize, f32)> = vecs
                            .par_iter()
                            .enumerate()
                            .map(|(i, v)| {
                                let dist: f32 = query
                                    .iter()
                                    .zip(v.iter())
                                    .map(|(a, b)| (a - b).powi(2))
                                    .sum::<f32>()
                                    .sqrt();
                                (i, dist)
                            })
                            .collect();

                        distances.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
                        distances.truncate(10);
                        black_box(distances)
                    })
                },
            );
        }
    }

    group.finish();
}

// ============================================================================
// Recall vs Throughput Tradeoff
// ============================================================================

fn bench_recall_throughput_tradeoff(c: &mut Criterion) {
    let mut group = c.benchmark_group("Recall vs Throughput");
    group.sample_size(10);

    let dims = 768;
    let n = 10_000;
    let vectors = generate_random_vectors(n, dims, 42);
    let query = vectors[0].clone();

    // Compute ground truth
    let ground_truth: Vec<usize> = {
        let mut distances: Vec<(usize, f32)> = vectors
            .iter()
            .enumerate()
            .map(|(i, v)| {
                let dist: f32 = query
                    .iter()
                    .zip(v.iter())
                    .map(|(a, b)| (a - b).powi(2))
                    .sum::<f32>()
                    .sqrt();
                (i, dist)
            })
            .collect();
        distances.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
        distances.iter().take(10).map(|(i, _)| *i).collect()
    };

    for &ef_search in [10, 20, 40, 80, 160, 320].iter() {
        let index = HnswIndex::new(dims, 16, 64, ef_search, 42);
        index.insert_batch(&vectors);

        group.bench_with_input(
            BenchmarkId::from_parameter(ef_search),
            &ef_search,
            |bench, _| {
                bench.iter(|| {
                    // Note: search() in this harness is brute force, so recall is
                    // 1.0 by construction regardless of ef_search; swap in a real
                    // HNSW search to observe the actual tradeoff.
                    let results = index.search(&query, 10);

                    // Calculate recall
                    let recall = results
                        .iter()
                        .filter(|(id, _)| ground_truth.contains(&(*id as usize)))
                        .count() as f64
                        / 10.0;

                    black_box(recall)
                })
            },
        );
    }

    group.finish();
}
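Recall above is the fraction of returned ids that appear in the brute-force ground truth. A std-only sketch of that computation (hypothetical helper name):

```rust
use std::collections::HashSet;

/// Fraction of returned ids that appear in the ground-truth top-k.
fn recall_at_k(results: &[u64], ground_truth: &[u64]) -> f64 {
    let truth: HashSet<u64> = ground_truth.iter().copied().collect();
    results.iter().filter(|id| truth.contains(id)).count() as f64
        / ground_truth.len() as f64
}

fn main() {
    let truth = [1, 2, 3, 4, 5];
    let found = [1, 2, 3, 9, 10]; // 3 of 5 true neighbors recovered
    assert!((recall_at_k(&found, &truth) - 0.6).abs() < 1e-12);
}
```

Using a `HashSet` for the ground truth keeps the check O(k) per query instead of the O(k²) `contains` scan over a `Vec` used in the loop above.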

criterion_group!(
    benches,
    // Query Pipeline
    bench_query_pipeline,
    bench_query_pipeline_parallel,
    // Insert Throughput
    bench_insert_throughput,
    bench_insert_throughput_parallel,
    bench_insert_batching,
    // Concurrent Scaling
    bench_concurrent_scaling,
    bench_mixed_workload,
    // Memory Usage
    bench_memory_growth,
    bench_memory_efficiency,
    // Latency
    bench_latency_distribution,
    // Dimension Scaling
    bench_dimension_scaling,
    // Baselines
    bench_baseline_brute_force,
    // Recall/Throughput
    bench_recall_throughput_tradeoff,
);

criterion_main!(benches);

742
crates/ruvector-postgres/benches/hybrid_bench.rs
Normal file
@@ -0,0 +1,742 @@
//! Hybrid search benchmarks
//!
//! Benchmarks for combining vector search with keyword/BM25 scoring:
//! - Vector-only vs hybrid latency
//! - BM25 scoring overhead
//! - Fusion algorithm comparison (RRF, weighted sum)
//! - Parallel branch execution gain

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use rand::prelude::*;
use rand_chacha::ChaCha8Rng;
use rayon::prelude::*;
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashMap, HashSet};

// ============================================================================
// BM25 Implementation
// ============================================================================

mod bm25 {
    use std::cmp::Ordering;
    use std::collections::HashMap;

    /// Simple tokenizer
    pub fn tokenize(text: &str) -> Vec<String> {
        text.to_lowercase()
            .split(|c: char| !c.is_alphanumeric())
            .filter(|s| !s.is_empty() && s.len() > 2)
            .map(|s| s.to_string())
            .collect()
    }

    /// BM25 scoring index
    pub struct BM25Index {
        /// Document frequency for each term
        pub doc_freq: HashMap<String, usize>,
        /// Term frequency per document
        pub term_freq: Vec<HashMap<String, usize>>,
        /// Document lengths
        pub doc_lengths: Vec<usize>,
        /// Average document length
        pub avg_doc_len: f64,
        /// Number of documents
        pub num_docs: usize,
        /// BM25 parameters
        pub k1: f64,
        pub b: f64,
    }

    impl BM25Index {
        pub fn new(k1: f64, b: f64) -> Self {
            Self {
                doc_freq: HashMap::new(),
                term_freq: Vec::new(),
                doc_lengths: Vec::new(),
                avg_doc_len: 0.0,
                num_docs: 0,
                k1,
                b,
            }
        }

        pub fn build(&mut self, documents: &[String]) {
            self.num_docs = documents.len();
            self.term_freq = Vec::with_capacity(documents.len());
            self.doc_lengths = Vec::with_capacity(documents.len());

            let mut total_len = 0usize;

            for doc in documents {
                let tokens = tokenize(doc);
                self.doc_lengths.push(tokens.len());
                total_len += tokens.len();

                let mut tf: HashMap<String, usize> = HashMap::new();
                let mut seen_terms: std::collections::HashSet<String> =
                    std::collections::HashSet::new();

                for token in tokens {
                    *tf.entry(token.clone()).or_insert(0) += 1;

                    if !seen_terms.contains(&token) {
                        *self.doc_freq.entry(token.clone()).or_insert(0) += 1;
                        seen_terms.insert(token);
                    }
                }

                self.term_freq.push(tf);
            }

            self.avg_doc_len = total_len as f64 / documents.len() as f64;
        }

        /// Calculate IDF for a term
        fn idf(&self, term: &str) -> f64 {
            let df = self.doc_freq.get(term).copied().unwrap_or(0) as f64;
            if df == 0.0 {
                return 0.0;
            }
            ((self.num_docs as f64 - df + 0.5) / (df + 0.5) + 1.0).ln()
        }

        /// Score a document against a query
        pub fn score(&self, doc_id: usize, query_tokens: &[String]) -> f64 {
            if doc_id >= self.term_freq.len() {
                return 0.0;
            }

            let doc_tf = &self.term_freq[doc_id];
            let doc_len = self.doc_lengths[doc_id] as f64;

            let mut score = 0.0;

            for term in query_tokens {
                let tf = doc_tf.get(term).copied().unwrap_or(0) as f64;
                if tf == 0.0 {
                    continue;
                }

                let idf = self.idf(term);
                let numerator = tf * (self.k1 + 1.0);
                let denominator =
                    tf + self.k1 * (1.0 - self.b + self.b * (doc_len / self.avg_doc_len));

                score += idf * (numerator / denominator);
            }

            score
        }

        /// Search and return top-k documents
        pub fn search(&self, query: &str, k: usize) -> Vec<(usize, f64)> {
            let query_tokens = tokenize(query);

            let mut scores: Vec<(usize, f64)> = (0..self.num_docs)
                .map(|doc_id| (doc_id, self.score(doc_id, &query_tokens)))
                .filter(|(_, score)| *score > 0.0)
                .collect();

            scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
            scores.truncate(k);
            scores
        }
    }
}
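The `idf` method above computes ln((N - df + 0.5) / (df + 0.5) + 1), the smoothed Okapi BM25 inverse document frequency: rarer terms score higher, and the +1 inside the log keeps the value positive even for terms that occur in every document. A std-only sketch of just that term (hypothetical helper; N and df values are assumptions for the example):

```rust
/// Smoothed BM25 inverse document frequency for a term appearing in
/// `doc_freq` of `num_docs` documents.
fn idf(num_docs: usize, doc_freq: usize) -> f64 {
    let n = num_docs as f64;
    let df = doc_freq as f64;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

fn main() {
    // Term in 1 of 2 docs: ln((2 - 1 + 0.5) / 1.5 + 1) = ln(2)
    assert!((idf(2, 1) - 2f64.ln()).abs() < 1e-12);
    // Rarer terms score higher...
    assert!(idf(1000, 10) > idf(1000, 500));
    // ...and even a term in every document stays slightly positive.
    assert!(idf(1000, 1000) > 0.0);
}
```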

// ============================================================================
// Vector Search (Simplified)
// ============================================================================

mod vector_search {
    use std::cmp::Ordering;

    pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
        a.iter()
            .zip(b.iter())
            .map(|(x, y)| (x - y).powi(2))
            .sum::<f32>()
            .sqrt()
    }

    pub fn search(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
        let mut results: Vec<(usize, f32)> = vectors
            .iter()
            .enumerate()
            .map(|(i, v)| (i, euclidean_distance(query, v)))
            .collect();

        results.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal));
        results.truncate(k);
        results
    }

    pub fn search_parallel(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
        use rayon::prelude::*;

        let mut results: Vec<(usize, f32)> = vectors
            .par_iter()
            .enumerate()
            .map(|(i, v)| (i, euclidean_distance(query, v)))
            .collect();

        results.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal));
        results.truncate(k);
        results
    }
}
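`euclidean_distance` above is the plain L2 metric; the 3-4-5 right triangle makes a convenient sanity check. A std-only sketch that duplicates the function so the snippet runs standalone:

```rust
/// Same L2 distance as vector_search::euclidean_distance above.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    // 3-4-5 triangle: sqrt(9 + 16) = 5
    assert_eq!(euclidean_distance(&[0.0, 0.0], &[3.0, 4.0]), 5.0);
    // Distance from a point to itself is zero
    assert_eq!(euclidean_distance(&[1.0, 2.0], &[1.0, 2.0]), 0.0);
}
```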

// ============================================================================
// Fusion Algorithms
// ============================================================================

mod fusion {
    use std::collections::{HashMap, HashSet};

    /// Reciprocal Rank Fusion
    pub fn rrf(
        vector_results: &[(usize, f32)],
        text_results: &[(usize, f64)],
        k: usize,
        rrf_k: f64,
    ) -> Vec<(usize, f64)> {
        let mut scores: HashMap<usize, f64> = HashMap::new();

        // Vector results
        for (rank, (doc_id, _)) in vector_results.iter().enumerate() {
            let rrf_score = 1.0 / (rrf_k + rank as f64 + 1.0);
            *scores.entry(*doc_id).or_insert(0.0) += rrf_score;
        }

        // Text results
        for (rank, (doc_id, _)) in text_results.iter().enumerate() {
            let rrf_score = 1.0 / (rrf_k + rank as f64 + 1.0);
            *scores.entry(*doc_id).or_insert(0.0) += rrf_score;
        }

        let mut results: Vec<(usize, f64)> = scores.into_iter().collect();
        results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        results.truncate(k);
        results
    }

    /// Weighted score fusion (requires normalized scores)
    pub fn weighted_sum(
        vector_results: &[(usize, f32)],
        text_results: &[(usize, f64)],
        k: usize,
        vector_weight: f64,
        text_weight: f64,
    ) -> Vec<(usize, f64)> {
        // Normalize vector scores (lower distance = higher score)
        let max_dist = vector_results
            .iter()
            .map(|(_, d)| *d)
            .fold(0.0f32, f32::max);
        let vector_scores: HashMap<usize, f64> = vector_results
            .iter()
            .map(|(id, dist)| (*id, (1.0 - dist / max_dist.max(1e-6)) as f64))
            .collect();

        // Normalize text scores
        let max_text = text_results.iter().map(|(_, s)| *s).fold(0.0f64, f64::max);
        let text_scores: HashMap<usize, f64> = text_results
            .iter()
            .map(|(id, score)| (*id, score / max_text.max(1e-6)))
            .collect();

        // Combine
        let mut all_ids: HashSet<usize> = HashSet::new();
        all_ids.extend(vector_scores.keys());
        all_ids.extend(text_scores.keys());

        let mut results: Vec<(usize, f64)> = all_ids
            .iter()
            .map(|&id| {
                let v_score = vector_scores.get(&id).copied().unwrap_or(0.0);
                let t_score = text_scores.get(&id).copied().unwrap_or(0.0);
                (id, vector_weight * v_score + text_weight * t_score)
            })
            .collect();

        results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        results.truncate(k);
        results
    }

    /// Disjunctive Normalization
    pub fn disjunctive_normalization(
        vector_results: &[(usize, f32)],
        text_results: &[(usize, f64)],
        k: usize,
    ) -> Vec<(usize, f64)> {
        let mut scores: HashMap<usize, f64> = HashMap::new();

        // Vector results (convert distance to similarity)
        let max_dist = vector_results
            .iter()
            .map(|(_, d)| *d)
            .fold(0.0f32, f32::max);
        for (doc_id, dist) in vector_results {
            let sim = 1.0 - (*dist / max_dist.max(1e-6)) as f64;
            scores.insert(*doc_id, sim);
        }

        // Text results (add if not present, max if present)
        let max_text = text_results.iter().map(|(_, s)| *s).fold(0.0f64, f64::max);
        for (doc_id, score) in text_results {
            let norm_score = score / max_text.max(1e-6);
            let current = scores.entry(*doc_id).or_insert(0.0);
            *current = current.max(norm_score);
        }

        let mut results: Vec<(usize, f64)> = scores.into_iter().collect();
        results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        results.truncate(k);
        results
    }
}
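`rrf` above scores each document as the sum of 1/(rrf_k + rank + 1) over the result lists it appears in, so a document ranked moderately in both branches can beat one ranked first in only one. A std-only sketch with two toy ranked id lists (the ids are assumptions for the example):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over any number of best-first ranked lists.
fn rrf_scores(lists: &[&[usize]], rrf_k: f64) -> HashMap<usize, f64> {
    let mut scores = HashMap::new();
    for list in lists {
        for (rank, doc) in list.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (rrf_k + rank as f64 + 1.0);
        }
    }
    scores
}

fn main() {
    let vector_ranked = [10, 20, 30]; // best-first vector branch
    let text_ranked = [20, 40]; // best-first BM25 branch
    let scores = rrf_scores(&[&vector_ranked[..], &text_ranked[..]], 60.0);

    // Doc 20 appears in both lists (1/62 + 1/61), so it beats doc 10 (1/61)
    // even though it never ranked first in either branch.
    let best = scores
        .iter()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(doc, _)| *doc);
    assert_eq!(best, Some(20));
}
```

The constant rrf_k = 60 is the value commonly used in the RRF literature; it damps the gap between adjacent ranks so that agreement across branches dominates.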

use bm25::{tokenize, BM25Index};
use fusion::{disjunctive_normalization, rrf, weighted_sum};
use vector_search::{search as vector_search_fn, search_parallel as vector_search_parallel};

// ============================================================================
// Test Data Generation
// ============================================================================

fn generate_random_vectors(n: usize, dims: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut rng = ChaCha8Rng::seed_from_u64(seed);
    (0..n)
        .map(|_| (0..dims).map(|_| rng.gen_range(-1.0..1.0)).collect())
        .collect()
}

fn generate_random_documents(n: usize, seed: u64) -> Vec<String> {
    let words = [
        "machine", "learning", "artificial", "intelligence", "neural", "network",
        "deep", "training", "model", "data", "algorithm", "optimization",
        "gradient", "descent", "backpropagation", "convolution", "recurrent",
        "transformer", "attention", "embedding", "vector", "search", "similarity",
        "distance", "nearest", "neighbor", "index", "query", "retrieval",
        "ranking", "database", "storage", "distributed", "parallel", "processing",
    ];

    let mut rng = ChaCha8Rng::seed_from_u64(seed);

    (0..n)
        .map(|_| {
            let len = rng.gen_range(20..100);
            (0..len)
                .map(|_| words[rng.gen_range(0..words.len())])
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect()
}
|
||||
|
||||
// ============================================================================
|
||||
// Vector-Only vs Hybrid Benchmarks
|
||||
// ============================================================================
|
||||
|
||||
fn bench_vector_only(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Vector Only Search");
|
||||
|
||||
for &n in [10_000, 100_000].iter() {
|
||||
let dims = 768;
|
||||
let vectors = generate_random_vectors(n, dims, 42);
|
||||
let query = vectors[0].clone();
|
||||
|
||||
group.throughput(Throughput::Elements(n as u64));
|
||||
|
||||
group.bench_with_input(BenchmarkId::new("sequential", n), &n, |bench, _| {
|
||||
bench.iter(|| black_box(vector_search_fn(&vectors, &query, 10)))
|
||||
});
|
||||
|
||||
group.bench_with_input(BenchmarkId::new("parallel", n), &n, |bench, _| {
|
||||
bench.iter(|| black_box(vector_search_parallel(&vectors, &query, 10)))
|
||||
});
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
fn bench_text_only(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Text Only (BM25) Search");
|
||||
|
||||
for &n in [10_000, 100_000].iter() {
|
||||
let documents = generate_random_documents(n, 42);
|
||||
|
||||
let mut bm25 = BM25Index::new(1.2, 0.75);
|
||||
bm25.build(&documents);
|
||||
|
||||
let query = "machine learning neural network";
|
||||
|
||||
group.throughput(Throughput::Elements(n as u64));
|
||||
|
||||
group.bench_with_input(BenchmarkId::from_parameter(n), &n, |bench, _| {
|
||||
bench.iter(|| black_box(bm25.search(query, 10)))
|
||||
});
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
fn bench_hybrid_search(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Hybrid Search");
|
||||
|
||||
for &n in [10_000, 100_000].iter() {
|
||||
let dims = 768;
|
||||
let vectors = generate_random_vectors(n, dims, 42);
|
||||
let documents = generate_random_documents(n, 42);
|
||||
let vector_query = vectors[0].clone();
|
||||
let text_query = "machine learning neural network";
|
||||
|
||||
let mut bm25 = BM25Index::new(1.2, 0.75);
|
||||
bm25.build(&documents);
|
||||
|
||||
group.throughput(Throughput::Elements(n as u64));
|
||||
|
||||
// Sequential hybrid
|
||||
group.bench_with_input(BenchmarkId::new("sequential", n), &n, |bench, _| {
|
||||
bench.iter(|| {
|
||||
let vector_results = vector_search_fn(&vectors, &vector_query, 100);
|
||||
let text_results = bm25.search(text_query, 100);
|
||||
black_box(rrf(&vector_results, &text_results, 10, 60.0))
|
||||
})
|
||||
});
|
||||
|
||||
// Parallel hybrid (branches)
|
||||
group.bench_with_input(BenchmarkId::new("parallel_branches", n), &n, |bench, _| {
|
||||
bench.iter(|| {
|
||||
let (vector_results, text_results) = rayon::join(
|
||||
|| vector_search_parallel(&vectors, &vector_query, 100),
|
||||
|| bm25.search(text_query, 100),
|
||||
);
|
||||
black_box(rrf(&vector_results, &text_results, 10, 60.0))
|
||||
})
|
||||
});
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// BM25 Overhead Benchmarks
|
||||
// ============================================================================
|
||||
|
||||
fn bench_bm25_build(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("BM25 Index Build");
|
||||
|
||||
for &n in [1_000, 10_000, 100_000].iter() {
|
||||
let documents = generate_random_documents(n, 42);
|
||||
|
||||
group.throughput(Throughput::Elements(n as u64));
|
||||
|
||||
group.bench_with_input(BenchmarkId::from_parameter(n), &documents, |bench, docs| {
|
||||
bench.iter(|| {
|
||||
let mut bm25 = BM25Index::new(1.2, 0.75);
|
||||
bm25.build(docs);
|
||||
black_box(bm25)
|
||||
})
|
||||
});
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
fn bench_bm25_query_lengths(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("BM25 Query Length");
|
||||
|
||||
let n = 100_000;
|
||||
let documents = generate_random_documents(n, 42);
|
||||
|
||||
let mut bm25 = BM25Index::new(1.2, 0.75);
|
||||
bm25.build(&documents);
|
||||
|
||||
let queries = [
|
||||
"machine",
|
||||
"machine learning",
|
||||
"machine learning neural network",
|
||||
"machine learning neural network deep training model",
|
||||
"machine learning neural network deep training model algorithm optimization gradient descent",
|
||||
];
|
||||
|
||||
for query in queries.iter() {
|
||||
let token_count = tokenize(query).len();
|
||||
|
||||
group.bench_with_input(
|
||||
BenchmarkId::new("tokens", token_count),
|
||||
query,
|
||||
|bench, q| bench.iter(|| black_box(bm25.search(q, 10))),
|
||||
);
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Fusion Algorithm Comparison
|
||||
// ============================================================================
|
||||
|
||||
fn bench_fusion_algorithms(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Fusion Algorithms");
|
||||
|
||||
let n = 100_000;
|
||||
let dims = 768;
|
||||
let vectors = generate_random_vectors(n, dims, 42);
|
||||
let documents = generate_random_documents(n, 42);
|
||||
let vector_query = vectors[0].clone();
|
||||
let text_query = "machine learning neural network";
|
||||
|
||||
let mut bm25 = BM25Index::new(1.2, 0.75);
|
||||
bm25.build(&documents);
|
||||
|
||||
// Pre-compute search results
|
||||
let vector_results = vector_search_fn(&vectors, &vector_query, 1000);
|
||||
let text_results = bm25.search(text_query, 1000);
|
||||
|
||||
for &k in [10, 50, 100].iter() {
|
||||
group.bench_with_input(BenchmarkId::new("rrf", k), &k, |bench, &k_val| {
|
||||
bench.iter(|| black_box(rrf(&vector_results, &text_results, k_val, 60.0)))
|
||||
});
|
||||
|
||||
group.bench_with_input(BenchmarkId::new("weighted_sum", k), &k, |bench, &k_val| {
|
||||
bench.iter(|| {
|
||||
black_box(weighted_sum(
|
||||
&vector_results,
|
||||
&text_results,
|
||||
k_val,
|
||||
0.6,
|
||||
0.4,
|
||||
))
|
||||
})
|
||||
});
|
||||
|
||||
group.bench_with_input(
|
||||
BenchmarkId::new("disjunctive_norm", k),
|
||||
&k,
|
||||
|bench, &k_val| {
|
||||
bench.iter(|| {
|
||||
black_box(disjunctive_normalization(
|
||||
&vector_results,
|
||||
&text_results,
|
||||
k_val,
|
||||
))
|
||||
})
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
fn bench_rrf_k_parameter(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("RRF K Parameter");
|
||||
|
||||
let n = 100_000;
|
||||
let dims = 768;
|
||||
let vectors = generate_random_vectors(n, dims, 42);
|
||||
let documents = generate_random_documents(n, 42);
|
||||
let vector_query = vectors[0].clone();
|
||||
let text_query = "machine learning neural network";
|
||||
|
||||
let mut bm25 = BM25Index::new(1.2, 0.75);
|
||||
bm25.build(&documents);
|
||||
|
||||
let vector_results = vector_search_fn(&vectors, &vector_query, 1000);
|
||||
let text_results = bm25.search(text_query, 1000);
|
||||
|
||||
for &rrf_k in [1.0, 20.0, 60.0, 100.0, 200.0].iter() {
|
||||
group.bench_with_input(
|
||||
BenchmarkId::from_parameter(rrf_k as i32),
|
||||
&rrf_k,
|
||||
|bench, &k| bench.iter(|| black_box(rrf(&vector_results, &text_results, 10, k))),
|
||||
);
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
fn bench_weight_ratios(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Weight Ratios");
|
||||
|
||||
let n = 100_000;
|
||||
let dims = 768;
|
||||
let vectors = generate_random_vectors(n, dims, 42);
|
||||
let documents = generate_random_documents(n, 42);
|
||||
let vector_query = vectors[0].clone();
|
||||
let text_query = "machine learning neural network";
|
||||
|
||||
let mut bm25 = BM25Index::new(1.2, 0.75);
|
||||
bm25.build(&documents);
|
||||
|
||||
let vector_results = vector_search_fn(&vectors, &vector_query, 1000);
|
||||
let text_results = bm25.search(text_query, 1000);
|
||||
|
||||
let ratios = [
|
||||
(0.0, 1.0, "text_only"),
|
||||
(0.3, 0.7, "text_heavy"),
|
||||
(0.5, 0.5, "balanced"),
|
||||
(0.7, 0.3, "vector_heavy"),
|
||||
(1.0, 0.0, "vector_only"),
|
||||
];
|
||||
|
||||
for (vector_w, text_w, name) in ratios.iter() {
|
||||
group.bench_with_input(
|
||||
BenchmarkId::from_parameter(name),
|
||||
&(*vector_w, *text_w),
|
||||
|bench, &(v_w, t_w)| {
|
||||
bench.iter(|| black_box(weighted_sum(&vector_results, &text_results, 10, v_w, t_w)))
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
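The `weighted_sum` fusion swept above is defined in the `fusion` module, which is outside this excerpt. A self-contained sketch of the max-normalized weighted-sum idea these ratios exercise (the name `weighted_fuse` and the exact normalization are illustrative, not the crate's API):

```rust
use std::collections::HashMap;

/// Max-normalize each score list, then combine per document as
/// `v_w * vector_score + t_w * text_score` (missing scores count as 0).
fn weighted_fuse(
    vector_results: &[(usize, f64)],
    text_results: &[(usize, f64)],
    k: usize,
    v_w: f64,
    t_w: f64,
) -> Vec<(usize, f64)> {
    let mut scores: HashMap<usize, f64> = HashMap::new();
    for (results, weight) in [(vector_results, v_w), (text_results, t_w)] {
        let max = results.iter().map(|(_, s)| *s).fold(0.0f64, f64::max).max(1e-6);
        for &(doc, s) in results {
            *scores.entry(doc).or_insert(0.0) += weight * s / max;
        }
    }
    let mut out: Vec<(usize, f64)> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out.truncate(k);
    out
}

fn main() {
    let vector = vec![(1, 0.9), (2, 0.8), (3, 0.1)];
    let text = vec![(2, 12.0), (4, 6.0)];
    let fused = weighted_fuse(&vector, &text, 3, 0.6, 0.4);
    // Doc 2 appears in both lists, so it outranks doc 1 despite the
    // lower raw vector score: 0.6 * (0.8 / 0.9) + 0.4 * 1.0 > 0.6 * 1.0.
    assert_eq!(fused[0].0, 2);
    println!("{:?}", fused);
}
```

Because each list is normalized by its own maximum, BM25 scores (unbounded) and cosine similarities (bounded) end up on comparable scales before the weights apply, which is why the `(0.3, 0.7)`/`(0.7, 0.3)` sweeps above are meaningful.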
// ============================================================================
// Parallel Branch Execution
// ============================================================================

fn bench_parallel_execution_gain(c: &mut Criterion) {
    let mut group = c.benchmark_group("Parallel Branch Execution");

    for &n in [10_000, 50_000, 100_000].iter() {
        let dims = 768;
        let vectors = generate_random_vectors(n, dims, 42);
        let documents = generate_random_documents(n, 42);
        let vector_query = vectors[0].clone();
        let text_query = "machine learning neural network";

        let mut bm25 = BM25Index::new(1.2, 0.75);
        bm25.build(&documents);

        // Sequential
        group.bench_with_input(BenchmarkId::new("sequential", n), &n, |bench, _| {
            bench.iter(|| {
                let vector_results = vector_search_fn(&vectors, &vector_query, 100);
                let text_results = bm25.search(text_query, 100);
                black_box((vector_results, text_results))
            })
        });

        // Parallel with rayon::join
        group.bench_with_input(BenchmarkId::new("parallel_join", n), &n, |bench, _| {
            bench.iter(|| {
                let (vector_results, text_results) = rayon::join(
                    || vector_search_fn(&vectors, &vector_query, 100),
                    || bm25.search(text_query, 100),
                );
                black_box((vector_results, text_results))
            })
        });

        // Parallel vector search only
        group.bench_with_input(BenchmarkId::new("parallel_vector", n), &n, |bench, _| {
            bench.iter(|| {
                let vector_results = vector_search_parallel(&vectors, &vector_query, 100);
                let text_results = bm25.search(text_query, 100);
                black_box((vector_results, text_results))
            })
        });

        // Full parallel
        group.bench_with_input(BenchmarkId::new("full_parallel", n), &n, |bench, _| {
            bench.iter(|| {
                let (vector_results, text_results) = rayon::join(
                    || vector_search_parallel(&vectors, &vector_query, 100),
                    || bm25.search(text_query, 100),
                );
                black_box((vector_results, text_results))
            })
        });
    }

    group.finish();
}

// ============================================================================
// Candidate Count Analysis
// ============================================================================

fn bench_candidate_counts(c: &mut Criterion) {
    let mut group = c.benchmark_group("Candidate Count Analysis");

    let n = 100_000;
    let dims = 768;
    let vectors = generate_random_vectors(n, dims, 42);
    let documents = generate_random_documents(n, 42);
    let vector_query = vectors[0].clone();
    let text_query = "machine learning neural network";

    let mut bm25 = BM25Index::new(1.2, 0.75);
    bm25.build(&documents);

    for &candidates in [50, 100, 200, 500, 1000, 2000].iter() {
        group.bench_with_input(
            BenchmarkId::from_parameter(candidates),
            &candidates,
            |bench, &k_candidates| {
                bench.iter(|| {
                    let (vector_results, text_results) = rayon::join(
                        || vector_search_parallel(&vectors, &vector_query, k_candidates),
                        || bm25.search(text_query, k_candidates),
                    );
                    black_box(rrf(&vector_results, &text_results, 10, 60.0))
                })
            },
        );
    }

    group.finish();
}

criterion_group!(
    benches,
    // Vector vs Text
    bench_vector_only,
    bench_text_only,
    bench_hybrid_search,
    // BM25 Overhead
    bench_bm25_build,
    bench_bm25_query_lengths,
    // Fusion Algorithms
    bench_fusion_algorithms,
    bench_rrf_k_parameter,
    bench_weight_ratios,
    // Parallel Execution
    bench_parallel_execution_gain,
    bench_candidate_counts,
);

criterion_main!(benches);
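The `rrf` fusion these groups sweep (with `rrf_k` from 1 to 200) is rank-based rather than score-based: each list contributes `1 / (rrf_k + rank)` per document, so raw score scales never need normalizing. Since the `fusion` module is outside this excerpt, here is a standalone sketch of standard Reciprocal Rank Fusion (names and signature are illustrative, not the crate's API):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over ranked doc-id lists; ranks start at 1.
fn rrf_fuse(lists: &[&[usize]], k: usize, rrf_k: f64) -> Vec<(usize, f64)> {
    let mut scores: HashMap<usize, f64> = HashMap::new();
    for list in lists {
        for (rank, &doc) in list.iter().enumerate() {
            *scores.entry(doc).or_insert(0.0) += 1.0 / (rrf_k + (rank + 1) as f64);
        }
    }
    let mut out: Vec<(usize, f64)> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out.truncate(k);
    out
}

fn main() {
    let vector_ranked = [10usize, 20, 30];
    let text_ranked = [20usize, 40, 10];
    let fused = rrf_fuse(&[&vector_ranked[..], &text_ranked[..]], 2, 60.0);
    // Doc 20 is ranked in both lists (2nd and 1st), so it wins:
    // 1/62 + 1/61 > 1/61 + 1/63 (doc 10's total).
    assert_eq!(fused[0].0, 20);
    assert_eq!(fused[1].0, 10);
}
```

Smaller `rrf_k` values sharpen the influence of top ranks; the sweep in `bench_rrf_k_parameter` measures only the (tiny) cost difference, since the arithmetic is identical for any `rrf_k`.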
1394	crates/ruvector-postgres/benches/index_bench.rs	Normal file (diff suppressed because it is too large)
915	crates/ruvector-postgres/benches/integrity_bench.rs	Normal file
@@ -0,0 +1,915 @@
//! Index integrity and graph maintenance benchmarks
//!
//! Benchmarks for v2 structural integrity features:
//! - Contracted graph construction
//! - Mincut computation time
//! - State transition overhead
//! - Gating check latency
//! - Graph connectivity verification

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use rand::prelude::*;
use rand_chacha::ChaCha8Rng;
use rayon::prelude::*;
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashMap, HashSet, VecDeque};

// ============================================================================
// Graph Structures for Index Integrity
// ============================================================================

mod graph {
    use std::cmp::Ordering;
    use std::collections::{BinaryHeap, HashMap, HashSet, VecDeque};

    /// Node in the HNSW graph (simplified)
    #[derive(Clone)]
    pub struct GraphNode {
        pub id: u64,
        pub neighbors: Vec<u64>,
        pub layer: usize,
    }

    /// Graph for integrity checking
    pub struct Graph {
        pub nodes: HashMap<u64, GraphNode>,
        pub max_layer: usize,
    }

    impl Graph {
        pub fn new() -> Self {
            Self {
                nodes: HashMap::new(),
                max_layer: 0,
            }
        }

        pub fn add_node(&mut self, id: u64, layer: usize) {
            self.nodes.insert(
                id,
                GraphNode {
                    id,
                    neighbors: Vec::new(),
                    layer,
                },
            );
            self.max_layer = self.max_layer.max(layer);
        }

        pub fn add_edge(&mut self, from: u64, to: u64) {
            if let Some(node) = self.nodes.get_mut(&from) {
                if !node.neighbors.contains(&to) {
                    node.neighbors.push(to);
                }
            }
        }

        pub fn len(&self) -> usize {
            self.nodes.len()
        }
    }

    /// Contracted graph for integrity verification
    pub struct ContractedGraph {
        /// Super-nodes (contracted regions)
        pub super_nodes: Vec<SuperNode>,
        /// Edges between super-nodes
        pub super_edges: Vec<(usize, usize, f32)>,
        /// Node to super-node mapping
        pub node_mapping: HashMap<u64, usize>,
    }

    #[derive(Clone)]
    pub struct SuperNode {
        pub id: usize,
        pub original_nodes: Vec<u64>,
        pub internal_edges: usize,
    }

    impl ContractedGraph {
        pub fn new() -> Self {
            Self {
                super_nodes: Vec::new(),
                super_edges: Vec::new(),
                node_mapping: HashMap::new(),
            }
        }

        /// Build contracted graph from original graph
        pub fn build_from_graph(graph: &Graph, contraction_factor: usize) -> Self {
            let mut contracted = ContractedGraph::new();

            // Group nodes by region (simplified partitioning)
            let node_ids: Vec<u64> = graph.nodes.keys().copied().collect();
            let _num_super_nodes = (node_ids.len() / contraction_factor).max(1);

            for (i, chunk) in node_ids.chunks(contraction_factor).enumerate() {
                let super_node = SuperNode {
                    id: i,
                    original_nodes: chunk.to_vec(),
                    internal_edges: chunk
                        .iter()
                        .filter_map(|&id| graph.nodes.get(&id))
                        .flat_map(|n| n.neighbors.iter())
                        .filter(|&&neighbor| chunk.contains(&neighbor))
                        .count(),
                };

                for &node_id in chunk {
                    contracted.node_mapping.insert(node_id, i);
                }

                contracted.super_nodes.push(super_node);
            }

            // Build super edges
            let mut edge_weights: HashMap<(usize, usize), f32> = HashMap::new();

            for node in graph.nodes.values() {
                let from_super = contracted.node_mapping[&node.id];

                for &neighbor in &node.neighbors {
                    if let Some(&to_super) = contracted.node_mapping.get(&neighbor) {
                        if from_super != to_super {
                            let key = if from_super < to_super {
                                (from_super, to_super)
                            } else {
                                (to_super, from_super)
                            };
                            *edge_weights.entry(key).or_insert(0.0) += 1.0;
                        }
                    }
                }
            }

            contracted.super_edges = edge_weights
                .into_iter()
                .map(|((a, b), w)| (a, b, w))
                .collect();

            contracted
        }

        pub fn num_super_nodes(&self) -> usize {
            self.super_nodes.len()
        }

        pub fn num_super_edges(&self) -> usize {
            self.super_edges.len()
        }
    }

    /// Mincut computation using Ford-Fulkerson algorithm
    pub struct MincutComputer {
        /// Adjacency list with capacities
        adj: Vec<Vec<(usize, f32)>>,
        pub n: usize,
    }

    impl MincutComputer {
        pub fn from_contracted_graph(contracted: &ContractedGraph) -> Self {
            let n = contracted.num_super_nodes();
            let mut adj: Vec<Vec<(usize, f32)>> = vec![Vec::new(); n];

            for &(a, b, w) in &contracted.super_edges {
                adj[a].push((b, w));
                adj[b].push((a, w));
            }

            Self { adj, n }
        }

        /// Find mincut using BFS-based augmenting paths
        pub fn compute_mincut(&self, source: usize, sink: usize) -> f32 {
            if source == sink || self.n == 0 {
                return 0.0;
            }

            // Create residual capacity matrix
            let mut residual: Vec<Vec<f32>> = vec![vec![0.0; self.n]; self.n];

            for (from, edges) in self.adj.iter().enumerate() {
                for &(to, cap) in edges {
                    residual[from][to] = cap;
                }
            }

            let mut max_flow = 0.0;

            // BFS to find augmenting path
            loop {
                let mut parent = vec![None; self.n];
                let mut visited = vec![false; self.n];
                let mut queue = VecDeque::new();

                visited[source] = true;
                queue.push_back(source);

                while let Some(u) = queue.pop_front() {
                    for v in 0..self.n {
                        if !visited[v] && residual[u][v] > 0.0 {
                            visited[v] = true;
                            parent[v] = Some(u);
                            queue.push_back(v);
                        }
                    }
                }

                if !visited[sink] {
                    break;
                }

                // Find minimum residual capacity along path
                let mut path_flow = f32::MAX;
                let mut v = sink;
                while let Some(u) = parent[v] {
                    path_flow = path_flow.min(residual[u][v]);
                    v = u;
                }

                // Update residual capacities
                v = sink;
                while let Some(u) = parent[v] {
                    residual[u][v] -= path_flow;
                    residual[v][u] += path_flow;
                    v = u;
                }

                max_flow += path_flow;
            }

            max_flow
        }

        /// Compute global mincut (minimum over all pairs)
        pub fn compute_global_mincut(&self) -> f32 {
            if self.n <= 1 {
                return 0.0;
            }

            let mut min_cut = f32::MAX;

            // Use Stoer-Wagner-like approach: fix node 0 as source
            for sink in 1..self.n {
                let cut = self.compute_mincut(0, sink);
                min_cut = min_cut.min(cut);
            }

            min_cut
        }
    }
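`compute_mincut` above is an Edmonds-Karp style max-flow (BFS augmenting paths over a residual matrix); by the max-flow/min-cut theorem the flow it returns equals the s-t mincut. The mechanism can be sanity-checked on a tiny directed example with a known answer, independent of the benchmark types:

```rust
use std::collections::VecDeque;

/// BFS-based max-flow (Edmonds-Karp) on a dense capacity matrix.
/// The returned flow equals the s-t mincut value.
fn max_flow(mut residual: Vec<Vec<f32>>, s: usize, t: usize) -> f32 {
    let n = residual.len();
    let mut flow = 0.0;
    loop {
        // BFS for a shortest augmenting path in the residual graph.
        let mut parent = vec![None; n];
        let mut visited = vec![false; n];
        let mut queue = VecDeque::from([s]);
        visited[s] = true;
        while let Some(u) = queue.pop_front() {
            for v in 0..n {
                if !visited[v] && residual[u][v] > 0.0 {
                    visited[v] = true;
                    parent[v] = Some(u);
                    queue.push_back(v);
                }
            }
        }
        if !visited[t] {
            return flow; // no augmenting path left
        }
        // Bottleneck along the path, then update residual capacities.
        let mut path_flow = f32::MAX;
        let mut v = t;
        while let Some(u) = parent[v] {
            path_flow = path_flow.min(residual[u][v]);
            v = u;
        }
        v = t;
        while let Some(u) = parent[v] {
            residual[u][v] -= path_flow;
            residual[v][u] += path_flow;
            v = u;
        }
        flow += path_flow;
    }
}

fn main() {
    // 0 -> 1 (cap 3), 0 -> 2 (cap 2), 1 -> 3 (cap 2), 2 -> 3 (cap 3)
    let cap = vec![
        vec![0.0, 3.0, 2.0, 0.0],
        vec![0.0, 0.0, 0.0, 2.0],
        vec![0.0, 0.0, 0.0, 3.0],
        vec![0.0, 0.0, 0.0, 0.0],
    ];
    // Bottleneck paths give min(3,2) + min(2,3) = 4.
    assert_eq!(max_flow(cap, 0, 3), 4.0);
}
```

Note the benchmarked version runs this over the *contracted* graph, which is why the contraction factor sweep below matters: the dense `n x n` residual matrix makes cost grow quadratically in the number of super-nodes.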
    /// State machine for index integrity
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum IndexState {
        Uninitialized,
        Building,
        Ready,
        Updating,
        Corrupted,
        Recovering,
    }

    pub struct IndexStateMachine {
        pub state: IndexState,
        pub transition_count: usize,
        pub last_integrity_check: std::time::Instant,
        pub integrity_score: f32,
    }

    impl IndexStateMachine {
        pub fn new() -> Self {
            Self {
                state: IndexState::Uninitialized,
                transition_count: 0,
                last_integrity_check: std::time::Instant::now(),
                integrity_score: 1.0,
            }
        }

        pub fn can_transition(&self, to: IndexState) -> bool {
            match (self.state, to) {
                (IndexState::Uninitialized, IndexState::Building) => true,
                (IndexState::Building, IndexState::Ready) => true,
                (IndexState::Ready, IndexState::Updating) => true,
                (IndexState::Updating, IndexState::Ready) => true,
                (_, IndexState::Corrupted) => true,
                (IndexState::Corrupted, IndexState::Recovering) => true,
                (IndexState::Recovering, IndexState::Ready) => true,
                _ => false,
            }
        }

        pub fn transition(&mut self, to: IndexState) -> Result<(), &'static str> {
            if self.can_transition(to) {
                self.state = to;
                self.transition_count += 1;
                Ok(())
            } else {
                Err("Invalid state transition")
            }
        }
    }
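The transition table in `can_transition` is a small DFA: a single happy path `Uninitialized -> Building -> Ready <-> Updating`, with `Corrupted` reachable from anywhere and recoverable only via `Recovering`. A condensed, self-contained mirror of the same table (local names, not the benchmark's types):

```rust
/// Minimal mirror of the IndexState transition table above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    Uninitialized,
    Building,
    Ready,
    Updating,
    Corrupted,
    Recovering,
}

fn can_transition(from: State, to: State) -> bool {
    use State::*;
    matches!(
        (from, to),
        (Uninitialized, Building)
            | (Building, Ready)
            | (Ready, Updating)
            | (Updating, Ready)
            | (_, Corrupted)
            | (Corrupted, Recovering)
            | (Recovering, Ready)
    )
}

fn main() {
    // The happy path is allowed...
    assert!(can_transition(State::Uninitialized, State::Building));
    assert!(can_transition(State::Building, State::Ready));
    // ...skipping Building is not, while any state may become Corrupted.
    assert!(!can_transition(State::Uninitialized, State::Ready));
    assert!(can_transition(State::Updating, State::Corrupted));
}
```

Since the check is one `match` over two `Copy` enums, the "State Machine Overhead" benchmarks below are really measuring whether a branch predictor-friendly comparison is visible at all next to the guarded operation.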
    /// Gating check for index operations
    pub struct GatingCheck {
        /// Minimum connectivity threshold
        pub min_connectivity: f32,
        /// Maximum allowed dead nodes
        pub max_dead_nodes_ratio: f32,
        /// Maximum layer imbalance
        pub max_layer_imbalance: f32,
    }

    impl GatingCheck {
        pub fn default() -> Self {
            Self {
                min_connectivity: 0.95,
                max_dead_nodes_ratio: 0.01,
                max_layer_imbalance: 2.0,
            }
        }

        /// Check if graph passes all gates
        pub fn check(&self, graph: &Graph) -> GatingResult {
            let connectivity = self.check_connectivity(graph);
            let dead_ratio = self.check_dead_nodes(graph);
            let layer_balance = self.check_layer_balance(graph);

            GatingResult {
                passed: connectivity >= self.min_connectivity
                    && dead_ratio <= self.max_dead_nodes_ratio
                    && layer_balance <= self.max_layer_imbalance,
                connectivity,
                dead_nodes_ratio: dead_ratio,
                layer_imbalance: layer_balance,
            }
        }

        fn check_connectivity(&self, graph: &Graph) -> f32 {
            if graph.len() <= 1 {
                return 1.0;
            }

            // BFS from first node
            let start = *graph.nodes.keys().next().unwrap();
            let mut visited = HashSet::new();
            let mut queue = VecDeque::new();

            visited.insert(start);
            queue.push_back(start);

            while let Some(node) = queue.pop_front() {
                if let Some(n) = graph.nodes.get(&node) {
                    for &neighbor in &n.neighbors {
                        if !visited.contains(&neighbor) && graph.nodes.contains_key(&neighbor) {
                            visited.insert(neighbor);
                            queue.push_back(neighbor);
                        }
                    }
                }
            }

            visited.len() as f32 / graph.len() as f32
        }

        fn check_dead_nodes(&self, graph: &Graph) -> f32 {
            let dead_count = graph
                .nodes
                .values()
                .filter(|n| n.neighbors.is_empty())
                .count();

            dead_count as f32 / graph.len() as f32
        }

        fn check_layer_balance(&self, graph: &Graph) -> f32 {
            if graph.max_layer == 0 {
                return 1.0;
            }

            let mut layer_counts = vec![0usize; graph.max_layer + 1];
            for node in graph.nodes.values() {
                layer_counts[node.layer] += 1;
            }

            let max_count = layer_counts.iter().max().copied().unwrap_or(1) as f32;
            let min_count = layer_counts
                .iter()
                .filter(|&&c| c > 0)
                .min()
                .copied()
                .unwrap_or(1) as f32;

            max_count / min_count
        }
    }

    #[derive(Debug)]
    pub struct GatingResult {
        pub passed: bool,
        pub connectivity: f32,
        pub dead_nodes_ratio: f32,
        pub layer_imbalance: f32,
    }
}
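`check_connectivity` above reports the fraction of nodes reachable by BFS from an arbitrary start node; 1.0 means the graph has a single (weakly) connected component. A condensed standalone version over a plain adjacency map (illustrative, not the module's API):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Fraction of nodes reachable from an arbitrary start node,
/// as used by a connectivity gate (1.0 means fully connected).
fn connectivity_ratio(adj: &HashMap<u64, Vec<u64>>) -> f32 {
    let Some(&start) = adj.keys().next() else { return 1.0 };
    let mut visited = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(u) = queue.pop_front() {
        for &v in adj.get(&u).into_iter().flatten() {
            if adj.contains_key(&v) && visited.insert(v) {
                queue.push_back(v);
            }
        }
    }
    visited.len() as f32 / adj.len() as f32
}

fn main() {
    let mut adj: HashMap<u64, Vec<u64>> = HashMap::new();
    adj.insert(0, vec![1]);
    adj.insert(1, vec![0]);
    adj.insert(2, vec![3]);
    adj.insert(3, vec![2]);
    // Two equal components: whichever one BFS starts in, only half the
    // graph is reachable, so a 0.95 connectivity gate would fail.
    assert_eq!(connectivity_ratio(&adj), 0.5);
}
```

One caveat the benchmark inherits: starting from `keys().next()` measures the component containing an arbitrary node, so on a graph with one huge and one tiny component the reported ratio depends on which component the start falls in.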
use graph::{ContractedGraph, GatingCheck, Graph, IndexState, IndexStateMachine, MincutComputer};

// ============================================================================
// Test Data Generation
// ============================================================================

fn generate_random_graph(n: usize, avg_neighbors: usize, max_layer: usize, seed: u64) -> Graph {
    let mut rng = ChaCha8Rng::seed_from_u64(seed);
    let mut graph = Graph::new();

    // Add nodes with random layers
    for id in 0..n {
        let layer = if id == 0 {
            max_layer
        } else {
            let ml = 1.0 / (16.0_f64).ln();
            let r: f64 = rng.gen();
            ((-r.ln() * ml).floor() as usize).min(max_layer)
        };
        graph.add_node(id as u64, layer);
    }

    // Add random edges (maintaining HNSW-like structure)
    for id in 0..n {
        let num_neighbors = rng.gen_range(1..=avg_neighbors * 2);
        for _ in 0..num_neighbors {
            let neighbor = rng.gen_range(0..n) as u64;
            if neighbor != id as u64 {
                graph.add_edge(id as u64, neighbor);
            }
        }
    }

    graph
}
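`generate_random_graph` draws layers from the usual HNSW geometric distribution: `layer = floor(-ln(r) * mL)` with `mL = 1/ln(M)` and `M = 16` here, so each successive layer is roughly 16x rarer than the one below. The formula in isolation, with a few hand-checkable inputs:

```rust
/// HNSW-style layer assignment: floor(-ln(r) * mL), mL = 1 / ln(M), M = 16.
fn layer_for(r: f64) -> usize {
    let ml = 1.0 / (16.0f64).ln();
    (-r.ln() * ml).floor() as usize
}

fn main() {
    // Likely draws (large r) land on layer 0; only exponentially
    // rare small r values reach higher layers.
    assert_eq!(layer_for(0.5), 0);
    assert_eq!(layer_for(0.01), 1);
    assert_eq!(layer_for(0.0001), 3);
}
```

Equivalently, `layer_for(r)` is `floor(log_16(1/r))`, which is why pinning node 0 to `max_layer` in the generator guarantees at least one entry point at the top layer.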
|
||||
|
||||
fn generate_connected_graph(n: usize, avg_neighbors: usize, seed: u64) -> Graph {
|
||||
let mut rng = ChaCha8Rng::seed_from_u64(seed);
|
||||
let mut graph = Graph::new();
|
||||
|
||||
// Add nodes
|
||||
for id in 0..n {
|
||||
let layer = if id == 0 { 5 } else { rng.gen_range(0..=5) };
|
||||
graph.add_node(id as u64, layer);
|
||||
}
|
||||
|
||||
// Ensure connectivity: chain all nodes
|
||||
for id in 1..n {
|
||||
graph.add_edge(id as u64, (id - 1) as u64);
|
||||
graph.add_edge((id - 1) as u64, id as u64);
|
||||
}
|
||||
|
||||
// Add random extra edges
|
||||
for id in 0..n {
|
||||
let num_extra = rng.gen_range(0..avg_neighbors);
|
||||
for _ in 0..num_extra {
|
||||
let neighbor = rng.gen_range(0..n) as u64;
|
||||
if neighbor != id as u64 {
|
||||
graph.add_edge(id as u64, neighbor);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
graph
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Contracted Graph Benchmarks
|
||||
// ============================================================================
|
||||
|
||||
fn bench_contracted_graph_build(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Contracted Graph Build");
|
||||
group.sample_size(10);
|
||||
|
||||
for &n in [1_000, 10_000, 100_000].iter() {
|
||||
let graph = generate_connected_graph(n, 16, 42);
|
||||
|
||||
for &factor in [10, 50, 100, 500].iter() {
|
||||
if factor > n {
|
||||
continue;
|
||||
}
|
||||
|
||||
group.bench_with_input(
|
||||
BenchmarkId::new(format!("n{}_factor{}", n, factor), n),
|
||||
&(&graph, factor),
|
||||
|bench, (g, f)| bench.iter(|| black_box(ContractedGraph::build_from_graph(g, *f))),
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
fn bench_contracted_graph_memory(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Contracted Graph Memory");
|
||||
group.sample_size(10);
|
||||
|
||||
for &n in [10_000, 100_000].iter() {
|
||||
let graph = generate_connected_graph(n, 16, 42);
|
||||
|
||||
for &factor in [10, 50, 100].iter() {
|
||||
group.bench_with_input(
|
||||
BenchmarkId::new(format!("n{}_factor{}", n, factor), n),
|
||||
&(&graph, factor),
|
||||
|bench, (g, f)| {
|
||||
bench.iter(|| {
|
||||
let contracted = ContractedGraph::build_from_graph(g, *f);
|
||||
|
||||
// Calculate memory usage
|
||||
let super_node_mem = contracted
|
||||
.super_nodes
|
||||
.iter()
|
||||
.map(|sn| sn.original_nodes.len() * 8)
|
||||
.sum::<usize>();
|
||||
let edge_mem = contracted.super_edges.len() * 20; // (usize, usize, f32)
|
||||
let mapping_mem = contracted.node_mapping.len() * 16;
|
||||
|
||||
black_box(super_node_mem + edge_mem + mapping_mem)
|
||||
})
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Mincut Computation Benchmarks
|
||||
// ============================================================================
|
||||
|
||||
fn bench_mincut_compute(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Mincut Computation");
|
||||
group.sample_size(10);
|
||||
|
||||
for &n in [1_000, 5_000, 10_000].iter() {
|
||||
let graph = generate_connected_graph(n, 16, 42);
|
||||
let contracted = ContractedGraph::build_from_graph(&graph, 50);
|
||||
let mincut_computer = MincutComputer::from_contracted_graph(&contracted);
|
||||
|
||||
group.bench_with_input(
|
||||
BenchmarkId::new("single_pair", n),
|
||||
&mincut_computer,
|
||||
|bench, mc| bench.iter(|| black_box(mc.compute_mincut(0, mc.n - 1))),
|
||||
);
|
||||
|
||||
group.bench_with_input(
|
||||
BenchmarkId::new("global", n),
|
||||
&mincut_computer,
|
||||
|bench, mc| bench.iter(|| black_box(mc.compute_global_mincut())),
|
||||
);
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
fn bench_mincut_contraction_factors(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("Mincut vs Contraction Factor");
|
||||
group.sample_size(10);
|
||||
|
||||
let n = 10_000;
|
||||
let graph = generate_connected_graph(n, 16, 42);
|
||||
|
||||
for &factor in [10, 25, 50, 100, 200].iter() {
|
||||
let contracted = ContractedGraph::build_from_graph(&graph, factor);
|
||||
let mincut_computer = MincutComputer::from_contracted_graph(&contracted);
|
||||
|
||||
group.bench_with_input(
|
||||
BenchmarkId::from_parameter(factor),
|
||||
&mincut_computer,
|
||||
|bench, mc| bench.iter(|| black_box(mc.compute_global_mincut())),
|
||||
);
|
||||
}
|
||||
|
||||
group.finish();
|
||||
}
|
||||
|
||||
// ============================================================================
// State Transition Benchmarks
// ============================================================================

fn bench_state_transitions(c: &mut Criterion) {
    let mut group = c.benchmark_group("State Transitions");

    // Single transition
    group.bench_function("single_transition", |bench| {
        bench.iter(|| {
            let mut sm = IndexStateMachine::new();
            black_box(sm.transition(IndexState::Building))
        })
    });

    // Full lifecycle
    group.bench_function("full_lifecycle", |bench| {
        bench.iter(|| {
            let mut sm = IndexStateMachine::new();
            sm.transition(IndexState::Building).ok();
            sm.transition(IndexState::Ready).ok();
            sm.transition(IndexState::Updating).ok();
            sm.transition(IndexState::Ready).ok();
            black_box(sm.state)
        })
    });

    // Transition check only (no mutation)
    group.bench_function("transition_check", |bench| {
        let sm = IndexStateMachine::new();
        bench.iter(|| black_box(sm.can_transition(IndexState::Building)))
    });

    // Many transitions
    group.bench_function("1000_transitions", |bench| {
        bench.iter(|| {
            let mut sm = IndexStateMachine::new();
            sm.transition(IndexState::Building).ok();
            sm.transition(IndexState::Ready).ok();

            for _ in 0..500 {
                sm.transition(IndexState::Updating).ok();
                sm.transition(IndexState::Ready).ok();
            }

            black_box(sm.transition_count)
        })
    });

    group.finish();
}

fn bench_state_machine_overhead(c: &mut Criterion) {
    let mut group = c.benchmark_group("State Machine Overhead");

    // Measure overhead of state checking before operations
    let graph = generate_connected_graph(10_000, 16, 42);

    group.bench_function("with_state_check", |bench| {
        let mut sm = IndexStateMachine::new();
        sm.transition(IndexState::Building).ok();
        sm.transition(IndexState::Ready).ok();

        bench.iter(|| {
            // Simulate operation with state check
            if sm.state == IndexState::Ready {
                // Perform "operation"
                let count = graph.nodes.len();
                black_box(count)
            } else {
                black_box(0)
            }
        })
    });

    group.bench_function("without_state_check", |bench| {
        bench.iter(|| {
            // Perform operation directly
            let count = graph.nodes.len();
            black_box(count)
        })
    });

    group.finish();
}

// ============================================================================
// Gating Check Benchmarks
// ============================================================================

fn bench_gating_check(c: &mut Criterion) {
    let mut group = c.benchmark_group("Gating Check");

    for &n in [1_000, 10_000, 100_000].iter() {
        let graph = generate_connected_graph(n, 16, 42);
        let gating = GatingCheck::default();

        group.bench_with_input(
            BenchmarkId::new("full_check", n),
            &(&graph, &gating),
            |bench, (g, gate)| bench.iter(|| black_box(gate.check(g))),
        );
    }

    group.finish();
}

fn bench_connectivity_check(c: &mut Criterion) {
    let mut group = c.benchmark_group("Connectivity Check");

    for &n in [1_000, 10_000, 100_000].iter() {
        // Well-connected graph
        let connected_graph = generate_connected_graph(n, 16, 42);

        // Sparse graph (may have disconnected components)
        let sparse_graph = generate_random_graph(n, 2, 5, 42);

        let gating = GatingCheck::default();

        group.bench_with_input(
            BenchmarkId::new("connected", n),
            &(&connected_graph, &gating),
            |bench, (g, gate)| bench.iter(|| black_box(gate.check(g).connectivity)),
        );

        group.bench_with_input(
            BenchmarkId::new("sparse", n),
            &(&sparse_graph, &gating),
            |bench, (g, gate)| bench.iter(|| black_box(gate.check(g).connectivity)),
        );
    }

    group.finish();
}

fn bench_dead_node_detection(c: &mut Criterion) {
    let mut group = c.benchmark_group("Dead Node Detection");

    for &n in [10_000, 100_000].iter() {
        let graph = generate_connected_graph(n, 16, 42);
        let gating = GatingCheck::default();

        group.bench_with_input(
            BenchmarkId::from_parameter(n),
            &(&graph, &gating),
            |bench, (g, gate)| bench.iter(|| black_box(gate.check(g).dead_nodes_ratio)),
        );
    }

    group.finish();
}

fn bench_layer_balance_check(c: &mut Criterion) {
    let mut group = c.benchmark_group("Layer Balance Check");

    for &n in [10_000, 100_000].iter() {
        let graph = generate_random_graph(n, 16, 10, 42);
        let gating = GatingCheck::default();

        group.bench_with_input(
            BenchmarkId::from_parameter(n),
            &(&graph, &gating),
            |bench, (g, gate)| bench.iter(|| black_box(gate.check(g).layer_imbalance)),
        );
    }

    group.finish();
}

// ============================================================================
// Parallel Integrity Checks
// ============================================================================

fn bench_parallel_integrity(c: &mut Criterion) {
    let mut group = c.benchmark_group("Parallel Integrity Check");
    group.sample_size(10);

    let n = 100_000;
    let graph = generate_connected_graph(n, 16, 42);
    let gating = GatingCheck::default();

    // Sequential checks
    group.bench_function("sequential", |bench| {
        bench.iter(|| {
            let result = gating.check(&graph);
            black_box(result)
        })
    });

    // Parallel checks (connectivity, dead nodes, layer balance)
    group.bench_function("parallel", |bench| {
        bench.iter(|| {
            let (connectivity, (dead_ratio, layer_balance)) = rayon::join(
                || {
                    // Connectivity check
                    if graph.len() <= 1 {
                        return 1.0;
                    }
                    let start = *graph.nodes.keys().next().unwrap();
                    let mut visited = HashSet::new();
                    let mut queue = VecDeque::new();
                    visited.insert(start);
                    queue.push_back(start);
                    while let Some(node) = queue.pop_front() {
                        if let Some(n) = graph.nodes.get(&node) {
                            for &neighbor in &n.neighbors {
                                if !visited.contains(&neighbor)
                                    && graph.nodes.contains_key(&neighbor)
                                {
                                    visited.insert(neighbor);
                                    queue.push_back(neighbor);
                                }
                            }
                        }
                    }
                    visited.len() as f32 / graph.len() as f32
                },
                || {
                    rayon::join(
                        || {
                            // Dead nodes
                            let dead = graph
                                .nodes
                                .values()
                                .filter(|n| n.neighbors.is_empty())
                                .count();
                            dead as f32 / graph.len() as f32
                        },
                        || {
                            // Layer balance
                            let mut layer_counts = vec![0usize; graph.max_layer + 1];
                            for node in graph.nodes.values() {
                                layer_counts[node.layer] += 1;
                            }
                            let max_count = layer_counts.iter().max().copied().unwrap_or(1) as f32;
                            let min_count = layer_counts
                                .iter()
                                .filter(|&&c| c > 0)
                                .min()
                                .copied()
                                .unwrap_or(1) as f32;
                            max_count / min_count
                        },
                    )
                },
            );

            let passed = connectivity >= gating.min_connectivity
                && dead_ratio <= gating.max_dead_nodes_ratio
                && layer_balance <= gating.max_layer_imbalance;

            black_box(passed)
        })
    });

    group.finish();
}

// ============================================================================
// Complete Integrity Pipeline
// ============================================================================

fn bench_full_integrity_pipeline(c: &mut Criterion) {
    let mut group = c.benchmark_group("Full Integrity Pipeline");
    group.sample_size(10);

    for &n in [10_000, 50_000, 100_000].iter() {
        let graph = generate_connected_graph(n, 16, 42);
        let gating = GatingCheck::default();

        group.bench_with_input(BenchmarkId::from_parameter(n), &n, |bench, _| {
            bench.iter(|| {
                // 1. State check
                let mut sm = IndexStateMachine::new();
                sm.transition(IndexState::Building).ok();
                sm.transition(IndexState::Ready).ok();

                // 2. Gating check
                let gate_result = gating.check(&graph);

                // 3. If passed, build contracted graph
                if gate_result.passed {
                    let contracted = ContractedGraph::build_from_graph(&graph, 100);

                    // 4. Compute mincut
                    let mincut_computer = MincutComputer::from_contracted_graph(&contracted);
                    let mincut = mincut_computer.compute_global_mincut();

                    black_box((gate_result, mincut))
                } else {
                    black_box((gate_result, 0.0))
                }
            })
        });
    }

    group.finish();
}

criterion_group!(
    benches,
    // Contracted Graph
    bench_contracted_graph_build,
    bench_contracted_graph_memory,
    // Mincut
    bench_mincut_compute,
    bench_mincut_contraction_factors,
    // State Transitions
    bench_state_transitions,
    bench_state_machine_overhead,
    // Gating Checks
    bench_gating_check,
    bench_connectivity_check,
    bench_dead_node_detection,
    bench_layer_balance_check,
    // Parallel Integrity
    bench_parallel_integrity,
    // Full Pipeline
    bench_full_integrity_pipeline,
);

criterion_main!(benches);
434
crates/ruvector-postgres/benches/quantization_bench.rs
Normal file
@@ -0,0 +1,434 @@
//! Comprehensive quantization benchmarks
//!
//! Compares exact vs quantized search with different quantization methods

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use rand::prelude::*;
use rand_chacha::ChaCha8Rng;
use ruvector_postgres::distance::DistanceMetric;
use ruvector_postgres::types::{BinaryVec, ProductVec, RuVector, ScalarVec};

// ============================================================================
// Test Data Generation
// ============================================================================

fn generate_vectors(n: usize, dims: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut rng = ChaCha8Rng::seed_from_u64(seed);
    (0..n)
        .map(|_| (0..dims).map(|_| rng.gen_range(-1.0..1.0)).collect())
        .collect()
}

// ============================================================================
// Scalar Quantization (SQ8) Benchmarks
// ============================================================================

fn bench_sq8_quantization(c: &mut Criterion) {
    let mut group = c.benchmark_group("sq8_quantization");

    for dims in [128, 384, 768, 1536, 3072].iter() {
        let data: Vec<f32> = (0..*dims).map(|i| (i as f32) * 0.001).collect();

        group.bench_with_input(BenchmarkId::new("encode", dims), dims, |bench, _| {
            bench.iter(|| black_box(ScalarVec::from_f32(&data)));
        });

        let encoded = ScalarVec::from_f32(&data);
        group.bench_with_input(BenchmarkId::new("decode", dims), dims, |bench, _| {
            bench.iter(|| black_box(encoded.to_f32()));
        });
    }

    group.finish();
}

fn bench_sq8_distance(c: &mut Criterion) {
    let mut group = c.benchmark_group("sq8_distance");

    for dims in [128, 384, 768, 1536, 3072].iter() {
        let a_data: Vec<f32> = (0..*dims).map(|i| i as f32 * 0.1).collect();
        let b_data: Vec<f32> = (0..*dims).map(|i| (*dims - i) as f32 * 0.1).collect();

        let a_exact = RuVector::from_slice(&a_data);
        let b_exact = RuVector::from_slice(&b_data);

        let a_sq8 = ScalarVec::from_f32(&a_data);
        let b_sq8 = ScalarVec::from_f32(&b_data);

        group.bench_with_input(BenchmarkId::new("exact", dims), dims, |bench, _| {
            bench.iter(|| black_box(a_exact.dot(&b_exact)));
        });

        group.bench_with_input(BenchmarkId::new("quantized", dims), dims, |bench, _| {
            bench.iter(|| black_box(a_sq8.distance(&b_sq8)));
        });
    }

    group.finish();
}

fn bench_sq8_search(c: &mut Criterion) {
    let mut group = c.benchmark_group("sq8_search");

    for dims in [128, 768, 1536].iter() {
        let n = 10000;
        let vectors = generate_vectors(n, *dims, 42);
        let query = generate_vectors(1, *dims, 999)[0].clone();

        // Exact search
        let exact_vecs: Vec<RuVector> = vectors.iter().map(|v| RuVector::from_slice(v)).collect();
        let exact_query = RuVector::from_slice(&query);

        group.bench_with_input(BenchmarkId::new("exact", dims), dims, |bench, _| {
            bench.iter(|| {
                let mut distances: Vec<(usize, f32)> = exact_vecs
                    .iter()
                    .enumerate()
                    .map(|(id, vec)| {
                        let dist = exact_query.dot(vec);
                        (id, -dist) // Negative for max inner product
                    })
                    .collect();

                distances.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
                let top_k: Vec<_> = distances[..10].to_vec();
                black_box(top_k)
            });
        });

        // Quantized search
        let sq8_vecs: Vec<ScalarVec> = vectors.iter().map(|v| ScalarVec::from_f32(v)).collect();
        let sq8_query = ScalarVec::from_f32(&query);

        group.bench_with_input(BenchmarkId::new("quantized", dims), dims, |bench, _| {
            bench.iter(|| {
                let mut distances: Vec<(usize, f32)> = sq8_vecs
                    .iter()
                    .enumerate()
                    .map(|(id, vec)| (id, sq8_query.distance(vec)))
                    .collect();

                distances.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
                let top_k: Vec<_> = distances[..10].to_vec();
                black_box(top_k)
            });
        });
    }

    group.finish();
}

// ============================================================================
// Binary Quantization Benchmarks
// ============================================================================

fn bench_binary_quantization(c: &mut Criterion) {
    let mut group = c.benchmark_group("binary_quantization");

    for dims in [128, 512, 1024, 2048, 4096].iter() {
        let data: Vec<f32> = (0..*dims)
            .map(|i| if i % 2 == 0 { 1.0 } else { -1.0 })
            .collect();

        group.bench_with_input(BenchmarkId::new("encode", dims), dims, |bench, _| {
            bench.iter(|| black_box(BinaryVec::from_f32(&data)));
        });
    }

    group.finish();
}

fn bench_binary_hamming(c: &mut Criterion) {
    let mut group = c.benchmark_group("binary_hamming");

    for dims in [128, 512, 1024, 2048, 4096, 8192].iter() {
        let a_data: Vec<f32> = (0..*dims)
            .map(|i| if i % 2 == 0 { 1.0 } else { -1.0 })
            .collect();
        let b_data: Vec<f32> = (0..*dims)
            .map(|i| if i % 3 == 0 { 1.0 } else { -1.0 })
            .collect();

        let a = BinaryVec::from_f32(&a_data);
        let b = BinaryVec::from_f32(&b_data);

        group.bench_with_input(BenchmarkId::new("simd", dims), dims, |bench, _| {
            bench.iter(|| black_box(a.hamming_distance(&b)));
        });
    }

    group.finish();
}

fn bench_binary_search(c: &mut Criterion) {
    let mut group = c.benchmark_group("binary_search");

    for dims in [1024, 2048, 4096].iter() {
        let n = 100000;
        let vectors = generate_vectors(n, *dims, 42);
        let query = generate_vectors(1, *dims, 999)[0].clone();

        let binary_vecs: Vec<BinaryVec> = vectors.iter().map(|v| BinaryVec::from_f32(v)).collect();
        let binary_query = BinaryVec::from_f32(&query);

        group.bench_with_input(BenchmarkId::new("scan", dims), dims, |bench, _| {
            bench.iter(|| {
                let mut distances: Vec<(usize, u32)> = binary_vecs
                    .iter()
                    .enumerate()
                    .map(|(id, vec)| (id, binary_query.hamming_distance(vec)))
                    .collect();

                distances.sort_by_key(|k| k.1);
                let top_k: Vec<_> = distances[..10].to_vec();
                black_box(top_k)
            });
        });
    }

    group.finish();
}

// ============================================================================
// Product Quantization (PQ) Benchmarks
// ============================================================================

fn bench_pq_adc_distance(c: &mut Criterion) {
    let mut group = c.benchmark_group("pq_adc_distance");

    for m in [8u8, 16, 32, 48, 64].iter() {
        let k: usize = 256; // Number of centroids
        // Compute codes in usize: `(i * 7) % (k as u8)` would be `% 0` (a
        // panic), and `i * 7` overflows u8 for the larger values of m.
        let codes: Vec<u8> = (0..*m as usize).map(|i| ((i * 7) % k) as u8).collect();
        let pq = ProductVec::new((*m as usize * 32) as u16, *m, 255, codes);

        // Create distance table
        let mut table = Vec::with_capacity(*m as usize * k);
        for i in 0..(*m as usize * k) {
            table.push((i % 100) as f32 * 0.01);
        }

        group.bench_with_input(BenchmarkId::new("simd", m), m, |bench, _| {
            bench.iter(|| black_box(pq.adc_distance_simd(&table)));
        });

        group.bench_with_input(BenchmarkId::new("flat", m), m, |bench, _| {
            bench.iter(|| black_box(pq.adc_distance_flat(&table)));
        });
    }

    group.finish();
}

// ============================================================================
// Compression Ratio Benchmarks
// ============================================================================

fn bench_compression_comparison(c: &mut Criterion) {
    let mut group = c.benchmark_group("compression_ratio");

    for dims in [384, 768, 1536, 3072].iter() {
        let data: Vec<f32> = (0..*dims).map(|i| (i as f32) * 0.001).collect();
        let original_size = dims * std::mem::size_of::<f32>();

        group.bench_with_input(BenchmarkId::new("binary", dims), dims, |bench, _| {
            bench.iter(|| {
                let binary = black_box(BinaryVec::from_f32(&data));
                let compressed = binary.memory_size();
                let ratio = original_size as f32 / compressed as f32;
                black_box(ratio)
            });
        });

        group.bench_with_input(BenchmarkId::new("scalar", dims), dims, |bench, _| {
            bench.iter(|| {
                let scalar = black_box(ScalarVec::from_f32(&data));
                let compressed = scalar.memory_size();
                let ratio = original_size as f32 / compressed as f32;
                black_box(ratio)
            });
        });

        group.bench_with_input(BenchmarkId::new("product", dims), dims, |bench, _| {
            bench.iter(|| {
                let m = (dims / 32).min(64);
                let pq = black_box(ProductVec::new(*dims as u16, m as u8, 255, vec![0; m]));
                let compressed = pq.memory_size();
                let ratio = original_size as f32 / compressed as f32;
                black_box(ratio)
            });
        });
    }

    group.finish();
}

// ============================================================================
// Speedup vs Accuracy Trade-off
// ============================================================================

fn bench_quantization_tradeoff(c: &mut Criterion) {
    let mut group = c.benchmark_group("quantization_tradeoff");
    group.sample_size(10);

    let dims = 768;
    let n = 10000;
    let num_queries = 100;

    let vectors = generate_vectors(n, dims, 42);
    let queries = generate_vectors(num_queries, dims, 999);

    // Compute ground truth
    let exact_vecs: Vec<RuVector> = vectors.iter().map(|v| RuVector::from_slice(v)).collect();

    let ground_truth: Vec<Vec<usize>> = queries
        .iter()
        .map(|query| {
            let query_vec = RuVector::from_slice(query);
            let mut distances: Vec<(usize, f32)> = exact_vecs
                .iter()
                .enumerate()
                .map(|(id, vec)| {
                    let diff = query_vec.sub(vec);
                    let dist = diff.norm();
                    (id, dist)
                })
                .collect();

            distances.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
            distances.iter().take(10).map(|(id, _)| *id).collect()
        })
        .collect();

    // Benchmark SQ8
    let sq8_vecs: Vec<ScalarVec> = vectors.iter().map(|v| ScalarVec::from_f32(v)).collect();

    group.bench_function("sq8_speedup", |bench| {
        bench.iter(|| {
            for (i, query) in queries.iter().enumerate() {
                let sq8_query = ScalarVec::from_f32(query);
                let mut distances: Vec<(usize, f32)> = sq8_vecs
                    .iter()
                    .enumerate()
                    .map(|(id, vec)| (id, sq8_query.distance(vec)))
                    .collect();

                distances.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
                let results: Vec<usize> = distances.iter().take(10).map(|(id, _)| *id).collect();

                // Compute recall
                let hits = results
                    .iter()
                    .filter(|id| ground_truth[i].contains(id))
                    .count();

                black_box(hits as f32 / 10.0);
            }
        });
    });

    // Benchmark Binary
    let binary_vecs: Vec<BinaryVec> = vectors.iter().map(|v| BinaryVec::from_f32(v)).collect();

    group.bench_function("binary_speedup", |bench| {
        bench.iter(|| {
            for (i, query) in queries.iter().enumerate() {
                let binary_query = BinaryVec::from_f32(query);
                let mut distances: Vec<(usize, u32)> = binary_vecs
                    .iter()
                    .enumerate()
                    .map(|(id, vec)| (id, binary_query.hamming_distance(vec)))
                    .collect();

                distances.sort_by_key(|k| k.1);
                let results: Vec<usize> = distances.iter().take(10).map(|(id, _)| *id).collect();

                // Compute recall
                let hits = results
                    .iter()
                    .filter(|id| ground_truth[i].contains(id))
                    .count();

                black_box(hits as f32 / 10.0);
            }
        });
    });

    group.finish();
}

// ============================================================================
// Throughput Comparison
// ============================================================================

fn bench_quantization_throughput(c: &mut Criterion) {
    let mut group = c.benchmark_group("quantization_throughput");

    let dims = 1536;
    let n = 100000;

    let vectors = generate_vectors(n, dims, 42);
    let query = generate_vectors(1, dims, 999)[0].clone();

    // Exact
    let exact_vecs: Vec<RuVector> = vectors.iter().map(|v| RuVector::from_slice(v)).collect();
    let exact_query = RuVector::from_slice(&query);

    group.bench_function("exact_scan", |bench| {
        bench.iter(|| {
            let mut total = 0.0f32;
            for vec in &exact_vecs {
                total += exact_query.dot(vec);
            }
            black_box(total)
        });
    });

    // SQ8
    let sq8_vecs: Vec<ScalarVec> = vectors.iter().map(|v| ScalarVec::from_f32(v)).collect();
    let sq8_query = ScalarVec::from_f32(&query);

    group.bench_function("sq8_scan", |bench| {
        bench.iter(|| {
            let mut total = 0.0f32;
            for vec in &sq8_vecs {
                total += sq8_query.distance(vec);
            }
            black_box(total)
        });
    });

    // Binary
    let binary_vecs: Vec<BinaryVec> = vectors.iter().map(|v| BinaryVec::from_f32(v)).collect();
    let binary_query = BinaryVec::from_f32(&query);

    group.bench_function("binary_scan", |bench| {
        bench.iter(|| {
            let mut total = 0u64;
            for vec in &binary_vecs {
                total += binary_query.hamming_distance(vec) as u64;
            }
            black_box(total)
        });
    });

    group.finish();
}

criterion_group!(
    benches,
    bench_sq8_quantization,
    bench_sq8_distance,
    bench_sq8_search,
    bench_binary_quantization,
    bench_binary_hamming,
    bench_binary_search,
    bench_pq_adc_distance,
    bench_compression_comparison,
    bench_quantization_tradeoff,
    bench_quantization_throughput,
);

criterion_main!(benches);
217
crates/ruvector-postgres/benches/quantized_distance_bench.rs
Normal file
@@ -0,0 +1,217 @@
//! Benchmarks for quantized vector distance calculations
//!
//! Compares scalar vs SIMD implementations for all quantized types

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use ruvector_postgres::types::{BinaryVec, ProductVec, ScalarVec};

// ============================================================================
// BinaryVec Benchmarks
// ============================================================================

fn bench_binaryvec_hamming(c: &mut Criterion) {
    let mut group = c.benchmark_group("binaryvec_hamming");

    for dims in [128, 512, 1024, 2048, 4096].iter() {
        let a_data: Vec<f32> = (0..*dims)
            .map(|i| if i % 2 == 0 { 1.0 } else { -1.0 })
            .collect();
        let b_data: Vec<f32> = (0..*dims)
            .map(|i| if i % 3 == 0 { 1.0 } else { -1.0 })
            .collect();

        let a = BinaryVec::from_f32(&a_data);
        let b = BinaryVec::from_f32(&b_data);

        group.bench_with_input(BenchmarkId::new("simd", dims), dims, |bencher, _| {
            bencher.iter(|| black_box(a.hamming_distance(&b)));
        });
    }

    group.finish();
}

fn bench_binaryvec_quantization(c: &mut Criterion) {
    let mut group = c.benchmark_group("binaryvec_quantization");

    for dims in [128, 512, 1024, 2048, 4096].iter() {
        let data: Vec<f32> = (0..*dims).map(|i| (i as f32) * 0.01).collect();

        group.bench_with_input(BenchmarkId::new("from_f32", dims), dims, |bencher, _| {
            bencher.iter(|| black_box(BinaryVec::from_f32(&data)));
        });
    }

    group.finish();
}

// ============================================================================
// ScalarVec Benchmarks
// ============================================================================

fn bench_scalarvec_distance(c: &mut Criterion) {
    let mut group = c.benchmark_group("scalarvec_distance");

    for dims in [128, 512, 1024, 2048, 4096].iter() {
        let a_data: Vec<f32> = (0..*dims).map(|i| i as f32 * 0.1).collect();
        let b_data: Vec<f32> = (0..*dims).map(|i| (*dims - i) as f32 * 0.1).collect();

        let a = ScalarVec::from_f32(&a_data);
        let b = ScalarVec::from_f32(&b_data);

        group.bench_with_input(BenchmarkId::new("simd", dims), dims, |bencher, _| {
            bencher.iter(|| black_box(a.distance(&b)));
        });
    }

    group.finish();
}

fn bench_scalarvec_quantization(c: &mut Criterion) {
    let mut group = c.benchmark_group("scalarvec_quantization");

    for dims in [128, 512, 1024, 2048, 4096].iter() {
        let data: Vec<f32> = (0..*dims).map(|i| (i as f32) * 0.01).collect();

        group.bench_with_input(BenchmarkId::new("from_f32", dims), dims, |bencher, _| {
            bencher.iter(|| black_box(ScalarVec::from_f32(&data)));
        });

        let scalar = ScalarVec::from_f32(&data);
        group.bench_with_input(BenchmarkId::new("to_f32", dims), dims, |bencher, _| {
            bencher.iter(|| black_box(scalar.to_f32()));
        });
    }

    group.finish();
}

// ============================================================================
// ProductVec Benchmarks
// ============================================================================

fn bench_productvec_adc_distance(c: &mut Criterion) {
    let mut group = c.benchmark_group("productvec_adc_distance");

    for m in [8u8, 16, 32, 48, 64].iter() {
        let k: usize = 256;
        // Compute codes in usize: `(i * 7) % (k as u8)` would be `% 0` (a
        // panic), and `i * 7` overflows u8 for the larger values of m.
        let codes: Vec<u8> = (0..*m as usize).map(|i| ((i * 7) % k) as u8).collect();
        let pq = ProductVec::new((*m as usize * 32) as u16, *m, 255, codes);

        // Create distance table
        let mut table = Vec::with_capacity(*m as usize * k);
        for i in 0..(*m as usize * k) {
            table.push((i % 100) as f32 * 0.01);
        }

        group.bench_with_input(BenchmarkId::new("simd", m), m, |bencher, _| {
            bencher.iter(|| black_box(pq.adc_distance_simd(&table)));
        });

        group.bench_with_input(BenchmarkId::new("flat", m), m, |bencher, _| {
            bencher.iter(|| black_box(pq.adc_distance_flat(&table)));
        });
    }

    group.finish();
}

// ============================================================================
// Compression Benchmarks
// ============================================================================

fn bench_compression_ratios(c: &mut Criterion) {
    let mut group = c.benchmark_group("compression");

    let dims = 1536; // OpenAI embedding size
    let data: Vec<f32> = (0..dims).map(|i| (i as f32) * 0.001).collect();

    // Original size
    let original_size = dims * std::mem::size_of::<f32>();

    group.bench_function("binary_quantize", |bencher| {
        bencher.iter(|| {
            let binary = black_box(BinaryVec::from_f32(&data));
            let ratio = original_size as f32 / binary.memory_size() as f32;
            black_box(ratio)
        });
    });

    group.bench_function("scalar_quantize", |bencher| {
        bencher.iter(|| {
            let scalar = black_box(ScalarVec::from_f32(&data));
            let ratio = original_size as f32 / scalar.memory_size() as f32;
            black_box(ratio)
        });
    });

    group.bench_function("product_quantize", |bencher| {
        bencher.iter(|| {
            let pq = black_box(ProductVec::new(dims as u16, 48, 255, vec![0; 48]));
            let ratio = original_size as f32 / pq.memory_size() as f32;
            black_box(ratio)
        });
    });

    group.finish();
}

// ============================================================================
// Throughput Benchmarks
// ============================================================================

fn bench_throughput_comparison(c: &mut Criterion) {
    let mut group = c.benchmark_group("throughput");

    let dims = 1024;
    let num_vectors = 1000;

    // Generate test data
    let vectors: Vec<Vec<f32>> = (0..num_vectors)
        .map(|i| (0..dims).map(|j| ((i * dims + j) as f32) * 0.001).collect())
        .collect();

    let query = vectors[0].clone();

    // Quantize all vectors
    let binary_vecs: Vec<BinaryVec> = vectors.iter().map(|v| BinaryVec::from_f32(v)).collect();
    let scalar_vecs: Vec<ScalarVec> = vectors.iter().map(|v| ScalarVec::from_f32(v)).collect();

    let query_binary = BinaryVec::from_f32(&query);
    let query_scalar = ScalarVec::from_f32(&query);

    group.bench_function("binary_scan", |bencher| {
        bencher.iter(|| {
            let mut total_dist = 0u32;
            for v in &binary_vecs {
                total_dist += black_box(query_binary.hamming_distance(v));
            }
            black_box(total_dist)
        });
    });

    group.bench_function("scalar_scan", |bencher| {
        bencher.iter(|| {
            let mut total_dist = 0.0f32;
            for v in &scalar_vecs {
                total_dist += black_box(query_scalar.distance(v));
            }
            black_box(total_dist)
        });
    });

    group.finish();
}

criterion_group!(
    benches,
    bench_binaryvec_hamming,
    bench_binaryvec_quantization,
    bench_scalarvec_distance,
    bench_scalarvec_quantization,
    bench_productvec_adc_distance,
    bench_compression_ratios,
    bench_throughput_comparison,
);

criterion_main!(benches);
173
crates/ruvector-postgres/benches/scripts/run_benchmarks.sh
Executable file
@@ -0,0 +1,173 @@
#!/bin/bash
# Comprehensive benchmark runner script

set -e

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Configuration
BENCHMARK_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
RESULTS_DIR="${BENCHMARK_DIR}/results"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# Create results directory
mkdir -p "${RESULTS_DIR}"

echo -e "${BLUE}==================================================${NC}"
echo -e "${BLUE} RuVector Comprehensive Benchmark Suite${NC}"
echo -e "${BLUE}==================================================${NC}"
echo ""

# ============================================================================
# Rust Benchmarks
# ============================================================================

echo -e "${GREEN}Running Rust benchmarks...${NC}"
echo ""

# Distance benchmarks
echo -e "${YELLOW}1. Distance function benchmarks${NC}"
cargo bench --bench distance_bench -- --output-format bencher | tee "${RESULTS_DIR}/distance_${TIMESTAMP}.txt"

# Index benchmarks
echo -e "${YELLOW}2. HNSW index benchmarks${NC}"
cargo bench --bench index_bench -- --output-format bencher | tee "${RESULTS_DIR}/index_${TIMESTAMP}.txt"

# Quantization benchmarks
echo -e "${YELLOW}3. Quantization benchmarks${NC}"
cargo bench --bench quantization_bench -- --output-format bencher | tee "${RESULTS_DIR}/quantization_${TIMESTAMP}.txt"

# Quantized distance benchmarks
echo -e "${YELLOW}4. Quantized distance benchmarks${NC}"
cargo bench --bench quantized_distance_bench -- --output-format bencher | tee "${RESULTS_DIR}/quantized_distance_${TIMESTAMP}.txt"

# ============================================================================
# SQL Benchmarks (if PostgreSQL is available)
# ============================================================================

if command -v psql &> /dev/null; then
    echo ""
    echo -e "${GREEN}Running SQL benchmarks...${NC}"
    echo ""

    # Check if test database exists
    if psql -lqt | cut -d \| -f 1 | grep -qw ruvector_bench; then
        echo -e "${YELLOW}5. Quick SQL benchmark${NC}"
        psql -d ruvector_bench -f "${BENCHMARK_DIR}/sql/quick_benchmark.sql" | tee "${RESULTS_DIR}/sql_quick_${TIMESTAMP}.txt"

        echo -e "${YELLOW}6. Full workload benchmark${NC}"
        echo -e "${RED}Warning: This may take several minutes...${NC}"
        psql -d ruvector_bench -f "${BENCHMARK_DIR}/sql/benchmark_workload.sql" | tee "${RESULTS_DIR}/sql_workload_${TIMESTAMP}.txt"
    else
        echo -e "${YELLOW}Skipping SQL benchmarks (database 'ruvector_bench' not found)${NC}"
        echo -e "${YELLOW}To run SQL benchmarks:${NC}"
        echo -e "  createdb ruvector_bench"
        echo -e "  psql -d ruvector_bench -c 'CREATE EXTENSION ruvector;'"
        echo -e "  psql -d ruvector_bench -c 'CREATE EXTENSION pgvector;'"
    fi
else
    echo -e "${YELLOW}Skipping SQL benchmarks (psql not found)${NC}"
fi

# ============================================================================
# Generate Summary Report
# ============================================================================

echo ""
echo -e "${GREEN}Generating summary report...${NC}"

cat > "${RESULTS_DIR}/summary_${TIMESTAMP}.md" <<EOF
# RuVector Benchmark Results

**Date:** $(date)
**Platform:** $(uname -s) $(uname -m)
**Rust Version:** $(rustc --version)

## Benchmark Files

- Distance functions: \`distance_${TIMESTAMP}.txt\`
- HNSW index: \`index_${TIMESTAMP}.txt\`
- Quantization: \`quantization_${TIMESTAMP}.txt\`
- Quantized distance: \`quantized_distance_${TIMESTAMP}.txt\`

## SQL Benchmarks

EOF

if [ -f "${RESULTS_DIR}/sql_quick_${TIMESTAMP}.txt" ]; then
    cat >> "${RESULTS_DIR}/summary_${TIMESTAMP}.md" <<EOF
- Quick benchmark: \`sql_quick_${TIMESTAMP}.txt\`
- Full workload: \`sql_workload_${TIMESTAMP}.txt\`

EOF
else
    cat >> "${RESULTS_DIR}/summary_${TIMESTAMP}.md" <<EOF
SQL benchmarks were not run. See setup instructions above.

EOF
fi

cat >> "${RESULTS_DIR}/summary_${TIMESTAMP}.md" <<EOF
## System Information

\`\`\`
$(uname -a)
\`\`\`

### CPU Information

\`\`\`
$(lscpu 2>/dev/null || sysctl -a | grep machdep.cpu || echo "CPU info not available")
\`\`\`

### Memory Information

\`\`\`
$(free -h 2>/dev/null || vm_stat || echo "Memory info not available")
\`\`\`

## Running the Benchmarks

To reproduce these results:

\`\`\`bash
cd crates/ruvector-postgres
bash benches/scripts/run_benchmarks.sh
\`\`\`

## Comparing with Previous Results

\`\`\`bash
# Install cargo-criterion for better comparison
cargo install cargo-criterion

# Run with baseline
cargo criterion --bench distance_bench --baseline main
\`\`\`
EOF

echo ""
echo -e "${GREEN}==================================================${NC}"
echo -e "${GREEN} Benchmark Complete!${NC}"
echo -e "${GREEN}==================================================${NC}"
echo ""
echo -e "Results saved to: ${BLUE}${RESULTS_DIR}${NC}"
echo -e "Summary report: ${BLUE}${RESULTS_DIR}/summary_${TIMESTAMP}.md${NC}"
echo ""

# ============================================================================
# Optional: Open results in browser if criterion HTML is available
# ============================================================================

if [ -d "target/criterion" ]; then
    echo -e "${YELLOW}Criterion HTML reports available at:${NC}"
    echo -e "  ${BLUE}file://$(pwd)/target/criterion/report/index.html${NC}"
fi

echo ""
echo -e "${GREEN}Done!${NC}"
381
crates/ruvector-postgres/benches/sql/benchmark_workload.sql
Normal file
@@ -0,0 +1,381 @@
-- Realistic workload benchmark for ruvector vs pgvector
-- This script tests common operations with realistic dataset sizes

\timing on
\set ECHO all

-- Configuration
\set num_vectors 1000000
\set num_queries 1000
\set dims 1536
\set k 10

BEGIN;

-- ============================================================================
-- Setup Test Tables
-- ============================================================================

DROP TABLE IF EXISTS vectors_ruvector CASCADE;
DROP TABLE IF EXISTS vectors_pgvector CASCADE;
DROP TABLE IF EXISTS queries CASCADE;

-- Create tables
CREATE TABLE vectors_ruvector (
    id SERIAL PRIMARY KEY,
    embedding ruvector(:dims),
    metadata JSONB
);

CREATE TABLE vectors_pgvector (
    id SERIAL PRIMARY KEY,
    embedding vector(:dims),
    metadata JSONB
);

CREATE TABLE queries (
    id SERIAL PRIMARY KEY,
    query_vector ruvector(:dims)
);

-- ============================================================================
-- Generate Test Data
-- ============================================================================

\echo 'Generating test data...'

-- Insert vectors (ruvector)
INSERT INTO vectors_ruvector (embedding, metadata)
SELECT
    array_to_ruvector(ARRAY(
        SELECT random()::real
        FROM generate_series(1, :dims)
    )),
    jsonb_build_object('category', i % 100)
FROM generate_series(1, :num_vectors) i;

-- Insert vectors (pgvector)
INSERT INTO vectors_pgvector (embedding, metadata)
SELECT
    ARRAY(
        SELECT random()::real
        FROM generate_series(1, :dims)
    )::vector(:dims),
    jsonb_build_object('category', i % 100)
FROM generate_series(1, :num_vectors) i;

-- Generate query vectors
INSERT INTO queries (query_vector)
SELECT
    array_to_ruvector(ARRAY(
        SELECT random()::real
        FROM generate_series(1, :dims)
    ))
FROM generate_series(1, :num_queries);

COMMIT;

-- ============================================================================
-- Benchmark 1: Sequential Scan (No Index)
-- ============================================================================

\echo ''
\echo '=== Benchmark 1: Sequential Scan (No Index) ==='
\echo ''

-- Get a test query
\set test_query 'SELECT query_vector FROM queries WHERE id = 1'
-- RuVector scan
\echo 'RuVector sequential scan (p50, p99 latency):'
SELECT
    percentile_cont(0.5) WITHIN GROUP (ORDER BY duration) AS p50_ms,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY duration) AS p99_ms,
    AVG(duration) AS avg_ms,
    MIN(duration) AS min_ms,
    MAX(duration) AS max_ms
FROM (
    SELECT
        id,
        extract(milliseconds FROM (clock_timestamp() - start_time)) AS duration
    FROM (
        SELECT
            id,
            clock_timestamp() AS start_time,
            -- ARRAY(...) keeps the multi-row top-k subquery legal in a
            -- scalar context; a bare subquery errors for k > 1
            ARRAY(SELECT id FROM vectors_ruvector v ORDER BY v.embedding <-> (:test_query)::ruvector LIMIT :k)
        FROM queries
        LIMIT 100
    ) t
) times;

-- PGVector scan
\echo 'pgvector sequential scan (p50, p99 latency):'
SELECT
    percentile_cont(0.5) WITHIN GROUP (ORDER BY duration) AS p50_ms,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY duration) AS p99_ms,
    AVG(duration) AS avg_ms,
    MIN(duration) AS min_ms,
    MAX(duration) AS max_ms
FROM (
    SELECT
        id,
        extract(milliseconds FROM (clock_timestamp() - start_time)) AS duration
    FROM (
        SELECT
            id,
            clock_timestamp() AS start_time,
            ARRAY(SELECT id FROM vectors_pgvector v ORDER BY v.embedding <-> (SELECT query_vector::vector FROM queries WHERE id = 1) LIMIT :k)
        FROM queries
        LIMIT 100
    ) t
) times;
-- ============================================================================
-- Benchmark 2: Build Index
-- ============================================================================

\echo ''
\echo '=== Benchmark 2: Index Build Time ==='
\echo ''

-- RuVector HNSW
\echo 'Building ruvector HNSW index...'
\timing on
CREATE INDEX vectors_ruvector_hnsw_idx ON vectors_ruvector
USING hnsw (embedding ruvector_l2_ops)
WITH (m = 16, ef_construction = 64);

-- PGVector HNSW
\echo 'Building pgvector HNSW index...'
\timing on
CREATE INDEX vectors_pgvector_hnsw_idx ON vectors_pgvector
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);

-- ============================================================================
-- Benchmark 3: Index Search Performance
-- ============================================================================

\echo ''
\echo '=== Benchmark 3: Index Search (HNSW) ==='
\echo ''

-- Warm up with a single query vector (a full queries x vectors cross join
-- would compute a billion distances)
SELECT COUNT(*) FROM vectors_ruvector v
WHERE v.embedding <-> (SELECT query_vector FROM queries WHERE id = 1) < 1000;
-- RuVector HNSW search
\echo 'RuVector HNSW search (p50, p99 latency):'
SELECT
    percentile_cont(0.5) WITHIN GROUP (ORDER BY duration) AS p50_ms,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY duration) AS p99_ms,
    AVG(duration) AS avg_ms,
    MIN(duration) AS min_ms,
    MAX(duration) AS max_ms
FROM (
    SELECT
        id,
        extract(milliseconds FROM (clock_timestamp() - start_time)) AS duration
    FROM (
        SELECT
            q.id,
            clock_timestamp() AS start_time,
            -- ARRAY(...) keeps the multi-row top-k subquery legal in a
            -- scalar context; a bare subquery errors for k > 1
            ARRAY(SELECT id FROM vectors_ruvector v ORDER BY v.embedding <-> q.query_vector LIMIT :k)
        FROM queries q
        LIMIT 1000
    ) t
) times;

-- PGVector HNSW search
\echo 'pgvector HNSW search (p50, p99 latency):'
SELECT
    percentile_cont(0.5) WITHIN GROUP (ORDER BY duration) AS p50_ms,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY duration) AS p99_ms,
    AVG(duration) AS avg_ms,
    MIN(duration) AS min_ms,
    MAX(duration) AS max_ms
FROM (
    SELECT
        id,
        extract(milliseconds FROM (clock_timestamp() - start_time)) AS duration
    FROM (
        SELECT
            q.id,
            clock_timestamp() AS start_time,
            ARRAY(SELECT id FROM vectors_pgvector v ORDER BY v.embedding <-> q.query_vector::vector LIMIT :k)
        FROM queries q
        LIMIT 1000
    ) t
) times;
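The latency queries above summarize per-query durations with `percentile_cont`, which linearly interpolates between the two nearest ranks. For readers post-processing the raw result files instead, a minimal Rust sketch of the same interpolated-percentile semantics (illustrative, not part of the benchmark suite):

```rust
// Linear-interpolated percentile, matching PostgreSQL's
// percentile_cont(p) WITHIN GROUP (ORDER BY x) semantics.
fn percentile_cont(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty() && (0.0..=1.0).contains(&p));
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Fractional rank into the sorted samples, then interpolate.
    let rank = p * (samples.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    samples[lo] + (samples[hi] - samples[lo]) * frac
}

fn main() {
    let mut xs = vec![10.0, 20.0, 30.0, 40.0];
    // rank = 0.5 * 3 = 1.5, halfway between 20 and 30
    assert!((percentile_cont(&mut xs, 0.5) - 25.0).abs() < 1e-9);
    let mut ys = vec![1.0, 2.0, 3.0];
    assert!((percentile_cont(&mut ys, 0.99) - 2.98).abs() < 1e-9);
}
```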
-- ============================================================================
-- Benchmark 4: Distance Function Performance
-- ============================================================================

\echo ''
\echo '=== Benchmark 4: Distance Functions ==='
\echo ''

-- L2 Distance
\echo 'L2 Distance (100k calculations):'
\timing on
SELECT SUM(ruvector_l2_distance(v1.embedding, v2.embedding))
FROM vectors_ruvector v1
CROSS JOIN vectors_ruvector v2
WHERE v1.id <= 100 AND v2.id <= 1000;

\timing on
SELECT SUM(v1.embedding <-> v2.embedding)
FROM vectors_pgvector v1
CROSS JOIN vectors_pgvector v2
WHERE v1.id <= 100 AND v2.id <= 1000;

-- Cosine Distance
\echo 'Cosine Distance (100k calculations):'
\timing on
SELECT SUM(ruvector_cosine_distance(v1.embedding, v2.embedding))
FROM vectors_ruvector v1
CROSS JOIN vectors_ruvector v2
WHERE v1.id <= 100 AND v2.id <= 1000;

\timing on
SELECT SUM(v1.embedding <=> v2.embedding)
FROM vectors_pgvector v1
CROSS JOIN vectors_pgvector v2
WHERE v1.id <= 100 AND v2.id <= 1000;

-- Inner Product
\echo 'Inner Product (100k calculations):'
\timing on
SELECT SUM(ruvector_inner_product(v1.embedding, v2.embedding))
FROM vectors_ruvector v1
CROSS JOIN vectors_ruvector v2
WHERE v1.id <= 100 AND v2.id <= 1000;

\timing on
SELECT SUM(v1.embedding <#> v2.embedding)
FROM vectors_pgvector v1
CROSS JOIN vectors_pgvector v2
WHERE v1.id <= 100 AND v2.id <= 1000;
-- ============================================================================
-- Benchmark 5: Index Recall Accuracy
-- ============================================================================

\echo ''
\echo '=== Benchmark 5: Index Recall ==='
\echo ''

-- Create ground truth table with exact (non-index) search; the HNSW index
-- built in Benchmark 2 would otherwise answer both sides and recall would
-- be trivially 1.0
SET enable_indexscan = off;
DROP TABLE IF EXISTS ground_truth;
CREATE TEMP TABLE ground_truth AS
SELECT
    q.id AS query_id,
    ARRAY_AGG(v.id ORDER BY v.embedding <-> q.query_vector) AS true_neighbors
FROM queries q
CROSS JOIN LATERAL (
    SELECT id, embedding
    FROM vectors_ruvector
    ORDER BY embedding <-> q.query_vector
    LIMIT :k
) v
WHERE q.id <= 100
GROUP BY q.id;
RESET enable_indexscan;

-- Compute recall for ruvector HNSW
WITH hnsw_results AS (
    SELECT
        q.id AS query_id,
        ARRAY_AGG(v.id ORDER BY v.embedding <-> q.query_vector) AS hnsw_neighbors
    FROM queries q
    CROSS JOIN LATERAL (
        -- embedding must be selected here: the outer ARRAY_AGG orders by it
        SELECT id, embedding
        FROM vectors_ruvector
        ORDER BY embedding <-> q.query_vector
        LIMIT :k
    ) v
    WHERE q.id <= 100
    GROUP BY q.id
)
SELECT
    AVG(
        (
            SELECT COUNT(*)
            FROM unnest(h.hnsw_neighbors) AS hn
            WHERE hn = ANY(g.true_neighbors)
        )::float / :k
    ) AS recall
FROM hnsw_results h
JOIN ground_truth g ON h.query_id = g.query_id;
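The recall query above computes, per query, the fraction of approximate neighbors that appear in the exact top-k set, then averages across queries. The same recall@k metric in a few lines of plain Rust (an illustrative sketch for checking results offline):

```rust
use std::collections::HashSet;

// recall@k: fraction of approximate neighbors present in the exact
// top-k neighbor set for one query.
fn recall_at_k(approx: &[i64], exact: &[i64]) -> f64 {
    let truth: HashSet<_> = exact.iter().collect();
    let hits = approx.iter().filter(|id| truth.contains(id)).count();
    hits as f64 / exact.len() as f64
}

fn main() {
    let exact = [1, 2, 3, 4, 5];
    let approx = [1, 2, 3, 9, 10]; // 3 of 5 exact neighbors found
    assert!((recall_at_k(&approx, &exact) - 0.6).abs() < 1e-12);
    println!("recall@5 = {}", recall_at_k(&approx, &exact));
}
```

Averaging this value over all benchmarked queries reproduces the single `recall` number the SQL reports.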
-- ============================================================================
-- Benchmark 6: Memory Usage
-- ============================================================================

\echo ''
\echo '=== Benchmark 6: Memory Usage ==='
\echo ''

-- Table sizes
\echo 'Table sizes:'
SELECT
    'ruvector' AS type,
    pg_size_pretty(pg_total_relation_size('vectors_ruvector')) AS total_size,
    pg_size_pretty(pg_relation_size('vectors_ruvector')) AS table_size,
    pg_size_pretty(pg_indexes_size('vectors_ruvector')) AS index_size
UNION ALL
SELECT
    'pgvector' AS type,
    pg_size_pretty(pg_total_relation_size('vectors_pgvector')) AS total_size,
    pg_size_pretty(pg_relation_size('vectors_pgvector')) AS table_size,
    pg_size_pretty(pg_indexes_size('vectors_pgvector')) AS index_size;

-- Index sizes
\echo 'Index sizes:'
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename IN ('vectors_ruvector', 'vectors_pgvector')
ORDER BY tablename, indexname;

-- ============================================================================
-- Benchmark 7: Quantization Performance
-- ============================================================================

\echo ''
\echo '=== Benchmark 7: Quantization ==='
\echo ''

-- Create quantized tables
DROP TABLE IF EXISTS vectors_scalar;
CREATE TABLE vectors_scalar (
    id SERIAL PRIMARY KEY,
    embedding scalarvec
);

INSERT INTO vectors_scalar (embedding)
SELECT quantize_scalar(embedding)
FROM vectors_ruvector
LIMIT 100000;

-- Quantized search
\echo 'Scalar quantized search:'
\timing on
SELECT id
FROM vectors_scalar
ORDER BY embedding <-> quantize_scalar((SELECT query_vector FROM queries WHERE id = 1))
LIMIT :k;

-- ============================================================================
-- Cleanup
-- ============================================================================

\echo ''
\echo '=== Benchmark Complete ==='
\echo ''

DROP TABLE IF EXISTS vectors_ruvector CASCADE;
DROP TABLE IF EXISTS vectors_pgvector CASCADE;
DROP TABLE IF EXISTS queries CASCADE;
DROP TABLE IF EXISTS vectors_scalar CASCADE;
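Benchmark 7 compresses vectors with `quantize_scalar` before searching. The exact ruvector encoding is not shown in this diff; as a hedged sketch, min/max affine quantization to 8-bit codes (one common scalar-quantization scheme) works like this:

```rust
// Illustrative int8-style scalar quantization round trip (min/max affine).
// This shows the general technique only; the actual ruvector scalarvec
// encoding may differ.

fn quantize(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Map [min, max] onto the 256 available codes.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v.iter().map(|&x| ((x - min) / scale).round() as u8).collect();
    (codes, min, scale)
}

fn dequantize(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}

fn main() {
    let v = [0.0f32, 0.5, 1.0];
    let (codes, min, scale) = quantize(&v);
    let back = dequantize(&codes, min, scale);
    for (a, b) in v.iter().zip(&back) {
        // Round-trip error is at most half a quantization step.
        assert!((a - b).abs() < 0.01);
    }
}
```

The 4x size reduction (f32 to u8 per dimension) is what makes the quantized scan in Benchmark 7 cheaper than the full-precision one, at the cost of a bounded rounding error per component.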
123
crates/ruvector-postgres/benches/sql/quick_benchmark.sql
Normal file
@@ -0,0 +1,123 @@
-- Quick benchmark script for development testing
-- Smaller dataset for faster iteration

\timing on
\set ECHO all

-- Configuration
\set num_vectors 10000
\set num_queries 100
\set dims 768
\set k 10

BEGIN;

-- ============================================================================
-- Setup
-- ============================================================================

DROP TABLE IF EXISTS test_vectors CASCADE;
DROP TABLE IF EXISTS test_queries CASCADE;

CREATE TABLE test_vectors (
    id SERIAL PRIMARY KEY,
    embedding ruvector(:dims)
);

CREATE TABLE test_queries (
    id SERIAL PRIMARY KEY,
    query_vector ruvector(:dims)
);

-- ============================================================================
-- Load Data
-- ============================================================================

\echo 'Loading test data...'

INSERT INTO test_vectors (embedding)
SELECT
    array_to_ruvector(ARRAY(
        SELECT random()::real
        FROM generate_series(1, :dims)
    ))
FROM generate_series(1, :num_vectors);

INSERT INTO test_queries (query_vector)
SELECT
    array_to_ruvector(ARRAY(
        SELECT random()::real
        FROM generate_series(1, :dims)
    ))
FROM generate_series(1, :num_queries);

COMMIT;

-- ============================================================================
-- Sequential Scan Baseline
-- ============================================================================

\echo ''
\echo 'Sequential scan baseline:'
EXPLAIN ANALYZE
SELECT id
FROM test_vectors
ORDER BY embedding <-> (SELECT query_vector FROM test_queries WHERE id = 1)
LIMIT :k;

-- ============================================================================
-- Build HNSW Index
-- ============================================================================

\echo ''
\echo 'Building HNSW index...'
CREATE INDEX test_vectors_hnsw_idx ON test_vectors
USING hnsw (embedding ruvector_l2_ops)
WITH (m = 16, ef_construction = 64);

-- ============================================================================
-- Index Search
-- ============================================================================

\echo ''
\echo 'HNSW index search:'
EXPLAIN ANALYZE
SELECT id
FROM test_vectors
ORDER BY embedding <-> (SELECT query_vector FROM test_queries WHERE id = 1)
LIMIT :k;

-- ============================================================================
-- Distance Functions
-- ============================================================================

\echo ''
\echo 'Distance function performance (1000 calculations):'

-- L2
\timing on
SELECT SUM(ruvector_l2_distance(v1.embedding, v2.embedding))
FROM test_vectors v1, test_vectors v2
WHERE v1.id <= 10 AND v2.id <= 100;

-- Cosine
\timing on
SELECT SUM(ruvector_cosine_distance(v1.embedding, v2.embedding))
FROM test_vectors v1, test_vectors v2
WHERE v1.id <= 10 AND v2.id <= 100;

-- Inner Product
\timing on
SELECT SUM(ruvector_inner_product(v1.embedding, v2.embedding))
FROM test_vectors v1, test_vectors v2
WHERE v1.id <= 10 AND v2.id <= 100;

-- ============================================================================
-- Cleanup
-- ============================================================================

DROP TABLE IF EXISTS test_vectors CASCADE;
DROP TABLE IF EXISTS test_queries CASCADE;

\echo ''
\echo 'Quick benchmark complete!'