# Ruvector-Bench

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Rust](https://img.shields.io/badge/rust-1.77%2B-orange.svg)](https://www.rust-lang.org)

**Comprehensive benchmarking suite for measuring Ruvector performance across different operations and configurations.**

> Professional-grade performance testing tools for validating sub-millisecond vector search, HNSW optimization, quantization efficiency, and cross-system comparisons. Built for developers who demand data-driven insights.

## 🎯 Overview

The `ruvector-bench` crate provides a complete benchmarking infrastructure to measure and analyze Ruvector's performance characteristics. It includes standardized test suites compatible with [ann-benchmarks.com](http://ann-benchmarks.com), comprehensive latency profiling, memory usage analysis, and cross-system performance comparison tools.

### Key Features

- ⚡ **ANN-Benchmarks Compatible**: Standard datasets (SIFT1M, GIST1M, Deep1M) and metrics
- 📊 **Latency Profiling**: High-precision measurement of p50, p95, p99, and p99.9 percentiles
- 💾 **Memory Analysis**: Track memory usage with quantization and optimization techniques
- 🔬 **AgenticDB Workloads**: Simulate real-world AI agent memory patterns
- 🏆 **Cross-System Comparison**: Compare against Python baselines and other vector databases
- 📈 **Comprehensive Reporting**: JSON, CSV, and Markdown output formats
- 🔥 **Performance Profiling**: CPU flamegraphs and memory profiling support

## 📦 Installation

Add to your `Cargo.toml`. A dependency key may only be declared once, so pick the single variant with the features you need:

```toml
[dev-dependencies]
ruvector-bench = { path = "../ruvector-bench" }

# Optional: enable profiling features
# ruvector-bench = { path = "../ruvector-bench", features = ["profiling"] }

# Optional: enable HDF5 dataset loading
# ruvector-bench = { path = "../ruvector-bench", features = ["hdf5-datasets"] }
```

## 🚀 Available Benchmarks

The suite includes 6 specialized benchmark binaries:

| Benchmark | Purpose | Metrics |
|-----------|---------|---------|
| **ann-benchmark** | ANN-Benchmarks compatibility | QPS, latency, recall@k, memory |
| **agenticdb-benchmark** | AI agent memory workloads | Insert/search/update latency, memory |
| **latency-benchmark** | Detailed latency profiling | p50/p95/p99/p99.9 latencies |
| **memory-benchmark** | Memory usage analysis | Memory per vector, quantization savings |
| **comparison-benchmark** | Cross-system performance | Speedup vs. baselines (10-100x) |
| **profiling-benchmark** | CPU/memory profiling | Flamegraphs, allocation tracking |

## ⚡ Quick Start

### Running Basic Benchmarks

```bash
# Run ANN-Benchmarks suite with default settings
cargo run --bin ann-benchmark --release

# Run with custom parameters
cargo run --bin ann-benchmark --release -- \
    --num-vectors 100000 \
    --dimensions 384 \
    --ef-search-values 50,100,200 \
    --output bench_results

# Run latency profiling
cargo run --bin latency-benchmark --release

# Run AgenticDB workload simulation
cargo run --bin agenticdb-benchmark --release

# Run cross-system comparison
cargo run --bin comparison-benchmark --release
```

### Running with Profiling

```bash
# Build with profiling enabled
cargo build --bin profiling-benchmark --release --features profiling

# Run and generate flamegraph
cargo run --bin profiling-benchmark --release --features profiling -- \
    --enable-flamegraph \
    --output profiling_results
```
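Each binary writes its results into the `--output` directory in JSON, CSV, and Markdown form (see Results Visualization below). For quick programmatic access in CI or ad-hoc analysis, here is a minimal sketch that loads a JSON result; it assumes `serde` (with the `derive` feature), `serde_json`, and `anyhow` as dependencies, and mirrors the field names from the JSON format shown later in this README:

```rust
use serde::Deserialize;

// Field names mirror the JSON format under "Results Visualization" below.
// Sketch only: assumes serde (derive), serde_json, and anyhow are available.
#[derive(Debug, Deserialize)]
struct AnnResult {
    name: String,
    dataset: String,
    dimensions: usize,
    num_vectors: usize,
    qps: f64,
    latency_p50: f64,
    latency_p99: f64,
    recall_at_10: f64,
    memory_mb: f64,
}

fn main() -> anyhow::Result<()> {
    let raw = std::fs::read_to_string("bench_results/ann_benchmark.json")?;
    let result: AnnResult = serde_json::from_str(&raw)?;
    println!(
        "{}: {:.0} QPS, p99 {:.2} ms, recall@10 {:.2}%",
        result.name,
        result.qps,
        result.latency_p99,
        result.recall_at_10 * 100.0
    );
    Ok(())
}
```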
## 📊 Benchmark Categories

### 1. ANN-Benchmarks Suite (`ann-benchmark`)

Standard benchmarking compatible with the [ann-benchmarks.com](http://ann-benchmarks.com) methodology.

**Supported Datasets:**

- **SIFT1M**: 1M vectors, 128 dimensions (image descriptors)
- **GIST1M**: 1M vectors, 960 dimensions (scene recognition)
- **Deep1M**: 1M vectors, 96 dimensions (deep learning embeddings)
- **Synthetic**: Configurable size and distribution

**Usage:**

```bash
# Test with synthetic data (default)
cargo run --bin ann-benchmark --release -- \
    --dataset synthetic \
    --num-vectors 100000 \
    --dimensions 384 \
    --k 10

# Test with SIFT1M (requires dataset download)
cargo run --bin ann-benchmark --release -- \
    --dataset sift1m \
    --ef-search-values 50,100,200,400
```

**Measured Metrics:**

- Queries per second (QPS)
- Latency percentiles (p50, p95, p99, p99.9)
- Recall@1, Recall@10, Recall@100
- Memory usage (MB)
- Build/index time

**Example Output:**

```
╔════════════════════════════════════════╗
║     Ruvector ANN-Benchmarks Suite      ║
╚════════════════════════════════════════╝

✓ Dataset loaded: 100000 vectors, 1000 queries

============================================================
Testing with ef_search = 100
============================================================

┌───────────┬──────┬──────────┬──────────┬───────────┬─────────────┐
│ ef_search │ QPS  │ p50 (ms) │ p99 (ms) │ Recall@10 │ Memory (MB) │
├───────────┼──────┼──────────┼──────────┼───────────┼─────────────┤
│ 100       │ 5243 │ 0.19     │ 0.45     │ 95.23%    │ 246.8       │
└───────────┴──────┴──────────┴──────────┴───────────┴─────────────┘
```

### 2. AgenticDB Workload Simulation (`agenticdb-benchmark`)

Simulates real-world AI agent memory patterns with mixed read/write workloads.

**Workload Types:**

- **Conversational AI**: High read ratio (70/30 read/write)
- **Learning Agents**: Balanced read/write (50/50)
- **Batch Processing**: Write-heavy (30/70 read/write)

**Usage:**

```bash
cargo run --bin agenticdb-benchmark --release -- \
    --workload conversational \
    --num-vectors 50000 \
    --num-operations 10000
```

**Measured Operations:**

- Insert latency
- Search latency
- Update latency
- Batch operation throughput
- Memory efficiency

### 3. Latency Profiling (`latency-benchmark`)

Detailed latency analysis across different configurations and concurrency levels.

**Test Scenarios:**

- Single-threaded vs multi-threaded search
- Effect of `ef_search` parameter on latency
- Effect of quantization on latency/recall tradeoff
- Concurrent query handling

**Usage:**

```bash
# Test with different thread counts
cargo run --bin latency-benchmark --release -- \
    --threads 1,4,8,16 \
    --num-vectors 50000 \
    --queries 1000
```

**Example Output:**

```
Test 1: Single-threaded Latency
  - p50:   0.42ms
  - p95:   1.23ms
  - p99:   2.15ms
  - p99.9: 4.87ms

Test 2: Multi-threaded Latency (8 threads)
  - p50:   0.38ms
  - p95:   1.05ms
  - p99:   1.89ms
  - p99.9: 3.92ms
```
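Under the hood, the multi-threaded scenario boils down to a simple pattern: each thread times its own queries, and percentiles are computed over the merged samples. A self-contained sketch of that pattern follows; the `run_query` function is a hypothetical stand-in for a real `db.search(...)` call, not the benchmark's actual code:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for a real `db.search(...)` call.
fn run_query() {
    std::hint::black_box((0..1_000).map(|i| (i as f32).sqrt()).sum::<f32>());
}

// Nearest-rank percentile over a sorted sample set.
fn percentile(sorted: &[Duration], p: f64) -> Duration {
    let idx = ((sorted.len() as f64 - 1.0) * p).round() as usize;
    sorted[idx]
}

fn main() {
    let threads = 8;
    let queries_per_thread = 1_000;

    // Each thread times its own queries independently.
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            std::thread::spawn(move || {
                let mut samples = Vec::with_capacity(queries_per_thread);
                for _ in 0..queries_per_thread {
                    let start = Instant::now();
                    run_query();
                    samples.push(start.elapsed());
                }
                samples
            })
        })
        .collect();

    // Merge per-thread samples and compute percentiles over the whole run.
    let mut all: Vec<Duration> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    all.sort();

    for (label, p) in [("p50", 0.50), ("p95", 0.95), ("p99", 0.99), ("p99.9", 0.999)] {
        println!("{label}: {:?}", percentile(&all, p));
    }
}
```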
### 4. Memory Benchmarks (`memory-benchmark`)

Analyzes memory usage with different quantization strategies.

**Quantization Tests:**

- **None**: Full precision (baseline)
- **Scalar**: 4x compression
- **Binary**: 32x compression

**Usage:**

```bash
cargo run --bin memory-benchmark --release -- \
    --num-vectors 100000 \
    --dimensions 384
```

**Measured Metrics:**

- Memory per vector (bytes)
- Compression ratio
- Memory overhead
- Quantization impact on recall

**Example Results:**

```
┌──────────────┬─────────────┬───────────────┬────────────┐
│ Quantization │ Memory (MB) │ Bytes/Vector  │ Recall@10  │
├──────────────┼─────────────┼───────────────┼────────────┤
│ None         │ 147.5       │ 1536          │ 100.00%    │
│ Scalar       │ 38.2        │ 398           │ 95.80%     │
│ Binary       │ 4.7         │ 49            │ 87.20%     │
└──────────────┴─────────────┴───────────────┴────────────┘

✓ Scalar quantization: 3.9x memory reduction, 4.2% recall loss
✓ Binary quantization: 31.4x memory reduction, 12.8% recall loss
```

### 5. Cross-System Comparison (`comparison-benchmark`)

Compare Ruvector against other implementations and baselines.

**Comparison Targets:**

- Ruvector (optimized: SIMD + quantization + HNSW)
- Ruvector (no quantization)
- Simulated Python baseline (numpy)
- Simulated brute-force search

**Usage:**

```bash
cargo run --bin comparison-benchmark --release -- \
    --num-vectors 50000 \
    --dimensions 384
```

**Example Results** (the Speedup column shows how much faster optimized Ruvector is than each system):

```
┌──────────────────────────┬──────┬──────────┬─────────────┬────────────┐
│ System                   │ QPS  │ p50 (ms) │ Memory (MB) │ Speedup    │
├──────────────────────────┼──────┼──────────┼─────────────┼────────────┤
│ Ruvector (optimized)     │ 5243 │ 0.19     │ 38.2        │ 1.0x       │
│ Ruvector (no quant)      │ 4891 │ 0.20     │ 147.5       │ 1.07x      │
│ Python baseline          │ 89   │ 11.2     │ 153.6       │ 58.9x      │
│ Brute-force              │ 12   │ 83.3     │ 147.5       │ 437x       │
└──────────────────────────┴──────┴──────────┴─────────────┴────────────┘

✓ Ruvector is 58.9x faster than the Python baseline
✓ Ruvector uses 74.1% less memory with quantization
```
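The brute-force target is conceptually simple: score every vector against the query and keep the best k, which is O(n·d) per query and explains the ~12 QPS row above. A minimal, illustrative sketch under cosine similarity (not the benchmark's actual implementation); it also doubles as a way to produce ground truth for recall measurements:

```rust
// Exact top-k nearest neighbours by cosine similarity: scan every vector
// per query. Illustrative sketch, not the benchmark's own code.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt() + f32::EPSILON)
}

fn brute_force_top_k(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(v, query)))
        .collect();
    // Highest similarity first; keep the top k.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let vectors: Vec<Vec<f32>> = (0..1000)
        .map(|i| vec![(i as f32).sin(), (i as f32).cos(), 1.0])
        .collect();
    let hits = brute_force_top_k(&vectors, &[0.0, 1.0, 1.0], 10);
    println!("{hits:?}");
}
```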
### 6. Performance Profiling (`profiling-benchmark`)

CPU and memory profiling with flamegraph generation (requires the `profiling` feature).

**Usage:**

```bash
# Build with profiling support
cargo build --bin profiling-benchmark --release --features profiling

# Run with flamegraph generation
cargo run --bin profiling-benchmark --release --features profiling -- \
    --enable-flamegraph \
    --num-vectors 50000 \
    --output profiling_results

# View flamegraph (macOS; use xdg-open on Linux)
open profiling_results/flamegraph.svg
```

**Generated Artifacts:**

- CPU flamegraph (SVG)
- Memory allocation profile
- Hotspot analysis
- Function-level timing breakdown

## 📈 Interpreting Results

### Latency Metrics

| Percentile | Meaning | Target |
|------------|---------|--------|
| **p50** | Median latency - typical query performance | <0.5ms |
| **p95** | 95% of queries complete within this time | <1.5ms |
| **p99** | 99% of queries complete within this time | <3.0ms |
| **p99.9** | 99.9% of queries (tail latency) | <5.0ms |

### Recall Metrics

- **Recall@k**: Fraction of true nearest neighbors found in the top-k results (see the sketch after this list)
- **Target Recall@10**: ≥95% for most applications
- **Trade-off**: Higher `ef_search` → better recall, higher latency
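The crate exposes a `calculate_recall()` utility for this (see the API Reference below); purely to illustrate the metric itself, here is a minimal sketch of recall@k as a set intersection:

```rust
use std::collections::HashSet;

// recall@k = |returned top-k ∩ true top-k| / k.
// Illustrative sketch of the metric; the crate's own `calculate_recall()`
// utility is what the benchmarks actually use.
fn recall_at_k(returned: &[usize], ground_truth: &[usize], k: usize) -> f64 {
    let truth: HashSet<usize> = ground_truth.iter().take(k).copied().collect();
    let hits = returned
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}

fn main() {
    // 9 of the 10 true neighbours retrieved -> recall@10 = 0.90
    let truth: Vec<usize> = (0..10).collect();
    let returned = vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 42];
    println!("recall@10 = {:.2}", recall_at_k(&returned, &truth, 10));
}
```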
### Memory Efficiency

```
Memory per vector = Total Memory / Number of Vectors

Typical values:
- No quantization:     ~1536 bytes (384D float32)
- Scalar quantization: ~400 bytes  (4x compression)
- Binary quantization: ~50 bytes   (32x compression)
```

## 🔧 Benchmark Configuration Options

### Common Options (All Benchmarks)

```bash
--num-vectors    # Number of vectors to index (default: 50000)
--dimensions     # Vector dimensions (default: 384)
--output         # Output directory for results (default: bench_results)
```

### ANN-Benchmark Specific

```bash
--dataset             # Dataset: sift1m, gist1m, deep1m, synthetic
--num-queries         # Number of search queries (default: 1000)
--k                   # Number of nearest neighbors to retrieve (default: 10)
--m                   # HNSW M parameter (default: 32)
--ef-construction     # HNSW build parameter (default: 200)
--ef-search-values    # Comma-separated ef_search values to test (default: 50,100,200,400)
--metric              # Distance metric: cosine, euclidean, dot (default: cosine)
--quantization        # Quantization: none, scalar, binary (default: scalar)
```

### Latency-Benchmark Specific

```bash
--threads    # Comma-separated thread counts (default: 1,4,8,16)
```

### AgenticDB-Benchmark Specific

```bash
--workload          # Workload type: conversational, learning, batch
--num-operations    # Number of operations to perform (default: 10000)
```

### Profiling-Benchmark Specific

```bash
--enable-flamegraph        # Generate CPU flamegraph (requires profiling feature)
--enable-memory-profile    # Enable detailed memory profiling
```

## 🎨 Custom Benchmark Creation

Create your own benchmarks using the `ruvector-bench` library:

```rust
use ruvector_bench::{
    BenchmarkResult, DatasetGenerator, LatencyStats, MemoryProfiler,
    ResultWriter, VectorDistribution,
};
use ruvector_core::{VectorDB, DbOptions, SearchQuery, VectorEntry};
use std::time::Instant;

fn my_custom_benchmark() -> anyhow::Result<()> {
    // Generate test data
    let gen = DatasetGenerator::new(384, VectorDistribution::Normal {
        mean: 0.0,
        std_dev: 1.0,
    });
    let vectors = gen.generate(10000);
    let queries = gen.generate(100);

    // Create database
    let db = VectorDB::new(DbOptions::default())?;

    // Measure indexing
    let mem_profiler = MemoryProfiler::new();
    let build_start = Instant::now();
    for (idx, vector) in vectors.iter().enumerate() {
        db.insert(VectorEntry {
            id: Some(idx.to_string()),
            vector: vector.clone(),
            metadata: None,
        })?;
    }
    let build_time = build_start.elapsed();

    // Measure search performance
    let mut latency_stats = LatencyStats::new()?;
    for query in &queries {
        let start = Instant::now();
        db.search(SearchQuery {
            vector: query.clone(),
            k: 10,
            filter: None,
            ef_search: None,
        })?;
        latency_stats.record(start.elapsed())?;
    }

    // Print results
    println!("Build time: {:.2}s", build_time.as_secs_f64());
    println!("p50 latency: {:.2}ms", latency_stats.percentile(0.50).as_secs_f64() * 1000.0);
    println!("p99 latency: {:.2}ms", latency_stats.percentile(0.99).as_secs_f64() * 1000.0);
    println!("Memory usage: {:.2}MB", mem_profiler.current_usage_mb());

    Ok(())
}
```

## 🔄 CI/CD Integration

### GitHub Actions Example

```yaml
name: Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          profile: minimal

      - name: Run benchmarks
        run: |
          cd crates/ruvector-bench
          cargo run --bin ann-benchmark --release -- --output ci_results
          cargo run --bin latency-benchmark --release -- --output ci_results

      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: crates/ruvector-bench/ci_results/

      - name: Check performance regression
        run: |
          python scripts/check_regression.py ci_results/ann_benchmark.json
```

## 📉 Performance Regression Testing

Track performance over time using historical benchmark data:

```bash
# Run baseline benchmarks (on main branch)
git checkout main
cargo run --bin ann-benchmark --release -- --output baseline_results

# Run comparison benchmarks (on feature branch)
git checkout feature-branch
cargo run --bin ann-benchmark --release -- --output feature_results

# Compare results
python scripts/compare_benchmarks.py \
    baseline_results/ann_benchmark.json \
    feature_results/ann_benchmark.json
```

**Regression Thresholds:**

- ✅ **Pass**: <5% latency regression, <10% memory regression
- ⚠️ **Warning**: 5-10% latency regression, 10-20% memory regression
- ❌ **Fail**: >10% latency regression, >20% memory regression
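For a CI gate in Rust rather than Python, a hedged sketch that applies the thresholds above to the p99 latency and memory fields of two result files (field names follow the JSON format in the next section; `scripts/compare_benchmarks.py` remains the supported tool, this only illustrates the arithmetic):

```rust
use serde_json::Value;

// Percentage change of `new` relative to `old`; positive means a regression
// for latency and memory.
fn pct_change(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

fn main() -> anyhow::Result<()> {
    let load = |p: &str| -> anyhow::Result<Value> {
        Ok(serde_json::from_str(&std::fs::read_to_string(p)?)?)
    };
    let base = load("baseline_results/ann_benchmark.json")?;
    let feat = load("feature_results/ann_benchmark.json")?;

    let latency = pct_change(
        base["latency_p99"].as_f64().unwrap(),
        feat["latency_p99"].as_f64().unwrap(),
    );
    let memory = pct_change(
        base["memory_mb"].as_f64().unwrap(),
        feat["memory_mb"].as_f64().unwrap(),
    );

    // Thresholds from the table above: fail on >10% latency or >20% memory.
    match (latency, memory) {
        (l, m) if l > 10.0 || m > 20.0 => {
            anyhow::bail!("regression: latency {l:+.1}%, memory {m:+.1}%")
        }
        (l, m) if l > 5.0 || m > 10.0 => {
            println!("warning: latency {l:+.1}%, memory {m:+.1}%")
        }
        (l, m) => println!("pass: latency {l:+.1}%, memory {m:+.1}%"),
    }
    Ok(())
}
```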
## 📊 Results Visualization

Benchmark results are automatically saved in multiple formats:

### JSON Format

```json
{
  "name": "ruvector-ef100",
  "dataset": "synthetic",
  "dimensions": 384,
  "num_vectors": 100000,
  "qps": 5243.2,
  "latency_p50": 0.19,
  "latency_p99": 2.15,
  "recall_at_10": 0.9523,
  "memory_mb": 38.2
}
```

### CSV Format

```csv
name,dataset,dimensions,num_vectors,qps,p50,p99,recall@10,memory_mb
ruvector-ef100,synthetic,384,100000,5243.2,0.19,2.15,0.9523,38.2
```

### Markdown Report

Results include automatically generated Markdown reports with detailed performance analysis.

### Custom Visualization

Generate performance charts using the provided data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load benchmark results
df = pd.read_csv('bench_results/ann_benchmark.csv')

# Plot QPS vs Recall tradeoff
plt.figure(figsize=(10, 6))
plt.scatter(df['recall@10'] * 100, df['qps'])
plt.xlabel('Recall@10 (%)')
plt.ylabel('Queries per Second')
plt.title('Ruvector Performance: QPS vs Recall')
plt.grid(True)
plt.savefig('qps_vs_recall.png')
```

## 🔗 Links to Benchmark Reports

- [Latest Benchmark Results](../../benchmarks/LOAD_TEST_SCENARIOS.md)
- [Performance Optimization Guide](../../docs/cloud-architecture/PERFORMANCE_OPTIMIZATION_GUIDE.md)
- [Implementation Summary](../../docs/IMPLEMENTATION_SUMMARY.md)
- [ANN-Benchmarks.com](http://ann-benchmarks.com) - Standard vector search benchmarks

## 🎯 Optimization Based on Benchmarks

### Use Benchmark Results to Tune Performance

1. **Optimize for Latency** (sub-millisecond queries):

   ```rust
   HnswConfig {
       m: 16,                // Lower M = faster search, less recall
       ef_construction: 100,
       ef_search: 50,        // Lower ef_search = faster, less recall
       max_elements: 100000,
   }
   ```

2. **Optimize for Recall** (95%+ accuracy):

   ```rust
   HnswConfig {
       m: 64,                // Higher M = better recall
       ef_construction: 400,
       ef_search: 200,       // Higher ef_search = better recall
       max_elements: 100000,
   }
   ```

3. **Optimize for Memory** (minimal footprint):

   ```rust
   DbOptions {
       quantization: Some(QuantizationConfig::Binary), // 32x compression
       ..Default::default()
   }
   ```

### Recommended Configurations by Use Case

| Use Case | M | ef_construction | ef_search | Quantization | Expected Performance |
|----------|---|-----------------|-----------|--------------|----------------------|
| **Low-Latency Search** | 16 | 100 | 50 | Scalar | <0.5ms p50, 90%+ recall |
| **Balanced** | 32 | 200 | 100 | Scalar | <1ms p50, 95%+ recall |
| **High Accuracy** | 64 | 400 | 200 | None | <2ms p50, 98%+ recall |
| **Memory Constrained** | 16 | 100 | 50 | Binary | <1ms p50, 85%+ recall, 32x compression |

## 🛠️ Development

### Running Tests

```bash
# Run unit tests
cargo test -p ruvector-bench

# Run specific benchmark
cargo test -p ruvector-bench --test latency_stats_test
```

### Building Documentation

```bash
# Generate API documentation
cargo doc -p ruvector-bench --open
```

### Adding New Benchmarks

1. Create a new binary in `src/bin/`:

   ```bash
   touch src/bin/my_benchmark.rs
   ```

2. Add to `Cargo.toml`:

   ```toml
   [[bin]]
   name = "my-benchmark"
   path = "src/bin/my_benchmark.rs"
   ```

3. Implement using `ruvector-bench` utilities (see the skeleton below):

   ```rust
   use ruvector_bench::{LatencyStats, ResultWriter};
   ```
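To make step 3 concrete, here is a sketch-level skeleton for the new binary. The `DatasetGenerator`/`LatencyStats` calls follow the Custom Benchmark Creation example earlier in this README rather than a verified API, and the timed operation is a placeholder:

```rust
// src/bin/my_benchmark.rs — sketch-level skeleton. Types and signatures
// follow the "Custom Benchmark Creation" example above; the timed
// operation is a placeholder for whatever you want to measure.
use ruvector_bench::{DatasetGenerator, LatencyStats, VectorDistribution};
use std::time::Instant;

fn main() -> anyhow::Result<()> {
    // Small synthetic workload: 1,000 queries of 384 dimensions.
    let gen = DatasetGenerator::new(384, VectorDistribution::Normal {
        mean: 0.0,
        std_dev: 1.0,
    });
    let queries = gen.generate(1_000);

    // Time the placeholder operation per query and report percentiles.
    let mut stats = LatencyStats::new()?;
    for query in &queries {
        let start = Instant::now();
        std::hint::black_box(query.iter().sum::<f32>()); // stand-in for the measured op
        stats.record(start.elapsed())?;
    }
    println!("p50: {:.3}ms", stats.percentile(0.50).as_secs_f64() * 1000.0);
    println!("p99: {:.3}ms", stats.percentile(0.99).as_secs_f64() * 1000.0);
    Ok(())
}
```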
## 📚 API Reference

### Core Types

- **`BenchmarkResult`**: Comprehensive benchmark result structure
- **`LatencyStats`**: HDR-histogram-based latency measurement
- **`DatasetGenerator`**: Synthetic vector data generation
- **`MemoryProfiler`**: Memory usage tracking
- **`ResultWriter`**: Multi-format result output (JSON, CSV, Markdown)

### Utilities

- **`calculate_recall()`**: Compute the recall@k metric
- **`create_progress_bar()`**: Terminal progress indication
- **`VectorDistribution`**: Uniform, Normal, or Clustered vector generation

See the [full API documentation](https://docs.rs/ruvector-bench) for details.

## 🤝 Contributing

We welcome contributions to improve the benchmarking suite!

### Areas for Contribution

- 📊 Additional benchmark scenarios (concurrent writes, updates, deletes)
- 🔌 Integration with other vector databases (Pinecone, Qdrant, Milvus)
- 📈 Enhanced visualization and reporting
- 🎯 Real-world dataset support (SIFT, GIST, Deep1M loaders)
- 🚀 Performance optimization insights

See the [Contributing Guidelines](../../docs/development/CONTRIBUTING.md) for details.

## 📜 License

This crate is part of the Ruvector project and is licensed under the MIT License.

---

**Part of [Ruvector](../../README.md) - Next-generation vector database built in Rust**

Built by [rUv](https://ruv.io) • [GitHub](https://github.com/ruvnet/ruvector) • [Documentation](../../docs/README.md)