# RuVector Benchmarking Suite

Comprehensive benchmarking tool for testing the globally distributed RuVector vector search system at scale (500M+ concurrent connections).

## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Benchmark Scenarios](#benchmark-scenarios)
- [Running Benchmarks](#running-benchmarks)
- [Understanding Results](#understanding-results)
- [Best Practices](#best-practices)
- [Cost Estimation](#cost-estimation)
- [Troubleshooting](#troubleshooting)
- [Advanced Usage](#advanced-usage)

## Overview

This benchmarking suite provides enterprise-grade load testing capabilities for RuVector, supporting:

- **Massive Scale**: Test up to 25B concurrent connections
- **Multi-Region**: Distributed load generation across 11 GCP regions
- **Comprehensive Metrics**: Latency, throughput, errors, resource utilization, costs
- **SLA Validation**: Automated checking against 99.99% availability and <50ms p99 latency targets
- **Advanced Analysis**: Statistical analysis, bottleneck identification, recommendations

## Features

### Load Generation

- Multi-protocol support (HTTP, HTTP/2, WebSocket, gRPC)
- Realistic query patterns (uniform, hotspot, Zipfian, burst)
- Configurable ramp-up/down rates
- Connection lifecycle management
- Geographic distribution

### Metrics Collection

- Latency distribution (p50, p90, p95, p99, p99.9)
- Throughput tracking (QPS, bandwidth)
- Error analysis by type and region
- Resource utilization (CPU, memory, network)
- Cost per million queries
- Regional performance comparison

### Analysis & Reporting

- Statistical analysis with anomaly detection
- SLA compliance checking
- Bottleneck identification
- Performance score calculation
- Actionable recommendations
- Interactive visualization dashboard
- Markdown and JSON reports
- CSV export for further analysis

## Prerequisites

### Required

- **Node.js**: v18+ (for TypeScript
execution)
- **k6**: Latest version ([installation guide](https://k6.io/docs/getting-started/installation/))
- **Access**: RuVector cluster endpoint

### Optional

- **Claude Flow**: For hooks integration

  ```bash
  npm install -g claude-flow@alpha
  ```

- **Docker**: For containerized execution
- **GCP Account**: For multi-region load generation

## Installation

1. **Clone Repository**

   ```bash
   cd /home/user/ruvector/benchmarks
   ```

2. **Install Dependencies**

   ```bash
   npm install -g typescript ts-node
   npm install k6 @types/k6
   ```

3. **Verify Installation**

   ```bash
   k6 version
   ts-node --version
   ```

4. **Configure Environment**

   ```bash
   export BASE_URL="https://your-ruvector-cluster.example.com"
   export PARALLEL=2  # Number of parallel scenarios
   ```

## Quick Start

### Run a Single Scenario

```bash
# Quick validation (100M connections, 45 minutes)
ts-node benchmark-runner.ts run baseline_100m

# Full baseline test (500M connections, 3+ hours)
ts-node benchmark-runner.ts run baseline_500m

# Burst test (10x spike to 5B connections)
ts-node benchmark-runner.ts run burst_10x
```

### Run Scenario Groups

```bash
# Quick validation suite (~1 hour)
ts-node benchmark-runner.ts group quick_validation

# Standard test suite (~6 hours)
ts-node benchmark-runner.ts group standard_suite

# Full stress testing suite (~10 hours)
ts-node benchmark-runner.ts group stress_suite

# All scenarios (~48 hours)
ts-node benchmark-runner.ts group full_suite
```

### List Available Tests

```bash
ts-node benchmark-runner.ts list
```

## Benchmark Scenarios

### Baseline Tests

#### baseline_500m

- **Description**: Steady-state operation with 500M concurrent connections
- **Duration**: 3h 15m
- **Target**: P99 < 50ms, 99.99% availability
- **Use Case**: Production capacity validation

#### baseline_100m

- **Description**: Smaller baseline for quick validation
- **Duration**: 45m
- **Target**: P99 < 50ms, 99.99% availability
- **Use Case**: CI/CD integration, quick regression tests

### Burst Tests

#### burst_10x
- **Description**: Sudden spike to 5B concurrent (10x baseline)
- **Duration**: 20m
- **Target**: P99 < 100ms, 99.9% availability
- **Use Case**: Flash sale, viral event simulation

#### burst_25x

- **Description**: Extreme spike to 12.5B concurrent (25x baseline)
- **Duration**: 35m
- **Target**: P99 < 150ms, 99.5% availability
- **Use Case**: Major global event (Olympics, elections)

#### burst_50x

- **Description**: Maximum spike to 25B concurrent (50x baseline)
- **Duration**: 50m
- **Target**: P99 < 200ms, 99% availability
- **Use Case**: Stress testing absolute limits

### Failover Tests

#### regional_failover

- **Description**: Test recovery when one region fails
- **Duration**: 45m
- **Target**: <10% throughput degradation, <1% errors
- **Use Case**: Disaster recovery validation

#### multi_region_failover

- **Description**: Test recovery when multiple regions fail
- **Duration**: 55m
- **Target**: <20% throughput degradation, <2% errors
- **Use Case**: Multi-region outage preparation

### Workload Tests

#### read_heavy

- **Description**: 95% reads, 5% writes (typical production workload)
- **Duration**: 1h 50m
- **Target**: P99 < 50ms, 99.99% availability
- **Use Case**: Production simulation

#### write_heavy

- **Description**: 70% writes, 30% reads (batch indexing scenario)
- **Duration**: 1h 50m
- **Target**: P99 < 80ms, 99.95% availability
- **Use Case**: Bulk data ingestion

#### balanced_workload

- **Description**: 50% reads, 50% writes
- **Duration**: 1h 50m
- **Target**: P99 < 60ms, 99.98% availability
- **Use Case**: Mixed workload validation

### Real-World Scenarios

#### world_cup

- **Description**: Predictable spike with geographic concentration (Europe)
- **Duration**: 3h
- **Target**: P99 < 100ms during matches
- **Use Case**: Major sporting event

#### black_friday

- **Description**: Sustained high load with periodic spikes
- **Duration**: 14h
- **Target**: P99 < 80ms, 99.95% availability
- **Use Case**: E-commerce peak period

## Running Benchmarks
### Basic Usage

```bash
# Set environment variables
export BASE_URL="https://ruvector.example.com"
export REGION="us-east1"

# Run single test
ts-node benchmark-runner.ts run baseline_500m

# Run with custom config
BASE_URL="https://staging.example.com" \
PARALLEL=3 \
ts-node benchmark-runner.ts group standard_suite
```

### With Claude Flow Hooks

```bash
# Enable hooks (default)
export ENABLE_HOOKS=true

# Disable hooks
export ENABLE_HOOKS=false

ts-node benchmark-runner.ts run baseline_500m
```

Hooks will automatically:

- Execute `npx claude-flow@alpha hooks pre-task` before each test
- Store results in swarm memory
- Execute `npx claude-flow@alpha hooks post-task` after completion

### Multi-Region Execution

To distribute load across regions:

```bash
# Deploy load generators to GCP regions
for region in us-east1 us-west1 europe-west1 asia-east1; do
  gcloud compute instances create "k6-${region}" \
    --zone="${region}-a" \
    --machine-type="n2-standard-32" \
    --image-family="ubuntu-2004-lts" \
    --image-project="ubuntu-os-cloud" \
    --metadata-from-file=startup-script=setup-k6.sh
done

# Run distributed test
ts-node benchmark-runner.ts run baseline_500m
```

### Docker Execution

```bash
# Build container
docker build -t ruvector-benchmark .

# Run test
docker run \
  -e BASE_URL="https://ruvector.example.com" \
  -v $(pwd)/results:/results \
  ruvector-benchmark run baseline_500m
```

## Understanding Results

### Output Structure

```
results/
  run-{timestamp}/
    {scenario}-{timestamp}-raw.json       # Raw k6 metrics
    {scenario}-{timestamp}-metrics.json   # Processed metrics
    {scenario}-{timestamp}-metrics.csv    # CSV export
    {scenario}-{timestamp}-analysis.json  # Analysis report
    {scenario}-{timestamp}-report.md      # Markdown report
    SUMMARY.md                            # Multi-scenario summary
```

### Key Metrics

#### Latency

- **P50 (Median)**: 50% of requests faster than this
- **P90**: 90% of requests faster than this
- **P95**: 95% of requests faster than this
- **P99**: 99% of requests faster than this (SLA target)
- **P99.9**: 99.9% of requests faster than this

**Target**: P99 < 50ms for baseline, <100ms for burst

#### Throughput

- **QPS**: Queries per second
- **Peak QPS**: Maximum sustained throughput
- **Average QPS**: Mean throughput over test duration

**Target**: 50M QPS for 500M baseline connections

#### Error Rate

- **Total Errors**: Count of failed requests
- **Error Rate %**: Percentage of requests that failed
- **By Type**: Breakdown (timeout, connection, server, client)
- **By Region**: Geographic distribution

**Target**: < 0.01% error rate (99.99% success)

#### Availability

- **Uptime %**: Percentage of time the system was available
- **Downtime**: Total milliseconds of unavailability
- **MTBF**: Mean time between failures
- **MTTR**: Mean time to recovery

**Target**: 99.99% availability (~52 minutes/year downtime)

#### Resource Utilization

- **CPU %**: Average and peak CPU usage
- **Memory %**: Average and peak memory usage
- **Network**: Bandwidth, ingress/egress bytes
- **Per Region**: Resource usage by geographic location

**Alert Thresholds**: CPU > 80%, Memory > 85%

#### Cost

- **Total Cost**: Compute + network + storage
- **Cost Per Million**: Cost per million queries
- **Per Region**: Cost breakdown by location

**Target**: < $0.50 per million queries
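The derived metrics above follow directly from the raw counters in the `-raw.json` output. A minimal TypeScript sketch of the derivations, using nearest-rank percentiles (the helper names here are illustrative, not part of the suite's API):

```typescript
// Illustrative helpers showing how the key metrics above are computed.
// Function names are examples only, not part of the benchmark suite.

/** Nearest-rank percentile over per-request latencies (ms). */
function percentile(latenciesMs: number[], p: number): number {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[Math.max(0, rank - 1)];
}

/** Error rate as a percentage of total requests. */
function errorRatePct(failed: number, total: number): number {
  return total === 0 ? 0 : (failed / total) * 100;
}

/** Cost per million queries from total cost and total query count. */
function costPerMillion(totalCostUsd: number, totalQueries: number): number {
  return totalQueries === 0 ? 0 : (totalCostUsd / totalQueries) * 1_000_000;
}

// Example: p99 over a small latency sample picks the worst request,
// because ceil(0.99 * 7) = 7 selects the last element of the sorted list.
const latencies = [12, 18, 25, 31, 47, 52, 120];
const p99 = percentile(latencies, 99); // 120
```

At production sample sizes the suite's percentiles come from k6's aggregated histograms rather than a full sort, but the definitions are the same.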
### Performance Score

Overall score (0-100) calculated from:

- **Performance** (35%): Latency and throughput
- **Reliability** (35%): Availability and error rate
- **Scalability** (20%): Resource utilization efficiency
- **Efficiency** (10%): Cost effectiveness

**Grades**:

- 90-100: Excellent
- 80-89: Good
- 70-79: Fair
- 60-69: Needs Improvement
- <60: Poor

### SLA Compliance

✅ **PASSED** if all criteria are met:

- P99 latency < 50ms (baseline) or scenario target
- Availability >= 99.99%
- Error rate < 0.01%

❌ **FAILED** if any criterion is violated

### Analysis Report

Each test generates an analysis report with:

1. **Statistical Analysis**
   - Summary statistics
   - Distribution histograms
   - Time series charts
   - Anomaly detection
2. **SLA Compliance**
   - Pass/fail status
   - Violation details
   - Duration and severity
3. **Bottlenecks**
   - Identified constraints
   - Current vs. threshold values
   - Impact assessment
   - Recommendations
4. **Recommendations**
   - Prioritized action items
   - Implementation guidance
   - Estimated impact and cost

### Visualization Dashboard

Open `visualization-dashboard.html` in a browser to view:

- Real-time metrics
- Interactive charts
- Geographic heat maps
- Historical comparisons
- Cost analysis

## Best Practices

### Before Running Tests

1. **Baseline Environment**
   - Ensure the cluster is healthy
   - No active deployments or maintenance
   - Stable configuration
2. **Resource Allocation**
   - Sufficient load generator capacity
   - Network bandwidth provisioned
   - Monitoring systems ready
3. **Communication**
   - Notify the team of the upcoming test
   - Schedule during low-traffic periods
   - Have a rollback plan ready

### During Tests

1. **Monitoring**
   - Watch real-time metrics
   - Check for anomalies
   - Monitor costs
2. **Safety**
   - Start with smaller tests (baseline_100m)
   - Gradually increase load
   - Be ready to abort if issues are detected
3. **Documentation**
   - Note any unusual events
   - Document configuration changes
   - Record observations

### After Tests
1. **Analysis**
   - Review all metrics
   - Identify bottlenecks
   - Compare to previous runs
2. **Reporting**
   - Share results with the team
   - Document findings
   - Create action items
3. **Follow-Up**
   - Implement recommendations
   - Re-test after changes
   - Track improvements over time

### Test Frequency

- **Quick Validation**: Daily (CI/CD)
- **Standard Suite**: Weekly
- **Stress Testing**: Monthly
- **Full Suite**: Quarterly

## Cost Estimation

### Load Generation Costs

Per hour of testing:

- **Compute**: ~$1,000/hour (distributed load generators)
- **Network**: ~$200/hour (egress traffic)
- **Storage**: ~$10/hour (results storage)

**Total**: ~$1,200/hour

### Scenario Cost Estimates

| Scenario | Duration | Estimated Cost |
|----------|----------|----------------|
| baseline_100m | 45m | $900 |
| baseline_500m | 3h 15m | $3,900 |
| burst_10x | 20m | $400 |
| burst_25x | 35m | $700 |
| burst_50x | 50m | $1,000 |
| read_heavy | 1h 50m | $2,200 |
| world_cup | 3h | $3,600 |
| black_friday | 14h | $16,800 |
| **Full Suite** | ~48h | **~$57,600** |

### Cost Optimization

1. **Use Spot Instances**: 60-80% savings on load generators
2. **Regional Selection**: Test in fewer regions
3. **Shorter Duration**: Reduce the steady-state phase
4. **Parallel Execution**: Minimize total runtime

## Troubleshooting

### Common Issues

#### k6 Not Found

```bash
# Install k6
brew install k6       # macOS
sudo apt install k6   # Linux (requires the k6 apt repository; see CI/CD section below)
choco install k6      # Windows
```

#### Connection Refused

```bash
# Check cluster endpoint
curl -v https://your-ruvector-cluster.example.com/health

# Verify network connectivity
ping your-ruvector-cluster.example.com
```

#### Out of Memory

```bash
# Increase Node.js memory limit
export NODE_OPTIONS="--max-old-space-size=8192"

# Use smaller scenario
ts-node benchmark-runner.ts run baseline_100m
```

#### High Error Rate

- Check cluster health
- Verify capacity (not overloaded)
- Review network latency
- Check authentication/authorization

#### Slow Performance

- Insufficient load generator capacity
- Network bandwidth limitations
- Target cluster under-provisioned
- Configuration issues (connection limits, timeouts)

### Debug Mode

```bash
# Enable verbose logging
export DEBUG=true
export LOG_LEVEL=debug

ts-node benchmark-runner.ts run baseline_500m
```

### Support

For issues or questions:

- GitHub Issues: https://github.com/ruvnet/ruvector/issues
- Documentation: https://docs.ruvector.io
- Community: https://discord.gg/ruvector

## Advanced Usage

### Custom Scenarios

Create a custom scenario by adding an entry to the exported map in `benchmark-scenarios.ts` (note: spreading `SCENARIOS` into its own initializer, as in `export const SCENARIOS = { ...SCENARIOS, ... }`, is a ReferenceError):

```typescript
export const SCENARIOS = {
  // ...existing scenarios...
  my_custom_test: {
    name: 'My Custom Test',
    description: 'Custom workload pattern',
    config: {
      targetConnections: 1000000000,
      rampUpDuration: '15m',
      steadyStateDuration: '1h',
      rampDownDuration: '10m',
      queriesPerConnection: 100,
      queryInterval: '1000',
      protocol: 'http',
      vectorDimension: 768,
      queryPattern: 'uniform',
    },
    k6Options: {
      // k6 configuration
    },
    expectedMetrics: {
      p99Latency: 50,
      errorRate: 0.01,
      throughput: 100000000,
      availability: 99.99,
    },
    duration: '1h25m',
    tags: ['custom'],
  },
};
```

### Integration with CI/CD

```yaml
# .github/workflows/benchmark.yml
name: Benchmark

on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - name: Install k6
        run: |
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6
      - name: Run benchmark
        env:
          BASE_URL: ${{ secrets.BASE_URL }}
        run: |
          cd benchmarks
          ts-node benchmark-runner.ts run baseline_100m
      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: benchmarks/results/
```

### Programmatic Usage

```typescript
import { BenchmarkRunner } from './benchmark-runner';

const runner = new BenchmarkRunner({
  baseUrl: 'https://ruvector.example.com',
  parallelScenarios: 2,
  enableHooks: true,
});

// Run single scenario
const run = await runner.runScenario('baseline_500m');
console.log(`Score: ${run.analysis?.score.overall}/100`);

// Run multiple scenarios
const results = await runner.runScenarios([
  'baseline_500m',
  'burst_10x',
  'read_heavy',
]);

// Check if all passed SLA
const allPassed = Array.from(results.values()).every(
  (r) => r.analysis?.slaCompliance.met
);
```

---

**Happy Benchmarking!** 🚀

For questions or contributions, please visit: https://github.com/ruvnet/ruvector