RuVector Benchmarking Suite

Comprehensive benchmarking tool for testing the globally distributed RuVector vector search system at scale (500M+ concurrent connections).

Overview

This benchmarking suite provides enterprise-grade load testing capabilities for RuVector, supporting:

  • Massive Scale: Test up to 25B concurrent connections
  • Multi-Region: Distributed load generation across 11 GCP regions
  • Comprehensive Metrics: Latency, throughput, errors, resource utilization, costs
  • SLA Validation: Automated checking against 99.99% availability, <50ms p99 latency targets
  • Advanced Analysis: Statistical analysis, bottleneck identification, recommendations

Features

Load Generation

  • Multi-protocol support (HTTP, HTTP/2, WebSocket, gRPC)
  • Realistic query patterns (uniform, hotspot, Zipfian, burst)
  • Configurable ramp-up/down rates
  • Connection lifecycle management
  • Geographic distribution
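
The query patterns listed above can be sketched as a sampling function. This is an illustrative example only, not the suite's actual implementation; the function name and constants (e.g. the 90/10 hotspot split) are assumptions:

```typescript
// Hypothetical sketch of per-pattern query-target sampling over n vectors.
// Names and constants are illustrative, not the suite's actual code.
type QueryPattern = 'uniform' | 'hotspot' | 'zipfian';

function sampleIndex(pattern: QueryPattern, n: number): number {
  switch (pattern) {
    case 'uniform':
      // Every vector is equally likely to be queried.
      return Math.floor(Math.random() * n);
    case 'hotspot':
      // Assumed split: 90% of queries concentrate on the first 10% of vectors.
      return Math.random() < 0.9
        ? Math.floor(Math.random() * n * 0.1)
        : Math.floor(Math.random() * n);
    case 'zipfian': {
      // Rough inverse-CDF approximation of a Zipf-like skew:
      // low indices are sampled far more often than high ones.
      const u = Math.random();
      return Math.min(n - 1, Math.floor(Math.exp(u * Math.log(n))) - 1);
    }
  }
}
```

A burst pattern would additionally modulate the arrival rate over time rather than the target distribution.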

Metrics Collection

  • Latency distribution (p50, p90, p95, p99, p99.9)
  • Throughput tracking (QPS, bandwidth)
  • Error analysis by type and region
  • Resource utilization (CPU, memory, network)
  • Cost per million queries
  • Regional performance comparison

Analysis & Reporting

  • Statistical analysis with anomaly detection
  • SLA compliance checking
  • Bottleneck identification
  • Performance score calculation
  • Actionable recommendations
  • Interactive visualization dashboard
  • Markdown and JSON reports
  • CSV export for further analysis

Prerequisites

Required

  • Node.js: v18+ (for TypeScript execution)
  • k6: Latest version (installation guide)
  • Access: RuVector cluster endpoint

Optional

  • Claude Flow: For hooks integration
    npm install -g claude-flow@alpha
    
  • Docker: For containerized execution
  • GCP Account: For multi-region load generation

Installation

  1. Clone Repository

    git clone https://github.com/ruvnet/ruvector.git
    cd ruvector/benchmarks

  2. Install Dependencies

    npm install -g typescript ts-node
    npm install --save-dev @types/k6
    
  3. Verify Installation

    k6 version
    ts-node --version
    
  4. Configure Environment

    export BASE_URL="https://your-ruvector-cluster.example.com"
    export PARALLEL=2  # Number of parallel scenarios
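
A minimal sketch of how the runner might read these variables at startup. The fallback values shown are assumptions for illustration, not documented defaults (except ENABLE_HOOKS, which the hooks section notes defaults to true):

```typescript
// Sketch: reading the documented environment variables.
// The fallback values are assumptions, not the runner's actual defaults.
const config = {
  baseUrl: process.env.BASE_URL ?? 'http://localhost:8080',     // assumed default
  parallelScenarios: Number(process.env.PARALLEL ?? '1'),       // assumed default
  enableHooks: (process.env.ENABLE_HOOKS ?? 'true') === 'true', // documented default: true
};
```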
    

Quick Start

Run a Single Scenario

# Quick validation (100M connections, 45 minutes)
ts-node benchmark-runner.ts run baseline_100m

# Full baseline test (500M connections, 3+ hours)
ts-node benchmark-runner.ts run baseline_500m

# Burst test (10x spike to 5B connections)
ts-node benchmark-runner.ts run burst_10x

Run Scenario Groups

# Quick validation suite (~1 hour)
ts-node benchmark-runner.ts group quick_validation

# Standard test suite (~6 hours)
ts-node benchmark-runner.ts group standard_suite

# Full stress testing suite (~10 hours)
ts-node benchmark-runner.ts group stress_suite

# All scenarios (~48 hours)
ts-node benchmark-runner.ts group full_suite

List Available Tests

ts-node benchmark-runner.ts list

Benchmark Scenarios

Baseline Tests

baseline_500m

  • Description: Steady-state operation with 500M concurrent connections
  • Duration: 3h 15m
  • Target: P99 < 50ms, 99.99% availability
  • Use Case: Production capacity validation

baseline_100m

  • Description: Smaller baseline for quick validation
  • Duration: 45m
  • Target: P99 < 50ms, 99.99% availability
  • Use Case: CI/CD integration, quick regression tests

Burst Tests

burst_10x

  • Description: Sudden spike to 5B concurrent (10x baseline)
  • Duration: 20m
  • Target: P99 < 100ms, 99.9% availability
  • Use Case: Flash sale, viral event simulation

burst_25x

  • Description: Extreme spike to 12.5B concurrent (25x baseline)
  • Duration: 35m
  • Target: P99 < 150ms, 99.5% availability
  • Use Case: Major global event (Olympics, elections)

burst_50x

  • Description: Maximum spike to 25B concurrent (50x baseline)
  • Duration: 50m
  • Target: P99 < 200ms, 99% availability
  • Use Case: Stress testing absolute limits

Failover Tests

regional_failover

  • Description: Test recovery when one region fails
  • Duration: 45m
  • Target: <10% throughput degradation, <1% errors
  • Use Case: Disaster recovery validation

multi_region_failover

  • Description: Test recovery when multiple regions fail
  • Duration: 55m
  • Target: <20% throughput degradation, <2% errors
  • Use Case: Multi-region outage preparation

Workload Tests

read_heavy

  • Description: 95% reads, 5% writes (typical production workload)
  • Duration: 1h 50m
  • Target: P99 < 50ms, 99.99% availability
  • Use Case: Production simulation

write_heavy

  • Description: 70% writes, 30% reads (batch indexing scenario)
  • Duration: 1h 50m
  • Target: P99 < 80ms, 99.95% availability
  • Use Case: Bulk data ingestion

balanced_workload

  • Description: 50% reads, 50% writes
  • Duration: 1h 50m
  • Target: P99 < 60ms, 99.98% availability
  • Use Case: Mixed workload validation

Real-World Scenarios

world_cup

  • Description: Predictable spike with geographic concentration (Europe)
  • Duration: 3h
  • Target: P99 < 100ms during matches
  • Use Case: Major sporting event

black_friday

  • Description: Sustained high load with periodic spikes
  • Duration: 14h
  • Target: P99 < 80ms, 99.95% availability
  • Use Case: E-commerce peak period

Running Benchmarks

Basic Usage

# Set environment variables
export BASE_URL="https://ruvector.example.com"
export REGION="us-east1"

# Run single test
ts-node benchmark-runner.ts run baseline_500m

# Run with custom config
BASE_URL="https://staging.example.com" \
PARALLEL=3 \
ts-node benchmark-runner.ts group standard_suite

With Claude Flow Hooks

# Enable hooks (default)
export ENABLE_HOOKS=true

# Disable hooks
export ENABLE_HOOKS=false

ts-node benchmark-runner.ts run baseline_500m

Hooks will automatically:

  • Execute npx claude-flow@alpha hooks pre-task before each test
  • Store results in swarm memory
  • Execute npx claude-flow@alpha hooks post-task after completion

Multi-Region Execution

To distribute load across regions:

# Deploy load generators to GCP regions
for region in us-east1 us-west1 europe-west1 asia-east1; do
  gcloud compute instances create "k6-${region}" \
    --zone="${region}-a" \
    --machine-type="n2-standard-32" \
    --image-family="ubuntu-2004-lts" \
    --image-project="ubuntu-os-cloud" \
    --metadata-from-file=startup-script=setup-k6.sh
done

# Run distributed test
ts-node benchmark-runner.ts run baseline_500m

Docker Execution

# Build container
docker build -t ruvector-benchmark .

# Run test
docker run \
  -e BASE_URL="https://ruvector.example.com" \
  -v $(pwd)/results:/results \
  ruvector-benchmark run baseline_500m

Understanding Results

Output Structure

results/
  run-{timestamp}/
    {scenario}-{timestamp}-raw.json       # Raw k6 metrics
    {scenario}-{timestamp}-metrics.json   # Processed metrics
    {scenario}-{timestamp}-metrics.csv    # CSV export
    {scenario}-{timestamp}-analysis.json  # Analysis report
    {scenario}-{timestamp}-report.md      # Markdown report
    SUMMARY.md                            # Multi-scenario summary

Key Metrics

Latency

  • P50 (Median): 50% of requests complete at or below this latency
  • P90: 90% of requests complete at or below this latency
  • P95: 95% of requests complete at or below this latency
  • P99: 99% of requests complete at or below this latency (SLA target)
  • P99.9: 99.9% of requests complete at or below this latency

Target: P99 < 50ms for baseline, <100ms for burst

Throughput

  • QPS: Queries per second
  • Peak QPS: Maximum sustained throughput
  • Average QPS: Mean throughput over test duration

Target: 50M QPS for 500M baseline connections

Error Rate

  • Total Errors: Count of failed requests
  • Error Rate %: Percentage of requests that failed
  • By Type: Breakdown (timeout, connection, server, client)
  • By Region: Geographic distribution

Target: < 0.01% error rate (99.99% success)

Availability

  • Uptime %: Percentage of time system was available
  • Downtime: Total milliseconds of unavailability
  • MTBF: Mean time between failures
  • MTTR: Mean time to recovery

Target: 99.99% availability (52 minutes/year downtime)
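
The downtime budget follows directly from the availability target, as a quick check of the 52 minutes/year figure above:

```typescript
// Converting an availability target into an annual downtime budget.
function annualDowntimeMinutes(availabilityPercent: number): number {
  const minutesPerYear = 365.25 * 24 * 60; // 525,960
  return minutesPerYear * (1 - availabilityPercent / 100);
}

annualDowntimeMinutes(99.99); // ~52.6 minutes/year
annualDowntimeMinutes(99.9);  // ~526 minutes/year
```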

Resource Utilization

  • CPU %: Average and peak CPU usage
  • Memory %: Average and peak memory usage
  • Network: Bandwidth, ingress/egress bytes
  • Per Region: Resource usage by geographic location

Alert Thresholds: CPU > 80%, Memory > 85%

Cost

  • Total Cost: Compute + network + storage
  • Cost Per Million: Cost per million queries served
  • Per Region: Cost breakdown by location

Target: < $0.50 per million queries
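
The cost-per-million metric is a simple ratio; this sketch (with illustrative field names) shows the derivation:

```typescript
// Sketch: cost per million queries from run totals. Names are illustrative.
function costPerMillionQueries(totalCostUsd: number, totalQueries: number): number {
  return (totalCostUsd / totalQueries) * 1_000_000;
}

// A $500 run serving 1B queries lands exactly at the $0.50 target.
costPerMillionQueries(500, 1_000_000_000); // 0.5
```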

Performance Score

Overall score (0-100) calculated from:

  • Performance (35%): Latency and throughput
  • Reliability (35%): Availability and error rate
  • Scalability (20%): Resource utilization efficiency
  • Efficiency (10%): Cost effectiveness

Grades:

  • 90-100: Excellent
  • 80-89: Good
  • 70-79: Fair
  • 60-69: Needs Improvement
  • <60: Poor
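
The weighting and grade mapping above can be sketched as follows. The four component scores (each 0-100) would come from the suite's analysis step; only the combination logic is shown, and the interface name is an assumption:

```typescript
// Sketch of the weighted overall score and grade mapping described above.
interface ComponentScores {
  performance: number; // latency and throughput (35%)
  reliability: number; // availability and error rate (35%)
  scalability: number; // resource-utilization efficiency (20%)
  efficiency: number;  // cost effectiveness (10%)
}

function overallScore(s: ComponentScores): number {
  return 0.35 * s.performance + 0.35 * s.reliability +
         0.20 * s.scalability + 0.10 * s.efficiency;
}

function grade(score: number): string {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Good';
  if (score >= 70) return 'Fair';
  if (score >= 60) return 'Needs Improvement';
  return 'Poor';
}
```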

SLA Compliance

PASSED if all criteria met:

  • P99 latency < 50ms (baseline) or scenario target
  • Availability >= 99.99%
  • Error rate < 0.01%

FAILED if any criterion violated
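
The pass/fail logic reduces to a conjunction of the three criteria. A sketch using the baseline targets (per-scenario targets, e.g. p99 < 100ms for burst tests, would override the defaults; the type and function names are illustrative):

```typescript
// Sketch of the SLA pass/fail check described above.
interface RunMetrics {
  p99LatencyMs: number;
  availabilityPercent: number;
  errorRatePercent: number;
}

function slaPassed(
  m: RunMetrics,
  targets = { p99Ms: 50, availability: 99.99, errorRate: 0.01 },
): boolean {
  return m.p99LatencyMs < targets.p99Ms &&
         m.availabilityPercent >= targets.availability &&
         m.errorRatePercent < targets.errorRate;
}
```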

Analysis Report

Each test generates an analysis report with:

  1. Statistical Analysis

    • Summary statistics
    • Distribution histograms
    • Time series charts
    • Anomaly detection
  2. SLA Compliance

    • Pass/fail status
    • Violation details
    • Duration and severity
  3. Bottlenecks

    • Identified constraints
    • Current vs. threshold values
    • Impact assessment
    • Recommendations
  4. Recommendations

    • Prioritized action items
    • Implementation guidance
    • Estimated impact and cost

Visualization Dashboard

Open visualization-dashboard.html in a browser to view:

  • Real-time metrics
  • Interactive charts
  • Geographic heat maps
  • Historical comparisons
  • Cost analysis

Best Practices

Before Running Tests

  1. Baseline Environment

    • Ensure cluster is healthy
    • No active deployments or maintenance
    • Stable configuration
  2. Resource Allocation

    • Sufficient load generator capacity
    • Network bandwidth provisioned
    • Monitoring systems ready
  3. Communication

    • Notify team of upcoming test
    • Schedule during low-traffic periods
    • Have rollback plan ready

During Tests

  1. Monitoring

    • Watch real-time metrics
    • Check for anomalies
    • Monitor costs
  2. Safety

    • Start with smaller tests (baseline_100m)
    • Gradually increase load
    • Be ready to abort if issues detected
  3. Documentation

    • Note any unusual events
    • Document configuration changes
    • Record observations

After Tests

  1. Analysis

    • Review all metrics
    • Identify bottlenecks
    • Compare to previous runs
  2. Reporting

    • Share results with team
    • Document findings
    • Create action items
  3. Follow-Up

    • Implement recommendations
    • Re-test after changes
    • Track improvements over time

Test Frequency

  • Quick Validation: Daily (CI/CD)
  • Standard Suite: Weekly
  • Stress Testing: Monthly
  • Full Suite: Quarterly

Cost Estimation

Load Generation Costs

Per hour of testing:

  • Compute: ~$1,000/hour (distributed load generators)
  • Network: ~$200/hour (egress traffic)
  • Storage: ~$10/hour (results storage)

Total: ~$1,200/hour

Scenario Cost Estimates

Scenario        Duration   Estimated Cost
--------------  ---------  --------------
baseline_100m   45m        $900
baseline_500m   3h 15m     $3,900
burst_10x       20m        $400
burst_25x       35m        $700
burst_50x       50m        $1,000
read_heavy      1h 50m     $2,200
world_cup       3h         $3,600
black_friday    14h        $16,800
Full Suite      ~48h       ~$57,600

Cost Optimization

  1. Use Spot Instances: 60-80% savings on load generators
  2. Regional Selection: Test in fewer regions
  3. Shorter Duration: Reduce steady-state phase
  4. Parallel Execution: Minimize total runtime

Troubleshooting

Common Issues

K6 Not Found

# Install k6
brew install k6  # macOS
sudo apt install k6  # Debian/Ubuntu (after adding the k6 apt repository)
choco install k6  # Windows

Connection Refused

# Check cluster endpoint
curl -v https://your-ruvector-cluster.example.com/health

# Verify network connectivity
ping your-ruvector-cluster.example.com

Out of Memory

# Increase Node.js memory limit
export NODE_OPTIONS="--max-old-space-size=8192"

# Use smaller scenario
ts-node benchmark-runner.ts run baseline_100m

High Error Rate

  • Check cluster health
  • Verify capacity (not overloaded)
  • Review network latency
  • Check authentication/authorization

Slow Performance

  • Insufficient load generator capacity
  • Network bandwidth limitations
  • Target cluster under-provisioned
  • Configuration issues (connection limits, timeouts)

Debug Mode

# Enable verbose logging
export DEBUG=true
export LOG_LEVEL=debug

ts-node benchmark-runner.ts run baseline_500m

Support

For issues or questions, please open an issue at https://github.com/ruvnet/ruvector.

Advanced Usage

Custom Scenarios

Add a custom scenario to the existing SCENARIOS object in benchmark-scenarios.ts:

export const SCENARIOS = {
  // ...existing scenarios...
  my_custom_test: {
    name: 'My Custom Test',
    description: 'Custom workload pattern',
    config: {
      targetConnections: 1000000000,
      rampUpDuration: '15m',
      steadyStateDuration: '1h',
      rampDownDuration: '10m',
      queriesPerConnection: 100,
      queryInterval: '1000',
      protocol: 'http',
      vectorDimension: 768,
      queryPattern: 'uniform',
    },
    k6Options: {
      // K6 configuration
    },
    expectedMetrics: {
      p99Latency: 50,
      errorRate: 0.01,
      throughput: 100000000,
      availability: 99.99,
    },
    duration: '1h25m',
    tags: ['custom'],
  },
};

Integration with CI/CD

# .github/workflows/benchmark.yml
name: Benchmark
on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - name: Install k6
        run: |
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6
      - name: Run benchmark
        env:
          BASE_URL: ${{ secrets.BASE_URL }}
        run: |
          npm install -g typescript ts-node
          cd benchmarks
          ts-node benchmark-runner.ts run baseline_100m
      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: benchmarks/results/

Programmatic Usage

import { BenchmarkRunner } from './benchmark-runner';

const runner = new BenchmarkRunner({
  baseUrl: 'https://ruvector.example.com',
  parallelScenarios: 2,
  enableHooks: true,
});

// Run single scenario
const run = await runner.runScenario('baseline_500m');
console.log(`Score: ${run.analysis?.score.overall}/100`);

// Run multiple scenarios
const results = await runner.runScenarios([
  'baseline_500m',
  'burst_10x',
  'read_heavy',
]);

// Check if all passed SLA
const allPassed = Array.from(results.values()).every(
  r => r.analysis?.slaCompliance.met
);

Happy Benchmarking! 🚀

For questions or contributions, please visit: https://github.com/ruvnet/ruvector