Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions


@@ -0,0 +1,582 @@
# RuVector Load Testing Scenarios
## Overview
This document defines comprehensive load testing scenarios for the globally distributed RuVector system, targeting 500 million concurrent learning streams with burst capacity up to 25 billion.
## Test Environment
### Global Regions
- **Americas**: us-central1, us-east1, us-west1, southamerica-east1
- **Europe**: europe-west1, europe-west3, europe-north1
- **Asia-Pacific**: asia-east1, asia-southeast1, asia-northeast1, australia-southeast1
- **Total**: 11 regions
### Infrastructure
- **Cloud Run**: Auto-scaling instances (10-1000 per region)
- **Load Balancer**: Global HTTPS LB with Cloud CDN
- **Database**: Cloud SQL PostgreSQL (multi-region)
- **Cache**: Memorystore Redis (128GB per region)
- **Monitoring**: Cloud Monitoring + OpenTelemetry
---
## Scenario Categories
### 1. Baseline Scenarios
#### 1.1 Steady State (500M Concurrent)
**Objective**: Validate system handles target baseline load
**Configuration**:
- Total connections: 500M globally
- Distribution: Proportional to region capacity
  - Tier-1 regions (5): 80M each = 400M
  - Tier-2 regions (6): ~16.7M each = 100M
- Query rate: 50K QPS globally
- Test duration: 4 hours
- Ramp-up: 30 minutes
**Success Criteria**:
- P99 latency < 50ms
- P50 latency < 10ms
- Error rate < 0.1%
- No memory leaks
- CPU utilization 60-80%
- All regions healthy
**Load Pattern**:
```javascript
{
  type: "ramped-arrival-rate",
  stages: [
    { duration: "30m", target: 50000 },  // Ramp up
    { duration: "4h",  target: 50000 },  // Steady
    { duration: "15m", target: 0 }       // Ramp down
  ]
}
```
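In k6 terms, this pattern maps onto the `ramping-arrival-rate` executor, with the success criteria expressed as thresholds. A minimal sketch, assuming a `/v1/search` endpoint, an illustrative payload shape, and per-generator VU pool sizes (a real run shards this across many generators):

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    steady_state: {
      executor: 'ramping-arrival-rate', // k6's name for this pattern
      startRate: 0,
      timeUnit: '1s',
      preAllocatedVUs: 10000, // illustrative; size per generator
      maxVUs: 100000,
      stages: [
        { duration: '30m', target: 50000 }, // ramp up
        { duration: '4h',  target: 50000 }, // steady
        { duration: '15m', target: 0 },     // ramp down
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(99)<50', 'p(50)<10'], // latency success criteria
    http_req_failed: ['rate<0.001'],             // error rate < 0.1%
  },
};

export default function () {
  // Assumed query endpoint and payload shape.
  http.post(
    `${__ENV.BASE_URL}/v1/search`,
    JSON.stringify({ vector: Array.from({ length: 768 }, Math.random), k: 10 }),
    { headers: { 'Content-Type': 'application/json' } },
  );
}
```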
#### 1.2 Daily Peak (750M Concurrent)
**Objective**: Handle 1.5x baseline during peak hours
**Configuration**:
- Total connections: 750M globally
- Peak hours: 18:00-22:00 local time per region
- Query rate: 75K QPS
- Test duration: 5 hours
- Multiple peaks (simulate time zones)
**Success Criteria**:
- P99 latency < 75ms
- P50 latency < 15ms
- Error rate < 0.5%
- Auto-scaling triggers within 60s
- Cost < $5K for test
---
### 2. Burst Scenarios
#### 2.1 World Cup Final (50x Burst)
**Objective**: Handle massive spike during major sporting event
**Event Profile**:
- **Pre-event**: 30 minutes before kickoff
- **Peak**: During match (90 minutes + 30 min halftime)
- **Post-event**: 60 minutes after final whistle
- **Geography**: Concentrated in specific regions (France, Argentina)
**Configuration**:
- Baseline: 500M concurrent
- Peak: 25B concurrent (50x)
- Primary regions: europe-west3 (France), southamerica-east1 (Argentina)
- Secondary spillover: All Europe/Americas regions
- Query rate: 2.5M QPS at peak
- Test duration: 4 hours (per the load pattern below)
**Load Pattern**:
```javascript
{
  stages: [
    // Pre-event buzz (30 min before kickoff)
    { duration: "30m", target: 500000 },   // 10x baseline
    { duration: "15m", target: 2500000 },  // 50x PEAK
    // First half (45 min)
    { duration: "45m", target: 2500000 },  // Sustained peak
    // Halftime (15 min - slight drop)
    { duration: "15m", target: 1500000 },  // 30x
    // Second half (45 min)
    { duration: "45m", target: 2500000 },  // Back to peak
    // Extra time / penalties (30 min)
    { duration: "30m", target: 3000000 },  // 60x SUPER PEAK
    // Post-game analysis (30 min)
    { duration: "30m", target: 1000000 },  // 20x
    // Gradual decline (30 min)
    { duration: "30m", target: 100000 }    // 2x
  ]
}
```
**Regional Distribution**:
- **France**: 40% (10B peak)
- **Argentina**: 35% (8.75B peak)
- **Spain/Italy/Portugal**: 10% (2.5B peak)
- **Rest of Europe**: 8% (2B peak)
- **Americas**: 5% (1.25B peak)
- **Asia/Pacific**: 2% (500M peak)
**Success Criteria**:
- System survives without crash
- P99 latency < 200ms (degraded acceptable)
- P50 latency < 50ms
- Error rate < 5% (acceptable during super peak)
- Auto-scaling completes within 10 minutes
- No cascading failures
- Graceful degradation activated when needed
- Cost < $100K for full test
**Pre-warming**:
- Enable predictive scaling 15 minutes before test
- Pre-allocate 25x capacity in primary regions
- Warm up CDN caches
- Increase database connection pools
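One way to warm caches before the burst window is a k6 `setup()` stage that replays the most popular queries once, so CDN and Redis layers are hot at kickoff. A sketch, assuming a `hot-queries.json` fixture and the same `/v1/search` endpoint:

```javascript
import http from 'k6/http';

// Assumed fixture: an array of popular query payloads captured from production.
const HOT_QUERIES = JSON.parse(open('./hot-queries.json'));

export function setup() {
  // Runs once before the load stages; each request primes CDN/Redis entries.
  for (const q of HOT_QUERIES) {
    http.post(`${__ENV.BASE_URL}/v1/search`, JSON.stringify(q), {
      headers: { 'Content-Type': 'application/json' },
    });
  }
}
```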
#### 2.2 Product Launch (10x Burst)
**Objective**: Handle viral traffic spike (e.g., AI model release)
**Configuration**:
- Baseline: 500M concurrent
- Peak: 5B concurrent (10x)
- Distribution: Global, concentrated in US
- Query rate: 500K QPS
- Test duration: 2 hours
- Pattern: Sudden spike, gradual decline
**Load Pattern**:
```javascript
{
  stages: [
    { duration: "5m",  target: 500000 },  // 10x sudden spike
    { duration: "30m", target: 500000 },  // Sustained
    { duration: "45m", target: 300000 },  // Gradual decline
    { duration: "40m", target: 100000 }   // Return to normal
  ]
}
```
**Success Criteria**:
- Reactive scaling responds within 60s
- P99 latency < 100ms
- Error rate < 2%
- No downtime
#### 2.3 Flash Crowd (25x Burst)
**Objective**: Handle an unpredictable viral event
**Configuration**:
- Baseline: 500M concurrent
- Peak: 12.5B concurrent (25x)
- Geography: Unpredictable (use US for test)
- Query rate: 1.25M QPS
- Test duration: 90 minutes
- Pattern: Very rapid spike (< 2 minutes)
**Load Pattern**:
```javascript
{
  stages: [
    { duration: "2m",  target: 1250000 },  // 25x in 2 minutes!
    { duration: "30m", target: 1250000 },  // Hold peak
    { duration: "30m", target: 750000 },   // Decline
    { duration: "28m", target: 100000 }    // Return
  ]
}
```
**Success Criteria**:
- System survives without manual intervention
- Reactive scaling activates immediately
- P99 latency < 150ms
- Error rate < 3%
- Cost cap respected
---
### 3. Failover Scenarios
#### 3.1 Single Region Failure
**Objective**: Validate regional failover
**Configuration**:
- Baseline: 500M concurrent
- Failed region: europe-west1 (80M connections)
- Failover targets: europe-west3, europe-north1
- Query rate: 50K QPS
- Test duration: 1 hour
- Failure trigger: 30 minutes into test
**Procedure**:
1. Run baseline load for 30 minutes
2. Simulate region failure (kill all instances in europe-west1)
3. Observe failover behavior
4. Measure recovery time
5. Validate data consistency
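To put numbers on steps 3 and 4, a small monitor can poll each region's health endpoint once per second and log the timeline. A Node.js (18+) sketch; hostnames and the `/health` route are assumptions about the deployment:

```javascript
// Sketch: measure failover timing by polling per-region health endpoints.
const REGIONS = {
  'europe-west1':  'https://europe-west1.ruvector.example.com/health',
  'europe-west3':  'https://europe-west3.ruvector.example.com/health',
  'europe-north1': 'https://europe-north1.ruvector.example.com/health',
};

async function pollOnce() {
  const status = {};
  for (const [region, url] of Object.entries(REGIONS)) {
    try {
      const res = await fetch(url, { signal: AbortSignal.timeout(2000) });
      status[region] = res.ok;
    } catch {
      status[region] = false;
    }
  }
  return status;
}

// One line per second; the gap between europe-west1 going false and the
// failover targets staying healthy under its traffic is the observed
// failover time.
setInterval(async () => {
  console.log(new Date().toISOString(), await pollOnce());
}, 1000);
```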
**Success Criteria**:
- Failover completes within 60 seconds
- Connection loss < 5%
- No data loss
- P99 latency spike < 200ms during failover
- Automatic recovery when region restored
#### 3.2 Multi-Region Cascade Failure
**Objective**: Test disaster recovery
**Configuration**:
- Baseline: 500M concurrent
- Failed regions: europe-west1, europe-west3 (160M connections)
- Failover: Global redistribution
- Test duration: 2 hours
- Progressive failures (15 min apart)
**Procedure**:
1. Run baseline load
2. Kill europe-west1 at T+30m
3. Kill europe-west3 at T+45m
4. Observe cascade prevention
5. Validate global recovery
**Success Criteria**:
- No cascading failures
- Circuit breakers activate
- Graceful degradation if needed
- Connection loss < 10%
- System remains stable
#### 3.3 Database Failover
**Objective**: Test database resilience
**Configuration**:
- Baseline: 500M concurrent
- Database: Trigger Cloud SQL failover to replica
- Query rate: 50K QPS (read-heavy)
- Test duration: 1 hour
- Failure trigger: 20 minutes into test
**Success Criteria**:
- Failover completes within 30 seconds
- Connection pool recovers automatically
- Read queries continue with < 5% errors
- Write queries resume after failover
- No permanent data loss
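On the client side, "recovers automatically" usually means idle-connection errors are tolerated and queries are retried with backoff while Cloud SQL promotes the replica. A sketch with node-postgres (`pg`); pool sizing and retry counts are illustrative:

```javascript
const { Pool } = require('pg');

// Pool sized for one service instance; limits are illustrative.
const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 100 });

// Idle clients can be killed mid-failover; log and let the pool replace them.
pool.on('error', (err) => console.error('idle client error:', err.message));

// Retry with exponential backoff so transient failover errors are retried
// instead of surfacing immediately as user-visible failures.
async function queryWithRetry(text, params, attempts = 5) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await pool.query(text, params);
    } catch (err) {
      if (i === attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 500 * 2 ** i));
    }
  }
}

module.exports = { queryWithRetry };
```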
---
### 4. Workload Scenarios
#### 4.1 Read-Heavy (90% Reads)
**Objective**: Validate cache effectiveness
**Configuration**:
- Total connections: 500M
- Query mix: 90% similarity search, 10% updates
- Cache hit rate target: > 75%
- Query rate: 50K QPS
- Test duration: 2 hours
**Success Criteria**:
- P99 latency < 30ms (due to caching)
- Cache hit rate > 75%
- Database CPU < 50%
#### 4.2 Write-Heavy (40% Writes)
**Objective**: Test write throughput
**Configuration**:
- Total connections: 500M
- Query mix: 60% reads, 40% vector updates
- Query rate: 50K QPS
- Test duration: 2 hours
- Vector dimensions: 768
**Success Criteria**:
- P99 latency < 100ms
- Database CPU < 80%
- Replication lag < 5 seconds
- No write conflicts
#### 4.3 Mixed Workload (Realistic)
**Objective**: Simulate production traffic
**Configuration**:
- Total connections: 500M
- Query mix (see the script sketch below):
  - 70% similarity search
  - 15% filtered search
  - 10% vector inserts
  - 5% deletes
- Query rate: 50K QPS
- Test duration: 4 hours
- Varying vector dimensions (384, 768, 1536)
**Success Criteria**:
- P99 latency < 50ms
- All operations succeed
- Resource utilization balanced
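The query mix above can be expressed directly in the k6 script body. A sketch; endpoint paths, payloads, and the filter field are assumptions about the RuVector API:

```javascript
import http from 'k6/http';

const DIMS = [384, 768, 1536];
const headers = { 'Content-Type': 'application/json' };
const randVec = (d) => Array.from({ length: d }, Math.random);

export default function () {
  const dim = DIMS[Math.floor(Math.random() * DIMS.length)];
  const r = Math.random();
  if (r < 0.70) {
    // 70% similarity search
    http.post(`${__ENV.BASE_URL}/v1/search`,
      JSON.stringify({ vector: randVec(dim), k: 10 }), { headers });
  } else if (r < 0.85) {
    // 15% filtered search
    http.post(`${__ENV.BASE_URL}/v1/search`,
      JSON.stringify({ vector: randVec(dim), k: 10, filter: { tenant: 'demo' } }), { headers });
  } else if (r < 0.95) {
    // 10% vector inserts (IDs namespaced by VU so deletes below can find them)
    http.post(`${__ENV.BASE_URL}/v1/vectors`,
      JSON.stringify({ id: `vu${__VU}-${__ITER}`, vector: randVec(dim) }), { headers });
  } else {
    // 5% deletes of a previously inserted ID (best-effort in a sketch)
    http.del(`${__ENV.BASE_URL}/v1/vectors/vu${__VU}-${Math.floor(Math.random() * (__ITER + 1))}`);
  }
}
```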
---
### 5. Stress Scenarios
#### 5.1 Gradual Load Increase
**Objective**: Find breaking point
**Configuration**:
- Start: 100M concurrent
- End: Until system breaks
- Increment: +100M every 30 minutes
- Query rate: Proportional to connections
- Test duration: Until failure
**Success Criteria**:
- Identify maximum capacity
- Measure degradation curve
- Observe failure modes
#### 5.2 Long-Duration Soak Test
**Objective**: Detect memory leaks and resource exhaustion
**Configuration**:
- Total connections: 500M
- Query rate: 50K QPS
- Test duration: 24 hours
- Pattern: Steady state
**Success Criteria**:
- No memory leaks
- No connection leaks
- Stable performance over time
- Resource cleanup works
---
## Test Execution Strategy
### Sequential Execution (Standard Suite)
Total time: ~20 hours
1. Baseline Steady State (4h)
2. Daily Peak (5h)
3. Product Launch 10x (2h)
4. Single Region Failover (1h)
5. Read-Heavy Workload (2h)
6. Write-Heavy Workload (2h)
7. Mixed Workload (4h)
### Burst Suite (Special Events)
Total time: ~8.5 hours
1. World Cup 50x (4h)
2. Flash Crowd 25x (1.5h)
3. Multi-Region Cascade (2h)
4. Database Failover (1h)
### Quick Validation (Smoke Test)
Total time: ~2 hours
1. Baseline Steady State - 30 minutes
2. Product Launch 10x - 30 minutes
3. Single Region Failover - 30 minutes
4. Mixed Workload - 30 minutes
---
## Monitoring During Tests
### Real-Time Metrics
- Connection count per region
- Query latency percentiles (p50, p95, p99)
- Error rates by type
- CPU/Memory utilization
- Network throughput
- Database connections
- Cache hit rates
### Alerts
- P99 latency > 50ms (warning)
- P99 latency > 100ms (critical)
- Error rate > 1% (warning)
- Error rate > 5% (critical)
- Region unhealthy
- Database connections > 90%
- Cost > $10K/hour
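These thresholds can also be evaluated continuously by the test harness itself, independent of Cloud Monitoring. A sketch (the region-health alert is omitted; the snapshot endpoint and field names are assumptions):

```javascript
// Sketch: evaluate the alert thresholds above from the harness itself.
const RULES = [
  { metric: 'p99LatencyMs',     limit: 50,    level: 'warning'  },
  { metric: 'p99LatencyMs',     limit: 100,   level: 'critical' },
  { metric: 'errorRatePct',     limit: 1,     level: 'warning'  },
  { metric: 'errorRatePct',     limit: 5,     level: 'critical' },
  { metric: 'dbConnectionsPct', limit: 90,    level: 'critical' },
  { metric: 'costPerHourUsd',   limit: 10000, level: 'critical' },
];

async function checkAlerts() {
  const res = await fetch(`${process.env.METRICS_URL}/snapshot`);
  const snapshot = await res.json();
  for (const rule of RULES) {
    const value = snapshot[rule.metric];
    if (value > rule.limit) {
      console.log(`[${rule.level.toUpperCase()}] ${rule.metric}=${value} > ${rule.limit}`);
    }
  }
}

setInterval(checkAlerts, 10000); // evaluate every 10s during a test
```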
### Dashboards
1. Executive: High-level metrics, SLA status
2. Operations: Regional health, resource utilization
3. Cost: Hourly spend, projections
4. Performance: Latency distributions, throughput
---
## Cost Estimates
### Per-Test Costs
| Scenario | Duration | Peak Load | Estimated Cost |
|----------|----------|-----------|----------------|
| Baseline Steady | 4h | 500M | $180 |
| Daily Peak | 5h | 750M | $350 |
| World Cup 50x | 4h | 25B | $80,000 |
| Product Launch 10x | 2h | 5B | $3,600 |
| Flash Crowd 25x | 1.5h | 12.5B | $28,000 |
| Single Region Failover | 1h | 500M | $45 |
| Workload Tests | 2h | 500M | $90 |
### Full Suite Costs
- **Standard Suite**: ~$4.5K (dominated by the Product Launch 10x burst)
- **Burst Suite**: ~$108K
- **Quick Validation**: ~$150
**Cost Optimization**:
- Use committed use discounts (30% off)
- Run tests in low-cost regions when possible
- Use preemptible instances for load generators
- Leverage CDN caching
- Clean up resources immediately after tests
---
## Pre-Test Checklist
### Infrastructure
- [ ] All regions deployed and healthy
- [ ] Load balancer configured
- [ ] CDN enabled
- [ ] Database replicas ready
- [ ] Redis caches warmed
- [ ] Monitoring dashboards set up
- [ ] Alerting policies active
- [ ] Budget alerts configured
### Load Generation
- [ ] K6 scripts validated
- [ ] Load generators deployed in all regions
- [ ] Test data prepared
- [ ] Baseline traffic running
- [ ] Credentials configured
- [ ] Results storage ready
### Team
- [ ] On-call engineer available
- [ ] Communication channels open (Slack)
- [ ] Runbook reviewed
- [ ] Rollback plan ready
- [ ] Stakeholders notified
---
## Post-Test Analysis
### Deliverables
1. Test execution log
2. Metrics summary (latency, throughput, errors)
3. SLA compliance report
4. Cost breakdown
5. Bottleneck analysis
6. Recommendations document
7. Performance comparison (vs. previous tests)
### Key Questions
- Did we meet SLA targets?
- Where did bottlenecks occur?
- How well did auto-scaling perform?
- Were there any unexpected failures?
- What was the actual cost vs. estimate?
- What improvements should we make?
---
## Example: Running World Cup Test
```bash
# 1. Pre-warm infrastructure
cd /home/user/ruvector/src/burst-scaling
npm run build
node dist/burst-predictor.js --event "World Cup Final" --time "2026-07-15T18:00:00Z"

# 2. Deploy load generators
cd /home/user/ruvector/benchmarks
npm run deploy:generators

# 3. Run scenario
npm run scenario:worldcup -- \
  --regions "europe-west3,southamerica-east1" \
  --peak-multiplier 50 \
  --duration "4h" \
  --enable-notifications

# 4. Monitor (separate terminal)
npm run dashboard

# 5. Collect results
npm run analyze -- --test-id "worldcup-2026-final-test"

# 6. Generate report
npm run report -- --test-id "worldcup-2026-final-test" --format pdf
```
---
## Troubleshooting
### High Error Rates
- Check: Database connection pool exhaustion
- Check: Network bandwidth limits
- Check: Rate limiting too aggressive
- Action: Scale up resources or enable degradation
### High Latency
- Check: Cold cache (low hit rate)
- Check: Database query performance
- Check: Network latency between regions
- Action: Warm caches, optimize queries, adjust routing
### Failed Auto-Scaling
- Check: GCP quotas and limits
- Check: Budget caps
- Check: IAM permissions
- Action: Request quota increase, adjust caps
### Cost Overruns
- Check: Instances not scaling down
- Check: Database overprovisioned
- Check: Excessive logging
- Action: Force scale-in, reduce logging verbosity
---
## Next Steps
1. **Run Quick Validation**: Ensure system is ready
2. **Run Standard Suite**: Comprehensive testing
3. **Schedule Burst Tests**: Coordinate with team (expensive!)
4. **Iterate Based on Results**: Tune thresholds and configurations
5. **Document Learnings**: Update runbooks and architecture docs
---
## References
- [Architecture Overview](/home/user/ruvector/docs/cloud-architecture/architecture-overview.md)
- [Scaling Strategy](/home/user/ruvector/docs/cloud-architecture/scaling-strategy.md)
- [Burst Scaling](/home/user/ruvector/src/burst-scaling/README.md)
- [Benchmarking Guide](/home/user/ruvector/benchmarks/README.md)
- [Operations Runbook](/home/user/ruvector/src/burst-scaling/RUNBOOK.md)
---
**Document Version**: 1.0
**Last Updated**: 2025-11-20
**Author**: RuVector Performance Team


@@ -0,0 +1,235 @@
# RuVector Benchmarks - Quick Start Guide
Get up and running with RuVector benchmarks in 5 minutes!
## Prerequisites
- Node.js 18+ and npm
- k6 load testing tool
- Access to RuVector cluster
## Installation
### Step 1: Install k6
**macOS:**
```bash
brew install k6
```
**Linux (Debian/Ubuntu):**
```bash
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
  --keyserver hkp://keyserver.ubuntu.com:80 \
  --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | \
  sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
```
**Windows:**
```powershell
choco install k6
```
### Step 2: Run Setup Script
```bash
cd /home/user/ruvector/benchmarks
./setup.sh
```
This will:
- Check dependencies
- Install TypeScript/ts-node
- Create results directory
- Configure environment
### Step 3: Configure Environment
Edit `.env` file with your cluster URL:
```bash
BASE_URL=https://your-ruvector-cluster.example.com
PARALLEL=1
ENABLE_HOOKS=true
```
## Running Your First Test
### Quick Validation (45 minutes)
```bash
npm run test:quick
```
This runs the `baseline_100m` scenario:
- 100M concurrent connections
- 30 minutes steady-state
- Validates basic functionality
### View Results
```bash
# Start visualization dashboard
npm run dashboard
# Open in browser
open http://localhost:8000/visualization-dashboard.html
```
## Common Scenarios
### Baseline Test (500M connections)
```bash
npm run test:baseline
```
Duration: 3h 15m
### Burst Test (10x spike)
```bash
npm run test:burst
```
Duration: 20m
### Standard Test Suite
```bash
npm run test:standard
```
Duration: ~6 hours
## Understanding Results
After a test completes, check:
```bash
results/
  run-{timestamp}/
    {scenario}-metrics.json    # Raw metrics
    {scenario}-analysis.json   # Analysis report
    {scenario}-report.md       # Human-readable report
    SUMMARY.md                 # Overall summary
```
### Key Metrics
- **P99 Latency**: Should be < 50ms (baseline)
- **Throughput**: Queries per second
- **Error Rate**: Should be < 0.01%
- **Availability**: Should be > 99.99%
### Performance Score
Each test gets a score 0-100:
- 90+: Excellent
- 80-89: Good
- 70-79: Fair
- <70: Needs improvement
## Troubleshooting
### Connection Failed
```bash
# Test cluster connectivity
curl -v https://your-cluster.example.com/health
```
### k6 Errors
```bash
# Verify k6 installation
k6 version
# Reinstall if needed
brew reinstall k6 # macOS
```
### High Memory Usage
```bash
# Increase Node.js memory
export NODE_OPTIONS="--max-old-space-size=8192"
```
## Docker Usage
### Build Image
```bash
docker build -t ruvector-benchmark .
```
### Run Test
```bash
docker run \
  -e BASE_URL="https://your-cluster.example.com" \
  -v "$(pwd)/results:/benchmarks/results" \
  ruvector-benchmark run baseline_100m
```
## Next Steps
1. **Review README.md** for comprehensive documentation
2. **Explore scenarios** in `benchmark-scenarios.ts`
3. **Customize tests** for your workload
4. **Set up CI/CD** for continuous benchmarking
## Quick Command Reference
```bash
# List all scenarios
npm run list
# Run specific scenario
ts-node benchmark-runner.ts run <scenario-name>
# Run scenario group
ts-node benchmark-runner.ts group <group-name>
# View dashboard
npm run dashboard
# Clean results
npm run clean
```
## Available Scenarios
### Baseline Tests
- `baseline_100m` - Quick validation (45m)
- `baseline_500m` - Full baseline (3h 15m)
### Burst Tests
- `burst_10x` - 10x spike (20m)
- `burst_25x` - 25x spike (35m)
- `burst_50x` - 50x spike (50m)
### Workload Tests
- `read_heavy` - 95% reads (1h 50m)
- `write_heavy` - 70% writes (1h 50m)
- `balanced_workload` - 50/50 split (1h 50m)
### Failover Tests
- `regional_failover` - Single region failure (45m)
- `multi_region_failover` - Multiple region failure (55m)
### Real-World Tests
- `world_cup` - Sporting event simulation (3h)
- `black_friday` - E-commerce peak (14h)
### Scenario Groups
- `quick_validation` - Fast validation suite
- `standard_suite` - Standard test suite
- `stress_suite` - Stress testing
- `reliability_suite` - Failover tests
- `full_suite` - All scenarios
## Support
- **Documentation**: See README.md
- **Issues**: https://github.com/ruvnet/ruvector/issues
- **Slack**: https://ruvector.slack.com
---
**Ready to benchmark!** 🚀
Start with: `npm run test:quick`


@@ -0,0 +1,665 @@
# RuVector Benchmarking Suite
Comprehensive benchmarking tool for testing the globally distributed RuVector vector search system at scale (500M+ concurrent connections).
## Table of Contents
- [Overview](#overview)
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Benchmark Scenarios](#benchmark-scenarios)
- [Running Benchmarks](#running-benchmarks)
- [Understanding Results](#understanding-results)
- [Best Practices](#best-practices)
- [Cost Estimation](#cost-estimation)
- [Troubleshooting](#troubleshooting)
- [Advanced Usage](#advanced-usage)
## Overview
This benchmarking suite provides enterprise-grade load testing capabilities for RuVector, supporting:
- **Massive Scale**: Test up to 25B concurrent connections
- **Multi-Region**: Distributed load generation across 11 GCP regions
- **Comprehensive Metrics**: Latency, throughput, errors, resource utilization, costs
- **SLA Validation**: Automated checking against 99.99% availability, <50ms p99 latency targets
- **Advanced Analysis**: Statistical analysis, bottleneck identification, recommendations
## Features
### Load Generation
- Multi-protocol support (HTTP, HTTP/2, WebSocket, gRPC)
- Realistic query patterns (uniform, hotspot, Zipfian, burst; see the sampler sketch below)
- Configurable ramp-up/down rates
- Connection lifecycle management
- Geographic distribution
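As an example of the non-uniform patterns, a Zipfian sampler concentrates queries on a small set of hot vector IDs, which is what stresses caches realistically. A self-contained sketch:

```javascript
// Zipfian sampler: rank 0 is the hottest ID; s controls skew (s ≈ 1 is
// classic Zipf). Builds the cumulative weights once, then inverts the CDF
// with a binary search per draw.
function makeZipfSampler(n, s = 1.0) {
  const cdf = [];
  let total = 0;
  for (let rank = 1; rank <= n; rank++) {
    total += 1 / Math.pow(rank, s);
    cdf.push(total);
  }
  return function sample() {
    const u = Math.random() * total;
    let lo = 0, hi = n - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (cdf[mid] < u) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  };
}

const nextHotId = makeZipfSampler(1000000);
console.log(nextHotId()); // usually small ranks, occasionally the long tail
```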
### Metrics Collection
- Latency distribution (p50, p90, p95, p99, p99.9)
- Throughput tracking (QPS, bandwidth)
- Error analysis by type and region
- Resource utilization (CPU, memory, network)
- Cost per million queries
- Regional performance comparison
### Analysis & Reporting
- Statistical analysis with anomaly detection
- SLA compliance checking
- Bottleneck identification
- Performance score calculation
- Actionable recommendations
- Interactive visualization dashboard
- Markdown and JSON reports
- CSV export for further analysis
## Prerequisites
### Required
- **Node.js**: v18+ (for TypeScript execution)
- **k6**: Latest version ([installation guide](https://k6.io/docs/getting-started/installation/))
- **Access**: RuVector cluster endpoint
### Optional
- **Claude Flow**: For hooks integration
  ```bash
  npm install -g claude-flow@alpha
  ```
- **Docker**: For containerized execution
- **GCP Account**: For multi-region load generation
## Installation
1. **Clone Repository**
   ```bash
   cd /home/user/ruvector/benchmarks
   ```
2. **Install Dependencies** (k6 itself is installed separately; see [Prerequisites](#prerequisites))
   ```bash
   npm install -g typescript ts-node
   npm install @types/k6
   ```
3. **Verify Installation**
   ```bash
   k6 version
   ts-node --version
   ```
4. **Configure Environment**
   ```bash
   export BASE_URL="https://your-ruvector-cluster.example.com"
   export PARALLEL=2  # Number of parallel scenarios
   ```
## Quick Start
### Run a Single Scenario
```bash
# Quick validation (100M connections, 45 minutes)
ts-node benchmark-runner.ts run baseline_100m
# Full baseline test (500M connections, 3+ hours)
ts-node benchmark-runner.ts run baseline_500m
# Burst test (10x spike to 5B connections)
ts-node benchmark-runner.ts run burst_10x
```
### Run Scenario Groups
```bash
# Quick validation suite (~1 hour)
ts-node benchmark-runner.ts group quick_validation
# Standard test suite (~6 hours)
ts-node benchmark-runner.ts group standard_suite
# Full stress testing suite (~10 hours)
ts-node benchmark-runner.ts group stress_suite
# All scenarios (~48 hours)
ts-node benchmark-runner.ts group full_suite
```
### List Available Tests
```bash
ts-node benchmark-runner.ts list
```
## Benchmark Scenarios
### Baseline Tests
#### baseline_500m
- **Description**: Steady-state operation with 500M concurrent connections
- **Duration**: 3h 15m
- **Target**: P99 < 50ms, 99.99% availability
- **Use Case**: Production capacity validation
#### baseline_100m
- **Description**: Smaller baseline for quick validation
- **Duration**: 45m
- **Target**: P99 < 50ms, 99.99% availability
- **Use Case**: CI/CD integration, quick regression tests
### Burst Tests
#### burst_10x
- **Description**: Sudden spike to 5B concurrent (10x baseline)
- **Duration**: 20m
- **Target**: P99 < 100ms, 99.9% availability
- **Use Case**: Flash sale, viral event simulation
#### burst_25x
- **Description**: Extreme spike to 12.5B concurrent (25x baseline)
- **Duration**: 35m
- **Target**: P99 < 150ms, 99.5% availability
- **Use Case**: Major global event (Olympics, elections)
#### burst_50x
- **Description**: Maximum spike to 25B concurrent (50x baseline)
- **Duration**: 50m
- **Target**: P99 < 200ms, 99% availability
- **Use Case**: Stress testing absolute limits
### Failover Tests
#### regional_failover
- **Description**: Test recovery when one region fails
- **Duration**: 45m
- **Target**: <10% throughput degradation, <1% errors
- **Use Case**: Disaster recovery validation
#### multi_region_failover
- **Description**: Test recovery when multiple regions fail
- **Duration**: 55m
- **Target**: <20% throughput degradation, <2% errors
- **Use Case**: Multi-region outage preparation
### Workload Tests
#### read_heavy
- **Description**: 95% reads, 5% writes (typical production workload)
- **Duration**: 1h 50m
- **Target**: P99 < 50ms, 99.99% availability
- **Use Case**: Production simulation
#### write_heavy
- **Description**: 70% writes, 30% reads (batch indexing scenario)
- **Duration**: 1h 50m
- **Target**: P99 < 80ms, 99.95% availability
- **Use Case**: Bulk data ingestion
#### balanced_workload
- **Description**: 50% reads, 50% writes
- **Duration**: 1h 50m
- **Target**: P99 < 60ms, 99.98% availability
- **Use Case**: Mixed workload validation
### Real-World Scenarios
#### world_cup
- **Description**: Predictable spike with geographic concentration (Europe)
- **Duration**: 3h
- **Target**: P99 < 100ms during matches
- **Use Case**: Major sporting event
#### black_friday
- **Description**: Sustained high load with periodic spikes
- **Duration**: 14h
- **Target**: P99 < 80ms, 99.95% availability
- **Use Case**: E-commerce peak period
## Running Benchmarks
### Basic Usage
```bash
# Set environment variables
export BASE_URL="https://ruvector.example.com"
export REGION="us-east1"
# Run single test
ts-node benchmark-runner.ts run baseline_500m
# Run with custom config
BASE_URL="https://staging.example.com" \
PARALLEL=3 \
ts-node benchmark-runner.ts group standard_suite
```
### With Claude Flow Hooks
```bash
# Enable hooks (default)
export ENABLE_HOOKS=true
# Disable hooks
export ENABLE_HOOKS=false
ts-node benchmark-runner.ts run baseline_500m
```
Hooks will automatically:
- Execute `npx claude-flow@alpha hooks pre-task` before each test
- Store results in swarm memory
- Execute `npx claude-flow@alpha hooks post-task` after completion
### Multi-Region Execution
To distribute load across regions:
```bash
# Deploy load generators to GCP regions
for region in us-east1 us-west1 europe-west1 asia-east1; do
  gcloud compute instances create "k6-${region}" \
    --zone="${region}-a" \
    --machine-type="n2-standard-32" \
    --image-family="ubuntu-2004-lts" \
    --image-project="ubuntu-os-cloud" \
    --metadata-from-file=startup-script=setup-k6.sh
done
# Run distributed test
ts-node benchmark-runner.ts run baseline_500m
```
### Docker Execution
```bash
# Build container
docker build -t ruvector-benchmark .
# Run test
docker run \
  -e BASE_URL="https://ruvector.example.com" \
  -v "$(pwd)/results:/results" \
  ruvector-benchmark run baseline_500m
```
## Understanding Results
### Output Structure
```
results/
  run-{timestamp}/
    {scenario}-{timestamp}-raw.json       # Raw k6 metrics
    {scenario}-{timestamp}-metrics.json   # Processed metrics
    {scenario}-{timestamp}-metrics.csv    # CSV export
    {scenario}-{timestamp}-analysis.json  # Analysis report
    {scenario}-{timestamp}-report.md      # Markdown report
    SUMMARY.md                            # Multi-scenario summary
```
### Key Metrics
#### Latency
- **P50 (Median)**: 50% of requests faster than this
- **P90**: 90% of requests faster than this
- **P95**: 95% of requests faster than this
- **P99**: 99% of requests faster than this (SLA target)
- **P99.9**: 99.9% of requests faster than this
**Target**: P99 < 50ms for baseline, <100ms for burst
#### Throughput
- **QPS**: Queries per second
- **Peak QPS**: Maximum sustained throughput
- **Average QPS**: Mean throughput over test duration
**Target**: 50M QPS for 500M baseline connections
#### Error Rate
- **Total Errors**: Count of failed requests
- **Error Rate %**: Percentage of requests that failed
- **By Type**: Breakdown (timeout, connection, server, client)
- **By Region**: Geographic distribution
**Target**: < 0.01% error rate (99.99% success)
#### Availability
- **Uptime %**: Percentage of time system was available
- **Downtime**: Total milliseconds of unavailability
- **MTBF**: Mean time between failures
- **MTTR**: Mean time to recovery
**Target**: 99.99% availability (52 minutes/year downtime)
#### Resource Utilization
- **CPU %**: Average and peak CPU usage
- **Memory %**: Average and peak memory usage
- **Network**: Bandwidth, ingress/egress bytes
- **Per Region**: Resource usage by geographic location
**Alert Thresholds**: CPU > 80%, Memory > 85%
#### Cost
- **Total Cost**: Compute + network + storage
- **Cost Per Million**: Queries per million queries
- **Per Region**: Cost breakdown by location
**Target**: < $0.50 per million queries
### Performance Score
Overall score (0-100) calculated from:
- **Performance** (35%): Latency and throughput
- **Reliability** (35%): Availability and error rate
- **Scalability** (20%): Resource utilization efficiency
- **Efficiency** (10%): Cost effectiveness
**Grades**:
- 90-100: Excellent
- 80-89: Good
- 70-79: Fair
- 60-69: Needs Improvement
- <60: Poor
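A sketch of how these weights combine; how each component score is derived from the raw metrics is not shown here and is an assumption:

```javascript
// Combine component scores (each 0-100) using the weights above.
function overallScore({ performance, reliability, scalability, efficiency }) {
  return 0.35 * performance + 0.35 * reliability + 0.20 * scalability + 0.10 * efficiency;
}

function grade(score) {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Good';
  if (score >= 70) return 'Fair';
  if (score >= 60) return 'Needs Improvement';
  return 'Poor';
}

console.log(grade(overallScore({ performance: 92, reliability: 88, scalability: 75, efficiency: 80 })));
// -> "Good" (weighted score 86.0)
```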
### SLA Compliance
✅ **PASSED** if all criteria met:
- P99 latency < 50ms (baseline) or scenario target
- Availability >= 99.99%
- Error rate < 0.01%
❌ **FAILED** if any criterion violated
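Expressed as code, the compliance check is a direct translation of these criteria; the metric field names are assumptions about the processed-metrics format:

```javascript
// Sketch: SLA compliance check mirroring the criteria above.
function checkSla({ p99LatencyMs, availabilityPct, errorRatePct }, p99TargetMs = 50) {
  const violations = [];
  if (p99LatencyMs >= p99TargetMs) violations.push(`p99 ${p99LatencyMs}ms >= ${p99TargetMs}ms`);
  if (availabilityPct < 99.99) violations.push(`availability ${availabilityPct}% < 99.99%`);
  if (errorRatePct >= 0.01) violations.push(`error rate ${errorRatePct}% >= 0.01%`);
  return { met: violations.length === 0, violations };
}

console.log(checkSla({ p99LatencyMs: 42, availabilityPct: 99.995, errorRatePct: 0.004 }));
// -> { met: true, violations: [] }
```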
### Analysis Report
Each test generates an analysis report with:
1. **Statistical Analysis**
   - Summary statistics
   - Distribution histograms
   - Time series charts
   - Anomaly detection
2. **SLA Compliance**
   - Pass/fail status
   - Violation details
   - Duration and severity
3. **Bottlenecks**
   - Identified constraints
   - Current vs. threshold values
   - Impact assessment
   - Recommendations
4. **Recommendations**
   - Prioritized action items
   - Implementation guidance
   - Estimated impact and cost
### Visualization Dashboard
Open `visualization-dashboard.html` in a browser to view:
- Real-time metrics
- Interactive charts
- Geographic heat maps
- Historical comparisons
- Cost analysis
## Best Practices
### Before Running Tests
1. **Baseline Environment**
   - Ensure cluster is healthy
   - No active deployments or maintenance
   - Stable configuration
2. **Resource Allocation**
   - Sufficient load generator capacity
   - Network bandwidth provisioned
   - Monitoring systems ready
3. **Communication**
   - Notify team of upcoming test
   - Schedule during low-traffic periods
   - Have rollback plan ready
### During Tests
1. **Monitoring**
   - Watch real-time metrics
   - Check for anomalies
   - Monitor costs
2. **Safety**
   - Start with smaller tests (baseline_100m)
   - Gradually increase load
   - Be ready to abort if issues detected
3. **Documentation**
   - Note any unusual events
   - Document configuration changes
   - Record observations
### After Tests
1. **Analysis**
   - Review all metrics
   - Identify bottlenecks
   - Compare to previous runs
2. **Reporting**
   - Share results with team
   - Document findings
   - Create action items
3. **Follow-Up**
   - Implement recommendations
   - Re-test after changes
   - Track improvements over time
### Test Frequency
- **Quick Validation**: Daily (CI/CD)
- **Standard Suite**: Weekly
- **Stress Testing**: Monthly
- **Full Suite**: Quarterly
## Cost Estimation
### Load Generation Costs
Per hour of testing:
- **Compute**: ~$1,000/hour (distributed load generators)
- **Network**: ~$200/hour (egress traffic)
- **Storage**: ~$10/hour (results storage)
**Total**: ~$1,200/hour
### Scenario Cost Estimates
| Scenario | Duration | Estimated Cost |
|----------|----------|----------------|
| baseline_100m | 45m | $900 |
| baseline_500m | 3h 15m | $3,900 |
| burst_10x | 20m | $400 |
| burst_25x | 35m | $700 |
| burst_50x | 50m | $1,000 |
| read_heavy | 1h 50m | $2,200 |
| world_cup | 3h | $3,600 |
| black_friday | 14h | $16,800 |
| **Full Suite** | ~48h | **~$57,600** |
### Cost Optimization
1. **Use Spot Instances**: 60-80% savings on load generators
2. **Regional Selection**: Test in fewer regions
3. **Shorter Duration**: Reduce steady-state phase
4. **Parallel Execution**: Minimize total runtime
## Troubleshooting
### Common Issues
#### K6 Not Found
```bash
# Install k6
brew install k6 # macOS
sudo apt-get install k6 # Debian/Ubuntu (requires the k6 apt repo)
choco install k6 # Windows
```
#### Connection Refused
```bash
# Check cluster endpoint
curl -v https://your-ruvector-cluster.example.com/health
# Verify network connectivity
ping your-ruvector-cluster.example.com
```
#### Out of Memory
```bash
# Increase Node.js memory limit
export NODE_OPTIONS="--max-old-space-size=8192"
# Use smaller scenario
ts-node benchmark-runner.ts run baseline_100m
```
#### High Error Rate
- Check cluster health
- Verify capacity (not overloaded)
- Review network latency
- Check authentication/authorization
#### Slow Performance
- Insufficient load generator capacity
- Network bandwidth limitations
- Target cluster under-provisioned
- Configuration issues (connection limits, timeouts)
### Debug Mode
```bash
# Enable verbose logging
export DEBUG=true
export LOG_LEVEL=debug
ts-node benchmark-runner.ts run baseline_500m
```
### Support
For issues or questions:
- GitHub Issues: https://github.com/ruvnet/ruvector/issues
- Documentation: https://docs.ruvector.io
- Community: https://discord.gg/ruvector
## Advanced Usage
### Custom Scenarios
Create custom scenario in `benchmark-scenarios.ts`:
```typescript
export const SCENARIOS = {
  // ...existing scenarios...
  my_custom_test: {
    name: 'My Custom Test',
    description: 'Custom workload pattern',
    config: {
      targetConnections: 1000000000,
      rampUpDuration: '15m',
      steadyStateDuration: '1h',
      rampDownDuration: '10m',
      queriesPerConnection: 100,
      queryInterval: '1000',
      protocol: 'http',
      vectorDimension: 768,
      queryPattern: 'uniform',
    },
    k6Options: {
      // K6 configuration
    },
    expectedMetrics: {
      p99Latency: 50,
      errorRate: 0.01,
      throughput: 100000000,
      availability: 99.99,
    },
    duration: '1h25m',
    tags: ['custom'],
  },
};
```
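With the scenario registered, run it like any built-in one: `ts-node benchmark-runner.ts run my_custom_test`.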
### Integration with CI/CD
```yaml
# .github/workflows/benchmark.yml
name: Benchmark
on:
  schedule:
    - cron: '0 0 * * 0' # Weekly
  workflow_dispatch:
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - name: Install k6
        run: |
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6
      - name: Run benchmark
        env:
          BASE_URL: ${{ secrets.BASE_URL }}
        run: |
          cd benchmarks
          npm install -g typescript ts-node
          ts-node benchmark-runner.ts run baseline_100m
      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: benchmarks/results/
```
### Programmatic Usage
```typescript
import { BenchmarkRunner } from './benchmark-runner';

const runner = new BenchmarkRunner({
  baseUrl: 'https://ruvector.example.com',
  parallelScenarios: 2,
  enableHooks: true,
});

// Run single scenario
const run = await runner.runScenario('baseline_500m');
console.log(`Score: ${run.analysis?.score.overall}/100`);

// Run multiple scenarios
const results = await runner.runScenarios([
  'baseline_500m',
  'burst_10x',
  'read_heavy',
]);

// Check if all passed SLA
const allPassed = Array.from(results.values()).every(
  (r) => r.analysis?.slaCompliance.met,
);
```
---
**Happy Benchmarking!** 🚀
For questions or contributions, please visit: https://github.com/ruvnet/ruvector