Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions
--- a/benchmarks/docs/LOAD_TEST_SCENARIOS.md
+++ b/benchmarks/docs/LOAD_TEST_SCENARIOS.md
@@ -0,0 +1,582 @@
+# RuVector Load Testing Scenarios
+
+## Overview
+
+This document defines comprehensive load testing scenarios for the globally distributed RuVector system, targeting 500 million concurrent learning streams with burst capacity up to 25 billion.
+
+## Test Environment
+
+### Global Regions
+- **Americas**: us-central1, us-east1, us-west1, southamerica-east1
+- **Europe**: europe-west1, europe-west3, europe-north1
+- **Asia-Pacific**: asia-east1, asia-southeast1, asia-northeast1, australia-southeast1
+- **Total**: 11 regions
+
+### Infrastructure
+- **Cloud Run**: Auto-scaling instances (10-1000 per region)
+- **Load Balancer**: Global HTTPS LB with Cloud CDN
+- **Database**: Cloud SQL PostgreSQL (multi-region)
+- **Cache**: Memorystore Redis (128GB per region)
+- **Monitoring**: Cloud Monitoring + OpenTelemetry
+
+---
+
+## Scenario Categories
+
+### 1. Baseline Scenarios
+
+#### 1.1 Steady State (500M Concurrent)
+**Objective**: Validate system handles target baseline load
+
+**Configuration**:
+- Total connections: 500M globally
+- Distribution: Proportional to region capacity
+  - Tier-1 regions (5): 80M each = 400M
+  - Tier-2 regions (10): 10M each = 100M
+- Query rate: 50K QPS globally
+- Test duration: 4 hours
+- Ramp-up: 30 minutes
+
+**Success Criteria**:
+- P99 latency < 50ms
+- P50 latency < 10ms
+- Error rate < 0.1%
+- No memory leaks
+- CPU utilization 60-80%
+- All regions healthy
+
+**Load Pattern**:
+```javascript
+{
+  type: "ramped-arrival-rate",
+  stages: [
+    { duration: "30m", target: 50000 }, // Ramp up
+    { duration: "4h",  target: 50000 }, // Steady
+    { duration: "15m", target: 0 }      // Ramp down
+  ]
+}
+```
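+
+A minimal sketch of how the latency and error-rate criteria above might be checked against raw samples. The function names, the input shape, and the nearest-rank percentile method are assumptions for illustration; this document does not specify how percentiles are computed.
+
+```javascript
+// Nearest-rank percentile: the smallest sample such that at least p% of
+// all samples are less than or equal to it.
+function percentile(samples, p) {
+  if (samples.length === 0) throw new Error("no samples");
+  const sorted = [...samples].sort((a, b) => a - b);
+  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
+  return sorted[rank - 1];
+}
+
+// Thresholds copied from the Steady State success criteria.
+const STEADY_STATE_LIMITS = { p50Ms: 10, p99Ms: 50, errorRate: 0.001 };
+
+// latenciesMs, errors, and total are a hypothetical input shape.
+function checkSteadyState(latenciesMs, errors, total, limits = STEADY_STATE_LIMITS) {
+  const p50 = percentile(latenciesMs, 50);
+  const p99 = percentile(latenciesMs, 99);
+  const errorRate = total === 0 ? 0 : errors / total;
+  const pass =
+    p50 < limits.p50Ms && p99 < limits.p99Ms && errorRate < limits.errorRate;
+  return { p50, p99, errorRate, pass };
+}
+```
+
+Passing a different `limits` object would let the same check serve the other scenarios, whose thresholds are looser.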
+
+#### 1.2 Daily Peak (750M Concurrent)
+**Objective**: Handle 1.5x baseline during peak hours
+
+**Configuration**:
+- Total connections: 750M globally
+- Peak hours: 18:00-22:00 local time per region
+- Query rate: 75K QPS
+- Test duration: 5 hours
+- Multiple peaks (simulate time zones)
+
+**Success Criteria**:
+- P99 latency < 75ms
+- P50 latency < 15ms
+- Error rate < 0.5%
+- Auto-scaling triggers within 60s
+- Cost < $5K for test
+
+---
+
+### 2. Burst Scenarios
+
+#### 2.1 World Cup Final (50x Burst)
+**Objective**: Handle massive spike during major sporting event
+
+**Event Profile**:
+- **Pre-event**: 30 minutes before kickoff
+- **Peak**: During match (90 minutes of play + 15 min halftime, plus up to 30 min of extra time/penalties)
+- **Post-event**: 60 minutes after final whistle
+- **Geography**: Concentrated in specific regions (France, Argentina)
+
+**Configuration**:
+- Baseline: 500M concurrent
+- Peak: 25B concurrent (50x)
+- Primary regions: europe-west3 (France), southamerica-east1 (Argentina)
+- Secondary spillover: All Europe/Americas regions
+- Query rate: 2.5M QPS at peak
+- Test duration: 3 hours
+
+**Load Pattern**:
+```javascript
+{
+  stages: [
+    // Pre-event buzz (30 min before)
+    { duration: "30m", target: 500000 },   // 10x baseline
+    { duration: "15m", target: 2500000 },  // 50x PEAK
+    // First half (45 min)
+    { duration: "45m", target: 2500000 },  // Sustained peak
+    // Halftime (15 min - slight drop)
+    { duration: "15m", target: 1500000 },  // 30x
+    // Second half (45 min)
+    { duration: "45m", target: 2500000 },  // Back to peak
+    // Extra time / penalties (30 min)
+    { duration: "30m", target: 3000000 },  // 60x SUPER PEAK
+    // Post-game analysis (30 min)
+    { duration: "30m", target: 1000000 },  // 20x
+    // Gradual decline (30 min)
+    { duration: "30m", target: 100000 }    // 2x
+  ]
+}
+```
+
+**Regional Distribution**:
+- **France**: 40% (10B peak)
+- **Argentina**: 35% (8.75B peak)
+- **Spain/Italy/Portugal**: 10% (2.5B peak)
+- **Rest of Europe**: 8% (2B peak)
+- **Americas**: 5% (1.25B peak)
+- **Asia/Pacific**: 2% (500M peak)
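+
+The per-region peaks above follow directly from the percentage shares and the 25B peak. A small sketch of that arithmetic, with a guard that the shares sum to 100%; the constant and function names are illustrative and not part of any RuVector tooling.
+
+```javascript
+// Peak concurrency and regional shares copied from the list above.
+const PEAK_CONCURRENT = 25e9;
+const REGION_SHARE_PERCENT = {
+  "France": 40,
+  "Argentina": 35,
+  "Spain/Italy/Portugal": 10,
+  "Rest of Europe": 8,
+  "Americas": 5,
+  "Asia/Pacific": 2,
+};
+
+// Splits a total across regions by percentage share.
+function allocatePeak(total, sharePercent) {
+  const sum = Object.values(sharePercent).reduce((a, b) => a + b, 0);
+  if (sum !== 100) throw new Error(`shares sum to ${sum}%, expected 100%`);
+  const allocation = {};
+  for (const [region, percent] of Object.entries(sharePercent)) {
+    allocation[region] = (total * percent) / 100;
+  }
+  return allocation;
+}
+
+const peakByRegion = allocatePeak(PEAK_CONCURRENT, REGION_SHARE_PERCENT);
+```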
+
+**Success Criteria**:
+- System survives without crash
+- P99 latency < 200ms (degraded acceptable)
+- P50 latency < 50ms
+- Error rate < 5% (acceptable during super peak)
+- Auto-scaling completes within 10 minutes
+- No cascading failures
+- Graceful degradation activated when needed
+- Cost < $100K for full test
+
+**Pre-warming**:
+- Enable predictive scaling 15 minutes before test
+- Pre-allocate 25x capacity in primary regions
+- Warm up CDN caches
+- Increase database connection pools
+
+#### 2.2 Product Launch (10x Burst)
+**Objective**: Handle viral traffic spike (e.g., AI model release)
+
+**Configuration**:
+- Baseline: 500M concurrent
+- Peak: 5B concurrent (10x)
+- Distribution: Global, concentrated in US
+- Query rate: 500K QPS
+- Test duration: 2 hours
+- Pattern: Sudden spike, gradual decline
+
+**Load Pattern**:
+```javascript
+{
+  stages: [
+    { duration: "5m",  target: 500000 },  // 10x instant spike
+    { duration: "30m", target: 500000 },  // Sustained
+    { duration: "45m", target: 300000 },  // Gradual decline
+    { duration: "40m", target: 100000 }   // Decline to 2x baseline
+  ]
+}
+```
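+
+As a quick consistency check, the stage durations can be summed and compared with the stated test duration. This is a sketch only: `durationToMinutes` is a hypothetical helper that handles the `h`, `m`, and `s` suffixes used in this document and nothing else (it does not understand `ms`).
+
+```javascript
+// Converts a duration string such as "30m", "4h", or "1h30m" to minutes.
+function durationToMinutes(duration) {
+  const unitMinutes = { h: 60, m: 1, s: 1 / 60 };
+  let total = 0;
+  let matched = false;
+  for (const [, value, unit] of duration.matchAll(/(\d+)([hms])/g)) {
+    total += Number(value) * unitMinutes[unit];
+    matched = true;
+  }
+  if (!matched) throw new Error(`unrecognized duration: ${duration}`);
+  return total;
+}
+
+// Stages copied from the Product Launch load pattern above.
+const productLaunchStages = [
+  { duration: "5m", target: 500000 },
+  { duration: "30m", target: 500000 },
+  { duration: "45m", target: 300000 },
+  { duration: "40m", target: 100000 },
+];
+
+// 5 + 30 + 45 + 40 = 120 minutes, matching the 2-hour test duration.
+const totalMinutes = productLaunchStages.reduce(
+  (sum, stage) => sum + durationToMinutes(stage.duration),
+  0
+);
+```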
+
+**Success Criteria**:
+- Reactive scaling responds within 60s
+- P99 latency < 100ms
+- Error rate < 2%
+- No downtime
+
+#### 2.3 Flash Crowd (25x Burst)
+**Objective**: Unpredictable viral event
+
+**Configuration**:
+- Baseline: 500M concurrent
+- Peak: 12.5B concurrent (25x)
+- Geography: Unpredictable (use US for test)
+- Query rate: 1.25M QPS
+- Test duration: 90 minutes
+- Pattern: Very rapid spike (< 2 minutes)
+
+**Load Pattern**:
+```javascript
+{
+  stages: [
+    { duration: "2m",  target: 1250000 }, // 25x in 2 minutes!
+    { duration: "30m", target: 1250000 }, // Hold peak
+    { duration: "30m", target: 750000 },  // Decline
+    { duration: "28m", target: 100000 }   // Decline to 2x baseline
+  ]
+}
+```
+
+**Success Criteria**:
+- System survives without manual intervention
+- Reactive scaling activates immediately
+- P99 latency < 150ms
+- Error rate < 3%
+- Cost cap respected
+
+---
+
+### 3. Failover Scenarios
+
+#### 3.1 Single Region Failure
+**Objective**: Validate regional failover
+
+**Configuration**:
+- Baseline: 500M concurrent
+- Failed region: europe-west1 (80M connections)
+- Failover targets: europe-west3, europe-north1
+- Query rate: 50K QPS
+- Test duration: 1 hour
+- Failure trigger: 30 minutes into test
+
+**Procedure**:
+1. Run baseline load for 30 minutes
+2. Simulate region failure (kill all instances in europe-west1)
+3. Observe failover behavior
+4. Measure recovery time
+5. Validate data consistency
+
+**Success Criteria**:
+- Failover completes within 60 seconds
+- Connection loss < 5%
+- No data loss
+- P99 latency spike < 200ms during failover
+- Automatic recovery when region restored
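+
+One way the connection-loss and failover-time criteria might be derived from a time series of connection counts is sketched below. The sample shape (`tSec`, `connections`), the 99% recovery threshold, and the function name are all assumptions; the samples are expected to be sorted by time.
+
+```javascript
+// Summarizes a failover from connection-count samples sorted by time.
+// recoveredFraction is a hypothetical definition of "recovered".
+function summarizeFailover(samples, failureAtSec, recoveredFraction = 0.99) {
+  const before = samples.filter((s) => s.tSec < failureAtSec);
+  const after = samples.filter((s) => s.tSec >= failureAtSec);
+  if (before.length === 0 || after.length === 0) {
+    throw new Error("need samples on both sides of the failure");
+  }
+  // Baseline is the last reading taken before the failure was triggered.
+  const baseline = before[before.length - 1].connections;
+  const lowest = Math.min(...after.map((s) => s.connections));
+  const lossPercent = ((baseline - lowest) / baseline) * 100;
+
+  // Recovery is the first sample at or after the low point that is back
+  // within recoveredFraction of the baseline.
+  const lowestIndex = after.findIndex((s) => s.connections === lowest);
+  const recovered = after
+    .slice(lowestIndex)
+    .find((s) => s.connections >= baseline * recoveredFraction);
+  const failoverSeconds = recovered ? recovered.tSec - failureAtSec : null;
+
+  // Criteria from above: loss < 5%, failover within 60 seconds.
+  const pass =
+    lossPercent < 5 && failoverSeconds !== null && failoverSeconds <= 60;
+  return { lossPercent, failoverSeconds, pass };
+}
+```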
+
+#### 3.2 Multi-Region Cascade Failure
+**Objective**: Test disaster recovery
+
+**Configuration**:
+- Baseline: 500M concurrent
+- Failed regions: europe-west1, europe-west3 (160M connections)
+- Failover: Global redistribution
+- Test duration: 2 hours
+- Progressive failures (15 min apart)
+
+**Procedure**:
+1. Run baseline load
+2. Kill europe-west1 at T+30m
+3. Kill europe-west3 at T+45m
+4. Observe cascade prevention
+5. Validate global recovery
+
+**Success Criteria**:
+- No cascading failures
+- Circuit breakers activate
+- Graceful degradation if needed
+- Connection loss < 10%
+- System remains stable
+
+#### 3.3 Database Failover
+**Objective**: Test database resilience
+
+**Configuration**:
+- Baseline: 500M concurrent
+- Database: Trigger Cloud SQL failover to replica
+- Query rate: 50K QPS (read-heavy)
+- Test duration: 1 hour
+- Failure trigger: 20 minutes into test
+
+**Success Criteria**:
+- Failover completes within 30 seconds
+- Connection pool recovers automatically
+- Read queries continue with < 5% errors
+- Write queries resume after failover
+- No permanent data loss
+
+---
+
+### 4. Workload Scenarios
+
+#### 4.1 Read-Heavy (90% Reads)
+**Objective**: Validate cache effectiveness
+
+**Configuration**:
+- Total connections: 500M
+- Query mix: 90% similarity search, 10% updates
+- Cache hit rate target: > 75%
+- Query rate: 50K QPS
+- Test duration: 2 hours
+
+**Success Criteria**:
+- P99 latency < 30ms (due to caching)
+- Cache hit rate > 75%
+- Database CPU < 50%
+
+#### 4.2 Write-Heavy (40% Writes)
+**Objective**: Test write throughput
+
+**Configuration**:
+- Total connections: 500M
+- Query mix: 60% reads, 40% vector updates
+- Query rate: 50K QPS
+- Test duration: 2 hours
+- Vector dimensions: 768
+
+**Success Criteria**:
+- P99 latency < 100ms
+- Database CPU < 80%
+- Replication lag < 5 seconds
+- No write conflicts
+
+#### 4.3 Mixed Workload (Realistic)
+**Objective**: Simulate production traffic
+
+**Configuration**:
+- Total connections: 500M
+- Query mix:
+  - 70% similarity search
+  - 15% filtered search
+  - 10% vector inserts
+  - 5% deletes
+- Query rate: 50K QPS
+- Test duration: 4 hours
+- Varying vector dimensions (384, 768, 1536)
+
+**Success Criteria**:
+- P99 latency < 50ms
+- All operations succeed
+- Resource utilization balanced
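+
+A load generator needs some way to pick each request's operation according to the 70/15/10/5 mix. The sketch below uses simple weighted random selection; the operation labels and the injectable `random` parameter are illustrative assumptions, not the benchmark suite's actual API.
+
+```javascript
+// Weights copied from the Mixed Workload query mix; labels are hypothetical.
+const QUERY_MIX = [
+  { op: "similarity_search", weight: 70 },
+  { op: "filtered_search", weight: 15 },
+  { op: "vector_insert", weight: 10 },
+  { op: "delete", weight: 5 },
+];
+
+// Picks one operation with probability proportional to its weight.
+// `random` must return a number in [0, 1); it is injectable for testing.
+function pickOperation(mix, random = Math.random) {
+  const totalWeight = mix.reduce((sum, entry) => sum + entry.weight, 0);
+  let remaining = random() * totalWeight;
+  for (const entry of mix) {
+    remaining -= entry.weight;
+    if (remaining < 0) return entry.op;
+  }
+  // Fallback for floating-point edge cases.
+  return mix[mix.length - 1].op;
+}
+```
+
+Injecting a seeded generator in place of `Math.random` would make a test run reproducible.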
+
+---
+
+### 5. Stress Scenarios
+
+#### 5.1 Gradual Load Increase
+**Objective**: Find breaking point
+
+**Configuration**:
+- Start: 100M concurrent
+- End: Until system breaks
+- Increment: +100M every 30 minutes
+- Query rate: Proportional to connections
+- Test duration: Until failure
+
+**Success Criteria**:
+- Identify maximum capacity
+- Measure degradation curve
+- Observe failure modes
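+
+The step schedule above can be generated rather than written by hand. This sketch assumes the query rate scales with connections at the baseline ratio used throughout this document (500M connections to 50K QPS). The `steps` cap is a hypothetical safety limit, since the real test runs until failure, and the `connections` field is included for planning only.
+
+```javascript
+// Baseline ratio from this document: 500M connections correspond to 50K QPS.
+const CONNECTIONS_PER_QPS = 500e6 / 50e3; // 10,000 connections per query/second
+
+// Builds one plateau per step, each stepConnections higher than the last.
+function buildStepStages({ startConnections, stepConnections, stepDuration, steps }) {
+  return Array.from({ length: steps }, (_, i) => {
+    const connections = startConnections + i * stepConnections;
+    return {
+      duration: stepDuration,
+      connections,
+      target: connections / CONNECTIONS_PER_QPS, // QPS for this plateau
+    };
+  });
+}
+
+// Start at 100M, add 100M every 30 minutes, stop after 10 steps (1B).
+const stressStages = buildStepStages({
+  startConnections: 100e6,
+  stepConnections: 100e6,
+  stepDuration: "30m",
+  steps: 10,
+});
+```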
+
+#### 5.2 Long-Duration Soak Test
+**Objective**: Detect memory leaks and resource exhaustion
+
+**Configuration**:
+- Total connections: 500M
+- Query rate: 50K QPS
+- Test duration: 24 hours
+- Pattern: Steady state
+
+**Success Criteria**:
+- No memory leaks
+- No connection leaks
+- Stable performance over time
+- Resource cleanup works
+
+---
+
+## Test Execution Strategy
+
+### Sequential Execution (Standard Suite)
+Total time: ~20 hours
+
+1. Baseline Steady State (4h)
+2. Daily Peak (5h)
+3. Product Launch 10x (2h)
+4. Single Region Failover (1h)
+5. Read-Heavy Workload (2h)
+6. Write-Heavy Workload (2h)
+7. Mixed Workload (4h)
+
+### Burst Suite (Special Events)
+Total time: ~8 hours
+
+1. World Cup 50x (3h)
+2. Flash Crowd 25x (1.5h)
+3. Multi-Region Cascade (2h)
+4. Database Failover (1h)
+
+### Quick Validation (Smoke Test)
+Total time: ~2 hours
+
+1. Baseline Steady State - 30 minutes
+2. Product Launch 10x - 30 minutes
+3. Single Region Failover - 30 minutes
+4. Mixed Workload - 30 minutes
+
+---
+
+## Monitoring During Tests
+
+### Real-Time Metrics
+- Connection count per region
+- Query latency percentiles (p50, p95, p99)
+- Error rates by type
+- CPU/Memory utilization
+- Network throughput
+- Database connections
+- Cache hit rates
+
+### Alerts
+- P99 latency > 50ms (warning)
+- P99 latency > 100ms (critical)
+- Error rate > 1% (warning)
+- Error rate > 5% (critical)
+- Region unhealthy
+- Database connections > 90%
+- Cost > $10K/hour
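+
+The latency and error-rate thresholds above map naturally onto a two-level severity check. The sketch below is illustrative only: the metric names and rule shape are assumptions and do not reflect how alerting policies are actually defined in Cloud Monitoring.
+
+```javascript
+// Warning and critical thresholds copied from the alert list above.
+const ALERT_RULES = [
+  { metric: "p99LatencyMs", warning: 50, critical: 100 },
+  { metric: "errorRatePercent", warning: 1, critical: 5 },
+];
+
+// Returns a severity ("ok", "warning", or "critical") for each rule.
+function classifyMetrics(metrics, rules = ALERT_RULES) {
+  return rules.map(({ metric, warning, critical }) => {
+    const value = metrics[metric];
+    let severity = "ok";
+    if (value > critical) severity = "critical";
+    else if (value > warning) severity = "warning";
+    return { metric, value, severity };
+  });
+}
+```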
+
+### Dashboards
+1. Executive: High-level metrics, SLA status
+2. Operations: Regional health, resource utilization
+3. Cost: Hourly spend, projections
+4. Performance: Latency distributions, throughput
+
+---
+
+## Cost Estimates
+
+### Per-Test Costs
+
+| Scenario | Duration | Peak Load | Estimated Cost |
+|----------|----------|-----------|----------------|
+| Baseline Steady | 4h | 500M | $180 |
+| Daily Peak | 5h | 750M | $350 |
+| World Cup 50x | 3h | 25B | $80,000 |
+| Product Launch 10x | 2h | 5B | $3,600 |
+| Flash Crowd 25x | 1.5h | 12.5B | $28,000 |
+| Single Region Failover | 1h | 500M | $45 |
+| Workload Tests | 2h | 500M | $90 |
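+
+Dividing each estimate by its duration gives an average hourly spend that can be compared with the $10K/hour cost alert. A minimal sketch using the table's figures; the field and function names are illustrative, and actual spend will not be uniform across a test.
+
+```javascript
+// Rows copied from the Per-Test Costs table.
+const COST_TABLE = [
+  { scenario: "Baseline Steady", hours: 4, costUsd: 180 },
+  { scenario: "Daily Peak", hours: 5, costUsd: 350 },
+  { scenario: "World Cup 50x", hours: 3, costUsd: 80000 },
+  { scenario: "Product Launch 10x", hours: 2, costUsd: 3600 },
+  { scenario: "Flash Crowd 25x", hours: 1.5, costUsd: 28000 },
+  { scenario: "Single Region Failover", hours: 1, costUsd: 45 },
+  { scenario: "Workload Tests", hours: 2, costUsd: 90 },
+];
+
+// Average spend per hour for each scenario.
+function hourlyCosts(rows) {
+  return rows.map(({ scenario, hours, costUsd }) => ({
+    scenario,
+    usdPerHour: costUsd / hours,
+  }));
+}
+
+// Scenarios whose average hourly spend exceeds the given cap.
+function scenariosOverCap(rows, capUsdPerHour) {
+  return hourlyCosts(rows)
+    .filter((row) => row.usdPerHour > capUsdPerHour)
+    .map((row) => row.scenario);
+}
+
+const overAlertThreshold = scenariosOverCap(COST_TABLE, 10000);
+```
+
+With these figures, only the World Cup and Flash Crowd tests average more than $10K/hour, so the cost alert should be expected to fire during those runs.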
+
+### Full Suite Costs
+- **Standard Suite**: ~$4.5K (of which $3,600 is the Product Launch test)
+- **Burst Suite**: ~$108K
+- **Quick Validation**: ~$150
+
+**Cost Optimization**:
+- Use committed use discounts (30% off)
+- Run tests in low-cost regions when possible
+- Use preemptible instances for load generators
+- Leverage CDN caching
+- Clean up resources immediately after tests
+
+---
+
+## Pre-Test Checklist
+
+### Infrastructure
+- [ ] All regions deployed and healthy
+- [ ] Load balancer configured
+- [ ] CDN enabled
+- [ ] Database replicas ready
+- [ ] Redis caches warmed
+- [ ] Monitoring dashboards set up
+- [ ] Alerting policies active
+- [ ] Budget alerts configured
+
+### Load Generation
+- [ ] k6 scripts validated
+- [ ] Load generators deployed in all regions
+- [ ] Test data prepared
+- [ ] Baseline traffic running
+- [ ] Credentials configured
+- [ ] Results storage ready
+
+### Team
+- [ ] On-call engineer available
+- [ ] Communication channels open (Slack)
+- [ ] Runbook reviewed
+- [ ] Rollback plan ready
+- [ ] Stakeholders notified
+
+---
+
+## Post-Test Analysis
+
+### Deliverables
+1. Test execution log
+2. Metrics summary (latency, throughput, errors)
+3. SLA compliance report
+4. Cost breakdown
+5. Bottleneck analysis
+6. Recommendations document
+7. Performance comparison (vs. previous tests)
+
+### Key Questions
+- Did we meet SLA targets?
+- Where did bottlenecks occur?
+- How well did auto-scaling perform?
+- Were there any unexpected failures?
+- What was the actual cost vs. estimate?
+- What improvements should we make?
+
+---
+
+## Example: Running World Cup Test
+
+```bash
+# 1. Pre-warm infrastructure
+cd /home/user/ruvector/src/burst-scaling
+npm run build
+node dist/burst-predictor.js --event "World Cup Final" --time "2026-07-15T18:00:00Z"
+
+# 2. Deploy load generators
+cd /home/user/ruvector/benchmarks
+npm run deploy:generators
+
+# 3. Run scenario
+npm run scenario:worldcup -- \
+  --regions "europe-west3,southamerica-east1" \
+  --peak-multiplier 50 \
+  --duration "3h" \
+  --enable-notifications
+
+# 4. Monitor (separate terminal)
+npm run dashboard
+
+# 5. Collect results
+npm run analyze -- --test-id "worldcup-2026-final-test"
+
+# 6. Generate report
+npm run report -- --test-id "worldcup-2026-final-test" --format pdf
+```
+
+---
+
+## Troubleshooting
+
+### High Error Rates
+- Check: Database connection pool exhaustion
+- Check: Network bandwidth limits
+- Check: Rate limiting too aggressive
+- Action: Scale up resources or enable degradation
+
+### High Latency
+- Check: Cold cache (low hit rate)
+- Check: Database query performance
+- Check: Network latency between regions
+- Action: Warm caches, optimize queries, adjust routing
+
+### Failed Auto-Scaling
+- Check: GCP quotas and limits
+- Check: Budget caps
+- Check: IAM permissions
+- Action: Request quota increase, adjust caps
+
+### Cost Overruns
+- Check: Instances not scaling down
+- Check: Database overprovisioned
+- Check: Excessive logging
+- Action: Force scale-in, reduce logging verbosity
+
+---
+
+## Next Steps
+
+1. **Run Quick Validation**: Ensure system is ready
+2. **Run Standard Suite**: Comprehensive testing
+3. **Schedule Burst Tests**: Coordinate with team (expensive!)
+4. **Iterate Based on Results**: Tune thresholds and configurations
+5. **Document Learnings**: Update runbooks and architecture docs
+
+---
+
+## References
+
+- [Architecture Overview](../../docs/cloud-architecture/architecture-overview.md)
+- [Scaling Strategy](../../docs/cloud-architecture/scaling-strategy.md)
+- [Burst Scaling](../../src/burst-scaling/README.md)
+- [Benchmarking Guide](../README.md)
+- [Operations Runbook](../../src/burst-scaling/RUNBOOK.md)
+
+---
+
+**Document Version**: 1.0
+**Last Updated**: 2025-11-20
+**Author**: RuVector Performance Team