RuVector Load Testing Scenarios
Overview
This document defines comprehensive load testing scenarios for the globally distributed RuVector system, targeting 500 million concurrent learning streams with burst capacity up to 25 billion.
Test Environment
Global Regions
- Americas: us-central1, us-east1, us-west1, southamerica-east1
- Europe: europe-west1, europe-west3, europe-north1
- Asia-Pacific: asia-east1, asia-southeast1, asia-northeast1, australia-southeast1
- Total: 11 regions
Infrastructure
- Cloud Run: Auto-scaling instances (10-1000 per region)
- Load Balancer: Global HTTPS LB with Cloud CDN
- Database: Cloud SQL PostgreSQL (multi-region)
- Cache: Memorystore Redis (128GB per region)
- Monitoring: Cloud Monitoring + OpenTelemetry
Scenario Categories
1. Baseline Scenarios
1.1 Steady State (500M Concurrent)
Objective: Validate system handles target baseline load
Configuration:
- Total connections: 500M globally
- Distribution: Proportional to region capacity
- Tier-1 regions (5): 80M each = 400M
- Tier-2 regions (6): ~16.7M each ≈ 100M
- Query rate: 50K QPS globally
- Test duration: 4 hours
- Ramp-up: 30 minutes
Success Criteria:
- P99 latency < 50ms
- P50 latency < 10ms
- Error rate < 0.1%
- No memory leaks
- CPU utilization 60-80%
- All regions healthy
Load Pattern:
```javascript
{
  executor: "ramping-arrival-rate",
  stages: [
    { duration: "30m", target: 50000 }, // Ramp up
    { duration: "4h",  target: 50000 }, // Steady
    { duration: "15m", target: 0 }      // Ramp down
  ]
}
```
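The staged pattern above maps onto k6's `ramping-arrival-rate` executor roughly as sketched below. The `preAllocatedVUs`/`maxVUs` values are illustrative assumptions that would need tuning for a real generator fleet; the stage targets and durations come from this scenario.

```javascript
// Sketch of a k6 load profile for the steady-state scenario.
// In an actual k6 script this object would be `export const options`.
const options = {
  scenarios: {
    steadyState: {
      executor: 'ramping-arrival-rate', // k6's staged arrival-rate executor
      startRate: 0,
      timeUnit: '1s',                   // targets below are requests/second
      preAllocatedVUs: 10000,           // assumption: tune per generator fleet
      maxVUs: 100000,                   // assumption
      stages: [
        { duration: '30m', target: 50000 }, // ramp up to 50K QPS
        { duration: '4h',  target: 50000 }, // hold steady
        { duration: '15m', target: 0 },     // ramp down
      ],
    },
  },
};

// Helper: total wall-clock duration of a staged profile, in minutes.
function totalMinutes(stages) {
  const toMin = (d) => (d.endsWith('h') ? parseFloat(d) * 60 : parseFloat(d));
  return stages.reduce((sum, s) => sum + toMin(s.duration), 0);
}

console.log(totalMinutes(options.scenarios.steadyState.stages)); // 285 minutes (4h45m)
```

The helper is handy for sanity-checking that a profile's total runtime matches the scenario's stated duration before launching an expensive run.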
1.2 Daily Peak (750M Concurrent)
Objective: Handle 1.5x baseline during peak hours
Configuration:
- Total connections: 750M globally
- Peak hours: 18:00-22:00 local time per region
- Query rate: 75K QPS
- Test duration: 5 hours
- Multiple peaks (simulate time zones)
Success Criteria:
- P99 latency < 75ms
- P50 latency < 15ms
- Error rate < 0.5%
- Auto-scaling triggers within 60s
- Cost < $5K for test
2. Burst Scenarios
2.1 World Cup Final (50x Burst)
Objective: Handle massive spike during major sporting event
Event Profile:
- Pre-event: 30 minutes before kickoff
- Peak: During match (90 minutes + 15 min halftime)
- Post-event: 60 minutes after final whistle
- Geography: Concentrated in specific regions (France, Argentina)
Configuration:
- Baseline: 500M concurrent
- Peak: 25B concurrent (50x)
- Primary regions: europe-west3 (France), southamerica-east1 (Argentina)
- Secondary spillover: All Europe/Americas regions
- Query rate: 2.5M QPS at peak
- Test duration: 3 hours
Load Pattern:
```javascript
{
  stages: [
    // Pre-event buzz (30 min before kickoff)
    { duration: "30m", target: 500000 },  // 10x baseline
    { duration: "15m", target: 2500000 }, // 50x PEAK
    // First half (45 min)
    { duration: "45m", target: 2500000 }, // sustained peak
    // Halftime (15 min, slight drop)
    { duration: "15m", target: 1500000 }, // 30x
    // Second half (45 min)
    { duration: "45m", target: 2500000 }, // back to peak
    // Extra time / penalties (30 min)
    { duration: "30m", target: 3000000 }, // 60x SUPER PEAK
    // Post-game analysis (30 min)
    { duration: "30m", target: 1000000 }, // 20x
    // Gradual decline (30 min)
    { duration: "30m", target: 100000 }   // 2x
  ]
}
```
Regional Distribution:
- France: 40% (10B peak)
- Argentina: 35% (8.75B peak)
- Spain/Italy/Portugal: 10% (2.5B peak)
- Rest of Europe: 8% (2B peak)
- Americas: 5% (1.25B peak)
- Asia/Pacific: 2% (500M peak)
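The percentage split above can be turned into absolute per-region connection targets at the 25B peak. The sketch below does that and verifies the shares sum to 100%; the region labels follow the list above, everything else is illustrative.

```javascript
// Sketch: convert the World Cup regional percentages into absolute
// connection targets at the 25B peak, with a sanity check on the shares.
const PEAK = 25e9;
const shares = {
  'France (europe-west3)': 0.40,
  'Argentina (southamerica-east1)': 0.35,
  'Spain/Italy/Portugal': 0.10,
  'Rest of Europe': 0.08,
  'Americas': 0.05,
  'Asia/Pacific': 0.02,
};

function allocate(peak, shares) {
  const total = Object.values(shares).reduce((a, b) => a + b, 0);
  if (Math.abs(total - 1) > 1e-9) {
    throw new Error(`shares sum to ${total}, not 1.0`);
  }
  return Object.fromEntries(
    Object.entries(shares).map(([region, pct]) => [region, peak * pct])
  );
}

const plan = allocate(PEAK, shares);
console.log(plan['France (europe-west3)']); // ~10 billion for France
```

Running a check like this before a burst test catches percentage typos that would otherwise skew pre-warming in the wrong regions.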
Success Criteria:
- System survives without crash
- P99 latency < 200ms (degraded acceptable)
- P50 latency < 50ms
- Error rate < 5% (acceptable during super peak)
- Auto-scaling completes within 10 minutes
- No cascading failures
- Graceful degradation activated when needed
- Cost < $100K for full test
Pre-warming:
- Enable predictive scaling 15 minutes before test
- Pre-allocate 25x capacity in primary regions
- Warm up CDN caches
- Increase database connection pools
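Pre-allocating 25x capacity interacts with the per-region Cloud Run bounds noted in the infrastructure section (10-1000 instances). A minimal sketch of that clamping logic, with illustrative instance counts:

```javascript
// Sketch: pre-warming targets for a burst, assuming each region scales
// between 10 and 1000 Cloud Run instances (per the infrastructure section).
// Input instance counts below are illustrative, not measured values.
const MIN_INSTANCES = 10;
const MAX_INSTANCES = 1000;

function preWarmTarget(currentInstances, multiplier) {
  const desired = currentInstances * multiplier;
  // Clamp to the region's allowed instance range.
  return Math.max(MIN_INSTANCES, Math.min(desired, MAX_INSTANCES));
}

// A busy primary region hits the per-region cap well before 25x:
console.log(preWarmTarget(40, 25)); // clamped to 1000
// A quiet region pre-warms without hitting the cap:
console.log(preWarmTarget(10, 25)); // 250
```

This is why the burst plan relies on spillover across many regions: a single region's instance cap saturates long before 25x of a busy baseline.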
2.2 Product Launch (10x Burst)
Objective: Handle viral traffic spike (e.g., AI model release)
Configuration:
- Baseline: 500M concurrent
- Peak: 5B concurrent (10x)
- Distribution: Global, concentrated in US
- Query rate: 500K QPS
- Test duration: 2 hours
- Pattern: Sudden spike, gradual decline
Load Pattern:
```javascript
{
  stages: [
    { duration: "5m",  target: 500000 }, // 10x spike within 5 minutes
    { duration: "30m", target: 500000 }, // sustained
    { duration: "45m", target: 300000 }, // gradual decline
    { duration: "40m", target: 100000 }  // return to normal
  ]
}
```
Success Criteria:
- Reactive scaling responds within 60s
- P99 latency < 100ms
- Error rate < 2%
- No downtime
2.3 Flash Crowd (25x Burst)
Objective: Unpredictable viral event
Configuration:
- Baseline: 500M concurrent
- Peak: 12.5B concurrent (25x)
- Geography: Unpredictable (use US for test)
- Query rate: 1.25M QPS
- Test duration: 90 minutes
- Pattern: Very rapid spike (< 2 minutes)
Load Pattern:
```javascript
{
  stages: [
    { duration: "2m",  target: 1250000 }, // 25x in 2 minutes!
    { duration: "30m", target: 1250000 }, // hold peak
    { duration: "30m", target: 750000 },  // decline
    { duration: "28m", target: 100000 }   // return to baseline
  ]
}
```
Success Criteria:
- System survives without manual intervention
- Reactive scaling activates immediately
- P99 latency < 150ms
- Error rate < 3%
- Cost cap respected
3. Failover Scenarios
3.1 Single Region Failure
Objective: Validate regional failover
Configuration:
- Baseline: 500M concurrent
- Failed region: europe-west1 (80M connections)
- Failover targets: europe-west3, europe-north1
- Query rate: 50K QPS
- Test duration: 1 hour
- Failure trigger: 30 minutes into test
Procedure:
- Run baseline load for 30 minutes
- Simulate region failure (kill all instances in europe-west1)
- Observe failover behavior
- Measure recovery time
- Validate data consistency
Success Criteria:
- Failover completes within 60 seconds
- Connection loss < 5%
- No data loss
- P99 latency spike < 200ms during failover
- Automatic recovery when region restored
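When europe-west1 drops its 80M connections, the failover targets should absorb them in proportion to spare capacity rather than splitting evenly. A sketch of that redistribution, where the per-region capacities are illustrative assumptions:

```javascript
// Sketch: redistribute a failed region's connections across failover
// targets in proportion to each target's spare capacity.
// Capacity figures below are illustrative, not measured values.
function redistribute(failedConnections, targets) {
  const spare = targets.map((t) => t.capacity - t.current);
  const totalSpare = spare.reduce((a, b) => a + b, 0);
  if (totalSpare < failedConnections) {
    throw new Error('insufficient spare capacity; shed load instead');
  }
  return targets.map((t, i) => ({
    region: t.region,
    extra: Math.round(failedConnections * (spare[i] / totalSpare)),
  }));
}

// europe-west1's 80M connections split across the two targets:
const plan = redistribute(80e6, [
  { region: 'europe-west3',  capacity: 160e6, current: 80e6 }, // 80M spare
  { region: 'europe-north1', capacity: 50e6,  current: 10e6 }, // 40M spare
]);
console.log(plan);
```

Weighting by spare capacity keeps the smaller failover target (europe-north1) from being pushed past its own limits and triggering a cascade.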
3.2 Multi-Region Cascade Failure
Objective: Test disaster recovery
Configuration:
- Baseline: 500M concurrent
- Failed regions: europe-west1, europe-west3 (160M connections)
- Failover: Global redistribution
- Test duration: 2 hours
- Progressive failures (15 min apart)
Procedure:
- Run baseline load
- Kill europe-west1 at T+30m
- Kill europe-west3 at T+45m
- Observe cascade prevention
- Validate global recovery
Success Criteria:
- No cascading failures
- Circuit breakers activate
- Graceful degradation if needed
- Connection loss < 10%
- System remains stable
3.3 Database Failover
Objective: Test database resilience
Configuration:
- Baseline: 500M concurrent
- Database: Trigger Cloud SQL failover to replica
- Query rate: 50K QPS (read-heavy)
- Test duration: 1 hour
- Failure trigger: 20 minutes into test
Success Criteria:
- Failover completes within 30 seconds
- Connection pool recovers automatically
- Read queries continue with < 5% errors
- Write queries resume after failover
- No permanent data loss
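Keeping read errors under 5% during a ~30-second failover mostly comes down to client-side retries with backoff. A minimal sketch; the attempt count and delays are assumptions to tune against the observed failover window:

```javascript
// Sketch: retry a query with exponential backoff so reads ride out a
// short database failover. Attempt counts and delays are assumptions.
async function withRetry(fn, { attempts = 5, baseDelayMs = 100 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err;       // out of retries: surface error
      const delay = baseDelayMs * 2 ** i;      // 100ms, 200ms, 400ms, ...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

A load script would wrap each read in `withRetry(() => runQuery(...))`; writes typically need idempotency keys before they can be retried the same way.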
4. Workload Scenarios
4.1 Read-Heavy (90% Reads)
Objective: Validate cache effectiveness
Configuration:
- Total connections: 500M
- Query mix: 90% similarity search, 10% updates
- Cache hit rate target: > 75%
- Query rate: 50K QPS
- Test duration: 2 hours
Success Criteria:
- P99 latency < 30ms (due to caching)
- Cache hit rate > 75%
- Database CPU < 50%
4.2 Write-Heavy (40% Writes)
Objective: Test write throughput
Configuration:
- Total connections: 500M
- Query mix: 60% reads, 40% vector updates
- Query rate: 50K QPS
- Test duration: 2 hours
- Vector dimensions: 768
Success Criteria:
- P99 latency < 100ms
- Database CPU < 80%
- Replication lag < 5 seconds
- No write conflicts
4.3 Mixed Workload (Realistic)
Objective: Simulate production traffic
Configuration:
- Total connections: 500M
- Query mix:
- 70% similarity search
- 15% filtered search
- 10% vector inserts
- 5% deletes
- Query rate: 50K QPS
- Test duration: 4 hours
- Varying vector dimensions (384, 768, 1536)
Success Criteria:
- P99 latency < 50ms
- All operations succeed
- Resource utilization balanced
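The 70/15/10/5 query mix above can be driven from a simple weighted picker in the load script. A sketch; the operation names are illustrative labels, and a real script would call `pick(Math.random())` once per iteration:

```javascript
// Sketch: weighted operation picker matching the mixed-workload split.
const MIX = [
  { op: 'similarity_search', weight: 0.70 },
  { op: 'filtered_search',   weight: 0.15 },
  { op: 'vector_insert',     weight: 0.10 },
  { op: 'vector_delete',     weight: 0.05 },
];

function pick(r, mix = MIX) {
  // Walk the cumulative distribution until r falls inside a bucket.
  let cum = 0;
  for (const { op, weight } of mix) {
    cum += weight;
    if (r < cum) return op;
  }
  return mix[mix.length - 1].op; // guard for r === 1 / float drift
}

console.log(pick(0.5));  // similarity_search
console.log(pick(0.97)); // vector_delete
```

Varying the vector dimension (384, 768, 1536) per iteration can use the same pattern with a second weighted table.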
5. Stress Scenarios
5.1 Gradual Load Increase
Objective: Find breaking point
Configuration:
- Start: 100M concurrent
- End: Until system breaks
- Increment: +100M every 30 minutes
- Query rate: Proportional to connections
- Test duration: Until failure
Success Criteria:
- Identify maximum capacity
- Measure degradation curve
- Observe failure modes
5.2 Long-Duration Soak Test
Objective: Detect memory leaks and resource exhaustion
Configuration:
- Total connections: 500M
- Query rate: 50K QPS
- Test duration: 24 hours
- Pattern: Steady state
Success Criteria:
- No memory leaks
- No connection leaks
- Stable performance over time
- Resource cleanup works
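"No memory leaks" over 24 hours is easiest to judge from the trend, not individual samples. The sketch below fits a least-squares slope to periodic RSS samples; the 5 MB/hour threshold is an assumption to calibrate against normal GC behavior:

```javascript
// Sketch: flag a suspected memory leak from the least-squares slope of
// periodic RSS samples taken during a soak test. Threshold is an assumption.
function slopePerHour(samplesMb, intervalMinutes) {
  const n = samplesMb.length;
  const xs = samplesMb.map((_, i) => (i * intervalMinutes) / 60); // hours
  const xMean = xs.reduce((a, b) => a + b, 0) / n;
  const yMean = samplesMb.reduce((a, b) => a + b, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - xMean) * (samplesMb[i] - yMean);
    den += (xs[i] - xMean) ** 2;
  }
  return num / den; // MB per hour
}

function looksLikeLeak(samplesMb, intervalMinutes, thresholdMbPerHour = 5) {
  return slopePerHour(samplesMb, intervalMinutes) > thresholdMbPerHour;
}

// RSS climbing 10 MB every hour for 24 hours reads as a leak:
const climbing = [...Array(24)].map((_, i) => 800 + 10 * i);
console.log(looksLikeLeak(climbing, 60)); // true
```

Hourly samples pulled from Cloud Monitoring per service can be fed straight into this check as part of post-test analysis.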
Test Execution Strategy
Sequential Execution (Standard Suite)
Total time: ~20 hours
- Baseline Steady State (4h)
- Daily Peak (5h)
- Product Launch 10x (2h)
- Single Region Failover (1h)
- Read-Heavy Workload (2h)
- Write-Heavy Workload (2h)
- Mixed Workload (4h)
Burst Suite (Special Events)
Total time: ~8 hours
- World Cup 50x (3h)
- Flash Crowd 25x (1.5h)
- Multi-Region Cascade (2h)
- Database Failover (1h)
Quick Validation (Smoke Test)
Total time: ~2 hours
- Baseline Steady State (30m)
- Product Launch 10x (30m)
- Single Region Failover (30m)
- Mixed Workload (30m)
Monitoring During Tests
Real-Time Metrics
- Connection count per region
- Query latency percentiles (p50, p95, p99)
- Error rates by type
- CPU/Memory utilization
- Network throughput
- Database connections
- Cache hit rates
Alerts
- P99 latency > 50ms (warning)
- P99 latency > 100ms (critical)
- Error rate > 1% (warning)
- Error rate > 5% (critical)
- Region unhealthy
- Database connections > 90%
- Cost > $10K/hour
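The alert thresholds above are simple enough to encode directly, which also lets the thresholds be unit-tested alongside the load scripts. A sketch using this document's numbers; the metric field names are illustrative:

```javascript
// Sketch: evaluate this document's alert thresholds against one metrics
// snapshot. Threshold values come from the Alerts list; field names are
// illustrative assumptions.
function evaluateAlerts(m) {
  const alerts = [];
  if (m.p99Ms > 100) alerts.push('critical: p99 > 100ms');
  else if (m.p99Ms > 50) alerts.push('warning: p99 > 50ms');
  if (m.errorRate > 0.05) alerts.push('critical: error rate > 5%');
  else if (m.errorRate > 0.01) alerts.push('warning: error rate > 1%');
  if (m.dbConnUtilization > 0.9) alerts.push('critical: db connections > 90%');
  if (m.costPerHourUsd > 10000) alerts.push('critical: cost > $10K/hour');
  return alerts;
}

console.log(evaluateAlerts({
  p99Ms: 120, errorRate: 0.02, dbConnUtilization: 0.5, costPerHourUsd: 900,
}));
// → ['critical: p99 > 100ms', 'warning: error rate > 1%']
```

Note that during declared burst tests the degraded criteria (e.g. p99 < 200ms for the World Cup scenario) should override these defaults, or the critical alert will fire by design.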
Dashboards
- Executive: High-level metrics, SLA status
- Operations: Regional health, resource utilization
- Cost: Hourly spend, projections
- Performance: Latency distributions, throughput
Cost Estimates
Per-Test Costs
| Scenario | Duration | Peak Load | Estimated Cost |
|---|---|---|---|
| Baseline Steady | 4h | 500M | $180 |
| Daily Peak | 5h | 750M | $350 |
| World Cup 50x | 3h | 25B | $80,000 |
| Product Launch 10x | 2h | 5B | $3,600 |
| Flash Crowd 25x | 1.5h | 12.5B | $28,000 |
| Single Region Failover | 1h | 500M | $45 |
| Workload Tests (per 2h run) | 2h | 500M | $90 |
Full Suite Costs
- Standard Suite: ~$4.5K (dominated by the Product Launch 10x test)
- Burst Suite: ~$108K (dominated by the World Cup and Flash Crowd bursts)
- Quick Validation: ~$150
Cost Optimization:
- Use committed use discounts (30% off)
- Run tests in low-cost regions when possible
- Use preemptible instances for load generators
- Leverage CDN caching
- Clean up resources immediately after tests
Pre-Test Checklist
Infrastructure
- All regions deployed and healthy
- Load balancer configured
- CDN enabled
- Database replicas ready
- Redis caches warmed
- Monitoring dashboards set up
- Alerting policies active
- Budget alerts configured
Load Generation
- K6 scripts validated
- Load generators deployed in all regions
- Test data prepared
- Baseline traffic running
- Credentials configured
- Results storage ready
Team
- On-call engineer available
- Communication channels open (Slack)
- Runbook reviewed
- Rollback plan ready
- Stakeholders notified
Post-Test Analysis
Deliverables
- Test execution log
- Metrics summary (latency, throughput, errors)
- SLA compliance report
- Cost breakdown
- Bottleneck analysis
- Recommendations document
- Performance comparison (vs. previous tests)
Key Questions
- Did we meet SLA targets?
- Where did bottlenecks occur?
- How well did auto-scaling perform?
- Were there any unexpected failures?
- What was the actual cost vs. estimate?
- What improvements should we make?
Example: Running World Cup Test
```bash
# 1. Pre-warm infrastructure
cd /home/user/ruvector/src/burst-scaling
npm run build
node dist/burst-predictor.js --event "World Cup Final" --time "2026-07-15T18:00:00Z"

# 2. Deploy load generators
cd /home/user/ruvector/benchmarks
npm run deploy:generators

# 3. Run scenario
npm run scenario:worldcup -- \
  --regions "europe-west3,southamerica-east1" \
  --peak-multiplier 50 \
  --duration "3h" \
  --enable-notifications

# 4. Monitor (separate terminal)
npm run dashboard

# 5. Collect results
npm run analyze -- --test-id "worldcup-2026-final-test"

# 6. Generate report
npm run report -- --test-id "worldcup-2026-final-test" --format pdf
```
Troubleshooting
High Error Rates
- Check: Database connection pool exhaustion
- Check: Network bandwidth limits
- Check: Rate limiting too aggressive
- Action: Scale up resources or enable degradation
High Latency
- Check: Cold cache (low hit rate)
- Check: Database query performance
- Check: Network latency between regions
- Action: Warm caches, optimize queries, adjust routing
Failed Auto-Scaling
- Check: GCP quotas and limits
- Check: Budget caps
- Check: IAM permissions
- Action: Request quota increase, adjust caps
Cost Overruns
- Check: Instances not scaling down
- Check: Database overprovisioned
- Check: Excessive logging
- Action: Force scale-in, reduce logging verbosity
Next Steps
- Run Quick Validation: Ensure system is ready
- Run Standard Suite: Comprehensive testing
- Schedule Burst Tests: Coordinate with team (expensive!)
- Iterate Based on Results: Tune thresholds and configurations
- Document Learnings: Update runbooks and architecture docs
Document Version: 1.0 Last Updated: 2025-11-20 Author: RuVector Performance Team