# RuVector Load Testing Scenarios

## Overview

This document defines comprehensive load testing scenarios for the globally distributed RuVector system, targeting 500 million concurrent learning streams with burst capacity up to 25 billion.

## Test Environment

### Global Regions
- **Americas**: us-central1, us-east1, us-west1, southamerica-east1
- **Europe**: europe-west1, europe-west3, europe-north1
- **Asia-Pacific**: asia-east1, asia-southeast1, asia-northeast1, australia-southeast1
- **Total**: 11 regions

### Infrastructure
- **Cloud Run**: Auto-scaling instances (10-1000 per region)
- **Load Balancer**: Global HTTPS LB with Cloud CDN
- **Database**: Cloud SQL PostgreSQL (multi-region)
- **Cache**: Memorystore Redis (128GB per region)
- **Monitoring**: Cloud Monitoring + OpenTelemetry

---

## Scenario Categories

### 1. Baseline Scenarios

#### 1.1 Steady State (500M Concurrent)
**Objective**: Validate system handles target baseline load

**Configuration**:
- Total connections: 500M globally
- Distribution: Proportional to region capacity
  - Tier-1 regions (5): 80M each = 400M
  - Tier-2 regions (10): 10M each = 100M
- Query rate: 50K QPS globally
- Test duration: 4 hours
- Ramp-up: 30 minutes

**Success Criteria**:
- P99 latency < 50ms
- P50 latency < 10ms
- Error rate < 0.1%
- No memory leaks
- CPU utilization 60-80%
- All regions healthy

**Load Pattern**:
```javascript
{
  type: "ramped-arrival-rate",
  stages: [
    { duration: "30m", target: 50000 },  // Ramp up
    { duration: "4h", target: 50000 },   // Steady
    { duration: "15m", target: 0 }       // Ramp down
  ]
}
```
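
As a rough illustration, the pattern and success criteria above could be combined into a single k6-style options object. This is a sketch rather than the project's actual script: it assumes the load pattern maps onto k6's `ramping-arrival-rate` executor, and the scenario name `steady_state` and the `preAllocatedVUs` and `maxVUs` values are placeholders chosen for illustration. The thresholds encode the P99, P50, and error-rate criteria.

```javascript
// Sketch only: names and VU sizes are hypothetical, not taken from the RuVector repo.
const steadyStateOptions = {
  scenarios: {
    steady_state: {
      executor: "ramping-arrival-rate", // assumed k6 equivalent of the pattern above
      startRate: 0,
      timeUnit: "1s",        // targets are iterations (queries) per second
      preAllocatedVUs: 1000, // placeholder; size per load generator
      maxVUs: 10000,         // placeholder
      stages: [
        { duration: "30m", target: 50000 }, // Ramp up
        { duration: "4h", target: 50000 },  // Steady
        { duration: "15m", target: 0 },     // Ramp down
      ],
    },
  },
  thresholds: {
    http_req_duration: ["p(99)<50", "p(50)<10"], // milliseconds
    http_req_failed: ["rate<0.001"],             // error rate < 0.1%
  },
};
```

In a k6 script this object would be exported as `options`. With generators in several regions, each one would presumably carry only its share of the 50K QPS target.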

#### 1.2 Daily Peak (750M Concurrent)
**Objective**: Handle 1.5x baseline during peak hours

**Configuration**:
- Total connections: 750M globally
- Peak hours: 18:00-22:00 local time per region
- Query rate: 75K QPS
- Test duration: 5 hours
- Multiple peaks (simulate time zones)

**Success Criteria**:
- P99 latency < 75ms
- P50 latency < 15ms
- Error rate < 0.5%
- Auto-scaling triggers within 60s
- Cost < $5K for test
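
The "multiple peaks" idea can be sketched as a helper that reports which regions are inside their local 18:00-22:00 window at a given UTC hour. This is a minimal sketch under stated assumptions: the three regions are an illustrative subset, and their UTC offsets are standard-time approximations that ignore daylight saving. The function names are hypothetical.

```javascript
const PEAK_START_HOUR = 18; // inclusive, local time
const PEAK_END_HOUR = 22;   // exclusive, local time

// Illustrative subset; offsets are assumed standard-time values (no DST).
const regionUtcOffsets = {
  "us-central1": -6,
  "europe-west1": 1,
  "asia-northeast1": 9,
};

// Convert a UTC hour to a local hour in the range 0-23.
function localHour(utcHour, offsetHours) {
  return (((utcHour + offsetHours) % 24) + 24) % 24;
}

function isPeakHour(utcHour, offsetHours) {
  const hour = localHour(utcHour, offsetHours);
  return hour >= PEAK_START_HOUR && hour < PEAK_END_HOUR;
}

// List the regions whose local time falls inside the peak window.
function regionsAtPeak(utcHour) {
  return Object.keys(regionUtcOffsets).filter((region) =>
    isPeakHour(utcHour, regionUtcOffsets[region])
  );
}

console.log(regionsAtPeak(18)); // [ 'europe-west1' ]
```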

---

### 2. Burst Scenarios

#### 2.1 World Cup Final (50x Burst)
**Objective**: Handle massive spike during major sporting event

**Event Profile**:
- **Pre-event**: 30 minutes before kickoff
- **Peak**: During match (90 minutes of play + 15 min halftime, plus up to 30 min of extra time)
- **Post-event**: 60 minutes after final whistle
- **Geography**: Concentrated in specific regions (France, Argentina)

**Configuration**:
- Baseline: 500M concurrent
- Peak: 25B concurrent (50x)
- Primary regions: europe-west3 (France), southamerica-east1 (Argentina)
- Secondary spillover: All Europe/Americas regions
- Query rate: 2.5M QPS at peak
- Test duration: 3 hours

**Load Pattern**:
```javascript
{
  stages: [
    // Pre-event buzz (30 min before)
    { duration: "30m", target: 500000 },    // 10x baseline
    { duration: "15m", target: 2500000 },   // 50x PEAK
    // First half (45 min)
    { duration: "45m", target: 2500000 },   // Sustained peak
    // Halftime (15 min - slight drop)
    { duration: "15m", target: 1500000 },   // 30x
    // Second half (45 min)
    { duration: "45m", target: 2500000 },   // Back to peak
    // Extra time / penalties (30 min)
    { duration: "30m", target: 3000000 },   // 60x SUPER PEAK
    // Post-game analysis (30 min)
    { duration: "30m", target: 1000000 },   // 20x
    // Gradual decline (30 min)
    { duration: "30m", target: 100000 }     // 2x
  ]
}
```

**Regional Distribution**:
- **France**: 40% (10B peak)
- **Argentina**: 35% (8.75B peak)
- **Spain/Italy/Portugal**: 10% (2.5B peak)
- **Rest of Europe**: 8% (2B peak)
- **Americas**: 5% (1.25B peak)
- **Asia/Pacific**: 2% (500M peak)
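
The per-region peaks above follow directly from the 25B global peak and the percentage shares. The following sketch reproduces that arithmetic and checks that the shares add up to 100%. The function and variable names are illustrative only.

```javascript
const PEAK_CONCURRENT = 25e9; // 25B concurrent at the 50x peak

const regionalShares = {
  "France": 40,
  "Argentina": 35,
  "Spain/Italy/Portugal": 10,
  "Rest of Europe": 8,
  "Americas": 5,
  "Asia/Pacific": 2,
};

// Split a global peak across regions according to percentage shares.
function peakByRegion(totalPeak, sharesPercent) {
  const totalShare = Object.values(sharesPercent).reduce((a, b) => a + b, 0);
  if (totalShare !== 100) {
    throw new Error(`Shares sum to ${totalShare}%, expected 100%`);
  }
  const result = {};
  for (const [region, pct] of Object.entries(sharesPercent)) {
    result[region] = (totalPeak * pct) / 100;
  }
  return result;
}

const peaks = peakByRegion(PEAK_CONCURRENT, regionalShares);
console.log(peaks["France"]);    // 10000000000
console.log(peaks["Argentina"]); // 8750000000
```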

**Success Criteria**:
- System survives without crash
- P99 latency < 200ms (degraded acceptable)
- P50 latency < 50ms
- Error rate < 5% (acceptable during super peak)
- Auto-scaling completes within 10 minutes
- No cascading failures
- Graceful degradation activated when needed
- Cost < $100K for full test

**Pre-warming**:
- Enable predictive scaling 15 minutes before test
- Pre-allocate 25x capacity in primary regions
- Warm up CDN caches
- Increase database connection pools

#### 2.2 Product Launch (10x Burst)
**Objective**: Handle viral traffic spike (e.g., AI model release)

**Configuration**:
- Baseline: 500M concurrent
- Peak: 5B concurrent (10x)
- Distribution: Global, concentrated in US
- Query rate: 500K QPS
- Test duration: 2 hours
- Pattern: Sudden spike, gradual decline

**Load Pattern**:
```javascript
{
  stages: [
    { duration: "5m", target: 500000 },    // 10x instant spike
    { duration: "30m", target: 500000 },   // Sustained
    { duration: "45m", target: 300000 },   // Gradual decline
    { duration: "40m", target: 100000 }    // Return to normal
  ]
}
```
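
A quick sanity check on a stage list is to sum its durations and compare the total with the configured test duration. The sketch below does this for the product launch pattern. The helper names are hypothetical, and the parser only handles the `"<n>m"` and `"<n>h"` forms that appear in this document.

```javascript
// Parse "30m" or "4h" into minutes; other formats are rejected.
function parseDurationMinutes(text) {
  const match = /^(\d+)(m|h)$/.exec(text);
  if (!match) throw new Error(`Unsupported duration: ${text}`);
  const value = Number(match[1]);
  return match[2] === "h" ? value * 60 : value;
}

// Total length of a stage list in minutes.
function totalMinutes(stages) {
  return stages.reduce((sum, stage) => sum + parseDurationMinutes(stage.duration), 0);
}

const productLaunchStages = [
  { duration: "5m", target: 500000 },
  { duration: "30m", target: 500000 },
  { duration: "45m", target: 300000 },
  { duration: "40m", target: 100000 },
];

console.log(totalMinutes(productLaunchStages)); // 120, matching the 2-hour test duration
```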

**Success Criteria**:
- Reactive scaling responds within 60s
- P99 latency < 100ms
- Error rate < 2%
- No downtime

#### 2.3 Flash Crowd (25x Burst)
**Objective**: Unpredictable viral event

**Configuration**:
- Baseline: 500M concurrent
- Peak: 12.5B concurrent (25x)
- Geography: Unpredictable (use US for test)
- Query rate: 1.25M QPS
- Test duration: 90 minutes
- Pattern: Very rapid spike (< 2 minutes)

**Load Pattern**:
```javascript
{
  stages: [
    { duration: "2m", target: 1250000 },   // 25x in 2 minutes!
    { duration: "30m", target: 1250000 },  // Hold peak
    { duration: "30m", target: 750000 },   // Decline
    { duration: "28m", target: 100000 }    // Return
  ]
}
```
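
To get a feel for how steep this spike is, the ramp can be expressed as a rate of change in QPS per second. The sketch below assumes the ramp starts from the 50K QPS baseline, which the load pattern does not state explicitly. Under that assumption the load grows by about 10K QPS every second, which is the pace reactive scaling would have to keep up with.

```javascript
const BASELINE_QPS = 50000; // assumed starting point for the ramp

// Average increase in query rate per second over a linear ramp.
function rampRatePerSecond(startQps, targetQps, rampSeconds) {
  if (rampSeconds <= 0) throw new Error("rampSeconds must be positive");
  return (targetQps - startQps) / rampSeconds;
}

const flashCrowdSlope = rampRatePerSecond(BASELINE_QPS, 1250000, 2 * 60);
console.log(flashCrowdSlope); // 10000
```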

**Success Criteria**:
- System survives without manual intervention
- Reactive scaling activates immediately
- P99 latency < 150ms
- Error rate < 3%
- Cost cap respected

---

### 3. Failover Scenarios

#### 3.1 Single Region Failure
**Objective**: Validate regional failover

**Configuration**:
- Baseline: 500M concurrent
- Failed region: europe-west1 (80M connections)
- Failover targets: europe-west3, europe-north1
- Query rate: 50K QPS
- Test duration: 1 hour
- Failure trigger: 30 minutes into test

**Procedure**:
1. Run baseline load for 30 minutes
2. Simulate region failure (kill all instances in europe-west1)
3. Observe failover behavior
4. Measure recovery time
5. Validate data consistency

**Success Criteria**:
- Failover completes within 60 seconds
- Connection loss < 5%
- No data loss
- P99 latency spike < 200ms during failover
- Automatic recovery when region restored
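
The measurable criteria above could be checked automatically once the test has produced its numbers. The sketch below is one possible shape for such a check. The input field names are hypothetical, and it reads the latency criterion as an absolute P99 below 200ms during failover, which is an interpretation rather than something the criteria spell out. Automatic recovery is not covered because it needs a separate observation after the region is restored.

```javascript
// Evaluate measured failover results against the 3.1 success criteria.
// All input field names are illustrative assumptions.
function evaluateFailover(result) {
  const connectionLossPct =
    ((result.connectionsBefore - result.connectionsAfter) / result.connectionsBefore) * 100;
  const checks = {
    failoverWithin60s: result.failoverSeconds <= 60,
    connectionLossUnder5Pct: connectionLossPct < 5,
    noDataLoss: result.lostWrites === 0,
    p99Under200ms: result.p99DuringFailoverMs < 200,
  };
  return {
    connectionLossPct,
    checks,
    passed: Object.values(checks).every(Boolean),
  };
}

const failoverReport = evaluateFailover({
  connectionsBefore: 500e6,
  connectionsAfter: 485e6,
  failoverSeconds: 42,
  p99DuringFailoverMs: 180,
  lostWrites: 0,
});
console.log(failoverReport.passed); // true
```

Returning the individual checks alongside the overall verdict makes it easier to see which criterion failed when a run does not pass.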

#### 3.2 Multi-Region Cascade Failure
**Objective**: Test disaster recovery

**Configuration**:
- Baseline: 500M concurrent
- Failed regions: europe-west1, europe-west3 (160M connections)
- Failover: Global redistribution
- Test duration: 2 hours
- Progressive failures (15 min apart)

**Procedure**:
1. Run baseline load
2. Kill europe-west1 at T+30m
3. Kill europe-west3 at T+45m
4. Observe cascade prevention
5. Validate global recovery

**Success Criteria**:
- No cascading failures
- Circuit breakers activate
- Graceful degradation if needed
- Connection loss < 10%
- System remains stable

#### 3.3 Database Failover
**Objective**: Test database resilience

**Configuration**:
- Baseline: 500M concurrent
- Database: Trigger Cloud SQL failover to replica
- Query rate: 50K QPS (read-heavy)
- Test duration: 1 hour
- Failure trigger: 20 minutes into test

**Success Criteria**:
- Failover completes within 30 seconds
- Connection pool recovers automatically
- Read queries continue with < 5% errors
- Write queries resume after failover
- No permanent data loss

---

### 4. Workload Scenarios

#### 4.1 Read-Heavy (90% Reads)
**Objective**: Validate cache effectiveness

**Configuration**:
- Total connections: 500M
- Query mix: 90% similarity search, 10% updates
- Cache hit rate target: > 75%
- Query rate: 50K QPS
- Test duration: 2 hours

**Success Criteria**:
- P99 latency < 30ms (due to caching)
- Cache hit rate > 75%
- Database CPU < 50%

#### 4.2 Write-Heavy (40% Writes)
**Objective**: Test write throughput

**Configuration**:
- Total connections: 500M
- Query mix: 60% reads, 40% vector updates
- Query rate: 50K QPS
- Test duration: 2 hours
- Vector dimensions: 768

**Success Criteria**:
- P99 latency < 100ms
- Database CPU < 80%
- Replication lag < 5 seconds
- No write conflicts

#### 4.3 Mixed Workload (Realistic)
**Objective**: Simulate production traffic

**Configuration**:
- Total connections: 500M
- Query mix:
  - 70% similarity search
  - 15% filtered search
  - 10% vector inserts
  - 5% deletes
- Query rate: 50K QPS
- Test duration: 4 hours
- Varying vector dimensions (384, 768, 1536)

**Success Criteria**:
- P99 latency < 50ms
- All operations succeed
- Resource utilization balanced
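
One way a load script might realize the 70/15/10/5 query mix is weighted random selection per request. The sketch below is illustrative: the operation names are hypothetical labels, and the random source is injectable so the selection can be tested deterministically. Passing `() => 0.86`, for example, selects `vector_insert`.

```javascript
// Weights mirror the mixed-workload query mix; operation names are assumed.
const queryMix = [
  { op: "similarity_search", weight: 70 },
  { op: "filtered_search", weight: 15 },
  { op: "vector_insert", weight: 10 },
  { op: "delete", weight: 5 },
];

// Pick one operation with probability proportional to its weight.
function pickOperation(mix, random = Math.random) {
  const total = mix.reduce((sum, entry) => sum + entry.weight, 0);
  let roll = random() * total;
  for (const entry of mix) {
    if (roll < entry.weight) return entry.op;
    roll -= entry.weight;
  }
  return mix[mix.length - 1].op; // guard against floating-point edge cases
}
```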

---

### 5. Stress Scenarios

#### 5.1 Gradual Load Increase
**Objective**: Find breaking point

**Configuration**:
- Start: 100M concurrent
- End: Until system breaks
- Increment: +100M every 30 minutes
- Query rate: Proportional to connections
- Test duration: Until failure

**Success Criteria**:
- Identify maximum capacity
- Measure degradation curve
- Observe failure modes
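
The stepped increase can be generated programmatically instead of being written out by hand. The sketch below builds such a schedule from the configuration values above. The `maxSteps` cap is an added assumption so the generator terminates; in the actual test the run ends when the system breaks.

```javascript
// Build a stepped load schedule: one entry per step, with its start time.
function stepSchedule({ startConnections, stepConnections, stepMinutes, maxSteps }) {
  const steps = [];
  for (let i = 0; i < maxSteps; i++) {
    steps.push({
      startMinute: i * stepMinutes,
      connections: startConnections + i * stepConnections,
    });
  }
  return steps;
}

const schedule = stepSchedule({
  startConnections: 100e6, // Start: 100M concurrent
  stepConnections: 100e6,  // +100M per step
  stepMinutes: 30,         // every 30 minutes
  maxSteps: 10,            // hypothetical safety cap
});
console.log(schedule[4]); // { startMinute: 120, connections: 500000000 }
```

In this schedule the 500M baseline is reached at the fifth step, two hours into the run, so every later step probes capacity beyond the steady-state target.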

#### 5.2 Long-Duration Soak Test
**Objective**: Detect memory leaks and resource exhaustion

**Configuration**:
- Total connections: 500M
- Query rate: 50K QPS
- Test duration: 24 hours
- Pattern: Steady state

**Success Criteria**:
- No memory leaks
- No connection leaks
- Stable performance over time
- Resource cleanup works
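
"No memory leaks" needs an operational definition to be testable. One simple heuristic is to fit a least-squares line through periodic memory samples and flag sustained growth above a tolerance. The sketch below illustrates that heuristic. It is not the project's method, and the sample shape (`hour`, `memoryMb`) and the tolerance are assumptions for illustration.

```javascript
// Least-squares slope of memory usage over time, in MB per hour.
function slopePerHour(samples) {
  const n = samples.length;
  if (n < 2) throw new Error("Need at least two samples");
  const meanX = samples.reduce((sum, p) => sum + p.hour, 0) / n;
  const meanY = samples.reduce((sum, p) => sum + p.memoryMb, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (const p of samples) {
    numerator += (p.hour - meanX) * (p.memoryMb - meanY);
    denominator += (p.hour - meanX) ** 2;
  }
  if (denominator === 0) throw new Error("Samples must span more than one point in time");
  return numerator / denominator;
}

// Flag a probable leak when memory grows faster than the given tolerance.
function looksLikeLeak(samples, maxGrowthMbPerHour) {
  return slopePerHour(samples) > maxGrowthMbPerHour;
}

const growingSamples = [0, 1, 2, 3, 4].map((hour) => ({ hour, memoryMb: 1000 + 10 * hour }));
console.log(slopePerHour(growingSamples));     // 10
console.log(looksLikeLeak(growingSamples, 1)); // true
```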

---

## Test Execution Strategy

### Sequential Execution (Standard Suite)
Total time: ~20 hours

1. Baseline Steady State (4h)
2. Daily Peak (5h)
3. Product Launch 10x (2h)
4. Single Region Failover (1h)
5. Read-Heavy Workload (2h)
6. Write-Heavy Workload (2h)
7. Mixed Workload (4h)

### Burst Suite (Special Events)
Total time: ~8 hours

1. World Cup 50x (3h)
2. Flash Crowd 25x (1.5h)
3. Multi-Region Cascade (2h)
4. Database Failover (1h)

### Quick Validation (Smoke Test)
Total time: ~2 hours

1. Baseline Steady State - 30 minutes
2. Product Launch 10x - 30 minutes
3. Single Region Failover - 30 minutes
4. Mixed Workload - 30 minutes

---

## Monitoring During Tests

### Real-Time Metrics
- Connection count per region
- Query latency percentiles (p50, p95, p99)
- Error rates by type
- CPU/Memory utilization
- Network throughput
- Database connections
- Cache hit rates

### Alerts
- P99 latency > 50ms (warning)
- P99 latency > 100ms (critical)
- Error rate > 1% (warning)
- Error rate > 5% (critical)
- Region unhealthy
- Database connections > 90%
- Cost > $10K/hour
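
The two-level latency and error-rate alerts can be expressed as a small classification function. The sketch below is illustrative only: the function names and the input shape are assumptions, and the thresholds are copied from the list above.

```javascript
// Map a metric value to a severity using warning and critical thresholds.
function classify(value, warningThreshold, criticalThreshold) {
  if (value > criticalThreshold) return "critical";
  if (value > warningThreshold) return "warning";
  return "ok";
}

// Evaluate the latency and error-rate alerts for one metrics snapshot.
function evaluateAlerts({ p99LatencyMs, errorRatePct }) {
  return {
    latency: classify(p99LatencyMs, 50, 100), // > 50ms warning, > 100ms critical
    errorRate: classify(errorRatePct, 1, 5),  // > 1% warning, > 5% critical
  };
}

console.log(evaluateAlerts({ p99LatencyMs: 72, errorRatePct: 0.4 }));
// { latency: 'warning', errorRate: 'ok' }
```

Scenarios that tolerate degraded latency, such as the burst tests, would trip the warning level by design, so per-scenario thresholds may be worth passing in rather than hard-coding.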

### Dashboards
1. Executive: High-level metrics, SLA status
2. Operations: Regional health, resource utilization
3. Cost: Hourly spend, projections
4. Performance: Latency distributions, throughput

---

## Cost Estimates

### Per-Test Costs

| Scenario | Duration | Peak Load | Estimated Cost |
|----------|----------|-----------|----------------|
| Baseline Steady | 4h | 500M | $180 |
| Daily Peak | 5h | 750M | $350 |
| World Cup 50x | 3h | 25B | $80,000 |
| Product Launch 10x | 2h | 5B | $3,600 |
| Flash Crowd 25x | 1.5h | 12.5B | $28,000 |
| Single Region Failover | 1h | 500M | $45 |
| Workload Tests | 2h | 500M | $90 |

### Full Suite Costs
- **Standard Suite**: ~$900
- **Burst Suite**: ~$112K
- **Quick Validation**: ~$150

**Cost Optimization**:
- Use committed use discounts (30% off)
- Run tests in low-cost regions when possible
- Use preemptible instances for load generators
- Leverage CDN caching
- Clean up resources immediately after tests

---

## Pre-Test Checklist

### Infrastructure
- [ ] All regions deployed and healthy
- [ ] Load balancer configured
- [ ] CDN enabled
- [ ] Database replicas ready
- [ ] Redis caches warmed
- [ ] Monitoring dashboards set up
- [ ] Alerting policies active
- [ ] Budget alerts configured

### Load Generation
- [ ] k6 scripts validated
- [ ] Load generators deployed in all regions
- [ ] Test data prepared
- [ ] Baseline traffic running
- [ ] Credentials configured
- [ ] Results storage ready

### Team
- [ ] On-call engineer available
- [ ] Communication channels open (Slack)
- [ ] Runbook reviewed
- [ ] Rollback plan ready
- [ ] Stakeholders notified

---

## Post-Test Analysis

### Deliverables
1. Test execution log
2. Metrics summary (latency, throughput, errors)
3. SLA compliance report
4. Cost breakdown
5. Bottleneck analysis
6. Recommendations document
7. Performance comparison (vs. previous tests)

### Key Questions
- Did we meet SLA targets?
- Where did bottlenecks occur?
- How well did auto-scaling perform?
- Were there any unexpected failures?
- What was the actual cost vs. estimate?
- What improvements should we make?

---

## Example: Running World Cup Test

```bash
# 1. Pre-warm infrastructure
cd /home/user/ruvector/src/burst-scaling
npm run build
node dist/burst-predictor.js --event "World Cup Final" --time "2026-07-15T18:00:00Z"

# 2. Deploy load generators
cd /home/user/ruvector/benchmarks
npm run deploy:generators

# 3. Run scenario
npm run scenario:worldcup -- \
  --regions "europe-west3,southamerica-east1" \
  --peak-multiplier 50 \
  --duration "3h" \
  --enable-notifications

# 4. Monitor (separate terminal)
npm run dashboard

# 5. Collect results
npm run analyze -- --test-id "worldcup-2026-final-test"

# 6. Generate report
npm run report -- --test-id "worldcup-2026-final-test" --format pdf
```

---

## Troubleshooting

### High Error Rates
- Check: Database connection pool exhaustion
- Check: Network bandwidth limits
- Check: Rate limiting too aggressive
- Action: Scale up resources or enable degradation

### High Latency
- Check: Cold cache (low hit rate)
- Check: Database query performance
- Check: Network latency between regions
- Action: Warm caches, optimize queries, adjust routing

### Failed Auto-Scaling
- Check: GCP quotas and limits
- Check: Budget caps
- Check: IAM permissions
- Action: Request quota increase, adjust caps

### Cost Overruns
- Check: Instances not scaling down
- Check: Database overprovisioned
- Check: Excessive logging
- Action: Force scale-in, reduce logging verbosity

---

## Next Steps

1. **Run Quick Validation**: Ensure system is ready
2. **Run Standard Suite**: Comprehensive testing
3. **Schedule Burst Tests**: Coordinate with team (expensive!)
4. **Iterate Based on Results**: Tune thresholds and configurations
5. **Document Learnings**: Update runbooks and architecture docs

---

## References

- [Architecture Overview](/home/user/ruvector/docs/cloud-architecture/architecture-overview.md)
- [Scaling Strategy](/home/user/ruvector/docs/cloud-architecture/scaling-strategy.md)
- [Burst Scaling](/home/user/ruvector/src/burst-scaling/README.md)
- [Benchmarking Guide](/home/user/ruvector/benchmarks/README.md)
- [Operations Runbook](/home/user/ruvector/src/burst-scaling/RUNBOOK.md)

---

**Document Version**: 1.0
**Last Updated**: 2025-11-20
**Author**: RuVector Performance Team