RuVector Load Testing Scenarios

Overview

This document defines comprehensive load testing scenarios for the globally distributed RuVector system, targeting 500 million concurrent learning streams with burst capacity up to 25 billion.

Test Environment

Global Regions

  • Americas: us-central1, us-east1, us-west1, southamerica-east1
  • Europe: europe-west1, europe-west3, europe-north1
  • Asia-Pacific: asia-east1, asia-southeast1, asia-northeast1, australia-southeast1
  • Total: 11 regions

Infrastructure

  • Cloud Run: Auto-scaling instances (10-1000 per region)
  • Load Balancer: Global HTTPS LB with Cloud CDN
  • Database: Cloud SQL PostgreSQL (multi-region)
  • Cache: Memorystore Redis (128GB per region)
  • Monitoring: Cloud Monitoring + OpenTelemetry

Scenario Categories

1. Baseline Scenarios

1.1 Steady State (500M Concurrent)

Objective: Validate system handles target baseline load

Configuration:

  • Total connections: 500M globally
  • Distribution: Proportional to region capacity
    • Tier-1 regions (5): 80M each = 400M
    • Tier-2 regions (6): ~16.7M each = 100M
  • Query rate: 50K QPS globally
  • Test duration: 4 hours
  • Ramp-up: 30 minutes

Success Criteria:

  • P99 latency < 50ms
  • P50 latency < 10ms
  • Error rate < 0.1%
  • No memory leaks
  • CPU utilization 60-80%
  • All regions healthy
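
The numeric criteria above can be checked mechanically once a run has produced raw samples. The sketch below is one way to do that, assuming latencies are collected in milliseconds and using the nearest-rank percentile definition. The function names and input shapes are illustrative and not part of the RuVector tooling.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Thresholds copied from the Steady State success criteria.
const STEADY_STATE = { p99Ms: 50, p50Ms: 10, maxErrorRate: 0.001 };

// Returns the names of the violated criteria (an empty array means pass).
function checkSteadyState(latenciesMs, errors, total) {
  const failures = [];
  if (percentile(latenciesMs, 99) >= STEADY_STATE.p99Ms) failures.push("p99");
  if (percentile(latenciesMs, 50) >= STEADY_STATE.p50Ms) failures.push("p50");
  if (errors / total >= STEADY_STATE.maxErrorRate) failures.push("errorRate");
  return failures;
}
```

An empty result covers only the latency and error-rate criteria; memory, CPU, and region health still have to be read from monitoring.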

Load Pattern:

{
  type: "ramped-arrival-rate",
  stages: [
    { duration: "30m", target: 50000 }, // Ramp up
    { duration: "4h",  target: 50000 }, // Steady
    { duration: "15m", target: 0 }      // Ramp down
  ]
}
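
If the load generators are k6-based (the pre-test checklist mentions K6 scripts), the pattern above could be expressed with k6's `ramping-arrival-rate` executor. The sketch below is an assumption about that mapping, not the project's actual script: the `generators` split, the VU pool sizes, and the `steady_state` scenario key are hypothetical.

```javascript
// Stage list copied from the Steady State load pattern.
const steadyStateStages = [
  { duration: "30m", target: 50000 }, // Ramp up
  { duration: "4h", target: 50000 },  // Steady
  { duration: "15m", target: 0 },     // Ramp down
];

// Builds one k6 scenario. Targets are global, so each generator takes an equal share.
function toK6Scenario(stages, { generators = 1, preAllocatedVUs = 1000, maxVUs = 20000 } = {}) {
  return {
    executor: "ramping-arrival-rate", // open model: arrival rate does not depend on response time
    startRate: 0,                     // iterations per timeUnit at the start of the first stage
    timeUnit: "1s",                   // stage targets are iterations per second
    preAllocatedVUs,                  // hypothetical pool size, tune per generator
    maxVUs,                           // hypothetical upper bound
    stages: stages.map(({ duration, target }) => ({
      duration,
      target: Math.ceil(target / generators),
    })),
  };
}

// In a k6 script this object would be exported as `options`.
const k6Options = {
  scenarios: { steady_state: toK6Scenario(steadyStateStages, { generators: 11 }) },
};
```

Splitting the global target across generators keeps any single generator's rate within what one machine can plausibly sustain; the even split is a simplification.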

1.2 Daily Peak (750M Concurrent)

Objective: Handle 1.5x baseline during peak hours

Configuration:

  • Total connections: 750M globally
  • Peak hours: 18:00-22:00 local time per region
  • Query rate: 75K QPS
  • Test duration: 5 hours
  • Multiple peaks (simulate time zones)

Success Criteria:

  • P99 latency < 75ms
  • P50 latency < 15ms
  • Error rate < 0.5%
  • Auto-scaling triggers within 60s
  • Cost < $5K for test
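
To simulate the per-region peaks, the test driver needs to know which regions are inside their 18:00-22:00 local window at a given moment. The sketch below assumes fixed standard-time UTC offsets for each region's physical location and ignores daylight saving time; the offsets table and function name are illustrative.

```javascript
// Standard-time UTC offsets (assumed; daylight saving time is ignored).
const UTC_OFFSET_HOURS = {
  "us-central1": -6,
  "us-east1": -5,
  "us-west1": -8,
  "southamerica-east1": -3,
  "europe-west1": 1,
  "europe-west3": 1,
  "europe-north1": 2,
  "asia-east1": 8,
  "asia-southeast1": 8,
  "asia-northeast1": 9,
  "australia-southeast1": 10,
};

// Returns the regions whose local hour falls in the 18:00-22:00 peak window.
function regionsAtPeak(utcHour) {
  return Object.entries(UTC_OFFSET_HOURS)
    .filter(([, offset]) => {
      const localHour = (((utcHour + offset) % 24) + 24) % 24;
      return localHour >= 18 && localHour < 22;
    })
    .map(([region]) => region);
}
```

For example, `regionsAtPeak(18)` selects the three European regions, so a 5-hour run starting at 18:00 UTC would mostly exercise Europe before the Americas enter their window.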

2. Burst Scenarios

2.1 World Cup Final (50x Burst)

Objective: Handle massive spike during major sporting event

Event Profile:

  • Pre-event: 30 minutes before kickoff
  • Peak: During match (90 minutes of play + 15 min halftime, plus up to 30 min of extra time)
  • Post-event: 60 minutes after final whistle
  • Geography: Concentrated in specific regions (France, Argentina)

Configuration:

  • Baseline: 500M concurrent
  • Peak: 25B concurrent (50x)
  • Primary regions: europe-west3 (serving France), southamerica-east1 (serving Argentina)
  • Secondary spillover: All Europe/Americas regions
  • Query rate: 2.5M QPS at peak
  • Test duration: 3 hours

Load Pattern:

{
  stages: [
    // Pre-event buzz (30 min before)
    { duration: "30m", target: 500000 },   // 10x baseline
    { duration: "15m", target: 2500000 },  // 50x PEAK
    // First half (45 min)
    { duration: "45m", target: 2500000 },  // Sustained peak
    // Halftime (15 min - slight drop)
    { duration: "15m", target: 1500000 },  // 30x
    // Second half (45 min)
    { duration: "45m", target: 2500000 },  // Back to peak
    // Extra time / penalties (30 min)
    { duration: "30m", target: 3000000 },  // 60x SUPER PEAK
    // Post-game analysis (30 min)
    { duration: "30m", target: 1000000 },  // 20x
    // Gradual decline (30 min)
    { duration: "30m", target: 100000 }    // 2x
  ]
}
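
Before an expensive run it is worth checking that the stage list agrees with the declared duration and multiplier. The sketch below sums the stage durations and derives the peak multiplier from the 50K QPS baseline; the helper names are illustrative, and only the `s`, `m`, and `h` duration suffixes are handled.

```javascript
// Stage list copied from the World Cup load pattern.
const worldCupStages = [
  { duration: "30m", target: 500000 },
  { duration: "15m", target: 2500000 },
  { duration: "45m", target: 2500000 },
  { duration: "15m", target: 1500000 },
  { duration: "45m", target: 2500000 },
  { duration: "30m", target: 3000000 },
  { duration: "30m", target: 1000000 },
  { duration: "30m", target: 100000 },
];

// Converts "30m", "4h", or "90s" to minutes.
function toMinutes(duration) {
  const match = /^(\d+)([smh])$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  const value = Number(match[1]);
  return { s: value / 60, m: value, h: value * 60 }[match[2]];
}

// Total run time and peak rate relative to the baseline query rate.
function summarize(stages, baselineQps) {
  const totalMinutes = stages.reduce((sum, s) => sum + toMinutes(s.duration), 0);
  const peakQps = Math.max(...stages.map((s) => s.target));
  return { totalMinutes, peakQps, peakMultiplier: peakQps / baselineQps };
}

const worldCupSummary = summarize(worldCupStages, 50000);
```

Applied to the stages above, the sketch reports 240 minutes in total and a 60x peak (the extra-time stage), so the declared 3-hour duration and 50x multiplier may need to be reconciled with the stage list before a run.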

Regional Distribution:

  • France: 40% (10B peak)
  • Argentina: 35% (8.75B peak)
  • Spain/Italy/Portugal: 10% (2.5B peak)
  • Rest of Europe: 8% (2B peak)
  • Americas: 5% (1.25B peak)
  • Asia/Pacific: 2% (500M peak)
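
The per-geography peaks follow directly from the percentage shares and the 25B peak. A small sketch of that arithmetic, with a guard that the shares add up to 100% (the `allocate` helper is illustrative):

```javascript
// Shares in percent, copied from the regional distribution above.
const WORLD_CUP_SHARES = {
  France: 40,
  Argentina: 35,
  "Spain/Italy/Portugal": 10,
  "Rest of Europe": 8,
  Americas: 5,
  "Asia/Pacific": 2,
};

// Splits a total connection count across geographies by percentage share.
function allocate(totalConnections, sharesPercent) {
  const sum = Object.values(sharesPercent).reduce((a, b) => a + b, 0);
  if (sum !== 100) throw new Error(`shares add up to ${sum}%, expected 100%`);
  return Object.fromEntries(
    Object.entries(sharesPercent).map(([geo, pct]) => [geo, (totalConnections * pct) / 100])
  );
}

const worldCupPeaks = allocate(25e9, WORLD_CUP_SHARES);
```

The same helper can be reused for other burst sizes by changing the total.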

Success Criteria:

  • System survives without crash
  • P99 latency < 200ms (degraded acceptable)
  • P50 latency < 50ms
  • Error rate < 5% (acceptable during super peak)
  • Auto-scaling completes within 10 minutes
  • No cascading failures
  • Graceful degradation activated when needed
  • Cost < $100K for full test

Pre-warming:

  • Enable predictive scaling 15 minutes before test
  • Pre-allocate 25x capacity in primary regions
  • Warm up CDN caches
  • Increase database connection pools

2.2 Product Launch (10x Burst)

Objective: Handle viral traffic spike (e.g., AI model release)

Configuration:

  • Baseline: 500M concurrent
  • Peak: 5B concurrent (10x)
  • Distribution: Global, concentrated in US
  • Query rate: 500K QPS
  • Test duration: 2 hours
  • Pattern: Sudden spike, gradual decline

Load Pattern:

{
  stages: [
    { duration: "5m",  target: 500000 },  // 10x instant spike
    { duration: "30m", target: 500000 },  // Sustained
    { duration: "45m", target: 300000 },  // Gradual decline
    { duration: "40m", target: 100000 }   // Decline to 2x baseline
  ]
}

Success Criteria:

  • Reactive scaling responds within 60s
  • P99 latency < 100ms
  • Error rate < 2%
  • No downtime

2.3 Flash Crowd (25x Burst)

Objective: Unpredictable viral event

Configuration:

  • Baseline: 500M concurrent
  • Peak: 12.5B concurrent (25x)
  • Geography: Unpredictable in production (use US regions for the test)
  • Query rate: 1.25M QPS
  • Test duration: 90 minutes
  • Pattern: Very rapid spike (< 2 minutes)

Load Pattern:

{
  stages: [
    { duration: "2m",  target: 1250000 }, // 25x in 2 minutes!
    { duration: "30m", target: 1250000 }, // Hold peak
    { duration: "30m", target: 750000 },  // Decline
    { duration: "28m", target: 100000 }   // Decline to 2x baseline
  ]
}
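
To see how steep this ramp is, the target rate at any moment can be interpolated from the stage list. The sketch below assumes the load tool ramps linearly between stage targets and that the test starts from the 50K QPS baseline; both are assumptions, and the stage durations are restated in seconds for simplicity.

```javascript
// Flash Crowd stages from the pattern above, with durations in seconds.
const flashCrowdStages = [
  { seconds: 120, target: 1250000 },  // 2m
  { seconds: 1800, target: 1250000 }, // 30m
  { seconds: 1800, target: 750000 },  // 30m
  { seconds: 1680, target: 100000 },  // 28m
];

// Target rate at tSeconds, interpolating linearly within each stage.
function rateAt(stages, startRate, tSeconds) {
  let from = startRate;
  let elapsed = 0;
  for (const { seconds, target } of stages) {
    if (tSeconds <= elapsed + seconds) {
      const fraction = (tSeconds - elapsed) / seconds;
      return from + (target - from) * fraction;
    }
    elapsed += seconds;
    from = target;
  }
  return from; // past the last stage: hold the final target
}
```

Under these assumptions the first stage climbs by 10,000 QPS every second, which is the pace reactive scaling would have to match.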

Success Criteria:

  • System survives without manual intervention
  • Reactive scaling activates immediately
  • P99 latency < 150ms
  • Error rate < 3%
  • Cost cap respected

3. Failover Scenarios

3.1 Single Region Failure

Objective: Validate regional failover

Configuration:

  • Baseline: 500M concurrent
  • Failed region: europe-west1 (80M connections)
  • Failover targets: europe-west3, europe-north1
  • Query rate: 50K QPS
  • Test duration: 1 hour
  • Failure trigger: 30 minutes into test

Procedure:

  1. Run baseline load for 30 minutes
  2. Simulate region failure (kill all instances in europe-west1)
  3. Observe failover behavior
  4. Measure recovery time
  5. Validate data consistency

Success Criteria:

  • Failover completes within 60 seconds
  • Connection loss < 5%
  • No data loss
  • P99 latency spike < 200ms during failover
  • Automatic recovery when region restored
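
Failover time and connection loss can be derived from the monitoring time series after the run. The sketch below assumes samples of the form `{ t, connections, errorRate }` with `t` in seconds since test start, and treats the system as recovered at the first sample after the last unhealthy one. The 0.1% healthy threshold is borrowed from the baseline criteria; the sample shape and function name are hypothetical.

```javascript
// samples: [{ t: seconds since start, connections, errorRate }], sorted by t.
function failoverStats(samples, failureT, { healthyErrorRate = 0.001 } = {}) {
  const before = samples.filter((s) => s.t < failureT);
  const after = samples.filter((s) => s.t >= failureT);
  if (before.length === 0 || after.length === 0) {
    throw new Error("need samples on both sides of the failure");
  }
  const baseline = before[before.length - 1].connections;
  const lowest = Math.min(...after.map((s) => s.connections));

  // Index of the last sample that is still above the healthy error rate.
  let lastBad = -1;
  after.forEach((s, i) => {
    if (s.errorRate > healthyErrorRate) lastBad = i;
  });
  const recovered = after[lastBad + 1]; // undefined if unhealthy until the end

  return {
    connectionLoss: (baseline - lowest) / baseline,
    failoverSeconds: lastBad === -1 ? 0 : recovered ? recovered.t - failureT : null,
  };
}
```

A `failoverSeconds` of `null` means the error rate never returned to healthy within the observed window. The resolution of the result is limited by the sampling interval.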

3.2 Multi-Region Cascade Failure

Objective: Test disaster recovery

Configuration:

  • Baseline: 500M concurrent
  • Failed regions: europe-west1, europe-west3 (160M connections)
  • Failover: Global redistribution
  • Test duration: 2 hours
  • Progressive failures (15 min apart)

Procedure:

  1. Run baseline load
  2. Kill europe-west1 at T+30m
  3. Kill europe-west3 at T+45m
  4. Observe cascade prevention
  5. Validate global recovery

Success Criteria:

  • No cascading failures
  • Circuit breakers activate
  • Graceful degradation if needed
  • Connection loss < 10%
  • System remains stable
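
For reference, the circuit-breaker behavior this scenario expects can be illustrated with a minimal synchronous sketch. It is a generic version of the pattern, not RuVector's implementation; the thresholds, class name, and injectable clock are assumptions made for illustration.

```javascript
// Minimal circuit breaker: opens after repeated failures, probes again after a cool-down.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetAfterMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null;
  }

  get state() {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.resetAfterMs ? "half-open" : "open";
  }

  // Runs fn unless the circuit is open; tracks the outcome.
  call(fn) {
    if (this.state === "open") throw new Error("circuit open");
    try {
      const result = fn();
      this.failures = 0;
      this.openedAt = null; // a success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed probe in half-open state, or too many failures, (re)opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

While open, calls fail fast without touching the failed region, which is what keeps load from piling onto the surviving regions during the cascade test.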

3.3 Database Failover

Objective: Test database resilience

Configuration:

  • Baseline: 500M concurrent
  • Database: Trigger Cloud SQL failover to replica
  • Query rate: 50K QPS (read-heavy)
  • Test duration: 1 hour
  • Failure trigger: 20 minutes into test

Success Criteria:

  • Failover completes within 30 seconds
  • Connection pool recovers automatically
  • Read queries continue with < 5% errors
  • Write queries resume after failover
  • No permanent data loss

4. Workload Scenarios

4.1 Read-Heavy (90% Reads)

Objective: Validate cache effectiveness

Configuration:

  • Total connections: 500M
  • Query mix: 90% similarity search, 10% updates
  • Cache hit rate target: > 75%
  • Query rate: 50K QPS
  • Test duration: 2 hours

Success Criteria:

  • P99 latency < 30ms (due to caching)
  • Cache hit rate > 75%
  • Database CPU < 50%
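
The cache hit rate target translates into an expected database query rate. The sketch below works through that arithmetic under two simplifying assumptions: every write reaches the database, and every read that misses the cache costs exactly one database query.

```javascript
// Estimates the query rate that reaches the database behind the cache.
function databaseQps({ totalQps, readFraction, cacheHitRate }) {
  const reads = totalQps * readFraction;
  const writes = totalQps - reads;
  const readMisses = reads * (1 - cacheHitRate);
  return { readMisses, writes, total: readMisses + writes };
}

// Read-Heavy scenario: 50K QPS, 90% reads, 75% cache hit rate.
const readHeavyDbLoad = databaseQps({ totalQps: 50000, readFraction: 0.9, cacheHitRate: 0.75 });
```

With these inputs roughly 11,250 read misses and 5,000 writes per second reach the database, about a third of the total query rate.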

4.2 Write-Heavy (40% Writes)

Objective: Test write throughput

Configuration:

  • Total connections: 500M
  • Query mix: 60% reads, 40% vector updates
  • Query rate: 50K QPS
  • Test duration: 2 hours
  • Vector dimensions: 768

Success Criteria:

  • P99 latency < 100ms
  • Database CPU < 80%
  • Replication lag < 5 seconds
  • No write conflicts
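
A rough estimate of raw write volume helps size replication bandwidth for this scenario. The sketch below assumes each vector component is a 4-byte float and ignores metadata, index maintenance, and protocol overhead, so it is a lower bound rather than a measured figure.

```javascript
// Raw vector payload written per second, in bytes.
function writeBytesPerSecond({ totalQps, writePercent, dimensions, bytesPerComponent = 4 }) {
  const writesPerSecond = (totalQps * writePercent) / 100;
  return writesPerSecond * dimensions * bytesPerComponent;
}

// Write-Heavy scenario: 50K QPS, 40% writes, 768-dimensional vectors.
const writeHeavyBytes = writeBytesPerSecond({ totalQps: 50000, writePercent: 40, dimensions: 768 });
```

Under these assumptions the scenario writes about 61 MB of vector data per second globally.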

4.3 Mixed Workload (Realistic)

Objective: Simulate production traffic

Configuration:

  • Total connections: 500M
  • Query mix:
    • 70% similarity search
    • 15% filtered search
    • 10% vector inserts
    • 5% deletes
  • Query rate: 50K QPS
  • Test duration: 4 hours
  • Varying vector dimensions (384, 768, 1536)

Success Criteria:

  • P99 latency < 50ms
  • All operations succeed
  • Resource utilization balanced
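
A load script can reproduce the query mix by drawing each operation from a weighted distribution. A minimal sketch, with hypothetical operation names and an injectable random source so the selection can be tested deterministically:

```javascript
// Weights in percent, copied from the Mixed Workload query mix.
const MIXED_WORKLOAD = [
  { op: "similarity_search", weight: 70 },
  { op: "filtered_search", weight: 15 },
  { op: "vector_insert", weight: 10 },
  { op: "delete", weight: 5 },
];

// Picks one operation with probability proportional to its weight.
function pickOperation(mix, random = Math.random) {
  const total = mix.reduce((sum, m) => sum + m.weight, 0);
  let r = random() * total;
  for (const { op, weight } of mix) {
    if (r < weight) return op;
    r -= weight;
  }
  return mix[mix.length - 1].op; // guards against floating-point edge cases
}
```

The vector dimension (384, 768, or 1536) could be drawn the same way if the proportions were specified.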

5. Stress Scenarios

5.1 Gradual Load Increase

Objective: Find breaking point

Configuration:

  • Start: 100M concurrent
  • End: Until system breaks
  • Increment: +100M every 30 minutes
  • Query rate: Proportional to connections (10K QPS per 100M, matching 50K QPS at 500M)
  • Test duration: Until failure

Success Criteria:

  • Identify maximum capacity
  • Measure degradation curve
  • Observe failure modes
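
The step schedule and the resulting capacity estimate can be sketched as follows. The document does not define what counts as "breaking", so the limits passed to `maxSustainedLoad` are hypothetical, as are the function names and the per-step result shape.

```javascript
// Generates the load steps: start at `start`, add `increment` every `stepMinutes`.
function stepSchedule(start, increment, stepMinutes, steps) {
  return Array.from({ length: steps }, (_, i) => ({
    atMinute: i * stepMinutes,
    connections: start + i * increment,
  }));
}

// Highest connection level reached before the first step that violates the limits.
// results: [{ connections, p99Ms, errorRate }] in the order the steps were run.
function maxSustainedLoad(results, { maxP99Ms, maxErrorRate }) {
  let best = null;
  for (const r of results) {
    if (r.p99Ms > maxP99Ms || r.errorRate > maxErrorRate) break;
    best = r.connections;
  }
  return best;
}
```

Stopping at the first violation, rather than taking the largest passing step, avoids counting a level that only looked healthy because the system had already shed load.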

5.2 Long-Duration Soak Test

Objective: Detect memory leaks and resource exhaustion

Configuration:

  • Total connections: 500M
  • Query rate: 50K QPS
  • Test duration: 24 hours
  • Pattern: Steady state

Success Criteria:

  • No memory leaks
  • No connection leaks
  • Stable performance over time
  • Resource cleanup works
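
"No memory leaks" is easier to assert with a numeric test. One option is to fit a least-squares line to memory samples over the 24 hours and flag sustained growth. The sketch below assumes samples of `[hours, megabytes]` pairs; the 5 MB/hour default threshold is a placeholder, not a project value.

```javascript
// Least-squares slope of y over x for points [[x, y], ...].
function slope(points) {
  const n = points.length;
  if (n < 2) throw new Error("need at least two points");
  const meanX = points.reduce((sum, [x]) => sum + x, 0) / n;
  const meanY = points.reduce((sum, [, y]) => sum + y, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (const [x, y] of points) {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) ** 2;
  }
  if (denominator === 0) throw new Error("all x values are identical");
  return numerator / denominator;
}

// Flags a run whose memory grows faster than the allowed rate.
function looksLikeLeak(memorySamples, maxGrowthMbPerHour = 5) {
  return slope(memorySamples) > maxGrowthMbPerHour;
}
```

The same check applies to open connection counts for the connection-leak criterion. Discarding the warm-up period before fitting avoids mistaking cache fill for a leak.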

Test Execution Strategy

Sequential Execution (Standard Suite)

Total time: ~20 hours

  1. Baseline Steady State (4h)
  2. Daily Peak (5h)
  3. Product Launch 10x (2h)
  4. Single Region Failover (1h)
  5. Read-Heavy Workload (2h)
  6. Write-Heavy Workload (2h)
  7. Mixed Workload (4h)

Burst Suite (Special Events)

Total time: ~8 hours

  1. World Cup 50x (3h)
  2. Flash Crowd 25x (1.5h)
  3. Multi-Region Cascade (2h)
  4. Database Failover (1h)

Quick Validation (Smoke Test)

Total time: ~2 hours

  1. Baseline Steady State - 30 minutes
  2. Product Launch 10x - 30 minutes
  3. Single Region Failover - 30 minutes
  4. Mixed Workload - 30 minutes

Monitoring During Tests

Real-Time Metrics

  • Connection count per region
  • Query latency percentiles (p50, p95, p99)
  • Error rates by type
  • CPU/Memory utilization
  • Network throughput
  • Database connections
  • Cache hit rates

Alerts

  • P99 latency > 50ms (warning)
  • P99 latency > 100ms (critical)
  • Error rate > 1% (warning)
  • Error rate > 5% (critical)
  • Region unhealthy
  • Database connections > 90%
  • Cost > $10K/hour
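
The thresholds above can be expressed as a single evaluation function, which is useful for replaying recorded metrics against the alert policy. This is a sketch with hypothetical alert names and input fields. The list gives no severity for the database and cost alerts, so they are marked `unspecified`, and the region health alert is omitted because it has no numeric threshold.

```javascript
// Returns the alerts that would fire for one metrics snapshot.
function activeAlerts({ p99Ms, errorRate, dbConnectionUtilization, costPerHourUsd }) {
  const alerts = [];
  if (p99Ms > 100) alerts.push({ name: "p99_latency", severity: "critical" });
  else if (p99Ms > 50) alerts.push({ name: "p99_latency", severity: "warning" });

  if (errorRate > 0.05) alerts.push({ name: "error_rate", severity: "critical" });
  else if (errorRate > 0.01) alerts.push({ name: "error_rate", severity: "warning" });

  // Severity is not stated for the remaining alerts in the list above.
  if (dbConnectionUtilization > 0.9) alerts.push({ name: "db_connections", severity: "unspecified" });
  if (costPerHourUsd > 10000) alerts.push({ name: "cost", severity: "unspecified" });
  return alerts;
}
```

The burst scenarios accept P99 latencies up to 200 ms, so these thresholds would fire during an otherwise successful burst run; scenario-specific overrides may be needed.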

Dashboards

  1. Executive: High-level metrics, SLA status
  2. Operations: Regional health, resource utilization
  3. Cost: Hourly spend, projections
  4. Performance: Latency distributions, throughput

Cost Estimates

Per-Test Costs

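| Scenario               | Duration | Peak Load | Estimated Cost |
|------------------------|----------|-----------|----------------|
| Baseline Steady        | 4h       | 500M      | $180           |
| Daily Peak             | 5h       | 750M      | $350           |
| World Cup 50x          | 3h       | 25B       | $80,000        |
| Product Launch 10x     | 2h       | 5B        | $3,600         |
| Flash Crowd 25x        | 1.5h     | 12.5B     | $28,000        |
| Single Region Failover | 1h       | 500M      | $45            |
| Workload Tests         | 2h       | 500M      | $90            |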

Full Suite Costs

  • Standard Suite: ~$900 (excluding Product Launch 10x at ~$3,600, which is counted in the burst figure)
  • Burst Suite: ~$112K
  • Quick Validation: ~$150

Cost Optimization:

  • Use committed use discounts (30% off)
  • Run tests in low-cost regions when possible
  • Use preemptible instances for load generators
  • Leverage CDN caching
  • Clean up resources immediately after tests

Pre-Test Checklist

Infrastructure

  • All regions deployed and healthy
  • Load balancer configured
  • CDN enabled
  • Database replicas ready
  • Redis caches warmed
  • Monitoring dashboards set up
  • Alerting policies active
  • Budget alerts configured

Load Generation

  • K6 scripts validated
  • Load generators deployed in all regions
  • Test data prepared
  • Baseline traffic running
  • Credentials configured
  • Results storage ready

Team

  • On-call engineer available
  • Communication channels open (Slack)
  • Runbook reviewed
  • Rollback plan ready
  • Stakeholders notified

Post-Test Analysis

Deliverables

  1. Test execution log
  2. Metrics summary (latency, throughput, errors)
  3. SLA compliance report
  4. Cost breakdown
  5. Bottleneck analysis
  6. Recommendations document
  7. Performance comparison (vs. previous tests)

Key Questions

  • Did we meet SLA targets?
  • Where did bottlenecks occur?
  • How well did auto-scaling perform?
  • Were there any unexpected failures?
  • What was the actual cost vs. estimate?
  • What improvements should we make?

Example: Running World Cup Test

# 1. Pre-warm infrastructure
cd /home/user/ruvector/src/burst-scaling
npm run build
node dist/burst-predictor.js --event "World Cup Final" --time "2026-07-15T18:00:00Z"

# 2. Deploy load generators
cd /home/user/ruvector/benchmarks
npm run deploy:generators

# 3. Run scenario
npm run scenario:worldcup -- \
  --regions "europe-west3,southamerica-east1" \
  --peak-multiplier 50 \
  --duration "3h" \
  --enable-notifications

# 4. Monitor (separate terminal)
npm run dashboard

# 5. Collect results
npm run analyze -- --test-id "worldcup-2026-final-test"

# 6. Generate report
npm run report -- --test-id "worldcup-2026-final-test" --format pdf

Troubleshooting

High Error Rates

  • Check: Database connection pool exhaustion
  • Check: Network bandwidth limits
  • Check: Rate limiting too aggressive
  • Action: Scale up resources or enable degradation

High Latency

  • Check: Cold cache (low hit rate)
  • Check: Database query performance
  • Check: Network latency between regions
  • Action: Warm caches, optimize queries, adjust routing

Failed Auto-Scaling

  • Check: GCP quotas and limits
  • Check: Budget caps
  • Check: IAM permissions
  • Action: Request quota increase, adjust caps

Cost Overruns

  • Check: Instances not scaling down
  • Check: Database overprovisioned
  • Check: Excessive logging
  • Action: Force scale-in, reduce logging verbosity

Next Steps

  1. Run Quick Validation: Ensure system is ready
  2. Run Standard Suite: Comprehensive testing
  3. Schedule Burst Tests: Coordinate with team (expensive!)
  4. Iterate Based on Results: Tune thresholds and configurations
  5. Document Learnings: Update runbooks and architecture docs

Document Version: 1.0
Last Updated: 2025-11-20
Author: RuVector Performance Team