dearsky/wifi-densepose

Fork 0

Files

ruv cd5943df23 Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00

9.4 KiB

Raw Blame History

Benchmark Suite Documentation

Overview

The agentic-synth benchmark suite provides comprehensive performance testing across multiple dimensions:

Data generation throughput
API latency and percentiles
Memory usage profiling
Cache effectiveness
Streaming performance
Concurrent generation scenarios

Quick Start

# Install dependencies
npm install

# Build project
npm run build

# Run all benchmarks
npm run benchmark

# Run specific benchmark
npm run benchmark -- --suite "Throughput Test"

# Run with custom configuration
npm run benchmark -- --iterations 20 --concurrency 200

# Generate report
npm run benchmark -- --output benchmarks/report.md

Benchmark Suites

1. Throughput Benchmark

Measures: Requests per second at various concurrency levels

Configuration:

{
  iterations: 10,
  concurrency: 100,
  maxTokens: 100
}

Targets:

Minimum: 10 req/s
Target: 50+ req/s
Optimal: 100+ req/s

2. Latency Benchmark

Measures: Response time percentiles (P50, P95, P99)

Configuration:

{
  iterations: 50,
  maxTokens: 50
}

Targets:

P50: < 500ms
P95: < 800ms
P99: < 1000ms
Cached: < 100ms

3. Memory Benchmark

Measures: Memory usage patterns and leak detection

Configuration:

{
  iterations: 100,
  maxTokens: 100,
  enableGC: true
}

Targets:

Peak: < 400MB
Final (after GC): < 200MB
No memory leaks

4. Cache Benchmark

Measures: Cache hit rates and effectiveness

Configuration:

{
  cacheSize: 1000,
  ttl: 3600000,
  repeatRatio: 0.5
}

Targets:

Hit rate: > 50%
Optimal: > 80%

5. Concurrency Benchmark

Measures: Performance at various concurrency levels

Tests: 10, 50, 100, 200 concurrent requests

Targets:

10 concurrent: < 2s total
50 concurrent: < 5s total
100 concurrent: < 10s total
200 concurrent: < 20s total

6. Streaming Benchmark

Measures: Streaming performance and time-to-first-byte

Configuration:

{
  maxTokens: 500,
  measureFirstChunk: true
}

Targets:

First chunk: < 200ms
Total duration: < 5s
Chunks: 50-100

CLI Usage

Basic Commands

# Run all benchmarks
agentic-synth benchmark

# Run specific suite
agentic-synth benchmark --suite "Latency Test"

# Custom iterations
agentic-synth benchmark --iterations 20

# Custom concurrency
agentic-synth benchmark --concurrency 200

# Output report
agentic-synth benchmark --output report.md

Advanced Options

# Full configuration
agentic-synth benchmark \
  --suite "All" \
  --iterations 20 \
  --concurrency 100 \
  --warmup 5 \
  --output benchmarks/detailed-report.md

Programmatic Usage

Running Benchmarks

import {
  BenchmarkRunner,
  ThroughputBenchmark,
  LatencyBenchmark,
  BenchmarkAnalyzer,
  BenchmarkReporter
} from '@ruvector/agentic-synth/benchmarks';
import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  enableCache: true,
  maxConcurrency: 100
});

const runner = new BenchmarkRunner();
runner.registerSuite(new ThroughputBenchmark(synth));
runner.registerSuite(new LatencyBenchmark(synth));

const result = await runner.runAll({
  name: 'My Benchmark',
  iterations: 10,
  concurrency: 100,
  warmupIterations: 2,
  timeout: 300000
});

console.log('Throughput:', result.metrics.throughput);
console.log('P99 Latency:', result.metrics.p99LatencyMs);

Analyzing Results

import { BenchmarkAnalyzer } from '@ruvector/agentic-synth/benchmarks';

const analyzer = new BenchmarkAnalyzer();
analyzer.analyze(result);

// Automatic bottleneck detection
// Optimization recommendations
// Performance comparison

Generating Reports

import { BenchmarkReporter } from '@ruvector/agentic-synth/benchmarks';

const reporter = new BenchmarkReporter();

// Markdown report
await reporter.generateMarkdown([result], 'report.md');

// JSON data export
await reporter.generateJSON([result], 'data.json');

CI/CD Integration

GitHub Actions

name: Performance Benchmarks

on: [push, pull_request]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install Dependencies
        run: npm ci

      - name: Build
        run: npm run build

      - name: Run Benchmarks
        run: npm run benchmark:ci
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

      - name: Upload Report
        uses: actions/upload-artifact@v3
        with:
          name: performance-report
          path: benchmarks/performance-report.md

      - name: Check Regression
        run: |
          if [ $? -ne 0 ]; then
            echo "Performance regression detected!"
            exit 1
          fi

GitLab CI

benchmark:
  stage: test
  script:
    - npm ci
    - npm run build
    - npm run benchmark:ci
  artifacts:
    paths:
      - benchmarks/performance-report.md
    when: always
  only:
    - main
    - merge_requests

Performance Regression Detection

The CI runner automatically checks for regressions:

{
  maxP99Latency: 1000,      // 1 second
  minThroughput: 10,        // 10 req/s
  maxMemoryMB: 400,         // 400MB
  minCacheHitRate: 0.5,     // 50%
  maxErrorRate: 0.01        // 1%
}

Exit Codes:

0: All tests passed
1: Performance regression detected

Report Formats

Markdown Report

Includes:

Performance metrics table
Latency distribution
Optimization recommendations
Historical trends
Pass/fail status

JSON Report

Includes:

Raw metrics data
Timestamp
Configuration
Recommendations
Full result objects

Performance Metrics

Collected Metrics

Metric	Description	Unit
throughput	Requests per second	req/s
p50LatencyMs	50th percentile latency	ms
p95LatencyMs	95th percentile latency	ms
p99LatencyMs	99th percentile latency	ms
avgLatencyMs	Average latency	ms
cacheHitRate	Cache hit ratio	0-1
memoryUsageMB	Memory usage	MB
cpuUsagePercent	CPU usage	%
concurrentRequests	Active requests	count
errorRate	Error ratio	0-1

Performance Targets

Category	Metric	Target	Optimal
Speed	P99 Latency	< 1000ms	< 500ms
Speed	Throughput	> 10 req/s	> 50 req/s
Cache	Hit Rate	> 50%	> 80%
Memory	Usage	< 400MB	< 200MB
Reliability	Error Rate	< 1%	< 0.1%

Bottleneck Analysis

Automatic Detection

The analyzer automatically detects:

Latency Bottlenecks
- Slow API responses
- Network issues
- Cache misses
Throughput Bottlenecks
- Low concurrency
- Sequential processing
- API rate limits
Memory Bottlenecks
- Large cache size
- Memory leaks
- Excessive buffering
Cache Bottlenecks
- Low hit rate
- Small cache size
- Poor key strategy

Recommendations

Each bottleneck includes:

Category (cache, routing, memory, etc.)
Severity (low, medium, high, critical)
Issue description
Optimization recommendation
Estimated improvement
Implementation effort

Best Practices

Running Benchmarks

Warmup: Always use warmup iterations (2-5)
Iterations: Use 10+ for statistical significance
Concurrency: Test at expected load levels
Environment: Run in consistent environment
Monitoring: Watch system resources

Analyzing Results

Trends: Compare across multiple runs
Baselines: Establish performance baselines
Regressions: Set up automated checks
Profiling: Profile bottlenecks before optimizing
Documentation: Document optimization changes

CI/CD Integration

Automation: Run on every PR/commit
Thresholds: Set realistic regression thresholds
Artifacts: Save reports and data
Notifications: Alert on regressions
History: Track performance over time

Troubleshooting

Common Issues

High Variance:

Increase warmup iterations
Run more iterations
Check system load

API Errors:

Verify API key
Check rate limits
Review network connectivity

Out of Memory:

Reduce concurrency
Decrease cache size
Enable GC

Slow Benchmarks:

Reduce iterations
Decrease concurrency
Use smaller maxTokens

Advanced Features

Custom Benchmarks

import { BenchmarkSuite } from '@ruvector/agentic-synth/benchmarks';

class CustomBenchmark implements BenchmarkSuite {
  name = 'Custom Test';

  async run(): Promise<void> {
    // Your benchmark logic
  }
}

runner.registerSuite(new CustomBenchmark());

Custom Thresholds

import { BottleneckAnalyzer } from '@ruvector/agentic-synth/benchmarks';

const analyzer = new BottleneckAnalyzer();
analyzer.setThresholds({
  maxP99LatencyMs: 500,    // Stricter than default
  minThroughput: 50,       // Higher than default
  maxMemoryMB: 300         // Lower than default
});

Performance Hooks

# Pre-benchmark hook
npx claude-flow@alpha hooks pre-task --description "Benchmarking"

# Post-benchmark hook
npx claude-flow@alpha hooks post-task --task-id "bench-123"

9.4 KiB Raw Blame History

Benchmark Suite Documentation

Overview

Quick Start

Benchmark Suites

1. Throughput Benchmark

2. Latency Benchmark

3. Memory Benchmark

4. Cache Benchmark

5. Concurrency Benchmark

6. Streaming Benchmark

CLI Usage

Basic Commands

Advanced Options

Programmatic Usage

Running Benchmarks

Analyzing Results

Generating Reports

CI/CD Integration

GitHub Actions

GitLab CI

Performance Regression Detection

Report Formats

Markdown Report

JSON Report

Performance Metrics

Collected Metrics

Performance Targets

Bottleneck Analysis

Automatic Detection

Recommendations

Best Practices

Running Benchmarks

Analyzing Results

CI/CD Integration

Troubleshooting

Common Issues

Advanced Features

Custom Benchmarks

Custom Thresholds

Performance Hooks

Resources

9.4 KiB

Raw Blame History