Files
wifi-densepose/vendor/ruvector/npm/packages/agentic-synth/docs/BENCHMARKS.md

9.4 KiB

Benchmark Suite Documentation

Overview

The agentic-synth benchmark suite provides comprehensive performance testing across multiple dimensions:

  • Data generation throughput
  • API latency and percentiles
  • Memory usage profiling
  • Cache effectiveness
  • Streaming performance
  • Concurrent generation scenarios

Quick Start

# Install dependencies
npm install

# Build project
npm run build

# Run all benchmarks
npm run benchmark

# Run specific benchmark
npm run benchmark -- --suite "Throughput Test"

# Run with custom configuration
npm run benchmark -- --iterations 20 --concurrency 200

# Generate report
npm run benchmark -- --output benchmarks/report.md

Benchmark Suites

1. Throughput Benchmark

Measures: Requests per second at various concurrency levels

Configuration:

{
  iterations: 10,
  concurrency: 100,
  maxTokens: 100
}

Targets:

  • Minimum: 10 req/s
  • Target: 50+ req/s
  • Optimal: 100+ req/s

2. Latency Benchmark

Measures: Response time percentiles (P50, P95, P99)

Configuration:

{
  iterations: 50,
  maxTokens: 50
}

Targets:

  • P50: < 500ms
  • P95: < 800ms
  • P99: < 1000ms
  • Cached: < 100ms

3. Memory Benchmark

Measures: Memory usage patterns and leak detection

Configuration:

{
  iterations: 100,
  maxTokens: 100,
  enableGC: true
}

Targets:

  • Peak: < 400MB
  • Final (after GC): < 200MB
  • No memory leaks

4. Cache Benchmark

Measures: Cache hit rates and effectiveness

Configuration:

{
  cacheSize: 1000,
  ttl: 3600000,
  repeatRatio: 0.5
}

Targets:

  • Hit rate: > 50%
  • Optimal: > 80%

5. Concurrency Benchmark

Measures: Performance at various concurrency levels

Tests: 10, 50, 100, 200 concurrent requests

Targets:

  • 10 concurrent: < 2s total
  • 50 concurrent: < 5s total
  • 100 concurrent: < 10s total
  • 200 concurrent: < 20s total

6. Streaming Benchmark

Measures: Streaming performance and time-to-first-byte

Configuration:

{
  maxTokens: 500,
  measureFirstChunk: true
}

Targets:

  • First chunk: < 200ms
  • Total duration: < 5s
  • Chunks: 50-100

CLI Usage

Basic Commands

# Run all benchmarks
agentic-synth benchmark

# Run specific suite
agentic-synth benchmark --suite "Latency Test"

# Custom iterations
agentic-synth benchmark --iterations 20

# Custom concurrency
agentic-synth benchmark --concurrency 200

# Output report
agentic-synth benchmark --output report.md

Advanced Options

# Full configuration
agentic-synth benchmark \
  --suite "All" \
  --iterations 20 \
  --concurrency 100 \
  --warmup 5 \
  --output benchmarks/detailed-report.md

Programmatic Usage

Running Benchmarks

import {
  BenchmarkRunner,
  ThroughputBenchmark,
  LatencyBenchmark,
  BenchmarkAnalyzer,
  BenchmarkReporter
} from '@ruvector/agentic-synth/benchmarks';
import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  enableCache: true,
  maxConcurrency: 100
});

const runner = new BenchmarkRunner();
runner.registerSuite(new ThroughputBenchmark(synth));
runner.registerSuite(new LatencyBenchmark(synth));

const result = await runner.runAll({
  name: 'My Benchmark',
  iterations: 10,
  concurrency: 100,
  warmupIterations: 2,
  timeout: 300000
});

console.log('Throughput:', result.metrics.throughput);
console.log('P99 Latency:', result.metrics.p99LatencyMs);

Analyzing Results

import { BenchmarkAnalyzer } from '@ruvector/agentic-synth/benchmarks';

const analyzer = new BenchmarkAnalyzer();
analyzer.analyze(result);

// Automatic bottleneck detection
// Optimization recommendations
// Performance comparison

Generating Reports

import { BenchmarkReporter } from '@ruvector/agentic-synth/benchmarks';

const reporter = new BenchmarkReporter();

// Markdown report
await reporter.generateMarkdown([result], 'report.md');

// JSON data export
await reporter.generateJSON([result], 'data.json');

CI/CD Integration

GitHub Actions

name: Performance Benchmarks

on: [push, pull_request]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install Dependencies
        run: npm ci

      - name: Build
        run: npm run build

      - name: Run Benchmarks
        run: npm run benchmark:ci
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

      - name: Upload Report
        uses: actions/upload-artifact@v3
        with:
          name: performance-report
          path: benchmarks/performance-report.md

      - name: Check Regression
        run: |
          if [ $? -ne 0 ]; then
            echo "Performance regression detected!"
            exit 1
          fi

GitLab CI

benchmark:
  stage: test
  script:
    - npm ci
    - npm run build
    - npm run benchmark:ci
  artifacts:
    paths:
      - benchmarks/performance-report.md
    when: always
  only:
    - main
    - merge_requests

Performance Regression Detection

The CI runner automatically checks for regressions:

{
  maxP99Latency: 1000,      // 1 second
  minThroughput: 10,        // 10 req/s
  maxMemoryMB: 400,         // 400MB
  minCacheHitRate: 0.5,     // 50%
  maxErrorRate: 0.01        // 1%
}

Exit Codes:

  • 0: All tests passed
  • 1: Performance regression detected

Report Formats

Markdown Report

Includes:

  • Performance metrics table
  • Latency distribution
  • Optimization recommendations
  • Historical trends
  • Pass/fail status

JSON Report

Includes:

  • Raw metrics data
  • Timestamp
  • Configuration
  • Recommendations
  • Full result objects

Performance Metrics

Collected Metrics

Metric Description Unit
throughput Requests per second req/s
p50LatencyMs 50th percentile latency ms
p95LatencyMs 95th percentile latency ms
p99LatencyMs 99th percentile latency ms
avgLatencyMs Average latency ms
cacheHitRate Cache hit ratio 0-1
memoryUsageMB Memory usage MB
cpuUsagePercent CPU usage %
concurrentRequests Active requests count
errorRate Error ratio 0-1

Performance Targets

Category Metric Target Optimal
Speed P99 Latency < 1000ms < 500ms
Speed Throughput > 10 req/s > 50 req/s
Cache Hit Rate > 50% > 80%
Memory Usage < 400MB < 200MB
Reliability Error Rate < 1% < 0.1%

Bottleneck Analysis

Automatic Detection

The analyzer automatically detects:

  1. Latency Bottlenecks

    • Slow API responses
    • Network issues
    • Cache misses
  2. Throughput Bottlenecks

    • Low concurrency
    • Sequential processing
    • API rate limits
  3. Memory Bottlenecks

    • Large cache size
    • Memory leaks
    • Excessive buffering
  4. Cache Bottlenecks

    • Low hit rate
    • Small cache size
    • Poor key strategy

Recommendations

Each bottleneck includes:

  • Category (cache, routing, memory, etc.)
  • Severity (low, medium, high, critical)
  • Issue description
  • Optimization recommendation
  • Estimated improvement
  • Implementation effort

Best Practices

Running Benchmarks

  1. Warmup: Always use warmup iterations (2-5)
  2. Iterations: Use 10+ for statistical significance
  3. Concurrency: Test at expected load levels
  4. Environment: Run in consistent environment
  5. Monitoring: Watch system resources

Analyzing Results

  1. Trends: Compare across multiple runs
  2. Baselines: Establish performance baselines
  3. Regressions: Set up automated checks
  4. Profiling: Profile bottlenecks before optimizing
  5. Documentation: Document optimization changes

CI/CD Integration

  1. Automation: Run on every PR/commit
  2. Thresholds: Set realistic regression thresholds
  3. Artifacts: Save reports and data
  4. Notifications: Alert on regressions
  5. History: Track performance over time

Troubleshooting

Common Issues

High Variance:

  • Increase warmup iterations
  • Run more iterations
  • Check system load

API Errors:

  • Verify API key
  • Check rate limits
  • Review network connectivity

Out of Memory:

  • Reduce concurrency
  • Decrease cache size
  • Enable GC

Slow Benchmarks:

  • Reduce iterations
  • Decrease concurrency
  • Use smaller maxTokens

Advanced Features

Custom Benchmarks

import { BenchmarkSuite } from '@ruvector/agentic-synth/benchmarks';

class CustomBenchmark implements BenchmarkSuite {
  name = 'Custom Test';

  async run(): Promise<void> {
    // Your benchmark logic
  }
}

runner.registerSuite(new CustomBenchmark());

Custom Thresholds

import { BottleneckAnalyzer } from '@ruvector/agentic-synth/benchmarks';

const analyzer = new BottleneckAnalyzer();
analyzer.setThresholds({
  maxP99LatencyMs: 500,    // Stricter than default
  minThroughput: 50,       // Higher than default
  maxMemoryMB: 300         // Lower than default
});

Performance Hooks

# Pre-benchmark hook
npx claude-flow@alpha hooks pre-task --description "Benchmarking"

# Post-benchmark hook
npx claude-flow@alpha hooks post-task --task-id "bench-123"

Resources