Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
This commit is contained in:
492
vendor/ruvector/npm/packages/agentic-synth/docs/BENCHMARKS.md
vendored
Normal file
492
vendor/ruvector/npm/packages/agentic-synth/docs/BENCHMARKS.md
vendored
Normal file
@@ -0,0 +1,492 @@
|
||||
# Benchmark Suite Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
The agentic-synth benchmark suite provides comprehensive performance testing across multiple dimensions:
|
||||
- Data generation throughput
|
||||
- API latency and percentiles
|
||||
- Memory usage profiling
|
||||
- Cache effectiveness
|
||||
- Streaming performance
|
||||
- Concurrent generation scenarios
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Install dependencies
|
||||
npm install
|
||||
|
||||
# Build project
|
||||
npm run build
|
||||
|
||||
# Run all benchmarks
|
||||
npm run benchmark
|
||||
|
||||
# Run specific benchmark
|
||||
npm run benchmark -- --suite "Throughput Test"
|
||||
|
||||
# Run with custom configuration
|
||||
npm run benchmark -- --iterations 20 --concurrency 200
|
||||
|
||||
# Generate report
|
||||
npm run benchmark -- --output benchmarks/report.md
|
||||
```
|
||||
|
||||
## Benchmark Suites
|
||||
|
||||
### 1. Throughput Benchmark
|
||||
|
||||
**Measures**: Requests per second at various concurrency levels
|
||||
|
||||
**Configuration**:
|
||||
```typescript
|
||||
{
|
||||
iterations: 10,
|
||||
concurrency: 100,
|
||||
maxTokens: 100
|
||||
}
|
||||
```
|
||||
|
||||
**Targets**:
|
||||
- Minimum: 10 req/s
|
||||
- Target: 50+ req/s
|
||||
- Optimal: 100+ req/s
|
||||
|
||||
### 2. Latency Benchmark
|
||||
|
||||
**Measures**: Response time percentiles (P50, P95, P99)
|
||||
|
||||
**Configuration**:
|
||||
```typescript
|
||||
{
|
||||
iterations: 50,
|
||||
maxTokens: 50
|
||||
}
|
||||
```
|
||||
|
||||
**Targets**:
|
||||
- P50: < 500ms
|
||||
- P95: < 800ms
|
||||
- P99: < 1000ms
|
||||
- Cached: < 100ms
|
||||
|
||||
### 3. Memory Benchmark
|
||||
|
||||
**Measures**: Memory usage patterns and leak detection
|
||||
|
||||
**Configuration**:
|
||||
```typescript
|
||||
{
|
||||
iterations: 100,
|
||||
maxTokens: 100,
|
||||
enableGC: true
|
||||
}
|
||||
```
|
||||
|
||||
**Targets**:
|
||||
- Peak: < 400MB
|
||||
- Final (after GC): < 200MB
|
||||
- No memory leaks
|
||||
|
||||
### 4. Cache Benchmark
|
||||
|
||||
**Measures**: Cache hit rates and effectiveness
|
||||
|
||||
**Configuration**:
|
||||
```typescript
|
||||
{
|
||||
cacheSize: 1000,
|
||||
ttl: 3600000,
|
||||
repeatRatio: 0.5
|
||||
}
|
||||
```
|
||||
|
||||
**Targets**:
|
||||
- Hit rate: > 50%
|
||||
- Optimal: > 80%
|
||||
|
||||
### 5. Concurrency Benchmark
|
||||
|
||||
**Measures**: Performance at various concurrency levels
|
||||
|
||||
**Tests**: 10, 50, 100, 200 concurrent requests
|
||||
|
||||
**Targets**:
|
||||
- 10 concurrent: < 2s total
|
||||
- 50 concurrent: < 5s total
|
||||
- 100 concurrent: < 10s total
|
||||
- 200 concurrent: < 20s total
|
||||
|
||||
### 6. Streaming Benchmark
|
||||
|
||||
**Measures**: Streaming performance and time-to-first-byte
|
||||
|
||||
**Configuration**:
|
||||
```typescript
|
||||
{
|
||||
maxTokens: 500,
|
||||
measureFirstChunk: true
|
||||
}
|
||||
```
|
||||
|
||||
**Targets**:
|
||||
- First chunk: < 200ms
|
||||
- Total duration: < 5s
|
||||
- Chunks: 50-100
|
||||
|
||||
## CLI Usage
|
||||
|
||||
### Basic Commands
|
||||
|
||||
```bash
|
||||
# Run all benchmarks
|
||||
agentic-synth benchmark
|
||||
|
||||
# Run specific suite
|
||||
agentic-synth benchmark --suite "Latency Test"
|
||||
|
||||
# Custom iterations
|
||||
agentic-synth benchmark --iterations 20
|
||||
|
||||
# Custom concurrency
|
||||
agentic-synth benchmark --concurrency 200
|
||||
|
||||
# Output report
|
||||
agentic-synth benchmark --output report.md
|
||||
```
|
||||
|
||||
### Advanced Options
|
||||
|
||||
```bash
|
||||
# Full configuration
|
||||
agentic-synth benchmark \
|
||||
--suite "All" \
|
||||
--iterations 20 \
|
||||
--concurrency 100 \
|
||||
--warmup 5 \
|
||||
--output benchmarks/detailed-report.md
|
||||
```
|
||||
|
||||
## Programmatic Usage
|
||||
|
||||
### Running Benchmarks
|
||||
|
||||
```typescript
|
||||
import {
|
||||
BenchmarkRunner,
|
||||
ThroughputBenchmark,
|
||||
LatencyBenchmark,
|
||||
BenchmarkAnalyzer,
|
||||
BenchmarkReporter
|
||||
} from '@ruvector/agentic-synth/benchmarks';
|
||||
import { AgenticSynth } from '@ruvector/agentic-synth';
|
||||
|
||||
const synth = new AgenticSynth({
|
||||
enableCache: true,
|
||||
maxConcurrency: 100
|
||||
});
|
||||
|
||||
const runner = new BenchmarkRunner();
|
||||
runner.registerSuite(new ThroughputBenchmark(synth));
|
||||
runner.registerSuite(new LatencyBenchmark(synth));
|
||||
|
||||
const result = await runner.runAll({
|
||||
name: 'My Benchmark',
|
||||
iterations: 10,
|
||||
concurrency: 100,
|
||||
warmupIterations: 2,
|
||||
timeout: 300000
|
||||
});
|
||||
|
||||
console.log('Throughput:', result.metrics.throughput);
|
||||
console.log('P99 Latency:', result.metrics.p99LatencyMs);
|
||||
```
|
||||
|
||||
### Analyzing Results
|
||||
|
||||
```typescript
|
||||
import { BenchmarkAnalyzer } from '@ruvector/agentic-synth/benchmarks';
|
||||
|
||||
const analyzer = new BenchmarkAnalyzer();
|
||||
analyzer.analyze(result);
|
||||
|
||||
// Automatic bottleneck detection
|
||||
// Optimization recommendations
|
||||
// Performance comparison
|
||||
```
|
||||
|
||||
### Generating Reports
|
||||
|
||||
```typescript
|
||||
import { BenchmarkReporter } from '@ruvector/agentic-synth/benchmarks';
|
||||
|
||||
const reporter = new BenchmarkReporter();
|
||||
|
||||
// Markdown report
|
||||
await reporter.generateMarkdown([result], 'report.md');
|
||||
|
||||
// JSON data export
|
||||
await reporter.generateJSON([result], 'data.json');
|
||||
```
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
### GitHub Actions
|
||||
|
||||
```yaml
|
||||
name: Performance Benchmarks
|
||||
|
||||
on: [push, pull_request]
|
||||
|
||||
jobs:
|
||||
benchmark:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v3
|
||||
with:
|
||||
node-version: '18'
|
||||
|
||||
- name: Install Dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Build
|
||||
run: npm run build
|
||||
|
||||
- name: Run Benchmarks
|
||||
run: npm run benchmark:ci
|
||||
env:
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
|
||||
- name: Upload Report
|
||||
uses: actions/upload-artifact@v3
|
||||
with:
|
||||
name: performance-report
|
||||
path: benchmarks/performance-report.md
|
||||
|
||||
- name: Check Regression
|
||||
run: |
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "Performance regression detected!"
|
||||
exit 1
|
||||
fi
|
||||
```
|
||||
|
||||
### GitLab CI
|
||||
|
||||
```yaml
|
||||
benchmark:
|
||||
stage: test
|
||||
script:
|
||||
- npm ci
|
||||
- npm run build
|
||||
- npm run benchmark:ci
|
||||
artifacts:
|
||||
paths:
|
||||
- benchmarks/performance-report.md
|
||||
when: always
|
||||
only:
|
||||
- main
|
||||
- merge_requests
|
||||
```
|
||||
|
||||
## Performance Regression Detection
|
||||
|
||||
The CI runner automatically checks for regressions:
|
||||
|
||||
```typescript
|
||||
{
|
||||
maxP99Latency: 1000, // 1 second
|
||||
minThroughput: 10, // 10 req/s
|
||||
maxMemoryMB: 400, // 400MB
|
||||
minCacheHitRate: 0.5, // 50%
|
||||
maxErrorRate: 0.01 // 1%
|
||||
}
|
||||
```
|
||||
|
||||
**Exit Codes**:
|
||||
- 0: All tests passed
|
||||
- 1: Performance regression detected
|
||||
|
||||
## Report Formats
|
||||
|
||||
### Markdown Report
|
||||
|
||||
Includes:
|
||||
- Performance metrics table
|
||||
- Latency distribution
|
||||
- Optimization recommendations
|
||||
- Historical trends
|
||||
- Pass/fail status
|
||||
|
||||
### JSON Report
|
||||
|
||||
Includes:
|
||||
- Raw metrics data
|
||||
- Timestamp
|
||||
- Configuration
|
||||
- Recommendations
|
||||
- Full result objects
|
||||
|
||||
## Performance Metrics
|
||||
|
||||
### Collected Metrics
|
||||
|
||||
| Metric | Description | Unit |
|
||||
|--------|-------------|------|
|
||||
| throughput | Requests per second | req/s |
|
||||
| p50LatencyMs | 50th percentile latency | ms |
|
||||
| p95LatencyMs | 95th percentile latency | ms |
|
||||
| p99LatencyMs | 99th percentile latency | ms |
|
||||
| avgLatencyMs | Average latency | ms |
|
||||
| cacheHitRate | Cache hit ratio | 0-1 |
|
||||
| memoryUsageMB | Memory usage | MB |
|
||||
| cpuUsagePercent | CPU usage | % |
|
||||
| concurrentRequests | Active requests | count |
|
||||
| errorRate | Error ratio | 0-1 |
|
||||
|
||||
### Performance Targets
|
||||
|
||||
| Category | Metric | Target | Optimal |
|
||||
|----------|--------|--------|---------|
|
||||
| Speed | P99 Latency | < 1000ms | < 500ms |
|
||||
| Speed | Throughput | > 10 req/s | > 50 req/s |
|
||||
| Cache | Hit Rate | > 50% | > 80% |
|
||||
| Memory | Usage | < 400MB | < 200MB |
|
||||
| Reliability | Error Rate | < 1% | < 0.1% |
|
||||
|
||||
## Bottleneck Analysis
|
||||
|
||||
### Automatic Detection
|
||||
|
||||
The analyzer automatically detects:
|
||||
|
||||
1. **Latency Bottlenecks**
|
||||
- Slow API responses
|
||||
- Network issues
|
||||
- Cache misses
|
||||
|
||||
2. **Throughput Bottlenecks**
|
||||
- Low concurrency
|
||||
- Sequential processing
|
||||
- API rate limits
|
||||
|
||||
3. **Memory Bottlenecks**
|
||||
- Large cache size
|
||||
- Memory leaks
|
||||
- Excessive buffering
|
||||
|
||||
4. **Cache Bottlenecks**
|
||||
- Low hit rate
|
||||
- Small cache size
|
||||
- Poor key strategy
|
||||
|
||||
### Recommendations
|
||||
|
||||
Each bottleneck includes:
|
||||
- Category (cache, routing, memory, etc.)
|
||||
- Severity (low, medium, high, critical)
|
||||
- Issue description
|
||||
- Optimization recommendation
|
||||
- Estimated improvement
|
||||
- Implementation effort
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Running Benchmarks
|
||||
|
||||
1. **Warmup**: Always use warmup iterations (2-5)
|
||||
2. **Iterations**: Use 10+ for statistical significance
|
||||
3. **Concurrency**: Test at expected load levels
|
||||
4. **Environment**: Run in consistent environment
|
||||
5. **Monitoring**: Watch system resources
|
||||
|
||||
### Analyzing Results
|
||||
|
||||
1. **Trends**: Compare across multiple runs
|
||||
2. **Baselines**: Establish performance baselines
|
||||
3. **Regressions**: Set up automated checks
|
||||
4. **Profiling**: Profile bottlenecks before optimizing
|
||||
5. **Documentation**: Document optimization changes
|
||||
|
||||
### CI/CD Integration
|
||||
|
||||
1. **Automation**: Run on every PR/commit
|
||||
2. **Thresholds**: Set realistic regression thresholds
|
||||
3. **Artifacts**: Save reports and data
|
||||
4. **Notifications**: Alert on regressions
|
||||
5. **History**: Track performance over time
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
**High Variance**:
|
||||
- Increase warmup iterations
|
||||
- Run more iterations
|
||||
- Check system load
|
||||
|
||||
**API Errors**:
|
||||
- Verify API key
|
||||
- Check rate limits
|
||||
- Review network connectivity
|
||||
|
||||
**Out of Memory**:
|
||||
- Reduce concurrency
|
||||
- Decrease cache size
|
||||
- Enable GC
|
||||
|
||||
**Slow Benchmarks**:
|
||||
- Reduce iterations
|
||||
- Decrease concurrency
|
||||
- Use smaller maxTokens
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Custom Benchmarks
|
||||
|
||||
```typescript
|
||||
import { BenchmarkSuite } from '@ruvector/agentic-synth/benchmarks';
|
||||
|
||||
class CustomBenchmark implements BenchmarkSuite {
|
||||
name = 'Custom Test';
|
||||
|
||||
async run(): Promise<void> {
|
||||
// Your benchmark logic
|
||||
}
|
||||
}
|
||||
|
||||
runner.registerSuite(new CustomBenchmark());
|
||||
```
|
||||
|
||||
### Custom Thresholds
|
||||
|
||||
```typescript
|
||||
import { BottleneckAnalyzer } from '@ruvector/agentic-synth/benchmarks';
|
||||
|
||||
const analyzer = new BottleneckAnalyzer();
|
||||
analyzer.setThresholds({
|
||||
maxP99LatencyMs: 500, // Stricter than default
|
||||
minThroughput: 50, // Higher than default
|
||||
maxMemoryMB: 300 // Lower than default
|
||||
});
|
||||
```
|
||||
|
||||
### Performance Hooks
|
||||
|
||||
```bash
|
||||
# Pre-benchmark hook
|
||||
npx claude-flow@alpha hooks pre-task --description "Benchmarking"
|
||||
|
||||
# Post-benchmark hook
|
||||
npx claude-flow@alpha hooks post-task --task-id "bench-123"
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
- [Performance Optimization Guide](./PERFORMANCE.md)
|
||||
- [API Documentation](./API.md)
|
||||
- [Examples](../examples/)
|
||||
- [Source Code](../src/benchmarks/)
|
||||
Reference in New Issue
Block a user