wifi-densepose/npm/packages/agentic-synth/docs/PERFORMANCE.md

# Performance Optimization Guide

## Overview

Agentic-Synth is optimized for high-performance synthetic data generation with the following targets:
- **Sub-second response times** for cached requests
- **100+ concurrent generations** supported
- **Memory efficient** data handling (< 400MB)
- **50%+ cache hit rate** for typical workloads

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| P99 Latency | < 1000ms | For cached requests < 100ms |
| Throughput | > 10 req/s | Scales with concurrency |
| Memory Usage | < 400MB | With 1000-item cache |
| Cache Hit Rate | > 50% | Depends on workload patterns |
| Error Rate | < 1% | With retry logic |

## Optimization Strategies

### 1. Context Caching

**Configuration:**
```typescript
const synth = new AgenticSynth({
  enableCache: true,
  cacheSize: 1000,      // Adjust based on memory
  cacheTTL: 3600000,    // 1 hour in milliseconds
});
```

**Benefits:**
- Reduces API calls by 50-80%
- Sub-100ms latency for cache hits
- Automatic LRU eviction

**Best Practices:**
- Use consistent prompts for better cache hits
- Increase cache size for repetitive workloads
- Monitor cache hit rate with `synth.getMetrics()`

### 2. Model Routing

**Configuration:**
```typescript
const synth = new AgenticSynth({
  modelPreference: [
    'claude-sonnet-4-5-20250929',
    'claude-3-5-sonnet-20241022'
  ],
});
```

**Features:**
- Automatic load balancing
- Performance-based routing
- Error handling and fallback

### 3. Concurrent Generation

**Configuration:**
```typescript
const synth = new AgenticSynth({
  maxConcurrency: 100,  // Adjust based on API limits
});
```

**Usage:**
```typescript
const prompts = [...]; // 100+ prompts
const results = await synth.generateBatch(prompts, {
  maxTokens: 500
});
```

**Performance:**
- 2-3x faster than sequential
- Respects concurrency limits
- Automatic batching

### 4. Memory Management

**Configuration:**
```typescript
const synth = new AgenticSynth({
  memoryLimit: 512 * 1024 * 1024,  // 512MB
});
```

**Features:**
- Automatic memory tracking
- LRU eviction when over limit
- Periodic cleanup with `synth.optimize()`

### 5. Streaming for Large Outputs

**Usage:**
```typescript
const stream = synth.generateStream(prompt, {
  maxTokens: 4096
});

for await (const chunk of stream) {
  // Process chunk immediately
  processChunk(chunk);
}
```

**Benefits:**
- Lower time-to-first-byte
- Reduced memory usage
- Better user experience

## Benchmarking

### Running Benchmarks

```bash
# Run all benchmarks
npm run benchmark

# Run specific suite
npm run benchmark -- --suite "Throughput Test"

# With custom settings
npm run benchmark -- --iterations 20 --concurrency 200

# Generate report
npm run benchmark -- --output benchmarks/report.md
```

### Benchmark Suites

1. **Throughput Test**: Measures requests per second
2. **Latency Test**: Measures P50/P95/P99 latencies
3. **Memory Test**: Measures memory usage and leaks
4. **Cache Test**: Measures cache effectiveness
5. **Concurrency Test**: Tests concurrent request handling
6. **Streaming Test**: Measures streaming performance

### Analyzing Results

```bash
# Analyze performance
npm run perf:analyze

# Generate detailed report
npm run perf:report
```

## Bottleneck Detection

The built-in bottleneck analyzer automatically detects:

### 1. Latency Bottlenecks
- **Cause**: Slow API responses, network issues
- **Solution**: Increase cache size, optimize prompts
- **Impact**: 30-50% latency reduction

### 2. Throughput Bottlenecks
- **Cause**: Low concurrency, sequential processing
- **Solution**: Increase maxConcurrency, use batch API
- **Impact**: 2-3x throughput increase

### 3. Memory Bottlenecks
- **Cause**: Large cache, memory leaks
- **Solution**: Reduce cache size, call optimize()
- **Impact**: 40-60% memory reduction

### 4. Cache Bottlenecks
- **Cause**: Low hit rate, small cache
- **Solution**: Increase cache size, optimize keys
- **Impact**: 20-40% cache improvement

## CI/CD Integration

### Performance Regression Detection

```bash
# Run in CI
npm run benchmark:ci
```

**Features:**
- Automatic threshold checking
- Fails build on regression
- Generates reports for artifacts

### GitHub Actions Example

```yaml
- name: Performance Benchmarks
  run: npm run benchmark:ci

- name: Upload Report
  uses: actions/upload-artifact@v3
  with:
    name: performance-report
    path: benchmarks/performance-report.md
```

## Profiling

### CPU Profiling

```bash
npm run benchmark:profile
node --prof-process isolate-*.log > profile.txt
```

### Memory Profiling

```bash
node --expose-gc --max-old-space-size=512 dist/benchmarks/runner.js
```

### Chrome DevTools

```bash
node --inspect-brk dist/benchmarks/runner.js
# Open chrome://inspect
```

## Optimization Checklist

- [ ] Enable caching for repetitive workloads
- [ ] Set appropriate cache size (1000+ items)
- [ ] Configure concurrency based on API limits
- [ ] Use batch API for multiple generations
- [ ] Implement streaming for large outputs
- [ ] Monitor memory usage regularly
- [ ] Run benchmarks before releases
- [ ] Set up CI/CD performance tests
- [ ] Profile bottlenecks periodically
- [ ] Optimize prompt patterns for cache hits

## Performance Monitoring

### Runtime Metrics

```typescript
// Get current metrics
const metrics = synth.getMetrics();
console.log('Cache:', metrics.cache);
console.log('Memory:', metrics.memory);
console.log('Router:', metrics.router);
```

### Performance Monitor

```typescript
import { PerformanceMonitor } from '@ruvector/agentic-synth';

const monitor = new PerformanceMonitor();
monitor.start();

// ... run workload ...

const metrics = monitor.getMetrics();
console.log('Throughput:', metrics.throughput);
console.log('P99 Latency:', metrics.p99LatencyMs);
```

### Bottleneck Analysis

```typescript
import { BottleneckAnalyzer } from '@ruvector/agentic-synth';

const analyzer = new BottleneckAnalyzer();
const report = analyzer.analyze(metrics);

if (report.detected) {
  console.log('Bottlenecks:', report.bottlenecks);
  console.log('Recommendations:', report.recommendations);
}
```

## Best Practices

1. **Cache Strategy**: Use prompts as cache keys, normalize formatting
2. **Concurrency**: Start with 100, increase based on API limits
3. **Memory**: Monitor with getMetrics(), call optimize() periodically
4. **Streaming**: Use for outputs > 1000 tokens
5. **Benchmarking**: Run before releases, track trends
6. **Monitoring**: Enable in production, set up alerts
7. **Optimization**: Profile first, optimize bottlenecks
8. **Testing**: Include performance tests in CI/CD

## Troubleshooting

### High Latency
- Check cache hit rate
- Increase cache size
- Optimize prompt patterns
- Check network connectivity

### Low Throughput
- Increase maxConcurrency
- Use batch API
- Reduce maxTokens
- Check API rate limits

### High Memory Usage
- Reduce cache size
- Call optimize() regularly
- Use streaming for large outputs
- Check for memory leaks

### Low Cache Hit Rate
- Normalize prompt formatting
- Increase cache size
- Increase TTL
- Review workload patterns

## Additional Resources

- [API Documentation](./API.md)
- [Examples](../examples/)
- [Benchmark Source](../src/benchmarks/)
- [GitHub Issues](https://github.com/ruvnet/ruvector/issues)