# Performance Optimization Guide

## Overview

Agentic-Synth is optimized for high-performance synthetic data generation, with the following targets:

- **Sub-second response times** for cached requests
- **100+ concurrent generations** supported
- **Memory-efficient** data handling (< 400MB)
- **50%+ cache hit rate** for typical workloads

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| P99 Latency | < 1000ms | < 100ms for cached requests |
| Throughput | > 10 req/s | Scales with concurrency |
| Memory Usage | < 400MB | With a 1000-item cache |
| Cache Hit Rate | > 50% | Depends on workload patterns |
| Error Rate | < 1% | With retry logic enabled |

## Optimization Strategies

### 1. Context Caching

**Configuration:**
```typescript
const synth = new AgenticSynth({
  enableCache: true,
  cacheSize: 1000, // Adjust based on memory
  cacheTTL: 3600000, // 1 hour in milliseconds
});
```

**Benefits:**
- Reduces API calls by 50-80%
- Sub-100ms latency for cache hits
- Automatic LRU eviction

**Best Practices:**
- Use consistent prompts for better cache hits
- Increase cache size for repetitive workloads
- Monitor cache hit rate with `synth.getMetrics()`

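
To make the interplay of cache size, TTL, and LRU eviction concrete, here is a minimal, self-contained sketch. `LruTtlCache` is a hypothetical class written for this guide and is not Agentic-Synth's internal cache; its `maxSize` and `ttlMs` parameters only mirror the roles of `cacheSize` and `cacheTTL`.

```typescript
// Minimal LRU cache with a per-entry TTL (illustrative sketch only).
class LruTtlCache<V> {
  // A Map preserves insertion order, so the first key is the least recently used.
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlMs: number;

  constructor(maxSize: number, ttlMs: number) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // Evict the least recently used entry (first key in insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Passing `now` explicitly keeps the sketch deterministic in tests; in normal use both methods fall back to `Date.now()`.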
### 2. Model Routing

**Configuration:**
```typescript
const synth = new AgenticSynth({
  modelPreference: [
    'claude-sonnet-4-5-20250929',
    'claude-3-5-sonnet-20241022'
  ],
});
```

**Features:**
- Automatic load balancing
- Performance-based routing
- Error handling and fallback

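
The fallback part of routing can be pictured as trying each model in preference order until one succeeds. The sketch below is an assumption about the general pattern, not the library's router: `generateWithFallback` and the `Generate` callback type are hypothetical names, and load balancing and performance-based routing are left out.

```typescript
// Hypothetical callback that performs one generation against one model.
type Generate = (model: string, prompt: string) => Promise<string>;

// Try each model in preference order; return the first successful result.
async function generateWithFallback(
  models: string[],
  prompt: string,
  generate: Generate
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      const text = await generate(model, prompt);
      return { model, text };
    } catch (err) {
      // Record the failure and move on to the next model in the list.
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All models failed: ${errors.join("; ")}`);
}
```

If every model fails, the thrown error carries each model's message, which helps when deciding whether the cause is a rate limit or an outage.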
### 3. Concurrent Generation

**Configuration:**
```typescript
const synth = new AgenticSynth({
  maxConcurrency: 100, // Adjust based on API limits
});
```

**Usage:**
```typescript
const prompts: string[] = [/* 100+ prompts */];
const results = await synth.generateBatch(prompts, {
  maxTokens: 500
});
```

**Performance:**
- 2-3x faster than sequential
- Respects concurrency limits
- Automatic batching

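
For intuition about what a concurrency limit does, the following sketch runs a fixed number of workers over a shared queue of items. `mapWithConcurrency` is a hypothetical helper, not part of Agentic-Synth, and makes no claim about how `generateBatch` is implemented internally.

```typescript
// Run `worker` over all items with at most `limit` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner repeatedly claims the next index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(0, Math.min(limit, items.length));
  const runners = Array.from({ length: runnerCount }, () => runner());
  await Promise.all(runners);
  return results; // same order as `items`
}
```

Results are written by index, so the output order matches the input order even when individual calls finish out of order.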
### 4. Memory Management

**Configuration:**
```typescript
const synth = new AgenticSynth({
  memoryLimit: 512 * 1024 * 1024, // 512MB
});
```

**Features:**
- Automatic memory tracking
- LRU eviction when over limit
- Periodic cleanup with `synth.optimize()`

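
One way to think about eviction under a memory limit is a cache that tracks an estimated byte total and drops its least recently used entries once the budget is exceeded. The sketch below assumes string values and a rough size estimate of 2 bytes per UTF-16 code unit; `ByteBudgetCache` is a hypothetical class, not the library's memory manager.

```typescript
// Cache bounded by an estimated byte budget rather than an item count.
class ByteBudgetCache {
  private entries = new Map<string, string>();
  private usedBytes = 0;
  private limitBytes: number;

  constructor(limitBytes: number) {
    this.limitBytes = limitBytes;
  }

  // Rough estimate: JavaScript strings use 2 bytes per UTF-16 code unit.
  private sizeOf(key: string, value: string): number {
    return 2 * (key.length + value.length);
  }

  private remove(key: string): void {
    const value = this.entries.get(key);
    if (value === undefined) return;
    this.entries.delete(key);
    this.usedBytes -= this.sizeOf(key, value);
  }

  get(key: string): string | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: string): void {
    this.remove(key);
    this.entries.set(key, value);
    this.usedBytes += this.sizeOf(key, value);
    // Evict least recently used entries until the budget is respected.
    while (this.usedBytes > this.limitBytes) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.remove(oldest);
    }
  }

  get bytes(): number {
    return this.usedBytes;
  }
}
```

Note that a single entry larger than the whole budget is evicted immediately after insertion, which leaves the cache empty rather than over its limit.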
### 5. Streaming for Large Outputs

**Usage:**
```typescript
const stream = synth.generateStream(prompt, {
  maxTokens: 4096
});

for await (const chunk of stream) {
  // Process chunk immediately
  processChunk(chunk);
}
```

**Benefits:**
- Lower time-to-first-byte
- Reduced memory usage
- Better user experience

## Benchmarking

### Running Benchmarks

```bash
# Run all benchmarks
npm run benchmark

# Run specific suite
npm run benchmark -- --suite "Throughput Test"

# With custom settings
npm run benchmark -- --iterations 20 --concurrency 200

# Generate report
npm run benchmark -- --output benchmarks/report.md
```

### Benchmark Suites

1. **Throughput Test**: Measures requests per second
2. **Latency Test**: Measures P50/P95/P99 latencies
3. **Memory Test**: Measures memory usage and leaks
4. **Cache Test**: Measures cache effectiveness
5. **Concurrency Test**: Tests concurrent request handling
6. **Streaming Test**: Measures streaming performance

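
For readers unfamiliar with the P50/P95/P99 notation, the sketch below computes percentiles from a list of latency samples using the nearest-rank method. This is one common definition and an assumption here; the benchmark suite may use a different interpolation. The `percentile` function is a hypothetical helper.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric sort on a copy
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Example: 100 samples of 1ms..100ms, supplied in descending order.
const samples = Array.from({ length: 100 }, (_, i) => 100 - i);
const p50 = percentile(samples, 50); // 50
const p95 = percentile(samples, 95); // 95
const p99 = percentile(samples, 99); // 99
```

P99 is sensitive to sample count: with fewer than 100 samples it is simply the maximum under this definition, so short benchmark runs tend to report noisy tail latencies.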
### Analyzing Results

```bash
# Analyze performance
npm run perf:analyze

# Generate detailed report
npm run perf:report
```

## Bottleneck Detection

The built-in bottleneck analyzer automatically detects:

### 1. Latency Bottlenecks
- **Cause**: Slow API responses, network issues
- **Solution**: Increase cache size, optimize prompts
- **Impact**: 30-50% latency reduction

### 2. Throughput Bottlenecks
- **Cause**: Low concurrency, sequential processing
- **Solution**: Increase `maxConcurrency`, use the batch API (`generateBatch`)
- **Impact**: 2-3x throughput increase

### 3. Memory Bottlenecks
- **Cause**: Large cache, memory leaks
- **Solution**: Reduce cache size, call `synth.optimize()`
- **Impact**: 40-60% memory reduction

### 4. Cache Bottlenecks
- **Cause**: Low hit rate, small cache
- **Solution**: Increase cache size, optimize keys
- **Impact**: 20-40% improvement in cache hit rate

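
The four categories can be read as simple threshold rules over a metrics snapshot. The sketch below is a hand-written approximation, not the analyzer's actual logic: the `MetricsSnapshot` shape and `detectBottlenecks` function are hypothetical, and the thresholds are taken from the Performance Targets table.

```typescript
// Hypothetical metrics shape for illustration; not the library's types.
interface MetricsSnapshot {
  p99LatencyMs: number;
  throughputRps: number;
  memoryMb: number;
  cacheHitRate: number; // fraction between 0 and 1
}

interface Finding {
  kind: "latency" | "throughput" | "memory" | "cache";
  suggestion: string;
}

// Flag every metric that misses its documented target.
function detectBottlenecks(m: MetricsSnapshot): Finding[] {
  const findings: Finding[] = [];
  if (m.p99LatencyMs >= 1000) {
    findings.push({ kind: "latency", suggestion: "Increase cache size, optimize prompts" });
  }
  if (m.throughputRps <= 10) {
    findings.push({ kind: "throughput", suggestion: "Increase maxConcurrency, use the batch API" });
  }
  if (m.memoryMb >= 400) {
    findings.push({ kind: "memory", suggestion: "Reduce cache size, call optimize()" });
  }
  if (m.cacheHitRate <= 0.5) {
    findings.push({ kind: "cache", suggestion: "Increase cache size, optimize keys" });
  }
  return findings;
}
```

A snapshot that meets every target yields an empty list, which makes the function easy to use as a health check.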
## CI/CD Integration

### Performance Regression Detection

```bash
# Run in CI
npm run benchmark:ci
```

**Features:**
- Automatic threshold checking
- Fails build on regression
- Generates reports for artifacts

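
Threshold checking of this kind usually compares the current run against a stored baseline with some tolerance. The sketch below shows that idea under stated assumptions: `findRegressions`, the `BenchmarkResults` shape, and the 10% default tolerance are hypothetical and are not taken from the `benchmark:ci` script.

```typescript
// Hypothetical result shape: metric name -> value, where lower is better.
type BenchmarkResults = Record<string, number>;

// Report every metric that is worse than the baseline by more than `tolerance`.
function findRegressions(
  baseline: BenchmarkResults,
  current: BenchmarkResults,
  tolerance: number = 0.1
): string[] {
  const regressions: string[] = [];
  for (const name of Object.keys(baseline)) {
    if (!(name in current)) continue; // metric not measured in this run
    const before = baseline[name];
    const now = current[name];
    if (now > before * (1 + tolerance)) {
      regressions.push(`${name}: ${before} -> ${now}`);
    }
  }
  return regressions;
}
```

A CI job could exit with a non-zero status when the returned list is non-empty. Metrics where higher is better, such as throughput, would need the comparison inverted.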
### GitHub Actions Example

```yaml
- name: Performance Benchmarks
  run: npm run benchmark:ci

- name: Upload Report
  uses: actions/upload-artifact@v4
  with:
    name: performance-report
    path: benchmarks/performance-report.md
```

## Profiling

### CPU Profiling

```bash
npm run benchmark:profile
node --prof-process isolate-*.log > profile.txt
```

### Memory Profiling

```bash
node --expose-gc --max-old-space-size=512 dist/benchmarks/runner.js
```

### Chrome DevTools

```bash
node --inspect-brk dist/benchmarks/runner.js
# Open chrome://inspect
```

## Optimization Checklist

- [ ] Enable caching for repetitive workloads
- [ ] Set appropriate cache size (1000+ items)
- [ ] Configure concurrency based on API limits
- [ ] Use batch API for multiple generations
- [ ] Implement streaming for large outputs
- [ ] Monitor memory usage regularly
- [ ] Run benchmarks before releases
- [ ] Set up CI/CD performance tests
- [ ] Profile bottlenecks periodically
- [ ] Optimize prompt patterns for cache hits

## Performance Monitoring

### Runtime Metrics

```typescript
// Get current metrics
const metrics = synth.getMetrics();
console.log('Cache:', metrics.cache);
console.log('Memory:', metrics.memory);
console.log('Router:', metrics.router);
```

### Performance Monitor

```typescript
import { PerformanceMonitor } from '@ruvector/agentic-synth';

const monitor = new PerformanceMonitor();
monitor.start();

// ... run workload ...

const metrics = monitor.getMetrics();
console.log('Throughput:', metrics.throughput);
console.log('P99 Latency:', metrics.p99LatencyMs);
```

### Bottleneck Analysis

```typescript
import { BottleneckAnalyzer } from '@ruvector/agentic-synth';

// Assumes `metrics` was collected beforehand, e.g. via monitor.getMetrics()
const analyzer = new BottleneckAnalyzer();
const report = analyzer.analyze(metrics);

if (report.detected) {
  console.log('Bottlenecks:', report.bottlenecks);
  console.log('Recommendations:', report.recommendations);
}
```

## Best Practices

1. **Cache Strategy**: Use prompts as cache keys and normalize their formatting
2. **Concurrency**: Start with 100, then adjust based on API limits
3. **Memory**: Monitor with `synth.getMetrics()`, call `synth.optimize()` periodically
4. **Streaming**: Use for outputs > 1000 tokens
5. **Benchmarking**: Run before releases, track trends
6. **Monitoring**: Enable in production, set up alerts
7. **Optimization**: Profile first, then optimize the bottlenecks you find
8. **Testing**: Include performance tests in CI/CD

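
Normalizing prompts before using them as cache keys means that differences in whitespace or Unicode encoding do not produce separate entries. The sketch below shows one possible normalization; `normalizePrompt` and `cacheKey` are hypothetical helpers, and the library may derive its keys differently.

```typescript
// Collapse formatting differences that should not create separate cache entries.
function normalizePrompt(prompt: string): string {
  return prompt
    .normalize("NFC") // unify equivalent Unicode sequences
    .replace(/\s+/g, " ") // collapse runs of whitespace, including newlines
    .trim();
}

// Build a key from the normalized prompt plus any option that changes the output.
function cacheKey(prompt: string, options: { maxTokens?: number } = {}): string {
  return JSON.stringify([normalizePrompt(prompt), options.maxTokens ?? null]);
}
```

Case is deliberately left untouched, since capitalization can change what a model generates. Options that affect the output, such as `maxTokens`, belong in the key so that differently configured requests do not share a cached result.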
## Troubleshooting

### High Latency
- Check cache hit rate
- Increase cache size
- Optimize prompt patterns
- Check network connectivity

### Low Throughput
- Increase `maxConcurrency`
- Use batch API
- Reduce `maxTokens`
- Check API rate limits

### High Memory Usage
- Reduce cache size
- Call `synth.optimize()` regularly
- Use streaming for large outputs
- Check for memory leaks

### Low Cache Hit Rate
- Normalize prompt formatting
- Increase cache size
- Increase `cacheTTL`
- Review workload patterns

## Additional Resources

- [API Documentation](./API.md)
- [Examples](../examples/)
- [Benchmark Source](../src/benchmarks/)
- [GitHub Issues](https://github.com/ruvnet/ruvector/issues)