Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
# Performance Optimization Guide

## Overview

Agentic-Synth is optimized for high-performance synthetic data generation with the following targets:

- **Sub-second response times** for cached requests
- **100+ concurrent generations** supported
- **Memory efficient** data handling (< 400MB)
- **50%+ cache hit rate** for typical workloads

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| P99 Latency | < 1000ms | < 100ms for cached requests |
| Throughput | > 10 req/s | Scales with concurrency |
| Memory Usage | < 400MB | With 1000-item cache |
| Cache Hit Rate | > 50% | Depends on workload patterns |
| Error Rate | < 1% | With retry logic |

## Optimization Strategies

### 1. Context Caching

**Configuration:**

```typescript
const synth = new AgenticSynth({
  enableCache: true,
  cacheSize: 1000,   // Adjust based on memory
  cacheTTL: 3600000, // 1 hour in milliseconds
});
```

**Benefits:**

- Reduces API calls by 50-80%
- Sub-100ms latency for cache hits
- Automatic LRU eviction

**Best Practices:**

- Use consistent prompts for better cache hits
- Increase cache size for repetitive workloads
- Monitor cache hit rate with `synth.getMetrics()`
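
Cache lookups are keyed on the prompt string, so trivially different formatting ("Generate users" vs. "generate  users\n") produces misses. A minimal sketch of normalizing prompts before submission (the `normalizePrompt` helper is illustrative, not part of the library):

```typescript
// Hypothetical helper: collapse whitespace and casing differences so
// semantically identical prompts map to the same cache key.
function normalizePrompt(prompt: string): string {
  return prompt
    .trim()
    .replace(/\s+/g, ' ') // collapse runs of whitespace into one space
    .toLowerCase();
}

// Both variants now resolve to the same cache entry.
const a = normalizePrompt('Generate 10 users ');
const b = normalizePrompt('generate  10 users');
```

Lowercasing is a judgment call: it improves hit rates but changes the prompt, so apply it only where casing does not carry meaning.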

### 2. Model Routing

**Configuration:**

```typescript
const synth = new AgenticSynth({
  modelPreference: [
    'claude-sonnet-4-5-20250929',
    'claude-3-5-sonnet-20241022'
  ],
});
```

**Features:**

- Automatic load balancing
- Performance-based routing
- Error handling and fallback
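
Preference-ordered fallback amounts to trying each model in turn and returning the first success. A sketch of that behavior, assuming only that each model call is an async function (`callModel` is a stand-in, not the library's API):

```typescript
// Try each model in preference order; fall through to the next on error.
async function generateWithFallback<T>(
  models: string[],
  callModel: (model: string) => Promise<T>
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await callModel(model);
    } catch (err) {
      lastError = err; // remember the failure, try the next preference
    }
  }
  throw lastError; // every model failed
}
```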

### 3. Concurrent Generation

**Configuration:**

```typescript
const synth = new AgenticSynth({
  maxConcurrency: 100, // Adjust based on API limits
});
```

**Usage:**

```typescript
const prompts = [...]; // 100+ prompts
const results = await synth.generateBatch(prompts, {
  maxTokens: 500
});
```

**Performance:**

- 2-3x faster than sequential
- Respects concurrency limits
- Automatic batching
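
The concurrency cap can be pictured as a fixed worker pool pulling from a shared queue; this is a sketch of the general technique, not the library's actual internals:

```typescript
// Run at most `limit` tasks at once while preserving result order.
// Workers claim indices from a shared counter; since JS is
// single-threaded, `next++` cannot race between workers.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // claim the next task index
      results[i] = await tasks[i]();
    }
  }

  const pool = Array.from(
    { length: Math.max(1, Math.min(limit, tasks.length)) },
    () => worker()
  );
  await Promise.all(pool);
  return results;
}
```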

### 4. Memory Management

**Configuration:**

```typescript
const synth = new AgenticSynth({
  memoryLimit: 512 * 1024 * 1024, // 512MB
});
```

**Features:**

- Automatic memory tracking
- LRU eviction when over limit
- Periodic cleanup with `synth.optimize()`
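
Eviction under a memory limit can be sketched as an LRU cache with a byte budget. This illustrates the general technique only (sizes are approximated by string length; it is not the library's implementation):

```typescript
// LRU cache bounded by total bytes rather than entry count.
// Map iteration order is insertion order, so the first key is the
// least recently used once we re-insert on every hit.
class ByteBudgetCache {
  private entries = new Map<string, string>();
  private bytes = 0;
  constructor(private limit: number) {}

  set(key: string, value: string): void {
    if (this.entries.has(key)) this.remove(key);
    this.entries.set(key, value);
    this.bytes += value.length;
    // Evict oldest entries until we are back under the budget.
    for (const k of this.entries.keys()) {
      if (this.bytes <= this.limit) break;
      this.remove(k);
    }
  }

  get(key: string): string | undefined {
    const v = this.entries.get(key);
    if (v !== undefined) {
      // Refresh recency: move the entry to the back of the Map.
      this.entries.delete(key);
      this.entries.set(key, v);
    }
    return v;
  }

  private remove(key: string): void {
    const v = this.entries.get(key);
    if (v !== undefined) {
      this.bytes -= v.length;
      this.entries.delete(key);
    }
  }

  get size(): number { return this.entries.size; }
}
```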

### 5. Streaming for Large Outputs

**Usage:**

```typescript
const stream = synth.generateStream(prompt, {
  maxTokens: 4096
});

for await (const chunk of stream) {
  // Process chunk immediately
  processChunk(chunk);
}
```

**Benefits:**

- Lower time-to-first-byte
- Reduced memory usage
- Better user experience
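
The memory benefit comes from handling one chunk at a time instead of buffering the whole output. A self-contained sketch using a stand-in async generator (`fakeStream` plays the role of `generateStream` here):

```typescript
// Illustrative stand-in for a token stream.
async function* fakeStream(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Each chunk is processed and then dropped, so peak memory stays
// proportional to one chunk rather than the full output.
async function totalLength(stream: AsyncIterable<string>): Promise<number> {
  let total = 0;
  for await (const chunk of stream) {
    total += chunk.length; // "process" the chunk, then let it go
  }
  return total;
}
```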

## Benchmarking

### Running Benchmarks

```bash
# Run all benchmarks
npm run benchmark

# Run specific suite
npm run benchmark -- --suite "Throughput Test"

# With custom settings
npm run benchmark -- --iterations 20 --concurrency 200

# Generate report
npm run benchmark -- --output benchmarks/report.md
```

### Benchmark Suites

1. **Throughput Test**: Measures requests per second
2. **Latency Test**: Measures P50/P95/P99 latencies
3. **Memory Test**: Measures memory usage and leaks
4. **Cache Test**: Measures cache effectiveness
5. **Concurrency Test**: Tests concurrent request handling
6. **Streaming Test**: Measures streaming performance
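
For reference, P50/P95/P99 figures like those in the latency suite follow the standard nearest-rank percentile computation; a self-contained sketch (not the benchmark runner's actual code):

```typescript
// Nearest-rank percentile: sort the samples and index by rank.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}
```

Note that P99 only becomes meaningful with enough samples; with fewer than 100 measurements it simply returns the maximum.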

### Analyzing Results

```bash
# Analyze performance
npm run perf:analyze

# Generate detailed report
npm run perf:report
```

## Bottleneck Detection

The built-in bottleneck analyzer automatically detects:

### 1. Latency Bottlenecks

- **Cause**: Slow API responses, network issues
- **Solution**: Increase cache size, optimize prompts
- **Impact**: 30-50% latency reduction

### 2. Throughput Bottlenecks

- **Cause**: Low concurrency, sequential processing
- **Solution**: Increase `maxConcurrency`, use the batch API
- **Impact**: 2-3x throughput increase

### 3. Memory Bottlenecks

- **Cause**: Large cache, memory leaks
- **Solution**: Reduce cache size, call `synth.optimize()`
- **Impact**: 40-60% memory reduction

### 4. Cache Bottlenecks

- **Cause**: Low hit rate, small cache
- **Solution**: Increase cache size, normalize cache keys
- **Impact**: 20-40% hit-rate improvement

## CI/CD Integration

### Performance Regression Detection

```bash
# Run in CI
npm run benchmark:ci
```

**Features:**

- Automatic threshold checking
- Fails the build on regression
- Generates reports to upload as CI artifacts
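
The threshold check behind a CI gate like this can be sketched as a comparison against a stored baseline; the field names and tolerance below are assumptions for illustration, not the tool's actual report schema:

```typescript
// Summary shape assumed for this sketch.
interface BenchSummary {
  p99LatencyMs: number;
  throughputRps: number;
}

// Flag latency growth or throughput loss beyond a noise tolerance.
function hasRegression(
  current: BenchSummary,
  baseline: BenchSummary,
  tolerance = 0.1 // allow 10% noise before failing the build
): boolean {
  const latencyRegressed =
    current.p99LatencyMs > baseline.p99LatencyMs * (1 + tolerance);
  const throughputRegressed =
    current.throughputRps < baseline.throughputRps * (1 - tolerance);
  return latencyRegressed || throughputRegressed;
}
```

In CI, a wrapper would load the baseline from a committed file, run the benchmarks, and exit non-zero when `hasRegression` returns true.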

### GitHub Actions Example

```yaml
- name: Performance Benchmarks
  run: npm run benchmark:ci

- name: Upload Report
  uses: actions/upload-artifact@v3
  with:
    name: performance-report
    path: benchmarks/performance-report.md
```

## Profiling

### CPU Profiling

```bash
npm run benchmark:profile
node --prof-process isolate-*.log > profile.txt
```

### Memory Profiling

```bash
node --expose-gc --max-old-space-size=512 dist/benchmarks/runner.js
```

### Chrome DevTools

```bash
node --inspect-brk dist/benchmarks/runner.js
# Open chrome://inspect
```

## Optimization Checklist

- [ ] Enable caching for repetitive workloads
- [ ] Set an appropriate cache size (1000+ items)
- [ ] Configure concurrency based on API limits
- [ ] Use the batch API for multiple generations
- [ ] Implement streaming for large outputs
- [ ] Monitor memory usage regularly
- [ ] Run benchmarks before releases
- [ ] Set up CI/CD performance tests
- [ ] Profile bottlenecks periodically
- [ ] Optimize prompt patterns for cache hits

## Performance Monitoring

### Runtime Metrics

```typescript
// Get current metrics
const metrics = synth.getMetrics();
console.log('Cache:', metrics.cache);
console.log('Memory:', metrics.memory);
console.log('Router:', metrics.router);
```

### Performance Monitor

```typescript
import { PerformanceMonitor } from '@ruvector/agentic-synth';

const monitor = new PerformanceMonitor();
monitor.start();

// ... run workload ...

const metrics = monitor.getMetrics();
console.log('Throughput:', metrics.throughput);
console.log('P99 Latency:', metrics.p99LatencyMs);
```

### Bottleneck Analysis

```typescript
import { BottleneckAnalyzer } from '@ruvector/agentic-synth';

const analyzer = new BottleneckAnalyzer();
const report = analyzer.analyze(metrics);

if (report.detected) {
  console.log('Bottlenecks:', report.bottlenecks);
  console.log('Recommendations:', report.recommendations);
}
```

## Best Practices

1. **Cache Strategy**: Use prompts as cache keys and normalize their formatting
2. **Concurrency**: Start with 100 and increase based on API limits
3. **Memory**: Monitor with `getMetrics()`, call `optimize()` periodically
4. **Streaming**: Use for outputs longer than 1000 tokens
5. **Benchmarking**: Run before releases and track trends over time
6. **Monitoring**: Enable in production and set up alerts
7. **Optimization**: Profile first, then optimize the identified bottlenecks
8. **Testing**: Include performance tests in CI/CD

## Troubleshooting

### High Latency

- Check the cache hit rate
- Increase cache size
- Optimize prompt patterns
- Check network connectivity

### Low Throughput

- Increase `maxConcurrency`
- Use the batch API
- Reduce `maxTokens`
- Check API rate limits

### High Memory Usage

- Reduce cache size
- Call `optimize()` regularly
- Use streaming for large outputs
- Check for memory leaks

### Low Cache Hit Rate

- Normalize prompt formatting
- Increase cache size
- Increase the cache TTL
- Review workload patterns

## Additional Resources

- [API Documentation](./API.md)
- [Examples](../examples/)
- [Benchmark Source](../src/benchmarks/)
- [GitHub Issues](https://github.com/ruvnet/ruvector/issues)