# Performance Optimization Guide
## Overview
Agentic-Synth is optimized for high-performance synthetic data generation with the following targets:
- **Sub-second response times** for cached requests
- **100+ concurrent generations** supported
- **Memory efficient** data handling (< 400MB)
- **50%+ cache hit rate** for typical workloads
## Performance Targets
| Metric | Target | Notes |
|--------|--------|-------|
| P99 Latency | < 1000ms | Cached requests: < 100ms |
| Throughput | > 10 req/s | Scales with concurrency |
| Memory Usage | < 400MB | With 1000-item cache |
| Cache Hit Rate | > 50% | Depends on workload patterns |
| Error Rate | < 1% | With retry logic |
## Optimization Strategies
### 1. Context Caching
**Configuration:**
```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  enableCache: true,
  cacheSize: 1000,    // Adjust based on memory
  cacheTTL: 3600000,  // 1 hour in milliseconds
});
```
**Benefits:**
- Reduces API calls by 50-80%
- Sub-100ms latency for cache hits
- Automatic LRU eviction
**Best Practices:**
- Use consistent prompts for better cache hits
- Increase cache size for repetitive workloads
- Monitor cache hit rate with `synth.getMetrics()`
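The caching behavior described above can be sketched as a small LRU map with TTL. This is an illustrative standalone sketch of the technique, not the library's internal implementation:
```typescript
// Minimal LRU + TTL cache sketch. JavaScript's Map iterates in
// insertion order, so the first key is always the least recently used.
type Entry<V> = { value: V; expiresAt: number };

class LruTtlCache<V> {
  private map = new Map<string, Entry<V>>();
  constructor(private maxSize: number, private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {  // expired: drop and miss
      this.map.delete(key);
      return undefined;
    }
    this.map.delete(key);                // re-insert to mark this key
    this.map.set(key, entry);            // as most recently used
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.map.size >= this.maxSize && !this.map.has(key)) {
      // Evict the least recently used entry (first insertion-order key)
      const lru = this.map.keys().next().value as string;
      this.map.delete(lru);
    }
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```
The same touch-on-read, evict-oldest-on-write pattern is what keeps hot prompts cached while cold ones age out.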
### 2. Model Routing
**Configuration:**
```typescript
const synth = new AgenticSynth({
  modelPreference: [
    'claude-sonnet-4-5-20250929',
    'claude-3-5-sonnet-20241022'
  ],
});
```
**Features:**
- Automatic load balancing
- Performance-based routing
- Error handling and fallback
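The fallback behavior can be sketched as preference-ordered routing: try each model in turn and fall through on failure. The function name and shape here are illustrative assumptions, not the router's actual API:
```typescript
// Try models in preference order; on failure, fall through to the next.
// Throws the last error only if every model fails.
async function routeWithFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err; // record and try the next preference
    }
  }
  throw lastError;
}
```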
### 3. Concurrent Generation
**Configuration:**
```typescript
const synth = new AgenticSynth({
  maxConcurrency: 100, // Adjust based on API limits
});
```
**Usage:**
```typescript
const prompts = [...]; // 100+ prompts
const results = await synth.generateBatch(prompts, {
  maxTokens: 500
});
```
**Performance:**
- 2-3x faster than sequential
- Respects concurrency limits
- Automatic batching
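A bounded-concurrency batch runner of the kind `maxConcurrency` implies can be sketched with a shared index and a fixed pool of workers. This is a sketch of the technique; `generateBatch`'s real implementation may differ:
```typescript
// Run tasks over `items` with at most `limit` in flight at once.
// Results are stored by index, so output order matches input order.
async function runBatch<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index (single-threaded JS: no race)
      results[i] = await task(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```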
### 4. Memory Management
**Configuration:**
```typescript
const synth = new AgenticSynth({
  memoryLimit: 512 * 1024 * 1024, // 512MB
});
```
**Features:**
- Automatic memory tracking
- LRU eviction when over limit
- Periodic cleanup with `synth.optimize()`
### 5. Streaming for Large Outputs
**Usage:**
```typescript
const stream = synth.generateStream(prompt, {
  maxTokens: 4096
});

for await (const chunk of stream) {
  // Process chunk immediately
  processChunk(chunk);
}
```
**Benefits:**
- Lower time-to-first-byte
- Reduced memory usage
- Better user experience
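A consumer loop like the one above expects an async iterable of chunks. A hypothetical generator makes the shape concrete (this stands in for the SDK's stream type, which may differ):
```typescript
// Illustrative chunk source: yields fixed-size slices of a string,
// the way a token stream arrives piece by piece.
async function* fakeStream(text: string, chunkSize: number) {
  for (let i = 0; i < text.length; i += chunkSize) {
    yield text.slice(i, i + chunkSize);
  }
}
```
Because each chunk is handled as it arrives, peak memory stays near one chunk rather than the full output.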
## Benchmarking
### Running Benchmarks
```bash
# Run all benchmarks
npm run benchmark
# Run specific suite
npm run benchmark -- --suite "Throughput Test"
# With custom settings
npm run benchmark -- --iterations 20 --concurrency 200
# Generate report
npm run benchmark -- --output benchmarks/report.md
```
### Benchmark Suites
1. **Throughput Test**: Measures requests per second
2. **Latency Test**: Measures P50/P95/P99 latencies
3. **Memory Test**: Measures memory usage and leaks
4. **Cache Test**: Measures cache effectiveness
5. **Concurrency Test**: Tests concurrent request handling
6. **Streaming Test**: Measures streaming performance
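The P50/P95/P99 figures the latency suite reports come down to percentile math over a latency sample. A minimal nearest-rank version, for reference:
```typescript
// Nearest-rank percentile: sort the sample, then take the value at
// rank ceil(p/100 * n), converted to a 0-based index.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("empty sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```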
### Analyzing Results
```bash
# Analyze performance
npm run perf:analyze
# Generate detailed report
npm run perf:report
```
## Bottleneck Detection
The built-in bottleneck analyzer automatically detects:
### 1. Latency Bottlenecks
- **Cause**: Slow API responses, network issues
- **Solution**: Increase cache size, optimize prompts
- **Impact**: 30-50% latency reduction
### 2. Throughput Bottlenecks
- **Cause**: Low concurrency, sequential processing
- **Solution**: Increase maxConcurrency, use batch API
- **Impact**: 2-3x throughput increase
### 3. Memory Bottlenecks
- **Cause**: Large cache, memory leaks
- **Solution**: Reduce cache size, call optimize()
- **Impact**: 40-60% memory reduction
### 4. Cache Bottlenecks
- **Cause**: Low hit rate, small cache
- **Solution**: Increase cache size, optimize keys
- **Impact**: 20-40% hit-rate improvement
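At its core, detection of the four bottleneck classes above is threshold checks over collected metrics. A hedged sketch, where the field names and thresholds are illustrative assumptions (the performance-target table is used for the cutoffs), not the analyzer's real interface:
```typescript
// Map metrics to findings by comparing against target thresholds.
interface SampleMetrics {
  p99LatencyMs: number;
  cacheHitRate: number; // 0..1
  memoryMb: number;
}

function detectBottlenecks(m: SampleMetrics): string[] {
  const findings: string[] = [];
  if (m.p99LatencyMs > 1000)
    findings.push("latency: increase cache size, optimize prompts");
  if (m.cacheHitRate < 0.5)
    findings.push("cache: increase cache size, normalize keys");
  if (m.memoryMb > 400)
    findings.push("memory: reduce cache size, call optimize()");
  return findings;
}
```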
## CI/CD Integration
### Performance Regression Detection
```bash
# Run in CI
npm run benchmark:ci
```
**Features:**
- Automatic threshold checking
- Fails build on regression
- Generates reports for artifacts
### GitHub Actions Example
```yaml
- name: Performance Benchmarks
  run: npm run benchmark:ci
- name: Upload Report
  uses: actions/upload-artifact@v3
  with:
    name: performance-report
    path: benchmarks/performance-report.md
```
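The regression gate that `benchmark:ci` implies amounts to comparing current numbers against a stored baseline with a tolerance. A sketch under assumed field names and a 10% default tolerance:
```typescript
// Fail the build if latency rose or throughput fell beyond tolerance.
interface BenchResult {
  throughputRps: number;
  p99LatencyMs: number;
}

function hasRegression(
  baseline: BenchResult,
  current: BenchResult,
  tolerance = 0.1
): boolean {
  const slower =
    current.p99LatencyMs > baseline.p99LatencyMs * (1 + tolerance);
  const lessThroughput =
    current.throughputRps < baseline.throughputRps * (1 - tolerance);
  return slower || lessThroughput;
}
```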
## Profiling
### CPU Profiling
```bash
npm run benchmark:profile
node --prof-process isolate-*.log > profile.txt
```
### Memory Profiling
```bash
node --expose-gc --max-old-space-size=512 dist/benchmarks/runner.js
```
### Chrome DevTools
```bash
node --inspect-brk dist/benchmarks/runner.js
# Open chrome://inspect
```
## Optimization Checklist
- [ ] Enable caching for repetitive workloads
- [ ] Set appropriate cache size (1000+ items)
- [ ] Configure concurrency based on API limits
- [ ] Use batch API for multiple generations
- [ ] Implement streaming for large outputs
- [ ] Monitor memory usage regularly
- [ ] Run benchmarks before releases
- [ ] Set up CI/CD performance tests
- [ ] Profile bottlenecks periodically
- [ ] Optimize prompt patterns for cache hits
## Performance Monitoring
### Runtime Metrics
```typescript
// Get current metrics
const metrics = synth.getMetrics();
console.log('Cache:', metrics.cache);
console.log('Memory:', metrics.memory);
console.log('Router:', metrics.router);
```
### Performance Monitor
```typescript
import { PerformanceMonitor } from '@ruvector/agentic-synth';
const monitor = new PerformanceMonitor();
monitor.start();
// ... run workload ...
const metrics = monitor.getMetrics();
console.log('Throughput:', metrics.throughput);
console.log('P99 Latency:', metrics.p99LatencyMs);
```
### Bottleneck Analysis
```typescript
import { BottleneckAnalyzer } from '@ruvector/agentic-synth';

const analyzer = new BottleneckAnalyzer();
const report = analyzer.analyze(metrics);

if (report.detected) {
  console.log('Bottlenecks:', report.bottlenecks);
  console.log('Recommendations:', report.recommendations);
}
```
## Best Practices
1. **Cache Strategy**: Use prompts as cache keys, normalize formatting
2. **Concurrency**: Start with 100, increase based on API limits
3. **Memory**: Monitor with getMetrics(), call optimize() periodically
4. **Streaming**: Use for outputs > 1000 tokens
5. **Benchmarking**: Run before releases, track trends
6. **Monitoring**: Enable in production, set up alerts
7. **Optimization**: Profile first, optimize bottlenecks
8. **Testing**: Include performance tests in CI/CD
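The "normalize formatting" advice in the Cache Strategy item can be sketched as a key-derivation step: collapse whitespace and case so near-identical prompts hit the same cache entry. Purely illustrative:
```typescript
// Derive a stable cache key from a prompt: trim, collapse runs of
// whitespace (including newlines) to one space, and lowercase.
function normalizePromptKey(prompt: string): string {
  return prompt.trim().replace(/\s+/g, " ").toLowerCase();
}
```
Whether lowercasing is safe depends on your workload; drop it if prompt case is meaningful.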
## Troubleshooting
### High Latency
- Check cache hit rate
- Increase cache size
- Optimize prompt patterns
- Check network connectivity
### Low Throughput
- Increase maxConcurrency
- Use batch API
- Reduce maxTokens
- Check API rate limits
### High Memory Usage
- Reduce cache size
- Call optimize() regularly
- Use streaming for large outputs
- Check for memory leaks
### Low Cache Hit Rate
- Normalize prompt formatting
- Increase cache size
- Increase TTL
- Review workload patterns
## Additional Resources
- [API Documentation](./API.md)
- [Examples](../examples/)
- [Benchmark Source](../src/benchmarks/)
- [GitHub Issues](https://github.com/ruvnet/ruvector/issues)