# DSPy.ts Learning Session
Production-ready DSPy integration framework for multi-model AI training with automatic prompt optimization, cross-model learning, and comprehensive benchmarking.
## Overview
The DSPy Learning Session provides a powerful orchestration framework for training multiple AI models concurrently, optimizing prompts automatically, and comparing performance across different model providers.
## Key Features
- 🚀 Concurrent Multi-Model Training: Train 4+ models in parallel (Claude, GPT-4, Llama, Gemini)
- 🧠 DSPy-Powered Optimization: Automatic prompt optimization using DSPy signatures
- 📊 Real-time Metrics: Track quality, latency, cost, and convergence in real-time
- 🔄 Cross-Model Learning: Share successful patterns across different models
- 💰 Cost Tracking: Monitor and control costs with budget limits
- ⚡ Convergence Detection: Automatically detect when models reach optimal performance
- 🔗 Hooks Integration: Seamless integration with Claude Flow swarm coordination
- 📈 Comprehensive Benchmarking: Generate detailed reports with comparative analysis
## Architecture

### Core Components

#### 1. DSPyTrainingSession
Main orchestrator that manages the entire training pipeline.
```typescript
const session = new DSPyTrainingSession({
models: [/* model configs */],
optimizationRounds: 5,
convergenceThreshold: 0.95,
maxConcurrency: 4,
enableCrossLearning: true,
enableHooksIntegration: true,
costBudget: 10.0
});
```

#### 2. ModelTrainingAgent
Abstract base class for model-specific agents.
- `ClaudeSonnetAgent`: Claude Sonnet 4 training
- `GPT4Agent`: GPT-4 Turbo training
- `LlamaAgent`: Llama 3.1 training
- `GeminiAgent`: Gemini 2.0 Flash training
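A custom provider can plug in by extending the base class. The sketch below is illustrative only: the `generate` method name and the `config` field are assumptions, not the package's confirmed API.

```typescript
// Hypothetical sketch: `generate` and `this.config` are assumed names,
// not taken from the package's documented API.
class MyProviderAgent extends ModelTrainingAgent {
  async generate(prompt: string): Promise<string> {
    // Call the provider's HTTP API using this agent's model settings.
    const res = await fetch('https://api.example-provider.com/v1/complete', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.config.apiKey}`,
      },
      body: JSON.stringify({
        model: this.config.model,
        prompt,
        temperature: this.config.temperature ?? 0.7,
        maxTokens: this.config.maxTokens ?? 1000,
      }),
    });
    const data = await res.json();
    return data.text; // response shape is provider-specific
  }
}
```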
#### 3. OptimizationEngine
DSPy-powered prompt optimization engine.
```typescript
const optimizer = new OptimizationEngine();
const signature = optimizer.createSignature(
'task-name',
'input description',
'output description',
{
examples: [/* few-shot examples */],
constraints: [/* validation rules */],
objectives: [/* optimization goals */]
}
);
```

#### 4. BenchmarkCollector
Metrics collection and analysis.
```typescript
const collector = new BenchmarkCollector();
collector.addResult(result);
const comparison = collector.getComparison();
const bestModel = collector.getBestModel();
```

## Training Pipeline

### Phase 1: Baseline Generation
All models generate initial outputs to establish baseline performance.
- Runs 3 iterations per model (configurable)
- Collects quality and performance metrics
- No optimization applied
### Phase 2: DSPy Optimization
Prompts are optimized based on previous results.
- 5 rounds of optimization per model (configurable)
- DSPy signatures guide optimization
- Continuous quality improvement
- Convergence detection
### Phase 3: Cross-Model Learning
Best patterns are shared across models, as sketched after this list.
- Identify best-performing model
- Extract successful patterns
- Apply to other models
- Boost overall performance
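Conceptually, this phase reduces to a few lines. The helper below is an illustrative sketch of the idea, not the framework's internal API:

```typescript
// Illustrative sketch of cross-model learning, not the real internals.
function shareBestPatterns(
  scores: Map<string, number>,  // model name -> latest quality score
  prompts: Map<string, string>, // model name -> current optimized prompt
): void {
  if (scores.size === 0) return;
  // 1. Identify the best-performing model.
  const [best] = [...scores.entries()].sort((a, b) => b[1] - a[1])[0];
  // 2. Extract its optimized prompt as the successful pattern.
  const pattern = prompts.get(best);
  if (pattern === undefined) return;
  // 3. Seed every other model's next optimization round with that pattern.
  for (const model of prompts.keys()) {
    if (model !== best) prompts.set(model, pattern);
  }
}
```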
### Phase 4: Final Benchmark
Comprehensive performance comparison.
- 50-100 samples per model (configurable)
- Statistical analysis
- Cost-per-quality metrics
- Latency profiling
### Phase 5: Report Generation
Detailed analysis and recommendations.
- Quality score comparisons
- Cost efficiency analysis
- Latency benchmarks
- Best model identification
- Improvement rates
## Metrics

### Quality Metrics (0.0-1.0)
- Score: Overall quality score (weighted average)
- Accuracy: Output correctness and format compliance
- Coherence: Logical flow and consistency
- Relevance: Alignment with input requirements
- Diversity: Vocabulary richness
- Creativity: Novel expression and uncommon patterns
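The overall score is described above as a weighted average of the sub-metrics. A sketch of that computation follows; the weights are illustrative assumptions, since the framework's actual weighting is internal:

```typescript
// Illustrative weights only; they sum to 1.0 so the score stays in [0, 1].
interface QualitySubMetrics {
  accuracy: number;
  coherence: number;
  relevance: number;
  diversity: number;
  creativity: number; // each in [0, 1]
}

function overallScore(m: QualitySubMetrics): number {
  return (
    0.30 * m.accuracy +
    0.25 * m.coherence +
    0.25 * m.relevance +
    0.10 * m.diversity +
    0.10 * m.creativity
  );
}
```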
### Performance Metrics
- Latency: Generation time (milliseconds)
- Throughput: Samples per second
- Tokens Used: Total token consumption
- Cost: USD per generation
- Memory Usage: Heap usage (MB)
- Error Rate: Failed generations ratio
### Training Metrics
- Convergence Rate: Quality improvement velocity
- Improvement Rate: Total quality gain percentage
- Cost Efficiency: Quality per dollar spent
- Learning Speed: Iterations to convergence
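Two of these follow directly from the definitions above:

```typescript
// Improvement rate: total quality gain as a percentage of the baseline score.
const improvementRate = (finalScore: number, baselineScore: number): number =>
  ((finalScore - baselineScore) / baselineScore) * 100;

// Cost efficiency: quality achieved per dollar spent.
const costEfficiency = (finalScore: number, totalCostUsd: number): number =>
  finalScore / totalCostUsd;
```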
## Usage Examples

### Basic Training
```typescript
// OptimizationEngine is used below; assumed exported from the same module
import { DSPyTrainingSession, ModelProvider, OptimizationEngine } from './training/dspy-learning-session.js';
const session = new DSPyTrainingSession({
models: [
{
provider: ModelProvider.CLAUDE,
model: 'claude-sonnet-4',
apiKey: process.env.ANTHROPIC_API_KEY
},
{
provider: ModelProvider.GEMINI,
model: 'gemini-2.0-flash-exp',
apiKey: process.env.GEMINI_API_KEY
}
],
optimizationRounds: 5,
costBudget: 5.0
});
// Listen to events
session.on('iteration', (result) => {
console.log(`${result.modelProvider}: Quality=${result.quality.score.toFixed(3)}`);
});
session.on('complete', (data) => {
console.log('Training complete!');
console.log(data.report);
});
// Run training
const optimizer = new OptimizationEngine();
const signature = optimizer.createSignature(
'task',
'input',
'output',
{ constraints: ['min_length:100'] }
);
await session.run('Your prompt here', signature);
```

### Cost-Optimized Training
```typescript
const session = new DSPyTrainingSession({
models: [
{
provider: ModelProvider.GEMINI, // Low cost
model: 'gemini-2.0-flash-exp',
apiKey: process.env.GEMINI_API_KEY
},
{
provider: ModelProvider.LLAMA, // Very low cost
model: 'llama-3.1-70b',
apiKey: process.env.TOGETHER_API_KEY
}
],
optimizationRounds: 3,
baselineIterations: 2,
benchmarkSamples: 20,
costBudget: 1.0 // Strict $1 budget
});
```

### Quality-Focused Training
```typescript
const session = new DSPyTrainingSession({
models: [
{
provider: ModelProvider.CLAUDE,
model: 'claude-sonnet-4',
apiKey: process.env.ANTHROPIC_API_KEY,
temperature: 0.3 // Lower for consistency
},
{
provider: ModelProvider.GPT4,
model: 'gpt-4-turbo',
apiKey: process.env.OPENAI_API_KEY,
temperature: 0.3
}
],
optimizationRounds: 15,
convergenceThreshold: 0.98,
benchmarkSamples: 100
});
```

## Event System

### Available Events
- `start`: Training session begins
- `phase`: Phase transition
- `iteration`: Single iteration complete
- `metrics`: Real-time metrics update
- `optimization_round`: Optimization round starts
- `converged`: Model reaches convergence
- `benchmark_progress`: Benchmark progress update
- `budget_exceeded`: Cost budget exceeded
- `report`: Final report generated
- `complete`: Training session complete
- `stopped`: Session manually stopped
- `error`: Error occurred
- `hooks_integration`: Hooks coordination event
### Event Listeners
```typescript
session.on('iteration', (result: IterationResult) => {
// Handle each iteration
});
session.on('phase', (phase: TrainingPhase) => {
// Handle phase transitions
});
session.on('metrics', (metrics) => {
// Track real-time metrics
});
session.on('complete', (data) => {
// Process final results
});
```

## Integration

### Claude Flow Hooks
When `enableHooksIntegration: true`, the session automatically:
- Pre-Task: Initialize swarm coordination
- During Training: Store results in shared memory
- Post-Task: Export metrics and best models
- Session End: Generate coordination reports
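These coordination steps surface through the `hooks_integration` event, so they can be observed from application code. The payload shape below is an assumption for illustration:

```typescript
// `hooks_integration` is documented under Available Events;
// the payload shape is an assumption.
session.on('hooks_integration', (event: unknown) => {
  console.log('[hooks]', JSON.stringify(event));
});
```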
### Memory Coordination
```typescript
// Results stored in swarm memory
{
key: 'swarm/training/dspy-results',
value: {
bestModel: 'claude',
comparison: { /* stats */ },
totalCost: 5.23,
timestamp: '2025-11-22T...'
}
}
```

## Configuration

### TrainingConfig
```typescript
interface TrainingConfig {
models: ModelConfig[]; // Array of model configurations
optimizationRounds?: number; // Default: 5
convergenceThreshold?: number; // Default: 0.95
maxConcurrency?: number; // Default: 4
enableCrossLearning?: boolean; // Default: true
enableHooksIntegration?: boolean; // Default: true
costBudget?: number; // USD, optional
timeoutPerIteration?: number; // Default: 30000ms
baselineIterations?: number; // Default: 3
benchmarkSamples?: number; // Default: 100
}
```

### ModelConfig
```typescript
interface ModelConfig {
provider: ModelProvider;
model: string;
apiKey: string;
temperature?: number; // Default: 0.7
maxTokens?: number; // Default: 1000
topP?: number; // Optional
presencePenalty?: number; // Optional
frequencyPenalty?: number; // Optional
}
```

### DSPySignature
```typescript
interface DSPySignature {
input: string; // Input description
output: string; // Expected output format
examples?: Array<{ // Few-shot examples
input: string;
output: string;
}>;
constraints?: string[]; // Validation rules
objectives?: string[]; // Optimization goals
}
```
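As a filled-in example (field contents are illustrative; only the `min_length` constraint format appears elsewhere in this document):

```typescript
// Illustrative signature; the field contents are examples, not requirements.
const summarySignature: DSPySignature = {
  input: 'A technical article of 500-2000 words',
  output: 'A three-sentence plain-language summary',
  examples: [
    {
      input: 'TypeScript 5.0 introduces standardized decorators...',
      output: 'TypeScript 5.0 ships decorators aligned with the TC39 proposal...',
    },
  ],
  constraints: ['min_length:100'],
  objectives: ['maximize factual coverage', 'minimize jargon'],
};
```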
## Cost Information

### Model Pricing (Approximate)
| Model | Cost per 1K tokens | Relative Cost |
|---|---|---|
| Gemini Flash | $0.00025 | 1x (cheapest) |
| Llama 3.1 | $0.0002 | 0.8x |
| Claude Sonnet | $0.003 | 12x |
| GPT-4 Turbo | $0.03 | 120x |
### Budget Planning
For a typical training session:
- Budget $1: ~200 iterations with Gemini/Llama
- Budget $5: ~100 iterations with Claude + mixed models
- Budget $10: ~50 iterations with all models including GPT-4
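A back-of-the-envelope way to sanity-check these numbers (the tokens-per-iteration figure is an assumption to calibrate against your own workload):

```typescript
// Rough planning helper; tokens-per-iteration is an assumed figure.
function estimateIterations(
  budgetUsd: number,
  costPer1kTokens: number,
  tokensPerIteration: number,
): number {
  const costPerIteration = (tokensPerIteration / 1000) * costPer1kTokens;
  return Math.floor(budgetUsd / costPerIteration);
}

// e.g. a $1 budget at Gemini Flash pricing, assuming ~20K tokens per iteration
console.log(estimateIterations(1.0, 0.00025, 20_000)); // 200
```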
## Best Practices

### 1. Start Small
```typescript
// Begin with 2 models and low iterations
const session = new DSPyTrainingSession({
models: [
{ provider: ModelProvider.GEMINI, /* ... */ },
{ provider: ModelProvider.CLAUDE, /* ... */ }
],
optimizationRounds: 3,
benchmarkSamples: 20
});
```

### 2. Use Cost-Effective Models First
Train with Gemini/Llama first, then validate winners with Claude/GPT-4.
### 3. Set Realistic Budgets
Start with $1-2 budgets for experimentation.
### 4. Monitor Convergence
Enable convergence detection to avoid over-training.
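For example, the documented `converged` event can signal when further rounds stop paying off (the payload shape is an assumption):

```typescript
// `converged` is listed under Available Events; the payload is assumed here.
session.on('converged', (model: string) => {
  console.log(`${model} converged; additional rounds are unlikely to help.`);
});
```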
### 5. Leverage Cross-Learning
Enable cross-model learning to share best practices.
### 6. Define Clear Signatures
Provide examples, constraints, and objectives for better optimization.
## Troubleshooting

### High Costs
- Reduce `benchmarkSamples`
- Lower `optimizationRounds`
- Use cost-effective models (Gemini, Llama)
- Set a strict `costBudget`
### Slow Convergence
- Increase `optimizationRounds`
- Add more examples to the DSPy signature
- Adjust model temperature (lower = more consistent)
- Enable cross-model learning
### Low Quality Scores
- Review DSPy signature constraints
- Add more few-shot examples
- Increase `convergenceThreshold`
- Use higher-quality models
### Memory Issues
- Reduce `maxConcurrency`
- Lower `benchmarkSamples`
- Clear results between sessions
## Examples
See `examples/dspy-training-example.ts` for:
- Basic training session
- Advanced monitoring
- Cost-optimized training
- Quality-focused training
- Benchmark comparison
Run examples:
```bash
# Run basic example
npm run example:dspy 0
# Run cost-optimized example
npm run example:dspy 2
# Run quality-focused example
npm run example:dspy 3
```

## API Reference

### Classes
- `DSPyTrainingSession`: Main orchestrator
- `ModelTrainingAgent`: Base agent class
- `ClaudeSonnetAgent`: Claude training agent
- `GPT4Agent`: GPT-4 training agent
- `LlamaAgent`: Llama training agent
- `GeminiAgent`: Gemini training agent
- `OptimizationEngine`: DSPy optimization
- `BenchmarkCollector`: Metrics collection
### Enums
- `ModelProvider`: Model provider types
- `TrainingPhase`: Training pipeline phases
### Interfaces
- `TrainingConfig`: Session configuration
- `ModelConfig`: Model configuration
- `DSPySignature`: DSPy signature definition
- `QualityMetrics`: Quality measurement
- `PerformanceMetrics`: Performance measurement
- `IterationResult`: Single iteration result
## License
MIT
## Contributing
Contributions welcome! Please see CONTRIBUTING.md.