# RuvLLM Training Module Fine-tuning dataset generation for RuvLTRA models, focusing on Claude Flow agent task routing and model selection. ## SOTA Achievements (v2.3) | Metric | Before | After | Method | |--------|--------|-------|--------| | **Hybrid Routing Accuracy** | 95% | **100%** | Keyword-First + Embedding Fallback | | **Embedding-Only Accuracy** | 45% | **88.2%** | Contrastive Learning (Triplet + InfoNCE) | | **Hard Negative Accuracy** | N/A | **81.2%** | Claude-Generated Confusing Pairs | | **Agent Types Supported** | 13 | 13 | All Claude Code agent types | ### Training Data (v2.3 SOTA) - **Base triplets**: 578 examples from Claude Code routing data - **Claude-generated hard negatives**: 500+ high-quality confusing pairs - **Total training set**: 1,078 triplets - **Hard negative ratio**: 48.4% (up from 18%) ### Training Pipeline ``` ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ Hard Negative │────►│ Contrastive │────►│ GRPO Feedback │ │ Generation │ │ Training │ │ Loop │ │ (Claude Opus) │ │ (Candle/Metal) │ │ (Claude Judge) │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ ▼ ┌──────────────────┐ │ GGUF Export │ │ (Adapter Merge) │ └──────────────────┘ ``` ## Overview The training module generates synthetic datasets for fine-tuning RuvLTRA models on two key tasks: 1. **Agent Routing**: Classify tasks to appropriate Claude Flow agents (Coder, Researcher, Security, Architecture, Reviewer) 2. **Model Selection**: Route tasks to optimal Claude models (Haiku/Sonnet/Opus) based on complexity ## Real Contrastive Training (v2.3 - Production) The `real_trainer` module provides production-grade training with actual Candle weight updates: ```rust use ruvllm::training::{RealContrastiveTrainer, RealTrainingConfig, run_training_pipeline}; use std::path::PathBuf; // Option 1: Full pipeline with GRPO feedback #[tokio::main] async fn main() -> Result<(), String> { run_training_pipeline( &PathBuf::from("~/.ruvllm/training/combined-sota.jsonl"), &PathBuf::from("ruvltra-claude-code-0.5b-q4_k_m.gguf"), &PathBuf::from("ruvltra-claude-code-sota.gguf"), Some(&std::env::var("ANTHROPIC_API_KEY").unwrap()), // For GRPO ).await } // Option 2: Manual training with fine-grained control let config = RealTrainingConfig { model_path: PathBuf::from("ruvltra-claude-code-0.5b-q4_k_m.gguf"), output_path: PathBuf::from("ruvltra-claude-code-sota.gguf"), learning_rate: 2e-5, weight_decay: 0.01, batch_size: 16, epochs: 30, margin: 0.5, // Triplet loss margin temperature: 0.07, // InfoNCE temperature embedding_dim: 896, // Qwen 0.5B embedding size use_metal: true, // Apple Silicon GPU acceleration enable_grpo: true, // Enable GRPO reward scaling ..Default::default() }; let mut trainer = RealContrastiveTrainer::new(config)?; trainer.load_triplets("combined-sota.jsonl")?; // Train with real weight updates let result = trainer.train()?; println!("Best accuracy: {:.2}%", result.best_accuracy * 100.0); // Export to GGUF format let export = trainer.export_gguf("output.gguf")?; println!("Exported {} weights to {}", export.total_weights, export.weights_path.display()); ``` ### GGUF Export The trainer exports adapter weights that can be merged with the base Qwen model: ```bash # After training, merge adapter with base model bash output.gguf.weights/merge_adapter.sh # Files created: # - output.gguf.weights/adapter_weights.bin (binary weights) # - output.gguf.weights/metadata.json (training config) # - output.gguf.weights/merge_adapter.sh (merge script) ``` ### GRPO Feedback Loop GRPO (Group Relative Policy Optimization) uses Claude as a judge to improve training: ```rust use ruvllm::training::{GrpoEvaluator, GrpoFeedback}; let evaluator = GrpoEvaluator::new(api_key); // Evaluate predictions let predictions = vec![ ("Add error handling".to_string(), "coder".to_string(), "coder".to_string()), ("Review the PR".to_string(), "reviewer".to_string(), "tester".to_string()), ]; let feedback = evaluator.evaluate(&predictions).await?; for fb in feedback { trainer.add_grpo_feedback(fb); } // Re-train with GRPO-enhanced loss scaling let result = trainer.train()?; ``` ## Contrastive Learning (Simulated) The `contrastive` module provides state-of-the-art embedding fine-tuning: ```rust use ruvllm::training::{ContrastiveTrainer, ContrastiveConfig, TrainingTriplet}; // Configure contrastive training let config = ContrastiveConfig { learning_rate: 2e-5, margin: 0.5, // Triplet loss margin temperature: 0.07, // InfoNCE temperature batch_size: 32, embedding_dim: 896, // Qwen 0.5B embedding size hard_negative_ratio: 0.18, use_metal: true, // Apple Silicon GPU ..Default::default() }; // Initialize and train let mut trainer = ContrastiveTrainer::new(config)?; trainer.load_triplets("triplets.jsonl")?; let result = trainer.train(30)?; // 30 epochs println!("Final accuracy: {:.2}%", result.final_accuracy * 100.0); ``` ### Claude-Powered Hard Negative Generation Generate high-quality confusing training pairs using Claude Opus 4.5: ```bash node scripts/training/claude-hard-negatives.js --count=10 --grpo # Output: ~/.ruvllm/training/claude-hard-negatives.jsonl ``` This generates triplets for confusing agent pairs: - `coder` vs `refactorer` (both modify code) - `researcher` vs `architect` (both analyze) - `reviewer` vs `tester` (both validate) - `debugger` vs `optimizer` (both fix issues) - And 6 more confusing pairs... ## Quick Start ```rust use ruvllm::training::{DatasetGenerator, DatasetConfig}; // Generate dataset with 100 examples per category let config = DatasetConfig::default(); let mut generator = DatasetGenerator::new(config); let dataset = generator.generate(); // Export to JSONL dataset.export_jsonl("training.jsonl")?; // Split for training/validation/test let (train, val, test) = dataset.split(0.7, 0.15, 0.15, 42); ``` ## Task Categories ### 1. Coder (20% of dataset) - **Focus**: Code generation, debugging, refactoring - **Examples**: - "Implement JWT authentication middleware in TypeScript" - "Debug memory leak in request handler" - "Refactor UserService to use dependency injection" **Model Routing:** - Simple tasks → Haiku (quick fixes, simple functions) - Moderate tasks → Sonnet (components, APIs) - Complex tasks → Opus (algorithms, system-level) ### 2. Researcher (20% of dataset) - **Focus**: Analysis, exploration, documentation - **Examples**: - "Analyze GraphQL performance bottlenecks" - "Research best practices for microservices" - "Document REST API endpoints" **Model Routing:** - Simple tasks → Haiku (basic docs) - Moderate/Complex → Sonnet (analysis, research) ### 3. Security (20% of dataset) - **Focus**: Audit, vulnerability analysis, threat detection - **Examples**: - "Audit authentication flow for security vulnerabilities" - "Review cryptographic key management" - "Identify SQL injection attack vectors" **Model Routing:** - All tasks → Opus (security requires highest quality) ### 4. Architecture (20% of dataset) - **Focus**: System design, planning, architecture - **Examples**: - "Design microservices architecture for e-commerce" - "Plan database schema for multi-tenant SaaS" - "Architect real-time event streaming pipeline" **Model Routing:** - Simple tasks → Sonnet (basic schemas) - Moderate/Complex → Opus (distributed systems) ### 5. Reviewer (20% of dataset) - **Focus**: Code review, quality assessment - **Examples**: - "Review pull request #123 for best practices" - "Assess code quality of UserController" - "Review error handling in payment service" **Model Routing:** - Simple tasks → Haiku (standards compliance) - Moderate/Complex → Sonnet (quality, architecture review) ## Dataset Configuration ```rust use ruvllm::training::{DatasetConfig, AugmentationConfig}; let config = DatasetConfig { // Base examples per category examples_per_category: 100, // Enable data augmentation enable_augmentation: true, // Augmentation settings augmentation: AugmentationConfig { // Generate 2 paraphrases per example paraphrases_per_example: 2, // Generate 2 complexity variations complexity_variations: 2, // Enable domain transfer enable_domain_transfer: true, }, // Random seed for reproducibility seed: 42, }; ``` ### Dataset Size Calculation With default configuration: - **Base examples**: 5 categories × 100 = 500 examples - **Paraphrases**: 500 × 2 = 1,000 additional examples - **Complexity variations**: 500 × 2 = ~800 additional examples (some filtered) - **Domain transfer**: 500 × 1 = ~400 additional examples (some filtered) - **Total**: ~2,700 examples (actual varies due to filtering) ## Data Augmentation ### 1. Paraphrasing Replaces words with synonyms to increase linguistic diversity: ``` Original: "Implement a function to validate user input" Paraphrased: "Create a function to validate user input" "Build a function to validate user input" ``` ### 2. Complexity Variations Creates examples at different complexity levels: ``` Simple: "Add error handling to API endpoint" Moderate: "Implement error handling with retry logic" Complex: "Design fault-tolerant error handling with circuit breakers" ``` ### 3. Domain Transfer Applies task patterns across technical domains: ``` Web: "Optimize React component rendering" Mobile: "Optimize Flutter widget rendering" Systems: "Optimize kernel thread scheduling" ``` ## Export Formats ### JSONL (Streaming Format) ```rust // One JSON object per line dataset.export_jsonl("training.jsonl")?; ``` **Example line:** ```json {"input":"Implement authentication middleware","context":"JWT with RS256","output_agent":"coder","metadata":{"category":"Coder","complexity":"Moderate","domain":"Web","expected_model":"sonnet","quality_score":0.87,"tags":["auth","middleware"]}} ``` ### JSON (Full Array) ```rust // Human-readable JSON array dataset.export_json("training.json")?; ``` ### Statistics ```rust // Export dataset statistics dataset.export_stats("stats.json")?; ``` **Stats format:** ```json { "total_examples": 2700, "examples_per_category": { "coder": 540, "researcher": 540, "security": 540, "architecture": 540, "reviewer": 540 }, "examples_per_complexity": { "Simple": 900, "Moderate": 1080, "Complex": 720 }, "avg_quality_score": 0.87 } ``` ## Dataset Splits ```rust // 70% train, 15% validation, 15% test let (train, val, test) = dataset.split(0.7, 0.15, 0.15, 42); // Export each split ClaudeTaskDataset::new(train).export_jsonl("train.jsonl")?; ClaudeTaskDataset::new(val).export_jsonl("val.jsonl")?; ClaudeTaskDataset::new(test).export_jsonl("test.jsonl")?; ``` ## Example Structure ### ClaudeTaskExample ```rust pub struct ClaudeTaskExample { /// Task description (model input) pub input: String, /// Additional context pub context: String, /// Expected agent (target output) pub output_agent: String, /// Task metadata pub metadata: TaskMetadata, } ``` ### TaskMetadata ```rust pub struct TaskMetadata { /// Task category pub category: TaskCategory, /// Complexity level (Simple/Moderate/Complex) pub complexity: ComplexityLevel, /// Technical domain pub domain: DomainType, /// Recommended Claude model pub expected_model: String, /// Quality score (0.0-1.0) pub quality_score: f32, /// Descriptive tags pub tags: Vec, } ``` ## Model Selection Logic The dataset includes intelligent model routing based on task category and complexity: | Category | Simple | Moderate | Complex | |----------|--------|----------|---------| | Coder | Haiku | Sonnet | Opus | | Researcher | Haiku | Sonnet | Sonnet | | Security | Opus | Opus | Opus | | Architecture | Sonnet | Opus | Opus | | Reviewer | Haiku | Sonnet | Sonnet | **Cost Optimization:** - **Haiku**: ~75% cheaper than Opus, 2-3x faster - **Sonnet**: Balanced cost/quality for most tasks - **Opus**: Highest quality for complex/security-critical tasks ## Quality Scores Training examples include quality scores (0.0-1.0) based on: 1. **Template Quality** (0.80-0.96) - Hand-crafted seed templates: 0.90-0.96 - Paraphrased examples: 0.85-0.90 - Domain transferred: 0.80-0.85 2. **Category Appropriateness** - Security tasks: 0.90-0.96 (critical quality) - Architecture tasks: 0.85-0.93 (high quality) - Code generation: 0.83-0.90 (good quality) - Research tasks: 0.80-0.89 (adequate quality) - Review tasks: 0.82-0.90 (good quality) ## Integration with RuvLTRA ### Fine-Tuning Pipeline ```rust use ruvllm::training::DatasetGenerator; use ruvllm::SonaLlm; // 1. Generate dataset let dataset = DatasetGenerator::new(config).generate(); // 2. Split data let (train, val, _test) = dataset.split(0.7, 0.15, 0.15, 42); // 3. Fine-tune model let model = SonaLlm::new(config)?; for example in train { let embedding = model.embed(&example.input)?; let target = encode_agent(&example.output_agent); model.train(embedding, target)?; } ``` ### Model Architecture The dataset supports training multiple heads: 1. **Task Embedding Layer** - Input: Task description + context - Output: 768-dim semantic embedding 2. **Agent Classification Head** - Input: Task embedding - Output: 5-way softmax (5 agent types) 3. **Model Selection Head** - Input: Task embedding + complexity features - Output: 3-way softmax (Haiku/Sonnet/Opus) 4. **Quality Prediction Head** - Input: Task embedding - Output: Regression (0-1 quality score) ## Domain Types The dataset covers 8 technical domains: - **Web**: Frontend, backend, full-stack development - **Systems**: Operating systems, low-level programming - **DataScience**: ML, analytics, data processing - **Mobile**: iOS, Android, cross-platform - **DevOps**: Infrastructure, CI/CD, deployment - **Security**: Cryptography, vulnerabilities, compliance - **Database**: SQL, NoSQL, data modeling - **Api**: REST, GraphQL, API design ## Template System The generator uses 100+ hand-crafted templates per category: ```rust TaskTemplate { input: "Implement a {function_type} function in {language}", context: "Should {requirements} and optimize for {target}", complexity: ComplexityLevel::Moderate, domain: DomainType::Web, tags: vec!["code-generation", "function"], quality: 0.87, } ``` **Placeholders** are filled with random values: - `{language}`: Rust, TypeScript, Python, Go, Java - `{framework}`: React, Vue, Angular, Svelte - `{function_type}`: async, recursive, higher-order - `{data_structure}`: binary tree, hash map, linked list ## Running the Examples ### Complete SOTA Training Pipeline ```bash # 1. Generate 500+ Claude-powered hard negatives node npm/packages/ruvllm/scripts/training/claude-hard-negatives.js --count=50 # 2. Merge all triplets (base + hard negatives) cat ~/.ruvllm/training/ruvltra-finetuned/triplets.jsonl > combined.jsonl echo "" >> combined.jsonl cat ~/.ruvllm/training/claude-hard-negatives.jsonl >> combined.jsonl echo "" >> combined.jsonl cat ~/.ruvllm/training/claude-hard-negatives-batch2.jsonl >> combined.jsonl # 3. Run REAL contrastive training with Candle (30 epochs) cargo run --example train_real --release --features candle -- \ --triplets ~/.ruvllm/training/combined-sota.jsonl \ --base-model ruvltra-claude-code-0.5b-q4_k_m.gguf \ --output ruvltra-claude-code-sota.gguf \ --epochs 30 \ --grpo # Enable GRPO feedback loop # 4. Merge trained adapter with base model bash ruvltra-claude-code-sota.gguf.weights/merge_adapter.sh # 5. Benchmark the improvement node npm/packages/ruvllm/scripts/hybrid-model-compare.js ``` ### Simulated Contrastive Fine-Tuning (Quick Test) ```bash # Simulated training (no real weight updates, for testing) cargo run --example train_contrastive --release -- \ --triplets ~/.ruvllm/training/combined-sota.jsonl \ --epochs 30 # Expected output: # - 88%+ embedding-only accuracy # - 81%+ hard negative accuracy # - 100% hybrid routing accuracy ``` ### Dataset Generation ```bash # Generate dataset cargo run --example generate_claude_dataset --release # Output files: # - claude_training_full.jsonl (all examples) # - claude_training_train.jsonl (70% training) # - claude_training_val.jsonl (15% validation) # - claude_training_test.jsonl (15% test) # - claude_training_stats.json (statistics) ``` ## Testing ```bash # Run tests cargo test --package ruvllm --lib training # Test specific functionality cargo test --package ruvllm test_dataset_generation cargo test --package ruvllm test_dataset_augmentation cargo test --package ruvllm test_model_recommendation ``` ## Performance Dataset generation is highly optimized: - **Generation Speed**: ~10,000 examples/second - **Memory Usage**: ~200 MB for 3,000 examples - **Export Speed**: - JSONL: ~50 MB/s - JSON: ~30 MB/s (pretty-printed) ## Future Enhancements ### Planned Features - [ ] Parquet export format - [ ] HuggingFace Datasets integration - [ ] Multi-language support (non-English tasks) - [ ] Custom template loading - [ ] Active learning integration - [ ] Difficulty progression scheduling - [ ] Cross-validation splits - [ ] Balanced sampling strategies ### Research Directions - [ ] Few-shot learning examples - [ ] Task decomposition datasets - [ ] Multi-turn conversation datasets - [ ] Code execution feedback datasets - [ ] Self-improvement trajectory datasets ## References - **Claude Flow**: https://github.com/ruvnet/claude-flow - **RuvLTRA Architecture**: `../../README.md` - **SONA Learning**: `../../../sona/README.md` - **Dataset Format**: `../../../../docs/claude_dataset_format.md` ## License MIT OR Apache-2.0