# RuvLLM Training Module
Fine-tuning dataset generation for RuvLTRA models, focusing on Claude Flow agent task routing and model selection.
## SOTA Achievements (v2.3)
| Metric | Before | After | Method |
|--------|--------|-------|--------|
| **Hybrid Routing Accuracy** | 95% | **100%** | Keyword-First + Embedding Fallback |
| **Embedding-Only Accuracy** | 45% | **88.2%** | Contrastive Learning (Triplet + InfoNCE) |
| **Hard Negative Accuracy** | N/A | **81.2%** | Claude-Generated Confusing Pairs |
| **Agent Types Supported** | 13 | 13 | All Claude Code agent types |
### Training Data (v2.3 SOTA)
- **Base triplets**: 578 examples from Claude Code routing data
- **Claude-generated hard negatives**: 500+ high-quality confusing pairs
- **Total training set**: 1,078 triplets
- **Hard negative ratio**: 48.4% (up from 18%)
### Training Pipeline
```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Hard Negative   │────►│   Contrastive    │────►│  GRPO Feedback   │
│   Generation     │     │    Training      │     │      Loop        │
│  (Claude Opus)   │     │  (Candle/Metal)  │     │  (Claude Judge)  │
└──────────────────┘     └──────────────────┘     └──────────────────┘
                                                           │
                                                           ▼
                                                  ┌──────────────────┐
                                                  │   GGUF Export    │
                                                  │  (Adapter Merge) │
                                                  └──────────────────┘
```
## Overview
The training module generates synthetic datasets for fine-tuning RuvLTRA models on two key tasks:
1. **Agent Routing**: Classify tasks to appropriate Claude Flow agents (Coder, Researcher, Security, Architecture, Reviewer)
2. **Model Selection**: Route tasks to optimal Claude models (Haiku/Sonnet/Opus) based on complexity
## Real Contrastive Training (v2.3 - Production)
The `real_trainer` module provides production-grade training with actual Candle weight updates:
```rust
use ruvllm::training::{RealContrastiveTrainer, RealTrainingConfig, run_training_pipeline};
use std::path::PathBuf;

// Option 1: Full pipeline with GRPO feedback
#[tokio::main]
async fn main() -> Result<(), String> {
    let api_key = std::env::var("ANTHROPIC_API_KEY").map_err(|e| e.to_string())?;
    run_training_pipeline(
        &PathBuf::from("~/.ruvllm/training/combined-sota.jsonl"),
        &PathBuf::from("ruvltra-claude-code-0.5b-q4_k_m.gguf"),
        &PathBuf::from("ruvltra-claude-code-sota.gguf"),
        Some(&api_key), // For GRPO
    ).await
}

// Option 2: Manual training with fine-grained control
fn train_manually() -> Result<(), String> {
    let config = RealTrainingConfig {
        model_path: PathBuf::from("ruvltra-claude-code-0.5b-q4_k_m.gguf"),
        output_path: PathBuf::from("ruvltra-claude-code-sota.gguf"),
        learning_rate: 2e-5,
        weight_decay: 0.01,
        batch_size: 16,
        epochs: 30,
        margin: 0.5,        // Triplet loss margin
        temperature: 0.07,  // InfoNCE temperature
        embedding_dim: 896, // Qwen 0.5B embedding size
        use_metal: true,    // Apple Silicon GPU acceleration
        enable_grpo: true,  // Enable GRPO reward scaling
        ..Default::default()
    };
    let mut trainer = RealContrastiveTrainer::new(config)?;
    trainer.load_triplets("combined-sota.jsonl")?;

    // Train with real weight updates
    let result = trainer.train()?;
    println!("Best accuracy: {:.2}%", result.best_accuracy * 100.0);

    // Export to GGUF format
    let export = trainer.export_gguf("output.gguf")?;
    println!("Exported {} weights to {}", export.total_weights, export.weights_path.display());
    Ok(())
}
```
### GGUF Export
The trainer exports adapter weights that can be merged with the base Qwen model:
```bash
# After training, merge adapter with base model
bash output.gguf.weights/merge_adapter.sh
# Files created:
# - output.gguf.weights/adapter_weights.bin (binary weights)
# - output.gguf.weights/metadata.json (training config)
# - output.gguf.weights/merge_adapter.sh (merge script)
```
### GRPO Feedback Loop
GRPO (Group Relative Policy Optimization) uses Claude as a judge to improve training:
```rust
use ruvllm::training::{GrpoEvaluator, GrpoFeedback, RealContrastiveTrainer};

async fn grpo_round(trainer: &mut RealContrastiveTrainer, api_key: String) -> Result<(), String> {
    let evaluator = GrpoEvaluator::new(api_key);

    // Evaluate (task, predicted_agent, expected_agent) triples
    let predictions = vec![
        ("Add error handling".to_string(), "coder".to_string(), "coder".to_string()),
        ("Review the PR".to_string(), "reviewer".to_string(), "tester".to_string()),
    ];
    let feedback = evaluator.evaluate(&predictions).await?;
    for fb in feedback {
        trainer.add_grpo_feedback(fb);
    }

    // Re-train with GRPO-enhanced loss scaling
    let result = trainer.train()?;
    println!("Accuracy after GRPO: {:.2}%", result.best_accuracy * 100.0);
    Ok(())
}
```
## Contrastive Learning (Simulated)
The `contrastive` module simulates the contrastive fine-tuning loop without real weight updates, useful for quick experimentation:
```rust
use ruvllm::training::{ContrastiveTrainer, ContrastiveConfig, TrainingTriplet};
// Configure contrastive training
let config = ContrastiveConfig {
learning_rate: 2e-5,
margin: 0.5, // Triplet loss margin
temperature: 0.07, // InfoNCE temperature
batch_size: 32,
embedding_dim: 896, // Qwen 0.5B embedding size
hard_negative_ratio: 0.18,
use_metal: true, // Apple Silicon GPU
..Default::default()
};
// Initialize and train
let mut trainer = ContrastiveTrainer::new(config)?;
trainer.load_triplets("triplets.jsonl")?;
let result = trainer.train(30)?; // 30 epochs
println!("Final accuracy: {:.2}%", result.final_accuracy * 100.0);
```
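For intuition, the triplet objective with margin 0.5 can be sketched on plain embedding vectors. This is an illustrative re-implementation using cosine distance, not the trainer's actual Candle kernels, and it omits the InfoNCE term:

```rust
// Cosine similarity between two embedding vectors
fn cosine_sim(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Triplet margin loss: pull the anchor toward the positive and push it
// away from the negative until the distance gap exceeds `margin`.
fn triplet_loss(anchor: &[f32], pos: &[f32], neg: &[f32], margin: f32) -> f32 {
    let d_ap = 1.0 - cosine_sim(anchor, pos); // distance to positive
    let d_an = 1.0 - cosine_sim(anchor, neg); // distance to negative
    (d_ap - d_an + margin).max(0.0)           // zero once the gap is wide enough
}
```

Hard negatives matter precisely because `d_an` is small for confusing pairs, so the loss stays non-zero and keeps producing gradient signal.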
### Claude-Powered Hard Negative Generation
Generate high-quality confusing training pairs using Claude Opus 4.5:
```bash
node scripts/training/claude-hard-negatives.js --count=10 --grpo
# Output: ~/.ruvllm/training/claude-hard-negatives.jsonl
```
This generates triplets for confusing agent pairs:
- `coder` vs `refactorer` (both modify code)
- `researcher` vs `architect` (both analyze)
- `reviewer` vs `tester` (both validate)
- `debugger` vs `optimizer` (both fix issues)
- And 6 more confusing pairs...
## Quick Start
```rust
use ruvllm::training::{DatasetGenerator, DatasetConfig};
// Generate dataset with 100 examples per category
let config = DatasetConfig::default();
let mut generator = DatasetGenerator::new(config);
let dataset = generator.generate();
// Export to JSONL
dataset.export_jsonl("training.jsonl")?;
// Split for training/validation/test
let (train, val, test) = dataset.split(0.7, 0.15, 0.15, 42);
```
## Task Categories
### 1. Coder (20% of dataset)
- **Focus**: Code generation, debugging, refactoring
- **Examples**:
- "Implement JWT authentication middleware in TypeScript"
- "Debug memory leak in request handler"
- "Refactor UserService to use dependency injection"
**Model Routing:**
- Simple tasks → Haiku (quick fixes, simple functions)
- Moderate tasks → Sonnet (components, APIs)
- Complex tasks → Opus (algorithms, system-level)
### 2. Researcher (20% of dataset)
- **Focus**: Analysis, exploration, documentation
- **Examples**:
- "Analyze GraphQL performance bottlenecks"
- "Research best practices for microservices"
- "Document REST API endpoints"
**Model Routing:**
- Simple tasks → Haiku (basic docs)
- Moderate/Complex → Sonnet (analysis, research)
### 3. Security (20% of dataset)
- **Focus**: Audit, vulnerability analysis, threat detection
- **Examples**:
- "Audit authentication flow for security vulnerabilities"
- "Review cryptographic key management"
- "Identify SQL injection attack vectors"
**Model Routing:**
- All tasks → Opus (security requires highest quality)
### 4. Architecture (20% of dataset)
- **Focus**: System design, planning, architecture
- **Examples**:
- "Design microservices architecture for e-commerce"
- "Plan database schema for multi-tenant SaaS"
- "Architect real-time event streaming pipeline"
**Model Routing:**
- Simple tasks → Sonnet (basic schemas)
- Moderate/Complex → Opus (distributed systems)
### 5. Reviewer (20% of dataset)
- **Focus**: Code review, quality assessment
- **Examples**:
- "Review pull request #123 for best practices"
- "Assess code quality of UserController"
- "Review error handling in payment service"
**Model Routing:**
- Simple tasks → Haiku (standards compliance)
- Moderate/Complex → Sonnet (quality, architecture review)
## Dataset Configuration
```rust
use ruvllm::training::{DatasetConfig, AugmentationConfig};
let config = DatasetConfig {
// Base examples per category
examples_per_category: 100,
// Enable data augmentation
enable_augmentation: true,
// Augmentation settings
augmentation: AugmentationConfig {
// Generate 2 paraphrases per example
paraphrases_per_example: 2,
// Generate 2 complexity variations
complexity_variations: 2,
// Enable domain transfer
enable_domain_transfer: true,
},
// Random seed for reproducibility
seed: 42,
};
```
### Dataset Size Calculation
With default configuration:
- **Base examples**: 5 categories × 100 = 500 examples
- **Paraphrases**: 500 × 2 = 1,000 additional examples
- **Complexity variations**: 500 × 2 → ~800 additional examples after filtering
- **Domain transfer**: 500 × 1 → ~400 additional examples after filtering
- **Total**: ~2,700 examples (actual varies due to filtering)
## Data Augmentation
### 1. Paraphrasing
Replaces words with synonyms to increase linguistic diversity:
```
Original: "Implement a function to validate user input"
Paraphrased: "Create a function to validate user input"
"Build a function to validate user input"
```
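The substitution above can be sketched as a leading-verb swap; `paraphrase` and its synonym list are illustrative, not the generator's real API:

```rust
// Produce paraphrases by swapping the leading verb for each synonym
fn paraphrase(task: &str, synonyms: &[&str]) -> Vec<String> {
    let mut words = task.splitn(2, ' ');
    let first = words.next().unwrap_or("");
    let rest = words.next().unwrap_or("");
    synonyms
        .iter()
        .filter(|s| !s.eq_ignore_ascii_case(first)) // skip the original verb
        .map(|s| format!("{} {}", s, rest))
        .collect()
}
```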
### 2. Complexity Variations
Creates examples at different complexity levels:
```
Simple: "Add error handling to API endpoint"
Moderate: "Implement error handling with retry logic"
Complex: "Design fault-tolerant error handling with circuit breakers"
```
### 3. Domain Transfer
Applies task patterns across technical domains:
```
Web: "Optimize React component rendering"
Mobile: "Optimize Flutter widget rendering"
Systems: "Optimize kernel thread scheduling"
```
## Export Formats
### JSONL (Streaming Format)
```rust
// One JSON object per line
dataset.export_jsonl("training.jsonl")?;
```
**Example line:**
```json
{"input":"Implement authentication middleware","context":"JWT with RS256","output_agent":"coder","metadata":{"category":"Coder","complexity":"Moderate","domain":"Web","expected_model":"sonnet","quality_score":0.87,"tags":["auth","middleware"]}}
```
### JSON (Full Array)
```rust
// Human-readable JSON array
dataset.export_json("training.json")?;
```
### Statistics
```rust
// Export dataset statistics
dataset.export_stats("stats.json")?;
```
**Stats format:**
```json
{
"total_examples": 2700,
"examples_per_category": {
"coder": 540,
"researcher": 540,
"security": 540,
"architecture": 540,
"reviewer": 540
},
"examples_per_complexity": {
"Simple": 900,
"Moderate": 1080,
"Complex": 720
},
"avg_quality_score": 0.87
}
```
## Dataset Splits
```rust
// 70% train, 15% validation, 15% test
let (train, val, test) = dataset.split(0.7, 0.15, 0.15, 42);
// Export each split
ClaudeTaskDataset::new(train).export_jsonl("train.jsonl")?;
ClaudeTaskDataset::new(val).export_jsonl("val.jsonl")?;
ClaudeTaskDataset::new(test).export_jsonl("test.jsonl")?;
```
## Example Structure
### ClaudeTaskExample
```rust
pub struct ClaudeTaskExample {
/// Task description (model input)
pub input: String,
/// Additional context
pub context: String,
/// Expected agent (target output)
pub output_agent: String,
/// Task metadata
pub metadata: TaskMetadata,
}
```
### TaskMetadata
```rust
pub struct TaskMetadata {
/// Task category
pub category: TaskCategory,
/// Complexity level (Simple/Moderate/Complex)
pub complexity: ComplexityLevel,
/// Technical domain
pub domain: DomainType,
/// Recommended Claude model
pub expected_model: String,
/// Quality score (0.0-1.0)
pub quality_score: f32,
/// Descriptive tags
pub tags: Vec<String>,
}
```
## Model Selection Logic
The dataset includes intelligent model routing based on task category and complexity:
| Category | Simple | Moderate | Complex |
|----------|--------|----------|---------|
| Coder | Haiku | Sonnet | Opus |
| Researcher | Haiku | Sonnet | Sonnet |
| Security | Opus | Opus | Opus |
| Architecture | Sonnet | Opus | Opus |
| Reviewer | Haiku | Sonnet | Sonnet |
**Cost Optimization:**
- **Haiku**: ~75% cheaper than Opus, 2-3x faster
- **Sonnet**: Balanced cost/quality for most tasks
- **Opus**: Highest quality for complex/security-critical tasks
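The routing table above can be expressed as a small lookup. The string labels below are for illustration; the crate presumably uses its own `TaskCategory`/`ComplexityLevel` enums:

```rust
// Category × complexity → recommended Claude model (mirrors the table above)
fn recommend_model(category: &str, complexity: &str) -> &'static str {
    match (category, complexity) {
        // Security always routes to the highest-quality model
        ("security", _) => "opus",
        ("coder", "simple") | ("researcher", "simple") | ("reviewer", "simple") => "haiku",
        ("architecture", "simple") => "sonnet",
        ("coder", "moderate") => "sonnet",
        ("coder", "complex") => "opus",
        ("architecture", _) => "opus",
        ("researcher", _) | ("reviewer", _) => "sonnet",
        _ => "sonnet", // fallback for unknown inputs (assumption)
    }
}
```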
## Quality Scores
Training examples include quality scores (0.0-1.0) based on:
1. **Template Quality** (0.80-0.96)
- Hand-crafted seed templates: 0.90-0.96
- Paraphrased examples: 0.85-0.90
- Domain transferred: 0.80-0.85
2. **Category Appropriateness**
- Security tasks: 0.90-0.96 (critical quality)
- Architecture tasks: 0.85-0.93 (high quality)
- Code generation: 0.83-0.90 (good quality)
- Research tasks: 0.80-0.89 (adequate quality)
- Review tasks: 0.82-0.90 (good quality)
## Integration with RuvLTRA
### Fine-Tuning Pipeline
```rust
use ruvllm::training::{DatasetGenerator, DatasetConfig};
use ruvllm::SonaLlm;

// 1. Generate dataset
let dataset = DatasetGenerator::new(DatasetConfig::default()).generate();

// 2. Split data
let (train, _val, _test) = dataset.split(0.7, 0.15, 0.15, 42);

// 3. Fine-tune model (model_config: see the SonaLlm docs)
let model = SonaLlm::new(model_config)?;
for example in train {
    let embedding = model.embed(&example.input)?;
    let target = encode_agent(&example.output_agent);
    model.train(embedding, target)?;
}
```
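The `encode_agent` helper above is not provided by the module; a minimal sketch, assuming a fixed class index per agent:

```rust
// Hypothetical helper: map agent names to class indices for training targets
fn encode_agent(agent: &str) -> usize {
    match agent {
        "coder" => 0,
        "researcher" => 1,
        "security" => 2,
        "architecture" => 3,
        "reviewer" => 4,
        _ => panic!("unknown agent: {}", agent),
    }
}
```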
### Model Architecture
The dataset supports training multiple heads:
1. **Task Embedding Layer**
- Input: Task description + context
- Output: 768-dim semantic embedding
2. **Agent Classification Head**
- Input: Task embedding
- Output: 5-way softmax (5 agent types)
3. **Model Selection Head**
- Input: Task embedding + complexity features
- Output: 3-way softmax (Haiku/Sonnet/Opus)
4. **Quality Prediction Head**
- Input: Task embedding
- Output: Regression (0-1 quality score)
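As an illustration of how a classification head works, here is a plain-vector sketch of a linear projection followed by softmax; the real heads live inside the model with learned Candle weights:

```rust
// Numerically stable softmax over raw logits
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// 5-way agent head: one logit per class (W·x + b), then argmax over softmax
fn classify_agent(embedding: &[f32], weights: &[Vec<f32>], bias: &[f32]) -> usize {
    let logits: Vec<f32> = weights
        .iter()
        .zip(bias)
        .map(|(w, b)| w.iter().zip(embedding).map(|(wi, xi)| wi * xi).sum::<f32>() + b)
        .collect();
    softmax(&logits)
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```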
## Domain Types
The dataset covers 8 technical domains:
- **Web**: Frontend, backend, full-stack development
- **Systems**: Operating systems, low-level programming
- **DataScience**: ML, analytics, data processing
- **Mobile**: iOS, Android, cross-platform
- **DevOps**: Infrastructure, CI/CD, deployment
- **Security**: Cryptography, vulnerabilities, compliance
- **Database**: SQL, NoSQL, data modeling
- **Api**: REST, GraphQL, API design
## Template System
The generator uses 100+ hand-crafted templates per category:
```rust
TaskTemplate {
input: "Implement a {function_type} function in {language}",
context: "Should {requirements} and optimize for {target}",
complexity: ComplexityLevel::Moderate,
domain: DomainType::Web,
tags: vec!["code-generation", "function"],
quality: 0.87,
}
```
**Placeholders** are filled with random values:
- `{language}`: Rust, TypeScript, Python, Go, Java
- `{framework}`: React, Vue, Angular, Svelte
- `{function_type}`: async, recursive, higher-order
- `{data_structure}`: binary tree, hash map, linked list
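Placeholder filling can be sketched as simple string substitution; the helper name and signature are assumptions, not the generator's actual API:

```rust
use std::collections::HashMap;

// Replace every `{key}` slot in the template with its chosen value
fn fill_template(template: &str, values: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (key, value) in values {
        out = out.replace(&format!("{{{}}}", key), value);
    }
    out
}
```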
## Running the Examples
### Complete SOTA Training Pipeline
```bash
# 1. Generate 500+ Claude-powered hard negatives
node npm/packages/ruvllm/scripts/training/claude-hard-negatives.js --count=50
# 2. Merge all triplets (base + hard negatives)
cat ~/.ruvllm/training/ruvltra-finetuned/triplets.jsonl > ~/.ruvllm/training/combined-sota.jsonl
echo "" >> ~/.ruvllm/training/combined-sota.jsonl
cat ~/.ruvllm/training/claude-hard-negatives.jsonl >> ~/.ruvllm/training/combined-sota.jsonl
echo "" >> ~/.ruvllm/training/combined-sota.jsonl
cat ~/.ruvllm/training/claude-hard-negatives-batch2.jsonl >> ~/.ruvllm/training/combined-sota.jsonl
# 3. Run REAL contrastive training with Candle (30 epochs)
cargo run --example train_real --release --features candle -- \
--triplets ~/.ruvllm/training/combined-sota.jsonl \
--base-model ruvltra-claude-code-0.5b-q4_k_m.gguf \
--output ruvltra-claude-code-sota.gguf \
--epochs 30 \
--grpo # Enable GRPO feedback loop
# 4. Merge trained adapter with base model
bash ruvltra-claude-code-sota.gguf.weights/merge_adapter.sh
# 5. Benchmark the improvement
node npm/packages/ruvllm/scripts/hybrid-model-compare.js
```
### Simulated Contrastive Fine-Tuning (Quick Test)
```bash
# Simulated training (no real weight updates, for testing)
cargo run --example train_contrastive --release -- \
--triplets ~/.ruvllm/training/combined-sota.jsonl \
--epochs 30
# Expected output:
# - 88%+ embedding-only accuracy
# - 81%+ hard negative accuracy
# - 100% hybrid routing accuracy
```
### Dataset Generation
```bash
# Generate dataset
cargo run --example generate_claude_dataset --release
# Output files:
# - claude_training_full.jsonl (all examples)
# - claude_training_train.jsonl (70% training)
# - claude_training_val.jsonl (15% validation)
# - claude_training_test.jsonl (15% test)
# - claude_training_stats.json (statistics)
```
## Testing
```bash
# Run tests
cargo test --package ruvllm --lib training
# Test specific functionality
cargo test --package ruvllm test_dataset_generation
cargo test --package ruvllm test_dataset_augmentation
cargo test --package ruvllm test_model_recommendation
```
## Performance
Dataset generation is highly optimized:
- **Generation Speed**: ~10,000 examples/second
- **Memory Usage**: ~200 MB for 3,000 examples
- **Export Speed**:
- JSONL: ~50 MB/s
- JSON: ~30 MB/s (pretty-printed)
## Future Enhancements
### Planned Features
- [ ] Parquet export format
- [ ] HuggingFace Datasets integration
- [ ] Multi-language support (non-English tasks)
- [ ] Custom template loading
- [ ] Active learning integration
- [ ] Difficulty progression scheduling
- [ ] Cross-validation splits
- [ ] Balanced sampling strategies
### Research Directions
- [ ] Few-shot learning examples
- [ ] Task decomposition datasets
- [ ] Multi-turn conversation datasets
- [ ] Code execution feedback datasets
- [ ] Self-improvement trajectory datasets
## References
- **Claude Flow**: https://github.com/ruvnet/claude-flow
- **RuvLTRA Architecture**: `../../README.md`
- **SONA Learning**: `../../../sona/README.md`
- **Dataset Format**: `../../../../docs/claude_dataset_format.md`
## License
MIT OR Apache-2.0