# Task-Specific LoRA Adapters for RuvLTRA

## Overview

The task-specific LoRA adapter system provides pre-configured, pre-tuned adapters for the agent types in the Claude Flow ecosystem. Each adapter is tuned with rank and alpha values chosen for its domain.

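Throughout this document, `rank` sets the dimensionality of the low-rank update and `alpha` scales it: in the standard LoRA formulation the adapter contributes `(alpha / rank) * B * (A * x)` on top of the frozen weights. A minimal sketch of that math, for orientation only (this is not the `ruvllm` implementation):

```rust
// Reference sketch of the standard LoRA delta, h = W x + (alpha / r) * B (A x).
// `a` is the r x d down-projection, `b` is the d x r up-projection.
fn lora_delta(a: &[Vec<f32>], b: &[Vec<f32>], x: &[f32], alpha: f32) -> Vec<f32> {
    let r = a.len(); // rank
    // A x: project the input down into the rank-r subspace.
    let ax: Vec<f32> = a
        .iter()
        .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum())
        .collect();
    // B (A x), scaled by alpha / r.
    let scale = alpha / r as f32;
    b.iter()
        .map(|row| scale * row.iter().zip(&ax).map(|(w, h)| w * h).sum::<f32>())
        .collect()
}
```

Higher rank buys more capacity at a linear cost in parameters, while alpha controls how strongly the adaptation is applied; that is why the Coder and Security adapters below pair rank 16 with alpha 32, and the lighter Researcher and Reviewer adapters pair rank 8 with alpha 16.
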
## Features

- **Pre-defined Adapters**: 5 specialized adapters (Coder, Researcher, Security, Architect, Reviewer)
- **Adapter Training**: Full training pipeline with gradient checkpointing and early stopping
- **Adapter Merging**: Multiple merge strategies (Average, Weighted, SLERP, TIES, DARE)
- **Hot-Swapping**: Runtime adapter switching without model reload
- **Persistence**: Save/load adapters in safetensors-compatible format
- **Mixed Precision**: Optional bf16/fp16 training support

## Pre-defined Adapters

### 1. Coder Adapter

**Optimized for**: Code generation and refactoring

- **Rank**: 16 (high capacity for code patterns)
- **Alpha**: 32.0 (strong adaptation signal)
- **Target Modules**: All attention modules (Q, K, V, O)
- **Memory**: ~200 KB @ 768d
- **Use Cases**: Code completion, refactoring, syntax correction

```rust
use ruvllm::lora::RuvLtraAdapters;

let adapters = RuvLtraAdapters::new();
let coder = adapters.create_lora("coder", 768)?;
```

### 2. Researcher Adapter

### 2. Researcher Adapter

**Optimized for**: Information analysis and synthesis

- **Rank**: 8 (moderate capacity)
- **Alpha**: 16.0 (balanced adaptation)
- **Target Modules**: Q, K, V projections
- **Memory**: ~100 KB @ 768d
- **Use Cases**: Research synthesis, information extraction, analysis

### 3. Security Adapter

**Optimized for**: Vulnerability detection and secure coding

- **Rank**: 16 (high capacity)
- **Alpha**: 32.0 (strong signal for critical issues)
- **Target Modules**: All attention + MLP modules
- **Memory**: ~350 KB @ 768d
- **Use Cases**: Security auditing, vulnerability detection, secure code patterns

### 4. Architect Adapter

**Optimized for**: System design and architecture

- **Rank**: 12 (good capacity for architectural patterns)
- **Alpha**: 24.0 (strong but balanced)
- **Target Modules**: Q, V projections + Gate, Up projections
- **Memory**: ~180 KB @ 768d
- **Use Cases**: System design, architectural decisions, pattern selection

### 5. Reviewer Adapter

**Optimized for**: Code review and quality assessment

- **Rank**: 8 (focused capacity)
- **Alpha**: 16.0 (balanced)
- **Target Modules**: Q, V projections
- **Memory**: ~100 KB @ 768d
- **Use Cases**: Code review, quality assessment, best practices

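The Researcher, Security, Architect, and Reviewer adapters are instantiated exactly like the Coder adapter above, by passing their name to `create_lora`. A short sketch, assuming the same 768-dimensional model:

```rust
use ruvllm::lora::RuvLtraAdapters;

let adapters = RuvLtraAdapters::new();

// Create each pre-defined adapter by name at hidden dimension 768.
for name in ["coder", "researcher", "security", "architect", "reviewer"] {
    let lora = adapters.create_lora(name, 768)?;
    // ... hand `lora` to the runtime serving the matching agent type
}
```
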
## Training Adapters

### Quick Training (1 epoch)

```rust
use ruvllm::lora::{
    RuvLtraAdapters, AdapterTrainer, AdapterTrainingConfig,
    SyntheticDataGenerator,
};

// Generate synthetic training data
let generator = SyntheticDataGenerator::new(768, 42);
let dataset = generator.generate("coder", 1000);

// Create adapter
let adapters = RuvLtraAdapters::new();
let lora = adapters.create_lora("coder", 768)?;

// Train
let config = AdapterTrainingConfig::quick();
let mut trainer = AdapterTrainer::new(config);
let result = trainer.train(&lora, &dataset)?;

println!("Final loss: {:.4}", result.final_loss);
```

### Stable Training (5 epochs)

```rust
let config = AdapterTrainingConfig::stable();
let mut trainer = AdapterTrainer::new(config);
let result = trainer.train(&lora, &dataset)?;
```

### Custom Training Configuration

```rust
use ruvllm::lora::{AdapterTrainingConfig, LearningRateSchedule, TrainingConfig};

let config = AdapterTrainingConfig {
    training: TrainingConfig {
        learning_rate: 0.001,
        ewc_lambda: 3000.0,
        lr_schedule: LearningRateSchedule::Cosine,
        ..Default::default()
    },
    epochs: 3,
    validation_interval: 100,
    early_stopping_patience: 5,
    gradient_checkpointing: true,
    mixed_precision: false,
    save_best: true,
    output_dir: "./my_adapters".to_string(),
};
```

## Adapter Merging

### Average Merge

```rust
use ruvllm::lora::{AdapterMerger, MergeConfig};

let adapters_to_merge = vec![
    ("coder".to_string(), coder_lora),
    ("security".to_string(), security_lora),
];

let config = MergeConfig::average();
let merger = AdapterMerger::new(config);
let merged = merger.merge(&adapters_to_merge, &adapters.coder, 768)?;
```

### Weighted Merge

```rust
use std::collections::HashMap;

let mut weights = HashMap::new();
weights.insert("coder".to_string(), 0.7);
weights.insert("security".to_string(), 0.3);

let config = MergeConfig::weighted(weights);
let merger = AdapterMerger::new(config);
let merged = merger.merge(&adapters_to_merge, &adapters.coder, 768)?;
```

### SLERP Interpolation

Spherical Linear Interpolation for smooth transitions between two adapters:

```rust
let config = MergeConfig::slerp(0.5); // t ∈ [0, 1]
let merger = AdapterMerger::new(config);
let merged = merger.merge(&two_adapters, &adapters.coder, 768)?;
```

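For reference, SLERP interpolates along the great circle between the two flattened adapter weight vectors rather than along the straight line, which preserves the magnitude profile of the weights. A minimal sketch of the math (illustrative only, not the `ruvllm` implementation):

```rust
// slerp(a, b, t) = sin((1-t)*omega)/sin(omega) * a + sin(t*omega)/sin(omega) * b,
// where omega is the angle between the two flattened weight vectors.
fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let omega = (dot / (na * nb)).clamp(-1.0, 1.0).acos();
    if omega.sin().abs() < 1e-6 {
        // Nearly parallel vectors: fall back to linear interpolation.
        return a.iter().zip(b).map(|(x, y)| (1.0 - t) * x + t * y).collect();
    }
    let wa = ((1.0 - t) * omega).sin() / omega.sin();
    let wb = (t * omega).sin() / omega.sin();
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}
```
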
### TIES Merging

TIES (Trim, Elect Sign, and Merge) composes multiple adapters while resolving parameter sign conflicts:

```rust
let config = MergeConfig::ties(0.6); // density: fraction of parameters kept after trimming
let merger = AdapterMerger::new(config);
let merged = merger.merge(&multiple_adapters, &adapters.coder, 768)?;
```

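As background, TIES applies three steps to each adapter's weight delta: trim small-magnitude entries, elect a per-parameter sign from the summed deltas, then average only the surviving values that agree with the elected sign. A simplified sketch of that procedure (not the `ruvllm` implementation):

```rust
// Simplified TIES over flattened weight deltas. `density` is the fraction
// of entries kept per delta after trimming (e.g. 0.6).
fn ties_merge(deltas: &[Vec<f32>], density: f32) -> Vec<f32> {
    let n = deltas[0].len();
    // 1. Trim: keep only the top-`density` fraction of each delta by magnitude.
    let trimmed: Vec<Vec<f32>> = deltas
        .iter()
        .map(|d| {
            let mut mags: Vec<f32> = d.iter().map(|v| v.abs()).collect();
            mags.sort_by(|a, b| b.partial_cmp(a).unwrap());
            let k = ((n as f32 * density) as usize).max(1).min(n);
            let threshold = mags[k - 1];
            d.iter().map(|&v| if v.abs() >= threshold { v } else { 0.0 }).collect()
        })
        .collect();
    (0..n)
        .map(|i| {
            // 2. Elect: the sign of the summed deltas wins for this parameter.
            let sign = trimmed.iter().map(|d| d[i]).sum::<f32>().signum();
            // 3. Merge: average only the surviving values that agree with that sign.
            let agreeing: Vec<f32> = trimmed
                .iter()
                .map(|d| d[i])
                .filter(|v| *v != 0.0 && v.signum() == sign)
                .collect();
            if agreeing.is_empty() {
                0.0
            } else {
                agreeing.iter().sum::<f32>() / agreeing.len() as f32
            }
        })
        .collect()
}
```
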
### DARE Merging

Drop And REscale, which sparsifies each adapter's delta and rescales the surviving entries before merging:

```rust
use ruvllm::lora::MergeStrategy;

let config = MergeConfig {
    strategy: MergeStrategy::Dare,
    density: 0.7,
    ..Default::default()
};
let merger = AdapterMerger::new(config);
let merged = merger.merge(&adapters_list, &adapters.coder, 768)?;
```

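The core DARE operation is small enough to show inline: drop each delta entry with probability `1 - density` and rescale the survivors by `1 / density`, so the expected value of every entry is unchanged. A sketch (not the `ruvllm` implementation), taking a caller-supplied random source to stay dependency-free:

```rust
// Drop And REscale on a flattened weight delta. `density` is the keep
// probability; survivors are rescaled by 1/density so the expectation
// of each entry is preserved. `rand01` yields uniform samples in [0, 1).
fn dare(delta: &[f32], density: f32, mut rand01: impl FnMut() -> f32) -> Vec<f32> {
    delta
        .iter()
        .map(|&v| if rand01() < density { v / density } else { 0.0 })
        .collect()
}
```

With `density: 0.7` as in the config above, roughly 30% of each delta is dropped and the rest is scaled by 1/0.7.
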
## Hot-Swapping Adapters

```rust
use ruvllm::lora::HotSwapManager;

let mut manager = HotSwapManager::new();

// Set initial active adapter
manager.set_active(coder_lora);

// Use active adapter
if let Some(active) = manager.active() {
    let output = active.forward(&input, &TargetModule::QProj);
}

// Prepare new adapter in standby
manager.prepare_standby(security_lora);

// Atomic swap
manager.swap()?;

// Now the security adapter is active
```

## Per-Request Adaptation

```rust
use ruvllm::lora::{AdaptFeedback, TargetModule};

// Inference
let output = lora.forward(&input, &TargetModule::QProj);

// Adapt based on feedback
let feedback = AdaptFeedback::from_quality(0.85);
lora.adapt(&input, feedback)?;

// Apply accumulated updates
lora.apply_updates(0.01); // learning rate
```

## Custom Adapter Configuration

```rust
use ruvllm::lora::{LoraConfig, MicroLoRA, TargetModule};

let custom = LoraConfig::builder("my_adapter")
    .rank(12)
    .alpha(24.0)
    .dropout(0.1)
    .target_modules(vec![
        TargetModule::QProj,
        TargetModule::VProj,
        TargetModule::GateProj,
    ])
    .description("Custom adapter for specialized task")
    .add_tag("custom")
    .add_tag("specialized")
    .build();

// Create MicroLoRA from the custom config
let lora_config = custom.to_micro_lora_config(768)?;
let lora = MicroLoRA::new(lora_config);
```

## Persistence

### Save Adapter

```rust
lora.save("./adapters/coder_v1.bin")?;
```

### Load Adapter

```rust
use ruvllm::lora::MicroLoRA;

let lora = MicroLoRA::load("./adapters/coder_v1.bin")?;
```

### Save Training Dataset

```rust
dataset.save("./datasets/coder_train.bin")?;
```

### Load Training Dataset

```rust
use ruvllm::lora::AdapterDataset;

let dataset = AdapterDataset::load("./datasets/coder_train.bin")?;
```

## Synthetic Data Generation

Generate task-specific synthetic training data:

```rust
use ruvllm::lora::SyntheticDataGenerator;

let generator = SyntheticDataGenerator::new(768, 42); // dim, seed

// Generate for a specific task
let coder_data = generator.generate("coder", 1000);

// Generate for all tasks
let all_datasets = generator.generate_all(1000);

for (name, dataset) in all_datasets {
    println!("{}: {} train, {} val",
        name, dataset.examples.len(), dataset.validation.len());
}
```

## Performance Characteristics

| Adapter    | Rank | Params (768d) | Memory | Forward (μs) |
|------------|------|---------------|--------|--------------|
| Coder      | 16   | 196,608       | 200 KB | <50          |
| Researcher | 8    | 98,304        | 100 KB | <30          |
| Security   | 16   | 393,216       | 350 KB | <80          |
| Architect  | 12   | 196,608       | 180 KB | <60          |
| Reviewer   | 8    | 98,304        | 100 KB | <30          |

## Training Performance

- **Gradient Checkpointing**: ~50% memory reduction
- **Mixed Precision**: Up to 2x throughput (when supported)
- **EWC++ Regularization**: Prevents catastrophic forgetting across tasks
- **Early Stopping**: Automatic convergence detection

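For context on the EWC++ line above: elastic weight consolidation adds a quadratic penalty that anchors parameters important to earlier tasks, weighted by a per-parameter (Fisher information) importance estimate; this is the term scaled by the `ewc_lambda` field shown in the custom training configuration earlier. A minimal sketch of the penalty, not the `ruvllm` implementation:

```rust
// EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - theta_star_i)^2,
// where `fisher` holds per-parameter importance estimates and `anchor`
// holds the parameter values consolidated after the previous task.
fn ewc_penalty(theta: &[f32], anchor: &[f32], fisher: &[f32], lambda: f32) -> f32 {
    let quad: f32 = theta
        .iter()
        .zip(anchor)
        .zip(fisher)
        .map(|((t, a), f)| f * (t - a) * (t - a))
        .sum();
    0.5 * lambda * quad
}
```
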
## Best Practices

### 1. Adapter Selection

Choose adapters based on task requirements (see the routing example under Integration with Claude Flow below):

- **Code tasks**: Use Coder adapter
- **Analysis tasks**: Use Researcher adapter
- **Security audits**: Use Security adapter
- **Design tasks**: Use Architect adapter
- **Review tasks**: Use Reviewer adapter

### 2. Training

- Use **quick** config for experimentation (1 epoch)
- Use **stable** config for production (5 epochs, lower LR)
- Enable **gradient checkpointing** for large models
- Set an appropriate **quality threshold** to filter low-quality examples

### 3. Merging

- Use **Average** for simple multi-task scenarios
- Use **Weighted** when tasks have different importance
- Use **SLERP** for smooth transitions
- Use **TIES** for robust multi-adapter composition

### 4. Hot-Swapping

- Always **prepare standby** before swapping
- Check **is_swapping()** before critical operations (see the sketch below)
- Use for dynamic task routing

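A short sketch of that guard; `is_swapping()` is named above but its signature is not shown elsewhere in this document, so the `bool` return here is an assumption:

```rust
// Guard critical work against an in-flight swap.
// Assumption: `is_swapping()` returns bool, per the best-practices note above.
if manager.is_swapping() {
    // Defer or queue the request until the swap completes.
} else if let Some(active) = manager.active() {
    let output = active.forward(&input, &TargetModule::QProj);
}
```
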
## Integration with Claude Flow

```rust
// Route task to the appropriate adapter
let adapter = match task_type {
    "code" => adapters.create_lora("coder", 768)?,
    "research" => adapters.create_lora("researcher", 768)?,
    "security" => adapters.create_lora("security", 768)?,
    "architecture" => adapters.create_lora("architect", 768)?,
    "review" => adapters.create_lora("reviewer", 768)?,
    _ => adapters.create_lora("coder", 768)?, // default
};

// Use for inference
let output = adapter.forward(&input, &TargetModule::QProj);
```

## Future Enhancements

- [ ] Safetensors format support
- [ ] Quantized adapter loading (4-bit, 8-bit)
- [ ] PEFT integration
- [ ] LoRA+ (optimized learning rates for A and B)
- [ ] DoRA (Weight-Decomposed Low-Rank Adaptation)
- [ ] Adapter routing networks

## References

- LoRA: [https://arxiv.org/abs/2106.09685](https://arxiv.org/abs/2106.09685)
- EWC++: [https://arxiv.org/abs/1801.10112](https://arxiv.org/abs/1801.10112)
- TIES-Merging: [https://arxiv.org/abs/2306.01708](https://arxiv.org/abs/2306.01708)
- DARE: [https://arxiv.org/abs/2311.03099](https://arxiv.org/abs/2311.03099)

## License
Apache 2.0 / MIT