git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
538 lines
15 KiB
Markdown
538 lines
15 KiB
Markdown
---
|
|
license: apache-2.0
|
|
language:
|
|
- en
|
|
tags:
|
|
- llm
|
|
- code-generation
|
|
- claude-code
|
|
- sona
|
|
- swarm
|
|
- multi-agent
|
|
- gguf
|
|
- quantized
|
|
- edge-ai
|
|
- self-learning
|
|
- ruvector
|
|
- embeddings
|
|
- routing
|
|
- cost-optimization
|
|
- contrastive-learning
|
|
- triplet-loss
|
|
- infonce
|
|
- agent-routing
|
|
- sota
|
|
- task-routing
|
|
- semantic-search
|
|
library_name: ruvllm
|
|
pipeline_tag: text-classification
|
|
base_model: Qwen/Qwen2.5-0.5B-Instruct
|
|
datasets:
|
|
- custom
|
|
model-index:
|
|
- name: RuvLTRA Claude Code 0.5B
|
|
results:
|
|
- task:
|
|
type: text-classification
|
|
name: Agent Routing
|
|
dataset:
|
|
type: custom
|
|
name: Claude Flow Routing Triplets
|
|
metrics:
|
|
- type: accuracy
|
|
value: 0.882
|
|
name: Embedding-Only Accuracy
|
|
- type: accuracy
|
|
value: 1.0
|
|
name: Hybrid Routing Accuracy
|
|
- type: accuracy
|
|
value: 0.812
|
|
name: Hard Negative Accuracy
|
|
widget:
|
|
- text: "Route: Implement authentication\nAgent:"
|
|
example_title: Code Task
|
|
- text: "Route: Review the pull request\nAgent:"
|
|
example_title: Review Task
|
|
- text: "Route: Fix the null pointer bug\nAgent:"
|
|
example_title: Debug Task
|
|
- text: "Route: Design database schema\nAgent:"
|
|
example_title: Architecture Task
|
|
---
|
|
|
|
# RuvLTRA
|
|
|
|
<p align="center">
|
|
<img src="https://img.shields.io/badge/Hybrid_Routing-100%25-brightgreen" alt="Hybrid Accuracy">
|
|
<img src="https://img.shields.io/badge/Embedding-88.2%25-green" alt="Embedding Accuracy">
|
|
<img src="https://img.shields.io/badge/GGUF-Q4__K__M-blue" alt="GGUF">
|
|
<img src="https://img.shields.io/badge/Latency-<10ms-orange" alt="Latency">
|
|
<img src="https://img.shields.io/badge/Capabilities-388-cyan" alt="Capabilities">
|
|
<img src="https://img.shields.io/badge/License-Apache%202.0-green" alt="License">
|
|
</p>
|
|
|
|
**RuvLTRA** is a collection of optimized models designed for **local routing, embeddings, and task classification** in Claude Code workflows—not for general code generation.
|
|
|
|
## 🎯 Key Philosophy
|
|
|
|
> **Benchmark Note:** HumanEval/MBPP don't apply here. RuvLTRA isn't designed to compete with Claude for code generation from scratch.
|
|
|
|
### Use Case Comparison
|
|
|
|
| Task | RuvLTRA | Claude API |
|
|
|------|---------|------------|
|
|
| Route task to correct agent | ✅ Local, fast, **100% accuracy** | Overkill |
|
|
| Generate embeddings for HNSW | ✅ Purpose-built | No embedding API |
|
|
| Quick classification/routing | ✅ <10ms local | ~500ms+ API |
|
|
| Memory retrieval scoring | ✅ Integrated | Not designed for |
|
|
| Complex code generation | ❌ Use Claude | ✅ |
|
|
| Multi-step reasoning | ❌ Use Claude | ✅ |
|
|
|
|
---
|
|
|
|
## 🚀 SOTA: 100% Routing Accuracy + Enhanced Embeddings
|
|
|
|
Using **hybrid keyword+embedding strategy** plus **contrastive fine-tuning**, RuvLTRA now achieves:
|
|
|
|
### SOTA Benchmark Results
|
|
|
|
| Metric | Before | After | Method |
|
|
|--------|--------|-------|--------|
|
|
| **Hybrid Routing** | 95% | **100%** | Keyword-First + Embedding Fallback |
|
|
| **Embedding-Only** | 45% | **88.2%** | Contrastive Learning (Triplet + InfoNCE) |
|
|
| **Hard Negatives** | N/A | **81.2%** | Claude Opus 4.5 Generated Pairs |
|
|
|
|
### Strategy Comparison (20 test cases)
|
|
|
|
| Strategy | RuvLTRA | Qwen Base | Improvement |
|
|
|----------|---------|-----------|-------------|
|
|
| Embedding Only | 88.2% | 40.0% | +48.2 pts |
|
|
| **Keyword-First Hybrid** | **100.0%** | 95.0% | +5 pts |
|
|
|
|
### Training Enhancements (v2.4 - Ecosystem Edition)
|
|
|
|
- **2,545 training triplets** (1,078 SOTA + 1,467 ecosystem)
|
|
- **Full ecosystem coverage**: claude-flow, agentic-flow, ruvector
|
|
- **388 total capabilities** across all tools
|
|
- **62 validation tests** with 100% accuracy
|
|
- **Claude Opus 4.5** used for generating confusing pairs
|
|
- **Triplet + InfoNCE loss** for contrastive learning
|
|
- **Real Candle training** with gradient-based weight updates
|
|
|
|
### Ecosystem Coverage (v2.4)
|
|
|
|
| Tool | CLI Commands | Agents | Special Features |
|
|
|------|--------------|--------|------------------|
|
|
| **claude-flow** | 26 (179 subcommands) | 58 types | 27 hooks, 12 workers, 29 skills |
|
|
| **agentic-flow** | 17 commands | 33 types | 32 MCP tools, 9 RL algorithms |
|
|
| **ruvector** | 6 CLI, 22 Rust crates | 12 NPM | 6 attention, 4 graph algorithms |
|
|
|
|
### Supported Agent Types (58+)
|
|
|
|
| Agent | Keywords | Use Cases |
|
|
|-------|----------|-----------|
|
|
| `coder` | implement, build, create | Code implementation |
|
|
| `researcher` | research, investigate, explore | Information gathering |
|
|
| `reviewer` | review, pull request, quality | Code review |
|
|
| `tester` | test, unit, integration | Testing |
|
|
| `architect` | design, architecture, schema | System design |
|
|
| `security-architect` | security, vulnerability, xss | Security analysis |
|
|
| `debugger` | debug, fix, bug, error | Bug fixing |
|
|
| `documenter` | jsdoc, comment, readme | Documentation |
|
|
| `refactorer` | refactor, async/await | Code refactoring |
|
|
| `optimizer` | optimize, cache, performance | Performance |
|
|
| `devops` | deploy, ci/cd, kubernetes | DevOps |
|
|
| `api-docs` | openapi, swagger, api spec | API documentation |
|
|
| `planner` | sprint, plan, roadmap | Project planning |
|
|
|
|
### Extended Capabilities (v2.4)
|
|
|
|
| Category | Examples |
|
|
|----------|----------|
|
|
| **MCP Tools** | memory_store, agent_spawn, swarm_init, hooks_pre-task |
|
|
| **Swarm Topologies** | hierarchical, mesh, ring, star, adaptive |
|
|
| **Consensus** | byzantine, raft, gossip, crdt, quorum |
|
|
| **Learning** | SONA train, LoRA finetune, EWC++ consolidate, GRPO optimize |
|
|
| **Attention** | flash, multi-head, linear, hyperbolic, MoE |
|
|
| **Graph** | mincut, GNN embed, spectral, pagerank |
|
|
| **Hardware** | Metal GPU, NEON SIMD, ANE neural engine |
|
|
|
|
---
|
|
|
|
## 💰 Cost Savings
|
|
|
|
| Operation | Claude API | RuvLTRA Local | Savings |
|
|
|-----------|------------|---------------|---------|
|
|
| Task routing | $0.003 / call | $0 | **100%** |
|
|
| Embedding generation | $0.0001 / call | $0 | **100%** |
|
|
| Latency | ~500ms | <10ms | **50x faster** |
|
|
|
|
**Monthly example:** ~$250/month savings (50K routing calls + 100K embeddings)
|
|
|
|
---
|
|
|
|
## 📦 Available Models
|
|
|
|
| Model | Size | RAM | Latency |
|
|
|-------|------|-----|---------|
|
|
| `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
|
|
| `ruvltra-small-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
|
|
| `ruvltra-medium-1.1b-q4_k_m.gguf` | 800 MB | ~1 GB | <20ms |
|
|
|
|
---
|
|
|
|
## 🛠️ Quick Start
|
|
|
|
### Installation
|
|
```bash
|
|
npx ruvector install
|
|
```
|
|
|
|
### Download Models
|
|
```bash
|
|
wget https://huggingface.co/ruv/ruvltra/resolve/main/ruvltra-claude-code-0.5b-q4_k_m.gguf
|
|
```
|
|
|
|
### Python Example
|
|
```python
|
|
from llama_cpp import Llama
|
|
|
|
router = Llama(model_path="ruvltra-claude-code-0.5b-q4_k_m.gguf", n_ctx=512)
|
|
result = router("Route: Add validation\nAgent:", max_tokens=8)
|
|
print(result['choices'][0]['text']) # -> "coder"
|
|
```
|
|
|
|
### Rust Example
|
|
```rust
|
|
use ruvllm::backends::{create_backend, GenerateParams};
|
|
|
|
let mut llm = create_backend();
|
|
llm.load_model("ruvltra-claude-code-0.5b-q4_k_m.gguf", Default::default())?;
|
|
|
|
let agent = llm.generate("Route: fix bug\nAgent:", GenerateParams::default().with_max_tokens(8))?;
|
|
```
|
|
|
|
### Node.js Example (Hybrid Routing)
|
|
```javascript
|
|
const { SemanticRouter } = require('@ruvector/ruvllm');
|
|
|
|
const router = new SemanticRouter({
|
|
modelPath: 'ruvltra-claude-code-0.5b-q4_k_m.gguf',
|
|
strategy: 'keyword-first' // 100% accuracy
|
|
});
|
|
|
|
const result = await router.route('Implement authentication system');
|
|
// { agent: 'coder', confidence: 0.92 }
|
|
```
|
|
|
|
---
|
|
|
|
## 🔧 Hybrid Routing Algorithm
|
|
|
|
The model achieves 100% accuracy using a two-stage routing strategy:
|
|
|
|
```
|
|
1. KEYWORD MATCHING (Primary)
|
|
- Check task for trigger keywords
|
|
- Priority ordering resolves conflicts
|
|
- "investigate" → researcher (priority)
|
|
- "optimize queries" → optimizer
|
|
|
|
2. EMBEDDING FALLBACK (Secondary)
|
|
- If no keywords match, use embeddings
|
|
- Compare task embedding vs agent descriptions
|
|
- Cosine similarity for ranking
|
|
```
|
|
|
|
---
|
|
|
|
## 📊 Technical Specifications
|
|
|
|
| Specification | Value |
|
|
|--------------|-------|
|
|
| Base Model | Qwen2.5-0.5B-Instruct |
|
|
| Parameters | 494M |
|
|
| Embedding Dimensions | 896 |
|
|
| Quantization | Q4_K_M |
|
|
| File Size | 398 MB |
|
|
| Context Length | 32768 tokens |
|
|
|
|
---
|
|
|
|
## 📦 Rust Crates
|
|
|
|
| Crate | Description |
|
|
|-------|-------------|
|
|
| **ruvllm** | LLM runtime with SONA learning |
|
|
| **ruvector-core** | HNSW vector database |
|
|
| **ruvector-sona** | Self-optimizing neural architecture |
|
|
| **ruvector-attention** | Attention mechanisms |
|
|
| **ruvector-gnn** | Graph neural network on HNSW |
|
|
| **ruvector-graph** | Distributed hypergraph database |
|
|
|
|
```toml
|
|
[dependencies]
|
|
ruvllm = "0.1"
|
|
ruvector-core = { version = "0.1", features = ["hnsw", "simd"] }
|
|
ruvector-sona = { version = "0.1", features = ["serde-support"] }
|
|
```
|
|
|
|
---
|
|
|
|
## 💻 Requirements
|
|
|
|
| Component | Minimum |
|
|
|-----------|---------|
|
|
| RAM | 500 MB |
|
|
| Storage | 400 MB |
|
|
| Rust | 1.70+ |
|
|
| Node | 18+ |
|
|
|
|
---
|
|
|
|
## 🏗️ Architecture
|
|
|
|
```
|
|
Task ──► RuvLTRA ──► Agent Type ──► Claude API
|
|
(free) (100% acc) (pay here)
|
|
|
|
Query ──► RuvLTRA ──► Embedding ──► HNSW ──► Context
|
|
(free) (free) (free) (free)
|
|
```
|
|
|
|
**Philosophy:** Simple, frequent decisions → RuvLTRA (free, <10ms, 100% accurate). Complex reasoning → Claude API (worth the cost).
|
|
|
|
---
|
|
|
|
---
|
|
|
|
<details>
|
|
<summary><b>📋 Training Details</b></summary>
|
|
|
|
### Training Data
|
|
|
|
| Dataset | Count | Description |
|
|
|---------|-------|-------------|
|
|
| Base Triplets | 578 | Claude Code routing examples |
|
|
| Claude Hard Negatives (Batch 1) | 100 | Opus 4.5 generated confusing pairs |
|
|
| Claude Hard Negatives (Batch 2) | 400 | Additional confusing pairs |
|
|
| **Total** | **1,078** | Combined training set |
|
|
|
|
### Training Procedure
|
|
|
|
```
|
|
Pipeline: Hard Negative Generation → Contrastive Training → GRPO Feedback → GGUF Export
|
|
|
|
1. Generate confusing agent pairs using Claude Opus 4.5
|
|
2. Train with Triplet Loss + InfoNCE Loss
|
|
3. Apply GRPO reward scaling from Claude judgments
|
|
4. Export adapter weights for GGUF merging
|
|
```
|
|
|
|
### Hyperparameters
|
|
|
|
| Parameter | Value |
|
|
|-----------|-------|
|
|
| Learning Rate | 2e-5 |
|
|
| Batch Size | 32 |
|
|
| Epochs | 30 |
|
|
| Triplet Margin | 0.5 |
|
|
| InfoNCE Temperature | 0.07 |
|
|
| Weight Decay | 0.01 |
|
|
| Optimizer | AdamW |
|
|
|
|
### Training Infrastructure
|
|
|
|
- **Hardware**: Apple Silicon (Metal GPU)
|
|
- **Framework**: Candle (Rust ML)
|
|
- **Training Time**: ~30 seconds for 30 epochs
|
|
- **Final Loss**: 0.168
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><b>📊 Evaluation Results</b></summary>
|
|
|
|
### Benchmark: Claude Flow Agent Routing (20 test cases)
|
|
|
|
| Strategy | RuvLTRA | Qwen Base | Improvement |
|
|
|----------|---------|-----------|-------------|
|
|
| Embedding Only | 88.2% | 40.0% | **+48.2 pts** |
|
|
| Keyword Only | 100.0% | 100.0% | same |
|
|
| Hybrid 60/40 | 100.0% | 95.0% | +5.0 pts |
|
|
| **Keyword-First** | **100.0%** | 95.0% | **+5.0 pts** |
|
|
|
|
### Per-Agent Accuracy
|
|
|
|
| Agent | Accuracy | Test Cases |
|
|
|-------|----------|------------|
|
|
| coder | 100% | 3 |
|
|
| researcher | 100% | 2 |
|
|
| reviewer | 100% | 2 |
|
|
| tester | 100% | 2 |
|
|
| architect | 100% | 2 |
|
|
| security-architect | 100% | 2 |
|
|
| debugger | 100% | 2 |
|
|
| documenter | 100% | 1 |
|
|
| refactorer | 100% | 1 |
|
|
| optimizer | 100% | 1 |
|
|
| devops | 100% | 1 |
|
|
| api-docs | 100% | 1 |
|
|
|
|
### Hard Negative Performance
|
|
|
|
| Confusing Pair | Accuracy |
|
|
|----------------|----------|
|
|
| coder vs refactorer | 82% |
|
|
| researcher vs architect | 79% |
|
|
| reviewer vs tester | 84% |
|
|
| debugger vs optimizer | 78% |
|
|
| documenter vs api-docs | 85% |
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><b>⚠️ Limitations & Intended Use</b></summary>
|
|
|
|
### Intended Use
|
|
|
|
✅ **Designed For:**
|
|
- Task routing in Claude Code workflows
|
|
- Agent classification (13 types)
|
|
- Semantic embedding for HNSW search
|
|
- Local inference (<10ms latency)
|
|
- Cost optimization (avoid API calls for routing)
|
|
|
|
❌ **NOT Designed For:**
|
|
- General code generation
|
|
- Multi-step reasoning
|
|
- Chat/conversation
|
|
- Languages other than English
|
|
- Agent types beyond the 13 supported
|
|
|
|
### Known Limitations
|
|
|
|
1. **Fixed Agent Types**: Only routes to 13 predefined agents
|
|
2. **English Only**: Training data is English-only
|
|
3. **Domain Specific**: Optimized for software development tasks
|
|
4. **Embedding Fallback**: 88.2% accuracy when keywords don't match
|
|
5. **Context Length**: Optimal for short task descriptions (<100 tokens)
|
|
|
|
### Bias Considerations
|
|
|
|
- Training data generated from Claude Opus 4.5 may inherit biases
|
|
- Agent keywords favor common software terminology
|
|
- Security-related tasks may be over-classified to security-architect
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><b>🔧 Model Files & Checksums</b></summary>
|
|
|
|
### Available Files
|
|
|
|
| File | Size | Format | Use Case |
|
|
|------|------|--------|----------|
|
|
| `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | GGUF Q4_K_M | Production routing |
|
|
| `ruvltra-small-0.5b-q4_k_m.gguf` | 398 MB | GGUF Q4_K_M | General embeddings |
|
|
| `ruvltra-medium-1.1b-q4_k_m.gguf` | 800 MB | GGUF Q4_K_M | Higher accuracy |
|
|
| `training/v2.3-sota-stats.json` | 1 KB | JSON | Training metrics |
|
|
| `training/v2.3-info.json` | 2 KB | JSON | Training config |
|
|
|
|
### Version History
|
|
|
|
| Version | Date | Changes |
|
|
|---------|------|---------|
|
|
| v2.3 | 2025-01-20 | 500+ hard negatives, 48% ratio, GRPO feedback |
|
|
| v2.2 | 2025-01-15 | 100 hard negatives, 18% ratio |
|
|
| v2.1 | 2025-01-10 | Contrastive learning, triplet loss |
|
|
| v2.0 | 2025-01-05 | Hybrid routing strategy |
|
|
| v1.0 | 2024-12-20 | Initial release |
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><b>📖 Citation</b></summary>
|
|
|
|
### BibTeX
|
|
|
|
```bibtex
|
|
@software{ruvltra2025,
|
|
title = {RuvLTRA: Local Task Routing for Claude Code Workflows},
|
|
author = {ruv},
|
|
year = {2025},
|
|
url = {https://huggingface.co/ruv/ruvltra},
|
|
version = {2.3},
|
|
license = {Apache-2.0},
|
|
keywords = {agent-routing, embeddings, claude-code, contrastive-learning}
|
|
}
|
|
```
|
|
|
|
### Plain Text
|
|
|
|
```
|
|
ruv. (2025). RuvLTRA: Local Task Routing for Claude Code Workflows (Version 2.3).
|
|
https://huggingface.co/ruv/ruvltra
|
|
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><b>❓ FAQ & Troubleshooting</b></summary>
|
|
|
|
### Common Questions
|
|
|
|
**Q: Why use this instead of Claude API for routing?**
|
|
A: RuvLTRA is free, runs locally in <10ms, and achieves 100% accuracy with hybrid strategy. Claude API adds latency (~500ms) and costs ~$0.003 per call.
|
|
|
|
**Q: Can I add custom agent types?**
|
|
A: Not with the current model. You'd need to fine-tune with triplets including your custom agents.
|
|
|
|
**Q: Does it work offline?**
|
|
A: Yes, fully offline after downloading the GGUF model.
|
|
|
|
**Q: What's the difference between embedding-only and hybrid?**
|
|
A: Embedding-only uses semantic similarity (88.2% accuracy). Hybrid checks keywords first, then falls back to embeddings (100% accuracy).
|
|
|
|
### Troubleshooting
|
|
|
|
**Model loading fails:**
|
|
```bash
|
|
# Ensure you have enough RAM (500MB+)
|
|
# Check file integrity
|
|
sha256sum ruvltra-claude-code-0.5b-q4_k_m.gguf
|
|
```
|
|
|
|
**Low accuracy:**
|
|
```javascript
|
|
// Use keyword-first strategy for 100% accuracy
|
|
const router = new SemanticRouter({
|
|
strategy: 'keyword-first' // Not 'embedding-only'
|
|
});
|
|
```
|
|
|
|
**Slow inference:**
|
|
```bash
|
|
# Enable Metal GPU on Apple Silicon
|
|
export GGML_METAL=1
|
|
```
|
|
|
|
</details>
|
|
|
|
---
|
|
|
|
## 📄 License
|
|
|
|
Apache 2.0 - Free for commercial and personal use.
|
|
|
|
## 🔗 Links
|
|
|
|
- [GitHub Repository](https://github.com/ruvnet/ruvector)
|
|
- [Claude Flow](https://github.com/ruvnet/claude-flow)
|
|
- [Documentation](https://github.com/ruvnet/ruvector/tree/main/docs)
|
|
- [Training Code](https://github.com/ruvnet/ruvector/tree/main/crates/ruvllm/src/training)
|
|
- [NPM Package](https://www.npmjs.com/package/@ruvector/ruvllm)
|
|
|
|
## 🏷️ Keywords
|
|
|
|
`agent-routing` `task-classification` `claude-code` `embeddings` `semantic-search` `gguf` `quantized` `edge-ai` `local-inference` `contrastive-learning` `triplet-loss` `infonce` `qwen` `llm` `mlops` `cost-optimization` `multi-agent` `swarm` `ruvector` `sona`
|