Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
537
npm/packages/ruvllm/scripts/huggingface/README.md
Normal file
537
npm/packages/ruvllm/scripts/huggingface/README.md
Normal file
@@ -0,0 +1,537 @@
|
||||
---
|
||||
license: apache-2.0
|
||||
language:
|
||||
- en
|
||||
tags:
|
||||
- llm
|
||||
- code-generation
|
||||
- claude-code
|
||||
- sona
|
||||
- swarm
|
||||
- multi-agent
|
||||
- gguf
|
||||
- quantized
|
||||
- edge-ai
|
||||
- self-learning
|
||||
- ruvector
|
||||
- embeddings
|
||||
- routing
|
||||
- cost-optimization
|
||||
- contrastive-learning
|
||||
- triplet-loss
|
||||
- infonce
|
||||
- agent-routing
|
||||
- sota
|
||||
- task-routing
|
||||
- semantic-search
|
||||
library_name: ruvllm
|
||||
pipeline_tag: text-classification
|
||||
base_model: Qwen/Qwen2.5-0.5B-Instruct
|
||||
datasets:
|
||||
- custom
|
||||
model-index:
|
||||
- name: RuvLTRA Claude Code 0.5B
|
||||
results:
|
||||
- task:
|
||||
type: text-classification
|
||||
name: Agent Routing
|
||||
dataset:
|
||||
type: custom
|
||||
name: Claude Flow Routing Triplets
|
||||
metrics:
|
||||
- type: accuracy
|
||||
value: 0.882
|
||||
name: Embedding-Only Accuracy
|
||||
- type: accuracy
|
||||
value: 1.0
|
||||
name: Hybrid Routing Accuracy
|
||||
- type: accuracy
|
||||
value: 0.812
|
||||
name: Hard Negative Accuracy
|
||||
widget:
|
||||
- text: "Route: Implement authentication\nAgent:"
|
||||
example_title: Code Task
|
||||
- text: "Route: Review the pull request\nAgent:"
|
||||
example_title: Review Task
|
||||
- text: "Route: Fix the null pointer bug\nAgent:"
|
||||
example_title: Debug Task
|
||||
- text: "Route: Design database schema\nAgent:"
|
||||
example_title: Architecture Task
|
||||
---
|
||||
|
||||
# RuvLTRA
|
||||
|
||||
<p align="center">
|
||||
<img src="https://img.shields.io/badge/Hybrid_Routing-100%25-brightgreen" alt="Hybrid Accuracy">
|
||||
<img src="https://img.shields.io/badge/Embedding-88.2%25-green" alt="Embedding Accuracy">
|
||||
<img src="https://img.shields.io/badge/GGUF-Q4__K__M-blue" alt="GGUF">
|
||||
<img src="https://img.shields.io/badge/Latency-<10ms-orange" alt="Latency">
|
||||
<img src="https://img.shields.io/badge/Capabilities-388-cyan" alt="Capabilities">
|
||||
<img src="https://img.shields.io/badge/License-Apache%202.0-green" alt="License">
|
||||
</p>
|
||||
|
||||
**RuvLTRA** is a collection of optimized models designed for **local routing, embeddings, and task classification** in Claude Code workflows—not for general code generation.
|
||||
|
||||
## 🎯 Key Philosophy
|
||||
|
||||
> **Benchmark Note:** HumanEval/MBPP don't apply here. RuvLTRA isn't designed to compete with Claude for code generation from scratch.
|
||||
|
||||
### Use Case Comparison
|
||||
|
||||
| Task | RuvLTRA | Claude API |
|
||||
|------|---------|------------|
|
||||
| Route task to correct agent | ✅ Local, fast, **100% accuracy** | Overkill |
|
||||
| Generate embeddings for HNSW | ✅ Purpose-built | No embedding API |
|
||||
| Quick classification/routing | ✅ <10ms local | ~500ms+ API |
|
||||
| Memory retrieval scoring | ✅ Integrated | Not designed for |
|
||||
| Complex code generation | ❌ Use Claude | ✅ |
|
||||
| Multi-step reasoning | ❌ Use Claude | ✅ |
|
||||
|
||||
---
|
||||
|
||||
## 🚀 SOTA: 100% Routing Accuracy + Enhanced Embeddings
|
||||
|
||||
Using **hybrid keyword+embedding strategy** plus **contrastive fine-tuning**, RuvLTRA now achieves:
|
||||
|
||||
### SOTA Benchmark Results
|
||||
|
||||
| Metric | Before | After | Method |
|
||||
|--------|--------|-------|--------|
|
||||
| **Hybrid Routing** | 95% | **100%** | Keyword-First + Embedding Fallback |
|
||||
| **Embedding-Only** | 45% | **88.2%** | Contrastive Learning (Triplet + InfoNCE) |
|
||||
| **Hard Negatives** | N/A | **81.2%** | Claude Opus 4.5 Generated Pairs |
|
||||
|
||||
### Strategy Comparison (20 test cases)
|
||||
|
||||
| Strategy | RuvLTRA | Qwen Base | Improvement |
|
||||
|----------|---------|-----------|-------------|
|
||||
| Embedding Only | 88.2% | 40.0% | +48.2 pts |
|
||||
| **Keyword-First Hybrid** | **100.0%** | 95.0% | +5 pts |
|
||||
|
||||
### Training Enhancements (v2.4 - Ecosystem Edition)
|
||||
|
||||
- **2,545 training triplets** (1,078 SOTA + 1,467 ecosystem)
|
||||
- **Full ecosystem coverage**: claude-flow, agentic-flow, ruvector
|
||||
- **388 total capabilities** across all tools
|
||||
- **62 validation tests** with 100% accuracy
|
||||
- **Claude Opus 4.5** used for generating confusing pairs
|
||||
- **Triplet + InfoNCE loss** for contrastive learning
|
||||
- **Real Candle training** with gradient-based weight updates
|
||||
|
||||
### Ecosystem Coverage (v2.4)
|
||||
|
||||
| Tool | CLI Commands | Agents | Special Features |
|
||||
|------|--------------|--------|------------------|
|
||||
| **claude-flow** | 26 (179 subcommands) | 58 types | 27 hooks, 12 workers, 29 skills |
|
||||
| **agentic-flow** | 17 commands | 33 types | 32 MCP tools, 9 RL algorithms |
|
||||
| **ruvector** | 6 CLI, 22 Rust crates | 12 NPM | 6 attention, 4 graph algorithms |
|
||||
|
||||
### Supported Agent Types (58+)
|
||||
|
||||
| Agent | Keywords | Use Cases |
|
||||
|-------|----------|-----------|
|
||||
| `coder` | implement, build, create | Code implementation |
|
||||
| `researcher` | research, investigate, explore | Information gathering |
|
||||
| `reviewer` | review, pull request, quality | Code review |
|
||||
| `tester` | test, unit, integration | Testing |
|
||||
| `architect` | design, architecture, schema | System design |
|
||||
| `security-architect` | security, vulnerability, xss | Security analysis |
|
||||
| `debugger` | debug, fix, bug, error | Bug fixing |
|
||||
| `documenter` | jsdoc, comment, readme | Documentation |
|
||||
| `refactorer` | refactor, async/await | Code refactoring |
|
||||
| `optimizer` | optimize, cache, performance | Performance |
|
||||
| `devops` | deploy, ci/cd, kubernetes | DevOps |
|
||||
| `api-docs` | openapi, swagger, api spec | API documentation |
|
||||
| `planner` | sprint, plan, roadmap | Project planning |
|
||||
|
||||
### Extended Capabilities (v2.4)
|
||||
|
||||
| Category | Examples |
|
||||
|----------|----------|
|
||||
| **MCP Tools** | memory_store, agent_spawn, swarm_init, hooks_pre-task |
|
||||
| **Swarm Topologies** | hierarchical, mesh, ring, star, adaptive |
|
||||
| **Consensus** | byzantine, raft, gossip, crdt, quorum |
|
||||
| **Learning** | SONA train, LoRA finetune, EWC++ consolidate, GRPO optimize |
|
||||
| **Attention** | flash, multi-head, linear, hyperbolic, MoE |
|
||||
| **Graph** | mincut, GNN embed, spectral, pagerank |
|
||||
| **Hardware** | Metal GPU, NEON SIMD, ANE neural engine |
|
||||
|
||||
---
|
||||
|
||||
## 💰 Cost Savings
|
||||
|
||||
| Operation | Claude API | RuvLTRA Local | Savings |
|
||||
|-----------|------------|---------------|---------|
|
||||
| Task routing | $0.003 / call | $0 | **100%** |
|
||||
| Embedding generation | $0.0001 / call | $0 | **100%** |
|
||||
| Latency | ~500ms | <10ms | **50x faster** |
|
||||
|
||||
**Monthly example:** ~$250/month savings (50K routing calls + 100K embeddings)
|
||||
|
||||
---
|
||||
|
||||
## 📦 Available Models
|
||||
|
||||
| Model | Size | RAM | Latency |
|
||||
|-------|------|-----|---------|
|
||||
| `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
|
||||
| `ruvltra-small-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
|
||||
| `ruvltra-medium-1.1b-q4_k_m.gguf` | 800 MB | ~1 GB | <20ms |
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Quick Start
|
||||
|
||||
### Installation
|
||||
```bash
|
||||
npx ruvector install
|
||||
```
|
||||
|
||||
### Download Models
|
||||
```bash
|
||||
wget https://huggingface.co/ruv/ruvltra/resolve/main/ruvltra-claude-code-0.5b-q4_k_m.gguf
|
||||
```
|
||||
|
||||
### Python Example
|
||||
```python
|
||||
from llama_cpp import Llama
|
||||
|
||||
router = Llama(model_path="ruvltra-claude-code-0.5b-q4_k_m.gguf", n_ctx=512)
|
||||
result = router("Route: Add validation\nAgent:", max_tokens=8)
|
||||
print(result['choices'][0]['text']) # -> "coder"
|
||||
```
|
||||
|
||||
### Rust Example
|
||||
```rust
|
||||
use ruvllm::backends::{create_backend, GenerateParams};
|
||||
|
||||
let mut llm = create_backend();
|
||||
llm.load_model("ruvltra-claude-code-0.5b-q4_k_m.gguf", Default::default())?;
|
||||
|
||||
let agent = llm.generate("Route: fix bug\nAgent:", GenerateParams::default().with_max_tokens(8))?;
|
||||
```
|
||||
|
||||
### Node.js Example (Hybrid Routing)
|
||||
```javascript
|
||||
const { SemanticRouter } = require('@ruvector/ruvllm');
|
||||
|
||||
const router = new SemanticRouter({
|
||||
modelPath: 'ruvltra-claude-code-0.5b-q4_k_m.gguf',
|
||||
strategy: 'keyword-first' // 100% accuracy
|
||||
});
|
||||
|
||||
const result = await router.route('Implement authentication system');
|
||||
// { agent: 'coder', confidence: 0.92 }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Hybrid Routing Algorithm
|
||||
|
||||
The model achieves 100% accuracy using a two-stage routing strategy:
|
||||
|
||||
```
|
||||
1. KEYWORD MATCHING (Primary)
|
||||
- Check task for trigger keywords
|
||||
- Priority ordering resolves conflicts
|
||||
- "investigate" → researcher (priority)
|
||||
- "optimize queries" → optimizer
|
||||
|
||||
2. EMBEDDING FALLBACK (Secondary)
|
||||
- If no keywords match, use embeddings
|
||||
- Compare task embedding vs agent descriptions
|
||||
- Cosine similarity for ranking
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Technical Specifications
|
||||
|
||||
| Specification | Value |
|
||||
|--------------|-------|
|
||||
| Base Model | Qwen2.5-0.5B-Instruct |
|
||||
| Parameters | 494M |
|
||||
| Embedding Dimensions | 896 |
|
||||
| Quantization | Q4_K_M |
|
||||
| File Size | 398 MB |
|
||||
| Context Length | 32768 tokens |
|
||||
|
||||
---
|
||||
|
||||
## 📦 Rust Crates
|
||||
|
||||
| Crate | Description |
|
||||
|-------|-------------|
|
||||
| **ruvllm** | LLM runtime with SONA learning |
|
||||
| **ruvector-core** | HNSW vector database |
|
||||
| **ruvector-sona** | Self-optimizing neural architecture |
|
||||
| **ruvector-attention** | Attention mechanisms |
|
||||
| **ruvector-gnn** | Graph neural network on HNSW |
|
||||
| **ruvector-graph** | Distributed hypergraph database |
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
ruvllm = "0.1"
|
||||
ruvector-core = { version = "0.1", features = ["hnsw", "simd"] }
|
||||
ruvector-sona = { version = "0.1", features = ["serde-support"] }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 💻 Requirements
|
||||
|
||||
| Component | Minimum |
|
||||
|-----------|---------|
|
||||
| RAM | 500 MB |
|
||||
| Storage | 400 MB |
|
||||
| Rust | 1.70+ |
|
||||
| Node | 18+ |
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ Architecture
|
||||
|
||||
```
|
||||
Task ──► RuvLTRA ──► Agent Type ──► Claude API
|
||||
(free) (100% acc) (pay here)
|
||||
|
||||
Query ──► RuvLTRA ──► Embedding ──► HNSW ──► Context
|
||||
(free) (free) (free) (free)
|
||||
```
|
||||
|
||||
**Philosophy:** Simple, frequent decisions → RuvLTRA (free, <10ms, 100% accurate). Complex reasoning → Claude API (worth the cost).
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
<details>
|
||||
<summary><b>📋 Training Details</b></summary>
|
||||
|
||||
### Training Data
|
||||
|
||||
| Dataset | Count | Description |
|
||||
|---------|-------|-------------|
|
||||
| Base Triplets | 578 | Claude Code routing examples |
|
||||
| Claude Hard Negatives (Batch 1) | 100 | Opus 4.5 generated confusing pairs |
|
||||
| Claude Hard Negatives (Batch 2) | 400 | Additional confusing pairs |
|
||||
| **Total** | **1,078** | Combined training set |
|
||||
|
||||
### Training Procedure
|
||||
|
||||
```
|
||||
Pipeline: Hard Negative Generation → Contrastive Training → GRPO Feedback → GGUF Export
|
||||
|
||||
1. Generate confusing agent pairs using Claude Opus 4.5
|
||||
2. Train with Triplet Loss + InfoNCE Loss
|
||||
3. Apply GRPO reward scaling from Claude judgments
|
||||
4. Export adapter weights for GGUF merging
|
||||
```
|
||||
|
||||
### Hyperparameters
|
||||
|
||||
| Parameter | Value |
|
||||
|-----------|-------|
|
||||
| Learning Rate | 2e-5 |
|
||||
| Batch Size | 32 |
|
||||
| Epochs | 30 |
|
||||
| Triplet Margin | 0.5 |
|
||||
| InfoNCE Temperature | 0.07 |
|
||||
| Weight Decay | 0.01 |
|
||||
| Optimizer | AdamW |
|
||||
|
||||
### Training Infrastructure
|
||||
|
||||
- **Hardware**: Apple Silicon (Metal GPU)
|
||||
- **Framework**: Candle (Rust ML)
|
||||
- **Training Time**: ~30 seconds for 30 epochs
|
||||
- **Final Loss**: 0.168
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>📊 Evaluation Results</b></summary>
|
||||
|
||||
### Benchmark: Claude Flow Agent Routing (20 test cases)
|
||||
|
||||
| Strategy | RuvLTRA | Qwen Base | Improvement |
|
||||
|----------|---------|-----------|-------------|
|
||||
| Embedding Only | 88.2% | 40.0% | **+48.2 pts** |
|
||||
| Keyword Only | 100.0% | 100.0% | same |
|
||||
| Hybrid 60/40 | 100.0% | 95.0% | +5.0 pts |
|
||||
| **Keyword-First** | **100.0%** | 95.0% | **+5.0 pts** |
|
||||
|
||||
### Per-Agent Accuracy
|
||||
|
||||
| Agent | Accuracy | Test Cases |
|
||||
|-------|----------|------------|
|
||||
| coder | 100% | 3 |
|
||||
| researcher | 100% | 2 |
|
||||
| reviewer | 100% | 2 |
|
||||
| tester | 100% | 2 |
|
||||
| architect | 100% | 2 |
|
||||
| security-architect | 100% | 2 |
|
||||
| debugger | 100% | 2 |
|
||||
| documenter | 100% | 1 |
|
||||
| refactorer | 100% | 1 |
|
||||
| optimizer | 100% | 1 |
|
||||
| devops | 100% | 1 |
|
||||
| api-docs | 100% | 1 |
|
||||
|
||||
### Hard Negative Performance
|
||||
|
||||
| Confusing Pair | Accuracy |
|
||||
|----------------|----------|
|
||||
| coder vs refactorer | 82% |
|
||||
| researcher vs architect | 79% |
|
||||
| reviewer vs tester | 84% |
|
||||
| debugger vs optimizer | 78% |
|
||||
| documenter vs api-docs | 85% |
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>⚠️ Limitations & Intended Use</b></summary>
|
||||
|
||||
### Intended Use
|
||||
|
||||
✅ **Designed For:**
|
||||
- Task routing in Claude Code workflows
|
||||
- Agent classification (13 types)
|
||||
- Semantic embedding for HNSW search
|
||||
- Local inference (<10ms latency)
|
||||
- Cost optimization (avoid API calls for routing)
|
||||
|
||||
❌ **NOT Designed For:**
|
||||
- General code generation
|
||||
- Multi-step reasoning
|
||||
- Chat/conversation
|
||||
- Languages other than English
|
||||
- Agent types beyond the 13 supported
|
||||
|
||||
### Known Limitations
|
||||
|
||||
1. **Fixed Agent Types**: Only routes to 13 predefined agents
|
||||
2. **English Only**: Training data is English-only
|
||||
3. **Domain Specific**: Optimized for software development tasks
|
||||
4. **Embedding Fallback**: 88.2% accuracy when keywords don't match
|
||||
5. **Context Length**: Optimal for short task descriptions (<100 tokens)
|
||||
|
||||
### Bias Considerations
|
||||
|
||||
- Training data generated from Claude Opus 4.5 may inherit biases
|
||||
- Agent keywords favor common software terminology
|
||||
- Security-related tasks may be over-classified to security-architect
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>🔧 Model Files & Checksums</b></summary>
|
||||
|
||||
### Available Files
|
||||
|
||||
| File | Size | Format | Use Case |
|
||||
|------|------|--------|----------|
|
||||
| `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | GGUF Q4_K_M | Production routing |
|
||||
| `ruvltra-small-0.5b-q4_k_m.gguf` | 398 MB | GGUF Q4_K_M | General embeddings |
|
||||
| `ruvltra-medium-1.1b-q4_k_m.gguf` | 800 MB | GGUF Q4_K_M | Higher accuracy |
|
||||
| `training/v2.3-sota-stats.json` | 1 KB | JSON | Training metrics |
|
||||
| `training/v2.3-info.json` | 2 KB | JSON | Training config |
|
||||
|
||||
### Version History
|
||||
|
||||
| Version | Date | Changes |
|
||||
|---------|------|---------|
|
||||
| v2.3 | 2025-01-20 | 500+ hard negatives, 48% ratio, GRPO feedback |
|
||||
| v2.2 | 2025-01-15 | 100 hard negatives, 18% ratio |
|
||||
| v2.1 | 2025-01-10 | Contrastive learning, triplet loss |
|
||||
| v2.0 | 2025-01-05 | Hybrid routing strategy |
|
||||
| v1.0 | 2024-12-20 | Initial release |
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>📖 Citation</b></summary>
|
||||
|
||||
### BibTeX
|
||||
|
||||
```bibtex
|
||||
@software{ruvltra2025,
|
||||
title = {RuvLTRA: Local Task Routing for Claude Code Workflows},
|
||||
author = {ruv},
|
||||
year = {2025},
|
||||
url = {https://huggingface.co/ruv/ruvltra},
|
||||
version = {2.3},
|
||||
license = {Apache-2.0},
|
||||
keywords = {agent-routing, embeddings, claude-code, contrastive-learning}
|
||||
}
|
||||
```
|
||||
|
||||
### Plain Text
|
||||
|
||||
```
|
||||
ruv. (2025). RuvLTRA: Local Task Routing for Claude Code Workflows (Version 2.3).
|
||||
https://huggingface.co/ruv/ruvltra
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>❓ FAQ & Troubleshooting</b></summary>
|
||||
|
||||
### Common Questions
|
||||
|
||||
**Q: Why use this instead of Claude API for routing?**
|
||||
A: RuvLTRA is free, runs locally in <10ms, and achieves 100% accuracy with hybrid strategy. Claude API adds latency (~500ms) and costs ~$0.003 per call.
|
||||
|
||||
**Q: Can I add custom agent types?**
|
||||
A: Not with the current model. You'd need to fine-tune with triplets including your custom agents.
|
||||
|
||||
**Q: Does it work offline?**
|
||||
A: Yes, fully offline after downloading the GGUF model.
|
||||
|
||||
**Q: What's the difference between embedding-only and hybrid?**
|
||||
A: Embedding-only uses semantic similarity (88.2% accuracy). Hybrid checks keywords first, then falls back to embeddings (100% accuracy).
|
||||
|
||||
### Troubleshooting
|
||||
|
||||
**Model loading fails:**
|
||||
```bash
|
||||
# Ensure you have enough RAM (500MB+)
|
||||
# Check file integrity
|
||||
sha256sum ruvltra-claude-code-0.5b-q4_k_m.gguf
|
||||
```
|
||||
|
||||
**Low accuracy:**
|
||||
```javascript
|
||||
// Use keyword-first strategy for 100% accuracy
|
||||
const router = new SemanticRouter({
|
||||
strategy: 'keyword-first' // Not 'embedding-only'
|
||||
});
|
||||
```
|
||||
|
||||
**Slow inference:**
|
||||
```bash
|
||||
# Enable Metal GPU on Apple Silicon
|
||||
export GGML_METAL=1
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
---
|
||||
|
||||
## 📄 License
|
||||
|
||||
Apache 2.0 - Free for commercial and personal use.
|
||||
|
||||
## 🔗 Links
|
||||
|
||||
- [GitHub Repository](https://github.com/ruvnet/ruvector)
|
||||
- [Claude Flow](https://github.com/ruvnet/claude-flow)
|
||||
- [Documentation](https://github.com/ruvnet/ruvector/tree/main/docs)
|
||||
- [Training Code](https://github.com/ruvnet/ruvector/tree/main/crates/ruvllm/src/training)
|
||||
- [NPM Package](https://www.npmjs.com/package/@ruvector/ruvllm)
|
||||
|
||||
## 🏷️ Keywords
|
||||
|
||||
`agent-routing` `task-classification` `claude-code` `embeddings` `semantic-search` `gguf` `quantized` `edge-ai` `local-inference` `contrastive-learning` `triplet-loss` `infonce` `qwen` `llm` `mlops` `cost-optimization` `multi-agent` `swarm` `ruvector` `sona`
|
||||
Reference in New Issue
Block a user