Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
390
crates/ruvector-tiny-dancer-core/README.md
Normal file
@@ -0,0 +1,390 @@
# Ruvector Tiny Dancer Core

[Crates.io](https://crates.io/crates/ruvector-tiny-dancer-core) · [Docs.rs](https://docs.rs/ruvector-tiny-dancer-core) · [License: MIT](https://opensource.org/licenses/MIT) · [GitHub Actions](https://github.com/ruvnet/ruvector/actions) · [Rust](https://www.rust-lang.org)

Production-grade AI agent routing system with FastGRNN neural inference for **70-85% LLM cost reduction**.

## 🚀 Introduction

**The Problem**: AI applications often send every request to expensive, powerful models, even when simpler models could handle the task. This wastes money and resources.

**The Solution**: Tiny Dancer acts as a smart traffic controller for your AI requests. It quickly analyzes each request and decides whether to route it to a fast, cheap model or a powerful, expensive one.

**How It Works**:
1. You send a request with potential responses (candidates)
2. Tiny Dancer scores each candidate in microseconds
3. High-confidence candidates go to lightweight models (fast & cheap)
4. Low-confidence candidates go to powerful models (accurate but expensive)

**The Result**: Save 70-85% on AI costs while maintaining quality.

**Real-World Example**: Instead of sending 100 memory items to GPT-4 for evaluation, Tiny Dancer filters them down to the top 3-5 in microseconds, then sends only those to the expensive model.

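The real-world example above amounts to a top-k prefilter: score everything cheaply, keep the best few, and pass only those on. Here is a minimal sketch of that step, assuming the scores have already been computed. The `Scored` struct and `top_k` function are hypothetical names for illustration and are not part of the crate's API.

```rust
/// Hypothetical scored candidate; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    id: String,
    score: f32,
}

/// Keep the `k` highest-scoring candidates, best first.
fn top_k(mut scored: Vec<Scored>, k: usize) -> Vec<Scored> {
    // `total_cmp` is a total order on f32, so a NaN score cannot panic the sort.
    scored.sort_by(|a, b| b.score.total_cmp(&a.score));
    scored.truncate(k);
    scored
}

fn main() {
    // 100 memory items with made-up scores in [0, 1).
    let items: Vec<Scored> = (0..100)
        .map(|i| Scored {
            id: format!("memory-{i}"),
            score: ((i * 37) % 100) as f32 / 100.0,
        })
        .collect();

    // Only this shortlist would be forwarded to the expensive model.
    let shortlist = top_k(items, 5);
    for s in &shortlist {
        println!("{} -> {:.2}", s.id, s.score);
    }
}
```

Sorting the full list is the simplest approach and is cheap at this scale; a bounded heap would avoid the full sort if the candidate count grew much larger.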
## ✨ Features

- ⚡ **Sub-millisecond Latency**: 144ns cosine similarity, 7.5µs model inference
- 💰 **70-85% Cost Reduction**: Intelligent routing to appropriately-sized models
- 🧠 **FastGRNN Architecture**: <1MB models with 80-90% sparsity
- 🔒 **Circuit Breaker**: Graceful degradation with automatic recovery
- 📊 **Uncertainty Quantification**: Conformal prediction for reliable routing
- 🗄️ **AgentDB Integration**: Persistent SQLite storage with WAL mode
- 🎯 **Multi-Signal Scoring**: Semantic similarity, recency, frequency, success rate
- 🔧 **Model Optimization**: INT8 quantization, magnitude pruning

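For readers unfamiliar with FastGRNN, the sketch below shows one step of the cell as described in the original FastGRNN work from Microsoft Research: a gate `z` and a candidate state share the same `W x + U h` pre-activation, and two scalars (`zeta`, `nu`) blend them into the new hidden state. The `Cell` struct and its field names are hypothetical, and the crate's own implementation may differ in layout and details.

```rust
/// Hypothetical parameter bundle for one FastGRNN cell (not the crate's type).
struct Cell {
    w: Vec<Vec<f32>>, // hidden_dim x input_dim
    u: Vec<Vec<f32>>, // hidden_dim x hidden_dim
    b_z: Vec<f32>,    // gate bias
    b_h: Vec<f32>,    // candidate-state bias
    zeta: f32,        // scalar in [0, 1]
    nu: f32,          // scalar in [0, 1]
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

impl Cell {
    /// One recurrent step: returns the next hidden state.
    fn step(&self, x: &[f32], h: &[f32]) -> Vec<f32> {
        (0..h.len())
            .map(|i| {
                // Shared pre-activation (W x + U h)_i, reused by gate and candidate.
                let pre = dot(&self.w[i], x) + dot(&self.u[i], h);
                let z = sigmoid(pre + self.b_z[i]);
                let h_tilde = (pre + self.b_h[i]).tanh();
                (self.zeta * (1.0 - z) + self.nu) * h_tilde + z * h[i]
            })
            .collect()
    }
}

fn main() {
    // Tiny cell: input_dim = 2, hidden_dim = 2, arbitrary illustrative weights.
    let cell = Cell {
        w: vec![vec![0.1, -0.2], vec![0.3, 0.4]],
        u: vec![vec![0.0, 0.1], vec![-0.1, 0.0]],
        b_z: vec![0.0, 0.0],
        b_h: vec![0.0, 0.0],
        zeta: 1.0,
        nu: 0.0,
    };
    let h = cell.step(&[1.0, 0.5], &[0.0, 0.0]);
    println!("next hidden state: {:?}", h);
}
```

Sharing `W` and `U` between the gate and the candidate state is what keeps the parameter count low compared with a GRU or LSTM.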
## 📊 Benchmark Results

```
Feature Extraction:
  10 candidates:   1.73µs  (173ns per candidate)
  50 candidates:   9.44µs  (189ns per candidate)
  100 candidates:  18.48µs (185ns per candidate)

Model Inference:
  Single:     7.50µs
  Batch 10:   74.94µs  (7.49µs per item)
  Batch 100:  735.45µs (7.35µs per item)

Complete Routing:
  10 candidates:   8.83µs
  50 candidates:   48.23µs
  100 candidates:  92.86µs
```

## 🚀 Quick Start

### Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
ruvector-tiny-dancer-core = "0.1.1"
```

### Basic Usage

```rust
use ruvector_tiny_dancer_core::{
    Router,
    types::{RouterConfig, RoutingRequest, Candidate},
};
use std::collections::HashMap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create router
    let config = RouterConfig {
        model_path: "./models/fastgrnn.safetensors".to_string(),
        confidence_threshold: 0.85,
        max_uncertainty: 0.15,
        enable_circuit_breaker: true,
        ..Default::default()
    };

    let router = Router::new(config)?;

    // Prepare candidates (timestamps come from the `chrono` crate)
    let candidates = vec![
        Candidate {
            id: "candidate-1".to_string(),
            embedding: vec![0.5; 384],
            metadata: HashMap::new(),
            created_at: chrono::Utc::now().timestamp(),
            access_count: 10,
            success_rate: 0.95,
        },
    ];

    // Route request
    let request = RoutingRequest {
        query_embedding: vec![0.5; 384],
        candidates,
        metadata: None,
    };

    let response = router.route(request)?;

    // Process decisions (borrow so `response` stays usable afterwards)
    for decision in &response.decisions {
        println!("Candidate: {}", decision.candidate_id);
        println!("Confidence: {:.2}", decision.confidence);
        println!("Use lightweight: {}", decision.use_lightweight);
    }

    // Inference time is a field of the response, so print it once
    println!("Inference time: {}µs", response.inference_time_us);

    Ok(())
}
```

## 📚 Tutorials

### Tutorial 1: Basic Routing

```rust
use ruvector_tiny_dancer_core::{Router, types::*};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create default router
    let router = Router::default()?;

    // Create a simple request
    let request = RoutingRequest {
        query_embedding: vec![0.9; 384],
        candidates: vec![
            Candidate {
                id: "high-quality".to_string(),
                embedding: vec![0.85; 384],
                metadata: Default::default(),
                created_at: chrono::Utc::now().timestamp(),
                access_count: 100,
                success_rate: 0.98,
            }
        ],
        metadata: None,
    };

    // Route and inspect results
    let response = router.route(request)?;
    let decision = &response.decisions[0];

    if decision.use_lightweight {
        println!("✅ High confidence - route to lightweight model");
    } else {
        println!("⚠️ Low confidence - route to powerful model");
    }

    Ok(())
}
```

### Tutorial 2: Feature Engineering

```rust
use ruvector_tiny_dancer_core::feature_engineering::{FeatureEngineer, FeatureConfig};
use ruvector_tiny_dancer_core::types::Candidate;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Custom feature weights (these sum to 1.0)
    let config = FeatureConfig {
        similarity_weight: 0.5,  // Prioritize semantic similarity
        recency_weight: 0.3,     // Recent items are important
        frequency_weight: 0.1,
        success_weight: 0.05,
        metadata_weight: 0.05,
        recency_decay: 0.001,
    };

    let engineer = FeatureEngineer::with_config(config);

    // Extract features
    let query = vec![0.5; 384];
    let candidate = Candidate { /* ... */ };
    let features = engineer.extract_features(&query, &candidate, None)?;

    println!("Semantic similarity: {:.4}", features.semantic_similarity);
    println!("Recency score: {:.4}", features.recency_score);
    println!("Combined score: {:.4}",
             features.features.iter().sum::<f32>());

    Ok(())
}
```

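To make the weights above concrete, here is a sketch of how a weighted multi-signal score with exponential recency decay could be computed. This is an assumption about the general technique, not the crate's actual formula: the `Weights` and `Signals` structs, the function names, and the unit of `age` are all hypothetical, and every signal other than age is assumed to be normalized to the range 0 to 1.

```rust
/// Hypothetical weights mirroring the `FeatureConfig` fields in Tutorial 2.
struct Weights {
    similarity: f32,
    recency: f32,
    frequency: f32,
    success: f32,
    metadata: f32,
    recency_decay: f32,
}

/// Hypothetical per-candidate signals, each already normalized to [0, 1]
/// except `age`, which is in whatever time unit the decay rate uses.
struct Signals {
    similarity: f32,
    age: f32,
    frequency: f32,
    success_rate: f32,
    metadata: f32,
}

/// Exponential decay: 1.0 for a brand-new item, approaching 0 as it ages.
fn recency_score(age: f32, decay: f32) -> f32 {
    (-decay * age).exp()
}

/// Weighted sum of all signals.
fn combined_score(w: &Weights, s: &Signals) -> f32 {
    w.similarity * s.similarity
        + w.recency * recency_score(s.age, w.recency_decay)
        + w.frequency * s.frequency
        + w.success * s.success_rate
        + w.metadata * s.metadata
}

fn main() {
    let w = Weights {
        similarity: 0.5,
        recency: 0.3,
        frequency: 0.1,
        success: 0.05,
        metadata: 0.05,
        recency_decay: 0.001,
    };
    let s = Signals {
        similarity: 0.9,
        age: 500.0,
        frequency: 0.4,
        success_rate: 0.95,
        metadata: 0.0,
    };
    println!("recency:  {:.4}", recency_score(s.age, w.recency_decay));
    println!("combined: {:.4}", combined_score(&w, &s));
}
```

Because the weights sum to 1.0 and each signal is at most 1.0, the combined score in this sketch also stays within 0 to 1.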
### Tutorial 3: Circuit Breaker

```rust
use ruvector_tiny_dancer_core::Router;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let router = Router::default()?;

    // Check circuit breaker status
    match router.circuit_breaker_status() {
        Some(true) => {
            println!("✅ Circuit closed - system healthy");
            // Normal routing
        }
        Some(false) => {
            println!("⚠️ Circuit open - using fallback");
            // Route to default powerful model
        }
        None => {
            println!("Circuit breaker disabled");
        }
    }

    Ok(())
}
```

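The status check above tells you whether the circuit is open but not how it gets there. The sketch below shows the basic idea behind a failure-count breaker, with the threshold playing the role of `circuit_breaker_threshold`. The `CircuitBreaker` type here is a hypothetical stand-in and not the crate's implementation; in particular it omits the timed recovery that a production breaker would normally add.

```rust
/// Minimal consecutive-failure circuit breaker (illustrative only).
struct CircuitBreaker {
    threshold: u32,
    consecutive_failures: u32,
}

impl CircuitBreaker {
    fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_failures: 0 }
    }

    /// `true` while the circuit is closed (healthy), matching the
    /// `Some(true)` convention used in Tutorial 3.
    fn is_closed(&self) -> bool {
        self.consecutive_failures < self.threshold
    }

    /// A success resets the failure streak and closes the circuit.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    /// A failure extends the streak; the circuit opens at `threshold`.
    fn record_failure(&mut self) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
    }
}

fn main() {
    let mut breaker = CircuitBreaker::new(3);
    for attempt in 1..=4 {
        breaker.record_failure();
        println!("after failure {attempt}: closed = {}", breaker.is_closed());
    }
    breaker.record_success();
    println!("after success: closed = {}", breaker.is_closed());
}
```

While the circuit is open, callers would skip inference and fall back to the powerful model, as the `Some(false)` arm above suggests.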
### Tutorial 4: Model Optimization

```rust
use ruvector_tiny_dancer_core::model::{FastGRNN, FastGRNNConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create model
    let config = FastGRNNConfig {
        input_dim: 5,
        hidden_dim: 8,
        output_dim: 1,
        ..Default::default()
    };

    let mut model = FastGRNN::new(config)?;

    println!("Original size: {} bytes", model.size_bytes());

    // Apply quantization
    model.quantize()?;
    println!("After quantization: {} bytes", model.size_bytes());

    // Apply pruning
    model.prune(0.9)?;  // 90% sparsity
    println!("After pruning: {} bytes", model.size_bytes());

    Ok(())
}
```

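The Features list names the two techniques behind these calls as INT8 quantization and magnitude pruning. How the crate implements them is not shown in this README, so the following is only a generic sketch of both ideas on a flat weight slice. The function names `prune_by_magnitude` and `quantize_int8` are hypothetical, and the symmetric per-tensor scale is an assumption.

```rust
/// Magnitude pruning: zero the smallest-magnitude weights so that roughly
/// `sparsity` (0.0..=1.0) of them become zero. Ties at the cutoff may
/// prune slightly more than requested.
fn prune_by_magnitude(weights: &mut [f32], sparsity: f32) {
    let n_prune = ((weights.len() as f32 * sparsity).floor() as usize).min(weights.len());
    if n_prune == 0 {
        return;
    }
    let mut magnitudes: Vec<f32> = weights.iter().map(|w| w.abs()).collect();
    magnitudes.sort_by(|a, b| a.total_cmp(b));
    let cutoff = magnitudes[n_prune - 1];
    for w in weights.iter_mut() {
        if w.abs() <= cutoff {
            *w = 0.0;
        }
    }
}

/// Symmetric INT8 quantization: returns the quantized values and a scale
/// such that each original weight is approximately `q as f32 * scale`.
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when every weight is zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

fn main() {
    let mut weights = vec![0.1, -0.9, 0.05, 0.8, -0.02, 0.4];
    prune_by_magnitude(&mut weights, 0.5);
    println!("pruned:    {:?}", weights);

    let (q, scale) = quantize_int8(&weights);
    println!("quantized: {:?} (scale = {:.6})", q, scale);
}
```

Storing one byte per weight instead of four is where the quantization size reduction comes from; pruning only shrinks the stored model if the zeros are then encoded sparsely.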
### Tutorial 5: SQLite Storage

```rust
use ruvector_tiny_dancer_core::storage::Storage;
use ruvector_tiny_dancer_core::types::Candidate;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create storage
    let storage = Storage::new("./routing.db")?;

    // Insert candidate
    let candidate = Candidate { /* ... */ };
    storage.insert_candidate(&candidate)?;

    // Query candidates
    let candidates = storage.query_candidates(50)?;
    println!("Retrieved {} candidates", candidates.len());

    // Record routing
    storage.record_routing(
        "candidate-1",
        &vec![0.5; 384],
        0.92,    // confidence
        true,    // use_lightweight
        0.08,    // uncertainty
        8_500,   // inference_time_us
    )?;

    // Get statistics
    let stats = storage.get_statistics()?;
    println!("Total routes: {}", stats.total_routes);
    println!("Lightweight: {}", stats.lightweight_routes);
    println!("Avg inference: {:.2}µs", stats.avg_inference_time_us);

    Ok(())
}
```

## 🎯 Advanced Usage

### Hot Model Reloading

```rust
// Reload model without downtime
router.reload_model()?;
```

### Custom Configuration

```rust
let config = RouterConfig {
    model_path: "./models/custom.safetensors".to_string(),
    confidence_threshold: 0.90,  // Higher threshold
    max_uncertainty: 0.10,       // Lower tolerance
    enable_circuit_breaker: true,
    circuit_breaker_threshold: 3, // Faster circuit opening
    enable_quantization: true,
    database_path: Some("./data/routing.db".to_string()),
};
```

### Batch Processing

```rust
let inputs = vec![
    vec![0.5; 5],
    vec![0.3; 5],
    vec![0.8; 5],
];

let scores = model.forward_batch(&inputs)?;
// Process 3 inputs in ~22µs total
```

## 📈 Performance Optimization

### SIMD Acceleration

Feature extraction uses `simsimd` for hardware-accelerated similarity:
- Cosine similarity: **144ns** (384-dim vectors)
- Batch processing: **Linear scaling** with candidate count

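For reference, the quantity being accelerated is plain cosine similarity. Below is a scalar sketch without any SIMD, useful as a baseline or for checking results. The `cosine_similarity` helper is hypothetical and is not the `simsimd` API.

```rust
/// Scalar cosine similarity between two equal-length vectors.
/// Returns `None` for empty, mismatched, or zero-norm inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0_f32, 0.0_f32, 0.0_f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

fn main() {
    // Same shapes as the tutorial vectors: 384-dimensional embeddings.
    let query = vec![0.9_f32; 384];
    let candidate = vec![0.85_f32; 384];
    match cosine_similarity(&query, &candidate) {
        Some(sim) => println!("cosine similarity: {:.4}", sim),
        None => println!("similarity undefined for these inputs"),
    }
}
```

Constant-valued vectors like those in the tutorials all point in the same direction, so their similarity is close to 1 regardless of the constant chosen; real embeddings are needed to see meaningful variation.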
### Zero-Copy Operations

- Memory-mapped models with `memmap2`
- Zero-allocation inference paths
- Efficient buffer reuse

### Parallel Processing

- Rayon-based parallel feature extraction
- Batch inference for multiple candidates
- Concurrent storage operations with WAL

## 🔧 Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| `confidence_threshold` | 0.85 | Minimum confidence for lightweight routing |
| `max_uncertainty` | 0.15 | Maximum uncertainty tolerance |
| `circuit_breaker_threshold` | 5 | Failures before circuit opens |
| `recency_decay` | 0.001 | Exponential decay rate for recency |

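Read together, the first two parameters suggest a simple decision rule: a candidate goes to the lightweight model only if its confidence is high enough and its uncertainty is low enough. The sketch below encodes that reading. Combining the two checks with a logical AND is an assumption, and `Thresholds` and `use_lightweight` are hypothetical names rather than the crate's API.

```rust
/// Hypothetical holder for the two routing thresholds from the table.
struct Thresholds {
    confidence_threshold: f32,
    max_uncertainty: f32,
}

/// Assumed rule: lightweight only when both conditions hold.
fn use_lightweight(t: &Thresholds, confidence: f32, uncertainty: f32) -> bool {
    confidence >= t.confidence_threshold && uncertainty <= t.max_uncertainty
}

fn main() {
    // Defaults from the configuration table.
    let t = Thresholds { confidence_threshold: 0.85, max_uncertainty: 0.15 };

    for (confidence, uncertainty) in [(0.92, 0.08), (0.92, 0.20), (0.80, 0.05)] {
        let target = if use_lightweight(&t, confidence, uncertainty) {
            "lightweight"
        } else {
            "powerful"
        };
        println!("confidence {confidence:.2}, uncertainty {uncertainty:.2} -> {target}");
    }
}
```

Raising `confidence_threshold` or lowering `max_uncertainty` sends more traffic to the powerful model, trading cost savings for safety.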
## 📊 Cost Analysis

For 10,000 daily queries at $0.02 per query:

| Scenario | Reduction | Daily Savings | Annual Savings |
|----------|-----------|---------------|----------------|
| Conservative | 70% | $132 | $48,240 |
| Aggressive | 85% | $164 | $59,876 |

**Break-even**: ~2 months with typical engineering costs

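To adapt the estimate to your own traffic, the gross arithmetic is baseline spend times reduction. The sketch below computes it from the stated inputs; `gross_savings` is a hypothetical helper. Note that the gross figures it produces ($140 and $170 per day) are somewhat higher than the table's, which may be net of costs this README does not itemize.

```rust
/// Gross (daily, annual) savings for a given query volume, per-query cost,
/// and fractional cost reduction. Assumes a 365-day year.
fn gross_savings(queries_per_day: u32, cost_per_query: f64, reduction: f64) -> (f64, f64) {
    let daily_baseline = queries_per_day as f64 * cost_per_query;
    let daily = daily_baseline * reduction;
    (daily, daily * 365.0)
}

fn main() {
    for (label, reduction) in [("Conservative", 0.70), ("Aggressive", 0.85)] {
        let (daily, annual) = gross_savings(10_000, 0.02, reduction);
        println!("{label}: ${daily:.0}/day, ${annual:.0}/year (gross)");
    }
}
```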
## 🔗 Related Projects

- **WASM**: [ruvector-tiny-dancer-wasm](../ruvector-tiny-dancer-wasm) - Browser/edge deployment
- **Node.js**: [ruvector-tiny-dancer-node](../ruvector-tiny-dancer-node) - TypeScript bindings
- **Ruvector**: [ruvector-core](../ruvector-core) - Vector database

## 📚 Resources

- **Documentation**: [docs.rs/ruvector-tiny-dancer-core](https://docs.rs/ruvector-tiny-dancer-core)
- **GitHub**: [github.com/ruvnet/ruvector](https://github.com/ruvnet/ruvector)
- **Website**: [ruv.io](https://ruv.io)
- **Examples**: [github.com/ruvnet/ruvector/tree/main/examples](https://github.com/ruvnet/ruvector/tree/main/examples)

## 🤝 Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](../../CONTRIBUTING.md) for guidelines.

## 📄 License

MIT License - see [LICENSE](../../LICENSE) for details.

## 🙏 Acknowledgments

- FastGRNN architecture inspired by Microsoft Research
- RouteLLM for routing methodology
- Cloudflare Workers for WASM deployment patterns

---

Built with ❤️ by the [Ruvector Team](https://github.com/ruvnet)