Files
wifi-densepose/crates/ruvector-router-core/README.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

762 lines
21 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Router Core
[![Rust](https://img.shields.io/badge/rust-1.77%2B-orange.svg)](https://www.rust-lang.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Performance](https://img.shields.io/badge/latency-<0.5ms-green.svg)](../../docs/TECHNICAL_PLAN.md)
**High-performance vector database and neural routing inference engine built in Rust.**
Core engine powering Ruvector's intelligent request distribution, model selection, and sub-millisecond vector similarity search. Combines advanced indexing algorithms with SIMD-optimized distance calculations for maximum performance.
## 🎯 Overview
Router Core is the foundation of Ruvector's vector database capabilities, providing:
- **Neural Routing**: Intelligent request distribution across multiple models and endpoints
- **Vector Database**: High-performance storage and retrieval with HNSW indexing
- **Model Selection**: Adaptive routing strategies for multi-model AI systems
- **SIMD Acceleration**: Hardware-optimized vector operations via simsimd
- **Memory Efficiency**: Advanced quantization techniques (4-32x compression)
- **Zero Dependencies**: Pure Rust implementation with minimal external dependencies
## ⚡ Key Features
### Core Capabilities
- **Sub-Millisecond Search**: <0.5ms p50 latency with HNSW indexing
- **HNSW Indexing**: Hierarchical Navigable Small World for fast approximate nearest neighbor search
- **Multiple Distance Metrics**: Euclidean, Cosine, Dot Product, Manhattan
- **Advanced Quantization**: Scalar (4x), Product (8-16x), Binary (32x) compression
- **SIMD Optimizations**: Hardware-accelerated distance calculations
- **Zero-Copy I/O**: Memory-mapped files for efficient data access
- **Thread-Safe**: Concurrent read/write operations with minimal locking
- **Persistent Storage**: Durable vector storage with redb backend
### Neural Routing Features
- **Intelligent Request Distribution**: Route queries to optimal model endpoints
- **Load Balancing**: Distribute workload across multiple inference servers
- **Model Selection**: Automatically select best model based on query characteristics
- **Adaptive Strategies**: Learn and optimize routing decisions over time
- **Latency Optimization**: Minimize end-to-end inference time
- **Failover Support**: Automatic fallback to backup endpoints
## 📦 Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
router-core = "0.1.0"
```
Or use the full ruvector package:
```toml
[dependencies]
ruvector-core = "0.1.0"
```
## 🚀 Quick Start
### Basic Vector Database
```rust
use router_core::{VectorDB, VectorEntry, SearchQuery, DistanceMetric};
use std::collections::HashMap;
// Create database with builder pattern
let db = VectorDB::builder()
.dimensions(384) // Vector dimensions
.distance_metric(DistanceMetric::Cosine)
.hnsw_m(32) // HNSW connections per node
.hnsw_ef_construction(200) // Construction accuracy
.storage_path("./vectors.db")
.build()?;
// Insert vectors
let entry = VectorEntry {
id: "doc1".to_string(),
vector: vec![0.1; 384],
metadata: HashMap::new(),
timestamp: chrono::Utc::now().timestamp(),
};
db.insert(entry)?;
// Search for similar vectors
let query = SearchQuery {
vector: vec![0.1; 384],
k: 10, // Top 10 results
filters: None,
threshold: Some(0.8), // Minimum similarity
ef_search: Some(100), // Search accuracy
};
let results = db.search(query)?;
for result in results {
println!("{}: {}", result.id, result.score);
}
```
### Batch Operations
```rust
use router_core::{VectorDB, VectorEntry};
// Insert multiple vectors efficiently
let entries: Vec<VectorEntry> = (0..1000)
.map(|i| VectorEntry {
id: format!("doc{}", i),
vector: vec![0.1; 384],
metadata: HashMap::new(),
timestamp: chrono::Utc::now().timestamp(),
})
.collect();
// Batch insert (much faster than individual inserts)
db.insert_batch(entries)?;
// Check statistics
let stats = db.stats();
println!("Total vectors: {}", stats.total_vectors);
println!("Avg latency: {:.2}μs", stats.avg_query_latency_us);
```
### Advanced Configuration
```rust
use router_core::{VectorDB, DistanceMetric, QuantizationType};
let db = VectorDB::builder()
.dimensions(768) // Larger embeddings
.max_elements(10_000_000) // 10M vectors
.distance_metric(DistanceMetric::Cosine) // Cosine similarity
.hnsw_m(64) // More connections = higher recall
.hnsw_ef_construction(400) // Higher accuracy during build
.hnsw_ef_search(200) // Search-time accuracy
.quantization(QuantizationType::Scalar) // 4x memory compression
.mmap_vectors(true) // Memory-mapped storage
.storage_path("./large_db.redb")
.build()?;
```
## 🧠 Neural Routing Strategies
Router Core supports multiple routing strategies for intelligent request distribution:
### 1. **Round-Robin Routing**
Simple load balancing across endpoints:
```rust
use router_core::routing::{Router, RoundRobinStrategy};
let router = Router::new(RoundRobinStrategy::new(vec![
"http://model1:8080",
"http://model2:8080",
"http://model3:8080",
]));
let endpoint = router.select_endpoint(&query)?;
```
### 2. **Latency-Based Routing**
Route to fastest available endpoint:
```rust
use router_core::routing::{Router, LatencyBasedStrategy};
let router = Router::new(LatencyBasedStrategy::new(vec![
("http://model1:8080", 50), // 50ms avg latency
("http://model2:8080", 30), // 30ms avg latency (preferred)
("http://model3:8080", 100), // 100ms avg latency
]));
```
### 3. **Semantic Routing**
Route based on query similarity to model specializations:
```rust
use router_core::routing::{Router, SemanticStrategy};
// Define model specializations with example vectors
let models = vec![
("general-model", vec![0.1; 384]), // General queries
("code-model", vec![0.8, 0.2, ...]), // Code-related queries
("math-model", vec![0.3, 0.9, ...]), // Math queries
];
let router = Router::new(SemanticStrategy::new(models));
// Routes to most appropriate model based on query vector
let endpoint = router.select_endpoint(&query_vector)?;
```
### 4. **Adaptive Routing**
Learn optimal routing decisions over time:
```rust
use router_core::routing::{Router, AdaptiveStrategy};
let mut router = Router::new(AdaptiveStrategy::new());
// Router learns from feedback
router.record_request(&query, &endpoint, latency, success)?;
// Routing improves with more data
let best_endpoint = router.select_endpoint(&query)?;
```
## 🎨 Distance Metrics
Router Core supports multiple distance metrics with SIMD optimization:
### Cosine Similarity
Best for normalized embeddings (recommended for most AI applications):
```rust
use router_core::{DistanceMetric, distance::calculate_distance};
let a = vec![1.0, 0.0, 0.0];
let b = vec![0.9, 0.1, 0.0];
let dist = calculate_distance(&a, &b, DistanceMetric::Cosine)?;
// Returns 1 - cosine_similarity (0 = identical, 2 = opposite)
```
### Euclidean Distance (L2)
Measures absolute geometric distance:
```rust
let dist = calculate_distance(&a, &b, DistanceMetric::Euclidean)?;
// Returns sqrt(sum((a[i] - b[i])^2))
```
### Dot Product
Fast similarity for pre-normalized vectors:
```rust
let dist = calculate_distance(&a, &b, DistanceMetric::DotProduct)?;
// Returns -sum(a[i] * b[i]) (negated for distance)
```
### Manhattan Distance (L1)
Sum of absolute differences:
```rust
let dist = calculate_distance(&a, &b, DistanceMetric::Manhattan)?;
// Returns sum(|a[i] - b[i]|)
```
## 🗜️ Quantization Techniques
Reduce memory usage with minimal accuracy loss:
### Scalar Quantization (4x compression)
Compress float32 to int8:
```rust
use router_core::{QuantizationType, VectorDB};
let db = VectorDB::builder()
.dimensions(384)
.quantization(QuantizationType::Scalar)
.build()?;
// Automatic quantization on insert
// 384 dims × 4 bytes = 1536 bytes → 384 bytes + overhead
```
### Product Quantization (8-16x compression)
Divide vector into subspaces and quantize independently:
```rust
let db = VectorDB::builder()
.dimensions(384)
.quantization(QuantizationType::Product {
subspaces: 8, // Divide into 8 subspaces
k: 256, // 256 centroids per subspace
})
.build()?;
// 384 dims × 4 bytes = 1536 bytes → 8 bytes + overhead
```
### Binary Quantization (32x compression)
Compress to 1 bit per dimension:
```rust
let db = VectorDB::builder()
.dimensions(384)
.quantization(QuantizationType::Binary)
.build()?;
// 384 dims × 4 bytes = 1536 bytes → 48 bytes + overhead
// Fast Hamming distance for similarity
```
### Compression Ratio Comparison
```rust
use router_core::quantization::calculate_compression_ratio;
let dims = 384;
let none_ratio = calculate_compression_ratio(dims, QuantizationType::None);
// 1x - no compression
let scalar_ratio = calculate_compression_ratio(dims, QuantizationType::Scalar);
// ~4x compression
let product_ratio = calculate_compression_ratio(
dims,
QuantizationType::Product { subspaces: 8, k: 256 }
);
// ~8-16x compression
let binary_ratio = calculate_compression_ratio(dims, QuantizationType::Binary);
// ~32x compression
```
## 📊 HNSW Index Configuration
Tune the HNSW index for your performance/accuracy requirements:
### M Parameter (Connections per Node)
Controls graph connectivity and search accuracy:
```rust
// Low M = faster build, less memory, lower recall
let db_fast = VectorDB::builder()
.hnsw_m(16) // Minimal connections
.build()?;
// Medium M = balanced (default)
let db_balanced = VectorDB::builder()
.hnsw_m(32) // Default setting
.build()?;
// High M = slower build, more memory, higher recall
let db_accurate = VectorDB::builder()
.hnsw_m(64) // Maximum accuracy
.build()?;
```
### ef_construction (Build-Time Accuracy)
Controls accuracy during index construction:
```rust
// Fast build, lower recall
let db_fast = VectorDB::builder()
.hnsw_ef_construction(100)
.build()?;
// Balanced (default)
let db_balanced = VectorDB::builder()
.hnsw_ef_construction(200)
.build()?;
// Slow build, maximum recall
let db_accurate = VectorDB::builder()
.hnsw_ef_construction(400)
.build()?;
```
### ef_search (Query-Time Accuracy)
Can be adjusted per query for dynamic performance/accuracy tradeoff:
```rust
// Fast search, lower recall
let query_fast = SearchQuery {
vector: query_vec,
k: 10,
ef_search: Some(50), // Override default
..Default::default()
};
// Accurate search
let query_accurate = SearchQuery {
vector: query_vec,
k: 10,
ef_search: Some(200), // Higher accuracy
..Default::default()
};
```
## 🎯 Use Cases
### Multi-Model AI Systems
Route queries to specialized models based on content:
```rust
// Route code questions to code model, math to math model, etc.
let router = SemanticRouter::new(vec![
("gpt-4-code", code_specialization_vector),
("gpt-4-math", math_specialization_vector),
("gpt-4-general", general_specialization_vector),
]);
let best_model = router.route(&user_query_embedding)?;
```
### Load Balancing
Distribute inference load across multiple servers:
```rust
// Balance load across 10 GPU servers
let router = LoadBalancer::new(vec![
"gpu-0.internal:8080",
"gpu-1.internal:8080",
// ... gpu-9
]);
let endpoint = router.next_endpoint()?;
```
### RAG (Retrieval-Augmented Generation)
Fast context retrieval for LLMs:
```rust
// Store document embeddings
for doc in documents {
let embedding = embed_model.encode(&doc.text)?;
db.insert(VectorEntry {
id: doc.id,
vector: embedding,
metadata: doc.metadata,
timestamp: now(),
})?;
}
// Retrieve relevant context for query
let query_embedding = embed_model.encode(&user_query)?;
let context_docs = db.search(SearchQuery {
vector: query_embedding,
k: 5, // Top 5 most relevant
threshold: Some(0.7),
..Default::default()
})?;
```
### Semantic Search
Build intelligent search engines:
```rust
// Index product catalog
for product in catalog {
let embedding = encode_product(&product)?;
db.insert(VectorEntry {
id: product.sku,
vector: embedding,
metadata: product.to_metadata(),
timestamp: now(),
})?;
}
// Search by natural language
let search_embedding = encode_query("comfortable running shoes")?;
let results = db.search(SearchQuery {
vector: search_embedding,
k: 20,
filters: Some(HashMap::from([
("category", "footwear"),
("in_stock", true),
])),
..Default::default()
})?;
```
### Agent Memory Systems
Store and retrieve agent experiences:
```rust
// Store agent observations
struct AgentMemory {
db: VectorDB,
}
impl AgentMemory {
pub fn remember(&self, observation: &str, context: Vec<f32>) -> Result<()> {
self.db.insert(VectorEntry {
id: uuid::Uuid::new_v4().to_string(),
vector: context,
metadata: HashMap::from([
("observation", observation.into()),
("timestamp", now().into()),
]),
timestamp: now(),
})
}
pub fn recall(&self, query_context: Vec<f32>, k: usize) -> Result<Vec<String>> {
let results = self.db.search(SearchQuery {
vector: query_context,
k,
..Default::default()
})?;
Ok(results.iter()
.filter_map(|r| r.metadata.get("observation"))
.map(|v| v.as_str().unwrap().to_string())
.collect())
}
}
```
## 🔧 Configuration Guide
### Optimizing for Different Workloads
#### High Throughput (Batch Processing)
```rust
let db = VectorDB::builder()
.dimensions(384)
.hnsw_m(16) // Lower M for faster queries
.hnsw_ef_construction(100) // Faster build
.hnsw_ef_search(50) // Lower default search accuracy
.quantization(QuantizationType::Scalar) // Compress for speed
.mmap_vectors(true) // Reduce memory pressure
.build()?;
```
#### High Accuracy (Research/Analysis)
```rust
let db = VectorDB::builder()
.dimensions(768)
.hnsw_m(64) // Maximum connections
.hnsw_ef_construction(400) // High build accuracy
.hnsw_ef_search(200) // High search accuracy
.quantization(QuantizationType::None) // No compression
.build()?;
```
#### Memory Constrained (Edge Devices)
```rust
let db = VectorDB::builder()
.dimensions(256) // Smaller embeddings
.max_elements(100_000) // Limit dataset size
.hnsw_m(16) // Fewer connections
.quantization(QuantizationType::Binary) // 32x compression
.mmap_vectors(true) // Use disk instead of RAM
.build()?;
```
#### Balanced (Production Default)
```rust
let db = VectorDB::builder()
.dimensions(384)
.hnsw_m(32)
.hnsw_ef_construction(200)
.hnsw_ef_search(100)
.quantization(QuantizationType::Scalar)
.mmap_vectors(true)
.build()?;
```
## 📈 Performance Characteristics
### Latency Benchmarks
```
Configuration Query Latency (p50) Recall@10
─────────────────────────────────────────────────────────
Uncompressed, M=64 0.3ms 98.5%
Scalar Quant, M=32 0.4ms 96.2%
Product Quant, M=32 0.5ms 94.8%
Binary Quant, M=16 0.6ms 91.3%
```
### Memory Usage (1M vectors @ 384 dims)
```
Quantization Memory Usage Compression Ratio
───────────────────────────────────────────────────────
None (float32) 1536 MB 1x
Scalar (int8) 392 MB 3.9x
Product (8 subspaces) 120 MB 12.8x
Binary (1 bit/dim) 52 MB 29.5x
```
### Throughput (1M vectors)
```
Operation Throughput Notes
─────────────────────────────────────────────────────────
Single Insert ~100K/sec Sequential
Batch Insert ~500K/sec Parallel (rayon)
Query (k=10) ~50K QPS ef_search=100
Query (k=100) ~20K QPS ef_search=100
```
## 🏗️ Integration with Vector Database
Router Core integrates seamlessly with the main Ruvector database:
```rust
use ruvector_core::VectorDB as MainDB;
use router_core::VectorDB as RouterDB;
// Use router-core for specialized routing logic
let router_db = RouterDB::builder()
.dimensions(384)
.build()?;
// Or use main ruvector-core for full features
let main_db = MainDB::builder()
.dimensions(384)
.build()?;
// Both share the same API!
```
## 🧪 Building and Testing
### Build
```bash
# Build library
cargo build --release -p router-core
# Build with all features
cargo build --release -p router-core --all-features
# Build static library
cargo build --release -p router-core --lib
```
### Test
```bash
# Run all tests
cargo test -p router-core
# Run specific test
cargo test -p router-core test_hnsw_insert_and_search
# Run with logging
RUST_LOG=debug cargo test -p router-core
```
### Benchmark
```bash
# Run benchmarks
cargo bench -p router-core
# Run specific benchmark
cargo bench -p router-core --bench vector_search
# With criterion output
cargo bench -p router-core -- --output-format verbose
```
## 📚 API Documentation
### Core Types
- **`VectorDB`**: Main database interface
- **`VectorEntry`**: Vector with ID, data, and metadata
- **`SearchQuery`**: Query parameters for similarity search
- **`SearchResult`**: Search result with ID, score, and metadata
- **`DistanceMetric`**: Enum for distance calculation methods
- **`QuantizationType`**: Enum for compression methods
### Key Methods
```rust
// VectorDB
pub fn new(config: VectorDbConfig) -> Result<Self>
pub fn builder() -> VectorDbBuilder
pub fn insert(&self, entry: VectorEntry) -> Result<String>
pub fn insert_batch(&self, entries: Vec<VectorEntry>) -> Result<Vec<String>>
pub fn search(&self, query: SearchQuery) -> Result<Vec<SearchResult>>
pub fn delete(&self, id: &str) -> Result<bool>
pub fn get(&self, id: &str) -> Result<Option<VectorEntry>>
pub fn stats(&self) -> VectorDbStats
pub fn count(&self) -> Result<usize>
// Distance calculations
pub fn calculate_distance(a: &[f32], b: &[f32], metric: DistanceMetric) -> Result<f32>
pub fn batch_distance(query: &[f32], vectors: &[Vec<f32>], metric: DistanceMetric) -> Result<Vec<f32>>
// Quantization
pub fn quantize(vector: &[f32], qtype: QuantizationType) -> Result<QuantizedVector>
pub fn dequantize(quantized: &QuantizedVector) -> Vec<f32>
pub fn calculate_compression_ratio(original_dims: usize, qtype: QuantizationType) -> f32
```
## 🔗 Links
- **Main Repository**: [github.com/ruvnet/ruvector](https://github.com/ruvnet/ruvector)
- **Documentation**: [docs/README.md](../../docs/README.md)
- **API Reference**: [docs/api/RUST_API.md](../../docs/api/RUST_API.md)
- **Performance Guide**: [docs/optimization/PERFORMANCE_TUNING_GUIDE.md](../../docs/optimization/PERFORMANCE_TUNING_GUIDE.md)
- **Examples**: [examples/](../../examples/)
## 📊 Related Crates
- **`ruvector-core`**: Full-featured vector database (superset of router-core)
- **`ruvector-node`**: Node.js bindings via NAPI-RS
- **`ruvector-wasm`**: WebAssembly bindings for browsers
- **`router-cli`**: Command-line interface for router operations
- **`router-ffi`**: Foreign function interface for C/C++
- **`router-wasm`**: WebAssembly bindings for router
## 🤝 Contributing
Contributions are welcome! Please see:
- **[Contributing Guidelines](../../docs/development/CONTRIBUTING.md)**
- **[Development Guide](../../docs/development/MIGRATION.md)**
- **[Code of Conduct](../../CODE_OF_CONDUCT.md)**
## 📜 License
MIT License - see [LICENSE](../../LICENSE) for details.
## 🙏 Acknowledgments
Built with battle-tested technologies:
- **HNSW**: Hierarchical Navigable Small World algorithm
- **Product Quantization**: Memory-efficient vector compression
- **simsimd**: SIMD-accelerated similarity computations
- **redb**: Embedded database for persistent storage
- **rayon**: Data parallelism for batch operations
- **parking_lot**: High-performance synchronization primitives
---
<div align="center">
**Part of the [Ruvector](https://github.com/ruvnet/ruvector) ecosystem**
Built by [rUv](https://ruv.io) • Production Ready • MIT Licensed
[Documentation](../../docs/README.md) • [API Reference](../../docs/api/RUST_API.md) • [Examples](../../examples/) • [Benchmarks](../../benchmarks/)
</div>