# Ruvector Core [![Crates.io](https://img.shields.io/crates/v/ruvector-core.svg)](https://crates.io/crates/ruvector-core) [![Documentation](https://docs.rs/ruvector-core/badge.svg)](https://docs.rs/ruvector-core) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Rust](https://img.shields.io/badge/rust-1.77%2B-orange.svg)](https://www.rust-lang.org) **The pure-Rust vector database engine behind RuVector -- HNSW indexing, quantization, and SIMD acceleration in a single crate.** `ruvector-core` is the foundational library that powers the entire [RuVector](https://github.com/ruvnet/ruvector) ecosystem. It gives you a production-grade vector database you can embed directly into any Rust application: insert vectors, search them in under a millisecond, filter by metadata, and compress storage up to 32x -- all without external services. If you need vector search as a library instead of a server, this is the crate. | | ruvector-core | Typical Vector Database | |---|---|---| | **Deployment** | Embed as a Rust dependency -- no server, no network calls | Run a separate service, manage connections | | **Query latency** | <0.5 ms p50 at 1M vectors with HNSW | ~1-5 ms depending on network and index | | **Memory compression** | Scalar (4x), Product (8-32x), Binary (32x) quantization built in | Often requires paid tiers or external tools | | **SIMD acceleration** | SimSIMD hardware-optimized distance calculations, automatic | Manual tuning or not available | | **Search modes** | Dense vectors, sparse BM25, hybrid, MMR diversity, filtered -- all in one API | Typically dense-only; hybrid and filtering are add-ons | | **Storage** | Zero-copy mmap with `redb` -- instant loading, no deserialization | Load time scales with dataset size | | **Concurrency** | Lock-free indexing with parallel batch processing via Rayon | Varies; many require single-writer locks | | **Dependencies** | Minimal -- pure Rust, compiles anywhere `rustc` runs | Often depends on C/C++ libraries (BLAS, LAPACK) | | **Cost** | Free forever -- open source (MIT) | Per-vector or per-query pricing on managed tiers | ## Installation Add `ruvector-core` to your `Cargo.toml`: ```toml [dependencies] ruvector-core = "0.1.0" ``` ### Feature Flags ```toml [dependencies] ruvector-core = { version = "0.1.0", features = ["simd", "uuid-support"] } ``` Available features: - `simd` (default): Enable SIMD-optimized distance calculations - `uuid-support` (default): Enable UUID generation for vector IDs ## Key Features | Feature | What It Does | Why It Matters | |---------|-------------|----------------| | **HNSW Indexing** | Hierarchical Navigable Small World graphs for O(log n) approximate nearest neighbor search | Sub-millisecond queries at million-vector scale | | **Multiple Distance Metrics** | Euclidean, Cosine, Dot Product, Manhattan | Match the metric to your embedding model without conversion | | **Scalar Quantization** | Compress vectors to 8-bit integers (4x reduction) | Cut memory by 75% with 98% recall preserved | | **Product Quantization** | Split vectors into subspaces with codebooks (8-32x reduction) | Store millions of vectors on a single machine | | **Binary Quantization** | 1-bit representation (32x reduction) | Ultra-fast screening pass for massive datasets | | **SIMD Distance** | Hardware-accelerated distance via SimSIMD | Up to 80K QPS on 8 cores without code changes | | **Zero-Copy I/O** | Memory-mapped storage loads instantly | No deserialization step -- open a file and search immediately | | **Hybrid Search** | Combine dense vector similarity with sparse BM25 text scoring | One query handles both semantic and keyword matching | | **Metadata Filtering** | Apply key-value filters during search | No post-filtering needed -- results are already filtered | | **MMR Diversification** | Maximal Marginal Relevance re-ranking | Avoid redundant results when top-K are too similar | | **Conformal Prediction** | Uncertainty quantification on search results | Know when to trust (or distrust) a match | | **Lock-Free Indexing** | Concurrent reads and writes without blocking | High-throughput ingestion while serving queries | | **Batch Processing** | Parallel insert and search via Rayon | Saturate all cores for bulk operations | ## Quick Start ### Basic Usage ```rust use ruvector_core::{VectorDB, DbOptions, VectorEntry, SearchQuery, DistanceMetric}; fn main() -> Result<(), Box> { // Create a new vector database let mut options = DbOptions::default(); options.dimensions = 384; // Vector dimensions options.storage_path = "./my_vectors.db".to_string(); options.distance_metric = DistanceMetric::Cosine; let db = VectorDB::new(options)?; // Insert vectors db.insert(VectorEntry { id: Some("doc1".to_string()), vector: vec![0.1, 0.2, 0.3, /* ... 384 dimensions */], metadata: None, })?; db.insert(VectorEntry { id: Some("doc2".to_string()), vector: vec![0.4, 0.5, 0.6, /* ... 384 dimensions */], metadata: None, })?; // Search for similar vectors let results = db.search(SearchQuery { vector: vec![0.1, 0.2, 0.3, /* ... 384 dimensions */], k: 10, // Return top 10 results filter: None, ef_search: None, })?; for result in results { println!("ID: {}, Score: {}", result.id, result.score); } Ok(()) } ``` ### Batch Operations ```rust use ruvector_core::{VectorDB, VectorEntry}; // Insert multiple vectors efficiently let entries = vec![ VectorEntry { id: Some("doc1".to_string()), vector: vec![0.1, 0.2, 0.3], metadata: None, }, VectorEntry { id: Some("doc2".to_string()), vector: vec![0.4, 0.5, 0.6], metadata: None, }, ]; let ids = db.insert_batch(entries)?; println!("Inserted {} vectors", ids.len()); ``` ### With Metadata Filtering ```rust use std::collections::HashMap; use serde_json::json; // Insert with metadata db.insert(VectorEntry { id: Some("product1".to_string()), vector: vec![0.1, 0.2, 0.3], metadata: Some(HashMap::from([ ("category".to_string(), json!("electronics")), ("price".to_string(), json!(299.99)), ])), })?; // Search with metadata filter let results = db.search(SearchQuery { vector: vec![0.1, 0.2, 0.3], k: 10, filter: Some(HashMap::from([ ("category".to_string(), json!("electronics")), ])), ef_search: None, })?; ``` ### HNSW Configuration ```rust use ruvector_core::{DbOptions, HnswConfig, DistanceMetric}; let mut options = DbOptions::default(); options.dimensions = 384; options.distance_metric = DistanceMetric::Cosine; // Configure HNSW index parameters options.hnsw_config = Some(HnswConfig { m: 32, // Connections per layer (16-64 typical) ef_construction: 200, // Build-time accuracy (100-500 typical) ef_search: 100, // Search-time accuracy (50-200 typical) max_elements: 10_000_000, // Maximum vectors }); let db = VectorDB::new(options)?; ``` ### Quantization ```rust use ruvector_core::{DbOptions, QuantizationConfig}; let mut options = DbOptions::default(); options.dimensions = 384; // Enable scalar quantization (4x compression) options.quantization = Some(QuantizationConfig::Scalar); // Or product quantization (8-32x compression) options.quantization = Some(QuantizationConfig::Product { subspaces: 8, // Number of subspaces k: 256, // Codebook size }); let db = VectorDB::new(options)?; ``` ## API Overview ### Core Types ```rust // Main database interface pub struct VectorDB { /* ... */ } // Vector entry with optional ID and metadata pub struct VectorEntry { pub id: Option, pub vector: Vec, pub metadata: Option>, } // Search query parameters pub struct SearchQuery { pub vector: Vec, pub k: usize, pub filter: Option>, pub ef_search: Option, } // Search result with score pub struct SearchResult { pub id: VectorId, pub score: f32, pub vector: Option>, pub metadata: Option>, } ``` ### Main Operations ```rust impl VectorDB { // Create new database with options pub fn new(options: DbOptions) -> Result; // Create with just dimensions (uses defaults) pub fn with_dimensions(dimensions: usize) -> Result; // Insert single vector pub fn insert(&self, entry: VectorEntry) -> Result; // Insert multiple vectors pub fn insert_batch(&self, entries: Vec) -> Result>; // Search for similar vectors pub fn search(&self, query: SearchQuery) -> Result>; // Delete vector by ID pub fn delete(&self, id: &str) -> Result; // Get vector by ID pub fn get(&self, id: &str) -> Result>; // Get total count pub fn len(&self) -> Result; // Check if empty pub fn is_empty(&self) -> Result; } ``` ### Distance Metrics ```rust pub enum DistanceMetric { Euclidean, // L2 distance - default for embeddings Cosine, // Cosine similarity (1 - similarity) DotProduct, // Negative dot product (for maximization) Manhattan, // L1 distance } ``` ### Advanced Features ```rust // Hybrid search (dense + sparse) use ruvector_core::{HybridSearch, HybridConfig}; let hybrid = HybridSearch::new(HybridConfig { alpha: 0.7, // Balance between dense (0.7) and sparse (0.3) ..Default::default() }); // Filtered search with expressions use ruvector_core::{FilteredSearch, FilterExpression}; let filtered = FilteredSearch::new(db); let expr = FilterExpression::And(vec![ FilterExpression::Equals("category".to_string(), json!("books")), FilterExpression::GreaterThan("price".to_string(), json!(10.0)), ]); // MMR diversification use ruvector_core::{MMRSearch, MMRConfig}; let mmr = MMRSearch::new(MMRConfig { lambda: 0.5, // Balance relevance (0.5) and diversity (0.5) ..Default::default() }); ``` ## Performance ### Latency (Single Query) ``` Operation Flat Index HNSW Index --------------------------------------------- Search (1K vecs) ~0.1ms ~0.2ms Search (100K vecs) ~10ms ~0.5ms Search (1M vecs) ~100ms <1ms Insert ~0.1ms ~1ms Batch (1000) ~50ms ~500ms ``` ### Memory Usage (1M Vectors, 384 Dimensions) ``` Configuration Memory Recall --------------------------------------------- Full Precision (f32) ~1.5GB 100% Scalar Quantization ~400MB 98% Product Quantization ~200MB 95% Binary Quantization ~50MB 85% ``` ### Throughput (Queries Per Second) ``` Configuration QPS Latency (p50) ----------------------------------------------------- Single Thread ~2,000 ~0.5ms Multi-Thread (8 cores) ~50,000 <0.5ms With SIMD ~80,000 <0.3ms With Quantization ~100,000 <0.2ms ``` ## Configuration Guide ### For Maximum Accuracy ```rust let options = DbOptions { dimensions: 384, distance_metric: DistanceMetric::Cosine, hnsw_config: Some(HnswConfig { m: 64, ef_construction: 500, ef_search: 200, max_elements: 10_000_000, }), quantization: None, // Full precision ..Default::default() }; ``` ### For Maximum Speed ```rust let options = DbOptions { dimensions: 384, distance_metric: DistanceMetric::DotProduct, hnsw_config: Some(HnswConfig { m: 16, ef_construction: 100, ef_search: 50, max_elements: 10_000_000, }), quantization: Some(QuantizationConfig::Binary), ..Default::default() }; ``` ### For Balanced Performance ```rust let options = DbOptions::default(); // Recommended defaults ``` ## Building and Testing ### Build ```bash # Build with default features cargo build --release # Build without SIMD cargo build --release --no-default-features --features uuid-support # Build for specific target with optimizations RUSTFLAGS="-C target-cpu=native" cargo build --release ``` ### Testing ```bash # Run all tests cargo test # Run with specific features cargo test --features simd # Run with logging RUST_LOG=debug cargo test ``` ### Benchmarks ```bash # Run all benchmarks cargo bench # Run specific benchmark cargo bench --bench hnsw_search # Run with features cargo bench --features simd ``` Available benchmarks: - `distance_metrics` - SIMD-optimized distance calculations - `hnsw_search` - HNSW index search performance - `quantization_bench` - Quantization techniques - `batch_operations` - Batch insert/search operations - `comprehensive_bench` - Full system benchmarks ## Related Crates `ruvector-core` is the foundation for platform-specific bindings: - **[ruvector-node](../ruvector-node/)** - Node.js bindings via NAPI-RS - **[ruvector-wasm](../ruvector-wasm/)** - WebAssembly bindings for browsers - **[ruvector-gnn](../ruvector-gnn/)** - Graph Neural Network layer for learned search - **[ruvector-cli](../ruvector-cli/)** - Command-line interface - **[ruvector-bench](../ruvector-bench/)** - Performance benchmarks ## Documentation - **[Main README](../../README.md)** - Complete project overview - **[Getting Started Guide](../../docs/guide/GETTING_STARTED.md)** - Quick start tutorial - **[Rust API Reference](../../docs/api/RUST_API.md)** - Detailed API documentation - **[Advanced Features Guide](../../docs/guide/ADVANCED_FEATURES.md)** - Quantization, indexing, tuning - **[Performance Tuning](../../docs/optimization/PERFORMANCE_TUNING_GUIDE.md)** - Optimization strategies - **[API Documentation](https://docs.rs/ruvector-core)** - Full API reference on docs.rs ## Acknowledgments Built with state-of-the-art algorithms and libraries: - **[hnsw_rs](https://crates.io/crates/hnsw_rs)** - HNSW implementation - **[simsimd](https://crates.io/crates/simsimd)** - SIMD distance calculations - **[redb](https://crates.io/crates/redb)** - Embedded database - **[rayon](https://crates.io/crates/rayon)** - Data parallelism - **[memmap2](https://crates.io/crates/memmap2)** - Memory-mapped files ## License **MIT License** - see [LICENSE](../../LICENSE) for details. ---
**Part of [RuVector](https://github.com/ruvnet/ruvector) - Built by [rUv](https://ruv.io)** [![Star on GitHub](https://img.shields.io/github/stars/ruvnet/ruvector?style=social)](https://github.com/ruvnet/ruvector) [Documentation](https://docs.rs/ruvector-core) | [Crates.io](https://crates.io/crates/ruvector-core) | [GitHub](https://github.com/ruvnet/ruvector)