Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
787
docs/adr/ADR-001-ruvector-core-architecture.md
Normal file
@@ -0,0 +1,787 @@

# ADR-001: Ruvector Core Architecture

**Status**: Proposed
**Date**: 2026-01-18
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow

**Note**: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.

## Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-18 | ruv.io | Initial architecture proposal |

---
## Context

### The Vector Database Challenge

Modern AI applications require vector databases that can:

1. **Store high-dimensional embeddings** from LLMs and embedding models
2. **Search with sub-millisecond latency** for real-time inference
3. **Scale to billions of vectors** while maintaining performance
4. **Deploy anywhere**: edge devices, browsers (WASM), cloud servers
5. **Integrate seamlessly** with LLM inference pipelines

### Current State of Vector Databases

Existing solutions fall into several categories:

| Category | Examples | Limitations |
|----------|----------|-------------|
| **Cloud-only** | Pinecone | No edge deployment, vendor lock-in |
| **Heavy native** | Milvus, Qdrant | Complex deployment, high memory |
| **Python-first** | ChromaDB, FAISS | Performance overhead, no WASM |
| **Learning-capable** | None | No existing solutions learn from usage |

### The Ruvector Vision

Ruvector is designed as a **high-performance, learning-capable vector database** implemented in Rust that:

- Achieves **61us p50 latency** for k=10 search on 384-dim vectors (10K vectors, Apple M2; see Appendix C)
- Provides **2-32x memory compression** through tiered quantization
- Runs **anywhere**: native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
- **Learns from usage** via GNN layers that improve search quality over time
- Integrates with **AI agent memory systems** for policy, session state, and audit logs

---
## Decision

### Adopt a Layered, SIMD-Optimized Architecture

We implement ruvector-core as the foundational vector database engine with the following architecture:

```
+-----------------------------------------------------------------------------+
|                              APPLICATION LAYER                              |
|        AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server         |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                                 INDEX LAYER                                 |
|       HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR       |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                             QUANTIZATION LAYER                              |
|     Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction     |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                               DISTANCE LAYER                                |
|        Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch         |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                            SIMD INTRINSICS LAYER                            |
|    AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback     |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                                STORAGE LAYER                                |
|          REDB (native) | Memory-only (WASM) | PostgreSQL Extension          |
+-----------------------------------------------------------------------------+
```

---
## Key Components

### 1. SIMD Intrinsics Layer (`simd_intrinsics.rs`)

The performance foundation of ruvector, providing hardware-accelerated distance calculations.

#### Architecture Dispatch

```rust
pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            unsafe { euclidean_distance_avx2_impl(a, b) }
        } else {
            euclidean_distance_scalar(a, b)
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        unsafe { euclidean_distance_neon_impl(a, b) }
    }

    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        euclidean_distance_scalar(a, b)
    }
}
```
#### Supported Operations

Figures are f32 lanes processed per SIMD instruction (register width / 32 bits), not measured per-cycle throughput.

| Operation | AVX2 (x86_64) | NEON (ARM64) | Scalar Fallback |
|-----------|---------------|--------------|-----------------|
| Euclidean Distance | 8 floats/instruction | 4 floats/instruction | 1 float/instruction |
| Dot Product | 8 floats/instruction | 4 floats/instruction | 1 float/instruction |
| Cosine Similarity | 8 floats/instruction | 4 floats/instruction | 1 float/instruction |
| Manhattan Distance | N/A | 4 floats/instruction | 1 float/instruction |
#### Performance Characteristics

| Metric | AVX2 | NEON | Scalar |
|--------|------|------|--------|
| **512-dim Euclidean** | ~16M ops/sec | ~8M ops/sec | ~2M ops/sec |
| **1536-dim Cosine** | ~143ns | ~200ns | ~800ns |
| **384-dim Dot Product** | ~33ns | ~50ns | ~150ns |
#### Security Guarantees

- A length check via `assert_eq!(a.len(), b.len())` panics on mismatched inputs, preventing out-of-bounds reads
- Unaligned loads (`_mm256_loadu_ps`, `vld1q_f32`) handle arbitrary alignment
- Scalar fallback handles remainder elements after SIMD processing
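
The length check and remainder handling listed above can be illustrated without any intrinsics. The following is a portable sketch, not the code in `simd_intrinsics.rs`: the function name `euclidean_distance_chunked` and the constant `LANES` are hypothetical, and the fixed-width inner loop only stands in for one 256-bit register of f32 values.

```rust
/// Number of f32 values per chunk; 8 mirrors one 256-bit register.
/// (Hypothetical constant for illustration.)
const LANES: usize = 8;

/// Euclidean distance computed over fixed-width chunks, with a scalar
/// loop for the elements that do not fill a whole chunk.
pub fn euclidean_distance_chunked(a: &[f32], b: &[f32]) -> f32 {
    // Mismatched lengths panic here instead of reading out of bounds later.
    assert_eq!(a.len(), b.len(), "vectors must have equal length");

    let chunks_a = a.chunks_exact(LANES);
    let chunks_b = b.chunks_exact(LANES);
    // Leftover elements (fewer than LANES) that need scalar handling.
    let tail_a = chunks_a.remainder();
    let tail_b = chunks_b.remainder();

    // One accumulator per lane, as a SIMD register would hold.
    let mut acc = [0.0f32; LANES];
    for (ca, cb) in chunks_a.zip(chunks_b) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }

    // Horizontal sum of the lanes, then the scalar remainder.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in tail_a.iter().zip(tail_b) {
        let d = x - y;
        sum += d * d;
    }
    sum.sqrt()
}
```

A 19-element input exercises both paths: two full chunks of 8 plus a 3-element tail.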
### 2. Distance Metrics Layer (`distance.rs`)

High-level distance API with optional SimSIMD integration for additional acceleration.

#### Supported Metrics

```rust
pub enum DistanceMetric {
    Euclidean,   // L2 distance: sqrt(sum((a[i] - b[i])^2))
    Cosine,      // 1 - cosine_similarity
    DotProduct,  // Negative dot product (for maximization)
    Manhattan,   // L1 distance: sum(|a[i] - b[i]|)
}
```
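
As a reading aid for the four formulas in the enum comments, here is a scalar sketch of a `distance` function. It is an assumption-laden illustration rather than the contents of `distance.rs`: the signature returns a plain `f32`, and the handling of zero-norm vectors in the cosine case (distance 1.0) is a choice made here, not something the ADR specifies.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DistanceMetric {
    Euclidean,
    Cosine,
    DotProduct,
    Manhattan,
}

/// Scalar reference implementation of the four metrics (illustrative only).
pub fn distance(a: &[f32], b: &[f32], metric: DistanceMetric) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let dot = || a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>();
    match metric {
        // sqrt(sum((a[i] - b[i])^2))
        DistanceMetric::Euclidean => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        // 1 - cosine_similarity
        DistanceMetric::Cosine => {
            let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            if norm_a == 0.0 || norm_b == 0.0 {
                1.0 // assumption: treat a zero vector as maximally dissimilar
            } else {
                1.0 - dot() / (norm_a * norm_b)
            }
        }
        // Negated so that "smaller is closer" holds for every metric.
        DistanceMetric::DotProduct => -dot(),
        // sum(|a[i] - b[i]|)
        DistanceMetric::Manhattan => a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum::<f32>(),
    }
}
```

Negating the dot product lets a single "smallest distance first" ordering serve all four metrics.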
#### Feature Flags

| Feature | Description | Use Case |
|---------|-------------|----------|
| `simd` | SimSIMD acceleration | Native builds |
| `parallel` | Rayon batch processing | Multi-core systems |
| None | Pure Rust fallback | WASM builds |

#### Batch Distance API

```rust
pub fn batch_distances(
    query: &[f32],
    vectors: &[Vec<f32>],
    metric: DistanceMetric,
) -> Result<Vec<f32>> {
    // Parallel path: Rayon is not available on wasm32, so both conditions apply.
    #[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
    {
        use rayon::prelude::*;
        vectors.par_iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
    // Sequential fallback for WASM (body elided in the original;
    // this is the assumed equivalent of the parallel path).
    #[cfg(not(all(feature = "parallel", not(target_arch = "wasm32"))))]
    {
        vectors.iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
}
```
### 3. Index Structures (`index/`)

#### HNSW Index (`index/hnsw.rs`)

Hierarchical Navigable Small World graph for approximate nearest neighbor search.

**Configuration Parameters:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `m` | 32 | Connections per layer (higher = better recall, more memory) |
| `ef_construction` | 200 | Build-time search depth (higher = better graph, slower build) |
| `ef_search` | 100 | Query-time search depth (higher = better recall, slower query) |
| `max_elements` | 10M | Pre-allocated capacity |

**Complexity Analysis:**

| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(log n * m * ef_construction) | O(m * log n) per vector |
| Search | O(log n * m * ef_search) | O(ef_search) |
| Delete | O(1)\* | O(1) |

\* Note: HNSW deletion marks vectors as removed but does not restructure the graph.
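
The O(1) deletion noted above is commonly realized with tombstones: the deleted index is recorded in a set and filtered out of search results, while the graph edges stay untouched. The sketch below illustrates that idea under stated assumptions; `TombstoneSet` and its methods are hypothetical names, not types from `index/hnsw.rs`.

```rust
use std::collections::HashSet;

/// Tracks logically deleted vector indices (hypothetical helper type).
pub struct TombstoneSet {
    deleted: HashSet<usize>,
}

impl TombstoneSet {
    pub fn new() -> Self {
        Self { deleted: HashSet::new() }
    }

    /// Marks `idx` as deleted in expected O(1) time.
    /// Returns false if it was already marked.
    pub fn delete(&mut self, idx: usize) -> bool {
        self.deleted.insert(idx)
    }

    /// Drops tombstoned candidates from an already-sorted result list
    /// and keeps the first `k` survivors.
    pub fn filter_results(&self, candidates: Vec<(usize, f32)>, k: usize) -> Vec<(usize, f32)> {
        candidates
            .into_iter()
            .filter(|(idx, _)| !self.deleted.contains(idx))
            .take(k)
            .collect()
    }
}
```

Because the graph is not restructured, a search has to over-fetch candidates to still return `k` live results once many entries are tombstoned.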
**Serialization:**

```rust
pub struct HnswState {
    vectors: Vec<(String, Vec<f32>)>,
    id_to_idx: Vec<(String, usize)>,
    idx_to_id: Vec<(usize, String)>,
    next_idx: usize,
    config: SerializableHnswConfig,
    dimensions: usize,
    metric: SerializableDistanceMetric,
}
```
#### Flat Index

Linear scan index for small datasets or exact search.

**Use Cases:**
- Datasets < 10K vectors
- Exact k-NN required
- Benchmarking HNSW recall
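
A linear scan is short enough to sketch in full, together with the recall calculation that the benchmarking use case implies. Both function names (`flat_search`, `recall_at_k`) are hypothetical, and the sketch assumes Euclidean distance; it is not the `index/flat` implementation.

```rust
/// Exact k-NN by scanning every vector (Euclidean distance assumed).
/// Returns (index, distance) pairs sorted from nearest to farthest.
pub fn flat_search(query: &[f32], vectors: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| {
            assert_eq!(v.len(), query.len(), "dimension mismatch");
            let d2: f32 = v.iter().zip(query).map(|(x, y)| (x - y) * (x - y)).sum();
            (i, d2.sqrt())
        })
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot break the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

/// Fraction of the exact top-k ids that an approximate search also returned.
pub fn recall_at_k(approx: &[usize], exact: &[usize]) -> f32 {
    if exact.is_empty() {
        return 1.0;
    }
    let hits = approx.iter().filter(|&&id| exact.contains(&id)).count();
    hits as f32 / exact.len() as f32
}
```

Running `flat_search` as ground truth and feeding its ids to `recall_at_k` alongside the HNSW result ids gives the recall figure for a given `ef_search`.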
### 4. Quantization Strategies (`quantization.rs`)

Memory compression techniques trading precision for storage efficiency.

#### Scalar Quantization (4x compression)

Quantizes f32 to u8 using min-max scaling.

```rust
pub struct ScalarQuantized {
    pub data: Vec<u8>,  // Quantized values
    pub min: f32,       // Minimum for dequantization
    pub scale: f32,     // Scale factor
}
```

**Characteristics:**
- Compression: 4x (f32 -> u8)
- Distance calculation: Uses average scale for symmetric distance
- Reconstruction error: < 0.4% for typical embedding distributions
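
Min-max scaling maps each value to `round((x - min) / scale)` with `scale = (max - min) / 255`, and reconstruction inverts it as `min + q * scale`. The sketch below works through that arithmetic using the same field names as the struct above; the method bodies and the guard for constant vectors are assumptions, not the code in `quantization.rs`.

```rust
pub struct ScalarQuantized {
    pub data: Vec<u8>, // Quantized values
    pub min: f32,      // Minimum for dequantization
    pub scale: f32,    // Scale factor: (max - min) / 255
}

impl ScalarQuantized {
    /// Quantizes f32 values to u8 with min-max scaling.
    pub fn quantize(v: &[f32]) -> Self {
        let min = v.iter().copied().fold(f32::INFINITY, f32::min);
        let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        // Assumption: a constant (or empty) vector uses scale 1.0 to avoid dividing by zero.
        let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
        let data = v
            .iter()
            .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
            .collect();
        Self { data, min, scale }
    }

    /// Approximate inverse of `quantize`.
    pub fn reconstruct(&self) -> Vec<f32> {
        self.data
            .iter()
            .map(|&q| self.min + q as f32 * self.scale)
            .collect()
    }
}
```

With rounding to the nearest code, each reconstructed value is within about half a step (`scale / 2`) of the original.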
#### Product Quantization (8-16x compression)

Divides vectors into subspaces, each quantized independently via k-means codebooks.

```rust
pub struct ProductQuantized {
    pub codes: Vec<u8>,                 // One code per subspace
    pub codebooks: Vec<Vec<Vec<f32>>>,  // Learned centroids
}
```

**Training:**
- K-means clustering on subspace vectors
- Codebook size typically 256 (fits in u8)
- Iterations: 10-100 for convergence
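
Once codebooks are trained, encoding is a nearest-centroid lookup per subspace. The sketch below covers only that encode/decode step and takes already-trained codebooks as input; the k-means training itself is omitted. The names `Codebook`, `pq_encode`, and `pq_decode` are hypothetical.

```rust
/// Centroids for one subspace: `book[j]` is the j-th centroid (hypothetical alias).
pub type Codebook = Vec<Vec<f32>>;

/// Encodes `v` as one u8 code per subspace by picking the nearest centroid.
pub fn pq_encode(v: &[f32], codebooks: &[Codebook]) -> Vec<u8> {
    let m = codebooks.len();
    assert!(m > 0 && !v.is_empty() && v.len() % m == 0, "dimension must split evenly");
    let sub_dim = v.len() / m;
    v.chunks_exact(sub_dim)
        .zip(codebooks)
        .map(|(sub, book)| {
            assert!(!book.is_empty() && book.len() <= 256, "codes must fit in u8");
            let mut best = 0usize;
            let mut best_d = f32::INFINITY;
            for (j, centroid) in book.iter().enumerate() {
                // Squared Euclidean distance within the subspace.
                let d: f32 = sub.iter().zip(centroid).map(|(x, y)| (x - y) * (x - y)).sum();
                if d < best_d {
                    best_d = d;
                    best = j;
                }
            }
            best as u8
        })
        .collect()
}

/// Rebuilds an approximate vector by concatenating the selected centroids.
pub fn pq_decode(codes: &[u8], codebooks: &[Codebook]) -> Vec<f32> {
    codes
        .iter()
        .zip(codebooks)
        .flat_map(|(&code, book)| book[code as usize].iter().copied())
        .collect()
}
```

The 256-entry limit is what lets each subspace code fit in a single `u8`, as the training notes state.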
#### Binary Quantization (32x compression)

Single-bit representation based on sign.

```rust
pub struct BinaryQuantized {
    pub bits: Vec<u8>,  // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}
```

**Characteristics:**
- Compression: 32x (f32 -> 1 bit)
- Distance: Hamming distance (XOR + popcount)
- Best for: Filtering stage before exact distance on candidates
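
Sign-based packing and the XOR + popcount distance are small enough to sketch directly. The field names match the struct above; the bit order (dimension `i` goes to bit `i % 8` of byte `i / 8`) and the convention that only strictly positive values set a bit are assumptions made for this illustration.

```rust
pub struct BinaryQuantized {
    pub bits: Vec<u8>, // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}

impl BinaryQuantized {
    /// Packs one sign bit per dimension: 1 if the value is > 0.0, else 0.
    pub fn quantize(v: &[f32]) -> Self {
        let mut bits = vec![0u8; (v.len() + 7) / 8];
        for (i, &x) in v.iter().enumerate() {
            if x > 0.0 {
                bits[i / 8] |= 1 << (i % 8);
            }
        }
        Self { bits, dimensions: v.len() }
    }

    /// Hamming distance: XOR the packed bytes and count the set bits.
    pub fn hamming_distance(&self, other: &Self) -> u32 {
        assert_eq!(self.dimensions, other.dimensions, "dimension mismatch");
        self.bits
            .iter()
            .zip(&other.bits)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}
```

Unused padding bits in the last byte are zero in both operands, so they never contribute to the distance.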
#### Tiered Compression Strategy

Ruvector automatically manages compression based on access patterns:

| Access Frequency | Format | Compression | Latency |
|-----------------|--------|-------------|---------|
| Hot (>80%) | f32 | 1x | Instant |
| Warm (40-80%) | f16 | 2x | ~1us |
| Cool (10-40%) | Scalar | 4x | ~10us |
| Cold (1-10%) | Product | 8-16x | ~100us |
| Archive (<1%) | Binary | 32x | ~1ms |
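
The table reads as a threshold function from access frequency to storage format. The sketch below encodes those thresholds; it assumes the frequency is a fraction in [0, 1] and resolves the boundary values (exactly 80%, 40%, 10%, 1%) toward the less compressed neighbor except at 80%, which the table leaves unspecified. `Tier` and `from_access_frequency` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Hot,     // f32, 1x
    Warm,    // f16, 2x
    Cool,    // scalar quantization, 4x
    Cold,    // product quantization, 8-16x
    Archive, // binary quantization, 32x
}

impl Tier {
    /// Maps an access frequency (fraction in [0, 1]) to a tier.
    /// Boundary handling is an assumption; the table does not define it.
    pub fn from_access_frequency(freq: f32) -> Self {
        if freq > 0.80 {
            Tier::Hot
        } else if freq >= 0.40 {
            Tier::Warm
        } else if freq >= 0.10 {
            Tier::Cool
        } else if freq >= 0.01 {
            Tier::Cold
        } else {
            Tier::Archive
        }
    }
}
```

For a 384-dim vector the listed ratios work out to 1536 bytes as f32, 768 bytes as f16, 384 bytes with scalar quantization, and 48 bytes with binary quantization.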
### 5. Memory Management

#### Arena Allocator (`arena.rs`)

Bump allocator for batch operations, reducing allocation overhead.

#### Lock-Free Structures (`lockfree.rs`)

- Crossbeam-based concurrent data structures
- Lock-free queues for batch ingestion
- Available only with the `parallel` feature (not WASM)

#### Cache-Optimized Operations (`cache_optimized.rs`)

- Prefetching hints for sequential access
- Cache-line aligned storage
- NUMA-aware allocation on supported platforms
### 6. Storage Layer (`storage.rs`)

#### Native Storage (REDB)

- ACID transactions
- Memory-mapped vectors
- Configuration persistence
- Connection pooling for multiple VectorDB instances

```rust
const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");
```

**Security:**
- Path traversal protection
- Validates relative paths don't escape working directory
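
One way to check that a relative path stays inside a working directory is to walk its components and track depth. The sketch below is a purely lexical check under stated assumptions: `resolve_within` is a hypothetical helper, it does not touch the filesystem, and it therefore does not account for symlinks. It is not the validation code in `storage.rs`.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `user_path` onto `base`, returning None if the path is absolute
/// or if any `..` would climb above `base`. Lexical only: symlinks are
/// not resolved.
pub fn resolve_within(base: &Path, user_path: &str) -> Option<PathBuf> {
    let mut out = base.to_path_buf();
    let mut depth = 0usize; // how many components below `base` we are
    for comp in Path::new(user_path).components() {
        match comp {
            Component::Normal(part) => {
                out.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return None; // would escape the base directory
                }
                out.pop();
                depth -= 1;
            }
            // Absolute paths and drive prefixes are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}
```

A stricter variant could canonicalize the result and compare prefixes, at the cost of requiring the path to exist.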
#### Memory-Only Storage (`storage_memory.rs`)

- Pure in-memory for WASM
- No persistence
- DashMap for concurrent access

---
## Integration Points

### 1. Policy Memory Store

Ruvector serves as the backing store for AI agent policy memory:

```
+-------------------+       +-------------------+       +-------------------+
| AI Agent          |       | Policy Memory     |       | ruvector-core     |
|                   | ----> | (AgenticDB)       | ----> |                   |
| "What action for  |       | Search similar    |       | HNSW search       |
| this situation?"  |       | past situations   |       | with metadata     |
+-------------------+       +-------------------+       +-------------------+
```

**Use Cases:**
- Q-learning state-action lookups
- Contextual bandit policy retrieval
- Episodic memory for reasoning
### 2. Session State Index

Real-time session context for conversational AI:

```
+-------------------+       +-------------------+       +-------------------+
| Chat Session      |       | Session Index     |       | ruvector-core     |
|                   | ----> |                   | ----> |                   |
| Current context   |       | Find relevant     |       | Cosine similarity |
| embedding         |       | past turns        |       | top-k search      |
+-------------------+       +-------------------+       +-------------------+
```

**Requirements:**
- < 10ms latency for interactive use
- Session isolation via namespaces
- TTL-based cleanup
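
Namespace isolation and TTL cleanup can be sketched with a map keyed by session id, where each entry carries its insertion time. Everything here is hypothetical (`SessionIndex`, its methods, and the choice to pass `now` explicitly so that expiry is testable); the ADR does not describe how the session index is implemented.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-session embedding store with time-based expiry (illustrative only).
pub struct SessionIndex {
    ttl: Duration,
    // Session id (namespace) -> list of (insertion time, embedding).
    sessions: HashMap<String, Vec<(Instant, Vec<f32>)>>,
}

impl SessionIndex {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, sessions: HashMap::new() }
    }

    /// Stores an embedding under its own session namespace.
    pub fn insert(&mut self, session: &str, embedding: Vec<f32>, now: Instant) {
        self.sessions
            .entry(session.to_string())
            .or_default()
            .push((now, embedding));
    }

    /// Removes entries older than the TTL, then drops empty sessions.
    pub fn evict_expired(&mut self, now: Instant) {
        let ttl = self.ttl;
        for turns in self.sessions.values_mut() {
            turns.retain(|(inserted, _)| now.duration_since(*inserted) < ttl);
        }
        self.sessions.retain(|_, turns| !turns.is_empty());
    }

    /// Number of live entries in one session.
    pub fn len(&self, session: &str) -> usize {
        self.sessions.get(session).map_or(0, Vec::len)
    }
}
```

Keeping sessions in separate map entries means a search for one session never scans another session's turns.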
### 3. Witness Log for Audit

Cryptographically-linked audit trail:

```
+-------------------+       +-------------------+       +-------------------+
| Agent Action      |       | Witness Log       |       | ruvector-core     |
|                   | ----> |                   | ----> |                   |
| Action embedding  |       | Store with hash   |       | Append-only       |
| + metadata        |       | chain reference   |       | with timestamps   |
+-------------------+       +-------------------+       +-------------------+
```

**Properties:**
- Immutable entries
- Hash-chain linking
- Semantic searchability
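
Hash-chain linking means each entry stores the hash of its predecessor, so altering any earlier entry invalidates every later link. The sketch below shows only the chaining structure. It uses the standard library's `DefaultHasher` as a stand-in, which is NOT a cryptographic hash; a real witness log would need a cryptographic digest. `WitnessEntry`, `WitnessLog`, and `link_hash` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct WitnessEntry {
    pub prev_hash: u64, // hash of the previous entry (0 for the first)
    pub timestamp: u64,
    pub payload: String,
    pub hash: u64, // hash over (prev_hash, timestamp, payload)
}

/// Stand-in link hash. DefaultHasher is not cryptographic; it only
/// illustrates the chaining, not the security property.
fn link_hash(prev_hash: u64, timestamp: u64, payload: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    timestamp.hash(&mut hasher);
    payload.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
pub struct WitnessLog {
    pub entries: Vec<WitnessEntry>,
}

impl WitnessLog {
    /// Appends an entry linked to the current tail of the chain.
    pub fn append(&mut self, timestamp: u64, payload: &str) {
        let prev_hash = self.entries.last().map_or(0, |e| e.hash);
        let hash = link_hash(prev_hash, timestamp, payload);
        self.entries.push(WitnessEntry {
            prev_hash,
            timestamp,
            payload: payload.to_string(),
            hash,
        });
    }

    /// Recomputes every link; returns false if any entry was altered.
    pub fn verify(&self) -> bool {
        let mut prev = 0u64;
        for e in &self.entries {
            if e.prev_hash != prev || e.hash != link_hash(e.prev_hash, e.timestamp, &e.payload) {
                return false;
            }
            prev = e.hash;
        }
        true
    }
}
```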

---

## Decision Drivers

### 1. Performance (Sub-millisecond Latency)

| Requirement | Implementation |
|-------------|----------------|
| 61us p50 search | SIMD-optimized distance + HNSW |
| 16,400 QPS | Parallel search with Rayon |
| Batch ingestion | Lock-free queues + bulk insert |

### 2. Memory Efficiency (Quantization Support)

| Requirement | Implementation |
|-------------|----------------|
| 4x compression | Scalar quantization |
| 8-16x compression | Product quantization |
| 32x compression | Binary quantization |
| Automatic tiering | Access pattern tracking |

### 3. Cross-Platform Portability (WASM, Native)

| Platform | Features Available |
|----------|-------------------|
| x86_64 Linux/macOS | Full (SIMD, parallel, storage) |
| ARM64 macOS (Apple Silicon) | Full (NEON, parallel, storage) |
| WASM (browser) | Memory-only, scalar fallback |
| PostgreSQL extension | Full + SQL integration |

### 4. LLM Integration

| Requirement | Implementation |
|-------------|----------------|
| Embedding ingestion | API-based and local providers |
| Semantic search | Cosine/dot product metrics |
| RAG pipeline | Hybrid search + metadata filtering |

---
## Alternatives Considered

### Alternative 1: Pure Python Implementation (NumPy/FAISS)

**Rejected because:**
- 10-100x slower than Rust SIMD
- No WASM support
- GIL contention in concurrent workloads

### Alternative 2: C++ with Bindings

**Rejected because:**
- Memory safety concerns
- Complex cross-compilation
- Build system complexity (CMake)

### Alternative 3: Qdrant/Milvus Integration

**Rejected because:**
- External service dependency
- No WASM support
- Complex deployment for edge use cases

### Alternative 4: GPU-Only Acceleration (CUDA/ROCm)

**Rejected because:**
- Not portable to edge/mobile
- Driver dependencies
- Overkill for < 100M vectors

---
## Consequences

### Benefits

1. **Performance**: Sub-millisecond latency enables real-time AI applications
2. **Portability**: A single codebase runs natively, in WASM, and as a PostgreSQL extension
3. **Memory Efficiency**: 2-32x compression makes large datasets practical on edge devices
4. **Integration**: Native Rust means zero-cost abstractions for embedding in other systems
5. **Learning**: GNN layers can improve search quality without reindexing

### Risks and Mitigations

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| HNSW recall < 100% | High | Medium | ef_search tuning, hybrid with exact search |
| Quantization accuracy loss | Medium | Medium | Conformal prediction bounds |
| WASM performance gap | Medium | Low | Specialized WASM-optimized builds |
| API embeddings require external call | High | Low | Local embedding option via ONNX |

### Performance Targets

| Metric | Target | Achieved |
|--------|--------|----------|
| HNSW Search (k=10, 384-dim) | < 100us p50 | 61us |
| HNSW Search (k=100, 384-dim) | < 200us p50 | 164us |
| Cosine Distance (1536-dim) | < 200ns | 143ns |
| Dot Product (384-dim) | < 50ns | 33ns |
| Batch Distance (1000 vectors) | < 500us | 237us |
| QPS (10K vectors, k=10) | > 10K | 16,400 |

---
## Implementation Status

### Completed (v0.1.x)

| Module | Status | Description |
|--------|--------|-------------|
| `simd_intrinsics` | Complete | AVX2/NEON dispatch with scalar fallback |
| `distance` | Complete | All 4 metrics with SimSIMD integration |
| `index/hnsw` | Complete | Full HNSW with serialization |
| `index/flat` | Complete | Linear scan baseline |
| `quantization` | Complete | Scalar, Product, Binary |
| `storage` | Complete | REDB-based with connection pooling |
| `storage_memory` | Complete | In-memory for WASM |
| `types` | Complete | Core types with serde |
| `error` | Complete | Error types with thiserror |
| `vector_db` | Complete | High-level API |
| `agenticdb` | Complete | AI agent memory interface |

### Advanced Features

| Module | Status | Description |
|--------|--------|-------------|
| `advanced_features/filtered_search` | Complete | Metadata-based filtering |
| `advanced_features/hybrid_search` | Complete | Dense + sparse (BM25) |
| `advanced_features/mmr` | Complete | Maximal Marginal Relevance |
| `advanced_features/conformal_prediction` | Complete | Uncertainty quantification |
| `advanced_features/product_quantization` | Complete | Enhanced PQ with training |

### Research Features (`advanced/`)

| Module | Status | Description |
|--------|--------|-------------|
| `hypergraph` | Experimental | Hyperedge relationships |
| `learned_index` | Experimental | Neural index structures |
| `neural_hash` | Experimental | LSH with neural tuning |
| `tda` | Experimental | Topological data analysis |

---
## Feature Flags

| Feature | Default | Description |
|---------|---------|-------------|
| `default` | Yes | simd, storage, hnsw, api-embeddings, parallel |
| `simd` | Yes | SimSIMD acceleration |
| `parallel` | Yes | Rayon parallel processing |
| `storage` | Yes | REDB file-based storage |
| `hnsw` | Yes | HNSW index support |
| `api-embeddings` | Yes | HTTP-based embedding providers |
| `memory-only` | No | Pure in-memory (WASM) |
| `real-embeddings` | No | Deprecated, use api-embeddings |

---
## Dependencies

### Core Dependencies

| Dependency | Version | Purpose |
|------------|---------|---------|
| `hnsw_rs` | workspace | HNSW implementation |
| `simsimd` | workspace | SIMD distance functions |
| `rayon` | workspace | Parallel iteration |
| `redb` | workspace | Embedded database |
| `bincode` | workspace | Binary serialization |
| `dashmap` | workspace | Concurrent hash map |
| `parking_lot` | workspace | Optimized locks |

### Optional Dependencies

| Dependency | Feature | Purpose |
|------------|---------|---------|
| `reqwest` | api-embeddings | HTTP client for embedding APIs |
| `memmap2` | storage | Memory-mapped files |
| `crossbeam` | parallel | Lock-free data structures |

---
## API Examples

### Basic Vector Search

```rust
use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};

// Create database
let config = HnswConfig {
    m: 32,
    ef_construction: 200,
    ef_search: 100,
    max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;

// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;

// Search
let query = vec![0.15; 384];
let results = db.search(&query, 10)?;
```

### Quantized Search

```rust
use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};

// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);

// Distance in quantized space
let distance = quantized.distance(&other_quantized);

// Reconstruct if needed
let reconstructed = quantized.reconstruct();
```

### Batch Operations

```rust
use ruvector_core::distance::batch_distances;

// Calculate distances to many vectors in parallel
let distances = batch_distances(
    &query,
    &corpus_vectors,
    DistanceMetric::Cosine,
)?;
```

---
## References

1. Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.

2. Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.

3. RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/

4. SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD

---
## Appendix A: SIMD Register Usage

### AVX2 (256-bit registers)

```
+-------+-------+-------+-------+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]     [4]     [5]     [6]     [7]

Operations per instruction:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions
```

### NEON (128-bit registers)

```
+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]

Operations per instruction:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-adds
- vaddvq_f32: Horizontal sum
```

---
## Appendix B: Memory Layout

### VectorEntry

```
+------------------+------------------+------------------+
|  id: String      |  vector: Vec<f32>|  metadata: JSON  |
|  (optional)      |  (required)      |  (optional)      |
+------------------+------------------+------------------+
```

### HNSW Graph Structure

```
Level 3:  [v0] -------- [v5]
            \          /
Level 2:  [v0] -- [v3] -- [v5] -- [v9]
            \    /    \    /    \
Level 1:  [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
            |    |    |    |    |    |    |
Level 0:  [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]
```

---
## Appendix C: Benchmark Results

### Platform: Apple M2 (ARM64 NEON)

```
HNSW Search k=10 (10K vectors, 384-dim):
  p50: 61us
  p95: 89us
  p99: 112us
  Throughput: 16,400 QPS

HNSW Search k=100 (10K vectors, 384-dim):
  p50: 164us
  p95: 203us
  p99: 245us
  Throughput: 6,100 QPS

Distance Operations (1536-dim):
  Cosine: 143ns
  Euclidean: 156ns
  Dot Product: 33ns (384-dim)

Batch Distance (1000 vectors, 384-dim):
  Parallel (Rayon): 237us
  Sequential: 890us
```

### Platform: Intel i7 (AVX2)

```
HNSW Search k=10 (10K vectors, 384-dim):
  p50: 72us
  p95: 105us
  p99: 134us
  Throughput: 13,900 QPS

Distance Operations (1536-dim):
  Cosine: 128ns
  Euclidean: 141ns
  Dot Product: 29ns (384-dim)
```

---
## Related Decisions

- **ADR-002**: RuvLLM Integration with Ruvector
- **ADR-003**: SIMD Optimization Strategy
- **ADR-004**: KV Cache Management
- **ADR-005**: WASM Runtime Integration
- **ADR-006**: Memory Management
- **ADR-007**: Security Review & Technical Debt

---
## Implementation Status (v2.1)

| Component | Status | Notes |
|-----------|--------|-------|
| HNSW Index | ✅ Implemented | M=32, ef_construct=256, 16K QPS |
| SIMD Distance | ✅ Implemented | AVX2/NEON with fallback |
| Scalar Quantization | ✅ Implemented | 8-bit with min/max scaling |
| Batch Operations | ✅ Implemented | Rayon parallel distances |
| Graph Store | ✅ Implemented | Adjacency list with metadata |
| Persistence | ✅ Implemented | Binary format with versioning |

**Security Status:** Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).

---
## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | Ruvector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added implementation status, related decisions |