# ADR-001: Ruvector Core Architecture
**Status**: Proposed
**Date**: 2026-01-18
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
**Note**: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-18 | ruv.io | Initial architecture proposal |
---
## Context
### The Vector Database Challenge
Modern AI applications require vector databases that can:
1. **Store high-dimensional embeddings** from LLMs and embedding models
2. **Search with sub-millisecond latency** for real-time inference
3. **Scale to billions of vectors** while maintaining performance
4. **Deploy anywhere** - edge devices, browsers (WASM), cloud servers
5. **Integrate seamlessly** with LLM inference pipelines
### Current State of Vector Databases
Existing solutions fall into several categories:
| Category | Examples | Limitations |
|----------|----------|-------------|
| **Cloud-only** | Pinecone | No edge deployment, vendor lock-in |
| **Heavy native** | Milvus, Qdrant | Complex deployment, high memory |
| **Python-first** | ChromaDB, FAISS | Performance overhead, no WASM |
| **Learning-capable** | None | No existing solutions learn from usage |
### The Ruvector Vision
Ruvector is designed as a **high-performance, learning-capable vector database** implemented in Rust that:
- Achieves **61us p50 latency** for k=10 search on 384-dim vectors
- Provides **2-32x memory compression** through tiered quantization
- Runs **anywhere** - native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
- **Learns from usage** via GNN layers that improve search quality over time
- Integrates with **AI agent memory systems** for policy, session state, and audit logs
---
## Decision
### Adopt a Layered, SIMD-Optimized Architecture
We implement ruvector-core as the foundational vector database engine with the following architecture:
```
+-----------------------------------------------------------------------------+
| APPLICATION LAYER |
| AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| INDEX LAYER |
| HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| QUANTIZATION LAYER |
| Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| DISTANCE LAYER |
| Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| SIMD INTRINSICS LAYER |
| AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| STORAGE LAYER |
| REDB (native) | Memory-only (WASM) | PostgreSQL Extension |
+-----------------------------------------------------------------------------+
```
---
## Key Components
### 1. SIMD Intrinsics Layer (`simd_intrinsics.rs`)
The performance foundation of ruvector, providing hardware-accelerated distance calculations.
#### Architecture Dispatch
```rust
pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            unsafe { euclidean_distance_avx2_impl(a, b) }
        } else {
            euclidean_distance_scalar(a, b)
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        unsafe { euclidean_distance_neon_impl(a, b) }
    }
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        euclidean_distance_scalar(a, b)
    }
}
```
#### Supported Operations
| Operation | AVX2 (x86_64) | NEON (ARM64) | Scalar Fallback |
|-----------|---------------|--------------|-----------------|
| Euclidean Distance | 8 floats/instruction | 4 floats/instruction | 1 float/instruction |
| Dot Product | 8 floats/instruction | 4 floats/instruction | 1 float/instruction |
| Cosine Similarity | 8 floats/instruction | 4 floats/instruction | 1 float/instruction |
| Manhattan Distance | N/A | 4 floats/instruction | 1 float/instruction |
#### Performance Characteristics
| Metric | AVX2 | NEON | Scalar |
|--------|------|------|--------|
| **512-dim Euclidean** | ~16M ops/sec | ~8M ops/sec | ~2M ops/sec |
| **1536-dim Cosine** | ~143ns | ~200ns | ~800ns |
| **384-dim Dot Product** | ~33ns | ~50ns | ~150ns |
#### Security Guarantees
- Length checks via `assert_eq!(a.len(), b.len())` prevent out-of-bounds reads on mismatched inputs
- Unaligned loads (`_mm256_loadu_ps`, `vld1q_f32`) handle arbitrary alignment
- Scalar fallback handles remainder elements after SIMD processing
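The length check and the remainder handling listed above can be illustrated without intrinsics. The following is a minimal sketch in portable Rust, not the contents of `simd_intrinsics.rs`: the function name `euclidean_distance_chunked` and the lane count of 8 are assumptions chosen to mirror one 256-bit AVX2 register.

```rust
/// Scalar Euclidean distance processed in 8-wide chunks, mirroring the
/// "wide body + scalar remainder" shape of a SIMD kernel.
/// Hypothetical helper for illustration only.
pub fn euclidean_distance_chunked(a: &[f32], b: &[f32]) -> f32 {
    // Length check up front: mismatched inputs panic instead of
    // reading past the end of the shorter slice.
    assert_eq!(a.len(), b.len(), "vectors must have equal length");

    const LANES: usize = 8; // one 256-bit register holds 8 f32 values
    let mut acc = [0.0f32; LANES];

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // The remainders hold the trailing `len % LANES` elements.
    let (a_rem, b_rem) = (a_chunks.remainder(), b_chunks.remainder());

    // Main body: one accumulator per lane, as a SIMD register would keep.
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }

    // Horizontal sum of the lanes, then the scalar tail.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in a_rem.iter().zip(b_rem) {
        let d = x - y;
        sum += d * d;
    }
    sum.sqrt()
}
```

Keeping per-lane accumulators is what lets a compiler or a hand-written kernel map the inner loop onto vector registers; the tail loop covers dimensions that are not a multiple of the lane count.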
### 2. Distance Metrics Layer (`distance.rs`)
High-level distance API with optional SimSIMD integration for additional acceleration.
#### Supported Metrics
```rust
pub enum DistanceMetric {
    Euclidean,  // L2 distance: sqrt(sum((a[i] - b[i])^2))
    Cosine,     // 1 - cosine_similarity
    DotProduct, // Negative dot product (for maximization)
    Manhattan,  // L1 distance: sum(|a[i] - b[i]|)
}
```
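To make the four definitions concrete, here is a scalar sketch that follows the formulas in the enum comments. It is not the crate's `distance` implementation: the enum is redeclared so the block stands alone, and returning `1.0` for a zero-norm input under `Cosine` is an assumption, since the ADR does not say how that case is handled.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DistanceMetric {
    Euclidean,
    Cosine,
    DotProduct,
    Manhattan,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scalar reference for the four metrics (illustrative sketch).
pub fn distance(a: &[f32], b: &[f32], metric: DistanceMetric) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    match metric {
        DistanceMetric::Euclidean => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        DistanceMetric::Cosine => {
            let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
            // Assumption: treat a zero-norm vector as maximally dissimilar.
            if denom == 0.0 {
                1.0
            } else {
                1.0 - dot(a, b) / denom
            }
        }
        // Negated so that "smaller is closer" holds for every metric.
        DistanceMetric::DotProduct => -dot(a, b),
        DistanceMetric::Manhattan => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            .sum::<f32>(),
    }
}
```

Negating the dot product means an index can always minimize the returned value, regardless of which metric was configured.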
#### Feature Flags
| Feature | Description | Use Case |
|---------|-------------|----------|
| `simd` | SimSIMD acceleration | Native builds |
| `parallel` | Rayon batch processing | Multi-core systems |
| None | Pure Rust fallback | WASM builds |
#### Batch Distance API
```rust
pub fn batch_distances(
    query: &[f32],
    vectors: &[Vec<f32>],
    metric: DistanceMetric,
) -> Result<Vec<f32>> {
    #[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
    {
        use rayon::prelude::*;
        vectors.par_iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
    // Sequential fallback for WASM...
}
```
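The batch API returns a single `Result` for the whole batch. One way a sequential path could behave is sketched below using only the standard library. The error type `DistanceError`, the helper `euclidean`, and the name `batch_distances_sequential` are all hypothetical; the crate's own `Result` alias and fallback code are not shown in this ADR.

```rust
/// Hypothetical error type standing in for the crate's own error enum.
#[derive(Debug, PartialEq)]
pub enum DistanceError {
    DimensionMismatch { expected: usize, actual: usize },
}

/// Fallible Euclidean distance used by the batch helper below.
fn euclidean(query: &[f32], v: &[f32]) -> Result<f32, DistanceError> {
    if query.len() != v.len() {
        return Err(DistanceError::DimensionMismatch {
            expected: query.len(),
            actual: v.len(),
        });
    }
    let sum: f32 = query.iter().zip(v).map(|(x, y)| (x - y) * (x - y)).sum();
    Ok(sum.sqrt())
}

/// Sequential batch: one distance per corpus vector, in corpus order.
pub fn batch_distances_sequential(
    query: &[f32],
    vectors: &[Vec<f32>],
) -> Result<Vec<f32>, DistanceError> {
    // Collecting into Result<Vec<_>, _> stops at the first Err and returns it.
    vectors.iter().map(|v| euclidean(query, v)).collect()
}
```

Collecting an iterator of `Result` values into `Result<Vec<_>, _>` is the same pattern the Rayon version relies on, so a parallel and a sequential path can share one signature.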
### 3. Index Structures (`index/`)
#### HNSW Index (`index/hnsw.rs`)
Hierarchical Navigable Small World graph for approximate nearest neighbor search.
**Configuration Parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `m` | 32 | Connections per layer (higher = better recall, more memory) |
| `ef_construction` | 200 | Build-time search depth (higher = better graph, slower build) |
| `ef_search` | 100 | Query-time search depth (higher = better recall, slower query) |
| `max_elements` | 10M | Pre-allocated capacity |
**Complexity Analysis:**
| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(log n * m * ef_construction) | O(m * log n) per vector |
| Search | O(log n * m * ef_search) | O(ef_search) |
| Delete | O(1)* | O(1) |
*Note: HNSW deletion marks vectors as removed but does not restructure the graph.
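The mark-as-removed behavior can be pictured as a tombstone set consulted at query time. This is a sketch under that assumption, not the code in `index/hnsw.rs`; the `Tombstones` type and its methods are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical tombstone set: deletion records an index and nothing else.
#[derive(Default)]
pub struct Tombstones {
    deleted: HashSet<usize>,
}

impl Tombstones {
    /// Expected O(1): the graph edges are left untouched.
    /// Returns false if the index was already marked.
    pub fn mark_deleted(&mut self, idx: usize) -> bool {
        self.deleted.insert(idx)
    }

    /// Drops tombstoned candidates, then keeps the first `k` survivors.
    /// `candidates` is assumed to be sorted by ascending distance.
    pub fn filter_top_k(&self, candidates: &[(usize, f32)], k: usize) -> Vec<(usize, f32)> {
        candidates
            .iter()
            .copied()
            .filter(|(idx, _)| !self.deleted.contains(idx))
            .take(k)
            .collect()
    }
}
```

Because removed nodes still occupy candidate slots during graph traversal, a caller using this approach would typically request more than `k` candidates before filtering.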
**Serialization:**
```rust
pub struct HnswState {
    vectors: Vec<(String, Vec<f32>)>,
    id_to_idx: Vec<(String, usize)>,
    idx_to_id: Vec<(usize, String)>,
    next_idx: usize,
    config: SerializableHnswConfig,
    dimensions: usize,
    metric: SerializableDistanceMetric,
}
```
#### Flat Index
Linear scan index for small datasets or exact search.
**Use Cases:**
- Datasets < 10K vectors
- Exact k-NN required
- Benchmarking HNSW recall
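A linear scan of this kind is short enough to sketch in full. The function below is an illustrative stand-in for the flat index, not its actual API: `flat_search` and `l2` are hypothetical names, the metric is fixed to Euclidean, and vectors whose length differs from the query are skipped by assumption.

```rust
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Exact k-NN by scoring every vector and sorting (illustrative sketch).
pub fn flat_search(
    vectors: &[(String, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = vectors
        .iter()
        // Assumption: silently skip entries with a different dimension.
        .filter(|(_, v)| v.len() == query.len())
        .map(|(id, v)| (id.clone(), l2(v, query)))
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot panic the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

Since the result is exact, the same function can serve as ground truth when measuring HNSW recall: run both searches on the same queries and count how many of the exact top-k ids the approximate search returns.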
### 4. Quantization Strategies (`quantization.rs`)
Memory compression techniques trading precision for storage efficiency.
#### Scalar Quantization (4x compression)
Quantizes f32 to u8 using min-max scaling.
```rust
pub struct ScalarQuantized {
    pub data: Vec<u8>, // Quantized values
    pub min: f32,      // Minimum for dequantization
    pub scale: f32,    // Scale factor
}
```
**Characteristics:**
- Compression: 4x (f32 -> u8)
- Distance calculation: Uses average scale for symmetric distance
- Reconstruction error: < 0.4% for typical embedding distributions
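Min-max scaling to `u8` can be sketched as follows. The struct fields match the definition above, but the method bodies are an assumption about how such a quantizer is usually written, including the guard that sets `scale` to `1.0` when all values are equal.

```rust
pub struct ScalarQuantized {
    pub data: Vec<u8>, // Quantized values
    pub min: f32,      // Minimum for dequantization
    pub scale: f32,    // Scale factor
}

impl ScalarQuantized {
    /// Maps each value linearly from [min, max] onto 0..=255.
    pub fn quantize(v: &[f32]) -> Self {
        let min = v.iter().copied().fold(f32::INFINITY, f32::min);
        let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        // Assumption: avoid division by zero for constant or empty input.
        let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
        let data = v
            .iter()
            .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
            .collect();
        Self { data, min, scale }
    }

    /// Inverse mapping; each value is off by at most about scale / 2.
    pub fn reconstruct(&self) -> Vec<f32> {
        self.data
            .iter()
            .map(|&q| self.min + q as f32 * self.scale)
            .collect()
    }
}
```

With rounding to the nearest of 256 levels, the worst-case error per component is half a step, that is `(max - min) / 510`, or roughly 0.2% of the vector's value range.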
#### Product Quantization (8-16x compression)
Divides vectors into subspaces, each quantized independently via k-means codebooks.
```rust
pub struct ProductQuantized {
    pub codes: Vec<u8>,                // One code per subspace
    pub codebooks: Vec<Vec<Vec<f32>>>, // Learned centroids
}
```
**Training:**
- K-means clustering on subspace vectors
- Codebook size typically 256 (fits in u8)
- Iterations: 10-100 for convergence
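Once codebooks are trained, encoding is a nearest-centroid lookup per subspace. The sketch below assumes already-trained codebooks laid out as `codebooks[subspace][centroid]` and leaves out the k-means step; `pq_encode`, `pq_decode`, and `sq_l2` are hypothetical names rather than the crate's API.

```rust
fn sq_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Replaces each sub-vector with the index of its nearest centroid.
pub fn pq_encode(vector: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let m = codebooks.len();
    assert!(m > 0 && !vector.is_empty() && vector.len() % m == 0);
    let sub_dim = vector.len() / m;
    vector
        .chunks_exact(sub_dim)
        .zip(codebooks)
        .map(|(sub, book)| {
            // At most 256 centroids so that the index fits in a u8.
            assert!(!book.is_empty() && book.len() <= 256);
            let (best, _) = book
                .iter()
                .enumerate()
                .map(|(i, centroid)| (i, sq_l2(sub, centroid)))
                .min_by(|a, b| a.1.total_cmp(&b.1))
                .unwrap();
            best as u8
        })
        .collect()
}

/// Approximate reconstruction: concatenate the selected centroids.
pub fn pq_decode(codes: &[u8], codebooks: &[Vec<Vec<f32>>]) -> Vec<f32> {
    codes
        .iter()
        .zip(codebooks)
        .flat_map(|(&code, book)| book[code as usize].iter().copied())
        .collect()
}
```

With one byte per subspace, the code size is the number of subspaces, so the ratio against `f32` storage is `4 * sub_dim`: sub-vectors of 2 to 4 dimensions give the 8-16x range quoted above, excluding the shared codebooks.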
#### Binary Quantization (32x compression)
Single-bit representation based on sign.
```rust
pub struct BinaryQuantized {
    pub bits: Vec<u8>, // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}
```
**Characteristics:**
- Compression: 32x (f32 -> 1 bit)
- Distance: Hamming distance (XOR + popcount)
- Best for: Filtering stage before exact distance on candidates
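Sign-based packing and XOR-plus-popcount distance fit in a few lines. This sketch reuses the struct layout above; the method names and the choice to map values greater than zero to bit 1 (so zero maps to bit 0) are assumptions.

```rust
pub struct BinaryQuantized {
    pub bits: Vec<u8>, // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}

impl BinaryQuantized {
    /// Packs one sign bit per dimension, least significant bit first.
    pub fn quantize(v: &[f32]) -> Self {
        let mut bits = vec![0u8; (v.len() + 7) / 8];
        for (i, &x) in v.iter().enumerate() {
            // Assumption: strictly positive values set the bit.
            if x > 0.0 {
                bits[i / 8] |= 1 << (i % 8);
            }
        }
        Self { bits, dimensions: v.len() }
    }

    /// Number of dimensions whose sign bits differ.
    pub fn hamming_distance(&self, other: &Self) -> u32 {
        assert_eq!(self.dimensions, other.dimensions);
        self.bits
            .iter()
            .zip(&other.bits)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}
```

Padding bits in the final byte are zero on both sides, so they never contribute to the XOR. In a two-stage search, this distance would rank a large candidate pool cheaply before exact `f32` distances are computed on the shortlist.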
#### Tiered Compression Strategy
Ruvector automatically manages compression based on access patterns:
| Access Frequency | Format | Compression | Latency |
|-----------------|--------|-------------|---------|
| Hot (>80%) | f32 | 1x | Instant |
| Warm (40-80%) | f16 | 2x | ~1us |
| Cool (10-40%) | Scalar | 4x | ~10us |
| Cold (1-10%) | Product | 8-16x | ~100us |
| Archive (<1%) | Binary | 32x | ~1ms |
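The thresholds in the table translate directly into a tier selector. The sketch below assumes that access frequency is expressed as a fraction in `[0, 1]` and that boundary values fall into the warmer band listed for them; the ADR does not define how the frequency is measured, and `Tier` and `tier_for` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tier {
    Hot,     // f32, 1x
    Warm,    // f16, 2x
    Cool,    // scalar quantization, 4x
    Cold,    // product quantization, 8-16x
    Archive, // binary quantization, 32x
}

/// Picks a storage tier from an access fraction in [0, 1].
pub fn tier_for(access_fraction: f32) -> Tier {
    match access_fraction {
        f if f > 0.80 => Tier::Hot,
        f if f >= 0.40 => Tier::Warm,
        f if f >= 0.10 => Tier::Cool,
        f if f >= 0.01 => Tier::Cold,
        // Anything below 1%, and NaN, lands in the archive tier.
        _ => Tier::Archive,
    }
}
```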
### 5. Memory Management
#### Arena Allocator (`arena.rs`)
Bump allocator for batch operations reducing allocation overhead.
#### Lock-Free Structures (`lockfree.rs`)
- Crossbeam-based concurrent data structures
- Lock-free queues for batch ingestion
- Available only with the `parallel` feature (not on WASM)
#### Cache-Optimized Operations (`cache_optimized.rs`)
- Prefetching hints for sequential access
- Cache-line aligned storage
- NUMA-aware allocation on supported platforms
### 6. Storage Layer (`storage.rs`)
#### Native Storage (REDB)
- ACID transactions
- Memory-mapped vectors
- Configuration persistence
- Connection pooling for multiple VectorDB instances
```rust
const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");
```
**Security:**
- Path traversal protection
- Validates that relative paths do not escape the working directory
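A lexical version of this check can be written with `std::path` alone. The sketch below is an assumption about the general technique, not the code in `storage.rs`: `resolve_within` is a hypothetical name, and the function inspects path components only, so it does not detect escapes through symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `relative` onto `base` only if it cannot climb above `base`.
/// Purely lexical: symlinks are not resolved.
pub fn resolve_within(base: &Path, relative: &Path) -> Option<PathBuf> {
    let mut depth: usize = 0;
    for component in relative.components() {
        match component {
            Component::Normal(_) => depth += 1,
            Component::CurDir => {}
            // Each ".." must be paid for by an earlier normal component.
            Component::ParentDir => depth = depth.checked_sub(1)?,
            // Absolute paths and drive prefixes are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(base.join(relative))
}
```

Tracking depth rather than string-matching on `..` also rejects inputs such as `a/../../b`, which contain a valid-looking prefix before escaping.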
#### Memory-Only Storage (`storage_memory.rs`)
- Pure in-memory for WASM
- No persistence
- DashMap for concurrent access
---
## Integration Points
### 1. Policy Memory Store
Ruvector serves as the backing store for AI agent policy memory:
```
+-------------------+ +-------------------+ +-------------------+
| AI Agent | | Policy Memory | | ruvector-core |
| | ----> | (AgenticDB) | ----> | |
| "What action for | | Search similar | | HNSW search |
| this situation?" | | past situations | | with metadata |
+-------------------+ +-------------------+ +-------------------+
```
**Use Cases:**
- Q-learning state-action lookups
- Contextual bandit policy retrieval
- Episodic memory for reasoning
### 2. Session State Index
Real-time session context for conversational AI:
```
+-------------------+ +-------------------+ +-------------------+
| Chat Session | | Session Index | | ruvector-core |
| | ----> | | ----> | |
| Current context | | Find relevant | | Cosine similarity |
| embedding | | past turns | | top-k search |
+-------------------+ +-------------------+ +-------------------+
```
**Requirements:**
- < 10ms latency for interactive use
- Session isolation via namespaces
- TTL-based cleanup
### 3. Witness Log for Audit
Cryptographically-linked audit trail:
```
+-------------------+ +-------------------+ +-------------------+
| Agent Action | | Witness Log | | ruvector-core |
| | ----> | | ----> | |
| Action embedding | | Store with hash | | Append-only |
| + metadata | | chain reference | | with timestamps |
+-------------------+ +-------------------+ +-------------------+
```
**Properties:**
- Immutable entries
- Hash-chain linking
- Semantic searchability
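Hash-chain linking means each entry commits to the hash of the one before it, so editing any earlier entry invalidates everything after it. The sketch below shows only that linking structure. It uses the standard library's `DefaultHasher`, which is not a cryptographic hash, as a stand-in; a real audit log would need a cryptographic digest. `WitnessEntry`, `WitnessLog`, and `link_hash` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct WitnessEntry {
    pub payload: String,
    pub prev_hash: u64, // hash of the previous entry, 0 for the first
    pub hash: u64,      // hash over (prev_hash, payload)
}

#[derive(Default)]
pub struct WitnessLog {
    pub entries: Vec<WitnessEntry>,
}

/// Stand-in link function; NOT cryptographically secure.
fn link_hash(prev_hash: u64, payload: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    payload.hash(&mut hasher);
    hasher.finish()
}

impl WitnessLog {
    /// Appends an entry linked to the current tail and returns its hash.
    pub fn append(&mut self, payload: &str) -> u64 {
        let prev_hash = self.entries.last().map_or(0, |e| e.hash);
        let hash = link_hash(prev_hash, payload);
        self.entries.push(WitnessEntry {
            payload: payload.to_string(),
            prev_hash,
            hash,
        });
        hash
    }

    /// Recomputes every link; false if any entry was altered or reordered.
    pub fn verify(&self) -> bool {
        let mut expected_prev = 0u64;
        for entry in &self.entries {
            if entry.prev_hash != expected_prev
                || entry.hash != link_hash(expected_prev, &entry.payload)
            {
                return false;
            }
            expected_prev = entry.hash;
        }
        true
    }
}
```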
---
## Decision Drivers
### 1. Performance (Sub-millisecond Latency)
| Requirement | Implementation |
|-------------|----------------|
| 61us p50 search | SIMD-optimized distance + HNSW |
| 16,400 QPS | Parallel search with Rayon |
| Batch ingestion | Lock-free queues + bulk insert |
### 2. Memory Efficiency (Quantization Support)
| Requirement | Implementation |
|-------------|----------------|
| 4x compression | Scalar quantization |
| 8-16x compression | Product quantization |
| 32x compression | Binary quantization |
| Automatic tiering | Access pattern tracking |
### 3. Cross-Platform Portability (WASM, Native)
| Platform | Features Available |
|----------|-------------------|
| x86_64 Linux/macOS | Full (SIMD, parallel, storage) |
| ARM64 macOS (Apple Silicon) | Full (NEON, parallel, storage) |
| WASM (browser) | Memory-only, scalar fallback |
| PostgreSQL extension | Full + SQL integration |
### 4. LLM Integration
| Requirement | Implementation |
|-------------|----------------|
| Embedding ingestion | API-based and local providers |
| Semantic search | Cosine/dot product metrics |
| RAG pipeline | Hybrid search + metadata filtering |
---
## Alternatives Considered
### Alternative 1: Pure Python Implementation (NumPy/FAISS)
**Rejected because:**
- 10-100x slower than Rust SIMD
- No WASM support
- GIL contention in concurrent workloads
### Alternative 2: C++ with Bindings
**Rejected because:**
- Memory safety concerns
- Complex cross-compilation
- Build system complexity (CMake)
### Alternative 3: Qdrant/Milvus Integration
**Rejected because:**
- External service dependency
- No WASM support
- Complex deployment for edge use cases
### Alternative 4: GPU-Only Acceleration (CUDA/ROCm)
**Rejected because:**
- Not portable to edge/mobile
- Driver dependencies
- Overkill for < 100M vectors
---
## Consequences
### Benefits
1. **Performance**: Sub-millisecond latency enables real-time AI applications
2. **Portability**: Single codebase runs native, WASM, and PostgreSQL
3. **Memory Efficiency**: 2-32x compression makes large datasets practical on edge
4. **Integration**: A native Rust library embeds directly into other Rust systems, with no FFI layer or separate service
5. **Learning**: GNN layers can improve search quality without reindexing
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| HNSW recall < 100% | High | Medium | ef_search tuning, hybrid with exact search |
| Quantization accuracy loss | Medium | Medium | Conformal prediction bounds |
| WASM performance gap | Medium | Low | Specialized WASM-optimized builds |
| API embeddings require external call | High | Low | Local embedding option via ONNX |
### Performance Targets
| Metric | Target | Achieved |
|--------|--------|----------|
| HNSW Search (k=10, 384-dim) | < 100us p50 | 61us |
| HNSW Search (k=100, 384-dim) | < 200us p50 | 164us |
| Cosine Distance (1536-dim) | < 200ns | 143ns |
| Dot Product (384-dim) | < 50ns | 33ns |
| Batch Distance (1000 vectors) | < 500us | 237us |
| QPS (10K vectors, k=10) | > 10K | 16,400 |
---
## Implementation Status
### Completed (v0.1.x)
| Module | Status | Description |
|--------|--------|-------------|
| `simd_intrinsics` | Complete | AVX2/NEON dispatch with scalar fallback |
| `distance` | Complete | All 4 metrics with SimSIMD integration |
| `index/hnsw` | Complete | Full HNSW with serialization |
| `index/flat` | Complete | Linear scan baseline |
| `quantization` | Complete | Scalar, Product, Binary |
| `storage` | Complete | REDB-based with connection pooling |
| `storage_memory` | Complete | In-memory for WASM |
| `types` | Complete | Core types with serde |
| `error` | Complete | Error types with thiserror |
| `vector_db` | Complete | High-level API |
| `agenticdb` | Complete | AI agent memory interface |
### Advanced Features
| Module | Status | Description |
|--------|--------|-------------|
| `advanced_features/filtered_search` | Complete | Metadata-based filtering |
| `advanced_features/hybrid_search` | Complete | Dense + sparse (BM25) |
| `advanced_features/mmr` | Complete | Maximal Marginal Relevance |
| `advanced_features/conformal_prediction` | Complete | Uncertainty quantification |
| `advanced_features/product_quantization` | Complete | Enhanced PQ with training |
### Research Features (`advanced/`)
| Module | Status | Description |
|--------|--------|-------------|
| `hypergraph` | Experimental | Hyperedge relationships |
| `learned_index` | Experimental | Neural index structures |
| `neural_hash` | Experimental | LSH with neural tuning |
| `tda` | Experimental | Topological data analysis |
---
## Feature Flags
| Feature | Default | Description |
|---------|---------|-------------|
| `default` | Yes | simd, storage, hnsw, api-embeddings, parallel |
| `simd` | Yes | SimSIMD acceleration |
| `parallel` | Yes | Rayon parallel processing |
| `storage` | Yes | REDB file-based storage |
| `hnsw` | Yes | HNSW index support |
| `api-embeddings` | Yes | HTTP-based embedding providers |
| `memory-only` | No | Pure in-memory (WASM) |
| `real-embeddings` | No | Deprecated, use api-embeddings |
---
## Dependencies
### Core Dependencies
| Dependency | Version | Purpose |
|------------|---------|---------|
| `hnsw_rs` | workspace | HNSW implementation |
| `simsimd` | workspace | SIMD distance functions |
| `rayon` | workspace | Parallel iteration |
| `redb` | workspace | Embedded database |
| `bincode` | workspace | Binary serialization |
| `dashmap` | workspace | Concurrent hash map |
| `parking_lot` | workspace | Optimized locks |
### Optional Dependencies
| Dependency | Feature | Purpose |
|------------|---------|---------|
| `reqwest` | api-embeddings | HTTP client for embedding APIs |
| `memmap2` | storage | Memory-mapped files |
| `crossbeam` | parallel | Lock-free data structures |
---
## API Examples
### Basic Vector Search
```rust
use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};
// Create database
let config = HnswConfig {
    m: 32,
    ef_construction: 200,
    ef_search: 100,
    max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;
// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;
// Search
let query = vec![0.15; 384];
let results = db.search(&query, 10)?;
```
### Quantized Search
```rust
use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};
// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);
// Distance in quantized space
let distance = quantized.distance(&other_quantized);
// Reconstruct if needed
let reconstructed = quantized.reconstruct();
```
### Batch Operations
```rust
use ruvector_core::distance::batch_distances;
// Calculate distances to many vectors in parallel
let distances = batch_distances(
    &query,
    &corpus_vectors,
    DistanceMetric::Cosine,
)?;
```
---
## References
1. Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.
2. Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.
3. RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/
4. SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD
---
## Appendix A: SIMD Register Usage
### AVX2 (256-bit registers)
```
+-------+-------+-------+-------+-------+-------+-------+-------+
| f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 |
+-------+-------+-------+-------+-------+-------+-------+-------+
[0] [1] [2] [3] [4] [5] [6] [7]
Per instruction:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions
```
### NEON (128-bit registers)
```
+-------+-------+-------+-------+
| f32 | f32 | f32 | f32 |
+-------+-------+-------+-------+
[0] [1] [2] [3]
Per instruction:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-add
- vaddvq_f32: Horizontal sum
```
---
## Appendix B: Memory Layout
### VectorEntry
```
+------------------+------------------+------------------+
| id: String | vector: Vec<f32>| metadata: JSON |
| (optional) | (required) | (optional) |
+------------------+------------------+------------------+
```
### HNSW Graph Structure
```
Level 3: [v0] -------- [v5]
\ /
Level 2: [v0] -- [v3] -- [v5] -- [v9]
\ / \ / \
Level 1: [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
| | | | | | |
Level 0: [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]
```
---
## Appendix C: Benchmark Results
### Platform: Apple M2 (ARM64 NEON)
```
HNSW Search k=10 (10K vectors, 384-dim):
p50: 61us
p95: 89us
p99: 112us
Throughput: 16,400 QPS
HNSW Search k=100 (10K vectors, 384-dim):
p50: 164us
p95: 203us
p99: 245us
Throughput: 6,100 QPS
Distance Operations (1536-dim):
Cosine: 143ns
Euclidean: 156ns
Dot Product: 33ns (384-dim)
Batch Distance (1000 vectors, 384-dim):
Parallel (Rayon): 237us
Sequential: 890us
```
### Platform: Intel i7 (AVX2)
```
HNSW Search k=10 (10K vectors, 384-dim):
p50: 72us
p95: 105us
p99: 134us
Throughput: 13,900 QPS
Distance Operations (1536-dim):
Cosine: 128ns
Euclidean: 141ns
Dot Product: 29ns (384-dim)
```
---
## Related Decisions
- **ADR-002**: RuvLLM Integration with Ruvector
- **ADR-003**: SIMD Optimization Strategy
- **ADR-004**: KV Cache Management
- **ADR-005**: WASM Runtime Integration
- **ADR-006**: Memory Management
- **ADR-007**: Security Review & Technical Debt
---
## Implementation Status (v2.1)
| Component | Status | Notes |
|-----------|--------|-------|
| HNSW Index | ✅ Implemented | M=32, ef_construct=256, 16K QPS |
| SIMD Distance | ✅ Implemented | AVX2/NEON with fallback |
| Scalar Quantization | ✅ Implemented | 8-bit with min/max scaling |
| Batch Operations | ✅ Implemented | Rayon parallel distances |
| Graph Store | ✅ Implemented | Adjacency list with metadata |
| Persistence | ✅ Implemented | Binary format with versioning |
**Security Status:** Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | Ruvector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added implementation status, related decisions |