wifi-densepose/docs/adr/ADR-001-ruvector-core-architecture.md

# ADR-001: Ruvector Core Architecture

**Status**: Proposed
**Date**: 2026-01-18
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow

**Note**: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.

## Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-18 | ruv.io | Initial architecture proposal |

---

## Context

### The Vector Database Challenge

Modern AI applications require vector databases that can:

1. **Store high-dimensional embeddings** from LLMs and embedding models
2. **Search with sub-millisecond latency** for real-time inference
3. **Scale to billions of vectors** while maintaining performance
4. **Deploy anywhere** - edge devices, browsers (WASM), cloud servers
5. **Integrate seamlessly** with LLM inference pipelines

### Current State of Vector Databases

Existing solutions fall into several categories:

| Category | Examples | Limitations |
|----------|----------|-------------|
| **Cloud-only** | Pinecone | No edge deployment, vendor lock-in |
| **Heavy native** | Milvus, Qdrant | Complex deployment, high memory |
| **Python-first** | ChromaDB, FAISS | Performance overhead, no WASM |
| **Learning-capable** | None | No existing solutions learn from usage |

### The Ruvector Vision

Ruvector is designed as a **high-performance, learning-capable vector database** implemented in Rust that:

- Achieves **61us p50 latency** for k=10 search on 384-dim vectors
- Provides **2-32x memory compression** through tiered quantization
- Runs **anywhere** - native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
- **Learns from usage** via GNN layers that improve search quality over time
- Integrates with **AI agent memory systems** for policy, session state, and audit logs

---

## Decision

### Adopt a Layered, SIMD-Optimized Architecture

We implement ruvector-core as the foundational vector database engine with the following architecture:

```
+-----------------------------------------------------------------------------+
|                              APPLICATION LAYER                               |
|  AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server               |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              INDEX LAYER                                     |
|  HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR            |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              QUANTIZATION LAYER                              |
|  Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction        |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              DISTANCE LAYER                                  |
|  Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch               |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              SIMD INTRINSICS LAYER                           |
|  AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback       |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              STORAGE LAYER                                   |
|  REDB (native) | Memory-only (WASM) | PostgreSQL Extension                  |
+-----------------------------------------------------------------------------+
```

---

## Key Components

### 1. SIMD Intrinsics Layer (`simd_intrinsics.rs`)

The performance foundation of ruvector, providing hardware-accelerated distance calculations.

#### Architecture Dispatch

```rust
pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            unsafe { euclidean_distance_avx2_impl(a, b) }
        } else {
            euclidean_distance_scalar(a, b)
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        unsafe { euclidean_distance_neon_impl(a, b) }
    }

    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        euclidean_distance_scalar(a, b)
    }
}
```

#### Supported Operations

| Operation | AVX2 (x86_64) | NEON (ARM64) | Scalar Fallback |
|-----------|---------------|--------------|-----------------|
| Euclidean Distance | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Dot Product | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Cosine Similarity | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Manhattan Distance | N/A | 4 floats/cycle | 1 float/cycle |

#### Performance Characteristics

| Metric | AVX2 | NEON | Scalar |
|--------|------|------|--------|
| **512-dim Euclidean** | ~16M ops/sec | ~8M ops/sec | ~2M ops/sec |
| **384-dim Cosine** | ~143ns | ~200ns | ~800ns |
| **1536-dim Dot Product** | ~33ns | ~50ns | ~150ns |

#### Security Guarantees

- Bounds checking via `assert_eq!(a.len(), b.len())` prevents buffer overflows
- Unaligned loads (`_mm256_loadu_ps`, `vld1q_f32`) handle arbitrary alignment
- Scalar fallback handles remainder elements after SIMD processing

### 2. Distance Metrics Layer (`distance.rs`)

High-level distance API with optional SimSIMD integration for additional acceleration.

#### Supported Metrics

```rust
pub enum DistanceMetric {
    Euclidean,   // L2 distance: sqrt(sum((a[i] - b[i])^2))
    Cosine,      // 1 - cosine_similarity
    DotProduct,  // Negative dot product (for maximization)
    Manhattan,   // L1 distance: sum(|a[i] - b[i]|)
}
```

#### Feature Flags

| Feature | Description | Use Case |
|---------|-------------|----------|
| `simd` | SimSIMD acceleration | Native builds |
| `parallel` | Rayon batch processing | Multi-core systems |
| None | Pure Rust fallback | WASM builds |

#### Batch Distance API

```rust
pub fn batch_distances(
    query: &[f32],
    vectors: &[Vec<f32>],
    metric: DistanceMetric,
) -> Result<Vec<f32>> {
    #[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
    {
        use rayon::prelude::*;
        vectors.par_iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
    // Sequential fallback for WASM...
}
```

### 3. Index Structures (`index/`)

#### HNSW Index (`index/hnsw.rs`)

Hierarchical Navigable Small World graph for approximate nearest neighbor search.

**Configuration Parameters:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `m` | 32 | Connections per layer (higher = better recall, more memory) |
| `ef_construction` | 200 | Build-time search depth (higher = better graph, slower build) |
| `ef_search` | 100 | Query-time search depth (higher = better recall, slower query) |
| `max_elements` | 10M | Pre-allocated capacity |

**Complexity Analysis:**

| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(log n * m * ef_construction) | O(m * log n) per vector |
| Search | O(log n * m * ef_search) | O(ef_search) |
| Delete | O(1)* | O(1) |

*Note: HNSW deletion marks vectors as removed but does not restructure the graph.

**Serialization:**

```rust
pub struct HnswState {
    vectors: Vec<(String, Vec<f32>)>,
    id_to_idx: Vec<(String, usize)>,
    idx_to_id: Vec<(usize, String)>,
    next_idx: usize,
    config: SerializableHnswConfig,
    dimensions: usize,
    metric: SerializableDistanceMetric,
}
```

#### Flat Index

Linear scan index for small datasets or exact search.

**Use Cases:**
- Datasets < 10K vectors
- Exact k-NN required
- Benchmarking HNSW recall

### 4. Quantization Strategies (`quantization.rs`)

Memory compression techniques trading precision for storage efficiency.

#### Scalar Quantization (4x compression)

Quantizes f32 to u8 using min-max scaling.

```rust
pub struct ScalarQuantized {
    pub data: Vec<u8>,     // Quantized values
    pub min: f32,          // Minimum for dequantization
    pub scale: f32,        // Scale factor
}
```

**Characteristics:**
- Compression: 4x (f32 -> u8)
- Distance calculation: Uses average scale for symmetric distance
- Reconstruction error: < 0.4% for typical embedding distributions

#### Product Quantization (8-16x compression)

Divides vectors into subspaces, each quantized independently via k-means codebooks.

```rust
pub struct ProductQuantized {
    pub codes: Vec<u8>,                    // One code per subspace
    pub codebooks: Vec<Vec<Vec<f32>>>,     // Learned centroids
}
```

**Training:**
- K-means clustering on subspace vectors
- Codebook size typically 256 (fits in u8)
- Iterations: 10-100 for convergence

#### Binary Quantization (32x compression)

Single-bit representation based on sign.

```rust
pub struct BinaryQuantized {
    pub bits: Vec<u8>,      // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}
```

**Characteristics:**
- Compression: 32x (f32 -> 1 bit)
- Distance: Hamming distance (XOR + popcount)
- Best for: Filtering stage before exact distance on candidates

#### Tiered Compression Strategy

Ruvector automatically manages compression based on access patterns:

| Access Frequency | Format | Compression | Latency |
|-----------------|--------|-------------|---------|
| Hot (>80%) | f32 | 1x | Instant |
| Warm (40-80%) | f16 | 2x | ~1us |
| Cool (10-40%) | Scalar | 4x | ~10us |
| Cold (1-10%) | Product | 8-16x | ~100us |
| Archive (<1%) | Binary | 32x | ~1ms |

### 5. Memory Management

#### Arena Allocator (`arena.rs`)

Bump allocator for batch operations reducing allocation overhead.

#### Lock-Free Structures (`lockfree.rs`)

- Crossbeam-based concurrent data structures
- Lock-free queues for batch ingestion
- Available only on `parallel` feature (not WASM)

#### Cache-Optimized Operations (`cache_optimized.rs`)

- Prefetching hints for sequential access
- Cache-line aligned storage
- NUMA-aware allocation on supported platforms

### 6. Storage Layer (`storage.rs`)

#### Native Storage (REDB)

- ACID transactions
- Memory-mapped vectors
- Configuration persistence
- Connection pooling for multiple VectorDB instances

```rust
const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");
```

**Security:**
- Path traversal protection
- Validates relative paths don't escape working directory

#### Memory-Only Storage (`storage_memory.rs`)

- Pure in-memory for WASM
- No persistence
- DashMap for concurrent access

---

## Integration Points

### 1. Policy Memory Store

Ruvector serves as the backing store for AI agent policy memory:

```
+-------------------+       +-------------------+       +-------------------+
|   AI Agent        |       |   Policy Memory   |       |   ruvector-core   |
|                   | ----> |   (AgenticDB)     | ----> |                   |
| "What action for  |       | Search similar    |       | HNSW search       |
|  this situation?" |       | past situations   |       | with metadata     |
+-------------------+       +-------------------+       +-------------------+
```

**Use Cases:**
- Q-learning state-action lookups
- Contextual bandit policy retrieval
- Episodic memory for reasoning

### 2. Session State Index

Real-time session context for conversational AI:

```
+-------------------+       +-------------------+       +-------------------+
|   Chat Session    |       |   Session Index   |       |   ruvector-core   |
|                   | ----> |                   | ----> |                   |
| Current context   |       | Find relevant     |       | Cosine similarity |
| embedding         |       | past turns        |       | top-k search      |
+-------------------+       +-------------------+       +-------------------+
```

**Requirements:**
- < 10ms latency for interactive use
- Session isolation via namespaces
- TTL-based cleanup

### 3. Witness Log for Audit

Cryptographically-linked audit trail:

```
+-------------------+       +-------------------+       +-------------------+
|   Agent Action    |       |   Witness Log     |       |   ruvector-core   |
|                   | ----> |                   | ----> |                   |
| Action embedding  |       | Store with hash   |       | Append-only       |
| + metadata        |       | chain reference   |       | with timestamps   |
+-------------------+       +-------------------+       +-------------------+
```

**Properties:**
- Immutable entries
- Hash-chain linking
- Semantic searchability

---

## Decision Drivers

### 1. Performance (Sub-millisecond Latency)

| Requirement | Implementation |
|-------------|----------------|
| 61us p50 search | SIMD-optimized distance + HNSW |
| 16,400 QPS | Parallel search with Rayon |
| Batch ingestion | Lock-free queues + bulk insert |

### 2. Memory Efficiency (Quantization Support)

| Requirement | Implementation |
|-------------|----------------|
| 4x compression | Scalar quantization |
| 8-16x compression | Product quantization |
| 32x compression | Binary quantization |
| Automatic tiering | Access pattern tracking |

### 3. Cross-Platform Portability (WASM, Native)

| Platform | Features Available |
|----------|-------------------|
| x86_64 Linux/macOS | Full (SIMD, parallel, storage) |
| ARM64 macOS (Apple Silicon) | Full (NEON, parallel, storage) |
| WASM (browser) | Memory-only, scalar fallback |
| PostgreSQL extension | Full + SQL integration |

### 4. LLM Integration

| Requirement | Implementation |
|-------------|----------------|
| Embedding ingestion | API-based and local providers |
| Semantic search | Cosine/dot product metrics |
| RAG pipeline | Hybrid search + metadata filtering |

---

## Alternatives Considered

### Alternative 1: Pure Python Implementation (NumPy/FAISS)

**Rejected because:**
- 10-100x slower than Rust SIMD
- No WASM support
- GIL contention in concurrent workloads

### Alternative 2: C++ with Bindings

**Rejected because:**
- Memory safety concerns
- Complex cross-compilation
- Build system complexity (CMake)

### Alternative 3: Qdrant/Milvus Integration

**Rejected because:**
- External service dependency
- No WASM support
- Complex deployment for edge use cases

### Alternative 4: GPU-Only Acceleration (CUDA/ROCm)

**Rejected because:**
- Not portable to edge/mobile
- Driver dependencies
- Overkill for < 100M vectors

---

## Consequences

### Benefits

1. **Performance**: Sub-millisecond latency enables real-time AI applications
2. **Portability**: Single codebase runs native, WASM, and PostgreSQL
3. **Memory Efficiency**: 2-32x compression makes large datasets practical on edge
4. **Integration**: Native Rust means zero-cost abstractions for embedding in other systems
5. **Learning**: GNN layers can improve search quality without reindexing

### Risks and Mitigations

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| HNSW recall < 100% | High | Medium | ef_search tuning, hybrid with exact search |
| Quantization accuracy loss | Medium | Medium | Conformal prediction bounds |
| WASM performance gap | Medium | Low | Specialized WASM-optimized builds |
| API embeddings require external call | High | Low | Local embedding option via ONNX |

### Performance Targets

| Metric | Target | Achieved |
|--------|--------|----------|
| HNSW Search (k=10, 384-dim) | < 100us p50 | 61us |
| HNSW Search (k=100, 384-dim) | < 200us p50 | 164us |
| Cosine Distance (1536-dim) | < 200ns | 143ns |
| Dot Product (384-dim) | < 50ns | 33ns |
| Batch Distance (1000 vectors) | < 500us | 237us |
| QPS (10K vectors, k=10) | > 10K | 16,400 |

---

## Implementation Status

### Completed (v0.1.x)

| Module | Status | Description |
|--------|--------|-------------|
| `simd_intrinsics` | Complete | AVX2/NEON dispatch with scalar fallback |
| `distance` | Complete | All 4 metrics with SimSIMD integration |
| `index/hnsw` | Complete | Full HNSW with serialization |
| `index/flat` | Complete | Linear scan baseline |
| `quantization` | Complete | Scalar, Product, Binary |
| `storage` | Complete | REDB-based with connection pooling |
| `storage_memory` | Complete | In-memory for WASM |
| `types` | Complete | Core types with serde |
| `error` | Complete | Error types with thiserror |
| `vector_db` | Complete | High-level API |
| `agenticdb` | Complete | AI agent memory interface |

### Advanced Features

| Module | Status | Description |
|--------|--------|-------------|
| `advanced_features/filtered_search` | Complete | Metadata-based filtering |
| `advanced_features/hybrid_search` | Complete | Dense + sparse (BM25) |
| `advanced_features/mmr` | Complete | Maximal Marginal Relevance |
| `advanced_features/conformal_prediction` | Complete | Uncertainty quantification |
| `advanced_features/product_quantization` | Complete | Enhanced PQ with training |

### Research Features (`advanced/`)

| Module | Status | Description |
|--------|--------|-------------|
| `hypergraph` | Experimental | Hyperedge relationships |
| `learned_index` | Experimental | Neural index structures |
| `neural_hash` | Experimental | LSH with neural tuning |
| `tda` | Experimental | Topological data analysis |

---

## Feature Flags

| Feature | Default | Description |
|---------|---------|-------------|
| `default` | Yes | simd, storage, hnsw, api-embeddings, parallel |
| `simd` | Yes | SimSIMD acceleration |
| `parallel` | Yes | Rayon parallel processing |
| `storage` | Yes | REDB file-based storage |
| `hnsw` | Yes | HNSW index support |
| `api-embeddings` | Yes | HTTP-based embedding providers |
| `memory-only` | No | Pure in-memory (WASM) |
| `real-embeddings` | No | Deprecated, use api-embeddings |

---

## Dependencies

### Core Dependencies

| Dependency | Version | Purpose |
|------------|---------|---------|
| `hnsw_rs` | workspace | HNSW implementation |
| `simsimd` | workspace | SIMD distance functions |
| `rayon` | workspace | Parallel iteration |
| `redb` | workspace | Embedded database |
| `bincode` | workspace | Binary serialization |
| `dashmap` | workspace | Concurrent hash map |
| `parking_lot` | workspace | Optimized locks |

### Optional Dependencies

| Dependency | Feature | Purpose |
|------------|---------|---------|
| `reqwest` | api-embeddings | HTTP client for embedding APIs |
| `memmap2` | storage | Memory-mapped files |
| `crossbeam` | parallel | Lock-free data structures |

---

## API Examples

### Basic Vector Search

```rust
use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};

// Create database
let config = HnswConfig {
    m: 32,
    ef_construction: 200,
    ef_search: 100,
    max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;

// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;

// Search
let query = vec![0.15; 384];
let results = db.search(&query, 10)?;
```

### Quantized Search

```rust
use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};

// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);

// Distance in quantized space
let distance = quantized.distance(&other_quantized);

// Reconstruct if needed
let reconstructed = quantized.reconstruct();
```

### Batch Operations

```rust
use ruvector_core::distance::batch_distances;

// Calculate distances to many vectors in parallel
let distances = batch_distances(
    &query,
    &corpus_vectors,
    DistanceMetric::Cosine,
)?;
```

---

## References

1. Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.

2. Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.

3. RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/

4. SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD

---

## Appendix A: SIMD Register Usage

### AVX2 (256-bit registers)

```
+-------+-------+-------+-------+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]     [4]     [5]     [6]     [7]

Operations per cycle:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions
```

### NEON (128-bit registers)

```
+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]

Operations per cycle:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-add
- vaddvq_f32: Horizontal sum
```

---

## Appendix B: Memory Layout

### VectorEntry

```
+------------------+------------------+------------------+
|     id: String   |  vector: Vec<f32>|  metadata: JSON  |
|     (optional)   |  (required)      |  (optional)      |
+------------------+------------------+------------------+
```

### HNSW Graph Structure

```
Level 3:  [v0] -------- [v5]
            \            /
Level 2:  [v0] -- [v3] -- [v5] -- [v9]
            \    /    \    /    \
Level 1:  [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
            |    |    |    |    |    |    |
Level 0:  [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]
```

---

## Appendix C: Benchmark Results

### Platform: Apple M2 (ARM64 NEON)

```
HNSW Search k=10 (10K vectors, 384-dim):
  p50: 61us
  p95: 89us
  p99: 112us
  Throughput: 16,400 QPS

HNSW Search k=100 (10K vectors, 384-dim):
  p50: 164us
  p95: 203us
  p99: 245us
  Throughput: 6,100 QPS

Distance Operations (1536-dim):
  Cosine: 143ns
  Euclidean: 156ns
  Dot Product: 33ns (384-dim)

Batch Distance (1000 vectors, 384-dim):
  Parallel (Rayon): 237us
  Sequential: 890us
```

### Platform: Intel i7 (AVX2)

```
HNSW Search k=10 (10K vectors, 384-dim):
  p50: 72us
  p95: 105us
  p99: 134us
  Throughput: 13,900 QPS

Distance Operations (1536-dim):
  Cosine: 128ns
  Euclidean: 141ns
  Dot Product: 29ns (384-dim)
```

---

## Related Decisions

- **ADR-002**: RuvLLM Integration with Ruvector
- **ADR-003**: SIMD Optimization Strategy
- **ADR-004**: KV Cache Management
- **ADR-005**: WASM Runtime Integration
- **ADR-006**: Memory Management
- **ADR-007**: Security Review & Technical Debt

---

## Implementation Status (v2.1)

| Component | Status | Notes |
|-----------|--------|-------|
| HNSW Index | ✅ Implemented | M=32, ef_construct=256, 16K QPS |
| SIMD Distance | ✅ Implemented | AVX2/NEON with fallback |
| Scalar Quantization | ✅ Implemented | 8-bit with min/max scaling |
| Batch Operations | ✅ Implemented | Rayon parallel distances |
| Graph Store | ✅ Implemented | Adjacency list with metadata |
| Persistence | ✅ Implemented | Binary format with versioning |

**Security Status:** Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | Ruvector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added implementation status, related decisions |