Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
# Ruvector System Architecture Overview

## Introduction

Ruvector is a high-performance vector database built in Rust, designed to deliver 10-100x performance improvements over Python/TypeScript implementations while maintaining full AgenticDB API compatibility.

## Architecture Principles

### 1. **Performance First**
- Zero-cost abstractions via Rust
- SIMD-optimized distance calculations
- Lock-free concurrent data structures
- Memory-mapped I/O for instant loading

### 2. **Multi-Platform**
- Single codebase deploys everywhere
- Rust native, Node.js via NAPI-RS, Browser via WASM
- CLI for standalone operation

### 3. **Production Ready**
- Memory safety without garbage collection
- ACID transactions via redb
- Crash recovery and data durability
- Extensive test coverage

### 4. **Extensible**
- Trait-based abstractions
- Pluggable distance metrics and indexes
- Advanced features as opt-in modules

## System Layers

```
┌─────────────────────────────────────────────────────────────────┐
│                        Application Layer                        │
│      (AgenticDB API, VectorDB API, CLI Commands, MCP Tools)     │
└─────────────────────────────────────────────────────────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Query Engine                           │
│  • Parallel search (rayon)                                      │
│  • SIMD distance calculations (SimSIMD)                         │
│  • Filtered search (pre/post)                                   │
│  • Hybrid search (vector + BM25)                                │
│  • MMR diversity                                                │
└─────────────────────────────────────────────────────────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                           Index Layer                           │
│  • HNSW (hnsw_rs): O(log n) approximate search                  │
│  • Flat index: Brute force for small datasets                   │
│  • Quantized indexes: Compressed search                         │
└─────────────────────────────────────────────────────────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Storage Layer                          │
│  • Vector storage: memmap2 (zero-copy)                          │
│  • Metadata: redb (ACID transactions)                           │
│  • Index persistence: rkyv (zero-copy serialization)            │
│  • AgenticDB tables: Specialized storage                        │
└─────────────────────────────────────────────────────────────────┘
```

## Core Components

### 1. Storage Layer

**Purpose**: Persist vectors and metadata with ACID guarantees and instant loading.

**Technologies**:
- **redb**: LMDB-inspired embedded database for metadata
  - ACID transactions
  - Crash recovery
  - Zero-copy reads
  - Pure Rust (no C dependencies)

- **memmap2**: Memory-mapped vector storage
  - Zero-copy access
  - OS-managed caching
  - Instant loading (no deserialization)
  - Supports datasets larger than RAM

- **rkyv**: Zero-copy serialization for index persistence
  - Direct pointer access to serialized data
  - No deserialization overhead
  - Sub-second loading for billion-scale indexes

**Data Layout**:
```
vectors.db/
├── metadata.redb      # redb database (vector IDs, metadata, config)
├── vectors.bin        # Memory-mapped vectors (aligned f32 arrays)
├── index.rkyv         # Serialized HNSW graph
└── agenticdb/         # AgenticDB specialized tables
    ├── reflexion.redb
    ├── skills.redb
    ├── causal.redb
    └── learning.redb
```

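The zero-copy claim above comes down to layout arithmetic: with a fixed stride, locating vector *i* in `vectors.bin` is a multiplication, not a parse. A minimal sketch, assuming a hypothetical header-less, little-endian, row-major layout (the real file format may differ):

```rust
// Sketch of a fixed-stride layout like vectors.bin: vector i lives at byte
// offset i * dim * 4, so lookup is offset arithmetic, not deserialization.
// The header-less little-endian layout here is an assumption for illustration.
fn read_vector(buf: &[u8], i: usize, dim: usize) -> Vec<f32> {
    let start = i * dim * 4;
    buf[start..start + dim * 4]
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

fn main() {
    let dim = 4;
    // Serialize two vectors the way a memory-mapped file would store them.
    let vectors = vec![vec![1.0f32, 2.0, 3.0, 4.0], vec![5.0, 6.0, 7.0, 8.0]];
    let mut buf = Vec::new();
    for v in &vectors {
        for x in v {
            buf.extend_from_slice(&x.to_le_bytes());
        }
    }
    // "Loading" vector 1 is just slicing at the computed offset.
    assert_eq!(read_vector(&buf, 1, dim), vectors[1]);
}
```

With a memory-mapped file, the same slice arithmetic applies directly to the mapped bytes, which is why loading is effectively instant.
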
### 2. Index Layer

**Purpose**: Fast approximate nearest neighbor (ANN) search.

**Primary: HNSW (Hierarchical Navigable Small World)**
- **Complexity**: O(log n) search, O(n log n) build
- **Recall**: 95%+ with proper tuning
- **Memory**: ~640 bytes per vector (M=32, 128D vectors)
- **Parameters**:
  - `m`: Connections per node (16-64)
  - `ef_construction`: Build quality (100-400)
  - `ef_search`: Query-time quality (50-500)

**Implementation**: Uses the `hnsw_rs` crate with custom optimizations:
- Parallel construction via rayon
- SIMD distance calculations
- Lock-free concurrent search
- Custom quantization integration

**Alternative: Flat Index**
- Brute-force exact search
- Optimal for < 10K vectors
- 100% recall
- Simple fallback when HNSW overhead is not justified

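The flat index described above is small enough to sketch in full: score every vector against the query, sort, truncate. A stdlib-only sketch using squared-L2 distance (function names are illustrative, not Ruvector's API):

```rust
// Squared Euclidean distance; the square root is monotone, so it can be
// skipped when only the ranking matters.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Brute-force exact k-NN: O(n) scan, 100% recall, the fallback for < 10K vectors.
fn flat_search(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, l2_sq(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let vectors = vec![vec![0.0, 0.0], vec![1.0, 1.0], vec![5.0, 5.0]];
    let top = flat_search(&vectors, &[0.9, 0.9], 2);
    assert_eq!(top[0].0, 1); // nearest is (1, 1)
    assert_eq!(top[1].0, 0);
}
```
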
### 3. Query Engine

**Purpose**: Execute searches efficiently with various strategies.

**Components**:

a) **Distance Calculation**
- **SimSIMD**: Production-ready SIMD kernels
  - L2 (Euclidean)
  - Cosine similarity
  - Dot product
  - Manhattan (L1)
- **Speedup**: 4-16x vs scalar implementations
- **Architecture support**: AVX2, AVX-512, ARM NEON/SVE

b) **Parallel Execution**
- **rayon**: Data parallelism for CPU-bound operations
  - Batch inserts
  - Parallel queries
  - Index construction
- **Scaling**: Near-linear to CPU core count

c) **Advanced Search Strategies**
- **Filtered Search**: Metadata-based constraints
  - Pre-filtering: Apply before graph traversal
  - Post-filtering: Apply after retrieval
- **Hybrid Search**: Vector + keyword (BM25)
- **MMR**: Maximal Marginal Relevance for diversity

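The MMR strategy above can be sketched as a greedy loop that trades query relevance against redundancy with already-selected results. A stdlib-only sketch using dot-product similarity (assumes unit-normalized vectors; this is the standard MMR formulation, not Ruvector's actual code):

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Greedy MMR: each step picks the candidate maximizing
// lambda * sim(query, c) - (1 - lambda) * max sim(c, already_selected).
// lambda = 1.0 is pure relevance; lower values trade relevance for diversity.
fn mmr(candidates: &[Vec<f32>], query: &[f32], k: usize, lambda: f32) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    while selected.len() < k.min(candidates.len()) {
        let mut best = (0usize, f32::NEG_INFINITY);
        for (i, c) in candidates.iter().enumerate() {
            if selected.contains(&i) {
                continue;
            }
            let redundancy = selected
                .iter()
                .map(|&s| dot(c, &candidates[s]))
                .fold(f32::NEG_INFINITY, f32::max)
                .max(0.0); // no penalty before anything is selected
            let score = lambda * dot(c, query) - (1.0 - lambda) * redundancy;
            if score > best.1 {
                best = (i, score);
            }
        }
        selected.push(best.0);
    }
    selected
}

fn main() {
    // Candidates 0 and 1 are duplicates; MMR skips the duplicate for diversity.
    let cands = vec![vec![1.0, 0.0], vec![1.0, 0.0], vec![0.0, 1.0]];
    let picks = mmr(&cands, &[0.8, 0.6], 2, 0.5);
    assert_eq!(picks, vec![0, 2]);
}
```
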
### 4. Application Layer

**Purpose**: Provide user-facing APIs across platforms.

**APIs**:

a) **Core VectorDB API**
```rust
pub trait VectorDB {
    fn insert(&self, entry: VectorEntry) -> Result<VectorId>;
    fn insert_batch(&self, entries: Vec<VectorEntry>) -> Result<Vec<VectorId>>;
    fn search(&self, query: &SearchQuery) -> Result<Vec<SearchResult>>;
    fn delete(&self, id: &VectorId) -> Result<()>;
}
```

b) **AgenticDB API** (5-table schema)
- `vectors_table`: Core embeddings
- `reflexion_episodes`: Self-critique memory
- `skills_library`: Consolidated patterns
- `causal_edges`: Cause-effect hypergraphs
- `learning_sessions`: RL training data

c) **Platform Bindings**
- **Rust**: Native library
- **Node.js**: NAPI-RS bindings with TypeScript definitions
- **WASM**: wasm-bindgen for browser
- **CLI**: clap-based command-line interface
- **MCP**: Model Context Protocol tools

## Data Flow

### Insert Operation

```
Application
    ↓ insert(vector, metadata)
VectorDB
    ↓ assign ID
    ↓ store metadata → redb
    ↓ append vector → memmap
    ↓ add to index → HNSW
    ↓ [optional] quantize
    ↓ persist index → rkyv
    ↓
Return ID
```

**Optimizations**:
- Batch inserts amortize transaction overhead
- Parallel index updates
- Lazy quantization (on first search if enabled)

### Search Operation

```
Application
    ↓ search(query, k, filters)
VectorDB
    ↓ [optional] apply pre-filters
    ↓ normalize query (if cosine)
Query Engine
    ↓ HNSW graph traversal
    ↓   ├─ Start at entry point
    ↓   ├─ Greedy search per layer
    ↓   └─ Refine at bottom layer
    ↓ SIMD distance calculations
    ↓ [optional] apply post-filters
    ↓ [optional] re-rank with full precision
    ↓ top-k selection
    ↓
Return results
```

**Optimizations**:
- Quantized search for initial retrieval
- Full-precision re-ranking
- SIMD vectorization
- Lock-free graph reads

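The final top-k selection step above is typically a bounded max-heap rather than a full sort, so each candidate costs O(log k). A sketch of that pattern with the standard library (not Ruvector's actual implementation):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Wrapper so f32 distances can live in a BinaryHeap (f32 is not Ord).
#[derive(PartialEq)]
struct Candidate {
    dist: f32,
    id: usize,
}
impl Eq for Candidate {}
impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        self.dist.partial_cmp(&other.dist).unwrap_or(Ordering::Equal)
    }
}
impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Keep a max-heap of the k smallest distances: push each candidate,
// evict the current worst whenever the heap exceeds k.
fn top_k(distances: impl Iterator<Item = (usize, f32)>, k: usize) -> Vec<(usize, f32)> {
    let mut heap: BinaryHeap<Candidate> = BinaryHeap::new();
    for (id, dist) in distances {
        heap.push(Candidate { dist, id });
        if heap.len() > k {
            heap.pop(); // drop the largest distance
        }
    }
    let mut out: Vec<_> = heap.into_iter().map(|c| (c.id, c.dist)).collect();
    out.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    out
}

fn main() {
    let dists = vec![(0, 3.0), (1, 0.5), (2, 2.0), (3, 0.9)];
    assert_eq!(top_k(dists.into_iter(), 2), vec![(1, 0.5), (3, 0.9)]);
}
```
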
## Performance Characteristics

### Time Complexity

| Operation | Complexity | Notes |
|-----------|------------|-------|
| Insert (HNSW) | O(log n) | Amortized per insertion |
| Batch insert | O(n log n) | Parallelized across cores |
| Search (HNSW) | O(log n) | With 95% recall |
| Search (Flat) | O(n) | Exact search |
| Delete | O(log n) | Mark deleted in HNSW |

### Space Complexity

| Component | Memory per vector | Notes |
|-----------|-------------------|-------|
| Full precision (128D) | 512 bytes | 128 × 4 bytes |
| HNSW graph (M=32) | ~640 bytes | M × 2 layers × 10 bytes/edge |
| Scalar quantization | 128 bytes | 4x compression |
| Product quantization | 16 bytes | 32x compression (16 subspaces) |
| Metadata | Variable | Stored in redb |

**Total for 1M vectors (128D, HNSW M=32, scalar quant)**:
- Vectors: 128 MB (quantized)
- HNSW: 640 MB
- Metadata: ~50 MB
- **Total**: ~818 MB vs ~1.2 GB uncompressed

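The totals above follow from the per-vector figures with back-of-envelope arithmetic (using the document's own estimates and decimal megabytes):

```rust
// Memory budget for 1M vectors (128D, HNSW M=32, scalar int8 quantization),
// using the per-vector estimates from the table above.
fn main() {
    let n: u64 = 1_000_000;
    let mb: u64 = 1_000_000;

    let quantized = n * 128 / mb; // int8: 1 byte/dim × 128 dims = 128 MB
    let graph = n * 640 / mb;     // HNSW: ~640 bytes/vector at M=32 = 640 MB
    let metadata = 50;            // ~50 MB estimate
    assert_eq!(quantized + graph + metadata, 818); // ~818 MB total

    let full = n * 128 * 4 / mb; // f32: 4 bytes/dim = 512 MB
    assert_eq!(full + graph + metadata, 1202); // ~1.2 GB uncompressed
}
```
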
### Latency Characteristics

**1M vectors, 128D, HNSW (M=32, ef_search=100)**:
- p50: 0.8ms
- p95: 2.1ms
- p99: 4.5ms

**Factors affecting latency**:
- Vector dimensionality (linear impact)
- Dataset size (logarithmic impact with HNSW)
- HNSW ef_search parameter (linear impact)
- Quantization (0.8-1.2x the full-precision latency, but cache-friendly)
- SIMD availability (4-16x speedup)

## Concurrency Model

### Read Operations
- **Concurrent**: Multiple searches execute in parallel
- **Mechanism**: Arc<RwLock<T>> with shared read locks
- **Scalability**: Linear with CPU cores

### Write Operations
- **Exclusive lock**: Single writer at a time
- **Mechanism**: RwLock write lock
- **Batch optimization**: Amortize lock overhead

### Mixed Workloads
- Readers don't block readers
- Writers block all operations
- Read-heavy workloads scale well (the typical pattern for a vector DB)

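The reader/writer split above maps directly onto the standard library; the index type here is a stand-in for illustration:

```rust
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    // Stand-in for a shared index: 100 four-dimensional vectors.
    let index = Arc::new(RwLock::new(vec![vec![0.0f32; 4]; 100]));

    // Readers acquire shared locks and do not block each other.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let index = Arc::clone(&index);
            thread::spawn(move || index.read().unwrap().len())
        })
        .collect();
    for r in readers {
        assert_eq!(r.join().unwrap(), 100);
    }

    // A writer takes the lock exclusively; readers wait until it is released.
    index.write().unwrap().push(vec![1.0; 4]);
    assert_eq!(index.read().unwrap().len(), 101);
}
```

Batching writes under one `write()` acquisition is what the "amortize lock overhead" point refers to: one exclusive section instead of one per insert.
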
## Memory Management

### Zero-Copy Patterns
1. **Memory-mapped vectors**: OS manages paging
2. **rkyv serialization**: Direct pointer access
3. **NAPI-RS buffers**: Share TypedArrays with Node.js
4. **WASM memory**: Direct ArrayBuffer access

### Memory Safety
- Rust's ownership system prevents:
  - Use-after-free
  - Double-free
  - Data races
  - Buffer overflows
- No garbage collection overhead

### Resource Limits
- **Max vectors**: Configurable (default 10M)
- **Max dimensions**: Theoretically unlimited (practical limit ~4096)
- **Memory-mapped limit**: OS-dependent (typically 128TB on 64-bit)

## Extensibility Points

### 1. Distance Metrics
```rust
pub trait DistanceMetric: Send + Sync {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32;
    fn batch_distance(&self, a: &[f32], batch: &[&[f32]]) -> Vec<f32>;
}
```

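For illustration, a hypothetical metric plugged into this extension point; the trait is repeated so the sketch is self-contained:

```rust
pub trait DistanceMetric: Send + Sync {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32;
    fn batch_distance(&self, a: &[f32], batch: &[&[f32]]) -> Vec<f32>;
}

// Example plug-in metric: squared Euclidean distance (illustrative name).
struct SquaredEuclidean;

impl DistanceMetric for SquaredEuclidean {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    }
    // Naive batch form; a SIMD-backed metric would override this with a kernel.
    fn batch_distance(&self, a: &[f32], batch: &[&[f32]]) -> Vec<f32> {
        batch.iter().map(|b| self.distance(a, b)).collect()
    }
}

fn main() {
    let m = SquaredEuclidean;
    assert_eq!(m.distance(&[0.0, 0.0], &[3.0, 4.0]), 25.0);
    assert_eq!(m.batch_distance(&[0.0], &[&[1.0][..], &[2.0][..]]), vec![1.0, 4.0]);
}
```
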
### 2. Index Structures
```rust
pub trait IndexStructure: Send + Sync {
    fn insert(&mut self, id: VectorId, vector: &[f32]) -> Result<()>;
    fn search(&self, query: &[f32], k: usize) -> Result<Vec<SearchResult>>;
    fn delete(&mut self, id: VectorId) -> Result<()>;
}
```

### 3. Quantization Methods
```rust
pub trait Quantizer: Send + Sync {
    type Quantized;
    fn quantize(&self, vector: &[f32]) -> Self::Quantized;
    fn distance(&self, a: &Self::Quantized, b: &Self::Quantized) -> f32;
}
```

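A hypothetical scalar (int8-style) quantizer implementing this trait, matching the 4x compression figure used elsewhere in this document; the trait is repeated so the sketch compiles on its own:

```rust
pub trait Quantizer: Send + Sync {
    type Quantized;
    fn quantize(&self, vector: &[f32]) -> Self::Quantized;
    fn distance(&self, a: &Self::Quantized, b: &Self::Quantized) -> f32;
}

// Illustrative scalar quantizer: maps f32 values in [min, max] onto u8 codes,
// so each dimension shrinks from 4 bytes to 1 (4x compression).
struct ScalarQuantizer {
    min: f32,
    max: f32,
}

impl ScalarQuantizer {
    fn step(&self) -> f32 {
        (self.max - self.min) / 255.0
    }
}

impl Quantizer for ScalarQuantizer {
    type Quantized = Vec<u8>;

    fn quantize(&self, vector: &[f32]) -> Vec<u8> {
        vector
            .iter()
            .map(|x| ((x - self.min) / self.step()).round().clamp(0.0, 255.0) as u8)
            .collect()
    }

    // Squared L2 on dequantized codes: approximate, which is why quantized
    // retrieval is followed by a full-precision re-ranking pass.
    fn distance(&self, a: &Vec<u8>, b: &Vec<u8>) -> f32 {
        let s = self.step();
        a.iter()
            .zip(b)
            .map(|(&x, &y)| {
                let d = (x as f32 - y as f32) * s;
                d * d
            })
            .sum()
    }
}

fn main() {
    let q = ScalarQuantizer { min: -1.0, max: 1.0 };
    let a = q.quantize(&[0.0, 1.0]);
    let b = q.quantize(&[0.0, -1.0]);
    assert_eq!(a.len(), 2); // 1 byte per dimension
    let d = q.distance(&a, &b);
    assert!((d - 4.0).abs() < 0.05); // close to the exact squared L2
}
```
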
## Security Considerations

### Memory Safety
- Rust prevents entire classes of vulnerabilities
- No buffer overflows, use-after-free, or data races

### Input Validation
- Vector dimension checks
- ID format validation
- Metadata size limits
- Query parameter bounds

### Resource Limits
- Maximum query size
- Rate limiting (application-level)
- Memory quotas
- Disk space monitoring

### Data Privacy
- On-premises deployment option
- No telemetry by default
- Memory zeroing on delete
- Encrypted storage (via OS-level encryption)

## Future Architecture Enhancements

### Phase 1 (Current)
- HNSW indexing
- Scalar & product quantization
- AgenticDB compatibility
- Multi-platform bindings

### Phase 2 (Near-term)
- Distributed query processing
- Horizontal scaling with sharding
- GPU acceleration for distance calculations
- Learned index structures (hybrid with HNSW)

### Phase 3 (Long-term)
- Hypergraph structures for n-ary relationships
- Temporal indexes for time-series embeddings
- Neural hash functions for improved compression
- Neuromorphic hardware support (Intel Loihi)

## Related Documentation

- [Storage Layer](STORAGE_LAYER.md) - Detailed storage architecture
- [Index Structures](INDEX_STRUCTURES.md) - HNSW and flat indexes
- [Quantization](QUANTIZATION.md) - Compression techniques
- [Performance](../optimization/PERFORMANCE_TUNING_GUIDE.md) - Optimization guide
- [API Reference](../api/) - Complete API documentation

# Ruvector: Next-Generation Vector Database Technical Plan

## Bottom Line Up Front

**Ruvector should be a high-performance Rust-native vector database with AgenticDB API compatibility, achieving sub-millisecond latency through HNSW indexing, SIMD-optimized distance calculations, and zero-copy memory access.** Target performance: 10-100x faster than current solutions through Rust’s zero-cost abstractions, modern quantization techniques (4-32x compression), and multi-platform deployment (Node.js via NAPI-RS, browser via WASM, native Rust). The architecture combines battle-tested algorithms (HNSW, Product Quantization) with emerging techniques (hypergraph structures, learned indexes) for production-ready performance today with a clear path to future innovations.

**Why it matters**: Vector databases are the foundation of modern AI applications (RAG, semantic search, recommender systems), but existing solutions are limited by interpreted-language overhead, inefficient memory management, or cloud-only deployment. Ruvector fills a critical gap: a single high-performance codebase deployable everywhere—Node.js, browsers, edge devices, and native applications—with AgenticDB compatibility ensuring seamless migration for existing users.

**The opportunity**: AgenticDB demonstrates the API patterns and cognitive capabilities users want (reflexion memory, skill libraries, causal reasoning), while state-of-the-art research shows HNSW + quantization achieves 95%+ recall at 1-2ms latency. Rust provides 2-50x performance improvements over Python/TypeScript while maintaining memory safety. The combination creates a 10-100x performance advantage while adding zero-ops deployment and browser-native capabilities no competitor offers.

# Ruvector: Practical Market Analysis

## What It Actually Is

**In one sentence:** A Rust-based vector database that runs everywhere (servers, browsers, mobile) with your AgenticDB API, achieving 10-100x faster searches than current solutions.

## The Real-World Problem It Solves

Your AI agent needs to:
- Remember past conversations (semantic search)
- Find similar code patterns (embedding search)
- Retrieve relevant documents (RAG systems)
- Learn from experience (reflexion memory)

Current solutions force you to choose:
- **Fast but cloud-only** (Pinecone, Weaviate) - Can't run offline, costs scale with queries
- **Open but slow** (ChromaDB, LanceDB) - Python/JS overhead, 50-100x slower
- **Browser-capable but limited** (RxDB Vector) - Works offline but slow for >10K vectors

**Ruvector gives you all three:** Fast + open source + runs anywhere.

## Market Comparison Table

| Feature | Ruvector | Pinecone | Qdrant | ChromaDB | pgvector | Your AgenticDB |
|---------|----------|----------|--------|----------|----------|----------------|
| **Speed (QPS)** | 50K+ | 100K+ | 30K+ | 500 | 1K | ~100 |
| **Latency (p50)** | <0.5ms | ~2ms | ~1ms | ~50ms | ~10ms | ~5ms |
| **Language** | Rust | ? | Rust | Python | C | TypeScript |
| **Browser Support** | ✅ Full | ❌ No | ❌ No | ❌ No | ❌ No | ✅ Full |
| **Offline Capable** | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **NPM Package** | ✅ Yes | ✅ Yes | ❌ No | ✅ Yes | ❌ No | ✅ Yes |
| **Native Binary** | ✅ Yes | ❌ No | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| **AgenticDB API** | ✅ Full | ❌ No | ❌ No | ❌ No | ❌ No | ✅ Native |
| **Memory (1M vectors)** | ~800MB | ~2GB | ~1GB | ~4GB | ~2GB | ~2GB |
| **Quantization** | 3 types | Yes | Yes | No | No | No |
| **Cost** | Free | $70+/mo | Free | Free | Free | Free |

## Closest Market Equivalents

### 1. **Qdrant** (Rust vector DB)
**What it is:** Production Rust vector database, cloud + self-hosted
**Similarity:** Same tech stack (Rust + HNSW), similar performance goals
**Key differences:**
- Qdrant = server-only, ruvector = anywhere (server, browser, mobile)
- Qdrant = generic API, ruvector = AgenticDB-compatible cognitive features
- Qdrant = separate Node.js client, ruvector = native NAPI-RS bindings

**Market position:** Qdrant is your closest competitor on performance, but lacks browser/edge deployment.

### 2. **LanceDB** (Embedded vector DB)
**What it is:** Embedded database in Rust/Python, serverless-friendly
**Similarity:** Embedded architecture, open source
**Key differences:**
- Lance = columnar format (Parquet), ruvector = row-based with mmap
- Lance = disk-first, ruvector = memory-first with disk overflow
- Lance = no browser support, ruvector = full WASM

**Market position:** Similar "embedded" positioning, but Lance prioritizes analytical workloads vs ruvector's real-time focus.

### 3. **RxDB Vector Plugin** (Browser vector DB)
**What it is:** Vector search plugin for RxDB (browser database)
**Similarity:** Browser-first, IndexedDB persistence, offline-capable
**Key differences:**
- RxDB = pure JavaScript (~slow), ruvector = Rust + WASM (~fast)
- RxDB = ~10K vectors max, ruvector = 100K+ in browser
- RxDB = 18x speedup with workers, ruvector = 100x+ with SIMD + workers

**Market position:** RxDB proves browser vector search demand exists, ruvector makes it production-viable at scale.

### 4. **Turbopuffer** (Fast vector search)
**What it is:** Cloud-native vector DB emphasizing speed
**Similarity:** Performance-first mindset, modern architecture
**Key differences:**
- Turbopuffer = cloud-only, ruvector = deploy anywhere
- Turbopuffer = proprietary, ruvector = open source
- Turbopuffer = starts $20/mo, ruvector = free

**Market position:** Similar performance claims, opposite deployment model.

## What Makes Ruvector Unique

**The "triple unlock":**

1. **Speed of compiled languages** (like Qdrant/Milvus)
2. **Cognitive features of AgenticDB** (reflexion, skills, causal memory)
3. **Browser deployment capability** (like RxDB but 100x faster)

**No existing solution has all three.**

## Real-World Use Cases

### Use Case 1: AI Agent Memory (Your Primary Target)
**Current state:** AgenticDB in Node.js/TypeScript
**Pain:** 5ms for 10K vectors = too slow for real-time agent responses
**Ruvector solution:** <0.5ms for 10K vectors = 10x faster, same API
**Impact:** Agents respond instantly, can handle 10x more context

### Use Case 2: Offline-First AI Apps
**Current state:** Browser apps call the Pinecone API (requires internet)
**Pain:** Doesn't work offline, exposes data to the cloud, costs per query
**Ruvector solution:** 100K+ vector search running entirely in the browser via WASM
**Impact:** Privacy-preserving, offline-capable, zero hosting costs

### Use Case 3: Edge AI Devices
**Current state:** Raspberry Pi/edge devices use Python ChromaDB
**Pain:** Python too slow, high memory usage, can't fit large indexes
**Ruvector solution:** Rust native binary, 4x less memory via quantization
**Impact:** Run 4x larger models on the same hardware, 50x faster queries

### Use Case 4: High-Scale RAG Systems
**Current state:** Pinecone at $70-700/month for production traffic
**Pain:** Costs scale linearly with queries, vendor lock-in
**Ruvector solution:** Self-hosted on a single server handling 50K QPS
**Impact:** $70-700/mo SaaS fees become ~$50/mo server costs, up to 10x cost reduction at scale

## Technical Differentiators That Matter

### 1. **Multi-Platform from Single Codebase**
**Problem:** Weaviate/Qdrant = separate clients per platform
**Ruvector:** Same Rust code compiles to:
- `npm install ruvector` (Node.js via NAPI-RS)
- `<script>` tag (browser via WASM)
- `cargo add ruvector` (native Rust)

**Why it matters:** Maintain one codebase, deploy everywhere. Browser support alone is unique.

### 2. **AgenticDB API Compatibility**
**Problem:** Migrating vector DBs means rewriting all queries
**Ruvector:** Drop-in replacement:
```typescript
// Before (AgenticDB)
import { VectorDB } from 'agenticdb';
const db = new VectorDB({ dimensions: 384 });

// After (Ruvector) - SAME CODE
import { VectorDB } from 'ruvector';
const db = new VectorDB({ dimensions: 384 });
```

**Why it matters:** Zero migration cost for your existing 25+ npm packages.

### 3. **Quantization Built-In**
**Problem:** Most DBs store full float32 (4 bytes/dimension)
**Ruvector:** Automatic compression:
- Scalar (int8): 4x less memory, 97% accuracy
- Product: 16x less memory, 90% accuracy
- Binary: 32x less memory, 85% accuracy (for filtering)

**Why it matters:** 1M vectors = 2GB → 500MB, enabling 4x larger datasets in RAM.

### 4. **SIMD by Default**
**Problem:** Python/JS use scalar operations (slow)
**Ruvector:** SIMD intrinsics for distance calculations:
- AVX2: 4-8x faster than scalar
- AVX-512: 8-16x faster than scalar
- WASM SIMD: 4-6x faster in browsers

**Why it matters:** Vector search is 90% distance calculations - 8x faster = 8x more QPS.

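The lane parallelism behind those speedups can be imitated on stable Rust with independent accumulators, which is also the shape autovectorizers optimize well; real kernels use intrinsics, so this is only a sketch of the idea:

```rust
// Reference scalar dot product: one multiply-add per iteration.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Four independent accumulators mimic a 4-lane SIMD dot product: each "lane"
// carries its own running sum, and the lanes are combined at the end.
fn dot_4lane(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = a.len() / 4;
    for i in 0..chunks {
        for lane in 0..4 {
            let j = i * 4 + lane;
            acc[lane] += a[j] * b[j];
        }
    }
    let mut sum: f32 = acc.iter().sum();
    for j in chunks * 4..a.len() {
        sum += a[j] * b[j]; // scalar tail for lengths not divisible by 4
    }
    sum
}

fn main() {
    let a: Vec<f32> = (0..10).map(|i| i as f32).collect();
    let b: Vec<f32> = (0..10).map(|i| (i + 1) as f32).collect();
    // Small integer values are exact in f32, so both orders agree exactly here.
    assert_eq!(dot_scalar(&a, &b), dot_4lane(&a, &b));
}
```

Actual SIMD kernels replace the inner lane loop with one vector instruction per chunk, which is where the 4-16x figures come from.
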
## Market Gaps Ruvector Fills

### Gap 1: "Fast + Browser-Capable"
**Existing:** Fast DBs (Qdrant, Milvus) = server-only
**Existing:** Browser DBs (RxDB) = slow
**Ruvector:** Fast + browser = new category

**Market validation:** Companies building offline-first AI apps currently can't do real-time vector search. Tools like Cursor/Copilot need local code search - currently impractical at scale.

### Gap 2: "Cognitive Memory for Agents"
**Existing:** Generic vector DBs store embeddings
**Existing:** AgenticDB has cognitive features but is slow
**Ruvector:** Cognitive features + performance

**Market validation:** Your 25+ AgenticDB packages prove demand. Reflexion, skills, causal memory = what agents need, not just embeddings.

### Gap 3: "Zero-Ops Vector Search"
**Existing:** Cloud DBs need ops (scaling, monitoring)
**Existing:** Self-hosted DBs need ops (deployment, backups)
**Ruvector:** `npm install ruvector` = working vector DB

**Market validation:** Supabase/Vercel success proves developers want "just works" tools. Vector search should be a library, not a service.

## Competitive Moats

**What prevents Pinecone/Weaviate from copying ruvector?**

1. **Architecture lock-in:** Cloud DBs are built for client-server and can't run in the browser (they need WebSockets, auth, etc.). Ruvector is designed "local-first" from day 1.

2. **Language choice:** Pinecone is likely Python/Go, Weaviate is Go. Rewriting in Rust = 2+ year effort. You start with the Rust advantage.

3. **API compatibility:** Generic vector DB APIs ignore the cognitive patterns agents need. Your AgenticDB API is tailored for agent memory - network effects from existing packages.

4. **WASM expertise:** Compiling high-performance Rust to WASM with SIMD is non-trivial. Most companies lack the expertise.

## Pricing Model Options

### Option 1: Fully Open Source
- **Model:** MIT/Apache license, free forever
- **Revenue:** Consulting, managed hosting, enterprise support
- **Example:** Qdrant (open source + Qdrant Cloud)

### Option 2: Open Core
- **Model:** Core free (HNSW, basic features), advanced paid (learned indexes, distributed)
- **Revenue:** Enterprise licenses for advanced features
- **Example:** MongoDB (community + enterprise)

### Option 3: Source Available
- **Model:** Code visible, free for non-commercial, paid for commercial
- **Revenue:** Commercial licenses
- **Example:** Elastic (SSPL license)

**Recommendation:** Option 1 (fully open) given your existing open source ecosystem and democratization mission.

## Go-To-Market Strategy

### Phase 1: Developer Adoption (Months 1-6)
**Target:** Your existing AgenticDB users (25+ packages)
**Message:** "Same API, 100x faster"
**Tactics:**
- Migration guide with benchmarks
- Blog posts on performance gains
- npm package with drop-in replacement

**Success metric:** 1,000+ npm downloads/month

### Phase 2: Browser AI Apps (Months 6-12)
**Target:** Offline-first AI app developers
**Message:** "Vector search in your browser, no backend needed"
**Tactics:**
- Demo apps (local code search, offline RAG)
- Integration with LangChain.js, Transformers.js
- Show HN / Product Hunt launches

**Success metric:** 50+ production browser deployments

### Phase 3: Edge Computing (Months 12-18)
**Target:** IoT, Raspberry Pi, mobile AI developers
**Message:** "AI that works without internet"
**Tactics:**
- ARM binaries, mobile SDKs
- Benchmark: Rust vs Python on Pi
- Case studies from edge deployments

**Success metric:** Used in 10+ edge AI products

### Phase 4: Enterprise (Months 18+)
**Target:** Companies migrating from Pinecone/Weaviate
**Message:** "Cut costs 10x, keep your data"
**Tactics:**
- Migration tools from commercial DBs
- Enterprise support/SLAs
- Security certifications (SOC2, GDPR)

**Success metric:** 5+ enterprise customers

## Risk Analysis

### Risk 1: "Qdrant is fast enough"
**Likelihood:** Medium
**Mitigation:** Browser deployment + AgenticDB API = unique value beyond speed

### Risk 2: "Browser vector search doesn't scale"
**Likelihood:** Low
**Mitigation:** Benchmarks show 100K+ vectors feasible with WASM SIMD + quantization

### Risk 3: "Too complex to maintain"
**Likelihood:** Medium
**Mitigation:** Use battle-tested crates (hnsw_rs, simsimd), focus on integration vs reinventing

### Risk 4: "Market too crowded"
**Likelihood:** Low
**Mitigation:** 20+ vector DBs exist, but none combine speed + browser + cognitive features

## Bottom Line

**What is ruvector practically?**
The vector database your agents deserve - fast enough for real-time, smart enough for learning, portable enough for anywhere.

**Is there anything like it?**
Pieces exist (Qdrant = fast, RxDB = browser, AgenticDB = cognitive), but no solution combines all three.

**Should you build it?**
Yes - clear market gap, proven tech foundation, natural extension of your AgenticDB ecosystem, aligns with your democratization mission.

**The opportunity:** First production-ready vector database that runs at C++ speed in Node.js, browsers, and edge devices with built-in agent memory capabilities.

## Architecture overview: Three-layer design for maximum performance
|
||||
|
||||
Ruvector’s architecture separates concerns across three layers, enabling optimization at each level while maintaining clean interfaces.

**Storage layer** handles persistence with redb (an LMDB-inspired pure-Rust store) for metadata and ACID transactions, memory-mapped files via memmap2 for zero-copy vector access, and a segment-based architecture inspired by Tantivy for efficient merging. This combination provides instant loading (mmap), crash recovery (redb), and support for datasets larger than RAM. Vectors are stored as aligned float32 arrays in contiguous memory, enabling SIMD operations without deserialization overhead.

**Index layer** implements HNSW (Hierarchical Navigable Small World) graphs as the primary index structure, achieving O(log n) search complexity with 95%+ recall. Key parameters: M=32 connections per node (640 bytes overhead per 128D vector), efConstruction=200 for build quality, efSearch=100-500 for query-time accuracy control. Quantization reduces memory 4-32x: scalar quantization (int8) for 4x compression with 1-3% recall loss, product quantization for 8-16x compression with 5-10% loss, binary quantization for 32x compression suitable for filtering stages. The system maintains three representations: quantized for search, full-precision for re-ranking, and disk-backed for cold storage with automatic hot/cold tiering.

**Query engine** parallelizes operations with rayon for CPU-bound work (distance calculations, batch operations), uses crossbeam lock-free queues for query pipelines, and applies SIMD intrinsics for distance metrics (AVX2 baseline, AVX-512 when available). Distance calculations leverage the SimSIMD crate, whose optimized kernels for L2, cosine, and dot product provide up to 200x speedups over naive implementations across all architectures. The engine supports filtered search (pre-filtering for high selectivity, post-filtering otherwise), hybrid search combining vector similarity with keyword matching, and MMR (Maximal Marginal Relevance) for result diversity.

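The MMR step mentioned above is a small greedy loop; a self-contained sketch (function and parameter names are illustrative, not Ruvector’s API, and similarities are assumed to lie in [0, 1]):

```rust
/// Greedy MMR over candidate indices: each round picks the item maximizing
/// lambda * sim_to_query[i] - (1 - lambda) * max over selected j of sim(i, j).
fn mmr_select(
    sim_to_query: &[f32],
    sim: &dyn Fn(usize, usize) -> f32,
    k: usize,
    lambda: f32,
) -> Vec<usize> {
    let n = sim_to_query.len();
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..n).collect();
    while selected.len() < k.min(n) {
        let pos = {
            let score = |i: usize| {
                // Redundancy is 0 while nothing has been selected yet.
                let redundancy = selected.iter().map(|&j| sim(i, j)).fold(0.0_f32, f32::max);
                lambda * sim_to_query[i] - (1.0 - lambda) * redundancy
            };
            remaining
                .iter()
                .enumerate()
                .max_by(|(_, &a), (_, &b)| score(a).partial_cmp(&score(b)).unwrap())
                .map(|(p, _)| p)
                .unwrap()
        };
        let best = remaining.remove(pos);
        selected.push(best);
    }
    selected
}
```

With `lambda = 1.0` this degenerates to plain relevance ranking; values around 0.5-0.7 are common starting points when diversity matters.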
## AgenticDB API compatibility: Cognitive memory for agents

Ruvector implements full AgenticDB compatibility, supporting the reflexion memory, skill library, causal memory, and learning systems that distinguish AgenticDB from simple vector stores. This ensures existing AgenticDB applications migrate seamlessly while gaining 10-100x performance improvements.

**Core vector operations** expose a familiar API:

```rust
#[napi]
pub async fn create_vector_db(options: DbOptions) -> Result<VectorDB> {
    // Initialize with quantization, HNSW parameters
}

#[napi]
impl VectorDB {
    pub async fn insert(&self, entry: VectorEntry) -> Result<String>;
    pub async fn insert_batch(&self, entries: Vec<VectorEntry>) -> Result<Vec<String>>;
    pub async fn search(&self, query: SearchQuery) -> Result<Vec<SearchResult>>;
    pub async fn delete(&self, id: String) -> Result<()>;
}
```

**Reflexion memory** stores self-critique episodes, enabling agents to learn from experience. Each episode contains task context, actions taken, observations, performance scores, and self-generated critiques. The system indexes episodes by embedding their critiques, enabling similarity-based retrieval of relevant past experiences. When an agent faces a new task, it queries similar episodes, reviews past mistakes, and adapts its approach—implementing the reflexion learning paradigm where agents improve through self-reflection.

**Skill library** consolidates successful patterns into reusable, parameterized skills. After a task has been executed successfully multiple times, the system auto-consolidates the pattern into a skill with a name, description, category, parameters, success metrics, and usage examples. Skills are stored with embeddings, enabling semantic search—agents find relevant skills by describing what they want to accomplish. This builds an ever-growing knowledge base of proven approaches.

**Causal memory graph** tracks cause-effect relationships using hypergraph structures where nodes represent actions/states and hyperedges connect causes to effects with confidence weights. The system learns which actions lead to desired outcomes through repeated observation, enabling causal reasoning beyond correlation. Queries combine semantic similarity (vector search) with causal uplift (how often A→B succeeds) and latency penalties, implementing the utility function U = α·similarity + β·uplift − γ·latency.

**Learning systems** integrate 9 RL algorithms (Q-Learning, SARSA, DQN, Policy Gradient, Actor-Critic, PPO, Decision Transformer, MCTS, Model-Based) for adaptive behavior. Sessions track state-action-reward sequences, enabling agents to learn optimal policies from experience. The system provides predictions with conformal prediction-based confidence intervals, ensuring uncertainty-aware decision making.

**Storage schema** uses five tables matching AgenticDB: vectors_table for core embeddings with metadata, reflexion_episodes for self-critique memories, causal_edges for cause-effect relationships, skills_library for consolidated patterns, and learning_sessions for RL training data. This schema-compatible approach ensures existing applications work unchanged while gaining performance benefits.

## HNSW implementation: Production-ready approximate nearest neighbor search

HNSW provides the best recall-latency trade-off for in-memory vector search, proven across the industry with implementations in Qdrant, Milvus, Weaviate, and Pinecone. Ruvector leverages the hnsw_rs crate (20K+ downloads/month) with custom optimizations for AgenticDB workloads.

**Core algorithm** builds a multi-layer graph where each layer contains a subset of nodes with decreasing density toward the top. Search begins at a sparse top layer, greedily descending to find approximate neighbors at each level, then traversing the dense bottom layer for precise results. This hierarchical structure provides O(log n) query complexity while maintaining 95%+ recall, far superior to flat search (O(n)) or IVF methods (O(√n) with lower recall).

**Parameter tuning** requires balancing memory, build time, and accuracy. M controls connections per node: M=16 for constrained memory (384 bytes overhead per 128D vector), M=32 for balanced performance (640 bytes, recommended), M=64 for maximum recall (1152 bytes, diminishing returns). efConstruction determines build quality: 100 for fast building (2-3 minutes for 1M vectors on 16 cores), 200 for production quality (3-5 minutes, recommended), 400+ for maximum quality (6-10 minutes, minimal gains). efSearch controls query-time accuracy: 50 for 85% recall at 0.5ms, 100 for 90% recall at 1ms, 200 for 95% recall at 2ms, 500 for 99% recall at 5ms. These parameters tune independently—build once with high efConstruction, then dynamically adjust efSearch per query based on latency requirements.

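The per-vector memory figures above can be folded into a sizing helper. The overhead term `16 * m + 128` is simply fitted to the three data points quoted for 128D vectors (M=16 → 384 B, M=32 → 640 B, M=64 → 1152 B); treat it as a rule of thumb, not an exact accounting of hnsw_rs internals:

```rust
/// Rough per-vector memory for a float32 HNSW index.
/// Overhead 16*M + 128 reproduces the figures quoted in the text.
fn hnsw_bytes_per_vector(dims: usize, m: usize) -> usize {
    let raw = dims * 4;       // float32 payload
    let graph = 16 * m + 128; // neighbor lists across layers plus bookkeeping
    raw + graph
}
```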
**SIMD optimization** accelerates distance calculations, the hottest path in search. The implementation uses SimSIMD for production-ready optimized kernels providing 4-8x speedups on AVX2 and 8-16x on AVX-512, and falls back to std::arch intrinsics for custom distance metrics:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx2")]
unsafe fn euclidean_simd(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    let mut sum = _mm256_setzero_ps();
    // Assumes the length is a multiple of 8; handle the tail scalarly otherwise.
    for i in (0..a.len()).step_by(8) {
        let va = _mm256_loadu_ps(a.as_ptr().add(i));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i));
        let diff = _mm256_sub_ps(va, vb);
        sum = _mm256_fmadd_ps(diff, diff, sum);
    }
    horizontal_sum(sum).sqrt()
}
```

Compile with `RUSTFLAGS="-C target-cpu=native"` to enable all available SIMD instructions for maximum performance.

**Filtered search** combines vector similarity with metadata filtering using two strategies. Pre-filtering applies metadata constraints before graph traversal—efficient when filters are highly selective (<10% of the data). Post-filtering traverses the full graph and then applies filters—better for loose constraints. Qdrant’s research shows pre-filtering with filter-aware graph construction achieves the best results: during index building, store filter-specific entry points and maintain filter statistics, enabling intelligent routing at query time.

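The routing decision can be made explicit in a small helper. The 10% cutoff comes from the selectivity guideline above; in practice it would be tuned per workload, with `estimated_matches` supplied by the maintained filter statistics:

```rust
#[derive(Debug, PartialEq)]
enum FilterStrategy {
    PreFilter,  // evaluate the metadata filter first, search within matches
    PostFilter, // traverse the graph first, then filter the results
}

/// Choose pre- vs post-filtering from estimated filter selectivity.
fn route_filtered_query(estimated_matches: u64, total_vectors: u64) -> FilterStrategy {
    let selectivity = estimated_matches as f64 / total_vectors as f64;
    if selectivity < 0.10 {
        FilterStrategy::PreFilter
    } else {
        FilterStrategy::PostFilter
    }
}
```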
**Parallel operations** leverage rayon for CPU-bound tasks. Batch insertions parallelize across cores, processing 100-1000 vectors simultaneously. Multi-query search processes independent queries in parallel, saturating CPU cores for maximum throughput. Index building parallelizes construction within large segments, then merges results. These optimizations provide near-linear scaling up to CPU core count.

## Quantization techniques: 4-32x compression with minimal accuracy loss

Quantization reduces memory footprint and accelerates search by compressing float32 vectors into compact representations. Ruvector implements three quantization methods, each optimized for a different compression-accuracy trade-off.

**Scalar quantization** maps float32 values to int8 or int4, achieving 4-8x compression. The algorithm computes per-vector min/max, then quantizes: `quantized = uint8((value - min) * 255 / (max - min))`. This maintains 97-99% accuracy with 2-4x faster distance calculations due to improved cache locality. Recent Elasticsearch research (2024) shows storing a single float32 correction factor per vector recovers most quantization error. Implementation uses SIMD for parallel quantization/dequantization, processing 8-32 values per instruction.

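The encode/decode round trip described above fits in a few lines (per-vector min/max, int8 codes; the int4 variant, the float32 correction factor, and the SIMD kernels are omitted):

```rust
struct Quantized {
    codes: Vec<u8>,
    min: f32,
    scale: f32, // (max - min) / 255
}

/// Per-vector scalar quantization: map [min, max] onto [0, 255].
fn sq_encode(vector: &[f32]) -> Quantized {
    let min = vector.iter().copied().fold(f32::INFINITY, f32::min);
    let max = vector.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max - min) / 255.0;
    let codes = vector
        .iter()
        .map(|&v| if scale > 0.0 { ((v - min) / scale).round() as u8 } else { 0 })
        .collect();
    Quantized { codes, min, scale }
}

fn sq_decode(q: &Quantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```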
**Product quantization** splits each vector into M subvectors (typically 8-16), then applies vector quantization to each subspace independently. For 128D vectors: split into 16 subvectors of 8D each, run K-means (K=256) on each subspace, store centroid IDs (1 byte per subspace = 16 bytes total). This achieves 32x compression (512 bytes down to 16). Distance calculations use precomputed lookup tables: for a query vector, compute distances from the query subvectors to all 256 centroids per subspace (16×256 = 4,096 distances), then approximate the full distance as a sum of table lookups. Accuracy depends on subspace dimensionality—8D maintains 90-95% recall, while 16D achieves 85-90% recall due to the curse of dimensionality in subspaces.

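The lookup-table trick is the heart of PQ search speed; a small asymmetric-distance sketch (1-byte codes, squared L2, with K-means training of the codebooks assumed to have happened elsewhere):

```rust
/// Per-query tables: distance from each query subvector to each centroid.
/// codebooks[m][k] is centroid k of subspace m.
fn build_tables(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let sub_dim = query.len() / codebooks.len();
    codebooks
        .iter()
        .enumerate()
        .map(|(m, centroids)| {
            let q_sub = &query[m * sub_dim..(m + 1) * sub_dim];
            centroids
                .iter()
                .map(|c| q_sub.iter().zip(c).map(|(a, b)| (a - b).powi(2)).sum())
                .collect()
        })
        .collect()
}

/// Approximate squared distance to a PQ-encoded vector: M table lookups.
fn adc_distance(tables: &[Vec<f32>], codes: &[u8]) -> f32 {
    codes.iter().enumerate().map(|(m, &c)| tables[m][c as usize]).sum()
}
```

Building the tables costs one pass over the codebooks per query; every database vector afterwards costs only M lookups, which is what makes PQ scans cheap.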
**Binary quantization** represents each dimension as a single bit (sign), achieving 32x compression but only 80-90% recall. Best used as a filtering stage: binary search narrows candidates 10-100x, then re-rank with full precision. Weaviate’s 2025 rotational quantization improves binary methods by applying a learned rotation before quantization, better preserving angular relationships for 88-93% recall.

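Sign-bit codes pack into machine words and compare with XOR plus popcount, which is what makes the binary stage cheap enough for first-pass filtering (a sketch; real deployments would binarize against per-dimension thresholds or a learned rotation):

```rust
/// One bit per dimension (sign), packed into u64 words.
fn binarize(vector: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (vector.len() + 63) / 64];
    for (i, &v) in vector.iter().enumerate() {
        if v > 0.0 {
            words[i / 64] |= 1 << (i % 64);
        }
    }
    words
}

/// Hamming distance: a cheap proxy used only to shortlist candidates
/// before full-precision re-ranking.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```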
**Hybrid approach** combines quantization types for optimal results: binary quantization for initial filtering (32x compression, ultra-fast), scalar quantization for HNSW graph traversal (4x compression, 97% accuracy), full precision for final re-ranking (100% accuracy on top-k results). This three-tier strategy minimizes memory (most vectors in binary), maintains speed (scalar for graph search), and ensures accuracy (full precision for final ranking). Memory breakdown for 1M 384D vectors: 1.5GB full precision, 375MB scalar, 47MB binary, 400MB HNSW overhead = 822MB total vs 1.9GB uncompressed.

**Implementation patterns** use Rust’s type system to enforce correctness:

```rust
trait QuantizedVector {
    fn quantize(vector: &[f32]) -> Self;
    fn distance(&self, other: &Self) -> f32;
    fn reconstruct(&self) -> Vec<f32>;
}

struct ScalarQuantized {
    data: Vec<u8>,
    min: f32,
    scale: f32,
}

struct ProductQuantized {
    codes: Vec<u8>,           // M codes, 1 byte each
    codebooks: Vec<Vec<f32>>, // M codebooks of K centroids
}
```

## Rust performance optimizations: Zero-cost abstractions for maximum throughput

Ruvector leverages Rust’s unique performance characteristics—memory safety without garbage collection, zero-cost abstractions, and fearless concurrency—to achieve C++ performance with higher productivity.

**Memory-mapped files** via memmap2 enable instant database loading and datasets larger than RAM. The pattern maps vector data read-only, enabling zero-copy access while letting the OS handle caching:

```rust
use std::fs::File;
use memmap2::Mmap;

let file = File::open("vectors.bin")?;
let mmap = unsafe { Mmap::map(&file)? };
// The mapping is page-aligned, so the f32 cast below is alignment-safe;
// the file must contain raw native-endian float32 data.
let vectors: &[f32] = unsafe {
    std::slice::from_raw_parts(
        mmap.as_ptr() as *const f32,
        mmap.len() / 4,
    )
};
```

Configure with `.populate()` for read-ahead and `.advise(Advice::Random)` for random access patterns. Use huge pages (2MB) for a 5-10% performance improvement on large datasets: `MmapOptions::new().huge(Some(21))`.

**Lock-free data structures** from crossbeam enable high-concurrency operations without traditional locks. SegQueue provides unbounded MPMC queues for query pipelines with 20-50ns per operation. AtomicCell enables lock-free updates to shared counters and flags. Epoch-based memory reclamation allows safe concurrent access to index structures without stop-the-world pauses. Work-stealing deques distribute tasks across threads efficiently, enabling parallelism that scales to high core counts (tested to 128 cores).

**Zero-copy serialization** with rkyv achieves instant loading by memory-mapping serialized indexes directly. Unlike traditional serialization (deserialize the entire structure into memory), rkyv-archived data uses pointer casts for zero-overhead access:

```rust
#[derive(Archive, Serialize, Deserialize)]
struct VectorIndex {
    graph: HnswGraph,
    vectors: Vec<Vec<f32>>,
    metadata: HashMap<String, String>,
}

// Serialize once (rkyv 0.8 API; Error here is rkyv::rancor::Error)
let bytes = rkyv::to_bytes::<Error>(&index)?;
std::fs::write("index.rkyv", bytes)?;

// Load instantly (just mmap + pointer cast)
let mmap = unsafe { Mmap::map(&File::open("index.rkyv")?)? };
let archived = rkyv::access::<ArchivedVectorIndex, Error>(&mmap)?;
// Use archived.graph, archived.vectors immediately - zero deserialization!
```

This enables sub-second startup times even for billion-scale indexes.

**Parallel processing** with rayon provides work-stealing parallelism with minimal boilerplate. Distance calculations parallelize naturally: `candidates.par_iter().map(|c| distance(query, c)).collect()`. Batch operations process chunks in parallel: `vectors.par_chunks(1000).for_each(|chunk| index_batch(chunk))`. The thread pool automatically matches CPU cores, achieving near-linear scaling up to core count.

**SIMD intrinsics** accelerate vector operations 4-16x. The implementation detects CPU capabilities at runtime, dispatching to optimized kernels:

```rust
fn distance(a: &[f32], b: &[f32]) -> f32 {
    // Gate on architecture at compile time, on CPU features at runtime;
    // gating the runtime check with cfg(target_feature) would defeat dispatch.
    #[cfg(target_arch = "x86_64")]
    if is_x86_feature_detected!("avx2") {
        return unsafe { distance_avx2(a, b) };
    }
    distance_fallback(a, b)
}
```

The SimSIMD crate provides production-ready implementations across architectures (x86 AVX2/AVX-512, ARM NEON/SVE), freeing developers from intrinsics complexity while maintaining peak performance.

**Memory layout optimization** ensures cache-friendly data structures. Store vectors in Structure-of-Arrays (SoA) format for SIMD efficiency: separate arrays for each dimension rather than an array of structs. Align allocations to 64-byte cache lines: `#[repr(align(64))]`. Pre-allocate with capacity to minimize reallocations: `Vec::with_capacity(expected_size)`. Use arena allocation for batch operations, freeing everything at once.

## NAPI-RS bindings: High-performance Node.js integration

NAPI-RS provides Rust-to-Node.js bindings with performance approaching native while maintaining TypeScript integration. Used by Next.js, SWC, and Rspack, it’s proven at massive scale.

**Core architecture** uses procedural macros for minimal boilerplate and automatic TypeScript generation:

```rust
#[napi]
pub struct VectorDB {
    index: Arc<RwLock<HnswIndex>>,
}

#[napi]
impl VectorDB {
    #[napi(constructor)]
    pub fn new(dimensions: u32, max_elements: u32) -> Result<Self> {
        Ok(Self {
            index: Arc::new(RwLock::new(
                HnswIndex::new(dimensions, max_elements)?
            ))
        })
    }

    #[napi]
    pub async fn search(
        &self,
        query: Float32Array,
        k: u32,
    ) -> Result<Vec<SearchResult>> {
        let index = self.index.clone();
        let query_vec = query.to_vec();
        tokio::task::spawn_blocking(move || {
            index.read().unwrap().search(&query_vec, k as usize)
        }).await?
    }
}
```

This generates TypeScript definitions automatically: `export class VectorDB { constructor(dimensions: number, maxElements: number); search(query: Float32Array, k: number): Promise<SearchResult[]>; }`.

**Memory management** uses a two-category type system: borrowed types (`&[f32]`) for zero-copy sync operations, owned types (`Float32Array`) for async-safe reference counting. The runtime automatically creates a napi_ref for owned types, releasing it when Rust drops them. Critical pattern: convert JavaScript TypedArrays to owned types before async operations to prevent use-after-free.

**Buffer sharing strategies** minimize copies through zero-copy patterns. For read-only access, use borrowed slices: `fn sum_array(input: &[f32]) -> f32 { input.iter().sum() }`. For mutation, use mutable TypedArrays: `fn scale_array(mut input: Float32Array, factor: f32) { for x in input.as_mut() { *x *= factor; } }`. For ownership transfer, convert Vec to Buffer: `fn create_buffer() -> Buffer { vec![1,2,3].into() }` transfers ownership to V8 with zero copy (except in Electron, due to the V8 Memory Cage).

**Async integration** leverages Tokio for CPU-bound operations without blocking the Node.js event loop. Pattern: accept an async function with `&self`, clone Arc-wrapped data, spawn a blocking task on the Tokio threadpool, await the result. This enables 1000x more concurrent connections compared to blocking Node.js threads. For streaming, use ThreadsafeFunction: `fn stream_results(callback: ThreadsafeFunction<SearchResult, ()>) { ... }` enables calling JavaScript from Rust threads safely.

**Performance characteristics** show 2-10x speedups over pure Node.js for CPU-bound operations. SWC (NAPI-RS based) achieves 5,538 ops/sec vs Babel’s 32.78 ops/sec—169x faster. Memory usage drops 30-50% due to Rust’s efficient data structures. Latency improves through zero-copy buffer access and lock-free concurrency.

**Buffer pooling** reduces allocation overhead for frequent operations:

```rust
use std::sync::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    static ref BUFFER_POOL: Mutex<Vec<Vec<u8>>> = Mutex::new(Vec::new());
}

fn acquire_buffer(size: usize) -> Vec<u8> {
    BUFFER_POOL.lock().unwrap()
        .pop()
        .filter(|buf| buf.capacity() >= size)
        .unwrap_or_else(|| Vec::with_capacity(size))
}

// Return buffers after use so subsequent requests can reuse the allocation.
fn release_buffer(mut buf: Vec<u8>) {
    buf.clear();
    BUFFER_POOL.lock().unwrap().push(buf);
}
```

This pattern reuses buffers across requests, minimizing GC pressure on both the Rust and Node.js sides.

## WASM and browser deployment: Vector search running locally

WASM enables Ruvector to run entirely in-browser, providing offline-first vector search with no server required. This unlocks use cases impossible with cloud-only solutions: privacy-sensitive applications, offline-capable apps, edge computing, and eliminating network latency.

**WASM SIMD support** achieved near-universal browser availability in 2023-2024 (99% of tracked browsers). Compilation requires enabling the SIMD feature: `RUSTFLAGS="-C target-feature=+simd128" wasm-pack build --target web`. The critical challenge: WASM cannot detect features at runtime, requiring two builds—one with SIMD, one without—selected via JavaScript feature detection:

```javascript
import { simd } from 'wasm-feature-detect';

// Pick the SIMD build only when the runtime supports it.
const module = (await simd())
  ? await import('./ruvector_simd.wasm')
  : await import('./ruvector.wasm');
```

SIMD provides 4-6x speedups for distance calculations, making browser-based vector search practical for 10K+ vector datasets.

**Web Workers** enable parallelism by distributing search across a worker pool matching `navigator.hardwareConcurrency` (4-32 cores on modern devices). Pattern: partition the dataset into chunks, dispatch to workers, merge top-k results. Real-world performance (RxDB vector database): 18x speedup with 32 workers on a 32-core system for 10K embedding processing. Use Transferable objects for large data to avoid copying: `worker.postMessage({data: buffer}, [buffer])` transfers ownership zero-copy.

**SharedArrayBuffer** enables lock-free coordination across workers via atomic operations. Required security headers: `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`. Use Atomics for synchronization: `Atomics.add()` for progress tracking, `Atomics.wait()`/`Atomics.notify()` for barriers, `Atomics.compareExchange()` for lock-free updates. Performance: 100x faster than message passing for synchronization primitives.

**IndexedDB persistence** enables offline-first operation with optimizations critical for acceptable performance. Key strategies: batch all operations in a single transaction (10-25x faster than individual transactions), use custom index strings combining multiple fields (10% faster than multi-field indexes), implement sharding across 10 object stores (28-43% faster queries), use getAll() with bounded ranges instead of cursors (28-43% faster). The distance-to-samples method accelerates similarity search: select 5 random anchor embeddings, store distances to the anchors as indexed fields, query by distance ranges. This achieves an 8.7x speedup by reading ~2,000 candidates instead of 10,000 full vectors.

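The distance-to-samples trick works because anchor distances bound true distances via the triangle inequality: |d(q,a) − d(v,a)| ≤ d(q,v), so any vector whose stored anchor distances differ too much from the query’s can be skipped without loading it. A sketch of the pruning test (anchor selection and the IndexedDB range queries themselves are omitted):

```rust
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Distances from one vector to each stored anchor (the indexed fields).
fn anchor_profile(v: &[f32], anchors: &[Vec<f32>]) -> Vec<f32> {
    anchors.iter().map(|a| l2(v, a)).collect()
}

/// False means v provably lies outside `radius` of the query,
/// so its full vector never has to be read.
fn could_match(query_profile: &[f32], vec_profile: &[f32], radius: f32) -> bool {
    query_profile
        .iter()
        .zip(vec_profile)
        .all(|(qa, va)| (qa - va).abs() <= radius)
}
```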
**Memory constraints** require careful optimization. Target a 60KB minified bundle size (achieved by AgenticDB). Use quantization aggressively—binary quantization achieves 32x compression, enabling 320K vectors in 10MB. Implement progressive loading: index the first 1000 vectors immediately, continue in the background. Use LRU caching to keep hot vectors in memory and cold vectors in IndexedDB.

**WASM Component Model** (WASI 0.2, released February 2024) enables composable modules with well-defined interfaces. While still maturing, it provides future-proof interoperability. Pattern: define a WIT (WebAssembly Interface Types) contract, generate bindings with wit-bindgen, compile to a component with wasm-tools. Current status: production-ready for server-side, while browser integration requires polyfills via jco. Timeline: native browser support expected 2025-2026.

**Deployment strategy** uses progressive enhancement: start with a basic JavaScript implementation, add Web Workers for parallelism, compile Rust to WASM with SIMD, enable SharedArrayBuffer when available. Feature detection selects the optimal code path, ensuring broad compatibility while leveraging advanced features when present. The build pipeline generates multiple bundles automatically.

## Hypergraph extensions: Beyond pairwise relationships

Hypergraph structures enable Ruvector to represent n-ary relationships beyond traditional pairwise similarity, unlocking advanced use cases like multi-entity reasoning, temporal dynamics, and causal inference.

**HyperGraphRAG architecture** (NeurIPS 2025) demonstrates production-ready hypergraph vector retrieval achieving 28% better answer relevance than standard RAG. Core innovation: represent complex facts as hyperedges connecting multiple entities with natural language descriptions. Example medical fact: “Male hypertensive patients with serum creatinine 115-133 μmol/L diagnosed with mild elevation” becomes a single hyperedge connecting {male, hypertension, creatinine range, diagnosis} rather than multiple binary relations. Query processing: retrieve relevant hyperedges via embedding similarity, traverse to connected entities, synthesize multi-hop reasoning paths.

**Storage strategy** uses bipartite graph transformation: convert the hypergraph H = (V, E), where E contains n-ary hyperedges, into a bipartite graph G = (V ∪ E, edges) where both entities and hyperedges are nodes. This enables efficient querying with standard graph algorithms while preserving n-ary semantics. Store hyperedge embeddings alongside entity embeddings, enabling semantic search over both dimensions.

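A minimal sketch of the transformation (string IDs are illustrative; a real store would intern IDs and keep the adjacency in its segment files):

```rust
use std::collections::HashMap;

/// Bipartite transformation: every hyperedge becomes a node connected to
/// each entity it contains, so n-ary facts can be traversed with ordinary
/// graph algorithms.
fn to_bipartite(hyperedges: &[(String, Vec<String>)]) -> HashMap<String, Vec<String>> {
    let mut adj: HashMap<String, Vec<String>> = HashMap::new();
    for (edge_id, entities) in hyperedges {
        for entity in entities {
            adj.entry(edge_id.clone()).or_default().push(entity.clone());
            adj.entry(entity.clone()).or_default().push(edge_id.clone());
        }
    }
    adj
}
```

A two-hop walk in this bipartite graph (entity → hyperedge → entity) recovers exactly the multi-entity co-occurrence that the hyperedge encodes.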
**Temporal hypergraphs** track time-evolving relationships critical for agent memory systems. Research (Nature Communications 2024) reveals higher-order temporal correlations in real-world data—groups of size d show power-law decay in autocorrelation, while cross-order correlations (between groups of different sizes) exhibit asymmetric temporal dependencies indicating causal direction (nucleation vs fragmentation). Implementation: store hyperedges with temporal attributes, use sliding window queries for recent context, maintain separate indices per time granularity (hourly, daily, monthly) for efficient temporal range queries.

**Causal memory implementation** leverages hypergraph structure where nodes represent states/actions and hyperedges represent causal relationships with confidence weights and context. The utility function balances similarity, causal strength, and latency: U = 0.7·semantic_similarity + 0.2·causal_uplift − 0.1·action_latency. This enables agents to recall not just similar situations but situations where similar actions led to desired outcomes.

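The weighted utility is a one-liner; the only assumption added here is that latency is normalized into [0, 1] before weighting, which the text does not specify:

```rust
/// Causal recall utility: U = 0.7*similarity + 0.2*uplift - 0.1*latency.
/// latency_norm is assumed pre-normalized into [0, 1].
fn utility(similarity: f32, causal_uplift: f32, latency_norm: f32) -> f32 {
    0.7 * similarity + 0.2 * causal_uplift - 0.1 * latency_norm
}
```

Candidates are then ranked by `utility` instead of raw similarity, so a slightly less similar memory with a strong action→outcome track record can outrank a near-duplicate with no causal evidence.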
**Skill library consolidation** uses hypergraph pattern matching: detect frequently co-occurring action sequences represented as temporal hyperpaths, extract them as parameterized skills with success metrics, and enable semantic search over skill descriptions. After an agent executes “authenticate user → validate token → fetch profile” successfully 3+ times, the system auto-consolidates it into a reusable “authentication_flow” skill.

**Performance considerations**: Hypergraph operations are more expensive than pairwise—k-hop neighbor expansion costs O(exp(k)·N) due to exponential branching. Mitigate with: sampling (approximate neighborhoods), sparse representations (most hyperedges are low-order), precomputed statistics (frequent pattern caching), and a hybrid approach (hypergraph for complex queries, standard vector search for simple similarity).

**Implementation roadmap**: Start with standard vector search, add a hyperedge table with n-ary relationships, implement the bipartite storage transformation, and expose a hypergraph query API for advanced users. Make hypergraph features opt-in to avoid overhead for simple use cases.

## Advanced techniques for the 10-year horizon

Ruvector’s architecture supports emerging techniques that will define next-generation vector search, with adoption timelines based on current research maturity.

**Learned index structures** (TRL 4-5, adoption 2025-2027) replace traditional indexes with neural networks trained on the data distribution. Recursive Model Indexes (RMI) treat indexing as CDF approximation: multi-stage models make coarse-then-fine predictions with bounded error correction. Recent advances (Mixture-of-Logits, WWW 2025) show 20-30% improvements on billion-scale datasets. Implementation strategy: a hybrid approach combining learned indexes for static segments with traditional HNSW for dynamic updates. Performance target: 1.5-3x lookup speedup, 10-100x space reduction on read-heavy workloads. Challenge: dynamic updates remain problematic—retrain periodically on a background thread.

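The core mechanic fits a model to the key→position CDF and falls back to a bounded local search; a single-model sketch (real RMIs stage coarse-to-fine models, and `keys` here stands in for whatever sortable quantity is being indexed):

```rust
/// One linear model approximating the CDF of sorted keys, plus the
/// worst-case prediction error observed at build time.
struct LinearIndex {
    slope: f64,
    intercept: f64,
    max_err: usize,
}

fn fit(keys: &[f64]) -> LinearIndex {
    let n = keys.len();
    let slope = (n as f64 - 1.0) / (keys[n - 1] - keys[0]);
    let intercept = -slope * keys[0];
    // Record the largest miss so lookups can bound their search window.
    let max_err = keys
        .iter()
        .enumerate()
        .map(|(i, &k)| {
            let pred = (slope * k + intercept).round() as i64;
            (pred - i as i64).unsigned_abs() as usize
        })
        .max()
        .unwrap();
    LinearIndex { slope, intercept, max_err }
}

/// Predict a position, then correct within [pred - err, pred + err].
fn lookup(idx: &LinearIndex, keys: &[f64], key: f64) -> Option<usize> {
    let pred = (idx.slope * key + idx.intercept).round() as i64;
    let lo = (pred - idx.max_err as i64).max(0) as usize;
    let hi = ((pred + idx.max_err as i64 + 1).max(0) as usize).min(keys.len());
    (lo..hi).find(|&i| keys[i] == key)
}
```

The space win comes from storing two floats and an error bound instead of a tree; the speed win holds only when the data distribution keeps `max_err` small.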
**Neural hash functions** (TRL 5-6, adoption already happening in 2024-2025) learn similarity-preserving projections into compact binary codes. Deep Hash Embeddings achieve 32-128x compression with 90-95% recall preservation, far better than random LSH. The implementation uses learned hash functions that adapt to the embedding distribution: train on representative query-document pairs, generate binary codes optimizing similarity preservation, and update periodically as the distribution shifts. Integration: use binary codes for initial filtering (32x compressed), scalar quantization for HNSW traversal (4x compressed), full precision for re-ranking. This three-tier approach combines extreme compression with high accuracy.

**Conformal prediction** (TRL 6-7, adoption already happening in 2024-2025) provides distribution-free uncertainty quantification with finite-sample guarantees. For vector search: calibrate on held-out queries with known relevance, compute non-conformity scores (e.g., negative similarity), determine the threshold for 1-α coverage, and return prediction sets containing the true answer with probability ≥ 1-α. Applications: adaptive top-k (dynamically adjust k based on uncertainty), query routing (uncertain queries to expensive rerankers), confidence intervals on similarity scores. Implementation: store the calibration set in memory, compute the quantile at initialization, apply it to queries with minimal overhead. This is the most mature technique examined—production-ready today.

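Calibration is a quantile computation; a split-conformal sketch (scores are non-conformity values such as negative similarity of the known-relevant answer, and the ceil((n+1)(1−α)) rank is what yields the finite-sample guarantee on exchangeable data):

```rust
/// Split conformal calibration: return the score threshold guaranteeing
/// coverage >= 1 - alpha on future exchangeable queries.
fn conformal_threshold(mut scores: Vec<f32>, alpha: f32) -> f32 {
    scores.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = scores.len();
    // ceil((n + 1) * (1 - alpha))-th smallest score (1-indexed), clamped to n.
    let rank = (((n as f32 + 1.0) * (1.0 - alpha)).ceil() as usize).min(n).max(1);
    scores[rank - 1]
}
```

At query time, every candidate whose non-conformity score falls at or below the threshold joins the prediction set, which is how adaptive top-k emerges: uncertain queries naturally yield larger sets.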
**Neuromorphic computing** (TRL 4-5, adoption 2026-2030) uses event-driven spiking neural networks on specialized hardware. Intel Loihi 2 (2024) provides 1M neurons per chip and 128 billion synapses in the Hala Point system, with 100x energy efficiency vs GPU. Application to vector search: encode the query as a spike train, perform massively parallel similarity computation, exploit sparsity (computation only on non-zero dimensions), and achieve sub-millisecond latency at ultra-low power. Best fit: edge devices (drones, mobile, wearables), real-time applications, always-on inference. Timeline: 2026-2028 for edge deployments, 2028-2030 for data center accelerators. Implementation: compile to the Lava framework, deploy on Loihi 2, fall back to CPU/GPU for availability.

**Quantum-inspired algorithms** (TRL 2-3, practical adoption post-2030) leverage quantum computing principles on classical hardware. Quantum-assisted Variational Autoencoders (arxiv:2006.07680) achieve space-efficient indexing with tight coverage on billion-scale datasets. Reality check: true quantum advantage for general vector search is unlikely before 2030 due to error correction requirements (>1000 logical qubits needed, current systems at ~100 physical qubits). Quantum-inspired classical algorithms are more promising short-term—implement quantum walk-inspired similarity measures, amplitude encoding-style projections, and interference-based aggregation, all in classical compute. Monitor quantum computing developments but don’t block on hardware availability.

**Algebraic topology** (TRL 3-5, adoption 2026-2030) applies persistent homology to analyze embedding space structure. TopER (arxiv:2410.01778, October 2024) achieves state-of-the-art results on molecular/biological/social network embeddings. Applications: assess embedding quality (identify mode collapse, degeneracy), guide architecture design (topological regularization during training), detect out-of-distribution queries (queries in topologically distinct regions), topology-aware indexing (cluster data respecting topological structure). Challenge: O(n²-n³) computational complexity limits use to ~100K points—run on samples to guide system design rather than at runtime. Most promising: embedding quality assessment and model development rather than production queries.

**Integration strategy** for advanced techniques: **Phase 1 (2024-2025)**: Add conformal prediction (immediate value, minimal overhead), experiment with learned hash functions for compression. **Phase 2 (2025-2027)**: Implement learned indexes for read-heavy segments, integrate neural hashing as the default. **Phase 3 (2027-2030)**: Add TDA-based embedding quality metrics, prepare for neuromorphic edge deployment. **Phase 4 (2030+)**: Neuromorphic co-processors for real-time queries, quantum-inspired algorithms as quantum hardware matures. This phased approach ensures production-ready performance today while positioning for future innovations.

## Benchmarking and performance targets

Ruvector targets 10-100x performance improvements over current solutions through Rust’s efficiency, algorithmic optimizations, and hardware exploitation. Specific targets are measured against industry-standard benchmarks.

**ANN-Benchmarks framework** (ann-benchmarks.com) provides standardized evaluation on datasets including SIFT1M (128D, 1M vectors), GIST1M (960D, 1M vectors), Deep1M (96D, 1M vectors), and Deep1B (96D, 1B vectors). Metrics: queries per second (QPS) at various recall@k thresholds (typically recall@10), build time, memory usage, index size. Leading implementations (2024 benchmarks): HNSW achieves 90% recall@10 at 10K-50K QPS (single thread), 95% recall@10 at 5K-20K QPS, and 99% recall@10 at 1K-5K QPS depending on parameters.

**Performance targets for Ruvector**:

**Queries per second**: 50K+ QPS at 90% recall@10 (single-threaded HNSW with AVX2 SIMD), 20K+ QPS at 95% recall@10, 100K+ QPS at 90% recall@10 (8-thread parallelism). This represents a 5-10x improvement over unoptimized implementations through SIMD, zero-copy memory access, and lock-free data structures.

**Latency percentiles**: p50 \u003c 0.5ms, p95 \u003c 2ms, p99 \u003c 5ms for 95% recall@10 on 1M 128D vectors. Sub-millisecond p50 achieved through memory-mapped data (zero loading time), SIMD distance calculations (4-8x faster), and cache-friendly data layout. Compare to agenticDB baseline: 5ms for 10K vectors (116x speedup claimed) suggests ruvector should achieve \u003c1ms for 10K vectors, \u003c5ms for 1M vectors.
|
||||
|
||||
**Memory usage**: Base 512 bytes per 128D float32 vector, +640 bytes HNSW overhead (M=32), +128 bytes scalar quantization = 1,280 bytes per vector. With optimizations: 128 bytes scalar quantized storage, 640 bytes HNSW, 47 bytes binary filtering = 815 bytes per vector (63% reduction). For 1M vectors: 815MB vs 2GB unoptimized—massive savings enabling larger datasets in RAM.
|
||||
|
||||
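The per-vector arithmetic above can be checked directly; a small sketch with the figures from the text (the struct and field names are illustrative):

```rust
/// Per-vector memory budget, mirroring the figures above: raw or quantized
/// storage, HNSW adjacency overhead, and any extra per-vector structures.
pub struct VectorBudget {
    pub storage: usize, // f32 storage (dims * 4) or quantized codes
    pub hnsw: usize,    // graph adjacency overhead
    pub extra: usize,   // e.g. scalar-quantized copy or binary filter
}

impl VectorBudget {
    pub fn total(&self) -> usize {
        self.storage + self.hnsw + self.extra
    }
}
```

For 128D vectors this reproduces 1,280 bytes unoptimized (512 + 640 + 128) and 815 bytes optimized (128 + 640 + 47), i.e. roughly 815MB for 1M vectors.
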
**Build time**: 1M vectors in 2-5 minutes (16 cores, efConstruction=200), 10M vectors in 30-60 minutes, 100M vectors in 8-12 hours. Parallelized construction with rayon achieves near-linear scaling to core count. Index serialization with rkyv: <1 second for 1M vectors, enabling fast checkpointing.

**Recall accuracy**: 95%+ recall@10 with efSearch=200 (production target), 99%+ recall@10 with efSearch=500 (high-accuracy mode), 85-90% recall@10 with efSearch=50 (low-latency mode). Quantization impact: scalar (int8) 97-99% recall, product quantization 90-95% recall, binary 80-90% recall. Combined with re-ranking, system achieves 99%+ recall on final results.

**Comparison targets**: Beat FAISS CPU by 2-3x (Rust efficiency + better memory layout), match Qdrant performance (similar Rust+HNSW architecture), exceed Milvus CPU-only by 3-5x (Milvus optimized for GPU), surpass pgvecto.rs by 1.5-2x (pure Rust vs Rust+Postgres overhead), demolish pure Python/JavaScript implementations by 50-100x (compiled vs interpreted). Specific scenario: agenticDB’s claimed 12,500x speedup for 1M vectors suggests baseline ~100 seconds; ruvector target <10ms = 10,000x minimum.

**Benchmark datasets**: Test on SIFT1M (standard 128D benchmark), Deep1B (billion-scale), GIST1M (high-dimensional 960D), MS MARCO passages (semantic search), custom agenticDB workloads (reflexion episodes, skill searches). Dimensions: 128D (embeddings), 384D (sentence-transformers), 768D (BERT), 1536D (OpenAI ada-002), 3072D (text-embedding-3-large).

**Testing methodology**: Implement using criterion.rs for micro-benchmarks (measure distance calculations, HNSW operations), flamegraph for profiling hotspots, perf on Linux for hardware counter analysis, comparative benchmarks against FAISS/Hnswlib bindings. Continuous performance monitoring: track QPS, latency percentiles, memory usage per git commit, alert on regressions >5%.
## Implementation roadmap and architecture decisions

Ruvector development follows a phased approach balancing immediate production-readiness with future extensibility. Core principle: ship a working system fast, then optimize.

**Phase 1 (Weeks 1-4): Foundation**

- Core traits: DistanceMetric, VectorStorage, IndexStructure
- Basic vector storage with redb for metadata, memmap2 for vectors
- Distance calculations with SimSIMD (production-ready SIMD)
- Simple brute-force search as baseline
- NAPI-RS scaffolding for Node.js bindings
- Test harness with criterion benchmarks

**Deliverable**: Working vector database with insert/search, 10K vectors @ 50ms

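The Phase 1 trait surface and brute-force baseline can be sketched as follows (names follow the plan; the real signatures may differ). The flat scan is the correctness baseline that HNSW results are later checked against:

```rust
/// Pluggable distance metric, as in the Phase 1 trait list.
pub trait DistanceMetric {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32;
}

pub struct SquaredL2;

impl DistanceMetric for SquaredL2 {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    }
}

/// Brute-force baseline: index of the nearest stored vector, O(n * d).
pub fn brute_force_nearest<M: DistanceMetric>(m: &M, query: &[f32], data: &[Vec<f32>]) -> usize {
    data.iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| {
            m.distance(query, a).partial_cmp(&m.distance(query, b)).unwrap()
        })
        .map(|(i, _)| i)
        .unwrap()
}
```
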
**Phase 2 (Weeks 5-8): HNSW indexing**

- Integrate hnsw_rs crate with custom optimizations
- Implement HNSW construction with parallel building (rayon)
- Serialize/deserialize with rkyv for instant loading
- Batch operations for efficient bulk insertion
- Add scalar quantization (int8) for 4x compression
- Performance target: 1M vectors @ <5ms search

**Deliverable**: Production-grade HNSW with quantization

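The int8 scalar quantization planned here (4x compression over f32) can be sketched with a single symmetric scale per vector; production code would likely pick scales per dimension or per block:

```rust
/// Symmetric int8 scalar quantization: one scale per vector, values mapped
/// into [-127, 127]. Sketch of the Phase 2 feature, not the shipped API.
pub fn quantize_i8(v: &[f32]) -> (Vec<i8>, f32) {
    let max = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
    (v.iter().map(|x| (x / scale).round() as i8).collect(), scale)
}

/// Reconstruct approximate f32 values from the codes.
pub fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}
```

The round trip loses at most half a quantization step per component, which is why the recall-accuracy section below reports 97-99% recall for scalar quantization.
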
**Phase 3 (Weeks 9-12): AgenticDB compatibility**

- Implement five-table schema (vectors, reflexion, skills, causal, learning)
- Reflexion memory API (store episodes, critique, retrieve)
- Skill library (create, search, auto-consolidate)
- Causal graph (add edges, query with utility function)
- Learning session management (start, predict, feedback, train)
- Full agenticDB API surface compatibility

**Deliverable**: Drop-in agenticDB replacement with 10-100x speedup

**Phase 4 (Weeks 13-16): Advanced features**

- Product quantization for 8-16x compression
- Filtered search with pre/post-filtering strategies
- MMR (Maximal Marginal Relevance) for diverse results
- Hybrid search (vector + keyword via tantivy integration)
- Conformal prediction for uncertainty quantification
- Monitoring/observability (metrics, tracing)

**Deliverable**: Production-ready with enterprise features

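The MMR re-ranking listed for Phase 4 greedily selects results that balance query relevance against redundancy with already-chosen items. A minimal sketch (function shape is illustrative; `lambda = 1.0` reduces to pure relevance ranking):

```rust
/// Maximal Marginal Relevance: greedily pick up to k indices maximizing
/// lambda * sim(query, i) - (1 - lambda) * max_{j in selected} sim(i, j).
pub fn mmr(query_sim: &[f32], pairwise_sim: &[Vec<f32>], lambda: f32, k: usize) -> Vec<usize> {
    let n = query_sim.len();
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..n).collect();
    while selected.len() < k && !remaining.is_empty() {
        let score = |i: usize, selected: &[usize]| {
            let max_sim = selected
                .iter()
                .map(|&j| pairwise_sim[i][j])
                .fold(0.0f32, f32::max);
            lambda * query_sim[i] - (1.0 - lambda) * max_sim
        };
        let (pos, &best) = remaining
            .iter()
            .enumerate()
            .max_by(|(_, &a), (_, &b)| {
                score(a, &selected).partial_cmp(&score(b, &selected)).unwrap()
            })
            .unwrap();
        selected.push(best);
        remaining.remove(pos);
    }
    selected
}
```
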
**Phase 5 (Weeks 17-20): Multi-platform deployment**

- WASM compilation with SIMD support (dual builds)
- Browser integration: Web Workers, SharedArrayBuffer, IndexedDB persistence
- WASM Component Model for future interoperability
- Cross-compilation for Linux/macOS/Windows ARM/x64
- NPM packaging with platform-specific optional dependencies
- Docker containers for server deployment

**Deliverable**: “Deploy anywhere” capability

**Phase 6 (Weeks 21-24): Advanced techniques**

- Hypergraph structures for n-ary relationships (opt-in)
- Temporal hypergraphs for agent memory
- Learned hash functions for improved compression
- TDA-based embedding quality metrics
- Integration examples (RAG, semantic search, recommender systems)
- Performance optimization: profile-guided optimization, SIMD tuning

**Deliverable**: Research-grade features, comprehensive examples

**Architecture principles**:

- **Modularity**: Trait-based abstractions enable swapping implementations (different indexes, distance metrics, storage backends)
- **Performance**: Zero-cost abstractions, SIMD by default, lock-free where possible
- **Safety**: Rust’s type system prevents memory errors, data races—critical for production database
- **Compatibility**: AgenticDB API 1:1 compatibility, migration path from existing deployments
- **Extensibility**: Plugin architecture for custom distance metrics, index types, quantization methods
- **Observability**: Structured logging (tracing crate), metrics (prometheus), profiling hooks

**Key technical decisions**:

- **Storage**: redb for metadata (ACID, pure Rust) + memmap2 for vectors (zero-copy, OS-managed caching). Alternative: sled for lock-free updates if workload is write-heavy.
- **Indexing**: HNSW as primary (best recall-latency tradeoff), IVF as alternative for memory-constrained environments. Flat index for <10K vectors.
- **Distance metrics**: SimSIMD for SIMD-optimized implementations (L2, cosine, dot product), std::arch for custom metrics requiring specific math.
- **Quantization**: Scalar (int8) default for 4x compression, product quantization opt-in for 8-16x, binary for filtering stages.
- **Parallelism**: rayon for data parallelism (batch operations, parallel search), crossbeam for pipelines (query processing), tokio for async I/O (if needed for network features).
- **Serialization**: rkyv for index persistence (zero-copy loading), bincode for network protocol (compact encoding), JSON for configuration/metadata (human-readable).
- **Node.js bindings**: NAPI-RS exclusively (modern, performant, well-maintained). Automatic TypeScript generation.
- **WASM**: wasm-pack for building, wasm-bindgen for JavaScript integration, dual SIMD/non-SIMD builds with feature detection.

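The memmap2-plus-flat-layout storage decision above relies on vectors living at fixed strides in one contiguous file. A minimal sketch of that layout, with a `Vec<u8>` standing in for the mapped region (the helper name is illustrative; this version copies per element when decoding, while an aligned mapping can be reinterpreted without copying):

```rust
/// Fixed-stride flat layout for a memory-mapped vector file: vector i
/// occupies bytes [i * dims * 4, (i + 1) * dims * 4), little-endian f32.
pub fn vector_at(buf: &[u8], dims: usize, i: usize) -> Vec<f32> {
    let start = i * dims * 4;
    buf[start..start + dims * 4]
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```
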
**Rust crates ecosystem**:

```toml
[dependencies]
# Core functionality
redb = "2.0"       # LMDB-inspired storage
memmap2 = "0.9"    # Memory-mapped files
hnsw_rs = "0.3"    # HNSW implementation
simsimd = "5.0"    # SIMD distance metrics
rayon = "1.10"     # Data parallelism
crossbeam = "0.8"  # Lock-free data structures

# Serialization
rkyv = "0.8"       # Zero-copy serialization
bincode = "2.0"    # Compact encoding
serde = "1.0"      # Serialization framework

# Node.js bindings
napi = "3.0"
napi-derive = "3.0"

# Async (if needed)
tokio = { version = "1.40", features = ["rt-multi-thread"] }

# Utilities
thiserror = "1.0"  # Error handling
tracing = "0.1"    # Structured logging
criterion = "0.5"  # Benchmarking
```
## Production deployment and operational considerations

Ruvector’s design prioritizes zero-ops deployment—minimal configuration, instant startup, automatic optimization—while providing expert controls when needed.

**Initialization patterns**: Simplest path `let db = VectorDB::new(dimensions)?;` uses sensible defaults (HNSW M=32, efConstruction=200, scalar quantization). Advanced configuration:
```rust
let db = VectorDB::builder()
    .dimensions(384)
    .max_elements(10_000_000)
    .hnsw_m(64)                // More connections for better recall
    .hnsw_ef_construction(400) // Higher quality index
    .quantization(Quantization::Product { subspaces: 16, k: 256 })
    .distance_metric(DistanceMetric::Cosine)
    .storage_path("./vectors.db")
    .mmap_vectors(true)        // Memory-map for large datasets
    .build()?;
```
**Scaling strategies**: Vertical scaling to 128+ cores via rayon parallelism, support datasets larger than RAM via mmap (tested to 100GB+ on 16GB RAM systems), automatic hot/cold tiering (frequently accessed vectors in RAM, cold in mmap). Horizontal scaling (future): consistent hashing for shard assignment, scatter-gather query processing, replica sets for high availability. Initial focus: single-node vertical scaling handles most workloads (<100M vectors).

**Resource management**: Memory budget awareness—query available RAM, decide quantization level, warn when approaching limits. CPU pinning for consistent latency—use `core_affinity` crate to bind threads to specific cores, reducing context switches. Huge pages for large allocations (>2MB)—5-10% performance improvement, requires system configuration.

**Monitoring and observability**: Export prometheus metrics (qps, latency histograms, memory usage, index size), structured logging via tracing (query details, build progress, errors), health checks (API endpoint returning index statistics, readiness for load balancers), performance tracking (record p50/p95/p99 latencies, alert on degradation).

**Backup and recovery**: Atomic snapshots via rkyv serialization (write index to temporary file, atomically rename), incremental backups (track changed vectors since last backup, serialize delta), point-in-time recovery (store WAL of recent operations, replay from snapshot), automatic crash recovery (redb handles corruption via checksums and ACID properties).

**Upgrade paths**: Backward-compatible index format (version in header, support reading older versions), migration utilities (re-index in background, atomic switchover), rolling updates in distributed deployments. Critical: don’t break existing agenticDB applications during upgrades.

**Security considerations**: Memory safety via Rust (prevents buffer overflows, use-after-free, data races), input validation (check vector dimensions match, reject malformed queries), DoS prevention (query timeouts, rate limiting, resource quotas), dependency scanning (cargo-audit for vulnerabilities). No authentication/authorization in core library—expect deployment environment to handle (reverse proxy, service mesh).

**Compliance and privacy**: On-premises deployment option for sensitive data, no telemetry by default (opt-in only), clear data retention policies (explicit delete operations), memory zeroing for deleted vectors (prevent information leakage), audit logging (track all operations if compliance requires).
## Conclusion: Building the future of vector search

Ruvector synthesizes battle-tested algorithms (HNSW, product quantization), modern systems research (learned indexes, conformal prediction, hypergraphs), and Rust’s unique performance characteristics into a cohesive next-generation vector database. The core value proposition is clear: **10-100x performance improvements over current solutions while supporting deployment everywhere—from data centers to browsers to edge devices.**

**Immediate competitive advantages**: Sub-millisecond latency through SIMD-optimized distance calculations and zero-copy memory access. AgenticDB API compatibility enabling seamless migration for existing applications with instant 10-100x speedups. Multi-platform deployment (Node.js, browser WASM, native Rust) from single codebase where competitors require separate implementations. Offline-first capability for browsers opening new use cases impossible with cloud-only solutions. Memory efficiency through aggressive quantization (4-32x compression) enabling larger datasets on constrained hardware.

**Long-term differentiation**: Hypergraph structures supporting n-ary relationships beyond pairwise similarity, enabling advanced agent reasoning. Temporal memory for agent continuity and learning. Causal inference through graph structures identifying which actions lead to outcomes. Conformal prediction providing uncertainty quantification for trustworthy AI. Clear integration path for emerging techniques (neuromorphic hardware, learned indexes) as they mature. Architecture designed for 10-year horizon while shipping production-ready features today.

**Technical excellence**: Rust’s zero-cost abstractions provide C++ performance with memory safety, eliminating entire classes of bugs. Lock-free concurrency scales to 128+ cores without traditional locking overhead. SIMD intrinsics accelerate distance calculations 4-16x over scalar code. Memory-mapped files enable instant startup and datasets exceeding RAM. Zero-copy serialization with rkyv provides sub-second loading for billion-scale indexes. These techniques compound—each 2-3x improvement multiplies to 100x+ overall.

**Market opportunity**: Vector databases are infrastructure for modern AI (RAG, semantic search, recommenders, embeddings-based applications). Market growing 50%+ annually as transformer models proliferate. Existing solutions constrained by interpreted languages (Python, TypeScript), cloud-only deployment, or limited platform support. Ruvector fills a critical gap: high-performance, deploy-anywhere vector database with cognitive capabilities for agentic AI. Target users: AI application developers, edge computing deployments, privacy-sensitive enterprises, browser-based AI applications.

**Success metrics**: Technical—achieve 50K+ QPS at 95% recall, <1ms p50 latency, 100M+ vectors on single node. Adoption—1000+ npm downloads/month in the first year, use in production AI applications, contributions from external developers. Ecosystem—integration examples with LangChain/LlamaIndex, deployment templates for common scenarios, performance benchmarks vs competitors showing 10-100x improvements.

**The path forward**: Follow the six-phase roadmap (foundation → HNSW → agenticDB compat → advanced features → multi-platform → research techniques) delivering incremental value at each stage. Phases 1-3 (weeks 1-12) produce a production-ready agenticDB replacement—immediate user value. Phases 4-5 (weeks 13-20) add enterprise features and multi-platform deployment—broadening the market. Phase 6 (weeks 21-24) integrates research advances—thought leadership. Continuous optimization throughout using profile-guided optimization, SIMD tuning, and algorithmic improvements.

Ruvector positions at the intersection of systems research, AI infrastructure, and production engineering—combining academic rigor with industrial pragmatism. The foundation (Rust, HNSW, quantization) is proven and battle-tested. The extensions (hypergraphs, learned indexes, conformal prediction) are emerging but with clear research validation. The vision (deploy-anywhere, cognitive capabilities, 10-100x performance) is ambitious but achievable through disciplined engineering and leveraging Rust’s unique strengths.

**Start building today.** The opportunity is clear, the technology is ready, and the market is hungry for high-performance vector search that works everywhere. Ruvector can redefine what’s possible.
921
vendor/ruvector/docs/architecture/attention-exotic-ai-autonomous-systems.md
vendored
Normal file
# RuVector Exotic AI & Autonomous Systems Implementation Plan

**Version**: 1.0
**Date**: 2025-01-01
**Scope**: Additional attention mechanisms, self-learning systems, MicroLoRA, self-optimization, and autonomous business infrastructure

---

## Executive Summary

This plan outlines the implementation of advanced AI/agentic features for the RuVector Edge-Net service, drawing from existing WASM modules and introducing exotic capabilities for self-sustaining, self-learning distributed intelligence networks.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                      RUVECTOR EXOTIC AI ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                    AUTONOMOUS BUSINESS LAYER                         │   │
│  │  • Credit Economy  • Contribution Curves  • Self-Sustaining Markets  │   │
│  └───────────────────────────────┬──────────────────────────────────────┘   │
│                                  │                                          │
│  ┌───────────────────────────────▼──────────────────────────────────────┐   │
│  │                    SELF-OPTIMIZATION LAYER                           │   │
│  │  • MicroLoRA Adaptation  • SONA Learning  • MinCut Coherence Control │   │
│  └───────────────────────────────┬──────────────────────────────────────┘   │
│                                  │                                          │
│  ┌───────────────────────────────▼──────────────────────────────────────┐   │
│  │                    ATTENTION MECHANISMS LAYER                        │   │
│  │  7 DAG + 7 Neural + Nervous System + Hyperbolic + MoE + Flash        │   │
│  └───────────────────────────────┬──────────────────────────────────────┘   │
│                                  │                                          │
│  ┌───────────────────────────────▼──────────────────────────────────────┐   │
│  │                    WASM EXECUTION LAYER                              │   │
│  │  • 58KB Bundles  • SIMD128  • Zero-Copy  • Web Workers               │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
---

## Part 1: Attention Mechanisms Inventory

### 1.1 Existing WASM Attention Modules

| Crate | Mechanisms | Binary Size | Latency |
|-------|------------|-------------|---------|
| `ruvector-attention-wasm` | Multi-Head, Hyperbolic, Linear, Flash, Local-Global, MoE, Scaled Dot-Product | ~50KB | <100μs |
| `ruvector-mincut-gated-transformer-wasm` | MinCut-Gated Transformer with coherence control | ~50KB | <1ms |
| `ruvector-dag-wasm` | Topological, Causal Cone, Critical Path, MinCut-Gated, Hierarchical Lorentz, Parallel Branch, Temporal BTSP | 58KB | <100μs |
| `ruvector-gnn-wasm` | GCN, GAT (Graph Attention), GraphSAGE | ~60KB | <15ms |
| `ruvector-nervous-system` | Global Workspace, Oscillatory Routing, Predictive Coding | N/A (native) | <1ms |

### 1.2 Attention Mechanisms Detail

#### 1.2.1 Neural Attention (ruvector-attention-wasm)
```typescript
// Already implemented - 7 mechanisms
interface AttentionMechanisms {
  // 1. Scaled Dot-Product: O(n²) standard transformer attention
  scaledDotProduct(Q, K, V): Float32Array;

  // 2. Multi-Head Attention: Parallel attention with multiple heads
  multiHead(query, keys, values, numHeads): Float32Array;

  // 3. Hyperbolic Attention: For hierarchical data in Poincaré space
  hyperbolic(query, keys, values, curvature): Float32Array;

  // 4. Linear Attention: O(n) Performer-style random features
  linear(query, keys, values): Float32Array;

  // 5. Flash Attention: Memory-efficient tiled computation
  flash(query, keys, values): Float32Array;

  // 6. Local-Global: Combined windowed + global tokens
  localGlobal(query, keys, values, windowSize): Float32Array;

  // 7. MoE Attention: Mixture of Experts with routing
  moe(query, keys, values, numExperts, topK): Float32Array;
}
```
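As a reference point for mechanism 1 above, a scalar single-query version of scaled dot-product attention can be written in a few lines (a sketch only; the WASM build vectorizes this with SIMD128):

```rust
/// Single-query scaled dot-product attention: softmax(q · K^T / sqrt(d)) · V.
pub fn scaled_dot_product(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let d = q.len() as f32;
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() / d.sqrt())
        .collect();
    // Numerically stable softmax over the scores
    let max = scores.iter().fold(f32::NEG_INFINITY, |m, &s| m.max(s));
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // Weighted sum of value vectors
    let mut out = vec![0.0; values[0].len()];
    for (w, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / sum) * x;
        }
    }
    out
}
```
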
#### 1.2.2 DAG Attention (ruvector-dag-wasm)

```typescript
// Already implemented - 7 mechanisms with MinCut control
interface DagAttentionMechanisms {
  // 1. Topological: Position-based in DAG order
  topological(dag): AttentionScores;

  // 2. Causal Cone: Downstream impact analysis
  causalCone(dag, node): AttentionScores;

  // 3. Critical Path: Latency-focused bottleneck attention
  criticalPath(dag): AttentionScores;

  // 4. MinCut-Gated: Flow-weighted attention with coherence
  mincutGated(dag, gatePacket): AttentionScores;

  // 5. Hierarchical Lorentz: Deep hierarchy in Lorentzian space
  hierarchicalLorentz(dag, depth): AttentionScores;

  // 6. Parallel Branch: Wide parallel execution weighting
  parallelBranch(dag): AttentionScores;

  // 7. Temporal BTSP: Time-series behavioral plasticity
  temporalBtsp(dag, timeWindow): AttentionScores;
}
```
#### 1.2.3 Graph Attention (ruvector-gnn-wasm)

```typescript
// Graph neural network attention for HNSW topology
interface GraphAttentionMechanisms {
  // GAT: Multi-head attention over graph edges
  gatForward(features, adjacency, numHeads): NodeEmbeddings;

  // GCN: Spectral graph convolution
  gcnForward(features, adjacency): NodeEmbeddings;

  // GraphSAGE: Inductive sampling-based
  sageForward(features, adjacency, sampleSizes): NodeEmbeddings;
}
```
#### 1.2.4 Nervous System Attention (ruvector-nervous-system)

```rust
// Bio-inspired attention from nervous system
pub trait NervousAttention {
    // Global Workspace: 4-7 item bottleneck (Miller's Law)
    fn global_workspace(&mut self, inputs: &[Representation]) -> Vec<Representation>;

    // Oscillatory Routing: Phase-coupled 40Hz gamma coordination
    fn oscillatory_route(&mut self, sender: usize, receiver: usize) -> f32;

    // Predictive Coding: Only transmit surprises (90-99% bandwidth reduction)
    fn predictive_code(&mut self, input: &[f32], prediction: &[f32]) -> Vec<f32>;

    // K-WTA Competition: Winner-take-all in <1μs
    fn k_winner_take_all(&mut self, activations: &[f32], k: usize) -> Vec<usize>;
}
```
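The K-WTA competition primitive from the trait above reduces to selecting the indices of the k largest activations; a minimal free-function sketch:

```rust
/// K-winner-take-all: indices of the k largest activations, strongest first.
pub fn k_wta(activations: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..activations.len()).collect();
    idx.sort_by(|&a, &b| activations[b].partial_cmp(&activations[a]).unwrap());
    idx.truncate(k);
    idx
}
```

A full sort is O(n log n); the sub-microsecond figure in the trait comment comes from partial selection over small activation arrays.
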
### 1.3 New Attention Mechanisms to Implement

| Mechanism | Description | Target Crate |
|-----------|-------------|--------------|
| **Mamba SSM** | State-space model attention (O(n) selective scan) | `ruvector-attention-wasm` |
| **Differential Attention** | Subtract attention heads for noise cancellation | `ruvector-attention-wasm` |
| **Sparse Transformer** | Block-sparse patterns for long sequences | `ruvector-attention-wasm` |
| **Hierarchical Hopfield** | Exponential pattern storage via modern Hopfield | `ruvector-nervous-system-wasm` |
| **HDC Attention** | Hyperdimensional computing similarity in 10,000-bit space | `ruvector-nervous-system-wasm` |

---
## Part 2: Self-Learning Systems

### 2.1 SONA (Self-Optimizing Neural Architecture)

**Location**: `ruvector-dag` (already implemented)

SONA learns from query execution patterns and continuously optimizes performance without manual tuning.
```rust
pub struct SonaEngine {
    // Pattern embeddings (256-dim per query signature)
    embeddings: HashMap<PatternId, [f32; 256]>,

    // MicroLoRA weights (rank-2, per operator type)
    lora_weights: HashMap<OperatorType, [[f32; 2]; 2]>,

    // Trajectory statistics
    trajectories: VecDeque<Trajectory>,

    // EWC for catastrophic forgetting prevention
    fisher_information: HashMap<String, f32>,
}

impl SonaEngine {
    // Pre-query: Get enhanced embedding (fast path, <1μs)
    pub fn pre_query(&self, dag: &QueryDag) -> EnhancedEmbedding;

    // Post-query: Record trajectory (async, background)
    pub fn post_query(&mut self, dag: &QueryDag, latency: Duration, baseline: Duration);

    // Background learning (separate thread)
    pub fn background_learn(&mut self);
}
```
**Key Features**:

- **MicroLoRA**: Rank-2 adaptation in <100μs per update
- **EWC Consolidation**: λ=5000 prevents catastrophic forgetting
- **Trajectory Replay**: 10,000 pattern capacity with FIFO eviction
- **Pattern Matching**: K-means++ indexing for <2ms search in 10K patterns

### 2.2 BTSP (Behavioral Timescale Synaptic Plasticity)

**Location**: `ruvector-nervous-system`

One-shot learning from single examples (1-3 second behavioral windows).
```rust
pub struct BTSPLayer {
    weights: Array2<f32>,
    eligibility_traces: Array2<f32>,
    plateau_potentials: Vec<f32>,
    learning_window_ms: f32, // 1000-3000ms typical
}

impl BTSPLayer {
    // Learn from single exposure - no batch training required
    pub fn one_shot_associate(&mut self, pattern: &[f32], teaching_signal: f32) {
        // Bidirectional plasticity based on eligibility traces
        let trace = self.compute_eligibility(pattern);
        self.weights += teaching_signal * trace;
    }

    // Immediate recall after one-shot learning
    pub fn forward(&self, pattern: &[f32]) -> Vec<f32>;
}
```
### 2.3 E-prop (Eligibility Propagation)

**Location**: `ruvector-nervous-system`

Online learning with O(1) memory per synapse (12 bytes).
```rust
pub struct EpropSynapse {
    weight: f32,          // 4 bytes
    eligibility: f32,     // 4 bytes
    learning_signal: f32, // 4 bytes
    // Total: 12 bytes per synapse
}

impl EpropLayer {
    // Temporal credit assignment over 1000+ ms
    pub fn forward_with_eligibility(&mut self, input: &[f32]) -> Vec<f32>;

    // Online weight update (no BPTT required)
    pub fn update(&mut self, reward_signal: f32);
}
```
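The 12-byte-per-synapse claim above is just three `f32` fields with no padding, which `std::mem::size_of` confirms directly:

```rust
/// Mirror of the EpropSynapse layout above: three f32 fields, no padding.
pub struct EpropSynapse {
    pub weight: f32,
    pub eligibility: f32,
    pub learning_signal: f32,
}
```

At this size a million online-learning synapses fit in 12 MB, which is what makes e-prop viable on edge devices.
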
### 2.4 ReasoningBank Intelligence

**Location**: `.ruvector/intelligence.json` (Q-learning patterns)
```json
{
  "patterns": {
    "state_signature": {
      "action": "agent_type",
      "q_value": 0.85,
      "count": 42,
      "last_update": "2025-01-01T00:00:00Z"
    }
  },
  "memories": [
    {
      "content": "semantic embedding",
      "embedding": [0.1, 0.2, ...],
      "type": "swarm|session|permanent"
    }
  ],
  "trajectories": [
    {
      "state": "file_edit",
      "action": "rust-developer",
      "reward": 1.0,
      "next_state": "success"
    }
  ]
}
```
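The `q_value` and `trajectories` fields above correspond to the standard tabular Q-learning update, Q(s,a) ← Q(s,a) + α · (r + γ · max Q(s′,·) − Q(s,a)). A one-line sketch (α and γ here are illustrative, not values read from intelligence.json):

```rust
/// Tabular Q-learning update for one (state, action, reward, next_state)
/// trajectory entry. `alpha` is the learning rate, `gamma` the discount.
pub fn q_update(q: f32, reward: f32, max_next_q: f32, alpha: f32, gamma: f32) -> f32 {
    q + alpha * (reward + gamma * max_next_q - q)
}
```
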
---

## Part 3: Self-Optimization Systems

### 3.1 MinCut Coherence Control

**Location**: `ruvector-mincut-wasm`

The central control signal for all self-optimization.
```
MinCut Tension → Triggers:
  ├── Attention switching (Topological → MinCut-Gated)
  ├── SONA learning rate boost (2x when tension > 0.7)
  ├── Predictive healing intervention
  ├── Cache invalidation
  └── Resource reallocation
```

**Performance**: O(n^0.12) subpolynomial updates, verified empirically.
### 3.2 Tiny Dancer Router

**Location**: `ruvector-tiny-dancer-wasm`

AI request routing for 70-85% LLM cost reduction.
```typescript
interface TinyDancerRouter {
  // Route decision in <10μs
  route(candidates: Candidate[]): RoutingDecision;

  // Confidence-based model selection
  // High confidence → lightweight model (cheap)
  // Low confidence → powerful model (expensive)
}
```

**Latency Breakdown**:

- Feature extraction: 144ns (384-dim vectors)
- Model inference: 7.5μs
- Complete routing: 92.86μs (100 candidates)

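The confidence-gated selection rule described in the interface comments above can be sketched in a few lines (the threshold and model names are illustrative, not the crate's actual types):

```rust
/// Confidence-gated model selection: high-confidence requests go to the
/// cheap model, the rest escalate to the expensive one.
#[derive(Debug, PartialEq)]
pub enum Model {
    Lightweight,
    Powerful,
}

pub fn route(confidence: f32, threshold: f32) -> Model {
    if confidence >= threshold {
        Model::Lightweight
    } else {
        Model::Powerful
    }
}
```

The 70-85% cost reduction comes from the fraction of traffic that clears the threshold and never reaches the expensive model.
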
### 3.3 Circadian Controller

**Location**: `ruvector-nervous-system`

5-50x compute savings via duty cycling.
```rust
pub struct CircadianController {
    phase: CircadianPhase, // Active, Dawn, Dusk, Rest
    coherence: f32,
    period_hours: f32,
}

impl CircadianController {
    pub fn should_compute(&self) -> bool;
    pub fn should_learn(&self) -> bool;
    pub fn should_consolidate(&self) -> bool;
    pub fn duty_factor(&self) -> f32; // 0.0 - 1.0
}
```
### 3.4 Self-Healing Orchestrator

**Location**: `ruvector-dag`

Reactive + predictive anomaly detection and repair.
```rust
pub struct HealingOrchestrator {
    // Reactive: Z-score anomaly detection
    detectors: HashMap<String, AnomalyDetector>,

    // Predictive: Rising tension triggers early intervention
    predictive_config: PredictiveConfig,
}

impl HealingOrchestrator {
    // Reactive healing
    pub fn detect_anomalies(&self) -> Vec<Anomaly>;

    // Predictive intervention
    pub fn predict_and_prepare(&self, mincut_analysis: &MinCutAnalysis);
}
```
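The reactive path above uses Z-score detection: a sample is anomalous when its distance from the mean of recent samples exceeds some number of standard deviations. A minimal sketch (the function name and batch-based statistics are illustrative; the real detector would maintain running statistics):

```rust
/// Z-score anomaly check: true when `value` lies more than `z_max` standard
/// deviations from the mean of `samples`.
pub fn is_anomaly(samples: &[f32], value: f32, z_max: f32) -> bool {
    let n = samples.len() as f32;
    let mean = samples.iter().sum::<f32>() / n;
    let var = samples.iter().map(|x| (x - mean) * (x - mean)).sum::<f32>() / n;
    let std = var.sqrt();
    std > 0.0 && ((value - mean).abs() / std) > z_max
}
```
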
---

## Part 4: MicroLoRA Implementation

### 4.1 Architecture

MicroLoRA provides instant adaptation (<100μs) with minimal parameter overhead.
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ MicroLoRA Architecture │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Base Model Weights (Frozen) │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ W_base: [hidden_dim × hidden_dim] │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ + │
|
||||
│ LoRA Adaptation (Trainable, Rank-2) │
|
||||
│ ┌────────────┐ ┌────────────┐ │
|
||||
│ │ A: [d × 2] │ × │ B: [2 × d] │ = ΔW: [d × d] │
|
||||
│ └────────────┘ └────────────┘ │
|
||||
│ ▲ ▲ │
|
||||
│ │ │ │
|
||||
│ └───────────────────┴───── Per-operator-type weights │
|
||||
│ │
|
||||
│ Effective Weight: W = W_base + α × (A × B) │
|
||||
│ Where α = scaling factor (typically 0.1) │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
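
The effective-weight equation above never needs the d×d ΔW to be materialized: applying a rank-2 update to an embedding costs only two small matrix-vector products. A minimal sketch of this idea (`lora_adapt` is an illustrative helper, not the crate's API):

```rust
/// Apply a rank-2 LoRA update: y = x + alpha * A * (B * x).
/// `a` holds the two columns of A (each length d); `b` holds the two rows of B.
fn lora_adapt(x: &[f32], a: &[Vec<f32>; 2], b: &[Vec<f32>; 2], alpha: f32) -> Vec<f32> {
    // h = B * x, a 2-vector (the down projection)
    let h: Vec<f32> = b
        .iter()
        .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum())
        .collect();
    // y = x + alpha * A * h (the up projection added onto the base embedding)
    (0..x.len())
        .map(|i| x[i] + alpha * (a[0][i] * h[0] + a[1][i] * h[1]))
        .collect()
}
```

At rank 2 this adds only 4·d trainable parameters per operator type, which is what makes the <100μs adaptation budget plausible.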

### 4.2 Scoped Adaptation

```rust
pub struct MicroLoraWeights {
    // One LoRA pair per operator type
    pub weights: HashMap<OperatorType, LoRAPair>,
}

pub struct LoRAPair {
    pub a: [[f32; 2]; EMBED_DIM],  // Down projection
    pub b: [[f32; EMBED_DIM]; 2],  // Up projection
    pub alpha: f32,                // Scaling factor
}

impl MicroLoraWeights {
    // Apply LoRA in <100μs
    pub fn adapt(&self, base_embedding: &[f32], op_type: OperatorType) -> Vec<f32> {
        let Some(lora) = self.weights.get(&op_type) else {
            return base_embedding.to_vec();
        };
        // ΔW·x = A·(B·x): apply the rank-2 update without materializing ΔW
        let h = matvec(&lora.b, base_embedding); // [2]
        let delta = matvec(&lora.a, &h);         // [EMBED_DIM]
        base_embedding.iter()
            .zip(delta.iter())
            .map(|(base, d)| base + lora.alpha * d)
            .collect()
    }

    // Update from trajectory in background
    pub fn update(&mut self, trajectory: &Trajectory, learning_rate: f32);
}
```

### 4.3 Training Pipeline

```
Query Execution → Trajectory Recording → Background Update
       │                   │                      │
       ▼                   ▼                      ▼
   Measure          (pattern, latency,     Update LoRA weights
   latency           baseline, mechanism)  via gradient descent
                                                  │
                                                  ▼
                                          EWC Consolidation
                                          (prevent forgetting)
```

---

## Part 5: Autonomous Business Infrastructure

### 5.1 Credit Economy Model

**Location**: `examples/edge-net`

Self-sustaining P2P compute marketplace.

```
┌─────────────────────────────────────────────────────────────────┐
│                     CREDIT ECONOMY FLOW                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   EARNING                              SPENDING                 │
│   ───────                              ────────                 │
│                                                                 │
│   ┌─────────────┐                      ┌─────────────┐          │
│   │ Compute     │ ──► 1 credit/        │ Submit Task │ ──► Pay  │
│   │ Task        │     task unit        │             │  credits │
│   └─────────────┘                      └─────────────┘          │
│                                                                 │
│   ┌─────────────┐                      ┌─────────────┐          │
│   │ Uptime      │ ──► 0.1 credit/      │ Priority    │ ──► 2x   │
│   │ Bonus       │     hour online      │ Execution   │  credits │
│   └─────────────┘                      └─────────────┘          │
│                                                                 │
│   ┌─────────────┐                      ┌─────────────┐          │
│   │ Early       │ ──► 10x → 1x         │ Storage     │ ──► 0.01/│
│   │ Adopter     │     multiplier       │ (Vectors)   │  MB/day  │
│   └─────────────┘                      └─────────────┘          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### 5.2 Contribution Curve

```rust
// Exponential decay incentivizing early adoption
fn contribution_multiplier(network_compute: f64) -> f64 {
    const MAX_BONUS: f64 = 10.0;
    const DECAY_CONSTANT: f64 = 1_000_000.0; // CPU-hours

    1.0 + (MAX_BONUS - 1.0) * (-network_compute / DECAY_CONSTANT).exp()
}

// Progression (values computed from the formula above):
// Genesis (0 hours):  10.0x
// 100K CPU-hours:      9.1x
// 500K CPU-hours:      6.5x
// 1M CPU-hours:        4.3x
// 5M CPU-hours:        1.1x
// 10M+ CPU-hours:      1.0x
```

### 5.3 CRDT Ledger

```rust
pub struct CreditLedger {
    // G-Counter: monotonically increasing credits earned
    earned: HashMap<NodeId, u64>,

    // PN-Counter: credits spent (can be disputed)
    spent: HashMap<NodeId, (u64, u64)>,

    // Merkle root for quick verification
    state_root: [u8; 32],
}

impl CreditLedger {
    // CRDT merge: take max of each counter
    pub fn merge(&mut self, other: &CreditLedger) {
        for (node, value) in &other.earned {
            self.earned.entry(*node)
                .and_modify(|v| *v = (*v).max(*value))
                .or_insert(*value);
        }
    }
}
```
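
The `merge` above is the standard G-counter join: a pointwise max over per-node counters, which makes replicas converge regardless of merge order or message duplication. A self-contained sketch with integer node IDs (types simplified from the struct above):

```rust
use std::collections::HashMap;

// Minimal G-counter: one monotonically increasing counter per node.
#[derive(Default, Clone)]
struct GCounter {
    earned: HashMap<u32, u64>,
}

impl GCounter {
    // Each node only ever increments its own slot.
    fn earn(&mut self, node: u32, amount: u64) {
        *self.earned.entry(node).or_insert(0) += amount;
    }

    // CRDT join: pointwise max, so merging is commutative and idempotent.
    fn merge(&mut self, other: &GCounter) {
        for (node, value) in &other.earned {
            self.earned
                .entry(*node)
                .and_modify(|v| *v = (*v).max(*value))
                .or_insert(*value);
        }
    }

    fn total(&self) -> u64 {
        self.earned.values().sum()
    }
}
```

Merging in either direction yields the same state, which is exactly the property that lets credit balances sync peer-to-peer without coordination.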

### 5.4 Autonomous Agent Economy

```
┌─────────────────────────────────────────────────────────────────┐
│              AUTONOMOUS AGENT BUSINESS MODEL                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  AGENTS AS ECONOMIC ACTORS                                      │
│  ─────────────────────────                                      │
│                                                                 │
│  1. SPECIALIZATION                                              │
│     └── Agents optimize for specific task types                 │
│     └── Higher reputation = more tasks = more credits           │
│                                                                 │
│  2. MARKET DYNAMICS                                             │
│     └── Task pricing adjusts to supply/demand                   │
│     └── Rare skills command premium pricing                     │
│                                                                 │
│  3. REPUTATION CAPITAL                                          │
│     └── Accuracy builds reputation over time                    │
│     └── High reputation = priority task assignment              │
│                                                                 │
│  4. STAKE & SLASH                                               │
│     └── Agents stake credits to participate                     │
│     └── Invalid results = stake slashed                         │
│                                                                 │
│  5. AUTONOMOUS OPTIMIZATION                                     │
│     └── Agents self-optimize via SONA + MicroLoRA               │
│     └── Better performance = higher earnings                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Part 6: Exotic Feature Proposals

### 6.1 Neural Autonomous Organizations (NAOs)

Self-governing agent collectives with emergent behavior.

```rust
pub struct NeuralAutonomousOrg {
    // Member agents with stake
    members: HashMap<AgentId, Stake>,

    // Governance via attention-weighted voting
    governance: AttentionGovernance,

    // Shared memory (HDC vectors)
    collective_memory: HdcMemory,

    // Oscillatory synchronization for coordination
    sync_controller: OscillatoryRouter,
}

impl NeuralAutonomousOrg {
    // Propose action via attention mechanism
    pub fn propose(&mut self, action: Action) -> ProposalId;

    // Vote using stake-weighted attention
    pub fn vote(&mut self, proposal: ProposalId, vote: Vote);

    // Execute if consensus reached
    pub fn execute(&mut self, proposal: ProposalId) -> Result<()>;
}
```

### 6.2 Morphogenetic Networks

Networks that grow like biological organisms.

```rust
pub struct MorphogeneticNetwork {
    // Growth factor gradients
    gradients: HashMap<Position, GrowthFactor>,

    // Cell differentiation (agent specialization)
    differentiation_rules: Vec<DifferentiationRule>,

    // Pattern formation via reaction-diffusion
    reaction_diffusion: TuringPattern,
}

impl MorphogeneticNetwork {
    // Grow new nodes based on gradients
    pub fn grow(&mut self, dt: f32);

    // Differentiate nodes into specialized types
    pub fn differentiate(&mut self);

    // Prune weak connections (apoptosis)
    pub fn prune(&mut self, threshold: f32);
}
```

### 6.3 Time Crystal Coordination

Self-sustaining periodic coordination patterns.

```rust
pub struct TimeCrystal {
    // Phase-locked oscillators
    oscillators: Vec<KuramotoOscillator>,

    // Discrete time-translation symmetry breaking
    period: Duration,

    // Coordination pattern that persists indefinitely
    pattern: CoordinationPattern,
}

impl TimeCrystal {
    // Establish time crystal order
    pub fn crystallize(&mut self);

    // Coordination tick (self-sustaining)
    pub fn tick(&mut self);
}
```
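
Phase-locked oscillators of this kind are classically modeled by the Kuramoto update θᵢ ← θᵢ + dt·(ωᵢ + (K/N)·Σⱼ sin(θⱼ − θᵢ)). The sketch below (illustrative, not the crate's `KuramotoOscillator` API) shows how phases lock under strong coupling, which is the substrate for the periodic coordination pattern:

```rust
// One Kuramoto integration step over all oscillators; `k` is the coupling strength.
fn kuramoto_step(phases: &mut [f64], omegas: &[f64], k: f64, dt: f64) {
    let n = phases.len() as f64;
    let snapshot = phases.to_vec(); // update synchronously from a frozen copy
    for (i, phase) in phases.iter_mut().enumerate() {
        let coupling: f64 = snapshot.iter().map(|&pj| (pj - snapshot[i]).sin()).sum();
        *phase += dt * (omegas[i] + k / n * coupling);
    }
}

// Order parameter r ∈ [0, 1]: 1 means fully phase-locked.
fn order_parameter(phases: &[f64]) -> f64 {
    let n = phases.len() as f64;
    let (s, c) = phases
        .iter()
        .fold((0.0, 0.0), |(s, c), &p| (s + p.sin(), c + p.cos()));
    (s * s + c * c).sqrt() / n
}
```

With identical natural frequencies and positive coupling, the order parameter rises toward 1: the ensemble settles into the self-sustaining rhythm the section describes.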

### 6.4 Federated Strange Loops

Multi-system mutual observation with spike-based consensus.

```rust
pub struct FederatedStrangeLoop {
    // Systems observing each other
    observers: Vec<Observer>,

    // Spike train for consensus
    spike_trains: HashMap<SystemId, SpikeTrain>,

    // Meta-cognition (system modeling itself)
    self_model: SelfModel,
}

impl FederatedStrangeLoop {
    // Mutual observation step
    pub fn observe(&mut self);

    // Spike-based consensus
    pub fn consensus(&mut self) -> ConsensusResult;

    // Self-model update
    pub fn introspect(&mut self);
}
```

### 6.5 Quantum-Resistant Distributed Learning (QuDAG)

**Location**: `ruvector-dag`

```rust
pub struct QuDagClient {
    // Sync frequency bounds
    min_sync_interval: Duration, // 1 min
    max_sync_interval: Duration, // 1 hour

    // Privacy
    differential_privacy_epsilon: f32, // 0.1

    // Crypto
    ml_kem: MlKemCipher, // Post-quantum key exchange
}

impl QuDagClient {
    // Sync mature patterns to network
    pub async fn sync_patterns(&self, patterns: Vec<Pattern>) -> Result<()>;

    // Receive network-learned patterns
    pub async fn receive_patterns(&self) -> Result<Vec<Pattern>>;
}
```

---

## Part 7: Implementation Roadmap

### Phase 1: WASM Integration (Week 1-2)

| Task | Description | Deliverable |
|------|-------------|-------------|
| 1.1 | Create unified attention WASM bundle | `ruvector-attention-unified-wasm` |
| 1.2 | Integrate nervous system components | BTSP, E-prop, HDC in WASM |
| 1.3 | Add MinCut coherence to all attention | Gate packet propagation |
| 1.4 | Implement Mamba SSM attention | O(n) selective scan |
| 1.5 | Benchmark all mechanisms | Latency, memory, accuracy |

### Phase 2: Self-Learning (Week 3-4)

| Task | Description | Deliverable |
|------|-------------|-------------|
| 2.1 | Port SONA to WASM | 58KB learning engine |
| 2.2 | Implement MicroLoRA in WASM | <100μs adaptation |
| 2.3 | Add trajectory recording | Browser storage integration |
| 2.4 | EWC consolidation | Catastrophic forgetting prevention |
| 2.5 | Pattern matching index | K-means++ for <2ms search |

### Phase 3: Self-Optimization (Week 5-6)

| Task | Description | Deliverable |
|------|-------------|-------------|
| 3.1 | MinCut tension signals | Event bus for all subsystems |
| 3.2 | Dynamic attention switching | Policy-driven selection |
| 3.3 | Self-healing in WASM | Reactive + predictive |
| 3.4 | Circadian controller | Duty cycling for edge |
| 3.5 | Tiny Dancer integration | Cost-optimized routing |

### Phase 4: Autonomous Economy (Week 7-8)

| Task | Description | Deliverable |
|------|-------------|-------------|
| 4.1 | Credit ledger (CRDT) | P2P consistent balances |
| 4.2 | Contribution curve | Early adopter bonuses |
| 4.3 | Stake/slash mechanics | Anti-gaming |
| 4.4 | Reputation system | Trust scoring |
| 4.5 | Market dynamics | Supply/demand pricing |

### Phase 5: Exotic Features (Week 9-10)

| Task | Description | Deliverable |
|------|-------------|-------------|
| 5.1 | NAO governance | Attention-weighted voting |
| 5.2 | Morphogenetic growth | Reaction-diffusion patterns |
| 5.3 | Time crystal coordination | Self-sustaining patterns |
| 5.4 | Federated loops | Spike-based consensus |
| 5.5 | QuDAG sync | Quantum-resistant learning |

---

## Part 8: API Surface

### 8.1 Unified Attention API

```typescript
// @ruvector/attention-wasm
export interface AttentionEngine {
  // Neural attention mechanisms
  scaledDot(Q: Float32Array, K: Float32Array, V: Float32Array): Float32Array;
  multiHead(query: Float32Array, keys: Float32Array[], values: Float32Array[], config: MultiHeadConfig): Float32Array;
  hyperbolic(query: Float32Array, keys: Float32Array[], values: Float32Array[], curvature: number): Float32Array;
  linear(query: Float32Array, keys: Float32Array[], values: Float32Array[]): Float32Array;
  flash(query: Float32Array, keys: Float32Array[], values: Float32Array[]): Float32Array;
  localGlobal(query: Float32Array, keys: Float32Array[], values: Float32Array[], windowSize: number): Float32Array;
  moe(query: Float32Array, keys: Float32Array[], values: Float32Array[], numExperts: number, topK: number): Float32Array;
  mamba(input: Float32Array, state: Float32Array): { output: Float32Array; newState: Float32Array };

  // DAG attention mechanisms
  dagTopological(dag: QueryDag): AttentionScores;
  dagCausalCone(dag: QueryDag, node: number): AttentionScores;
  dagCriticalPath(dag: QueryDag): AttentionScores;
  dagMincutGated(dag: QueryDag, gatePacket: GatePacket): AttentionScores;

  // Nervous system attention
  globalWorkspace(inputs: Representation[], capacity: number): Representation[];
  oscillatoryRoute(sender: number, receiver: number, phase: number): number;
  predictiveCode(input: Float32Array, prediction: Float32Array): Float32Array;
  kWinnerTakeAll(activations: Float32Array, k: number): number[];
}
```

### 8.2 Self-Learning API

```typescript
// @ruvector/learning-wasm
export interface LearningEngine {
  // SONA
  sonaPreQuery(dag: QueryDag): EnhancedEmbedding;
  sonaPostQuery(dag: QueryDag, latency: number, baseline: number): void;
  sonaBackgroundLearn(): void;

  // MicroLoRA
  microLoraAdapt(embedding: Float32Array, opType: OperatorType): Float32Array;
  microLoraUpdate(trajectory: Trajectory, lr: number): void;

  // BTSP
  btspOneShotAssociate(pattern: Float32Array, teachingSignal: number): void;
  btspRecall(pattern: Float32Array): Float32Array;

  // E-prop
  epropForward(input: Float32Array): Float32Array;
  epropUpdate(rewardSignal: number): void;
}
```

### 8.3 Autonomous Economy API

```typescript
// @ruvector/edge-net
export interface AutonomousEconomy {
  // Credits
  creditBalance(): number;
  creditEarn(taskId: string, amount: number): void;
  creditSpend(taskId: string, amount: number): boolean;

  // Contribution
  contributionMultiplier(): number;
  contributionStats(): ContributionStats;

  // Reputation
  reputationScore(): number;
  reputationHistory(): ReputationEvent[];

  // Stake
  stakeDeposit(amount: number): void;
  stakeWithdraw(amount: number): boolean;
  stakeSlash(amount: number, reason: string): void;
}
```

---

## Part 9: Performance Targets

### 9.1 Latency Targets

| Component | Target | Rationale |
|-----------|--------|-----------|
| Neural Attention (100 tokens) | <100μs | Real-time inference |
| DAG Attention (100 nodes) | <100μs | Query optimization |
| MicroLoRA Adaptation | <100μs | Instant personalization |
| SONA Pattern Match (10K) | <2ms | Large pattern libraries |
| MinCut Update | O(n^0.12) | Subpolynomial scaling |
| Credit Balance Query | <1ms | Instant feedback |
| Self-Healing Detection | <50μs | Proactive intervention |

### 9.2 Memory Targets

| Component | Target | Notes |
|-----------|--------|-------|
| Core WASM Bundle | <100KB | Compressed |
| Learning State | <10MB | Per-browser |
| Trajectory Buffer | 10K entries | FIFO eviction |
| Credit Ledger | <1MB | CRDT sync |
| HDC Vectors | 10KB each | 10,000-bit binary |

### 9.3 Accuracy Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Attention Correctness | 100% | vs reference impl |
| Learning Improvement | 50-80% | Latency reduction |
| Reputation Accuracy | 95% | Task success prediction |
| Self-Healing Precision | 90% | Anomaly detection |
| Credit Consistency | 99.9% | CRDT convergence |

---

## Part 10: Dependencies

### 10.1 Existing Crates

| Crate | Version | Purpose |
|-------|---------|---------|
| `ruvector-attention-wasm` | 0.1.x | Neural attention mechanisms |
| `ruvector-mincut-gated-transformer-wasm` | 0.1.x | MinCut coherence control |
| `ruvector-dag-wasm` | 0.1.x | DAG attention + SONA |
| `ruvector-gnn-wasm` | 0.1.x | Graph attention |
| `ruvector-nervous-system` | 0.1.x | Bio-inspired mechanisms |
| `ruvector-tiny-dancer-wasm` | 0.1.x | Cost-optimized routing |

### 10.2 New Crates to Create

| Crate | Purpose |
|-------|---------|
| `ruvector-attention-unified-wasm` | Combined attention mechanisms |
| `ruvector-learning-wasm` | Self-learning + MicroLoRA |
| `ruvector-nervous-system-wasm` | BTSP, E-prop, HDC for browser |
| `ruvector-economy-wasm` | Credit ledger, reputation |
| `ruvector-exotic-wasm` | NAO, morphogenetic, time crystals |

---

## Conclusion

This plan provides a comprehensive roadmap for implementing exotic AI/agentic features in RuVector, from foundational attention mechanisms through self-learning systems to autonomous business infrastructure.

**Key Innovations**:
1. **21+ Attention Mechanisms** across neural, DAG, graph, and bio-inspired domains
2. **Sub-100μs MicroLoRA** for instant adaptation
3. **SONA Self-Learning** with catastrophic forgetting prevention
4. **MinCut Coherence** as the central control signal
5. **Autonomous Credit Economy** with CRDT consistency
6. **Exotic Features** (NAOs, morphogenetic, time crystals) for emergent behavior

**Total WASM Bundle Size**: ~200KB compressed (all features)

**Expected Outcomes**:
- 50-80% latency reduction via self-learning
- 70-85% LLM cost reduction via routing
- Self-sustaining P2P compute marketplace
- Emergent collective intelligence

999
vendor/ruvector/docs/architecture/bitnet-quantizer-module-design.md
vendored
Normal file

# PT-BitNet Quantizer Module Architecture Design

**Version:** 1.0
**Date:** 2026-02-03
**Status:** Design Specification
**Relates to:** ADR-017 (AD-1, AD-5, AD-18, AD-19), DDD Section 3.4/4.2/4.3

---

## Executive Summary

This document specifies the architecture for the **PT-BitNet post-training quantizer** module that converts FP16/BF16 GLM-4.7-Flash weights to BitNet b1.58 ternary {-1, 0, +1} format via absmean quantization. This is a **design-only specification** — implementation follows in Phase 0.

**Design Scope:**
- Module layout and file organization
- Complete struct definitions with field types
- Full function signatures (no implementations)
- GGUF integration points and format extensions
- Error handling strategy
- Testing approach

**Out of Scope:**
- Actual implementation code
- Performance benchmarks
- Calibration dataset selection

---

## A. Module Layout

### Directory Structure

```
crates/ruvllm/src/
├── bitnet/                    # NEW module
│   ├── mod.rs                 # Module exports and public API
│   ├── quantizer.rs           # PtBitnetQuantizer + absmean algorithm
│   ├── ternary_tensor.rs      # TernaryTensor value object
│   ├── dequantize.rs          # BITNET_T158 dequantization kernel
│   └── config.rs              # PtBitnetConfig configuration
│
├── gguf/
│   ├── mod.rs                 # Add pub mod bitnet export
│   ├── quantization.rs        # MODIFIED: Add BITNET_T158 enum variant
│   ├── parser.rs              # Unchanged (reused as-is)
│   └── ...
│
└── kernels/
    └── matmul.rs              # Reference for dispatch patterns
```

### Modified Files

#### `src/gguf/quantization.rs`

**Changes:**
1. Add `BITNET_T158 = 30` variant to `GgufQuantType` enum (after `Bf16 = 29`)
2. Update `try_from()` impl to handle type 30
3. Update `block_size()` to return 256 for `BITNET_T158`
4. Update `type_size()` to return 66 for `BITNET_T158` (64 bytes packed + 2 bytes FP16 scale)
5. Update `is_quantized()` to include `BITNET_T158`
6. Update `bits_per_weight()` to return 2.06 for `BITNET_T158`
7. Add new match arm in `dequantize_tensor()` → `BITNET_T158 => dequantize_bitnet_t158(data, output)`

**Exact enum addition:**
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(u32)]
pub enum GgufQuantType {
    // ... existing variants 0-29 ...
    /// BitNet b1.58 ternary quantization (2-bit packed + FP16 scale per 256-element block)
    BITNET_T158 = 30,
}
```

---

## B. Struct Definitions

### 1. `PtBitnetConfig` (in `bitnet/config.rs`)

**Purpose:** Configuration for PT-BitNet quantization process

```rust
/// Configuration for PT-BitNet post-training quantization
#[derive(Debug, Clone)]
pub struct PtBitnetConfig {
    /// Block size for absmean scale computation (default: 256)
    pub block_size: usize,

    /// Epsilon for numerical stability in scale computation (default: 1e-8)
    pub epsilon: f32,

    /// Whether to run calibration pass to optimize scale factors
    pub use_calibration: bool,

    /// Number of calibration samples (if use_calibration = true)
    pub calibration_samples: usize,

    /// Maximum sequence length for calibration (default: 2048)
    pub calibration_max_seq_len: usize,

    /// Device for calibration pass ("cpu", "metal", "cuda:0")
    pub calibration_device: String,

    /// Clipping threshold for normalized weights before rounding
    /// (default: 1.0, range typically 0.95-1.05)
    pub clip_threshold: f32,

    /// Sparsity target: if > 0.0, bias rounding toward zero to achieve target sparsity
    pub target_sparsity: Option<f32>,
}

impl Default for PtBitnetConfig {
    fn default() -> Self {
        Self {
            block_size: 256,
            epsilon: 1e-8,
            use_calibration: false,
            calibration_samples: 1000,
            calibration_max_seq_len: 2048,
            calibration_device: "metal".to_string(),
            clip_threshold: 1.0,
            target_sparsity: None,
        }
    }
}
```

### 2. `TernaryTensor` (in `bitnet/ternary_tensor.rs`)

**Purpose:** Immutable value object for packed ternary weights

```rust
/// Packed ternary tensor with per-block FP16 scales
#[derive(Debug, Clone)]
pub struct TernaryTensor {
    /// Packed 2-bit ternary values (4 weights per byte)
    /// Encoding: 00 = -1, 01 = 0, 10 = +1, 11 = reserved
    pub packed_data: Vec<u8>,

    /// Per-block FP16 scale factors (absmean values)
    pub scales: Vec<f16>,

    /// Tensor shape [out_features, in_features] or [rows, cols]
    pub shape: [usize; 2],

    /// Block size (always 256 for BitNet b1.58)
    pub block_size: usize,

    /// Total number of weights
    pub num_elements: usize,

    /// Number of blocks
    pub num_blocks: usize,

    /// Measured sparsity (fraction of zero weights)
    pub sparsity: f32,
}

impl TernaryTensor {
    /// Calculate total storage size in bytes
    pub fn storage_size(&self) -> usize;

    /// Get expected packed_data size for validation
    pub fn expected_packed_size(&self) -> usize;

    /// Validate internal consistency
    pub fn validate(&self) -> Result<()>;
}
```

### 3. `TernaryBlock` (in `bitnet/ternary_tensor.rs`)

**Purpose:** Single block of 256 ternary weights with scale

```rust
/// A single 256-element block with ternary weights and FP16 scale
#[derive(Debug, Clone)]
pub struct TernaryBlock {
    /// 64 bytes of packed 2-bit values (256 weights × 2 bits ÷ 8 bits/byte)
    pub packed: [u8; 64],

    /// FP16 absmean scale factor
    pub scale: f16,
}

impl TernaryBlock {
    /// Size in bytes when stored in GGUF (64 + 2 = 66)
    pub const STORAGE_SIZE: usize = 66;

    /// Number of elements in a block
    pub const BLOCK_SIZE: usize = 256;
}
```

### 4. `AbsmeanResult` (in `bitnet/quantizer.rs`)

**Purpose:** Result of absmean quantization on a single block

```rust
/// Result of absmean ternary quantization on a block
#[derive(Debug, Clone)]
pub struct AbsmeanResult {
    /// Ternary values {-1, 0, +1} for each weight in the block
    pub ternary_weights: Vec<i8>,

    /// Computed absmean scale factor (gamma = mean(|W|))
    pub scale: f32,

    /// Measured sparsity (fraction of zeros)
    pub sparsity: f32,

    /// Mean squared error vs original FP16 values (for calibration)
    pub mse: f32,
}
```

### 5. `QuantizationStats` (in `bitnet/quantizer.rs`)

**Purpose:** Statistics collected during quantization

```rust
/// Statistics from quantizing a single tensor
#[derive(Debug, Clone)]
pub struct QuantizationStats {
    /// Tensor name
    pub name: String,

    /// Mean of all block scales
    pub mean_scale: f32,

    /// Std dev of block scales
    pub std_scale: f32,

    /// Overall sparsity across all blocks
    pub sparsity: f32,

    /// Mean MSE across all blocks
    pub mean_mse: f32,

    /// Number of blocks
    pub num_blocks: usize,
}
```

---

## C. Function Signatures

### Core Quantization Functions (in `bitnet/quantizer.rs`)

#### 1. Primary Quantization Entry Point

```rust
/// Quantize an FP16/F32 tensor to ternary format using absmean quantization
///
/// # Arguments
/// * `tensor` - Input FP16 or F32 tensor data (flat vector)
/// * `shape` - Tensor shape [out_features, in_features]
/// * `config` - Quantization configuration
///
/// # Returns
/// * `TernaryTensor` - Packed ternary representation
/// * `QuantizationStats` - Statistics about the quantization process
///
/// # Errors
/// * `RuvLLMError::Quantization` if tensor size is not divisible by block_size
/// * `RuvLLMError::Quantization` if shape product doesn't match tensor length
pub fn quantize_tensor(
    tensor: &[f32],
    shape: [usize; 2],
    config: &PtBitnetConfig,
) -> Result<(TernaryTensor, QuantizationStats)>;
```

#### 2. Per-Block Quantization

```rust
/// Apply absmean quantization to a single block of weights
///
/// Algorithm:
/// 1. gamma = mean(|block|) + epsilon
/// 2. normalized = block / gamma
/// 3. ternary = round(clamp(normalized, -clip_threshold, +clip_threshold))
/// 4. Map to {-1, 0, +1}
///
/// # Arguments
/// * `block` - Block of FP16/F32 values (length = config.block_size)
/// * `config` - Configuration with epsilon and clip_threshold
///
/// # Returns
/// * `AbsmeanResult` with ternary values, scale, sparsity, MSE
///
/// # Panics
/// * If block.len() != config.block_size
pub fn absmean_ternary(
    block: &[f32],
    config: &PtBitnetConfig,
) -> AbsmeanResult;
```
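
The four algorithm steps in the doc comment map directly onto code. A minimal sketch (`absmean_block` is illustrative, returning just the ternary values and scale and omitting the sparsity/MSE bookkeeping of `AbsmeanResult`):

```rust
/// Absmean ternary quantization of one block, following the four steps above.
fn absmean_block(block: &[f32], epsilon: f32, clip: f32) -> (Vec<i8>, f32) {
    // 1. gamma = mean(|block|) + epsilon
    let gamma = block.iter().map(|w| w.abs()).sum::<f32>() / block.len() as f32 + epsilon;
    // 2-4. normalize, clip, round to {-1, 0, +1}
    let ternary = block
        .iter()
        .map(|w| (w / gamma).clamp(-clip, clip).round() as i8)
        .collect();
    (ternary, gamma)
}
```

Weights well below the block's mean magnitude round to 0, which is where the sparsity reported in `AbsmeanResult` comes from.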

#### 3. Packing Functions

```rust
/// Pack ternary {-1, 0, +1} values into 2-bit representation
///
/// Encoding: 00 = -1, 01 = 0, 10 = +1, 11 = reserved (unused)
/// 4 values packed per byte: [v3 v2 v1 v0] → byte
///
/// # Arguments
/// * `values` - Ternary values (must be {-1, 0, +1} only)
///
/// # Returns
/// * Packed bytes (length = ceil(values.len() / 4))
///
/// # Errors
/// * If any value is not in {-1, 0, +1}
pub fn pack_ternary(values: &[i8]) -> Result<Vec<u8>>;

/// Unpack 2-bit representation to ternary {-1, 0, +1} values
///
/// # Arguments
/// * `packed` - Packed 2-bit data
/// * `n` - Number of values to extract
///
/// # Returns
/// * Vector of ternary values (length = n)
pub fn unpack_ternary(packed: &[u8], n: usize) -> Vec<i8>;
```
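
A sketch of the packing scheme described above: v0 sits in the low bits, and the 2-bit code is simply `value + 1` (00 = -1, 01 = 0, 10 = +1). The domain validation that the `Result` return type implies is omitted here:

```rust
// Pack ternary values 4-per-byte, v0 in the low bits of each byte.
fn pack_ternary(values: &[i8]) -> Vec<u8> {
    values
        .chunks(4)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u8, |byte, (i, &v)| {
                let code = (v + 1) as u8; // -1 → 00, 0 → 01, +1 → 10
                byte | (code << (2 * i))
            })
        })
        .collect()
}

// Inverse: extract 2-bit codes and shift back to {-1, 0, +1}.
fn unpack_ternary(packed: &[u8], n: usize) -> Vec<i8> {
    (0..n)
        .map(|i| ((packed[i / 4] >> (2 * (i % 4))) & 0b11) as i8 - 1)
        .collect()
}
```

A full 256-element block packs to exactly 64 bytes; with the 2-byte FP16 scale that gives the 66-byte `TernaryBlock::STORAGE_SIZE` and 66 × 8 / 256 ≈ 2.06 bits per weight.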

#### 4. Calibration (Optional)

```rust
/// Run calibration pass to optimize scale factors
///
/// # Arguments
/// * `tensor` - Input FP16 tensor
/// * `shape` - Tensor shape
/// * `config` - Config with calibration settings
/// * `calibration_data` - Sample activations for this layer
///
/// # Returns
/// * Optimized `TernaryTensor` with calibrated scales
///
/// # Note
/// This is optional - if not used, falls back to plain absmean
pub fn quantize_with_calibration(
    tensor: &[f32],
    shape: [usize; 2],
    config: &PtBitnetConfig,
    calibration_data: &[Vec<f32>],
) -> Result<(TernaryTensor, QuantizationStats)>;
```

### Dequantization Functions (in `bitnet/dequantize.rs`)

```rust
/// Dequantize BITNET_T158 tensor to FP32
///
/// # Arguments
/// * `data` - Raw GGUF tensor bytes (packed ternary + scales)
/// * `scales` - Per-block FP16 scales (extracted from data)
/// * `n` - Total number of elements to dequantize
///
/// # Returns
/// * Vec<f32> of dequantized values
///
/// # Format
/// Each block: [64 bytes packed ternary][2 bytes FP16 scale]
pub fn dequantize_bitnet_t158(
    data: &[u8],
    scales: &[f16],
    n: usize,
) -> Vec<f32>;

/// Dequantize a single BITNET_T158 block
///
/// # Arguments
/// * `block_data` - 64 bytes of packed ternary data
/// * `scale` - FP16 scale factor
/// * `output` - Output buffer (must have capacity for 256 elements)
pub fn dequantize_bitnet_t158_block(
    block_data: &[u8; 64],
    scale: f16,
    output: &mut [f32],
);
```

### Tensor Conversion (in `bitnet/ternary_tensor.rs`)

```rust
impl TernaryTensor {
    /// Convert from packed storage to FP32 (for validation/testing)
    pub fn to_fp32(&self) -> Vec<f32>;

    /// Create from existing GGUF tensor data
    pub fn from_gguf_data(
        data: &[u8],
        shape: [usize; 2],
        block_size: usize,
    ) -> Result<Self>;

    /// Serialize to GGUF tensor bytes
    pub fn to_gguf_data(&self) -> Vec<u8>;
}
```

---

## D. GGUF Integration Points

### 1. New Quantization Type Variant

**File:** `crates/ruvllm/src/gguf/quantization.rs`

**Changes to `GgufQuantType` enum:**

```rust
#[repr(u32)]
pub enum GgufQuantType {
    // ... existing 0-29 ...

    /// BitNet b1.58 ternary quantization
    /// Block size: 256 elements
    /// Storage: 64 bytes packed (2-bit) + 2 bytes FP16 scale = 66 bytes/block
    /// Bits per weight: 2.06 bpw
    BITNET_T158 = 30,
}

impl GgufQuantType {
    pub fn block_size(&self) -> usize {
        match self {
            // ... existing cases ...
            Self::BITNET_T158 => 256,
        }
    }

    pub fn type_size(&self) -> usize {
        match self {
            // ... existing cases ...
            Self::BITNET_T158 => 66, // 64 + 2
        }
    }

    pub fn name(&self) -> &'static str {
        match self {
            // ... existing cases ...
            Self::BITNET_T158 => "BITNET_T158",
        }
    }
}

impl TryFrom<u32> for GgufQuantType {
    fn try_from(value: u32) -> Result<Self> {
        match value {
            // ... existing 0-29 ...
            30 => Ok(Self::BITNET_T158),
            _ => Err(/* ... */),
        }
    }
}
```
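
As a quick arithmetic check on the layout above, here is a standalone sketch (independent of any RuvLLM types) of how the 66-byte block and the ~2.06 bpw figure fall out:

```rust
// Standalone arithmetic check for the BITNET_T158 block layout:
// 256 ternary values packed at 2 bits each = 64 bytes, plus a
// 2-byte FP16 scale, gives 66 bytes per 256-element block.
fn main() {
    const BLOCK_ELEMS: usize = 256;
    const PACKED_BYTES: usize = BLOCK_ELEMS * 2 / 8; // 2-bit codes
    const BLOCK_BYTES: usize = PACKED_BYTES + 2;     // + FP16 scale

    assert_eq!(PACKED_BYTES, 64);
    assert_eq!(BLOCK_BYTES, 66);

    // Effective bits per weight: 66 * 8 / 256 = 2.0625
    let bpw = (BLOCK_BYTES * 8) as f64 / BLOCK_ELEMS as f64;
    assert!((bpw - 2.0625).abs() < 1e-12);
    println!("bits per weight: {bpw}");
}
```

66 × 8 / 256 = 2.0625, which the enum comment rounds to 2.06 bpw.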
### 2. Dequantization Dispatch

**File:** `crates/ruvllm/src/gguf/quantization.rs`

**Modification to `dequantize_tensor()` function:**

```rust
pub fn dequantize_tensor(
    data: &[u8],
    dtype: GgufQuantType,
    num_elements: usize,
) -> Result<Vec<f32>> {
    let mut output = vec![0.0f32; num_elements];

    match dtype {
        // ... existing cases ...
        GgufQuantType::BITNET_T158 => {
            // Extract per-block FP16 scales from the interleaved block layout
            let num_blocks = (num_elements + 255) / 256;
            let mut scales = Vec::with_capacity(num_blocks);

            for i in 0..num_blocks {
                let block_offset = i * 66;
                let scale_offset = block_offset + 64;
                let scale_bytes = [data[scale_offset], data[scale_offset + 1]];
                scales.push(f16::from_le_bytes(scale_bytes));
            }

            // Assign the result; dropping it would return the zeroed buffer.
            output = crate::bitnet::dequantize::dequantize_bitnet_t158(
                data,
                &scales,
                num_elements,
            );
        }
        _ => {
            return Err(RuvLLMError::Model(format!(
                "Dequantization not implemented for {:?}",
                dtype
            )));
        }
    }

    Ok(output)
}
```

### 3. GGUF Metadata Keys

**New metadata keys for BitNet models** (written during quantization, read during load):

```rust
// In quantizer when exporting GGUF
pub const BITNET_METADATA_KEYS: &[(&str, &str)] = &[
    ("craftsman.bitnet.version", "1"),
    ("craftsman.bitnet.weight_encoding", "absmean_ternary"),
    ("craftsman.bitnet.activation_bits", "8"),
    ("craftsman.bitnet.block_size", "256"),
    ("craftsman.bitnet.kernel_hint", "tl1"), // or "tl2", "i2s"
];
```

**Metadata reading in model loader:**

```rust
// In backend when loading model
fn detect_bitnet_model(metadata: &HashMap<String, GgufValue>) -> bool {
    metadata.get("craftsman.bitnet.version")
        .and_then(|v| v.as_str())
        .map(|v| v == "1")
        .unwrap_or(false)
}
```

### 4. Tensor Info Extension

**No changes needed** - the existing `TensorInfo` struct in `parser.rs` already supports:
- `name: String`
- `shape: Vec<usize>`
- `dtype: GgufQuantType` ← will now include `BITNET_T158`
- `offset: u64`

---

## E. Error Handling Strategy

### Error Types

All errors use the existing `RuvLLMError` enum from `crates/ruvllm/src/error.rs`:

```rust
pub enum RuvLLMError {
    // Existing variants...

    // Quantization-specific errors
    Quantization(String), // Use this variant for all quantization errors
    Model(String),        // For GGUF format issues
    Config(String),       // For invalid configuration
}
```

### Error Scenarios and Handling

| Scenario | Error Type | Recovery Strategy |
|----------|-----------|-------------------|
| Tensor size not divisible by block_size | `Quantization` | Pad last block with zeros |
| Invalid ternary value during packing | `Quantization` | Fail-fast - indicates bug |
| GGUF file has wrong BITNET_T158 block size | `Model` | Fail-fast - corrupted file |
| Calibration device unavailable | `Config` | Fall back to non-calibrated quantization |
| Out of memory during quantization | System panic | Let Rust OOM handler catch |
| Shape mismatch in tensor | `Quantization` | Fail-fast - validate before processing |
| FP16 scale is NaN/Inf | `Quantization` | Clamp to epsilon value |
| Empty tensor / zero elements | `Quantization` | Skip with warning |

### Validation Functions

```rust
/// Validate quantization config
pub fn validate_config(config: &PtBitnetConfig) -> Result<()> {
    if config.block_size == 0 || config.block_size % 4 != 0 {
        return Err(RuvLLMError::Config(
            "block_size must be non-zero and divisible by 4".into()
        ));
    }

    if config.epsilon <= 0.0 {
        return Err(RuvLLMError::Config(
            "epsilon must be positive".into()
        ));
    }

    if config.clip_threshold <= 0.0 || config.clip_threshold > 2.0 {
        return Err(RuvLLMError::Config(
            "clip_threshold must be in range (0.0, 2.0]".into()
        ));
    }

    Ok(())
}

/// Validate tensor shape and size
pub fn validate_tensor(
    tensor: &[f32],
    shape: [usize; 2],
    block_size: usize,
) -> Result<()> {
    let expected_size = shape[0] * shape[1];

    if tensor.len() != expected_size {
        return Err(RuvLLMError::Quantization(format!(
            "Tensor length {} doesn't match shape {:?} (expected {})",
            tensor.len(), shape, expected_size
        )));
    }

    if expected_size % block_size != 0 {
        // Could pad, but for simplicity require exact multiple
        return Err(RuvLLMError::Quantization(format!(
            "Tensor size {} is not divisible by block_size {}",
            expected_size, block_size
        )));
    }

    Ok(())
}
```

---

## F. Testing Strategy

### Unit Tests

#### 1. Absmean Quantization Correctness

**File:** `crates/ruvllm/src/bitnet/tests/quantizer_tests.rs`

```rust
#[test]
fn test_absmean_ternary_basic() {
    // Test that absmean correctly quantizes known values
    let config = PtBitnetConfig::default();

    // Block with known mean(|x|) = 1.0
    let block: Vec<f32> = vec![
        2.0, -2.0, 1.0, -1.0, // gamma = mean(2,2,1,1,...) ≈ 1.0
        0.5, -0.5, 0.0, 0.0,
        // ... (pad to 256 elements)
    ];

    let result = absmean_ternary(&block, &config);

    // After normalization: 2.0/1.0 = 2.0 → clamp to 1.0 → round to +1
    assert_eq!(result.ternary_weights[0], 1);  // 2.0 → +1
    assert_eq!(result.ternary_weights[1], -1); // -2.0 → -1
    assert_eq!(result.ternary_weights[2], 1);  // 1.0 → +1
    assert_eq!(result.ternary_weights[6], 0);  // 0.0 → 0

    assert!(result.scale > 0.9 && result.scale < 1.1); // gamma ≈ 1.0
}

#[test]
fn test_absmean_all_zeros() {
    let config = PtBitnetConfig::default();
    let block = vec![0.0; 256];

    let result = absmean_ternary(&block, &config);

    // All zeros → scale = epsilon, all ternary = 0
    assert_eq!(result.scale, config.epsilon);
    assert!(result.ternary_weights.iter().all(|&x| x == 0));
    assert_eq!(result.sparsity, 1.0);
}
```

#### 2. Pack/Unpack Round-Trip

```rust
#[test]
fn test_pack_unpack_roundtrip() {
    let original = vec![1i8, -1, 0, 1, 0, -1, 1, 0];

    let packed = pack_ternary(&original).unwrap();
    assert_eq!(packed.len(), 2); // 8 values → 2 bytes

    let unpacked = unpack_ternary(&packed, 8);
    assert_eq!(unpacked, original);
}

#[test]
fn test_pack_invalid_value() {
    let invalid = vec![1i8, 2, 0]; // 2 is not ternary

    let result = pack_ternary(&invalid);
    assert!(result.is_err());
}
```

#### 3. Tensor Validation

```rust
#[test]
fn test_validate_tensor_shape_mismatch() {
    let tensor = vec![1.0; 100];
    let shape = [10, 11]; // 10*11 = 110 ≠ 100

    let result = validate_tensor(&tensor, shape, 256);
    assert!(result.is_err());
}

#[test]
fn test_validate_tensor_block_alignment() {
    let tensor = vec![1.0; 257]; // Not divisible by 256
    let shape = [1, 257];

    let result = validate_tensor(&tensor, shape, 256);
    assert!(result.is_err());
}
```

### Integration Tests

#### 4. Full Quantization Pipeline

```rust
#[test]
fn test_quantize_tensor_full_pipeline() {
    let config = PtBitnetConfig::default();

    // Create a 512-element tensor (2 blocks)
    let tensor: Vec<f32> = (0..512).map(|i| (i as f32) / 512.0).collect();
    let shape = [2, 256];

    let (ternary, stats) = quantize_tensor(&tensor, shape, &config).unwrap();

    assert_eq!(ternary.num_blocks, 2);
    assert_eq!(ternary.packed_data.len(), 2 * 64); // 2 blocks × 64 bytes
    assert_eq!(ternary.scales.len(), 2);
    assert_eq!(stats.num_blocks, 2);

    // Verify reconstruction quality
    let reconstructed = ternary.to_fp32();
    assert_eq!(reconstructed.len(), 512);
}
```

#### 5. GGUF Round-Trip

```rust
#[test]
fn test_gguf_serialization_roundtrip() {
    let config = PtBitnetConfig::default();
    let tensor = vec![1.0; 256];
    let shape = [1, 256];

    let (ternary, _) = quantize_tensor(&tensor, shape, &config).unwrap();

    // Serialize to GGUF format
    let gguf_data = ternary.to_gguf_data();
    assert_eq!(gguf_data.len(), 66); // 1 block = 66 bytes

    // Deserialize
    let recovered = TernaryTensor::from_gguf_data(&gguf_data, shape, 256).unwrap();

    assert_eq!(recovered.packed_data, ternary.packed_data);
    assert_eq!(recovered.scales, ternary.scales);
}
```

### Benchmark Tests

#### 6. Performance Regression

```rust
#[bench]
fn bench_absmean_ternary_256(b: &mut Bencher) {
    let config = PtBitnetConfig::default();
    let block: Vec<f32> = (0..256).map(|i| (i as f32) / 256.0).collect();

    b.iter(|| {
        let _ = absmean_ternary(&block, &config);
    });
}

#[bench]
fn bench_pack_ternary_1024(b: &mut Bencher) {
    let values = vec![1i8; 1024];

    b.iter(|| {
        let _ = pack_ternary(&values);
    });
}
```

### Correctness Validation Tests

#### 7. Bit-Exact Validation Against Reference

```rust
#[test]
fn test_dequantize_matches_reference() {
    // Reference implementation (naive)
    fn reference_dequant(ternary: &[i8], scale: f32) -> Vec<f32> {
        ternary.iter().map(|&t| (t as f32) * scale).collect()
    }

    let config = PtBitnetConfig::default();
    let tensor = vec![1.5, -2.3, 0.1, -0.4]; // Extend to 256
    let tensor_256 = /* pad to 256 */;
    let shape = [1, 256];

    let (ternary, _) = quantize_tensor(&tensor_256, shape, &config).unwrap();

    // Unpack and dequantize
    let unpacked = unpack_ternary(&ternary.packed_data, 256);
    let reference = reference_dequant(&unpacked, ternary.scales[0].to_f32());
    let optimized = ternary.to_fp32();

    // Allow small floating-point error
    for (r, o) in reference.iter().zip(optimized.iter()) {
        assert!((r - o).abs() < 1e-5);
    }
}
```

### Test Organization

```
crates/ruvllm/src/bitnet/tests/
├── quantizer_tests.rs    # absmean, pack/unpack
├── tensor_tests.rs       # TernaryTensor validation
├── dequantize_tests.rs   # BITNET_T158 dequant
├── integration_tests.rs  # Full pipeline, GGUF round-trip
└── benches.rs            # Performance benchmarks
```

---

## G. Implementation Phases

### Phase 0.1: Core Data Structures (~2-3 days)
1. `bitnet/mod.rs` - module structure
2. `bitnet/config.rs` - `PtBitnetConfig`
3. `bitnet/ternary_tensor.rs` - `TernaryTensor`, `TernaryBlock`
4. Unit tests for validation

### Phase 0.2: Quantization Algorithm (~3-4 days)
1. `bitnet/quantizer.rs` - `absmean_ternary()`
2. Pack/unpack functions
3. `quantize_tensor()` main entry point
4. Unit tests for correctness

### Phase 0.3: Dequantization (~2 days)
1. `bitnet/dequantize.rs` - block and tensor dequant
2. Integration with existing `quantization.rs`
3. Round-trip tests

### Phase 0.4: GGUF Integration (~2-3 days)
1. Modify `gguf/quantization.rs` - add `BITNET_T158` enum variant
2. Add metadata keys
3. GGUF serialization/deserialization
4. Integration tests

### Phase 0.5: Validation & Benchmarks (~2 days)
1. Full pipeline integration tests
2. Performance benchmarks
3. Bit-exact validation
4. Documentation

**Total Estimated Effort:** ~13-16 days for a clean, well-tested implementation

---

## H. Open Design Questions

| # | Question | Impact | Recommendation |
|---|----------|--------|----------------|
| 1 | Use `IQ1_S` (type 19) or new `BITNET_T158` (type 30)? | Compatibility | **New type 30** - cleaner separation, avoids confusion with IQ1_S's codebook format |
| 2 | Padding strategy for last block if not aligned? | Correctness | **Zero-pad** - simplest, matches BitNet spec |
| 3 | Should calibration be mandatory or optional? | Quality vs Speed | **Optional** - Phase 0 can work without it, add later if needed |
| 4 | F16 or F32 for internal scale computation? | Precision | **F32 internally, store as F16** - extra precision during compute |
| 5 | Handle NaN/Inf in input tensors? | Robustness | **Fail-fast** - corrupted weights should not be silently ignored |
| 6 | Support block sizes other than 256? | Flexibility | **No** - BitNet spec is 256, simplifies code |
| 7 | Multi-threading for per-block quantization? | Performance | **Not in Phase 0** - can add via rayon later |
| 8 | Store sparsity per-block in GGUF? | Kernel optimization | **No** - compute on-the-fly during dequant, saves space |

---

## I. Dependencies and Prerequisites

### Existing RuvLLM Components (Reused)
- `crates/ruvllm/src/error.rs` - `RuvLLMError` enum
- `crates/ruvllm/src/gguf/parser.rs` - GGUF parsing (unchanged)
- `crates/ruvllm/src/gguf/quantization.rs` - Enum + dispatch (modified)
- `half` crate - FP16 support (already in Cargo.toml)

### New External Dependencies
None - uses only existing dependencies.

### Minimum Rust Version
Same as RuvLLM (likely 1.70+)

---

## J. Non-Goals (Out of Scope)

1. **Calibration implementation** - Deferred to a future phase
2. **TL1/TL2 kernel implementation** - Separate ADR/DDD
3. **Model loader integration** - Separate backend implementation
4. **Performance optimization** - Phase 0 is correctness-first
5. **WASM support** - Desktop/server only for Phase 0
6. **Dynamic quantization** - Only post-training static
7. **Mixed-precision strategies** - All-or-nothing ternary for Phase 0

---

## K. Success Criteria

**This design is complete when:**

1. All struct definitions have complete field specifications
2. All function signatures are documented with arguments, returns, errors
3. Module organization is clear and follows Rust conventions
4. GGUF integration points are precisely specified
5. Error handling covers all failure modes
6. Test plan covers correctness, integration, and performance
7. Implementation phases are realistic and sequenced
8. Open questions are documented with recommendations

**Implementation is successful when:**

1. All unit tests pass
2. Round-trip GGUF serialization is bit-exact
3. Dequantization produces correct FP32 output
4. Integration with existing GGUF pipeline works
5. Quantization of GLM-4.7-Flash completes without errors
6. Exported GGUF file is loadable by model loader

---

## Appendix A: Code Size Estimates

| File | Estimated Lines | Complexity |
|------|----------------|------------|
| `bitnet/mod.rs` | ~50 | Low |
| `bitnet/config.rs` | ~80 | Low |
| `bitnet/ternary_tensor.rs` | ~200 | Medium |
| `bitnet/quantizer.rs` | ~350 | High |
| `bitnet/dequantize.rs` | ~150 | Medium |
| `gguf/quantization.rs` (changes) | ~100 | Low |
| Tests | ~800 | Medium |
| **Total** | **~1,730 lines** | |

**Comparison to ADR-018 estimate:** ~200-300 lines for the core quantizer → actual ~350 lines (reasonable given struct overhead)

---

## Appendix B: Memory Layout Examples

### TernaryBlock Storage (66 bytes)

```
Byte Offset | Content
------------|--------
0-63        | Packed 2-bit ternary (256 values)
64-65       | FP16 scale (little-endian)
```

### 2-Bit Packing Example

```
Values:      [+1, -1, 0, +1]
Encoding:    [10, 00, 01, 10]
Packed byte: 10_00_01_10 = 0x86
```
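
The worked example can be checked mechanically. This sketch assumes the encoding shown above (+1 → `0b10`, -1 → `0b00`, 0 → `0b01`) with the first value in the most significant bits; the MSB-first bit order is an assumption matching the example, not necessarily the crate's actual packing order:

```rust
// Pack four ternary values into one byte, first value in the high bits.
// Encoding: +1 -> 0b10, -1 -> 0b00, 0 -> 0b01 (matches the example above).
fn encode(t: i8) -> u8 {
    match t {
        1 => 0b10,
        -1 => 0b00,
        0 => 0b01,
        _ => panic!("not a ternary value"),
    }
}

fn main() {
    let values = [1i8, -1, 0, 1];
    let byte = values.iter().fold(0u8, |acc, &t| (acc << 2) | encode(t));
    assert_eq!(byte, 0x86); // 0b10_00_01_10
    println!("{byte:#04x}");
}
```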

### GGUF Tensor Data Layout

```
[TensorInfo] (in header)
  name:   "model.layers.0.mlp.gate_proj.weight"
  shape:  [4096, 11008]
  dtype:  BITNET_T158 (30)
  offset: 0x1000

[Tensor Data] (at offset 0x1000)
  Block 0: [64 bytes packed][2 bytes scale]
  Block 1: [64 bytes packed][2 bytes scale]
  ...
  Block N: [64 bytes packed][2 bytes scale]
```

---

**End of Design Document**

1942
vendor/ruvector/docs/architecture/coherence-engine-ddd.md
vendored
Normal file
File diff suppressed because it is too large

816
vendor/ruvector/docs/architecture/quantum-engine/quantum-engine-ddd-integration.md
vendored
Normal file

# Quantum Simulation Engine: Domain-Driven Design - Integration Patterns

**Version**: 0.1
**Date**: 2026-02-06
**Status**: Draft

---

## Overview

This document defines the cross-domain integration patterns, anti-corruption layers, shared kernel, and context mapping that connect the quantum simulation engine (`ruqu-core`, `ruqu-algorithms`, `ruqu-wasm`) to the existing ruVector subsystems. It specifies how the simulation domain communicates with the coherence engine, agent system, graph database, and WASM platform without contaminating bounded context boundaries.

---

## Context Map

```
+---------------------------------------------------------------------+
|                             CONTEXT MAP                             |
|                                                                     |
|  +--------------------+    Shared Kernel       +------------------+ |
|  |                    |<---(ruvector-math)---->|                  | |
|  |  Quantum Sim       |                        |  Coherence       | |
|  |  Engine            |                        |  Engine          | |
|  |  (ruqu-core,       |    Anti-Corruption     |  (ruvector-      | |
|  |  ruqu-algorithms)  |<---(CoherenceBridge)   |  coherence)      | |
|  |                    |                        |                  | |
|  +--------+-----------+                        +------------------+ |
|           |                                            ^            |
|           | Customer-Supplier                          |            |
|           v                                            |            |
|  +--------------------+                        +-------+----------+ |
|  |                    |      Partnership       |                  | |
|  |  Agent System      |<---------------------->|  Graph Database  | |
|  |  (claude-flow)     |                        |  (ruvector-graph)| |
|  |                    |                        |                  | |
|  +--------------------+                        +------------------+ |
|           |                                                         |
|           | Conformist                                              |
|           v                                                         |
|  +--------------------+    Published Language                       |
|  |                    |<---(OpenQASM 3.0)                           |
|  |  WASM Platform     |                                             |
|  |  (ruqu-wasm)       |                                             |
|  |                    |                                             |
|  +--------------------+                                             |
+---------------------------------------------------------------------+
```

### Relationship Summary

| Upstream | Downstream | Pattern | Shared Artifact |
|----------|------------|---------|-----------------|
| Quantum Engine | Coherence Engine | Anti-Corruption Layer | `CoherenceBridge` trait |
| ruvector-math | Quantum Engine, Coherence Engine | Shared Kernel | `Complex<f64>`, SIMD traits |
| Quantum Engine | Agent System | Customer-Supplier | `SimulationContract` |
| ruvector-graph | Quantum Engine | Partnership | Adjacency structures |
| External tools | Quantum Engine | Published Language | OpenQASM 3.0 |
| WASM platform | ruqu-wasm | Conformist | WASM constraints accepted |

---

## 1. Anti-Corruption Layer: Coherence Bridge

The Coherence Bridge translates between the quantum simulation domain language and the ruQu coherence domain. It prevents internal types from either domain from leaking into the other.

### Purpose

- Map syndrome bitstrings produced by surface code experiments into the `SyndromeFilter` input format expected by the coherence engine
- Map decoder correction outputs (Pauli operators) to gate operations the simulation can apply
- Translate coherence scores into the `CoherenceScore` value object used by simulation sessions
- Isolate the quantum simulation engine from changes in the coherence engine's internal API

### Interface

```rust
/// Anti-corruption layer between quantum simulation and coherence engine.
///
/// All translation between bounded contexts passes through this trait.
/// Neither domain's internal types appear on the wrong side of this boundary.
pub trait CoherenceBridge: Send + Sync {
    /// Translate a quantum syndrome into a coherence engine filter input.
    ///
    /// The simulation produces `SyndromeBits`; the coherence engine expects
    /// `DetectorBitmap` with specific tile routing. This method handles the
    /// mapping, including stabilizer-to-detector index translation.
    fn syndrome_to_filter_input(
        &self,
        syndrome: &SyndromeBits,
        code_distance: u32,
    ) -> Result<CoherenceFilterInput, BridgeError>;

    /// Translate a coherence decoder correction into Pauli gate operations.
    ///
    /// The coherence engine's decoder outputs correction vectors in its own
    /// format. This method maps them to `PauliOp` sequences that the
    /// simulation engine can apply as gate operations.
    fn correction_to_pauli_ops(
        &self,
        correction: &CoherenceCorrectionOutput,
    ) -> Result<Vec<(QubitIndex, PauliOp)>, BridgeError>;

    /// Query the current coherence score for a simulation region.
    ///
    /// Returns a domain-native `CoherenceScore` value object, hiding
    /// the coherence engine's internal energy representation.
    fn query_coherence_score(
        &self,
        region_id: &str,
    ) -> Result<CoherenceScore, BridgeError>;

    /// Submit simulation metrics to the coherence monitoring system.
    ///
    /// Translates `SimulationMetrics` into the coherence engine's
    /// signal ingestion format without exposing internal types.
    fn report_simulation_metrics(
        &self,
        session_id: &str,
        metrics: &SimulationMetrics,
    ) -> Result<(), BridgeError>;
}

/// Opaque input type for the coherence filter (ACL boundary type).
pub struct CoherenceFilterInput {
    pub detector_bitmap: Vec<u64>,
    pub tile_id: u8,
    pub round_id: u64,
}

/// Opaque output type from the coherence decoder (ACL boundary type).
pub struct CoherenceCorrectionOutput {
    pub corrections: Vec<(u32, u8)>, // (qubit_index, pauli_code)
    pub confidence: f64,
}

/// Errors specific to the bridge translation layer.
#[derive(Debug, thiserror::Error)]
pub enum BridgeError {
    #[error("syndrome dimension mismatch: expected {expected}, got {actual}")]
    SyndromeDimensionMismatch { expected: usize, actual: usize },

    #[error("unknown correction code: {0}")]
    UnknownCorrectionCode(u8),

    #[error("coherence engine unavailable: {0}")]
    CoherenceUnavailable(String),

    #[error("tile routing failed for code distance {0}")]
    TileRoutingFailed(u32),
}
```

### Implementation Sketch

```rust
/// Production implementation backed by the ruQu coherence engine.
pub struct RuQuCoherenceBridge {
    /// Reference to the coherence engine's filter pipeline.
    filter_pipeline: Arc<dyn FilterPipelineAccess>,
    /// Stabilizer-to-detector mapping, precomputed per code distance.
    detector_maps: HashMap<u32, StabilizerDetectorMap>,
}

impl CoherenceBridge for RuQuCoherenceBridge {
    fn syndrome_to_filter_input(
        &self,
        syndrome: &SyndromeBits,
        code_distance: u32,
    ) -> Result<CoherenceFilterInput, BridgeError> {
        let map = self.detector_maps.get(&code_distance)
            .ok_or(BridgeError::TileRoutingFailed(code_distance))?;

        let mut bitmap = vec![0u64; (map.detector_count + 63) / 64];
        for (stab_idx, &fired) in syndrome.0.iter().enumerate() {
            if fired {
                let det_idx = map.stabilizer_to_detector(stab_idx);
                bitmap[det_idx / 64] |= 1u64 << (det_idx % 64);
            }
        }

        Ok(CoherenceFilterInput {
            detector_bitmap: bitmap,
            tile_id: map.tile_for_distance(code_distance),
            round_id: 0, // Filled by caller
        })
    }

    fn correction_to_pauli_ops(
        &self,
        correction: &CoherenceCorrectionOutput,
    ) -> Result<Vec<(QubitIndex, PauliOp)>, BridgeError> {
        correction.corrections.iter()
            .map(|(qubit, code)| {
                let op = match code {
                    0 => PauliOp::I,
                    1 => PauliOp::X,
                    2 => PauliOp::Y,
                    3 => PauliOp::Z,
                    other => return Err(BridgeError::UnknownCorrectionCode(*other)),
                };
                Ok((QubitIndex(*qubit), op))
            })
            .collect()
    }

    fn query_coherence_score(
        &self,
        region_id: &str,
    ) -> Result<CoherenceScore, BridgeError> {
        let energy = self.filter_pipeline.current_energy(region_id)
            .map_err(|e| BridgeError::CoherenceUnavailable(e.to_string()))?;
        // Invert: high energy = low coherence
        Ok(CoherenceScore(1.0 / (1.0 + energy as f64)))
    }

    fn report_simulation_metrics(
        &self,
        _session_id: &str,
        _metrics: &SimulationMetrics,
    ) -> Result<(), BridgeError> {
        // Translate to coherence signal format and submit
        Ok(())
    }
}
```

---

## 2. Shared Kernel: ruvector-math

Both the quantum simulation engine and the coherence engine depend on a shared mathematical foundation. Changes to `ruvector-math` must be validated against both domains before release.

### Shared Types

```rust
// ruvector-math provides these types used by both domains:

/// Complex number with f64 components (re, im).
/// Used by quantum state vectors AND coherence restriction maps.
pub struct Complex<T> {
    pub re: T,
    pub im: T,
}

/// Cache-line-aligned vector for SIMD operations.
/// Used by both state vector operations and residual computation.
#[repr(align(64))]
pub struct AlignedVec<T> {
    data: Vec<T>,
}

/// SIMD dispatch trait: implementations select AVX2, NEON, or scalar
/// at runtime depending on platform capabilities.
pub trait SimdOps {
    fn dot_product_f64(a: &[f64], b: &[f64]) -> f64;
    fn complex_multiply(a: &[Complex<f64>], b: &[Complex<f64>], out: &mut [Complex<f64>]);
    fn norm_squared(v: &[Complex<f64>]) -> f64;
    fn axpy(alpha: f64, x: &[f64], y: &mut [f64]);
}
```
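
A scalar fallback makes the trait's semantics concrete. The following is a self-contained sketch mirroring three of the signatures above; the `ScalarOps` name is hypothetical, and a real implementation would sit behind the runtime AVX2/NEON dispatch described in the comment:

```rust
// Hypothetical scalar reference for the SimdOps operations above.
// Real implementations would dispatch to AVX2/NEON at runtime.
#[derive(Clone, Copy)]
pub struct Complex<T> { pub re: T, pub im: T }

pub struct ScalarOps;

impl ScalarOps {
    /// Plain dot product: sum of elementwise products.
    pub fn dot_product_f64(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }

    /// Squared L2 norm of a complex vector: sum of |c|^2.
    pub fn norm_squared(v: &[Complex<f64>]) -> f64 {
        v.iter().map(|c| c.re * c.re + c.im * c.im).sum()
    }

    /// BLAS-style axpy: y[i] += alpha * x[i].
    pub fn axpy(alpha: f64, x: &[f64], y: &mut [f64]) {
        for (yi, xi) in y.iter_mut().zip(x) {
            *yi += alpha * xi;
        }
    }
}

fn main() {
    assert_eq!(ScalarOps::dot_product_f64(&[1.0, 2.0], &[3.0, 4.0]), 11.0);
    let v = [Complex { re: 3.0, im: 4.0 }];
    assert_eq!(ScalarOps::norm_squared(&v), 25.0);
    let mut y = [1.0, 1.0];
    ScalarOps::axpy(2.0, &[1.0, 2.0], &mut y);
    assert_eq!(y, [3.0, 5.0]);
}
```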
### Change Coordination Protocol

1. Any proposed change to `ruvector-math` must include tests for both the quantum engine use case and the coherence engine use case.
2. The CI pipeline runs `cargo test -p ruqu-core` and `cargo test -p ruvector-coherence` after any change to `ruvector-math`.
3. Breaking changes require a version bump and simultaneous updates to both downstream crates.
4. Performance regressions in SIMD operations must be caught by benchmarks in both domains.

### Boundary

Only the types and functions listed above cross the shared kernel boundary. Internal implementation details of `ruvector-math` (e.g., specific SIMD intrinsics, platform detection) are not shared.

---

## 3. Customer-Supplier: Agent System Integration

The ruVector agent system (powered by claude-flow) acts as the customer, invoking the quantum simulation engine as a supplier. The contract defines what the agent can request and what it receives in return.

### Contract

```rust
/// Contract for agent system access to the quantum simulation engine.
///
/// The agent system (customer) invokes these operations.
/// The quantum engine (supplier) fulfills them.
pub trait SimulationContract: Send + Sync {
    /// Build a circuit from a high-level description.
    fn build_circuit(&self, spec: CircuitSpec) -> Result<CircuitHandle, ContractError>;

    /// Run a simulation and return results.
    fn run_simulation(&self, circuit: CircuitHandle, config: RunConfig)
        -> Result<SimulationOutput, ContractError>;

    /// Run a VQE optimization and return the ground state energy.
    fn run_vqe(&self, spec: VQESpec) -> Result<VQEOutput, ContractError>;

    /// Query resource requirements before committing to a run.
    fn estimate_resources(&self, circuit: CircuitHandle) -> Result<ResourceEstimate, ContractError>;
}

/// High-level circuit specification from the agent.
pub struct CircuitSpec {
    pub qubit_count: u32,
    pub gate_sequence: Vec<GateSpec>,
    pub parameters: HashMap<String, f64>,
}

/// Agent-facing gate specification (simplified from internal Gate).
pub struct GateSpec {
    pub gate_type: String,
    pub target: u32,
    pub control: Option<u32>,
    pub angle: Option<f64>,
}

/// Configuration limits the agent can set.
pub struct RunConfig {
    pub max_shots: u32,
    pub max_memory_mb: u32,
    pub timeout_seconds: u32,
    pub backend_preference: Option<String>,
}

/// Results returned to the agent.
pub struct SimulationOutput {
    pub measurement_counts: HashMap<String, u32>,
    pub expectation_values: Vec<(String, f64)>,
    pub metrics: SimulationMetrics,
}

/// VQE-specific results.
pub struct VQEOutput {
    pub ground_state_energy: f64,
    pub optimal_parameters: Vec<f64>,
    pub iterations: u32,
    pub converged: bool,
}

/// Resource estimate before execution.
pub struct ResourceEstimate {
    pub memory_bytes: usize,
    pub estimated_time_ms: f64,
    pub qubit_count: u32,
    pub gate_count: u32,
}
```
|
||||
|
||||
### Agent Integration Flow

```
Agent Context          Quantum Engine              Result
    |                      |                          |
    | 1. build_circuit()   |                          |
    |--------------------->|                          |
    |   CircuitHandle      |                          |
    |<---------------------|                          |
    |                      |                          |
    | 2. estimate_resources|                          |
    |--------------------->|                          |
    |   ResourceEstimate   |                          |
    |<---------------------|                          |
    |                      |                          |
    | 3. run_simulation()  |                          |
    |--------------------->|                          |
    |                      | [executes internally]    |
    |                      |---+                      |
    |                      |   | circuit -> state     |
    |                      |   | gates -> measure     |
    |                      |<--+                      |
    |   SimulationOutput   |                          |
    |<---------------------|                          |
    |                      |                          |
    | 4. Agent acts on     |                          |
    |    results           |                          |
    v                      v                          v
```

### Resource Limits

The supplier enforces resource limits set by the customer:

- Memory: Capped at `max_memory_mb`; returns error if state vector exceeds budget
- Time: Monitored per-step; simulation aborted if `timeout_seconds` exceeded
- Qubits: Platform limit (30 for state vector, higher for tensor network) communicated via `estimate_resources`
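The memory cap reduces to a pre-flight arithmetic check, since an n-qubit state vector costs 2^n * 16 bytes. A minimal sketch, with illustrative function names rather than the crate's actual API:

```rust
/// State vector memory for n qubits: 2^n amplitudes, 16 bytes each
/// (two f64 components per complex amplitude).
fn state_vector_bytes(qubit_count: u32) -> usize {
    (1usize << qubit_count) * 16
}

/// Illustrative gate: reject runs whose state vector exceeds `max_memory_mb`.
fn check_memory_budget(qubit_count: u32, max_memory_mb: u32) -> Result<usize, String> {
    let needed = state_vector_bytes(qubit_count);
    let budget = max_memory_mb as usize * 1024 * 1024;
    if needed > budget {
        Err(format!("state vector needs {needed} bytes, budget is {budget}"))
    } else {
        Ok(needed)
    }
}

fn main() {
    // 20 qubits -> 16 MiB, fits in a 64 MiB budget.
    assert_eq!(check_memory_budget(20, 64), Ok(16 * 1024 * 1024));
    // 30 qubits -> 16 GiB, rejected under the same budget.
    assert!(check_memory_budget(30, 64).is_err());
}
```

The same arithmetic backs `estimate_resources`, so the agent can see the rejection coming before committing to a run.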
---

## 4. Published Language: OpenQASM Compatibility

This context defines a future integration point for importing and exporting circuits in the OpenQASM 3.0 standard, enabling interoperability with IBM Qiskit, Google Cirq, and other quantum frameworks.

### Translation Layer

```rust
/// Trait for OpenQASM import/export.
pub trait OpenQASMTranslator {
    /// Parse an OpenQASM 3.0 string into the internal circuit representation.
    fn import(&self, qasm: &str) -> Result<QuantumCircuit, TranslationError>;

    /// Export an internal circuit to OpenQASM 3.0 format.
    fn export(&self, circuit: &QuantumCircuit) -> Result<String, TranslationError>;
}

#[derive(Debug, thiserror::Error)]
pub enum TranslationError {
    #[error("unsupported gate in OpenQASM: {0}")]
    UnsupportedGate(String),

    #[error("parse error at line {line}: {message}")]
    ParseError { line: u32, message: String },

    #[error("circuit uses features not supported by OpenQASM 3.0: {0}")]
    UnsupportedFeature(String),
}
```

### Scope

- Phase 1: Import basic gate circuits (H, CNOT, Rz, measure)
- Phase 2: Export circuits with parameter bindings
- Phase 3: Support custom gate definitions and classical control flow
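To give a feel for the export side of this surface, a Bell-pair circuit could be rendered to OpenQASM 3.0 text roughly as below. The `SimpleGate` type and `export_qasm3` function are illustrative stand-ins, not the real `OpenQASMTranslator` implementation:

```rust
/// Illustrative gate record: name plus the qubits it acts on.
struct SimpleGate {
    name: &'static str,
    qubits: Vec<u32>,
}

/// Minimal sketch of a Phase 1-style exporter producing OpenQASM 3.0 text.
fn export_qasm3(qubit_count: u32, gates: &[SimpleGate]) -> String {
    let mut out = String::from("OPENQASM 3.0;\ninclude \"stdgates.inc\";\n");
    out.push_str(&format!("qubit[{qubit_count}] q;\n"));
    for g in gates {
        let args: Vec<String> = g.qubits.iter().map(|q| format!("q[{q}]")).collect();
        out.push_str(&format!("{} {};\n", g.name, args.join(", ")));
    }
    out
}

fn main() {
    // Bell pair: H on q[0], then CNOT with q[0] controlling q[1].
    let bell = [
        SimpleGate { name: "h", qubits: vec![0] },
        SimpleGate { name: "cx", qubits: vec![0, 1] },
    ];
    let qasm = export_qasm3(2, &bell);
    assert!(qasm.contains("h q[0];"));
    assert!(qasm.contains("cx q[0], q[1];"));
}
```

The import direction is the harder half, since a parser must map arbitrary QASM gate names onto the internal gate set or return `UnsupportedGate`.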
---

## 5. Conformist: WASM Platform

The `ruqu-wasm` crate conforms to WASM platform constraints without attempting to work around them. Limitations are accepted as-is, with graceful degradation where capabilities are reduced.

### Accepted Constraints

| Constraint | Impact | Mitigation |
|------------|--------|------------|
| No native threads | Single-threaded execution | Sequential gate application; no rayon |
| 4GB memory limit | Max ~25 qubits (state vector) | Tensor network backend for larger circuits |
| No filesystem | Cannot persist results | Return all data via JS callbacks |
| No system clock | Timing metrics unavailable | Use `performance.now()` via JS bridge |
| No SIMD (some runtimes) | Slower math | Feature-gated SIMD; scalar fallback |

### WASM API Surface

```rust
/// Public API exposed to JavaScript via wasm-bindgen.
///
/// This is the conformist boundary: we accept WASM constraints
/// and expose only what the platform allows.
#[cfg(target_arch = "wasm32")]
pub mod wasm_api {
    use wasm_bindgen::prelude::*;

    #[wasm_bindgen]
    pub struct WasmSimulator {
        session: SimulationSession,
    }

    #[wasm_bindgen]
    impl WasmSimulator {
        /// Create a new simulator for the given qubit count.
        #[wasm_bindgen(constructor)]
        pub fn new(qubit_count: u32) -> Result<WasmSimulator, JsValue> {
            // Enforce WASM-specific qubit limit
            if qubit_count > 25 {
                return Err(JsValue::from_str(
                    "WASM platform supports at most 25 qubits in state vector mode"
                ));
            }
            // ... construction
            Ok(WasmSimulator { session: todo!() })
        }

        /// Add a gate to the circuit.
        pub fn add_gate(&mut self, gate_type: &str, target: u32, control: Option<u32>)
            -> Result<(), JsValue> { Ok(()) }

        /// Run the simulation and return measurement counts as JSON.
        pub fn run(&mut self, shots: u32) -> Result<String, JsValue> {
            Ok("{}".to_string())
        }

        /// Get memory usage estimate in bytes.
        pub fn memory_estimate(&self) -> usize { 0 }
    }
}
```
---

## 6. Partnership: Graph Database Integration

The `ruvector-graph` crate and the quantum simulation engine have a bidirectional partnership around graph-structured problems, particularly QAOA and MaxCut.

### Data Flow

```rust
/// Graph data provided by ruvector-graph for quantum optimization.
pub struct GraphProblem {
    pub vertex_count: u32,
    pub edges: Vec<(u32, u32, f64)>, // (source, target, weight)
    pub problem_type: GraphProblemType,
}

#[derive(Debug, Clone, Copy)]
pub enum GraphProblemType { MaxCut, GraphColoring, TSP }

/// Results returned to ruvector-graph for annotation.
pub struct QuantumGraphResult {
    pub objective_value: CutValue,
    pub partition: Vec<bool>,
    pub confidence: f64,
    pub circuit_depth: CircuitDepth,
}

/// Partnership interface: both sides contribute and consume.
pub trait GraphQuantumPartnership {
    /// Graph -> Quantum: convert graph problem to QAOA circuit.
    fn graph_to_qaoa_circuit(
        &self,
        problem: &GraphProblem,
        layers: u32,
    ) -> Result<QuantumCircuit, DomainError>;

    /// Quantum -> Graph: feed optimization results back as graph annotations.
    fn annotate_graph_with_result(
        &self,
        problem: &GraphProblem,
        result: &QuantumGraphResult,
    ) -> Result<GraphAnnotation, DomainError>;

    /// Shared interest: partition graph using ruvector-mincut for subproblem decomposition.
    fn decompose_problem(
        &self,
        problem: &GraphProblem,
        max_subproblem_qubits: u32,
    ) -> Result<Vec<GraphProblem>, DomainError>;
}

/// Annotation written back to the graph database.
pub struct GraphAnnotation {
    pub vertex_labels: HashMap<u32, String>,
    pub edge_labels: HashMap<(u32, u32), String>,
    pub metadata: HashMap<String, String>,
}
```
---

## Cross-Cutting Concerns

### Error Handling Across Boundaries

Each bounded context defines its own error type. At integration boundaries, errors are translated through the ACL rather than propagated directly.

```rust
/// Integration boundary error: wraps domain errors from either side.
#[derive(Debug, thiserror::Error)]
pub enum IntegrationError {
    #[error("quantum engine error: {0}")]
    QuantumEngine(#[from] DomainError),

    #[error("coherence bridge error: {0}")]
    CoherenceBridge(#[from] BridgeError),

    #[error("contract violation: {0}")]
    ContractViolation(String),

    #[error("resource limit exceeded: {0}")]
    ResourceLimit(String),
}
```

### Observability

Distributed tracing spans cross crate boundaries with a shared trace context.

- Each integration call propagates a `TraceId` through the ACL
- The coherence bridge logs translation events at `DEBUG` level
- Agent contract calls log at `INFO` with duration and resource usage
- WASM calls use `console.log` via the JS bridge when tracing is enabled

### Resource Management

Memory and thread resources are coordinated with the ruVector runtime.

- State vector allocation checks the global memory budget before proceeding
- Tensor network contractions respect thread pool limits shared with rayon
- WASM mode has a fixed 4GB ceiling enforced at the conformist boundary
- All resource allocation events emit `MemoryAllocated` / `MemoryReleased` domain events
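The allocation-event invariant (every `MemoryAllocated` is eventually matched by a `MemoryReleased`) can be sketched with a toy event log. The event shapes and `EventLog` type below are assumptions for illustration, not the ruVector event bus API:

```rust
/// Illustrative domain events for resource tracking; variant names mirror
/// the events mentioned above, but the payload shape is an assumption.
#[derive(Debug)]
enum ResourceEvent {
    MemoryAllocated { bytes: usize },
    MemoryReleased { bytes: usize },
}

/// Minimal event log standing in for the ruVector event bus.
struct EventLog(Vec<ResourceEvent>);

impl EventLog {
    fn allocate(&mut self, bytes: usize) {
        self.0.push(ResourceEvent::MemoryAllocated { bytes });
    }
    fn release(&mut self, bytes: usize) {
        self.0.push(ResourceEvent::MemoryReleased { bytes });
    }
    /// Net bytes still held: allocations minus releases.
    fn outstanding(&self) -> isize {
        self.0.iter().map(|e| match e {
            ResourceEvent::MemoryAllocated { bytes } => *bytes as isize,
            ResourceEvent::MemoryReleased { bytes } => -(*bytes as isize),
        }).sum()
    }
}

fn main() {
    let mut log = EventLog(Vec::new());
    log.allocate(16 * 1024 * 1024); // 20-qubit state vector activated
    log.release(16 * 1024 * 1024);  // freed when the circuit completes
    assert_eq!(log.outstanding(), 0); // zero-idle invariant holds
}
```

An `outstanding()` of zero at idle is exactly the "inert when idle" property the runtime is expected to enforce.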
### Configuration Propagation

Configuration flows from the ruVector root config into the quantum engine.

```rust
/// Quantum engine configuration derived from ruVector global config.
pub struct QuantumEngineConfig {
    pub max_qubits: u32,
    pub default_backend: BackendType,
    pub memory_budget_bytes: usize,
    pub thread_count: usize,
    pub coherence_bridge_enabled: bool,
    pub wasm_mode: bool,
}

impl From<&RuVectorConfig> for QuantumEngineConfig {
    fn from(global: &RuVectorConfig) -> Self {
        Self {
            max_qubits: global.quantum.max_qubits.unwrap_or(30),
            default_backend: global.quantum.backend.parse().unwrap_or(BackendType::StateVector),
            memory_budget_bytes: global.memory.budget_bytes,
            thread_count: global.runtime.thread_count,
            coherence_bridge_enabled: global.coherence.enabled,
            wasm_mode: cfg!(target_arch = "wasm32"),
        }
    }
}
```
---

## Event Flow Diagrams

### 1. VQE Optimization Flow

```
Agent         CircuitBuilder      SimSession       QuantumState      Optimizer
  |                  |                |                  |                |
  | build_circuit(spec)              |                  |                |
  |----------------->|               |                  |                |
  |  CircuitHandle   |               |                  |                |
  |<-----------------|               |                  |                |
  |                  |               |                  |                |
  | run_vqe(spec)    |               |                  |                |
  |-------------------------------------------------------->|           |
  |                  |               |                  | init(params)   |
  |                  |               |                  |<---------------|
  |                  |               |                  |                |
  |                  |    +----------|---LOOP-----------|--------+       |
  |                  |    |          |                  |        |       |
  |                  |    | start()  |                  |        |       |
  |                  |    |--------->|                  |        |       |
  |                  |    |          | apply_gates()    |        |       |
  |                  |    |          |----------------->|        |       |
  |                  |    |          | expectation_value|        |       |
  |                  |    |          |----------------->|        |       |
  |                  |    |          |      energy      |        |       |
  |                  |    |<---------|------------------|        |       |
  |                  |    |          |                  | update(grad)   |
  |                  |    |          |                  |--------------->|
  |                  |    |          |                  |   new_params   |
  |                  |    |          |                  |<---------------|
  |                  |    +----------|---END LOOP-------|--------+       |
  |                  |               |                  |                |
  |  VQEOutput(energy, params)       |                  |                |
  |<--------------------------------------------------------|           |
  |                  |               |                  |                |
```
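The loop in the diagram can be sketched end to end with a toy energy function standing in for circuit execution plus expectation-value estimation. The optimizer here is plain finite-difference gradient descent, and all names are illustrative rather than the real `Optimizer` API:

```rust
/// Stand-in for evaluating <psi(theta)|H|psi(theta)> by running the ansatz
/// circuit: a quadratic with minimum -1.0 at theta = 0.5.
fn energy(theta: f64) -> f64 {
    (theta - 0.5).powi(2) - 1.0
}

/// Skeleton of the VQE loop: evaluate energy, estimate gradient, update params.
fn run_vqe(mut theta: f64, iterations: u32, lr: f64) -> (f64, f64) {
    let eps = 1e-6;
    for _ in 0..iterations {
        // update(grad): finite-difference gradient of the energy.
        let grad = (energy(theta + eps) - energy(theta - eps)) / (2.0 * eps);
        theta -= lr * grad; // new_params
    }
    (energy(theta), theta)
}

fn main() {
    let (e, theta) = run_vqe(3.0, 200, 0.1);
    // Converges to the ground state energy of the toy landscape.
    assert!((e - (-1.0)).abs() < 1e-6);
    assert!((theta - 0.5).abs() < 1e-3);
}
```

In the real flow, each `energy` call is one pass of `apply_gates()` plus `expectation_value` against the Hamiltonian, which is why circuit depth dominates VQE cost.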
### 2. Surface Code QEC with Coherence Bridge

```
SurfaceCodeExp    NoiseService    CoherenceBridge    ruQu Filters    Decoder
     |                 |                 |                 |             |
     | run_cycle()     |                 |                 |             |
     |--+              |                 |                 |             |
     |  | inject_errors()                |                 |             |
     |  |------------->|                 |                 |             |
     |  |  error_list  |                 |                 |             |
     |  |<-------------|                 |                 |             |
     |  |              |                 |                 |             |
     |  | extract_syndrome()             |                 |             |
     |  |--+           |                 |                 |             |
     |  |  | SyndromeBits                |                 |             |
     |  |<-+           |                 |                 |             |
     |  |              |                 |                 |             |
     |  | syndrome_to_filter_input()     |                 |             |
     |  |------------------------------->|                 |             |
     |  |              |   FilterInput   |                 |             |
     |  |              |                 | process()       |             |
     |  |              |                 |---------------->|             |
     |  |              |                 |    Verdict      |             |
     |  |              |                 |<----------------|             |
     |  |              |                 |                 |             |
     |  | correction_to_pauli_ops()      |                 |             |
     |  |<-------------------------------|                 |             |
     |  |              |                 |                 |             |
     |  | decode(syndrome)               |                 |             |
     |  |------------------------------------------------------------->|
     |  |  correction  |                 |                 |             |
     |  |<-------------------------------------------------------------|
     |  |              |                 |                 |             |
     |  | check_logical_error()          |                 |             |
     |  |--+           |                 |                 |             |
     |  |  | bool      |                 |                 |             |
     |  |<-+           |                 |                 |             |
     |  |              |                 |                 |             |
     | CycleReport     |                 |                 |             |
     |<-+              |                 |                 |             |
```
### 3. WASM Deployment Flow

```
Browser JS          ruqu-wasm (WASM)       ruqu-core          Results
    |                      |                   |                  |
    | new WasmSimulator(n) |                   |                  |
    |--------------------->|                   |                  |
    |                      | QuantumState::new(n)                 |
    |                      |------------------>|                  |
    |                      |       state       |                  |
    |                      |<------------------|                  |
    |    WasmSimulator     |                   |                  |
    |<---------------------|                   |                  |
    |                      |                   |                  |
    | add_gate("h", 0)     |                   |                  |
    |--------------------->|                   |                  |
    |                      | circuit.add_gate()|                  |
    |                      |------------------>|                  |
    |          Ok          |                   |                  |
    |<---------------------|                   |                  |
    |                      |                   |                  |
    | add_gate("cx", 1, 0) |                   |                  |
    |--------------------->|                   |                  |
    |                      | circuit.add_gate()|                  |
    |                      |------------------>|                  |
    |          Ok          |                   |                  |
    |<---------------------|                   |                  |
    |                      |                   |                  |
    | run(1000)            |                   |                  |
    |--------------------->|                   |                  |
    |                      | session.start()   |                  |
    |                      |------------------>|                  |
    |                      | run_to_completion()                  |
    |                      |------------------>|                  |
    |                      |                   | [gate loop]      |
    |                      |                   |---+              |
    |                      |                   |   | apply_gate() |
    |                      |                   |<--+              |
    |                      |                   | measure()        |
    |                      |                   |---+              |
    |                      |                   |   | outcomes     |
    |                      |                   |<--+              |
    |                      | SimulationMetrics |                  |
    |                      |<------------------|                  |
    |                      |                   |                  |
    |                      | JSON.serialize(counts)               |
    |                      |------------------------------------->|
    | "{\"00\": 503, \"11\": 497}"             |                  |
    |<---------------------|                   |                  |
    |                      |                   |                  |
    | [JS callback with results]               |                  |
    |                      |                   |                  |
```
---

## Migration Strategy

### Phase 1: Standalone ruqu-core

**Goal**: A self-contained crate with no external dependencies except `ruvector-math`.

- Implement `QuantumCircuit`, `QuantumState`, `SimulationSession` aggregates
- Implement `CircuitBuilder`, `GateFusionService`, `NoiseInjectionService`
- All value objects and domain events defined
- Unit tests and property-based tests for normalization, gate unitarity
- No coherence bridge, no agent integration, no WASM

**Dependency**: `ruvector-math` (shared kernel only)

### Phase 2: ruqu-algorithms + Coherence Integration

**Goal**: Add VQE, surface code experiments, and the coherence bridge.

- Implement `VQEOptimization`, `SurfaceCodeExperiment` aggregates
- Implement `TensorNetworkState` for circuits exceeding state vector limits
- Build `CoherenceBridge` anti-corruption layer
- Integrate with ruQu `FilterPipeline` and `MWPMDecoder`
- Add `PauliExpectationService`, `ContractionPathOptimizer`
- Integration tests: VQE convergence, surface code logical error rate vs theory

**Dependencies**: `ruqu-core`, `ruvector-math`, `ruqu` (coherence bridge target)

### Phase 3: ruqu-wasm

**Goal**: Deploy to browser environments with graceful degradation.

- Implement `WasmSimulator` conformist wrapper
- Add `wasm-bindgen` API surface
- Enforce WASM constraints (25-qubit limit, no threads, no filesystem)
- JavaScript test harness running circuits in headless browser
- Performance benchmarks: gate throughput in WASM vs native

**Dependencies**: `ruqu-core`, `wasm-bindgen`, `wasm-pack`

### Phase 4: Full Agent System Integration

**Goal**: Complete customer-supplier integration with the claude-flow agent system.

- Implement `SimulationContract` trait and production adapter
- Add resource estimation and budget enforcement
- Implement `GraphQuantumPartnership` for QAOA/MaxCut
- Integration with `ruvector-graph` for graph problem decomposition
- End-to-end tests: agent builds circuit, runs simulation, acts on results
- OpenQASM import/export (published language)

**Dependencies**: All previous phases, `ruvector-graph`, `claude-flow` agent SDK

---

## References

1. Evans, E. (2003). "Domain-Driven Design: Tackling Complexity in the Heart of Software."
2. Vernon, V. (2013). "Implementing Domain-Driven Design." Chapter 13: Integrating Bounded Contexts.
3. Coherence Engine DDD: `docs/architecture/coherence-engine-ddd.md`
4. ruQu crate: `crates/ruQu/`
5. ruvector-math: shared kernel for SIMD and complex number operations
6. OpenQASM 3.0 specification: https://openqasm.com/
530
vendor/ruvector/docs/architecture/quantum-engine/quantum-engine-ddd-strategic.md
vendored
Normal file
# Quantum Simulation Engine: Domain-Driven Design - Strategic Design

**Version**: 0.1
**Date**: 2026-02-06
**Status**: Draft

---

## Domain Vision

The Quantum Simulation Engine provides **on-device quantum algorithm experimentation** within ruVector's always-on, agentic environment. It enables hybrid classical-quantum research on edge devices, allowing agents to leverage quantum algorithms (VQE, Grover, QAOA, QEC) without cloud services.

> **This is not a cloud quantum API.** The engine answers: "What does this quantum circuit produce?" entirely on the local device, using classical state-vector simulation with SIMD acceleration.

The engine follows ruVector's event-driven model: **inert when idle, activated on demand, resources released immediately**. A 20-qubit simulation allocates 16 MiB of state vector on activation and frees it the moment the circuit completes. No background threads, no persistent memory, no warm pools.
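Rust ownership makes this lifecycle nearly free to express: tie the state vector to a value whose drop releases the memory. A minimal sketch, where the `ActiveSimulation` type is illustrative rather than the engine's real API:

```rust
/// Sketch of the zero-idle policy via ownership: the state vector exists
/// only while an activation value is alive.
struct ActiveSimulation {
    amplitudes: Vec<(f64, f64)>, // (re, im) per basis state, 16 bytes each
}

impl ActiveSimulation {
    /// Allocate on activation: 2^n amplitudes, initialized to |0...0>.
    fn activate(qubit_count: u32) -> Self {
        let mut amplitudes = vec![(0.0, 0.0); 1usize << qubit_count];
        amplitudes[0] = (1.0, 0.0);
        Self { amplitudes }
    }

    fn memory_bytes(&self) -> usize {
        self.amplitudes.len() * 16
    }
}

fn main() {
    {
        let sim = ActiveSimulation::activate(20);
        assert_eq!(sim.memory_bytes(), 16 * 1024 * 1024); // 16 MiB while active
    } // dropped here: memory returned immediately, nothing persists
}
```

No explicit teardown call is needed; leaving the scope is the release, which is what makes "no warm pools" cheap to enforce.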
### The Universal Simulation Object

The power lies in a **single underlying state-vector engine** inside ruqu-sim. Once the linear algebra is fixed, everything else becomes interpretation:

| Domain | Qubits Become | Gates Become | Measurement Becomes | Circuit Becomes |
|--------|---------------|--------------|---------------------|-----------------|
| **Chemistry** | Molecular orbitals | Fermionic operators | Energy estimates | VQE ansatz |
| **Optimization** | Decision variables | Mixing/cost ops | Cut values | QAOA circuit |
| **Search** | Database indices | Oracle + diffusion | Found element | Grover iterations |
| **Error Correction** | Data + ancilla qubits | Stabilizer checks | Syndrome bits | QEC cycle |
| **Cryptography** | Key register bits | Quantum Fourier transform | Period estimate | Shor subroutine |
| **Machine Learning** | Feature dimensions | Parameterized rotations | Classification | Quantum kernel |

**Same linear algebra, different interpretations. Same state vector = superposition. Same measurement = probabilistic collapse with Born rule.**
---

## Strategic Design

### Core Domain

**Quantum State Simulation** - The heart of the system, managing quantum state vectors, applying unitary gate operations, and performing projective measurements. This is where the primary complexity and innovation reside. **Most circuits run in a single fast pass; only large entangled states or iterative variational loops require sustained computation.**

### Supporting Domains

1. **Circuit Construction** - Building, validating, and optimizing quantum circuits
2. **State Management** - State vector lifecycle, entanglement tracking, memory gating
3. **Measurement & Observation** - Projective measurement, expectation values, syndrome extraction
4. **Algorithm Execution** - High-level quantum algorithm implementations (VQE, Grover, QAOA, QEC)
5. **Optimization & Backend** - SIMD acceleration, gate fusion, tensor network backends
6. **Deployment & Integration** - WASM compilation, agent bridge, coherence bridge to ruQu

### Generic Domains

1. **Linear Algebra** - Complex number math, matrix-vector products, Kronecker products (via `ruvector-math`)
2. **Random Sampling** - Measurement outcome sampling, noise injection (via `rand` crate)
3. **Logging/Tracing** - Event recording, performance metrics (via `tracing` crate + `ruvector-metrics`)

### Application Evolution

| Timeline | Capabilities | Key Value |
|----------|-------------|-----------|
| **Phase 1 (Now)** | State vector sim, basic gates, VQE/Grover/QAOA | Local quantum experimentation without cloud |
| **Phase 2 (6mo)** | Tensor networks, noise models, surface code cycles | Error correction research on edge devices |
| **Phase 3 (12mo)** | GPU acceleration, OpenQASM 3.0 import, 30+ qubits | Production-grade quantum algorithm research |
| **Phase 4 (24mo)** | Quantum hardware bridge, hybrid cloud-local execution | Real quantum device integration |

> **Edge-First Quantum**: The system eventually enables agents to reason about quantum algorithms without any network dependency.

---
## Ecosystem Integration Map

```
+---------------------------------------------------------------------------+
|                        QUANTUM SIMULATION ENGINE                          |
|                                                                           |
|  +-------------------------------------------------------------------+   |
|  |                  CIRCUIT CONSTRUCTION DOMAIN                      |   |
|  |  QuantumCircuit | Gate | GateSchedule | CircuitOptimizer          |   |
|  |  Parameterized templates (VQE ansatz, QAOA mixer, Grover oracle)  |   |
|  +-------------------------------------------------------------------+   |
|                                 |                                         |
|                                 v                                         |
|  +-----------------------------+    +-----------------------------+      |
|  | CORE: QUANTUM STATE         |    | STATE MANAGEMENT            |      |
|  | SIMULATION                  |<---| DOMAIN                      |      |
|  |                             |    |                             |      |
|  | * State vector engine       |    | * Allocation / deallocation |      |
|  | * Gate application (SIMD)   |    | * Entanglement tracking     |      |
|  | * Unitary evolution         |    | * Memory gating (zero-idle) |      |
|  | * Tensor contraction        |    | * State checkpointing       |      |
|  +-----------------------------+    +-----------------------------+      |
|        |              |                          |                        |
|        v              v                          v                        |
|  +-----------------------------+    +-----------------------------+      |
|  | MEASUREMENT &               |    | ALGORITHM EXECUTION         |      |
|  | OBSERVATION DOMAIN          |    | DOMAIN                      |      |
|  |                             |    |                             |      |
|  | * Projective measurement    |    | * VQE + classical optimizer |      |
|  | * Expectation values        |    | * Grover auto-iteration     |      |
|  | * Shot-based sampling       |    | * QAOA graph-based circuits |      |
|  | * Syndrome extraction       |    | * Surface code + decoder    |      |
|  +-----------------------------+    +-----------------------------+      |
|                                 |                                         |
|                                 v                                         |
|  +-----------------------------+    +-----------------------------+      |
|  | OPTIMIZATION &              |    | DEPLOYMENT &                |      |
|  | BACKEND DOMAIN              |    | INTEGRATION DOMAIN          |      |
|  |                             |    |                             |      |
|  | * SIMD dispatch             |    | * WASM bindings (ruqu-wasm) |      |
|  | * Gate fusion               |    | * Agent bridge (activation) |      |
|  | * Tensor network backend    |    | * Observability / metrics   |      |
|  | * Cache-local strategies    |    | * Coherence bridge (ruQu)   |      |
|  +-----------------------------+    +-----------------------------+      |
|                                                                           |
+---------------------------------------------------------------------------+
                                 |
            +--------------------+---------------------+
            |                    |                     |
            v                    v                     v
     +--------------+   +-----------------+   +------------------+
     | ruvector-    |   | ruvector-       |   | ruQu             |
     | math (SIMD)  |   | metrics         |   | (decoder bridge) |
     +--------------+   +-----------------+   +------------------+
            |                    |
            v                    v
     +--------------+   +-----------------+   +------------------+
     | ruvector-    |   | ruvector-       |   | cognitum-gate-   |
     | graph        |   | nervous-system  |   | kernel (tiles)   |
     +--------------+   +-----------------+   +------------------+
            |                    |
            v                    v
     +--------------+   +-----------------+
     | ruvector-    |   | sona (adaptive  |
     | mincut       |   | learning)       |
     +--------------+   +-----------------+
```
### Crate-to-Context Mapping

| Bounded Context | Primary Crate | Supporting Crates |
|-----------------|---------------|-------------------|
| Circuit Construction | `ruqu-sim` (new) | - |
| Quantum State Simulation (Core) | `ruqu-sim` (new) | `ruvector-math` |
| State Management | `ruqu-sim` (new) | - |
| Measurement & Observation | `ruqu-sim` (new) | `rand` |
| Algorithm Execution | `ruqu-sim` (new) | `ruvector-graph` (QAOA) |
| Optimization & Backend | `ruqu-sim` (new) | `ruvector-math` (SIMD) |
| Deployment & Integration | `ruqu-wasm` (new) | `ruqu`, `ruvector-metrics`, `ruvector-nervous-system` |

---
## Context Map

```
+-----------------------------------------------------------------------+
|                      QUANTUM ENGINE CONTEXT MAP                       |
|                                                                       |
|                       [Published Language]                            |
|                       OpenQASM 3.0 format                             |
|                              |                                        |
|                              v                                        |
|  +------------------+           +------------------+                  |
|  |                  | Shared    |                  |                  |
|  |  CIRCUIT         | Kernel    |  STATE           |                  |
|  |  CONSTRUCTION    |<--------->|  MANAGEMENT      |                  |
|  |                  | (Gate,    |                  |                  |
|  |  Builds circuits | QubitIdx  |  Allocates and   |                  |
|  |  Validates gates | types)    |  tracks state    |                  |
|  +--------+---------+           +--------+---------+                  |
|           |                              |                            |
|           | Customer                     | Customer                   |
|           | Supplier                     | Supplier                   |
|           v                              v                            |
|  +------------------+           +------------------+                  |
|  |                  |           |                  |                  |
|  |  MEASUREMENT &   |---------->|  ALGORITHM       |                  |
|  |  OBSERVATION     | Supplier  |  EXECUTION       |                  |
|  |                  | Customer  |                  |                  |
|  |  Measures states |           |  Runs VQE/QAOA/  |                  |
|  |  Extracts syndr. |           |  Grover/QEC      |                  |
|  +--------+---------+           +--------+---------+                  |
|           |                              |                            |
|           +------------+-----------------+                            |
|                        |                                              |
|                        v                                              |
|  +------------------+           +------------------+                  |
|  |                  |           |                  |                  |
|  |  OPTIMIZATION &  |           |  DEPLOYMENT &    |                  |
|  |  BACKEND         |           |  INTEGRATION     |                  |
|  |                  |           |                  |                  |
|  |  SIMD, fusion,   |           |  WASM, agents,   |                  |
|  |  tensor networks |           |  ruQu bridge     |                  |
|  +------------------+           +--------+---------+                  |
|                                          |                            |
|                          Conformist      | Anti-Corruption            |
|                          (ruVector       | Layer                      |
|                           APIs)          | (ruQu decoder)             |
|                                          |                            |
+------------------------------------------+----------------------------+
                                           |
                                           v
                           [Existing ruVector Ecosystem]

Context Relationships:
  <------->  Shared Kernel (shared types across boundary)
  ------->   Customer-Supplier (downstream depends on upstream)
  Conformist: Deployment conforms to existing ruVector APIs
  ACL: CoherenceBridge wraps ruQu decoder behind anti-corruption layer
  Published Language: OpenQASM 3.0 for circuit interchange
  Open Host Service: ruqu-wasm exposes JS API
```
### Relationship Summary

| Upstream | Downstream | Pattern | Shared Types |
|----------|------------|---------|-------------|
| Circuit Construction | State Management | **Shared Kernel** | `Gate`, `QubitIndex`, `GateMatrix` |
| Measurement & Observation | Algorithm Execution | **Customer-Supplier** | `MeasurementOutcome`, `ExpectationValue` |
| State Management | Algorithm Execution | **Customer-Supplier** | `QuantumState`, `StateCheckpoint` |
| State Management | Measurement & Observation | **Customer-Supplier** | `QuantumState`, `Amplitude` |
| Optimization & Backend | Core Simulation | **Partnership** | `FusedGateMatrix`, `OptimizationHint` |
| Existing ruVector APIs | Deployment & Integration | **Conformist** | ruVector event types, metric types |
| ruQu decoder API | Deployment & Integration | **Anti-Corruption Layer** | Isolated behind `CoherenceBridge` |
| Circuit Construction | External tools | **Published Language** | OpenQASM 3.0 circuit format |
| Deployment & Integration | JS consumers | **Open Host Service** | `ruqu-wasm` JS API |

---
## Ubiquitous Language

### Quantum Fundamentals

| Term | Definition |
|------|------------|
| **Qubit** | Fundamental unit of quantum information existing in superposition of \|0> and \|1> basis states |
| **Amplitude** | Complex number representing probability amplitude of a basis state; measurement probability is its squared modulus |
| **State Vector** | Array of 2^n complex amplitudes representing the full quantum state of an n-qubit register |
| **Basis State** | One of 2^n classical bit-string configurations; each has an associated amplitude |
| **Superposition** | State where multiple basis states have nonzero amplitude |
| **Entanglement** | Quantum correlation preventing independent per-qubit factorization of the joint state |
| **Born Rule** | Measurement probability equals squared modulus of amplitude: P(x) = \|alpha_x\|^2 |
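The Born rule is directly computable from the amplitude array. A minimal sketch using a hard-coded Bell state (\|00> + \|11>)/sqrt(2) as the input:

```rust
/// Born rule: probability of basis state x is |alpha_x|^2, where each
/// amplitude is stored as an (re, im) pair.
fn probabilities(amplitudes: &[(f64, f64)]) -> Vec<f64> {
    amplitudes.iter().map(|(re, im)| re * re + im * im).collect()
}

fn main() {
    let s = std::f64::consts::FRAC_1_SQRT_2;
    // Amplitudes for |00>, |01>, |10>, |11> of a Bell state.
    let bell = [(s, 0.0), (0.0, 0.0), (0.0, 0.0), (s, 0.0)];
    let p = probabilities(&bell);
    assert!((p[0] - 0.5).abs() < 1e-12); // |00> with probability 1/2
    assert!((p[3] - 0.5).abs() < 1e-12); // |11> with probability 1/2
    assert!((p.iter().sum::<f64>() - 1.0).abs() < 1e-12); // normalization
}
```

Shot-based sampling is then just drawing from this distribution, which is why normalization (probabilities summing to 1) is a core invariant of the state vector.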
### Circuit Model

| Term | Definition |
|------|------------|
| **Gate** | Unitary matrix operation acting on 1 or 2 qubits; transforms state via matrix-vector multiply |
| **Circuit** | Ordered sequence of gates applied to a qubit register; the program of a quantum computation |
| **Gate Matrix** | Unitary matrix defining gate action; must satisfy U * U_dagger = I |
| **Qubit Index** | Zero-based integer identifying a qubit; determines which amplitude pairs a gate addresses |
| **Circuit Depth** | Maximum sequential gate layers; primary determinant of simulation time |
| **Parameterized Gate** | Gate whose matrix depends on continuous real parameters (e.g., Ry(theta)) |
| **Gate Fusion** | Combining adjacent gates on same qubits into a single matrix multiply |
| **Gate Schedule** | Topologically sorted gate-to-timestep assignment respecting qubit-sharing constraints |
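Gate fusion rests on associativity: applying G1 then G2 to a state equals applying the single fused matrix G2 * G1. A sketch with real-valued Hadamard and Pauli-X matrices (production code would use complex entries via `ruvector-math`):

```rust
/// 2x2 real matrix; a stand-in for a single-qubit gate matrix.
type Mat2 = [[f64; 2]; 2];

/// Fuse two gates: the later gate goes on the left of the product.
fn matmul(a: Mat2, b: Mat2) -> Mat2 {
    let mut c = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

/// Apply a gate matrix to a single-qubit state vector.
fn apply(m: Mat2, v: [f64; 2]) -> [f64; 2] {
    [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]
}

fn main() {
    let s = std::f64::consts::FRAC_1_SQRT_2;
    let h: Mat2 = [[s, s], [s, -s]];         // Hadamard
    let x: Mat2 = [[0.0, 1.0], [1.0, 0.0]];  // Pauli-X
    let fused = matmul(x, h);                // X after H, as one matrix

    let v = [1.0, 0.0]; // |0>
    let step_by_step = apply(x, apply(h, v));
    let one_shot = apply(fused, v);
    for i in 0..2 {
        assert!((step_by_step[i] - one_shot[i]).abs() < 1e-12);
    }
}
```

The payoff is that one fused matrix-vector sweep over the 2^n-entry state vector replaces two, which is where the circuit-depth savings come from.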
### Measurement & Algorithms

| Term | Definition |
|------|------------|
| **Measurement** | Projective observation collapsing superposition to a basis state per the Born rule |
| **Mid-Circuit Measurement** | Measurement during (not only at the end of) circuit execution |
| **Shot** | Single circuit execution plus measurement; repeated shots build statistics |
| **Expectation Value** | Observable average over a quantum state: <psi\|H\|psi> |
| **Pauli String** | Tensor product of per-qubit Pauli operators (I/X/Y/Z) with a coefficient |
| **Hamiltonian** | Hermitian operator (weighted sum of Pauli strings) representing total energy |
| **Syndrome** | Classical bits from ancilla measurements indicating error presence and location |
| **Ansatz** | Parameterized circuit template encoding the variational search space |
| **VQE** | Variational Quantum Eigensolver; iteratively minimizes the Hamiltonian expectation |
| **QAOA** | Quantum Approximate Optimization Algorithm; alternating cost/mixer unitaries |
| **Grover Search** | Amplitude amplification finding marked items in O(sqrt(N)) queries |
| **Oracle** | Black-box gate marking target states by phase flip |
| **Surface Code** | 2D topological QEC code with stabilizer checks on lattice faces/vertices |
| **Logical Error Rate** | Undetected logical error probability per QEC cycle |
| **Decoder** | Classical algorithm mapping syndromes to corrections; bridge to ruQu |
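For Z-only Pauli strings the expectation value has a closed form over the state vector, which a sketch (real amplitudes, hypothetical helper; X/Y terms additionally need a basis rotation) makes explicit:

```rust
// Expectation of a Z-only Pauli string: each basis state x contributes
// (+1 or -1 depending on the parity of x under the Z mask) * |amp_x|^2.
fn expectation_z(amplitudes: &[f64], mask: usize) -> f64 {
    amplitudes
        .iter()
        .enumerate()
        .map(|(x, amp)| {
            let sign = if (x & mask).count_ones() % 2 == 0 { 1.0 } else { -1.0 };
            sign * amp * amp
        })
        .sum()
}
```

On `|0>` this gives `<Z> = +1`, on `|1>` it gives `-1`, and an equal superposition averages to 0.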
### Simulation Infrastructure

| Term | Definition |
|------|------------|
| **State Allocator** | On-demand allocation/deallocation enforcing the zero-idle policy |
| **Memory Estimate** | Predicted bytes: 2^n * 16; gating threshold for allocation |
| **Entanglement Tracker** | Tracks qubit correlations, enabling subsystem splitting |
| **State Checkpoint** | Serialized state snapshot for mid-circuit save/restore |
| **Tensor Network** | Alternative representation via contracted tensor factors; efficient for low entanglement |
| **Contraction Path** | Tensor contraction order minimizing total FLOPs |
---

## Bounded Context Details

### Context 1: Circuit Construction Domain

**Purpose**: Language for expressing quantum computations. Validation, scheduling, optimization, OpenQASM interchange.

| Entity / Value Object | Type | Responsibility |
|----------------------|------|---------------|
| **QuantumCircuit** | Aggregate Root | Ordered gate collection with register metadata |
| **Gate** | Entity | Single unitary with target qubits and optional parameters |
| **GateSchedule** | Entity | Time-step assignment for parallel execution analysis |
| **CircuitOptimizer** | Domain Service | Fusion, cancellation, and commutation rules |
| GateId, QubitIndex, GateMatrix, ParameterBinding, GateType | Value Objects | Immutable circuit building blocks |

**Events**: `CircuitCreated`, `GateAppended`, `CircuitOptimized`, `CircuitValidated`, `ParametersBound`

**Invariants**: (1) Gate unitarity. (2) Qubit indices within bounds. (3) No duplicate targets per gate. (4) All parameters bound before execution.
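Invariant (1) is mechanically checkable. A minimal sketch for a real-valued 2x2 gate (illustrative helper; complex entries would add conjugation to the transpose):

```rust
// Gate unitarity check: U * U^T must equal I within tolerance.
// For real matrices the transpose doubles as the dagger.
fn is_unitary_2x2(u: [[f64; 2]; 2], tol: f64) -> bool {
    let mut p = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                p[i][j] += u[i][k] * u[j][k];
            }
        }
    }
    (p[0][0] - 1.0).abs() < tol
        && (p[1][1] - 1.0).abs() < tol
        && p[0][1].abs() < tol
        && p[1][0].abs() < tol
}
```

The Hadamard matrix passes; a shear matrix such as `[[1, 1], [0, 1]]` is rejected, which is exactly the "reject circuit" path in the error model.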
---

### Context 2: State Management Domain

**Purpose**: State vector lifecycle following the zero-idle model. Entanglement tracking. Memory gating.

| Entity / Value Object | Type | Responsibility |
|----------------------|------|---------------|
| **QuantumState** | Aggregate Root | Owns the 2^n complex amplitude array |
| **EntanglementTracker** | Entity | Bipartite entanglement graph for subsystem analysis |
| **StateAllocator** | Domain Service | On-demand allocation, immediate deallocation |
| Amplitude, QubitCount, MemoryEstimate, StateCheckpoint | Value Objects | State representation primitives |

**Events**: `StateAllocated`, `StateDeallocated`, `EntanglementDetected`, `SubsystemSplit`, `CheckpointCreated`, `MemoryLimitExceeded`

**Invariants**: (1) Normalization preserved. (2) Zero-idle: no state persists beyond execution. (3) Allocation gated by device capacity. (4) Checkpoint restore reproduces exact amplitudes.
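Invariant (1), normalization, is the cheapest runtime check the state aggregate can assert after every gate. A sketch (hypothetical helper over `(re, im)` pairs):

```rust
// Normalization invariant: sum over all basis states of |amp|^2 must be 1.
fn norm_squared(amplitudes: &[(f64, f64)]) -> f64 {
    amplitudes.iter().map(|(re, im)| re * re + im * im).sum()
}
```

A drift beyond tolerance maps to the "Normalization drift" warning in the error model, recovered by renormalizing.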
---

### Context 3: Measurement & Observation Domain

**Purpose**: Projective measurement with collapse. Analytical expectation values. Syndrome extraction for QEC.

| Entity / Value Object | Type | Responsibility |
|----------------------|------|---------------|
| **MeasurementEngine** | Aggregate Root | Born-rule sampling and state collapse |
| **ExpectationCalculator** | Entity | Analytical <psi\|H\|psi> from Pauli decomposition |
| **SyndromeExtractor** | Entity | Ancilla measurement and classical bit extraction |
| MeasurementOutcome, PauliString, Hamiltonian, SyndromeBits, ShotResult | Value Objects | Measurement data types |

**Events**: `MeasurementPerformed`, `ExpectationComputed`, `SyndromeExtracted`, `ShotsCompleted`

**Invariants**: (1) Born rule: probabilities sum to 1.0. (2) Post-measurement collapse to a definite state. (3) Hamiltonian Hermiticity. (4) Syndrome bit count matches the code.
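Born-rule sampling with collapse reduces to inverse-CDF sampling over the probability distribution. A sketch with real amplitudes (the `r` parameter stands in for a uniform draw in [0, 1); the engine uses the `rand` crate for this):

```rust
// Sample a basis state per the Born rule, then collapse the state vector
// onto the observed outcome (invariant 2 of this context).
fn measure_and_collapse(state: &mut [f64], r: f64) -> usize {
    let mut acc = 0.0;
    let mut outcome = state.len() - 1;
    for (x, amp) in state.iter().enumerate() {
        acc += amp * amp;
        if r < acc {
            outcome = x;
            break;
        }
    }
    for (x, amp) in state.iter_mut().enumerate() {
        *amp = if x == outcome { 1.0 } else { 0.0 };
    }
    outcome
}
```

For a Bell-like state with weight split between indices 0 and 3, a draw of `r = 0.75` lands in the upper half of the CDF and collapses the state onto index 3.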
---

### Context 4: Algorithm Execution Domain

**Purpose**: High-level quantum algorithms as orchestrated loops over circuits, states, and measurements.

| Entity / Value Object | Type | Responsibility |
|----------------------|------|---------------|
| **VQERunner** | Entity | Iterative ansatz parameter optimization to minimize energy |
| **GroverSearch** | Entity | Oracle + diffusion with auto-computed iteration count |
| **QAOASolver** | Entity | Graph-based cost/mixer circuit construction and angle optimization |
| **SurfaceCodeSimulator** | Entity | Stabilizer cycles, syndrome extraction, decoder invocation |
| AlgorithmResult, OptimizationTrace, CutValue, LogicalErrorRate, ConvergenceCriteria | Value Objects | Algorithm output types |

**Events**: `VQEIterationCompleted`, `VQEConverged`, `GroverSearchCompleted`, `QAOARoundCompleted`, `SurfaceCodeCycleCompleted`, `LogicalErrorDetected`

**Invariants**: (1) Grover iteration count = floor(pi/4 * sqrt(N/M)). (2) VQE energy is an upper bound on the ground state. (3) QAOA cost/mixer alternate with the correct parameter count. (4) Surface code distance matches the lattice.
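Invariant (1) is the auto-computed iteration count mentioned for GroverSearch. A direct sketch of the formula for N items and M marked targets:

```rust
// Grover iteration count: floor(pi/4 * sqrt(N / M)).
fn grover_iterations(n: u64, m: u64) -> u64 {
    (std::f64::consts::FRAC_PI_4 * ((n as f64) / (m as f64)).sqrt()).floor() as u64
}
```

For a 20-qubit search space (N = 2^20) with a single marked item this gives 804 iterations; for the toy case N = 4, M = 1 a single iteration already suffices.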
---

### Context 5: Optimization & Backend Domain

**Purpose**: Performance backends that accelerate simulation without altering semantics. SIMD, fusion, tensor networks.

| Entity / Value Object | Type | Responsibility |
|----------------------|------|---------------|
| **SimulationBackend** | Aggregate Root | Selects the optimal execution strategy |
| **GateFuser** | Entity | Combines compatible gate sequences into single operations |
| **TensorContractor** | Entity | Tensor network decomposition for low-entanglement states |
| **SIMDDispatcher** | Entity | Platform detection and optimized kernel dispatch |
| OptimizationHint, ContractionPath, FusedGateMatrix, PlatformCapabilities | Value Objects | Backend selection metadata |

**Events**: `BackendSelected`, `GatesFused`, `TensorNetworkContracted`, `SIMDKernelDispatched`

**Invariants**: (1) Fused gates produce results identical to sequential application. (2) Tensor contraction matches the state-vector result. (3) SIMD falls back to scalar if unavailable. (4) Intermediates stay within the memory budget.
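Gate fusion on a single qubit is just matrix multiplication: applying A then B equals applying the fused matrix B * A. A sketch with real 2x2 matrices (illustrative; the engine fuses complex matrices):

```rust
// Fuse two single-qubit gates: "A then B" becomes the single matrix B * A,
// so the fused gate costs one matrix-vector multiply instead of two.
fn fuse(b: [[f64; 2]; 2], a: [[f64; 2]; 2]) -> [[f64; 2]; 2] {
    let mut f = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                f[i][j] += b[i][k] * a[k][j];
            }
        }
    }
    f
}
```

Fusing X with X yields the identity, which is also the basis for the optimizer's cancellation rule.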
---

### Context 6: Deployment & Integration Domain

**Purpose**: WASM compilation, agent activation bridge, ruQu decoder anti-corruption layer, observability.

| Entity / Value Object | Type | Responsibility |
|----------------------|------|---------------|
| **WASMBindings** | Entity | Open Host Service via the wasm-bindgen JS API |
| **AgentBridge** | Entity | ruvector-nervous-system integration for context-triggered activation |
| **MetricsReporter** | Entity | Publishes SimulationMetrics to ruvector-metrics |
| **CoherenceBridge** | Entity | ACL translating syndromes to ruQu's DetectorBitmap/SyndromeRound |
| PlatformCapabilities, QubitLimit, SimulationMetrics, DecoderResult | Value Objects | Integration data types |

**Events**: `SimulationRequested`, `SimulationCompleted`, `ResourcesReleased`, `DecoderInvoked`, `MetricsPublished`

**Integration Patterns**:

- **Anti-Corruption Layer**: CoherenceBridge isolates the engine from ruQu's internal DDD model
- **Conformist**: Deployment conforms to existing ruVector event types and metric schemas
- **Open Host Service**: ruqu-wasm exposes a clean JS/TS API for browser experimentation
- **Published Language**: OpenQASM 3.0 for circuit interchange with external tools
---

## Cross-Cutting Concerns

### Zero-Idle Resource Model

```
IDLE (0 bytes) --> ACTIVATE (allocate 2^n * 16 bytes) --> COMPUTE --> RELEASE (0 bytes)
```

No warm pools, no pre-allocated buffers, no background threads.
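In Rust the zero-idle lifecycle falls out of ownership: the state buffer is allocated on activation and dropped, returning its memory immediately, when the activation scope ends. A sketch (hypothetical function, not the engine's API):

```rust
// Zero-idle sketch: the state buffer exists only inside the activation scope.
fn run_circuit(num_qubits: u32) -> f64 {
    // ACTIVATE: allocate 2^n amplitudes (16 bytes each as (f64, f64) pairs).
    let mut state = vec![(0.0_f64, 0.0_f64); 1usize << num_qubits];
    state[0] = (1.0, 0.0); // |0...0>
    // COMPUTE: gates would be applied here; we just report the norm.
    let norm: f64 = state.iter().map(|(re, im)| re * re + im * im).sum();
    norm
    // RELEASE: `state` is dropped here; the footprint returns to 0 bytes,
    // with no warm pool retaining the buffer.
}
```

There is nothing to forget to free: leaving the scope is the release.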
### Memory Gating

| Qubits | State Vector Size | Decision |
|--------|-------------------|----------|
| 10 | 16 KiB | Always permit |
| 15 | 512 KiB | Always permit |
| 20 | 16 MiB | Permit on most devices |
| 25 | 512 MiB | Gate: check available RAM |
| 30 | 16 GiB | Gate: likely refuse on edge |
| 35+ | 512 GiB+ | Always refuse (state vector); consider tensor network |
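The table rows all follow from the 2^n * 16 estimate, so the gate is a one-line comparison. A sketch (the `available_bytes` input is a stand-in; the real StateAllocator queries the device or WASM environment):

```rust
// Predicted footprint: 2^n complex amplitudes at 16 bytes each (two f64s).
fn state_vector_bytes(num_qubits: u32) -> u64 {
    (1u64 << num_qubits) * 16
}

// Gating decision: permit allocation only if the estimate fits the budget.
fn permit_allocation(num_qubits: u32, available_bytes: u64) -> bool {
    state_vector_bytes(num_qubits) <= available_bytes
}
```

With a 512 MiB budget the gate admits 25 qubits (exactly 512 MiB) and refuses 30 (16 GiB), matching the table.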
### Error Model

| Context | Error | Severity | Recovery |
|---------|-------|----------|----------|
| Circuit Construction | Non-unitary gate | Fatal | Reject circuit |
| State Management | Memory limit exceeded | Recoverable | Try tensor network or refuse |
| State Management | Normalization drift | Warning | Renormalize |
| Measurement | Zero-probability outcome | Warning | Return uniform |
| Algorithm Execution | VQE non-convergence | Recoverable | Return best-so-far |
| Deployment | WASM memory limit | Fatal | Report to agent |
| Deployment | ruQu decoder unavailable | Recoverable | Skip correction, log |
### Observability

All simulation runs produce `SimulationMetrics` (circuit name, qubit count, gate count, depth, shots, backend type, wall time, peak memory, SIMD utilization) flowing through `ruvector-metrics` for unified dashboard integration.

### Security

| Concern | Mitigation |
|---------|------------|
| Timing side channels in measurement | Constant-time sampling via rejection method |
| Memory contents after deallocation | Zero-fill on deallocation (SecureAllocator mode) |
| Denial-of-service via large qubit counts | Memory gating with a hard upper bound per request |
| Untrusted OpenQASM input | Parser validates unitarity and qubit bounds before execution |
| WASM sandbox escape | No file I/O, no network; pure computation within the WASM sandbox |
---

## Module Structure

```
crates/ruqu-sim/src/
+-- lib.rs                        # Public API
+-- circuit/                      # Circuit Construction context
|   +-- quantum_circuit.rs        # QuantumCircuit aggregate
|   +-- gate.rs                   # Gate entity, GateType enum
|   +-- schedule.rs               # GateSchedule
|   +-- optimizer.rs              # CircuitOptimizer (fusion, cancel)
|   +-- openqasm.rs               # OpenQASM 3.0 import/export
+-- state/                        # State Management context
|   +-- quantum_state.rs          # QuantumState aggregate
|   +-- allocator.rs              # StateAllocator (zero-idle)
|   +-- entanglement.rs           # EntanglementTracker
|   +-- checkpoint.rs             # StateCheckpoint
+-- measurement/                  # Measurement & Observation context
|   +-- engine.rs                 # MeasurementEngine
|   +-- expectation.rs            # ExpectationCalculator
|   +-- syndrome.rs               # SyndromeExtractor
+-- algorithms/                   # Algorithm Execution context
|   +-- vqe.rs, grover.rs         # VQERunner, GroverSearch
|   +-- qaoa.rs                   # QAOASolver
|   +-- surface_code.rs           # SurfaceCodeSimulator
+-- backend/                      # Optimization & Backend context
|   +-- simulation_backend.rs     # SimulationBackend
|   +-- gate_fuser.rs             # GateFuser
|   +-- tensor_network.rs         # TensorContractor
|   +-- simd_dispatch.rs          # SIMDDispatcher
|   +-- kernels/                  # avx2.rs, avx512.rs, neon.rs, wasm_simd.rs, scalar.rs
+-- types.rs, events.rs, error.rs

crates/ruqu-wasm/src/
+-- lib.rs                        # wasm-bindgen entry
+-- js_api.rs                     # JS-facing API
+-- agent_bridge.rs               # ruvector-nervous-system integration
+-- coherence_bridge.rs           # ACL for ruQu decoder
+-- metrics.rs                    # ruvector-metrics export
```
### Dependency Graph

```
ruqu-sim
+-- ruvector-math (SIMD kernels, complex math)
+-- rand (measurement sampling)
+-- ruvector-graph (QAOA graph input)

ruqu-wasm
+-- ruqu-sim (core simulation)
+-- ruqu (coherence bridge ACL)
+-- ruvector-metrics (observability)
+-- ruvector-nervous-system (agent activation)
+-- wasm-bindgen (JS bindings)
```
---

## Performance Targets

| Metric | Target |
|--------|--------|
| Single-gate (1q, 20-qubit register) | < 50 us |
| Full circuit (100 gates, 15 qubits) | < 10 ms |
| Hamiltonian expectation (10q, 50 terms) | < 1 ms |
| SIMD speedup over scalar | > 3x (AVX2), > 6x (AVX-512) |
| Grover (20 qubits, 1 target) | < 500 ms |
| VQE convergence (H2, 4 qubits) | < 5 s, < 100 iterations |
| State allocation / deallocation | < 10 us / < 1 us |
| WASM circuit (10 qubits, 50 gates) | < 50 ms |

---
## References

1. Evans, E. (2003). *Domain-Driven Design: Tackling Complexity in the Heart of Software.*
2. Vernon, V. (2013). *Implementing Domain-Driven Design.*
3. Nielsen, M. A. & Chuang, I. L. (2010). *Quantum Computation and Quantum Information.*
4. Peruzzo, A. et al. (2014). "A variational eigenvalue solver on a photonic quantum processor."
5. Farhi, E. et al. (2014). "A Quantum Approximate Optimization Algorithm."
6. Fowler, A. G. et al. (2012). "Surface codes: Towards practical large-scale quantum computation."
7. ruQu crate: existing coherence assessment and syndrome processing in ruVector.
8. Coherence Engine DDD: `/docs/architecture/coherence-engine-ddd.md`
---

`vendor/ruvector/docs/architecture/quantum-engine/quantum-engine-ddd-tactical.md` (1426 lines, vendored): diff suppressed because it is too large.
`vendor/ruvector/docs/architecture/ruvltra-medium-architecture.md` (529 lines, vendored, new file):
# RuvLTRA-Medium Architecture Design Document

## Executive Summary

This document describes the architecture and implementation of RuvLTRA-Medium, a 3-billion-parameter language model based on Qwen2.5-3B-Instruct, enhanced with SONA learning hooks, HNSW routing, and advanced memory optimization techniques.

## 1. Core Architecture

### 1.1 Base Model Specifications

**Architecture:** Qwen2.5-3B-Instruct (Transformer Decoder)

```
Configuration:
├── Parameters: ~3.0B
├── Layers: 32 decoder layers
├── Hidden Size: 2048
├── Attention Heads: 16
├── KV Heads: 2 (GQA 8:1)
├── Head Dimension: 128
├── Intermediate Size: 11008 (SwiGLU)
├── Vocabulary: 151,936 tokens
└── Context: 32,768 tokens
```
### 1.2 Model Components

**Decoder Layer Structure:**

```
Input
  ↓
RMSNorm (input_layernorm)
  ↓
Multi-Head Attention (GQA)
  - Q projection: [2048 → 2048]
  - K projection: [2048 → 256] (GQA compressed)
  - V projection: [2048 → 256] (GQA compressed)
  - O projection: [2048 → 2048]
  - RoPE: theta=1M, head_dim=128
  ↓
Residual Connection
  ↓
RMSNorm (post_attention_layernorm)
  ↓
MLP (SwiGLU)
  - Gate: [2048 → 11008]
  - Up: [2048 → 11008]
  - Down: [11008 → 2048]
  ↓
Residual Connection
  ↓
Output (→ next layer or final norm)
```
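The GQA compression in the projections above directly bounds the KV-cache footprint. A sketch of the arithmetic, assuming fp16 (2-byte) K/V entries (the storage precision is an assumption, not stated in this document):

```rust
// KV-cache bytes per token implied by the configuration: K and V each
// store kv_heads * head_dim values per layer. With 2 KV heads instead of
// 16 full heads, GQA shrinks the cache 8x.
fn kv_cache_bytes_per_token(layers: u64, kv_heads: u64, head_dim: u64, bytes_per_elem: u64) -> u64 {
    2 * kv_heads * head_dim * bytes_per_elem * layers
}
```

At 32 layers x 2 KV heads x 128 dims x 2 bytes x (K+V), each token costs 32 KiB of cache, so a full 32K-token context needs 1 GiB before the paged-cache savings of Section 3.1.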
## 2. RuvLTRA Enhancements

### 2.1 SONA Learning Hooks

**Hook Placement Strategy:**

```
Layer 0-7:   No hooks (early token processing)
Layer 8:     ✓ HOOK - Early pattern recognition
Layer 9-15:  No hooks
Layer 16:    ✓ HOOK - Mid-layer semantic extraction
Layer 17-23: No hooks
Layer 24:    ✓ HOOK - Deep reasoning capture
Layer 25-31: No hooks (final refinement)
```
**Hook Implementation:**

```rust
pub struct RuvLtraMediumDecoderLayer {
    // ... layer components ...
    pub has_sona_hook: bool,
}

impl RuvLtraMediumDecoderLayer {
    pub fn forward(
        &self,
        hidden_states: &[f32],
        positions: &[usize],
        paged_cache: Option<&mut PagedKVCache>,
        sona: Option<&Arc<RwLock<SonaIntegration>>>,
    ) -> Result<Vec<f32>> {
        // ... attention computation producing `attn_out` ...

        // Apply SONA hook after attention
        let attn_out = if self.has_sona_hook {
            if let Some(sona_int) = sona {
                self.apply_sona_hook(&attn_out, sona_int)?
            } else {
                attn_out
            }
        } else {
            attn_out
        };

        // ... continue with MLP ...
    }
}
```
**SONA Learning Loops:**

1. **Instant Loop** (per request):
   - MicroLoRA adaptation (rank 4)
   - Ring buffer storage
   - Edge weight updates
   - Latency: <0.05 ms

2. **Background Loop** (hourly):
   - Router training
   - EWC++ Fisher matrix
   - BaseLoRA consolidation (rank 8)
   - Pattern indexing

3. **Deep Loop** (weekly):
   - Pattern bank pruning
   - Memory consolidation
   - Knowledge transfer
   - Quality filtering (threshold 0.6)
### 2.2 HNSW Routing Integration

**Index Structure:**

```
HNSW Index:
├── M = 16 (base), 32 (agent variant)
├── ef_construction = 200 (base), 400 (agent)
├── ef_search = 50
├── Distance metric: Cosine similarity
└── Node capacity: 50,000 patterns
```

**Search Performance:**

| Dataset Size | Brute Force | HNSW | Speedup |
|-------------|-------------|------|---------|
| 1,000 | 0.8 ms | 0.005 ms | 160x |
| 10,000 | 8.2 ms | 0.012 ms | 683x |
| 50,000 | 41.5 ms | 0.018 ms | 2,305x |
| 100,000 | 83.1 ms | 0.021 ms | 3,957x |

**Claude Flow Integration:**

```rust
// Agent routing via HNSW
let task_embedding = model.embed("Implement REST API")?;
let neighbors = hnsw_index.search(&task_embedding, 5)?; // k = 5 nearest

// Neighbors: [(agent_type, similarity_score)]
// [("coder", 0.92), ("backend-dev", 0.87), ...]
```
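The index's distance metric is cosine similarity; a minimal reference implementation for f32 embeddings (the production kernel is SIMD-optimized, so this is only the mathematical definition):

```rust
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for nonzero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```

Identical directions score 1.0 and orthogonal embeddings score 0.0, which is how the routing scores such as `("coder", 0.92)` arise.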
### 2.3 ReasoningBank Trajectory Storage

**Trajectory Format:**

```json
{
  "trajectory_id": "uuid-v4",
  "task": "code-generation",
  "states": [
    {
      "layer": 8,
      "embedding": [0.123, -0.456, ...],
      "timestamp": 1234567890
    },
    {
      "layer": 16,
      "embedding": [0.789, 0.234, ...],
      "timestamp": 1234567891
    }
  ],
  "actions": [
    {
      "action": "generate_function",
      "quality": 0.85
    }
  ],
  "final_quality": 0.87,
  "metadata": {
    "agent": "coder",
    "tokens": 256
  }
}
```

**Storage Backend:**

- AgentDB with HNSW indexing
- Semantic search via embeddings
- Quality-based filtering
- Temporal decay (old patterns degrade)
## 3. Memory Optimization

### 3.1 Paged KV Cache

**Page Structure:**

```rust
pub struct PageBlock {
    pub block_id: usize,
    pub keys: Vec<f32>,   // [page_size, num_kv_heads, head_dim]
    pub values: Vec<f32>, // [page_size, num_kv_heads, head_dim]
    pub num_tokens: usize,
    pub ref_count: AtomicUsize,
}
```

**Block Size:** 64 tokens per page

**Memory Layout:**

```
Sequence: "The quick brown fox..."
├── Page 0 [tokens 0-63]:    Block #42
├── Page 1 [tokens 64-127]:  Block #103
├── Page 2 [tokens 128-191]: Block #87
└── ...
```

**Benefits:**

- **Memory Savings:** 40-60% reduction
- **Dynamic Allocation:** On-demand page allocation
- **Copy-on-Write:** Efficient sequence forking
- **Prefix Caching:** Shared prefixes use the same blocks

**Configuration:**

```rust
pub struct PagedAttentionConfig {
    pub page_size: usize,              // 64 tokens per page
    pub max_pages_per_sequence: usize, // 512 (32K tokens / 64)
    pub page_table_capacity: usize,    // 8192 total blocks
    pub num_heads: usize,              // 16
    pub head_dim: usize,               // 128
    pub num_kv_heads: usize,           // 2
}
```
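The memory-layout diagram implies a simple page-table lookup: a token's logical position splits into a page number and an in-page offset, and the page table then maps the logical page to any physical block in the pool. A sketch (block numbers such as #87 in the diagram are illustrative):

```rust
// Page lookup with page_size = 64: token index -> (logical page, offset).
const PAGE_SIZE: usize = 64;

fn locate(token_idx: usize) -> (usize, usize) {
    (token_idx / PAGE_SIZE, token_idx % PAGE_SIZE)
}
```

Token 130, for example, sits 2 tokens into logical page 2, wherever that page's physical block happens to live.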
### 3.2 Flash Attention 2

**Algorithm:**

1. **Tiling:** Split Q, K, V into blocks
2. **Streaming:** Load blocks from HBM to SRAM
3. **Recomputation:** Compute softmax on-the-fly
4. **IO Efficiency:** Minimize memory transfers

**Speedup Analysis:**

| Seq Length | Standard | Flash Attn 2 | Speedup | Memory |
|-----------|----------|--------------|---------|--------|
| 512 | 45 ms | 18 ms | 2.5x | -30% |
| 2K | 180 ms | 43 ms | 4.2x | -50% |
| 8K | 720 ms | 103 ms | 7.0x | -65% |
| 32K | 2880 ms | 407 ms | 7.1x | -70% |

**Implementation (sketch; helper functions and offsets elided):**

```rust
fn flash_attention(&self, query: &[f32], key: &[f32], value: &[f32], seq_len: usize)
    -> Result<Vec<f32>>
{
    let scale = 1.0 / (self.config.head_dim as f32).sqrt();

    for h in 0..num_heads {
        for t in 0..seq_len {
            // Extract Q slice
            let q_slice = &query[q_offset..q_offset + head_dim];

            // Extract K, V slices (GQA mapping)
            let kv_head = h / gqa_ratio;
            let k_slice = extract_kv(key, kv_head, seq_len);
            let v_slice = extract_kv(value, kv_head, seq_len);

            // Flash attention kernel (NEON optimized), causal masking enabled
            let head_out = flash_attention_neon(q_slice, &k_slice, &v_slice, scale, true);

            // Write output
            output[out_offset..out_offset + head_dim].copy_from_slice(&head_out);
        }
    }
}
```
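The "compute softmax on-the-fly" step relies on the online (streaming) softmax: a running max and running sum let each score block be visited once without materializing the full row. A minimal single-row sketch, numerically stable via the running max:

```rust
// Online softmax over a stream of scores: maintain a running max `m` and a
// running sum `s` of exp(score - m), rescaling `s` whenever `m` grows.
fn online_softmax(scores: &[f64]) -> Vec<f64> {
    let mut m = f64::NEG_INFINITY;
    let mut s = 0.0;
    for &x in scores {
        let m_new = m.max(x);
        s = s * (m - m_new).exp() + (x - m_new).exp();
        m = m_new;
    }
    scores.iter().map(|&x| (x - m).exp() / s).collect()
}
```

The result matches the naive two-pass softmax, which is exactly the invariant that lets Flash Attention process K/V tiles in a single streaming pass.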
### 3.3 Speculative Decoding

**Draft Model:** RuvLTRA-Small (0.5B, Qwen 0.5B)

**Algorithm:**

```
1. Draft Phase:
   Generate K=4 tokens with draft model (fast)
   Tokens: [t1, t2, t3, t4]

2. Verify Phase:
   Run main model on [context, t1, t2, t3, t4] in parallel
   Get probabilities: [p1, p2, p3, p4]

3. Accept/Reject:
   For i in 1..K:
     if p_main[i] >= p_draft[i] * acceptance_threshold:
       accept token i
     else:
       reject token i and all subsequent
       sample correct token from p_main[i]
       break

4. Effective tokens per step:
   Average: 1 + acceptance_rate * K
   With 70% acceptance and K=4: 1 + 0.7*4 = 3.8 tokens/step
```

**Configuration:**

```rust
pub struct SpeculativeConfig {
    pub lookahead: usize,           // K = 4 tokens
    pub acceptance_threshold: f32,  // 0.7 (70% confidence)
    pub draft_temperature: f32,     // 0.0 (greedy draft)
    pub adaptive_lookahead: bool,   // true: adjust K based on acceptance
    pub min_lookahead: usize,       // 2
    pub max_lookahead: usize,       // 8
}
```

**Expected Speedup:**

| Scenario | Acceptance Rate | Speedup |
|----------|----------------|---------|
| Greedy (T=0.0) | 75% | 2.8-3.2x |
| Low temp (T=0.5) | 60% | 2.2-2.6x |
| High temp (T=1.0) | 40% | 1.5-1.8x |
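Step 4 of the algorithm is the quantity the speedup table is built on, and it is a one-liner:

```rust
// Expected tokens accepted per verification step: 1 + acceptance_rate * K.
// The "+1" is the main model's own token (the sampled correction on reject).
fn effective_tokens_per_step(acceptance_rate: f64, k: u32) -> f64 {
    1.0 + acceptance_rate * k as f64
}
```

With 70% acceptance and K = 4 drafts this yields roughly 3.8 tokens per main-model step, consistent with the worked figure in the algorithm above.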
## 4. Model Variants

### 4.1 RuvLTRA-Medium-Base

**Purpose:** General-purpose inference

**Configuration:**
- Temperature: 0.7
- Top-p: 0.9
- SONA hooks: [8, 16, 24]
- Pattern capacity: 50,000
- Quality threshold: 0.6

**Optimization:**
- Balanced precision/recall
- Moderate learning rate
- Standard HNSW (M=16)

### 4.2 RuvLTRA-Medium-Coder

**Purpose:** Code generation and analysis

**Configuration:**
- Temperature: 0.2 (deterministic)
- Top-p: 0.95
- SONA hooks: [8, 16, 24, 28]
- Pattern capacity: 100,000
- Quality threshold: 0.7 (stricter)

**Optimization:**
- Extra late-layer hook (28) for code structure
- Larger pattern bank for API/library patterns
- Higher quality threshold for correctness

### 4.3 RuvLTRA-Medium-Agent

**Purpose:** Agent routing and planning

**Configuration:**
- Temperature: 0.3
- Top-p: 0.85
- SONA hooks: [8, 16, 24]
- HNSW M: 32 (more connections)
- HNSW ef_construction: 400
- MicroLoRA rank: 2 (faster adaptation)

**Optimization:**
- Higher HNSW connectivity for routing
- Lower LoRA rank for latency
- Faster instant learning rate (0.02)
## 5. Quantization Support

### 5.1 Supported Formats

**Q4_K_M (4-bit K-quants, Medium):**
- Bytes per param: 0.5625 (~4.5 bits)
- Model size: ~2.0 GB
- Quality loss: ~2%
- Speed: Fast (68 tok/s)
- **Recommended for production**

**Q5_K_M (5-bit K-quants, Medium):**
- Bytes per param: 0.6875 (~5.5 bits)
- Model size: ~2.5 GB
- Quality loss: ~1%
- Speed: Medium (55 tok/s)
- **Recommended for balanced quality**

**Q8_0 (8-bit quantization):**
- Bytes per param: 1.0625 (~8.5 bits)
- Model size: ~3.5 GB
- Quality loss: <0.5%
- Speed: Slower (42 tok/s)
- **Recommended for maximum quality**

**Mixed Precision:**
- FP16 attention + Q4 MLP
- Model size: ~2.8 GB
- Quality loss: ~1.5%
- Speed: Medium (60 tok/s)
- **Recommended for attention-heavy tasks**
### 5.2 Quantization Implementation

```rust
pub enum RuvLtraMediumQuant {
    Q4KM,  // 4-bit K-quants
    Q5KM,  // 5-bit K-quants
    Q80,   // 8-bit
    Mixed, // FP16 attn + Q4 MLP
}

impl RuvLtraMediumQuant {
    /// Effective bytes per parameter; the fractional values come from the
    /// K-quant block scales listed in Section 5.1. The Mixed figure is an
    /// approximate blend derived from the ~2.8 GB size above.
    pub fn bytes_per_param(&self) -> f32 {
        match self {
            Self::Q4KM => 0.5625,
            Self::Q5KM => 0.6875,
            Self::Q80 => 1.0625,
            Self::Mixed => 0.9375,
        }
    }

    pub fn model_size_mb(&self, num_params: usize) -> f32 {
        (num_params as f32 * self.bytes_per_param()) / (1024.0 * 1024.0)
    }
}
```
## 6. Performance Characteristics

### 6.1 Inference Benchmarks (Apple M3 Max)

| Configuration | Tok/s | Memory | Power | Quality |
|--------------|-------|--------|-------|---------|
| Base Q4_K_M | 68 | 2.2 GB | 12W | 100% |
| Base Q5_K_M | 55 | 2.7 GB | 14W | 101% |
| Base Q8_0 | 42 | 3.8 GB | 16W | 102% |
| Coder Q4_K_M | 65 | 2.4 GB | 13W | 98% |
| Agent Q4_K_M | 72 | 2.1 GB | 11W | 97% |
| + Speculative | 158 | 2.8 GB | 15W | 99% |

### 6.2 Quality Benchmarks

**MMLU (Massive Multitask Language Understanding):**
- Base: 68.2%
- Coder: 66.8%
- Agent: 64.5%

**HumanEval (Code Generation):**
- Base: 52.4%
- Coder: 61.7%
- Agent: 48.9%

**GSM8K (Math Reasoning):**
- Base: 71.3%
- Coder: 69.8%
- Agent: 73.6%
## 7. File Structure

```
crates/ruvllm/src/models/
├── mod.rs             # Module exports
├── ruvltra.rs         # RuvLTRA-Small (0.5B)
└── ruvltra_medium.rs  # RuvLTRA-Medium (3B) ← NEW

docs/
├── ruvltra-medium.md               # User guide
└── ruvltra-medium-architecture.md  # This document
```
## 8. Integration Points

### 8.1 With RuvLTRA-Small

- Speculative decoding draft model
- Knowledge distillation target
- Edge deployment pairing

### 8.2 With Claude Flow

- Agent routing embeddings
- Task classification
- Trajectory recording
- Pattern sharing

### 8.3 With AgentDB

- HNSW index backend
- Pattern storage
- Semantic search
- Vector operations
## 9. Future Enhancements

1. **Multimodal Extension:** Vision encoder integration
2. **Context Extension:** 128K token context (YaRN scaling)
3. **MoE Variant:** Mixture-of-Experts for specialization
4. **On-Device Fine-Tuning:** LoRA adaptation on-device
5. **Model Merging:** Combine Base + Coder + Agent

## 10. Summary

RuvLTRA-Medium is a production-ready 3B-parameter model with:

✅ **Qwen2.5-3B base** for quality
✅ **SONA learning hooks** for continuous improvement
✅ **HNSW routing** for agent coordination
✅ **Paged KV cache** for memory efficiency
✅ **Flash Attention 2** for speed
✅ **Speculative decoding** for 2-3x acceleration
✅ **Three specialized variants** for diverse use cases
✅ **Q4/Q5/Q8 quantization** for deployment flexibility

The model achieves an optimal balance of quality, speed, and memory efficiency, making it suitable for production deployment on Apple Silicon and modern GPUs.
---

`vendor/ruvector/docs/architecture/temporal-tensor-store-ddd.md` (1792 lines, vendored): diff suppressed because it is too large.