Files
wifi-densepose/docs/adr/quantum-engine/ADR-QE-001-quantum-engine-core-architecture.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

306 lines
12 KiB
Markdown

# ADR-QE-001: Quantum Engine Core Architecture
**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
## Context
### Problem Statement
ruVector needs a quantum simulation engine for on-device quantum algorithm
experimentation. The platform runs on distributed edge systems, primarily
targeting Cognitum's 256-core low-power processors, and emphasizes ultra-low-power
event-driven computing. Quantum simulation is a natural extension of ruVector's
mathematical computation capabilities: the same SIMD-optimized linear algebra
that powers vector search and neural inference can drive state-vector manipulation
for quantum circuits.
### Requirements
The engine must support gate-model quantum circuit simulation up to approximately
25 qubits, covering the following algorithm families:
| Algorithm Family | Use Case | Typical Qubits | Gate Depth |
|------------------|----------|-----------------|------------|
| VQE (Variational Quantum Eigensolver) | Molecular simulation, optimization | 8-20 | 50-500 per iteration |
| Grover's Search | Unstructured database search | 8-25 | O(sqrt(2^n)) |
| QAOA (Quantum Approximate Optimization) | Combinatorial optimization | 10-25 | O(p * edges) |
| Quantum Error Correction | Surface code, stabilizer circuits | 9-25 (logical + ancilla) | Repetitive syndrome rounds |
### Memory Scaling Analysis
Quantum state-vector simulation stores the full amplitude vector of 2^n complex
numbers. Each amplitude is a pair of f64 values (real + imaginary = 16 bytes).
Memory grows exponentially:
```
Qubits Amplitudes State Size With Scratch Buffer
------ ----------- ---------- -------------------
10 1,024 16 KB 32 KB
15 32,768 512 KB 1 MB
20 1,048,576 16 MB 32 MB
22 4,194,304 64 MB 128 MB
24 16,777,216 256 MB 512 MB
25 33,554,432 512 MB 1.07 GB
26 67,108,864 1.07 GB 2.14 GB
28 268,435,456 4.29 GB 8.59 GB
30 1,073,741,824 17.18 GB 34.36 GB
```
At 25 qubits the state vector requires approximately 512 MB (1.07 GB with a
scratch buffer for intermediate calculations). This is the practical ceiling
for WebAssembly's 32-bit address space. Native execution with sufficient RAM
can push to 30+ qubits.
### Edge Computing Constraints
Cognitum's 256-core processors operate under strict power and memory budgets:
- **Power envelope**: Event-driven activation; cores idle at near-zero draw
- **Memory**: Shared pool, typically 2-8 GB per node
- **Interconnect**: Low-latency mesh between cores, suitable for parallel simulation
- **Workload model**: Burst computation triggered by agent events, not continuous
The quantum engine must respect this model: allocate state only when a simulation
is triggered, execute the circuit, return results, and immediately release all
memory.
## Decision
Implement a **pure Rust state-vector quantum simulator** as a new crate family
(`ruQu` quantum engine) within the ruVector workspace. The following architectural
decisions define the engine.
### 1. Pure Rust Implementation (No C/C++ FFI)
The entire simulation engine is written in Rust with no foreign function interface
dependencies. This ensures:
- Compilation to `wasm32-unknown-unknown` without emscripten or C toolchains
- Memory safety guarantees throughout the simulation pipeline
- Unified build system via Cargo across all targets
- No external library version conflicts or platform-specific linking issues
### 2. State-Vector Simulation as Primary Backend
The engine uses explicit full-amplitude state-vector representation as its
primary simulation mode. Each gate application transforms the full 2^n
amplitude vector via matrix-vector multiplication.
```
Circuit Execution Model:
|psi_0> ──[H]──[CNOT]──[Rz(theta)]──[Measure]── classical bits
| | | |
v v v v
[init] [apply_H] [apply_CNOT] [apply_Rz] [sample]
| | | | |
2^n f64 2^n f64 2^n f64 2^n f64 collapse
complex complex complex complex to basis
```
Gate application follows the standard decomposition:
- **Single-qubit gates**: Iterate amplitude pairs (i, i XOR 2^target), apply 2x2
unitary. O(2^n) operations per gate.
- **Two-qubit gates**: Iterate amplitude quadruples, apply 4x4 unitary.
O(2^n) operations per gate.
- **Multi-qubit gates**: Decompose into single and two-qubit gates, or apply
directly via 2^k x 2^k matrix on k target qubits.
### 3. Qubit Limits and Precision
| Parameter | WASM Target | Native Target |
|-----------|-------------|---------------|
| Max qubits (default) | 25 | 30+ (RAM-dependent) |
| Max qubits (hard limit) | 26 (with f32) | Memory-limited |
| Precision (default) | Complex f64 | Complex f64 |
| Precision (optional) | Complex f32 | Complex f32 |
| State size at max | ~1.07 GB | ~17 GB at 30 qubits |
Complex f64 is the default precision, providing approximately 15 decimal digits
of accuracy -- sufficient for quantum chemistry applications and deep circuits
where accumulated floating-point error matters. An optional f32 mode halves
memory usage at the cost of precision, suitable for shallow circuits and
approximate optimization.
### 4. Event-Driven Activation Model
The engine follows ruVector's event-driven philosophy:
```
Agent Context ruQu Engine Memory
| | |
|-- trigger(circuit) ->| |
| |-- allocate(2^n) ---->|
| |<---- state_ptr ------|
| | |
| |-- [execute gates] -->|
| |-- [measure] -------->|
| | |
|<-- results ---------| |
| |-- deallocate() ----->|
| | |
(idle) (inert) (freed)
```
- **Inert by default**: No background threads, no persistent allocations
- **Allocate on demand**: State vector created when circuit execution begins
- **Free immediately**: All simulation memory released upon result delivery
- **No global state**: Multiple concurrent simulations supported via independent
state handles (no shared mutable global)
### 5. Dual-Target Compilation
The crate supports two compilation targets from a single codebase:
```
ruqu-core
|
+----------+----------+
| |
[native target] [wasm32-unknown-unknown]
| |
- Full SIMD (AVX2, - WASM SIMD128
AVX-512, NEON) - 4GB address limit
- Rayon threading - Optional SharedArrayBuffer
- Optional GPU (wgpu) - No GPU
- 30+ qubits - 25 qubit ceiling
- Full OS integration - Sandboxed
```
Conditional compilation via Cargo feature flags controls target-specific code
paths. The public API surface is identical across targets.
### 6. Optional Tensor Network Mode
For circuits with limited entanglement (e.g., shallow QAOA, certain VQE
ansatze), the engine offers an optional tensor network backend:
- Represents the quantum state as a network of tensors rather than a single
exponential vector
- Memory scales as O(n * chi^2) where chi is the bond dimension (maximum
entanglement width)
- Efficient for circuits where entanglement grows slowly or remains bounded
- Falls back to full state-vector when bond dimension exceeds threshold
- Enabled via the `tensor-network` feature flag
## Alternatives Considered
### Alternative 1: Qukit (Rust, WASM-ready)
A pre-1.0 Rust quantum simulator with WASM support.
| Criterion | Assessment |
|-----------|------------|
| Maturity | Pre-1.0, limited community |
| WASM support | Present but untested at scale |
| Optimization | Basic; no SIMD, no gate fusion |
| Integration | Would require adapter layer |
| Maintenance | External dependency risk |
**Rejected**: Insufficient optimization depth and maturity for production use.
### Alternative 2: QuantRS2 (Rust, Python-focused)
A Rust quantum simulator primarily targeting Python bindings via PyO3.
| Criterion | Assessment |
|-----------|------------|
| Performance | Good benchmarks on native |
| WASM support | Not a design target |
| Dependencies | Heavy; Python-oriented build |
| API design | Python-first, Rust API secondary |
| Integration | Significant impedance mismatch |
**Rejected**: Python-centric design creates unnecessary weight and integration
friction for a Rust-native edge system.
### Alternative 3: roqoqo + QuEST (Rust frontend, C backend)
roqoqo provides a Rust circuit description layer; QuEST is a high-performance
C/C++ state-vector simulator.
| Criterion | Assessment |
|-----------|------------|
| Performance | Excellent (QuEST is highly optimized) |
| WASM support | QuEST's C code breaks WASM compilation |
| Maintenance | External C library maintenance burden |
| Memory safety | C backend outside Rust safety guarantees |
**Rejected**: C dependency is incompatible with WASM target requirement.
### Alternative 4: Quant-Iron (Rust + OpenCL)
A Rust simulator leveraging OpenCL for GPU acceleration.
| Criterion | Assessment |
|-----------|------------|
| Performance | Excellent on GPU-equipped hardware |
| WASM support | OpenCL incompatible with WASM |
| Edge deployment | Most edge nodes lack discrete GPUs |
| Complexity | OpenCL runtime adds operational burden |
**Rejected**: OpenCL dependency incompatible with WASM and edge deployment model.
### Alternative 5: No Simulator (Cloud Quantum APIs)
Delegate all quantum computation to cloud-based quantum simulators or hardware.
| Criterion | Assessment |
|-----------|------------|
| Performance | Network-bound latency |
| Offline support | None; requires connectivity |
| Cost | Per-execution charges |
| Privacy | Circuit data sent to third party |
| Edge philosophy | Violates offline-first design |
**Rejected**: Fundamentally incompatible with ruVector's offline-first edge
computing philosophy.
## Consequences
### Positive
- **Full control**: Complete ownership of the simulation pipeline, enabling
deep integration with ruVector's math, SIMD, and memory subsystems
- **WASM portable**: Single codebase compiles to any WASM runtime, enabling
browser-based quantum experimentation
- **No external dependencies**: Eliminates supply chain risk from C/C++ or
Python library dependencies
- **Edge-aligned**: Event-driven activation model matches Cognitum's power
architecture
- **Extensible**: Gate set, noise models, and backends can evolve independently
### Negative
- **Development effort**: Building a competitive quantum simulator from scratch
requires significant engineering investment
- **Maintenance burden**: Team must benchmark, optimize, and maintain the
simulation engine alongside the rest of ruVector
- **Classical simulation limits**: Exponential scaling is a fundamental physics
constraint; the engine cannot exceed ~30 qubits on practical hardware
### Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Performance below competitors | Medium | High | Benchmark-driven development against QuantRS2/Qukit |
| Floating-point accuracy drift | Low | Medium | Comprehensive numerical tests, optional f64 enforcement |
| WASM memory exhaustion | Medium | Medium | Hard qubit limit with clear error messages (ADR-QE-003) |
| Scope creep into hardware simulation | Low | Low | Strict scope: gate-model only, no analog/pulse simulation |
## References
- [ADR-005: WASM Runtime Integration](/docs/adr/ADR-005-wasm-runtime-integration.md)
- [ADR-003: SIMD Optimization Strategy](/docs/adr/ADR-003-simd-optimization-strategy.md)
- [ADR-006: Memory Management](/docs/adr/ADR-006-memory-management.md)
- [ADR-014: Coherence Engine](/docs/adr/ADR-014-coherence-engine.md)
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
- [ADR-QE-003: WASM Compilation Strategy](./ADR-QE-003-wasm-compilation-strategy.md)
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
- Nielsen & Chuang, "Quantum Computation and Quantum Information" (2010)
- Aaronson & Gottesman, "Improved simulation of stabilizer circuits" (2004)