Files
wifi-densepose/docs/adr/quantum-engine/ADR-QE-001-quantum-engine-core-architecture.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

12 KiB

ADR-QE-001: Quantum Engine Core Architecture

Status: Proposed Date: 2026-02-06 Authors: ruv.io, RuVector Team Deciders: Architecture Review Board

Context

Problem Statement

ruVector needs a quantum simulation engine for on-device quantum algorithm experimentation. The platform runs on distributed edge systems, primarily targeting Cognitum's 256-core low-power processors, and emphasizes ultra-low-power event-driven computing. Quantum simulation is a natural extension of ruVector's mathematical computation capabilities: the same SIMD-optimized linear algebra that powers vector search and neural inference can drive state-vector manipulation for quantum circuits.

Requirements

The engine must support gate-model quantum circuit simulation up to approximately 25 qubits, covering the following algorithm families:

Algorithm Family Use Case Typical Qubits Gate Depth
VQE (Variational Quantum Eigensolver) Molecular simulation, optimization 8-20 50-500 per iteration
Grover's Search Unstructured database search 8-25 O(sqrt(2^n))
QAOA (Quantum Approximate Optimization) Combinatorial optimization 10-25 O(p * edges)
Quantum Error Correction Surface code, stabilizer circuits 9-25 (logical + ancilla) Repetitive syndrome rounds

Memory Scaling Analysis

Quantum state-vector simulation stores the full amplitude vector of 2^n complex numbers. Each amplitude is a pair of f64 values (real + imaginary = 16 bytes). Memory grows exponentially:

Qubits  Amplitudes       State Size     With Scratch Buffer
------  -----------      ----------     -------------------
10      1,024            16 KB          32 KB
15      32,768           512 KB         1 MB
20      1,048,576        16 MB          32 MB
22      4,194,304        64 MB          128 MB
24      16,777,216       256 MB         512 MB
25      33,554,432       512 MB         1.07 GB
26      67,108,864       1.07 GB        2.14 GB
28      268,435,456      4.29 GB        8.59 GB
30      1,073,741,824    17.18 GB       34.36 GB

At 25 qubits the state vector requires approximately 512 MB (1.07 GB with a scratch buffer for intermediate calculations). This is the practical ceiling for WebAssembly's 32-bit address space. Native execution with sufficient RAM can push to 30+ qubits.

Edge Computing Constraints

Cognitum's 256-core processors operate under strict power and memory budgets:

  • Power envelope: Event-driven activation; cores idle at near-zero draw
  • Memory: Shared pool, typically 2-8 GB per node
  • Interconnect: Low-latency mesh between cores, suitable for parallel simulation
  • Workload model: Burst computation triggered by agent events, not continuous

The quantum engine must respect this model: allocate state only when a simulation is triggered, execute the circuit, return results, and immediately release all memory.

Decision

Implement a pure Rust state-vector quantum simulator as a new crate family (ruQu quantum engine) within the ruVector workspace. The following architectural decisions define the engine.

1. Pure Rust Implementation (No C/C++ FFI)

The entire simulation engine is written in Rust with no foreign function interface dependencies. This ensures:

  • Compilation to wasm32-unknown-unknown without emscripten or C toolchains
  • Memory safety guarantees throughout the simulation pipeline
  • Unified build system via Cargo across all targets
  • No external library version conflicts or platform-specific linking issues

2. State-Vector Simulation as Primary Backend

The engine uses explicit full-amplitude state-vector representation as its primary simulation mode. Each gate application transforms the full 2^n amplitude vector via matrix-vector multiplication.

Circuit Execution Model:

  |psi_0> ──[H]──[CNOT]──[Rz(theta)]──[Measure]── classical bits
     |          |            |              |
     v          v            v              v
  [init]    [apply_H]   [apply_CNOT]   [apply_Rz]   [sample]
     |          |            |              |           |
  2^n f64   2^n f64      2^n f64        2^n f64     collapse
  complex   complex      complex        complex     to basis

Gate application follows the standard decomposition:

  • Single-qubit gates: Iterate amplitude pairs (i, i XOR 2^target), apply 2x2 unitary. O(2^n) operations per gate.
  • Two-qubit gates: Iterate amplitude quadruples, apply 4x4 unitary. O(2^n) operations per gate.
  • Multi-qubit gates: Decompose into single and two-qubit gates, or apply directly via 2^k x 2^k matrix on k target qubits.

3. Qubit Limits and Precision

Parameter WASM Target Native Target
Max qubits (default) 25 30+ (RAM-dependent)
Max qubits (hard limit) 26 (with f32) Memory-limited
Precision (default) Complex f64 Complex f64
Precision (optional) Complex f32 Complex f32
State size at max ~1.07 GB ~17 GB at 30 qubits

Complex f64 is the default precision, providing approximately 15 decimal digits of accuracy -- sufficient for quantum chemistry applications and deep circuits where accumulated floating-point error matters. An optional f32 mode halves memory usage at the cost of precision, suitable for shallow circuits and approximate optimization.

4. Event-Driven Activation Model

The engine follows ruVector's event-driven philosophy:

Agent Context          ruQu Engine              Memory
     |                      |                      |
     |-- trigger(circuit) ->|                      |
     |                      |-- allocate(2^n) ---->|
     |                      |<---- state_ptr ------|
     |                      |                      |
     |                      |-- [execute gates] -->|
     |                      |-- [measure] -------->|
     |                      |                      |
     |<-- results ---------|                      |
     |                      |-- deallocate() ----->|
     |                      |                      |
   (idle)                (inert)               (freed)
  • Inert by default: No background threads, no persistent allocations
  • Allocate on demand: State vector created when circuit execution begins
  • Free immediately: All simulation memory released upon result delivery
  • No global state: Multiple concurrent simulations supported via independent state handles (no shared mutable global)

5. Dual-Target Compilation

The crate supports two compilation targets from a single codebase:

                    ruqu-core
                       |
            +----------+----------+
            |                     |
    [native target]       [wasm32-unknown-unknown]
            |                     |
    - Full SIMD (AVX2,      - WASM SIMD128
      AVX-512, NEON)        - 4GB address limit
    - Rayon threading        - Optional SharedArrayBuffer
    - Optional GPU (wgpu)    - No GPU
    - 30+ qubits             - 25 qubit ceiling
    - Full OS integration    - Sandboxed

Conditional compilation via Cargo feature flags controls target-specific code paths. The public API surface is identical across targets.

6. Optional Tensor Network Mode

For circuits with limited entanglement (e.g., shallow QAOA, certain VQE ansatze), the engine offers an optional tensor network backend:

  • Represents the quantum state as a network of tensors rather than a single exponential vector
  • Memory scales as O(n * chi^2) where chi is the bond dimension (maximum entanglement width)
  • Efficient for circuits where entanglement grows slowly or remains bounded
  • Falls back to full state-vector when bond dimension exceeds threshold
  • Enabled via the tensor-network feature flag

Alternatives Considered

Alternative 1: Qukit (Rust, WASM-ready)

A pre-1.0 Rust quantum simulator with WASM support.

Criterion Assessment
Maturity Pre-1.0, limited community
WASM support Present but untested at scale
Optimization Basic; no SIMD, no gate fusion
Integration Would require adapter layer
Maintenance External dependency risk

Rejected: Insufficient optimization depth and maturity for production use.

Alternative 2: QuantRS2 (Rust, Python-focused)

A Rust quantum simulator primarily targeting Python bindings via PyO3.

Criterion Assessment
Performance Good benchmarks on native
WASM support Not a design target
Dependencies Heavy; Python-oriented build
API design Python-first, Rust API secondary
Integration Significant impedance mismatch

Rejected: Python-centric design creates unnecessary weight and integration friction for a Rust-native edge system.

Alternative 3: roqoqo + QuEST (Rust frontend, C backend)

roqoqo provides a Rust circuit description layer; QuEST is a high-performance C/C++ state-vector simulator.

Criterion Assessment
Performance Excellent (QuEST is highly optimized)
WASM support QuEST's C code breaks WASM compilation
Maintenance External C library maintenance burden
Memory safety C backend outside Rust safety guarantees

Rejected: C dependency is incompatible with WASM target requirement.

Alternative 4: Quant-Iron (Rust + OpenCL)

A Rust simulator leveraging OpenCL for GPU acceleration.

Criterion Assessment
Performance Excellent on GPU-equipped hardware
WASM support OpenCL incompatible with WASM
Edge deployment Most edge nodes lack discrete GPUs
Complexity OpenCL runtime adds operational burden

Rejected: OpenCL dependency incompatible with WASM and edge deployment model.

Alternative 5: No Simulator (Cloud Quantum APIs)

Delegate all quantum computation to cloud-based quantum simulators or hardware.

Criterion Assessment
Performance Network-bound latency
Offline support None; requires connectivity
Cost Per-execution charges
Privacy Circuit data sent to third party
Edge philosophy Violates offline-first design

Rejected: Fundamentally incompatible with ruVector's offline-first edge computing philosophy.

Consequences

Positive

  • Full control: Complete ownership of the simulation pipeline, enabling deep integration with ruVector's math, SIMD, and memory subsystems
  • WASM portable: Single codebase compiles to any WASM runtime, enabling browser-based quantum experimentation
  • No external dependencies: Eliminates supply chain risk from C/C++ or Python library dependencies
  • Edge-aligned: Event-driven activation model matches Cognitum's power architecture
  • Extensible: Gate set, noise models, and backends can evolve independently

Negative

  • Development effort: Building a competitive quantum simulator from scratch requires significant engineering investment
  • Maintenance burden: Team must benchmark, optimize, and maintain the simulation engine alongside the rest of ruVector
  • Classical simulation limits: Exponential scaling is a fundamental physics constraint; the engine cannot exceed ~30 qubits on practical hardware

Risks and Mitigations

Risk Likelihood Impact Mitigation
Performance below competitors Medium High Benchmark-driven development against QuantRS2/Qukit
Floating-point accuracy drift Low Medium Comprehensive numerical tests, optional f64 enforcement
WASM memory exhaustion Medium Medium Hard qubit limit with clear error messages (ADR-QE-003)
Scope creep into hardware simulation Low Low Strict scope: gate-model only, no analog/pulse simulation

References