# ADR-QE-011: Memory Gating & Power Management

**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board

---

## Context

ruVector is designed to operate within the Cognitum computing paradigm: a tile-based
architecture with 256 low-power processor cores, event-driven activation, and
aggressive power gating. Agents (software components) remain fully dormant until an
event triggers their activation. Once their work completes, they release all
resources and return to dormancy.

The quantum simulation engine must adhere to this model:

1. **Zero idle footprint**: When no simulation is running, the engine consumes zero
   CPU cycles and zero heap memory beyond its compiled code and static data.
2. **Rapid activation**: The engine must be ready to execute a simulation within
   microseconds of receiving a request.
3. **Prompt resource release**: Upon simulation completion (or failure), all
   allocated memory is immediately freed.
4. **Predictable memory**: Callers must be able to determine exact memory
   requirements before committing to a simulation.

### Memory Scale

The state vector for n qubits requires 2^n complex amplitudes, each consuming 16
bytes (two f64 values):

| Qubits | Amplitudes | Memory | Notes |
|--------|-----------|--------|-------|
| 10 | 1,024 | 16 KiB | Trivial |
| 15 | 32,768 | 512 KiB | Small |
| 20 | 1,048,576 | 16 MiB | Moderate |
| 25 | 33,554,432 | 512 MiB | Large |
| 28 | 268,435,456 | 4 GiB | Needs dedicated memory |
| 30 | 1,073,741,824 | 16 GiB | Workstation-class |
| 32 | 4,294,967,296 | 64 GiB | Server-class |
| 35 | 34,359,738,368 | 512 GiB | HPC |
| 40 | 1,099,511,627,776 | 16 TiB | Infeasible (state vector) |

Each additional qubit doubles memory. This exponential scaling makes memory the
primary resource constraint and the most important resource to manage.

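The table rows follow directly from `bytes = 2^n * 16`. As a minimal sketch (the function and constant names here are hypothetical and not part of the engine API), the calculation with overflow checking could look like this:

```rust
/// Size of one amplitude: two f64 values (real and imaginary parts).
const BYTES_PER_AMPLITUDE: u64 = 16;

/// Bytes needed for the state vector of `n_qubits`,
/// or `None` if 2^n * 16 does not fit in a u64.
pub fn state_vector_bytes(n_qubits: u32) -> Option<u64> {
    1u64.checked_shl(n_qubits)?
        .checked_mul(BYTES_PER_AMPLITUDE)
}
```

For example, `state_vector_bytes(20)` yields 16 MiB and `state_vector_bytes(40)` yields 16 TiB, matching the table. Counts of 60 qubits or more overflow and return `None`.
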
### Edge and Embedded Constraints

On edge devices (embedded ruVector nodes, IoT gateways, mobile processors), memory
is severely limited:

| Platform | Typical RAM | Max qubits (state vector) |
|----------|------------|--------------------------|
| Cognitum tile (single) | 256 MiB | 23 |
| Cognitum tile cluster (4) | 1 GiB | 25 |
| Raspberry Pi 4 | 8 GiB | 28 |
| Mobile device | 4-6 GiB | 27-28 (with other apps) |
| Laptop | 16-64 GiB | 29-31 |
| Server | 256-512 GiB | 33-34 |

### WASM Memory Model

WebAssembly uses a linear memory that can grow but cannot shrink. Once a large
simulation allocates pages, those pages remain mapped until the WASM instance is
destroyed. This is a fundamental platform limitation that must be documented and
accounted for.

## Decision

### 1. Zero-Idle Footprint Architecture

The quantum engine is implemented as a pure library with no runtime overhead:

```rust
// The engine is a collection of functions and types.
// No background threads, no event loops, no persistent state.
// When not called, it consumes exactly zero CPU and zero heap.

pub struct QuantumEngine; // Zero-sized type; purely a namespace

impl QuantumEngine {
    /// Execute a simulation. All resources are allocated on entry
    /// and freed on exit (or on error).
    pub fn execute(
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError> {
        // 1. Estimate and validate memory (estimate_memory is fallible)
        let required = Self::estimate_memory(circuit.num_qubits())?;
        Self::validate_memory_available(required)?;

        // 2. Allocate state vector (the big allocation)
        let mut state = Self::allocate_state(circuit.num_qubits())?;

        // 3. Execute gates (all computation happens here)
        Self::apply_gates(circuit, &mut state, config)?;

        // 4. Measure (if requested)
        let measurements = Self::measure(&state, shots)?;

        // 5. Build result (copies out what we need)
        let result = SimulationResult::from_state_and_measurements(
            &state, measurements, circuit,
        );

        // 6. `state` goes out of scope when this function returns, on the
        //    success path and on every `?` early return alike. Rust's
        //    ownership system deallocates the Vec<Complex<f64>>.
        //    No cleanup needed. No finalizers. Just drop.

        Ok(result)
    }
}
```

Key properties:
- No `new()` or `init()` methods that create persistent state.
- No `Drop` impl with complex cleanup logic.
- No `Arc`, `Mutex`, or shared state between calls.
- Each call is fully independent and self-contained.

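The claim that memory is released on both the success and the error path can be checked with a small instrumented sketch. Everything here is hypothetical and exists only for illustration: `TrackedState` stands in for the real state vector, and its `Drop` impl only decrements a counter so that deallocation is observable. The engine itself needs no custom `Drop`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts live state vectors so the example can observe deallocation.
static LIVE_STATES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the engine's state vector.
struct TrackedState {
    amplitudes: Vec<(f64, f64)>,
}

impl TrackedState {
    fn new(n_qubits: u32) -> Self {
        LIVE_STATES.fetch_add(1, Ordering::SeqCst);
        let mut amplitudes = vec![(0.0, 0.0); 1usize << n_qubits];
        amplitudes[0] = (1.0, 0.0); // |00...0>
        TrackedState { amplitudes }
    }
}

impl Drop for TrackedState {
    // Instrumentation only: records that the value was dropped.
    fn drop(&mut self) {
        LIVE_STATES.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Runs a toy "simulation"; `fail` forces the early-return error path.
fn run(n_qubits: u32, fail: bool) -> Result<usize, String> {
    let state = TrackedState::new(n_qubits);
    if fail {
        // `state` is dropped as part of this early return.
        return Err("gate application failed".to_string());
    }
    // `state` is dropped when the function returns.
    Ok(state.amplitudes.len())
}

/// Number of state vectors currently alive.
fn live_states() -> usize {
    LIVE_STATES.load(Ordering::SeqCst)
}
```

After either `run(4, false)` or `run(4, true)`, `live_states()` is back to zero, which is the behavior the `execute` design relies on.
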
### 2. On-Demand Allocation Strategy

State vectors are allocated at simulation start and freed at simulation end:

```rust
fn allocate_state(n_qubits: u32) -> Result<StateVector, SimulationError> {
    // `ok_or_else` defers the available-memory query to the error path.
    let num_amplitudes = 1_usize.checked_shl(n_qubits)
        .ok_or_else(|| SimulationError::QubitLimitExceeded {
            requested: n_qubits,
            maximum: usize::BITS - 1,
            estimated_memory_bytes: u64::MAX,
            available_memory_bytes: estimate_available_memory() as u64,
        })?;

    let required_bytes = num_amplitudes
        .checked_mul(std::mem::size_of::<Complex<f64>>())
        .ok_or(SimulationError::MemoryAllocationFailed {
            requested_bytes: u64::MAX,
            qubit_count: n_qubits,
            suggestion: "Qubit count exceeds addressable memory",
        })?;

    // Reserve fallibly: `try_reserve_exact` returns Err instead of aborting
    // when the allocator reports failure. On systems that overcommit memory
    // the reservation can still succeed and fail later when pages are touched,
    // so the up-front `validate_memory_available` check still matters.
    let mut amplitudes = Vec::new();
    amplitudes.try_reserve_exact(num_amplitudes)
        .map_err(|_| SimulationError::MemoryAllocationFailed {
            requested_bytes: required_bytes as u64,
            qubit_count: n_qubits,
            suggestion: "Reduce qubit count or use tensor-network backend",
        })?;

    // Initialize to |00...0> state
    amplitudes.resize(num_amplitudes, Complex::new(0.0, 0.0));
    amplitudes[0] = Complex::new(1.0, 0.0);

    Ok(StateVector { amplitudes, n_qubits })
}
```

The allocation sequence:

```
IDLE (zero memory)
    |
    v
estimate_memory(n)                --> returns bytes needed
    |
    v
validate_memory_available(bytes)  --> checks against OS/platform limits
    |                                 returns Err if insufficient
    v
Vec::try_reserve_exact(2^n)       --> attempts allocation
    |                                 returns Err on failure (no panic)
    v
ALLOCATED (2^n * 16 bytes on heap)
    |
    v
[... simulation runs ...]
    |
    v
Vec::drop()                       --> automatic deallocation
    |
    v
IDLE (zero memory)
```

### 3. Memory Estimation API

Callers can query memory requirements before committing to a simulation. The
estimate covers working memory and per-thread scratch space on top of the state
vector:

```rust
/// Returns a breakdown of the memory required to simulate n_qubits.
/// This accounts for the state vector plus working memory for
/// gate application (temporary buffers, measurement arrays, etc.).
///
/// # Returns
/// - `Ok(estimate)` if the qubit count is representable
/// - `Err(...)` if 2^n_qubits or the byte count overflows usize
pub fn estimate_memory(n_qubits: u32) -> Result<MemoryEstimate, SimulationError> {
    let num_amplitudes = 1_usize.checked_shl(n_qubits)
        .ok_or(SimulationError::QubitLimitExceeded {
            requested: n_qubits,
            maximum: usize::BITS - 1,
            estimated_memory_bytes: u64::MAX,
            available_memory_bytes: 0,
        })?;

    // Checked multiply: 2^n fits in usize for n up to 63, but 2^n * 16 may not.
    let state_vector_bytes = num_amplitudes
        .checked_mul(std::mem::size_of::<Complex<f64>>())
        .ok_or(SimulationError::MemoryAllocationFailed {
            requested_bytes: u64::MAX,
            qubit_count: n_qubits,
            suggestion: "Qubit count exceeds addressable memory",
        })?;

    // Working memory: temporary buffers for gate application plus measurement
    // result storage, budgeted at 25% of the state vector.
    let working_bytes = state_vector_bytes / 4;

    // Thread-local scratch space (per Rayon thread)
    let thread_count = rayon::current_num_threads();
    let scratch_per_thread = 64 * 1024; // 64 KiB per thread for local buffers
    let thread_scratch = thread_count * scratch_per_thread;

    // Saturate rather than wrap if the sum exceeds usize::MAX.
    let total = state_vector_bytes
        .saturating_add(working_bytes)
        .saturating_add(thread_scratch);

    Ok(MemoryEstimate {
        state_vector_bytes: state_vector_bytes as u64,
        working_bytes: working_bytes as u64,
        thread_scratch_bytes: thread_scratch as u64,
        total_bytes: total as u64,
        num_amplitudes: num_amplitudes as u64,
    })
}

#[derive(Debug, Clone)]
pub struct MemoryEstimate {
    /// Bytes for the state vector (dominant cost).
    pub state_vector_bytes: u64,
    /// Bytes for gate-application working memory.
    pub working_bytes: u64,
    /// Bytes for thread-local scratch space.
    pub thread_scratch_bytes: u64,
    /// Total estimated bytes.
    pub total_bytes: u64,
    /// Number of complex amplitudes.
    pub num_amplitudes: u64,
}

impl MemoryEstimate {
    /// Returns true if the estimate fits within the given byte budget.
    pub fn fits_in(&self, available_bytes: u64) -> bool {
        self.total_bytes <= available_bytes
    }

    /// Suggest the maximum qubits for a given memory budget.
    pub fn max_qubits_for(available_bytes: u64) -> u32 {
        // Each qubit doubles memory; find largest n where 20 * 2^n <= available.
        // Factor of 20 accounts for 16-byte amplitudes + 25% working memory
        // (thread scratch is not included in this shortcut).
        let effective = available_bytes / 20;
        if effective == 0 { return 0; }
        effective.ilog2()
    }
}
```

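A caller-side pre-check can be sketched without the engine types. The version below is an illustration under stated assumptions: it takes the thread count as a parameter instead of querying Rayon, and the names `total_bytes` and `max_qubits_within` are hypothetical. Unlike the `20 * 2^n` shortcut, it also counts per-thread scratch.

```rust
/// Bytes per amplitude (two f64 values).
const AMPLITUDE_BYTES: u64 = 16;
/// Assumed per-thread scratch, matching the 64 KiB figure used above.
const SCRATCH_PER_THREAD: u64 = 64 * 1024;

/// Total estimated bytes: state vector + 25% working memory + thread scratch.
/// Returns `None` if any step overflows u64.
pub fn total_bytes(n_qubits: u32, threads: u64) -> Option<u64> {
    let state = 1u64.checked_shl(n_qubits)?.checked_mul(AMPLITUDE_BYTES)?;
    let working = state / 4;
    let scratch = threads.checked_mul(SCRATCH_PER_THREAD)?;
    state.checked_add(working)?.checked_add(scratch)
}

/// Largest qubit count whose total estimate fits in `budget`, if any.
/// Relies on the estimate growing monotonically with the qubit count.
pub fn max_qubits_within(budget: u64, threads: u64) -> Option<u32> {
    (0..64u32)
        .take_while(|&n| matches!(total_bytes(n, threads), Some(t) if t <= budget))
        .last()
}
```

With four threads, budgets of 256 MiB, 1 GiB, and 8 GiB give 23, 25, and 28 qubits respectively, in line with the edge-platform table in the Context section.
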
### 4. Allocation Failure Handling

The engine never panics on allocation failure. All paths return structured errors:

```rust
// Pattern: every allocation is fallible and returns a descriptive error.

// State vector allocation failure:
SimulationError::MemoryAllocationFailed {
    requested_bytes: 17_179_869_184, // 16 GiB
    qubit_count: 30,
    suggestion: "Reduce qubit count by 2 (to 28, ~4 GiB) or enable tensor-network backend",
}

// Integer overflow (qubit count too large):
SimulationError::QubitLimitExceeded {
    requested: 64,
    maximum: 31, // based on available memory: floor(log2(64 GiB / 20))
    estimated_memory_bytes: u64::MAX,
    available_memory_bytes: 68_719_476_736, // 64 GiB
}
```

Decision tree on allocation failure:

```
Memory allocation failed
    |
    +-- Is tensor-network feature enabled?
    |     |
    |     +-- YES: Suggest tensor-network backend
    |     |        (may work if circuit has low treewidth)
    |     |
    |     +-- NO:  Suggest reducing qubit count
    |              Calculate: max_qubits = floor(log2(available / 20))
    |              Suggest: "Reduce to {max_qubits} qubits ({memory} bytes)"
    |
    +-- Is the request wildly over budget (>100x)?
    |     |
    |     +-- YES: "Circuit requires {X} GiB but only {Y} MiB available"
    |     |
    |     +-- NO:  "Circuit requires {X} GiB, {Y} GiB available.
    |               Reducing by {delta} qubits would fit."
    |
    +-- Return SimulationError (no panic, no abort)
```

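The decision tree above can be sketched as a single message-building function. This is an illustrative sketch, not the engine's implementation: `failure_suggestion` and its parameters are hypothetical, the message wording only approximates the templates in the tree, and both sizes are printed in GiB for simplicity.

```rust
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Largest n with 20 * 2^n <= available
/// (16-byte amplitudes + 25% working memory).
fn max_qubits_for(available_bytes: u64) -> u32 {
    let effective = available_bytes / 20;
    if effective == 0 { 0 } else { effective.ilog2() }
}

/// Builds a human-readable suggestion following the decision tree.
pub fn failure_suggestion(
    required_bytes: u64,
    available_bytes: u64,
    requested_qubits: u32,
    tensor_network_enabled: bool,
) -> String {
    let required_gib = required_bytes as f64 / GIB;
    let available_gib = available_bytes as f64 / GIB;
    let max_qubits = max_qubits_for(available_bytes);

    // Branch 1: which remedy to suggest.
    let action = if tensor_network_enabled {
        "Try the tensor-network backend (may work if the circuit has low treewidth)."
            .to_string()
    } else {
        format!("Reduce to {max_qubits} qubits.")
    };

    // Branch 2: is the request more than 100x over budget?
    let wildly_over = required_bytes > available_bytes.saturating_mul(100);
    let detail = if wildly_over {
        format!(
            "Circuit requires {required_gib:.1} GiB but only {available_gib:.2} GiB available."
        )
    } else {
        let delta = requested_qubits.saturating_sub(max_qubits);
        format!(
            "Circuit requires {required_gib:.1} GiB, {available_gib:.1} GiB available. \
             Reducing by {delta} qubits would fit."
        )
    };

    format!("{detail} {action}")
}
```

For a 30-qubit request (20 GiB total) against 8 GiB of available memory with the tensor-network feature disabled, the function reports that reducing by 2 qubits, to 28, would fit.
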
### 5. CPU Yielding for Long Simulations

For simulations estimated to exceed 100ms, the engine can optionally yield between
gate batches to allow the OS scheduler to manage power states:

```rust
use std::time::{Duration, Instant};

pub struct YieldConfig {
    /// Enable cooperative yielding between gate batches.
    /// Default: false (maximum throughput).
    pub enabled: bool,

    /// Number of gates to apply before yielding.
    /// Default: 1000.
    pub gates_per_slice: usize,

    /// Yield mechanism.
    /// Default: ThreadYield (std::thread::yield_now).
    pub yield_strategy: YieldStrategy,
}

pub enum YieldStrategy {
    /// Call std::thread::yield_now() between slices.
    ThreadYield,
    /// Sleep for specified duration between slices.
    Sleep(Duration),
    /// Call a user-provided callback between slices.
    Callback(Box<dyn Fn(SliceProgress) + Send>),
}

pub struct SliceProgress {
    pub gates_completed: u64,
    pub gates_remaining: u64,
    pub elapsed: Duration,
    pub estimated_remaining: Duration,
}

// Usage in gate application loop:
fn apply_gates_with_yield(
    circuit: &QuantumCircuit,
    state: &mut StateVector,
    yield_config: &YieldConfig,
) -> Result<(), SimulationError> {
    let gates = circuit.gates();
    // Wall-clock reference for progress reporting.
    let start = Instant::now();
    // Guard against a zero slice size, which would divide by zero below.
    let slice = yield_config.gates_per_slice.max(1);

    for (i, gate) in gates.iter().enumerate() {
        apply_single_gate(gate, state)?;

        if yield_config.enabled && (i + 1) % slice == 0 {
            match &yield_config.yield_strategy {
                YieldStrategy::ThreadYield => std::thread::yield_now(),
                YieldStrategy::Sleep(d) => std::thread::sleep(*d),
                YieldStrategy::Callback(cb) => cb(SliceProgress {
                    gates_completed: (i + 1) as u64,
                    gates_remaining: (gates.len() - i - 1) as u64,
                    elapsed: start.elapsed(),
                    estimated_remaining: estimate_remaining(i, gates.len(), start),
                }),
            }
        }
    }

    Ok(())
}
```

Yield is **disabled by default** to maximize throughput. It is primarily intended
for:
- Edge devices where power management is critical.
- Interactive applications where UI responsiveness matters.
- Long-running simulations (>1 second) where progress reporting is needed.

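The slicing rule itself (yield after every `gates_per_slice` gates) can be isolated from the engine types. The sketch below is a simplified, hypothetical helper: `run_in_slices` only counts gates and calls a closure at each yield point. Treating a slice size of 0 as "never yield" is an assumption of this sketch.

```rust
/// Walks `total_gates` units of work and calls `on_slice(gates_completed)`
/// after every `gates_per_slice` gates. Returns the number of yield points.
pub fn run_in_slices<F: FnMut(usize)>(
    total_gates: usize,
    gates_per_slice: usize,
    mut on_slice: F,
) -> usize {
    let mut yields = 0;
    for i in 0..total_gates {
        // The real loop would apply gate `i` to the state vector here.

        // A slice size of 0 means "never yield" (assumption of this sketch).
        if gates_per_slice != 0 && (i + 1) % gates_per_slice == 0 {
            on_slice(i + 1);
            yields += 1;
        }
    }
    yields
}
```

With the default slice size of 1000, a 2500-gate circuit yields twice, after gates 1000 and 2000. Passing `|_| std::thread::yield_now()` as the closure corresponds to the `ThreadYield` strategy.
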
### 6. Thread Management

The quantum engine does not create or manage its own threads:

```
+-----------------------------------------------+
|           Global Rayon Thread Pool            |
|      (shared by all ruVector subsystems)      |
|                                               |
|  [Thread 0]   [Thread 1]   ...   [Thread N-1] |
|      ^            ^                   ^       |
|      |            |                   |       |
|  +---+---+    +---+---+           +---+---+   |
|  | ruQu  |    | ruQu  |           | idle  |   |
|  | gate  |    | gate  |           |       |   |
|  | apply |    | apply |           |       |   |
|  +-------+    +-------+           +-------+   |
|                                               |
|  During simulation: threads work on gates     |
|  After simulation: threads return to pool     |
|  Pool idle: OS can power-gate cores           |
+-----------------------------------------------+
```

Key properties:
- Rayon's global thread pool is initialized once by `ruvector-core` at startup.
- The quantum engine uses Rayon's parallel iterators (`par_iter()` and related
  APIs from `rayon::prelude`), borrowing threads temporarily.
- When simulation completes, all threads are returned to the global pool.
- If no ruVector work is pending, Rayon threads park (blocking on a condvar),
  consuming zero CPU. The OS can then power-gate the underlying cores.

### 7. WASM Memory Considerations

WebAssembly linear memory has a specific behavior that affects resource management:

```
WASM Memory Layout
+------------------+------------------+
|  Initial pages   |   Grown pages    |
|  (compiled size) |  (runtime alloc) |
+------------------+------------------+
0             initial_size       current_size

Growth: memory.grow(delta_pages) -> adds pages to the end
Shrink: NOT SUPPORTED in WASM spec

After 25-qubit simulation:
+------------------+----------------------------------+
| Initial (1 MiB)  | Grown for state vec (512 MiB)    |  <- HIGH WATER MARK
+------------------+----------------------------------+

After simulation completes:
+------------------+----------------------------------+
| Initial (1 MiB)  | FREED internally but pages       |
|                  | still mapped (512 MiB virtual)   |
+------------------+----------------------------------+
  The Rust allocator returns memory to its free list,
  but WASM pages are not returned to the host.
```

**Implications and mitigations**:

1. **Document the behavior**: Users must understand that WASM memory is a high-water
   mark. A 25-qubit simulation increases the WASM instance's memory footprint to
   ~512 MiB, and it stays there until the instance is destroyed.

2. **Instance recycling**: For applications that run multiple simulations, create a
   new WASM instance periodically to reset the memory high-water mark.

3. **Memory budget enforcement**: The WASM host can set `WebAssembly.Memory` with a
   `maximum` parameter to cap growth:

   ```javascript
   const memory = new WebAssembly.Memory({
     initial: 16,    // 1 MiB
     maximum: 8192,  // 512 MiB cap
   });
   ```

4. **Pre-check in WASM**: The engine's `estimate_memory()` function works in WASM
   and should be called before simulation to verify the allocation will succeed.

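The `initial` and `maximum` values are expressed in 64 KiB pages, so a byte estimate has to be converted before it can be compared with the cap. A minimal sketch follows; the helper names are hypothetical, and the cap check is deliberately conservative because it assumes the whole allocation needs freshly grown pages.

```rust
/// WebAssembly page size in bytes (64 KiB).
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Number of 64 KiB pages needed to hold `bytes`, rounded up.
pub fn wasm_pages_for(bytes: u64) -> u64 {
    bytes / WASM_PAGE_BYTES + u64::from(bytes % WASM_PAGE_BYTES != 0)
}

/// True if growing by `new_bytes` stays within `maximum_pages`, given
/// `current_pages` already mapped. Conservative: ignores free space the
/// allocator may already hold inside the current pages.
pub fn fits_under_cap(new_bytes: u64, current_pages: u64, maximum_pages: u64) -> bool {
    current_pages.saturating_add(wasm_pages_for(new_bytes)) <= maximum_pages
}
```

One consequence worth noting: a 25-qubit state vector needs 8192 pages on its own, so under this conservative check it does not fit beneath a `maximum` of 8192 once the 16 initial pages are counted. A cap intended to admit 25-qubit runs would need to be larger than the 512 MiB shown in the example.
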
### 8. Cognitum Tile Integration

On Cognitum's tile-based architecture, the quantum engine maps to tiles as follows:

```
Cognitum Processor (256 tiles)
+--------+--------+--------+--------+
| Tile 0 | Tile 1 | Tile 2 | Tile 3 |  <- Assigned to quantum sim
| ACTIVE | ACTIVE | ACTIVE | ACTIVE |
+--------+--------+--------+--------+
| Tile 4 | Tile 5 | Tile 6 | Tile 7 |  <- Other ruVector work (or sleeping)
| sleep  | vecDB  | sleep  | sleep  |
+--------+--------+--------+--------+
|  ...   |  ...   |  ...   |  ...   |
| sleep  | sleep  | sleep  | sleep  |  <- Power gated (zero consumption)
+--------+--------+--------+--------+
```

**Power state diagram for a quantum simulation lifecycle**:

```
State: ALL_TILES_IDLE
  |
  | Simulation request arrives
  v
State: ALLOCATING
  Action: Wake tiles 0-3 (or however many are needed)
  Action: Allocate state vector across tile-local memory
  Power: Tiles 0-3 ACTIVE, rest SLEEP
  |
  v
State: SIMULATING
  Action: Apply gates in parallel across active tiles
  Power: Tiles 0-3 at full clock rate
  Duration: microseconds to seconds depending on circuit
  |
  v
State: MEASURING
  Action: Sample measurement outcomes
  Power: Tile 0 only (measurement is sequential)
  |
  v
State: DEALLOCATING
  Action: Free state vector
  Action: Return tiles to idle pool
  |
  v
State: ALL_TILES_IDLE
  Power: Tiles 0-3 back to SLEEP
  Memory: Zero heap allocation
```

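The lifecycle above is a simple cycle, which can be sketched as a small state machine. The type and method names are hypothetical. The diagram gives no power figure for DEALLOCATING, so the value used for that state is an assumption of this sketch.

```rust
/// Lifecycle states from the power state diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimState {
    AllTilesIdle,
    Allocating,
    Simulating,
    Measuring,
    Deallocating,
}

impl SimState {
    /// The state that follows `self` in the lifecycle.
    pub fn next(self) -> SimState {
        match self {
            SimState::AllTilesIdle => SimState::Allocating,
            SimState::Allocating => SimState::Simulating,
            SimState::Simulating => SimState::Measuring,
            SimState::Measuring => SimState::Deallocating,
            SimState::Deallocating => SimState::AllTilesIdle,
        }
    }

    /// Tiles drawing active power, given `assigned` tiles for this run.
    pub fn active_tiles(self, assigned: usize) -> usize {
        match self {
            SimState::AllTilesIdle => 0,
            SimState::Allocating | SimState::Simulating => assigned,
            // Measurement is sequential, so only one tile stays awake.
            SimState::Measuring => assigned.min(1),
            // Assumption: not specified in the diagram; this sketch keeps
            // one tile awake to free the state vector.
            SimState::Deallocating => assigned.min(1),
        }
    }
}
```

Five calls to `next()` starting from `AllTilesIdle` return to `AllTilesIdle`, at which point `active_tiles` reports zero.
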
**Tile assignment policy**:
- Small simulations (n <= 20): 1 tile sufficient.
- Medium simulations (20 < n <= 25): 2-4 tiles for parallel gate application.
- Large simulations (25 < n <= 30): All available tiles.
- The tile scheduler (part of Cognitum runtime) handles assignment. The quantum
  engine simply uses Rayon parallelism; the runtime maps Rayon threads to tiles.

### 9. Memory Budget Table

Quick reference for capacity planning. Working memory is 25% of the state vector,
and totals exclude per-thread scratch space:

| Qubits | State Vector | Working Memory | Total | Platform Fit |
|--------|-------------|---------------|-------|-------------|
| 10 | 16 KiB | 4 KiB | 20 KiB | Any |
| 12 | 64 KiB | 16 KiB | 80 KiB | Any |
| 14 | 256 KiB | 64 KiB | 320 KiB | Any |
| 16 | 1 MiB | 256 KiB | 1.25 MiB | Any |
| 18 | 4 MiB | 1 MiB | 5 MiB | Any |
| 20 | 16 MiB | 4 MiB | 20 MiB | Any |
| 22 | 64 MiB | 16 MiB | 80 MiB | Cognitum single tile |
| 24 | 256 MiB | 64 MiB | 320 MiB | Cognitum 2+ tiles |
| 26 | 1 GiB | 256 MiB | 1.25 GiB | Cognitum cluster (more than 4 tiles) |
| 28 | 4 GiB | 1 GiB | 5 GiB | Laptop / RPi 8GB |
| 30 | 16 GiB | 4 GiB | 20 GiB | Workstation |
| 32 | 64 GiB | 16 GiB | 80 GiB | Server |
| 34 | 256 GiB | 64 GiB | 320 GiB | Large server |

### 10. Allocation and Deallocation Sequence Diagram

```
Caller               Engine                      OS/Allocator
  |                     |                            |
  | execute(circuit)    |                            |
  |-------------------->|                            |
  |                     |                            |
  |                     | estimate_memory(n)         |
  |                     | validate_available()       |
  |                     |                            |
  |                     | try_reserve_exact(2^n)     |
  |                     |--------------------------->|
  |                     |                            |
  |                     |      Ok(ptr) or Err        |
  |                     |<---------------------------|
  |                     |                            |
  |                     | [if Err: return            |
  |                     |  SimulationError]          |
  |                     |                            |
  |                     | initialize |00...0>        |
  |                     | apply gates                |
  |                     | measure                    |
  |                     |                            |
  |                     | build result               |
  |                     | (copies measurements,      |
  |                     |  expectation values)       |
  |                     |                            |
  |                     | drop(state_vector)         |
  |                     |--------------------------->|
  |                     |                            | free(ptr, 2^n * 16)
  |                     |                            |
  | Ok(result)          |                            |
  |<--------------------|                            |
  |                     |                            |
  | [Engine holds ZERO  |                            |
  |  heap memory now]   |                            |
```

## Consequences

### Positive

1. **True zero-idle cost**: No background resource consumption. Perfectly aligned
   with Cognitum's event-driven architecture and power gating.
2. **Predictable memory**: `estimate_memory()` reports memory requirements before
   committing, preventing OOM surprises.
3. **Graceful degradation**: Allocation failures return structured errors with
   actionable suggestions and never panic.
4. **Platform portable**: The same allocation strategy works on native (Linux, macOS,
   Windows), WASM, and embedded (Cognitum tiles).
5. **No resource leaks**: Rust's ownership system guarantees deallocation on all
   exit paths (success, error, and unwinding panic).

### Negative

1. **No state caching**: Each simulation allocates and deallocates independently.
   Repeated simulations on the same qubit count pay the allocation cost each time.
   Mitigation: allocation is O(2^n), which is small compared to the O(G * 2^n)
   simulation cost for a circuit with G gates.
2. **WASM memory high-water mark**: Cannot reclaim WASM linear memory pages.
   Documented as a platform limitation with an instance-recycling workaround.
3. **No memory pooling**: Could theoretically amortize allocation across simulations,
   but this conflicts with the zero-idle-footprint requirement.
4. **Yield overhead**: When enabled, cooperative yielding adds per-slice overhead.
   Mitigated by making it opt-in and configurable.

### Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| OOM despite `estimate_memory` check | Low | Crash | Check returns conservative estimate including working memory |
| WASM instance runs out of address space | Medium | Failure | Set `WebAssembly.Memory` maximum; document limitation |
| Allocation latency spike (OS page faults) | Medium | Slow start | Consider `madvise` / `mlock` hints for large allocations |
| Rayon thread pool contention | Medium | Degraded perf | Opt-in yielding between slices; Rayon work-stealing handles contention |

## References

- Cognitum Architecture Specification: event-driven tile-based computing
- Rust `Vec::try_reserve_exact`: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.try_reserve_exact
- WebAssembly Memory: https://webassembly.github.io/spec/core/syntax/modules.html#memories
- Rayon thread pool: https://docs.rs/rayon
- ADR-QE-001: Core Engine Architecture (zero-overhead design principle)
- ADR-QE-005: WASM Compilation Target (WASM constraints)
- ADR-QE-009: Tensor Network Evaluation Mode (alternative for large circuits)
- ADR-QE-010: Observability & Monitoring (memory metrics reporting)