Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
vendor/ruvector/docs/adr/quantum-engine/ADR-QE-011-memory-gating-power-management.md

# ADR-QE-011: Memory Gating & Power Management

**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board

---

## Context

ruVector is designed to operate within the Cognitum computing paradigm: a tile-based
architecture with 256 low-power processor cores, event-driven activation, and
aggressive power gating. Agents (software components) remain fully dormant until an
event triggers their activation. Once their work completes, they release all
resources and return to dormancy.

The quantum simulation engine must adhere to this model:

1. **Zero idle footprint**: When no simulation is running, the engine consumes zero
   CPU cycles and zero heap memory beyond its compiled code and static data.
2. **Rapid activation**: The engine must be ready to execute a simulation within
   microseconds of receiving a request.
3. **Prompt resource release**: Upon simulation completion (or failure), all
   allocated memory is immediately freed.
4. **Predictable memory**: Callers must be able to determine exact memory
   requirements before committing to a simulation.

### Memory Scale

The state vector for n qubits requires 2^n complex amplitudes, each consuming 16
bytes (two f64 values):

| Qubits | Amplitudes | Memory | Notes |
|--------|-----------|--------|-------|
| 10 | 1,024 | 16 KiB | Trivial |
| 15 | 32,768 | 512 KiB | Small |
| 20 | 1,048,576 | 16 MiB | Moderate |
| 25 | 33,554,432 | 512 MiB | Large |
| 28 | 268,435,456 | 4 GiB | Needs dedicated memory |
| 30 | 1,073,741,824 | 16 GiB | Workstation-class |
| 32 | 4,294,967,296 | 64 GiB | Server-class |
| 35 | 34,359,738,368 | 512 GiB | HPC |
| 40 | 1,099,511,627,776 | 16 TiB | Infeasible (state vector) |

Each additional qubit doubles memory. This exponential scaling makes memory the
primary resource constraint and the most important resource to manage.
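
The table rows can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `state_vector_bytes` and `BYTES_PER_AMPLITUDE` are hypothetical names, not part of the engine API, and the function covers the state vector alone (no working memory).

```rust
/// Bytes per amplitude: two f64 values (real and imaginary parts).
const BYTES_PER_AMPLITUDE: u64 = 16;

/// State-vector size in bytes for `n_qubits`, or `None` if the result
/// does not fit in a u64. Hypothetical helper for illustration.
fn state_vector_bytes(n_qubits: u32) -> Option<u64> {
    // 2^n amplitudes, each BYTES_PER_AMPLITUDE bytes; both steps are checked.
    1_u64
        .checked_shl(n_qubits)?
        .checked_mul(BYTES_PER_AMPLITUDE)
}
```

For example, `state_vector_bytes(20)` yields 16 MiB and `state_vector_bytes(30)` yields 16 GiB, matching the table, while 60 or more qubits overflow a 64-bit byte count and return `None`.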

### Edge and Embedded Constraints

On edge devices (embedded ruVector nodes, IoT gateways, mobile processors), memory
is severely limited:

| Platform | Typical RAM | Max qubits (state vector) |
|----------|------------|--------------------------|
| Cognitum tile (single) | 256 MiB | 23 |
| Cognitum tile cluster (4) | 1 GiB | 25 |
| Raspberry Pi 4 | 8 GiB | 28 |
| Mobile device | 4-6 GiB | 27-28 (with other apps) |
| Laptop | 16-64 GiB | 29-31 |
| Server | 256-512 GiB | 33-34 |
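
One rule that reproduces every row of this table is "the largest n whose state vector is strictly smaller than total RAM". That rule is an assumption inferred from the figures, not something the ADR states, and `max_state_vector_qubits` is a hypothetical helper:

```rust
/// Largest qubit count whose state vector (16 bytes per amplitude) is
/// strictly smaller than `ram_bytes`. Returns `None` when not even a
/// single amplitude fits. Assumed rule; hypothetical helper.
fn max_state_vector_qubits(ram_bytes: u64) -> Option<u32> {
    // Largest amplitude count c such that 16 * c < ram_bytes.
    let max_amplitudes = ram_bytes.checked_sub(1)? / 16;
    // floor(log2(c)); None when c == 0.
    max_amplitudes.checked_ilog2()
}
```

With 256 MiB this gives 23 qubits (a 24-qubit state vector would consume the entire 256 MiB), and with 8 GiB it gives 28, in line with the table.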

### WASM Memory Model

WebAssembly uses a linear memory that can grow but cannot shrink. Once a large
simulation allocates pages, those pages remain mapped until the WASM instance is
destroyed. This is a fundamental platform limitation that must be documented and
accounted for.

## Decision

### 1. Zero-Idle Footprint Architecture

The quantum engine is implemented as a pure library with no runtime overhead:

```rust
// The engine is a collection of functions and types.
// No background threads, no event loops, no persistent state.
// When not called, it consumes exactly zero CPU and zero heap.

pub struct QuantumEngine; // Zero-sized type; purely a namespace

impl QuantumEngine {
    /// Execute a simulation. All resources are allocated on entry
    /// and freed on exit (or on error).
    pub fn execute(
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError> {
        // 1. Estimate and validate memory (estimate_memory is fallible)
        let required = Self::estimate_memory(circuit.num_qubits())?;
        Self::validate_memory_available(required)?;

        // 2. Allocate state vector (the big allocation)
        let mut state = Self::allocate_state(circuit.num_qubits())?;

        // 3. Execute gates (all computation happens here)
        Self::apply_gates(circuit, &mut state, config)?;

        // 4. Measure (if requested)
        let measurements = Self::measure(&state, shots)?;

        // 5. Build result (copies out what we need)
        let result = SimulationResult::from_state_and_measurements(
            &state, measurements, circuit,
        );

        // 6. `state` is dropped when this function returns, on the success
        //    path and on every early `?` return: the Vec<Complex<f64>> is
        //    deallocated by Rust's ownership system.
        //    No cleanup needed. No finalizers. Just drop.
        Ok(result)
    }
}
```

Key properties:
- No `new()` or `init()` methods that create persistent state.
- No `Drop` impl with complex cleanup logic.
- No `Arc`, `Mutex`, or shared state between calls.
- Each call is fully independent and self-contained.
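
The "purely a namespace" property can be checked mechanically. In the sketch below, `QuantumEngine` is a stand-in for the unit struct described above, while `CachingEngine` and `engine_sizes` are hypothetical names introduced only to show the contrast with a design that keeps state between calls:

```rust
use std::mem::size_of;

/// Stand-in for the engine type described above: a unit struct with no fields.
pub struct QuantumEngine;

/// Hypothetical counter-example: an engine that keeps a buffer between calls
/// and therefore has a non-zero footprint even when idle.
#[allow(dead_code)]
pub struct CachingEngine {
    scratch: Vec<u8>,
}

/// Returns (size of the stateless engine, size of the caching engine) in bytes.
#[allow(dead_code)]
fn engine_sizes() -> (usize, usize) {
    (size_of::<QuantumEngine>(), size_of::<CachingEngine>())
}
```

A unit struct occupies zero bytes, so holding a `QuantumEngine` value costs nothing; any struct that owns a `Vec` carries at least the pointer, length, and capacity words plus whatever the buffer holds.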

### 2. On-Demand Allocation Strategy

State vectors are allocated at simulation start and freed at simulation end:

```rust
fn allocate_state(n_qubits: u32) -> Result<StateVector, SimulationError> {
    // ok_or_else builds the error (and queries available memory) only on failure.
    let num_amplitudes = 1_usize.checked_shl(n_qubits)
        .ok_or_else(|| SimulationError::QubitLimitExceeded {
            requested: n_qubits,
            maximum: (usize::BITS - 1) as u32,
            estimated_memory_bytes: u64::MAX,
            available_memory_bytes: estimate_available_memory() as u64,
        })?;

    let required_bytes = num_amplitudes
        .checked_mul(std::mem::size_of::<Complex<f64>>())
        .ok_or(SimulationError::MemoryAllocationFailed {
            requested_bytes: u64::MAX,
            qubit_count: n_qubits,
            suggestion: "Qubit count exceeds addressable memory",
        })?;

    // Attempt the allocation fallibly. Growing a Vec with an infallible call
    // aborts the process if the allocator fails; try_reserve_exact reports the
    // failure as an Err instead. On systems that overcommit memory, the OS may
    // still OOM-kill the process later, when the pages are first touched.
    let mut amplitudes = Vec::new();
    amplitudes.try_reserve_exact(num_amplitudes)
        .map_err(|_| SimulationError::MemoryAllocationFailed {
            requested_bytes: required_bytes as u64,
            qubit_count: n_qubits,
            suggestion: "Reduce qubit count or use tensor-network backend",
        })?;

    // Initialize to |00...0> state (capacity is already reserved)
    amplitudes.resize(num_amplitudes, Complex::new(0.0, 0.0));
    amplitudes[0] = Complex::new(1.0, 0.0);

    Ok(StateVector { amplitudes, n_qubits })
}
```

The allocation sequence:

```
IDLE (zero memory)
    |
    v
estimate_memory(n)                --> returns bytes needed
    |
    v
validate_memory_available(bytes)  --> checks against OS/platform limits
    |                                 returns Err if insufficient
    v
Vec::try_reserve_exact(2^n)       --> attempts allocation
    |                                 returns Err on failure (no panic)
    v
ALLOCATED (2^n * 16 bytes on heap)
    |
    v
[... simulation runs ...]
    |
    v
Vec::drop()                       --> automatic deallocation
    |
    v
IDLE (zero memory)
```
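
The reserve-then-initialize step can be exercised in isolation. The following is a minimal sketch, assuming a plain `(f64, f64)` tuple in place of `Complex<f64>` so that it needs no external crate; `Amplitude` and `try_alloc_state` are hypothetical names:

```rust
use std::collections::TryReserveError;

/// One complex amplitude as a (re, im) pair; a stand-in for `Complex<f64>`.
type Amplitude = (f64, f64);

/// Allocate and initialize a |00...0> state with `num_amplitudes` entries,
/// returning an error instead of panicking if the reservation fails.
fn try_alloc_state(num_amplitudes: usize) -> Result<Vec<Amplitude>, TryReserveError> {
    let mut amplitudes: Vec<Amplitude> = Vec::new();
    // Fallible reservation: an impossible request yields Err, not an abort.
    amplitudes.try_reserve_exact(num_amplitudes)?;
    // Capacity is already reserved, so this resize does not reallocate.
    amplitudes.resize(num_amplitudes, (0.0, 0.0));
    if let Some(first) = amplitudes.first_mut() {
        *first = (1.0, 0.0);
    }
    Ok(amplitudes)
}
```

A request such as `try_alloc_state(usize::MAX / 2)` exceeds the maximum size of any Rust allocation and comes back as `Err` immediately, which is the behavior the sequence above relies on. Requests that are merely larger than physical memory may behave differently depending on the platform's overcommit policy.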

### 3. Memory Estimation API

Callers can query exact memory requirements before committing:

```rust
/// Returns the number of bytes required to simulate n_qubits.
/// This accounts for the state vector plus working memory for
/// gate application (temporary buffers, measurement arrays, etc.).
///
/// # Returns
/// - `Ok(estimate)` if the amplitude count and byte totals are representable
/// - `Err(...)` if 2^n_qubits or any byte total overflows usize
pub fn estimate_memory(n_qubits: u32) -> Result<MemoryEstimate, SimulationError> {
    // Shared error for every overflow path below.
    let limit_exceeded = || SimulationError::QubitLimitExceeded {
        requested: n_qubits,
        maximum: (usize::BITS - 1) as u32,
        estimated_memory_bytes: u64::MAX,
        available_memory_bytes: 0,
    };

    let num_amplitudes = 1_usize.checked_shl(n_qubits)
        .ok_or_else(limit_exceeded)?;

    // Checked: 2^n may be representable while 2^n * 16 is not.
    let state_vector_bytes = num_amplitudes
        .checked_mul(std::mem::size_of::<Complex<f64>>())
        .ok_or_else(limit_exceeded)?;

    // Working memory: temporary buffers for gate application plus measurement
    // result storage, budgeted at 25% of the state vector.
    let working_bytes = state_vector_bytes / 4;

    // Thread-local scratch space (per Rayon thread)
    let thread_count = rayon::current_num_threads();
    let scratch_per_thread = 64 * 1024; // 64 KiB per thread for local buffers
    let thread_scratch = thread_count * scratch_per_thread;

    let total_bytes = state_vector_bytes
        .checked_add(working_bytes)
        .and_then(|bytes| bytes.checked_add(thread_scratch))
        .ok_or_else(limit_exceeded)?;

    Ok(MemoryEstimate {
        state_vector_bytes: state_vector_bytes as u64,
        working_bytes: working_bytes as u64,
        thread_scratch_bytes: thread_scratch as u64,
        total_bytes: total_bytes as u64,
        num_amplitudes: num_amplitudes as u64,
    })
}

#[derive(Debug, Clone)]
pub struct MemoryEstimate {
    /// Bytes for the state vector (dominant cost).
    pub state_vector_bytes: u64,
    /// Bytes for gate-application working memory.
    pub working_bytes: u64,
    /// Bytes for thread-local scratch space.
    pub thread_scratch_bytes: u64,
    /// Total estimated bytes.
    pub total_bytes: u64,
    /// Number of complex amplitudes.
    pub num_amplitudes: u64,
}

impl MemoryEstimate {
    /// Returns true if the estimate fits within the given byte budget.
    pub fn fits_in(&self, available_bytes: u64) -> bool {
        self.total_bytes <= available_bytes
    }

    /// Suggest the maximum qubits for a given memory budget.
    pub fn max_qubits_for(available_bytes: u64) -> u32 {
        // Each qubit doubles memory; find largest n where 20 * 2^n <= available
        // Factor of 20 accounts for 16-byte amplitudes + 25% working memory
        let effective = available_bytes / 20;
        if effective == 0 { return 0; }
        effective.ilog2()
    }
}
```
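
As a worked check of the numbers, the sketch below repeats the same arithmetic in a dependency-free form. It is an illustration under stated assumptions: the thread count is passed in as a parameter instead of being read from Rayon, all names (`Estimate`, `estimate`, `max_qubits_within`) are hypothetical, and `max_qubits_within` returns `None` where the method above returns 0.

```rust
#[derive(Debug, PartialEq)]
struct Estimate {
    state_vector_bytes: u64,
    working_bytes: u64,
    thread_scratch_bytes: u64,
    total_bytes: u64,
}

/// Same formula as above: 16 bytes per amplitude, 25% working memory,
/// 64 KiB of scratch per thread. Returns None on any overflow.
fn estimate(n_qubits: u32, thread_count: u64) -> Option<Estimate> {
    let state_vector_bytes = 1_u64.checked_shl(n_qubits)?.checked_mul(16)?;
    let working_bytes = state_vector_bytes / 4;
    let thread_scratch_bytes = thread_count.checked_mul(64 * 1024)?;
    let total_bytes = state_vector_bytes
        .checked_add(working_bytes)?
        .checked_add(thread_scratch_bytes)?;
    Some(Estimate { state_vector_bytes, working_bytes, thread_scratch_bytes, total_bytes })
}

/// Largest n with 20 * 2^n <= available_bytes, or None if even n = 0 does not fit.
fn max_qubits_within(available_bytes: u64) -> Option<u32> {
    (available_bytes / 20).checked_ilog2()
}
```

For 20 qubits on 8 threads this gives 16 MiB + 4 MiB + 512 KiB = 21,495,808 bytes. A 256 MiB budget admits 23 qubits and a 64 GiB budget admits 31. Note that the factor of 20 ignores thread scratch, so a budget sitting exactly on a power-of-two boundary should still be confirmed with `fits_in`.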

### 4. Allocation Failure Handling

The engine never panics on allocation failure. All paths return structured errors:

```rust
// Pattern: every allocation is fallible and returns a descriptive error.

// State vector allocation failure:
SimulationError::MemoryAllocationFailed {
    requested_bytes: 17_179_869_184, // 16 GiB
    qubit_count: 30,
    suggestion: "Reduce qubit count by 2 (to 28, ~4 GiB) or enable tensor-network backend",
}

// Integer overflow (qubit count too large):
SimulationError::QubitLimitExceeded {
    requested: 64,
    maximum: 31, // based on available memory: floor(log2(64 GiB / 20))
    estimated_memory_bytes: u64::MAX,
    available_memory_bytes: 68_719_476_736, // 64 GiB
}
```

Decision tree on allocation failure:

```
Memory allocation failed
    |
    +-- Is tensor-network feature enabled?
    |       |
    |       +-- YES: Suggest tensor-network backend
    |       |        (may work if circuit has low treewidth)
    |       |
    |       +-- NO:  Suggest reducing qubit count
    |                Calculate: max_qubits = floor(log2(available / 20))
    |                Suggest: "Reduce to {max_qubits} qubits ({memory} bytes)"
    |
    +-- Is the request wildly over budget (>100x)?
    |       |
    |       +-- YES: "Circuit requires {X} GiB but only {Y} MiB available"
    |       |
    |       +-- NO:  "Circuit requires {X} GiB, {Y} GiB available.
    |                 Reducing by {delta} qubits would fit."
    |
    +-- Return SimulationError (no panic, no abort)
```
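
One way the tree might translate into message-building code is sketched below. This is an interpretation, not the engine's implementation: `describe_failure` is a hypothetical function, the exact wording and ordering of the message parts are assumptions, and the returned `String` would be carried inside a `SimulationError` in practice.

```rust
/// Build a human-readable suggestion following the decision tree above.
/// Hypothetical helper; message wording is an assumption.
fn describe_failure(
    requested_qubits: u32,
    required_bytes: u64,
    available_bytes: u64,
    tensor_network_enabled: bool,
) -> String {
    const MIB: f64 = 1024.0 * 1024.0;
    const GIB: f64 = 1024.0 * MIB;
    // max_qubits = floor(log2(available / 20)); 0 if nothing fits.
    let max_qubits = (available_bytes / 20).checked_ilog2().unwrap_or(0);
    let required_gib = required_bytes as f64 / GIB;

    // Branch: is the request more than 100x over budget?
    let mut message = if required_bytes / 100 > available_bytes {
        format!(
            "Circuit requires {:.1} GiB but only {:.1} MiB available.",
            required_gib,
            available_bytes as f64 / MIB
        )
    } else {
        format!(
            "Circuit requires {:.1} GiB, {:.1} GiB available. Reducing by {} qubits would fit.",
            required_gib,
            available_bytes as f64 / GIB,
            requested_qubits.saturating_sub(max_qubits)
        )
    };

    // Branch: which alternative to suggest.
    if tensor_network_enabled {
        message.push_str(" Consider the tensor-network backend (may work if the circuit has low treewidth).");
    } else {
        message.push_str(&format!(
            " Reduce to {} qubits ({} bytes).",
            max_qubits,
            20_u64 << max_qubits
        ));
    }
    message
}
```

For a 30-qubit request (20 GiB estimated) against 8 GiB available, the second branch reports that reducing by 2 qubits would fit, and the no-tensor-network path suggests 28 qubits.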

### 5. CPU Yielding for Long Simulations

For simulations estimated to exceed 100ms, the engine can optionally yield between
gate batches to allow the OS scheduler to manage power states:

```rust
use std::time::{Duration, Instant};

pub struct YieldConfig {
    /// Enable cooperative yielding between gate batches.
    /// Default: false (maximum throughput).
    pub enabled: bool,

    /// Number of gates to apply before yielding.
    /// Default: 1000.
    pub gates_per_slice: usize,

    /// Yield mechanism.
    /// Default: ThreadYield (std::thread::yield_now).
    pub yield_strategy: YieldStrategy,
}

pub enum YieldStrategy {
    /// Call std::thread::yield_now() between slices.
    ThreadYield,
    /// Sleep for specified duration between slices.
    Sleep(Duration),
    /// Call a user-provided callback between slices.
    Callback(Box<dyn Fn(SliceProgress) + Send>),
}

pub struct SliceProgress {
    pub gates_completed: u64,
    pub gates_remaining: u64,
    pub elapsed: Duration,
    pub estimated_remaining: Duration,
}

// Usage in gate application loop:
fn apply_gates_with_yield(
    circuit: &QuantumCircuit,
    state: &mut StateVector,
    yield_config: &YieldConfig,
) -> Result<(), SimulationError> {
    let gates = circuit.gates();
    // Start of the run, used for progress reporting.
    let start = Instant::now();
    // Treat a slice size of 0 as 1 so the modulo below cannot divide by zero.
    let slice = yield_config.gates_per_slice.max(1);

    for (i, gate) in gates.iter().enumerate() {
        apply_single_gate(gate, state)?;

        if yield_config.enabled && (i + 1) % slice == 0 {
            match &yield_config.yield_strategy {
                YieldStrategy::ThreadYield => std::thread::yield_now(),
                YieldStrategy::Sleep(d) => std::thread::sleep(*d),
                YieldStrategy::Callback(cb) => cb(SliceProgress {
                    gates_completed: (i + 1) as u64,
                    gates_remaining: (gates.len() - i - 1) as u64,
                    elapsed: start.elapsed(),
                    // estimate_remaining is a helper not shown here.
                    estimated_remaining: estimate_remaining(i, gates.len(), start),
                }),
            }
        }
    }

    Ok(())
}
```

Yield is **disabled by default** to maximize throughput. It is primarily intended
for:
- Edge devices where power management is critical.
- Interactive applications where UI responsiveness matters.
- Long-running simulations (>1 second) where progress reporting is needed.
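
The slicing logic itself is small enough to show without the engine types. The sketch below is a simplified, self-contained model of the loop: `run_with_slices` is a hypothetical function, the gate application is elided, and the callback receives only the number of gates completed.

```rust
/// Run `total_gates` units of work, calling `on_slice(gates_completed)` after
/// every `gates_per_slice` gates. Returns how many times the callback ran.
/// Hypothetical model of the yield loop; real gate application is elided.
fn run_with_slices(
    total_gates: usize,
    gates_per_slice: usize,
    mut on_slice: impl FnMut(usize),
) -> usize {
    // A slice size of 0 is treated as 1 to avoid a modulo-by-zero panic.
    let slice = gates_per_slice.max(1);
    let mut yields = 0;
    for i in 0..total_gates {
        // ... apply gate i here ...
        if (i + 1) % slice == 0 {
            on_slice(i + 1);
            yields += 1;
        }
    }
    yields
}
```

With 2,500 gates and the default slice size of 1,000, the callback fires twice, after gates 1,000 and 2,000; the trailing 500 gates complete without a further yield. Passing `|_| std::thread::yield_now()` as the callback gives the `ThreadYield` behavior.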

### 6. Thread Management

The quantum engine does not create or manage its own threads:

```
+-----------------------------------------------+
|           Global Rayon Thread Pool            |
|      (shared by all ruVector subsystems)      |
|                                               |
|   [Thread 0]  [Thread 1]  ...  [Thread N-1]   |
|       ^           ^                 ^         |
|       |           |                 |         |
|   +---+---+   +---+---+         +---+---+     |
|   | ruQu  |   | ruQu  |         | idle  |     |
|   | gate  |   | gate  |         |       |     |
|   | apply |   | apply |         |       |     |
|   +-------+   +-------+         +-------+     |
|                                               |
|   During simulation: threads work on gates    |
|   After simulation: threads return to pool    |
|   Pool idle: OS can power-gate cores          |
+-----------------------------------------------+
```

Key properties:
- Rayon's global thread pool is initialized once by `ruvector-core` at startup.
- The quantum engine uses Rayon's parallel iterators (`par_iter()` and related
  APIs from `rayon::prelude`), borrowing threads temporarily.
- When simulation completes, all threads are returned to the global pool.
- If no ruVector work is pending, Rayon threads park (blocking on a condvar),
  consuming zero CPU. The OS can then power-gate the underlying cores.

### 7. WASM Memory Considerations

WebAssembly linear memory has a specific behavior that affects resource management:

```
WASM Memory Layout
+------------------+------------------+
| Initial pages    | Grown pages      |
| (compiled size)  | (runtime alloc)  |
+------------------+------------------+
0             initial_size       current_size

Growth: memory.grow(delta_pages) -> adds pages to the end
Shrink: NOT SUPPORTED in WASM spec

After 25-qubit simulation:
+------------------+----------------------------------+
| Initial (1 MiB)  | Grown for state vec (512 MiB)    |  <- HIGH WATER MARK
+------------------+----------------------------------+

After simulation completes:
+------------------+----------------------------------+
| Initial (1 MiB)  | FREED internally but pages       |
|                  | still mapped (512 MiB virtual)   |
+------------------+----------------------------------+
The Rust allocator returns memory to its free list,
but WASM pages are not returned to the host.
```

**Implications and mitigations**:

1. **Document the behavior**: Users must understand that WASM memory is a high-water
   mark. A 25-qubit simulation raises the WASM instance's memory footprint to at
   least ~512 MiB for the rest of the instance's lifetime.

2. **Instance recycling**: For applications that run multiple simulations, create a
   new WASM instance periodically to reset the memory high-water mark.

3. **Memory budget enforcement**: The WASM host can set `WebAssembly.Memory` with a
   `maximum` parameter (in 64 KiB pages) to cap growth:

   ```javascript
   const memory = new WebAssembly.Memory({
     initial: 16,    // 16 pages = 1 MiB
     maximum: 8192,  // 8192 pages = 512 MiB cap
   });
   ```

4. **Pre-check in WASM**: The engine's `estimate_memory()` function works in WASM
   and should be called before simulation to verify the allocation will succeed.
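
A pre-check against a page cap reduces to page arithmetic. The sketch below is illustrative and conservative: it assumes the allocator must grow linear memory by the full requested amount (it may in fact reuse freed space), and `pages_for` and `fits_under_cap` are hypothetical helpers, not engine APIs.

```rust
/// WebAssembly page size in bytes (64 KiB).
const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Number of whole pages needed to hold `bytes`, rounding up.
fn pages_for(bytes: u64) -> u64 {
    bytes / WASM_PAGE_BYTES + u64::from(bytes % WASM_PAGE_BYTES != 0)
}

/// True if growing by `additional_bytes` from `current_pages` stays within
/// `maximum_pages`. Worst-case assumption: nothing already mapped is reused.
fn fits_under_cap(current_pages: u64, additional_bytes: u64, maximum_pages: u64) -> bool {
    current_pages
        .checked_add(pages_for(additional_bytes))
        .map_or(false, |total| total <= maximum_pages)
}
```

Under this worst-case assumption, a 25-qubit state vector (512 MiB, or 8,192 pages) does not fit in an instance that starts at 16 pages with an 8,192-page maximum, because the initial pages already count against the cap. A 24-qubit state vector (4,096 pages) does fit.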

### 8. Cognitum Tile Integration

On Cognitum's tile-based architecture, the quantum engine maps to tiles as follows:

```
Cognitum Processor (256 tiles)
+--------+--------+--------+--------+
| Tile 0 | Tile 1 | Tile 2 | Tile 3 |  <- Assigned to quantum sim
| ACTIVE | ACTIVE | ACTIVE | ACTIVE |
+--------+--------+--------+--------+
| Tile 4 | Tile 5 | Tile 6 | Tile 7 |  <- Other ruVector work (or sleeping)
| sleep  | vecDB  | sleep  | sleep  |
+--------+--------+--------+--------+
|  ...   |  ...   |  ...   |  ...   |
| sleep  | sleep  | sleep  | sleep  |  <- Power gated (zero consumption)
+--------+--------+--------+--------+
```

**Power state diagram for a quantum simulation lifecycle**:

```
State: ALL_TILES_IDLE
    |
    |  Simulation request arrives
    v
State: ALLOCATING
    Action: Wake tiles 0-3 (or however many are needed)
    Action: Allocate state vector across tile-local memory
    Power:  Tiles 0-3 ACTIVE, rest SLEEP
    |
    v
State: SIMULATING
    Action: Apply gates in parallel across active tiles
    Power:  Tiles 0-3 at full clock rate
    Duration: microseconds to seconds depending on circuit
    |
    v
State: MEASURING
    Action: Sample measurement outcomes
    Power:  Tile 0 only (measurement is sequential)
    |
    v
State: DEALLOCATING
    Action: Free state vector
    Action: Return tiles to idle pool
    |
    v
State: ALL_TILES_IDLE
    Power:  Tiles 0-3 back to SLEEP
    Memory: Zero heap allocation
```

**Tile assignment policy**:
- Small simulations (n <= 20): 1 tile sufficient.
- Medium simulations (20 < n <= 25): 2-4 tiles for parallel gate application.
- Large simulations (25 < n <= 30): All available tiles.
- The tile scheduler (part of Cognitum runtime) handles assignment. The quantum
  engine simply uses Rayon parallelism; the runtime maps Rayon threads to tiles.
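
The size thresholds in this policy can be written as a simple lookup. The sketch below is purely illustrative: assignment belongs to the Cognitum tile scheduler, whose interface is not described here, so `TileRequest` and `tiles_for` are hypothetical names, and the `Unspecified` case for n > 30 is an assumption because the policy does not cover it.

```rust
/// How many tiles a simulation of a given size would ask for (hypothetical type).
#[derive(Debug, PartialEq)]
enum TileRequest {
    /// One tile is sufficient.
    Single,
    /// Between `min` and `max` tiles for parallel gate application.
    Range { min: u32, max: u32 },
    /// Every tile the scheduler can spare.
    AllAvailable,
    /// Outside the sizes the policy above covers (assumption).
    Unspecified,
}

/// Map a qubit count onto the policy tiers listed above.
fn tiles_for(n_qubits: u32) -> TileRequest {
    match n_qubits {
        0..=20 => TileRequest::Single,
        21..=25 => TileRequest::Range { min: 2, max: 4 },
        26..=30 => TileRequest::AllAvailable,
        _ => TileRequest::Unspecified,
    }
}
```

A 22-qubit simulation would therefore request 2 to 4 tiles, while a 28-qubit simulation would request all available tiles.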

### 9. Memory Budget Table

Quick reference for capacity planning (working memory is 25% of the state vector;
per-thread scratch is not included):

| Qubits | State Vector | Working Memory | Total | Platform Fit |
|--------|-------------|---------------|-------|-------------|
| 10 | 16 KiB | 4 KiB | 20 KiB | Any |
| 12 | 64 KiB | 16 KiB | 80 KiB | Any |
| 14 | 256 KiB | 64 KiB | 320 KiB | Any |
| 16 | 1 MiB | 256 KiB | 1.25 MiB | Any |
| 18 | 4 MiB | 1 MiB | 5 MiB | Any |
| 20 | 16 MiB | 4 MiB | 20 MiB | Any |
| 22 | 64 MiB | 16 MiB | 80 MiB | Cognitum single tile |
| 24 | 256 MiB | 64 MiB | 320 MiB | Cognitum 2+ tiles |
| 26 | 1 GiB | 256 MiB | 1.25 GiB | Cognitum cluster |
| 28 | 4 GiB | 1 GiB | 5 GiB | Laptop / RPi 8GB |
| 30 | 16 GiB | 4 GiB | 20 GiB | Workstation |
| 32 | 64 GiB | 16 GiB | 80 GiB | Server |
| 34 | 256 GiB | 64 GiB | 320 GiB | Large server |

### 10. Allocation and Deallocation Sequence Diagram

```
  Caller              Engine                 OS/Allocator
    |                    |                         |
    |  execute(circuit)  |                         |
    |------------------->|                         |
    |                    |                         |
    |                    |  estimate_memory(n)     |
    |                    |  validate_available()   |
    |                    |                         |
    |                    |  try_reserve_exact(2^n) |
    |                    |------------------------>|
    |                    |                         |
    |                    |  Ok(ptr) or Err         |
    |                    |<------------------------|
    |                    |                         |
    |                    |  [if Err: return        |
    |                    |   SimulationError]      |
    |                    |                         |
    |                    |  initialize |00...0>    |
    |                    |  apply gates            |
    |                    |  measure                |
    |                    |                         |
    |                    |  build result           |
    |                    |  (copies measurements,  |
    |                    |   expectation values)   |
    |                    |                         |
    |                    |  drop(state_vector)     |
    |                    |------------------------>|
    |                    |                         |  free(ptr, 2^n * 16)
    |                    |                         |
    |  Ok(result)        |                         |
    |<-------------------|                         |
    |                    |                         |
    |  [Engine holds ZERO|                         |
    |   heap memory now] |                         |
```

## Consequences

### Positive

1. **True zero-idle cost**: No background resource consumption. Perfectly aligned
   with Cognitum's event-driven architecture and power gating.
2. **Predictable memory**: `estimate_memory()` reports the required memory before
   committing, which guards against OOM surprises.
3. **Graceful degradation**: Allocation failures return structured errors with
   actionable suggestions and never panic.
4. **Platform portable**: The same allocation strategy works on native (Linux, macOS,
   Windows), WASM, and embedded (Cognitum tiles).
5. **No resource leaks**: Rust's ownership system guarantees deallocation on all
   exit paths (success, error, or unwinding panic).

### Negative

1. **No state caching**: Each simulation allocates and deallocates independently.
   Repeated simulations on the same qubit count pay allocation cost each time.
   Mitigation: allocation is O(2^n), which is small next to the O(G * 2^n) cost
   of simulating a circuit with G gates.
2. **WASM memory high-water mark**: Cannot reclaim WASM linear memory pages.
   Documented as a platform limitation with instance-recycling workaround.
3. **No memory pooling**: Could theoretically amortize allocation across simulations,
   but this conflicts with the zero-idle-footprint requirement.
4. **Yield overhead**: When enabled, cooperative yielding adds per-slice overhead.
   Mitigated by making it opt-in and configurable.

### Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| OOM despite `estimate_memory` check | Low | Crash | Check returns conservative estimate including working memory |
| WASM instance runs out of address space | Medium | Failure | Set `WebAssembly.Memory` maximum; document limitation |
| Allocation latency spike (OS page faults) | Medium | Slow start | Consider `madvise` / `mlock` hints for large allocations |
| Rayon thread pool contention | Medium | Degraded perf | Quantum engine can yield between slices (opt-in); Rayon work-stealing handles contention |

## References

- Cognitum Architecture Specification: event-driven tile-based computing
- Rust `Vec::try_reserve_exact`: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.try_reserve_exact
- WebAssembly Memory: https://webassembly.github.io/spec/core/syntax/modules.html#memories
- Rayon thread pool: https://docs.rs/rayon
- ADR-QE-001: Core Engine Architecture (zero-overhead design principle)
- ADR-QE-005: WASM Compilation Target (WASM constraints)
- ADR-QE-009: Tensor Network Evaluation Mode (alternative for large circuits)
- ADR-QE-010: Observability & Monitoring (memory metrics reporting)