Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/docs/adr/quantum-engine/ADR-QE-011-memory-gating-power-management.md
+++ b/vendor/ruvector/docs/adr/quantum-engine/ADR-QE-011-memory-gating-power-management.md
@@ -0,0 +1,628 @@
+# ADR-QE-011: Memory Gating & Power Management
+
+**Status**: Proposed
+**Date**: 2026-02-06
+**Authors**: ruv.io, RuVector Team
+**Deciders**: Architecture Review Board
+
+---
+
+## Context
+
+ruVector is designed to operate within the Cognitum computing paradigm: a tile-based
+architecture with 256 low-power processor cores, event-driven activation, and
+aggressive power gating. Agents (software components) remain fully dormant until an
+event triggers their activation. Once their work completes, they release all
+resources and return to dormancy.
+
+The quantum simulation engine must adhere to this model:
+
+1. **Zero idle footprint**: When no simulation is running, the engine consumes zero
+   CPU cycles and zero heap memory; only its compiled code and static data remain.
+2. **Rapid activation**: The engine must be ready to execute a simulation within
+   microseconds of receiving a request.
+3. **Prompt resource release**: Upon simulation completion (or failure), all
+   allocated memory is immediately freed.
+4. **Predictable memory**: Callers must be able to determine exact memory
+   requirements before committing to a simulation.
+
+### Memory Scale
+
+The state vector for n qubits requires 2^n complex amplitudes, each consuming 16
+bytes (two f64 values):
+
+| Qubits | Amplitudes | Memory | Notes |
+|--------|-----------|--------|-------|
+| 10 | 1,024 | 16 KiB | Trivial |
+| 15 | 32,768 | 512 KiB | Small |
+| 20 | 1,048,576 | 16 MiB | Moderate |
+| 25 | 33,554,432 | 512 MiB | Large |
+| 28 | 268,435,456 | 4 GiB | Needs dedicated memory |
+| 30 | 1,073,741,824 | 16 GiB | Workstation-class |
+| 32 | 4,294,967,296 | 64 GiB | Server-class |
+| 35 | 34,359,738,368 | 512 GiB | HPC |
+| 40 | 1,099,511,627,776 | 16 TiB | Infeasible (state vector) |
+
+Each additional qubit doubles memory. This exponential scaling makes memory the
+primary resource constraint and the most important resource to manage.
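
The sizes in the table follow from 2^n amplitudes at 16 bytes each. A minimal sketch of that arithmetic; the function name `state_vector_bytes` is illustrative and not part of the engine API:

```rust
/// Bytes needed for a dense state vector of `n_qubits` qubits,
/// assuming 16-byte amplitudes (two f64 values each).
/// Returns None when the size does not fit in a u64.
fn state_vector_bytes(n_qubits: u32) -> Option<u64> {
    const BYTES_PER_AMPLITUDE: u64 = 16;
    1_u64
        .checked_shl(n_qubits)? // 2^n amplitudes; None if n >= 64
        .checked_mul(BYTES_PER_AMPLITUDE) // None if the product overflows u64
}
```

For example, `state_vector_bytes(20)` yields 16 MiB, and each increment of `n_qubits` doubles the result until the checked arithmetic reports overflow.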
+
+### Edge and Embedded Constraints
+
+On edge devices (embedded ruVector nodes, IoT gateways, mobile processors), memory
+is severely limited:
+
+| Platform | Typical RAM | Max qubits (state vector) |
+|----------|------------|--------------------------|
+| Cognitum tile (single) | 256 MiB | 23 |
+| Cognitum tile cluster (4) | 1 GiB | 25 |
+| Raspberry Pi 4 | 8 GiB | 28 |
+| Mobile device | 4-6 GiB | 27-28 (with other apps) |
+| Laptop | 16-64 GiB | 29-31 |
+| Server | 256-512 GiB | 33-34 |
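
The qubit limits in this table appear consistent with budgeting 20 bytes per amplitude (16 bytes plus 25% working memory), the same rule used by the estimation API later in this ADR. The sketch below checks that reading; the function name and the `Option` return type are assumptions made for illustration:

```rust
const MIB: u64 = 1 << 20;
const GIB: u64 = 1 << 30;

/// Largest n such that 20 * 2^n <= ram_bytes, i.e. the state vector plus
/// 25% working memory fits. Returns None if even 0 qubits (20 bytes) do not fit.
fn max_state_vector_qubits(ram_bytes: u64) -> Option<u32> {
    const BYTES_PER_AMPLITUDE_WITH_OVERHEAD: u64 = 20; // 16 B + 25%
    let amplitudes = ram_bytes / BYTES_PER_AMPLITUDE_WITH_OVERHEAD;
    amplitudes.checked_ilog2() // floor(log2(amplitudes)); None when amplitudes == 0
}
```

With this rule, `max_state_vector_qubits(256 * MIB)` gives 23 and `max_state_vector_qubits(8 * GIB)` gives 28, matching the single-tile and Raspberry Pi rows.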
+
+### WASM Memory Model
+
+WebAssembly uses a linear memory that can grow but cannot shrink. Once a large
+simulation allocates pages, those pages remain mapped until the WASM instance is
+destroyed. This is a fundamental platform limitation that must be documented and
+accounted for.
+
+## Decision
+
+### 1. Zero-Idle Footprint Architecture
+
+The quantum engine is implemented as a pure library with no runtime overhead:
+
+```rust
+// The engine is a collection of functions and types.
+// No background threads, no event loops, no persistent state.
+// When not called, it consumes exactly zero CPU and zero heap.
+
+pub struct QuantumEngine;  // Zero-sized type; purely a namespace
+
+impl QuantumEngine {
+    /// Execute a simulation. All resources are allocated on entry
+    /// and freed on exit (or on error).
+    pub fn execute(
+        circuit: &QuantumCircuit,
+        shots: usize,
+        config: &SimulationConfig,
+    ) -> Result<SimulationResult, SimulationError> {
+        // 1. Estimate and validate memory
+        let required = Self::estimate_memory(circuit.num_qubits())?;
+        Self::validate_memory_available(required)?;
+
+        // 2. Allocate state vector (the big allocation)
+        let mut state = Self::allocate_state(circuit.num_qubits())?;
+
+        // 3. Execute gates (all computation happens here)
+        Self::apply_gates(circuit, &mut state, config)?;
+
+        // 4. Measure (if requested)
+        let measurements = Self::measure(&state, shots)?;
+
+        // 5. Build result (copies out what we need)
+        let result = SimulationResult::from_state_and_measurements(
+            &state, measurements, circuit,
+        );
+
+        // 6. `state` is dropped when this function returns (on success or
+        //    on any early `?` error), deallocating its Vec<Complex<f64>>.
+        //    No cleanup needed. No finalizers. Just drop.
+
+        Ok(result)
+    }
+}
+```
+
+Key properties:
+- No `new()` or `init()` methods that create persistent state.
+- No `Drop` impl with complex cleanup logic.
+- No `Arc`, `Mutex`, or shared state between calls.
+- Each call is fully independent and self-contained.
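
The properties above can be illustrated with a self-contained toy. `Engine`, `run`, and `footprint` are hypothetical names for this sketch only; the point is that a unit struct occupies zero bytes and that a buffer owned by a single call is released when that call returns:

```rust
use std::mem::size_of;

/// Zero-sized namespace type, mirroring the pattern described above.
pub struct Engine;

impl Engine {
    /// Allocates a scratch buffer, uses it, and returns only a small summary.
    pub fn run(len: usize) -> f64 {
        let buffer = vec![1.0_f64; len]; // heap allocation lives only in this call
        buffer.iter().sum()
    } // `buffer` is dropped here; nothing persists between calls
}

/// Size in bytes of the engine type itself.
pub fn footprint() -> usize {
    size_of::<Engine>()
}
```

Because `Engine` has no fields, there is no instance state to initialize or tear down, which is what makes each call independent.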
+
+### 2. On-Demand Allocation Strategy
+
+State vectors are allocated at simulation start and freed at simulation end:
+
+```rust
+fn allocate_state(n_qubits: u32) -> Result<StateVector, SimulationError> {
+    // ok_or_else defers building the error (and querying available memory)
+    // to the failure path only.
+    let num_amplitudes = 1_usize.checked_shl(n_qubits)
+        .ok_or_else(|| SimulationError::QubitLimitExceeded {
+            requested: n_qubits,
+            maximum: usize::BITS - 1,
+            estimated_memory_bytes: u64::MAX,
+            available_memory_bytes: estimate_available_memory() as u64,
+        })?;
+
+    let required_bytes = num_amplitudes
+        .checked_mul(std::mem::size_of::<Complex<f64>>())
+        .ok_or(SimulationError::MemoryAllocationFailed {
+            requested_bytes: u64::MAX,
+            qubit_count: n_qubits,
+            suggestion: "Qubit count exceeds addressable memory",
+        })?;
+
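+    // Attempt the allocation fallibly. try_reserve_exact returns Err when the
+    // allocator reports failure, instead of aborting the process. On systems
+    // with memory overcommit (e.g. default Linux settings) the reservation can
+    // succeed and the OOM killer may still fire when pages are first touched,
+    // which is why execute() validates available memory beforehand.
+    let mut amplitudes: Vec<Complex<f64>> = Vec::new();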
+    amplitudes.try_reserve_exact(num_amplitudes)
+        .map_err(|_| SimulationError::MemoryAllocationFailed {
+            requested_bytes: required_bytes as u64,
+            qubit_count: n_qubits,
+            suggestion: "Reduce qubit count or use tensor-network backend",
+        })?;
+
+    // Initialize to |00...0> state
+    amplitudes.resize(num_amplitudes, Complex::new(0.0, 0.0));
+    amplitudes[0] = Complex::new(1.0, 0.0);
+
+    Ok(StateVector { amplitudes, n_qubits })
+}
+```
+
+The allocation sequence:
+
+```
+  IDLE (zero memory)
+    |
+    v
+  estimate_memory(n) --> returns bytes needed
+    |
+    v
+  validate_memory_available(bytes) --> checks against OS/platform limits
+    |                                   returns Err if insufficient
+    v
+  Vec::try_reserve_exact(2^n) --> attempts allocation
+    |                              returns Err on failure (no panic)
+    v
+  ALLOCATED (2^n * 16 bytes on heap)
+    |
+    v
+  [... simulation runs ...]
+    |
+    v
+  Vec::drop() --> automatic deallocation
+    |
+    v
+  IDLE (zero memory)
+```
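
The allocation sequence can be exercised in isolation. The following is a minimal sketch, assuming a plain `[f64; 2]` stands in for `Complex<f64>` and a `String` stands in for `SimulationError`; `try_allocate_state` is an illustrative name:

```rust
/// Stand-in for a complex amplitude: [re, im], 16 bytes.
type Amplitude = [f64; 2];

/// Fallibly allocate the |00...0> state for `n_qubits`.
fn try_allocate_state(n_qubits: u32) -> Result<Vec<Amplitude>, String> {
    let num_amplitudes = 1_usize
        .checked_shl(n_qubits)
        .ok_or_else(|| format!("2^{n_qubits} amplitudes overflows usize"))?;

    // Reject sizes whose byte count cannot be represented at all.
    num_amplitudes
        .checked_mul(std::mem::size_of::<Amplitude>())
        .ok_or_else(|| format!("{n_qubits} qubits exceeds addressable memory"))?;

    let mut amplitudes: Vec<Amplitude> = Vec::new();
    amplitudes
        .try_reserve_exact(num_amplitudes) // Err instead of abort on failure
        .map_err(|e| format!("allocation failed for {n_qubits} qubits: {e}"))?;

    amplitudes.resize(num_amplitudes, [0.0, 0.0]); // capacity already reserved
    amplitudes[0] = [1.0, 0.0]; // amplitude of |00...0>
    Ok(amplitudes)
}
```

The returned `Vec` is freed when the caller drops it, which corresponds to the final transition back to IDLE in the diagram.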
+
+### 3. Memory Estimation API
+
+Callers can query estimated memory requirements before committing:
+
+```rust
+/// Returns the estimated number of bytes required to simulate n_qubits.
+/// This accounts for the state vector plus working memory for
+/// gate application (temporary buffers, measurement arrays, etc.).
+///
+/// # Returns
+/// - `Ok(MemoryEstimate)` if every byte count is representable in usize
+/// - `Err(...)` if 2^n_qubits, or any size derived from it, overflows usize
+pub fn estimate_memory(n_qubits: u32) -> Result<MemoryEstimate, SimulationError> {
+    // Shared error for every overflow path below.
+    let overflow = || SimulationError::QubitLimitExceeded {
+        requested: n_qubits,
+        maximum: usize::BITS - 1,
+        estimated_memory_bytes: u64::MAX,
+        available_memory_bytes: 0,
+    };
+
+    let num_amplitudes = 1_usize.checked_shl(n_qubits).ok_or_else(overflow)?;
+
+    // Checked: 2^n * 16 overflows usize for n >= 60 on 64-bit targets.
+    let state_vector_bytes = num_amplitudes
+        .checked_mul(std::mem::size_of::<Complex<f64>>())
+        .ok_or_else(overflow)?;
+
+    // Working memory: temporary buffers for gate application plus
+    // measurement result storage, budgeted at 25% of the state vector.
+    let working_bytes = state_vector_bytes / 4;
+
+    // Thread-local scratch space (per Rayon thread)
+    let thread_count = rayon::current_num_threads();
+    let scratch_per_thread = 64 * 1024; // 64 KiB per thread for local buffers
+    let thread_scratch = thread_count * scratch_per_thread;
+
+    let total_bytes = state_vector_bytes
+        .checked_add(working_bytes)
+        .and_then(|bytes| bytes.checked_add(thread_scratch))
+        .ok_or_else(overflow)?;
+
+    Ok(MemoryEstimate {
+        state_vector_bytes: state_vector_bytes as u64,
+        working_bytes: working_bytes as u64,
+        thread_scratch_bytes: thread_scratch as u64,
+        total_bytes: total_bytes as u64,
+        num_amplitudes: num_amplitudes as u64,
+    })
+}
+
+#[derive(Debug, Clone)]
+pub struct MemoryEstimate {
+    /// Bytes for the state vector (dominant cost).
+    pub state_vector_bytes: u64,
+    /// Bytes for gate-application working memory.
+    pub working_bytes: u64,
+    /// Bytes for thread-local scratch space.
+    pub thread_scratch_bytes: u64,
+    /// Total estimated bytes.
+    pub total_bytes: u64,
+    /// Number of complex amplitudes.
+    pub num_amplitudes: u64,
+}
+
+impl MemoryEstimate {
+    /// Returns true if the estimate fits within the given byte budget.
+    pub fn fits_in(&self, available_bytes: u64) -> bool {
+        self.total_bytes <= available_bytes
+    }
+
+    /// Suggest the maximum qubits for a given memory budget.
+    /// Ignores the fixed per-thread scratch space, so treat the result as an
+    /// upper bound and confirm it with `estimate_memory`.
+    pub fn max_qubits_for(available_bytes: u64) -> u32 {
+        // Each qubit doubles memory; find largest n where 20 * 2^n <= available
+        // Factor of 20 accounts for 16-byte amplitudes + 25% working memory
+        let effective = available_bytes / 20;
+        if effective == 0 { return 0; }
+        effective.ilog2()
+    }
+}
+```
+
+### 4. Allocation Failure Handling
+
+The engine never panics on allocation failure. All paths return structured errors:
+
+```rust
+// Pattern: every allocation is fallible and returns a descriptive error.
+
+// State vector allocation failure:
+SimulationError::MemoryAllocationFailed {
+    requested_bytes: 17_179_869_184,  // 16 GiB
+    qubit_count: 30,
+    suggestion: "Reduce qubit count by 2 (to 28, ~4 GiB) or enable tensor-network backend",
+}
+
+// Integer overflow (qubit count too large):
+SimulationError::QubitLimitExceeded {
+    requested: 64,
+    maximum: 31,  // floor(log2(available / 20)) for 64 GiB available
+    estimated_memory_bytes: u64::MAX,
+    available_memory_bytes: 68_719_476_736,  // 64 GiB
+}
+```
+
+Decision tree on allocation failure:
+
+```
+  Memory allocation failed
+    |
+    +-- Is tensor-network feature enabled?
+    |     |
+    |     +-- YES: Suggest tensor-network backend
+    |     |         (may work if circuit has low treewidth)
+    |     |
+    |     +-- NO: Suggest reducing qubit count
+    |             Calculate: max_qubits = floor(log2(available / 20))
+    |             Suggest: "Reduce to {max_qubits} qubits ({memory} bytes)"
+    |
+    +-- Is the request wildly over budget (>100x)?
+    |     |
+    |     +-- YES: "Circuit requires {X} GiB but only {Y} MiB available"
+    |     |
+    |     +-- NO: "Circuit requires {X} GiB, {Y} GiB available.
+    |              Reducing by {delta} qubits would fit."
+    |
+    +-- Return SimulationError (no panic, no abort)
+```
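
One way to turn the decision tree into code is sketched below. It is an illustration under stated assumptions, not the engine's implementation: `failure_suggestion` and its parameters are hypothetical, messages report raw byte counts rather than GiB/MiB, and the remedy and the budget detail are simply concatenated.

```rust
/// Largest n with 20 * 2^n <= available (16-byte amplitudes + 25% working memory).
fn max_qubits_for(available_bytes: u64) -> u32 {
    (available_bytes / 20).checked_ilog2().unwrap_or(0)
}

/// Build a human-readable suggestion following the decision tree above.
fn failure_suggestion(
    requested_qubits: u32,
    required_bytes: u64,
    available_bytes: u64,
    tensor_network_enabled: bool,
) -> String {
    let max_qubits = max_qubits_for(available_bytes);

    // Branch 1: which remedy to suggest.
    let remedy = if tensor_network_enabled {
        "Try the tensor-network backend (may work if the circuit has low treewidth).".to_string()
    } else {
        format!("Reduce to {max_qubits} qubits.")
    };

    // Branch 2: is the request more than 100x over budget?
    let wildly_over = required_bytes > available_bytes.saturating_mul(100);
    let detail = if wildly_over {
        format!("Circuit requires {required_bytes} bytes but only {available_bytes} bytes available.")
    } else {
        let delta = requested_qubits.saturating_sub(max_qubits);
        format!(
            "Circuit requires {required_bytes} bytes, {available_bytes} bytes available. \
             Reducing by {delta} qubits would fit."
        )
    };

    format!("{detail} {remedy}")
}
```

The function only formats text; the caller would still wrap the message in a `SimulationError` and return it, so no path panics or aborts.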
+
+### 5. CPU Yielding for Long Simulations
+
+For simulations estimated to exceed 100ms, the engine can optionally yield between
+gate batches to allow the OS scheduler to manage power states:
+
+```rust
+pub struct YieldConfig {
+    /// Enable cooperative yielding between gate batches.
+    /// Default: false (maximum throughput).
+    pub enabled: bool,
+
+    /// Number of gates to apply before yielding.
+    /// Default: 1000.
+    pub gates_per_slice: usize,
+
+    /// Yield mechanism.
+    /// Default: ThreadYield (std::thread::yield_now).
+    pub yield_strategy: YieldStrategy,
+}
+
+pub enum YieldStrategy {
+    /// Call std::thread::yield_now() between slices.
+    ThreadYield,
+    /// Sleep for specified duration between slices.
+    Sleep(Duration),
+    /// Call a user-provided callback between slices.
+    Callback(Box<dyn Fn(SliceProgress) + Send>),
+}
+
+pub struct SliceProgress {
+    pub gates_completed: u64,
+    pub gates_remaining: u64,
+    pub elapsed: Duration,
+    pub estimated_remaining: Duration,
+}
+
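+// Usage in gate application loop:
+fn apply_gates_with_yield(
+    circuit: &QuantumCircuit,
+    state: &mut StateVector,
+    yield_config: &YieldConfig,
+) -> Result<(), SimulationError> {
+    let gates = circuit.gates();
+    // Reference point for the elapsed and estimated-remaining durations.
+    let start = std::time::Instant::now();
+    // A zero slice size would panic in the modulo below; treat it as 1.
+    let gates_per_slice = yield_config.gates_per_slice.max(1);
+
+    for (i, gate) in gates.iter().enumerate() {
+        apply_single_gate(gate, state)?;
+
+        if yield_config.enabled && (i + 1) % gates_per_slice == 0 {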
+            match &yield_config.yield_strategy {
+                YieldStrategy::ThreadYield => std::thread::yield_now(),
+                YieldStrategy::Sleep(d) => std::thread::sleep(*d),
+                YieldStrategy::Callback(cb) => cb(SliceProgress {
+                    gates_completed: (i + 1) as u64,
+                    gates_remaining: (gates.len() - i - 1) as u64,
+                    elapsed: start.elapsed(),
+                    estimated_remaining: estimate_remaining(i, gates.len(), start),
+                }),
+            }
+        }
+    }
+
+    Ok(())
+}
+```
+
+Yield is **disabled by default** to maximize throughput. It is primarily intended
+for:
+- Edge devices where power management is critical.
+- Interactive applications where UI responsiveness matters.
+- Long-running simulations (>1 second) where progress reporting is needed.
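
The slicing behaviour can be tried without the engine types. The sketch below is a simplified stand-in: `run_in_slices` is a hypothetical helper, the gate application is elided, and the callback receives only the completed and remaining gate counts.

```rust
/// Process `total_gates` units of work, invoking `on_slice` after every
/// `gates_per_slice` gates with (gates_completed, gates_remaining).
/// Returns the number of times the callback ran.
fn run_in_slices(
    total_gates: usize,
    gates_per_slice: usize,
    mut on_slice: impl FnMut(usize, usize),
) -> usize {
    let slice = gates_per_slice.max(1); // avoid modulo by zero
    let mut yields = 0;
    for i in 0..total_gates {
        // (a real engine would apply gate `i` to the state vector here)
        if (i + 1) % slice == 0 {
            on_slice(i + 1, total_gates - i - 1);
            yields += 1;
            std::thread::yield_now(); // hint to the OS scheduler
        }
    }
    yields
}
```

With 2,500 gates and a slice size of 1,000, the callback fires after gates 1,000 and 2,000; the trailing 500 gates finish without a further yield.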
+
+### 6. Thread Management
+
+The quantum engine does not create or manage its own threads:
+
+```
+  +------------------------------------------------+
+  |            Global Rayon Thread Pool            |
+  |       (shared by all ruVector subsystems)      |
+  |                                                |
+  |  [Thread 0] [Thread 1] ... [Thread N-1]        |
+  |      ^          ^               ^              |
+  |      |          |               |              |
+  |  +-------+  +-------+       +------+           |
+  |  | ruQu  |  | ruQu  |       | idle |           |
+  |  | gate  |  | gate  |       |      |           |
+  |  | apply |  | apply |       |      |           |
+  |  +-------+  +-------+       +------+           |
+  |                                                |
+  |  During simulation: threads work on gates      |
+  |  After simulation: threads return to pool      |
+  |  Pool idle: OS can power-gate cores            |
+  +------------------------------------------------+
+```
+
+Key properties:
+- Rayon's global thread pool is initialized once by `ruvector-core` at startup.
+- The quantum engine calls `par_iter()` and related APIs from `rayon::prelude`,
+  borrowing threads temporarily.
+- When simulation completes, all threads are returned to the global pool.
+- If no ruVector work is pending, Rayon threads park (blocking on a condvar),
+  consuming zero CPU. The OS can then power-gate the underlying cores.
+
+### 7. WASM Memory Considerations
+
+WebAssembly linear memory has a specific behavior that affects resource management:
+
+```
+  WASM Memory Layout
+  +------------------+------------------+
+  |  Initial pages   |  Grown pages     |
+  |  (compiled size) |  (runtime alloc) |
+  +------------------+------------------+
+  0                  initial_size       current_size
+
+  Growth: memory.grow(delta_pages) -> adds pages to the end
+  Shrink: NOT SUPPORTED in WASM spec
+
+  After 25-qubit simulation:
+  +------------------+----------------------------------+
+  |  Initial (1 MiB) |  Grown for state vec (512 MiB)   |  <- HIGH WATER MARK
+  +------------------+----------------------------------+
+
+  After simulation completes:
+  +------------------+----------------------------------+
+  |  Initial (1 MiB) |  FREED internally but pages      |
+  |                  |  still mapped (512 MiB virtual)  |
+  +------------------+----------------------------------+
+  The Rust allocator returns memory to its free list,
+  but WASM pages are not returned to the host.
+```
+
+**Implications and mitigations**:
+
+1. **Document the behavior**: Users must understand that WASM memory is a high-water
+   mark. A 25-qubit simulation raises the WASM instance's memory footprint to at
+   least ~512 MiB for the remaining lifetime of that instance.
+
+2. **Instance recycling**: For applications that run multiple simulations, create a
+   new WASM instance periodically to reset the memory high-water mark.
+
+3. **Memory budget enforcement**: The WASM host can set `WebAssembly.Memory` with a
+   `maximum` parameter to cap growth:
+
+```javascript
+const memory = new WebAssembly.Memory({
+    initial: 16,      // 1 MiB
+    maximum: 8192,     // 512 MiB cap
+});
+```
+
+4. **Pre-check in WASM**: The engine's `estimate_memory()` function works in WASM
+   and should be called before simulation to verify the allocation will succeed.
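
To relate byte estimates to a `maximum` expressed in pages, the conversion can be sketched as follows. The helper names are hypothetical; the only fixed fact used is the 64 KiB WebAssembly page size.

```rust
/// WebAssembly linear memory is measured in 64 KiB pages.
const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Pages needed to hold `bytes`, rounded up.
fn pages_for(bytes: u64) -> u64 {
    bytes / WASM_PAGE_BYTES + u64::from(bytes % WASM_PAGE_BYTES != 0)
}

/// True if growing from `current_pages` by enough pages to hold
/// `extra_bytes` would stay within `maximum_pages`.
fn growth_fits(current_pages: u64, extra_bytes: u64, maximum_pages: u64) -> bool {
    current_pages
        .checked_add(pages_for(extra_bytes))
        .map_or(false, |total| total <= maximum_pages)
}
```

Note that under an 8192-page (512 MiB) cap with 16 initial pages, a 25-qubit state vector needs 8192 pages on its own and therefore does not fit; a 24-qubit state vector (4096 pages) does.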
+
+### 8. Cognitum Tile Integration
+
+On Cognitum's tile-based architecture, the quantum engine maps to tiles as follows:
+
+```
+  Cognitum Processor (256 tiles)
+  +--------+--------+--------+--------+
+  | Tile 0 | Tile 1 | Tile 2 | Tile 3 |  <- Assigned to quantum sim
+  | ACTIVE | ACTIVE | ACTIVE | ACTIVE |
+  +--------+--------+--------+--------+
+  | Tile 4 | Tile 5 | Tile 6 | Tile 7 |  <- Other ruVector work (or sleeping)
+  | sleep  | vecDB  | sleep  | sleep  |
+  +--------+--------+--------+--------+
+  |  ...   |  ...   |  ...   |  ...   |
+  | sleep  | sleep  | sleep  | sleep  |  <- Power gated (zero consumption)
+  +--------+--------+--------+--------+
+```
+
+**Power state diagram for a quantum simulation lifecycle**:
+
+```
+  State: ALL_TILES_IDLE
+    |
+    | Simulation request arrives
+    v
+  State: ALLOCATING
+    Action: Wake tiles 0-3 (or however many are needed)
+    Action: Allocate state vector across tile-local memory
+    Power: Tiles 0-3 ACTIVE, rest SLEEP
+    |
+    v
+  State: SIMULATING
+    Action: Apply gates in parallel across active tiles
+    Power: Tiles 0-3 at full clock rate
+    Duration: microseconds to seconds depending on circuit
+    |
+    v
+  State: MEASURING
+    Action: Sample measurement outcomes
+    Power: Tile 0 only (measurement is sequential)
+    |
+    v
+  State: DEALLOCATING
+    Action: Free state vector
+    Action: Return tiles to idle pool
+    |
+    v
+  State: ALL_TILES_IDLE
+    Power: Tiles 0-3 back to SLEEP
+    Memory: Zero heap allocation
+```
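
The lifecycle above is a simple cycle, which can be written down as a small state machine. This is a sketch of the happy path only; the enum and method names are assumptions, and error transitions (for example, allocation failure returning straight to idle) are not modelled.

```rust
/// Lifecycle states from the power state diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimState {
    AllTilesIdle,
    Allocating,
    Simulating,
    Measuring,
    Deallocating,
}

impl SimState {
    /// The single successor of each state on the happy path.
    fn next(self) -> SimState {
        match self {
            SimState::AllTilesIdle => SimState::Allocating,
            SimState::Allocating => SimState::Simulating,
            SimState::Simulating => SimState::Measuring,
            SimState::Measuring => SimState::Deallocating,
            SimState::Deallocating => SimState::AllTilesIdle,
        }
    }
}

/// States visited by one simulation, starting and ending at idle.
fn lifecycle() -> Vec<SimState> {
    let mut states = vec![SimState::AllTilesIdle];
    let mut current = SimState::AllTilesIdle.next();
    while current != SimState::AllTilesIdle {
        states.push(current);
        current = current.next();
    }
    states.push(current);
    states
}
```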
+
+**Tile assignment policy**:
+- Small simulations (n <= 20): 1 tile sufficient.
+- Medium simulations (20 < n <= 25): 2-4 tiles for parallel gate application.
+- Large simulations (25 < n <= 30): All available tiles.
+- The tile scheduler (part of Cognitum runtime) handles assignment. The quantum
+  engine simply uses Rayon parallelism; the runtime maps Rayon threads to tiles.
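
The assignment policy reduces to a lookup on the qubit count. A minimal sketch, assuming hypothetical names `TileBudget` and `tile_budget`; counts above 30 qubits are not covered by the policy, so the sketch returns `None` for them:

```rust
/// Tile budget suggested by the assignment policy above.
#[derive(Debug, PartialEq, Eq)]
enum TileBudget {
    /// Exactly this many tiles.
    Fixed(u32),
    /// Any count in the inclusive range.
    Range(u32, u32),
    /// Every tile the scheduler can spare.
    AllAvailable,
}

/// Map a qubit count to a tile budget, or None if the policy is silent.
fn tile_budget(n_qubits: u32) -> Option<TileBudget> {
    match n_qubits {
        0..=20 => Some(TileBudget::Fixed(1)),
        21..=25 => Some(TileBudget::Range(2, 4)),
        26..=30 => Some(TileBudget::AllAvailable),
        _ => None,
    }
}
```

In the design described here the Cognitum runtime, not the engine, makes this decision; the function only restates the policy in executable form.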
+
+### 9. Memory Budget Table
+
+Quick reference for capacity planning:
+
+| Qubits | State Vector | Working Memory | Total | Platform Fit |
+|--------|-------------|---------------|-------|-------------|
+| 10 | 16 KiB | 4 KiB | 20 KiB | Any |
+| 12 | 64 KiB | 16 KiB | 80 KiB | Any |
+| 14 | 256 KiB | 64 KiB | 320 KiB | Any |
+| 16 | 1 MiB | 256 KiB | 1.3 MiB | Any |
+| 18 | 4 MiB | 1 MiB | 5 MiB | Any |
+| 20 | 16 MiB | 4 MiB | 20 MiB | Any |
+| 22 | 64 MiB | 16 MiB | 80 MiB | Cognitum single tile |
+| 24 | 256 MiB | 64 MiB | 320 MiB | Cognitum 2+ tiles |
+| 26 | 1 GiB | 256 MiB | 1.3 GiB | Cognitum cluster |
+| 28 | 4 GiB | 1 GiB | 5 GiB | Laptop / RPi 8GB |
+| 30 | 16 GiB | 4 GiB | 20 GiB | Workstation |
+| 32 | 64 GiB | 16 GiB | 80 GiB | Server |
+| 34 | 256 GiB | 64 GiB | 320 GiB | Large server |
+
+### 10. Allocation and Deallocation Sequence Diagram
+
+```
+  Caller                Engine                  OS/Allocator
+    |                     |                         |
+    |  execute(circuit)   |                         |
+    |-------------------->|                         |
+    |                     |                         |
+    |                     |  estimate_memory(n)     |
+    |                     |  validate_available()   |
+    |                     |                         |
+    |                     |  try_reserve_exact(2^n) |
+    |                     |------------------------>|
+    |                     |                         |
+    |                     |     Ok(ptr) or Err      |
+    |                     |<------------------------|
+    |                     |                         |
+    |                     |  [if Err: return        |
+    |                     |   SimulationError]      |
+    |                     |                         |
+    |                     |  initialize |00...0>    |
+    |                     |  apply gates            |
+    |                     |  measure                |
+    |                     |                         |
+    |                     |  build result           |
+    |                     |  (copies measurements,  |
+    |                     |   expectation values)   |
+    |                     |                         |
+    |                     |  drop(state_vector)     |
+    |                     |------------------------>|
+    |                     |                         |  free(ptr, 2^n * 16)
+    |                     |                         |
+    |  Ok(result)         |                         |
+    |<--------------------|                         |
+    |                     |                         |
+    |  [Engine holds ZERO |                         |
+    |   heap memory now]  |                         |
+```
+
+## Consequences
+
+### Positive
+
+1. **True zero-idle cost**: No background resource consumption. Perfectly aligned
+   with Cognitum's event-driven architecture and power gating.
+2. **Predictable memory**: `estimate_memory()` reports the exact state-vector size
+   plus a conservative working-memory estimate before committing, reducing OOM
+   surprises.
+3. **Graceful degradation**: Allocation failures return structured errors with
+   actionable suggestions, never panics.
+4. **Platform portable**: The same allocation strategy works on native (Linux, macOS,
+   Windows), WASM, and embedded (Cognitum tiles).
+5. **No resource leaks**: Rust's ownership system guarantees deallocation on all
+   exit paths (success, error, unwinding panic).
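
The deallocation guarantee on the error path can be observed with an instrumented buffer. In this sketch the `Drop` impl exists only to count drops for the demonstration (the engine itself needs none), and `TrackedState` and `simulate` are hypothetical names:

```rust
use std::cell::Cell;

/// Buffer that records its own deallocation in a shared counter.
struct TrackedState<'a> {
    amplitudes: Vec<f64>,
    drops: &'a Cell<u32>,
}

impl Drop for TrackedState<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Allocates a state, then fails or succeeds depending on `fail`.
/// Either way the state is dropped before the function returns.
fn simulate(drops: &Cell<u32>, fail: bool) -> Result<usize, String> {
    let state = TrackedState { amplitudes: vec![0.0; 8], drops };
    if fail {
        return Err("gate application failed".to_string()); // early exit
    }
    Ok(state.amplitudes.len())
}
```

Calling `simulate` once with `fail = true` and once with `fail = false` increments the counter both times, showing that the early return releases the buffer just as the success path does.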
+
+### Negative
+
+1. **No state caching**: Each simulation allocates and deallocates independently.
+   Repeated simulations on the same qubit count pay allocation cost each time.
+   Mitigation: allocation is O(2^n) but fast compared to the O(G * 2^n) simulation,
+   where G is the number of gates.
+2. **WASM memory high-water mark**: Cannot reclaim WASM linear memory pages.
+   Documented as a platform limitation with instance-recycling workaround.
+3. **No memory pooling**: Could theoretically amortize allocation across simulations,
+   but this conflicts with the zero-idle-footprint requirement.
+4. **Yield overhead**: When enabled, cooperative yielding adds per-slice overhead.
+   Mitigated by making it opt-in and configurable.
+
+### Risks and Mitigations
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| OOM despite estimate_memory check | Low | Crash | Check returns conservative estimate including working memory |
+| WASM instance runs out of address space | Medium | Failure | Set `WebAssembly.Memory` maximum; document limitation |
+| Allocation latency spike (OS page faults) | Medium | Slow start | Consider `madvise` / `mlock` hints for large allocations |
+| Rayon thread pool contention | Medium | Degraded perf | Quantum engine can yield between slices (opt-in); Rayon work-stealing handles contention |
+
+## References
+
+- Cognitum Architecture Specification: event-driven tile-based computing
+- Rust `Vec::try_reserve_exact`: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.try_reserve_exact
+- WebAssembly Memory: https://webassembly.github.io/spec/core/syntax/modules.html#memories
+- Rayon thread pool: https://docs.rs/rayon
+- ADR-QE-001: Core Engine Architecture (zero-overhead design principle)
+- ADR-QE-005: WASM Compilation Target (WASM constraints)
+- ADR-QE-009: Tensor Network Evaluation Mode (alternative for large circuits)
+- ADR-QE-010: Observability & Monitoring (memory metrics reporting)