# ADR-016: RuVector Integration for Training Pipeline

## Status

Accepted

## Context

The `wifi-densepose-train` crate (ADR-015) was initially implemented using standard crates (`petgraph`, `ndarray`) and custom signal processing. The ruvector ecosystem provides published Rust crates, several built on subpolynomial or sublinear algorithms, that can directly replace some of these components. The ruvector crates integrated here are published at v2.0.4 on crates.io (confirmed), and their source is available at https://github.com/ruvnet/ruvector.

### Available ruvector crates (v2.0.4 on crates.io unless noted)

| Crate | Description | Default Features / Notes |
|-------|-------------|--------------------------|
| `ruvector-mincut` | World's first subpolynomial dynamic min-cut | `exact`, `approximate` |
| `ruvector-attn-mincut` | Min-cut gating attention (graph-based alternative to softmax) | all modules |
| `ruvector-attention` | Geometric, graph, and sparse attention mechanisms | all modules |
| `ruvector-temporal-tensor` | Temporal tensor compression with tiered quantization | all modules |
| `ruvector-solver` | Sublinear-time sparse linear solvers O(log n) to O(√n) | `neumann`, `cg`, `forward-push` |
| `ruvector-core` | HNSW-indexed vector database core | v2.0.5 |
| `ruvector-math` | Optimal transport, information geometry | v2.0.4 |

### Verified API Details (from source inspection of github.com/ruvnet/ruvector)

#### ruvector-mincut

```rust
use ruvector_mincut::{MinCutBuilder, DynamicMinCut, MinCutResult, VertexId, Weight};

// Build a dynamic min-cut structure
let mut mincut = MinCutBuilder::new()
    .exact()                                                  // or .approximate(0.1)
    .with_edges(vec![(u: VertexId, v: VertexId, w: Weight)])  // (u64, u64, f64) tuples
    .build()
    .expect("Failed to build");

// Subpolynomial O(n^{o(1)}) amortized dynamic updates
mincut.insert_edge(u, v, weight) -> Result<f64>  // new cut value
mincut.delete_edge(u, v) -> Result<f64>          // new cut value

// Queries
mincut.min_cut_value() -> f64
mincut.min_cut() -> MinCutResult                      // includes partition
mincut.partition() -> (Vec<VertexId>, Vec<VertexId>)  // S and T sets
mincut.cut_edges() -> Vec<Edge>                       // edges crossing the cut

// Note: VertexId = u64; Edge has fields { source: u64, target: u64, weight: f64 }
```

`MinCutResult` contains:

- `value: f64`: minimum cut weight
- `is_exact: bool`
- `approximation_ratio: f64`
- `partition: Option<(Vec<VertexId>, Vec<VertexId>)>`: S and T node sets

#### ruvector-attn-mincut

```rust
use ruvector_attn_mincut::{attn_mincut, attn_softmax, AttentionOutput, MinCutConfig};

// Min-cut gated attention (drop-in for softmax attention)
// Q, K, V are all flat &[f32] with shape [seq_len, d]
let output: AttentionOutput = attn_mincut(
    q: &[f32],       // queries: flat [seq_len * d]
    k: &[f32],       // keys:    flat [seq_len * d]
    v: &[f32],       // values:  flat [seq_len * d]
    d: usize,        // feature dimension
    seq_len: usize,  // number of tokens / antenna paths
    lambda: f32,     // min-cut threshold (larger = more pruning)
    tau: usize,      // temporal hysteresis window
    eps: f32,        // numerical epsilon
) -> AttentionOutput;

// AttentionOutput
pub struct AttentionOutput {
    pub output: Vec<f32>,      // attended values [seq_len * d]
    pub gating: GatingResult,  // which edges were kept/pruned
}

// Baseline softmax attention for comparison
let output: Vec<f32> = attn_softmax(q, k, v, d, seq_len);
```

**Use case in wifi-densepose-train**: In `ModalityTranslator`, treat the `T * n_tx * n_rx` antenna×time paths as `seq_len` tokens and the `n_sc` subcarriers as feature dimension `d`. Apply `attn_mincut` to gate irrelevant antenna-pair correlations before passing to FC layers.

#### ruvector-solver (NeumannSolver)

```rust
use ruvector_solver::neumann::NeumannSolver;
use ruvector_solver::types::CsrMatrix;
use ruvector_solver::traits::SolverEngine;

// Build sparse matrix from COO entries
let matrix = CsrMatrix::<f32>::from_coo(rows, cols, vec![
    (row: usize, col: usize, val: f32), ...
]);

// Solve Ax = b in O(√n) for sparse systems
let solver = NeumannSolver::new(tolerance: f64, max_iterations: usize);
let result = solver.solve(&matrix, rhs: &[f32]) -> Result<SolverResult>;

// SolverResult
result.solution: Vec<f32>   // solution vector x
result.residual_norm: f64   // ||b - Ax||
result.iterations: usize    // number of iterations used
```

**Use case in wifi-densepose-train**: In `subcarrier.rs`, model the 114→56 subcarrier resampling as a sparse regularized least-squares problem `A·x ≈ b`, where `A` is a sparse basis-function matrix (physically motivated by the multipath propagation model: each target subcarrier is a sparse combination of adjacent source subcarriers). The stated complexity is O(√n) versus O(n) for n = 114 subcarriers.

#### ruvector-temporal-tensor

```rust
use ruvector_temporal_tensor::{TemporalTensorCompressor, TierPolicy};
use ruvector_temporal_tensor::segment;

// Create compressor for `element_count` f32 elements per frame
let mut comp = TemporalTensorCompressor::new(
    TierPolicy::default(),  // configures hot/warm/cold thresholds
    element_count: usize,   // n_tx * n_rx * n_sc (elements per CSI frame)
    id: u64,                // tensor identity (0 for amplitude, 1 for phase)
);

// Mark access recency (drives tier selection):
//   hot  = accessed within last few timestamps → 8-bit (~4x compression)
//   warm = moderately recent → 5 or 7-bit (~4.6–6.4x)
//   cold = rarely accessed → 3-bit (~10.67x)
comp.set_access(timestamp: u64, tensor_id: u64);

// Compress frames into a byte segment
let mut segment_buf: Vec<u8> = Vec::new();
comp.push_frame(frame: &[f32], timestamp: u64, &mut segment_buf);
comp.flush(&mut segment_buf);  // flush current partial segment

// Decompress
let mut decoded: Vec<f32> = Vec::new();
segment::decode(&segment_buf, &mut decoded);  // all frames
segment::decode_single_frame(&segment_buf, frame_index: usize) -> Option<Vec<f32>>;
segment::compression_ratio(&segment_buf) -> f64;
```

**Use case in wifi-densepose-train**: In `dataset.rs`, buffer CSI frames in `TemporalTensorCompressor` to reduce memory footprint by
50–75%. The CSI window contains `window_frames` (default 100) frames per sample; hot (recent) frames are kept at the highest-fidelity 8-bit tier, while cold (older) frames are aggressively quantized.

#### ruvector-attention

```rust
use ruvector_attention::{
    attention::ScaledDotProductAttention,
    traits::Attention,
};

let attention = ScaledDotProductAttention::new(d: usize);  // feature dim

// Compute attention: query is [d]; keys and values are Vec<&[f32]>
let output: Vec<f32> = attention.compute(
    query: &[f32],      // [d]
    keys: &[&[f32]],    // n_nodes × [d]
    values: &[&[f32]],  // n_nodes × [d]
) -> Result<Vec<f32>>;
```

**Use case in wifi-densepose-train**: In the `model.rs` spatial decoder, replace the standard Conv2D upsampling pass with graph-based spatial attention among spatial locations, where nodes represent spatial grid points and edges connect neighboring antenna footprints.

---

## Decision

Integrate ruvector crates into `wifi-densepose-train` at five integration points:

### 1. `ruvector-mincut` → `metrics.rs` (replaces petgraph Hungarian for multi-frame)

**Before:** O(n³) Kuhn-Munkres via DFS augmenting paths using `petgraph::DiGraph`, single-frame only (no state across frames).

**After:** `DynamicPersonMatcher` struct wrapping `ruvector_mincut::DynamicMinCut`. It maintains the bipartite assignment graph across frames using incremental updates:

- `insert_edge(pred_id, gt_id, oks_cost)` when a new person is detected
- `delete_edge(pred_id, gt_id)` when a person leaves the scene
- `partition()` returns the S/T split → `cut_edges()` returns the matched pred→gt pairs

**Performance:** O(n^{1.5} log n) amortized update vs O(n³) rebuild per frame. This matters most for scenes with more than 3 people and for video tracking (frame-to-frame updates).

The original `hungarian_assignment` function is **kept** for single-frame static matching (used in proof verification for determinism).

### 2. `ruvector-attn-mincut` → `model.rs` (replaces flat MLP fusion in ModalityTranslator)

**Before:** Amplitude/phase FC encoders → concatenate [B, 512] → fuse Linear → ReLU.
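
The baseline fusion described above can be sketched for a single batch element in plain Rust. This is an illustrative reconstruction, not the crate's code: the function name `fuse_concat_linear_relu`, the row-major weight layout, and the toy dimensions are assumptions (the ADR only states that the concatenated feature vector has width 512).

```rust
/// Baseline fusion for one batch element: concatenate -> linear -> ReLU.
/// `weights` is assumed row-major with shape [out_dim, in_dim],
/// where in_dim = amp.len() + phase.len() and out_dim = bias.len().
fn fuse_concat_linear_relu(
    amp: &[f32],
    phase: &[f32],
    weights: &[f32],
    bias: &[f32],
) -> Vec<f32> {
    // Concatenate amplitude and phase features into one vector.
    let fused: Vec<f32> = amp.iter().chain(phase.iter()).copied().collect();
    let in_dim = fused.len();
    assert_eq!(
        weights.len(),
        bias.len() * in_dim,
        "weights must have shape [out_dim, in_dim]"
    );

    // One dot product per output unit, followed by ReLU.
    bias.iter()
        .enumerate()
        .map(|(o, b)| {
            let row = &weights[o * in_dim..(o + 1) * in_dim];
            let z = row.iter().zip(&fused).map(|(w, x)| w * x).sum::<f32>() + b;
            z.max(0.0)
        })
        .collect()
}

fn main() {
    // Toy sizes: 2 amplitude features, 2 phase features, 2 output units.
    let amp = [1.0_f32, -2.0];
    let phase = [0.5_f32, 0.25];
    let weights = [
        1.0_f32, 1.0, 1.0, 1.0, // unit 0: sums all inputs
        1.0, 0.0, 2.0, 0.0,     // unit 1: amp[0] + 2 * phase[0]
    ];
    let bias = [0.0_f32, 0.5];
    let out = fuse_concat_linear_relu(&amp, &phase, &weights, &bias);
    println!("{out:?}");
}
```

Every input feature contributes to every output unit here, with no mechanism for dropping uninformative antenna paths.
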

**After:** Treat the `n_ant = T * n_tx * n_rx` antenna×time paths as `seq_len` tokens and the `n_sc` subcarriers as feature dimension `d`. Apply `attn_mincut` to gate irrelevant antenna-pair correlations:

```rust
// In ModalityTranslator::forward_t:
// amp/ph tensors: [B, n_ant, n_sc] → convert to Vec<f32>
// Apply attn_mincut with seq_len=n_ant, d=n_sc, lambda=0.3
// → attended output [B, n_ant, n_sc] → flatten → FC layers
```

**Benefit:** Automatic antenna-path selection without explicit learned masks; min-cut gating derives the mask from a graph-cut criterion rather than from learned gate weights.

### 3. `ruvector-temporal-tensor` → `dataset.rs` (CSI temporal compression)

**Before:** Raw CSI windows stored as full f32 `Array4<f32>` in memory.

**After:** `CompressedCsiBuffer` struct backed by `TemporalTensorCompressor`. Tiered quantization based on frame access recency:

- Hot frames (last 10): 8-bit quantization (≈4× smaller than f32, closest to f32 fidelity)
- Warm frames (11–50): 5/7-bit quantization
- Cold frames (>50): 3-bit (10.67× smaller)

Encode on `push_frame`, decode on `get(idx)` for transparent access.

**Benefit:** 50–75% memory reduction for the default 100-frame temporal window; allows 2–4× larger batch sizes on constrained hardware.

### 4. `ruvector-solver` → `subcarrier.rs` (sparse subcarrier interpolation)

**Before:** Linear interpolation across subcarriers using precomputed (i0, i1, frac) tuples.

**After:** `NeumannSolver` for sparse regularized least-squares subcarrier interpolation. The CSI spectrum is modeled as a sparse combination of Fourier basis functions (physically motivated by multipath propagation):

```rust
// A = sparse basis matrix [src_sc, target_sc] (Gaussian or sinc basis)
// b = source CSI values [src_sc]
// Solve: A·x ≈ b via NeumannSolver(tolerance=1e-5, max_iter=500)
// x = interpolated values at target subcarrier positions [target_sc]
```

**Benefit:** O(√n) vs O(n) for n=114 source subcarriers; more accurate at subcarrier boundaries than linear interpolation.

### 5. `ruvector-attention` → `model.rs` (spatial decoder)

**Before:** Standard ConvTranspose2D upsampling in `KeypointHead` and `DensePoseHead`.

**After:** `ScaledDotProductAttention` applied to spatial feature nodes. Each of the H×W spatial locations becomes a token; attention captures long-range spatial dependencies between antenna footprint regions:

```rust
// feature map: [B, C, H, W] → reshape and permute to [B, H*W, C]
// For each batch: compute attention among H*W spatial nodes
// → reshape back to [B, C, H, W]
```

**Benefit:** Captures long-range spatial dependencies missed by local convolutions; important for multi-person scenarios.

---

## Implementation Plan

### Files modified

| File | Change |
|------|--------|
| `Cargo.toml` (workspace + crate) | Add `ruvector-mincut`, `ruvector-attn-mincut`, `ruvector-temporal-tensor`, `ruvector-solver`, `ruvector-attention` at version `"2.0.4"` |
| `metrics.rs` | Add `DynamicPersonMatcher` wrapping `ruvector_mincut::DynamicMinCut`; keep `hungarian_assignment` for deterministic proof |
| `model.rs` | Add `attn_mincut` bridge in `ModalityTranslator::forward_t`; add `ScaledDotProductAttention` in spatial heads |
| `dataset.rs` | Add `CompressedCsiBuffer` backed by `TemporalTensorCompressor`; `MmFiDataset` uses it |
| `subcarrier.rs` | Add `interpolate_subcarriers_sparse` using `NeumannSolver`; keep `interpolate_subcarriers` as fallback |

### Files unchanged

No changes are needed in `config.rs`, `losses.rs`, `trainer.rs`, `proof.rs`, or `error.rs`.

### Feature gating

All ruvector integrations are **always-on** (not feature-gated). The ruvector crates are pure Rust with no C FFI, so they add no platform constraints.
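
As a rough cross-check of the memory figures quoted for `CompressedCsiBuffer`, the sketch below computes the ideal compression ratio for the default 100-frame window from the tier sizes and bit widths given in this ADR (10 hot frames at 8 bits, 40 warm frames at 5 or 7 bits, 50 cold frames at 3 bits). The `Tier` struct and the helper functions are hypothetical names introduced for illustration and are not part of `ruvector-temporal-tensor`. The calculation ignores segment headers, quantization scales, and other metadata, so it is an upper bound rather than a measured result.

```rust
/// One access tier: number of frames in it and bits stored per f32 element.
/// Hypothetical helper type used only for this estimate.
struct Tier {
    frames: u64,
    bits_per_element: u64,
}

/// Ideal compression ratio relative to raw f32 storage (32 bits per element).
/// Ignores segment headers, quantization scales, and other metadata.
fn ideal_compression_ratio(tiers: &[Tier]) -> f64 {
    let raw_bits: u64 = tiers.iter().map(|t| t.frames * 32).sum();
    let packed_bits: u64 = tiers.iter().map(|t| t.frames * t.bits_per_element).sum();
    raw_bits as f64 / packed_bits as f64
}

/// Convert a compression ratio into a memory reduction percentage.
fn memory_reduction_percent(ratio: f64) -> f64 {
    (1.0 - 1.0 / ratio) * 100.0
}

/// Tier split taken from the ADR text: 10 hot, 40 warm, 50 cold frames.
fn default_window(warm_bits: u64) -> [Tier; 3] {
    [
        Tier { frames: 10, bits_per_element: 8 },
        Tier { frames: 40, bits_per_element: warm_bits },
        Tier { frames: 50, bits_per_element: 3 },
    ]
}

fn main() {
    for warm_bits in [7, 5] {
        let ratio = ideal_compression_ratio(&default_window(warm_bits));
        println!(
            "warm = {warm_bits}-bit: ratio = {ratio:.2}x, reduction = {:.1}%",
            memory_reduction_percent(ratio)
        );
    }
}
```

Under these assumptions the bound works out to roughly 6.3× with 7-bit warm frames and 7.4× with 5-bit warm frames. Actual savings should be measured with `segment::compression_ratio` on real segments.
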

---

## Implementation Status

| Phase | Status |
|-------|--------|
| Cargo.toml (workspace + crate) | **Complete** |
| ADR-016 documentation | **Complete** |
| ruvector-mincut in metrics.rs | **Complete** |
| ruvector-attn-mincut in model.rs | **Complete** |
| ruvector-temporal-tensor in dataset.rs | **Complete** |
| ruvector-solver in subcarrier.rs | **Complete** |
| ruvector-attention in model.rs spatial decoder | **Complete** |

---

## Consequences

**Positive:**

- Incremental dynamic min-cut updates for multi-person tracking: O(n^{1.5} log n) amortized versus an O(n³) rebuild per frame
- Min-cut gated attention is physically motivated for CSI antenna arrays
- 50–75% memory reduction from temporal quantization
- Sparse least-squares interpolation is physically motivated, unlike plain linear interpolation
- All ruvector crates are pure Rust (no C FFI, no platform restrictions)

**Negative:**

- Additional compile-time dependencies (ruvector crates)
- `attn_mincut` requires tensor↔`Vec<f32>` conversion overhead per batch element
- `TemporalTensorCompressor` adds compression/decompression latency on dataset load
- `NeumannSolver` requires diagonally dominant matrices; a sparse Tikhonov regularization term (λI) is added to ensure convergence

## References

- ADR-015: Public Dataset Training Strategy
- ADR-014: SOTA Signal Processing Algorithms
- ruvector source (crates at v2.0.4): https://github.com/ruvnet/ruvector
- ruvector-mincut: https://crates.io/crates/ruvector-mincut
- ruvector-attn-mincut: https://crates.io/crates/ruvector-attn-mincut
- ruvector-temporal-tensor: https://crates.io/crates/ruvector-temporal-tensor
- ruvector-solver: https://crates.io/crates/ruvector-solver
- ruvector-attention: https://crates.io/crates/ruvector-attention