# ADR-023: Trained DensePose Model with RuVector Signal Intelligence Pipeline

| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-02-28 |
| **Deciders** | ruv |
| **Relates to** | ADR-003 (RVF Cognitive Containers), ADR-005 (SONA Self-Learning), ADR-015 (Public Dataset Strategy), ADR-016 (RuVector Integration), ADR-017 (RuVector-Signal-MAT), ADR-020 (Rust AI Migration), ADR-021 (Vital Sign Detection) |

## Context

### The Gap Between Sensing and DensePose

The WiFi-DensePose system currently operates in two distinct modes:

1. **WiFi CSI sensing** (working): ESP32 streams CSI frames → Rust aggregator → feature extraction → presence/motion classification. 41 tests passing, verified at ~20 Hz with real hardware.
2. **Heuristic pose derivation** (working but approximate): The Rust sensing server generates 17 COCO keypoints from WiFi signal properties using hand-crafted rules (`derive_pose_from_sensing()` in `sensing-server/src/main.rs`). This is not a trained model — keypoint positions are derived from signal amplitude, phase variance, and motion metrics rather than learned from labeled data.

Neither mode produces **DensePose-quality** body surface estimation. The CMU "DensePose From WiFi" paper (arXiv:2301.00250) demonstrated that a neural network trained on paired WiFi CSI + camera pose data can produce dense body surface UV coordinates from WiFi alone. However, that approach requires:

- **Environment-specific training**: The model must be trained or fine-tuned for each deployment environment because CSI multipath patterns are environment-dependent.
- **Paired training data**: Simultaneous WiFi CSI captures + ground-truth pose annotations (or a camera-based teacher model generating pseudo-labels).
- **Substantial compute**: Training a modality translation network + DensePose head requires GPU time (hours to days depending on dataset size).
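To make the contrast with a trained model concrete, the heuristic mode can be illustrated with a simplified sketch. This is a hypothetical reduction, not the actual `derive_pose_from_sensing()` implementation: keypoints are placed around a body center whose position, spread, and confidence are modulated directly by signal statistics rather than learned.

```rust
/// Hypothetical simplification of heuristic pose derivation. This is NOT the
/// real `derive_pose_from_sensing()`; it only illustrates that keypoints come
/// from signal statistics, not from labeled training data.
/// Returns 17 COCO keypoints as (x, y, confidence) in normalized [0,1] coords.
fn derive_pose_heuristic(amplitude_mean: f32, phase_variance: f32, motion: f32) -> Vec<(f32, f32, f32)> {
    // Body center: stronger amplitude suggests the subject is nearer the RX.
    let cy = 0.3 + 0.4 * amplitude_mean.clamp(0.0, 1.0);
    let cx = 0.5;
    // Spread: more motion widens limb placement.
    let spread = 0.05 + 0.10 * motion.clamp(0.0, 1.0);
    // Confidence degrades with phase noise.
    let conf = (1.0 - phase_variance).clamp(0.1, 1.0);
    // Vertical offsets for the 17 COCO keypoints (nose, eyes, ears, shoulders,
    // elbows, wrists, hips, knees, ankles), roughly top to bottom.
    let y_off = [-0.30, -0.32, -0.32, -0.31, -0.31, -0.20, -0.20, -0.05, -0.05,
                 0.10, 0.10, 0.05, 0.05, 0.25, 0.25, 0.45, 0.45];
    // Horizontal placement: sign encodes left/right, magnitude scales spread.
    let x_sign = [0.0, -0.3, 0.3, -0.6, 0.6, -1.0, 1.0, -1.3, 1.3,
                  -1.4, 1.4, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5];
    (0..17)
        .map(|i| (cx + x_sign[i] * spread, cy + y_off[i], conf))
        .collect()
}
```

Because every output is a fixed function of three scalar metrics, such a heuristic cannot distinguish body shapes or articulated poses, which is exactly the gap the trained model closes.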
### What Exists in the Codebase

The Rust workspace already has the complete model architecture ready for training:

| Component | Crate | File | Status |
|-----------|-------|------|--------|
| `WiFiDensePoseModel` | `wifi-densepose-train` | `model.rs` | Implemented (random weights) |
| `ModalityTranslator` | `wifi-densepose-train` | `model.rs` | Implemented with RuVector attention |
| `KeypointHead` | `wifi-densepose-train` | `model.rs` | Implemented (17 COCO heatmaps) |
| `DensePoseHead` | `wifi-densepose-nn` | `densepose.rs` | Implemented (25 parts + 48 UV) |
| `WiFiDensePoseLoss` | `wifi-densepose-train` | `losses.rs` | Implemented (keypoint + part + UV + transfer) |
| `MmFiDataset` loader | `wifi-densepose-train` | `dataset.rs` | Planned (ADR-015) |
| `WiFiDensePosePipeline` | `wifi-densepose-nn` | `inference.rs` | Implemented (generic over Backend) |
| Training proof verification | `wifi-densepose-train` | `proof.rs` | Implemented (deterministic hash) |
| Subcarrier resampling (114→56) | `wifi-densepose-train` | `subcarrier.rs` | Planned (ADR-016) |

### RuVector Crates Available

The `vendor/ruvector/` subtree provides 90+ crates.
The following are directly relevant to a trained DensePose pipeline:

**Already integrated (5 crates, ADR-016):**

| Crate | Algorithm | Current Use |
|-------|-----------|-------------|
| `ruvector-mincut` | Subpolynomial dynamic min-cut O(n^{o(1)}) | Multi-person assignment in `metrics.rs` |
| `ruvector-attn-mincut` | Attention-gated min-cut | Noise-suppressed spectrogram in `model.rs` |
| `ruvector-attention` | Scaled dot-product + geometric attention | Spatial decoder in `model.rs` |
| `ruvector-solver` | Sparse Neumann solver O(√n) | Subcarrier resampling in `subcarrier.rs` |
| `ruvector-temporal-tensor` | Tiered temporal compression | CSI frame buffering in `dataset.rs` |

**Newly proposed for DensePose pipeline (6 additional crates):**

| Crate | Description | Proposed Use |
|-------|-------------|--------------|
| `ruvector-gnn` | Graph neural network on HNSW topology | Spatial body-graph reasoning |
| `ruvector-graph-transformer` | Proof-gated graph transformer (8 modules) | CSI-to-pose cross-attention |
| `ruvector-sparse-inference` | PowerInfer-style sparse inference engine | Edge deployment with neuron activation sparsity |
| `ruvector-sona` | Self-Optimizing Neural Architecture (LoRA + EWC++) | Online environment adaptation |
| `ruvector-fpga-transformer` | FPGA-optimized transformer | Hardware-accelerated inference path |
| `ruvector-math` | Optimal transport, information geometry | Domain adaptation loss functions |

### RVF Container Format

The RuVector Format (RVF) is a segment-based binary container format designed to package intelligence artifacts — embeddings, HNSW indexes, quantized weights, WASM runtimes, witness proofs, and metadata — into a single self-contained file.
Key properties:

- **64-byte segment headers** (`SegmentHeader`, magic `0x52564653` "RVFS") with type discriminator, content hash, compression, and timestamp
- **Progressive loading**: Layer A (entry points, <5ms) → Layer B (hot adjacency, 100ms–1s) → Layer C (full graph, seconds)
- **20+ segment types**: `Vec` (embeddings), `Index` (HNSW), `Overlay` (min-cut witnesses), `Quant` (codebooks), `Witness` (proof-of-computation), `Wasm` (self-bootstrapping runtime), `Dashboard` (embedded UI), `AggregateWeights` (federated SONA deltas), `Crypto` (Ed25519 signatures), and more
- **Temperature-tiered quantization** (`rvf-quant`): f32 / f16 / u8 / binary per-segment, with SIMD-accelerated distance computation
- **AGI Cognitive Container** (`agi_container.rs`): packages kernel + WASM + world model + orchestrator + evaluation harness + witness chains into a single deployable file

The trained DensePose model will be packaged as an `.rvf` container, making it a single self-contained artifact that includes model weights, HNSW-indexed embedding tables, min-cut graph overlays, quantization codebooks, SONA adaptation deltas, and the WASM inference runtime — deployable to any host without external dependencies.

## Decision

Implement a fully trained DensePose model using RuVector signal intelligence as the backbone signal processing layer, packaged in the RVF container format. The pipeline has three stages: (1) offline training on public datasets, (2) teacher-student distillation for DensePose UV labels, and (3) online SONA adaptation for environment-specific fine-tuning. The trained model, its embeddings, indexes, and adaptation state are serialized into a single `.rvf` file.
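The 64-byte segment header described above can be sketched in a few lines. Only the `0x52564653` ("RVFS") magic is documented here; the field offsets and the `SegmentHeaderSketch` struct below are illustrative assumptions, not the actual `SegmentHeader` layout from the RVF crates.

```rust
/// Sketch of parsing a 64-byte RVF segment header. Field offsets beyond the
/// magic are assumptions for illustration, not the real wire format.
const RVF_MAGIC: u32 = 0x5256_4653; // "RVFS"

#[derive(Debug, PartialEq)]
struct SegmentHeaderSketch {
    seg_type: u8,           // type discriminator (e.g. 0x01 = Vec, 0x02 = Index)
    compression: u8,        // compression scheme id
    timestamp: u64,         // creation time (seconds, assumed little-endian)
    content_hash: [u8; 32], // hash of the segment payload
}

fn parse_segment_header(buf: &[u8; 64]) -> Option<SegmentHeaderSketch> {
    let magic = u32::from_le_bytes(buf[0..4].try_into().ok()?);
    if magic != RVF_MAGIC {
        return None; // not an RVF segment
    }
    let mut content_hash = [0u8; 32];
    content_hash.copy_from_slice(&buf[16..48]);
    Some(SegmentHeaderSketch {
        seg_type: buf[4],
        compression: buf[5],
        timestamp: u64::from_le_bytes(buf[8..16].try_into().ok()?),
        content_hash,
    })
}
```

A fixed-size header with an up-front magic and type discriminator is what makes the progressive-loading sequence possible: a reader can walk the segment directory and skip segments it does not need without parsing their payloads.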
### Architecture Overview ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ TRAINED DENSEPOSE PIPELINE │ │ │ │ ┌─────────────┐ ┌──────────────────────┐ ┌──────────────────────┐ │ │ │ ESP32 CSI │ │ RuVector Signal │ │ Trained Neural │ │ │ │ Raw I/Q │───▶│ Intelligence Layer │───▶│ Network │ │ │ │ [ant×sub×T] │ │ (preprocessing) │ │ (inference) │ │ │ └─────────────┘ └──────────────────────┘ └──────────────────────┘ │ │ │ │ │ │ ┌─────────┴─────────┐ ┌────────┴────────┐ │ │ │ 5 RuVector crates │ │ 6 RuVector │ │ │ │ (signal processing)│ │ crates (neural) │ │ │ └───────────────────┘ └─────────────────┘ │ │ │ │ │ ┌──────────────────────────┘ │ │ ▼ │ │ ┌──────────────────────────────────────┐ │ │ │ Outputs │ │ │ │ • 17 COCO keypoints [B,17,H,W] │ │ │ │ • 25 body parts [B,25,H,W] │ │ │ │ • 48 UV coords [B,48,H,W] │ │ │ │ • Confidence scores │ │ │ └──────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### Stage 1: RuVector Signal Preprocessing Layer Raw CSI frames from ESP32 (56–192 subcarriers × N antennas × T time frames) are processed through the RuVector signal intelligence stack before entering the neural network. This replaces hand-crafted feature extraction with learned, graph-aware preprocessing. ``` Raw CSI [ant, sub, T] │ ▼ ┌─────────────────────────────────────────────────────┐ │ 1. ruvector-attn-mincut: gate_spectrogram() │ │ Input: Q=amplitude, K=phase, V=combined │ │ Effect: Suppress multipath noise, keep motion- │ │ relevant subcarrier paths │ │ Output: Gated spectrogram [ant, sub', T] │ ├─────────────────────────────────────────────────────┤ │ 2. ruvector-mincut: mincut_subcarrier_partition() │ │ Input: Subcarrier coherence graph │ │ Effect: Partition into sensitive (motion- │ │ responsive) vs insensitive (static) │ │ Output: Partition mask + per-subcarrier weights │ ├─────────────────────────────────────────────────────┤ │ 3. 
ruvector-attention: attention_weighted_bvp() │ │ Input: Gated spectrogram + partition weights │ │ Effect: Compute body velocity profile with │ │ sensitivity-weighted attention │ │ Output: BVP feature vector [D_bvp] │ ├─────────────────────────────────────────────────────┤ │ 4. ruvector-solver: solve_fresnel_geometry() │ │ Input: Amplitude + known TX/RX positions │ │ Effect: Estimate TX-body-RX ellipsoid distances │ │ Output: Fresnel geometry features [D_fresnel] │ ├─────────────────────────────────────────────────────┤ │ 5. ruvector-temporal-tensor: compress + buffer │ │ Input: Temporal CSI window (100 frames) │ │ Effect: Tiered quantization (hot/warm/cold) │ │ Output: Compressed tensor, 50-75% memory saving │ └─────────────────────────────────────────────────────┘ │ ▼ Feature tensor [B, T*tx*rx, sub] (preprocessed, noise-suppressed) ``` ### Stage 2: Neural Network Architecture The neural network follows the CMU teacher-student architecture with RuVector enhancements at three critical points. #### 2a. ModalityTranslator (CSI → Visual Feature Space) ``` CSI features [B, T*tx*rx, sub] │ ├──amplitude──┐ │ ├─► Encoder (Conv1D stack, 64→128→256) └──phase──────┘ │ ▼ ┌──────────────────────────────┐ │ ruvector-graph-transformer │ │ │ │ Treat antenna-pair×time as │ │ graph nodes. Edges connect │ │ spatially adjacent antenna │ │ pairs and temporally │ │ adjacent frames. │ │ │ │ Proof-gated attention: │ │ Each layer verifies that │ │ attention weights satisfy │ │ physical constraints │ │ (Fresnel ellipsoid bounds) │ └──────────────────────────────┘ │ ▼ Decoder (ConvTranspose2d stack, 256→128→64→3) │ ▼ Visual features [B, 3, 48, 48] ``` **RuVector enhancement**: Replace standard multi-head self-attention in the bottleneck with `ruvector-graph-transformer`. The graph structure encodes the physical antenna topology — nodes that are closer in space (adjacent ESP32 nodes in the mesh) or time (consecutive frames) have stronger edge weights. 
This injects domain-specific inductive bias that standard attention lacks. #### 2b. GNN Body Graph Reasoning ``` Visual features [B, 3, 48, 48] │ ▼ ResNet18 backbone → feature maps [B, 256, 12, 12] │ ▼ ┌─────────────────────────────────────────┐ │ ruvector-gnn: Body Graph Network │ │ │ │ 17 COCO keypoints as graph nodes │ │ Edges: anatomical connections │ │ (shoulder→elbow, hip→knee, etc.) │ │ │ │ GNN message passing (3 rounds): │ │ h_i^{l+1} = σ(W·h_i^l + Σ_j α_ij·h_j)│ │ α_ij = attention(h_i, h_j, edge_ij) │ │ │ │ Enforces anatomical constraints: │ │ - Limb length ratios │ │ - Joint angle limits │ │ - Left-right symmetry priors │ └─────────────────────────────────────────┘ │ ├──────────────────┬──────────────────┐ ▼ ▼ ▼ KeypointHead DensePoseHead ConfidenceHead [B,17,H,W] [B,25+48,H,W] [B,1] heatmaps parts + UV quality score ``` **RuVector enhancement**: `ruvector-gnn` replaces the flat spatial decoder with a graph neural network that operates on the human body graph. WiFi CSI is inherently noisy — GNN message passing between anatomically connected joints enforces that predicted keypoints maintain plausible body structure even when individual joint predictions are uncertain. #### 2c. Sparse Inference for Edge Deployment ``` Trained model weights (full precision) │ ▼ ┌─────────────────────────────────────────────┐ │ ruvector-sparse-inference │ │ │ │ PowerInfer-style activation sparsity: │ │ - Profile neuron activation frequency │ │ - Partition into hot (always active, 20%) │ │ and cold (conditionally active, 80%) │ │ - Hot neurons: GPU/SIMD fast path │ │ - Cold neurons: sparse lookup on demand │ │ │ │ Quantization: │ │ - Backbone: INT8 (4x memory reduction) │ │ - DensePose head: FP16 (2x reduction) │ │ - ModalityTranslator: FP16 │ │ │ │ Target: <50ms inference on ESP32-S3 │ │ <10ms on x86 with AVX2 │ └─────────────────────────────────────────────┘ ``` ### Stage 3: Training Pipeline #### 3a. 
Dataset Loading and Preprocessing Primary dataset: **MM-Fi** (NeurIPS 2023) — 40 subjects, 27 actions, 114 subcarriers, 3 RX antennas, 17 COCO keypoints + DensePose UV annotations. Secondary dataset: **Wi-Pose** — 12 subjects, 12 actions, 30 subcarriers, 3×3 antenna array, 18 keypoints. ``` ┌──────────────────────────────────────────────────────────┐ │ Data Loading Pipeline │ │ │ │ MM-Fi .npy ──► Resample 114→56 subcarriers ──┐ │ │ (ruvector-solver NeumannSolver) │ │ │ ├──► Batch│ │ Wi-Pose .mat ──► Zero-pad 30→56 subcarriers ──┘ [B,T*│ │ ant, │ │ Phase sanitize ──► Hampel filter ──► unwrap sub] │ │ (wifi-densepose-signal::phase_sanitizer) │ │ │ │ Temporal buffer ──► ruvector-temporal-tensor │ │ (100 frames/sample, tiered quantization) │ └──────────────────────────────────────────────────────────┘ ``` #### 3b. Teacher-Student DensePose Labels For samples with 3D keypoints but no DensePose UV maps: 1. Run Detectron2 DensePose R-CNN on paired RGB frames (one-time preprocessing step on GPU workstation) 2. Generate `(part_labels [H,W], u_coords [H,W], v_coords [H,W])` pseudo-labels 3. Cache as `.npy` alongside original data 4. Teacher model is discarded after label generation — inference uses WiFi only #### 3c. Loss Function ```rust L_total = λ_kp · L_keypoint // MSE on predicted vs GT heatmaps + λ_part · L_part // Cross-entropy on 25-class body part segmentation + λ_uv · L_uv // Smooth L1 on UV coordinate regression + λ_xfer · L_transfer // MSE between CSI features and teacher visual features + λ_ot · L_ot // Optimal transport regularization (ruvector-math) + λ_graph · L_graph // GNN edge consistency loss (ruvector-gnn) ``` **RuVector enhancement**: `ruvector-math` provides optimal transport (Wasserstein distance) as a regularization term. This penalizes predicted body part distributions that are far from the ground truth in the Wasserstein metric, which is more geometrically meaningful than pixel-wise cross-entropy for spatial body part segmentation. #### 3d. 
Training Configuration

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Optimizer | AdamW | Weight decay regularization |
| Learning rate | 1e-3, cosine decay to 1e-5 | Standard for modality translation |
| Batch size | 32 | Fits in 24GB GPU VRAM |
| Epochs | 100 | With early stopping (patience=15) |
| Warmup | 5 epochs | Linear LR warmup |
| Train/val split | Subjects 1-32 / 33-40 | Subject-disjoint for generalization |
| Augmentation | Time-shift ±5 frames, amplitude noise ±2dB, antenna dropout 10% | CSI-domain augmentations |
| Hardware | Single RTX 3090 or A100 | ~8 hours on A100 |
| Checkpoint | Every epoch, keep best-by-validation-PCK | Deterministic seed |

#### 3e. Metrics

| Metric | Target | Description |
|--------|--------|-------------|
| PCK@0.2 | >70% on MM-Fi val | Percentage of correct keypoints (threshold = 0.2 × torso diameter) |
| OKS mAP | >0.50 on MM-Fi val | Object Keypoint Similarity, COCO-standard |
| DensePose GPS | >0.30 on MM-Fi val | Geodesic Point Similarity for UV accuracy |
| Inference latency | <50ms per frame | On x86 with ONNX Runtime |
| Model size | <25MB (FP16) | Suitable for edge deployment |

### Stage 4: Online Adaptation with SONA

After offline training produces a base model, SONA enables continuous adaptation to new environments without retraining from scratch.
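The core adaptation math is small enough to sketch directly. A minimal std-only Rust sketch, assuming row-major (out × in) weight matrices; the actual `ruvector-sona` API is not shown here:

```rust
/// LoRA merge: W_effective = W + α · A·B, where A is (out × r), B is (r × in),
/// and rank r is much smaller than either dimension. All matrices row-major.
fn lora_merge(w: &[f32], a: &[f32], b: &[f32], out: usize, inp: usize, r: usize, alpha: f32) -> Vec<f32> {
    let mut w_eff = w.to_vec();
    for i in 0..out {
        for j in 0..inp {
            // Low-rank delta: (A·B)[i][j] = Σ_k A[i][k] · B[k][j]
            let mut delta = 0.0;
            for k in 0..r {
                delta += a[i * r + k] * b[k * inp + j];
            }
            w_eff[i * inp + j] += alpha * delta;
        }
    }
    w_eff
}

/// EWC++ regularizer: L_reg = λ · Σ_i F_i (θ_i − θ*_i)², where F is the
/// Fisher information diagonal and θ* the base-model reference parameters.
/// Large F_i marks parameters important to the base task, so drifting them
/// during adaptation is penalized more heavily.
fn ewc_penalty(theta: &[f32], theta_star: &[f32], fisher: &[f32], lambda: f32) -> f32 {
    lambda
        * theta
            .iter()
            .zip(theta_star)
            .zip(fisher)
            .map(|((t, ts), f)| f * (t - ts) * (t - ts))
            .sum::<f32>()
}
```

Because only the A and B matrices are trainable (~50K parameters against ~5M frozen), each adaptation step is cheap enough to run on CPU, which is what makes the <5-second convergence budget plausible.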
``` ┌──────────────────────────────────────────────────────────┐ │ SONA Online Adaptation Loop │ │ │ │ Base model (frozen weights W) │ │ │ │ │ ▼ │ │ ┌──────────────────────────────────┐ │ │ │ LoRA Adaptation Matrices │ │ │ │ W_effective = W + α · A·B │ │ │ │ │ │ │ │ Rank r=4 for translator layers │ │ │ │ Rank r=2 for backbone layers │ │ │ │ Rank r=8 for DensePose head │ │ │ │ │ │ │ │ Total trainable params: ~50K │ │ │ │ (vs ~5M frozen base) │ │ │ └──────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌──────────────────────────────────┐ │ │ │ EWC++ Regularizer │ │ │ │ L = L_task + λ·Σ F_i(θ-θ*)² │ │ │ │ │ │ │ │ Prevents forgetting base model │ │ │ │ knowledge when adapting to new │ │ │ │ environment │ │ │ └──────────────────────────────────┘ │ │ │ │ │ ▼ │ │ Adaptation triggers: │ │ • First deployment in new room │ │ • PCK drops below threshold (drift detection) │ │ • User manually initiates calibration │ │ • Furniture/layout change detected (CSI baseline shift) │ │ │ │ Adaptation data: │ │ • Self-supervised: temporal consistency loss │ │ (pose at t should be similar to t-1 for slow motion) │ │ • Semi-supervised: user confirmation of presence/count │ │ • Optional: brief camera calibration session (5 min) │ │ │ │ Convergence: 10-50 gradient steps, <5 seconds on CPU │ └──────────────────────────────────────────────────────────┘ ``` ### Stage 5: Inference Pipeline (Production) ``` ESP32 CSI (UDP :5005) │ ▼ Rust Axum server (port 8080) │ ├─► RuVector signal preprocessing (Stage 1) │ 5 crates, ~2ms per frame │ ├─► ONNX Runtime inference (Stage 2) │ Quantized model, ~10ms per frame │ OR ruvector-sparse-inference, ~8ms per frame │ ├─► GNN post-processing (ruvector-gnn) │ Anatomical constraint enforcement, ~1ms │ ├─► SONA adaptation check (Stage 4) │ <0.05ms per frame (gradient accumulation only) │ └─► Output: DensePose results │ ├──► /api/v1/stream/pose (WebSocket, 17 keypoints) ├──► /api/v1/pose/current (REST, full DensePose) └──► /ws/sensing (WebSocket, raw + processed) 
``` Total inference budget: **<15ms per frame** at 20 Hz on x86, **<50ms** on ESP32-S3 (with sparse inference). ### Stage 6: RVF Model Container Format The trained model is packaged as a single `.rvf` file that contains everything needed for inference — no external weight files, no ONNX runtime, no Python dependencies. #### RVF DensePose Container Layout ``` wifi-densepose-v1.rvf (single file, ~15-30 MB) ┌───────────────────────────────────────────────────────────────┐ │ SEGMENT 0: Manifest (0x05) │ │ ├── Model ID: "wifi-densepose-v1.0" │ │ ├── Training dataset: "mmfi-v1+wipose-v1" │ │ ├── Training config hash: SHA-256 │ │ ├── Target hardware: x86_64, aarch64, wasm32 │ │ ├── Segment directory (offsets to all segments) │ │ └── Level-1 TLV manifest with metadata tags │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 1: Vec (0x01) — Model Weight Embeddings │ │ ├── ModalityTranslator weights [64→128→256→3, Conv1D+ConvT] │ │ ├── ResNet18 backbone weights [3→64→128→256, residual blocks] │ │ ├── KeypointHead weights [256→17, deconv layers] │ │ ├── DensePoseHead weights [256→25+48, deconv layers] │ │ ├── GNN body graph weights [3 message-passing rounds] │ │ └── Graph transformer attention weights [proof-gated layers] │ │ Format: flat f32 vectors, 768-dim per weight tensor │ │ Total: ~5M parameters → ~20MB f32, ~10MB f16, ~5MB INT8 │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 2: Index (0x02) — HNSW Embedding Index │ │ ├── Layer A: Entry points + coarse routing centroids │ │ │ (loaded first, <5ms, enables approximate search) │ │ ├── Layer B: Hot region adjacency for frequently │ │ │ accessed weight clusters (100ms load) │ │ └── Layer C: Full adjacency graph for exact nearest │ │ neighbor lookup across all weight partitions │ │ Use: Fast weight lookup for sparse inference — │ │ only load hot neurons, skip cold neurons via HNSW routing │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 3: 
Overlay (0x03) — Dynamic Min-Cut Graph │ │ ├── Subcarrier partition graph (sensitive vs insensitive) │ │ ├── Min-cut witnesses from ruvector-mincut │ │ ├── Antenna topology graph (ESP32 mesh spatial layout) │ │ └── Body skeleton graph (17 COCO joints, 16 edges) │ │ Use: Pre-computed graph structures loaded at init time. │ │ Dynamic updates via ruvector-mincut insert/delete_edge │ │ as environment changes (furniture moves, new obstacles) │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 4: Quant (0x06) — Quantization Codebooks │ │ ├── INT8 codebook for backbone (4x memory reduction) │ │ ├── FP16 scale factors for translator + heads │ │ ├── Binary quantization tables for SIMD distance compute │ │ └── Per-layer calibration statistics (min, max, zero-point) │ │ Use: rvf-quant temperature-tiered quantization — │ │ hot layers stay f16, warm layers u8, cold layers binary │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 5: Witness (0x0A) — Training Proof Chain │ │ ├── Deterministic training proof (seed, loss curve, hash) │ │ ├── Dataset provenance (MM-Fi commit hash, download URL) │ │ ├── Validation metrics (PCK@0.2, OKS mAP, GPS scores) │ │ ├── Ed25519 signature over weight hash │ │ └── Attestation: training hardware, duration, config │ │ Use: Verifiable proof that model weights match a specific │ │ training run. Anyone can re-run training with same seed │ │ and verify the weight hash matches the witness. 
│ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 6: Meta (0x07) — Model Metadata │ │ ├── COCO keypoint names and skeleton connectivity │ │ ├── DensePose body part labels (24 parts + background) │ │ ├── UV coordinate range and resolution │ │ ├── Input normalization statistics (mean, std per subcarrier)│ │ ├── RuVector crate versions used during training │ │ └── Environment calibration profiles (named, per-room) │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 7: AggregateWeights (0x36) — SONA LoRA Deltas │ │ ├── Per-environment LoRA adaptation matrices (A, B per layer)│ │ ├── EWC++ Fisher information diagonal │ │ ├── Optimal θ* reference parameters │ │ ├── Adaptation round count and convergence metrics │ │ └── Named profiles: "lab-a", "living-room", "office-3f" │ │ Use: Multiple environment adaptations stored in one file. │ │ Server loads the matching profile or creates a new one. │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 8: Profile (0x0B) — RVDNA Domain Profile │ │ ├── Domain: "wifi-csi-densepose" │ │ ├── Input spec: [B, T*ant, sub] CSI tensor format │ │ ├── Output spec: keypoints [B,17,H,W], parts [B,25,H,W], │ │ │ UV [B,48,H,W], confidence [B,1] │ │ ├── Hardware requirements: min RAM, recommended GPU │ │ └── Supported data sources: esp32, wifi-rssi, simulation │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 9: Crypto (0x0C) — Signature and Keys │ │ ├── Ed25519 public key for model publisher │ │ ├── Signature over all segment content hashes │ │ └── Certificate chain (optional, for enterprise deployment) │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 10: Wasm (0x10) — Self-Bootstrapping Runtime │ │ ├── Compiled WASM inference engine │ │ │ (ruvector-sparse-inference-wasm) │ │ ├── WASM microkernel for RVF segment parsing │ │ └── Browser-compatible: load .rvf → run inference in-browser │ │ Use: The .rvf file is 
fully self-contained — a WASM host │ │ can execute inference without any external dependencies. │ ├───────────────────────────────────────────────────────────────┤ │ SEGMENT 11: Dashboard (0x11) — Embedded Visualization │ │ ├── Three.js-based pose visualization (HTML/JS/CSS) │ │ ├── Gaussian splat renderer for signal field │ │ └── Served at http://localhost:8080/ when model is loaded │ │ Use: Open the .rvf file → get a working UI with no install │ └───────────────────────────────────────────────────────────────┘ ``` #### RVF Loading Sequence ``` 1. Read tail → find_latest_manifest() → SegmentDirectory 2. Load Manifest (seg 0) → validate magic, version, model ID 3. Load Profile (seg 8) → verify input/output spec compatibility 4. Load Crypto (seg 9) → verify Ed25519 signature chain 5. Load Quant (seg 4) → prepare quantization codebooks 6. Load Index Layer A (seg 2) → entry points ready (<5ms) ↓ (inference available at reduced accuracy) 7. Load Vec (seg 1) → hot weight partitions via Layer A routing 8. Load Index Layer B (seg 2) → hot adjacency ready (100ms) ↓ (inference at full accuracy for common poses) 9. Load Overlay (seg 3) → min-cut graphs, body skeleton 10. Load AggregateWeights (seg 7) → apply matching SONA profile 11. Load Index Layer C (seg 2) → complete graph loaded ↓ (full inference with all weight partitions) 12. Load Wasm (seg 10) → WASM runtime available (optional) 13. Load Dashboard (seg 11) → UI served (optional) ``` **Progressive availability**: Inference begins after step 6 (~5ms) with approximate results. Full accuracy is reached by step 9 (~500ms). This enables instant startup with gradually improving quality — critical for real-time applications. 
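The training-proof idea behind the Witness segment reduces to recomputing a deterministic hash over the seed and final weights and comparing it to the recorded value. A minimal sketch, using FNV-1a as a dependency-free stand-in for the real SHA-256 + Ed25519 chain:

```rust
/// FNV-1a 64-bit hash — a stand-in here so the sketch needs no crypto crates.
/// The actual Witness segment uses SHA-256 content hashes and Ed25519 signatures.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Deterministic training-proof hash: seed followed by weight bytes in a
/// fixed order, so the same (seed, training run) always reproduces it.
fn training_proof_hash(seed: u64, weights: &[f32]) -> u64 {
    let mut bytes = seed.to_le_bytes().to_vec();
    for w in weights {
        bytes.extend_from_slice(&w.to_le_bytes());
    }
    fnv1a_64(&bytes)
}

/// A witness is valid when re-deriving the hash from (seed, weights)
/// reproduces the recorded value — anyone re-running training with the same
/// seed and config can independently verify the published weights.
fn verify_witness(recorded: u64, seed: u64, weights: &[f32]) -> bool {
    training_proof_hash(seed, weights) == recorded
}
```

This is the property `verify-rvf` relies on: any single-bit change to the weights, or a different training seed, fails the check.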
#### RVF Build Pipeline After training completes, the model is packaged into an `.rvf` file: ```bash # Build the RVF container from trained checkpoint cargo run -p wifi-densepose-train --bin build-rvf -- \ --checkpoint checkpoints/best-pck.pt \ --quantize int8,fp16 \ --hnsw-build \ --sign --key model-signing-key.pem \ --include-wasm \ --include-dashboard ../../ui \ --output wifi-densepose-v1.rvf # Verify the built container cargo run -p wifi-densepose-train --bin verify-rvf -- \ --input wifi-densepose-v1.rvf \ --verify-signature \ --verify-witness \ --benchmark-inference ``` #### RVF Runtime Integration The sensing server loads the `.rvf` container at startup: ```bash # Load model from RVF container ./target/release/sensing-server \ --model wifi-densepose-v1.rvf \ --source auto \ --ui-from-rvf # serve Dashboard segment instead of --ui-path ``` ```rust // In sensing-server/src/main.rs use rvf_runtime::RvfContainer; use rvf_index::layers::IndexLayer; use rvf_quant::QuantizedVec; let container = RvfContainer::open("wifi-densepose-v1.rvf")?; // Progressive load: Layer A first for instant startup let index = container.load_index(IndexLayer::A)?; let weights = container.load_vec_hot(&index)?; // hot partitions only // Full load in background tokio::spawn(async move { container.load_index(IndexLayer::B).await?; container.load_index(IndexLayer::C).await?; container.load_vec_cold().await?; // remaining partitions }); // SONA environment adaptation let sona_deltas = container.load_aggregate_weights("office-3f")?; model.apply_lora_deltas(&sona_deltas); // Serve embedded dashboard let dashboard = container.load_dashboard()?; // Mount at /ui/* routes in Axum ``` ## Implementation Plan ### Phase 1: Dataset Loaders (2 weeks) - Implement `MmFiDataset` in `wifi-densepose-train/src/dataset.rs` - Read MM-Fi `.npy` files with antenna correction (1TX/3RX → 3×3 zero-padding) - Subcarrier resampling 114→56 via `ruvector-solver::NeumannSolver` - Phase sanitization via 
`wifi-densepose-signal::phase_sanitizer` - Implement `WiPoseDataset` for secondary dataset - Temporal windowing with `ruvector-temporal-tensor` - **Deliverable**: `cargo test -p wifi-densepose-train` with dataset loading tests ### Phase 2: Graph Transformer Integration (2 weeks) - Add `ruvector-graph-transformer` dependency to `wifi-densepose-train` - Replace bottleneck self-attention in `ModalityTranslator` with proof-gated graph transformer - Build antenna topology graph (nodes = antenna pairs, edges = spatial/temporal proximity) - Add `ruvector-gnn` dependency for body graph reasoning - Build COCO body skeleton graph (17 nodes, 16 anatomical edges) - Implement GNN message passing in spatial decoder - **Deliverable**: Model forward pass produces correct output shapes with graph layers ### Phase 3: Teacher-Student Label Generation (1 week) - Python script using Detectron2 DensePose to generate UV pseudo-labels from MM-Fi RGB frames - Cache labels as `.npy` for Rust loader consumption - Validate label quality on a random subset (visual inspection) - **Deliverable**: Complete UV label set for MM-Fi training split ### Phase 4: Training Loop (3 weeks) - Implement `WiFiDensePoseTrainer` with full loss function (6 terms) - Add `ruvector-math` optimal transport loss term - Integrate GNN edge consistency loss - Training loop with cosine LR schedule, early stopping, checkpointing - Validation metrics: PCK@0.2, OKS mAP, DensePose GPS - Deterministic proof verification (`proof.rs`) with weight hash - **Deliverable**: Trained model checkpoint achieving PCK@0.2 >70% on MM-Fi validation ### Phase 5: SONA Online Adaptation (2 weeks) - Integrate `ruvector-sona` into inference pipeline - Implement LoRA injection at translator, backbone, and DensePose head layers - Implement EWC++ Fisher information computation and regularization - Self-supervised temporal consistency loss for unsupervised adaptation - Calibration mode: 5-minute camera session for supervised fine-tuning - Drift 
detection: monitor rolling PCK on temporal consistency proxy - **Deliverable**: Adaptation converges in <50 gradient steps, PCK recovers within 10% of base ### Phase 6: Sparse Inference and Edge Deployment (2 weeks) - Profile neuron activation frequencies on validation set - Apply `ruvector-sparse-inference` hot/cold neuron partitioning - INT8 quantization for backbone, FP16 for heads - ONNX export with quantized weights - Benchmark on x86 (target: <10ms) and ARM (target: <50ms) - WASM export via `ruvector-sparse-inference-wasm` for browser inference - **Deliverable**: Quantized ONNX model, benchmark results, WASM binary ### Phase 7: RVF Container Build Pipeline (2 weeks) - Implement `build-rvf` binary in `wifi-densepose-train` - Serialize trained weights into `Vec` segment (SegmentType::Vec, 0x01) - Build HNSW index over weight partitions for sparse inference (SegmentType::Index, 0x02) - Serialize min-cut graph overlays: subcarrier partition, antenna topology, body skeleton (SegmentType::Overlay, 0x03) - Generate quantization codebooks via `rvf-quant` (SegmentType::Quant, 0x06) - Write training proof witness with Ed25519 signature (SegmentType::Witness, 0x0A) - Store model metadata, COCO keypoint schema, normalization stats (SegmentType::Meta, 0x07) - Store SONA LoRA adaptation deltas per environment (SegmentType::AggregateWeights, 0x36) - Write RVDNA domain profile for WiFi CSI DensePose (SegmentType::Profile, 0x0B) - Optionally embed WASM inference runtime (SegmentType::Wasm, 0x10) - Optionally embed Three.js dashboard (SegmentType::Dashboard, 0x11) - Build Level-1 manifest and segment directory (SegmentType::Manifest, 0x05) - Implement `verify-rvf` binary for container validation - **Deliverable**: `wifi-densepose-v1.rvf` single-file container, verifiable and self-contained ### Phase 8: Integration with Sensing Server (1 week) - Load `.rvf` container in `wifi-densepose-sensing-server` via `rvf-runtime` - Progressive loading: Layer A first for instant startup, 
full graph in background - Replace `derive_pose_from_sensing()` heuristic with trained model inference - Add `--model` CLI flag accepting `.rvf` path (or legacy `.onnx`) - Apply SONA LoRA deltas from `AggregateWeights` segment based on `--env` flag - Serve embedded Dashboard segment at `/ui/*` when `--ui-from-rvf` is set - Graceful fallback to heuristic when no model file present - Update WebSocket protocol to include DensePose UV data - **Deliverable**: Sensing server serves trained model from single `.rvf` file ## File Changes ### New Files | File | Purpose | |------|---------| | `rust-port/.../wifi-densepose-train/src/dataset_mmfi.rs` | MM-Fi dataset loader with subcarrier resampling | | `rust-port/.../wifi-densepose-train/src/dataset_wipose.rs` | Wi-Pose dataset loader | | `rust-port/.../wifi-densepose-train/src/graph_transformer.rs` | Graph transformer integration | | `rust-port/.../wifi-densepose-train/src/body_gnn.rs` | GNN body graph reasoning | | `rust-port/.../wifi-densepose-train/src/adaptation.rs` | SONA LoRA + EWC++ adaptation | | `rust-port/.../wifi-densepose-train/src/trainer.rs` | Training loop with multi-term loss | | `scripts/generate_densepose_labels.py` | Teacher-student UV label generation | | `scripts/benchmark_inference.py` | Inference latency benchmarking | | `rust-port/.../wifi-densepose-train/src/rvf_builder.rs` | RVF container build pipeline | | `rust-port/.../wifi-densepose-train/src/bin/build_rvf.rs` | CLI binary for building `.rvf` containers | | `rust-port/.../wifi-densepose-train/src/bin/verify_rvf.rs` | CLI binary for verifying `.rvf` containers | ### Modified Files | File | Change | |------|--------| | `rust-port/.../wifi-densepose-train/Cargo.toml` | Add ruvector-gnn, graph-transformer, sona, sparse-inference, math, rvf-types, rvf-wire, rvf-manifest, rvf-index, rvf-quant, rvf-crypto, rvf-runtime deps | | `rust-port/.../wifi-densepose-train/src/model.rs` | Integrate graph transformer + GNN layers | | 
| `rust-port/.../wifi-densepose-train/src/losses.rs` | Add optimal transport + GNN edge consistency loss terms |
| `rust-port/.../wifi-densepose-train/src/config.rs` | Add training hyperparameters for new components |
| `rust-port/.../sensing-server/Cargo.toml` | Add rvf-runtime, rvf-types, rvf-index, rvf-quant deps |
| `rust-port/.../sensing-server/src/main.rs` | Add `--model` flag, load `.rvf` container, progressive startup, serve embedded dashboard |

## Consequences

### Positive

- **Trained model produces accurate DensePose**: Moves from heuristic keypoints to learned body surface estimation backed by public dataset evaluation
- **RuVector signal intelligence is a differentiator**: Graph transformers on antenna topology and GNN body reasoning are novel — no prior WiFi pose system uses these techniques
- **SONA enables zero-shot deployment**: New environments don't require full retraining — LoRA adaptation with <50 gradient steps converges in seconds
- **Sparse inference enables edge deployment**: PowerInfer-style neuron partitioning brings DensePose inference to ESP32-class hardware
- **Graceful degradation**: Server falls back to heuristic pose when no model file is present — existing functionality is preserved
- **Single-file deployment via RVF**: Trained model, embeddings, HNSW index, quantization codebooks, SONA adaptation profiles, WASM runtime, and dashboard UI packaged in one `.rvf` file — deploy by copying a single file
- **Progressive loading**: RVF Layer A loads in <5ms for instant startup; full accuracy reached in ~500ms as remaining segments load
- **Verifiable provenance**: RVF Witness segment contains deterministic training proof with Ed25519 signature — anyone can re-run training and verify the weight hash
- **Self-bootstrapping**: RVF Wasm segment enables browser-based inference with no server-side dependencies
- **Open evaluation**: PCK, OKS, GPS metrics on the public MM-Fi dataset provide reproducible, comparable results

### Negative

- **Training
requires GPU**: Initial model training needs an RTX 3090 or better (~8 hours on an A100). Not all developers will have access.
- **Teacher-student label generation requires Detectron2**: One-time Python + CUDA dependency for generating UV pseudo-labels from RGB frames
- **MM-Fi CC BY-NC license**: Weights trained on MM-Fi cannot be used commercially without collecting proprietary data
- **Environment-specific adaptation still required**: SONA reduces the burden, but a brief calibration session in each new environment is still recommended for best accuracy
- **6 additional RuVector crate dependencies**: Increases compile time and binary size. Mitigated by feature flags (e.g., `--features trained-model`).
- **Model size on disk**: ~25MB (FP16) or ~12MB (INT8). Acceptable for server deployment; may need further pruning for WASM.

### Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| MM-Fi 114→56 interpolation loses accuracy | Train at native 114 as an alternative; ESP32 mesh can collect 56-subcarrier data natively |
| GNN overfits to training body types | Augment with diverse body proportions; Wi-Pose adds subject diversity |
| SONA adaptation diverges in adversarial environments | EWC++ regularization caps parameter drift; roll back to base weights on detection |
| Sparse inference degrades accuracy | Benchmark INT8 vs FP16 vs FP32; fall back to full precision if quality drops |
| Training proof hash changes with RuVector version updates | Pin ruvector crate versions in Cargo.toml; regenerate the hash on version bumps |

## References

- Geng et al., "DensePose From WiFi" (CMU, arXiv:2301.00250, 2023)
- Yang et al., "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset" (NeurIPS 2023, arXiv:2305.10345)
- Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (ICLR 2022)
- Kirkpatrick et al., "Overcoming Catastrophic Forgetting in Neural Networks" (PNAS, 2017)
- Song et al., "PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU" (2024)
- ADR-005:
SONA Self-Learning for Pose Estimation
- ADR-015: Public Dataset Strategy for Trained Pose Estimation Model
- ADR-016: RuVector Integration for Training Pipeline
- ADR-020: Migrate AI/Model Inference to Rust with RuVector and ONNX Runtime

## Appendix A: RuQu Consideration

**ruQu** ("Classical nervous system for quantum machines") provides real-time coherence assessment via dynamic min-cut. While primarily designed for quantum error correction (syndrome decoding, surface code arbitration), its core primitive — the `CoherenceGate` — is architecturally relevant to WiFi CSI processing:

- **CoherenceGate** uses `ruvector-mincut` to make real-time gate/pass decisions on signal streams based on structural coherence thresholds. In quantum computing, this gates qubit syndrome streams. For WiFi CSI, the same mechanism could gate CSI subcarrier streams — passing only subcarriers whose coherence (phase stability across antennas) exceeds a dynamic threshold.
- **Syndrome filtering** (`filters.rs`) implements Kalman-like adaptive filters that could be repurposed for CSI noise filtering — treating each subcarrier's amplitude drift as a "syndrome" stream.
- **Min-cut gated transformer** integration (optional feature) provides coherence-optimized attention with 50% FLOP reduction — directly applicable to the `ModalityTranslator` bottleneck.

**Decision**: ruQu is not included in the initial pipeline (Phases 1-8) but is marked as a **Phase 9 exploration** candidate for coherence-gated CSI filtering. The CoherenceGate primitive maps naturally to subcarrier quality assessment, and the integration path is clean since ruQu already depends on `ruvector-mincut`.
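The coherence-gating idea above can be sketched as a minimal, self-contained filter. This is an illustrative assumption, not the ruQu `CoherenceGate` API: it measures per-subcarrier phase stability as one minus the circular variance of the phase samples and passes only subcarriers above a threshold.

```rust
// Hypothetical sketch of coherence-gated subcarrier filtering.
// `phase_instability` and `gate_subcarriers` are illustrative names,
// not ruQu API calls.

/// Circular variance of phase samples (radians):
/// 0.0 = perfectly stable phase, 1.0 = uniformly random phase.
fn phase_instability(phases: &[f64]) -> f64 {
    let n = phases.len() as f64;
    let (s, c) = phases
        .iter()
        .fold((0.0, 0.0), |(s, c), p| (s + p.sin(), c + p.cos()));
    1.0 - (s * s + c * c).sqrt() / n
}

/// Keep only subcarrier indices whose coherence (1 - circular variance)
/// meets a dynamic threshold.
fn gate_subcarriers(per_subcarrier_phases: &[Vec<f64>], threshold: f64) -> Vec<usize> {
    per_subcarrier_phases
        .iter()
        .enumerate()
        .filter(|(_, ph)| 1.0 - phase_instability(ph) >= threshold)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let stable = vec![0.10, 0.11, 0.09, 0.10]; // near-constant phase
    let noisy = vec![0.0, 1.8, -2.5, 3.0];     // scattered phase
    let kept = gate_subcarriers(&[stable, noisy], 0.9);
    assert_eq!(kept, vec![0]); // only the stable subcarrier passes
    println!("kept subcarriers: {:?}", kept);
}
```

In the real integration, the threshold would be driven by `ruvector-mincut`'s structural coherence decision rather than a fixed constant; the sketch only shows the gate/pass shape of the primitive.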
## Appendix B: Training Data Strategy

The pipeline supports three data sources for training, used in combination:

| Source | Subcarriers | Pose Labels | Volume | Cost | When |
|--------|-------------|-------------|--------|------|------|
| **MM-Fi** (public) | 114 → 56 (interpolated) | 17 COCO + DensePose UV | 40 subjects, 320K frames | Free (CC BY-NC) | Phase 1 — bootstrap |
| **Wi-Pose** (public) | 30 → 56 (zero-padded) | 18 keypoints | 12 subjects, 166K packets | Free (research) | Phase 1 — diversity |
| **ESP32 self-collected** | 56 (native) | Teacher-student from camera | Unlimited, environment-specific | Hardware only ($54) | Phase 4+ — fine-tuning |

**Recommended approach: combine public and ESP32 data.**

1. **Pre-train on MM-Fi + Wi-Pose** (public data, Phases 1-4): Provides the base model with diverse subjects and actions. The 114→56 subcarrier interpolation is acceptable for learning general CSI-to-pose mappings.
2. **Fine-tune on ESP32 self-collected data** (Phase 5+, SONA adaptation): Collect 5-30 minutes of paired ESP32 CSI + camera data in each target environment. The camera serves as the teacher model (Detectron2 generates pseudo-labels). SONA LoRA adaptation takes <50 gradient steps to converge.
3. **Continuous adaptation** (runtime): SONA's self-supervised temporal consistency loss refines the model without any camera, using the assumption that poses change smoothly over short time windows.

This three-tier strategy gives you:

- A working model from day one (public data)
- Environment-specific accuracy (ESP32 fine-tuning)
- Ongoing drift correction (SONA runtime adaptation)
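The two resampling schemes in the table above — linear interpolation of MM-Fi's 114 subcarriers down to 56, and zero-padding Wi-Pose's 30 up to 56 — can be sketched as follows. The function names are illustrative, not the planned `subcarrier.rs` API from ADR-016.

```rust
// Illustrative sketch of the subcarrier resampling schemes, assuming
// per-frame amplitude vectors; not the planned `subcarrier.rs` implementation.

/// Linearly resample an amplitude vector to `target` bins (e.g. 114 -> 56).
fn interpolate(src: &[f64], target: usize) -> Vec<f64> {
    let scale = (src.len() - 1) as f64 / (target - 1) as f64;
    (0..target)
        .map(|i| {
            let pos = i as f64 * scale;
            let lo = pos.floor() as usize;
            let hi = (lo + 1).min(src.len() - 1);
            let frac = pos - lo as f64;
            src[lo] * (1.0 - frac) + src[hi] * frac
        })
        .collect()
}

/// Zero-pad a shorter vector to `target` bins (e.g. 30 -> 56),
/// centering the original band in the padded output.
fn zero_pad(src: &[f64], target: usize) -> Vec<f64> {
    let mut out = vec![0.0; target];
    let offset = (target - src.len()) / 2;
    out[offset..offset + src.len()].copy_from_slice(src);
    out
}

fn main() {
    // MM-Fi frame: 114 subcarriers down to the ESP32's native 56.
    let mmfi = vec![1.0; 114];
    assert_eq!(interpolate(&mmfi, 56).len(), 56);

    // Wi-Pose frame: 30 subcarriers padded up to 56, centered at offset 13.
    let padded = zero_pad(&vec![1.0; 30], 56);
    assert_eq!(padded.len(), 56);
    assert_eq!(padded[0], 0.0);  // leading pad
    assert_eq!(padded[13], 1.0); // start of the original band
}
```

Whether the Wi-Pose band should be centered or aligned to a specific subcarrier range depends on how its 30 subcarriers map onto the ESP32's 56-subcarrier layout; centering is just one plausible choice.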