wifi-densepose/docs/adr/ADR-023-trained-densepose-model-ruvector-pipeline.md

# ADR-023: Trained DensePose Model with RuVector Signal Intelligence Pipeline

| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-02-28 |
| **Deciders** | ruv |
| **Relates to** | ADR-003 (RVF Cognitive Containers), ADR-005 (SONA Self-Learning), ADR-015 (Public Dataset Strategy), ADR-016 (RuVector Integration), ADR-017 (RuVector-Signal-MAT), ADR-020 (Rust AI Migration), ADR-021 (Vital Sign Detection) |

## Context

### The Gap Between Sensing and DensePose

The WiFi-DensePose system currently operates in two distinct modes:

1. **WiFi CSI sensing** (working): ESP32 streams CSI frames → Rust aggregator → feature extraction → presence/motion classification. 41 tests passing, verified at ~20 Hz with real hardware.

2. **Heuristic pose derivation** (working but approximate): The Rust sensing server generates 17 COCO keypoints from WiFi signal properties using hand-crafted rules (`derive_pose_from_sensing()` in `sensing-server/src/main.rs`). This is not a trained model — keypoint positions are derived from signal amplitude, phase variance, and motion metrics rather than learned from labeled data.

Neither mode produces **DensePose-quality** body surface estimation. The CMU "DensePose From WiFi" paper (arXiv:2301.00250) demonstrated that a neural network trained on paired WiFi CSI + camera pose data can produce dense body surface UV coordinates from WiFi alone. However, that approach requires:

- **Environment-specific training**: The model must be trained or fine-tuned for each deployment environment because CSI multipath patterns are environment-dependent.
- **Paired training data**: Simultaneous WiFi CSI captures + ground-truth pose annotations (or a camera-based teacher model generating pseudo-labels).
- **Substantial compute**: Training a modality translation network + DensePose head requires GPU time (hours to days depending on dataset size).

### What Exists in the Codebase

The Rust workspace already has the complete model architecture ready for training:

| Component | Crate | File | Status |
|-----------|-------|------|--------|
| `WiFiDensePoseModel` | `wifi-densepose-train` | `model.rs` | Implemented (random weights) |
| `ModalityTranslator` | `wifi-densepose-train` | `model.rs` | Implemented with RuVector attention |
| `KeypointHead` | `wifi-densepose-train` | `model.rs` | Implemented (17 COCO heatmaps) |
| `DensePoseHead` | `wifi-densepose-nn` | `densepose.rs` | Implemented (25 parts + 48 UV) |
| `WiFiDensePoseLoss` | `wifi-densepose-train` | `losses.rs` | Implemented (keypoint + part + UV + transfer) |
| `MmFiDataset` loader | `wifi-densepose-train` | `dataset.rs` | Planned (ADR-015) |
| `WiFiDensePosePipeline` | `wifi-densepose-nn` | `inference.rs` | Implemented (generic over Backend) |
| Training proof verification | `wifi-densepose-train` | `proof.rs` | Implemented (deterministic hash) |
| Subcarrier resampling (114→56) | `wifi-densepose-train` | `subcarrier.rs` | Planned (ADR-016) |

### RuVector Crates Available

The `vendor/ruvector/` subtree provides 90+ crates. The following are directly relevant to a trained DensePose pipeline:

**Already integrated (5 crates, ADR-016):**

| Crate | Algorithm | Current Use |
|-------|-----------|-------------|
| `ruvector-mincut` | Subpolynomial dynamic min-cut O(n^{o(1)}) | Multi-person assignment in `metrics.rs` |
| `ruvector-attn-mincut` | Attention-gated min-cut | Noise-suppressed spectrogram in `model.rs` |
| `ruvector-attention` | Scaled dot-product + geometric attention | Spatial decoder in `model.rs` |
| `ruvector-solver` | Sparse Neumann solver O(√n) | Subcarrier resampling in `subcarrier.rs` |
| `ruvector-temporal-tensor` | Tiered temporal compression | CSI frame buffering in `dataset.rs` |

**Newly proposed for DensePose pipeline (6 additional crates):**

| Crate | Description | Proposed Use |
|-------|-------------|-------------|
| `ruvector-gnn` | Graph neural network on HNSW topology | Spatial body-graph reasoning |
| `ruvector-graph-transformer` | Proof-gated graph transformer (8 modules) | CSI-to-pose cross-attention |
| `ruvector-sparse-inference` | PowerInfer-style sparse inference engine | Edge deployment with neuron activation sparsity |
| `ruvector-sona` | Self-Optimizing Neural Architecture (LoRA + EWC++) | Online environment adaptation |
| `ruvector-fpga-transformer` | FPGA-optimized transformer | Hardware-accelerated inference path |
| `ruvector-math` | Optimal transport, information geometry | Domain adaptation loss functions |

### RVF Container Format

The RuVector Format (RVF) is a segment-based binary container format designed to package
intelligence artifacts — embeddings, HNSW indexes, quantized weights, WASM runtimes, witness
proofs, and metadata — into a single self-contained file. Key properties:

- **64-byte segment headers** (`SegmentHeader`, magic `0x52564653` "RVFS") with type discriminator, content hash, compression, and timestamp
- **Progressive loading**: Layer A (entry points, <5ms) → Layer B (hot adjacency, 100ms–1s) → Layer C (full graph, seconds)
- **20+ segment types**: `Vec` (embeddings), `Index` (HNSW), `Overlay` (min-cut witnesses), `Quant` (codebooks), `Witness` (proof-of-computation), `Wasm` (self-bootstrapping runtime), `Dashboard` (embedded UI), `AggregateWeights` (federated SONA deltas), `Crypto` (Ed25519 signatures), and more
- **Temperature-tiered quantization** (`rvf-quant`): f32 / f16 / u8 / binary per-segment, with SIMD-accelerated distance computation
- **AGI Cognitive Container** (`agi_container.rs`): packages kernel + WASM + world model + orchestrator + evaluation harness + witness chains into a single deployable file

The trained DensePose model will be packaged as an `.rvf` container, making it a single
self-contained artifact that includes model weights, HNSW-indexed embedding tables, min-cut
graph overlays, quantization codebooks, SONA adaptation deltas, and the WASM inference
runtime — deployable to any host without external dependencies.

## Decision

Implement a fully trained DensePose model using RuVector signal intelligence as the backbone signal processing layer, packaged in the RVF container format. The pipeline has three stages: (1) offline training on public datasets, (2) teacher-student distillation for DensePose UV labels, and (3) online SONA adaptation for environment-specific fine-tuning. The trained model, its embeddings, indexes, and adaptation state are serialized into a single `.rvf` file.

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    TRAINED DENSEPOSE PIPELINE                                │
│                                                                             │
│  ┌─────────────┐    ┌──────────────────────┐    ┌──────────────────────┐   │
│  │ ESP32 CSI    │    │  RuVector Signal      │    │  Trained Neural      │   │
│  │ Raw I/Q      │───▶│  Intelligence Layer   │───▶│  Network             │   │
│  │ [ant×sub×T]  │    │  (preprocessing)      │    │  (inference)         │   │
│  └─────────────┘    └──────────────────────┘    └──────────────────────┘   │
│                              │                           │                   │
│                    ┌─────────┴─────────┐       ┌────────┴────────┐         │
│                    │ 5 RuVector crates  │       │ 6 RuVector      │         │
│                    │ (signal processing)│       │ crates (neural) │         │
│                    └───────────────────┘       └─────────────────┘         │
│                                                        │                    │
│                              ┌──────────────────────────┘                   │
│                              ▼                                              │
│                    ┌──────────────────────────────────────┐                 │
│                    │              Outputs                   │                 │
│                    │  • 17 COCO keypoints [B,17,H,W]       │                 │
│                    │  • 25 body parts     [B,25,H,W]       │                 │
│                    │  • 48 UV coords      [B,48,H,W]       │                 │
│                    │  • Confidence scores                   │                 │
│                    └──────────────────────────────────────┘                 │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Stage 1: RuVector Signal Preprocessing Layer

Raw CSI frames from ESP32 (56–192 subcarriers × N antennas × T time frames) are processed through the RuVector signal intelligence stack before entering the neural network. This replaces hand-crafted feature extraction with learned, graph-aware preprocessing.

```
Raw CSI [ant, sub, T]
    │
    ▼
┌─────────────────────────────────────────────────────┐
│  1. ruvector-attn-mincut: gate_spectrogram()        │
│     Input:  Q=amplitude, K=phase, V=combined        │
│     Effect: Suppress multipath noise, keep motion-  │
│             relevant subcarrier paths                │
│     Output: Gated spectrogram [ant, sub', T]        │
├─────────────────────────────────────────────────────┤
│  2. ruvector-mincut: mincut_subcarrier_partition()   │
│     Input:  Subcarrier coherence graph               │
│     Effect: Partition into sensitive (motion-         │
│             responsive) vs insensitive (static)      │
│     Output: Partition mask + per-subcarrier weights   │
├─────────────────────────────────────────────────────┤
│  3. ruvector-attention: attention_weighted_bvp()     │
│     Input:  Gated spectrogram + partition weights    │
│     Effect: Compute body velocity profile with       │
│             sensitivity-weighted attention            │
│     Output: BVP feature vector [D_bvp]               │
├─────────────────────────────────────────────────────┤
│  4. ruvector-solver: solve_fresnel_geometry()        │
│     Input:  Amplitude + known TX/RX positions        │
│     Effect: Estimate TX-body-RX ellipsoid distances  │
│     Output: Fresnel geometry features [D_fresnel]    │
├─────────────────────────────────────────────────────┤
│  5. ruvector-temporal-tensor: compress + buffer      │
│     Input:  Temporal CSI window (100 frames)         │
│     Effect: Tiered quantization (hot/warm/cold)      │
│     Output: Compressed tensor, 50-75% memory saving  │
└─────────────────────────────────────────────────────┘
    │
    ▼
Feature tensor [B, T*tx*rx, sub] (preprocessed, noise-suppressed)
```

### Stage 2: Neural Network Architecture

The neural network follows the CMU teacher-student architecture with RuVector enhancements at three critical points.

#### 2a. ModalityTranslator (CSI → Visual Feature Space)

```
CSI features [B, T*tx*rx, sub]
    │
    ├──amplitude──┐
    │              ├─► Encoder (Conv1D stack, 64→128→256)
    └──phase──────┘         │
                            ▼
              ┌──────────────────────────────┐
              │  ruvector-graph-transformer   │
              │                              │
              │  Treat antenna-pair×time as  │
              │  graph nodes. Edges connect  │
              │  spatially adjacent antenna  │
              │  pairs and temporally        │
              │  adjacent frames.            │
              │                              │
              │  Proof-gated attention:      │
              │  Each layer verifies that    │
              │  attention weights satisfy   │
              │  physical constraints        │
              │  (Fresnel ellipsoid bounds)  │
              └──────────────────────────────┘
                            │
                            ▼
              Decoder (ConvTranspose2d stack, 256→128→64→3)
                            │
                            ▼
              Visual features [B, 3, 48, 48]
```

**RuVector enhancement**: Replace standard multi-head self-attention in the bottleneck with `ruvector-graph-transformer`. The graph structure encodes the physical antenna topology — nodes that are closer in space (adjacent ESP32 nodes in the mesh) or time (consecutive frames) have stronger edge weights. This injects domain-specific inductive bias that standard attention lacks.

#### 2b. GNN Body Graph Reasoning

```
Visual features [B, 3, 48, 48]
    │
    ▼
ResNet18 backbone → feature maps [B, 256, 12, 12]
    │
    ▼
┌─────────────────────────────────────────┐
│  ruvector-gnn: Body Graph Network       │
│                                         │
│  17 COCO keypoints as graph nodes       │
│  Edges: anatomical connections          │
│  (shoulder→elbow, hip→knee, etc.)       │
│                                         │
│  GNN message passing (3 rounds):        │
│  h_i^{l+1} = σ(W·h_i^l + Σ_j α_ij·h_j)│
│  α_ij = attention(h_i, h_j, edge_ij)   │
│                                         │
│  Enforces anatomical constraints:       │
│  - Limb length ratios                   │
│  - Joint angle limits                   │
│  - Left-right symmetry priors           │
└─────────────────────────────────────────┘
    │
    ├──────────────────┬──────────────────┐
    ▼                  ▼                  ▼
KeypointHead      DensePoseHead     ConfidenceHead
[B,17,H,W]       [B,25+48,H,W]     [B,1]
heatmaps          parts + UV         quality score
```

**RuVector enhancement**: `ruvector-gnn` replaces the flat spatial decoder with a graph neural network that operates on the human body graph. WiFi CSI is inherently noisy — GNN message passing between anatomically connected joints enforces that predicted keypoints maintain plausible body structure even when individual joint predictions are uncertain.

#### 2c. Sparse Inference for Edge Deployment

```
Trained model weights (full precision)
    │
    ▼
┌─────────────────────────────────────────────┐
│  ruvector-sparse-inference                   │
│                                              │
│  PowerInfer-style activation sparsity:       │
│  - Profile neuron activation frequency       │
│  - Partition into hot (always active, 20%)   │
│    and cold (conditionally active, 80%)      │
│  - Hot neurons: GPU/SIMD fast path           │
│  - Cold neurons: sparse lookup on demand     │
│                                              │
│  Quantization:                               │
│  - Backbone: INT8 (4x memory reduction)      │
│  - DensePose head: FP16 (2x reduction)       │
│  - ModalityTranslator: FP16                  │
│                                              │
│  Target: <50ms inference on ESP32-S3         │
│          <10ms on x86 with AVX2              │
└─────────────────────────────────────────────┘
```

### Stage 3: Training Pipeline

#### 3a. Dataset Loading and Preprocessing

Primary dataset: **MM-Fi** (NeurIPS 2023) — 40 subjects, 27 actions, 114 subcarriers, 3 RX antennas, 17 COCO keypoints + DensePose UV annotations.

Secondary dataset: **Wi-Pose** — 12 subjects, 12 actions, 30 subcarriers, 3×3 antenna array, 18 keypoints.

```
┌──────────────────────────────────────────────────────────┐
│  Data Loading Pipeline                                    │
│                                                          │
│  MM-Fi .npy ──► Resample 114→56 subcarriers ──┐         │
│                (ruvector-solver NeumannSolver)  │         │
│                                                ├──► Batch│
│  Wi-Pose .mat ──► Zero-pad 30→56 subcarriers ──┘  [B,T*│
│                                                    ant, │
│  Phase sanitize ──► Hampel filter ──► unwrap        sub] │
│  (wifi-densepose-signal::phase_sanitizer)                │
│                                                          │
│  Temporal buffer ──► ruvector-temporal-tensor             │
│  (100 frames/sample, tiered quantization)                │
└──────────────────────────────────────────────────────────┘
```

#### 3b. Teacher-Student DensePose Labels

For samples with 3D keypoints but no DensePose UV maps:

1. Run Detectron2 DensePose R-CNN on paired RGB frames (one-time preprocessing step on GPU workstation)
2. Generate `(part_labels [H,W], u_coords [H,W], v_coords [H,W])` pseudo-labels
3. Cache as `.npy` alongside original data
4. Teacher model is discarded after label generation — inference uses WiFi only

#### 3c. Loss Function

```rust
L_total = λ_kp  · L_keypoint      // MSE on predicted vs GT heatmaps
        + λ_part · L_part          // Cross-entropy on 25-class body part segmentation
        + λ_uv   · L_uv           // Smooth L1 on UV coordinate regression
        + λ_xfer · L_transfer     // MSE between CSI features and teacher visual features
        + λ_ot   · L_ot           // Optimal transport regularization (ruvector-math)
        + λ_graph · L_graph       // GNN edge consistency loss (ruvector-gnn)
```

**RuVector enhancement**: `ruvector-math` provides optimal transport (Wasserstein distance) as a regularization term. This penalizes predicted body part distributions that are far from the ground truth in the Wasserstein metric, which is more geometrically meaningful than pixel-wise cross-entropy for spatial body part segmentation.

#### 3d. Training Configuration

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Optimizer | AdamW | Weight decay regularization |
| Learning rate | 1e-3, cosine decay to 1e-5 | Standard for modality translation |
| Batch size | 32 | Fits in 24GB GPU VRAM |
| Epochs | 100 | With early stopping (patience=15) |
| Warmup | 5 epochs | Linear LR warmup |
| Train/val split | Subjects 1-32 / 33-40 | Subject-disjoint for generalization |
| Augmentation | Time-shift ±5 frames, amplitude noise ±2dB, antenna dropout 10% | CSI-domain augmentations |
| Hardware | Single RTX 3090 or A100 | ~8 hours on A100 |
| Checkpoint | Every epoch, keep best-by-validation-PCK | Deterministic seed |

#### 3e. Metrics

| Metric | Target | Description |
|--------|--------|-------------|
| PCK@0.2 | >70% on MM-Fi val | Percentage of correct keypoints (threshold = 0.2 × torso diameter) |
| OKS mAP | >0.50 on MM-Fi val | Object Keypoint Similarity, COCO-standard |
| DensePose GPS | >0.30 on MM-Fi val | Geodesic Point Similarity for UV accuracy |
| Inference latency | <50ms per frame | On x86 with ONNX Runtime |
| Model size | <25MB (FP16) | Suitable for edge deployment |

### Stage 4: Online Adaptation with SONA

After offline training produces a base model, SONA enables continuous adaptation to new environments without retraining from scratch.

```
┌──────────────────────────────────────────────────────────┐
│  SONA Online Adaptation Loop                              │
│                                                          │
│  Base model (frozen weights W)                           │
│       │                                                  │
│       ▼                                                  │
│  ┌──────────────────────────────────┐                    │
│  │  LoRA Adaptation Matrices        │                    │
│  │  W_effective = W + α · A·B       │                    │
│  │                                  │                    │
│  │  Rank r=4 for translator layers  │                    │
│  │  Rank r=2 for backbone layers    │                    │
│  │  Rank r=8 for DensePose head     │                    │
│  │                                  │                    │
│  │  Total trainable params: ~50K    │                    │
│  │  (vs ~5M frozen base)            │                    │
│  └──────────────────────────────────┘                    │
│       │                                                  │
│       ▼                                                  │
│  ┌──────────────────────────────────┐                    │
│  │  EWC++ Regularizer               │                    │
│  │  L = L_task + λ·Σ F_i(θ-θ*)²    │                    │
│  │                                  │                    │
│  │  Prevents forgetting base model  │                    │
│  │  knowledge when adapting to new  │                    │
│  │  environment                     │                    │
│  └──────────────────────────────────┘                    │
│       │                                                  │
│       ▼                                                  │
│  Adaptation triggers:                                    │
│  • First deployment in new room                          │
│  • PCK drops below threshold (drift detection)           │
│  • User manually initiates calibration                   │
│  • Furniture/layout change detected (CSI baseline shift) │
│                                                          │
│  Adaptation data:                                        │
│  • Self-supervised: temporal consistency loss             │
│    (pose at t should be similar to t-1 for slow motion)  │
│  • Semi-supervised: user confirmation of presence/count  │
│  • Optional: brief camera calibration session (5 min)    │
│                                                          │
│  Convergence: 10-50 gradient steps, <5 seconds on CPU    │
└──────────────────────────────────────────────────────────┘
```

### Stage 5: Inference Pipeline (Production)

```
ESP32 CSI (UDP :5005)
    │
    ▼
Rust Axum server (port 8080)
    │
    ├─► RuVector signal preprocessing (Stage 1)
    │       5 crates, ~2ms per frame
    │
    ├─► ONNX Runtime inference (Stage 2)
    │       Quantized model, ~10ms per frame
    │       OR ruvector-sparse-inference, ~8ms per frame
    │
    ├─► GNN post-processing (ruvector-gnn)
    │       Anatomical constraint enforcement, ~1ms
    │
    ├─► SONA adaptation check (Stage 4)
    │       <0.05ms per frame (gradient accumulation only)
    │
    └─► Output: DensePose results
            │
            ├──► /api/v1/stream/pose (WebSocket, 17 keypoints)
            ├──► /api/v1/pose/current (REST, full DensePose)
            └──► /ws/sensing (WebSocket, raw + processed)
```

Total inference budget: **<15ms per frame** at 20 Hz on x86, **<50ms** on ESP32-S3 (with sparse inference).

### Stage 6: RVF Model Container Format

The trained model is packaged as a single `.rvf` file that contains everything needed for
inference — no external weight files, no ONNX runtime, no Python dependencies.

#### RVF DensePose Container Layout

```
wifi-densepose-v1.rvf (single file, ~15-30 MB)
┌───────────────────────────────────────────────────────────────┐
│  SEGMENT 0: Manifest (0x05)                                   │
│  ├── Model ID: "wifi-densepose-v1.0"                          │
│  ├── Training dataset: "mmfi-v1+wipose-v1"                    │
│  ├── Training config hash: SHA-256                            │
│  ├── Target hardware: x86_64, aarch64, wasm32                 │
│  ├── Segment directory (offsets to all segments)               │
│  └── Level-1 TLV manifest with metadata tags                  │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 1: Vec (0x01) — Model Weight Embeddings              │
│  ├── ModalityTranslator weights [64→128→256→3, Conv1D+ConvT]  │
│  ├── ResNet18 backbone weights [3→64→128→256, residual blocks] │
│  ├── KeypointHead weights [256→17, deconv layers]             │
│  ├── DensePoseHead weights [256→25+48, deconv layers]         │
│  ├── GNN body graph weights [3 message-passing rounds]        │
│  └── Graph transformer attention weights [proof-gated layers] │
│  Format: flat f32 vectors, 768-dim per weight tensor          │
│  Total: ~5M parameters → ~20MB f32, ~10MB f16, ~5MB INT8     │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 2: Index (0x02) — HNSW Embedding Index               │
│  ├── Layer A: Entry points + coarse routing centroids          │
│  │   (loaded first, <5ms, enables approximate search)         │
│  ├── Layer B: Hot region adjacency for frequently             │
│  │   accessed weight clusters (100ms load)                    │
│  └── Layer C: Full adjacency graph for exact nearest          │
│      neighbor lookup across all weight partitions             │
│  Use: Fast weight lookup for sparse inference —               │
│  only load hot neurons, skip cold neurons via HNSW routing    │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 3: Overlay (0x03) — Dynamic Min-Cut Graph            │
│  ├── Subcarrier partition graph (sensitive vs insensitive)     │
│  ├── Min-cut witnesses from ruvector-mincut                   │
│  ├── Antenna topology graph (ESP32 mesh spatial layout)       │
│  └── Body skeleton graph (17 COCO joints, 16 edges)           │
│  Use: Pre-computed graph structures loaded at init time.       │
│  Dynamic updates via ruvector-mincut insert/delete_edge       │
│  as environment changes (furniture moves, new obstacles)      │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 4: Quant (0x06) — Quantization Codebooks             │
│  ├── INT8 codebook for backbone (4x memory reduction)         │
│  ├── FP16 scale factors for translator + heads                │
│  ├── Binary quantization tables for SIMD distance compute     │
│  └── Per-layer calibration statistics (min, max, zero-point)  │
│  Use: rvf-quant temperature-tiered quantization —             │
│  hot layers stay f16, warm layers u8, cold layers binary      │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 5: Witness (0x0A) — Training Proof Chain             │
│  ├── Deterministic training proof (seed, loss curve, hash)    │
│  ├── Dataset provenance (MM-Fi commit hash, download URL)     │
│  ├── Validation metrics (PCK@0.2, OKS mAP, GPS scores)       │
│  ├── Ed25519 signature over weight hash                       │
│  └── Attestation: training hardware, duration, config         │
│  Use: Verifiable proof that model weights match a specific    │
│  training run. Anyone can re-run training with same seed      │
│  and verify the weight hash matches the witness.              │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 6: Meta (0x07) — Model Metadata                      │
│  ├── COCO keypoint names and skeleton connectivity            │
│  ├── DensePose body part labels (24 parts + background)       │
│  ├── UV coordinate range and resolution                       │
│  ├── Input normalization statistics (mean, std per subcarrier)│
│  ├── RuVector crate versions used during training             │
│  └── Environment calibration profiles (named, per-room)       │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 7: AggregateWeights (0x36) — SONA LoRA Deltas        │
│  ├── Per-environment LoRA adaptation matrices (A, B per layer)│
│  ├── EWC++ Fisher information diagonal                        │
│  ├── Optimal θ* reference parameters                          │
│  ├── Adaptation round count and convergence metrics           │
│  └── Named profiles: "lab-a", "living-room", "office-3f"     │
│  Use: Multiple environment adaptations stored in one file.    │
│  Server loads the matching profile or creates a new one.      │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 8: Profile (0x0B) — RVDNA Domain Profile             │
│  ├── Domain: "wifi-csi-densepose"                             │
│  ├── Input spec: [B, T*ant, sub] CSI tensor format            │
│  ├── Output spec: keypoints [B,17,H,W], parts [B,25,H,W],    │
│  │   UV [B,48,H,W], confidence [B,1]                         │
│  ├── Hardware requirements: min RAM, recommended GPU          │
│  └── Supported data sources: esp32, wifi-rssi, simulation    │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 9: Crypto (0x0C) — Signature and Keys                │
│  ├── Ed25519 public key for model publisher                   │
│  ├── Signature over all segment content hashes                │
│  └── Certificate chain (optional, for enterprise deployment)  │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 10: Wasm (0x10) — Self-Bootstrapping Runtime         │
│  ├── Compiled WASM inference engine                           │
│  │   (ruvector-sparse-inference-wasm)                         │
│  ├── WASM microkernel for RVF segment parsing                 │
│  └── Browser-compatible: load .rvf → run inference in-browser │
│  Use: The .rvf file is fully self-contained — a WASM host     │
│  can execute inference without any external dependencies.     │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 11: Dashboard (0x11) — Embedded Visualization        │
│  ├── Three.js-based pose visualization (HTML/JS/CSS)          │
│  ├── Gaussian splat renderer for signal field                 │
│  └── Served at http://localhost:8080/ when model is loaded    │
│  Use: Open the .rvf file → get a working UI with no install  │
└───────────────────────────────────────────────────────────────┘
```

#### RVF Loading Sequence

```
1. Read tail → find_latest_manifest() → SegmentDirectory
2. Load Manifest (seg 0) → validate magic, version, model ID
3. Load Profile (seg 8) → verify input/output spec compatibility
4. Load Crypto (seg 9) → verify Ed25519 signature chain
5. Load Quant (seg 4) → prepare quantization codebooks
6. Load Index Layer A (seg 2) → entry points ready (<5ms)
       ↓ (inference available at reduced accuracy)
7. Load Vec (seg 1) → hot weight partitions via Layer A routing
8. Load Index Layer B (seg 2) → hot adjacency ready (100ms)
       ↓ (inference at full accuracy for common poses)
9. Load Overlay (seg 3) → min-cut graphs, body skeleton
10. Load AggregateWeights (seg 7) → apply matching SONA profile
11. Load Index Layer C (seg 2) → complete graph loaded
       ↓ (full inference with all weight partitions)
12. Load Wasm (seg 10) → WASM runtime available (optional)
13. Load Dashboard (seg 11) → UI served (optional)
```

**Progressive availability**: Inference begins after step 6 (~5ms) with approximate
results. Full accuracy is reached by step 9 (~500ms). This enables instant startup
with gradually improving quality — critical for real-time applications.

#### RVF Build Pipeline

After training completes, the model is packaged into an `.rvf` file:

```bash
# Build the RVF container from trained checkpoint
cargo run -p wifi-densepose-train --bin build-rvf -- \
    --checkpoint checkpoints/best-pck.pt \
    --quantize int8,fp16 \
    --hnsw-build \
    --sign --key model-signing-key.pem \
    --include-wasm \
    --include-dashboard ../../ui \
    --output wifi-densepose-v1.rvf

# Verify the built container
cargo run -p wifi-densepose-train --bin verify-rvf -- \
    --input wifi-densepose-v1.rvf \
    --verify-signature \
    --verify-witness \
    --benchmark-inference
```

#### RVF Runtime Integration

The sensing server loads the `.rvf` container at startup:

```bash
# Load model from RVF container
./target/release/sensing-server \
    --model wifi-densepose-v1.rvf \
    --source auto \
    --ui-from-rvf  # serve Dashboard segment instead of --ui-path
```

```rust
// In sensing-server/src/main.rs
use rvf_runtime::RvfContainer;
use rvf_index::layers::IndexLayer;
use rvf_quant::QuantizedVec;

let container = RvfContainer::open("wifi-densepose-v1.rvf")?;

// Progressive load: Layer A first for instant startup
let index = container.load_index(IndexLayer::A)?;
let weights = container.load_vec_hot(&index)?;  // hot partitions only

// Full load in background
tokio::spawn(async move {
    container.load_index(IndexLayer::B).await?;
    container.load_index(IndexLayer::C).await?;
    container.load_vec_cold().await?;  // remaining partitions
});

// SONA environment adaptation
let sona_deltas = container.load_aggregate_weights("office-3f")?;
model.apply_lora_deltas(&sona_deltas);

// Serve embedded dashboard
let dashboard = container.load_dashboard()?;
// Mount at /ui/* routes in Axum
```

## Implementation Plan

### Phase 1: Dataset Loaders (2 weeks)

- Implement `MmFiDataset` in `wifi-densepose-train/src/dataset.rs`
- Read MM-Fi `.npy` files with antenna correction (1TX/3RX → 3×3 zero-padding)
- Subcarrier resampling 114→56 via `ruvector-solver::NeumannSolver`
- Phase sanitization via `wifi-densepose-signal::phase_sanitizer`
- Implement `WiPoseDataset` for secondary dataset
- Temporal windowing with `ruvector-temporal-tensor`
- **Deliverable**: `cargo test -p wifi-densepose-train` with dataset loading tests

### Phase 2: Graph Transformer Integration (2 weeks)

- Add `ruvector-graph-transformer` dependency to `wifi-densepose-train`
- Replace bottleneck self-attention in `ModalityTranslator` with proof-gated graph transformer
- Build antenna topology graph (nodes = antenna pairs, edges = spatial/temporal proximity)
- Add `ruvector-gnn` dependency for body graph reasoning
- Build COCO body skeleton graph (17 nodes, 16 anatomical edges)
- Implement GNN message passing in spatial decoder
- **Deliverable**: Model forward pass produces correct output shapes with graph layers

### Phase 3: Teacher-Student Label Generation (1 week)

- Python script using Detectron2 DensePose to generate UV pseudo-labels from MM-Fi RGB frames
- Cache labels as `.npy` for Rust loader consumption
- Validate label quality on a random subset (visual inspection)
- **Deliverable**: Complete UV label set for MM-Fi training split

### Phase 4: Training Loop (3 weeks)

- Implement `WiFiDensePoseTrainer` with full loss function (6 terms)
- Add `ruvector-math` optimal transport loss term
- Integrate GNN edge consistency loss
- Training loop with cosine LR schedule, early stopping, checkpointing
- Validation metrics: PCK@0.2, OKS mAP, DensePose GPS
- Deterministic proof verification (`proof.rs`) with weight hash
- **Deliverable**: Trained model checkpoint achieving PCK@0.2 >70% on MM-Fi validation

### Phase 5: SONA Online Adaptation (2 weeks)

- Integrate `ruvector-sona` into inference pipeline
- Implement LoRA injection at translator, backbone, and DensePose head layers
- Implement EWC++ Fisher information computation and regularization
- Self-supervised temporal consistency loss for unsupervised adaptation
- Calibration mode: 5-minute camera session for supervised fine-tuning
- Drift detection: monitor rolling PCK on temporal consistency proxy
- **Deliverable**: Adaptation converges in <50 gradient steps, PCK recovers within 10% of base

### Phase 6: Sparse Inference and Edge Deployment (2 weeks)

- Profile neuron activation frequencies on validation set
- Apply `ruvector-sparse-inference` hot/cold neuron partitioning
- INT8 quantization for backbone, FP16 for heads
- ONNX export with quantized weights
- Benchmark on x86 (target: <10ms) and ARM (target: <50ms)
- WASM export via `ruvector-sparse-inference-wasm` for browser inference
- **Deliverable**: Quantized ONNX model, benchmark results, WASM binary

### Phase 7: RVF Container Build Pipeline (2 weeks)

- Implement `build-rvf` binary in `wifi-densepose-train`
- Serialize trained weights into `Vec` segment (SegmentType::Vec, 0x01)
- Build HNSW index over weight partitions for sparse inference (SegmentType::Index, 0x02)
- Serialize min-cut graph overlays: subcarrier partition, antenna topology, body skeleton (SegmentType::Overlay, 0x03)
- Generate quantization codebooks via `rvf-quant` (SegmentType::Quant, 0x06)
- Write training proof witness with Ed25519 signature (SegmentType::Witness, 0x0A)
- Store model metadata, COCO keypoint schema, normalization stats (SegmentType::Meta, 0x07)
- Store SONA LoRA adaptation deltas per environment (SegmentType::AggregateWeights, 0x36)
- Write RVDNA domain profile for WiFi CSI DensePose (SegmentType::Profile, 0x0B)
- Optionally embed WASM inference runtime (SegmentType::Wasm, 0x10)
- Optionally embed Three.js dashboard (SegmentType::Dashboard, 0x11)
- Build Level-1 manifest and segment directory (SegmentType::Manifest, 0x05)
- Implement `verify-rvf` binary for container validation
- **Deliverable**: `wifi-densepose-v1.rvf` single-file container, verifiable and self-contained

### Phase 8: Integration with Sensing Server (1 week)

- Load `.rvf` container in `wifi-densepose-sensing-server` via `rvf-runtime`
- Progressive loading: Layer A first for instant startup, full graph in background
- Replace `derive_pose_from_sensing()` heuristic with trained model inference
- Add `--model` CLI flag accepting `.rvf` path (or legacy `.onnx`)
- Apply SONA LoRA deltas from `AggregateWeights` segment based on `--env` flag
- Serve embedded Dashboard segment at `/ui/*` when `--ui-from-rvf` is set
- Graceful fallback to heuristic when no model file present
- Update WebSocket protocol to include DensePose UV data
- **Deliverable**: Sensing server serves trained model from single `.rvf` file

## File Changes

### New Files

| File | Purpose |
|------|---------|
| `rust-port/.../wifi-densepose-train/src/dataset_mmfi.rs` | MM-Fi dataset loader with subcarrier resampling |
| `rust-port/.../wifi-densepose-train/src/dataset_wipose.rs` | Wi-Pose dataset loader |
| `rust-port/.../wifi-densepose-train/src/graph_transformer.rs` | Graph transformer integration |
| `rust-port/.../wifi-densepose-train/src/body_gnn.rs` | GNN body graph reasoning |
| `rust-port/.../wifi-densepose-train/src/adaptation.rs` | SONA LoRA + EWC++ adaptation |
| `rust-port/.../wifi-densepose-train/src/trainer.rs` | Training loop with multi-term loss |
| `scripts/generate_densepose_labels.py` | Teacher-student UV label generation |
| `scripts/benchmark_inference.py` | Inference latency benchmarking |
| `rust-port/.../wifi-densepose-train/src/rvf_builder.rs` | RVF container build pipeline |
| `rust-port/.../wifi-densepose-train/src/bin/build_rvf.rs` | CLI binary for building `.rvf` containers |
| `rust-port/.../wifi-densepose-train/src/bin/verify_rvf.rs` | CLI binary for verifying `.rvf` containers |

### Modified Files

| File | Change |
|------|--------|
| `rust-port/.../wifi-densepose-train/Cargo.toml` | Add ruvector-gnn, graph-transformer, sona, sparse-inference, math, rvf-types, rvf-wire, rvf-manifest, rvf-index, rvf-quant, rvf-crypto, rvf-runtime deps |
| `rust-port/.../wifi-densepose-train/src/model.rs` | Integrate graph transformer + GNN layers |
| `rust-port/.../wifi-densepose-train/src/losses.rs` | Add optimal transport + GNN edge consistency loss terms |
| `rust-port/.../wifi-densepose-train/src/config.rs` | Add training hyperparameters for new components |
| `rust-port/.../sensing-server/Cargo.toml` | Add rvf-runtime, rvf-types, rvf-index, rvf-quant deps |
| `rust-port/.../sensing-server/src/main.rs` | Add `--model` flag, load `.rvf` container, progressive startup, serve embedded dashboard |

## Consequences

### Positive

- **Trained model produces accurate DensePose**: Moves from heuristic keypoints to learned body surface estimation backed by public dataset evaluation
- **RuVector signal intelligence is a differentiator**: Graph transformers on antenna topology and GNN body reasoning are novel — no prior WiFi pose system uses these techniques
- **SONA enables zero-shot deployment**: New environments don't require full retraining — LoRA adaptation with <50 gradient steps converges in seconds
- **Sparse inference enables edge deployment**: PowerInfer-style neuron partitioning brings DensePose inference to ESP32-class hardware
- **Graceful degradation**: Server falls back to heuristic pose when no model file is present — existing functionality is preserved
- **Single-file deployment via RVF**: Trained model, embeddings, HNSW index, quantization codebooks, SONA adaptation profiles, WASM runtime, and dashboard UI packaged in one `.rvf` file — deploy by copying a single file
- **Progressive loading**: RVF Layer A loads in <5ms for instant startup; full accuracy reached in ~500ms as remaining segments load
- **Verifiable provenance**: RVF Witness segment contains deterministic training proof with Ed25519 signature — anyone can re-run training and verify weight hash
- **Self-bootstrapping**: RVF Wasm segment enables browser-based inference with no server-side dependencies
- **Open evaluation**: PCK, OKS, GPS metrics on public MM-Fi dataset provide reproducible, comparable results

### Negative

- **Training requires GPU**: Initial model training needs RTX 3090 or better (~8 hours on A100). Not all developers will have access.
- **Teacher-student label generation requires Detectron2**: One-time Python + CUDA dependency for generating UV pseudo-labels from RGB frames
- **MM-Fi CC BY-NC license**: Weights trained on MM-Fi cannot be used commercially without collecting proprietary data
- **Environment-specific adaptation still required**: SONA reduces the burden but a brief calibration session in each new environment is still recommended for best accuracy
- **6 additional RuVector crate dependencies**: Increases compile time and binary size. Mitigated by feature flags (e.g., `--features trained-model`).
- **Model size on disk**: ~25MB (FP16) or ~12MB (INT8). Acceptable for server deployment, may need further pruning for WASM.

### Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| MM-Fi 114→56 interpolation loses accuracy | Train at native 114 as alternative; ESP32 mesh can collect 56-sub data natively |
| GNN overfits to training body types | Augment with diverse body proportions; Wi-Pose adds subject diversity |
| SONA adaptation diverges in adversarial environments | EWC++ regularization caps parameter drift; rollback to base weights on detection |
| Sparse inference degrades accuracy | Benchmark INT8 vs FP16 vs FP32; fall back to full precision if quality drops |
| Training proof hash changes with RuVector version updates | Pin ruvector crate versions in Cargo.toml; regenerate hash on version bumps |

## References

- Geng et al., "DensePose From WiFi" (CMU, arXiv:2301.00250, 2023)
- Yang et al., "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset" (NeurIPS 2023, arXiv:2305.10345)
- Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (ICLR 2022)
- Kirkpatrick et al., "Overcoming Catastrophic Forgetting in Neural Networks" (PNAS, 2017)
- Song et al., "PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU" (2024)
- ADR-005: SONA Self-Learning for Pose Estimation
- ADR-015: Public Dataset Strategy for Trained Pose Estimation Model
- ADR-016: RuVector Integration for Training Pipeline
- ADR-020: Migrate AI/Model Inference to Rust with RuVector and ONNX Runtime

## Appendix A: RuQu Consideration

**ruQu** ("Classical nervous system for quantum machines") provides real-time coherence
assessment via dynamic min-cut. While primarily designed for quantum error correction
(syndrome decoding, surface code arbitration), its core primitive — the `CoherenceGate` —
is architecturally relevant to WiFi CSI processing:

- **CoherenceGate** uses `ruvector-mincut` to make real-time gate/pass decisions on
  signal streams based on structural coherence thresholds. In quantum computing, this
  gates qubit syndrome streams. For WiFi CSI, the same mechanism could gate CSI
  subcarrier streams — passing only subcarriers whose coherence (phase stability across
  antennas) exceeds a dynamic threshold.

- **Syndrome filtering** (`filters.rs`) implements Kalman-like adaptive filters that
  could be repurposed for CSI noise filtering — treating each subcarrier's amplitude
  drift as a "syndrome" stream.

- **Min-cut gated transformer** integration (optional feature) provides coherence-optimized
  attention with 50% FLOP reduction — directly applicable to the `ModalityTranslator`
  bottleneck.

**Decision**: ruQu is not included in the initial pipeline (Phase 1-8) but is marked as a
**Phase 9 exploration** candidate for coherence-gated CSI filtering. The CoherenceGate
primitive maps naturally to subcarrier quality assessment, and the integration path is
clean since ruQu already depends on `ruvector-mincut`.

## Appendix B: Training Data Strategy

The pipeline supports three data sources for training, used in combination:

| Source | Subcarriers | Pose Labels | Volume | Cost | When |
|--------|-------------|-------------|--------|------|------|
| **MM-Fi** (public) | 114 → 56 (interpolated) | 17 COCO + DensePose UV | 40 subjects, 320K frames | Free (CC BY-NC) | Phase 1 — bootstrap |
| **Wi-Pose** (public) | 30 → 56 (zero-padded) | 18 keypoints | 12 subjects, 166K packets | Free (research) | Phase 1 — diversity |
| **ESP32 self-collected** | 56 (native) | Teacher-student from camera | Unlimited, environment-specific | Hardware only ($54) | Phase 4+ — fine-tuning |

**Recommended approach: Both public + ESP32 data.**

1. **Pre-train on MM-Fi + Wi-Pose** (public data, Phase 1-4): Provides the base model
   with diverse subjects and actions. The 114→56 subcarrier interpolation is acceptable
   for learning general CSI-to-pose mappings.

2. **Fine-tune on ESP32 self-collected data** (Phase 5+, SONA adaptation): Collect
   5-30 minutes of paired ESP32 CSI + camera data in each target environment. The camera
   serves as the teacher model (Detectron2 generates pseudo-labels). SONA LoRA adaptation
   takes <50 gradient steps to converge.

3. **Continuous adaptation** (runtime): SONA's self-supervised temporal consistency loss
   refines the model without any camera, using the assumption that poses change smoothly
   over short time windows.

This three-tier strategy gives you:
- A working model from day one (public data)
- Environment-specific accuracy (ESP32 fine-tuning)
- Ongoing drift correction (SONA runtime adaptation)