diff --git a/docs/adr/ADR-015-public-dataset-training-strategy.md b/docs/adr/ADR-015-public-dataset-training-strategy.md
index a34ba5a..4742827 100644
--- a/docs/adr/ADR-015-public-dataset-training-strategy.md
+++ b/docs/adr/ADR-015-public-dataset-training-strategy.md
@@ -2,7 +2,7 @@
 
 ## Status
 
-Proposed
+Accepted
 
 ## Context
 
@@ -25,119 +25,156 @@ the camera is removed. This means any dataset that provides *either* ground-trut
 pose annotations *or* synchronized RGB frames (from which a teacher can generate
 labels) is sufficient for training.
 
+### 56-Subcarrier Hardware Context
+
+The system targets 56 subcarriers, which corresponds specifically to **Atheros 802.11n
+chipsets on a 20 MHz channel** using the Atheros CSI Tool. No publicly available
+dataset with paired pose annotations was collected at exactly 56 subcarriers:
+
+| Hardware | Subcarriers | Datasets |
+|----------|-------------|---------|
+| Atheros CSI Tool (20 MHz) | **56** | None with pose labels |
+| Atheros CSI Tool (40 MHz) | **114** | MM-Fi |
+| Intel 5300 NIC (20 MHz) | **30** | Person-in-WiFi, Widar 3.0, Wi-Pose, XRF55 |
+| Nexmon/Broadcom (80 MHz) | **242-256** | None with pose labels |
+
+MM-Fi uses the same Atheros hardware family at 40 MHz, making 114→56 interpolation
+physically meaningful (same chipset, different channel width).
+
 ## Decision
 
-Use MM-Fi as the primary training dataset, supplemented by XRF55 for additional
-diversity, with a teacher-student pipeline for any dataset that lacks dense pose
-annotations but provides RGB video.
+Use MM-Fi as the primary training dataset, supplemented by Wi-Pose (NjtechCVLab)
+for additional diversity. XRF55 is downgraded to optional (Kinect labels need
+post-processing). A teacher-student pipeline fills in DensePose UV labels where
+only skeleton keypoints are available.
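The table and the Decision together imply one resampling rule per capture source. Same-family Atheros data is interpolated. Intel 5300 data is zero-padded, as the Wi-Pose loader plan specifies. Native 56-subcarrier data passes through unchanged.

The minimal Rust sketch below encodes that rule. `Hardware`, `Resample`, `TARGET_SUBCARRIERS`, and `zero_pad` are hypothetical names, not the `wifi-densepose-train` API. The Nexmon row is omitted because no listed dataset pairs it with pose labels.

```rust
/// Hypothetical capture-hardware profiles: the table rows that feed training.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hardware {
    Atheros20MHz, // target system
    Atheros40MHz, // MM-Fi
    Intel5300,    // Wi-Pose, XRF55
}

/// How a source frame is brought onto the 56-subcarrier target grid.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resample {
    Native,            // already on the target grid
    LinearInterpolate, // same chipset family, different channel width
    ZeroPad,           // different chipset and spectral occupancy
}

pub const TARGET_SUBCARRIERS: usize = 56;

impl Hardware {
    /// Subcarrier count reported for this hardware (from the table).
    pub fn subcarriers(self) -> usize {
        match self {
            Hardware::Atheros20MHz => 56,
            Hardware::Atheros40MHz => 114,
            Hardware::Intel5300 => 30,
        }
    }

    /// Resampling strategy chosen for this hardware.
    pub fn strategy(self) -> Resample {
        match self {
            Hardware::Atheros20MHz => Resample::Native,
            Hardware::Atheros40MHz => Resample::LinearInterpolate,
            Hardware::Intel5300 => Resample::ZeroPad,
        }
    }
}

/// Keep the source bins unchanged and append zeros at the high-frequency end
/// until the frame reaches `target` bins.
pub fn zero_pad(frame: &[f32], target: usize) -> Vec<f32> {
    assert!(frame.len() <= target, "cannot zero-pad a frame wider than the target");
    let mut out = frame.to_vec();
    out.resize(target, 0.0);
    out
}
```

For example, `zero_pad(&frame, TARGET_SUBCARRIERS)` on a 30-bin frame keeps bins 0 to 29 unchanged and fills bins 30 to 55 with zeros.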
 
 ### Primary Dataset: MM-Fi
 
 **Paper:** "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless
-Sensing" (NeurIPS 2023 Datasets Track)
-**Repository:** https://github.com/ybCliff/MM-Fi
-**Size:** 40 volunteers × 27 action classes × ~320,000 frames
+Sensing" (NeurIPS 2023 Datasets & Benchmarks)
+**Repository:** https://github.com/ybhbingo/MMFi_dataset
+**Size:** 40 subjects × 27 action classes × ~320,000 frames, 4 environments
 **Modalities:** WiFi CSI, mmWave radar, LiDAR, RGB-D, IMU
-**CSI format:** 3 Tx × 3 Rx antennas, 114 subcarriers, 100 Hz sampling rate,
-IEEE 802.11n 5 GHz, raw amplitude + phase
-**Pose annotations:** 17-keypoint COCO skeleton (from RGB-D ground truth)
+**CSI format:** **1 TX × 3 RX antennas**, 114 subcarriers, 100 Hz sampling rate,
+5 GHz band, 40 MHz channel (TP-Link N750 with Atheros CSI Tool), raw amplitude + phase
+**Data tensor:** [3, 114, 10] per sample (antenna-pairs × subcarriers × time frames)
+**Pose annotations:** 17-keypoint COCO skeleton in 3D + DensePose UV surface coords
 **License:** CC BY-NC 4.0
 
-**Why primary:** Largest public WiFi CSI + pose dataset; raw amplitude and phase
-available (not just processed features); antenna count (3×3) is compatible with the
-existing `CSIProcessor` configuration; COCO keypoints map directly to the
-`KeypointHead` output format.
+**Why primary:** Largest public WiFi CSI + pose dataset; richest annotations (3D
+keypoints + DensePose UV); same Atheros hardware family as the target system; COCO
+keypoints map directly to the `KeypointHead` output format; actively maintained
+with NeurIPS 2023 benchmark status.
 
-### Secondary Dataset: XRF55
+**Antenna correction:** MM-Fi uses 1 TX / 3 RX (3 antenna pairs), not 3×3.
+The existing system targets 3×3 (ESP32 mesh). The 3 RX antennas match; because of
+the TX difference, MM-Fi-trained weights are expected to transfer but may benefit
+from fine-tuning on data from a 3-TX setup.
 
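The antenna-pair count is the product of the TX and RX antenna counts. MM-Fi's 1 TX × 3 RX capture therefore yields the 3 pairs in its `[3, 114, 10]` tensor, while a 3×3 array yields 9.

The sketch below makes that arithmetic and a flat index explicit, using a hypothetical `CsiShape` type. Two details are illustrative assumptions, not facts about the MM-Fi files or the project's loader:

- the buffer is row-major;
- the 3×3 target uses a 10-frame window.

```rust
/// Hypothetical shape of one CSI sample laid out as [antenna_pairs, subcarriers, frames].
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CsiShape {
    pub tx: usize,
    pub rx: usize,
    pub subcarriers: usize,
    pub frames: usize,
}

impl CsiShape {
    /// Every TX/RX combination is one antenna pair.
    pub fn antenna_pairs(&self) -> usize {
        self.tx * self.rx
    }

    pub fn dims(&self) -> [usize; 3] {
        [self.antenna_pairs(), self.subcarriers, self.frames]
    }

    pub fn num_elements(&self) -> usize {
        self.dims().iter().product()
    }

    /// Row-major offset of (pair, subcarrier, frame) in a flat buffer (assumed layout).
    pub fn offset(&self, pair: usize, sub: usize, frame: usize) -> usize {
        assert!(pair < self.antenna_pairs() && sub < self.subcarriers && frame < self.frames);
        (pair * self.subcarriers + sub) * self.frames + frame
    }
}

/// MM-Fi sample as described above: 1 TX × 3 RX, 114 subcarriers, 10 frames.
pub const MMFI: CsiShape = CsiShape { tx: 1, rx: 3, subcarriers: 114, frames: 10 };

/// 3×3 target after resampling to 56 subcarriers; `frames: 10` is an assumption.
pub const TARGET_3X3: CsiShape = CsiShape { tx: 3, rx: 3, subcarriers: 56, frames: 10 };
```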
-**Paper:** "XRF55: A Radio-Frequency Dataset for Human Indoor Action Recognition"
-(ACM MM 2023)
-**Repository:** https://github.com/aiotgroup/XRF55
-**Size:** 55 action classes, multiple subjects and environments
-**CSI format:** WiFi CSI + UWB radar, 3 Tx × 3 Rx, 30 subcarriers
-**Pose annotations:** Skeleton keypoints from Kinect
+### Secondary Dataset: Wi-Pose (NjtechCVLab)
+
+**Paper:** CSI-Former (MDPI Entropy 2023) and related works
+**Repository:** https://github.com/NjtechCVLab/Wi-PoseDataset
+**Size:** 12 volunteers × 12 action classes × 166,600 packets
+**CSI format:** 3 TX × 3 RX antennas, 30 subcarriers, 5 GHz, .mat format
+**Pose annotations:** 18-keypoint AlphaPose skeleton (COCO-17 keypoints plus neck)
 **License:** Research use
 
-**Why secondary:** Different environments and action vocabulary increase
-generalization; 30 subcarriers requires subcarrier interpolation to match the
-existing 56-subcarrier config.
+**Why secondary:** 3×3 antenna array matches the target ESP32 mesh hardware exactly;
+fully public; adds 12 subjects and environments not present in MM-Fi.
+**Note:** 30 subcarriers require zero-padding or interpolation to 56; the 18→17
+keypoint mapping drops the neck keypoint (index 1) to yield a COCO-17 skeleton.
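This sketch of the 18→17 mapping assumes Wi-Pose stores its AlphaPose keypoints in OpenPose-18 order: nose, neck, right and left limbs, then eyes and ears. That is the order that puts the neck at index 1. Under that assumption, dropping index 1 is not enough on its own; the remaining 17 points must also be reordered into COCO order.

`COCO_FROM_OPENPOSE`, `Keypoint`, and `to_coco17` are hypothetical names. Check the table against the dataset's own keypoint definition before relying on it.

```rust
/// (x, y, confidence) for one joint.
pub type Keypoint = [f32; 3];

/// Assumed OpenPose-18 order: 0 nose, 1 neck, 2 r_shoulder, 3 r_elbow, 4 r_wrist,
/// 5 l_shoulder, 6 l_elbow, 7 l_wrist, 8 r_hip, 9 r_knee, 10 r_ankle, 11 l_hip,
/// 12 l_knee, 13 l_ankle, 14 r_eye, 15 l_eye, 16 r_ear, 17 l_ear.
///
/// COCO_FROM_OPENPOSE[c] is the 18-point index feeding COCO slot c, where COCO order is
/// nose, l_eye, r_eye, l_ear, r_ear, l_shoulder, r_shoulder, l_elbow, r_elbow,
/// l_wrist, r_wrist, l_hip, r_hip, l_knee, r_knee, l_ankle, r_ankle.
/// Index 1 (neck) never appears, so it is dropped.
pub const COCO_FROM_OPENPOSE: [usize; 17] =
    [0, 15, 14, 17, 16, 5, 2, 6, 3, 7, 4, 11, 8, 12, 9, 13, 10];

/// Drop the neck and permute the remaining 17 joints into COCO-17 order.
pub fn to_coco17(kps: &[Keypoint; 18]) -> [Keypoint; 17] {
    let mut out = [[0.0f32; 3]; 17];
    for (slot, &src) in COCO_FROM_OPENPOSE.iter().enumerate() {
        out[slot] = kps[src];
    }
    out
}
```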
 
-### Excluded Datasets and Reasons
+### Excluded / Deprioritized Datasets
 
-| Dataset | Reason for exclusion |
-|---------|---------------------|
-| RF-Pose / RF-Pose3D (MIT) | Uses 60 GHz mmWave, not 2.4/5 GHz WiFi CSI; incompatible signal physics |
-| Person-in-WiFi (CMU 2019) | Amplitude only, no phase; not publicly released |
-| Widar 3.0 | Gesture recognition only, no full-body pose |
-| NTU-Fi | Activity labels only, no pose keypoints |
-| WiPose | Limited release; superseded by MM-Fi |
+| Dataset | Reason |
+|---------|--------|
+| RF-Pose / RF-Pose3D (MIT) | Custom FMCW radio, not 802.11n CSI; incompatible signal physics |
+| Person-in-WiFi (CMU 2019) | Not publicly released (IRB restriction) |
+| Person-in-WiFi 3D (CVPR 2024) | 30 subcarriers, Intel 5300; semi-public access |
+| DensePose From WiFi (CMU) | Dataset not released; only paper + architecture |
+| Widar 3.0 | Gesture labels only, no full-body pose keypoints |
+| XRF55 | Activity labels primarily; Kinect pose requires email request; lower priority |
+| UT-HAR, WiAR, SignFi | Activity/gesture labels only, no pose keypoints |
 
 ## Implementation Plan
 
-### Phase 1: MM-Fi Loader
+### Phase 1: MM-Fi Loader (Rust `wifi-densepose-train` crate)
 
-Implement a `PyTorch Dataset` class that:
-- Reads MM-Fi's HDF5/numpy CSI files
-- Resamples from 114 subcarriers → 56 subcarriers (linear interpolation along
-  frequency axis) to match the existing `CSIProcessor` config
-- Normalizes amplitude and unwraps phase using the existing `PhaseSanitizer`
-- Returns `(amplitude, phase, keypoints_17)` tuples
+Implement `MmFiDataset` in Rust (`crates/wifi-densepose-train/src/dataset.rs`):
+- Reads MM-Fi numpy .npy files: amplitude [N, 3, 3, 114] (antenna-pairs laid flat), phase [N, 3, 3, 114]
+- Resamples from 114 → 56 subcarriers (linear interpolation via `subcarrier.rs`)
+- Applies phase sanitization using SOTA algorithms from the `wifi-densepose-signal` crate
+- Returns typed `CsiSample` structs with amplitude, phase, keypoints, visibility
+- Validation split: subjects 33–40 held out
 
-### Phase 2: Teacher-Student Labels
+### Phase 2: Wi-Pose Loader
 
-For samples where only skeleton keypoints are available (not full DensePose UV maps):
-- Run Detectron2 DensePose on the paired RGB frames to generate `(part_labels,
-  u_coords, v_coords)` pseudo-labels
-- Cache generated labels to avoid recomputation during training epochs
-- This matches the training procedure in the original CMU paper
+Implement `WiPoseDataset` reading .mat files (via an ndarray-based MATLAB reader or
+pre-converted .npy). Subcarrier resampling: 30 → 56 by zero-padding the high
+frequencies rather than interpolating, since 30-subcarrier Intel data has different
+spectral occupancy than 56-subcarrier Atheros data.
 
-### Phase 3: Training Pipeline
+### Phase 3: Teacher-Student DensePose Labels
 
-- **Loss:** Combined keypoint heatmap loss (MSE) + DensePose part classification
-  (cross-entropy) + UV regression (Smooth L1) + transfer loss against teacher
-  RGB backbone features
-- **Optimizer:** Adam, lr=1e-3, milestones at 48k and 96k steps (paper schedule)
+For MM-Fi samples that provide 3D keypoints but not full DensePose UV maps:
+- Run Detectron2 DensePose on paired RGB frames to generate `(part_labels, u_coords, v_coords)`
+- Cache generated labels as .npy alongside the original data
+- This matches the training procedure in the CMU paper
+
+### Phase 4: Training Pipeline (Rust)
+
+- **Model:** `WiFiDensePoseModel` (tch-rs, `crates/wifi-densepose-train/src/model.rs`)
+- **Loss:** Keypoint heatmap (MSE) + DensePose part (cross-entropy) + UV (Smooth L1) + transfer (MSE)
+- **Metrics:** PCK@0.2 + OKS with Hungarian min-cost assignment (`crates/wifi-densepose-train/src/metrics.rs`)
+- **Optimizer:** Adam, lr=1e-3, step decay at epochs 40 and 80
 - **Hardware:** Single GPU (RTX 3090 or A100); MM-Fi fits in ~50 GB disk
 - **Checkpointing:** Save every epoch; keep best-by-validation-PCK
 
-### Phase 4: Evaluation
+### Phase 5: Proof Verification
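The `verify-training` pass/fail/skip contract can be captured as a small pure function. This sketch rests on three assumptions:

- "loss decreases" means the final loss is below the initial loss;
- a missing stored hash short-circuits to SKIP before the loss is checked;
- the SHA-256 digest is computed elsewhere (for example with a hashing crate) and passed in as a hex string.

`Verdict` and `verdict` are hypothetical names.

```rust
/// Outcome of a training-proof run, mapped onto the EXIT 0/1/2 contract.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
    Skip,
}

impl Verdict {
    pub fn exit_code(self) -> i32 {
        match self {
            Verdict::Pass => 0,
            Verdict::Fail => 1,
            Verdict::Skip => 2,
        }
    }
}

/// `losses`: per-step losses from the fixed-seed run.
/// `weights_hash`: hex SHA-256 of the final weights (computed by the caller).
/// `stored_hash`: reference hash, if one has been generated.
pub fn verdict(losses: &[f64], weights_hash: &str, stored_hash: Option<&str>) -> Verdict {
    // No reference hash: nothing to compare against, so skip.
    let expected = match stored_hash {
        Some(h) => h,
        None => return Verdict::Skip,
    };
    // Assumed reading of "loss decreases": last step below first step.
    let decreased = match (losses.first(), losses.last()) {
        (Some(first), Some(last)) => last < first,
        _ => false,
    };
    // Hex digests compare case-insensitively.
    if decreased && weights_hash.eq_ignore_ascii_case(expected) {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

A binary built this way would end with `std::process::exit(verdict(&losses, &hash, stored.as_deref()).exit_code())`, where `losses`, `hash`, and `stored` are the run's outputs.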
 
-- **Keypoints:** PCK@0.2 (Percentage of Correct Keypoints within 20% of torso size)
-- **DensePose:** GPS (Geodesic Point Similarity) and GPSM with segmentation mask
-- **Held-out split:** MM-Fi subjects 33-40 (20%) for validation; no test-set leakage
+The `verify-training` binary provides the "trust kill switch" for training:
+- Fixed seeds (MODEL_SEED=0, PROOF_SEED=42)
+- 50 training steps on deterministic SyntheticDataset
+- Verifies that the loss decreases and that the SHA-256 of the final weights matches the stored hash
+- EXIT 0 = PASS, EXIT 1 = FAIL, EXIT 2 = SKIP (no stored hash)
 
 ## Subcarrier Mismatch: MM-Fi (114) vs System (56)
 
-MM-Fi captures 114 subcarriers at 5 GHz with 40 MHz bandwidth. The existing system
-is configured for 56 subcarriers. Resolution options in order of preference:
+MM-Fi captures 114 subcarriers at 5 GHz with 40 MHz bandwidth (Atheros CSI Tool).
+The system is configured for 56 subcarriers (Atheros, 20 MHz). Resolution options:
 
-1. **Interpolate MM-Fi → 56** (recommended for initial training): linear interpolation
-   preserves spectral envelope, fast, no architecture change needed
-2. **Reconfigure system → 114**: change `CSIProcessor` config; requires re-running
-   `verify.py --generate-hash` to update proof hash
-3. **Train at native 114, serve at 56**: separate train/inference configs; adds
-   complexity
+1. **Interpolate MM-Fi → 56** (chosen for Phase 1): linear interpolation preserves the
+   spectral envelope, is fast, and needs no architecture change
+2. **Train at native 114**: change `CSIProcessor` config; requires re-running
+   `verify.py --generate-hash` to update proof hash; future option
+3. **Collect native 56-subcarrier data**: ESP32 mesh at 20 MHz; best for production
 
-Option 1 is chosen for Phase 1 to unblock training immediately.
+Option 1 unblocks training immediately. The Rust `subcarrier.rs` module handles
+interpolation as a first-class operation, with tests that check its correctness.
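This is a minimal sketch of the linear 114→56 resampling for one frame of one antenna pair. It assumes both grids are evenly spaced across the same normalized band, with endpoints aligned. An index-accurate version would use the actual subcarrier frequency indices of the 40 MHz and 20 MHz layouts.

`resample_linear` is a hypothetical stand-in, not the contents of `subcarrier.rs`. Apply it to amplitude directly, and to phase only after unwrapping, because interpolating across a ±π wrap produces spurious values.

```rust
/// Linearly resample one frame of per-subcarrier values onto `target` evenly
/// spaced points over the same band (first and last points aligned).
pub fn resample_linear(src: &[f32], target: usize) -> Vec<f32> {
    assert!(src.len() >= 2 && target >= 2, "need at least two points on each grid");
    // Distance, in source bins, between consecutive output points.
    let scale = (src.len() - 1) as f64 / (target - 1) as f64;
    (0..target)
        .map(|i| {
            let pos = i as f64 * scale; // fractional index into src
            // Clamp so `lo + 1` stays in bounds at the last output point.
            let lo = (pos.floor() as usize).min(src.len() - 2);
            let t = (pos - lo as f64) as f32; // weight of the upper neighbour
            src[lo] * (1.0 - t) + src[lo + 1] * t
        })
        .collect()
}
```

Called once per antenna pair and time frame, this turns a `[3, 114, 10]` MM-Fi sample into `[3, 56, 10]`.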
 
 ## Consequences
 
 **Positive:**
-- Unblocks end-to-end training without hardware collection
-- MM-Fi's 3×3 antenna setup matches this system's target hardware (ESP32 mesh, ADR-012)
-- 40 subjects with 27 action classes provides reasonable diversity for a first model
+- Unblocks end-to-end training on real public data immediately
+- MM-Fi's Atheros hardware family matches the target system (same CSI Tool)
+- 40 subjects × 27 actions provide reasonable diversity for a first model
+- Wi-Pose's 3×3 antenna setup is an exact hardware match for the ESP32 mesh
 - CC BY-NC license is compatible with research and internal use
+- Rust implementation integrates natively with the `wifi-densepose-signal` pipeline
 
 **Negative:**
 - CC BY-NC prohibits commercial deployment of weights trained solely on MM-Fi;
   custom data collection required before commercial release
-- 114→56 subcarrier interpolation loses some frequency resolution; acceptable for
-  initial training, revisit in Phase 2
-- MM-Fi was captured in controlled lab environments; expect accuracy drop in
-  complex real-world deployments until fine-tuned on domain-specific data
+- MM-Fi is 1 TX / 3 RX; system targets 3 TX / 3 RX; fine-tuning needed
+- 114→56 subcarrier interpolation loses frequency resolution; acceptable for v1
+- MM-Fi was captured in controlled lab environments; real-world accuracy is expected
+  to be lower until fine-tuned on domain-specific data
 
 ## References
 
-- He et al., "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset" (NeurIPS 2023)
-- Yang et al., "DensePose From WiFi" (arXiv 2301.00250, CMU 2023)
+- Yang et al., "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing" (NeurIPS 2023), arXiv:2305.10345
+- Geng et al., "DensePose From WiFi" (CMU, arXiv:2301.00250, 2023)
+- Yan et al., "Person-in-WiFi 3D" (CVPR 2024)
+- NjtechCVLab, "Wi-Pose Dataset", github.com/NjtechCVLab/Wi-PoseDataset
 - ADR-012: ESP32 CSI Sensor Mesh (hardware target)
 - ADR-013: Feature-Level Sensing on Commodity Gear
 - ADR-014: SOTA Signal Processing Algorithms