Files
wifi-densepose/docs/adr/ADR-015-public-dataset-training-strategy.md
Claude 4babb320bf docs: Add ADR-015 public dataset training strategy
Records the decision to use MM-Fi as primary training dataset and XRF55
as secondary, with a teacher-student pipeline for generating DensePose
UV pseudo-labels from paired RGB frames.

https://claude.ai/code/session_01BSBAQJ34SLkiJy4A8SoiL4
2026-02-28 15:00:12 +00:00

6.3 KiB
Raw Blame History

ADR-015: Public Dataset Strategy for Trained Pose Estimation Model

Status

Proposed

Context

The WiFi-DensePose system has a complete model architecture (DensePoseHead, ModalityTranslationNetwork, WiFiDensePoseRCNN) and signal processing pipeline, but no trained weights. Without a trained model, pose estimation produces random outputs regardless of input quality.

Training requires paired data: simultaneous WiFi CSI captures alongside ground-truth human pose annotations. Collecting this data from scratch requires months of effort and specialized hardware (multiple WiFi nodes + camera + motion capture rig). Several public datasets exist that can bootstrap training without custom collection.

The Teacher-Student Constraint

The CMU "DensePose From WiFi" paper (2023) trains using a teacher-student approach: a camera-based RGB pose model (e.g. Detectron2 DensePose) generates pseudo-labels during training, so the WiFi model learns to replicate those outputs. At inference, the camera is removed. This means any dataset that provides either ground-truth pose annotations or synchronized RGB frames (from which a teacher can generate labels) is sufficient for training.

Decision

Use MM-Fi as the primary training dataset, supplemented by XRF55 for additional diversity, with a teacher-student pipeline for any dataset that lacks dense pose annotations but provides RGB video.

Primary Dataset: MM-Fi

Paper: "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing" (NeurIPS 2023 Datasets Track) Repository: https://github.com/ybCliff/MM-Fi Size: 40 volunteers × 27 action classes × ~320,000 frames Modalities: WiFi CSI, mmWave radar, LiDAR, RGB-D, IMU CSI format: 3 Tx × 3 Rx antennas, 114 subcarriers, 100 Hz sampling rate, IEEE 802.11n 5 GHz, raw amplitude + phase Pose annotations: 17-keypoint COCO skeleton (from RGB-D ground truth) License: CC BY-NC 4.0 Why primary: Largest public WiFi CSI + pose dataset; raw amplitude and phase available (not just processed features); antenna count (3×3) is compatible with the existing CSIProcessor configuration; COCO keypoints map directly to the KeypointHead output format.

Secondary Dataset: XRF55

Paper: "XRF55: A Radio-Frequency Dataset for Human Indoor Action Recognition" (ACM MM 2023) Repository: https://github.com/aiotgroup/XRF55 Size: 55 action classes, multiple subjects and environments CSI format: WiFi CSI + UWB radar, 3 Tx × 3 Rx, 30 subcarriers Pose annotations: Skeleton keypoints from Kinect License: Research use Why secondary: Different environments and action vocabulary increase generalization; 30 subcarriers requires subcarrier interpolation to match the existing 56-subcarrier config.

Excluded Datasets and Reasons

Dataset Reason for exclusion
RF-Pose / RF-Pose3D (MIT) Uses 60 GHz mmWave, not 2.4/5 GHz WiFi CSI; incompatible signal physics
Person-in-WiFi (CMU 2019) Amplitude only, no phase; not publicly released
Widar 3.0 Gesture recognition only, no full-body pose
NTU-Fi Activity labels only, no pose keypoints
WiPose Limited release; superseded by MM-Fi

Implementation Plan

Phase 1: MM-Fi Loader

Implement a PyTorch Dataset class that:

  • Reads MM-Fi's HDF5/numpy CSI files
  • Resamples from 114 subcarriers → 56 subcarriers (linear interpolation along frequency axis) to match the existing CSIProcessor config
  • Normalizes amplitude and unwraps phase using the existing PhaseSanitizer
  • Returns (amplitude, phase, keypoints_17) tuples

Phase 2: Teacher-Student Labels

For samples where only skeleton keypoints are available (not full DensePose UV maps):

  • Run Detectron2 DensePose on the paired RGB frames to generate (part_labels, u_coords, v_coords) pseudo-labels
  • Cache generated labels to avoid recomputation during training epochs
  • This matches the training procedure in the original CMU paper

Phase 3: Training Pipeline

  • Loss: Combined keypoint heatmap loss (MSE) + DensePose part classification (cross-entropy) + UV regression (Smooth L1) + transfer loss against teacher RGB backbone features
  • Optimizer: Adam, lr=1e-3, milestones at 48k and 96k steps (paper schedule)
  • Hardware: Single GPU (RTX 3090 or A100); MM-Fi fits in ~50 GB disk
  • Checkpointing: Save every epoch; keep best-by-validation-PCK

Phase 4: Evaluation

  • Keypoints: PCK@0.2 (Percentage of Correct Keypoints within 20% of torso size)
  • DensePose: GPS (Geodesic Point Similarity) and GPSM with segmentation mask
  • Held-out split: MM-Fi subjects 33-40 (20%) for validation; no test-set leakage

Subcarrier Mismatch: MM-Fi (114) vs System (56)

MM-Fi captures 114 subcarriers at 5 GHz with 40 MHz bandwidth. The existing system is configured for 56 subcarriers. Resolution options in order of preference:

  1. Interpolate MM-Fi → 56 (recommended for initial training): linear interpolation preserves spectral envelope, fast, no architecture change needed
  2. Reconfigure system → 114: change CSIProcessor config; requires re-running verify.py --generate-hash to update proof hash
  3. Train at native 114, serve at 56: separate train/inference configs; adds complexity

Option 1 is chosen for Phase 1 to unblock training immediately.

Consequences

Positive:

  • Unblocks end-to-end training without hardware collection
  • MM-Fi's 3×3 antenna setup matches this system's target hardware (ESP32 mesh, ADR-012)
  • 40 subjects with 27 action classes provides reasonable diversity for a first model
  • CC BY-NC license is compatible with research and internal use

Negative:

  • CC BY-NC prohibits commercial deployment of weights trained solely on MM-Fi; custom data collection required before commercial release
  • 114→56 subcarrier interpolation loses some frequency resolution; acceptable for initial training, revisit in Phase 2
  • MM-Fi was captured in controlled lab environments; expect accuracy drop in complex real-world deployments until fine-tuned on domain-specific data

References

  • He et al., "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset" (NeurIPS 2023)
  • Yang et al., "DensePose From WiFi" (arXiv 2301.00250, CMU 2023)
  • ADR-012: ESP32 CSI Sensor Mesh (hardware target)
  • ADR-013: Feature-Level Sensing on Commodity Gear
  • ADR-014: SOTA Signal Processing Algorithms