What kind of multi-target tracker did you use? #59


2026-03-01T15:28:23+08:00

BaiLiping commented

2026-03-01 15:28:23 +08:00

(Migrated from github.com)

Hey, you claim it can track up to 10 people. What kind of tracker are we talking about?

ruvnet commented

2026-03-01 15:40:58 +08:00

(Migrated from github.com)

Good question. There is no traditional tracker (no Kalman, no SORT/DeepSORT, no appearance-based re-ID). Here's what's actually happening at each layer:

1. Signal-level person separation

The system does not decompose the CSI signal into "person 1's signal" and "person 2's signal." Instead, the neural network ingests all 56 subcarriers simultaneously, and multiple people manifest as separate spatial blobs in the output heatmaps — the same way camera-based DensePose handles multiple people in a single image. Each subcarrier sees the environment at a slightly different wavelength (different Fresnel zone geometry per body position), and the model learns to disentangle them.

Subcarrier selection via ruvector-mincut partitions subcarriers into motion-sensitive vs. insensitive sets, and ruvector-attn-mincut gates irrelevant antenna paths via self-attention over TX-RX pairs.
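To make the motion-sensitive vs. insensitive split concrete, here is a minimal stand-in sketch. It is not the `ruvector-mincut` partition: it swaps in a plain temporal-variance threshold, and the function names, the `frames[t][s]` layout, and the `threshold` parameter are all assumptions made for illustration.

```rust
/// Temporal variance of each subcarrier's amplitude over a window of frames.
/// `frames[t][s]` is the amplitude of subcarrier `s` at time step `t`
/// (hypothetical layout, not taken from the repository).
fn subcarrier_variance(frames: &[Vec<f64>]) -> Vec<f64> {
    if frames.is_empty() {
        return Vec::new();
    }
    let n_sub = frames[0].len();
    let t = frames.len() as f64;
    (0..n_sub)
        .map(|s| {
            let mean = frames.iter().map(|f| f[s]).sum::<f64>() / t;
            frames.iter().map(|f| (f[s] - mean).powi(2)).sum::<f64>() / t
        })
        .collect()
}

/// Splits subcarrier indices into (motion-sensitive, insensitive) sets.
/// A variance threshold is a simple stand-in for the min-cut partition.
fn partition_by_variance(frames: &[Vec<f64>], threshold: f64) -> (Vec<usize>, Vec<usize>) {
    let var = subcarrier_variance(frames);
    (0..var.len()).partition(|&s| var[s] > threshold)
}
```

Calling `partition_by_variance(&frames, 1.0)` on a window where one subcarrier is flat and another fluctuates puts the fluctuating index in the first set and the flat one in the second.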

2. Multi-person assignment (frame-level)

Once the model predicts keypoints, the system matches predicted skeletons to ground-truth or previously-seen identities using bipartite assignment on an OKS (Object Keypoint Similarity) cost matrix. Three algorithms are implemented in crates/wifi-densepose-train/src/metrics.rs:

| Algorithm | Complexity | Use |
|-----------|------------|-----|
| Hungarian (Kuhn-Munkres) | O(n³) | Deterministic single-frame proof verification |
| DynamicPersonMatcher (`ruvector-mincut` flow network) | O(n^1.5 log n) amortized | Primary multi-frame matcher; supports incremental `add_person()` / `remove_person()` without rebuilding the graph |
| Min-cost max-flow (SPFA) | O(VE) | Alternative bipartite assignment |

The DynamicPersonMatcher wraps a source-sink flow network where edge capacities encode LARGE_CAP - oks_cost. Min-cut partitioning yields optimal (prediction, ground_truth) pairs.
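For readers who want to see the frame-level assignment step in isolation, here is a minimal sketch of an OKS cost matrix fed into a textbook O(n³) Hungarian solver. It is not the code in `metrics.rs`: the `Keypoint` struct, the function names, and the choice of `1 - OKS` as the cost are assumptions, and the OKS formula follows the COCO definition with `area` as the reference person's scale and `k` as the per-keypoint constants.

```rust
/// One keypoint: position plus a visibility flag (hypothetical type).
#[derive(Clone, Copy)]
struct Keypoint { x: f64, y: f64, visible: bool }

/// COCO-style OKS: mean of exp(-d^2 / (2 * area * k_i^2)) over visible reference keypoints.
fn oks(pred: &[Keypoint], reference: &[Keypoint], area: f64, k: &[f64]) -> f64 {
    let (mut sum, mut n) = (0.0, 0usize);
    for ((p, r), ki) in pred.iter().zip(reference).zip(k) {
        if !r.visible { continue; }
        let d2 = (p.x - r.x).powi(2) + (p.y - r.y).powi(2);
        sum += (-d2 / (2.0 * area * ki * ki)).exp();
        n += 1;
    }
    if n == 0 { 0.0 } else { sum / n as f64 }
}

/// cost[i][j] = 1 - OKS(prediction i, reference j); lower is a better match.
fn oks_cost_matrix(preds: &[Vec<Keypoint>], refs: &[Vec<Keypoint>], area: f64, k: &[f64]) -> Vec<Vec<f64>> {
    preds.iter()
        .map(|p| refs.iter().map(|r| 1.0 - oks(p, r, area, k)).collect())
        .collect()
}

/// Hungarian algorithm (potentials form). Requires rows <= columns.
/// Returns, for each row, the column it is assigned to.
fn hungarian(cost: &[Vec<f64>]) -> Vec<usize> {
    let n = cost.len();
    if n == 0 { return Vec::new(); }
    let m = cost[0].len();
    assert!(n <= m, "need at least as many columns as rows");
    let mut u = vec![0.0_f64; n + 1]; // row potentials (1-based)
    let mut v = vec![0.0_f64; m + 1]; // column potentials (1-based)
    let mut p = vec![0usize; m + 1];  // p[j] = row matched to column j, 0 = free
    let mut way = vec![0usize; m + 1];
    for i in 1..=n {
        p[0] = i;
        let mut j0 = 0;
        let mut minv = vec![f64::INFINITY; m + 1];
        let mut used = vec![false; m + 1];
        loop {
            used[j0] = true;
            let i0 = p[j0];
            let (mut delta, mut j1) = (f64::INFINITY, 0);
            for j in 1..=m {
                if used[j] { continue; }
                let cur = cost[i0 - 1][j - 1] - u[i0] - v[j];
                if cur < minv[j] { minv[j] = cur; way[j] = j0; }
                if minv[j] < delta { delta = minv[j]; j1 = j; }
            }
            for j in 0..=m {
                if used[j] { u[p[j]] += delta; v[j] -= delta; } else { minv[j] -= delta; }
            }
            j0 = j1;
            if p[j0] == 0 { break; }
        }
        // Walk back along the augmenting path and flip the matching.
        loop {
            let j1 = way[j0];
            p[j0] = p[j1];
            j0 = j1;
            if j0 == 0 { break; }
        }
    }
    let mut assign = vec![0usize; n];
    for j in 1..=m {
        if p[j] != 0 { assign[p[j] - 1] = j - 1; }
    }
    assign
}
```

With two predictions emitted in the opposite order from their references, `hungarian(&oks_cost_matrix(...))` returns the swap that pairs each prediction with its own reference. A flow-based matcher would reach the same pairing on a static frame; the difference described above is in how it handles incremental updates.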

3. What constrains the person count

There is no hard-coded max. PoseEstimate.persons is a Vec<PersonPose> with no cap. The practical limit is signal physics:

- 56 subcarriers on a single TX-RX link resolve ~3-5 distinct body reflections (consistent with literature: WiPose, Person-in-WiFi, DensePose from WiFi)
- Multi-AP scales linearly: each additional AP adds independent subcarrier observations
- 9 antenna paths (3 TX × 3 RX) provide spatial diversity

The "up to 10" figure assumes a multi-AP mesh (e.g., 3 APs × ~3-4 resolvable people each). On a single AP with 56 subcarriers, 3-5 is a realistic ceiling before mutual interference degrades accuracy.
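The arithmetic behind that figure can be written out as a back-of-the-envelope sketch. It assumes the linear multi-AP scaling stated above and ignores overlap between APs; `estimated_capacity` is a hypothetical helper, not something in the repository.

```rust
use std::ops::RangeInclusive;

/// Rough person-count range for a mesh, assuming each AP independently
/// resolves `per_ap` people and that capacity adds linearly across APs.
fn estimated_capacity(num_aps: u32, per_ap: RangeInclusive<u32>) -> RangeInclusive<u32> {
    (num_aps * *per_ap.start())..=(num_aps * *per_ap.end())
}
```

For the example mesh, `estimated_capacity(3, 3..=4)` gives `9..=12`, which brackets the "up to 10" figure, while a single AP at `3..=5` stays at `3..=5`.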

4. What's NOT here

- No Kalman filter or state estimation
- No appearance-based re-identification across temporal gaps
- No explicit track lifecycle management (birth/death/occlusion handling)

It's fundamentally dense prediction + bipartite matching, not tracking in the MOT sense. Each frame is solved independently; temporal coherence comes from the signal's physical continuity, not from a tracker state.



BaiLiping commented

2026-03-01 15:43:52 +08:00

(Migrated from github.com)

We can do better than that.


Reference: dearsky/wifi-densepose#59