Claude 374b0fdcef docs: add RuView (ADR-028) sensing-first RF mode for multistatic fidelity
Introduce Project RuView — RuVector Viewpoint-Integrated Enhancement — a
sensing-first RF mode that improves WiFi DensePose fidelity through
cross-viewpoint embedding fusion on commodity ESP32 hardware.

Research document (docs/research/ruview-multistatic-fidelity-sota-2026.md):
- SOTA analysis of three fidelity levers: bandwidth, carrier frequency, viewpoints
- Multistatic array theory with virtual aperture and TDM sensing protocol
- ESP32 multistatic path ($84 BOM) and Cognitum v1 + RF front end path
- IEEE 802.11bf alignment and forward-compatibility mapping
- RuVector pipeline: all 5 crates mapped to cross-viewpoint operations
- Three-metric acceptance suite: joint error (PCK/OKS), multi-person
  separation (MOTA), vital sign sensitivity with Bronze/Silver/Gold tiers

ADR-028 (docs/adr/ADR-028-ruview-sensing-first-rf-mode.md):
- DDD bounded context: ViewpointFusion with MultistaticArray aggregate,
  ViewpointEmbedding entity, GeometricDiversityIndex value object
- Cross-viewpoint attention fusion via ruvector-attention with geometric bias
- TDM sensing protocol: 6 nodes, 119 Hz aggregate, 20 Hz per viewpoint
- Coherence-gated environment updates for multi-day stability
- File-level implementation plan across 4 phases (8 new source files)
- ADR interaction map: ADR-012, 014, 016/017, 021, 024, 027

2026-03-02 02:07:31 +00:00


RuView: Viewpoint-Integrated Enhancement for WiFi DensePose Fidelity

Date: 2026-03-02

Scope: Sensing-first RF mode design, multistatic geometry, ESP32 mesh architecture, Cognitum v1 integration, IEEE 802.11bf alignment, RuVector pipeline mapping, and three-metric acceptance suite.


1. Abstract and Motivation

WiFi-based dense human pose estimation faces three persistent fidelity bottlenecks that limit practical deployment:

  1. Pose jitter. Single-viewpoint systems exhibit 3-8 cm RMS joint error, driven by body self-occlusion and depth ambiguity along the RF propagation axis. Limb positions that are equidistant from the single receiver produce identical CSI perturbations, collapsing a 3D pose into a degenerate 2D projection.

  2. Multi-person ambiguity. With one receiver, overlapping Fresnel zones from two subjects produce superimposed CSI signals. State-of-the-art trackers report 0.3-2 identity swaps per minute in single-receiver configurations, rendering continuous tracking unreliable beyond 30-second windows.

  3. Vital sign noise floor. Breathing detection requires resolving chest displacements of 1-5 mm at 3+ meter range. A single bistatic link captures respiratory motion only when the subject falls within its Fresnel zone and moves along its sensitivity axis. Off-axis breathing is invisible.

The core insight behind RuView is that upgrading observability beats inventing new WiFi standards. Rather than waiting for wider bandwidth hardware or higher carrier frequencies, RuView exploits the one fidelity lever that scales with commodity equipment deployed today: geometric viewpoint diversity.

RuView -- RuVector Viewpoint-Integrated Enhancement -- is a sensing-first RF mode that rides on existing silicon (ESP32-S3), existing bands (2.4/5 GHz), and existing regulations (Part 15 unlicensed). Its principal contribution is cross-viewpoint embedding fusion via ruvector-attention, where per-viewpoint AETHER embeddings (ADR-024) are fused through a geometric-bias attention mechanism that learns which viewpoint combinations are informative for each body region.

Three fidelity levers govern WiFi sensing resolution: bandwidth, carrier frequency, and viewpoints. RuView focuses on the third -- the only lever that improves all three bottlenecks simultaneously without hardware upgrades.


2. Three Fidelity Levers: SOTA Analysis

2.1 Bandwidth

Channel impulse response (CIR) features separate multipath components by time-of-arrival. Multipath separability is governed by the minimum resolvable delay:

delta_tau_min = 1 / BW
| Standard | Bandwidth | Min Delay | Path Separation |
|---|---|---|---|
| 802.11n HT20 | 20 MHz | 50 ns | 15.0 m |
| 802.11ac VHT80 | 80 MHz | 12.5 ns | 3.75 m |
| 802.11ac VHT160 | 160 MHz | 6.25 ns | 1.87 m |
| 802.11be EHT320 | 320 MHz | 3.13 ns | 0.94 m |

Wider channels push the optimal feature domain from frequency (raw subcarrier CSI) toward time (CIR peaks), because multipath components become individually resolvable. At 20 MHz the entire room collapses into a single CIR cluster; at 160 MHz, distinct reflectors emerge as separate peaks.

ESP32-S3 operates at 20 MHz (HT20). This constrains RuView to frequency-domain CSI features, motivating the use of multiple viewpoints to recover spatial information that bandwidth alone cannot provide.

References: SpotFi (Kotaru et al., SIGCOMM 2015); IEEE 802.11bf sensing mode (2024).

2.2 Carrier Frequency

Phase sensitivity to displacement follows:

delta_phi = (4 * pi / lambda) * delta_d
| Band | Wavelength | Phase Shift per 1 mm | Wall Penetration |
|---|---|---|---|
| 2.4 GHz | 12.5 cm | 0.10 rad | Excellent (3+ walls) |
| 5 GHz | 6.0 cm | 0.21 rad | Moderate (1-2 walls) |
| 60 GHz | 5.0 mm | 2.51 rad | Line-of-sight only |

Higher carrier frequencies provide sharper motion sensitivity but sacrifice penetration. At 60 GHz (802.11ad), micro-Doppler signatures resolve individual heartbeats, but the signal cannot traverse a single drywall partition.

Fresnel zone radius at each band governs the sensing-sensitive region:

r_n = sqrt(n * lambda * d1 * d2 / (d1 + d2))

At 2.4 GHz with a 3 m link, the first Fresnel zone radius at the link midpoint is 0.31 m -- a broad sensitivity region suitable for macro-motion detection but poor for localizing specific body parts. At 5 GHz the radius shrinks to 0.21 m, improving localization at the cost of coverage.
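The Fresnel radius formula above is straightforward to evaluate directly; a minimal sketch (the function name is illustrative):

```python
import math

def fresnel_radius(n, wavelength_m, d1_m, d2_m):
    """n-th Fresnel zone radius at a point d1 from TX and d2 from RX."""
    return math.sqrt(n * wavelength_m * d1_m * d2_m / (d1_m + d2_m))

# First Fresnel zone at the midpoint of a 3 m link
r_24 = fresnel_radius(1, 0.125, 1.5, 1.5)   # 2.4 GHz -> ~0.31 m
r_5 = fresnel_radius(1, 0.060, 1.5, 1.5)    # 5 GHz   -> ~0.21 m
print(f"2.4 GHz: {r_24:.2f} m, 5 GHz: {r_5:.2f} m")
```

The radius is maximal at the link midpoint (d1 = d2) and shrinks toward either endpoint, which is why the midpoint value is the natural figure of merit for coverage.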

RuView currently targets 2.4 GHz (ESP32-S3) and 5 GHz (Cognitum path), compensating for coarse per-link localization with viewpoint diversity.

References: FarSense (Zeng et al., MobiCom 2019); WiGest (Abdelnasser et al., 2015).

2.3 Viewpoints (RuView Core Contribution)

A single-viewpoint system suffers from a fundamental geometric limitation: body self-occlusion removes information that no amount of signal processing can recover. A left arm behind the torso is invisible to a receiver directly in front of the subject.

Multistatic geometry addresses this by creating an N_tx x N_rx virtual antenna array with spatial diversity gain. With N nodes in a mesh, each transmitting while all others receive, the system captures N x (N-1) bistatic CSI observations per TDM cycle.

Geometric Diversity Index (GDI). RuView quantifies viewpoint quality as:

GDI = (1/N) * sum_i min_{j != i} |theta_i - theta_j|

where theta_i is the azimuth of the i-th bistatic pair relative to the room center. Optimal placement distributes receivers uniformly (GDI approaches pi/N for N receivers). Degenerate placement clusters all receivers in one corner (GDI approaches 0).

Cramer-Rao Lower Bound for pose estimation. With N independent viewpoints, CRLB decreases as O(1/N). With correlated viewpoints:

CRLB ~ O(1/N_eff),  where N_eff = N * (1 - rho_bar)

and rho_bar is the mean pairwise correlation between viewpoint CSI streams. Maximizing GDI minimizes rho_bar.
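The GDI and effective-viewpoint formulas above can be computed directly; a minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def gdi(azimuths_rad):
    """Geometric Diversity Index: mean nearest-neighbor angular separation."""
    th = np.asarray(azimuths_rad, dtype=float)
    diffs = np.abs(th[:, None] - th[None, :])
    np.fill_diagonal(diffs, np.inf)          # exclude j == i
    return diffs.min(axis=1).mean()

def n_effective(n_viewpoints, mean_pairwise_corr):
    """Effective independent viewpoints: N_eff = N * (1 - rho_bar)."""
    return n_viewpoints * (1.0 - mean_pairwise_corr)

uniform = np.linspace(0, np.pi, 6, endpoint=False)         # well spread
clustered = np.array([0.0, 0.05, 0.1, 0.15, 0.2, 0.25])    # one corner
print(gdi(uniform))        # pi/6 ~ 0.524: optimal placement
print(gdi(clustered))      # 0.05: degenerate placement
print(n_effective(6, 0.3)) # 4.2 effective viewpoints at rho_bar = 0.3
```

Spread placement scores near pi/N while clustered placement collapses toward zero, matching the optimal and degenerate cases described above.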

Multipath separability x viewpoints. Joint improvement follows a product law:

Effective_resolution ~ BW * N_viewpoints * sin(angular_spread)

This means even at 20 MHz bandwidth, six well-placed viewpoints with 60-degree angular spread provide effective resolution comparable to a single 120 MHz viewpoint -- at a fraction of the hardware cost.

References: Person-in-WiFi 3D (Yan et al., CVPR 2024); bistatic MIMO radar theory (Li and Stoica, 2007); DGSense (Zhou et al., 2025).


3. Multistatic Array Theory

3.1 Virtual Aperture

N transmitters and M receivers create N x M virtual antenna elements. For an ESP32 mesh where each of 6 nodes transmits in turn while 5 others receive:

Virtual elements = 6 * 5 = 30 bistatic pairs

The virtual aperture diameter equals the maximum baseline between any two nodes. In a 5m x 5m room with nodes at the perimeter, D_aperture ~ 7m (diagonal), yielding angular resolution:

delta_theta ~ lambda / D_aperture = 0.125 / 7 ~ 0.018 rad ~ 1.0 degree at 2.4 GHz

This exceeds the angular resolution of any single-antenna receiver by an order of magnitude.
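The virtual-aperture arithmetic is a two-line calculation:

```python
import math

wavelength = 0.125            # 2.4 GHz carrier, metres
aperture = 7.0                # max node baseline (5 m x 5 m room diagonal), metres
virtual_elements = 6 * 5      # N * (N - 1) bistatic pairs in a 6-node mesh

delta_theta_rad = wavelength / aperture
print(virtual_elements, math.degrees(delta_theta_rad))   # 30 pairs, ~1.0 degree
```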

3.2 Time-Division Sensing Protocol

TDM assigns each node an exclusive transmit slot while all other nodes receive. With N nodes, each gets 1/N duty cycle:

Per-viewpoint rate = f_aggregate / N

At 120 Hz aggregate TDM cycle rate with 6 nodes: 20 Hz per bistatic pair.

Synchronization. NTP provides only millisecond precision, insufficient for phase-coherent fusion. RuView uses beacon-based synchronization:

  • Coordinator node broadcasts a sync beacon at the start of each TDM cycle
  • Peripheral nodes align their slot timing to the beacon with crystal precision (~20-50 ppm)
  • At 120 Hz cycle rate (8.33 ms period), 50 ppm drift produces 0.42 microsecond error
  • This is well within the 802.11n symbol duration (3.2 microseconds), acceptable for feature-level and embedding-level fusion

3.3 Cross-Viewpoint Fusion Strategies

| Tier | Fusion Level | Requires | Benefit | ESP32 Feasible |
|---|---|---|---|---|
| 1 | Decision-level | Labels only | Majority vote on pose predictions | Yes |
| 2 | Feature-level | Aligned features | Better than any single viewpoint | Yes (ADR-012) |
| 3 | Embedding-level | AETHER embeddings | Learns what to fuse per body region | Yes (RuView) |

Decision-level fusion (Tier 1) discards information by reducing each viewpoint to a final prediction before combination. Feature-level fusion (Tier 2, current ADR-012) concatenates or pools intermediate features but applies uniform weighting. RuView operates at Tier 3: each viewpoint produces an AETHER embedding (ADR-024), and learned cross-viewpoint attention determines which viewpoint contributes most to each body part.


4. ESP32 Multistatic Array Path

4.1 Architecture Extension from ADR-012

ADR-012 defines feature-level fusion: amplitude, phase, and spectral features per node are aggregated via max/mean pooling across nodes. RuView extends this to embedding-level fusion:

Per Node:   CSI --> Signal Processing (ADR-014) --> AETHER Embedding (ADR-024)
Aggregator: [emb_1, emb_2, ..., emb_N] --> RuView Attention --> Fused Embedding
Output:     Fused Embedding --> DensePose Head --> 17 Keypoints + UV Maps

Each node runs the signal processing pipeline locally (conjugate multiplication, Hampel filtering, spectrogram extraction) and transmits a 128-dimensional AETHER embedding to the aggregator, rather than raw CSI. This reduces per-node bandwidth from ~14 KB/frame (56 subcarriers x 2 antennas x 2 components x 64 bytes) to 512 bytes/frame (128 floats x 4 bytes).

4.2 Time-Scheduled Captures

The TDM coordinator runs on the aggregator (laptop or Raspberry Pi). Protocol per cycle:

Beacon --> Slot_1 (node 1 TX, all others RX) --> Slot_2 --> ... --> Slot_N --> Repeat

Each slot requires approximately 1.4 ms (one 802.11n LLTF frame plus guard interval). With 6 nodes: 8.4 ms cycle duration, yielding 119 Hz aggregate rate and 19.8 Hz per bistatic pair.
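The timing budget above (slot duration, cycle rate, per-pair rate, and the Section 3.2 crystal-drift bound) reduces to a few lines of arithmetic:

```python
slot_ms = 1.4                             # one 802.11n LLTF frame + guard interval
n_nodes = 6

cycle_ms = slot_ms * n_nodes              # 8.4 ms TDM cycle
aggregate_hz = 1000.0 / cycle_ms          # ~119 Hz aggregate rate
per_pair_hz = aggregate_hz / n_nodes      # ~19.8 Hz per bistatic pair
drift_us = cycle_ms * 1e3 * 50e-6         # 50 ppm crystal drift over one cycle

print(f"{aggregate_hz:.0f} Hz aggregate, {per_pair_hz:.1f} Hz/pair, "
      f"{drift_us:.2f} us drift")
```

The ~0.42 microsecond per-cycle drift stays well inside the 3.2 microsecond 802.11n symbol duration, which is what makes beacon-based (rather than NTP) synchronization sufficient.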

4.3 Central Aggregator Embedding Fusion

The aggregator receives per-viewpoint AETHER embeddings (d=128 each) and applies RuView cross-viewpoint attention:

X = [emb_1; ...; emb_N]                       (N x d)
Q = X * W_q,   K = X * W_k,   V = X * W_v     (N x d each)
A = softmax((Q * K^T + G_bias) / sqrt(d))     (N x N)
RuView_out = A * V                            (N x d)

G_bias is a learnable geometric bias matrix encoding bistatic pair geometry. Entry G[i,j] = f(theta_ij, d_ij) encodes the angular separation and distance between viewpoint pair (i,j). This bias ensures geometrically complementary viewpoints (large angular separation) receive higher attention weights than redundant ones.

4.4 Bill of Materials

| Item | Qty | Unit Cost | Total | Notes |
|---|---|---|---|---|
| ESP32-S3-DevKitC-1 | 6 | $10 | $60 | Full multistatic mesh |
| USB hub + cables | 1+6 | $24 | $24 | Power and serial debug |
| WiFi router (any) | 1 | $0 | $0 | Existing infrastructure |
| Aggregator (laptop/RPi) | 1 | $0 | $0 | Existing hardware |
| **Total** | | | $84 | ~$14 per viewpoint |

5. Cognitum v1 Path

5.1 Cognitum as Baseband and Embedding Engine

Cognitum v1 provides a gating kernel for intelligent signal routing, pairable with wider-bandwidth RF front ends (e.g., LimeSDR Mini at ~$200). The architecture:

RF Front End (20-160 MHz BW) --> Cognitum Baseband --> AETHER Embedding --> RuView Fusion

This path overcomes the ESP32's 20 MHz bandwidth limitation, enabling CIR-domain features alongside frequency-domain CSI. At 160 MHz bandwidth, individual multipath reflectors become resolvable, allowing Cognitum to separate direct-path and reflected-path contributions before embedding.

5.2 AETHER Contrastive Embedding (ADR-024)

Per-viewpoint AETHER embeddings are produced by the CsiToPoseTransformer backbone:

  • Input: sanitized CSI frame (56 subcarriers x 2 antennas x 2 components)
  • Backbone: cross-attention transformer producing [17 x d_model] body part features
  • Projection: linear head maps pooled features to 128-d normalized embedding
  • Training: VICReg-style contrastive loss with three terms -- invariance (same pose from different viewpoints maps nearby), variance (embeddings use full capacity), covariance (embedding dimensions are decorrelated)
  • Augmentation: subcarrier dropout (p=0.1), phase noise injection (sigma=0.05 rad), temporal jitter (+-2 frames)
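The three VICReg-style loss terms listed above can be sketched in numpy; the function name, `gamma`, and `eps` defaults are illustrative assumptions, not the ADR-024 implementation:

```python
import numpy as np

def vicreg_terms(z_a, z_b, gamma=1.0, eps=1e-4):
    """Illustrative VICReg-style terms for two embedding batches [B, D]:
    invariance (MSE), variance hinge, and covariance penalty."""
    inv = np.mean((z_a - z_b) ** 2)           # same pose, different viewpoints

    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)    # per-dimension batch std
        return np.mean(np.maximum(0.0, gamma - std))

    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (len(z) - 1)
        off = cov - np.diag(np.diag(cov))     # off-diagonal entries only
        return (off ** 2).sum() / z.shape[1]

    return inv, var_term(z_a) + var_term(z_b), cov_term(z_a) + cov_term(z_b)

rng = np.random.default_rng(0)
z1 = rng.standard_normal((32, 128))
z2 = z1 + 0.1 * rng.standard_normal((32, 128))   # same pose, other viewpoint
inv, var, cov = vicreg_terms(z1, z2)
```

The invariance term pulls cross-viewpoint embeddings of the same pose together, while the variance and covariance terms prevent the collapse and redundancy failure modes that plain contrastive objectives are prone to.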

5.3 RuVector Graph Memory

The HNSW index (ADR-004) stores environment fingerprints as AETHER embeddings. Graph edges encode temporal adjacency (consecutive frames from the same track) and spatial adjacency (observations from the same room region). Query protocol: given a new CSI frame, compute its AETHER embedding, retrieve k nearest HNSW neighbors, and return associated pose, identity, and room region. Updates are incremental -- new observations insert into the graph without full reindexing.
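The query protocol above can be sketched with a brute-force cosine index standing in for the ADR-004 HNSW structure; the class and its methods are hypothetical names for illustration only:

```python
import numpy as np

class FingerprintMemory:
    """Brute-force stand-in for the ADR-004 HNSW index (illustrative only)."""
    def __init__(self, dim=128):
        self.embeddings = np.empty((0, dim))
        self.metadata = []                    # (pose, identity, region) per row

    def insert(self, embedding, pose, identity, region):
        e = embedding / np.linalg.norm(embedding)    # cosine space
        self.embeddings = np.vstack([self.embeddings, e])
        self.metadata.append((pose, identity, region))

    def query(self, embedding, k=5):
        q = embedding / np.linalg.norm(embedding)
        sims = self.embeddings @ q
        top = np.argsort(-sims)[:k]           # k nearest by cosine similarity
        return [(self.metadata[i], float(sims[i])) for i in top]

mem = FingerprintMemory()
rng = np.random.default_rng(1)
for i in range(100):
    mem.insert(rng.standard_normal(128), pose=f"pose_{i}", identity=i % 3, region="A")
# A slightly perturbed copy of fingerprint 7 should retrieve fingerprint 7
hits = mem.query(mem.embeddings[7] + 0.01 * rng.standard_normal(128), k=3)
```

The real index replaces the O(n) scan with HNSW's logarithmic search and supports the incremental inserts described above without reindexing.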

5.4 Coherence-Gated Updates

Environment changes (furniture moved, doors opened) corrupt stored fingerprints. RuView applies coherence gating:

coherence = |E[exp(j * delta_phi_t)]|   over T frames

if coherence > tau_coh (typically 0.7):
    update_environment_model(current_embedding)
else:
    mark_as_transient()

The complex mean of inter-frame phase differences measures environmental stability. Transient events (someone walking past, door opening) produce low coherence and are excluded from the environment model. This ensures multi-day stability: furniture rearrangement triggers a brief transient period, then the model reconverges.
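The coherence metric from the pseudocode above is a one-liner over the inter-frame phase differences; a runnable numpy sketch with simulated stable and transient phase tracks:

```python
import numpy as np

def phase_coherence(phases):
    """|E[exp(j * delta_phi_t)]| over consecutive-frame phase differences."""
    dphi = np.diff(phases)
    return np.abs(np.mean(np.exp(1j * dphi)))

rng = np.random.default_rng(2)
stable = np.cumsum(0.01 * rng.standard_normal(200))      # quiet room
transient = np.cumsum(1.5 * rng.standard_normal(200))    # someone walking past

TAU_COH = 0.7
for name, ph in [("stable", stable), ("transient", transient)]:
    c = phase_coherence(ph)
    action = "update environment model" if c > TAU_COH else "mark as transient"
    print(f"{name}: coherence {c:.2f} -> {action}")
```

Small phase increments keep the complex mean near the unit circle (coherence near 1), while large erratic increments spread it out and drive the coherence toward zero, which is exactly the gating behavior described above.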


6. IEEE 802.11bf Integration Points

IEEE 802.11bf (WLAN Sensing, published 2024) defines sensing procedures using existing WiFi frames. Key mechanisms:

  • Sensing Measurement Setup: Negotiation between sensing initiator and responder for measurement parameters
  • Sensing Measurement Report: Structured CSI feedback with standardized format
  • Trigger-Based Ranging (TBR): Time-of-flight measurement for distance estimation between stations

RuView maps directly onto 802.11bf constructs:

| RuView Component | 802.11bf Equivalent |
|---|---|
| TDM sensing protocol | Sensing Measurement sessions |
| Per-viewpoint CSI capture | Sensing Measurement Reports |
| Cross-viewpoint triangulation | TBR-based distance matrix |
| Geometric bias matrix | Station geometry from Measurement Setup |

Forward compatibility: the RuView TDM protocol is designed to be expressible within 802.11bf frame structures. When commodity APs implement 802.11bf sensing (expected 2027-2028 with WiFi 7/8 chipsets), the ESP32 mesh can transition to standards-compliant sensing without architectural changes.

Current gap: no commodity APs implement 802.11bf sensing yet. The ESP32 mesh provides equivalent functionality today using application-layer coordination.


7. RuVector Pipeline for RuView

Each of the five ruvector v2.0.4 crates maps to a new cross-viewpoint operation.

7.1 ruvector-mincut: Cross-Viewpoint Subcarrier Consensus

Current usage (ADR-017): per-viewpoint subcarrier selection via motion sensitivity scoring. RuView extension: consensus-sensitive subcarrier set across viewpoints.

  • Build graph: nodes = subcarriers, edges weighted by cross-viewpoint sensitivity correlation
  • Min-cut partitions into three classes: globally sensitive (correlated across all viewpoints), locally sensitive (informative for specific viewpoints), and insensitive (noise-dominated)
  • Use globally sensitive set for cross-viewpoint features; locally sensitive set for per-viewpoint refinement
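The three-way partition above can be illustrated with a simple thresholding sketch; this is a deliberate simplification of the ruvector-mincut graph partition (thresholds and the function name are assumptions):

```python
import numpy as np

def classify_subcarriers(sensitivity, hi=0.6, lo=0.2):
    """sensitivity: [n_viewpoints, n_subcarriers] motion-sensitivity scores.
    Threshold-based simplification of the ruvector-mincut three-way partition."""
    frac_sensitive = (sensitivity > hi).mean(axis=0)          # per subcarrier
    globally = np.where(frac_sensitive >= 0.8)[0]             # hot in nearly all viewpoints
    insensitive = np.where(sensitivity.max(axis=0) < lo)[0]   # noise-dominated everywhere
    locally = np.setdiff1d(np.arange(sensitivity.shape[1]),
                           np.union1d(globally, insensitive))
    return globally, locally, insensitive

sens = np.full((6, 10), 0.05)   # 6 viewpoints x 10 subcarriers
sens[:, :3] = 0.9               # hot everywhere  -> globally sensitive
sens[0, 3:5] = 0.9              # hot in one view -> locally sensitive
g, l, ins = classify_subcarriers(sens)
```

The min-cut formulation replaces these fixed thresholds with edge weights from cross-viewpoint sensitivity correlation, so the partition adapts to the room rather than to hand-picked constants.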

7.2 ruvector-attn-mincut: Viewpoint Attention Gating

Current usage: gate spectrogram frames by attention weight. RuView extension: gate viewpoints by geometric diversity.

  • Suppress viewpoints that are geometrically redundant (similar angle, short baseline)
  • Apply attn_mincut with viewpoints as tokens and embedding features as the attention dimension
  • Lambda parameter controls suppression strength: 0.1 (mild, keep most viewpoints) to 0.5 (aggressive, suppress redundant viewpoints)

7.3 ruvector-temporal-tensor: Multi-Viewpoint Compression

Current usage: tiered compression for single-stream CSI buffers. RuView extension: independent tier policies per viewpoint.

| Tier | Bit Depth | Assignment | Latency |
|---|---|---|---|
| Hot | 8-bit | Primary viewpoint (highest SNR) | Real-time |
| Warm | 5-7 bit | Secondary viewpoints | Real-time |
| Cold | 3-bit | Historical cross-viewpoint fusions | Archival |
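The tier policy can be illustrated with a plain uniform quantizer; this is a sketch of the bit-depth trade-off, not the actual ruvector-temporal-tensor codec:

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantizer over the tensor's own range (illustrative only)."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    q = np.round((x - lo) / (hi - lo) * levels)
    return q * (hi - lo) / levels + lo

TIER_BITS = {"hot": 8, "warm": 6, "cold": 3}   # warm shown at 6 of its 5-7 bit range
emb = np.random.default_rng(3).standard_normal(128)
errors = {tier: float(np.abs(quantize(emb, bits) - emb).max())
          for tier, bits in TIER_BITS.items()}
print(errors)   # reconstruction error grows as bit depth shrinks
```

Per-viewpoint tier assignment then becomes a lookup: quantize the primary viewpoint at 8 bits, secondaries at 5-7, and archived fusions at 3.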

7.4 ruvector-solver: Cross-Viewpoint Triangulation

Current usage (ADR-017): TDoA equations for single multi-AP scenarios. RuView extension: full bistatic geometry system solving.

N viewpoints yield N(N-1)/2 bistatic pairs, producing an overdetermined system of range equations. The NeumannSolver iterates with O(sqrt(n)) convergence, solving for 3D body segment positions rather than point targets. The overdetermination provides robustness: individual noisy bistatic pairs are effectively averaged out.
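The overdetermined bistatic system can be sketched with a plain Gauss-Newton solve in numpy, standing in for the NeumannSolver (node layout, noise level, and function names are illustrative assumptions):

```python
import numpy as np

def bistatic_ranges(p, nodes, pairs):
    """Bistatic range for pair (i, j): |p - node_i| + |p - node_j|."""
    return np.array([np.linalg.norm(p - nodes[i]) + np.linalg.norm(p - nodes[j])
                     for i, j in pairs])

def solve_position(nodes, pairs, measured, x0, iters=50):
    """Gauss-Newton on the overdetermined bistatic range system."""
    x = x0.astype(float)
    for _ in range(iters):
        r = bistatic_ranges(x, nodes, pairs) - measured
        # Jacobian row: d(|p-a| + |p-b|)/dp = unit(p-a) + unit(p-b)
        J = np.array([(x - nodes[i]) / np.linalg.norm(x - nodes[i])
                      + (x - nodes[j]) / np.linalg.norm(x - nodes[j])
                      for i, j in pairs])
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + dx
    return x

rng = np.random.default_rng(4)
nodes = rng.uniform(0, 5, size=(6, 3))                          # 6 mesh nodes, 5 m room
pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)]     # 15 bistatic pairs
truth = np.array([2.0, 3.0, 1.0])
measured = bistatic_ranges(truth, nodes, pairs) + 0.02 * rng.standard_normal(15)
est = solve_position(nodes, pairs, measured, x0=np.array([2.5, 2.5, 2.5]))
```

With 15 equations for 3 unknowns, per-pair noise largely averages out of the solution, illustrating the robustness claim above.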

7.5 ruvector-attention: RuView Core Fusion

This is the heart of RuView. Cross-viewpoint scaled dot-product attention:

Input: X = [emb_1, ..., emb_N] in R^{N x d}
Q = X * W_q,   K = X * W_k,   V = X * W_v
A = softmax((Q * K^T + G_bias) / sqrt(d))
output = A * V

G_bias is a learnable geometric bias derived from viewpoint pair geometry (angular separation, baseline distance). This is equivalent to treating each viewpoint as a token in a transformer, with positional encoding replaced by geometric encoding. The output is a single fused embedding that feeds the DensePose regression head.
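The equations above can be sketched in numpy. Note the specific bias form used here (rewarding angular separation, penalizing short baselines) is an illustrative assumption; in RuView, G_bias is learned:

```python
import numpy as np

def ruview_attention(X, Wq, Wk, Wv, theta, baseline, alpha=1.0, beta=0.1):
    """Cross-viewpoint attention with geometric bias.
    X: [N, d] per-viewpoint embeddings; theta, baseline: [N, N] pair geometry."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    G = alpha * np.sin(theta) - beta / np.maximum(baseline, 1e-6)  # assumed form
    S = (Q @ K.T + G) / np.sqrt(X.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)          # row-wise softmax
    return A @ V, A

N, d = 6, 128
rng = np.random.default_rng(5)
X = rng.standard_normal((N, d))                   # per-viewpoint AETHER embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
theta = rng.uniform(0, np.pi, (N, N))             # pairwise angular separation
baseline = rng.uniform(0.5, 7.0, (N, N))          # pairwise baseline distance
fused, A = ruview_attention(X, Wq, Wk, Wv, theta, baseline)
pooled = fused.mean(axis=0)                       # single embedding for the pose head
```

Treating viewpoints as tokens keeps the mechanism N-agnostic: adding or dropping a mesh node changes only the token count, not the model.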


8. Three-Metric Acceptance Suite

8.1 Metric 1: Joint Error (PCK / OKS)

| Criterion | Threshold | Notes |
|---|---|---|
| PCK@0.2 (all 17 keypoints) | >= 0.70 | 20% of torso diameter tolerance |
| PCK@0.2 (torso: shoulders, hips) | >= 0.80 | Core body must be stable |
| Mean OKS | >= 0.50 | COCO-standard evaluation |
| Torso jitter (RMS, 10 s windows) | < 3 cm | Temporal stability |
| Per-keypoint max error (95th pctl) | < 15 cm | No catastrophic outliers |

8.2 Metric 2: Multi-Person Separation

| Criterion | Threshold | Notes |
|---|---|---|
| Number of subjects | 2 | Minimum acceptance scenario |
| Capture rate | 20 Hz | Continuous tracking |
| Track duration | 10 minutes | Without intervention |
| Identity swaps (MOTA ID-switch) | 0 | Zero tolerance over full duration |
| Track fragmentation ratio | < 0.05 | Tracks must not break and reform |
| False track creation rate | 0 per minute | No phantom subjects |

8.3 Metric 3: Vital Sign Sensitivity

| Criterion | Threshold | Notes |
|---|---|---|
| Breathing rate detection | 6-30 BPM +/- 2 BPM | Stationary subject, 3 m range |
| Breathing band SNR | >= 6 dB | In 0.1-0.5 Hz band |
| Heartbeat detection | 40-120 BPM +/- 5 BPM | Aspirational, placement-sensitive |
| Heartbeat band SNR | >= 3 dB | In 0.8-2.0 Hz band (aspirational) |
| Micro-motion resolution | 1 mm chest displacement at 3 m | Breathing depth estimation |

8.4 Tiered Pass/Fail

| Tier | Requirements | Interpretation |
|---|---|---|
| Bronze | Metric 2 passes | Multi-person tracking works; minimum viable deployment |
| Silver | Metrics 1 + 2 pass | Tracking plus pose quality; production candidate |
| Gold | All three metrics pass | Tracking, pose, and vitals; full RuView deployment |

9. RuView vs Alternatives

| Capability | Single ESP32 | Intel 5300 | 6-Node ESP32 + RuView | Cognitum + RF + RuView | Camera DensePose |
|---|---|---|---|---|---|
| PCK@0.2 | ~0.20 | ~0.45 | ~0.70 (target) | ~0.80 (target) | ~0.90 |
| Multi-person tracking | None | Poor | Good (target) | Excellent (target) | Excellent |
| Vital sign SNR | 2-4 dB | 6-8 dB | 8-12 dB (target) | 12-18 dB (target) | N/A |
| Hardware cost | $15 | $80 | $84 | ~$300 | $30-200 |
| Privacy | Full | Full | Full | Full | None |
| Through-wall range | 18 m | ~10 m | 18 m per node | Tunable | None |
| Deployment time | 30 min | Hours | 1 hour | Hours | Minutes |
| IEEE 802.11bf ready | No | No | Forward-compatible | Forward-compatible | N/A |

The 6-node ESP32 + RuView configuration achieves 70-80% of camera DensePose accuracy at $84 total cost with complete visual privacy and through-wall capability. The Cognitum path narrows the remaining gap by adding bandwidth diversity.


10. References

WiFi Sensing and Pose Estimation

Signal Processing and Localization

MIMO Radar and Array Theory

Standards and Hardware

Project ADRs

  • ADR-004: HNSW Vector Search for CSI Fingerprinting
  • ADR-012: ESP32 CSI Sensor Mesh for Distributed Sensing
  • ADR-014: SOTA Signal Processing Algorithms for WiFi Sensing
  • ADR-016: RuVector Training Pipeline Integration
  • ADR-017: RuVector Signal and MAT Integration
  • ADR-024: Project AETHER -- Contrastive CSI Embedding Model