Claude 374b0fdcef docs: add RuView (ADR-028) sensing-first RF mode for multistatic fidelity
Introduce Project RuView — RuVector Viewpoint-Integrated Enhancement — a
sensing-first RF mode that improves WiFi DensePose fidelity through
cross-viewpoint embedding fusion on commodity ESP32 hardware.

Research document (docs/research/ruview-multistatic-fidelity-sota-2026.md):
- SOTA analysis of three fidelity levers: bandwidth, carrier frequency, viewpoints
- Multistatic array theory with virtual aperture and TDM sensing protocol
- ESP32 multistatic path ($84 BOM) and Cognitum v1 + RF front end path
- IEEE 802.11bf alignment and forward-compatibility mapping
- RuVector pipeline: all 5 crates mapped to cross-viewpoint operations
- Three-metric acceptance suite: joint error (PCK/OKS), multi-person
  separation (MOTA), vital sign sensitivity with Bronze/Silver/Gold tiers

ADR-028 (docs/adr/ADR-028-ruview-sensing-first-rf-mode.md):
- DDD bounded context: ViewpointFusion with MultistaticArray aggregate,
  ViewpointEmbedding entity, GeometricDiversityIndex value object
- Cross-viewpoint attention fusion via ruvector-attention with geometric bias
- TDM sensing protocol: 6 nodes, 119 Hz aggregate, 20 Hz per viewpoint
- Coherence-gated environment updates for multi-day stability
- File-level implementation plan across 4 phases (8 new source files)
- ADR interaction map: ADR-012, 014, 016/017, 021, 024, 027

2026-03-02 02:07:31 +00:00


RuView: Viewpoint-Integrated Enhancement for WiFi DensePose Fidelity

Date: 2026-03-02

Scope: Sensing-first RF mode design, multistatic geometry, ESP32 mesh architecture, Cognitum v1 integration, IEEE 802.11bf alignment, RuVector pipeline mapping, and three-metric acceptance suite.


1. Abstract and Motivation

WiFi-based dense human pose estimation faces three persistent fidelity bottlenecks that limit practical deployment:

  1. Pose jitter. Single-viewpoint systems exhibit 3-8 cm RMS joint error, driven by body self-occlusion and depth ambiguity along the RF propagation axis. Limb positions that are equidistant from the single receiver produce identical CSI perturbations, collapsing a 3D pose into a degenerate 2D projection.

  2. Multi-person ambiguity. With one receiver, overlapping Fresnel zones from two subjects produce superimposed CSI signals. State-of-the-art trackers report 0.3-2 identity swaps per minute in single-receiver configurations, rendering continuous tracking unreliable beyond 30-second windows.

  3. Vital sign noise floor. Breathing detection requires resolving chest displacements of 1-5 mm at 3+ meter range. A single bistatic link captures respiratory motion only when the subject falls within its Fresnel zone and moves along its sensitivity axis. Off-axis breathing is invisible.

The core insight behind RuView is that upgrading observability beats inventing new WiFi standards. Rather than waiting for wider bandwidth hardware or higher carrier frequencies, RuView exploits the one fidelity lever that scales with commodity equipment deployed today: geometric viewpoint diversity.

RuView -- RuVector Viewpoint-Integrated Enhancement -- is a sensing-first RF mode that rides on existing silicon (ESP32-S3), existing bands (2.4/5 GHz), and existing regulations (Part 15 unlicensed). Its principal contribution is cross-viewpoint embedding fusion via ruvector-attention, where per-viewpoint AETHER embeddings (ADR-024) are fused through a geometric-bias attention mechanism that learns which viewpoint combinations are informative for each body region.

Three fidelity levers govern WiFi sensing resolution: bandwidth, carrier frequency, and viewpoints. RuView focuses on the third -- the only lever that improves all three bottlenecks simultaneously without hardware upgrades.


2. Three Fidelity Levers: SOTA Analysis

2.1 Bandwidth

Channel impulse response (CIR) features separate multipath components by time-of-arrival. Multipath separability is governed by the minimum resolvable delay:

delta_tau_min = 1 / BW
| Standard | Bandwidth | Min Delay | Path Separation |
|---|---|---|---|
| 802.11n HT20 | 20 MHz | 50 ns | 15.0 m |
| 802.11ac VHT80 | 80 MHz | 12.5 ns | 3.75 m |
| 802.11ac VHT160 | 160 MHz | 6.25 ns | 1.87 m |
| 802.11be EHT320 | 320 MHz | 3.13 ns | 0.94 m |

Wider channels push the optimal feature domain from frequency (raw subcarrier CSI) toward time (CIR peaks), because multipath components become individually resolvable. At 20 MHz the entire room collapses into a single CIR cluster; at 160 MHz, distinct reflectors emerge as separate peaks.

ESP32-S3 operates at 20 MHz (HT20). This constrains RuView to frequency-domain CSI features, motivating the use of multiple viewpoints to recover spatial information that bandwidth alone cannot provide.

References: SpotFi (Kotaru et al., SIGCOMM 2015); IEEE 802.11bf sensing mode (2024).

2.2 Carrier Frequency

Phase sensitivity to displacement follows:

delta_phi = (4 * pi / lambda) * delta_d
| Band | Wavelength | Phase Shift per 1 mm | Wall Penetration |
|---|---|---|---|
| 2.4 GHz | 12.5 cm | 0.10 rad | Excellent (3+ walls) |
| 5 GHz | 6.0 cm | 0.21 rad | Moderate (1-2 walls) |
| 60 GHz | 5.0 mm | 2.51 rad | Line-of-sight only |

Higher carrier frequencies provide sharper motion sensitivity but sacrifice penetration. At 60 GHz (802.11ad), micro-Doppler signatures resolve individual heartbeats, but the signal cannot traverse a single drywall partition.

Fresnel zone radius at each band governs the sensing-sensitive region:

r_n = sqrt(n * lambda * d1 * d2 / (d1 + d2))

At 2.4 GHz with a 3 m link, the first Fresnel zone radius at the link midpoint is 0.31 m -- a broad sensitivity region suitable for macro-motion detection but poor for localizing specific body parts. At 5 GHz the radius shrinks to 0.21 m, improving localization at the cost of coverage.
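The Fresnel radius formula above is straightforward to evaluate directly; a minimal sketch (the function name is illustrative):

```python
import math

def fresnel_radius(n, wavelength_m, d1_m, d2_m):
    """n-th Fresnel zone radius at a point d1 from TX and d2 from RX."""
    return math.sqrt(n * wavelength_m * d1_m * d2_m / (d1_m + d2_m))

# First Fresnel zone at the midpoint of a 3 m link
r_24 = fresnel_radius(1, 0.125, 1.5, 1.5)   # 2.4 GHz -> ~0.31 m
r_5 = fresnel_radius(1, 0.060, 1.5, 1.5)    # 5 GHz   -> ~0.21 m
print(f"2.4 GHz: {r_24:.2f} m, 5 GHz: {r_5:.2f} m")
```

The radius is maximal at the link midpoint (d1 = d2) and shrinks toward either endpoint, which is why the midpoint value is the natural figure of merit for coverage.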

RuView currently targets 2.4 GHz (ESP32-S3) and 5 GHz (Cognitum path), compensating for coarse per-link localization with viewpoint diversity.

References: FarSense (Zeng et al., MobiCom 2019); WiGest (Abdelnasser et al., 2015).

2.3 Viewpoints (RuView Core Contribution)

A single-viewpoint system suffers from a fundamental geometric limitation: body self-occlusion removes information that no amount of signal processing can recover. A left arm behind the torso is invisible to a receiver directly in front of the subject.

Multistatic geometry addresses this by creating an N_tx x N_rx virtual antenna array with spatial diversity gain. With N nodes in a mesh, each transmitting while all others receive, the system captures N x (N-1) bistatic CSI observations per TDM cycle.

Geometric Diversity Index (GDI). RuView quantifies viewpoint quality as:

GDI = (1/N) * sum_i min_{j != i} |theta_i - theta_j|

where theta_i is the azimuth of the i-th bistatic pair relative to the room center. Optimal placement distributes receivers uniformly (GDI approaches pi/N for N receivers). Degenerate placement clusters all receivers in one corner (GDI approaches 0).

Cramer-Rao Lower Bound for pose estimation. With N independent viewpoints, CRLB decreases as O(1/N). With correlated viewpoints:

CRLB ~ O(1/N_eff),  where N_eff = N * (1 - rho_bar)

and rho_bar is the mean pairwise correlation between viewpoint CSI streams. Maximizing GDI minimizes rho_bar.
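The GDI and effective-viewpoint formulas above can be computed directly; a minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def gdi(azimuths_rad):
    """Geometric Diversity Index: mean nearest-neighbor angular separation."""
    th = np.asarray(azimuths_rad, dtype=float)
    diffs = np.abs(th[:, None] - th[None, :])
    np.fill_diagonal(diffs, np.inf)          # exclude j == i
    return diffs.min(axis=1).mean()

def n_effective(n_viewpoints, mean_pairwise_corr):
    """Effective independent viewpoints: N_eff = N * (1 - rho_bar)."""
    return n_viewpoints * (1.0 - mean_pairwise_corr)

uniform = np.linspace(0, np.pi, 6, endpoint=False)         # well spread
clustered = np.array([0.0, 0.05, 0.1, 0.15, 0.2, 0.25])    # one corner
print(gdi(uniform))        # pi/6 ~ 0.524: optimal placement
print(gdi(clustered))      # 0.05: degenerate placement
print(n_effective(6, 0.3)) # 4.2 effective viewpoints at rho_bar = 0.3
```

Spread placement scores near pi/N while clustered placement collapses toward zero, matching the optimal and degenerate cases described above.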

Multipath separability x viewpoints. Joint improvement follows a product law:

Effective_resolution ~ BW * N_viewpoints * sin(angular_spread)

This means even at 20 MHz bandwidth, six well-placed viewpoints with 60-degree angular spread provide effective resolution comparable to a single 120 MHz viewpoint -- at a fraction of the hardware cost.

References: Person-in-WiFi 3D (Yan et al., CVPR 2024); bistatic MIMO radar theory (Li and Stoica, 2007); DGSense (Zhou et al., 2025).


3. Multistatic Array Theory

3.1 Virtual Aperture

N transmitters and M receivers create N x M virtual antenna elements. For an ESP32 mesh where each of 6 nodes transmits in turn while 5 others receive:

Virtual elements = 6 * 5 = 30 bistatic pairs

The virtual aperture diameter equals the maximum baseline between any two nodes. In a 5m x 5m room with nodes at the perimeter, D_aperture ~ 7m (diagonal), yielding angular resolution:

delta_theta ~ lambda / D_aperture = 0.125 / 7 ~ 0.018 rad ~ 1.0 degree at 2.4 GHz

This exceeds the angular resolution of any single-antenna receiver by an order of magnitude.
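The virtual-aperture arithmetic is a two-line calculation:

```python
import math

wavelength = 0.125            # 2.4 GHz carrier, metres
aperture = 7.0                # max node baseline (5 m x 5 m room diagonal), metres
virtual_elements = 6 * 5      # N * (N - 1) bistatic pairs in a 6-node mesh

delta_theta_rad = wavelength / aperture
print(virtual_elements, math.degrees(delta_theta_rad))   # 30 pairs, ~1.0 degree
```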

3.2 Time-Division Sensing Protocol

TDM assigns each node an exclusive transmit slot while all other nodes receive. With N nodes, each gets 1/N duty cycle:

Per-viewpoint rate = f_aggregate / N

At 120 Hz aggregate TDM cycle rate with 6 nodes: 20 Hz per bistatic pair.

Synchronization. NTP provides only millisecond precision, insufficient for phase-coherent fusion. RuView uses beacon-based synchronization:

  • Coordinator node broadcasts a sync beacon at the start of each TDM cycle
  • Peripheral nodes align their slot timing to the beacon with crystal precision (~20-50 ppm)
  • At 120 Hz cycle rate (8.33 ms period), 50 ppm drift produces 0.42 microsecond error
  • This is well within the 802.11n symbol duration (3.2 microseconds), acceptable for feature-level and embedding-level fusion

3.3 Cross-Viewpoint Fusion Strategies

| Tier | Fusion Level | Requires | Benefit | ESP32 Feasible |
|---|---|---|---|---|
| 1 | Decision-level | Labels only | Majority vote on pose predictions | Yes |
| 2 | Feature-level | Aligned features | Better than any single viewpoint | Yes (ADR-012) |
| 3 | Embedding-level | AETHER embeddings | Learns what to fuse per body region | Yes (RuView) |

Decision-level fusion (Tier 1) discards information by reducing each viewpoint to a final prediction before combination. Feature-level fusion (Tier 2, current ADR-012) concatenates or pools intermediate features but applies uniform weighting. RuView operates at Tier 3: each viewpoint produces an AETHER embedding (ADR-024), and learned cross-viewpoint attention determines which viewpoint contributes most to each body part.


4. ESP32 Multistatic Array Path

4.1 Architecture Extension from ADR-012

ADR-012 defines feature-level fusion: amplitude, phase, and spectral features per node are aggregated via max/mean pooling across nodes. RuView extends this to embedding-level fusion:

Per Node:   CSI --> Signal Processing (ADR-014) --> AETHER Embedding (ADR-024)
Aggregator: [emb_1, emb_2, ..., emb_N] --> RuView Attention --> Fused Embedding
Output:     Fused Embedding --> DensePose Head --> 17 Keypoints + UV Maps

Each node runs the signal processing pipeline locally (conjugate multiplication, Hampel filtering, spectrogram extraction) and transmits a 128-dimensional AETHER embedding to the aggregator, rather than raw CSI. This reduces per-node bandwidth from ~14 KB/frame (56 subcarriers x 2 antennas x 2 components x 64 bytes) to 512 bytes/frame (128 floats x 4 bytes).

4.2 Time-Scheduled Captures

The TDM coordinator runs on the aggregator (laptop or Raspberry Pi). Protocol per cycle:

Beacon --> Slot_1 (node 1 TX, all others RX) --> Slot_2 --> ... --> Slot_N --> Repeat

Each slot requires approximately 1.4 ms (one 802.11n LLTF frame plus guard interval). With 6 nodes: 8.4 ms cycle duration, yielding 119 Hz aggregate rate and 19.8 Hz per bistatic pair.
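The timing budget above (slot duration, cycle rate, per-pair rate, and the Section 3.2 crystal-drift bound) reduces to a few lines of arithmetic:

```python
slot_ms = 1.4                             # one 802.11n LLTF frame + guard interval
n_nodes = 6

cycle_ms = slot_ms * n_nodes              # 8.4 ms TDM cycle
aggregate_hz = 1000.0 / cycle_ms          # ~119 Hz aggregate rate
per_pair_hz = aggregate_hz / n_nodes      # ~19.8 Hz per bistatic pair
drift_us = cycle_ms * 1e3 * 50e-6         # 50 ppm crystal drift over one cycle

print(f"{aggregate_hz:.0f} Hz aggregate, {per_pair_hz:.1f} Hz/pair, "
      f"{drift_us:.2f} us drift")
```

The ~0.42 microsecond per-cycle drift stays well inside the 3.2 microsecond 802.11n symbol duration, which is what makes beacon-based (rather than NTP) synchronization sufficient.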

4.3 Central Aggregator Embedding Fusion

The aggregator receives per-viewpoint AETHER embeddings (d=128 each) and applies RuView cross-viewpoint attention:

X = [emb_1; ...; emb_N]                       (N x d)
Q = X * W_q,   K = X * W_k,   V = X * W_v     (N x d each)
A = softmax((Q * K^T + G_bias) / sqrt(d))     (N x N)
RuView_out = A * V                            (N x d)

G_bias is a learnable geometric bias matrix encoding bistatic pair geometry. Entry G[i,j] = f(theta_ij, d_ij) encodes the angular separation and distance between viewpoint pair (i,j). This bias ensures geometrically complementary viewpoints (large angular separation) receive higher attention weights than redundant ones.

4.4 Bill of Materials

| Item | Qty | Unit Cost | Total | Notes |
|---|---|---|---|---|
| ESP32-S3-DevKitC-1 | 6 | $10 | $60 | Full multistatic mesh |
| USB hub + cables | 1+6 | $24 | $24 | Power and serial debug |
| WiFi router (any) | 1 | $0 | $0 | Existing infrastructure |
| Aggregator (laptop/RPi) | 1 | $0 | $0 | Existing hardware |
| **Total** | | | $84 | ~$14 per viewpoint |

5. Cognitum v1 Path

5.1 Cognitum as Baseband and Embedding Engine

Cognitum v1 provides a gating kernel for intelligent signal routing, pairable with wider-bandwidth RF front ends (e.g., LimeSDR Mini at ~$200). The architecture:

RF Front End (20-160 MHz BW) --> Cognitum Baseband --> AETHER Embedding --> RuView Fusion

This path overcomes the ESP32's 20 MHz bandwidth limitation, enabling CIR-domain features alongside frequency-domain CSI. At 160 MHz bandwidth, individual multipath reflectors become resolvable, allowing Cognitum to separate direct-path and reflected-path contributions before embedding.

5.2 AETHER Contrastive Embedding (ADR-024)

Per-viewpoint AETHER embeddings are produced by the CsiToPoseTransformer backbone:

  • Input: sanitized CSI frame (56 subcarriers x 2 antennas x 2 components)
  • Backbone: cross-attention transformer producing [17 x d_model] body part features
  • Projection: linear head maps pooled features to 128-d normalized embedding
  • Training: VICReg-style contrastive loss with three terms -- invariance (same pose from different viewpoints maps nearby), variance (embeddings use full capacity), covariance (embedding dimensions are decorrelated)
  • Augmentation: subcarrier dropout (p=0.1), phase noise injection (sigma=0.05 rad), temporal jitter (+-2 frames)
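The three VICReg-style loss terms listed above can be sketched in numpy; the function name, `gamma`, and `eps` defaults are illustrative assumptions, not the ADR-024 implementation:

```python
import numpy as np

def vicreg_terms(z_a, z_b, gamma=1.0, eps=1e-4):
    """Illustrative VICReg-style terms for two embedding batches [B, D]:
    invariance (MSE), variance hinge, and covariance penalty."""
    inv = np.mean((z_a - z_b) ** 2)           # same pose, different viewpoints

    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)    # per-dimension batch std
        return np.mean(np.maximum(0.0, gamma - std))

    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (len(z) - 1)
        off = cov - np.diag(np.diag(cov))     # off-diagonal entries only
        return (off ** 2).sum() / z.shape[1]

    return inv, var_term(z_a) + var_term(z_b), cov_term(z_a) + cov_term(z_b)

rng = np.random.default_rng(0)
z1 = rng.standard_normal((32, 128))
z2 = z1 + 0.1 * rng.standard_normal((32, 128))   # same pose, other viewpoint
inv, var, cov = vicreg_terms(z1, z2)
```

The invariance term pulls cross-viewpoint embeddings of the same pose together, while the variance and covariance terms prevent the collapse and redundancy failure modes that plain contrastive objectives are prone to.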

5.3 RuVector Graph Memory

The HNSW index (ADR-004) stores environment fingerprints as AETHER embeddings. Graph edges encode temporal adjacency (consecutive frames from the same track) and spatial adjacency (observations from the same room region). Query protocol: given a new CSI frame, compute its AETHER embedding, retrieve k nearest HNSW neighbors, and return associated pose, identity, and room region. Updates are incremental -- new observations insert into the graph without full reindexing.
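The query protocol above can be sketched with a brute-force cosine index standing in for the ADR-004 HNSW structure; the class and its methods are hypothetical names for illustration only:

```python
import numpy as np

class FingerprintMemory:
    """Brute-force stand-in for the ADR-004 HNSW index (illustrative only)."""
    def __init__(self, dim=128):
        self.embeddings = np.empty((0, dim))
        self.metadata = []                    # (pose, identity, region) per row

    def insert(self, embedding, pose, identity, region):
        e = embedding / np.linalg.norm(embedding)    # cosine space
        self.embeddings = np.vstack([self.embeddings, e])
        self.metadata.append((pose, identity, region))

    def query(self, embedding, k=5):
        q = embedding / np.linalg.norm(embedding)
        sims = self.embeddings @ q
        top = np.argsort(-sims)[:k]           # k nearest by cosine similarity
        return [(self.metadata[i], float(sims[i])) for i in top]

mem = FingerprintMemory()
rng = np.random.default_rng(1)
for i in range(100):
    mem.insert(rng.standard_normal(128), pose=f"pose_{i}", identity=i % 3, region="A")
# A slightly perturbed copy of fingerprint 7 should retrieve fingerprint 7
hits = mem.query(mem.embeddings[7] + 0.01 * rng.standard_normal(128), k=3)
```

The real index replaces the O(n) scan with HNSW's logarithmic search and supports the incremental inserts described above without reindexing.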

5.4 Coherence-Gated Updates

Environment changes (furniture moved, doors opened) corrupt stored fingerprints. RuView applies coherence gating:

coherence = |E[exp(j * delta_phi_t)]|   over T frames

if coherence > tau_coh (typically 0.7):
    update_environment_model(current_embedding)
else:
    mark_as_transient()

The complex mean of inter-frame phase differences measures environmental stability. Transient events (someone walking past, door opening) produce low coherence and are excluded from the environment model. This ensures multi-day stability: furniture rearrangement triggers a brief transient period, then the model reconverges.
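The coherence metric from the pseudocode above is a one-liner over the inter-frame phase differences; a runnable numpy sketch with simulated stable and transient phase tracks:

```python
import numpy as np

def phase_coherence(phases):
    """|E[exp(j * delta_phi_t)]| over consecutive-frame phase differences."""
    dphi = np.diff(phases)
    return np.abs(np.mean(np.exp(1j * dphi)))

rng = np.random.default_rng(2)
stable = np.cumsum(0.01 * rng.standard_normal(200))      # quiet room
transient = np.cumsum(1.5 * rng.standard_normal(200))    # someone walking past

TAU_COH = 0.7
for name, ph in [("stable", stable), ("transient", transient)]:
    c = phase_coherence(ph)
    action = "update environment model" if c > TAU_COH else "mark as transient"
    print(f"{name}: coherence {c:.2f} -> {action}")
```

Small phase increments keep the complex mean near the unit circle (coherence near 1), while large erratic increments spread it out and drive the coherence toward zero, which is exactly the gating behavior described above.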


6. IEEE 802.11bf Integration Points

IEEE 802.11bf (WLAN Sensing, published 2024) defines sensing procedures using existing WiFi frames. Key mechanisms:

  • Sensing Measurement Setup: Negotiation between sensing initiator and responder for measurement parameters
  • Sensing Measurement Report: Structured CSI feedback with standardized format
  • Trigger-Based Ranging (TBR): Time-of-flight measurement for distance estimation between stations

RuView maps directly onto 802.11bf constructs:

| RuView Component | 802.11bf Equivalent |
|---|---|
| TDM sensing protocol | Sensing Measurement sessions |
| Per-viewpoint CSI capture | Sensing Measurement Reports |
| Cross-viewpoint triangulation | TBR-based distance matrix |
| Geometric bias matrix | Station geometry from Measurement Setup |

Forward compatibility: the RuView TDM protocol is designed to be expressible within 802.11bf frame structures. When commodity APs implement 802.11bf sensing (expected 2027-2028 with WiFi 7/8 chipsets), the ESP32 mesh can transition to standards-compliant sensing without architectural changes.

Current gap: no commodity APs implement 802.11bf sensing yet. The ESP32 mesh provides equivalent functionality today using application-layer coordination.


7. RuVector Pipeline for RuView

Each of the five ruvector v2.0.4 crates maps to a new cross-viewpoint operation.

7.1 ruvector-mincut: Cross-Viewpoint Subcarrier Consensus

Current usage (ADR-017): per-viewpoint subcarrier selection via motion sensitivity scoring. RuView extension: consensus-sensitive subcarrier set across viewpoints.

  • Build graph: nodes = subcarriers, edges weighted by cross-viewpoint sensitivity correlation
  • Min-cut partitions into three classes: globally sensitive (correlated across all viewpoints), locally sensitive (informative for specific viewpoints), and insensitive (noise-dominated)
  • Use globally sensitive set for cross-viewpoint features; locally sensitive set for per-viewpoint refinement
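The three-way partition above can be illustrated with a simple thresholding sketch; this is a deliberate simplification of the ruvector-mincut graph partition (thresholds and the function name are assumptions):

```python
import numpy as np

def classify_subcarriers(sensitivity, hi=0.6, lo=0.2):
    """sensitivity: [n_viewpoints, n_subcarriers] motion-sensitivity scores.
    Threshold-based simplification of the ruvector-mincut three-way partition."""
    frac_sensitive = (sensitivity > hi).mean(axis=0)          # per subcarrier
    globally = np.where(frac_sensitive >= 0.8)[0]             # hot in nearly all viewpoints
    insensitive = np.where(sensitivity.max(axis=0) < lo)[0]   # noise-dominated everywhere
    locally = np.setdiff1d(np.arange(sensitivity.shape[1]),
                           np.union1d(globally, insensitive))
    return globally, locally, insensitive

sens = np.full((6, 10), 0.05)   # 6 viewpoints x 10 subcarriers
sens[:, :3] = 0.9               # hot everywhere  -> globally sensitive
sens[0, 3:5] = 0.9              # hot in one view -> locally sensitive
g, l, ins = classify_subcarriers(sens)
```

The min-cut formulation replaces these fixed thresholds with edge weights from cross-viewpoint sensitivity correlation, so the partition adapts to the room rather than to hand-picked constants.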

7.2 ruvector-attn-mincut: Viewpoint Attention Gating

Current usage: gate spectrogram frames by attention weight. RuView extension: gate viewpoints by geometric diversity.

  • Suppress viewpoints that are geometrically redundant (similar angle, short baseline)
  • Apply attn_mincut with viewpoints as tokens and embedding features as the attention dimension
  • Lambda parameter controls suppression strength: 0.1 (mild, keep most viewpoints) to 0.5 (aggressive, suppress redundant viewpoints)

7.3 ruvector-temporal-tensor: Multi-Viewpoint Compression

Current usage: tiered compression for single-stream CSI buffers. RuView extension: independent tier policies per viewpoint.

| Tier | Bit Depth | Assignment | Latency |
|---|---|---|---|
| Hot | 8-bit | Primary viewpoint (highest SNR) | Real-time |
| Warm | 5-7 bit | Secondary viewpoints | Real-time |
| Cold | 3-bit | Historical cross-viewpoint fusions | Archival |
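The tier policy can be illustrated with a plain uniform quantizer; this is a sketch of the bit-depth trade-off, not the actual ruvector-temporal-tensor codec:

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantizer over the tensor's own range (illustrative only)."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    q = np.round((x - lo) / (hi - lo) * levels)
    return q * (hi - lo) / levels + lo

TIER_BITS = {"hot": 8, "warm": 6, "cold": 3}   # warm shown at 6 of its 5-7 bit range
emb = np.random.default_rng(3).standard_normal(128)
errors = {tier: float(np.abs(quantize(emb, bits) - emb).max())
          for tier, bits in TIER_BITS.items()}
print(errors)   # reconstruction error grows as bit depth shrinks
```

Per-viewpoint tier assignment then becomes a lookup: quantize the primary viewpoint at 8 bits, secondaries at 5-7, and archived fusions at 3.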

7.4 ruvector-solver: Cross-Viewpoint Triangulation

Current usage (ADR-017): TDoA equations for single multi-AP scenarios. RuView extension: full bistatic geometry system solving.

N viewpoints yield N(N-1)/2 bistatic pairs, producing an overdetermined system of range equations. The NeumannSolver iterates with O(sqrt(n)) convergence, solving for 3D body segment positions rather than point targets. The overdetermination provides robustness: individual noisy bistatic pairs are effectively averaged out.
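The overdetermined bistatic system can be sketched with a plain Gauss-Newton solve in numpy, standing in for the NeumannSolver (node layout, noise level, and function names are illustrative assumptions):

```python
import numpy as np

def bistatic_ranges(p, nodes, pairs):
    """Bistatic range for pair (i, j): |p - node_i| + |p - node_j|."""
    return np.array([np.linalg.norm(p - nodes[i]) + np.linalg.norm(p - nodes[j])
                     for i, j in pairs])

def solve_position(nodes, pairs, measured, x0, iters=50):
    """Gauss-Newton on the overdetermined bistatic range system."""
    x = x0.astype(float)
    for _ in range(iters):
        r = bistatic_ranges(x, nodes, pairs) - measured
        # Jacobian row: d(|p-a| + |p-b|)/dp = unit(p-a) + unit(p-b)
        J = np.array([(x - nodes[i]) / np.linalg.norm(x - nodes[i])
                      + (x - nodes[j]) / np.linalg.norm(x - nodes[j])
                      for i, j in pairs])
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + dx
    return x

rng = np.random.default_rng(4)
nodes = rng.uniform(0, 5, size=(6, 3))                          # 6 mesh nodes, 5 m room
pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)]     # 15 bistatic pairs
truth = np.array([2.0, 3.0, 1.0])
measured = bistatic_ranges(truth, nodes, pairs) + 0.02 * rng.standard_normal(15)
est = solve_position(nodes, pairs, measured, x0=np.array([2.5, 2.5, 2.5]))
```

With 15 equations for 3 unknowns, per-pair noise largely averages out of the solution, illustrating the robustness claim above.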

7.5 ruvector-attention: RuView Core Fusion

This is the heart of RuView. Cross-viewpoint scaled dot-product attention:

Input: X = [emb_1, ..., emb_N] in R^{N x d}
Q = X * W_q,   K = X * W_k,   V = X * W_v
A = softmax((Q * K^T + G_bias) / sqrt(d))
output = A * V

G_bias is a learnable geometric bias derived from viewpoint pair geometry (angular separation, baseline distance). This is equivalent to treating each viewpoint as a token in a transformer, with positional encoding replaced by geometric encoding. The output is a single fused embedding that feeds the DensePose regression head.
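The equations above can be sketched in numpy. Note the specific bias form used here (rewarding angular separation, penalizing short baselines) is an illustrative assumption; in RuView, G_bias is learned:

```python
import numpy as np

def ruview_attention(X, Wq, Wk, Wv, theta, baseline, alpha=1.0, beta=0.1):
    """Cross-viewpoint attention with geometric bias.
    X: [N, d] per-viewpoint embeddings; theta, baseline: [N, N] pair geometry."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    G = alpha * np.sin(theta) - beta / np.maximum(baseline, 1e-6)  # assumed form
    S = (Q @ K.T + G) / np.sqrt(X.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)          # row-wise softmax
    return A @ V, A

N, d = 6, 128
rng = np.random.default_rng(5)
X = rng.standard_normal((N, d))                   # per-viewpoint AETHER embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
theta = rng.uniform(0, np.pi, (N, N))             # pairwise angular separation
baseline = rng.uniform(0.5, 7.0, (N, N))          # pairwise baseline distance
fused, A = ruview_attention(X, Wq, Wk, Wv, theta, baseline)
pooled = fused.mean(axis=0)                       # single embedding for the pose head
```

Treating viewpoints as tokens keeps the mechanism N-agnostic: adding or dropping a mesh node changes only the token count, not the model.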


8. Three-Metric Acceptance Suite

8.1 Metric 1: Joint Error (PCK / OKS)

| Criterion | Threshold | Notes |
|---|---|---|
| PCK@0.2 (all 17 keypoints) | >= 0.70 | 20% of torso diameter tolerance |
| PCK@0.2 (torso: shoulders, hips) | >= 0.80 | Core body must be stable |
| Mean OKS | >= 0.50 | COCO-standard evaluation |
| Torso jitter (RMS, 10 s windows) | < 3 cm | Temporal stability |
| Per-keypoint max error (95th pctl) | < 15 cm | No catastrophic outliers |

8.2 Metric 2: Multi-Person Separation

| Criterion | Threshold | Notes |
|---|---|---|
| Number of subjects | 2 | Minimum acceptance scenario |
| Capture rate | 20 Hz | Continuous tracking |
| Track duration | 10 minutes | Without intervention |
| Identity swaps (MOTA ID-switch) | 0 | Zero tolerance over full duration |
| Track fragmentation ratio | < 0.05 | Tracks must not break and reform |
| False track creation rate | 0 per minute | No phantom subjects |

8.3 Metric 3: Vital Sign Sensitivity

| Criterion | Threshold | Notes |
|---|---|---|
| Breathing rate detection | 6-30 BPM +/- 2 BPM | Stationary subject, 3 m range |
| Breathing band SNR | >= 6 dB | In 0.1-0.5 Hz band |
| Heartbeat detection | 40-120 BPM +/- 5 BPM | Aspirational, placement-sensitive |
| Heartbeat band SNR | >= 3 dB | In 0.8-2.0 Hz band (aspirational) |
| Micro-motion resolution | 1 mm chest displacement at 3 m | Breathing depth estimation |

8.4 Tiered Pass/Fail

| Tier | Requirements | Interpretation |
|---|---|---|
| Bronze | Metric 2 passes | Multi-person tracking works; minimum viable deployment |
| Silver | Metrics 1 + 2 pass | Tracking plus pose quality; production candidate |
| Gold | All three metrics pass | Tracking, pose, and vitals; full RuView deployment |

9. RuView vs Alternatives

| Capability | Single ESP32 | Intel 5300 | 6-Node ESP32 + RuView | Cognitum + RF + RuView | Camera DensePose |
|---|---|---|---|---|---|
| PCK@0.2 | ~0.20 | ~0.45 | ~0.70 (target) | ~0.80 (target) | ~0.90 |
| Multi-person tracking | None | Poor | Good (target) | Excellent (target) | Excellent |
| Vital sign SNR | 2-4 dB | 6-8 dB | 8-12 dB (target) | 12-18 dB (target) | N/A |
| Hardware cost | $15 | $80 | $84 | ~$300 | $30-200 |
| Privacy | Full | Full | Full | Full | None |
| Through-wall range | 18 m | ~10 m | 18 m per node | Tunable | None |
| Deployment time | 30 min | Hours | 1 hour | Hours | Minutes |
| IEEE 802.11bf ready | No | No | Forward-compatible | Forward-compatible | N/A |

The 6-node ESP32 + RuView configuration achieves 70-80% of camera DensePose accuracy at $84 total cost with complete visual privacy and through-wall capability. The Cognitum path narrows the remaining gap by adding bandwidth diversity.


10. References

WiFi Sensing and Pose Estimation

Signal Processing and Localization

MIMO Radar and Array Theory

Standards and Hardware

Project ADRs

  • ADR-004: HNSW Vector Search for CSI Fingerprinting
  • ADR-012: ESP32 CSI Sensor Mesh for Distributed Sensing
  • ADR-014: SOTA Signal Processing Algorithms for WiFi Sensing
  • ADR-016: RuVector Training Pipeline Integration
  • ADR-017: RuVector Signal and MAT Integration
  • ADR-024: Project AETHER -- Contrastive CSI Embedding Model