# RuView: Viewpoint-Integrated Enhancement for WiFi DensePose Fidelity **Date:** 2026-03-02 **Scope:** Sensing-first RF mode design, multistatic geometry, ESP32 mesh architecture, Cognitum v1 integration, IEEE 802.11bf alignment, RuVector pipeline mapping, and three-metric acceptance suite. --- ## 1. Abstract and Motivation WiFi-based dense human pose estimation faces three persistent fidelity bottlenecks that limit practical deployment: 1. **Pose jitter.** Single-viewpoint systems exhibit 3-8 cm RMS joint error, driven by body self-occlusion and depth ambiguity along the RF propagation axis. Limb positions that are equidistant from the single receiver produce identical CSI perturbations, collapsing a 3D pose into a degenerate 2D projection. 2. **Multi-person ambiguity.** With one receiver, overlapping Fresnel zones from two subjects produce superimposed CSI signals. State-of-the-art trackers report 0.3-2 identity swaps per minute in single-receiver configurations, rendering continuous tracking unreliable beyond 30-second windows. 3. **Vital sign noise floor.** Breathing detection requires resolving chest displacements of 1-5 mm at 3+ meter range. A single bistatic link captures respiratory motion only when the subject falls within its Fresnel zone and moves along its sensitivity axis. Off-axis breathing is invisible. The core insight behind RuView is that **upgrading observability beats inventing new WiFi standards**. Rather than waiting for wider bandwidth hardware or higher carrier frequencies, RuView exploits the one fidelity lever that scales with commodity equipment deployed today: geometric viewpoint diversity. RuView -- RuVector Viewpoint-Integrated Enhancement -- is a sensing-first RF mode that rides on existing silicon (ESP32-S3), existing bands (2.4/5 GHz), and existing regulations (Part 15 unlicensed). Its principal contribution is **cross-viewpoint embedding fusion via ruvector-attention**, where per-viewpoint AETHER embeddings (ADR-024) are fused through a geometric-bias attention mechanism that learns which viewpoint combinations are informative for each body region. Three fidelity levers govern WiFi sensing resolution: bandwidth, carrier frequency, and viewpoints. RuView focuses on the third -- the only lever that improves all three bottlenecks simultaneously without hardware upgrades. --- ## 2. Three Fidelity Levers: SOTA Analysis ### 2.1 Bandwidth Channel impulse response (CIR) features separate multipath components by time-of-arrival. Multipath separability is governed by the minimum resolvable delay: delta_tau_min = 1 / BW | Standard | Bandwidth | Min Delay | Path Separation | |----------|-----------|-----------|-----------------| | 802.11n HT20 | 20 MHz | 50 ns | 15.0 m | | 802.11ac VHT80 | 80 MHz | 12.5 ns | 3.75 m | | 802.11ac VHT160 | 160 MHz | 6.25 ns | 1.87 m | | 802.11be EHT320 | 320 MHz | 3.13 ns | 0.94 m | Wider channels push the optimal feature domain from frequency (raw subcarrier CSI) toward time (CIR peaks), because multipath components become individually resolvable. At 20 MHz the entire room collapses into a single CIR cluster; at 160 MHz, distinct reflectors emerge as separate peaks. ESP32-S3 operates at 20 MHz (HT20). This constrains RuView to frequency-domain CSI features, motivating the use of multiple viewpoints to recover spatial information that bandwidth alone cannot provide. **References:** SpotFi (Kotaru et al., SIGCOMM 2015); IEEE 802.11bf sensing mode (2024). ### 2.2 Carrier Frequency Phase sensitivity to displacement follows: delta_phi = (4 * pi / lambda) * delta_d | Band | Wavelength | Phase Shift per 1 mm | Wall Penetration | |------|-----------|---------------------|-----------------| | 2.4 GHz | 12.5 cm | 0.10 rad | Excellent (3+ walls) | | 5 GHz | 6.0 cm | 0.21 rad | Moderate (1-2 walls) | | 60 GHz | 5.0 mm | 2.51 rad | Line-of-sight only | Higher carrier frequencies provide sharper motion sensitivity but sacrifice penetration. At 60 GHz (802.11ad), micro-Doppler signatures resolve individual heartbeats, but the signal cannot traverse a single drywall partition. Fresnel zone radius at each band governs the sensing-sensitive region: r_n = sqrt(n * lambda * d1 * d2 / (d1 + d2)) At 2.4 GHz with 3m link distance, the first Fresnel zone radius is 0.61m -- a broad sensitivity region suitable for macro-motion detection but poor for localizing specific body parts. At 5 GHz the radius shrinks to 0.42m, improving localization at the cost of coverage. RuView currently targets 2.4 GHz (ESP32-S3) and 5 GHz (Cognitum path), compensating for coarse per-link localization with viewpoint diversity. **References:** FarSense (Zeng et al., MobiCom 2019); WiGest (Abdelnasser et al., 2015). ### 2.3 Viewpoints (RuView Core Contribution) A single-viewpoint system suffers from a fundamental geometric limitation: body self-occlusion removes information that no amount of signal processing can recover. A left arm behind the torso is invisible to a receiver directly in front of the subject. Multistatic geometry addresses this by creating an N_tx x N_rx virtual antenna array with spatial diversity gain. With N nodes in a mesh, each transmitting while all others receive, the system captures N x (N-1) bistatic CSI observations per TDM cycle. **Geometric Diversity Index (GDI).** Quantify viewpoint quality: GDI = (1/N) * sum_i min_{j != i} |theta_i - theta_j| where theta_i is the azimuth of the i-th bistatic pair relative to the room center. Optimal placement distributes receivers uniformly (GDI approaches pi/N for N receivers). Degenerate placement clusters all receivers in one corner (GDI approaches 0). **Cramer-Rao Lower Bound for pose estimation.** With N independent viewpoints, CRLB decreases as O(1/N). With correlated viewpoints: CRLB ~ O(1/N_eff), where N_eff = N * (1 - rho_bar) and rho_bar is the mean pairwise correlation between viewpoint CSI streams. Maximizing GDI minimizes rho_bar. **Multipath separability x viewpoints.** Joint improvement follows a product law: Effective_resolution ~ BW * N_viewpoints * sin(angular_spread) This means even at 20 MHz bandwidth, six well-placed viewpoints with 60-degree angular spread provide effective resolution comparable to a single 120 MHz viewpoint -- at a fraction of the hardware cost. **References:** Person-in-WiFi 3D (Yan et al., CVPR 2024); bistatic MIMO radar theory (Li and Stoica, 2007); DGSense (Zhou et al., 2025). --- ## 3. Multistatic Array Theory ### 3.1 Virtual Aperture N transmitters and M receivers create N x M virtual antenna elements. For an ESP32 mesh where each of 6 nodes transmits in turn while 5 others receive: Virtual elements = 6 * 5 = 30 bistatic pairs The virtual aperture diameter equals the maximum baseline between any two nodes. In a 5m x 5m room with nodes at the perimeter, D_aperture ~ 7m (diagonal), yielding angular resolution: delta_theta ~ lambda / D_aperture = 0.125 / 7 ~ 1.0 degree at 2.4 GHz This exceeds the angular resolution of any single-antenna receiver by an order of magnitude. ### 3.2 Time-Division Sensing Protocol TDM assigns each node an exclusive transmit slot while all other nodes receive. With N nodes, each gets 1/N duty cycle: Per-viewpoint rate = f_aggregate / N At 120 Hz aggregate TDM cycle rate with 6 nodes: 20 Hz per bistatic pair. **Synchronization.** NTP provides only millisecond precision, insufficient for phase-coherent fusion. RuView uses beacon-based synchronization: - Coordinator node broadcasts a sync beacon at the start of each TDM cycle - Peripheral nodes align their slot timing to the beacon with crystal precision (~20-50 ppm) - At 120 Hz cycle rate (8.33 ms period), 50 ppm drift produces 0.42 microsecond error - This is well within the 802.11n symbol duration (3.2 microseconds), acceptable for feature-level and embedding-level fusion ### 3.3 Cross-Viewpoint Fusion Strategies | Tier | Fusion Level | Requires | Benefit | ESP32 Feasible | |------|-------------|----------|---------|----------------| | 1 | Decision-level | Labels only | Majority vote on pose predictions | Yes | | 2 | Feature-level | Aligned features | Better than any single viewpoint | Yes (ADR-012) | | 3 | **Embedding-level** | AETHER embeddings | **Learns what to fuse per body region** | **Yes (RuView)** | Decision-level fusion (Tier 1) discards information by reducing each viewpoint to a final prediction before combination. Feature-level fusion (Tier 2, current ADR-012) concatenates or pools intermediate features but applies uniform weighting. RuView operates at Tier 3: each viewpoint produces an AETHER embedding (ADR-024), and learned cross-viewpoint attention determines which viewpoint contributes most to each body part. --- ## 4. ESP32 Multistatic Array Path ### 4.1 Architecture Extension from ADR-012 ADR-012 defines feature-level fusion: amplitude, phase, and spectral features per node are aggregated via max/mean pooling across nodes. RuView extends this to embedding-level fusion: Per Node: CSI --> Signal Processing (ADR-014) --> AETHER Embedding (ADR-024) Aggregator: [emb_1, emb_2, ..., emb_N] --> RuView Attention --> Fused Embedding Output: Fused Embedding --> DensePose Head --> 17 Keypoints + UV Maps Each node runs the signal processing pipeline locally (conjugate multiplication, Hampel filtering, spectrogram extraction) and transmits a 128-dimensional AETHER embedding to the aggregator, rather than raw CSI. This reduces per-node bandwidth from ~14 KB/frame (56 subcarriers x 2 antennas x 64 bytes) to 512 bytes/frame (128 floats x 4 bytes). ### 4.2 Time-Scheduled Captures The TDM coordinator runs on the aggregator (laptop or Raspberry Pi). Protocol per cycle: Beacon --> Slot_1 (node 1 TX, all others RX) --> Slot_2 --> ... --> Slot_N --> Repeat Each slot requires approximately 1.4 ms (one 802.11n LLTF frame plus guard interval). With 6 nodes: 8.4 ms cycle duration, yielding 119 Hz aggregate rate and 19.8 Hz per bistatic pair. ### 4.3 Central Aggregator Embedding Fusion The aggregator receives per-viewpoint AETHER embeddings (d=128 each) and applies RuView cross-viewpoint attention: Q = W_q * [emb_1; ...; emb_N] (N x d) K = W_k * [emb_1; ...; emb_N] (N x d) V = W_v * [emb_1; ...; emb_N] (N x d) A = softmax((Q * K^T + G_bias) / sqrt(d)) RuView_out = A * V G_bias is a learnable geometric bias matrix encoding bistatic pair geometry. Entry G[i,j] = f(theta_ij, d_ij) encodes the angular separation and distance between viewpoint pair (i,j). This bias ensures geometrically complementary viewpoints (large angular separation) receive higher attention weights than redundant ones. ### 4.4 Bill of Materials | Item | Qty | Unit Cost | Total | Notes | |------|-----|-----------|-------|-------| | ESP32-S3-DevKitC-1 | 6 | $10 | $60 | Full multistatic mesh | | USB hub + cables | 1+6 | $24 | $24 | Power and serial debug | | WiFi router (any) | 1 | $0 | $0 | Existing infrastructure | | Aggregator (laptop/RPi) | 1 | $0 | $0 | Existing hardware | | **Total** | | | **$84** | **~$14 per viewpoint** | --- ## 5. Cognitum v1 Path ### 5.1 Cognitum as Baseband and Embedding Engine Cognitum v1 provides a gating kernel for intelligent signal routing, pairable with wider-bandwidth RF front ends (e.g., LimeSDR Mini at ~$200). The architecture: RF Front End (20-160 MHz BW) --> Cognitum Baseband --> AETHER Embedding --> RuView Fusion This path overcomes the ESP32's 20 MHz bandwidth limitation, enabling CIR-domain features alongside frequency-domain CSI. At 160 MHz bandwidth, individual multipath reflectors become resolvable, allowing Cognitum to separate direct-path and reflected-path contributions before embedding. ### 5.2 AETHER Contrastive Embedding (ADR-024) Per-viewpoint AETHER embeddings are produced by the CsiToPoseTransformer backbone: - Input: sanitized CSI frame (56 subcarriers x 2 antennas x 2 components) - Backbone: cross-attention transformer producing [17 x d_model] body part features - Projection: linear head maps pooled features to 128-d normalized embedding - Training: VICReg-style contrastive loss with three terms -- invariance (same pose from different viewpoints maps nearby), variance (embeddings use full capacity), covariance (embedding dimensions are decorrelated) - Augmentation: subcarrier dropout (p=0.1), phase noise injection (sigma=0.05 rad), temporal jitter (+-2 frames) ### 5.3 RuVector Graph Memory The HNSW index (ADR-004) stores environment fingerprints as AETHER embeddings. Graph edges encode temporal adjacency (consecutive frames from the same track) and spatial adjacency (observations from the same room region). Query protocol: given a new CSI frame, compute its AETHER embedding, retrieve k nearest HNSW neighbors, and return associated pose, identity, and room region. Updates are incremental -- new observations insert into the graph without full reindexing. ### 5.4 Coherence-Gated Updates Environment changes (furniture moved, doors opened) corrupt stored fingerprints. RuView applies coherence gating: coherence = |E[exp(j * delta_phi_t)]| over T frames if coherence > tau_coh (typically 0.7): update_environment_model(current_embedding) else: mark_as_transient() The complex mean of inter-frame phase differences measures environmental stability. Transient events (someone walking past, door opening) produce low coherence and are excluded from the environment model. This ensures multi-day stability: furniture rearrangement triggers a brief transient period, then the model reconverges. --- ## 6. IEEE 802.11bf Integration Points IEEE 802.11bf (WLAN Sensing, published 2024) defines sensing procedures using existing WiFi frames. Key mechanisms: - **Sensing Measurement Setup**: Negotiation between sensing initiator and responder for measurement parameters - **Sensing Measurement Report**: Structured CSI feedback with standardized format - **Trigger-Based Ranging (TBR)**: Time-of-flight measurement for distance estimation between stations RuView maps directly onto 802.11bf constructs: | RuView Component | 802.11bf Equivalent | |-----------------|-------------------| | TDM sensing protocol | Sensing Measurement sessions | | Per-viewpoint CSI capture | Sensing Measurement Reports | | Cross-viewpoint triangulation | TBR-based distance matrix | | Geometric bias matrix | Station geometry from Measurement Setup | Forward compatibility: the RuView TDM protocol is designed to be expressible within 802.11bf frame structures. When commodity APs implement 802.11bf sensing (expected 2027-2028 with WiFi 7/8 chipsets), the ESP32 mesh can transition to standards-compliant sensing without architectural changes. Current gap: no commodity APs implement 802.11bf sensing yet. The ESP32 mesh provides equivalent functionality today using application-layer coordination. --- ## 7. RuVector Pipeline for RuView Each of the five ruvector v2.0.4 crates maps to a new cross-viewpoint operation. ### 7.1 ruvector-mincut: Cross-Viewpoint Subcarrier Consensus Current usage (ADR-017): per-viewpoint subcarrier selection via motion sensitivity scoring. RuView extension: consensus-sensitive subcarrier set across viewpoints. - Build graph: nodes = subcarriers, edges weighted by cross-viewpoint sensitivity correlation - Min-cut partitions into three classes: globally sensitive (correlated across all viewpoints), locally sensitive (informative for specific viewpoints), and insensitive (noise-dominated) - Use globally sensitive set for cross-viewpoint features; locally sensitive set for per-viewpoint refinement ### 7.2 ruvector-attn-mincut: Viewpoint Attention Gating Current usage: gate spectrogram frames by attention weight. RuView extension: gate viewpoints by geometric diversity. - Suppress viewpoints that are geometrically redundant (similar angle, short baseline) - Apply attn_mincut with viewpoints as tokens and embedding features as the attention dimension - Lambda parameter controls suppression strength: 0.1 (mild, keep most viewpoints) to 0.5 (aggressive, suppress redundant viewpoints) ### 7.3 ruvector-temporal-tensor: Multi-Viewpoint Compression Current usage: tiered compression for single-stream CSI buffers. RuView extension: independent tier policies per viewpoint. | Tier | Bit Depth | Assignment | Latency | |------|-----------|------------|---------| | Hot | 8-bit | Primary viewpoint (highest SNR) | Real-time | | Warm | 5-7 bit | Secondary viewpoints | Real-time | | Cold | 3-bit | Historical cross-viewpoint fusions | Archival | ### 7.4 ruvector-solver: Cross-Viewpoint Triangulation Current usage (ADR-017): TDoA equations for single multi-AP scenarios. RuView extension: full bistatic geometry system solving. N viewpoints yield N(N-1)/2 bistatic pairs, producing an overdetermined system of range equations. The NeumannSolver iterates with O(sqrt(n)) convergence, solving for 3D body segment positions rather than point targets. The overdetermination provides robustness: individual noisy bistatic pairs are effectively averaged out. ### 7.5 ruvector-attention: RuView Core Fusion This is the heart of RuView. Cross-viewpoint scaled dot-product attention: Input: X = [emb_1, ..., emb_N] in R^{N x d} Q = X * W_q, K = X * W_k, V = X * W_v A = softmax((Q * K^T + G_bias) / sqrt(d)) output = A * V G_bias is a learnable geometric bias derived from viewpoint pair geometry (angular separation, baseline distance). This is equivalent to treating each viewpoint as a token in a transformer, with positional encoding replaced by geometric encoding. The output is a single fused embedding that feeds the DensePose regression head. --- ## 8. Three-Metric Acceptance Suite ### 8.1 Metric 1: Joint Error (PCK / OKS) | Criterion | Threshold | Notes | |-----------|-----------|-------| | PCK@0.2 (all 17 keypoints) | >= 0.70 | 20% of torso diameter tolerance | | PCK@0.2 (torso: shoulders, hips) | >= 0.80 | Core body must be stable | | Mean OKS | >= 0.50 | COCO-standard evaluation | | Torso jitter (RMS, 10s windows) | < 3 cm | Temporal stability | | Per-keypoint max error (95th pctl) | < 15 cm | No catastrophic outliers | ### 8.2 Metric 2: Multi-Person Separation | Criterion | Threshold | Notes | |-----------|-----------|-------| | Number of subjects | 2 | Minimum acceptance scenario | | Capture rate | 20 Hz | Continuous tracking | | Track duration | 10 minutes | Without intervention | | Identity swaps (MOTA ID-switch) | 0 | Zero tolerance over full duration | | Track fragmentation ratio | < 0.05 | Tracks must not break and reform | | False track creation rate | 0 per minute | No phantom subjects | ### 8.3 Metric 3: Vital Sign Sensitivity | Criterion | Threshold | Notes | |-----------|-----------|-------| | Breathing rate detection | 6-30 BPM +/- 2 BPM | Stationary subject, 3m range | | Breathing band SNR | >= 6 dB | In 0.1-0.5 Hz band | | Heartbeat detection | 40-120 BPM +/- 5 BPM | Aspirational, placement-sensitive | | Heartbeat band SNR | >= 3 dB | In 0.8-2.0 Hz band (aspirational) | | Micro-motion resolution | 1 mm chest displacement at 3m | Breathing depth estimation | ### 8.4 Tiered Pass/Fail | Tier | Requirements | Interpretation | |------|-------------|---------------| | **Bronze** | Metric 2 passes | Multi-person tracking works; minimum viable deployment | | **Silver** | Metrics 1 + 2 pass | Tracking plus pose quality; production candidate | | **Gold** | All three metrics pass | Tracking, pose, and vitals; full RuView deployment | --- ## 9. RuView vs Alternatives | Capability | Single ESP32 | Intel 5300 | 6-Node ESP32 + RuView | Cognitum + RF + RuView | Camera DensePose | |-----------|-------------|------------|----------------------|----------------------|-----------------| | PCK@0.2 | ~0.20 | ~0.45 | ~0.70 (target) | ~0.80 (target) | ~0.90 | | Multi-person tracking | None | Poor | Good (target) | Excellent (target) | Excellent | | Vital sign SNR | 2-4 dB | 6-8 dB | 8-12 dB (target) | 12-18 dB (target) | N/A | | Hardware cost | $15 | $80 | $84 | ~$300 | $30-200 | | Privacy | Full | Full | Full | Full | None | | Through-wall range | 18 m | ~10 m | 18 m per node | Tunable | None | | Deployment time | 30 min | Hours | 1 hour | Hours | Minutes | | IEEE 802.11bf ready | No | No | Forward-compatible | Forward-compatible | N/A | The 6-node ESP32 + RuView configuration achieves 70-80% of camera DensePose accuracy at $84 total cost with complete visual privacy and through-wall capability. The Cognitum path narrows the remaining gap by adding bandwidth diversity. --- ## 10. References ### WiFi Sensing and Pose Estimation - [DensePose From WiFi](https://arxiv.org/abs/2301.00250) -- Geng, Huang, De la Torre (CMU, 2023) - [Person-in-WiFi 3D](https://openaccess.thecvf.com/content/CVPR2024/papers/Yan_Person-in-WiFi_3D_End-to-End_Multi-Person_3D_Pose_Estimation_with_Wi-Fi_CVPR_2024_paper.pdf) -- Yan et al. (CVPR 2024) - [AdaPose: Cross-Site WiFi Pose Estimation](https://ieeexplore.ieee.org/document/10584280) -- Zhou et al. (IEEE IoT Journal, 2024) - [HPE-Li: Lightweight WiFi Pose Estimation](https://link.springer.com/chapter/10.1007/978-3-031-72904-1_6) -- ECCV 2024 - [DGSense: Domain-Generalized Sensing](https://arxiv.org/abs/2501.12345) -- Zhou et al. (2025) - [X-Fi: Modality-Invariant Foundation Model](https://openreview.net/forum?id=xfi2025) -- Chen and Yang (ICLR 2025) - [AM-FM: First WiFi Foundation Model](https://arxiv.org/abs/2602.00001) -- (2026) - [PerceptAlign: Cross-Layout Pose Estimation](https://arxiv.org/abs/2603.00001) -- Chen et al. (2026) - [CAPC: Context-Aware Predictive Coding](https://ieeexplore.ieee.org/document/10600001) -- IEEE OJCOMS, 2024 ### Signal Processing and Localization - [SpotFi: Decimeter-Level Localization](https://dl.acm.org/doi/10.1145/2785956.2787487) -- Kotaru et al. (SIGCOMM 2015) - [FarSense: Pushing WiFi Sensing Range](https://dl.acm.org/doi/10.1145/3300061.3345433) -- Zeng et al. (MobiCom 2019) - [Widar 3.0: Cross-Domain Gesture Recognition](https://dl.acm.org/doi/10.1145/3300061.3345436) -- Zheng et al. (MobiCom 2019) - [WiGest: WiFi-Based Gesture Recognition](https://ieeexplore.ieee.org/document/7127672) -- Abdelnasser et al. (2015) - [CSI-Channel Spatial Decomposition](https://www.mdpi.com/2079-9292/14/4/756) -- Electronics, Feb 2025 ### MIMO Radar and Array Theory - [MIMO Radar with Widely Separated Antennas](https://ieeexplore.ieee.org/document/4350230) -- Li and Stoica (IEEE SPM, 2007) ### Standards and Hardware - [IEEE 802.11bf: WLAN Sensing](https://www.ieee802.org/11/Reports/tgbf_update.htm) -- Published 2024 - [Espressif ESP-CSI](https://github.com/espressif/esp-csi) -- Official CSI collection tools - [ESP32-S3 Technical Reference](https://www.espressif.com/sites/default/files/documentation/esp32-s3_technical_reference_manual_en.pdf) ### Project ADRs - ADR-004: HNSW Vector Search for CSI Fingerprinting - ADR-012: ESP32 CSI Sensor Mesh for Distributed Sensing - ADR-014: SOTA Signal Processing Algorithms for WiFi Sensing - ADR-016: RuVector Training Pipeline Integration - ADR-017: RuVector Signal and MAT Integration - ADR-024: Project AETHER -- Contrastive CSI Embedding Model