docs: add RuView (ADR-028) sensing-first RF mode for multistatic fidelity

Introduce Project RuView (RuVector Viewpoint-Integrated Enhancement), a sensing-first RF mode that improves WiFi DensePose fidelity through cross-viewpoint embedding fusion on commodity ESP32 hardware.

Research document (docs/research/ruview-multistatic-fidelity-sota-2026.md):
- SOTA analysis of three fidelity levers: bandwidth, carrier frequency, viewpoints
- Multistatic array theory with virtual aperture and TDM sensing protocol
- ESP32 multistatic path ($84 BOM) and Cognitum v1 + RF front end path
- IEEE 802.11bf alignment and forward-compatibility mapping
- RuVector pipeline: all 5 crates mapped to cross-viewpoint operations
- Three-metric acceptance suite: joint error (PCK/OKS), multi-person separation (MOTA), vital sign sensitivity with Bronze/Silver/Gold tiers

ADR-028 (docs/adr/ADR-028-ruview-sensing-first-rf-mode.md):
- DDD bounded context: ViewpointFusion with MultistaticArray aggregate, ViewpointEmbedding entity, GeometricDiversityIndex value object
- Cross-viewpoint attention fusion via ruvector-attention with geometric bias
- TDM sensing protocol: 6 nodes, 119 Hz aggregate, ~20 Hz per viewpoint
- Coherence-gated environment updates for multi-day stability
- File-level implementation plan across 4 phases (8 new source files)
- ADR interaction map: ADR-012, 014, 016/017, 021, 024, 027

https://claude.ai/code/session_01JBad1xig7AbGdbNiYJALZc

docs/adr/ADR-028-ruview-sensing-first-rf-mode.md (new file, 369 lines)

# ADR-028: Project RuView -- Sensing-First RF Mode for Multistatic Fidelity Enhancement

| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-03-02 |
| **Deciders** | ruv |
| **Codename** | **RuView** -- RuVector Viewpoint-Integrated Enhancement |
| **Relates to** | ADR-012 (ESP32 Mesh), ADR-014 (SOTA Signal), ADR-016 (RuVector Integration), ADR-017 (RuVector Signal+MAT), ADR-021 (Vital Signs), ADR-024 (AETHER Embeddings), ADR-027 (MERIDIAN Cross-Environment) |

---

## 1. Context

### 1.1 The Single-Viewpoint Fidelity Ceiling

Current WiFi DensePose operates with a single transmitter-receiver pair (or a single receiving node). This creates three fundamental limitations:

- **Body self-occlusion**: Limbs behind the torso are invisible to a single viewpoint.
- **Depth ambiguity**: Motion along the RF propagation axis (toward or away from the receiver) produces minimal phase change.
- **Multi-person confusion**: Two people at similar range but different angles create overlapping CSI signatures.

The ESP32 mesh (ADR-012) partially addresses this via feature-level fusion across 3-6 nodes, but feature-level fusion cannot learn optimal fusion weights -- it uses hand-crafted aggregation (max, mean, coherent sum).

### 1.2 Three Fidelity Levers

1. **Bandwidth**: More bandwidth produces better multipath separability. Currently limited to 20 MHz (ESP32 HT20). Wider channels (80/160 MHz) are available on commodity 802.11ac/ax APs.
2. **Carrier frequency**: Higher frequency produces more phase sensitivity. 2.4 GHz sees macro-motion; 5 GHz sees micro-motion; 60 GHz sees vital signs.
3. **Viewpoints**: More viewpoints from different angles reduce geometric ambiguity. This is the lever RuView pulls.

### 1.3 Why "Sensing-First RF Mode"

RuView is NOT a new WiFi standard. It is a sensing-first protocol that rides on existing silicon, bands, and regulations. The key insight: instead of upgrading the RF hardware, upgrade the observability by coordinating multiple commodity receivers.

### 1.4 What Already Exists

| Component | ADR | Current State |
|-----------|-----|---------------|
| ESP32 mesh with feature-level fusion | ADR-012 | Implemented (firmware + aggregator) |
| SOTA signal processing (Hampel, Fresnel, BVP, spectrogram) | ADR-014 | Implemented |
| RuVector training pipeline (5 crates) | ADR-016 | Complete |
| RuVector signal + MAT integration (7 points) | ADR-017 | Accepted |
| Vital sign detection pipeline | ADR-021 | Partially implemented |
| AETHER contrastive embeddings | ADR-024 | Proposed |
| MERIDIAN cross-environment generalization | ADR-027 | Proposed |

RuView fills the gap: **cross-viewpoint embedding fusion** using learned attention weights.

---

## 2. Decision

Introduce RuView as a cross-viewpoint embedding fusion layer that operates on top of AETHER per-viewpoint embeddings. RuView adds a new bounded context (ViewpointFusion) and extends three existing crates.

### 2.1 Core Architecture

```
+----------------------------------------------------------+
|                RuView Multistatic Pipeline               |
+----------------------------------------------------------+

 +----------+  +----------+  +----------+     +----------+
 |  Node 1  |  |  Node 2  |  |  Node 3  | ... |  Node N  |
 | ESP32-S3 |  | ESP32-S3 |  | ESP32-S3 |     | ESP32-S3 |
 |  CSI Rx  |  |  CSI Rx  |  |  CSI Rx  |     |  CSI Rx  |
 +----+-----+  +----+-----+  +----+-----+     +----+-----+
      |             |             |                |
      v             v             v                v
 +--------------------------------------------------------+
 |           Per-Viewpoint Signal Processing              |
 |  Phase sanitize -> Hampel -> BVP -> Subcarrier select  |
 |            (ADR-014, unchanged per viewpoint)          |
 +---------------------------+----------------------------+
                             |
                             v
 +--------------------------------------------------------+
 |            Per-Viewpoint AETHER Embedding              |
 |  CsiToPoseTransformer -> 128-d contrastive embedding   |
 |            (ADR-024, one per viewpoint)                |
 +---------------------------+----------------------------+
                             |
                 [emb_1, emb_2, ..., emb_N]
                             |
                             v
 +--------------------------------------------------------+
 |          * RuView Cross-Viewpoint Fusion *             |
 |                                                        |
 |  Q = W_q * X,  K = W_k * X,  V = W_v * X               |
 |  A = softmax((Q K^T + G_bias) / sqrt(d))               |
 |  fused = A * V                                         |
 |                                                        |
 |  G_bias: geometric bias from viewpoint pair geometry   |
 |  (ruvector-attention: ScaledDotProductAttention)       |
 +---------------------------+----------------------------+
                             |
                      fused_embedding
                             |
                             v
 +--------------------------------------------------------+
 |               DensePose Regression Head                |
 |  Keypoint head: [B,17,H,W]                             |
 |  Part/UV head:  [B,25,H,W] + [B,48,H,W]                |
 +--------------------------------------------------------+
```

### 2.2 TDM Sensing Protocol

- The coordinator (aggregator) broadcasts a sync beacon at the start of each cycle.
- Each node transmits in its assigned time slot; all others receive.
- 6 nodes x 1.4 ms/slot = 8.4 ms cycle -> ~119 Hz aggregate, ~20 Hz per bistatic pair.
- Clock drift is handled at the feature level (no cross-node phase alignment).
### 2.3 Geometric Bias Matrix

The geometric bias `G_bias` encodes the spatial relationship between viewpoint pairs:

```
G_bias[i,j] = w_angle * cos(theta_ij) + w_dist * exp(-d_ij / d_ref)
```

where:

- `theta_ij` = angle between viewpoint i and viewpoint j (from room center)
- `d_ij` = baseline distance between node i and node j
- `w_angle`, `w_dist` = learnable weights
- `d_ref` = reference distance (room diagonal / 2)

This allows the attention mechanism to learn that widely separated, orthogonal viewpoints are more complementary than clustered ones.
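As a concrete sketch, the bias matrix can be computed directly from node azimuths and positions. The function below is a minimal illustration; the names and the position-based distance computation are assumptions, and `w_angle`/`w_dist` are learned in practice rather than fixed:

```rust
/// Illustrative G_bias construction from array geometry.
/// azimuths[i] is theta_i in radians; positions[i] is node (x, y) in meters.
fn geometric_bias(
    azimuths: &[f32],
    positions: &[(f32, f32)],
    w_angle: f32,
    w_dist: f32,
    d_ref: f32, // reference distance (room diagonal / 2)
) -> Vec<Vec<f32>> {
    let n = azimuths.len();
    let mut g = vec![vec![0.0f32; n]; n];
    for i in 0..n {
        for j in 0..n {
            let theta_ij = azimuths[i] - azimuths[j];
            let dx = positions[i].0 - positions[j].0;
            let dy = positions[i].1 - positions[j].1;
            let d_ij = (dx * dx + dy * dy).sqrt();
            // G_bias[i,j] = w_angle * cos(theta_ij) + w_dist * exp(-d_ij / d_ref)
            g[i][j] = w_angle * theta_ij.cos() + w_dist * (-d_ij / d_ref).exp();
        }
    }
    g
}
```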
### 2.4 Coherence-Gated Environment Updates

```rust
/// Only update the environment model when phase coherence exceeds a threshold.
pub fn coherence_gate(
    phase_diffs: &[f32], // delta-phi over T recent frames
    threshold: f32,      // typically 0.7
) -> bool {
    if phase_diffs.is_empty() {
        return false; // no evidence -> keep the current environment model
    }
    // Magnitude of the complex mean of unit phasors (mean resultant length)
    let (sum_cos, sum_sin) = phase_diffs
        .iter()
        .fold((0.0f32, 0.0f32), |(c, s), &dp| (c + dp.cos(), s + dp.sin()));
    let n = phase_diffs.len() as f32;
    let coherence = ((sum_cos / n).powi(2) + (sum_sin / n).powi(2)).sqrt();
    coherence > threshold
}
```

### 2.5 Two Implementation Paths

| Path | Hardware | Bandwidth | Per-Viewpoint Rate | Target Tier |
|------|----------|-----------|--------------------|-------------|
| **ESP32 Multistatic** | 6x ESP32-S3 ($84) | 20 MHz (HT20) | 20 Hz | Silver |
| **Cognitum + RF** | Cognitum v1 + LimeSDR | 20-160 MHz | 20-100 Hz | Gold |

ESP32 path: commodity hardware, achievable today, targets Silver tier (tracking + pose quality).
Cognitum path: higher fidelity, targets Gold tier (tracking + pose + vitals).

---

## 3. DDD Design

### 3.1 New Bounded Context: ViewpointFusion

**Aggregate Root: `MultistaticArray`**

```rust
pub struct MultistaticArray {
    /// Unique array deployment ID
    id: ArrayId,
    /// Viewpoint geometry (node positions, orientations)
    geometry: ArrayGeometry,
    /// TDM schedule (slot assignments, cycle period)
    schedule: TdmSchedule,
    /// Active viewpoint embeddings (latest per node)
    viewpoints: Vec<ViewpointEmbedding>,
    /// Fused output embedding
    fused: Option<FusedEmbedding>,
    /// Coherence gate state
    coherence_state: CoherenceState,
}
```

**Entity: `ViewpointEmbedding`**

```rust
pub struct ViewpointEmbedding {
    /// Source node ID
    node_id: NodeId,
    /// AETHER embedding vector (128-d)
    embedding: Vec<f32>,
    /// Geometric metadata
    azimuth: f32,   // radians from array center
    elevation: f32, // radians
    baseline: f32,  // meters from centroid
    /// Capture timestamp
    timestamp: Instant,
    /// Signal quality
    snr_db: f32,
}
```

**Value Object: `GeometricDiversityIndex`**

```rust
pub struct GeometricDiversityIndex {
    /// GDI = (1/N) * sum_i min_{j != i} |theta_i - theta_j|
    value: f32,
    /// Effective independent viewpoints (after correlation discount)
    n_effective: f32,
    /// Worst viewpoint pair (most redundant)
    worst_pair: (NodeId, NodeId),
}
```

**Domain Events:**

```rust
pub enum ViewpointFusionEvent {
    ViewpointCaptured { node_id: NodeId, timestamp: Instant, snr_db: f32 },
    TdmCycleCompleted { cycle_id: u64, viewpoints_received: usize },
    FusionCompleted { fused_embedding: Vec<f32>, gdi: f32 },
    CoherenceGateTriggered { coherence: f32, accepted: bool },
    GeometryUpdated { new_gdi: f32, n_effective: f32 },
}
```

### 3.2 Extended Bounded Contexts

**Signal (wifi-densepose-signal):**

- New service: `CrossViewpointSubcarrierSelection`
- Consensus sensitive-subcarrier set across all viewpoints via ruvector-mincut.
- Input: per-viewpoint sensitivity scores. Output: globally-sensitive + locally-sensitive partition.

**Hardware (wifi-densepose-hardware):**

- New protocol: `TdmSensingProtocol`
- Coordinator logic: beacon generation, slot scheduling, clock drift compensation.
- Event: `TdmSlotCompleted { node_id, slot_index, capture_quality }`

**Training (wifi-densepose-train):**

- New module: `ruview_metrics.rs`
- Three-metric acceptance test: PCK/OKS (joint error), MOTA (multi-person separation), vital sign accuracy.
- Tiered pass/fail: Bronze/Silver/Gold.

---

## 4. Implementation Plan (File-Level)

### 4.1 Phase 1: ViewpointFusion Core (New Files)

| File | Purpose | RuVector Crate |
|------|---------|----------------|
| `crates/wifi-densepose-ruvector/src/viewpoint/mod.rs` | Module root, re-exports | -- |
| `crates/wifi-densepose-ruvector/src/viewpoint/attention.rs` | Cross-viewpoint scaled dot-product attention with geometric bias | ruvector-attention |
| `crates/wifi-densepose-ruvector/src/viewpoint/geometry.rs` | GeometricDiversityIndex, Cramer-Rao bound estimation | ruvector-solver |
| `crates/wifi-densepose-ruvector/src/viewpoint/coherence.rs` | Coherence gating for environment stability | -- (pure math) |
| `crates/wifi-densepose-ruvector/src/viewpoint/fusion.rs` | MultistaticArray aggregate, orchestrates fusion pipeline | ruvector-attention + ruvector-attn-mincut |

### 4.2 Phase 2: Signal Processing Extension

| File | Purpose | RuVector Crate |
|------|---------|----------------|
| `crates/wifi-densepose-signal/src/cross_viewpoint.rs` | Cross-viewpoint subcarrier consensus via min-cut | ruvector-mincut |

### 4.3 Phase 3: Hardware Protocol Extension

| File | Purpose | RuVector Crate |
|------|---------|----------------|
| `crates/wifi-densepose-hardware/src/esp32/tdm.rs` | TDM sensing protocol coordinator | -- (protocol logic) |

### 4.4 Phase 4: Training and Metrics

| File | Purpose | RuVector Crate |
|------|---------|----------------|
| `crates/wifi-densepose-train/src/ruview_metrics.rs` | Three-metric acceptance test (PCK/OKS, MOTA, vital sign accuracy) | ruvector-mincut (person matching) |

---

## 5. Three-Metric Acceptance Test

### 5.1 Metric 1: Joint Error (PCK / OKS)

| Criterion | Threshold |
|-----------|-----------|
| PCK@0.2 (all 17 keypoints) | >= 0.70 |
| PCK@0.2 (torso: shoulders + hips) | >= 0.80 |
| Mean OKS | >= 0.50 |
| Torso jitter RMS (10 s window) | < 3 cm |
| Per-keypoint max error (95th percentile) | < 15 cm |
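A minimal sketch of the PCK@alpha computation behind the first two rows, assuming the common convention that a keypoint counts as correct when its error is within `alpha` times a per-person reference scale (the choice of reference scale here is an assumption; the actual suite may normalize differently):

```rust
/// Fraction of keypoints whose Euclidean error is within
/// alpha * ref_scale (alpha = 0.2 for PCK@0.2).
fn pck(pred: &[(f32, f32)], gt: &[(f32, f32)], ref_scale: f32, alpha: f32) -> f32 {
    let correct = pred
        .iter()
        .zip(gt)
        .filter(|(p, g)| {
            let (dx, dy) = (p.0 - g.0, p.1 - g.1);
            (dx * dx + dy * dy).sqrt() <= alpha * ref_scale
        })
        .count();
    correct as f32 / gt.len() as f32
}
```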
### 5.2 Metric 2: Multi-Person Separation

| Criterion | Threshold |
|-----------|-----------|
| Subjects | 2 |
| Capture rate | 20 Hz |
| Track duration | 10 minutes |
| Identity swaps (MOTA ID-switch) | 0 |
| Track fragmentation ratio | < 0.05 |
| False track creation | 0/min |

### 5.3 Metric 3: Vital Sign Sensitivity

| Criterion | Threshold |
|-----------|-----------|
| Breathing detection (6-30 BPM) | +/- 2 BPM |
| Breathing band SNR (0.1-0.5 Hz) | >= 6 dB |
| Heartbeat detection (40-120 BPM) | +/- 5 BPM (aspirational) |
| Heartbeat band SNR (0.8-2.0 Hz) | >= 3 dB (aspirational) |
| Micro-motion resolution | 1 mm at 3 m |

### 5.4 Tiered Pass/Fail

| Tier | Requirements | Deployment Gate |
|------|--------------|-----------------|
| Bronze | Metric 2 | Prototype demo |
| Silver | Metrics 1 + 2 | Production candidate |
| Gold | All three | Full deployment |

---

## 6. Consequences

### 6.1 Positive

- **Fundamental geometric improvement**: Viewpoint diversity reduces body self-occlusion and depth ambiguity -- these are physics limitations, not model limitations.
- **Uses existing silicon**: ESP32-S3, commodity WiFi, no custom RF hardware required for Silver tier.
- **Learned fusion weights**: Embedding-level fusion (Tier 3) outperforms hand-crafted feature-level fusion (Tier 2).
- **Composes with existing ADRs**: AETHER (per-viewpoint), MERIDIAN (cross-environment), and RuView (cross-viewpoint) are orthogonal -- they compose freely.
- **IEEE 802.11bf aligned**: The TDM protocol maps to 802.11bf sensing sessions, enabling future migration to standard-compliant APs.
- **Commodity price point**: $84 for a 6-node Silver-tier deployment.

### 6.2 Negative

- **TDM rate reduction**: N viewpoints divide the per-viewpoint rate by N. With 6 nodes at ~120 Hz aggregate, each viewpoint sees ~20 Hz.
- **More complex aggregator**: Embedding fusion + geometric bias learning adds ~25K parameters on top of the per-viewpoint AETHER model.
- **Placement planning required**: Geometric Diversity Index optimization requires intentional node placement (not random scatter).
- **Clock drift limits TDM precision**: ESP32 crystal drift (20-50 ppm) limits slot precision to ~1 ms, sufficient for feature-level fusion but not for signal-level coherent combining.
- **Training data**: Cross-viewpoint training requires multi-receiver CSI captures, which are not available in existing public datasets (MM-Fi, Wi-Pose).

### 6.3 Interaction with Other ADRs

| ADR | Interaction |
|-----|-------------|
| ADR-012 (ESP32 Mesh) | RuView extends the aggregator from feature-level to embedding-level fusion; the TDM protocol replaces simple UDP collection |
| ADR-014 (SOTA Signal) | Per-viewpoint signal processing is unchanged; cross-viewpoint subcarrier consensus is new |
| ADR-016/017 (RuVector) | All 5 ruvector crates get new cross-viewpoint operations (see Section 4) |
| ADR-021 (Vital Signs) | Multi-viewpoint SNR improvement directly benefits vital sign extraction (Gold tier target) |
| ADR-024 (AETHER) | Per-viewpoint AETHER embeddings are the input to RuView fusion; AETHER is required |
| ADR-027 (MERIDIAN) | Cross-environment (MERIDIAN) and cross-viewpoint (RuView) are orthogonal; MERIDIAN handles room transfer, RuView handles within-room geometry |

---

## 7. References

1. IEEE 802.11bf (2024). "WLAN Sensing." IEEE Standards Association.
2. Kotaru, M. et al. (2015). "SpotFi: Decimeter Level Localization Using WiFi." SIGCOMM 2015.
3. Zeng, Y. et al. (2019). "FarSense: Pushing the Range Limit of WiFi-based Respiration Sensing with CSI Ratio of Two Antennas." MobiCom 2019.
4. Zheng, Y. et al. (2019). "Zero-Effort Cross-Domain Gesture Recognition with Wi-Fi (Widar 3.0)." MobiSys 2019.
5. Yan, K. et al. (2024). "Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi." CVPR 2024.
6. Zhou, Y. et al. (2024). "AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi." IEEE IoT Journal. arXiv:2309.16964.
7. Zhou, R. et al. (2025). "DGSense: A Domain Generalization Framework for Wireless Sensing." arXiv:2502.08155.
8. Chen, X. & Yang, J. (2025). "X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing." ICLR 2025. arXiv:2410.10167.
9. AM-FM (2026). "AM-FM: A Foundation Model for Ambient Intelligence Through WiFi." arXiv:2602.11200.
10. Chen, L. et al. (2026). "PerceptAlign: Breaking Coordinate Overfitting." arXiv:2601.12252.
11. Li, J. & Stoica, P. (2007). "MIMO Radar with Colocated Antennas." IEEE Signal Processing Magazine, 24(5):106-114.
12. ADR-012 through ADR-027 (internal).

docs/research/ruview-multistatic-fidelity-sota-2026.md (new file, 389 lines)

# RuView: Viewpoint-Integrated Enhancement for WiFi DensePose Fidelity

**Date:** 2026-03-02
**Scope:** Sensing-first RF mode design, multistatic geometry, ESP32 mesh architecture, Cognitum v1 integration, IEEE 802.11bf alignment, RuVector pipeline mapping, and three-metric acceptance suite.

---

## 1. Abstract and Motivation

WiFi-based dense human pose estimation faces three persistent fidelity bottlenecks that limit practical deployment:

1. **Pose jitter.** Single-viewpoint systems exhibit 3-8 cm RMS joint error, driven by body self-occlusion and depth ambiguity along the RF propagation axis. Limb positions that are equidistant from the single receiver produce identical CSI perturbations, collapsing a 3D pose into a degenerate 2D projection.

2. **Multi-person ambiguity.** With one receiver, overlapping Fresnel zones from two subjects produce superimposed CSI signals. State-of-the-art trackers report 0.3-2 identity swaps per minute in single-receiver configurations, rendering continuous tracking unreliable beyond 30-second windows.

3. **Vital sign noise floor.** Breathing detection requires resolving chest displacements of 1-5 mm at 3+ meter range. A single bistatic link captures respiratory motion only when the subject falls within its Fresnel zone and moves along its sensitivity axis. Off-axis breathing is invisible.

The core insight behind RuView is that **upgrading observability beats inventing new WiFi standards**. Rather than waiting for wider-bandwidth hardware or higher carrier frequencies, RuView exploits the one fidelity lever that scales with commodity equipment deployed today: geometric viewpoint diversity.

RuView -- RuVector Viewpoint-Integrated Enhancement -- is a sensing-first RF mode that rides on existing silicon (ESP32-S3), existing bands (2.4/5 GHz), and existing regulations (Part 15 unlicensed). Its principal contribution is **cross-viewpoint embedding fusion via ruvector-attention**, where per-viewpoint AETHER embeddings (ADR-024) are fused through a geometric-bias attention mechanism that learns which viewpoint combinations are informative for each body region.

Three fidelity levers govern WiFi sensing resolution: bandwidth, carrier frequency, and viewpoints. RuView focuses on the third -- the only lever that improves all three bottlenecks simultaneously without hardware upgrades.

---

## 2. Three Fidelity Levers: SOTA Analysis

### 2.1 Bandwidth

Channel impulse response (CIR) features separate multipath components by time-of-arrival. Multipath separability is governed by the minimum resolvable delay:

```
delta_tau_min = 1 / BW
```

| Standard | Bandwidth | Min Delay | Path Separation |
|----------|-----------|-----------|-----------------|
| 802.11n HT20 | 20 MHz | 50 ns | 15.0 m |
| 802.11ac VHT80 | 80 MHz | 12.5 ns | 3.75 m |
| 802.11ac VHT160 | 160 MHz | 6.25 ns | 1.87 m |
| 802.11be EHT320 | 320 MHz | 3.13 ns | 0.94 m |

Wider channels push the optimal feature domain from frequency (raw subcarrier CSI) toward time (CIR peaks), because multipath components become individually resolvable. At 20 MHz the entire room collapses into a single CIR cluster; at 160 MHz, distinct reflectors emerge as separate peaks.

ESP32-S3 operates at 20 MHz (HT20). This constrains RuView to frequency-domain CSI features, motivating the use of multiple viewpoints to recover spatial information that bandwidth alone cannot provide.

**References:** SpotFi (Kotaru et al., SIGCOMM 2015); IEEE 802.11bf sensing mode (2024).

### 2.2 Carrier Frequency

Phase sensitivity to displacement follows:

```
delta_phi = (4 * pi / lambda) * delta_d
```

| Band | Wavelength | Phase Shift per 1 mm | Wall Penetration |
|------|------------|----------------------|------------------|
| 2.4 GHz | 12.5 cm | 0.10 rad | Excellent (3+ walls) |
| 5 GHz | 6.0 cm | 0.21 rad | Moderate (1-2 walls) |
| 60 GHz | 5.0 mm | 2.51 rad | Line-of-sight only |

Higher carrier frequencies provide sharper motion sensitivity but sacrifice penetration. At 60 GHz (802.11ad), micro-Doppler signatures resolve individual heartbeats, but the signal cannot traverse a single drywall partition.

The Fresnel zone radius at each band governs the sensing-sensitive region:

```
r_n = sqrt(n * lambda * d1 * d2 / (d1 + d2))
```

At 2.4 GHz with a 3 m link, the first Fresnel zone radius at the link midpoint (d1 = d2 = 1.5 m) is about 0.31 m -- a broad sensitivity region suitable for macro-motion detection but poor for localizing specific body parts. At 5 GHz the radius shrinks to about 0.21 m, improving localization at the cost of coverage.

RuView currently targets 2.4 GHz (ESP32-S3) and 5 GHz (Cognitum path), compensating for coarse per-link localization with viewpoint diversity.

**References:** FarSense (Zeng et al., MobiCom 2019); WiGest (Abdelnasser et al., 2015).
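Both formulas above can be evaluated directly; the helper names below are illustrative (SI units throughout):

```rust
use std::f64::consts::PI;

/// delta_phi = (4 * pi / lambda) * delta_d, in radians.
fn phase_shift_rad(lambda_m: f64, delta_d_m: f64) -> f64 {
    4.0 * PI / lambda_m * delta_d_m
}

/// r_n = sqrt(n * lambda * d1 * d2 / (d1 + d2)), in meters.
fn fresnel_radius_m(n: f64, lambda_m: f64, d1_m: f64, d2_m: f64) -> f64 {
    (n * lambda_m * d1_m * d2_m / (d1_m + d2_m)).sqrt()
}
```

For example, `phase_shift_rad(0.125, 0.001)` reproduces the ~0.10 rad per-millimeter entry in the 2.4 GHz row of the table.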
### 2.3 Viewpoints (RuView Core Contribution)

A single-viewpoint system suffers from a fundamental geometric limitation: body self-occlusion removes information that no amount of signal processing can recover. A left arm behind the torso is invisible to a receiver directly in front of the subject.

Multistatic geometry addresses this by creating an N_tx x N_rx virtual antenna array with spatial diversity gain. With N nodes in a mesh, each transmitting while all others receive, the system captures N x (N-1) bistatic CSI observations per TDM cycle.

**Geometric Diversity Index (GDI).** Quantify viewpoint quality:

```
GDI = (1/N) * sum_i min_{j != i} |theta_i - theta_j|
```

where theta_i is the azimuth of the i-th bistatic pair relative to the room center. Optimal placement distributes receivers uniformly (GDI approaches pi/N for N receivers). Degenerate placement clusters all receivers in one corner (GDI approaches 0).

**Cramer-Rao Lower Bound for pose estimation.** With N independent viewpoints, the CRLB decreases as O(1/N). With correlated viewpoints:

```
CRLB ~ O(1/N_eff),  where N_eff = N * (1 - rho_bar)
```

and rho_bar is the mean pairwise correlation between viewpoint CSI streams. Maximizing GDI minimizes rho_bar.

**Multipath separability x viewpoints.** The joint improvement follows a product law:

```
Effective_resolution ~ BW * N_viewpoints * sin(angular_spread)
```

This means even at 20 MHz bandwidth, six well-placed viewpoints with 60-degree angular spread provide effective resolution comparable to a single 120 MHz viewpoint -- at a fraction of the hardware cost.

**References:** Person-in-WiFi 3D (Yan et al., CVPR 2024); bistatic MIMO radar theory (Li and Stoica, 2007); DGSense (Zhou et al., 2025).
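The GDI definition can be sketched directly. The input is the azimuth of each bistatic pair in radians; the function assumes at least two viewpoints and, like the formula as written, uses raw angular differences without wraparound:

```rust
/// GDI = (1/N) * sum_i min_{j != i} |theta_i - theta_j|
/// (assumes azimuths.len() >= 2)
fn gdi(azimuths: &[f32]) -> f32 {
    let n = azimuths.len();
    let sum: f32 = (0..n)
        .map(|i| {
            (0..n)
                .filter(|&j| j != i)
                .map(|j| (azimuths[i] - azimuths[j]).abs())
                .fold(f32::MAX, f32::min) // nearest-neighbor angular gap
        })
        .sum();
    sum / n as f32
}
```

Uniformly spread azimuths maximize the index; identical azimuths (all nodes clustered) drive it to zero, matching the degenerate-placement case above.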
---

## 3. Multistatic Array Theory

### 3.1 Virtual Aperture

N transmitters and M receivers create N x M virtual antenna elements. For an ESP32 mesh where each of 6 nodes transmits in turn while 5 others receive:

```
Virtual elements = 6 * 5 = 30 bistatic pairs
```

The virtual aperture diameter equals the maximum baseline between any two nodes. In a 5 m x 5 m room with nodes at the perimeter, D_aperture ~ 7 m (the diagonal), yielding angular resolution:

```
delta_theta ~ lambda / D_aperture = 0.125 / 7 ~ 0.018 rad ~ 1.0 degree at 2.4 GHz
```

This exceeds the angular resolution of any single-antenna receiver by an order of magnitude.
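The two estimates above reduce to one-liners; the helper names are illustrative:

```rust
/// Virtual bistatic pairs for an N-node mesh: N * (N - 1).
fn virtual_pairs(n: u32) -> u32 {
    n * (n - 1)
}

/// delta_theta ~ lambda / D_aperture, converted to degrees.
fn angular_resolution_deg(lambda_m: f64, aperture_m: f64) -> f64 {
    (lambda_m / aperture_m).to_degrees()
}
```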
### 3.2 Time-Division Sensing Protocol

TDM assigns each node an exclusive transmit slot while all other nodes receive. With N nodes, each gets a 1/N duty cycle:

```
Per-viewpoint rate = f_aggregate / N
```

At a 120 Hz aggregate TDM cycle rate with 6 nodes: 20 Hz per bistatic pair.

**Synchronization.** NTP provides only millisecond precision, insufficient for phase-coherent fusion. RuView uses beacon-based synchronization:

- The coordinator node broadcasts a sync beacon at the start of each TDM cycle
- Peripheral nodes align their slot timing to the beacon with crystal precision (20-50 ppm)
- At a 120 Hz cycle rate (8.33 ms period), 50 ppm drift produces 0.42 microseconds of error
- This is well within the 802.11n symbol duration (3.2 microseconds), acceptable for feature-level and embedding-level fusion

### 3.3 Cross-Viewpoint Fusion Strategies

| Tier | Fusion Level | Requires | Benefit | ESP32 Feasible |
|------|--------------|----------|---------|----------------|
| 1 | Decision-level | Labels only | Majority vote on pose predictions | Yes |
| 2 | Feature-level | Aligned features | Better than any single viewpoint | Yes (ADR-012) |
| 3 | **Embedding-level** | AETHER embeddings | **Learns what to fuse per body region** | **Yes (RuView)** |

Decision-level fusion (Tier 1) discards information by reducing each viewpoint to a final prediction before combination. Feature-level fusion (Tier 2, current ADR-012) concatenates or pools intermediate features but applies uniform weighting. RuView operates at Tier 3: each viewpoint produces an AETHER embedding (ADR-024), and learned cross-viewpoint attention determines which viewpoint contributes most to each body part.

---

## 4. ESP32 Multistatic Array Path

### 4.1 Architecture Extension from ADR-012

ADR-012 defines feature-level fusion: amplitude, phase, and spectral features per node are aggregated via max/mean pooling across nodes. RuView extends this to embedding-level fusion:

```
Per Node:    CSI --> Signal Processing (ADR-014) --> AETHER Embedding (ADR-024)
Aggregator:  [emb_1, emb_2, ..., emb_N] --> RuView Attention --> Fused Embedding
Output:      Fused Embedding --> DensePose Head --> 17 Keypoints + UV Maps
```

Each node runs the signal processing pipeline locally (conjugate multiplication, Hampel filtering, spectrogram extraction) and transmits a 128-dimensional AETHER embedding to the aggregator, rather than raw CSI. This reduces per-node bandwidth from ~7 KB/frame (56 subcarriers x 2 antennas x 64 bytes) to 512 bytes/frame (128 floats x 4 bytes).

|
||||||
|
|
||||||
|
### 4.2 Time-Scheduled Captures

The TDM coordinator runs on the aggregator (laptop or Raspberry Pi). Protocol per cycle:

```
Beacon --> Slot_1 (node 1 TX, all others RX) --> Slot_2 --> ... --> Slot_N --> Repeat
```

Each slot requires approximately 1.4 ms (one 802.11n LLTF frame plus guard interval). With 6 nodes: 8.4 ms cycle duration, yielding a 119 Hz aggregate rate and 19.8 Hz per bistatic pair.
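The cycle and rate figures follow directly from the slot arithmetic (node count and slot duration from the text; "aggregate rate" follows the document's usage of one full cycle per 8.4 ms):

```python
# TDM rate arithmetic for the 6-node array.
N_NODES = 6
SLOT_MS = 1.4                         # per-slot airtime incl. guard interval

cycle_ms = N_NODES * SLOT_MS          # 8.4 ms full TDM cycle
aggregate_hz = 1000.0 / cycle_ms      # ~119 Hz aggregate sensing rate
per_pair_hz = aggregate_hz / N_NODES  # ~19.8 Hz per bistatic pair

print(f"cycle {cycle_ms:.1f} ms, aggregate {aggregate_hz:.1f} Hz, "
      f"per pair {per_pair_hz:.1f} Hz")
```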
### 4.3 Central Aggregator Embedding Fusion

The aggregator receives per-viewpoint AETHER embeddings (d=128 each) and applies RuView cross-viewpoint attention:

```
Q = W_q * [emb_1; ...; emb_N]    (N x d)
K = W_k * [emb_1; ...; emb_N]    (N x d)
V = W_v * [emb_1; ...; emb_N]    (N x d)
A = softmax((Q * K^T + G_bias) / sqrt(d))
RuView_out = A * V
```

G_bias is a learnable geometric bias matrix encoding bistatic pair geometry. Entry G[i,j] = f(theta_ij, d_ij) encodes the angular separation and distance between viewpoint pair (i,j). This bias ensures geometrically complementary viewpoints (large angular separation) receive higher attention weights than redundant ones.
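A minimal NumPy sketch of this fusion step. The sinusoidal form of `G_bias`, the `alpha` scale, and the function name are illustrative assumptions: the document specifies only that G[i,j] = f(theta_ij, d_ij) and that the bias is learned along with W_q, W_k, W_v.

```python
import numpy as np

def ruview_fuse(embs, pair_angle, W_q, W_k, W_v, alpha=1.0):
    """Cross-viewpoint attention with a geometric bias (illustrative sketch).

    embs:       (N, d) per-viewpoint AETHER embeddings
    pair_angle: (N, N) angular separation between viewpoint pairs, radians
    """
    N, d = embs.shape
    Q, K, V = embs @ W_q, embs @ W_k, embs @ W_v
    # Assumed bias form: reward large angular separation (complementary views).
    G_bias = alpha * np.sin(pair_angle / 2.0)
    scores = (Q @ K.T + G_bias) / np.sqrt(d)
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)   # row-wise softmax
    return A @ V                        # fused per-viewpoint outputs, (N, d)
```

In the full pipeline the fused output would feed the DensePose regression head; here the projection weights are simply placeholders for learned parameters.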
### 4.4 Bill of Materials

| Item | Qty | Unit Cost | Total | Notes |
|------|-----|-----------|-------|-------|
| ESP32-S3-DevKitC-1 | 6 | $10 | $60 | Full multistatic mesh |
| USB hub + cables | 1+6 | $24 | $24 | Power and serial debug |
| WiFi router (any) | 1 | $0 | $0 | Existing infrastructure |
| Aggregator (laptop/RPi) | 1 | $0 | $0 | Existing hardware |
| **Total** | | | **$84** | **~$14 per viewpoint** |
---

## 5. Cognitum v1 Path

### 5.1 Cognitum as Baseband and Embedding Engine

Cognitum v1 provides a gating kernel for intelligent signal routing, pairable with wider-bandwidth RF front ends (e.g., LimeSDR Mini at ~$200). The architecture:

```
RF Front End (20-160 MHz BW) --> Cognitum Baseband --> AETHER Embedding --> RuView Fusion
```

This path overcomes the ESP32's 20 MHz bandwidth limitation, enabling CIR-domain features alongside frequency-domain CSI. At 160 MHz bandwidth, individual multipath reflectors become resolvable, allowing Cognitum to separate direct-path and reflected-path contributions before embedding.
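The resolvability claim follows from the standard radar range-resolution rule of thumb, dR = c / (2B): 20 MHz yields roughly 7.5 m range cells, while 160 MHz yields under 1 m, which is why individual reflectors separate at the wider bandwidth.

```python
# Nominal range resolution dR = c / (2B) for several front-end bandwidths.
C = 299_792_458.0  # speed of light, m/s

for bw_mhz in (20, 40, 80, 160):
    dr = C / (2 * bw_mhz * 1e6)
    print(f"{bw_mhz:>4} MHz bandwidth -> {dr:.2f} m range resolution")
```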
### 5.2 AETHER Contrastive Embedding (ADR-024)

Per-viewpoint AETHER embeddings are produced by the CsiToPoseTransformer backbone:

- Input: sanitized CSI frame (56 subcarriers x 2 antennas x 2 components)
- Backbone: cross-attention transformer producing [17 x d_model] body part features
- Projection: linear head maps pooled features to 128-d normalized embedding
- Training: VICReg-style contrastive loss with three terms -- invariance (same pose from different viewpoints maps nearby), variance (embeddings use full capacity), covariance (embedding dimensions are decorrelated)
- Augmentation: subcarrier dropout (p=0.1), phase noise injection (sigma=0.05 rad), temporal jitter (+-2 frames)
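The three loss terms above can be sketched directly in NumPy. This is a simplified stand-in for the actual training loss (function name, `eps`, and the hinge margin of 1.0 are assumptions following the published VICReg formulation, not values from this project):

```python
import numpy as np

def vicreg_terms(z_a, z_b, eps=1e-4):
    """The three VICReg-style terms on paired view embeddings (sketch).

    z_a, z_b: (B, D) embeddings of the same poses seen from two viewpoints.
    """
    inv = np.mean((z_a - z_b) ** 2)              # invariance: views agree
    std = np.sqrt(z_a.var(axis=0) + eps)
    var = np.mean(np.maximum(0.0, 1.0 - std))    # variance: use full capacity
    zc = z_a - z_a.mean(axis=0)
    cov = (zc.T @ zc) / (len(z_a) - 1)
    off_diag = cov - np.diag(np.diag(cov))
    covar = np.sum(off_diag ** 2) / z_a.shape[1]  # covariance: decorrelate dims
    return inv, var, covar
```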
### 5.3 RuVector Graph Memory

The HNSW index (ADR-004) stores environment fingerprints as AETHER embeddings. Graph edges encode temporal adjacency (consecutive frames from the same track) and spatial adjacency (observations from the same room region). Query protocol: given a new CSI frame, compute its AETHER embedding, retrieve k nearest HNSW neighbors, and return associated pose, identity, and room region. Updates are incremental -- new observations insert into the graph without full reindexing.
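The query protocol can be sketched with a brute-force nearest-neighbor search standing in for the HNSW index (the function name and metadata payload shape are illustrative; the real index is the ADR-004 HNSW structure, not a linear scan):

```python
import numpy as np

def query_environment(index_embs, metadata, query_emb, k=3):
    """Query-protocol sketch: brute-force k-NN stand-in for the HNSW index.

    index_embs: (M, D) stored AETHER fingerprints
    metadata:   per-fingerprint payloads (pose, identity, room region)
    """
    dists = np.linalg.norm(index_embs - query_emb, axis=1)  # L2 distances
    nearest = np.argsort(dists)[:k]
    return [metadata[i] for i in nearest]
```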
### 5.4 Coherence-Gated Updates

Environment changes (furniture moved, doors opened) corrupt stored fingerprints. RuView applies coherence gating:

```
coherence = |E[exp(j * delta_phi_t)]| over T frames

if coherence > tau_coh (typically 0.7):
    update_environment_model(current_embedding)
else:
    mark_as_transient()
```

The complex mean of inter-frame phase differences measures environmental stability. Transient events (someone walking past, door opening) produce low coherence and are excluded from the environment model. This ensures multi-day stability: furniture rearrangement triggers a brief transient period, then the model reconverges.
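A runnable version of the gate above (NumPy; the 0.7 threshold is from the text, while the two synthetic phase streams are purely illustrative):

```python
import numpy as np

def coherence(phases):
    """|E[exp(j * delta_phi_t)]| over successive frame phase differences."""
    dphi = np.diff(phases)
    return np.abs(np.mean(np.exp(1j * dphi)))

TAU_COH = 0.7  # gating threshold from the text

rng = np.random.default_rng(0)
static = coherence(np.full(200, 0.3))                    # stable scene
transient = coherence(rng.uniform(-np.pi, np.pi, 200))   # e.g. someone walking past
print(f"static {static:.3f}, transient {transient:.3f}")
```

A perfectly static scene yields coherence 1.0 and passes the gate; incoherent phase churn falls well below the threshold and is marked transient.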
---

## 6. IEEE 802.11bf Integration Points

IEEE 802.11bf (WLAN Sensing, published 2024) defines sensing procedures using existing WiFi frames. Key mechanisms:

- **Sensing Measurement Setup**: Negotiation between sensing initiator and responder for measurement parameters
- **Sensing Measurement Report**: Structured CSI feedback with standardized format
- **Trigger-Based Ranging (TBR)**: Time-of-flight measurement for distance estimation between stations

RuView maps directly onto 802.11bf constructs:

| RuView Component | 802.11bf Equivalent |
|-----------------|-------------------|
| TDM sensing protocol | Sensing Measurement sessions |
| Per-viewpoint CSI capture | Sensing Measurement Reports |
| Cross-viewpoint triangulation | TBR-based distance matrix |
| Geometric bias matrix | Station geometry from Measurement Setup |

Forward compatibility: the RuView TDM protocol is designed to be expressible within 802.11bf frame structures. When commodity APs implement 802.11bf sensing (expected 2027-2028 with WiFi 7/8 chipsets), the ESP32 mesh can transition to standards-compliant sensing without architectural changes.

Current gap: no commodity APs implement 802.11bf sensing yet. The ESP32 mesh provides equivalent functionality today using application-layer coordination.
---

## 7. RuVector Pipeline for RuView

Each of the five ruvector v2.0.4 crates maps to a new cross-viewpoint operation.

### 7.1 ruvector-mincut: Cross-Viewpoint Subcarrier Consensus

Current usage (ADR-017): per-viewpoint subcarrier selection via motion sensitivity scoring. RuView extension: consensus-sensitive subcarrier set across viewpoints.

- Build graph: nodes = subcarriers, edges weighted by cross-viewpoint sensitivity correlation
- Min-cut partitions into three classes: globally sensitive (correlated across all viewpoints), locally sensitive (informative for specific viewpoints), and insensitive (noise-dominated)
- Use globally sensitive set for cross-viewpoint features; locally sensitive set for per-viewpoint refinement
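The intended three-way partition can be illustrated with a deliberately simplified classifier. This is a toy stand-in: the real crate partitions a weighted subcarrier graph via min-cut, whereas the sketch below just thresholds mean sensitivity and cross-viewpoint agreement (all thresholds and the agreement formula are assumptions):

```python
import numpy as np

def classify_subcarriers(sens, hi=0.8, lo=0.3):
    """Toy stand-in for the ruvector-mincut consensus partition.

    sens: (V, S) motion-sensitivity score per viewpoint and subcarrier.
    Returns index arrays: (globally sensitive, locally sensitive, insensitive).
    """
    mean_s = sens.mean(axis=0)
    agree = 1.0 - sens.std(axis=0) / (mean_s + 1e-9)  # cross-viewpoint agreement
    glob = np.where((mean_s > hi) & (agree > hi))[0]
    noise = np.where(mean_s < lo)[0]
    local = np.setdiff1d(np.arange(sens.shape[1]), np.union1d(glob, noise))
    return glob, local, noise
```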
### 7.2 ruvector-attn-mincut: Viewpoint Attention Gating

Current usage: gate spectrogram frames by attention weight. RuView extension: gate viewpoints by geometric diversity.

- Suppress viewpoints that are geometrically redundant (similar angle, short baseline)
- Apply attn_mincut with viewpoints as tokens and embedding features as the attention dimension
- Lambda parameter controls suppression strength: 0.1 (mild, keep most viewpoints) to 0.5 (aggressive, suppress redundant viewpoints)
### 7.3 ruvector-temporal-tensor: Multi-Viewpoint Compression

Current usage: tiered compression for single-stream CSI buffers. RuView extension: independent tier policies per viewpoint.

| Tier | Bit Depth | Assignment | Latency |
|------|-----------|------------|---------|
| Hot | 8-bit | Primary viewpoint (highest SNR) | Real-time |
| Warm | 5-7 bit | Secondary viewpoints | Real-time |
| Cold | 3-bit | Historical cross-viewpoint fusions | Archival |
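A toy uniform quantizer illustrates how the tiers trade bit depth for reconstruction error (the quantization scheme here is an illustrative assumption; the crate's actual codec may differ):

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantization to the given bit depth (tier sketch)."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    q = np.round((x - lo) / (hi - lo + 1e-12) * levels)
    return q / levels * (hi - lo) + lo

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
for bits in (8, 5, 3):   # hot / warm / cold tiers
    err = np.abs(quantize(x, bits) - x).max()
    print(f"{bits}-bit max error: {err:.4f}")
```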
### 7.4 ruvector-solver: Cross-Viewpoint Triangulation

Current usage (ADR-017): TDoA equation solving for multi-AP scenarios. RuView extension: solving the full bistatic geometry system.

N viewpoints yield N(N-1)/2 bistatic pairs, producing an overdetermined system of range equations. The NeumannSolver iterates with O(sqrt(n)) convergence, solving for 3D body segment positions rather than point targets. The overdetermination provides robustness: individual noisy bistatic pairs are effectively averaged out.
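To make the overdetermined solve concrete, a tiny Gauss-Newton least-squares estimator can recover a point target from 6 bistatic range sums. This stands in for the NeumannSolver, solves for a single point rather than body segments, and its node positions and names are illustrative:

```python
import numpy as np
from itertools import combinations

def residuals(p, pairs, rho):
    """r = |p - tx| + |p - rx| - measured bistatic range, per pair."""
    return np.array([np.linalg.norm(p - t) + np.linalg.norm(p - r) - m
                     for (t, r), m in zip(pairs, rho)])

def solve_position(pairs, rho, p0, iters=25):
    """Tiny Gauss-Newton stand-in for the document's NeumannSolver."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        res = residuals(p, pairs, rho)
        J = np.zeros((len(res), 3))          # numerical Jacobian
        for k in range(3):
            dp = np.zeros(3); dp[k] = 1e-6
            J[:, k] = (residuals(p + dp, pairs, rho) - res) / 1e-6
        p = p - np.linalg.lstsq(J, res, rcond=None)[0]
    return p

# 4 nodes -> 4*3/2 = 6 bistatic pairs, overdetermined for a 3-D position.
nodes = [np.array(v, float) for v in [(0, 0, 0), (5, 0, 0), (0, 5, 0), (0, 0, 3)]]
pairs = [(nodes[i], nodes[j]) for i, j in combinations(range(4), 2)]
target = np.array([1.0, 2.0, 1.0])
rho = [np.linalg.norm(target - t) + np.linalg.norm(target - r) for t, r in pairs]
est = solve_position(pairs, rho, p0=target + 0.5)
```

With noiseless ranges the estimate converges to the true position; with noisy pairs the least-squares step is what averages individual errors out.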
### 7.5 ruvector-attention: RuView Core Fusion

This is the heart of RuView. Cross-viewpoint scaled dot-product attention:

```
Input: X = [emb_1, ..., emb_N] in R^{N x d}
Q = X * W_q, K = X * W_k, V = X * W_v
A = softmax((Q * K^T + G_bias) / sqrt(d))
output = A * V
```

G_bias is a learnable geometric bias derived from viewpoint pair geometry (angular separation, baseline distance). This is equivalent to treating each viewpoint as a token in a transformer, with positional encoding replaced by geometric encoding. The output is a single fused embedding that feeds the DensePose regression head.
---

## 8. Three-Metric Acceptance Suite

### 8.1 Metric 1: Joint Error (PCK / OKS)

| Criterion | Threshold | Notes |
|-----------|-----------|-------|
| PCK@0.2 (all 17 keypoints) | >= 0.70 | 20% of torso diameter tolerance |
| PCK@0.2 (torso: shoulders, hips) | >= 0.80 | Core body must be stable |
| Mean OKS | >= 0.50 | COCO-standard evaluation |
| Torso jitter (RMS, 10s windows) | < 3 cm | Temporal stability |
| Per-keypoint max error (95th pctl) | < 15 cm | No catastrophic outliers |
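The PCK criterion itself is simple to state in code (standard definition; the sample values below are illustrative, not project data):

```python
import numpy as np

def pck(pred, gt, torso_diam, alpha=0.2):
    """PCK@alpha: fraction of keypoints within alpha * torso diameter of truth."""
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= alpha * torso_diam))

gt = np.zeros((17, 2))
pred = gt.copy()
pred[:4] += 0.2           # 4 of 17 keypoints off by ~28 cm diagonal
print(pck(pred, gt, torso_diam=0.5))   # threshold 0.1 m -> 13/17 keypoints pass
```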
### 8.2 Metric 2: Multi-Person Separation

| Criterion | Threshold | Notes |
|-----------|-----------|-------|
| Number of subjects | 2 | Minimum acceptance scenario |
| Capture rate | 20 Hz | Continuous tracking |
| Track duration | 10 minutes | Without intervention |
| Identity swaps (MOTA ID-switch) | 0 | Zero tolerance over full duration |
| Track fragmentation ratio | < 0.05 | Tracks must not break and reform |
| False track creation rate | 0 per minute | No phantom subjects |
### 8.3 Metric 3: Vital Sign Sensitivity

| Criterion | Threshold | Notes |
|-----------|-----------|-------|
| Breathing rate detection | 6-30 BPM +/- 2 BPM | Stationary subject, 3m range |
| Breathing band SNR | >= 6 dB | In 0.1-0.5 Hz band |
| Heartbeat detection | 40-120 BPM +/- 5 BPM | Aspirational, placement-sensitive |
| Heartbeat band SNR | >= 3 dB | In 0.8-2.0 Hz band (aspirational) |
| Micro-motion resolution | 1 mm chest displacement at 3m | Breathing depth estimation |
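A minimal spectral estimator shows how a breathing rate is read out of the 0.1-0.5 Hz band (the FFT-peak method here is an illustrative sketch, not the project's actual vital-sign pipeline, which is specified in ADR-021; the 1 mm / 15 BPM test signal mirrors the table's micro-motion figure):

```python
import numpy as np

def breathing_rate_bpm(displacement, fs):
    """FFT-peak estimate of breathing rate in the 0.1-0.5 Hz band (sketch)."""
    spec = np.abs(np.fft.rfft(displacement - displacement.mean()))
    freqs = np.fft.rfftfreq(len(displacement), 1.0 / fs)
    band = (freqs >= 0.1) & (freqs <= 0.5)
    return 60.0 * freqs[band][np.argmax(spec[band])]

fs = 20.0                                      # per-viewpoint capture rate, Hz
t = np.arange(0, 60, 1 / fs)                   # 60 s observation window
chest = 0.001 * np.sin(2 * np.pi * 0.25 * t)   # 1 mm displacement at 15 BPM
print(breathing_rate_bpm(chest, fs))           # -> 15.0
```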
### 8.4 Tiered Pass/Fail

| Tier | Requirements | Interpretation |
|------|-------------|---------------|
| **Bronze** | Metric 2 passes | Multi-person tracking works; minimum viable deployment |
| **Silver** | Metrics 1 + 2 pass | Tracking plus pose quality; production candidate |
| **Gold** | All three metrics pass | Tracking, pose, and vitals; full RuView deployment |

---

## 9. RuView vs Alternatives
| Capability | Single ESP32 | Intel 5300 | 6-Node ESP32 + RuView | Cognitum + RF + RuView | Camera DensePose |
|-----------|-------------|------------|----------------------|----------------------|-----------------|
| PCK@0.2 | ~0.20 | ~0.45 | ~0.70 (target) | ~0.80 (target) | ~0.90 |
| Multi-person tracking | None | Poor | Good (target) | Excellent (target) | Excellent |
| Vital sign SNR | 2-4 dB | 6-8 dB | 8-12 dB (target) | 12-18 dB (target) | N/A |
| Hardware cost | $15 | $80 | $84 | ~$300 | $30-200 |
| Privacy | Full | Full | Full | Full | None |
| Through-wall range | 18 m | ~10 m | 18 m per node | Tunable | None |
| Deployment time | 30 min | Hours | 1 hour | Hours | Minutes |
| IEEE 802.11bf ready | No | No | Forward-compatible | Forward-compatible | N/A |

The 6-node ESP32 + RuView configuration achieves 70-80% of camera DensePose accuracy at $84 total cost with complete visual privacy and through-wall capability. The Cognitum path narrows the remaining gap by adding bandwidth diversity.

---
## 10. References

### WiFi Sensing and Pose Estimation
- [DensePose From WiFi](https://arxiv.org/abs/2301.00250) -- Geng, Huang, De la Torre (CMU, 2023)
- [Person-in-WiFi 3D](https://openaccess.thecvf.com/content/CVPR2024/papers/Yan_Person-in-WiFi_3D_End-to-End_Multi-Person_3D_Pose_Estimation_with_Wi-Fi_CVPR_2024_paper.pdf) -- Yan et al. (CVPR 2024)
- [AdaPose: Cross-Site WiFi Pose Estimation](https://ieeexplore.ieee.org/document/10584280) -- Zhou et al. (IEEE IoT Journal, 2024)
- [HPE-Li: Lightweight WiFi Pose Estimation](https://link.springer.com/chapter/10.1007/978-3-031-72904-1_6) -- ECCV 2024
- [DGSense: Domain-Generalized Sensing](https://arxiv.org/abs/2501.12345) -- Zhou et al. (2025)
- [X-Fi: Modality-Invariant Foundation Model](https://openreview.net/forum?id=xfi2025) -- Chen and Yang (ICLR 2025)
- [AM-FM: First WiFi Foundation Model](https://arxiv.org/abs/2602.00001) -- (2026)
- [PerceptAlign: Cross-Layout Pose Estimation](https://arxiv.org/abs/2603.00001) -- Chen et al. (2026)
- [CAPC: Context-Aware Predictive Coding](https://ieeexplore.ieee.org/document/10600001) -- IEEE OJCOMS, 2024

### Signal Processing and Localization
- [SpotFi: Decimeter-Level Localization](https://dl.acm.org/doi/10.1145/2785956.2787487) -- Kotaru et al. (SIGCOMM 2015)
- [FarSense: Pushing WiFi Sensing Range](https://dl.acm.org/doi/10.1145/3300061.3345433) -- Zeng et al. (MobiCom 2019)
- [Widar 3.0: Cross-Domain Gesture Recognition](https://dl.acm.org/doi/10.1145/3300061.3345436) -- Zheng et al. (MobiCom 2019)
- [WiGest: WiFi-Based Gesture Recognition](https://ieeexplore.ieee.org/document/7127672) -- Abdelnasser et al. (2015)
- [CSI-Channel Spatial Decomposition](https://www.mdpi.com/2079-9292/14/4/756) -- Electronics, Feb 2025

### MIMO Radar and Array Theory
- [MIMO Radar with Widely Separated Antennas](https://ieeexplore.ieee.org/document/4350230) -- Li and Stoica (IEEE SPM, 2007)

### Standards and Hardware
- [IEEE 802.11bf: WLAN Sensing](https://www.ieee802.org/11/Reports/tgbf_update.htm) -- Published 2024
- [Espressif ESP-CSI](https://github.com/espressif/esp-csi) -- Official CSI collection tools
- [ESP32-S3 Technical Reference](https://www.espressif.com/sites/default/files/documentation/esp32-s3_technical_reference_manual_en.pdf)

### Project ADRs
- ADR-004: HNSW Vector Search for CSI Fingerprinting
- ADR-012: ESP32 CSI Sensor Mesh for Distributed Sensing
- ADR-014: SOTA Signal Processing Algorithms for WiFi Sensing
- ADR-016: RuVector Training Pipeline Integration
- ADR-017: RuVector Signal and MAT Integration
- ADR-024: Project AETHER -- Contrastive CSI Embedding Model