ADR-012 now reflects the actual working firmware: NVS runtime config, Docker build workflow, pre-built binary release, and verified metrics (20 Hz, zero frame loss). Status changed from Proposed to Accepted. Co-Authored-By: claude-flow <ruv@ruv.net>
348 lines
17 KiB
Markdown
348 lines
17 KiB
Markdown
# ADR-012: ESP32 CSI Sensor Mesh for Distributed Sensing
|
||
|
||
## Status
|
||
Accepted — Partially Implemented (firmware + aggregator working, see ADR-018)
|
||
|
||
## Date
|
||
2026-02-28
|
||
|
||
## Context
|
||
|
||
### The Hardware Reality Gap
|
||
|
||
WiFi-DensePose's Rust and Python pipelines implement real signal processing (FFT, phase unwrapping, Doppler extraction, correlation features), but the system currently has no defined path from **physical WiFi hardware → CSI bytes → pipeline input**. The `csi_extractor.py` and `router_interface.py` modules contain placeholder parsers that return `np.random.rand()` instead of real parsed data (see ADR-011).
|
||
|
||
To close this gap, we need a concrete, affordable, reproducible hardware platform that produces real CSI data and streams it into the existing pipeline.
|
||
|
||
### Why ESP32
|
||
|
||
| Factor | ESP32/ESP32-S3 | Intel 5300 (iwl5300) | Atheros AR9580 |
|
||
|--------|---------------|---------------------|----------------|
|
||
| Cost | ~$5-15/node | ~$50-100 (used NIC) | ~$30-60 (used NIC) |
|
||
| Availability | Mass produced, in stock | Discontinued, eBay only | Discontinued, eBay only |
|
||
| CSI Support | Official ESP-IDF API | Linux CSI Tool (kernel mod) | Atheros CSI Tool |
|
||
| Form Factor | Standalone MCU | Requires PCIe/Mini-PCIe host | Requires PCIe host |
|
||
| Deployment | Battery/USB, wireless | Desktop/laptop only | Desktop/laptop only |
|
||
| Antenna Config | 1-2 TX, 1-2 RX | 3 TX, 3 RX (MIMO) | 3 TX, 3 RX (MIMO) |
|
||
| Subcarriers | 52-56 (802.11n) | 30 (compressed) | 56 (full) |
|
||
| Fidelity | Lower (consumer SoC) | Higher (dedicated NIC) | Higher (dedicated NIC) |
|
||
|
||
**ESP32 wins on deployability**: It's the only option where a stranger can buy nodes on Amazon, flash firmware, and have a working CSI mesh in an afternoon. Intel 5300 and Atheros cards require specific hardware, kernel modifications, and legacy OS versions.
|
||
|
||
### ESP-IDF CSI API
|
||
|
||
Espressif provides official CSI support through three key functions:
|
||
|
||
```c
|
||
// 1. Configure what CSI data to capture
|
||
wifi_csi_config_t csi_config = {
|
||
.lltf_en = true, // Long Training Field (best for CSI)
|
||
.htltf_en = true, // HT-LTF
|
||
.stbc_htltf2_en = true, // STBC HT-LTF2
|
||
.ltf_merge_en = true, // Merge LTFs
|
||
.channel_filter_en = false,
|
||
.manu_scale = false,
|
||
};
|
||
esp_wifi_set_csi_config(&csi_config);
|
||
|
||
// 2. Register callback for received CSI data
|
||
esp_wifi_set_csi_rx_cb(csi_data_callback, NULL);
|
||
|
||
// 3. Enable CSI collection
|
||
esp_wifi_set_csi(true);
|
||
|
||
// Callback receives:
|
||
void csi_data_callback(void *ctx, wifi_csi_info_t *info) {
|
||
// info->rx_ctrl: RSSI, noise_floor, channel, secondary_channel, etc.
|
||
// info->buf: Raw CSI data (I/Q pairs per subcarrier)
|
||
// info->len: Length of CSI data buffer
|
||
// Typical: 112 bytes = 56 subcarriers × 2 (I,Q) × 1 byte each
|
||
}
|
||
```
|
||
|
||
## Decision
|
||
|
||
We will build an ESP32 CSI Sensor Mesh as the primary hardware integration path, with a full stack from firmware to aggregator to Rust pipeline to visualization.
|
||
|
||
### System Architecture
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────┐
|
||
│ ESP32 CSI Sensor Mesh │
|
||
├─────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
||
│ │ ESP32 │ │ ESP32 │ │ ESP32 │ ... (3-6 nodes) │
|
||
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │
|
||
│ │ │ │ │ │ │ │
|
||
│ │ CSI Rx │ │ CSI Rx │ │ CSI Rx │ ← WiFi frames from │
|
||
│ │ FFT │ │ FFT │ │ FFT │ consumer router │
|
||
│ │ Features │ │ Features │ │ Features │ │
|
||
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
|
||
│ │ │ │ │
|
||
│ │ UDP/TCP stream (WiFi or secondary channel) │
|
||
│ │ │ │ │
|
||
│ ▼ ▼ ▼ │
|
||
│ ┌─────────────────────────────────────────┐ │
|
||
│ │ Aggregator │ │
|
||
│ │ (Laptop / Raspberry Pi / Seed device) │ │
|
||
│ │ │ │
|
||
│ │ 1. Receive CSI streams from all nodes │ │
|
||
│ │ 2. Timestamp alignment (per-node) │ │
|
||
│ │ 3. Feature-level fusion │ │
|
||
│ │ 4. Feed into Rust/Python pipeline │ │
|
||
│ │ 5. Serve WebSocket to visualization │ │
|
||
│ └──────────────────┬──────────────────────┘ │
|
||
│ │ │
|
||
│ ▼ │
|
||
│ ┌─────────────────────────────────────────┐ │
|
||
│ │ WiFi-DensePose Pipeline │ │
|
||
│ │ │ │
|
||
│ │ CsiProcessor → FeatureExtractor → │ │
|
||
│ │ MotionDetector → PoseEstimator → │ │
|
||
│ │ Three.js Visualization │ │
|
||
│ └─────────────────────────────────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
### Node Firmware Specification
|
||
|
||
**ESP-IDF project**: `firmware/esp32-csi-node/`
|
||
|
||
```
|
||
firmware/esp32-csi-node/
|
||
├── CMakeLists.txt
|
||
├── sdkconfig.defaults # Menuconfig defaults with CSI enabled (gitignored)
|
||
├── main/
|
||
│ ├── CMakeLists.txt
|
||
│ ├── main.c # Entry point, NVS config, WiFi init, CSI callback
|
||
│ ├── csi_collector.c # CSI collection, promiscuous mode, ADR-018 serialization
|
||
│ ├── csi_collector.h
|
||
│ ├── nvs_config.c # Runtime config from NVS (WiFi creds, target IP)
|
||
│ ├── nvs_config.h
|
||
│ ├── stream_sender.c # UDP stream to aggregator
|
||
│ ├── stream_sender.h
|
||
│ └── Kconfig.projbuild # Menuconfig options
|
||
└── README.md # Flash instructions (verified working)
|
||
```
|
||
|
||
> **Implementation note**: On-device feature extraction (`feature_extract.c`) is deferred.
|
||
> The current firmware streams raw I/Q data in ADR-018 binary format; feature extraction
|
||
> happens in the Rust aggregator. This simplifies the firmware and keeps the ESP32 code
|
||
> under 200 lines of C.
|
||
|
||
**On-device processing** (reduces bandwidth, node does pre-processing):
|
||
|
||
```c
|
||
// feature_extract.c
|
||
typedef struct {
|
||
uint32_t timestamp_ms; // Local monotonic timestamp
|
||
uint8_t node_id; // This node's ID
|
||
int8_t rssi; // Received signal strength
|
||
int8_t noise_floor; // Noise floor estimate
|
||
uint8_t channel; // WiFi channel
|
||
float amplitude[56]; // |CSI| per subcarrier (from I/Q)
|
||
float phase[56]; // arg(CSI) per subcarrier
|
||
float doppler_energy; // Motion energy from temporal FFT
|
||
float breathing_band; // 0.1-0.5 Hz band power
|
||
float motion_band; // 0.5-3 Hz band power
|
||
} csi_feature_frame_t;
|
||
// Size: ~470 bytes per frame
|
||
// At 100 Hz: ~47 KB/s per node, ~280 KB/s for 6 nodes
|
||
```
|
||
|
||
**Key firmware design decisions**:
|
||
|
||
1. **Feature extraction on-device**: Raw CSI I/Q → amplitude + phase + spectral bands. This cuts bandwidth from raw ~11 KB/frame to ~470 bytes/frame.
|
||
|
||
2. **Monotonic timestamps**: Each node uses its own monotonic clock. No NTP synchronization attempted between nodes - clock drift is handled at the aggregator by fusing features, not raw phases (see "Clock Drift" section below).
|
||
|
||
3. **UDP streaming**: Low-latency, loss-tolerant. Missing frames are acceptable; ordering is maintained via sequence numbers.
|
||
|
||
4. **Configurable sampling rate**: 10-100 Hz via menuconfig. 100 Hz for motion detection, 10 Hz sufficient for occupancy.
|
||
|
||
### Aggregator Specification
|
||
|
||
The aggregator runs on any machine with WiFi/Ethernet to the nodes:
|
||
|
||
```rust
|
||
// In wifi-densepose-rs, new module: crates/wifi-densepose-hardware/src/esp32/
|
||
pub struct Esp32Aggregator {
|
||
/// UDP socket listening for node streams
|
||
socket: UdpSocket,
|
||
|
||
/// Per-node state (last timestamp, feature buffer, drift estimate)
|
||
nodes: HashMap<u8, NodeState>,
|
||
|
||
/// Ring buffer of fused feature frames
|
||
fused_buffer: VecDeque<FusedFrame>,
|
||
|
||
/// Channel to pipeline
|
||
pipeline_tx: mpsc::Sender<CsiData>,
|
||
}
|
||
|
||
/// Fused frame from all nodes for one time window
|
||
pub struct FusedFrame {
|
||
/// Timestamp (aggregator local, monotonic)
|
||
timestamp: Instant,
|
||
|
||
/// Per-node features (may have gaps if node dropped)
|
||
node_features: Vec<Option<CsiFeatureFrame>>,
|
||
|
||
/// Cross-node correlation (computed by aggregator)
|
||
cross_node_correlation: Array2<f64>,
|
||
|
||
/// Fused motion energy (max across nodes)
|
||
fused_motion_energy: f64,
|
||
|
||
/// Fused breathing band (coherent sum where phase aligns)
|
||
fused_breathing_band: f64,
|
||
}
|
||
```
|
||
|
||
### Clock Drift Handling
|
||
|
||
ESP32 crystal oscillators drift ~20-50 ppm. Over 1 hour, two nodes may diverge by 72-180ms. This makes raw phase alignment across nodes impossible.
|
||
|
||
**Solution**: Feature-level fusion, not signal-level fusion.
|
||
|
||
```
|
||
Signal-level (WRONG for ESP32):
|
||
Align raw I/Q samples across nodes → requires <1µs sync → impractical
|
||
|
||
Feature-level (CORRECT for ESP32):
|
||
Each node: raw CSI → amplitude + phase + spectral features (local)
|
||
Aggregator: collect features → correlate → fuse decisions
|
||
No cross-node phase alignment needed
|
||
```
|
||
|
||
Specifically:
|
||
- **Motion energy**: Take max across nodes (any node seeing motion = motion)
|
||
- **Breathing band**: Use node with highest SNR as primary, others as corroboration
|
||
- **Location**: Cross-node amplitude ratios estimate position (no phase needed)
|
||
|
||
### Sensing Capabilities by Deployment
|
||
|
||
| Capability | 1 Node | 3 Nodes | 6 Nodes | Evidence |
|
||
|-----------|--------|---------|---------|----------|
|
||
| Presence detection | Good | Excellent | Excellent | Single-node RSSI variance |
|
||
| Coarse motion | Good | Excellent | Excellent | Doppler energy |
|
||
| Room-level location | None | Good | Excellent | Amplitude ratios |
|
||
| Respiration | Marginal | Good | Good | 0.1-0.5 Hz band, placement-sensitive |
|
||
| Heartbeat | Poor | Poor-Marginal | Marginal | Requires ideal placement, low noise |
|
||
| Multi-person count | None | Marginal | Good | Spatial diversity |
|
||
| Pose estimation | None | Poor | Marginal | Requires model + sufficient diversity |
|
||
|
||
**Honest assessment**: ESP32 CSI is lower fidelity than Intel 5300 or Atheros. Heartbeat detection is placement-sensitive and unreliable. Respiration works with good placement. Motion and presence are solid.
|
||
|
||
### Failure Modes and Mitigations
|
||
|
||
| Failure Mode | Severity | Mitigation |
|
||
|-------------|----------|------------|
|
||
| Multipath dominates in cluttered rooms | High | Mesh diversity: 3+ nodes from different angles |
|
||
| Person occludes path between node and router | Medium | Mesh: other nodes still have clear paths |
|
||
| Clock drift ruins cross-node fusion | Medium | Feature-level fusion only; no cross-node phase alignment |
|
||
| UDP packet loss during high traffic | Low | Sequence numbers, interpolation for gaps <100ms |
|
||
| ESP32 WiFi driver bugs with CSI | Medium | Pin ESP-IDF version, test on known-good boards |
|
||
| Node power failure | Low | Aggregator handles missing nodes gracefully |
|
||
|
||
### Bill of Materials (Starter Kit)
|
||
|
||
| Item | Quantity | Unit Cost | Total |
|
||
|------|----------|-----------|-------|
|
||
| ESP32-S3-DevKitC-1 | 3 | $10 | $30 |
|
||
| USB-A to USB-C cables | 3 | $3 | $9 |
|
||
| USB power adapter (multi-port) | 1 | $15 | $15 |
|
||
| Consumer WiFi router (any) | 1 | $0 (existing) | $0 |
|
||
| Aggregator (laptop or Pi 4) | 1 | $0 (existing) | $0 |
|
||
| **Total** | | | **$54** |
|
||
|
||
### Minimal Build Spec (Clone-Flash-Run)
|
||
|
||
**Option A: Use pre-built binaries (no toolchain required)**
|
||
|
||
```bash
|
||
# Download binaries from GitHub Release v0.1.0-esp32
|
||
# Flash with esptool (pip install esptool)
|
||
python -m esptool --chip esp32s3 --port COM7 --baud 460800 \
|
||
write-flash --flash-mode dio --flash-size 4MB \
|
||
0x0 bootloader.bin 0x8000 partition-table.bin 0x10000 esp32-csi-node.bin
|
||
|
||
# Provision WiFi credentials (no recompile needed)
|
||
python scripts/provision.py --port COM7 \
|
||
--ssid "YourWiFi" --password "secret" --target-ip 192.168.1.20
|
||
|
||
# Run aggregator
|
||
cargo run -p wifi-densepose-hardware --bin aggregator -- --bind 0.0.0.0:5005 --verbose
|
||
```
|
||
|
||
**Option B: Build from source with Docker (no ESP-IDF install needed)**
|
||
|
||
```bash
|
||
# Step 1: Edit WiFi credentials
|
||
vim firmware/esp32-csi-node/sdkconfig.defaults
|
||
|
||
# Step 2: Build with Docker
|
||
cd firmware/esp32-csi-node
|
||
MSYS_NO_PATHCONV=1 docker run --rm -v "$(pwd):/project" -w /project \
|
||
espressif/idf:v5.2 bash -c "idf.py set-target esp32s3 && idf.py build"
|
||
|
||
# Step 3: Flash
|
||
cd build
|
||
python -m esptool --chip esp32s3 --port COM7 --baud 460800 \
|
||
write-flash --flash-mode dio --flash-size 4MB \
|
||
0x0 bootloader/bootloader.bin 0x8000 partition_table/partition-table.bin \
|
||
0x10000 esp32-csi-node.bin
|
||
|
||
# Step 4: Run aggregator
|
||
cargo run -p wifi-densepose-hardware --bin aggregator -- --bind 0.0.0.0:5005 --verbose
|
||
```
|
||
|
||
**Verified**: 20 Hz CSI streaming, 64/128/192 subcarrier frames, RSSI -47 to -88 dBm.
|
||
See tutorial: https://github.com/ruvnet/wifi-densepose/issues/34
|
||
|
||
### Proof of Reality for ESP32
|
||
|
||
**Live verified** with ESP32-S3-DevKitC-1 (CP2102, MAC 3C:0F:02:EC:C2:28):
|
||
- 693 frames in 18 seconds (~21.6 fps)
|
||
- Sequence numbers contiguous (zero frame loss)
|
||
- Presence detection confirmed: motion score 10/10 with per-second amplitude variance
|
||
- Frame types: 64 sc (148 B), 128 sc (276 B), 192 sc (404 B)
|
||
- 20 Rust tests + 6 Python tests pass
|
||
|
||
Pre-built binaries: https://github.com/ruvnet/wifi-densepose/releases/tag/v0.1.0-esp32
|
||
|
||
## Consequences
|
||
|
||
### Positive
|
||
- **$54 starter kit**: Lowest possible barrier to real CSI data
|
||
- **Mass available hardware**: ESP32 boards are in stock globally
|
||
- **Real data path**: Eliminates every `np.random.rand()` placeholder with actual hardware input
|
||
- **Proof artifact**: Captured CSI + expected hash proves the pipeline processes real data
|
||
- **Scalable mesh**: Add nodes for more coverage without changing software
|
||
- **Feature-level fusion**: Avoids the impossible problem of cross-node phase synchronization
|
||
|
||
### Negative
|
||
- **Lower fidelity than research NICs**: ESP32 CSI is noisier than Intel 5300
|
||
- **Heartbeat detection unreliable**: Micro-Doppler resolution insufficient for consistent heartbeat
|
||
- **ESP-IDF learning curve**: Firmware development requires embedded C knowledge
|
||
- **WiFi interference**: Nodes sharing the same channel as data traffic adds noise
|
||
- **Placement sensitivity**: Respiration detection requires careful node positioning
|
||
|
||
### Interaction with Other ADRs
|
||
- **ADR-011** (Proof of Reality): ESP32 provides the real CSI capture for the proof bundle
|
||
- **ADR-008** (Distributed Consensus): Mesh nodes can use simplified Raft for configuration distribution
|
||
- **ADR-003** (RVF Containers): Aggregator stores CSI features in RVF format
|
||
- **ADR-004** (HNSW): Environment fingerprints from ESP32 mesh feed HNSW index
|
||
|
||
## References
|
||
|
||
- [Espressif ESP-CSI Repository](https://github.com/espressif/esp-csi)
|
||
- [ESP-IDF WiFi CSI API](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/wifi.html#wi-fi-channel-state-information)
|
||
- [ESP32 CSI Research Papers](https://ieeexplore.ieee.org/document/9439871)
|
||
- [Wi-Fi Sensing with ESP32: A Tutorial](https://arxiv.org/abs/2207.07859)
|
||
- ADR-011: Python Proof-of-Reality and Mock Elimination
|
||
- ADR-018: ESP32 Development Implementation (binary frame format specification)
|
||
- [Pre-built firmware release v0.1.0-esp32](https://github.com/ruvnet/wifi-densepose/releases/tag/v0.1.0-esp32)
|
||
- [Step-by-step tutorial (Issue #34)](https://github.com/ruvnet/wifi-densepose/issues/34)
|