From 337dd9652f4a5363192c77dd9f213901a24e9c2c Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 28 Feb 2026 06:13:04 +0000 Subject: [PATCH] feat: Add 12 ADRs for RuVector RVF integration and proof-of-reality Comprehensive architecture decision records for integrating ruvnet/ruvector into wifi-densepose, covering: - ADR-002: Master integration strategy (phased rollout, new crate design) - ADR-003: RVF cognitive containers for CSI data persistence - ADR-004: HNSW vector search replacing fixed-threshold detection - ADR-005: SONA self-learning with LoRA + EWC++ for online adaptation - ADR-006: GNN-enhanced pattern recognition with temporal modeling - ADR-007: Post-quantum cryptography (ML-DSA-65 hybrid signatures) - ADR-008: Raft consensus for multi-AP distributed coordination - ADR-009: RVF WASM runtime for edge/browser/IoT deployment - ADR-010: Witness chains for tamper-evident audit trails - ADR-011: Mock elimination and proof-of-reality (fixes np.random.rand placeholders, ships CSI capture + SHA-256 verified pipeline) - ADR-012: ESP32 CSI sensor mesh ($54 starter kit specification) - ADR-013: Feature-level sensing on commodity gear (zero-cost RSSI path) ADR-011 directly addresses the credibility gap by cataloging every mock/placeholder in the Python codebase and specifying concrete fixes. https://claude.ai/code/session_01Ki7pvEZtJDvqJkmyn6B714 --- ...R-002-ruvector-rvf-integration-strategy.md | 207 +++++++++ .../ADR-003-rvf-cognitive-containers-csi.md | 251 +++++++++++ ...R-004-hnsw-vector-search-fingerprinting.md | 270 ++++++++++++ ...-005-sona-self-learning-pose-estimation.md | 253 +++++++++++ ...06-gnn-enhanced-csi-pattern-recognition.md | 261 +++++++++++ ...ost-quantum-cryptography-secure-sensing.md | 215 +++++++++ .../ADR-008-distributed-consensus-multi-ap.md | 284 ++++++++++++ ...DR-009-rvf-wasm-runtime-edge-deployment.md | 262 +++++++++++ ...10-witness-chains-audit-trail-integrity.md | 402 +++++++++++++++++ ...ython-proof-of-reality-mock-elimination.md | 414 ++++++++++++++++++ docs/adr/ADR-012-esp32-csi-sensor-mesh.md | 318 ++++++++++++++ ...13-feature-level-sensing-commodity-gear.md | 383 ++++++++++++++++ 12 files changed, 3520 insertions(+) create mode 100644 docs/adr/ADR-002-ruvector-rvf-integration-strategy.md create mode 100644 docs/adr/ADR-003-rvf-cognitive-containers-csi.md create mode 100644 docs/adr/ADR-004-hnsw-vector-search-fingerprinting.md create mode 100644 docs/adr/ADR-005-sona-self-learning-pose-estimation.md create mode 100644 docs/adr/ADR-006-gnn-enhanced-csi-pattern-recognition.md create mode 100644 docs/adr/ADR-007-post-quantum-cryptography-secure-sensing.md create mode 100644 docs/adr/ADR-008-distributed-consensus-multi-ap.md create mode 100644 docs/adr/ADR-009-rvf-wasm-runtime-edge-deployment.md create mode 100644 docs/adr/ADR-010-witness-chains-audit-trail-integrity.md create mode 100644 docs/adr/ADR-011-python-proof-of-reality-mock-elimination.md create mode 100644 docs/adr/ADR-012-esp32-csi-sensor-mesh.md create mode 100644 docs/adr/ADR-013-feature-level-sensing-commodity-gear.md diff --git a/docs/adr/ADR-002-ruvector-rvf-integration-strategy.md b/docs/adr/ADR-002-ruvector-rvf-integration-strategy.md new file mode 100644 index 0000000..4e9fcee --- /dev/null +++ b/docs/adr/ADR-002-ruvector-rvf-integration-strategy.md @@ -0,0 +1,207 @@ +# ADR-002: RuVector RVF Integration Strategy + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Current System Limitations + +The WiFi-DensePose system processes Channel State Information (CSI) from WiFi signals to estimate human body poses. The current architecture (Python v1 + Rust port) has several areas where intelligence and performance could be significantly improved: + +1. **No persistent vector storage**: CSI feature vectors are processed transiently. Historical patterns, fingerprints, and learned representations are not persisted in a searchable vector database. + +2. **Static inference models**: The modality translation network (`ModalityTranslationNetwork`) and DensePose head use fixed weights loaded at startup. There is no online learning, adaptation, or self-optimization. + +3. **Naive pattern matching**: Human detection in `CSIProcessor` uses simple threshold-based confidence scoring (`amplitude_indicator`, `phase_indicator`, `motion_indicator` with fixed weights 0.4, 0.3, 0.3). No similarity search against known patterns. + +4. **No cryptographic audit trail**: Life-critical disaster detection (wifi-densepose-mat) lacks tamper-evident logging for survivor detections and triage classifications. + +5. **Limited edge deployment**: The WASM crate (`wifi-densepose-wasm`) provides basic bindings but lacks a self-contained runtime capable of offline operation with embedded models. + +6. **Single-node architecture**: Multi-AP deployments for disaster scenarios require distributed coordination, but no consensus mechanism exists for cross-node state management. + +### RuVector Capabilities + +RuVector (github.com/ruvnet/ruvector) provides a comprehensive cognitive computing platform: + +- **RVF (Cognitive Containers)**: Self-contained files with 25 segment types (VEC, INDEX, KERNEL, EBPF, WASM, COW_MAP, WITNESS, CRYPTO) that package vectors, models, and runtime into a single deployable artifact +- **HNSW Vector Search**: Hierarchical Navigable Small World indexing with SIMD acceleration and Hyperbolic extensions for hierarchy-aware search +- **SONA**: Self-Optimizing Neural Architecture providing <1ms adaptation via LoRA fine-tuning with EWC++ memory preservation +- **GNN Learning Layer**: Graph Neural Networks that learn from every query through message passing, attention weighting, and representation updates +- **46 Attention Mechanisms**: Including Flash Attention, Linear Attention, Graph Attention, Hyperbolic Attention, Mincut-gated Attention +- **Post-Quantum Cryptography**: ML-DSA-65, Ed25519, SLH-DSA-128s signatures with SHAKE-256 hashing +- **Witness Chains**: Tamper-evident cryptographic hash-linked audit trails +- **Raft Consensus**: Distributed coordination with multi-master replication and vector clocks +- **WASM Runtime**: 5.5 KB runtime bootable in 125ms, deployable on servers, browsers, phones, IoT +- **Git-like Branching**: Copy-on-write structure (1M vectors + 100 edits ≈ 2.5 MB branch) + +## Decision + +We will integrate RuVector's RVF format and intelligence capabilities into the WiFi-DensePose system through a phased, modular approach across 9 integration domains, each detailed in subsequent ADRs (ADR-003 through ADR-010). + +### Integration Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WiFi-DensePose + RuVector │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ CSI Input │ │ RVF Store │ │ SONA │ │ GNN Layer │ │ +│ │ Pipeline │──▶│ (Vectors, │──▶│ Self-Learn │──▶│ Pattern │ │ +│ │ │ │ Indices) │ │ │ │ Enhancement │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ │ +│ ▼ ▼ ▼ ▼ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Feature │ │ HNSW │ │ Adaptive │ │ Pose │ │ +│ │ Extraction │ │ Search │ │ Weights │ │ Estimation │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ │ +│ └─────────────────┴─────────────────┴─────────────────┘ │ +│ │ │ +│ ┌──────────▼──────────┐ │ +│ │ Output Layer │ │ +│ │ • Pose Keypoints │ │ +│ │ • Body Segments │ │ +│ │ • UV Coordinates │ │ +│ │ • Confidence Maps │ │ +│ └──────────┬──────────┘ │ +│ │ │ +│ ┌───────────────────────────┼───────────────────────────┐ │ +│ ▼ ▼ ▼ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Witness │ │ Raft │ │ WASM │ │ +│ │ Chains │ │ Consensus │ │ Edge │ │ +│ │ (Audit) │ │ (Multi-AP) │ │ Runtime │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ Post-Quantum Crypto Layer │ │ +│ │ ML-DSA-65 │ Ed25519 │ SLH-DSA-128s │ SHAKE-256 │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### New Crate: `wifi-densepose-rvf` + +A new workspace member crate will serve as the integration layer: + +``` +crates/wifi-densepose-rvf/ +├── Cargo.toml +├── src/ +│ ├── lib.rs # Public API surface +│ ├── container.rs # RVF cognitive container management +│ ├── vector_store.rs # HNSW-backed CSI vector storage +│ ├── search.rs # Similarity search for fingerprinting +│ ├── learning.rs # SONA integration for online learning +│ ├── gnn.rs # GNN pattern enhancement layer +│ ├── attention.rs # Attention mechanism selection +│ ├── witness.rs # Witness chain audit trails +│ ├── consensus.rs # Raft consensus for multi-AP +│ ├── crypto.rs # Post-quantum crypto wrappers +│ ├── edge.rs # WASM edge runtime integration +│ └── adapters/ +│ ├── mod.rs +│ ├── signal_adapter.rs # Bridges wifi-densepose-signal +│ ├── nn_adapter.rs # Bridges wifi-densepose-nn +│ └── mat_adapter.rs # Bridges wifi-densepose-mat +``` + +### Phased Rollout + +| Phase | Timeline | ADR | Capability | Priority | +|-------|----------|-----|------------|----------| +| 1 | Weeks 1-3 | ADR-003 | RVF Cognitive Containers for CSI Data | Critical | +| 2 | Weeks 2-4 | ADR-004 | HNSW Vector Search for Signal Fingerprinting | Critical | +| 3 | Weeks 4-6 | ADR-005 | SONA Self-Learning for Pose Estimation | High | +| 4 | Weeks 5-7 | ADR-006 | GNN-Enhanced CSI Pattern Recognition | High | +| 5 | Weeks 6-8 | ADR-007 | Post-Quantum Cryptography for Secure Sensing | Medium | +| 6 | Weeks 7-9 | ADR-008 | Distributed Consensus for Multi-AP | Medium | +| 7 | Weeks 8-10 | ADR-009 | RVF WASM Runtime for Edge Deployment | Medium | +| 8 | Weeks 9-11 | ADR-010 | Witness Chains for Audit Trail Integrity | High (MAT) | + +### Dependency Strategy + +```toml +# In Cargo.toml workspace dependencies +[workspace.dependencies] +ruvector-core = { version = "0.1", features = ["hnsw", "sona", "gnn"] } +ruvector-data-framework = { version = "0.1", features = ["rvf", "witness", "crypto"] } +ruvector-consensus = { version = "0.1", features = ["raft"] } +ruvector-wasm = { version = "0.1", features = ["edge-runtime"] } +``` + +Feature flags control which RuVector capabilities are compiled in: + +```toml +[features] +default = ["rvf-store", "hnsw-search"] +rvf-store = ["ruvector-data-framework/rvf"] +hnsw-search = ["ruvector-core/hnsw"] +sona-learning = ["ruvector-core/sona"] +gnn-patterns = ["ruvector-core/gnn"] +post-quantum = ["ruvector-data-framework/crypto"] +witness-chains = ["ruvector-data-framework/witness"] +raft-consensus = ["ruvector-consensus/raft"] +wasm-edge = ["ruvector-wasm/edge-runtime"] +full = ["rvf-store", "hnsw-search", "sona-learning", "gnn-patterns", "post-quantum", "witness-chains", "raft-consensus", "wasm-edge"] +``` + +## Consequences + +### Positive + +- **10-100x faster pattern lookup**: HNSW replaces linear scan for CSI fingerprint matching +- **Continuous improvement**: SONA enables online adaptation without full retraining +- **Self-contained deployment**: RVF containers package everything needed for field operation +- **Tamper-evident records**: Witness chains provide cryptographic proof for disaster response auditing +- **Future-proof security**: Post-quantum signatures resist quantum computing attacks +- **Distributed operation**: Raft consensus enables coordinated multi-AP sensing +- **Ultra-light edge**: 5.5 KB WASM runtime enables browser and IoT deployment +- **Git-like versioning**: COW branching enables experimental model variations with minimal storage + +### Negative + +- **Increased binary size**: Full feature set adds significant dependencies (~15-30 MB) +- **Complexity**: 9 integration domains require careful coordination +- **Learning curve**: Team must understand RuVector's cognitive container paradigm +- **API stability risk**: RuVector is pre-1.0; APIs may change +- **Testing surface**: Each integration point requires dedicated test suites + +### Risks and Mitigations + +| Risk | Severity | Mitigation | +|------|----------|------------| +| RuVector API breaking changes | High | Pin versions, adapter pattern isolates impact | +| Performance regression from abstraction layers | Medium | Benchmark each integration point, zero-cost abstractions | +| Feature flag combinatorial complexity | Medium | CI matrix testing for key feature combinations | +| Over-engineering for current use cases | Medium | Phased rollout, each phase independently valuable | +| Binary size bloat for edge targets | Low | Feature flags ensure only needed capabilities compile | + +## Related ADRs + +- **ADR-001**: WiFi-Mat Disaster Detection Architecture (existing) +- **ADR-003**: RVF Cognitive Containers for CSI Data +- **ADR-004**: HNSW Vector Search for Signal Fingerprinting +- **ADR-005**: SONA Self-Learning for Pose Estimation +- **ADR-006**: GNN-Enhanced CSI Pattern Recognition +- **ADR-007**: Post-Quantum Cryptography for Secure Sensing +- **ADR-008**: Distributed Consensus for Multi-AP Coordination +- **ADR-009**: RVF WASM Runtime for Edge Deployment +- **ADR-010**: Witness Chains for Audit Trail Integrity + +## References + +- [RuVector Repository](https://github.com/ruvnet/ruvector) +- [HNSW Algorithm](https://arxiv.org/abs/1603.09320) +- [LoRA: Low-Rank Adaptation](https://arxiv.org/abs/2106.09685) +- [Elastic Weight Consolidation](https://arxiv.org/abs/1612.00796) +- [Raft Consensus](https://raft.github.io/raft.pdf) +- [ML-DSA (FIPS 204)](https://csrc.nist.gov/pubs/fips/204/final) +- [WiFi-DensePose Rust ADR-001: Workspace Structure](../rust-port/wifi-densepose-rs/docs/adr/ADR-001-workspace-structure.md) diff --git a/docs/adr/ADR-003-rvf-cognitive-containers-csi.md b/docs/adr/ADR-003-rvf-cognitive-containers-csi.md new file mode 100644 index 0000000..2f14ff1 --- /dev/null +++ b/docs/adr/ADR-003-rvf-cognitive-containers-csi.md @@ -0,0 +1,251 @@ +# ADR-003: RVF Cognitive Containers for CSI Data + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Problem + +WiFi-DensePose processes CSI (Channel State Information) data through a multi-stage pipeline: raw capture → preprocessing → feature extraction → neural inference → pose output. Each stage produces intermediate data that is currently ephemeral: + +1. **Raw CSI measurements** (`CsiData`): Amplitude matrices (num_antennas x num_subcarriers), phase arrays, SNR values, metadata. Stored only in a bounded `VecDeque` (max 500 entries in Python, similar in Rust). + +2. **Extracted features** (`CsiFeatures`): Amplitude mean/variance, phase differences, correlation matrices, Doppler shifts, power spectral density. Discarded after single-pass inference. + +3. **Trained model weights**: Static ONNX/PyTorch files loaded from disk. No mechanism to persist adapted weights or experimental variations. + +4. **Detection results** (`HumanDetectionResult`): Confidence scores, motion scores, detection booleans. Logged but not indexed for pattern retrieval. + +5. **Environment fingerprints**: Each physical space has a unique CSI signature affected by room geometry, furniture, building materials. No persistent fingerprint database exists. + +### Opportunity + +RuVector's RVF (Cognitive Container) format provides a single-file packaging solution with 25 segment types that can encapsulate the entire WiFi-DensePose operational state: + +``` +RVF Cognitive Container Structure: +┌─────────────────────────────────────────────┐ +│ HEADER │ Magic, version, segment count │ +├───────────┼─────────────────────────────────┤ +│ VEC │ CSI feature vectors │ +│ INDEX │ HNSW index over vectors │ +│ WASM │ Inference runtime │ +│ COW_MAP │ Copy-on-write branch state │ +│ WITNESS │ Audit chain entries │ +│ CRYPTO │ Signature keys, attestations │ +│ KERNEL │ Bootable runtime (optional) │ +│ EBPF │ Hardware-accelerated filters │ +│ ... │ (25 total segment types) │ +└─────────────────────────────────────────────┘ +``` + +## Decision + +We will adopt the RVF Cognitive Container format as the primary persistence and deployment unit for WiFi-DensePose operational data, implementing the following container types: + +### 1. CSI Fingerprint Container (`.rvf.csi`) + +Packages environment-specific CSI signatures for location recognition: + +```rust +/// CSI Fingerprint container storing environment signatures +pub struct CsiFingerprintContainer { + /// Container metadata + metadata: ContainerMetadata, + + /// VEC segment: Normalized CSI feature vectors + /// Each vector = [amplitude_mean(N) | amplitude_var(N) | phase_diff(N-1) | doppler(10) | psd(128)] + /// Typical dimensionality: 64 subcarriers → 64+64+63+10+128 = 329 dimensions + fingerprint_vectors: VecSegment, + + /// INDEX segment: HNSW index for O(log n) nearest-neighbor lookup + hnsw_index: IndexSegment, + + /// COW_MAP: Branches for different times-of-day, occupancy levels + branches: CowMapSegment, + + /// Metadata per vector: room_id, timestamp, occupancy_count, furniture_hash + annotations: AnnotationSegment, +} +``` + +**Vector encoding**: Each CSI snapshot is encoded as a fixed-dimension vector: +``` +CSI Feature Vector (329-dim for 64 subcarriers): +┌──────────────────┬──────────────────┬─────────────────┬──────────┬─────────┐ +│ amplitude_mean │ amplitude_var │ phase_diff │ doppler │ psd │ +│ [f32; 64] │ [f32; 64] │ [f32; 63] │ [f32; 10]│ [f32;128│ +└──────────────────┴──────────────────┴─────────────────┴──────────┴─────────┘ +``` + +### 2. Model Container (`.rvf.model`) + +Packages neural network weights with versioning: + +```rust +/// Model container with version tracking and A/B comparison +pub struct ModelContainer { + /// Container metadata with model version history + metadata: ContainerMetadata, + + /// Primary model weights (ONNX serialized) + primary_weights: BlobSegment, + + /// SONA adaptation deltas (LoRA low-rank matrices) + adaptation_deltas: VecSegment, + + /// COW branches for model experiments + /// e.g., "baseline", "adapted-office-env", "adapted-warehouse" + branches: CowMapSegment, + + /// Performance metrics per branch + metrics: AnnotationSegment, + + /// Witness chain: every weight update recorded + audit_trail: WitnessSegment, +} +``` + +### 3. Session Container (`.rvf.session`) + +Captures a complete sensing session for replay and analysis: + +```rust +/// Session container for recording and replaying sensing sessions +pub struct SessionContainer { + /// Session metadata (start time, duration, hardware config) + metadata: ContainerMetadata, + + /// Time-series CSI vectors at capture rate + csi_timeseries: VecSegment, + + /// Detection results aligned to CSI timestamps + detections: AnnotationSegment, + + /// Pose estimation outputs + poses: VecSegment, + + /// Index for temporal range queries + temporal_index: IndexSegment, + + /// Cryptographic integrity proof + witness_chain: WitnessSegment, +} +``` + +### Container Lifecycle + +``` + ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ + │ Create │───▶│ Ingest │───▶│ Query │───▶│ Branch │ + │ Container │ │ Vectors │ │ (HNSW) │ │ (COW) │ + └──────────┘ └──────────┘ └──────────┘ └──────────┘ + │ │ + │ ┌──────────┐ ┌──────────┐ │ + │ │ Merge │◀───│ Compare │◀─────────┘ + │ │ Branches │ │ Results │ + │ └────┬─────┘ └──────────┘ + │ │ + ▼ ▼ + ┌──────────┐ ┌──────────┐ + │ Export │ │ Deploy │ + │ (.rvf) │ │ (Edge) │ + └──────────┘ └──────────┘ +``` + +### Integration with Existing Crates + +The container system integrates through adapter traits: + +```rust +/// Trait for types that can be vectorized into RVF containers +pub trait RvfVectorizable { + /// Encode self as a fixed-dimension f32 vector + fn to_rvf_vector(&self) -> Vec; + + /// Reconstruct from an RVF vector + fn from_rvf_vector(vec: &[f32]) -> Result where Self: Sized; + + /// Vector dimensionality + fn vector_dim() -> usize; +} + +// Implementation for existing types +impl RvfVectorizable for CsiFeatures { + fn to_rvf_vector(&self) -> Vec { + let mut vec = Vec::with_capacity(Self::vector_dim()); + vec.extend(self.amplitude_mean.iter().map(|&x| x as f32)); + vec.extend(self.amplitude_variance.iter().map(|&x| x as f32)); + vec.extend(self.phase_difference.iter().map(|&x| x as f32)); + vec.extend(self.doppler_shift.iter().map(|&x| x as f32)); + vec.extend(self.power_spectral_density.iter().map(|&x| x as f32)); + vec + } + + fn vector_dim() -> usize { + // 64 + 64 + 63 + 10 + 128 = 329 (for 64 subcarriers) + 329 + } + // ... +} +``` + +### Storage Characteristics + +| Container Type | Typical Size | Vector Count | Use Case | +|----------------|-------------|-------------|----------| +| Fingerprint | 5-50 MB | 10K-100K | Room/building fingerprint DB | +| Model | 50-500 MB | N/A (blob) | Neural network deployment | +| Session | 10-200 MB | 50K-500K | 1-hour recording at 100 Hz | + +### COW Branching for Environment Adaptation + +The copy-on-write mechanism enables zero-overhead experimentation: + +``` +main (office baseline: 50K vectors) + ├── branch/morning (delta: 500 vectors, ~15 KB) + ├── branch/afternoon (delta: 800 vectors, ~24 KB) + ├── branch/occupied-10 (delta: 2K vectors, ~60 KB) + └── branch/furniture-moved (delta: 5K vectors, ~150 KB) +``` + +Total overhead for 4 branches on a 50K-vector container: ~250 KB additional (0.5%). + +## Consequences + +### Positive +- **Single-file deployment**: Move a fingerprint database between sites by copying one `.rvf` file +- **Versioned models**: A/B test model variants without duplicating full weight sets +- **Session replay**: Reproduce detection results from recorded CSI data +- **Atomic operations**: Container writes are transactional; no partial state corruption +- **Cross-platform**: Same container format works on server, WASM, and embedded +- **Storage efficient**: COW branching avoids duplicating unchanged data + +### Negative +- **Format lock-in**: RVF is not yet a widely-adopted standard +- **Serialization overhead**: Converting between native types and RVF vectors adds latency (~0.1-0.5 ms per vector) +- **Learning curve**: Team must understand segment types and container lifecycle +- **File size for sessions**: High-rate CSI capture (1000 Hz) generates large session containers + +### Performance Targets + +| Operation | Target Latency | Notes | +|-----------|---------------|-------| +| Container open | <10 ms | Memory-mapped I/O | +| Vector insert | <0.1 ms | Append to VEC segment | +| HNSW query (100K vectors) | <1 ms | See ADR-004 | +| Branch create | <1 ms | COW metadata only | +| Branch merge | <100 ms | Delta application | +| Container export | ~1 ms/MB | Sequential write | + +## References + +- [RuVector Cognitive Container Specification](https://github.com/ruvnet/ruvector) +- [Memory-Mapped I/O in Rust](https://docs.rs/memmap2) +- [Copy-on-Write Data Structures](https://en.wikipedia.org/wiki/Copy-on-write) +- ADR-002: RuVector RVF Integration Strategy diff --git a/docs/adr/ADR-004-hnsw-vector-search-fingerprinting.md b/docs/adr/ADR-004-hnsw-vector-search-fingerprinting.md new file mode 100644 index 0000000..9219b37 --- /dev/null +++ b/docs/adr/ADR-004-hnsw-vector-search-fingerprinting.md @@ -0,0 +1,270 @@ +# ADR-004: HNSW Vector Search for Signal Fingerprinting + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Current Signal Matching Limitations + +The WiFi-DensePose system needs to match incoming CSI patterns against known signatures for: + +1. **Environment recognition**: Identifying which room/area the device is in based on CSI characteristics +2. **Activity classification**: Matching current CSI patterns to known human activities (walking, sitting, falling) +3. **Anomaly detection**: Determining whether current readings deviate significantly from baseline +4. **Survivor re-identification** (MAT module): Tracking individual survivors across scan sessions + +Current approach in `CSIProcessor._calculate_detection_confidence()`: +```python +# Fixed thresholds, no similarity search +amplitude_indicator = np.mean(features.amplitude_mean) > 0.1 +phase_indicator = np.std(features.phase_difference) > 0.05 +motion_indicator = motion_score > 0.3 +confidence = (0.4 * amplitude_indicator + 0.3 * phase_indicator + 0.3 * motion_indicator) +``` + +This is a **O(1) fixed-threshold check** that: +- Cannot learn from past observations +- Has no concept of "similar patterns seen before" +- Requires manual threshold tuning per environment +- Produces binary indicators (above/below threshold) losing gradient information + +### What HNSW Provides + +Hierarchical Navigable Small World (HNSW) graphs enable approximate nearest-neighbor search in high-dimensional vector spaces with: + +- **O(log n) query time** vs O(n) brute-force +- **High recall**: >95% recall at 10x speed of exact search +- **Dynamic insertion**: New vectors added without full rebuild +- **SIMD acceleration**: RuVector's implementation uses AVX2/NEON for distance calculations + +RuVector extends standard HNSW with: +- **Hyperbolic HNSW**: Search in Poincaré ball space for hierarchy-aware results (e.g., "walking" is closer to "running" than to "sitting" in activity hierarchy) +- **GNN enhancement**: Graph neural networks refine neighbor connections after queries +- **Tiered compression**: 2-32x memory reduction through adaptive quantization + +## Decision + +We will integrate RuVector's HNSW implementation as the primary similarity search engine for all CSI pattern matching operations, replacing fixed-threshold detection with similarity-based retrieval. + +### Architecture + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ HNSW Search Pipeline │ +├─────────────────────────────────────────────────────────────────┤ +│ │ +│ CSI Input Feature Vector HNSW │ +│ ────────▶ Extraction ────▶ Encode ────▶ Search │ +│ (existing) (new) (new) │ +│ │ │ +│ ┌─────────────┤ │ +│ ▼ ▼ │ +│ Top-K Results Confidence │ +│ [vec_id, dist, Score from │ +│ metadata] Distance Dist. │ +│ │ │ +│ ▼ │ +│ ┌────────────┐ │ +│ │ Decision │ │ +│ │ Fusion │ │ +│ └────────────┘ │ +│ Combines HNSW similarity with │ +│ existing threshold-based logic │ +└─────────────────────────────────────────────────────────────────┘ +``` + +### Index Configuration + +```rust +/// HNSW configuration tuned for CSI vector characteristics +pub struct CsiHnswConfig { + /// Vector dimensionality (matches CsiFeatures encoding) + dim: usize, // 329 for 64 subcarriers + + /// Maximum number of connections per node per layer + /// Higher M = better recall, more memory + /// CSI vectors are moderately dimensional; M=16 balances well + m: usize, // 16 + + /// Size of dynamic candidate list during construction + /// ef_construction = 200 gives >99% recall for 329-dim vectors + ef_construction: usize, // 200 + + /// Size of dynamic candidate list during search + /// ef_search = 64 gives >95% recall with <1ms latency at 100K vectors + ef_search: usize, // 64 + + /// Distance metric + /// Cosine similarity works best for normalized CSI features + metric: DistanceMetric, // Cosine + + /// Maximum elements (pre-allocated for performance) + max_elements: usize, // 1_000_000 + + /// Enable SIMD acceleration + simd: bool, // true + + /// Quantization level for memory reduction + quantization: Quantization, // PQ8 (product quantization, 8-bit) +} +``` + +### Multiple Index Strategy + +Different use cases require different index configurations: + +| Index Name | Vectors | Dim | Distance | Use Case | +|-----------|---------|-----|----------|----------| +| `env_fingerprint` | 10K-1M | 329 | Cosine | Environment/room identification | +| `activity_pattern` | 1K-50K | 329 | Euclidean | Activity classification | +| `temporal_pattern` | 10K-500K | 329 | Cosine | Temporal anomaly detection | +| `survivor_track` | 100-10K | 329 | Cosine | MAT survivor re-identification | + +### Similarity-Based Detection Enhancement + +Replace fixed thresholds with distance-based confidence: + +```rust +/// Enhanced detection using HNSW similarity search +pub struct SimilarityDetector { + /// HNSW index of known human-present CSI patterns + human_patterns: HnswIndex, + + /// HNSW index of known empty-room CSI patterns + empty_patterns: HnswIndex, + + /// Fusion weight between similarity and threshold methods + fusion_alpha: f64, // 0.7 = 70% similarity, 30% threshold +} + +impl SimilarityDetector { + /// Detect human presence using similarity search + threshold fusion + pub fn detect(&self, features: &CsiFeatures) -> DetectionResult { + let query_vec = features.to_rvf_vector(); + + // Search both indices + let human_neighbors = self.human_patterns.search(&query_vec, k=5); + let empty_neighbors = self.empty_patterns.search(&query_vec, k=5); + + // Distance-based confidence + let avg_human_dist = human_neighbors.mean_distance(); + let avg_empty_dist = empty_neighbors.mean_distance(); + + // Similarity confidence: how much closer to human patterns vs empty + let similarity_confidence = avg_empty_dist / (avg_human_dist + avg_empty_dist); + + // Fuse with traditional threshold-based confidence + let threshold_confidence = self.traditional_threshold_detect(features); + let fused_confidence = self.fusion_alpha * similarity_confidence + + (1.0 - self.fusion_alpha) * threshold_confidence; + + DetectionResult { + human_detected: fused_confidence > 0.5, + confidence: fused_confidence, + similarity_confidence, + threshold_confidence, + nearest_human_pattern: human_neighbors[0].metadata.clone(), + nearest_empty_pattern: empty_neighbors[0].metadata.clone(), + } + } +} +``` + +### Incremental Learning Loop + +Every confirmed detection enriches the index: + +``` +1. CSI captured → features extracted → vector encoded +2. HNSW search returns top-K neighbors + distances +3. Detection decision made (similarity + threshold fusion) +4. If confirmed (by temporal consistency or ground truth): + a. Insert vector into appropriate index (human/empty) + b. GNN layer updates neighbor relationships (ADR-006) + c. SONA adapts fusion weights (ADR-005) +5. Periodically: prune stale vectors, rebuild index layers +``` + +### Performance Analysis + +**Memory requirements** (PQ8 quantization): + +| Vector Count | Raw Size | PQ8 Compressed | HNSW Overhead | Total | +|-------------|----------|----------------|---------------|-------| +| 10,000 | 12.9 MB | 1.6 MB | 2.5 MB | 4.1 MB | +| 100,000 | 129 MB | 16 MB | 25 MB | 41 MB | +| 1,000,000 | 1.29 GB | 160 MB | 250 MB | 410 MB | + +**Latency expectations** (329-dim vectors, ef_search=64): + +| Vector Count | Brute Force | HNSW | Speedup | +|-------------|-------------|------|---------| +| 10,000 | 3.2 ms | 0.08 ms | 40x | +| 100,000 | 32 ms | 0.3 ms | 107x | +| 1,000,000 | 320 ms | 0.9 ms | 356x | + +### Hyperbolic Extension for Activity Hierarchy + +WiFi-sensed activities have natural hierarchy: + +``` + motion + / \ + locomotion stationary + / \ / \ + walking running sitting lying + / \ + normal shuffling +``` + +Hyperbolic HNSW in Poincaré ball space preserves this hierarchy during search, so a query for "shuffling" returns "walking" before "sitting" even if Euclidean distances are similar. + +```rust +/// Hyperbolic HNSW for hierarchy-aware activity matching +pub struct HyperbolicActivityIndex { + index: HnswIndex, + curvature: f64, // -1.0 for unit Poincaré ball +} + +impl HyperbolicActivityIndex { + pub fn search(&self, query: &[f32], k: usize) -> Vec { + // Uses Poincaré distance: d(u,v) = arcosh(1 + 2||u-v||²/((1-||u||²)(1-||v||²))) + self.index.search_hyperbolic(query, k, self.curvature) + } +} +``` + +## Consequences + +### Positive +- **Adaptive detection**: System improves with more data; no manual threshold tuning +- **Sub-millisecond search**: HNSW provides <1ms queries even at 1M vectors +- **Memory efficient**: PQ8 reduces storage 8x with <5% recall loss +- **Hierarchy-aware**: Hyperbolic mode respects activity relationships +- **Incremental**: New patterns added without full index rebuild +- **Explainable**: "This detection matched pattern X from room Y at time Z" + +### Negative +- **Cold-start problem**: Need initial fingerprint data before similarity search is useful +- **Index maintenance**: Periodic pruning and layer rebalancing needed +- **Approximation**: HNSW is approximate; may miss exact nearest neighbor (mitigated by high ef_search) +- **Memory for indices**: HNSW graph structure adds 2.5x overhead on top of vectors + +### Migration Strategy + +1. **Phase 1**: Run HNSW search in parallel with existing threshold detection, log both results +2. **Phase 2**: A/B test fusion weights (alpha parameter) on labeled data +3. **Phase 3**: Gradually increase fusion_alpha from 0.0 (pure threshold) to 0.7 (primarily similarity) +4. **Phase 4**: Threshold detection becomes fallback for cold-start/empty-index scenarios + +## References + +- [HNSW: Efficient and Robust Approximate Nearest Neighbor](https://arxiv.org/abs/1603.09320) +- [Product Quantization for Nearest Neighbor Search](https://hal.inria.fr/inria-00514462) +- [Poincaré Embeddings for Learning Hierarchical Representations](https://arxiv.org/abs/1705.08039) +- [RuVector HNSW Implementation](https://github.com/ruvnet/ruvector) +- ADR-003: RVF Cognitive Containers for CSI Data diff --git a/docs/adr/ADR-005-sona-self-learning-pose-estimation.md b/docs/adr/ADR-005-sona-self-learning-pose-estimation.md new file mode 100644 index 0000000..153b3b5 --- /dev/null +++ b/docs/adr/ADR-005-sona-self-learning-pose-estimation.md @@ -0,0 +1,253 @@ +# ADR-005: SONA Self-Learning for Pose Estimation + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Static Model Problem + +The WiFi-DensePose modality translation network (`ModalityTranslationNetwork` in Python, `ModalityTranslator` in Rust) converts CSI features into visual-like feature maps that feed the DensePose head for body segmentation and UV coordinate estimation. These models are trained offline and deployed with frozen weights. + +**Critical limitations of static models**: + +1. **Environment drift**: CSI characteristics change when furniture moves, new objects are introduced, or building occupancy changes. A model trained in Lab A degrades in Lab B without retraining. + +2. **Hardware variance**: Different WiFi chipsets (Intel AX200 vs Broadcom BCM4375 vs Qualcomm WCN6855) produce subtly different CSI patterns. Static models overfit to training hardware. + +3. **Temporal drift**: Even in the same environment, CSI patterns shift with temperature, humidity, and electromagnetic interference changes throughout the day. + +4. **Population bias**: Models trained on one demographic may underperform on body types, heights, or movement patterns not represented in training data. + +Current mitigation: manual retraining with new data, which requires: +- Collecting labeled data in the new environment +- GPU-intensive training (hours to days) +- Model export/deployment cycle +- Downtime during switchover + +### SONA Opportunity + +RuVector's Self-Optimizing Neural Architecture (SONA) provides <1ms online adaptation through: + +- **LoRA (Low-Rank Adaptation)**: Instead of updating all weights (millions of parameters), LoRA injects small trainable rank decomposition matrices into frozen model layers. For a weight matrix W ∈ R^(d×k), LoRA learns A ∈ R^(d×r) and B ∈ R^(r×k) where r << min(d,k), so the adapted weight is W + AB. + +- **EWC++ (Elastic Weight Consolidation)**: Prevents catastrophic forgetting by penalizing changes to parameters important for previously learned tasks. Each parameter has a Fisher information-weighted importance score. + +- **Online gradient accumulation**: Small batches of live data (as few as 1-10 samples) contribute to adaptation without full backward passes. + +## Decision + +We will integrate SONA as the online learning engine for both the modality translation network and the DensePose head, enabling continuous environment-specific adaptation without offline retraining. + +### Adaptation Architecture + +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ SONA Adaptation Pipeline │ +├──────────────────────────────────────────────────────────────────────┤ +│ │ +│ Frozen Base Model LoRA Adaptation Matrices │ +│ ┌─────────────────┐ ┌──────────────────────┐ │ +│ │ Conv2d(64,128) │ ◀── W_frozen ──▶ │ A(64,r) × B(r,128) │ │ +│ │ Conv2d(128,256) │ ◀── W_frozen ──▶ │ A(128,r) × B(r,256)│ │ +│ │ Conv2d(256,512) │ ◀── W_frozen ──▶ │ A(256,r) × B(r,512)│ │ +│ │ ConvT(512,256) │ ◀── W_frozen ──▶ │ A(512,r) × B(r,256)│ │ +│ │ ... │ │ ... │ │ +│ └─────────────────┘ └──────────────────────┘ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ Effective Weight = W_frozen + α(AB) │ │ +│ │ α = scaling factor (0.0 → 1.0 over time) │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ EWC++ Regularizer │ │ +│ │ L_total = L_task + λ Σ F_i (θ_i - θ*_i)² │ │ +│ │ │ │ +│ │ F_i = Fisher information (parameter importance) │ │ +│ │ θ*_i = optimal parameters from previous tasks │ │ +│ │ λ = regularization strength (10-100) │ │ +│ └─────────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────────┘ +``` + +### LoRA Configuration per Layer + +```rust +/// SONA LoRA configuration for WiFi-DensePose +pub struct SonaConfig { + /// LoRA rank (r): dimensionality of adaptation matrices + /// r=4 for encoder layers (less variation needed) + /// r=8 for decoder layers (more expression needed) + /// r=16 for final output layers (maximum adaptability) + lora_ranks: HashMap, + + /// Scaling factor alpha: controls adaptation strength + /// Starts at 0.0 (pure frozen model), increases to target + alpha: f64, // Target: 0.3 + + /// Alpha warmup steps before reaching target + alpha_warmup_steps: usize, // 100 + + /// EWC++ regularization strength + ewc_lambda: f64, // 50.0 + + /// Fisher information estimation samples + fisher_samples: usize, // 200 + + /// Online learning rate (much smaller than offline training) + online_lr: f64, // 1e-5 + + /// Gradient accumulation steps before applying update + accumulation_steps: usize, // 10 + + /// Maximum adaptation delta (safety bound) + max_delta_norm: f64, // 0.1 +} +``` + +**Parameter budget**: + +| Layer | Original Params | LoRA Rank | LoRA Params | Overhead | +|-------|----------------|-----------|-------------|----------| +| Encoder Conv1 (64→128) | 73,728 | 4 | 768 | 1.0% | +| Encoder Conv2 (128→256) | 294,912 | 4 | 1,536 | 0.5% | +| Encoder Conv3 (256→512) | 1,179,648 | 4 | 3,072 | 0.3% | +| Decoder ConvT1 (512→256) | 1,179,648 | 8 | 6,144 | 0.5% | +| Decoder ConvT2 (256→128) | 294,912 | 8 | 3,072 | 1.0% | +| Output Conv (128→24) | 27,648 | 16 | 2,432 | 8.8% | +| **Total** | **3,050,496** | - | **17,024** | **0.56%** | + +SONA adapts **0.56% of parameters** while achieving 70-90% of the accuracy improvement of full fine-tuning. + +### Adaptation Trigger Conditions + +```rust +/// When to trigger SONA adaptation +pub enum AdaptationTrigger { + /// Detection confidence drops below threshold over N samples + ConfidenceDrop { + threshold: f64, // 0.6 + window_size: usize, // 50 + }, + + /// CSI statistics drift beyond baseline (KL divergence) + DistributionDrift { + kl_threshold: f64, // 0.5 + reference_window: usize, // 1000 + }, + + /// New environment detected (no close HNSW matches) + NewEnvironment { + min_distance: f64, // 0.8 (far from all known fingerprints) + }, + + /// Periodic adaptation (maintenance) + Periodic { + interval_samples: usize, // 10000 + }, + + /// Manual trigger via API + Manual, +} +``` + +### Adaptation Feedback Sources + +Since WiFi-DensePose lacks camera ground truth in deployment, adaptation uses **self-supervised signals**: + +1. **Temporal consistency**: Pose estimates should change smoothly between frames. Jerky transitions indicate prediction error. + ``` + L_temporal = ||pose(t) - pose(t-1)||² when Δt < 100ms + ``` + +2. **Physical plausibility**: Body part positions must satisfy skeletal constraints (limb lengths, joint angles). + ``` + L_skeleton = Σ max(0, |limb_length - expected_length| - tolerance) + ``` + +3. **Multi-view agreement** (multi-AP): Different APs observing the same person should produce consistent poses. + ``` + L_multiview = ||pose_AP1 - transform(pose_AP2)||² + ``` + +4. **Detection stability**: Confidence should be high when the environment is stable. + ``` + L_stability = -log(confidence) when variance(CSI_window) < threshold + ``` + +### Safety Mechanisms + +```rust +/// Safety bounds prevent adaptation from degrading the model +pub struct AdaptationSafety { + /// Maximum parameter change per update step + max_step_norm: f64, + + /// Rollback if validation loss increases by this factor + rollback_threshold: f64, // 1.5 (50% worse = rollback) + + /// Keep N checkpoints for rollback + checkpoint_count: usize, // 5 + + /// Disable adaptation after N consecutive rollbacks + max_consecutive_rollbacks: usize, // 3 + + /// Minimum samples between adaptations + cooldown_samples: usize, // 100 +} +``` + +### Persistence via RVF + +Adaptation state is stored in the Model Container (ADR-003): +- LoRA matrices A and B serialized to VEC segment +- Fisher information matrix serialized alongside +- Each adaptation creates a witness chain entry (ADR-010) +- COW branching allows reverting to any previous adaptation state + +``` +model.rvf.model + ├── main (frozen base weights) + ├── branch/adapted-office-2024-01 (LoRA deltas) + ├── branch/adapted-warehouse (LoRA deltas) + └── branch/adapted-outdoor-disaster (LoRA deltas) +``` + +## Consequences + +### Positive +- **Zero-downtime adaptation**: Model improves continuously during operation +- **Tiny overhead**: 17K parameters (0.56%) vs 3M full model; <1ms per adaptation step +- **No forgetting**: EWC++ preserves performance on previously-seen environments +- **Portable adaptations**: LoRA deltas are ~70 KB, easily shared between devices +- **Safe rollback**: Checkpoint system prevents runaway degradation +- **Self-supervised**: No labeled data needed during deployment + +### Negative +- **Bounded expressiveness**: LoRA rank limits the degree of adaptation; extreme environment changes may require offline retraining +- **Feedback noise**: Self-supervised signals are weaker than ground-truth labels; adaptation is slower and less precise +- **Compute on device**: Even small gradient computations require tensor math on the inference device +- **Complexity**: Debugging adapted models is harder than static models +- **Hyperparameter sensitivity**: EWC lambda, LoRA rank, learning rate require tuning + +### Validation Plan + +1. **Offline validation**: Train base model on Environment A, test SONA adaptation to Environment B with known ground truth. Measure pose estimation MPJPE (Mean Per-Joint Position Error) improvement. +2. **A/B deployment**: Run static model and SONA-adapted model in parallel on same CSI stream. Compare detection rates and pose consistency. +3. **Stress test**: Rapidly change environments (simulated) and verify EWC++ prevents catastrophic forgetting. +4. **Edge latency**: Benchmark adaptation step on target hardware (Raspberry Pi 4, Jetson Nano, browser WASM). + +## References + +- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) +- [Elastic Weight Consolidation (EWC)](https://arxiv.org/abs/1612.00796) +- [Continual Learning with SONA](https://github.com/ruvnet/ruvector) +- [Self-Supervised WiFi Sensing](https://arxiv.org/abs/2203.11928) +- ADR-002: RuVector RVF Integration Strategy +- ADR-003: RVF Cognitive Containers for CSI Data diff --git a/docs/adr/ADR-006-gnn-enhanced-csi-pattern-recognition.md b/docs/adr/ADR-006-gnn-enhanced-csi-pattern-recognition.md new file mode 100644 index 0000000..482d676 --- /dev/null +++ b/docs/adr/ADR-006-gnn-enhanced-csi-pattern-recognition.md @@ -0,0 +1,261 @@ +# ADR-006: GNN-Enhanced CSI Pattern Recognition + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Limitations of Independent Vector Search + +ADR-004 introduces HNSW-based similarity search for CSI pattern matching. While HNSW provides fast nearest-neighbor retrieval, it treats each vector independently. CSI patterns, however, have rich relational structure: + +1. **Temporal adjacency**: CSI frames captured 10ms apart are more related than frames 10s apart. Sequential patterns reveal motion trajectories. + +2. **Spatial correlation**: CSI readings from adjacent subcarriers are highly correlated due to frequency proximity. Antenna pairs capture different spatial perspectives. + +3. **Cross-session similarity**: The "walking to kitchen" pattern from Tuesday should inform Wednesday's recognition, but the environment baseline may have shifted. + +4. **Multi-person entanglement**: When multiple people are present, CSI patterns are superpositions. Disentangling requires understanding which pattern fragments co-occur. + +Standard HNSW cannot capture these relationships. Each query returns neighbors based solely on vector distance, ignoring the graph structure of how patterns relate to each other. + +### RuVector's GNN Enhancement + +RuVector implements a Graph Neural Network layer that sits on top of the HNSW index: + +``` +Standard HNSW: Query → Distance-based neighbors → Results +GNN-Enhanced: Query → Distance-based neighbors → GNN refinement → Improved results +``` + +The GNN performs three operations in <1ms: +1. **Message passing**: Each node aggregates information from its HNSW neighbors +2. **Attention weighting**: Multi-head attention identifies which neighbors are most relevant for the current query context +3. **Representation update**: Node embeddings are refined based on neighborhood context + +Additionally, **temporal learning** tracks query sequences to discover: +- Vectors that frequently appear together in sessions +- Temporal ordering patterns (A usually precedes B) +- Session context that changes relevance rankings + +## Decision + +We will integrate RuVector's GNN layer to enhance CSI pattern recognition with three core capabilities: relational search, temporal sequence modeling, and multi-person disentanglement. + +### GNN Architecture for CSI + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ GNN-Enhanced CSI Pattern Graph │ +├─────────────────────────────────────────────────────────────────────┤ +│ │ +│ Layer 1: HNSW Spatial Graph │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ Nodes = CSI feature vectors │ │ +│ │ Edges = HNSW neighbor connections (distance-based) │ │ +│ │ Node features = [amplitude | phase | doppler | PSD] │ │ +│ └───────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ Layer 2: Temporal Edges │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ Additional edges between temporally adjacent vectors │ │ +│ │ Edge weight = 1/Δt (closer in time = stronger) │ │ +│ │ Direction = causal (past → future) │ │ +│ └───────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ Layer 3: GNN Message Passing (2 rounds) │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ Round 1: h_i = σ(W₁·h_i + Σⱼ α_ij · W₂·h_j) │ │ +│ │ Round 2: h_i = σ(W₃·h_i + Σⱼ α'_ij · W₄·h_j) │ │ +│ │ α_ij = softmax(LeakyReLU(a^T[W·h_i || W·h_j])) │ │ +│ │ (Graph Attention Network mechanism) │ │ +│ └───────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ Layer 4: Refined Representations │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ Updated vectors incorporate neighborhood context │ │ +│ │ Re-rank search results using refined distances │ │ +│ └───────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### Three Integration Modes + +#### Mode 1: Query-Time Refinement (Default) + +GNN refines HNSW results after retrieval. No modifications to stored vectors. + +```rust +pub struct GnnQueryRefiner { + /// GNN weights (small: ~50K parameters) + gnn_weights: GnnModel, + + /// Number of message passing rounds + num_rounds: usize, // 2 + + /// Attention heads for neighbor weighting + num_heads: usize, // 4 + + /// How many HNSW neighbors to consider in GNN + neighborhood_size: usize, // 20 (retrieve 20, GNN selects best 5) +} + +impl GnnQueryRefiner { + /// Refine HNSW results using graph context + pub fn refine(&self, query: &[f32], hnsw_results: &[SearchResult]) -> Vec { + // Build local subgraph from query + HNSW results + let subgraph = self.build_local_subgraph(query, hnsw_results); + + // Run message passing + let refined = self.message_pass(&subgraph, self.num_rounds); + + // Re-rank based on refined representations + self.rerank(query, &refined) + } +} +``` + +**Latency**: +0.2ms on top of HNSW search (total <1.5ms for 100K vectors). + +#### Mode 2: Temporal Sequence Recognition + +Tracks CSI vector sequences to recognize activity patterns that span multiple frames: + +```rust +/// Temporal pattern recognizer using GNN edges +pub struct TemporalPatternRecognizer { + /// Sliding window of recent query vectors + window: VecDeque, + + /// Maximum window size (in frames) + max_window: usize, // 100 (10 seconds at 10 Hz) + + /// Temporal edge decay factor + decay: f64, // 0.95 (edges weaken with time) + + /// Known activity sequences (learned from data) + activity_templates: HashMap>>, +} + +impl TemporalPatternRecognizer { + /// Feed new CSI vector and check for activity pattern matches + pub fn observe(&mut self, vector: &[f32], timestamp: f64) -> Vec { + self.window.push_back(TimestampedVector { vector: vector.to_vec(), timestamp }); + + // Build temporal subgraph from window + let temporal_graph = self.build_temporal_graph(); + + // GNN aggregates temporal context + let sequence_embedding = self.gnn_aggregate(&temporal_graph); + + // Match against known activity templates + self.match_activities(&sequence_embedding) + } +} +``` + +**Activity patterns detectable**: + +| Activity | Frames Needed | CSI Signature | +|----------|--------------|---------------| +| Walking | 10-30 | Periodic Doppler oscillation | +| Falling | 5-15 | Sharp amplitude spike → stillness | +| Sitting down | 10-20 | Gradual descent in reflection height | +| Breathing (still) | 30-100 | Micro-periodic phase variation | +| Gesture (wave) | 5-15 | Localized high-frequency amplitude variation | + +#### Mode 3: Multi-Person Disentanglement + +When N>1 people are present, CSI is a superposition. The GNN learns to cluster pattern fragments: + +```rust +/// Multi-person CSI disentanglement using GNN clustering +pub struct MultiPersonDisentangler { + /// Maximum expected simultaneous persons + max_persons: usize, // 10 + + /// GNN-based spectral clustering + cluster_gnn: GnnModel, + + /// Per-person tracking state + person_tracks: Vec, +} + +impl MultiPersonDisentangler { + /// Separate CSI features into per-person components + pub fn disentangle(&mut self, features: &CsiFeatures) -> Vec { + // Decompose CSI into subcarrier groups using GNN attention + let subcarrier_graph = self.build_subcarrier_graph(features); + + // GNN clusters subcarriers by person contribution + let clusters = self.cluster_gnn.cluster(&subcarrier_graph, self.max_persons); + + // Extract per-person features from clustered subcarriers + clusters.iter().map(|c| self.extract_person_features(features, c)).collect() + } +} +``` + +### GNN Learning Loop + +The GNN improves with every query through RuVector's built-in learning: + +``` +Query → HNSW retrieval → GNN refinement → User action (click/confirm/reject) + │ + ▼ + Update GNN weights via: + 1. Positive: confirmed results get higher attention + 2. Negative: rejected results get lower attention + 3. Temporal: successful sequences reinforce edges +``` + +For WiFi-DensePose, "user action" is replaced by: +- **Temporal consistency**: If frame N+1 confirms frame N's detection, reinforce +- **Multi-AP agreement**: If two APs agree on detection, reinforce both +- **Physical plausibility**: If pose satisfies skeletal constraints, reinforce + +### Performance Budget + +| Component | Parameters | Memory | Latency (per query) | +|-----------|-----------|--------|-------------------| +| GNN weights (2 layers, 4 heads) | 52K | 208 KB | 0.15 ms | +| Temporal graph (100-frame window) | N/A | ~130 KB | 0.05 ms | +| Multi-person clustering | 18K | 72 KB | 0.3 ms | +| **Total GNN overhead** | **70K** | **410 KB** | **0.5 ms** | + +## Consequences + +### Positive +- **Context-aware search**: Results account for temporal and spatial relationships, not just vector distance +- **Activity recognition**: Temporal GNN enables sequence-level pattern matching +- **Multi-person support**: GNN clustering separates overlapping CSI patterns +- **Self-improving**: Every query provides learning signal to refine attention weights +- **Lightweight**: 70K parameters, 410 KB memory, 0.5ms latency overhead + +### Negative +- **Training data needed**: GNN weights require initial training on CSI pattern graphs +- **Complexity**: Three modes increase testing and debugging surface +- **Graph maintenance**: Temporal edges must be pruned to prevent unbounded growth +- **Approximation**: GNN clustering for multi-person is approximate; may merge/split incorrectly + +### Interaction with Other ADRs +- **ADR-004** (HNSW): GNN operates on HNSW graph structure; depends on HNSW being available +- **ADR-005** (SONA): GNN weights can be adapted via SONA LoRA for environment-specific tuning +- **ADR-003** (RVF): GNN weights stored in model container alongside inference weights +- **ADR-010** (Witness): GNN weight updates recorded in witness chain + +## References + +- [Graph Attention Networks (GAT)](https://arxiv.org/abs/1710.10903) +- [Temporal Graph Networks](https://arxiv.org/abs/2006.10637) +- [Spectral Clustering with Graph Neural Networks](https://arxiv.org/abs/1907.00481) +- [WiFi-based Multi-Person Sensing](https://dl.acm.org/doi/10.1145/3534592) +- [RuVector GNN Implementation](https://github.com/ruvnet/ruvector) +- ADR-004: HNSW Vector Search for Signal Fingerprinting diff --git a/docs/adr/ADR-007-post-quantum-cryptography-secure-sensing.md b/docs/adr/ADR-007-post-quantum-cryptography-secure-sensing.md new file mode 100644 index 0000000..bb72649 --- /dev/null +++ b/docs/adr/ADR-007-post-quantum-cryptography-secure-sensing.md @@ -0,0 +1,215 @@ +# ADR-007: Post-Quantum Cryptography for Secure Sensing + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Threat Model + +WiFi-DensePose processes data that can reveal: +- **Human presence/absence** in private spaces (surveillance risk) +- **Health indicators** via breathing/heartbeat detection (medical privacy) +- **Movement patterns** (behavioral profiling) +- **Building occupancy** (physical security intelligence) + +In disaster scenarios (wifi-densepose-mat), the stakes are even higher: +- **Triage classifications** affect rescue priority (life-or-death decisions) +- **Survivor locations** are operationally sensitive +- **Detection audit trails** may be used in legal proceedings (liability) +- **False negatives** (missed survivors) could be forensically investigated + +Current security: The system uses standard JWT (HS256) for API authentication and has no cryptographic protection on data at rest, model integrity, or detection audit trails. + +### Quantum Threat Timeline + +NIST estimates cryptographically relevant quantum computers could emerge by 2030-2035. Data captured today with classical encryption may be decrypted retroactively ("harvest now, decrypt later"). For a system that may be deployed for decades in infrastructure, post-quantum readiness is prudent. + +### RuVector's Crypto Stack + +RuVector provides a layered cryptographic system: + +| Algorithm | Purpose | Standard | Quantum Resistant | +|-----------|---------|----------|-------------------| +| ML-DSA-65 | Digital signatures | FIPS 204 | Yes (lattice-based) | +| Ed25519 | Digital signatures | RFC 8032 | No (classical fallback) | +| SLH-DSA-128s | Digital signatures | FIPS 205 | Yes (hash-based) | +| SHAKE-256 | Hashing | FIPS 202 | Yes | +| AES-256-GCM | Symmetric encryption | FIPS 197 | Yes (Grover's halves, still 128-bit) | + +## Decision + +We will integrate RuVector's cryptographic layer to provide defense-in-depth for WiFi-DensePose data, using a **hybrid classical+PQ** approach where both Ed25519 and ML-DSA-65 signatures are applied (belt-and-suspenders until PQ algorithms mature). + +### Cryptographic Scope + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ Cryptographic Protection Layers │ +├──────────────────────────────────────────────────────────────────┤ +│ │ +│ 1. MODEL INTEGRITY │ +│ ┌─────────────────────────────────────────────────────┐ │ +│ │ Model weights signed with ML-DSA-65 + Ed25519 │ │ +│ │ Signature verified at load time → reject tampered │ │ +│ │ SONA adaptations co-signed with device key │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ │ +│ 2. DATA AT REST (RVF containers) │ +│ ┌─────────────────────────────────────────────────────┐ │ +│ │ CSI vectors encrypted with AES-256-GCM │ │ +│ │ Container integrity via SHAKE-256 Merkle tree │ │ +│ │ Key management: per-container keys, sealed to device │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ │ +│ 3. DATA IN TRANSIT │ +│ ┌─────────────────────────────────────────────────────┐ │ +│ │ API: TLS 1.3 with PQ key exchange (ML-KEM-768) │ │ +│ │ WebSocket: Same TLS channel │ │ +│ │ Multi-AP sync: mTLS with device certificates │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ │ +│ 4. AUDIT TRAIL (witness chains - see ADR-010) │ +│ ┌─────────────────────────────────────────────────────┐ │ +│ │ Every detection event hash-chained with SHAKE-256 │ │ +│ │ Chain anchors signed with ML-DSA-65 │ │ +│ │ Cross-device attestation via SLH-DSA-128s │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ │ +│ 5. DEVICE IDENTITY │ +│ ┌─────────────────────────────────────────────────────┐ │ +│ │ Each sensing device has a key pair (ML-DSA-65) │ │ +│ │ Device attestation proves hardware integrity │ │ +│ │ Key rotation schedule: 90 days (or on compromise) │ │ +│ └─────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────┘ +``` + +### Hybrid Signature Scheme + +```rust +/// Hybrid signature combining classical Ed25519 with PQ ML-DSA-65 +pub struct HybridSignature { + /// Classical Ed25519 signature (64 bytes) + ed25519_sig: [u8; 64], + + /// Post-quantum ML-DSA-65 signature (3309 bytes) + ml_dsa_sig: Vec, + + /// Signer's public key fingerprint (SHAKE-256, 32 bytes) + signer_fingerprint: [u8; 32], + + /// Timestamp of signing + timestamp: u64, +} + +impl HybridSignature { + /// Verify requires BOTH signatures to be valid + pub fn verify(&self, message: &[u8], ed25519_pk: &Ed25519PublicKey, + ml_dsa_pk: &MlDsaPublicKey) -> Result { + let ed25519_valid = ed25519_pk.verify(message, &self.ed25519_sig)?; + let ml_dsa_valid = ml_dsa_pk.verify(message, &self.ml_dsa_sig)?; + + // Both must pass (defense in depth) + Ok(ed25519_valid && ml_dsa_valid) + } +} +``` + +### Model Integrity Verification + +```rust +/// Verify model weights have not been tampered with +pub fn verify_model_integrity(model_container: &ModelContainer) -> Result<(), SecurityError> { + // 1. Extract embedded signature from container + let signature = model_container.crypto_segment().signature()?; + + // 2. Compute SHAKE-256 hash of weight data + let weight_hash = shake256(model_container.weights_segment().data()); + + // 3. Verify hybrid signature + let publisher_keys = load_publisher_keys()?; + if !signature.verify(&weight_hash, &publisher_keys.ed25519, &publisher_keys.ml_dsa)? { + return Err(SecurityError::ModelTampered { + expected_signer: publisher_keys.fingerprint(), + container_path: model_container.path().to_owned(), + }); + } + + Ok(()) +} +``` + +### CSI Data Encryption + +For privacy-sensitive deployments, CSI vectors can be encrypted at rest: + +```rust +/// Encrypt CSI vectors for storage in RVF container +pub struct CsiEncryptor { + /// AES-256-GCM key (derived from device key + container salt) + key: Aes256GcmKey, +} + +impl CsiEncryptor { + /// Encrypt a CSI feature vector + /// Note: HNSW search operates on encrypted vectors using + /// distance-preserving encryption (approximate, configurable trade-off) + pub fn encrypt_vector(&self, vector: &[f32]) -> EncryptedVector { + let nonce = generate_nonce(); + let plaintext = bytemuck::cast_slice::(vector); + let ciphertext = aes_256_gcm_encrypt(&self.key, &nonce, plaintext); + EncryptedVector { ciphertext, nonce } + } +} +``` + +### Performance Impact + +| Operation | Without Crypto | With Crypto | Overhead | +|-----------|---------------|-------------|----------| +| Model load | 50 ms | 52 ms | +2 ms (signature verify) | +| Vector insert | 0.1 ms | 0.15 ms | +0.05 ms (encrypt) | +| HNSW search | 0.3 ms | 0.35 ms | +0.05 ms (decrypt top-K) | +| Container open | 10 ms | 12 ms | +2 ms (integrity check) | +| Detection event logging | 0.01 ms | 0.5 ms | +0.49 ms (hash chain) | + +### Feature Flags + +```toml +[features] +default = [] +crypto-classical = ["ed25519-dalek"] # Ed25519 only +crypto-pq = ["pqcrypto-dilithium", "pqcrypto-sphincsplus"] # ML-DSA + SLH-DSA +crypto-hybrid = ["crypto-classical", "crypto-pq"] # Both (recommended) +crypto-encrypt = ["aes-gcm"] # Data-at-rest encryption +crypto-full = ["crypto-hybrid", "crypto-encrypt"] +``` + +## Consequences + +### Positive +- **Future-proof**: Lattice-based signatures resist quantum attacks +- **Tamper detection**: Model poisoning and data manipulation are detectable +- **Privacy compliance**: Encrypted CSI data meets GDPR/HIPAA requirements +- **Forensic integrity**: Signed audit trails are admissible as evidence +- **Low overhead**: <1ms per operation for most crypto operations + +### Negative +- **Signature size**: ML-DSA-65 signatures are 3.3 KB vs 64 bytes for Ed25519 +- **Key management complexity**: Device key provisioning, rotation, revocation +- **HNSW on encrypted data**: Distance-preserving encryption is approximate; search recall may degrade +- **Dependency weight**: PQ crypto libraries add ~2 MB to binary +- **Standards maturity**: FIPS 204/205 are finalized but implementations are evolving + +## References + +- [FIPS 204: ML-DSA (Module-Lattice Digital Signature)](https://csrc.nist.gov/pubs/fips/204/final) +- [FIPS 205: SLH-DSA (Stateless Hash-Based Digital Signature)](https://csrc.nist.gov/pubs/fips/205/final) +- [FIPS 202: SHA-3 / SHAKE](https://csrc.nist.gov/pubs/fips/202/final) +- [RuVector Crypto Implementation](https://github.com/ruvnet/ruvector) +- ADR-002: RuVector RVF Integration Strategy +- ADR-010: Witness Chains for Audit Trail Integrity diff --git a/docs/adr/ADR-008-distributed-consensus-multi-ap.md b/docs/adr/ADR-008-distributed-consensus-multi-ap.md new file mode 100644 index 0000000..9f5acc3 --- /dev/null +++ b/docs/adr/ADR-008-distributed-consensus-multi-ap.md @@ -0,0 +1,284 @@ +# ADR-008: Distributed Consensus for Multi-AP Coordination + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Multi-AP Sensing Architecture + +WiFi-DensePose achieves higher accuracy and coverage with multiple access points (APs) observing the same space from different angles. The disaster detection module (wifi-densepose-mat, ADR-001) explicitly requires distributed deployment: + +- **Portable**: Single TX/RX units deployed around a collapse site +- **Distributed**: Multiple APs covering a large disaster zone +- **Drone-mounted**: UAVs scanning from above with coordinated flight paths + +Each AP independently captures CSI data, extracts features, and runs local inference. But the distributed system needs coordination: + +1. **Consistent survivor registry**: All nodes must agree on the set of detected survivors, their locations, and triage classifications. Conflicting records cause rescue teams to waste time. + +2. **Coordinated scanning**: Avoid redundant scans of the same zone. Dynamically reassign APs as zones are cleared. + +3. **Model synchronization**: When SONA adapts a model on one node (ADR-005), other nodes should benefit from the adaptation without re-learning. + +4. **Clock synchronization**: CSI timestamps must be aligned across nodes for multi-view pose fusion (the GNN multi-person disentanglement in ADR-006 requires temporal alignment). + +5. **Partition tolerance**: In disaster scenarios, network connectivity is unreliable. The system must function during partitions and reconcile when connectivity restores. + +### Current State + +No distributed coordination exists. Each node operates independently. The Rust workspace has no consensus crate. + +### RuVector's Distributed Capabilities + +RuVector provides: +- **Raft consensus**: Leader election and replicated log for strong consistency +- **Vector clocks**: Logical timestamps for causal ordering without synchronized clocks +- **Multi-master replication**: Concurrent writes with conflict resolution +- **Delta consensus**: Tracks behavioral changes across nodes for anomaly detection +- **Auto-sharding**: Distributes data based on access patterns + +## Decision + +We will integrate RuVector's Raft consensus implementation as the coordination backbone for multi-AP WiFi-DensePose deployments, with vector clocks for causal ordering and CRDT-based conflict resolution for partition-tolerant operation. + +### Consensus Architecture + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ Multi-AP Coordination Architecture │ +├─────────────────────────────────────────────────────────────────────┤ +│ │ +│ Normal Operation (Connected): │ +│ │ +│ ┌─────────┐ Raft ┌─────────┐ Raft ┌─────────┐ │ +│ │ AP-1 │◀────────────▶│ AP-2 │◀────────────▶│ AP-3 │ │ +│ │ (Leader)│ Replicated │(Follower│ Replicated │(Follower│ │ +│ │ │ Log │ )│ Log │ )│ │ +│ └────┬────┘ └────┬────┘ └────┬────┘ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ +│ │ Local │ │ Local │ │ Local │ │ +│ │ RVF │ │ RVF │ │ RVF │ │ +│ │Container│ │Container│ │Container│ │ +│ └─────────┘ └─────────┘ └─────────┘ │ +│ │ +│ Partitioned Operation (Disconnected): │ +│ │ +│ ┌─────────┐ ┌──────────────────────┐ │ +│ │ AP-1 │ ← operates independently → │ AP-2 AP-3 │ │ +│ │ │ │ (form sub-cluster) │ │ +│ │ Local │ │ Raft between 2+3 │ │ +│ │ writes │ │ │ │ +│ └─────────┘ └──────────────────────┘ │ +│ │ │ │ +│ └──────── Reconnect: CRDT merge ─────────────┘ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### Replicated State Machine + +The Raft log replicates these operations across all nodes: + +```rust +/// Operations replicated via Raft consensus +#[derive(Serialize, Deserialize, Clone)] +pub enum ConsensusOp { + /// New survivor detected + SurvivorDetected { + survivor_id: Uuid, + location: GeoCoord, + triage: TriageLevel, + detecting_ap: ApId, + confidence: f64, + timestamp: VectorClock, + }, + + /// Survivor status updated (e.g., triage reclassification) + SurvivorUpdated { + survivor_id: Uuid, + new_triage: TriageLevel, + updating_ap: ApId, + evidence: DetectionEvidence, + }, + + /// Zone assignment changed + ZoneAssignment { + zone_id: ZoneId, + assigned_aps: Vec, + priority: ScanPriority, + }, + + /// Model adaptation delta shared + ModelDelta { + source_ap: ApId, + lora_delta: Vec, // Serialized LoRA matrices + environment_hash: [u8; 32], + performance_metrics: AdaptationMetrics, + }, + + /// AP joined or left the cluster + MembershipChange { + ap_id: ApId, + action: MembershipAction, // Join | Leave | Suspect + }, +} +``` + +### Vector Clocks for Causal Ordering + +Since APs may have unsynchronized physical clocks, vector clocks provide causal ordering: + +```rust +/// Vector clock for causal ordering across APs +#[derive(Clone, Serialize, Deserialize)] +pub struct VectorClock { + /// Map from AP ID to logical timestamp + clocks: HashMap, +} + +impl VectorClock { + /// Increment this AP's clock + pub fn tick(&mut self, ap_id: &ApId) { + *self.clocks.entry(ap_id.clone()).or_insert(0) += 1; + } + + /// Merge with another clock (take max of each component) + pub fn merge(&mut self, other: &VectorClock) { + for (ap_id, &ts) in &other.clocks { + let entry = self.clocks.entry(ap_id.clone()).or_insert(0); + *entry = (*entry).max(ts); + } + } + + /// Check if self happened-before other + pub fn happened_before(&self, other: &VectorClock) -> bool { + self.clocks.iter().all(|(k, &v)| { + other.clocks.get(k).map_or(false, |&ov| v <= ov) + }) && self.clocks != other.clocks + } +} +``` + +### CRDT-Based Conflict Resolution + +During network partitions, concurrent updates may conflict. We use CRDTs (Conflict-free Replicated Data Types) for automatic resolution: + +```rust +/// Survivor registry using Last-Writer-Wins Register CRDT +pub struct SurvivorRegistry { + /// LWW-Element-Set: each survivor has a timestamp-tagged state + survivors: HashMap>, +} + +/// Triage uses Max-wins semantics: +/// If partition A says P1 (Red/Immediate) and partition B says P2 (Yellow/Delayed), +/// after merge the survivor is classified P1 (more urgent wins) +/// Rationale: false negative (missing critical) is worse than false positive +impl CrdtMerge for TriageLevel { + fn merge(a: Self, b: Self) -> Self { + // Lower numeric priority = more urgent + if a.urgency() >= b.urgency() { a } else { b } + } +} +``` + +**CRDT merge strategies by data type**: + +| Data Type | CRDT Type | Merge Strategy | Rationale | +|-----------|-----------|---------------|-----------| +| Survivor set | OR-Set | Union (never lose a detection) | Missing survivors = fatal | +| Triage level | Max-Register | Most urgent wins | Err toward caution | +| Location | LWW-Register | Latest timestamp wins | Survivors may move | +| Zone assignment | LWW-Map | Leader's assignment wins | Need authoritative coord | +| Model deltas | G-Set | Accumulate all deltas | All adaptations valuable | + +### Node Discovery and Health + +```rust +/// AP cluster management +pub struct ApCluster { + /// This node's identity + local_ap: ApId, + + /// Raft consensus engine + raft: RaftEngine, + + /// Failure detector (phi-accrual) + failure_detector: PhiAccrualDetector, + + /// Cluster membership + members: HashSet, +} + +impl ApCluster { + /// Heartbeat interval for failure detection + const HEARTBEAT_MS: u64 = 500; + + /// Phi threshold for suspecting node failure + const PHI_THRESHOLD: f64 = 8.0; + + /// Minimum cluster size for Raft (need majority) + const MIN_CLUSTER_SIZE: usize = 3; +} +``` + +### Performance Characteristics + +| Operation | Latency | Notes | +|-----------|---------|-------| +| Raft heartbeat | 500 ms interval | Configurable | +| Log replication | 1-5 ms (LAN) | Depends on payload size | +| Leader election | 1-3 seconds | After leader failure detected | +| CRDT merge (partition heal) | 10-100 ms | Proportional to divergence | +| Vector clock comparison | <0.01 ms | O(n) where n = cluster size | +| Model delta replication | 50-200 ms | ~70 KB LoRA delta | + +### Deployment Configurations + +| Scenario | Nodes | Consensus | Partition Strategy | +|----------|-------|-----------|-------------------| +| Single room | 1-2 | None (local only) | N/A | +| Building floor | 3-5 | Raft (3-node quorum) | CRDT merge on heal | +| Disaster site | 5-20 | Raft (5-node quorum) + zones | Zone-level sub-clusters | +| Urban search | 20-100 | Hierarchical Raft | Regional leaders | + +## Consequences + +### Positive +- **Consistent state**: All APs agree on survivor registry via Raft +- **Partition tolerant**: CRDT merge allows operation during disconnection +- **Causal ordering**: Vector clocks provide logical time without NTP +- **Automatic failover**: Raft leader election handles AP failures +- **Model sharing**: SONA adaptations propagate across cluster + +### Negative +- **Minimum 3 nodes**: Raft requires odd-numbered quorum for leader election +- **Network overhead**: Heartbeats and log replication consume bandwidth (~1-10 KB/s per node) +- **Complexity**: Distributed systems are inherently harder to debug +- **Latency for writes**: Raft requires majority acknowledgment before commit (1-5ms LAN) +- **Split-brain risk**: If cluster splits evenly (2+2), neither partition has quorum + +### Disaster-Specific Considerations + +| Challenge | Mitigation | +|-----------|------------| +| Intermittent connectivity | Aggressive CRDT merge on reconnect; local operation during partition | +| Power failures | Raft log persisted to local SSD; recovery on restart | +| Node destruction | Raft tolerates minority failure; data replicated across survivors | +| Drone mobility | Drone APs treated as ephemeral members; data synced on landing | +| Bandwidth constraints | Delta-only replication; compress LoRA deltas | + +## References + +- [Raft Consensus Algorithm](https://raft.github.io/raft.pdf) +- [CRDTs: Conflict-free Replicated Data Types](https://hal.inria.fr/inria-00609399) +- [Vector Clocks](https://en.wikipedia.org/wiki/Vector_clock) +- [Phi Accrual Failure Detector](https://www.computer.org/csdl/proceedings-article/srds/2004/22390066/12OmNyQYtlC) +- [RuVector Distributed Consensus](https://github.com/ruvnet/ruvector) +- ADR-001: WiFi-Mat Disaster Detection Architecture +- ADR-002: RuVector RVF Integration Strategy diff --git a/docs/adr/ADR-009-rvf-wasm-runtime-edge-deployment.md b/docs/adr/ADR-009-rvf-wasm-runtime-edge-deployment.md new file mode 100644 index 0000000..a30ef82 --- /dev/null +++ b/docs/adr/ADR-009-rvf-wasm-runtime-edge-deployment.md @@ -0,0 +1,262 @@ +# ADR-009: RVF WASM Runtime for Edge Deployment + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Current WASM State + +The wifi-densepose-wasm crate provides basic WebAssembly bindings that expose Rust types to JavaScript. It enables browser-based visualization and lightweight inference but has significant limitations: + +1. **No self-contained operation**: WASM module depends on external model files loaded via fetch(). If the server is unreachable, the module is useless. + +2. **No persistent state**: Browser WASM has no built-in persistent storage for fingerprint databases, model weights, or session data. + +3. **No offline capability**: Without network access, the WASM module cannot load models or send results. + +4. **Binary size**: Current WASM bundle is not optimized. Full inference + signal processing compiles to ~5-15 MB. + +### Edge Deployment Requirements + +| Scenario | Platform | Constraints | +|----------|----------|------------| +| Browser dashboard | Chrome/Firefox | <10 MB download, no plugins | +| IoT sensor node | ESP32/Raspberry Pi | 256 KB - 4 GB RAM, battery powered | +| Mobile app | iOS/Android WebView | Limited background execution | +| Drone payload | Embedded Linux + WASM | Weight/power limited, intermittent connectivity | +| Field tablet | Android tablet | Offline operation in disaster zones | + +### RuVector's Edge Runtime + +RuVector provides a 5.5 KB WASM runtime that boots in 125ms, with: +- Self-contained operation (models + data embedded in RVF container) +- Persistent storage via RVF container (written to IndexedDB in browser, filesystem on native) +- Offline-first architecture +- SIMD acceleration when available (WASM SIMD proposal) + +## Decision + +We will replace the current wifi-densepose-wasm approach with an RVF-based edge runtime that packages models, fingerprint databases, and the inference engine into a single deployable RVF container. + +### Edge Runtime Architecture + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ RVF Edge Deployment Container │ +│ (.rvf.edge file) │ +├──────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ +│ │ WASM │ │ VEC │ │ INDEX │ │ MODEL (ONNX) │ │ +│ │ Runtime │ │ CSI │ │ HNSW │ │ + LoRA deltas │ │ +│ │ (5.5KB) │ │ Finger- │ │ Graph │ │ │ │ +│ │ │ │ prints │ │ │ │ │ │ +│ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ +│ │ CRYPTO │ │ WITNESS │ │ COW_MAP │ │ CONFIG │ │ +│ │ Keys │ │ Audit │ │ Branches│ │ Runtime params │ │ +│ │ │ │ Chain │ │ │ │ │ │ +│ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │ +│ │ +│ Total container: 1-50 MB depending on model + fingerprint size │ +└──────────────────────────────────────────────────────────────────┘ + │ + │ Deploy to: + ▼ +┌───────────────────────────────────────────────────────────────┐ +│ │ +│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────────────┐ │ +│ │ Browser │ │ IoT │ │ Mobile │ │ Disaster Field │ │ +│ │ │ │ Device │ │ App │ │ Tablet │ │ +│ │ IndexedDB │ Flash │ │ App │ │ Local FS │ │ +│ │ for state│ │ for │ │ Sandbox │ │ for state │ │ +│ │ │ │ state │ │ for │ │ │ │ +│ │ │ │ │ │ state │ │ │ │ +│ └─────────┘ └─────────┘ └─────────┘ └─────────────────┘ │ +└───────────────────────────────────────────────────────────────┘ +``` + +### Tiered Runtime Profiles + +Different deployment targets get different container configurations: + +```rust +/// Edge runtime profiles +pub enum EdgeProfile { + /// Full-featured browser deployment + /// ~10 MB container, full inference + HNSW + SONA + Browser { + model_quantization: Quantization::Int8, + max_fingerprints: 100_000, + enable_sona: true, + storage_backend: StorageBackend::IndexedDB, + }, + + /// Minimal IoT deployment + /// ~1 MB container, lightweight inference only + IoT { + model_quantization: Quantization::Int4, + max_fingerprints: 1_000, + enable_sona: false, + storage_backend: StorageBackend::Flash, + }, + + /// Mobile app deployment + /// ~5 MB container, inference + HNSW, limited SONA + Mobile { + model_quantization: Quantization::Int8, + max_fingerprints: 50_000, + enable_sona: true, + storage_backend: StorageBackend::AppSandbox, + }, + + /// Disaster field deployment (maximum capability) + /// ~50 MB container, full stack including multi-AP consensus + Field { + model_quantization: Quantization::Float16, + max_fingerprints: 1_000_000, + enable_sona: true, + storage_backend: StorageBackend::FileSystem, + }, +} +``` + +### Container Size Budget + +| Segment | Browser | IoT | Mobile | Field | +|---------|---------|-----|--------|-------| +| WASM runtime | 5.5 KB | 5.5 KB | 5.5 KB | 5.5 KB | +| Model (ONNX) | 3 MB (int8) | 0.5 MB (int4) | 3 MB (int8) | 12 MB (fp16) | +| HNSW index | 4 MB | 100 KB | 2 MB | 40 MB | +| Fingerprint vectors | 2 MB | 50 KB | 1 MB | 10 MB | +| Config + crypto | 50 KB | 10 KB | 50 KB | 100 KB | +| **Total** | **~10 MB** | **~0.7 MB** | **~6 MB** | **~62 MB** | + +### Offline-First Data Flow + +``` +┌────────────────────────────────────────────────────────────────────┐ +│ Offline-First Operation │ +├────────────────────────────────────────────────────────────────────┤ +│ │ +│ 1. BOOT (125ms) │ +│ ├── Open RVF container from local storage │ +│ ├── Memory-map WASM runtime segment │ +│ ├── Load HNSW index into memory │ +│ └── Initialize inference engine with embedded model │ +│ │ +│ 2. OPERATE (continuous) │ +│ ├── Receive CSI data from local hardware interface │ +│ ├── Process through local pipeline (no network needed) │ +│ ├── Search HNSW index against local fingerprints │ +│ ├── Run SONA adaptation on local data │ +│ ├── Append results to local witness chain │ +│ └── Store updated vectors to local container │ +│ │ +│ 3. SYNC (when connected) │ +│ ├── Push new vectors to central RVF container │ +│ ├── Pull updated fingerprints from other nodes │ +│ ├── Merge SONA deltas via Raft (ADR-008) │ +│ ├── Extend witness chain with cross-node attestation │ +│ └── Update local container with merged state │ +│ │ +│ 4. SLEEP (battery conservation) │ +│ ├── Flush pending writes to container │ +│ ├── Close memory-mapped segments │ +│ └── Resume from step 1 on wake │ +└────────────────────────────────────────────────────────────────────┘ +``` + +### Browser-Specific Integration + +```rust +/// Browser WASM entry point +#[wasm_bindgen] +pub struct WifiDensePoseEdge { + container: RvfContainer, + inference_engine: InferenceEngine, + hnsw_index: HnswIndex, + sona: Option, +} + +#[wasm_bindgen] +impl WifiDensePoseEdge { + /// Initialize from an RVF container loaded via fetch or IndexedDB + #[wasm_bindgen(constructor)] + pub async fn new(container_bytes: &[u8]) -> Result { + let container = RvfContainer::from_bytes(container_bytes)?; + let engine = InferenceEngine::from_container(&container)?; + let index = HnswIndex::from_container(&container)?; + let sona = SonaAdapter::from_container(&container).ok(); + + Ok(Self { container, inference_engine: engine, hnsw_index: index, sona }) + } + + /// Process a single CSI frame (called from JavaScript) + #[wasm_bindgen] + pub fn process_frame(&mut self, csi_json: &str) -> Result { + let csi_data: CsiData = serde_json::from_str(csi_json) + .map_err(|e| JsValue::from_str(&e.to_string()))?; + + let features = self.extract_features(&csi_data)?; + let detection = self.detect(&features)?; + let pose = if detection.human_detected { + Some(self.estimate_pose(&features)?) + } else { + None + }; + + serde_json::to_string(&PoseResult { detection, pose }) + .map_err(|e| JsValue::from_str(&e.to_string())) + } + + /// Save current state to IndexedDB + #[wasm_bindgen] + pub async fn persist(&self) -> Result<(), JsValue> { + let bytes = self.container.serialize()?; + // Write to IndexedDB via web-sys + save_to_indexeddb("wifi-densepose-state", &bytes).await + } +} +``` + +### Model Quantization Strategy + +| Quantization | Size Reduction | Accuracy Loss | Suitable For | +|-------------|---------------|---------------|-------------| +| Float32 (baseline) | 1x | 0% | Server/desktop | +| Float16 | 2x | <0.5% | Field tablets, GPUs | +| Int8 (PTQ) | 4x | <2% | Browser, mobile | +| Int4 (GPTQ) | 8x | <5% | IoT, ultra-constrained | +| Binary (1-bit) | 32x | ~15% | MCU/ultra-edge (experimental) | + +## Consequences + +### Positive +- **Single-file deployment**: Copy one `.rvf.edge` file to deploy anywhere +- **Offline operation**: Full functionality without network connectivity +- **125ms boot**: Near-instant readiness for emergency scenarios +- **Platform universal**: Same container format for browser, IoT, mobile, server +- **Battery efficient**: No network polling in offline mode + +### Negative +- **Container size**: Even compressed, field containers are 50+ MB +- **WASM performance**: 2-5x slower than native Rust for compute-heavy operations +- **Browser limitations**: IndexedDB has storage quotas; WASM SIMD support varies +- **Update latency**: Offline devices miss updates until reconnection +- **Quantization accuracy**: Int4/Int8 models lose some detection sensitivity + +## References + +- [WebAssembly SIMD Proposal](https://github.com/WebAssembly/simd) +- [IndexedDB API](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API) +- [ONNX Runtime Web](https://onnxruntime.ai/docs/tutorials/web/) +- [Model Quantization Techniques](https://arxiv.org/abs/2103.13630) +- [RuVector WASM Runtime](https://github.com/ruvnet/ruvector) +- ADR-002: RuVector RVF Integration Strategy +- ADR-003: RVF Cognitive Containers for CSI Data diff --git a/docs/adr/ADR-010-witness-chains-audit-trail-integrity.md b/docs/adr/ADR-010-witness-chains-audit-trail-integrity.md new file mode 100644 index 0000000..d853b5f --- /dev/null +++ b/docs/adr/ADR-010-witness-chains-audit-trail-integrity.md @@ -0,0 +1,402 @@ +# ADR-010: Witness Chains for Audit Trail Integrity + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Life-Critical Audit Requirements + +The wifi-densepose-mat disaster detection module (ADR-001) makes triage classifications that directly affect rescue priority: + +| Triage Level | Action | Consequence of Error | +|-------------|--------|---------------------| +| P1 (Immediate/Red) | Rescue NOW | False negative → survivor dies waiting | +| P2 (Delayed/Yellow) | Rescue within 1 hour | Misclassification → delayed rescue | +| P3 (Minor/Green) | Rescue when resources allow | Over-triage → resource waste | +| P4 (Deceased/Black) | No rescue attempted | False P4 → living person abandoned | + +Post-incident investigations, liability proceedings, and operational reviews require: + +1. **Non-repudiation**: Prove which device made which detection at which time +2. **Tamper evidence**: Detect if records were altered after the fact +3. **Completeness**: Prove no detections were deleted or hidden +4. **Causal chain**: Reconstruct the sequence of events leading to each triage decision +5. **Cross-device verification**: Corroborate detections across multiple APs + +### Current State + +Detection results are logged to the database (`wifi-densepose-db`) with standard INSERT operations. Logs can be: +- Silently modified after the fact +- Deleted without trace +- Backdated or reordered +- Lost if the database is corrupted + +No cryptographic integrity mechanism exists. + +### RuVector Witness Chains + +RuVector implements hash-linked audit trails inspired by blockchain but without the consensus overhead: + +- **Hash chain**: Each entry includes the SHAKE-256 hash of the previous entry, forming a tamper-evident chain +- **Signatures**: Chain anchors (every Nth entry) are signed with the device's key pair +- **Cross-chain attestation**: Multiple devices can cross-reference each other's chains +- **Compact**: Each chain entry is ~100-200 bytes (hash + metadata + signature reference) + +## Decision + +We will implement RuVector witness chains as the primary audit mechanism for all detection events, triage decisions, and model adaptation events in the WiFi-DensePose system. + +### Witness Chain Structure + +``` +┌────────────────────────────────────────────────────────────────────┐ +│ Witness Chain │ +├────────────────────────────────────────────────────────────────────┤ +│ │ +│ Entry 0 Entry 1 Entry 2 Entry 3 │ +│ (Genesis) │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ prev: ∅ │◀───│ prev: H0 │◀───│ prev: H1 │◀───│ prev: H2 │ │ +│ │ event: │ │ event: │ │ event: │ │ event: │ │ +│ │ INIT │ │ DETECT │ │ TRIAGE │ │ ADAPT │ │ +│ │ hash: H0 │ │ hash: H1 │ │ hash: H2 │ │ hash: H3 │ │ +│ │ sig: S0 │ │ │ │ │ │ sig: S1 │ │ +│ │ (anchor) │ │ │ │ │ │ (anchor) │ │ +│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ +│ │ +│ H0 = SHAKE-256(INIT || device_id || timestamp) │ +│ H1 = SHAKE-256(DETECT_DATA || H0 || timestamp) │ +│ H2 = SHAKE-256(TRIAGE_DATA || H1 || timestamp) │ +│ H3 = SHAKE-256(ADAPT_DATA || H2 || timestamp) │ +│ │ +│ Anchor signature S0 = ML-DSA-65.sign(H0, device_key) │ +│ Anchor signature S1 = ML-DSA-65.sign(H3, device_key) │ +│ Anchor interval: every 100 entries (configurable) │ +└────────────────────────────────────────────────────────────────────┘ +``` + +### Witnessed Event Types + +```rust +/// Events recorded in the witness chain +#[derive(Serialize, Deserialize, Clone)] +pub enum WitnessedEvent { + /// Chain initialization (genesis) + ChainInit { + device_id: DeviceId, + firmware_version: String, + config_hash: [u8; 32], + }, + + /// Human presence detected + HumanDetected { + detection_id: Uuid, + confidence: f64, + csi_features_hash: [u8; 32], // Hash of input data, not raw data + location_estimate: Option, + model_version: String, + }, + + /// Triage classification assigned or changed + TriageDecision { + survivor_id: Uuid, + previous_level: Option, + new_level: TriageLevel, + evidence_hash: [u8; 32], // Hash of supporting evidence + deciding_algorithm: String, + confidence: f64, + }, + + /// False detection corrected + DetectionCorrected { + detection_id: Uuid, + correction_type: CorrectionType, // FalsePositive | FalseNegative | Reclassified + reason: String, + corrected_by: CorrectorId, // Device or operator + }, + + /// Model adapted via SONA + ModelAdapted { + adaptation_id: Uuid, + trigger: AdaptationTrigger, + lora_delta_hash: [u8; 32], + performance_before: f64, + performance_after: f64, + }, + + /// Zone scan completed + ZoneScanCompleted { + zone_id: ZoneId, + scan_duration_ms: u64, + detections_count: usize, + coverage_percentage: f64, + }, + + /// Cross-device attestation received + CrossAttestation { + attesting_device: DeviceId, + attested_chain_hash: [u8; 32], + attested_entry_index: u64, + }, + + /// Operator action (manual override) + OperatorAction { + operator_id: String, + action: OperatorActionType, + target: Uuid, // What was acted upon + justification: String, + }, +} +``` + +### Chain Entry Structure + +```rust +/// A single entry in the witness chain +#[derive(Serialize, Deserialize)] +pub struct WitnessEntry { + /// Sequential index in the chain + index: u64, + + /// SHAKE-256 hash of the previous entry (32 bytes) + previous_hash: [u8; 32], + + /// The witnessed event + event: WitnessedEvent, + + /// Device that created this entry + device_id: DeviceId, + + /// Monotonic timestamp (device-local, not wall clock) + monotonic_timestamp: u64, + + /// Wall clock timestamp (best-effort, may be inaccurate) + wall_timestamp: DateTime, + + /// Vector clock for causal ordering (see ADR-008) + vector_clock: VectorClock, + + /// This entry's hash: SHAKE-256(serialize(self without this field)) + entry_hash: [u8; 32], + + /// Anchor signature (present every N entries) + anchor_signature: Option, +} +``` + +### Tamper Detection + +```rust +/// Verify witness chain integrity +pub fn verify_chain(chain: &[WitnessEntry]) -> Result { + let mut verification = ChainVerification::new(); + + for (i, entry) in chain.iter().enumerate() { + // 1. Verify hash chain linkage + if i > 0 { + let expected_prev_hash = chain[i - 1].entry_hash; + if entry.previous_hash != expected_prev_hash { + verification.add_violation(ChainViolation::BrokenLink { + entry_index: entry.index, + expected_hash: expected_prev_hash, + actual_hash: entry.previous_hash, + }); + } + } + + // 2. Verify entry self-hash + let computed_hash = compute_entry_hash(entry); + if computed_hash != entry.entry_hash { + verification.add_violation(ChainViolation::TamperedEntry { + entry_index: entry.index, + }); + } + + // 3. Verify anchor signatures + if let Some(ref sig) = entry.anchor_signature { + let device_keys = load_device_keys(&entry.device_id)?; + if !sig.verify(&entry.entry_hash, &device_keys.ed25519, &device_keys.ml_dsa)? { + verification.add_violation(ChainViolation::InvalidSignature { + entry_index: entry.index, + }); + } + } + + // 4. Verify monotonic timestamp ordering + if i > 0 && entry.monotonic_timestamp <= chain[i - 1].monotonic_timestamp { + verification.add_violation(ChainViolation::NonMonotonicTimestamp { + entry_index: entry.index, + }); + } + + verification.verified_entries += 1; + } + + Ok(verification) +} +``` + +### Cross-Device Attestation + +Multiple APs can cross-reference each other's chains for stronger guarantees: + +``` +Device A's chain: Device B's chain: +┌──────────┐ ┌──────────┐ +│ Entry 50 │ │ Entry 73 │ +│ H_A50 │◀────── cross-attest ───▶│ H_B73 │ +└──────────┘ └──────────┘ + +Device A records: CrossAttestation { attesting: B, hash: H_B73, index: 73 } +Device B records: CrossAttestation { attesting: A, hash: H_A50, index: 50 } + +After cross-attestation: +- Neither device can rewrite entries before the attested point + without the other device's chain becoming inconsistent +- An investigator can verify both chains agree on the attestation point +``` + +**Attestation frequency**: Every 5 minutes during connected operation, immediately on significant events (P1 triage, zone completion). + +### Storage and Retrieval + +Witness chains are stored in the RVF container's WITNESS segment: + +```rust +/// Witness chain storage manager +pub struct WitnessChainStore { + /// Current chain being appended to + active_chain: Vec, + + /// Anchor signature interval + anchor_interval: usize, // 100 + + /// Device signing key + device_key: DeviceKeyPair, + + /// Cross-attestation peers + attestation_peers: Vec, + + /// RVF container for persistence + container: RvfContainer, +} + +impl WitnessChainStore { + /// Append an event to the chain + pub fn witness(&mut self, event: WitnessedEvent) -> Result { + let index = self.active_chain.len() as u64; + let previous_hash = self.active_chain.last() + .map(|e| e.entry_hash) + .unwrap_or([0u8; 32]); + + let mut entry = WitnessEntry { + index, + previous_hash, + event, + device_id: self.device_key.device_id(), + monotonic_timestamp: monotonic_now(), + wall_timestamp: Utc::now(), + vector_clock: self.get_current_vclock(), + entry_hash: [0u8; 32], // Computed below + anchor_signature: None, + }; + + // Compute entry hash + entry.entry_hash = compute_entry_hash(&entry); + + // Add anchor signature at interval + if index % self.anchor_interval as u64 == 0 { + entry.anchor_signature = Some( + self.device_key.sign_hybrid(&entry.entry_hash)? + ); + } + + self.active_chain.push(entry); + + // Persist to RVF container + self.container.append_witness(&self.active_chain.last().unwrap())?; + + Ok(index) + } + + /// Query chain for events in a time range + pub fn query_range(&self, start: DateTime, end: DateTime) + -> Vec<&WitnessEntry> + { + self.active_chain.iter() + .filter(|e| e.wall_timestamp >= start && e.wall_timestamp <= end) + .collect() + } + + /// Export chain for external audit + pub fn export_for_audit(&self) -> AuditBundle { + AuditBundle { + chain: self.active_chain.clone(), + device_public_key: self.device_key.public_keys(), + cross_attestations: self.collect_cross_attestations(), + chain_summary: self.compute_summary(), + } + } +} +``` + +### Performance Impact + +| Operation | Latency | Notes | +|-----------|---------|-------| +| Append entry | 0.05 ms | Hash computation + serialize | +| Append with anchor signature | 0.5 ms | + ML-DSA-65 sign | +| Verify single entry | 0.02 ms | Hash comparison | +| Verify anchor | 0.3 ms | ML-DSA-65 verify | +| Full chain verify (10K entries) | 50 ms | Sequential hash verification | +| Cross-attestation | 1 ms | Sign + network round-trip | + +### Storage Requirements + +| Chain Length | Entries/Hour | Size/Hour | Size/Day | +|-------------|-------------|-----------|----------| +| Low activity | ~100 | ~20 KB | ~480 KB | +| Normal operation | ~1,000 | ~200 KB | ~4.8 MB | +| Disaster response | ~10,000 | ~2 MB | ~48 MB | +| High-intensity scan | ~50,000 | ~10 MB | ~240 MB | + +## Consequences + +### Positive +- **Tamper-evident**: Any modification to historical records is detectable +- **Non-repudiable**: Signed anchors prove device identity +- **Complete history**: Every detection, triage, and correction is recorded +- **Cross-verified**: Multi-device attestation strengthens guarantees +- **Forensically sound**: Exportable audit bundles for legal proceedings +- **Low overhead**: 0.05ms per entry; minimal storage for normal operation + +### Negative +- **Append-only growth**: Chains grow monotonically; need archival strategy for long deployments +- **Key management**: Device keys must be provisioned and protected +- **Clock dependency**: Wall-clock timestamps are best-effort; monotonic timestamps are device-local +- **Verification cost**: Full chain verification of long chains takes meaningful time (50ms/10K entries) +- **Privacy tension**: Detailed audit trails contain operational intelligence + +### Regulatory Alignment + +| Requirement | How Witness Chains Address It | +|------------|------------------------------| +| GDPR (Right to erasure) | Event hashes stored, not personal data; original data deletable while chain proves historical integrity | +| HIPAA (Audit controls) | Complete access/modification log with non-repudiation | +| ISO 27001 (Information security) | Tamper-evident records, access logging, integrity verification | +| NIST SP 800-53 (AU controls) | Audit record generation, protection, and review capability | +| FEMA ICS (Incident Command) | Chain of custody for all operational decisions | + +## References + +- [Witness Chains in Distributed Systems](https://eprint.iacr.org/2019/747) +- [SHAKE-256 (FIPS 202)](https://csrc.nist.gov/pubs/fips/202/final) +- [Tamper-Evident Logging](https://www.usenix.org/legacy/event/sec09/tech/full_papers/crosby.pdf) +- [RuVector Witness Implementation](https://github.com/ruvnet/ruvector) +- ADR-001: WiFi-Mat Disaster Detection Architecture +- ADR-007: Post-Quantum Cryptography for Secure Sensing +- ADR-008: Distributed Consensus for Multi-AP Coordination diff --git a/docs/adr/ADR-011-python-proof-of-reality-mock-elimination.md b/docs/adr/ADR-011-python-proof-of-reality-mock-elimination.md new file mode 100644 index 0000000..2695477 --- /dev/null +++ b/docs/adr/ADR-011-python-proof-of-reality-mock-elimination.md @@ -0,0 +1,414 @@ +# ADR-011: Python Proof-of-Reality and Mock Elimination + +## Status +Proposed (URGENT) + +## Date +2026-02-28 + +## Context + +### The Credibility Problem + +The WiFi-DensePose Python codebase contains real, mathematically sound signal processing (FFT, phase unwrapping, Doppler extraction, correlation features) alongside mock/placeholder code that fatally undermines credibility. External reviewers who encounter **any** mock path in the default execution flow conclude the entire system is synthetic. This is not a technical problem - it is a perception problem with technical root causes. + +### Specific Mock/Placeholder Inventory + +The following code paths produce fake data **in the default configuration** or are easily mistaken for indicating fake functionality: + +#### Critical Severity (produces fake output on default path) + +| File | Line | Issue | Impact | +|------|------|-------|--------| +| `v1/src/core/csi_processor.py` | 390 | `doppler_shift = np.random.rand(10) # Placeholder` | **Real feature extractor returns random Doppler** - kills credibility of entire feature pipeline | +| `v1/src/hardware/csi_extractor.py` | 83-84 | `amplitude = np.random.rand(...)` in CSI extraction fallback | Random data silently substituted when parsing fails | +| `v1/src/hardware/csi_extractor.py` | 129-135 | `_parse_atheros()` returns `np.random.rand()` with comment "placeholder implementation" | Named as if it parses real data, actually random | +| `v1/src/hardware/router_interface.py` | 211-212 | `np.random.rand(3, 56)` in fallback path | Silent random fallback | +| `v1/src/services/pose_service.py` | 431 | `mock_csi = np.random.randn(64, 56, 3) # Mock CSI data` | Mock CSI in production code path | +| `v1/src/services/pose_service.py` | 293-356 | `_generate_mock_poses()` with `random.randint` throughout | Entire mock pose generator in service layer | +| `v1/src/services/pose_service.py` | 489-607 | Multiple `random.randint` for occupancy, historical data | Fake statistics that look real in API responses | +| `v1/src/api/dependencies.py` | 82, 408 | "return a mock user for development" | Auth bypass in default path | + +#### Moderate Severity (mock gated behind flags but confusing) + +| File | Line | Issue | +|------|------|-------| +| `v1/src/config/settings.py` | 144-145 | `mock_hardware=False`, `mock_pose_data=False` defaults - correct, but mock infrastructure exists | +| `v1/src/core/router_interface.py` | 27-300 | 270+ lines of mock data generation infrastructure in production code | +| `v1/src/services/pose_service.py` | 84-88 | Silent conditional: `if not self.settings.mock_pose_data` with no logging of real-mode | +| `v1/src/services/hardware_service.py` | 72-375 | Interleaved mock/real paths throughout | + +#### Low Severity (placeholders/TODOs) + +| File | Line | Issue | +|------|------|-------| +| `v1/src/core/router_interface.py` | 198 | "Collect real CSI data from router (placeholder implementation)" | +| `v1/src/api/routers/health.py` | 170-171 | `uptime_seconds = 0.0 # TODO` | +| `v1/src/services/pose_service.py` | 739 | `"uptime_seconds": 0.0 # TODO` | + +### Root Cause Analysis + +1. **No separation between mock and real**: Mock generators live in the same modules as real processors. A reviewer reading `csi_processor.py` hits `np.random.rand(10)` at line 390 and stops trusting the 400 lines of real signal processing above it. + +2. **Silent fallbacks**: When real hardware isn't available, the system silently falls back to random data instead of failing loudly. This means the default `docker compose up` produces plausible-looking but entirely fake results. + +3. **No proof artifact**: There is no shipped CSI capture file, no expected output hash, no way for a reviewer to verify that the pipeline produces deterministic results from real input. + +4. **Build environment fragility**: The `Dockerfile` references `requirements.txt` which doesn't exist as a standalone file. The `setup.py` hardcodes 87 dependencies. ONNX Runtime and BLAS are not in the container. A `docker build` may or may not succeed depending on the machine. + +5. **No CI verification**: No GitHub Actions workflow runs the pipeline on a real or deterministic input and verifies the output. + +## Decision + +We will eliminate the credibility gap through five concrete changes: + +### 1. Eliminate All Silent Mock Fallbacks (HARD FAIL) + +**Every path that currently returns `np.random.rand()` will either be replaced with real computation or will raise an explicit error.** + +```python +# BEFORE (csi_processor.py:390) +doppler_shift = np.random.rand(10) # Placeholder + +# AFTER +def _extract_doppler_features(self, csi_data: CSIData) -> tuple: + """Extract Doppler and frequency domain features from CSI temporal history.""" + if len(self.csi_history) < 2: + # Not enough history for temporal analysis - return zeros, not random + doppler_shift = np.zeros(self.window_size) + psd = np.abs(scipy.fft.fft(csi_data.amplitude.flatten(), n=128))**2 + return doppler_shift, psd + + # Real Doppler extraction from temporal CSI differences + history_array = np.array([h.amplitude for h in self.get_recent_history(self.window_size)]) + # Compute phase differences over time (proportional to Doppler shift) + temporal_phase_diff = np.diff(np.angle(history_array + 1j * np.zeros_like(history_array)), axis=0) + # Average across antennas, FFT across time for Doppler spectrum + doppler_spectrum = np.abs(scipy.fft.fft(temporal_phase_diff.mean(axis=1), axis=0)) + doppler_shift = doppler_spectrum.mean(axis=1) + + psd = np.abs(scipy.fft.fft(csi_data.amplitude.flatten(), n=128))**2 + return doppler_shift, psd +``` + +```python +# BEFORE (csi_extractor.py:129-135) +def _parse_atheros(self, raw_data): + """Parse Atheros CSI format (placeholder implementation).""" + # For now, return mock data for testing + return CSIData(amplitude=np.random.rand(3, 56), ...) + +# AFTER +def _parse_atheros(self, raw_data: bytes) -> CSIData: + """Parse Atheros CSI Tool format. + + Format: https://dhalperi.github.io/linux-80211n-csitool/ + """ + if len(raw_data) < 25: # Minimum Atheros CSI header + raise CSIExtractionError( + f"Atheros CSI data too short ({len(raw_data)} bytes). " + "Expected real CSI capture from Atheros-based NIC. " + "See docs/hardware-setup.md for capture instructions." + ) + # Parse actual Atheros binary format + # ... real parsing implementation ... +``` + +### 2. Isolate Mock Infrastructure Behind Explicit Flag with Banner + +**All mock code moves to a dedicated module. Default execution NEVER touches mock paths.** + +``` +v1/src/ +├── core/ +│ ├── csi_processor.py # Real processing only +│ └── router_interface.py # Real hardware interface only +├── testing/ # NEW: isolated mock module +│ ├── __init__.py +│ ├── mock_csi_generator.py # Mock CSI generation (moved from router_interface) +│ ├── mock_pose_generator.py # Mock poses (moved from pose_service) +│ └── fixtures/ # Test fixtures, not production paths +│ ├── sample_csi_capture.bin # Real captured CSI data (tiny sample) +│ └── expected_output.json # Expected pipeline output for sample +``` + +**Runtime enforcement:** +```python +import os +import sys + +MOCK_MODE = os.environ.get("WIFI_DENSEPOSE_MOCK", "").lower() == "true" + +if MOCK_MODE: + # Print banner on EVERY log line + _original_log = logging.Logger._log + def _mock_banner_log(self, level, msg, args, **kwargs): + _original_log(self, level, f"[MOCK MODE] {msg}", args, **kwargs) + logging.Logger._log = _mock_banner_log + + print("=" * 72, file=sys.stderr) + print(" WARNING: RUNNING IN MOCK MODE - ALL DATA IS SYNTHETIC", file=sys.stderr) + print(" Set WIFI_DENSEPOSE_MOCK=false for real operation", file=sys.stderr) + print("=" * 72, file=sys.stderr) +``` + +### 3. Ship a Reproducible Proof Bundle + +A small real CSI capture file + one-command verification pipeline: + +``` +v1/data/proof/ +├── README.md # How to verify +├── sample_csi_capture.bin # Real CSI data (1 second, ~50 KB) +├── sample_csi_capture_meta.json # Capture metadata (hardware, env) +├── expected_features.json # Expected feature extraction output +├── expected_features.sha256 # SHA-256 hash of expected output +└── verify.py # One-command verification script +``` + +**verify.py**: +```python +#!/usr/bin/env python3 +"""Verify WiFi-DensePose pipeline produces deterministic output from real CSI data. + +Usage: + python v1/data/proof/verify.py + +Expected output: + PASS: Pipeline output matches expected hash + SHA256: + +If this passes, the signal processing pipeline is producing real, +deterministic results from real captured CSI data. +""" +import hashlib +import json +import sys +import os + +# Ensure reproducibility +os.environ["PYTHONHASHSEED"] = "42" +import numpy as np +np.random.seed(42) # Only affects any remaining random elements + +sys.path.insert(0, os.path.join(os.path.dirname(__file__), "../..")) + +from src.core.csi_processor import CSIProcessor +from src.hardware.csi_extractor import CSIExtractor + +def main(): + # Load real captured CSI data + capture_path = os.path.join(os.path.dirname(__file__), "sample_csi_capture.bin") + meta_path = os.path.join(os.path.dirname(__file__), "sample_csi_capture_meta.json") + expected_hash_path = os.path.join(os.path.dirname(__file__), "expected_features.sha256") + + with open(meta_path) as f: + meta = json.load(f) + + # Extract CSI from binary capture + extractor = CSIExtractor(format=meta["format"]) + csi_data = extractor.extract_from_file(capture_path) + + # Process through feature pipeline + config = { + "sampling_rate": meta["sampling_rate"], + "window_size": meta["window_size"], + "overlap": meta["overlap"], + "noise_threshold": meta["noise_threshold"], + } + processor = CSIProcessor(config) + features = processor.extract_features(csi_data) + + # Serialize features deterministically + output = { + "amplitude_mean": features.amplitude_mean.tolist(), + "amplitude_variance": features.amplitude_variance.tolist(), + "phase_difference": features.phase_difference.tolist(), + "doppler_shift": features.doppler_shift.tolist(), + "psd_first_16": features.power_spectral_density[:16].tolist(), + } + output_json = json.dumps(output, sort_keys=True, separators=(",", ":")) + output_hash = hashlib.sha256(output_json.encode()).hexdigest() + + # Verify against expected hash + with open(expected_hash_path) as f: + expected_hash = f.read().strip() + + if output_hash == expected_hash: + print(f"PASS: Pipeline output matches expected hash") + print(f"SHA256: {output_hash}") + print(f"Features: {len(output['amplitude_mean'])} subcarriers processed") + return 0 + else: + print(f"FAIL: Hash mismatch") + print(f"Expected: {expected_hash}") + print(f"Got: {output_hash}") + return 1 + +if __name__ == "__main__": + sys.exit(main()) +``` + +### 4. Pin the Build Environment + +**Option A (recommended): Deterministic Dockerfile that works on fresh machine** + +```dockerfile +FROM python:3.11-slim + +# System deps that actually matter +RUN apt-get update && apt-get install -y --no-install-recommends \ + libopenblas-dev \ + libfftw3-dev \ + && rm -rf /var/lib/apt/lists/* + +WORKDIR /app + +# Pinned requirements (not a reference to missing file) +COPY v1/requirements-lock.txt ./requirements.txt +RUN pip install --no-cache-dir -r requirements.txt + +COPY v1/ ./v1/ + +# Proof of reality: verify pipeline on build +RUN cd v1 && python data/proof/verify.py + +EXPOSE 8000 +# Default: REAL mode (mock requires explicit opt-in) +ENV WIFI_DENSEPOSE_MOCK=false +CMD ["uvicorn", "v1.src.api.main:app", "--host", "0.0.0.0", "--port", "8000"] +``` + +**Key change**: `RUN python data/proof/verify.py` **during build** means the Docker image cannot be created unless the pipeline produces correct output from real CSI data. + +**Requirements lockfile** (`v1/requirements-lock.txt`): +``` +# Core (required) +fastapi==0.115.6 +uvicorn[standard]==0.34.0 +pydantic==2.10.4 +pydantic-settings==2.7.1 +numpy==1.26.4 +scipy==1.14.1 + +# Signal processing (required) +# No ONNX required for basic pipeline verification + +# Optional (install separately for full features) +# torch>=2.1.0 +# onnxruntime>=1.17.0 +``` + +### 5. CI Pipeline That Proves Reality + +```yaml +# .github/workflows/verify-pipeline.yml +name: Verify Signal Pipeline + +on: + push: + paths: ['v1/src/**', 'v1/data/proof/**'] + pull_request: + paths: ['v1/src/**'] + +jobs: + verify: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: '3.11' + - name: Install minimal deps + run: pip install numpy scipy pydantic pydantic-settings + - name: Verify pipeline determinism + run: python v1/data/proof/verify.py + - name: Verify no random in production paths + run: | + # Fail if np.random appears in production code (not in testing/) + ! grep -r "np\.random\.\(rand\|randn\|randint\)" v1/src/ \ + --include="*.py" \ + --exclude-dir=testing \ + || (echo "FAIL: np.random found in production code" && exit 1) +``` + +### Concrete File Changes Required + +| File | Action | Description | +|------|--------|-------------| +| `v1/src/core/csi_processor.py:390` | **Replace** | Real Doppler extraction from temporal CSI history | +| `v1/src/hardware/csi_extractor.py:83-84` | **Replace** | Hard error with descriptive message when parsing fails | +| `v1/src/hardware/csi_extractor.py:129-135` | **Replace** | Real Atheros CSI parser or hard error with hardware instructions | +| `v1/src/hardware/router_interface.py:198-212` | **Replace** | Hard error for unimplemented hardware, or real `iwconfig` + CSI tool integration | +| `v1/src/services/pose_service.py:293-356` | **Move** | Move `_generate_mock_poses()` to `v1/src/testing/mock_pose_generator.py` | +| `v1/src/services/pose_service.py:430-431` | **Remove** | Remove mock CSI generation from production path | +| `v1/src/services/pose_service.py:489-607` | **Replace** | Real statistics from database, or explicit "no data" response | +| `v1/src/core/router_interface.py:60-300` | **Move** | Move mock generator to `v1/src/testing/mock_csi_generator.py` | +| `v1/src/api/dependencies.py:82,408` | **Replace** | Real auth check or explicit dev-mode bypass with logging | +| `v1/data/proof/` | **Create** | Proof bundle (sample capture + expected hash + verify script) | +| `v1/requirements-lock.txt` | **Create** | Pinned minimal dependencies | +| `.github/workflows/verify-pipeline.yml` | **Create** | CI verification | + +### Hardware Documentation + +``` +v1/docs/hardware-setup.md (to be created) + +# Supported Hardware Matrix + +| Chipset | Tool | OS | Capture Command | +|---------|------|----|-----------------| +| Intel 5300 | Linux 802.11n CSI Tool | Ubuntu 18.04 | `sudo ./log_to_file csi.dat` | +| Atheros AR9580 | Atheros CSI Tool | Ubuntu 14.04 | `sudo ./recv_csi csi.dat` | +| Broadcom BCM4339 | Nexmon CSI | Android/Nexus 5 | `nexutil -m1 -k1 ...` | +| ESP32 | ESP32-CSI | ESP-IDF | `csi_recv --format binary` | + +# Calibration +1. Place router and receiver 2m apart, line of sight +2. Capture 10 seconds of empty-room baseline +3. Have one person walk through at normal pace +4. Capture 10 seconds during walk-through +5. Run calibration: `python v1/scripts/calibrate.py --baseline empty.dat --activity walk.dat` +``` + +## Consequences + +### Positive +- **"Clone, build, verify" in one command**: `docker build . && docker run --rm wifi-densepose python v1/data/proof/verify.py` produces a deterministic PASS +- **No silent fakes**: Random data never appears in production output +- **CI enforcement**: PRs that introduce `np.random` in production paths fail automatically +- **Credibility anchor**: SHA-256 verified output from real CSI capture is unchallengeable proof +- **Clear mock boundary**: Mock code exists only in `v1/src/testing/`, never imported by production modules + +### Negative +- **Requires real CSI capture**: Someone must capture and commit a real CSI sample (one-time effort) +- **Build may fail without hardware**: Without mock fallback, systems without WiFi hardware cannot demo - must use proof bundle instead +- **Migration effort**: Moving mock code to separate module requires updating imports in test files +- **Stricter development workflow**: Developers must explicitly opt in to mock mode + +### Acceptance Criteria + +A stranger can: +1. `git clone` the repository +2. Run ONE command (`docker build .` or `python v1/data/proof/verify.py`) +3. See `PASS: Pipeline output matches expected hash` with a specific SHA-256 +4. Confirm no `np.random` in any non-test file via CI badge + +If this works 100% over 5 runs on a clean machine, the "fake" narrative dies. + +### Answering the Two Key Questions + +**Q1: Docker or Nix first?** +Recommendation: **Docker first**. The Dockerfile already exists, just needs fixing. Nix is higher quality but smaller audience. Docker gives the widest "clone and verify" coverage. + +**Q2: Are external crates public and versioned?** +The Python dependencies are all public PyPI packages. The Rust `ruvector-core` and `ruvector-data-framework` crates are currently commented out in `Cargo.toml` (lines 83-84: `# ruvector-core = "0.1"`) and are not yet published to crates.io. They are internal to ruvnet. This is a blocker for the Rust path but does not affect the Python proof-of-reality work in this ADR. + +## References + +- [Linux 802.11n CSI Tool](https://dhalperi.github.io/linux-80211n-csitool/) +- [Atheros CSI Tool](https://wands.sg/research/wifi/AthesCSI/) +- [Nexmon CSI](https://github.com/seemoo-lab/nexmon_csi) +- [ESP32 CSI](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/wifi.html#wi-fi-channel-state-information) +- [Reproducible Builds](https://reproducible-builds.org/) +- ADR-002: RuVector RVF Integration Strategy diff --git a/docs/adr/ADR-012-esp32-csi-sensor-mesh.md b/docs/adr/ADR-012-esp32-csi-sensor-mesh.md new file mode 100644 index 0000000..f2100c3 --- /dev/null +++ b/docs/adr/ADR-012-esp32-csi-sensor-mesh.md @@ -0,0 +1,318 @@ +# ADR-012: ESP32 CSI Sensor Mesh for Distributed Sensing + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### The Hardware Reality Gap + +WiFi-DensePose's Rust and Python pipelines implement real signal processing (FFT, phase unwrapping, Doppler extraction, correlation features), but the system currently has no defined path from **physical WiFi hardware → CSI bytes → pipeline input**. The `csi_extractor.py` and `router_interface.py` modules contain placeholder parsers that return `np.random.rand()` instead of real parsed data (see ADR-011). + +To close this gap, we need a concrete, affordable, reproducible hardware platform that produces real CSI data and streams it into the existing pipeline. + +### Why ESP32 + +| Factor | ESP32/ESP32-S3 | Intel 5300 (iwl5300) | Atheros AR9580 | +|--------|---------------|---------------------|----------------| +| Cost | ~$5-15/node | ~$50-100 (used NIC) | ~$30-60 (used NIC) | +| Availability | Mass produced, in stock | Discontinued, eBay only | Discontinued, eBay only | +| CSI Support | Official ESP-IDF API | Linux CSI Tool (kernel mod) | Atheros CSI Tool | +| Form Factor | Standalone MCU | Requires PCIe/Mini-PCIe host | Requires PCIe host | +| Deployment | Battery/USB, wireless | Desktop/laptop only | Desktop/laptop only | +| Antenna Config | 1-2 TX, 1-2 RX | 3 TX, 3 RX (MIMO) | 3 TX, 3 RX (MIMO) | +| Subcarriers | 52-56 (802.11n) | 30 (compressed) | 56 (full) | +| Fidelity | Lower (consumer SoC) | Higher (dedicated NIC) | Higher (dedicated NIC) | + +**ESP32 wins on deployability**: It's the only option where a stranger can buy nodes on Amazon, flash firmware, and have a working CSI mesh in an afternoon. Intel 5300 and Atheros cards require specific hardware, kernel modifications, and legacy OS versions. + +### ESP-IDF CSI API + +Espressif provides official CSI support through three key functions: + +```c +// 1. Configure what CSI data to capture +wifi_csi_config_t csi_config = { + .lltf_en = true, // Long Training Field (best for CSI) + .htltf_en = true, // HT-LTF + .stbc_htltf2_en = true, // STBC HT-LTF2 + .ltf_merge_en = true, // Merge LTFs + .channel_filter_en = false, + .manu_scale = false, +}; +esp_wifi_set_csi_config(&csi_config); + +// 2. Register callback for received CSI data +esp_wifi_set_csi_rx_cb(csi_data_callback, NULL); + +// 3. Enable CSI collection +esp_wifi_set_csi(true); + +// Callback receives: +void csi_data_callback(void *ctx, wifi_csi_info_t *info) { + // info->rx_ctrl: RSSI, noise_floor, channel, secondary_channel, etc. + // info->buf: Raw CSI data (I/Q pairs per subcarrier) + // info->len: Length of CSI data buffer + // Typical: 112 bytes = 56 subcarriers × 2 (I,Q) × 1 byte each +} +``` + +## Decision + +We will build an ESP32 CSI Sensor Mesh as the primary hardware integration path, with a full stack from firmware to aggregator to Rust pipeline to visualization. + +### System Architecture + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ ESP32 CSI Sensor Mesh │ +├─────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ ESP32 │ │ ESP32 │ │ ESP32 │ ... (3-6 nodes) │ +│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │ +│ │ │ │ │ │ │ │ +│ │ CSI Rx │ │ CSI Rx │ │ CSI Rx │ ← WiFi frames from │ +│ │ FFT │ │ FFT │ │ FFT │ consumer router │ +│ │ Features │ │ Features │ │ Features │ │ +│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ +│ │ │ │ │ +│ │ UDP/TCP stream (WiFi or secondary channel) │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌─────────────────────────────────────────┐ │ +│ │ Aggregator │ │ +│ │ (Laptop / Raspberry Pi / Seed device) │ │ +│ │ │ │ +│ │ 1. Receive CSI streams from all nodes │ │ +│ │ 2. Timestamp alignment (per-node) │ │ +│ │ 3. Feature-level fusion │ │ +│ │ 4. Feed into Rust/Python pipeline │ │ +│ │ 5. Serve WebSocket to visualization │ │ +│ └──────────────────┬──────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────┐ │ +│ │ WiFi-DensePose Pipeline │ │ +│ │ │ │ +│ │ CsiProcessor → FeatureExtractor → │ │ +│ │ MotionDetector → PoseEstimator → │ │ +│ │ Three.js Visualization │ │ +│ └─────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### Node Firmware Specification + +**ESP-IDF project**: `firmware/esp32-csi-node/` + +``` +firmware/esp32-csi-node/ +├── CMakeLists.txt +├── sdkconfig.defaults # Menuconfig defaults with CSI enabled +├── main/ +│ ├── CMakeLists.txt +│ ├── main.c # Entry point, WiFi init, CSI callback +│ ├── csi_collector.c # CSI data collection and buffering +│ ├── csi_collector.h +│ ├── feature_extract.c # On-device FFT and feature extraction +│ ├── feature_extract.h +│ ├── stream_sender.c # UDP stream to aggregator +│ ├── stream_sender.h +│ ├── config.h # Node configuration (SSID, aggregator IP) +│ └── Kconfig.projbuild # Menuconfig options +├── components/ +│ └── esp_dsp/ # Espressif DSP library for FFT +└── README.md # Flash instructions +``` + +**On-device processing** (reduces bandwidth, node does pre-processing): + +```c +// feature_extract.c +typedef struct { + uint32_t timestamp_ms; // Local monotonic timestamp + uint8_t node_id; // This node's ID + int8_t rssi; // Received signal strength + int8_t noise_floor; // Noise floor estimate + uint8_t channel; // WiFi channel + float amplitude[56]; // |CSI| per subcarrier (from I/Q) + float phase[56]; // arg(CSI) per subcarrier + float doppler_energy; // Motion energy from temporal FFT + float breathing_band; // 0.1-0.5 Hz band power + float motion_band; // 0.5-3 Hz band power +} csi_feature_frame_t; +// Size: ~470 bytes per frame +// At 100 Hz: ~47 KB/s per node, ~280 KB/s for 6 nodes +``` + +**Key firmware design decisions**: + +1. **Feature extraction on-device**: Raw CSI I/Q → amplitude + phase + spectral bands. This cuts bandwidth from raw ~11 KB/frame to ~470 bytes/frame. + +2. **Monotonic timestamps**: Each node uses its own monotonic clock. No NTP synchronization attempted between nodes - clock drift is handled at the aggregator by fusing features, not raw phases (see "Clock Drift" section below). + +3. **UDP streaming**: Low-latency, loss-tolerant. Missing frames are acceptable; ordering is maintained via sequence numbers. + +4. **Configurable sampling rate**: 10-100 Hz via menuconfig. 100 Hz for motion detection, 10 Hz sufficient for occupancy. + +### Aggregator Specification + +The aggregator runs on any machine with WiFi/Ethernet to the nodes: + +```rust +// In wifi-densepose-rs, new module: crates/wifi-densepose-hardware/src/esp32/ +pub struct Esp32Aggregator { + /// UDP socket listening for node streams + socket: UdpSocket, + + /// Per-node state (last timestamp, feature buffer, drift estimate) + nodes: HashMap, + + /// Ring buffer of fused feature frames + fused_buffer: VecDeque, + + /// Channel to pipeline + pipeline_tx: mpsc::Sender, +} + +/// Fused frame from all nodes for one time window +pub struct FusedFrame { + /// Timestamp (aggregator local, monotonic) + timestamp: Instant, + + /// Per-node features (may have gaps if node dropped) + node_features: Vec>, + + /// Cross-node correlation (computed by aggregator) + cross_node_correlation: Array2, + + /// Fused motion energy (max across nodes) + fused_motion_energy: f64, + + /// Fused breathing band (coherent sum where phase aligns) + fused_breathing_band: f64, +} +``` + +### Clock Drift Handling + +ESP32 crystal oscillators drift ~20-50 ppm. Over 1 hour, two nodes may diverge by 72-180ms. This makes raw phase alignment across nodes impossible. + +**Solution**: Feature-level fusion, not signal-level fusion. + +``` +Signal-level (WRONG for ESP32): + Align raw I/Q samples across nodes → requires <1µs sync → impractical + +Feature-level (CORRECT for ESP32): + Each node: raw CSI → amplitude + phase + spectral features (local) + Aggregator: collect features → correlate → fuse decisions + No cross-node phase alignment needed +``` + +Specifically: +- **Motion energy**: Take max across nodes (any node seeing motion = motion) +- **Breathing band**: Use node with highest SNR as primary, others as corroboration +- **Location**: Cross-node amplitude ratios estimate position (no phase needed) + +### Sensing Capabilities by Deployment + +| Capability | 1 Node | 3 Nodes | 6 Nodes | Evidence | +|-----------|--------|---------|---------|----------| +| Presence detection | Good | Excellent | Excellent | Single-node RSSI variance | +| Coarse motion | Good | Excellent | Excellent | Doppler energy | +| Room-level location | None | Good | Excellent | Amplitude ratios | +| Respiration | Marginal | Good | Good | 0.1-0.5 Hz band, placement-sensitive | +| Heartbeat | Poor | Poor-Marginal | Marginal | Requires ideal placement, low noise | +| Multi-person count | None | Marginal | Good | Spatial diversity | +| Pose estimation | None | Poor | Marginal | Requires model + sufficient diversity | + +**Honest assessment**: ESP32 CSI is lower fidelity than Intel 5300 or Atheros. Heartbeat detection is placement-sensitive and unreliable. Respiration works with good placement. Motion and presence are solid. + +### Failure Modes and Mitigations + +| Failure Mode | Severity | Mitigation | +|-------------|----------|------------| +| Multipath dominates in cluttered rooms | High | Mesh diversity: 3+ nodes from different angles | +| Person occludes path between node and router | Medium | Mesh: other nodes still have clear paths | +| Clock drift ruins cross-node fusion | Medium | Feature-level fusion only; no cross-node phase alignment | +| UDP packet loss during high traffic | Low | Sequence numbers, interpolation for gaps <100ms | +| ESP32 WiFi driver bugs with CSI | Medium | Pin ESP-IDF version, test on known-good boards | +| Node power failure | Low | Aggregator handles missing nodes gracefully | + +### Bill of Materials (Starter Kit) + +| Item | Quantity | Unit Cost | Total | +|------|----------|-----------|-------| +| ESP32-S3-DevKitC-1 | 3 | $10 | $30 | +| USB-A to USB-C cables | 3 | $3 | $9 | +| USB power adapter (multi-port) | 1 | $15 | $15 | +| Consumer WiFi router (any) | 1 | $0 (existing) | $0 | +| Aggregator (laptop or Pi 4) | 1 | $0 (existing) | $0 | +| **Total** | | | **$54** | + +### Minimal Build Spec (Clone-Flash-Run) + +``` +# Step 1: Flash one node (requires ESP-IDF installed) +cd firmware/esp32-csi-node +idf.py set-target esp32s3 +idf.py menuconfig # Set WiFi SSID/password, aggregator IP +idf.py build flash monitor + +# Step 2: Run aggregator (Docker) +docker compose -f docker-compose.esp32.yml up + +# Step 3: Verify with proof bundle +# Aggregator captures 10 seconds, produces feature JSON, verifies hash +docker exec aggregator python verify_esp32.py + +# Step 4: Open visualization +open http://localhost:3000 # Three.js dashboard +``` + +### Proof of Reality for ESP32 + +``` +firmware/esp32-csi-node/proof/ +├── captured_csi_10sec.bin # Real 10-second CSI capture from ESP32 +├── captured_csi_meta.json # Board: ESP32-S3-DevKitC, ESP-IDF: 5.2, Router: TP-Link AX1800 +├── expected_features.json # Feature extraction output +├── expected_features.sha256 # Hash verification +└── capture_photo.jpg # Photo of actual hardware setup +``` + +## Consequences + +### Positive +- **$54 starter kit**: Lowest possible barrier to real CSI data +- **Mass available hardware**: ESP32 boards are in stock globally +- **Real data path**: Eliminates every `np.random.rand()` placeholder with actual hardware input +- **Proof artifact**: Captured CSI + expected hash proves the pipeline processes real data +- **Scalable mesh**: Add nodes for more coverage without changing software +- **Feature-level fusion**: Avoids the impossible problem of cross-node phase synchronization + +### Negative +- **Lower fidelity than research NICs**: ESP32 CSI is noisier than Intel 5300 +- **Heartbeat detection unreliable**: Micro-Doppler resolution insufficient for consistent heartbeat +- **ESP-IDF learning curve**: Firmware development requires embedded C knowledge +- **WiFi interference**: Nodes sharing the same channel as data traffic adds noise +- **Placement sensitivity**: Respiration detection requires careful node positioning + +### Interaction with Other ADRs +- **ADR-011** (Proof of Reality): ESP32 provides the real CSI capture for the proof bundle +- **ADR-008** (Distributed Consensus): Mesh nodes can use simplified Raft for configuration distribution +- **ADR-003** (RVF Containers): Aggregator stores CSI features in RVF format +- **ADR-004** (HNSW): Environment fingerprints from ESP32 mesh feed HNSW index + +## References + +- [Espressif ESP-CSI Repository](https://github.com/espressif/esp-csi) +- [ESP-IDF WiFi CSI API](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/wifi.html#wi-fi-channel-state-information) +- [ESP32 CSI Research Papers](https://ieeexplore.ieee.org/document/9439871) +- [Wi-Fi Sensing with ESP32: A Tutorial](https://arxiv.org/abs/2207.07859) +- ADR-011: Python Proof-of-Reality and Mock Elimination diff --git a/docs/adr/ADR-013-feature-level-sensing-commodity-gear.md b/docs/adr/ADR-013-feature-level-sensing-commodity-gear.md new file mode 100644 index 0000000..dfe0b32 --- /dev/null +++ b/docs/adr/ADR-013-feature-level-sensing-commodity-gear.md @@ -0,0 +1,383 @@ +# ADR-013: Feature-Level Sensing on Commodity Gear (Option 3) + +## Status +Proposed + +## Date +2026-02-28 + +## Context + +### Not Everyone Can Deploy Custom Hardware + +ADR-012 specifies an ESP32 CSI mesh that provides real CSI data. However, it requires: +- Purchasing ESP32 boards +- Flashing custom firmware +- ESP-IDF toolchain installation +- Physical placement of nodes + +For many users - especially those evaluating WiFi-DensePose or deploying in managed environments - modifying hardware is not an option. We need a sensing path that works with **existing, unmodified consumer WiFi gear**. + +### What Commodity Hardware Exposes + +Standard WiFi drivers and tools expose several metrics without custom firmware: + +| Signal | Source | Availability | Sampling Rate | +|--------|--------|-------------|---------------| +| RSSI (Received Signal Strength) | `iwconfig`, `iw`, NetworkManager | Universal | 1-10 Hz | +| Noise floor | `iw dev wlan0 survey dump` | Most Linux drivers | ~1 Hz | +| Link quality | `/proc/net/wireless` | Linux | 1-10 Hz | +| MCS index / PHY rate | `iw dev wlan0 link` | Most drivers | Per-packet | +| TX/RX bytes | `/sys/class/net/wlan0/statistics/` | Universal | Continuous | +| Retry count | `iw dev wlan0 station dump` | Most drivers | ~1 Hz | +| Beacon interval timing | `iw dev wlan0 scan dump` | Universal | Per-scan | +| Channel utilization | `iw dev wlan0 survey dump` | Most drivers | ~1 Hz | + +**RSSI is the primary signal**. It varies when humans move through the propagation path between any transmitter-receiver pair. Research confirms RSSI-based sensing for: +- Presence detection (single receiver, threshold on variance) +- Device-free motion detection (RSSI variance increases with movement) +- Coarse room-level localization (multi-receiver RSSI fingerprinting) +- Breathing detection (specialized setups, marginal quality) + +### Research Support + +- **RSSI-based presence**: Youssef et al. (2007) demonstrated device-free passive detection using RSSI from multiple receivers with >90% accuracy. +- **RSSI breathing**: Abdelnasser et al. (2015) showed respiration detection via RSSI variance in controlled settings with ~85% accuracy using 4+ receivers. +- **Device-free tracking**: Multiple receivers with RSSI fingerprinting achieve room-level (3-5m) accuracy. + +## Decision + +We will implement a Feature-Level Sensing module that extracts motion, presence, and coarse activity information from standard WiFi metrics available on any Linux machine without hardware modification. + +### Architecture + +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ Feature-Level Sensing Pipeline │ +├──────────────────────────────────────────────────────────────────────┤ +│ │ +│ Data Sources (any Linux WiFi device): │ +│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────────┐ │ +│ │ RSSI │ │ Noise │ │ Link │ │ Packet Stats │ │ +│ │ Stream │ │ Floor │ │ Quality │ │ (TX/RX/Retry)│ │ +│ └────┬────┘ └────┬────┘ └────┬────┘ └──────┬───────┘ │ +│ │ │ │ │ │ +│ └───────────┴───────────┴──────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌────────────────────────────────────────────────┐ │ +│ │ Feature Extraction Engine │ │ +│ │ │ │ +│ │ 1. Rolling statistics (mean, var, skew, kurt) │ │ +│ │ 2. Spectral features (FFT of RSSI time series) │ │ +│ │ 3. Change-point detection (CUSUM, PELT) │ │ +│ │ 4. Cross-receiver correlation │ │ +│ │ 5. Packet timing jitter analysis │ │ +│ └────────────────────────┬───────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌────────────────────────────────────────────────┐ │ +│ │ Classification / Decision │ │ +│ │ │ │ +│ │ • Presence: RSSI variance > threshold │ │ +│ │ • Motion class: spectral peak frequency │ │ +│ │ • Occupancy change: change-point event │ │ +│ │ • Confidence: cross-receiver agreement │ │ +│ └────────────────────────┬───────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌────────────────────────────────────────────────┐ │ +│ │ Output: Presence/Motion Events │ │ +│ │ │ │ +│ │ { "timestamp": "...", │ │ +│ │ "presence": true, │ │ +│ │ "motion_level": "active", │ │ +│ │ "confidence": 0.87, │ │ +│ │ "receivers_agreeing": 3, │ │ +│ │ "rssi_variance": 4.2 } │ │ +│ └────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────────┘ +``` + +### Feature Extraction Specification + +```python +class RssiFeatureExtractor: + """Extract sensing features from RSSI and link statistics. + + No custom hardware required. Works with any WiFi interface + that exposes standard Linux wireless statistics. + """ + + def __init__(self, config: FeatureSensingConfig): + self.window_size = config.window_size # 30 seconds + self.sampling_rate = config.sampling_rate # 10 Hz + self.rssi_buffer = deque(maxlen=self.window_size * self.sampling_rate) + self.noise_buffer = deque(maxlen=self.window_size * self.sampling_rate) + + def extract_features(self) -> FeatureVector: + rssi_array = np.array(self.rssi_buffer) + + return FeatureVector( + # Time-domain statistics + rssi_mean=np.mean(rssi_array), + rssi_variance=np.var(rssi_array), + rssi_skewness=scipy.stats.skew(rssi_array), + rssi_kurtosis=scipy.stats.kurtosis(rssi_array), + rssi_range=np.ptp(rssi_array), + rssi_iqr=np.subtract(*np.percentile(rssi_array, [75, 25])), + + # Spectral features (FFT of RSSI time series) + spectral_energy=self._spectral_energy(rssi_array), + dominant_frequency=self._dominant_freq(rssi_array), + breathing_band_power=self._band_power(rssi_array, 0.1, 0.5), # Hz + motion_band_power=self._band_power(rssi_array, 0.5, 3.0), # Hz + + # Change-point features + num_change_points=self._cusum_changes(rssi_array), + max_step_magnitude=self._max_step(rssi_array), + + # Noise floor features (environment stability) + noise_mean=np.mean(np.array(self.noise_buffer)), + snr_estimate=np.mean(rssi_array) - np.mean(np.array(self.noise_buffer)), + ) + + def _spectral_energy(self, rssi: np.ndarray) -> float: + """Total spectral energy excluding DC component.""" + spectrum = np.abs(scipy.fft.rfft(rssi - np.mean(rssi))) + return float(np.sum(spectrum[1:] ** 2)) + + def _dominant_freq(self, rssi: np.ndarray) -> float: + """Dominant frequency in RSSI time series.""" + spectrum = np.abs(scipy.fft.rfft(rssi - np.mean(rssi))) + freqs = scipy.fft.rfftfreq(len(rssi), d=1.0/self.sampling_rate) + return float(freqs[np.argmax(spectrum[1:]) + 1]) + + def _band_power(self, rssi: np.ndarray, low_hz: float, high_hz: float) -> float: + """Power in a specific frequency band.""" + spectrum = np.abs(scipy.fft.rfft(rssi - np.mean(rssi))) ** 2 + freqs = scipy.fft.rfftfreq(len(rssi), d=1.0/self.sampling_rate) + mask = (freqs >= low_hz) & (freqs <= high_hz) + return float(np.sum(spectrum[mask])) + + def _cusum_changes(self, rssi: np.ndarray) -> int: + """Count change points using CUSUM algorithm.""" + mean = np.mean(rssi) + cusum_pos = np.zeros_like(rssi) + cusum_neg = np.zeros_like(rssi) + threshold = 3.0 * np.std(rssi) + changes = 0 + for i in range(1, len(rssi)): + cusum_pos[i] = max(0, cusum_pos[i-1] + rssi[i] - mean - 0.5) + cusum_neg[i] = max(0, cusum_neg[i-1] - rssi[i] + mean - 0.5) + if cusum_pos[i] > threshold or cusum_neg[i] > threshold: + changes += 1 + cusum_pos[i] = 0 + cusum_neg[i] = 0 + return changes +``` + +### Data Collection (No Root Required) + +```python +class LinuxWifiCollector: + """Collect WiFi statistics from standard Linux interfaces. + + No root required for most operations. + No custom drivers or firmware. + Works with NetworkManager, wpa_supplicant, or raw iw. + """ + + def __init__(self, interface: str = "wlan0"): + self.interface = interface + + def get_rssi(self) -> float: + """Get current RSSI from connected AP.""" + # Method 1: /proc/net/wireless (no root) + with open("/proc/net/wireless") as f: + for line in f: + if self.interface in line: + parts = line.split() + return float(parts[3].rstrip('.')) + + # Method 2: iw (no root for own station) + result = subprocess.run( + ["iw", "dev", self.interface, "link"], + capture_output=True, text=True + ) + for line in result.stdout.split('\n'): + if 'signal:' in line: + return float(line.split(':')[1].strip().split()[0]) + + raise SensingError(f"Cannot read RSSI from {self.interface}") + + def get_noise_floor(self) -> float: + """Get noise floor estimate.""" + result = subprocess.run( + ["iw", "dev", self.interface, "survey", "dump"], + capture_output=True, text=True + ) + for line in result.stdout.split('\n'): + if 'noise:' in line: + return float(line.split(':')[1].strip().split()[0]) + return -95.0 # Default noise floor estimate + + def get_link_stats(self) -> dict: + """Get link quality statistics.""" + result = subprocess.run( + ["iw", "dev", self.interface, "station", "dump"], + capture_output=True, text=True + ) + stats = {} + for line in result.stdout.split('\n'): + if 'tx bytes:' in line: + stats['tx_bytes'] = int(line.split(':')[1].strip()) + elif 'rx bytes:' in line: + stats['rx_bytes'] = int(line.split(':')[1].strip()) + elif 'tx retries:' in line: + stats['tx_retries'] = int(line.split(':')[1].strip()) + elif 'signal:' in line: + stats['signal'] = float(line.split(':')[1].strip().split()[0]) + return stats +``` + +### Classification Rules + +```python +class PresenceClassifier: + """Rule-based presence and motion classifier. + + Uses simple, interpretable rules rather than ML to ensure + transparency and debuggability. + """ + + def __init__(self, config: ClassifierConfig): + self.variance_threshold = config.variance_threshold # 2.0 dBm² + self.motion_threshold = config.motion_threshold # 5.0 dBm² + self.spectral_threshold = config.spectral_threshold # 10.0 + self.confidence_min_receivers = config.min_receivers # 2 + + def classify(self, features: FeatureVector, + multi_receiver: list[FeatureVector] = None) -> SensingResult: + + # Presence: RSSI variance exceeds empty-room baseline + presence = features.rssi_variance > self.variance_threshold + + # Motion level + if features.rssi_variance > self.motion_threshold: + motion = MotionLevel.ACTIVE + elif features.rssi_variance > self.variance_threshold: + motion = MotionLevel.PRESENT_STILL + else: + motion = MotionLevel.ABSENT + + # Confidence from spectral energy and receiver agreement + spectral_conf = min(1.0, features.spectral_energy / self.spectral_threshold) + if multi_receiver: + agreeing = sum(1 for f in multi_receiver + if (f.rssi_variance > self.variance_threshold) == presence) + receiver_conf = agreeing / len(multi_receiver) + else: + receiver_conf = 0.5 # Single receiver = lower confidence + + confidence = 0.6 * spectral_conf + 0.4 * receiver_conf + + return SensingResult( + presence=presence, + motion_level=motion, + confidence=confidence, + dominant_frequency=features.dominant_frequency, + breathing_band_power=features.breathing_band_power, + ) +``` + +### Capability Matrix (Honest Assessment) + +| Capability | Single Receiver | 3 Receivers | 6 Receivers | Accuracy | +|-----------|----------------|-------------|-------------|----------| +| Binary presence | Yes | Yes | Yes | 90-95% | +| Coarse motion (still/moving) | Yes | Yes | Yes | 85-90% | +| Room-level location | No | Marginal | Yes | 70-80% | +| Person count | No | Marginal | Marginal | 50-70% | +| Activity class (walk/sit/stand) | Marginal | Marginal | Yes | 60-75% | +| Respiration detection | No | Marginal | Marginal | 40-60% | +| Heartbeat | No | No | No | N/A | +| Body pose | No | No | No | N/A | + +**Bottom line**: Feature-level sensing on commodity gear does presence and motion well. It does NOT do pose estimation, heartbeat, or reliable respiration. Any claim otherwise would be dishonest. + +### Decision Matrix: Option 2 (ESP32) vs Option 3 (Commodity) + +| Factor | ESP32 CSI (ADR-012) | Commodity (ADR-013) | +|--------|---------------------|---------------------| +| Headline capability | Respiration + motion | Presence + coarse motion | +| Hardware cost | $54 (3-node kit) | $0 (existing gear) | +| Setup time | 2-4 hours | 15 minutes | +| Technical barrier | Medium (firmware flash) | Low (pip install) | +| Data quality | Real CSI (amplitude + phase) | RSSI only | +| Multi-person | Marginal | Poor | +| Pose estimation | Marginal | No | +| Reproducibility | High (controlled hardware) | Medium (varies by hardware) | +| Public credibility | High (real CSI artifact) | Medium (RSSI is "obvious") | + +### Proof Bundle for Commodity Sensing + +``` +v1/data/proof/commodity/ +├── rssi_capture_30sec.json # 30 seconds of RSSI from 3 receivers +├── rssi_capture_meta.json # Hardware: Intel AX200, Router: TP-Link AX1800 +├── scenario.txt # "Person walks through room at t=10s, sits at t=20s" +├── expected_features.json # Feature extraction output +├── expected_classification.json # Classification output +├── expected_features.sha256 # Verification hash +└── verify_commodity.py # One-command verification +``` + +### Integration with WiFi-DensePose Pipeline + +The commodity sensing module outputs the same `SensingResult` type as the CSI pipeline, allowing graceful degradation: + +```python +class SensingBackend(Protocol): + """Common interface for all sensing backends.""" + + def get_features(self) -> FeatureVector: ... + def get_capabilities(self) -> set[Capability]: ... + +class CsiBackend(SensingBackend): + """Full CSI pipeline (ESP32 or research NIC).""" + def get_capabilities(self): + return {Capability.PRESENCE, Capability.MOTION, Capability.RESPIRATION, + Capability.LOCATION, Capability.POSE} + +class CommodityBackend(SensingBackend): + """RSSI-only commodity hardware.""" + def get_capabilities(self): + return {Capability.PRESENCE, Capability.MOTION} +``` + +## Consequences + +### Positive +- **Zero-cost entry**: Works with existing WiFi hardware +- **15-minute setup**: `pip install wifi-densepose && wdp sense --interface wlan0` +- **Broad adoption**: Any Linux laptop, Pi, or phone can participate +- **Honest capability reporting**: `get_capabilities()` tells users exactly what works +- **Complements ESP32**: Users start with commodity, upgrade to ESP32 for more capability +- **No mock data**: Real RSSI from real hardware, deterministic pipeline + +### Negative +- **Limited capability**: No pose, no heartbeat, marginal respiration +- **Hardware variability**: RSSI calibration differs across chipsets +- **Environmental sensitivity**: Commodity RSSI is more affected by interference than CSI +- **Not a "pose estimation" demo**: This module honestly cannot do what the project name implies +- **Lower credibility ceiling**: RSSI sensing is well-known; less impressive than CSI + +## References + +- [Youssef et al. - Challenges in Device-Free Passive Localization](https://doi.org/10.1145/1287853.1287880) +- [Device-Free WiFi Sensing Survey](https://arxiv.org/abs/1901.09683) +- [RSSI-based Breathing Detection](https://ieeexplore.ieee.org/document/7127688) +- [Linux Wireless Tools](https://wireless.wiki.kernel.org/en/users/documentation/iw) +- ADR-011: Python Proof-of-Reality and Mock Elimination +- ADR-012: ESP32 CSI Sensor Mesh