Amplitude-based detection is fragile in non-static environments #51
I have a core question after reading the implementation, focused on the practical applicability boundary of CSI amplitude features in real-world environments.
In the current code, there is logic that applies thresholding on amplitude statistics to produce `human_detected`/`confidence`. From my experience, in real (non-enclosed, non-static) environments, CSI amplitude is affected by many non-human factors, which makes the mapping between "amplitude fluctuations" and "human activity" unstable and increases the false-positive risk. Only in relatively enclosed, stable-layout, static environments do amplitude-based features tend to show repeatable correlation with human activity.

Could you please clarify the following verifiable information so users can understand the intended scope and limitations of this project?
Thanks.
Thanks for the sharp question — this is exactly the kind of scrutiny WiFi sensing projects need. You're right that raw CSI amplitude alone is fragile. Let me answer each question with specific code references.
1. Does the project assume a fixed scene / static environment / fixed-position calibration?
Short answer: Yes, for the rule-based RSSI classifier. No, for the Rust CSI pipeline.
There are two separate detection paths in this project:
- **Python RSSI path** (`v1/src/sensing/classifier.py`) — the rule-based classifier you identified. It uses fixed thresholds (`presence_variance_threshold=0.5`, `motion_energy_threshold=0.1`) and is explicitly designed for controlled, single-room scenarios. It does assume a relatively static environment. The confidence model is a weighted heuristic (60% base + 20% spectral + 20% cross-receiver agreement), not a learned model. This path is the MVP sensing layer: useful for demos and enclosed rooms, but you're correct that it would produce false positives in dynamic environments.

- **Rust CSI path** (`wifi-densepose-signal` crate) — the SOTA pipeline, with several robustness mechanisms beyond raw amplitude:
  - `signal/src/hampel.rs`
  - `signal/src/csi_ratio.rs`
  - `signal/src/phase_sanitizer.rs`
  - `signal/src/fresnel.rs`
  - `signal/src/subcarrier_selection.rs` — `mincut_subcarrier_partition()` dynamically separates sensitive (human-affected) vs. insensitive subcarriers using graph min-cut. This means the system adapts per frame to which subcarriers carry body information vs. environmental noise.
  - `signal/src/spectrogram.rs` — `gate_spectrogram()` uses learned attention weights to suppress non-human spectral components.
  - `signal/src/motion.rs:536`, `signal/src/motion.rs:553` — `calibrate()` captures an empty-room baseline; detection uses the ratio against that baseline.
  - `signal/src/motion.rs:39-46` — the `MotionDetector` uses a `MotionScore` that combines variance, correlation, phase, and Doppler components. Amplitude alone is weighted at 0.3-0.4, never 1.0.

2. Are there robustness designs for non-enclosed or dynamic environments?
Partially — in the signal processing layer. Not yet in the end-to-end pipeline.
What exists:
- `mincut_subcarrier_partition()` adapts per frame to which subcarriers are sensitive to human motion vs. environmental multipath. This is the most relevant mechanism for non-static scenes.
- Smoothing (`smoothing_factor`) suppresses transient false positives.
- Cross-receiver agreement (`classifier.py:191-197`) — when multiple receivers agree, confidence is boosted; disagreement reduces it.

What is not yet implemented but planned:

- Contrastive CSI embedding (branch `feat/adr-024-contrastive-csi-embedding`). This learns environment-invariant representations via self-supervised contrastive learning. A model pretrained on diverse rooms should generalize to new environments without per-room calibration. The `FingerprintIndex` with the `temporal_baseline` type is specifically designed for anomaly detection relative to a learned baseline, rather than a fixed threshold.
- The `wifi-densepose-mat` crate supports multi-AP fusion, which increases robustness by providing geometric constraints that single-AP amplitude cannot.

Honest assessment: the current system works best in enclosed rooms with fixed AP positions. Open-plan offices, hallways with cross-traffic, or outdoor environments would require the contrastive embedding + multi-AP path to achieve acceptable false-positive rates.
3. Do you have cross-scene validation results or a reproducible evaluation setup?
Not yet. This is a gap.
What exists for reproducibility:
- `v1/data/proof/` — verifiable synthetic CSI with fixed seeds.
- Dataset loaders (`wifi-densepose-train/src/dataset.rs`) for public datasets with multi-room, multi-subject captures that could serve as cross-scene benchmarks.

What doesn't exist yet: cross-scene validation results or a published benchmark harness.
I'll add a "Known Limitations" section to the README to make these boundaries explicit, and we've filed ADR-024 (AETHER) specifically because contrastive pretraining is the established approach in the literature (SelfHAR, Wang et al. 2023) for cross-environment generalization.
TL;DR: You identified a real limitation. The rule-based RSSI classifier is fragile by design and assumes static environments. The Rust CSI pipeline has multiple robustness mechanisms (conjugate mult, Hampel, min-cut subcarrier partition, multi-feature fusion) but hasn't been validated cross-scene. The newly implemented AETHER embedding model (ADR-024) is specifically designed to address cross-environment transfer via contrastive pretraining. Cross-scene benchmarks are needed and not yet done.
Thank you for pushing on this — it's exactly the kind of feedback that improves the project.
Thanks for the thoughtful question — these are exactly the right concerns for real-world CSI deployments. You're correct that raw amplitude thresholding alone is fragile outside controlled environments.
We've been actively working on this. Here's what the codebase now provides across the merged pipeline and the in-flight PR #52:
1. Static/calibrated environment assumption
The original amplitude-based detection (
derive_pose_from_sensinginmain.rs) does assume a relatively static scene — it uses fixed thresholds on variance/energy and was designed as a baseline heuristic, not a production classifier. This is acknowledged.2. Robustness features for dynamic/non-enclosed environments
PR #52 (ADR-024, all 7 phases complete, 272 tests passing) introduces several mechanisms specifically designed to address the fragility you describe:
Contrastive embeddings (SimCLR/InfoNCE): Instead of thresholding raw amplitude, we learn a 128-dim embedding space where similar CSI patterns cluster together. This is inherently more robust than hand-tuned thresholds because the model learns which amplitude variations are semantically meaningful vs. noise.
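For reference, the SimCLR-style InfoNCE objective can be written in a few lines of numpy. This is the generic formulation from the literature, not the project's training code:

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE over a batch of paired embeddings.
    z1[i] and z2[i] are two augmented views of the same CSI window;
    every other row of z2 serves as a negative for row i."""
    # L2-normalise so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                      # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; loss is their mean negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))
```

Minimizing this pulls the two views of the same window together while pushing apart views of different windows, which is what makes the embedding less sensitive to hand-tuned amplitude thresholds.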
5 physically-motivated augmentations (`CsiAugmenter`): temporal jitter, subcarrier masking, Gaussian noise injection, phase rotation, and amplitude scaling — all applied during self-supervised pretraining to teach the model invariance to exactly the non-human fluctuations you mention.

Environment drift detection (`EnvironmentDetector`): a 3-sigma detector on embedding distance that flags when the environment has changed (e.g., furniture moved, new reflectors). Entries inserted during drift are tagged `anomalous` in the fingerprint index rather than silently corrupting the baseline.
EnvironmentDetector): 3-sigma detector on embedding distance that flags when the environment has changed (e.g., furniture moved, new reflectors). Entries inserted during drift are taggedanomalousin the fingerprint index rather than silently corrupting the baseline.MicroLoRA per-environment adaptation: Rank-4 LoRA adapters (1,792 params each, 93% smaller than full retrain) allow per-scene fine-tuning without catastrophic forgetting. When you deploy in a new room, you adapt the projection head while preserving learned CSI structure via EWC++ (Elastic Weight Consolidation).
Hard-negative mining: During training, the system selects difficult negative pairs (similar amplitude patterns from different activities) with configurable ratio and warmup, which directly improves discrimination in ambiguous scenarios.
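As a rough illustration of the augmentation families listed above, here is a sketch over a complex CSI window. This is not the `CsiAugmenter` implementation — the parameter ranges are invented, and temporal jitter is omitted for brevity:

```python
import numpy as np

def augment_csi(csi: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply physically-motivated augmentations to a complex CSI window
    of shape (time, subcarriers). Illustrative sketch only."""
    out = csi.copy()
    # Global phase rotation: models residual receiver phase offset (CFO/SFO).
    out = out * np.exp(1j * rng.uniform(0, 2 * np.pi))
    # Amplitude scaling: models AGC gain variation.
    out = out * rng.uniform(0.8, 1.2)
    # Subcarrier masking: zero a random contiguous band of subcarriers.
    start = int(rng.integers(0, out.shape[1] // 2))
    out[:, start:start + out.shape[1] // 8] = 0
    # Gaussian noise injection on both I and Q components.
    out = out + rng.normal(0, 0.01, out.shape) + 1j * rng.normal(0, 0.01, out.shape)
    return out
```

The point of each transform is that it perturbs exactly the nuisance factors (gain, phase offset, dropped subcarriers, thermal noise) that should not change the human-activity label.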
3. Cross-scene validation
We don't yet have published cross-scene benchmarks — the ADR references MM-Fi and Wi-Pose as target evaluation datasets (both include multi-room, multi-subject protocols). The training pipeline (`--pretrain`, `--train`, `--embed`, `--build-index` CLI flags) is wired end-to-end and ready for this evaluation. The `FingerprintIndex` supports 4 index types (environment, activity, temporal, person) specifically to enable cross-scene retrieval experiments.

Bottom line: the amplitude-threshold path is a baseline/demo. The contrastive embedding pipeline in PR #52 is the intended production path and directly addresses the environmental fragility you've identified. Contributions and cross-scene evaluation results are very welcome.
Practical finding: subcarrier spread vs. mean amplitude (real ESP32-S3 deployment)
I've been running the Python sensing pipeline with 2x ESP32-S3 nodes streaming ADR-018 CSI to a Raspberry Pi, and hit exactly the fragility described in this issue. Sharing concrete findings that may help others.
Setup
`ws_server.py` with `Esp32UdpCollector` → `RssiFeatureExtractor` → `PresenceClassifier`

Problem: `mean_amplitude` is noise-dominated

When using `np.mean(amplitudes)` per CSI frame as the time-series signal, the mean amplitude fluctuates rapidly frame-to-frame (measurement noise from I/Q quantization), but this noise is spectrally flat — it has no structure at breathing/motion frequencies. A Butterworth LPF at 5 Hz removes the noise but also zeroes out the entire signal.
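For anyone reproducing this, the filtering step described above can be written with standard scipy calls (the 3rd-order / 5 Hz setup is from this post; `fs` must be the per-node frame rate, see the multi-node note below):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal: np.ndarray, fs: float,
            cutoff: float = 5.0, order: int = 3) -> np.ndarray:
    """Zero-phase Butterworth low-pass matching the 3rd-order / 5 Hz
    setup described above. fs is the CSI frame rate in Hz."""
    # butter() takes the cutoff normalized to the Nyquist frequency.
    b, a = butter(order, cutoff / (fs / 2), btype="low")
    # filtfilt() runs the filter forward and backward: zero phase lag.
    return filtfilt(b, a, signal)
```

Because `filtfilt` is zero-phase, the filtered series stays time-aligned with the raw frames, which matters when correlating across receivers.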
Solution: use per-frame subcarrier spread (`amp_spread`)

Replacing `mean_amplitude` with `np.std(amplitudes)` (standard deviation across subcarriers within a single frame) produces a signal that does contain body-motion information after the same Butterworth LPF (3rd order, 5 Hz cutoff) + 10 Hz downsampling.
The intuition: human body causes frequency-selective fading (multipath). Different subcarriers are attenuated differently depending on body position. The spread (std) across subcarriers captures this multipath diversity and changes at body-motion rates, while the mean averages it out.
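The mean-vs-spread distinction is a one-liner per feature; this sketch shows why the spread can move while the mean stays flat:

```python
import numpy as np

def frame_features(csi_amplitudes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-frame mean and subcarrier spread for a (frames, subcarriers)
    amplitude matrix. The mean averages out frequency-selective fading;
    the spread (std across subcarriers) captures it."""
    mean_amp = csi_amplitudes.mean(axis=1)    # noise-dominated signal
    amp_spread = csi_amplitudes.std(axis=1)   # multipath-diversity signal
    return mean_amp, amp_spread
```

For example, a flat frame `[1, 1, 1, 1]` and a frequency-selective frame `[0, 2, 0, 2]` have the same mean but very different spread — exactly the body-induced fading signature described above.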
Additional changes needed for multi-node
When using 2+ ESP32 nodes, their CSI frames interleave in a single buffer, corrupting sample rate estimation (we saw 504 Hz estimated rate, which pushed all FFT bins above the motion/breathing bands). Fix: per-node ring buffers with separate feature extraction, then cross-receiver agreement for confidence.
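The per-node buffering fix can be sketched like this (names and frame format are illustrative, not the `ws_server.py` code):

```python
from collections import defaultdict, deque

class PerNodeBuffers:
    """Route incoming CSI frames into per-node ring buffers so that
    interleaved multi-node traffic never corrupts a node's sample-rate
    estimate. Hypothetical sketch of the fix described above."""

    def __init__(self, maxlen: int = 512):
        # One bounded deque per node id; old frames fall off automatically.
        self.buffers = defaultdict(lambda: deque(maxlen=maxlen))

    def push(self, node_id: str, timestamp: float, frame) -> None:
        self.buffers[node_id].append((timestamp, frame))

    def sample_rate(self, node_id: str) -> float:
        """Estimate frame rate from this node's timestamps only."""
        ts = [t for t, _ in self.buffers[node_id]]
        if len(ts) < 2:
            return 0.0
        return (len(ts) - 1) / (ts[-1] - ts[0])
```

With two 50 Hz nodes interleaved in a single buffer, a naive estimator sees ~100 Hz; per-node buffers recover the true 50 Hz per stream, keeping the FFT bins inside the motion/breathing bands.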
Thresholds for amplitude-spread signal
The original thresholds (`presence_variance=0.5`, `motion_energy=0.1`) are tuned for RSSI (dBm). For the amplitude-spread signal, we found these work:

- `presence_variance_threshold=0.3`
- `motion_energy_threshold=0.05`

Limitation
Breathing band (0.1–0.5 Hz) is still 0.000 with this approach. Breathing detection likely requires per-subcarrier phase tracking rather than amplitude-only analysis, as the Rust pipeline's conjugate multiplication and phase sanitization are designed to provide.
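The phase trick referenced here is standard: multiplying one RX chain's CSI by the conjugate of another cancels the carrier/sampling phase offsets common to both, leaving geometry-induced phase. A minimal numpy sketch of the general technique (not the `csi_ratio.rs` / `phase_sanitizer.rs` code):

```python
import numpy as np

def conjugate_multiply(csi_a: np.ndarray, csi_b: np.ndarray) -> np.ndarray:
    """Conjugate multiplication between two RX antennas of one receiver.
    CFO/SFO phase offsets are common to both chains, so they cancel in
    the product, leaving phase differences caused by path geometry."""
    return csi_a * np.conj(csi_b)
```

Any common phase offset `e^{j*theta}` applied to both antennas drops out of the product, which is why breathing-scale phase variation becomes recoverable where single-antenna phase is unusable.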
Tested on: ESP32-S3-DevKitC × 2, Raspberry Pi (Debian 13, aarch64), Python 3.11, ESP-IDF v5.2