# Troubleshooting Guide

## Overview

This guide provides solutions to common issues encountered when using the WiFi-DensePose system, including installation problems, hardware connectivity issues, performance optimization, and error resolution.

## Table of Contents

1. [Quick Diagnostics](#quick-diagnostics)
2. [Installation Issues](#installation-issues)
3. [Hardware Problems](#hardware-problems)
4. [Performance Issues](#performance-issues)
5. [API and Connectivity Issues](#api-and-connectivity-issues)
6. [Data Quality Issues](#data-quality-issues)
7. [System Errors](#system-errors)
8. [Domain-Specific Issues](#domain-specific-issues)
9. [Advanced Troubleshooting](#advanced-troubleshooting)
10. [Getting Support](#getting-support)

## Quick Diagnostics

### System Health Check

Run a comprehensive system health check to identify issues:

```bash
# Check system status
curl http://localhost:8000/api/v1/system/status

# Run built-in diagnostics
curl http://localhost:8000/api/v1/system/diagnostics

# Check component health
curl http://localhost:8000/api/v1/health
```

### Log Analysis

Check system logs for error patterns:

```bash
# View recent logs
docker-compose logs --tail=100 wifi-densepose-api

# Search for errors
docker-compose logs | grep -i error

# Check specific component logs
docker-compose logs neural-network
docker-compose logs csi-processor
```

### Resource Monitoring

Monitor system resources:

```bash
# Check Docker container resources
docker stats

# Check system resources
htop
nvidia-smi  # For GPU monitoring

# Check disk space
df -h
```

## Installation Issues

### Docker Installation Problems

#### Issue: Docker Compose Fails to Start

**Symptoms:**
- Services fail to start
- Port conflicts
- Permission errors

**Solutions:**

1. **Check Port Availability:**
   ```bash
   # Check if port 8000 is in use
   netstat -tulpn | grep :8000
   lsof -i :8000

   # Kill the process using the port (replace <PID> with the process ID reported above)
   sudo kill -9 <PID>
   ```

2.
   **Fix Permission Issues:**
   ```bash
   # Add user to docker group
   sudo usermod -aG docker $USER
   newgrp docker

   # Fix file permissions
   sudo chown -R $USER:$USER .
   ```

3. **Update Docker Compose:**
   ```bash
   # Update Docker Compose
   sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
   sudo chmod +x /usr/local/bin/docker-compose
   ```

#### Issue: Out of Disk Space

**Symptoms:**
- Build failures
- Container crashes
- Database errors

**Solutions:**

1. **Clean Docker Resources:**
   ```bash
   # Remove unused containers, networks, images
   docker system prune -a

   # Remove unused volumes
   docker volume prune

   # Check disk usage
   docker system df
   ```

2. **Configure Storage Location:**
   ```yaml
   # Edit docker-compose.yml to use external storage
   volumes:
     - /external/storage/data:/app/data
     - /external/storage/models:/app/models
   ```

### Native Installation Problems

#### Issue: Python Dependencies Fail to Install

**Symptoms:**
- pip install errors
- Compilation failures
- Missing system libraries

**Solutions:**

1. **Install System Dependencies:**
   ```bash
   # Ubuntu/Debian
   sudo apt update
   sudo apt install -y build-essential cmake python3-dev
   sudo apt install -y libopencv-dev libffi-dev libssl-dev

   # CentOS/RHEL
   sudo yum groupinstall -y "Development Tools"
   sudo yum install -y python3-devel opencv-devel
   ```

2. **Use Virtual Environment:**
   ```bash
   # Create clean virtual environment
   python3 -m venv venv_clean
   source venv_clean/bin/activate
   pip install --upgrade pip setuptools wheel
   pip install -r requirements.txt
   ```

3.
   **Install PyTorch Separately:**
   ```bash
   # Install PyTorch with specific CUDA version
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

   # Or CPU-only version
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
   ```

#### Issue: CUDA/GPU Setup Problems

**Symptoms:**
- GPU not detected
- CUDA version mismatch
- Out of GPU memory

**Solutions:**

1. **Verify CUDA Installation:**
   ```bash
   # Check CUDA toolkit version
   nvcc --version
   # Check the driver (also shows the highest CUDA version the driver supports)
   nvidia-smi

   # Check PyTorch CUDA support
   python -c "import torch; print(torch.cuda.is_available())"
   ```

2. **Install Correct CUDA Version:**
   ```bash
   # Install CUDA 11.8 (example)
   wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
   sudo sh cuda_11.8.0_520.61.05_linux.run
   ```

3. **Configure GPU Memory:**
   ```bash
   # Restrict the process to GPU 0
   export CUDA_VISIBLE_DEVICES=0

   # Stop the PyTorch allocator from splitting blocks larger than 512 MB
   # (reduces fragmentation; this is not a hard memory limit)
   export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
   ```

## Hardware Problems

### Router Connectivity Issues

#### Issue: Cannot Connect to Router

**Symptoms:**
- No CSI data received
- Connection timeouts
- Authentication failures

**Solutions:**

1. **Verify Network Connectivity:**
   ```bash
   # Ping router
   ping 192.168.1.1

   # Check SSH access
   ssh root@192.168.1.1

   # Test CSI port
   telnet 192.168.1.1 5500
   ```

2. **Check Router Configuration:**
   ```bash
   # SSH into router and check CSI tools
   ssh root@192.168.1.1
   csi_tool --status

   # Restart CSI service
   /etc/init.d/csi restart
   ```

3. **Verify Firewall Settings:**
   ```bash
   # Check iptables rules
   iptables -L

   # Allow CSI port
   iptables -A INPUT -p tcp --dport 5500 -j ACCEPT
   ```

#### Issue: Poor CSI Data Quality

**Symptoms:**
- High packet loss
- Inconsistent data rates
- Signal interference

**Solutions:**

1. **Optimize Router Placement:**
   ```bash
   # Check signal strength
   iwconfig wlan0

   # Analyze interference
   iwlist wlan0 scan | grep -E "(ESSID|Frequency|Quality)"
   ```

2.
   **Adjust CSI Parameters:**
   ```bash
   # /etc/config/wireless uses UCI syntax, so set options with `uci` instead of appending raw lines.
   # "radio0" is the default OpenWrt radio section; adjust it to match your router.

   # Reduce sampling rate (the option name depends on your CSI firmware)
   uci set wireless.radio0.csi_rate='20'

   # Change channel
   uci set wireless.radio0.channel='6'

   # Apply the changes
   uci commit wireless
   wifi reload
   ```

3. **Monitor Data Quality:**
   ```bash
   # Check CSI data statistics
   curl http://localhost:8000/api/v1/hardware/csi/stats

   # View real-time quality metrics
   curl http://localhost:8000/api/v1/hardware/status
   ```

### Hardware Resource Issues

#### Issue: High CPU Usage

**Symptoms:**
- System slowdown
- Processing delays
- High temperature

**Solutions:**

1. **Optimize Processing Settings:**
   ```bash
   # Reduce batch size
   export POSE_PROCESSING_BATCH_SIZE=16

   # Lower frame rate
   export STREAM_FPS=15

   # Disable unnecessary features
   export ENABLE_HISTORICAL_DATA=false
   ```

2. **Scale Resources:**
   ```bash
   # Increase worker processes
   export WORKERS=4

   # Pin the process to CPU cores 0-3
   taskset -c 0-3 python -m src.api.main
   ```

#### Issue: GPU Memory Errors

**Symptoms:**
- CUDA out of memory errors
- Model loading failures
- Inference crashes

**Solutions:**

1. **Optimize GPU Usage:**
   ```bash
   # Reduce batch size
   export POSE_PROCESSING_BATCH_SIZE=8

   # Enable mixed precision
   export ENABLE_MIXED_PRECISION=true

   # Release cached GPU memory. torch.cuda.empty_cache() only frees memory cached by the
   # process that calls it, so running it in a separate `python -c` process has no effect
   # on the service. Restart the service instead (or `sudo systemctl restart wifi-densepose`):
   docker-compose restart wifi-densepose-api
   ```

2. **Monitor GPU Memory:**
   ```bash
   # Watch GPU memory usage
   watch -n 1 nvidia-smi

   # Check memory allocation per process. torch.cuda.memory_allocated() only reports the
   # calling process, so query the driver for every process instead:
   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
   ```

## Performance Issues

### Slow Pose Detection

#### Issue: Low Processing Frame Rate

**Symptoms:**
- FPS below expected rate
- High latency
- Delayed responses

**Solutions:**

1. **Optimize Neural Network:**
   ```bash
   # Use TensorRT optimization
   export ENABLE_TENSORRT=true

   # Enable model quantization
   export MODEL_QUANTIZATION=int8

   # Use smaller model variant
   export POSE_MODEL_PATH="./models/densepose_mobile.pth"
   ```

2.
   **Tune Processing Pipeline:**
   ```bash
   # Increase batch size if GPU memory allows
   # (raises throughput, but can also raise per-frame latency)
   export POSE_PROCESSING_BATCH_SIZE=64

   # Reduce input resolution
   export INPUT_RESOLUTION=256

   # Skip frames for real-time processing
   export FRAME_SKIP_RATIO=2
   ```

3. **Parallel Processing:**
   ```bash
   # Enable multi-threading
   export OMP_NUM_THREADS=4
   export MKL_NUM_THREADS=4

   # Expose GPUs 0 and 1 (the application must support multi-GPU execution)
   export CUDA_VISIBLE_DEVICES=0,1
   ```

### Memory Issues

#### Issue: High Memory Usage

**Symptoms:**
- System running out of RAM
- Swap usage increasing
- OOM killer activated

**Solutions:**

1. **Optimize Memory Usage:**
   ```bash
   # Reduce buffer sizes
   export CSI_BUFFER_SIZE=500
   export STREAM_BUFFER_SIZE=50

   # Limit historical data retention
   export DATA_RETENTION_HOURS=24

   # Enable memory mapping for large files
   export USE_MEMORY_MAPPING=true
   ```

2. **Configure Swap:**
   ```bash
   # Add swap space (add /swapfile to /etc/fstab to keep it after a reboot)
   sudo fallocate -l 4G /swapfile
   sudo chmod 600 /swapfile
   sudo mkswap /swapfile
   sudo swapon /swapfile
   ```

## API and Connectivity Issues

### Authentication Problems

#### Issue: JWT Token Errors

**Symptoms:**
- 401 Unauthorized responses
- Token expired errors
- Invalid signature errors

**Solutions:**

1. **Verify Token Configuration:**
   ```bash
   # Check secret key (it must be identical on every API instance that validates tokens)
   echo $SECRET_KEY

   # Verify token expiration (replace <token> with the token being rejected)
   curl -X POST http://localhost:8000/api/v1/auth/verify \
     -H "Authorization: Bearer <token>"
   ```

2. **Regenerate Tokens:**
   ```bash
   # Get new token (replace the credentials with your own)
   curl -X POST http://localhost:8000/api/v1/auth/token \
     -H "Content-Type: application/json" \
     -d '{"username": "admin", "password": "password"}'
   ```

3. **Check System Time:**
   ```bash
   # Token expiry checks depend on an accurate clock; ensure system time is correct
   timedatectl status
   sudo ntpdate -s time.nist.gov

   # Or enable continuous NTP synchronization
   sudo timedatectl set-ntp true
   ```

### WebSocket Connection Issues

#### Issue: WebSocket Disconnections

**Symptoms:**
- Frequent disconnections
- Connection timeouts
- No real-time data

**Solutions:**

1.
   **Adjust WebSocket Settings:**
   ```bash
   # Increase timeout values
   export WEBSOCKET_TIMEOUT=600
   export WEBSOCKET_PING_INTERVAL=30

   # Enable keep-alive
   export WEBSOCKET_KEEPALIVE=true
   ```

2. **Check Network Configuration:**
   ```bash
   # Test WebSocket connection
   wscat -c ws://localhost:8000/ws/pose

   # Confirm the endpoint answers over plain HTTP. If wscat works directly but fails
   # through a reverse proxy, make sure the proxy forwards the Upgrade and Connection headers.
   curl -I http://localhost:8000/ws/pose
   ```

### Rate Limiting Issues

#### Issue: Rate Limit Exceeded

**Symptoms:**
- 429 Too Many Requests errors
- API calls being rejected
- Slow response times

**Solutions:**

1. **Adjust Rate Limits:**
   ```bash
   # Increase rate limits
   export RATE_LIMIT_REQUESTS=1000
   export RATE_LIMIT_WINDOW=3600

   # Disable rate limiting for development
   export ENABLE_RATE_LIMITING=false
   ```

2. **Implement Request Batching:**
   ```python
   import time

   def batch_requests(requests, batch_size=10, delay_seconds=1.0):
       """Yield requests in batches, pausing between batches to stay under the rate limit."""
       for i in range(0, len(requests), batch_size):
           yield requests[i:i + batch_size]  # caller processes this batch
           if i + batch_size < len(requests):
               time.sleep(delay_seconds)  # rate-limiting delay before the next batch
   ```

## Data Quality Issues

### Poor Detection Accuracy

#### Issue: Low Confidence Scores

**Symptoms:**
- Many false positives
- Missing detections
- Inconsistent tracking

**Solutions:**

1. **Adjust Detection Thresholds:**
   ```bash
   # Increase the confidence threshold to suppress false positives
   # (lower it instead if valid detections are being missed)
   curl -X PUT http://localhost:8000/api/v1/config \
     -H "Content-Type: application/json" \
     -d '{"detection": {"confidence_threshold": 0.8}}'
   ```

2. **Improve Environment Setup:**
   ```bash
   # Recalibrate system
   curl -X POST http://localhost:8000/api/v1/system/calibrate

   # Check for interference
   curl http://localhost:8000/api/v1/hardware/interference
   ```

3. **Optimize Model Parameters:**
   ```bash
   # Use domain-specific model
   export POSE_MODEL_PATH="./models/healthcare_optimized.pth"

   # Enable post-processing filters
   export ENABLE_TEMPORAL_SMOOTHING=true
   export ENABLE_OUTLIER_FILTERING=true
   ```

### Tracking Issues

#### Issue: Person ID Switching

**Symptoms:**
- IDs change frequently
- Lost tracks
- Duplicate persons

**Solutions:**

1.
   **Tune Tracking Parameters:**
   ```bash
   # Adjust tracking thresholds. Assuming SORT-style semantics:
   #   max_age:       frames a track survives without a matching detection
   #   min_hits:      matched detections needed before a track is confirmed
   #   iou_threshold: minimum overlap required to match a detection to a track
   curl -X PUT http://localhost:8000/api/v1/config \
     -H "Content-Type: application/json" \
     -d '{
       "tracking": {
         "max_age": 30,
         "min_hits": 3,
         "iou_threshold": 0.3
       }
     }'
   ```

2. **Improve Detection Consistency:**
   ```bash
   # Enable temporal smoothing
   export ENABLE_TEMPORAL_SMOOTHING=true

   # Use appearance features
   export USE_APPEARANCE_FEATURES=true
   ```

## System Errors

### Database Issues

#### Issue: Database Connection Errors

**Symptoms:**
- Connection refused errors
- Timeout errors
- Data not persisting

**Solutions:**

1. **Check Database Status:**
   ```bash
   # PostgreSQL
   sudo systemctl status postgresql
   sudo -u postgres psql -c "SELECT version();"

   # SQLite
   ls -la ./data/wifi_densepose.db
   sqlite3 ./data/wifi_densepose.db ".tables"
   ```

2. **Fix Connection Issues:**
   ```bash
   # Point the application at the correct database (replace user and password)
   export DATABASE_URL="postgresql://user:password@localhost:5432/wifi_densepose"

   # Restart database service
   sudo systemctl restart postgresql
   ```

3. **Database Migration:**
   ```bash
   # Run database migrations
   python -m src.database.migrate

   # Reset database (WARNING: Data loss)
   python -m src.database.reset --confirm
   ```

### Service Crashes

#### Issue: API Service Crashes

**Symptoms:**
- Service stops unexpectedly
- No response from API
- Error 502/503 responses

**Solutions:**

1. **Check Service Logs:**
   ```bash
   # View crash logs
   journalctl -u wifi-densepose -f

   # Check for segmentation faults
   dmesg | grep -i "segfault"
   ```

2. **Restart Services:**
   ```bash
   # Restart with Docker
   docker-compose restart wifi-densepose-api

   # Restart native service
   sudo systemctl restart wifi-densepose
   ```

3.
   **Debug Memory Issues:**
   ```bash
   # Run with memory debugging (very slow). PYTHONMALLOC=malloc bypasses Python's
   # small-object allocator so valgrind reports fewer false positives.
   PYTHONMALLOC=malloc valgrind --tool=memcheck python -m src.api.main

   # Check for memory leaks: start with tracemalloc enabled (25 frames per traceback).
   # This only turns tracing on; snapshots must still be taken and compared in code.
   python -X tracemalloc=25 -m src.api.main
   ```

## Domain-Specific Issues

### Healthcare Domain Issues

#### Issue: Fall Detection False Alarms

**Symptoms:**
- Too many fall alerts
- Normal activities triggering alerts
- Delayed detection

**Solutions:**

1. **Adjust Sensitivity:**
   ```bash
   curl -X PUT http://localhost:8000/api/v1/config \
     -H "Content-Type: application/json" \
     -d '{
       "alerts": {
         "fall_detection": {
           "sensitivity": 0.7,
           "notification_delay_seconds": 10
         }
       }
     }'
   ```

2. **Improve Training Data:**
   ```bash
   # Collect domain-specific training data
   python -m src.training.collect_healthcare_data

   # Retrain model with healthcare data
   python -m src.training.train_healthcare_model
   ```

### Retail Domain Issues

#### Issue: Inaccurate Traffic Counting

**Symptoms:**
- Wrong visitor counts
- Missing entries/exits
- Double counting

**Solutions:**

1. **Calibrate Zone Detection:**
   ```bash
   # Define entrance/exit zones
   curl -X PUT http://localhost:8000/api/v1/config \
     -H "Content-Type: application/json" \
     -d '{
       "zones": {
         "entrance": {
           "coordinates": [[0, 0], [100, 50]],
           "type": "entrance"
         }
       }
     }'
   ```

2.
   **Optimize Tracking:**
   ```bash
   # Enable zone-based tracking
   export ENABLE_ZONE_TRACKING=true

   # Adjust dwell time thresholds
   export MIN_DWELL_TIME_SECONDS=5
   ```

## Advanced Troubleshooting

### Performance Profiling

#### CPU Profiling

```bash
# Profile Python code (stats are written when the process exits, e.g. after Ctrl+C)
python -m cProfile -o profile.stats -m src.api.main

# Analyze profile
python -c "
import pstats
p = pstats.Stats('profile.stats')
p.sort_stats('cumulative').print_stats(20)
"
```

#### GPU Profiling

```bash
# Profile CUDA kernels. nvprof does not support GPUs with compute capability 8.0 or newer;
# on those, use Nsight Systems instead: nsys profile python -m src.neural_network.inference
nvprof python -m src.neural_network.inference

# Use PyTorch profiler
python -c "
import torch
with torch.profiler.profile() as prof:
    # Your code here
    pass
print(prof.key_averages().table())
"
```

### Network Debugging

#### Packet Capture

```bash
# Capture CSI packets
sudo tcpdump -i eth0 port 5500 -w csi_capture.pcap

# Analyze with Wireshark
wireshark csi_capture.pcap
```

#### Network Latency Testing

```bash
# Test network latency
ping -c 100 192.168.1.1 | tail -1

# Test bandwidth (requires an iperf3 server, `iperf3 -s`, running on the router)
iperf3 -c 192.168.1.1 -t 60
```

### System Monitoring

#### Real-time Monitoring

```bash
# Monitor system resources
htop
iotop
nethogs

# Monitor GPU
nvidia-smi -l 1

# Monitor Docker containers
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```

#### Log Aggregation

```bash
# Centralized logging with ELK stack (single-node setup on a shared Docker network)
docker network create elk
docker run -d --name elasticsearch --network elk -e "discovery.type=single-node" elasticsearch:7.17.0
docker run -d --name kibana --network elk -p 5601:5601 kibana:7.17.0

# Configure log shipping
echo 'LOGGING_DRIVER=syslog' >> .env
echo 'SYSLOG_ADDRESS=tcp://localhost:514' >> .env
```

## Getting Support

### Collecting Diagnostic Information

Before contacting support, collect the following information:

```bash
# System information
uname -a
cat /etc/os-release
docker --version
python --version

# Application logs
docker-compose logs --tail=1000 > logs.txt

# Configuration (drops lines that look like secrets; review config.txt before sharing)
grep -viE 'key|secret|password|token|database_url' .env > config.txt
curl http://localhost:8000/api/v1/system/status > status.json

# Hardware information
lscpu
free -h
nvidia-smi > gpu_info.txt
```

### Support Channels

1.
   **Documentation**: Check the comprehensive documentation first
2. **GitHub Issues**: Report bugs and feature requests
3. **Community Forum**: Ask questions and share solutions
4. **Enterprise Support**: For commercial deployments

### Creating Effective Bug Reports

Include the following information:

1. **Environment Details**:
   - Operating system and version
   - Hardware specifications
   - Docker/Python versions

2. **Steps to Reproduce**:
   - Exact commands or API calls
   - Configuration settings
   - Input data characteristics

3. **Expected vs Actual Behavior**:
   - What you expected to happen
   - What actually happened
   - Error messages and logs

4. **Additional Context**:
   - Screenshots or videos
   - Configuration files
   - System logs

### Emergency Procedures

For critical production issues:

1. **Immediate Actions**:
   ```bash
   # Stop the system safely
   curl -X POST http://localhost:8000/api/v1/system/stop

   # Backup current data
   cp -r ./data ./data_backup_$(date +%Y%m%d_%H%M%S)

   # Restart with minimal configuration
   # (takes effect only if docker-compose.yml passes MOCK_HARDWARE through to the container)
   export MOCK_HARDWARE=true
   docker-compose up -d
   ```

2. **Rollback Procedures**:
   ```bash
   # Roll back to the previous version (replace <previous-version> with a tag or commit)
   git checkout <previous-version>
   docker-compose down

   # Restore the data backup while the services are stopped
   # (replace <timestamp> with the suffix of the backup to restore)
   rm -rf ./data
   cp -r ./data_backup_<timestamp> ./data

   # Start the previous version (--build rebuilds any locally built images)
   docker-compose up -d --build
   ```

3. **Contact Information**:
   - Emergency support: support@wifi-densepose.com
   - Phone: +1-555-SUPPORT
   - Slack: #wifi-densepose-emergency

---

**Remember**: Most issues can be resolved by checking logs, verifying configuration, and ensuring proper hardware setup. When in doubt, start with the basic diagnostics and work your way through the troubleshooting steps systematically.

For additional help, see:
- [Configuration Guide](configuration.md)
- [API Reference](api-reference.md)
- [Hardware Setup Guide](../hardware/router-setup.md)
- [Deployment Guide](../developer/deployment-guide.md)
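
The three `curl` probes under System Health Check can be combined into one script that reports which endpoint failed. The sketch below is a minimal example built only on the standard library. It assumes that the API listens on `http://localhost:8000` and that a healthy endpoint answers with HTTP 200. The `fetch` parameter is a hypothetical hook that lets you exercise the check without a running server.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

BASE_URL = "http://localhost:8000"  # assumed default API address
ENDPOINTS = [
    "/api/v1/system/status",
    "/api/v1/system/diagnostics",
    "/api/v1/health",
]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def run_health_check(base_url: str = BASE_URL,
                     fetch: Callable[[str], int] = http_status) -> Dict[str, str]:
    """Probe each diagnostic endpoint and label it OK, FAIL (<code>), or UNREACHABLE."""
    report = {}
    for path in ENDPOINTS:
        code = fetch(base_url + path)
        if code == 200:
            report[path] = "OK"
        elif code == 0:
            report[path] = "UNREACHABLE"
        else:
            report[path] = f"FAIL ({code})"
    return report
```

Calling `run_health_check()` with no arguments probes the live API. If every endpoint comes back `UNREACHABLE`, the service is down or the port is blocked. A mix of `OK` and `FAIL` results points to a specific failing component.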
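
The `docker-compose logs | grep -i error` step under Log Analysis lists every match but does not show which service produces most of the errors. The sketch below is a small per-service tally. It assumes the usual Compose prefix format `<service>  | <message>`. Run Compose with `--no-color` so that ANSI color codes do not break the prefix match. Like `grep -i error`, it matches "error" anywhere in the message, case-insensitively.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed docker-compose log prefix: "<service>  | <message>"
LOG_LINE = re.compile(r"^(?P<service>\S+)\s+\|\s?(?P<message>.*)$")


def count_errors_by_service(lines: Iterable[str]) -> Counter:
    """Count log lines whose message contains 'error' (any case), grouped by service."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and "error" in match.group("message").lower():
            counts[match.group("service")] += 1
    return counts
```

For example, save the function in a script that calls `count_errors_by_service(sys.stdin)` and prints the result. Then pipe `docker-compose logs --no-color` into it. Lines without a service prefix are ignored.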
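
Before starting the stack, a script can perform the same check as `netstat`/`lsof` under Check Port Availability. The sketch below is a minimal example that tries a TCP connection. It only detects listeners reachable at the given host (default `127.0.0.1`), so a service bound solely to another interface will not be seen.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(8000)` returning `True` before `docker-compose up` means another process already holds the API port.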
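
Under JWT Token Errors, a token reported as expired right after it was issued usually means the client and server clocks disagree. The sketch below reads the standard `exp` claim (RFC 7519, seconds since the Unix epoch) so you can compare it with local time. It assumes the API issues standard JWTs that carry an `exp` claim. It does **not** verify the signature, so use it only for diagnosis.

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected header.payload.signature")
    payload_b64 = parts[1]
    # JWT segments are unpadded base64url; restore the padding before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds until the 'exp' claim (negative if already expired), or None if absent."""
    exp = jwt_payload(token).get("exp")
    if exp is None:
        return None
    current = time.time() if now is None else now
    return exp - current
```

A freshly issued token that already shows a negative value on the server host indicates clock skew. If so, apply the System Time steps before regenerating tokens.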
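
The fixed `time.sleep(1)` in the request-batching example under Rate Limit Exceeded works, but it reacts poorly to repeated `429` responses. A common alternative is exponential backoff with "full jitter". The sketch below assumes that the API may send a `Retry-After` header with a delay in seconds. The HTTP-date form of that header is not handled and falls back to backoff.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429 response.

    A numeric Retry-After header wins (capped at `cap`); otherwise the delay is
    uniform(0, min(cap, base * 2**attempt)), i.e. exponential backoff with full jitter.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries from many clients over time, so they do not all hit the rate limiter again at the same moment.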
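
Under Debug Memory Issues, `python -X tracemalloc=25` only switches tracing on. Finding a leak still requires comparing two snapshots taken around the suspect code. The sketch below shows that snapshot-diff pattern with the standard `tracemalloc` module. `workload` is a hypothetical callable standing in for the code path you suspect, such as processing one batch of CSI frames.

```python
import tracemalloc
from typing import Callable, List


def top_allocation_growth(workload: Callable[[], object],
                          limit: int = 5) -> List[tracemalloc.StatisticDiff]:
    """Run `workload` between two snapshots and return the largest growth by source line."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = workload()  # keep the result alive so its allocations show up
        after = tracemalloc.take_snapshot()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    stats = after.compare_to(before, "lineno")  # sorted by absolute size change
    del result
    return stats[:limit]
```

Print each returned `StatisticDiff` to see the file, line, and size growth. A line whose growth keeps rising across repeated calls is the leak candidate.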
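
Under Collecting Diagnostic Information, a plain `.env` copy can leak credentials into a support ticket. The sketch below keeps variable names but masks their values. It assumes that secrets are either stored in variables whose names contain `KEY`, `SECRET`, `PASSWORD`, or `TOKEN`, or embedded as `user:password@` in connection URLs such as `DATABASE_URL`. Extend the patterns to match your deployment.

```python
import re
from typing import Iterable, List

# Variable names that usually hold credentials (assumed; extend as needed)
SENSITIVE_NAME = re.compile(r"KEY|SECRET|PASSWORD|TOKEN", re.IGNORECASE)
# "user:password@" credentials embedded in connection URLs
URL_CREDENTIALS = re.compile(r"(://[^:/@\s]+:)[^@\s]+@")


def redact_env(lines: Iterable[str]) -> List[str]:
    """Return .env lines with secret values masked; comments and names are kept."""
    redacted = []
    for line in lines:
        stripped = line.rstrip("\n")
        if "=" not in stripped or stripped.lstrip().startswith("#"):
            redacted.append(stripped)
            continue
        name, value = stripped.split("=", 1)
        if SENSITIVE_NAME.search(name):
            value = "<redacted>"
        else:
            value = URL_CREDENTIALS.sub(r"\1<redacted>@", value)
        redacted.append(f"{name}={value}")
    return redacted
```

Unlike a `grep -v` filter, this keeps every variable name in the output, so support can still see which settings exist without seeing their secret values.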