18 KiB
Troubleshooting Guide
Overview
This guide provides solutions to common issues encountered when using the WiFi-DensePose system, including installation problems, hardware connectivity issues, performance optimization, and error resolution.
Table of Contents
- Quick Diagnostics
- Installation Issues
- Hardware Problems
- Performance Issues
- API and Connectivity Issues
- Data Quality Issues
- System Errors
- Domain-Specific Issues
- Advanced Troubleshooting
- Getting Support
Quick Diagnostics
System Health Check
Run a comprehensive system health check to identify issues:
# Check system status
curl http://localhost:8000/api/v1/system/status
# Run built-in diagnostics
curl http://localhost:8000/api/v1/system/diagnostics
# Check component health
curl http://localhost:8000/api/v1/health
Log Analysis
Check system logs for error patterns:
# View recent logs
docker-compose logs --tail=100 wifi-densepose-api
# Search for errors
docker-compose logs | grep -i error
# Check specific component logs
docker-compose logs neural-network
docker-compose logs csi-processor
Resource Monitoring
Monitor system resources:
# Check Docker container resources
docker stats
# Check system resources
htop
nvidia-smi # For GPU monitoring
# Check disk space
df -h
Installation Issues
Docker Installation Problems
Issue: Docker Compose Fails to Start
Symptoms:
- Services fail to start
- Port conflicts
- Permission errors
Solutions:
- Check Port Availability:
# Check if port 8000 is in use
netstat -tulpn | grep :8000
lsof -i :8000
# Kill process using the port
sudo kill -9 <PID>
- Fix Permission Issues:
# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker
# Fix file permissions
sudo chown -R $USER:$USER .
- Update Docker Compose:
# Update Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
Issue: Out of Disk Space
Symptoms:
- Build failures
- Container crashes
- Database errors
Solutions:
- Clean Docker Resources:
# Remove unused containers, networks, images
docker system prune -a
# Remove unused volumes
docker volume prune
# Check disk usage
docker system df
- Configure Storage Location:
# Edit docker-compose.yml to use external storage
volumes:
- /external/storage/data:/app/data
- /external/storage/models:/app/models
Native Installation Problems
Issue: Python Dependencies Fail to Install
Symptoms:
- pip install errors
- Compilation failures
- Missing system libraries
Solutions:
- Install System Dependencies:
# Ubuntu/Debian
sudo apt update
sudo apt install -y build-essential cmake python3-dev
sudo apt install -y libopencv-dev libffi-dev libssl-dev
# CentOS/RHEL
sudo yum groupinstall -y "Development Tools"
sudo yum install -y python3-devel opencv-devel
- Use Virtual Environment:
# Create clean virtual environment
python3 -m venv venv_clean
source venv_clean/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
- Install PyTorch Separately:
# Install PyTorch with specific CUDA version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Or CPU-only version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Issue: CUDA/GPU Setup Problems
Symptoms:
- GPU not detected
- CUDA version mismatch
- Out of GPU memory
Solutions:
- Verify CUDA Installation:
# Check CUDA version
nvcc --version
nvidia-smi
# Check PyTorch CUDA support
python -c "import torch; print(torch.cuda.is_available())"
- Install Correct CUDA Version:
# Install CUDA 11.8 (example)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
- Configure GPU Memory:
# Set GPU memory limit
export CUDA_VISIBLE_DEVICES=0
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
Hardware Problems
Router Connectivity Issues
Issue: Cannot Connect to Router
Symptoms:
- No CSI data received
- Connection timeouts
- Authentication failures
Solutions:
- Verify Network Connectivity:
# Ping router
ping 192.168.1.1
# Check SSH access
ssh root@192.168.1.1
# Test CSI port
telnet 192.168.1.1 5500
- Check Router Configuration:
# SSH into router and check CSI tools
ssh root@192.168.1.1
csi_tool --status
# Restart CSI service
/etc/init.d/csi restart
- Verify Firewall Settings:
# Check iptables rules
iptables -L
# Allow CSI port
iptables -A INPUT -p tcp --dport 5500 -j ACCEPT
Issue: Poor CSI Data Quality
Symptoms:
- High packet loss
- Inconsistent data rates
- Signal interference
Solutions:
- Optimize Router Placement:
# Check signal strength
iwconfig wlan0
# Analyze interference
iwlist wlan0 scan | grep -E "(ESSID|Frequency|Quality)"
- Adjust CSI Parameters:
# Reduce sampling rate
echo "csi_rate=20" >> /etc/config/wireless
# Change channel
echo "channel=6" >> /etc/config/wireless
uci commit wireless
wifi reload
- Monitor Data Quality:
# Check CSI data statistics
curl http://localhost:8000/api/v1/hardware/csi/stats
# View real-time quality metrics
curl http://localhost:8000/api/v1/hardware/status
Hardware Resource Issues
Issue: High CPU Usage
Symptoms:
- System slowdown
- Processing delays
- High temperature
Solutions:
- Optimize Processing Settings:
# Reduce batch size
export POSE_PROCESSING_BATCH_SIZE=16
# Lower frame rate
export STREAM_FPS=15
# Disable unnecessary features
export ENABLE_HISTORICAL_DATA=false
- Scale Resources:
# Increase worker processes
export WORKERS=4
# Use process affinity
taskset -c 0-3 python -m src.api.main
Issue: GPU Memory Errors
Symptoms:
- CUDA out of memory errors
- Model loading failures
- Inference crashes
Solutions:
- Optimize GPU Usage:
# Reduce batch size
export POSE_PROCESSING_BATCH_SIZE=8
# Enable mixed precision
export ENABLE_MIXED_PRECISION=true
# Clear GPU cache
python -c "import torch; torch.cuda.empty_cache()"
- Monitor GPU Memory:
# Watch GPU memory usage
watch -n 1 nvidia-smi
# Check memory allocation
python -c "
import torch
print(f'Allocated: {torch.cuda.memory_allocated()/1024**3:.2f} GB')
print(f'Cached: {torch.cuda.memory_reserved()/1024**3:.2f} GB')
"
Performance Issues
Slow Pose Detection
Issue: Low Processing Frame Rate
Symptoms:
- FPS below expected rate
- High latency
- Delayed responses
Solutions:
- Optimize Neural Network:
# Use TensorRT optimization
export ENABLE_TENSORRT=true
# Enable model quantization
export MODEL_QUANTIZATION=int8
# Use smaller model variant
export POSE_MODEL_PATH="./models/densepose_mobile.pth"
- Tune Processing Pipeline:
# Increase batch size (if GPU memory allows)
export POSE_PROCESSING_BATCH_SIZE=64
# Reduce input resolution
export INPUT_RESOLUTION=256
# Skip frames for real-time processing
export FRAME_SKIP_RATIO=2
- Parallel Processing:
# Enable multi-threading
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
# Use multiple GPU devices
export CUDA_VISIBLE_DEVICES=0,1
Memory Issues
Issue: High Memory Usage
Symptoms:
- System running out of RAM
- Swap usage increasing
- OOM killer activated
Solutions:
- Optimize Memory Usage:
# Reduce buffer sizes
export CSI_BUFFER_SIZE=500
export STREAM_BUFFER_SIZE=50
# Limit historical data retention
export DATA_RETENTION_HOURS=24
# Enable memory mapping for large files
export USE_MEMORY_MAPPING=true
- Configure Swap:
# Add swap space
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
API and Connectivity Issues
Authentication Problems
Issue: JWT Token Errors
Symptoms:
- 401 Unauthorized responses
- Token expired errors
- Invalid signature errors
Solutions:
- Verify Token Configuration:
# Check secret key
echo $SECRET_KEY
# Verify token expiration
curl -X POST http://localhost:8000/api/v1/auth/verify \
-H "Authorization: Bearer <token>"
- Regenerate Tokens:
# Get new token
curl -X POST http://localhost:8000/api/v1/auth/token \
-H "Content-Type: application/json" \
-d '{"username": "admin", "password": "password"}'
- Check System Time:
# Ensure system time is correct
timedatectl status
sudo ntpdate -s time.nist.gov
WebSocket Connection Issues
Issue: WebSocket Disconnections
Symptoms:
- Frequent disconnections
- Connection timeouts
- No real-time data
Solutions:
- Adjust WebSocket Settings:
# Increase timeout values
export WEBSOCKET_TIMEOUT=600
export WEBSOCKET_PING_INTERVAL=30
# Enable keep-alive
export WEBSOCKET_KEEPALIVE=true
- Check Network Configuration:
# Test WebSocket connection
wscat -c ws://localhost:8000/ws/pose
# Check proxy settings
curl -I http://localhost:8000/ws/pose
Rate Limiting Issues
Issue: Rate Limit Exceeded
Symptoms:
- 429 Too Many Requests errors
- API calls being rejected
- Slow response times
Solutions:
- Adjust Rate Limits:
# Increase rate limits
export RATE_LIMIT_REQUESTS=1000
export RATE_LIMIT_WINDOW=3600
# Disable rate limiting for development
export ENABLE_RATE_LIMITING=false
- Implement Request Batching:
# Batch multiple requests
def batch_requests(requests, batch_size=10):
for i in range(0, len(requests), batch_size):
batch = requests[i:i+batch_size]
# Process batch
time.sleep(1) # Rate limiting delay
Data Quality Issues
Poor Detection Accuracy
Issue: Low Confidence Scores
Symptoms:
- Many false positives
- Missing detections
- Inconsistent tracking
Solutions:
- Adjust Detection Thresholds:
# Increase confidence threshold
curl -X PUT http://localhost:8000/api/v1/config \
-H "Content-Type: application/json" \
-d '{"detection": {"confidence_threshold": 0.8}}'
- Improve Environment Setup:
# Recalibrate system
curl -X POST http://localhost:8000/api/v1/system/calibrate
# Check for interference
curl http://localhost:8000/api/v1/hardware/interference
- Optimize Model Parameters:
# Use domain-specific model
export POSE_MODEL_PATH="./models/healthcare_optimized.pth"
# Enable post-processing filters
export ENABLE_TEMPORAL_SMOOTHING=true
export ENABLE_OUTLIER_FILTERING=true
Tracking Issues
Issue: Person ID Switching
Symptoms:
- IDs change frequently
- Lost tracks
- Duplicate persons
Solutions:
- Tune Tracking Parameters:
# Adjust tracking thresholds
curl -X PUT http://localhost:8000/api/v1/config \
-H "Content-Type: application/json" \
-d '{
"tracking": {
"max_age": 30,
"min_hits": 3,
"iou_threshold": 0.3
}
}'
- Improve Detection Consistency:
# Enable temporal smoothing
export ENABLE_TEMPORAL_SMOOTHING=true
# Use appearance features
export USE_APPEARANCE_FEATURES=true
System Errors
Database Issues
Issue: Database Connection Errors
Symptoms:
- Connection refused errors
- Timeout errors
- Data not persisting
Solutions:
- Check Database Status:
# PostgreSQL
sudo systemctl status postgresql
sudo -u postgres psql -c "SELECT version();"
# SQLite
ls -la ./data/wifi_densepose.db
sqlite3 ./data/wifi_densepose.db ".tables"
- Fix Connection Issues:
# Reset database connection
export DATABASE_URL="postgresql://user:password@localhost:5432/wifi_densepose"
# Restart database service
sudo systemctl restart postgresql
- Database Migration:
# Run database migrations
python -m src.database.migrate
# Reset database (WARNING: Data loss)
python -m src.database.reset --confirm
Service Crashes
Issue: API Service Crashes
Symptoms:
- Service stops unexpectedly
- No response from API
- Error 502/503 responses
Solutions:
- Check Service Logs:
# View crash logs
journalctl -u wifi-densepose -f
# Check for segmentation faults
dmesg | grep -i "segfault"
- Restart Services:
# Restart with Docker
docker-compose restart wifi-densepose-api
# Restart native service
sudo systemctl restart wifi-densepose
- Debug Memory Issues:
# Run with memory debugging
valgrind --tool=memcheck python -m src.api.main
# Check for memory leaks
python -m tracemalloc
Domain-Specific Issues
Healthcare Domain Issues
Issue: Fall Detection False Alarms
Symptoms:
- Too many fall alerts
- Normal activities triggering alerts
- Delayed detection
Solutions:
- Adjust Sensitivity:
curl -X PUT http://localhost:8000/api/v1/config \
-H "Content-Type: application/json" \
-d '{
"alerts": {
"fall_detection": {
"sensitivity": 0.7,
"notification_delay_seconds": 10
}
}
}'
- Improve Training Data:
# Collect domain-specific training data
python -m src.training.collect_healthcare_data
# Retrain model with healthcare data
python -m src.training.train_healthcare_model
Retail Domain Issues
Issue: Inaccurate Traffic Counting
Symptoms:
- Wrong visitor counts
- Missing entries/exits
- Double counting
Solutions:
- Calibrate Zone Detection:
# Define entrance/exit zones
curl -X PUT http://localhost:8000/api/v1/config \
-H "Content-Type: application/json" \
-d '{
"zones": {
"entrance": {
"coordinates": [[0, 0], [100, 50]],
"type": "entrance"
}
}
}'
- Optimize Tracking:
# Enable zone-based tracking
export ENABLE_ZONE_TRACKING=true
# Adjust dwell time thresholds
export MIN_DWELL_TIME_SECONDS=5
Advanced Troubleshooting
Performance Profiling
CPU Profiling
# Profile Python code
python -m cProfile -o profile.stats -m src.api.main
# Analyze profile
python -c "
import pstats
p = pstats.Stats('profile.stats')
p.sort_stats('cumulative').print_stats(20)
"
GPU Profiling
# Profile CUDA kernels
nvprof python -m src.neural_network.inference
# Use PyTorch profiler
python -c "
import torch
with torch.profiler.profile() as prof:
# Your code here
pass
print(prof.key_averages().table())
"
Network Debugging
Packet Capture
# Capture CSI packets
sudo tcpdump -i eth0 port 5500 -w csi_capture.pcap
# Analyze with Wireshark
wireshark csi_capture.pcap
Network Latency Testing
# Test network latency
ping -c 100 192.168.1.1 | tail -1
# Test bandwidth
iperf3 -c 192.168.1.1 -t 60
System Monitoring
Real-time Monitoring
# Monitor system resources
htop
iotop
nethogs
# Monitor GPU
nvidia-smi -l 1
# Monitor Docker containers
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
Log Aggregation
# Centralized logging with ELK stack
docker run -d --name elasticsearch elasticsearch:7.17.0
docker run -d --name kibana kibana:7.17.0
# Configure log shipping
echo 'LOGGING_DRIVER=syslog' >> .env
echo 'SYSLOG_ADDRESS=tcp://localhost:514' >> .env
Getting Support
Collecting Diagnostic Information
Before contacting support, collect the following information:
# System information
uname -a
cat /etc/os-release
docker --version
python --version
# Application logs
docker-compose logs --tail=1000 > logs.txt
# Configuration
cat .env > config.txt
curl http://localhost:8000/api/v1/system/status > status.json
# Hardware information
lscpu
free -h
nvidia-smi > gpu_info.txt
Support Channels
- Documentation: Check the comprehensive documentation first
- GitHub Issues: Report bugs and feature requests
- Community Forum: Ask questions and share solutions
- Enterprise Support: For commercial deployments
Creating Effective Bug Reports
Include the following information:
-
Environment Details:
- Operating system and version
- Hardware specifications
- Docker/Python versions
-
Steps to Reproduce:
- Exact commands or API calls
- Configuration settings
- Input data characteristics
-
Expected vs Actual Behavior:
- What you expected to happen
- What actually happened
- Error messages and logs
-
Additional Context:
- Screenshots or videos
- Configuration files
- System logs
Emergency Procedures
For critical production issues:
-
Immediate Actions:
# Stop the system safely curl -X POST http://localhost:8000/api/v1/system/stop # Backup current data cp -r ./data ./data_backup_$(date +%Y%m%d_%H%M%S) # Restart with minimal configuration export MOCK_HARDWARE=true docker-compose up -d -
Rollback Procedures:
# Rollback to previous version git checkout <previous-tag> docker-compose down docker-compose up -d # Restore data backup rm -rf ./data cp -r ./data_backup_<timestamp> ./data -
Contact Information:
- Emergency support: support@wifi-densepose.com
- Phone: +1-555-SUPPORT
- Slack: #wifi-densepose-emergency
Remember: Most issues can be resolved by checking logs, verifying configuration, and ensuring proper hardware setup. When in doubt, start with the basic diagnostics and work your way through the troubleshooting steps systematically.
For additional help, see: