robotics

Robotics.dev P2P ROS2 robot controller CLI with ROS telemetry and video streaming

# Crash Prevention Guide

## Overview

The `comms.js` process can crash because of memory leaks and high memory usage. This guide explains the solutions implemented to prevent these crashes.

## Root Causes of Crashes

1. **Memory Leaks**: Accumulation of message chunks, timers, and ROS subscribers
2. **High Memory Usage**: Large video streams and ROS topic data
3. **Unhandled Exceptions**: Network errors and connection issues
4. **Resource Exhaustion**: Too many concurrent connections or data channels

## Implemented Solutions

### 1. Memory Management Improvements

#### Message Cleanup

- Old message chunks are cleaned up automatically every 60 seconds
- Maximum message age: 5 minutes
- Maximum messages per peer: 100
- Every message carries a timestamp so its age can be tracked

#### Timer Management

- All timers are cleared during cleanup
- A memory-monitoring timer has been added
- Keepalive timer management has been improved

#### ROS Subscriber Cleanup

- Subscribers are destroyed when their connection closes
- Multiple cleanup methods are supported (`destroy`, `unsubscribe`, `close`)

### 2. Memory Monitoring

#### Built-in Monitoring

- Memory usage is logged every 30 seconds
- Garbage collection is forced automatically when heap usage exceeds 500 MB (this relies on `--expose-gc`)
- The V8 heap limit is set to 1 GB (`--max-old-space-size=1024`)

#### External Monitoring

- The `monitor-comms.js` script monitors process health
- The process is restarted automatically when memory usage exceeds 800 MB
- Crashes are detected and the process is restarted

### 3. Process Management

#### Node.js Flags

- `--max-old-space-size=1024`: limits the V8 old-generation heap to 1024 MB (1 GB); this caps the JavaScript heap, not total process memory
- `--expose-gc`: exposes `global.gc()` so garbage collection can be triggered manually

#### Auto-restart Mechanism

- The process is restarted when it exits with a non-zero code
- 5-second delay between restart attempts
- Spawn failures are caught and reported

### 4. Connection Management

#### Better Error Handling

- Handlers for uncaught exceptions
- Handlers for unhandled promise rejections
- Graceful cleanup on disconnection

#### Resource Limits

- Maximum of 10 data channels per connection
- Connection timeout handling
- ICE restart management

## Usage

### Start with Monitoring

```bash
# Start the monitoring service
robotics monitor

# Connect with automatic restart
robotics connect
```

### Manual Monitoring

```bash
# Check process status
robotics status

# Monitor memory usage (RSS and VSZ in KiB); -d, joins multiple PIDs with commas for ps -p
ps -o pid,rss,vsz,comm -p $(pgrep -d, -f comms.js)
```

### System Service (Optional)

```bash
# Install as a system service
sudo cp robotics-monitor.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable robotics-monitor
sudo systemctl start robotics-monitor

# Check service status
sudo systemctl status robotics-monitor
```

## Monitoring Commands

### Check Memory Usage

```bash
# Real-time memory monitoring, refreshed every 5 seconds
watch -n 5 'ps -o pid,rss,vsz,comm -p $(pgrep -d, -f comms.js)'
```

### Check Process Health

```bash
# Check whether the process is running
ps aux | grep comms.js | grep -v grep

# Investigate memory leaks: start with the inspector enabled, then attach
# Chrome DevTools (chrome://inspect) and take heap snapshots
node --inspect comms.js
```

### Log Analysis

```bash
# Watch system logs for OOM kills and other memory issues
tail -f /var/log/syslog | grep -i "killed\|oom\|memory"
```

## Troubleshooting

### High Memory Usage

1. Check whether video streaming is enabled
2. Reduce ROS topic frequency
3. Restart the process manually
4. Check for stuck connections

### Frequent Crashes

1. Enable the monitoring service
2. Check available system memory
3. Reduce the number of concurrent connections
4. Update Node.js to the latest LTS version

### Connection Issues

1. Check network connectivity
2. Verify the server URL
3. Check firewall settings
4. Restart the process

## Best Practices

1. **Always use the monitoring service** for production deployments
2. **Monitor system resources** regularly
3. **Limit video resolution** to reduce memory usage
4. **Use appropriate ROS topic frequencies**
5. **Keep Node.js updated** to the latest LTS version
6. **Monitor logs** for early warning signs

## Emergency Recovery

If the process continues to crash:

1. **Immediate**: Kill all robotics processes

   ```bash
   # Matches every process whose command line contains "robotics"
   pkill -f robotics
   ```

2. **Cleanup**: Clear any stuck processes

   ```bash
   pkill -f comms.js
   pkill -f monitor-comms.js
   ```

3. **Restart**: Start fresh with monitoring

   ```bash
   robotics monitor
   robotics connect
   ```

4. **Investigate**: Check system resources

   ```bash
   free -h
   df -h
   top
   ```
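## Sketch: Message Cleanup

The *Message Cleanup* policy has four parts: a sweep every 60 seconds, a 5-minute maximum age, at most 100 messages per peer, and a timestamp on every message. The minimal JavaScript sketch below shows how these fit together. It is not the actual `comms.js` code. The `messagesByPeer` map, `addMessage`, and `pruneOldMessages` are hypothetical names, and the sketch assumes chunks are buffered per peer in arrival order.

```javascript
// Hypothetical sketch of the message-cleanup policy; names are illustrative,
// not taken from comms.js.
const CLEANUP_INTERVAL_MS = 60 * 1000;    // sweep every 60 seconds
const MAX_MESSAGE_AGE_MS = 5 * 60 * 1000; // drop chunks older than 5 minutes
const MAX_MESSAGES_PER_PEER = 100;        // keep at most 100 messages per peer

// peerId -> array of { timestamp, chunk }, oldest first
const messagesByPeer = new Map();

function addMessage(peerId, chunk, now = Date.now()) {
  if (!messagesByPeer.has(peerId)) messagesByPeer.set(peerId, []);
  const queue = messagesByPeer.get(peerId);
  queue.push({ timestamp: now, chunk }); // timestamp enables age tracking
  // Enforce the per-peer cap right away by dropping the oldest entries.
  if (queue.length > MAX_MESSAGES_PER_PEER) {
    queue.splice(0, queue.length - MAX_MESSAGES_PER_PEER);
  }
}

// Removes expired messages and returns how many were dropped.
function pruneOldMessages(now = Date.now()) {
  let removed = 0;
  for (const [peerId, queue] of messagesByPeer) {
    const fresh = queue.filter((m) => now - m.timestamp <= MAX_MESSAGE_AGE_MS);
    removed += queue.length - fresh.length;
    if (fresh.length === 0) messagesByPeer.delete(peerId); // forget idle peers
    else messagesByPeer.set(peerId, fresh);
  }
  return removed;
}

// unref() lets the process exit even while the sweep timer is pending.
const cleanupTimer = setInterval(() => pruneOldMessages(), CLEANUP_INTERVAL_MS);
cleanupTimer.unref();
```

The per-peer cap is enforced on insert, which keeps a single busy peer bounded between sweeps. The timed sweep handles peers that simply go quiet.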
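## Sketch: Timer Management

One way to guarantee that all timers are cleared on cleanup is to create every timer through a small registry and clear that registry when a connection closes. `TimerRegistry` is a hypothetical helper, not part of `comms.js` or any library. It only wraps the standard Node.js `setInterval` and `setTimeout`.

```javascript
// Hypothetical helper, not part of comms.js: route every timer through one
// registry so a single cleanup call can clear them all.
class TimerRegistry {
  constructor() {
    this.intervals = new Set();
    this.timeouts = new Set();
  }

  // Same arguments as the global setInterval; the handle is remembered.
  setInterval(fn, ms) {
    const handle = setInterval(fn, ms);
    this.intervals.add(handle);
    return handle;
  }

  // One-shot timers remove themselves from the registry once they fire.
  setTimeout(fn, ms) {
    const handle = setTimeout(() => {
      this.timeouts.delete(handle);
      fn();
    }, ms);
    this.timeouts.add(handle);
    return handle;
  }

  // Clears every tracked timer and returns how many were cleared.
  clearAll() {
    for (const handle of this.intervals) clearInterval(handle);
    for (const handle of this.timeouts) clearTimeout(handle);
    const cleared = this.intervals.size + this.timeouts.size;
    this.intervals.clear();
    this.timeouts.clear();
    return cleared;
  }
}
```

For example, each connection could own one registry and schedule its keepalive and memory-monitoring timers through it. Calling `clearAll()` in the connection's cleanup path then ensures no interval outlives the connection that created it.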
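## Sketch: ROS Subscriber Cleanup

A subscriber object may expose `destroy`, `unsubscribe`, or `close` (the three methods listed under *ROS Subscriber Cleanup*), so cleanup code can probe for whichever one exists. The sketch assumes subscribers are tracked per connection in a `Map`. `releaseSubscriber` and `releaseAllSubscribers` are hypothetical names, and the order in which the methods are tried is also an assumption.

```javascript
// Hypothetical sketch: release a subscriber using whichever cleanup method
// it exposes. The try order (destroy, unsubscribe, close) is an assumption.
const CLEANUP_METHODS = ["destroy", "unsubscribe", "close"];

// Returns the name of the method that succeeded, or null if none did.
function releaseSubscriber(subscriber) {
  if (!subscriber) return null;
  for (const method of CLEANUP_METHODS) {
    if (typeof subscriber[method] !== "function") continue;
    try {
      subscriber[method]();
      return method; // first method that succeeds wins
    } catch (err) {
      console.error(`[ros] subscriber.${method}() failed:`, err);
      // fall through and try the next method
    }
  }
  return null;
}

// Releases every subscriber tracked for a connection, then forgets them so a
// closed connection no longer keeps ROS callbacks (and their data) alive.
function releaseAllSubscribers(subscribers) {
  const results = [];
  for (const sub of subscribers.values()) results.push(releaseSubscriber(sub));
  subscribers.clear();
  return results;
}
```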
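## Sketch: Built-in Memory Monitoring

The built-in monitor logs memory every 30 seconds and forces garbage collection above 500 MB of heap. In Node.js terms, that maps onto `process.memoryUsage()` and the `global.gc()` function, which exists only when Node.js is started with `--expose-gc`. This is a hedged sketch. `checkMemory` and its return values are hypothetical, and the usage object and GC function are parameters so the logic can be exercised without real memory pressure.

```javascript
// Hypothetical sketch, not the comms.js implementation.
const MB = 1024 * 1024;
const MEMORY_LOG_INTERVAL_MS = 30 * 1000; // log every 30 seconds
const GC_THRESHOLD_BYTES = 500 * MB;      // force GC above 500 MB of heap

// `usage` and `gc` are injectable so the logic can be tested with fake values.
function checkMemory(usage = process.memoryUsage(), gc = globalThis.gc) {
  const rssMb = Math.round(usage.rss / MB);
  const heapUsedMb = Math.round(usage.heapUsed / MB);
  console.log(`[memory] rss=${rssMb}MB heapUsed=${heapUsedMb}MB`);

  if (usage.heapUsed <= GC_THRESHOLD_BYTES) return "ok";
  if (typeof gc === "function") {
    gc(); // reclaims unreachable objects only; live references survive
    return "gc";
  }
  console.warn("[memory] heap above threshold, but global.gc is missing (start node with --expose-gc)");
  return "no-gc";
}

const memoryTimer = setInterval(() => checkMemory(), MEMORY_LOG_INTERVAL_MS);
memoryTimer.unref(); // do not keep the process alive just for monitoring
```

A forced GC pass only frees objects that are no longer referenced. If the heap stays above the threshold after `gc()`, the cause is more likely retained chunks, timers, or subscribers than lazy collection.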
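## Sketch: Auto-restart Supervisor

The external monitor and auto-restart mechanism can be approximated with Node.js's `child_process.spawn`. The behavior to reproduce is:

- restart on a non-zero exit after a 5-second delay
- restart when memory exceeds roughly 800 MB
- report spawn failures
- launch with `--max-old-space-size=1024 --expose-gc`

This is a sketch under stated assumptions, not the contents of `monitor-comms.js`. `supervise`, `shouldRestart`, `readRssKb`, the `maxRestarts` and `checkEveryMs` options, and the Linux-only `/proc/<pid>/status` memory reading are all hypothetical.

```javascript
// Hypothetical supervisor in the spirit of monitor-comms.js; option names,
// the 30 s memory-check interval, and the /proc reading are assumptions.
const { spawn } = require("child_process");
const fs = require("fs");

const NODE_FLAGS = ["--max-old-space-size=1024", "--expose-gc"];
const MEMORY_LIMIT_KB = 800 * 1024; // ~800 MB resident memory

// Code 0 is a clean shutdown. A non-zero code is a crash, and so is a null
// code, which Node.js reports when the child was killed by a signal.
function shouldRestart(code) {
  return code !== 0;
}

// Linux-only: resident set size (kB) of a process, read from /proc.
function readRssKb(pid) {
  try {
    const status = fs.readFileSync(`/proc/${pid}/status`, "utf8");
    const match = status.match(/^VmRSS:\s+(\d+)\s+kB/m);
    return match ? Number(match[1]) : null;
  } catch {
    return null; // process already gone, or no /proc on this platform
  }
}

function supervise(script, { restartDelayMs = 5000, maxRestarts = Infinity, checkEveryMs = 30000 } = {}) {
  let restarts = 0;
  let stopped = false;
  let current = null;
  let restartTimer = null;

  function start() {
    let settled = false;        // 'error' and 'exit' can both fire for one child
    let killedForMemory = false;
    const proc = spawn(process.execPath, [...NODE_FLAGS, script], { stdio: "inherit" });
    current = proc;

    // Memory watchdog: kill the child above the limit; its exit triggers a restart.
    const watchdog = setInterval(() => {
      const rssKb = readRssKb(proc.pid);
      if (rssKb !== null && rssKb > MEMORY_LIMIT_KB) {
        console.error(`[monitor] RSS ${Math.round(rssKb / 1024)} MB over limit, killing child`);
        killedForMemory = true;
        proc.kill("SIGTERM");
      }
    }, checkEveryMs);
    watchdog.unref();

    function onEnd(reason, crashed) {
      if (settled) return;
      settled = true;
      clearInterval(watchdog);
      if (stopped || !crashed || restarts >= maxRestarts) return;
      restarts += 1;
      console.error(`[monitor] ${reason}; restarting in ${restartDelayMs} ms`);
      restartTimer = setTimeout(start, restartDelayMs);
    }

    proc.on("error", (err) => {
      // pid stays undefined when the process could not be spawned at all.
      if (proc.pid === undefined) onEnd(`spawn failed: ${err.message}`, true);
      else console.error("[monitor] child process error:", err.message);
    });
    proc.on("exit", (code, signal) => {
      onEnd(`child exited (code=${code}, signal=${signal})`, killedForMemory || shouldRestart(code));
    });
  }

  start();
  return {
    get restarts() { return restarts; },
    stop() {
      stopped = true;
      clearTimeout(restartTimer);
      if (current) current.kill();
    },
  };
}
```

With the defaults, `supervise("comms.js")` waits 5 seconds between restarts, matching the delay described above. The watchdog measures resident memory (RSS), which also counts buffers and native allocations outside the V8 heap, so it can trigger before the 1 GB heap limit is reached.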
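## Sketch: Crash Handlers

The uncaught-exception and unhandled-rejection handlers pair naturally with the auto-restart mechanism. The handler logs the error, cleans up once, and exits with a non-zero code so the monitor restarts the process. The Node.js documentation describes `uncaughtException` as a last-resort hook and warns that resuming normal operation afterwards is unsafe, which is why this sketch exits instead of continuing. `installCrashHandlers` is a hypothetical name. The `exit` parameter exists only so the handler can be tested without terminating the process.

```javascript
// Hypothetical sketch: log, clean up once, then exit non-zero so the
// external monitor restarts the process.
function installCrashHandlers(cleanup, exit = (code) => process.exit(code)) {
  let handled = false;

  function onFatal(reason) {
    console.error("[fatal]", reason);
    if (handled) return; // any later failure is only logged
    handled = true;
    try {
      cleanup(); // e.g. clear timers, release subscribers, close peer connections
    } catch (err) {
      console.error("[fatal] cleanup failed:", err);
    }
    exit(1); // non-zero exit code lets the auto-restart mechanism take over
  }

  process.on("uncaughtException", onFatal);
  process.on("unhandledRejection", onFatal);

  // Detaches both handlers again (useful in tests or on orderly shutdown).
  return function uninstall() {
    process.removeListener("uncaughtException", onFatal);
    process.removeListener("unhandledRejection", onFatal);
  };
}
```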
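## Sketch: Data Channel Limit

The limit of 10 data channels per connection can be enforced with a small guard around whatever actually opens channels. In this sketch, `createChannel` is an assumed callback standing in for the real call; in a WebRTC stack that is typically `RTCPeerConnection.createDataChannel(label)`. `ChannelLimiter` is a hypothetical class.

```javascript
// Hypothetical guard for the "maximum 10 data channels per connection" limit.
// createChannel(label) is an assumed callback that opens the real channel.
const MAX_DATA_CHANNELS = 10;

class ChannelLimiter {
  constructor(createChannel, max = MAX_DATA_CHANNELS) {
    this.createChannel = createChannel;
    this.max = max;
    this.open = new Map(); // label -> channel
  }

  openChannel(label) {
    // Reuse an existing channel instead of opening a duplicate.
    if (this.open.has(label)) return this.open.get(label);
    if (this.open.size >= this.max) {
      throw new Error(`data channel limit (${this.max}) reached; refusing "${label}"`);
    }
    const channel = this.createChannel(label);
    this.open.set(label, channel);
    return channel;
  }

  // Call from the channel's close handler so the slot becomes free again.
  release(label) {
    return this.open.delete(label);
  }
}
```

Reusing the existing channel for a repeated label is a design choice of this sketch, not documented behavior. It keeps reconnect loops from using up the limit with duplicate channels.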