Robotics.dev P2P ROS2 robot controller CLI with ROS telemetry and video streaming
# Crash Prevention Guide
## Overview
The `comms.js` process can crash because of memory leaks, high memory usage, unhandled exceptions, and resource exhaustion. This guide describes the safeguards implemented to prevent these crashes.
## Root Causes of Crashes
1. **Memory Leaks**: Accumulation of message chunks, timers, and ROS subscribers
2. **High Memory Usage**: Large video streams and ROS topic data
3. **Unhandled Exceptions**: Network errors and connection issues
4. **Resource Exhaustion**: Too many concurrent connections or data channels
## Implemented Solutions
### 1. Memory Management Improvements
#### Message Cleanup
- Automatic cleanup of old message chunks every 60 seconds
- Maximum message age: 5 minutes
- Maximum messages per peer: 100
- Timestamps added to all messages for age tracking
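
The cleanup rules above can be sketched as follows. This is a minimal illustration, not the code in `comms.js`: the `messagesByPeer` store and the `addMessage`, `pruneMessages`, and `startCleanup` helpers are hypothetical names, and only the three limits come from this guide.

```javascript
// Limits taken from the list above.
const MAX_MESSAGE_AGE_MS = 5 * 60 * 1000; // 5 minutes
const MAX_MESSAGES_PER_PEER = 100;
const CLEANUP_INTERVAL_MS = 60 * 1000; // 60 seconds

// Hypothetical store: peerId -> array of { data, timestamp } chunks.
const messagesByPeer = new Map();

// Every stored chunk carries a timestamp so its age can be checked later.
function addMessage(peerId, data, now = Date.now()) {
  const list = messagesByPeer.get(peerId) ?? [];
  list.push({ data, timestamp: now });
  messagesByPeer.set(peerId, list);
}

// Drop chunks older than the maximum age, then keep only the newest
// MAX_MESSAGES_PER_PEER chunks for each peer.
function pruneMessages(now = Date.now()) {
  for (const [peerId, list] of messagesByPeer) {
    const fresh = list
      .filter((m) => now - m.timestamp <= MAX_MESSAGE_AGE_MS)
      .slice(-MAX_MESSAGES_PER_PEER);
    if (fresh.length === 0) {
      messagesByPeer.delete(peerId);
    } else {
      messagesByPeer.set(peerId, fresh);
    }
  }
}

// Run the cleanup every 60 seconds. unref() keeps this timer from
// holding the process open on its own.
function startCleanup() {
  const timer = setInterval(() => pruneMessages(), CLEANUP_INTERVAL_MS);
  timer.unref();
  return timer;
}
```

Passing `now` explicitly makes the pruning logic easy to test without waiting for real time to pass.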
#### Timer Management
- All timers are properly cleared on cleanup
- Memory monitoring timer added
- Keepalive timer management improved
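
One way to make sure every timer is cleared on cleanup is to route all timer creation through a small registry. The sketch below assumes such a design; `TimerRegistry` is a hypothetical helper and may not match how `comms.js` tracks its timers.

```javascript
// Hypothetical registry that remembers every timer it creates so that
// a single clearAll() call can cancel them during cleanup.
class TimerRegistry {
  constructor() {
    // handle -> function that cancels that kind of timer
    this.timers = new Map();
  }

  setInterval(fn, ms) {
    const handle = setInterval(fn, ms);
    this.timers.set(handle, clearInterval);
    return handle;
  }

  setTimeout(fn, ms) {
    const handle = setTimeout(() => {
      // One-shot timers remove themselves once they have fired.
      this.timers.delete(handle);
      fn();
    }, ms);
    this.timers.set(handle, clearTimeout);
    return handle;
  }

  clearAll() {
    for (const [handle, clear] of this.timers) clear(handle);
    this.timers.clear();
  }

  get size() {
    return this.timers.size;
  }
}
```

A keepalive or memory-monitoring timer created with `registry.setInterval(...)` is then released by calling `registry.clearAll()` when the connection closes.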
#### ROS Subscriber Cleanup
- Subscribers are properly destroyed when connections close
- Multiple cleanup methods supported (`destroy`, `unsubscribe`, `close`)
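
Supporting several cleanup methods could look like the sketch below, which tries each method name in turn and stops at the first one that succeeds. The `releaseSubscriber` and `releaseAll` helpers and the `Map` of subscribers are assumptions made for illustration; the real subscriber objects depend on the ROS client library in use.

```javascript
// Method names listed in this guide, tried in this order.
const CLEANUP_METHODS = ['destroy', 'unsubscribe', 'close'];

// Call the first cleanup method the subscriber supports.
// Returns the name of the method that succeeded, or null if none did.
function releaseSubscriber(subscriber) {
  for (const name of CLEANUP_METHODS) {
    if (subscriber && typeof subscriber[name] === 'function') {
      try {
        subscriber[name]();
        return name;
      } catch (err) {
        // A failing method should not prevent trying the next one.
        console.error(`Subscriber ${name}() failed:`, err);
      }
    }
  }
  return null;
}

// Release every subscriber in a (hypothetical) topic -> subscriber map
// and forget the references so they can be garbage collected.
function releaseAll(subscribers) {
  for (const subscriber of subscribers.values()) {
    releaseSubscriber(subscriber);
  }
  subscribers.clear();
}
```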
### 2. Memory Monitoring
#### Built-in Monitoring
- Memory usage logged every 30 seconds
- Garbage collection is triggered manually when heap usage exceeds 500MB (requires `--expose-gc`)
- V8 heap limit set to 1GB via `--max-old-space-size=1024`
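
A minimal sketch of the built-in monitoring, assuming it is based on `process.memoryUsage()` and on `global.gc`, which Node.js only defines when started with `--expose-gc`. The `checkMemory` and `startMemoryMonitor` function names are hypothetical.

```javascript
const HEAP_GC_THRESHOLD_BYTES = 500 * 1024 * 1024; // 500MB
const LOG_INTERVAL_MS = 30 * 1000; // 30 seconds

const toMB = (bytes) => Math.round(bytes / 1024 / 1024);

// Log current memory usage and request a garbage collection when the
// used heap exceeds the threshold. Returns true if a GC was requested.
function checkMemory(usage = process.memoryUsage(), gc = global.gc) {
  console.log(
    `Memory: rss=${toMB(usage.rss)}MB ` +
      `heapUsed=${toMB(usage.heapUsed)}MB heapTotal=${toMB(usage.heapTotal)}MB`
  );
  // gc is undefined unless the process was started with --expose-gc.
  if (usage.heapUsed > HEAP_GC_THRESHOLD_BYTES && typeof gc === 'function') {
    gc();
    return true;
  }
  return false;
}

// Check memory every 30 seconds without keeping the process alive.
function startMemoryMonitor() {
  const timer = setInterval(() => checkMemory(), LOG_INTERVAL_MS);
  timer.unref();
  return timer;
}
```

Forcing a collection only frees memory that is already unreachable, so it complements the message and subscriber cleanup rather than replacing it.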
#### External Monitoring
- `monitor-comms.js` script monitors process health
- Automatic restart on high memory usage (>800MB)
- Process crash detection and restart
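
The external memory check could be sketched as below. It assumes the monitor reads the resident set size of the `comms.js` process through `ps`, which reports RSS in KiB. The helper names are hypothetical and `monitor-comms.js` may measure memory differently.

```javascript
const { execFile } = require('node:child_process');

const RSS_LIMIT_KB = 800 * 1024; // 800MB expressed in KiB

// Parse the output of `ps -o rss= -p <pid>` (a single number in KiB).
function parseRssKb(psOutput) {
  const kb = Number.parseInt(psOutput.trim(), 10);
  return Number.isNaN(kb) ? null : kb;
}

// Resolve with the RSS of the given PID in KiB, or null when the
// process does not exist or ps fails.
function readRssKb(pid) {
  return new Promise((resolve) => {
    execFile('ps', ['-o', 'rss=', '-p', String(pid)], (err, stdout) => {
      resolve(err ? null : parseRssKb(stdout));
    });
  });
}

// A restart is needed when the process is gone (crash) or when it
// uses more memory than the limit.
async function needsRestart(pid, read = readRssKb) {
  const rssKb = await read(pid);
  return rssKb === null || rssKb > RSS_LIMIT_KB;
}
```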
### 3. Process Management
#### Node.js Flags
- `--max-old-space-size=1024`: Caps the V8 old-generation heap at 1024MB (about 1GB); it does not limit total process memory
- `--expose-gc`: Enable manual garbage collection
#### Auto-restart Mechanism
- Process restart on exit with non-zero code
- 5-second delay between restart attempts
- Error handling for spawn failures
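
The restart behavior described above can be sketched with `child_process.spawn`. The `supervise` and `shouldRestart` functions are hypothetical names, and treating a signal kill (exit code `null`) as a crash is an assumption of this sketch rather than documented behavior.

```javascript
const { spawn } = require('node:child_process');

const RESTART_DELAY_MS = 5000; // 5-second delay between attempts
const NODE_FLAGS = ['--max-old-space-size=1024', '--expose-gc'];

// Assumption: anything other than a clean exit (code 0) is restarted,
// including a kill by signal, where code is null.
function shouldRestart(code) {
  return code !== 0;
}

// Start `script` under Node.js with the flags above and restart it
// after a delay whenever it crashes or fails to spawn.
function supervise(script, { spawnFn = spawn, delayMs = RESTART_DELAY_MS } = {}) {
  let child = null;
  let stopped = false;
  let restartTimer = null;

  const scheduleRestart = () => {
    // 'error' and 'exit' can both fire; schedule only one restart.
    if (stopped || restartTimer) return;
    restartTimer = setTimeout(start, delayMs);
  };

  const start = () => {
    restartTimer = null;
    child = spawnFn(process.execPath, [...NODE_FLAGS, script], { stdio: 'inherit' });
    child.once('error', (err) => {
      console.error(`Failed to spawn ${script}:`, err);
      scheduleRestart();
    });
    child.once('exit', (code) => {
      if (shouldRestart(code)) scheduleRestart();
    });
  };

  start();

  return {
    stop() {
      stopped = true;
      clearTimeout(restartTimer);
      if (child) child.kill();
    },
  };
}
```

Calling `supervise('comms.js')` would keep the process running until `stop()` is called on the returned handle.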
### 4. Connection Management
#### Better Error Handling
- Uncaught exception handlers
- Unhandled rejection handlers
- Graceful cleanup on disconnection
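
A minimal sketch of process-level handlers, assuming the process logs the error, runs its cleanup once, and then exits with a non-zero code so that the restart mechanism can take over. `installCrashHandlers` and its `cleanup` callback are hypothetical names.

```javascript
// Install handlers for errors that would otherwise crash the process
// without any cleanup. `exit` and `target` are injectable for testing.
function installCrashHandlers(
  cleanup,
  { exit = (code) => process.exit(code), target = process } = {}
) {
  let shuttingDown = false;

  const shutdown = (reason, err) => {
    // Run the shutdown sequence at most once.
    if (shuttingDown) return;
    shuttingDown = true;
    console.error(`${reason}:`, err);
    try {
      cleanup();
    } catch (cleanupErr) {
      console.error('Cleanup failed:', cleanupErr);
    }
    // Non-zero exit code signals a crash to the supervising process.
    exit(1);
  };

  target.on('uncaughtException', (err) => shutdown('Uncaught exception', err));
  target.on('unhandledRejection', (reason) => shutdown('Unhandled rejection', reason));
  return shutdown;
}
```

Exiting after an uncaught exception, rather than continuing, follows the Node.js recommendation that the process state should not be trusted once such an error has occurred.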
#### Resource Limits
- Maximum 10 data channels per connection
- Connection timeout handling
- ICE restart management
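
The data channel limit could be enforced along these lines. `ChannelLimiter` is a hypothetical helper; the only assumption about channel objects is that they expose a `close()` method, as WebRTC data channels do.

```javascript
const MAX_DATA_CHANNELS = 10; // per connection, as stated above

// Tracks the open data channels of one connection and rejects
// channels beyond the limit.
class ChannelLimiter {
  constructor(max = MAX_DATA_CHANNELS) {
    this.max = max;
    this.channels = new Set();
  }

  // Returns true if the channel was accepted. A channel over the
  // limit is closed immediately and false is returned.
  accept(channel) {
    if (this.channels.size >= this.max) {
      if (typeof channel.close === 'function') channel.close();
      return false;
    }
    this.channels.add(channel);
    return true;
  }

  // Call when a channel closes so its slot becomes available again.
  release(channel) {
    this.channels.delete(channel);
  }
}
```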
## Usage
### Start with Monitoring
```bash
# Start the monitoring service
robotics monitor
# Connect with automatic restart
robotics connect
```
### Manual Monitoring
```bash
# Check process status
robotics status
# Show memory usage (RSS and VSZ are in KiB); pgrep -d, joins multiple PIDs with commas
ps -o pid,rss,vsz,comm -p "$(pgrep -d, -f comms.js)"
```
### System Service (Optional)
```bash
# Install as system service
sudo cp robotics-monitor.service /etc/systemd/system/
sudo systemctl enable robotics-monitor
sudo systemctl start robotics-monitor
# Check service status
sudo systemctl status robotics-monitor
```
## Monitoring Commands
### Check Memory Usage
```bash
# Refresh memory usage every 5 seconds (RSS and VSZ are in KiB)
watch -n 5 'ps -o pid,rss,vsz,comm -p "$(pgrep -d, -f comms.js)"'
```
### Check Process Health
```bash
# Check if process is running
ps aux | grep comms.js | grep -v grep
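# Investigate memory leaks: start comms.js with the inspector enabled,
# then attach Chrome DevTools (chrome://inspect) and compare heap snapshots
node --inspect comms.js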
```
### Log Analysis
```bash
# Watch the system log for OOM-killer and memory messages (log path varies by distribution)
tail -f /var/log/syslog | grep -Ei "killed|oom|memory"
```
## Troubleshooting
### High Memory Usage
1. Check whether video streaming is enabled; large video streams are a major source of memory use
2. Reduce ROS topic frequency
3. Restart the process manually
4. Check for stuck connections
### Frequent Crashes
1. Enable monitoring service
2. Check system memory availability
3. Reduce concurrent connections
4. Update Node.js to the latest LTS version
### Connection Issues
1. Check network connectivity
2. Verify server URL
3. Check firewall settings
4. Restart the process
## Best Practices
1. **Always use the monitoring service** for production deployments
2. **Monitor system resources** regularly
3. **Limit video resolution** to reduce memory usage
4. **Use appropriate ROS topic frequencies**
5. **Keep Node.js updated** to the latest LTS version
6. **Monitor logs** for early warning signs
## Emergency Recovery
If the process continues to crash:
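1. **Immediate**: Kill all robotics processes (`pkill -f` matches the full command line, so this stops every process whose command line contains `robotics`)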
```bash
pkill -f robotics
```
2. **Cleanup**: Clear any stuck processes
```bash
pkill -f comms.js
pkill -f monitor-comms.js
```
3. **Restart**: Start fresh with monitoring
```bash
robotics monitor
robotics connect
```
4. **Investigate**: Check system resources
```bash
free -h
df -h
top
```