Automating gateway restarts sounds simple: when it's down, restart it. But done naively, this creates a new class of problem — a crash loop that restarts hundreds of times per day, burns your systemd rate limit, fills your logs, and hides the actual bug. Here's how to do it properly.
systemd with Restart=always will keep restarting your gateway until it trips the unit's start rate limit (by default 5 starts within 10 seconds; the unit below widens this to 5 starts per 5 minutes) and then gives up. If the underlying cause isn't fixed (corrupted config, missing env var, port conflict), every restart fails within seconds. Once the rate limit is hit, the service stops restarting, and you're left with a gateway that is simply down: systemctl status shows it as failed with Result=start-limit-hit, and it stays that way until someone intervenes.
This is worse than just being down — it's confusing, hard to diagnose, and fills your logs with noise.
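The good news is that systemd can tell you exactly when this has happened. A quick check, using standard systemctl commands (the unit name openclaw-gateway is the one used throughout this guide):

# Did systemd give up because of the rate limit?
systemctl is-failed openclaw-gateway
# "failed" here plus Result=start-limit-hit below confirms it
systemctl show openclaw-gateway -p Result -p NRestarts
# Once the underlying cause is fixed, clear the state and start again
sudo systemctl reset-failed openclaw-gateway
sudo systemctl start openclaw-gateway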
There are three legitimate scenarios for gateway restarts (recovering from a one-off crash, breaking a crash loop, and planned maintenance), and each needs different handling. Start with the systemd unit itself:
[Unit]
Description=OpenClaw Gateway
After=network.target
# Rate-limit window: at most 5 start attempts per 5 minutes
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=simple
User=openclaw
WorkingDirectory=/home/openclaw/.openclaw/workspace
ExecStart=/usr/bin/node /home/openclaw/.openclaw/node_modules/.bin/openclaw gateway start
# Only restart on non-zero exit; a clean exit means a deliberate stop
Restart=on-failure
# Wait 10s between restarts (not 1s - gives the old process time to release the port)
RestartSec=10s
# Graceful shutdown: SIGTERM first, then SIGKILL after 30s
TimeoutStopSec=30s
KillMode=mixed

[Install]
WantedBy=multi-user.target
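To install and apply the unit, the standard systemd steps (assuming the file is saved as /etc/systemd/system/openclaw-gateway.service):

sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-gateway
systemctl status openclaw-gateway --no-pager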
If the failure cause is intermittent rather than persistent, keep Restart=on-failure but use a longer RestartSec so each attempt lands after the transient condition clears, or add pre-start diagnostics (next section).
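One way to raise the backoff without editing the installed unit file is a drop-in override. This is standard systemctl machinery, sketched here against the unit above:

# Opens an editor on a drop-in override for the unit
sudo systemctl edit openclaw-gateway
# Add in the editor, then save:
#   [Service]
#   RestartSec=60s
# Confirm the new value took effect
systemctl show openclaw-gateway -p RestartUSec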
Before each restart attempt, run a quick diagnostic to catch known fixable problems:
#!/bin/bash
# Run before openclaw gateway starts — fix known issues first
WORKSPACE="${HOME:-/root}/.openclaw/workspace"
CONFIG="$WORKSPACE/openclaw.json"

# 1. Check config is valid JSON
if [ -f "$CONFIG" ]; then
  if ! node -e "JSON.parse(require('fs').readFileSync('$CONFIG', 'utf8'))" 2>/dev/null; then
    echo "WARNING: Config file is not valid JSON"
    # Restore backup if available
    if [ -f "${CONFIG}.bak" ]; then
      cp "${CONFIG}.bak" "$CONFIG"
      echo "Restored config from backup"
    else
      echo "No backup available — manual intervention needed"
      exit 1  # Block restart, don't make it worse
    fi
  fi
fi

# 2. Check disk has at least 100MB free
DISK_AVAIL=$(df / | tail -1 | awk '{print $4}')
if [ "$DISK_AVAIL" -lt 102400 ]; then  # 100MB in KB
  echo "WARNING: Low disk space (${DISK_AVAIL}KB free) — clearing caches"
  npm cache clean --force 2>/dev/null
  journalctl --vacuum-size=100M 2>/dev/null
fi

# 3. Check required env vars are present
if [ -f "$WORKSPACE/.env" ]; then
  if ! grep -q "TELEGRAM_BOT_TOKEN\|botToken" "$WORKSPACE/.env" "$CONFIG" 2>/dev/null; then
    echo "WARNING: No Telegram token found in .env or config"
    # Don't exit 1 here — let it start and fail with a clear error
  fi
fi
exit 0

Wire the script into the unit with ExecStartPre; a non-zero exit from the script blocks the start:

[Service]
# ...
ExecStartPre=/usr/local/bin/openclaw-prestart.sh
ExecStart=/usr/bin/node ... openclaw gateway start
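One gap worth closing: the script restores from ${CONFIG}.bak, but nothing above creates that backup. A sketch of one way to keep it fresh (the schedule and paths are assumptions; run it from the openclaw user's crontab):

# Refresh the known-good backup only while the gateway is healthy,
# so a corrupted config never overwrites a good backup.
# Example crontab entry (every 30 minutes):
# */30 * * * * systemctl is-active --quiet openclaw-gateway && cp /home/openclaw/.openclaw/workspace/openclaw.json /home/openclaw/.openclaw/workspace/openclaw.json.bak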
systemd's built-in rate limiting stops the loop, but doesn't alert you or attempt to fix the underlying issue. Add a crash loop detector:
#!/bin/bash
# Run via cron every 5 minutes — detects crash loops
# Count restarts in the last 10 minutes
# grep -c always prints a count (even 0), so no fallback is needed; a trailing
# "|| echo 0" here would emit a second line on no matches and break the test below
RESTARTS=$(journalctl -u openclaw-gateway --since "10 minutes ago" | grep -c "Started OpenClaw")
if [ "$RESTARTS" -gt 3 ]; then
  echo "$(date): Crash loop detected — $RESTARTS restarts in 10 min"

  # Collect diagnostics before anything else
  journalctl -u openclaw-gateway -n 50 --no-pager > "/tmp/oc-crashloop-$(date +%s).log"

  # Stop the loop first
  systemctl stop openclaw-gateway

  # Try the most common fixes:

  # Fix 1: Validate + repair config
  CONFIG="$HOME/.openclaw/workspace/openclaw.json"
  if ! node -e "JSON.parse(require('fs').readFileSync('$CONFIG','utf8'))" 2>/dev/null; then
    echo "Config corrupted — attempting restore from backup"
    [ -f "${CONFIG}.bak" ] && cp "${CONFIG}.bak" "$CONFIG"
  fi

  # Fix 2: Clear disk if needed
  DISK_USED=$(df / | tail -1 | awk '{print $5}' | tr -d '%')
  if [ "$DISK_USED" -gt 90 ]; then
    npm cache clean --force 2>/dev/null
    journalctl --vacuum-size=50M 2>/dev/null
  fi

  # Wait, then try one more time
  sleep 5
  systemctl start openclaw-gateway
  sleep 10

  if systemctl is-active --quiet openclaw-gateway; then
    echo "Recovery successful"
  else
    echo "Recovery failed — manual intervention needed"
    # Here: send a notification (email, Telegram, etc.)
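    # A hypothetical sketch of one option, via the Telegram Bot API
    # (BOT_TOKEN and CHAT_ID are placeholders you must supply):
    # curl -s "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
    #   -d chat_id="${CHAT_ID}" \
    #   -d text="openclaw-gateway: crash loop recovery failed on $(hostname)"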
  fi
fi
# crontab entry:
# */5 * * * * /usr/local/bin/openclaw-crash-monitor.sh >> /var/log/oc-monitor.log 2>&1

On macOS, OpenClaw runs under launchd as a LaunchAgent rather than systemd, and the restart behavior is different:
<key>KeepAlive</key>
<true/>                       <!-- Auto-restart on exit -->
<key>ThrottleInterval</key>
<integer>10</integer>         <!-- Minimum 10s between restarts -->
macOS also does its own crash loop detection: if a process exits too quickly too many times, launchd backs off exponentially. To reset this:
# Unload + reload to reset backoff state
launchctl bootout "gui/$UID/io.openclaw.gateway" 2>/dev/null
launchctl bootstrap "gui/$UID" ~/Library/LaunchAgents/io.openclaw.gateway.plist

# Or using the openclaw command
openclaw gateway restart
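Before resetting anything, it's worth seeing what launchd itself thinks; launchctl print dumps the job's current state, including its last exit status:

# Inspect the job: look for "state" and "last exit code" in the output
launchctl print "gui/$UID/io.openclaw.gateway"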
When your gateway goes down, resist the urge to restart blindly; start by reading the logs:
sudo journalctl -u openclaw-gateway -n 30 --no-pager
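If the logs alone don't explain it, two quick follow-up checks narrow things down (a sketch; the grep pattern is just a convenience, not OpenClaw-specific):

# Is the unit flapping, failed, or rate-limited?
systemctl status openclaw-gateway --no-pager
# Is a leftover process still holding the gateway's port?
ss -tlnp | grep -i node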
Not all restarts are failures. You might restart the gateway for a config change, a version upgrade, or scheduled maintenance.
The difference matters for your monitoring: a planned restart should not fire an alert or start a repair workflow. Set a maintenance window before planned restarts:
# Tell your monitoring this is intentional
touch /tmp/openclaw-planned-restart
systemctl restart openclaw-gateway
# Your monitor checks for this file and skips the crash alert
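The monitor-side check can be as small as this (a sketch; the 15-minute freshness window is an assumption, there so a stale marker can't suppress real alerts forever):

# In the crash monitor, before sending any alert:
if [ -n "$(find /tmp/openclaw-planned-restart -mmin -15 2>/dev/null)" ]; then
  # Fresh marker means a planned restart - consume it and skip the alert
  rm -f /tmp/openclaw-planned-restart
  exit 0
fi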