Automating gateway restarts sounds simple: when it's down, restart it. But done naively, this creates a new class of problem — a crash loop that restarts hundreds of times per day, burns your systemd rate limit, fills your logs, and hides the actual bug. Here's how to do it properly.
systemd with Restart=always will keep restarting your gateway until it trips the unit's start rate limit (by default 5 starts within 10 seconds; the unit below widens this to 5 starts per 5 minutes) and then gives up. If the underlying cause isn't fixed (corrupted config, missing env var, port conflict), every restart fails within seconds. Once the rate limit is hit, the service stops restarting, and you're left with a gateway that is simply down: systemctl status shows it as failed with Result=start-limit-hit, and it stays that way until someone intervenes.
This is worse than just being down — it's confusing, hard to diagnose, and fills your logs with noise.
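The good news is that systemd can tell you exactly when this has happened. A quick check, using standard systemctl commands (the unit name openclaw-gateway is the one used throughout this guide):

# Did systemd give up because of the rate limit?
systemctl is-failed openclaw-gateway
# "failed" here plus Result=start-limit-hit below confirms it
systemctl show openclaw-gateway -p Result -p NRestarts
# Once the underlying cause is fixed, clear the state and start again
sudo systemctl reset-failed openclaw-gateway
sudo systemctl start openclaw-gateway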
There are three legitimate scenarios for gateway restarts (recovering from a one-off crash, breaking a crash loop, and planned maintenance), and each needs different handling. Start with the systemd unit itself:
[Unit]
Description=OpenClaw Gateway
After=network.target
# Rate-limit window: at most 5 start attempts per 5 minutes
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=simple
User=openclaw
WorkingDirectory=/home/openclaw/.openclaw/workspace
ExecStart=/usr/bin/node /home/openclaw/.openclaw/node_modules/.bin/openclaw gateway start
# Only restart on non-zero exit; a clean exit means a deliberate stop
Restart=on-failure
# Wait 10s between restarts (not 1s - gives the old process time to release the port)
RestartSec=10s
# Graceful shutdown: SIGTERM first, then SIGKILL after 30s
TimeoutStopSec=30s
KillMode=mixed

[Install]
WantedBy=multi-user.target
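To install and apply the unit, the standard systemd steps (assuming the file is saved as /etc/systemd/system/openclaw-gateway.service):

sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-gateway
systemctl status openclaw-gateway --no-pager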
If the failure cause is intermittent rather than persistent, keep Restart=on-failure but use a longer RestartSec so each attempt lands after the transient condition clears, or add pre-start diagnostics (next section).
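One way to raise the backoff without editing the installed unit file is a drop-in override. This is standard systemctl machinery, sketched here against the unit above:

# Opens an editor on a drop-in override for the unit
sudo systemctl edit openclaw-gateway
# Add in the editor, then save:
#   [Service]
#   RestartSec=60s
# Confirm the new value took effect
systemctl show openclaw-gateway -p RestartUSec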
Before each restart attempt, run a quick diagnostic to catch known fixable problems:
#!/bin/bash
# Run before openclaw gateway starts — fix known issues first
WORKSPACE="${HOME:-/root}/.openclaw/workspace"
CONFIG="$WORKSPACE/openclaw.json"

# 1. Check config is valid JSON
if [ -f "$CONFIG" ]; then
  if ! node -e "JSON.parse(require('fs').readFileSync('$CONFIG', 'utf8'))" 2>/dev/null; then
    echo "WARNING: Config file is not valid JSON"
    # Restore backup if available
    if [ -f "${CONFIG}.bak" ]; then
      cp "${CONFIG}.bak" "$CONFIG"
      echo "Restored config from backup"
    else
      echo "No backup available — manual intervention needed"
      exit 1  # Block restart, don't make it worse
    fi
  fi
fi

# 2. Check disk has at least 100MB free
DISK_AVAIL=$(df / | tail -1 | awk '{print $4}')
if [ "$DISK_AVAIL" -lt 102400 ]; then  # 100MB in KB
  echo "WARNING: Low disk space (${DISK_AVAIL}KB free) — clearing caches"
  npm cache clean --force 2>/dev/null
  journalctl --vacuum-size=100M 2>/dev/null
fi

# 3. Check required env vars are present
if [ -f "$WORKSPACE/.env" ]; then
  if ! grep -q "TELEGRAM_BOT_TOKEN\|botToken" "$WORKSPACE/.env" "$CONFIG" 2>/dev/null; then
    echo "WARNING: No Telegram token found in .env or config"
    # Don't exit 1 here — let it start and fail with a clear error
  fi
fi
exit 0

Wire the script into the unit with ExecStartPre; a non-zero exit from the script blocks the start:

[Service]
# ...
ExecStartPre=/usr/local/bin/openclaw-prestart.sh
ExecStart=/usr/bin/node ... openclaw gateway start
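One gap worth closing: the script restores from ${CONFIG}.bak, but nothing above creates that backup. A sketch of one way to keep it fresh (the schedule and paths are assumptions; run it from the openclaw user's crontab):

# Refresh the known-good backup only while the gateway is healthy,
# so a corrupted config never overwrites a good backup.
# Example crontab entry (every 30 minutes):
# */30 * * * * systemctl is-active --quiet openclaw-gateway && cp /home/openclaw/.openclaw/workspace/openclaw.json /home/openclaw/.openclaw/workspace/openclaw.json.bak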
systemd's built-in rate limiting stops the loop, but doesn't alert you or attempt to fix the underlying issue. Add a crash loop detector:
#!/bin/bash
# Run via cron every 5 minutes — detects crash loops
# Count restarts in the last 10 minutes
# grep -c always prints a count (even 0), so no fallback is needed; a trailing
# "|| echo 0" here would emit a second line on no matches and break the test below
RESTARTS=$(journalctl -u openclaw-gateway --since "10 minutes ago" | grep -c "Started OpenClaw")
if [ "$RESTARTS" -gt 3 ]; then
  echo "$(date): Crash loop detected — $RESTARTS restarts in 10 min"

  # Collect diagnostics before anything else
  journalctl -u openclaw-gateway -n 50 --no-pager > "/tmp/oc-crashloop-$(date +%s).log"

  # Stop the loop first
  systemctl stop openclaw-gateway

  # Try the most common fixes:

  # Fix 1: Validate + repair config
  CONFIG="$HOME/.openclaw/workspace/openclaw.json"
  if ! node -e "JSON.parse(require('fs').readFileSync('$CONFIG','utf8'))" 2>/dev/null; then
    echo "Config corrupted — attempting restore from backup"
    [ -f "${CONFIG}.bak" ] && cp "${CONFIG}.bak" "$CONFIG"
  fi

  # Fix 2: Clear disk if needed
  DISK_USED=$(df / | tail -1 | awk '{print $5}' | tr -d '%')
  if [ "$DISK_USED" -gt 90 ]; then
    npm cache clean --force 2>/dev/null
    journalctl --vacuum-size=50M 2>/dev/null
  fi

  # Wait, then try one more time
  sleep 5
  systemctl start openclaw-gateway
  sleep 10

  if systemctl is-active --quiet openclaw-gateway; then
    echo "Recovery successful"
  else
    echo "Recovery failed — manual intervention needed"
    # Here: send a notification (email, Telegram, etc.)
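    # A hypothetical sketch of one option, via the Telegram Bot API
    # (BOT_TOKEN and CHAT_ID are placeholders you must supply):
    # curl -s "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
    #   -d chat_id="${CHAT_ID}" \
    #   -d text="openclaw-gateway: crash loop recovery failed on $(hostname)"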
  fi
fi
# crontab entry:
# */5 * * * * /usr/local/bin/openclaw-crash-monitor.sh >> /var/log/oc-monitor.log 2>&1

On macOS, OpenClaw runs under launchd as a LaunchAgent rather than systemd, and the restart behavior is different:
<key>KeepAlive</key>
<true/>                       <!-- Auto-restart on exit -->
<key>ThrottleInterval</key>
<integer>10</integer>         <!-- Minimum 10s between restarts -->
macOS also does its own crash loop detection: if a process exits too quickly too many times, launchd backs off exponentially. To reset this:
# Unload + reload to reset backoff state
launchctl bootout "gui/$UID/io.openclaw.gateway" 2>/dev/null
launchctl bootstrap "gui/$UID" ~/Library/LaunchAgents/io.openclaw.gateway.plist

# Or using the openclaw command
openclaw gateway restart
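Before resetting anything, it's worth seeing what launchd itself thinks; launchctl print dumps the job's current state, including its last exit status:

# Inspect the job: look for "state" and "last exit code" in the output
launchctl print "gui/$UID/io.openclaw.gateway"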
When your gateway goes down, resist the urge to restart blindly; start by reading the logs:
sudo journalctl -u openclaw-gateway -n 30 --no-pager
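If the logs alone don't explain it, two quick follow-up checks narrow things down (a sketch; the grep pattern is just a convenience, not OpenClaw-specific):

# Is the unit flapping, failed, or rate-limited?
systemctl status openclaw-gateway --no-pager
# Is a leftover process still holding the gateway's port?
ss -tlnp | grep -i node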
Not all restarts are failures. You might restart the gateway for a config change, a version upgrade, or scheduled maintenance.
The difference matters for your monitoring: a planned restart should not fire an alert or start a repair workflow. Set a maintenance window before planned restarts:
# Tell your monitoring this is intentional
touch /tmp/openclaw-planned-restart
systemctl restart openclaw-gateway
# Your monitor checks for this file and skips the crash alert
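The monitor-side check can be as small as this (a sketch; the 15-minute freshness window is an assumption, there so a stale marker can't suppress real alerts forever):

# In the crash monitor, before sending any alert:
if [ -n "$(find /tmp/openclaw-planned-restart -mmin -15 2>/dev/null)" ]; then
  # Fresh marker means a planned restart - consume it and skip the alert
  rm -f /tmp/openclaw-planned-restart
  exit 0
fi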