Scripts available in the Collective Skills repo
When an OpenClaw agent misbehaves — spamming messages, going dark, burning API credits, or looping on dead channels — this skill provides the diagnostic playbook. Covers the 4 most common failure modes with exact commands to diagnose and fix each one.
Battle-tested across a 6-agent deployment spanning 3 hosts (Windows + Linux + Proxmox).
Use when you observe any of these symptoms:
- 429 Too many tokens or rate_limit errors
- auto-restart attempt 1/10, 2/10, etc.
- input length exceeds context length

Symptom: Agent sends repeated messages every N minutes.
Root cause: Heartbeat interval too low (10m = 144 messages/day) + verbose prompt that always generates output instead of HEARTBEAT_OK.
Quick fix:
# Check interval
grep -A5 heartbeat ~/.openclaw/openclaw.json
# Fix: set to 30m minimum, simplify prompt to checklist + HEARTBEAT_OK default
# Then restart gateway
openclaw gateway restart
Prevention: Never set heartbeat below 20 minutes. Heartbeat prompts should CHECK things, not CREATE things.
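The 10m = 144 messages/day figure comes from dividing the 1440 minutes in a day by the heartbeat interval. A minimal sketch of that arithmetic follows; `heartbeats_per_day` is a hypothetical helper name, not an OpenClaw command.

```shell
# Hypothetical helper (not part of OpenClaw): heartbeats fired per day
# for a given interval in minutes. Uses integer division.
heartbeats_per_day() {
  local interval_min="$1"
  echo $(( 1440 / interval_min ))
}

heartbeats_per_day 10   # prints 144
heartbeats_per_day 30   # prints 48
```

Running it for a candidate interval before editing the config gives a quick upper bound on how many messages a chatty prompt could produce.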
Symptom: All models fail, agent goes dark.
Root cause: Heartbeat + N crons = (N+1) API calls per interval. Exceeds provider TPM limit → all fallbacks exhausted simultaneously.
Quick fix:
# Check for rate limits
journalctl -u <service> --since '1h ago' | grep '429\|rate_limit'
# Count your crons (each burns tokens)
openclaw cron list
# Fix: raise the heartbeat interval to 30-60m, disable non-essential crons, stagger schedules
Prevention: Calculate token budget before adding crons. Each run ≈ 2K-10K tokens. Route heartbeats to cheap/local models.
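The budget calculation can be sketched as a worst case: if the heartbeat and all N crons fire in the same minute, (N+1) runs count against the provider's TPM limit at once. The helper names below are hypothetical, and the 40000 TPM figure is a placeholder assumption, not any real provider's limit.

```shell
# Hypothetical helper: worst-case tokens consumed in one minute when the
# heartbeat and all N crons fire together, i.e. (N + 1) runs.
burst_tokens() {
  local crons="$1" tokens_per_run="$2"
  echo $(( (crons + 1) * tokens_per_run ))
}

# Returns success (0) when the burst fits under the given TPM limit.
fits_tpm() {
  local crons="$1" tokens_per_run="$2" tpm_limit="$3"
  [ "$(burst_tokens "$crons" "$tokens_per_run")" -le "$tpm_limit" ]
}

# Example with placeholder numbers: 5 crons at the 10K upper estimate,
# checked against an assumed 40000 TPM limit.
burst_tokens 5 10000                        # prints 60000
fits_tpm 5 10000 40000 || echo "over budget: stagger or disable crons"
```

Staggering schedules lowers the number of runs that can land in the same minute, which is why it helps even when the daily total stays the same.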
Symptom: Logs show repeated auto-restart attempt N/10 for IRC/Discord/etc.
Root cause: Target server unreachable → health monitor restarts → fails again → loop. Each restart may trigger model calls, burning API tokens.
Quick fix:
# Check for loops
journalctl -u <service> --since '1h ago' | grep 'auto-restart\|timed out'
# Test connectivity
nc -zv -w 5 <target-ip> <target-port>
# Fix: disable the broken channel in openclaw.json
# channels.<name>.enabled = false
openclaw gateway restart
Prevention: Test connectivity BEFORE enabling channels. Disable channels you can't reach.
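To judge whether a channel is actually looping rather than retrying once, it can help to count the restart lines instead of eyeballing them. A small sketch, assuming the log lines contain the phrase `auto-restart attempt` as described above; `count_restart_attempts` is a hypothetical helper name.

```shell
# Hypothetical helper: count "auto-restart attempt" lines read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
# keeps the function usable under "set -e".
count_restart_attempts() {
  grep -c 'auto-restart attempt' || true
}

# Usage against the service log, as in the quick fix:
#   journalctl -u <service> --since '1h ago' | count_restart_attempts
```

A count that keeps climbing over the last hour points to a restart loop on an unreachable channel.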
Symptom: memory sync failed or input length exceeds context length errors.
Root cause: File too large for embedding model's context window (mxbai-embed-large = 8K tokens).
Quick fix: Archive old sections of large files (MEMORY.md → memory/archive/). Keep active files under 8K tokens.
Prevention: Don't let MEMORY.md grow unbounded. Archive quarterly.
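A rough size check can flag files before they fail to embed. The sketch below assumes about 4 bytes per token, which is a common heuristic for English text and not the embedding model's actual tokenizer; `approx_tokens` is a hypothetical helper name.

```shell
# Hypothetical helper: rough token estimate for a file, assuming about
# 4 bytes per token (a heuristic, not the embedding model's tokenizer).
approx_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( bytes / 4 ))
}

# Usage: warn when a file is likely over the 8K budget.
#   [ "$(approx_tokens MEMORY.md)" -gt 8000 ] && echo "archive old sections"
```

Because the estimate is coarse, leaving some headroom below the limit is safer than archiving only once the file reaches it.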
| What | Command |
|---|---|
| Service status | `systemctl is-active <service>` |
| Recent logs | `journalctl -u <service> \| tail -40` |
| Live tail | `journalctl -u <service> -f` |
| Rate limits | `journalctl -u <service> \| grep '429'` |
| Cron list | `openclaw cron list` |
| Port test | `nc -zv <target-ip> <target-port>` |
| Config backup | `cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak` |
cp openclaw.json openclaw.json.bak
journalctl tells you what's wrong 90% of the time.