Automated failure detection, diagnosis, and recovery for OpenClaw agents. The watchdog that keeps your agent running.

```bash
# Full health check: scan all systems, diagnose issues, suggest fixes
python3 scripts/self-healing-agent.py check

# Auto-heal: detect and fix what it can automatically
python3 scripts/self-healing-agent.py heal

# Monitor mode: run continuously, fix issues as they appear
python3 scripts/self-healing-agent.py monitor --interval 300

# Check a specific subsystem
python3 scripts/self-healing-agent.py check --target cron
python3 scripts/self-healing-agent.py check --target memory
python3 scripts/self-healing-agent.py check --target config
python3 scripts/self-healing-agent.py check --target sessions
```

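check: Health Check

Runs the diagnostic suite across all subsystems.

Options: --target limits the check to one subsystem (for example cron, memory, config, or sessions); --json produces machine-readable output.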
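
For scripted use, the check command can be driven from Python. The following is a minimal sketch, not part of the tool itself: `build_check_command` and `run_check` are hypothetical helper names, and it assumes that --json and --target can be combined and that --json prints a single JSON document to stdout. The output schema is not documented here, so the parsed value is returned unchanged.

```python
import json
import subprocess

SCRIPT = "scripts/self-healing-agent.py"  # path taken from the commands in this README


def build_check_command(target=None):
    """Return the argv list for a JSON health check, optionally limited to one subsystem."""
    cmd = ["python3", SCRIPT, "check", "--json"]
    if target is not None:
        cmd += ["--target", target]
    return cmd


def run_check(target=None, runner=subprocess.run):
    """Run the check and parse its stdout as JSON.

    Assumption: --json writes one JSON document to stdout. The schema is
    unknown, so the caller receives whatever json.loads produces.
    """
    completed = runner(build_check_command(target), capture_output=True, text=True)
    return json.loads(completed.stdout)
```

The `runner` parameter exists only so the subprocess call can be swapped out in tests.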
heal: Auto-Repair

For each detected issue, applies the safest available fix.

Options: --dry-run previews fixes without applying them; --aggressive allows riskier fixes.
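
One way to picture "safest fix first" together with --dry-run and --aggressive is sketched below. This is an illustration under stated assumptions, not the script's actual logic: the `Fix` class, the 1 to 3 risk scale, and the returned status strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Fix:
    name: str
    risk: int                   # hypothetical scale: 1 = safe, 3 = risky
    apply: Callable[[], bool]   # returns True when the fix succeeded


def choose_fix(candidates: List[Fix], aggressive: bool = False) -> Optional[Fix]:
    """Pick the lowest-risk candidate; riskier ones require aggressive=True."""
    max_risk = 3 if aggressive else 1
    allowed = [f for f in candidates if f.risk <= max_risk]
    return min(allowed, key=lambda f: f.risk) if allowed else None


def heal(candidates: List[Fix], dry_run: bool = False, aggressive: bool = False) -> str:
    """Apply (or, with dry_run, only describe) the safest permitted fix."""
    fix = choose_fix(candidates, aggressive)
    if fix is None:
        return "skipped: no fix within the allowed risk level"
    if dry_run:
        return f"would apply: {fix.name}"
    return f"applied: {fix.name}" if fix.apply() else f"failed: {fix.name}"
```

In this sketch a dry run never calls `apply`, which is what makes it safe to use as a preview.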
monitor: Continuous Watchdog

Runs in a loop, checking health every N seconds and logging to memory/self-healing-log.json.

Options: --interval sets N in seconds (default: 300); --max-heals caps the number of heals per cycle.
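
The watchdog loop can be sketched roughly as follows. The function and parameter names are hypothetical, the default of 3 for `max_heals` is a placeholder (the real default is not stated here), and `cycles` and `sleep` exist only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List, Optional


def monitor(check: Callable[[], List[str]],
            heal: Callable[[str], bool],
            interval: float = 300,
            max_heals: int = 3,
            cycles: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Check health every `interval` seconds, healing at most `max_heals` issues per cycle.

    cycles=None loops forever; a finite value stops after that many cycles.
    Returns the total number of heal attempts.
    """
    attempts = 0
    done = 0
    while cycles is None or done < cycles:
        issues = check()
        for issue in issues[:max_heals]:  # cap the work done in one cycle
            heal(issue)
            attempts += 1
        done += 1
        if cycles is None or done < cycles:
            sleep(interval)  # wait before the next health check
    return attempts
```

Capping heals per cycle keeps a single bad cycle from triggering an unbounded burst of repairs; issues beyond the cap are simply picked up again on the next check.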
report: Health Report

Generates a markdown health report covering:

| Subsystem | Checks | Auto-Heals |
|---|---|---|
| Cron | Failed runs, timeouts, consecutive errors | Restart jobs, clear error state |
| Memory | File sizes >1MB, growth rate, duplicates | Archive old files, compact |
| Config | JSON validity, required fields, deprecated keys | Fix syntax, add defaults |
| Sessions | Zombie processes, bloated contexts | Kill zombies, archive contexts |
| Skills | Syntax errors, missing deps, broken imports | Log issue, skip broken skill |
| Network | API endpoints, DNS, SSL certs | Retry with backoff, switch endpoints |
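
To make two rows of the table concrete, here is a minimal sketch of a memory check (files over 1MB) and a config check (JSON validity and required fields). The function names and message formats are hypothetical, 1MB is assumed to mean 1,000,000 bytes, and the set of required fields is left to the caller because it is not listed here.

```python
import json
from pathlib import Path
from typing import Iterable, List

MAX_MEMORY_BYTES = 1_000_000  # the ">1MB" threshold from the table (decimal MB assumed)


def check_memory(directory: Path) -> List[str]:
    """Report files under `directory` that exceed the size threshold."""
    return [
        f"{path.name} is {path.stat().st_size} bytes (over 1MB)"
        for path in sorted(directory.rglob("*"))
        if path.is_file() and path.stat().st_size > MAX_MEMORY_BYTES
    ]


def check_config(path: Path, required: Iterable[str]) -> List[str]:
    """Report invalid JSON or missing required top-level fields."""
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path.name}: invalid JSON ({exc.msg} at line {exc.lineno})"]
    if not isinstance(config, dict):
        return [f"{path.name}: top level is not a JSON object"]
    return [f"{path.name}: missing required field '{key}'"
            for key in required if key not in config]
```

Both functions return a list of issue descriptions, so an empty list means the subsystem passed.
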
All actions are logged to memory/self-healing-log.json:

```json
{
  "timestamp": "2026-04-05T06:00:00Z",
  "issue": "cron job 'daily-intel' failed 3 consecutive times",
  "diagnosis": "Script timeout — API rate limit hit",
  "action": "Reset error count, added 30s backoff, restarted",
  "result": "success",
  "mttr_seconds": 12
}
```
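
A log in this shape is easy to write and summarize. The sketch below is illustrative only: it assumes the log file holds a JSON array of entries like the one above, that `timestamp` records the resolution time in UTC, and that `mttr_seconds` is measured from detection to resolution. `append_entry` and `mean_mttr` are hypothetical helper names.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from statistics import mean


def append_entry(log_path, issue, diagnosis, action, result, detected_at, resolved_at):
    """Append one entry with the fields shown above and return it."""
    entry = {
        "timestamp": resolved_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "issue": issue,
        "diagnosis": diagnosis,
        "action": action,
        "result": result,
        # Assumption: MTTR runs from detection to resolution.
        "mttr_seconds": round((resolved_at - detected_at).total_seconds()),
    }
    path = Path(log_path)
    # Assumption: the file is a JSON array; start a new one if it does not exist.
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries.append(entry)
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entry


def mean_mttr(log_path):
    """Mean time to repair in seconds over successful heals, or None if there are none."""
    entries = json.loads(Path(log_path).read_text(encoding="utf-8"))
    times = [e["mttr_seconds"] for e in entries if e["result"] == "success"]
    return mean(times) if times else None
```

Rewriting the whole array on each append is simple but does not scale to large logs; a line-delimited format would avoid that, at the cost of departing from the layout assumed here.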