Gateway Watchdog

Monitor OpenClaw Gateway health by detecting abnormal error rates in logs. Use when: (1) setting up Gateway error monitoring, (2) diagnosing repeated API fai...
guoqunabc
Developer Tools · clawhub · v1.4.0 · 1 version · 99786.5 · Key: not required
★ 0 stars · 📥 1,402 downloads · 💾 12 installs · #latest

Overview

Detect abnormal error patterns in the OpenClaw Gateway before they cause damage. Works with all channels: Telegram, WhatsApp, Discord, Slack, Signal, iMessage, Feishu, and more.

Born from a real incident: a silent try-catch caused 76,744 failed retries in 8 hours — undetected until the API quota was exhausted.

What It Detects

| Category | Patterns |
|----------|----------|
| Rate limiting | HTTP 429, rate.limit, too many requests |
| Server errors | HTTP 5xx status codes |
| Auth/permission | HTTP 401/403, unauthorized, forbidden, token expired |
| Network errors | ETIMEDOUT, ECONNREFUSED, ECONNRESET, ENOTFOUND, socket hang up |
| Delivery failures | sendMessage failed, deliver failed, fetch failed |
| Custom | User-defined via WATCHDOG_EXTRA_PATTERNS env var |
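
To make the categories above concrete, here is a minimal sketch of how a single log line could be sorted into one of them with `grep -E`. The function names (`matches`, `classify_line`), the category labels, and the exact regexes are illustrative assumptions, not the skill's actual implementation; in particular, it assumes status codes appear in the log as `HTTP <code>`.

```shell
# Hypothetical helper: succeed when LINE matches the extended regex PATTERN
# (case-insensitive). Usage: matches PATTERN LINE
matches() {
  printf '%s\n' "$2" | grep -qiE -- "$1"
}

# Hypothetical classifier: print the first category whose pattern matches.
# The regexes are guesses derived from the table, not the skill's real ones.
classify_line() {
  line=$1
  if   matches 'HTTP 429|rate.limit|too many requests' "$line"; then
    echo "rate_limit"
  elif matches 'HTTP 5[0-9][0-9]' "$line"; then
    echo "server_error"
  elif matches 'HTTP 40[13]|unauthorized|forbidden|token expired' "$line"; then
    echo "auth"
  elif matches 'ETIMEDOUT|ECONNREFUSED|ECONNRESET|ENOTFOUND|socket hang up' "$line"; then
    echo "network"
  elif matches 'sendMessage failed|deliver failed|fetch failed' "$line"; then
    echo "delivery"
  elif [ -n "${WATCHDOG_EXTRA_PATTERNS:-}" ] && matches "$WATCHDOG_EXTRA_PATTERNS" "$line"; then
    echo "custom"
  else
    echo "none"
  fi
}
```

Checking the built-in categories before the custom one is a design choice of this sketch; the real script may count a line under several categories or in a different order.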

Smart Analysis

  • Error rate (errors/min): more meaningful than a raw count
  • Spike detection: alerts when errors jump 3x or more versus the last check (ratio set by WATCHDOG_SPIKE_RATIO)
  • Error concentration: flags when 80% or more of errors share one type, which suggests a single fault source
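
The three checks reduce to simple integer arithmetic. The sketch below shows one way to express them in shell, using the numbers from the sample alert under Interpreting Results (150 errors in 30 minutes, 12 errors at the previous check, 120 of them rate-limit errors). The function names are hypothetical, and treating a previous count of 0 as "no spike" is an assumption of this sketch rather than documented behavior.

```shell
# error_rate COUNT WINDOW_MINUTES: errors per minute (integer division).
error_rate() {
  echo $(( $1 / $2 ))
}

# is_spike CURRENT PREVIOUS RATIO: succeed when CURRENT >= PREVIOUS * RATIO.
# Assumption: with no previous errors there is no baseline, so no spike.
is_spike() {
  [ "$2" -gt 0 ] && [ "$1" -ge $(( $2 * $3 )) ]
}

# is_concentrated LARGEST TOTAL: succeed when one category holds >= 80%.
# Compared as LARGEST*100 >= TOTAL*80 to stay in integer arithmetic.
is_concentrated() {
  [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -ge $(( $2 * 80 )) ]
}

error_rate 150 30                                  # prints 5
is_spike 150 12 3       && echo "spike"            # 150 >= 12 * 3
is_concentrated 120 150 && echo "single fault source likely"
```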

Quick Start

bash scripts/gateway-watchdog.sh check     # silent unless errors exceed threshold
bash scripts/gateway-watchdog.sh verbose   # always outputs full report
bash scripts/gateway-watchdog.sh history   # show monitoring history
bash scripts/gateway-watchdog.sh trend     # last 24h error trend

Heartbeat integration

Add to HEARTBEAT.md:

## Gateway Error Monitoring (every heartbeat)
- Run `~/.openclaw/workspace/skills/gateway-watchdog/scripts/gateway-watchdog.sh check`
- If output is non-empty, report to user immediately
- No output = healthy, skip reporting
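
The same "report only when there is output" rule can also be applied from a plain shell wrapper. This is a minimal sketch; `run_check` is a hypothetical helper name, and the `ALERT` prefix is only illustrative.

```shell
# run_check CMD [ARGS...]: run the command, stay silent and return 0 when it
# prints nothing, otherwise print the report and return 1.
run_check() {
  report=$("$@" 2>/dev/null)
  if [ -n "$report" ]; then
    printf 'ALERT\n%s\n' "$report"
    return 1
  fi
  return 0
}

# Example call (path taken from the HEARTBEAT.md snippet above):
# run_check bash ~/.openclaw/workspace/skills/gateway-watchdog/scripts/gateway-watchdog.sh check
```

Returning a non-zero status on alert lets a caller chain a notification step with `||`.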

Cron (optional)

openclaw cron add \
  --name "gateway-watchdog" \
  --schedule "*/30 * * * *" \
  --task "Run gateway-watchdog.sh verbose. If errors detected, notify user with the report." \
  --channel last

Configuration

All via environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| WATCHDOG_THRESHOLD | 30 | Error count that triggers an alert |
| WATCHDOG_WINDOW | 30 | Monitoring window in minutes |
| WATCHDOG_SPIKE_RATIO | 3 | Alert when errors jump Nx vs last check |
| WATCHDOG_EXTRA_PATTERNS | _(empty)_ | Custom regex patterns (e.g., `99991400\|99991403`) |
| WATCHDOG_STATE | ~/.local/state/gateway-watchdog/state.json | State file |
| WATCHDOG_LOG | ~/.local/state/gateway-watchdog/history.log | History log |
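
As an illustration of how environment-driven defaults like these are typically resolved in shell, the sketch below uses the `${VAR:=default}` expansion. The function name `load_watchdog_config` is hypothetical, and this is an assumption about the mechanism, not a copy of the script's code; only the variable names and default values come from the table.

```shell
# Hypothetical helper: keep any value already set in the environment,
# otherwise fall back to the documented default.
load_watchdog_config() {
  : "${WATCHDOG_THRESHOLD:=30}"      # error count that triggers an alert
  : "${WATCHDOG_WINDOW:=30}"         # monitoring window in minutes
  : "${WATCHDOG_SPIKE_RATIO:=3}"     # spike multiplier vs last check
  : "${WATCHDOG_EXTRA_PATTERNS:=}"   # optional custom regex, empty by default
  : "${WATCHDOG_STATE:=$HOME/.local/state/gateway-watchdog/state.json}"
  : "${WATCHDOG_LOG:=$HOME/.local/state/gateway-watchdog/history.log}"
}

load_watchdog_config
echo "threshold=$WATCHDOG_THRESHOLD window=${WATCHDOG_WINDOW}min ratio=${WATCHDOG_SPIKE_RATIO}x"
```

For a one-off override, the variables can be set on the command line, for example `WATCHDOG_THRESHOLD=50 WATCHDOG_WINDOW=60 bash scripts/gateway-watchdog.sh check`.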

Adding channel-specific patterns

# Feishu-specific error codes
export WATCHDOG_EXTRA_PATTERNS='99991400|99991403|99991404|99991429'

# Telegram-specific
export WATCHDOG_EXTRA_PATTERNS='Too Many Requests|FLOOD_WAIT|bot was blocked'

# Discord-specific
export WATCHDOG_EXTRA_PATTERNS='DiscordAPIError|Missing Permissions|Unknown Channel'
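
Each `export` above replaces the previous value, so running all three leaves only the Discord patterns active. To watch several channels at once, the alternatives can be joined with `|` into one value. The sketch below assumes the variable is evaluated as a single extended regular expression, which matches the `|`-separated examples but is not stated explicitly; `join_patterns` and the per-channel variable names are hypothetical.

```shell
# join_patterns P1 P2 ...: join non-empty arguments with "|" into one regex.
join_patterns() {
  joined=""
  for p in "$@"; do
    [ -n "$p" ] || continue
    if [ -n "$joined" ]; then
      joined="$joined|$p"
    else
      joined=$p
    fi
  done
  printf '%s\n' "$joined"
}

FEISHU_PATTERNS='99991400|99991403|99991404|99991429'
TELEGRAM_PATTERNS='Too Many Requests|FLOOD_WAIT|bot was blocked'
DISCORD_PATTERNS='DiscordAPIError|Missing Permissions|Unknown Channel'

WATCHDOG_EXTRA_PATTERNS=$(join_patterns "$FEISHU_PATTERNS" "$TELEGRAM_PATTERNS" "$DISCORD_PATTERNS")
export WATCHDOG_EXTRA_PATTERNS
```

A quick way to sanity-check the combined value is to pipe a sample log line through `grep -E -- "$WATCHDOG_EXTRA_PATTERNS"` before relying on it.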

Interpreting Results

🔴 Alert (Chinese locale)

🔴 Gateway 最近 30 分钟出现 150 条异常错误(阈值: 30,5/min)
📈 错误突增: 12 → 150(3倍阈值触发)

错误分类:
  429/限流: 120
  5xx服务端错误: 5
  认证/权限: 0
  网络错误: 5
  消息投递失败: 20

  ⚠️  单一错误类型「429/限流」占比 80%,可能是单一故障源

🔴 Alert (English equivalent)

🔴 Gateway detected 150 errors in the last 30 min (threshold: 30, 5/min)
📈 Error spike: 12 → 150 (3x threshold triggered)

Error breakdown:
  429/Rate-limit: 120
  5xx Server errors: 5
  Auth/Permission: 0
  Network errors: 5
  Delivery failures: 20

  ⚠️  Single error type "429/Rate-limit" accounts for 80%+ — likely a single fault source

💚 Healthy

No output from check mode.

Limitations

  • Requires systemd + journalctl (falls back to ~/.openclaw/logs/ on macOS)
  • Reactive, not preventive
  • Cannot pinpoint which extension is failing — check error details for clues

Security

  • Read-only: Only reads logs
  • No credentials: No API keys accessed
  • No network: No outbound requests
  • User state only: State in ~/.local/state/gateway-watchdog/ (XDG standard, no elevated permissions needed)

Version History

1 version in total

  • v1.4.0 (current)
    2026-03-29 07:54 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
