LLM API error tracking, classification, retry, and task recovery for OpenClaw.
> ⚠️ 必须先安装配套插件,否则此 Skill 完全无法工作
> 此 Skill 仅提供自然语言描述、示例和触发话术。
> 核心实现(工具注册、model_call_ended 自动错误记录 hook、重试引擎、Dashboard HTTP 服务、JSONL 日志与统计持久化)全部来自插件 @leiJack-lo/resilience。
> 只装 Skill 不装插件 → 工具不存在、调用失败、不会自动收集数据、Dashboard 也不会启动。
clawhub login
# 1. 安装插件(必须带 --dangerously-force-unsafe-install)
# 原因:插件会使用 open 包打开浏览器、启动本地 HTTP server (默认 18765)、
# 注册 model_call_ended / agent_end / after_tool_call 等敏感 hook、读写 ~/.openclaw/plugins/resilience/ 下的日志和配置。
# ClawHub 安全扫描因此标记为 "suspicious"(这是正常行为,不是恶意代码)。
openclaw plugins install clawhub:@leiJack-lo/resilience --dangerously-force-unsafe-install
# 2. 安装此 Skill(获得中文自然语言能力 + 工具描述)
openclaw skills install resilience-monitor
# 3. OpenClaw 2026.6.10+:允许 agent_end 会话恢复 hook 读取会话结束事件
# 不设置时,API 错误统计仍可用,但“会话失败统计 + 下一轮恢复指令”不会启用。
openclaw config set plugins.entries.resilience.hooks.allowConversationAccess true
# 4. 必须重启 Gateway,让插件的 hooks 和工具真正注册生效
openclaw gateway restart
安装成功后即可直接对 agent 说中文指令,例如:
验证方法:重启后问 agent "resilience 插件安装好了吗?" 或直接试一个工具调用。如果提示工具不存在,就说明插件没加载成功。
配置(面板端口、是否自动启动 Dashboard 等)放在 ~/.openclaw/openclaw.json 的 plugins.entries.resilience.config 下;OpenClaw 2026.6.10+ 的会话恢复授权放在 plugins.entries.resilience.hooks.allowConversationAccess 下(见下方 dashboard 工具说明)。
This skill adds natural language support and Chinese examples on top of the Resilience plugin. It lets your agent monitor API health, inspect per-model error patterns, adjust retry strategies, generate reports, and control the live dashboard using everyday language.
Use it to:
Open the live web dashboard in your browser for real-time error stats and retry strategy management.
Parameters:
action: "open" (default) | "status" | "stop"Features:
URL: http://127.0.0.1:18765/ (default port, configurable via dashboardPort)
Voice / natural language examples:
resilience_dashboard({ action: "open" })resilience_dashboard({ action: "open" })resilience_dashboard({ action: "open" })The dashboard starts automatically when OpenClaw Gateway starts (unless dashboardEnabled: false).
重要:这些配置只有在插件已正确安装并加载后才生效(见最上面的安装前提)。
Configuration lives in ~/.openclaw/openclaw.json under plugins.entries.resilience.config (not only api.pluginConfig at hook time). Example:
"resilience": {
"enabled": true,
"hooks": {
"allowConversationAccess": true
},
"config": {
"dashboardPort": 18765,
"dashboardEnabled": true,
"instanceLabel": "my-workspace"
}
}
At gateway_start, config is read from ctx.config + ctx.workspaceDir.
Multi-instance: Use the instance dropdown to view all instances (aggregated) or a single Gateway. Each instance stores data under ~/.openclaw/plugins/resilience/instances/. Strategy edits apply only to the local Gateway instance.
View API error statistics by time period or model.
Parameters:
query (optional): Natural language query"today" or empty — today's full summary"hour" — current hour stats"week" — current week stats"mimo-v2.5") — model-specific statsExamples:
resilience_stats({ query: "today" })resilience_stats({ query: "mimo-v2.5" })resilience_stats({ query: "week" })View, add, update, or reset retry strategies.
Parameters:
action: "list" (default) | "add" | "update" | "reset"strategyName: Strategy name (required for add/update)updates: Fields to update (for add/update). Use these shapes:type: "fixed" | "exponential" | "custom"maxRetries: number or numeric string, e.g. 3 or "3"intervals: millisecond numbers or duration strings, e.g. [60000, 300000], ["30s", "2m"], or "30s, 2m, 5分钟"cooldownMs: millisecond number or duration string, e.g. 10000, "10s", "10秒"retryOn: array or comma-separated string of error categoriesmodels: array or comma-separated string of model namesExamples:
resilience_strategies({ action: "list" })resilience_strategies({ action: "update", strategyName: "default-exponential", updates: { type: "exponential" } })resilience_strategies({ action: "add", strategyName: "my-strategy", updates: { type: "custom", maxRetries: 3, intervals: ["1m", "5分钟", "10m"], cooldownMs: "10s" } })resilience_strategies({ action: "update", strategyName: "default-exponential", updates: { intervals: "30s, 2m, 5分钟" } })resilience_strategies({ action: "reset" })Generate detailed error reports.
Parameters:
reportType: "daily" (default) | "model" | "recovery" | "full"target: Model name or date (YYYY-MM-DD)Examples:
resilience_report({ reportType: "daily" })resilience_report({ reportType: "model", target: "mimo-v2.5" })resilience_report({ reportType: "recovery" })resilience_report({ reportType: "full" })View or update automatic session recovery settings. Use this when the user wants to change the "continue the task" wording after a session failure, switch Chinese/English recovery language, or disable/enable automatic recovery.
Parameters:
action: "show" (default) | "update" | "reset"enabled: true / falselanguage: "zh" | "en"prompt: custom prompt overriding localized defaultspromptZh: custom Chinese promptpromptEn: custom English promptttlMs: queued recovery context TTLcooldownMs: minimum interval between recovery injections per sessionmaxPerSession: maximum automatic injections per sessionExamples:
resilience_recovery({ action: "show" })resilience_recovery({ action: "update", language: "zh" })resilience_recovery({ action: "update", language: "en" })resilience_recovery({ action: "update", language: "zh", prompt: "任务完成了吗?如果没完成请继续完成任务。" })resilience_recovery({ action: "update", enabled: false })View agent session and tool failure recovery records. These are separate from LLM API errors and are stored per OpenClaw instance.
Parameters:
action: "summary" (default) | "list"limit: maximum records to returnExamples:
resilience_sessions({ action: "summary" })resilience_sessions({ action: "list", limit: 20 })| Category | Description | Retryable |
|---|---|---|
| ---------- | ------------- | ----------- |
rate_limit | 429 Too Many Requests | ✅ |
server_overload | 503 Service Unavailable | ✅ |
timeout | Request timeout | ✅ |
auth_failed | 401/403 Authentication failed | ❌ |
network_error | Connection errors | ✅ |
model_unavailable | Model not found or offline | ✅ |
context_too_long | Context length exceeded | ❌ |
token_parse_error | Tokenizer/token parsing failure | ❌ |
invalid_model_output | Malformed model output / response format failure | ❌ |
session_runtime_error | Non-API session runtime failure | ❌ |
unknown | Unclassified errors | ❌ |
| Name | Type | Max Retries | Intervals | Error Types |
|---|---|---|---|---|
| ------ | ------ | ------------- | ----------- | ------------- |
| default-exponential | exponential | 5 | 1m→15m | rate_limit, server_overload, timeout, network_error |
| rate-limit-fixed | fixed | 3 | 30s | rate_limit |
| model-backoff | custom | 6 | 1m→2h | server_overload, model_unavailable |
| network-retry | exponential | 4 | 5s→1m | network_error |
Per-instance data: ~/.openclaw/plugins/resilience/instances/ (stats, logs, strategies, tasks). Legacy root layout is still read as default.
~/.openclaw/plugins/resilience/instances/<instance-id>/
├── meta.json
├── stats.json
├── strategies.json
├── recovery-settings.json
├── active-retries.json
├── session-retries.json
├── logs/YYYY-MM-DD.jsonl
└── tasks/
共 5 个版本