Resilience Monitor

Monitor and manage OpenClaw API errors, track model performance, configure retry strategies, generate reports, and oversee task recovery status.
Author: leijack-lo · Source: clawhub · Version: v0.5.1 (latest, 5 versions) · API key: not required

Resilience Skill

LLM API error tracking, classification, retry, and task recovery for OpenClaw.

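> ⚠️ The companion plugin must be installed first; without it this Skill does not work at all.

> This Skill only provides natural-language descriptions, examples, and trigger phrases.

> The core implementation (tool registration, the model_call_ended hook that records errors automatically, the retry engine, the Dashboard HTTP service, and JSONL log and statistics persistence) all comes from the plugin @leiJack-lo/resilience.

> With the Skill installed but not the plugin, the tools do not exist, calls fail, no data is collected automatically, and the Dashboard does not start.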

Installation (run the steps in order)

clawhub login

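# 1. Install the plugin (the --dangerously-force-unsafe-install flag is required)
# Reason: the plugin uses the open package to launch a browser, starts a local HTTP server (default port 18765),
# registers sensitive hooks such as model_call_ended / agent_end / after_tool_call, and reads and writes logs and config under ~/.openclaw/plugins/resilience/.
# The ClawHub security scan therefore flags it as "suspicious" (this is expected behavior, not malicious code).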
openclaw plugins install clawhub:@leiJack-lo/resilience --dangerously-force-unsafe-install

# 2. Install this Skill (adds Chinese natural-language support and the tool descriptions)
openclaw skills install resilience-monitor

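# 3. OpenClaw 2026.6.10+: allow the agent_end session-recovery hook to read session-end events
# If this is not set, API error statistics still work, but session failure statistics and next-turn recovery instructions are not enabled.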
openclaw config set plugins.entries.resilience.hooks.allowConversationAccess true

# 4. Restart the Gateway (required) so the plugin's hooks and tools are actually registered
openclaw gateway restart

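Once installation succeeds, you can give the agent instructions in Chinese directly, for example:

  • "查看今天报错统计" (show today's error statistics)
  • "打开 resilience 面板" (open the resilience dashboard)
  • "修改超时重试策略为指数退避" (change the timeout retry strategy to exponential backoff)
  • "生成今日错误日报" (generate today's error report)

To verify: after the restart, ask the agent "resilience 插件安装好了吗?" (is the resilience plugin installed?) or try a tool call directly. If the agent reports that the tool does not exist, the plugin did not load.

Configuration (dashboard port, whether the Dashboard starts automatically, and so on) lives in ~/.openclaw/openclaw.json under plugins.entries.resilience.config. On OpenClaw 2026.6.10+, the session-recovery authorization lives under plugins.entries.resilience.hooks.allowConversationAccess (see the resilience_dashboard tool description below).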

Overview

This skill adds natural language support and Chinese examples on top of the Resilience plugin. It lets your agent monitor API health, inspect per-model error patterns, adjust retry strategies, generate reports, and control the live dashboard using everyday language.

Use it to:

  • Monitor API error rates and patterns
  • View per-model performance statistics
  • Configure retry strategies
  • Generate error reports
  • Track task recovery status
  • Track agent session and tool failure recovery queues
  • Configure automatic session recovery prompts in Chinese or English

Tools

resilience_dashboard

Open the live web dashboard in your browser for real-time error stats and retry strategy management.

Parameters:

  • action: "open" (default) | "status" | "stop"

Features:

  • Live error overview (today / hour / active retries)
  • Model breakdown table
  • Recent errors feed
  • Session/tool recovery queue summary
  • Retry strategy cards — set default, adjust max retries
  • Auto-refresh: 5s, 60s, 5min, 1h, or off

URL: http://127.0.0.1:18765/ (default port, configurable via dashboardPort)

Voice / natural language examples:

  • "打开错误统计页面" (open the error statistics page) → resilience_dashboard({ action: "open" })
  • "打开监控面板" (open the monitoring dashboard) → resilience_dashboard({ action: "open" })
  • "打开 resilience 面板" (open the resilience dashboard) → resilience_dashboard({ action: "open" })

The dashboard starts automatically when OpenClaw Gateway starts (unless dashboardEnabled: false).

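Important: these settings only take effect once the plugin has been installed and loaded correctly (see the installation prerequisites at the top).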

Configuration lives in ~/.openclaw/openclaw.json under plugins.entries.resilience.config (not only api.pluginConfig at hook time). Example:

"resilience": {
  "enabled": true,
  "hooks": {
    "allowConversationAccess": true
  },
  "config": {
    "dashboardPort": 18765,
    "dashboardEnabled": true,
    "instanceLabel": "my-workspace"
  }
}

At gateway_start, config is read from ctx.config + ctx.workspaceDir.
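
To illustrate how the settings above could be looked up from outside the Gateway, here is a minimal Python sketch that reads the resilience entry from openclaw.json and falls back to the defaults mentioned in this document (port 18765, dashboard enabled). The helper name load_resilience_config is hypothetical, and the sketch assumes the file is plain JSON; it is not how the plugin itself reads its config.

```python
import json
from pathlib import Path

# Defaults taken from this document; the merge behavior is an assumption.
DEFAULTS = {"dashboardPort": 18765, "dashboardEnabled": True}


def load_resilience_config(path: Path) -> dict:
    """Return plugins.entries.resilience.config merged over DEFAULTS.

    Hypothetical helper: assumes openclaw.json is plain JSON without comments.
    """
    if not path.exists():
        return dict(DEFAULTS)
    data = json.loads(path.read_text(encoding="utf-8"))
    entry = data.get("plugins", {}).get("entries", {}).get("resilience", {})
    return {**DEFAULTS, **entry.get("config", {})}


if __name__ == "__main__":
    cfg = load_resilience_config(Path.home() / ".openclaw" / "openclaw.json")
    print(f"dashboard: http://127.0.0.1:{cfg['dashboardPort']}/ "
          f"enabled={cfg['dashboardEnabled']}")
```

Missing keys simply fall back to the defaults, so a partial config block such as one that only sets instanceLabel still yields a usable port.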

Multi-instance: Use the instance dropdown to view all instances (aggregated) or a single Gateway. Each instance stores data under ~/.openclaw/plugins/resilience/instances/<instance-id>/. Strategy edits apply only to the local Gateway instance.

resilience_stats

View API error statistics by time period or model.

Parameters:

  • query (optional): Natural language query. Accepted values:
      • "today" or empty: today's full summary
      • "hour": current hour stats
      • "week": current week stats
      • Any model name (e.g., "mimo-v2.5"): model-specific stats

Examples:

  • "查看今天报错统计" (show today's error statistics) → resilience_stats({ query: "today" })
  • "查看 mimo-v2.5 的错误率" (show the error rate for mimo-v2.5) → resilience_stats({ query: "mimo-v2.5" })
  • "查看本周错误率" (show this week's error rate) → resilience_stats({ query: "week" })

resilience_strategies

View, add, update, or reset retry strategies.

Parameters:

  • action: "list" (default) | "add" | "update" | "reset"
  • strategyName: Strategy name (required for add/update)
  • updates: Fields to update (for add/update). Use these shapes:
      • type: "fixed" | "exponential" | "custom"
      • maxRetries: number or numeric string, e.g. 3 or "3"
      • intervals: millisecond numbers or duration strings, e.g. [60000, 300000], ["30s", "2m"], or "30s, 2m, 5分钟"
      • cooldownMs: millisecond number or duration string, e.g. 10000, "10s", "10秒"
      • retryOn: array or comma-separated string of error categories
      • models: array or comma-separated string of model names
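
The intervals and cooldownMs shapes above mix raw millisecond numbers with duration strings in English and Chinese. The following Python sketch shows one way such values could be normalized to milliseconds. It is an illustration only: the function names are hypothetical, and the unit table beyond the forms shown in this document ("s", "m", "秒", "分钟") is an assumption, not the plugin's actual parser.

```python
import re
from typing import List, Union

# Assumed unit table: only s/m/秒/分钟 appear in this document; the rest is a guess.
_UNIT_MS = {
    "ms": 1, "毫秒": 1,
    "s": 1_000, "秒": 1_000,
    "m": 60_000, "分": 60_000, "分钟": 60_000,
    "h": 3_600_000, "小时": 3_600_000,
}
_DURATION = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([a-z]+|[\u4e00-\u9fff]+)?\s*$", re.IGNORECASE
)


def parse_duration_ms(value: Union[int, float, str]) -> int:
    """Convert 10000, "10s", or "10秒" to milliseconds (bare numbers are ms)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return int(value)
    match = _DURATION.match(str(value))
    if not match:
        raise ValueError(f"unrecognized duration: {value!r}")
    number, unit = match.groups()
    if unit is None:
        return int(float(number))
    factor = _UNIT_MS.get(unit.lower())
    if factor is None:
        raise ValueError(f"unknown unit in duration: {value!r}")
    return int(float(number) * factor)


def parse_intervals(value: Union[str, list]) -> List[int]:
    """Accept a list of durations or one comma-separated string."""
    parts = value.split(",") if isinstance(value, str) else value
    return [parse_duration_ms(part) for part in parts]


if __name__ == "__main__":
    print(parse_intervals("30s, 2m, 5分钟"))
```

Rejecting unknown units with an error, rather than silently treating them as milliseconds, keeps a typo such as "5x" from turning into a 5 ms retry interval.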

Examples:

  • "查看当前所有策略配置" (show all current strategy settings) → resilience_strategies({ action: "list" })
  • "修改超时重试策略为指数退避" (change the timeout retry strategy to exponential backoff) → resilience_strategies({ action: "update", strategyName: "default-exponential", updates: { type: "exponential" } })
  • "添加一个自定义重试策略" (add a custom retry strategy) → resilience_strategies({ action: "add", strategyName: "my-strategy", updates: { type: "custom", maxRetries: 3, intervals: ["1m", "5分钟", "10m"], cooldownMs: "10s" } })
  • "把默认策略间隔改成 30 秒、2 分钟、5 分钟" (set the default strategy's intervals to 30 seconds, 2 minutes, 5 minutes) → resilience_strategies({ action: "update", strategyName: "default-exponential", updates: { intervals: "30s, 2m, 5分钟" } })
  • "重置策略为默认" (reset strategies to the defaults) → resilience_strategies({ action: "reset" })

resilience_report

Generate detailed error reports.

Parameters:

  • reportType: "daily" (default) | "model" | "recovery" | "full"
  • target: Model name or date (YYYY-MM-DD)

Examples:

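  • "生成今日错误日报" (generate today's error report) → resilience_report({ reportType: "daily" })
  • "查看 mimo-v2.5 的详细报告" (show the detailed report for mimo-v2.5) → resilience_report({ reportType: "model", target: "mimo-v2.5" })
  • "查看任务恢复状态" (show task recovery status) → resilience_report({ reportType: "recovery" })
  • "生成完整状态报告" (generate a full status report) → resilience_report({ reportType: "full" })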

resilience_recovery

View or update automatic session recovery settings. Use this when the user wants to change the "continue the task" wording after a session failure, switch Chinese/English recovery language, or disable/enable automatic recovery.

Parameters:

  • action: "show" (default) | "update" | "reset"
  • enabled: true / false
  • language: "zh" | "en"
  • prompt: custom prompt overriding localized defaults
  • promptZh: custom Chinese prompt
  • promptEn: custom English prompt
  • ttlMs: queued recovery context TTL
  • cooldownMs: minimum interval between recovery injections per session
  • maxPerSession: maximum automatic injections per session
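
The three limits above (ttlMs, cooldownMs, maxPerSession) interact, so a small sketch may help. The Python below shows one plausible way to decide whether a queued recovery prompt may be injected. The class and function names, the field layout, and the default numbers are all hypothetical; the plugin's real logic is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoverySettings:
    enabled: bool = True
    ttl_ms: int = 600_000       # illustrative value, not a plugin default
    cooldown_ms: int = 60_000   # illustrative value, not a plugin default
    max_per_session: int = 3    # illustrative value, not a plugin default


@dataclass
class SessionState:
    queued_at_ms: int                          # when recovery context was queued
    last_injected_at_ms: Optional[int] = None  # None if nothing injected yet
    injections: int = 0                        # injections so far in this session


def should_inject(settings: RecoverySettings, state: SessionState, now_ms: int) -> bool:
    """Return True only if every limit allows another recovery injection."""
    if not settings.enabled:
        return False
    if now_ms - state.queued_at_ms > settings.ttl_ms:
        return False  # queued context has expired
    if state.injections >= settings.max_per_session:
        return False  # per-session budget is used up
    if (state.last_injected_at_ms is not None
            and now_ms - state.last_injected_at_ms < settings.cooldown_ms):
        return False  # still inside the cooldown window
    return True


if __name__ == "__main__":
    settings = RecoverySettings(ttl_ms=1_000, cooldown_ms=100, max_per_session=2)
    print(should_inject(settings, SessionState(queued_at_ms=0), now_ms=500))
```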

Examples:

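  • "查看会话自动恢复设置" (show the automatic session recovery settings) → resilience_recovery({ action: "show" })
  • "把继续任务话术改成中文" (switch the continue-task prompt to Chinese) → resilience_recovery({ action: "update", language: "zh" })
  • "把继续任务话术改成英文" (switch the continue-task prompt to English) → resilience_recovery({ action: "update", language: "en" })
  • "修改继续任务话术为:任务完成了吗?如果没完成请继续完成任务" (set the continue-task prompt to: "Is the task finished? If not, please continue and finish it") → resilience_recovery({ action: "update", language: "zh", prompt: "任务完成了吗?如果没完成请继续完成任务。" })
  • "关闭会话自动恢复" (turn off automatic session recovery) → resilience_recovery({ action: "update", enabled: false })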

resilience_sessions

View agent session and tool failure recovery records. These are separate from LLM API errors and are stored per OpenClaw instance.

Parameters:

  • action: "summary" (default) | "list"
  • limit: maximum records to return

Examples:

  • "查看会话和工具失败恢复队列" (show the session and tool failure recovery queue) → resilience_sessions({ action: "summary" })
  • "列出最近的会话恢复记录" (list recent session recovery records) → resilience_sessions({ action: "list", limit: 20 })

Error Categories

| Category | Description | Retryable |
|---|---|---|
| rate_limit | 429 Too Many Requests | |
| server_overload | 503 Service Unavailable | |
| timeout | Request timeout | |
| auth_failed | 401/403 Authentication failed | |
| network_error | Connection errors | |
| model_unavailable | Model not found or offline | |
| context_too_long | Context length exceeded | |
| token_parse_error | Tokenizer/token parsing failure | |
| invalid_model_output | Malformed model output / response format failure | |
| session_runtime_error | Non-API session runtime failure | |
| unknown | Unclassified errors | |
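
As a rough illustration of how an error might be mapped onto these categories, the Python sketch below classifies by HTTP status first and then by message keywords. Only the status codes (429, 503, 401/403) come from the table; the keyword lists and the function name classify_error are assumptions for illustration, and the plugin's real classifier may work differently.

```python
from typing import Optional

# Status codes taken from the Error Categories table.
_STATUS_TO_CATEGORY = {
    429: "rate_limit",
    503: "server_overload",
    401: "auth_failed",
    403: "auth_failed",
}

# Keyword hints are assumptions, not the plugin's actual matching rules.
_KEYWORD_HINTS = [
    ("timeout", ("timed out", "timeout")),
    ("context_too_long", ("context length", "maximum context")),
    ("network_error", ("connection reset", "connection refused", "econnreset")),
    ("model_unavailable", ("model not found", "model is offline")),
]


def classify_error(status: Optional[int], message: str = "") -> str:
    """Return one of the category names above, or "unknown" if nothing matches."""
    if status in _STATUS_TO_CATEGORY:
        return _STATUS_TO_CATEGORY[status]
    text = message.lower()
    for category, needles in _KEYWORD_HINTS:
        if any(needle in text for needle in needles):
            return category
    return "unknown"


if __name__ == "__main__":
    print(classify_error(429))
    print(classify_error(None, "Request timed out after 30s"))
```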

Retry Strategies

Strategy Types

  • fixed: Fixed interval between retries (e.g., every 30s)
  • exponential: Exponential backoff (1min → 2min → 4min → 8min...)
  • custom: User-defined interval schedule (e.g., [1min, 3min, 5min, 15min])
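
The three strategy types can be sketched as a single function that expands a strategy into its list of retry delays. This is a minimal Python illustration, not the plugin's retry engine: the function name, the doubling factor (inferred from the 1min → 2min → 4min → 8min example), the optional cap_ms parameter, and the choice to reuse the last custom interval when retries outnumber intervals are all assumptions.

```python
from typing import List, Optional


def retry_delays_ms(
    strategy_type: str,
    max_retries: int,
    intervals_ms: List[int],
    cap_ms: Optional[int] = None,
) -> List[int]:
    """Expand a strategy into the delay (in ms) before each retry attempt."""
    if max_retries < 0 or not intervals_ms:
        raise ValueError("need max_retries >= 0 and at least one interval")
    if strategy_type == "fixed":
        # Same delay before every retry.
        delays = [intervals_ms[0]] * max_retries
    elif strategy_type == "exponential":
        # Double the base interval on each attempt.
        delays = [intervals_ms[0] * (2 ** attempt) for attempt in range(max_retries)]
    elif strategy_type == "custom":
        # Follow the schedule; reuse its last entry if retries outnumber intervals.
        last = len(intervals_ms) - 1
        delays = [intervals_ms[min(attempt, last)] for attempt in range(max_retries)]
    else:
        raise ValueError(f"unknown strategy type: {strategy_type!r}")
    if cap_ms is not None:
        delays = [min(delay, cap_ms) for delay in delays]
    return delays


if __name__ == "__main__":
    print(retry_delays_ms("exponential", 4, [60_000]))
```

With a base of 60000 ms and four retries, the exponential branch reproduces the 1, 2, 4, 8 minute sequence given above.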

Default Strategies

| Name | Type | Max Retries | Intervals | Error Types |
|---|---|---|---|---|
| default-exponential | exponential | 5 | 1m→15m | rate_limit, server_overload, timeout, network_error |
| rate-limit-fixed | fixed | 3 | 30s | rate_limit |
| model-backoff | custom | 6 | 1m→2h | server_overload, model_unavailable |
| network-retry | exponential | 4 | 5s→1m | network_error |
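
To make the relationship between error categories and strategies concrete, the Python sketch below encodes the default strategies as read from the table above and lists which ones cover a given category. The data layout and the function name strategies_for are hypothetical, and how the plugin chooses among several matching strategies is not stated in this document, so the sketch returns all matches.

```python
from typing import Dict, List

# Values transcribed from the Default Strategies table; the dict layout is an assumption.
DEFAULT_STRATEGIES: List[Dict] = [
    {"name": "default-exponential", "type": "exponential", "maxRetries": 5,
     "retryOn": ["rate_limit", "server_overload", "timeout", "network_error"]},
    {"name": "rate-limit-fixed", "type": "fixed", "maxRetries": 3,
     "retryOn": ["rate_limit"]},
    {"name": "model-backoff", "type": "custom", "maxRetries": 6,
     "retryOn": ["server_overload", "model_unavailable"]},
    {"name": "network-retry", "type": "exponential", "maxRetries": 4,
     "retryOn": ["network_error"]},
]


def strategies_for(category: str, strategies: List[Dict] = DEFAULT_STRATEGIES) -> List[str]:
    """Return the names of all strategies whose retryOn list includes the category."""
    return [s["name"] for s in strategies if category in s["retryOn"]]


if __name__ == "__main__":
    for category in ("rate_limit", "model_unavailable", "auth_failed"):
        print(category, "->", strategies_for(category))
```

Categories that no default strategy lists, such as auth_failed, come back with an empty result.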

Data Storage

Per-instance data: ~/.openclaw/plugins/resilience/instances/<instance-id>/ (stats, logs, strategies, tasks). The legacy root layout is still read as default.

~/.openclaw/plugins/resilience/instances/<instance-id>/
├── meta.json
├── stats.json
├── strategies.json
├── recovery-settings.json
├── active-retries.json
├── session-retries.json
├── logs/YYYY-MM-DD.jsonl
└── tasks/
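
Because the daily logs are JSONL files (one JSON object per line), they are easy to inspect with a short script. The Python sketch below counts records per error category in one log file. The field name "category" and the helper name count_by_category are assumptions; the actual record schema is not described in this document, so check a real log line before relying on it.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_category(log_file: Path, field: str = "category") -> Counter:
    """Count JSONL records per value of `field` (hypothetical field name)."""
    counts: Counter = Counter()
    if not log_file.exists():
        return counts
    with log_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partially written or corrupt lines
            if isinstance(record, dict):
                counts[record.get(field, "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Replace <instance-id> and the date with real values from your installation.
    log = (Path.home() / ".openclaw" / "plugins" / "resilience" / "instances"
           / "<instance-id>" / "logs" / "2026-07-02.jsonl")
    print(count_by_category(log).most_common())
```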

Version History

5 versions in total

  • v0.5.1 (current)
    2026-07-02 09:33
  • v0.3.6
    2026-06-30 20:37
  • v0.3.4
    2026-06-12 00:35, security scans: safe, safe
  • v0.3.3
    2026-06-07 06:25, security scans: safe, safe
  • v0.3.0
    2026-06-06 07:08, security scans: safe, safe

Security Scans

  • Tencent Cloud Security (Keen): queued
  • Tencent Cloud Security (Sanbu): queued