🧠 OpenClaw Problem Solver (v6.0-M4) — Master Document

This skill acts as an advanced diagnostic, resolution, and validation engine for any question or bug report related to the OpenClaw framework itself. v6.0 adds four layers: Runtime Health Check (M1), API Key Validation + Resource Monitor (M2), Unified Diagnosis Report + Regression Check (M3), and Interactive Health Dashboard (M4).

🔒 Privacy-First Constraint

All knowledge storage (memory, logs) and final reports must follow these rules:

API Keys: Never store API keys in plain text. Use placeholders or hashes (e.g., sk-).
Private Details: Redact sensitive project names, internal credentials, and user identity info unless explicitly authorized.
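
The API-key rule above could be applied just before anything is written to memory or logs. The following is a minimal sketch, not the skill's actual implementation: the key pattern (`sk-` followed by at least 8 URL-safe characters), the function name `redact_api_keys`, and the 8-character fingerprint length are all assumptions made for illustration.

```python
import hashlib
import re

# Assumed key shape: "sk-" followed by 8 or more URL-safe characters.
API_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def redact_api_keys(text: str) -> str:
    """Replace every matching key with a short SHA-256 fingerprint."""

    def _mask(match: re.Match) -> str:
        digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
        # Keep only a prefix of the hash so the key cannot be recovered,
        # while identical keys still map to the same placeholder.
        return f"sk-<redacted:{digest[:8]}>"

    return API_KEY_PATTERN.sub(_mask, text)
```

Using a fingerprint rather than a fixed placeholder keeps two different keys distinguishable in a report without exposing either of them.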

🎯 When to Use This Skill

Use this skill when the user:

Asks "Why is [feature] not working in OpenClaw?"
Reports a specific bug (e.g., "The gateway tool fails with error X").
Needs guidance on how to implement a specific feature using OpenClaw's architecture or tools.
Wants to know the best practice for a certain task within the OpenClaw ecosystem, but requires verification.

🌟 Recommended Quick Fix Flow (The Golden Path)

For the vast majority of OpenClaw issues, this sequence provides the fastest path to resolution. Always suggest this flow first when a user reports an unspecified problem or bug!

Unified Diagnosis (one-click, v6.0-M3): Run `python scripts/diagnosis_formatter.py`, which collects all three sources (`openclaw doctor`, `runtime_health_check`, and `api_key_validator`) into one severity-sorted report.
Visual Dashboard (v6.0-M4): Run `python scripts/health_dashboard.py --canvas` to render the report as an interactive HTML dashboard (embed with `[embed ref="health_dashboard" height="740"]`).
Save Baseline (v6.0-M3): Run `python scripts/diagnosis_formatter.py --save-baseline` before making any fix.
Resolution Attempt: If the report reveals problems, run `openclaw doctor --fix` or apply the suggested fixes manually.
Regression Check (v6.0-M3): After applying fixes, run `python scripts/diagnosis_formatter.py --compare` to see what was fixed, what is new, and what is unchanged.
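
The regression check in the last step boils down to a three-way set comparison between the saved baseline and the current report. The sketch below illustrates that idea only; the on-disk format used by `diagnosis_formatter.py` is not described here, so it assumes each finding can be reduced to a stable string ID, and `compare_reports` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def compare_reports(
    baseline: Iterable[str], current: Iterable[str]
) -> Dict[str, List[str]]:
    """Classify finding IDs as fixed, new, or unchanged."""
    before, after = set(baseline), set(current)
    return {
        "fixed": sorted(before - after),      # present before, gone now
        "new": sorted(after - before),        # introduced since the baseline
        "unchanged": sorted(before & after),  # still present
    }
```

A non-empty `new` list after a fix is the signal that the fix introduced a regression and may need to be rolled back.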

🚀 The Evolved Workflow (6-Step Cycle + Proactive Layers)

The skill follows a fixed six-step cycle (Steps 0-5, detailed under Standard Workflow), supplemented by proactive layers such as the Gateway Watchdog described next.

🤖 Gateway Watchdog (v6.1) — Proactive Stability Layer

Overview:

A background daemon that periodically polls the Gateway health status. Runs independently from user requests, providing real-time monitoring for anomalies such as Gateway downtime, RPC failures, and configuration drift.

v6.1 Feature Highlights:

| Feature | Description |
| --- | --- |
| 🎯 Real Health Check | Calls `openclaw gateway status --json`, parses `service.runtime.status` + `rpc.ok` |
| 🔇 Noise Filtering | Alerts only after ≥3 consecutive failures; resets after ≥3 consecutive successes |
| 📊 Severity Levels | Four-tier classification (🟢/🟡/🟠/🔴) with auto-escalation |
| 📡 Dual-Channel Alerting | Feishu DM (instant, primary) + WebChat (async thread, secondary) |
| 🔄 Single Instance | Windows Mutex ensures only one daemon runs at a time |
| 📦 Log Rotation | Auto-rotates at 5MB, keeps 3 backup files |
| ⏰ Precise Scheduling | Fixed-minute schedule eliminates cumulative drift |
| 🔐 Hot-Reload Config | Monitors `openclaw.json` changes and reloads automatically |
| 🖥️ Auto-Start | Registers in `HKCU\Run` for auto-launch on user login |
| 👋 Startup Confirmation | Sends status to both channels on startup |
| 🐛 Config Cache Fix | Fixed `load_gateway_config()` returning `token=None` on cache hit (v6.1) |
| ⏱️ Async WebChat | WebChat delivery now runs in a background thread with a 60s timeout for model loading (fixed in v6.1) |
| 📝 Detailed Error Logs | Feishu and WebChat notifications now include full stack traces (fixed in v6.1) |
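
The health check in the first row can be illustrated with a small parser. This is a sketch under stated assumptions: the dotted names `service.runtime.status` and `rpc.ok` are taken to mean nested JSON objects, and the function names are hypothetical. Anything that is not exactly "running with RPC ok" is treated as unhealthy, including malformed output.

```python
import json
from typing import Any, Dict


def _section(parent: Any, key: str) -> Dict[str, Any]:
    """Return parent[key] if it is a dict, otherwise an empty dict."""
    value = parent.get(key) if isinstance(parent, dict) else None
    return value if isinstance(value, dict) else {}


def is_gateway_healthy(status_json: str) -> bool:
    """True only when the service is running and RPC reports ok."""
    try:
        payload = json.loads(status_json)
    except json.JSONDecodeError:
        return False  # unparseable CLI output counts as a failed check
    runtime = _section(_section(payload, "service"), "runtime")
    rpc = _section(payload, "rpc")
    return runtime.get("status") == "running" and rpc.get("ok") is True
```

In a real daemon the input string would come from running `openclaw gateway status --json` as a subprocess; that call is left out so the parsing logic stays testable on its own.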

📊 Severity & Alert Rules:

| Consecutive Failures | Level | Behavior |
| --- | --- | --- |
| 0–1 | 🟢 Level 1 — Normal | Silent, no notification |
| 2 | 🟡 Level 2 — Notice | Silent, continue monitoring |
| 3–4 | 🟠 Level 3 — Warning | Trigger notification (first time) |
| ≥ 5 | 🔴 Level 4 — Critical | Trigger notification + repeat every 5 failures |
| Gateway stopped | 🔴 Level 4 — Critical | Immediate notification |
| 3 consecutive healthy checks after an alert | ✅ Recovered | Send recovery notification |

🚨 Notification Triggers:

Severity escalation (e.g., 1→3): sends alert
First time hitting alert threshold (≥3 consecutive failures): sends alert
At critical level, every 5 failures: sends reminder
System recovery (abnormal→normal for 3 consecutive checks): sends recovery notice
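
The severity table and trigger rules above can be read as a small state machine. The sketch below is one possible interpretation, not the daemon's actual code: it assumes the failure counter resets on any healthy check (it counts consecutive failures), while the alert state itself clears only after three healthy checks in a row. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ALERT_THRESHOLD = 3      # consecutive failures before the first alert
CRITICAL_THRESHOLD = 5   # consecutive failures for Level 4
RECOVERY_THRESHOLD = 3   # consecutive successes before "recovered"


def severity(failures: int, gateway_stopped: bool = False) -> int:
    """Map a consecutive-failure count to a level from 1 to 4."""
    if gateway_stopped or failures >= CRITICAL_THRESHOLD:
        return 4
    if failures >= ALERT_THRESHOLD:
        return 3
    return 2 if failures == 2 else 1


@dataclass
class WatchdogState:
    failures: int = 0
    successes: int = 0
    alerted_level: int = 1  # highest level already notified

    def record(self, healthy: bool, gateway_stopped: bool = False) -> Optional[str]:
        """Return 'alert', 'reminder', 'recovered', or None for one check."""
        if healthy:
            self.failures = 0
            self.successes += 1
            if self.successes >= RECOVERY_THRESHOLD and self.alerted_level >= 3:
                self.alerted_level = 1
                return "recovered"
            return None
        self.successes = 0
        self.failures += 1
        level = severity(self.failures, gateway_stopped)
        if level >= 3 and level > self.alerted_level:
            self.alerted_level = level  # first hit or escalation
            return "alert"
        if level == 4 and self.failures % CRITICAL_THRESHOLD == 0:
            return "reminder"           # repeat every 5 failures
        return None
```

Ten failed checks in a row yield an alert at the 3rd, an escalation alert at the 5th, and a reminder at the 10th; three healthy checks afterwards yield a single recovery notice.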

Prerequisites

Feishu channel configured in OpenClaw — openclaw channels add feishu
Environment variable — Set WATCHDOG_FEISHU_USER_ID to your Feishu open_id:

```powershell
$env:WATCHDOG_FEISHU_USER_ID = "ou_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

Gateway HTTP API (optional, for the WebChat channel). Enable the chat completions endpoint, then restart the Gateway:

```bash
openclaw config set gateway.http.endpoints.chatCompletions.enabled true
openclaw gateway restart
```

> ⚠️ WebChat timeout: Model inference takes ~40s on first load.
> The watchdog uses a background thread with a 60s timeout so it doesn't block the main monitoring loop.
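
The non-blocking delivery described in the note can be sketched with a daemon thread. This is an illustration only: `send` stands in for whatever function actually posts to the Gateway HTTP API (and would carry its own 60s request timeout), and `notify_in_background` is a hypothetical name.

```python
import threading
from typing import Callable, List, Tuple


def notify_in_background(
    send: Callable[[str], str],
    message: str,
    results: List[Tuple[str, str]],
) -> threading.Thread:
    """Run send(message) on a daemon thread and return immediately."""

    def worker() -> None:
        try:
            results.append(("ok", send(message)))
        except Exception as exc:
            # A failed alert is recorded, never raised into the monitor loop.
            results.append(("error", repr(exc)))

    thread = threading.Thread(target=worker, name="webchat-notify", daemon=True)
    thread.start()
    return thread
```

Because the thread is a daemon, a slow model load cannot keep the watchdog process alive at shutdown, and the monitoring loop moves on to the next check as soon as `start()` returns.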

Deployment (v6.1)

Run the Watchdog as a standalone background process:

```powershell
# Start
python scripts\watchdog_monitor.py

# Install auto-start (launches on user login)
python scripts\watchdog_monitor.py --install

# Remove auto-start
python scripts\watchdog_monitor.py --uninstall
```

Or use Start-Process for a hidden window:

```powershell
$py = (Get-Command python).Source
$script = "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\watchdog_monitor.py"
Start-Process -FilePath $py -ArgumentList $script `
    -WindowStyle Hidden `
    -WorkingDirectory "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts"
```

Check process status:

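```powershell
# Get-CimInstance works in both Windows PowerShell 5.1 and PowerShell 7+;
# Get-WmiObject is not available in PowerShell 7+.
Get-CimInstance Win32_Process -Filter "Name like 'python%'" |
    Where-Object { $_.CommandLine -match 'watchdog_monitor' } |
    Select-Object ProcessId, @{n="Start";e={$_.CreationDate}}
```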

Stop the Watchdog:

```powershell
# Find the PID first (see the process status check), then
Stop-Process -Id <PID> -Force
```

View live logs:

```powershell
Get-Content "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\gateway_watchdog.log" -Tail 10 -Wait
```

View state file:

```powershell
Get-Content "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\watchdog_state.json" -Raw | ConvertFrom-Json
```

🔗 Notification Architecture (v6.1)

```text
Watchdog (background daemon, 60s interval)
    │
    ├─ [Channel A — PRIMARY] openclaw message send --channel feishu
    │            → Feishu direct message (ou_xxx)
    │            → Instant delivery, zero token cost
    │            → Includes full error stack traces
    │
    ├─ [Channel B — SECONDARY] Gateway HTTP API (/v1/chat/completions)
    │            → WebChat live session (agent:main:main)
    │            → Async background thread (doesn't block monitoring)
    │            → 60s timeout for model loading (~40s typical)
    │            → token cost: minimal (max_tokens=10)
    │
    └─ [Log]     watchdog_state.json (local check history, last 1440)
                 gateway_watchdog.log (rotating, 5MB)
```

Channel priority: Feishu is now the primary channel (instant, reliable via CLI).

WebChat is secondary (async thread, requires model inference).
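
For the log side of this architecture, the stated rotation policy (5MB per file, 3 backups) maps directly onto Python's standard `RotatingFileHandler`. The sketch below is an assumption about how such a logger could be set up, not a copy of the daemon's code; the logger name and format string are illustrative.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_watchdog_logger(path: str) -> logging.Logger:
    """Create (once) a logger that rotates at 5 MB and keeps 3 backups."""
    logger = logging.getLogger("gateway_watchdog")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = RotatingFileHandler(
            path,
            maxBytes=5 * 1024 * 1024,
            backupCount=3,
            encoding="utf-8",
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With these settings the log directory holds at most four files: the live log plus backups suffixed `.1` to `.3`.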

Dual-Channel Alerting + Autofix Triggers (v6.1)

As of v6.1, Feishu (instant) is the primary notification channel and WebChat (async thread) is the secondary one.

When a WebChat alert arrives (~40s after the error), reply with any of these commands to start diagnosis:
`run autofix self-check`
`check what's wrong with Gateway`
`auto repair`
Feishu messages are the instant primary notification, not an offline backup.
Each alert message includes detailed error context and stack traces.

Workflow Integration + Autofix Linkage

The Watchdog forms a Proactive Stability Layer, independent of the standard diagnostic flow (Steps 0-5). When an anomaly is detected:

a. The daemon logs the event and generates a System Health Warning (SHW) report.

b. It sends a real-time alert (with diagnostic guidance and context JSON).

c. It auto-repairs low-risk known issues (e.g., CLI path problems), then verifies the result.

d. For high-risk operations, it only provides repair suggestions and waits for user confirmation.

🛠️ Auto-Repair Module (v1.0)

Repair Script Library: scripts/auto_repair.py

Matches repair plans based on the diagnostic context from Watchdog alerts:

| Issue | Match Condition | Repair Action | Risk |
| --- | --- | --- | --- |
| Gateway stopped | `status: stopped` | Restart Gateway | 🟡 Needs confirmation |
| RPC connection failed | `rpc_ok: false` | Restart Gateway | 🟡 Needs confirmation |
| CLI unavailable | `status: cli_error` | Check installation path | 🟢 Auto-execute |
| HTTP unreachable | `status: unreachable` | Check port + restart | 🟡 Needs confirmation |
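
The matching table above can be expressed as a simple lookup on the diagnostic context. This is a sketch rather than the contents of `scripts/auto_repair.py`: it assumes the context is a dict with `status` and `rpc_ok` keys, and the precedence (status checks before the RPC check) is an arbitrary choice made for the example.

```python
from typing import Any, Dict, NamedTuple, Optional


class RepairPlan(NamedTuple):
    issue: str
    action: str
    auto_execute: bool  # False means "needs user confirmation"


def match_repair(context: Dict[str, Any]) -> Optional[RepairPlan]:
    """Pick a repair plan for a Watchdog alert context, if one applies."""
    status = context.get("status")
    if status == "stopped":
        return RepairPlan("Gateway stopped", "Restart Gateway", False)
    if status == "cli_error":
        return RepairPlan("CLI unavailable", "Check installation path", True)
    if status == "unreachable":
        return RepairPlan("HTTP unreachable", "Check port + restart", False)
    if context.get("rpc_ok") is False:
        return RepairPlan("RPC connection failed", "Restart Gateway", False)
    return None  # nothing matched: fall back to manual diagnosis
```

Only the CLI path check is marked `auto_execute`, mirroring the single 🟢 row in the table; every restart stays behind a confirmation.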

Repair Verification Loop:

After auto-repair, wait 3 seconds then re-run health check
Verification passed → send "Auto-repair succeeded" confirmation
Verification failed → send "Still needs manual diagnosis" escalation
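
The verification loop above is short enough to sketch directly. The `repair` and `health_check` callables are placeholders for the real repair action and health probe, and the `sleep` parameter is injected only so the wait can be skipped in tests; none of these names come from the skill itself.

```python
import time
from typing import Callable


def repair_and_verify(
    repair: Callable[[], None],
    health_check: Callable[[], bool],
    wait_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run a repair, wait, re-check health, and return the message to send."""
    repair()
    sleep(wait_seconds)  # give the Gateway time to come back up
    if health_check():
        return "Auto-repair succeeded"
    return "Still needs manual diagnosis"
```

Returning the message instead of sending it keeps the function independent of the notification channels.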

Health Trend Tracking:

watchdog_state.json retains the last 1440 check records (24 hours)
Each record includes: timestamp, health status, severity level, source
Trend data can be visualized via Canvas health dashboard
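
A bounded history of this kind is commonly kept in a fixed-length deque, which drops the oldest record automatically. The sketch below assumes that approach and uses illustrative field names (`timestamp`, `healthy`, `severity`, `source`); the actual layout of `watchdog_state.json` may differ.

```python
from collections import deque
from typing import Deque, Dict

MAX_RECORDS = 1440  # 24 hours at one check every 60 seconds


def new_history() -> Deque[Dict[str, object]]:
    """Create an empty history that keeps only the newest MAX_RECORDS."""
    return deque(maxlen=MAX_RECORDS)


def add_record(
    history: Deque[Dict[str, object]],
    timestamp: float,
    healthy: bool,
    severity: int,
    source: str,
) -> None:
    """Append one check result; the oldest entry is evicted when full."""
    history.append(
        {"timestamp": timestamp, "healthy": healthy,
         "severity": severity, "source": source}
    )


def uptime_ratio(history: Deque[Dict[str, object]]) -> float:
    """Fraction of retained checks that were healthy (1.0 if empty)."""
    if not history:
        return 1.0
    return sum(1 for r in history if r["healthy"]) / len(history)
```

A ratio like this is one example of a trend value a dashboard could plot from the retained records.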

Current Status (v6.1)

✅ Real health check — service.runtime.status = running + rpc.ok = true
✅ Noise filtering — ≥3 failures to trigger, ≥3 successes to reset
✅ Severity levels — Four-tier (🟢/🟡/🟠/🔴) with auto-escalation
✅ Feishu channel (PRIMARY) — openclaw message send --channel feishu, zero token cost, instant delivery
✅ WebChat channel (SECONDARY) — Gateway HTTP API /v1/chat/completions, async background thread, 60s timeout
✅ Single instance — Windows Mutex prevents duplicates
✅ Log rotation — 5MB auto-rotate, 3 backups
✅ Precise scheduling — Fixed-minute schedule, no drift
✅ Hot-reload config — Watches openclaw.json for changes
✅ Auto-start — HKCU\Run registry, launches on user login
✅ Startup confirmation — Sends status to both channels on start
✅ --status command — Shows real-time state and exits cleanly (no longer starts daemon by accident)
✅ Stale process cleanup — Auto-kills orphaned --status processes on daemon startup
✅ Config cache fix — load_gateway_config() no longer returns token=None on subsequent calls
✅ Detailed error logging — Full stack traces in all notification channels
✅ Async WebChat delivery — Background thread prevents blocking main monitor loop
📁 State file: scripts/watchdog_state.json
📁 Log file: scripts/gateway_watchdog.log
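
The "no drift" scheduling item can be illustrated by computing each wake-up time from the clock rather than sleeping a fixed 60 seconds after every check. This sketch assumes "fixed-minute schedule" means checks are aligned to interval boundaries; the function names are hypothetical.

```python
import math


def next_tick(now: float, interval: float = 60.0) -> float:
    """Return the next interval boundary strictly after `now` (in seconds)."""
    return (math.floor(now / interval) + 1) * interval


def sleep_needed(now: float, interval: float = 60.0) -> float:
    """Seconds to sleep so the next check lands on the boundary."""
    return next_tick(now, interval) - now
```

Because the target is recomputed from the current time on every cycle, a check that takes 5 seconds shortens the following sleep instead of pushing every later check 5 seconds further out.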

Standard Workflow (6-Step Cycle + Proactive Layers)

This skill strictly follows these steps in sequence, enhanced by proactive layers:

Step 0: Resource Pre-check & Cost Management (New) — Starting Point

Before any resource-intensive external search or service call, proactively check API quotas, rate limits, and budget consumption for the current active session. If quota-low alerts or known rate-limit thresholds are hit, pause all execution steps and notify the user with a clear "resource warning," requesting they wait or switch to a low-cost / local alternative.
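
The pre-check described above amounts to a guard that runs before Steps 1-5. The sketch below shows one way to express it; the `QuotaSnapshot` fields, the 10% low-quota threshold, and the warning wording are all assumptions, since the document does not specify where quota data comes from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaSnapshot:
    used: int
    limit: int
    rate_limited: bool = False


def resource_warning(q: QuotaSnapshot, low_ratio: float = 0.10) -> Optional[str]:
    """Return a warning message if execution should pause, else None."""
    if q.rate_limited:
        return "resource warning: rate limit hit; wait or switch to a local alternative"
    remaining = q.limit - q.used
    # A non-positive limit is treated as "no quota available".
    if q.limit <= 0 or remaining / q.limit <= low_ratio:
        return "resource warning: quota low; wait or switch to a low-cost alternative"
    return None
```

A `None` result means the workflow may proceed to Step 1; any string is shown to the user and halts the remaining steps.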

Step 1: Primary Search (See `docs/MODULE_02_SearchChain.md` — Step 1)

Search the official documentation (docs.openclaw.ai) for official solutions
Gather context information related to the problem
Extract key error messages and configuration status

Step 2: Backup Search (See `docs/MODULE_02_SearchChain.md` — Step 2)

If the official docs don't provide an answer, search GitHub Issues
Look for community-reported problems and solutions
Collect code verification requirements or pattern-matching information

Step 3: Analysis & Decision (See `docs/MODULE_03_ValidationAction.md` — Step 3)

Choose the best action path based on search results
Perform evidence chain analysis (L1) to evaluate solution reliability
Decide between direct answer, code verification, or contextual inquiry

Step 4: Validation & Action (v5.0 Enhanced) (See `docs/MODULE_03_ValidationAction.md` — Step 4 + `docs/MODULE_03_Enhancement_Reports.md`)

Execute validation (MRE) or propose a contextual inquiry
Generate an interactive diagnosis report (if MRE fails)
✅ Three-step confirmation before fixes: Before running any command with system-modifying or wide-ranging effects (e.g., openclaw doctor --fix, exec/write), follow these safety steps:

Problem location & explanation: Explain the diagnosis result and the core issue to be fixed
Scope confirmation: Ask about and record the specific target or runtime environment (e.g., "This change will only affect the local development config. Do you agree?")
Rollback plan: Provide an executable one-click rollback command. Only proceed after the user agrees via /approve

Step 5: Finalization & Memory Update (See `docs/MODULE_04_Finalization.md`)

Save facts, lessons learned, and update state
Trigger L2 Hot-Start Query and L3 Skill Creation Suggestions

> 💡 Golden Path (Recommended Flow): For most OpenClaw issues, the fastest resolution path is: openclaw doctor → openclaw doctor --fix

🖼️ Diagnosis Report Visualization (v5.0)

When MRE validation fails, generate an interactive diagnostic report using canvas.snapshot() with:

Visual risk flags (🔴/🟠/🟢)
Evidence chain diagram (Doc vs GH comparison)
Exec result status codes highlighted
Rollback command code block display

🧠 Error Log Intelligent Summary (ELIS — v5.0)

When MRE fails, use LLM-powered analysis to extract root causes from exec output:

Core Issue: One-sentence summary
Possible Causes: 2–3 bullet points
Recommended Fix: Specific command(s)
Risk Level + Confidence Score
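
The four ELIS fields listed above suggest a small structured record. The sketch below is illustrative and not taken from `elis_helper.py`: the class name, the 0.0-1.0 confidence scale, and the rendered layout are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ElisSummary:
    core_issue: str              # one-sentence summary
    possible_causes: List[str]   # 2 to 3 entries
    recommended_fix: List[str]   # specific command(s)
    risk_level: str
    confidence: float            # assumed scale: 0.0 to 1.0

    def __post_init__(self) -> None:
        if not 2 <= len(self.possible_causes) <= 3:
            raise ValueError("expected 2 or 3 possible causes")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

    def render(self) -> str:
        """Format the summary as plain text for a report or alert."""
        lines = [f"Core Issue: {self.core_issue}", "Possible Causes:"]
        lines += [f"- {cause}" for cause in self.possible_causes]
        lines.append("Recommended Fix: " + "; ".join(self.recommended_fix))
        lines.append(
            f"Risk Level: {self.risk_level} (confidence {self.confidence:.0%})"
        )
        return "\n".join(lines)
```

Validating the shape at construction time catches an LLM response that omits causes or returns an out-of-range score before it reaches the user.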

📚 Modules & Deep Dives

Consult the following categorized sub-documents for detailed process explanations:

📁 `docs/` — Core Module Documentation

MODULE_01_PreCheck.md: Problem pre-check, context collection, and security scanning.
MODULE_02_SearchChain.md: Search strategy (Docs → GitHub) with evidence chain analysis (L1).
MODULE_03_ValidationAction.md: Decision-making based on search results, choosing between direct answer, code verification, or inquiry.
MODULE_04_Finalization.md: Finalization: memory storage, lesson learning, and state updates, including L2 Hot-Start Query and L3 Skill Creation Suggestions.

📁 `docs/enhancement/` — v5.0 Enhancement Features

MODULE_03_Enhancement_Reports.md: v5.0 new modules — Diagnosis Report Visualization (DRE) + Error Log Intelligent Summary (ELIS).

📁 `docs/tutorials/` — Usage Examples

EXAMPLE_usage.md: Detailed code examples and usage scenarios.
QUICK_START_v5.0.md: v5.0 quick-start guide covering environment setup, workflow, and best practices.

📁 `docs/reports/` — Summary Reports

AUTOFIX_V5.0_SUMMARY.md: v5.0 completion report and feature summary.
CHANGES_v5.0.md: Complete change log from v4.5 to v5.0.
VERIFICATION_FINAL.md: Final integrity verification report.

📁 `scripts/` — Python/JS Tools

elis_helper.py: ELIS error log analysis helper (LLM-powered error analyzer).
canvas_report_generator.py: Canvas diagnosis report generator (HTML template rendering + URL registration).
auto_repair.py: Auto-repair library — matches diagnostic context to repair plans.
watchdog_monitor.py: Gateway Watchdog daemon — background health monitoring + dual-channel alerting.
requirements.txt: Python dependencies (pywin32).

This file is the master skill document. It defines the complete problem-solving blueprint and integrates all capability layers.
