
Multi-Agent Status

Cross-agent health monitoring for multi-host OpenClaw deployments. Each agent pushes structured status reports (JSON) to a central location. A PM/monitoring...
agenthyjack
Uncategorized clawhub v1.0.1 1 version 100000 Key: not required
#dashboard#health#latest#monitoring#multi-agent#ops#status


Multi-Agent Status Reporter

Overview

In a multi-agent OpenClaw deployment, each agent monitors itself but has blind spots. This skill solves that by having every agent push structured health reports to a shared location, where a monitoring agent reads and alerts on issues.

Architecture

Agent A (Host 1) --push--> /shared/agent-status/agent-a.json
Agent B (Host 2) --push--> /shared/agent-status/agent-b.json
Agent C (Host 3) --push--> /shared/agent-status/agent-c.json
                                    ↓
                          Monitor Agent reads all
                          → alerts on failures
                          → updates dashboard

What Gets Reported

Each agent pushes a JSON status report containing:

  • Gateway health — is the RPC probe passing?
  • Cron status — total crons, how many erroring, which ones
  • Active projects — what the agent is working on
  • Timestamp — so the monitor knows if a report is stale (agent might be down)

Setup

Scripts available in the Collective Skills repo

1. Create shared status directory

On your central/shared host:

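mkdir -p /path/to/agent-status
# 777 is world-writable; tighten it (for example 770 with a shared group) if all writers share a group
chmod 777 /path/to/agent-status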

Scripts are in references/

2. Configure each agent

Copy the script from references/agent-status-report.sh to your preferred location and make it executable:

#!/bin/bash
# agent-status-report.sh: write this agent's health report as JSON
AGENT_NAME="my-agent"
STATUS_DIR="/path/to/agent-status"
REPORT="$STATUS_DIR/$AGENT_NAME.json"

# Get gateway status
GW_STATUS=$(openclaw gateway status 2>&1)
if echo "$GW_STATUS" | grep -q "RPC probe: ok"; then
    GATEWAY="healthy"
elif echo "$GW_STATUS" | grep -q "RPC probe: failed"; then
    GATEWAY="failed"
else
    GATEWAY="unknown"
fi

# Count crons and cron errors (lines containing "ok" or "error").
# grep -c already prints 0 when nothing matches but exits non-zero,
# so use "|| true"; "|| echo 0" would emit a second 0 and break the JSON.
CRON_LIST=$(openclaw cron list 2>&1)
TOTAL=$(echo "$CRON_LIST" | grep -c "ok\|error" || true)
ERRORS=$(echo "$CRON_LIST" | grep -c "error" || true)

# Write to a temp file, then rename, so the monitor never reads a partial report
cat > "$REPORT.tmp" << EOF
{
  "agent": "$AGENT_NAME",
  "timestamp": "$(date -Iseconds)",
  "gateway": "$GATEWAY",
  "crons": {
    "total": $TOTAL,
    "errors": $ERRORS
  }
}
EOF
mv "$REPORT.tmp" "$REPORT"

echo "Status report pushed at $(date)"

For remote agents (different hosts), use SCP to push:

# Add to the end of the script:
scp "$REPORT" user@central-host:/path/to/agent-status/
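A plain scp writes directly to the final filename, so the monitor could in principle read a report while it is still being transferred. One way around that, sketched here under assumptions, is to upload under a temporary name and rename on the central host. The function name push_report and the REMOTE and REMOTE_DIR variables are illustrative and not part of the skill; the host and path are the same placeholders used above.

```shell
#!/bin/bash
# Sketch: upload under a temporary name, then rename on the central host,
# so the monitor never sees a half-written report.
# REMOTE and REMOTE_DIR are placeholders, not values defined by the skill.
REMOTE="user@central-host"
REMOTE_DIR="/path/to/agent-status"

push_report() {
    local report="$1"
    local name
    name="$(basename "$report")"
    # Upload as <name>.tmp first; the monitor only reads *.json files.
    scp -q "$report" "$REMOTE:$REMOTE_DIR/$name.tmp" || return 1
    # A rename within one directory is atomic on POSIX filesystems.
    ssh "$REMOTE" "mv '$REMOTE_DIR/$name.tmp' '$REMOTE_DIR/$name'" || return 1
}

# Usage (at the end of agent-status-report.sh):
#   push_report "$REPORT" || echo "push failed" >&2
```

This costs one extra SSH round trip per push, which is negligible at a 4-hour interval.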

3. Add cron job (every 4 hours recommended)

openclaw cron add \
  --name "agent-status-report" \
  --every "4h" \
  --message "Run the agent status report script" \
  --no-deliver

4. Configure the monitor agent

The monitoring agent's HEARTBEAT.md should include:

## Agent Status Check
1. Read all files in /path/to/agent-status/*.json
2. For each agent:
   - Is gateway healthy? If "failed" → alert immediately
   - Any cron errors? If errors > 0 → ping the agent
   - Is timestamp recent (within 8 hours)? If stale → agent may be down
3. Update DASHBOARD.md with findings

Alert Thresholds

| Condition | Action |
|-----------|--------|
| Gateway failed | Alert human immediately |
| Cron errors ≥ 2 | Ping owning agent for ETA on fix |
| Report stale (>8h) | Ping agent; it might be down |
| Report missing | Agent never pushed; check that it is configured |
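The heartbeat checks and the thresholds above can also be sketched as a shell function, for cases where you want a deterministic pre-check alongside the monitor agent. This is a minimal sketch under assumptions: it requires GNU date (for date -d), it parses the exact report layout written by agent-status-report.sh with sed instead of a JSON parser, and the names check_reports, str_field, num_field, STALE_SECS and CRON_ERR_MIN are illustrative. The cron threshold follows the table (2); set CRON_ERR_MIN=1 if you prefer the stricter "any error" rule from the heartbeat checklist.

```shell
#!/bin/bash
# Sketch of the monitor-side checks. Assumes GNU date and reports in the
# layout written by agent-status-report.sh (one "key": value per line).
STALE_SECS=$((8 * 3600))   # stale threshold from the table: 8 hours
CRON_ERR_MIN=2             # cron-error threshold from the table

# str_field FILE KEY / num_field FILE KEY: print the first value stored under KEY
str_field() { sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1; }
num_field() { sed -n "s/.*\"$2\": *\([0-9][0-9]*\).*/\1/p" "$1" | head -n 1; }

# check_reports DIR NOW_EPOCH [EXPECTED_AGENT...]
check_reports() {
    local dir="$1" now="$2" f agent gw errors ts ts_epoch
    shift 2
    # Report missing: an expected agent has never pushed a file.
    for agent in "$@"; do
        [ -e "$dir/$agent.json" ] || echo "MISSING $agent: never pushed, check configuration"
    done
    for f in "$dir"/*.json; do
        [ -e "$f" ] || continue            # directory has no reports yet
        agent="$(basename "$f" .json)"
        gw="$(str_field "$f" gateway)"
        errors="$(num_field "$f" errors)"
        ts="$(str_field "$f" timestamp)"
        ts_epoch=0                          # unparseable timestamps count as stale
        [ -n "$ts" ] && ts_epoch="$(date -d "$ts" +%s 2>/dev/null || echo 0)"

        if [ "$gw" = "failed" ]; then echo "ALERT $agent: gateway failed"; fi
        if [ "${errors:-0}" -ge "$CRON_ERR_MIN" ]; then echo "PING $agent: $errors cron errors"; fi
        if [ $((now - ts_epoch)) -gt "$STALE_SECS" ]; then echo "STALE $agent: last report ${ts:-unknown}"; fi
    done
    return 0
}

# Usage: check_reports /path/to/agent-status "$(date +%s)" agent-a agent-b agent-c
```

Passing the current time as an argument keeps the function easy to test with fixed timestamps.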

Example Dashboard

# Agent Health Dashboard
*Last updated: 2026-04-02 14:00*

| Agent | Host | Gateway | Crons | Errors | Last Report |
|-------|------|---------|-------|--------|-------------|
| Hyjack | OPT1 | ✅ healthy | 16 | 2 | 10m ago |
| Rook | PC-147 | ✅ healthy | 9 | 0 | 2h ago |
| Dozer | Vigo | ✅ healthy | 3 | 0 | 1h ago |

⚠️ Hyjack: 2 cron errors (Research Scout, sunday-self-compassion)
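A dashboard in roughly this shape could be generated from the report files with a short script. The sketch below is illustrative only: render_dashboard, str_field and num_field are hypothetical names, the Host column is left out because the report format shown earlier has no host field, and the Last Report column prints the raw timestamp rather than a relative age.

```shell
#!/bin/bash
# Sketch: render dashboard rows from the reports in a status directory.
# Assumes the report layout written by agent-status-report.sh.
str_field() { sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1; }
num_field() { sed -n "s/.*\"$2\": *\([0-9][0-9]*\).*/\1/p" "$1" | head -n 1; }

render_dashboard() {
    local dir="$1" f total errors
    echo "# Agent Health Dashboard"
    echo "*Last updated: $(date '+%Y-%m-%d %H:%M')*"
    echo
    echo "| Agent | Gateway | Crons | Errors | Last Report |"
    echo "|-------|---------|-------|--------|-------------|"
    for f in "$dir"/*.json; do
        [ -e "$f" ] || continue   # no reports yet
        total="$(num_field "$f" total)"
        errors="$(num_field "$f" errors)"
        # Reports without cron counts (e.g. from the Windows script) show "?"
        printf '| %s | %s | %s | %s | %s |\n' \
            "$(basename "$f" .json)" \
            "$(str_field "$f" gateway)" \
            "${total:-?}" "${errors:-?}" \
            "$(str_field "$f" timestamp)"
    done
}

# Usage: render_dashboard /path/to/agent-status > DASHBOARD.md
```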

Windows Support

For Windows agents, copy references/agent-status-report.ps1 to your preferred location. The script looks like this:

# agent-status-report.ps1
$timestamp = Get-Date -Format "o"
$tempFile = "$env:TEMP\agent-status.json"

# Gateway check
$gwStatus = openclaw gateway status 2>&1 | Out-String
if ($gwStatus -match "RPC probe: ok") { $gw = "healthy" }
elseif ($gwStatus -match "RPC probe: failed") { $gw = "failed" }
else { $gw = "unknown" }

# Build report
$json = @{
    agent = "my-agent"
    timestamp = $timestamp
    gateway = $gw
} | ConvertTo-Json

# Write UTF-8 without a BOM. In Windows PowerShell 5.1, "Out-File -Encoding utf8"
# prepends a BOM, which some JSON parsers reject.
[System.IO.File]::WriteAllText($tempFile, $json, (New-Object System.Text.UTF8Encoding $false))

# Push to central host
scp $tempFile user@central-host:/path/to/agent-status/my-agent.json

Notes

  • SSH key auth required for cross-host pushes. Set up passwordless SSH first.
  • The monitor agent should be on the same host as the status directory for local reads.
  • Reports are intentionally small (<1KB) to minimize storage and transfer overhead.
  • Stale detection (>8h) assumes 4h push interval. Adjust threshold if you change interval.
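The 8-hour stale threshold is twice the 4-hour push interval, which tolerates one missed report before flagging an agent. If you keep that 2x ratio, the threshold can be derived from the interval rather than hard-coded; the variable names below are illustrative.

```shell
# Sketch: derive the stale threshold from the push interval (2x ratio assumed).
PUSH_INTERVAL_HOURS=4
STALE_HOURS=$((PUSH_INTERVAL_HOURS * 2))
STALE_SECS=$((STALE_HOURS * 3600))
echo "push every ${PUSH_INTERVAL_HOURS}h, stale after ${STALE_HOURS}h (${STALE_SECS}s)"
```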

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-03 09:10, security scans: safe, safe

Security Check

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
