name: nas-dashboard
description: >-
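🏠 NAS Dashboard v3: an environment-aware operations dashboard.
Who it is for: self-hosted NAS / HomeLab / home-server users who run ZFS + Docker + a pile of self-hosted services and want a daily health report that shows problems at a glance and tells them what to fix.
What it does: collects the full system picture in one command (ZFS / disk SMART / Docker / Frigate cameras / GPU / UPS / security audit), assesses it with environment awareness (LAN vs. public-internet grading), rates findings on three severity levels (🔴 urgent, 🟡 attention, 🟢 informational), and attaches the assessment reason plus a runnable fix command to every alert.
Best environment: Ubuntu/Debian + ZFS + Docker; works best with a UPS and Frigate NVR. Supports a daily Telegram push.
In one sentence: no need to watch the server all day; check each morning for any 🔴 and, if there is one, follow the fix.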
Triggers: "nas status", "server status", "system dashboard", "文本仪表盘", "服务器状态", "系统健康".
Exception-driven, alert-first text dashboard for NAS/HomeLab monitoring. Prioritises problems over green lights: if everything is healthy you get a one-liner; if something is wrong you get a surgical report with severity ratings and actionable fix commands.
Platform: Linux (Ubuntu/Debian tested). Partial support: any Linux with ZFS. macOS/Windows: most sections unavailable (ZFS, SMART, systemd, apt, iostat, sensors are Linux-only).
YYYY-MM-DD HH:MM or XdXh relative.
export ZPOOL="tank" # default: auto-detect first pool
export DISK_LIST="sda sdb sdc" # default: auto-detect all /dev/sd?
export FRIGATE_CAM_MAP="cam_d82e8e00:Living Room,cam_ae7e3010:Front Door,cam_a24a20c0:Garage"
export UPS_NAME="ups@localhost" # default: ups@localhost
bash scripts/collect.sh
openclaw cron add \
--name "NAS Dashboard" \
--schedule "0 9 * * *" \
--agent main \
--timeout 180 \
--delivery "announce:telegram:YOUR_CHAT_ID" \
--prompt "Run nas-dashboard skill: collect and format the dashboard report, then send to Telegram."
Run scripts/collect.sh. Sections: SYSTEM, ZFS, VDEV_DISK (disk→pool mapping), DISKS (incl. SMART realloc/pending/udma), DISKIO, DOCKER, FRIGATE, GPU, NETWORK, PROCESSES, SERVICES, LOGS, SHARES, SECURITY, UPDATES, BOOT, UPS, TIMESHIFT.
Scan ALL collected data against these thresholds and classify severity:
| Condition | Scope | Severity |
|---|---|---|
| Pool health ≠ ONLINE | any pool | ❌ CRITICAL |
| Scrub errors > 0 | any pool | ❌ CRITICAL |
| ZFS_EVENT present | any event | ⚠️ WARNING |
| Disk health ≠ PASSED | any disk | ❌ CRITICAL |
| Disk realloc > 0 | any disk | ❌ CRITICAL |
| Disk pending > 0 | any disk | ⚠️ WARNING |
| Disk udma_crc > 100 | any disk | ⚠️ WARNING |
| Disk temp > 45°C | any disk | ⚠️ WARNING |
| Disk temp > 55°C | any disk | ❌ CRITICAL |
| Disk r_await > 20ms or w_await > 20ms | any disk | ⚠️ WARNING |
| Docker container Down/Unhealthy | any ctr | ⚠️ WARNING |
| Frigate camera skip > 1.0 | any camera | ⚠️ WARNING |
| Frigate camera fps = 0 | any camera | ❌ CRITICAL |
| Frigate storage > 80% | any storage | ⚠️ WARNING |
| Frigate storage > 95% | any storage | ❌ CRITICAL |
| Timeshift count > 100 | ts active | ⚠️ WARNING |
| Timeshift count > 300 | ts active | ❌ CRITICAL |
| CPU temp > 70°C | | ⚠️ WARNING |
| CPU temp > 85°C | | ❌ CRITICAL |
| GPU temp > 75°C | | ⚠️ WARNING |
| GPU temp > 85°C | | ❌ CRITICAL |
| ARC hit rate < 90% | | ⚠️ WARNING |
| ZFS capacity > 80% | | ⚠️ WARNING |
| ZFS capacity > 90% | | ❌ CRITICAL |
| Root disk > 85% | | ⚠️ WARNING |
| Root disk > 95% | | ❌ CRITICAL |
| Failed logins > 0 | today | ⚠️ WARNING |
| Failed systemd services | any | ⚠️ WARNING |
| UPS ≠ OL (not Online) | | ⚠️ WARNING |
| UPS battery < 50% | | ⚠️ WARNING |
| OOM events present | | ⚠️ WARNING |
| SSH pass auth on | sshd | ⚠️ WARNING |
| SSH root login on | sshd | ⚠️ WARNING |
| Firewall inactive | ufw/iptables | ⚠️ WARNING |
| Container image >6mo | Docker | ℹ️ INFO |
| APT updates available | | ℹ️ INFO |
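As a minimal sketch of how the paired warning/critical rows above could be evaluated, the helper below compares one reading against two limits. The name `classify` and its argument order are illustrative assumptions, not part of `scripts/collect.sh`, and it only covers metrics where a higher value is worse.

```shell
#!/bin/sh
# classify VALUE WARN CRIT  (hypothetical helper)
# Prints CRITICAL, WARNING or OK for "higher is worse" metrics.
# awk does the comparison so decimal readings such as 45.5 also work.
classify() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 > c + 0)      print "CRITICAL"
        else if (v + 0 > w + 0) print "WARNING"
        else                    print "OK"
    }'
}

classify 47 45 55   # disk temp 47°C against the 45/55 limits
classify 91 80 90   # ZFS capacity 91% against the 80/90 limits
```

Metrics where a lower value is worse (ARC hit rate, UPS battery) would need the comparisons reversed.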
If 0 alerts: skip the 🚨 Risk Alerts section entirely.
If alerts exist: build a 🚨 Risk Alerts section listing every alert, grouped by severity (❌ first, then ⚠️, then ℹ️). Format:
🚨 Risk Alerts
❌ {description}
⚠️ {description}
Sort by severity: ❌ CRITICAL → ⚠️ WARNING → ℹ️ INFO. One line per alert. Examples:
❌ sde (tank) Reallocated_Sector_Ct: 5 — bad sectors growing, replace the disk immediately
⚠️ cam_a24a20c0 skip:2.4 — decode frame drops, check the GPU or lower the resolution
⚠️ nextcloud (Docker) unhealthy — container abnormal
⚠️ nfs-server (service) inactive — NFS service not running
⚠️ Timeshift: 430 snaps — too many, clean up (>100)
⚠️ sde udma_crc:29 — SATA link errors
ℹ️ 0 APT updates pending
Correlation rules (apply these when building alerts):
(suspected hung process, restart the container)
(GPU encoding bottleneck)
Use this compact layout. Omit sections entirely if no data or all is healthy and not noteworthy.
╭──────────────────────────────────╮
│ 🏠 NAS Dashboard · {YYYY-MM-DD (weekday)} │
╰──────────────────────────────────╯
Then sections in order:
🖥 SYSTEM — one line:
🖥 {hostname} · {OS_short} · up {uptime_simplified} · load:{load_1min}
CPU:{cpu_used%} ██████░░░░ · RAM:{mem_used}/{mem_total} ({mem_pct}) · / {root_used}/{root_total} ({root_pct})
CPU:{cpu_temp}°C · Mobo:{hottest_mobo_temp}°C
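The `{uptime_simplified}` and `{OS_short}` fields could be produced with two small `sed` filters. This is a sketch: the function names are hypothetical, and the handling of a leading `up ` and of minutes is an assumption that goes beyond the week/day/hour example this section gives.

```shell
#!/bin/sh
# simplify_uptime "1 week, 2 days, 18 hours"  (hypothetical helper)
# Shortens each unit to one letter, then drops commas and spaces.
simplify_uptime() {
    printf '%s\n' "$1" | sed -E \
        -e 's/^up //' \
        -e 's/ *weeks?/w/g' \
        -e 's/ *days?/d/g' \
        -e 's/ *hours?/h/g' \
        -e 's/ *minutes?/m/g' \
        -e 's/[, ]//g'
}

# short_os "Ubuntu 24.04.4 LTS"  (hypothetical helper)
# Keeps the distro name plus major.minor; other strings pass through unchanged.
short_os() {
    printf '%s\n' "$1" | sed -E 's/^([^ ]+ [0-9]+\.[0-9]+).*/\1/'
}

simplify_uptime "1 week, 2 days, 18 hours"   # prints 1w2d18h
short_os "Ubuntu 24.04.4 LTS"                # prints Ubuntu 24.04
```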
uptime_simplified: convert "1 week, 2 days, 18 hours" → "1w2d18h"
OS_short: "Ubuntu 24.04" from "Ubuntu 24.04.4 LTS"
█ count
🗄 ZFS — pool summary line + ARC line:
🗄 {pool} [{health_emoji} {health}] · {alloc}T/{size}T ({cap}%) ██████░░░░ · frag:{frag}%
ARC:{arc_size}GiB/{arc_max}GiB · hit:{arc_hit}% · Scrub:{scrub_summary}
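The ten-cell bar used in the SYSTEM and ZFS lines can be drawn with a short loop. The sketch below assumes whole-number percentages and rounds to the nearest cell; the exact rule for the █ count is not spelled out here, so that rounding and the name `bar` are assumptions.

```shell
#!/bin/sh
# bar PCT  (hypothetical helper): PCT is an integer from 0 to 100.
bar() {
    filled=$(( ($1 + 5) / 10 ))          # assumed rule: round to the nearest of 10 cells
    if [ "$filled" -gt 10 ]; then filled=10; fi
    if [ "$filled" -lt 0 ]; then filled=0; fi
    out=""
    i=0
    while [ "$i" -lt 10 ]; do
        if [ "$i" -lt "$filled" ]; then
            out="${out}█"
        else
            out="${out}░"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

bar 62   # prints ██████░░░░
```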
· L2ARC:{l2_size}GiB hit:{l2_hit}%
█ count
Snaps:{count} latest:{yyyy-mm-dd}
💾 DISKS — fixed-width column layout, one line per disk.
Use a mini-table with │ separators so all status emoji align vertically:
💾 DISKS ───────────────────────────────────
sda (tank) │ W1003ABYZ-011FA0 │ 931G │ 42°C │ 10909h │ ✅
sdc (tank) │ WD10PURX-78D85Y0 │ 931G │ 39°C │ 6804h │ r_await:10ms ✅
sde (tank) │ ST1000DM003-1ER16 │ 931G │ 36°C │ 10223h │ udma:29 ✅
Column widths (pad/crop each field to fit):
| Col | Field | Width | Align |
|---|---|---|---|
| 1 | {disk} ({pool_role}) | 9 | left |
| 2 | model name | 18 | left, truncate if longer |
| 3 | size | 6 | right |
| 4 | temp | 5 | right |
| 5 | hours | 7 | right |
| 6 | alerts + status | 14 | right |
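The pad/crop rule in the table above can be sketched with one `awk` call per field. The helper name `fit` is hypothetical; the widths passed to it come from the table. Width counting for non-ASCII text (for example `°C` or emoji) depends on the awk implementation and locale, so this sketch is only reliable for ASCII fields.

```shell
#!/bin/sh
# fit TEXT WIDTH ALIGN  (hypothetical helper)
# Crops TEXT to WIDTH characters, then pads it left- or right-aligned.
fit() {
    awk -v s="$1" -v w="$2" -v a="$3" 'BEGIN {
        s = substr(s, 1, w)                       # crop to the column width
        fmt = (a == "right" ? "%" : "%-") w "s"   # builds e.g. %6s or %-18s
        printf fmt, s
    }'
}

# Model (18, left), size (6, right), hours (7, right) joined with the │ separator.
printf '%s │ %s │ %s\n' \
    "$(fit WD10PURX-78D85Y0 18 left)" \
    "$(fit 931G 6 right)" \
    "$(fit 6804h 7 right)"
```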
realloc:X if > 0 (else pad)
pending:X if > 0 (else pad)
udma:X if > 0 (else pad)
r_await:Xms if > 5ms (else pad)
✅ (PASSED) or ❌ (FAIL)
sda (tank), sdb (tank-cache) etc.
Disk I/O — only show disks with util>5% or await>10ms:
IO: sda r2.5/w4.6ms util5.2% · sdc r10.5/w3.3ms
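A rough sketch of that filter, assuming each input line holds `device r_await w_await util` separated by whitespace. That input layout and the name `io_line` are assumptions for illustration; real `iostat -x` output has more columns and would need to be reduced to these four first.

```shell
#!/bin/sh
# io_line  (hypothetical helper): reads "device r_await w_await util" lines on stdin.
# Keeps disks with util > 5% or either await > 10ms; util is printed only when > 5%.
io_line() {
    awk '
        $4 + 0 > 5 || $2 + 0 > 10 || $3 + 0 > 10 {
            item = sprintf("%s r%s/w%sms", $1, $2, $3)
            if ($4 + 0 > 5) item = item sprintf(" util%s%%", $4)
            out = (out == "" ? "IO: " : out " · ") item
        }
        END { if (out != "") print out }
    '
}

printf '%s\n' "sda 2.5 4.6 5.2" "sdb 0.4 0.6 0.3" "sdc 10.5 3.3 1.1" | io_line
```

With the sample input, `sdb` is dropped because it stays under every limit, and nothing is printed at all when no disk qualifies.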
🐳 DOCKER — converged view:
🐳 {total} running ({healthy_count} healthy) · v{docker_ver} · {image_count} imgs · {volume_gb}GB
Then only list unhealthy containers explicitly:
⚠️ nextcloud: Up 2 days (no healthcheck)
ℹ️ vaultwarden/server:latest: 4 months old (consider updating)
⚠️ xunlei: Up 2 days (no healthcheck) [CPU 11.2% — suspected hung]
healthy_count: count of containers with "(healthy)" in status
[CPU X% — suspected hung]
· {reclaimable} reclaimable ⚠️
📹 FRIGATE — cameras, only expand problem ones:
📹 3 cams · detection:{det_fps}fps · infer:{infer_ms}ms
✅ cam_d82e8e00: 5.1fps · ✅ cam_ae7e3010: 5.1fps
⚠️ cam_a24a20c0: 4.9fps · skip:2.4 (49% frames dropped)
FRIGATE_CAM_MAP env var
(skip/fps*100)
✅ name: fps
📀 {path}: {used}G/{total}G ({pct}%) for each FRIGATE_STORAGE line. ⚠️ if >80%.
📹 Frigate: no response ❌
🎮 GPU — one line:
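The drop percentage on the problem-camera line is skip divided by fps, times 100. A minimal sketch of that arithmetic follows; the helper name `drop_pct` and the `n/a` fallback for a camera reporting 0 fps are assumptions.

```shell
#!/bin/sh
# drop_pct SKIP FPS  (hypothetical helper): prints skip/fps*100 rounded to a whole number.
drop_pct() {
    awk -v skip="$1" -v fps="$2" 'BEGIN {
        if (fps + 0 <= 0) { print "n/a"; exit }   # fps = 0 is reported as its own alert
        printf "%.0f\n", skip / fps * 100
    }'
}

drop_pct 2.4 4.9   # prints 49
```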
🎮 {gpu_model} · {temp}°C · {util}% · VRAM:{used}M/{total}M · {proc_count} procs
🌐 NETWORK — one line per active interface:
🌐 enp4s0: {ip} · ↓{total_rx} ↑{total_tx}
📊 PROCESSES — top 3 CPU only (compact):
📊 CPU: xunlei 11.2% · ffmpeg 3.6% · python3 3.1%
MEM: python3 3.6% · node 2.8% · gnome-shell 1.4%
⚙️ SERVICES — only show non-active or failed:
⚙️ ⚠️ nfs-server: inactive · 1 failed unit: snap.firmware-updater
🔒 SECURITY — compact with full audit:
🔒 Failed logins: {count} · Boot: {boot_time_YYYY-MM-DD HH:MM} ({Xd} ago)
SSH: port:{port} · root:{yes/no} · pass:{yes/no} · key:{yes/no}
FW: {ufw/iptables status} ({n} rules) · Ports: {open_ports_list}
Failed logins: 0
· f2b active
🔋 UPS — one line:
🔋 {status_icon} {status_text} · charge:{batt_charge}% · load:{ups_load}% · in:{input_v}V · batt:{batt_v}V
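One way to derive `{status_icon} {status_text}` from the NUT status string, following the OL / OB / OB DISCHRG mapping this section specifies. The helper name `ups_label` is hypothetical, and passing unknown statuses through unchanged is an assumption.

```shell
#!/bin/sh
# ups_label STATUS  (hypothetical helper)
# STATUS is the ups.status value, e.g. from: upsc "$UPS_NAME" ups.status
ups_label() {
    case "$1" in
        "OB DISCHRG"*) printf '%s\n' "🪫Discharging" ;;
        OB*)           printf '%s\n' "🪫Battery" ;;
        OL*)           printf '%s\n' "⚡Online" ;;
        *)             printf '%s\n' "$1" ;;   # assumption: show unknown statuses as-is
    esac
}

ups_label "OB DISCHRG"
```

The more specific `OB DISCHRG` pattern is listed before `OB` so that it is matched first.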
OL → ⚡Online, OB → 🪫Battery, OB DISCHRG → 🪫Discharging
💾 TIMESHIFT — one line with health check:
💾 Timeshift: {count} snaps · latest:{YYYY-MM-DD HH:MM}
⚠️ too many, clean up
❌ far too many (>300), clean up immediately!
📦 UPDATES — only if > 0:
📦 {count} APT updates available
🔧 OOM / Logs — only if data present:
🔧 OOM: {oom_line_truncated}
━━━━ divider (4 chars, not full-width)
Use message tool with action=send to the target channel.
| Tool | Required for | Package |
|---|---|---|
| zpool/zfs | ZFS section | zfsutils-linux |
| smartctl | Disk health | smartmontools |
| docker | Docker section | docker-ce |
| nvidia-smi | GPU section | nvidia-driver |
| iostat | Disk I/O | sysstat |
| sensors | Temperatures | lm-sensors |
| upsc | UPS section | nut-client |
| journalctl | Logs | systemd (built-in) |
SMART, auth.log, and zpool events need sudo -n (passwordless sudo). Sections degrade gracefully if unavailable.
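A quick pre-flight check can show which sections will degrade before the first run. This is a sketch, not taken from `scripts/collect.sh`: the helper name `section_available` is hypothetical, and the tool list simply mirrors the table above.

```shell
#!/bin/sh
# section_available TOOL  (hypothetical helper): succeeds if TOOL is on PATH.
section_available() {
    command -v "$1" >/dev/null 2>&1
}

for tool in zpool smartctl docker nvidia-smi iostat sensors upsc journalctl; do
    if section_available "$tool"; then
        echo "ok:      $tool"
    else
        echo "missing: $tool (its section will be skipped)"
    fi
done

# "sudo -n true" fails instead of prompting when passwordless sudo is not set up.
if sudo -n true 2>/dev/null; then
    echo "sudo -n: available"
else
    echo "sudo -n: unavailable (SMART, auth.log and zpool events will be limited)"
fi
```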
| Metric | ⚠️ Warning | ❌ Critical |
|---|---|---|
| Disk temp | >45°C | >55°C |
| CPU temp | >70°C | >85°C |
| GPU temp | >75°C | >85°C |
| ZFS capacity | >80% | >90% |
| Root disk | >85% | >95% |
| ARC hit rate | <90% | — |
| Disk r_await/w_await | >20ms | — |
| Frigate skip | >1.0fps | >3.0fps |
| Frigate storage | >80% | >95% |
| Frigate camera fps | <1.0 | =0 |
| Timeshift snaps | >100 | >300 |
| realloc (SMART 5) | — | >0 |
| pending (SMART 197) | >0 | >10 |
| udma_crc (SMART 199) | >100 | >1000 |
| UPS battery | <50% | <20% |
| Disk I/O util | >50% | >80% |
FRIGATE_CAM_MAP env var: cam_id:Name,cam_id:Name
Pool: auto-detected from zpool list. Override with ZPOOL env var.
Disks: auto-detected from lsblk. Override with DISK_LIST env var.
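The `cam_id:Name,cam_id:Name` format can be resolved with `tr` and `awk`. In this sketch the helper name `cam_name` is hypothetical, and falling back to the raw camera id when no mapping exists is an assumption.

```shell
#!/bin/sh
# cam_name CAM_ID  (hypothetical helper)
# Looks CAM_ID up in FRIGATE_CAM_MAP; prints the raw id if it is not mapped.
cam_name() {
    printf '%s\n' "${FRIGATE_CAM_MAP:-}" | tr ',' '\n' | awk -F: -v id="$1" '
        $1 == id { print $2; found = 1; exit }
        END { if (!found) print id }
    '
}

FRIGATE_CAM_MAP="cam_a24a20c0:Garage,cam_ae7e3010:Front Door"
cam_name cam_ae7e3010   # prints Front Door
cam_name cam_unknown    # prints cam_unknown
```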