
Server Watchdog

Monitor remote servers via SSH — check service health (PM2, systemd, Docker), database status (MongoDB, MySQL, PostgreSQL), disk space, memory, and auto-rest...
Author: qoohsuan
Uncategorized · clawhub · v1.0.0 (latest) · 1 version · Key: not required

Overview

Monitor and auto-heal remote servers via SSH. Check services, databases, disk, memory — restart what's down, alert what's wrong.

Prerequisites

  • SSH access to target server (password or key-based)
  • expect available locally (for password-based SSH)
  • Target server runs PM2, systemd, Docker, or Windows services (net start / net stop) for service management

Quick Reference

Check PM2 services

ssh user@host "pm2 list"
ssh user@host "pm2 logs --lines 20 --nostream"

Check MongoDB

# Windows
ssh user@host "net start | findstr MongoDB"
ssh user@host "powershell -Command \"(Test-NetConnection -ComputerName 127.0.0.1 -Port 27017).TcpTestSucceeded\""

# Linux
ssh user@host "systemctl status mongod"
ssh user@host "mongosh --eval 'db.runCommand({ping:1})' --quiet"

Check disk & memory

# Linux
ssh user@host "df -h && free -h"

# Windows
ssh user@host "powershell -Command \"Get-PSDrive -PSProvider FileSystem | Select Root,Used,Free; \$os=Get-CimInstance Win32_OperatingSystem; Write-Output ('RAM: '+[math]::Round((\$os.TotalVisibleMemorySize-\$os.FreePhysicalMemory)/1MB,1)+'GB / '+[math]::Round(\$os.TotalVisibleMemorySize/1MB,1)+'GB')\""
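
The Linux disk check above can be turned into a pass/fail signal by filtering the df output locally. The following is a minimal sketch, not part of the original skill: disk_over_threshold is a hypothetical helper name, the 90 percent default is an arbitrary assumption, and it expects `df -P` (POSIX format, one filesystem per line) rather than `df -h`.

```shell
# Hypothetical helper: print "<mount point> <usage>" for every filesystem
# at or above a usage threshold. Reads `df -P` output on stdin.
disk_over_threshold() {
    local limit="${1:-90}"   # percent; the default of 90 is an assumption
    awk -v limit="$limit" 'NR > 1 {
        use = $5                 # Capacity column, e.g. "42%"
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6 " " $5
    }'
}

# Usage (user@host is a placeholder):
#   ssh user@host "df -P" | disk_over_threshold 85
```

Mount points that contain spaces would be truncated by this field-based parsing, so treat it as a starting point only.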

Workflow

  1. Diagnose — SSH in, check service status, logs, disk, memory
  2. Identify — Parse logs for errors, crashes, OOM, or unclean shutdowns
  3. Fix — Restart crashed services (pm2 restart, net start, systemctl restart)
  4. Verify — Confirm service is back up and responding
  5. Alert — Notify user via messaging with summary
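
The diagnose, fix, and verify steps above can be sketched as one small function. This is an illustration under stated assumptions, not the skill's own implementation: heal_service is a hypothetical name, the check and restart commands are passed in as strings, and the retry count and delay defaults are arbitrary.

```shell
# heal_service CHECK_CMD RESTART_CMD [RETRIES] [DELAY_SECONDS]
# Prints "healthy", "restarted", or "failed"; returns non-zero only on "failed".
heal_service() {
    local check_cmd="$1" restart_cmd="$2" retries="${3:-3}" delay="${4:-5}" i=1

    # Diagnose: nothing to do when the check already passes.
    if eval "$check_cmd" >/dev/null 2>&1; then
        echo "healthy"
        return 0
    fi

    # Fix: attempt one restart; a failed restart is caught by the verify loop.
    eval "$restart_cmd" >/dev/null 2>&1 || true

    # Verify: poll the check until it passes or the retries run out.
    while [ "$i" -le "$retries" ]; do
        if eval "$check_cmd" >/dev/null 2>&1; then
            echo "restarted"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "failed"
    return 1
}

# Usage (user@host is a placeholder):
#   heal_service 'ssh user@host "systemctl is-active --quiet mongod"' \
#                'ssh user@host "sudo systemctl restart mongod"'
```

The printed status can feed the alert step, so a "failed" result is reported instead of a false "back online".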

Crash Analysis

When a service is down, check these in order:

  1. Service logs — pm2 logs, journalctl -u service, Windows Event Log
  2. Application logs — Check log files at configured paths
  3. System events — OOM killer, unexpected shutdowns, disk full
  4. Database logs — MongoDB: check mongod.log for Fatal ("s":"F") entries

MongoDB crash patterns

"s":"F" — Fatal error (crash)
"Unhandled exception" — Internal bug (often FTDC related)
"Detected unclean shutdown" — Process killed without graceful shutdown
"WiredTiger error" — Storage engine corruption
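
As a rough illustration, the four patterns above can be counted in a log excerpt with a single awk pass. classify_mongo_log is a hypothetical helper, and the log path in the usage comment is an assumption (a common default on Linux packages) that may differ on your server.

```shell
# Hypothetical helper: read mongod log lines on stdin and print how many
# lines match each crash pattern listed above (plain substring matches).
classify_mongo_log() {
    awk '
        index($0, "\"s\":\"F\"")              { fatal++ }
        index($0, "Unhandled exception")       { unhandled++ }
        index($0, "Detected unclean shutdown") { unclean++ }
        index($0, "WiredTiger error")          { wiredtiger++ }
        END {
            printf "fatal=%d unhandled=%d unclean=%d wiredtiger=%d\n",
                   fatal, unhandled, unclean, wiredtiger
        }'
}

# Usage (host and log path are placeholders):
#   ssh user@host "tail -n 500 /var/log/mongodb/mongod.log" | classify_mongo_log
```

A line can match more than one pattern, so the counts are per pattern, not per incident.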

Auto-Heal Recipes

PM2 service restart

pm2 restart <service-name>
pm2 save  # save the process list; needs a one-time "pm2 startup" to restore it on boot

MongoDB (Windows)

net stop MongoDB
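rem timeout.exe exits with an error when stdin is redirected (typical for non-interactive SSH), so sleep via PowerShell
powershell -Command "Start-Sleep -Seconds 5"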
net start MongoDB

MongoDB (Linux)

sudo systemctl restart mongod

Deploy watchdog service

For persistent monitoring, deploy the included watchdog script:

  1. Copy scripts/mongodb-watchdog.js to target server
  2. Install: npm init -y && npm install mongodb
  3. Start: pm2 start mongodb-watchdog.js --name mongodb-watchdog
  4. Save: pm2 save
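
The four deployment steps above could be scripted roughly as follows. This is a sketch assuming a Linux target with Node.js and PM2 already installed; deploy_watchdog, the remote directory name "watchdog", and the RUN dry-run switch are all hypothetical and not part of the original skill.

```shell
# deploy_watchdog TARGET [REMOTE_DIR]
# Set RUN=echo to print the commands instead of executing them (dry run).
deploy_watchdog() {
    local target="$1" remote_dir="${2:-watchdog}"   # directory name is an assumption
    local run="${RUN:-}"

    # 1. Copy the script to the target server.
    $run ssh "$target" "mkdir -p $remote_dir"
    $run scp scripts/mongodb-watchdog.js "$target:$remote_dir/"

    # 2-4. Install the driver, start the watchdog under PM2, save the process list.
    $run ssh "$target" "cd $remote_dir && npm init -y && npm install mongodb && pm2 start mongodb-watchdog.js --name mongodb-watchdog && pm2 save"
}

# Usage (user@host is a placeholder):
#   RUN=echo deploy_watchdog user@host    # preview only
#   deploy_watchdog user@host
```

The remote directory is interpolated unquoted into the remote command, so it should not contain spaces or shell metacharacters.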

SSH with password (via expect)

When key-based auth isn't available:

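expect -c 'set timeout 20
# accept-new (OpenSSH 7.6+) trusts a host on first contact but rejects a changed key
spawn ssh -o StrictHostKeyChecking=accept-new user@host "COMMAND"
expect {
    -nocase "password:" { send "PASSWORD\r"; exp_continue }
    timeout { exit 1 }
    eof
}
'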

Alert Template

🚨 Server Alert — [hostname]

⏰ Time: [timestamp]
❌ Issue: [service] is DOWN
📋 Cause: [crash reason from logs]
🔄 Action: Auto-restarted [service]
✅ Status: [service] is back online

📊 System Health:
• Memory: X GB / Y GB
• Disk: Z% used
• Services: N/N online
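
One possible way to fill in the template from a script is a here-document. render_alert and its argument order are hypothetical; how the message is delivered is left out because the original does not name a messaging channel. The sketch always reports a successful restart, so it should only be called after verification has passed.

```shell
# render_alert HOST SERVICE CAUSE MEMORY DISK SERVICES_ONLINE
# Prints the alert text to stdout; the timestamp comes from the local clock.
render_alert() {
    local host="$1" service="$2" cause="$3" memory="$4" disk="$5" online="$6"
    local now
    now="$(date '+%Y-%m-%d %H:%M:%S')"
    cat <<EOF
🚨 Server Alert — $host

⏰ Time: $now
❌ Issue: $service is DOWN
📋 Cause: $cause
🔄 Action: Auto-restarted $service
✅ Status: $service is back online

📊 System Health:
• Memory: $memory
• Disk: $disk used
• Services: $online online
EOF
}

# Usage (all values are placeholders):
#   render_alert web01 mongod "unclean shutdown" "3.1 GB / 8 GB" "72%" "4/4"
```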

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-02 07:23, security status: safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
