← 返回
未分类 中文

Server Mate

Build or extend a lightweight server monitoring and AI operations workflow for Linux hosts running Nginx or Apache. Use when Codex needs to collect psutil me...
为运行 Nginx 或 Apache 的Linux 主机构建或扩展轻量级服务器监控与 AI 运维工作流,适用于 Codex 收集 psutil 指标。
tankeito tankeito 来源
未分类 clawhub v1.3.3 2 版本 100000 Key: 无需
★ 1
Stars
📥 430
下载
💾 7
安装
2
版本
#latest

概述

Server Mate

Version: 1.3.3

Use this skill to design or implement a two-plane monitoring system:

  • a Python agent on the server that tails logs and samples host metrics
  • an OpenClaw-side analyzer that aggregates data, explains failures, answers questions, and sends alerts

Start

  • Confirm the environment first: Linux distribution, Nginx or Apache, PHP-FPM layout, log paths, webhook target, and whether automated actions may touch a live host.
  • Keep collection read-only until the user explicitly asks for automation. Add alerting before any auto-ban or auto-heal behavior.
  • In OpenClaw deployments, OPENAI_API_KEY is injected by the runtime when AI analysis is enabled. Do not ask the user to export it manually. Treat webhook URLs or tokens in config.yaml as secrets and do not commit them.
  • Treat ./data/GeoIP.conf the same way. It may contain MaxMind AccountID and LicenseKey, so keep it local-only and out of Git.
  • Prefer MaxMind's official GeoLite2 workflow through ./data/GeoIP.conf and geoipupdate. Treat the built-in public mirror fallback only as an operator-reviewed bootstrap path when no local .mmdb file is present.
  • Treat auto-ban and auto-heal as privileged features. They may execute operator-supplied firewall or service restart commands and should stay disabled or dry_run: true until reviewed.
  • Use the references progressively instead of loading everything at once:
  • Read references/architecture.md for overall design, component boundaries, and rollout order.
  • Read references/data-contracts.md before defining JSON payloads, storage schemas, metrics, or natural-language query handlers.
  • Read references/ops-playbook.md before implementing thresholds, webhooks, reports, auto-ban, or self-heal logic.
  • Read references/sqlite-schema.md before extending historical storage or report queries.
  • Use scripts/server_agent.py as the collector, daemon entrypoint, and SQLite rollup writer.

Delivery workflow

  1. Map the request to one or more tracks.
    • Agent collection
    • Aggregation and storage
    • Alerting and reporting
    • AI diagnosis
    • Guarded remediation
  2. Implement the smallest safe slice first.
    • Start with structured access, error, and system events.
    • Add rollup metrics and natural-language answers next.
    • Add webhook alerts after the counters are stable.
    • Enable auto-ban or auto-heal only when thresholds, cooldowns, allowlists, and audit logs already exist.
  3. Validate with real or synthetic logs before changing production services.
  4. Explain caveats in plain language.
    • Example: UV is often an approximation based on IP and user-agent unless the site provides a stronger visitor key.
    • Example: upload bandwidth is unavailable unless the access log includes request length or a similar field.

Agent rules

  • Prefer Python, psutil, and the standard library for the first implementation.
  • Prefer a generated ./config.yaml plus local SQLite state such as ./metrics.db before adding external services.
  • Keep generated artifacts inside the current skill workspace by default: ./config.yaml, ./metrics.db, ./logs/, and ./reports/. Do not default to /opt, /var/log, or other system-wide directories.
  • Prefer the system_metrics + sites[] matrix layout from config.example.yaml instead of new single-site keys.
  • Support configurable log paths. Do not hardcode site layouts when the vhost config can be read instead.
  • Emit structured JSON with timezone-aware timestamps, host or site identifiers, event type, and enough raw context to debug parser mistakes.
  • In multi-site mode, collect host CPU or memory metrics once per cycle and keep site log parsing isolated per domain.
  • Separate parsing, aggregation, transport, and action execution so that HTTP push, stdout replay, file drop, or websocket transport can be swapped independently.
  • Keep unknown lines and parser failures as first-class counters instead of dropping them silently.

Analyzer rules

  • Store raw events separately from derived counters.
  • Model traffic, performance, security, spider, and error signals as independent reducers over the same event stream.
  • Translate natural-language requests into:
  • a time window
  • filters
  • an aggregation
  • a presentation format
  • For AI error explanations, pass the fingerprint, surrounding context, and normalized fields instead of dumping entire logs.

Safety rules

  • Treat auto-ban and auto-heal as opt-in features.
  • Default Guarded Automation to dry_run: true and keep it there until the user has observed automation notifications and audit history for several days.
  • Never flip dry_run to false, or enable auto_ban.enabled / auto_heal.enabled, unless the operator explicitly approves the command templates, allowlists, cooldowns, and audit destinations.
  • Require cooldowns, max actions per window, and allowlists before running firewall or restart commands.
  • Require whitelist checks before any ban command. Never ban loopback, RFC1918 private ranges, or trusted crawler families by default.
  • Require TTL-based unban or an equivalent release plan for every ban. Do not create permanent firewall blocks from the first implementation.
  • Record an audit event for every alert, dry-run, ban, unban, restart, and failed remediation attempt.
  • Store audit history in SQLite tables such as automation_actions and banned_ips, and expose simple lookup queries in user-facing docs.
  • Prefer one-shot remediation followed by escalation. Do not loop restarts.

Report expectations

  • Daily report: prior-day PV, UV, IP, request totals, bandwidth, status mix, top errors, and slow endpoints.
  • Weekly report: blocked IP trends, crawler trends, suspicious route clusters, and recurring slow routes.
  • Monthly report: bandwidth peak, disk growth, capacity warning, and remediation summary.

Automation scheduling

Use external scheduling for production unless the user explicitly wants an always-on daemon-only design.

  • Recommended ingestion pattern:
  • Run server_agent.py --once every 10 minutes from cron or a systemd timer.
  • This keeps log parsing incremental, writes SQLite rollups, and avoids duplicate resident processes.
  • For systemd deployments in Clawhub-style packaging:
  • Do not rely on bundling a .service file inside the skill package.
  • Generate a host-local unit with server_agent.py --config ./config.yaml --generate-service, then paste it into /etc/systemd/system/server-mate.service.
  • Recommended report pattern:
  • Run report_generator.py as one-shot scheduled jobs.
  • Daily PDF push at 01:00.
  • Weekly PDF push every Monday at 01:10.
  • Monthly PDF push on day 1 at 01:20.
  • In multi-site mode, a single scheduled report_generator.py run should iterate over every configured site unless the user explicitly passes --site.

Release notes for 1.3.2

  • Multi-site matrix config using sites[] plus global system_metrics
  • Host-global metrics stored separately from site-local business rollups
  • Logrotate-tolerant incremental readers with inode or truncate recovery
  • Guarded Automation with dry_run, whitelist checks, TTL-based unban, cooldown-based auto-heal, and SQLite audit trail
  • SSH brute-force detection from logs.auth_log with ssh_brute_force alerting and optional linked auto-ban
  • SSL certificate expiry inspection in report generation and webhook summaries
  • Telegram delivery support for alerts and report notices
  • GeoIP official refresh support via local ./data/GeoIP.conf and geoipupdate, with an operator-reviewed public mirror bootstrap fallback
  • config.example.yaml and docs updated for MaxMind GeoLite2 setup in the current workspace

Copyable cron examples:

*/10 * * * * /usr/bin/env bash -lc 'python3 ./scripts/server_agent.py --config ./config.yaml --once >> ./logs/server-mate-agent.log 2>&1'
0 1 * * * /usr/bin/env bash -lc 'python3 ./scripts/report_generator.py --config ./config.yaml pdf --range daily --send >> ./logs/server-mate-report.log 2>&1'
10 1 * * 1 /usr/bin/env bash -lc 'python3 ./scripts/report_generator.py --config ./config.yaml pdf --range weekly --send >> ./logs/server-mate-report.log 2>&1'
20 1 1 * * /usr/bin/env bash -lc 'python3 ./scripts/report_generator.py --config ./config.yaml pdf --range monthly --send >> ./logs/server-mate-report.log 2>&1'

Systemd note:

  • If the host already standardizes on systemd, prefer Type=oneshot services plus timers for reports.
  • Use Restart=always only for the long-running --daemon agent mode.

Example requests

  • "Design the ingestion API for Server-Mate."
  • "Add 404 burst detection and webhook alerts."
  • "Explain today's top 5xx error in plain language."
  • "Plan a safe auto-heal flow for repeated 502 responses."

版本历史

共 2 个版本

  • v1.3.3 当前
    2026-05-03 06:03 安全 安全
  • v1.2.0
    2026-03-31 06:36 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

developer-tools

Health Mate

tankeito
可执行的OpenClaw健康报告技能,支持中文、英文和日文报告流程。仅从明确配置的 MEMORY_DI 读取 Markdown 日志。
★ 1 📥 927

Mail Mate

tankeito
通过内置路由连接IMAP邮箱,过滤、规范化数据,使用正则表达式提取并在精确时间窗口内推送匹配的邮件。
★ 0 📥 382
data-analysis

Health Report

tankeito
通过自然语言配置健康档案,主动引导录入体重、饮食、饮水、运动数据,生成专业健康报告并多端推送。
★ 1 📥 672