Overview

Server Mate

Version: 1.3.3

Use this skill to design or implement a two-plane monitoring system:

a Python agent on the server that tails logs and samples host metrics
an OpenClaw-side analyzer that aggregates data, explains failures, answers questions, and sends alerts

Start

Confirm the environment first: Linux distribution, Nginx or Apache, PHP-FPM layout, log paths, webhook target, and whether automated actions may touch a live host.
Keep collection read-only until the user explicitly asks for automation. Add alerting before any auto-ban or auto-heal behavior.
In OpenClaw deployments, OPENAI_API_KEY is injected by the runtime when AI analysis is enabled. Do not ask the user to export it manually. Treat webhook URLs or tokens in config.yaml as secrets and do not commit them.
Treat ./data/GeoIP.conf the same way. It may contain MaxMind AccountID and LicenseKey, so keep it local-only and out of Git.
Prefer MaxMind's official GeoLite2 workflow through ./data/GeoIP.conf and geoipupdate. Treat the built-in public mirror fallback only as an operator-reviewed bootstrap path when no local .mmdb file is present.
Treat auto-ban and auto-heal as privileged features. They may execute operator-supplied firewall or service restart commands and should stay disabled or dry_run: true until reviewed.
Use the references progressively instead of loading everything at once:
Read references/architecture.md for overall design, component boundaries, and rollout order.
Read references/data-contracts.md before defining JSON payloads, storage schemas, metrics, or natural-language query handlers.
Read references/ops-playbook.md before implementing thresholds, webhooks, reports, auto-ban, or self-heal logic.
Read references/sqlite-schema.md before extending historical storage or report queries.
Use scripts/server_agent.py as the collector, daemon entrypoint, and SQLite rollup writer.

Delivery workflow

Map the request to one or more tracks.

Agent collection
Aggregation and storage
Alerting and reporting
AI diagnosis
Guarded remediation

Implement the smallest safe slice first.

Start with structured access, error, and system events.
Add rollup metrics and natural-language answers next.
Add webhook alerts after the counters are stable.
Enable auto-ban or auto-heal only when thresholds, cooldowns, allowlists, and audit logs already exist.

Validate with real or synthetic logs before changing production services.
Explain caveats in plain language.

Example: UV is often an approximation based on IP and user-agent unless the site provides a stronger visitor key.
Example: upload bandwidth is unavailable unless the access log includes request length or a similar field.
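The UV caveat above can be made concrete with a minimal sketch. It assumes each parsed request is a dict with ip and user_agent fields; the field names and the visitor_key helper are hypothetical and not part of server_agent.py.

```python
import hashlib

def visitor_key(ip: str, user_agent: str) -> str:
    """Hypothetical visitor key: a short hash of client IP plus user-agent."""
    raw = f"{ip}\x1f{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]

def approximate_uv(requests) -> int:
    """Count distinct (ip, user_agent) pairs among the given request dicts."""
    return len({visitor_key(r["ip"], r["user_agent"]) for r in requests})

sample = [
    {"ip": "203.0.113.7", "user_agent": "Mozilla/5.0"},
    {"ip": "203.0.113.7", "user_agent": "Mozilla/5.0"},  # same visitor, second page view
    {"ip": "203.0.113.7", "user_agent": "curl/8.5.0"},   # same IP, different client
    {"ip": "198.51.100.2", "user_agent": "Mozilla/5.0"},
]

pv = len(sample)
uv = approximate_uv(sample)
unique_ips = len({r["ip"] for r in sample})
print(pv, uv, unique_ips)  # 4 3 2
```

Users behind one NAT with identical browsers collapse into a single visitor, and one user who changes networks counts twice, which is why the number should be reported as an approximation.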

Agent rules

Prefer Python, psutil, and the standard library for the first implementation.
Prefer a generated ./config.yaml plus local SQLite state such as ./metrics.db before adding external services.
Keep generated artifacts inside the current skill workspace by default: ./config.yaml, ./metrics.db, ./logs/, and ./reports/. Do not default to /opt, /var/log, or other system-wide directories.
Prefer the system_metrics + sites[] matrix layout from config.example.yaml instead of new single-site keys.
Support configurable log paths. Do not hardcode site layouts when the vhost config can be read instead.
Emit structured JSON with timezone-aware timestamps, host or site identifiers, event type, and enough raw context to debug parser mistakes.
In multi-site mode, collect host CPU or memory metrics once per cycle and keep site log parsing isolated per domain.
Separate parsing, aggregation, transport, and action execution so that HTTP push, stdout replay, file drop, or websocket transport can be swapped independently.
Keep unknown lines and parser failures as first-class counters instead of dropping them silently.
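A minimal sketch of the structured-event and parser-failure rules above, assuming the access log uses the common "combined" format. The event field names and the parse_access_line helper are illustrative assumptions; the real payload shape belongs to references/data-contracts.md.

```python
import json
import re
from collections import Counter
from datetime import datetime

# Nginx/Apache "combined" access log format; other formats need another pattern.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

counters = Counter()

def parse_access_line(line: str, site: str, host: str) -> dict:
    """Turn one raw line into a structured event; never drop a line silently."""
    raw = line.rstrip("\n")
    match = COMBINED.match(line)
    if match is None:
        # Unknown lines are counted and kept, with the raw text for debugging.
        counters["parse_failures"] += 1
        return {"event_type": "unparsed", "host": host, "site": site, "raw": raw}
    counters["access_events"] += 1
    # %z keeps the UTC offset, so the timestamp stays timezone-aware.
    ts = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "event_type": "access",
        "host": host,
        "site": site,
        "timestamp": ts.isoformat(),
        "client_ip": match["ip"],
        "method": match["method"],
        "path": match["path"],
        "status": int(match["status"]),
        "bytes_sent": 0 if match["bytes"] == "-" else int(match["bytes"]),
        "user_agent": match["ua"],
        "raw": raw,
    }

good = ('203.0.113.7 - - [03/May/2026:06:03:11 +0800] "GET /index.php HTTP/1.1" '
        '502 157 "-" "Mozilla/5.0"')
event = parse_access_line(good, site="example.com", host="web-01")
broken = parse_access_line("not a log line", site="example.com", host="web-01")

print(json.dumps(event, indent=2))
print(dict(counters))
```

Because the parser only returns dicts and updates counters, the transport (HTTP push, stdout replay, file drop) can be chosen separately, in line with the separation rule above.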

Analyzer rules

Store raw events separately from derived counters.
Model traffic, performance, security, spider, and error signals as independent reducers over the same event stream.
Translate natural-language requests into:
a time window
filters
an aggregation
a presentation format
For AI error explanations, pass the fingerprint, surrounding context, and normalized fields instead of dumping entire logs.
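One way to picture the four-part translation above is an intermediate plan object. This is a sketch under assumptions: QueryPlan, resolve_window, and the filter and aggregation names are hypothetical, and a real handler would support far more phrases.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class QueryPlan:
    """Hypothetical intermediate form for a natural-language request."""
    start: datetime
    end: datetime
    filters: dict = field(default_factory=dict)
    aggregation: str = "count"
    presentation: str = "text"

def resolve_window(phrase: str, now: datetime) -> tuple[datetime, datetime]:
    """Map a small set of relative phrases to a half-open [start, end) window."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if phrase == "today":
        return midnight, now
    if phrase == "yesterday":
        return midnight - timedelta(days=1), midnight
    if phrase == "last hour":
        return now - timedelta(hours=1), now
    raise ValueError(f"unsupported time phrase: {phrase!r}")

# Request: "Explain today's top 5xx error in plain language."
now = datetime(2026, 5, 3, 14, 30, tzinfo=timezone.utc)
start, end = resolve_window("today", now)
plan = QueryPlan(
    start=start,
    end=end,
    filters={"event_type": "access", "status_min": 500, "status_max": 599},
    aggregation="top_fingerprint",
    presentation="plain_language_summary",
)
print(plan)
```

Keeping the plan explicit makes it possible to log and test what the analyzer understood before any SQL or AI call runs.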

Safety rules

Treat auto-ban and auto-heal as opt-in features.
Default Guarded Automation to dry_run: true and keep it there until the user has observed automation notifications and audit history for several days.
Never flip dry_run to false, or enable auto_ban.enabled / auto_heal.enabled, unless the operator explicitly approves the command templates, allowlists, cooldowns, and audit destinations.
Require cooldowns, max actions per window, and allowlists before running firewall or restart commands.
Require whitelist checks before any ban command. Never ban loopback, RFC1918 private ranges, or trusted crawler families by default.
Require TTL-based unban or an equivalent release plan for every ban. Do not create permanent firewall blocks in the first implementation.
Record an audit event for every alert, dry-run, ban, unban, restart, and failed remediation attempt.
Store audit history in SQLite tables such as automation_actions and banned_ips, and expose simple lookup queries in user-facing docs.
Prefer one-shot remediation followed by escalation. Do not loop restarts.
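The guard order implied by these rules can be sketched as a pure decision function. Everything here is an assumption for illustration: plan_ban, its parameters, and the default limits are hypothetical, trusted-crawler verification is out of scope, and the sketch never executes a firewall command. It only returns a record suitable for the audit trail.

```python
import ipaddress

# RFC1918 private ranges are listed explicitly so the check matches the rule as written.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_protected(ip_text: str, allowlist: list[str]) -> bool:
    """True when the address must never be banned."""
    ip = ipaddress.ip_address(ip_text)
    if ip.is_loopback or any(ip in net for net in RFC1918):
        return True
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowlist)

def plan_ban(ip_text, *, allowlist, recent_actions, now, dry_run=True,
             cooldown_s=600, max_actions=5, window_s=3600, ttl_s=3600):
    """Return an audit-ready decision dict; timestamps are epoch seconds."""
    decision = {"ip": ip_text, "action": "ban", "at": now, "dry_run": dry_run}
    in_window = [t for t in recent_actions if now - t < window_s]
    if is_protected(ip_text, allowlist):
        decision.update(result="skipped", reason="allowlisted_or_private")
    elif in_window and now - max(in_window) < cooldown_s:
        decision.update(result="skipped", reason="cooldown")
    elif len(in_window) >= max_actions:
        decision.update(result="skipped", reason="max_actions_per_window")
    else:
        # Every ban carries an expiry so an unban can be scheduled.
        decision.update(result="dry_run" if dry_run else "approved",
                        expires_at=now + ttl_s)
    return decision

now = 1_000_000.0
print(plan_ban("192.168.1.20", allowlist=[], recent_actions=[], now=now))
print(plan_ban("203.0.113.9", allowlist=[], recent_actions=[now - 60], now=now))
print(plan_ban("203.0.113.9", allowlist=[], recent_actions=[], now=now))
```

Every returned dict, including the skipped ones, is meant to be written to the audit history, so operators can review what dry-run mode would have done before approving real commands.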

Report expectations

Daily report: prior-day PV (page views), UV (unique visitors), unique IPs, request totals, bandwidth, status mix, top errors, and slow endpoints.
Weekly report: blocked IP trends, crawler trends, suspicious route clusters, and recurring slow routes.
Monthly report: bandwidth peak, disk growth, capacity warning, and remediation summary.

Automation scheduling

Use external scheduling for production unless the user explicitly wants an always-on daemon-only design.

Recommended ingestion pattern:
Run server_agent.py --once every 10 minutes from cron or a systemd timer.
This keeps log parsing incremental, writes SQLite rollups, and avoids duplicate resident processes.
For systemd deployments in Clawhub-style packaging:
Do not rely on bundling a .service file inside the skill package.
Generate a host-local unit with server_agent.py --config ./config.yaml --generate-service, then paste it into /etc/systemd/system/server-mate.service.
Recommended report pattern:
Run report_generator.py as a one-shot scheduled job for each report.
Daily PDF push at 01:00.
Weekly PDF push every Monday at 01:10.
Monthly PDF push on day 1 at 01:20.
In multi-site mode, a single scheduled report_generator.py run should iterate over every configured site unless the user explicitly passes --site.

Release notes for 1.3.2

Multi-site matrix config using sites[] plus global system_metrics
Host-global metrics stored separately from site-local business rollups
Logrotate-tolerant incremental readers with inode or truncate recovery
Guarded Automation with dry_run, whitelist checks, TTL-based unban, cooldown-based auto-heal, and SQLite audit trail
SSH brute-force detection from logs.auth_log with ssh_brute_force alerting and optional linked auto-ban
SSL certificate expiry inspection in report generation and webhook summaries
Telegram delivery support for alerts and report notices
GeoIP official refresh support via local ./data/GeoIP.conf and geoipupdate, with an operator-reviewed public mirror bootstrap fallback
config.example.yaml and docs updated for MaxMind GeoLite2 setup in the current workspace
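The "inode or truncate recovery" item above refers to a common incremental-reading technique. The following is a minimal sketch of that technique, not the code in server_agent.py; read_new_lines and the state dict layout are hypothetical.

```python
import os
import tempfile

def read_new_lines(path: str, state: dict) -> list[str]:
    """Return complete lines appended since the last call.

    `state` carries the inode and byte offset between calls.
    """
    st = os.stat(path)
    rotated = state.get("inode") != st.st_ino         # file was replaced
    truncated = st.st_size < state.get("offset", 0)   # file was cut in place
    if rotated or truncated:
        state["offset"] = 0
    state["inode"] = st.st_ino
    with open(path, "rb") as fh:
        fh.seek(state["offset"])
        data = fh.read()
    # Consume only up to the last newline; a partial line waits for the next cycle.
    complete, sep, _partial = data.rpartition(b"\n")
    state["offset"] += len(complete) + len(sep)
    if not sep:
        return []
    return [line.decode("utf-8", errors="replace") for line in complete.split(b"\n")]

with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "access.log")
    state = {}
    with open(log, "w") as fh:
        fh.write("a\nb\npart")
    first = read_new_lines(log, state)    # ['a', 'b']
    with open(log, "a") as fh:
        fh.write("ial\n")
    second = read_new_lines(log, state)   # ['partial']
    os.rename(log, log + ".1")            # rotation that creates a fresh file
    with open(log, "w") as fh:
        fh.write("c\n")
    third = read_new_lines(log, state)    # ['c']

print(first, second, third)
```

This sketch has known gaps: lines written to the old file between the last read and the rotation are missed, and a truncation followed by enough new writes to exceed the old offset goes undetected. A production reader would also drain the rotated file and persist state between runs.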

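Copyable cron examples for the schedule under Automation scheduling (the relative ./ paths assume the job runs from the skill workspace; cron normally starts jobs in the user's home directory, so cd into the workspace first or use absolute paths):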

*/10 * * * * /usr/bin/env bash -lc 'python3 ./scripts/server_agent.py --config ./config.yaml --once >> ./logs/server-mate-agent.log 2>&1'
0 1 * * * /usr/bin/env bash -lc 'python3 ./scripts/report_generator.py --config ./config.yaml pdf --range daily --send >> ./logs/server-mate-report.log 2>&1'
10 1 * * 1 /usr/bin/env bash -lc 'python3 ./scripts/report_generator.py --config ./config.yaml pdf --range weekly --send >> ./logs/server-mate-report.log 2>&1'
20 1 1 * * /usr/bin/env bash -lc 'python3 ./scripts/report_generator.py --config ./config.yaml pdf --range monthly --send >> ./logs/server-mate-report.log 2>&1'

Systemd note:

If the host already standardizes on systemd, prefer Type=oneshot services plus timers for reports.
Use Restart=always only for the long-running --daemon agent mode.

Example requests

"Design the ingestion API for Server-Mate."
"Add 404 burst detection and webhook alerts."
"Explain today's top 5xx error in plain language."
"Plan a safe auto-heal flow for repeated 502 responses."

Version history

2 versions in total

v1.3.3 (current)

2026-05-03 06:03 Safe
v1.2.0

2026-03-31 06:36 Safe

Security check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
