
ops-mcp-server

Query observability data and execute operational procedures via the ops-mcp-server MCP interface. Covers Kubernetes events, Prometheus metrics, Elasticsearch, and more.
shaowenchen
Data Analysis · clawhub v1.0.3 · 2 versions · 99893.7 · Key: not required
★ 1 star · 📥 920 downloads · 💾 17 installs · #latest

Overview

Ops MCP Server Skill

Access your infrastructure's observability data and execute operational procedures through a unified MCP interface.

Capabilities at a Glance

| Module | Tools | What it answers |
|---|---|---|
| Events (Kubernetes) | list-events-from-ops, get-events-from-ops | What happened to a pod/deployment/node? |
| Metrics (Prometheus) | list-metrics-from-prometheus, query-metrics-from-prometheus, query-metrics-range-from-prometheus | Is CPU/memory/traffic normal? What changed over time? |
| Logs (Elasticsearch) | list-log-indices-from-elasticsearch, search-logs-from-elasticsearch, query-logs-from-elasticsearch | What errors are in the logs? What did service X log? |
| Traces (Jaeger) | get-services-from-jaeger, get-operations-from-jaeger, find-traces-from-jaeger, get-trace-from-jaeger | Why is this request slow? Where did it fail? |
| SOPs | list-sops-from-ops, list-sops-parameters-from-ops, execute-sops-from-ops | Run a standard operational procedure |

Setup (first-time)

# 1. Use mcporter with npx (no installation needed)
# Or install globally: npm i -g mcporter

# 2. Register the server
cd ~/.openclaw/workspace
npx mcporter config add ops-mcp-server --url http://localhost/mcp

# 3. Authenticate (if needed)
npx mcporter auth ops-mcp-server
# On failure, add to ~/.openclaw/workspace/config/mcporter.json:
# "headers": { "Authorization": "Bearer YOUR_TOKEN" }

# 4. Verify
npx mcporter list ops-mcp-server
npx mcporter call ops-mcp-server list-events-from-ops page_size=5

# 5. Set env var
export OPS_MCP_SERVER_URL="http://localhost/mcp"

How to Investigate: Decision Guide

When a user describes a problem, use this guide to choose starting tools and build a complete picture.

🔴 "Something is broken / service is down"

  1. Kubernetes events first: check whether pods crashed, restarted, or were evicted

```

get-events-from-ops subject_pattern="ops.clusters.{cluster}.namespaces.{ns}.pods.{name}.events"

```

  2. Logs: search for errors around the time of the incident

```

query-logs-from-elasticsearch query='FROM logs-* | WHERE @timestamp > NOW() - 30 minutes | WHERE level == "error" | LIMIT 50'

```

  3. Traces: find failed or slow requests

```

find-traces-from-jaeger serviceName={service} tags={"error":"true"}

```
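The three steps above can be chained into one shell function. This is a minimal sketch. It assumes the tools are invoked as npx mcporter call ops-mcp-server <tool> key=value ..., as in the Setup section. The names triage_outage and mcp_call are hypothetical helpers, and the * wildcards stand in for the cluster, namespace, and pod segments.

```shell
# Hypothetical wrapper: every tool call goes through mcporter (see Setup).
mcp_call() {
  npx mcporter call ops-mcp-server "$@"
}

# Hypothetical helper: run the "service is down" checks in the guide's order.
triage_outage() {
  local service="$1"
  if [[ -z "$service" ]]; then
    echo "usage: triage_outage <service>" >&2
    return 2
  fi

  echo "== 1. Kubernetes pod events =="
  mcp_call get-events-from-ops 'subject_pattern=ops.clusters.*.namespaces.*.pods.*.events'

  echo "== 2. Recent error logs =="
  # ES|QL string literals use double quotes, so the whole argument is single-quoted.
  mcp_call query-logs-from-elasticsearch \
    'query=FROM logs-* | WHERE @timestamp > NOW() - 30 minutes | WHERE level == "error" | LIMIT 50'

  echo "== 3. Failed traces for the service =="
  mcp_call find-traces-from-jaeger "serviceName=${service}" 'tags={"error":"true"}'
}
```

Usage: triage_outage checkout-service. Because the transport is isolated in mcp_call, you can dry-run the sequence by redefining that single function, for example to print its arguments.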

🟡 "Performance is degraded / requests are slow"

  1. Metrics: check resource saturation

```

query-metrics-from-prometheus query="100 - (avg(rate(node_cpu_seconds_total{mode='idle'}[5m])) * 100)"

query-metrics-range-from-prometheus query="node_memory_MemAvailable_bytes" time_range="1h" step="1m"

```

  2. Traces: find slow spans

```

find-traces-from-jaeger serviceName={service} durationMin=1000

```

  3. Logs: look for timeouts or slow-query warnings

🔵 "I need to run a procedure / restart something"

  1. List available SOPs

```

list-sops-from-ops

```

  2. Get parameters

```

list-sops-parameters-from-ops sops_id={sops_id}

```

  3. Execute

```

execute-sops-from-ops sops_id={sops_id} parameters='{...}'

```
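You can wrap the list, inspect, execute sequence above so that execution always comes after a look at the parameters and an explicit confirmation. This is a minimal sketch. It assumes mcporter invocation as in Setup and that parameters is passed as a JSON string, as in the example above. The names run_sop and mcp_call, and the SOP id restart-pod in the usage note, are hypothetical.

```shell
# Hypothetical wrapper: every tool call goes through mcporter (see Setup).
mcp_call() {
  npx mcporter call ops-mcp-server "$@"
}

# Hypothetical helper: show an SOP's parameters, then execute it only after
# the operator answers "y".
run_sop() {
  local sops_id="$1" params_json="$2" answer
  if [[ -z "$sops_id" || -z "$params_json" ]]; then
    echo "usage: run_sop <sops_id> '<parameters JSON>'" >&2
    return 2
  fi

  # Step 2: list the parameters this SOP expects.
  mcp_call list-sops-parameters-from-ops "sops_id=${sops_id}" || return 1

  # Step 3: an SOP may restart or otherwise change workloads, so confirm first.
  # read prints the prompt only when stdin is a terminal.
  read -r -p "Execute SOP ${sops_id} with ${params_json}? [y/N] " answer
  if [[ "$answer" != [yY] ]]; then
    echo "aborted"
    return 1
  fi
  mcp_call execute-sops-from-ops "sops_id=${sops_id}" "parameters=${params_json}"
}
```

Usage: run_sop restart-pod '{"namespace":"default"}'. Any answer other than y or Y, including end of input, aborts before execute-sops-from-ops is called.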

🟢 "General health check / nothing specific"

Start with events + a key metrics query, then go deeper based on what you find.


Tool Quick Reference

Events — NATS subject pattern format

# Namespace resources
ops.clusters.{cluster}.namespaces.{ns}.{resourceType}.{name}.{observation}

# Node level
ops.clusters.{cluster}.nodes.{nodeName}.{observation}

# Notifications
ops.notifications.providers.{provider}.channels.{channel}.severities.{severity}
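For the namespace-resource format above, a small builder keeps the segment order straight. This is a sketch. It assumes that * (which matches exactly one segment) is an acceptable stand-in for any segment you leave out. The name events_subject is a hypothetical helper, not part of ops-mcp-server.

```shell
# Hypothetical helper: build a namespace-resource subject following
# ops.clusters.{cluster}.namespaces.{ns}.{resourceType}.{name}.{observation}.
# Omitted (or empty) segments become "*"; the observation defaults to "events".
events_subject() {
  local cluster="${1:-*}" ns="${2:-*}" resource_type="${3:-*}" name="${4:-*}" observation="${5:-events}"

  # Only the documented observation types (or a wildcard) are accepted.
  case "$observation" in
    status|events|alerts|findings|'*') ;;
    *)
      echo "unknown observation type: $observation" >&2
      return 1
      ;;
  esac

  printf 'ops.clusters.%s.namespaces.%s.%s.%s.%s\n' \
    "$cluster" "$ns" "$resource_type" "$name" "$observation"
}
```

For example, events_subject prod default pods prints ops.clusters.prod.namespaces.default.pods.*.events, which you can pass as subject_pattern to get-events-from-ops.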

Wildcards: * matches exactly one segment; > matches one or more remaining segments and may only appear as the last segment

Observation types: status | events | alerts | findings
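To check offline which subjects a pattern would select, you can sketch the wildcard rules in bash. This follows standard NATS semantics. The name subject_matches is a hypothetical helper, and the server's own matching remains authoritative.

```shell
# Hypothetical helper: succeed (status 0) if SUBJECT matches PATTERN under
# NATS-style wildcards: "*" = exactly one segment, ">" = one or more trailing
# segments (only valid as the last segment).
subject_matches() {
  local pattern="$1" subject="$2"
  local -a p s
  IFS='.' read -ra p <<< "$pattern"
  IFS='.' read -ra s <<< "$subject"

  local i
  for (( i = 0; i < ${#p[@]}; i++ )); do
    if [[ "${p[i]}" == ">" ]]; then
      # ">" must be the final pattern segment and must consume at least one token.
      (( i == ${#p[@]} - 1 && i < ${#s[@]} )) && return 0
      return 1
    fi
    # The subject ran out of segments before the pattern did.
    (( i < ${#s[@]} )) || return 1
    [[ "${p[i]}" == "*" || "${p[i]}" == "${s[i]}" ]] || return 1
  done

  # Without ">", pattern and subject must have the same number of segments.
  (( ${#p[@]} == ${#s[@]} ))
}
```

For example, subject_matches 'ops.clusters.*.nodes.>' ops.clusters.prod.nodes.node-1.status succeeds, while the same pattern does not match ops.clusters.prod.nodes, because > needs at least one more segment.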

Times are Unix epoch milliseconds, e.g. $(date +%s)000 in a shell
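When a tool takes a start and an end time, you can compute both ends of a look-back window in one step. This is a minimal sketch, and window_ms is a hypothetical helper. Which parameter names a given tool expects for these values is covered in its example file and is not assumed here.

```shell
# Hypothetical helper: print "START END" as Unix epoch milliseconds for the
# last N minutes (default 30). END equals $(date +%s)000.
window_ms() {
  local minutes="${1:-30}"
  if [[ ! "$minutes" =~ ^[0-9]+$ ]]; then
    echo "minutes must be a non-negative integer" >&2
    return 1
  fi
  local end_ms=$(( $(date +%s) * 1000 ))
  # 10# forces base 10, so inputs like "08" are not read as octal.
  local start_ms=$(( end_ms - 10#$minutes * 60 * 1000 ))
  echo "$start_ms $end_ms"
}
```

Usage: read -r start end <<< "$(window_ms 60)" sets both variables for the last hour.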

Logs — ES|QL query patterns

-- Recent errors
FROM logs-* | WHERE @timestamp > NOW() - 30 minutes | WHERE level == "error" | LIMIT 100

-- Top errors by frequency
FROM logs-* | WHERE @timestamp > NOW() - 1 hour | WHERE level == "error"
| STATS count = COUNT(*) BY message | SORT count DESC | LIMIT 10

-- Specific service
FROM logs-* | WHERE service == "checkout-service" | WHERE @timestamp > NOW() - 1 hour | LIMIT 50
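ES|QL string literals are double-quoted, so the easiest way to build a query in a shell is with printf and single quotes around the format. This is a minimal sketch. The name esql_service_errors is a hypothetical helper, the service and level field names follow the patterns above, and service names containing double quotes are not escaped.

```shell
# Hypothetical helper: ES|QL for recent error logs from one service.
# Args: SERVICE [TIME_SPAN, default "1 hour"] [LIMIT, default 50].
# Assumes SERVICE contains no double quotes (no escaping is done).
esql_service_errors() {
  local service="$1" span="${2:-1 hour}" limit="${3:-50}"
  printf 'FROM logs-* | WHERE @timestamp > NOW() - %s | WHERE service == "%s" AND level == "error" | LIMIT %s\n' \
    "$span" "$service" "$limit"
}
```

Usage, following the mcporter call syntax from Setup: npx mcporter call ops-mcp-server query-logs-from-elasticsearch "query=$(esql_service_errors checkout-service)". The surrounding double quotes keep the query as one argument and preserve its inner double quotes.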

Metrics — PromQL patterns

# CPU usage
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)

# Memory available
node_memory_MemAvailable_bytes

# HTTP 5xx responses per second (a rate, not a share of all requests)
rate(http_requests_total{status=~"5.."}[5m])
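The CPU query above treats each core's idle counter as seconds of idle time per second, averages that fraction across an instance's cores, and converts it to a busy percentage. As a worked equation, with rate() over the window approximated by the counter increase divided by the window length:

```latex
\text{CPU usage (\%)} = 100 - 100 \cdot \frac{1}{N} \sum_{c=1}^{N} \frac{\Delta\,\mathrm{idle}_c}{\Delta t}
```

For example, take N = 2 cores over Δt = 300 s. If the idle counters rise by 240 s and 180 s, the idle fractions are 0.8 and 0.6, their average is 0.7, and CPU usage is 100 - 70 = 30%.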

Detailed Examples & Reference Files

For complete parameter lists, output formats, and advanced patterns, read the relevant file:

  • events: examples/events.md
  • metrics: examples/metrics.md
  • logs: examples/logs.md
  • traces: examples/traces.md
  • sops: examples/sops.md
  • event subject format design: references/design.md

Read the relevant example file before making complex tool calls you're unsure about.


What This Skill is NOT For

  • Direct infrastructure changes (use dedicated automation tooling)
  • Real-time alerting (investigation only, not a monitoring agent)
  • Writing to or modifying operational data (all access is read-only)

Version history

2 versions in total

  • v1.0.3 (current)
    2026-03-29 18:42 · Security: safe
  • v1.0.1
    2026-03-07 02:00

Security scans

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
