← 返回
未分类 中文

Runbook Automator

Convert manual incident runbooks into automated, executable playbooks. Parse existing runbooks, generate scripts for each step, add health checks, rollback p...
将手动事故处置手册转换为可自动执行的剧本。解析现有手册,为每一步生成脚本,加入健康检查和回滚流程。
charlie-morrison charlie-morrison 来源
未分类 clawhub v1.0.1 1 版本 100000 Key: 无需
★ 0
Stars
📥 313
下载
💾 1
安装
1
版本
#latest

概述

Runbook Automator

Transform manual runbooks into automated, executable playbooks. Parse existing documentation, generate step-by-step scripts with health checks, decision points, rollback procedures, and notification hooks — so incidents get resolved faster with less human intervention.

Use when: "automate this runbook", "convert runbook to script", "make this playbook executable", "incident automation", "turn this wiki page into a script", or when building on-call automation.

Commands

1. convert — Parse Runbook and Generate Automation

Step 1: Identify Runbook Format

Read the input runbook (markdown, Confluence wiki, Google Doc, plain text) and extract:

  • Title and scope — what incident does this address
  • Prerequisites — access, tools, permissions needed
  • Steps — ordered actions (distinguish manual vs automatable)
  • Decision points — if/then branches
  • Verification steps — how to confirm each step worked
  • Rollback steps — how to undo if things go wrong
  • Escalation criteria — when to page someone

Step 2: Classify Each Step

For each step in the runbook, classify as:

TypeExampleAutomation
---------------------------
Command"Run kubectl rollout restart"Direct script execution
Check"Verify pods are running"Script with assertion
Decision"If error rate > 5%, proceed to step 4"Conditional branch
Manual"Call the database team"Notification + pause
Observation"Watch the dashboard for 10 minutes"Timed wait + metric check

Step 3: Generate Executable Playbook

#!/usr/bin/env bash
set -euo pipefail

# ============================================
# Automated Runbook: [Title]
# Generated from: [source document]
# Last updated: [date]
# ============================================

SLACK_WEBHOOK="${SLACK_WEBHOOK:-}"
PAGERDUTY_KEY="${PAGERDUTY_KEY:-}"
DRY_RUN="${DRY_RUN:-false}"

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
notify() {
  log "NOTIFY: $1"
  if [[ -n "$SLACK_WEBHOOK" ]]; then
    curl -s -X POST "$SLACK_WEBHOOK" -H 'Content-Type: application/json' \
      -d "{\"text\": \"🔧 Runbook: $1\"}" > /dev/null
  fi
}
fail() { notify "❌ FAILED at step $1: $2"; exit 1; }

# --- Step 1: [Name] ---
step_1() {
  log "Step 1: [description]"
  if [[ "$DRY_RUN" == "true" ]]; then
    log "DRY RUN: would execute [command]"
    return 0
  fi
  # [actual command]
  [command] || fail 1 "[error description]"
  # Verify
  [verification command] || fail 1 "Verification failed"
  log "Step 1: ✅ Complete"
}

# --- Step 2: [Decision Point] ---
step_2() {
  log "Step 2: Checking [condition]"
  local metric
  metric=$([check command])
  if (( $(echo "$metric > 5" | bc -l) )); then
    log "Threshold exceeded ($metric > 5) — escalating"
    step_2a  # escalation path
  else
    log "Within bounds ($metric <= 5) — continuing"
    step_3
  fi
}

# --- Rollback ---
rollback() {
  notify "🔄 Rolling back..."
  log "Rollback: [undo commands]"
  [rollback command 1]
  [rollback command 2]
  notify "Rollback complete"
}

trap 'rollback' ERR

# --- Execute ---
notify "Starting runbook: [Title]"
step_1
step_2
# ... remaining steps
notify "✅ Runbook complete"

2. analyze — Audit Existing Runbooks for Gaps

Read all runbooks in a directory and flag:

# Find runbook-like documents
find . -maxdepth 3 \( -name "*.md" -o -name "*.txt" -o -name "*.adoc" \) | \
  xargs grep -li "runbook\|playbook\|incident\|on-call\|troubleshoot" 2>/dev/null

For each runbook, check:

  • Missing rollback steps — what happens if step 3 fails?
  • No verification — steps that say "do X" but never check if X worked
  • Stale commands — references to deprecated tools, old hostnames, removed services
  • Missing decision criteria — "if it's bad, escalate" (how bad? what metric?)
  • No estimated time — SLA-critical runbooks need time bounds per step
  • Missing prerequisites — assumed access or tools not listed

Output a coverage report:

# Runbook Audit Report

| Runbook | Steps | Automated | Rollback | Verified | Gaps |
|---------|-------|-----------|----------|----------|------|
| DB Failover | 8 | 3/8 (38%) | ✅ | 5/8 | Stale hostname in step 4 |
| API Scale-Up | 5 | 5/5 (100%) | ❌ Missing | 4/5 | No rollback procedure |
| Cache Flush | 3 | 2/3 (67%) | ✅ | 3/3 | Step 2 references removed tool |

3. test — Dry-Run a Generated Playbook

Execute the generated script with DRY_RUN=true:

  • Validate all commands exist in PATH
  • Check prerequisite access (can reach hosts, have credentials)
  • Verify notification hooks work (send test message)
  • Estimate execution time based on sleep/wait steps
  • Flag any steps that would require manual intervention

4. template — Generate Runbook Template

Given an incident type (database, network, application, security), generate a structured template with:

  • Standard sections (scope, impact, prerequisites, steps, rollback, escalation)
  • Common steps for that incident type pre-filled
  • Placeholder verification commands
  • Notification hooks
  • Post-incident review checklist

版本历史

共 1 个版本

  • v1.0.1 当前
    2026-05-07 23:18 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

it-ops-security

1password

steipete
设置和使用 1Password CLI (op)。适用于:安装 CLI、启用桌面应用集成、登录(单/多账户)、通过 op 读取/注入/运行密钥。
★ 53 📥 31,619
it-ops-security

OpenClaw Backup

alex3alex
备份与恢复 OpenClaw 数据。适用于创建备份、设置自动备份计划、从备份恢复或管理备份轮转。处理 ~/.openclaw 目录归档并包含适当的排除规则。
★ 90 📥 30,924
ai-agent

Slack Messaging

charlie-morrison
Slack 消息—通过 CLI 和 API 发送消息、管理频道、上传文件、添加反应并自动化团队通知。
★ 0 📥 516