Overview

Runbook Automator

Transform manual runbooks into automated, executable playbooks. Parse existing documentation, generate step-by-step scripts with health checks, decision points, rollback procedures, and notification hooks — so incidents get resolved faster with less human intervention.

Use when: "automate this runbook", "convert runbook to script", "make this playbook executable", "incident automation", "turn this wiki page into a script", or when building on-call automation.

Commands

1. `convert` — Parse Runbook and Generate Automation

Step 1: Identify Runbook Format

Read the input runbook (markdown, Confluence wiki, Google Doc, plain text) and extract:

- Title and scope: what incident the runbook addresses
- Prerequisites: access, tools, and permissions needed
- Steps: ordered actions (distinguish manual from automatable)
- Decision points: if/then branches
- Verification steps: how to confirm each step worked
- Rollback steps: how to undo changes if things go wrong
- Escalation criteria: when to page someone
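
As a rough illustration of the extraction step, the sketch below pulls the body of one section out of a Markdown runbook. It assumes sections are introduced by `## ` headings named after the items above (for example `## Rollback`). The function name `extract_section` and that heading convention are hypothetical, and wiki or Google Doc sources would first need exporting to text.

```bash
# Print the body of one "## <name>" section from a Markdown runbook.
# Assumption: sections start with "## " headings; matching is case-insensitive.
extract_section() {
  local file="$1" name="$2"
  awk -v want="$name" '
    /^## / {
      heading = tolower(substr($0, 4))
      sub(/[ \t\r]+$/, "", heading)      # ignore trailing whitespace
      in_section = (heading == tolower(want))
      next
    }
    in_section { print }
  ' "$file"
}
```

For example, `extract_section runbook.md "Rollback"` prints every line between the `## Rollback` heading and the next `## ` heading. Deeper headings such as `### ` stay part of the section body.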

Step 2: Classify Each Step

For each step in the runbook, classify as:

| Type | Example | Automation |
|------|---------|------------|
| Command | "Run `kubectl rollout restart`" | Direct script execution |
| Check | "Verify pods are running" | Script with assertion |
| Decision | "If error rate > 5%, proceed to step 4" | Conditional branch |
| Manual | "Call the database team" | Notification + pause |
| Observation | "Watch the dashboard for 10 minutes" | Timed wait + metric check |
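
A first-pass classifier for the table above could be a keyword match on each step's text. This is only a sketch: the function name `classify_step` and the keyword lists are assumptions chosen to cover the example phrases, and real runbooks would need a reviewed, longer list.

```bash
# Heuristic classifier for a single runbook step.
# The first matching pattern wins, so "Check if ..." is treated as a Decision.
classify_step() {
  local step
  step=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$step" in
    "if "*|*" if "*)                    echo "Decision" ;;
    "call "*|*"contact "*|*"page "*)    echo "Manual" ;;
    "watch "*|*"monitor "*|*"wait "*)   echo "Observation" ;;
    "verify "*|*"check "*|*"confirm "*) echo "Check" ;;
    "run "*|*"execute "*|*"restart "*)  echo "Command" ;;
    *)                                  echo "Unclassified" ;;
  esac
}
```

Steps that come back as `Unclassified` are good candidates for human review before any script is generated.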

Step 3: Generate Executable Playbook

#!/usr/bin/env bash
set -euo pipefail

# ============================================
# Automated Runbook: [Title]
# Generated from: [source document]
# Last updated: [date]
# ============================================

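SLACK_WEBHOOK="${SLACK_WEBHOOK:-}"
PAGERDUTY_KEY="${PAGERDUTY_KEY:-}"  # not used by the helpers in this template yet
DRY_RUN="${DRY_RUN:-false}"

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
notify() {
  log "NOTIFY: $1"
  if [[ -n "$SLACK_WEBHOOK" ]]; then
    # Escape backslashes and double quotes so the message stays valid JSON
    local msg=${1//\\/\\\\}
    msg=${msg//\"/\\\"}
    # A failed notification should not abort the runbook under `set -e`
    curl -s -X POST "$SLACK_WEBHOOK" -H 'Content-Type: application/json' \
      -d "{\"text\": \"🔧 Runbook: $msg\"}" > /dev/null \
      || log "WARN: Slack notification failed"
  fi
}
# `exit` does not fire the ERR trap, so roll back explicitly before exiting
fail() { notify "❌ FAILED at step $1: $2"; trap - ERR; rollback; exit 1; }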

# --- Step 1: [Name] ---
step_1() {
  log "Step 1: [description]"
  if [[ "$DRY_RUN" == "true" ]]; then
    log "DRY RUN: would execute [command]"
    return 0
  fi
  # [actual command]
  [command] || fail 1 "[error description]"
  # Verify
  [verification command] || fail 1 "Verification failed"
  log "Step 1: ✅ Complete"
}

# --- Step 2: [Decision Point] ---
step_2() {
  log "Step 2: Checking [condition]"
  local metric
  metric=$([check command])
  if (( $(echo "$metric > 5" | bc -l) )); then
    log "Threshold exceeded ($metric > 5) — escalating"
    step_2a  # escalation path
  else
    log "Within bounds ($metric <= 5) — continuing"
    step_3
  fi
}

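# --- Rollback ---
rollback() {
  trap - ERR  # disarm the trap so a failing rollback command cannot re-enter rollback
  notify "🔄 Rolling back..."
  log "Rollback: [undo commands]"
  [rollback command 1]
  [rollback command 2]
  notify "Rollback complete"
}

# -E (errtrace) makes the ERR trap apply inside the step functions as well
set -E
trap 'rollback' ERR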

# --- Execute ---
notify "Starting runbook: [Title]"
step_1
step_2  # branches to step_2a or step_3 itself; do not call those again here
# ... remaining steps
notify "✅ Runbook complete"
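
The template above covers the Command, Check, and Decision types. The Observation and Manual types from the classification table could be sketched as below. The helper names `observe` and `manual_step` are hypothetical, and both use plain `echo` rather than the template's `notify` so that the block stands alone. Durations are assumed to be positive whole seconds.

```bash
# Observation step: re-run a check command at a fixed interval until the window elapses.
# Usage: observe <duration_s> <interval_s> <command...>
# Returns non-zero as soon as one check fails, so the caller can escalate.
observe() {
  local duration="$1" interval="$2" elapsed=0
  shift 2
  while [ "$elapsed" -lt "$duration" ]; do
    "$@" || return 1
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  return 0
}

# Manual step: announce the action, then block until an operator types "done".
# Any other answer (or end of input) returns non-zero.
manual_step() {
  local prompt="$1" answer
  echo "MANUAL: $prompt"
  printf "Type 'done' to continue: "
  read -r answer
  [ "$answer" = "done" ]
}
```

In a generated playbook, "Watch the dashboard for 10 minutes" might become `observe 600 30 check_error_rate || fail 3 "Error rate regressed"`, where `check_error_rate` is a placeholder for whatever metric check the runbook defines.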

2. `analyze` — Audit Existing Runbooks for Gaps

Find every runbook-like document under the current directory:

# Find runbook-like documents (-exec ... + copes with spaces in file names)
find . -maxdepth 3 \( -name "*.md" -o -name "*.txt" -o -name "*.adoc" \) \
  -exec grep -liE "runbook|playbook|incident|on-call|troubleshoot" {} + 2>/dev/null

For each runbook, check:

- Missing rollback steps: what happens if step 3 fails?
- No verification: steps that say "do X" but never check if X worked
- Stale commands: references to deprecated tools, old hostnames, removed services
- Missing decision criteria: "if it's bad, escalate" (how bad? what metric?)
- No estimated time: SLA-critical runbooks need time bounds per step
- Missing prerequisites: assumed access or tools not listed
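
Several of these checks can be approximated with keyword searches, as in the sketch below. The function name `audit_runbook` and the keyword patterns are assumptions for illustration. A keyword hit only shows that a topic is mentioned, not that it is covered well, and stale commands still need a human or an inventory lookup.

```bash
# Print one line per likely gap in a runbook file; print nothing if none are found.
# Assumption: gaps are detected by keyword absence, which only approximates a review.
audit_runbook() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 2
  fi
  grep -qiE 'roll ?back|revert|undo' "$file" \
    || echo "$file: no rollback steps"
  grep -qiE 'verify|confirm|check that' "$file" \
    || echo "$file: no verification"
  grep -qiE 'escalat|pagerduty|on-call' "$file" \
    || echo "$file: no escalation criteria"
  grep -qiE 'prerequisite|requires' "$file" \
    || echo "$file: no prerequisites"
  grep -qiE '[0-9]+ ?(min|minutes|seconds|hours)' "$file" \
    || echo "$file: no time estimates"
  return 0
}
```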

Output a coverage report:

# Runbook Audit Report

| Runbook | Steps | Automated | Rollback | Verified | Gaps |
|---------|-------|-----------|----------|----------|------|
| DB Failover | 8 | 3/8 (38%) | ✅ | 5/8 | Stale hostname in step 4 |
| API Scale-Up | 5 | 5/5 (100%) | ❌ Missing | 4/5 | No rollback procedure |
| Cache Flush | 3 | 2/3 (67%) | ✅ | 3/3 | Step 2 references removed tool |
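
The "Automated" column is a count plus a rounded percentage. A minimal sketch of how a row could be produced follows. The helper names `coverage` and `report_row` are hypothetical, and the counts are assumed to come from an earlier classification pass.

```bash
# Format "count/total (pct%)" with the percentage rounded to the nearest integer.
coverage() {
  local count="$1" total="$2"
  if [ "$total" -eq 0 ]; then
    echo "0/0 (n/a)"
    return 0
  fi
  echo "$count/$total ($(( (count * 100 + total / 2) / total ))%)"
}

# Emit one Markdown table row for the audit report.
# Arguments: name, step count, automated count, rollback marker, verified count, gaps.
report_row() {
  local name="$1" steps="$2" automated="$3" rollback="$4" verified="$5" gaps="$6"
  printf '| %s | %s | %s | %s | %s/%s | %s |\n' \
    "$name" "$steps" "$(coverage "$automated" "$steps")" \
    "$rollback" "$verified" "$steps" "$gaps"
}
```

Calling `report_row "DB Failover" 8 3 "✅" 5 "Stale hostname in step 4"` reproduces the first row of the sample report.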

3. `test` — Dry-Run a Generated Playbook

Run the generated script with `DRY_RUN=true`, and during the dry run:

- Validate that all commands exist in PATH
- Check prerequisite access (can reach hosts, have credentials)
- Verify that notification hooks work (send a test message)
- Estimate execution time based on sleep/wait steps
- Flag any steps that would require manual intervention
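
Two of these checks are easy to sketch in shell: confirming that commands are on PATH, and estimating wait time. The helper names `check_commands` and `estimate_wait_seconds` are hypothetical. The estimate assumes waits are written as literal `sleep <integer>` calls, so loops, variables, and commented-out sleeps are not handled correctly.

```bash
# Report every required command that is missing from PATH.
# Returns non-zero if at least one is missing.
check_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Sum the integer arguments of literal `sleep N` calls in a script (seconds).
estimate_wait_seconds() {
  awk '
    {
      for (i = 1; i < NF; i++)
        if ($i == "sleep" && $(i + 1) ~ /^[0-9]+$/) total += $(i + 1)
    }
    END { print total + 0 }
  ' "$1"
}
```

A dry run might start with `check_commands kubectl curl bc` and then report `estimate_wait_seconds playbook.sh` as a lower bound on execution time.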

4. `template` — Generate Runbook Template

Given an incident type (database, network, application, security), generate a structured template with:

- Standard sections (scope, impact, prerequisites, steps, rollback, escalation)
- Common steps for that incident type pre-filled
- Placeholder verification commands
- Notification hooks
- Post-incident review checklist
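
One possible shape for such a generator is a function that prints a Markdown skeleton, as sketched below. The function name `generate_template`, the pre-filled first step for each incident type, and the checklist items are all illustrative assumptions rather than vetted procedures.

```bash
# Print a Markdown runbook skeleton for a given incident type.
# The per-type first step is a placeholder suggestion only.
generate_template() {
  local incident="$1" first_step
  case "$incident" in
    database)    first_step="Check replication status" ;;
    network)     first_step="Check connectivity between affected hosts" ;;
    application) first_step="Check recent deploys and error rates" ;;
    security)    first_step="Identify and isolate affected accounts or hosts" ;;
    *) echo "unknown incident type: $incident" >&2; return 1 ;;
  esac
  cat <<EOF
# Runbook: [Title] ($incident)

## Scope
## Impact
## Prerequisites
## Steps
1. $first_step
   - Verify: [verification command]
   - Notify: [channel]
## Rollback
## Escalation
## Post-incident review
- [ ] Timeline recorded
- [ ] Root cause identified
- [ ] Runbook updated
EOF
}
```

For example, `generate_template database > db-incident.md` writes a skeleton that can then be filled in by the owning team.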

Version history

1 version in total

v1.0.1 (current)

2026-05-07 23:18

