A good postmortem learns without blaming individuals. It produces owned actions that reduce recurrence or improve detection—not generic “communicate better” platitudes.
Trigger conditions:
Initial offer:
Use six stages: (1) scope & audience, (2) timeline & impact, (3) root cause analysis, (4) what worked / didn’t, (5) action items, (6) communication & follow-up). Confirm internal-only vs customer-facing summary.
Goal: Define readers (exec, engineering, CS) and redact PII or sensitive security details.
Exit condition: Template chosen; owner for the final document.
Goal: Minute-resolution timeline in UTC: detection → onset → mitigation → resolution.
Exit condition: Facts align with any external customer communication.
Goal: Use five whys or fishbone as tools, not rituals. Separate root cause (fix that stops the class of failure) from contributing factors (process gaps, missing tests).
Exit condition: Evidence-backed causal chain; contributing factors listed.
Goal: Reinforce positives (runbooks followed, clear comms) and negatives (missing dashboards, slow escalation).
Goal: Specific tickets with owners and dates; categorize prevent / detect / recover / process.
Exit condition: Items linked in the issue tracker.
Goal: Share summary internally; external postmortem only when policy requires; track completion in 30/60 days.
共 1 个版本