Deep Debugging

Evidence-first debugging and incident triage for unclear, recurring, production-like, or high-risk software bugs. Use when the user asks for root cause analy...
Author: brasco05
Source: clawhub · Uncategorized · v2.2.0 · 3 versions · Key: not required
Stars: 0 · Downloads: 759 · Installs: 1

Overview

Deep Debugging

No guessing. No random fixes. Stabilize incidents first, then prove the root cause.

When to use

Use this skill for:

  • unclear or recurring bugs
  • user-facing/prod-like failures: 500, broken login, failed deploy, red healthcheck
  • auth/session bugs: 401, 403, cookie/JWT weirdness
  • integrations/webhooks/rate limits/signature failures
  • “still broken”, “same error”, “find root cause”, “debug this properly”

Do not use it for obvious compiler errors, typos, missing install/setup steps, or cosmetic UI tweaks.

Operating contract

  1. Observe before editing. No code/config changes before evidence + hypothesis.
  2. One hypothesis at a time. If you cannot state the proof, you do not know the cause.
  3. Binary search the chain. Split request → app → service → DB/API → response.
  4. Minimal reversible fix. No drive-by refactors.
  5. Verify and prevent. Test the exact failure path and document recurrence prevention.

Workflow

0. Incident Gate   → user/prod impact? stabilize first
1. Quick Triage    → obvious setup/runtime misses
2. Evidence        → exact error, repro, affected path, last change
3. Hypothesis      → one testable cause + one test
4. Narrow          → binary-search the failure chain
5. Fix             → smallest reversible change
6. Verify          → exact repro/test/build/log evidence
7. Prevent         → regression/monitoring/learning when recurring or prod-like

Phase 0 — Incident Gate

If users, production, money flows, auth, data integrity, or external integrations are affected, switch to incident mode before debugging.

Output first:

INCIDENT SNAPSHOT
Impact:     [who/what is affected]
Severity:   [low/medium/high/critical + why]
Started:    [time/commit/deploy if known]
Evidence:   [logs/status/metrics; redacted]
Stabilize:  [rollback, feature flag, pause job, monitor, or no-op]
Next step:  [one concrete diagnostic action]

Rules:

  • High/critical production incidents: check rollback/feature flag before hotfix.
  • Preserve evidence before redeploy/restart when possible.
  • Production writes, rollbacks, migrations, credential changes, or third-party changes require explicit user approval.

For detailed incident checklists read references/incident-first.md.
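
To keep the snapshot consistent across incidents, it can help to render it from structured fields. The following is a minimal sketch, not part of this skill: the `IncidentSnapshot` class, the `requires_approval` helper, and the action names in `APPROVAL_REQUIRED` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

SEVERITIES = ("low", "medium", "high", "critical")

# Hypothetical labels for the action kinds that the rules above gate
# behind explicit user approval.
APPROVAL_REQUIRED = {
    "production_write",
    "rollback",
    "migration",
    "credential_change",
    "third_party_change",
}


@dataclass
class IncidentSnapshot:
    impact: str
    severity: str
    severity_reason: str
    started: str
    evidence: str  # must already be redacted by the caller
    stabilize: str
    next_step: str

    def render(self) -> str:
        """Return the snapshot in the plain-text layout shown above."""
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")
        rows = [
            ("Impact", self.impact),
            ("Severity", f"{self.severity} ({self.severity_reason})"),
            ("Started", self.started),
            ("Evidence", self.evidence),
            ("Stabilize", self.stabilize),
            ("Next step", self.next_step),
        ]
        lines = ["INCIDENT SNAPSHOT"]
        lines += [(label + ":").ljust(12) + value for label, value in rows]
        return "\n".join(lines)


def requires_approval(action_kind: str) -> bool:
    """True when the action must wait for explicit user approval."""
    return action_kind in APPROVAL_REQUIRED
```

Rejecting an unknown severity at render time forces the author to pick one of the four levels instead of leaving the field vague.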

Phase 1 — Quick Triage

Check these before deeper analysis:

□ Server/process restarted after config/code change?
□ Correct env file/keys present? Key names only, never values.
□ Dependencies installed/generated after package/schema changes?
□ Migration/schema state matches runtime?
□ Browser/client cache or stale build ruled out?
□ Repro uses test data, not live credentials/customer data?

If a quick triage item explains the issue, fix that minimally and still verify.
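
The "key names only, never values" check can be sketched as follows. This is an illustrative helper, not the skill's own script: `env_key_report` is a hypothetical name, and treating an empty value as missing is an assumption.

```python
import os


def env_key_report(required, env=None):
    """Report which required keys are set, without exposing any value.

    An empty string counts as missing (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    present = sorted(key for key in required if env.get(key))
    missing = sorted(key for key in required if not env.get(key))
    return {"present": present, "missing": missing}
```

For example, `env_key_report(["DATABASE_URL", "JWT_SECRET"])` returns two lists of key names that are safe to paste into a report, since no value is ever read into the result.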

Phase 2 — Evidence

Collect real proof:

Error:       exact message/status/stack excerpt
Path:        endpoint/function/job/component
Repro:       minimal steps or request shape
Scope:       all users vs specific role/input/tenant/environment
Expected:    what should happen
Actual:      what happens
Last change: commit/deploy/config/schema/provider change

Optional helper: run scripts/incident_snapshot.sh locally to collect safe environment metadata. It prints env key names only, not values.
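
Evidence should be redacted before it is pasted into a snapshot or report. The sketch below shows one possible approach using regular expressions. The patterns are illustrative assumptions and are not exhaustive, so redacted output should still be reviewed by eye.

```python
import re

# Each entry is (pattern, replacement). Order matters: header-level rules run
# before the generic key=value rule.
_PATTERNS = [
    # Authorization header: keep the scheme, drop the credential.
    (re.compile(r"(?i)(authorization:\s*(?:bearer|basic)\s+)\S+"), r"\1[REDACTED]"),
    # Cookie and Set-Cookie headers: drop the whole value.
    (re.compile(r"(?i)((?:set-)?cookie:\s*)[^\r\n]+"), r"\1[REDACTED]"),
    # key=value or key: value pairs whose key looks secret.
    (
        re.compile(r"(?i)\b((?:password|passwd|secret|token|api[_-]?key)\s*[=:]\s*)\S+"),
        r"\1[REDACTED]",
    ),
    # JWT-shaped strings: three dot-separated base64url segments.
    (re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"), "[REDACTED_JWT]"),
]


def redact(text: str) -> str:
    """Return text with likely secrets replaced by placeholders."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Keeping the header name and auth scheme in the output preserves the diagnostic signal (which header was present, which scheme was used) while removing the credential itself.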

Phase 3 — Hypothesis

State exactly one hypothesis before touching code:

HYPOTHESIS: The failure happens because [specific cause],
which I will prove/disprove by [specific test].

Bad: “Something is wrong with auth.”

Good: “The 401 happens because the login token is set but not sent on /me, which I will prove by comparing the login response headers with the follow-up request headers.”
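
The test named in the good hypothesis can be made mechanical. The sketch below assumes the login token is delivered as a cookie (the source does not say whether it is a cookie or a header). It compares cookie names only, so no token value needs to be copied. `missing_on_followup` is a hypothetical helper name.

```python
from http.cookies import SimpleCookie


def cookie_names(header_value: str) -> set:
    """Extract cookie names from a Cookie or Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    return set(jar.keys())


def missing_on_followup(login_set_cookie_headers, followup_cookie_header):
    """Names of cookies set by the login response but absent from the follow-up request."""
    set_by_login = set()
    for value in login_set_cookie_headers:
        set_by_login |= cookie_names(value)
    sent = cookie_names(followup_cookie_header or "")
    return set_by_login - sent
```

A non-empty result supports the hypothesis (the cookie was set but not sent), while an empty result disproves it and points further down the chain, for example to server-side validation.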

Phase 4 — Narrow with binary search

Pick the chain and split it:

Frontend → request creation → network → API gateway/middleware → controller → service → DB/external API → response → UI

After each test, report:

✅ Ruled out: [component] because [evidence]
❌ Found: [component] fails because [evidence]

For stack-specific checklists read references/stack-checklists.md.
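
The narrowing step is an ordinary binary search over the ordered chain. The sketch below assumes that once the data is wrong after one stage it stays wrong after every later stage, and that a probe exists for each stage. `first_failing_stage` and `is_healthy_after` are hypothetical names.

```python
def first_failing_stage(stages, is_healthy_after):
    """Find the first stage whose output is wrong.

    stages: ordered list of stage names, upstream first.
    is_healthy_after: callable(stage) -> bool, True if the data is still
        correct at the output of that stage.
    Returns (stage_or_None, probes), where probes is the list of
    (stage, healthy) checks that were run, in order.
    """
    if not stages:
        raise ValueError("stages must not be empty")
    probes = []

    def check(stage):
        healthy = is_healthy_after(stage)
        probes.append((stage, healthy))
        return healthy

    lo, hi = 0, len(stages) - 1
    if check(stages[hi]):
        return None, probes  # end of chain is healthy: failure not reproduced
    while lo < hi:
        mid = (lo + hi) // 2
        if check(stages[mid]):
            lo = mid + 1  # everything up to mid is ruled out
        else:
            hi = mid  # the fault is at mid or earlier
    return stages[lo], probes
```

Each entry in `probes` maps directly onto one "Ruled out" or "Found" line, so the evidence trail is produced as a side effect of the search.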

Phase 5 — Fix

Only after evidence supports the hypothesis:

  • change the smallest surface area
  • do not refactor unrelated code
  • prefer reversible config/code changes
  • keep secrets and production data out of logs/reports
  • stop after 3 failed fix attempts and restart from evidence/hypothesis

Phase 6 — Verify

Before saying done, provide evidence:

DEBUG REPORT
Failure:      [exact issue]
Root cause:   [specific cause]
Proof:        [test/log/code evidence]
Fix:          [minimal change]
Verified:     [command/test/repro result]
Prevention:   [test/monitoring/doc/learning, or "not needed" + why]
Remaining:    [risk/blocker, or "none known"]

For report variants read references/output-templates.md.
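
One way to enforce "evidence before done" is to refuse to render a report with empty fields. This is a minimal sketch under that assumption. `incomplete_fields` and `render_debug_report` are hypothetical helpers, not part of the skill.

```python
REQUIRED_FIELDS = (
    "Failure", "Root cause", "Proof", "Fix",
    "Verified", "Prevention", "Remaining",
)


def incomplete_fields(report: dict) -> list:
    """Names of required fields that are absent or blank."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(report.get(name, "")).strip()
    ]


def render_debug_report(report: dict) -> str:
    """Render the report in the layout above, or raise if any field is blank."""
    missing = incomplete_fields(report)
    if missing:
        raise ValueError("report is not done; missing: " + ", ".join(missing))
    lines = ["DEBUG REPORT"]
    lines += [(name + ":").ljust(14) + str(report[name]) for name in REQUIRED_FIELDS]
    return "\n".join(lines)
```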

Phase 7 — Prevent recurrence

Required when the bug is production-like, recurring, security-adjacent, or took more than one hypothesis:

□ Regression test or smoke test added/identified
□ Monitoring/logging improved or gap named
□ Runbook/rollback note captured for future incidents
□ Durable learning written if likely to recur

Hard rules

  • Never fix before hypothesis.
  • Never test with real credentials or copy raw tokens/cookies/secrets.
  • Never mix unrelated refactor with bug fix.
  • Never claim root cause without proof.
  • Never hide remaining risk.
  • If meaningful optimization potential remains, mention it under "Next optimization:".

Version history

3 versions in total

  • v2.2.0 (current)
    2026-05-21 12:41 · Safe · Safe
  • v1.0.5
    2026-05-08 12:38 · Safe · Safe
  • v1.0.3
    2026-05-03 04:37 · Safe · Safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk