Bug Fix v4.0 — OpenClaw Edition (Zero-Regression + Portable)

Core Promise: Fix completely. Fix everywhere. Break nothing. Learn from every fix.

Iron Rules (12 — NEVER Violate)

┌──────────────────────────────────────────────────────────────────────────┐
│  Rule 1:  Root cause MUST pass 4 gates before fixing                     │
│           (reproducible + causal + reversible + mechanistic)              │
│                                                                          │
│  Rule 2:  Scope MUST pass 5 gates before fixing                          │
│           (consumers + contracts + invariants + call sites + dup scan)    │
│                                                                          │
│  Rule 3:  MUST trace IMPACT CHAIN (code → data → time → event)          │
│           + scan ALL files for same pattern before writing fix            │
│                                                                          │
│  Rule 4:  MUST predict side effects + check blind spots before coding    │
│           (references/blind-spots.md is single source of truth)           │
│                                                                          │
│  Rule 5:  After fix, MUST run regression verification                    │
│           (functional + performance + concurrency + all impact levels)    │
│                                                                          │
│  Rule 6:  MUST verify fix is LOADED at runtime                           │
│           (clear __pycache__ + restart + exercise code path)              │
│                                                                          │
│  Rule 7:  Framework behavior → read source code first, never trust       │
│           docs/comments/assumptions alone                                │
│                                                                          │
│  Rule 8:  UI bugs MUST gather RUNTIME EVIDENCE before proposing fixes    │
│           (screenshot + DevTools DOM/console + user repro steps)          │
│           Do NOT fix UI bugs based on code reading alone.                │
│                                                                          │
│  Rule 9:  Fix is NOT done until: Bug Summary output + code-review        │
│           passes + knowledge files updated + self-reflection complete     │
│                                                                          │
│  Rule 10: Before fixing, CLASSIFY the problem layer:                     │
│           code bug? missing config? wrong architecture? AI capability?   │
│           Fix at the root layer, not at the symptom layer.               │
│                                                                          │
│  Rule 11: Pattern matching (regex, string match, name lookup) MUST       │
│           check boundary conditions (word boundaries / anchors / exact)   │
│                                                                          │
│  Rule 12: Before fix, MUST search bug pattern library + bug records      │
│           for known fixes and historical context                         │
└──────────────────────────────────────────────────────────────────────────┘

Workflow Overview

Phase 0: Triage → Severity (P0-P3) + Tier (Trivial/Standard/Complex)
  │
  ├─ Trivial → Quick Fix → test → done
  │
  ├─ Standard ─┐
  └─ Complex ──┘
      │
Phase 1: Reproduce (evidence required)
      │
Phase 2: Root Cause Analysis
    2A: Hypothesis ladder → 5 Whys → evidence
    2B: Search knowledge files (bug-patterns + bug-records)
    2C: Impact chain (code + data + time + event)
    2D: Similar issue scan across codebase
      │
Phase 3: Scope + Prediction
    3A: Consumer list → contracts → invariants → dup scan (5 gates)
    3B: Side effect prediction + blind spot check
    3C: Fix strategy comparison (when >10 LOC, Complex only)
      │
Phase 4: Fix (minimal change, prefer ≤50 LOC)
      │
Phase 5: Verify + Review
    5A: Regression verification (functional + perf + concurrency)
    5B: Runtime deployment verification
    5C: Bug Summary + code-review skill
      │
Phase 6: Knowledge Deposit + Self-Reflection

Phase 0: Triage + Severity

> Classify severity AND tier FIRST to control workflow depth.

Severity Classification (controls workflow depth)

|----------|----------|----------|----------|

Tier Classification (controls fix path)

| Tier | Criteria | Path |

|------|----------|------|

| Trivial | Typo, config value, 1-line obvious fix, no behavioral change | Quick Fix (below) |

| Standard | Logic bug, 1-3 files, clear symptom, no cross-module risk | Standard Path (skip phases marked "Complex only") |

| Complex | Cross-module, >3 files, shared utility, schema change, multi-process | Full Path (all phases mandatory) |

Quick Fix Path (Trivial only)

## Quick Fix
- Bug: [one-line description]
- Fix: [one-line change]
- File: [path:line]
- Test: [how verified — lint/test/manual]
- Risk: None (isolated, no behavioral change)

After quick fix: update references/bug-records.md, done. No RCA, no impact chain, no self-reflection needed.

If "trivial" fix touches >1 file or changes behavior → upgrade to Standard.

Auto-Initialize Knowledge Files

Check: references/bug-patterns.md exists?
  YES → search it in Phase 2B
  NO  → skip pattern search; create after first fix

Check: references/bug-records.md exists?
  YES → search it in Phase 2B
  NO  → skip records search; create after first fix

Check: references/blind-spots.md exists?
  YES → use it in Phase 3B
  NO  → skip blind spot check; create after first fix

Phase 1: Reproduce

> MUST have evidence before continuing. No evidence = no fix.

| Bug Type | Evidence Required |

|----------|------------------|

| Backend error | Stack trace + request/response |

| Frontend UI | Screenshot + browser console + user repro steps (Rule 8) |

| Performance | Before/after metrics + profiler output |

| Intermittent | Timing conditions + frequency estimate |

UI Bug Protocol (Rule 8):

Get user screenshot or screen recording
Open browser DevTools → check Console for errors/warnings
Inspect DOM structure (check for overflow clipping, z-index, Portal needs)
Reproduce the exact user steps
ONLY THEN form hypotheses

Evidence Bundle Template

### Trigger Conditions
- Input/params: [...]
- Environment: [OS/browser/runtime version]
- Timing: [action sequence or time interval]

### Observable Output
- Error message: [full error text]
- Logs: [key log lines]
- Screenshot/recording: [if available]

### Correlation IDs
- requestId/traceId: [...]
- sessionId: [...]

Phase 2: Root Cause Analysis

2A: Hypothesis Ladder

|---|-----------|-----------|-------------------|----------------|--------|

Rules: Sort by likelihood → each must be falsifiable → run rejection tests first → test ONE at a time → use 5 Whys to reach root cause.

Root Cause Confirmation Gate (Rule 1)

Root cause is confirmed only when ALL 4 conditions are met:

| Gate | Meaning |

|------|---------|

| Reproducible | Can trigger symptom in controlled scenario |

| Causal | Minimal change makes bug disappear |

| Reversible | Reverting the change makes bug reappear |

| Mechanistic | Can point to exact code path / state transition |

Framework Assumption Audit (Rule 7)

When fix involves framework/library behavior: list assumptions → read source code to verify → document in comments with source references.

2B: Search Knowledge Files (Rule 12)

> Search bug-patterns.md and bug-records.md for matching patterns.

> Skip if files don't exist (see Phase 0 auto-init).

| Match Level | Action |

|-------------|--------|

| High (symptom + root cause match) | Apply known fix, can skip remaining RCA |

| Medium (similar symptom) | Reference strategy, verify |

| No match | Full investigation, must deposit after fix |

2C: Impact Chain (Rule 3)

| Dimension | What to Check |

|-----------|---------------|

| Code | Bug file → direct callers → indirect callers → deep callers |

| Data | Corrupted records in DB/file/cache? Repair script needed? |

| Time | When introduced? Duration of exposure? Users affected? |

| Event | Message queues, WebSocket, background workers affected? |

2D: Similar Issue Scan (Rule 3)

> Scan ALL files for the same bug pattern, not just the reported file.

rg -n "function_name\|similar_pattern" --glob "*.{ts,tsx,py,js}"

Phase 3: Scope + Prediction

Scope Accuracy Gate (Rule 2)

| # | Gate | Meaning |

|---|------|---------|

| 1 | Consumer List | All consumers (callers/dependents) enumerated |

| 2 | Contract List | Modified contracts/interfaces/behaviors listed |

| 3 | Invariant Check | Must-hold invariants listed |

| 4 | Call Site Enum | All call sites enumerated and classified |

| 5 | Duplicate Scan | No parallel implementation left unfixed |

3A: Side Effect Prediction (Rule 4)

Change Blueprint — What exactly will change
Impact Ripple — L0 (code) → L1 (module) → L2 (feature) → L3 (system) → L4 (user)
Blind Spot Check — Read references/blind-spots.md and execute every active check
Go/No-Go Decision

Quick version (for Standard-tier, ≤5 LOC, 1 file):

## Quick Impact Check
- Change: [one-line description]
- Direct callers: [list or "none - local function"]
- Duplicates: [checked — none / found and planned]
- Could break: [prediction or "low risk - isolated"]
- Decision: GO

3B: Fix Strategy Comparison (>10 LOC, Complex only)

| Dimension | Strategy A | Strategy B |

|-----------|-----------|-----------|

| LOC change | | |

| Impact scope | | |

| Regression risk | | |

| Rollback-able | | |

Phase 4: Fix

Minimal change, prefer ≤50 LOC; justify if more
ONE change at a time, never batch unrelated fixes
Layer Rule (Rule 10): Before writing fix code, verify you're fixing the right layer:

| Problem in… | Fix… | Do NOT fix… |

|-------------|------|-------------|

| Params/config | Config or param passing | Business logic |

| Single component | That component | Framework |

| Multiple components same issue | Framework/base class | Each component one by one |

| Docs vs code mismatch | Both sides in sync | Only one side |

Pattern matching safety (Rule 11): regex, string match, name lookup → always consider boundary conditions
DB schema change? Generate Alembic migration:

```bash

cd backend && alembic revision --autogenerate -m "describe change"

```

Phase 5: Verify + Review

5A: Regression Verification (Rule 5)

| Category | Checks |

|----------|--------|

| Functional | Unit tests + integration + API + E2E + manual |

| Performance | No N+1 queries, no resource leaks, no response time increase |

| Concurrency | Thread-safe shared state, atomic operations, no race conditions |

Test the entire impact chain (L0-L3), not just the original bug.

5B: Runtime Deployment Verification (Rule 6)

| Step | Action | Evidence |

|------|--------|----------|

| 1 | Clear Python bytecode cache | __pycache__ removed |

| 2 | Restart backend service | PID changed from X to Y |

| 3 | Health check passes | /docs returns 200 |

| 4 | Exercise the fixed code path | Request triggers fixed logic |

If NOT deployed → restart and re-verify before proceeding.

5C: Bug Summary + Code Review (Rule 9)

## Bug Summary [BUG-XXX]
- **Symptom**: [one-sentence user-visible problem]
- **Root Cause**: [one-sentence actual cause]
- **Fix**: [one-sentence fix description]
- **Files Modified**: [file1.py, file2.ts]
- **Severity**: P0/P1/P2

Output Bug Summary → run code-review skill → if review finds issues → fix → re-verify

Stop condition: Code review clean + regression passed + deployment verified + original bug fixed.

Special Checks

| Bug Type | Key Checks |

|----------|-----------|

| API Bug | Frontend → API → Schema → Service → DB chain; field completeness |

| DB Migration | Model changed → alembic revision --autogenerate; no migration = schema drift |

| System-level | Draw E2E chain; define handshake evidence per edge; insert probes first |

| Cross-Surface | Shared artifact → identify contract → consumer list → regression matrix |

Phase 6: Knowledge Deposit + Self-Reflection

6.1 Update Knowledge Files (Rule 9)

| File | When to Update |

|------|---------------|

| references/bug-records.md | Every fix (project history) |

| references/bug-patterns.md | New pattern / new fix strategy (universal) |

| references/blind-spots.md | New blind spot discovered |

6.2 Self-Reflection (Rule 9)

| Dimension | Score (1-5) | Evidence |

|-----------|------------|----------|

| First-time correctness | [1-5] | Did the fix work on first attempt? |

| Scope accuracy | [1-5] | Did I find all affected areas? |

| Minimal change | [1-5] | Was the change as small as possible? |

| Side effect prediction | [1-5] | Did I predict all side effects? |

| Root cause depth | [1-5] | Did I fix root cause, not symptom? |

| Total | [/25] | |

|-------|--------------|-----------------|------------|

Regression Autopsy (when fix introduced a regression)

- **Original Bug**: [what was being fixed]
- **New Bug Introduced**: [what broke]
- **Why I didn't predict it**: [blind spot]
- **Classification**: [missed consumer / contract violation / edge case / ...]

Domain-Specific Checks

| Bug Type | Key Checks |

|----------|-----------|

| Backend/API | Schema drift, timeout/retry, transactions, N+1, connection pool, ORM lazy loading |

| Frontend/UI | State (useEffect deps, unmount), race conditions, CORS, hydration, overflow/Portal |

| System-level | Cross-layer chain, async/streaming, IPC, routing |

| Framework | Read source code first (Rule 7), verify assumptions with tests |

| AI/LLM | Tool binding modes, simulated vs native, streaming, token limits |

Skill Delegation

| Trigger | Delegate To |

|---------|-----------|

| Need new API endpoint | fullstack-developer |

| UI fix needed | frontend-design |

| Schema change needed | database-migrations |

| After fix (mandatory) | code-review |

Anti-Patterns (FORBIDDEN)

| Forbidden | Correct |

|-----------|---------|

| Fix without RCA | Hypothesis ladder first |

| Single hypothesis then fix | List 3-5 hypotheses, verify each |

| Fix UI bug by code reading alone | Get runtime evidence first (Rule 8) |

| Skip consumer list for shared code | Fill consumer list first |

| Tests pass but server runs old code | Clear cache + restart + verify fix is live (Rule 6) |

| Fix code but ignore corrupted data | Assess data impact + repair if needed |

| Trust framework docs blindly | Read source code or run tests (Rule 7) |

| Fix one copy, miss the duplicate | Grep function name; check both Path A and Path B |

| Pattern match without boundary check | Add word boundaries / anchors / exact match (Rule 11) |

| Model changed but no migration | Run alembic revision --autogenerate |

| Use full workflow for a typo | Use Quick Fix path (Phase 0 Trivial tier) |

| Skip self-reflection | Must score, analyze, and learn |

Final Checklist

Core (Standard + Complex tiers)

| # | Check | Phase |

|---|-------|-------|

| 1 | Severity (P0-P3) + Tier (Trivial/Standard/Complex) classified | 0 |

| 2 | Root cause passes 4 gates | 2A |

| 3 | Bug pattern library + records searched | 2B |

| 4 | Impact chain traced (code+data+time+event) | 2C |

| 5 | Similar issue scan completed | 2D |

| 6 | Scope passes 5 gates (incl. duplicate scan) | 3A |

| 7 | Side effect prediction + blind spot check | 3A |

| 8 | Regression verification ALL passed (L0-L3) | 5A |

| 9 | Runtime deployment verified | 5B |

| 10 | Bug Summary output + code-review passed | 5C |

| 11 | Knowledge files updated | 6.1 |

| 12 | Self-reflection completed | 6.2 |

| 13 | If DB model changed: Alembic migration generated | 5 |

| 14 | User confirmed fix + no new bugs | Final |

Trivial Tier Checklist (Quick Fix path only)

| # | Check | Status |

|---|-------|--------|

| 1 | Fix applied and tested (lint/test/manual) | [ ] |

| 2 | Bug record entry added | [ ] |

| 3 | No behavioral change introduced | [ ] |

OpenClaw Project Context

Architecture Map

backend/app/
├── api/v1/              # FastAPI routes (agents, auth, chat, skills, tools, profile)
├── core/
│   ├── graph/           # LangGraph StateGraph (agent_graph, nodes/llm_node, tool_node, prepare_node)
│   ├── langchain/       # LangChain tools (tools.py, shell_tool.py, e2b_tools.py)
│   ├── mcp/             # MCP server integration (pool.py)
│   ├── database.py      # SQLAlchemy async engine
│   └── security.py      # JWT auth
├── models/              # SQLAlchemy ORM models (agent, tool, user, skill)
├── schemas/             # Pydantic request/response schemas
├── services/            # Business logic (agent_executor, chat_service, tool_call_parser, ...)
├── middleware/           # Request logging, audit, error handling
└── main.py              # FastAPI app entry

frontend/src/
├── features/            # Feature modules (chat, settings, admin, knowledge, skills, agents)
│   ├── chat/            # ChatPageV2, MessageRenderer, ToolCallCard, SkillExecutionInline
│   └── ...
├── components/ui/       # shadcn/ui style components (dialog, switch, checkbox)
├── hooks/               # React hooks (useChatStream — SSE event handling)
├── store/               # Zustand state management
└── lib/                 # API client (api-client.ts), markdown utils

Tech Stack

| Layer | Technology |

|-------|-----------|

| Backend | FastAPI + Python 3.11+ |

| ORM | SQLAlchemy 2.0 (async) |

| DB | PostgreSQL (asyncpg) or MySQL (aiomysql) |

| Migrations | Alembic (backend/alembic/) |

| Cache | Redis |

| AI | LangChain 0.3.x + LangGraph 0.4.x |

| Vector DB | ChromaDB |

| Frontend | React 18 + Vite + TypeScript |

| UI | Radix UI + Tailwind CSS |

| State | Zustand + TanStack Query |

| Tests | pytest (backend), Vitest (frontend) |

| Deploy | Docker Compose, supports PyInstaller desktop build |

High-Risk Bug Zones

Backend Hot Zones

| Zone | Files | Why It's High-Risk |

|------|-------|--------------------|

| Simulated Tool Call Parsing | services/tool_call_parser.py, core/graph/nodes/llm_node.py | Regex-based; dual implementations; multi-arg edge cases |

| Agent Executor | services/agent_executor.py | 3000+ LOC; native + simulated modes; complex streaming |

| Tool Argument Remapping | core/graph/nodes/tool_node.py | LLM wrong param names → alphabetical guess |

| LLM Streaming (httpx) | services/llm_manager.py | Reasoning model fallback; SSE; reasoning_content |

| MCP Tool Integration | core/mcp/pool.py, services/tool_service.py | MCP lifecycle; command vs HTTP; timeout |

| Skill Runtime | services/skill_executor.py, services/skill_service.py | Script exec; env var injection; enhanced vs local |

| Chat Streaming | services/chat_service.py | SSE events; client disconnect; async save |

| Memory System | services/unified_memory_manager.py | L1/L2; embedding scoring; slow queries |

Frontend Hot Zones

| Zone | Files | Why It's High-Risk |

|------|-------|--------------------|

| SSE Chat Stream | hooks/useChatStream.ts | Event parsing; reconnection; reasoning_content |

| Tool Call Rendering | features/chat/components/ToolCallCard.tsx | Dynamic display; error states; loading |

| Skill Execution UI | features/chat/components/SkillExecutionInline.tsx | Inline status; progress; error display |

| Markdown Renderer | features/chat/components/MarkdownRenderer.tsx | Nested code fences; special chars; XSS |

| Agent Editor | features/agents/AgentEditorPage.tsx | Complex form state; tool/skill/KB associations |

| Zustand Store | store/ | State updates not re-rendering if reference unchanged |

Two Code Paths for Agent Execution

Path A: Direct Executor (most common)
  chat API → chat_service → agent_executor.py → tool_call_parser.py → tool execution

Path B: StateGraph (LangGraph)
  chat API → chat_service → agent_graph.py → llm_node.py → tool_node.py → tool execution

When fixing anything in Path A, always check Path B for the same issue (and vice versa).

Known Duplicate Implementations

| Function / Feature | Primary Location | Known Alternate Location |

|--------------------|-----------------|-------------------------|

| parse_simulated_tool_calls | services/tool_call_parser.py | core/graph/nodes/llm_node.py |

| Tool loading / binding | services/agent_executor.py | core/graph/agent_graph.py |

| Token counting | services/token_counter.py | May have inline counting in agent_executor.py |

| Memory management | services/unified_memory_manager.py | core/graph/nodes/prepare_node.py |

OpenClaw Common Framework Pitfalls (Rule 7)

| Area | Assumption | Reality |

|------|-----------|---------|

| Config load order | First file has priority | Often last file wins (dict.update) |

| ORM lazy loading | Relations auto-load | Default lazy, causes N+1 |

| Async on Windows | Same as Linux | Windows uses ProactorEventLoop; run_dev.py forces SelectorEventLoop |

| Pydantic serialization | model_dump() includes all | exclude_unset=True changes behavior |

| LangChain tool binding | All models support tools | Reasoning models need simulated mode |

| __pycache__ | Python uses latest source | Stale .pyc can persist across restarts |

Windows Development Environment Gotchas

| Issue | Symptom | Workaround |

|-------|---------|-----------|

| Path separators | \ vs / | Use pathlib.Path or os.path.join |

| asyncio event loop | ProactorEventLoop default | run_dev.py forces loop="asyncio" |

| __pycache__ file locks | Can't delete while running | Kill process FIRST, then clean |

| Console encoding | GBK/CP936 default | sys.stdout.reconfigure(encoding='utf-8') |

| Playwright on Windows | Browser launch may fail | Runs in separate thread with own loop |

Verification Commands (OpenClaw)

Backend

cd backend
ruff check app/                                    # Lint
python -m pytest tests/ -v --tb=short              # Unit tests
python -m pytest tests/test_specific.py -v         # Specific test
alembic revision --autogenerate -m "check"         # DB migration check

Frontend

cd frontend
npm run lint
npm run typecheck
npm test
npm run build

Backend Server Restart

Get-WmiObject Win32_Process -Filter "Name='python.exe'" | Where { $_.CommandLine -like "*run_dev*" }
Stop-Process -Id [PID] -Force
Get-ChildItem -Path "backend" -Recurse -Filter "__pycache__" -Directory | Remove-Item -Recurse -Force
Start-Process -FilePath "backend\venv\Scripts\python.exe" -ArgumentList "run_dev.py" -WorkingDirectory "backend"
Invoke-WebRequest -Uri "http://127.0.0.1:8000/docs" -UseBasicParsing -TimeoutSec 5

Reference Files

Living Data Files (update after every fix)

| File | Purpose |

|------|---------|

| references/bug-records.md | Project-specific bug history |

| references/blind-spots.md | Single source of truth for AI blind spot registry |

Pattern Libraries (domain knowledge)

| File | Purpose |

|------|---------|

| references/bug-patterns.md | Universal bug pattern library (11 categories) |

| references/backend-patterns.md | Backend issues (API, ORM, LLM integration, OpenClaw-specific) |

| references/frontend-patterns.md | Frontend issues (React hooks, race conditions, CORS) |

Detailed Guides

| File | Purpose |

|------|---------|

| references/system-rca.md | System-level RCA (cross-layer, multi-process bugs) |

| references/regression-matrix.md | Complete zero-regression verification matrix |

Skill Evolution

Update this skill when:

Code review finds a bug that the workflow should have prevented
A recurring bug class repeats across fixes

Prefer updating specific sections over adding new rules.

After updates, validate that the workflow is still coherent and not overly bureaucratic.

bug-fixing

概述

Bug Fix v4.0 — OpenClaw Edition (Zero-Regression + Portable)

Iron Rules (12 — NEVER Violate)

Workflow Overview

Phase 0: Triage + Severity

Severity Classification (controls workflow depth)

Tier Classification (controls fix path)

Quick Fix Path (Trivial only)

Auto-Initialize Knowledge Files

Phase 1: Reproduce

Evidence Bundle Template

Phase 2: Root Cause Analysis

2A: Hypothesis Ladder

Root Cause Confirmation Gate (Rule 1)

Framework Assumption Audit (Rule 7)

2B: Search Knowledge Files (Rule 12)

2C: Impact Chain (Rule 3)

2D: Similar Issue Scan (Rule 3)

Phase 3: Scope + Prediction

Scope Accuracy Gate (Rule 2)

3A: Side Effect Prediction (Rule 4)

3B: Fix Strategy Comparison (>10 LOC, Complex only)

Phase 4: Fix

Phase 5: Verify + Review

5A: Regression Verification (Rule 5)

5B: Runtime Deployment Verification (Rule 6)

5C: Bug Summary + Code Review (Rule 9)

Special Checks

Phase 6: Knowledge Deposit + Self-Reflection

6.1 Update Knowledge Files (Rule 9)

6.2 Self-Reflection (Rule 9)

Regression Autopsy (when fix introduced a regression)

Domain-Specific Checks

Skill Delegation

Anti-Patterns (FORBIDDEN)

Final Checklist

Core (Standard + Complex tiers)

Trivial Tier Checklist (Quick Fix path only)

OpenClaw Project Context

Architecture Map

Tech Stack

High-Risk Bug Zones

Backend Hot Zones

Frontend Hot Zones

Two Code Paths for Agent Execution

Known Duplicate Implementations

OpenClaw Common Framework Pitfalls (Rule 7)

Windows Development Environment Gotchas

Verification Commands (OpenClaw)

Backend

Frontend

Backend Server Restart

Reference Files

Living Data Files (update after every fix)

Pattern Libraries (domain knowledge)

Detailed Guides

Skill Evolution

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

Data Analysis

skill-expert-skills

Excel / XLSX