Shell: All Bash tool calls below use PowerShell 7 syntax — the default
Claude Code shell on Windows. On Unix, Bash auto-uses bash. Commands below
are written for bash; if running on Windows, PowerShell equivalents:
| Bash command | PowerShell equivalent | |
|---|---|---|
| --- | --- | |
which cmd | Get-Command cmd | |
grep pattern file | Select-String pattern file | |
awk | Use Select-String or PowerShell pipeline | |
find ... -exec ... {} + | `Get-ChildItem -Recurse -Filter ... \ | ForEach-Object { ... }` |
wc -l | Measure-Object -Line | |
sort -rn | Sort-Object -Descending | |
head -N | Select-Object -First N | |
tr -d '%' | -replace '%','' | |
sed -n 'N,Mp' | Select-Object -Skip (N-1) -First (M-N+1) |
If unsure which shell is active, run: Bash: echo $PSVersionTable — if it prints,
you're on PowerShell.
Loop: PRE-FLIGHT → INGEST → ROUTE → EXECUTE → VERIFY → LOOP.
Every gate is a tool call exit code. Every quality check is adversarial —
try to break it, don't check a box.
OKRs are a compliance game for LLMs — set a low bar, hit it, look good.
Adversarial evaluation asks: "What would make this FAIL?" Then tests that.
OKR pattern (DON'T): "Coverage ≥ 80%" → LLM writes 80% shallow tests → passes
Adversarial (DO): "Delete the implementation. Do tests fail? If not, they're
mock-trash — rewrite." → LLM can't game this.
Before Phase 0, classify the project state. This determines the bootstrap path.
Glob {project_root}/*.py 2>/dev/null
Glob {project_root}/package.json 2>/dev/null
Glob {project_root}/go.mod 2>/dev/null
Glob {project_root}/Cargo.toml 2>/dev/null
Glob {project_root}/PRD*.md 2>/dev/null
Read .memory/project.md → exists?
| Condition | State | Action |
|---|---|---|
| --- | --- | --- |
| No source files AND no PRD AND no .memory | greenfield_empty | Interactive bootstrap: ask tech stack, init project |
| PRD exists but no source files | greenfield_seeded | Phase 0 with PRD, skip baseline (nothing to measure) |
| Source files exist, no PRD | brownfield_no_spec | Ask for PRD, then Phase 0 |
| Source files + PRD exist | brownfield_full | Full Phase 0 with baseline measurement |
| .memory/project.md exists | resume | Phase 1 |
Bash: git --version → exit 0 = OK, exit ≠ 0 → AskUserQuestion: "Git is not installed. Install it first?"
(On PowerShell: git --version works the same way.)
Language-specific tool checks happen in Step 2 after tech stack is known.
ToolSearch: "gitnexus" → if found, usable for brownfield projects. If absent, skip.
- mcp__gitnexus__impact: before modifying existing symbols → see blast radius
- mcp__gitnexus__detect_changes: after milestone commit → architecture drift advisory
- Never used for greenfield — no existing code to index.
Skill: "superpowers" → if found, usable. If absent, skip.
MCP memory → test with mcp__memory__create_entities. If absent, .memory/project.md is authoritative.
codegraph → `Bash: codegraph affected <file>` — find tests affected by a changed file.
Redundant with full-test-suite gate. Only useful for selective re-runs in brownfield.
Escalate (AskUserQuestion) when:
Do NOT escalate for:
Read .memory/project.md. Exists → Phase 1.
Classify state per Pre-Flight table above.
If state is greenfield_empty or greenfield_seeded:
AskUserQuestion: "This is a new project. What language and tools?"
→ Record: language, test framework, linter, type checker, security scanner.
If greenfield_empty → Bash: git init && create minimal project structure
(src/, tests/, pyproject.toml or equivalent).
If greenfield_seeded → PRD exists, skip baseline measurement.
Set existing_test_count = 0, existing_max_file_lines = 500, existing_coverage_pct = 0.
Coverage floor for greenfield: 50% minimum, not "never decrease from 0%."
File size limit for greenfield: 500 (no existing code to measure).
Generate .gitignore with standard patterns for the language.
At minimum: .memory/ __pycache__/ .pytest_cache/ .mypy_cache/ *.pyc
node_modules/ target/ coverage/ dist/
Bash: git add .gitignore && git commit -m "chore: init project structure"
Glob: PRD*.md, SPEC*.md, *.prd.md, requirements*.md
Empty → AskUserQuestion: "Where is the PRD file?" Then Read it.
Extract: project name, tech stack, constraints, milestones with dependencies,
per-milestone completion standards, milestone type (code | manual | research | design).
If milestone structure unclear → Read references/prd-parsing-patterns.md.
From PRD, project files, or AskUserQuestion. If the project is NOT Python,
load the correct pattern from references/verification-patterns.md.
Record: {test_cmd}, {lint_cmd}, {typecheck_cmd}, {security_cmd}.
Skip if greenfield (values already set in Step 0a). Otherwise:
# Count existing tests (adapt -name pattern to detected language)
Bash: find . -name "test_*.py" -exec grep -c "def test_" {} + 2>/dev/null | awk -F: '{s+=$2} END {print s+0}'
result=$(above command); if [ -z "$result" ] || [ "$result" = "0" ]; then echo 0; else echo "$result"; fi
→ existing_test_count
# Measure largest file (adapt -name pattern to detected language)
Bash: result=$(find . -name "*.py" -exec wc -l {} + 2>/dev/null | sort -rn | awk 'NR==1{print $1}'); if [ -z "$result" ] || [ "$result" = "0" ]; then echo 500; else echo "$result"; fi
→ existing_max_file_lines
# Measure existing coverage
Bash: {test_cmd} --cov=. --cov-report=term 2>/dev/null | grep TOTAL | awk '{print substr($NF, 1, length($NF)-1)}' || echo 0
→ existing_coverage_pct
Derivation rules (record in .memory/project.md):
| Value | Greenfield Default | Brownfield Derivation |
|---|---|---|
| --- | --- | --- |
| New test minimum per module | 2 | max(1, ceil(existing_test_count / milestone_count)) |
| File size limit | 500 | existing_max_file_lines (never lower than current max) |
| Coverage floor | 50% | max(existing_coverage_pct, 50%) — zero baseline does NOT exempt |
| Retry ceiling | 3 | Start at 3. Minimum 1, maximum 5. If no failures across 5 milestones → ceiling was too loose, reduce by 1 (never below 1). Reset to 3 after any milestone fails. |
Only if state is brownfield_full or brownfield_no_spec AND ToolSearch: "gitnexus" found it.
mcp__uml__gitnexus_index: path = {project_root}
→ Indexes the existing codebase so Phase 2 can use impact analysis.
Skip if indexing fails (not a blocker — gitnexus is advisory, not mandatory).
This is a one-time cost: the index persists and auto-syncs on subsequent runs.
Record in .memory/project.md: gitnexus_indexed: yes | no — date.
IMPORTANT: The template below contains template variables marked «VAR».
Before writing CLAUDE.md, substitute every «VAR» with the concrete value
measured or detected in Steps 2-3. The resulting file MUST contain zero
template variables — they are resolved at write time.
Template:
# CLAUDE.md
## TDD (NON-NEGOTIABLE)
Every code change: RED → GREEN → REFACTOR → BREAK.
- RED: Write test. Bash: «test_cmd» «test_path» → MUST exit non-zero.
- GREEN: Minimum code. Bash: «test_cmd» «test_path» → MUST exit zero.
- REFACTOR: Clean. Bash: «test_cmd» «test_path» → MUST exit zero.
- BREAK (adversarial): Delete/comment out implementation → run test → MUST exit non-zero.
The failure MUST be an assertion failure (AssertionError, assert), NOT import/name error.
EXCEPTION: modules with external dependencies (database drivers, hardware SDKs,
CAD APIs, cloud services). For these, BREAK verifies that mock-returned values
are asserted — the test must fail when mock is removed, not when import fails.
If test passes without implementation: the test is vacuous. Rewrite it.
This is NOT optional. Every test must prove it catches absence.
## Verification Gate (run after EVERY module, then again after refactor)
1. Bash: «test_cmd» --cov=«module» → coverage must NOT decrease from «coverage_floor»% (hard floor: 50%)
2. Bash: «lint_cmd» → exit 0
3. Bash: «typecheck_cmd» → exit 0
4. Bash: «security_cmd» → exit 0 (or note "no security tool configured")
5. Grep: «forbidden_patterns» → MUST exit 1
## Structural Discipline
- File size ≤ «file_size_limit» lines (derived from codebase baseline; default 500 for new projects)
- No dead code. No bare except:. One-way dependencies.
- Test isolation: no shared mutable state, no external service calls in unit tests.
Mock external dependencies.
## Commit Discipline
Bash: git add «module_file» «test_file» && git commit -m "M{N}: «module» — {change}"
Never commit unrelated files. Never commit when any gate check fails.
Read references/quality-gate-checklist.md. For each checklist item:
Try to BREAK the generated CLAUDE.md:
- Are there unresolved placeholders? Grep "placeholder|TODO|{.*}" → MUST exit 1.
NOTE: { and } are never valid in a final CLAUDE.md. All template variables
were replaced with concrete values in Step 4.
- Are completion standards falsifiable? Grep "works correctly|looks good|feels right|
is done|user satisfied|PM signoff|code review approve|demo works" → MUST exit 1.
- Is the BREAK step present? Grep "BREAK" → MUST find it.
- Does BREAK include the external-dependency exception? Grep "external depend"
→ MUST find it (for projects that need it).
- Are verification commands real tools? Grep "«test_cmd»|«lint_cmd»|«typecheck_cmd»"
→ MUST exit 1 (these are template markers, not final content).
Any FAIL → Edit CLAUDE.md to fix → re-check. Loop until all pass.
TodoWrite: "CLAUDE.md quality gate" → done only when all checks pass.
Template (substitute «VAR» with extracted values, write to project root):
# «Project Name»
## Tech Stack
- Language: «language»
- Test: «test_cmd»
- Lint: «lint_cmd»
- Type: «typecheck_cmd»
- Security: «security_cmd»
## Constraints
«extracted from PRD»
## Baseline (measured «date»)
- existing_test_count: «N»
- existing_max_file_lines: «N»
- existing_coverage_pct: «N»%
- coverage_floor: «N»% (max of baseline and 50%)
- file_size_limit: «N»
- retry_ceiling: «N» (floor 1, reset to 3 on failure)
- gitnexus_indexed: yes | no — «date» (greenfield = no, skip Step 3b)
## Milestones
### «M{N}»: «Name»
- Depends on: «M{N-1} | none»
- Type: code | manual | research | design
- Has external deps: yes | no (if yes, BREAK step uses mock-exception rule)
- Files: «paths»
- Completion: «falsifiable criteria»
- Verify: «verification command»
- Status: pending | in_progress | done | skipped
Each Completion field must pass the adversarial falsifiability test:
Grep: "works correctly|looks good|feels right|is done|user satisfied|PM signoff|code review approve|demo works"
in .memory/project.md → MUST exit 1
TodoWrite: one task per milestone, status = pending
TodoWrite: first no-dependency milestone → in_progress
Read .memory/project.md + TodoWrite
├─ File missing → Phase 0
├─ All done → Phase 3
├─ No in_progress → set first pending no-dep → in_progress
└─ Has in_progress → that's the current milestone
"where are we" / "status" / "progress" → display TodoWrite + .memory/project.md.
Do NOT enter Phase 2.
TodoWrite: mark skipped. Grep for dependents → warn.in_progress. Enter Phase 2.Match by: number (M3, T2), name substring ("auth"), file path.
Cross-reference .memory/project.md.
Any dependency not done or skipped → AskUserQuestion:
"Depends on {name} ({status}). Run dep first, skip it, or pick another?"
| Type | Action |
|---|---|
| ------ | -------- |
code | Adversarial TDD loop below |
manual | AskUserQuestion: state what human must do. Wait for confirmation. |
research / design | Glob for deliverable → exists? → verify against completion standard. Skip TDD. |
If .memory/project.md has gitnexus_indexed: yes and the milestone touches
EXISTING functions (not new files — check with Grep for the target symbol):
Read .memory/project.md → milestone Files section → identify which symbols are EXISTING vs NEW.
For each EXISTING symbol being modified:
mcp__gitnexus__impact:
target = "<function_or_class_name>"
direction = "upstream" (what depends on this — what will break)
summaryOnly = true (just counts and risk — don't need full detail yet)
→ Risk CRITICAL or HIGH → display findings, but do NOT block.
Only ASK if risk = CRITICAL AND dependents ≥ 10:
AskUserQuestion: "Changing <symbol>. Impact: <N> direct callers, <M> processes affected.
Proceed or reassign to a later milestone?"
→ Risk MEDIUM or LOW → proceed. The tests will catch breakage.
This is ADVISORY only. The BREAK step + full test suite are the real safety net.
Do NOT let gitnexus override the TDD loop — if impact says CRITICAL but tests
all pass after change, the tests are right and gitnexus is just warning about
coupling that hasn't caused a bug yet.
Each step is gated by a Bash exit code. No evidence → blocked → loop back.
STEP 1 — RED (prove test FAILS for the RIGHT reason):
Write: test file.
Bash: «test_cmd» «test_file» -v
exit ≠ 0 → check WHY it failed:
Grep output for "AssertionError|assert|E " → found → STEP 2
Grep output for "SyntaxError|ImportError|NameError" → found
→ "Test failed for wrong reason (syntax/import/name error,
not a real assertion). Rewrite." → STEP 1
exit = 0 → "Test passes with zero implementation. Vacuous." → STEP 1
STEP 2 — GREEN + GENERALIZE (prove impl isn't hardcoded):
Write: minimum implementation.
Bash: «test_cmd» «test_file» -v → exit 0 → STEP 2b
exit ≠ 0 → "Fix implementation. Do NOT change the test." → STEP 2
STEP 2b — GENERALIZE:
Write: second test with DIFFERENT inputs (edge case, boundary value).
Bash: «test_cmd» «test_file» -v → MUST exit 0 WITHOUT changing impl.
If impl must change: hardcoded to first test → rewrite from STEP 1.
STEP 3 — BREAK (prove test catches absence):
CHECK .memory/project.md: does this milestone have "Has external deps: yes"?
┌─ YES (external deps: DB driver, HW SDK, CAD API, cloud service):
│ Write the test with mocking. Run it GREEN first (test + mock pass).
│ THEN remove ONLY the mock setup, NOT the import.
│ Bash: «test_cmd» «test_file» -v
│ exit ≠ 0 with AssertionError from missing mock → mock is real → STEP 4
│ exit = 0 → mock was never asserted → rewrite from STEP 1
└─ NO (pure logic module):
Edit: delete or comment out the implementation.
Bash: «test_cmd» «test_file» -v
exit ≠ 0 → Grep output for "AssertionError|assert|E " → found
→ "Test caught deletion via real assertion." → STEP 4
exit ≠ 0 but ONLY ImportError/NameError → "Test only checks module
exists, not that it works. Rewrite from STEP 1."
exit = 0 → "Test passes without ANY implementation. Mock-trash." → STEP 1
STEP 4 — REFACTOR (prove cleanup safe):
Edit: extract helpers, improve names, remove duplication.
Bash: «test_cmd» «test_file» -v → exit 0 → STEP 5
exit ≠ 0 → "Refactor broke something. Fix." → STEP 4
STEP 5 — GATE (adversarial: try to break every way):
Bash: «test_cmd» --cov=«module» → coverage ≥ coverage_floor AND
Select-String "assert" «test_file» | Measure-Object -Line → ≥ test function count
or Bash: Grep -c "assert" «test_file» → count ≥ number of test functions
Bash: «lint_cmd» → exit 0
Bash: «typecheck_cmd» → exit 0
Bash: «security_cmd» → exit 0 (or note "no security tool configured")
Grep: "«forbidden_patterns»" → exit 1
ALL pass → git add «module_file» «test_file» → git commit → advance to Phase 1
ANY fail → fix → re-run gate
Same failure crosses retry_ceiling → AskUserQuestion with failure pattern
┌─ Post-commit (brownfield only, gitnexus indexed):
│ mcp__gitnexus__detect_changes: scope = "unstaged" → expect empty
│ (nothing left uncommitted — gitnexus checks committed tree).
│ If mcp__gitnexus__detect_changes returns affected flows:
│ → "Architecture drift: <N> flows now traverse these files." — advisory.
│ → If same flow appears in drift for 2+ consecutive milestones:
│ AskUserQuestion: "Flow <name> has structural drift across milestones —
│ consider refactoring."
│ → Does NOT block. Tests are authoritative. GitNexus is structural awareness.
Paste the Bash output at each step. No evidence → blocked.
When Grep finds ≥2 cross-file imports between candidate files:
Agent: "Implement «module». Read CLAUDE.md. Adversarial TDD loop — each step
gated by Bash exit code. Paste evidence at every checkpoint. No skip allowed."
When Grep finds ZERO shared files AND .memory/project.md shows ZERO dep edges
between milestones:
Workflow: pipeline the independent milestones.
Each runs the full adversarial TDD loop internally.
Bash: check the BREAK step from previous milestone
→ Did the BREAK step catch a real bug? If yes, the adversarial gates are working.
→ Did no BREAK step ever fail? Then tests are too shallow. Require more assertions
per test in the next milestone.
Bash: «test_cmd» --cov=. --cov-report=term | grep TOTAL
→ Record actual coverage. Next milestone must NOT decrease this number.
→ If coverage increased by ≥5%: gates working. Keep.
→ If coverage increased by <1%: diversify assertions in next milestone.
TodoWrite: all = completed or skipped
Bash: git log --oneline (show milestone commits)
Bash: «test_cmd» --cov=. --cov-report=term (final coverage)
Glob: all promised deliverable files
Report:
"{N} done, {S} skipped | Coverage: {Y}% | BREAK checks passed: {B}"
Adversarial note: "Deleted {X} implementations. Tests caught {Y} deletions.
Miss rate: {Z} — these tests need hardening."
| File | Tool | When |
|---|---|---|
| --- | --- | --- |
CLAUDE.md | Write once | Phase 0, never overwritten |
.memory/project.md | Read then Edit | Bootstrap + after each milestone |
.gitignore | Write once | Phase 0a (greenfield) or Phase 0 Step 3 |
| TodoWrite | TodoWrite | Every state transition |
On crash: Read references/session-recovery.md.
| File | When to Load |
|---|---|
| --- | --- |
references/verification-patterns.md | Phase 0 Step 2 (detect tech stack) and for non-Python projects |
references/quality-gate-checklist.md | Phase 0 Step 5 (CLAUDE.md adversarial review) |
references/prd-parsing-patterns.md | Phase 0 Step 1 when PRD structure is unclear |
references/nx-sdd-example.md | ONLY when uncertain about milestone granularity |
references/session-recovery.md | On crash or session loss |
共 1 个版本