PRD-to-Action Harness

Shell: The commands below are written for bash, which the Bash tool uses automatically on Unix. On Windows, where PowerShell 7 is the default Claude Code shell, use these PowerShell equivalents:

Bash command	PowerShell equivalent
---	---
`which cmd`	`Get-Command cmd`
`grep pattern file`	`Select-String pattern file`
`awk`	Use `Select-String` or PowerShell pipeline
`find ... -exec ... {} +`	`Get-ChildItem -Recurse -Filter ... | ForEach-Object { ... }`
`wc -l`	`Measure-Object -Line`
`sort -rn`	`Sort-Object -Descending`
`head -N`	`Select-Object -First N`
`tr -d '%'`	`-replace '%',''`
`sed -n 'N,Mp'`	`Select-Object -Skip (N-1) -First (M-N+1)`

If unsure which shell is active, run: Bash: echo $PSVersionTable. PowerShell prints a version table; bash prints an empty line because the variable is unset.

Loop: PRE-FLIGHT → INGEST → ROUTE → EXECUTE → VERIFY → LOOP.

Every gate is a tool call exit code. Every quality check is adversarial: try to break it, don't check a box.

Core Principle: Adversarial Evaluation

OKRs are a compliance game for LLMs — set a low bar, hit it, look good.

Adversarial evaluation asks: "What would make this FAIL?" Then tests that.

OKR pattern (DON'T):     "Coverage ≥ 80%" → LLM writes 80% shallow tests → passes
Adversarial (DO):        "Delete the implementation. Do tests fail? If not, they're
                          mock-trash — rewrite." → LLM can't game this.
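To make the contrast concrete, here is a minimal Python sketch. The `slugify` function, its broken stand-in, and both test helpers are hypothetical names invented for illustration; they are not part of the harness. The first check keeps passing when the implementation is replaced by a stub, so it is vacuous. The second fails as soon as the real behavior is gone.

```python
from unittest import mock


def slugify(title: str) -> str:
    """Hypothetical implementation under test."""
    return "-".join(title.lower().split())


def broken_slugify(title: str) -> str:
    """Stands in for a deleted implementation: returns nothing useful."""
    return ""


def vacuous_test(impl) -> bool:
    """Mock-style check: only verifies that the function was called."""
    spy = mock.Mock(side_effect=impl)
    spy("Hello World")
    return spy.called  # true for any implementation


def adversarial_test(impl) -> bool:
    """Checks the actual return value, so a missing implementation is caught."""
    return impl("Hello World") == "hello-world"


if __name__ == "__main__":
    for name, test in [("vacuous", vacuous_test), ("adversarial", adversarial_test)]:
        survives = test(broken_slugify)
        print(f"{name}: passes without implementation = {survives}")
```

A test that still passes against `broken_slugify` is the kind the BREAK step is meant to reject.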

Pre-Flight: Environment Classifier

Before Phase 0, classify the project state. This determines the bootstrap path.

Project State Classification

Glob {project_root}/*.py
Glob {project_root}/package.json
Glob {project_root}/go.mod
Glob {project_root}/Cargo.toml
Glob {project_root}/PRD*.md

Read .memory/project.md → exists?

Condition	State	Action
---	---	---
No source files AND no PRD AND no .memory	greenfield_empty	Interactive bootstrap: ask tech stack, init project
PRD exists but no source files	greenfield_seeded	Phase 0 with PRD, skip baseline (nothing to measure)
Source files exist, no PRD	brownfield_no_spec	Ask for PRD, then Phase 0
Source files + PRD exist	brownfield_full	Full Phase 0 with baseline measurement
.memory/project.md exists	resume	Phase 1
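The classification table can be read as a small decision function. The sketch below is one possible encoding, assuming the three observations have already been gathered as booleans; the function name and the rule that an existing `.memory/project.md` takes priority over the other rows are assumptions, the latter inferred from Step 0 of Phase 0.

```python
def classify_project_state(has_sources: bool, has_prd: bool, has_memory: bool) -> str:
    """Map the three Pre-Flight observations to a state name from the table."""
    # Assumption: a saved .memory/project.md wins over every other condition.
    if has_memory:
        return "resume"
    if has_sources and has_prd:
        return "brownfield_full"
    if has_sources:
        return "brownfield_no_spec"
    if has_prd:
        return "greenfield_seeded"
    return "greenfield_empty"
```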

Required Tools

Bash: git --version → exit 0 = OK, exit ≠ 0 → AskUserQuestion: "Git is not installed. Install it first?"

(On PowerShell: git --version works the same way.)

Language-specific tool checks happen in Step 2 after tech stack is known.

Optional Tools

ToolSearch: "gitnexus" → if found, usable for brownfield projects. If absent, skip.
  - mcp__gitnexus__impact: before modifying existing symbols → see blast radius
  - mcp__gitnexus__detect_changes: after milestone commit → architecture drift advisory
  - Never used for greenfield — no existing code to index.
Skill: "superpowers" → if found, usable. If absent, skip.
MCP memory → test with mcp__memory__create_entities. If absent, .memory/project.md is authoritative.
codegraph → `Bash: codegraph affected <file>` — find tests affected by a changed file.
  Redundant with full-test-suite gate. Only useful for selective re-runs in brownfield.

Human Escalation Rules

Escalate (AskUserQuestion) when:

Required tool missing
Genuinely ambiguous decision (several valid approaches, no clear winner)
Milestone dependency blocked — human chooses: wait / skip / reorder
Same failure pattern crosses retry ceiling — report the pattern, ask direction
File operation would delete/overwrite something not created by this harness

Do NOT escalate for:

Code implementation decisions (that's the TDD loop's job)
Test failures with clear fixes
Routine file operations

Phase 0: INGEST — Bootstrap

Step 0: State Check

Read .memory/project.md. Exists → Phase 1.

Classify state per Pre-Flight table above.

Step 0a: Greenfield Init

If state is greenfield_empty or greenfield_seeded:

AskUserQuestion: "This is a new project. What language and tools?"
  → Record: language, test framework, linter, type checker, security scanner.

If greenfield_empty → Bash: git init, then create a minimal project structure
  (src/, tests/, pyproject.toml or equivalent).

If greenfield_seeded → PRD exists, skip baseline measurement.
  Set existing_test_count = 0, existing_max_file_lines = 500, existing_coverage_pct = 0.
  Coverage floor for greenfield: 50% minimum, not "never decrease from 0%."
  File size limit for greenfield: 500 (no existing code to measure).

Generate .gitignore with standard patterns for the language.
  At minimum: .memory/  __pycache__/  .pytest_cache/  .mypy_cache/  *.pyc
  node_modules/  target/  coverage/  dist/

Bash: git add .gitignore && git commit -m "chore: init project structure"

Step 1: Find and Read the PRD

Glob: PRD*.md, SPEC*.md, *.prd.md, requirements*.md

Empty → AskUserQuestion: "Where is the PRD file?" Then Read it.

Extract: project name, tech stack, constraints, milestones with dependencies, per-milestone completion standards, milestone type (code | manual | research | design).

If milestone structure unclear → Read references/prd-parsing-patterns.md.

Step 2: Detect Tech Stack & Load Verification Commands

From PRD, project files, or AskUserQuestion. If the project is NOT Python, load the correct pattern from references/verification-patterns.md.

Record: {test_cmd}, {lint_cmd}, {typecheck_cmd}, {security_cmd}.

Step 3: Measure the Codebase Baseline

Skip if greenfield (values already set in Step 0a). Otherwise:

# Count existing tests (adapt -name pattern to detected language)
Bash: find . -name "test_*.py" -exec grep -Hc "def test_" {} + 2>/dev/null | awk -F: '{s+=$NF} END {print s+0}'
  # -H forces "file:count" output even when only one file matches; awk prints 0 when nothing matches.
  → existing_test_count

# Measure largest file (adapt -name pattern to detected language)
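Bash: result=$(find . -name "*.py" -exec wc -l {} + 2>/dev/null | awk '$2 != "total" {print $1}' | sort -rn | head -1); if [ -z "$result" ] || [ "$result" = "0" ]; then echo 500; else echo "$result"; fi
  # wc prints a "total" row when given several files; drop it so the largest single file wins.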
  → existing_max_file_lines

# Measure existing coverage
Bash: {test_cmd} --cov=. --cov-report=term 2>/dev/null | awk '/^TOTAL/ {pct=substr($NF, 1, length($NF)-1)} END {print pct+0}'
  # prints 0 when the run produces no TOTAL line
  → existing_coverage_pct

Derivation rules (record in .memory/project.md):

Value	Greenfield Default	Brownfield Derivation
---	---	---
New test minimum per module	2	`max(1, ceil(existing_test_count / milestone_count))`
File size limit	500	`existing_max_file_lines` (never lower than current max)
Coverage floor	50%	`max(existing_coverage_pct, 50%)` — zero baseline does NOT exempt
Retry ceiling	3	Start at 3. Minimum 1, maximum 5. If no failures across 5 milestones → ceiling was too loose, reduce by 1 (never below 1). Reset to 3 after any milestone fails.
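The derivation rules in the table can be written out as two small functions. This is an illustrative sketch, not the harness's implementation: the function names, the dictionary keys, the `clean_streak` counter, and the guard against a zero milestone count are all assumptions.

```python
import math


def derive_thresholds(existing_test_count: int, existing_max_file_lines: int,
                      existing_coverage_pct: float, milestone_count: int,
                      greenfield: bool) -> dict:
    """Apply the greenfield defaults or the brownfield derivations."""
    if greenfield:
        return {"new_tests_per_module": 2, "file_size_limit": 500,
                "coverage_floor": 50.0, "retry_ceiling": 3}
    milestones = max(1, milestone_count)  # assumption: avoid division by zero
    return {
        "new_tests_per_module": max(1, math.ceil(existing_test_count / milestones)),
        "file_size_limit": existing_max_file_lines,
        "coverage_floor": max(existing_coverage_pct, 50.0),
        "retry_ceiling": 3,
    }


def next_retry_ceiling(current: int, milestone_failed: bool, clean_streak: int) -> int:
    """Reset to 3 on failure; lower by 1 (never below 1) after 5 clean milestones."""
    if milestone_failed:
        return 3
    if clean_streak >= 5:
        return max(1, current - 1)
    return current
```

For example, a brownfield project with 40 tests, 6 milestones, and 37.5% coverage gets a minimum of 7 new tests per module and a 50% coverage floor.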

Step 3b: Brownfield — Index with GitNexus

Only if state is brownfield_full or brownfield_no_spec AND ToolSearch: "gitnexus" found it.

mcp__uml__gitnexus_index: path = {project_root}
  → Indexes the existing codebase so Phase 2 can use impact analysis.
  Skip if indexing fails (not a blocker — gitnexus is advisory, not mandatory).
  This is a one-time cost: the index persists and auto-syncs on subsequent runs.

Record in .memory/project.md: gitnexus_indexed: yes | no — date.

Step 4: Generate CLAUDE.md

IMPORTANT: The template below contains template variables marked «VAR». Before writing CLAUDE.md, substitute every «VAR» with the concrete value measured or detected in Steps 2-3. The resulting file MUST contain zero template variables; they are all resolved at write time.
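One way to enforce the "zero template variables" rule is to refuse to render when any variable lacks a value. The sketch below assumes the «VAR» marker syntax described above; `render_template` and its arguments are hypothetical names.

```python
import re

# Matches «name» markers; the guillemets delimit a template variable.
VAR = re.compile(r"«([^»]+)»")


def render_template(template: str, values: dict[str, str]) -> str:
    """Substitute every «VAR»; raise if any variable has no concrete value."""
    missing = sorted({name for name in VAR.findall(template) if name not in values})
    if missing:
        raise KeyError(f"unresolved template variables: {', '.join(missing)}")
    return VAR.sub(lambda m: values[m.group(1)], template)
```

Failing before the file is written keeps a half-resolved CLAUDE.md from ever reaching disk.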

Template:

# CLAUDE.md

## TDD (NON-NEGOTIABLE)
Every code change: RED → GREEN → REFACTOR → BREAK.
- RED: Write test. Bash: «test_cmd» «test_path» → MUST exit non-zero.
- GREEN: Minimum code. Bash: «test_cmd» «test_path» → MUST exit zero.
- REFACTOR: Clean. Bash: «test_cmd» «test_path» → MUST exit zero.
- BREAK (adversarial): Delete/comment out implementation → run test → MUST exit non-zero.
  The failure MUST be an assertion failure (AssertionError, assert), NOT import/name error.
  EXCEPTION: modules with external dependencies (database drivers, hardware SDKs,
  CAD APIs, cloud services). For these, BREAK verifies that mock-returned values
  are asserted — the test must fail when mock is removed, not when import fails.
  If test passes without implementation: the test is vacuous. Rewrite it.
  This is NOT optional. Every test must prove it catches absence.

## Verification Gate (run after EVERY module, then again after refactor)
1. Bash: «test_cmd» --cov=«module» → coverage must NOT decrease from «coverage_floor»% (hard floor: 50%)
2. Bash: «lint_cmd» → exit 0
3. Bash: «typecheck_cmd» → exit 0
4. Bash: «security_cmd» → exit 0 (or note "no security tool configured")
5. Grep: «forbidden_patterns» → MUST exit 1

## Structural Discipline
- File size ≤ «file_size_limit» lines (derived from codebase baseline; default 500 for new projects)
- No dead code. No bare except:. One-way dependencies.
- Test isolation: no shared mutable state, no external service calls in unit tests.
  Mock external dependencies.

## Commit Discipline
Bash: git add «module_file» «test_file» && git commit -m "M<N>: «module» - <change>"
Never commit unrelated files. Never commit when any gate check fails.

Step 5: CLAUDE.md Adversarial Quality Gate

Read references/quality-gate-checklist.md. For each checklist item:

Try to BREAK the generated CLAUDE.md:
  - Are there unresolved placeholders? Grep "placeholder|TODO|\{.*\}" → MUST exit 1.
    NOTE: { and } are never valid in a final CLAUDE.md. All template variables
    were replaced with concrete values in Step 4.
  - Are completion standards falsifiable? Grep "works correctly|looks good|feels right|is done|user satisfied|PM signoff|code review approve|demo works" → MUST exit 1.
  - Is the BREAK step present? Grep "BREAK" → MUST find it.
  - Does BREAK include the external-dependency exception? Grep "external depend"
    → MUST find it (for projects that need it).
  - Are verification commands real tools? Grep "«test_cmd»|«lint_cmd»|«typecheck_cmd»"
    → MUST exit 1 (these are template markers, not final content).

Any FAIL → Edit CLAUDE.md to fix → re-check. Loop until all pass.

TodoWrite: "CLAUDE.md quality gate" → done only when all checks pass.
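The checks in this step can also be expressed as one function that returns the list of failures, which makes the "loop until all pass" rule easy to drive. This is a sketch under assumptions: `quality_gate` is a hypothetical name, only three of the checks are covered, and the generic «...» pattern is an extension that catches any leftover marker rather than the three named ones.

```python
import re

VAGUE = re.compile(
    r"works correctly|looks good|feels right|is done|user satisfied|"
    r"PM signoff|code review approve|demo works"
)
# Braces are escaped so they match literally; «...» catches any leftover marker.
UNRESOLVED = re.compile(r"placeholder|TODO|\{.*\}|«[^»]*»")


def quality_gate(text: str) -> list[str]:
    """Return the names of failed checks; an empty list means the gate passes."""
    failures = []
    if UNRESOLVED.search(text):
        failures.append("unresolved placeholder")
    if VAGUE.search(text):
        failures.append("unfalsifiable completion standard")
    if "BREAK" not in text:
        failures.append("BREAK step missing")
    return failures
```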

Step 6: Write .memory/project.md

Template (substitute «VAR» with extracted values, write to project root):

# «Project Name»

## Tech Stack
- Language: «language»
- Test: «test_cmd»
- Lint: «lint_cmd»
- Type: «typecheck_cmd»
- Security: «security_cmd»

## Constraints
«extracted from PRD»

## Baseline (measured «date»)
- existing_test_count: «N»
- existing_max_file_lines: «N»
- existing_coverage_pct: «N»%
- coverage_floor: «N»% (max of baseline and 50%)
- file_size_limit: «N»
- retry_ceiling: «N» (floor 1, reset to 3 on failure)
- gitnexus_indexed: yes | no — «date» (greenfield = no, skip Step 3b)

## Milestones
### «M{N}»: «Name»
- Depends on: «M{N-1} | none»
- Type: code | manual | research | design
- Has external deps: yes | no  (if yes, BREAK step uses mock-exception rule)
- Files: «paths»
- Completion: «falsifiable criteria»
- Verify: «verification command»
- Status: pending | in_progress | done | skipped

Each Completion field must pass the adversarial falsifiability test:

Grep: "works correctly|looks good|feels right|is done|user satisfied|PM signoff|code review approve|demo works"
in .memory/project.md → MUST exit 1

Step 7: Initialize Tracking

TodoWrite: one task per milestone, status = pending
TodoWrite: first no-dependency milestone → in_progress

Phase 1: ROUTE

State Machine (enforced by TodoWrite)

Read .memory/project.md + TodoWrite
  ├─ File missing → Phase 0
  ├─ All done → Phase 3
  ├─ No in_progress → set first pending no-dep → in_progress
  └─ Has in_progress → that's the current milestone
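The routing rules above amount to a pure function over the milestone statuses. The sketch below is illustrative only: `route`, its return labels, and the `"blocked"` outcome are hypothetical, and it assumes a milestone counts as "no-dep" when every dependency is already done or skipped.

```python
SETTLED = ("done", "skipped")


def route(memory_exists: bool, statuses: dict[str, str],
          deps: dict[str, list[str]]) -> str:
    """Return the next phase, or the name of the milestone to work on."""
    if not memory_exists:
        return "phase_0"
    if all(status in SETTLED for status in statuses.values()):
        return "phase_3"
    for name, status in statuses.items():
        if status == "in_progress":
            return name
    for name, status in statuses.items():
        # Assumption: dependencies that are done or skipped no longer block.
        if status == "pending" and all(
            statuses.get(dep) in SETTLED for dep in deps.get(name, [])
        ):
            return name
    return "blocked"  # hypothetical label: every pending milestone waits on another
```

A `"blocked"` result corresponds to the escalation case in the Human Escalation Rules, where the human chooses to wait, skip, or reorder.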

Status Query

"where are we" / "status" / "progress" → display TodoWrite + .memory/project.md.

Do NOT enter Phase 2.

Commands

"skip M{N}" → TodoWrite: mark skipped. Grep for dependents → warn.
"retry" → Reset retry counter. Enter Phase 2.
"resume" / "continue" → Find in_progress. Enter Phase 2.

Milestone Matching

Match by: number (M3, T2), name substring ("auth"), file path.

Cross-reference .memory/project.md.

Dependency Check

Any dependency not done or skipped → AskUserQuestion:

"Depends on {name} ({status}). Run dep first, skip it, or pick another?"

Phase 2: EXECUTE — Adversarial TDD Loop (Bash-Gated)

Milestone Type Dispatch

Type	Action
------	--------
`code`	Adversarial TDD loop below
`manual`	`AskUserQuestion`: state what human must do. Wait for confirmation.
`research` / `design`	`Glob` for deliverable → exists? → verify against completion standard. Skip TDD.

Brownfield Pre-Change Impact Analysis

If .memory/project.md has gitnexus_indexed: yes and the milestone touches EXISTING functions (not new files; check with Grep for the target symbol):

Read .memory/project.md → milestone Files section → identify which symbols are EXISTING vs NEW.

For each EXISTING symbol being modified:
  mcp__gitnexus__impact:
    target = "<function_or_class_name>"
    direction = "upstream"  (what depends on this — what will break)
    summaryOnly = true  (just counts and risk — don't need full detail yet)
  → Risk CRITICAL or HIGH → display findings, but do NOT block.
    Only ASK if risk = CRITICAL AND dependents ≥ 10:
      AskUserQuestion: "Changing <symbol>. Impact: <N> direct callers, <M> processes affected.
                        Proceed or reassign to a later milestone?"
  → Risk MEDIUM or LOW → proceed. The tests will catch breakage.

This is ADVISORY only. The BREAK step + full test suite are the real safety net. Do NOT let gitnexus override the TDD loop: if impact says CRITICAL but all tests pass after the change, the tests are right and gitnexus is only warning about coupling that hasn't caused a bug yet.
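The advisory thresholds reduce to a three-way decision. In this sketch the function name and the returned labels are hypothetical; the risk levels and the threshold of 10 dependents come from the rules above.

```python
def impact_action(risk: str, dependents: int) -> str:
    """Decide how to react to an impact report; never blocks the TDD loop."""
    level = risk.upper()
    if level == "CRITICAL" and dependents >= 10:
        return "ask"      # escalate with AskUserQuestion
    if level in ("CRITICAL", "HIGH"):
        return "display"  # show findings, then continue
    return "proceed"      # MEDIUM or LOW: rely on the tests
```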

Adversarial TDD Loop

Each step is gated by a Bash exit code. No evidence → blocked → loop back.

STEP 1 — RED (prove test FAILS for the RIGHT reason):
  Write: test file.
  Bash: «test_cmd» «test_file» -v
    exit ≠ 0 → check WHY it failed (wrong-reason patterns first, because the
      "E  " prefix also appears on import and name error lines):
      Grep output for "SyntaxError|ImportError|ModuleNotFoundError|NameError" → found
        → "Test failed for wrong reason (syntax/import/name error,
           not a real assertion). Rewrite." → STEP 1
      Grep output for "AssertionError|assert|E  " → found → STEP 2
    exit = 0 → "Test passes with zero implementation. Vacuous." → STEP 1

STEP 2 — GREEN + GENERALIZE (prove impl isn't hardcoded):
  Write: minimum implementation.
  Bash: «test_cmd» «test_file» -v → exit 0 → STEP 2b
    exit ≠ 0 → "Fix implementation. Do NOT change the test." → STEP 2
  STEP 2b — GENERALIZE:
    Write: second test with DIFFERENT inputs (edge case, boundary value).
    Bash: «test_cmd» «test_file» -v → MUST exit 0 WITHOUT changing impl.
    If impl must change: hardcoded to first test → rewrite from STEP 1.

STEP 3 — BREAK (prove test catches absence):
  CHECK .memory/project.md: does this milestone have "Has external deps: yes"?
  ┌─ YES (external deps: DB driver, HW SDK, CAD API, cloud service):
  │   Write the test with mocking. Run it GREEN first (test + mock pass).
  │   THEN remove ONLY the mock setup, NOT the import.
  │   Bash: «test_cmd» «test_file» -v
  │     exit ≠ 0 with AssertionError from missing mock → mock is real → STEP 4
  │     exit = 0 → mock was never asserted → rewrite from STEP 1
  └─ NO (pure logic module):
      Edit: delete or comment out the implementation.
      Bash: «test_cmd» «test_file» -v
        exit ≠ 0 but ONLY ImportError/NameError (check this first) → "Test only
          checks module exists, not that it works. Rewrite from STEP 1."
        exit ≠ 0 → Grep output for "AssertionError|assert|E  " → found
          → "Test caught deletion via real assertion." → STEP 4
        exit = 0 → "Test passes without ANY implementation. Mock-trash." → STEP 1

STEP 4 — REFACTOR (prove cleanup safe):
  Edit: extract helpers, improve names, remove duplication.
  Bash: «test_cmd» «test_file» -v → exit 0 → STEP 5
    exit ≠ 0 → "Refactor broke something. Fix." → STEP 4

STEP 5 — GATE (adversarial: try to break every way):
  Bash: «test_cmd» --cov=«module» → coverage ≥ coverage_floor
  Bash: grep -c "assert" «test_file» → count ≥ number of test functions
    (PowerShell: Select-String "assert" «test_file» | Measure-Object -Line)
  Bash: «lint_cmd» → exit 0
  Bash: «typecheck_cmd» → exit 0
  Bash: «security_cmd» → exit 0 (or note "no security tool configured")
  Grep: "«forbidden_patterns»" → exit 1

  ALL pass → git add «module_file» «test_file» → git commit → advance to Phase 1
  ANY fail → fix → re-run gate
  Same failure crosses retry_ceiling → AskUserQuestion with failure pattern

  ┌─ Post-commit (brownfield only, gitnexus indexed):
  │   mcp__gitnexus__detect_changes: scope = "unstaged" → expect empty
  │   (nothing left uncommitted — gitnexus checks committed tree).
  │   If mcp__gitnexus__detect_changes returns affected flows:
  │     → "Architecture drift: <N> flows now traverse these files." — advisory.
  │     → If same flow appears in drift for 2+ consecutive milestones:
  │         AskUserQuestion: "Flow <name> has structural drift across milestones —
  │           consider refactoring."
  │     → Does NOT block. Tests are authoritative. GitNexus is structural awareness.

Paste the Bash output at each step. No evidence → blocked.
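The "failed for the RIGHT reason" check in the RED and BREAK steps can be sketched as a classifier over the exit code and captured output. The function name and return labels are hypothetical. The sketch assumes pytest-style output and tests the wrong-reason patterns before the assertion patterns, so that an import error is never mistaken for a real assertion failure.

```python
import re

WRONG_REASON = re.compile(r"SyntaxError|ImportError|ModuleNotFoundError|NameError")
ASSERTION = re.compile(r"AssertionError|\bassert\b")


def classify_run(exit_code: int, output: str) -> str:
    """Classify a test run for the RED and BREAK gates."""
    if exit_code == 0:
        return "vacuous"        # passed with no implementation
    if WRONG_REASON.search(output):
        return "wrong_reason"   # broke before any assertion ran
    if ASSERTION.search(output):
        return "real_failure"   # an assertion caught the absence
    return "unknown"            # assumption: anything else needs a human look
```

Only `"real_failure"` advances the loop; every other label sends the test back to STEP 1 or to a human.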

Subagent Dispatch

When Grep finds ≥2 cross-file imports between candidate files:

Agent: "Implement «module». Read CLAUDE.md. Adversarial TDD loop — each step
gated by Bash exit code. Paste evidence at every checkpoint. No skip allowed."

Workflow Dispatch

When Grep finds ZERO shared files AND .memory/project.md shows ZERO dep edges between milestones:

Workflow: pipeline the independent milestones.
Each runs the full adversarial TDD loop internally.

Adversarial Gate Calibration (after each milestone)

Review the BREAK-step evidence from the previous milestone:
  → Did the BREAK step catch a real bug? If yes, the adversarial gates are working.
  → Did no BREAK step ever fail? Then tests are too shallow. Require more assertions
    per test in the next milestone.

Bash: «test_cmd» --cov=. --cov-report=term | grep TOTAL
  → Record actual coverage. Next milestone must NOT decrease this number.
  → If coverage increased by ≥5%: gates working. Keep.
  → If coverage increased by <1%: diversify assertions in next milestone.
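The coverage part of the calibration is simple arithmetic on two percentages. In this sketch the function name and returned labels are hypothetical, and the handling of a gain between 1 and 5 points is an assumption because the rules above leave that range unspecified.

```python
def calibrate(previous_pct: float, current_pct: float) -> str:
    """Compare coverage before and after a milestone and suggest an action."""
    delta = current_pct - previous_pct
    if delta < 0:
        return "fail: coverage decreased"
    if delta >= 5:
        return "keep gates"
    if delta < 1:
        return "diversify assertions"
    return "no change"  # assumption: gains from 1 up to 5 points need no action
```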

Phase 3: COMPLETION

TodoWrite: all = completed or skipped
Bash: git log --oneline (show milestone commits)
Bash: «test_cmd» --cov=. --cov-report=term (final coverage)
Glob: all promised deliverable files

Report:
  "{N} done, {S} skipped | Coverage: {Y}% | BREAK checks passed: {B}"
Adversarial note: "Deleted {X} implementations. Tests caught {C} deletions.
                  Miss rate: {Z}. These tests need hardening."
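The report figures follow from four counts and the final coverage number. The sketch below shows one way to compute the miss rate and fill in the report lines; the function and parameter names are hypothetical, and treating "BREAK checks passed" as the number of caught deletions is an assumption.

```python
def completion_report(done: int, skipped: int, coverage_pct: float,
                      deleted: int, caught: int) -> str:
    """Format the completion report; miss rate = uncaught deletions / deletions."""
    missed = deleted - caught
    miss_rate = (missed / deleted * 100) if deleted else 0.0
    return (
        f"{done} done, {skipped} skipped | Coverage: {coverage_pct:.0f}% | "
        f"BREAK checks passed: {caught}\n"
        f"Deleted {deleted} implementations. Tests caught {caught} deletions. "
        f"Miss rate: {miss_rate:.0f}%"
    )
```

With 12 deletions and 9 caught, three were missed and the miss rate is 25%.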

Memory System

File	Tool	When
---	---	---
`CLAUDE.md`	`Write` once	Phase 0, never overwritten
`.memory/project.md`	`Read` then `Edit`	Bootstrap + after each milestone
`.gitignore`	`Write` once	Phase 0 Step 0a (greenfield) or Phase 0 Step 3
TodoWrite	`TodoWrite`	Every state transition

On crash: Read references/session-recovery.md.

Reference Files

File	When to Load
---	---
`references/verification-patterns.md`	Phase 0 Step 2 (detect tech stack) and for non-Python projects
`references/quality-gate-checklist.md`	Phase 0 Step 5 (CLAUDE.md adversarial review)
`references/prd-parsing-patterns.md`	Phase 0 Step 1 when PRD structure is unclear
`references/nx-sdd-example.md`	ONLY when uncertain about milestone granularity
`references/session-recovery.md`	On crash or session loss

prd-to-action

Overview