← 返回
未分类

prd-to-action

Universal harness that converts ANY PRD, spec, or requirements document into directly executed development tasks with adversarial verification gates enforced via Claude Code tool calls. Reads the PRD, extracts milestone structure, creates a TodoWrite tracking loop, and iterates through implementation using Agent, Workflow, and direct tool execution. Triggers: "start", "go", "ship it", "start the PRD", "start building", "start coding", "start development", "let's begin", "let's get started", "exe
Universal harness that turns any PRD, spec, or requirements document into directly executed development tasks enforced via Claude Code tool calls — not /goal text generation, but real Bash-gated TDD loops. Core mechanism: Adversarial evaluation. Instead of checkbox compliance ("coverage ≥ 80%"), the harness tries to break things. The BREAK step deletes implementation and demands tests fail with AssertionError — not ImportError. GREEN includes GENERALIZE: a second test with different inputs must pass without changing code. No hardcoded thresholds — coverage floors and retry ceilings derive from baseline measurements. Flow: PRE-FLIGHT classifies projects (greenfield/brownfield/resume) → INGEST reads the PRD, measures baseline, generates CLAUDE.md and .memory/project.md → ROUTE manages milestone state machine → EXECUTE runs adversarial TDD (RED → GREEN+GENERALIZE → BREAK → REFACTOR → bash-gated GATE) → LOOP. Each step is blocked on exit code evidence. No evidence → no progress. Optional GitNexus integration for brownfield projects: pre-change impact analysis on existing symbols, post-commit architecture drift detection. Advisory only — tests remain authoritative. Progressive disclosure: SKILL.md (~540 lines) with 5 reference files loaded on demand. Shell translation table for PowerShell/Bash compatibility. Human escalation rules: only ask when genuinely blocked, never for routine decisions. 6 files, zero hardcoded numbers, one principle: if you can't delete the code and watch the test die for the right reason, you haven't proven anything.
云怀科技
未分类 community v1.0.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 2
下载
💾 0
安装
1
版本
#latest

概述

PRD-to-Action Harness

Shell: All Bash tool calls below use PowerShell 7 syntax — the default

Claude Code shell on Windows. On Unix, Bash auto-uses bash. Commands below

are written for bash; if running on Windows, PowerShell equivalents:

Bash commandPowerShell equivalent
------
which cmdGet-Command cmd
grep pattern fileSelect-String pattern file
awkUse Select-String or PowerShell pipeline
find ... -exec ... {} +`Get-ChildItem -Recurse -Filter ... \ForEach-Object { ... }`
wc -lMeasure-Object -Line
sort -rnSort-Object -Descending
head -NSelect-Object -First N
tr -d '%'-replace '%',''
sed -n 'N,Mp'Select-Object -Skip (N-1) -First (M-N+1)

If unsure which shell is active, run: Bash: echo $PSVersionTable — if it prints,

you're on PowerShell.

Loop: PRE-FLIGHT → INGEST → ROUTE → EXECUTE → VERIFY → LOOP.

Every gate is a tool call exit code. Every quality check is adversarial —

try to break it, don't check a box.

Core Principle: Adversarial Evaluation

OKRs are a compliance game for LLMs — set a low bar, hit it, look good.

Adversarial evaluation asks: "What would make this FAIL?" Then tests that.

OKR pattern (DON'T):     "Coverage ≥ 80%" → LLM writes 80% shallow tests → passes
Adversarial (DO):        "Delete the implementation. Do tests fail? If not, they're
                          mock-trash — rewrite." → LLM can't game this.

Pre-Flight: Environment Classifier

Before Phase 0, classify the project state. This determines the bootstrap path.

Project State Classification

Glob {project_root}/*.py 2>/dev/null
Glob {project_root}/package.json 2>/dev/null
Glob {project_root}/go.mod 2>/dev/null
Glob {project_root}/Cargo.toml 2>/dev/null
Glob {project_root}/PRD*.md 2>/dev/null

Read .memory/project.md → exists?
ConditionStateAction
---------
No source files AND no PRD AND no .memorygreenfield_emptyInteractive bootstrap: ask tech stack, init project
PRD exists but no source filesgreenfield_seededPhase 0 with PRD, skip baseline (nothing to measure)
Source files exist, no PRDbrownfield_no_specAsk for PRD, then Phase 0
Source files + PRD existbrownfield_fullFull Phase 0 with baseline measurement
.memory/project.md existsresumePhase 1

Required Tools

Bash: git --version → exit 0 = OK, exit ≠ 0 → AskUserQuestion: "Git is not installed. Install it first?"

(On PowerShell: git --version works the same way.)

Language-specific tool checks happen in Step 2 after tech stack is known.

Optional Tools

ToolSearch: "gitnexus" → if found, usable for brownfield projects. If absent, skip.
  - mcp__gitnexus__impact: before modifying existing symbols → see blast radius
  - mcp__gitnexus__detect_changes: after milestone commit → architecture drift advisory
  - Never used for greenfield — no existing code to index.
Skill: "superpowers" → if found, usable. If absent, skip.
MCP memory → test with mcp__memory__create_entities. If absent, .memory/project.md is authoritative.
codegraph → `Bash: codegraph affected <file>` — find tests affected by a changed file.
  Redundant with full-test-suite gate. Only useful for selective re-runs in brownfield.

Human Escalation Rules

Escalate (AskUserQuestion) when:

  • Required tool missing
  • Genuinely ambiguous decision (valid approaches, no clear winner)
  • Milestone dependency blocked — human chooses: wait / skip / reorder
  • Same failure pattern crosses retry ceiling — report the pattern, ask direction
  • File operation would delete/overwrite something not created by this harness

Do NOT escalate for:

  • Code implementation decisions (that's the TDD loop's job)
  • Test failures with clear fixes
  • Routine file operations

Phase 0: INGEST — Bootstrap

Step 0: State Check

Read .memory/project.md. Exists → Phase 1.

Classify state per Pre-Flight table above.

Step 0a: Greenfield Init

If state is greenfield_empty or greenfield_seeded:

AskUserQuestion: "This is a new project. What language and tools?"
  → Record: language, test framework, linter, type checker, security scanner.

If greenfield_empty → Bash: git init && create minimal project structure
  (src/, tests/, pyproject.toml or equivalent).

If greenfield_seeded → PRD exists, skip baseline measurement.
  Set existing_test_count = 0, existing_max_file_lines = 500, existing_coverage_pct = 0.
  Coverage floor for greenfield: 50% minimum, not "never decrease from 0%."
  File size limit for greenfield: 500 (no existing code to measure).

Generate .gitignore with standard patterns for the language.
  At minimum: .memory/  __pycache__/  .pytest_cache/  .mypy_cache/  *.pyc
  node_modules/  target/  coverage/  dist/

Bash: git add .gitignore && git commit -m "chore: init project structure"

Step 1: Find and Read the PRD

Glob: PRD*.md, SPEC*.md, *.prd.md, requirements*.md

Empty → AskUserQuestion: "Where is the PRD file?" Then Read it.

Extract: project name, tech stack, constraints, milestones with dependencies,

per-milestone completion standards, milestone type (code | manual | research | design).

If milestone structure unclear → Read references/prd-parsing-patterns.md.

Step 2: Detect Tech Stack & Load Verification Commands

From PRD, project files, or AskUserQuestion. If the project is NOT Python,

load the correct pattern from references/verification-patterns.md.

Record: {test_cmd}, {lint_cmd}, {typecheck_cmd}, {security_cmd}.

Step 3: Measure the Codebase Baseline

Skip if greenfield (values already set in Step 0a). Otherwise:

# Count existing tests (adapt -name pattern to detected language)
Bash: find . -name "test_*.py" -exec grep -c "def test_" {} + 2>/dev/null | awk -F: '{s+=$2} END {print s+0}'
result=$(above command); if [ -z "$result" ] || [ "$result" = "0" ]; then echo 0; else echo "$result"; fi
  → existing_test_count

# Measure largest file (adapt -name pattern to detected language)
Bash: result=$(find . -name "*.py" -exec wc -l {} + 2>/dev/null | sort -rn | awk 'NR==1{print $1}'); if [ -z "$result" ] || [ "$result" = "0" ]; then echo 500; else echo "$result"; fi
  → existing_max_file_lines

# Measure existing coverage
Bash: {test_cmd} --cov=. --cov-report=term 2>/dev/null | grep TOTAL | awk '{print substr($NF, 1, length($NF)-1)}' || echo 0
  → existing_coverage_pct

Derivation rules (record in .memory/project.md):

ValueGreenfield DefaultBrownfield Derivation
---------
New test minimum per module2max(1, ceil(existing_test_count / milestone_count))
File size limit500existing_max_file_lines (never lower than current max)
Coverage floor50%max(existing_coverage_pct, 50%) — zero baseline does NOT exempt
Retry ceiling3Start at 3. Minimum 1, maximum 5. If no failures across 5 milestones → ceiling was too loose, reduce by 1 (never below 1). Reset to 3 after any milestone fails.

Step 3b: Brownfield — Index with GitNexus

Only if state is brownfield_full or brownfield_no_spec AND ToolSearch: "gitnexus" found it.

mcp__uml__gitnexus_index: path = {project_root}
  → Indexes the existing codebase so Phase 2 can use impact analysis.
  Skip if indexing fails (not a blocker — gitnexus is advisory, not mandatory).
  This is a one-time cost: the index persists and auto-syncs on subsequent runs.

Record in .memory/project.md: gitnexus_indexed: yes | no — date.

Step 4: Generate CLAUDE.md

IMPORTANT: The template below contains template variables marked «VAR».

Before writing CLAUDE.md, substitute every «VAR» with the concrete value

measured or detected in Steps 2-3. The resulting file MUST contain zero

template variables — they are resolved at write time.

Template:

# CLAUDE.md

## TDD (NON-NEGOTIABLE)
Every code change: RED → GREEN → REFACTOR → BREAK.
- RED: Write test. Bash: «test_cmd» «test_path» → MUST exit non-zero.
- GREEN: Minimum code. Bash: «test_cmd» «test_path» → MUST exit zero.
- REFACTOR: Clean. Bash: «test_cmd» «test_path» → MUST exit zero.
- BREAK (adversarial): Delete/comment out implementation → run test → MUST exit non-zero.
  The failure MUST be an assertion failure (AssertionError, assert), NOT import/name error.
  EXCEPTION: modules with external dependencies (database drivers, hardware SDKs,
  CAD APIs, cloud services). For these, BREAK verifies that mock-returned values
  are asserted — the test must fail when mock is removed, not when import fails.
  If test passes without implementation: the test is vacuous. Rewrite it.
  This is NOT optional. Every test must prove it catches absence.

## Verification Gate (run after EVERY module, then again after refactor)
1. Bash: «test_cmd» --cov=«module» → coverage must NOT decrease from «coverage_floor»% (hard floor: 50%)
2. Bash: «lint_cmd» → exit 0
3. Bash: «typecheck_cmd» → exit 0
4. Bash: «security_cmd» → exit 0 (or note "no security tool configured")
5. Grep: «forbidden_patterns» → MUST exit 1

## Structural Discipline
- File size ≤ «file_size_limit» lines (derived from codebase baseline; default 500 for new projects)
- No dead code. No bare except:. One-way dependencies.
- Test isolation: no shared mutable state, no external service calls in unit tests.
  Mock external dependencies.

## Commit Discipline
Bash: git add «module_file» «test_file» && git commit -m "M{N}: «module» — {change}"
Never commit unrelated files. Never commit when any gate check fails.

Step 5: CLAUDE.md Adversarial Quality Gate

Read references/quality-gate-checklist.md. For each checklist item:

Try to BREAK the generated CLAUDE.md:
  - Are there unresolved placeholders? Grep "placeholder|TODO|{.*}" → MUST exit 1.
    NOTE: { and } are never valid in a final CLAUDE.md. All template variables
    were replaced with concrete values in Step 4.
  - Are completion standards falsifiable? Grep "works correctly|looks good|feels right|
    is done|user satisfied|PM signoff|code review approve|demo works" → MUST exit 1.
  - Is the BREAK step present? Grep "BREAK" → MUST find it.
  - Does BREAK include the external-dependency exception? Grep "external depend"
    → MUST find it (for projects that need it).
  - Are verification commands real tools? Grep "«test_cmd»|«lint_cmd»|«typecheck_cmd»"
    → MUST exit 1 (these are template markers, not final content).

Any FAIL → Edit CLAUDE.md to fix → re-check. Loop until all pass.

TodoWrite: "CLAUDE.md quality gate" → done only when all checks pass.

Step 6: Write .memory/project.md

Template (substitute «VAR» with extracted values, write to project root):

# «Project Name»

## Tech Stack
- Language: «language»
- Test: «test_cmd»
- Lint: «lint_cmd»
- Type: «typecheck_cmd»
- Security: «security_cmd»

## Constraints
«extracted from PRD»

## Baseline (measured «date»)
- existing_test_count: «N»
- existing_max_file_lines: «N»
- existing_coverage_pct: «N»%
- coverage_floor: «N»% (max of baseline and 50%)
- file_size_limit: «N»
- retry_ceiling: «N» (floor 1, reset to 3 on failure)
- gitnexus_indexed: yes | no — «date» (greenfield = no, skip Step 3b)

## Milestones
### «M{N}»: «Name»
- Depends on: «M{N-1} | none»
- Type: code | manual | research | design
- Has external deps: yes | no  (if yes, BREAK step uses mock-exception rule)
- Files: «paths»
- Completion: «falsifiable criteria»
- Verify: «verification command»
- Status: pending | in_progress | done | skipped

Each Completion field must pass the adversarial falsifiability test:

Grep: "works correctly|looks good|feels right|is done|user satisfied|PM signoff|code review approve|demo works"
in .memory/project.md → MUST exit 1

Step 7: Initialize Tracking

TodoWrite: one task per milestone, status = pending
TodoWrite: first no-dependency milestone → in_progress

Phase 1: ROUTE

State Machine (enforced by TodoWrite)

Read .memory/project.md + TodoWrite
  ├─ File missing → Phase 0
  ├─ All done → Phase 3
  ├─ No in_progress → set first pending no-dep → in_progress
  └─ Has in_progress → that's the current milestone

Status Query

"where are we" / "status" / "progress" → display TodoWrite + .memory/project.md.

Do NOT enter Phase 2.

Commands

  • "skip M{N}" → TodoWrite: mark skipped. Grep for dependents → warn.
  • "retry" → Reset retry counter. Enter Phase 2.
  • "resume" / "continue" → Find in_progress. Enter Phase 2.

Milestone Matching

Match by: number (M3, T2), name substring ("auth"), file path.

Cross-reference .memory/project.md.

Dependency Check

Any dependency not done or skippedAskUserQuestion:

"Depends on {name} ({status}). Run dep first, skip it, or pick another?"


Phase 2: EXECUTE — Adversarial TDD Loop (Bash-Gated)

Milestone Type Dispatch

TypeAction
--------------
codeAdversarial TDD loop below
manualAskUserQuestion: state what human must do. Wait for confirmation.
research / designGlob for deliverable → exists? → verify against completion standard. Skip TDD.

Brownfield Pre-Change Impact Analysis

If .memory/project.md has gitnexus_indexed: yes and the milestone touches

EXISTING functions (not new files — check with Grep for the target symbol):

Read .memory/project.md → milestone Files section → identify which symbols are EXISTING vs NEW.

For each EXISTING symbol being modified:
  mcp__gitnexus__impact:
    target = "<function_or_class_name>"
    direction = "upstream"  (what depends on this — what will break)
    summaryOnly = true  (just counts and risk — don't need full detail yet)
  → Risk CRITICAL or HIGH → display findings, but do NOT block.
    Only ASK if risk = CRITICAL AND dependents ≥ 10:
      AskUserQuestion: "Changing <symbol>. Impact: <N> direct callers, <M> processes affected.
                        Proceed or reassign to a later milestone?"
  → Risk MEDIUM or LOW → proceed. The tests will catch breakage.

This is ADVISORY only. The BREAK step + full test suite are the real safety net.

Do NOT let gitnexus override the TDD loop — if impact says CRITICAL but tests

all pass after change, the tests are right and gitnexus is just warning about

coupling that hasn't caused a bug yet.

Adversarial TDD Loop

Each step is gated by a Bash exit code. No evidence → blocked → loop back.

STEP 1 — RED (prove test FAILS for the RIGHT reason):
  Write: test file.
  Bash: «test_cmd» «test_file» -v
    exit ≠ 0 → check WHY it failed:
      Grep output for "AssertionError|assert|E  " → found → STEP 2
      Grep output for "SyntaxError|ImportError|NameError" → found
        → "Test failed for wrong reason (syntax/import/name error,
           not a real assertion). Rewrite." → STEP 1
    exit = 0 → "Test passes with zero implementation. Vacuous." → STEP 1

STEP 2 — GREEN + GENERALIZE (prove impl isn't hardcoded):
  Write: minimum implementation.
  Bash: «test_cmd» «test_file» -v → exit 0 → STEP 2b
    exit ≠ 0 → "Fix implementation. Do NOT change the test." → STEP 2
  STEP 2b — GENERALIZE:
    Write: second test with DIFFERENT inputs (edge case, boundary value).
    Bash: «test_cmd» «test_file» -v → MUST exit 0 WITHOUT changing impl.
    If impl must change: hardcoded to first test → rewrite from STEP 1.

STEP 3 — BREAK (prove test catches absence):
  CHECK .memory/project.md: does this milestone have "Has external deps: yes"?
  ┌─ YES (external deps: DB driver, HW SDK, CAD API, cloud service):
  │   Write the test with mocking. Run it GREEN first (test + mock pass).
  │   THEN remove ONLY the mock setup, NOT the import.
  │   Bash: «test_cmd» «test_file» -v
  │     exit ≠ 0 with AssertionError from missing mock → mock is real → STEP 4
  │     exit = 0 → mock was never asserted → rewrite from STEP 1
  └─ NO (pure logic module):
      Edit: delete or comment out the implementation.
      Bash: «test_cmd» «test_file» -v
        exit ≠ 0 → Grep output for "AssertionError|assert|E  " → found
          → "Test caught deletion via real assertion." → STEP 4
        exit ≠ 0 but ONLY ImportError/NameError → "Test only checks module
          exists, not that it works. Rewrite from STEP 1."
        exit = 0 → "Test passes without ANY implementation. Mock-trash." → STEP 1

STEP 4 — REFACTOR (prove cleanup safe):
  Edit: extract helpers, improve names, remove duplication.
  Bash: «test_cmd» «test_file» -v → exit 0 → STEP 5
    exit ≠ 0 → "Refactor broke something. Fix." → STEP 4

STEP 5 — GATE (adversarial: try to break every way):
  Bash: «test_cmd» --cov=«module» → coverage ≥ coverage_floor AND
        Select-String "assert" «test_file» | Measure-Object -Line → ≥ test function count
or  Bash: Grep -c "assert" «test_file» → count ≥ number of test functions
  Bash: «lint_cmd» → exit 0
  Bash: «typecheck_cmd» → exit 0
  Bash: «security_cmd» → exit 0 (or note "no security tool configured")
  Grep: "«forbidden_patterns»" → exit 1

  ALL pass → git add «module_file» «test_file» → git commit → advance to Phase 1
  ANY fail → fix → re-run gate
  Same failure crosses retry_ceiling → AskUserQuestion with failure pattern

  ┌─ Post-commit (brownfield only, gitnexus indexed):
  │   mcp__gitnexus__detect_changes: scope = "unstaged" → expect empty
  │   (nothing left uncommitted — gitnexus checks committed tree).
  │   If mcp__gitnexus__detect_changes returns affected flows:
  │     → "Architecture drift: <N> flows now traverse these files." — advisory.
  │     → If same flow appears in drift for 2+ consecutive milestones:
  │         AskUserQuestion: "Flow <name> has structural drift across milestones —
  │           consider refactoring."
  │     → Does NOT block. Tests are authoritative. GitNexus is structural awareness.

Paste the Bash output at each step. No evidence → blocked.

Subagent Dispatch

When Grep finds ≥2 cross-file imports between candidate files:

Agent: "Implement «module». Read CLAUDE.md. Adversarial TDD loop — each step
gated by Bash exit code. Paste evidence at every checkpoint. No skip allowed."

Workflow Dispatch

When Grep finds ZERO shared files AND .memory/project.md shows ZERO dep edges

between milestones:

Workflow: pipeline the independent milestones.
Each runs the full adversarial TDD loop internally.

Adversarial Gate Calibration (after each milestone)

Bash: check the BREAK step from previous milestone
  → Did the BREAK step catch a real bug? If yes, the adversarial gates are working.
  → Did no BREAK step ever fail? Then tests are too shallow. Require more assertions
    per test in the next milestone.

Bash: «test_cmd» --cov=. --cov-report=term | grep TOTAL
  → Record actual coverage. Next milestone must NOT decrease this number.
  → If coverage increased by ≥5%: gates working. Keep.
  → If coverage increased by <1%: diversify assertions in next milestone.

Phase 3: COMPLETION

TodoWrite: all = completed or skipped
Bash: git log --oneline (show milestone commits)
Bash: «test_cmd» --cov=. --cov-report=term (final coverage)
Glob: all promised deliverable files

Report:
  "{N} done, {S} skipped | Coverage: {Y}% | BREAK checks passed: {B}"
Adversarial note: "Deleted {X} implementations. Tests caught {Y} deletions.
                  Miss rate: {Z} — these tests need hardening."

Memory System

FileToolWhen
---------
CLAUDE.mdWrite oncePhase 0, never overwritten
.memory/project.mdRead then EditBootstrap + after each milestone
.gitignoreWrite oncePhase 0a (greenfield) or Phase 0 Step 3
TodoWriteTodoWriteEvery state transition

On crash: Read references/session-recovery.md.


Reference Files

FileWhen to Load
------
references/verification-patterns.mdPhase 0 Step 2 (detect tech stack) and for non-Python projects
references/quality-gate-checklist.mdPhase 0 Step 5 (CLAUDE.md adversarial review)
references/prd-parsing-patterns.mdPhase 0 Step 1 when PRD structure is unclear
references/nx-sdd-example.mdONLY when uncertain about milestone granularity
references/session-recovery.mdOn crash or session loss

版本历史

共 1 个版本

  • v1.0.0 Initial release 当前
    2026-06-11 16:33 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

dev-programming

YouTube

byungkyu
使用托管OAuth集成YouTube Data API,支持搜索视频、管理播放列表、获取频道数据及评论互动,适用于用户需要时使用此技能。
★ 142 📥 42,181
dev-programming

Mcporter

steipete
使用 mcporter CLI 直接列出、配置、认证及调用 MCP 服务器/工具(支持 HTTP 或 stdio),涵盖临时服务器、配置编辑及 CLI/类型生成功能。
★ 198 📥 68,325
dev-programming

Github

steipete
使用 `gh` CLI 与 GitHub 交互,通过 `gh issue`、`gh pr`、`gh run` 和 `gh api` 管理议题、PR、CI 运行及高级查询。
★ 687 📥 331,495