> Plan → [ExitPlanMode batch-approve] → Execute in Batches → Verify
Manual invocation only. Type /auto. Never auto-triggers.
Announce at start: "I'm using the auto skill to plan and execute this task."
Type /auto. That is the only trigger.
Use for: tasks with 2+ distinct steps, multi-file changes, cross-domain work.
Skip for: single-line fixes, pure Q&A, reading files.
The user's intent determines the mode. No keyword matching — understand what they want from the request.
| Mode | User Intent | Behavior |
|---|---|---|
| PLAN | User wants a plan only, no execution | EnterPlanMode -> design plan -> save to docs/plans/ -> ExitPlanMode -> STOP |
| BUILD | User wants to execute an existing plan | Load plan from docs/plans/ -> critical review -> execute in batches with checkpoints |
| FULL | (default) Plan then execute | EnterPlanMode -> present plan -> ExitPlanMode(allowedPrompts) -> execute all steps |
| AUTO | User wants autonomous execution, minimal prompts | Plan internally -> ExitPlanMode(allowedPrompts) -> execute without review gates |
Every auto invocation starts here. The skill index at references/skill-index.json is the single source of truth for skill matching. Keep it fresh.
Two modes — index absent vs. index present:
| Scenario | Behavior |
|---|---|
| First run (index missing) | Full scan → read every SKILL.md body → classify each skill from full content |
| Subsequent run (index exists) | Diff scan → only classify new/changed skills via full body read → delete removed entries |
This is the expensive path. It happens ONCE.
1. Run `python scripts/scan_skills.py --json-stdout`.
2. `classifications_needed` will contain ALL discovered skills.
3. For each skill in `classifications_needed`:
   a. Read the full SKILL.md body using the file path from the scan output. No shortcuts. No pattern pre-classification. No name+description guessing.
   b. Classify using the Classification Guidelines below: ops, domain, prereqs, summary (one sentence capturing the actual behavior, not the frontmatter description), use_for (2-5 specific tasks), do_not_use_for (1-3 likely misapplications).
   c. The scan_skills.py `pre_classified` field is a hint only. Verify it against the full body and override it when it disagrees.
4. Write skill-index.json in the v2 format: `{version, scanned_at, skills: {name: {ops, domain, prereqs, summary, use_for, do_not_use_for, content_hash}}}`

Constraint: Process in batches of 20-30 skills. After each batch, write partial results to the index so a crash doesn't lose all progress.
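
The batching constraint could be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual implementation: `write_index` and `classify_in_batches` are hypothetical helpers, `classify` stands in for the full-body read and classification step, each scan item is assumed to carry a `name` key, and the `version` value `2` is inferred from the "v2" label.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable


def write_index(index_path: Path, index: dict) -> None:
    """Write the index via a temp file so a crash never leaves a half-written file."""
    index_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=index_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(index, handle, indent=2, ensure_ascii=False)
    os.replace(tmp_name, index_path)


def classify_in_batches(
    skills: Iterable[dict],
    classify: Callable[[dict], dict],
    index_path: Path,
    batch_size: int = 25,
) -> dict:
    """Classify skills batch by batch, persisting the index after each batch."""
    # Assumption: version is the integer 2 and scan items have a "name" key.
    index = {"version": 2, "scanned_at": None, "skills": {}}
    pending = list(skills)
    for start in range(0, len(pending), batch_size):
        for skill in pending[start:start + batch_size]:
            index["skills"][skill["name"]] = classify(skill)
        index["scanned_at"] = datetime.now(timezone.utc).isoformat()
        write_index(index_path, index)  # partial results survive a crash
    return index
```

Writing through a temporary file and `os.replace` is one way to keep the index readable even if the process dies mid-write; the source only requires that partial results be saved after each batch.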
This is the cheap path. It happens on every subsequent /auto invocation.
1. Run `python scripts/scan_skills.py --json-stdout`.
2. `classifications_needed` has only new + changed skills; `deleted` has removed skills.
3. If `classifications_needed` is non-empty:
   a. For each skill: read the full SKILL.md body before classifying (same classification rules as first run).
   b. Merge into index: add/update entries in `skills`, update `scanned_at`.
4. If `deleted` is non-empty: remove those keys from `skills` in the index.

If scan_skills.py fails (Python not available, etc.):

- Read skill-index.json directly and proceed with whatever data is available.

Index freshness rule: Re-scan if `scanned_at` is more than 3 days old OR the user mentions installing/removing skills.
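
The merge step and the freshness rule might look like the sketch below. `merge_scan` and `needs_rescan` are hypothetical names; the sketch assumes `scanned_at` is stored as a timezone-aware ISO 8601 string and treats a missing `scanned_at` as stale, neither of which the source specifies.

```python
from datetime import datetime, timedelta, timezone


def merge_scan(index: dict, classified: dict, deleted: list, now: datetime) -> dict:
    """Apply a diff scan: upsert classified entries, drop deleted ones."""
    skills = dict(index.get("skills", {}))
    skills.update(classified)      # add new skills, overwrite changed ones
    for name in deleted:
        skills.pop(name, None)     # tolerate names that are already absent
    return {**index, "skills": skills, "scanned_at": now.isoformat()}


def needs_rescan(index: dict, now: datetime, user_changed_skills: bool = False) -> bool:
    """Freshness rule: stale after 3 days, or when the user installed/removed skills."""
    scanned_at = index.get("scanned_at")
    if user_changed_skills or not scanned_at:
        return True  # assumption: a missing timestamp counts as stale
    return now - datetime.fromisoformat(scanned_at) > timedelta(days=3)


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    fresh = merge_scan({"version": 2, "skills": {}}, {"demo": {"domain": "docs"}}, [], current)
    print(needs_rescan(fresh, current))
```

`merge_scan` returns a new dictionary instead of mutating its input, which keeps the previous index available if the write fails.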
Plan: <one-line summary> | Tasks: N | Mode: <mode>
-> Task 1..N: <brief sequence>
Skills: <list or "direct">
Plan file naming (PLAN / FULL modes): docs/plans/YYYY-MM-DD-
Use ExitPlanMode with allowedPrompts — the official Claude Code batch-approval mechanism. The user approves once, all listed operations are pre-authorized.
In AUTO mode: still call ExitPlanMode (required by the harness). For truly zero-prompt execution, pre-configure permissions.allow in settings.json:
{
  "permissions": {
    "allow": [
      "Bash(git *)",
      "Bash(npm *)",
      "Bash(cargo *)",
      "Bash(rtk *)",
      "WebSearch",
      "WebFetch(*)"
    ]
  }
}
Use /update-config or /fewer-permission-prompts to build this list from actual usage.
Adopted from executing-plans: batch of 3 tasks -> report -> checkpoint.
FULL mode: execute batch -> report -> auto-continue to next batch.
BUILD mode: execute batch -> report -> wait for feedback before next batch.
AUTO mode: skip review gates entirely. Continue until done or blocked.
Re-evaluate after each batch: if a later task's inputs changed due to earlier results, update it before executing.
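
The per-mode batch behavior above can be pictured with a small sketch. `batches` and `run_plan` are hypothetical names, and `execute` and `wait_for_feedback` are placeholder callbacks; the sketch deliberately leaves out re-evaluation of later tasks and blocker handling.

```python
from typing import Callable, Iterator, List, Sequence


def batches(tasks: Sequence[str], size: int = 3) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` tasks."""
    for start in range(0, len(tasks), size):
        yield list(tasks[start:start + size])


def run_plan(
    tasks: Sequence[str],
    mode: str,
    execute: Callable[[str], str],
    wait_for_feedback: Callable[[List[str]], bool],
) -> List[str]:
    """Run tasks in batches; only BUILD pauses for feedback between batches."""
    reports: List[str] = []
    all_batches = list(batches(tasks))
    for index, batch in enumerate(all_batches):
        reports.extend(execute(task) for task in batch)
        is_last = index == len(all_batches) - 1
        # FULL auto-continues and AUTO skips gates, so neither waits here.
        if mode == "BUILD" and not is_last and not wait_for_feedback(batch):
            break  # the user asked to stop at this checkpoint
    return reports
```

In this sketch `wait_for_feedback` returns `True` to continue and `False` to stop; a real checkpoint would also let the user amend the remaining tasks.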
Per-task execution:
Task N/M:

Track all tasks with TaskCreate / TaskUpdate (Claude Code's official task tracking).
Iron law (from verification-before-completion):
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
For each task and at final backpressure:
Never use "should work", "probably", or "seems to". Run the command. Read the output. Then claim.
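
One way to make "run the command, read the output, then claim" concrete is to refuse to produce a completion message without captured evidence. The sketch below is illustrative only: `Evidence`, `verify`, and `completion_claim` are hypothetical names, and treating exit code 0 as "passed" is an assumption that will not suit every verification command.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    command: List[str]
    exit_code: int
    output: str

    @property
    def passed(self) -> bool:
        # Assumption: a zero exit code means the check passed.
        return self.exit_code == 0


def verify(command: List[str], timeout: float = 300.0) -> Evidence:
    """Run the verification command now and capture what it actually printed."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return Evidence(command, result.returncode, result.stdout + result.stderr)


def completion_claim(task: str, evidence: Evidence) -> str:
    """Only phrase a completion claim when fresh evidence backs it."""
    if not evidence.passed:
        return f"{task}: NOT complete (exit code {evidence.exit_code})"
    return f"{task}: complete, verified by `{' '.join(evidence.command)}`"


if __name__ == "__main__":
    fresh = verify([sys.executable, "-c", "print('ok')"])
    print(completion_claim("Task 1/1", fresh))
```

Because `completion_claim` requires an `Evidence` value, there is no code path that reports success without a command having been run first.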
Adopted from executing-plans — STOP immediately when:
Ask for clarification rather than guessing. Don't force through blockers.
The skill index at references/skill-index.json is the single source of truth. Match each task against the index using the algorithm below.
Given a task with an operation tag:
1. Exclude skills whose `prereqs` are not met in the current environment (no git repo -> remove git-prereq skills, etc.).
2. Require that the skill's `ops` tag matches the task's operation. Hard gates:
   - explore:local tasks -> never match explore:web skills
   - create tasks -> never match review-only skills
   - design tasks -> never match execute-only skills
3. Prefer skills whose `domain` matches the task domain.
4. Prefer skills whose `use_for` entries semantically match the task.
5. Avoid skills whose `do_not_use_for` entries semantically match the task.

| Tag | Meaning |
|---|---|
| create | Making new files, features, content from scratch |
| update | Modifying, refactoring, fixing existing things |
| review | Reading, analyzing, auditing, explaining |
| design | Planning, brainstorming, architecting, estimating |
| execute:local | Running local commands, builds, tests, scripts |
| execute:remote | Deploying, pushing, remote API calls |
| explore:local | Searching/reading local codebase |
| explore:web | Web research, external data fetching |
Domain tags: meta, backend, frontend, devops, testing, docs, git, security, ml, research, utility, performance

Prereq tags: git, git:diff, web, node, python, pip, mcp, api:anthropic
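
A rough sketch of matching a task against index entries is shown below. Everything here is an assumption layered on the index fields: `match_skills` and `phrase_overlap` are hypothetical names, word overlap is only a crude stand-in for semantic matching, a `do_not_use_for` hit is treated as an exclusion, and the ranking weights are invented for illustration.

```python
from typing import Callable, Dict, List, Set


def phrase_overlap(phrase: str, task_text: str) -> bool:
    """Crude stand-in for semantic matching: every word of the phrase is in the task."""
    words = phrase.lower().split()
    task_words = set(task_text.lower().split())
    return bool(words) and all(word in task_words for word in words)


def match_skills(
    task_text: str,
    task_op: str,
    task_domain: str,
    skills: Dict[str, dict],
    available_prereqs: Set[str],
    matches: Callable[[str, str], bool] = phrase_overlap,
) -> List[str]:
    """Return candidate skill names, best first (ranking weights are assumptions)."""
    scored = []
    for name, entry in skills.items():
        if not set(entry.get("prereqs", [])) <= available_prereqs:
            continue  # a prerequisite is missing in this environment
        if task_op not in entry.get("ops", []):
            continue  # exact tag match; this also enforces the hard gates
        if any(matches(p, task_text) for p in entry.get("do_not_use_for", [])):
            continue  # assumption: a known misapplication excludes the skill
        score = int(entry.get("domain") == task_domain)
        score += sum(matches(p, task_text) for p in entry.get("use_for", []))
        scored.append((score, name))
    return [name for _, name in sorted(scored, key=lambda item: (-item[0], item[1]))]
```

Requiring the task's operation tag to appear in the skill's `ops` list means an explore:local task can never select a skill tagged only explore:web, which is the behavior the hard gates describe.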
When classifying a skill from its description and source context:
ops (choose 1-4):
- create if it produces new files/code/content
- update if it modifies existing things
- review if it reads, analyzes, audits, explains, or inspects
- design if it plans, brainstorms, architects, or estimates
- execute:local if it runs local commands (build, test, install, cli)
- execute:remote if it deploys, pushes, or calls remote services
- explore:local if it searches/reads the local codebase
- explore:web if it does web searches or fetches external data

domain (exactly 1): Infer from description keywords and source context.

- meta
- utility
- frontend
- backend
- devops
- testing
- docs
- git
- security
- ml
- research

prereqs (0-4): Infer from description. Git commands -> git. pip install -> pip. npm/node -> node. Web searches -> web. MCP tools -> mcp.
use_for (2-5 short phrases): What specific tasks does this skill handle well? Be specific.
do_not_use_for (1-3 short phrases): What common tasks would be a bad fit? Focus on likely mistakes.
Batch strategy: Classify in groups by source context for consistency (e.g., all sc- skills together, all cli-anything- together).
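
The cardinality limits in these guidelines lend themselves to a mechanical check before an entry is written to the index. The following is a minimal sketch; `validate_entry` is a hypothetical helper and the message strings are invented for illustration.

```python
from typing import List

# Operation tags taken from the tag table in this document.
OPS = {"create", "update", "review", "design",
       "execute:local", "execute:remote", "explore:local", "explore:web"}


def validate_entry(entry: dict) -> List[str]:
    """Return a list of guideline violations for one index entry (empty if valid)."""
    problems = []
    ops = entry.get("ops", [])
    if not 1 <= len(ops) <= 4:
        problems.append("ops: choose 1-4 tags")
    unknown = sorted(set(ops) - OPS)
    if unknown:
        problems.append(f"ops: unknown tags {unknown}")
    if not isinstance(entry.get("domain"), str) or not entry["domain"]:
        problems.append("domain: exactly 1 required")
    if len(entry.get("prereqs", [])) > 4:
        problems.append("prereqs: at most 4")
    if not 2 <= len(entry.get("use_for", [])) <= 5:
        problems.append("use_for: 2-5 phrases")
    if not 1 <= len(entry.get("do_not_use_for", [])) <= 3:
        problems.append("do_not_use_for: 1-3 phrases")
    return problems
```

A check like this only catches structural mistakes; whether the tags actually fit the skill still depends on reading the full SKILL.md body.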
For tasks with 5+ steps:
| File | Purpose |
|---|---|
| docs/plans/YYYY-MM-DD- | Tasks, progress, decisions |
| findings.md | Research discoveries |
| progress.md | Session log |
Reboot check (after context gaps): Read plan file -> check current phase -> resume from last completed task.
After all tasks complete and verified (from finishing-a-development-branch):
All N tasks complete.