
skill-craft

Create, optimize, or update AgentSkills with evidence-based design. Two modes: (1) CREATE — build new skills from scratch with proven design patterns. (2) OP...
Author: leonardo-lb · Source: clawhub · Category: uncategorized · Version: v1.0.1 (1 version) · Key: not required

Overview

Skill Craft

Build and optimize AI agent skills using evidence-based design principles.

Two Modes of Operation

CREATE mode — Build a new skill from scratch (Steps 1-6).

OPTIMIZE mode — Diagnose and fix an existing skill (Step 7).

For new skills, follow Steps 1-6; the Step 7 optimization dimensions are already built into Step 4. For existing skills that need improvement, jump straight to Step 7.


Skill Classification

Before creating or optimizing, classify the skill type — this determines design priorities.

Nine Skill Types

| Type | Code | Definition | Design Priority |
|---|---|---|---|
| Process Control | PROC | Sequential, phased workflows (brainstorming, planning) | D4 Steps → D7 Verify → D2 Efficiency |
| Methodology | METH | Reusable methods with iron laws (TDD, debugging) | D4 Steps → D7 Verify → D5 Positive |
| Tool Integration | TOOL | API/CLI wrappers (web-access, calculator) | D1 Description → D8 Freedom → D6 Examples |
| Domain Knowledge | KNOW | Domain expertise and standards (design, mobile) | D9 Progressive → D2 Efficiency → D3 Structure |
| Creative/Generative | GEN | Creative output with quality bars (prompt refinement) | D5 Positive → D8 Freedom → D6 Examples |
| Document Generation | DOC | Structured document output (reports, wikis) | D4 Steps → D3 Structure → D7 Verify |
| Orchestration | ORCH | Multi-agent/task coordination (parallel dispatch) | D4 Steps → D7 Verify → D6 Examples |
| Meta-Skill | META | Skills about skills (this very skill) | D3 Structure → D9 Progressive → D5 Positive |
| Quality Assurance | QA | Verification and review (code review, testing) | D7 Verify → D5 Positive → D4 Steps |

See references/skill-taxonomy.md for per-type design templates, pitfalls, and the full classification decision tree.
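The type-to-priority mapping in the table can be held as plain data. The sketch below copies the codes and priority orders from the table; the helper name `order_fixes` and the idea of sorting a set of failing dimensions are illustrative assumptions, not part of the skill's own scripts.

```python
# Design priorities per skill type, copied from the "Nine Skill Types" table.
DESIGN_PRIORITY = {
    "PROC": ["D4", "D7", "D2"],
    "METH": ["D4", "D7", "D5"],
    "TOOL": ["D1", "D8", "D6"],
    "KNOW": ["D9", "D2", "D3"],
    "GEN": ["D5", "D8", "D6"],
    "DOC": ["D4", "D3", "D7"],
    "ORCH": ["D4", "D7", "D6"],
    "META": ["D3", "D9", "D5"],
    "QA": ["D7", "D5", "D4"],
}


def order_fixes(type_code, failing):
    """Return failing dimensions with the type's priority dimensions first.

    Hypothetical helper: `failing` is any collection of dimension IDs.
    """
    priority = DESIGN_PRIORITY[type_code]
    first = [d for d in priority if d in failing]
    rest = sorted(d for d in failing if d not in priority)
    return first + rest


print(order_fixes("TOOL", {"D2", "D6", "D1"}))  # ['D1', 'D6', 'D2']
```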


Skill Anatomy

skill-name/
├── SKILL.md          (required) — YAML frontmatter + Markdown instructions
├── scripts/          (optional) — Deterministic code (Python/Bash)
├── references/       (optional) — Detailed docs loaded on demand
└── assets/           (optional) — Templates/images used in output

Three-Level Loading (Progressive Disclosure)

| Level | Content | When Loaded | Size Target |
|---|---|---|---|
| 1 | name + description (frontmatter) | Always in context | ~100 words |
| 2 | SKILL.md body | When skill triggers | < 500 lines, < 5k words |
| 3 | scripts/, references/, assets/ | On demand | Unlimited |

Why this matters: The context window is the scarcest resource. Claude Code Issue #2544 (39👍) reports rules being ignored when SKILL.md is too long. CFPO research (Fudan/Microsoft, 2025) reports that jointly optimizing format and content yields a +5% to +38% improvement.
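The Level 2 size targets are easy to check mechanically. The following is a minimal sketch, assuming the frontmatter is delimited by leading `---` lines; the function names are hypothetical and this is not the skill's own `scripts/quick_validate.py`.

```python
def split_frontmatter(text):
    """Split SKILL.md text into (frontmatter, body) on the leading '---' lines."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no frontmatter found: treat everything as body


def size_report(text, max_lines=500, max_words=5000):
    """Count body lines and words and compare them with the Level 2 targets."""
    _, body = split_frontmatter(text)
    n_lines = len(body.splitlines())
    n_words = len(body.split())
    return {
        "lines": n_lines,
        "words": n_words,
        "within_budget": n_lines < max_lines and n_words < max_words,
    }


sample = "---\nname: x\ndescription: y\n---\n# Title\nline two\n"
print(size_report(sample))  # {'lines': 2, 'words': 4, 'within_budget': True}
```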

What NOT to Include

README.md, CHANGELOG.md, installation guides, user-facing docs. Only what the AI agent needs to do the job.

State Management in Skills

Some skills need state management, that is, tracking progress across steps or sessions. The patterns here are based on OpenSpec's DAG pattern and LangGraph's checkpoint system (see references/state-machine-patterns.md).

Does this skill need state management?

| Condition | Needs State? | Pattern |
|---|---|---|
| Multi-step workflow with dependencies between steps | YES | DAG + file existence checks |
| Output of one step feeds into another | YES | Checkpoint pattern |
| Each invocation is independent | NO | Stateless |
| Free-form exploration or conversation | NO | Stateless |

For state patterns (file existence, checkbox, YAML, DAG) and script integration, see references/state-machine-patterns.md.

Key principle: Use deterministic checks (file exists, script output) rather than LLM judgment for state transitions. Research (arXiv:2511.07585) shows large models have only 12.5% output consistency — scripts are 8× more reliable.
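A deterministic state check can be as small as "which output file is missing?". The sketch below assumes a linear chain of steps (the simplest DAG) in which each step is done once its output file exists; the step names and file names are hypothetical, and the patterns in references/state-machine-patterns.md may differ.

```python
from pathlib import Path

# Hypothetical workflow: each step counts as done once its output file exists.
STEPS = [
    ("extract", "extracted.json"),
    ("transform", "transformed.json"),
    ("report", "report.md"),
]


def next_step(workdir, steps=STEPS, exists=None):
    """Return the first step whose output is missing, or None if all are done.

    `exists` is injectable so the logic can be exercised without touching disk.
    """
    exists = exists or (lambda p: p.exists())
    for name, output in steps:
        if not exists(Path(workdir) / output):
            return name
    return None


# Simulate a working directory where only the first output has been written.
done = {"extracted.json"}
print(next_step(".", exists=lambda p: p.name in done))  # transform
```

Because the answer comes from the filesystem rather than from the model's recollection, re-running the check after an interruption gives the same result every time.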


CREATE Mode: Steps 1-6

Skill Naming

  • Lowercase letters, digits, hyphens only. Under 64 chars.
  • Prefer short, verb-led: pdf-editor, db-migrate, brand-styling.
  • Namespace by tool when it helps triggering: gh-address-comments.
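The naming rules above translate into a short validator. This is a sketch only: the pattern additionally rejects leading, trailing, and doubled hyphens, which is an assumption stricter than the stated rule, and `is_valid_skill_name` is a hypothetical helper name.

```python
import re

# Lowercase letters and digits, separated by single hyphens (assumed stricter form).
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_skill_name(name):
    """True if the name uses only lowercase letters, digits, hyphens and is under 64 chars."""
    return len(name) < 64 and NAME_RE.fullmatch(name) is not None


for candidate in ["pdf-editor", "gh-address-comments", "PDF_Editor", "-pdf"]:
    print(candidate, is_valid_skill_name(candidate))
```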

Step 1: Understand with Concrete Examples

Ask the user (one question at a time):

  1. "What should this skill do?"
  2. "Can you give 2-3 examples of how it would be used?"
  3. "What would a user say that should trigger this skill?"
  4. "Are there edge cases or error scenarios to handle?"

Step 2: Plan Resources

For each example from Step 1, identify reusable components:

| If the task... | Add this resource |
|---|---|
| Rewrites the same code each time | scripts/transform.py |
| Requires domain schemas/docs | references/schema.md |
| Uses templates or boilerplate | assets/template/ |

Step 3: Initialize

scripts/init_skill.py <skill-name> --path <output-dir> [--resources scripts,references,assets] [--examples]
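The source of `scripts/init_skill.py` is not shown here, so the following is only a rough sketch of what a scaffold for the layout in "Skill Anatomy" might look like. The function name `scaffold_skill`, the template text, and the parameters are all assumptions for illustration.

```python
from pathlib import Path

# Hypothetical starter template; the real script's template is not shown in this document.
TEMPLATE = """---
name: {name}
description: TODO capabilities, trigger scenarios, and keywords
---

## Workflow

## Constraints

## Examples

## Verification

## Resources
"""


def scaffold_skill(name, path, resources=("scripts", "references", "assets")):
    """Create <path>/<name>/ with a SKILL.md stub and the requested resource folders."""
    root = Path(path) / name
    root.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing skill
    (root / "SKILL.md").write_text(TEMPLATE.format(name=name), encoding="utf-8")
    for sub in resources:
        (root / sub).mkdir()
    return root
```

Calling `scaffold_skill("pdf-editor", "skills", resources=("scripts",))` would create `skills/pdf-editor/SKILL.md` and an empty `skills/pdf-editor/scripts/` folder.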

Step 4: Write the Skill (with Built-in Optimization)

This step creates SKILL.md following all 9 optimization dimensions from the start. Write using imperative form.

4a. Write Frontmatter (D1: Trigger Precision)

The description field is the only content always loaded. It determines whether the skill triggers at all. Issue #9716 (69👍) documents skills ignored because descriptions lacked trigger phrases.

Three-part formula:

# BAD — vague, no triggers
description: Helps with PDF operations

# GOOD — capabilities + trigger scenarios + keywords
description: >
  Create, edit, and analyze PDF documents with form filling,
  text extraction, and page manipulation. Use when: (1) Creating PDFs,
  (2) Extracting text or form data, (3) Rotating/merging pages,
  (4) Converting PDFs to images. Trigger: "pdf", "rotate pdf",
  "merge pdf", "extract text", "fill form"

Only the name and description fields are allowed in frontmatter.
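The "name and description only" rule can be checked without a YAML library by looking at unindented `key:` lines between the `---` delimiters. This is a simplified sketch under that assumption (it does not parse YAML in general), and the helper names are hypothetical.

```python
import re

ALLOWED_KEYS = {"name", "description"}


def frontmatter_keys(text):
    """Return top-level keys found between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Only unindented "key:" lines count; folded description text is indented.
        match = re.match(r"([A-Za-z_][\w-]*):", line)
        if match:
            keys.append(match.group(1))
    return keys


def check_frontmatter(text):
    """List missing required fields and any fields beyond name/description."""
    keys = frontmatter_keys(text)
    problems = [f"missing field: {k}" for k in sorted(ALLOWED_KEYS - set(keys))]
    problems += [f"unexpected field: {k}" for k in keys if k not in ALLOWED_KEYS]
    return problems


bad = "---\nname: x\nversion: 1\n---\nbody\n"
print(check_frontmatter(bad))  # ['missing field: description', 'unexpected field: version']
```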

4b. Write Body Sections (D3: Structure)

Separate content types with Markdown headings:

## Workflow
[Numbered steps — see D4 below]

## Constraints
[Specific, measurable rules — see D5 below]

## Examples
[Concrete user request → agent behavior — see D6 below]

## Verification
[Success criteria and how to check — see D7 below]

## Resources
[Links to references/ — see D9 below]

4c. Write Workflows (D4: Step Decomposition)

Research (CoM, ACL 2025; Watch Every Step, EMNLP 2024) indicates that numbered steps with decision points outperform free-form descriptions.

# BAD — paragraph description
Process the document by extracting key fields and validating them
against the schema, then generating a report.

# GOOD — numbered steps with branching
## Workflow

1. **Extract**: Read document, extract fields in `references/schema.md`
2. **Validate**: Check each field against schema rules
   - If valid → proceed to step 3
   - If invalid → report specific errors, suggest fixes, STOP
3. **Transform**: Run `scripts/transform.py --input <extracted.json>`
4. **Report**: Generate output using `assets/report-template.md`
5. **Verify**: Confirm output contains all required sections
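Whether a workflow section actually uses numbered steps and decision points can be estimated with two regular expressions. This is a heuristic sketch: it assumes steps start at column zero as `1.`, `2.`, ... and branches are indented `- If ...` bullets, as in the GOOD example above; `workflow_stats` is a hypothetical name.

```python
import re

STEP_RE = re.compile(r"^\d+\.\s+\S")                   # "1. Extract ..."
BRANCH_RE = re.compile(r"^\s+-\s+If\b", re.IGNORECASE)  # "   - If valid ..."


def workflow_stats(section):
    """Count numbered steps and indented 'If ...' branches in a workflow section."""
    lines = section.splitlines()
    return {
        "steps": sum(1 for line in lines if STEP_RE.match(line)),
        "branches": sum(1 for line in lines if BRANCH_RE.match(line)),
    }


sample = """1. **Extract**: Read document
2. **Validate**: Check each field
   - If valid -> proceed to step 3
   - If invalid -> report errors, STOP
3. **Transform**: Run script
"""
print(workflow_stats(sample))  # {'steps': 3, 'branches': 2}
```

A paragraph-style description yields zero for both counts, which is a quick signal that D4 needs attention.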

4d. Write Constraints (D5: Positive Framing)

Inverse IFEval (ByteDance, 2025, 1012 test cases) reports that positive constraints are followed significantly more reliably. Constraint Decomposition (2025) reports accuracy rising from 41.2% to 73.8% when constraints are decomposed.

# BAD — negative constraints
Don't write functions longer than 20 lines.
Never modify files outside the project.
Avoid using deprecated APIs.

# GOOD — positive constraints with specifics
Keep every function under 20 lines. Extract helpers for complex logic.
Only modify files within the target directory specified by the user.
Use APIs from the approved list in `references/api-versions.md`.

For safety-critical rules, keep the negative framing but pair it with a positive alternative:

CRITICAL: Never delete files without explicit user confirmation. (safety negative)
Prefer creating new files over modifying existing ones when possible. (positive alternative)
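Rules that open with a negative word are easy to surface for review. The sketch below is a simple heuristic, assuming one rule per line and ASCII apostrophes; it only looks at the first word of each line, so a rule prefixed with a label such as `CRITICAL:` is deliberately not flagged. The name `negative_rules` is hypothetical.

```python
import re

# Matches lines (optionally bulleted) that begin with a negative directive.
NEGATIVE_RE = re.compile(r"^\s*(?:[-*]\s+)?(?:don't|do not|never|avoid)\b", re.IGNORECASE)


def negative_rules(text):
    """Return the lines that start with a negative directive."""
    return [line.strip() for line in text.splitlines() if NEGATIVE_RE.match(line)]


bad = """Don't write functions longer than 20 lines.
Never modify files outside the project.
Avoid using deprecated APIs."""
good = """Keep every function under 20 lines.
Only modify files within the target directory."""

print(len(negative_rules(bad)), len(negative_rules(good)))  # 3 0
```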

4e. Write Examples (D6: Example Design)

Cursor official data: Good/Bad examples make rules ~3x more effective. ACL 2025: 1-shot achieves best flexibility/structure balance.

# BAD — abstract
Use the transform script to convert between formats.

# GOOD — concrete "user request → agent behavior"
## Examples

### User: "Convert this CSV to JSON"
→ 1. Run `scripts/transform.py data.csv --output data.json`
→ 2. Verify JSON has keys: `name`, `date`, `amount`
→ 3. Report: "Converted 150 records from CSV to JSON"

### User: "The output has wrong date format"
→ 1. Check current date format in output
→ 2. Add `--date-format ISO8601` flag
→ 3. Re-run and verify dates match `YYYY-MM-DD`

Include 1-3 examples in SKILL.md. For more, create references/examples.md.

4f. Write Verification (D7: Verification)

Issue #42796 (1781👍): Analysis of 6,852 sessions shows agent Read:Edit ratio dropped 70%, causing "rush to completion" errors. Explicit verification steps prevent this.

## Verification

After completing the workflow:
1. **Output exists**: Confirm the output file was created
2. **Schema valid**: Run `scripts/validate.py output.json`
3. **Content complete**: Output has all required sections
4. **No regressions**: Run `npm test`

If any check fails → diagnose, fix, re-verify.
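The verification list above maps naturally onto a list of named checks run in order. This is a minimal sketch with a hypothetical `run_checks` helper; the sample checks work on an in-memory JSON string and stand in for the real file and script checks.

```python
import json


def run_checks(checks):
    """Run (name, callable) checks in order and return the names that failed."""
    failed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that raises counts as a failure
        if not ok:
            failed.append(name)
    return failed


# Illustrative data: a stand-in for the contents of output.json.
output = '{"name": "Ada", "date": "2024-01-31", "amount": 10}'
checks = [
    ("parses as JSON", lambda: json.loads(output) is not None),
    ("has required keys", lambda: {"name", "date", "amount"} <= set(json.loads(output))),
]
print(run_checks(checks))  # []
```

An empty result means every check passed; a non-empty list names exactly what to diagnose before re-verifying.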

4g. Calibrate Freedom (D8: Freedom Calibration)

Match specificity to task fragility (NeurIPS 2024: task type determines optimal approach):

| Task Fragility | Freedom Level | How to Write |
|---|---|---|
| Fragile (deployment, DB migration, security) | Low: specific scripts + numbered steps | "Run scripts/backup.sh then scripts/migrate.py --dry-run" |
| Moderate (API calls, code generation) | Medium: preferred patterns + alternatives | "Prefer pdfplumber. For encrypted PDFs, use PyMuPDF instead." |
| Open (writing, brainstorming, design) | High: principles + examples | "Match brand voice in references/brand-voice.md. Target 500-800 words." |

4h. Manage Context (D2 + D9: Efficiency + Progressive Loading)

D2 — Token economy: Every line must earn its cost. Delete what the model already knows. One example beats three paragraphs.

D9 — Progressive loading: Keep SKILL.md < 500 lines. Move details to references/ with explicit "when to read" pointers:

## Resources

- **Advanced schemas**: See [SCHEMAS.md](references/schemas.md) when user provides non-standard input
- **Error handling**: See [TROUBLESHOOTING.md](references/troubleshooting.md) when scripts fail
- **API reference**: See [API.md](references/api.md) when working with specific endpoints

Keep references one level deep (no nested references). For files longer than 100 lines, include a table of contents at the top.
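The "when to read" requirement can be spot-checked line by line. The sketch below assumes Markdown-style links into `references/` and treats the word "when" on the same line as evidence of a pointer, which is a rough heuristic; `links_missing_when` is a hypothetical name.

```python
import re

# Markdown links whose target lives under references/.
LINK_RE = re.compile(r"\[[^\]]*\]\((references/[^)\s]+)\)")


def links_missing_when(text):
    """Return reference targets whose line carries no 'when ...' pointer."""
    missing = []
    for line in text.splitlines():
        for target in LINK_RE.findall(line):
            if not re.search(r"\bwhen\b", line, re.IGNORECASE):
                missing.append(target)
    return missing


resources = """- **Error handling**: See [TROUBLESHOOTING.md](references/troubleshooting.md) when scripts fail
- **API reference**: See [API.md](references/api.md)
"""
print(links_missing_when(resources))  # ['references/api.md']
```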

Step 5: Package

scripts/package_skill.py <path/to/skill-folder> [output-dir]

Automatically validates structure, frontmatter, naming, and description quality before packaging.

Step 6: Iterate

  1. Use the skill on real tasks
  2. Notice struggles or inefficiencies
  3. Apply Step 7 (Optimize) to systematically improve

OPTIMIZE Mode: Step 7

Use when an existing skill needs improvement. Read references/skill-optimization.md for detailed before/after examples and references/evidence-database.md for research sources.

7.1 Diagnose

Read the full skill. Answer each diagnostic question:

| ID | Dimension | Diagnostic Question | Impact if Failing |
|---|---|---|---|
| D1 | Description | Seeing ONLY name+description, would an AI trigger this skill correctly? | Skill never triggers or triggers incorrectly |
| D2 | Efficiency | Does every line earn its token cost? | Context bloat degrades all loaded skills |
| D3 | Structure | Are workflows/constraints/examples visually separated? | Instructions blur together, compliance drops |
| D4 | Steps | Are complex processes numbered with decision points? | Agent improvises, misses critical sequence |
| D5 | Positive | Are rules "do X" not "don't do Y"? | Negative constraints poorly followed (IFEval data) |
| D6 | Examples | Is there a concrete "user request → behavior" example? | Agent guesses expected behavior |
| D7 | Verification | Does the skill define how to check output? | Agent "rushes to completion" (Issue #42796) |
| D8 | Freedom | Is specificity matched to task fragility? | Over/under-constrained output |
| D9 | Progressive | Is SKILL.md lean with details in references/? | All content loaded even when not needed |

Run scripts/quick_validate.py for automated hints on D1, D2, D5, D6, D7, D9.

7.2 Prioritize by Skill Type

Focus on the dimensions that matter most for this skill type:

| Skill Type | Fix These First | Why |
|---|---|---|
| Workflow (sequential process) | D4 Steps → D7 Verify → D2 Efficiency | Sequence errors are catastrophic; verification catches them |
| Tool Integration (API/CLI wrapper) | D1 Description → D8 Freedom → D6 Examples | Must trigger on right tool names; right specificity level |
| Knowledge/Reference (schemas, policies) | D9 Progressive → D2 Efficiency → D3 Structure | Massive content needs aggressive context management |
| Creative/Generative (writing, design) | D5 Positive → D8 Freedom → D6 Examples | Needs principles not rigid rules; examples set quality bar |
| Debugging/Analysis (troubleshooting) | D4 Steps → D7 Verify → D5 Positive | Must follow diagnostic sequence; verify fix actually works |
| Document Generation (reports, analysis) | D4 Steps → D3 Structure → D7 Verify | Output structure must be fixed; verification ensures completeness |
| Orchestration (multi-agent, parallel tasks) | D4 Steps → D7 Verify → D6 Examples | Dispatch logic must be precise; sub-agent prompts need templates |
| Meta-Skill (skills about skills) | D3 Structure → D9 Progressive → D5 Positive | Complex logic needs clean structure; must avoid recursive bloat |
| Quality Assurance (review, testing, verification) | D7 Verify → D5 Positive → D4 Steps | Verification is the core function; positive framing prevents over-flagging |

7.3 Apply Dimension Fixes

For each failing dimension, apply the fix. See references/skill-optimization.md for detailed before/after examples.

D1 — Rewrite description with 3-part formula: (1) capabilities, (2) trigger scenarios, (3) keywords.

D2 — Audit every line: delete known knowledge, compress with examples, move details to references/. Target < 500 lines.

D3 — Add visual hierarchy: Markdown ##/### headings for each content type. --- between major blocks.

D4 — Number the steps: Convert paragraphs to numbered lists. Add if/else decision points. Add error handling.

D5 — Flip to positive: Convert "don't X" → "do Y". For safety rules, keep negative but add positive alternative.

D6 — Add examples: 1-3 concrete "User: ... → Agent behavior: ..." examples. Realistic inputs, not placeholders.

D7 — Add verification: Explicit success criteria at workflow end. What to check, how to check, what if it fails.

D8 — Calibrate freedom: Fragile tasks → numbered steps/scripts. Creative tasks → principles/examples.

D9 — Move to references: Anything beyond core workflow → references/. Each link says WHEN to read it.

7.4 Scan for Anti-Patterns

| Anti-Pattern | Symptom | Fix |
|---|---|---|
| Kitchen Sink | SKILL.md > 500 lines | Split into focused skills or references/ |
| Invisible Skill | No trigger phrases in description | Rewrite with Trigger: keywords |
| Vague Guardrails | "Write clean code", "Be helpful" | Replace with specific, measurable rules |
| Negative-Only | Mostly "don't" / "never" rules | Convert to positive framing |
| Missing Exit | No success/failure criteria | Add verification section |
| Flat Wall of Text | No headings or visual breaks | Add ## / ### headings |
| Zombie References | Files exist but unlinked from SKILL.md | Add explicit "See X when Y" pointers |
| Example-Free | No concrete examples | Add 1-3 user request → behavior examples |
| Stateless Multi-Step | Multi-step workflow with no state tracking | Add file existence checks or checkbox tracking |
| LLM-as-State-Oracle | Using LLM to determine workflow state instead of scripts | Replace with deterministic checks (file exists, script output) |
| Missing Script Integration | References to scripts/ but no scripts/ directory | Create the scripts or remove references |
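Some of these anti-patterns can be detected mechanically. As one illustration, the sketch below looks for Zombie References by checking whether each file under `references/` is mentioned anywhere in SKILL.md; the substring match is a deliberately crude assumption, and both function names are hypothetical.

```python
from pathlib import Path


def zombie_references(skill_text, reference_files):
    """Return reference paths that never appear in the SKILL.md text."""
    return sorted(f for f in reference_files if f not in skill_text)


def find_zombies(skill_dir):
    """Disk-based wrapper: compare references/* against SKILL.md in skill_dir."""
    root = Path(skill_dir)
    text = (root / "SKILL.md").read_text(encoding="utf-8")
    files = [
        p.relative_to(root).as_posix()
        for p in (root / "references").glob("*")
        if p.is_file()
    ]
    return zombie_references(text, files)


text = "See references/api.md when working with specific endpoints."
print(zombie_references(text, ["references/api.md", "references/old.md"]))
# ['references/old.md']
```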

7.5 Validate After Optimization

  1. Run scripts/quick_validate.py — structure + optimization hints
  2. Re-read end-to-end — does each section serve a clear purpose?
  3. Count lines — SKILL.md body still < 500?
  4. Check all reference links — valid? Explain when to read?

Reference Files

| File | When to Read |
|---|---|
| references/skill-optimization.md | Optimizing a skill: detailed before/after examples for all 9 dimensions |
| references/evidence-database.md | Checking research evidence: papers, data, community issues |
| references/skill-taxonomy.md | Classifying skill types: 9 types, design templates, user need routing |
| references/state-machine-patterns.md | Adding state management: DAG patterns, checkpoint design, script-driven workflows |

Scripts

| Script | Purpose |
|---|---|
| scripts/init_skill.py | Create new skill directory with optimized template |
| scripts/package_skill.py | Validate + package skill into .skill file |
| scripts/quick_validate.py | Validate structure + check optimization hints |

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-07 10:32 · Security: safe
