Evaluate skills across 25 criteria using a hybrid automated + manual approach.
python3 scripts/eval-skill.py /path/to/skill
python3 scripts/eval-skill.py /path/to/skill --json # machine-readable
python3 scripts/eval-skill.py /path/to/skill --verbose # show all details
Checks: file structure, frontmatter, description quality, script syntax, dependency audit, credential scan, env var documentation.
Use the rubric at references/rubric.md to score 25 criteria across 8 categories (0–4 each, 100 total). Each criterion has concrete descriptions per score level.
Copy assets/EVAL-TEMPLATE.md to the skill directory as EVAL.md. Fill in automated results + manual scores.
eval-skill.py — get the automated structural score| # | Category | Source Framework | Criteria |
|---|---|---|---|
| --- | ---------- | ----------------- | ---------- |
| 1 | Functional Suitability | ISO 25010 | Completeness, Correctness, Appropriateness |
| 2 | Reliability | ISO 25010 | Fault Tolerance, Error Reporting, Recoverability |
| 3 | Performance / Context | ISO 25010 + Agent | Token Cost, Execution Efficiency |
| 4 | Usability — AI Agent | Shneiderman, Gerhardt-Powals | Learnability, Consistency, Feedback, Error Prevention |
| 5 | Usability — Human | Tognazzini, Norman | Discoverability, Forgiveness |
| 6 | Security | ISO 25010 + OpenSSF | Credentials, Input Validation, Data Safety |
| 7 | Maintainability | ISO 25010 | Modularity, Modifiability, Testability |
| 8 | Agent-Specific | Novel | Trigger Precision, Progressive Disclosure, Composability, Idempotency, Escape Hatches |
| Range | Verdict | Action |
|---|---|---|
| ------- | --------- | -------- |
| 90–100 | Excellent | Publish confidently |
| 80–89 | Good | Publishable, note known issues |
| 70–79 | Acceptable | Fix P0s before publishing |
| 60–69 | Needs Work | Fix P0+P1 before publishing |
| <60 | Not Ready | Significant rework needed |
This evaluator covers security basics (credentials, input validation, data safety) but for thorough security audits of skills under development, consider SkillLens (npx skilllens scan ). It checks for exfiltration, code execution, persistence, privilege bypass, and prompt injection — complementary to the quality focus here.
pip install pyyaml) — for frontmatter parsing in automated checks共 1 个版本