Skill Test

Overview

Test Skills Safely

Two use cases:

Try before commit — Test drive skills before installing
Evaluate before publish — Verify quality before publishing

Key principle: Test in isolation. Never affect the user's environment.

References:

Read sandbox.md — Isolated testing environment
Read compare.md — A/B comparison between skills
Read evaluate.md — Multi-agent quality evaluation

Quick Start

Trial a skill:

sessions_spawn(
  task="Test skill X: Load ONLY its SKILL.md, run [sample task], report quality",
  model="anthropic/claude-haiku"
)

Compare two skills:

Run same task through each (separate sub-agents)
Present outputs side-by-side
Ask: "Which works better? Why?"
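The side-by-side step above could be sketched as follows. This is an illustration only: `side_by_side` is a hypothetical helper, not part of the skill. It assumes the two sub-agent outputs have already been collected as plain strings, since the return value of sessions_spawn is not described here.

```python
from itertools import zip_longest
from textwrap import wrap


def side_by_side(name_a: str, out_a: str, name_b: str, out_b: str,
                 width: int = 36) -> str:
    """Render two sub-agent outputs as two labeled, fixed-width columns."""
    # textwrap.wrap collapses newlines, which is acceptable for a quick
    # visual comparison but loses the original line structure.
    col_a = [name_a, "-" * width] + wrap(out_a, width)
    col_b = [name_b, "-" * width] + wrap(out_b, width)
    rows = zip_longest(col_a, col_b, fillvalue="")
    return "\n".join(f"{a:<{width}} | {b}" for a, b in rows)


if __name__ == "__main__":
    print(side_by_side("skill-a", "Short answer.",
                       "skill-b", "A longer answer with more detail. " * 3))
```

Showing both outputs in one view makes the "Which works better? Why?" question easier to answer than reading two separate reports.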

Test Modes

Trial Mode — Before installing

Spawn sub-agent with ONLY the test skill
Run 2-3 representative tasks
Evaluate: Does it help? Are the instructions clear?
Decision: keep, pass, or try another
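One way to record a trial and reach the keep / pass / try-another decision is sketched below. The `TrialResult` type and the rule inside `decide` are assumptions made for illustration; the skill does not prescribe a scoring rule.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    task: str
    helped: bool              # Did the skill help on this task?
    clear_instructions: bool  # Were its instructions clear?


def decide(results: list[TrialResult]) -> str:
    """Map trial results to 'keep', 'pass', or 'try another'.

    Assumed rule: keep if every task was helped and the instructions were
    clear; pass if no task was helped; otherwise try another skill.
    """
    if not results:
        raise ValueError("run at least one representative task first")
    if all(r.helped and r.clear_instructions for r in results):
        return "keep"
    if not any(r.helped for r in results):
        return "pass"
    return "try another"
```

Whatever rule is used, the result is only a recommendation: installation still waits for the user's explicit approval.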

Evaluation Mode — Before publishing

Spawn specialized reviewers (see evaluate.md)
Check structure, safety, usefulness
Synthesize findings
Recommend improvements
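The "synthesize findings" step might look like the sketch below. The reviewer roles are defined in evaluate.md, which is not reproduced here, so the input format (a list of `(area, note)` pairs) and the `synthesize` helper are assumptions for illustration.

```python
AREAS = ("structure", "safety", "usefulness")


def synthesize(findings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (area, note) pairs from reviewers, dropping duplicate notes."""
    grouped: dict[str, list[str]] = {area: [] for area in AREAS}
    for area, note in findings:
        if area not in grouped:
            raise ValueError(f"unknown review area: {area}")
        if note not in grouped[area]:
            grouped[area].append(note)
    return grouped
```

Grouping by area keeps overlapping reviewer comments from being reported twice and makes an empty area (nothing to flag) visible at a glance.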

Sandbox Isolation

⚠️ Never load a test skill into your main context.

Sub-agent approach (recommended):

sessions_spawn(
  task="You have ONE skill loaded: [skill content]. Test by doing: [task]",
  model="anthropic/claude-haiku"
)

Complete isolation — main session unaffected
Natural cleanup — sub-agent terminates, done
Cheap testing — use Haiku
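The task string for the sub-agent can be prepared with a small helper like the one below. It is a sketch under assumptions: `load_skill_md` and `build_sandbox_task` are hypothetical names, and the layout (a skill directory containing SKILL.md) is assumed. Only the wording of the task string comes from the example above.

```python
from pathlib import Path


def load_skill_md(skill_dir: Path) -> str:
    """Read ONLY SKILL.md; auxiliary files are deliberately not loaded."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        raise FileNotFoundError(f"no SKILL.md in {skill_dir}")
    return skill_md.read_text(encoding="utf-8")


def build_sandbox_task(skill_content: str, test_task: str) -> str:
    """Fill the sub-agent task template with one skill and one test task."""
    return (
        f"You have ONE skill loaded: {skill_content.strip()}. "
        f"Test by doing: {test_task}"
    )
```

The returned string would then be passed as the task argument of sessions_spawn, so the skill text reaches only the sub-agent and never the main session.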

What to check:

Does it activate correctly?
Are instructions clear?
Is the token cost reasonable?
Is the output quality acceptable?

Edge Cases

Skill requires credentials: Ask the user for test credentials or skip auth-dependent features.

Skill not found: Verify slug with npx clawhub info before testing.

Test fails midway: The sub-agent terminates cleanly. Review the logs, adjust the test task, and retry.

Skill has many auxiliary files: Load SKILL.md first, reference others only if needed during test.

Test thoroughly. Install only after explicit user approval.

Version History

1 version in total

v1.0.0 (current)

2026-03-28 22:18 Safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)