← 返回
未分类 中文

Nm Abstract Subagent Testing

Test skills via TDD in fresh subagents
在新子代理中通过 TDD测试技能
athola athola 来源
未分类 clawhub v1.9.14 7 版本 100000 Key: 无需
★ 0
Stars
📥 524
下载
💾 1
安装
7
版本
#latest

概述

> Night Market Skill — ported from claude-night-market/abstract. For the full experience with agents, hooks, and commands, install the Claude Code plugin.

Subagent Testing - TDD for Skills

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.

Table of Contents

  1. Overview
  2. Why Fresh Instances Matter
  3. Testing Methodology
  4. Quick Start
  5. Detailed Testing Guide
  6. Success Criteria

Overview

Fresh instances prevent priming: Each test uses a new Claude conversation to verify

the skill's impact is measured, not conversation history effects.

Why Fresh Instances Matter

The Priming Problem

Running tests in the same conversation creates bias:

  • Prior context influences responses
  • Skill effects get mixed with conversation history
  • Can't isolate skill's true impact

Fresh Instance Benefits

  • Isolation: Each test starts clean
  • Reproducibility: Consistent baseline state
  • Measurement: Clear before/after comparison
  • Validation: Proves skill effectiveness, not priming

Testing Methodology

Three-phase TDD-style approach:

Phase 1: Baseline Testing (RED)

Test without skill to establish baseline behavior.

Phase 2: With-Skill Testing (GREEN)

Test with skill loaded to measure improvements.

Phase 3: Rationalization Testing (REFACTOR)

Test skill's anti-rationalization guardrails.

Quick Start

# 1. Create baseline tests (without skill)
# Use 5 diverse scenarios
# Document full responses

# 2. Create with-skill tests (fresh instances)
# Load skill explicitly
# Use identical prompts
# Compare to baseline

# 3. Create rationalization tests
# Test anti-rationalization patterns
# Verify guardrails work

Detailed Testing Guide

For complete testing patterns, examples, and templates:

Success Criteria

  • Baseline: Document 5+ diverse baseline scenarios
  • Improvement: ≥50% improvement in skill-related metrics
  • Consistency: Results reproducible across fresh instances
  • Rationalization Defense: Guardrails prevent ≥80% of rationalization attempts

See Also

  • skill-authoring: Creating effective skills
  • bulletproof-skill: Anti-rationalization patterns
  • test-skill: Automated skill testing command

版本历史

共 7 个版本

  • v1.9.14 当前
    2026-07-02 08:38
  • v1.9.13
    2026-06-30 16:37 安全 安全
  • v1.9.12
    2026-06-19 19:45 安全 安全
  • v1.8.6
    2026-06-09 17:38 安全 安全
  • v1.8.5
    2026-05-09 16:30 安全 安全
  • v1.8.4
    2026-05-07 04:05 安全 安全
  • v1.8.3
    2026-05-03 07:49 安全 安全

安全检测

腾讯云安全 (Keen)

队列中

腾讯云安全 (Sanbu)

队列中

🔗 相关推荐

ai-agent

self-improving agent

pskoett
记录自身发现以实现自我改进的技能
★ 4,164 📥 936,080
ai-agent

Agent Browser

rez0
用于 AI 代理的浏览器自动化 CLI。当用户需要与网站交互(包括浏览页面、填写表单、点击按钮、截图等)时使用。
★ 865 📥 344,790
dev-programming

Nm Parseltongue Python Performance

athola
分析 Python 代码的性能瓶颈和内存问题
★ 0 📥 824