> Night Market Skill — ported from claude-night-market/abstract. For the full experience with agents, hooks, and commands, install the Claude Code plugin.
Analyze the effectiveness of past skill improvements and
refine the improvement process itself. This is the core
innovation from the Hyperagents paper: not just improving
skills, but improving HOW skills are improved.
This skill should be invoked automatically when:
- A skill's evaluation window ended in pending_rollback_review status. The improvement made things worse, so we need to understand why.
- ImprovementMemory.get_effective_strategies() vs get_failed_strategies() shows effectiveness below 50%, which means the improvement process itself needs refinement.
- PerformanceTracker.get_improvement_trend() returns a negative value for a skill that was recently improved.
- A periodic check is due after every 10 outcomes (tracked via outcome count in ImprovementMemory).
The homeostatic monitor emits
"improvement_triggered": true when a skill crosses the
flag threshold. At that point, before dispatching the
skill-improver, check if metacognitive analysis is
warranted:
from abstract.improvement_memory import ImprovementMemory
from pathlib import Path

memory = ImprovementMemory(
    Path.home() / ".claude/skills/improvement_memory.json"
)

# Check if metacognitive analysis is warranted
effective = memory.get_effective_strategies()
failed = memory.get_failed_strategies()
total = len(effective) + len(failed)

needs_metacognition = False

# Trigger 1: Low effectiveness rate
if total >= 5 and len(effective) / total < 0.5:
    needs_metacognition = True

# Trigger 2: Periodic check (every 10 outcomes)
if total > 0 and total % 10 == 0:
    needs_metacognition = True

# Trigger 3: Recent regression
if failed and failed[-1].get("outcome_type") == "failure":
    needs_metacognition = True

if needs_metacognition:
    # Run metacognitive analysis before next improvement
    pass  # Skill(abstract:metacognitive-self-mod)
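The same three triggers can be packaged as a pure function so they can be tested without the memory file on disk. This is a minimal sketch, not part of the abstract package: `should_run_metacognition` and its parameter names are hypothetical, and it assumes strategies are plain dicts with an `outcome_type` key, as the check above suggests.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence


def should_run_metacognition(
    effective: Sequence[Mapping[str, Any]],
    failed: Sequence[Mapping[str, Any]],
    min_outcomes: int = 5,
    min_rate: float = 0.5,
    period: int = 10,
) -> list[str]:
    """Return the names of the triggers that fired (empty list: no analysis needed)."""
    total = len(effective) + len(failed)
    reasons: list[str] = []

    # Trigger 1: enough data, and fewer than half of the outcomes were effective
    if total >= min_outcomes and len(effective) / total < min_rate:
        reasons.append("low_effectiveness")

    # Trigger 2: periodic check every `period` outcomes
    if total > 0 and total % period == 0:
        reasons.append("periodic_check")

    # Trigger 3: the most recent failed entry is an outright failure
    if failed and failed[-1].get("outcome_type") == "failure":
        reasons.append("recent_regression")

    return reasons
```

Returning the trigger names rather than a bare boolean lets the caller log why the analysis ran.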
Read improvement memory and performance tracker data:
# Check for improvement memory
MEMORY_FILE=~/.claude/skills/improvement_memory.json
TRACKER_FILE=~/.claude/skills/performance_history.json
if [ ! -f "$MEMORY_FILE" ]; then
    echo "No improvement memory found."
    echo "Run skill-improver first to generate improvement data."
    exit 0
fi
Load the JSON files using Python:
from abstract.improvement_memory import ImprovementMemory
from abstract.performance_tracker import PerformanceTracker
from pathlib import Path
memory = ImprovementMemory(Path.home() / ".claude/skills/improvement_memory.json")
tracker = PerformanceTracker(Path.home() / ".claude/skills/performance_history.json")
For each improvement outcome in memory, classify:
- Effective: after_score - before_score >= 0.1
- Neutral: -0.1 < improvement < 0.1
- Regression: after_score < before_score

effective = memory.get_effective_strategies()
failed = memory.get_failed_strategies()

# Calculate effectiveness rate
total = len(effective) + len(failed)
if total > 0:
    effectiveness_rate = len(effective) / total
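The classification rules for this step can be sketched as a small function. It assumes each outcome exposes `before_score` and `after_score` as floats; `classify_outcome` is a hypothetical helper, not an ImprovementMemory method. The stated thresholds overlap for small negative changes, so this sketch treats a drop of 0.1 or more as a regression and anything smaller as neutral. That tie-break is an assumption.

```python
def classify_outcome(before_score: float, after_score: float, threshold: float = 0.1) -> str:
    """Classify one improvement outcome as 'effective', 'neutral', or 'regression'."""
    # Round to absorb float noise such as 0.6 - 0.5 == 0.09999999999999998
    delta = round(after_score - before_score, 6)
    if delta >= threshold:
        return "effective"
    if delta <= -threshold:
        # Assumption: only drops of at least `threshold` count as regressions
        return "regression"
    return "neutral"
```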
Analyze WHAT types of improvements succeed vs fail:
Success patterns to look for:
Failure patterns to look for:
For each pattern found, record as a causal hypothesis:
memory.record_insight(
    skill_ref="_meta",  # Special ref for meta-insights
    category="causal_hypothesis",
    insight="Error handling improvements have 85% success rate",
    evidence=["skill-A v1.1.0: +0.3", "skill-B v2.1.0: +0.15"]
)
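One way to find such patterns is to group outcomes by the kind of change and count how often each kind succeeded. The sketch below assumes every outcome carries an `improvement_type` label alongside its scores. That field name and the `success_rate_by_type` helper are hypothetical, so adapt them to whatever ImprovementMemory actually stores.

```python
from collections import defaultdict


def success_rate_by_type(outcomes, threshold=0.1):
    """Map each improvement type to a (successes, attempts) tuple."""
    counts = defaultdict(lambda: [0, 0])  # type -> [successes, attempts]
    for outcome in outcomes:
        # Round to absorb float noise in the score difference
        delta = round(outcome["after_score"] - outcome["before_score"], 6)
        bucket = counts[outcome["improvement_type"]]
        bucket[1] += 1
        if delta >= threshold:
            bucket[0] += 1
    return {kind: (wins, tries) for kind, (wins, tries) in counts.items()}
```

A type with only one or two attempts says little either way, so it is worth requiring a minimum number of attempts before recording a hypothesis for it.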
Use PerformanceTracker to identify:
for skill_ref in tracker.get_all_skill_refs():
    trend = tracker.get_improvement_trend(skill_ref)
    if trend is not None:
        if trend > 0.05:
            # Sustained improvement - what's working?
            pass
        elif trend < -0.05:
            # Degrading despite improvements - investigate
            pass
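To turn the per-skill trends into the improving, stable, and degrading counts used in the report, the values can be bucketed with the same ±0.05 thresholds. This is a sketch under the assumption that trends have already been collected into a dict of skill ref to float or None; `bucket_trends` is a hypothetical helper.

```python
def bucket_trends(trends, threshold=0.05):
    """Group skill refs by trend direction; None means not enough data."""
    buckets = {"improving": [], "stable": [], "degrading": [], "unknown": []}
    for skill_ref, trend in trends.items():
        if trend is None:
            buckets["unknown"].append(skill_ref)
        elif trend > threshold:
            buckets["improving"].append(skill_ref)
        elif trend < -threshold:
            buckets["degrading"].append(skill_ref)
        else:
            buckets["stable"].append(skill_ref)
    return buckets
```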
Based on the meta-analysis, generate recommendations for
the skill-improver:
- If certain improvement types have higher improvement success rates, weight them higher.
- If error handling improvements succeed far more often than "restructure workflow" at 30%, bias toward error handling.
- If improvements below priority 3.0 consistently fail, raise the minimum threshold.
- [...] in future improvements.
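As an illustration of "weight them higher", per-type success counts could be converted into priority multipliers. The scheme below is one possible choice, not the skill-improver's actual logic: a 50% success rate maps to a neutral weight of 1.0, and types with too few attempts keep the neutral weight. `type_weights` and `min_attempts` are hypothetical names.

```python
def type_weights(rates, min_attempts=3):
    """Map each improvement type to a priority multiplier.

    `rates` maps type -> (successes, attempts).
    """
    weights = {}
    for improvement_type, (successes, attempts) in rates.items():
        if attempts < min_attempts:
            # Too little evidence: leave the priority unchanged
            weights[improvement_type] = 1.0
        else:
            # 0% success -> 0.5x, 50% -> 1.0x, 100% -> 1.5x
            weights[improvement_type] = 0.5 + successes / attempts
    return weights
```

The multiplier would then scale whatever base priority the skill-improver assigns to a candidate change.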
Record all findings back into ImprovementMemory under the
special _meta skill ref:
# Record strategy recommendation
# eh_rate and rs_rate are the per-type success rates (0.0-1.0)
# computed during the pattern analysis step.
memory.record_insight(
    skill_ref="_meta",
    category="strategy_success",
    insight="Recommendation: Prioritize error handling and examples over restructuring",
    evidence=[f"Success rate: error_handling={eh_rate:.0%}, restructure={rs_rate:.0%}"]
)
If significant meta-insights are found, propose concrete
modifications to the skill-improver agent:
Important: Propose changes, do not auto-apply. The user
must approve modifications to the improvement process.
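The propose-then-approve rule can be made explicit in code by keeping proposals as inert records until the user signs off. A minimal sketch follows. The `Proposal` class and `approved_only` helper are hypothetical and only illustrate the gate, not how the skill-improver agent is actually modified.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Proposal:
    target: str                      # what would be modified, e.g. the skill-improver agent
    change: str                      # human-readable description of the modification
    evidence: list[str] = field(default_factory=list)
    approved: bool = False           # only the user flips this to True


def approved_only(proposals: list[Proposal]) -> list[Proposal]:
    """Return only the proposals the user has explicitly approved."""
    return [p for p in proposals if p.approved]
```

Defaulting `approved` to False means a proposal that was never reviewed can never be applied by accident.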
Metacognitive Self-Modification Report
Improvement Data:
Total outcomes analyzed: 15
Effective improvements: 11 (73%)
Regressions: 2 (13%)
Neutral: 2 (13%)
Success Patterns:
1. Error handling additions: 5/6 success (83%)
2. Example additions: 3/3 success (100%)
3. Quiet mode additions: 2/2 success (100%)
Failure Patterns:
1. Workflow restructuring: 1/3 success (33%)
2. Token-heavy additions: 0/1 success (0%)
Performance Trends:
Improving: 8 skills (positive trend)
Stable: 4 skills (no trend)
Degrading: 1 skill (negative trend despite attempts)
Recommendations:
1. Weight error handling improvements 2x in priority
2. Avoid workflow restructuring below priority 8.0
3. Cap additions at 200 tokens to prevent budget overflow
4. Focus next improvement cycle on degrading skill X
Meta-insights stored: 5 new entries in improvement memory
abstract:skill-improver - The agent this skill analyzes and proposes modifications for
abstract:skills-eval - Evaluation framework whose criteria could be refined by meta-insights
abstract:aggregate-logs - Data source for improvement metrics