← 返回
未分类 中文

Nm Abstract Escalation Governance

Assess whether to escalate models
评估是否升级模型
athola athola 来源
未分类 clawhub v1.9.12 4 版本 99770.1 Key: 无需
★ 0
Stars
📥 434
下载
💾 1
安装
4
版本
#latest

概述

> Night Market Skill — ported from claude-night-market/abstract. For the full experience with agents, hooks, and commands, install the Claude Code plugin.

Table of Contents

Escalation Governance

Overview

Model escalation (haiku→sonnet→opus) trades speed/cost for reasoning capability. This trade-off must be justified.

Core principle: Escalation is for tasks that genuinely require deeper reasoning, not for "maybe a smarter model will figure it out."

The Iron Law

NO ESCALATION WITHOUT INVESTIGATION FIRST

Verification: Run the command with --help flag to verify availability.

Escalation is never a shortcut. If you haven't understood why the current model is insufficient, escalation is premature.

When to Escalate

Legitimate escalation triggers:

TriggerDescriptionExample
-------------------------------
Genuine complexityTask inherently requires nuanced judgmentSecurity policy trade-offs
Reasoning depthMultiple inference steps with uncertaintyArchitecture decisions
Novel patternsNo existing patterns applyFirst-of-kind implementation
High stakesError cost justifies capability investmentProduction deployment
Ambiguity resolutionMultiple valid interpretations need weighingSpec clarification

When NOT to Escalate

Illegitimate escalation triggers:

Anti-PatternWhy It's WrongWhat to Do Instead
---------------------------------------------------
"Maybe smarter model will figure it out"This is thrashingInvestigate root cause
Multiple failed attemptsSuggests wrong approach, not insufficient capabilityQuestion your assumptions
Time pressureUrgency doesn't change task complexitySystematic investigation is faster
Uncertainty without investigationYou haven't tried to understand yetGather evidence first
"Just to be safe"False safety - wastes resourcesAssess actual complexity

Decision Framework

Before escalating, answer these questions:

1. Have I understood the problem?

  • [ ] Can I articulate why the current model is insufficient?
  • [ ] Have I identified what specific reasoning capability is missing?
  • [ ] Is this a capability gap or a knowledge gap?

If knowledge gap: Gather more information, don't escalate.

2. Have I investigated systematically?

  • [ ] Did I read error messages/outputs carefully?
  • [ ] Did I check for similar solved problems?
  • [ ] Did I form and test a hypothesis?

If not investigated: Complete investigation first.

3. Is escalation the right solution?

  • [ ] Would a different approach work at current model level?
  • [ ] Is the task inherently complex, or am I making it complex?
  • [ ] Would breaking the task into smaller pieces help?

If decomposable: Break down, don't escalate.

4. Can I justify the trade-off?

  • [ ] What's the cost (latency, tokens, money) of escalation?
  • [ ] What's the benefit (accuracy, safety, completeness)?
  • [ ] Is the benefit proportional to the cost?

If not proportional: Don't escalate.

Escalation Protocol

When escalation IS justified:

  1. Document the reason - State why current model is insufficient
  2. Specify the scope - What specific subtask needs higher capability?
  3. Define success - How will you know the escalated task succeeded?
  4. Return promptly - Drop back to efficient model after reasoning task

Common Rationalizations

ExcuseReality
-----------------
"This is complex"Complex for whom? Have you tried?
"Better safe than sorry"Safety theater wastes resources
"I tried and failed"How many times? Did you investigate why?
"The user expects quality"Quality comes from process, not model size
"Just this once"Exceptions become habits
"Time is money"Systematic approach is faster than thrashing

Agent Schema

Agents can declare escalation hints in frontmatter:

model: haiku
escalation:
  to: sonnet                 # Suggested escalation target
  hints:                     # Advisory triggers (orchestrator may override)
    - security_sensitive     # Touches auth, secrets, permissions
    - ambiguous_input        # Multiple valid interpretations
    - novel_pattern          # No existing patterns apply
    - high_stakes            # Error would be costly

Verification: Run the command with --help flag to verify availability.

Key points:

  • Hints are advisory, not mandatory
  • Orchestrator has final authority
  • Orchestrator can escalate without hints (broader context)
  • Orchestrator can ignore hints (task is actually simple)

Orchestrator Authority

The orchestrator (typically Opus) makes final escalation decisions:

Can follow hints: When hint matches observed conditions

Can override to escalate: When context demands it (even without hints)

Can override to stay: When task is simpler than hints suggest

Can escalate beyond hint: Go to opus even if hint says sonnet

The orchestrator's judgment, informed by conversation context, supersedes static hints.

Red Flags - STOP and Investigate

If you catch yourself thinking:

  • "Let me try with a better model"
  • "This should be simple but isn't working"
  • "I've tried everything" (but haven't investigated why)
  • "The smarter model will know what to do"
  • "I don't understand why this isn't working"

ALL of these mean: STOP. Investigate first.

Integration with Agent Workflow

**Verification:** Run the command with `--help` flag to verify availability.
Agent starts task at assigned model
├── Task succeeds → Complete
└── Task struggles →
    ├── Investigate systematically
    │   ├── Root cause found → Fix at current model
    │   └── Genuine capability gap → Escalate with justification
    └── Don't investigate → WRONG PATH
        └── "Maybe escalate?" → NO. Investigate first.

Verification: Run the command with --help flag to verify availability.

Quick Reference

SituationAction
-------------------
Task inherently requires nuanced reasoningEscalate
Agent uncertain but hasn't investigatedInvestigate first
Multiple attempts failedQuestion approach, not model
Security/high-stakes decisionEscalate
"Maybe smarter model knows"Never escalate on this basis
Hint fires, task is actually simpleOverride, stay at current model
No hint fires, task is actually complexOverride, escalate

Model Capability Notes

MCP Tool Search (Claude Code 2.1.7+): Haiku models do not support MCP tool search. If a workflow uses many MCP tools (descriptions exceeding 10% of context), those tools load upfront on haiku instead of being deferred. This can consume significant context. Consider escalating to sonnet for MCP-heavy workflows or ensure haiku agents use only native tools (Read, Write, Bash, etc.).

Claude.ai MCP Connectors (Claude Code 2.1.46+): Users with claude.ai connectors configured may have additional MCP tools auto-loaded, increasing the total tool description footprint. This makes it more likely that haiku agents will exceed the 10% tool search threshold. When escalation decisions involve MCP-heavy workflows, factor in claude.ai connector tool count via /mcp.

Effort Controls as Escalation Alternative (Opus 4.6 / Claude Code 2.1.32+): Opus 4.6 introduces adaptive thinking with effort levels (low, medium, high). The max level was removed in 2.1.72 for Opus 4.6, and high became the ceiling on that model. Claude Code 2.1.111 reintroduced max and added xhigh (between high and max) for Opus 4.7 only; on other models xhigh falls back to high. Symbols: ○ (low) ◐ (medium) ● (high) ◉ (xhigh) ★ (max). Use /effort (interactive slider since 2.1.111) or /effort auto to reset. Before escalating between models, consider whether adjusting effort on the current model would suffice:

Instead of...Consider...When
---------------------------------
Haiku → SonnetStay on HaikuTask is still deterministic, just needs more context
Sonnet → OpusOpus@mediumModerate reasoning, not deep architectural analysis
Opus@medium → "maybe try again"Opus@high or "ultrathink"Genuine complexity needing deeper reasoning
Opus 4.7@high → escalateOpus 4.7@xhigh or @maxDeep architectural analysis on Opus 4.7 specifically

Default effort change (2.1.68+): Opus 4.6 now

defaults to medium effort for Max and Team

subscribers. Use /model to change effort level, or

type "ultrathink" in your prompt to enable high effort

for the next turn.

Opus 4/4.1 removed (2.1.68+): Opus 4 and 4.1 are

no longer available on the first-party API. Users with

these models pinned are automatically migrated to

Opus 4.6. No action needed for agents using model

frontmatter, as the migration is transparent.

Sonnet 4.5 → 4.6 migration (2.1.69+): Sonnet 4.5

users on Pro/Max/Team Premium are automatically migrated

to Sonnet 4.6. Agent model frontmatter referencing

Sonnet resolves transparently. The --model flags for

claude-opus-4-0 and claude-opus-4-1 now correctly

resolve to Opus 4.6 instead of deprecated versions.

Effort parameter fix (2.1.70+): Fixed API 400 error

This model does not support the effort parameter when

using custom Bedrock inference profiles or non-standard

Claude model identifiers. Effort controls now work

reliably across all deployment configurations.

Default Opus 4.6 on providers (2.1.73+): Bedrock,

Vertex, and Microsoft Foundry now default to Opus 4.6

(was Opus 4.1). Subagent model: opus/sonnet/haiku

aliases now resolve to the current version on all

providers; previously they were silently downgraded to

older versions (e.g., Opus 4.1 instead of 4.6). This

fix means agent dispatch workflows on third-party

providers now match first-party API behavior.

modelOverrides setting (2.1.73+): Maps model

picker entries to provider-specific IDs (Bedrock

inference profile ARNs, Vertex version names, Foundry

deployment names). Use when routing model selections to

specific inference profiles. See the model optimization

guide for configuration details.

/output-style deprecated (2.1.73+): Use /config

instead. Output style is now fixed at session start for

better prompt caching.

Full model IDs in agent frontmatter (2.1.74+): Agent

model: fields now accept full model IDs (e.g.,

claude-opus-4-6) in addition to aliases (opus,

sonnet, haiku). Previously, full IDs were silently

ignored. Agents now accept the same values as --model.

Effort controls do NOT replace the escalation governance

framework: they provide an additional axis. The Iron Law

still applies: investigate before changing either model

or effort level.

版本历史

共 4 个版本

  • v1.9.12 当前
    2026-06-19 20:01 安全 安全
  • v1.8.6
    2026-06-09 17:57
  • v1.8.5
    2026-05-09 16:38 安全 安全
  • v1.8.4
    2026-05-07 06:57 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-agent

self-improving agent

pskoett
捕获经验教训、错误及修正内容,以实现持续改进。适用于以下场景:(1)命令或操作意外失败;(2)用户纠正Claude(如“不,那不对……”“实际上……”);(3)用户请求的功能不存在;(4)外部API或工具出现故障;(5)Claude发现自身
★ 4,097 📥 824,093
dev-programming

Nm Parseltongue Python Performance

athola
分析 Python 代码的性能瓶颈和内存问题
★ 0 📥 735
ai-agent

Self-Improving + Proactive Agent

ivangdavila
自我反思+自我批评+自我学习+自组织记忆。智能体评估自身工作、发现错误并持续改进。
★ 1,394 📥 322,237