Agent Cost Strategy

Tiered model selection and cost optimization for multi-agent AI workflows. Use this skill whenever you are choosing a model for a task or spinning up a sub-agent.
djc00p
AI Intelligence · clawhub v1.3.6 · 3 versions · Key: not required · 0 stars · 713 downloads · 8 installs · #latest

Overview


Use the cheapest model that can reliably do the job. Most tasks don't need your most powerful model.

The Three Tiers

| Tier | When to Use | Examples |
|------|-------------|----------|
| Fast/Cheap | Sub-agents, background tasks, automated fixes, simple lookups, short replies | Claude Haiku, GPT-4o-mini, Gemini Flash |
| Mid-tier | Main session dialogue, moderate reasoning, multi-step tasks | Claude Sonnet, GPT-4o, Gemini Pro |
| Powerful | Architecture decisions, deep reviews, hard problems, after cheaper models fail twice | Claude Opus, GPT-4.5, Gemini Ultra |

Task → Tier Routing

Fix failing tests          → Fast/Cheap
Write boilerplate          → Fast/Cheap
Research / search          → Fast/Cheap
Cron / scheduled tasks     → Fast/Cheap (always)
Short replies (hi, ok)     → Fast/Cheap (always)
Background monitoring      → Fast/Cheap (always)
Build new feature          → Mid-tier
Review a PR                → Mid-tier
Main assistant dialogue    → Mid-tier (default)
Architecture decisions     → Powerful
Deep code review           → Powerful
Stuck after 2 attempts     → Escalate one tier up
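
The routing table above can be sketched as a lookup with escalation. This is a minimal illustration, not the skill's actual implementation. Several details are assumptions: the task labels, the mid-tier fallback for unknown tasks, and the choice to escalate again after every further two failures. The rows marked "(always)" are treated as never escalating.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Model tiers, ordered from cheapest to most capable."""
    FAST_CHEAP = 0
    MID = 1
    POWERFUL = 2


# Hypothetical task labels mirroring the Task -> Tier table.
ROUTES = {
    "fix_failing_tests": Tier.FAST_CHEAP,
    "write_boilerplate": Tier.FAST_CHEAP,
    "research": Tier.FAST_CHEAP,
    "cron": Tier.FAST_CHEAP,
    "short_reply": Tier.FAST_CHEAP,
    "background_monitoring": Tier.FAST_CHEAP,
    "build_feature": Tier.MID,
    "review_pr": Tier.MID,
    "main_dialogue": Tier.MID,
    "architecture": Tier.POWERFUL,
    "deep_code_review": Tier.POWERFUL,
}

# Rows marked "(always)" stay on Fast/Cheap regardless of failures (assumption).
ALWAYS_CHEAP = {"cron", "short_reply", "background_monitoring"}


def pick_tier(task: str, failed_attempts: int = 0) -> Tier:
    """Return the tier for a task, escalating one tier per two failed attempts."""
    if task in ALWAYS_CHEAP:
        return Tier.FAST_CHEAP
    base = ROUTES.get(task, Tier.MID)  # assumption: unknown tasks start at mid-tier
    escalation = failed_attempts // 2  # "stuck after 2 attempts" -> one tier up
    return Tier(min(base + escalation, Tier.POWERFUL))
```

For example, `pick_tier("fix_failing_tests", failed_attempts=2)` returns `Tier.MID`, and no task ever escalates past `Tier.POWERFUL`.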

Heartbeat / Cron Model Rule

Always specify the cheapest model for scheduled and background tasks — they run frequently and costs add up fast. Check your platform's config for how to set a model per cron/heartbeat job.

For heartbeat intervals: set them just under your provider's cache TTL to keep the prompt cache warm and pay cache-read rates instead of full input rates. Check your provider's docs for the exact TTL.
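
A small helper makes the heartbeat rule concrete. It assumes the TTL is refreshed every time the cached prefix is read, which Anthropic documents for its default cache. The 300-second TTL, the 30-second safety margin, and the jitter check are illustrative values only, not provider guarantees.

```python
def heartbeat_interval(cache_ttl_s: float, margin_s: float = 30.0) -> float:
    """Pick a heartbeat period just under the provider's cache TTL.

    The margin leaves room for scheduler jitter and request latency, so the
    cached prefix is read again before it expires.
    """
    if cache_ttl_s <= 0:
        raise ValueError("cache TTL must be positive")
    if not 0 < margin_s < cache_ttl_s:
        raise ValueError("margin must be positive and smaller than the TTL")
    return cache_ttl_s - margin_s


def keeps_cache_warm(interval_s: float, cache_ttl_s: float, max_jitter_s: float) -> bool:
    """True if a heartbeat that fires up to max_jitter_s late still beats the TTL."""
    return interval_s + max_jitter_s < cache_ttl_s


# Example value only: check your provider's docs for the real TTL.
TTL_S = 300.0
interval = heartbeat_interval(TTL_S)
```

If your scheduler can drift by more than the margin, widen the margin rather than risk paying a fresh cache write on every heartbeat.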

Communication Pattern Rule

One-word and short conversational messages (hi, thanks, ok, sure, yes, no) should always route to Fast/Cheap. Never burn a mid-tier or powerful model on an acknowledgment.
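
One way to enforce this rule is to normalize each incoming message and check it against a list of acknowledgments before any model is chosen. The list and the tier names below are hypothetical. Exact matching is deliberately conservative, so a short command like "fix the build" still goes to the mid-tier default.

```python
import re

# Hypothetical acknowledgment list; extend it for your users' habits.
SHORT_ACKS = {
    "hi", "hello", "hey", "thanks", "thank you", "thx",
    "ok", "okay", "sure", "yes", "no", "got it",
}


def normalize(message: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    stripped = re.sub(r"[^\w\s]", "", message.lower())
    return " ".join(stripped.split())


def route_message(message: str) -> str:
    """Route acknowledgments to the Fast/Cheap tier; everything else to mid-tier."""
    if normalize(message) in SHORT_ACKS:
        return "fast_cheap"
    return "mid"
```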

Cache Optimization

Prompt caching cuts costs by 50-90% on repeated context. Cache writes cost ~25% more than normal input tokens but pay off after just 1-2 reuses. See references/cache-optimization.md for patterns and break-even math.
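
The break-even claim can be checked with relative prices. This sketch assumes a cache write costs 1.25x the base input rate and a cache read costs 0.10x. Those are Anthropic's published multipliers for its default cache at the time of writing; other providers differ. The sketch also ignores output tokens and the uncached part of each prompt.

```python
CACHE_WRITE_MULT = 1.25  # first request writes the prefix to cache
CACHE_READ_MULT = 0.10   # later requests read it back


def cost_without_cache(uses: int) -> float:
    """Relative cost of sending the same prefix `uses` times at full price."""
    return float(uses)


def cost_with_cache(uses: int) -> float:
    """One cache write, then (uses - 1) cache reads."""
    return CACHE_WRITE_MULT + CACHE_READ_MULT * (uses - 1)


def break_even_uses() -> int:
    """Smallest number of uses at which caching is cheaper than not caching."""
    uses = 1
    while cost_with_cache(uses) >= cost_without_cache(uses):
        uses += 1
    return uses


def savings(uses: int) -> float:
    """Fraction saved versus no caching (negative means caching cost more)."""
    return 1 - cost_with_cache(uses) / cost_without_cache(uses)
```

Under these assumptions `break_even_uses()` returns 2. One write plus one read costs 1.35x versus 2.0x uncached, so the write pays for itself on the first reuse. By 10 uses, `savings(10)` is about 0.785.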

Batch API (Non-Urgent Tasks)

For cron jobs, scheduled analysis, or anything that doesn't need an immediate response, use the Batch API (both Anthropic and OpenAI offer one). It gives a 50% discount in exchange for asynchronous delivery (results within 24 hours). Never use the real-time API for background work that can wait.
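
A scheduler can make the batch-versus-real-time decision mechanically. In this sketch the `Job` fields are hypothetical, and the 50% discount and 24-hour window come from the paragraph above. The rule is simple: if a job can tolerate the full batch window, send it to the Batch API.

```python
from dataclasses import dataclass
from typing import Optional

BATCH_DISCOUNT = 0.50       # batch requests cost half the real-time price
BATCH_WINDOW_HOURS = 24.0   # results may take up to this long


@dataclass
class Job:
    name: str
    deadline_hours: Optional[float]  # None means "no deadline"
    realtime_cost: float             # estimated cost on the real-time API


def choose_api(job: Job) -> str:
    """Use the batch API whenever the job can wait out the full batch window."""
    if job.deadline_hours is None or job.deadline_hours >= BATCH_WINDOW_HOURS:
        return "batch"
    return "realtime"


def expected_cost(job: Job) -> float:
    """Estimated spend after applying the batch discount where it applies."""
    if choose_api(job) == "batch":
        return job.realtime_cost * (1 - BATCH_DISCOUNT)
    return job.realtime_cost
```

A nightly report with no deadline and an estimated real-time cost of 10.0 would be routed to batch at an expected cost of 5.0.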

Sub-Agent Model Rule (Critical)

Always set the model explicitly when spawning sub-agents. Never rely on defaults: the default inherits the parent session's model, which is typically an expensive mid-tier one. In one month where sub-agents defaulted to Sonnet, 96% of costs went to Sonnet, when the split should have been roughly 80/20 Haiku/Sonnet.

sessions_spawn → always include model: "claude-haiku-4-5-20251001" (or equivalent fast-cheap)

Default sub-agent tasks to Haiku for cost efficiency. Override with a stronger model when task complexity or accuracy requirements justify it.
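
The rule can be enforced with a thin wrapper that always fills in the model before a spawn request leaves your code. `build_spawn_params` and its dictionary shape are hypothetical and do not reflect the real `sessions_spawn` schema, so check your platform for the actual parameter names. Only the Haiku model ID comes from the text above.

```python
from typing import Optional

# Fast/cheap default from the rule above; swap in your provider's equivalent.
DEFAULT_SUBAGENT_MODEL = "claude-haiku-4-5-20251001"


def build_spawn_params(task: str, model: Optional[str] = None) -> dict:
    """Build sub-agent spawn parameters with an explicit model.

    The parent session's model is never inherited: callers either pass a
    stronger model on purpose or get the fast/cheap default.
    """
    if not task.strip():
        raise ValueError("task must not be empty")
    return {"task": task, "model": model or DEFAULT_SUBAGENT_MODEL}
```

`build_spawn_params("run the linter")` always carries the Haiku ID. Passing `model=` is the explicit override for tasks whose complexity justifies a stronger model.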

New Session / Machine Cold Start Cost

When starting a fresh session (new machine, new session after /new), the cache is empty. The first few messages will write the entire context (skills, workspace files, memory) to cache at 1.25x the normal input rate. This is unavoidable but temporary — it pays off within 2-3 messages once the cache warms up.

Don't panic if the first few messages on a new machine are expensive. The cache write is a one-time investment that makes the cached portion of every subsequent message ~90% cheaper.

Signs You're Over-Spending

  • Running powerful models on tasks Fast/Cheap can handle
  • No caching on repeated system prompts
  • Heartbeat/cron jobs using the default (expensive) model
  • Sub-agents spawned without an explicit model (the biggest cost leak)
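
These warning signs can be checked automatically against a job inventory. The schema below (`name`, `kind`, `model`, `prompt_caching`) and the substring check for expensive models are hypothetical. Adapt them to however your platform stores cron, heartbeat, and sub-agent settings.

```python
from typing import Dict, List

# Hypothetical markers for mid-tier/powerful model names; extend per provider.
EXPENSIVE_MARKERS = ("sonnet", "opus")
BACKGROUND_KINDS = {"cron", "heartbeat", "subagent"}


def audit_jobs(jobs: List[Dict]) -> List[str]:
    """Return human-readable findings for likely cost leaks."""
    findings = []
    for job in jobs:
        name = job.get("name", "<unnamed>")
        kind = job.get("kind", "unknown")
        model = job.get("model")
        if model is None:
            findings.append(f"{name}: no explicit model, so it inherits the default")
        elif kind in BACKGROUND_KINDS and any(m in model.lower() for m in EXPENSIVE_MARKERS):
            findings.append(f"{name}: {kind} job runs on expensive model {model}")
        if not job.get("prompt_caching", False):
            findings.append(f"{name}: prompt caching is disabled")
    return findings
```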

Session & Cache Management

Keep sessions alive when possible — longer sessions build cache and reduce costs. Only end sessions when context is genuinely full or for privacy reasons.

Anthropic's prompt cache builds from repeated context within a live session. When a session starts fresh, all context (system prompt, workspace files, skills) loads cold — typically 400-600k tokens at full cost. Once cached, subsequent messages cost ~10% of that.

The math:

  • Cold session start: 600k tokens × full price (plus the ~25% cache-write premium) = expensive
  • After the cache warms up: 600k tokens × cache-read price (10% of full) = ~90% cheaper per message
  • Ending a session forfeits the warm cache and forces a full cold reload next time
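
The cost of restarting can be estimated by counting cold and warm messages. This simplified sketch makes several assumptions:

- a fixed 600k-token cached context (the figure used above);
- the same 1.25x write and 0.10x read multipliers;
- an example base price of $3 per million input tokens.

It ignores context growth and output tokens, so treat the numbers as relative, not as a bill.

```python
import math

CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def session_cost(messages: int, restart_every: int,
                 context_tokens: int = 600_000,
                 usd_per_mtok: float = 3.0) -> float:
    """Input cost of `messages` turns when the session restarts every `restart_every` turns.

    Each restart pays a cache write for the whole context; every other turn
    pays a cache read. The default price is an illustrative assumption.
    """
    if messages <= 0 or restart_every <= 0:
        raise ValueError("messages and restart_every must be positive")
    cold = math.ceil(messages / restart_every)
    warm = messages - cold
    full_price = context_tokens / 1_000_000 * usd_per_mtok
    return full_price * (cold * CACHE_WRITE_MULT + warm * CACHE_READ_MULT)
```

Under these assumptions, 100 messages in one session cost about $20.07. Restarting every 10 messages costs about $38.70, nearly double.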

Rules:

  • Let sessions run as long as possible for cost efficiency
  • Only start a new session (/new) when context is genuinely full (>80%) or when you need a fresh privacy boundary
  • Ending sessions should be intentional — for privacy/data-retention reasons, not routine cost management
  • The longer a session runs, the cheaper each message gets

Privacy & Cache Note: Cached context may include workspace files and memory — avoid caching sessions containing secrets or sensitive PII. If a session will cache sensitive data, plan to end it when done.

Delegation rule (keep main agent lean):

  • Main agent (Sonnet/mid-tier) = conversational only: planning, coordination, reviewing results
  • Sub-agents (Haiku/fast-cheap) = all actual doing: file edits, research, builds, data tasks
  • Keeping the main agent conversational reduces its context growth and keeps cache hits high

Version History

3 versions in total

  • v1.3.6 (current)
    2026-05-03 03:51 · Security: safe
  • v1.3.0
    2026-03-29 22:26
  • v1.1.0
    2026-03-20 00:05

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
