
Agent Estimation

Accurately estimate AI agent work effort using the agent's own operational units (tool-call rounds) instead of human time. Use when asked to estimate, scope,...
Source: hjw21century
clawhub · v0.1.0 · 1 version · uncategorized · Key: not required

Overview

Agent Work Estimation Skill

Problem

AI coding agents systematically overestimate task duration because they anchor to human developer timelines absorbed from training data. A task an agent can complete in 30 minutes gets estimated as "2-3 days" because that's what a human developer forum post would say.

Solution

Force the agent to estimate from its own operational units — tool-call rounds — and only convert to human wallclock time at the very end.

Core Units

| Unit | Definition | Scale |
|------|------------|-------|
| Round | One tool-call cycle: think → write code → execute → verify → fix | ~2-4 min wallclock |
| Module | A functional unit built from multiple rounds until usable | 2-15 rounds |
| Project | All modules + integration + debugging | Sum of modules × integration factor |

A Round is the atomic unit. It maps directly to one iteration of:

  1. Agent reasons about what to do
  2. Agent writes/edits code
  3. Agent runs the code or a test
  4. Agent reads the output
  5. Agent decides if it needs to fix something (if yes → next round)

Estimation Procedure

When asked to estimate a task, follow these steps in order:

Step 1: Decompose into Modules

Break the task into functional modules. Each module should be independently buildable and testable. Ask yourself: "What are the distinct pieces I would build one at a time?"

Step 2: Estimate Rounds per Module

For each module, estimate the number of rounds using these anchors:

| Pattern | Typical Rounds | Examples |
|---------|----------------|----------|
| Boilerplate / known pattern | 1-2 | CRUD endpoint, config file, standard API client |
| Moderate complexity | 3-5 | Custom UI layout, state management, data pipeline |
| Exploratory / under-documented | 5-10 | Unfamiliar framework, platform-specific APIs, complex integrations |
| High uncertainty | 8-15 | Undocumented behavior, novel algorithms, multi-system debugging |

Key calibration rules:

  • If you can generate the code in one shot and it will likely run → 1 round
  • If you'll need to generate, run, see an error, and fix → 2-3 rounds
  • If the library/framework has sparse docs and you'll be guessing → 5+ rounds
  • If it involves platform permissions, OS-level APIs, or environment-specific behavior the user must manually verify → add 2-3 rounds
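The anchors and the manual-verification rule above can be sketched as a small lookup. This is only an illustration: the names `Pattern` and `base_round_range` are hypothetical, and applying "add 2-3 rounds" as +2 on the low end and +3 on the high end is an assumption, not something the skill specifies.

```python
from enum import Enum


class Pattern(Enum):
    # (low, high) round anchors taken from the table above
    BOILERPLATE = (1, 2)
    MODERATE = (3, 5)
    EXPLORATORY = (5, 10)
    HIGH_UNCERTAINTY = (8, 15)


def base_round_range(pattern: Pattern, user_must_verify: bool = False) -> tuple[int, int]:
    """Return a (low, high) base-round range for one module."""
    low, high = pattern.value
    if user_must_verify:
        # Assumed reading of "add 2-3 rounds": +2 on the low end, +3 on the high end.
        low, high = low + 2, high + 3
    return low, high


print(base_round_range(Pattern.BOILERPLATE))     # (1, 2)
print(base_round_range(Pattern.MODERATE, True))  # (5, 8)
```

The range is a starting point; the final base-round figure for a module is still a judgment call within it.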

Step 3: Assign Risk Coefficients

Each module gets a risk coefficient that inflates its round count:

| Risk Level | Coefficient | When to Apply |
|------------|-------------|---------------|
| Low | 1.0 | Mature ecosystem, clear docs, agent has strong pattern match |
| Medium | 1.3 | Minor unknowns, may need 1-2 extra debug rounds |
| High | 1.5 | Sparse docs, platform quirks, integration unknowns |
| Very High | 2.0 | Possible dead ends, may need to change approach entirely |

Step 4: Calculate Totals

Module effective rounds = base rounds × risk coefficient
Project rounds = Σ(module effective rounds) + integration rounds
Integration rounds = 10-20% of base total (for wiring modules together)
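As a worked illustration of these formulas, here is a minimal sketch. The `Module` class, the `project_rounds` function, and the three example modules are hypothetical. It assumes "base total" means the sum of base rounds before risk adjustment, and it uses 15% as a midpoint of the 10-20% integration range.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    base_rounds: int
    risk: float  # coefficient from the risk table: 1.0, 1.3, 1.5, or 2.0

    @property
    def effective_rounds(self) -> float:
        # Module effective rounds = base rounds × risk coefficient
        return self.base_rounds * self.risk


def project_rounds(modules: list[Module], integration_share: float = 0.15) -> float:
    """Sum of effective rounds plus integration rounds (a share of the base total)."""
    base_total = sum(m.base_rounds for m in modules)
    integration = base_total * integration_share
    return sum(m.effective_rounds for m in modules) + integration


modules = [
    Module("API client", 2, 1.0),
    Module("Data pipeline", 4, 1.3),
    Module("Platform permissions", 8, 1.5),
]
# Effective: 2.0 + 5.2 + 12.0 = 19.2; integration: 14 × 0.15 = 2.1
print(round(project_rounds(modules), 1))  # 21.3
```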

Step 5: Convert to Wallclock Time

Only at the very end, convert to human time:

Wallclock time = project rounds × minutes_per_round

Default minutes_per_round = 3 minutes (includes agent generation time + user review time).

Adjust this parameter based on context:

  • Fast iteration, user barely reviews → 2 min/round
  • Complex domain, user carefully reviews each step → 4 min/round
  • User needs to manually test (mobile, hardware, permissions) → 5 min/round
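The conversion is a single multiplication; the sketch below just encodes the default and the three adjustments listed above. The context keys and the function name `wallclock_minutes` are made up for illustration.

```python
# Minutes per round for each context described above (keys are hypothetical labels).
MINUTES_PER_ROUND = {
    "default": 3.0,         # agent generation time + user review time
    "fast_iteration": 2.0,  # user barely reviews
    "careful_review": 4.0,  # complex domain, user reviews each step
    "manual_testing": 5.0,  # mobile, hardware, permissions
}


def wallclock_minutes(project_rounds: float, context: str = "default") -> float:
    """Wallclock time = project rounds × minutes_per_round."""
    return project_rounds * MINUTES_PER_ROUND[context]


print(wallclock_minutes(20))                    # 60.0
print(wallclock_minutes(20, "manual_testing"))  # 100.0
```

The same 20-round project ranges from 40 to 100 minutes depending on how much the user is in the loop, which is why the parameter is chosen last.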

Output Format

Always output the estimation in this exact structure:

### Task: [task name]

#### Module Breakdown

| # | Module | Base Rounds | Risk | Effective Rounds | Notes |
|---|--------|------------|------|-----------------|-------|
| 1 | ...    | N          | 1.x  | M               | why   |
| 2 | ...    | N          | 1.x  | M               | why   |

#### Summary

- **Base rounds**: X
- **Integration**: +Y rounds
- **Risk-adjusted total**: Z rounds
- **Estimated wallclock**: A – B minutes (at N min/round)

#### Biggest Risks
1. [specific risk and what could blow up the estimate]
2. [...]
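If estimates are produced programmatically, the template can be filled by a small helper. This is a sketch under stated assumptions: `render_estimate` and its parameters are hypothetical, the risk-adjusted total is taken to be effective rounds plus integration rounds (as in Step 4), and the wallclock range is passed in by the caller because the skill does not say how the A – B bounds are derived.

```python
def render_estimate(task, modules, integration, wallclock_range, minutes_per_round, risks):
    """Fill the output template.

    modules: list of (name, base_rounds, risk, notes) tuples.
    wallclock_range: (low, high) minutes, supplied by the caller.
    """
    base_total = sum(base for _, base, _, _ in modules)
    effective_total = sum(base * risk for _, base, risk, _ in modules)
    lines = [
        f"### Task: {task}",
        "",
        "#### Module Breakdown",
        "",
        "| # | Module | Base Rounds | Risk | Effective Rounds | Notes |",
        "|---|--------|------------|------|-----------------|-------|",
    ]
    for i, (name, base, risk, notes) in enumerate(modules, start=1):
        lines.append(f"| {i} | {name} | {base} | {risk:.1f} | {base * risk:.1f} | {notes} |")
    low, high = wallclock_range
    lines += [
        "",
        "#### Summary",
        "",
        f"- **Base rounds**: {base_total}",
        f"- **Integration**: +{integration} rounds",
        # Assumption: total = effective rounds + integration rounds
        f"- **Risk-adjusted total**: {effective_total + integration:.1f} rounds",
        f"- **Estimated wallclock**: {low} – {high} minutes (at {minutes_per_round} min/round)",
        "",
        "#### Biggest Risks",
    ]
    lines += [f"{n}. {risk}" for n, risk in enumerate(risks, start=1)]
    return "\n".join(lines)
```

Calling it with a list of module tuples, an integration figure, and one or more risk strings returns the full markdown report as a single string.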

Anti-Patterns to Avoid

These are the failure modes this skill exists to prevent:

  1. Human-time anchoring: "A developer would take about 2 weeks..." → NO. Start from rounds.
  2. Padding by vibes: Adding time "just to be safe" without specific risk rationale → NO. Use risk coefficients.
  3. Confusing complexity with volume: 500 lines of boilerplate ≠ hard. One line of CGEvent API ≠ easy. Estimate by uncertainty, not line count.
  4. Forgetting integration cost: Modules work alone but break together. Always add integration rounds.
  5. Ignoring user-side bottlenecks: If the user must manually grant permissions, restart an app, or test on a device, each round takes longer; the number of rounds does not change. Adjust minutes_per_round; don't add phantom rounds.

Calibration Reference

Example projects with known round counts, covering several project types, are collected in references/calibration-examples.md. Use them to calibrate your estimates.

Eval Prompts

See evals/evals.json for test cases to validate estimation accuracy.

Version History

1 version in total

  • v0.1.0 (current)
    2026-05-12 06:18, marked safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
