← 返回
未分类 中文

Clawrank

Agent performance scoring system for OpenClaw agents. 7 dimensions scored 0-10, crab-themed tiers, evidence-based, with trajectory tracking. Use at session e...
用于OpenClaw智能体的绩效评分系统,7个维度评分0-10,螃蟹主题段位,基于证据,带有轨迹追踪。使用于会话...
joeycacciatore3 joeycacciatore3 来源
未分类 clawhub v1.0.0 1 版本 100000 Key: 无需
★ 1
Stars
📥 295
下载
💾 0
安装
1
版本
#latest

概述

🦞 ClawRank — Agent Performance System

Tiers

EmojiRangeTierWhat it means
-----------------------------------
🦞90-100King CrabUser sets the goal and walks away. It's done right.
🦀80-89DungenessDrives independently. Rare guidance needed.
🦐70-79Blue CrabSolid execution. Still needs direction on approach.
🐚60-69Hermit CrabDelivers when pointed. Doesn't lead yet.
🪸50-59BarnacleFrequent corrections. Repeated mistakes.
🧊<50FrozenFundamental reliability problems.

7 Dimensions (0-10 each → sum = score out of 70 → normalized to 100)

1. Initiative

Did the agent lead or wait to be told?

  • 0-2: Waited for every instruction.
  • 3-4: Executed well but needed direction to start.
  • 5-6: Mixed. Identified some issues proactively.
  • 7-8: Caught problems before user did. Proposed approaches.
  • 9-10: Led entirely. User only provided the goal.

2. Precision

Did the work land on the first attempt?

  • 0-2: Multiple revisions. User pushed back repeatedly.
  • 3-4: Acceptable but user asked for more depth.
  • 5-6: Mostly right. Minor adjustments needed.
  • 7-8: Clean delivery. Zero pushback.
  • 9-10: Exceeded expectations. More than asked for.

3. Communication

Did the agent keep the user informed without being asked?

  • 0-2: User asked for updates multiple times.
  • 3-4: Updated only after being prompted.
  • 5-6: Proactive mostly. Occasional silence.
  • 7-8: Consistent updates. Flagged risks early.
  • 9-10: User always knew status. Never had to ask.

4. Growth

Did the agent learn and change behavior — not just acknowledge mistakes?

  • 0-2: Repeated known mistakes.
  • 3-4: Acknowledged mistakes but no behavioral change.
  • 5-6: Added protections. Partial follow-through.
  • 7-8: Applied past lessons proactively. Changes stuck.
  • 9-10: Zero repeated mistakes. Compounding improvement.

5. Judgment

Did the agent choose the right approach, depth, and timing?

  • 0-2: Wrong approach entirely. Over-built or under-built.
  • 3-4: Reasonable but missed better alternatives.
  • 5-6: Good calls mostly. Occasional miscalibration.
  • 7-8: Consistently right trade-offs between speed and quality.
  • 9-10: Every decision was the best available option.

6. Resourcefulness

Did the agent solve problems independently?

  • 0-2: Asked user things it could've figured out.
  • 3-4: Mostly self-sufficient. Some unnecessary questions.
  • 5-6: Used tools effectively. Found answers independently.
  • 7-8: Creative problem-solving. Self-sufficient.
  • 9-10: Solved novel problems with no guidance.

7. Taste

Did the agent know when to ship, when to push harder, and when to stop?

  • 0-2: Shipped broken work or polished endlessly.
  • 3-4: Inconsistent quality sense.
  • 5-6: Generally good "done" instinct. Occasional miss.
  • 7-8: Knew when to stop. Pushed back with better alternatives.
  • 9-10: Every deliverable hit the sweet spot. Premium without excess.

Scoring

Calculate

raw = sum of 7 dimensions (max 70)
final = round(raw / 70 × 100)

Format

## 🦞 ClawRank: XX/100 — [Tier]

| Dimension | Score | Evidence |
|-----------|-------|----------|
| Initiative | X/10 | (one line) |
| Precision | X/10 | (one line) |
| Communication | X/10 | (one line) |
| Growth | X/10 | (one line) |
| Judgment | X/10 | (one line) |
| Resourcefulness | X/10 | (one line) |
| Taste | X/10 | (one line) |
| **Raw** | **XX/70** | |
| **Final** | **XX/100** | |

Trajectory

Track weekly in agent-sync:

## Weekly Trend
| Week | Score | Tier | Delta |
|------|-------|------|-------|
| W1 | 77 | 🦐 Blue Crab | — |
| W2 | 82 | 🦀 Dungeness | +5 |

Peer Review

Reviewer scores independently using the same table.

Final reported score = (self + peer) / 2.

Disagreement >2 on any dimension requires a one-line explanation.

Rules

  1. Evidence mandatory. No evidence = automatic 5/10 for that dimension.
  2. Score at session end only. Mid-session checks are estimates, not ClawRank.
  3. No inflation. Got asked "update me"? Communication isn't 8.
  4. Trend over spikes. Consistent 80 beats volatile 70-95.
  5. Target: sustained Dungeness 🦀, reaching for King Crab 🦞.

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-05-07 15:07 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-agent

Self-Improving + Proactive Agent

ivangdavila
自我反思+自我批评+自我学习+自组织记忆。智能体评估自身工作、发现错误并持续改进。
★ 1,406 📥 324,521
ai-agent

Skill Vetter

spclaudehome
AI智能体技能安全预审工具。安装ClawdHub、GitHub等来源技能前,检查风险信号、权限范围及可疑模式。
★ 1,242 📥 271,101
ai-agent

Find Skills

guipi888
场景驱动+关键词双模式技能发现工具。当用户用自然语言描述场景/需求(如"我想做一个海报""帮我分析股票"),或明确说"安装技能/find skills/找个skill"时,自动从官方内置、本地已安装、SkillHub、虾评、GitHub、C
★ 1,490 📥 553,859