Who Wins

Query the PinchBench AI agent leaderboard with real benchmark data. Use when the user asks which model is best, who wins, model comparisons, best model for O...
spideystreet
AI Intelligence · clawhub · v1.0.0 · 1 version · 100000 · Key: not required
Stars: 0 · Downloads: 489 · Installs: 8 · Versions: 1 · #latest

Overview

PinchBench Leaderboard

Fetches and formats the PinchBench leaderboard — AI agent benchmarks for LLMs on standardized OpenClaw coding tasks.

Workflow

1. Determine the query

Map the user's intent to script flags:

| User intent | Flags |
|---|---|
| "Show the leaderboard" / default | --top 10 |
| "Top 5 models" | --top 5 |
| "How does Claude perform?" | --model claude |
| "Cheapest models" | --sort cost --top 10 |
| "Fastest models" | --sort time --top 10 |
| "Compare Gemini and Claude" | Run twice with --model gemini and --model claude, present side by side |
| "Full leaderboard" | --top 50 |
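The mapping above could be sketched as a small keyword matcher. This is only an illustration: the function name `flags_for_intent` and the regular expressions are assumptions, not part of the skill, and a real agent would normally infer the intent itself rather than rely on patterns.

```python
import re
from typing import List


def flags_for_intent(text: str) -> List[List[str]]:
    """Return one flag list per script run needed for the request.

    Hypothetical helper: the patterns only cover the phrasings in the table.
    """
    t = text.lower()

    # A comparison needs one run per model, presented side by side.
    m = re.search(r"compare (\w+) (?:and|to|with) (\w+)", t)
    if m:
        return [["--model", m.group(1)], ["--model", m.group(2)]]

    m = re.search(r"top (\d+)", t)
    if m:
        return [["--top", m.group(1)]]

    if "cheapest" in t:
        return [["--sort", "cost", "--top", "10"]]
    if "fastest" in t:
        return [["--sort", "time", "--top", "10"]]
    if "full" in t:
        return [["--top", "50"]]

    m = re.search(r"how does (\w+) perform", t)
    if m:
        return [["--model", m.group(1)]]

    # Default: top 10 by score.
    return [["--top", "10"]]
```

A comparison request returns two flag lists, which corresponds to running the script twice.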

2. Run the script

{
  "tool": "exec",
  "command": "python3 {baseDir}/scripts/fetch_leaderboard.py --top 10"
}

Available flags:

  • --top N — number of models to show (default: 10)
  • --sort metric — sort by score, cost, time, or runs (default: score)
  • --model filter — filter models containing this string (case-insensitive)
  • --json — output raw JSON for further processing
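As a rough sketch of how these flags combine into one command line, the helper below builds the argument list for a single run. The name `build_command` and its parameters are hypothetical; only the script path and the four flags come from the text above, and `base_dir` stands in for the {baseDir} placeholder.

```python
from typing import List, Optional

# Sort keys listed under "Available flags".
VALID_SORTS = {"score", "cost", "time", "runs"}


def build_command(base_dir: str, top: int = 10, sort: str = "score",
                  model: Optional[str] = None, as_json: bool = False) -> List[str]:
    """Build the argv list for one run of fetch_leaderboard.py (hypothetical helper)."""
    if sort not in VALID_SORTS:
        raise ValueError(f"unsupported sort metric: {sort}")
    cmd = ["python3", f"{base_dir}/scripts/fetch_leaderboard.py", "--top", str(top)]
    if sort != "score":  # score is the documented default, so it can be omitted
        cmd += ["--sort", sort]
    if model:
        cmd += ["--model", model]
    if as_json:
        cmd.append("--json")
    return cmd
```

Passing the result to `subprocess.run(cmd, capture_output=True, text=True)` would be one way to execute it outside the exec tool.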

3. Format the response

Present the output as-is in a code block. Add a brief one-line insight after the table:

  • Highlight the top performer and its score
  • If the user asked about a specific model, comment on its ranking relative to the field
  • If sorting by cost, note the best value (score/cost ratio)
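The one-line insight could be derived from the --json output along these lines. The shape of that JSON is not documented here, so the sketch assumes a list of objects with `model`, `score`, and `cost` keys; those field names, the function names, and the placeholder rows in the usage note are all assumptions.

```python
from typing import Dict, List, Optional


def best_value(rows: List[Dict]) -> Optional[Dict]:
    """Return the row with the highest score/cost ratio, skipping rows without a cost."""
    priced = [r for r in rows if r.get("cost")]
    if not priced:
        return None
    return max(priced, key=lambda r: r["score"] / r["cost"])


def insight(rows: List[Dict], sort: str = "score") -> str:
    """Produce the brief one-line comment that follows the table."""
    if not rows:
        return "No models matched."
    if sort == "cost":
        value = best_value(rows)
        if value is not None:
            ratio = value["score"] / value["cost"]
            return f"Best value: {value['model']} ({ratio:.1f} score per unit cost)."
    top = max(rows, key=lambda r: r["score"])
    return f"Top performer: {top['model']} with a score of {top['score']}."
```

With placeholder rows such as `{"model": "model-a", "score": 90, "cost": 3.0}` and `{"model": "model-b", "score": 80, "cost": 1.0}`, sorting by cost would highlight model-b, since 80 / 1.0 exceeds 90 / 3.0.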

4. Error handling

  • If the script fails with a curl error → report the error, suggest checking network connectivity
  • If the script fails to parse data → the site structure may have changed, inform the user
  • If no models match the filter → say so and suggest a broader search
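These three cases could be handled by a small classifier over the script's result. This is a sketch under stated assumptions: it assumes the script exits non-zero on failure and that a fetch failure mentions "curl" in stderr, neither of which is confirmed by the text above, and `explain_result` is a hypothetical name.

```python
from typing import List, Optional


def explain_result(returncode: int, stderr: str, rows: List) -> Optional[str]:
    """Return a user-facing message for a failed or empty run, or None on success."""
    if returncode != 0:
        # Assumption: fetch failures surface the word "curl" in stderr.
        if "curl" in stderr.lower():
            return (f"Could not fetch the leaderboard ({stderr.strip()}). "
                    "Please check network connectivity.")
        # Any other failure is treated as a parse problem.
        return ("Could not parse the leaderboard data; "
                "the site structure may have changed.")
    if not rows:
        return "No models match that filter. Try a broader search term."
    return None
```

Returning None on success lets the caller fall through to the normal formatting step.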

Examples

| User says | Flags | Expected behavior |
|---|---|---|
| "Show me the PinchBench leaderboard" | --top 10 | Show top 10 by score |
| "Which model is cheapest for OpenClaw?" | --sort cost --top 10 | Show top 10 sorted by cost |
| "How does Claude compare to GPT?" | --model claude then --model gpt | Show both, compare |
| "What's the fastest model on PinchBench?" | --sort time --top 5 | Show top 5 by execution time |

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-30 15:42 · Safe · Safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
