Modelsense

ModelSense: the right model for the right job. Recommends the best LLM and effort level for any task, based on benchmark data, task analysis, and more.
xinbenlv
Data analysis · clawhub · v0.1.0 (latest) · 1 version · 100000 · Key: not required
★ 0 stars · 📥 509 downloads · 💾 11 installs

Overview

ModelSense Skill

Purpose

ModelSense helps users pick the optimal model and effort level for their task.

It does NOT route automatically on every request (use a provider plugin for that).

It's an on-demand advisor: ask it a question, get a clear recommendation with reasoning.

When to trigger

  • User asks: "which model for X?", "should I use Opus or Sonnet?", "what effort level?"
  • User wants to understand what a benchmark means
  • User wants ModelSense to auto-switch the session model

Inputs to collect (infer from context, ask only if truly unclear)

  1. Task description — what is the user trying to do?
  2. Effort preference (optional): quick / balanced / deep / research
    • If not specified, infer from task urgency/complexity
  3. Auto-switch? — does the user want ModelSense to apply the recommendation automatically?

Recommendation Process

Step 1 — Task Analysis

Classify the task across these dimensions:

  • Domain: code, math, reasoning, writing, dialogue, document analysis, multimodal, research
  • Complexity: simple / moderate / complex / research-grade
  • Output type: text, code, JSON, long-form, structured data
  • Context length needed: short (<8K), medium (8–32K), long (32–100K), very long (100K+)
  • Special requirements: function calling, thinking/CoT, multimodal, speed-sensitive
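The dimensions above can be captured in a small record so that later steps work from one object. This is only a sketch: `TaskProfile` and `context_bucket` are hypothetical names, not part of the skill, and the bucket boundaries treat exactly 32K tokens as "medium" and 100K and above as "very long", which is one reading of the ranges listed.

```python
from dataclasses import dataclass, field


def context_bucket(tokens: int) -> str:
    """Map a token count to the context-length labels from Step 1.

    Assumption: 32K itself counts as "medium", and "long" stops below 100K.
    """
    if tokens < 8_000:
        return "short"
    if tokens <= 32_000:
        return "medium"
    if tokens < 100_000:
        return "long"
    return "very long"


@dataclass
class TaskProfile:
    """Hypothetical container for the Step 1 classification."""
    domain: str            # e.g. "code", "math", "writing"
    complexity: str        # simple / moderate / complex / research-grade
    output_type: str       # text, code, JSON, long-form, structured data
    context_tokens: int    # rough estimate of the context needed
    special: list[str] = field(default_factory=list)

    @property
    def context_length(self) -> str:
        return context_bucket(self.context_tokens)


profile = TaskProfile("code", "complex", "long-form", 40_000, ["thinking/CoT"])
print(profile.context_length)  # long
```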

Step 2 — Benchmark Matching

Cross-reference task domain with relevant benchmarks from data/benchmarks.yaml.

| Benchmark | Best for |
|---|---|
| HumanEval / SWE-bench | Code generation, debugging, engineering |
| GPQA | Graduate-level science & research |
| MATH / AIME | Mathematical reasoning |
| MMLU | General knowledge, multidomain QA |
| Needle-in-Haystack | Long-context retrieval |
| MT-Bench / Arena Elo | Dialogue, writing quality |
| BBH (Big-Bench Hard) | Complex reasoning, multi-step logic |
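A minimal sketch of the lookup, assuming the table is inverted into a domain-to-benchmark mapping. The schema of data/benchmarks.yaml is not shown here, so the mapping is hard-coded; `DOMAIN_TO_BENCHMARKS` and `benchmarks_for` are hypothetical names, and pairing "document analysis" with Needle-in-Haystack is an assumption rather than something the table states.

```python
# Hypothetical inversion of the benchmark table: Step 1 domain -> benchmarks.
# Domains with no matching row (e.g. "multimodal") simply return nothing.
DOMAIN_TO_BENCHMARKS = {
    "code": ["HumanEval", "SWE-bench"],
    "math": ["MATH", "AIME"],
    "research": ["GPQA"],
    "reasoning": ["BBH"],
    "dialogue": ["MT-Bench", "Arena Elo"],
    "writing": ["MT-Bench", "Arena Elo"],
    "document analysis": ["Needle-in-Haystack"],  # assumption
}


def benchmarks_for(domains):
    """Return the benchmarks relevant to the given domains, without duplicates."""
    selected = []
    for domain in domains:
        for benchmark in DOMAIN_TO_BENCHMARKS.get(domain, []):
            if benchmark not in selected:
                selected.append(benchmark)
    return selected


print(benchmarks_for(["math", "research"]))  # ['MATH', 'AIME', 'GPQA']
```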

Step 3 — Effort × Model Matrix

| Effort | Target quality | Typical model tier |
|---|---|---|
| quick | Good enough, fast | Haiku / Flash / GLM |
| balanced | High quality, reasonable cost | Sonnet / GPT-4o |
| deep | Best available, thinking on | Opus / o3 |
| research | No cost limit, maximum quality | Opus + thinking=high |
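The matrix reads naturally as a lookup keyed by effort level. The sketch below copies the rows as plain strings; `EffortRow`, `EFFORT_MATRIX`, and `tier_for` are hypothetical names. There is deliberately no default level, since an unspecified effort is meant to be inferred from the task.

```python
from typing import NamedTuple


class EffortRow(NamedTuple):
    target_quality: str
    model_tier: str


# Rows copied from the effort matrix; the names here are illustrative only.
EFFORT_MATRIX = {
    "quick": EffortRow("Good enough, fast", "Haiku / Flash / GLM"),
    "balanced": EffortRow("High quality, reasonable cost", "Sonnet / GPT-4o"),
    "deep": EffortRow("Best available, thinking on", "Opus / o3"),
    "research": EffortRow("No cost limit, maximum quality", "Opus + thinking=high"),
}


def tier_for(effort: str) -> EffortRow:
    """Look up the matrix row for an effort level (case-insensitive)."""
    key = effort.strip().lower()
    if key not in EFFORT_MATRIX:
        raise ValueError(f"unknown effort level: {effort!r}")
    return EFFORT_MATRIX[key]


print(tier_for("deep").model_tier)  # Opus / o3
```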

Step 4 — Provider Filter

Check the user's available providers:

  • Run: openclaw models list via exec tool (or read from context)
  • Only recommend models the user can actually use
  • Flag when a top pick requires a provider they haven't configured
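One way to apply the filter, sketched under assumptions: the available models are passed in as a list of `provider/model` strings (the output format of `openclaw models list` is not shown here, so nothing is parsed), and the model ids in the usage lines are illustrative. `filter_by_availability` is a hypothetical helper.

```python
def filter_by_availability(ranked, available):
    """Keep only usable models and flag an unavailable top pick.

    ranked:    model ids in preference order, best first ("provider/model").
    available: model ids the user can actually use.
    Returns (usable_models, flag_message_or_None).
    """
    available_set = set(available)
    usable = [model for model in ranked if model in available_set]

    flag = None
    if ranked and ranked[0] not in available_set:
        provider = ranked[0].split("/", 1)[0]
        flag = (
            f"Top pick {ranked[0]} requires provider '{provider}', "
            "which is not configured."
        )
    return usable, flag


# Illustrative ids only; the provider prefixes are assumptions.
usable, flag = filter_by_availability(
    ["openai/o3", "anthropic/claude-opus-4-6"],
    ["anthropic/claude-opus-4-6"],
)
print(usable)
print(flag)
```

Returning the flag separately keeps the recommendation usable while still telling the user what they would gain by configuring the missing provider.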

Step 5 — Output the Recommendation

Format:

🎯 Recommended: <model>
⚡ Effort: <level>
📊 Why: <1-2 sentence benchmark-grounded rationale>
🔧 Special: <thinking on? function calling? etc.>
💰 Cost estimate: <rough $/M or relative>

Alternatives:
  - <model B> — if you want faster/cheaper
  - <model C> — if you want higher quality
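The template above could be filled in by a small helper such as the following. `format_recommendation` and its parameters are hypothetical, and the example values are placeholders based on the Slack-thread example, not benchmark results.

```python
def format_recommendation(model, effort, why, special, cost,
                          faster=None, better=None):
    """Render a recommendation in the Step 5 layout.

    `faster` and `better` are optional alternative models; the
    "Alternatives:" section is omitted when neither is given.
    """
    lines = [
        f"🎯 Recommended: {model}",
        f"⚡ Effort: {effort}",
        f"📊 Why: {why}",
        f"🔧 Special: {special}",
        f"💰 Cost estimate: {cost}",
    ]
    alternatives = []
    if faster:
        alternatives.append(f"  - {faster} \u2014 if you want faster/cheaper")
    if better:
        alternatives.append(f"  - {better} \u2014 if you want higher quality")
    if alternatives:
        lines += ["", "Alternatives:"] + alternatives
    return "\n".join(lines)


print(format_recommendation(
    model="claude-haiku-4-5",
    effort="quick",
    why="Short dialogue summary; a fast tier is good enough.",
    special="none",
    cost="low (relative)",
    faster="gemini-flash",
))
```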

Auto-Switch Behaviors

Option A: Advisory only (default)

Just output the recommendation. Tell user: "Run /model to switch."

Option B: Switch current session

If user confirms or says "yes switch" / "apply it":

session_status(model="<provider/model>")

Notify user: "✅ Switched to X for this session. Run /model default to reset."

Option C: Delegate task to best model

If user says "just do it with the best model":

sessions_spawn(
  task="<original task>",
  model="<recommended model>",
  thinking="<level>"
)
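The three options can be read as a dispatch on the user's reply. The sketch below only builds a description of the tool call to make; it does not invoke `session_status` or `sessions_spawn`. `choose_action` is a hypothetical helper, and the phrase matching is deliberately naive.

```python
def choose_action(reply, model, task, thinking):
    """Pick Option A, B, or C from the user's reply.

    Returns a dict naming the tool to call (or None for advisory only)
    together with the arguments shown in the options above.
    """
    text = reply.strip().lower()

    # Option C: delegate the whole task to the recommended model.
    if "just do it" in text:
        return {
            "tool": "sessions_spawn",
            "args": {"task": task, "model": model, "thinking": thinking},
        }

    # Option B: the user confirmed, so switch the current session.
    if text.startswith("yes") or "apply it" in text:
        return {"tool": "session_status", "args": {"model": model}}

    # Option A (default): advisory only.
    return {"tool": None, "message": "Run /model to switch."}


action = choose_action("yes switch", "<provider/model>", "<original task>", "<level>")
print(action["tool"])  # session_status
```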

Data Files

  • data/benchmarks.yaml — benchmark definitions, score leaders, task mappings
  • data/models.yaml — model catalog (updated via GitHub Actions weekly)

Examples

User: "I need to write a Solidity audit report"

→ Domain: code + security + long-form

→ Benchmarks: SWE-bench, HumanEval

→ Recommendation: claude-opus-4-6 with thinking=high, effort=deep

User: "Quick summary of this Slack thread"

→ Domain: dialogue, short

→ Recommendation: claude-haiku-4-5 or gemini-flash, effort=quick

User: "Prove this mathematical conjecture"

→ Domain: math, research-grade

→ Benchmarks: MATH, AIME, GPQA

→ Recommendation: o3 or claude-opus-4-6 with thinking=high, effort=research

Version history

1 version in total

  • v0.1.0 (current)
    2026-03-30 18:03 · Safe · Safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
