ModelSense helps users pick the optimal model and effort level for their task.
It does NOT route automatically on every request (use a provider plugin for that).
It's an on-demand advisor: ask it a question, get a clear recommendation with reasoning.
quick / balanced / deep / researchClassify the task across these dimensions:
Cross-reference task domain with relevant benchmarks from data/benchmarks.yaml.
| Benchmark | Best for |
|---|---|
| ----------- | ---------- |
| HumanEval / SWE-bench | Code generation, debugging, engineering |
| GPQA | Graduate-level science & research |
| MATH / AIME | Mathematical reasoning |
| MMLU | General knowledge, multidomain QA |
| Needle-in-Haystack | Long-context retrieval |
| MT-Bench / Arena Elo | Dialogue, writing quality |
| BBH (Big-Bench Hard) | Complex reasoning, multi-step logic |
| Effort | Target quality | Typical model tier |
|---|---|---|
| -------- | --------------- | ------------------- |
quick | Good enough, fast | Haiku / Flash / GLM |
balanced | High quality, reasonable cost | Sonnet / GPT-4o |
deep | Best available, thinking on | Opus / o3 |
research | No cost limit, maximum quality | Opus + thinking=high |
Check the user's available providers:
openclaw models list via exec tool (or read from context)Format:
🎯 Recommended: <model>
⚡ Effort: <level>
📊 Why: <1-2 sentence benchmark-grounded rationale>
🔧 Special: <thinking on? function calling? etc.>
💰 Cost estimate: <rough $/M or relative>
Alternatives:
- <model B> — if you want faster/cheaper
- <model C> — if you want higher quality
Just output the recommendation. Tell user: "Run /model to switch."
If user confirms or says "yes switch" / "apply it":
session_status(model="<provider/model>")
Notify user: "✅ Switched to X for this session. Run /model default to reset."
If user says "just do it with the best model":
sessions_spawn(
task="<original task>",
model="<recommended model>",
thinking="<level>"
)
data/benchmarks.yaml — benchmark definitions, score leaders, task mappingsdata/models.yaml — model catalog (updated via GitHub Actions weekly)User: "I need to write a Solidity audit report"
→ Domain: code + security + long-form
→ Benchmarks: SWE-bench, HumanEval
→ Recommendation: claude-opus-4-6 with thinking=high, effort=deep
User: "Quick summary of this Slack thread"
→ Domain: dialogue, short
→ Recommendation: claude-haiku-4-5 or gemini-flash, effort=quick
User: "Prove this mathematical conjecture"
→ Domain: math, research-grade
→ Benchmarks: MATH, AIME, GPQA
→ Recommendation: o3 or claude-opus-4-6 with thinking=high, effort=research
共 1 个版本