Model Tester

Test agents or models against predefined test cases to validate model routing, performance, and output quality. Use when: (1) verifying a specific agent or m...
nandorocker
Developer Tools clawhub v1.0.0 1 version 100000 Key: not required
#agents #debugging #latest #testing

Overview

Use scripts/model_tester.py to run repeatable test prompts and compare requested vs actual model usage from OpenClaw logs.

Run

From the skill directory (or pass absolute paths):

python3 scripts/model_tester.py --agent menial --case extract-emails
python3 scripts/model_tester.py --model openai/gpt-4.1 --case math-reasoning
python3 scripts/model_tester.py --agent chat --model openai/gpt-4.1 --case all --out /tmp/model-test.json

Inputs

  • --agent : Target agent (chat, menial, coder, etc.)
  • --model : Requested model alias/name to test
  • --case : Case from references/test-cases.json or all
  • --timeout : Per-case timeout (default 120)
  • --out : Optional JSON output file

At least one of --agent or --model is required.
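The inputs above could be wired up roughly as follows. This is a minimal sketch, not the actual parser in scripts/model_tester.py: the helper names build_parser and parse_args are hypothetical, treating --case as required is an assumption, and so is reading the timeout as whole seconds.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the five documented flags (sketch only)."""
    parser = argparse.ArgumentParser(prog="model_tester.py")
    parser.add_argument("--agent", help="Target agent (chat, menial, coder, etc.)")
    parser.add_argument("--model", help="Requested model alias/name to test")
    # Assumption: --case is mandatory; the source does not state a default.
    parser.add_argument("--case", required=True,
                        help="Case name from references/test-cases.json, or 'all'")
    # Assumption: the timeout is expressed in seconds.
    parser.add_argument("--timeout", type=int, default=120,
                        help="Per-case timeout (default 120)")
    parser.add_argument("--out", help="Optional JSON output file")
    return parser


def parse_args(argv):
    """Parse argv and enforce 'at least one of --agent or --model'."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.agent or args.model):
        # parser.error prints usage to stderr and exits with status 2.
        parser.error("at least one of --agent or --model is required")
    return args
```

Calling parse_args(["--agent", "menial", "--case", "extract-emails"]) yields a namespace with agent set, model left as None, and timeout at its default of 120.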

What the runner does

  1. Load test cases from references/test-cases.json.
  2. Start openclaw logs --follow --json in parallel.
  3. Run openclaw agent --json with a bounded test prompt (the prompt asks the agent to use a subagent for the task).
  4. Parse response + tailed logs.
  5. Emit machine-readable JSON and a short human summary.
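Steps 2 to 4 above can be sketched as one function that tails logs in the background while a single agent command runs. This is an illustration under stated assumptions, not the script's real code: run_case and the shape of its return value are hypothetical, and both commands are passed in as argument lists because the source does not spell out how the agent and prompt are supplied to openclaw agent.

```python
import json
import subprocess
import tempfile
import time


def run_case(agent_cmd, log_cmd, timeout=120):
    """Tail logs in the background while one agent command runs (sketch)."""
    errors = []
    response = ""
    # A temp file avoids a full pipe buffer stalling the log tail.
    with tempfile.TemporaryFile(mode="w+") as log_file:
        # Step 2: start the log tail first so early events are not missed.
        log_proc = subprocess.Popen(log_cmd, stdout=log_file,
                                    stderr=subprocess.DEVNULL)
        started = time.monotonic()
        try:
            # Step 3: run the agent command with a hard per-case timeout.
            done = subprocess.run(agent_cmd, capture_output=True,
                                  text=True, timeout=timeout)
            response = done.stdout
            if done.returncode != 0:
                errors.append(done.stderr.strip()
                              or f"exit code {done.returncode}")
        except subprocess.TimeoutExpired:
            errors.append(f"timed out after {timeout}s")
        finally:
            runtime = time.monotonic() - started
            log_proc.terminate()
            log_proc.wait()
        # Step 4: read back what the tail wrote, keeping valid JSON lines only.
        log_file.seek(0)
        events = []
        for line in log_file:
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue
    return {
        "status": "error" if errors else "ok",
        "response": response,
        "log_events": events,
        "runtime_seconds": round(runtime, 3),
        "errors": errors,
    }
```

In real use log_cmd would be ["openclaw", "logs", "--follow", "--json"], and agent_cmd would start with ["openclaw", "agent", "--json"] followed by whatever arguments select the agent and carry the prompt.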

Output format

Top-level JSON:

  • tool
  • timestamp
  • agent
  • requested_model
  • results[]

Each result entry returns:

  • test_case
  • agent
  • requested_model
  • actual_model (parsed from logs when available)
  • status (ok/error)
  • result_summary
  • runtime_seconds
  • tokens (when discoverable)
  • errors[]
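A report in this shape can be checked for routing mismatches with a few lines of Python. The sketch below relies only on the field names listed above; the function name routing_mismatches and every value in the sample report are made up for illustration. It compares strings exactly, so a requested alias that resolves to a differently named model would also be flagged.

```python
import json


def routing_mismatches(report: dict) -> list:
    """Return (test_case, requested, actual) for entries whose models differ."""
    mismatches = []
    for entry in report.get("results", []):
        requested = entry.get("requested_model")
        actual = entry.get("actual_model")
        # actual_model is only parsed "when available", so skip unknowns.
        if requested and actual and requested != actual:
            mismatches.append((entry.get("test_case"), requested, actual))
    return mismatches


# Illustrative report; all values here are invented, not real tool output.
sample_report = json.loads("""
{
  "tool": "model-tester",
  "timestamp": "2026-03-31T01:27:00Z",
  "agent": "chat",
  "requested_model": "openai/gpt-4.1",
  "results": [
    {"test_case": "extract-emails", "agent": "chat",
     "requested_model": "openai/gpt-4.1", "actual_model": "openai/gpt-4.1",
     "status": "ok", "result_summary": "3 emails found",
     "runtime_seconds": 4.2, "tokens": 512, "errors": []},
    {"test_case": "math-reasoning", "agent": "chat",
     "requested_model": "openai/gpt-4.1", "actual_model": "openai/gpt-4.1-mini",
     "status": "ok", "result_summary": "answer given",
     "runtime_seconds": 6.8, "tokens": 901, "errors": []}
  ]
}
""")

for case, requested, actual in routing_mismatches(sample_report):
    print(f"{case}: requested {requested}, got {actual}")
```

To check a real run, load the file written via --out with json.load and pass the resulting dict to the same function.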

Privacy & Safety

The tester spawns isolated subagent tasks using predefined test prompts only; no user data is passed to models. It tails OpenClaw logs to extract:

  • which model was actually selected (routing validation)
  • token usage statistics
  • runtime metrics

Log extraction uses regex patterns to find model and token fields. Only structured fields related to the test execution are captured; no personally identifiable information or arbitrary log content is kept.
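As a rough illustration of regex-based extraction, the sketch below pulls a model name and a token count out of JSON log lines and ignores everything else. The field names ("model", "total_tokens", "totalTokens") and the helper extract_fields are assumptions; the real OpenClaw log schema is not given in the source and may differ by version or provider.

```python
import re

# Assumed field names; adjust to whatever the actual log lines contain.
MODEL_RE = re.compile(r'"model"\s*:\s*"([^"]+)"')
TOKENS_RE = re.compile(r'"(?:total_tokens|totalTokens)"\s*:\s*(\d+)')


def extract_fields(log_lines):
    """Keep only the last model name and token count seen in the lines."""
    model = None
    tokens = None
    for line in log_lines:
        model_match = MODEL_RE.search(line)
        if model_match:
            model = model_match.group(1)
        tokens_match = TOKENS_RE.search(line)
        if tokens_match:
            tokens = int(tokens_match.group(1))
    # Nothing else from the log lines is retained.
    return {"actual_model": model, "tokens": tokens}


# Invented log lines for demonstration only.
sample_lines = [
    '{"event": "route", "model": "openai/gpt-4.1"}',
    '{"event": "noise", "message": "unrelated content"}',
    '{"event": "usage", "total_tokens": 742}',
]
print(extract_fields(sample_lines))
```

Because only the two captured groups are returned, unrelated log content such as the "noise" line never reaches the result.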

Notes

  • Model extraction and token extraction are best-effort because log fields may vary by OpenClaw/provider version.
  • If the openclaw config is invalid or the gateway is unavailable, the script returns status=error with stderr details.
  • Edit references/test-cases.json to add custom prompts for your benchmark set.
  • All test cases are generic; no workspace or user data is baked in.

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-31 01:27 (security scans: safe, safe)

Security scan

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
