Autooptimise

Autonomously optimise any OpenClaw skill using a benchmark-driven experiment loop. Scores skill outputs 0-10 across 4 dimensions, identifies the lowest-scori...


Overview

autooptimise

Autonomous benchmark-driven skill optimisation for OpenClaw. Inspired by Andrej Karpathy's autoresearch — the same modify → test → score → keep/discard loop, applied to agent skill quality instead of GPU training.
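The modify → test → score → keep/discard loop can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `optimise`, `propose`, and `score` are hypothetical names, and the two toy callables stand in for an LLM that edits the skill and an LLM judge that scores it.

```python
from typing import Callable, List, Tuple


def optimise(
    skill_text: str,
    propose: Callable[[str], str],   # hypothetical: returns a modified skill
    score: Callable[[str], float],   # hypothetical: benchmark score, higher is better
    max_iterations: int = 3,
) -> Tuple[str, float, List[dict]]:
    """Run a modify -> test -> score -> keep/discard loop."""
    best_text = skill_text
    best_score = score(skill_text)  # baseline before any modification
    history: List[dict] = []
    for i in range(1, max_iterations + 1):
        candidate = propose(best_text)        # modify
        candidate_score = score(candidate)    # test + score
        kept = candidate_score > best_score   # keep only strict improvements
        history.append({"iteration": i, "score": candidate_score, "kept": kept})
        if kept:
            best_text, best_score = candidate, candidate_score
        # otherwise the candidate is discarded and the next round
        # starts again from the best version seen so far
    return best_text, best_score, history


# Toy stand-ins: fewer words score higher, and each proposal trims one word.
def toy_score(text: str) -> float:
    return 10.0 - len(text.split())


def toy_propose(text: str) -> str:
    return " ".join(text.split()[:-1])


best, final_score, log = optimise("fetch the weather forecast please", toy_propose, toy_score)
print(best, final_score, log)
```

Each round starts from the best version so far, so a discarded candidate never affects later iterations.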

Trigger Phrases

"optimise my weather skill"
"run autooptimise on [skill-name]"
"benchmark my [skill-name] skill"
"improve my skill overnight"

Key Files

| File | Purpose |
|------|---------|
| `benchmark/tasks.json` | Test task suite (prompts + expected qualities) |
| `benchmark/scorer.md` | LLM judge scoring rubric |
| `runner/run_experiment.md` | Autonomous loop instructions (load this next) |
| `runner/experiment_log.md` | Auto-created run log (gitignored) |
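The schema of `benchmark/tasks.json` is not shown here, only that it holds prompts and expected qualities. As a sketch under that assumption, a task suite could be parsed and validated like this; the field names `id`, `prompt`, and `expected_qualities` are hypothetical and may not match the real file.

```python
import json
from typing import List, TypedDict


class Task(TypedDict):
    # Hypothetical fields; the real tasks.json schema may differ.
    id: str
    prompt: str
    expected_qualities: List[str]


REQUIRED_FIELDS = {"id", "prompt", "expected_qualities"}


def parse_tasks(raw: str) -> List[Task]:
    """Parse a JSON task suite and reject entries with missing fields."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("task suite must be a JSON array")
    tasks: List[Task] = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"task {index} must be a JSON object")
        missing = REQUIRED_FIELDS - set(item)
        if missing:
            raise ValueError(f"task {index} is missing fields: {sorted(missing)}")
        tasks.append(
            Task(
                id=str(item["id"]),
                prompt=str(item["prompt"]),
                expected_qualities=[str(q) for q in item["expected_qualities"]],
            )
        )
    return tasks


example = """
[
  {"id": "weather-1",
   "prompt": "What is the weather in Oslo tomorrow?",
   "expected_qualities": ["calls the weather tool", "answers in one sentence"]}
]
"""
print(parse_tasks(example))
```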

How to Run

1. Read `runner/run_experiment.md`, which contains the full loop instructions
2. Confirm the target skill with the user if not specified
3. Execute the loop (max 3 iterations)
4. Present proposed changes for human approval; never auto-apply

Scoring

Use the best available LLM judge model (prefer a strong reasoning model). Score each task 0–10 on:

Accuracy — correct answer / correct tool called
Conciseness — no padding, no unnecessary text
Tool usage — right tool, right parameters
Formatting — output matches expected format

Full rubric: benchmark/scorer.md
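To make the four-dimension scoring concrete, here is a minimal sketch of how per-task judge scores might be aggregated. The actual rubric lives in `benchmark/scorer.md` and is not reproduced here; `TaskScore`, `dimension_means`, and `weakest_dimension` are hypothetical names, and averaging per dimension and then picking the lowest mean is an assumption about how a weak spot could be identified.

```python
from dataclasses import dataclass, fields
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class TaskScore:
    """One judge verdict for one benchmark task; each field is 0-10."""
    accuracy: float
    conciseness: float
    tool_usage: float
    formatting: float

    def __post_init__(self) -> None:
        # Reject scores outside the 0-10 range used by the rubric.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 10:
                raise ValueError(f"{f.name} must be in 0-10, got {value}")


def dimension_means(scores: List[TaskScore]) -> Dict[str, float]:
    """Average each dimension across all tasks."""
    return {
        f.name: mean(getattr(s, f.name) for s in scores)
        for f in fields(TaskScore)
    }


def weakest_dimension(scores: List[TaskScore]) -> str:
    """Return the dimension with the lowest mean score (assumed selection rule)."""
    means = dimension_means(scores)
    return min(means, key=means.get)


scores = [
    TaskScore(accuracy=9, conciseness=6, tool_usage=8, formatting=7),
    TaskScore(accuracy=7, conciseness=4, tool_usage=8, formatting=9),
]
print(dimension_means(scores))
print(weakest_dimension(scores))  # -> conciseness
```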

Safety Rules

Never auto-apply changes. Always present a diff and wait for explicit human approval.
Never modify benchmark/tasks.json or benchmark/scorer.md during a run.
Never exceed 3 iterations per run in v0.1.
Log every action to runner/experiment_log.md.
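The rules above can be illustrated with a small guard that renders a diff, applies a change only on explicit approval, and records each action. This is a sketch under stated assumptions: the file name `SKILL.md`, the `approve` callback, and the in-memory `log` list (standing in for `runner/experiment_log.md`) are all hypothetical.

```python
import difflib
from typing import Callable, List

MAX_ITERATIONS = 3  # iteration cap stated for v0.1


def check_iteration(iteration: int) -> None:
    """Refuse to run past the iteration cap."""
    if iteration > MAX_ITERATIONS:
        raise RuntimeError(f"iteration {iteration} exceeds cap of {MAX_ITERATIONS}")


def render_diff(original: str, proposed: str, name: str = "SKILL.md") -> str:
    """Build a unified diff for human review; empty string if nothing changed."""
    lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{name} (current)",
        tofile=f"{name} (proposed)",
    )
    return "".join(lines)


def apply_if_approved(
    original: str,
    proposed: str,
    approve: Callable[[str], bool],  # hypothetical: shows the diff, returns the human's answer
    log: List[str],                  # stand-in for the run log file
) -> str:
    """Return the proposed text only when a human explicitly approves the diff."""
    diff = render_diff(original, proposed)
    log.append("presented diff for approval")
    if diff and approve(diff):
        log.append("change approved and applied")
        return proposed
    log.append("change not applied")
    return original


log: List[str] = []
result = apply_if_approved("Be brief.\n", "Be brief and cite the tool.\n", lambda d: False, log)
print(result, log)
```

Passing the approval step in as a callback keeps the default path non-destructive: without a positive answer the original text is returned unchanged.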

Version History

1 version in total

v0.1.0 (current)

2026-05-07 12:08

Security Check

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
