
Autooptimise

Autonomously optimise any OpenClaw skill using a benchmark-driven experiment loop. Scores skill outputs 0-10 across 4 dimensions, identifies the lowest-scori...
Source: wealthvisionai-source
Uncategorized · clawhub · v0.1.0 · 1 version · 99675.3 · Key: not required
Stars: 0 · Downloads: 307 · Installs: 0 · Versions: 1
Tags: #agents #benchmark driven skill #latest #optimise skills

Overview

autooptimise

Autonomous benchmark-driven skill optimisation for OpenClaw. Inspired by Andrej Karpathy's autoresearch — the same modify → test → score → keep/discard loop, applied to agent skill quality instead of GPU training.

Trigger Phrases

  • "optimise my weather skill"
  • "run autooptimise on [skill-name]"
  • "benchmark my [skill-name] skill"
  • "improve my skill overnight"

Key Files

| File | Purpose |
| --- | --- |
| benchmark/tasks.json | Test task suite (prompts + expected qualities) |
| benchmark/scorer.md | LLM judge scoring rubric |
| runner/run_experiment.md | Autonomous loop instructions (load this next) |
| runner/experiment_log.md | Auto-created run log (gitignored) |

How to Run

  1. Read runner/run_experiment.md — it contains the full loop instructions
  2. Confirm the target skill with the user if not specified
  3. Execute the loop (max 3 iterations)
  4. Present proposed changes for human approval — never auto-apply
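
The steps above can be sketched as a small loop. This is only an illustration of the modify → test → score → keep/discard pattern with the 3-iteration cap; the actual instructions live in runner/run_experiment.md, which is not shown here. The names `optimise`, `propose`, `score`, `toy_propose`, and `toy_score` are hypothetical, and the toy functions stand in for the LLM-driven steps so the sketch runs on its own.

```python
from typing import Callable

MAX_ITERATIONS = 3  # iteration cap stated for v0.1


def optimise(
    skill_text: str,
    propose: Callable[[str], str],   # hypothetical: returns a modified skill text
    score: Callable[[str], float],   # hypothetical: benchmark score, 0-10
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[str, float, list[str]]:
    """Run modify -> test -> score -> keep/discard and return the best candidate.

    Nothing is written to disk here: the returned text is only a proposal
    that still needs human approval.
    """
    best_text, best_score = skill_text, score(skill_text)
    log = [f"baseline: score={best_score:.2f}"]
    for i in range(1, max_iterations + 1):
        candidate = propose(best_text)        # modify
        candidate_score = score(candidate)    # test + score
        if candidate_score > best_score:      # keep
            best_text, best_score = candidate, candidate_score
            log.append(f"iteration {i}: kept, score={candidate_score:.2f}")
        else:                                 # discard
            log.append(f"iteration {i}: discarded, score={candidate_score:.2f}")
    return best_text, best_score, log


# Toy stand-ins so the sketch runs without an LLM (assumptions, not the real skill).
def toy_propose(text: str) -> str:
    return text + " Be concise."


def toy_score(text: str) -> float:
    return min(10.0, 5.0 + text.count("Be concise."))


best, final_score, log = optimise("Report the weather.", toy_propose, toy_score)
for entry in log:
    print(entry)
```

Keeping a candidate only when it strictly beats the current best means a run can never end with a lower-scoring skill than it started with.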

Scoring

Use the best available LLM judge model (prefer a strong reasoning model). Score each task 0–10 on:

  • Accuracy — correct answer / correct tool called
  • Conciseness — no padding, no unnecessary text
  • Tool usage — right tool, right parameters
  • Formatting — output matches expected format

Full rubric: benchmark/scorer.md
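
As a rough illustration of how per-task judge scores on these four dimensions might be aggregated, the sketch below averages each dimension across tasks and reports the weakest one. It assumes the judge returns one number per dimension per task; the dictionary keys, the `summarise` and `validate` helpers, and the sample scores are all hypothetical and are not taken from benchmark/scorer.md.

```python
from statistics import mean

# Hypothetical keys for the four dimensions listed above.
DIMENSIONS = ("accuracy", "conciseness", "tool_usage", "formatting")


def validate(task_scores: dict[str, float]) -> None:
    """Reject judge output that misses a dimension or leaves the 0-10 range."""
    missing = set(DIMENSIONS) - set(task_scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in DIMENSIONS:
        if not 0 <= task_scores[dim] <= 10:
            raise ValueError(f"{dim} score out of range: {task_scores[dim]}")


def summarise(results: list[dict[str, float]]) -> tuple[dict[str, float], str]:
    """Return the per-dimension averages and the name of the weakest dimension."""
    for task_scores in results:
        validate(task_scores)
    averages = {dim: mean(r[dim] for r in results) for dim in DIMENSIONS}
    weakest = min(DIMENSIONS, key=lambda dim: averages[dim])
    return averages, weakest


# Made-up judge output for two benchmark tasks.
judge_output = [
    {"accuracy": 9, "conciseness": 4, "tool_usage": 8, "formatting": 7},
    {"accuracy": 8, "conciseness": 6, "tool_usage": 9, "formatting": 7},
]
averages, weakest = summarise(judge_output)
print(averages, weakest)
```

With these sample numbers the weakest dimension is conciseness, so a next iteration would plausibly target padding in the skill's output.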

Safety Rules

  • Never auto-apply changes. Always present a diff and wait for explicit human approval.
  • Never modify benchmark/tasks.json or benchmark/scorer.md during a run.
  • Never exceed 3 iterations per run in v0.1.
  • Log every action to runner/experiment_log.md.
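
One way these rules could be enforced in code is sketched below. It is a minimal, in-memory illustration, not the skill's implementation: the `files` dictionary stands in for the working tree, the path `skills/weather/SKILL.md` is made up, and `fingerprint`, `propose_change`, and `apply_if_approved` are hypothetical helpers. Only standard-library calls (`difflib.unified_diff`, `hashlib.sha256`) are used.

```python
import difflib
import hashlib

# Files that must not change during a run.
PROTECTED = ("benchmark/tasks.json", "benchmark/scorer.md")


def fingerprint(files: dict[str, str]) -> dict[str, str]:
    """Hash the protected files so tampering during a run can be detected."""
    return {p: hashlib.sha256(files[p].encode("utf-8")).hexdigest() for p in PROTECTED}


def propose_change(path: str, old: str, new: str) -> str:
    """Build a unified diff to show the user; this never writes anything."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


def apply_if_approved(files: dict[str, str], path: str, new: str,
                      approved: bool, log: list[str]) -> bool:
    """Apply a change only with explicit approval, and log every decision."""
    if path in PROTECTED:
        log.append(f"refused: {path} is protected")
        return False
    if not approved:
        log.append(f"pending approval: {path}")
        return False
    files[path] = new
    log.append(f"applied: {path}")
    return True


# In-memory stand-in for the repository (all contents are made up).
files = {
    "benchmark/tasks.json": "[]",
    "benchmark/scorer.md": "rubric",
    "skills/weather/SKILL.md": "Report the weather.\n",
}
before = fingerprint(files)
log: list[str] = []

target = "skills/weather/SKILL.md"
proposed = files[target] + "Be concise.\n"
diff = propose_change(target, files[target], proposed)
print(diff)

applied = apply_if_approved(files, target, proposed, approved=False, log=log)
blocked = apply_if_approved(files, "benchmark/scorer.md", "x", approved=True, log=log)
assert fingerprint(files) == before  # benchmark files untouched
```

In a real run the `log` entries would be appended to runner/experiment_log.md rather than kept in a list.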

Version History

1 version in total

  • v0.1.0 (current)
    2026-05-07 12:08 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
