
HLE Benchmark Evolver

Runs HLE-oriented benchmark reward ingestion and curriculum generation for capability-evolver. Use when the user asks to optimize Humanity's Last Exam score,...
wanng-ide
Data analysis · clawhub · v1.0.0 · 1 version · 99921.3 · Key: not required
★ 1 star · 📥 1,249 downloads · 💾 24 installs · #latest

Overview

HLE Benchmark Evolver

This skill operationalizes HLE score-driven evolution for OpenClaw.

When to Use

  • The user asks to improve an HLE score (for example, a target of >= 60%).
  • The user provides question-level benchmark output and wants it converted to a reward.
  • The user wants an easy-first curriculum queue and the next questions to focus on.
  • The user asks for an immediate snapshot of the benchmark result.

Inputs

  • Benchmark report JSON path (--report=/abs/path/report.json)
  • Optional benchmark ID (defaults to cais/hle)

Workflow

  1. Validate that the report JSON exists and is parseable.
  2. Ingest the report into the capability-evolver benchmark reward state.
  3. Generate curriculum signals:
    • benchmark_*
    • curriculum_stage:*
    • focus_subject:*
    • focus_modality:*
    • question_focus:*
  4. Return a compact result summary for this run.
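Step 1 of the workflow can be sketched as follows. This is a minimal illustration, not the skill's actual code: `loadReport` is a hypothetical helper name, and the report's internal schema is not described here, so the sketch only checks that the file exists and parses as JSON.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of the skill): checks that a report file
// exists and parses as JSON. Returns { ok: true, data } on success or
// { ok: false, error } on failure instead of throwing.
function loadReport(reportPath) {
  if (!fs.existsSync(reportPath)) {
    return { ok: false, error: `report not found: ${reportPath}` };
  }
  try {
    const data = JSON.parse(fs.readFileSync(reportPath, "utf8"));
    return { ok: true, data };
  } catch (err) {
    return { ok: false, error: `report is not valid JSON: ${err.message}` };
  }
}
```

Returning a result object rather than throwing lets a caller print one clear message and stop before any ingestion is attempted.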

Run

node skills/hle-benchmark-evolver/run_result.js --report=/absolute/path/hle_report.json

Full automatic loop (starts an evolution cycle):

node skills/hle-benchmark-evolver/run_pipeline.js --report=/absolute/path/hle_report.json --cycles=1

If your evaluator can be called from a shell, let the pipeline generate the report on each cycle:

node skills/hle-benchmark-evolver/run_pipeline.js \
  --report=/absolute/path/hle_report.json \
  --eval_cmd="python /path/to/eval_hle.py --out {{report}}" \
  --cycles=3 --interval_ms=2000
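The `{{report}}` token in `--eval_cmd` appears to be a placeholder for the report path. How `run_pipeline.js` expands it is not documented here, so the following is only a sketch of one plausible substitution; `fillEvalCommand` is a hypothetical name.

```javascript
// Hypothetical helper (an assumption, not the pipeline's actual code):
// replaces every {{report}} token in an evaluator command template with
// the report path. split/join is used instead of String.prototype.replace
// so that characters such as "$" in the path are inserted literally.
function fillEvalCommand(template, reportPath) {
  return template.split("{{report}}").join(reportPath);
}

const cmd = fillEvalCommand(
  "python /path/to/eval_hle.py --out {{report}}",
  "/absolute/path/hle_report.json"
);
console.log(cmd);
// prints "python /path/to/eval_hle.py --out /absolute/path/hle_report.json"
```

Under this reading, a report path containing spaces would need to be quoted inside the template itself, since the sketch performs plain text substitution and no shell escaping.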

If no --report is provided, the report path defaults to:

skills/capability-evolver/assets/gep/hle_report.template.json
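The `--key=value` flags and the default report path could be handled along these lines. This is a sketch under assumptions: `parseArgs` and `DEFAULT_REPORT` are hypothetical names, and the skill's real argument handling may differ.

```javascript
// Default taken from the text above; the constant name is hypothetical.
const DEFAULT_REPORT =
  "skills/capability-evolver/assets/gep/hle_report.template.json";

// Hypothetical parser for --key=value flags. Only the first "=" separates
// key from value, so a value such as an eval command may itself contain "=".
// All values are kept as strings; numeric flags would need converting.
function parseArgs(argv) {
  const opts = { report: DEFAULT_REPORT };
  for (const arg of argv) {
    const m = /^--([^=]+)=(.*)$/.exec(arg);
    if (m) opts[m[1]] = m[2];
  }
  return opts;
}

const opts = parseArgs(process.argv.slice(2));
console.log(opts.report);
```

Run with no flags, the sketch prints the template path; with `--report=/absolute/path/hle_report.json` it prints that path instead.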

Output Contract

Always print JSON with these fields:

  • benchmark_id
  • run_id
  • accuracy
  • reward
  • trend
  • curriculum_stage
  • queue_size
  • focus_subjects
  • focus_modalities
  • next_questions
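A caller that consumes this output might check the contract before using it. The sketch below assumes only what is listed above: the field names. Field types are not specified, so it tests presence only, and `missingFields` is a hypothetical helper name.

```javascript
// Field names copied from the output contract above.
const REQUIRED_FIELDS = [
  "benchmark_id",
  "run_id",
  "accuracy",
  "reward",
  "trend",
  "curriculum_stage",
  "queue_size",
  "focus_subjects",
  "focus_modalities",
  "next_questions",
];

// Hypothetical helper: returns the names of required fields that are
// absent from a parsed result. A non-object input is treated as missing
// every field.
function missingFields(result) {
  if (result === null || typeof result !== "object") {
    return [...REQUIRED_FIELDS];
  }
  return REQUIRED_FIELDS.filter((name) => !(name in result));
}
```

For example, `missingFields(JSON.parse(stdout))` returns an empty array when the printed JSON satisfies the contract, where `stdout` stands for the captured output of `run_result.js`.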

Notes

  • This skill handles reward/curriculum ingestion. It does not directly solve HLE questions.
  • run_pipeline.js links ingestion, evolve, and solidify into one executable loop.

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 08:05 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
