
SharpAgent Self-Evolving

SharpAgent Self-Evolving Loop: an automated 'Think→Do→Learn' cycle that fuses the Self-Improving Agent's reflection mechanism with autoresearch-style experiments.
Author: yezhaowang888-stack
Uncategorized · clawhub · v1.0.0 · 1 version · Key: not required
Tags: #experiment #latest #reflection #self-evolving #sharpagent

Overview

SharpAgent Self-Evolving Loop v1.0.0

> Make your agent smarter with every task.

> The end of one task is the starting point for the next evolution.

> Fuses the two key discoveries from R2: Self-Improving Agent reflection × autoresearch experiment verification.

Core Philosophy

Most agents finish a task and stop. The next time a similar problem comes up, they start from scratch. Nothing accumulates.

SharpAgent's self-evolving loop breaks this cycle:

① Execute task → ② Reflect ("What could be better?")
                   ↓
                ③ Form improvement hypothesis
                   ↓
                ④ Run small experiment to verify
                   ↓
                ⑤ Absorb lesson
                   ↓
            (back to ②)

Every task is an evolution. It doesn't get more expensive with use — it gets more accurate.

Contract

contract:
  name: sharpagent-self-evolving
  version: "1.0.0"
  category: workflow
  trust_level: verified
  reads:
    - Task
    - LearningEntry
    - FiveFactorResult
  writes:
    - LearningEntry
    - ImprovementHypothesis
  preconditions:
    - "A completed task exists to reflect on"
    - "Access to read task output and logs"
  postconditions:
    - "Reflection produces at least 1 improvement hypothesis"
    - "If hypothesis is verifiable, an experiment is designed"
    - "Experiment outcome is recorded as LearningEntry"
  calibration:
    default_mode: professional
    modes_supported: [professional, deep]
  compliance:
    jurisdiction: global
    safety_level: standard
  lifecycle:
    status: active
    publish_as: SharpAgent

Lifecycle: 4-Phase Evolution Loop

 ┌─────────────────────────────────────────────┐
 │                                              │
 │   [1. REFLECT] → [2. HYPOTHESIZE]           │
 │       ↑                        ↓             │
 │   [4. ABSORB]  ←  [3. EXPERIMENT]           │
 │                                              │
 └─────────────────────────────────────────────┘

Phase 1: REFLECT — Analyze

After every task, do a structured reflection.

When:

  • Every task completion (mandatory)
  • Major errors mid-task (force deep mode)
  • Daily summary (optional, merge multiple reflections)

Reflection Framework:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🧬 Task Reflection
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 Task: {task_name}
⏱  Duration: {duration}

🟢 What went right?
- {2-3 specific, quantifiable things}

🟡 What could improve?
- {1-3 things that could be better}

🔴 Clear mistakes?
- {If any: description + root cause + impact}

💡 Lesson learned
- {One-sentence lesson}

🧪 Improvement hypothesis
- {One clear, verifiable hypothesis}

Five-Factor Review Embedding: If the task involves information judgments, run each practice and lesson through the five factors:

🔗 Was my source credible?
🧠 Was my reasoning chain complete?
🌍 Compliance check?
🏳️ Any bias in chosen direction?
🔄 Any other sources to cross-verify?

Phase 2: HYPOTHESIZE — Form Hypothesis

Refine improvement ideas into verifiable hypotheses.

Hypothesis Format:

IF [I change approach] THEN [expected improvement] BECAUSE [reason]
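
The IF/THEN/BECAUSE format can be held in a small record type. The following is a minimal sketch, not part of SharpAgent itself: the `Hypothesis` class, its field names, and the `is_verifiable` heuristic (a hypothesis counts as verifiable when its expected outcome names a number) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical container for one IF/THEN/BECAUSE hypothesis."""
    change: str    # the IF part: what will be done differently
    expected: str  # the THEN part: the expected improvement
    reason: str    # the BECAUSE part: why the change should help

    def render(self) -> str:
        return f"IF {self.change} THEN {self.expected} BECAUSE {self.reason}"


def is_verifiable(h: Hypothesis) -> bool:
    """Assumed heuristic: all three parts are filled in and the
    expected outcome contains at least one digit to measure against."""
    parts_present = all(p.strip() for p in (h.change, h.expected, h.reason))
    return parts_present and bool(re.search(r"\d", h.expected))


# The reason text below is made up for the example.
good = Hypothesis(
    change="I plan an outline for 30s before writing",
    expected="title quality improves 20%",
    reason="structure is settled before wording",
)
vague = Hypothesis(change="I write better next time", expected="results are better", reason="")

print(good.render())
print(is_verifiable(good), is_verifiable(vague))
```

A digit check is deliberately crude; it only filters out hypotheses with no measurable target at all and says nothing about whether the target is realistic.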

Good vs Bad Hypotheses:

| Bad | Good |
| --- | --- |
| "Write better next time" | "If I plan an outline for 30s before writing, title quality improves 20%" |
| "Check more sources" | "If I check 2 independent sources before deciding, cross-validation score improves 15%" |
| "Don't make that error again" | "If I add contract validation before commit, bug rate drops 30%" |

Hypothesis Tiers:

| Tier | Meaning | Action |
| --- | --- | --- |
| 🟢 P0 | Critical improvement, fast (<5 min) | Experiment immediately |
| 🟡 P1 | Valuable, moderate effort (<30 min) | Queue for experiment |
| 🔴 P2 | Long-term, significant investment | Record, experiment when possible |
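
One way to read the tiers as a rule is sketched below. The function name `assign_tier`, its two inputs, and the decision to send non-critical quick wins to P1 are assumptions; the source only gives the time thresholds and the actions.

```python
# Actions copied from the tier table; the mapping itself is an illustrative structure.
TIER_ACTIONS = {
    "P0": "Experiment immediately",
    "P1": "Queue for experiment",
    "P2": "Record, experiment when possible",
}


def assign_tier(estimated_minutes: float, critical: bool) -> str:
    """Assumed rule: P0 needs both criticality and a <5 min experiment,
    P1 covers anything else under 30 min, and the rest is P2."""
    if critical and estimated_minutes < 5:
        return "P0"
    if estimated_minutes < 30:
        return "P1"
    return "P2"


tier = assign_tier(estimated_minutes=3, critical=True)
print(tier, "->", TIER_ACTIONS[tier])
```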

If reflection yields no improvement hypothesis → Check whether there's genuinely no room for improvement. 90% of the time the reflection wasn't honest enough.


Phase 3: EXPERIMENT — Verify

This phase is the core idea borrowed from autoresearch (karpathy/autoresearch ⭐80K).

Don't trust intuition that something is "better" — run a small experiment to prove it.

Experiment Cycle (borrowing autoresearch's 5-minute fixed budget):

experiment:
  budget: 5 min              # Fixed time budget
  hypothesis: "..."          # Hypothesis to verify
  setup:                     # Experiment setup
    - control: old approach
    - treatment: new approach
  measurements:              # Metrics
    - metric_1: "completion time"
    - metric_2: "error rate"
    - metric_3: "quality score"
  result:                    # Fill after experiment
    - metric_1: old=12s new=8s ✅
    - metric_2: old=3% new=1% ✅
    - metric_3: old=7/10 new=8.5/10 ✅
  verdict:                   # Conclusion
    - hypothesis_supported: true/false
    - adopt: yes/no/partial
    - notes: ""
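
The verdict step of the template can be computed rather than judged by eye. This is a sketch under stated assumptions: the `Measurement` class, the `lower_is_better` flag, and the rule "adopt only if every metric improved, partially if some did" are illustrative choices, not something the source specifies. The numbers are the placeholder values from the template above.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical record of one metric under control and treatment."""
    name: str
    control: float         # value under the old approach
    treatment: float       # value under the new approach
    lower_is_better: bool  # direction in which this metric improves

    def improved(self) -> bool:
        if self.lower_is_better:
            return self.treatment < self.control
        return self.treatment > self.control


def verdict(measurements: list[Measurement]) -> dict:
    """Assumed rule: supported only when every metric improved."""
    if not measurements:
        raise ValueError("an experiment needs at least one measurement")
    wins = sum(m.improved() for m in measurements)
    if wins == len(measurements):
        adopt = "yes"
    elif wins == 0:
        adopt = "no"
    else:
        adopt = "partial"
    return {"hypothesis_supported": wins == len(measurements), "adopt": adopt}


result = verdict([
    Measurement("completion time (s)", control=12, treatment=8, lower_is_better=True),
    Measurement("error rate (%)", control=3, treatment=1, lower_is_better=True),
    Measurement("quality score (/10)", control=7, treatment=8.5, lower_is_better=False),
])
print(result)
```

Recording the improvement direction per metric keeps the comparison honest when "better" means lower for some metrics and higher for others.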

Experiment Types:

| Type | Description | Budget |
| --- | --- | --- |
| A/B comparison | Run old vs new, compare results | 5 min |
| Ablation | Remove one step to see impact | 5 min |
| Boundary test | Test stability under edge conditions | 3 min |
| Cross-verification | Different sources/methods for consistency | 5 min |

Experiment Discipline:

  1. Write hypothesis before experiment (prevents post-hoc rationalization)
  2. Control variables — change one thing at a time
  3. Record data, not feelings
  4. Failed experiments are still learning

Phase 4: ABSORB — Archive

Record the result regardless of success or failure. This is the fuel for evolution.

Archive as LearningEntry:

{
  "type": "LearningEntry",
  "category": "evolution",
  "task_ref": "xxx",
  "source": "self-evolving-loop",
  "lesson": "Planning outline first improved title quality 20%",
  "evidence": "A/B experiment: control=7/10, treatment=8.5/10, n=5",
  "adopted": true,
  "applied_count": 0,
  "created_at": "2026-05-11T06:05:00Z",
  "expiry": null
}
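
A record with this shape could be produced as follows. This is a minimal sketch: the field names mirror the JSON above, but the `LearningEntry` dataclass, its defaults, and the `to_json` helper are assumptions about how an implementation might look, not SharpAgent's actual code.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    # ISO 8601 timestamp in UTC with a trailing "Z", matching the example above.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class LearningEntry:
    """Hypothetical in-memory form of the archived record."""
    category: str
    task_ref: str
    lesson: str
    evidence: str
    adopted: bool
    source: str = "self-evolving-loop"
    applied_count: int = 0
    created_at: str = field(default_factory=_utc_now)
    expiry: Optional[str] = None
    type: str = "LearningEntry"

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


entry = LearningEntry(
    category="evolution",
    task_ref="xxx",
    lesson="Planning outline first improved title quality 20%",
    evidence="A/B experiment: control=7/10, treatment=8.5/10, n=5",
    adopted=True,
)
print(entry.to_json())
```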

Category Tags:

| Category | Meaning | Action |
| --- | --- | --- |
| coding-pattern | Code pattern improvement | Auto-apply on next coding task |
| info-source | Information source improvement | Update monitor source priority |
| workflow | Workflow optimization | Update engineering lifecycle gates |
| tool-usage | Tool usage skill | Efficiency sequence |
| domain-knowledge | Domain knowledge accumulation | Long-term memory |

Auto-Propagation:

  • If coding-pattern → write to ~/.agent-templates/
  • If info-source → update monitor config
  • If workflow → check if engineering lifecycle needs update
  • If lesson verified ≥3 times → promote to verified-best-practice
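
The propagation rules above amount to a lookup plus a threshold check. The sketch below only returns the actions as strings; the function name `propagation_actions` and the `verified_count` parameter are assumptions, and actually writing files or updating a monitor config is left out.

```python
PROMOTION_THRESHOLD = 3  # "verified ≥3 times" from the rules above

# Category-specific actions, copied from the rules above.
ROUTES = {
    "coding-pattern": "write to ~/.agent-templates/",
    "info-source": "update monitor config",
    "workflow": "check if engineering lifecycle needs update",
}


def propagation_actions(category: str, verified_count: int) -> list[str]:
    """Return the follow-up actions for a lesson (illustrative only)."""
    actions = []
    if category in ROUTES:
        actions.append(ROUTES[category])
    if verified_count >= PROMOTION_THRESHOLD:
        actions.append("promote to verified-best-practice")
    return actions


print(propagation_actions("workflow", verified_count=3))
```

Categories without a listed rule, such as tool-usage and domain-knowledge, produce no category action here and can still be promoted once verified often enough.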

Full Cycle Example

Task: Analyze an AI paper

① Reflection
✅ Good: Structured extraction of method/results/limitations
🟡 Improve: Abstract always too long, user loses patience
🔴 Error: Forgot to check arXiv for updated version
💡: 150-char abstracts are read more often than 300-char ones

② Hypothesis
IF I limit the abstract to 150 chars THEN user read rate improves 30%
BECAUSE the last analysis (300 chars) was only read halfway

③ Experiment
A/B: Same paper, 300-char version vs 150-char version
Result: 150-char version fully read, 300-char interrupted
Conclusion: ✅ Hypothesis supported, adopt

④ Absorb
Record as workflow lesson, update monitor output template

Edge Cases

| Situation | Action |
| --- | --- |
| Task execution failed | Force deep reflection mode, focus on root cause |
| 3 consecutive experiment failures | Question hypothesis itself, check experiment design |
| Tiny task (rename variable) | Skip loop, but log if recurring error pattern |
| Multiple reflections same day | Merge into daily evolution summary |
| Hypothesis too abstract | Break into verifiable sub-hypotheses |
| User says "no reflection needed" | Skip but log to preference profile |

Quality Gates

| Check | What | Fail action |
| --- | --- | --- |
| Reflection output | At least 1 improvement hypothesis | Reflect again |
| Hypothesis verifiable | Has clear A/B or ablation plan | Require refinement |
| Experiment has data | Numbers not "feelings" | Retest or mark unverifiable |
| Absorb archived | Experiment result saved as LearningEntry | Force archive |
| Self-reference | Don't repeat same hypothesis weekly | Mark as duplicate |
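
The gates can be run as a single pass that collects the fail actions. This is a sketch under assumptions: the function `failed_gate_actions`, its parameters, the accepted plan labels, and the "evidence contains a digit" test for data are all illustrative and not defined by the source.

```python
import re
from typing import Optional


def failed_gate_actions(
    hypotheses: list[str],
    experiment_plan: Optional[str],
    evidence: str,
    archived: bool,
    hypotheses_this_week: set[str],
) -> list[str]:
    """Return the fail action for every gate that did not pass."""
    actions = []
    if not hypotheses:                                  # Reflection output
        actions.append("Reflect again")
    if experiment_plan not in {"A/B", "ablation"}:      # Hypothesis verifiable
        actions.append("Require refinement")
    if not re.search(r"\d", evidence):                  # Experiment has data
        actions.append("Retest or mark unverifiable")
    if not archived:                                    # Absorb archived
        actions.append("Force archive")
    if any(h in hypotheses_this_week for h in hypotheses):  # Self-reference
        actions.append("Mark as duplicate")
    return actions


print(failed_gate_actions(
    hypotheses=["IF I limit the abstract to 150 chars THEN read rate improves 30%"],
    experiment_plan="A/B",
    evidence="felt faster",
    archived=True,
    hypotheses_this_week=set(),
))
```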

Integration Points

Five-Factor Review

  • Phase 1 reflection judgments run through five factors
  • Learning entries carry FiveFactorResult as provenance

Engineering Lifecycle

  • Phase 2 hypothesis = engineering improvement proposal
  • Phase 3 experiment = verification phase
  • Successful experiments auto-update lifecycle best practices

Intelligence Monitor

  • Source evolution: unreliable sources from reflection auto-downranked in monitor

Version History

  • v1.0.0 — Initial release. 4-phase self-evolving loop: Reflect → Hypothesize → Experiment → Absorb.

SharpAgent · MIT-0 · 2026-05-11

Published Versions

1 version in total

  • v1.0.0 (current)
    2026-05-12 05:51 · Safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
