
Supervised Agentic Loop

Self-improving AI agent loop with built-in misalignment detection. An AI agent autonomously runs Brainstorm → Plan → Implement → Review → Evolve cycles, continuously improving its own capabilities and goal alignment.
Author: nefas11 · Source: clawhub · Version: v0.1.2 (1 version) · API key: not required

Overview

supervised-agentic-loop

Self-improving AI agent loop with built-in misalignment detection.

Quick Reference

| What | Details |
|------|---------|
| Loop | Brainstorm → Plan → Implement → Review → Verify → Evolve |
| Agent modifies | One file only (target_file) |
| Metric | Any command that produces a numeric output |
| Safety (SAL) | Git isolation + reputation scoring + 4 verification gates |
| Safety (Monitor) | SYNC blocking + ASYNC LLM review + 10 behavior patterns |
| Persistence | results.tsv + .state/learnings/ + reputation.db + *.jsonl |

Two Packages, One System

sal/                        # Evolve Loop — the brain
├── config.py               # Run configuration
├── evolve_loop.py          # 6-phase loop orchestrator
├── contract.py             # AgentCallable protocol
├── metric_extractor.py     # Named strategies + regex
├── verification.py         # 4 verification gates
├── reputation.py           # EMA scoring + suspension
├── git_isolation.py        # Branch per run, auto-rollback
├── learnings.py            # Persistent pattern detection
├── brainstorm.py           # Hypothesis generation
├── cli.py                  # CLI entrypoint
└── monitor/                # Agent Monitor — the guardian
    ├── sanitizer.py        # Credential redaction (10 patterns)
    ├── behaviors.py        # 10 misalignment behaviors (B001-B010)
    ├── monitor.py          # Two-phase detection engine
    ├── classifier.py       # Severity classification + dedup
    ├── logger.py           # JSONL tool-call logging
    ├── alerter.py          # Telegram alerts (urllib)
    ├── heartbeat.py        # Self-monitoring + canary
    └── dashboard.py        # Command Center data functions

Dependency rule: sal/ imports monitor/, NEVER the reverse. Monitor has zero knowledge of SAL core.
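
The dependency rule above could be checked mechanically. The following is a minimal sketch, not part of the project: `find_reverse_imports` is a hypothetical helper that parses every Python file under the monitor directory and reports absolute imports that reach into `sal` outside `sal.monitor`. Relative imports are not inspected, which is a simplification.

```python
import ast
from pathlib import Path


def find_reverse_imports(monitor_dir: str) -> list[tuple[str, str]]:
    """Return (file, imported name) pairs where monitor code imports SAL core."""
    violations = []
    for path in sorted(Path(monitor_dir).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # import sal.config  ->  "sal.config"
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                # from sal.config import EvolveConfig  ->  "sal.config.EvolveConfig"
                names = [f"{node.module}.{alias.name}" for alias in node.names]
            else:
                continue  # relative imports are skipped in this sketch
            for name in names:
                in_sal = name == "sal" or name.startswith("sal.")
                in_monitor = name == "sal.monitor" or name.startswith("sal.monitor.")
                if in_sal and not in_monitor:
                    violations.append((str(path), name))
    return violations
```

Running such a check in a test suite, for example asserting that `find_reverse_imports("sal/monitor")` returns an empty list, would keep the one-way dependency from eroding over time.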

How to Use

As a Skill (in your agent instructions)

Read the SKILL.md in supervised-agentic-loop/ and begin an evolve run.
Target file: train.py
Metric: python train.py (look for val_bpb, lower is better)

As a CLI

# Evolve loop
sal run --target train.py --metric "python train.py" --parser val_bpb
sal status
sal unsuspend --agent codex --reason "verified by human"

# Monitor
sal monitor stats       # sessions, alerts, health
sal monitor alerts      # recent misalignment alerts
sal monitor canary      # run 5 self-test checks

As a Python API

from sal.config import EvolveConfig
from sal.evolve_loop import EvolveLoop

config = EvolveConfig(
    target_file="train.py",
    metric_command="python train.py",
    metric_parser="val_bpb",
    minimize=True,
)

def my_agent(prompt: str) -> str:
    # Your LLM call here — must return output with JSON block
    ...

# Monitor auto-enabled. Set enable_monitor=False to disable.
loop = EvolveLoop(config, agent=my_agent, agent_id="my-model")
summary = loop.run()

Monitor Standalone

from sal.monitor import AgentMonitor, BlockDecision

monitor = AgentMonitor(state_dir=".state")

# SYNC: before every tool call (< 5ms)
decision = monitor.check_before_execute("exec", {"command": "rm -rf /"})
# → BlockDecision.BLOCK

# ASYNC: review full session
result = monitor.review_session(session_entries, agent_callable=my_llm)
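
To make the SYNC phase concrete, here is a minimal sketch of a synchronous prefilter built on precompiled regular expressions. It is not the project's implementation: `Decision`, `BLOCK_PATTERNS`, and `prefilter` are hypothetical names, and the two patterns are illustrative examples rather than the actual B001-B010 rules.

```python
import re
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


# Hypothetical patterns, compiled once so each check stays cheap.
BLOCK_PATTERNS = [
    # Recursive delete aimed at the filesystem root, e.g. "rm -rf /"
    re.compile(r"\brm\s+-\w*r\w*\s+/(?:\s|$)"),
    # Downloading a script and piping it straight into a shell
    re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:sh|bash)\b"),
]


def prefilter(tool: str, args: dict) -> Decision:
    """Block shell commands that match a known-dangerous pattern."""
    if tool != "exec":
        return Decision.ALLOW
    command = str(args.get("command", ""))
    if any(pattern.search(command) for pattern in BLOCK_PATTERNS):
        return Decision.BLOCK
    return Decision.ALLOW
```

Pattern matching of this kind is fast but shallow, which is presumably why the document pairs it with an asynchronous LLM review of the full session.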

Phases

  1. Baseline — Run metric on unmodified code (fails → HARD ABORT)
  2. Brainstorm — Generate hypothesis from history + learnings
  3. Plan — Create TaskContract with acceptance criteria
  4. Implement — Agent modifies target_file (🛡️ Monitor: SYNC prefilter)
  5. Review — Parse agent output for contract compliance
  6. Verify — Run verification gates (files, syntax, tests, lint) + extract metric
  7. Evolve — Compare metric → KEEP (commit) or DISCARD (rollback)
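
The KEEP or DISCARD decision in the final phase can be sketched as a pure comparison. This is a minimal sketch under stated assumptions: `evolve_step` is a hypothetical name, the `minimize` flag mirrors the `minimize=True` option in the Python API example, and treating a tie as no improvement is an assumption.

```python
def evolve_step(new_metric: float, best_metric: float, minimize: bool = True) -> tuple[str, float]:
    """Return the decision and the best metric to carry into the next iteration."""
    improved = new_metric < best_metric if minimize else new_metric > best_metric
    if improved:
        return "KEEP", new_metric      # caller would commit the change
    return "DISCARD", best_metric      # caller would roll the change back
```

For a lower-is-better metric such as val_bpb, `evolve_step(0.95, 1.0)` keeps the change, while `evolve_step(1.05, 1.0)` discards it and retains the previous best.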

10 Misalignment Behaviors

| ID | Behavior | Severity | Sync Block |
|----|----------|----------|------------|
| B001 | Restriction bypass / destructive delete | HIGH | |
| B002 | Unexpected tool calls | MEDIUM | |
| B003 | Goal deviation | MEDIUM | |
| B004 | Self-modification | CRITICAL | |
| B005 | Data exfiltration | CRITICAL | |
| B006 | Hallucinated success | HIGH | |
| B007 | Privilege escalation | HIGH | |
| B008 | Resource exhaustion | MEDIUM | |
| B009 | Persistence / backdoor | CRITICAL | |
| B010 | Obfuscation | HIGH | |
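
The behavior list lends itself to a small lookup structure. The sketch below encodes the IDs and severities from the table and shows one possible way to deduplicate and rank a batch of alerts. The names `Severity`, `BEHAVIORS`, and `dedup_and_rank` are hypothetical, and the ordering rule (highest severity first, then by ID) is an assumption rather than the project's classifier logic.

```python
from enum import IntEnum


class Severity(IntEnum):
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# IDs and severities copied from the table; names are shortened where noted.
BEHAVIORS = {
    "B001": ("Restriction bypass / destructive delete", Severity.HIGH),
    "B002": ("Unexpected tool calls", Severity.MEDIUM),
    "B003": ("Goal deviation", Severity.MEDIUM),
    "B004": ("Self-modification", Severity.CRITICAL),
    "B005": ("Data exfiltration", Severity.CRITICAL),
    "B006": ("Hallucinated success", Severity.HIGH),
    "B007": ("Privilege escalation", Severity.HIGH),
    "B008": ("Resource exhaustion", Severity.MEDIUM),
    "B009": ("Persistence / backdoor", Severity.CRITICAL),
    "B010": ("Obfuscation", Severity.HIGH),
}


def dedup_and_rank(alert_ids: list[str]) -> list[str]:
    """Drop duplicate IDs and order by severity (highest first), then by ID."""
    return sorted(set(alert_ids), key=lambda bid: (-BEHAVIORS[bid][1], bid))
```

An unknown ID raises `KeyError` here, which keeps a typo in an alert source from being silently ranked.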

Auto-Brake Conditions

The loop stops automatically when:

  • Reputation ≤ 0.2 → Agent suspended
  • Monitor BLOCK → Iteration aborted + reputation penalty
  • Plateau → No improvement for N iterations
  • Budget → max_iterations reached
  • SIGINT → Human interrupt (graceful)

Built-in Metric Parsers

| Name | Extracts |
|------|----------|
| last_line_float | Float from last line of output |
| pytest_passed | Number of passed tests |
| pytest_failed | Number of failed tests |
| coverage_percent | Coverage percentage |
| val_bpb | Validation BPB value |
| benchmark_ms | Milliseconds from benchmark output |
| Custom regex | Any regex with 1 capture group |

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| SAL_DB_PATH | .state/reputation.db | Reputation database path |
| MONITOR_TELEGRAM_BOT_TOKEN | | Telegram bot token for alerts |
| MONITOR_TELEGRAM_CHAT_ID | | Telegram chat/user ID |
| MONITOR_LLM_COMMAND | | LLM for async session review |
| MONITOR_STATE_DIR | .state | Monitor state directory |

Constraints

  • Zero external dependencies (Python 3.11+ stdlib only)
  • Agent modifies exactly ONE file per iteration
  • All changes are git-isolated with automatic rollback
  • Learnings persist across runs in .state/learnings/
  • Monitor is optional — SAL works without it
  • 130 tests (69 SAL + 61 Monitor)

Version History

1 version in total

  • v0.1.2 (current)
    2026-03-30 22:51, security status: safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
