
Agent Lightning

Microsoft Research's agent training framework. Optimizes AI agents with Reinforcement Learning, Automatic Prompt Optimization, and Supervised Fine-tuning. Ze...
Publisher: olmmlo-cmd
Category: AI · Source: clawhub · Version: v1.0.0 (1 version) · Key: required

Agent Lightning ⚡

Microsoft Research's agent training framework. Turn your AI agents into optimizable beasts with (almost) zero code changes.

Core Features

  • 🔌 Universal Compatibility: Works with LangChain, OpenAI Agent SDK, AutoGen, CrewAI, Microsoft Agent Framework, or plain Python with the OpenAI client
  • 🎯 Selective Optimization: Optimize one or more agents in a multi-agent system
  • 🧠 Multiple Algorithms: Reinforcement Learning (RL), Automatic Prompt Optimization (APO), Supervised Fine-tuning (SFT)
  • ⚡ (Almost) Zero Code Change: Add agl.emit_xxx() helpers or use the tracer; your agent keeps running as usual

Installation

pip install agentlightning

For the latest nightly build:

pip install --upgrade --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ --pre agentlightning

Quick Start

1. Instrument Your Agent

Option A: Add emit helpers (recommended)

import agentlightning as agl

# In your agent's tool calls; model, messages, and tools are the values your agent already uses
response = agl.emit_tool_call(
    model=model,
    messages=messages,
    tools=tools,
    context={"task": "search"}
)

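Option B: Use the tracer (zero code change)

from agentlightning import tracer

# Wrap the existing agent call with the tracer.
# input_data, your_agent, and user_query are placeholders for your own objects.
with tracer.trace("my-agent", input_data):
    result = your_agent.run(user_query)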
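To make the idea of wrapping an agent call in a trace more concrete, here is a toy, standard-library-only sketch. It is not agentlightning's tracer and makes no claim about how that tracer works internally; `toy_trace`, `RECORDED_SPANS`, and `fake_agent` are hypothetical names used only for illustration.

```python
import time
from contextlib import contextmanager

# Toy stand-in for illustration only; this is NOT agentlightning's tracer.
RECORDED_SPANS = []


@contextmanager
def toy_trace(name, input_data):
    """Record one span (name, input, duration, error) around the wrapped block."""
    span = {"name": name, "input": input_data, "error": None}
    start = time.perf_counter()
    try:
        yield span  # the agent code runs here, unchanged
    except Exception as exc:
        span["error"] = repr(exc)
        raise
    finally:
        span["duration_s"] = time.perf_counter() - start
        RECORDED_SPANS.append(span)


def fake_agent(query):
    # Hypothetical agent; a real one would call an LLM and tools.
    return query.upper()


with toy_trace("my-agent", {"query": "hello"}) as span:
    span["output"] = fake_agent("hello")
```

The agent function itself is untouched; only the call site is wrapped, which is the property the tracer option relies on.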

2. Create Training Config

# config.yaml
agent:
  name: "my-agent"
  type: "openai"  # openai, langchain, autogen, crewai

training:
  algorithm: "grpo"  # grpo, apo, sft, rloo
  episodes: 100
  batch_size: 16
  
environment:
  eval_tasks:
    - "math"
    - "coding"
    - "reasoning"
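Before starting a long run, it can help to sanity-check the config values. The sketch below mirrors the fields of config.yaml in a plain dataclass and checks them against the options listed in the file's comments. `TrainingConfig` and `config_from_dict` are hypothetical helpers, not part of agentlightning, and the nested dict stands in for whatever a YAML loader would return.

```python
from dataclasses import dataclass, field

# Allowed values copied from the comments in config.yaml.
AGENT_TYPES = {"openai", "langchain", "autogen", "crewai"}
ALGORITHMS = {"grpo", "apo", "sft", "rloo"}


@dataclass
class TrainingConfig:
    """Hypothetical mirror of config.yaml; not an agentlightning class."""
    agent_name: str
    agent_type: str
    algorithm: str
    episodes: int = 100
    batch_size: int = 16
    eval_tasks: list = field(default_factory=list)

    def validate(self):
        """Return a list of human-readable problems (empty means OK)."""
        problems = []
        if self.agent_type not in AGENT_TYPES:
            problems.append(f"unknown agent type: {self.agent_type!r}")
        if self.algorithm not in ALGORITHMS:
            problems.append(f"unknown algorithm: {self.algorithm!r}")
        if self.episodes <= 0:
            problems.append("episodes must be positive")
        if self.batch_size <= 0:
            problems.append("batch_size must be positive")
        return problems


def config_from_dict(raw):
    """Build a TrainingConfig from the nested dict a YAML loader would return."""
    return TrainingConfig(
        agent_name=raw["agent"]["name"],
        agent_type=raw["agent"]["type"],
        algorithm=raw["training"]["algorithm"],
        episodes=raw["training"].get("episodes", 100),
        batch_size=raw["training"].get("batch_size", 16),
        eval_tasks=list(raw.get("environment", {}).get("eval_tasks", [])),
    )


raw = {
    "agent": {"name": "my-agent", "type": "openai"},
    "training": {"algorithm": "grpo", "episodes": 100, "batch_size": 16},
    "environment": {"eval_tasks": ["math", "coding", "reasoning"]},
}
config = config_from_dict(raw)
```

Calling `config.validate()` returns an empty list for the values shown in config.yaml and a list of messages otherwise.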

3. Run Training

agent-lightning train --config config.yaml

Algorithms

| Algorithm | Use Case | Description |
|-----------|----------|-------------|
| GRPO | General RL | Group Relative Policy Optimization: stable, works well for most agents |
| APO | Prompt Tuning | Automatic Prompt Optimization: improves system prompts |
| SFT | Supervised Fine-tuning | Fine-tunes the model on labeled demonstration data |
| RLOO | Long-horizon | REINFORCE Leave-One-Out: for tasks with sparse rewards |
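GRPO and RLOO differ mainly in how they turn a group of rollout rewards for the same task into per-rollout advantages. The sketch below shows the textbook form of both computations, using only the standard library. It is not taken from agentlightning's implementation, and the choice of population standard deviation and the `eps` value are assumptions.

```python
from statistics import fmean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage: standardize each reward within its group."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantage: reward minus the mean of the other rollouts."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two rollouts per task")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Four rollouts of the same task, two successful and two failed.
group_rewards = [1.0, 0.0, 0.0, 1.0]
grpo = grpo_advantages(group_rewards)
rloo = rloo_advantages(group_rewards)
```

In both cases successful rollouts get a positive advantage and failed ones a negative advantage; GRPO additionally rescales by the spread of rewards in the group.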

Usage Commands

agent-lightning train

Train your agent with the configured algorithm.

agent-lightning eval

Evaluate the agent on benchmark tasks.

agent-lightning export

Export the trained model or prompts for deployment.

agent-lightning serve

Launch a serving endpoint for the trained agent.

Example: SQL Agent Training

See full example: Train SQL Agent with RL

from agentlightning import Agent, RLConfig, GRPOTrainer

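# 1. Define your agent
# execute_sql and query_schema are your own tool functions; they are not provided by agentlightning
sql_agent = Agent(
    name="sql-agent",
    system_prompt="You are a SQL expert...",
    tools=[execute_sql, query_schema]
)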

# 2. Configure RL training
config = RLConfig(
    algorithm="grpo",
    episodes=500,
    learning_rate=1e-4
)

# 3. Train
trainer = GRPOTrainer(config=config)
trainer.train(sql_agent, eval_tasks=["sql-generation"])
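The example above leaves the two tools and the reward signal undefined. A minimal sketch of what they could look like, using the standard-library `sqlite3` module and an in-memory demo table, follows. All names here (`make_demo_db`, `execution_match_reward`, the `users` table) are hypothetical, and the connection is passed explicitly for clarity; a real tool would more likely bind it through a closure or class.

```python
import sqlite3


def make_demo_db():
    """Create a small in-memory database for trying the tools out."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO users (name, age) VALUES ('Ada', 36), ('Linus', 28), ('Grace', 45);
        """
    )
    return conn


def query_schema(conn):
    """Tool: return the CREATE TABLE statements so the agent can see the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def execute_sql(conn, query):
    """Tool: run a query and return all rows as a list of tuples."""
    return conn.execute(query).fetchall()


def execution_match_reward(conn, predicted_sql, gold_sql):
    """Reward 1.0 if both queries return the same rows (order ignored), else 0.0."""
    gold = execute_sql(conn, gold_sql)
    try:
        predicted = execute_sql(conn, predicted_sql)
    except sqlite3.Error:
        return 0.0  # invalid SQL earns no reward
    return 1.0 if sorted(predicted, key=repr) == sorted(gold, key=repr) else 0.0


conn = make_demo_db()
gold = "SELECT name FROM users WHERE age >= 36"
reward_ok = execution_match_reward(conn, "SELECT name FROM users WHERE age > 30", gold)
reward_bad = execution_match_reward(conn, "SELEC name FROM users", gold)
```

Comparing query results rather than query text rewards any SQL that produces the right rows, which is usually the behavior one wants from a SQL agent. For untrusted generated SQL, running against a read-only or throwaway database is the safer setup.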

Integration with Clawdbot

Environment Variables

# Required for training
export OPENAI_API_KEY="sk-..."

# Optional: for remote storage
export AGL_STORAGE="s3://my-bucket/agent-lightning/"
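A training script can fail fast when the required variable is missing instead of erroring midway through a run. The following is a small sketch of such a check; `check_env` is a hypothetical helper, and the variable names are the two listed above.

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY",)
OPTIONAL_VARS = ("AGL_STORAGE",)


def check_env(env=None):
    """Raise if a required variable is unset; return the optional ones that are set."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Only optional settings are returned, so the API key is never echoed back.
    return {name: env[name] for name in OPTIONAL_VARS if env.get(name)}
```

Calling `check_env()` with no arguments reads the real process environment; passing a dict makes the check easy to test.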

Python API

from agentlightning import LightningStore, GRPOTrainer

# LightningStore keeps tasks, resources, and traces in sync
store = LightningStore()

# Read traces, learn, and update prompts (my_agent is your own instrumented agent)
trainer = GRPOTrainer(store=store)
trainer.train(agent=my_agent)

Monitoring Training

# Launch dashboard
agent-lightning dashboard --port 8080

# View logs
tail -f ~/.agent-lightning/logs/training.log

Best Practices

  1. Start Small: Begin with 10-50 episodes to verify the setup
  2. Define Clear Rewards: Design reward functions that match your goal
  3. Use Evaluation Tasks: Always evaluate on held-out tasks
  4. Checkpoint Frequently: Save the model every N episodes
  5. Monitor Convergence: Watch the loss curves in the dashboard
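Two of these practices, holding out evaluation tasks and checkpointing every N episodes, are easy to get subtly wrong. The sketch below shows one way to do both with the standard library. `split_tasks` and `should_checkpoint` are hypothetical helpers, and the 20% hold-out fraction and interval of 10 are arbitrary defaults.

```python
import random


def split_tasks(tasks, holdout_fraction=0.2, seed=0):
    """Deterministically split tasks into (train, held_out) lists."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def should_checkpoint(episode, every=10):
    """True on episodes N, 2N, 3N, ... (never on episode 0)."""
    return episode > 0 and episode % every == 0


tasks = [f"task-{i}" for i in range(10)]
train_tasks, held_out_tasks = split_tasks(tasks)
```

Fixing the seed keeps the held-out set identical across runs, so evaluation scores from different training runs stay comparable.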

Resources

Citation

If you use Agent Lightning in research:

@misc{luo2025agentlightningtrainai,
  title={Agent Lightning: Train ANY AI Agents with Reinforcement Learning},
  author={Xufang Luo and Yuge Zhang and Zhiyuan He and Zilong Wang and Siyun Zhao and Dongsheng Li and Luna K. Qiu and Yuqing Yang},
  year={2025},
  eprint={2508.03680},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 08:13 · Security: safe
Security Checks

Tencent Cloud Security (Keen): safe, no risk
Tencent Cloud Security (Sanbu): safe, no risk
