
LLM Regression Monitor

Use this skill when the user wants to monitor LLM behavior over time and get alerted when outputs change unexpectedly. Triggers on requests like "set up LLM...
Author: swanand33
Source: clawhub, v1.0.2 (1 version). API key: required.

Overview

Automated behavioral regression monitoring for LLM apps. Captures baseline outputs, detects drift on a schedule, and fires WhatsApp or Slack alerts the moment something regresses.


Workflow Decision Tree

User request
├── "set up monitoring" / first time    → Full Setup (steps 1–5)
├── "run the monitor now"               → Step 4 only
├── "I changed my prompt/model"         → Step 3b (update baseline)
└── "configure alerts"                  → Step 5

Step 1 — Install

pip install llm-behave[semantic] pyyaml requests

Step 2 — Create test_suite.yaml

Create in the project root. Minimal example:

tests:
  - name: support_response
    prompt: "A customer says they never received their order. How do you respond?"
    provider: openai        # openai | anthropic | ollama | custom
    model: gpt-4o-mini
    assertions:
      - type: tone
        expected: "empathetic"
    drift:
      enabled: true
      threshold: 0.80
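
Before capturing baselines it can help to sanity-check the suite. The sketch below is a hypothetical validator, not part of llm-behave: it works on the dictionary that `yaml.safe_load()` would return for the file above, and it assumes that `name`, `prompt`, `provider`, and `model` are required and that `drift.threshold` lies in [0, 1]. The authoritative field list is in references/test-suite-format.md.

```python
from typing import Any

# Provider names taken from the comment in the minimal example.
ALLOWED_PROVIDERS = {"openai", "anthropic", "ollama", "custom"}


def validate_suite(suite: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    tests = suite.get("tests")
    if not isinstance(tests, list) or not tests:
        return ["top-level 'tests' must be a non-empty list"]

    problems: list[str] = []
    for i, test in enumerate(tests):
        if not isinstance(test, dict):
            problems.append(f"tests[{i}]: expected a mapping")
            continue
        label = test.get("name") or f"tests[{i}]"
        # Assumed-required fields (hypothetical; check the real spec).
        for field in ("name", "prompt", "provider", "model"):
            if not test.get(field):
                problems.append(f"{label}: missing '{field}'")
        provider = test.get("provider")
        if provider and provider not in ALLOWED_PROVIDERS:
            problems.append(f"{label}: unknown provider '{provider}'")
        drift = test.get("drift") or {}
        threshold = drift.get("threshold")
        if drift.get("enabled") and not (
            isinstance(threshold, (int, float)) and 0.0 <= threshold <= 1.0
        ):
            problems.append(f"{label}: drift.threshold should be a number in [0, 1]")
    return problems


# Mirrors the minimal test_suite.yaml example after YAML parsing.
example = {
    "tests": [
        {
            "name": "support_response",
            "prompt": "A customer says they never received their order. How do you respond?",
            "provider": "openai",
            "model": "gpt-4o-mini",
            "assertions": [{"type": "tone", "expected": "empathetic"}],
            "drift": {"enabled": True, "threshold": 0.80},
        }
    ]
}
print(validate_suite(example))  # prints []
```

To check a real file, pass the result of `yaml.safe_load(open("test_suite.yaml"))` to `validate_suite`; pyyaml is already installed in Step 1.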

Set the API key for the chosen provider:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...   # if using anthropic
# ollama needs no key
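
A quick pre-flight check can catch a missing key before any API call is made. This is a minimal sketch built only from the export lines above; `missing_key` is a hypothetical helper, and the `custom` provider is left out because its requirements are not described here.

```python
import os
from typing import Mapping, Optional

# Provider -> environment variable, as listed in the export lines above.
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,  # runs locally, no key needed
}


def missing_key(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset variable for this provider, or None if nothing is missing."""
    var = REQUIRED_KEY.get(provider)
    if var and not env.get(var):
        return var
    return None
```

For example, `missing_key("openai")` returns `"OPENAI_API_KEY"` when that variable is unset or empty, which maps onto the "OPENAI_API_KEY is not set" entry under Common Errors.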

Read references/test-suite-format.md for the full field spec.

Read references/providers.md for env vars and Ollama setup.


Step 3 — Capture Baselines

python scripts/capture_baseline.py

Saves ground-truth outputs to .llm_behave_baselines/. Run once before monitoring begins.
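
The source does not say how llm-behave scores drift against a baseline. As an illustration only, the sketch below assumes that `threshold: 0.80` is a minimum similarity score and substitutes a simple word-count cosine similarity for whatever semantic measure the library actually uses; `cosine_similarity` and `has_drifted` are hypothetical names.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def has_drifted(baseline: str, current: str, threshold: float = 0.80) -> bool:
    """Flag drift when similarity to the baseline falls below the threshold (assumed semantics)."""
    return cosine_similarity(baseline, current) < threshold
```

Under this reading, a higher threshold makes the monitor stricter: outputs must stay closer to the captured baseline to pass.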

3b — Update after intentional prompt/model change

# Reset one test
python scripts/capture_baseline.py --update-baseline <test-name>

# Reset all
python scripts/capture_baseline.py --force

Step 4 — Run the Monitor

python scripts/run_monitor.py

Writes monitor_report.json. Exits 0 on all-pass, 1 on any failure (CI-compatible).
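
Because the result is carried by the exit code, any caller can branch on it without parsing the report. A minimal sketch of that contract, using a hypothetical wrapper named `monitor_passed`:

```python
import subprocess
import sys


def monitor_passed(cmd: list[str]) -> bool:
    """Run a command and report whether it exited 0 (all tests passed)."""
    result = subprocess.run(cmd)
    return result.returncode == 0
```

Calling `monitor_passed([sys.executable, "scripts/run_monitor.py"])` from the project root returns `False` on any non-zero exit, which is the same signal a CI job would use to fail the build.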


Step 5 — Configure Alerts

# WhatsApp (requires wacli installed and logged in)
export ALERT_WHATSAPP_TO="+1234567890"

# Slack
export ALERT_SLACK_WEBHOOK="https://hooks.slack.com/services/..."

Add these variables to .env in the project root; the scripts load it automatically. Send alerts with:

python scripts/send_alert.py

Silent on green runs. Logs every alert to monitor_alerts.log regardless.
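
The internals of send_alert.py are not shown here. As a rough illustration of the Slack half, the sketch below posts a JSON body with a `text` field to an incoming webhook, which is the basic payload Slack webhooks accept. It uses the standard library instead of requests, and the function names and message wording are assumptions.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, failed_tests: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a plain-text Slack message."""
    text = "LLM regression detected in: " + ", ".join(failed_tests)
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(failed_tests: list[str]) -> None:
    """Send the alert if ALERT_SLACK_WEBHOOK is set; otherwise do nothing."""
    url = os.environ.get("ALERT_SLACK_WEBHOOK")
    if not url:
        return
    request = build_slack_request(url, failed_tests)
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Separating request construction from sending keeps the message format easy to inspect without touching the network.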


Step 6 — Schedule with OpenClaw Cron

Confirm the schedule with the user (default: 9am daily), then add:

  • Schedule: 0 9 * * *
  • Command: python scripts/run_monitor.py || python scripts/send_alert.py
  • Directory: project root (where test_suite.yaml lives)

The || send_alert.py fires only when run_monitor.py exits 1 (failures found).
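
For reference, a standard cron expression has five fields, and 9am daily is written `0 9 * * *`. The sketch below names each field; it assumes OpenClaw Cron follows the usual five-field syntax, and `parse_cron` is a hypothetical helper.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr: str) -> dict[str, str]:
    """Split a standard five-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


print(parse_cron("0 9 * * *"))
# prints {'minute': '0', 'hour': '9', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

An expression with fewer than five fields raises `ValueError`, which is a cheap way to catch a truncated schedule before it is saved.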


Common Errors

| Error | Fix |
|---|---|
| llm-behave is not installed | pip install llm-behave[semantic] |
| OPENAI_API_KEY is not set | Export key or add to .env |
| No baseline found | Run step 3 first |
| test_suite.yaml not found | Create it in project root |
| LLM call errors in report | API issue, not a regression |

Version History

1 version in total

  • v1.0.2 (current)
    2026-05-07 05:52, security scan: safe
