LLM Regression Monitor

Overview

Automated behavioral regression monitoring for LLM apps. It captures baseline outputs, checks for drift on a schedule, and sends WhatsApp or Slack alerts when a run detects a regression.

Workflow Decision Tree

User request
├── "set up monitoring" / first time    → Full Setup (steps 1–6)
├── "run the monitor now"               → Step 4 only
├── "I changed my prompt/model"         → Step 3b (update baseline)
└── "configure alerts"                  → Step 5

Step 1 — Install

pip install "llm-behave[semantic]" pyyaml requests

Step 2 — Create test_suite.yaml

Create in the project root. Minimal example:

tests:
  - name: support_response
    prompt: "A customer says they never received their order. How do you respond?"
    provider: openai        # openai | anthropic | ollama | custom
    model: gpt-4o-mini
    assertions:
      - type: tone
        expected: "empathetic"
    drift:
      enabled: true
      threshold: 0.80

Set the API key for the chosen provider:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...   # if using anthropic
# ollama needs no key

Read references/test-suite-format.md for the full field spec.

Read references/providers.md for env vars and Ollama setup.
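The `drift.threshold` value is easiest to read as a similarity cut-off. The sketch below assumes, and the source does not confirm, that drift is scored as the cosine similarity between an embedding of the baseline output and an embedding of the new output, with a score below the threshold counted as drift. The vectors and the `has_drifted` helper are hypothetical and are not part of llm-behave.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)

def has_drifted(baseline_vec, current_vec, threshold=0.80):
    # Assumption: a similarity below the threshold is treated as drift.
    return cosine_similarity(baseline_vec, current_vec) < threshold

baseline = [1.0, 0.0, 1.0]                      # stand-in for a baseline embedding
print(has_drifted(baseline, [1.0, 0.1, 0.9]))   # nearly the same direction
print(has_drifted(baseline, [0.0, 1.0, 0.0]))   # orthogonal to the baseline
```

Under this reading, raising the threshold toward 1.0 makes the check stricter, and lowering it tolerates more variation between runs.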

Step 3 — Capture Baselines

python scripts/capture_baseline.py

Saves ground-truth outputs to .llm_behave_baselines/. Run once before monitoring begins.

3b — Update after intentional prompt/model change

# Reset one test
python scripts/capture_baseline.py --update-baseline <test-name>

# Reset all
python scripts/capture_baseline.py --force

Step 4 — Run the Monitor

python scripts/run_monitor.py

Writes monitor_report.json. Exits 0 on all-pass, 1 on any failure (CI-compatible).
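Because the result is reported through the exit code, a wrapper can branch on it without parsing the report. A minimal sketch, using stand-in commands so it runs anywhere; `monitor_passed` is a hypothetical helper, and in a real project the command would be `[sys.executable, "scripts/run_monitor.py"]`.

```python
import subprocess
import sys

def monitor_passed(command):
    """Return True when the command exits 0, False on any other exit code."""
    return subprocess.run(command).returncode == 0

# Stand-ins that mimic a green run (exit 0) and a failing run (exit 1).
always_ok = [sys.executable, "-c", "raise SystemExit(0)"]
always_bad = [sys.executable, "-c", "raise SystemExit(1)"]

for name, cmd in [("green run", always_ok), ("regression", always_bad)]:
    print(name, "->", "pass" if monitor_passed(cmd) else "fail")
```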

Step 5 — Configure Alerts

# WhatsApp (requires wacli installed and logged in)
export ALERT_WHATSAPP_TO="+1234567890"

# Slack
export ALERT_SLACK_WEBHOOK="https://hooks.slack.com/services/..."

Add to .env in project root — scripts load it automatically. Send via:

python scripts/send_alert.py

Silent on green runs. Logs every alert to monitor_alerts.log regardless.
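For orientation, a Slack incoming webhook accepts a JSON body with a `text` field sent by HTTP POST. The sketch below builds such a request with the standard library and does not send it. It is not the code in `send_alert.py`, whose internals the source does not show; `build_slack_request`, the placeholder URL, and the message text are illustrative.

```python
import json
import os
import urllib.request

def build_slack_request(webhook_url, message):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder URL for illustration; the real value comes from ALERT_SLACK_WEBHOOK.
webhook = os.environ.get("ALERT_SLACK_WEBHOOK", "https://example.invalid/webhook")
request = build_slack_request(webhook, "LLM regression: support_response failed")
print(request.get_method(), len(request.data), "bytes")
# To deliver the alert for real: urllib.request.urlopen(request, timeout=10)
```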

Step 6 — Schedule with OpenClaw Cron

Confirm the schedule with the user (default: 9am daily), then add:

Schedule: 0 9 * * *
Command: python scripts/run_monitor.py || python scripts/send_alert.py
Directory: project root (where test_suite.yaml lives)

The || clause runs send_alert.py only when run_monitor.py exits with a non-zero status (1 when failures are found).
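In standard five-field cron syntax, 9am daily is written `0 9 * * *`. The sketch below names each field so a schedule can be sanity-checked before it is saved. It assumes OpenClaw Cron follows the usual five-field format, which the source does not state; `describe_cron` is a hypothetical helper.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

def describe_cron(expression):
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))

for field, value in describe_cron("0 9 * * *").items():
    print(f"{field}: {value}")
```

An expression with fewer than five fields raises a `ValueError` here, which catches a truncated schedule early.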

Common Errors

| Error | Fix |
| --- | --- |
| `llm-behave is not installed` | `pip install "llm-behave[semantic]"` |
| `OPENAI_API_KEY is not set` | Export the key or add it to `.env` |
| `No baseline found` | Run step 3 first |
| `test_suite.yaml not found` | Create it in the project root |
| LLM call errors in report | API issue, not a regression |

Version History

1 version in total

v1.0.2 (current)

2026-05-07 05:52, status: safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
