Automated behavioral regression monitoring for LLM apps. Captures baseline outputs, checks for drift on a schedule, and sends WhatsApp or Slack alerts when a run finds a regression.
User request
├── "set up monitoring" / first time → Full Setup (steps 1–5)
├── "run the monitor now" → Step 4 only
├── "I changed my prompt/model" → Step 3b (update baseline)
└── "configure alerts" → Step 5
pip install "llm-behave[semantic]" pyyaml requests
Create test_suite.yaml in the project root. Minimal example:
tests:
  - name: support_response
    prompt: "A customer says they never received their order. How do you respond?"
    provider: openai  # openai | anthropic | ollama | custom
    model: gpt-4o-mini
    assertions:
      - type: tone
        expected: "empathetic"
    drift:
      enabled: true
      threshold: 0.80
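This section does not define how `drift.threshold` is evaluated. One plausible reading is that a test drifts when the similarity between its baseline output and its current output falls below the threshold. The sketch below illustrates that reading only; `similarity` and `has_drifted` are hypothetical names, and `difflib` is a lexical stand-in for whatever semantic metric llm-behave actually uses.

```python
from difflib import SequenceMatcher


def similarity(baseline: str, current: str) -> float:
    """Lexical similarity in [0, 1]; a stand-in for a semantic (embedding) score."""
    return SequenceMatcher(None, baseline, current).ratio()


def has_drifted(baseline: str, current: str, threshold: float = 0.80) -> bool:
    """Assumed rule: drift is flagged when similarity falls below the threshold."""
    return similarity(baseline, current) < threshold


baseline = "I'm so sorry your order never arrived. Let me look into it right away."
print(has_drifted(baseline, baseline))            # identical text -> False
print(has_drifted(baseline, "Not our problem."))  # very different text -> True
```

Under this reading, raising the threshold makes the monitor stricter, and lowering it tolerates more variation between runs.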
Set the API key for the chosen provider:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-... # if using anthropic
# ollama needs no key
Read references/test-suite-format.md for the full field spec.
Read references/providers.md for env vars and Ollama setup.
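A preflight check can catch a missing key before any test runs. The sketch below is illustrative and not part of the shipped scripts: `REQUIRED_KEY` and `missing_key` are hypothetical names, the mapping comes only from the exports listed above, and the `custom` provider is treated as needing no key because its requirements are not described here.

```python
import os
from typing import Mapping, Optional

# Provider -> required environment variable, per the exports above.
# "custom" is absent on purpose: its requirements are not documented here.
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,  # no key needed
}


def missing_key(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset variable for this provider, or None if ready."""
    env = os.environ if env is None else env
    var = REQUIRED_KEY.get(provider)
    if var is None:
        return None  # provider needs no key, or is unknown to this sketch
    return None if env.get(var) else var


print(missing_key("openai", {}))   # -> OPENAI_API_KEY
print(missing_key("ollama", {}))   # -> None
```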
python scripts/capture_baseline.py
Saves ground-truth outputs to .llm_behave_baselines/. Run once before monitoring begins.
# Reset one test
python scripts/capture_baseline.py --update-baseline <test-name>
# Reset all
python scripts/capture_baseline.py --force
python scripts/run_monitor.py
Writes monitor_report.json. Exits 0 on all-pass, 1 on any failure (CI-compatible).
# WhatsApp (requires wacli installed and logged in)
export ALERT_WHATSAPP_TO="+1234567890"
# Slack
export ALERT_SLACK_WEBHOOK="https://hooks.slack.com/services/..."
Add these to .env in the project root; the scripts load it automatically. Send alerts with:
python scripts/send_alert.py
Silent on green runs. Logs every alert to monitor_alerts.log regardless.
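For orientation, a Slack incoming webhook accepts a JSON body with a `text` field sent by HTTP POST. The sketch below shows roughly what the Slack side of an alert could look like using only the standard library. It is not the contents of `send_alert.py`: the function names and the message wording are assumptions.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, failed_tests: list) -> urllib.request.Request:
    """Build (but do not send) the POST request for a Slack incoming webhook."""
    text = "LLM monitor: {} test(s) regressed: {}".format(
        len(failed_tests), ", ".join(failed_tests)
    )
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(failed_tests: list) -> bool:
    """Send an alert if there are failures and ALERT_SLACK_WEBHOOK is set."""
    url = os.environ.get("ALERT_SLACK_WEBHOOK")
    if not url or not failed_tests:
        return False  # silent on green runs or when Slack is not configured
    request = build_slack_request(url, failed_tests)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status == 200
```

Separating request construction from sending keeps the payload easy to inspect without touching the network.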
Confirm the schedule with the user (default: 9am daily), then add:
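0 9 * * * cd <project-dir> && python scripts/run_monitor.py || python scripts/send_alert.py
Replace <project-dir> with the directory where test_suite.yaml lives. The || python scripts/send_alert.py part fires only when run_monitor.py exits non-zero (1 when failures are found).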
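Where a scheduler cannot use shell operators, the same "alert only on failure" logic can be written as a small wrapper. This is a sketch under assumptions: `run_then_alert` is a hypothetical helper, not one of the shipped scripts, and it relies only on the exit-code contract described above.

```python
import subprocess
import sys


def run_then_alert(monitor_cmd: list, alert_cmd: list) -> int:
    """Run the monitor; run the alert command only if the monitor exits non-zero."""
    result = subprocess.run(monitor_cmd)
    if result.returncode != 0:
        subprocess.run(alert_cmd)
    return result.returncode


# Example call from the project root (paths as used elsewhere in this document):
# exit_code = run_then_alert(
#     [sys.executable, "scripts/run_monitor.py"],
#     [sys.executable, "scripts/send_alert.py"],
# )
# sys.exit(exit_code)
```

Returning the monitor's exit code lets a CI job or scheduler still see the failure after the alert has been sent.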
| Error | Fix |
|---|---|
| llm-behave is not installed | pip install "llm-behave[semantic]" |
| OPENAI_API_KEY is not set | Export the key or add it to .env |
| No baseline found | Run step 3 first |
| test_suite.yaml not found | Create it in the project root |
| LLM call errors in report | API issue, not a regression |