
LLM Cost Watchdog

Monitors real-time LLM API costs, detects runaway loops, enforces budgets, audits code risk, and reports usage across multiple providers and models.
nimaansari · Source
Uncategorized · clawhub · v1.0.0 · 1 version · 99710.1 · Key: not required
★ 0 Stars · 📥 344 Downloads · 💾 0 Installs · 1 Version · #latest

Overview

Cost Watchdog 💰

> Real-time cost tracking layer for LLM-based agents. Prices every call live,
> detects runaway loops in code, and enforces budget ceilings mid-execution.

1. Identity

Observes LLM spend without disturbing the agent. It prevents disasters like a $2,400 overnight loop by making cost a first-class concern: priced at write time, budgeted at check time, and surfaced in reports.

2. Triggers

Activate when:

  • User mentions cost, budget, tokens, billing.
  • Code contains LLM API calls (Anthropic, OpenAI, OpenRouter, Google, Groq, ...).
  • Agent loops or recursive workflows.
  • Batch / streaming processing with unclear bounds.
  • /cost-watchdog [command] is invoked.

3. Commands

Run via python3 scripts/cost_watchdog.py (or hook into your own CLI).

| Command | What it does |
| --- | --- |
| session | Spend totals from usage.jsonl: calls, tokens, cost, top models. |
| report | 24h / 7d / 30d windows with the top model per window. |
| tail [--once] | Watch OpenClaw session JSONL and log every assistant turn. |
| detect [--json] | Identify which model the agent is currently using (5 probes). |
| audit | AST-based code risk scan: unbounded loops, recursion, missing max_tokens. |
| price | Live pricing for one model, with source and cache age. |
| estimate | Project the cost of n iterations of a given call. |
| alternatives | Cheaper models priced in the same unit. |
| errors [--limit N] | Recent swallowed exceptions (silent failures made visible). |
| validate-tokens | Compare our token heuristic against the provider's authoritative count. |
| reset [--all] | Clear the current-day log (--all also clears rolled files). |
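The arithmetic behind a projection like `estimate` is worth seeing once. Below is a minimal sketch, assuming per-million-token input/output prices and a fixed token count per call; `TokenPrice` and `estimate_cost` are hypothetical names, not the tool's API, and the prices in the demo are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    # Hypothetical USD prices per million tokens; real values come from the pricing layer.
    input_per_mtok: float
    output_per_mtok: float


def estimate_cost(price: TokenPrice, input_tokens: int, output_tokens: int, iterations: int) -> float:
    """Projected USD cost of repeating the same call `iterations` times."""
    per_call = (input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok) / 1_000_000
    return per_call * iterations


if __name__ == "__main__":
    p = TokenPrice(input_per_mtok=3.0, output_per_mtok=15.0)
    # 2,000 input + 500 output tokens per call, 1,000 iterations
    print(f"${estimate_cost(p, 2_000, 500, 1_000):.2f}")
```

This assumes every iteration costs the same. In real agent loops the context usually grows each turn, so a linear projection like this one is a lower bound.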

4. Pricing layer

Source chain

openrouter/*            → OpenRouter API (live)           → static fallback
anything else           → LiteLLM JSON (live, cached 24h) → OpenRouter (permissive) → static fallback
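One way to picture the source chain is an ordered list of lookups where the first non-empty answer wins and network errors fall through to the next source. This sketch takes the lookups as plain callables. `resolve_price` is a hypothetical name, and the sketch does not model OpenRouter's "permissive" matching.

```python
from typing import Callable, Optional, Sequence, Tuple

# A lookup returns price data for a model id, or None when the source has no entry.
Lookup = Callable[[str], Optional[dict]]


def resolve_price(model: str, litellm: Lookup, openrouter: Lookup, static: Lookup) -> Tuple[str, dict]:
    """Try sources in priority order; return (source_name, price) from the first hit."""
    if model.startswith("openrouter/"):
        chain: Sequence[Tuple[str, Lookup]] = [("openrouter", openrouter), ("static", static)]
    else:
        chain = [("litellm", litellm), ("openrouter", openrouter), ("static", static)]
    for name, lookup in chain:
        try:
            price = lookup(model)
        except OSError:
            # Network failure (ConnectionError, timeouts, ...): fall through to the next source.
            continue
        if price is not None:
            return name, price
    raise KeyError(f"no price found for {model!r}")
```

Putting the static table last means a lookup only fails outright when the model is missing from every source.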
  • 2600+ models indexed across chat, completion, embedding, image, audio, video, rerank, OCR, and search modes.
  • 30+ providers in the static fallback: Anthropic, OpenAI, Google, Groq, Mistral, Cohere, DeepSeek, Perplexity, xAI, Bedrock, Azure, and more.
  • Unit-aware: token, image, second, query, page, character, pixel. Alternatives never compare across units.
  • A circuit breaker opens after 3 consecutive network failures for a host; lookups fall through to cache/static data until the 60 s cool-down ends.
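The circuit-breaker behaviour above (3 consecutive failures per host, 60 s cool-down) fits in a small class. This is a sketch, not the code in scripts/_sources.py; the class and method names are hypothetical. For testability the clock is injectable.

```python
import time
from typing import Callable, Dict


class CircuitBreaker:
    """Per-host breaker: opens after `threshold` consecutive failures, stays open for `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self._failures: Dict[str, int] = {}
        self._opened_at: Dict[str, float] = {}

    def allow(self, host: str) -> bool:
        """False while the breaker for `host` is open; callers should use cache/static data instead."""
        opened = self._opened_at.get(host)
        if opened is None:
            return True
        if self.clock() - opened >= self.cooldown:
            # Cool-down elapsed: close the breaker so the next request probes the network again.
            del self._opened_at[host]
            self._failures[host] = 0
            return True
        return False

    def record_success(self, host: str) -> None:
        self._failures[host] = 0

    def record_failure(self, host: str) -> None:
        count = self._failures.get(host, 0) + 1
        self._failures[host] = count
        if count >= self.threshold:
            self._opened_at[host] = self.clock()
```

To stay short, this version has no separate "half-open" state. After the cool-down it closes fully, so it takes another 3 failures to reopen it.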

Tuning

| Env var | Default | Effect |
| --- | --- | --- |
| CW_PRICE_TTL_SECONDS | 86400 (24h) | Cache lifetime. 0 = hit the network on every call. |
| CW_OFFLINE | unset | If 1, never touch the network. |
| CW_STATIC_ONLY | unset | If 1, skip live sources entirely. Used by tests. |
| CW_LOG_DIR | ~/.cost-watchdog | Where usage, error, and cache files live. |
| CW_BUDGET_USD | unset | Spend ceiling; wrappers raise BudgetExceeded when it is crossed. |
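Here is a minimal sketch of reading these variables with the defaults listed above, plus the TTL freshness rule. It assumes "If 1" means the exact string `"1"`. `Settings`, `load_settings`, and `cache_is_fresh` are hypothetical names.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    price_ttl_seconds: int
    offline: bool
    static_only: bool
    log_dir: Path
    budget_usd: Optional[float]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the CW_* variables; unset values fall back to the documented defaults."""
    budget = env.get("CW_BUDGET_USD")
    return Settings(
        price_ttl_seconds=int(env.get("CW_PRICE_TTL_SECONDS", "86400")),
        offline=env.get("CW_OFFLINE") == "1",          # assumption: only "1" enables it
        static_only=env.get("CW_STATIC_ONLY") == "1",
        log_dir=Path(env.get("CW_LOG_DIR", "~/.cost-watchdog")).expanduser(),
        budget_usd=float(budget) if budget else None,
    )


def cache_is_fresh(fetched_at: float, now: float, ttl_seconds: int) -> bool:
    """A TTL of 0 means the cache is never fresh, so every lookup goes to the network."""
    return ttl_seconds > 0 and (now - fetched_at) < ttl_seconds
```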

Refresh static pricing

python3 scripts/refresh_pricing.py

Regenerates references/pricing.md from the live sources so the offline fallback stays fresh. Aborts if fewer than 100 rows come back, which protects the file from being clobbered during a network outage.
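The abort rule and the "atomic writes" guarantee can be combined: check the row count first, then write to a temporary file and rename it over the target. This is a sketch; `write_pricing_atomic` is a hypothetical name. For brevity it serializes rows as JSON, whereas the real script renders Markdown.

```python
import json
import os
import tempfile
from pathlib import Path

MIN_ROWS = 100  # the refresh aborts below this many rows


def write_pricing_atomic(rows: list, dest: Path, min_rows: int = MIN_ROWS) -> None:
    """Replace `dest` only if enough rows came back; never leave a half-written file."""
    if len(rows) < min_rows:
        raise RuntimeError(f"only {len(rows)} rows fetched (< {min_rows}); keeping existing {dest}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(rows, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, dest)  # atomic rename: readers see the old file or the new one, never a mix
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```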

5. Tracking layer — how we know what was spent

The paths below are independent, and all of them write to ~/.cost-watchdog/usage.jsonl:

| Path | When to use | Covers streams? |
| --- | --- | --- |
| openclaw_tailer.py --watch | Running OpenClaw. Zero code changes. | Yes (reads completed turns) |
| track_openai(client) | You call an OpenAI-compatible SDK (covers OpenRouter, Groq, DeepSeek, Mistral, Together, Fireworks, Cerebras, Anyscale, ...). | Yes (tee'd iterator; auto-injects stream_options={"include_usage": True}) |
| track_anthropic(client) | Direct Anthropic SDK. | Yes (wraps messages.stream()) |
| track_gemini(model) / track_cohere(client) / track_bedrock(client) | Direct provider SDKs. | No (add wrappers if you need streams) |
| install_global_capture() (httpx) | Any modern Python SDK using httpx. | No; streams are flagged into errors.jsonl so the gap is visible. Use the SDK wrappers for stream coverage. |
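A "tee'd iterator" passes every chunk through to the caller unchanged and reports usage as a side effect. With stream_options={"include_usage": True}, OpenAI-compatible APIs attach a `usage` object to the final chunk. This sketch is not the tracker's code; `tee_stream` and the `on_usage` callback are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator


def tee_stream(chunks: Iterable[Any], on_usage: Callable[[int, int], None]) -> Iterator[Any]:
    """Yield every chunk unchanged; when a chunk carries `usage`, report its token counts."""
    for chunk in chunks:
        usage = getattr(chunk, "usage", None)
        if usage is not None:
            # With include_usage set, only the final chunk has a non-None usage object.
            on_usage(usage.prompt_tokens, usage.completion_tokens)
        yield chunk
```

Because usage arrives on the last chunk, a caller that stops iterating early never triggers `on_usage`. A production wrapper has to handle that case separately.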

The usage log rotates daily: usage.YYYY-MM-DD.jsonl. session_total(since=...) skips files outside the window before scanning.
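Date-stamped file names let aggregation rule out whole files without opening them. This sketch makes several assumptions: every day's log (today's included) uses the dated name, each line has a `cost_usd` field, and `since` is a `datetime.date`. All three are hypothetical, as is this signature.

```python
import datetime as dt
import json
from pathlib import Path
from typing import Optional


def session_total(log_dir: Path, since: Optional[dt.date] = None) -> float:
    """Sum cost_usd over usage.YYYY-MM-DD.jsonl files, skipping files dated before `since`."""
    total = 0.0
    for path in sorted(log_dir.glob("usage.*.jsonl")):
        stamp = path.name[len("usage."):-len(".jsonl")]
        try:
            day = dt.date.fromisoformat(stamp)
        except ValueError:
            continue  # not a date-stamped log file
        if since is not None and day < since:
            continue  # entire file is outside the window: don't open it
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    total += json.loads(line).get("cost_usd", 0.0)
    return total
```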

Aggregation uses canonical_family(), so claude-haiku-4-5-20251001, claude-haiku-4-5, and claude-haiku-4.5 collapse into one row in reports.
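Normalization can be done with a few string rules: strip a dated snapshot suffix and spell dotted versions with dashes. This sketch reproduces the three-way collapse shown above. The rules are assumptions rather than the contents of scripts/model_canon.py. In particular, dropping a provider prefix such as `anthropic/` would also merge routed and direct variants of a model.

```python
import re

# Trailing snapshot date: "-20251001" or "-2025-10-01".
_DATE_SUFFIX = re.compile(r"-(\d{8}|\d{4}-\d{2}-\d{2})$")


def canonical_family(model: str) -> str:
    """Collapse dated snapshots and dot/dash version spellings into one family key."""
    name = model.strip().lower()
    name = name.split("/")[-1]                     # assumption: provider prefix is dropped
    name = _DATE_SUFFIX.sub("", name)              # remove snapshot date
    name = re.sub(r"(?<=\d)\.(?=\d)", "-", name)   # "4.5" -> "4-5"
    return name
```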

6. Budget enforcement

Two mechanisms:

  1. Write-time check (race-safe): append_usage(entry, budget_ceiling=X) takes an fcntl.flock on a sidecar file, sums the current session, and refuses the write (raises BudgetExceeded) if the call would cross X.
  2. Post-write check: wrappers compare cumulative spend to CW_BUDGET_USD after logging and raise if it is over. This is used when the wrapper doesn't know the ceiling at call time.

Either path stops the agent mid-loop; the LLM call still returns to the caller, but the next one is blocked.
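The write-time check depends on doing the sum and the append under one exclusive lock. Otherwise two processes could both read the same total and both slip under the ceiling. Below is a POSIX-only sketch. Unlike the documented append_usage(entry, budget_ceiling=X), it takes the log path explicitly, and it assumes a `cost_usd` field and a `.lock` sidecar name (both hypothetical).

```python
import fcntl
import json
from pathlib import Path
from typing import Optional


class BudgetExceeded(RuntimeError):
    pass


def append_usage(entry: dict, log_path: Path, budget_ceiling: Optional[float] = None) -> None:
    """Check-and-append under an exclusive lock so concurrent writers can't race past the ceiling."""
    lock_path = log_path.with_name(log_path.name + ".lock")  # sidecar lock file
    cost = entry.get("cost_usd", 0.0)
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            if budget_ceiling is not None:
                spent = 0.0
                if log_path.exists():
                    with log_path.open(encoding="utf-8") as fh:
                        spent = sum(json.loads(line).get("cost_usd", 0.0) for line in fh if line.strip())
                if spent + cost > budget_ceiling:
                    raise BudgetExceeded(f"${spent + cost:.4f} would exceed ${budget_ceiling:.2f}")
            with log_path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Summing the whole log on every write keeps the sketch simple. A real implementation would likely limit the sum to the current session's files.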

7. Code audit (AST)

python3 scripts/cost_watchdog.py audit path/to/agent.py

Walks the AST and reports:

  • CRITICAL: while True with an LLM call and no max_iterations-style bound.
  • CRITICAL: a function that recurses and calls an LLM API with no depth argument.
  • HIGH: a plain while that calls an API with no retry/iteration counter.
  • MEDIUM: an LLM call missing max_tokens / max_completion_tokens.
  • MEDIUM: a function with ≥5 sequential LLM calls (batching candidate).

Every finding carries a file and line number, and the AST walk replaces the old string heuristic (`count('def ') > 3 and count('self.') > 5` → "recursion detected") along with its false positives.
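The two simplest checks translate directly to Python's `ast` module. In this sketch a call counts as an LLM call when the method name is in a small hypothetical set. The sketch does not look for an iteration bound, so it flags every `while True` that contains such a call. The real auditor's rules are broader.

```python
import ast
from typing import List, Tuple

LLM_METHODS = {"create", "stream", "generate_content"}  # assumption: method names treated as LLM calls


def _is_llm_call(node: ast.AST) -> bool:
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in LLM_METHODS)


def audit(source: str) -> List[Tuple[int, str, str]]:
    """Return sorted (line, severity, message) findings for two of the checks listed above."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.While) and isinstance(node.test, ast.Constant) and node.test.value is True:
            if any(_is_llm_call(child) for child in ast.walk(node)):
                findings.append((node.lineno, "CRITICAL", "while True loop calls an LLM API"))
        if _is_llm_call(node):
            keywords = {kw.arg for kw in node.keywords}
            if not keywords & {"max_tokens", "max_completion_tokens"}:
                findings.append((node.lineno, "MEDIUM", "LLM call without max_tokens"))
    return sorted(findings)
```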

8. Detection — "what model is the agent using?"

python3 scripts/cost_watchdog.py detect

Five probe layers, ranked by confidence:

| Probe | Confidence |
| --- | --- |
| OpenClaw session JSONL | high |
| Claude Code session JSONL | high |
| Most recent usage-log entry | high |
| Claude Code settings.json | medium |
| Env vars (ANTHROPIC_MODEL, OPENAI_MODEL, ...) | medium |

Emits a table or --json.
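A ranked detector can be modelled as a list of probes tried in order, where the first one that returns a model wins. This sketch assumes first-hit semantics, which may differ from how scripts/detect_model.py combines layers. It shows only the env-var probe, using the two variables named in the table. `detect` and `env_probe` are hypothetical names.

```python
import os
from typing import Callable, Iterable, Mapping, Optional, Tuple

# (probe name, confidence label, function returning a model id or None)
Probe = Tuple[str, str, Callable[[], Optional[str]]]


def detect(probes: Iterable[Probe]) -> Optional[Tuple[str, str, str]]:
    """Run probes in rank order; return (model, probe name, confidence) from the first that answers."""
    for name, confidence, probe in probes:
        try:
            model = probe()
        except (OSError, ValueError):
            continue  # unreadable file or malformed JSON: try the next layer
        if model:
            return model, name, confidence
    return None


def env_probe(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Lowest-ranked layer: model named in an environment variable."""
    for var in ("ANTHROPIC_MODEL", "OPENAI_MODEL"):
        if env.get(var):
            return env[var]
    return None
```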

9. Files

| Path | Purpose |
| --- | --- |
| scripts/_pricing.py | Router: picks LiteLLM / OpenRouter / static per query. |
| scripts/_sources.py | Three PricingSource classes + disk cache + circuit breaker. |
| scripts/tokenizer.py | Provider-aware token counting (tiktoken for OpenAI; calibrated heuristics for others). |
| scripts/model_canon.py | canonical_family(): collapses model variants. |
| scripts/code_audit.py | AST cost-risk walker. |
| scripts/usage_log.py | JSONL writer + rotation + aggregation. |
| scripts/tracker.py | SDK wrappers + streaming + budget enforcement. |
| scripts/http_capture.py | install_global_capture(): httpx transport hook. |
| scripts/openclaw_tailer.py | Watches OpenClaw sessions. |
| scripts/detect_model.py | Multi-layer detector. |
| scripts/errors.py | errors.jsonl writer + reader. |
| scripts/io_utils.py | write_json_atomic / read_json. |
| scripts/refresh_pricing.py | Regenerates static pricing.md from live sources. |
| scripts/cost_watchdog.py | Unified CLI dispatcher. |
| references/pricing.md | Static fallback (regenerated; ~2600 models). |
| tests/test_cost_watchdog.py | 73 tests: router, cache, AST, tokenizer, rotation, cassettes, circuit breaker, canonicalization. |

10. Quality checklist

  • [x] Live pricing from LiteLLM + OpenRouter, 24h-cached, with static fallback.
  • [x] Exact-match model lookup (no substring conflation).
  • [x] Multi-modal (token / image / second / query / page / character).
  • [x] Unit-aware alternatives (never compares tokens to images).
  • [x] AST-based code audit with line numbers.
  • [x] Provider-aware tokenization (no more tiktoken-for-Claude).
  • [x] Variance-based confidence (no += 0.05 theater).
  • [x] Atomic writes to all shared state files.
  • [x] fcntl.flock-guarded budget check-and-log (no race).
  • [x] Circuit breaker on flaky networks (no 5s hang per call).
  • [x] Streaming capture via SDK wrappers; streams flagged in errors.jsonl via HTTP capture.
  • [x] Daily log rotation + date-scoped aggregation.
  • [x] Canonical model families (variants collapse in reports).
  • [x] errors.jsonl surfaces silent failures; cost_watchdog errors shows them.
  • [x] Cassette tests for LiteLLM + OpenRouter parse paths (schema-drift safety net).
  • [x] 73 logic tests passing.

11. Known limits (be honest)

  • Tokenizer heuristics for Claude, Gemini, etc. are calibrated from documentation, not measured. When you have an API key, run cost_watchdog validate-tokens to check drift against the provider's authoritative count.
  • install_global_capture() can't see streaming responses: httpx exposes an empty body until the user reads the stream. Use track_openai / track_anthropic for stream coverage; http_capture logs skipped streams to errors.jsonl so the gap is visible.
  • Non-httpx SDKs (older Cohere, boto3 with a custom transport) need the per-SDK wrappers; HTTP capture won't see them.
  • LiteLLM community data can lag 24-48h on brand-new models. OpenRouter's API is truly live for anything it routes.
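"Drift" is most useful as a signed relative error between the heuristic estimate and the provider's count. This sketch uses a characters-per-token ratio of 3.5 as a placeholder, not the tool's calibrated value. How the authoritative count is fetched is left out. Both function names are hypothetical.

```python
import math


def heuristic_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Estimate tokens from character count; 3.5 is a placeholder ratio, not the tool's calibration."""
    return math.ceil(len(text) / chars_per_token) if text else 0


def token_drift(estimated: int, authoritative: int) -> float:
    """Signed relative error: +0.10 means the heuristic over-counts by 10%."""
    if authoritative <= 0:
        raise ValueError("authoritative count must be positive")
    return (estimated - authoritative) / authoritative
```

A signed value shows whether cost estimates run high or low, which matters more for budgeting than the magnitude alone.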

12. Testing

python3 -m unittest tests.test_cost_watchdog     # 73 tests
python3 scripts/code_audit.py test_risky_code.py # sample risks
python3 scripts/cost_watchdog.py report          # current spend summary

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-07 11:13 · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related

ai-agent

Self-Improving + Proactive Agent

ivangdavila
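Self-reflection + self-criticism + self-learning + self-organizing memory. The agent evaluates its own work, finds mistakes, and improves continuously.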
★ 1,383 📥 320,833
ai-agent

self-improving agent

pskoett
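Captures lessons learned, errors, and corrections to enable continuous improvement. Use when: (1) a command or operation fails unexpectedly; (2) the user corrects Claude (e.g. "No, that's not right…", "Actually…"); (3) the user requests a feature that doesn't exist; (4) an external API or tool fails; (5) Claude discovers its own…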
★ 4,085 📥 813,497
ai-agent

Skill Vetter

spclaudehome
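Pre-install security vetting for AI agent skills. Before you install a skill from ClawdHub, GitHub, or another source, it checks for risk signals, permission scope, and suspicious patterns.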
★ 1,230 📥 268,142