> Real-time cost tracking layer for LLM-based agents. Prices every call live,
> detects runaway loops in code, enforces budget ceilings mid-execution.
Observes LLM spend without disturbing the agent. Prevents $2,400-overnight-loop
disasters by making cost a first-class concern: priced at write time,
budgeted at check time, surfaced in reports.
Activate when:
/cost-watchdog [command] is invoked.Run via python3 scripts/cost_watchdog.py (or hook into your own CLI).
| Command | What it does |
|---|---|
| --- | --- |
session | Spend totals from usage.jsonl — calls, tokens, cost, top models. |
report | 24h / 7d / 30d windows with top model per window. |
tail [--once] | Watch OpenClaw session JSONL and log every assistant turn. |
detect [--json] | Identify which model the agent is currently using (5 probes). |
audit | AST-based code risk scan: unbounded loops, recursion, missing max_tokens. |
price | Live pricing for one model, with source + cache age. |
estimate | Project cost for n iterations of a given call. |
alternatives | Cheaper same-unit models. |
errors [--limit N] | Recent swallowed exceptions (silent failures made visible). |
validate-tokens | Compare our heuristic against provider's authoritative count. |
reset [--all] | Clear current-day log (--all also clears rolled files). |
openrouter/* → OpenRouter API (live) → static fallback
anything else → LiteLLM JSON (live, cached 24h) → OpenRouter (permissive) → static fallback
video, rerank, OCR, search modes.
Groq, Mistral, Cohere, DeepSeek, Perplexity, xAI, Bedrock, Azure, and more.
token, image, second, query, page, character, pixel. Alternatives never compare across units.
falls through to cache/static until the cool-down ends (60s).
| Env var | Default | Effect |
|---|---|---|
| --- | --- | --- |
CW_PRICE_TTL_SECONDS | 86400 (24h) | Cache lifetime. 0 = hit network every call. |
CW_OFFLINE | unset | If 1, never touch the network. |
CW_STATIC_ONLY | unset | If 1, skip live sources entirely. Used by tests. |
CW_LOG_DIR | ~/.cost-watchdog | Where usage/errors/cache files live. |
CW_BUDGET_USD | unset | Ceiling; wrappers raise BudgetExceeded when crossed. |
python3 scripts/refresh_pricing.py
Regenerates references/pricing.md from the live sources so the offline
fallback is fresh. Aborts if fewer than 100 rows came back (protects
against clobbering on a network outage).
Four independent paths, all write to ~/.cost-watchdog/usage.jsonl:
| Path | When to use | Covers streams? |
|---|---|---|
| --- | --- | --- |
openclaw_tailer.py --watch | Running OpenClaw. Zero code changes. | yes (reads completed turns) |
track_openai(client) | You call OpenAI-compatible SDK (covers OpenRouter, Groq, DeepSeek, Mistral, Together, Fireworks, Cerebras, Anyscale, ...). | yes (tee'd iterator, auto-injects stream_options={"include_usage": True}) |
track_anthropic(client) | Direct Anthropic SDK. | yes (wraps messages.stream()) |
track_gemini(model) / track_cohere(client) / track_bedrock(client) | Direct provider SDKs. | no (add wrappers if you need streams) |
install_global_capture() (httpx) | Any modern Python SDK using httpx. | no — streams are flagged into errors.jsonl so the gap is visible. Use the SDK wrappers for stream coverage. |
Usage log rotates daily: usage.YYYY-MM-DD.jsonl. session_total(since=...)
skips files outside the window before scanning.
Aggregation uses canonical_family() so
claude-haiku-4-5-20251001, claude-haiku-4-5, and claude-haiku-4.5
are one row in reports.
Two mechanisms:
append_usage(entry, budget_ceiling=X) takes an fcntl.flock on a sidecar, sums the current session, and
refuses the write (raises BudgetExceeded) if the call would cross X.
CW_BUDGET_USDafter logging and raise if over. Used when the wrapper doesn't know the
ceiling at call time.
Either path stops the agent mid-loop; the LLM call still returns to the
caller, but the next one blocks.
python3 scripts/cost_watchdog.py audit path/to/agent.py
Walks the AST and reports:
while True with an LLM call and no max_iterations-style bound.while that calls an API with no retry/iteration counter.max_tokens / max_completion_tokens.Every finding has a file line number. No more `count('def ') > 3 and
count('self.') > 5 → "recursion detected"` false positives.
python3 scripts/cost_watchdog.py detect
Five probe layers, ranked by confidence:
| Probe | Confidence |
|---|---|
| --- | --- |
| OpenClaw session JSONL | high |
| Claude Code session JSONL | high |
| Most recent usage-log entry | high |
Claude Code settings.json | medium |
Env vars (ANTHROPIC_MODEL, OPENAI_MODEL, ...) | medium |
Emits a table or --json.
| Path | Purpose |
|---|---|
| --- | --- |
scripts/_pricing.py | Router: picks LiteLLM / OpenRouter / static per query. |
scripts/_sources.py | Three PricingSource classes + disk cache + circuit breaker. |
scripts/tokenizer.py | Provider-aware token counting (tiktoken for OpenAI; calibrated heuristics for others). |
scripts/model_canon.py | canonical_family() — collapses model variants. |
scripts/code_audit.py | AST cost-risk walker. |
scripts/usage_log.py | JSONL writer + rotation + aggregation. |
scripts/tracker.py | SDK wrappers + streaming + budget enforcement. |
scripts/http_capture.py | install_global_capture() — httpx transport hook. |
scripts/openclaw_tailer.py | Watches OpenClaw sessions. |
scripts/detect_model.py | Multi-layer detector. |
scripts/errors.py | errors.jsonl writer + reader. |
scripts/io_utils.py | write_json_atomic / read_json. |
scripts/refresh_pricing.py | Regenerates static pricing.md from live sources. |
scripts/cost_watchdog.py | Unified CLI dispatcher. |
references/pricing.md | Static fallback (regenerated; ~2600 models). |
tests/test_cost_watchdog.py | 73 tests: router, cache, AST, tokenizer, rotation, cassettes, circuit breaker, canonicalization. |
+= 0.05 theater).fcntl.flock-guarded budget check-and-log (no race).errors.jsonl via HTTP capture.errors.jsonl surfaces silent failures; cost_watchdog errors shows them. not measured. Run cost_watchdog validate-tokens to check drift
against the provider's authoritative count when you have an API key.
install_global_capture() can't see streaming responses — httpx exposes an empty body until the user reads the stream. Use track_openai /
track_anthropic for stream coverage; http_capture logs skipped streams
to errors.jsonl so the gap is visible.
per-SDK wrappers — HTTP capture won't see them.
API is truly live for anything it routes.
python3 -m unittest tests.test_cost_watchdog # 73 tests
python3 scripts/code_audit.py test_risky_code.py # sample risks
python3 scripts/cost_watchdog.py report # current spend summary
共 1 个版本