Comprehensive toolkit for reducing token usage and API costs in OpenClaw deployments. Combines smart model routing, optimized heartbeat intervals, usage tracking, and multi-provider strategies.
This skill can be paired with RTK for additional savings on shell-heavy workflows.
git diff, git status, rg, and test runner output.Recommended pattern:
python3 scripts/model_router.py "review this diff and summarize failures"
rtk git diff
rtk cargo test
python3 scripts/token_tracker.py check
See references/RTK.md for the companion-tool workflow.
Immediate actions (no config changes needed):
```bash
python3 scripts/context_optimizer.py generate-agents
# Creates AGENTS.md.optimized — review and replace your current AGENTS.md
```
```bash
python3 scripts/context_optimizer.py recommend "hi, how are you?"
# Shows: Only 2 files needed (not 50+!)
```
```bash
cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
```
```bash
python3 scripts/model_router.py "thanks!"
# Single-provider Anthropic setup: Use Sonnet, not Opus
# Multi-provider setup (OpenRouter/Together): Use Haiku for max savings
```
```bash
python3 scripts/token_tracker.py check
```
Expected savings: 50-80% reduction in token costs for typical workloads (context optimization is the biggest factor!).
Biggest token saver — Only load files you actually need, not everything upfront.
Problem: Default OpenClaw loads ALL context files every session:
Solution: Lazy loading based on prompt complexity.
Usage:
python3 scripts/context_optimizer.py recommend "<user prompt>"
Examples:
# Simple greeting → minimal context (2 files only!)
context_optimizer.py recommend "hi"
→ Load: SOUL.md, IDENTITY.md
→ Skip: Everything else
→ Savings: ~80% of context
# Standard work → selective loading
context_optimizer.py recommend "write a function"
→ Load: SOUL.md, IDENTITY.md, memory/TODAY.md
→ Skip: docs, old memory, knowledge base
→ Savings: ~50% of context
# Complex task → full context
context_optimizer.py recommend "analyze our entire architecture"
→ Load: SOUL.md, IDENTITY.md, MEMORY.md, memory/TODAY+YESTERDAY.md
→ Conditionally load: Relevant docs only
→ Savings: ~30% of context
Output format:
{
"complexity": "simple",
"context_level": "minimal",
"recommended_files": ["SOUL.md", "IDENTITY.md"],
"file_count": 2,
"savings_percent": 80,
"skip_patterns": ["docs/**/*.md", "memory/20*.md"]
}
Integration pattern:
Before loading context for a new session:
from context_optimizer import recommend_context_bundle
user_prompt = "thanks for your help"
recommendation = recommend_context_bundle(user_prompt)
if recommendation["context_level"] == "minimal":
# Load only SOUL.md + IDENTITY.md
# Skip everything else
# Save ~80% tokens!
Generate optimized AGENTS.md:
context_optimizer.py generate-agents
# Creates AGENTS.md.optimized with lazy loading instructions
# Review and replace your current AGENTS.md
Expected savings: 50-80% reduction in context tokens.
Automatically classify tasks and route to appropriate model tiers.
NEW: Communication pattern enforcement — Never waste Opus tokens on "hi" or "thanks"!
Usage:
python3 scripts/model_router.py "<user prompt>" [current_model] [force_tier]
Examples:
# Communication (NEW!) → ALWAYS Haiku
python3 scripts/model_router.py "thanks!"
python3 scripts/model_router.py "hi"
python3 scripts/model_router.py "ok got it"
→ Enforced: Haiku (NEVER Sonnet/Opus for casual chat)
# Simple task → suggests Haiku
python3 scripts/model_router.py "read the log file"
# Medium task → suggests Sonnet
python3 scripts/model_router.py "write a function to parse JSON"
# Complex task → suggests Opus
python3 scripts/model_router.py "design a microservices architecture"
Patterns enforced to Haiku (NEVER Sonnet/Opus):
Communication:
Background tasks:
Integration pattern:
from model_router import route_task
user_prompt = "show me the config"
routing = route_task(user_prompt)
if routing["should_switch"]:
# Use routing["recommended_model"]
# Save routing["cost_savings_percent"]
Customization:
Edit ROUTING_RULES or COMMUNICATION_PATTERNS in scripts/model_router.py to adjust patterns and keywords.
Reduce API calls from heartbeat polling with smart interval tracking:
Setup:
# Copy template to workspace
cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
# Plan which checks should run
python3 scripts/heartbeat_optimizer.py plan
Commands:
# Check if specific type should run now
heartbeat_optimizer.py check email
heartbeat_optimizer.py check calendar
# Record that a check was performed
heartbeat_optimizer.py record email
# Update check interval (seconds)
heartbeat_optimizer.py interval email 7200 # 2 hours
# Reset state
heartbeat_optimizer.py reset
How it works:
HEARTBEAT_OK when nothing needs attention (saves tokens)Default intervals:
Integration in HEARTBEAT.md:
## Email Check
Run only if: `heartbeat_optimizer.py check email` → `should_check: true`
After checking: `heartbeat_optimizer.py record email`
Expected savings: 50% reduction in heartbeat API calls.
Model enforcement: Heartbeat should ALWAYS use Haiku — see updated HEARTBEAT.template.md for model override instructions.
Problem: Cronjobs often default to expensive models (Sonnet/Opus) even for routine tasks.
Solution: Always specify Haiku for 90% of scheduled tasks.
See: assets/cronjob-model-guide.md for comprehensive guide with examples.
Quick reference:
| Task Type | Model | Example |
|---|---|---|
| ----------- | ------- | --------- |
| Monitoring/alerts | Haiku | Check server health, disk space |
| Data parsing | Haiku | Extract CSV/JSON/logs |
| Reminders | Haiku | Daily standup, backup reminders |
| Simple reports | Haiku | Status summaries |
| Content generation | Sonnet | Blog summaries (quality matters) |
| Deep analysis | Sonnet | Weekly insights |
| Complex reasoning | Never use Opus for cronjobs |
Example (good):
# Parse daily logs with Haiku
cron add --schedule "0 2 * * *" \
--payload '{
"kind":"agentTurn",
"message":"Parse yesterday error logs and summarize",
"model":"anthropic/claude-haiku-4"
}' \
--sessionTarget isolated
Example (bad):
# ❌ Using Opus for simple check (60x more expensive!)
cron add --schedule "*/15 * * * *" \
--payload '{
"kind":"agentTurn",
"message":"Check email",
"model":"anthropic/claude-opus-4"
}' \
--sessionTarget isolated
Savings: Using Haiku instead of Opus for 10 daily cronjobs = $17.70/month saved per agent.
Integration with model_router:
# Test if your cronjob should use Haiku
model_router.py "parse daily error logs"
# → Output: Haiku (background task pattern detected)
Monitor usage and alert when approaching limits:
Setup:
# Check current daily usage
python3 scripts/token_tracker.py check
# Get model suggestions
python3 scripts/token_tracker.py suggest general
# Reset daily tracking
python3 scripts/token_tracker.py reset
Output format:
{
"date": "2026-02-06",
"cost": 2.50,
"tokens": 50000,
"limit": 5.00,
"percent_used": 50,
"status": "ok",
"alert": null
}
Status levels:
ok: Below 80% of daily limitwarning: 80-99% of daily limitexceeded: Over daily limitIntegration pattern:
Before starting expensive operations, check budget:
import json
import subprocess
result = subprocess.run(
["python3", "scripts/token_tracker.py", "check"],
capture_output=True, text=True
)
budget = json.loads(result.stdout)
if budget["status"] == "exceeded":
# Switch to cheaper model or defer non-urgent work
use_model = "anthropic/claude-haiku-4"
elif budget["status"] == "warning":
# Use balanced model
use_model = "anthropic/claude-sonnet-4-5"
Customization:
Edit daily_limit_usd and warn_threshold parameters in function calls.
See references/PROVIDERS.md for comprehensive guide on:
Quick reference:
| Provider | Model | Cost/MTok | Use Case |
|---|---|---|---|
| ---------- | ------- | ----------- | ---------- |
| Anthropic | Haiku 4 | $0.25 | Simple tasks |
| Anthropic | Sonnet 4.5 | $3.00 | Balanced default |
| Anthropic | Opus 4 | $15.00 | Complex reasoning |
| OpenRouter | Gemini 2.5 Flash | $0.075 | Bulk operations |
| Google AI | Gemini 2.0 Flash Exp | FREE | Dev/testing |
| Together | Llama 3.3 70B | $0.18 | Open alternative |
See assets/config-patches.json for advanced optimizations:
Implemented by this skill:
Native OpenClaw 2026.2.15 — apply directly:
contextPruning: cache-ttl) — auto-trims old tool results after Anthropic cache TTL expiresbootstrapMaxChars / bootstrapTotalMaxChars) — caps workspace file injection sizecacheRetention: "long" for Opus) — amortizes cache write costsRequires OpenClaw core support:
context_optimizer.py script today)Apply config patches:
# Example: Enable multi-provider fallback
gateway config.patch --patch '{"providers": [...]}'
OpenClaw 2026.2.15 added built-in commands that complement this skill's Python scripts. Use these first for quick diagnostics before reaching for the scripts.
/context list → token count per injected file (shows exactly what's eating your prompt)
/context detail → full breakdown including tools, skills, and system prompt sections
Use before applying bootstrap_size_limits — see which files are oversized, then set bootstrapMaxChars accordingly.
/usage tokens → append token count to every reply
/usage full → append tokens + cost estimate to every reply
/usage cost → show cumulative cost summary from session logs
/usage off → disable usage footer
Combine with token_tracker.py — /usage cost gives session totals; token_tracker.py tracks daily budget.
/status → model, context %, last response tokens, estimated cost
The problem: Anthropic charges ~3.75x more for cache writes than cache reads. If your agent goes idle and the 1h cache TTL expires, the next request re-writes the entire prompt cache — expensive.
The fix: Set heartbeat interval to 55min (just under the 1h TTL). The heartbeat keeps the cache warm, so every subsequent request pays cache-read rates instead.
# Get optimal interval for your cache TTL
python3 scripts/heartbeat_optimizer.py cache-ttl
# → recommended_interval: 55min (3300s)
# → explanation: keeps 1h Anthropic cache warm
# Custom TTL (e.g., if you've configured 2h cache)
python3 scripts/heartbeat_optimizer.py cache-ttl 7200
# → recommended_interval: 115min
Apply to your OpenClaw config:
{
"agents": {
"defaults": {
"heartbeat": {
"every": "55m"
}
}
}
}
Who benefits: Anthropic API key users only. OAuth profiles already default to 1h heartbeat (OpenClaw smart default). API key profiles default to 30min — bumping to 55min is both cheaper (fewer calls) and cache-warm.
HEARTBEAT.mdExpected savings: 20-30%
Expected savings: 40-60%
Expected savings: 70-90%
# 1. User sends message
user_msg="debug this error in the logs"
# 2. Route to appropriate model
routing=$(python3 scripts/model_router.py "$user_msg")
model=$(echo $routing | jq -r .recommended_model)
# 3. Check budget before proceeding
budget=$(python3 scripts/token_tracker.py check)
status=$(echo $budget | jq -r .status)
if [ "$status" = "exceeded" ]; then
# Use cheapest model regardless of routing
model="anthropic/claude-haiku-4"
fi
# 4. Process with selected model
# (OpenClaw handles this via config or override)
## HEARTBEAT.md
# Plan what to check
result=$(python3 scripts/heartbeat_optimizer.py plan)
should_run=$(echo $result | jq -r .should_run)
if [ "$should_run" = "false" ]; then
echo "HEARTBEAT_OK"
exit 0
fi
# Run only planned checks
planned=$(echo $result | jq -r '.planned[].type')
for check in $planned; do
case $check in
email) check_email ;;
calendar) check_calendar ;;
esac
python3 scripts/heartbeat_optimizer.py record $check
done
Issue: Scripts fail with "module not found"
Issue: State files not persisting
~/.openclaw/workspace/memory/ directory exists and is writable.Issue: Budget tracking shows $0.00
token_tracker.py needs integration with OpenClaw's session_status tool. Currently tracks manually recorded usage.Issue: Routing suggests wrong model tier
ROUTING_RULES in model_router.py for your specific patterns.Daily:
token_tracker.py checkWeekly:
Monthly:
PROVIDERS.md with new optionsExample: 100K tokens/day workload
Without skill:
| Strategy | Context | Model | Daily Cost | Monthly | Savings |
|---|---|---|---|---|---|
| ---------- | --------- | ------- | ----------- | --------- | --------- |
| Baseline (no optimization) | 50K | Sonnet | $0.30 | $9.00 | 0% |
| Context opt only | 10K (-80%) | Sonnet | $0.18 | $5.40 | 40% |
| Model routing only | 50K | Mixed | $0.18 | $5.40 | 40% |
| Both (this skill) | 10K | Mixed | $0.09 | $2.70 | 70% |
| Aggressive + Gemini | 10K | Gemini | $0.03 | $0.90 | 90% |
Key insight: Context optimization (50K → 10K tokens) saves MORE than model routing!
xCloud hosting scenario (100 customers, 50K tokens/customer/day):
context_optimizer.py — Context loading optimization and lazy loading (NEW!)model_router.py — Task classification, model suggestions, and communication enforcement (ENHANCED!)heartbeat_optimizer.py — Interval management and check schedulingtoken_tracker.py — Budget monitoring and alertsPROVIDERS.md — Alternative AI providers, pricing, and routing strategiesRTK.md — Companion workflow for shrinking shell command output with RTKHEARTBEAT.template.md — Drop-in optimized heartbeat template with Haiku enforcement (ENHANCED!)cronjob-model-guide.md — Complete guide for choosing models in cronjobs (NEW!)config-patches.json — Advanced configuration examplesIdeas for extending this skill:
共 1 个版本