
proxy-token-optimizer

Optimize LLM token usage and API costs for the openclaw-manager proxy platform. Provides model-tier routing (route simple prompts to glm-4.7-flashx instead o...
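Optimizes LLM token usage and API costs for the openclaw-manager proxy platform, providing model-tier routing (simple queries go to glm-4.7-flashx rather than higher-cost models).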
whyhit2005

Overview

Proxy Token Optimizer

Reduces LLM API costs for the openclaw-manager multi-tenant proxy platform through four strategies:

  1. Model-tier routing — Route prompts to the cheapest capable model
  2. Heartbeat optimization — Cheapest model + longer intervals for heartbeat calls
  3. Context lazy loading — Load only the context files each prompt actually needs
  4. Platform usage analytics — Real data from PostgreSQL, not estimates

Why these strategies matter

The openclaw-manager platform proxies LLM requests for multiple OpenClaw instances through providers like zai-proxy, zai-coding-proxy, and kimi-coding-proxy. Each provider offers models at different price points (e.g., glm-4.7 vs glm-4.7-flashx). Without optimization, every request — including simple greetings and heartbeat pings — uses the default (expensive) model, and every session loads the full context regardless of need. These four strategies target the highest-impact cost drivers.

Quick start

All instance-side scripts run locally with no dependencies. Platform-side scripts need DB access.

# Model routing — which model should handle this prompt?
python3 scripts/model_router.py "thanks!"
# → {"tier": "cheap", "recommended_model": "zai-proxy/glm-4.7-flashx"}

# Context optimization — which files does this prompt need?
python3 scripts/context_optimizer.py recommend "hi"
# → {"context_level": "minimal", "recommended_files": ["SOUL.md", "IDENTITY.md"]}

# Heartbeat config — generate openclaw.json patch
python3 scripts/heartbeat_config.py patch
# → {"agents": {"defaults": {"heartbeat": {"every": "55m", "model": "zai-proxy/glm-4.7-flashx"}}}}

# Unified CLI (all commands in one place)
python3 scripts/cli.py --help

Scripts reference

Instance-side (pure local, no network, no DB)

scripts/model_router.py

Routes prompts to the right model tier based on complexity analysis.

Tier logic:

  • cheap → glm-4.7-flashx: Greetings, acknowledgments, heartbeats, cron jobs, log parsing. Roughly 5-10x cheaper than the standard tier.
  • standard → glm-4.7: Code writing, debugging, explanations. Default for unclear prompts.
  • premium → glm-4.7 (or k2p5 for kimi): Architecture design, deep analysis, strategy planning.

Supports Chinese and English patterns. Provider-aware — works with zai-proxy, zai-coding-proxy, and kimi-coding-proxy.

python3 scripts/model_router.py "<prompt>" [provider]
python3 scripts/model_router.py compare  # show all provider models
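
The tier logic above can be illustrated with a small pattern-based classifier. This is a minimal sketch, not the code in scripts/model_router.py: the regex patterns, the `route` function name, and the provider-prefixed name `zai-proxy/glm-4.7` are assumptions made for illustration. Only the tier names and the flashx model string come from this document.

```python
import json
import re

# Hypothetical patterns; the real script's rules are not shown in this document.
CHEAP_PATTERNS = [
    r"^(hi|hello|hey|thanks|thank you|ok|okay)\b",
    r"^(你好|谢谢|好的)",
    r"\b(heartbeat|cron|parse logs?)\b",
]
PREMIUM_PATTERNS = [
    r"\b(architecture|deep analysis|strategy)\b",
    r"(架构|深度分析|战略)",
]

# Tier-to-model mapping for a single provider (assumed naming scheme).
MODELS = {
    "cheap": "zai-proxy/glm-4.7-flashx",
    "standard": "zai-proxy/glm-4.7",
    "premium": "zai-proxy/glm-4.7",
}


def route(prompt: str) -> dict:
    """Pick a tier: premium patterns win, then cheap, else standard."""
    text = prompt.strip().lower()
    if any(re.search(p, text) for p in PREMIUM_PATTERNS):
        tier = "premium"
    elif any(re.search(p, text) for p in CHEAP_PATTERNS):
        tier = "cheap"
    else:
        tier = "standard"  # default for unclear prompts
    return {"tier": tier, "recommended_model": MODELS[tier]}


if __name__ == "__main__":
    print(json.dumps(route("thanks!")))
```

Checking premium patterns first means a greeting followed by a hard request ("hi, design the architecture for ...") is not downgraded to the cheap tier.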

scripts/context_optimizer.py

Analyzes prompt complexity to recommend which context files to load, reducing unnecessary token consumption.

Context levels:

Level    | When                                | Files loaded                    | Token savings
---------|-------------------------------------|---------------------------------|--------------
minimal  | "hi", "thanks", short msgs          | SOUL.md + IDENTITY.md (2)       | ~80%
standard | "write a function", normal work     | + memory/TODAY.md + conditional | ~50%
full     | "design architecture", complex tasks | + MEMORY.md + all conditional   | ~30%

Also generates an optimized AGENTS.md template with lazy-loading rules baked in:

python3 scripts/context_optimizer.py recommend "<prompt>"
python3 scripts/context_optimizer.py generate-agents  # creates AGENTS.md.optimized
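
As a rough illustration of how a prompt could map to a context level, the sketch below uses a keyword list and a word-count cutoff. Both heuristics, the `recommend` function name, and the constants are assumptions. The file sets follow the context-levels table, with the project-specific "conditional" files left out.

```python
# File sets per level, following the context-levels table.
# "Conditional" files are project-specific and omitted from this sketch.
LEVEL_FILES = {
    "minimal": ["SOUL.md", "IDENTITY.md"],
    "standard": ["SOUL.md", "IDENTITY.md", "memory/TODAY.md"],
    "full": ["SOUL.md", "IDENTITY.md", "memory/TODAY.md", "MEMORY.md"],
}

# Hypothetical heuristics: the keywords and the cutoff are assumptions.
FULL_KEYWORDS = ("architecture", "design", "strategy", "analyze")
MINIMAL_MAX_WORDS = 2


def recommend(prompt: str) -> dict:
    """Return a context level and the files to load for a prompt."""
    text = prompt.lower()
    if any(keyword in text for keyword in FULL_KEYWORDS):
        level = "full"
    elif len(text.split()) <= MINIMAL_MAX_WORDS:
        level = "minimal"
    else:
        level = "standard"
    return {"context_level": level, "recommended_files": LEVEL_FILES[level]}


if __name__ == "__main__":
    print(recommend("hi"))
```

Each level is a superset of the one below it, so a misclassification toward "standard" costs tokens but never drops a file that "minimal" would have loaded.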

scripts/heartbeat_config.py

Generates openclaw.json configuration patches for heartbeat optimization:

  • Forces the heartbeat model to glm-4.7-flashx (cheapest available)
  • Sets the interval to 55 minutes, just inside the 1-hour prompt-cache TTL, so the cache stays warm and the cache rebuild cost is avoided

python3 scripts/heartbeat_config.py recommend [cache_ttl_minutes]
python3 scripts/heartbeat_config.py patch  # output JSON patch for openclaw.json
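
The 55-minute figure is the 1-hour cache TTL minus a safety margin. A minimal sketch of that calculation follows, assuming a 5-minute margin and a hypothetical `build_patch` helper; the output shape mirrors the patch shown in the quick start.

```python
import json


def build_patch(cache_ttl_minutes: int = 60,
                margin_minutes: int = 5,
                model: str = "zai-proxy/glm-4.7-flashx") -> dict:
    """Build an openclaw.json heartbeat patch.

    The interval is the cache TTL minus a safety margin (the 5-minute
    margin is an assumption) so each heartbeat lands before the cache expires.
    """
    if cache_ttl_minutes <= margin_minutes:
        raise ValueError("cache TTL must exceed the safety margin")
    interval = cache_ttl_minutes - margin_minutes
    return {
        "agents": {
            "defaults": {
                "heartbeat": {"every": f"{interval}m", "model": model}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_patch(), indent=2))
```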

Platform-side (requires DB connection)

These scripts query the usage_records PostgreSQL table for real data. Run from the openclaw-manager project root with the virtualenv activated.

scripts/usage_report.py

Generates usage reports from actual database records — not estimates.

python3 scripts/usage_report.py overview [days]     # platform-wide summary
python3 scripts/usage_report.py instance <name> [days]  # single instance detail

Overview includes: total calls/tokens, per-provider breakdown, per-model breakdown, top 10 instances by consumption, 7-day daily trend.

Instance report includes: per-model distribution, daily trend, lifetime totals.
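
The per-model breakdown is, in essence, a GROUP BY over usage_records. The sketch below shows that aggregation against an in-memory SQLite table so it runs without a database server. The column names (`instance_name`, `model`, `total_tokens`) and the demo rows are invented for illustration; the real PostgreSQL schema is not shown in this document and may differ.

```python
import sqlite3


def per_model_breakdown(conn: sqlite3.Connection) -> list:
    """Aggregate call counts and token totals per model, largest first."""
    rows = conn.execute(
        """
        SELECT model, COUNT(*) AS calls, SUM(total_tokens) AS tokens
        FROM usage_records
        GROUP BY model
        ORDER BY tokens DESC
        """
    ).fetchall()
    return [{"model": m, "calls": c, "tokens": t} for m, c, t in rows]


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory stand-in table with made-up rows (not real usage)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usage_records "
        "(instance_name TEXT, model TEXT, total_tokens INTEGER)"
    )
    conn.executemany(
        "INSERT INTO usage_records VALUES (?, ?, ?)",
        [
            ("alpha", "glm-4.7", 1200),
            ("alpha", "glm-4.7-flashx", 150),
            ("beta", "glm-4.7", 800),
            ("beta", "glm-4.7-flashx", 90),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in per_model_breakdown(demo_connection()):
        print(row)
```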

scripts/quota_advisor.py

Compares actual 24-hour usage against quota plan limits to find mismatches:

  • Wasteful: Usage below 20% of plan limit → suggest downgrade
  • Throttled: Usage above 80% of plan limit → suggest upgrade

python3 scripts/quota_advisor.py analyze  # check all instances
python3 scripts/quota_advisor.py plans    # show available quota plans
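
The two thresholds reduce to a comparison of the usage-to-limit ratio. A minimal sketch follows, assuming a hypothetical `advise` function and treating values exactly at 20% or 80% as a fit; how the real script handles those boundaries is not stated.

```python
def advise(used: float, limit: float,
           low: float = 0.20, high: float = 0.80) -> str:
    """Classify 24-hour usage against a plan limit.

    Returns "downgrade" below the low threshold, "upgrade" above the
    high threshold, and "ok" otherwise (boundary values count as "ok").
    """
    if limit <= 0:
        raise ValueError("plan limit must be positive")
    ratio = used / limit
    if ratio < low:
        return "downgrade"  # wasteful: paying for unused capacity
    if ratio > high:
        return "upgrade"    # throttled: close to or over the limit
    return "ok"


if __name__ == "__main__":
    print(advise(100, 1000))
    print(advise(900, 1000))
```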

Unified CLI

scripts/cli.py wraps all the above into a single entry point:

python3 scripts/cli.py route "<prompt>"       # model routing
python3 scripts/cli.py context "<prompt>"     # context recommendation
python3 scripts/cli.py generate-agents        # generate AGENTS.md
python3 scripts/cli.py heartbeat              # heartbeat config
python3 scripts/cli.py overview [days]        # platform usage (needs DB)
python3 scripts/cli.py report <name> [days]   # instance report (needs DB)
python3 scripts/cli.py advisor                # quota advice (needs DB)

Project integration points

This skill works with existing openclaw-manager infrastructure:

Component       | File                           | How this skill uses it
----------------|--------------------------------|------------------------------------------------
Provider config | config/model.yaml              | Model names/endpoints for routing
Proxy routing   | config_service.py              | Where _inject_proxy_providers() registers models
Usage recording | proxy_common/usage_recorder.py | Source of real usage data
Quota plans     | config/llm_proxy.yaml          | Plan definitions for quota advisor
Instance model  | app/models.py                  | Instance metadata for reports

Expected savings

Strategy                 | Mechanism                | Impact
-------------------------|--------------------------|---------------------------------
Context lazy loading     | Fewer tokens per request | 50-80% context reduction
Model routing (flashx)   | Lower per-token price    | 5-10x on simple tasks
Heartbeat → flashx       | Lower heartbeat cost     | Significant per-instance savings
Heartbeat interval 55min | Fewer API calls          | ~45% fewer heartbeat calls
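
The table does not state the baseline interval behind the ~45% figure. The number is consistent with moving from a 30-minute interval to 55 minutes, so the short calculation below assumes a 30-minute baseline. The `call_reduction` helper is hypothetical.

```python
def call_reduction(baseline_minutes: float, new_minutes: float) -> float:
    """Fraction of heartbeat calls avoided when the interval is lengthened.

    Calls per day scale as 1 / interval, so the saving is
    1 - baseline / new.
    """
    if baseline_minutes <= 0 or new_minutes <= 0:
        raise ValueError("intervals must be positive")
    return 1 - baseline_minutes / new_minutes


if __name__ == "__main__":
    # Assumed baseline: 30 minutes (not stated in this document).
    print(f"{call_reduction(30, 55):.1%}")  # prints 45.5%
```

With a different baseline the saving changes accordingly; for example, a 15-minute baseline would give a reduction of roughly 73%.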

Version history

1 version in total

  • v1.0.1 (current)
    2026-05-01 22:28

