← 返回
未分类

Model Throughput Tester

Automation skill for Model Throughput Tester.
模型吞吐量测试的自动化技能
tsag1 tsag1 来源
未分类 clawhub v1.0.6 2 版本 99295.8 Key: 无需
★ 1
Stars
📥 121
下载
💾 1
安装
2
版本
#latest

概述

Model Throughput Tester

Benchmark LLM model throughput (tokens/s). Two modes available:

  • Auto Mode: Test current model via openclaw infer model run, no API key required
  • API Mode: Direct call to OpenAI-compatible API, requires URL and Key

When to Use

Use when: User explicitly requests a model throughput test.

Trigger words:

  • throughput test, tokens/s, speed test, benchmark
  • model speed, latency test, model test

Do NOT trigger: Broad performance discussion terms (e.g. "model performance", standalone "benchmark") should not auto-trigger execution.

Auto Mode (no API key):

python3 throughput.py --auto --model "<current session model>"

Core Features

1. Auto Mode (No Key, Recommended)

Auto-detects the current session model and benchmarks throughput, zero configuration needed.

python3 throughput.py --auto

Test a specific model:

python3 throughput.py --auto --model "zai/glm-5-turbo"

2. API Mode (Direct API Call)

python3 throughput.py \
  --url https://api.example.com/v1 \
  --key sk-xxx \
  --models gpt-4o-mini,gpt-4o

3. Common Parameters

ParameterDefaultDescription
---------------------------------
--iterations3Test iterations per model
--max-tokens512Max output tokens
--test-promptEnglish prose (summer field)Test prompt
--timeout60Single request timeout (seconds)
--outputthroughput-report.mdOutput report filename
--csvfalseAlso generate CSV

Workflow

Auto Mode Flow

1. Read current session model from openclaw.json (provider/model)
2. Send test prompt via openclaw infer model run
3. Timer: command start → output complete
4. Estimate token count from response text (English: 0.75 word/token, Chinese: 1.5 chars/token)
5. Calculate tokens/s
6. Generate summary report

API Mode Flow

1. Build /v1/chat/completions request
2. Timer: request start → last token received
3. Extract usage.completion_tokens from response (precise)
4. Calculate tokens/s, error rate
5. Generate summary report

Metrics

MetricDescription
---------------------
Tokens/sThroughput = Output Tokens / Elapsed Time
Avg LatencyAverage single request latency
Avg Output TokensAverage output token count
Error RateFailed request ratio

Output Example

# Model Throughput Report
**Mode:** Auto (openclaw infer)
**Iterations:** 3

## Summary
| Model | Avg Tokens/s | Avg Latency(s) | Avg Output Tokens | Error Rate |
|-------|-------------|----------------|-------------------|------------|
| zai/glm-5-turbo | 57.9 | 20.6 | 979.0 | 0.0% |

## Detail
### zai/glm-5-turbo
| Iter | Latency(s) | Output Tokens | Tokens/s | Status |
|------|------------|--------------|---------|--------|
| 1 | 19.5 | 950 | 48.7 | ✅ |
| 2 | 21.3 | 1010 | 47.4 | ✅ |
| 3 | 20.9 | 977 | 46.7 | ✅ |

Error Handling

ScenarioAuto ModeAPI Mode
-------------------------------
openclaw not installedcli_error
Model not foundapi_errorhttp_404
Network timeouttimeouttimeout
Token estimationEnglish 0.75 word/token, Chinese 1.5 chars/tokenPrecise from API

Usage Examples

Quick Test After Install (Auto Mode)

python3 ~/.openclaw/workspace/skills/model-throughput-tester/throughput.py --auto --model "<current session model>"

# Or auto-detect (may not match session override)
python3 ~/.openclaw/workspace/skills/model-throughput-tester/throughput.py --auto

Test Multiple Models (API Mode)

python3 throughput.py \
  --url "https://api.openai.com/v1" \
  --key "sk-xxx" \
  --models "gpt-4o-mini,gpt-4o" \
  --iterations 5

Custom Prompt

python3 throughput.py --auto \
  --test-prompt "Explain quantum computing in detail." \
  --iterations 5

Technical Details

  • Auto Mode: openclaw infer model run --json, Python subprocess call
  • API Mode: urllib (Python built-in), OpenAI-compatible /v1/chat/completions
  • Timer Precision: time.perf_counter() nanosecond-level
  • Token Counting: API mode uses usage.completion_tokens (precise), Auto mode estimates by character count
  • URL Handling: Smart detection of /v1, /v4, /chat/completions paths

Notes

  • Auto mode throughput includes gateway routing overhead, slightly lower than direct API (~1-3%)
  • Auto mode token count is estimated, API mode is precise
  • English prompts recommended for more accurate token estimation
  • Anti-cache: random seed suffix appended to each iteration

版本历史

共 2 个版本

  • v1.0.6 当前
    2026-06-11 18:18
  • v1.0.5
    2026-06-09 18:56

安全检测

腾讯云安全 (Keen)

队列中

腾讯云安全 (Sanbu)

队列中

🔗 相关推荐

security-compliance

Skill Vetter

spclaudehome
AI智能体技能安全预审工具。安装ClawdHub、GitHub等来源技能前,检查风险信号、权限范围及可疑模式。
★ 1,219 📥 266,832
ai-intelligence

self-improving agent

pskoett
捕获经验教训、错误和纠正,以实现持续改进。使用时机:(1)命令或操作意外失败;(2)用户纠正……
★ 4,062 📥 799,775
developer-tools

Github

steipete
使用 `gh` CLI 与 GitHub 交互,通过 `gh issue`、`gh pr`、`gh run` 和 `gh api` 管理议题、PR、CI 运行及高级查询。
★ 672 📥 324,503