概述

Model Throughput Tester

Benchmark LLM model throughput (tokens/s). Two modes available:

Auto Mode: Test current model via openclaw infer model run, no API key required
API Mode: Direct call to OpenAI-compatible API, requires URL and Key

When to Use

Use when: User explicitly requests a model throughput test.

Trigger words:

throughput test, tokens/s, speed test, benchmark
model speed, latency test, model test

Do NOT trigger: Broad performance discussion terms (e.g. "model performance", standalone "benchmark") should not auto-trigger execution.

Auto Mode (no API key):

python3 throughput.py --auto --model "<current session model>"

Core Features

1. Auto Mode (No Key, Recommended)

Auto-detects the current session model and benchmarks throughput, zero configuration needed.

python3 throughput.py --auto

Test a specific model:

python3 throughput.py --auto --model "zai/glm-5-turbo"

2. API Mode (Direct API Call)

python3 throughput.py \
  --url https://api.example.com/v1 \
  --key sk-xxx \
  --models gpt-4o-mini,gpt-4o

3. Common Parameters

Parameter	Default	Description
-----------	---------	-------------
`--iterations`	`3`	Test iterations per model
`--max-tokens`	`512`	Max output tokens
`--test-prompt`	English prose (summer field)	Test prompt
`--timeout`	`60`	Single request timeout (seconds)
`--output`	`throughput-report.md`	Output report filename
`--csv`	false	Also generate CSV

Workflow

Auto Mode Flow

1. Read current session model from openclaw.json (provider/model)
2. Send test prompt via openclaw infer model run
3. Timer: command start → output complete
4. Estimate token count from response text (English: 0.75 word/token, Chinese: 1.5 chars/token)
5. Calculate tokens/s
6. Generate summary report

API Mode Flow

1. Build /v1/chat/completions request
2. Timer: request start → last token received
3. Extract usage.completion_tokens from response (precise)
4. Calculate tokens/s, error rate
5. Generate summary report

Metrics

Metric	Description
--------	-------------
Tokens/s	Throughput = Output Tokens / Elapsed Time
Avg Latency	Average single request latency
Avg Output Tokens	Average output token count
Error Rate	Failed request ratio

Output Example

# Model Throughput Report
**Mode:** Auto (openclaw infer)
**Iterations:** 3

## Summary
| Model | Avg Tokens/s | Avg Latency(s) | Avg Output Tokens | Error Rate |
|-------|-------------|----------------|-------------------|------------|
| zai/glm-5-turbo | 57.9 | 20.6 | 979.0 | 0.0% |

## Detail
### zai/glm-5-turbo
| Iter | Latency(s) | Output Tokens | Tokens/s | Status |
|------|------------|--------------|---------|--------|
| 1 | 19.5 | 950 | 48.7 | ✅ |
| 2 | 21.3 | 1010 | 47.4 | ✅ |
| 3 | 20.9 | 977 | 46.7 | ✅ |

Error Handling

Scenario	Auto Mode	API Mode
----------	-----------	----------
openclaw not installed	cli_error	—
Model not found	api_error	http_404
Network timeout	timeout	timeout
Token estimation	English 0.75 word/token, Chinese 1.5 chars/token	Precise from API

Usage Examples

Quick Test After Install (Auto Mode)

python3 ~/.openclaw/workspace/skills/model-throughput-tester/throughput.py --auto --model "<current session model>"

# Or auto-detect (may not match session override)
python3 ~/.openclaw/workspace/skills/model-throughput-tester/throughput.py --auto

Test Multiple Models (API Mode)

python3 throughput.py \
  --url "https://api.openai.com/v1" \
  --key "sk-xxx" \
  --models "gpt-4o-mini,gpt-4o" \
  --iterations 5

Custom Prompt

python3 throughput.py --auto \
  --test-prompt "Explain quantum computing in detail." \
  --iterations 5

Technical Details

Auto Mode: openclaw infer model run --json, Python subprocess call
API Mode: urllib (Python built-in), OpenAI-compatible /v1/chat/completions
Timer Precision: time.perf_counter() nanosecond-level
Token Counting: API mode uses usage.completion_tokens (precise), Auto mode estimates by character count
URL Handling: Smart detection of /v1, /v4, /chat/completions paths

Notes

Auto mode throughput includes gateway routing overhead, slightly lower than direct API (~1-3%)
Auto mode token count is estimated, API mode is precise
English prompts recommended for more accurate token estimation
Anti-cache: random seed suffix appended to each iteration

版本历史

共 2 个版本

v1.0.6 当前

2026-06-11 18:18
v1.0.5

2026-06-09 18:56

安全检测

腾讯云安全 (Keen)

队列中

腾讯云安全 (Sanbu)

队列中

Model Throughput Tester

概述

Model Throughput Tester

When to Use

Core Features

1. Auto Mode (No Key, Recommended)

2. API Mode (Direct API Call)

3. Common Parameters

Workflow

Auto Mode Flow

API Mode Flow

Metrics

Output Example

Error Handling

Usage Examples

Quick Test After Install (Auto Mode)

Test Multiple Models (API Mode)

Custom Prompt

Technical Details

Notes

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

Skill Vetter

self-improving agent

Github