Smart, cost-optimized model routing for Venice.ai — the AI platform for people who don't want Big Tech watching over their shoulder.
Unlike OpenAI, Anthropic, and Google — where every prompt is logged, analyzed, and potentially used to train future models — Venice offers true privacy with zero data retention on private models. Your conversations stay yours. Venice is also uncensored: no content filters, no refusals, no "I can't help with that."
export VENICE_API_KEY="your-key-here"
Or configure in ~/.openclaw/openclaw.json:
{
"skills": {
"entries": {
"venice-router": {
"enabled": true,
"apiKey": "YOUR_VENICE_API_KEY"
}
}
}
}
python3 {baseDir}/scripts/venice-router.py --prompt "What is 2+2?"
python3 {baseDir}/scripts/venice-router.py --tier cheap --prompt "Tell me a joke"
python3 {baseDir}/scripts/venice-router.py --tier budget-medium --prompt "Write a Python function"
python3 {baseDir}/scripts/venice-router.py --tier mid --prompt "Explain quantum computing"
python3 {baseDir}/scripts/venice-router.py --tier premium --prompt "Write a distributed systems architecture"
python3 {baseDir}/scripts/venice-router.py --stream --prompt "Write a poem about lobsters"
python3 {baseDir}/scripts/venice-router.py --web-search --prompt "Latest news on AI regulation"
python3 {baseDir}/scripts/venice-router.py --uncensored --prompt "Write edgy creative fiction"
python3 {baseDir}/scripts/venice-router.py --private-only --prompt "Analyze this confidential contract"
# Save conversation history as JSON, then route follow-ups with context
python3 {baseDir}/scripts/venice-router.py --conversation history.json --prompt "Can you add tests too?"
The router analyzes conversation history to keep context: trivial follow-ups ("thanks") go cheap, while follow-ups in complex code discussions stay at the right tier.
# Define tools in a JSON file (OpenAI tools format)
python3 {baseDir}/scripts/venice-router.py --tools tools.json --prompt "What's the weather in NYC?"
python3 {baseDir}/scripts/venice-router.py --tools tools.json --tool-choice auto --prompt "Search for latest AI news"
Tool definitions use the standard OpenAI format. The router auto-bumps to mid tier minimum for function calling since it requires capable models.
# Show current spending
python3 {baseDir}/scripts/venice-router.py --budget-status
# Track per-session costs
python3 {baseDir}/scripts/venice-router.py --session-id my-project --prompt "help me code"
Set VENICE_DAILY_BUDGET and/or VENICE_SESSION_BUDGET to enforce spending limits. The router auto-downgrades tiers as you approach budget limits.
python3 {baseDir}/scripts/venice-router.py --classify "Explain the Riemann hypothesis"
python3 {baseDir}/scripts/venice-router.py --list-models
python3 {baseDir}/scripts/venice-router.py --model deepseek-v3.2 --prompt "Hello"
| Tier | Models | Cost (input/output per 1M tokens) | Best For |
|---|---|---|---|
| ------ | -------- | ----------------------------------- | ---------- |
| cheap | Venice Small (qwen3-4b), GLM 4.7 Flash, GPT OSS 120B, Llama 3.2 3B | $0.05–$0.15 / $0.15–$0.60 | Simple Q&A, greetings, math, lookups |
| budget | Qwen 3 235B, Venice Uncensored, GLM 4.7 Flash Heretic | $0.14–$0.20 / $0.75–$0.90 | Moderate questions, summaries, translations |
| budget-medium | Grok Code Fast, DeepSeek V3.2, MiniMax M2.1 | $0.25–$0.40 / $1.00–$1.87 | Moderate-to-complex tasks, code snippets, structured output |
| mid | DeepSeek V3.2, MiniMax M2.1/M2.5, Qwen3 Thinking 235B, Venice Medium, Llama 3.3 70B | $0.25–$0.70 / $1.00–$3.50 | Code generation, analysis, longer writing, reasoning |
| high | GLM 5, Kimi K2 Thinking, Kimi K2.5, Grok 4.1 Fast, Hermes 3 405B, Gemini 3 Flash | $0.50–$1.10 / $1.25–$3.75 | Complex reasoning, multi-step tasks, code review |
| premium | GPT-5.2, GPT-5.2 Codex, Gemini 3 Pro, Gemini 3.1 Pro (1M ctx), Claude Opus/Sonnet 4.5/4.6 | $2.19–$6.00 / $15.00–$30.00 | Expert-level analysis, architecture, research papers |
The router classifies each prompt using keyword + heuristic analysis:
--conversation is provided, analyzes full chat context: code in history boosts tier, trivial follow-ups ("thanks") downgrade, tool calls in history signal complexity--tools auto-bumps to at least mid tier (capable models required)--thinking prefers chain-of-thought reasoning models (Qwen3 Thinking, Kimi K2) and bumps to at least mid tierThe classifier errs on the side of cheaper models — it only escalates when there's strong signal for complexity.
| Variable | Description | Default |
|---|---|---|
| ---------- | ------------- | --------- |
VENICE_API_KEY | Venice.ai API key (required) | — |
VENICE_DEFAULT_TIER | Minimum floor tier — auto-classification never goes below this. Valid: cheap, budget, budget-medium, mid, high, premium | budget |
VENICE_MAX_TIER | Maximum tier to ever use (cost cap) | premium |
VENICE_TEMPERATURE | Default temperature | 0.7 |
VENICE_MAX_TOKENS | Default max tokens | 4096 |
VENICE_STREAM | Enable streaming by default | false |
VENICE_UNCENSORED | Always prefer uncensored models | false |
VENICE_PRIVATE_ONLY | Only use private models (zero data retention) | false |
VENICE_WEB_SEARCH | Enable web search by default ($10/1K calls) | false |
VENICE_THINKING | Always prefer thinking/reasoning models | false |
VENICE_DAILY_BUDGET | Max daily spend in USD (0 = unlimited) | 0 |
VENICE_SESSION_BUDGET | Max per-session spend in USD (0 = unlimited) | 0 |
--classify to preview which tier a prompt would hit before spending tokensVENICE_MAX_TIER=mid to cap costs and never hit premium models--uncensored for creative, security research, or other content mainstream AI won't touch--private-only when processing sensitive/confidential data — zero retention guaranteed--web-search when you need up-to-date information with cited sources--conversation with a JSON message history for smarter multi-turn routing--tools to enable function calling — the router auto-bumps to capable modelsVENICE_DAILY_BUDGET=1.00 to cap daily spend at $1 — the router auto-downgrades tiers as you approach the limit--budget-status to see a detailed breakdown of your spending by tier--thinking for math proofs, logic puzzles, and multi-step reasoning — routes to Qwen3 Thinking or Kimi K2 models--uncensored is active, the router auto-bumps to the nearest tier with uncensored models共 1 个版本