Route every LLM request to the cheapest model that can handle it. Stop
paying Opus prices for "hello" and "summarize this."
Supports 8 providers and 25+ models: Anthropic (Claude), Google
(Gemini), OpenAI (GPT / o-series), xAI (Grok), DeepSeek, Moonshot (Kimi),
Mistral, and Ollama (local).
reasoning markers, simplicity indicators, multi-step patterns, question
count, system prompt complexity, conversation depth) in under 1 millisecond
model based on your active mode and configured providers
| Mode | Simple | Standard | Complex |
|---|---|---|---|
| ------ | -------- | ---------- | --------- |
| eco | Grok 4.1 Fast | Gemini Flash | Haiku |
| standard | Grok 4.1 Fast | Haiku | Sonnet |
| gigachad | Haiku | Sonnet | Opus 4.6 |
Each cell shows the first-choice model. The router tries a preference list
and falls through to the next available provider if the first is not
configured.
| Command | What It Does |
|---|---|
| --------- | ------------- |
route_request | Send a prompt and get a response from the cheapest capable model |
classify_prompt | Analyze prompt complexity without making an LLM call |
get_routing_stats | View cost savings and model distribution stats |
get_config | View current configuration (keys redacted) |
set_mode | Change routing mode at runtime |
get_recent_routing_log | Inspect recent routing decisions |
/opus, /sonnet, /haiku, /flash, or /grok-fast to force a specific model```
npm run setup
```
共 1 个版本