Handles rate limits and model fallbacks gracefully.
When I encounter rate limits or overload errors from cloud providers (Anthropic, OpenAI):

- Before using local models for code generation, ask:
  > "Cloud is rate-limited. Switch to local Ollama (qwen2.5:7b)? Reply 'yes' to confirm."
- For simple queries (chat, summaries), I can switch without confirmation if the user previously approved.
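The fallback rule above can be expressed as a small decision function. This is a hypothetical TypeScript sketch, not part of any provider SDK. `decideFallback`, `ApprovalState`, and the status-code mapping are assumptions: 429 is treated as a rate limit, and 503 and 529 (Anthropic's overload code) are treated as overload. It also assumes "previously approved" is a session flag kept separately from the code-generation confirmation.

```typescript
// Hypothetical sketch: names and the status-code mapping are assumptions.
type TaskKind = "code" | "chat" | "summary";
type Decision = "stay-on-cloud" | "ask-user" | "switch-to-local";

interface ApprovalState {
  // Assumed flag: user has said "yes" to a local fallback earlier this session.
  userPreviouslyApproved: boolean;
  // User has explicitly approved local models for code generation.
  localConfirmedForCode: boolean;
}

// Assumption: 429 = rate limited, 503/529 = provider overloaded.
const FALLBACK_STATUSES = new Set<number>([429, 503, 529]);

const CONFIRM_PROMPT =
  "Cloud is rate-limited. Switch to local Ollama (qwen2.5:7b)? Reply 'yes' to confirm.";

function isRateLimitOrOverload(status: number): boolean {
  return FALLBACK_STATUSES.has(status);
}

function decideFallback(status: number, task: TaskKind, state: ApprovalState): Decision {
  // Any other error is not a reason to leave the cloud provider.
  if (!isRateLimitOrOverload(status)) return "stay-on-cloud";
  // Code generation needs its own confirmation; simple queries reuse prior approval.
  const approved = task === "code" ? state.localConfirmedForCode : state.userPreviouslyApproved;
  return approved ? "switch-to-local" : "ask-user";
}
```

A result of `"ask-user"` means showing `CONFIRM_PROMPT` and waiting for 'yes' before switching; the caller would then set the matching flag.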
- `/llm status`: report the current state.
- `/llm switch local`: manually switch to Ollama for the session.
- `/llm switch cloud`: switch back to the cloud provider.
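A minimal handler for these three commands might look like the TypeScript sketch below. `handleLlmCommand`, `LlmState`, the reply wording, and storing timestamps as epoch milliseconds are all hypothetical; only the command strings and field names come from this document. It deliberately leaves `localConfirmedForCode` untouched on a manual switch, since the text does not say whether a manual switch counts as confirmation.

```typescript
// Hypothetical /llm command handler; names and reply text are assumptions.
type Provider = "cloud" | "local";

interface LlmState {
  currentProvider: Provider;
  lastRateLimitAt: number | null; // epoch milliseconds (assumed)
  localConfirmedForCode: boolean;
}

// Returns the reply text; mutates `state` for switch commands.
function handleLlmCommand(input: string, state: LlmState): string {
  const parts = input.trim().split(/\s+/);
  if (parts[0] !== "/llm") return `Not an /llm command: ${input}`;

  if (parts[1] === "status" && parts.length === 2) {
    const last =
      state.lastRateLimitAt === null ? "none" : new Date(state.lastRateLimitAt).toISOString();
    return (
      `provider=${state.currentProvider}, lastRateLimitAt=${last}, ` +
      `localConfirmedForCode=${state.localConfirmedForCode}`
    );
  }

  const target = parts[2];
  if (parts[1] === "switch" && parts.length === 3 && (target === "local" || target === "cloud")) {
    // Manual switch lasts for the session; code confirmation is not implied.
    state.currentProvider = target;
    return `Switched to ${target === "local" ? "local Ollama" : "cloud provider"}.`;
  }

  return "Usage: /llm status | /llm switch local | /llm switch cloud";
}
```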
```bash
# Check available models
ollama list

# Run a query
ollama run qwen2.5:7b "your prompt here"

# For longer prompts, use stdin
echo "your prompt" | ollama run qwen2.5:7b
```
Check which models are installed with `ollama list`. Configured default: `qwen2.5:7b`.
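Before offering the local fallback, it can help to confirm the default model is actually installed. The TypeScript sketch below parses the text printed by `ollama list`; it assumes the output is one header row followed by one model per line, with the model name in the first whitespace-separated column. `installedModels` and `hasModel` are hypothetical helpers, not part of Ollama.

```typescript
// Hypothetical helpers; the `ollama list` layout (header row, name in first column) is assumed.
const DEFAULT_MODEL = "qwen2.5:7b";

function installedModels(listOutput: string): string[] {
  return listOutput
    .split(/\r?\n/)
    .slice(1) // skip the header row
    .map((line) => line.trim().split(/\s+/)[0])
    .filter((name) => name.length > 0); // drop blank lines
}

function hasModel(listOutput: string, model: string = DEFAULT_MODEL): boolean {
  return installedModels(listOutput).includes(model);
}
```

Pass it the captured stdout of `ollama list`; if `hasModel` returns false, the fallback prompt would need to mention that the model must be pulled first.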
Track in memory during the session:

- `currentProvider`: `"cloud"` | `"local"`
- `lastRateLimitAt`: timestamp or `null`
- `localConfirmedForCode`: boolean

Reset to cloud at session start.
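The session state above maps naturally onto a small TypeScript type. In this sketch the field names come from the text, while `newSession`, `recordRateLimit`, `confirmLocalForCode`, epoch-millisecond timestamps, and resetting every field (not just the provider) at session start are assumptions.

```typescript
// Sketch of the in-memory session state; helper names are hypothetical.
interface SessionState {
  currentProvider: "cloud" | "local";
  lastRateLimitAt: number | null; // epoch milliseconds (assumed)
  localConfirmedForCode: boolean;
}

// New session: start on cloud with no rate-limit history or confirmation (assumed full reset).
function newSession(): SessionState {
  return { currentProvider: "cloud", lastRateLimitAt: null, localConfirmedForCode: false };
}

// Note when the cloud provider last returned a rate-limit or overload error.
function recordRateLimit(state: SessionState, now: number = Date.now()): void {
  state.lastRateLimitAt = now;
}

// The user replied 'yes' to the code-generation prompt: remember it and move to local.
function confirmLocalForCode(state: SessionState): void {
  state.localConfirmedForCode = true;
  state.currentProvider = "local";
}
```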