← 返回
未分类 中文

API Failover

Detect AI API/provider/model failures and route requests to healthy fallback providers or downgraded models. Use when creating or maintaining automatic failo...
检测 AI API/供应商/模型故障并将请求路由到健康的备用供应商或降级模型。用于创建或维护自动故障转移系统。
zqh2333 zqh2333 来源
未分类 clawhub v1.0.1 1 版本 100000 Key: 无需
★ 0
Stars
📥 301
下载
💾 0
安装
1
版本
#latest

概述

API Failover

Create or improve a lightweight failover layer for AI APIs.

Goals

Build systems that:

  • detect unavailable or degraded providers/models
  • classify failures before retrying blindly
  • switch to a safe fallback chain
  • avoid hammering broken endpoints
  • recover back to preferred providers after cooldown

Workflow

  1. Identify the call path.
  2. Classify failure modes.
  3. Define a fallback policy.
  4. Add health memory.
  5. Implement guarded retries.
  6. Emit observable logs.
  7. Validate with forced-failure tests.

Use the detailed rules below and the bundled scripts instead of re-inventing routing logic each time.

Practical defaults

Error classes

Use these normalized categories:

  • AUTH_ERROR
  • BAD_REQUEST
  • RATE_LIMIT
  • TIMEOUT
  • SERVER_ERROR
  • NETWORK_ERROR
  • MODEL_UNAVAILABLE
  • QUOTA_EXCEEDED
  • UNKNOWN_TRANSIENT

Suggested routing behavior

  • AUTH_ERROR, BAD_REQUEST: fail fast; do not retry other providers unless config explicitly maps to another credential set.
  • RATE_LIMIT: short backoff, then fallback.
  • TIMEOUT, SERVER_ERROR, NETWORK_ERROR, MODEL_UNAVAILABLE, UNKNOWN_TRANSIENT: retry briefly, then fallback.
  • QUOTA_EXCEEDED: mark provider unavailable for a longer cooldown and fallback immediately.

Circuit breaker defaults

Start with:

  • open after 3 consecutive transient failures
  • cooldown 60-180s
  • half-open with 1 probe
  • close after 1-2 successful probes

Configuration pattern

Keep policy in config, not hard-coded logic.

Recommended shape:

  • provider registry
  • task profiles with ordered fallback chains
  • retry policy
  • circuit-breaker policy
  • per-provider overrides

Design guidance

  • Prefer fewer, well-understood providers over large fallback chains.
  • Keep the fallback chain semantically compatible when possible.
  • Separate "best quality" from "must return something" behavior.
  • Keep downgrade rules explicit; avoid silent huge capability drops for critical tasks.
  • For tool-using agents, treat provider switching as a reliability event and report it when user-visible quality may change.

Semi-automatic deployment model

Use this skill to discover the environment, generate a production-ish config, run a local HTTP failover proxy, and verify health.

Do not claim full autonomous takeover unless the environment-specific integration is actually completed.

References

Read these only when needed:

  • references/config-example.yaml for a compact policy example
  • references/config-realworld-example.yaml for a more practical multi-provider template
  • references/config-production.yaml for a ready-to-edit production template
  • references/test-scenarios.md for failure-injection and validation cases
  • references/realworld-notes.md for local proxy deployment and environment-variable setup
  • references/api-failover.service for a user-systemd service example

Bundled scripts

scripts/discover_env.py

Inspect the current environment.

scripts/generate_config.py

Generate a production-ish YAML config from simple defaults.

scripts/failover_proxy.py

Run a minimal CLI failover call path.

scripts/http_proxy.py

Expose a single local OpenAI-compatible entrypoint.

Endpoints:

  • POST /v1/chat/completions
  • GET /health

Optional request header:

  • X-Failover-Profile: cheap|default|critical|local-first

scripts/selfcheck.py

Validate that the local proxy is reachable and can process a minimal chat request.

scripts/bootstrap_failover.py

Run the semi-automatic bootstrap flow:

  • discover environment
  • generate config
  • optionally start the proxy
  • run self-check
  • print next actions

Example:

python3 scripts/bootstrap_failover.py \
  --default-model custom-ai-td-ee/gpt-5.4 \
  --start-proxy

Keep these scripts small and inspectable. Extend them instead of turning SKILL.md into code-heavy instructions.

版本历史

共 1 个版本

  • v1.0.1 当前
    2026-05-07 16:16 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

Safe Smart Web Fetch

zqh2333
安全网页抓取技能。先判断URL是否含token、内网/本地域名或私密链接,这些直接抓取;其余公开网页依次尝试Jina Reader、markdown.new、defuddle.md获取干净Markdown,失败回退原始抓取。
★ 1 📥 611
ai-agent

self-improving agent

pskoett
记录自身发现以实现自我改进的技能
★ 4,169 📥 941,706
ai-agent

Agent Browser

rez0
用于 AI 代理的浏览器自动化 CLI。当用户需要与网站交互(包括浏览页面、填写表单、点击按钮、截图等)时使用。
★ 868 📥 348,355