
🦞 Gradient AI — Serverless Inference

> ⚠️ This is an unofficial community skill, not maintained by DigitalOcean. Use at your own risk.

> "Why manage GPUs when the ocean provides?" — ancient lobster proverb

Use DigitalOcean's Gradient Serverless Inference to call large language models without managing infrastructure. The API is OpenAI-compatible, so standard SDKs and patterns work — just point at https://inference.do-ai.run/v1 and swim.

Authentication

All requests need a Model Access Key, sent in the `Authorization: Bearer` header.

```bash
export GRADIENT_API_KEY="your-model-access-key"
```

Where to get one: DigitalOcean Console → Gradient AI → Model Access Keys → Create Key.
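A minimal Python sketch of the same authentication step, assuming the key is exported as `GRADIENT_API_KEY` as shown above. The helper name `auth_headers` is hypothetical. It only builds the header dictionary that any HTTP client can send.

```python
import os


def auth_headers(env_var: str = "GRADIENT_API_KEY") -> dict[str, str]:
    """Build the headers every Gradient inference request needs (hypothetical helper)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a readable message instead of a 401 from the API.
        raise RuntimeError(f"{env_var} is not set; create a Model Access Key first")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Pass the result as `headers=` to `urllib.request.Request` or a similar HTTP client.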

📖 Full auth docs

Tools

🔍 List Available Models

Window-shop for LLMs before you swipe the card.

```bash
python3 gradient_models.py                    # Pretty table
python3 gradient_models.py --json             # Machine-readable
python3 gradient_models.py --filter "llama"   # Search by name
```

Use this before hardcoding model IDs — models are added and deprecated over time.

Direct API call:

```bash
curl -s https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $GRADIENT_API_KEY" | python3 -m json.tool
```
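The `--filter` behaviour can be approximated in a few lines of standard-library Python. This sketch assumes `/v1/models` returns the OpenAI-style list shape (`{"data": [{"id": ...}, ...]}`) and that filtering is a case-insensitive substring match. `fetch_models` and `model_ids` are hypothetical helpers, not part of `gradient_models.py`.

```python
import json
import urllib.request

MODELS_URL = "https://inference.do-ai.run/v1/models"


def fetch_models(api_key: str) -> dict:
    """GET /v1/models and return the decoded JSON (network call)."""
    req = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def model_ids(payload: dict, query: str = "") -> list[str]:
    """Return sorted model IDs, keeping only those that contain `query` (case-insensitive)."""
    ids = [m["id"] for m in payload.get("data", [])]
    q = query.lower()
    return sorted(i for i in ids if q in i.lower())
```

For example, `model_ids(fetch_models(key), "llama")` lists only the Llama variants currently on offer.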

📖 Models reference

💬 Chat Completions

The classic. Send structured messages (system/user/assistant roles), get a response. OpenAI-compatible, so you probably already know how this works.

```bash
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --system "You are a helpful assistant." \
  --prompt "Explain serverless inference in one paragraph."

# Different model
python3 gradient_chat.py \
  --model "llama3.3-70b-instruct" \
  --prompt "Write a haiku about cloud computing."
```

Direct API call:

```bash
curl -s https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-120b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
```
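For scripting without the helper script, here is a minimal standard-library sketch of the same request. It assumes the OpenAI-compatible response shape, with the reply at `choices[0].message.content`. `build_chat_request`, `chat_text`, and `chat` are hypothetical names, and the defaults simply copy the curl example.

```python
import json
import urllib.request

CHAT_URL = "https://inference.do-ai.run/v1/chat/completions"


def build_chat_request(api_key, model, prompt, system=None,
                       temperature=0.7, max_tokens=1000):
    """Build (but do not send) a POST to /v1/chat/completions."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    body = {"model": model, "messages": messages,
            "temperature": temperature, "max_tokens": max_tokens}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def chat_text(response: dict) -> str:
    """Pull the assistant reply out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


def chat(api_key, model, prompt, **kwargs) -> str:
    """Send the request and return only the reply text (network call)."""
    req = build_chat_request(api_key, model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return chat_text(json.load(resp))
```

Usage: `chat(os.environ["GRADIENT_API_KEY"], "llama3.3-70b-instruct", "Write a haiku about cloud computing.")`. Keeping request building separate from sending makes the payload easy to inspect or test offline.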

📖 Chat Completions docs

⚡ Responses API (Recommended)

DigitalOcean's recommended endpoint for new integrations. It has a simpler request format and supports prompt caching, a.k.a. "stop paying twice for the same context."

```bash
# Basic usage
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --prompt "Summarize this earnings report." \
  --responses-api

# With prompt caching (saves cost on follow-up queries)
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --prompt "Now compare it to last quarter." \
  --responses-api --cache
```

Direct API call:

```bash
curl -s https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-120b",
    "input": "Explain prompt caching.",
    "store": true
  }'
```
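A sketch of the Responses API call in Python. It assumes DigitalOcean returns the OpenAI Responses shape, where the reply lives in `output` items of type `message` whose `content` parts have type `output_text`; reasoning models may also emit `reasoning` items, which this sketch skips. `build_responses_body`, `output_text`, and `respond` are hypothetical names.

```python
import json
import urllib.request

RESPONSES_URL = "https://inference.do-ai.run/v1/responses"


def build_responses_body(model: str, text: str, store: bool = False) -> dict:
    """Mirror the curl example: a single "input" string plus the "store" flag."""
    return {"model": model, "input": text, "store": store}


def output_text(response: dict) -> str:
    """Concatenate every output_text part found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # e.g. reasoning items carry no user-facing text
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


def respond(api_key: str, model: str, text: str, store: bool = False) -> str:
    """POST to /v1/responses and return the reply text (network call)."""
    req = urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(build_responses_body(model, text, store)).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return output_text(json.load(resp))
```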

When to use which:

| | Chat Completions | Responses API |
|---|---|---|
| Request format | Array of messages with roles | Single `input` string |
| Prompt caching | ❌ | ✅ via `store: true` |
| Multi-step tool use | Manual | Built-in |
| Best for | Structured conversations | Simple queries, cost savings |

📖 Responses API docs

🖼️ Generate Images

Turn text prompts into images. Because sometimes a chart isn't enough.

```bash
python3 gradient_image.py --prompt "A lobster trading stocks on Wall Street"
python3 gradient_image.py --prompt "Sunset over the NYSE" --output sunset.png
python3 gradient_image.py --prompt "Fintech logo" --json
```

Direct API call:

```bash
curl -s https://inference.do-ai.run/v1/images/generations \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A lobster analyzing candlestick charts",
    "n": 1
  }'
```
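Saving the result is the part most callers have to write themselves. This sketch assumes the OpenAI images response shape, where each entry in `data` carries either a base64 `b64_json` payload or a `url`. Which one DigitalOcean returns for a given model is not stated here, so both are handled. `build_image_body` and `save_first_image` are hypothetical helpers.

```python
import base64
import urllib.request
from pathlib import Path


def build_image_body(prompt: str, model: str = "dall-e-3", n: int = 1) -> dict:
    """Request body matching the curl example above."""
    return {"model": model, "prompt": prompt, "n": n}


def save_first_image(response: dict, output: str) -> Path:
    """Write the first generated image to `output` and return its path."""
    item = response["data"][0]
    path = Path(output)
    if item.get("b64_json"):
        # Inline image bytes, base64-encoded.
        path.write_bytes(base64.b64decode(item["b64_json"]))
    elif item.get("url"):
        # Hosted image: download it (network call).
        with urllib.request.urlopen(item["url"], timeout=60) as resp:
            path.write_bytes(resp.read())
    else:
        raise ValueError("response has neither b64_json nor url")
    return path
```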

📖 Image generation docs

🧠 Model Selection Guide

Not all models are created equal. Choose wisely, young crustacean:

| Model | Best For | Speed | Quality | Context |
|---|---|---|---|---|
| `openai-gpt-oss-120b` | Complex reasoning, analysis, writing | Medium | ★★★★★ | 128K |
| `llama3.3-70b-instruct` | General tasks, instruction following | Fast | ★★★★ | 128K |
| `deepseek-r1-distill-llama-70b` | Math, code, step-by-step reasoning | Slow | ★★★★★ | 128K |
| `qwen3-32b` | Quick triage, short tasks | Fastest | ★★★ | 32K |

> 🦞 Pro tip: Cost-aware routing. Use a fast model (e.g., qwen3-32b) to score or triage, then only escalate to a strong model (e.g., openai-gpt-oss-120b) when depth is needed. Enable prompt caching for repeated context.
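The routing tip above can be sketched as a small function. Everything here is an assumption for illustration: the 1-to-10 difficulty prompt, the threshold of 7, and the `complete(model, prompt)` callable, which stands in for whatever function you use to call the chat endpoint. If the fast model's rating cannot be parsed, the sketch escalates to the strong model to stay on the safe side.

```python
from typing import Callable

FAST_MODEL = "qwen3-32b"
STRONG_MODEL = "openai-gpt-oss-120b"


def route(prompt: str, complete: Callable[[str, str], str],
          threshold: int = 7) -> tuple[str, str]:
    """Triage with the fast model; escalate only when it rates the task as hard.

    Returns (model_used, answer). `complete(model, prompt)` must return reply text.
    """
    triage = complete(
        FAST_MODEL,
        "Rate the difficulty of this task from 1 to 10. "
        f"Reply with only the number.\n\n{prompt}",
    )
    try:
        score = int(triage.strip().split()[0])
    except (ValueError, IndexError):
        score = threshold  # unparseable rating: escalate
    model = STRONG_MODEL if score >= threshold else FAST_MODEL
    return model, complete(model, prompt)
```

Injecting `complete` keeps the routing logic independent of the transport, so it can be exercised offline with a stub.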

Always run `python3 gradient_models.py` to check what's currently available; the menu changes.

📖 Available models

💰 Model Pricing Lookup

Check what models cost before you rack up a bill. Scrapes the official DigitalOcean pricing page — no API key needed.

```bash
python3 gradient_pricing.py                    # Pretty table
python3 gradient_pricing.py --json             # Machine-readable
python3 gradient_pricing.py --model "llama"    # Filter by model name
python3 gradient_pricing.py --no-cache         # Skip cache, fetch live
```

How it works:

- Fetches live pricing from DigitalOcean's docs (public page, no auth)
- Caches results for 24 hours in `/tmp/gradient_pricing_cache.json`
- Falls back to a bundled snapshot if the live fetch fails
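The cache-then-live-then-snapshot order described above can be sketched as follows. `load_pricing`, the `fetch_live` callable, and the `snapshot` argument are hypothetical; only the cache path and the 24-hour lifetime come from the description. Judging freshness by the cache file's modification time is an assumption about how the script does it.

```python
import json
import time
from pathlib import Path
from typing import Callable

CACHE_PATH = Path("/tmp/gradient_pricing_cache.json")
TTL_SECONDS = 24 * 60 * 60  # 24 hours


def load_pricing(fetch_live: Callable[[], dict], snapshot: dict,
                 cache_path: Path = CACHE_PATH, use_cache: bool = True) -> dict:
    # 1. A cache younger than 24 h wins, unless the cache is bypassed (--no-cache).
    if use_cache and cache_path.exists():
        if time.time() - cache_path.stat().st_mtime < TTL_SECONDS:
            return json.loads(cache_path.read_text())
    # 2. Otherwise fetch live pricing and refresh the cache.
    try:
        data = fetch_live()
    except Exception:
        # 3. Live fetch failed: fall back to the bundled snapshot.
        return snapshot
    cache_path.write_text(json.dumps(data))
    return data
```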

> 🦞 Pro tip: Run `python3 gradient_pricing.py --model "gpt-oss"` before choosing a model to compare gpt-oss-120b ($0.10/$0.70 per 1M tokens) with gpt-oss-20b ($0.05/$0.45 per 1M tokens).
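To turn per-1M-token prices into a per-job estimate, multiply each token count by its rate and divide by one million. This sketch assumes the first figure in each pair is the input price and the second the output price, in USD. The table and `estimate_cost` are illustrative only; check `gradient_pricing.py` for current numbers before relying on them.

```python
# Per-1M-token prices quoted above, assumed to be (input, output) in USD.
PRICES_PER_M = {
    "gpt-oss-120b": (0.10, 0.70),
    "gpt-oss-20b": (0.05, 0.45),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one job under the assumed price table."""
    price_in, price_out = PRICES_PER_M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For 2M input and 500K output tokens this gives 2 × 0.10 + 0.5 × 0.70 = $0.55 on gpt-oss-120b versus 2 × 0.05 + 0.5 × 0.45 = $0.325 on gpt-oss-20b.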

📖 Pricing docs

CLI Reference

All scripts accept --json for machine-readable output.

```
gradient_models.py   [--json] [--filter QUERY]
gradient_chat.py     --prompt TEXT [--model ID] [--system TEXT]
                     [--responses-api] [--cache] [--temperature F]
                     [--max-tokens N] [--json]
gradient_image.py    --prompt TEXT [--model ID] [--output PATH]
                     [--size WxH] [--json]
gradient_pricing.py  [--json] [--model QUERY] [--no-cache]
```
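The `gradient_chat.py` synopsis maps naturally onto `argparse`. This is a hypothetical reconstruction for readers building similar wrappers, not the script's actual source; the default model, temperature, and token limit are assumptions copied from the examples in this document.

```python
import argparse


def build_chat_parser() -> argparse.ArgumentParser:
    """Declare the documented gradient_chat.py flags (defaults are assumptions)."""
    p = argparse.ArgumentParser(prog="gradient_chat.py")
    p.add_argument("--prompt", required=True)
    p.add_argument("--model", default="openai-gpt-oss-120b")
    p.add_argument("--system")
    p.add_argument("--responses-api", action="store_true")
    p.add_argument("--cache", action="store_true")
    p.add_argument("--temperature", type=float, default=0.7)
    p.add_argument("--max-tokens", type=int, default=1000)
    p.add_argument("--json", action="store_true")
    return p
```

`argparse` exposes dashed flags with underscores, so `--responses-api` becomes `args.responses_api` and `--max-tokens` becomes `args.max_tokens`.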

External Endpoints

| Endpoint | Purpose |
|---|---|
| `https://inference.do-ai.run/v1/models` | List available models |
| `https://inference.do-ai.run/v1/chat/completions` | Chat Completions API |
| `https://inference.do-ai.run/v1/responses` | Responses API (recommended) |
| `https://inference.do-ai.run/v1/images/generations` | Image generation |
| `https://docs.digitalocean.com/.../pricing/` | Pricing page (scraped, public) |

Security & Privacy

- All requests go to `inference.do-ai.run`, DigitalOcean's own endpoint
- Your `GRADIENT_API_KEY` is sent as a Bearer token in the `Authorization` header
- Apart from the prompts you send, no other credentials or local data leave the machine
- Model Access Keys are scoped to inference only; they can't manage your DO account
- Prompt caching entries are scoped to your account and automatically expire

Trust Statement

> By using this skill, prompts and data are sent to DigitalOcean's Gradient Inference API.

> Only install if you trust DigitalOcean with the content you send to their LLMs.

Important Notes

- Run `python3 gradient_models.py` before assuming a model exists; models rotate
- All scripts exit with code 1 and print errors to stderr on failure
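Because every script reports failure through exit code 1 and stderr, a caller can wrap them uniformly. This is a minimal sketch using `subprocess`. `run_tool` is a hypothetical helper, and it assumes the scripts sit in the current working directory.

```python
import subprocess
import sys


def run_tool(args: list[str]) -> str:
    """Run a gradient_*.py script with this interpreter; raise with its stderr on failure."""
    proc = subprocess.run([sys.executable, *args], capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"{args[0]} exited {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout
```

For example, `json.loads(run_tool(["gradient_models.py", "--json"]))` yields the model list, or raises with the script's error message.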

Version History

1 version in total

v0.1.3 (current)

2026-03-29 06:02 · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
