Use the official OpenAI SDK (Python, TypeScript, Ruby) -- or any
OpenAI-compatible HTTP client -- and switch the base URL to
https://runapi.ai/v1. The endpoints speak the standard OpenAI protocol:
Chat Completions (POST /v1/chat/completions), the Responses API
(POST /v1/responses), and Embeddings (POST /v1/embeddings). No client
code changes are needed beyond base_url and api_key.
OPENAI_API_KEY=YOUR_RUNAPI_TOKEN
OPENAI_BASE_URL=https://runapi.ai/v1
Get a RunAPI API Key at
| Language | Init |
|---|---|
| Python | OpenAI(api_key=..., base_url="https://runapi.ai/v1") |
| TypeScript | new OpenAI({ apiKey: ..., baseURL: "https://runapi.ai/v1" }) |
| Ruby | OpenAI::Client.new(access_token: ..., uri_base: "https://runapi.ai/v1") |
| curl | POST https://runapi.ai/v1/chat/completions (or /v1/responses, /v1/embeddings) |
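The two environment variables shown earlier can also be resolved explicitly before a client is constructed. The following is a minimal sketch, assuming the variable names OPENAI_API_KEY and OPENAI_BASE_URL from this guide; the helper name `runapi_client_kwargs` is hypothetical and not part of any SDK.

```python
import os

DEFAULT_BASE_URL = "https://runapi.ai/v1"  # base URL given in this guide


def runapi_client_kwargs(env=None):
    """Return keyword arguments that could be passed to OpenAI(**kwargs).

    Hypothetical helper: reads OPENAI_API_KEY and OPENAI_BASE_URL and
    falls back to the RunAPI base URL when no base URL is set.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    base_url = env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    # Strip a trailing slash so endpoint paths can be appended uniformly.
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}
```

With the SDK installed, `OpenAI(**runapi_client_kwargs())` would then build the same client as the Python row in the table.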
Chat, reasoning, and Codex models are reachable through every conversational
surface — Chat Completions, Responses, Anthropic-compatible /v1/messages, and
Gemini contents — so pick whichever protocol your client already speaks.
Embedding models (text-embedding-*) are reachable only through
/v1/embeddings.
from openai import OpenAI

client = OpenAI(api_key="YOUR_RUNAPI_TOKEN", base_url="https://runapi.ai/v1")
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Explain quantum computing simply."}],
    reasoning_effort="high",
)
print(response.choices[0].message.content)
print(response.usage)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_RUNAPI_TOKEN",
  baseURL: "https://runapi.ai/v1",
});
const response = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [{ role: "user", content: "Explain quantum computing simply." }],
});
console.log(response.choices[0].message.content);
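import httpx

# Call the Responses API directly over HTTP, without the SDK.
response = httpx.post(
    "https://runapi.ai/v1/responses",
    headers={"x-api-key": "YOUR_RUNAPI_TOKEN"},
    json={
        "model": "gpt-5.4",
        "input": "Explain the theory of relativity.",
        "reasoning": {"effort": "medium"},
    },
    timeout=60.0,  # httpx defaults to 5 s, which a reasoning request can exceed
)
response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())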
The Responses API takes input (string or structured), reasoning.effort
("low" / "medium" / "high"), and optional include for thinking blocks.
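The parameters described above can be assembled and checked before sending. This is a minimal sketch, assuming only the fields this guide names (input, reasoning.effort, include); `build_responses_payload` is a hypothetical helper, and the values accepted in include are not specified here, so they are passed through unchanged.

```python
VALID_EFFORTS = ("low", "medium", "high")  # values listed in this guide


def build_responses_payload(model, input_data, effort=None, include=None):
    """Assemble a JSON body for POST /v1/responses (hypothetical helper)."""
    if not isinstance(input_data, (str, list)):
        raise TypeError("input must be a string or a list of structured items")
    payload = {"model": model, "input": input_data}
    if effort is not None:
        if effort not in VALID_EFFORTS:
            raise ValueError(f"effort must be one of {VALID_EFFORTS}")
        payload["reasoning"] = {"effort": effort}
    if include:
        # Passed through as-is; this guide does not list the accepted values.
        payload["include"] = list(include)
    return payload
```

The returned dictionary can be passed as the `json=` argument of an HTTP client call.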
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["search document", "query text"],
    encoding_format="float",
)
print(response.data[0].embedding)
print(response.usage)
const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: ["search document", "query text"],
  encoding_format: "float",
});
console.log(response.data[0].embedding);
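Embedding vectors such as the two returned above are commonly compared with cosine similarity. The following is a minimal sketch in plain Python, assuming the inputs are lists of floats as returned with encoding_format="float"; the function name is illustrative and not part of the SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length float vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

For the Python example, `cosine_similarity(response.data[0].embedding, response.data[1].embedding)` would score the document against the query.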
stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
const stream = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [{ role: "user", content: "Write a haiku about coding." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}
Streaming runs through a regional edge proxy so the request does not hold a
Rails/Puma thread. Long generations should always stream.
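When the full text is needed after streaming, the deltas can be accumulated as they arrive. This is a minimal sketch, assuming each chunk has already been parsed into a dictionary with the standard Chat Completions chunk shape (choices[0].delta.content); `collect_stream_text` is a hypothetical helper, and SDK chunk objects would use attribute access instead.

```python
def collect_stream_text(chunks):
    """Join the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # skip chunks that carry no choices
        delta = choices[0].get("delta", {}).get("content")
        if delta:
            parts.append(delta)
    return "".join(parts)
```

Skipping chunks without choices keeps the loop from failing if a stream includes a chunk that carries only metadata.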
{
  "model": "gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "https://runapi.ai/img.jpg" } }
      ]
    }
  ]
}
Standard OpenAI multimodal block — works on both Chat Completions and
Responses (Responses also accepts structured input items).
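The multimodal message above can be built programmatically. The following is a minimal sketch that mirrors the Chat Completions JSON shown in this section; `image_question` is a hypothetical helper name.

```python
def image_question(text, image_url):
    """Build one user message with a text part and an image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


payload = {
    "model": "gpt-5.4",
    "messages": [image_question("What is in this image?", "https://runapi.ai/img.jpg")],
}
```

The resulting `payload` matches the JSON body above and can be sent as the request body to /v1/chat/completions.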
{
  "model": "gpt-5.4",
  "messages": [
    { "role": "user", "content": "Find the latest news on RunAPI." }
  ],
  "tools": [
    { "type": "function", "function": { "name": "web_search" } }
  ]
}
web_search is supported across the GPT models above. Custom function tools
use the standard OpenAI tools schema.
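For a custom function tool, the standard OpenAI tools schema wraps a JSON Schema description of the arguments. This is a minimal sketch under that assumption; the `function_tool` helper and the `get_weather` tool are hypothetical examples, not tools provided by RunAPI.

```python
def function_tool(name, description, properties, required=()):
    """Build one entry for the "tools" array in the OpenAI tools schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(required),
            },
        },
    }


# Hypothetical tool used only for illustration.
weather_tool = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {"city": {"type": "string", "description": "City name"}},
    required=["city"],
)
```

The resulting dictionary goes into the request's tools list alongside, or instead of, the web_search entry.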
curl https://runapi.ai/v1/models -H "Authorization: Bearer YOUR_RUNAPI_TOKEN"
Returns OpenAI-compatible model objects. If the API Key has
allowed_models restrictions, only permitted models are returned.
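Because the list reflects any allowed_models restriction, it can serve as a preflight check before a request is sent. The following is a minimal sketch, assuming the response body follows the standard OpenAI list shape ({"object": "list", "data": [{"id": ...}]}); both helper names are hypothetical.

```python
def permitted_model_ids(models_response):
    """Extract sorted model IDs from a parsed GET /v1/models response body."""
    return sorted(
        item["id"] for item in models_response.get("data", []) if "id" in item
    )


def is_permitted(models_response, model):
    """Return True if the key behind this response may use the given model."""
    return model in permitted_model_ids(models_response)
```

A client could call `is_permitted(body, "gpt-5.4")` once at startup and fail early with a clear message instead of discovering the restriction on the first generation request.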
GPT generation models are also available through RunAPI's
Anthropic-compatible /v1/messages and Gemini contents client surfaces. Use
these protocol paths when an existing agent runtime already expects that
request shape; for new GPT app code, prefer the OpenAI-compatible setup above.
curl -X POST "https://runapi.ai/v1/messages" \
  -H "x-api-key: YOUR_RUNAPI_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Draft a concise answer."}]
  }'
curl -X POST \
  "https://runapi.ai/v1beta/models/gpt-5.4:streamGenerateContent" \
  -H "x-goog-api-key: YOUR_RUNAPI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"Hello, GPT!"}]}]}'
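An existing history of OpenAI-style messages can be reshaped into the Gemini contents body shown above. This is a minimal sketch, assuming text-only messages and the role names of the public Gemini API ("user" and "model"); whether RunAPI's surface handles system messages or non-text parts is not covered here, so the hypothetical helper rejects them rather than guessing.

```python
def to_gemini_contents(messages):
    """Convert text-only OpenAI-style messages into a Gemini contents body."""
    role_map = {"user": "user", "assistant": "model"}  # assumed role mapping
    contents = []
    for message in messages:
        role = message["role"]
        if role not in role_map:
            raise ValueError(f"unsupported role for this sketch: {role!r}")
        text = message["content"]
        if not isinstance(text, str):
            raise TypeError("this sketch only handles plain-text content")
        contents.append({"role": role_map[role], "parts": [{"text": text}]})
    return {"contents": contents}
```

Calling it with a single user message "Hello, GPT!" yields the same JSON body as the curl example.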
Embeddings remain available only on /v1/embeddings; do not send embedding
models to generation endpoints or compatibility surfaces.
| Model ID | Use when |
|---|---|
| gpt-5.5 | Latest general model |
| gpt-5.5-pro | Reasoning-heavy |
| gpt-5.4 | Production default |
| gpt-5.4-mini | Cost-optimized |
| gpt-5.4-nano | Smallest, fastest |
| gpt-5.4-pro | Reasoning |
| gpt-5.3-codex | Code generation |
| gpt-5.3-codex-spark | Faster Codex variant |
| gpt-5.2 | Cost-effective |
| gpt-5.2-pro | Reasoning |
| text-embedding-3-large | High-capacity vectors |
| text-embedding-3-small | Efficient vectors |
| text-embedding-ada-002 | Legacy-compatible vectors |
export OPENAI_BASE_URL=https://runapi.ai/v1
export OPENAI_API_KEY=YOUR_RUNAPI_TOKEN
codex
Pro models (gpt-5.*-pro) reject Chat Completions; always use Responses for them. Other models accept either endpoint.
Embedding models are reachable only through /v1/embeddings; do not send them to Chat Completions or Responses.
Use the Anthropic-compatible or Gemini contents paths only for existing clients
that require those request shapes.
hold the agent on a long blocking request.
reasoning_effort is supported on every GPT model above; default is usually "high" for non-Pro models.
not this skill file.
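The endpoint rules in these notes (Pro models on Responses, embedding models on /v1/embeddings) can be captured in a small routing function. The following is a minimal sketch, assuming the model ID patterns used in this guide; `endpoint_for` is a hypothetical helper, and defaulting other models to Chat Completions is a choice made here, since they accept either endpoint.

```python
from fnmatch import fnmatchcase


def endpoint_for(model):
    """Pick an OpenAI-compatible path for a model ID, per the notes above."""
    if model.startswith("text-embedding-"):
        return "/v1/embeddings"  # embedding models are reachable only here
    if fnmatchcase(model, "gpt-5.*-pro"):
        return "/v1/responses"  # Pro models reject Chat Completions
    return "/v1/chat/completions"  # other models accept either endpoint
```

Routing on the model ID in one place keeps a Pro model from being sent to Chat Completions by accident when the model is switched through configuration.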