Overview

GPT on RunAPI

Use the official OpenAI SDK (Python, TypeScript, Ruby), or any
OpenAI-compatible HTTP client, and switch the base URL to
https://runapi.ai/v1. The endpoints speak the standard OpenAI protocol:
Chat Completions (POST /v1/chat/completions), the Responses API
(POST /v1/responses), and Embeddings (POST /v1/embeddings). No client
code changes are needed beyond base_url and api_key.

Setup

OPENAI_API_KEY=YOUR_RUNAPI_TOKEN
OPENAI_BASE_URL=https://runapi.ai/v1

Get a RunAPI API Key at .

Language	Init
---	---
Python	`OpenAI(api_key=..., base_url="https://runapi.ai/v1")`
TypeScript	`new OpenAI({ apiKey: ..., baseURL: "https://runapi.ai/v1" })`
Ruby	`OpenAI::Client.new(access_token: ..., uri_base: "https://runapi.ai/v1")`
curl	`POST https://runapi.ai/v1/chat/completions` (or `/v1/responses`, `/v1/embeddings`)
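A minimal sketch of turning the two environment variables above into constructor arguments for the Python row of the table. The helper name `runapi_client_kwargs` is hypothetical and the fallback URL is simply the base URL quoted in this section; nothing here contacts the network.

```python
import os

DEFAULT_BASE_URL = "https://runapi.ai/v1"  # base URL quoted in the Setup section


def runapi_client_kwargs(env=None):
    """Build keyword arguments for OpenAI(...) from environment variables.

    Hypothetical helper: it only assembles a dict, so it can be checked offline.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Strip a trailing slash so paths such as /chat/completions join cleanly.
    base_url = env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return {"api_key": api_key, "base_url": base_url}


if __name__ == "__main__":
    # Demo with an explicit mapping so it runs without real credentials.
    print(runapi_client_kwargs({"OPENAI_API_KEY": "YOUR_RUNAPI_TOKEN"}))
```

With the SDK installed, the result can be passed straight through as `OpenAI(**runapi_client_kwargs())`.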

Pick the right endpoint

Chat, reasoning, and Codex models are reachable through every conversational
surface (Chat Completions, Responses, Anthropic-compatible /v1/messages, and
Gemini contents), so pick whichever protocol your client already speaks. The
one exception is Pro models (gpt-5.*-pro), which reject Chat Completions, as
noted under Agent rules.

Embedding models (text-embedding-*) are reachable only through
/v1/embeddings.
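The split between generation and embedding models can be sketched as a small lookup. The function name `surfaces_for` is hypothetical, and the Gemini path pattern is copied from the only Gemini example in this document (`streamGenerateContent`), so treat it as an assumption rather than a full list of Gemini routes. The extra Pro-model restriction from Agent rules is not modelled here.

```python
# Paths taken from this document; the Gemini entry mirrors its single curl example.
GENERATION_SURFACES = [
    "/v1/chat/completions",
    "/v1/responses",
    "/v1/messages",
    "/v1beta/models/{model}:streamGenerateContent",
]
EMBEDDING_SURFACES = ["/v1/embeddings"]


def surfaces_for(model):
    """Return the request paths a model ID can be sent to (hypothetical helper)."""
    if model.startswith("text-embedding-"):
        return list(EMBEDDING_SURFACES)
    # Only the Gemini path contains a placeholder; the others pass through unchanged.
    return [path.format(model=model) for path in GENERATION_SURFACES]


if __name__ == "__main__":
    print(surfaces_for("gpt-5.4"))
    print(surfaces_for("text-embedding-3-small"))
```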

Core recipe — Chat Completions

from openai import OpenAI

client = OpenAI(api_key="YOUR_RUNAPI_TOKEN", base_url="https://runapi.ai/v1")

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Explain quantum computing simply."}],
    reasoning_effort="high",
)
print(response.choices[0].message.content)
print(response.usage)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_RUNAPI_TOKEN",
  baseURL: "https://runapi.ai/v1",
});

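const response = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [{ role: "user", content: "Explain quantum computing simply." }],
});
// Print the generated text, mirroring the Python example above.
console.log(response.choices[0].message.content);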

Core recipe — Responses API

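import httpx

response = httpx.post(
    "https://runapi.ai/v1/responses",
    headers={"x-api-key": "YOUR_RUNAPI_TOKEN"},
    json={
        "model": "gpt-5.4",
        "input": "Explain the theory of relativity.",
        "reasoning": {"effort": "medium"},
    },
    # httpx times out after 5 seconds by default, which is often too short
    # for a reasoning request; 60 seconds is an arbitrary larger value.
    timeout=60.0,
)
response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())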

The Responses API takes input (a string or structured items),
reasoning.effort ("low" / "medium" / "high"), and an optional include for
thinking blocks.
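As a rough illustration of the three fields just described, the request body can be assembled and validated before sending. `build_responses_payload` is a hypothetical helper; the accepted effort values are the three listed above, and the contents of `include` are left to the caller because this document does not enumerate them.

```python
ALLOWED_EFFORTS = ("low", "medium", "high")  # values listed in this section


def build_responses_payload(model, input_value, effort=None, include=None):
    """Assemble a JSON body for POST /v1/responses (hypothetical helper)."""
    if not isinstance(input_value, (str, list)):
        raise TypeError("input must be a string or a list of structured items")
    payload = {"model": model, "input": input_value}
    if effort is not None:
        if effort not in ALLOWED_EFFORTS:
            raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}")
        payload["reasoning"] = {"effort": effort}
    if include:
        # Passed through untouched; valid entries are not specified here.
        payload["include"] = list(include)
    return payload


if __name__ == "__main__":
    print(build_responses_payload("gpt-5.4", "Explain relativity.", effort="medium"))
```

The returned dict can be used as the `json=` argument of the httpx call above.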

Core recipe — Embeddings

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["search document", "query text"],
    encoding_format="float",
)
print(response.data[0].embedding)
print(response.usage)

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: ["search document", "query text"],
  encoding_format: "float",
});
console.log(response.data[0].embedding);

Streaming

stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

const stream = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [{ role: "user", content: "Write a haiku about coding." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}

Streaming runs through a regional edge proxy, so the request does not hold a
Rails/Puma thread. Long generations should always stream.

Vision / multimodal

{
  "model": "gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "https://runapi.ai/img.jpg" } }
      ]
    }
  ]
}

This is the standard OpenAI multimodal content block. It works on both Chat
Completions and Responses (Responses also accepts structured input items).
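A small sketch of building that content block in Python for the Chat Completions shape shown in the JSON above. The helper name `image_message` is hypothetical; the field names are the ones in the JSON example.

```python
def image_message(prompt, image_urls):
    """Return one user message with a text part followed by image parts."""
    if isinstance(image_urls, str):
        image_urls = [image_urls]  # accept a single URL for convenience
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


if __name__ == "__main__":
    print(image_message("What is in this image?", "https://runapi.ai/img.jpg"))
```

The result goes into the `messages` list of a normal `chat.completions.create` call.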

Tool use / function calling / web search

{
  "model": "gpt-5.4",
  "messages": [
    { "role": "user", "content": "Find the latest news on RunAPI." }
  ],
  "tools": [
    { "type": "function", "function": { "name": "web_search" } }
  ]
}

web_search is supported across the GPT models listed under Supported models.
Custom function tools use the standard OpenAI tools schema.
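For custom tools, the standard OpenAI schema wraps a JSON Schema description of the arguments. The sketch below builds one such entry; `function_tool` and the `get_weather` example are hypothetical and only illustrate the shape.

```python
def function_tool(name, description, properties, required=()):
    """Return one entry for the "tools" array in the OpenAI function-tool shape."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            # The arguments are described with an ordinary JSON Schema object.
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(required),
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical tool, used only to show the resulting structure.
    tool = function_tool(
        "get_weather",
        "Look up the current weather for a city.",
        {"city": {"type": "string"}},
        required=["city"],
    )
    print(tool)
```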

List models

curl https://runapi.ai/v1/models -H "Authorization: Bearer YOUR_RUNAPI_TOKEN"

Returns OpenAI-compatible model objects. If the API key has allowed_models
restrictions, only permitted models are returned.
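Because restricted keys see a shorter list, a client can check the response before picking a model. This sketch assumes the usual OpenAI list shape (a `data` array of objects with an `id` field); the sample payload and the helper names are hypothetical.

```python
def model_ids(list_response):
    """Extract sorted model IDs from a parsed /v1/models response body."""
    return sorted(item["id"] for item in list_response.get("data", []))


def is_permitted(list_response, model):
    """True if the key that fetched the list is allowed to use the model."""
    return model in model_ids(list_response)


if __name__ == "__main__":
    # Hypothetical response for a key restricted to two models.
    sample = {
        "object": "list",
        "data": [
            {"id": "gpt-5.4", "object": "model"},
            {"id": "text-embedding-3-small", "object": "model"},
        ],
    }
    print(model_ids(sample))
    print(is_permitted(sample, "gpt-5.5-pro"))
```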

Protocol compatibility

GPT generation models are also available through RunAPI's
Anthropic-compatible /v1/messages and Gemini contents client surfaces. Use
these protocol paths when an existing agent runtime already expects that
request shape; for new GPT app code, prefer the OpenAI-compatible setup above.

curl -X POST "https://runapi.ai/v1/messages" \
  -H "x-api-key: YOUR_RUNAPI_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Draft a concise answer."}]
  }'

curl -X POST \
  "https://runapi.ai/v1beta/models/gpt-5.4:streamGenerateContent" \
  -H "x-goog-api-key: YOUR_RUNAPI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"Hello, GPT!"}]}]}'

Embeddings remain available only on /v1/embeddings; do not send embedding
models to generation endpoints or compatibility surfaces.
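The Anthropic-compatible curl call above can be expressed as a request description in Python, with a guard for the embedding restriction. `messages_request` is a hypothetical helper; the URL, headers, and body fields are copied from the curl example, and the function only builds the request without sending it.

```python
def messages_request(token, model, prompt, max_tokens=1024):
    """Describe a POST to the Anthropic-compatible /v1/messages surface."""
    if model.startswith("text-embedding-"):
        raise ValueError("embedding models are only served on /v1/embeddings")
    return {
        "url": "https://runapi.ai/v1/messages",
        "headers": {
            "x-api-key": token,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        },
    }


if __name__ == "__main__":
    req = messages_request("YOUR_RUNAPI_TOKEN", "gpt-5.4", "Draft a concise answer.")
    print(req["url"], req["json"]["model"])
```

The `url`, `headers`, and `json` entries map directly onto the keyword arguments of an HTTP client's POST call.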

Supported models

Model ID	Use when
---	---
`gpt-5.5`	Latest general model
`gpt-5.5-pro`	Reasoning-heavy
`gpt-5.4`	Production default
`gpt-5.4-mini`	Cost-optimized
`gpt-5.4-nano`	Smallest, fastest
`gpt-5.4-pro`	Reasoning
`gpt-5.3-codex`	Code generation
`gpt-5.3-codex-spark`	Faster Codex variant
`gpt-5.2`	Cost-effective
`gpt-5.2-pro`	Reasoning
`text-embedding-3-large`	High-capacity vectors
`text-embedding-3-small`	Efficient vectors
`text-embedding-ada-002`	Legacy-compatible vectors

Connect Codex CLI itself

export OPENAI_BASE_URL=https://runapi.ai/v1
export OPENAI_API_KEY=YOUR_RUNAPI_TOKEN
codex

Agent rules

Pro models (gpt-5.*-pro) reject Chat Completions; always use Responses
for them. Other models accept either endpoint.

Embedding models only work on /v1/embeddings; do not send them to Chat
Completions or Responses.

Default GPT-native integrations to OpenAI-compatible endpoints. Use
Anthropic-compatible or Gemini contents paths only for existing clients
that require those request shapes.

Use streaming for any response longer than a few hundred tokens. Do not
hold the agent on a long blocking request.

reasoning_effort is supported on every GPT model above; default is
usually "high" for non-Pro models.

For pricing, rate limits, and quotas, link to ,
not this skill file.
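The first, second, and fourth rules above are mechanical enough to check before a request is sent. The following is a sketch only: `rule_violations` is a hypothetical helper, and the 300-token threshold is an assumed stand-in for "a few hundred tokens".

```python
def rule_violations(model, endpoint, stream=False, expected_tokens=0,
                    stream_threshold=300):
    """Return a list of problems with a planned request (empty if none)."""
    problems = []
    is_embedding = model.startswith("text-embedding-")
    is_pro = model.startswith("gpt-") and model.endswith("-pro")

    if is_embedding and endpoint != "/v1/embeddings":
        problems.append("embedding models only work on /v1/embeddings")
    if is_pro and endpoint == "/v1/chat/completions":
        problems.append("Pro models reject Chat Completions; use /v1/responses")
    # stream_threshold is an assumption, not a documented limit.
    if not is_embedding and expected_tokens > stream_threshold and not stream:
        problems.append("long responses should be streamed")
    return problems


if __name__ == "__main__":
    print(rule_violations("gpt-5.5-pro", "/v1/chat/completions", expected_tokens=1000))
```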

Routing

Model page:
Provider page:
Catalog:

Version history

3 versions in total

v0.2.9 (current)
2026-07-03 06:59

v0.2.7
2026-06-24 23:22 (safe)

v0.2.4
2026-05-25 17:12 (safe)
