Gemini on RunAPI

Gemini on RunAPI exposes two protocols:

Protocol	Endpoint	Use when
---	---	---
OpenAI-compatible	`POST /v1beta/openai/chat/completions`	You already use the OpenAI SDK or any OpenAI client
Native Gemini	`POST /v1beta/models/{model}:streamGenerateContent`	You use Google's `@google/generative-ai` SDK (currently `gemini-3-flash-preview` only)

Both accept the same RunAPI API Key.

Setup

RUNAPI_TOKEN=YOUR_RUNAPI_TOKEN

Get a RunAPI API Key at .
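
The snippets below hardcode the key for brevity. A minimal sketch of reading it from the environment instead, assuming the variable is named RUNAPI_TOKEN as in the line above; the helper name `load_runapi_token` is hypothetical and not part of any SDK:

```python
import os


def load_runapi_token(env_var: str = "RUNAPI_TOKEN") -> str:
    """Return the RunAPI key from the environment, failing loudly if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export it before creating a client")
    return token
```

The returned value can then be passed as `api_key` when constructing the client.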

OpenAI-compatible setup

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNAPI_TOKEN",
    base_url="https://runapi.ai/v1beta/openai",
)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_RUNAPI_TOKEN",
  baseURL: "https://runapi.ai/v1beta/openai",
});

Native Gemini setup

export GOOGLE_API_KEY=YOUR_RUNAPI_TOKEN
export GOOGLE_GENAI_BASE_URL=https://runapi.ai

Core recipe — OpenAI-compatible

# gemini-2.5-flash is not listed under "Reasoning effort", so no
# reasoning_effort argument is passed here.
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Explain quantum computing simply."}],
)
print(response.choices[0].message.content)
print(response.usage)

const response = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Explain quantum computing simply." }],
});

curl -X POST "https://runapi.ai/v1beta/openai/chat/completions" \
  -H "x-api-key: YOUR_RUNAPI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Explain quantum computing simply."}]
  }'

Core recipe — native Gemini

curl -X POST \
  "https://runapi.ai/v1beta/models/gemini-3-flash-preview:streamGenerateContent" \
  -H "x-goog-api-key: YOUR_RUNAPI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      { "role": "user", "parts": [{ "text": "Hello!" }] }
    ]
  }'

The native protocol returns SSE chunks in Google's streamGenerateContent format. Use the official @google/generative-ai SDK or Google's google-genai Python package to consume it.
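
For debugging a raw response without an SDK, the stream can also be read by hand. This is a rough sketch only: it assumes each event arrives as a `data: {...}` line whose JSON follows Google's public GenerateContentResponse shape (`candidates[].content.parts[].text`). The function name `iter_sse_text` and the sample lines are illustrative, not captured RunAPI output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines carrying streamGenerateContent chunks."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        chunk = json.loads(payload)
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                text = part.get("text")
                if text:
                    yield text


# Hand-written sample events (assumed shape, not real server output).
sample_lines = [
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "Hel"}]}}]}',
    "",
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "lo!"}]}}]}',
]
print("".join(iter_sse_text(sample_lines)))  # prints "Hello!"
```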

Streaming (OpenAI-compatible)

stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Streaming runs through a regional edge proxy so the request does not hold a Rails/Puma thread. Long generations should always stream.
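
When the full text is needed after streaming, the deltas can be accumulated as they arrive. The sketch below assumes chunks shaped like the OpenAI SDK's streaming objects and skips chunks with an empty `choices` list, which some OpenAI-compatible servers emit (for example, a trailing usage chunk). Whether RunAPI sends such chunks is not stated here, so treat the guard as defensive. The helper name and the fake chunks are hypothetical.

```python
from types import SimpleNamespace


def collect_stream_text(stream) -> str:
    """Join the content deltas of a chat-completions stream into one string."""
    pieces = []
    for chunk in stream:
        if not chunk.choices:
            continue  # e.g. a usage-only chunk with no choices
        delta = chunk.choices[0].delta.content
        if delta:
            pieces.append(delta)
    return "".join(pieces)


def _fake_chunk(text):
    """Build a stand-in object with the same attribute path as an SDK chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


fake_stream = [
    _fake_chunk("Code "),
    _fake_chunk(None),            # role-only or empty delta
    SimpleNamespace(choices=[]),  # chunk without choices
    _fake_chunk("flows"),
]
print(collect_stream_text(fake_stream))  # prints "Code flows"
```

With a real client, the `stream` returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `fake_stream`.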

Vision / multimodal

{
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/img.jpg" } }
      ]
    }
  ]
}

This is the standard OpenAI multimodal content block, used with the OpenAI-compatible endpoint. For the native endpoint, embed image data as parts[].inlineData or parts[].fileData.
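
As an illustration of the two shapes, here is a sketch that builds both message forms from local image bytes. The native part uses the `inlineData` fields (`mimeType`, `data`) from Google's public REST format. Sending the image to the OpenAI-compatible endpoint as a base64 data URL is an assumption; the example above only shows an https URL. Both function names are hypothetical.

```python
import base64


def openai_image_message(prompt: str, image_bytes: bytes,
                         mime_type: str = "image/jpeg") -> dict:
    """Build an OpenAI-style user message with the image as a data URL (assumed accepted)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


def native_image_content(prompt: str, image_bytes: bytes,
                         mime_type: str = "image/jpeg") -> dict:
    """Build a native Gemini `contents` entry with the image as parts[].inlineData."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "parts": [
            {"text": prompt},
            {"inlineData": {"mimeType": mime_type, "data": encoded}},
        ],
    }
```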

Google Search grounding

{
  "model": "gemini-2.5-pro",
  "messages": [
    { "role": "user", "content": "Latest news on Gemini 3." }
  ],
  "tools": [
    { "type": "function", "function": { "name": "googleSearch" } }
  ]
}

Available on gemini-2.5-flash, gemini-2.5-pro, gemini-3.1-pro-preview, and gemini-3-pro-preview.
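
A small guard can stop the tool from being attached to a model outside that list. This is a sketch under the assumption that the list above is complete; `SEARCH_MODELS` and `with_google_search` are illustrative names, and the check runs client-side only.

```python
SEARCH_MODELS = frozenset({
    "gemini-2.5-flash",
    "gemini-2.5-pro",
    "gemini-3.1-pro-preview",
    "gemini-3-pro-preview",
})

GOOGLE_SEARCH_TOOL = {"type": "function", "function": {"name": "googleSearch"}}


def with_google_search(payload: dict) -> dict:
    """Return a copy of a chat payload with the googleSearch tool attached."""
    model = payload.get("model")
    if model not in SEARCH_MODELS:
        raise ValueError(f"Google Search grounding is not listed for {model!r}")
    tools = list(payload.get("tools", []))
    if GOOGLE_SEARCH_TOOL not in tools:
        tools.append(GOOGLE_SEARCH_TOOL)
    return {**payload, "tools": tools}  # the input dict is left unchanged
```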

Structured output

{
  "model": "gemini-2.5-flash",
  "messages": [{ "role": "user", "content": "Give me one person object." }],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "person",
      "schema": {
        "type": "object",
        "properties": { "name": { "type": "string" }, "age": { "type": "integer" } },
        "required": ["name", "age"]
      }
    }
  }
}
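
The reply arrives as a JSON string in the message content, so it is worth parsing and checking before use. A minimal sketch, assuming the content contains only the JSON object described by the schema above; `parse_person` is a hypothetical helper:

```python
import json


def parse_person(content: str) -> dict:
    """Parse a structured-output reply and check the fields the schema requires."""
    data = json.loads(content)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("'age' must be an integer")
    return data
```

With the SDK, the argument would be `response.choices[0].message.content`.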

Reasoning effort

Supported on gemini-2.5-pro, gemini-3.1-pro-preview, gemini-3-pro-preview, and gemini-3-flash-preview. Pass reasoning_effort as "low", "medium", or "high".
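
One way to keep the parameter off unsupported models is to build the request arguments through a helper. The sketch below takes the model list and effort levels from the sentence above; `chat_kwargs` and the constant names are hypothetical, and the validation is client-side only.

```python
REASONING_MODELS = frozenset({
    "gemini-2.5-pro",
    "gemini-3.1-pro-preview",
    "gemini-3-pro-preview",
    "gemini-3-flash-preview",
})
EFFORT_LEVELS = ("low", "medium", "high")


def chat_kwargs(model: str, messages: list, reasoning_effort=None) -> dict:
    """Build keyword arguments for chat.completions.create, validating reasoning_effort."""
    kwargs = {"model": model, "messages": messages}
    if reasoning_effort is not None:
        if reasoning_effort not in EFFORT_LEVELS:
            raise ValueError(f"reasoning_effort must be one of {EFFORT_LEVELS}")
        if model not in REASONING_MODELS:
            raise ValueError(f"reasoning_effort is not listed for {model!r}")
        kwargs["reasoning_effort"] = reasoning_effort
    return kwargs
```

Usage would look like `client.chat.completions.create(**chat_kwargs("gemini-2.5-pro", messages, "high"))`.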

List models

curl https://runapi.ai/v1beta/models -H "x-api-key: YOUR_RUNAPI_TOKEN"

Or via the OpenAI-style path:

curl https://runapi.ai/v1beta/openai/models \
  -H "Authorization: Bearer YOUR_RUNAPI_TOKEN"
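
The same call can be made from Python with only the standard library. This sketch targets the OpenAI-style path and assumes the response follows the OpenAI list shape (`{"data": [{"id": ...}]}`), which is not confirmed here. All function names are hypothetical.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://runapi.ai/v1beta/openai"


def build_models_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the OpenAI-style model list."""
    return urllib.request.Request(
        f"{OPENAI_BASE_URL}/models",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def model_ids(body: str) -> list:
    """Extract sorted model IDs from a response body (assumed OpenAI list shape)."""
    return sorted(item["id"] for item in json.loads(body).get("data", []))


def list_models(token: str) -> list:
    """Send the request and return the model IDs; performs a network call."""
    with urllib.request.urlopen(build_models_request(token), timeout=30) as resp:
        return model_ids(resp.read().decode("utf-8"))
```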

Supported models

Model ID	OpenAI endpoint	Native endpoint	Capabilities
---	---	---	---
`gemini-2.5-flash`	yes	—	Chat, multimodal, Google Search, structured output, thoughts
`gemini-2.5-pro`	yes	—	Same as `gemini-2.5-flash`, plus reasoning effort
`gemini-3.1-pro-preview`	yes	—	Same as `gemini-2.5-flash`, plus reasoning effort
`gemini-3-pro-preview`	yes	—	Same as `gemini-2.5-flash`, plus reasoning effort
`gemini-3-flash-preview`	yes	`:streamGenerateContent`	Chat, multimodal, function calling, structured output, reasoning effort

gemini-flash-latest resolves to gemini-3-flash-preview.
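
The table and alias can be folded into a small routing helper. This is a sketch only: it resolves the alias client-side (the server presumably does this too) and falls back to the OpenAI-compatible endpoint whenever the native path is not listed for the model. The names `ALIASES`, `NATIVE_MODELS`, `resolve_model`, and `endpoint_for` are hypothetical.

```python
ALIASES = {"gemini-flash-latest": "gemini-3-flash-preview"}
NATIVE_MODELS = frozenset({"gemini-3-flash-preview"})
BASE_URL = "https://runapi.ai/v1beta"


def resolve_model(model: str) -> str:
    """Map a known alias to its concrete model ID; other IDs pass through."""
    return ALIASES.get(model, model)


def endpoint_for(model: str, prefer_native: bool = False) -> str:
    """Pick the endpoint URL for a model, using the native path only where listed."""
    resolved = resolve_model(model)
    if prefer_native and resolved in NATIVE_MODELS:
        return f"{BASE_URL}/models/{resolved}:streamGenerateContent"
    return f"{BASE_URL}/openai/chat/completions"
```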

Connect Gemini CLI itself

export GOOGLE_API_KEY=YOUR_RUNAPI_TOKEN
export GOOGLE_GENAI_BASE_URL=https://runapi.ai
gemini

Agent rules

The native :streamGenerateContent path is currently only wired for gemini-3-flash-preview; use the OpenAI-compatible endpoint for every other Gemini model.

Use streaming for any response longer than a few hundred tokens. Do not hold the agent on a long blocking request.

Google Search grounding uses a googleSearch function tool.

Pricing, rate limits, quotas: link to , not this skill file.

Routing

Model page:
Provider page:
Catalog:

Version history

v0.2.4 (current), 2026-05-25 17:12
