Generate and edit images with Google Gemini image models via MCP. Use when the user asks to generate, create, or edit images using Gemini or Nano Banana mode...
Author: chrischall
Source: clawhub · v0.6.0 (latest) · 3 versions · API key: required

gemini-mcp

MCP server for Google Gemini image generation and editing — natural-language image creation via the Gemini API (Nano Banana / Nano Banana Pro models).

Setup

Option A — npx (recommended)

Add to .mcp.json in your project or ~/.claude/mcp.json:

{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "@chrischall/gemini-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Option B — from source

git clone https://github.com/chrischall/gemini-mcp
cd gemini-mcp
npm install && npm run build

Then add to .mcp.json:

{
  "mcpServers": {
    "gemini": {
      "command": "node",
      "args": ["/path/to/gemini-mcp/dist/index.js"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Alternatively, omit the env block and set GEMINI_API_KEY=<your key> in a .env file in the project directory.
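
If a config file already lists other MCP servers, the "gemini" entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge, not part of gemini-mcp itself; the function name `with_gemini_server` and the `other` server entry are hypothetical, and the entry written is the Option A one shown above.

```python
import copy
import json


def with_gemini_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP config with the "gemini" server entry added."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    # Same entry as Option A (npx launches the published package).
    servers["gemini"] = {
        "command": "npx",
        "args": ["-y", "@chrischall/gemini-mcp"],
        "env": {"GEMINI_API_KEY": api_key},
    }
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(with_gemini_server(existing, "your-api-key-here"), indent=2))
```

To apply it to a real file, load `.mcp.json` with `json.load`, pass the result through the function, and write it back with `json.dump`.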

Getting your API key

  1. Go to aistudio.google.com/apikey
  2. Create an API key (requires a Google account)
  3. Copy the key and set it as GEMINI_API_KEY

Note: Image generation requires a billing-enabled Google Cloud project.

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `GEMINI_API_KEY` | Yes | Your Google Gemini API key |
| `GEMINI_IMAGE_MODEL` | No | Override the default image model (default: `gemini-3-pro-image`) |
| `GEMINI_OUTPUT_DIR` | No | Default directory for saved images (default: current working directory) |
| `GEMINI_INPUT_DIR` | No | Directory to resolve bare input-image filenames against (e.g. point at Cowork's uploads/ folder so `images: ["house.jpg"]` works) |
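
The defaults in the table can be read as a small resolution step. The sketch below illustrates that reading and is not the server's actual code: the `Settings` class and `load_settings` function are hypothetical, and treating an empty value the same as an unset one is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    image_model: str
    output_dir: str
    input_dir: Optional[str]


def load_settings(env: Mapping[str, str], cwd: str) -> Settings:
    """Apply the documented defaults to a mapping of environment variables."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        # The only variable marked "Required: Yes" in the table.
        raise ValueError("GEMINI_API_KEY is required")
    return Settings(
        api_key=api_key,
        # Assumption: an empty string falls back to the default, like unset.
        image_model=env.get("GEMINI_IMAGE_MODEL") or "gemini-3-pro-image",
        output_dir=env.get("GEMINI_OUTPUT_DIR") or cwd,
        input_dir=env.get("GEMINI_INPUT_DIR") or None,
    )
```

In a real process this would be called as `load_settings(os.environ, os.getcwd())`.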

Tools

Models

| Tool | Description |
|---|---|
| `gemini_list_models` | List available Gemini image models and the current default |

Image Generation

| Tool | Description |
|---|---|
| `gemini_generate_image(prompt, count?, images?, images_base64?, video_url?, video_path?, google_search?, seed?, filename?, model?, aspect_ratio?, image_size?, thinking_level?, output_dir?, inline?)` | Generate image(s) from a text prompt (optionally image-conditioned via images/images_base64, or video-conditioned via video_url/video_path) |
| `gemini_edit_image(prompt, images?, images_base64?, google_search?, seed?, filename?, model?, aspect_ratio?, image_size?, thinking_level?, output_dir?, inline?)` | Edit or compose input image(s), supplied by path (images) or by value (images_base64: data URI or raw base64), with a text instruction. Requires ≥1 input |
| `gemini_generate_set(master_prompt, scenes? \| count?, reference_mode?, master_images?, master_images_base64?, google_search?, seed?, basename?, model?, thinking_level?, ...)` | Master image (optionally seeded from a reference photo) plus N consistent images referencing it |

Multi-turn (Interactions API — Beta)

| Tool | Description |
|---|---|
| `gemini_interact(input, previous_interaction_id?, images?, images_base64?, video_url?, video_path?, google_search?, model?, aspect_ratio?, image_size?, thinking_level?, filename?, output_dir?, inline?)` | Generate/edit via Gemini's Interactions API. Returns an interaction_id; pass it back as previous_interaction_id to iteratively refine the same image conversationally. This is the recommended way to make incremental edits. Output is JPEG. (Beta API.) |

Workflows

Generate a single image:

gemini_generate_image(prompt: "a red maple leaf on white background, studio photo")
→ returns path to saved PNG
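
The call notation used in these workflows is shorthand; an MCP client normally turns it into a JSON-RPC 2.0 `tools/call` request for you. As a rough sketch of what goes over the wire (the helper name `tool_call_request` is hypothetical, and the initialization handshake and transport framing are left out):

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # MCP method for invoking a tool
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


print(tool_call_request(
    1,
    "gemini_generate_image",
    {"prompt": "a red maple leaf on white background, studio photo"},
))
```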

Generate multiple variations:

gemini_generate_image(prompt: "a cartoon fox", count: 4, output_dir: "/tmp/foxes")
→ returns paths to 4 PNG files

Edit an existing image:

gemini_edit_image(prompt: "make the background blue", images: ["/path/to/image.png"])
→ returns path to edited PNG

Generate a consistent set (master + scenes):

gemini_generate_set(
  master_prompt: "a cartoon fox named Rusty, orange fur, blue scarf",
  scenes: ["Rusty waving hello", "Rusty eating an apple", "Rusty sleeping"]
)
→ returns paths to master + 3 scene images, all consistent

Generate variations of a concept:

gemini_generate_set(
  master_prompt: "minimalist logo for a coffee shop",
  count: 5
)
→ returns master + 5 variations
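
The two workflows above use either `scenes` or `count`, and the Notes section says the two are mutually exclusive. A caller-side check might look like the sketch below; `generate_set_args` is a hypothetical helper, and how the tool itself reacts to a call with both (or neither) is not documented here.

```python
from typing import List, Optional


def generate_set_args(
    master_prompt: str,
    scenes: Optional[List[str]] = None,
    count: Optional[int] = None,
    reference_mode: Optional[str] = None,
) -> dict:
    """Build arguments for gemini_generate_set, rejecting scenes + count together."""
    if scenes is not None and count is not None:
        raise ValueError("pass either scenes or count, not both")
    args: dict = {"master_prompt": master_prompt}
    if scenes is not None:
        args["scenes"] = list(scenes)
    if count is not None:
        args["count"] = count
    if reference_mode is not None:
        # e.g. "chain", which references the previous image instead of the master
        args["reference_mode"] = reference_mode
    return args
```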

Use a reference photo by value (when you have the bytes):

gemini_edit_image(
  prompt: "place this house on a vintage travel-poster background",
  images_base64: ["data:image/jpeg;base64,/9j/4AAQ..."]   // or raw base64
)
→ returns path to the edited image

images_base64 is for bytes you actually have: a file you Read/encode, a URL you fetch, or a data: URI the user pastes as text.
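
For the "file you Read/encode" case, producing a value for images_base64 is a plain base64 encoding step. A minimal sketch, with hypothetical helper names and the MIME type guessed from the file extension:

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw image bytes in a data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read an image file and return it as a data: URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    return to_data_uri(Path(path).read_bytes(), mime_type)
```

The string returned by `file_to_data_uri("house.jpg")` can be passed as one element of the images_base64 list.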

Iterate on ONE image conversationally (multi-turn):

r1 = gemini_interact(input: "a cozy reading nook, watercolor")
   → { images: [...], interaction_id: "v1_abc…" }
r2 = gemini_interact(input: "add a sleeping cat on the chair",
                     previous_interaction_id: r1.interaction_id)
   → refined image that preserves r1; returns a NEW interaction_id

Prefer this over re-running gemini_edit_image when you're making a series of incremental edits — the model keeps the prior result in context.
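
The threading of interaction ids can be written as a loop. The sketch below assumes a `call_tool(name, arguments)` function supplied by whatever MCP client you use; `make_fake_client` is a hypothetical stand-in so the example runs without a server, and the third prompt is invented for illustration.

```python
from typing import Callable, List

ToolCall = Callable[[str, dict], dict]


def refine(call_tool: ToolCall, first_prompt: str, edits: List[str]) -> List[dict]:
    """Run one gemini_interact turn per prompt, threading the interaction id."""
    result = call_tool("gemini_interact", {"input": first_prompt})
    results = [result]
    for edit in edits:
        # Each turn returns a NEW interaction_id, so always pass the latest one.
        result = call_tool(
            "gemini_interact",
            {"input": edit, "previous_interaction_id": result["interaction_id"]},
        )
        results.append(result)
    return results


def make_fake_client():
    """Hypothetical stand-in for an MCP client: records calls, returns fake ids."""
    calls = []

    def call_tool(name: str, arguments: dict) -> dict:
        calls.append((name, arguments))
        return {"images": [], "interaction_id": f"fake-{len(calls)}"}

    return call_tool, calls


call_tool, calls = make_fake_client()
turns = refine(
    call_tool,
    "a cozy reading nook, watercolor",
    ["add a sleeping cat on the chair", "make it dusk"],
)
print([turn["interaction_id"] for turn in turns])
```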

⚠️ Chat-pasted/attached images can't be fed to these tools directly. A pasted image reaches the assistant as a vision block: the assistant can SEE it but never receives the original bytes, and the host doesn't write it to disk. So neither images (no file exists) nor images_base64 (the bytes can't be reconstructed from a downscaled vision rendering) is obtainable from a paste. To use a real reference photo, the user must make the bytes available: save the file and give its path (→ images), drop it into the project dir, paste it as a data: URI in text, or host it at a URL (fetch → base64 → images_base64). This is a host/Cowork limitation, not an MCP one.

Two built-in ways to get past the unreachable-paste problem without any manual extraction:

  • from_clipboard: true (macOS). The tool reads the image off the system clipboard itself (osascript), downscales it, and uses it. The user just needs to copy the image (⌘C, which is distinct from pasting it inline into chat: a paste doesn't keep it on the clipboard). Works on every image tool: gemini_edit_image(prompt: "…", from_clipboard: true).
  • GEMINI_INPUT_DIR. Point it at a folder (e.g. Cowork's uploads/); then a bare filename resolves against it: gemini_edit_image(prompt: "…", images: ["house.jpg"]).
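
One plausible reading of "a bare filename resolves against it" is sketched below. This is an illustration rather than the server's implementation: `resolve_input_image` is a hypothetical name, "bare" is taken to mean "no directory component", and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_input_image(name: str, input_dir: Optional[str]) -> str:
    """Resolve a bare filename against input_dir; leave other paths unchanged."""
    # Assumption: "bare" means the name has no directory component at all.
    is_bare = bool(name) and PurePosixPath(name).name == name
    if is_bare and input_dir:
        return str(PurePosixPath(input_dir) / name)
    return name


print(resolve_input_image("house.jpg", "uploads/"))
print(resolve_input_image("/path/to/image.png", "uploads/"))
```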

Notes

  • Input images accept either file paths (images / master_images) or base64/data-URI values (images_base64 / master_images_base64).
  • seed makes a result reproducible; it's echoed in the result metadata (a random one is chosen + echoed when omitted). count>1 uses seed, seed+1, … so the images differ. Determinism isn't fully guaranteed by the model.
  • filename/basename set the output name (any extension is stripped). Existing files are never overwritten: a -2, -3, … suffix is added instead. The result echoes the absolute path(s), model, seed, and aspect/size.
  • No edit-strength control. Gemini exposes no denoise/strength knob, and Nano Banana over-preserves the input — big structural edits ("move/remove/shrink", add a mat border) are often ignored. Workarounds: reroll with a different seed, raise thinking_level to high, use forceful wording, do layout changes (padding/borders) externally, or use gemini_interact multi-turn.
  • thinking_level (minimal/high, Gemini 3 models) controls reasoning depth — high can improve complex compositions/edits at higher latency/cost.
  • Model text. When the model returns a caption/explanation (mostly Gemini 3 Pro), it's surfaced as text in the result metadata.
  • google_search: true grounds the image in live Google Search (current events, weather, real data — great for infographics). The result metadata includes grounding with the queries run and the sources ({uri, title}) used. (gemini_interact surfaces grounding.queries only — the Interactions API returns no clean source list.)
  • video_url (a public YouTube URL, on gemini_generate_image / gemini_interact) generates an image from a video reference; it requires a Flash model (e.g. model: "gemini-3.1-flash-image"). For a local video file, use video_path instead: the file is uploaded to the Gemini Files API (streamed from disk, 2 GB max), the tool waits until it reaches the ACTIVE state, and it is then referenced by its files/… uri. The result metadata echoes video_file ({uri, name, expires}, ~48h retention); reuse that uri as video_url in later calls to skip re-uploading.
  • gemini_interact is the multi-turn path: it returns an interaction_id; thread it back via previous_interaction_id for conversational refinement. Output is JPEG only. It's a Beta API (separate from the stable generate/edit/set tools).
  • Output location precedence: a per-call output_dir overrides $GEMINI_OUTPUT_DIR, which in turn overrides the current working directory. inline: true returns the image bytes (with a metadata text block) instead of writing a file.
  • count and scenes are mutually exclusive in gemini_generate_set; reference_mode: "chain" references the previous image instead of the master.
  • Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, … · Image sizes: 512 (0.5K, Flash only), 1K, 2K, 4K. 4K is the max native output — true 18×24 in @ 300 DPI (5400×7200) needs an external upscale step.
  • All generated images carry a SynthID watermark (Google).
  • The model can mis-render text/Roman numerals (e.g. years) — verify any text in the output; it's a model limitation, not a tool setting.
  • Server logs to stderr only — stdout is reserved for JSON-RPC.
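
Three of the notes above are mechanical enough to sketch: the seed sequence for count>1, the non-overwriting filename suffix, and the pixel arithmetic behind the 18×24 in @ 300 DPI figure. The helper names are hypothetical and the naming logic is a guess at the described behaviour, not the server's code.

```python
from typing import List, Set, Tuple


def seeds_for(seed: int, count: int) -> List[int]:
    """Seeds used for a batch: seed, seed+1, ... so that the images differ."""
    return [seed + i for i in range(count)]


def unique_name(stem: str, ext: str, taken: Set[str]) -> str:
    """Pick a name that is not in `taken`, adding -2, -3, ... as needed."""
    candidate = f"{stem}{ext}"
    n = 2
    while candidate in taken:
        candidate = f"{stem}-{n}{ext}"
        n += 1
    return candidate


def print_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions needed to print at the given size and resolution."""
    return (round(width_in * dpi), round(height_in * dpi))


print(seeds_for(7, 3))                                       # [7, 8, 9]
print(unique_name("fox", ".png", {"fox.png", "fox-2.png"}))  # fox-3.png
print(print_pixels(18, 24, 300))                             # (5400, 7200)
```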

Version history

3 versions in total

  • v0.6.0 (current)
    2026-06-14 20:07 · Safe
  • v0.5.0
    2026-06-11 18:44
  • v0.4.0
    2026-06-09 19:02

Security checks

Tencent Cloud Security (Keen): safe, no risk
Tencent Cloud Security (Sanbu): safe, no risk
