Overview

gpt-image-2

Generate and edit images via OpenAI's gpt-image-2 model. Agent-agnostic — designed to work with any AI agent or standalone from the command line.

Quick Start

# 1. Initialize config (one-time)
python3 gpt_image2.py config --init

# 2. Edit the config to set your API key
#    ~/.config/gpt-image-2/config.json

# 3. Generate
python3 gpt_image2.py generate "A cute cat on a windowsill" -o ~/cat.png --quality low

# 4. Edit
python3 gpt_image2.py edit input.png "Change the sofa to green" -o ~/output.png

Configuration

Config priority: --config flag > --base-url/--api-key flags > config file > environment variables > defaults.

Config File Locations (in priority order)

Priority	Path	Notes
----------	------	-------
1	`$XDG_CONFIG_HOME/gpt-image-2/config.json`	XDG standard (recommended)
2	`~/.config/gpt-image-2/config.json`	Default XDG fallback
3	`~/.gpt-image-2-config.json`	Single-file fallback
4	`~/.hermes/gpt-image-2-config.json`	Legacy Hermes compat

Use python3 gpt_image2.py config --show to see which config is active.
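
The lookup order in the table above can be sketched as a first-match search. This is a minimal illustration, not the script's actual code: the helper names `candidate_config_paths` and `find_active_config` are hypothetical, and it assumes the first existing file wins.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_config_paths(env: Mapping[str, str], home: Path) -> List[Path]:
    """Return the documented config locations, highest priority first."""
    paths: List[Path] = []
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        # Priority 1 only applies when XDG_CONFIG_HOME is set.
        paths.append(Path(xdg) / "gpt-image-2" / "config.json")
    paths.append(home / ".config" / "gpt-image-2" / "config.json")
    paths.append(home / ".gpt-image-2-config.json")
    paths.append(home / ".hermes" / "gpt-image-2-config.json")
    return paths


def find_active_config(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_config_paths(env, home):
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    print(find_active_config(os.environ, Path.home()))
```

Passing `env` and `home` as arguments keeps the search testable without touching the real home directory.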

Config File Format

{
  "base_url": "https://api.openai.com/v1",
  "api_key_env": "OPENAI_API_KEY"
}

Field	Type	Description
-------	------	-------------
`base_url`	string	API base URL. Default: `https://api.openai.com/v1`
`api_key`	string	Plaintext API key (not recommended — visible in file)
`api_key_env`	string	Environment variable name holding the key (recommended)
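
Because `api_key` stores the key in plaintext, file permissions matter; the Important Pitfalls section notes that the script warns when the config is readable by group or other users. A check of that kind might look like the sketch below, assuming a POSIX system. The function names are hypothetical and this is not the script's own implementation.

```python
import os
import stat


def is_readable_by_others(path: str) -> bool:
    """True if the group or other users have read permission on the file."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))


def restrict_to_owner(path: str) -> None:
    """Equivalent of `chmod 600 <path>`: owner read/write only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```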

Environment Variables (fallback when no config file)

Variable	Purpose
----------	---------
`GPT_IMAGE2_API_KEY`	API key
`GPT_IMAGE2_BASE_URL`	API base URL
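
Putting the pieces together, key resolution could follow the sketch below. It follows the stated order (flag, then config file, then environment variable). Checking `api_key_env` before `api_key` inside the config file is an assumption, since the source does not say which wins, and `resolve_api_key` is a hypothetical name.

```python
from typing import Mapping, Optional


def resolve_api_key(
    cli_key: Optional[str],
    config: Mapping[str, str],
    env: Mapping[str, str],
) -> Optional[str]:
    """Pick an API key: --api-key flag, then config file, then GPT_IMAGE2_API_KEY."""
    if cli_key:
        return cli_key
    # Assumption: the indirect (recommended) form is tried before the plaintext one.
    env_name = config.get("api_key_env")
    if env_name and env.get(env_name):
        return env[env_name]
    if config.get("api_key"):
        return config["api_key"]
    return env.get("GPT_IMAGE2_API_KEY")
```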

Config Management Commands

# Create template config
python3 gpt_image2.py config --init

# Show active config (keys are masked)
python3 gpt_image2.py config --show

# Overwrite config
python3 gpt_image2.py config --init --force

CLI Reference

generate — Text-to-Image

python3 gpt_image2.py generate "prompt" [options]

Option	Default	Description
--------	---------	-------------
`-o, --output`	`~/gpt-image2-output.png`	Output file path
`--quality`	`auto`	`low` (~70s), `medium` (~120s), `high` (~276s)
`--size`	`auto`	`1024x1024`, `1536x1024`, `1024x1536`
`--format`	`png`	`png`, `jpeg`, `webp`
`--n`	`1`	Number of images (1-10)
`--timeout`	`600`	curl timeout in seconds
`--config`	auto-detect	Explicit config file path
`--base-url`	from config	Override API base URL
`--api-key`	from config	Override API key (visible in ps!)

edit — Image-to-Image

python3 gpt_image2.py edit <image_path> "edit prompt" [options]

Option	Default	Description
--------	---------	-------------
`--mask`	none	PNG mask (transparent=edit area)
`--moderation`	`auto`	`low` or `auto`
(all generate options also apply)

config — Manage Configuration

python3 gpt_image2.py config [--init] [--show] [--force] [--config PATH]

Script Location

The script is at scripts/gpt_image2.py relative to this skill directory.

To find it programmatically from any agent:

# If installed as a Hermes skill:
SCRIPT="$(dirname "$(readlink -f "$0")")/../skills/creative/gpt-image-2/scripts/gpt_image2.py"

# Or copy/symlink it anywhere — it's self-contained with zero dependencies beyond stdlib + curl
cp scripts/gpt_image2.py /usr/local/bin/gpt-image2

The script has zero pip dependencies — only Python 3.8+ stdlib and curl.

API Reference

Generations (Text-to-Image)

Item	Value
------	-------
Endpoint	`POST {base_url}/images/generations`
Auth	`Authorization: Bearer {api_key}`
Content-Type	`application/json`

Edits (Image-to-Image)

Item	Value
------	-------
Endpoint	`POST {base_url}/images/edits`
Auth	`Authorization: Bearer {api_key}`
Content-Type	`multipart/form-data`

Parameters

Generations (JSON body):

Param	Type	Required	Description
-------	------	----------	-------------
`model`	string	yes	`gpt-image-2`
`prompt`	string	yes	Text description
`n`	int	no	Number of images (default 1)
`size`	string	no	`1024x1024`, `1536x1024`, `1024x1536`
`quality`	string	no	`low`, `medium`, `high` (default `auto`)
`format`	string	no	`png`, `jpeg`, `webp` (default `png`)
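
As an illustration of the table above, a JSON body for the generations endpoint could be assembled as follows. This is a sketch under assumptions: `build_generation_body` is a hypothetical helper, the 1-10 bound on `n` is taken from the CLI table, and optional fields are simply omitted when unset so the server default applies.

```python
import json
from typing import Any, Dict, Optional

ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536"}
ALLOWED_QUALITIES = {"low", "medium", "high"}


def build_generation_body(
    prompt: str,
    n: int = 1,
    size: Optional[str] = None,
    quality: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST {base_url}/images/generations."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    body: Dict[str, Any] = {"model": "gpt-image-2", "prompt": prompt, "n": n}
    if size is not None:
        if size not in ALLOWED_SIZES:
            raise ValueError(f"unsupported size: {size}")
        body["size"] = size
    if quality is not None:
        if quality not in ALLOWED_QUALITIES:
            raise ValueError(f"unsupported quality: {quality}")
        body["quality"] = quality
    return body


if __name__ == "__main__":
    print(json.dumps(build_generation_body("A cute cat on a windowsill", quality="low")))
```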

Edits (form-data):

Param	Type	Required	Description
-------	------	----------	-------------
`model`	string	yes	`gpt-image-2`
`prompt`	string	yes	Edit instruction
`image`	file	yes	Source image (PNG, max 4 images)
`n`	int	no	Number of outputs (default 1)
`size`	string	no	`1024x1024`, `1536x1024`, `1024x1536`, or `auto`
`quality`	string	no	`low`, `medium`, `high` (default `auto`)
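
For the multipart case, one way to picture the request is as a curl argument list. The sketch below is illustrative only: `build_edit_curl_args` is a hypothetical helper, it handles a single source image, and it leaves out the Authorization header on purpose because this document says the script supplies it through a `curl -K` config file. It uses curl's `--form-string` for text fields so that a prompt starting with `@` or `<` is sent literally.

```python
from typing import List, Optional


def build_edit_curl_args(
    base_url: str,
    image_path: str,
    prompt: str,
    size: str = "auto",
    quality: Optional[str] = None,
) -> List[str]:
    """Build curl arguments for POST {base_url}/images/edits (auth not included)."""
    args = ["curl", "-sS", f"{base_url.rstrip('/')}/images/edits"]
    # --form-string sends the value verbatim; -F would treat a leading @ or < specially.
    args += ["--form-string", "model=gpt-image-2"]
    args += ["--form-string", f"prompt={prompt}"]
    # -F with @ uploads the file contents as the "image" part.
    args += ["-F", f"image=@{image_path}"]
    args += ["--form-string", f"size={size}"]
    if quality is not None:
        args += ["--form-string", f"quality={quality}"]
    return args
```

Passing the list directly to `subprocess.run` avoids any shell quoting of the prompt.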

Agent Integration Guide

This skill is designed to be agent-agnostic. Any AI agent can use it by:

1. Locate the script: find gpt_image2.py in the skill's scripts/ directory.
2. Call via shell: python3 /path/to/gpt-image-2/scripts/gpt_image2.py generate "prompt" -o output.png
3. Parse stdout: the script prints Saved: <path> (<size> KB) on success.

Integration Examples

Hermes / Claude Code / Codex / OpenClaw:

python3 /path/to/gpt-image-2/scripts/gpt_image2.py generate "prompt" -o output.png --quality low

From Python (any agent):

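import subprocess

# Placeholder values; replace with your own paths and prompt
script_path = "/path/to/gpt-image-2/scripts/gpt_image2.py"
prompt = "A cute cat on a windowsill"
output_path = "output.png"

result = subprocess.run(
    ["python3", script_path, "generate", prompt, "-o", output_path, "--quality", "low"],
    capture_output=True, text=True, timeout=600
)
# Parse result.stdout for "Saved: <path>"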

From Node.js / TypeScript:

const { execFileSync } = require('child_process');
// Pass arguments as an array so the prompt is never interpreted by a shell
const output = execFileSync('python3', [scriptPath, 'generate', prompt, '-o', outputPath]);
// Parse output.toString() for "Saved: ..."

Workflow: Agent Generates Images

Always use the CLI script; it handles config resolution, auth security, and response parsing
Use low quality for drafts, high quality for final output
For edits: --size auto preserves original dimensions (recommended)
The script outputs: HTTP status, time elapsed, output file path and size
Parse the output: look for Saved: lines to find generated files
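
The last step, finding Saved: lines, might be implemented along these lines. The exact line format is an assumption (a path optionally followed by a size in KB in parentheses), the sample output in the usage note is made up for illustration, and `parse_saved_lines` is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

# Assumed shape: "Saved: <path> (<size> KB)"; the size suffix is treated as optional.
SAVED_LINE = re.compile(r"Saved:\s+(?P<path>.+?)(?:\s+\((?P<kb>\d+(?:\.\d+)?)\s*KB\))?$")


def parse_saved_lines(stdout: str) -> List[Tuple[str, Optional[float]]]:
    """Return (path, size_in_kb) for every 'Saved:' line in the script output."""
    results: List[Tuple[str, Optional[float]]] = []
    for line in stdout.splitlines():
        match = SAVED_LINE.match(line.strip())
        if match:
            kb = match.group("kb")
            results.append((match.group("path"), float(kb) if kb else None))
    return results
```

For example, `parse_saved_lines("Saved: /tmp/cat.png (1532 KB)\n")` yields one entry for `/tmp/cat.png`; output with no Saved: line yields an empty list, which an agent can treat as failure.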

Workflow: Agent Edits Existing Images

Save or locate the source image path
Call gpt_image2.py edit <image_path> "edit prompt" --output <output_path>
Edit endpoint can accept up to 4 images via repeated --image flags
Use --size auto to preserve original dimensions
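
A minimal sketch of this edit workflow from Python, using only the flags documented in the CLI Reference. The helper names are hypothetical, and the 660-second subprocess timeout is an arbitrary choice made slightly longer than the script's default 600-second curl timeout so that the script's own timeout fires first.

```python
import subprocess
from typing import List, Optional


def build_edit_command(
    script_path: str,
    image_path: str,
    prompt: str,
    output_path: str,
    mask_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `gpt_image2.py edit`, preserving original size."""
    cmd = ["python3", script_path, "edit", image_path, prompt,
           "-o", output_path, "--size", "auto"]
    if mask_path is not None:
        cmd += ["--mask", mask_path]
    return cmd


def run_edit(script_path: str, image_path: str, prompt: str, output_path: str) -> str:
    """Run the edit and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        build_edit_command(script_path, image_path, prompt, output_path),
        capture_output=True, text=True, timeout=660,
    )
    result.check_returncode()
    return result.stdout
```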

Important Pitfalls

--api-key flag is visible in shell history and ps aux — prefer config file (api_key_env) or environment variables.
The edits endpoint does NOT support response_format — always returns b64_json regardless.
gpt-image-2 generations may time out on some relay endpoints — use --timeout flag (default 600s).
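Prompts with special characters: the script writes prompts to temp files internally, so they are not re-parsed by a shell when curl is invoked. When calling the CLI from a shell, the prompt still has to be quoted so that it arrives as a single argument.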
The Authorization header is never passed via -H. The script writes it to a temporary curl -K config file that is deleted immediately after use, so the key does not appear in the curl command line shown by ps aux.
Config file permissions — the script warns if config has group/other read permissions. Run chmod 600 to fix.
Zero pip dependencies — the script only requires Python 3.8+ stdlib and curl. No installation step needed.
Chinese text in prompts may not render correctly — gpt-image-2's Chinese rendering is unstable; it often ignores Chinese constraints and outputs English text in images. Consider using Gemini for Chinese text rendering.
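
The `curl -K` technique mentioned in the pitfalls can be sketched as a context manager that writes the header to a private temp file and removes it afterwards. This is an illustration rather than the script's code: `curl_auth_config` is a hypothetical name, and it assumes the key contains no double quotes or backslashes, which would need escaping in a curl config file.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def curl_auth_config(api_key: str) -> Iterator[str]:
    """Yield the path of a temporary curl config file holding the auth header."""
    # mkstemp creates the file readable and writable by the owner only.
    fd, path = tempfile.mkstemp(prefix="curl-auth-", suffix=".conf")
    try:
        with os.fdopen(fd, "w") as handle:
            # In a curl config file, "header" is the long form of -H.
            handle.write(f'header = "Authorization: Bearer {api_key}"\n')
        yield path
    finally:
        os.remove(path)
```

A caller would wrap the request in `with curl_auth_config(key) as cfg:` and run `["curl", "-K", cfg, url, ...]` via `subprocess.run`, so only the config file path, not the key, is part of the process arguments.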

Version History

1 version in total

v2.0.0 (current)

2026-05-07 13:49
