AI image generation on RunComfy. Generate and edit images with 11+ AI models via the RunComfy CLI: text-to-image and image-to-image, one auth, one command. This RunComfy image generation skill picks the right model for the user's intent and ships the documented prompt patterns plus the exact runcomfy run invocation for each.
runcomfy.com · Browse all models · CLI docs
# 1. Install (one of — see runcomfy-cli skill for details)
npm i -g @runcomfy/cli # global install
npx -y @runcomfy/cli --version # zero-install
# 2. Sign in (interactive — opens browser)
runcomfy login
# or in CI / containers:
export RUNCOMFY_TOKEN=<token-from-runcomfy.com/profile>
# 3. Generate
runcomfy run <vendor>/<model>/<endpoint> \
--input '{"prompt": "..."}' \
--output-dir ./out
CLI docs: Install · Quickstart · Commands · Auth · Troubleshooting
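For CI jobs it can help to stop before any generation call when the token is missing. The sketch below is one way to do that. The function name require_runcomfy_token is hypothetical, and returning 77 only mirrors the "not signed in" exit code listed later on this page; the CLI does not require it.

```shell
# Hypothetical preflight for CI: stop early when RUNCOMFY_TOKEN is absent.
# Interactive setups that rely on `runcomfy login` do not need this check.
require_runcomfy_token() {
  if [ -z "${RUNCOMFY_TOKEN:-}" ]; then
    echo "RUNCOMFY_TOKEN is not set; export it before calling runcomfy" >&2
    return 77
  fi
  return 0
}
```

Call it once at the top of a CI script, before the first runcomfy run.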
FLUX 2 Klein 9B — blackforestlabs/flux-2-klein/9b/text-to-image (default)
> Step-distilled, 4–25 steps, native multi-reference conditioning, strong photoreal + illustration all-rounder.
> Pick for: intent unclear, fast iteration, multi-ref styling, general-purpose.
> Avoid for: in-image text — use GPT Image 2.
FLUX 2 Klein 4B — blackforestlabs/flux-2-klein/4b/text-to-image
> Sub-second variant of Klein 9B, same field set.
> Pick for: storyboard, moodboard, batch concepting at speed.
> Avoid for: final delivery — slight quality drop vs 9B.
FLUX 2 Pro / Dev / Flash / Turbo / Max — blackforestlabs/flux-2/max, flux-2-dev, flux-2-flash, flux-2-turbo
> Higher-fidelity tiers of the FLUX 2 base. Cinematic + brand work, hero shots.
> Pick for: production polish, brand campaigns.
> Avoid for: sub-second speed — use Klein 4B.
Nano Banana Pro — google/nano-banana-pro/text-to-image
> Highest-quality Nano Banana tier. Gemini-grounded, optional web search for real-world references (products, landmarks).
> Pick for: NB-style instruction-following at higher fidelity.
> Avoid for: cost-sensitive iteration — drop to Nano Banana 2.
Nano Banana 2 — google/nano-banana-2/text-to-image
> Flash-tier latency, predictable framing, enable_web_search flag for real-product / real-person grounding.
> Pick for: speed iteration, 4-up batch, real-world grounded prompts.
> Avoid for: long compositional instructions — use GPT Image 2.
GPT Image 2 — openai/gpt-image-2/text-to-image
> Best-in-class in-image text rendering (Japanese kana, Cyrillic, Arabic). Layout-precise instruction following.
> Pick for: posters, ads, multi-line copy, multilingual creatives, exact-text headlines.
> Avoid for: photoreal portraits — Seedream 5 wins on skin tones and lighting.
Seedream 5 Lite — bytedance/seedream-5/lite/text-to-image
> Latest ByteDance Seedream tier. Photoreal skin tones, natural lighting, strong East Asian aesthetic.
> Pick for: photoreal portraits, product shots, fashion / lifestyle.
> Avoid for: typography precision — use GPT Image 2.
Seedream 4-5 — bytedance/seedream-4-5/text-to-image
> Previous Seedream flagship, still strong on photoreal.
> Pick for: identity-stable batches between Seedream-5 generations; cheaper Seedream tier.
> Avoid for: new work — prefer Seedream 5 Lite.
Dreamina 4-0 — bytedance/dreamina-4-0/text-to-image
> ByteDance illustration / concept-art lean, stylized characters.
> Pick for: concept art, illustrated heroes, painterly assets.
> Avoid for: photoreal — use Seedream.
Qwen Image 2512 — qwen/qwen-image/qwen-image-2512
> Alibaba Qwen latest, open-weights, LoRA-compatible (/lora variant).
> Pick for: open-weights workflow, Qwen-aligned LoRA chains.
> Avoid for: closed-weights polish — use FLUX 2 or GPT Image 2.
Wan 2-7 — wan-ai/wan-2-7/text-to-image, wan-ai/wan-2-7/pro/text-to-image
> Open-weights, pairs natively with Wan 2-7 video models for unified-stack workflows.
> Pick for: Wan-stack pipelines (image + video same brand), open-weights requirement.
> Avoid for: top-tier image-only quality.
Z-Image Turbo — tongyi-mai/z-image/turbo
> Sub-second open-weights, native LoRA /lora variant.
> Pick for: LoRA-customized open-weights workflow at speed.
> Avoid for: closed-weights polish.
Nano Banana Pro Edit — google/nano-banana-pro/edit
> Highest-quality Nano Banana edit tier. Identity-preserving, multi-ref.
> Pick for: premium NB edit work, identity-locked variants.
> Avoid for: cost-sensitive iteration — drop to Nano Banana 2 Edit.
Nano Banana 2 Edit — google/nano-banana-2/edit (default i2i)
> 1–20 input images per call, identity-preserving by default, spatial-language honored ("upper-right", "the left object").
> Pick for: default i2i, batch identity-preserving, background swap, directional object remove/add.
> Avoid for: precise mask region — use the image-edit skill (Z-Image Inpaint).
GPT Image 2 Edit — openai/gpt-image-2/edit
> Up to 10 reference images, multilingual in-image text rewrite, layout-precise repositioning.
> Pick for: multilingual headline swap, multi-ref composition, layout repositioning, brand-locked identity across translations.
> Avoid for: mask-driven inpainting — use image-edit skill.
Seedream 5 Lite Edit — bytedance/seedream-5/lite/edit
> Latest Seedream edit tier, photoreal preservation.
> Pick for: photoreal edits that started from a Seedream t2i (identity holds across the pair).
> Avoid for: multilingual text rewrite.
Seedream 4-5 Edit — bytedance/seedream-4-5/edit
> Previous Seedream edit.
> Pick for: identity-stable batches between 4-5 generations.
> Avoid for: new work — prefer Seedream 5 Lite Edit.
Dreamina 4-0 Edit — bytedance/dreamina-4-0/edit
> ByteDance illustration edit.
> Pick for: editing a Dreamina-generated illustration.
> Avoid for: photoreal subjects.
Qwen Image Edit 2511 — qwen/qwen-image/qwen-image-edit-2511
> Alibaba open-weights edit.
> Pick for: open-weights edit pipeline.
> Avoid for: closed-weights polish.
Wan 2.6 i2i — wan-ai/wan-v2.6/image-to-image
> Wan ecosystem image-to-image.
> Pick for: Wan-stack pipeline integration.
> Avoid for: new work — older generation; prefer NB or GPT Image 2.
FLUX Kontext Pro — blackforestlabs/flux-1-kontext/pro/edit
> Single-ref single-instruction, highest preservation fidelity ("keep everything except X").
> Pick for: single-image precise local edit ("change only her umbrella to orange").
> Avoid for: batch work, multi-ref composition, mask-driven inpainting.
> Need mask-driven inpainting, controlled outpainting, or the full edit treatment? → use the image-edit skill.
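The pick / avoid notes above amount to a lookup from intent to endpoint. A minimal sketch of that lookup follows. The intent keywords (poster, portrait, and so on) are hypothetical labels invented for this example, and the mapping covers only a few of the listed models.

```shell
# pick_endpoint INTENT: map a coarse intent keyword to an endpoint from the list above.
# The keywords are hypothetical; unknown intents fall back to the stated default (Klein 9B).
pick_endpoint() {
  case "$1" in
    poster|headline|multilingual) echo "openai/gpt-image-2/text-to-image" ;;
    portrait|product)             echo "bytedance/seedream-5/lite/text-to-image" ;;
    illustration|concept-art)     echo "bytedance/dreamina-4-0/text-to-image" ;;
    storyboard|moodboard)         echo "blackforestlabs/flux-2-klein/4b/text-to-image" ;;
    edit)                         echo "google/nano-banana-2/edit" ;;
    *)                            echo "blackforestlabs/flux-2-klein/9b/text-to-image" ;;
  esac
}
```

For example, pick_endpoint poster prints the GPT Image 2 endpoint, and any unrecognized keyword prints the Klein 9B default.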
Models: blackforestlabs/flux-2-klein/9b/text-to-image (default), blackforestlabs/flux-2-klein/4b/text-to-image (sub-second)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | none | Up to ~512 tokens; longer degrades. Subject-first declarative |
| steps | int | no | 25 (9B) / 4 (4B) | Step-distilled; 4–8 enough for ideation, ~25 for polish, >25 buys little |
| width | int | no | 1024 | 512–1536 typical, max ~2K total. Aspect cap 16:9 |
| height | int | no | 1024 | Match width's aspect intent |
Up to 4 reference images supported on the same endpoint for style transfer / guided composition. Field name documented on the model page.
Polish / final (9B):
runcomfy run blackforestlabs/flux-2-klein/9b/text-to-image \
--input '{
"prompt": "A small purple cat sitting on a moss-covered stone, golden hour rim light, shallow depth of field, photoreal",
"steps": 25,
"width": 1536,
"height": 864
}' \
--output-dir ./out
Sub-second concepting (4B):
runcomfy run blackforestlabs/flux-2-klein/4b/text-to-image \
--input '{"prompt": "A small purple cat at sunset, photoreal"}' \
--output-dir ./out
- With multiple references, state what each one contributes (e.g. "subject from ref 1, palette from ref 2").
Model: openai/gpt-image-2/text-to-image
Catalog: runcomfy.com/models/openai/gpt-image-2
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | none | Quote in-image text exactly with "…" |
| size | enum | no | 1024_1024 | 1024_1024 (1:1), 1024_1536 (2:3 portrait), 1536_1024 (3:2 landscape); only these three |
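Because size accepts only three values, a small lookup can keep a script from sending an unsupported one. This is a sketch; the layout keywords (square, portrait, landscape) and the function name are hypothetical, and only the three size strings come from the table above.

```shell
# gpt_image_size LAYOUT: map a layout keyword to one of the three documented size values.
# The keywords are hypothetical labels for this sketch.
gpt_image_size() {
  case "$1" in
    square)    echo "1024_1024" ;;
    portrait)  echo "1024_1536" ;;
    landscape) echo "1536_1024" ;;
    *) echo "unsupported layout: $1" >&2; return 1 ;;
  esac
}
```

An unknown keyword prints a message to stderr and returns non-zero, so a calling script can stop before invoking the CLI.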
Logo / poster with exact headline:
runcomfy run openai/gpt-image-2/text-to-image \
--input '{
"prompt": "Minimal product poster. Centered bold headline reads exactly \"AURORA — Spring 2026\" in clean white sans-serif on a deep navy background. Below the headline a small line in monospace reads \"runs on water\". 3:2 layout.",
"size": "1536_1024"
}' \
--output-dir ./out
Multilingual:
runcomfy run openai/gpt-image-2/text-to-image \
--input '{
"prompt": "Japanese magazine cover. Vertical headline reads exactly \"今日のおすすめ\" in bold Japanese kana, right-edge alignment, photoreal portrait of a woman in a kimono.",
"size": "1024_1536"
}' \
--output-dir ./out
- Quote the text literally: "the sign reads exactly 'CLOSED'". Without the literal quote the model paraphrases.
- Name the script: "Japanese kana", "Cyrillic", "Arabic right-to-left". Without this it falls back to romanization.
- Spell out placement: "top-left", "centered", "two-line stacked", "baseline aligned".
Model: google/nano-banana-2/text-to-image
Catalog: runcomfy.com/models/google/nano-banana-2 · nano-banana collection
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | none | Subject-first description |
| num_images | int | no | 1 | 1–4. Use 4 for ideation rounds |
| seed | int | no | 0 | Reuse for reproducibility |
| aspect_ratio | enum | no | auto | auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 |
| resolution | enum | no | 1K | 0.5K (drafts), 1K (default), 2K (final), 4K (max) |
| output_format | enum | no | png | png, jpeg, webp |
| safety_tolerance | int | no | 4 | 1 (strict) – 6 (permissive) |
| enable_web_search | bool | no | false | Adds web grounding (extra cost + latency) |
Default draft:
runcomfy run google/nano-banana-2/text-to-image \
--input '{"prompt": "A coffee mug on marble counter, top-down warm morning light"}' \
--output-dir ./out
4-up batch for ideation:
runcomfy run google/nano-banana-2/text-to-image \
--input '{
"prompt": "Product photo of a ceramic coffee mug on a marble counter, warm morning light, top-down angle, minimal styling",
"num_images": 4,
"aspect_ratio": "1:1",
"resolution": "0.5K"
}' \
--output-dir ./out
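The draft and final resolution tiers in the schema can be wrapped in a helper so a script switches between them with one word. This is a sketch under stated assumptions: the function name nb2_input and the tier names draft / final are hypothetical, and the prompt is assumed to contain no double quotes or backslashes, since no JSON escaping is done here.

```shell
# nb2_input TIER PROMPT: print a Nano Banana 2 input body for a draft or final run.
# Hypothetical helper; PROMPT must not contain double quotes or backslashes.
nb2_input() {
  tier=$1
  prompt=$2
  case "$tier" in
    draft) resolution="0.5K"; num_images=4 ;;  # cheap 4-up ideation round
    final) resolution="2K";   num_images=1 ;;  # single higher-resolution render
    *) echo "unknown tier: $tier" >&2; return 1 ;;
  esac
  printf '{"prompt": "%s", "num_images": %d, "resolution": "%s", "enable_web_search": false}\n' \
    "$prompt" "$num_images" "$resolution"
}
```

The output can be passed as the --input value, for example --input "$(nb2_input draft 'A coffee mug on marble counter')".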
- Set enable_web_search: true when the prompt names a real product, place, or person whose appearance must match reality (logos, landmarks).
- Use 0.5K for ideation and jump to 2K+ only for finals; 4K costs roughly 16× as much as 0.5K.
Models: bytedance/seedream-5/lite/text-to-image · bytedance/seedream-4-5/text-to-image
Collection: seedream
runcomfy run bytedance/seedream-5/lite/text-to-image \
--input '{"prompt": "85mm portrait of a woman by a window, soft natural light, shallow depth of field, photoreal"}' \
--output-dir ./out
The field schema is on the model page; pass it through the CLI verbatim.
For workflows that want open-weights / LoRA support, or alternative aesthetics:
| Model | Endpoint | When |
|---|---|---|
| Wan 2-7 | wan-ai/wan-2-7/text-to-image | Wan ecosystem; pair with Wan 2-7 video models |
| Wan 2-7 Pro | wan-ai/wan-2-7/pro/text-to-image | Wan Pro tier |
| Z-Image Turbo | tongyi-mai/z-image/turbo | Sub-second, supports LoRA via /lora endpoint |
| Qwen Image 2512 | qwen/qwen-image/qwen-image-2512 | Qwen Image, open-weights, also has /lora variant |
| Dreamina 4-0 | bytedance/dreamina-4-0/text-to-image | Illustration / concept art lean |
Schemas live on each model page; pass the field set through the CLI verbatim.
For one-shot edits, this skill ships three core routes; for the full edit treatment (mask-driven inpainting, batch-edit, all the side schemas), use the dedicated image-edit skill.
runcomfy run google/nano-banana-2/edit \
--input '{
"prompt": "Keep the subject identity, pose, and clothing unchanged. Convert the background into a rainy neon cyberpunk street.",
"image_urls": ["https://.../portrait.jpg"]
}' \
--output-dir ./out
Schema: prompt, image_urls (1–20), number_of_images (1–4), aspect_ratio (auto default), resolution, output_format, seed, enable_web_search. Lead the prompt with preservation goals, end with the change.
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Keep the photo and layout exactly as in the input. Replace only the headline with \"今日のおすすめ\" in bold Japanese kana.",
"images": ["https://.../poster-en.jpg"],
"size": "auto"
}' \
--output-dir ./out
Schema: prompt, images (up to 10 HTTPS refs; image 1 is primary), size (auto / 1024_1024 / 1024_1536 / 1536_1024). size: "auto" preserves input ratio.
runcomfy run blackforestlabs/flux-1-kontext/pro/edit \
--input '{
"prompt": "Keep the person'\''s face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",
"image": "https://.../portrait.jpg"
}' \
--output-dir ./out
Schema: prompt, image (single URL only — no array), aspect_ratio, seed. One declarative instruction per call; iterate compound edits in passes.
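Prompts that contain apostrophes force the '\'' escape seen in the example above. One way to avoid it is to build the JSON body in a quoted here-document and pass it through command substitution. The sketch below uses only standard shell features plus the same runcomfy run flags shown on this page; the function names are hypothetical, and the image URL is left elided as in the original.

```shell
# kontext_input: print the JSON body. The quoted delimiter ('JSON') disables all
# shell expansion inside the here-document, so apostrophes need no escaping.
kontext_input() {
  cat <<'JSON'
{
  "prompt": "Keep the person's face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",
  "image": "https://.../portrait.jpg"
}
JSON
}

# run_kontext_edit: hypothetical wrapper that passes the body to the CLI as one argument.
run_kontext_edit() {
  runcomfy run blackforestlabs/flux-1-kontext/pro/edit \
    --input "$(kontext_input)" \
    --output-dir ./out
}
```

The double quotes around the command substitution keep the whole body as a single --input argument.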
Same-brand t2i→i2i pairs let you generate then refine without leaving the brand:
| Brand | t2i endpoint | i2i / edit endpoint |
|---|---|---|
| Seedream 5 Lite | bytedance/seedream-5/lite/text-to-image | bytedance/seedream-5/lite/edit |
| Seedream 4-5 | bytedance/seedream-4-5/text-to-image | bytedance/seedream-4-5/edit |
| Dreamina 4-0 | bytedance/dreamina-4-0/text-to-image | bytedance/dreamina-4-0/edit |
| Nano Banana Pro | google/nano-banana-pro/text-to-image | google/nano-banana-pro/edit |
| Qwen Image | qwen/qwen-image/qwen-image-2512 | qwen/qwen-image/qwen-image-edit-2511 |
| Wan 2-7 / 2.6 | wan-ai/wan-2-7/text-to-image | wan-ai/wan-v2.6/image-to-image |
For the full "best image-editing models" curated list with side-by-side capability notes, see the best-image-editing-models collection.
- GPT Image 2: size: "1536_1024" for landscape.
- GPT Image 2: "the headline reads exactly '…' in [font weight] [font family]".
- FLUX 2 Klein: steps: 25 and explicit lens/lighting language.
- FLUX 2 Klein: steps: 6, fixed seed per character to keep identity drift low.
- Nano Banana 2: resolution: "0.5K", num_images: 4, vary seed across runs.
This skill covers the high-traffic models. Full RunComfy image catalog by use case:
- nano-banana collection
- seedream collection
- flux-kontext collection
- qwen-image collection
- dreamina collection
- best-image-editing-models collection
- recently-added collection (fresh additions)
Every model page has an API tab with the exact JSON schema; pass the field set through the CLI verbatim.
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
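Since the table marks only exit code 75 as retryable, a wrapper can rerun a command on 75 and pass every other status straight through. This is a minimal sketch: run_with_retry and the RETRY_DELAY variable are hypothetical names, and the linear backoff is an arbitrary choice for illustration.

```shell
# run_with_retry MAX CMD...: rerun CMD while it exits 75, at most MAX attempts.
# Uses plain (global) variables to stay POSIX-compatible.
run_with_retry() {
  max_attempts=$1
  shift
  attempt=1
  while :; do
    rc=0
    "$@" || rc=$?
    # Success and every non-75 failure are returned unchanged.
    if [ "$rc" -ne 75 ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$rc"
    fi
    # Linear backoff; RETRY_DELAY (seconds) is a hypothetical knob, default 2.
    sleep $(( ${RETRY_DELAY:-2} * $attempt ))
    attempt=$(( $attempt + 1 ))
  done
}

# Example call (not executed here):
#   run_with_retry 3 runcomfy run google/nano-banana-2/text-to-image \
#     --input '{"prompt": "..."}' --output-dir ./out
```

Codes 64, 65 and 77 point at problems a retry will not fix, which is why the wrapper returns them immediately.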
The skill classifies the user request into one of the t2i or i2i routes above and invokes runcomfy run with the matching JSON body. The CLI POSTs to the RunComfy Model API, polls request status, fetches the result, and downloads any .runcomfy.net / .runcomfy.com URLs into --output-dir. Ctrl-C cancels the remote request before exit.
- Install with npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf; if the operator wants the curl-pipe path documented at docs.runcomfy.com/cli/install, they should review the script first.
- runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set the RUNCOMFY_TOKEN env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
- Prompts are passed as JSON via --input. The CLI does not shell-expand prompt content; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content, even with backticks, quotes, or $(...) patterns.
- enable_web_search results are untrusted. They are fetched by the RunComfy model server and can influence generation through embedded instructions (text painted into an image, EXIF strings, web-grounded steering). Agent mitigations:
  - Default enable_web_search to false; flip to true only on explicit user request for real-world grounding.
- Network traffic goes to model-api.runcomfy.net, plus .runcomfy.net / .runcomfy.com for generated-output downloads. No telemetry, no callbacks.
- Per call, the skill runs runcomfy only. The npm / npx / export RUNCOMFY_TOKEN=... lines are one-time operator setup, not commands the skill executes per call.
See also: best-image-editing-models collection · nano-banana · seedream · flux-kontext · qwen-image · dreamina (RunComfy brand collections)