Overview

Media Generation

Handle image generation, image editing, and short video generation through one workflow: choose the right modality, pass caller intent through to the provider, save outputs under tmp/images/ or tmp/videos/, and prefer the bundled helpers over ad-hoc one-off API calls.

Workflow decision

If the user wants a brand-new still image, use an image-generation model.
If the user supplies an image or wants a specific existing image changed, use an image-edit workflow.
If the user wants motion / a clip / a short video, use a video-generation model.
If the request includes one or more reference images, use the helper that supports reference-image transport.
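The decision rules above can be sketched as a small routing function. This is an illustration only: `MediaRequest` and `choose_helper` are hypothetical names, and the precedence (reference images first, then video, then edit) is an assumption, since the rules do not state an ordering.

```python
from dataclasses import dataclass


@dataclass
class MediaRequest:
    """Hypothetical summary of what the caller asked for."""
    wants_motion: bool = False       # user asked for a clip or short video
    has_source_image: bool = False   # user supplied an image to change
    reference_image_count: int = 0   # reference images to pass through


def choose_helper(request: MediaRequest) -> str:
    """Return the bundled helper script that fits the request.

    Assumption: reference images take precedence over the other rules.
    """
    if request.reference_image_count > 0:
        return "scripts/reference_media.py"
    if request.wants_motion:
        return "scripts/generate_video.py"
    if request.has_source_image:
        return "scripts/edit_image.py"
    return "scripts/generate_image.py"
```

Localized edits, outpainting, and batch jobs are left out here; they map to their own helpers as listed in the standard workflow.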

Standard workflow

Determine whether the task is image generation, image editing, or video generation.
Ask the user for clarification only when the request cannot be executed correctly without it.
Prefer scripts/generate_image.py for still-image generation.
Prefer scripts/edit_image.py for direct image edits.
Prefer scripts/mask_inpaint.py for localized edits with masks or generated regions.
Prefer scripts/outpaint_image.py for canvas expansion / outpainting.
Prefer scripts/reference_media.py when reference images need to be passed through.
Prefer scripts/generate_video.py for video generation, especially when the provider may return async job payloads.
Prefer scripts/generate_batch_media.py for repeatable batch jobs, templated variations, or auditable manifests.
Prefer scripts/object_select_edit.py for simple object-vs-background edits on transparent assets or clean backdrops.
If the provider returns a URL, path, HTML snippet, markdown snippet, data: URL, or b64_json, use scripts/fetch_generated_media.py to turn it into a local file.
Save outputs under:

images → tmp/images/
videos → tmp/videos/

If the user wants files sent in chat, prefer sending the locally downloaded file.
Keep the original remote reference as a fallback in case local retrieval fails.
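The return shapes listed above (URL, path, HTML snippet, markdown snippet, data: URL, b64_json) need to be told apart before they can be fetched. A rough sketch of that classification follows; `classify_media_ref` is a hypothetical name and the regular expressions are simplified assumptions, not the logic of scripts/fetch_generated_media.py.

```python
import re
from typing import Any, Mapping, Union

# Simplified patterns; real provider output may need stricter parsing.
_HTML_SRC = re.compile(r"<\w+[^>]*\b(?:src|href)\s*=", re.IGNORECASE)
_MARKDOWN_LINK = re.compile(r"!?\[[^\]]*\]\([^)\s]+\)")


def classify_media_ref(ref: Union[str, Mapping[str, Any]]) -> str:
    """Label a provider result with the kind of media reference it holds."""
    if isinstance(ref, Mapping):
        # Assumption: base64 payloads arrive as a JSON object field.
        if "b64_json" in ref:
            return "b64_json"
        raise ValueError("mapping without a b64_json field")
    text = ref.strip()
    if text.startswith("data:"):
        return "data_url"
    if _HTML_SRC.search(text):
        return "html"
    if _MARKDOWN_LINK.search(text):
        return "markdown"
    if text.startswith(("http://", "https://")):
        return "url"
    return "path"
```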

Prompt handling

Default to prompt pass-through.

Pass the caller's prompt through unchanged.
Use optional request fields only when the caller provides them.
Keep prompt semantics under caller control.
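A minimal sketch of prompt pass-through, assuming a JSON request body. The field names (`model`, `prompt`, `size`, `seconds`) and the function `build_payload` are illustrative assumptions; the actual provider schema may differ.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    model: str,
    size: Optional[str] = None,
    seconds: Optional[int] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a request body without touching the caller's prompt."""
    # The prompt is copied verbatim: no trimming, rewriting, or expansion.
    payload: Dict[str, Any] = {"model": model, "prompt": prompt}
    # Optional fields are included only when the caller supplied them.
    if size is not None:
        payload["size"] = size
    if seconds is not None:
        payload["seconds"] = seconds
    # Provider-specific fields are merged as given.
    if extra:
        payload.update(extra)
    return payload
```

Omitting unset fields, rather than sending defaults, leaves the provider's own defaults in effect.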

Use the scripts mainly as functional helpers:

normalize arguments
map fields to provider-specific JSON
upload files
poll async jobs
download returned media
save outputs under tmp/images/ or tmp/videos/
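Of the helper duties above, async polling is the least obvious. The loop below is a generic sketch, not the code in scripts/generate_video.py: `poll_job`, the `status` field, and the status names are all assumptions, and `fetch_status` stands in for whatever call retrieves the job payload.

```python
import time
from typing import Any, Callable, Dict

# Assumed status vocabulary; real providers name these states differently.
SUCCESS_STATES = {"succeeded", "completed"}
FAILURE_STATES = {"failed", "cancelled"}


def poll_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the job finishes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        status = str(job.get("status", "")).lower()
        if status in SUCCESS_STATES:
            return job
        if status in FAILURE_STATES:
            raise RuntimeError(f"job failed with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` as a parameter lets the loop be exercised without real waiting.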

Delivery rules

Save generated or edited images in tmp/images/.
Save generated videos in tmp/videos/.
Never scatter generated files in the workspace root.
If message delivery blocks remote URLs, download locally first and then send the local file.
If a remote file cannot be fetched locally but the raw link may still help, provide the original link clearly.
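The delivery rules above reduce to two decisions: which directory a file belongs in, and whether to send the local file or fall back to the remote link. A small sketch, with `output_dir_for` and `pick_deliverable` as hypothetical names:

```python
from pathlib import Path
from typing import Optional, Tuple

# Output locations taken from the delivery rules.
OUTPUT_DIRS = {"image": Path("tmp/images"), "video": Path("tmp/videos")}


def output_dir_for(kind: str) -> Path:
    """Return the directory for a media kind ('image' or 'video')."""
    try:
        return OUTPUT_DIRS[kind]
    except KeyError:
        raise ValueError(f"unknown media kind: {kind!r}") from None


def pick_deliverable(
    local_path: Optional[Path], remote_url: Optional[str]
) -> Tuple[str, str]:
    """Prefer the downloaded file; fall back to the original link."""
    if local_path is not None and local_path.is_file():
        return ("file", str(local_path))
    if remote_url:
        return ("link", remote_url)
    raise FileNotFoundError("no local file and no remote reference to deliver")
```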

Helper quick guide

Use the smallest helper that matches the request:

scripts/generate_image.py → direct still-image generation
scripts/edit_image.py → direct full-image edits
scripts/mask_inpaint.py → localized edits with an explicit or generated mask
scripts/outpaint_image.py → canvas expansion before an edit call
scripts/reference_media.py → reference-image transport and delegation
scripts/generate_consistent_media.py → backward-compatible wrapper only
scripts/generate_batch_media.py → repeatable manifest-driven batches
scripts/object_select_edit.py → simple object-vs-background edits on transparent or clean-backdrop assets
scripts/generate_video.py → direct video generation and async polling
scripts/fetch_generated_media.py → normalize returned media refs into local files

Use references/model-capabilities.md when deciding which helper fits the modality, transport, or return shape.

Use references/reference-image-workflow.md for reference-image transport details.

Use references/batch-workflows.md for manifest structure and batch execution behavior.

Minimal examples:

python3 skills/media-generation/scripts/generate_image.py \
  --prompt 'person' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'generated'

python3 skills/media-generation/scripts/edit_image.py \
  --image 'tmp/images/source.jpg' \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'edited'

python3 skills/media-generation/scripts/mask_inpaint.py \
  --image 'tmp/images/source.jpg' \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt 'replace the masked area' \
  --out-dir 'tmp/images' \
  --prefix 'mask-result'

python3 skills/media-generation/scripts/outpaint_image.py \
  --image 'tmp/images/source.jpg' \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt 'extend outward' \
  --out-dir 'tmp/images' \
  --prefix 'outpaint-result'

python3 skills/media-generation/scripts/reference_media.py \
  --mode image \
  --reference-image 'tmp/images/reference.png' \
  --prompt 'character' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'reference-output'

python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest 'tmp/images/media-batch.jsonl' \
  --vars-json '{"subject":"item"}' \
  --summary-out 'tmp/images/media-batch-summary.json' \
  --continue-on-error \
  --print-json

python3 skills/media-generation/scripts/object_select_edit.py \
  --image 'tmp/images/product.png' \
  --selection-mode alpha \
  --edit-target background \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'product-bg-edit'

python3 skills/media-generation/scripts/generate_video.py \
  --prompt 'motion clip' \
  --size '720x1280' \
  --seconds 6 \
  --out-dir 'tmp/videos' \
  --prefix 'generated-video'
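For the outpaint example above, it helps to know the canvas size that the margins imply. The arithmetic below is a sketch of that calculation only; `expand_canvas` and `Canvas` are hypothetical names, and how scripts/outpaint_image.py actually builds the canvas is not described here.

```python
from typing import NamedTuple


class Canvas(NamedTuple):
    width: int     # expanded canvas width in pixels
    height: int    # expanded canvas height in pixels
    paste_x: int   # x offset of the original image on the new canvas
    paste_y: int   # y offset of the original image on the new canvas


def expand_canvas(
    width: int, height: int,
    left: int = 0, right: int = 0, top: int = 0, bottom: int = 0,
) -> Canvas:
    """Compute the expanded canvas and where the source image sits on it."""
    if width <= 0 or height <= 0:
        raise ValueError("source dimensions must be positive")
    if min(left, right, top, bottom) < 0:
        raise ValueError("margins must be non-negative")
    return Canvas(width + left + right, height + top + bottom, left, top)
```

With the margins from the example (512 left and right, 128 top and bottom), a 1024x1024 source would need a 2048x1280 canvas, with the original placed at offset (512, 128).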

Quick compatibility checklist

Before blaming the skill, check these first:

config exists and is valid JSON
config.models.providers. exists
the selected provider has both baseUrl and apiKey
the chosen endpoint actually exists on that provider
the chosen model name is valid for that endpoint
any provider-specific fields passed through --extra-json or --extra-json-file match that provider's schema
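The first three checks can be automated. The sketch below assumes the config nests providers under `models.providers` with `baseUrl` and `apiKey` keys, as the checklist implies; `check_config` is a hypothetical helper, not part of the bundled scripts.

```python
import json
from typing import List


def check_config(config_text: str, provider: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks pass."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc}"]
    models = config.get("models") if isinstance(config, dict) else None
    providers = models.get("providers") if isinstance(models, dict) else None
    if not isinstance(providers, dict) or not providers:
        return ["config.models.providers is missing or empty"]
    entry = providers.get(provider)
    if not isinstance(entry, dict):
        return [f"provider '{provider}' not found in config.models.providers"]
    # Both connection fields must be present and non-empty.
    return [
        f"provider '{provider}' is missing {key}"
        for key in ("baseUrl", "apiKey")
        if not entry.get(key)
    ]
```

Endpoint, model name, and extra-field checks depend on the provider and cannot be validated from the config alone.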

Defaults used by the bundled scripts:

config path: ~/.openclaw/openclaw.json or $OPENCLAW_CONFIG
default provider: $OPENCLAW_MEDIA_PROVIDER, otherwise the first provider found in config
default model names: placeholders unless overridden by env vars or --model
  image → $OPENCLAW_MEDIA_IMAGE_MODEL or image-model
  edit → $OPENCLAW_MEDIA_EDIT_MODEL or image-edit-model
  video → $OPENCLAW_MEDIA_VIDEO_MODEL or video-model
output root: tmp/ or $MEDIA_GENERATION_OUTPUT_ROOT
output paths are resolved relative to the current working directory unless you pass an absolute --out-dir
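The defaults above amount to a few lookups with fallbacks. The sketch below shows one plausible resolution order; the function names are hypothetical, and two precedence choices are assumptions: that --model wins over the env var, and that $OPENCLAW_CONFIG wins over the default path.

```python
import os
from pathlib import Path
from typing import Any, Mapping, Optional

PLACEHOLDER_MODELS = {
    "image": "image-model",
    "edit": "image-edit-model",
    "video": "video-model",
}
MODEL_ENV_VARS = {
    "image": "OPENCLAW_MEDIA_IMAGE_MODEL",
    "edit": "OPENCLAW_MEDIA_EDIT_MODEL",
    "video": "OPENCLAW_MEDIA_VIDEO_MODEL",
}


def resolve_config_path(env: Mapping[str, str] = os.environ) -> Path:
    """Use $OPENCLAW_CONFIG when set, else the default config location."""
    return Path(env.get("OPENCLAW_CONFIG") or "~/.openclaw/openclaw.json").expanduser()


def resolve_provider(
    config: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Use $OPENCLAW_MEDIA_PROVIDER when set, else the first provider in config."""
    chosen = env.get("OPENCLAW_MEDIA_PROVIDER")
    if chosen:
        return chosen
    providers = config.get("models", {}).get("providers", {})
    return next(iter(providers), None)


def resolve_model(
    kind: str, cli_model: Optional[str] = None, env: Mapping[str, str] = os.environ
) -> str:
    """Assumed order: --model value, then env var, then the placeholder name."""
    return cli_model or env.get(MODEL_ENV_VARS[kind]) or PLACEHOLDER_MODELS[kind]
```

If `resolve_model` returns one of the placeholder names, no real model was configured, which is the case the placeholder model warning reports.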

Quick troubleshooting

Common failure patterns:

provider not found → pass --provider explicitly or set $OPENCLAW_MEDIA_PROVIDER
placeholder model warning (image-model / image-edit-model / video-model) → pass --model explicitly or set the matching $OPENCLAW_MEDIA_*_MODEL env var
config not found / invalid JSON → pass --config explicitly or fix the OpenClaw config file
HTTP 404 → check --endpoint and video polling paths
HTTP 400 → check model name and provider-specific payload fields in --extra-json / --extra-json-file
HTTP 401/403 → check the provider apiKey
request failed before HTTP response → check base URL, proxy/TLS, or network reachability
video accepted then failed later → check request payload, provider logs, or switch provider/model
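The HTTP-related rows above can be expressed as a lookup from status code to hint. This is an illustrative sketch; `hint_for` is a hypothetical name and the bundled scripts may word or select their hints differently.

```python
from typing import Optional

# Hints copied from the failure patterns listed above.
STATUS_HINTS = {
    400: "check model name and provider-specific payload fields in --extra-json / --extra-json-file",
    401: "check the provider apiKey",
    403: "check the provider apiKey",
    404: "check --endpoint and video polling paths",
}


def hint_for(status: Optional[int]) -> str:
    """Map an HTTP status (or None when no response arrived) to a hint."""
    if status is None:
        # The request failed before any HTTP response was received.
        return "check base URL, proxy/TLS, or network reachability"
    return STATUS_HINTS.get(status, "no specific hint; rerun with --print-json")
```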

Use --print-json when debugging so the response body, resolved endpoint, and failure hints stay visible.

References

Read these selectively:

helper selection, modality fit, transport notes, return-shape handling → references/model-capabilities.md
reference-image transport rules and compatibility notes → references/reference-image-workflow.md
manifest format, templating, and batch execution behavior → references/batch-workflows.md

Primary helpers:

image generation → scripts/generate_image.py
image edit → scripts/edit_image.py
mask inpaint → scripts/mask_inpaint.py
outpaint → scripts/outpaint_image.py
reference-image transport → scripts/reference_media.py
backward-compatible wrapper → scripts/generate_consistent_media.py
video generation → scripts/generate_video.py
batch generation → scripts/generate_batch_media.py
object-select edit → scripts/object_select_edit.py
object mask prep → scripts/prepare_object_mask.py
shared request utility → scripts/media_request_common.py
smoke tests → scripts/smoke_test.py
media retrieval → scripts/fetch_generated_media.py

Version history

2 versions in total

v2.2.0 (current), 2026-05-01 10:09, security status: safe
v2.1.0, 2026-03-19 13:20, security status: safe

Security check

Tencent Cloud Security (Keen): safe, no risk
Tencent Cloud Security (Sanbu): safe, no risk
