Generate AI videos and images using Alibaba's Wan 2.6 and Wan 2.5 — featuring text-to-video, image-to-video, video-to-video, text-to-image, and image editing with up to 1080p resolution, 15-second duration, multi/single camera modes, audio-guided generation, and built-in prompt expansion.
Wan 2.6 is the latest flagship model with cinematic motion quality, multi-camera shot types, audio URL input for guided generation, and video-to-video transfer with character-level prompt control. Wan 2.5 offers a cost-effective alternative with 480p support and Flash variants for rapid prototyping.
> Data usage note: This skill sends text prompts, image URLs, audio URLs, and video files to the Atlas Cloud API (api.atlascloud.ai) for generation. No data is stored locally beyond the downloaded output files. API usage incurs charges based on resolution, duration, and model selected.
- characterX prompt notation (2.6)
- multi_camera for dynamic shots, single_camera for stable framing (2.6)

export ATLASCLOUD_API_KEY="your-key"

The API key is tied to your Atlas Cloud account and its pay-as-you-go balance. All usage is billed to this account. Atlas Cloud does not currently support scoped keys, so the key grants access to all models available on your account.
This skill includes Python scripts for both video and image generation. Zero external dependencies required.
python scripts/generate_video.py list-models
python scripts/generate_image.py list-models
python scripts/generate_video.py generate \
--model "alibaba/wan-2.6/text-to-video" \
--prompt "Your prompt here" \
--output ./output \
duration=5
python scripts/generate_image.py generate \
--model "alibaba/wan-2.6/text-to-image" \
--prompt "Your prompt here" \
--output ./output
python scripts/generate_video.py generate \
--model "alibaba/wan-2.6/image-to-video" \
--image "https://example.com/photo.jpg" \
--prompt "Animate this scene" \
--output ./output \
resolution=1080p duration=5
python scripts/generate_video.py upload ./local-file.jpg
Run python scripts/generate_video.py generate --help or python scripts/generate_image.py generate --help for all options. Extra model params can be passed as key=value.
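
The bundled scripts' own parsing of trailing `key=value` arguments is not shown here. As a rough sketch of how such pairs could be turned into model parameters, the hypothetical helpers below (`parse_extra_params` and `coerce_value` are illustrative names, not part of the scripts) split each pair and convert obvious booleans and integers:

```python
def coerce_value(raw):
    """Convert 'true'/'false' to bool and digit strings to int; leave the rest as text."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def parse_extra_params(pairs):
    """Turn ['resolution=1080p', 'duration=5'] into a parameter dict.

    Hypothetical helper: the real scripts may parse these differently.
    """
    params = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = coerce_value(raw)
    return params


if __name__ == "__main__":
    print(parse_extra_params(["resolution=1080p", "duration=5", "generate_audio=true"]))
```

Only the first `=` splits the pair, so values that themselves contain `=` (for example a URL with a query string) stay intact.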
All video prices are per second of video generated. Atlas Cloud pricing varies by resolution.
| Resolution | fal.ai | Atlas Cloud | Savings |
| :----------: | :------: | :-----------: | :-------: |
| 480p | - | $0.04/s | - |
| 720p | $0.10/s | $0.08/s | 20% off |
| 1080p | $0.15/s | $0.12/s | 20% off |
| Resolution | fal.ai | Atlas Cloud | Savings |
| :----------: | :------: | :-----------: | :-------: |
| 720p | $0.10/s | $0.10/s | - |
| 1080p | $0.15/s | $0.15/s | - |
| Resolution | Atlas Cloud |
| :----------: | :-----------: |
| All | $0.018/s |
| Model | Original | Atlas Cloud | Savings |
| ------- | :--------: | :-----------: | :-------: |
| Text-to-Image | ~~$0.03~~ | $0.021 | 30% off |
| Image Edit | ~~$0.035~~ | $0.021 | 40% off |
| Model | Atlas Cloud | Duration |
| ------- | :-----------: | ---------- |
| Text-to-Video | $0.035/s | 5/10 seconds |
| Image-to-Video | $0.035/s | 5/10 seconds |
> fal.ai pricing sourced from fal.ai/models/wan.
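
Because video is billed per second, the cost of a clip is the per-second rate times its duration. The sketch below encodes the Atlas Cloud rates from the tables above. Which table belongs to which model is inferred from the comparison table in this document (Wan 2.6 T2V at $0.08/s and I2V at $0.10/s for 720p, I2V Flash at $0.018/s, Wan 2.5 at $0.035/s), so treat that mapping as an assumption. `estimate_video_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Atlas Cloud per-second rates copied from the pricing tables.
# "*" means one flat rate for every resolution.
# Assumption: the table-to-model mapping is inferred, not stated explicitly.
PRICE_PER_SECOND = {
    "alibaba/wan-2.6/text-to-video": {
        "480p": Decimal("0.04"),
        "720p": Decimal("0.08"),
        "1080p": Decimal("0.12"),
    },
    "alibaba/wan-2.6/image-to-video": {
        "720p": Decimal("0.10"),
        "1080p": Decimal("0.15"),
    },
    "alibaba/wan-2.6/image-to-video-flash": {"*": Decimal("0.018")},
    "alibaba/wan-2.5/text-to-video": {"*": Decimal("0.035")},
    "alibaba/wan-2.5/image-to-video": {"*": Decimal("0.035")},
}


def estimate_video_cost(model, resolution, seconds):
    """Return the estimated charge in USD for one generated clip."""
    rates = PRICE_PER_SECOND[model]
    rate = rates.get(resolution, rates.get("*"))
    if rate is None:
        raise ValueError(f"no listed price for {model} at {resolution}")
    return rate * seconds


if __name__ == "__main__":
    print(estimate_video_cost("alibaba/wan-2.6/text-to-video", "1080p", 10))
```

`Decimal` is used instead of `float` so that small rates such as $0.018/s multiply without rounding noise.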
| Model ID | Type | Resolution | Duration |
| ---------- | ------ | :----------: | :--------: |
| `alibaba/wan-2.6/text-to-video` | Text-to-Video | 480p–1080p | 5/10/15s |
| `alibaba/wan-2.6/image-to-video` | Image-to-Video | 720p–1080p | 5/10/15s |
| `alibaba/wan-2.6/image-to-video-flash` | Image-to-Video (Fast) | 720p–1080p | 5/10/15s |
| `alibaba/wan-2.6/video-to-video` | Video-to-Video | 480p–1080p | 5/10s |
| Model ID | Type | Max Size |
| ---------- | ------ | :--------: |
| `alibaba/wan-2.6/text-to-image` | Text-to-Image | 2184×936 |
| `alibaba/wan-2.6/image-edit` | Image Editing | 24 presets |
| Model ID | Type | Resolution | Duration |
| ---------- | ------ | :----------: | :--------: |
| `alibaba/wan-2.5/text-to-video` | Text-to-Video | 480p–1080p | 5/10s |
| `alibaba/wan-2.5/image-to-video` | Image-to-Video | 480p–1080p | 5/10s |
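
The allowed durations differ by model, so a request can be checked locally before it is submitted and billed. The following sketch copies the duration column from the model tables above into a lookup. `ALLOWED_DURATIONS` and `check_duration` are hypothetical names introduced here for illustration.

```python
# Allowed clip lengths in seconds, taken from the model tables.
ALLOWED_DURATIONS = {
    "alibaba/wan-2.6/text-to-video": (5, 10, 15),
    "alibaba/wan-2.6/image-to-video": (5, 10, 15),
    "alibaba/wan-2.6/image-to-video-flash": (5, 10, 15),
    "alibaba/wan-2.6/video-to-video": (5, 10),
    "alibaba/wan-2.5/text-to-video": (5, 10),
    "alibaba/wan-2.5/image-to-video": (5, 10),
}


def check_duration(model, duration):
    """Return the duration unchanged if the model lists it, else raise ValueError."""
    allowed = ALLOWED_DURATIONS.get(model)
    if allowed is None:
        raise ValueError(f"unknown video model: {model}")
    if duration not in allowed:
        raise ValueError(f"{model} supports durations {allowed}, got {duration}")
    return duration


if __name__ == "__main__":
    print(check_duration("alibaba/wan-2.6/text-to-video", 15))
```
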
| Parameter | Type | Required | Default | Description |
| ----------- | ------ | ---------- | --------- | ------------- |
| `prompt` | string | Yes | - | Video description |
| `negative_prompt` | string | No | - | What to exclude from the video |
| `size` | string | No | 1280*720 | Output size (see Size Options below) |
| `duration` | integer | No | 5 | 5, 10, or 15 seconds |
| `shot_type` | string | No | - | multi_camera for dynamic shots, single_camera for stable framing |
| `audio` | string | No | - | Audio URL to guide generation with synchronized sound |
| `generate_audio` | boolean | No | false | Generate synchronized audio |
| `enable_prompt_expansion` | boolean | No | false | Expand prompt for better results |
| `seed` | integer | No | random | For reproducible results |
Same as text-to-video (without size), plus:
| Parameter | Type | Required | Default | Description |
| ----------- | ------ | ---------- | --------- | ------------- |
| `image` | string | Yes | - | Source image URL |
| `resolution` | string | No | 720p | 720p, 1080p |
| Parameter | Type | Required | Default | Description |
| ----------- | ------ | ---------- | --------- | ------------- |
| `prompt` | string | Yes | - | Video description (use character1, character2 to reference characters in video) |
| `negative_prompt` | string | No | - | What to exclude |
| `videos` | array | Yes | - | Source video URLs (max 100MB each, 2-30s duration) |
| `size` | string | No | 1280*720 | Output size |
| `duration` | integer | No | 5 | 5 or 10 seconds |
| `shot_type` | string | No | - | multi_camera or single_camera |
| `enable_prompt_expansion` | boolean | No | false | Expand prompt for better results |
| `seed` | integer | No | random | For reproducible results |
| Parameter | Type | Required | Default | Description |
| ----------- | ------ | ---------- | --------- | ------------- |
| `prompt` | string | Yes | - | Image description |
| `negative_prompt` | string | No | - | What to exclude |
| `size` | string | No | 1024*1024 | Output size (27 presets, see below) |
| `enable_prompt_expansion` | boolean | No | false | Expand prompt |
| `enable_sync_mode` | boolean | No | false | Wait for result synchronously |
| `enable_base64_output` | boolean | No | false | Return Base64 instead of URL |
| `seed` | integer | No | random | For reproducible results |
Same as text-to-image, plus:
| Parameter | Type | Required | Default | Description |
| ----------- | ------ | ---------- | --------- | ------------- |
| `images` | array | Yes | - | Images to edit (max 4, 384-5000px per side) |
| Parameter | Type | Required | Default | Description |
| ----------- | ------ | ---------- | --------- | ------------- |
| `prompt` | string | Yes | - | Video description |
| `negative_prompt` | string | No | - | What to exclude |
| `size` | string | No | 1280*720 | Output size (13 presets, including 480p) |
| `duration` | integer | No | 5 | 5 or 10 seconds |
| `audio` | string | No | - | Audio URL for guided generation |
| `generate_audio` | boolean | No | false | Generate synchronized audio |
| `enable_prompt_expansion` | boolean | No | false | Expand prompt |
| `seed` | integer | No | random | For reproducible results |
Same as Wan 2.5 text-to-video (without size), plus:
| Parameter | Type | Required | Default | Description |
| ----------- | ------ | ---------- | --------- | ------------- |
| `image` | string | Yes | - | Source image URL |
| `resolution` | string | No | 720p | 480p, 720p, 1080p |
T2V / V2V (10 presets):
1280\*720, 720\*1280, 960\*960, 1920\*1080, 1080\*1920, 1280\*960, 960\*1280, 1920\*816, 816\*1920, 1280\*544
Image Size Options (Wan 2.6 — 27 presets):
1024\*1024, 1280\*720, 720\*1280, 1280\*960, 960\*1280, 1536\*1024, 1024\*1536, 1280\*1280, 1536\*1536, 2048\*1024, 1024\*2048, 1536\*1280, 1280\*1536, 1680\*720, 720\*1680, 2016\*864, 864\*2016, 1536\*864, 864\*1536, 2184\*936, 936\*2184, 1400\*1050, 1050\*1400, 1680\*1050, 1050\*1680, 1176\*1176, 1560\*1560
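
Sizes are passed as `width*height` strings, so a typo such as `1920x1080` is easy to make. As a minimal sketch, the hypothetical helpers below (`parse_size` and `is_supported_video_size` are illustrative names) parse a size string and check it against the ten T2V / V2V presets listed above:

```python
# The ten Wan 2.6 T2V / V2V presets, written in the API's width*height form.
VIDEO_SIZES = frozenset({
    "1280*720", "720*1280", "960*960", "1920*1080", "1080*1920",
    "1280*960", "960*1280", "1920*816", "816*1920", "1280*544",
})


def parse_size(size):
    """Split '1920*1080' into (1920, 1080); raise ValueError on other formats."""
    width, sep, height = size.partition("*")
    if not sep or not width.isdigit() or not height.isdigit():
        raise ValueError(f"size must look like WIDTH*HEIGHT, got {size!r}")
    return int(width), int(height)


def is_supported_video_size(size):
    """True if the size string is one of the listed T2V / V2V presets."""
    return size in VIDEO_SIZES


if __name__ == "__main__":
    print(parse_size("1920*1080"), is_supported_video_size("1920*1080"))
```
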
# Step 1: Submit
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
-H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "alibaba/wan-2.6/text-to-video",
"prompt": "A drone shot flying over ancient ruins at golden hour, camera slowly descending toward a central courtyard",
"size": "1920*1080",
"duration": 10,
"shot_type": "multi_camera",
"generate_audio": true,
"enable_prompt_expansion": true
}'
# Returns: { "code": 200, "data": { "id": "prediction-id" } }
# Step 2: Poll (every 5 seconds until completed)
curl -s "https://api.atlascloud.ai/api/v1/model/prediction/{prediction-id}" \
-H "Authorization: Bearer $ATLASCLOUD_API_KEY"
# Returns: { "code": 200, "data": { "status": "completed", "outputs": ["https://...video-url..."] } }
# Step 3: Download
curl -o output.mp4 "VIDEO_URL_FROM_OUTPUTS"
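
The submit step above can also be sketched in Python using only the standard library, in keeping with the zero-dependency scripts. The endpoint, headers, and response shape are taken from the curl example; the function names (`build_submit_request`, `extract_prediction_id`, `submit`) are hypothetical and are not the bundled scripts' API.

```python
import json
import os
import urllib.request

API_BASE = "https://api.atlascloud.ai/api/v1/model"


def build_submit_request(payload, api_key, endpoint="generateVideo"):
    """Build the POST request for generateVideo or generateImage without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_prediction_id(response_body):
    """Pull data.id out of a response shaped like {"code": 200, "data": {"id": "..."}}."""
    return json.loads(response_body)["data"]["id"]


def submit(payload, endpoint="generateVideo"):
    """Send the request and return the prediction id (performs a real, billable call)."""
    request = build_submit_request(payload, os.environ["ATLASCLOUD_API_KEY"], endpoint)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_prediction_id(response.read())
```

Keeping request construction separate from the network call makes the payload and headers easy to inspect before anything is billed.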
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
-H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "alibaba/wan-2.6/image-to-video",
"image": "https://example.com/landscape.jpg",
"prompt": "The camera slowly zooms in as clouds drift across the sky and leaves rustle in the wind",
"resolution": "1080p",
"duration": 5,
"generate_audio": true
}'
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
-H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "alibaba/wan-2.6/video-to-video",
"videos": ["https://example.com/original-video.mp4"],
"prompt": "Transform character1 into a cartoon anime character, keep the background unchanged",
"size": "1280*720",
"duration": 5,
"shot_type": "single_camera"
}'
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
-H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "alibaba/wan-2.6/text-to-video",
"prompt": "A jazz band performing on stage, musicians playing saxophone and piano",
"audio": "https://example.com/jazz-music.mp3",
"size": "1920*1080",
"duration": 10
}'
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateImage" \
-H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "alibaba/wan-2.6/text-to-image",
"prompt": "A cyberpunk cityscape at night, neon signs reflected in rain puddles, photorealistic",
"size": "1680*720",
"enable_prompt_expansion": true
}'
# Returns: { "code": 200, "data": { "id": "prediction-id" } }
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateImage" \
-H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "alibaba/wan-2.6/image-edit",
"prompt": "Change the background to a sunset beach scene, keep the person unchanged",
"images": ["https://example.com/photo.jpg"],
"size": "1280*720"
}'
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
-H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "alibaba/wan-2.6/image-to-video-flash",
"image": "https://example.com/portrait.jpg",
"prompt": "The person slowly turns and smiles",
"resolution": "720p",
"duration": 5
}'
- processing / starting / running → wait 5s, retry (typically takes ~30-120s for video, ~5-10s for image)
- completed / succeeded → done, get URL from data.outputs[]
- failed → error, read data.error

If the Atlas Cloud MCP server is configured, use built-in tools:
atlas_generate_video(model="alibaba/wan-2.6/text-to-video", params={...})
atlas_generate_image(model="alibaba/wan-2.6/text-to-image", params={...})
atlas_get_prediction(prediction_id="...")
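
When calling the REST API directly rather than through MCP, the status handling described earlier (wait while the job is in progress, stop on `completed` / `succeeded`, raise on `failed`) can be sketched as a small polling loop. `wait_for_outputs` is a hypothetical helper. It takes a caller-supplied `fetch_prediction` function that returns the `data` object of the prediction endpoint, so the loop itself makes no assumption about the HTTP client. Treating any unlisted status as still in progress is also an assumption.

```python
import time

DONE_STATUSES = {"completed", "succeeded"}


def wait_for_outputs(fetch_prediction, prediction_id, interval=5.0,
                     max_attempts=60, sleep=time.sleep):
    """Poll until the prediction finishes and return its list of output URLs.

    fetch_prediction(prediction_id) must return the response's "data" dict.
    """
    for _ in range(max_attempts):
        data = fetch_prediction(prediction_id)
        status = data.get("status")
        if status in DONE_STATUSES:
            return data.get("outputs", [])
        if status == "failed":
            raise RuntimeError(data.get("error", "generation failed"))
        # processing / starting / running (or anything unlisted): wait and retry.
        sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} not finished after {max_attempts} polls")
```

With the default 5-second interval and 60 attempts, the loop gives up after about five minutes, which leaves headroom over the typical 30-120 seconds quoted for video.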
- multi_camera for dynamic shots, single_camera for stable framing
- generate_audio: true
- enable_prompt_expansion: true for auto-optimized prompts
- shot_type: "multi_camera" with prompts like "Cut between close-up and wide shot..."
- character1, character2, e.g., "Transform character1 into an anime character"

| Feature | Wan 2.6 | Wan 2.5 |
| --------- | :-------: | :-------: |
| Video T2V Price (720p) | $0.08/s | $0.035/s |
| Video I2V Price (720p) | $0.10/s | $0.035/s |
| Max Resolution | 1080p | 1080p |
| Max Duration | 15s | 10s |
| Shot Type Control | Yes | No |
| Audio-Guided | Yes | Yes |
| Video-to-Video | Yes | No |
| Text-to-Image | Yes ($0.021) | No |
| Image Editing | Yes ($0.021) | No |
| Flash/Fast Variants | I2V Flash ($0.018/s) | Yes |
| Prompt Expansion | Yes | Yes |