AI video generation on RunComfy. Generate videos with the full RunComfy video-model catalog through one CLI: text-to-video, image-to-video, and Veo's video-extend. This skill picks the right model for the user's intent and ships the documented prompt patterns plus the exact runcomfy run invocation for each.
runcomfy.com · Video models · CLI docs
# 1. Install (see runcomfy-cli skill for details)
npm i -g @runcomfy/cli # or: npx -y @runcomfy/cli --version
# 2. Sign in
runcomfy login # or in CI: export RUNCOMFY_TOKEN=<token>
# 3. Generate
runcomfy run <vendor>/<model>/<endpoint> \
--input '{"prompt": "..."}' \
--output-dir ./out
CLI deep dive: runcomfy-cli skill.
HappyHorse 1.0 — happyhorse/happyhorse-1-0/text-to-video (default)
> Currently #1 on Artificial Analysis Video Arena. Native synchronized audio generated in-pass (no separate Foley step). Native 1080p, up to ~15s, strong multi-shot character consistency.
> Pick for: general-purpose t2v, ad creative with audio, social-media clips, multi-shot narratives.
> Avoid for: audio-driven lip-sync to a specific voiceover MP3 — use Wan 2-7.
Kling 3.0 4K — kling/kling-3.0/4k/text-to-video
> Kling's latest, 4K output, strong multi-shot character identity, premium camera language.
> Pick for: hero shots, final-delivery 4K cuts, multi-shot character narratives.
> Avoid for: cost-sensitive iteration — drop to Kling 2-6 Pro or Standard i2v.
Seedance v2 Pro — bytedance/seedance-v2/pro
> ByteDance flagship — multi-modal (up to 9 reference images, 3 reference videos, 3 reference audio), in-pass synchronized audio, cinematic motion refinement, lens language honored.
> Pick for: cinematic ad frames, multi-reference composition (subject + scene + audio refs), 21:9 anamorphic looks.
> Avoid for: simple "single prompt → clip" jobs — overpowered, slower.
Seedance v2 Fast — bytedance/seedance-v2/fast
> Faster variant of Seedance v2 Pro, same multi-modal capabilities.
> Pick for: iteration on Seedance v2 compositions before locking a final on Pro.
> Avoid for: hero-shot final delivery.
Wan 2-7 — wan-ai/wan-2-7/text-to-video
> Open-weights flagship, audio_url field for audio-driven lip-sync, pairs natively with Wan image models.
> Pick for: dialog scenes where mouth must sync to a specific voiceover file; open-weights pipeline requirement.
> Avoid for: in-pass audio generation (no MP3 input) — use HappyHorse 1.0.
Kling 2-6 Pro — kling/kling-2-6/pro/text-to-video
> Previous Kling tier — still strong quality at much lower cost than 3.0 4K.
> Pick for: production at scale where 3.0 4K is too expensive.
> Avoid for: top-tier hero shots — use Kling 3.0 4K.
Seedance 1-5 Pro — bytedance/seedance-1-5/pro/text-to-video
> Previous Seedance generation, cheaper.
> Pick for: batches that must stay identity-consistent with earlier Seedance 1-5 generations; cost-sensitive baseline.
> Avoid for: new work — prefer Seedance v2 Pro or Fast.
HappyHorse 1.0 I2V — happyhorse/happyhorse-1-0/image-to-video (default)
> Animate any still with in-pass audio described in prompt, strong identity preservation.
> Pick for: animating a generated portrait or product still, vertical social clips, voiceover-described audio.
> Avoid for: physics-accurate object motion — use Veo 3-1.
Veo 3-1 — google-deepmind/veo-3-1/image-to-video
> Google's flagship — physics-respecting motion, strong object permanence ("rotates 180 degrees" = 180°), pairs with extend-video for longer clips.
> Pick for: product spins, physics-accurate motion, scenes where "no other motion" must hold.
> Avoid for: audio-driven dialog — use Wan 2-7 or HappyHorse.
Veo 3-1 Fast — google-deepmind/veo-3-1/fast/image-to-video
> Faster Veo 3-1 variant.
> Pick for: iteration on Veo compositions.
> Avoid for: hero delivery — use full Veo 3-1.
Kling 3.0 4K I2V — kling/kling-3.0/4k/image-to-video
> Multi-shot character identity, 4K output from a still.
> Pick for: 4K hero shots, character-narrative cuts.
> Avoid for: cost iteration — drop to Pro or Standard.
Kling 3.0 Pro I2V — kling/kling-3.0/pro/image-to-video
> Default Kling 3.0 quality tier.
> Pick for: high-quality i2v at moderate cost.
> Avoid for: 4K final delivery.
Kling 3.0 Standard I2V — kling/kling-3.0/standard/image-to-video
> Cheapest 3.0 i2v tier.
> Pick for: concepting / drafts on Kling 3.0.
> Avoid for: final delivery.
Hailuo 2-3 Pro — minimax/hailuo-2-3/pro/image-to-video
> MiniMax Hailuo latest — natural motion, strong on real-world subjects.
> Pick for: lifelike motion of real-people / real-product subjects.
> Avoid for: stylized characters — use Kling or Dreamina.
Dreamina 3-0 Pro — bytedance/dreamina-3-0/pro/image-to-video
> ByteDance Dreamina i2v — illustration / stylized character lean.
> Pick for: animating illustrated heroes, painterly stills.
> Avoid for: photoreal motion.
Seedance 1-0 Pro Fast — bytedance/seedance-1-0/pro/fast/image-to-video
> Older Seedance i2v generation, cheap.
> Pick for: cost-sensitive batch i2v on Seedance.
> Avoid for: new work — Seedance v2 Pro is more capable (t2v + i2v + multi-modal).
Veo 3-1 Extend — google-deepmind/veo-3-1/extend-video
> Continue an existing Veo clip with consistent motion / lighting / identity.
> Pick for: extending a video past Veo's per-call duration cap; chained narrative shots.
Veo 3-1 Fast Extend — google-deepmind/veo-3-1/fast/extend-video
> Faster Veo extend variant.
> Pick for: extending Veo Fast clips at matching latency tier.
For dedicated treatment of extend (input video preparation, frame-anchor strategy, chained extends), see the video-extend skill.
Model: happyhorse/happyhorse-1-0/text-to-video
Catalog: happyhorse-1-0
Currently #1 on the Artificial Analysis Video Arena — RunComfy's recommended default for general-purpose t2v. Native synchronized audio is generated in-pass (no separate Foley step).
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | — | Subject-first, describe motion + scene + audio in one declarative |
| duration | int | no | 5 | Seconds. Up to ~15s |
| aspect_ratio | enum | no | 16:9 | 16:9, 9:16, 1:1 typical |
| resolution | enum | no | 1080p | 720p, 1080p |
| seed | int | no | — | Reproducibility |
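The fields in the table above can be pre-checked locally before a request is submitted. The following is a minimal sketch, not part of the RunComfy CLI: the function name check_happyhorse_t2v is hypothetical, the hard 15-second cap is an assumption (the table only says "up to ~15s"), and aspect_ratio is deliberately left unchecked because the table lists its values as typical rather than exhaustive. The return code 65 simply mirrors the CLI's "schema mismatch" code for convenience.

```bash
# Sketch: pre-check HappyHorse t2v fields against the schema table.
# Assumption: 15 s is treated as a hard cap; the table only says "up to ~15s".
check_happyhorse_t2v() {
  local duration="${1:-5}" resolution="${2:-1080p}"

  # duration must be a whole number of seconds
  case "$duration" in
    *[!0-9]*) echo "duration must be a whole number of seconds" >&2; return 65 ;;
  esac
  if [ "$duration" -lt 1 ] || [ "$duration" -gt 15 ]; then
    echo "duration $duration is outside the assumed 1-15 s range" >&2
    return 65
  fi

  # resolution must be one of the two enum values listed in the table
  case "$resolution" in
    720p|1080p) ;;
    *) echo "resolution must be 720p or 1080p" >&2; return 65 ;;
  esac
}
```

A script might call `check_happyhorse_t2v 8 1080p` and only proceed to runcomfy run when it succeeds; the authoritative schema remains the one on the model page.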
runcomfy run happyhorse/happyhorse-1-0/text-to-video \
--input '{
"prompt": "A red kite tumbles across a windy beach at golden hour, kids chasing it laughing, surf in the background. Audio: wind, gulls, distant laughter.",
"duration": 8,
"aspect_ratio": "16:9",
"resolution": "1080p"
}' \
--output-dir ./out
Describe audio inline in the prompt, e.g. "Audio: wind, gulls, distant laughter." HappyHorse generates audio in-pass.
Model: wan-ai/wan-2-7/text-to-video
Catalog: wan-2-7 · wan-models collection
Pick Wan 2-7 when you have a specific voiceover / dialog audio file and want the on-screen subject's mouth to sync to it. The audio_url field drives the lip motion.
With audio-driven lip-sync:
runcomfy run wan-ai/wan-2-7/text-to-video \
--input '{
"prompt": "Studio portrait of a woman in her 30s speaking confidently to camera, soft window light.",
"audio_url": "https://your-cdn.example/voiceover.mp3",
"duration": 6
}' \
--output-dir ./out
Plain t2v (no audio):
runcomfy run wan-ai/wan-2-7/text-to-video \
--input '{"prompt": "Drone shot over forest canopy at sunrise, soft fog drifting between trees"}' \
--output-dir ./out
Model: bytedance/seedance-v2/pro (or /fast)
Catalog: seedance-v2 Pro · seedance collection
Pick Seedance v2 Pro when the user needs multi-modal conditioning: up to 9 reference images, 3 reference videos, and 3 reference audio tracks, with audio synthesized in-pass and cinematic motion refinement.
runcomfy run bytedance/seedance-v2/pro \
--input '{
"prompt": "Anamorphic 35mm shot — a vintage car drives down a coastal road at dusk, lens flares from oncoming headlights, cinematic color grade.",
"duration": 10,
"aspect_ratio": "21:9"
}' \
--output-dir ./out
Reference each input by role in the prompt, e.g. "subject from ref image 1, mood from ref video 2, score from ref audio 1".
Model: happyhorse/happyhorse-1-0/image-to-video
Catalog: happyhorse-1-0 i2v
runcomfy run happyhorse/happyhorse-1-0/image-to-video \
--input '{
"image_url": "https://your-cdn.example/portrait.jpg",
"prompt": "She turns her head slowly to look at the camera and smiles. Wind through her hair. Audio: gentle breeze.",
"duration": 6,
"aspect_ratio": "9:16"
}' \
--output-dir ./out
Model: google-deepmind/veo-3-1/image-to-video (or /fast/image-to-video)
Catalog: veo-3-1 i2v · veo-3 collection
Pick Veo when physics, realism, or object permanence matters most. Veo 3-1 supports 8s clips on its own and longer ones through the companion extend-video endpoint.
runcomfy run google-deepmind/veo-3-1/image-to-video \
--input '{
"image_url": "https://your-cdn.example/product.jpg",
"prompt": "The bottle slowly rotates 180 degrees on a marble surface, soft daylight, no other motion."
}' \
--output-dir ./out
Model: kling/kling-3.0/{4k,pro,standard}/image-to-video
Catalog: kling collection
Three tiers — pick by quality / cost trade-off:
| Tier | Endpoint | When |
|---|---|---|
| 4K | kling/kling-3.0/4k/image-to-video | Hero shots, final delivery at 4K |
| Pro | kling/kling-3.0/pro/image-to-video | Default — high quality at lower cost |
| Standard | kling/kling-3.0/standard/image-to-video | Concepting, drafts |
runcomfy run kling/kling-3.0/pro/image-to-video \
--input '{
"image_url": "https://your-cdn.example/character.jpg",
"prompt": "The character walks toward the camera, soft handheld feel, end on a medium close-up."
}' \
--output-dir ./out
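When a script moves a shot from draft to final, the tier choice above can be wrapped in a small helper. This is a minimal sketch: the function name kling_i2v_endpoint and the stage labels (draft, default, final) are hypothetical; only the three endpoint paths come from the tier table. The return code 64 borrows the CLI's "bad CLI args" convention and is not something the CLI itself produces here.

```bash
# Sketch: map a production stage to a Kling 3.0 i2v endpoint.
# Stage labels are hypothetical; endpoint paths are the ones in the tier table.
kling_i2v_endpoint() {
  case "${1:-default}" in
    draft)   echo "kling/kling-3.0/standard/image-to-video" ;;  # concepting, drafts
    default) echo "kling/kling-3.0/pro/image-to-video" ;;       # high quality at lower cost
    final)   echo "kling/kling-3.0/4k/image-to-video" ;;        # hero shots, 4K delivery
    *)       echo "unknown stage: $1" >&2; return 64 ;;
  esac
}
```

A draft pass could then be run as `runcomfy run "$(kling_i2v_endpoint draft)" --input '{...}' --output-dir ./out`, with the same input body reused once the shot is approved for the final stage.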
| Endpoint | When |
|---|---|
| minimax/hailuo-2-3/pro/image-to-video · /standard/image-to-video | MiniMax Hailuo — natural motion, strong on real-world subjects |
| bytedance/dreamina-3-0/pro/image-to-video | Dreamina — illustrative / concept art lean |
| bytedance/seedance-1-0/pro/fast/image-to-video | Seedance 1-0 — cheaper baseline |
| kling/kling-video-o1/standard | Kling Video O1 — reasoning-style video model |
| kling/kling-2-6/motion-control-pro | Transfer motion from a reference video onto a target character |
Schemas live on each model page; pass the documented fields through the CLI verbatim.
- aspect_ratio: "9:16", duration: 6, audio described inline
- "rotates 180 degrees, no other motion" (Veo respects physics)
- audio_url pointing at your voiceover MP3
- video-extend skill
- ai-avatar-video skill for OmniHuman + HappyHorse + Wan composition
- kling · seedance · veo-3 · hailuo · wan-models · dreamina brand collections
- /models/feature/lip-sync · /feature/character-swap · /feature/upscale-video capability tags

| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
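Because only exit code 75 is marked retryable in the table above, a wrapper script can retry on that code alone and surface every other failure immediately. The sketch below is one possible way to do that and is not shipped with the CLI: the function names and the MAX_ATTEMPTS / RETRY_DELAY variables are hypothetical, and the doubling back-off is an illustrative design choice.

```bash
# Sketch: translate an exit code into the meaning listed in the table.
describe_exit() {
  case "$1" in
    0)  echo "success" ;;
    64) echo "bad CLI args" ;;
    65) echo "bad input JSON / schema mismatch" ;;
    69) echo "upstream 5xx" ;;
    75) echo "retryable: timeout / 429" ;;
    77) echo "not signed in or token rejected" ;;
    *)  echo "unknown exit code $1" ;;
  esac
}

# Sketch: run any command, retrying only when it exits with 75.
# MAX_ATTEMPTS and RETRY_DELAY are hypothetical knobs, not CLI settings.
run_with_retry() {
  local attempt=1 code
  local max="${MAX_ATTEMPTS:-3}" delay="${RETRY_DELAY:-5}"
  while :; do
    "$@" && code=0 || code=$?
    # Stop on success, on any non-retryable code, or when attempts run out.
    if [ "$code" -ne 75 ] || [ "$attempt" -ge "$max" ]; then
      return "$code"
    fi
    echo "attempt $attempt: $(describe_exit "$code"); retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))   # simple doubling back-off
  done
}
```

Usage would look like `run_with_retry runcomfy run <vendor>/<model>/<endpoint> --input '{"prompt": "..."}' --output-dir ./out`. Codes 64, 65, and 77 are returned on the first attempt, since retrying cannot fix bad arguments, a schema mismatch, or a rejected token.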
The skill classifies the user request into one of the t2v / i2v / extend routes above and invokes runcomfy run with the matching JSON body. The CLI POSTs to the RunComfy Model API, polls request status, fetches the result, and downloads any .runcomfy.net / .runcomfy.com URLs into --output-dir. Ctrl-C cancels the remote request before exit.
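The routing step described above can be pictured as a lookup from a route plus an optional need to an endpoint. This is a rough sketch and not the skill's actual classifier: the function name pick_model and the labels (t2v, i2v, extend, lipsync, multiref, physics, 4k) are hypothetical, while the endpoint paths and the reasons for choosing them are the ones given in this document.

```bash
# Sketch: choose an endpoint from a route and an optional need.
# Route/need labels are hypothetical; endpoints are the ones listed in this document.
pick_model() {
  case "${1:-}:${2:-general}" in
    t2v:general)  echo "happyhorse/happyhorse-1-0/text-to-video" ;;   # default t2v
    t2v:lipsync)  echo "wan-ai/wan-2-7/text-to-video" ;;              # audio_url drives lip motion
    t2v:multiref) echo "bytedance/seedance-v2/pro" ;;                 # multi-modal references
    t2v:4k)       echo "kling/kling-3.0/4k/text-to-video" ;;          # 4K hero shots
    i2v:general)  echo "happyhorse/happyhorse-1-0/image-to-video" ;;  # default i2v
    i2v:physics)  echo "google-deepmind/veo-3-1/image-to-video" ;;    # physics-accurate motion
    i2v:4k)       echo "kling/kling-3.0/4k/image-to-video" ;;         # 4K from a still
    extend:*)     echo "google-deepmind/veo-3-1/extend-video" ;;      # continue a Veo clip
    *)            echo "no route for: ${1:-} ${2:-}" >&2; return 64 ;;
  esac
}
```

For example, `runcomfy run "$(pick_model i2v physics)" --input '{...}' --output-dir ./out` would target Veo 3-1 for a product spin, whereas `pick_model t2v` falls back to the HappyHorse default.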
- Install with npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf.
- runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set the RUNCOMFY_TOKEN env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
- Prompt content is passed via --input. The CLI does not shell-expand it, so prompt content offers no shell-injection surface.
- Network traffic goes to model-api.runcomfy.net and .runcomfy.net / .runcomfy.com. No telemetry, no callbacks.
- runcomfy: install lines are one-time operator setup.
- kling · seedance · veo-3 · hailuo · wan-models · dreamina: RunComfy video brand collections
- /feature/lip-sync · /feature/character-swap · /feature/upscale-video: capability tags
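The two credential sources described in the security notes (the RUNCOMFY_TOKEN environment variable and the token file written by runcomfy login) can be checked for presence before a batch job starts. This is a minimal sketch under stated assumptions: the function name have_runcomfy_credentials is hypothetical, and it checks only that a credential source exists, not that the token is valid. In line with the rule against echoing the token, it never reads or prints the token.

```bash
# Sketch: succeed if either credential source is present.
# Presence check only; the token value is never read or printed.
have_runcomfy_credentials() {
  # CI / containers: token supplied through the environment
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    return 0
  fi
  # Interactive use: token file written by `runcomfy login`
  [ -f "${HOME}/.config/runcomfy/token.json" ]
}
```

A batch script might start with `have_runcomfy_credentials || { echo "run 'runcomfy login' or set RUNCOMFY_TOKEN" >&2; exit 77; }`, reusing the CLI's own "not signed in" exit code.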