Drive a face's mouth from an audio track. This skill routes across the lip-sync endpoints in the RunComfy catalog (OmniHuman, Sync Labs sync v2, Kling lipsync, Creatify, and others), picks the right model for the user's actual intent, and ships the documented prompts plus the exact runcomfy run invocation.
runcomfy.com · Sync Labs models · CLI docs
```bash
# 1. Install (see runcomfy-cli skill for details)
npm i -g @runcomfy/cli   # or: npx -y @runcomfy/cli --version

# 2. Sign in
runcomfy login           # or in CI: export RUNCOMFY_TOKEN=<token>

# 3. Lipsync
runcomfy run <vendor>/<model> \
  --input '{"video_url": "...", "audio_url": "..."}' \
  --output-dir ./out
```
CLI deep dive: runcomfy-cli skill.
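In CI it can help to fail fast, before any model call, when no credentials are present. This is a minimal sketch, assuming the CLI picks up either the RUNCOMFY_TOKEN variable or the token file that runcomfy login writes (~/.config/runcomfy/token.json). The function name require_runcomfy_auth is hypothetical, and returning 77 only mirrors the CLI's "not signed in" exit code; it is not something the CLI itself provides.

```bash
# Hypothetical preflight helper (not part of the runcomfy CLI).
# Assumption: the CLI authenticates via RUNCOMFY_TOKEN or the token file
# that `runcomfy login` writes.
require_runcomfy_auth() {
  local token_file="${HOME}/.config/runcomfy/token.json"
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    return 0                      # CI / container credentials
  fi
  if [ -f "$token_file" ]; then
    return 0                      # interactive `runcomfy login` credentials
  fi
  echo "runcomfy: no credentials; run 'runcomfy login' or export RUNCOMFY_TOKEN" >&2
  return 77                       # mirrors the CLI's 'not signed in' code
}

# Usage in a CI step:
#   require_runcomfy_auth || exit $?
#   runcomfy run <vendor>/<model> --input '...' --output-dir ./out
```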
Driving a real person's mouth from a separate audio track is dual-use. Refuse requests that target real public figures without their consent, or that aim to produce defamatory or sexually explicit synthetic media. The skill itself does not gate inputs; responsibility rests with the operator.
Listed newest first within each subtype. The agent picks one route based on input shape (portrait still + audio, source video + audio, or script only), quality tier, and budget.
Sync Labs sync v2 Pro — sync/sync/lipsync/v2/pro (default for premium)
> Sync Labs' premium lip-sync — state-of-the-art mouth motion onto an existing video. Preserves the rest of the frame untouched.
> Pick for: hero-quality dubs, lipsync on professionally-shot video, foreign-language dubbing where mouth fidelity matters most.
> Avoid for: cost-sensitive batch jobs — drop to sync v2.
Sync Labs sync v2 — sync/sync/lipsync/v2
> Standard Sync Labs tier, same workflow as Pro.
> Pick for: scaled / batch lipsync jobs, drafts.
> Avoid for: hero delivery — use v2 Pro.
Kling Lipsync (audio-to-video) — kling/lipsync/audio-to-video
> Kling's lip-sync onto a source video, driven by an audio track.
> Pick for: Kling-pipeline integration; alternative to Sync Labs.
> Avoid for: top-tier mouth fidelity — Sync Labs Pro is the industry benchmark.
Creatify Lipsync — creatify/lipsync
> Creatify's lipsync endpoint.
> Pick for: Creatify-ecosystem workflows.
> Avoid for: general-purpose picks, unless its cost or latency beats the alternatives.
OmniHuman — bytedance/omnihuman/api (default for avatar-style)
> ByteDance's audio-driven full-body avatar. One portrait + one audio → video where the subject speaks / gestures naturally. Listed under RunComfy's /feature/lip-sync as the curated default.
> Pick for: UGC voiceover, virtual presenter, dubbed product demo from a single portrait.
> Avoid for: lip-sync onto an existing video (no portrait, and the original motion must be preserved); use Sync Labs v2 instead.
Wan 2-7 with audio_url — wan-ai/wan-2-7/text-to-video
> Open-weights t2v with an audio_url field: the prompt describes the scene and the audio drives the mouth.
> Pick for: full scene control (not just a portrait) with a specific voiceover MP3 + open-weights pipeline.
> Avoid for: simplest "portrait talks" — use OmniHuman.
Kling Lipsync (text-to-video) — kling/lipsync/text-to-video
> Generates speech audio in-pass from a script and syncs it to the resulting video.
> Pick for: "write a script → get a video with synced speech", no audio file needed.
> Avoid for: precise lip-sync to a specific MP3 (audio is regenerated each call, not locked).
HappyHorse 1.0 — happyhorse/happyhorse-1-0/text-to-video (also /image-to-video)
> Arena #1 t2v / i2v with in-pass audio generated from the prompt. Quote the spoken line inside the prompt, introduced by says clearly: "…".
> Pick for: written script, in-pass audio with strong overall quality, social/UGC clips.
> Avoid for: locking mouth to a pre-recorded voiceover.
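The routing defaults above can be sketched as a small lookup from input shape and tier to a model ID. This is a minimal sketch: the function name pick_lipsync_route and its shape/tier vocabulary are hypothetical, and it encodes only the stated defaults (Sync Labs v2 Pro for premium, v2 for batch, OmniHuman for a single portrait, Wan 2-7 for full-scene control, Kling text-to-video for a script with no audio file). Alternates such as Creatify, Kling audio-to-video, and HappyHorse are left to the agent's judgment.

```bash
# Hypothetical router: map (input shape, tier) to a catalog model ID.
# Shapes: video+audio | portrait+audio | scene+audio | script
# Tiers (video+audio only): premium (default) | batch
pick_lipsync_route() {
  local shape="$1" tier="${2:-premium}"
  case "$shape" in
    video+audio)
      if [ "$tier" = "batch" ]; then
        echo "sync/sync/lipsync/v2"          # scaled / batch jobs, drafts
      else
        echo "sync/sync/lipsync/v2/pro"      # hero-quality dubs
      fi ;;
    portrait+audio) echo "bytedance/omnihuman/api" ;;
    scene+audio)    echo "wan-ai/wan-2-7/text-to-video" ;;
    script)         echo "kling/lipsync/text-to-video" ;;
    *)
      echo "unknown input shape: $shape" >&2
      return 1 ;;
  esac
}

# Usage:
#   runcomfy run "$(pick_lipsync_route portrait+audio)" --input '...' --output-dir ./out
```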
Model: sync/sync/lipsync/v2/pro (or sync/sync/lipsync/v2)
Catalog: sync v2 Pro · sync v2
```bash
runcomfy run sync/sync/lipsync/v2/pro \
  --input '{
    "video_url": "https://your-cdn.example/source-video.mp4",
    "audio_url": "https://your-cdn.example/voiceover.mp3"
  }' \
  --output-dir ./out
```
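For batch jobs, where the catalog points to the standard sync v2 tier, a loop can issue one request per video/audio pair, each into its own output directory. This is a minimal sketch, assuming a hypothetical tab-separated manifest (one video URL and one audio URL per line) and URLs that contain no double quotes or backslashes, since the JSON body is built with printf. The function name run_lipsync_batch and the ./out/job-N layout are illustrative choices.

```bash
# Hypothetical batch runner: reads "<video_url>\t<audio_url>" lines from a
# manifest and calls the standard sync v2 endpoint once per pair.
run_lipsync_batch() {
  local manifest="$1" model="${2:-sync/sync/lipsync/v2}"
  local n=0 failed=0 video audio
  # `|| [ -n "$video" ]` keeps a final line that lacks a trailing newline.
  while IFS=$'\t' read -r video audio || [ -n "$video" ]; do
    [ -z "$video" ] && continue            # skip blank lines
    n=$((n + 1))
    # Assumes URLs contain no '"' or '\'; escape them first otherwise.
    # stdin is redirected so the CLI cannot consume manifest lines.
    if ! runcomfy run "$model" \
        --input "$(printf '{"video_url": "%s", "audio_url": "%s"}' "$video" "$audio")" \
        --output-dir "./out/job-$n" < /dev/null; then
      echo "job $n failed: $video" >&2
      failed=$((failed + 1))
    fi
  done < "$manifest"
  [ "$failed" -eq 0 ]                      # non-zero if any job failed
}

# Usage:
#   run_lipsync_batch pairs.tsv
```

Each job gets its own directory so outputs from different pairs cannot overwrite each other.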
Model: bytedance/omnihuman/api
Catalog: omnihuman
```bash
runcomfy run bytedance/omnihuman/api \
  --input '{
    "image_url": "https://your-cdn.example/portrait.jpg",
    "audio_url": "https://your-cdn.example/voiceover.mp3"
  }' \
  --output-dir ./out
```
See the ai-avatar-video skill for the full avatar treatment.

Model: kling/lipsync/audio-to-video (existing video + audio) or kling/lipsync/text-to-video (script only)
Catalog: Kling lipsync a2v · Kling lipsync t2v
```bash
runcomfy run kling/lipsync/audio-to-video \
  --input '{
    "video_url": "https://your-cdn.example/source-video.mp4",
    "audio_url": "https://your-cdn.example/voiceover.mp3"
  }' \
  --output-dir ./out
```
Schema details on the model page.
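The examples above inline the JSON body in single quotes, which breaks as soon as a value contains a quote. This is a minimal sketch of hypothetical helpers (json_escape, lipsync_input) that build the two-field bodies used by these routes, such as video_url + audio_url or image_url + audio_url. It escapes only backslashes and double quotes, not control characters, so a dedicated JSON tool such as jq is preferable when available.

```bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslashes and double quotes only (not control characters).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build {"<key1>": "<val1>", "<key2>": "<val2>"}
# for runcomfy run --input. Keys are assumed to be plain identifiers.
lipsync_input() {
  printf '{"%s": "%s", "%s": "%s"}' \
    "$1" "$(json_escape "$2")" "$3" "$(json_escape "$4")"
}

# Usage (Kling audio-to-video route):
#   runcomfy run kling/lipsync/audio-to-video \
#     --input "$(lipsync_input video_url "$VIDEO_URL" audio_url "$AUDIO_URL")" \
#     --output-dir ./out
```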
community/wan-2-2-animate/video-to-video); see ai-avatar-video.

kling collection, including Kling lipsync variants

| exit code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
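Only exit code 75 (timeout / 429) is marked retryable in the table above. This is a minimal sketch of a wrapper that retries on 75 with exponential backoff and passes every other code straight through. The function name run_with_retry, the RETRY_MAX_ATTEMPTS / RETRY_BASE_DELAY variables, and their defaults are hypothetical choices, not CLI behavior.

```bash
# Hypothetical wrapper: run a command, retrying only on exit code 75
# (timeout / 429). Any other exit code is returned unchanged.
run_with_retry() {
  local max_attempts="${RETRY_MAX_ATTEMPTS:-4}"
  local delay="${RETRY_BASE_DELAY:-5}"     # seconds; doubles after each retry
  local attempt=1 rc
  while true; do
    "$@"
    rc=$?
    if [ "$rc" -ne 75 ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$rc"
    fi
    echo "attempt $attempt exited 75 (retryable); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage:
#   run_with_retry runcomfy run sync/sync/lipsync/v2/pro \
#     --input '{"video_url": "...", "audio_url": "..."}' --output-dir ./out
```

Codes such as 65 (bad input) or 77 (not signed in) will not succeed on retry, so they are surfaced immediately.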
The skill classifies user intent (source video + audio, portrait still + audio, or script only), picks the matching route, and invokes runcomfy run with the JSON body. The CLI POSTs to the Model API, polls the request status, fetches the result, and downloads any .runcomfy.net / .runcomfy.com URLs into --output-dir.
- Install: npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf.
- Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set the RUNCOMFY_TOKEN env var in CI / containers.
- Input handling: request content is passed as JSON via --input. The CLI does not shell-expand prompt content, so there is no shell-injection surface.
- Network: the CLI talks only to model-api.runcomfy.net and .runcomfy.net / .runcomfy.com hosts. No telemetry.
- Permissions: Bash(runcomfy *) only.

kling collection, including Kling lipsync variants

/feature/lip-sync: RunComfy's curated lip-sync capability tag