
UGC Manual

Generate a lip-sync video from an image plus the user's own audio recording.

✅ USE WHEN:
  • User provides their OWN audio file (voice recording)
  • You want to sync the image to specific audio/voice
  • User recorded the script themselves
  • Exact audio timing must be preserved

❌ DON'T USE WHEN:
  • User provides a text script (not audio) → use veed-ugc
  • You need AI to generate the voice → use veed-ugc
  • There is no audio file yet → use veed-ugc with a script

INPUT: Image + audio file (user's recording)
OUTPUT: MP4 video with lip movements synced to the provided audio

KEY DIFFERENCE:
  • veed-ugc = script → AI voice → video
  • ugc-manual = user audio → video (no voice generation)
pauldelavallaz
Content Creation · clawhub · v1.0.2 · 1 version · 99777.6 · Key: not required
★ 2 Stars · 📥 1,306 downloads · 💾 94 installs · 1 version · #latest


UGC-Manual

Generate lip-sync videos by combining an image with a custom audio file using ComfyDeploy's UGC-MANUAL workflow.

Overview

UGC-Manual takes:

  1. An image (person/character with visible face)
  2. An audio file (user's voice recording)

And produces a video where the person in the image lip-syncs to the audio.

API Details

Endpoint: https://api.comfydeploy.com/api/run/deployment/queue

Deployment ID: 075ce7d3-81a6-4e3e-ab0e-7a25edf601b5

Required Inputs

| Input | Description | Formats |
|-------|-------------|---------|
| image | Image with a visible face | JPG, PNG |
| input_audio | Audio file to lip-sync | MP3, WAV, OGG |
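If you want to call the deployment directly instead of through generate.py, the request might look like the Python sketch below. It assumes, without confirmation from this page, that the queue endpoint takes a JSON body with `deployment_id` and `inputs` keys, that both inputs accept publicly reachable URLs, and that authentication uses a bearer token. `build_queue_request` and the `COMFY_DEPLOY_API_KEY` environment variable are hypothetical names. The code only builds the request and never sends it.

```python
import json
import os
import urllib.request

# Endpoint and deployment ID as listed under API Details.
ENDPOINT = "https://api.comfydeploy.com/api/run/deployment/queue"
DEPLOYMENT_ID = "075ce7d3-81a6-4e3e-ab0e-7a25edf601b5"


def build_queue_request(image_url: str, audio_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that queues one UGC-Manual run.

    ASSUMPTION: the {"deployment_id", "inputs"} body shape and bearer-token
    auth are not documented on this page.
    """
    payload = {
        "deployment_id": DEPLOYMENT_ID,
        # Input names come from the Required Inputs table.
        "inputs": {"image": image_url, "input_audio": audio_url},
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_queue_request(
        "https://example.com/image.jpg",
        "https://example.com/audio.mp3",
        os.environ.get("COMFY_DEPLOY_API_KEY", "YOUR_KEY"),  # hypothetical env var name
    )
    print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually queue the run; the response format is not documented here, so parsing it is left out.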

Usage

uv run ~/.clawdbot/skills/ugc-manual/scripts/generate.py \
  --image "path/to/image.jpg" \
  --audio "path/to/audio.mp3" \
  --output "output-video.mp4"

With URLs:

uv run ~/.clawdbot/skills/ugc-manual/scripts/generate.py \
  --image "https://example.com/image.jpg" \
  --audio "https://example.com/audio.mp3" \
  --output "result.mp4"
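Since `--image` and `--audio` accept either local paths or URLs, a pre-flight check against the Required Inputs formats can catch mistakes before a run is queued. The sketch below is illustrative only: `check_input` and its helpers are hypothetical, treating `.jpeg` as JPG is an assumption, and the real script may be more lenient with audio because it converts formats itself (see Notes).

```python
from pathlib import Path
from urllib.parse import urlparse

# Formats from the Required Inputs table (.jpeg as an alias of JPG is an assumption).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
AUDIO_EXTS = {".mp3", ".wav", ".ogg"}


def is_url(value: str) -> bool:
    """True for http(s) URLs; anything else is treated as a local path."""
    return urlparse(value).scheme in ("http", "https")


def extension_of(value: str) -> str:
    """Lower-cased extension of a URL path (query string ignored) or local path."""
    path = urlparse(value).path if is_url(value) else value
    return Path(path).suffix.lower()


def check_input(value: str, allowed: set[str]) -> str:
    """Return 'url' or 'local' if the extension is allowed, else raise ValueError."""
    ext = extension_of(value)
    if ext not in allowed:
        raise ValueError(f"{value!r}: unsupported extension {ext or '(none)'}")
    return "url" if is_url(value) else "local"


if __name__ == "__main__":
    print(check_input("path/to/image.jpg", IMAGE_EXTS))              # local
    print(check_input("https://example.com/audio.mp3", AUDIO_EXTS))  # url
```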

Workflow Integration

Typical Use Cases

  1. Custom voice recordings - User records their own audio via Telegram/WhatsApp
  2. Pre-generated TTS - Audio generated externally (ElevenLabs, etc.)
  3. Music/sound sync - Sync mouth movements to any audio

Example Pipeline

# 1. Convert Telegram voice message to MP3 (if needed)
ffmpeg -i voice.ogg -acodec libmp3lame -q:a 2 voice.mp3

# 2. Generate lip-sync video
uv run ugc-manual... --image face.jpg --audio voice.mp3 --output video.mp4
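The same two steps can be driven from Python. In this sketch, `pipeline_commands` and `run_pipeline` are hypothetical helpers. The ffmpeg flags and CLI options are copied from the commands above and the script path from Usage, while running the MP3 conversion only for `.ogg` input is an assumption (per Notes, the script converts audio on its own, so step 1 is optional).

```python
import os
import subprocess
from pathlib import Path

# Script location as given under Usage.
GENERATE = os.path.expanduser("~/.clawdbot/skills/ugc-manual/scripts/generate.py")


def pipeline_commands(voice: str, face: str, output: str) -> list[list[str]]:
    """Return the argv lists for the pipeline steps, in order."""
    commands = []
    audio = voice
    if Path(voice).suffix.lower() == ".ogg":
        # Step 1 (optional): OGG voice note -> MP3, same flags as the shell example.
        audio = str(Path(voice).with_suffix(".mp3"))
        commands.append(["ffmpeg", "-i", voice, "-acodec", "libmp3lame", "-q:a", "2", audio])
    # Step 2: lip-sync generation through the skill's CLI.
    commands.append(["uv", "run", GENERATE, "--image", face, "--audio", audio, "--output", output])
    return commands


def run_pipeline(voice: str, face: str, output: str) -> None:
    for argv in pipeline_commands(voice, face, output):
        subprocess.run(argv, check=True)  # raises CalledProcessError on the first failing step
```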

Difference from VEED-UGC

| Feature | UGC-Manual | VEED-UGC |
|---------|------------|----------|
| Audio source | User provides | Generated from brief |
| Script | N/A | Auto-generated |
| Voice | User's recording | ElevenLabs TTS |
| Use case | Custom audio | Automated content |

Notes

  • Image should have a clearly visible face (frontal or 3/4 view)
  • Audio quality affects output quality
  • Processing time: ~2-5 minutes depending on audio length
  • Audio auto-conversion: the script automatically converts any audio format (MP3, OGG, M4A, etc.) to 16-bit PCM mono WAV at 48 kHz before sending it to FabricLipsync
  • Requires ffmpeg to be installed on the system
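The auto-conversion described above corresponds to standard ffmpeg options. The sketch below is not the script's actual code: `wav_conversion_command` and `convert_to_wav` are hypothetical, and the flags (`-acodec pcm_s16le`, `-ac 1`, `-ar 48000`) are simply ffmpeg's way of requesting 16-bit PCM, mono, 48 kHz output.

```python
import shutil
import subprocess


def wav_conversion_command(src: str, dst: str) -> list[str]:
    """ffmpeg argv producing 16-bit PCM, mono, 48 kHz WAV."""
    return [
        "ffmpeg", "-y",          # overwrite dst without prompting
        "-i", src,               # any input ffmpeg can decode (MP3, OGG, M4A, ...)
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ac", "1",              # mono
        "-ar", "48000",          # 48 kHz sample rate
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    # Fail early with a clear message if ffmpeg is missing from PATH.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    subprocess.run(wav_conversion_command(src, dst), check=True)
```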

Version History

1 version in total

  • v1.0.2 (current)
    2026-03-29 00:46 · Safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

content-creation

AdMapix

fly0pants
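Ad intelligence and app data analytics assistant. Searches ad creatives and analyzes app rankings, downloads, revenue, and market insights for ad creative and competitor analysis.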
★ 295 📥 136,448
data-analysis

AI Brand Analyzer

pauldelavallaz
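Analyzes brands to generate comprehensive brand identity profiles (JSON format). Use when the user needs to analyze a brand, create a brand profile, or get brand data for ad generation. Profiles can be reused in Ad-Ready, Morpheus, and other creative workflows. Supports listing and updating existing profiles.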
★ 3 📥 2,547
content-creation

Humanizer

biostartechnology
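Removes signs of AI writing to make text more natural and authentic. Based on Wikipedia's "Signs of AI writing" guide, it identifies and fixes patterns such as inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule-of-three lists, AI vocabulary, negative parallelisms, and wordy connectives.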
★ 859 📥 199,546