
Conversation Video

Generate animated conversation videos with multi-voice TTS audio and timed text overlays. Use when the user needs to (1) turn a transcript or dialogue into a...
Author: pratyushchauhan
Uncategorized · clawhub · v1.0.0 · 1 version · Key: not required

Overview


Generate multi-voice conversation videos from text transcripts. There are two rendering paths: a quick ffmpeg path (no Node.js required) or a richer Remotion path (React animations).

Prerequisites

Tool             Path / Notes
---------------  ------------------------------------------------------------
ffmpeg           System install or Jellyfin ffmpeg at /usr/lib/jellyfin-ffmpeg/ffmpeg
supertonic-tts   Python package for multi-voice TTS (see scripts/generate_audio.py for load logic)
Node.js + npm    Only needed for Remotion path

Workflow

1. Build a transcript manifest

Create a JSON file with your conversation:

[
  {"speaker": "NARRATOR",   "text": "Customer Discovery Interview", "voice": "M1", "speed": 1.0, "align": "center"},
  {"speaker": "INTERVIEWER","text": "Walk me through when you first realized...", "voice": "M5", "speed": 0.95, "align": "left"},
  {"speaker": "CUSTOMER",   "text": "I was looking for a marketer agent.", "voice": "M2", "speed": 1.0, "align": "right"}
]

Fields: speaker (label), text (spoken text), voice (supertonic voice name e.g. M1-M5, F1-F2), speed (optional playback speed), align (left/right/center for video placement).
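
A quick sanity check before running TTS can catch malformed entries early. The sketch below is not part of the skill's scripts; `validate_manifest` is a hypothetical helper, and it assumes that only `speed` is optional (defaulting to 1.0) while the other fields are required.

```python
import json

ALLOWED_ALIGN = {"left", "right", "center"}

def validate_manifest(entries):
    """Return a list of problems; an empty list means the manifest looks usable."""
    problems = []
    for i, entry in enumerate(entries):
        # speaker, text and voice are treated as required, non-empty strings
        for key in ("speaker", "text", "voice"):
            if not isinstance(entry.get(key), str) or not entry[key].strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
        # speed is optional; 1.0 is an assumed default
        speed = entry.get("speed", 1.0)
        if not isinstance(speed, (int, float)) or speed <= 0:
            problems.append(f"entry {i}: speed must be a positive number")
        if entry.get("align") not in ALLOWED_ALIGN:
            problems.append(f"entry {i}: align must be left, right or center")
    return problems

manifest = json.loads("""
[
  {"speaker": "NARRATOR", "text": "Customer Discovery Interview",
   "voice": "M1", "speed": 1.0, "align": "center"},
  {"speaker": "CUSTOMER", "text": "I was looking for a marketer agent.",
   "voice": "M2", "align": "right"}
]
""")

print(validate_manifest(manifest))
```

In practice you would load the entries with `json.load(open("manifest.json"))` and stop if the returned list is non-empty.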

2. Generate audio + timing manifest

python scripts/generate_audio.py manifest.json output.wav

Outputs:

  • output.wav — concatenated multi-voice audio
  • output_timings.json — per-segment start/end times for video sync
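
The source does not show the schema of output_timings.json, so the following is only an illustration of how per-segment start/end times can be derived once each segment's audio duration is known. The function name `build_timings`, the `gap` parameter, and the `start`/`end` keys are all assumptions, not the actual output of scripts/generate_audio.py.

```python
def build_timings(entries, durations, gap=0.0):
    """Lay segments end to end and record when each one starts and ends (seconds)."""
    timings = []
    cursor = 0.0  # running position in the concatenated audio
    for entry, duration in zip(entries, durations):
        timings.append({
            "speaker": entry["speaker"],
            "text": entry["text"],
            "start": round(cursor, 3),
            "end": round(cursor + duration, 3),
        })
        cursor += duration + gap  # optional silence between segments
    return timings

entries = [
    {"speaker": "NARRATOR", "text": "Customer Discovery Interview"},
    {"speaker": "INTERVIEWER", "text": "Walk me through when you first realized..."},
    {"speaker": "CUSTOMER", "text": "I was looking for a marketer agent."},
]
# Durations in seconds; made-up values for the example
timings = build_timings(entries, [2.0, 3.5, 1.5])
for t in timings:
    print(t["speaker"], t["start"], t["end"])
```

Because every segment's start equals the previous segment's end (plus any gap), the text overlays stay aligned with the concatenated audio.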

3. Render video (choose path)

Path A: ffmpeg — fast, no Node.js needed

python scripts/ffmpeg_render.py output_timings.json output.wav video.mp4

Options: --width, --height, --font-size, --bg, --font, --crf
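
For orientation, timed text in ffmpeg is commonly done with the drawtext filter and an `enable='between(t,start,end)'` expression. The sketch below builds such a filter chain as a string; it is not the implementation of scripts/ffmpeg_render.py, and the helper names and the `start`/`end` keys are assumptions. Escaping of special characters is deliberately left out, so the helper rejects text that would need it.

```python
def drawtext_filter(text, start, end, color="white", font_size=36):
    """Build one drawtext filter that shows `text` only between start and end (seconds)."""
    # Characters such as ', \, : and % are special to ffmpeg/drawtext.
    # Proper escaping is out of scope, so this sketch simply refuses them.
    if any(ch in text for ch in "'\\:%"):
        raise ValueError("text needs drawtext escaping, which this sketch does not implement")
    return (
        f"drawtext=text='{text}'"
        f":fontcolor={color}:fontsize={font_size}"
        f":x=(w-text_w)/2:y=(h-text_h)/2"          # centered on screen
        f":enable='between(t,{start},{end})'"      # visible only in this window
    )

def build_filter_chain(timings, colors):
    """Join one drawtext filter per segment into a single -vf argument."""
    return ",".join(
        drawtext_filter(t["text"], t["start"], t["end"], colors.get(t["speaker"], "white"))
        for t in timings
    )

timings = [
    {"speaker": "NARRATOR", "text": "Customer Discovery Interview", "start": 0.0, "end": 2.0},
    {"speaker": "CUSTOMER", "text": "I was looking for a marketer agent.", "start": 2.0, "end": 3.5},
]
colors = {"NARRATOR": "0xcbd5e1", "CUSTOMER": "0x34d399"}
chain = build_filter_chain(timings, colors)
print(chain)
```

The resulting string could be passed to ffmpeg as the value of `-vf` over a solid-color input; alignment per speaker would change the `x` expression.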

Path B: Remotion — richer animations, React-based

Copy the boilerplate:

cp -r assets/remotion-boilerplate ./my-video
cd my-video
npm install

Edit src/Conversation.tsx:

  1. Replace the conversation array with your lines (each line's duration is given in frames, at 30 fps)
  2. Set the SpeakerConfig colors and alignment
  3. Uncomment the audio line and place your audio file in public/
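
Since durations in the conversation array are expressed in frames at 30 fps, second-based timings need converting. A minimal sketch, assuming timing entries with `start` and `end` in seconds (the real output_timings.json schema is not shown in the source) and a hypothetical helper name:

```python
FPS = 30  # frame rate stated for the Remotion boilerplate

def to_frame_durations(timings, fps=FPS):
    """Convert start/end times in seconds to per-line durations in frames."""
    lines = []
    for t in timings:
        # Round the boundaries, not the lengths, so rounding error does not
        # accumulate across lines and drift away from the audio.
        start_frame = round(t["start"] * fps)
        end_frame = round(t["end"] * fps)
        lines.append({
            "speaker": t["speaker"],
            "text": t["text"],
            "duration": max(1, end_frame - start_frame),  # at least one frame
        })
    return lines

timings = [
    {"speaker": "NARRATOR", "text": "Customer Discovery Interview", "start": 0.0, "end": 2.0},
    {"speaker": "INTERVIEWER", "text": "Walk me through when you first realized...", "start": 2.0, "end": 5.5},
    {"speaker": "CUSTOMER", "text": "I was looking for a marketer agent.", "start": 5.5, "end": 7.0},
]
lines = to_frame_durations(timings)
print([line["duration"] for line in lines])
```

The resulting values can be copied into the conversation array by hand, or dumped with `json.dumps(lines)` if you load data from JSON.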

Render:

npx remotion render src/index.ts Conversation out/video.mp4

Speaker Customization

Default color/alignment map (edit it in whichever renderer you use, ffmpeg or Remotion):

Speaker       Color     Align
-----------   -------   ------
NARRATOR      #cbd5e1   center
INTERVIEWER   #60a5fa   left
CUSTOMER      #34d399   right

Add more by extending the config map in the respective renderer.
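
As an illustration of extending such a map, the defaults above can be expressed as a dictionary with a fallback for unknown speakers. The names `SPEAKERS`, `DEFAULT_STYLE` and `style_for`, the fallback color, and the `EXPERT` speaker are all hypothetical; the actual config structure in each renderer may differ.

```python
SPEAKERS = {
    "NARRATOR":    {"color": "#cbd5e1", "align": "center"},
    "INTERVIEWER": {"color": "#60a5fa", "align": "left"},
    "CUSTOMER":    {"color": "#34d399", "align": "right"},
}

# Hypothetical fallback for speakers that are not in the map
DEFAULT_STYLE = {"color": "#ffffff", "align": "center"}

def style_for(speaker, extra=None):
    """Look up a speaker's style, with optional extra speakers layered on top."""
    config = {**SPEAKERS, **(extra or {})}
    return config.get(speaker.upper(), DEFAULT_STYLE)

# Adding a fourth speaker without touching the defaults
extra = {"EXPERT": {"color": "#fbbf24", "align": "left"}}
print(style_for("customer"))
print(style_for("EXPERT", extra))
print(style_for("UNKNOWN"))
```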

Resources

  • scripts/generate_audio.py — Multi-voice TTS with timing export
  • scripts/ffmpeg_render.py — ffmpeg drawtext video renderer
  • assets/remotion-boilerplate/ — Copyable Remotion project template
  • references/remotion-patterns.md — Advanced Remotion techniques (JSON data loading, word-by-word reveal, audio sync)
  • references/ffmpeg-guide.md — ffmpeg drawtext syntax and timing reference

Version history

1 version in total

  • v1.0.0 (current)
    2026-06-09 19:31
