Dub YouTube with Voice.ai

> This skill follows the Agent Skills specification.

Turn any script into a YouTube-ready voiceover — complete with numbered segments, a stitched master, chapter timestamps, SRT captions, and a review page. Drop the voiceover onto an existing video to dub it in one command.

Built for YouTube creators who want studio-quality narration without the studio. Powered by Voice.ai.

When to use this skill

Scenario	Why it fits
---	---
YouTube long-form	Full narration with chapter markers and captions
YouTube Shorts	Quick hooks with punchy delivery
Course content	Professional narration for educational videos
Screen recordings	Dub a screencast with clean AI voiceover
Quick iteration	Smart caching — edit one section, only that segment re-renders
Batch production	Same voice, consistent quality across every video

The one-command workflow

Have a script and a video? Dub it in one shot:

node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My YouTube Video" \
  --video ./my-recording.mp4 \
  --mux \
  --template youtube

This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:

out/my-youtube-video/muxed.mp4 — your video dubbed with the AI voiceover
out/my-youtube-video/master.wav — the standalone audio
out/my-youtube-video/review.html — listen and review each segment
out/my-youtube-video/chapters.txt — paste directly into your YouTube description
out/my-youtube-video/captions.srt — upload to YouTube as subtitles
out/my-youtube-video/description.txt — ready-made YouTube description with chapters
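
To make the chapters output concrete, here is a minimal sketch of how chapter lines could be derived from segment titles and durations. The `{ title, durationSec }` shape and both function names are assumptions for illustration, not the CLI's actual timeline.json schema or code. YouTube expects the first chapter to start at 0:00, which falls out naturally when the first segment starts the running total.

```javascript
// Sketch only: the segment shape below is assumed, not read from timeline.json.

// Format whole seconds as M:SS, or H:MM:SS once past an hour.
function formatTimestamp(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = String(s % 60).padStart(2, "0");
  return hours > 0
    ? `${hours}:${String(minutes).padStart(2, "0")}:${seconds}`
    : `${minutes}:${seconds}`;
}

// Each chapter starts where the previous segment ended.
function buildChapters(segments) {
  let startSec = 0;
  const lines = [];
  for (const { title, durationSec } of segments) {
    lines.push(`${formatTimestamp(startSec)} ${title}`);
    startSec += durationSec;
  }
  return lines.join("\n");
}

console.log(buildChapters([
  { title: "Intro", durationSec: 12.4 },
  { title: "Setup", durationSec: 95.1 },
  { title: "Wrap-up", durationSec: 30 },
]));
```

The same running total of start times could also drive SRT cue timing, which is why a single timeline file is a convenient intermediate.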

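Sync policies: the default, --sync shortest, ends the output when the shorter track ends. Use --sync pad to pad the audio with silence up to the video's duration, or --sync trim to trim the audio down to it.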

Requirements

Node.js 20+: runtime (no npm install needed; the CLI is a single bundled file)
VOICE_AI_API_KEY: set as an environment variable (see Configuration). Get a key at voice.ai/dashboard.
ffmpeg (optional): needed for master stitching, MP3 encoding, loudness normalization, and video dubbing. Without it, the pipeline still produces individual segments, the review page, chapters, and captions.
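
A quick preflight check can confirm these requirements before a long build. The sketch below is a hypothetical helper, not part of voiceai-vo.cjs: it reads the running Node.js version and probes for ffmpeg by launching `ffmpeg -version`.

```javascript
const { spawnSync } = require("node:child_process");

// Returns true when `command` can be launched and exits with status 0.
function commandAvailable(command, args = ["-version"]) {
  const result = spawnSync(command, args, { stdio: "ignore" });
  return !result.error && result.status === 0;
}

const major = Number(process.versions.node.split(".")[0]);
console.log(`Node.js ${process.versions.node}: ${major >= 20 ? "ok" : "too old, need 20+"}`);
console.log(`ffmpeg: ${commandAvailable("ffmpeg") ? "found" : "not found (optional)"}`);
```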

Configuration

Set VOICE_AI_API_KEY as an environment variable before running:

export VOICE_AI_API_KEY=your-key-here

The skill does not read .env files or access any files for credentials — only the environment variable.

Use --mock on any command to run the full pipeline without an API key (produces placeholder audio).
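
The rules above (environment variable only, no key needed in mock mode) can be sketched as a small lookup. `resolveApiKey` is a hypothetical name used for illustration; the CLI's own handling may differ.

```javascript
// Hypothetical helper, not part of voiceai-vo.cjs.
// Reads the key from the given environment object only; no .env lookup.
function resolveApiKey(env, { mock = false } = {}) {
  if (mock) return null; // mock mode makes no API calls, so no key is needed
  const key = (env.VOICE_AI_API_KEY || "").trim();
  if (!key) {
    throw new Error("VOICE_AI_API_KEY is not set; export it or pass --mock");
  }
  return key;
}

try {
  resolveApiKey(process.env);
  console.log("API key found");
} catch (err) {
  console.log(err.message);
}
```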

Commands

`build` — Generate a YouTube voiceover from a script

node voiceai-vo.cjs build \
  --input <script.md or script.txt> \
  --voice <voice-alias-or-uuid> \
  --title "My YouTube Video" \
  [--template youtube] \
  [--video input.mp4 --mux --sync shortest] \
  [--force] [--mock]

What it does:

Reads the script and splits it into segments (by ## headings for .md, or by sentence boundaries for .txt)
Optionally prepends/appends YouTube intro/outro segments
Renders each segment via Voice.ai TTS
Stitches a master audio file (if ffmpeg is available)
Generates YouTube chapters, SRT captions, a review page, and a ready-made description
Optionally dubs your video with the voiceover
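
The first step, splitting a .md script on ## headings, can be sketched as follows. This is an illustration under assumptions rather than the CLI's actual parser: text before the first heading is given the assumed title "Intro", deeper headings such as ### stay inside their parent segment, and headings with no body text are dropped.

```javascript
// Sketch of heading-based segmentation; the real parser may behave differently.
function splitByHeadings(markdown) {
  const segments = [];
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^##\s+(.+?)\s*$/.exec(line); // level-2 headings only
    if (match) {
      current = { title: match[1], lines: [] };
      segments.push(current);
    } else if (current) {
      current.lines.push(line);
    } else if (line.trim() !== "") {
      // Assumption: text before the first heading becomes an "Intro" segment.
      current = { title: "Intro", lines: [line] };
      segments.push(current);
    }
  }
  return segments
    .map(({ title, lines }) => ({ title, text: lines.join("\n").trim() }))
    .filter((segment) => segment.text !== "");
}

const script = "Welcome!\n\n## Hook\nDid you know?\n\n## Steps\nFirst, install.\nThen run.\n";
console.log(splitByHeadings(script).map((segment) => segment.title));
```

Each resulting segment would then be rendered and numbered independently, which is what lets a rebuild skip the segments that did not change.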

Full options:

Option	Description
---	---
`-i, --input`	Script file (.txt or .md) — required
`-v, --voice`	Voice alias or UUID — required
`-t, --title`	Video title (defaults to filename)
`--template youtube`	Auto-inject YouTube intro/outro
`--mode <mode>`	`headings` or `auto` (default: headings for .md)
`--max-chars <n>`	Max characters per auto-chunk (default: 1500)
`--language <code>`	Language code (default: en)
`--video <path>`	Input video to dub
`--mux`	Enable video dubbing (requires --video)
`--sync <policy>`	`shortest`, `pad`, or `trim` (default: shortest)
`--force`	Re-render all segments (ignore cache)
`--mock`	Mock mode: no API calls, placeholder audio
`-o, --out <dir>`	Custom output directory

`replace-audio`: Dub an existing video

node voiceai-vo.cjs replace-audio \
  --video ./my-video.mp4 \
  --audio ./out/my-video/master.wav \
  [--out ./out/my-video/dubbed.mp4] \
  [--sync shortest|pad|trim]

Requires ffmpeg. If ffmpeg is not installed, the command generates helper shell/PowerShell scripts instead.

Sync policy	Behavior
---	---
`shortest` (default)	Output ends when the shorter track ends
`pad`	Pad audio with silence to match video duration
`trim`	Trim audio to match video duration

The video stream is copied without re-encoding (`-c:v copy`). Audio is encoded as AAC for YouTube compatibility.

Privacy: Video processing is entirely local. Only script text is sent to Voice.ai for TTS. Your video files never leave your machine.

`voices`: List available voices

node voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]

Available voices

Use short aliases or full UUIDs with `--voice`:

Alias	Voice	Gender	Best for YouTube
---	---	---	---
`ellie`	Ellie	F	Vlogs, lifestyle, social content
`oliver`	Oliver	M	Tutorials, narration, explainers
`lilith`	Lilith	F	ASMR, calm walkthroughs
`smooth`	Smooth Calm Voice	M	Documentaries, long-form essays
`corpse`	Corpse Husband	M	Gaming, entertainment
`skadi`	Skadi	F	Anime, character content
`zhongli`	Zhongli	M	Gaming, dramatic intros
`flora`	Flora	F	Kids content, upbeat videos
`chief`	Master Chief	M	Gaming, action trailers

The `voices` command also returns any additional voices available on the API. The voice list is cached for 10 minutes.

Build outputs

After a build, the output directory contains everything you need to publish on YouTube:

out/<title-slug>/
  segments/          # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav         # Stitched voiceover (requires ffmpeg)
  master.mp3         # MP3 for upload (requires ffmpeg)
  muxed.mp4          # Dubbed video (if --video --mux used)
  chapters.txt       # Paste into YouTube description
  captions.srt       # Upload as YouTube subtitles
  description.txt    # Ready-made YouTube description with chapters
  review.html        # Interactive review page with audio players
  manifest.json      # Build metadata: voice, template, segment list
  timeline.json      # Segment durations and start times

YouTube workflow

1. Run the build command
2. Upload `muxed.mp4` (or your original video + `master.mp3` as audio)
3. Paste `chapters.txt` content into your YouTube description
4. Upload `captions.srt` as subtitles in YouTube Studio
5. Done: professional narration, chapters, and captions in minutes

YouTube template

Use `--template youtube` to auto-inject a branded intro and outro:

Segment	Source file
---	---
Intro (prepended)	`templates/youtube_intro.txt`
Outro (appended)	`templates/youtube_outro.txt`

Edit the files in `templates/` to customize your channel's branding.

Caching

Segments are cached by a hash of: `text content + voice ID + language`.

Unchanged segments are skipped on rebuild, which keeps iteration fast
Modified segments are re-rendered automatically
Use `--force` to re-render everything
The cache manifest is stored in `segments/.cache.json`

Multilingual dubbing

Voice.ai supports 11 languages, so you can dub your YouTube videos for global audiences:

`en`, `es`, `fr`, `de`, `it`, `pt`, `pl`, `ru`, `nl`, `sv`, `ca`

node voiceai-vo.cjs build \
  --input script-spanish.md \
  --voice ellie \
  --title "Mi Video" \
  --language es \
  --video ./my-video.mp4 \
  --mux

The pipeline auto-selects the multilingual TTS model for non-English languages.

Troubleshooting

Issue	Solution
---	---
ffmpeg missing	The pipeline still works: you get segments, the review page, chapters, and captions. Install ffmpeg for stitching and video dubbing.
Rate limits (429)	Segments render sequentially, which stays under most limits. Wait and retry.
Insufficient credits (402)	Top up at https://voice.ai/dashboard. Cached segments won't re-use credits on retry.
Long scripts	Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls.
Windows paths	Wrap paths with spaces in quotes: `--input "C:\My Scripts\script.md"`

See references/TROUBLESHOOTING.md for more.

References

Agent Skills Specification: https://agentskills.io/specification
Voice.ai: https://voice.ai
references/VOICEAI_API.md: API endpoints, audio formats, models
references/TROUBLESHOOTING.md: Common issues and fixes

Version history: v0.1.6 (current), 2026-03-31 15:02
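
As a closing illustration of the Caching section, the sketch below shows one way a cache key over text content, voice ID, and language could decide which segments need re-rendering. The SHA-256 hash, the JSON serialization, the segment field names, and the file-name-to-key layout of the cache object are all assumptions; the actual format of `segments/.cache.json` is not documented here.

```javascript
const { createHash } = require("node:crypto");

// Assumed key: SHA-256 over the three inputs named in the Caching section.
function segmentCacheKey({ text, voiceId, language }) {
  return createHash("sha256")
    .update(JSON.stringify([text, voiceId, language]))
    .digest("hex");
}

// Assumed cache layout: { "<segment file name>": "<key from the last build>" }.
function segmentsToRender(segments, cache) {
  return segments.filter((segment) => cache[segment.file] !== segmentCacheKey(segment));
}

const intro = { file: "001-intro.wav", text: "Hello", voiceId: "oliver", language: "en" };
const body = { file: "002-body.wav", text: "New body", voiceId: "oliver", language: "en" };
const cache = {
  [intro.file]: segmentCacheKey(intro),
  [body.file]: segmentCacheKey({ ...body, text: "Old body" }),
};

// Only the edited segment is selected for re-rendering.
console.log(segmentsToRender([intro, body], cache).map((segment) => segment.file));
```

Serializing the inputs as a JSON array rather than concatenating them avoids collisions where, for example, text ending in "en" and an empty language would otherwise hash the same.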
