Voice.ai Creator Voiceover Pipeline

> This skill follows the Agent Skills specification.

Turn any script into a publish-ready voiceover — complete with numbered segments, a stitched master, YouTube chapters, SRT captions, and a beautiful review page. Optionally, replace the audio track on an existing video.

Built for creators who want studio-quality voiceovers without the studio. Powered by Voice.ai.

When to use this skill

Scenario	Why it fits
---	---
YouTube long-form	Full narration with chapter markers and captions
YouTube Shorts	Quick hooks with the `shortform` template
Podcasts	Consistent host voice, intro/outro templates
Course content	Professional narration for educational videos
Quick iteration	Smart caching — edit one section, only that segment re-renders
Video audio replacement	Drop AI voiceover onto screen recordings or B-roll
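Smart caching works per segment. Each segment is keyed on a hash of its text, voice ID, and language (see Caching), so editing one section changes only that section's key.

The Node.js sketch below illustrates the idea. It assumes SHA-256 over the three fields and a plain name-to-hash record. `segmentCacheKey`, `needsRender`, and the placeholder voice ID are hypothetical; they are not the CLI's API.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: hash the three inputs the pipeline keys on
// (text content + voice ID + language). SHA-256 and the NUL separator
// are assumptions for illustration.
function segmentCacheKey(text, voiceId, language) {
  return crypto
    .createHash("sha256")
    .update([text, voiceId, language].join("\u0000"))
    .digest("hex");
}

// A segment needs rendering if --force is set or its key differs from the
// one recorded by the previous build (assumed name -> hash record).
function needsRender(cache, segmentName, key, force = false) {
  return force || cache[segmentName] !== key;
}

// Usage: only the edited segment is re-rendered.
const cache = {
  "001-intro": segmentCacheKey("Welcome back!", "<oliver-uuid>", "en"),
  "002-setup": segmentCacheKey("First, install Node.", "<oliver-uuid>", "en"),
};
const introKey = segmentCacheKey("Welcome back!", "<oliver-uuid>", "en");
const setupKey = segmentCacheKey("First, install Node.js 20.", "<oliver-uuid>", "en");
console.log(needsRender(cache, "001-intro", introKey)); // false
console.log(needsRender(cache, "002-setup", setupKey)); // true
```

The NUL separator keeps field boundaries unambiguous. Without it, text "ab" with voice "c" could collide with text "a" and voice "bc".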

The one-command workflow

Have a script and a video? Turn them into a finished video with AI voiceover in one shot:

node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My Video" \
  --video ./my-recording.mp4 \
  --mux

This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:

out/my-video/muxed.mp4 — your video with the new voiceover
out/my-video/master.wav — the standalone audio
out/my-video/review.html — listen and review each segment
out/my-video/chapters.txt — YouTube-ready chapter timestamps
out/my-video/captions.srt — SRT captions

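Use --sync pad to pad shorter audio with silence to the video's length, or --sync trim to cut longer audio to match. The default, --sync shortest, ends the output when the shorter track ends.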
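Both chapters.txt and captions.srt follow from segment timing. The build also records durations and start times in timeline.json, and captions use segment boundaries.

The sketch below turns segment durations into YouTube chapter lines and SRT cues. The `{ title, text, durationSec }` input shape and all function names are assumptions. They are not the pipeline's actual schema or code.

```javascript
// Hypothetical input shape; the real timeline.json fields may differ.
// Accumulate start/end times from segment durations, starting at zero.
function withStartTimes(segments) {
  let t = 0;
  return segments.map((s) => {
    const out = { ...s, start: t, end: t + s.durationSec };
    t += s.durationSec;
    return out;
  });
}

// YouTube chapter stamp: M:SS, or H:MM:SS from one hour on.
function chapterStamp(sec) {
  const total = Math.floor(sec);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = String(total % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${s}` : `${m}:${s}`;
}

// SRT stamp: HH:MM:SS,mmm
function srtStamp(sec) {
  const ms = Math.round(sec * 1000);
  const pad = (n, w = 2) => String(n).padStart(w, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor(ms / 60000) % 60;
  const s = Math.floor(ms / 1000) % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

function buildChapters(segments) {
  return withStartTimes(segments)
    .map((s) => `${chapterStamp(s.start)} ${s.title}`)
    .join("\n");
}

// One SRT cue per segment: index, time range, text, blank line.
function buildSrt(segments) {
  return withStartTimes(segments)
    .map((s, i) => `${i + 1}\n${srtStamp(s.start)} --> ${srtStamp(s.end)}\n${s.text}\n`)
    .join("\n");
}

const segments = [
  { title: "Intro", text: "Welcome back!", durationSec: 4.25 },
  { title: "Setup", text: "First, install Node.js 20.", durationSec: 61.5 },
];
console.log(buildChapters(segments));
// 0:00 Intro
// 0:04 Setup
console.log(buildSrt(segments));
```

The first chapter always lands on 0:00 because start times accumulate from zero.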

Requirements

Node.js 20+ — runtime (no npm install needed — the CLI is a single bundled file)
VOICE_AI_API_KEY — set it as an environment variable or in a .env file in the skill root. Get a key at voice.ai/dashboard.
ffmpeg (optional) — needed for master stitching, MP3 encoding, loudness normalization, and video muxing. The pipeline still produces individual segments, the review page, chapters, and captions without it.
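Because ffmpeg is optional, a script built on this pipeline can check for it before attempting ffmpeg-only steps. The sketch below uses Node's `child_process.spawnSync`. `hasFfmpeg` is a hypothetical helper, not part of the CLI.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: true if an `ffmpeg` binary on PATH runs successfully.
// spawnSync sets `error` (e.g. ENOENT) when the executable cannot be found.
function hasFfmpeg() {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return !result.error && result.status === 0;
}

// Outputs that need no ffmpeg, plus the ffmpeg-only steps when available.
const steps = ["segments", "review page", "chapters", "captions"];
if (hasFfmpeg()) {
  steps.push("master stitching", "MP3 encoding", "loudness normalization", "video muxing");
}
console.log(`Will produce: ${steps.join(", ")}`);
```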

Configuration

The skill reads the API key from the first of these sources that is set, checked in order:

Environment variable VOICE_AI_API_KEY
Environment variable VOICEAI_API_KEY (alternate)
.env file in the skill root

echo 'VOICE_AI_API_KEY=your-key-here' > .env

Use --mock on any command to run the full pipeline without an API key (produces placeholder audio).
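The lookup order above can be sketched in Node.js as follows. `resolveApiKey` and the small `.env` parser are hypothetical helpers that assume simple `KEY=value` lines. The CLI's own parser may accept more syntax, and using `process.cwd()` as the skill root is also an assumption.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Minimal .env parser: KEY=value lines; blanks and '#' comments are skipped,
// and one pair of surrounding quotes is stripped from the value.
function parseDotEnv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(/^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$/);
    if (!m) continue;
    vars[m[1]] = m[2].replace(/^(['"])(.*)\1$/, "$2");
  }
  return vars;
}

// Documented order: VOICE_AI_API_KEY, then VOICEAI_API_KEY, then .env.
// Assumption: only VOICE_AI_API_KEY is read from the .env file.
function resolveApiKey(env = process.env, skillRoot = process.cwd()) {
  if (env.VOICE_AI_API_KEY) return env.VOICE_AI_API_KEY;
  if (env.VOICEAI_API_KEY) return env.VOICEAI_API_KEY;
  const envFile = path.join(skillRoot, ".env");
  if (fs.existsSync(envFile)) {
    return parseDotEnv(fs.readFileSync(envFile, "utf8")).VOICE_AI_API_KEY;
  }
  return undefined;
}

const key = resolveApiKey();
console.log(key ? "API key found" : "No API key: set VOICE_AI_API_KEY or use --mock");
```

For example, if VOICE_AI_API_KEY is unset and VOICEAI_API_KEY is set, the alternate variable is used and .env is never read.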

Commands

`build` — Generate a voiceover from a script

node voiceai-vo.cjs build \
  --input <script.md or script.txt> \
  --voice <voice-alias-or-uuid> \
  --title "My Project" \
  [--template youtube|podcast|shortform] \
  [--language en] \
  [--video input.mp4 --mux --sync shortest] \
  [--force] [--mock]

What it does:

Reads the script and splits it into segments (by ## headings for .md, or by sentence boundaries for .txt)
Optionally prepends/appends template intro/outro segments
Renders each segment via Voice.ai TTS as a numbered WAV file
Stitches a master audio file (if ffmpeg is available)
Generates chapters, captions, a review page, and metadata files
Optionally muxes the voiceover into an existing video
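The splitting step can be sketched as follows. Headings mode starts a new segment at each `## ` heading. Auto mode packs whole sentences into chunks of at most `--max-chars` characters (1500 by default).

This is a hypothetical reconstruction. The function names, the sentence regex, and the choice to ignore text before the first heading are all assumptions. The `001-intro.wav` naming mirrors the documented segment filenames.

```javascript
// Filename slug, e.g. "My Video" -> "my-video".
function slugify(s) {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Headings mode (.md): each "## " heading starts a new segment.
// Text before the first heading is ignored in this sketch.
function splitByHeadings(markdown) {
  const segments = [];
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const h = line.match(/^##\s+(.*)$/);
    if (h) {
      current = { title: h[1].trim(), lines: [] };
      segments.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }
  return segments.map((s) => ({ title: s.title, text: s.lines.join("\n").trim() }));
}

// Auto mode (.txt): greedily pack whole sentences into chunks <= maxChars.
// A single sentence longer than maxChars becomes its own oversized chunk.
function chunkBySentences(text, maxChars = 1500) {
  const normalized = text.replace(/\s+/g, " ").trim();
  const sentences = normalized.match(/[^.!?]+[.!?]+["')\]]*\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let buf = "";
  for (const raw of sentences) {
    const s = raw.trim();
    if (buf && buf.length + 1 + s.length > maxChars) {
      chunks.push(buf);
      buf = "";
    }
    buf = buf ? `${buf} ${s}` : s;
  }
  if (buf) chunks.push(buf);
  return chunks;
}

// Numbered segment filenames: 001-<slug>.wav, 002-<slug>.wav, ...
function segmentFileNames(segments) {
  return segments.map((s, i) => `${String(i + 1).padStart(3, "0")}-${slugify(s.title)}.wav`);
}

const md = "## Intro\nWelcome back!\n\n## The Setup\nFirst, install Node.js 20.";
console.log(segmentFileNames(splitByHeadings(md))); // ["001-intro.wav", "002-the-setup.wav"]
console.log(chunkBySentences("One. Two! Three?", 10)); // ["One. Two!", "Three?"]
```

The same packing idea with a 490-character limit would fit the per-request split mentioned under Troubleshooting. The CLI's actual split rule there is not documented.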

Full options:

Option	Description
---	---
`-i, --input`	Script file (.txt or .md) — required
`-v, --voice`	Voice alias or UUID — required
`-t, --title`	Project title (defaults to filename)
`--template <name>`	`youtube`, `podcast`, or `shortform`
`--mode <mode>`	`headings` or `auto` (default: headings for .md)
`--max-chars <n>`	Max characters per auto-chunk (default: 1500)
`--language <code>`	Language code (default: en)
`--video <path>`	Input video for muxing
`--mux`	Enable video muxing (requires --video)
`--sync <policy>`	`shortest`, `pad`, or `trim` (default: shortest)
`--force`	Re-render all segments (ignore cache)
`--mock`	Mock mode: no API calls, placeholder audio
`-o, --out <dir>`	Custom output directory

`replace-audio` — Swap the audio track on a video

node voiceai-vo.cjs replace-audio \
  --video ./input.mp4 \
  --audio ./out/my-project/master.wav \
  [--out ./out/my-project/muxed.mp4] \
  [--sync shortest|pad|trim]

Requires ffmpeg. If ffmpeg is not installed, the command generates helper shell/PowerShell scripts instead.

Sync policy	Behavior
---	---
`shortest` (default)	Output ends when the shorter track ends
`pad`	Pad audio with silence to match video duration
`trim`	Trim audio to match video duration

The video stream is copied without re-encoding (`-c:v copy`). Audio is encoded as AAC.
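The sync policies map onto standard ffmpeg options. The sketch below builds an ffmpeg argument list for each policy, copying the video stream and encoding the voiceover as AAC as described above.

`muxArgs` is hypothetical, and the exact command the CLI runs is not documented. Using `-af apad -shortest` for `pad` and `-t` for `trim` is an assumption.

```javascript
// Hypothetical reconstruction of a mux command for each sync policy.
function muxArgs({ video, audio, out, sync = "shortest", videoDurationSec }) {
  const args = [
    "-y",
    "-i", video,     // input 0: the video (its original audio is not mapped)
    "-i", audio,     // input 1: the voiceover
    "-map", "0:v:0", // video stream from input 0
    "-map", "1:a:0", // audio stream from input 1
    "-c:v", "copy",  // copy video without re-encoding
    "-c:a", "aac",   // encode the voiceover as AAC
  ];
  switch (sync) {
    case "shortest":
      args.push("-shortest"); // stop when the shorter stream ends
      break;
    case "pad":
      // apad appends silence indefinitely; -shortest then stops at the video's end
      args.push("-af", "apad", "-shortest");
      break;
    case "trim":
      if (!(videoDurationSec > 0)) throw new Error("trim needs the video duration");
      args.push("-t", String(videoDurationSec)); // cap output at the video length
      break;
    default:
      throw new Error(`unknown sync policy: ${sync}`);
  }
  args.push(out);
  return args;
}

const args = muxArgs({
  video: "input.mp4",
  audio: "out/my-project/master.wav",
  out: "out/my-project/muxed.mp4",
  sync: "pad",
});
console.log(["ffmpeg", ...args].join(" "));
// To run it: require("node:child_process").spawnSync("ffmpeg", args, { stdio: "inherit" });
```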
A mux report is saved alongside the output.

Privacy: Video processing is entirely local. Only script text is sent to Voice.ai for TTS.

`voices` — List available voices

node voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]

Available voices

Use short aliases or full UUIDs with --voice:

Alias	Voice	Gender	Style
---	---	---	---
`ellie`	Ellie	F	Youthful, vibrant vlogger
`oliver`	Oliver	M	Friendly British
`lilith`	Lilith	F	Soft, feminine
`smooth`	Smooth Calm Voice	M	Deep, smooth narrator
`corpse`	Corpse Husband	M	Deep, distinctive
`skadi`	Skadi	F	Anime character
`zhongli`	Zhongli	M	Deep, authoritative
`flora`	Flora	F	Cheerful, high pitch
`chief`	Master Chief	M	Heroic, commanding

The voices command also returns any additional voices available through the API.
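Resolving `--voice` can be sketched as follows: full UUIDs pass through unchanged, and everything else is looked up in the alias table. The UUID values below are placeholders, so get real IDs from the `voices` command. Case-insensitive alias matching is an assumption, and `resolveVoice` is a hypothetical helper.

```javascript
// Placeholder IDs: replace with real UUIDs from `node voiceai-vo.cjs voices`.
const VOICE_ALIASES = new Map([
  ["ellie", "<ellie-uuid>"],
  ["oliver", "<oliver-uuid>"],
  ["lilith", "<lilith-uuid>"],
  ["smooth", "<smooth-uuid>"],
  ["corpse", "<corpse-uuid>"],
  ["skadi", "<skadi-uuid>"],
  ["zhongli", "<zhongli-uuid>"],
  ["flora", "<flora-uuid>"],
  ["chief", "<chief-uuid>"],
]);

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function resolveVoice(value) {
  if (UUID_RE.test(value)) return value; // full UUIDs pass through unchanged
  const id = VOICE_ALIASES.get(value.toLowerCase()); // assumed case-insensitive
  if (!id) throw new Error(`Unknown voice "${value}"; run the voices command to list options`);
  return id;
}

console.log(resolveVoice("oliver"));
console.log(resolveVoice("123e4567-e89b-12d3-a456-426614174000"));
```

A `Map` is used instead of a plain object so that names like `constructor` are not mistaken for aliases.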
The voice list returned by the voices command is cached for 10 minutes.

Build outputs

After a build, the output directory contains:

out/<title-slug>/
  segments/        # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav       # Stitched audio (requires ffmpeg)
  master.mp3       # MP3 encode (requires ffmpeg)
  manifest.json    # Build metadata: voice, template, segment list, hashes
  timeline.json    # Segment durations and start times
  review.html      # Interactive review page with audio players
  chapters.txt     # YouTube-friendly chapter timestamps
  captions.srt     # SRT captions using segment boundaries
  description.txt  # YouTube description with chapters + Voice.ai credit

review.html

A standalone HTML page with:

Master audio player (if stitched)
Individual segment players with titles and durations
Collapsible script text for each segment
Regeneration command hints

Templates

Templates auto-inject intro/outro segments around the script content:

Template	Prepends	Appends
---	---	---
`youtube`	`templates/youtube_intro.txt`	`templates/youtube_outro.txt`
`podcast`	`templates/podcast_intro.txt`	(none)
`shortform`	`templates/shortform_hook.txt`	(none)

Edit the files in templates/ to customize the intro/outro text.

Caching

Segments are cached by a hash of: text content + voice ID + language.

Unchanged segments are skipped on rebuild, so iteration is fast
Modified segments are re-rendered automatically
Use --force to re-render everything
The cache manifest is stored in segments/.cache.json

Multilingual support

Voice.ai supports 11 languages. Use --language <code> to switch:

en, es, fr, de, it, pt, pl, ru, nl, sv, ca

The pipeline auto-selects the multilingual TTS model for non-English languages.

Troubleshooting

Issue	Solution
---	---
ffmpeg missing	The pipeline still works: you get segments, the review page, chapters, and captions. Install ffmpeg for master stitching and video muxing.
Rate limits (429)	Segments render sequentially, which stays under most rate limits. If you still hit a 429, wait and retry.
Insufficient credits (402)	Top up at voice.ai/dashboard. Cached segments are not re-rendered on retry, so they don't consume credits again.
Long scripts	Caching makes rebuilds fast. Segment text over 490 characters is automatically split across multiple API calls.
Windows paths	Wrap paths that contain spaces in quotes: --input "C:\My Scripts\script.md"

See references/TROUBLESHOOTING.md for more.

References

Agent Skills Specification (https://agentskills.io/specification)
Voice.ai (https://voice.ai)
references/VOICEAI_API.md — API endpoints, audio formats, models
references/TROUBLESHOOTING.md — Common issues and fixes

Version history

v0.1.3 (current), 2026-03-28 21:36
