Convert a Markdown document into a series of narrated, animated video lessons with voiceover.
Before any work, ask the user to confirm the output resolution and the narration voice:
IMPORTANT: The Remotion Composition always uses 1920x1080 as the layout size.
Higher resolutions are achieved via the --scale render flag, NOT by changing width/height.
This is the single most important lesson from production experience.
| Label | Layout Size | Render Flag | Output Resolution |
|---|---|---|---|
| 1080p | 1920x1080 | (none) | 1920x1080 |
| 2K | 1920x1080 | --scale=1.33 | approx. 2560x1440 |
| 4K | 1920x1080 | --scale=2 | 3840x2160 |
CRITICAL: Never set Composition width/height to 3840x2160 directly.
All CSS pixel values (fontSize, padding, gap, width) are designed for 1920x1080.
Setting the Composition to 4K makes everything appear tiny.
Instead, use --scale=2 at render time to upscale without affecting layout.
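The relationship between layout size, the --scale flag, and output resolution is plain multiplication. A minimal sketch of that arithmetic follows. `outputSize` is a hypothetical helper, not part of Remotion, and rounding to whole pixels is an assumption of this sketch rather than a statement about how the renderer treats fractional sizes.

```typescript
// Layout size used by every Composition in this workflow.
const LAYOUT = { width: 1920, height: 1080 };

// Hypothetical helper: pixel size produced by a given --scale value.
// Assumption: fractional results are rounded to the nearest whole pixel.
function outputSize(scale: number): { width: number; height: number } {
  return {
    width: Math.round(LAYOUT.width * scale),
    height: Math.round(LAYOUT.height * scale),
  };
}

const fullHd = outputSize(1);       // { width: 1920, height: 1080 }
const fourK = outputSize(2);        // { width: 3840, height: 2160 }
const roundedTwoK = outputSize(1.33); // { width: 2554, height: 1436 }
const exactTwoK = outputSize(4 / 3);  // { width: 2560, height: 1440 }
```

The 2K row is the only one where the factor is not exact: 1.33 is a rounded 4/3, so the nominal 2560x1440 is only reached with a more precise factor.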
| Voice ID | Gender | Style | Best For |
|---|---|---|---|
| zh-CN-YunyangNeural | Male | Professional | Tutorials, lectures |
| zh-CN-YunxiNeural | Male | Lively | Casual, vlogs |
| zh-CN-XiaoxiaoNeural | Female | Warm | Storytelling |
| zh-CN-XiaoyiNeural | Female | Broadcast | News-style |
Read the target .md file and extract:
Output a chapter plan:
Ch01: [title] - [key points] - connects to next via [hook]
Ch02: [title] - [key points] - connects to next via [hook]
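If it helps to keep the plan as data for later steps, it could be modelled roughly as below. This is only a sketch: `ChapterPlan` and `formatPlan` are hypothetical names, and the sample entry is illustrative, not taken from a real document.

```typescript
// Hypothetical shape for one entry of the chapter plan.
interface ChapterPlan {
  title: string;
  keyPoints: string[];
  hook: string; // how this chapter leads into the next one
}

// Formats entries in the "ChNN: title - key points - connects to next via hook" layout.
function formatPlan(chapters: ChapterPlan[]): string[] {
  return chapters.map((ch, i) => {
    const n = i + 1;
    const id = "Ch" + (n < 10 ? "0" + n : String(n));
    return `${id}: ${ch.title} - ${ch.keyPoints.join(", ")} - connects to next via ${ch.hook}`;
  });
}

// Illustrative values only.
const planLines = formatPlan([
  { title: "Vectors", keyPoints: ["one person = vector"], hook: "whole class = matrix" },
]);
```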
Invoke the environment-setup skill or manually set up:
mkdir -p remotion-videos && cd remotion-videos
npm init -y
npm install remotion @remotion/cli react react-dom @remotion/google-fonts typescript @types/react
mkdir -p src/shared src/compositions public/voiceover public/fonts output
Create these configuration files from the assets/ directory bundled with this skill:
Global design tokens and chapter metadata:
COLORS = { bg, primary, secondary, accent, warning, text, textMuted, cardBg, codeBg, ... }
SPEC = { WIDTH: 1920, HEIGHT: 1080, FPS: 60 }
CHAPTER_TITLES = [...] // one per chapter from Step 1
CHAPTER_COLORS = [...] // accent color per chapter
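As a rough illustration, src/shared/constants.ts might look like the sketch below. Only the SPEC values come from this document; the hex colors and chapter titles are placeholders, and the COLORS keys shown are a subset.

```typescript
// Sketch of src/shared/constants.ts. Hex values are placeholders, not a prescribed palette.
export const COLORS = {
  bg: "#0f172a",
  primary: "#38bdf8",
  accent: "#f59e0b",
  text: "#f8fafc",
  textMuted: "#94a3b8",
};

// Layout size and frame rate shared by every Composition.
export const SPEC = { WIDTH: 1920, HEIGHT: 1080, FPS: 60 };

// One entry per chapter from Step 1 (placeholder titles).
export const CHAPTER_TITLES: string[] = ["Chapter 1 title", "Chapter 2 title"];

// Accent color per chapter, same length and order as CHAPTER_TITLES.
export const CHAPTER_COLORS: string[] = [COLORS.primary, COLORS.accent];
```

Keeping these in one file gives Scene files a single place to import from, independent of any VideoComposition.tsx.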
A table-of-contents transition page shown at the START of every chapter.
CRITICAL COMPONENT for voiceover integration.
This component reads voiceover-durations.json and automatically overlays
Audio Sequence elements for each scene in a chapter.
It sits at the Composition level (inside AbsoluteFill), alongside visual Sequences.
// Pseudo-code structure:
ChapterAudioLayer({ chapter: "ch01" })
reads durations from voiceover-durations.json
for each scene: renders Sequence(from=offset, duration=audioDur+pad)
containing Audio(src=staticFile("voiceover/ch01/scene1.mp3"))
This is the component that makes voiceover work. Without it, videos render silent.
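The offset arithmetic in the pseudo-code above can be sketched as a pure function, leaving out the React and Remotion parts. `sceneTimings` is a hypothetical helper. The start frame (audio beginning after the outline page) and the fixed per-scene padding are assumptions of this sketch, not confirmed details of the bundled component.

```typescript
// Scene id -> audio length in seconds, as stored per chapter in voiceover-durations.json.
type SceneDurations = Record<string, number>;

interface SceneTiming {
  scene: string;
  from: number;             // first frame of the audio Sequence
  durationInFrames: number; // audio length plus padding
}

// Hypothetical helper: one timing entry per scene, laid out back to back.
function sceneTimings(
  durations: SceneDurations,
  fps: number,
  startFrame: number, // assumption: audio starts after the outline page
  padFrames: number,  // assumption: fixed pause appended to every scene
): SceneTiming[] {
  let offset = startFrame;
  return Object.keys(durations).map((scene) => {
    const durationInFrames = Math.ceil(durations[scene] * fps) + padFrames;
    const timing = { scene, from: offset, durationInFrames };
    offset += durationInFrames;
    return timing;
  });
}

// Example with the sample durations from this document, 60 fps, a 240-frame outline.
const ch01Timings = sceneTimings({ scene1: 13.2, scene2: 17.9 }, 60, 240, 30);
```

Each entry maps onto one Sequence wrapping one Audio element, so the visual Sequences can reuse the same numbers to stay in step with the narration.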
Register all Compositions. Key patterns:
For each chapter create src/compositions/chXX-slug/VideoComposition.tsx.
AbsoluteFill(bg)
ChapterAudioLayer(chapter="chXX") // AUDIO LAYER - always first
Sequence(outline, 240 frames) // Outline page
Sequence(scene1, audio-synced frames) // Content scene 1
Sequence(scene2, audio-synced frames) // Content scene 2
...
All editable values at the top of each file:
const COPY = { title, points: [...] }
This is the MOST CRITICAL quality step.
Map the narrative flow before generating code:
Ch01 -> "AI is everywhere" -> introduces linear algebra
Ch02 -> "one person = vector" -> introduces matrix
Ch03 -> "whole class = matrix" -> introduces matrix multiply
...each chapter hooks into the next
When video content is polished, ALSO update the source .md to match.
Article and video should tell the same story with the same metaphors.
For each scene write a spoken narration that:
Tool: edge-tts (a free Python client for Microsoft Edge's online text-to-speech service). Install: pip install edge-tts
Use the generate-voiceover.sh script bundled with this skill as a template.
Key command per scene:
edge-tts --voice zh-CN-YunyangNeural --rate=-5% \
--text "narration text" --write-media public/voiceover/ch01/scene1.mp3
Get duration:
ffprobe -v quiet -show_entries format=duration \
-of default=noprint_wrappers=1:nokey=1 file.mp3
Write all durations to src/voiceover-durations.json:
{ "ch01": { "scene1": 13.2, "scene2": 17.9 }, "ch02": { ... } }
Root.tsx reads this file to auto-compute durationInFrames per Composition.
ChapterAudioLayer reads it to position Audio Sequences correctly.
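A possible way for Root.tsx to turn the JSON into a durationInFrames value is sketched below. `chapterDurationInFrames` is a hypothetical helper. The 240-frame outline and the 600-frame fallback per scene are taken from this document; the 30-frame padding is an assumption.

```typescript
// Shape of src/voiceover-durations.json: chapter id -> scene id -> seconds.
type DurationsFile = Record<string, Record<string, number>>;

const FPS = 60;
const OUTLINE_FRAMES = 240;        // outline page at the start of every chapter
const FALLBACK_SCENE_FRAMES = 600; // used when a chapter has no entry
const PAD_FRAMES = 30;             // assumption: pause appended after each scene

// Hypothetical helper: total frames for one chapter Composition.
function chapterDurationInFrames(
  durations: DurationsFile,
  chapter: string,
  sceneCount: number,
): number {
  const scenes = durations[chapter];
  if (!scenes) {
    // No audio data: fall back to a fixed length per scene.
    return OUTLINE_FRAMES + sceneCount * FALLBACK_SCENE_FRAMES;
  }
  let total = OUTLINE_FRAMES;
  for (const key of Object.keys(scenes)) {
    total += Math.ceil(scenes[key] * FPS) + PAD_FRAMES;
  }
  return total;
}
```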
ChapterAudioLayer embeds the voiceover automatically once it is mounted.
Ensure every Composition has this as the first child of AbsoluteFill:
ChapterAudioLayer(chapter="chXX")
# 1080p (default)
npx remotion render src/index.ts ch01-slug output/ch01.mp4
# 2K
npx remotion render src/index.ts ch01-slug output/ch01.mp4 --scale=1.33
# 4K
npx remotion render src/index.ts ch01-slug output/ch01.mp4 --scale=2
CRITICAL REMINDER: Never change Composition width/height for higher resolution.
Always use --scale flag. Layout stays 1920x1080, output scales up.
SCALE_FLAG="" # or "--scale=2" for 4K
for id in ch01-slug ch02-slug ...; do
npx remotion render src/index.ts "$id" "output/$id.mp4" $SCALE_FLAG
done
for f in output/ch*.mp4; do echo "file '$PWD/$f'"; done > output/filelist.txt
ffmpeg -f concat -safe 0 -i output/filelist.txt -c copy output/full-video.mp4
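The file list can also be generated from a script. The sketch below builds the text of filelist.txt, including the escaping the concat demuxer expects for a single quote inside a quoted path. `concatList` is a hypothetical helper and the paths are illustrative.

```typescript
// Hypothetical helper: builds the contents of filelist.txt for ffmpeg's concat demuxer.
// Inside a single-quoted path, a literal ' is written as '\'' .
function concatList(paths: string[]): string {
  return paths.map((p) => `file '${p.replace(/'/g, "'\\''")}'`).join("\n") + "\n";
}

// Illustrative absolute paths, in playback order.
const fileListText = concatList([
  "/work/remotion-videos/output/ch01-slug.mp4",
  "/work/remotion-videos/output/ch02-slug.mp4",
]);
```

Stream copy (-c copy) only works when all inputs share the same codec and parameters, which holds when every chapter is rendered with the same settings and scale.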
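NEVER export COLORS from VideoComposition.tsx if Scene files import from it: VideoComposition.tsx and the Scene files would then import each other, creating a circular dependency.
Always put shared constants in a separate constants.ts file.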
Chinese curly quotes inside JS string literals cause esbuild parse errors.
Always use straight quotes or remove decorative quotes.
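One way to apply this before pasting copy into a source file is sketched below. Both function names are hypothetical. The first swaps curly quotes for straight ones, and the second uses JSON.stringify so that the resulting straight quotes are escaped inside the generated literal.

```typescript
// Hypothetical helper: replaces curly double and single quotes with straight ones.
function straightenQuotes(text: string): string {
  return text.replace(/[\u201C\u201D]/g, '"').replace(/[\u2018\u2019]/g, "'");
}

// Hypothetical helper: returns a double-quoted string literal that is safe to paste into code.
function toStringLiteral(text: string): string {
  return JSON.stringify(straightenQuotes(text));
}

const literal = toStringLiteral("\u201Cvector\u201D"); // "\"vector\""
```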
Audio is not embedded automatically. You must explicitly add the ChapterAudioLayer
component inside each Composition's AbsoluteFill.
Never set Composition to 3840x2160. All CSS is designed for 1920x1080.
Use --scale=2 at render time instead. This was the single biggest gotcha.
After generating TTS, verify that ALL chapters have entries in voiceover-durations.json.
Use ffprobe to get accurate durations. Chapters without an entry fall back to 600 frames
per scene, which causes audio/visual desync.
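The check can be automated with a few lines. The sketch below is one possible form; `missingChapters` is a hypothetical helper and the chapter ids are illustrative.

```typescript
// Shape of src/voiceover-durations.json: chapter id -> scene id -> seconds.
type DurationsFile = Record<string, Record<string, number>>;

// Hypothetical check: returns the chapter ids that have no scene durations recorded.
function missingChapters(durations: DurationsFile, chapterIds: string[]): string[] {
  return chapterIds.filter((id) => {
    const scenes = durations[id];
    return !scenes || Object.keys(scenes).length === 0;
  });
}

// Illustrative data: ch02 was never generated, ch03 has an empty entry.
const gaps = missingChapters(
  { ch01: { scene1: 13.2, scene2: 17.9 }, ch03: {} },
  ["ch01", "ch02", "ch03"],
);
```

Running such a check before rendering surfaces the chapters that would otherwise silently use the fallback length.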
| Phase | Skill | Purpose |
|---|---|---|
| Setup | environment-setup | Node.js, FFmpeg, Remotion |
| Code | remotion-best-practices | Animation quality rules |
| Code | scene-planner | Storyboard (optional) |
remotion-videos/
  scripts/
    generate-voiceover.sh # TTS batch generation
    sync-durations.js # Audio to frame sync
  public/voiceover/chXX/ # MP3 files per chapter
  src/
    index.ts # Entry point
    Root.tsx # All Compositions (1920x1080 layout)
    voiceover-durations.json # Audio duration data
    shared/
      constants.ts # Colors, titles, specs
      OutlinePage.tsx # TOC transition page
      ChapterAudioLayer.tsx # Audio overlay component
    compositions/
      ch01-slug/VideoComposition.tsx
      ch02-slug/VideoComposition.tsx
      ...
  output/ # Rendered MP4s
  remotion.config.ts
  tsconfig.json
  package.json