
MD to Video Course

Convert a Markdown document into a series of narrated, animated video lessons with voiceover.

Step 0: Confirm Specs

Before any work, ask the user to confirm:

Resolution — 1080p / 2K / 4K (see Resolution section below)
Voice gender — Male or Female
FPS — 30 or 60

IMPORTANT: The Remotion Composition always uses 1920x1080 as the layout size.

Higher resolutions are achieved via the --scale render flag, NOT by changing width/height.

This is the single most important lesson from production experience.

Resolution Options

Label	Layout Size	Render Flag	Output Resolution
-------	------------	-------------	-------------------
1080p	1920x1080	(none)	1920x1080
2K	1920x1080	--scale=1.33	approx. 2560x1440 (1920 x 1.33 = 2553.6; an exact 2560x1440 needs a scale of 4/3)
4K	1920x1080	--scale=2	3840x2160

CRITICAL: Never set Composition width/height to 3840x2160 directly.

All CSS pixel values (fontSize, padding, gap, width) are designed for 1920x1080.

Setting the Composition to 4K makes everything appear tiny.

Instead, use --scale=2 at render time to upscale without affecting layout.
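The arithmetic behind the table can be checked with a small sketch. The helper name scaledSize is hypothetical and only multiplies the layout size by the scale factor; how the renderer treats a fractional result is not modeled here.

```typescript
// Layout size used by every Composition in this guide.
const LAYOUT = { width: 1920, height: 1080 };

// Hypothetical helper: raw output size for a given --scale factor.
// It only multiplies; rounding of fractional sizes is left to the renderer.
function scaledSize(scale: number): { width: number; height: number } {
  return { width: LAYOUT.width * scale, height: LAYOUT.height * scale };
}

const fourK = scaledSize(2);       // 3840 x 2160
const twoK = scaledSize(4 / 3);    // about 2560 x 1440
const approx2K = scaledSize(1.33); // about 2553.6 x 1436.4, slightly short of 2K
```

The layout constants never change; only the scale factor does, which is why CSS pixel values keep their proportions at every output size.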

Voice Options (edge-tts)

Voice ID	Gender	Style	Best For
----------	--------	-------	----------
zh-CN-YunyangNeural	Male	Professional	Tutorials, lectures
zh-CN-YunxiNeural	Male	Lively	Casual, vlogs
zh-CN-XiaoxiaoNeural	Female	Warm	Storytelling
zh-CN-XiaoyiNeural	Female	Broadcast	News-style

Step 1: Analyze the Markdown

Read the target .md file and extract:

Chapter structure: each ## heading becomes one video (one Remotion Composition)
Key concepts per chapter: definitions, formulas, examples, metaphors
Logical flow: how chapters connect (for transition text between chapters)

Output a chapter plan:

Ch01: [title] - [key points] - connects to next via [hook]
Ch02: [title] - [key points] - connects to next via [hook]
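A minimal sketch of the chapter split, assuming every chapter starts at a line beginning with "## " and that headings inside fenced code blocks should be ignored. The names ChapterPlan and splitChapters are hypothetical; key points and hooks still have to be written by hand or by the model.

```typescript
interface ChapterPlan {
  id: string;    // "ch01", "ch02", ...
  title: string; // text of the "## " heading
  body: string;  // Markdown between this heading and the next
}

interface Draft { title: string; lines: string[] }

const FENCE = "\u0060\u0060\u0060"; // three backticks

function finish(draft: Draft, index: number): ChapterPlan {
  const n = index + 1;
  return {
    id: "ch" + (n < 10 ? "0" : "") + n,
    title: draft.title,
    body: draft.lines.join("\n").trim(),
  };
}

function splitChapters(markdown: string): ChapterPlan[] {
  const chapters: ChapterPlan[] = [];
  let current: Draft | null = null;
  let inFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    // Track fenced code so a shell comment like "## foo" is not a chapter.
    if (line.trim().indexOf(FENCE) === 0) inFence = !inFence;
    const match = inFence ? null : /^##\s+(.+?)\s*$/.exec(line);
    if (match) {
      if (current) chapters.push(finish(current, chapters.length));
      current = { title: match[1], lines: [] };
    } else if (current) {
      current.lines.push(line); // "###" subsections stay inside the chapter
    }
  }
  if (current) chapters.push(finish(current, chapters.length));
  return chapters;
}
```

Text before the first "## " heading (for example the document title) is dropped in this sketch.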

Step 2: Scaffold the Remotion Project

Invoke the environment-setup skill or manually set up:

mkdir -p remotion-videos && cd remotion-videos
npm init -y
npm install remotion @remotion/cli react react-dom @remotion/google-fonts typescript @types/react
mkdir -p src/shared src/compositions public/voiceover public/fonts output

Create these configuration files from the assets/ directory bundled with this skill:

remotion.config.ts (from assets/)
src/index.ts (from assets/)
tsconfig.json (standard React JSX config)
package.json scripts: dev, render, build

Shared Components to Create

src/shared/constants.ts

Global design tokens and chapter metadata:

COLORS = { bg, primary, secondary, accent, warning, text, textMuted, cardBg, codeBg, ... }
SPEC = { WIDTH: 1920, HEIGHT: 1080, FPS: 60 } // FPS: use the value confirmed in Step 0 (30 or 60)
CHAPTER_TITLES = [...] // one per chapter from Step 1
CHAPTER_COLORS = [...] // accent color per chapter

src/shared/OutlinePage.tsx

A table-of-contents transition page shown at the START of every chapter.

Display all chapter titles in two columns
Highlight the current chapter with colored border + scale animation
Gray out + strikethrough completed chapters
Dim upcoming chapters
Duration: 240 frames (4s at 60fps), fade-out at end
See references/shared-components.md for full template code
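The display rules above reduce to two small pure functions. This is a sketch with hypothetical names (chapterState, twoColumns), not the template from references/shared-components.md; it only decides which style a title gets and which column it lands in.

```typescript
type ChapterState = "done" | "current" | "upcoming";

// done -> gray + strikethrough, current -> highlighted, upcoming -> dimmed
function chapterState(index: number, currentIndex: number): ChapterState {
  if (index < currentIndex) return "done";
  if (index === currentIndex) return "current";
  return "upcoming";
}

// Split the titles into a left and a right column; the left column
// takes the extra item when the count is odd.
function twoColumns<T>(items: T[]): [T[], T[]] {
  const half = Math.ceil(items.length / 2);
  return [items.slice(0, half), items.slice(half)];
}
```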

src/shared/ChapterAudioLayer.tsx

CRITICAL COMPONENT for voiceover integration.

This component reads voiceover-durations.json and automatically overlays Audio Sequence elements for each scene in a chapter.

It sits at the Composition level (inside AbsoluteFill), alongside visual Sequences.

// Pseudo-code structure:
ChapterAudioLayer({ chapter: "ch01" })
  reads durations from voiceover-durations.json
  for each scene: renders Sequence(from=offset, durationInFrames=ceil(audioDur * FPS) + PAD)
    containing Audio(src=staticFile("voiceover/ch01/scene1.mp3"))

This is the component that makes voiceover work. Without it, videos render silent.
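The offset bookkeeping can be sketched without any Remotion imports. The names layoutAudio and AudioSlot are hypothetical, and the sketch assumes that narration starts right after the 240-frame outline page and that each scene gets 60 frames of padding, matching the calcFrames formula used in Root.tsx.

```typescript
type SceneDurations = Record<string, number>; // scene name -> seconds

interface AudioSlot {
  scene: string;
  src: string;              // relative path to hand to staticFile()
  from: number;             // first frame of the Sequence
  durationInFrames: number; // audio length in frames plus padding
}

const OUTLINE_FRAMES = 240; // assumption: audio begins after the outline page
const PAD_FRAMES = 60;      // assumption: same padding as calcFrames

function layoutAudio(chapter: string, scenes: SceneDurations, fps: number): AudioSlot[] {
  let offset = OUTLINE_FRAMES;
  // Object.keys keeps the insertion order of the JSON for names like "scene1".
  return Object.keys(scenes).map((scene) => {
    const durationInFrames = Math.ceil(scenes[scene] * fps) + PAD_FRAMES;
    const slot: AudioSlot = {
      scene,
      src: `voiceover/${chapter}/${scene}.mp3`,
      from: offset,
      durationInFrames,
    };
    offset += durationInFrames;
    return slot;
  });
}
```

Each slot maps onto one Sequence wrapping one Audio element; the visual Sequences should use the same from and durationInFrames values so picture and sound stay aligned.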

src/Root.tsx

Register all Compositions. Key patterns:

Layout is ALWAYS 1920x1080 (never change this for higher resolution)
Import voiceover-durations.json to auto-calculate durationInFrames
calcFrames function: OUTLINE(240) + sum of (ceil(audioDur * FPS) + PAD(60)) per scene
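The calcFrames formula, written out as a sketch. The signature (an array of scene durations in seconds plus the frame rate) is an assumption; the constants are the frame counts quoted above, which correspond to 4 s and 1 s at 60 fps.

```typescript
const OUTLINE = 240; // outline page, in frames
const PAD = 60;      // padding after each scene, in frames

// Total Composition length: outline page plus every padded scene.
function calcFrames(sceneSeconds: number[], fps: number): number {
  let total = OUTLINE;
  for (const seconds of sceneSeconds) {
    total += Math.ceil(seconds * fps) + PAD;
  }
  return total;
}

// Two scenes of 1.5 s and 2.25 s at 60 fps:
// 240 + (90 + 60) + (135 + 60) = 585 frames
const frames = calcFrames([1.5, 2.25], 60);
```

Because OUTLINE and PAD are frame counts, a 30 fps project that keeps them unchanged gets an outline page and padding twice as long in seconds.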

Step 3: Generate Compositions

For each chapter create src/compositions/chXX-slug/VideoComposition.tsx.

Structure Pattern (pseudo-code)

AbsoluteFill(bg)
  ChapterAudioLayer(chapter="chXX")    // AUDIO LAYER - always first
  Sequence(outline, 240 frames)         // Outline page
  Sequence(scene1, audio-synced frames) // Content scene 1
  Sequence(scene2, audio-synced frames) // Content scene 2
  ...

Animation Rules

ALWAYS use useCurrentFrame() + interpolate() or spring() for animations
NEVER use CSS transitions or animations
ALWAYS add extrapolateLeft: "clamp", extrapolateRight: "clamp" to interpolate
Use spring() for entrances (cards, text), interpolate() for fades/slides
Stagger items: spring({ frame: f - (baseDelay + i * gap), ... })
Last scene: add fadeOut via interpolate(f, [end - 60, end], [1, 0], { extrapolateLeft: "clamp", extrapolateRight: "clamp" })
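The timing math behind the stagger and fade-out rules, as a plain TypeScript sketch. These helpers are hypothetical stand-ins for illustration: in a Composition the staggered value would be passed to spring() as its frame, and the fade would come from interpolate() with both extrapolate options set to "clamp".

```typescript
// Frame to feed into spring() for the i-th item of a list.
// A negative result means the item has not started animating yet.
function staggeredFrame(frame: number, baseDelay: number, index: number, gap: number): number {
  return frame - (baseDelay + index * gap);
}

// Clamped linear fade from 1 to 0 over the last `length` frames.
// This is a plain re-implementation, not Remotion's interpolate().
function fadeOutOpacity(frame: number, end: number, length: number = 60): number {
  const progress = (frame - (end - length)) / length;
  return 1 - Math.min(1, Math.max(0, progress));
}
```

Without the clamp, frames before the fade window would produce an opacity above 1 and frames past the end would go negative, which is why the clamp rule applies to the fade as well.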

Constants-First Design

All editable values at the top of each file:

const COPY = { title, points: [...] }

Step 4: Polish - Plain Language and Transitions

This is the MOST CRITICAL quality step.

Plain Language Rules

Start from daily life, not from math - open with a relatable scenario
Connect to previous chapter - "Last time we learned [X], but what if [Y]?"
Preview next chapter - "Next: [concept name]"
Every abstract concept needs a concrete analogy
No jargon without immediate plain-language translation
Remove section numbers - no "Section X", just titles

Transition Chain

Map the narrative flow before generating code:

Ch01 -> "AI is everywhere" -> introduces linear algebra
Ch02 -> "one person = vector" -> introduces matrix
Ch03 -> "whole class = matrix" -> introduces matrix multiply
...each chapter hooks into the next

Sync Article Updates

When video content is polished, ALSO update the source .md to match.

Article and video should tell the same story with the same metaphors.

Step 5: Voiceover Generation

Write Narration Scripts

For each scene write a spoken narration that:

Matches visual content timing
Uses conversational spoken Chinese (not written style)
Has natural pauses via punctuation
Reads formulas the way they are spoken in Chinese: "A cheng B" (A 乘 B, "A times B"), not "A dot B"

Generate TTS Audio

Tool: edge-tts (a free Python package that uses Microsoft Edge's online text-to-speech service). Install: pip install edge-tts

Use the generate-voiceover.sh script bundled with this skill as a template.

Key command per scene:

edge-tts --voice zh-CN-YunyangNeural --rate=-5% \
  --text "narration text" --write-media public/voiceover/ch01/scene1.mp3

Get duration:

ffprobe -v quiet -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 file.mp3

Save Duration Data

Write all durations to src/voiceover-durations.json:

{ "ch01": { "scene1": 13.2, "scene2": 17.9 }, "ch02": { ... } }

Root.tsx reads this file to auto-compute durationInFrames per Composition.

ChapterAudioLayer reads it to position Audio Sequences correctly.
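A sketch of how ffprobe output could be collected into that JSON shape. The names parseDuration and addDuration are hypothetical, and the sketch assumes ffprobe prints a single decimal number of seconds such as "13.200000" for each file; running ffprobe and writing the file are left out.

```typescript
type VoiceoverDurations = Record<string, Record<string, number>>;

// Turn ffprobe stdout (for example "13.200000\n") into seconds.
// Throws on anything that is not a positive number, such as "N/A".
function parseDuration(stdout: string): number {
  const seconds = parseFloat(stdout.trim());
  if (!isFinite(seconds) || seconds <= 0) {
    throw new Error("Unexpected ffprobe output: " + JSON.stringify(stdout));
  }
  return seconds;
}

// Record one scene's duration under its chapter.
function addDuration(all: VoiceoverDurations, chapter: string, scene: string, stdout: string): void {
  if (!all[chapter]) all[chapter] = {};
  all[chapter][scene] = parseDuration(stdout);
}

const durations: VoiceoverDurations = {};
addDuration(durations, "ch01", "scene1", "13.200000\n");
addDuration(durations, "ch01", "scene2", "17.900000\n");
// JSON.stringify(durations) now has the shape shown above.
```

Failing loudly on unreadable output is a deliberate choice here: a silent zero would shorten the scene and desynchronize everything after it.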

Embed Audio

ChapterAudioLayer handles this automatically.

Ensure every Composition has this as the first child of AbsoluteFill:

ChapterAudioLayer(chapter="chXX")

Step 6: Render and Merge

Render with Resolution Choice

# 1080p (default)
npx remotion render src/index.ts ch01-slug output/ch01.mp4

# 2K
npx remotion render src/index.ts ch01-slug output/ch01.mp4 --scale=1.33

# 4K
npx remotion render src/index.ts ch01-slug output/ch01.mp4 --scale=2

CRITICAL REMINDER: Never change Composition width/height for higher resolution.

Always use --scale flag. Layout stays 1920x1080, output scales up.

Batch Render

SCALE_FLAG=""  # or "--scale=2" for 4K
# $SCALE_FLAG stays unquoted so an empty value adds no argument
for id in ch01-slug ch02-slug ...; do
  npx remotion render src/index.ts "$id" "output/$id.mp4" $SCALE_FLAG
done
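If the batch is driven from a Node script instead of the shell, the argument list can be built the same way. This is a sketch with a hypothetical helper name (renderArgs); it only assembles the arguments shown in the commands above and does not launch anything.

```typescript
// Arguments for `npx`, one render per Composition id.
// The --scale flag is added only when a scale other than 1 is requested.
function renderArgs(compositionId: string, scale?: number): string[] {
  const args = [
    "remotion",
    "render",
    "src/index.ts",
    compositionId,
    `output/${compositionId}.mp4`,
  ];
  if (scale !== undefined && scale !== 1) {
    args.push(`--scale=${scale}`);
  }
  return args;
}

// Hypothetical ids; replace with the Composition ids registered in Root.tsx.
const jobs = ["ch01-slug", "ch02-slug"].map((id) => renderArgs(id, 2));
```

Each entry in jobs can then be handed to a process runner as the argument array for npx, which avoids shell quoting issues with unusual ids.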

Merge into Final Video

for f in output/ch*.mp4; do echo "file '$PWD/$f'"; done > output/filelist.txt
ffmpeg -f concat -safe 0 -i output/filelist.txt -c copy output/full-video.mp4
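The file list can also be generated in code. A sketch with a hypothetical helper name (concatList), assuming absolute paths and zero-padded chapter numbers so that a plain sort gives playback order; a single quote inside a path is escaped as '\'' for the concat list format.

```typescript
// Build the contents of filelist.txt for `ffmpeg -f concat`.
function concatList(paths: string[]): string {
  return paths
    .slice()  // do not reorder the caller's array
    .sort()   // ch01 < ch02 < ... < ch10 thanks to zero padding
    .map((p) => "file '" + p.replace(/'/g, "'\\''") + "'")
    .join("\n") + "\n";
}

const list = concatList(["/videos/output/ch02.mp4", "/videos/output/ch01.mp4"]);
// file '/videos/output/ch01.mp4'
// file '/videos/output/ch02.mp4'
```

Stream copy (-c copy) only works cleanly when every chapter was rendered with the same codec, resolution, and frame rate, so render all chapters with the same scale flag before merging.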

Common Pitfalls (from production experience)

Circular dependency: "Cannot access 'COLORS' before initialization"

NEVER export COLORS from VideoComposition.tsx if Scene files import from it.

Always put shared constants in a separate constants.ts file.

Chinese quotes in JSX strings

Chinese curly quotes inside JS string literals cause esbuild parse errors.

Always use straight quotes or remove decorative quotes.
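A quick pre-render check can flag the offending lines. This is a sketch with a hypothetical helper name (linesWithCurlyQuotes); it reports every line containing a curly single or double quote and leaves the decision to replace or remove them to the author.

```typescript
// U+2018, U+2019 (curly single) and U+201C, U+201D (curly double)
const CURLY_QUOTES = /[\u2018\u2019\u201C\u201D]/;

// Returns 1-based line numbers that contain a curly quote.
function linesWithCurlyQuotes(source: string): number[] {
  const hits: number[] = [];
  source.split(/\r?\n/).forEach((line, i) => {
    if (CURLY_QUOTES.test(line)) hits.push(i + 1);
  });
  return hits;
}

const sample = 'const a = "ok";\nconst b = "\u201C向量\u201D";';
const flagged = linesWithCurlyQuotes(sample); // line 2 is flagged
```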

Silent video (no audio)

Audio does not embed automatically. You must explicitly add the ChapterAudioLayer component inside each Composition's AbsoluteFill.

Tiny text at 4K

Never set Composition to 3840x2160. All CSS is designed for 1920x1080.

Use --scale=2 at render time instead. This was the single biggest gotcha.

voiceover-durations.json incomplete

After generating TTS, verify ALL chapters have entries in the JSON.

Use ffprobe to get accurate durations. Missing chapters get a fallback of 600 frames per scene, which causes audio/visual desync.
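The completeness check is easy to automate. A sketch with a hypothetical helper name (missingChapters), assuming the expected chapter ids come from the chapter plan in Step 1; a chapter counts as missing when it has no entry or an entry with no scenes.

```typescript
type VoiceoverDurations = Record<string, Record<string, number>>;

// Chapters from the plan that have no usable entry in the durations JSON.
function missingChapters(expected: string[], durations: VoiceoverDurations): string[] {
  return expected.filter((chapter) => {
    const scenes = durations[chapter];
    return !scenes || Object.keys(scenes).length === 0;
  });
}

const missing = missingChapters(
  ["ch01", "ch02", "ch03"],
  { ch01: { scene1: 13.2, scene2: 17.9 }, ch03: {} }
);
// missing is ["ch02", "ch03"]
```

Running this before rendering and stopping on a non-empty result avoids discovering the fallback timing only after a long render.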

Skills Used in Pipeline

Phase	Skill	Purpose
-------	-------	---------
Setup	environment-setup	Node.js, FFmpeg, Remotion
Code	remotion-best-practices	Animation quality rules
Code	scene-planner	Storyboard (optional)

File Structure

remotion-videos/
  scripts/
    generate-voiceover.sh   # TTS batch generation
    sync-durations.js       # Audio to frame sync
  public/voiceover/chXX/    # MP3 files per chapter
  src/
    index.ts                # Entry point
    Root.tsx                 # All Compositions (1920x1080 layout)
    voiceover-durations.json # Audio duration data
    shared/
      constants.ts          # Colors, titles, specs
      OutlinePage.tsx        # TOC transition page
      ChapterAudioLayer.tsx  # Audio overlay component
    compositions/
      ch01-slug/VideoComposition.tsx
      ch02-slug/VideoComposition.tsx
      ...
  output/                   # Rendered MP4s
  remotion.config.ts
  tsconfig.json
  package.json

Version History

v1.0.0 Initial release (current), 2026-04-08 23:26
