md-to-video-course

This skill converts a Markdown document into a series of narrated video lessons using Remotion. Use it when the user provides a Markdown (.md) article and wants to generate educational videos with voiceover narration. The skill handles the full pipeline from document analysis to final MP4 rendering, including TTS voiceover via edge-tts. Trigger phrases include "generate video from markdown", "turn this article into videos", and "make a video course from this document".
Author: yakargao
Version: v1.0.0 (community). API key: not required.

MD to Video Course

Convert a Markdown document into a series of narrated, animated video lessons with voiceover.

Step 0: Confirm Specs

Before any work, ask the user to confirm:

  1. Resolution — 1080p / 2K / 4K (see Resolution section below)
  2. Voice gender — Male or Female
  3. FPS — 30 or 60

IMPORTANT: The Remotion Composition always uses 1920x1080 as the layout size.

Higher resolutions are achieved via the --scale render flag, NOT by changing width/height.

This is the single most important lesson from production experience.

Resolution Options

| Label | Layout Size | Render Flag | Output Resolution |
| --- | --- | --- | --- |
| 1080p | 1920x1080 | (none) | 1920x1080 |
| 2K | 1920x1080 | --scale=1.33 | 2560x1440 |
| 4K | 1920x1080 | --scale=2 | 3840x2160 |

CRITICAL: Never set Composition width/height to 3840x2160 directly.

All CSS pixel values (fontSize, padding, gap, width) are designed for 1920x1080.

Setting the Composition to 4K makes everything appear tiny.

Instead, use --scale=2 at render time to upscale without affecting layout.
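
To sanity-check a scale factor before rendering, the arithmetic behind the resolution table can be sketched as below. `scaledOutput` is a hypothetical helper, not part of Remotion; it only multiplies the fixed 1920x1080 layout by the factor. One thing it makes visible: 1.33 is a rounded form of 4/3, and 1920 × 1.33 = 2553.6, so that factor lands slightly short of 2560x1440. How the renderer treats a fractional pixel size is not covered here, so inspect the output file if an exact 2560x1440 matters.

```typescript
// Hypothetical helper: predicts the output size for a given --scale factor.
// The layout size is the fixed 1920x1080 Composition size described above.
interface OutputSize {
  width: number;
  height: number;
  exact: boolean; // true when the scaled size is a whole number of pixels
}

function scaledOutput(scale: number, layoutWidth = 1920, layoutHeight = 1080): OutputSize {
  const rawWidth = layoutWidth * scale;
  const rawHeight = layoutHeight * scale;
  const width = Math.round(rawWidth);
  const height = Math.round(rawHeight);
  // Small tolerance so that factors like 4/3 are not rejected for float noise.
  const exact = Math.abs(rawWidth - width) < 1e-6 && Math.abs(rawHeight - height) < 1e-6;
  return { width, height, exact };
}

console.log(scaledOutput(2));     // 4K row of the table
console.log(scaledOutput(1.33));  // 2K row as written: not a whole-pixel size
console.log(scaledOutput(4 / 3)); // the unrounded 2K factor
```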

Voice Options (edge-tts)

| Voice ID | Gender | Style | Best For |
| --- | --- | --- | --- |
| zh-CN-YunyangNeural | Male | Professional | Tutorials, lectures |
| zh-CN-YunxiNeural | Male | Lively | Casual, vlogs |
| zh-CN-XiaoxiaoNeural | Female | Warm | Storytelling |
| zh-CN-XiaoyiNeural | Female | Broadcast | News-style |

Step 1: Analyze the Markdown

Read the target .md file and extract:

  • Chapter structure: each ## heading becomes one video (one Remotion Composition)
  • Key concepts per chapter: definitions, formulas, examples, metaphors
  • Logical flow: how chapters connect (for transition text between chapters)

Output a chapter plan:

Ch01: [title] - [key points] - connects to next via [hook]
Ch02: [title] - [key points] - connects to next via [hook]
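
The heading-to-chapter mapping could be sketched roughly as follows. This is a minimal illustration, not the skill's own parser: `planChapters` and `ChapterPlanEntry` are hypothetical names, and the sketch only splits on `## ` headings (ignoring ones inside fenced code). Extracting key points and hooks from each body is left to the analysis step.

```typescript
interface ChapterPlanEntry {
  id: string;     // "ch01", "ch02", ...
  title: string;  // text of the "## " heading
  body: string[]; // raw lines under the heading, for picking key points later
}

// Code-fence marker, built from single backticks to keep this sample fence-safe.
const FENCE = "`" + "`" + "`";

function planChapters(markdown: string): ChapterPlanEntry[] {
  const chapters: ChapterPlanEntry[] = [];
  let current: ChapterPlanEntry | undefined;
  let inFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (line.replace(/^\s+/, "").indexOf(FENCE) === 0) inFence = !inFence;
    // Exactly two '#' followed by whitespace: "###" and deeper do not match.
    const match = inFence ? null : /^##\s+(.+?)\s*$/.exec(line);
    const title = match ? match[1] : undefined;
    if (title !== undefined) {
      const n = chapters.length + 1;
      current = { id: "ch" + (n < 10 ? "0" + n : String(n)), title, body: [] };
      chapters.push(current);
    } else if (current) {
      current.body.push(line);
    }
  }
  return chapters;
}
```

Each returned entry corresponds to one line of the chapter plan and, later, to one Remotion Composition.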

Step 2: Scaffold the Remotion Project

Invoke the environment-setup skill or manually set up:

mkdir -p remotion-videos && cd remotion-videos
npm init -y
npm install remotion @remotion/cli react react-dom @remotion/google-fonts typescript @types/react
mkdir -p src/shared src/compositions public/voiceover public/fonts output

Create these configuration files from the assets/ directory bundled with this skill:

  • remotion.config.ts (from assets/)
  • src/index.ts (from assets/)
  • tsconfig.json (standard React JSX config)
  • package.json scripts: dev, render, build

Shared Components to Create

src/shared/constants.ts

Global design tokens and chapter metadata:

COLORS = { bg, primary, secondary, accent, warning, text, textMuted, cardBg, codeBg, ... }
SPEC = { WIDTH: 1920, HEIGHT: 1080, FPS: 60 }
CHAPTER_TITLES = [...] // one per chapter from Step 1
CHAPTER_COLORS = [...] // accent color per chapter

src/shared/OutlinePage.tsx

A table-of-contents transition page shown at the START of every chapter.

  • Display all chapter titles in two columns
  • Highlight the current chapter with colored border + scale animation
  • Gray out + strikethrough completed chapters
  • Dim upcoming chapters
  • Duration: 240 frames (4s at 60fps), fade-out at end
  • See references/shared-components.md for full template code

src/shared/ChapterAudioLayer.tsx

CRITICAL COMPONENT for voiceover integration.

This component reads voiceover-durations.json and automatically overlays Audio Sequence elements for each scene in a chapter. It sits at the Composition level (inside AbsoluteFill), alongside the visual Sequences.

// Pseudo-code structure:
ChapterAudioLayer({ chapter: "ch01" })
  reads durations from voiceover-durations.json
  for each scene: renders Sequence(from=offset, duration=audioDur+pad)
    containing Audio(src=staticFile("voiceover/ch01/scene1.mp3"))

This is the component that makes voiceover work. Without it, videos render silent.
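
The timing part of the pseudo-code above can be illustrated without any React code. The sketch below is one possible reading, under two assumptions not stated in the source: audio starts right after the 240-frame outline page, and scenes play in the key order of the JSON object. `placeChapterAudio` and `AudioPlacement` are hypothetical names; a real component would map each entry to a Sequence wrapping an Audio element.

```typescript
// Shape of voiceover-durations.json for one chapter: scene name -> seconds.
type SceneDurations = { [scene: string]: number };

interface AudioPlacement {
  scene: string;
  src: string;              // path relative to public/, as passed to staticFile()
  from: number;             // first frame of the Sequence
  durationInFrames: number; // audio length in frames plus padding
}

function placeChapterAudio(
  chapter: string,
  durations: SceneDurations,
  fps = 60,
  outlineFrames = 240, // assumption: narration begins after the outline page
  padFrames = 60
): AudioPlacement[] {
  const placements: AudioPlacement[] = [];
  let offset = outlineFrames;
  for (const scene of Object.keys(durations)) {
    const seconds = durations[scene] ?? 0;
    const durationInFrames = Math.ceil(seconds * fps) + padFrames;
    placements.push({
      scene,
      src: `voiceover/${chapter}/${scene}.mp3`,
      from: offset,
      durationInFrames,
    });
    offset += durationInFrames; // next scene starts where this one ends
  }
  return placements;
}
```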

src/Root.tsx

Register all Compositions. Key patterns:

  • Layout is ALWAYS 1920x1080 (never change this for higher resolution)
  • Import voiceover-durations.json to auto-calculate durationInFrames
  • calcFrames function: OUTLINE(240) + sum of (ceil(audioDur * FPS) + PAD(60)) per scene
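
As a rough sketch, the calcFrames formula could look like this. The constant names are illustrative, and the values assume the 60 fps setup used throughout this document. Unlike the 600-frame fallback mentioned under Common Pitfalls, this version throws when a chapter is missing, which is a deliberate swap so that gaps surface early.

```typescript
const FPS = 60;
const OUTLINE_FRAMES = 240; // outline page shown at the start of every chapter
const PAD_FRAMES = 60;      // breathing room after each scene's narration

type VoiceoverDurations = { [chapter: string]: { [scene: string]: number } };

function calcFrames(durations: VoiceoverDurations, chapter: string): number {
  const scenes = durations[chapter];
  if (scenes === undefined) {
    // Assumption: fail loudly here rather than silently falling back.
    throw new Error("No voiceover durations for " + chapter);
  }
  let total = OUTLINE_FRAMES;
  for (const scene of Object.keys(scenes)) {
    total += Math.ceil((scenes[scene] ?? 0) * FPS) + PAD_FRAMES;
  }
  return total;
}

// Example with made-up durations: 240 + (150 + 60) + (61 + 60) frames.
const exampleDurations: VoiceoverDurations = { ch01: { scene1: 2.5, scene2: 1.01 } };
console.log(calcFrames(exampleDurations, "ch01"));
```

The result is what each Composition would receive as durationInFrames.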

Step 3: Generate Compositions

For each chapter create src/compositions/chXX-slug/VideoComposition.tsx.

Structure Pattern (pseudo-code)

AbsoluteFill(bg)
  ChapterAudioLayer(chapter="chXX")    // AUDIO LAYER - always first
  Sequence(outline, 240 frames)         // Outline page
  Sequence(scene1, audio-synced frames) // Content scene 1
  Sequence(scene2, audio-synced frames) // Content scene 2
  ...

Animation Rules

  • ALWAYS use useCurrentFrame() + interpolate() or spring() for animations
  • NEVER use CSS transitions or animations
  • ALWAYS add extrapolateLeft: "clamp", extrapolateRight: "clamp" to interpolate
  • Use spring() for entrances (cards, text), interpolate() for fades/slides
  • Stagger items: spring({ frame: f - (baseDelay + i * gap), ... })
  • Last scene: add fadeOut via interpolate(f, [endMinus60, end], [1, 0])
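
To make the clamping and stagger rules concrete, here is a dependency-free sketch of the arithmetic involved. `clampedLinear` is a stand-in that mimics a two-point interpolate() call with both extrapolate options set to "clamp"; it is not Remotion's implementation, and all three function names are hypothetical. In a real component the frame would come from useCurrentFrame() and the math from interpolate() and spring().

```typescript
// Stand-in for a two-point interpolate() with extrapolateLeft/Right: "clamp".
function clampedLinear(frame: number, input: [number, number], output: [number, number]): number {
  const [inStart, inEnd] = input;
  const [outStart, outEnd] = output;
  if (inEnd <= inStart) throw new Error("input range must be increasing");
  // Clamp progress to [0, 1] so values never overshoot outside the input range.
  const t = Math.min(1, Math.max(0, (frame - inStart) / (inEnd - inStart)));
  return outStart + t * (outEnd - outStart);
}

// Local frame for the i-th staggered item: negative until its delay has passed.
function staggeredFrame(frame: number, index: number, baseDelay: number, gap: number): number {
  return frame - (baseDelay + index * gap);
}

// Fade-out over the last 60 frames of a scene that is `sceneFrames` long.
function fadeOutOpacity(frame: number, sceneFrames: number): number {
  return clampedLinear(frame, [sceneFrames - 60, sceneFrames], [1, 0]);
}
```

Without clamping, the fade-out expression would keep extrapolating past the end of the range and produce opacities below 0, which is why the rule above insists on both options.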

Constants-First Design

All editable values at the top of each file:

const COPY = { title, points: [...] }

Step 4: Polish - Plain Language and Transitions

This is the MOST CRITICAL quality step.

Plain Language Rules

  1. Start from daily life, not from math - open with a relatable scenario
  2. Connect to previous chapter - "Last time we learned [X], but what if [Y]?"
  3. Preview next chapter - "Next: [concept name]"
  4. Every abstract concept needs a concrete analogy
  5. No jargon without immediate plain-language translation
  6. Remove section numbers - no "Section X", just titles

Transition Chain

Map the narrative flow before generating code:

Ch01 -> "AI is everywhere" -> introduces linear algebra
Ch02 -> "one person = vector" -> introduces matrix
Ch03 -> "whole class = matrix" -> introduces matrix multiply
...each chapter hooks into the next

Sync Article Updates

When video content is polished, ALSO update the source .md to match.

Article and video should tell the same story with the same metaphors.

Step 5: Voiceover Generation

Write Narration Scripts

For each scene write a spoken narration that:

  • Matches visual content timing
  • Uses conversational oral Chinese (not written style)
  • Has natural pauses via punctuation
  • Says "A cheng B" (A 乘 B, i.e. "A times B") rather than "A dot B" when reading formulas aloud

Generate TTS Audio

Tool: edge-tts (Microsoft free TTS). Install: pip install edge-tts

Use the generate-voiceover.sh script bundled with this skill as a template.

Key command per scene:

edge-tts --voice zh-CN-YunyangNeural --rate=-5% \
  --text "narration text" --write-media public/voiceover/ch01/scene1.mp3
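
When the per-scene command is driven from a script, building the argument list in code avoids shell-quoting problems with narration text. A minimal sketch, assuming only the flags shown in the command above; `edgeTtsArgs` and `SceneNarration` are hypothetical names.

```typescript
interface SceneNarration {
  chapter: string; // e.g. "ch01"
  scene: string;   // e.g. "scene1"
  text: string;    // spoken narration for this scene
}

// Builds the argument list for one edge-tts invocation (flags as in the command above).
function edgeTtsArgs(
  narration: SceneNarration,
  voice = "zh-CN-YunyangNeural",
  rate = "-5%"
): string[] {
  return [
    "--voice", voice,
    `--rate=${rate}`,
    "--text", narration.text,
    "--write-media", `public/voiceover/${narration.chapter}/${narration.scene}.mp3`,
  ];
}

console.log(edgeTtsArgs({ chapter: "ch01", scene: "scene1", text: "narration text" }).join(" "));
```

Passing the array to something like Node's child_process.execFile("edge-tts", args) keeps punctuation in the narration from being interpreted by a shell.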

Get duration:

ffprobe -v quiet -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 file.mp3

Save Duration Data

Write all durations to src/voiceover-durations.json:

{ "ch01": { "scene1": 13.2, "scene2": 17.9 }, "ch02": { ... } }

Root.tsx reads this file to auto-compute durationInFrames per Composition.

ChapterAudioLayer reads it to position Audio Sequences correctly.
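
One way to get from raw ffprobe output to this JSON shape is sketched below. `recordDuration` is a hypothetical helper; it assumes ffprobe was run with the flags shown earlier, so its stdout is a single number of seconds followed by a newline.

```typescript
type VoiceoverDurations = { [chapter: string]: { [scene: string]: number } };

// Parses one ffprobe result and stores it under durations[chapter][scene].
function recordDuration(
  durations: VoiceoverDurations,
  chapter: string,
  scene: string,
  ffprobeStdout: string
): VoiceoverDurations {
  const seconds = parseFloat(ffprobeStdout.trim());
  if (!isFinite(seconds) || seconds <= 0) {
    throw new Error(`Unusable duration for ${chapter}/${scene}: "${ffprobeStdout.trim()}"`);
  }
  const chapterEntry = durations[chapter] ?? {};
  chapterEntry[scene] = seconds;
  durations[chapter] = chapterEntry;
  return durations;
}

const collected: VoiceoverDurations = {};
recordDuration(collected, "ch01", "scene1", "13.200000\n");
recordDuration(collected, "ch01", "scene2", "17.900000\n");
console.log(JSON.stringify(collected, null, 2)); // content for src/voiceover-durations.json
```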

Embed Audio

ChapterAudioLayer handles this automatically.

Ensure every Composition has this as the first child of AbsoluteFill:

ChapterAudioLayer(chapter="chXX")

Step 6: Render and Merge

Render with Resolution Choice

# 1080p (default)
npx remotion render src/index.ts ch01-slug output/ch01.mp4

# 2K
npx remotion render src/index.ts ch01-slug output/ch01.mp4 --scale=1.33

# 4K
npx remotion render src/index.ts ch01-slug output/ch01.mp4 --scale=2

CRITICAL REMINDER: Never change Composition width/height for higher resolution.

Always use --scale flag. Layout stays 1920x1080, output scales up.
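
If the resolution confirmed in Step 0 is kept as a label, the matching render arguments can be derived from it rather than typed by hand. A small sketch, with `renderArgs` and `SCALE_FLAGS` as hypothetical names and the flag values copied from the commands above:

```typescript
type ResolutionLabel = "1080p" | "2K" | "4K";

// Flag per label, as listed in the render commands above.
const SCALE_FLAGS: Record<ResolutionLabel, string> = {
  "1080p": "",
  "2K": "--scale=1.33",
  "4K": "--scale=2",
};

// Arguments to pass after `npx` for one Composition.
function renderArgs(compositionId: string, label: ResolutionLabel): string[] {
  const args = ["remotion", "render", "src/index.ts", compositionId, `output/${compositionId}.mp4`];
  const flag = SCALE_FLAGS[label];
  if (flag !== "") args.push(flag); // 1080p needs no flag at all
  return args;
}

console.log("npx " + renderArgs("ch01-slug", "4K").join(" "));
```

Keeping the mapping in one place means the Composition's width and height never need to be touched when the target resolution changes.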

Batch Render

SCALE_FLAG=""  # or "--scale=2" for 4K
for id in ch01-slug ch02-slug ...; do
  npx remotion render src/index.ts "$id" "output/$id.mp4" $SCALE_FLAG  # SCALE_FLAG stays unquoted so an empty value expands to nothing
done

Merge into Final Video

for f in output/ch*.mp4; do echo "file '$PWD/$f'"; done > output/filelist.txt
ffmpeg -f concat -safe 0 -i output/filelist.txt -c copy output/full-video.mp4
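
The same file list can be produced from a script, which makes the chapter order explicit. This is a sketch rather than a replacement for the shell loop: `concatFileList` is a hypothetical helper, it assumes zero-padded chapter names so that a plain sort yields playback order, and it escapes single quotes in paths the way ffmpeg's quoting rules expect.

```typescript
// Builds the text of output/filelist.txt for ffmpeg's concat demuxer.
function concatFileList(absolutePaths: string[]): string {
  const sorted = absolutePaths.slice().sort(); // ch01 < ch02 < ... when zero-padded
  const lines: string[] = [];
  for (const p of sorted) {
    // Inside single quotes, a literal ' is written as '\'' (close, escape, reopen).
    const escaped = p.replace(/'/g, "'\\''");
    lines.push("file '" + escaped + "'");
  }
  return lines.join("\n") + "\n";
}

console.log(concatFileList(["/work/output/ch02-matrix.mp4", "/work/output/ch01-intro.mp4"]));
```

Because -c copy does not re-encode, the merge relies on every chapter having been rendered with the same resolution, FPS, and codec settings.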

Common Pitfalls (from production experience)

Circular dependency: "Cannot access 'COLORS' before initialization"

NEVER export COLORS from VideoComposition.tsx if Scene files import from it.

Always put shared constants in a separate constants.ts file.

Chinese quotes in JSX strings

Chinese curly quotes inside JS string literals cause esbuild parse errors.

Always use straight quotes or remove decorative quotes.
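
Both options can be applied mechanically to COPY text before it is pasted into a composition file. A minimal sketch with hypothetical helper names; the curly quotes are written as Unicode escapes so the sample itself stays free of them.

```typescript
// Replace curly quotes with their straight equivalents.
function straightenQuotes(text: string): string {
  return text
    .replace(/[\u201C\u201D]/g, '"')  // curly double quotes
    .replace(/[\u2018\u2019]/g, "'"); // curly single quotes
}

// Or drop decorative curly quotes entirely.
function stripCurlyQuotes(text: string): string {
  return text.replace(/[\u201C\u201D\u2018\u2019]/g, "");
}

const sample = "one person = \u201Cvector\u201D";
console.log(straightenQuotes(sample));
console.log(stripCurlyQuotes(sample));
```

If straightened text ends up inside a string literal that uses the same quote character, it still needs escaping; emitting the literal with JSON.stringify is one way to handle that.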

Silent video (no audio)

Audio does not embed automatically. You must explicitly add the ChapterAudioLayer component inside each Composition's AbsoluteFill.

Tiny text at 4K

Never set Composition to 3840x2160. All CSS is designed for 1920x1080.

Use --scale=2 at render time instead. This was the single biggest gotcha.

voiceover-durations.json incomplete

After generating TTS, verify that ALL chapters have entries in the JSON. Use ffprobe to get accurate durations. Missing chapters get a fallback of 600 frames per scene, which causes audio/visual desync.
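
The completeness check can be automated before rendering. A small sketch, assuming the chapter ids from the chapter plan are available as a list; `findMissingChapters` is a hypothetical helper.

```typescript
type VoiceoverDurations = { [chapter: string]: { [scene: string]: number } };

// Returns the chapter ids that have no scenes recorded in voiceover-durations.json.
function findMissingChapters(chapterIds: string[], durations: VoiceoverDurations): string[] {
  const missing: string[] = [];
  for (const id of chapterIds) {
    const scenes = durations[id];
    if (scenes === undefined || Object.keys(scenes).length === 0) {
      missing.push(id);
    }
  }
  return missing;
}

const gaps = findMissingChapters(["ch01", "ch02", "ch03"], { ch01: { scene1: 13.2 }, ch03: {} });
if (gaps.length > 0) {
  console.log("Missing voiceover durations for: " + gaps.join(", "));
}
```

Running a check like this right after TTS generation catches gaps before any chapter is rendered with mismatched timing.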

Skills Used in Pipeline

| Phase | Skill | Purpose |
| --- | --- | --- |
| Setup | environment-setup | Node.js, FFmpeg, Remotion |
| Code | remotion-best-practices | Animation quality rules |
| Code | scene-planner | Storyboard (optional) |

File Structure

remotion-videos/
  scripts/
    generate-voiceover.sh   # TTS batch generation
    sync-durations.js       # Audio to frame sync
  public/voiceover/chXX/    # MP3 files per chapter
  src/
    index.ts                # Entry point
    Root.tsx                 # All Compositions (1920x1080 layout)
    voiceover-durations.json # Audio duration data
    shared/
      constants.ts          # Colors, titles, specs
      OutlinePage.tsx        # TOC transition page
      ChapterAudioLayer.tsx  # Audio overlay component
    compositions/
      ch01-slug/VideoComposition.tsx
      ch02-slug/VideoComposition.tsx
      ...
  output/                   # Rendered MP4s
  remotion.config.ts
  tsconfig.json
  package.json

Version History

1 version in total

  • v1.0.0 Initial release (current)
    2026-04-08 23:26
