概述

Video Post-Production

Use this skill when the user gives you one raw video and wants a finished

short video with:

readable subtitles
highlighted key phrases
basic SFX emphasis
optional BGM
a rendered export

The default workflow is single-input automation: the user provides the

video, and you handle the rest.

Inputs

Required:

one raw video file with spoken audio

Optional:

preferred subtitle color/style
preferred BGM mood
preferred SFX intensity
existing BGM file
existing SFX directory

If optional inputs are missing, proceed with defaults.

Outputs

Create a working directory next to the input video:

_output/

Expected output files:

alignment.json
production_plan.json
subtitles.ass
final.mp4

Default behavior

If the user does not specify a style:

subtitles: bold, high-contrast, mobile-readable
highlights: only on high-value phrases
SFX: light to medium density
BGM: optional, low-volume, ducked under speech

Do not block on missing BGM or SFX. A valid delivery can still be:

subtitles only
subtitles + generated simple SFX
subtitles + BGM

Workflow

1. Check prerequisites

Confirm these are available:

ffmpeg
ffprobe
python3
faster-whisper if transcription is needed
a Chinese-capable font for ASS rendering

Use:

ffmpeg -version | head -1
ffprobe -version | head -1
python3 -c "from faster_whisper import WhisperModel; print('faster_whisper OK')" 2>/dev/null || echo "Need: pip3 install faster-whisper"

If faster-whisper is missing:

pip3 install faster-whisper

2. Transcribe the video

Run:

python3 <skill-path>/scripts/align_speech.py \
  --video "<input_video>" \
  --output "<workdir>/alignment.json" \
  --model "medium" \
  --language "zh"

This produces word-level timing and segment-level timing.

3. Build a post-production plan

Read alignment.json and produce production_plan.json.

The plan must contain:

subtitle_groups
sfx
bgm

Minimum shape:

{
  "subtitle_groups": [],
  "sfx": [],
  "bgm": {
    "mood": "inspirational",
    "tempo": "medium",
    "volume": -18
  }
}

Planning rules

Subtitle grouping:

group into natural spoken phrases
do not split a phrase awkwardly
prefer 3-8 Chinese characters per phrase
prefer 0.5s or longer readable display windows
keep within mobile reading comfort

Keyword highlighting:

highlight emotional, contrastive, numeric, warning, CTA, and payoff phrases
do not over-highlight
default target: 1-3 highlighted phrases per 10 seconds

SFX placement:

opening transitions
key claims
warnings
CTA moments
major topic pivots

Do not place SFX on every subtitle. Keep it sparse enough to feel intentional.

BGM choice:

educational / explainer -> uplifting / neutral
warning / risk -> tense / restrained
positive / benefit-driven -> hopeful / energetic

4. Generate ASS subtitles

Run:

python3 <skill-path>/scripts/generate_subtitles.py \
  --plan "<workdir>/production_plan.json" \
  --output "<workdir>/subtitles.ass" \
  --video-width 720 \
  --video-height 1280 \
  --font "Heiti SC" \
  --font-size 48

If you need deeper ASS styling guidance, read:

/references/ass_effects.md

5. Prepare audio

BGM

Preferred order:

user-provided BGM
existing local BGM in workspace
proceed without BGM

If BGM exists, prepare it:

ffmpeg -stream_loop -1 -i "<bgm_file>" -t <video_duration> \
  -af "volume=-18dB,afade=t=in:d=2,afade=t=out:st=<duration-3>:d=3" \
  -y "<workdir>/bgm_prepared.wav"

SFX

Preferred order:

user-provided SFX directory
existing local SFX directory
generate simple fallback tones/noise

If no SFX files are available, read:

/references/audio_resources.md

6. Render final video

Run:

python3 <skill-path>/scripts/render_video.py \
  --input "<input_video>" \
  --subtitles "<workdir>/subtitles.ass" \
  --plan "<workdir>/production_plan.json" \
  --bgm "<workdir>/bgm_prepared.wav" \
  --sfx-dir "<workdir>/sfx" \
  --output "<workdir>/final.mp4" \
  --resolution "720x1280"

If no BGM exists, omit --bgm.

If no SFX directory exists, omit --sfx-dir.

Quality bar

The result should feel like a polished short-form knowledge video:

subtitles are easy to read on mobile
highlighted words are meaningful, not noisy
SFX emphasize real rhetorical beats
BGM stays below speech and never masks intelligibility
final export is usable without extra manual cleanup

Fallback rules

If the raw video is usable but supporting resources are missing:

still deliver subtitles and a rendered video
note what was skipped
do not block the whole workflow waiting for perfect assets

If transcription quality is poor:

keep subtitle grouping conservative
prefer fewer highlights
avoid aggressive SFX timing

Bundled resources

Scripts:

scripts/align_speech.py
scripts/generate_subtitles.py
scripts/render_video.py

References:

references/ass_effects.md
references/audio_resources.md

Final delivery checklist

Before finishing, ensure:

alignment.json exists
production_plan.json exists
subtitles.ass exists
final video renders successfully
subtitles are visible
speech remains clear
SFX and BGM are not overpowering

版本历史

共 1 个版本

v1.0.0 当前

2026-05-07 06:49 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)

安全，无风险

查看报告

Video Post Production

概述

Video Post-Production

Inputs

Outputs

Default behavior

Workflow

1. Check prerequisites

2. Transcribe the video

3. Build a post-production plan

Planning rules

4. Generate ASS subtitles

5. Prepare audio

BGM

SFX

6. Render final video

Quality bar

Fallback rules

Bundled resources

Final delivery checklist

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

Openai Whisper

Nano Banana Pro

Video Frames