Create a complete illustrated story video: AI-generated images + TTS narration + FFmpeg assembly.
⚠️ Common mistakes to avoid:
✅ Correct approach:
Trigger: User provides a story idea or narrative.
Save the story to the state file, then ask the user for an image style.
Trigger: User provides image style.
Split the story into N segments (roughly 80-150 characters each), then output:
Total narration length: XXX characters
Estimated speech duration: about XX seconds (X min X sec)
Suggested number of images: N
Ask for confirmation.
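A minimal sketch of the segmentation step, assuming the story is split at sentence-ending punctuation and whole sentences are packed greedily up to the 150-character ceiling. The function names and the speaking rate of 4 characters per second are illustrative assumptions, not values defined by this skill; tune the rate to the chosen voice.

```python
import re

# Assumed speaking rate; the skill does not specify one. Adjust per voice.
CHARS_PER_SECOND = 4.0


def split_story(story: str, max_len: int = 150) -> list[str]:
    """Greedily pack whole sentences into segments of at most max_len characters."""
    # Split after Chinese or Western sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[。!?.!?])", story) if s.strip()]
    segments, current = [], ""
    for sentence in sentences:
        # Close the current segment if this sentence would push it past the limit.
        if current and len(current) + len(sentence) > max_len:
            segments.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        segments.append(current.strip())
    # A single sentence longer than max_len is kept whole rather than cut mid-sentence.
    return segments


def summarize(segments: list[str]) -> str:
    """Build the three-line summary shown to the user before confirmation."""
    total = sum(len(s) for s in segments)
    seconds = round(total / CHARS_PER_SECOND)
    minutes, rest = divmod(seconds, 60)
    return (
        f"Total narration length: {total} characters\n"
        f"Estimated speech duration: about {seconds} seconds ({minutes} min {rest} sec)\n"
        f"Suggested number of images: {len(segments)}"
    )
```

Greedy packing keeps sentence boundaries intact, so a segment may fall below 80 characters when the next sentence would not fit; merging short tails is left out for brevity.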
Trigger: User confirms or adjusts.
Trigger: User confirms.
Execute pipeline per segment, sequentially:
for each segment i (0 to N-1):
1. Generate TTS: mmx speech synthesize --text "segment_i" --out seg_i.mp3
2. Generate image: mmx image generate --prompt "style, segment_i scene" --out-dir images/ --out-prefix i
Then assemble each segment into its own video clip and concatenate all clips.
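The per-segment loop above could be driven from Python roughly as follows. This is a sketch, not the skill's own implementation: it only uses the mmx flags that appear in this document, and the helper names, the zero-padded file naming, and the choice to reuse the segment text as the scene description are assumptions.

```python
import subprocess
from pathlib import Path


def tts_command(text: str, index: int, out_dir: str, voice: str = "male-qn-badao") -> list[str]:
    """Build the mmx TTS command for one segment (flags as listed in this document)."""
    out_file = Path(out_dir) / f"seg_{index:02d}.mp3"
    return ["mmx", "speech", "synthesize", "--text", text,
            "--out", str(out_file), "--voice", voice, "--non-interactive"]


def image_command(style: str, scene: str, index: int, images_dir: str) -> list[str]:
    """Build the mmx image command for one segment."""
    return ["mmx", "image", "generate", "--prompt", f"{style}, {scene}",
            "--aspect-ratio", "16:9", "--out-dir", str(images_dir),
            "--out-prefix", f"{index:02d}", "--non-interactive"]


def run_pipeline(segments: list[str], style: str, out_dir: str) -> None:
    """Generate audio and an image for each segment, one segment at a time."""
    images_dir = Path(out_dir) / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    for i, text in enumerate(segments):
        # check=True stops the run at the first failed mmx call.
        subprocess.run(tts_command(text, i, out_dir), check=True)
        # Assumption: the segment text doubles as the scene description.
        subprocess.run(image_command(style, text, i, str(images_dir)), check=True)
```

Passing arguments as a list avoids shell quoting problems when the narration contains quotes or other special characters.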
Always use Chinese numerals for generations and years in narration:
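As an illustration, the rule can be read as: spell years digit by digit and write generation counts as ordinary Chinese numbers. That reading is an assumption, and the helper names below are hypothetical; the sketch only covers counts from 1 to 99.

```python
DIGITS = "〇一二三四五六七八九"


def year_to_chinese(year: int) -> str:
    """Spell a year digit by digit, e.g. for narration text."""
    return "".join(DIGITS[int(d)] for d in str(year))


def count_to_chinese(n: int) -> str:
    """Write a small count (1-99), such as a generation number, in Chinese numerals."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only handles 1 to 99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    # "十" alone stands for ten, so a leading "一" is dropped for 10-19.
    prefix = "" if tens == 1 else DIGITS[tens]
    return prefix + "十" + (DIGITS[ones] if ones else "")
```

Applied before synthesis, this would turn "1949" into "一九四九" and a 13th generation into "第十三代" when prefixed and suffixed by the caller.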
See scripts/make_video.py — it handles per-segment clip creation and concatenation automatically.
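For orientation, the assembly step could be built from standard FFmpeg invocations like the ones below. This is a sketch under stated assumptions, not the contents of scripts/make_video.py: the function names are hypothetical, and it assumes every clip is encoded with the same settings and resolution so the final concatenation can copy streams without re-encoding.

```python
def clip_command(image: str, audio: str, clip: str) -> list[str]:
    """Loop one still image for the duration of one narration track."""
    return ["ffmpeg", "-y",
            "-loop", "1", "-i", image,   # repeat the still image as video input
            "-i", audio,
            "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
            "-c:a", "aac",
            "-shortest",                 # stop when the audio ends
            clip]


def concat_list(clips: list[str]) -> str:
    """Text of a concat-demuxer list file, one clip per line, in playback order."""
    # Relative paths are resolved against the list file's directory.
    # Paths containing single quotes would need escaping; not handled here.
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_command(list_file: str, output: str) -> list[str]:
    """Join the clips named in list_file without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]
```

Building one clip per segment keeps each image on screen exactly as long as its narration, which a single slideshow with fixed per-image durations would not guarantee.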
{
"story": "original story text",
"style": "confirmed style",
"imageCount": 8,
"segments": ["segment 1 text", "segment 2 text", ...],
"outputDir": "/tmp/story-video-XXXX"
}
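One way to persist this state between steps is sketched below. The file name state.json and the helper names are assumptions, since the skill does not say where the state file lives; the output directory follows the /tmp/story-video-XXXX pattern shown above.

```python
import json
import tempfile
from pathlib import Path

STATE_FILENAME = "state.json"  # hypothetical name; not specified by the skill


def new_output_dir() -> str:
    """Create a fresh working directory such as /tmp/story-video-XXXX."""
    return tempfile.mkdtemp(prefix="story-video-")


def save_state(output_dir: str, state: dict) -> Path:
    """Write the state as UTF-8 JSON, keeping Chinese text readable."""
    path = Path(output_dir) / STATE_FILENAME
    path.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_state(output_dir: str) -> dict:
    """Read the state back at the start of the next step."""
    path = Path(output_dir) / STATE_FILENAME
    return json.loads(path.read_text(encoding="utf-8"))
```

Saving after every trigger lets a later step resume from the confirmed story, style, and segments without asking the user again.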
mmx speech synthesize --text "SEGMENT_TEXT" --out seg_00.mp3 --voice male-qn-badao --non-interactive
mmx image generate --prompt "STYLE, scene description" --aspect-ratio 16:9 --out-dir images/ --out-prefix 00 --non-interactive
python3 scripts/make_video.py --segments-dir /path/to/segments --images /path/to/images --output final.mp4
Final video: {outputDir}/story-video.mp4
Use mmx speech voices to list available voices. Recommended for dramatic stories:
male-qn-badao: deep, dramatic male voice
male-qn-jingying: mature male
female-chengshu: mature female