
story-illustrated-video

Create an illustrated story video with voice narration and AI-generated images. Use when user wants to: make a story video, create illustrated video from story, turn a story into a video with pictures and voice, or any request involving making a video from text narration with AI images. Triggers on: make a story video, create illustrated video, turn story into video, 故事视频, 插画视频.
AI搞钱研究室
Uncategorized community v1.0.1 2 versions 98507.5 Key: not required

Overview

Story Illustrated Video Skill

Create a complete illustrated story video: AI-generated images + TTS narration + FFmpeg assembly.

Critical Implementation Notes (Lessons Learned)

⚠️ Common mistakes to avoid:

  • DO NOT generate a single long TTS then split it — API will truncate at ~10k chars and you lose the ending
  • DO NOT use a uniform duration per image — each narration segment has different length; images must match their segment's actual audio duration
  • DO NOT use Arabic numerals like "80后" — TTS reads them as "八十后". Use Chinese numerals: "八零后"
  • DO NOT generate all images in parallel then all audio — causes misalignment between segments and images

Correct approach:

  1. Split story into N segments (≤150 chars each)
  2. Generate TTS for segment → then generate its image, sequentially per segment
  3. Assemble: each segment becomes one video clip (image + its own audio)
  4. Concatenate all clips in order
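Step 1 of the approach above can be sketched as follows. This is a minimal illustration, not the skill's own code: the function name `split_story` and the choice to break on sentence-ending punctuation are assumptions, and it targets Chinese text where sentences are joined without spaces.

```python
import re

MAX_CHARS = 150  # per-segment limit stated in this document


def split_story(story: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Pack whole sentences into segments of at most max_chars characters."""
    # Split after sentence-ending punctuation, keeping the punctuation attached.
    parts = re.split(r"(?<=[。!?;!?;])", story)
    sentences = [p.strip() for p in parts if p.strip()]

    segments: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        segments.append(current)
    return segments
```

Packing whole sentences keeps each narration clip from ending mid-sentence, which matters because every segment later becomes its own audio file and video clip.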

Workflow (Multi-Turn Conversation)

Step 1: Collect Story

Trigger: User provides a story idea or narrative.

Save to state file, then ask for style.

Step 2: Collect Style & Generate Plan

Trigger: User provides image style.

Split story into N segments (~80-150 chars each). Output:

旁白总字数:XXX 字
预估语音时长:约 XX 秒(X 分 X 秒)
建议配图数量:N 张

Ask for confirmation.
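The plan summary shown above could be produced along these lines. The speaking rate of 4 characters per second is an assumption for illustration only (the skill does not state one), and `build_plan` is a hypothetical helper name; the suggested image count simply equals the number of segments.

```python
CHARS_PER_SECOND = 4.0  # assumed speaking rate, not from the skill; tune per voice


def build_plan(segments: list[str], chars_per_second: float = CHARS_PER_SECOND) -> str:
    """Format the plan summary: total characters, estimated duration, image count."""
    total_chars = sum(len(segment) for segment in segments)
    seconds = round(total_chars / chars_per_second)
    minutes, rest = divmod(seconds, 60)
    return (
        f"旁白总字数:{total_chars} 字\n"
        f"预估语音时长:约 {seconds} 秒({minutes} 分 {rest} 秒)\n"
        f"建议配图数量:{len(segments)} 张"
    )
```

The estimate is only for the confirmation prompt; the real clip lengths come from the generated audio files.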

Step 3: Confirm Image Count

Trigger: User confirms or adjusts.

Step 4: Execute (No More Questions)

Trigger: User confirms.

Execute pipeline per segment, sequentially:

for each segment i (0 to N-1):
    1. Generate TTS: mmx speech synthesize --text "segment_i" --out seg_i.mp3
    2. Generate image: mmx image generate --prompt "style, segment_i scene" --out-dir images/ --out-prefix i

Then assemble each segment into its own video clip, concatenate all clips.
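The per-segment loop above might be driven from Python roughly like this. It is a sketch under assumptions: the `mmx` flags are copied from the commands in this document, while the function names, the zero-padded file naming, and the use of the raw segment text as the scene description are illustrative choices.

```python
import subprocess
from pathlib import Path


def segment_commands(index: int, text: str, style: str, out_dir: Path) -> list[list[str]]:
    """Return the TTS command, then the image command, for one segment."""
    tag = f"{index:02d}"  # zero-padded so files sort in story order
    tts = [
        "mmx", "speech", "synthesize",
        "--text", text,
        "--out", str(out_dir / f"seg_{tag}.mp3"),
        "--non-interactive",
    ]
    image = [
        "mmx", "image", "generate",
        # Assumption: the segment text doubles as the scene description.
        "--prompt", f"{style}, {text}",
        "--aspect-ratio", "16:9",
        "--out-dir", str(out_dir / "images"),
        "--out-prefix", tag,
        "--non-interactive",
    ]
    return [tts, image]


def run_pipeline(segments: list[str], style: str, out_dir: Path,
                 dry_run: bool = True) -> list[list[str]]:
    """Process segments strictly in order; with dry_run, only collect the commands."""
    executed: list[list[str]] = []
    for index, text in enumerate(segments):
        for command in segment_commands(index, text, style, out_dir):
            if not dry_run:
                subprocess.run(command, check=True)  # stop at the first failure
            executed.append(command)
    return executed
```

Running with `dry_run=True` first makes it easy to inspect the exact commands before spending API calls.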

Number Formatting Rule

Use Chinese numerals for generational references in narration; years can stay in Arabic numerals:

  • ❌ "80后、90后" → TTS reads "八十后、九十后"
  • ✅ "八零后、九零后"
  • ✅ "1999年" (numbers in years are fine as-is)
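The rule above can be applied automatically before sending text to TTS. The helper below is a hypothetical sketch: it rewrites only two-digit generational labels followed by "后" and leaves years and other numbers untouched.

```python
import re

CHINESE_DIGITS = "零一二三四五六七八九"


def normalize_generations(text: str) -> str:
    """Rewrite labels such as "80后" digit by digit, giving "八零后"."""
    def replace(match: re.Match) -> str:
        digits = "".join(CHINESE_DIGITS[int(d)] for d in match.group(1))
        return digits + "后"

    # The lookbehind keeps longer numbers (for example "1980后") from being half-converted.
    return re.sub(r"(?<![0-9])([0-9]{2})后", replace, text)
```

For example, `normalize_generations("80后、90后")` returns `"八零后、九零后"`.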

Per-Segment Assembly Script

See scripts/make_video.py — it handles per-segment clip creation and concatenation automatically.
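The contents of scripts/make_video.py are not shown here, so the following is only a sketch of how per-segment assembly could be done with FFmpeg, not the script itself. The function names and the image file names are assumptions; the FFmpeg options used (`-loop 1`, `-shortest`, the concat demuxer) are standard ones.

```python
from pathlib import Path


def clip_command(image: Path, audio: Path, clip: Path) -> list[str]:
    """Build one clip: a looped still image that ends when its audio ends."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(image),   # repeat the still image indefinitely
        "-i", str(audio),
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                      # stop at the end of the audio track
        str(clip),
    ]


def concat_list(clips: list[Path]) -> str:
    """Contents of the list file read by FFmpeg's concat demuxer."""
    return "".join(f"file '{clip.as_posix()}'\n" for clip in clips)


def concat_command(list_file: Path, output: Path) -> list[str]:
    """Join the clips in list order without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(list_file),
        "-c", "copy",
        str(output),
    ]
```

Because `-shortest` ties each clip's length to its own audio, no uniform per-image duration is needed. Stream copy in the final step assumes all clips share the same resolution and codec settings, which holds when every image has the same size and every clip is built by the same command.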

State File

{
  "story": "original story text",
  "style": "confirmed style",
  "imageCount": 8,
  "segments": ["segment 1 text", "segment 2 text", ...],
  "outputDir": "/tmp/story-video-XXXX"
}
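Persisting this state between conversation turns could look like the sketch below. The helper names and the file location are hypothetical; the only detail that matters is writing UTF-8 with `ensure_ascii=False` so Chinese story text stays readable in the file.

```python
import json
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write the workflow state as human-readable UTF-8 JSON."""
    path.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")


def load_state(path: Path) -> dict:
    """Read the workflow state back, for example at the start of the next turn."""
    return json.loads(path.read_text(encoding="utf-8"))
```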

Execution Commands

Per-segment TTS (sequential — one segment at a time)

mmx speech synthesize --text "SEGMENT_TEXT" --out seg_00.mp3 --voice male-qn-badao --non-interactive

Per-segment Image

mmx image generate --prompt "STYLE, scene description" --aspect-ratio 16:9 --out-dir images/ --out-prefix 00 --non-interactive

Assembly

python3 scripts/make_video.py --segments-dir /path/to/segments --images /path/to/images --output story-video.mp4

Output

Final video: {outputDir}/story-video.mp4

Voice Selection

Use mmx speech voices to list available voices. Recommended for dramatic stories:

  • male-qn-badao — deep, dramatic male voice
  • male-qn-jingying — mature male
  • female-chengshu — mature female

Important Notes

  • Image style is applied uniformly to ALL images
  • Narration must be vivid story narration, not summary
  • Each segment: ≤150 Chinese characters for TTS (avoid truncation)
  • Aspect ratio: 16:9 for all images and video
  • Use Chinese numerals for generational references (八零后 not 80后)
  • After completion, report file location and duration to user
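For the final report, the video duration can be read with `ffprobe`, which ships with FFmpeg. This is a sketch with hypothetical helper names; the wording of the report line is an assumption.

```python
import subprocess


def probe_command(video: str) -> list[str]:
    """ffprobe invocation that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video,
    ]


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as minutes and seconds."""
    minutes, rest = divmod(round(seconds), 60)
    return f"{minutes} min {rest} s"


def report(video: str) -> str:
    """Run ffprobe on the finished video and build the line shown to the user."""
    result = subprocess.run(probe_command(video), capture_output=True, text=True, check=True)
    return f"Video: {video} ({format_duration(float(result.stdout.strip()))})"
```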

Version history

2 versions in total

  • v1.0.1 Added scripts (current)
    2026-05-25 14:40 Safe Safe
  • v1.0.0 Initial release
    2026-05-22 20:13 Safe Safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
