← 返回
内容创作 中文

PPT Audio To Video

Convert narration audio plus slide decks into a narrated video. Use when the user has an audio-only `mp4/m4a/mp3/wav` and a `ppt/pptx/pdf` deck, and needs sl...
将旁白音频和幻灯片转换为带旁白的视频。适用于用户拥有mp4/m4a/mp3/wav音频和ppt/pptx/pdf幻灯片,并需要生成视频的情况。
lzfxxx
内容创作 clawhub v0.1.0 1 版本 99902.1 Key: 无需
★ 0
Stars
📥 1,020
下载
💾 451
安装
1
版本
#latest

概述

PPT Audio To Video

Use this skill when the source video has narration audio but no usable slide visuals, and the final deliverable should be a slide-based lecture video.

Resolve bundled scripts relative to this skill directory. If the runtime has already opened this SKILL.md, prefer paths like scripts/extract_slide_outline.py and scripts/render_from_timing_csv.py instead of machine-specific absolute paths.

Core workflow

  1. Inventory inputs.
    • Confirm which of these exist: audio-only mp4/m4a/mp3/wav, ppt/pptx, pdf, and any pre-rendered slide images.
    • Prefer an existing pdf or image directory for rendering. Treat pptx as the source of slide text and as a fallback for export.
  1. Prepare tools.
    • Required for deterministic steps: ffmpeg, ffprobe, pdftoppm.
    • Required for transcription: whisper-cli from whisper-cpp plus a multilingual model such as ggml-small.bin.
    • If only pptx exists and no pdf/images exist, prefer Keynote or PowerPoint export on macOS. Use soffice only as fallback because profile or rendering issues are common.
  1. Produce slide images.
    • If pdf exists, render it to images:

```bash

pdftoppm -png -r 200 "$PDF" "$OUTDIR/slide"

```

  • If only pptx exists, export to pdf or slide images with Keynote or PowerPoint, then continue from pdf.
  • Keep slide filenames ordered and stable, such as slide-01.png, slide-02.png, ...
  1. Extract slide text.
    • Run:

```bash

python3 scripts/extract_slide_outline.py \

--pptx "$PPTX" \

--out "$WORKDIR/slide_outline.csv"

```

  • Use the output to identify slide titles, distinctive keywords, and section changes.
  1. Extract clean audio for ASR.
    • For audio-only mp4, extract mono wav:

```bash

ffmpeg -y -i "$AUDIO_MP4" -ar 16000 -ac 1 -c:a pcm_s16le "$WORKDIR/audio.wav"

```

  • If the source is already wav/mp3/m4a, convert to the same mono wav form if needed.
  1. Transcribe with whisper-cli.
    • Example:

```bash

whisper-cli -ng \

-m "$MODEL" \

-f "$WORKDIR/audio.wav" \

-l zh \

-ocsv -osrt -of "$WORKDIR/transcript"

```

  • Prefer transcript.csv for downstream parsing. transcript.srt is useful for manual review.
  • If GPU allocation fails on macOS, retry with -ng to force CPU mode.
  1. Build slide_timings.csv.
    • Do not average slide durations unless the user explicitly asks for it.
    • Read the transcript and slide outline together, then create a monotonic timing plan by topic changes, section boundaries, and unique keywords.
    • Use this schema:

```csv

slide,start_sec,end_sec,duration_sec,reason

1,0.000,15.000,15.000,opening title and agenda

2,15.000,100.000,85.000,architecture overview starts here

```

  • Keep slide numbers sequential and ensure duration_sec = end_sec - start_sec.
  • Validate that the last end_sec matches the audio duration or is within a small tolerance.
  1. Render the final video.
    • Run:

```bash

python3 scripts/render_from_timing_csv.py \

--images "$SLIDE_IMAGES_DIR" \

--timings "$WORKDIR/slide_timings.csv" \

--audio "$WORKDIR/audio.wav" \

--output "$OUT_VIDEO"

```

  • The script generates an ffconcat file, validates timing continuity, and calls ffmpeg to encode the final mp4.
  1. Verify and iterate.
    • Check output duration with ffprobe.
    • If a slide cuts too early or too late, edit only the affected rows in slide_timings.csv and rerun the render script.
    • Keep the transcript, outline, and timing CSV as reproducible working files.

Heuristics for timing alignment

  • Use section-divider slides briefly. These slides usually hold for 5-20 seconds.
  • Use the first segment that clearly switches topic as the next slide start.
  • Prefer exact topic transitions over title-word matching. ASR often distorts proper nouns and product names.
  • Let the model infer timings, but keep the render step deterministic through slide_timings.csv.
  • When confidence is low, produce a first-cut video and tell the user which slide boundaries likely need review.

Common commands

Install dependencies on macOS if missing:

brew install ffmpeg poppler whisper-cpp

Typical multilingual model download:

mkdir -p .models
curl -L 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin' -o .models/ggml-small.bin

Bundled scripts

  • scripts/extract_slide_outline.py

Extract slide text from pptx into CSV or JSON for timing analysis.

  • scripts/render_from_timing_csv.py

Validate a timing CSV, generate an ffconcat, and render the final video with ffmpeg.

版本历史

共 1 个版本

  • v0.1.0 当前
    2026-03-30 04:27 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

content-creation

AdMapix

fly0pants
广告情报与应用数据分析助手,支持搜索广告素材、分析应用排名、下载量、收入及市场洞察,用于广告素材和竞品分析。
★ 295 📥 136,547
content-creation

YouTube

byungkyu
使用托管OAuth集成YouTube Data API,支持搜索视频、管理播放列表、获取频道数据及评论互动,适用于用户需要时使用此技能。
★ 142 📥 41,112
content-creation

Humanizer

biostartechnology
消除AI写作痕迹,使文本更自然真实。基于维基百科"AI写作特征"指南,识别并修正夸张象征、宣传用语、肤浅-ing分析、模糊归因、破折号滥用、三项排比、AI词汇、负面平行结构及冗长连接词等模式。
★ 861 📥 200,188