Use this skill when the source video has narration audio but no usable slide visuals, and the final deliverable should be a slide-based lecture video.
Resolve bundled scripts relative to this skill directory. If the runtime has already opened this SKILL.md, prefer paths like scripts/extract_slide_outline.py and scripts/render_from_timing_csv.py instead of machine-specific absolute paths.

- Accepted inputs: mp4/m4a/mp3/wav audio, ppt/pptx, pdf, and any pre-rendered slide images.
- Prefer a pdf or image directory for rendering. Treat pptx as the source of slide text and as a fallback for export.
- Required tools: ffmpeg, ffprobe, pdftoppm.
- Transcription uses whisper-cli from whisper-cpp plus a multilingual model such as ggml-small.bin.
- If a pptx exists and no pdf/images exist, prefer Keynote or PowerPoint export on macOS. Use soffice only as a fallback because profile or rendering issues are common.
- If a pdf exists, render it to images:

```bash
pdftoppm -png -r 200 "$PDF" "$OUTDIR/slide"
```
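
The rendered images need to be consumed in slide order, and a plain alphabetical sort can misorder them when the numeric suffix is not zero-padded. A minimal sketch of a natural-order listing is shown here; the function name `sorted_slide_images` and the `slide` prefix default are illustrative assumptions, not part of the bundled scripts.

```python
import re
from pathlib import Path


def sorted_slide_images(image_dir, prefix="slide"):
    """Return slide PNG paths ordered by the page number in their file name.

    Hypothetical helper: matches names such as slide-1.png or slide-01.png
    and ignores every other file in the directory.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)\.png$")
    numbered = []
    for path in Path(image_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    # Sort by the integer page number, not by the string form of the name.
    numbered.sort(key=lambda item: item[0])
    return [path for _, path in numbered]
```

Sorting on the parsed integer keeps slide-10.png after slide-2.png regardless of padding.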
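- If only a pptx exists, export it to pdf or slide images with Keynote or PowerPoint, then continue from the pdf step.
- Keep slide images in order with names like slide-01.png, slide-02.png, ...

When a pptx is available, extract its slide text for timing analysis:

```bash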
python3 scripts/extract_slide_outline.py \
--pptx "$PPTX" \
--out "$WORKDIR/slide_outline.csv"
```
- If the audio source is an mp4, extract a 16 kHz mono wav:

```bash
ffmpeg -y -i "$AUDIO_MP4" -ar 16000 -ac 1 -c:a pcm_s16le "$WORKDIR/audio.wav"
```
- If the audio source is wav/mp3/m4a, convert it to the same mono wav form if needed.

Transcribe the audio with whisper-cli:

```bash
whisper-cli -ng \
-m "$MODEL" \
-f "$WORKDIR/audio.wav" \
-l zh \
-ocsv -osrt -of "$WORKDIR/transcript"
```
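
For downstream parsing, the CSV written by the command above can be loaded with a few lines of Python. This is a sketch under an assumption: that the file has a `start,end,text` header with times in milliseconds. Check the header of your own transcript.csv before relying on it. The function name `load_segments` is hypothetical.

```python
import csv
import io


def load_segments(csv_text):
    """Parse transcript CSV text into (start_sec, end_sec, text) tuples.

    Assumes a header row of start,end,text with start/end in milliseconds.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    segments = []
    for row in reader:
        start_sec = int(row["start"]) / 1000.0
        end_sec = int(row["end"]) / 1000.0
        segments.append((start_sec, end_sec, row["text"].strip()))
    return segments
```

In practice the text would come from reading `$WORKDIR/transcript.csv` with UTF-8 encoding and passing its contents to `load_segments`.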
- Use transcript.csv for downstream parsing. transcript.srt is useful for manual review.
- The command above passes -ng to force CPU mode.
- Write the resulting slide timings to slide_timings.csv:

```csv
slide,start_sec,end_sec,duration_sec,reason
1,0.000,15.000,15.000,opening title and agenda
2,15.000,100.000,85.000,architecture overview starts here
```
- duration_sec = end_sec - start_sec.
- The final end_sec matches the audio duration or is within a small tolerance.

Render the video from the timing file:

```bash
python3 scripts/render_from_timing_csv.py \
--images "$SLIDE_IMAGES_DIR" \
--timings "$WORKDIR/slide_timings.csv" \
--audio "$WORKDIR/audio.wav" \
--output "$OUT_VIDEO"
```
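
Before rendering, the timing file can be checked against the rules stated for slide_timings.csv. The sketch below is not the bundled script's actual logic. It assumes that the first slide starts at 0, that each slide starts where the previous one ends, and that a 0.5 second tolerance on the final end_sec is acceptable; the function name `validate_timings` and that tolerance default are hypothetical.

```python
import csv
import io


def validate_timings(csv_text, audio_duration_sec, tolerance_sec=0.5):
    """Return a list of problems found in a slide timing CSV (empty if none)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return ["timing file has no rows"]
    problems = []
    previous_end = 0.0  # assumption: the first slide starts at 0
    for row in rows:
        slide = row["slide"]
        start = float(row["start_sec"])
        end = float(row["end_sec"])
        duration = float(row["duration_sec"])
        if end <= start:
            problems.append(f"slide {slide}: end_sec must be greater than start_sec")
        if abs(duration - (end - start)) > 1e-3:
            problems.append(f"slide {slide}: duration_sec != end_sec - start_sec")
        if abs(start - previous_end) > 1e-3:
            problems.append(f"slide {slide}: start_sec does not continue from previous end_sec")
        previous_end = end
    # The last slide should end where the audio ends, within the tolerance.
    if abs(previous_end - audio_duration_sec) > tolerance_sec:
        problems.append("final end_sec does not match audio duration")
    return problems
```

The audio duration passed in would typically be read with ffprobe from the extracted wav.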
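- The render script generates an ffconcat file, validates timing continuity, and calls ffmpeg to encode the final mp4.
- Verify the output with ffprobe.
- If a slide change is mistimed, edit slide_timings.csv and rerun the render script.
- slide_timings.csv.

Install dependencies on macOS if missing: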

```bash
brew install ffmpeg poppler whisper-cpp
```

Typical multilingual model download:

```bash
mkdir -p .models
curl -L 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin' -o .models/ggml-small.bin
```

- scripts/extract_slide_outline.py: Extract slide text from pptx into CSV or JSON for timing analysis.
- scripts/render_from_timing_csv.py: Validate a timing CSV, generate an ffconcat, and render the final video with ffmpeg.
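
For orientation, an ffconcat file for a slide sequence can be produced from (image, duration) pairs as sketched below. This illustrates the file format only and is not taken from render_from_timing_csv.py. The function name `build_ffconcat` is hypothetical. Repeating the last image without a duration is a common workaround so that the final duration is honored by ffmpeg's concat demuxer. The sketch also assumes image paths contain no single quotes.

```python
def build_ffconcat(entries):
    """Build ffconcat text from a list of (image_path, duration_sec) pairs."""
    lines = ["ffconcat version 1.0"]
    for image_path, duration_sec in entries:
        lines.append(f"file '{image_path}'")
        lines.append(f"duration {duration_sec:.3f}")
    if entries:
        # Repeat the last image so its duration is not dropped by the demuxer.
        lines.append(f"file '{entries[-1][0]}'")
    return "\n".join(lines) + "\n"
```

A file like this is usually read with `ffmpeg -f concat -i slides.ffconcat`, with the narration audio supplied as a second input.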