Video Captions

Generate professional captions and subtitles with multi-engine transcription, word-level timing, styling presets, and burn-in.
Author: ivangdavila
Category: Content creation · Source: clawhub · Version: v1.0.1 (1 version, tagged #latest) · API key: not required
Stars: 2 · Downloads: 1,290 · Installs: 44

Overview

When to Use

User needs captions or subtitles for video content. Agent handles transcription, timing, formatting, styling, translation, and burn-in across all major formats and platforms.

Quick Reference

| Topic | File |
|-------|------|
| Transcription engines | engines.md |
| Output formats | formats.md |
| Styling presets | styling.md |
| Platform requirements | platforms.md |

Core Rules

1. Engine Selection by Context

| Scenario | Engine | Why |
|----------|--------|-----|
| Default (recommended) | Whisper local | 100% offline, no data leaves machine |
| Apple Silicon | MLX Whisper | Native acceleration, still local |
| Word timestamps | whisper-timestamped | DTW alignment, still local |

Default: Whisper local (turbo model). See engines.md for optional cloud alternatives.
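
The selection table above can be sketched as a small helper. This is only an illustration: the function name `pick_engine`, the returned labels, and the precedence (word timestamps win over Apple Silicon) are assumptions, not part of the skill.

```python
import platform


def is_apple_silicon() -> bool:
    """Best-effort check for an Apple Silicon Mac (assumption: arm64 macOS)."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def pick_engine(apple_silicon: bool = False, word_timestamps: bool = False) -> str:
    """Return a label for the local engine suggested by the table above.

    The precedence used here is an assumption: a word-timestamp request
    overrides the hardware-based choice.
    """
    if word_timestamps:
        return "whisper-timestamped"
    if apple_silicon:
        return "mlx-whisper"
    return "whisper"
```

A caller might use `pick_engine(apple_silicon=is_apple_silicon())` and fall back to plain Whisper everywhere else.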

2. Format Selection by Platform

| Platform | Format | Notes |
|----------|--------|-------|
| YouTube | VTT or SRT | VTT preferred |
| Netflix/Pro | TTML | Strict timing rules |
| Social (TikTok, IG) | Burn-in (ASS) | Embedded in video |
| General | SRT | Universal compatibility |
| Karaoke/effects | ASS | Advanced styling |

Ask user's target platform if not specified.
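
As a rough sketch, the platform table can be expressed as a lookup with SRT as the fallback. The dictionary keys and the name `pick_format` are hypothetical; raising an error stands in for "ask the user".

```python
from typing import Optional

# Lowercase platform name -> subtitle format, following the table above.
PLATFORM_FORMATS = {
    "youtube": "vtt",
    "netflix": "ttml",
    "tiktok": "ass",      # burned into the video
    "instagram": "ass",   # burned into the video
}


def pick_format(platform: Optional[str]) -> str:
    """Return the format for a platform, or SRT for anything unlisted."""
    if not platform:
        # The skill says to ask the user; a caller can catch this and prompt.
        raise ValueError("target platform not specified; ask the user")
    return PLATFORM_FORMATS.get(platform.strip().lower(), "srt")
```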

3. Professional Timing Standards

Netflix-compliant (default):

  • Min duration: 5/6 second (0.833s)
  • Max duration: 7 seconds
  • Max chars/line: 42
  • Max lines: 2
  • Gap between subtitles: 2+ frames
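
The limits listed above can be checked mechanically. The following is a minimal sketch, assuming cues are held as start/end times in seconds and that the frame rate is known (24 fps is just a placeholder default); the `Cue` class and `check_cue` function are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_DURATION = 5 / 6        # seconds
MAX_DURATION = 7.0          # seconds
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MIN_GAP_FRAMES = 2
EPSILON = 1e-9              # tolerance for floating-point comparisons


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # lines separated by "\n"


def check_cue(cue: Cue, next_cue: Optional[Cue] = None, fps: float = 24.0) -> List[str]:
    """Return a list of rule violations for one cue (empty list means OK)."""
    problems = []
    duration = cue.end - cue.start
    if duration < MIN_DURATION - EPSILON:
        problems.append("too short")
    if duration > MAX_DURATION + EPSILON:
        problems.append("too long")
    lines = cue.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append("too many lines")
    if any(len(line) > MAX_CHARS_PER_LINE for line in lines):
        problems.append("line too long")
    if next_cue is not None:
        gap = next_cue.start - cue.end
        if gap < MIN_GAP_FRAMES / fps - EPSILON:
            problems.append("gap too small")
    return problems
```

For example, `check_cue(Cue(0.0, 0.5, "Hi"))` reports the cue as too short.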

Social media:

  • Shorter segments (2-4 words)
  • More frequent breaks
  • Centered or dynamic positioning
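
Short social-media segments can be built by grouping word-level timestamps. This sketch assumes words arrive as `(word, start, end)` tuples in seconds, which is an illustrative shape rather than any engine's exact output; `chunk_words` is a hypothetical helper.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]        # (text, start seconds, end seconds)
Segment = Tuple[float, float, str]     # (start seconds, end seconds, text)


def chunk_words(words: List[Word], max_words: int = 3) -> List[Segment]:
    """Group consecutive words into cues of at most max_words words each."""
    segments = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        text = " ".join(word for word, _, _ in group)
        # Each segment spans from its first word's start to its last word's end.
        segments.append((group[0][1], group[-1][2], text))
    return segments
```

A fixed word count ignores punctuation, so a fuller version would also break at sentence ends.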

4. Segmentation Rules

Break lines:

  • After punctuation marks
  • Before conjunctions (and, but, or)
  • Before prepositions

Never separate:

  • Article from noun
  • Adjective from noun
  • First name from last name
  • Verb from subject pronoun
  • Auxiliary from verb
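
One way to apply these rules is to score every candidate break point. The sketch below is deliberately simplified: the word lists and weights are assumptions, and of the "never separate" rules it enforces only article-from-noun, since the others need part-of-speech tagging.

```python
from typing import List

CONJUNCTIONS = {"and", "but", "or"}
PREPOSITIONS = {"in", "on", "at", "to", "for", "with", "from", "of", "by"}
ARTICLES = {"a", "an", "the"}
PUNCTUATION = ".,;:!?"


def break_line(text: str, max_chars: int = 42) -> List[str]:
    """Split text into at most two lines, preferring linguistically clean breaks."""
    words = text.split()
    if len(text) <= max_chars or len(words) < 2:
        return [text]
    best = None
    for i in range(1, len(words)):
        first, second = " ".join(words[:i]), " ".join(words[i:])
        if len(first) > max_chars or len(second) > max_chars:
            continue
        prev_word = words[i - 1]
        next_word = words[i].lower().strip(PUNCTUATION)
        score = 0.0
        if prev_word[-1] in PUNCTUATION:
            score += 3          # break after punctuation
        if next_word in CONJUNCTIONS:
            score += 2          # break before a conjunction
        if next_word in PREPOSITIONS:
            score += 1          # break before a preposition
        if prev_word.lower() in ARTICLES:
            score -= 5          # never strand an article at the end of a line
        score -= abs(len(first) - len(second)) / max_chars  # prefer balanced lines
        if best is None or score > best[0]:
            best = (score, [first, second])
    # If no two-line split fits, return the text unchanged for manual handling.
    return best[1] if best else [text]
```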

5. Word-Level Timestamps

Use word timestamps for:

  • Karaoke-style highlighting
  • Precise sync verification
  • TikTok/Instagram animated captions
  • Quality checking transcript accuracy

Enable with --word_timestamps True in the whisper CLI (the option uses an underscore and takes an explicit boolean value).
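
For karaoke-style highlighting, word timings can be turned into ASS `\k` tags, whose durations are given in centiseconds. This is a minimal sketch: the `(word, start, end)` input shape and the name `karaoke_text` are assumptions, and silent gaps between words are ignored.

```python
from typing import List, Tuple


def karaoke_text(words: List[Tuple[str, float, float]]) -> str:
    """Build the text of an ASS Dialogue line with one \\k tag per word.

    Each tag carries the word's duration in centiseconds. Pauses between
    words are not represented in this simplified version.
    """
    parts = []
    for word, start, end in words:
        centiseconds = round((end - start) * 100)
        parts.append("{\\k%d}%s" % (centiseconds, word))
    return " ".join(parts)
```

The result goes in the Text field of a `Dialogue:` line whose start and end times cover the whole phrase.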

6. Speaker Identification

For multi-speaker content:

  • Use diarization (pyannote local, or cloud APIs if configured)
  • Format: [Speaker 1] or [Name] if known
  • SDH format: JOHN: What do you think?
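
The two label conventions above can be produced by one small formatter. The function name and parameters below are hypothetical; the speaker index would come from whichever diarization tool is in use.

```python
from typing import Optional


def label_line(text: str, speaker_index: Optional[int] = None,
               name: Optional[str] = None, sdh: bool = False) -> str:
    """Prefix a caption with a speaker label.

    Uses the known name when available, otherwise "Speaker N".
    SDH style is uppercase followed by a colon; default style uses brackets.
    """
    who = name or f"Speaker {speaker_index}"
    if sdh:
        return f"{who.upper()}: {text}"
    return f"[{who}] {text}"
```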

7. Quality Verification

Before delivering:

  • Check sync at start, middle, end
  • Verify character limits per line
  • Confirm speaker labels if multi-speaker
  • Test burn-in render quality
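
For the sync check, one simple approach is to sample the first, middle, and last cue and compare each against the audio by ear. The helper below is a hypothetical sketch that only picks which cues to review.

```python
from typing import List


def spot_check_indices(cue_count: int) -> List[int]:
    """Return the indices of the first, middle, and last cue, without duplicates."""
    if cue_count <= 0:
        return []
    return sorted({0, cue_count // 2, cue_count - 1})
```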

Workflow

Basic Transcription

# Auto-detect language, output SRT
whisper video.mp4 --model turbo --output_format srt

# Specify language
whisper video.mp4 --model turbo --language es --output_format srt

# Multiple formats
whisper video.mp4 --model turbo --output_format all

Word-Level Timestamps

# Using whisper-timestamped
whisper_timestamped video.mp4 --model large-v3 --output_format srt

# With VAD pre-processing (reduces hallucinations)
whisper_timestamped video.mp4 --vad silero --accurate

Styled Subtitles (ASS)

# Generate SRT first, then burn it in with ASS style overrides via force_style (re-encodes the video)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'" output.mp4

Burn-In for Social Media

# TikTok/Instagram style (centered, bold)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Montserrat-Bold,FontSize=32,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=3,Shadow=0,Alignment=10,MarginV=50'" output.mp4

# Netflix style (bottom, clean)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Netflix Sans,FontSize=48,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'" output.mp4
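
The long `force_style` strings above are easier to maintain when generated from a preset. This sketch assumes presets are plain dictionaries of ASS style fields; the function names are illustrative, and paths containing colons, quotes, or backslashes would need extra escaping that is not handled here.

```python
from typing import Dict, List, Union

StyleValue = Union[str, int]


def subtitles_filter(srt_path: str, style: Dict[str, StyleValue]) -> str:
    """Build the ffmpeg subtitles filter string with force_style overrides."""
    overrides = ",".join(f"{key}={value}" for key, value in style.items())
    return f"subtitles={srt_path}:force_style='{overrides}'"


def burn_in_command(video: str, srt_path: str, style: Dict[str, StyleValue],
                    output: str) -> List[str]:
    """Return an argv list suitable for subprocess.run (no shell quoting needed)."""
    return ["ffmpeg", "-i", video, "-vf", subtitles_filter(srt_path, style), output]


# Preset copied from the Netflix-style command above.
NETFLIX_STYLE = {
    "FontName": "Netflix Sans", "FontSize": 48,
    "PrimaryColour": "&HFFFFFF", "OutlineColour": "&H000000",
    "Outline": 2, "Shadow": 1, "Alignment": 2,
}
```

Passing the list to `subprocess.run(burn_in_command(...), check=True)` avoids shell quoting problems with the nested quotes.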

Translation

# Transcribe + translate to English
# (turbo is not trained for translation and returns the original language; use medium or large)
whisper video.mp4 --model large-v3 --task translate --output_format srt

Format Conversion

# SRT to VTT
ffmpeg -i video.srt video.vtt

# SRT to ASS (for styling)
ffmpeg -i video.srt video.ass
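
Where ffmpeg is not available, the SRT to VTT step is simple enough to sketch in pure Python: add the `WEBVTT` header and switch the millisecond separator from a comma to a dot. This is a simplified illustration, not a full parser. It keeps cue numbers (valid as WebVTT cue identifiers) and does not strip a byte-order mark.

```python
import re

# Matches SRT timestamps such as 00:00:01,000
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text."""
    body = srt_text.replace("\r\n", "\n").strip()
    body = SRT_TIMESTAMP.sub(r"\1.\2", body)
    return "WEBVTT\n\n" + body + "\n"
```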

Caption Traps

  • Hallucinations on silence → Use VAD pre-processing or trim silent sections
  • Wrong language detection → Specify --language explicitly for mixed content
  • Timing drift in long videos → Use word timestamps + manual spot-check
  • Character limit violations → Set --max_line_width 42 for Netflix compliance (in the whisper CLI this option only takes effect with --word_timestamps True)
  • Missing speaker IDs → Enable diarization for multi-speaker content
  • Burn-in quality loss → Use high bitrate output (-b:v 8M)

Common Scenarios

YouTube Video

  1. Transcribe: whisper video.mp4 --output_format vtt
  2. Upload .vtt to YouTube Studio
  3. Review auto-sync suggestions

TikTok/Instagram Reel

  1. Transcribe with word timestamps
  2. Apply bold animated style
  3. Burn-in: ffmpeg -i video.mp4 -vf "subtitles=video.ass" -c:a copy output.mp4
  4. Export at platform resolution

Netflix/Professional

  1. Use Whisper large-v3 for best local accuracy
  2. Export TTML format
  3. Verify: 42 chars/line, 2 lines max, timing gaps
  4. Include translator credit as last subtitle

Podcast/Interview

  1. Enable speaker diarization
  2. Format as dialogue: [SPEAKER]: text
  3. SDH option: include [music], [laughter] descriptions

Foreign Film Translation

  1. Transcribe in original language
  2. Translate: --task translate for English (use a multilingual model such as medium or large-v3; turbo does not translate)
  3. Or use external translation + timing sync

External Endpoints

Default: 100% local processing. No network calls during transcription (Whisper downloads its model weights once, on first use).

| Endpoint | Data Sent | When Used |
|----------|-----------|-----------|
| Whisper (local) | None (local) | Default, always |
| api.assemblyai.com | Audio file | Only if user sets ASSEMBLYAI_API_KEY |
| api.deepgram.com | Audio file | Only if user sets DEEPGRAM_API_KEY |

Cloud APIs are documented as alternatives but never used unless user explicitly provides API keys and requests cloud processing. By default, all processing stays on your machine.
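
The opt-in rule can be sketched as a guard that falls back to local Whisper unless a cloud engine is both requested and has its key set. The function name and engine labels are hypothetical; only the two environment variable names come from the table above.

```python
import os
from typing import Mapping, Optional

# Cloud engine label -> environment variable that must be set to allow it.
CLOUD_KEYS = {
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}


def resolve_engine(requested: str = "whisper",
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Return the engine to use; cloud needs an explicit request AND a key."""
    env = os.environ if env is None else env
    key_name = CLOUD_KEYS.get(requested)
    if key_name and env.get(key_name):
        return requested
    # Anything else, including a cloud request without a key, stays local.
    return "whisper"
```

Having a key set is not enough on its own: `resolve_engine(env={"DEEPGRAM_API_KEY": "x"})` still returns the local engine.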

Security & Privacy

Default workflow is 100% offline:

  • Whisper runs locally on your machine
  • Generated subtitle files stay local
  • Burned-in videos stay local
  • No network calls made

Cloud APIs are OPTIONAL and OPT-IN:

  • Only used if you set ASSEMBLYAI_API_KEY or DEEPGRAM_API_KEY
  • Only triggered when you explicitly use cloud engine commands
  • If you never set these keys, no audio ever leaves your machine

This skill does NOT:

  • Upload anything by default
  • Require internet connection for basic use
  • Store data externally

Related Skills

Install with clawhub install if user confirms:

  • ffmpeg — video/audio processing
  • video — general video tasks
  • video-edit — video editing
  • audio — audio processing

Feedback

  • If useful: clawhub star video-captions
  • Stay updated: clawhub sync

Version History

1 version in total

  • v1.0.1 (current)
    2026-03-29 05:24 · Security scan: safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
