
YouTube Transcribe

Transcribe YouTube videos with smart fallback: extracts captions first (fast, free), falls back to local Whisper transcription when no captions available. Au...
iml885203
Content creation · clawhub · v1.0.0 · 1 version · 100000 · API key: not required

Overview

YouTube Transcribe

Smart YouTube video transcription with automatic fallback:

  1. Captions first — extracts existing subtitles (manual or auto-generated) via yt-dlp. Fast, free, no compute.
  2. Whisper fallback — when no captions exist, downloads audio and transcribes locally with the best available Whisper backend.
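
The two-step strategy above can be sketched as a small control-flow function. This is a minimal illustration, not the skill's actual code: `fetch_captions` and `run_whisper` are hypothetical callables standing in for the yt-dlp caption step and the local Whisper step, and the `"whisper"` method label is an assumption (only `"captions"` appears in the documented JSON output).

```python
from typing import Callable, Optional


def transcribe_with_fallback(
    url: str,
    fetch_captions: Callable[[str], Optional[str]],  # hypothetical: returns None when no captions exist
    run_whisper: Callable[[str], str],               # hypothetical: downloads audio and transcribes locally
    force_whisper: bool = False,
) -> dict:
    """Return the transcript text together with the method that produced it."""
    if not force_whisper:
        text = fetch_captions(url)
        if text:
            # Captions found: the expensive Whisper path is never touched.
            return {"method": "captions", "text": text}
    # No captions, or the caller asked to skip them: transcribe locally.
    return {"method": "whisper", "text": run_whisper(url)}
```

Passing the two steps in as callables keeps the fallback logic testable without network access or a Whisper install.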

When to Use

Use this skill when the user wants to:

  • Get a transcript or text version of a YouTube video
  • Understand what a YouTube video says without watching it
  • Summarize, analyze, or take notes from a YouTube video
  • Extract subtitles or captions from a video

Triggers

  • "transcribe this YouTube video"
  • "what does this video say"
  • "get the transcript of [YouTube URL]"
  • "summarize this YouTube video" (transcribe first, then process)
  • Any YouTube URL shared with a request to understand its content

Requirements

Required:

  • yt-dlp — for caption extraction and audio download
  • python3

For Whisper fallback (when no captions available):

  • ffmpeg — for audio processing
  • One of these Whisper backends (auto-detected in priority order):
  1. mlx-whisper — Apple Silicon native, fastest on Mac (pip install mlx-whisper)
  2. faster-whisper — CTranslate2 backend, fast on CUDA/CPU (pip install faster-whisper)
  3. openai-whisper — Original Whisper, universal fallback (pip install openai-whisper)

Usage

Basic — transcribe a video

python3 {baseDir}/scripts/transcribe.py "https://www.youtube.com/watch?v=VIDEO_ID"

Specify language for captions

python3 {baseDir}/scripts/transcribe.py "URL" --language zh

Force Whisper (skip caption check)

python3 {baseDir}/scripts/transcribe.py "URL" --force-whisper

JSON output

python3 {baseDir}/scripts/transcribe.py "URL" --format json

Save to file

python3 {baseDir}/scripts/transcribe.py "URL" --output transcript.txt

Options

Flag             Default  Description
---------------  -------  -----------
--language       auto     Preferred subtitle/transcription language (e.g. zh, en, ja)
--format         text     Output format: text, json, srt, vtt
--output         stdout   Save transcript to file
--force-whisper  false    Skip caption extraction, go straight to Whisper
--backend        auto     Whisper backend: auto, mlx, faster-whisper, whisper
--model          auto     Whisper model size: auto, large-v3, medium, small, base, tiny
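
When calling the script from other Python code, the flags above can be assembled into an argument list. The helper below is a hypothetical convenience wrapper (`build_command` is not part of the skill). It uses only the flags and values listed in the table and leaves out any option still at its default.

```python
from typing import List, Optional

FORMATS = ("text", "json", "srt", "vtt")


def build_command(
    script: str,
    url: str,
    language: Optional[str] = None,
    fmt: str = "text",
    output: Optional[str] = None,
    force_whisper: bool = False,
    backend: Optional[str] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Build the argv list for transcribe.py, skipping options left at their defaults."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    cmd = ["python3", script, url]
    if language:
        cmd += ["--language", language]
    if fmt != "text":
        cmd += ["--format", fmt]
    if output:
        cmd += ["--output", output]
    if force_whisper:
        cmd.append("--force-whisper")
    if backend:
        cmd += ["--backend", backend]
    if model:
        cmd += ["--model", model]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. Using a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.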

Environment Variables

Variable            Description
------------------  -----------
YT_WHISPER_BACKEND  Override Whisper backend selection
YT_WHISPER_MODEL    Override Whisper model size

Auto-Detection

Whisper Backend (priority order)

  1. MLX Whisper — detected via import mlx_whisper. Best for Apple Silicon.
  2. faster-whisper — detected via import faster_whisper. Best for CUDA GPU, good on CPU.
  3. OpenAI Whisper — detected via import whisper. Universal fallback.
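
The priority order above could be implemented roughly as follows. This is a sketch under assumptions: it probes with `importlib.util.find_spec` rather than a real `import`, so nothing heavy is loaded during detection, and the short names `mlx`, `faster-whisper`, and `whisper` are taken from the `--backend` option values. The `is_available` parameter is a hypothetical hook added here to make the function testable.

```python
import importlib.util
from typing import Callable, Optional

# (backend name as used by --backend, importable module name), in priority order
BACKENDS = [
    ("mlx", "mlx_whisper"),
    ("faster-whisper", "faster_whisper"),
    ("whisper", "whisper"),
]


def detect_backend(is_available: Optional[Callable[[str], bool]] = None) -> Optional[str]:
    """Return the first installed backend in priority order, or None if none is found."""
    if is_available is None:
        # Default probe: check whether the module can be located, without importing it.
        def is_available(module: str) -> bool:
            return importlib.util.find_spec(module) is not None

    for name, module in BACKENDS:
        if is_available(module):
            return name
    return None
```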

Model Size (based on available RAM)

RAM     Model     VRAM/RAM Usage
------  --------  --------------
≥16GB   large-v3  ~6-10GB
≥8GB    medium    ~5GB
≥4GB    small     ~2.5GB
<4GB    base      ~1.5GB
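
The thresholds in the table translate directly into a lookup. The sketch below assumes the RAM figure has already been measured and is given in gigabytes. How the skill measures RAM is not documented here, so that step is left out.

```python
def pick_model(ram_gb: float) -> str:
    """Map available RAM in GB to a Whisper model size, following the table above."""
    if ram_gb >= 16:
        return "large-v3"
    if ram_gb >= 8:
        return "medium"
    if ram_gb >= 4:
        return "small"
    return "base"
```

Note that `tiny` is accepted by `--model` but does not appear in the table, so in this sketch it is never chosen automatically.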

Caption Language Priority

When --language is not specified, captions are searched in this order:

  1. Video's original language
  2. Chinese variants: zh-Hant, zh-Hans, zh-TW, zh-CN, zh
  3. English: en
  4. Any available language
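
As an illustration of the search order above, the following sketch picks a caption track from a list of available language codes. The function name and the `original` parameter are hypothetical. The final step simply takes the first listed track, which is one possible reading of "any available language".

```python
from typing import Iterable, Optional

ZH_VARIANTS = ["zh-Hant", "zh-Hans", "zh-TW", "zh-CN", "zh"]


def pick_caption_language(
    available: Iterable[str], original: Optional[str] = None
) -> Optional[str]:
    """Choose a caption language: original, then Chinese variants, then English, then anything."""
    tracks = list(available)
    candidates = ([original] if original else []) + ZH_VARIANTS + ["en"]
    for lang in candidates:
        if lang in tracks:
            return lang
    # Nothing preferred is present: fall back to whatever exists, if anything.
    return tracks[0] if tracks else None
```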

Output Formats

text (default)

Plain text transcript, one continuous block.

json

{
  "video_id": "ZSnYlbIYpjs",
  "title": "Video Title",
  "channel": "Channel Name",
  "duration": 708,
  "language": "zh",
  "method": "captions",
  "transcript": [
    {"start": 0.0, "end": 5.2, "text": "..."},
    ...
  ],
  "full_text": "Complete transcript as single string"
}
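
A consumer of this JSON might reduce it to a few fields before summarizing or analyzing it. The helper below is a hypothetical sketch that relies only on the keys shown in the sample above. It rebuilds the text from the segments if `full_text` is missing or empty.

```python
import json


def summarize_payload(payload: str) -> dict:
    """Extract the title, method, segment count, covered time, and text from the JSON output."""
    data = json.loads(payload)
    segments = data["transcript"]
    return {
        "title": data["title"],
        "method": data["method"],
        "segments": len(segments),
        # End time of the last segment, i.e. how much of the video has text.
        "covered_seconds": segments[-1]["end"] if segments else 0.0,
        "text": data.get("full_text") or " ".join(s["text"] for s in segments),
    }
```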

srt / vtt

Standard subtitle formats with timestamps.
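
For reference, segments shaped like the `transcript` entries in the JSON output can be rendered as SRT with a few lines of code. This is an illustrative sketch of the SRT layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line), not the script's own formatter. Both function names are hypothetical.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render a list of {"start", "end", "text"} segments as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    # Each cue ends with a newline; joining with "\n" adds the blank separator line.
    return "\n".join(cues)
```

WebVTT differs mainly in starting with a `WEBVTT` header line and using a period instead of a comma before the milliseconds.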

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 18:45 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
