
YouTube Transcript (yt-dlp captions)

Extract YouTube video transcripts from existing captions (manual or auto-generated) using yt-dlp, with optional timestamps and local SQLite caching. Use when...
itzsubhadip
Data Analysis clawhub v1.0.5 1 version 99730.7 Key: not required
★ 0 stars · 📥 2,963 downloads · 💾 72 installs · #latest

Overview

YouTube Transcript (Captions-Only)

This skill extracts transcripts from existing YouTube captions.

Primary behavior

  • Prefer manual subtitles when available.
  • Fall back to auto-generated captions.
  • Output either JSON segments (default) or plain text (--text).
  • Cache results locally in SQLite for speed.
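The manual-before-auto preference above can be sketched as a small selection function. This is a minimal illustration, not the skill's actual code: pick_caption_track is a hypothetical name, and it assumes the metadata dict returned by yt-dlp's extract_info(url, download=False), which exposes caption tracks under the subtitles and automatic_captions keys.

```python
from typing import Optional, Tuple


def pick_caption_track(info: dict, lang: str = "en") -> Optional[Tuple[str, list]]:
    """Return (source, formats) for `lang`, preferring manual subtitles.

    `info` is assumed to be a yt-dlp metadata dict. Hypothetical helper,
    not taken from the skill's script.
    """
    manual = info.get("subtitles") or {}
    auto = info.get("automatic_captions") or {}
    if manual.get(lang):
        return "manual", manual[lang]   # human-written captions win
    if auto.get(lang):
        return "auto", auto[lang]       # otherwise use auto-generated captions
    return None                         # no captions in this language
```

Returning the source label alongside the track keeps it available for the source field of the JSON output.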

Reliability behavior

  • If YouTube blocks anonymous access (bot-check), provide cookies.txt.
  • If yt-dlp reports no captions for a video, the script falls back to YouTube’s transcript panel (youtubei get_transcript) when it is accessible.

Privacy note: This published version only contacts YouTube directly (via yt-dlp and the transcript panel fallback). It intentionally does not call third-party transcript providers, so video IDs and URLs are never sent to them.

Cookies: Cookies are treated as secrets.

  • The script supports --cookies / YT_TRANSCRIPT_COOKIES, but does not auto-load cookies from inside the skill directory.
  • Store cookies under ~/.config/yt-transcript/.

Path safety: This skill restricts --cookies and --cache paths to approved directories.

  • cookies allowed under: ~/.config/yt-transcript/
  • cache allowed under: {baseDir}/cache/ and ~/.config/yt-transcript/
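One way such a path restriction could be enforced is to resolve the candidate path and check that it sits inside an approved directory. The sketch below is an assumption about the approach, not the script's real implementation; is_under is a hypothetical helper name.

```python
from pathlib import Path


def is_under(path: str, allowed_dirs: list) -> bool:
    """True if `path` lies inside one of `allowed_dirs`.

    Both sides are resolved first, so '..' segments and symlinks cannot
    be used to step outside an approved directory. Hypothetical helper.
    """
    resolved = Path(path).expanduser().resolve()
    for base in allowed_dirs:
        base_resolved = Path(base).expanduser().resolve()
        try:
            resolved.relative_to(base_resolved)  # raises ValueError if outside
            return True
        except ValueError:
            continue
    return False
```

For example, is_under(cookies_arg, ["~/.config/yt-transcript"]) would be checked before the cookies file is opened.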

How to run

Script path:

  • {baseDir}/scripts/yt_transcript.py

Typical usage:

  • python3 {baseDir}/scripts/yt_transcript.py VIDEO_URL_OR_ID
  • python3 {baseDir}/scripts/yt_transcript.py VIDEO_URL_OR_ID --lang en
  • python3 {baseDir}/scripts/yt_transcript.py VIDEO_URL_OR_ID --text
  • python3 {baseDir}/scripts/yt_transcript.py VIDEO_URL_OR_ID --no-ts

Cookies (optional, but often required on VPS IPs):

  • python3 {baseDir}/scripts/yt_transcript.py VIDEO_URL_OR_ID --cookies /path/to/youtube-cookies.txt
  • or set env var: YT_TRANSCRIPT_COOKIES=/path/to/youtube-cookies.txt

Publishing safety note: Cookies are optional, so YT_TRANSCRIPT_COOKIES is intentionally not required by skill metadata. Only set it if you need authenticated access.

Best practice: store cookies outside the skill folder (so you never accidentally publish them), e.g. ~/.config/yt-transcript/youtube-cookies.txt, and point to it via --cookies or YT_TRANSCRIPT_COOKIES.

What the script returns

JSON mode (default)

A JSON object:

  • video_id: 11-char id
  • lang: chosen language
  • source: manual | auto | panel
  • segments: list of { start, duration, text } (or text-only when --no-ts)

Text mode (--text)

A newline-separated transcript.

  • By default timestamps are included as [12.34s].
  • Use --no-ts to output only the text lines.
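To make the relationship between the two output modes concrete, the sketch below renders JSON-mode segments as text-mode lines. render_text and the sample payload are illustrative only. The [12.34s] timestamp style comes from the description above, but placing it before the text is an assumption.

```python
def render_text(segments, include_ts=True):
    """Render transcript segments as newline-separated text.

    With include_ts=True each line is prefixed with its start time,
    e.g. "[12.34s] hello"; with include_ts=False (the --no-ts case)
    only the text is emitted. Hypothetical helper.
    """
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if include_ts:
            lines.append(f"[{seg['start']:.2f}s] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)


# Illustrative payload shaped like the JSON object described above.
sample = {
    "video_id": "dQw4w9WgXcQ",
    "lang": "en",
    "source": "manual",
    "segments": [
        {"start": 0.5, "duration": 2.0, "text": "first line"},
        {"start": 12.34, "duration": 1.5, "text": "second line"},
    ],
}
print(render_text(sample["segments"]))
```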

Caching

Default cache DB:

  • {baseDir}/cache/transcripts.sqlite

Cache key includes:

  • video_id, lang, source, include_timestamp, format
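A cache keyed on those five fields could be modelled as a single SQLite table with a composite primary key. The table name, column layout, and helper functions below are assumptions for illustration; the real schema of transcripts.sqlite is not documented here.

```python
import json
import sqlite3


def open_cache(db_path=":memory:"):
    """Open (or create) a transcript cache; the schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcripts (
               video_id TEXT, lang TEXT, source TEXT,
               include_timestamp INTEGER, format TEXT,
               payload TEXT NOT NULL,
               PRIMARY KEY (video_id, lang, source, include_timestamp, format)
           )"""
    )
    return conn


def cache_put(conn, key, payload):
    """Store `payload` (JSON-serialisable) under the 5-field key tuple."""
    conn.execute(
        "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?, ?, ?, ?)",
        (*key, json.dumps(payload)),
    )
    conn.commit()


def cache_get(conn, key):
    """Return the cached payload for `key`, or None on a cache miss."""
    row = conn.execute(
        "SELECT payload FROM transcripts WHERE video_id=? AND lang=? "
        "AND source=? AND include_timestamp=? AND format=?",
        key,
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because include_timestamp and format are part of the key, a --text --no-ts request never receives a cached JSON result with timestamps.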

Cookie handling (important)

  • Cookies must be in Netscape cookies.txt format.
  • Treat cookies as secrets.
  • Never commit / publish cookies to ClawHub.
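A quick way to confirm that a file really is in Netscape cookies.txt format is to load it with Python's standard http.cookiejar.MozillaCookieJar, which raises LoadError when the header line or a cookie line is malformed. count_netscape_cookies is a hypothetical helper for local checks, not part of the skill.

```python
import http.cookiejar


def count_netscape_cookies(path: str) -> int:
    """Load a Netscape-format cookies.txt and return how many cookies it holds.

    Raises http.cookiejar.LoadError if the file is not in that format.
    The cookie values are never printed, since they are secrets.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    return len(jar)
```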

Recommended local path (outside the skill folder, so it is never published):

  • ~/.config/yt-transcript/youtube-cookies.txt (chmod 600)

Notes (safety + reliability)

  • Only accept a YouTube URL or an 11-character video ID.
  • Do not forward arbitrary user-provided flags into the command.
  • If yt-dlp is missing, instruct the user to install it (recommended):
    • install pipx
    • pipx install yt-dlp
    • ensure yt-dlp is on PATH
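The first two notes (accept only a YouTube URL or an 11-character ID, and never pass user-supplied flags through) can be enforced by normalising the input to a bare video ID before invoking the script. The sketch below is one possible validator. extract_video_id, the host allow-list, and the ID character set are assumptions, not taken from the skill's code.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed ID alphabet: letters, digits, '_' and '-', exactly 11 characters.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
# Assumed allow-list of YouTube hosts.
_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID, or None if `value` is not acceptable."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or parsed.hostname not in _HOSTS:
        return None
    if parsed.hostname == "youtu.be":
        candidate = parsed.path.lstrip("/")          # short links: /VIDEO_ID
    else:
        candidate = parse_qs(parsed.query).get("v", [""])[0]  # watch?v=VIDEO_ID
    return candidate if _ID_RE.fullmatch(candidate) else None
```

Passing only the validated ID to the command means strings such as "--exec ..." are rejected before they can reach yt-dlp.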

Version history

1 version in total

  • v1.0.5 (current)
    2026-03-28 21:31 Safe Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
