Bilibili Video Summary

Extract and summarize Bilibili videos. Fetches subtitles or GPU-transcribed audio, danmaku (scrolling comments), video comments, and the description, then outputs a structured summary.
Author: gkd2323c. Source: clawhub. Version: v1.0.0 (1 version). Key: none required.

Overview

Bilibili Video Summary Tool

Extract full content from a Bilibili video — transcript/subtitles, danmaku, comments, and description — then use your own LLM capabilities to produce a deep summary. No external AI API required (no OpenAI / Gemini key needed).

Capabilities

| Data Source | Method | Priority / Notes |
|---|---|---|
| CC Subtitles | Bilibili API | Fastest, used if available |
| Audio Transcription | whisper.cpp + Vulkan GPU | Automatic fallback when no subtitles |
| Video Description | yt-dlp | Always captured |
| Danmaku (scrolling comments) | yt-dlp | Parsed, analyzed for frequent content |
| Comments | Bilibili Comment API | Hot-sorted, deduplicated, top liked extracted |
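
The comment handling described in the last row (deduplicate, then keep the most-liked) can be pictured with a minimal sketch. The dictionary keys `text` and `likes` are hypothetical names chosen for illustration; the structure the script actually uses is not documented here.

```python
def top_liked_comments(comments, limit=5):
    """Deduplicate comments by text and return the most-liked ones.

    `comments` is a list of dicts with the hypothetical keys
    "text" and "likes" (assumed, not taken from the script).
    """
    best = {}
    for comment in comments:
        key = comment["text"].strip()
        # For each distinct text, keep the copy with the highest like count.
        if key not in best or comment["likes"] > best[key]["likes"]:
            best[key] = comment
    ranked = sorted(best.values(), key=lambda c: c["likes"], reverse=True)
    return ranked[:limit]
```

Deduplicating before ranking keeps a copy-pasted comment from taking up several of the top slots.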

Workflow

When you receive a Bilibili video link and are asked to summarize it, follow these steps:

Step 1: Extract all data

python bili-transcript.py "<video_url>"

The script automatically:

  1. Gets video title, uploader, duration, description
  2. Attempts Bilibili CC subtitles (fastest, used if available)
  3. Falls back to GPU transcription: download audio → convert to wav → whisper.cpp with Vulkan
  4. Downloads and analyzes danmaku (scrolling comments)
  5. Fetches video comments, sorted by likes
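
The subtitle-first, transcription-second order in steps 2 and 3 can be sketched as below. Both callables are hypothetical stand-ins for whatever the script does internally; only the ordering comes from the list above.

```python
from typing import Callable, Optional, Tuple


def get_transcript(
    fetch_subtitles: Callable[[], Optional[str]],
    transcribe_audio: Callable[[], str],
) -> Tuple[str, str]:
    """Return (text, source), preferring CC subtitles over transcription.

    Both arguments are hypothetical callables used for illustration.
    """
    text = fetch_subtitles()
    if text and text.strip():
        return text, "subtitle"
    # No usable CC subtitles: fall back to the slower GPU transcription path.
    return transcribe_audio(), "gpu"
```

Returning the source alongside the text makes it easy to report later whether the transcript came from subtitles or from GPU transcription.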

Output files are saved to ./bili-output/:

  • transcript.txt — full transcript/subtitle text
  • danmaku.json — danmaku data with statistics
  • comments.json — comment data with top-liked

The script's JSON output includes a transcript preview, a danmaku summary, and the top comments.
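
Before moving on, it can help to confirm that the three files listed above were actually written. A minimal sketch, assuming the default `./bili-output` directory; the helper name `missing_outputs` is made up for this example.

```python
from pathlib import Path

# File names as listed in the output description above.
EXPECTED_FILES = ("transcript.txt", "danmaku.json", "comments.json")


def missing_outputs(output_dir="./bili-output"):
    """Return the expected output files that are absent or empty."""
    root = Path(output_dir)
    missing = []
    for name in EXPECTED_FILES:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

A non-empty result is not always an error: a run limited to a single action would presumably leave some of these files out.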

Step 2: Read full transcript

The transcript preview in the JSON output is truncated at 2000 characters, so read the full file:

cat ./bili-output/transcript.txt
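
The same step in Python, as a rough sketch. UTF-8 encoding is an assumption, and the 2000-character limit is the preview length stated above.

```python
from pathlib import Path

PREVIEW_LIMIT = 2000  # preview length stated in the text above


def read_transcript(output_dir="./bili-output"):
    """Return the full transcript text from transcript.txt."""
    path = Path(output_dir) / "transcript.txt"
    # UTF-8 is assumed here; the script's actual encoding is not documented.
    return path.read_text(encoding="utf-8")


def preview_is_truncated(full_text, limit=PREVIEW_LIMIT):
    """True when the full text is longer than the preview can hold."""
    return len(full_text) > limit
```

If `preview_is_truncated(read_transcript())` is true, a summary written from the preview alone would miss part of the video.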

Step 3: Read danmaku and comments

Review community response data:

cat ./bili-output/danmaku.json
cat ./bili-output/comments.json
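
Since the layout of these two JSON files is not documented here, a cautious first step is to load them and look at their top-level shape before relying on any field. A minimal sketch with hypothetical helper names:

```python
import json
from pathlib import Path


def load_json(name, output_dir="./bili-output"):
    """Parse one of the JSON output files (UTF-8 assumed)."""
    path = Path(output_dir) / name
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def describe(data):
    """Summarize the top-level shape of a parsed JSON value."""
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return type(data).__name__
```

For example, `print(describe(load_json("danmaku.json")))` shows which keys are available without assuming any of them in advance.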

Step 4: Compose your summary

Use your own LLM capabilities to produce a comprehensive summary. Suggested structure:

Video Overview — Title, uploader, duration, transcription source (subtitle / GPU). Key info from the description (project links, update notes, etc.).

Core Content — What the video is about. Fluent paragraph summary of the main narrative.

Key Points — Notable arguments, data points, or information worth highlighting.

Community Response (optional) — Reactions from danmaku and comments. Skip if content is insubstantial (spam, trolling, no valuable discussion).

  • Danmaku analysis: look for frequently repeated phrases (community memes/reactions), informative questions, technical discussions, controversy points
  • Comment analysis: look for top-liked opinions, creator interactions, user-reported issues, technical insights
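
Finding frequently repeated danmaku is essentially a counting problem. The sketch below assumes the danmaku texts have already been pulled out of `danmaku.json` into a plain list of strings; how to get them out depends on the file's structure, which is not specified here.

```python
from collections import Counter


def frequent_danmaku(texts, min_count=3, limit=10):
    """Return (text, count) pairs for danmaku seen at least min_count times."""
    # Strip whitespace so that trivially different copies are counted together.
    counts = Counter(t.strip() for t in texts if t.strip())
    return [(t, n) for t, n in counts.most_common(limit) if n >= min_count]
```

The `min_count` threshold is arbitrary; a busy video may need a higher value to separate real community reactions from noise.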

Assessment (optional) — Content quality, information density, notable strengths or weaknesses.

Available Actions

# Video metadata only
python bili-transcript.py "<URL>" --action info

# CC subtitles only (if available)
python bili-transcript.py "<URL>" --action subtitle

# Force GPU transcription (skip subtitle check)
python bili-transcript.py "<URL>" --action transcribe

# Danmaku only
python bili-transcript.py "<URL>" --action danmaku

# Comments only
python bili-transcript.py "<URL>" --action comments

# Custom output directory
python bili-transcript.py "<URL>" --output ./my-output
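
When calling the script from Python rather than a shell, the commands above can be assembled as argument lists. This is a sketch: `build_command` is a hypothetical helper, and combining `--action` with `--output` in one call is an assumption that the examples above do not confirm.

```python
import sys

# Action names taken from the examples above.
ACTIONS = {"info", "subtitle", "transcribe", "danmaku", "comments"}


def build_command(url, action=None, output=None, script="bili-transcript.py"):
    """Build the argv list for one invocation of the script."""
    cmd = [sys.executable, script, url]
    if action is not None:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        cmd += ["--action", action]
    if output is not None:
        cmd += ["--output", output]
    return cmd
```

The list can be passed to `subprocess.run(build_command(url, action="info"), check=True)`; using a list instead of a shell string avoids quoting problems with URLs that contain `&` or `?`.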

Environment Variables

| Variable | Purpose |
|---|---|
| WHISPER_CPP_DIR | Path to whisper.cpp directory (containing whisper-cli) |
| WHISPER_MODEL | Path to whisper model file (e.g., ggml-large-v3-turbo.bin) |
| BILI_OUTPUT_DIR | Default output directory (default: ./bili-output) |
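
A wrapper can read the same variables to check the setup before starting a long run. In this sketch, the idea that transcription needs both whisper variables set is an assumption; the script may have its own defaults.

```python
import os


def load_settings(env=None):
    """Collect the three documented variables into a dict."""
    env = os.environ if env is None else env
    return {
        "whisper_cpp_dir": env.get("WHISPER_CPP_DIR"),
        "whisper_model": env.get("WHISPER_MODEL"),
        # Documented default for the output directory.
        "output_dir": env.get("BILI_OUTPUT_DIR", "./bili-output"),
    }


def can_transcribe(settings):
    """Assumption: transcription needs both whisper paths to be set."""
    return bool(settings["whisper_cpp_dir"] and settings["whisper_model"])
```

Passing a plain dict as `env` makes the function easy to try out without touching the real environment.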

Performance Reference

| Video Length / Task | Total Time | Notes |
|---|---|---|
| 5 minutes | ~15 s | GPU transcription is fast |
| 12 minutes | ~22 s | Download + convert + transcribe |
| 1 hour | ~2-3 min | Depends on audio density |
| Danmaku/Comments | ~5-10 s | Depends on comment volume |
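
The reference figures translate into a rough speed-up over real time. The arithmetic below uses only the numbers in the table; taking 2.5 minutes as the midpoint of the 1-hour row is an assumption made for the calculation.

```python
# (video length in seconds, total processing time in seconds)
REFERENCE = [
    (5 * 60, 15),     # 5 minutes  -> ~15 s
    (12 * 60, 22),    # 12 minutes -> ~22 s
    (60 * 60, 150),   # 1 hour     -> ~2-3 min, midpoint 2.5 min assumed
]


def speedup(video_seconds, processing_seconds):
    """How many times faster than real time the processing ran."""
    return video_seconds / processing_seconds


ratios = [round(speedup(v, p), 1) for v, p in REFERENCE]
# ratios == [20.0, 32.7, 24.0]
```

So the quoted timings correspond to roughly 20x to 33x real time on the hardware behind these measurements, which the table does not specify.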

Dependencies

  • Python packages: yt-dlp, av (PyAV)
  • Transcription engine: whisper.cpp with Vulkan support (optional, only needed if no CC subtitles)
  • Model: ggml-large-v3-turbo.bin (~1.6GB, download separately)
  • GPU: Any Vulkan-compatible GPU (NVIDIA, AMD, Intel) — auto-detected
  • No external AI API keys required

Limitations

  • Requires internet access to Bilibili
  • Some content requires login (paid courses, restricted videos) — may fail
  • Danmaku and comment APIs may be rate-limited
  • whisper.cpp does not accept m4a input; the script converts the audio to WAV via PyAV automatically
  • Very long videos (>2 hours) take significant transcription time; try --action subtitle first
  • Comments are fetched from the first 3 pages only (~60 comments), so videos with very large comment sections are not fully covered
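
For the rate-limiting case, a caller can wrap a fetch in a retry with exponential backoff. This is a generic sketch, not a description of what the script does internally; `with_retries` and its parameters are hypothetical.

```python
import time


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**n seconds and retry."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a call that keeps failing waits 1 s and then 2 s before the third and final attempt. The `sleep` parameter exists so the delay can be replaced in tests.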

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-09 17:02, marked safe

