Cross-platform video transcript extraction and optional AI summarization for YouTube and Bilibili.
Extract transcripts, metadata, and optional keyframes from YouTube and Bilibili videos. The tool writes structured JSON to stdout. By default it performs no LLM summarization: the agent receives the full transcript and summarizes it using its own full context window.
Supports: macOS, Linux, WSL, Windows VM.
# Single video — transcript only (default, recommended)
video-insight --url "https://www.youtube.com/watch?v=VIDEO_ID"
# Bilibili video
video-insight --url "https://www.bilibili.com/video/BV1xxxxx"
# With LLM summary (opt-in)
video-insight --url "https://..." --summarize
# Channel scan (recent videos)
video-insight --channel "UC_x5XG1OV2P6uZZ5FSM9Ttw" --hours 24
# Quiet mode (no stderr progress)
video-insight --url "https://..." --quiet
# Force refresh (ignore cache)
video-insight --url "https://..." --no-cache
# Extract keyframes too
video-insight --url "https://..." --frames
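The invocations above can also be driven from a script. The following is a minimal Python sketch, assuming `video-insight` is on `PATH` and prints its JSON to stdout as described; the helper names `build_command` and `run_video_insight` are hypothetical and not part of the tool.

```python
import json
import subprocess


def build_command(url, summarize=False, quiet=False, no_cache=False, frames=False):
    """Assemble the argv list for one video, using only the flags shown above."""
    cmd = ["video-insight", "--url", url]
    if summarize:
        cmd.append("--summarize")
    if quiet:
        cmd.append("--quiet")
    if no_cache:
        cmd.append("--no-cache")
    if frames:
        cmd.append("--frames")
    return cmd


def run_video_insight(url, **flags):
    """Run the CLI and decode the JSON document it prints to stdout.

    Assumption: the tool is installed and on PATH. The exit code is not
    inspected here because its meaning is not documented in this text.
    """
    result = subprocess.run(build_command(url, **flags), capture_output=True, text=True)
    return json.loads(result.stdout)
```

Building the argument list separately from running it keeps the flag logic easy to check without network access or an installed binary.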
Keywords: Summarize video, extract transcript, YouTube summary, Bilibili transcript, video transcript, 视频摘要 (video abstract), 视频总结 (video summary), B站视频 (Bilibili video), YouTube视频 (YouTube video)
{
  "ok": true,
  "data": {
    "video_id": "dQw4w9WgXcQ",
    "platform": "youtube",
    "title": "Video Title",
    "channel": "Channel Name",
    "duration_seconds": 212,
    "transcript": "Full transcript text without truncation...",
    "transcript_with_timestamps": "[0.0-3.2] First segment\n[3.2-6.5] Second...",
    "frames": [{"file": "/tmp/.../frame_001.jpg", "time_sec": 30}],
    "cached": false
  },
  "error": null
}
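A consumer of this output would typically check `ok` before touching `data`. The sketch below shows one way to do that in Python and to split `transcript_with_timestamps` into segments. It assumes that a failed run sets `ok` to false and puts a message in `error`, and that every segment line follows the `[start-end] text` pattern shown in the sample; neither is confirmed beyond the example above. The names `unwrap` and `parse_segments` are hypothetical.

```python
import json
import re

# Matches lines such as "[0.0-3.2] First segment".
SEGMENT_RE = re.compile(r"^\[(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)\]\s*(.*)$")


def unwrap(raw):
    """Return the "data" object, or raise if the envelope reports failure."""
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        # Assumption: "error" carries a human-readable message on failure.
        raise RuntimeError(f"video-insight failed: {envelope.get('error')}")
    return envelope["data"]


def parse_segments(timestamped):
    """Split the timestamped transcript into (start, end, text) tuples."""
    segments = []
    for line in timestamped.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match:  # lines that do not follow the pattern are skipped
            start, end, text = match.groups()
            segments.append((float(start), float(end), text))
    return segments
```

Segment tuples make it straightforward to quote a time range alongside a summary point.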
Transcripts are permanently cached at ~/.cache/video-insight/{platform}_{video_id}.json. The .json format stores metadata + transcript together for richer cache hits (title, channel, duration, timestamps). Use --no-cache to force re-fetch.
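To inspect a cache entry directly, the path can be derived from the naming pattern above. This is a rough Python sketch; `cache_path` and `load_cached` are hypothetical helpers, and the layout of the stored JSON is not specified here, so the loaded object is returned as-is.

```python
import json
from pathlib import Path

# Cache directory taken from the pattern described above.
CACHE_DIR = Path.home() / ".cache" / "video-insight"


def cache_path(platform, video_id, cache_dir=CACHE_DIR):
    """Build the cache file path: {platform}_{video_id}.json."""
    return Path(cache_dir) / f"{platform}_{video_id}.json"


def load_cached(platform, video_id, cache_dir=CACHE_DIR):
    """Return the cached JSON object, or None when no entry exists."""
    path = cache_path(platform, video_id, cache_dir)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `cache_path("youtube", "dQw4w9WgXcQ")` points at `youtube_dQw4w9WgXcQ.json` inside the cache directory.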
Default workflow: run video-insight --url "https://...", then read the JSON with the full transcript. Summarize it in your own LLM context; you have 128K+ tokens, so the script has no need to truncate.
Keyframes: pass the --frames flag only when the user explicitly asks for a visual/image review.
Bilibili and captionless videos: transcription relies on ffmpeg and faster-whisper (installed via setup.sh). YouTube videos typically have captions and are much faster.
Automated pipelines: use --summarize --quiet.
Setup: cd ~/.openclaw/skills/video-insight && bash setup.sh
Required: yt-dlp, youtube-transcript-api, innertube, ffmpeg (system)
Optional: faster-whisper (for Bilibili/captionless videos), requests (for --summarize)
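A quick pre-flight check can report which of these dependencies are missing. The Python sketch below assumes the Python packages are importable under their usual module names (`yt_dlp`, `youtube_transcript_api`, `innertube`, `faster_whisper`, `requests`) and that `ffmpeg` is an executable on `PATH`. How the tool itself locates them is not stated here, and `missing_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ["yt_dlp", "youtube_transcript_api", "innertube"]
OPTIONAL_MODULES = ["faster_whisper", "requests"]
SYSTEM_EXECUTABLES = ["ffmpeg"]


def _module_available(name):
    """True when the module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def missing_dependencies(module_check=_module_available, executable_check=shutil.which):
    """Return the names that could not be found, grouped by importance."""
    required = [m for m in REQUIRED_MODULES if not module_check(m)]
    required += [e for e in SYSTEM_EXECUTABLES if not executable_check(e)]
    optional = [m for m in OPTIONAL_MODULES if not module_check(m)]
    return {"required": required, "optional": optional}
```

The two check functions are parameters so the logic can be exercised without changing the local environment.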