Video News Downloader

Automated daily news video downloader with AI subtitle proofreading. Downloads CBS Evening News and BBC News at Ten from YouTube, then extracts and proofreads the subtitles.
cyberpsychosissss
#automation #bbc #cbs #cron #latest #news #subtitles #video #youtube

Video News Downloader with AI Subtitle Proofreading

Complete workflow for downloading daily news videos, processing subtitles, and serving them via HTTP with web players.

Overview

This skill automates:

  1. Video Download: CBS Evening News + BBC News at Ten from YouTube
  2. Subtitle Processing: Extract auto-captions and convert to VTT format
  3. AI Proofreading: Use DeepSeek to fix speech recognition errors
  4. HTTP Streaming: Serve videos with embedded web players
  5. Scheduled Updates: Daily cron jobs at configurable times

Quick Start

1. Download Latest News

python3 scripts/video_download.py --cbs --bbc

2. Proofread Subtitles

python3 scripts/subtitle_proofreader.py /path/to/subtitle.vtt

Or use DeepSeek directly:

> "Proofread the subtitle file /path/to/subtitle.vtt"

3. Start HTTP Servers

bash scripts/setup_server.sh

4. Setup Daily Cron Jobs

bash scripts/setup_cron.sh

Commands

Video Download Script

Download CBS only:

python3 scripts/video_download.py --cbs

Download BBC only:

python3 scripts/video_download.py --bbc

Download both:

python3 scripts/video_download.py --cbs --bbc

With subtitle proofreading:

python3 scripts/video_download.py --cbs --bbc --proofread
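
The internals of scripts/video_download.py are not shown in this document. As a rough sketch of how a single download with English auto-captions could be driven from Python, assuming yt-dlp is on the PATH: the helper names `build_ytdlp_command` and `download`, the example URL, and the exact flag selection are illustrative assumptions, not the script's actual code.

```python
import subprocess
from pathlib import Path


def build_ytdlp_command(url: str, out_dir: str, basename: str) -> list[str]:
    """Build a yt-dlp command line (hypothetical helper, not from the skill)."""
    return [
        "yt-dlp",
        "--write-auto-subs",              # also fetch auto-generated captions
        "--sub-langs", "en",              # English captions only
        "--convert-subs", "vtt",          # store captions as WebVTT
        "--merge-output-format", "mp4",   # merge video and audio into MP4
        "-o", str(Path(out_dir) / f"{basename}.%(ext)s"),
        url,
    ]


def download(url: str, out_dir: str, basename: str) -> None:
    """Run yt-dlp and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_ytdlp_command(url, out_dir, basename), check=True)


# Example (assumed URL placeholder):
# download("https://www.youtube.com/watch?v=VIDEO_ID",
#          "/workspace/cbs-live-local", "cbs_latest")
```

With this output template, the video would land at cbs_latest.mp4 and the captions at cbs_latest.en.vtt, which matches the names in the File Structure section.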

Subtitle Proofreading

Proofread single file:

python3 scripts/subtitle_proofreader.py <vtt_file_path>

Auto-proofread all news subtitles:

python3 scripts/subtitle_proofreader.py --all
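
Before a model can proofread captions, the cue text has to be separated from the timing lines. The sketch below shows one way to do that, under the assumption that the input looks like typical YouTube auto-captions (inline timing tags and repeated lines). The function name `vtt_to_lines` is hypothetical, and the real subtitle_proofreader.py may work differently.

```python
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:03.000" (hours are optional in WebVTT).
TIMING = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# Inline tags such as <c> or <00:00:01.500> found in auto-captions.
TAG = re.compile(r"<[^>]+>")


def vtt_to_lines(vtt_text: str) -> list[str]:
    """Return cue text lines without tags, skipping consecutive duplicates."""
    lines: list[str] = []
    in_cue = False
    for raw in vtt_text.splitlines():
        line = raw.strip()
        if TIMING.match(line):
            in_cue = True          # text lines that follow belong to this cue
            continue
        if not line:
            in_cue = False         # a blank line ends the cue
            continue
        if not in_cue:
            continue               # header, metadata, or cue identifier
        text = TAG.sub("", line).strip()
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return lines
```

Joining the returned lines with spaces gives plain text suitable for sending to the proofreading model.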

Server Management

Start servers:

bash scripts/setup_server.sh start

Check status:

bash scripts/setup_server.sh status

Stop servers:

bash scripts/setup_server.sh stop
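
What setup_server.sh does internally is not shown here. The Troubleshooting section mentions Python's http.server, so a minimal sketch of serving one video directory on one port could look like this. `make_server` is a hypothetical helper, and the directory and port in the comment are taken from the File Structure and HTTP Endpoints sections. Note that the Python documentation does not recommend http.server for production use.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, port: int) -> ThreadingHTTPServer:
    """Create a static file server for `directory` on all interfaces (illustrative)."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("0.0.0.0", port), handler)


# Example (blocks until interrupted):
# make_server("/workspace/cbs-live-local", 8093).serve_forever()
```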

File Structure

/workspace/
├── cbs-live-local/
│   ├── cbs_latest.mp4
│   ├── cbs_latest.en.vtt          # Original subtitle
│   ├── cbs_latest.en.vtt-backup   # Backup
│   ├── cbs_latest-corrected.txt   # DeepSeek corrected text
│   └── cbs_latest-corrections.md  # Error list
│
├── bbc-news-live/
│   ├── bbc_news_latest.mp4
│   ├── bbc_news_latest.en.vtt
│   ├── bbc_news_latest.en.vtt-backup
│   ├── bbc_news_latest-corrected.txt
│   └── bbc_news_latest-corrections.md
│
└── temp/                           # Temporary download files

HTTP Endpoints

Endpoint                               Description
-------------------------------------  -----------------------
http://IP:8093/                        CBS Evening News player
http://IP:8093/cbs_latest.mp4          CBS video direct
http://IP:8095/                        BBC News at Ten player
http://IP:8095/bbc_news_latest.mp4     BBC video direct
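
The player page served at the root of each port is not reproduced in this document. As an illustration only, a page that plays the MP4 with its VTT captions can be generated with the standard HTML video and track elements. `render_player` and the page layout are assumptions, not the skill's actual markup.

```python
from html import escape


def render_player(title: str, video_file: str, subtitle_file: str) -> str:
    """Return a minimal HTML page with a video and an English subtitle track."""
    return f"""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>{escape(title)}</title></head>
<body>
  <h1>{escape(title)}</h1>
  <video controls width="960">
    <source src="{escape(video_file)}" type="video/mp4">
    <track kind="subtitles" srclang="en" label="English" src="{escape(subtitle_file)}" default>
  </video>
</body>
</html>
"""


# Example: write the result to index.html next to the video file.
# page = render_player("CBS Evening News", "cbs_latest.mp4", "cbs_latest.en.vtt")
```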

Cron Jobs

Default Schedule (Beijing Time)

Time   Task
-----  --------------------------------
20:00  Download latest CBS + BBC videos
20:30  DeepSeek proofread subtitles
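
The schedule is given in Beijing time, while cron uses the host's local clock. If the host runs on UTC (an assumption; check with `date`), the times need shifting by eight hours. The sketch below builds the two crontab lines for that case. `beijing_to_utc` and `cron_line` are hypothetical helpers, and the commands are the ones from the Commands section, assumed to run from the skill's root directory.

```python
from datetime import datetime, time, timedelta, timezone

# China Standard Time is UTC+8 all year (no daylight saving time).
BEIJING = timezone(timedelta(hours=8))


def beijing_to_utc(hhmm: str) -> time:
    """Convert a Beijing wall-clock time 'HH:MM' to the matching UTC time."""
    local = datetime.strptime(hhmm, "%H:%M").replace(tzinfo=BEIJING)
    return local.astimezone(timezone.utc).time()


def cron_line(hhmm_beijing: str, command: str) -> str:
    """Build a daily crontab entry for a host whose clock is set to UTC."""
    t = beijing_to_utc(hhmm_beijing)
    return f"{t.minute} {t.hour} * * * {command}"


print(cron_line("20:00", "python3 scripts/video_download.py --cbs --bbc"))
print(cron_line("20:30", "python3 scripts/subtitle_proofreader.py --all"))
```

On a host already set to Beijing time, the hour and minute from the table can be used directly.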

Manual Cron Setup

See references/cron-setup.md for detailed cron configuration.

DeepSeek Proofreading

What Gets Fixed

  • Speech recognition errors (e.g., "noraster" → "nor'easter")
  • Name errors (e.g., "trunk" → "Trump")
  • Location name errors
  • Professional terminology errors
  • Obvious spelling mistakes
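
How the skill records its corrections is not documented here. One simple way to derive a list of changes from the original and corrected text is a word-level diff with the standard library's difflib. `list_corrections` is a hypothetical name and this is a sketch, not the skill's implementation.

```python
from difflib import SequenceMatcher


def list_corrections(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for every word span that differs."""
    a, b = original.split(), corrected.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


pairs = list_corrections(
    "A noraster is heading for the coast as trunk speaks",
    "A nor'easter is heading for the coast as Trump speaks",
)
for before, after in pairs:
    print(f"{before} -> {after}")
```

Each pair could then be written as one entry in the corrections file.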

Output Files

For each subtitle file, generates:

  1. .vtt-backup - Backup copy of the original subtitle (never modified)
  2. -corrected.txt - AI-corrected plain text
  3. -corrections.md - List of corrections made
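
To make the naming concrete, the sketch below derives the three output paths from a subtitle path, following the names shown in the File Structure tree (for example cbs_latest.en.vtt-backup and cbs_latest-corrected.txt). `output_paths` is a hypothetical helper. It assumes the base name is everything before the first dot.

```python
from pathlib import Path


def output_paths(vtt_path: str) -> dict[str, Path]:
    """Map a subtitle path to its backup, corrected-text, and corrections paths."""
    vtt = Path(vtt_path)
    stem = vtt.name.split(".")[0]  # "cbs_latest.en.vtt" -> "cbs_latest"
    return {
        "backup": vtt.with_name(vtt.name + "-backup"),
        "corrected": vtt.with_name(f"{stem}-corrected.txt"),
        "corrections": vtt.with_name(f"{stem}-corrections.md"),
    }


paths = output_paths("/workspace/cbs-live-local/cbs_latest.en.vtt")
for kind, path in paths.items():
    print(kind, path.name)
```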

Troubleshooting

Video Download Fails

  • Check yt-dlp is installed: yt-dlp --version
  • Check YouTube URL is accessible
  • Try a manual download first (yt-dlp <video_url>) to rule out a problem in the script

Subtitle Extraction Fails

  • Some videos don't have auto-captions
  • Run yt-dlp --list-subs <video_url> to see which subtitle languages are available

Server Won't Start

  • Check ports 8093/8095 are free: lsof -i :8093
  • Check that Python's http.server module is available: python3 -m http.server --help
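
Where lsof is not installed, the port check can be approximated from Python by attempting to bind the port. This is a sketch. `is_port_free` is a hypothetical helper, and a successful bind only shows the port was free at that moment.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


# Example: check both ports used by the servers.
# for port in (8093, 8095):
#     print(port, "free" if is_port_free(port) else "in use")
```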

Proofreading Issues

  • Ensure DeepSeek model is available
  • Check subtitle file exists and is valid VTT format
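
A quick sanity check for the subtitle file can be sketched as follows: a WebVTT file begins with the string WEBVTT (optionally after a byte order mark), and a file worth proofreading should contain at least one cue timing line. `looks_like_vtt` is a hypothetical helper, not a full validator.

```python
def looks_like_vtt(text: str) -> bool:
    """Heuristic check: WEBVTT header present and at least one '-->' timing line."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        return False
    return any("-->" in line for line in lines[1:])


# Example:
# from pathlib import Path
# text = Path("/workspace/cbs-live-local/cbs_latest.en.vtt").read_text(encoding="utf-8")
# print(looks_like_vtt(text))
```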

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 12:53, marked safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk