
HN Podcast Transcriber

Automatically fetch, transcribe, and archive Hacker News podcast episodes (Hacker News Morning Brief). Use when the user wants to set up a podcast transcript...
Author: terrycarter1985
clawhub v1.0.0 (1 version). Key: not required

Overview


Fetch new episodes from the Hacker News Morning Brief podcast RSS feed, transcribe with Whisper, and archive as searchable markdown.

Prerequisites

  • whisper CLI installed (pip install openai-whisper)
  • ffmpeg on PATH (required by whisper; download from https://ffmpeg.org)
  • python3 with standard library (no extra deps for the fetch script)
  • Disk space for audio files (~5-10 MB per episode)
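
A quick way to confirm the tools listed above are reachable before the first run is to look each one up on PATH. The following is a minimal sketch, not part of the skill itself; the helper name `missing_tools` is hypothetical.

```python
import shutil

# Executables named in the prerequisites list above.
REQUIRED_TOOLS = ("whisper", "ffmpeg", "python3")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` returns an empty list when everything is installed; any names it returns still need to be installed or added to PATH.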

Quick Start

Run the main script to fetch and transcribe all new episodes:

bash scripts/fetch_and_transcribe.sh --archive ~/hn-podcast-archive

First run processes all episodes. Subsequent runs only process new ones (tracked via state.json).
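
The incremental behaviour can be pictured as a small set of processed episode IDs persisted between runs. The sketch below assumes a `state.json` with a single `processed` list; the real schema is documented in references/archive-layout.md and may differ, and all function names here are hypothetical.

```python
import json
from pathlib import Path


def load_processed(state_path):
    """Return the set of episode IDs recorded in the state file (empty if absent)."""
    path = Path(state_path)
    if not path.exists():
        return set()
    data = json.loads(path.read_text(encoding="utf-8"))
    return set(data.get("processed", []))  # "processed" is an assumed key name


def select_new(episode_ids, processed, limit=0):
    """Keep unprocessed IDs in feed order; limit=0 means no cap (as with --limit)."""
    new = [eid for eid in episode_ids if eid not in processed]
    return new[:limit] if limit > 0 else new


def save_processed(state_path, processed):
    """Write the processed IDs back, sorted so the file is stable across runs."""
    path = Path(state_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"processed": sorted(processed)}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
```

On a first run `load_processed` returns an empty set, so every episode is selected; later runs select only IDs that are not yet in the file.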

Options

| Flag | Default | Description |
| --- | --- | --- |
| --feed URL | HN Morning Brief RSS | Podcast RSS feed URL |
| --archive DIR | ./hn-podcast-archive | Archive root directory |
| --model MODEL | turbo | Whisper model (tiny/base/small/medium/large/turbo) |
| --limit N | 0 (all) | Max new episodes to process per run |
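
For readers who want to drive the same options from Python, the flags and defaults above map onto `argparse` roughly as follows. This is an illustration only: the shipped script is a bash script, `build_parser` is a hypothetical name, and the default feed URL is left unset because this document does not state it.

```python
import argparse

# Model names taken from the options table above.
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large", "turbo")


def build_parser():
    """Mirror the documented flags and defaults in an argparse parser."""
    parser = argparse.ArgumentParser(prog="fetch_and_transcribe")
    # The real default feed URL is not given in this document, so it is left unset.
    parser.add_argument("--feed", metavar="URL", default=None,
                        help="Podcast RSS feed URL")
    parser.add_argument("--archive", metavar="DIR", default="./hn-podcast-archive",
                        help="Archive root directory")
    parser.add_argument("--model", metavar="MODEL", choices=WHISPER_MODELS,
                        default="turbo", help="Whisper model")
    parser.add_argument("--limit", metavar="N", type=int, default=0,
                        help="Max new episodes to process per run (0 = all)")
    return parser
```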

Custom Feeds

Point at any podcast RSS feed:

bash scripts/fetch_and_transcribe.sh --feed "https://example.com/podcast/feed.xml" --archive ./my-podcast-archive
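
Any feed that follows the usual RSS 2.0 layout should work, because the only fields needed per episode are an identifier, a title, a date, and the audio enclosure. The sketch below shows one way to extract them with the standard library; it is not the skill's actual fetch script, and `fetch_feed` and `parse_episodes` are hypothetical names.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url, timeout=30):
    """Download the raw feed bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def parse_episodes(rss_xml):
    """Extract id, title, date, and audio URL from each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # no audio attached, nothing to transcribe
        audio_url = enclosure.get("url")
        episodes.append({
            # Fall back to the audio URL when the feed omits <guid>.
            "id": (item.findtext("guid") or audio_url).strip(),
            "title": (item.findtext("title") or "Untitled").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
            "audio_url": audio_url,
        })
    return episodes
```

`parse_episodes(fetch_feed(url))` then yields one dictionary per episode that has an audio enclosure, in feed order.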

Scheduling

To check for new episodes daily, schedule the script through OpenClaw in one of two ways:

  • Create an isolated cron job that runs the script
  • Add a heartbeat check in HEARTBEAT.md

Archive Structure

See references/archive-layout.md for directory layout and state.json schema.

Workflow Summary

  1. Download RSS feed → parse entries
  2. Skip already-processed episodes (state.json lookup)
  3. Download audio (mp3/m4a) to episode directory
  4. Run whisper to produce .txt transcript
  5. Generate cleaned transcript.md with title + date header
  6. Update state.json with processed episode ID
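
Steps 4 and 5 above can be sketched as two small helpers: one that builds the whisper CLI call and one that wraps the raw text in a markdown header. The exact cleaning rules and header layout used by the skill are not described here, so the formatting below is an assumption, and both function names are hypothetical.

```python
def whisper_command(audio_path, out_dir, model="turbo"):
    """Build the whisper CLI call that writes <audio stem>.txt into out_dir."""
    return [
        "whisper", str(audio_path),
        "--model", model,
        "--output_format", "txt",
        "--output_dir", str(out_dir),
    ]


def render_transcript_md(title, date, raw_text):
    """Wrap raw whisper text in markdown with a title + date header.

    Assumed cleaning: drop blank lines, trim whitespace, and join the
    remaining lines into a single paragraph.
    """
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    body = " ".join(lines)
    return f"# {title}\n\n*{date}*\n\n{body}\n"
```

The command list can be passed to `subprocess.run(..., check=True)`; the resulting `.txt` file is then read and handed to `render_transcript_md` to produce `transcript.md`.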

Notes

  • Whisper models cache to ~/.cache/whisper after first download
  • Use --model tiny for speed, --model large for best accuracy
  • Average episode (~6 min) takes ~1-2 min with turbo model on CPU
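  • For GPU acceleration, install a CUDA-enabled build of PyTorch; whisper runs on the GPU when PyTorch detects one, while ffmpeg is only used to decode the audio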

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-08 13:23, security check: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
