HN Podcast Transcriber

Fetch new episodes from the Hacker News Morning Brief podcast RSS feed, transcribe with Whisper, and archive as searchable markdown.

Prerequisites

whisper CLI installed (pip install openai-whisper)
ffmpeg on PATH (required by whisper; download from https://ffmpeg.org)
python3 with standard library (no extra deps for the fetch script)
Disk space for audio files (~5-10 MB per episode)

Quick Start

Run the main script to fetch and transcribe all new episodes:

bash scripts/fetch_and_transcribe.sh --archive ~/hn-podcast-archive

First run processes all episodes. Subsequent runs only process new ones (tracked via state.json).
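
The incremental behaviour can be pictured with a minimal Python sketch. The real schema of state.json is described in references/archive-layout.md; the `processed` key and the helper names below are assumptions made purely for illustration, not the script's actual code.

```python
import json
from pathlib import Path


def load_processed(state_path: Path) -> set:
    """Return the episode IDs already recorded, or an empty set on the first run."""
    if not state_path.exists():
        return set()
    data = json.loads(state_path.read_text(encoding="utf-8"))
    # "processed" is a hypothetical key; the real schema may differ.
    return set(data.get("processed", []))


def mark_processed(state_path: Path, episode_id: str) -> None:
    """Record one episode ID, writing through a temp file before renaming."""
    processed = load_processed(state_path)
    processed.add(episode_id)
    state_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = state_path.with_suffix(".tmp")
    tmp_path.write_text(
        json.dumps({"processed": sorted(processed)}, indent=2),
        encoding="utf-8",
    )
    tmp_path.replace(state_path)  # rename over the old state file


def select_new(episode_ids: list, processed: set, limit: int = 0) -> list:
    """Drop processed IDs; limit=0 means no cap, mirroring --limit."""
    new_ids = [eid for eid in episode_ids if eid not in processed]
    return new_ids[:limit] if limit > 0 else new_ids
```

Writing to a temporary file and renaming it keeps an interrupted run from leaving a half-written state file behind, which would otherwise cause every episode to be reprocessed.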

Options

| Flag | Default | Description |
|------|---------|-------------|
| `--feed URL` | HN Morning Brief RSS | Podcast RSS feed URL |
| `--archive DIR` | `./hn-podcast-archive` | Archive root directory |
| `--model MODEL` | `turbo` | Whisper model (tiny/base/small/medium/large/turbo) |
| `--limit N` | 0 (all) | Max new episodes to process per run |

Custom Feeds

Point at any podcast RSS feed:

bash scripts/fetch_and_transcribe.sh --feed "https://example.com/podcast/feed.xml" --archive ./my-podcast-archive

Scheduling

Run the script daily through OpenClaw, using either option:

Create an isolated cron job that runs the script
Add a heartbeat check in HEARTBEAT.md

Archive Structure

See references/archive-layout.md for directory layout and state.json schema.

Workflow Summary

Download RSS feed → parse entries
Skip already-processed episodes (state.json lookup)
Download audio (mp3/m4a) to episode directory
Run whisper to produce .txt transcript
Generate cleaned transcript.md with title + date header
Update state.json with processed episode ID
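
The feed-parsing and transcript-formatting steps can be sketched with the Python standard library alone. This assumes an RSS 2.0 feed whose `<item>` elements carry `<guid>`, `<title>`, `<pubDate>`, and an `<enclosure url="...">`; the function names and the exact header layout of transcript.md are illustrative guesses rather than the script's actual behaviour.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from email.utils import parsedate_to_datetime


@dataclass
class Episode:
    episode_id: str
    title: str
    date: str       # ISO date such as "2026-05-08", or "unknown"
    audio_url: str


def parse_feed(rss_xml: str) -> list:
    """Extract every item that has an audio enclosure from RSS 2.0 text."""
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # no audio attached, so not an episode
        audio_url = enclosure.get("url")
        # Fall back to the audio URL when the feed omits <guid>.
        episode_id = (item.findtext("guid") or audio_url).strip()
        title = (item.findtext("title") or "Untitled").strip()
        pub_date = (item.findtext("pubDate") or "").strip()
        if pub_date:
            date = parsedate_to_datetime(pub_date).date().isoformat()
        else:
            date = "unknown"
        episodes.append(Episode(episode_id, title, date, audio_url))
    return episodes


def render_transcript_md(episode: Episode, raw_text: str) -> str:
    """Build transcript.md: title and date header, then tidied paragraphs."""
    # Collapse runs of whitespace inside each blank-line-separated paragraph.
    paragraphs = [" ".join(p.split()) for p in raw_text.split("\n\n")]
    body = "\n\n".join(p for p in paragraphs if p)
    return f"# {episode.title}\n\nDate: {episode.date}\n\n{body}\n"
```

Items without an enclosure are skipped because some feeds mix announcements with episodes, and the `<guid>` fallback keeps the episode ID stable even for sparse feeds.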

Notes

Whisper models cache to ~/.cache/whisper after first download
Use --model tiny for speed, --model large for best accuracy
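An average episode (~6 min) takes ~1-2 min to transcribe with the turbo model on CPU
For GPU acceleration, install a CUDA-enabled build of PyTorch; whisper uses the GPU automatically when one is available (ffmpeg only decodes the audio)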
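
For reference, the transcription step amounts to one whisper CLI call per audio file. The sketch below assembles such a call in Python; `--model`, `--output_format`, `--output_dir`, and `--device` are existing whisper CLI options, but the function names are hypothetical and the way scripts/fetch_and_transcribe.sh actually invokes whisper is an assumption.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def build_whisper_command(audio_path: Path, out_dir: Path, model: str = "turbo",
                          device: Optional[str] = None) -> list:
    """Assemble the argv list for a single whisper transcription."""
    cmd = [
        "whisper", str(audio_path),
        "--model", model,
        "--output_format", "txt",
        "--output_dir", str(out_dir),
    ]
    if device is not None:
        # Only override the device when asked; whisper picks one otherwise.
        cmd += ["--device", device]
    return cmd


def transcribe(audio_path: Path, out_dir: Path, model: str = "turbo") -> Path:
    """Run whisper and return the path of the .txt file it writes."""
    for tool in ("whisper", "ffmpeg"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} must be on PATH")
    subprocess.run(build_whisper_command(audio_path, out_dir, model), check=True)
    # whisper names its output after the audio file's stem.
    return out_dir / f"{audio_path.stem}.txt"
```

Passing `model="tiny"` trades accuracy for speed, and `device="cpu"` forces CPU execution on a machine where the GPU should stay free.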

Version History

1 version in total

v1.0.0 (current)

2026-05-08 13:23, security status: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
