HN Podcast Archive

Automate podcast archiving by detecting new HN episodes from RSS, downloading audio, transcribing locally with Whisper, and generating markdown archives with...
Author: terrycarter1985
Uncategorized · clawhub · v1.0.0 · 1 version · 99731.9 · Key: not required

Overview

HN Podcast Archive

Set up or maintain a repeatable pipeline that:

  1. reads an RSS feed,
  2. detects new episodes,
  3. downloads audio,
  4. transcribes with local Whisper,
  5. writes a markdown archive per episode,
  6. updates index/state files.
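
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/hn_podcast_archive.py: the real script depends on feedparser, while this sketch uses the standard library's xml.etree.ElementTree so it runs without extra packages. The sample feed, the function names, and the episode dictionary keys are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Episode 2</title>
      <guid>ep-2</guid>
      <pubDate>Tue, 02 Jan 2024 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>Episode 1</title>
      <guid>ep-1</guid>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="0"/>
    </item>
  </channel>
</rss>"""


def parse_episodes(rss_text):
    """Extract one dict per <item> that carries an audio enclosure."""
    root = ET.fromstring(rss_text)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        audio_url = enclosure.get("url") if enclosure is not None else None
        if not audio_url:
            continue  # items without audio are not episodes
        episodes.append({
            # Fall back to the audio URL when the feed omits a GUID.
            "guid": item.findtext("guid") or audio_url,
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
            "audio_url": audio_url,
        })
    return episodes


def new_episodes(episodes, seen_guids):
    """Keep only episodes whose GUID has not been processed yet."""
    return [ep for ep in episodes if ep["guid"] not in seen_guids]
```

Calling new_episodes(parse_episodes(SAMPLE_RSS), {"ep-1"}) would leave only the "Episode 2" entry for download and transcription.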

Workflow

  1. Read references/layout.md to understand the expected archive layout and outputs.
  2. Use scripts/hn_podcast_archive.py as the primary implementation.
  3. Run python3 scripts/hn_podcast_archive.py --help to inspect options.
  4. For first-time setup, ensure required binaries and Python modules exist.
  5. For automation, schedule the script on a recurring cadence with a stable output directory.

Required runtime dependencies

The script expects:

  • ffmpeg in PATH
  • whisper in PATH
  • Python 3.10+
  • Python package feedparser

If any dependency is missing, surface a clear setup note instead of pretending the pipeline is ready to execute.
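
One way to surface such a setup note is a preflight check along these lines. It is a sketch under stated assumptions: the function name and message wording are hypothetical, and the real script may check its dependencies differently.

```python
import importlib.util
import shutil
import sys


def missing_dependencies(binaries=("ffmpeg", "whisper"),
                         modules=("feedparser",),
                         min_python=(3, 10)):
    """Return human-readable notes for every unmet requirement."""
    problems = []
    if sys.version_info < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    for name in binaries:
        # shutil.which returns None when the executable is not in PATH.
        if shutil.which(name) is None:
            problems.append(f"binary not found in PATH: {name}")
    for name in modules:
        # find_spec returns None when the top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not installed: {name}")
    return problems


if __name__ == "__main__":
    notes = missing_dependencies()
    for note in notes:
        print(f"setup needed: {note}")
    sys.exit(1 if notes else 0)
```

Returning a list rather than raising on the first failure lets the caller report every missing piece in a single pass.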

Recommended command

python3 skills/hn-podcast-archive/scripts/hn_podcast_archive.py \
  --feed-url "https://example.com/podcast.rss" \
  --output-dir ./data/hn-podcast-archive \
  --whisper-model turbo

Output expectations

For each ingested episode, create:

  • downloaded audio under audio/
  • transcript under transcripts/
  • markdown archive under episodes/
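
A possible shape for the per-episode outputs is sketched below. The directory names come from the list above; the slug scheme, the file extensions (.mp3, .txt, .md), and the markdown template are assumptions for illustration, and references/layout.md remains the authority on the actual layout.

```python
import re
from pathlib import Path


def slugify(title):
    """Turn an episode title into a filesystem-friendly name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "episode"


def episode_paths(output_dir, title):
    """Map one episode to its three output files (extensions are assumed)."""
    root = Path(output_dir)
    slug = slugify(title)
    return {
        "audio": root / "audio" / f"{slug}.mp3",
        "transcript": root / "transcripts" / f"{slug}.txt",
        "episode": root / "episodes" / f"{slug}.md",
    }


def render_episode_markdown(episode, transcript):
    """Build a markdown archive that keeps the source metadata visible."""
    lines = [
        f"# {episode['title']}",
        "",
        f"- Published: {episode['published']}",
        f"- Audio URL: {episode['audio_url']}",
        f"- GUID: {episode['guid']}",
        "",
        "## Transcript",
        "",
        transcript.strip(),
        "",
    ]
    return "\n".join(lines)
```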

Keep these shared files current:

  • index.md
  • state.json
  • run-log.jsonl
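
Keeping run-log.jsonl current can be as simple as appending one JSON object per line. The sketch below assumes a hypothetical record shape with a UTC timestamp; the fields the real script writes are not documented here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_run_log(output_dir, record):
    """Append one JSON line to run-log.jsonl without touching earlier lines."""
    log_path = Path(output_dir) / "run-log.jsonl"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # The timestamp field name is an assumption made for this example.
    entry = {"timestamp": datetime.now(timezone.utc).isoformat(), **record}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return log_path
```

Opening the file in append mode means earlier runs are never rewritten, which fits the rule about not overwriting existing archive content.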

Automation guidance

For automation, prefer a cron/standing-order style trigger that runs every few hours. The script is idempotent at the episode level by tracking processed GUIDs/URLs in state.json.
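
The episode-level idempotency described here could look roughly like this. The state.json schema (a single "processed" list holding GUIDs and audio URLs) and all function names are assumptions for illustration; the real file may be structured differently.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path):
    """Read state.json, or start with an empty state on the first run."""
    p = Path(path)
    if not p.exists():
        return {"processed": []}
    return json.loads(p.read_text(encoding="utf-8"))


def save_state(path, state):
    """Write state atomically so an interrupted run cannot truncate it."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(p.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp_name, p)


def should_process(episode, state, force=False):
    """Skip episodes whose GUID or audio URL was already recorded."""
    if force:
        return True
    seen = set(state["processed"])
    return episode["guid"] not in seen and episode["audio_url"] not in seen


def mark_processed(episode, state):
    """Record both identifiers so either one matches on later runs."""
    for key in (episode["guid"], episode["audio_url"]):
        if key not in state["processed"]:
            state["processed"].append(key)
```

Because a repeated run finds every episode already recorded, scheduling the script every few hours does no duplicate work when the feed has not changed.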

Safe operating rules

  • Never overwrite unrelated archive content.
  • Skip already-processed episodes unless explicitly forced.
  • Preserve source metadata (title, published date, audio URL, guid).
  • If transcription fails after download, keep the audio and record the failure in the log/state.
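
The last rule can be illustrated with a small wrapper around the transcription step. This is a sketch only: transcribe and log are caller-supplied callables, and the "failed" key in the state dictionary is a hypothetical field, not something the real script is known to use.

```python
def transcribe_safely(audio_path, transcribe, state, log):
    """Run transcription; on failure keep the audio and record the error.

    `transcribe` takes a path and returns text, `log` takes a dict.
    Both are injected so the wrapper stays independent of Whisper itself.
    """
    key = str(audio_path)
    try:
        text = transcribe(audio_path)
    except Exception as exc:
        # The downloaded audio is deliberately left on disk for a retry.
        state.setdefault("failed", {})[key] = str(exc)
        log({"event": "transcription_failed", "audio": key, "error": str(exc)})
        return None
    # A later success clears any earlier failure record for this file.
    state.get("failed", {}).pop(key, None)
    log({"event": "transcribed", "audio": key})
    return text
```

A subsequent run can then retry only the entries listed under the failure record instead of downloading the audio again.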

Customization points

Useful flags:

  • --limit N to ingest only recent items during testing
  • --force to reprocess already-seen items
  • --dry-run to inspect actions without writing outputs
  • --whisper-model to trade speed vs accuracy
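
For orientation, the flags named in this document could be declared with argparse as sketched below. The flag names come from the text above; the defaults, the required settings, and the help strings are assumptions, so run the script with --help for the authoritative list.

```python
import argparse


def build_parser():
    """Declare the flags mentioned in this document (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        description="Archive podcast episodes from an RSS feed."
    )
    parser.add_argument("--feed-url", required=True,
                        help="RSS feed to read")
    parser.add_argument("--output-dir", required=True,
                        help="stable directory for the archive")
    parser.add_argument("--whisper-model", default="turbo",
                        help="Whisper model name (speed vs accuracy)")
    parser.add_argument("--limit", type=int, default=None,
                        help="ingest only the N most recent items")
    parser.add_argument("--force", action="store_true",
                        help="reprocess already-seen items")
    parser.add_argument("--dry-run", action="store_true",
                        help="report planned actions without writing outputs")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

During testing, combining --limit 2 with --dry-run is a low-risk way to see which episodes would be ingested before anything is written.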

Packaging/publishing

Package the skill from its folder. Publish with ClawHub only after local validation passes and authentication is available.

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-07 11:54 · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
