
Podcast Transcribe

For transcript or subtitle requests involving podcast URLs, public audio URLs/files, or raw transcript cleanup. Generates audio + SRT + TXT artifacts and can...
dairui1
Content creation clawhub v1.4.1 2 versions 100000 Key: not required

Overview

Transcribe with podcast-helper

Generate transcript artifacts from a podcast episode, audio file, or raw transcript, with an optional cleanup pass that uses episode-page context.

Default Workflow

  1. Choose a dedicated output directory such as ./out/<slug>/.
  2. Run npx podcast-helper transcribe <input> --output-dir ./out/<slug> --json.
  3. Add --progress jsonl only when machine-readable progress is needed.
  4. Report the generated artifact paths for audio, .srt, and .txt.
  5. Ask whether the user wants cleanup. Do not run cleanup implicitly.
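
The steps above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not part of podcast-helper: the function names slugify and transcribe_episode are hypothetical, the slug rule (lowercase, hyphen-separated) is an assumption, and the only CLI flags used are the ones quoted in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of podcast-helper): turn a title or URL
# segment into a lowercase, hyphen-separated directory name.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Hypothetical wrapper for steps 1, 2, and 4 of the default workflow.
transcribe_episode() {
  local input="$1" slug="$2"
  local out_dir="./out/${slug}"

  # Step 1: dedicated output directory.
  mkdir -p "$out_dir" || return 1

  # Step 2: run the CLI with the flags shown in this document.
  npx podcast-helper transcribe "$input" --output-dir "$out_dir" --json || return 1

  # Step 4: list whatever artifacts were written so they can be reported.
  ls -1 "$out_dir"
}
```

A call might look like transcribe_episode "$input" "$(slugify "My Episode 42")". Step 5 stays a manual decision: the sketch deliberately never runs cleanup.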

If you are already inside this repository and dist/cli.js exists, node dist/cli.js ... is acceptable. Do not default to repository-local build steps outside this repository.

If you are inside this repository and dist/cli.js is missing, run pnpm run build before using the repo-local entry point.

Gotchas

  • Prefer no-install entry points first: npx, then pnpm dlx, then a globally installed podcast-helper.
  • Let the CLI auto-select the engine unless the user explicitly requests a backend or needs offline Apple Silicon transcription.
  • Spotify URLs are unsupported because the audio is DRM-protected. Ask for an RSS-backed episode page, Apple Podcasts link, or direct audio URL instead.
  • YouTube inputs require yt-dlp.
  • Generic episode pages sometimes hide audio metadata. If source resolution fails, download the audio separately and rerun with the file path.
  • Hosted transcription failures usually come from a missing or wrong provider API key.
  • Local mlx-whisper runs require ffmpeg, python3, and a working runtime from podcast-helper setup mlx-whisper.
  • Keep the raw transcript untouched. Cleanup should write a sibling *.cleaned.txt.
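
The entry-point preference in the first bullet can be sketched as a lookup with `command -v`. The function name pick_entry_point is hypothetical, and the sketch only checks that each launcher is on PATH, not that podcast-helper itself resolves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first available entry point, in the
# preference order npx, pnpm dlx, global install. Returns 1 if none exists.
pick_entry_point() {
  if command -v npx >/dev/null 2>&1; then
    echo "npx podcast-helper"
  elif command -v pnpm >/dev/null 2>&1; then
    echo "pnpm dlx podcast-helper"
  elif command -v podcast-helper >/dev/null 2>&1; then
    echo "podcast-helper"
  else
    echo "no podcast-helper entry point found" >&2
    return 1
  fi
}

# Usage (left unquoted on purpose so the prefix splits into words):
#   entry="$(pick_entry_point)" && $entry transcribe "$input" --output-dir "$out_dir" --json
```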

Command Forms

Default:

npx podcast-helper transcribe <input> --output-dir ./out/<slug> --json

Fallbacks:

  • pnpm dlx podcast-helper transcribe <input> --output-dir ./out/<slug> --json
  • podcast-helper transcribe <input> --output-dir ./out/<slug> --json
  • node dist/cli.js transcribe <input> --output-dir ./out/<slug> --json (only inside this repository)

For offline Apple Silicon:

npx podcast-helper transcribe <input> --engine mlx-whisper --output-dir ./out/<slug> --json
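
Before forcing the mlx-whisper engine, a preflight check along these lines may save a failed run. It is a sketch with hypothetical function names. It checks only for ffmpeg and python3 on PATH and for a native Apple Silicon host; it does not verify the runtime installed by podcast-helper setup mlx-whisper.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: print each named tool that is not on PATH.
missing_deps() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Hypothetical check: succeeds only on macOS running natively on arm64.
is_apple_silicon() {
  [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]
}

# Usage:
#   if is_apple_silicon && [ -z "$(missing_deps ffmpeg python3)" ]; then
#     npx podcast-helper transcribe "$input" --engine mlx-whisper --output-dir "$out_dir" --json
#   fi
```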

Cleanup Branch

Only enter cleanup when the user asks for it or already has a raw transcript.

  1. Fetch episode context with curl https://r.jina.ai/<episode-url>.
  2. Use the page as reference context for obvious ASR repairs, especially names and proper nouns.
  3. Do not summarize, invent missing content, or overwrite the raw transcript.
  4. Write a sibling *.cleaned.txt file.

If no episode URL is available, clean conservatively and explicitly say that external episode context was not used.
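
The file handling around cleanup can be sketched as follows. The function names are hypothetical, and appending the episode URL directly to the r.jina.ai prefix is an assumption about how the fetch in step 1 is meant to be called. The repair itself (step 2) is an editing task and is not scripted here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a raw transcript path to its sibling cleaned path,
# e.g. ./out/ep/episode.txt -> ./out/ep/episode.cleaned.txt
cleaned_path() {
  printf '%s\n' "${1%.txt}.cleaned.txt"
}

# Hypothetical helper: fetch episode-page text for reference context.
# Assumption: the episode URL is appended to the r.jina.ai prefix.
# -f fails on HTTP errors; -sS hides progress but still prints errors.
fetch_context() {
  curl -fsS "https://r.jina.ai/$1"
}

# Usage:
#   raw="./out/ep/episode.txt"
#   target="$(cleaned_path "$raw")"   # write cleaned text here; never to "$raw"
#   fetch_context "$episode_url" > ./out/ep/context.md
```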

References

  • Read references/inputs-and-engines.md for supported inputs, engine selection, and dependency notes.
  • Read references/output-contract.md for the JSON success and failure envelopes and progress handling.
  • Read references/cleanup.md for detailed cleanup rules and conservative editing guidance.
  • Read references/verification.md for smoke-test inputs and verification steps.
  • Read references/setup.md when installing this skill into Claude Code, OpenClaw, or other agents.

Version History

2 versions in total

  • v1.4.1 (current)
    2026-05-01 15:33 Safe Safe
  • v0.1.3
    2026-03-20 03:22 Safe Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
