
Xiaoyuzhou Asr

Transcribe 小宇宙 (Xiaoyuzhou) podcast episodes to text using local Qwen3-ASR speech recognition. Combines the xyz API (小宇宙FM API) to fetch episode metadata and audio information.
Author: worldwonderer
Category: uncategorized | Source: clawhub | Version: v2.0.0 (2 versions) | Key: required
Overview

xiaoyuzhou-asr

Transcribe 小宇宙 podcast episodes to text using local Qwen3-ASR (Metal/CUDA accelerated).

> Required service: this skill does not call 小宇宙 directly. It requires a compatible
> ultrazg/xyz API server to be installed and running.
> Default base URL is http://localhost:23020; override it with XYZ_BASE_URL. Without
> this service, login, search, episode lookup, and audio URL retrieval will not work.

Prerequisites

  1. xyz API server running: fetches episode data and audio URLs from 小宇宙

```bash
git clone https://github.com/ultrazg/xyz.git && cd xyz && go run .
# Default port: 23020, change with -p
```

  2. ffmpeg: audio format conversion (brew install ffmpeg)
  3. Qwen3-ASR model: download (HF Hub does NOT ship tokenizer.json):

```bash
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('Qwen/Qwen3-ASR-0.6B', local_dir='models/0.6B')
"
```

  4. qwen3-asr-rs: build from source:

```bash
git clone https://github.com/alan890104/qwen3-asr-rs.git && cd qwen3-asr-rs
cargo build --release --example local_transcribe
```

  5. tokenizer.json: auto-generated by the transcription script on first run (from vocab.json + merges.txt). No manual step needed.

Quick Start

```bash
# 1. Login (first time only, saves to ~/.xiaoyuzhou-asr.json)
python3 scripts/transcribe_podcast.py --login

# 2. Check all dependencies
python3 scripts/transcribe_podcast.py --check-env

# 3. Transcribe a single episode
python3 scripts/transcribe_podcast.py --keyword "早咖啡" -o output.md

# Or transcribe a shared episode URL
python3 scripts/transcribe_podcast.py --url "https://www.xiaoyuzhoufm.com/episode/EPISODE_ID" -o output.md
```

CLI Commands

Authentication

```bash
# Interactive login: sends verification code to phone, saves tokens
python3 scripts/transcribe_podcast.py --login
```

Discovery

```bash
# Search podcasts and show PID (for batch mode)
python3 scripts/transcribe_podcast.py --podcast-info --keyword "声动早咖啡"

# List recent episodes of a podcast
python3 scripts/transcribe_podcast.py --list-episodes --pid PODCAST_ID --count 10
```

Transcription

```bash
# Single episode by keyword (picks first result)
python3 scripts/transcribe_podcast.py --keyword "关键词" -o output.md

# Single episode by EID
python3 scripts/transcribe_podcast.py --eid EPISODE_ID -o output.md

# Single episode by Xiaoyuzhou URL
python3 scripts/transcribe_podcast.py --url "https://www.xiaoyuzhoufm.com/episode/EPISODE_ID" -o output.md

# Batch: transcribe 5 latest episodes of a podcast
python3 scripts/transcribe_podcast.py --pid PODCAST_ID --count 5 -o ./transcripts/

# With specific format
python3 scripts/transcribe_podcast.py --eid EPISODE_ID --format srt -o output.srt
```
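
The --url and --eid forms above point at the same episode, so a shared link can be reduced to its EID. The following is a minimal sketch of that step, assuming the URL layout shown above (`/episode/<EID>`); `extract_eid` is a hypothetical helper name and may not match how the script does it internally.

```python
from urllib.parse import urlparse


def extract_eid(url: str) -> str:
    """Return the episode ID from a URL shaped like .../episode/<EID>.

    Hypothetical helper: the path layout is taken from the example URL
    in this document, not from the script's source.
    """
    # Split the path into non-empty components; query strings are ignored.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[-2] == "episode":
        return parts[-1]
    raise ValueError(f"not an episode URL: {url}")
```

Rejecting anything that is not an episode path keeps a podcast or user link from being passed on silently as an EID.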

Diagnostics

```bash
# Check all dependencies (ffmpeg, xyz API, token, ASR binary, model)
python3 scripts/transcribe_podcast.py --check-env
```
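
To make the five checks concrete, here is a rough sketch of what such a dependency check could look like. It is not the script's actual implementation: the function names, the returned dictionary, and the choice to treat any HTTP response as "reachable" are all assumptions made for illustration.

```python
import os
import shutil
import urllib.error
import urllib.request
from pathlib import Path
from typing import Dict, Optional


def xyz_api_reachable(base_url: str, timeout: float = 15.0) -> bool:
    """Return True if something answers HTTP at base_url (assumed probe)."""
    try:
        urllib.request.urlopen(base_url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server responded, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        return False


def check_local_env(model_dir: str, asr_bin: str, token: Optional[str]) -> Dict[str, bool]:
    """Check the local dependencies; all names here are hypothetical."""
    model = Path(model_dir)
    return {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "token": bool(token),
        "asr_bin": os.path.isfile(asr_bin) and os.access(asr_bin, os.X_OK),
        # tokenizer.json is generated later from these two files.
        "model": (model / "vocab.json").is_file() and (model / "merges.txt").is_file(),
    }
```

The network probe is kept separate from the local checks so the local part can run offline.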

Output Formats

| Format | Flag | Description |
| --- | --- | --- |
| Markdown | --format markdown (default) | Metadata header + transcript |
| SRT | --format srt | Subtitles with estimated timestamps |
| Plain text | --format txt | Minimal header + transcript |
| JSON | --format json | Metadata + transcript as JSON |
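
For readers unfamiliar with SRT, the sketch below shows how timed text is laid out in that format. The source does not say how the script estimates its timestamps, so this example only covers the formatting step and assumes the cues already carry start and end times in seconds; `srt_timestamp` and `to_srt` are illustrative names.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)
```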

Batch Mode

  • Transcribes the N most recent episodes of a podcast (--pid --count N)
  • Saves each episode as a separate file in the output directory
  • Checkpoint/resume: skips episodes that already exist in the output directory
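
The checkpoint/resume behaviour described above can be pictured as a filter over the episode list. This is a minimal sketch under one loud assumption: that each output file is named `<eid>.<ext>`. The source does not state the real naming scheme, and `pending_episodes` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Sequence


def pending_episodes(eids: Sequence[str], out_dir: str, ext: str = "md") -> List[str]:
    """Return the EIDs that do not yet have an output file in out_dir.

    Assumes files are named <eid>.<ext>; the script may name them differently.
    """
    out = Path(out_dir)
    return [eid for eid in eids if not (out / f"{eid}.{ext}").exists()]
```

Because the check is based only on file existence, rerunning the same batch command after an interruption would pick up where it stopped, at the cost of not detecting partially written files.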

Configuration

Settings are resolved in priority order: CLI argument > Environment variable > Config file.

Config File (~/.xiaoyuzhou-asr.json)

Auto-created by --login. Can also store paths:

```json
{
  "token": "x-jike-access-token",
  "refresh_token": "x-jike-refresh-token",
  "model_dir": "/path/to/models/0.6B",
  "asr_bin": "/path/to/local_transcribe"
}
```

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| XYZ_ACCESS_TOKEN | x-jike access token | none (required) |
| XYZ_REFRESH_TOKEN | Refresh token for auto-renewal | none (optional) |
| XYZ_BASE_URL | xyz API base URL | http://localhost:23020 |
| XYZ_HTTP_TIMEOUT | xyz API request timeout in seconds | 15 |
| XYZ_DOWNLOAD_TIMEOUT | Audio download timeout in seconds | 120 |
| QWEN3_ASR_MODEL_DIR | Qwen3-ASR model directory | auto-detect |
| QWEN3_ASR_BIN | local_transcribe binary path | auto-detect |
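
The priority order (CLI argument > environment variable > config file) can be sketched as a small lookup chain. The helper names `resolve` and `load_config` are hypothetical, and the script's real logic (including how "auto-detect" works) is not shown in this document.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional

CONFIG_PATH = Path.home() / ".xiaoyuzhou-asr.json"


def load_config(path: Path = CONFIG_PATH) -> Dict[str, Any]:
    """Read the JSON config file, or return {} if it does not exist."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def resolve(
    cli_value: Optional[str],
    env_name: str,
    config: Dict[str, Any],
    config_key: Optional[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Pick a setting: CLI value first, then environment, then config file."""
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get(env_name)
    if env_value:
        return env_value
    if config_key is not None and config.get(config_key):
        return config[config_key]
    return default
```

For example, `resolve(None, "XYZ_BASE_URL", load_config(), None, "http://localhost:23020")` would yield the environment value if set and the documented default otherwise.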

Token Management

  • --login saves tokens to the config file automatically
  • If the API returns 401, auto-refresh using the refresh token
  • Prompt the user to log in if no valid token is available
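
The refresh-on-401 rule above amounts to "retry once with a renewed token". The sketch below illustrates that pattern with injected callables rather than real HTTP calls; `TokenExpired`, `call_with_refresh`, and the shape of the `tokens` dictionary (mirroring the config file keys) are assumptions, not the script's actual API.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TokenExpired(Exception):
    """Stand-in for 'the API answered 401' (hypothetical local class)."""


def call_with_refresh(
    request: Callable[[str], T],
    refresh: Callable[[str], Dict[str, str]],
    tokens: Dict[str, str],
) -> T:
    """Run request(access_token); on TokenExpired, refresh once and retry.

    request raises TokenExpired when the access token is rejected.
    refresh takes the refresh token and returns new token values.
    """
    try:
        return request(tokens["token"])
    except TokenExpired:
        if not tokens.get("refresh_token"):
            # Nothing to refresh with: the caller should prompt for login.
            raise
        tokens.update(refresh(tokens["refresh_token"]))
        return request(tokens["token"])
```

Retrying only once avoids a loop when the refresh token itself is no longer valid; in that case the second failure propagates and the user can be asked to log in again.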


Constraints

  • MUST split audio into ≤3-minute segments for Metal GPU stability (auto-handled by script)
  • Audio must be WAV 16kHz mono (auto-converted by script)
  • tokenizer.json auto-generated on first run (from vocab.json + merges.txt)
  • xyz API requires Chinese phone number (+86) login
  • All processing is local — audio never leaves the machine
  • Download retries up to 3 times on network failure
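
The first two constraints (16 kHz mono WAV, segments of at most 3 minutes) can both be met with a single ffmpeg invocation using its segment muxer. The following is a sketch of one way to do it, not necessarily what the script runs; the function names and the `seg_%03d.wav` output pattern are hypothetical.

```python
import subprocess
from typing import List


def ffmpeg_segment_cmd(src: str, out_pattern: str, segment_seconds: int = 180) -> List[str]:
    """Build an ffmpeg command that resamples and splits audio in one pass."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", "16000",          # resample to 16 kHz
        "-ac", "1",              # downmix to mono
        "-c:a", "pcm_s16le",     # 16-bit PCM, the usual WAV payload
        "-f", "segment",         # segment muxer: write numbered chunks
        "-segment_time", str(segment_seconds),
        out_pattern,
    ]


def split_audio(src: str, out_pattern: str = "seg_%03d.wav") -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(ffmpeg_segment_cmd(src, out_pattern), check=True)
```

The segment muxer cuts at packet boundaries, so chunk lengths are approximate; a slightly smaller `segment_seconds` leaves headroom if the 3-minute limit must be strict.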

Script Reuse

This is a skill project, not a packaged Python library. Prefer the CLI above. Other scripts in this repository can still import scripts/transcribe_podcast.py directly:

```python
# Placeholders: token, eid, model_dir, and asr_bin must be defined by the caller
# (for example from ~/.xiaoyuzhou-asr.json); they are not set in this snippet.
from transcribe_podcast import (
    search_episodes, transcribe_episode, format_output,
    TranscriptionError, ApiError, TokenExpiredError,
)

try:
    episodes, _ = search_episodes(token, "早咖啡")
    episode, transcript, timings = transcribe_episode(token, eid, model_dir, asr_bin)
    output = format_output(episode, transcript)
except TranscriptionError as e:
    print(f"Error: {e}")
```

Version History

2 versions in total

  • v2.0.0 (current)
    2026-05-29 13:40
  • v1.0.0
    2026-05-08 02:08, security scan: safe
