Transcribe audio files to text using coli asr, which runs fully offline via local
speech recognition models. No API key required. Supports Chinese, English, Japanese,
Korean, and Cantonese (sensevoice model) or English-only (whisper model).
Run coli asr --help for current CLI options and supported flags.
- Follow shared/config-pattern.md before any interaction
- Follow shared/common-patterns.md for interaction patterns

Use the AskUserQuestion tool for every multiple-choice step; do NOT print options as
plain text. Ask one question at a time. Wait for the user's answer before proceeding.
After all parameters are collected, summarize and ask the user to confirm before
running any transcription.
Before config setup, silently check the environment:
# Discard the resolved path so each variable holds only "yes" or "no"
COLI_OK=$(command -v coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(command -v ffmpeg >/dev/null 2>&1 && echo yes || echo no)
MODELS_DIR="$HOME/.coli/models"
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" | grep -q sherpa && echo yes || echo no)
| Issue | Action |
|---|---|
| coli not found | Block. Tell user to run npm install -g @marswave/coli first |
| ffmpeg not found | Warn (WAV files still work). Suggest brew install ffmpeg / sudo apt install ffmpeg |
| Models not downloaded | Inform user: first transcription will auto-download models (~60MB) to ~/.coli/models/ |
If coli is missing, stop here and do not proceed.
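
The table above can be read as a small gate on the three flags. The following is a minimal sketch, not part of coli itself: the function name report_env and the message wording are illustrative assumptions, and it takes the yes/no values computed by the environment check as arguments.

```shell
# Hypothetical helper: maps the yes/no flags from the environment check
# to the actions in the table. Returns non-zero only when coli is missing.
report_env() {
  coli_ok=$1
  ffmpeg_ok=$2
  models_ok=$3
  if [ "$coli_ok" != "yes" ]; then
    echo "BLOCK: coli not found. Run: npm install -g @marswave/coli"
    return 1
  fi
  if [ "$ffmpeg_ok" != "yes" ]; then
    echo "WARN: ffmpeg not found (WAV files still work)."
  fi
  if [ "$models_ok" != "yes" ]; then
    echo "INFO: models will be downloaded to ~/.coli/models/ on first transcription."
  fi
  return 0
}
```

Called as report_env "$COLI_OK" "$FFMPEG_OK" "$MODELS_OK", a non-zero return would be the signal to stop before config setup.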
Follow shared/config-pattern.md Step 0.
Initial defaults:
# Current directory:
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"
# Global:
mkdir -p "$HOME/.listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > "$HOME/.listenhub/asr/config.json"
CONFIG_PATH="$HOME/.listenhub/asr/config.json"
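
Once one of the two files exists, later runs need to locate it. The sketch below assumes the project-local config takes precedence over the global one; that lookup order and the function name find_asr_config are assumptions made for illustration, and shared/config-pattern.md remains the authoritative description.

```shell
# Hypothetical helper: prints the first existing config path.
# Assumption: the local config wins over the global one.
find_asr_config() {
  for p in ".listenhub/asr/config.json" "$HOME/.listenhub/asr/config.json"; do
    if [ -f "$p" ]; then
      echo "$p"
      return 0
    fi
  done
  # Neither file exists yet: the caller should create the initial defaults.
  return 1
}
```

A caller could use CONFIG_PATH=$(find_asr_config) and fall back to writing the defaults shown above when the function returns non-zero.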
Config summary display:
Current config (asr):
Model: sensevoice / whisper-tiny.en
Polish: on / off
Ask in order:
1. model: sensevoice or whisper-tiny.en
2. polish: on (polish: true) or off (polish: false)

Save all answers at once after collecting them.
If the user hasn't provided a file path, ask:
> "Please provide the path to the audio file you want to transcribe."
Verify the file exists before proceeding.
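
One way to perform this check is a small shell predicate. This is a sketch under the assumption that a readable regular file is what "exists" should mean here; the name audio_file_ok is hypothetical.

```shell
# Hypothetical helper: succeeds only if the argument is a non-empty path
# to a readable regular file (directories and missing paths fail).
audio_file_ok() {
  [ -n "$1" ] && [ -f "$1" ] && [ -r "$1" ]
}

# Usage:
#   if ! audio_file_ok "$FILE"; then
#     echo "File not found or not readable: $FILE"
#   fi
```

Quoting "$1" keeps paths with spaces intact, which is common for recorded audio file names.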
Ready to transcribe:
File: {filename}
Model: {model}
Polish: {yes / no}
Continue?
Run coli asr with JSON output (to get metadata):
coli asr -j --model {model} "{file}"
On first run, coli will automatically download the required model. This may take a
moment — inform the user if models haven't been downloaded yet.
Parse the JSON result to extract text, lang, emotion, event, duration.
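
Field extraction can be sketched with jq, which is an extra dependency not mentioned in this document. The sketch assumes that text, lang, emotion, event, and duration are top-level keys of the JSON printed by coli asr -j; the actual shape may differ, so check the real output before relying on it. The helper name asr_field is hypothetical.

```shell
# Extract one top-level field from a JSON document on stdin (requires jq).
# Assumption: the five fields are top-level keys in the coli asr -j output;
# adjust the filter if the real structure is nested.
asr_field() {
  jq -r --arg key "$1" '.[$key] // empty'
}

# Usage (hypothetical file name):
#   result=$(coli asr -j --model sensevoice "meeting.m4a")
#   text=$(printf '%s' "$result" | asr_field text)
#   lang=$(printf '%s' "$result" | asr_field lang)
#   duration=$(printf '%s' "$result" | asr_field duration)
```

Missing or null fields come back as an empty string, so the display step can omit them instead of printing "null".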
If polish is true, take the raw text from the transcription result and rewrite
it to fix punctuation, remove filler words, and improve readability. Preserve the
original meaning and speaker intent. Do not summarize or paraphrase.
Display the transcript directly in the conversation:
Transcription complete
{transcript text}
─────────────────
Language: {lang} · Emotion: {emotion} · Duration: {duration}s
If polished, show the polished version with a note that it was AI-refined. Offer to
show the raw original on request.
After presenting the result, ask:
Question: "Save as a Markdown file in the current directory?"
Options:
- "Yes": save to current directory
- "No": done
If yes, write {audio-filename}-transcript.md to the current working directory
(where the user is running Claude Code). The file should contain the transcript text
(polished version if polish was enabled), with a front-matter header:
---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---
{transcript text}
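
The save step can be sketched as a shell function that writes the front matter followed by the transcript. The function name save_transcript and its argument order are hypothetical, and the sketch assumes {audio-filename} means the file name with its extension removed, which the text above does not state explicitly.

```shell
# Hypothetical helper: writes <audio-name>-transcript.md in the current directory.
# Args: audio path, model, duration in seconds, detected language, transcript text.
save_transcript() {
  audio=$1
  model=$2
  duration=$3
  lang=$4
  text=$5
  name=$(basename "$audio")
  out="${name%.*}-transcript.md"   # assumption: drop the audio extension
  {
    printf '%s\n' '---'
    printf 'source: %s\n' "$name"
    printf 'date: %s\n' "$(date +%Y-%m-%d)"
    printf 'model: %s\n' "$model"
    printf 'duration: %ss\n' "$duration"
    printf 'lang: %s\n' "$lang"
    printf '%s\n' '---'
    printf '%s\n' "$text"
  } > "$out"
  # Print the output path so the caller can report it to the user.
  echo "$out"
}
```

Passing every value through a printf format argument keeps transcript text containing % or backslashes from being interpreted as format directives.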
> "Transcribe this file for me: meeting.m4a"

coli asr -j --model sensevoice "meeting.m4a"

> "transcribe interview.wav, no polish"

coli asr -j --model sensevoice "interview.wav"