Overview

Speech to Text

Use this skill to turn local audio files into text with a public Whisper-based endpoint.

Quick start

Run:

python3 scripts/transcribe.py /path/to/file.ogg

Return the transcript as plain text. By default, the script also applies lightweight Chinese punctuation and sentence-breaking cleanup.

For machine-readable output:

python3 scripts/transcribe.py /path/to/file.ogg --json

To disable cleanup and keep the raw model text:

python3 scripts/transcribe.py /path/to/file.ogg --format raw

To force Chinese punctuation cleanup:

python3 scripts/transcribe.py /path/to/file.ogg --format zh

For English translation instead of same-language transcription:

python3 scripts/transcribe.py /path/to/file.ogg --task translate
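When calling the script from other Python code, the commands above can be assembled programmatically. This is a minimal sketch, not part of the skill itself: `build_command` and `transcribe` are hypothetical helper names, and only the flags shown in this section are used.

```python
import subprocess
import sys
from typing import List, Optional

SCRIPT = "scripts/transcribe.py"  # path as given in this document


def build_command(audio_path: str,
                  as_json: bool = False,
                  fmt: Optional[str] = None,
                  task: Optional[str] = None) -> List[str]:
    """Assemble the argument list for scripts/transcribe.py (hypothetical helper)."""
    cmd = [sys.executable, SCRIPT, audio_path]
    if as_json:
        cmd.append("--json")
    if fmt is not None:
        # The document shows "raw" and "zh"; other values are not confirmed.
        cmd += ["--format", fmt]
    if task is not None:
        # The document shows "translate".
        cmd += ["--task", task]
    return cmd


def transcribe(audio_path: str, **options) -> str:
    """Run the script and return whatever it prints to stdout."""
    result = subprocess.run(build_command(audio_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.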

Workflow

Confirm the input is a local audio file.
Run scripts/transcribe.py on it.
If the transcript looks imperfect, tell the user it came from a public Whisper endpoint and may need cleanup.
If helpful, post-process into:

cleaned transcript
summary
action items
bilingual output
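The first workflow step (confirming the input is a local audio file) could be checked roughly as follows. The suffix list is an assumption made for illustration; the formats the endpoint actually accepts are not stated here.

```python
from pathlib import Path

# Assumed set of audio suffixes; adjust to whatever the endpoint accepts.
AUDIO_SUFFIXES = {".ogg", ".mp3", ".wav", ".m4a", ".flac"}


def is_local_audio_file(path: str) -> bool:
    """Return True for an existing local file with a known audio suffix."""
    if "://" in path:
        # URLs are not local files; the script expects a path on disk.
        return False
    p = Path(path).expanduser()
    return p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
```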

What the script does

The script:

uploads the local file to a public Gradio-backed Hugging Face Space
submits a Whisper transcription job
waits for completion via the Gradio event stream
prints the resulting text
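The "wait for completion" step can be pictured as reading server-sent events until a final one arrives. The sketch below assumes the `event:` / `data:` line format and the `complete` and `error` event names used by Gradio's HTTP API; the script's actual implementation may differ, and `read_final_result` is a hypothetical name.

```python
import json
from typing import Any, Iterable


def read_final_result(lines: Iterable[str]) -> Any:
    """Scan event-stream lines and return the payload of the first 'complete' event."""
    event = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            # Remember the event type; its payload follows on a data: line.
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if event == "error":
                raise RuntimeError(f"Space reported an error: {payload}")
            if event == "complete":
                return json.loads(payload)
            # Other events (for example heartbeats) are ignored.
    raise RuntimeError("stream ended before a 'complete' event arrived")
```

In practice `lines` would be the decoded lines of a streaming HTTP response; a plain list of strings works the same way for testing.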

Default endpoint:

https://hf-audio-whisper-large-v3-turbo.hf.space

Override it with:

python3 scripts/transcribe.py input.ogg --space https://your-space.hf.space

or set:

export HF_WHISPER_SPACE=https://your-space.hf.space
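One plausible way to combine the two overrides is shown below. The precedence (the `--space` flag first, then `HF_WHISPER_SPACE`, then the default) is an assumption; this document does not say which one wins when both are set. `resolve_space` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

DEFAULT_SPACE = "https://hf-audio-whisper-large-v3-turbo.hf.space"


def resolve_space(cli_value: Optional[str] = None,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the endpoint: CLI flag, then environment variable, then default (assumed order)."""
    env = os.environ if env is None else env
    chosen = cli_value or env.get("HF_WHISPER_SPACE") or DEFAULT_SPACE
    # Drop a trailing slash so request paths can be appended consistently.
    return chosen.rstrip("/")
```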

Guardrails

Treat this as a best-effort public/free path, not a privacy-grade path.
Do not use for highly sensitive audio unless the user explicitly accepts public third-party processing.
Expect rate limits, queueing, and occasional outages.
If the public endpoint fails, explain that the free backend is unavailable and offer alternatives.
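Rate limits and outages are usually handled with a few retries before giving up. The following is a sketch of that pattern, not the script's own behavior: `transcribe_with_retry`, the attempt count, the delays, and the wording of the fallback note are all illustrative assumptions.

```python
import time
from typing import Callable, Optional, Tuple

FALLBACK_NOTE = (
    "The free public Whisper backend is unavailable right now. "
    "Options: try again later, or point --space at a different Space."
)


def transcribe_with_retry(call: Callable[[], str],
                          attempts: int = 3,
                          base_delay: float = 2.0,
                          sleep: Callable[[float], None] = time.sleep
                          ) -> Tuple[Optional[str], Optional[str]]:
    """Try `call` up to `attempts` times with exponential backoff.

    Returns (transcript, None) on success or (None, note) after the last failure.
    """
    for attempt in range(attempts):
        try:
            return call(), None
        except Exception:
            if attempt < attempts - 1:
                # Wait 2s, 4s, 8s, ... between attempts (with the defaults).
                sleep(base_delay * 2 ** attempt)
    return None, FALLBACK_NOTE
```

The `sleep` parameter exists so the delays can be replaced in tests; `call` would wrap whatever actually invokes `scripts/transcribe.py`.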

Output handling

Prefer to return:

the raw transcript when the user asked to "转文字/听写" (convert to text / take dictation)
a cleaned version when punctuation is poor
a short note about uncertainty if names, numbers, or jargon may be wrong
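The choices above can be approximated with simple heuristics. This is only a rough sketch: the threshold, the punctuation set, the digit-based check, and the function names are assumptions, and names or jargon still need human judgment.

```python
import re

# Chinese and ASCII sentence punctuation considered by the heuristic.
PUNCT = "，。！？；,.!?;"


def needs_cleanup(text: str, max_run: int = 40) -> bool:
    """True when some stretch runs longer than max_run characters without punctuation."""
    runs = re.split(f"[{re.escape(PUNCT)}]", text)
    return any(len(run.strip()) > max_run for run in runs)


def uncertainty_note(text: str) -> str:
    """Return a short caveat when the transcript contains digits, else an empty string."""
    if re.search(r"\d", text):
        return "Note: numbers in this transcript may be misheard; please verify them."
    return ""
```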

Script

scripts/transcribe.py — public Whisper transcription helper

Version history

1 version in total

v1.0.0 (current)

2026-03-31 07:21, marked safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
