
Speech to Text

Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me...
shu-hari
AI Intelligence clawhub v1.0.0 1 version 99869.3 Key: not required
#audio #free #latest #speech-to-text #transcription #voice #whisper

Overview

Speech to Text

Use this skill to turn local audio files into text with a public Whisper-based endpoint.

Quick start

Run:

python3 scripts/transcribe.py /path/to/file.ogg

Return the transcript as plain text. By default, the script also applies lightweight Chinese punctuation and sentence-breaking cleanup.

For machine-readable output:

python3 scripts/transcribe.py /path/to/file.ogg --json

To disable cleanup and keep the raw model text:

python3 scripts/transcribe.py /path/to/file.ogg --format raw

To force Chinese punctuation cleanup:

python3 scripts/transcribe.py /path/to/file.ogg --format zh
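
The source of scripts/transcribe.py is not shown here, so the exact cleanup rules are unknown. As a rough illustration of what "lightweight Chinese punctuation cleanup" could mean, the sketch below converts ASCII punctuation that directly follows a CJK character into its full-width form. The function name, the character range, and the rule itself are assumptions, not the script's actual behavior.

```python
import re

# Hypothetical mapping; the real script may handle more or fewer marks.
_FULL_WIDTH = {",": ",", ".": "。", "?": "?", "!": "!", ";": ";", ":": ":"}

# Match ASCII punctuation only when the preceding character is a common CJK ideograph,
# so that numbers such as 3.14 and English sentences are left untouched.
_AFTER_CJK = re.compile(r"(?<=[\u4e00-\u9fff])[,.?!;:]")


def to_chinese_punctuation(text):
    """Replace ASCII punctuation after CJK characters with full-width marks."""
    return _AFTER_CJK.sub(lambda m: _FULL_WIDTH[m.group(0)], text)


if __name__ == "__main__":
    print(to_chinese_punctuation("你好,世界."))
```

Sentence breaking is deliberately left out of this sketch, since the document does not say how the script decides where sentences end.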

For English translation instead of same-language transcription:

python3 scripts/transcribe.py /path/to/file.ogg --task translate
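
When calling the script from other Python code, the flags documented in this skill can be assembled programmatically. The sketch below is one possible wrapper. The helper names are hypothetical, and it assumes that the flags can be combined freely and that the script exits with a non-zero status on failure, neither of which the document confirms.

```python
import subprocess
import sys


def build_command(audio_path, *, as_json=False, fmt=None, task=None, space=None):
    """Assemble the argument list for scripts/transcribe.py."""
    cmd = [sys.executable, "scripts/transcribe.py", str(audio_path)]
    if as_json:
        cmd.append("--json")
    if fmt is not None:          # "raw" or "zh", per the examples in this skill
        cmd += ["--format", fmt]
    if task is not None:         # "translate" requests English output
        cmd += ["--task", task]
    if space is not None:        # alternative Space URL
        cmd += ["--space", space]
    return cmd


def transcribe(audio_path, **options):
    """Run the script and return its stdout.

    Raises subprocess.CalledProcessError if the script exits non-zero
    (assumed failure signal).
    """
    result = subprocess.run(
        build_command(audio_path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, `transcribe("/path/to/file.ogg", task="translate")` would run the same command as the translation example.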

Workflow

  1. Confirm the input is a local audio file.
  2. Run scripts/transcribe.py on it.
  3. If the transcript looks imperfect, tell the user it came from a public Whisper endpoint and may need cleanup.
  4. If helpful, post-process into:
    • cleaned transcript
    • summary
    • action items
    • bilingual output
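
Step 1 of the workflow can be sketched as a small pre-flight check. The extension list below is an assumption chosen for illustration; the document does not say which formats the script or the Space accepts.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever the backend really accepts.
AUDIO_EXTENSIONS = {".ogg", ".oga", ".opus", ".mp3", ".wav", ".m4a", ".flac", ".webm"}


def is_local_audio_file(path):
    """Return True if path is an existing regular file with a known audio extension."""
    p = Path(path)
    return p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
```

A check like this only looks at the file name, not the file contents, so it filters out obvious mistakes (URLs, directories, missing files) rather than guaranteeing the audio can be decoded.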

What the script does

The script:

  • uploads the local file to a public Gradio-backed Hugging Face Space
  • submits a Whisper transcription job
  • waits for completion via the Gradio event stream
  • prints the resulting text
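
The "waits for completion via the Gradio event stream" step can be illustrated with a small parser. This is a sketch under assumptions, not the script's code: it assumes the Space streams server-sent events as `event:` and `data:` line pairs and finishes with a `complete` event carrying a JSON payload, which is how Gradio's HTTP API generally behaves. The function name is hypothetical, and the payload shape for this particular Space is not documented here.

```python
import json


def read_final_result(stream_lines):
    """Return the decoded payload of the first 'complete' event in an SSE line stream.

    stream_lines is any iterable of text lines, for example the decoded lines of a
    streaming HTTP response.
    """
    event = None
    for raw in stream_lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if event == "complete":
                return json.loads(payload)
            if event == "error":
                raise RuntimeError(f"Space reported an error: {payload}")
            # Other events (for example heartbeats or partial output) are skipped.
    raise RuntimeError("stream ended without a 'complete' event")
```

Keeping the parser separate from the network code makes it possible to test it offline with canned lines.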

Default endpoint:

  • https://hf-audio-whisper-large-v3-turbo.hf.space

Override it with:

python3 scripts/transcribe.py input.ogg --space https://your-space.hf.space

or set:

export HF_WHISPER_SPACE=https://your-space.hf.space
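
One plausible way to resolve the endpoint is shown below. The precedence (command-line value first, then the HF_WHISPER_SPACE environment variable, then the default) is an assumption; the document lists both overrides without saying which wins. The function name is hypothetical.

```python
import os

DEFAULT_SPACE = "https://hf-audio-whisper-large-v3-turbo.hf.space"


def resolve_space(cli_value=None, env=None):
    """Pick the Space URL: explicit value, then HF_WHISPER_SPACE, then the default."""
    env = os.environ if env is None else env
    chosen = cli_value or env.get("HF_WHISPER_SPACE") or DEFAULT_SPACE
    # Strip a trailing slash so paths can be appended consistently.
    return chosen.rstrip("/")
```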

Guardrails

  • Treat this as a best-effort public/free path, not a privacy-grade path.
  • Do not use for highly sensitive audio unless the user explicitly accepts public third-party processing.
  • Expect rate limits, queueing, and occasional outages.
  • If the public endpoint fails, explain that the free backend is unavailable and offer alternatives.
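
Because rate limits and outages are expected, a caller may want to retry a few times before telling the user the free backend is unavailable. The sketch below is a generic retry helper with exponential backoff. The name, the attempt count, and the delays are illustrative assumptions, and nothing here reflects retry logic inside the script itself.

```python
import time


def with_retries(call, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call `call()` up to `attempts` times, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:  # broad on purpose: any failure counts as "backend unavailable"
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError("free backend unavailable after retries") from last_error
```

If the final RuntimeError is raised, that is the point at which to explain the outage and offer alternatives, as the guardrails describe.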

Output handling

Prefer to return:

  • the raw transcript when the user asked for "转文字/听写" (convert to text / dictation)
  • a cleaned version when punctuation is poor
  • a short note about uncertainty if names, numbers, or jargon may be wrong

Script

  • scripts/transcribe.py — public Whisper transcription helper

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-31 07:21 Safe Safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
