
Whisper ASR — Speech-to-Text

Automatic Speech Recognition using OpenAI Whisper (local GPU). Supports Chinese, English, and 90+ languages. Auto-detects language.
vincentlau2046-sudo (source)
Uncategorized · clawhub · v1.2.0 · 1 version · 100000 · Key: not required
Stars: 0 · Downloads: 264 · Installs: 0 · Versions: 1 · Tag: latest

Overview

ASR — Speech-to-Text (FunASR + Whisper)

Two engines for different scenarios:

| Engine | Best For | Chinese Quality | Speed |
|--------|----------|-----------------|-------|
| FunASR SenseVoice (default) | Chinese, Japanese, Korean | ⭐⭐⭐ (Simplified) | Fast (0.03 RTF) |
| OpenAI Whisper | Multilingual, translation | ⭐⭐ (Traditional) | Slower |
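
The RTF (real-time factor) figure in the table is processing time divided by audio duration, so lower is faster. As a rough worked example, assuming the quoted 0.03 holds on your hardware (it may not), a hypothetical helper can turn it into an expected wall-clock time:

```python
def estimate_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate processing time from a real-time factor (RTF).

    RTF = processing time / audio duration, so lower is faster.
    """
    if audio_seconds < 0 or rtf < 0:
        raise ValueError("audio_seconds and rtf must be non-negative")
    return audio_seconds * rtf


# A one-hour recording at the RTF of 0.03 quoted for SenseVoice.
one_hour = 60 * 60
print(f"{estimate_processing_seconds(one_hour, 0.03):.0f} s")  # prints "108 s"
```

In other words, an hour of audio would take a little under two minutes at that rate, not counting model load time.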

Quick Start

# Default: FunASR SenseVoice (best Chinese)
{baseDir}/scripts/asr.py --input audio.mp3

# Whisper for multilingual / translation
{baseDir}/scripts/asr.py --input audio.mp3 --engine whisper
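
If you are scripting around the tool, the choice between the two engines can be written down as a small rule. The sketch below is one possible reading of this page, not logic taken from `asr.py`: `choose_engine` is a hypothetical helper, and the language set is the one listed for SenseVoice under Engine Details.

```python
# Language codes this page lists for SenseVoice; anything else goes to Whisper.
FUNASR_LANGUAGES = {"zh", "en", "ja", "ko", "yue"}


def choose_engine(language: str = "auto", task: str = "transcribe") -> str:
    """Return "funasr" or "whisper" for a language code and task (hypothetical helper)."""
    if task == "translate":
        return "whisper"  # translation is documented as whisper-only
    if language == "auto" or language in FUNASR_LANGUAGES:
        return "funasr"  # the documented default engine
    return "whisper"


print(choose_engine("zh"))               # funasr
print(choose_engine("fr"))               # whisper
print(choose_engine("zh", "translate"))  # whisper
```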

Options

| Option | Default | Description |
|--------|---------|-------------|
| --input | (required) | Input audio file (mp3, wav, m4a, etc.) |
| --engine | funasr | ASR engine: funasr (SenseVoice) or whisper |
| --language | auto | Language code: zh, en, ja, ko, etc. (auto-detect if omitted) |
| --model | base | Whisper model size: tiny/base/small/medium/large (whisper only) |
| --task | transcribe | transcribe or translate (whisper only) |
| --output | | Write transcript to file (default: stdout) |
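
For readers who want to see how such an option set is typically wired up, here is a minimal `argparse` sketch that mirrors the table. The source of `asr.py` is not shown on this page, so this is an illustration under that assumption; in particular, rejecting `--task translate` for the FunASR engine is a choice made in this sketch, not confirmed behaviour.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the options and defaults from the table."""
    parser = argparse.ArgumentParser(description="Speech-to-text with FunASR or Whisper")
    parser.add_argument("--input", required=True,
                        help="Input audio file (mp3, wav, m4a, etc.)")
    parser.add_argument("--engine", choices=["funasr", "whisper"], default="funasr",
                        help="ASR engine")
    parser.add_argument("--language", default="auto",
                        help="Language code such as zh, en, ja, ko")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size (whisper only)")
    parser.add_argument("--task", choices=["transcribe", "translate"],
                        default="transcribe", help="whisper only")
    parser.add_argument("--output", default=None,
                        help="Write transcript to file (default: stdout)")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption of this sketch: fail early rather than silently ignore the flag.
    if args.task == "translate" and args.engine != "whisper":
        parser.error("--task translate requires --engine whisper")
    return args


args = parse_args(["--input", "audio.mp3"])
print(args.engine, args.language, args.model, args.task)  # funasr auto base transcribe
```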

Engine Details

FunASR SenseVoice-Small (Default)

  • Model: iic/SenseVoiceSmall (893MB, auto-downloaded from ModelScope)
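  • Strengths: best Simplified Chinese accuracy, emotion recognition, audio event detection, very fast
  • Output: Simplified Chinese, with special markers stripped automatically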
  • Languages: zh, en, ja, ko, yue (Cantonese)
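
The "special markers" mentioned above are the tags SenseVoice places before the text, for language, emotion, and audio event. As a rough illustration of the clean-up step, assuming tags of the form `<|...|>`, a standard-library stand-in might look like this. The helper name is hypothetical and this is not necessarily how `asr.py` does it.

```python
import re

# SenseVoice-style tags look like <|zh|>, <|NEUTRAL|>, <|Speech|>.
_TAG = re.compile(r"<\|[^|<>]*\|>")


def strip_special_markers(raw: str) -> str:
    """Remove <|...|> tags and collapse any leftover runs of whitespace."""
    without_tags = _TAG.sub("", raw)
    return " ".join(without_tags.split())


print(strip_special_markers("<|zh|><|NEUTRAL|><|Speech|><|woitn|>今天天气很好"))
# prints: 今天天气很好
```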

OpenAI Whisper

  • Model: base (139MB, auto-downloaded)
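  • Strengths: 90+ languages, translation mode, multilingual audio
  • Output: Chinese text comes out in Traditional characters (known issue; switching to the small model improves this)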
  • Whisper model sizes:
| Model | VRAM | Speed | Accuracy |
|-------|------|-------|----------|
| tiny | ~1GB | Fastest | Low |
| base | ~1GB | Fast | OK |
| small | ~2GB | Medium | Good |
| medium | ~5GB | Slow | Better |
| large | ~10GB | Slowest | Best |
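
One way to use the size table is to pick the largest model that fits the GPU memory you have free. The sketch below hard-codes the approximate figures from the table; `largest_model_that_fits` is a hypothetical helper, and real memory use varies with audio length and precision.

```python
from typing import Optional

# Approximate VRAM needs in GB, ordered from smallest to largest model.
WHISPER_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_that_fits(free_vram_gb: float) -> Optional[str]:
    """Return the largest Whisper model whose listed VRAM need fits, or None."""
    best = None
    for name, needed_gb in WHISPER_VRAM_GB:
        if needed_gb <= free_vram_gb:
            best = name  # later entries are larger, so keep overwriting
    return best


print(largest_model_that_fits(8))    # medium
print(largest_model_that_fits(0.5))  # None
```

The result can be passed straight to `--model` when running with `--engine whisper`.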

Examples

# Chinese audio → FunASR (default, best quality)
{baseDir}/scripts/asr.py --input meeting.mp3

# Force Chinese language
{baseDir}/scripts/asr.py --input podcast.wav --language zh

# Multilingual audio → Whisper
{baseDir}/scripts/asr.py --input mixed.wav --engine whisper

# Whisper with better model
{baseDir}/scripts/asr.py --input lecture.mp3 --engine whisper --model small

# Translate Chinese speech to English text
{baseDir}/scripts/asr.py --input speech.mp3 --engine whisper --language zh --task translate

# Save transcript to file
{baseDir}/scripts/asr.py --input audio.wav --output transcript.txt
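
To run the same commands over a whole folder, a small wrapper can build one command line per audio file. This is a sketch under assumptions: the script path is passed in explicitly (standing in for `{baseDir}/scripts/asr.py`), the script is assumed to be runnable with the current Python interpreter, and the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a"}


def build_command(script: Path, audio: Path, engine: str = "funasr") -> List[str]:
    """Build the argv for one file, writing the transcript next to the audio."""
    transcript = audio.with_suffix(".txt")
    return [sys.executable, str(script),
            "--input", str(audio),
            "--engine", engine,
            "--output", str(transcript)]


def transcribe_folder(script: Path, folder: Path, engine: str = "funasr") -> None:
    """Run the ASR script once per audio file in the folder, in name order."""
    for audio in sorted(folder.iterdir()):
        if audio.suffix.lower() in AUDIO_SUFFIXES:
            # check=True stops the batch at the first failing file.
            subprocess.run(build_command(script, audio, engine), check=True)
```

A call such as `transcribe_folder(Path("scripts/asr.py"), Path("recordings"))` would then leave one `.txt` file beside each recording.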

Dependencies

  • funasr + modelscope (FunASR engine)
  • openai-whisper (Whisper engine)
  • imageio-ffmpeg (bundled ffmpeg binary)
  • First run downloads model weights (auto-cached in ~/.cache/)
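
Before the first run it can help to check which of these packages are importable. The sketch below uses only the standard library. The mapping from pip package name to import name is an assumption (for example, `openai-whisper` is normally imported as `whisper`).

```python
import importlib.util

# pip package name -> import name (assumed mapping).
REQUIRED_MODULES = {
    "funasr": "funasr",
    "modelscope": "modelscope",
    "openai-whisper": "whisper",
    "imageio-ffmpeg": "imageio_ffmpeg",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


missing = missing_packages()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All ASR dependencies are importable.")
```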

Version History

1 version in total

  • v1.2.0 (current)
    2026-05-23 23:37 · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
