Overview

feishu-asr: speech recognition (ASR) for Feishu voice messages

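Trigger conditions

The user sends a Feishu voice message
The user asks to convert speech to text
The user mentions "语音识别" (speech recognition) or "转文字" (convert to text)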
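
The trigger conditions above could be checked with a small predicate. This is only a sketch: the function name should_trigger_asr, the assumption that Feishu labels voice messages with msg_type "audio", and the English keyword variants are illustrative and not taken from the skill itself.

```python
# The two Chinese keywords come from the trigger list; the English ones are illustrative.
TRIGGER_KEYWORDS = ("语音识别", "转文字", "speech recognition", "transcribe")


def should_trigger_asr(msg_type: str, text: str = "") -> bool:
    """Return True for a voice message or a text that asks for transcription."""
    # Assumption: Feishu marks voice messages with msg_type == "audio".
    if msg_type == "audio":
        return True
    lowered = text.lower()
    return any(keyword in lowered for keyword in TRIGGER_KEYWORDS)
```

Plain substring matching is deliberately loose; a stricter deployment might also require that the text message replies to or quotes a voice message.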

Workflow

1. Get the voice file

Read the voice file's file_key from the Feishu message, then download the file in .ogg or .m4a format.
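
The download step might look like the sketch below. It assumes the voice message's content field is a JSON string containing file_key, and that the file is fetched through Feishu's message-resource endpoint with a tenant access token. The URL pattern, the type=file query parameter, and all function names here are assumptions to verify against the current Feishu Open Platform documentation.

```python
import json
import urllib.request

# Assumed endpoint for fetching a resource attached to a message.
RESOURCE_URL = (
    "https://open.feishu.cn/open-apis/im/v1/messages/"
    "{message_id}/resources/{file_key}?type=file"
)


def extract_file_key(content: str) -> str:
    """Pull file_key out of the JSON string stored in the message content."""
    return json.loads(content)["file_key"]


def build_download_request(message_id: str, file_key: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the authenticated GET request for the voice file."""
    url = RESOURCE_URL.format(message_id=message_id, file_key=file_key)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def download_voice(message_id: str, content: str, token: str, dest: str) -> str:
    """Download the voice file to dest and return the path."""
    request = build_download_request(message_id, extract_file_key(content), token)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        out.write(response.read())
    return dest
```

Building the request separately from sending it keeps the URL and header logic easy to check without network access.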

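2. Convert the audio format

Use the Python soundfile library to convert the audio to a 16 kHz mono WAV file. soundfile relies on libsndfile, which does not decode .m4a (AAC), so such files need converting with another tool (for example ffmpeg) first. Writing the samples with a new sample rate does not resample them, so the code resamples explicitly:

import soundfile as sf
from scipy.signal import resample_poly  # extra dependency: pip install scipy

# voice_file: path to the downloaded voice file
audio, sr = sf.read(voice_file)
# Downmix stereo to mono
if audio.ndim > 1:
    audio = audio.mean(axis=1)
# Resample to 16 kHz; sf.write alone would only relabel the sample rate
if sr != 16000:
    audio = resample_poly(audio, 16000, sr)
sf.write('output.wav', audio, 16000)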
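
Before passing the file to Whisper, it can help to confirm that the conversion produced what the model expects. The sketch below uses only the standard-library wave module to read the WAV header; the function name check_whisper_wav is hypothetical, and it assumes the file is plain PCM WAV.

```python
import wave


def check_whisper_wav(path: str, expected_rate: int = 16000) -> dict:
    """Read the WAV header and confirm the file is mono at the expected sample rate."""
    with wave.open(path, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    if info["channels"] != 1 or info["sample_rate"] != expected_rate:
        raise ValueError(f"expected mono {expected_rate} Hz audio, got {info}")
    return info
```

Comparing duration_s with the length of the original voice message is a quick way to catch a file whose sample rate was relabeled rather than resampled, since that changes the playback duration.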

3. Transcribe with Whisper

import os
# Set the mirror before importing transformers so the Hugging Face client picks it up
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # mirror for mainland China

import soundfile as sf
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor, WhisperFeatureExtractor

# Read the audio (expected: the 16 kHz WAV produced in step 2)
audio, sr = sf.read('output.wav')
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Load the model
processor = WhisperProcessor.from_pretrained('openai/whisper-tiny')
model = WhisperForConditionalGeneration.from_pretrained('openai/whisper-tiny')
feature_extractor = WhisperFeatureExtractor.from_pretrained('openai/whisper-tiny')

# Transcribe; passing the real sample rate makes the extractor reject non-16 kHz input
input_features = feature_extractor(audio, sampling_rate=sr, return_tensors='pt').input_features
with torch.no_grad():
    predicted_ids = model.generate(input_features)

result = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
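
Whisper's feature extractor works on a fixed 30-second window: shorter input is padded, and by default longer input is cut off at 30 seconds. For voice messages longer than that, one option is to split the samples into 30-second windows and transcribe each in turn. The sketch below illustrates this; chunk_bounds and transcribe_long are hypothetical helper names, and a naive split like this can cut a word at a window boundary.

```python
from typing import Iterator, Tuple


def chunk_bounds(n_samples: int, sample_rate: int = 16000,
                 window_s: float = 30.0) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices covering the audio in fixed-size windows."""
    step = int(sample_rate * window_s)
    for start in range(0, n_samples, step):
        yield start, min(start + step, n_samples)


def transcribe_long(audio, model, processor, sample_rate: int = 16000) -> str:
    """Transcribe each window separately and concatenate the text."""
    import torch  # imported here so chunk_bounds stays usable without torch

    texts = []
    for start, end in chunk_bounds(len(audio), sample_rate):
        features = processor.feature_extractor(
            audio[start:end], sampling_rate=sample_rate, return_tensors="pt"
        ).input_features
        with torch.no_grad():
            ids = model.generate(features)
        texts.append(processor.batch_decode(ids, skip_special_tokens=True)[0])
    return "".join(texts).strip()
```

With the objects from step 3, a call would look like transcribe_long(audio, model, processor).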

Installing dependencies

pip install torch transformers soundfile scipy

scipy is only needed if step 2 resamples with scipy.signal.resample_poly.

Model selection

whisper-tiny: 75 MB, suitable for CPU, fastest
whisper-base: 142 MB, better accuracy
whisper-small: 466 MB, high accuracy
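
The size and accuracy trade-off above can be turned into a simple selection rule. The sketch below picks the largest listed model that fits a download budget; pick_model is a hypothetical helper, the sizes are copied from the list above, and the openai/ prefix follows the model ID used in step 3.

```python
# Approximate sizes in MB, copied from the model list.
MODEL_SIZES_MB = {
    "openai/whisper-tiny": 75,
    "openai/whisper-base": 142,
    "openai/whisper-small": 466,
}


def pick_model(max_size_mb: int) -> str:
    """Return the largest listed model whose size fits within max_size_mb."""
    candidates = [(size, name) for name, size in MODEL_SIZES_MB.items() if size <= max_size_mb]
    if not candidates:
        raise ValueError(f"no listed model fits in {max_size_mb} MB")
    return max(candidates)[1]
```

The returned ID can be passed directly to from_pretrained in place of 'openai/whisper-tiny'.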

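Notes

The first run downloads the model (about 75 MB to 3 GB, depending on the model chosen)
For users in mainland China, the mirror is recommended: HF_ENDPOINT=https://hf-mirror.com
The model detects the spoken language automatically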

Version history

1 version in total

v1.0.0 (current)

2026-05-07 08:10