Overview

Voice to Text

Convert voice messages and audio files to text using Vosk, an offline speech recognition toolkit.

Setup

Install dependencies:

```bash
# macOS
brew install ffmpeg
pip install vosk

# Linux (Debian/Ubuntu)
sudo apt-get install ffmpeg
pip install vosk
```

Download a Vosk model:

```bash
mkdir -p ~/.vosk/models && cd ~/.vosk/models

# Chinese (small, fast)
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-cn-0.22.zip
unzip vosk-model-small-cn-0.22.zip

# English (small)
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
```
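On machines without `curl` or `unzip`, the same download-and-extract step can be done with Python's standard library. This is a minimal sketch, not part of the skill: `fetch_model` is a hypothetical helper, and it assumes each model archive contains a single top-level folder named after the model.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_model(url: str, models_dir: Path) -> Path:
    """Download a Vosk model zip from `url` and extract it into `models_dir`.

    Hypothetical helper mirroring the curl + unzip commands above.
    Returns the path of the extracted model directory.
    """
    models_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "model.zip"
        # urlretrieve handles https:// as well as file:// URLs.
        urllib.request.urlretrieve(url, str(zip_path))
        with zipfile.ZipFile(zip_path) as zf:
            # Assumption: the archive holds exactly one top-level folder.
            top_level = {name.split("/", 1)[0] for name in zf.namelist()}
            if len(top_level) != 1:
                raise ValueError(f"expected one top-level folder, found {sorted(top_level)}")
            zf.extractall(models_dir)
    return models_dir / top_level.pop()
```

For example, `fetch_model("https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip", Path.home() / ".vosk" / "models")` should leave the model in the same place as the shell commands do.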

Usage

When the user provides a voice message or an audio file path, run the transcription script:

```bash
python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
```
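A transcription script of this kind typically feeds mono 16-bit PCM audio to a Vosk `KaldiRecognizer` in chunks and joins the recognized text. The sketch below illustrates that loop with the `vosk` Python API (`Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result`, `FinalResult`); it is not the actual contents of `transcribe.py`, and `transcribe_wav` and `make_recognizer` are hypothetical names. The recognizer is passed in so the loop can run without a downloaded model.

```python
import json
import wave


def transcribe_wav(wav_path, recognizer, chunk_frames=4000):
    """Feed a mono 16-bit PCM WAV file to a Vosk-style recognizer and return the text."""
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV (convert with ffmpeg first)")
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))
    # FinalResult flushes any audio still buffered in the recognizer.
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)


def make_recognizer(model_path, sample_rate=16000):
    """Build a real Vosk recognizer (hypothetical helper; needs `pip install vosk`)."""
    # Imported lazily so transcribe_wav works without vosk installed.
    from vosk import KaldiRecognizer, Model

    return KaldiRecognizer(Model(model_path), sample_rate)
```

With a model in place, a call might look like `transcribe_wav("voice.wav", make_recognizer(os.path.expanduser("~/.vosk/models/vosk-model-small-en-us-0.15")))`, where `voice.wav` is already a 16 kHz mono file.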

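To use a specific model, point the VOSK_MODEL_PATH environment variable at its directory (for example, the larger Chinese model listed under Available Models, which must be downloaded first):

```bash
VOSK_MODEL_PATH=~/.vosk/models/vosk-model-cn-0.22 python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
```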
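One plausible lookup order for the model is: use VOSK_MODEL_PATH if it is set, otherwise fall back to a model directory under `~/.vosk/models/` (consistent with the "No model found" hint in Troubleshooting). The sketch below assumes that order and picks the alphabetically first `vosk-model-*` directory; `resolve_model_path` is a hypothetical name, and the real `transcribe.py` may choose differently.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_model_path(models_dir: Optional[Path] = None) -> Path:
    """Return the Vosk model directory to use.

    Assumed lookup order (hypothetical, not confirmed by the skill):
    1. VOSK_MODEL_PATH, with ~ expanded;
    2. the alphabetically first vosk-model-* directory under ~/.vosk/models.
    """
    env_path = os.environ.get("VOSK_MODEL_PATH")
    if env_path:
        path = Path(env_path).expanduser()
        if not path.is_dir():
            raise FileNotFoundError(f"VOSK_MODEL_PATH is not a directory: {path}")
        return path

    models_dir = models_dir or Path.home() / ".vosk" / "models"
    # Skip leftover .zip archives; only extracted directories are usable models.
    candidates = sorted(p for p in models_dir.glob("vosk-model-*") if p.is_dir())
    if not candidates:
        raise FileNotFoundError(f"No model found: download a model to {models_dir}/")
    return candidates[0]
```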

Supported Audio Formats

MP3, WAV, M4A, OGG, FLAC, AAC, WEBM
Voice messages from WeChat, Telegram, WhatsApp, etc.

Available Models

| Model | Language | Size | Notes |
|-------|----------|------|-------|
| vosk-model-small-cn-0.22 | Chinese | 42 MB | Fast, good accuracy |
| vosk-model-cn-0.22 | Chinese | 1.3 GB | High accuracy |
| vosk-model-small-en-us-0.15 | English | 40 MB | Fast, good accuracy |
| vosk-model-en-us-0.22 | English | 1.8 GB | High accuracy |

Download models from: https://alphacephei.com/vosk/models

Example Workflow

1. The user sends a voice message via WeChat or Telegram.
2. OpenClaw receives the audio file.
3. Run: python3 ~/skills/voice-to-text/transcribe.py /path/to/voice.ogg
4. Return the transcribed text to the user.
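The workflow above comes down to running the script on the received file and passing its output back. A minimal sketch, assuming `transcribe.py` prints the transcript to stdout and exits non-zero on failure (neither is stated here); `run_transcription` and the 300-second timeout are hypothetical choices.

```python
import os
import subprocess
import sys
from pathlib import Path

DEFAULT_SCRIPT = Path.home() / "skills" / "voice-to-text" / "transcribe.py"


def run_transcription(audio_path, script=DEFAULT_SCRIPT, model_path=None, timeout=300):
    """Run the transcription script on one audio file and return its stdout text."""
    env = None
    if model_path is not None:
        # Same effect as prefixing the command with VOSK_MODEL_PATH=...
        env = {**os.environ, "VOSK_MODEL_PATH": str(model_path)}
    result = subprocess.run(
        [sys.executable, str(script), str(audio_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
        timeout=timeout,
        env=env,
    )
    return result.stdout.strip()
```

Using `sys.executable` instead of a literal `python3` keeps the script in the same interpreter (and virtual environment) where `vosk` was installed.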

Troubleshooting

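No model found: download a model into ~/.vosk/models/ (see Setup), or set VOSK_MODEL_PATH to an existing model directory.
ffmpeg not found: install it with brew install ffmpeg (macOS) or sudo apt-get install ffmpeg (Linux).
Poor accuracy: switch to a larger model, such as vosk-model-cn-0.22 or vosk-model-en-us-0.22.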
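The first two problems above can be detected before any audio is processed. This is a hedged sketch of such a preflight check: `preflight` is a hypothetical helper, and it only inspects the default models directory (or one passed in), not VOSK_MODEL_PATH.

```python
import importlib.util
import shutil
from pathlib import Path


def preflight(models_dir=None):
    """Return a list of setup problems matching the Troubleshooting entries (empty if none)."""
    problems = []
    # find_spec returns None (rather than raising) when the package is missing.
    if importlib.util.find_spec("vosk") is None:
        problems.append("vosk not installed: run pip install vosk")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found: install via brew install ffmpeg or sudo apt-get install ffmpeg")
    models_dir = Path(models_dir) if models_dir else Path.home() / ".vosk" / "models"
    if not any(p.is_dir() for p in models_dir.glob("vosk-model-*")):
        problems.append(f"No model found: download a model to {models_dir}/")
    return problems
```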

Notes

Works fully offline once a model has been downloaded.
Supports multiple languages; download the model for each language you need.
Audio is converted to 16 kHz mono WAV before recognition.
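The conversion to 16 kHz mono WAV can be done with a single ffmpeg call. The sketch below builds that command from standard ffmpeg options (`-ar`, `-ac`, `-c:a pcm_s16le`, `-f wav`); the helper names are hypothetical, and whether `transcribe.py` uses exactly these options is an assumption.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command converting any supported input to mono 16-bit WAV."""
    return [
        "ffmpeg", "-y",            # overwrite dst without prompting
        "-loglevel", "error",      # stay quiet unless something fails
        "-i", str(src),
        "-ar", str(sample_rate),   # resample to 16 kHz
        "-ac", "1",                # downmix to mono
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM samples
        "-f", "wav",
        str(dst),
    ]


def convert_to_wav(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True)
    return Path(dst)
```

For example, `convert_to_wav("voice.ogg", "voice.wav")` produces a file in the mono 16-bit PCM format that a Vosk recognizer created at 16000 Hz accepts.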

Version History

1 version in total

v1.0.0 (current)

2026-05-12 06:08 (Safe)

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)