概述

WeChat Voice

专为微信 clawbot 设计的微信语音解析技能。识别微信 SILK 语音、解码为 WAV、转写为文字，并基于语音内容回复。

What this skill does

Inspect an inbound audio attachment.
Detect whether it is WeChat SILK (#!SILK_V3).
Decode SILK audio to WAV with Python.
Transcribe WAV locally with CPU-based Whisper.
Return a transcript or explicitly report that the clip is blank / too short / unclear.
Use the transcript as the basis for the user-facing reply.

Use this workflow

Get the local file path for the inbound audio attachment from the current conversation context.
Inspect the first bytes of the file before assuming its format.
If the file header contains #!SILK_V3, treat it as WeChat SILK.
Run the bundled script scripts/transcribe_wechat_voice.py on the attachment.
If the script returns text, answer based on the transcript.
If the script returns NO_SEGMENTS, tell the user the clip appears blank, too short, too quiet, or unclear.
If decoding fails, report the failure and mention whether the issue is format detection, decode failure, or transcription failure.

Primary command

python3 /root/.openclaw/workspace/skills/wechat-voice/scripts/transcribe_wechat_voice.py <audio_path>

Optional WAV output path:

python3 /root/.openclaw/workspace/skills/wechat-voice/scripts/transcribe_wechat_voice.py <audio_path> /tmp/wechat-voice.wav

Installation / environment notes

This skill is text-only and ClawHub-friendly, but it expects common local runtimes and Python packages to be available.

Required runtimes

python3
ffmpeg

Python packages needed

Install locally with:

python3 -m pip install --user silk-python faster-whisper

Use silk-python to decode WeChat SILK audio in Python.

Use faster-whisper for local CPU transcription.

Prefer faster-whisper over openai-whisper in this environment because it avoids a heavy PyTorch/CUDA installation chain and works well on CPU.

Output expectations

The script prints exactly one of these:

The recognized text
NO_SEGMENTS if no usable speech is detected

Treat NO_SEGMENTS as a valid outcome, not as a crash.

Good user-facing behavior

Be brief.
If transcript succeeds, respond to the content instead of over-explaining the pipeline.
If the clip is blank, say so plainly.
If the clip is unclear, ask for a clearer re-recording.
Only mention technical details when the user asks or when debugging is needed.

When debugging is needed

Check these in order:

Does the file exist and have non-zero size?
Do the first bytes include #!SILK_V3?
Did WAV decode succeed?
Did transcription return NO_SEGMENTS?
Is the clip too short or silent?

For byte inspection, use a quick Python snippet such as:

python3 - <<'PY'
from pathlib import Path
p = Path('/path/to/audio')
b = p.read_bytes()[:64]
print('size=', p.stat().st_size)
print('hex=', b.hex())
print('ascii=', ''.join(chr(x) if 32 <= x < 127 else '.' for x in b))
PY

Bundled files

scripts/transcribe_wechat_voice.py: Decode/transcribe entry point.
references/notes.md: Environment-specific notes and maintenance hints.

版本历史

共 1 个版本

v0.1.0 当前

2026-03-30 21:19 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)