
VoiceClaw

Local voice I/O for OpenClaw agents. Transcribes inbound voice/audio messages to text using local Whisper (whisper.cpp) and generates voice replies using local Piper TTS.
asif2bd
AI Intelligence · clawhub · v1.0.6 · 1 version · 100000 · API key: not required

Overview

Local-only voice I/O for OpenClaw agents.

  • STT: transcribe.sh — converts audio to text via local Whisper binary
  • TTS: speak.sh — converts text to speech via local Piper binary
  • Network calls: none — both scripts run fully offline
  • No cloud APIs, no API keys required

Prerequisites

The following must be installed on the system before using this skill:

| Requirement | Purpose |
| --- | --- |
| whisper binary | Speech-to-text inference |
| ggml-base.en.bin model file | Whisper STT model |
| piper binary | Text-to-speech synthesis |
| *.onnx voice model files | Piper TTS voices |
| ffmpeg | Audio format conversion |

See README.md for installation and setup instructions.


Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| WHISPER_BIN | auto-detected via which | Path to whisper binary |
| WHISPER_MODEL | ~/.cache/whisper/ggml-base.en.bin | Path to Whisper model file |
| PIPER_BIN | auto-detected via which | Path to piper binary |
| VOICECLAW_VOICES_DIR | ~/.local/share/piper/voices | Directory containing .onnx voice model files |
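
The fallback behavior in the table can be sketched in shell as follows. This is an illustration only, not the code from transcribe.sh or speak.sh: the function name resolve_voiceclaw_env is hypothetical, and command -v stands in for which.

```shell
# Hypothetical sketch: resolve VoiceClaw settings from the environment,
# falling back to the documented defaults. Not taken from the real scripts.
resolve_voiceclaw_env() {
  # Binaries: honor an explicit override, otherwise look the command up on PATH.
  # The result is empty if the binary is not installed.
  WHISPER_BIN="${WHISPER_BIN:-$(command -v whisper || true)}"
  PIPER_BIN="${PIPER_BIN:-$(command -v piper || true)}"

  # Model locations: documented defaults under the user's home directory.
  WHISPER_MODEL="${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}"
  VOICECLAW_VOICES_DIR="${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"

  export WHISPER_BIN WHISPER_MODEL PIPER_BIN VOICECLAW_VOICES_DIR
}
```

Calling resolve_voiceclaw_env once at the top of a wrapper script leaves any value the caller already exported untouched.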

Verify Setup

which whisper && echo "STT binary: OK"
which piper   && echo "TTS binary: OK"
which ffmpeg  && echo "ffmpeg: OK"
ls "${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}" && echo "STT model: OK"
ls "${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"/*.onnx 2>/dev/null | head -1 | grep . && echo "TTS voices: OK"
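
For scripted use, the same checks can be folded into one function that reports every missing piece and returns a non-zero status. This is a minimal sketch: voiceclaw_preflight is a hypothetical name, not part of the skill.

```shell
# Hypothetical helper: report each missing prerequisite on stderr and
# return 1 if anything is missing, 0 otherwise.
voiceclaw_preflight() {
  local missing=0
  local bin
  local model="${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}"
  local voices_dir="${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"

  for bin in "${WHISPER_BIN:-whisper}" "${PIPER_BIN:-piper}" ffmpeg; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing binary: $bin" >&2
      missing=1
    fi
  done

  if [ ! -f "$model" ]; then
    echo "missing Whisper model: $model" >&2
    missing=1
  fi

  # An unmatched glob stays literal, so check that the first result exists.
  set -- "$voices_dir"/*.onnx
  if [ ! -e "${1:-}" ]; then
    echo "no .onnx voice models in: $voices_dir" >&2
    missing=1
  fi

  return "$missing"
}
```

A caller might run voiceclaw_preflight || exit 1 before attempting any transcription or synthesis.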

Inbound Voice: Transcribe

# Transcribe audio → text (supports ogg, mp3, m4a, wav, flac)
TRANSCRIPT=$(bash scripts/transcribe.sh /path/to/audio.ogg)

Override model path:

WHISPER_MODEL=/path/to/ggml-base.en.bin bash scripts/transcribe.sh audio.ogg

Outbound Voice: Speak

# Step 1: Generate WAV (local Piper — no network)
WAV=$(bash scripts/speak.sh "Your response here." /tmp/reply.wav en_US-lessac-medium)

# Step 2: Convert to OGG Opus (Telegram voice requirement)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply.ogg -y -loglevel error

# Step 3: Send via message tool (filePath=/tmp/reply.ogg)

Override voice directory:

VOICECLAW_VOICES_DIR=/path/to/voices bash scripts/speak.sh "Hello." /tmp/reply.wav

Available Voices

| Voice | Style |
| --- | --- |
| en_US-lessac-medium | Neutral American (default) |
| en_US-amy-medium | Warm American female |
| en_US-joe-medium | American male |
| en_US-kusal-medium | Expressive American male |
| en_US-danny-low | Deep American male (fast) |
| en_GB-alba-medium | British female |
| en_GB-northern_english_male-medium | Northern British male |
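
One way to map a voice name from this table to a model file is sketched below. It assumes each voice is stored as NAME.onnx directly inside VOICECLAW_VOICES_DIR, which matches Piper's usual file naming but is not confirmed by this document. The helper name voice_model_path and the fallback to the default voice are both hypothetical.

```shell
# Hypothetical helper: print the .onnx path for a voice name, falling back
# to the documented default voice when the requested one is not installed.
# Assumes models are stored as "$VOICECLAW_VOICES_DIR/<voice>.onnx".
voice_model_path() {
  local voice="${1:-en_US-lessac-medium}"
  local dir="${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"

  if [ -f "$dir/$voice.onnx" ]; then
    printf '%s\n' "$dir/$voice.onnx"
  elif [ -f "$dir/en_US-lessac-medium.onnx" ]; then
    echo "voice '$voice' not found, using en_US-lessac-medium" >&2
    printf '%s\n' "$dir/en_US-lessac-medium.onnx"
  else
    echo "no usable voice model in $dir" >&2
    return 1
  fi
}
```

For example, voice_model_path en_GB-alba-medium prints the path to that model if it is installed, and the default voice's path otherwise.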

Agent Behavior Rules

  1. Voice in → Voice + Text out. Always respond with both a voice reply and a text reply when a voice message is received.
  2. Include the transcript. Show "🎙️ I heard: [transcript]" at the top of every text reply to a voice message.
  3. Keep voice responses concise. Piper TTS works best under ~200 words — summarize for audio, include full detail in text.
  4. Local only. Never use a cloud TTS/STT API. Only the local whisper and piper binaries.
  5. Send voice before text. Send the audio file first, then follow with the text reply.
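
Rules 2 and 3 lend themselves to small helpers. The sketch below is illustrative only: both function names are hypothetical, and voice_text truncates to a word budget, whereas the rule asks the agent to summarize.

```shell
# Rule 2 (sketch): prefix the text reply with the transcript.
format_text_reply() {
  local transcript="$1"
  local response="$2"
  printf '🎙️ I heard: %s\n\n%s\n' "$transcript" "$response"
}

# Rule 3 (sketch): cap the spoken text at a word budget (default 200).
# Plain truncation stands in for real summarization here.
voice_text() {
  local text="$1"
  local max_words="${2:-200}"
  # Split on whitespace, keep the first max_words words, rejoin with spaces.
  printf '%s\n' "$text" | tr -s '[:space:]' '\n' | sed '/^$/d' \
    | head -n "$max_words" | paste -sd ' ' -
}
```

The output of voice_text would go to speak.sh, while format_text_reply produces the full text message sent after the audio.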

Full Example

# 1. Transcribe inbound voice message
TRANSCRIPT=$(bash path/to/voiceclaw/scripts/transcribe.sh /path/to/voice.ogg)

# 2. Compose reply and generate audio
RESPONSE="Deployment complete. All checks passed."
WAV=$(bash path/to/voiceclaw/scripts/speak.sh "$RESPONSE" /tmp/reply_$$.wav)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply_$$.ogg -y -loglevel error

# 3. Send voice + text
# message(action=send, filePath=/tmp/reply_$$.ogg, ...)
# reply: "🎙️ I heard: $TRANSCRIPT\n\n$RESPONSE"
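
The synthesis and conversion steps can also be wrapped in a function that works in a private temporary directory instead of fixed /tmp names. This is a sketch under stated assumptions: make_voice_reply and the VOICECLAW_DIR variable are hypothetical, and speak.sh is assumed to print the WAV path on stdout, as the WAV=$(...) usage above implies.

```shell
# Hypothetical wrapper: synthesize a reply and convert it to OGG Opus inside
# a fresh temporary directory. Prints the .ogg path on success.
# VOICECLAW_DIR is an assumed variable pointing at the skill checkout.
make_voice_reply() {
  local text="$1"
  local skill_dir="${VOICECLAW_DIR:-path/to/voiceclaw}"
  local workdir wav

  workdir="$(mktemp -d)" || return 1

  # Assumes speak.sh prints the path of the WAV file it generated.
  wav="$(bash "$skill_dir/scripts/speak.sh" "$text" "$workdir/reply.wav")" \
    || { rm -rf "$workdir"; return 1; }

  ffmpeg -i "$wav" -c:a libopus -b:a 32k "$workdir/reply.ogg" -y -loglevel error \
    || { rm -rf "$workdir"; return 1; }

  rm -f "$wav"    # the intermediate WAV is no longer needed
  printf '%s\n' "$workdir/reply.ogg"
}
```

The caller is responsible for deleting the returned file's directory once the message has been sent, for example with rm -rf "$(dirname "$OGG")".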

Troubleshooting

| Issue | Fix |
| --- | --- |
| whisper: command not found | Ensure whisper binary is installed and in PATH |
| Whisper model not found | Set WHISPER_MODEL=/path/to/ggml-base.en.bin |
| piper: command not found | Ensure piper binary is installed and in PATH |
| Voice model missing | Set VOICECLAW_VOICES_DIR=/path/to/voices/ |
| OGG won't play on Telegram | Ensure -c:a libopus flag in ffmpeg command |

Version History

1 version in total

  • v1.0.6 (current)
    2026-03-29 16:56, security status: safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
