A complete, privacy-focused voice system for OpenClaw that works entirely offline. No internet required, no data leaves your machine.
```bash
# Install the skill
clawhub install lessac_offline_voice_system

# Or manually from this directory
./scripts/install.sh
```
```python
from scripts.voice_handler import VoiceHandler

handler = VoiceHandler()

# Transcribe audio to text
text = handler.audio_to_text("voice_message.ogg")
print(f"You said: {text}")

# Generate voice response
audio_file = handler.text_to_audio("Hello, this is a voice response.")
```
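The two calls above can be combined into one round trip. The following is a minimal sketch, assuming `text_to_audio` returns the path of the generated audio file (as the variable name `audio_file` suggests); `reply_to_voice` and `make_reply` are hypothetical names, not part of the skill.

```python
from pathlib import Path


def reply_to_voice(handler, audio_path, make_reply):
    """Transcribe audio_path, build a text reply, and synthesize it.

    handler    -- any object exposing audio_to_text() and text_to_audio()
    make_reply -- hypothetical callable mapping transcript text to reply text
    """
    path = Path(audio_path)
    if not path.is_file():
        # Fail early with a clear message instead of passing a bad path on.
        raise FileNotFoundError(f"No such audio file: {path}")
    transcript = handler.audio_to_text(str(path))
    return handler.text_to_audio(make_reply(transcript))
```

With the real handler this might be called as `reply_to_voice(VoiceHandler(), "voice_message.ogg", lambda t: f"You said: {t}")`.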
```bash
# Transcribe audio
./scripts/voice_integration.sh transcribe voice_message.ogg

# Generate TTS
./scripts/voice_integration.sh tts "Hello world" output.wav

# Full voice processing
./scripts/voice_integration.sh process voice_message.ogg
```
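The same commands can be driven from Python with `subprocess`. This is only a sketch: it assumes the script writes its result to standard output and exits non-zero on failure, neither of which is confirmed here. `build_command` and `run_voice_command` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Script path as used in the commands above (relative to the skill directory).
SCRIPT = Path("./scripts/voice_integration.sh")
ACTIONS = {"transcribe", "tts", "process"}


def build_command(action, *args, script=SCRIPT):
    """Return the argv list for one of the three documented actions."""
    if action not in ACTIONS:
        raise ValueError(f"Unknown action: {action!r}")
    return [str(script), action, *args]


def run_voice_command(action, *args, script=SCRIPT):
    """Run the script and return its stdout (assumed to carry the result)."""
    result = subprocess.run(
        build_command(action, *args, script=script),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Passing arguments as a list avoids shell quoting problems with text such as `"Hello world"`.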
Text-to-speech uses `edge-tts` with the `en-IE-ConnorNeural` voice by default.

When installed, the skill can be configured to handle voice messages automatically.
The built-in OpenClaw reply TTS path is not the local voice pipeline used by this skill.
This skill now uses a local Edge TTS reply path instead, with cached output
stored under /root/.openclaw/tts/cache.
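To illustrate how a cached reply path of this kind can work, here is a hedged sketch. The key scheme (SHA-256 of voice plus text), the `.mp3` suffix, and the names `cache_path` and `get_or_synthesize` are assumptions made for illustration; the skill's actual cache layout is not documented here.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("/root/.openclaw/tts/cache")  # directory named above


def cache_path(text, voice, cache_dir=CACHE_DIR, suffix=".mp3"):
    """Map (voice, text) to a stable file name inside the cache directory."""
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{key}{suffix}"


def get_or_synthesize(text, voice, synthesize, cache_dir=CACHE_DIR):
    """Return the cached audio file, calling synthesize() only on a miss.

    synthesize is a hypothetical callable: synthesize(text, voice, target_path).
    """
    target = cache_path(text, voice, cache_dir)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        synthesize(text, voice, target)
    return target
```

Including the voice in the key means that changing the voice does not return audio cached for a different one.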
Default outbound voice: `en-IE-ConnorNeural`

Relevant files:

- `tts_edge_wrapper.py`
- `voice_handler.py`
- `voice_integration.sh`
- `scripts/install.sh`

If you need to change the voice, set:
```bash
export OPENCLAW_EDGE_TTS_VOICE="en-IE-ConnorNeural"
```
or replace it with another Edge-supported voice.
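On the Python side, resolving the voice from this variable might look like the sketch below. How the skill itself reads the variable is not shown here, so `resolve_voice` is a hypothetical helper; only the variable name and the default voice come from this document.

```python
import os

DEFAULT_VOICE = "en-IE-ConnorNeural"  # default outbound voice named above


def resolve_voice(environ=None):
    """Return the configured Edge voice, falling back to the default."""
    env = os.environ if environ is None else environ
    value = env.get("OPENCLAW_EDGE_TTS_VOICE", "").strip()
    # An unset or empty variable falls back to the default voice.
    return value or DEFAULT_VOICE
```

The `edge-tts` command-line tool can print the available voice names with `edge-tts --list-voices`.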
After an OpenClaw system update, rerun the installer to restore the voice stack:
```bash
cd /root/.openclaw/workspace/skills/lessac_offline_voice_system
./scripts/install.sh
```
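After rerunning the installer, one quick way to confirm that the Python packages this section mentions are importable is sketched below. `missing_packages` is a hypothetical helper, and the check covers importability only, not versions or model files.

```python
import importlib.util

# pip package name -> import name, for the packages mentioned in this section
REQUIRED_MODULES = {
    "faster-whisper": "faster_whisper",
    "edge-tts": "edge_tts",
    "soundfile": "soundfile",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    )
```

An empty list from `missing_packages()` suggests the voice stack's Python dependencies are in place.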
Rerunning the installer refreshes:

- the Python dependencies (`faster-whisper`, `edge-tts`, `soundfile`)
- `/root/.openclaw/tts/config.json`

```python
# In your OpenClaw agent or custom script
import sys
sys.path.append("/path/to/skill/scripts")

from voice_handler import VoiceHandler


class YourAgent:
    def __init__(self):
        self.voice = VoiceHandler()

    def generate_response(self, text):
        # Placeholder: replace with your AI logic
        raise NotImplementedError

    def handle_voice_message(self, audio_file):
        # Transcribe
        text = self.voice.audio_to_text(audio_file)
        # Generate response (your AI logic here)
        response = self.generate_response(text)
        # Convert to voice
        voice_response = self.voice.text_to_audio(response)
        return voice_response
```
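To exercise the agent flow without loading any speech models, a stand-in handler can be swapped in. This is a sketch for testing only; `StubVoiceHandler` and `EchoAgent` are hypothetical names, and the stub mirrors only the two methods used above.

```python
class StubVoiceHandler:
    """Stand-in exposing audio_to_text() and text_to_audio(); loads no models."""

    def __init__(self, transcript="hello"):
        self.transcript = transcript
        self.spoken = []  # records every text passed to text_to_audio()

    def audio_to_text(self, audio_file):
        return self.transcript

    def text_to_audio(self, text):
        self.spoken.append(text)
        return f"stub_{len(self.spoken)}.wav"


class EchoAgent:
    """Same flow as the agent above, with the handler injected."""

    def __init__(self, voice):
        self.voice = voice

    def generate_response(self, text):
        return f"You said: {text}"

    def handle_voice_message(self, audio_file):
        text = self.voice.audio_to_text(audio_file)
        return self.voice.text_to_audio(self.generate_response(text))
```

Injecting the handler through the constructor keeps the agent logic testable and lets the real `VoiceHandler` be passed in unchanged in production.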
The skill uses Edge TTS by default. To use a different voice, set `OPENCLAW_EDGE_TTS_VOICE` to a supported Edge voice.

Change the faster-whisper model size in `scripts/voice_handler.py`:

- `"tiny"`: Fastest, lower accuracy
- `"base"`: Default, good balance
- `"small"`: Higher accuracy, slower
- `"medium"`: Best accuracy, slowest

```bash
pip install piper-tts
```
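For reference, selecting one of the faster-whisper model sizes listed in this section could look roughly like the sketch below. It uses the public `WhisperModel` class from `faster_whisper`; the CPU device and `int8` compute type are assumptions, and `load_stt_model` and `transcribe` are hypothetical helpers rather than the code in `scripts/voice_handler.py`.

```python
MODEL_SIZES = ("tiny", "base", "small", "medium")  # sizes listed in this section


def load_stt_model(size="base"):
    """Load a faster-whisper model of the given size on the CPU."""
    if size not in MODEL_SIZES:
        raise ValueError(f"Unsupported model size: {size!r}")
    # Imported lazily so this module can be loaded without faster-whisper.
    from faster_whisper import WhisperModel

    return WhisperModel(size, device="cpu", compute_type="int8")


def transcribe(model, audio_path):
    """Join the text of all segments returned by model.transcribe()."""
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text.strip() for segment in segments)
```

Loading a model by size name typically fetches its files on first use unless they are already present locally.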
```bash
sudo apt-get install ffmpeg
```

For faster transcription, use the `"tiny"` or `"base"` STT model.

Enable debug output:
```bash
export VOICE_DEBUG=1
./scripts/voice_integration.sh process audio.ogg
```
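A custom script can honor the same flag. The sketch below assumes that `VOICE_DEBUG=1` is the only value that enables debugging, which this document does not confirm; `debug_enabled`, `configure_logging`, and the `"voice"` logger name are hypothetical.

```python
import logging
import os


def debug_enabled(environ=None):
    """Return True when VOICE_DEBUG is set to "1" (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get("VOICE_DEBUG", "") == "1"


def configure_logging(environ=None):
    """Return a logger at DEBUG level if the flag is set, else INFO."""
    logger = logging.getLogger("voice")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug_enabled(environ) else logging.INFO)
    return logger
```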
- `scripts/install.sh` - Installation script
- `scripts/voice_handler.py` - Main Python handler
- `scripts/piper_tts.py` - Edge TTS wrapper
- `scripts/voice_integration.sh` - Bash interface
- `references/voice_models.md` - Voice model information
- `assets/` - Voice model files (downloaded during install)

Open source. See included LICENSE file.
For issues or questions: