Overview

jarvis-vocal

Uses the authentic J.A.R.V.I.S. voice model from HuggingFace (trained on actual movie lines) via Piper TTS. No audio effects needed — the voice is naturally cinematic and British.

> Credit: Voice model by jgkawell — see the discussion for details on training and samples.

Usage

Generate a WAV file:

{baseDir}/bin/jarvis-tts "Text to speak" ./output.wav
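
Before pushing a generated file to a device, it can help to confirm that it really is a WAV file. The sketch below is one way to do that, assuming `head -c` and `tail -c` are available (GNU and BSD both provide them). The helper name `is_wav` is hypothetical and is not part of jarvis-tts.

```shell
# Hypothetical helper, not part of jarvis-tts.
# Return 0 if the file starts with a RIFF/WAVE header, non-zero otherwise.
is_wav() {
    [ -f "$1" ] || return 1
    # In a WAV file, bytes 1-4 spell "RIFF" and bytes 9-12 spell "WAVE".
    riff=$(head -c 4 "$1")
    wave=$(head -c 12 "$1" | tail -c 4)
    [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}

# Usage (not run here):
#   is_wav ./output.wav && echo "output.wav looks like a WAV file"
```

This checks only the container header, not whether the audio inside is playable.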

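Stream directly to an Android device (if ADB is connected). `adb push` needs a local file path and does not read from stdin, so pipe the audio into a shell command on the device instead:

{baseDir}/bin/jarvis-tts "Text to speak" - | adb shell -T "cat > /sdcard/Download/temp.wav"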

Installation

Prerequisites

pipx install piper-tts
sudo apt install ffmpeg  # or equivalent
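
A quick way to confirm the prerequisites are on `PATH` is sketched below. The function name `missing_tools` is hypothetical. The command names `piper`, `ffmpeg`, `adb`, and `hf` are assumptions about what the packages above and the later steps install on your system.

```shell
# Hypothetical helper: print each named tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (not run here):
#   missing_tools piper ffmpeg adb hf || echo "install the tools listed above"
```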

Install Voice Model

# Create voice directory
mkdir -p ~/.local/share/piper/voices/en_GB

# Download models via HuggingFace CLI
cd ~/.local/share/piper/voices/en_GB
hf download jgkawell/jarvis en/en_GB/jarvis/high/jarvis-high.onnx --local-dir .
hf download jgkawell/jarvis en/en_GB/jarvis/high/jarvis-high.onnx.json --local-dir .
# Optional: medium quality model
hf download jgkawell/jarvis en/en_GB/jarvis/medium/jarvis-medium.onnx --local-dir .
hf download jgkawell/jarvis en/en_GB/jarvis/medium/jarvis-medium.onnx.json --local-dir .
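
With `--local-dir`, `hf download` typically keeps each file's repository-relative path, so the models may end up in nested subdirectories rather than directly in `en_GB`. The sketch below locates every `jarvis-*.onnx` file under a directory and checks that its `.onnx.json` config sits next to it. The function name `check_voice_pairs` is hypothetical.

```shell
# Hypothetical helper: list jarvis-*.onnx models under a directory and
# report whether each one has its matching .onnx.json config beside it.
check_voice_pairs() {
    dir=$1
    found=0
    status=0
    while IFS= read -r model; do
        [ -n "$model" ] || continue        # skip the empty line when nothing matched
        found=1
        if [ -f "$model.json" ]; then
            echo "ok       $model"
        else
            echo "no json  $model"
            status=1
        fi
    done <<EOF
$(find "$dir" -type f -name 'jarvis-*.onnx' | sort)
EOF
    if [ "$found" -eq 0 ]; then
        echo "no jarvis-*.onnx files under $dir"
        return 1
    fi
    return "$status"
}

# Usage (not run here):
#   check_voice_pairs ~/.local/share/piper/voices/en_GB
```

If the files are reported in a nested path, move them (model and config together) to wherever your jarvis-tts script expects them.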

Integration

Works with OpenClaw Android nodes via ADB over Tailscale. Use the jarvis-speak wrapper to push and play in one command:

jarvis-speak "Systems at your service, Sir."

Or use streaming mode (faster, ephemeral):

jarvis-speak "Message" --stream
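
The jarvis-speak script itself is not reproduced here. As a rough illustration of what a push-and-play step could look like, the sketch below pushes a WAV file and asks Android to open it with whatever app handles `audio/wav`. The function name `push_and_play`, the `ADB` and `REMOTE` variables, and the use of a `file://` VIEW intent are all assumptions, not the wrapper's actual behavior. Some Android versions or players may refuse `file://` URIs.

```shell
# Illustrative sketch only; the real jarvis-speak may work differently.
ADB=${ADB:-adb}                                        # adb binary (overridable)
REMOTE=${REMOTE:-/sdcard/Download/jarvis-current.wav}  # assumed target path

push_and_play() {
    wav=$1
    # Copy the file to the device, then fire a VIEW intent for it.
    "$ADB" push "$wav" "$REMOTE" &&
        "$ADB" shell am start -a android.intent.action.VIEW \
            -d "file://$REMOTE" -t audio/wav
}

# Usage (not run here):
#   adb connect <tailscale-host>:5555
#   push_and_play ./output.wav
```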

Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| Model | `jarvis-high` | Voice quality: `high` (114 MB) or `medium` (63 MB) |
| Speed | 1.0 (native) | Piper length scale; lower values speak faster, higher values speak slower |
| Volume | 1.0 | Post-processing volume boost |

Edit the jarvis-speak script to change the defaults.
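
To make the three settings concrete, here is a minimal sketch of how they could map onto a Piper call followed by an ffmpeg volume filter. It assumes the script uses Piper's `--length_scale` and `--output_file` options and ffmpeg's `volume` audio filter. The variable names, the model path, and the function `speak_to_file` are hypothetical, so check your copy of jarvis-speak for the real ones.

```shell
# Illustrative sketch only; names and the model path are assumptions.
PIPER=${PIPER:-piper}
FFMPEG=${FFMPEG:-ffmpeg}
MODEL=${MODEL:-$HOME/.local/share/piper/voices/en_GB/jarvis-high.onnx}
SPEED=${SPEED:-1.0}     # Piper length scale: above 1.0 is slower, below is faster
VOLUME=${VOLUME:-1.0}   # ffmpeg volume multiplier applied after synthesis

speak_to_file() {
    text=$1
    out=$2
    raw="${out%.wav}.raw.wav"   # intermediate file before the volume pass
    printf '%s\n' "$text" |
        "$PIPER" --model "$MODEL" --length_scale "$SPEED" --output_file "$raw" &&
        "$FFMPEG" -y -loglevel error -i "$raw" -filter:a "volume=$VOLUME" "$out" &&
        rm -f "$raw"
}

# Usage (not run here):
#   SPEED=0.9 VOLUME=1.5 speak_to_file "Systems at your service, Sir." ./output.wav
```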

Troubleshooting

"Model not found" → Download models to ~/.local/share/piper/voices/en_GB/jarvis-*

ADB connection refused → Ensure ADB over Wi-Fi is enabled on the phone and that the phone is paired with the laptop (port 5555)

Audio doesn't play → Check that the Android device received the file at /sdcard/Download/jarvis-current.wav and has a WAV-capable media player
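
For the ADB case, it can help to see which devices are actually in the `device` state rather than `offline` or `unauthorized`. The sketch below filters the output of `adb devices`. The function name `ready_devices` is hypothetical.

```shell
# Hypothetical helper: read `adb devices` output on stdin and print the
# serials (or host:port entries) whose state is exactly "device".
ready_devices() {
    # Line 1 is the "List of devices attached" header, so skip it.
    awk 'NR > 1 && $2 == "device" { print $1 }'
}

# Usage (not run here):
#   adb devices | ready_devices
```

An empty result means no device is ready. Reconnect with `adb connect <host>:5555` and accept the authorization prompt on the phone if one appears.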

License

MIT. The voice model is released under the MIT license by jgkawell.

Credits

Voice model: jgkawell/jarvis on HuggingFace — trained on Marvel movie lines
TTS engine: Piper by Rhasspy
Integration: OpenClaw by Aidan Park

Version History

1 version in total

v1.0.0 (current)

2026-05-03 08:54
