← 返回
AI智能

TinkerClaw Jarvis Voice

Turn your AI into JARVIS. Voice, wit, and personality — the complete package. Humor cranked to maximum.
让AI化身JARVIS,配备声音、机智与完整人格,幽默拉满。
globalcaos
AI智能 clawhub v2.2.2 3 版本 99726 Key: 无需
★ 3
Stars
📥 8,675
下载
💾 2,168
安装
3
版本
#latest

概述

Jarvis Voice

Your AI just got a voice. And the wit to use it.

Remember JARVIS in the Iron Man films? Not just the voice — the _personality_. The bone-dry observations while Tony was mid-crisis. _"I do appreciate your concern, sir, but the suit is quite capable of—" [explosion] "—as I was saying."_ That effortless, understated humor that made you forget you were listening to software.

That's what this skill gives your OpenClaw agent. The voice — offline text-to-speech using sherpa-onnx (British Alan voice) with metallic audio processing via ffmpeg. And the humor — four research-backed comedy patterns (dry wit, self-aware AI, alien observer, literal idiom play) calibrated to make your agent sound like it's been running your life for years and is quietly amused by the experience.

The humor isn't bolted on. It's baked in. Because a JARVIS that speaks without wit is just Siri with better reverb.

📄 The research behind the humor: LIMBIC — Computational Humor via Bisociation & Embedding Distances

The built-in tts tool uses Edge TTS — cloud-based, wrong voice, no metallic effects. The jarvis shell command is the right tool here; tts produces a generic Microsoft voice that breaks the JARVIS illusion.

One call does everything. Run the jarvis command once with what you want said:

exec(command='jarvis "Your spoken text here."', background=true)

The script (~/.local/bin/jarvis) both (a) posts the text to the chat UI as a purple Jarvis: bubble via the chat.inject gateway RPC, and (b) plays the metallic TTS. You do NOT hand-write a Jarvis: line — the script is the single source of truth, so what's shown always matches what's spoken (the old fragile prompt-driven duplication is gone, fixed 2026-05-30).

Channel-gated, automatic: it speaks + posts only on the home Tinker-UI session — it reads the channel from $TC_SESSION_KEY (format agent:::) and acts only when the channel is tinker. WhatsApp / cron / subagent turns produce neither voice nor bubble, with no per-turn judgment from you. The mute toggle (~/.openclaw/data/jarvis-muted.json) silences the speaker while still posting the bubble. Override the allowed channel with TC_VOICE_CHANNEL.

Command Reference

jarvis "Hello, this is a test"
  • Backend: sherpa-onnx offline TTS (Alan voice, British English, en_GB-alan-medium)
  • Speed: 2x (--vits-length-scale=0.5)
  • Effects chain (ffmpeg):
  • Pitch up 5% — tighter AI feel
  • Flanger — metallic sheen
  • 15ms echo — robotic ring
  • Highpass 200Hz + treble boost +6dB — crisp HUD clarity
  • Output: Plays via aplay to default audio device, then cleans up temp files
  • Language: English ONLY. The Alan model cannot handle other languages.

  1. Use background: true on the exec call — blocking the reply on audio playback delays the visual response and feels broken.
  2. Do NOT hand-write a Jarvis: transcript line — the script posts the bubble itself via chat.inject. Writing one too would double it.
  3. Keep spoken text ≤ 1500 characters; sherpa-onnx truncates above that.
  4. One jarvis call per response — stacked calls fight over the audio device and produce overlapping playback.
  5. English only — the Alan voice model can't pronounce other languages cleanly, so translate or summarise in English for the spoken portion.

  • Session greetings and farewells
  • Delivering results or summaries
  • Responding to direct conversation
  • Any time the user's last message included voice/audio

  • Pure tool/file operations with no conversational element
  • HEARTBEAT_OK responses
  • NO_REPLY responses

Webchat Purple Styling

The OpenClaw webchat has built-in support for Jarvis voice transcripts:

  • ui/src/styles/chat/text.css.jarvis-voice class renders purple italic (#9b59b6 dark, #8e44ad light theme)
  • ui/src/ui/markdown.ts — Post-render hook auto-wraps text after Jarvis: in a element

This means you just write Jarvis: text in markdown and the webchat handles the purple rendering. No extra markup needed.

For non-webchat surfaces (WhatsApp, Telegram, etc.), the bold/italic markdown renders natively — no purple, but still visually distinct.

Installation (for new setups)

Requires:

  • sherpa-onnx runtime at ~/.openclaw/tools/sherpa-onnx-tts/
  • Alan medium model at ~/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_GB-alan-medium/
  • ffmpeg installed system-wide
  • aplay (ALSA) for audio playback
  • The jarvis script at ~/.local/bin/jarvis (or in PATH)

The jarvis script

#!/bin/bash
# Jarvis TTS - authentic JARVIS-style voice
# Usage: jarvis "Hello, this is a test"

export LD_LIBRARY_PATH=$HOME/.openclaw/tools/sherpa-onnx-tts/lib:$LD_LIBRARY_PATH

RAW_WAV="/tmp/jarvis_raw.wav"
FINAL_WAV="/tmp/jarvis_final.wav"

# Generate speech
$HOME/.openclaw/tools/sherpa-onnx-tts/bin/sherpa-onnx-offline-tts \
  --vits-model=$HOME/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_GB-alan-medium/en_GB-alan-medium.onnx \
  --vits-tokens=$HOME/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_GB-alan-medium/tokens.txt \
  --vits-data-dir=$HOME/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_GB-alan-medium/espeak-ng-data \
  --vits-length-scale=0.5 \
  --output-filename="$RAW_WAV" \
  "$@" >/dev/null 2>&1

# Apply JARVIS metallic processing
if [ -f "$RAW_WAV" ]; then
  ffmpeg -y -i "$RAW_WAV" \
    -af "asetrate=22050*1.05,aresample=22050,\
flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,\
aecho=0.8:0.88:15:0.5,\
highpass=f=200,\
treble=g=6" \
    "$FINAL_WAV" -v error

  if [ -f "$FINAL_WAV" ]; then
    aplay -D plughw:0,0 -q "$FINAL_WAV"
    rm "$RAW_WAV" "$FINAL_WAV"
  fi
fi

WhatsApp Voice Notes

For WhatsApp, output must be OGG/Opus format instead of speaker playback:

sherpa-onnx-offline-tts --vits-length-scale=0.5 --output-filename=raw.wav "text"
ffmpeg -i raw.wav \
  -af "asetrate=22050*1.05,aresample=22050,flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,aecho=0.8:0.88:15:0.5,highpass=f=200,treble=g=6" \
  -c:a libopus -b:a 64k output.ogg

The Full JARVIS Experience

jarvis-voice gives your agent a voice. Pair it with ai-humor-ultimate and you give it a _soul_ — dry wit, contextual humor, the kind of understated sarcasm that makes you smirk at your own terminal.

This pairing is part of a 12-skill cognitive architecture we've been building — voice, humor, memory, reasoning, and more. Research papers included, because we're that kind of obsessive.

👉 Explore the full project: github.com/globalcaos/tinkerclaw

Clone it. Fork it. Break it. Make it yours.

Setup: Workspace Files

For voice to work consistently across new sessions, copy the templates to your workspace root:

cp {baseDir}/templates/VOICE.md ~/.openclaw/workspace/VOICE.md
cp {baseDir}/templates/SESSION.md ~/.openclaw/workspace/SESSION.md
cp {baseDir}/templates/HUMOR.md ~/.openclaw/workspace/HUMOR.md
  • VOICE.md — injected every session, enforces voice output rules (like SOUL.md)
  • SESSION.md — session bootstrap that includes voice greeting requirements
  • HUMOR.md — humor configuration at maximum frequency with four pattern types (dry wit, self-aware AI, alien observer, literal idiom)

Both files are auto-loaded by OpenClaw's workspace injection. The agent will speak from the very first reply of every session.

Included Files

FilePurpose
------------------------------------------------------------------------------------------
bin/jarvisThe TTS + effects script (portable, uses $SHERPA_ONNX_TTS_DIR)
templates/VOICE.mdVoice enforcement rules (copy to workspace root)
templates/SESSION.mdSession start with voice greeting (copy to workspace root)
templates/HUMOR.mdHumor config — four patterns, frequency 1.0 (copy to workspace root)

版本历史

共 3 个版本

  • v2.2.2 当前
    2026-06-07 11:44 安全 安全
  • v2.2.1
    2026-03-28 09:47 安全 安全
  • v3.1.1
    2026-03-27 18:20

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-intelligence

ontology

oswalpalash
类型化知识图谱,用于结构化智能体记忆与可组合技能。支持创建/查询实体(人员、项目、任务、事件、文档)及关联...
★ 709 📥 243,527
ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
自我反思+自我批评+自我学习+自组织记忆。智能体评估自身工作、发现错误并持续改进。
★ 1,349 📥 317,696
content-creation

TinkerClaw YouTube

globalcaos
免费字幕、4K 下载、视频探索——不消耗 API 配额。专为 TinkerClaw 分支打造——github.com/globalcaos/tinkerclaw。
★ 9 📥 4,400