
KittenTTS WhatsApp

Voice-to-voice mode for WhatsApp using KittenTTS and ffmpeg: incoming audio is transcribed with whisper, and the reply is a TTS voice note converted to a WhatsApp-compatible format.
Author: lakshibro
Uncategorized · clawhub · v1.0.4 · 1 version · Key: not required

Overview

KittenTTS WhatsApp Voice

Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. It addresses the format mismatch that causes silent failures: KittenTTS outputs 24 kHz WAV, ffmpeg converts that to 16 kHz OGG Opus, and the result is sent as a WhatsApp voice note.
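The conversion step can be sketched as a small shell function. This is a minimal sketch, not the contents of scripts/tts_walkie.sh: the function name `wav_to_voice_note`, the `FFMPEG` override variable, the mono downmix, and the 24k bitrate are all assumptions chosen for illustration.

```shell
# Hypothetical helper: convert a WAV file ($1) into a 16 kHz mono OGG Opus file ($2).
# FFMPEG may be overridden (for example with "echo" for a dry run);
# it defaults to the ffmpeg found on PATH.
wav_to_voice_note() {
  "${FFMPEG:-ffmpeg}" -y -loglevel error \
    -i "$1" \
    -ac 1 -ar 16000 \
    -c:a libopus -b:a 24k -application voip \
    "$2"
}

# Example (paths are placeholders):
# wav_to_voice_note /tmp/reply.wav /tmp/walkie_reply.ogg
```

Setting `FFMPEG=echo` before calling the function prints the argument list instead of running ffmpeg, which is a cheap way to review the flags.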

> ⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.

System Dependencies

| Dependency | Install command | Size | Notes |
|------------|-----------------|------|-------|
| ffmpeg | apt-get install -y ffmpeg | ~30MB | Available in most distro repos |
| kittentts | pip3 install kittentts --break-system-packages | pulls ~25-80MB from Hugging Face on first run | Python package |
| libopus | bundled with ffmpeg | | OGG encoding support |
| soundfile | pulled by kittentts | | Python package |

Network Calls

  • First run: downloads TTS model (~25-80MB) from huggingface.co/KittenML based on model size chosen
  • No API keys required — fully offline capable after model download
  • Set HF_TOKEN env var to avoid unauthenticated rate limits on model download
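A wrapper script could report whether the model download will be authenticated. The helper name `hf_auth_status` is hypothetical. The commented offline run assumes KittenTTS fetches its model through the huggingface_hub library, where `HF_HUB_OFFLINE=1` disables network access; that is not confirmed by this skill's documentation.

```shell
# Hypothetical helper: report whether HF_TOKEN is set for the model download.
hf_auth_status() {
  if [ -n "${HF_TOKEN:-}" ]; then
    echo "authenticated"
  else
    echo "anonymous"
  fi
}

echo "Hugging Face downloads will be $(hf_auth_status)" >&2

# Once the model has been cached by a first run, an offline run could look like this
# (assumes the model is loaded via huggingface_hub):
# HF_HUB_OFFLINE=1 bash scripts/tts_walkie.sh "Offline test" Bella
```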

Model Options

| Model | Parameters | Size | Hugging Face ID |
|-------|------------|------|-----------------|
| nano (int8) | 15M | 25MB | KittenML/kitten-tts-nano-0.8-int8 |
| nano | 15M | 56MB | KittenML/kitten-tts-nano-0.8-fp32 |
| micro | 40M | 41MB | KittenML/kitten-tts-micro-0.8 |
| mini | 80M | 80MB | KittenML/kitten-tts-mini-0.8 |

Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.
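To switch models without memorizing the full IDs, a lookup along these lines could be used. The function name `kitten_model_id` and the short keys (`nano-int8`, `nano`, `micro`, `mini`) are illustrative assumptions; only the Hugging Face IDs come from the table above. How scripts/tts_walkie.sh actually selects its model is not shown here.

```shell
# Hypothetical helper: map a short model name to its Hugging Face ID.
# With no argument it falls back to the documented default (mini).
kitten_model_id() {
  case "${1:-mini}" in
    nano-int8) echo "KittenML/kitten-tts-nano-0.8-int8" ;;
    nano)      echo "KittenML/kitten-tts-nano-0.8-fp32" ;;
    micro)     echo "KittenML/kitten-tts-micro-0.8" ;;
    mini)      echo "KittenML/kitten-tts-mini-0.8" ;;
    *)         echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Example: kitten_model_id nano-int8
```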

Setup

Run these manually before the skill is used:

# 1. System package (requires root/privileged)
apt-get install -y ffmpeg

# 2. Python package
pip3 install kittentts --break-system-packages

# 3. Optional: set Hugging Face token to avoid rate limits
# echo 'export HF_TOKEN="hf_your_token_here"' >> ~/.bashrc

Restart OpenClaw after installing dependencies so the new packages are in PATH.
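Before restarting, it can help to confirm that the dependencies are actually reachable. The following is a rough sketch; `missing_commands` is a hypothetical helper, and the check that `kittentts` is importable assumes the Python module has the same name as the pip package.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Join the missing names onto one line for the message.
missing="$(missing_commands ffmpeg ffprobe python3 | tr '\n' ' ')"
if [ -n "$missing" ]; then
  echo "Missing dependencies: $missing" >&2
else
  # Assumption: the Python module is importable as "kittentts".
  python3 -c 'import kittentts' 2>/dev/null \
    || echo "kittentts is not importable by python3" >&2
fi
```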

Usage

TTS only (no transcription)

bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)

Transcription only (optional — requires whisper)

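# Install whisper (one-time, ~140MB-1.4GB depending on model)
# Note: the PyPI package for OpenAI Whisper is "openai-whisper"; "whisper" is an unrelated package.
pip3 install openai-whisper --break-system-packages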

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Model: tiny | base | small | medium | large (default: base)

Voices

Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Default: Bella
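A caller might validate the voice name before invoking the TTS script and fall back to the default. This is only a sketch: `is_known_voice` and `resolve_voice` are hypothetical helpers, and the real script may handle unknown voices differently.

```shell
# Succeed only for the voice names listed above.
is_known_voice() {
  case "$1" in
    Bella|Jasper|Luna|Bruno|Rosie|Hugo|Kiki|Leo) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the requested voice, or the default (Bella) if it is missing or unknown.
resolve_voice() {
  if is_known_voice "${1:-Bella}"; then
    echo "${1:-Bella}"
  else
    echo "Unknown voice '$1', using Bella" >&2
    echo "Bella"
  fi
}

# Example:
# bash scripts/tts_walkie.sh "Your message here" "$(resolve_voice Luna)"
```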

Security Notes

  • Audio files are written to a private /tmp/kittentts-walkie/ directory (mode 700) — only the running user can read them.
  • WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
  • Set VOICE_SPEED env var to adjust speech rate (default: 1.0).
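The directory handling described above could look roughly like this. It is a sketch under assumptions, not the skill's actual code: `prepare_work_dir`, `cleanup_wav`, and the `WORK_DIR` variable are names introduced for illustration.

```shell
# Assumed location, matching the directory named above; override via WORK_DIR.
WORK_DIR="${WORK_DIR:-/tmp/kittentts-walkie}"

# Create the directory ($1) with owner-only permissions (mode 700).
# The umask covers newly created directories; chmod tightens a pre-existing one.
prepare_work_dir() {
  (umask 077 && mkdir -p "$1") && chmod 700 "$1"
}

# Remove the WAV intermediate ($1) once the OGG has been produced.
cleanup_wav() {
  rm -f -- "$1"
}

# Example:
# prepare_work_dir "$WORK_DIR"
# cleanup_wav "$WORK_DIR/reply.wav"
```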

Files

kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh      # TTS + ffmpeg conversion (applies VOICE_SPEED)
    └── transcribe.sh       # whisper transcription (optional)

⚠️ Privileged Install Warning

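The dependency install commands use apt-get install -y and pip3's --break-system-packages flag. The apt-get step requires root privileges and modifies system packages, and --break-system-packages lets pip install outside a virtual environment, alongside distro-managed Python packages. Review before running if you are on a managed system.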

Troubleshooting

Audio sends but is silent or rejected by WhatsApp:

→ Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg

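→ The audio stream must show "codec_name": "opus" and a "sample_rate" of 48000 (or 16000). ffprobe typically reports 48000 for Opus even when the encoder input was 16 kHz, because Opus decodes at 48 kHz. If the codec is not opus, the ffmpeg chain failed.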
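The same check can be scripted so it returns a pass or fail status. This is a sketch: `is_opus_voice_note` and the `FFPROBE` override variable are hypothetical names, and only the codec is tested since the reported sample rate may legitimately be 48000.

```shell
# Hypothetical check: succeed only if the first audio stream of $1 is Opus.
# FFPROBE may be overridden; it defaults to the ffprobe found on PATH.
is_opus_voice_note() {
  codec="$("${FFPROBE:-ffprobe}" -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1" 2>/dev/null)"
  [ "$codec" = "opus" ]
}

# Example:
# is_opus_voice_note /tmp/walkie_reply.ogg || echo "not an Opus voice note" >&2
```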

TTS generation is slow:

→ Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.

Hugging Face download rate limit:

→ Set HF_TOKEN in your environment. Free accounts get lower rate limits.

Version History

1 version in total

  • v1.0.4 (current)
    2026-05-07 04:33 · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
