Text-to-speech using Xiaomi's MiMo-V2-TTS model. Supports emotional style control, Chinese dialects (Northeastern/Sichuan/Cantonese/Taiwanese), role-playing voices, and singing synthesis.
https://api.xiaomimimo.com/v1/v1/chat/completions (NOT /audio/speech)mimo-v2-ttsMIMO_API_KEY env varMiMo TTS uses the Chat Completions endpoint with special requirements:
system role allowed (returns error)assistant role message (the text to synthesize)user message = style/voice instructionsassistant message = text to be spokenchoices[0].message.audio.data contains base64-encoded audiopython3 <skill_dir>/scripts/mimo_tts.py \
--text "Hello, world!" \
--output /tmp/openclaw/tts_output.mp3 \
[--style "cheerful tone"] \
[--speed 1.0] \
[--format mp3] \
[--api-key YOUR_KEY]
Set MIMO_API_KEY environment variable or pass --api-key.
| Parameter | Required | Description |
|---|---|---|
| ----------- | ---------- | ------------- |
| --text | ✅ | Text to synthesize (recommended < 5000 chars) |
| --output | ✅ | Output audio file path |
| --style | ❌ | Natural language style description |
| --speed | ❌ | Speech rate 0.5–2.0 (default 1.0) |
| --format | ❌ | mp3/wav/pcm/opus/flac (default mp3) |
| --api-key | ❌ | API Key (overrides env var) |
--style "speak in Cantonese" / "Sichuan dialect" / "Taiwanese accent"--style "happy and excited" / "sad and gentle" / "start happy then turn melancholic"--style "news anchor" / "gentle older sister"--style "sing it"--style "Northeastern dialect, enthusiastic and bold"共 1 个版本