
Smallest Ai

Ultra-fast text-to-speech and speech-to-text via Smallest AI's Lightning v3.1 and Pulse models. Use when the user wants to generate speech, convert text to v...
abhishekmishragithub
Content Creation clawhub v1.0.1 1 version 99820.8 Key: required
#latest #multilingual #speech #stt #tts #voice

Overview

Smallest AI — Ultra-Fast Voice Suite

Text-to-speech (sub-100 ms latency) via Lightning v3.1 and speech-to-text (64 ms TTFT) via Pulse.

Setup

  1. Get an API key from https://waves.smallest.ai (click "API Key" in the left panel).
  2. Set SMALLEST_API_KEY in your environment:

     export SMALLEST_API_KEY="your_key_here"

Defaults

  • Default female voice: sophia (American English)
  • Default male voice: robert (American English)
  • Default language: en
  • Default speed: 1.0
  • Default sample rate: 24000

Voice Selection Rules

Follow these rules to select the voice:

  1. If user explicitly names a voice (e.g. "use advika"), use that voice.
  2. If user asks for a male voice, use the configured defaultVoiceMale.
  3. If user asks for a female voice, use the configured defaultVoiceFemale.
  4. If no gender preference, use defaultVoiceFemale (sophia by default).
  5. For Hindi content: use advika (female) or vivaan (male).
  6. For Spanish content: use camilla (female) or carlos (male).
  7. For Tamil content: use anitha (female) or raju (male).

Always pass the configured defaultLanguage, defaultSpeed, and defaultSampleRate as --lang, --speed, and --rate flags unless the user overrides them.
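
The selection rules above can be sketched as a small Python function. This is a minimal illustration, not the skill's actual implementation: the function name, its parameters, and the choice to let the language-specific voices (rules 5 to 7) take precedence over the English defaults are assumptions. The voice names and language codes come from this document.

```python
from typing import Optional

# Voice names are taken from the rules above; the mapping layout and the
# function signature are hypothetical.
LANGUAGE_VOICES = {
    "hi": {"female": "advika", "male": "vivaan"},
    "es": {"female": "camilla", "male": "carlos"},
    "ta": {"female": "anitha", "male": "raju"},
}


def select_voice(
    explicit_voice: Optional[str] = None,
    gender: Optional[str] = None,
    lang: str = "en",
    default_female: str = "sophia",
    default_male: str = "robert",
) -> str:
    """Pick a voice name following the selection rules."""
    # Rule 1: an explicitly named voice always wins.
    if explicit_voice:
        return explicit_voice
    # Rule 4: fall back to female when no gender preference is given.
    wanted = gender if gender in ("female", "male") else "female"
    # Rules 5-7: language-specific voices for Hindi, Spanish, and Tamil.
    # Assumption: these take precedence over the English defaults.
    if lang in LANGUAGE_VOICES:
        return LANGUAGE_VOICES[lang][wanted]
    # Rules 2-3: configured defaults.
    return default_male if wanted == "male" else default_female
```

For example, `select_voice(gender="male", lang="hi")` yields `vivaan`, while `select_voice()` falls back to `sophia`.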

Text-to-Speech

Generate speech audio from text using the Lightning v3.1 model.

Shell (preferred — zero dependencies)

{baseDir}/scripts/tts.sh "Text to speak" --voice sophia --rate 24000 --speed 1.0 --lang en

Python (requires pip install smallestai or just requests)

python3 {baseDir}/scripts/tts.py "Text to speak" --voice sophia --speed 1.0 --lang en --out speech.wav

Voices

Voice    Gender  Accent         Best For
-------  ------  -------------  -------------------------------
sophia   Female  American       General use (default)
robert   Male    American       Professional, reports (default)
advika   Female  Indian         Hindi content, code-switch
vivaan   Male    Indian         Bilingual English/Hindi
camilla  Female  Mexican/Latin  Spanish content
zara     Female  American       Conversational
melody   Female  American       Storytelling, greetings
arjun    Male    Indian         English/Hindi bilingual
stella   Female  American       Expressive, warm

80+ more voices available. List all with: {baseDir}/scripts/voices.sh

Options

  • --voice : Voice identifier (default: sophia)
  • --rate : Sample rate — 8000 | 16000 | 24000 | 44100 (default: 24000)
  • --speed : Playback speed 0.5–2.0 (default: 1.0)
  • --lang : Language code (default: en). See {baseDir}/references/languages.md
  • --out : Output file (default: auto-named media/tts_.wav)
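
When calling tts.sh from another program, it can help to validate the options before spawning the script. The sketch below builds the argument list from the flags listed above and rejects out-of-range values early. The helper name `build_tts_args` is hypothetical, and whether the script itself performs the same checks is not stated here.

```python
from typing import List, Optional

# Allowed sample rates and the speed range are copied from the options list.
ALLOWED_RATES = (8000, 16000, 24000, 44100)


def build_tts_args(
    text: str,
    voice: str = "sophia",
    rate: int = 24000,
    speed: float = 1.0,
    lang: str = "en",
    out: Optional[str] = None,
) -> List[str]:
    """Return the arguments to append after the tts.sh path (hypothetical helper)."""
    if rate not in ALLOWED_RATES:
        raise ValueError(f"rate must be one of {ALLOWED_RATES}, got {rate}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    args = [text, "--voice", voice, "--rate", str(rate),
            "--speed", str(speed), "--lang", lang]
    # Omit --out so the script can auto-name the output file.
    if out is not None:
        args += ["--out", out]
    return args
```

Passing the result as a list, for example `subprocess.run([script_path, *build_tts_args("Hello")], check=True)`, avoids shell quoting problems with text that contains spaces or quotes.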

Output

Scripts print MEDIA: on success. OpenClaw sends this as an audio attachment.

Multilingual

Supports 30+ languages. Pass --lang with ISO code:

{baseDir}/scripts/tts.sh "नमस्ते, कैसे हैं आप?" --voice advika --lang hi
{baseDir}/scripts/tts.sh "Bonjour le monde" --voice sophia --lang fr
{baseDir}/scripts/tts.sh "Hola, buenos días" --voice camilla --lang es

Code-switching (mixing languages) works automatically — no flag needed:

{baseDir}/scripts/tts.sh "Hey, मुझे meeting remind कर दो" --voice advika --lang hi

Speech-to-Text

Transcribe audio files using the Pulse model. Supports WAV, MP3, OGG, FLAC.

Shell

{baseDir}/scripts/stt.sh /path/to/audio.wav
{baseDir}/scripts/stt.sh /path/to/audio.wav --diarize --timestamps --emotions

Python

python3 {baseDir}/scripts/stt.py /path/to/audio.wav --diarize --timestamps --lang en

Options

  • --lang : Language (default: en)
  • --diarize : Identify different speakers
  • --timestamps : Word-level timing
  • --emotions : Detect emotional tone

Output

Returns JSON with a transcription field. With --diarize, the output also includes a speaker label for each word.
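
A minimal sketch of consuming that JSON in Python follows. Only the `transcription` field is named in this document. The `words`, `word`, and `speaker` keys used for the diarized output are hypothetical placeholders, so inspect real output before relying on them.

```python
import json
from typing import Dict, List


def read_transcript(raw: str) -> str:
    """Return the `transcription` field from the STT script's JSON output."""
    data = json.loads(raw)
    if "transcription" not in data:
        raise KeyError("no 'transcription' field in STT output")
    return data["transcription"]


def group_by_speaker(raw: str) -> Dict[str, List[str]]:
    """Collect words per speaker label.

    Hypothetical layout: a `words` list whose items carry `word` and
    `speaker` keys. The real key names may differ.
    """
    data = json.loads(raw)
    grouped: Dict[str, List[str]] = {}
    for item in data.get("words", []):
        grouped.setdefault(str(item["speaker"]), []).append(item["word"])
    return grouped
```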

When to Use

Trigger this skill when the user:

  • Asks to "say", "speak", "read aloud", or "generate speech/audio"
  • Wants a "voice message", "voice note", or "audio file"
  • Asks to "transcribe", "convert speech/audio to text"
  • Mentions "Smallest AI", "Lightning TTS", or "Pulse STT"
  • Needs fast or low-latency speech generation
  • Wants Hindi, Spanish, multilingual, or code-switched voice output
  • Asks to compare TTS providers or benchmark latency

Error Handling

  • Missing API key → tell user to set SMALLEST_API_KEY
  • HTTP 401 → invalid or expired API key
  • HTTP 429 → rate limited, wait and retry
  • HTTP 400 → check text length (max ~5000 chars per request). Split long text into chunks.
  • Empty audio → verify voice_id is valid
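
The cases above could be handled along these lines. This is a sketch under stated assumptions: `send` is a hypothetical callable that performs one request and returns a status code and body, success is assumed to be HTTP 200, and the exponential backoff schedule is an illustrative choice rather than anything the service prescribes.

```python
import os
import time
from typing import Callable, Tuple

# Messages mirror the error list above.
ERROR_HINTS = {
    400: "Bad request: check text length (max ~5000 chars) and split long text.",
    401: "Invalid or expired API key.",
    429: "Rate limited: wait and retry.",
}


def require_api_key() -> str:
    """Fail early when SMALLEST_API_KEY is missing."""
    key = os.environ.get("SMALLEST_API_KEY", "")
    if not key:
        raise RuntimeError("Set SMALLEST_API_KEY in your environment.")
    return key


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() and retry on HTTP 429 with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        # Retry only on rate limiting, and only while attempts remain.
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
            continue
        if status != 200:  # Assumption: 200 is the only success status.
            raise RuntimeError(ERROR_HINTS.get(status, f"HTTP {status}"))
        if not body:
            raise RuntimeError("Empty audio: verify the voice_id is valid.")
        return body
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists so the backoff can be replaced in tests without waiting.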

Limits

  • Max text per request: ~5000 characters
  • For longer text: split into sentences, synthesize each, concatenate with sox or ffmpeg
  • Free tier: 30 minutes/month of TTS
  • Basic ($5/mo): 3 hours of TTS + 1 voice clone
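
The splitting step for long text might look like the sketch below, which packs whole sentences into chunks of at most 5000 characters. The function name and the sentence-boundary regex are illustrative assumptions. A sentence longer than the limit is cut at the limit as a fallback.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 5000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Split after ., !, ?, or the Devanagari danda when followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut a single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio files joined with sox or ffmpeg, as noted above.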

Version History

1 version in total

  • v1.0.1 (current)
    2026-03-30 06:04 Security: safe

