
Local Voice (FluidAudio TTS/STT)

Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Voice synthesis and transcription run entirely on-device on the Apple Neural Engine, with sub-second latency. Use when setting up local voice capabilities, integrating a voice assistant, or replacing cloud TTS/STT services.
Author: trondw
Category: Data Analysis · clawhub v1.0.1 · Key: not required

Overview

Sub-second local voice AI for Apple Silicon Macs using FluidAudio's CoreML models.

Features

  • TTS: Kokoro model with 54 voices, ~0.6-0.8s latency
  • STT: Parakeet TDT v3, ~0.2-0.3s latency, 25 languages
  • 100% local: No cloud, no cost, works offline
  • Neural Engine: Runs on Apple's ANE for efficiency

Requirements

  • macOS 14+ on Apple Silicon (M1/M2/M3/M4)
  • Swift 5.9+
  • espeak-ng (for TTS phoneme fallback)

Quick Setup

1. Install Dependencies

brew install espeak-ng

2. Build the Daemon

cd /path/to/skill/sources
swift build -c release

3. Install Binary and Framework

mkdir -p ~/clawd/bin
cp .build/release/StellaVoice ~/clawd/bin/
cp -R .build/arm64-apple-macosx/release/ESpeakNG.framework ~/clawd/bin/
install_name_tool -add_rpath @executable_path ~/clawd/bin/StellaVoice

4. Create LaunchAgent

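# launchd does not expand $HOME inside plist values, so the heredoc delimiter
# is left unquoted to let the shell substitute the real path when writing the file.
mkdir -p ~/.clawdbot/logs
cat > ~/Library/LaunchAgents/com.stella.tts.plist << EOF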
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.stella.tts</string>
    <key>ProgramArguments</key>
    <array>
        <string>$HOME/clawd/bin/StellaVoice</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>$HOME/.clawdbot/logs/stella-tts.log</string>
    <key>StandardErrorPath</key>
    <string>$HOME/.clawdbot/logs/stella-tts.err.log</string>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.stella.tts.plist

API Endpoints

The daemon listens on http://127.0.0.1:18790:

TTS - Text to Speech

# Simple text to WAV
curl -X POST http://127.0.0.1:18790/synthesize -d "Hello world" -o output.wav

# With speed control (0.5-2.0)
curl -X POST "http://127.0.0.1:18790/synthesize?speed=1.2" -d "Fast!" -o output.wav

# JSON endpoint
curl -X POST http://127.0.0.1:18790/synthesize/json \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello", "speed": 1.0, "deEss": true}'
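
The JSON endpoint takes the same speed value as the query-string form, documented above as 0.5-2.0. A minimal JavaScript sketch of a client-side request builder follows. The helper name buildSynthesizeJsonRequest is hypothetical. The field names (text, speed, deEss) are taken from the curl example. Clamping on the client is an assumption, since how the daemon treats out-of-range values is not documented here.

```javascript
// Hypothetical helper: builds the fetch() arguments for the JSON TTS endpoint.
// Field names (text, speed, deEss) come from the curl example above.
const BASE_URL = 'http://127.0.0.1:18790';

function buildSynthesizeJsonRequest(text, { speed = 1.0, deEss = true } = {}) {
  if (typeof text !== 'string' || text.trim() === '') {
    throw new TypeError('text must be a non-empty string');
  }
  // Clamp to the documented 0.5-2.0 range (client-side assumption; the
  // daemon's handling of out-of-range values is not documented here).
  const clamped = Math.min(2.0, Math.max(0.5, Number(speed)));
  if (Number.isNaN(clamped)) {
    throw new TypeError('speed must be a number');
  }
  return {
    url: `${BASE_URL}/synthesize/json`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text, speed: clamped, deEss }),
    },
  };
}
```

The returned object can be passed straight to fetch(url, init). The shape of the JSON endpoint's response is not described in this document, so inspect it before relying on any particular field.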

STT - Speech to Text

curl -X POST http://127.0.0.1:18790/transcribe \
  --data-binary @audio.wav \
  -H "Content-Type: audio/wav"
# Returns: {"text": "transcribed text"}
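
A caller will usually want to check the HTTP status and the response shape before using the transcript. The following JavaScript sketch wraps the request above. The function name transcribeWav and the injectable fetchImpl parameter are illustrative assumptions, not part of the skill. The {"text": ...} response shape is taken from the comment in the curl example.

```javascript
// Hypothetical wrapper around POST /transcribe. fetchImpl is injectable so
// the function can be exercised without a running daemon.
async function transcribeWav(wavBytes, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl('http://127.0.0.1:18790/transcribe', {
    method: 'POST',
    headers: { 'Content-Type': 'audio/wav' },
    body: wavBytes,
  });
  if (!response.ok) {
    throw new Error(`transcribe failed with HTTP ${response.status}`);
  }
  const payload = await response.json();
  // The documented response is {"text": "..."}; reject anything else.
  if (typeof payload.text !== 'string') {
    throw new Error('unexpected response shape: missing "text"');
  }
  return payload.text;
}
```

In Node 18 or later, the WAV bytes could come from fs.promises.readFile('audio.wav') and be passed directly as wavBytes.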

Health Check

curl http://127.0.0.1:18790/health
# Returns: ok

Voice Options

The default voice is af_sky. To change it, edit the daemon source and rebuild.

Top Kokoro voices (American female):

  • af_heart (A grade) - warm, natural
  • af_bella (A-) - expressive
  • af_sky (C-) - clear, light

All 54 voices: See references/VOICES.md

Expressiveness

Speed Control

  • speed=0.8 → Calm, relaxed
  • speed=1.0 → Natural pace
  • speed=1.2 → Energetic, upbeat

Punctuation (automatic)

  • ! → Excited tone
  • ? → Rising intonation
  • . → Neutral, falling
  • ... → Pauses

SSML Tags

<phoneme ph="kəkˈɔɹO">Kokoro</phoneme>
<sub alias="Doctor">Dr.</sub>
<say-as interpret-as="date">2024-01-15</say-as>
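
When these tags are assembled from dynamic text, the attribute values and content need to stay well-formed. The JavaScript sketch below builds the three tags listed above. The ssml helper object and escapeXml are hypothetical names. Whether the daemon decodes XML entities such as &amp;amp; is not stated in this document, so treat the escaping as an assumption and verify it against the actual output.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical builders for the three tags listed above.
const ssml = {
  phoneme: (text, ph) =>
    `<phoneme ph="${escapeXml(ph)}">${escapeXml(text)}</phoneme>`,
  sub: (text, alias) =>
    `<sub alias="${escapeXml(alias)}">${escapeXml(text)}</sub>`,
  sayAs: (text, interpretAs) =>
    `<say-as interpret-as="${escapeXml(interpretAs)}">${escapeXml(text)}</say-as>`,
};

// Example: compose a sentence that mixes plain text and tags.
const line =
  `${ssml.sub('Dr.', 'Doctor')} Lee arrives on ${ssml.sayAs('2024-01-15', 'date')}.`;
```

The resulting string can be sent as the request body to /synthesize in place of plain text.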

Helper Script

See scripts/stella-tts.sh for a convenient wrapper:

scripts/stella-tts.sh "Hello world" output.wav
scripts/stella-tts.sh "Hello world" output.mp3  # Auto-converts

Integration Example

For voice assistants, update your voice proxy to use local endpoints:

// STT
const response = await fetch('http://127.0.0.1:18790/transcribe', {
    method: 'POST',
    headers: { 'Content-Type': 'audio/wav' },
    body: audioData
});
const { text } = await response.json();

// TTS
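const ttsResponse = await fetch('http://127.0.0.1:18790/synthesize', {
    method: 'POST',
    body: textToSpeak
});
// fetch() resolves to a Response; read the body to get the WAV bytes
const wavBytes = await ttsResponse.arrayBuffer();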

Troubleshooting

Library not loaded (ESpeakNG)

  • Ensure ESpeakNG.framework is in the same directory as the binary
  • Run install_name_tool -add_rpath @executable_path /path/to/binary

Slow first request

  • First request loads models (~8-10s)
  • Subsequent requests are sub-second
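
One way to hide the model-load delay is to warm the daemon up before the first real request. The JavaScript sketch below polls /health and then sends one throwaway synthesis request. The warmUp function, its option names, and the retry defaults are illustrative assumptions. It only exercises the TTS path; whether STT models load separately is not stated here.

```javascript
// Hypothetical warm-up: wait for /health, then send one throwaway TTS request
// so the model load happens before the first real request.
async function warmUp({
  baseUrl = 'http://127.0.0.1:18790',
  fetchImpl = globalThis.fetch,
  retries = 20,
  delayMs = 500,
} = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const health = await fetchImpl(`${baseUrl}/health`);
      if (health.ok) {
        const res = await fetchImpl(`${baseUrl}/synthesize`, {
          method: 'POST',
          body: 'warm up',
        });
        if (!res.ok) {
          throw new Error(`warm-up synthesis failed with HTTP ${res.status}`);
        }
        await res.arrayBuffer(); // read and discard the audio
        return attempt; // number of attempts it took
      }
    } catch (err) {
      // Connection errors and failed warm-ups are retried until the last attempt.
      if (attempt === retries) throw err;
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('daemon did not become healthy in time');
}
```

Calling await warmUp() once at application start-up keeps the 8-10 second load out of the first user-facing request.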

x86 vs ARM

  • Must build and run on ARM64 native (not Rosetta)
  • Check with uname -m (should show arm64)

Source Code

The daemon source is in the sources/ directory. It's a Swift package using:

  • FluidAudio (TTS + STT models)
  • Hummingbird (HTTP server)

Rebuild after modifying:

cd sources && swift build -c release

Version History

1 version in total

  • v1.0.1 (current)
    2026-03-28 19:49 · Safe · Safe

Security Scan

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
