
GPT-SoVITS TTS

High-quality Chinese TTS using GPT-SoVITS v2 Pro+: convert text to natural-sounding speech, with voice cloning support.
Author: huizong-cpu
Source: clawhub (uncategorized)
Version: v1.0.0 (1 version)
API key: not required
Tags: chinese, gptsovits, latest, tts, voice

Overview

A production-ready text-to-speech skill that connects to a local GPT-SoVITS v2 Pro+ API server. Converts Chinese text to natural-sounding speech with a cloned reference voice. Designed for voice response automation, content narration, and AI voice applications.

Features

  • Clean TTS pipeline: Text → GPT-SoVITS API → WAV → MP3 (128kbps, 44100Hz, mono)
  • Voice cloning: Uses a pre-recorded reference audio for consistent voice output
  • Configurable: API URL, timeout, TTS parameters (speed, top_k, top_p, temperature, seed)
  • No GPU required: inference runs entirely on the CPU (approx. 5-10 s per sentence)
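
The WAV → MP3 step of the pipeline can be sketched as follows. This is a minimal illustration rather than the skill's actual implementation: the function names `buildFfmpegArgs` and `convertWavToMp3` are hypothetical, and the sketch assumes the installed ffmpeg build includes the `libmp3lame` encoder. The flags encode the settings listed above (128 kbps, 44100 Hz, mono).

```javascript
const { execFile } = require('child_process');

// Build the ffmpeg argument list for WAV -> MP3 with the settings named above.
// (Hypothetical helper; the skill's real conversion code may differ.)
function buildFfmpegArgs(wavPath, mp3Path) {
  return [
    '-y',                      // overwrite the output file if it already exists
    '-i', wavPath,             // input WAV returned by the TTS server
    '-codec:a', 'libmp3lame',  // MP3 encoder (assumes ffmpeg was built with LAME)
    '-b:a', '128k',            // 128 kbps audio bitrate
    '-ar', '44100',            // 44100 Hz sample rate
    '-ac', '1',                // one channel (mono)
    mp3Path,
  ];
}

// Run ffmpeg and resolve with the MP3 path once the conversion finishes.
function convertWavToMp3(wavPath, mp3Path) {
  return new Promise((resolve, reject) => {
    execFile('ffmpeg', buildFfmpegArgs(wavPath, mp3Path), (err) => {
      if (err) reject(err);
      else resolve(mp3Path);
    });
  });
}
```

Passing the arguments as an array to `execFile` avoids shell quoting problems when file paths contain spaces.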

Requirements

  • GPT-SoVITS v2 Pro+ API running at http://127.0.0.1:9880 (or set GPT_SOVITS_API_URL)
  • ffmpeg installed and in PATH (for WAV→MP3 conversion)
  • Node.js packages: axios

Model files needed (on the API server side)

| Component | File | Size |
|-----------|------|------|
| s1 | s1v3.ckpt | 148MB |
| s2 | s2Gv2ProPlus.pth | 191MB |
| BERT | chinese-roberta-wwm-ext-large | 621MB |
| CNHuBERT | chinese-hubert-base | 180MB |
| Speaker Verification | pretrained_eres2netv2w24s4ep4.ckpt | 103MB |
| Reference Audio | ref_audio.wav | ~10-30s clean recording |

Quick Start

1. Start GPT-SoVITS API

cd /path/to/GPT-SoVITS-CPUFast
conda activate GPTSoVits
python api_v2.py -a 127.0.0.1 -p 9880
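
Before calling the skill, it can help to confirm that something is actually listening on the API port. The sketch below only checks TCP reachability with Node's built-in `net` module; it makes no assumption about the server's HTTP routes. The helper name `isPortOpen` is hypothetical.

```javascript
const net = require('net');

// Resolve true if a TCP connection to host:port succeeds within timeoutMs,
// false on refusal, error, or timeout. (Hypothetical helper.)
function isPortOpen(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (result) => {
      socket.destroy();   // release the socket in every outcome
      resolve(result);
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false));
  });
}

// Usage: isPortOpen('127.0.0.1', 9880).then((up) => console.log(up ? 'API port open' : 'API not reachable'));
```

An open port shows only that a process is bound to it, not that the models have finished loading.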

2. Set reference audio

Place a clean .wav file (10-30 seconds of the target voice) at:

voice-clone/ref_audio.wav
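
To check that a reference recording falls inside the 10-30 second range, the duration can be read from the WAV header. The sketch below assumes an uncompressed PCM file in the standard RIFF/WAVE layout; `wavDurationSeconds` and `isReferenceLengthOk` are hypothetical helper names, not part of the skill.

```javascript
// Compute the duration in seconds of a PCM WAV file held in a Buffer.
// Walks the RIFF chunks to find "fmt " (byte rate) and "data" (payload size).
function wavDurationSeconds(buf) {
  if (buf.length < 12 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }
  let byteRate = 0;
  let offset = 12; // first chunk starts after "RIFF", size, "WAVE"
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      // Byte rate sits 8 bytes into the fmt payload.
      byteRate = buf.readUInt32LE(offset + 8 + 8);
    } else if (id === 'data') {
      if (byteRate === 0) throw new Error('data chunk found before fmt chunk');
      // Clamp in case the header overstates the payload size.
      const dataBytes = Math.min(size, buf.length - (offset + 8));
      return dataBytes / byteRate;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No data chunk found');
}

// True when the recording length is within the recommended 10-30 s range.
function isReferenceLengthOk(seconds) {
  return seconds >= 10 && seconds <= 30;
}

// Usage: wavDurationSeconds(require('fs').readFileSync('voice-clone/ref_audio.wav'))
```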

3. Use the skill

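const tts = require('./skills/voice-clone');
// Top-level await is unavailable in CommonJS, so wrap the call in an async function.
(async () => {
  const mp3 = await tts.speak("你好,欢迎使用GPT-SoVITS语音合成。", "output.mp3");
  console.log(mp3); // "output.mp3"
})();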

API

speak(text, outputPath, opts?)

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| text | string | required | Chinese text to synthesize |
| outputPath | string | required | Output .mp3 file path |
| opts.topK | number | 15 | Top-K sampling |
| opts.topP | number | 0.7 | Top-P sampling |
| opts.temperature | number | 0.5 | Sampling temperature |
| opts.speed | number | 1.0 | Speed factor |
| opts.seed | number | -1 | Random seed (-1 = random) |

Returns: a Promise that resolves to the path of the generated MP3 file.
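
As an illustration of how the `opts` defaults might be applied, the sketch below merges caller options with the documented defaults and maps them onto a request body. The snake_case field names follow the `/tts` request format of the upstream GPT-SoVITS `api_v2.py`; whether this skill sends exactly these fields is an assumption, and `buildTtsPayload` and its `refAudioPath` parameter are hypothetical.

```javascript
// Defaults taken from the parameter table above.
const DEFAULT_OPTS = { topK: 15, topP: 0.7, temperature: 0.5, speed: 1.0, seed: -1 };

// Map speak()-style options onto a request body.
// Field names mirror upstream api_v2.py (/tts); their use by this skill is assumed.
function buildTtsPayload(text, refAudioPath, opts = {}) {
  if (typeof text !== 'string' || text.trim() === '') {
    throw new TypeError('text must be a non-empty string');
  }
  return {
    text,
    text_lang: 'zh',                 // the skill targets Chinese text
    ref_audio_path: refAudioPath,    // reference recording used for voice cloning
    top_k: opts.topK ?? DEFAULT_OPTS.topK,
    top_p: opts.topP ?? DEFAULT_OPTS.topP,
    temperature: opts.temperature ?? DEFAULT_OPTS.temperature,
    speed_factor: opts.speed ?? DEFAULT_OPTS.speed,
    seed: opts.seed ?? DEFAULT_OPTS.seed,
    media_type: 'wav',               // WAV out, converted to MP3 afterwards
  };
}
```

Using `??` instead of `||` keeps an explicit `0` (for example `seed: 0`) from being replaced by the default.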

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| GPT_SOVITS_API_URL | http://127.0.0.1:9880 | GPT-SoVITS API base URL |
| GPT_SOVITS_API_TIMEOUT | 300000 | API request timeout (ms) |
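
A minimal sketch of how these two variables could be read, falling back to the documented defaults. `loadConfig` is a hypothetical helper; the trailing-slash trimming and the positive-number check are additions for illustration, not documented behavior of the skill.

```javascript
// Read the API settings from an environment object, using the documented defaults.
function loadConfig(env = process.env) {
  // Trim trailing slashes so paths can be appended without producing "//".
  const apiUrl = (env.GPT_SOVITS_API_URL || 'http://127.0.0.1:9880').replace(/\/+$/, '');
  const timeoutMs = Number.parseInt(env.GPT_SOVITS_API_TIMEOUT || '300000', 10);
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error('GPT_SOVITS_API_TIMEOUT must be a positive integer (milliseconds)');
  }
  return { apiUrl, timeoutMs };
}
```

Accepting the environment as a parameter makes the function easy to exercise without mutating `process.env`.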

Integration

This skill is designed to be called from automation workflows:

  • Voice reply for messaging bots (WeChat, Telegram, etc.)
  • Content narration for video/audio production
  • Voice response for IVR systems
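
In a bot or IVR workflow, several requests may arrive while an earlier sentence is still being synthesized. Since CPU inference takes several seconds per sentence, one possible approach is to run jobs strictly one at a time. The sketch below is an assumption about how a caller might do this, not part of the skill; `createSpeechQueue` is a hypothetical helper, and the `speak` function is passed in rather than imported.

```javascript
// Serialize calls to a speak(text, outputPath, opts) function so that only
// one synthesis job runs at a time. (Hypothetical helper.)
function createSpeechQueue(speak) {
  let tail = Promise.resolve(); // settles when the most recently queued job is done
  return function enqueue(text, outputPath, opts) {
    const job = tail.then(() => speak(text, outputPath, opts));
    tail = job.catch(() => {}); // a failed job must not block later ones
    return job;                 // caller still sees this job's result or error
  };
}

// Usage (assuming the skill module from Quick Start):
//   const enqueue = createSpeechQueue(require('./skills/voice-clone').speak);
//   enqueue('你好', 'reply-1.mp3').then(console.log);
```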

License

MIT

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-23 16:48, security status: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
