概述

GPT-SoVITS TTS

A production-ready text-to-speech skill that connects to a local GPT-SoVITS v2 Pro+ API server. Converts Chinese text to natural-sounding speech with a cloned reference voice. Designed for voice response automation, content narration, and AI voice applications.

Features

Clean TTS pipeline: Text → GPT-SoVITS API → WAV → MP3 (128kbps, 44100Hz, mono)
Voice cloning: Uses a pre-recorded reference audio for consistent voice output
Configurable: API URL, timeout, TTS parameters (speed, top_k, top_p, temperature, seed)
No GPU required: Pure CPU inference, works on any machine (approx. 5-10s per sentence)

Requirements

GPT-SoVITS v2 Pro+ API running at http://127.0.0.1:9880 (or set GPT_SOVITS_API_URL)
ffmpeg installed and in PATH (for WAV→MP3 conversion)
Node.js packages: axios

Model files needed (on the API server side)

Component	File	Size
-----------	------	------
s1	s1v3.ckpt	148MB
s2	s2Gv2ProPlus.pth	191MB
BERT	chinese-roberta-wwm-ext-large	621MB
CNHuBERT	chinese-hubert-base	180MB
Speaker Verification	pretrained_eres2netv2w24s4ep4.ckpt	103MB
Reference Audio	ref_audio.wav	~10-30s clean recording

Quick Start

1. Start GPT-SoVITS API

cd /path/to/GPT-SoVITS-CPUFast
conda activate GPTSoVits
python api_v2.py -a 127.0.0.1 -p 9880

2. Set reference audio

Place a clean .wav file (10-30 seconds of the target voice) at:

voice-clone/ref_audio.wav

3. Use the skill

const tts = require('./skills/voice-clone');
const mp3 = await tts.speak("你好，欢迎使用GPT-SoVITS语音合成。", "output.mp3");
// Returns: "output.mp3"

API

`speak(text, outputPath, opts?)`

Param	Type	Default	Description
-------	------	---------	-------------
text	string	required	Chinese text to synthesize
outputPath	string	required	Output .mp3 file path
opts.topK	number	15	Top-K sampling
opts.topP	number	0.7	Top-P sampling
opts.temperature	number	0.5	Sampling temperature
opts.speed	number	1.0	Speed factor
opts.seed	number	-1	Random seed (-1 = random)

Returns: Promise — path to the generated MP3 file.

Environment Variables

Variable	Default	Description
----------	---------	-------------
GPT_SOVITS_API_URL	http://127.0.0.1:9880	GPT-SoVITS API base URL
GPT_SOVITS_API_TIMEOUT	300000	API request timeout (ms)

Integration

This skill is designed to be called from automation workflows:

Voice reply for messaging bots (WeChat, Telegram, etc.)
Content narration for video/audio production
Voice response for IVR systems

License

MIT

版本历史

共 1 个版本

v1.0.0 当前

2026-05-23 16:48 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)

安全，无风险

查看报告

GPT-SoVITS TTS

概述

GPT-SoVITS TTS

Features

Requirements

Model files needed (on the API server side)

Quick Start

1. Start GPT-SoVITS API

2. Set reference audio

3. Use the skill

API

`speak(text, outputPath, opts?)`

Environment Variables

Integration

License

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

Text Processor

Nano Banana Pro

Openai Whisper