概述

xfyun-tts

Synthesize natural, expressive speech from text using iFlytek's Ultra-Realistic Voice Synthesis (超拟人语音合成) WebSocket API. Features human-like breathing, pauses, and emotional expression across 50+ voices.

Setup

Create an app at 讯飞控制台 with 超拟人语音合成 service enabled
Enable the desired voice(s) in the console (default: x5_lingyuzhao_flow / 聆玉昭)
Set environment variables:

```bash

export XFYUN_APP_ID="your_app_id"

export XFYUN_API_KEY="your_api_key"

export XFYUN_API_SECRET="your_api_secret"

```

Usage

Basic synthesis

python3 scripts/tts.py "你好，欢迎使用科大讯飞语音合成。"
# → saves to output.mp3

Specify output file

python3 scripts/tts.py "Hello, this is a test." --output hello.mp3

Use a different voice

python3 scripts/tts.py "大家好" --voice x6_lingfeiyi_pro --output greeting.mp3

Read from file

python3 scripts/tts.py --file article.txt --output article.mp3

Pipe from stdin

echo "流式文本输入测试" | python3 scripts/tts.py --output speech.mp3

Adjust parameters

python3 scripts/tts.py "语速快一点" --speed 70 --volume 80 --pitch 60

Output PCM format

python3 scripts/tts.py "测试" --format pcm --sample-rate 16000 --output test.pcm

List all available voices

python3 scripts/tts.py --list-voices

Options

Flag	Short	Default	Description
------	-------	---------	-------------
`text`			Text to synthesize (positional)
`--file`	`-f`		Read text from a file
`--output`	`-o`	`output.mp3`	Output audio file path
`--voice`	`-v`	`x5_lingyuzhao_flow`	Voice name (vcn)
`--format`		`mp3`	Audio format: mp3, pcm, speex, opus
`--sample-rate`		`24000`	Sample rate: 8000, 16000, 24000
`--speed`		`50`	Speed 0–100 (50=normal, 100=2x)
`--volume`		`50`	Volume 0–100 (50=normal)
`--pitch`		`50`	Pitch 0–100 (50=normal)
`--bgs`		`0`	Background sound: 0=none, 1=bg1, 2=bg2
`--reg`		`0`	English pronunciation: 0=auto, 1=spell, 2=letter
`--rdn`		`0`	Number reading: 0=auto, 1=value, 2=string, 3=string-prefer
`--list-voices`			Print voice list and exit

Popular Voices

VCN	Name	Gender	Language	Scene
-----	------	--------	----------	-------
`x5_lingyuzhao_flow`	聆玉昭	Female	中文	交互聊天
`x5_lingxiaotang_flow`	聆小糖	Female	中文	语音助手
`x6_lingfeiyi_pro`	聆飞逸	Male	中文	交互聊天
`x6_lingxiaoli_pro`	聆小璃	Female	中文	交互聊天
`x6_pangbainan1_pro`	旁白男声	Male	中文	旁白配音
`x6_pangbainv1_pro`	旁白女声	Female	中文	旁白配音
`x6_lingfeihan_pro`	聆飞瀚	Male	中文	纪录片
`x5_EnUs_Grant_flow`	Grant	Female	English	交互聊天
`x5_EnUs_Lila_flow`	Lila	Female	English	交互聊天
`x4_zijin_oral`	子津	Male	天津话	交互聊天
`x4_ziyang_oral`	子阳	Male	东北话	交互聊天

Run --list-voices for the complete list (50+ voices).

Text Features

Silent pauses

Insert [p500] in text for a 500ms pause:

你好[p500]科大讯飞

Specify pronunciation

Use [=pinyin] after a character to force pronunciation:

着[=zhuo2]手

Notes

Endpoint: wss://cbm01.cn-huabei-1.xf-yun.com/v1/private/mcd9m97e6
Protocol: WebSocket (RFC 6455) with HMAC-SHA256 signed URL auth
Text limit: max 64KB total per session
Session timeout: 60 seconds
Text input speed: must exceed 15 chars/sec for streaming (not relevant for single-shot mode)
No pip dependencies: uses a built-in minimal WebSocket client on pure Python stdlib
Env vars: XFYUN_APP_ID, XFYUN_API_KEY, XFYUN_API_SECRET
Output: prints the absolute path of saved audio to stdout (for easy piping to other tools)
x4 series voices (x4_*_oral) support oral configuration parameters (口语化), x5/x6 do not
Voices must be enabled in the console before use

版本历史

共 1 个版本

v1.0.0 当前

2026-05-03 08:52 安全安全

安全检测

暂无安全检测报告

iFlytek Ultra-Realistic TTS

概述