Synthesize natural, expressive speech from text using iFlytek's Ultra-Realistic Voice Synthesis (超拟人语音合成) WebSocket API. Features human-like breathing, pauses, and emotional expression across 50+ voices.
The default voice is x5_lingyuzhao_flow (聆玉昭).

Set your credentials as environment variables, then run the script:

```bash
# Credentials
export XFYUN_APP_ID="your_app_id"
export XFYUN_API_KEY="your_api_key"
export XFYUN_API_SECRET="your_api_secret"

# Basic synthesis
python3 scripts/tts.py "你好,欢迎使用科大讯飞语音合成。"
# → saves to output.mp3

# Custom output path
python3 scripts/tts.py "Hello, this is a test." --output hello.mp3

# Choose a voice
python3 scripts/tts.py "大家好" --voice x6_lingfeiyi_pro --output greeting.mp3

# Read text from a file
python3 scripts/tts.py --file article.txt --output article.mp3

# Read text from stdin
echo "流式文本输入测试" | python3 scripts/tts.py --output speech.mp3

# Adjust speed, volume, and pitch
python3 scripts/tts.py "语速快一点" --speed 70 --volume 80 --pitch 60

# Raw PCM output at 16 kHz
python3 scripts/tts.py "测试" --format pcm --sample-rate 16000 --output test.pcm

# List available voices
python3 scripts/tts.py --list-voices
```
| Flag | Short | Default | Description |
|---|---|---|---|
| text | | | Text to synthesize (positional) |
| --file | -f | | Read text from a file |
| --output | -o | output.mp3 | Output audio file path |
| --voice | -v | x5_lingyuzhao_flow | Voice name (vcn) |
| --format | | mp3 | Audio format: mp3, pcm, speex, opus |
| --sample-rate | | 24000 | Sample rate in Hz: 8000, 16000, 24000 |
| --speed | | 50 | Speed 0–100 (50=normal, 100=2x) |
| --volume | | 50 | Volume 0–100 (50=normal) |
| --pitch | | 50 | Pitch 0–100 (50=normal) |
| --bgs | | 0 | Background sound: 0=none, 1=bg1, 2=bg2 |
| --reg | | 0 | English pronunciation: 0=auto, 1=spell, 2=letter |
| --rdn | | 0 | Number reading: 0=auto, 1=value, 2=string, 3=string-prefer |
| --list-voices | | | Print voice list and exit |
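
When the script is driven from other Python code, the documented options can be assembled and range-checked before the command is run. The following is a minimal sketch, not part of the package: `build_tts_command` is a hypothetical helper, and it only uses the flags, defaults, and value ranges listed in the table above.

```python
# Allowed values, copied from the options table.
FORMATS = {"mp3", "pcm", "speex", "opus"}
SAMPLE_RATES = {8000, 16000, 24000}


def build_tts_command(text, output="output.mp3", voice="x5_lingyuzhao_flow",
                      audio_format="mp3", sample_rate=24000,
                      speed=50, volume=50, pitch=50):
    """Return an argv list for scripts/tts.py (hypothetical helper).

    Raises ValueError when a value falls outside the documented choices.
    """
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate!r}")
    for name, value in (("speed", speed), ("volume", volume), ("pitch", pitch)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in 0..100, got {value!r}")
    return [
        "python3", "scripts/tts.py", text,
        "--output", output,
        "--voice", voice,
        "--format", audio_format,
        "--sample-rate", str(sample_rate),
        "--speed", str(speed),
        "--volume", str(volume),
        "--pitch", str(pitch),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; using a list rather than a shell string avoids quoting problems with punctuation in the text.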
| VCN | Name | Gender | Language | Scene |
|---|---|---|---|---|
| x5_lingyuzhao_flow | 聆玉昭 | Female | Chinese | Conversational chat |
| x5_lingxiaotang_flow | 聆小糖 | Female | Chinese | Voice assistant |
| x6_lingfeiyi_pro | 聆飞逸 | Male | Chinese | Conversational chat |
| x6_lingxiaoli_pro | 聆小璃 | Female | Chinese | Conversational chat |
| x6_pangbainan1_pro | 旁白男声 | Male | Chinese | Narration and voice-over |
| x6_pangbainv1_pro | 旁白女声 | Female | Chinese | Narration and voice-over |
| x6_lingfeihan_pro | 聆飞瀚 | Male | Chinese | Documentary |
| x5_EnUs_Grant_flow | Grant | Female | English | Conversational chat |
| x5_EnUs_Lila_flow | Lila | Female | English | Conversational chat |
| x4_zijin_oral | 子津 | Male | Tianjin dialect | Conversational chat |
| x4_ziyang_oral | 子阳 | Male | Northeastern dialect | Conversational chat |
Run --list-voices for the complete list (50+ voices).
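
The notes in this document state that the x4 voices (`x4_*_oral`) accept oral configuration parameters while the x5 and x6 voices do not. A caller can check this from the voice name alone. This is a small sketch: `supports_oral_config` is a hypothetical helper, and it assumes the naming pattern holds for every voice in the full list.

```python
from fnmatch import fnmatchcase


def supports_oral_config(vcn):
    """Return True if the voice name matches the x4_*_oral pattern.

    Assumption: the name pattern alone identifies oral-capable voices.
    """
    return fnmatchcase(vcn, "x4_*_oral")
```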
Insert [p500] in text for a 500ms pause:
你好[p500]科大讯飞
Use [=pinyin] after a character to force pronunciation:
着[=zhuo2]手
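
Both markers are plain text, so they can be generated before the text is handed to the script. The helpers below are a hypothetical sketch (their names are not part of the package) that reproduce the two examples above; the non-negative check on the pause length is an assumption.

```python
def pause(ms):
    """Return a pause marker such as [p500] for a pause of `ms` milliseconds."""
    if ms < 0:
        # Assumption: negative pause lengths are not meaningful.
        raise ValueError("pause length must be non-negative")
    return f"[p{int(ms)}]"


def with_pinyin(char, pinyin):
    """Attach a forced pronunciation to a character, e.g. 着[=zhuo2]."""
    return f"{char}[={pinyin}]"


def join_with_pauses(parts, ms=500):
    """Join text fragments with the same pause marker between each pair."""
    return pause(ms).join(parts)
```

For example, `join_with_pauses(["你好", "科大讯飞"])` yields `你好[p500]科大讯飞`, and `with_pinyin("着", "zhuo2") + "手"` yields `着[=zhuo2]手`.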
- Endpoint: wss://cbm01.cn-huabei-1.xf-yun.com/v1/private/mcd9m97e6
- Required environment variables: XFYUN_APP_ID, XFYUN_API_KEY, XFYUN_API_SECRET
- Only the x4 voices (x4_*_oral) support oral configuration parameters (口语化); the x5/x6 voices do not.
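
This document does not describe how the WebSocket connection is authenticated. The sketch below assumes the endpoint uses the HMAC-SHA256 URL signing scheme that iFlytek documents for its other WebSocket services, where the host, date, and request line are signed with the API secret and passed as query parameters. `build_auth_url` is a hypothetical helper name, and the scheme should be verified against the official API documentation before relying on it. Under this assumption XFYUN_APP_ID is not part of the signature; it would typically be sent in the request payload.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import urlencode, urlparse

ENDPOINT = "wss://cbm01.cn-huabei-1.xf-yun.com/v1/private/mcd9m97e6"


def build_auth_url(api_key, api_secret, endpoint=ENDPOINT, date=None):
    """Return the endpoint URL with signed query parameters (assumed scheme)."""
    parsed = urlparse(endpoint)
    host, path = parsed.netloc, parsed.path
    # RFC 1123 date in GMT, e.g. "Mon, 01 Jan 2024 00:00:00 GMT".
    date = date or formatdate(usegmt=True)

    # String to sign: host, date, and the HTTP request line.
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(
        api_secret.encode("utf-8"),
        signature_origin.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    signature = base64.b64encode(digest).decode("utf-8")

    # The authorization value is itself base64-encoded before being sent.
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(
        authorization_origin.encode("utf-8")
    ).decode("utf-8")

    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return f"{endpoint}?{query}"
```

In practice the key and secret would be read from `os.environ["XFYUN_API_KEY"]` and `os.environ["XFYUN_API_SECRET"]`, and the resulting URL passed to a WebSocket client.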