
iFlytek Ultra-Realistic TTS

By jpengcheng523-netizen
Uncategorized · clawhub · v1.0.0 · 100000 · API key: required
★ 0 stars · 📥 407 downloads · 💾 1 install

Overview

xfyun-tts

Synthesize natural, expressive speech from text using iFlytek's Ultra-Realistic Voice Synthesis (超拟人语音合成) WebSocket API. Features human-like breathing, pauses, and emotional expression across 50+ voices.

Setup

  1. Create an app in the iFlytek console (讯飞控制台) with the Ultra-Realistic Voice Synthesis (超拟人语音合成) service enabled
  2. Enable the desired voice(s) in the console (default: x5_lingyuzhao_flow / 聆玉昭)
  3. Set environment variables:

```bash
export XFYUN_APP_ID="your_app_id"
export XFYUN_API_KEY="your_api_key"
export XFYUN_API_SECRET="your_api_secret"
```
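A script that depends on these three variables can check them up front and report every missing one at once, rather than failing halfway through a request. The following is a minimal Python sketch. The names `load_credentials` and `XfyunCredentials` are hypothetical and are not taken from scripts/tts.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class XfyunCredentials:
    app_id: str
    api_key: str
    api_secret: str


def load_credentials(env=os.environ) -> XfyunCredentials:
    """Read the three XFYUN_* variables, failing early with a clear message."""
    names = ("XFYUN_APP_ID", "XFYUN_API_KEY", "XFYUN_API_SECRET")
    # Collect all missing names so the user can fix them in one pass.
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return XfyunCredentials(*(env[n] for n in names))
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.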

Usage

Basic synthesis

```bash
python3 scripts/tts.py "你好,欢迎使用科大讯飞语音合成。"
# → saves to output.mp3
```
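Over a WebSocket API, synthesized audio typically arrives as several frames that must be decoded and joined before the file is written. The sketch below rests on an assumption: each server message is JSON with `header.code` (0 = success), `header.status` (2 on the final frame) and base64 audio in `payload.audio.audio`. These field names follow iFlytek's usual response layout, but this README does not state them. `assemble_audio` and `save_audio` are hypothetical helpers.

```python
import base64
import json
import os


def assemble_audio(messages):
    """Concatenate base64 audio chunks from a sequence of JSON response strings.

    ASSUMED response shape: header.code (0 = OK), header.status (2 = last frame),
    payload.audio.audio (base64-encoded audio bytes).
    """
    chunks = []
    for raw in messages:
        msg = json.loads(raw)
        header = msg.get("header", {})
        if header.get("code", 0) != 0:
            raise RuntimeError(f"server error {header.get('code')}: {header.get('message')}")
        audio = msg.get("payload", {}).get("audio", {}).get("audio")
        if audio:
            chunks.append(base64.b64decode(audio))
        if header.get("status") == 2:  # final frame: ignore anything after it
            break
    return b"".join(chunks)


def save_audio(messages, path="output.mp3"):
    """Write the assembled audio and print its absolute path, as the CLI is documented to do."""
    data = assemble_audio(messages)
    with open(path, "wb") as f:
        f.write(data)
    abs_path = os.path.abspath(path)
    print(abs_path)
    return abs_path
```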

Specify output file

```bash
python3 scripts/tts.py "Hello, this is a test." --output hello.mp3
```

Use a different voice

```bash
python3 scripts/tts.py "大家好" --voice x6_lingfeiyi_pro --output greeting.mp3
```

Read from file

```bash
python3 scripts/tts.py --file article.txt --output article.mp3
```
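A long file can exceed the 64 KB per-session text limit listed under Notes. The README does not say whether scripts/tts.py splits its input. If you need to split, one approach is to cut at sentence boundaries and synthesize each piece in its own session. The hypothetical sketch below reads "64KB" as 65,536 bytes of UTF-8, which is an assumption.

```python
import re

MAX_SESSION_BYTES = 64 * 1024  # ASSUMPTION: "64KB" means 65,536 bytes of UTF-8 text

# A run of non-delimiters plus any trailing delimiters, or a bare run of delimiters.
SENTENCE = re.compile(r"[^。！？!?.\n]+[。！？!?.\n]*|[。！？!?.\n]+")


def split_for_sessions(text, limit=MAX_SESSION_BYTES):
    """Split text into pieces whose UTF-8 size is <= limit, preferring sentence ends."""
    chunks, current = [], ""
    for sentence in SENTENCE.findall(text):
        if len((current + sentence).encode("utf-8")) <= limit:
            current += sentence
            continue
        if current:
            chunks.append(current)
        current = ""
        # Add characters one at a time so an over-long sentence is hard-split.
        for ch in sentence:
            if current and len((current + ch).encode("utf-8")) > limit:
                chunks.append(current)
                current = ""
            current += ch
    if current:
        chunks.append(current)
    return chunks
```

Rejoining the pieces reproduces the original text exactly, so no characters are lost at the split points.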

Pipe from stdin

```bash
echo "流式文本输入测试" | python3 scripts/tts.py --output speech.mp3
```
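Text can come from three sources (positional argument, --file, or stdin), so the tool needs a precedence rule when more than one is supplied. The README does not state the rule. This hypothetical sketch assumes positional text wins, then --file, then piped stdin. It also strips the trailing newline that `echo` adds.

```python
import sys


def resolve_text(text=None, file_path=None, stdin=sys.stdin):
    """Pick the input text: positional argument, then --file, then piped stdin.

    The precedence order here is an illustrative assumption, not documented behaviour.
    """
    if text:
        return text
    if file_path:
        with open(file_path, encoding="utf-8") as f:
            return f.read().strip()
    # Only read stdin when something is piped in, never from an interactive terminal.
    if stdin is not None and not stdin.isatty():
        return stdin.read().strip()
    raise ValueError("no input text: pass TEXT, --file, or pipe text on stdin")
```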

Adjust parameters

```bash
python3 scripts/tts.py "语速快一点" --speed 70 --volume 80 --pitch 60
```

Output PCM format

```bash
python3 scripts/tts.py "测试" --format pcm --sample-rate 16000 --output test.pcm
```
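Raw PCM has no header, so most audio players cannot open test.pcm directly. Wrapping it in a WAV container with Python's standard `wave` module fixes that. The sketch below assumes 16-bit little-endian mono samples, which is common for speech APIs but is not stated in this README. `pcm_to_wav` is a hypothetical helper.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV header so ordinary players can open them.

    ASSUMPTION: 16-bit (2-byte) little-endian mono PCM; match sample_rate to --sample-rate.
    """
    with open(pcm_path, "rb") as f:
        frames = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(frames)
```

For the command above, `pcm_to_wav("test.pcm", "test.wav", sample_rate=16000)` produces a playable test.wav.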

List all available voices

```bash
python3 scripts/tts.py --list-voices
```

Options

| Flag | Short | Default | Description |
|------|-------|---------|-------------|
| text | | | Text to synthesize (positional) |
| --file | -f | | Read text from a file |
| --output | -o | output.mp3 | Output audio file path |
| --voice | -v | x5_lingyuzhao_flow | Voice name (vcn) |
| --format | | mp3 | Audio format: mp3, pcm, speex, opus |
| --sample-rate | | 24000 | Sample rate in Hz: 8000, 16000, 24000 |
| --speed | | 50 | Speed 0–100 (50 = normal, 100 = 2x) |
| --volume | | 50 | Volume 0–100 (50 = normal) |
| --pitch | | 50 | Pitch 0–100 (50 = normal) |
| --bgs | | 0 | Background sound: 0 = none, 1 = bg1, 2 = bg2 |
| --reg | | 0 | English pronunciation: 0 = auto, 1 = spell, 2 = letter |
| --rdn | | 0 | Number reading: 0 = auto, 1 = value, 2 = string, 3 = string-prefer |
| --list-voices | | | Print the voice list and exit |
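The options above map onto the request sent over the WebSocket. The sketch below shows how they might be validated and packed into a single-shot request frame. The header/parameter/payload layout and the format-to-encoding mapping (for example mp3 → "lame") are assumptions based on iFlytek's published request format; they are not taken from scripts/tts.py, so check them against the official documentation. `build_request` is a hypothetical name.

```python
import base64

# ASSUMPTION: --format values map to these audio.encoding names.
ENCODINGS = {"mp3": "lame", "pcm": "raw", "speex": "speex", "opus": "opus"}
SAMPLE_RATES = {8000, 16000, 24000}


def build_request(app_id, text, vcn="x5_lingyuzhao_flow", fmt="mp3", sample_rate=24000,
                  speed=50, volume=50, pitch=50, bgs=0, reg=0, rdn=0):
    """Build a single-shot request frame from the documented CLI options (assumed layout)."""
    for name, value in (("speed", speed), ("volume", volume), ("pitch", pitch)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in 0-100, got {value}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if fmt not in ENCODINGS:
        raise ValueError(f"unsupported format: {fmt}")
    return {
        "header": {"app_id": app_id, "status": 2},  # 2: the whole text is sent in one frame
        "parameter": {"tts": {
            "vcn": vcn, "speed": speed, "volume": volume, "pitch": pitch,
            "bgs": bgs, "reg": reg, "rdn": rdn,
            "audio": {"encoding": ENCODINGS[fmt], "sample_rate": sample_rate,
                      "channels": 1, "bit_depth": 16},
        }},
        "payload": {"text": {
            "encoding": "utf8", "compress": "raw", "format": "plain",
            "status": 2, "seq": 0,
            "text": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        }},
    }
```

Validating ranges before connecting gives a local error message instead of a server-side rejection after the handshake.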

Popular Voices

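| VCN | Name | Gender | Language | Scene |
|-----|------|--------|----------|-------|
| x5_lingyuzhao_flow | 聆玉昭 | Female | Chinese | Interactive chat |
| x5_lingxiaotang_flow | 聆小糖 | Female | Chinese | Voice assistant |
| x6_lingfeiyi_pro | 聆飞逸 | Male | Chinese | Interactive chat |
| x6_lingxiaoli_pro | 聆小璃 | Female | Chinese | Interactive chat |
| x6_pangbainan1_pro | 旁白男声 (male narrator) | Male | Chinese | Narration / voice-over |
| x6_pangbainv1_pro | 旁白女声 (female narrator) | Female | Chinese | Narration / voice-over |
| x6_lingfeihan_pro | 聆飞瀚 | Male | Chinese | Documentary |
| x5_EnUs_Grant_flow | Grant | Female | English | Interactive chat |
| x5_EnUs_Lila_flow | Lila | Female | English | Interactive chat |
| x4_zijin_oral | 子津 | Male | Tianjin dialect | Interactive chat |
| x4_ziyang_oral | 子阳 | Male | Northeastern dialect | Interactive chat |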

Run --list-voices for the complete list (50+ voices).
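The CLI takes a raw vcn string, so a caller that picks voices programmatically may want a small lookup table. The Notes also say that only x4-series voices accept oral parameters. In the hypothetical sketch below, `VOICES`, `pick_voices` and `supports_oral_params` are illustrative names, and the table is a hand-copied subset of the one above, not output from --list-voices.

```python
# Hand-copied subset of the voice table: vcn -> (language, scene).
VOICES = {
    "x5_lingyuzhao_flow": ("Chinese", "Interactive chat"),
    "x5_lingxiaotang_flow": ("Chinese", "Voice assistant"),
    "x6_pangbainv1_pro": ("Chinese", "Narration"),
    "x6_lingfeihan_pro": ("Chinese", "Documentary"),
    "x5_EnUs_Lila_flow": ("English", "Interactive chat"),
    "x4_zijin_oral": ("Tianjin dialect", "Interactive chat"),
}


def pick_voices(language=None, scene=None):
    """Return the sorted voice names matching the given language and/or scene."""
    return sorted(
        vcn for vcn, (lang, sc) in VOICES.items()
        if (language is None or lang == language) and (scene is None or sc == scene)
    )


def supports_oral_params(vcn):
    """Per the notes, only x4-series voices (x4_*_oral) accept oral (口语化) parameters."""
    return vcn.startswith("x4_") and vcn.endswith("_oral")
```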

Text Features

Silent pauses

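Insert a tag such as [p500] in the text to add a silent pause; the number is the pause length in milliseconds, so [p500] gives a 500 ms pause: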

你好[p500]科大讯飞
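When pauses are generated programmatically, for example between the lines of a script, a small helper keeps the tag syntax in one place. The hypothetical sketch below assumes the number in [pN] is always a duration in milliseconds, as in the [p500] example.

```python
def with_pauses(segments, pause_ms=500):
    """Join text segments, inserting a [pN] silence tag (N in milliseconds) between them."""
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    return f"[p{int(pause_ms)}]".join(segments)
```

`with_pauses(["你好", "科大讯飞"])` returns `"你好[p500]科大讯飞"`, the example above.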

Specify pronunciation

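Put [=pinyin], with the tone written as a trailing digit, immediately after a character to force its pronunciation. Here 着 in 着手 ("to set about") is read zhuó (tone 2):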

着[=zhuo2]手
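To apply pronunciation overrides from code, insert the tag at a character position and validate the pinyin first. In this hypothetical sketch, the accepted pinyin shape (lowercase letters, v for ü, then a tone digit 1–5) is an assumption, not a documented rule.

```python
import re

# ASSUMPTION: lowercase pinyin (v for ü) followed by one tone digit 1-5.
PINYIN_TAG = re.compile(r"[a-z]+[1-5]")


def force_pinyin(text, index, pinyin):
    """Return text with a [=pinyin] tag inserted right after the character at index."""
    if not 0 <= index < len(text):
        raise IndexError(f"index {index} out of range for text of length {len(text)}")
    if not PINYIN_TAG.fullmatch(pinyin):
        raise ValueError(f"expected pinyin plus tone digit, got {pinyin!r}")
    return f"{text[: index + 1]}[={pinyin}]{text[index + 1 :]}"
```

`force_pinyin("着手", 0, "zhuo2")` returns `"着[=zhuo2]手"`.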

Notes

  • Endpoint: wss://cbm01.cn-huabei-1.xf-yun.com/v1/private/mcd9m97e6
  • Protocol: WebSocket (RFC 6455) with HMAC-SHA256 signed URL auth
  • Text limit: max 64KB total per session
  • Session timeout: 60 seconds
  • Text input rate: when streaming text, it must be sent faster than 15 characters per second (does not apply to single-shot mode)
  • No pip dependencies: includes a minimal built-in WebSocket client written with only the Python standard library
  • Env vars: XFYUN_APP_ID, XFYUN_API_KEY, XFYUN_API_SECRET
  • Output: prints the absolute path of the saved audio file to stdout (for easy piping to other tools)
  • x4-series voices (x4_*_oral) support the oral-style (口语化, colloquial) configuration parameters; x5 and x6 voices do not
  • Voices must be enabled in the console before use
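The HMAC-SHA256 signed-URL authentication listed above is the step most likely to trip up a reimplementation. The sketch below assumes this endpoint uses iFlytek's common WebSocket signing scheme:

- The signed string is the host, an RFC 1123 GMT date and the request line, separated by newlines.
- The base64-encoded authorization, the date and the host are appended as query parameters.

`signed_url` is a hypothetical helper, not the skill's own code.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import urlencode, urlparse


def signed_url(endpoint, api_key, api_secret, date=None):
    """Build an HMAC-SHA256 signed WebSocket URL.

    ASSUMPTION: iFlytek's common scheme that signs "host", "date" and the request line.
    """
    parsed = urlparse(endpoint)
    date = date or formatdate(usegmt=True)  # RFC 1123 date in GMT
    origin = f"host: {parsed.netloc}\ndate: {date}\nGET {parsed.path} HTTP/1.1"
    digest = hmac.new(api_secret.encode(), origin.encode(), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode()
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode()).decode()
    query = urlencode({"authorization": authorization, "date": date, "host": parsed.netloc})
    return f"{endpoint}?{query}"
```

Because the signature covers the date, build the URL immediately before connecting rather than caching it.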

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-03 08:52 · Security: safe

Security scan

No security scan report available yet.