
qwenspeak

Text-to-speech generation with Qwen3-TTS over SSH. Supports preset voices, voice cloning, and voice design. Use when the user wants to generate speech audio, clone voices, and similar tasks.
psyb0t
Content creation · clawhub v1.5.0 · 1 version · 99852.1 · Key: not required
★ 0 stars · 📥 1,350 downloads · 💾 50 installs · #latest

Overview

qwenspeak

YAML-driven text-to-speech over SSH using Qwen3-TTS models.

For installation and deployment, see references/setup.md.

SSH Wrapper

Use scripts/qwenspeak.sh for all commands. It reads the host and port from the QWENSPEAK_HOST and QWENSPEAK_PORT environment variables and handles host key acceptance.

scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file
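The wrapper's source is not reproduced here. As a rough sketch of the behavior described above, assuming it is a thin `ssh` call, it might look like the function below. The function name `qwenspeak`, the fallback port 22, and the use of OpenSSH's `StrictHostKeyChecking=accept-new` for host key acceptance are all assumptions, not taken from the real script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an SSH wrapper in the spirit of scripts/qwenspeak.sh.
# Assumptions: QWENSPEAK_HOST may include a user (user@host), the port falls
# back to 22 when QWENSPEAK_PORT is unset, and the remote side takes the whole
# command as one string.
qwenspeak() {
  if [ -z "${QWENSPEAK_HOST:-}" ]; then
    echo "QWENSPEAK_HOST must be set" >&2
    return 1
  fi
  local port="${QWENSPEAK_PORT:-22}"   # assumed default, not documented
  # accept-new trusts a host key on first contact but still rejects a changed key.
  # stdin/stdout pass straight through, so "put" and "get" work with redirection.
  ssh -p "$port" -o StrictHostKeyChecking=accept-new "$QWENSPEAK_HOST" "$1"
}
```

Usage mirrors the wrapper: `qwenspeak "tts list-jobs"` or `qwenspeak "get hello.wav" > hello.wav`.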

TTS Generation

Submit a YAML job file and get a job UUID back immediately, then poll for progress. Jobs run sequentially: one runs at a time and the rest wait in the queue.

# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml

# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}

# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"

# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"

# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav
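To script the submit step, the job ID can be pulled out of the JSON reply shown in the comment above. This is a minimal sketch that assumes the reply keeps that single `"id": "..."` field; `submit_job` and the `QWENSPEAK_CMD` override are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: submit a job file and print the job UUID from the JSON reply.
# QWENSPEAK_CMD is a hypothetical override so the wrapper path can be swapped.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

submit_job() {
  local yaml_file="$1" reply id
  reply="$("$QWENSPEAK_CMD" "tts" < "$yaml_file")" || return 1
  # Extract the value of "id" from {"id": "...", "status": "queued", ...}
  id="$(printf '%s\n' "$reply" | sed -n 's/.*"id": *"\([^"]*\)".*/\1/p')"
  if [ -z "$id" ]; then
    echo "no job id in reply: $reply" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

For example, `job_id="$(submit_job job.yaml)"` followed by `scripts/qwenspeak.sh "tts get-job-log $job_id -f"`.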

YAML Structure

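A job file holds global settings plus a list of steps. Each step loads a model, runs all of its generations, then unloads the model. Settings cascade global → step → generation, and a value set at a more specific level overrides the inherited one (in the example below, the second generation uses Vivian instead of the step's Ryan).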

steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav

  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav

  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav
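For longer scripts it can be convenient to generate the job file instead of writing it by hand. The sketch below turns a text file with one utterance per line into a single custom-voice step. The function name `make_job` and the `line001.wav`, `line002.wav`, ... naming scheme are hypothetical; text is escaped for YAML double-quoted strings.

```bash
#!/usr/bin/env bash
# Sketch: build a custom-voice job from a text file, one generation per line.
# make_job and the lineNNN.wav output names are hypothetical conventions.
make_job() {
  local lines_file="$1" speaker="${2:-Ryan}" n=0 text
  printf 'steps:\n'
  printf '  - mode: custom-voice\n'
  printf '    model_size: 1.7b\n'
  printf '    speaker: %s\n' "$speaker"
  printf '    generate:\n'
  # "|| [ -n ... ]" also handles a final line without a trailing newline
  while IFS= read -r text || [ -n "$text" ]; do
    [ -n "$text" ] || continue   # skip blank lines
    n=$((n + 1))
    # Escape backslashes and double quotes for a YAML double-quoted string
    text="$(printf '%s' "$text" | sed 's/\\/\\\\/g; s/"/\\"/g')"
    printf '      - text: "%s"\n' "$text"
    printf '        output: line%03d.wav\n' "$n"
  done < "$lines_file"
}
```

Usage: `make_job script.txt Aiden > job.yaml`, then submit it with `scripts/qwenspeak.sh "tts" < job.yaml`.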

Modes

custom-voice: pick one of the 9 preset speakers. With the 1.7B model, instruct controls emotion and style.

voice-design: describe the voice in natural language via instruct. 1.7B only.

voice-clone: clone a voice from reference audio. Set ref_audio and ref_text at the step level to reuse them across all of its generations. Setting x_vector_only: true skips the transcript and uses only the speaker embedding.
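The mode rules above can be checked locally before a job is submitted. This is a hypothetical helper, `check_step`, that encodes only what is stated here and in the options table: the three mode names, the two model sizes, and voice-design requiring 1.7b. The server presumably does its own validation.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a step's mode and model_size before submitting.
# check_step is hypothetical; it encodes only the documented rules.
check_step() {
  local mode="$1" size="${2:-1.7b}"   # 1.7b is the documented default
  case "$mode" in
    custom-voice|voice-clone|voice-design) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  case "$size" in
    1.7b|0.6b) ;;
    *) echo "unknown model_size: $size" >&2; return 1 ;;
  esac
  if [ "$mode" = voice-design ] && [ "$size" != 1.7b ]; then
    echo "voice-design is 1.7b only" >&2
    return 1
  fi
}
```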

Emotion trick for cloned voices

Upload reference clips recorded with different emotions, then give each one its own voice-clone step:

scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav

steps:
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav

  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav
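With more than two emotions, the uploads can be looped. This sketch keeps the `me_<emotion>.wav` local naming from the example; `upload_refs` and the `QWENSPEAK_CMD` override are hypothetical. Because the behavior of `create-dir` on an existing directory is not documented, its failure is not treated as fatal here; a real problem will still surface on `put`.

```bash
#!/usr/bin/env bash
# Sketch: upload one reference clip per emotion into refs/.
# upload_refs and QWENSPEAK_CMD are hypothetical; local files are me_<emotion>.wav.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

upload_refs() {
  # create-dir on an existing directory is undocumented, so don't abort on it
  "$QWENSPEAK_CMD" "create-dir refs" || true
  local emo
  for emo in "$@"; do
    "$QWENSPEAK_CMD" "put refs/$emo.wav" < "me_$emo.wav" || return 1
  done
}
```

Usage: `upload_refs happy angry sad`, then add one voice-clone step per clip.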

Job Management

scripts/qwenspeak.sh "tts list-jobs"              # list all
scripts/qwenspeak.sh "tts list-jobs --json"        # JSON output
scripts/qwenspeak.sh "tts get-job <id>"            # job details
scripts/qwenspeak.sh "tts get-job-log <id>"        # view log
scripts/qwenspeak.sh "tts get-job-log <id> -f"     # follow log
scripts/qwenspeak.sh "tts cancel-job <id>"         # cancel

Statuses: queued → running → completed | failed | cancelled

Completed jobs are cleaned up automatically after 1 day, and all jobs after 1 week. Job commands accept a UUID prefix instead of the full ID (e.g. the first 8 characters).
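Instead of following the log, a script can poll until the job reaches one of the terminal statuses. This sketch assumes `tts get-job <id>` prints JSON with a `"status"` field, like the submit reply; if it prints plain text instead, the `sed` pattern needs adjusting. `wait_for_job`, `QWENSPEAK_CMD`, and the 5-second `POLL_SECONDS` interval are hypothetical choices.

```bash
#!/usr/bin/env bash
# Sketch: poll a job until it is completed, failed, or cancelled.
# Assumes "tts get-job <id>" output contains "status": "<value>".
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"
POLL_SECONDS="${POLL_SECONDS:-5}"   # arbitrary interval

wait_for_job() {
  local id="$1" status
  while :; do
    status="$("$QWENSPEAK_CMD" "tts get-job $id" \
      | sed -n 's/.*"status": *"\([a-z]*\)".*/\1/p' | head -n 1)"
    case "$status" in
      completed) return 0 ;;
      failed|cancelled) echo "job $id $status" >&2; return 1 ;;
      queued|running) sleep "$POLL_SECONDS" ;;
      *) echo "unexpected status '$status' for job $id" >&2; return 2 ;;
    esac
  done
}
```

Usage: `wait_for_job 550e8400 && scripts/qwenspeak.sh "get hello.wav" > hello.wav`.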

File Operations

All paths are relative to the work directory; path traversal outside it is blocked.

Command              Description
-------------------  ------------------------------------
put                  Upload a file from stdin
get                  Download a file to stdout
list-files [--json]  List a directory
remove-file          Delete a file
create-dir           Create a directory
remove-dir           Remove an empty directory
move-file            Move or rename a file
copy-file            Copy a file
file-exists          Check whether a file exists (true/false)
search-files         Glob search (** is recursive)
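`file-exists` and `get` combine naturally when downloading a job's outputs. This sketch relies on `file-exists` printing `true` or `false` as listed; `fetch_outputs` and `QWENSPEAK_CMD` are hypothetical names, and local paths simply mirror the remote relative paths.

```bash
#!/usr/bin/env bash
# Sketch: download each output only if the server reports that it exists.
# fetch_outputs and QWENSPEAK_CMD are hypothetical; returns 1 if any file is missing.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

fetch_outputs() {
  local path missing=0
  for path in "$@"; do
    if [ "$("$QWENSPEAK_CMD" "file-exists $path")" = true ]; then
      mkdir -p "$(dirname "$path")"   # mirror the remote directory locally
      "$QWENSPEAK_CMD" "get $path" > "$path" || return 1
    else
      echo "missing on server: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `fetch_outputs happy1.wav angry1.wav` after the job has completed.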

Speakers

Speaker    Gender  Language  Description
---------  ------  --------  ----------------------------------------------
Vivian     Female  Chinese   Bright, slightly edgy young voice
Serena     Female  Chinese   Warm, gentle young voice
Uncle_Fu   Male    Chinese   Seasoned, low mellow timbre
Dylan      Male    Chinese   Youthful Beijing dialect, clear natural timbre
Eric       Male    Chinese   Lively Chengdu/Sichuan dialect, slightly husky
Ryan       Male    English   Dynamic with strong rhythmic drive
Aiden      Male    English   Sunny American, clear midrange
Ono_Anna   Female  Japanese  Playful, light nimble timbre
Sohee      Female  Korean    Warm with rich emotion

YAML Options

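Settings cascade global → step → generation, and the most specific level that sets a field wins. Fields marked "Step only" are set on the step itself.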

Field               Default   Description
------------------  --------  -------------------------------------------------------------
dtype               float32   float32, float16, or bfloat16 (float16/bfloat16 GPU only)
flash_attn          auto      FlashAttention-2: auto-detected; switches float32 to bfloat16
temperature         0.9       Sampling temperature
top_k               50        Top-k sampling
top_p               1.0       Top-p (nucleus) sampling
repetition_penalty  1.05      Repetition penalty
max_new_tokens      2048      Maximum number of codec tokens to generate
no_sample           false     Use greedy decoding instead of sampling
streaming           false     Streaming mode (lower latency)
mode                required  Step only: custom-voice, voice-design, or voice-clone
model_size          1.7b      Step only: 1.7b or 0.6b
text                required  Text to synthesize
output              required  Output file path
speaker             Vivian    custom-voice: speaker name
language            Auto      Language for synthesis
instruct            -         custom-voice: emotion/style; voice-design: voice description
ref_audio           -         voice-clone: reference audio file path
ref_text            -         voice-clone: transcript of the reference audio
x_vector_only       false     voice-clone: use the speaker embedding only
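The cascade rule can be stated as a tiny lookup: take the generation's value if set, else the step's, else the global one, else the table default. `resolve_setting` is a hypothetical helper that only illustrates this precedence; it is not part of qwenspeak, and an empty string stands for "not set".

```bash
#!/usr/bin/env bash
# Sketch of the cascade: generation overrides step, step overrides global,
# and the table default applies when no level sets the field.
# resolve_setting is hypothetical; "" means "not set at this level".
resolve_setting() {
  local generation="$1" step="$2" global="$3" default="$4"
  if [ -n "$generation" ]; then
    printf '%s\n' "$generation"
  elif [ -n "$step" ]; then
    printf '%s\n' "$step"
  elif [ -n "$global" ]; then
    printf '%s\n' "$global"
  else
    printf '%s\n' "$default"
  fi
}
```

In the YAML Structure example, `resolve_setting "" Ryan "" Vivian` gives Ryan for hello.wav, while `resolve_setting Vivian Ryan "" Vivian` gives Vivian for angry.wav.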

Version history

1 version in total

  • v1.5.0 (current)
    2026-03-29 03:56 · safe

Security scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
