
Azure Speech TTS

Azure Speech TTS skill for generating local audio files from text or SSML with Azure Speech. Use when the user asks to use Azure Speech / Azure TTS / Microsoft Speech.
Author: conanwhf
Source: clawhub (uncategorized) · Version: v1.0.2 · 1 version · API key: required

Overview

Azure Speech TTS

Use Azure Speech to turn text or SSML into a local audio file under download/.

What this skill does

  • Synthesize plain text into speech
  • Synthesize full SSML payloads directly
  • Choose voice, output format, rate, pitch, style, and role
  • Save the result as a local audio file and print a JSON summary

Configuration

This skill uses a small default config file plus environment variables.

Default config file

File:

  • config.json

Default values:

  • default_voice: zh-CN-Yunqi:DragonHDOmniLatestNeural
  • default_format: mp3
  • default_output_dir: download
  • default_timeout_seconds: 60

Secret values

Set these in the local shell environment:

  • AZURE_SPEECH_KEY
  • AZURE_SPEECH_REGION

Optional environment overrides

  • AZURE_SPEECH_VOICE
  • AZURE_SPEECH_FORMAT

Precedence

Use this order:

  1. CLI flag
  2. Environment variable
  3. config.json
  4. Built-in fallback
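
The precedence order above can be sketched as a small resolver. This is an illustration, not the script's actual code: the helper names `load_config` and `resolve` and the `FALLBACKS` table are hypothetical, and the built-in fallbacks are assumed to equal the documented defaults.

```python
import json
import os
from pathlib import Path

# Assumption: built-in fallbacks mirror the documented defaults.
FALLBACKS = {"voice": "zh-CN-Yunqi:DragonHDOmniLatestNeural", "format": "mp3"}
# Environment overrides named in this document.
ENV_NAMES = {"voice": "AZURE_SPEECH_VOICE", "format": "AZURE_SPEECH_FORMAT"}


def load_config(path="config.json"):
    """Return the config file as a dict, or {} if the file does not exist."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.is_file() else {}


def resolve(name, cli_value=None, env=None, config=None):
    """Pick a setting: CLI flag, then environment, then config.json, then fallback."""
    env = os.environ if env is None else env
    config = {} if config is None else config
    candidates = (
        cli_value,
        env.get(ENV_NAMES[name]),
        config.get(f"default_{name}"),
        FALLBACKS[name],
    )
    # The first non-empty candidate wins.
    return next(value for value in candidates if value)
```

For example, `resolve("format", env={"AZURE_SPEECH_FORMAT": "wav"}, config={"default_format": "ogg"})` returns `"wav"`, because the environment variable outranks config.json.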

Quick start

python3 scripts/azure_tts.py \
  --text "你好,这是一段测试语音。" \
  --voice zh-CN-Yunqi:DragonHDOmniLatestNeural \
  --format mp3 \
  --output download/test.mp3

For SSML:

python3 scripts/azure_tts.py \
  --ssml-file temp/input.ssml \
  --format wav \
  --output download/test.wav

Workflow

  1. Decide whether the input is plain text or full SSML.
  2. Use --text / --text-file for normal narration.
  3. Use --ssml / --ssml-file only when the payload already contains a complete SSML document.
  4. Pick the voice and output format, or let config.json supply the defaults.
  5. Run scripts/azure_tts.py.
  6. Return the generated audio path to the user.
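
A caller that drives the script from Python might follow the workflow above roughly like this. It is a sketch that uses only the flags shown in this document; `build_command` and `run_tts` are hypothetical helper names, not part of the skill.

```python
import subprocess
import sys


def build_command(payload, is_ssml=False, voice=None, fmt=None, output=None):
    """Assemble the argv list for scripts/azure_tts.py from the documented flags."""
    cmd = [sys.executable, "scripts/azure_tts.py"]
    # Steps 1-3: plain text goes through --text, a complete SSML document through --ssml.
    cmd += ["--ssml" if is_ssml else "--text", payload]
    # Step 4: options left as None are omitted so the script's own defaults apply.
    for flag, value in (("--voice", voice), ("--format", fmt), ("--output", output)):
        if value:
            cmd += [flag, value]
    return cmd


def run_tts(payload, **options):
    """Step 5: run the script and return whatever it printed to stdout."""
    result = subprocess.run(
        build_command(payload, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the text contains spaces or quotation marks.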

Rules

  • Prefer plain text unless the user needs pauses, emphasis, multi-voice content, or expressive styling.
  • --ssml input must include a full <speak> root element.
  • Default voice is zh-CN-Yunqi:DragonHDOmniLatestNeural if nothing else is set.
  • Default output folder is download/.
  • If the user does not specify format, use the default MP3 output.
  • Do not put secrets in config.json.
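
One way to decide between plain-text and SSML mode is to check whether the payload parses as XML with a `speak` root, which is the root element the SSML specification defines. The function name `is_full_ssml` is hypothetical, and the script may validate its input differently.

```python
import xml.etree.ElementTree as ET

# Namespace that the W3C SSML specification assigns to <speak>.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def is_full_ssml(payload):
    """Return True if payload parses as XML whose root element is <speak>."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # Plain text and malformed markup both end up here.
        return False
    # ElementTree reports namespaced tags as "{namespace}local-name".
    return root.tag in ("speak", f"{{{SSML_NS}}}speak")
```

A fragment such as `<voice name="en-US-AriaNeural">Hi</voice>` parses as XML but fails the check, so it should be sent as plain text or wrapped in a `<speak>` document first.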

Common formats

See references/azure-speech-cheatsheet.md for the format map and examples.

Short aliases supported by the script:

  • mp3
  • wav
  • pcm
  • ogg
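
As an illustration of how short aliases could expand to full Azure output format names, consider the table below. Only the `mp3` entry is confirmed by the sample output in this document. The other three values are real Azure format identifiers chosen as plausible guesses, and the authoritative map is the one in references/azure-speech-cheatsheet.md.

```python
# Hypothetical alias table; only "mp3" is confirmed by this document.
FORMAT_ALIASES = {
    "mp3": "audio-24khz-48kbitrate-mono-mp3",
    "wav": "riff-24khz-16bit-mono-pcm",   # assumption
    "pcm": "raw-24khz-16bit-mono-pcm",    # assumption
    "ogg": "ogg-24khz-16bit-mono-opus",   # assumption
}


def expand_format(name):
    """Map a short alias to a full format name; pass unknown names through unchanged."""
    return FORMAT_ALIASES.get(name.lower(), name)
```

Passing unknown names through unchanged lets a caller supply a full Azure format string directly.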

Useful options

  • --voice: Azure voice name, for example en-US-AriaNeural
  • --language: SSML xml:lang for plain-text mode
  • --rate: speaking rate, for example +10%
  • --pitch: pitch adjustment, for example +2st
  • --style: expressive style such as cheerful, sad, chat
  • --style-degree: strength of the expressive style
  • --role: voice role when supported
  • --save-ssml: write the generated SSML to a file for inspection
  • --dry-run: print the generated SSML without calling Azure
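
In plain-text mode these options presumably end up as SSML elements and attributes. The sketch below shows one way that wrapping could look, using Azure's documented `prosody` and `mstts:express-as` elements. The function `build_ssml` is hypothetical, and the SSML the script actually generates can be inspected with --dry-run or --save-ssml.

```python
from xml.sax.saxutils import escape, quoteattr


def _attrs(pairs):
    """Render only the attributes that were actually set."""
    return "".join(f" {key}={quoteattr(str(val))}" for key, val in pairs if val)


def build_ssml(text, voice, language="zh-CN", rate=None, pitch=None,
               style=None, style_degree=None, role=None):
    """Wrap plain text in a complete SSML document (illustrative only)."""
    body = escape(text)  # escape &, < and > so the text cannot break the XML
    prosody = _attrs([("rate", rate), ("pitch", pitch)])
    if prosody:
        body = f"<prosody{prosody}>{body}</prosody>"
    express = _attrs([("style", style), ("styledegree", style_degree), ("role", role)])
    if express:
        body = f"<mstts:express-as{express}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

For instance, `build_ssml("Hello", "en-US-AriaNeural", language="en-US", rate="+10%", style="cheerful")` nests a `prosody` element inside `mstts:express-as`, inside `voice`.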

Output

The helper script writes the audio file and prints JSON like:

{
  "ok": true,
  "output_path": "download/test.mp3",
  "format": "audio-24khz-48kbitrate-mono-mp3",
  "voice": "zh-CN-Yunqi:DragonHDOmniLatestNeural",
  "language": "zh-CN",
  "bytes": 123456
}

Use the printed output_path as the deliverable path.
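
A caller can pull the deliverable path out of that summary along these lines. The sketch assumes the JSON object is the only thing printed on stdout and that a failed run reports `"ok": false`; neither is confirmed by this document, and `read_summary` is a hypothetical name.

```python
import json
from pathlib import Path


def read_summary(stdout_text):
    """Parse the script's JSON summary and return (output path, size in bytes)."""
    summary = json.loads(stdout_text)
    # Assumption: a failed run sets "ok" to false rather than printing nothing.
    if not summary.get("ok"):
        raise RuntimeError(f"synthesis failed: {summary}")
    return Path(summary["output_path"]), summary["bytes"]
```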

Version history

1 version in total

  • v1.0.2 (current)
    2026-05-03 07:11, security status: safe

Security checks

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
