
Polyphone TTS

Fix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wro...
scikkk
AI Intelligence · clawhub · v1.0.0 · 1 version · 99832.8 · Key: required
★ 0 stars · 📥 597 downloads · 💾 5 installs · #latest

Overview

SenseAudio Polyphone TTS (多音字)

Precise pronunciation control for Chinese TTS via pinyin annotation. The dictionary parameter lets you override how specific characters are read, which is essential for polyphones (多音字) that the model might read wrong.

> The dictionary parameter only works with cloned voices and the SenseAudio-TTS-1.5 model. System voices (e.g. male_0004_a) do not support it.

Step 1: Scan for Polyphones

When the user provides text, scan it for these common polyphones and flag any that appear:

| Character | Readings | Context clues |
|-----------|----------|---------------|
| 行 | háng (行业/银行/行列) / xíng (行走/行动/可行) | 银行、行长、行业 → háng |
| 干 | gān (干净/干燥) / gàn (干活/干部) | 干部、干活 → gàn |
| 量 | liáng (量体温/测量) / liàng (数量/重量) | 数量、质量 → liàng |
| 铺 | pū (铺床/铺路) / pù (店铺/铺子) | 店铺、铺面 → pù |
| 好 | hǎo (好的/很好) / hào (好奇/爱好) | 爱好、好学 → hào |
| 了 | le (吃了/来了) / liǎo (了解/了结) | 了解、了不起 → liǎo |
| 得 | de (跑得快) / dé (得到) / děi (得去) | 得到 → dé; meaning "must" (必须) → děi |
| 地 | de (慢慢地) / dì (土地/地方) | adverbial marker → de |
| 的 | de (我的) / dí (的确) / dì (目的) | 的确 → dí; 目的 → dì |
| 着 | zhe (看着) / zháo (着火) / zhuó (着装) | 着火、着急 → zháo; 着装 → zhuó |
| 长 | cháng (长度/很长) / zhǎng (成长/行长) | 行长、生长 → zhǎng |
| 重 | zhòng (重量/重要) / chóng (重复/重新) | 重复、重新 → chóng |
| 中 | zhōng (中间/中国) / zhòng (中奖/中毒) | 中奖、中毒 → zhòng |
| 还 | hái (还有/还是) / huán (还钱/归还) | 还钱、偿还 → huán |
| 发 | fā (发现/发展) / fà (头发/理发) | 头发、理发 → fà |
| 数 | shù (数字/数量) / shǔ (数数/数一数) | 数数、数落 → shǔ |
| 参 | cān (参加/参考) / shēn (人参/党参) | 人参、党参 → shēn |
| 差 | chā (差别/差距) / chà (差不多) / chāi (出差) | 出差 → chāi; 差不多 → chà |

Show the user which polyphones were found and your best guess at the intended reading, then ask them to confirm or correct before synthesizing.

Example:

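Polyphones detected:
- "行" (character 2): 银行 → suggested háng [hang2] ✓ or xíng [xing2]?
- "行" (character 3): 行长 → suggested háng [hang2] ✓ or xíng [xing2]?
- "长" (character 4): 行长 → suggested zhǎng [zhang3] ✓ or cháng [chang2]?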
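The scan in Step 1 can be sketched with jq, which Step 3 already relies on. This is a minimal sketch, assuming the polyphone set is exactly the characters in the table above; POLYPHONES and scan_polyphones are hypothetical names. It only locates candidate characters and their 1-based positions; choosing the reading is still left to context and the user.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical polyphone set: the characters from the table above.
POLYPHONES='行干量铺好了得地的着长重中还发数参差'

# scan_polyphones TEXT -> prints "<1-based position><TAB><character>" per hit.
# jq works on Unicode code points, so positions count characters, not bytes,
# regardless of the shell's locale.
scan_polyphones() {
  jq -rn --arg text "$1" --arg chars "$POLYPHONES" '
    ($chars | explode) as $poly
    | ($text | explode) as $cps
    | range(0; $cps | length) as $i
    | select($cps[$i] as $c | $poly | any(. == $c))
    | "\($i + 1)\t\([$cps[$i]] | implode)"'
}

scan_polyphones "银行行长"
```

For 银行行长 it prints positions 2, 3 and 4 (行, 行, 长). A plain character match like this over-reports (every 了 or 的 is flagged), which is why the confirmation step with the user still matters.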

Step 2: Build the Dictionary

Convert confirmed readings into the dictionary array. Each entry covers one phrase containing the polyphone:

Original phrase → replacement format: insert [pinyin] immediately before each polyphone and leave every other character unchanged.

Pinyin format: [initial + final + tone number], e.g. [hang2], [xing2], [zhang3]
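The table in Step 1 lists readings with tone marks (háng), while the dictionary expects tone numbers (hang2). A small converter, sketched here as a hypothetical to_numbered helper, assuming lowercase pinyin written with precomposed tone-marked vowels:

```shell
#!/usr/bin/env bash
set -euo pipefail

# to_numbered SYLLABLE -> numbered pinyin, e.g. zhǎng -> zhang3.
# Assumption: a syllable with no tone mark (neutral tone: le, de, zhe) is
# printed without a digit; this page does not say how SenseAudio expects the
# neutral tone to be written, so check that case against the API docs.
# ü is kept as ü; how the API spells it is likewise not stated here.
to_numbered() {
  jq -rn --arg s "$1" '
    {"ā":["a",1],"á":["a",2],"ǎ":["a",3],"à":["a",4],
     "ē":["e",1],"é":["e",2],"ě":["e",3],"è":["e",4],
     "ī":["i",1],"í":["i",2],"ǐ":["i",3],"ì":["i",4],
     "ō":["o",1],"ó":["o",2],"ǒ":["o",3],"ò":["o",4],
     "ū":["u",1],"ú":["u",2],"ǔ":["u",3],"ù":["u",4],
     "ǖ":["ü",1],"ǘ":["ü",2],"ǚ":["ü",3],"ǜ":["ü",4]} as $marks
    | [$s | explode | .[] | [.] | implode] as $chars
    # Each char becomes [base, tone]; unmarked chars get tone null.
    | [$chars[] | $marks[.] // [., null]] as $pairs
    | ([$pairs[] | .[0]] | join(""))
      + ([$pairs[] | .[1] | select(. != null) | tostring] | join(""))'
}

# prints hang2, zhang3, hao4 (one per line)
for s in háng zhǎng hào; do to_numbered "$s"; done
```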

Example:

  • original: 银行行长
  • replacement: 银[hang2]行[hang2]行[zhang3]长

Build the full dictionary array:

"dictionary": [
  {"original": "银行行长", "replacement": "银[hang2]行[hang2]行[zhang3]长"},
  {"original": "好奇心", "replacement": "[hao4]好奇心"}
]
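Writing entries by hand makes it easy to drop a character from the replacement. A minimal sketch with a hypothetical annotate helper that applies the Step 2 rule (insert [pinyin] before each polyphone, keep every other character) given 1-based character positions, then collects the entries into one array with jq -s:

```shell
#!/usr/bin/env bash
set -euo pipefail

# annotate ORIGINAL POS=PINYIN... -> one dictionary entry as compact JSON.
# POS is the 1-based character position of a polyphone inside ORIGINAL.
annotate() {
  local original=$1; shift
  local readings
  # "2=hang2" "4=zhang3" -> {"2":"hang2","4":"zhang3"}
  readings=$(printf '%s\n' "$@" \
    | jq -Rn '[inputs | select(length > 0) | split("=") | {(.[0]): .[1]}] | add // {}')
  jq -cn --arg o "$original" --argjson r "$readings" '
    ($o | explode) as $cps
    | {original: $o,
       replacement: ([range(0; $cps | length) as $i
                      | $r[($i + 1 | tostring)] as $py
                      # Prefix "[pinyin]" only where a reading was given.
                      | (if $py then "[\($py)]" else "" end) + ([$cps[$i]] | implode)]
                     | join(""))}'
}

# Slurp the entries into the "dictionary" array.
{
  annotate 银行行长 2=hang2 3=hang2 4=zhang3
  annotate 好奇心 1=hao4
} | jq -cs '.'
```

Run as shown, it prints both entries as one compact JSON array that can stand in for <DICTIONARY_ARRAY> in Step 3.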

Each original should be a short phrase (3–8 characters) that uniquely identifies its occurrence in the text. Avoid single-character originals; they may match unintended occurrences.
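These guidelines can be checked before synthesis. This sketch, a hypothetical check_dictionary helper, flags originals outside 3–8 characters and originals that appear zero times or more than once in the text. Treating "appears exactly once" as the uniqueness test is an assumption, since this page does not say how the API matches phrases.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_dictionary TEXT DICT_JSON -> one warning line per problematic entry;
# prints nothing when every entry looks fine.
check_dictionary() {
  jq -rn --arg text "$1" --argjson dict "$2" '
    $dict[]
    | .original as $o
    | ($o | length) as $len            # length of a string = code points
    | if $len < 3 or $len > 8 then "\($o): \($len) characters, expected 3-8"
      else ($text | indices($o) | length) as $hits
        | if $hits == 0 then "\($o): not found in text"
          elif $hits > 1 then "\($o): matches \($hits) places in text"
          else empty end
      end'
}
```

Empty output means every original passed both checks; a single-character original is caught by the length rule.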

Step 3: Synthesize

The user must provide a cloned voice ID. If they don't have one, remind them that the dictionary parameter requires a cloned voice and suggest using the senseaudio-voice-cloner skill first.

curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.5",
    "text": "<TEXT>",
    "stream": false,
    "voice_setting": {
      "voice_id": "<CLONED_VOICE_ID>"
    },
    "audio_setting": {
      "format": "mp3"
    },
    "dictionary": <DICTIONARY_ARRAY>
  }' -o response.json

jq -r '.data.audio' response.json | xxd -r -p > output.mp3

Check base_resp.status_code == 0 before decoding.
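Pasting user text into the JSON by hand breaks on quotes or newlines. A sketch that builds the same request body with jq and decodes only after the status check; build_payload, synthesize, payload.json and response.new.json are hypothetical names, while the endpoint, fields, and decode command are the ones shown above. Writing to response.new.json first also keeps the last good response.json if a retry fails.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_payload TEXT VOICE_ID DICT_JSON -> the Step 3 request body.
# jq escapes TEXT, so quotes or newlines in user text cannot break the JSON.
build_payload() {
  jq -n --arg text "$1" --arg voice "$2" --argjson dict "$3" '{
    model: "SenseAudio-TTS-1.5",
    text: $text,
    stream: false,
    voice_setting: {voice_id: $voice},
    audio_setting: {format: "mp3"},
    dictionary: $dict
  }'
}

# synthesize TEXT VOICE_ID DICT_JSON OUT_MP3
# Promotes response.new.json to response.json only when status_code is 0.
synthesize() {
  if [ -z "$2" ]; then
    echo "the dictionary parameter needs a cloned voice ID" >&2
    return 1
  fi
  build_payload "$1" "$2" "$3" > payload.json
  curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
    -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @payload.json -o response.new.json
  if jq -e '.base_resp.status_code == 0' response.new.json > /dev/null; then
    mv response.new.json response.json
    jq -r '.data.audio' response.json | xxd -r -p > "$4"
  else
    jq '.base_resp' response.new.json >&2   # show the error for the user
    return 1
  fi
}
```

Usage, with a voice ID from the cloner skill: synthesize "$TEXT" "$VOICE_ID" "$DICT" output.mp3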

Step 4: Iterate

After the user listens, they may find additional mispronunciations. Update the dictionary array and re-synthesize. Keep a copy of the previous response.json (the curl command above overwrites it) until the new request succeeds.

Report the output file path, the duration in milliseconds (from jq '.extra_info.audio_length' response.json), the character count, and which dictionary entries were applied.
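The report can be assembled in one jq call. A sketch with a hypothetical report helper, assuming extra_info.audio_length is in milliseconds as stated above and counting characters as Unicode code points. It lists the entries that were sent; whether the API reports which ones actually matched is not stated on this page.

```shell
#!/usr/bin/env bash
set -euo pipefail

# report RESPONSE_JSON AUDIO_FILE TEXT DICT_JSON -> the Step 4 summary.
report() {
  jq -rn --slurpfile resp "$1" --arg file "$2" --arg text "$3" --argjson dict "$4" '
    "file: \($file)",
    "duration: \($resp[0].extra_info.audio_length) ms",
    "characters: \($text | length)",
    "dictionary entries applied:",
    ($dict[] | "  \(.original) -> \(.replacement)")'
}
```

Usage: report response.json output.mp3 "$TEXT" "$DICT"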

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-19 19:40 · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related

ai-intelligence

Proactive Agent

halthelobster
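Upgrades an AI agent from a task executor into an intelligent partner that proactively anticipates needs and keeps optimizing. Integrates the WAL protocol, a working buffer, autonomous scheduled tasks, and battle-tested patterns. A core component of Hal Stack 🦞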
★ 836 📥 213,118
ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
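Self-reflection + self-critique + self-learning + self-organizing memory. The agent evaluates its own work, finds mistakes, and keeps improving.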
★ 1,358 📥 318,341
ai-intelligence

ontology

oswalpalash
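A typed knowledge graph for structured agent memory and composable skills. Supports creating/querying entities (people, projects, tasks, events, documents) and relations...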
★ 712 📥 243,815