
ASR+TTS Voice Chat Configuration Assistant

Voice service configuration assistant. Triggers when: 1) a voice message is received; 2) user requests voice reply; 3) an audio attachment is received ([media attached: audio/...] format or audio file); 4) user requests speech recognition or transcription. When any voice-related need (inbound or outbound) is unmet, you MUST follow this SKILL in full to guide configuration — never just say 'not supported'.
Yum^2
Uncategorized · community · v1.0.7 · 8 versions · 99539.2 · Key: required
★ 0 stars · 📥 216 downloads · 💾 12 installs · #latest

Overview

Voice Service Configuration Assistant

⚠️ Installation Notes

This skill contains the following files. All must be deployed to the skills directory:

voice-process-helper/
├── SKILL.md
└── scripts/
    ├── check-tts.sh                ← TTS readiness check
    ├── check-asr.sh                ← ASR readiness check
    ├── install-tts.sh              ← TTS one-click install (param: tagged|always)
    ├── install-asr.sh              ← ASR one-click install (param: tiny|base)
    ├── restart-gateway.sh          ← Gateway restart
    └── edge-tts-universal/
        ├── index.js                ← TTS plugin code
        └── openclaw.plugin.json    ← TTS plugin manifest

When installing from a zip archive, the entire directory (including scripts/) must be extracted. Do not copy SKILL.md alone.


Route to the TTS (voice reply) or ASR (speech-to-text) flow based on user intent.

⚠️ Mandatory Rules

Enter the ASR flow immediately upon receiving any of the following:

  • Audio attachments in [media attached: ...] format (audio/ogg, audio/mp3, etc.)
  • Feishu voice message raw JSON {"file_key":"...","duration":...}, but only when the user's entire message is this JSON (no other context). If the JSON appears inside file content, SKILL.md text, code blocks, or quoted documentation, it is NOT a voice message; do not trigger ASR.

Processing rules:

  • ASR ready → the framework transcribes automatically; process the transcription result directly
  • ASR not ready → enter ASR configuration flow immediately; do not simply reply "I can't process voice"
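The two trigger rules above can be sketched as a single predicate. This is a hypothetical helper, not one of the skill's scripts; it assumes the attachment marker literally contains "[media attached: audio/" and that the Feishu JSON lists file_key before a numeric duration.

```bash
# Hypothetical helper: returns 0 when an inbound message should enter the ASR flow.
is_voice_trigger() {
  local msg="$1" trimmed re
  # Trigger 1: an audio attachment marker, e.g. "[media attached: audio/ogg ...]"
  if [[ "$msg" == *"[media attached: audio/"* ]]; then
    return 0
  fi
  # Trigger 2: the WHOLE message is a bare Feishu voice JSON object.
  # Only leading/trailing whitespace is stripped; any other text disqualifies it.
  trimmed="${msg#"${msg%%[![:space:]]*}"}"
  trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"
  # Assumed shape: file_key first, then a numeric duration
  re='^\{[[:space:]]*"file_key"[[:space:]]*:[[:space:]]*"[^"]+"[[:space:]]*,[[:space:]]*"duration"[[:space:]]*:[[:space:]]*[0-9]+(\.[0-9]+)?[[:space:]]*\}$'
  [[ "$trimmed" =~ $re ]]
}
```

Anchoring the regex to the whole trimmed message is what keeps JSON quoted inside documentation or code blocks from triggering ASR.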

Core Principles

Different solutions involve cost, privacy, and resource trade-offs — users must make informed decisions.

Always present options first, wait for explicit user selection, then execute.

⚠️ Restart Rules

After TTS and/or ASR installation, the Gateway must be restarted for changes to take effect.

Before restarting, tell the user:

> ⏳ Restarting the service; this takes about 1 minute. Please wait…

bash <skill_dir>/scripts/restart-gateway.sh

After restart:

> ✅ Service restarted! Please send /new to start a new session.

If installing both TTS and ASR, restart only once after all installations — do not restart in between.
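The "install everything, restart once" rule can be illustrated with a dry-run planner that only prints commands. The choice labels (tts-tagged, asr-base, …) are hypothetical; the script paths and arguments come from this SKILL.

```bash
# Hypothetical dry-run planner: prints install commands for the selected
# options, then exactly one Gateway restart at the end (never in between).
plan_voice_setup() {
  local skill_dir="$1"; shift
  local choice needs_restart=0
  for choice in "$@"; do
    case "$choice" in
      tts-tagged) echo "bash $skill_dir/scripts/install-tts.sh tagged"; needs_restart=1 ;;
      tts-always) echo "bash $skill_dir/scripts/install-tts.sh always"; needs_restart=1 ;;
      asr-base)   echo "bash $skill_dir/scripts/install-asr.sh base";   needs_restart=1 ;;
      asr-tiny)   echo "bash $skill_dir/scripts/install-asr.sh tiny";   needs_restart=1 ;;
      *) echo "unknown choice: $choice" >&2; return 1 ;;
    esac
  done
  # A single restart covers every install above
  if (( needs_restart )); then
    echo "bash $skill_dir/scripts/restart-gateway.sh"
  fi
}
```

For example, plan_voice_setup /opt/skill tts-tagged asr-base prints two install lines followed by one restart line.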


Part 1: TTS (Voice Reply)

Step 1 — Check Readiness

bash <skill_dir>/scripts/check-tts.sh

The script returns JSON with status:

  • ready → TTS is fully operational. Read the auto field and follow the tag rules below.
  • partial → Config exists but something is missing (binary, plugin, or plugins.allow). Run install-tts.sh to fix, then restart Gateway.
  • not_configured → Proceed to present options.
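A minimal routing sketch for the three statuses, assuming check-tts.sh prints JSON with a top-level "status" string and, when ready, an "auto" string on the same line as its key. It uses sed to avoid a jq dependency; json_field and next_tts_action are hypothetical names, and a real integration should prefer a proper JSON parser.

```bash
# Print the first string value for key $2 in JSON text $1 (naive, sed-based).
json_field() {
  printf '%s' "$1" | sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Map the check-tts.sh status to the next step described above.
next_tts_action() {
  local json="$1" status
  status="$(json_field "$json" status)"
  case "$status" in
    ready)          echo "ask: keep or reconfigure (auto=$(json_field "$json" auto))" ;;
    partial)        echo "run install-tts.sh, then restart Gateway" ;;
    not_configured) echo "present options" ;;
    *)              echo "unexpected status: ${status:-<missing>}" >&2; return 1 ;;
  esac
}
```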

If status is ready, inform the user and ask:

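> Current TTS provider: {provider}, auto mode: {auto}
>
> Would you like to:
> - Keep using it: keep the current configuration, and I'll reply to you by voice right away
> - Reconfigure: overwrite the current configuration and choose a TTS option again

If the user chooses "Keep using it" → follow the tag rules for the auto value (see "TTS Tag Format" below).

If the user chooses "Reconfigure" → proceed to present options.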

Step 2 — Present Options

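> A: edge-tts-universal (free, recommended)
> Automatically adapts to each channel's format with no extra configuration:
> - Feishu / Telegram / WhatsApp / Matrix → OGG (native voice bubble)
> - WeCom → AMR (native voice message)
> - QQbot → MP3 (native voice message)
> - Slack / others → MP3
>
> - A1: Smart voice replies (recommended); the AI decides which replies should be spoken
> - A2: Reply to every message by voice
>
> B: Tencent Cloud Text-to-Speech (paid, more natural voice quality)
> Supports multiple Chinese voices; new users get a free quota. Requires a Tencent Cloud SecretID / SecretKey.
>
> Please reply A1, A2, or B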
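The channel-to-format list in option A can be read as a simple lookup. This sketch is illustrative only: the channel identifiers (feishu, wecom, qqbot, …) are assumed names and may not match what the edge-tts-universal plugin uses internally.

```bash
# Illustrative lookup of the output format per channel (identifiers are assumptions).
tts_format_for_channel() {
  local ch
  ch="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"  # case-insensitive match
  case "$ch" in
    feishu|telegram|whatsapp|matrix) echo "ogg" ;;  # native voice bubble
    wecom)                           echo "amr" ;;  # native WeCom voice message
    qqbot)                           echo "mp3" ;;  # native QQ voice message
    *)                               echo "mp3" ;;  # Slack and everything else
  esac
}
```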

Step 3 — Install

User selects A1 or A2

> ⏳ Deploying the voice service…

bash <skill_dir>/scripts/install-tts.sh tagged   # A1
bash <skill_dir>/scripts/install-tts.sh always    # A2

The script installs edge-tts + ffmpeg → deploys plugin → writes config.

After installation, restart Gateway per "Restart Rules".
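Before or after running install-tts.sh, a quick pre-flight can show which required executables are missing. This is an assumption about a useful check, not the actual logic inside install-tts.sh; missing_bins is a hypothetical name.

```bash
# Print each named executable that is not on PATH, one per line.
missing_bins() {
  local bin missing=()
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || missing+=("$bin")
  done
  # Print nothing when everything is present
  if (( ${#missing[@]} )); then
    printf '%s\n' "${missing[@]}"
  fi
}
```

Calling missing_bins edge-tts ffmpeg after installation should print nothing; any name it prints points to the step that failed.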

User selects B — Tencent Cloud TTS

> ⏳ Installing the Tencent Cloud Text-to-Speech plugin…

skillhub install tencentcloud-tts

Prompt user for Tencent Cloud SecretID / SecretKey, then:

openclaw config set skills.entries.tencentcloud-tts.env --strict-json '{"secret_id":"<ID>","secret_key":"<KEY>"}'

> ✅ Tencent Cloud Text-to-Speech is configured! Ask me to reply by voice to try it out.
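Credentials typed by a user can contain characters that break the --strict-json payload. A minimal sketch, assuming only backslashes and double quotes need escaping (control characters are not handled); tencent_env_json is a hypothetical helper.

```bash
# Escape backslashes first, then double quotes, for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the env payload expected by the config command above.
tencent_env_json() {
  printf '{"secret_id":"%s","secret_key":"%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}
```

Usage: openclaw config set skills.entries.tencentcloud-tts.env --strict-json "$(tencent_env_json "$SECRET_ID" "$SECRET_KEY")". The same payload works for skills.entries.tencentcloud-asr.env.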


TTS Tag Format

auto = always: Output plain text only. No [[tts: tags are allowed. The framework converts the full text to speech automatically; adding tags causes the raw tag text to leak into the reply.

auto = tagged: Wrap content to be spoken with [[tts:text]]...[[/tts:text]].

[[tts:text]]content to speak[[/tts:text]]
Followed by normal text.

> ⚠️ Only [[tts:text]]...[[/tts:text]] is recognized. [[tts]]...[[/tts]] is wrong — never omit :text.

❌ Wrong formats: [[tts:…]], [[tts]]content[[/tts]], [[tts:voice=xxx]], mismatched tags.
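The tag rules can be checked before a reply is sent. This is a hypothetical pre-send validator, not part of the skill: $1 is the auto mode, $2 the reply text. In tagged mode it consumes each [[tts:text]]...[[/tts:text]] pair and rejects any leftover or nested tts tag; in always mode it rejects every tts tag.

```bash
check_tts_tags() {
  local mode="$1" text="$2"
  local open='[[tts:text]]' close='[[/tts:text]]' rest prefix body
  case "$mode" in
    always)
      # No tts tag of any kind may appear; the framework speaks the whole reply
      [[ "$text" != *"[[tts"* && "$text" != *"[[/tts"* ]]
      return ;;
    tagged) ;;
    *) return 2 ;;  # unknown auto mode
  esac
  rest="$text"
  while [[ "$rest" == *"$open"* ]]; do
    prefix="${rest%%"$open"*}"   # text before the next opening tag
    [[ "$prefix" != *"[[tts"* && "$prefix" != *"[[/tts"* ]] || return 1
    rest="${rest#*"$open"}"       # drop up to and including the opening tag
    [[ "$rest" == *"$close"* ]] || return 1   # unclosed tag
    body="${rest%%"$close"*}"     # content to be spoken
    [[ -n "$body" && "$body" != *"[[tts"* && "$body" != *"[[/tts"* ]] || return 1
    rest="${rest#*"$close"}"      # continue after the closing tag
  done
  # Leftovers such as [[tts]], [[tts:voice=xxx]] or a stray close are errors
  [[ "$rest" != *"[[tts"* && "$rest" != *"[[/tts"* ]]
}
```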

Language-Voice Mapping

| Language | lang | voice |
|---|---|---|
| Chinese | zh-CN | zh-CN-XiaoxiaoNeural |
| English | en-US | en-US-AriaNeural |
| Japanese | ja-JP | ja-JP-NanamiNeural |
| Korean | ko-KR | ko-KR-SunHiNeural |
| French | fr-FR | fr-FR-DeniseNeural |
| German | de-DE | de-DE-KatjaNeural |
| Spanish | es-ES | es-ES-ElviraNeural |
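The table can be expressed as a lookup that accepts either the lang code or the language name. This is a minimal sketch; voice_for_lang is a hypothetical name, and unknown languages return an error rather than guessing a fallback voice.

```bash
voice_for_lang() {
  case "$1" in
    zh-CN|Chinese)  echo "zh-CN-XiaoxiaoNeural" ;;
    en-US|English)  echo "en-US-AriaNeural" ;;
    ja-JP|Japanese) echo "ja-JP-NanamiNeural" ;;
    ko-KR|Korean)   echo "ko-KR-SunHiNeural" ;;
    fr-FR|French)   echo "fr-FR-DeniseNeural" ;;
    de-DE|German)   echo "de-DE-KatjaNeural" ;;
    es-ES|Spanish)  echo "es-ES-ElviraNeural" ;;
    *) echo "no voice mapped for: $1" >&2; return 1 ;;
  esac
}
```

With the edge-tts CLI, for example: edge-tts --voice "$(voice_for_lang ja-JP)" --text "こんにちは" --write-media hello.mp3.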

Part 2: ASR (Speech Recognition)

Trigger Conditions

  • User explicitly requests transcription ("帮我转文字" / "transcribe this for me", "识别语音" / "recognize this audio", etc.)
  • User sends a voice message or audio attachment — receiving audio content is itself a trigger

Step 1 — Check Readiness

bash <skill_dir>/scripts/check-asr.sh

The script returns JSON with status:

  • ready → ASR is fully operational (config_enabled + config_model + whisper_installed all OK). Inform the user and end the flow.
  • partial → Something is missing. Show the user which parts are incomplete and run install-asr.sh to fix, then restart Gateway.
  • not_configured → Proceed to present options.
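To show the user which parts are incomplete on partial, the three fields can be checked individually. This sketch assumes check-asr.sh emits config_enabled, config_model and whisper_installed as JSON keys whose "OK" value is either true or a non-empty string; the real output format may differ.

```bash
# Print each ASR readiness field that is not OK, one per line (hypothetical helper).
asr_missing_parts() {
  local json="$1" key
  for key in config_enabled config_model whisper_installed; do
    if ! printf '%s' "$json" | grep -Eq "\"$key\"[[:space:]]*:[[:space:]]*(true|\"[^\"]+\")"; then
      echo "$key"
    fi
  done
}
```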

Step 2 — Probe Machine Resources

echo "=== CPU ===" && nproc && echo "=== MEM ===" && free -h | head -2 && echo "=== DISK ===" && df -h / | tail -1 && echo "=== GPU ===" && (nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || echo "no GPU")
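The probe output still has to be condensed into the one-line summary used in the next step. A sketch under the assumption of a Linux host; machine_summary is a hypothetical helper, and missing tools degrade to "unknown" or "no GPU" instead of failing.

```bash
machine_summary() {
  local cpu mem disk gpu
  cpu="$(nproc 2>/dev/null || echo '?')"
  mem="$(free -h 2>/dev/null | awk '/^Mem:/ {print $2}')"           # total RAM
  disk="$(df -hP / 2>/dev/null | awk 'NR==2 {print $4}')"           # free space on /
  gpu="$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null | head -n 1)"
  printf '%s cores / %s RAM / %s free disk / %s\n' \
    "$cpu" "${mem:-unknown}" "${disk:-unknown}" "${gpu:-no GPU}"
}
```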

Step 3 — Present Options (Wait for User Selection)

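> Current machine: [CPU] cores / [RAM] / [free disk] / [GPU info]
>
> A: Whisper speech recognition (open-source, free, runs locally)
> Fully offline and privacy-preserving. Uses a CPU-optimized install of about 400MB.
> Supports 99 languages. Installation takes about 5 to 10 minutes.
>
> B: Tencent Cloud ASR (paid, commercial grade)
> Supports Mandarin, Cantonese, English, Japanese, and more. Three modes: single sentence (≤60s), flash (≤2h), and long audio (≤5h).
> Installation takes about 1 to 2 minutes. Tencent Cloud credentials must be configured after installation; new users get a free quota, which can be claimed in the Tencent Cloud ASR console.
>
> Please reply A or B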
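The three Tencent Cloud ASR modes differ only by maximum audio length, so choosing one from a known duration is a threshold check. A sketch with hypothetical mode labels; picking the smallest mode that fits is a design choice, not something this SKILL prescribes.

```bash
# $1 = audio duration in whole seconds
tencent_asr_mode() {
  local secs="$1"
  [[ "$secs" =~ ^[0-9]+$ ]] || return 2   # expects whole seconds
  secs=$((10#$secs))                       # avoid octal parsing of leading zeros
  if (( secs <= 60 )); then
    echo sentence      # single-sentence recognition, up to 60 s
  elif (( secs <= 7200 )); then
    echo flash         # flash recognition, up to 2 h
  elif (( secs <= 18000 )); then
    echo long-audio    # long-audio recognition, up to 5 h
  else
    echo "audio longer than 5 h is not covered by any mode" >&2
    return 1
  fi
}
```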

Step 4 — Install

User selects A — Whisper

> ⏳ Installing Whisper speech recognition (CPU-optimized, about 400MB); this takes about 5 to 10 minutes…

Model selection: default base (~140MB) for better accuracy; use tiny (~75MB) only when disk space is critically limited.

bash <skill_dir>/scripts/install-asr.sh base

After installation, restart Gateway per "Restart Rules".
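The "tiny only when disk space is critically limited" rule needs a concrete cut-off in practice. This sketch uses a hypothetical 1024 MB threshold, chosen to leave room for the ~400MB CPU install plus the ~140MB base model; it is not a value from the skill.

```bash
# Print "base" or "tiny"; $1 = free MB (optional, otherwise measured on /).
pick_whisper_model() {
  local avail_mb="${1:-}"
  if [[ -z "$avail_mb" ]]; then
    # Free space on / in MB; -P keeps each filesystem on one output line
    avail_mb="$(df -Pm / 2>/dev/null | awk 'NR==2 {print $4}')"
  fi
  avail_mb="${avail_mb:-0}"          # unreadable value → treat as no space
  if (( avail_mb >= 1024 )); then    # hypothetical threshold
    echo base
  else
    echo tiny
  fi
}
```

The result can be passed straight to the installer: bash <skill_dir>/scripts/install-asr.sh "$(pick_whisper_model)".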

User selects B — Tencent Cloud ASR

> ⏳ Installing the Tencent Cloud ASR plugin…

skillhub install tencentcloud-asr

Prompt user for Tencent Cloud SecretID / SecretKey, then:

openclaw config set skills.entries.tencentcloud-asr.env --strict-json '{"secret_id":"<ID>","secret_key":"<KEY>"}'

> ✅ Tencent Cloud ASR is configured! Send a voice message or audio file and I'll transcribe it for you.

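Version History

6 versions in total

  • v1.0.7 Switched the ASR model to base for better recognition accuracy; TTS now filters special characters that are unsuitable for reading aloud; revised some copy; current
    2026-04-10 21:03 Safe · Safe
  • v1.0.6 Unified the skill name; updated the README
    2026-04-10 15:11 Safe
  • v1.0.5 Added support for native voice bubbles on multiple channels
    2026-04-10 14:22 Safe
  • v1.0.4 Initial release
    2026-04-03 16:36 Safe
  • v1.0.3 Initial release
    2026-04-03 09:33 Safe · Safe
  • v1.0.2 Initial release
    2026-04-02 16:50 Safe · Safe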

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
