Deepgram Voice Workflow

End-to-end voice workflow with Deepgram STT and TTS. Use when transcribing voice messages, generating spoken replies, or building a shell-based audio pipelin...
mengbad
AI Intelligence · clawhub · v0.1.0 · 1 version · 100000 · Key: required

Overview

Use this skill for a complete speech workflow:

  1. transcribe audio to text with Deepgram STT
  2. optionally synthesize a spoken reply with Deepgram TTS
  3. return structured outputs that can feed chat or agent pipelines

This skill is the right choice when the task is broader than plain transcription and needs an input-audio to output-audio pipeline.

Quick Start

Transcribe only

{baseDir}/scripts/deepgram-transcribe.sh /path/to/audio.ogg

Generate speech from text

{baseDir}/scripts/deepgram-tts.sh "Hello, I am Neko."

Run the full pipeline

{baseDir}/scripts/neko-voice-pipeline.sh /path/to/audio.ogg --reply "Got it, this is a voice reply test."

Environment

Set DEEPGRAM_API_KEY before use.

The bundled scripts also fall back to reading it from:

  • /root/.openclaw/.env
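
The lookup order described above (environment variable first, then the env file) can be sketched as follows. This is a minimal illustration, not the bundled scripts' actual code: the function name resolve_deepgram_key is hypothetical, and the sketch assumes the env file contains a line of the form DEEPGRAM_API_KEY=value.

```shell
#!/usr/bin/env bash
# Sketch: resolve the Deepgram key from the environment, else from an env file.
# Assumption: the env file holds a line like DEEPGRAM_API_KEY=value
# (optionally quoted, optionally prefixed with "export ").
resolve_deepgram_key() {
  local env_file="${1:-/root/.openclaw/.env}"
  local line value

  # A non-empty environment variable takes precedence.
  if [ -n "${DEEPGRAM_API_KEY:-}" ]; then
    printf '%s\n' "$DEEPGRAM_API_KEY"
    return 0
  fi

  if [ -r "$env_file" ]; then
    # Use the last matching assignment in the file.
    line=$(grep -E '^(export[[:space:]]+)?DEEPGRAM_API_KEY=' "$env_file" | tail -n 1) || true
    value=${line#*=}
    # Strip one pair of surrounding double or single quotes, if present.
    value=${value%\"}; value=${value#\"}
    value=${value%\'}; value=${value#\'}
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  fi

  echo "DEEPGRAM_API_KEY is not set and was not found in $env_file" >&2
  return 1
}
```

A caller could then run `DEEPGRAM_API_KEY=$(resolve_deepgram_key) || exit 1` before invoking any request, so a missing key fails early with a readable message.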

Workflow Decision

Use deepgram-transcribe.sh when

  • only text transcription is needed
  • the downstream system will generate its own reply
  • the task is speech-to-text only

Use deepgram-tts.sh when

  • text already exists
  • only an MP3 spoken response is needed
  • the workflow is text-to-speech only

Use neko-voice-pipeline.sh when

  • the task begins with an audio file
  • a transcript is needed
  • an optional spoken reply should be generated in the same flow
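
One way to encode the decision rules above in a wrapper is a small dispatcher. The function choose_script and its arguments are hypothetical, and the mapping is only one reading of the guidance: since the spoken reply is optional, the pipeline script could also be used for audio input without a reply.

```shell
#!/usr/bin/env bash
# Sketch: pick one of the three bundled scripts from the task shape.
#   $1 = input kind: "audio" or "text"
#   $2 = whether a spoken reply is wanted: "yes" or "no" (default "no")
choose_script() {
  local input_kind="$1" want_reply="${2:-no}"
  case "$input_kind:$want_reply" in
    text:*)    echo "deepgram-tts.sh" ;;         # text already exists, TTS only
    audio:yes) echo "neko-voice-pipeline.sh" ;;  # transcript plus spoken reply
    audio:no)  echo "deepgram-transcribe.sh" ;;  # speech-to-text only
    *)
      echo "usage: choose_script audio|text [yes|no]" >&2
      return 2
      ;;
  esac
}
```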

Outputs

STT output

deepgram-transcribe.sh writes:

  • transcript text file
  • raw API JSON file next to it

TTS output

deepgram-tts.sh writes:

  • MP3 output file

Pipeline output

neko-voice-pipeline.sh prints JSON with:

  • out_dir
  • transcript_path
  • transcript
  • reply_audio_path

Because the result is printed as JSON, scripts and adapters can parse it directly.
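
A calling script might read individual fields from that JSON as sketched below. The helper json_get is hypothetical; it uses jq when available and falls back to python3. The sample object only mirrors the four field names listed above, its values are made up, and representing a missing reply as null is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: read one top-level field from a JSON object on stdin.
# Prints nothing when the field is missing or null.
json_get() {
  local key="$1"
  if command -v jq >/dev/null 2>&1; then
    jq -r --arg k "$key" '.[$k] // empty'
  else
    python3 -c 'import json, sys
v = json.load(sys.stdin).get(sys.argv[1])
print("" if v is None else v)' "$key"
  fi
}

# Made-up sample using the documented field names (values are illustrative).
sample='{"out_dir":"/tmp/voice-out","transcript_path":"/tmp/voice-out/transcript.txt","transcript":"hello","reply_audio_path":null}'

transcript=$(printf '%s' "$sample" | json_get transcript)
reply_audio=$(printf '%s' "$sample" | json_get reply_audio_path)

echo "transcript: $transcript"
if [ -n "$reply_audio" ]; then
  echo "reply audio: $reply_audio"
else
  echo "no reply audio was generated"
fi
```

In real use, `sample` would be replaced by the captured output of neko-voice-pipeline.sh.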

Typical Uses

Prefer this skill for:

  • transcribing Telegram/QQ/OneBot voice messages
  • generating MP3 replies to short voice prompts
  • building bot-side voice input/output automation
  • testing speech pipelines from shell without introducing a full SDK

Notes

  • Defaults are tuned for lightweight practical use, not maximal configurability.
  • deepgram-transcribe.sh defaults to model=nova-2 and language=zh.
  • deepgram-tts.sh defaults to model=aura-2-luna-en; override the model when a different voice is preferred.
  • Inspect the raw JSON transcript response when debugging recognition quality or API errors.
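
For orientation, the defaults named in these notes correspond to query parameters on Deepgram's REST endpoints. The sketch below builds those URLs and shows a direct transcription call with curl. The function names are hypothetical, the bundled scripts may be implemented differently, and the audio/ogg content type is an assumption about the input file.

```shell
#!/usr/bin/env bash
# Sketch: request URLs for the documented defaults.
deepgram_listen_url() {
  local model="${1:-nova-2}" language="${2:-zh}"
  printf 'https://api.deepgram.com/v1/listen?model=%s&language=%s\n' "$model" "$language"
}

deepgram_speak_url() {
  local model="${1:-aura-2-luna-en}"
  printf 'https://api.deepgram.com/v1/speak?model=%s\n' "$model"
}

# Direct STT call; needs network access and a valid DEEPGRAM_API_KEY.
# Prints the raw API JSON, where the transcript is expected under
# results.channels[0].alternatives[0].transcript.
transcribe_raw() {
  local audio="$1" model="${2:-nova-2}" language="${3:-zh}"
  curl -sS -X POST "$(deepgram_listen_url "$model" "$language")" \
    -H "Authorization: Token ${DEEPGRAM_API_KEY:?DEEPGRAM_API_KEY must be set}" \
    -H "Content-Type: audio/ogg" \
    --data-binary @"$audio"
}
```

Calling `transcribe_raw /path/to/audio.ogg > raw.json` and reading raw.json is one way to compare the script's saved JSON against an unwrapped API response when debugging.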

References

Read these files when needed:

  • references/stt-notes.md for transcription details
  • references/tts-notes.md for speech synthesis details
  • references/pipeline-notes.md for end-to-end pipeline behavior

Version History

1 version in total

  • v0.1.0 (current)
    2026-03-31 19:17 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
