← 返回
未分类

speech-translation

Build, adapt, or run an audio-processing workflow that takes spoken audio, transcribes it with Whisper or faster-whisper, translates the transcript using the...
构建、适配或运行音频处理工作流,接收口语音频,使用 Whisper 或 faster‑whisper 转录,然后使用 ... 对转录文本进行翻译。
decin decin 来源
未分类 clawhub v1.0.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 486
下载
💾 1
安装
1
版本
#latest

概述

Voice Translate

Use this skill for two closely related modes:

  1. Chat-native mode: the user sends audio or a voice note in OpenClaw; return transcript text, translation text, and translated audio.
  2. Local pipeline mode: run a deterministic file-based pipeline that writes transcript, translation, wav, and metadata artifacts.

Default to an LLM-assisted translation workflow: let the current agent produce the translation, save it to a file when using the local pipeline, or use the surrounding agent turn directly when responding in chat.

Workflow

A. Chat-native mode

Use this when an inbound message already contains an audio transcript from OpenClaw media understanding, or when the user asks you to process a voice message conversationally.

  1. Detect that the user sent audio or that the request is for voice translation.
  2. Obtain or confirm the transcript text.
  3. Translate with the current model.
  4. Send the transcript text to the user.
  5. Send the translated text to the user.
  6. Synthesize the translated text as audio:
    • prefer the OpenClaw tts tool when you need an immediate chat reply with audio
    • prefer Piper when you need a local wav artifact
  7. Keep the output order stable: transcript first, translation second, audio last.

B. Local pipeline mode

  1. Confirm input/output expectations: source language, target language, output directory, and whether the run should be real or mock.
  2. Choose backends:
    • faster-whisper for real transcription, mock for pipeline testing.
    • llm as the default translation path when an agent/model is available.
    • service only when unattended HTTP translation is preferable.
    • manual only as a fallback.
    • piper for real TTS, mock for dry-run testing.
  3. Run transcription.
  4. If using the default llm path, read the transcript and translate it with the current model. Save the translated text to a file.
  5. Run synthesis/output writing with --translation-file.
  6. Inspect outputs:
    • 01_transcript.txt
    • 02_translation.txt
    • 03_translation.wav
    • result.json
  7. If the user wants chat updates during processing, pass notifier commands with --transcript-command, --translation-command, and --audio-command.

Preferred execution patterns

Default LLM-assisted path

Use this when the agent handling the task can translate the transcript itself.

  1. Run the pipeline once transcription is available, or run the full command after preparing translation.txt.
  2. Save the model-produced translation to a file.
  3. Invoke:
bash scripts/run_voice_translate_llm.sh \
  /path/to/input.m4a \
  ./outputs/llm-run \
  zh \
  en \
  /path/to/en_US-lessac-medium.onnx \
  ./translation.txt \
  --whisper-model small \
  --transcribe-backend faster-whisper \
  --tts-backend piper

Read references/llm-translation-pattern.md when you need the exact orchestration pattern or a reusable translation prompt.

Mock end-to-end validation

Use this first when you need to validate the pipeline structure without model/runtime dependencies.

python3 scripts/run_voice_translate.py \
  --input references/examples/mock-input.txt \
  --output-dir ./outputs/mock-run \
  --source-lang zh \
  --target-lang en \
  --transcribe-backend mock \
  --translation-file ./translated.txt \
  --translation-backend llm \
  --no-interactive-translate \
  --tts-backend mock \
  --piper-model ./dummy.onnx

Notes:

  • mock transcription reads plain text from the input file.
  • mock TTS writes a silent wav file.
  • --piper-model is still required by the current CLI shape even when using mock TTS; use any placeholder path.
  • llm mode currently means the translation must already exist in --translation-file.

Service fallback

python3 scripts/run_voice_translate.py \
  --input /path/to/input.m4a \
  --output-dir ./outputs/service-run \
  --source-lang zh \
  --target-lang en \
  --whisper-model small \
  --transcribe-backend faster-whisper \
  --translation-backend service \
  --translation-service-url http://127.0.0.1:8000/translate \
  --tts-backend piper \
  --piper-model /path/to/en_US-lessac-medium.onnx

Resources

scripts/

  • run_voice_translate.py: primary entrypoint.
  • run_voice_translate_llm.sh: thin wrapper for the default LLM-assisted path.
  • voice_translate_app/: pipeline modules.
  • send_text.py: wrap stage text and forward it via a shell command.
  • send_audio.py: forward generated audio via a shell command.
  • mock_text_sender.py, mock_audio_sender.py: local smoke-test helpers.

references/

  • Read references/runtime-notes.md for dependency/setup details, backend behavior, and integration constraints.
  • Read references/llm-translation-pattern.md when the surrounding agent should perform translation with its own model.
  • Read references/openclaw-chat-mode.md when implementing or following the conversational flow: receive voice, output transcript text, output translation text, then output translated audio.

Editing guidance

  • Keep SKILL.md procedural and short.
  • Put environment- or backend-specific detail in references.
  • Treat llm as the preferred translation path for agent-driven workflows.
  • In chat-native mode, preserve the user-visible ordering: transcript text, translation text, then audio.
  • Prefer OpenClaw tts for immediate conversational audio replies; prefer Piper for local wav artifacts and offline pipelines.
  • If the user wants tighter OpenClaw integration, add an attachment-aware outer workflow or hook instead of rewriting ASR/TTS first.
  • Preserve the current file contract unless the user asks to change it: transcript, translation, wav, metadata JSON.

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-05-03 06:07 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

design-media

Nano Banana Pro

steipete
使用 Nano Banana Pro (Gemini 3 Pro Image) 生成或编辑图像。支持文生图、图生图及 1K/2K/4K 分辨率,适用于图像创建、修改及编辑请求,使用 --input-image 指定输入图像。
★ 426 📥 116,378
design-media

UI/UX Pro Max

xobi667
提供 UI/UX 设计智能与实现指导,帮助打造精美界面。适用于 UI 设计、UX 流程、信息架构、视觉风格、设计系统/标记、组件规格、文案/微文案、无障碍及前端 UI(HTML/CSS/JS、React、Next.js、Vue、Svelte
★ 216 📥 46,779
design-media

Openai Whisper

steipete
使用 Whisper CLI 进行本地语音转文字(无需 API 密钥)
★ 330 📥 93,172