
MAI Transcribe

Transcribe audio with Microsoft's MAI-Transcribe-1 model via Azure AI Speech.
Author: robotsbuildrobots
Uncategorized · Source: clawhub · v0.1.1 (1 version) · API key: required

Overview

MAI-Transcribe-1

Transcribe an audio file via Azure AI Speech using Microsoft's MAI-Transcribe-1 model.

Quick start

node {baseDir}/scripts/transcribe.js /path/to/audio.m4a

Defaults:

  • Model: mai-transcribe-1
  • Output: .txt
  • API version: 2025-10-15

Useful flags

node {baseDir}/scripts/transcribe.js /path/to/audio.ogg --out /tmp/transcript.txt
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a --language en-GB
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a --json --out /tmp/transcript.json
node {baseDir}/scripts/transcribe.js /path/to/audio.wav --model mai-transcribe-1
node {baseDir}/scripts/transcribe.js --help

Required env vars

export AZURE_SPEECH_ENDPOINT="https://YOUR-RESOURCE.cognitiveservices.azure.com"
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"

How to get the API key

  1. Go to the Azure portal and open your Speech or Foundry Speech resource.
  2. Open Keys and Endpoint.
  3. Copy:
    • the resource endpoint, for example https://your-resource.cognitiveservices.azure.com
    • one of the resource keys
  4. Export them:
export AZURE_SPEECH_ENDPOINT="https://YOUR-RESOURCE.cognitiveservices.azure.com"
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"

If values get mixed up while copying, the key point is that this skill expects the Speech resource endpoint, not a generic Foundry project URL.

Optional:

export AZURE_SPEECH_API_VERSION="2025-10-15"
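
The environment variables above could be read and validated roughly as follows. This is a minimal sketch, not the code in scripts/transcribe.js: the function name resolveConfig and the matchesExampleHost flag are hypothetical. The host check is only a heuristic based on the example hostname shown above.

```javascript
// Sketch only: resolveConfig is a hypothetical helper, not part of the skill.
function resolveConfig(env = process.env) {
  // Trim whitespace and drop trailing slashes so URL joining stays predictable.
  const endpoint = (env.AZURE_SPEECH_ENDPOINT || "").trim().replace(/\/+$/, "");
  const key = (env.AZURE_SPEECH_KEY || "").trim();
  // Fall back to the documented default API version when the optional var is unset.
  const apiVersion = (env.AZURE_SPEECH_API_VERSION || "").trim() || "2025-10-15";

  const missing = [];
  if (!endpoint) missing.push("AZURE_SPEECH_ENDPOINT");
  if (!key) missing.push("AZURE_SPEECH_KEY");
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }

  let hostname;
  try {
    hostname = new URL(endpoint).hostname;
  } catch {
    throw new Error("AZURE_SPEECH_ENDPOINT is not a valid URL");
  }

  // Heuristic (assumption): the example endpoint above ends in this suffix.
  const matchesExampleHost = hostname.endsWith(".cognitiveservices.azure.com");

  return { endpoint, key, apiVersion, matchesExampleHost };
}
```

A caller might print a warning when matchesExampleHost is false, since that can be a sign that a Foundry project URL was pasted instead of the Speech resource endpoint.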

API shape

The script calls:

POST {AZURE_SPEECH_ENDPOINT}/speechtotext/transcriptions:transcribe?api-version=2025-10-15

Headers:

  • Ocp-Apim-Subscription-Key: {AZURE_SPEECH_KEY}

Multipart form fields:

  • audio
  • definition

Example definition payload:

{
  "enhancedMode": {
    "enabled": true,
    "model": "mai-transcribe-1"
  }
}
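
The request described above can be sketched in Node 18+ using the built-in fetch, FormData, and Blob. This illustrates the documented shape only and is not the actual contents of scripts/transcribe.js: buildTranscribeRequest and transcribeFile are hypothetical names, and the error handling is an assumption.

```javascript
// Sketch only: builds the documented multipart request without sending it.
function buildTranscribeRequest({
  endpoint,
  key,
  audioBytes,
  filename,
  model = "mai-transcribe-1",
  apiVersion = "2025-10-15",
}) {
  const base = endpoint.replace(/\/+$/, "");
  const url =
    `${base}/speechtotext/transcriptions:transcribe` +
    `?api-version=${encodeURIComponent(apiVersion)}`;

  const form = new FormData();
  // "audio" carries the file bytes; "definition" carries the JSON payload.
  form.append("audio", new Blob([audioBytes]), filename);
  form.append(
    "definition",
    JSON.stringify({ enhancedMode: { enabled: true, model } })
  );

  return {
    url,
    init: {
      method: "POST",
      headers: { "Ocp-Apim-Subscription-Key": key },
      body: form, // fetch sets the multipart Content-Type boundary itself
    },
  };
}

// Hypothetical caller: reads a file and posts it using the env vars above.
async function transcribeFile(path) {
  const { readFile } = await import("node:fs/promises");
  const { basename } = await import("node:path");
  const { url, init } = buildTranscribeRequest({
    endpoint: process.env.AZURE_SPEECH_ENDPOINT,
    key: process.env.AZURE_SPEECH_KEY,
    audioBytes: await readFile(path),
    filename: basename(path),
  });
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Azure Speech returned HTTP ${res.status}: ${await res.text()}`);
  }
  return res.json();
}
```

Keeping request construction separate from the network call makes the URL, header, and form fields easy to inspect before any audio leaves the machine.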

Notes

  • This is the same style of skill as the Whisper one: a small documented script wrapper, not a built-in OpenClaw media pipeline.
  • Tested successfully against a live Azure Speech resource.
  • --json writes the raw Azure response for debugging or downstream processing.
  • Audio is uploaded to Microsoft for processing.
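
For downstream processing of the --json output, a small helper along these lines may be enough. It assumes the raw response follows the Azure fast transcription shape, with a combinedPhrases array and a phrases array whose entries each carry a text field. That shape is not confirmed by this page for MAI-Transcribe-1, so check it against a real response first. The names extractText and readTranscript are hypothetical.

```javascript
// Sketch only: assumes { combinedPhrases: [{ text }], phrases: [{ text }] }.
function extractText(response) {
  const combined = Array.isArray(response?.combinedPhrases)
    ? response.combinedPhrases
    : [];
  const phrases = Array.isArray(response?.phrases) ? response.phrases : [];

  // Prefer the combined transcript; fall back to individual phrases.
  const source = combined.length > 0 ? combined : phrases;

  return source
    .map((entry) => entry.text)
    .filter((text) => typeof text === "string" && text.length > 0)
    .join("\n");
}

// Hypothetical usage against a file written with --json --out.
async function readTranscript(jsonPath) {
  const { readFile } = await import("node:fs/promises");
  const raw = JSON.parse(await readFile(jsonPath, "utf8"));
  return extractText(raw);
}
```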

Version history

1 version in total

  • v0.1.1 (current)
    2026-05-07 15:43 · Security: safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risks found

Tencent Cloud Security (Sanbu)

Safe, no risks found
