Overview

mlx-whisper — Local Voice Transcription for Apple Silicon

Enables automatic transcription of voice notes in OpenClaw using Apple's MLX framework.

No API key required. Works fully offline once the model has been downloaded. ~60× faster than standard Whisper on Apple Silicon (M1/M2/M3/M4).

How it works

1. The user sends a voice note (Telegram .ogg / WhatsApp .opus).
2. OpenClaw downloads the audio file.
3. OpenClaw passes the file path to mlx-whisper-transcribe.sh via {{MediaPath}}.
4. The transcript is injected as the message body.
5. The agent replies to the transcribed text.
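A minimal sketch of the CLI step in this flow, under the assumption (not confirmed by this page) that a "cli" model is run as a subprocess with {{MediaPath}} replaced by the downloaded file's path, and that the command's standard output becomes the transcript. expand_args and run_cli_model are hypothetical names, not OpenClaw APIs.

```python
from __future__ import annotations

import subprocess

PLACEHOLDER = "{{MediaPath}}"


def expand_args(args: list[str], media_path: str) -> list[str]:
    """Replace the {{MediaPath}} placeholder in each argument with the audio file path."""
    return [arg.replace(PLACEHOLDER, media_path) for arg in args]


def run_cli_model(command: str, args: list[str], media_path: str, timeout_seconds: float) -> str:
    """Hypothetical: run a CLI transcriber and return its trimmed stdout as the transcript."""
    result = subprocess.run(
        [command, *expand_args(args, media_path)],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired if exceeded
        check=True,               # raises subprocess.CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

In this sketch, timeout_seconds plays the role of timeoutSeconds in the Step 3 configuration.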

Setup

Step 1 — Install mlx-whisper

pip3 install mlx-whisper

Verify:

python3 -c "import mlx_whisper; print('OK')"
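Besides the import check, MLX only runs on Apple Silicon. The hypothetical check_environment helper below (not part of mlx-whisper) reports both conditions; a Python running under Rosetta reports x86_64 as its machine type and is flagged too.

```python
from __future__ import annotations

import importlib.util
import platform


def is_apple_silicon(system: str, machine: str) -> bool:
    """MLX targets macOS ("Darwin") on an arm64 CPU."""
    return system == "Darwin" and machine == "arm64"


def check_environment() -> list[str]:
    """Return a list of problems that would stop mlx-whisper from running here."""
    problems = []
    if not is_apple_silicon(platform.system(), platform.machine()):
        problems.append("not running on Apple Silicon (macOS arm64)")
    if importlib.util.find_spec("mlx_whisper") is None:
        problems.append("mlx_whisper is not importable; run: pip3 install mlx-whisper")
    return problems
```

Usage: print(check_environment() or "OK").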

Step 2 — Install the wrapper script

Find the Python user base directory (scripts go in its bin subdirectory):

python3 -m site --user-base
# e.g. /Users/<you>/Library/Python/3.9

Copy bin/mlx-whisper-transcribe.sh from this skill to <user-base>/bin/mlx-whisper-transcribe.sh, then make it executable:

PYBIN=$(python3 -m site --user-base)/bin
cp {baseDir}/bin/mlx-whisper-transcribe.sh "$PYBIN/mlx-whisper-transcribe.sh"
chmod +x "$PYBIN/mlx-whisper-transcribe.sh"

Test it:

"$PYBIN/mlx-whisper-transcribe.sh" /path/to/audio.ogg
# The first run downloads the model (~465MB); later runs reuse the cached copy.
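The wrapper script's contents are not shown on this page. As a rough sketch of its core, assuming it calls the mlx_whisper Python package directly: mlx_whisper.transcribe() takes the audio path plus a path_or_hf_repo model name and returns a dict whose "text" entry holds the transcript. extract_text and transcribe_file are hypothetical helpers.

```python
from __future__ import annotations

DEFAULT_MODEL = "mlx-community/whisper-small-mlx"  # default named in the Models section


def extract_text(result: dict) -> str:
    """Pull the transcript out of a transcribe() result; empty string if there is none."""
    return (result.get("text") or "").strip()


def transcribe_file(audio_path: str, model: str = DEFAULT_MODEL) -> str:
    # Imported lazily so this module still loads where mlx-whisper is not installed.
    import mlx_whisper

    # The first call for a given model downloads it from Hugging Face and caches it.
    result = mlx_whisper.transcribe(audio_path, path_or_hf_repo=model)
    return extract_text(result)
```

For example, print(transcribe_file("/path/to/audio.ogg")) writes the transcript to standard output, which is presumably where OpenClaw reads it from.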

Step 3 — Configure OpenClaw

Add to ~/.openclaw/openclaw.json under tools.media.audio:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "<user-base>/bin/mlx-whisper-transcribe.sh",
            "args": ["{{MediaPath}}"],
            "timeoutSeconds": 60
          }
        ]
      }
    }
  }
}

Replace <user-base> with the absolute path printed by python3 -m site --user-base.

Step 4 — Restart OpenClaw

openclaw gateway restart

Or restart the OpenClaw app from the menu bar.

Models

The wrapper uses whisper-small-mlx by default (465MB, good balance of speed and accuracy).

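To change models, edit the path_or_hf_repo value in the installed copy of mlx-whisper-transcribe.sh (or edit bin/mlx-whisper-transcribe.sh in this skill and copy it again as in Step 2):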

Model	Size	Use case
-------	------	----------
`mlx-community/whisper-tiny-mlx`	75MB	Fastest, basic accuracy
`mlx-community/whisper-small-mlx`	465MB	Recommended
`mlx-community/whisper-medium-mlx`	1.5GB	Higher accuracy
`mlx-community/whisper-large-v3-mlx`	3GB	Best accuracy
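Switching models only changes the path_or_hf_repo string. A hypothetical lookup from short names to the four repositories in the table (the short names are illustrative, not used by the wrapper):

```python
WHISPER_MODELS = {
    "tiny": "mlx-community/whisper-tiny-mlx",
    "small": "mlx-community/whisper-small-mlx",
    "medium": "mlx-community/whisper-medium-mlx",
    "large-v3": "mlx-community/whisper-large-v3-mlx",
}


def repo_for(name: str) -> str:
    """Return the path_or_hf_repo value for a short model name."""
    try:
        return WHISPER_MODELS[name]
    except KeyError:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(WHISPER_MODELS)}") from None
```

For example: mlx_whisper.transcribe(audio_path, path_or_hf_repo=repo_for("medium")).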

Language hint (optional)

Pass a language code as the second argument to skip auto-detection (faster):

mlx-whisper-transcribe.sh audio.ogg zh   # Chinese
mlx-whisper-transcribe.sh audio.ogg en   # English

In openclaw.json, add the language to args:

"args": ["{{MediaPath}}", "zh"]
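How the wrapper reads its optional second argument is not shown. A sketch, assuming the code is forwarded to mlx_whisper.transcribe() as the language keyword: mlx_whisper accepts it as a decoding option, and when it is set the language-detection pass is skipped. parse_wrapper_args is a hypothetical name.

```python
from __future__ import annotations


def parse_wrapper_args(argv: list[str]) -> tuple[str, dict]:
    """Turn `AUDIO [LANG]` into (audio_path, extra keyword arguments for transcribe())."""
    if not 1 <= len(argv) <= 2:
        raise ValueError("usage: mlx-whisper-transcribe.sh AUDIO [LANG]")
    audio_path = argv[0]
    options = {}
    if len(argv) == 2 and argv[1]:
        # A fixed language lets Whisper skip its language-detection pass.
        options["language"] = argv[1]
    return audio_path, options
```

Usage: audio, options = parse_wrapper_args(sys.argv[1:]), then mlx_whisper.transcribe(audio, path_or_hf_repo="mlx-community/whisper-small-mlx", **options).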

Performance (M3 MacBook Pro, 8GB RAM)

Audio length	Transcription time
-------------	-------------------
10 sec	~1 sec
1 min	~7 sec
30 min	~3.5 min
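The table implies roughly 8.6× faster than real time (1800 s of audio in ~210 s). At that rate the default timeoutSeconds of 60 covers about 60 × 1800 / 210 ≈ 514 s (~8.5 min) of audio. A hedged rule of thumb for picking timeoutSeconds scales the measured ratio by a safety margin; the factor of 2 and the 60-second floor are arbitrary assumptions, and the ratio only holds on hardware and a model comparable to the measurement above.

```python
import math

# From the table above: 30 min (1800 s) of audio took ~3.5 min (210 s) on an M3 MacBook Pro.
MEASURED_AUDIO_S = 1800
MEASURED_PROCESSING_S = 210


def suggested_timeout(audio_seconds: float, safety_factor: float = 2.0, minimum: int = 60) -> int:
    """Scale the measured processing ratio by a safety margin, never going below `minimum`."""
    estimate = audio_seconds * MEASURED_PROCESSING_S / MEASURED_AUDIO_S * safety_factor
    return max(minimum, math.ceil(estimate))
```

For a one-hour recording, suggested_timeout(3600) returns 840.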

Troubleshooting

mlx_whisper not found: Run pip3 install mlx-whisper again, or python3 -m pip install mlx-whisper to install into the same interpreter that python3 refers to
Empty transcript: Audio may be silent or music-only (Whisper transcribes speech only)
Timeout: Increase timeoutSeconds for long audio files
Wrong language: Add the target language code (e.g. "zh") as the second entry in args
Model download fails: Check your internet connection; after the first successful download, models are cached in ~/.cache/huggingface
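To see which Whisper models are already cached (for example, after a failed download), a small helper can scan the cache. This sketch assumes huggingface_hub's default layout: ~/.cache/huggingface/hub with one models--<org>--<repo> folder per repository; a custom HF_HOME or HF_HUB_CACHE setting moves that directory. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def repo_from_cache_dirname(name: str) -> str:
    """models--mlx-community--whisper-small-mlx -> mlx-community/whisper-small-mlx"""
    _, org, repo = name.split("--", 2)
    return f"{org}/{repo}"


def cached_whisper_models(cache_dir: Path | None = None) -> list[str]:
    """List mlx-community Whisper repos already downloaded into the Hugging Face cache."""
    cache_dir = cache_dir or Path.home() / ".cache" / "huggingface" / "hub"
    if not cache_dir.is_dir():
        return []
    return [
        repo_from_cache_dirname(entry.name)
        for entry in sorted(cache_dir.glob("models--mlx-community--whisper-*"))
    ]
```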

Version history

1 version in total

v1.0.7 (current)

2026-03-31 01:24 · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)