← 返回
未分类

Local Transcription

Local speech-to-text transcription with Qwen ASR — transcription routed across your Apple Silicon fleet. Transcribe meetings, voice notes, podcasts with loca...
本地语音转文字(Qwen ASR),转录任务在 Apple Silicon 设备集群中分配,支持会议、语音笔记、播客等本地内容。
twinsgeeks twinsgeeks 来源
未分类 clawhub v1.0.2 1 版本 100000 Key: 无需
★ 1
Stars
📥 437
下载
💾 2
安装
1
版本
#apple-silicon#audio-transcription#fleet-routing#latest#local-stt#mac-mini#mac-studio#mlx#offline#offline-transcription#qwen-asr#speech-to-text#stt#transcription#whisper

概述

Local Speech-to-Text Transcription

You're helping someone use speech-to-text transcription on audio files — meetings, voice memos, podcast episodes, phone recordings — without sending anything to the cloud. Every audio file stays on their devices. The fleet picks the best node to handle each speech-to-text transcription automatically.

Why local speech-to-text transcription matters

Cloud speech-to-text transcription APIs charge per minute and send your audio to third-party servers. Meeting recordings contain sensitive business discussions. Voice notes contain personal thoughts. Podcast interviews contain unreleased content. None of that should leave your network. Local transcription keeps it private.

This skill routes speech-to-text transcription requests across your fleet of devices. If one machine is busy with a 3-hour transcription, the next speech-to-text request goes to a different device. Transcription queue management, health monitoring, and dashboard visibility — same infrastructure you'd get from a cloud speech-to-text API, running entirely on your hardware.

Get started with speech-to-text transcription

pip install ollama-herd
herd                                    # start the transcription router (port 11435)
herd-node                               # start on each transcription device
uv tool install "mlx-qwen3-asr[serve]" --python 3.14  # install speech-to-text model

Enable speech-to-text transcription:

curl -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" \
  -d '{"transcription": true}'

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Transcribe audio with speech-to-text

curl — basic transcription

# Speech-to-text transcription of a meeting recording
curl -s http://localhost:11435/api/transcribe \
  -F "audio=@meeting-recording.wav" | python3 -m json.tool

Python — speech-to-text transcription

import httpx

def speech_to_text_transcription(audio_path):
    """Run speech-to-text transcription on an audio file."""
    with open(audio_path, "rb") as f:
        transcription_resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            files={"audio": (audio_path, f)},
            timeout=300.0,
        )
    transcription_resp.raise_for_status()
    transcription_result = transcription_resp.json()
    return transcription_result["text"]

# Run speech-to-text transcription
transcription_text = speech_to_text_transcription("meeting.wav")
print(transcription_text)

Speech-to-text transcription with timestamps

def transcription_with_timestamps(audio_path):
    """Speech-to-text transcription returning timestamped chunks."""
    with open(audio_path, "rb") as f:
        transcription_resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            files={"audio": (audio_path, f)},
            timeout=300.0,
        )
    transcription_resp.raise_for_status()
    transcription_result = transcription_resp.json()
    for transcription_chunk in transcription_result.get("chunks", []):
        print(f"[{transcription_chunk['start']:.1f}s - {transcription_chunk['end']:.1f}s] {transcription_chunk['text']}")
    return transcription_result

Transcription response format

{
  "transcription_text": "Hello, this is a test of the speech-to-text transcription system.",
  "language": "English",
  "transcription_chunks": [
    {
      "text": "Hello, this is a test of the speech-to-text transcription system.",
      "start": 0.0,
      "end": 3.2,
      "chunk_index": 0,
      "language": "English"
    }
  ]
}

Supported audio formats for transcription

WAV, MP3, M4A, FLAC, MP4, OGG — any format FFmpeg supports. WAV files get a ~25% transcription speed boost via native fast-path.

Speech-to-text transcription response headers

HeaderDescription
---------------------
X-Fleet-NodeWhich device performed the speech-to-text transcription
X-Fleet-ModelTranscription model used (qwen3-asr)
X-Transcription-TimeTranscription processing time in milliseconds

Speech-to-text transcription model

Qwen3-ASR — state-of-the-art open-source speech-to-text transcription in 2026. ~5% word error rate, runs natively on Apple Silicon via MLX. The 0.6B transcription model uses ~1.2GB memory and transcribes at 0.08x real-time factor (a 10-minute recording completes transcription in ~48 seconds).

Also available on this fleet

The same router handles three other AI workloads alongside speech-to-text transcription. All endpoints are at http://localhost:11435:

LLM inference

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-oss:120b","messages":[{"role":"user","content":"Hello"}]}'

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

Embeddings

curl http://localhost:11435/api/embeddings \
  -d '{"model":"nomic-embed-text","prompt":"search query"}'

Monitoring speech-to-text transcription

# Transcription stats (last 24h)
curl -s http://localhost:11435/dashboard/api/transcription-stats | python3 -m json.tool

# Fleet health (includes speech-to-text transcription activity)
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard — speech-to-text transcription queues show with [STT] badge alongside LLM and image queues.

Full documentation

Agent Setup Guide — complete reference for all 4 model types including speech-to-text transcription with Python, JavaScript, and curl examples.

Guardrails

  • Never delete or modify audio files provided by the user for transcription.
  • Never send audio data to external services — all speech-to-text transcription is local.
  • Never delete or modify files in ~/.fleet-manager/.
  • If transcription fails, suggest checking node logs: tail ~/.fleet-manager/logs/herd.jsonl.
  • If no speech-to-text models available, suggest installing: uv tool install "mlx-qwen3-asr[serve]" --python 3.14.

版本历史

共 1 个版本

  • v1.0.2 当前
    2026-05-03 07:33 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

design-media

UI/UX Pro Max

xobi667
提供 UI/UX 设计智能与实现指导,帮助打造精美界面。适用于 UI 设计、UX 流程、信息架构、视觉风格、设计系统/标记、组件规格、文案/微文案、无障碍及前端 UI(HTML/CSS/JS、React、Next.js、Vue、Svelte
★ 216 📥 47,036
life-service

Adopt A Pet

twinsgeeks
领养虚拟宠物作为AI智能体。为它取名、喂食、见证成长。64种以上物种,从猫狗到AI原生生物。实时饥饿感,5个进化阶段。
★ 0 📥 799
design-media

Nano Banana Pro

steipete
使用 Nano Banana Pro (Gemini 3 Pro Image) 生成或编辑图像。支持文生图、图生图及 1K/2K/4K 分辨率,适用于图像创建、修改及编辑请求,使用 --input-image 指定输入图像。
★ 428 📥 116,572