
Auto Whisper Safe

Transcribe long audio files safely on 16GB RAM machines using auto-chunking with Whisper’s base model and seamless transcript merging.
Author: neal-collab
Category: Uncategorized · Source: clawhub · Version: v1.1.0 (1 version) · API key: not required

Overview

Auto-Whisper Safe — RAM-Friendly Voice Transcription

Transcribe voice messages and long audio files using OpenAI Whisper without crashing your machine. Designed for 16GB RAM systems running other processes (like OpenClaw agents).

The Problem

Whisper's turbo and large models need roughly 6-10GB of RAM. On a 16GB machine that is already running OpenClaw, Ollama, and other services, loading them can exhaust memory and trigger out-of-memory (OOM) crashes. Existing Whisper skills do not guard against this.

The Solution

  1. Auto-detects audio length via ffprobe
  2. Splits long audio (>10min) into 10-min chunks automatically
  3. Uses base model by default (~1.5GB RAM — safe on any 16GB machine)
  4. Merges transcripts seamlessly — no gaps, no duplicates
  5. Cleans up temp files automatically
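
The first two steps can be sketched as follows. This is a minimal illustration, not the contents of transcribe.sh: the function names `audio_duration_seconds` and `chunk_count` and the `CHUNK_SECONDS` variable are hypothetical, and the only external call is the standard ffprobe duration query.

```shell
CHUNK_SECONDS=600  # 10 minutes, the split threshold described above

# Print the duration of an audio file, truncated to whole seconds.
# Requires ffprobe (installed with ffmpeg).
audio_duration_seconds() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1" | awk '{ printf "%d\n", $1 }'
}

# Print how many passes a recording of the given length (in seconds) needs.
# Anything up to CHUNK_SECONDS is transcribed directly as a single piece.
chunk_count() {
  local duration="$1"
  if [ "$duration" -le "$CHUNK_SECONDS" ]; then
    echo 1
  else
    # Ceiling division: a trailing partial segment still counts as a chunk.
    echo $(( ($duration + $CHUNK_SECONDS - 1) / $CHUNK_SECONDS ))
  fi
}
```

For example, `chunk_count "$(audio_duration_seconds interview.ogg)"` would report 5 for a 45-minute file: four full 10-minute segments plus one 5-minute remainder.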

Usage

# Basic usage
./transcribe.sh /path/to/audio.ogg

# Custom model (if you have more RAM)
WHISPER_MODEL=small ./transcribe.sh /path/to/audio.ogg

# Custom language
WHISPER_LANG=de ./transcribe.sh /path/to/audio.ogg

# Custom output directory
./transcribe.sh /path/to/audio.ogg /path/to/output/

RAM Usage by Model

| Model  | RAM    | Speed | Accuracy | Recommended For                  |
|--------|--------|-------|----------|----------------------------------|
| tiny   | ~1GB   | ⚡⚡⚡   | ★★       | Quick previews, low-RAM systems  |
| base   | ~1.5GB | ⚡⚡    | ★★★      | Default, best balance            |
| small  | ~2.5GB | ⚡     | ★★★★     | When accuracy matters more       |
| medium | ~5GB   | 🐢    | ★★★★★    | 32GB+ RAM only                   |
| turbo  | ~6GB   | 🐢🐢   | ★★★★★    | Dedicated transcription machines |
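
One way to use these figures is a pre-flight check before loading a model. The sketch below is an assumption, not part of the skill: `model_ram_mb`, `available_ram_mb`, and `model_fits` are hypothetical helpers, the megabyte values are the approximate numbers from the table, the 512MB safety margin is arbitrary, and `available_ram_mb` only works on Linux because it reads /proc/meminfo.

```shell
# Approximate RAM need in MB per model, taken from the table above.
model_ram_mb() {
  case "$1" in
    tiny)   echo 1000 ;;
    base)   echo 1500 ;;
    small)  echo 2500 ;;
    medium) echo 5000 ;;
    turbo)  echo 6000 ;;
    *)      echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Print currently available memory in MB (Linux only).
available_ram_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

# Succeed if MODEL plus a safety margin fits into AVAILABLE_MB.
# Usage: model_fits MODEL AVAILABLE_MB [MARGIN_MB]
model_fits() {
  local need
  need=$(model_ram_mb "$1") || return 1
  [ $(( need + ${3:-512} )) -le "$2" ]
}
```

A caller could then write `model_fits "$WHISPER_MODEL" "$(available_ram_mb)" || WHISPER_MODEL=base` to fall back to the default when memory is tight.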

OpenClaw Integration

Add to your agent's BOOTSTRAP.md:

## Voice Message Handling

When you receive `<media:audio>`, ALWAYS transcribe first:

1. Run: `./skills/auto-whisper-safe/transcribe.sh <audio-path>`
2. Read the output transcript file
3. Respond based on the transcribed content

Do this automatically — voice messages are meant to be transcribed.

Environment Variables

| Variable      | Default | Description               |
|---------------|---------|---------------------------|
| WHISPER_MODEL | base    | Whisper model size        |
| WHISPER_LANG  | en      | Audio language (ISO code) |
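
A script can honor these variables with standard shell default expansion. The sketch below assumes a hypothetical wrapper named `run_whisper`. The `--model`, `--language`, `--output_format`, and `--output_dir` options are those of the openai-whisper command-line tool, but how transcribe.sh actually invokes it is not shown in this document.

```shell
# Transcribe one file, falling back to the documented defaults when
# WHISPER_MODEL or WHISPER_LANG is unset or empty.
# Usage: run_whisper AUDIO_FILE OUTPUT_DIR
run_whisper() {
  local audio="$1" outdir="$2"
  whisper "$audio" \
    --model "${WHISPER_MODEL:-base}" \
    --language "${WHISPER_LANG:-en}" \
    --output_format txt \
    --output_dir "$outdir"
}
```

Because the defaults are resolved at call time, `WHISPER_MODEL=small ./transcribe.sh audio.ogg` overrides the model for that one run without editing the script.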

How Chunking Works

  • Audio ≤10min → transcribed directly (no splitting)
  • Audio >10min → split into 10-min segments via ffmpeg
  • Each segment transcribed independently
  • Transcripts concatenated in order
  • Temp files cleaned up on exit (even on errors)
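
The flow above could look roughly like this. It is a sketch under stated assumptions rather than the script's real code: `split_audio`, `merge_transcripts`, and `transcribe_long` are hypothetical names, the chunks are re-encoded to 16 kHz mono WAV as one possible choice, and cleanup is done with an EXIT trap inside a subshell.

```shell
CHUNK_SECONDS=600  # 10-minute segments

# Split INPUT into numbered 16 kHz mono WAV segments inside DIR,
# using ffmpeg's segment muxer.
split_audio() {
  ffmpeg -loglevel error -i "$1" -f segment -segment_time "$CHUNK_SECONDS" \
    -ar 16000 -ac 1 "$2/chunk_%03d.wav"
}

# Concatenate chunk transcripts from DIR into OUTPUT.
# Zero-padded names make the glob expand in chunk order.
merge_transcripts() {
  cat "$1"/chunk_*.txt > "$2"
}

# Split, transcribe each chunk, merge, and always remove the temp directory.
# Usage: transcribe_long INPUT OUTPUT_TXT
transcribe_long() {
  local input="$1" output="$2" tmpdir chunk
  tmpdir=$(mktemp -d) || return 1
  (
    # Runs when this subshell ends, whether it succeeded or failed.
    trap 'rm -rf "$tmpdir"' EXIT
    split_audio "$input" "$tmpdir" || exit 1
    for chunk in "$tmpdir"/chunk_*.wav; do
      echo "Transcribing $(basename "$chunk")..." >&2
      whisper "$chunk" --model "${WHISPER_MODEL:-base}" \
        --language "${WHISPER_LANG:-en}" \
        --output_format txt --output_dir "$tmpdir" >/dev/null || exit 1
    done
    merge_transcripts "$tmpdir" "$output"
  )
}
```

Only one chunk is loaded at a time, so peak memory depends on the model rather than on the length of the recording, which is consistent with the flat RAM figures reported for chunked runs.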

Installation

# macOS
brew install openai-whisper ffmpeg

# Ubuntu/Debian
pip install openai-whisper
sudo apt install ffmpeg

# Verify
whisper --help && ffmpeg -version
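
The same verification can be done up front inside a script so that it fails early with a clear message. This is a minimal sketch. `missing_deps` is a hypothetical helper built on the shell's standard `command -v` lookup.

```shell
# Print each required command that is not on PATH.
# Returns 0 if all are present, 1 if any are missing.
missing_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

A script might start with `missing_deps whisper ffmpeg ffprobe || { echo "Install the tools listed above first" >&2; exit 1; }`.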

Why This Over Other Whisper Skills

  • RAM-safe: Won't crash your 16GB machine
  • Auto-chunking: Handles 1-hour podcasts without issues
  • Cleanup: No temp files left behind
  • Progress: Shows chunk-by-chunk progress
  • Configurable: Model + language via env vars
  • OpenClaw-native: Drop-in for any agent's BOOTSTRAP.md

Real-World Performance

Tested on Ubuntu 22.04, 16GB RAM, running OpenClaw (10 agents) + Ollama simultaneously:

| Audio Length        | Model | RAM Peak        | Time  | Result                        |
|---------------------|-------|-----------------|-------|-------------------------------|
| 2 min voice memo    | base  | 1.4GB           | ~15s  | ✅ Perfect                    |
| 12 min podcast clip | base  | 1.5GB (chunked) | ~90s  | ✅ 2 chunks, seamless         |
| 45 min interview    | base  | 1.5GB (chunked) | ~6min | ✅ 5 chunks, seamless         |
| 2 min voice memo    | tiny  | 0.9GB           | ~8s   | ✅ Good enough for quick reads |

Supported Audio Formats

ffmpeg handles the conversion, so virtually any format works:

  • .ogg (Telegram voice messages)
  • .mp3, .m4a, .wav, .flac
  • .webm (browser recordings)
  • .opus (WhatsApp voice messages)

Changelog

v1.0.0

  • Initial release
  • Auto-chunking for long audio (>10min)
  • RAM-safe defaults (base model, 1.5GB)
  • Progress tracking per chunk
  • Automatic temp file cleanup
  • Configurable model and language

Version History

1 version in total

  • v1.1.0 (current)
    2026-05-12 05:43, security checks: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
