概述

Transcription Skill

This skill provides transcription capabilities for audio and video files using the OpenAI Whisper API endpoint.

Quick Start

Send me an audio or video file and I'll transcribe it automatically! Just attach the file to your message.

Manual Usage

You can also run the script directly from the command line:

cd /home/openclaw/.openclaw/workspace/skills/transcription/scripts
python3 transcribe_audio.py inputfile.ogg

For video files:

python3 transcribe_audio.py video.mp4

Supported Formats

Audio: mp3, wav, mp4, mpeg, mpga, m4a, ogg, webm, flac, aac, wma

Video: mp4, mov, avi, mkv, webm (audio extracted automatically)

Usage Examples

How to use the python scripts for transcription:

python3 transcribe_audio.py inputfile.ogg

Features

Extract timestamps in output when needed
Batch processing - send multiple files at once
Video support - automatically extracts audio from video files
Multiple output formats - text, JSON, SRT subtitles, VTT captions

Technical Details

API Endpoint: http://192.168.0.11:8080/v1 (local Whisper endpoint)
Model: whisper-small (default)
Temperature: 0.0 (deterministic results)
Auto-extraction: ffmpeg handles video → audio conversion

Notes

Ensure files are clear and not too noisy for best results
Language auto-detection works well for most cases
For batch processing, send files one at a time or specify "batch" in your request

Scripts: scripts/transcribe_audio.py, scripts/transcribe_simple.py

References: references/transcription_guide.md

版本历史

共 1 个版本

v1.0.1 当前

2026-03-19 07:28 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)