audio to text and video to text

Transcribe audio and video files into text using OpenAI's Whisper API. Use this skill whenever a user wants to convert any audio or video file to text — incl...
ahqazi-dev
Uncategorized clawhub v1.0.0 1 version 100000 Key: required

Transcription Skill

Converts audio and video files into clean, readable text using OpenAI's Whisper API and ffmpeg for media handling.

Overview

This skill handles the full pipeline:

  1. Media extraction — use ffmpeg to strip audio from video files and convert to a Whisper-compatible format
  2. Chunking — split files larger than the chunk size (20 MB by default) into overlapping segments to stay within the API's 25 MB upload limit
  3. Transcription — send each chunk to OpenAI's Whisper API
  4. Assembly — merge chunk transcripts, adjusting timestamps, into a single clean output
  5. Post-processing — optionally clean up with Claude (punctuation, speaker labels, summaries)
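
The assembly step (4) can be sketched as follows. This is an illustration, not the code in scripts/transcribe.py. It assumes each chunk's transcript is a list of segments with start, end, and text fields, with times in seconds relative to the chunk, and that each chunk's start offset in the original file is known. The merge_chunks name and its overlap rule are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Segment = Dict[str, Any]


def merge_chunks(chunks: List[Tuple[float, List[Segment]]]) -> List[Segment]:
    """Merge per-chunk segments into a single timeline.

    chunks: (chunk_start_seconds, segments) pairs in playback order.
    Segment times are chunk-relative and get shifted by chunk_start_seconds.
    """
    merged: List[Segment] = []
    last_end = 0.0
    for chunk_start, segments in chunks:
        for seg in segments:
            start = float(seg["start"]) + chunk_start
            end = float(seg["end"]) + chunk_start
            # Assumed overlap rule: drop a segment that ends inside audio
            # already covered by the previous chunk.
            if end <= last_end:
                continue
            merged.append({
                "start": max(start, last_end),  # clamp so times never go backwards
                "end": end,
                "text": str(seg["text"]).strip(),
            })
            last_end = end
    return merged
```

Dropping segments by time is a simplification. A segment that straddles the overlap boundary is kept whole, so a few words may still repeat at the seams.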

Requirements

  • ffmpeg must be installed (run which ffmpeg to verify; it is usually pre-installed in claude.ai's environment)
  • OpenAI API key stored in the environment as OPENAI_API_KEY — the user must provide this
  • Python packages: openai, pydub (install via pip if needed)
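
The API key requirement can be checked before any request is sent. The sketch below is illustrative: resolve_api_key and mask_key are hypothetical helpers, not part of the skill's script, and the sk- prefix check simply mirrors the wording of the prompt shown to the user later in this document.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_value: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the key from --api-key if given, else from OPENAI_API_KEY."""
    key = (cli_value or env.get("OPENAI_API_KEY", "")).strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; ask the user for a key")
    if not key.startswith("sk-"):
        raise ValueError("API key does not start with 'sk-'")
    return key


def mask_key(key: str) -> str:
    """Shorten a key for log output so the full secret is never printed."""
    return key[:3] + "..." + key[-4:] if len(key) > 8 else "***"
```

Failing early here gives the user a clear message instead of an AuthenticationError part-way through a long transcription.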

Quick Start

When a user provides a media file, run the transcription script:

# Install dependencies if missing
pip install openai pydub --break-system-packages -q

# Run transcription
python /home/claude/transcription/scripts/transcribe.py \
  --input "/path/to/media/file" \
  --output "/mnt/user-data/outputs/transcript.txt" \
  --api-key "$OPENAI_API_KEY"

See scripts/transcribe.py for the full implementation.

Supported Formats

| Category | Formats |
|----------|---------|
| Audio | mp3, wav, m4a, ogg, flac, aac, opus, wma |
| Video | mp4, mov, avi, mkv, webm, wmv, m4v |

ffmpeg handles extraction from any of these.
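
A minimal sketch of the extraction step, assuming the script shells out to ffmpeg. The function names are hypothetical, and the 16 kHz mono, 64 kbit/s settings are assumptions chosen to keep speech audio small; they are not values taken from scripts/transcribe.py.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_extract_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes mono audio."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input audio or video file
        "-vn",                    # discard any video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample
        "-b:a", "64k",            # audio bitrate (assumed value)
        dst,                      # output container is inferred from the extension
    ]


def extract_audio(src: str, dst: str) -> Path:
    """Run ffmpeg and return the path of the extracted audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_extract_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

For example, extract_audio("meeting.mp4", "meeting.mp3") would write an mp3 next to the source. Passing the command as a list avoids shell quoting problems with paths that contain spaces.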

Options & Flags

| Flag | Default | Description |
|------|---------|-------------|
| --model | whisper-1 | Transcription model to use (whisper-1, gpt-4o-transcribe) |
| --language | auto-detect | ISO 639-1 language code (e.g. en, ar, fr) |
| --format | txt | Output format: txt, srt, vtt, json |
| --timestamps | off | Include timestamps in output |
| --chunk-size | 20 | Max chunk size in MB (must be ≤ 25) |
| --prompt | none | Context hint to improve accuracy (e.g. domain vocabulary) |
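
One way these flags could map onto a transcription request is sketched below. The mapping from --format values to the API's response_format values is an assumption, and build_request_params and transcribe_file are hypothetical names; the actual script may do this differently.

```python
from typing import Any, Dict, Optional

# Assumed mapping from the skill's --format values to API response_format values.
RESPONSE_FORMATS = {"txt": "text", "srt": "srt", "vtt": "vtt", "json": "verbose_json"}


def build_request_params(model: str = "whisper-1",
                         language: Optional[str] = None,
                         fmt: str = "txt",
                         prompt: Optional[str] = None) -> Dict[str, Any]:
    """Translate CLI-style options into keyword arguments for the request."""
    if fmt not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params: Dict[str, Any] = {"model": model, "response_format": RESPONSE_FORMATS[fmt]}
    if language:
        params["language"] = language  # omitted means auto-detect
    if prompt:
        params["prompt"] = prompt
    return params


def transcribe_file(path: str, **flags: Any) -> Any:
    """Send one file to the API (requires the openai package and a valid key)."""
    from openai import OpenAI  # lazy import so the helper above works without it

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(file=f, **build_request_params(**flags))
```

Leaving language and prompt out of the request when they are unset keeps the defaults in the table (auto-detect, none) in the API's hands.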

Output Formats

  • txt — plain text, ideal for most uses
  • srt — SubRip subtitle format (for video players)
  • vtt — WebVTT format (for web video)
  • json — full Whisper JSON with segments and timestamps
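
When chunks are merged, subtitle numbering and timestamps have to be regenerated for the whole file. The sketch below shows one way to render merged segments as SRT. It assumes segments are dicts with start, end (seconds), and text fields; srt_timestamp and to_srt are hypothetical helpers.

```python
from typing import Any, Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp form SRT uses."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Dict[str, Any]]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

WebVTT output is similar but starts with a WEBVTT header line and uses a period instead of a comma before the milliseconds.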

Step-by-Step Workflow

1. Check for the file

Ask the user to upload the file or provide a local path. Check:

ls /mnt/user-data/uploads/
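
The same check can be done in Python, filtering by the extensions listed under Supported Formats. This is a sketch; find_media_files is a hypothetical helper and is not part of the skill's script.

```python
from pathlib import Path
from typing import List

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".opus", ".wma"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".wmv", ".m4v"}
SUPPORTED_EXTS = AUDIO_EXTS | VIDEO_EXTS


def find_media_files(directory: str = "/mnt/user-data/uploads") -> List[Path]:
    """Return supported audio/video files in directory, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTS
    )
```

If the list is empty, ask the user to upload the file or give a path. If it has several entries, ask which one to transcribe.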

2. Check ffmpeg and install deps

which ffmpeg && ffmpeg -version 2>&1 | head -1
pip install openai pydub --break-system-packages -q 2>&1 | tail -3

3. Get the API key

If OPENAI_API_KEY is not set in the environment, ask the user:

> "Please provide your OpenAI API key — it starts with sk-. You can get one at https://platform.openai.com/api-keys"

4. Run the script

python /home/claude/transcription/scripts/transcribe.py \
  --input "<file_path>" \
  --output "/mnt/user-data/outputs/transcript.txt"

5. Post-process (optional but recommended)

After transcription, offer to:

  • Clean up punctuation/formatting with Claude
  • Summarize the content
  • Extract action items, speakers, or key topics
  • Translate to another language

Use the transcript text directly in the conversation for these steps.

Handling Large Files

The script automatically splits files larger than --chunk-size (20 MB by default) into overlapping chunks, with a 1-second overlap so that words at a boundary are not cut off. Each chunk is transcribed separately and the results are merged.

For very long recordings (> 1 hour), warn the user it may take a few minutes and show progress.
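
The splitting described above can be sketched as a function that plans time windows. This is an illustration under stated assumptions, not the script's actual logic: it estimates how many seconds fit in one chunk from the file's average bitrate, and plan_chunks is a hypothetical name.

```python
from typing import List, Tuple


def plan_chunks(duration_s: float, size_bytes: int,
                chunk_mb: float = 20, overlap_s: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds; consecutive windows overlap by overlap_s."""
    if chunk_mb > 25:
        raise ValueError("chunk size must be <= 25 MB")
    limit = chunk_mb * 1024 * 1024
    if size_bytes <= limit:
        return [(0.0, duration_s)]  # small enough to send in one request
    # Seconds of audio per chunk, assuming a roughly constant bitrate.
    chunk_s = duration_s * limit / size_bytes
    if chunk_s <= overlap_s:
        raise ValueError("chunk duration must exceed the overlap")
    step = chunk_s - overlap_s
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows
```

With pydub, each window could then be cut with millisecond slicing, for example audio[int(start * 1000):int(end * 1000)]. The number of windows also gives a simple basis for progress messages on long recordings.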

Error Handling

| Error | Fix |
|-------|-----|
| AuthenticationError | Invalid API key; ask the user to verify it |
| RateLimitError | Wait 60 s and retry, or use --chunk-size 10 |
| InvalidRequestError: file too large | Reduce --chunk-size below 25 |
| ffmpeg not found | Install it: sudo apt install ffmpeg or brew install ffmpeg |
| No audio stream found | File may be corrupt or in the wrong format |
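
The wait-and-retry advice for rate limits can be wrapped in a small helper. The sketch below is generic and does not import the openai package; with_retries is a hypothetical name, and the doubling delay is an assumption layered on the 60-second wait from the table.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T],
                 retry_on: Tuple[Type[BaseException], ...],
                 attempts: int = 3,
                 base_delay: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on the given exceptions with a doubling delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With the openai package installed, a chunk could be sent as with_retries(lambda: send_chunk(chunk), (openai.RateLimitError,)), where send_chunk stands in for whatever function performs the request. Authentication errors are deliberately not retried, since waiting does not fix an invalid key.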

Example Interaction

User: "Can you transcribe this meeting recording?"
[uploads meeting.mp4]

→ Check file exists in /mnt/user-data/uploads/
→ Run transcribe.py on it
→ Save transcript to /mnt/user-data/outputs/
→ Share the transcript with the user via present_files()
→ Offer to summarize or extract action items

Notes for openclaw.ai

  • Always save output to /mnt/user-data/outputs/ so users can download it
  • Use present_files() to share the transcript file with the user after saving
  • For business users, suggest the srt or vtt format if they're adding captions to video
  • The --prompt flag is useful for technical/domain-specific content: pass a few domain keywords to improve accuracy

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-02 12:06 Safe Safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
