Audio Processing (Iyeque)

Audio ingestion, analysis, transformation, and generation (Transcribe, TTS, VAD, Features).
Author: iyeque | Category: Uncategorized | Source: clawhub | Version: v1.1.1 (#latest, 1 version) | API key: not required
Stars: 0 | Downloads: 390 | Installs: 0

Overview

Audio Processing Skill

A comprehensive toolset for audio manipulation and analysis with security validations.

Security

  • File paths are validated to prevent path traversal attacks
  • Access to system directories (/etc, /proc, /sys, /root) is blocked
  • TTS text input is limited to 10,000 characters
  • All file operations use resolved absolute paths
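
The checks listed above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the function name `validate_path` and the constant `BLOCKED_ROOTS` are hypothetical, and the blocked list simply mirrors the directories named in this section.

```python
from pathlib import Path

# Assumption: the blocked roots mirror the list in the Security section.
# They are resolved too, so symlinked system directories still match.
BLOCKED_ROOTS = [Path(p).resolve() for p in ("/etc", "/proc", "/sys", "/root")]


def validate_path(user_path: str) -> Path:
    """Resolve a user-supplied path and reject blocked system directories."""
    # resolve() collapses ".." components and follows symlinks, so the
    # check below sees the real absolute location, not the spelling.
    resolved = Path(user_path).expanduser().resolve()
    for root in BLOCKED_ROOTS:
        if resolved == root or root in resolved.parents:
            raise ValueError(f"access to {root} is blocked: {resolved}")
    return resolved
```

With this sketch, a traversal attempt such as `validate_path("/tmp/../etc/passwd")` raises `ValueError`, because the comparison runs on the resolved path rather than on the string the caller supplied.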

Tool API

audio_tool

Perform audio operations like transcription, text-to-speech, and feature extraction.

Parameters:
  • action (string, required): One of transcribe, tts, extract_features, vad_segments, transform.
  • file_path (string, optional): Path to input audio file.
  • text (string, optional): Text for TTS (max 10,000 chars).
  • output_path (string, optional): Path for output file (default: auto-generated).
  • model (string, optional): Whisper model size (tiny, base, small, medium, large). Default: base.
  • ops (string, optional): JSON string of operations for transform action.
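
As an illustration of how these parameters map onto a command line, here is a hedged `argparse` sketch. It assumes the action is a positional argument and the remaining parameters are `--flags`; `build_parser` is a hypothetical name, and the real `tool.py` may be organized differently.

```python
import argparse

ACTIONS = ["transcribe", "tts", "extract_features", "vad_segments", "transform"]
MODELS = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented audio_tool parameters."""
    parser = argparse.ArgumentParser(prog="tool.py")
    parser.add_argument("action", choices=ACTIONS, help="operation to run")
    parser.add_argument("--file_path", help="path to input audio file")
    parser.add_argument("--text", help="text for TTS (max 10,000 chars)")
    parser.add_argument("--output_path", help="path for output file")
    parser.add_argument("--model", choices=MODELS, default="base",
                        help="Whisper model size")
    parser.add_argument("--ops", help="JSON string of transform operations")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```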

Usage:

# Transcribe audio file
uv run --with "openai-whisper" --with "pydub" --with "numpy" skills/audio-processing/tool.py transcribe --file_path input.wav

# Transcribe with specific model
uv run --with "openai-whisper" skills/audio-processing/tool.py transcribe --file_path input.wav --model small

# Text-to-speech
uv run --with "gTTS" skills/audio-processing/tool.py tts --text "Hello world" --output_path hello.mp3

# Extract audio features
uv run --with "librosa" --with "numpy" --with "soundfile" skills/audio-processing/tool.py extract_features --file_path input.wav

# Voice activity detection (find speech segments)
uv run --with "pydub" skills/audio-processing/tool.py vad_segments --file_path input.wav

# Transform audio (trim, resample, normalize)
uv run --with "pydub" skills/audio-processing/tool.py transform --file_path input.wav --ops '[{"op": "trim", "start": 10, "end": 30}, {"op": "normalize"}]'

Actions

transcribe

Convert speech to text using OpenAI Whisper.

  • Returns: { "text": "...", "segments": [...] }
  • Models: tiny, base, small, medium, large (larger = more accurate, slower)
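
A minimal sketch of this action using the `openai-whisper` package. The helper names `to_payload` and `transcribe` are illustrative, and the choice to keep only `start`, `end`, and `text` for each segment is an assumption; the skill may return more fields.

```python
from typing import Any, Dict

MODELS = ("tiny", "base", "small", "medium", "large")


def to_payload(result: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a Whisper result dict to the { text, segments } shape."""
    # Assumption: only start, end, and text are kept per segment.
    segments = [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in result.get("segments", [])
    ]
    return {"text": result["text"].strip(), "segments": segments}


def transcribe(file_path: str, model_name: str = "base") -> Dict[str, Any]:
    """Transcribe an audio file with the requested Whisper model size."""
    if model_name not in MODELS:
        raise ValueError(f"unknown model: {model_name}")
    # Imported lazily so the other actions work without openai-whisper.
    import whisper

    model = whisper.load_model(model_name)
    return to_payload(model.transcribe(file_path))
```

The first call for a given model size downloads its weights, so it is noticeably slower than later calls.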

tts

Generate speech from text using Google TTS.

  • Returns: { "file_path": "output.mp3", "status": "created" }
  • Language: English (default)
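
A hedged sketch of this action with the `gTTS` package, including the 10,000-character limit from the Security section. The names `check_tts_text` and `synthesize` are hypothetical, as is the `output.mp3` default.

```python
from typing import Dict

MAX_TTS_CHARS = 10_000  # limit stated in the Security section


def check_tts_text(text: str) -> str:
    """Reject empty or over-long TTS input before any network call is made."""
    if not text or not text.strip():
        raise ValueError("text is required for tts")
    if len(text) > MAX_TTS_CHARS:
        raise ValueError(f"text exceeds {MAX_TTS_CHARS} characters")
    return text


def synthesize(text: str, output_path: str = "output.mp3",
               lang: str = "en") -> Dict[str, str]:
    """Write spoken audio for `text` to `output_path` as MP3."""
    checked = check_tts_text(text)
    # Imported lazily; gTTS needs network access to Google's TTS endpoint.
    from gtts import gTTS

    gTTS(text=checked, lang=lang).save(output_path)
    return {"file_path": output_path, "status": "created"}
```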

extract_features

Extract audio features for analysis.

  • Returns: duration, sample_rate, mfcc_mean, rms_mean
  • Useful for audio classification, quality analysis
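
To make `rms_mean` concrete, the sketch below computes a frame-wise root-mean-square energy and averages it, using only the standard library. It is an illustration of the quantity, not the skill's implementation: the usage examples suggest the tool relies on `librosa`, whose framing also pads the signal, so values may differ slightly. The frame and hop sizes are assumed defaults.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_length: int = 2048,
              hop_length: int = 512) -> List[float]:
    """Return the RMS energy of each (unpadded) frame of the signal."""
    if not samples:
        raise ValueError("samples must not be empty")
    if len(samples) <= frame_length:
        starts = [0]  # short signal: treat it as a single frame
    else:
        starts = list(range(0, len(samples) - frame_length + 1, hop_length))
    values = []
    for start in starts:
        frame = samples[start:start + frame_length]
        values.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return values


def rms_mean(samples: Sequence[float], frame_length: int = 2048,
             hop_length: int = 512) -> float:
    """Average the per-frame RMS values into a single loudness figure."""
    values = frame_rms(samples, frame_length, hop_length)
    return sum(values) / len(values)
```

A constant signal of amplitude 0.5 yields an `rms_mean` of 0.5, and digital silence yields 0.0, which is why the value is a convenient rough indicator of overall level.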

vad_segments

Detect speech segments using silence detection.

  • Returns: { "segments": [{ "start": 0.5, "end": 3.2 }, ...] }
  • Uses FFmpeg silencedetect filter
  • Aggressiveness: 1-3 (default: 2)
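
One way to turn `silencedetect` output into speech segments is to parse the silence intervals FFmpeg logs to stderr and keep the gaps between them. The sketch below assumes that approach; `speech_segments` is a hypothetical name, and the threshold values in the comment are placeholders rather than the skill's settings.

```python
import re
from typing import Dict, List

# FFmpeg logs lines such as:
#   [silencedetect @ 0x...] silence_start: 3.2
#   [silencedetect @ 0x...] silence_end: 4 | silence_duration: 0.8
# e.g. from: ffmpeg -i input.wav -af silencedetect=noise=-30dB:d=0.5 -f null -
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def speech_segments(ffmpeg_log: str, duration: float) -> List[Dict[str, float]]:
    """Invert detected silences into speech segments over [0, duration]."""
    starts = [float(v) for v in SILENCE_START.findall(ffmpeg_log)]
    ends = [float(v) for v in SILENCE_END.findall(ffmpeg_log)]
    if len(ends) < len(starts):
        ends.append(duration)  # silence that runs to the end of the file

    segments = []
    cursor = 0.0  # end of the most recent silence seen so far
    for silence_start, silence_end in zip(starts, ends):
        silence_start = max(silence_start, 0.0)
        if silence_start > cursor:
            segments.append({"start": cursor, "end": silence_start})
        cursor = max(cursor, silence_end)
    if cursor < duration:
        segments.append({"start": cursor, "end": duration})
    return segments
```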

transform

Apply transformations to audio files.

  • Operations: trim, resample, normalize
  • Returns: { "file_path": "output.wav" }
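
A hedged sketch of how the `ops` JSON string might be validated and applied with `pydub`. Several details are assumptions: that `trim` takes `start` and `end` in seconds (as the usage example suggests), that `resample` carries a `rate` key, and that the output is written as WAV. `parse_ops` and `apply_ops` are hypothetical names.

```python
import json
from typing import Any, Dict, List

ALLOWED_OPS = {"trim", "resample", "normalize"}


def parse_ops(ops_json: str) -> List[Dict[str, Any]]:
    """Decode the ops JSON string and reject unknown operations."""
    ops = json.loads(ops_json)
    if not isinstance(ops, list):
        raise ValueError("ops must be a JSON array")
    for op in ops:
        if not isinstance(op, dict) or op.get("op") not in ALLOWED_OPS:
            raise ValueError(f"unsupported operation: {op!r}")
    return ops


def apply_ops(file_path: str, ops_json: str,
              output_path: str = "output.wav") -> Dict[str, str]:
    """Apply the operations in order and export the result."""
    ops = parse_ops(ops_json)
    from pydub import AudioSegment, effects  # lazy import; needs ffmpeg

    audio = AudioSegment.from_file(file_path)
    for op in ops:
        if op["op"] == "trim":
            # Assumption: start/end are seconds; pydub slices in milliseconds.
            start_ms = int(op.get("start", 0) * 1000)
            end_ms = int(op["end"] * 1000) if "end" in op else len(audio)
            audio = audio[start_ms:end_ms]
        elif op["op"] == "resample":
            audio = audio.set_frame_rate(int(op["rate"]))  # "rate" is hypothetical
        elif op["op"] == "normalize":
            audio = effects.normalize(audio)
    audio.export(output_path, format="wav")
    return {"file_path": output_path}
```

Validating the whole list before loading any audio means a typo in the last operation fails fast instead of after the earlier steps have already run.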

Requirements

  • ffmpeg: Required for VAD and transform operations
  • Python 3.8+: All operations
  • Disk Space: Whisper model downloads range from roughly 75 MB (tiny) to about 3 GB (large)

Error Handling

  • Returns JSON error object on failure
  • Validates all file paths before processing
  • Gracefully handles missing dependencies
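
The behavior described above could be sketched as a small wrapper around each action handler. The `{"error": ...}` shape and the name `run_action` are assumptions for illustration; the source does not show the actual error object.

```python
import json
from typing import Any, Callable, Dict


def run_action(handler: Callable[..., Dict[str, Any]], **kwargs: Any) -> str:
    """Run an action handler and always return a JSON string."""
    try:
        return json.dumps(handler(**kwargs))
    except ImportError as exc:
        # A missing optional package (whisper, gtts, librosa, pydub)
        # surfaces as a readable message instead of a traceback.
        return json.dumps({"error": f"missing dependency: {exc.name}"})
    except (ValueError, FileNotFoundError) as exc:
        # Validation failures and missing input files.
        return json.dumps({"error": str(exc)})
```

Because every branch returns JSON, a caller can parse the output unconditionally and check for an `error` key.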

Version History

1 version in total

  • v1.1.1 (current)
    2026-05-12 05:20, scan results: safe, safe

Security Scan

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
