
OmniVoice

All-in-one voice identity toolkit: speaker identification, voice library management, voice cloning, and speech-to-text. The only OpenClaw skill with speaker...
Author: yangqibin-caibi
Uncategorized · clawhub · v1.0.0 (1 version) · API key: required

Overview

OmniVoice

Ten operations across four capabilities: identify (认) · manage (存) · transcribe (听) · clone (说).

Dependencies

| Component | Install | Purpose |
|---|---|---|
| Whisper | pip install openai-whisper | Speech-to-text |
| Speaker ID | pip install transformers librosa | Speaker identification (UniSpeech-SAT) |
| CosyVoice2 | SiliconFlow API (SF_API_KEY) | Voice cloning |
| ffmpeg | System package | Audio conversion |

Voice references are stored in voice-refs/ at workspace root.

Metadata lives in TOOLS.md under a "Voice Library" section.

See references/voice-library-format.md for format spec.

Operations

Op 1 · Speaker Identification (声纹查询)

Input: audio → Output: who is speaking (or "unknown")

python3 scripts/voice_identify.py <audio_file> [--threshold 0.75]

Compares the audio against every reference file in voice-refs/ (*-ref*.*) using UniSpeech-SAT x-vector embeddings.

First run downloads model (~360MB) to /tmp/hf_models/.

Accuracy: Reliably separates male/female voices. Same-gender speakers need ≥5s audio for best results. Threshold 0.75 is default; raise to 0.85 for stricter matching.
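A minimal sketch of how a threshold such as 0.75 could turn per-speaker similarity scores into an answer. The function name `pick_speaker` and the shape of the `scores` dictionary are hypothetical; the actual decision logic lives in scripts/voice_identify.py and may differ.

```python
def pick_speaker(scores, threshold=0.75):
    """Return the best-scoring speaker, or "unknown" if no score reaches the threshold.

    `scores` maps speaker name -> similarity score (hypothetical input shape).
    """
    if not scores:
        return "unknown"
    name, best = max(scores.items(), key=lambda item: item[1])
    return name if best >= threshold else "unknown"


example = {"alice": 0.81, "bob": 0.42}
print(pick_speaker(example))        # alice
print(pick_speaker(example, 0.85))  # unknown
```

Raising the threshold trades missed matches for fewer false identifications, which is why the stricter 0.85 setting rejects the 0.81 score here.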

Op 2 · Add Voice to Library (声音入库)

Input: audio + speaker name → stores in voice library

  1. Copy audio to voice-refs/<speaker>-ref1.<ext>
  2. Transcribe to get reference text: whisper
  3. Add entry to TOOLS.md (see format in references/)
  4. Register speaker in voice_identify.py SPEAKER_MAP

Good reference audio: 10-15s clear speech, minimal noise, natural pace. 5s minimum.
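The length guidance above can be checked before a clip is added. This is a sketch using only the standard library; it reads PCM WAV files only (other formats would need converting with ffmpeg first), and both function names are hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def rate_reference(duration):
    """Classify a clip length against the guidance: 5s minimum, 10-15s preferred."""
    if duration < 5:
        return "too short"
    if duration > 15:
        return "trim to 15s"
    if duration >= 10:
        return "ideal"
    return "usable"
```

Calling `rate_reference(wav_duration_seconds(path))` gives a quick verdict; it says nothing about noise or speaking pace, which still need a listen.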

Op 3 · Voice Library CRUD (声音库管理)

  • List: Check TOOLS.md voice library section + ls voice-refs/
  • Add: See Op 2
  • Update: Replace file in voice-refs/, update TOOLS.md entry
  • Delete: Remove file from voice-refs/, remove TOOLS.md entry, remove from SPEAKER_MAP
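For the List operation, the files in voice-refs/ can be grouped by speaker. The sketch below assumes the `<speaker>-ref<N>.<ext>` naming used in the Op 4 example; the regular expression and the function name are illustrative, and TOOLS.md remains the place where metadata is recorded.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: <speaker>-ref<N>.<ext>, e.g. alice-ref1.wav
REF_PATTERN = re.compile(r"^(?P<speaker>.+)-ref\d*\.\w+$")


def list_voice_refs(ref_dir="voice-refs"):
    """Map speaker name -> sorted list of reference file names found in ref_dir."""
    library = defaultdict(list)
    for path in sorted(Path(ref_dir).iterdir()):
        match = REF_PATTERN.match(path.name)
        if path.is_file() and match:
            library[match.group("speaker")].append(path.name)
    return dict(library)
```

Comparing this listing with the TOOLS.md entries and SPEAKER_MAP is one way to spot a speaker that was deleted in one place but not the others.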

Op 4 · Voice Clone (声音克隆)

Input: text + library speaker → Output: audio in that speaker's voice

set -a; source <env_file_with_SF_API_KEY>; set +a

python3 scripts/cosyvoice_clone.py \
  --text "Text to speak" \
  --ref voice-refs/<speaker>-ref1.<ext> \
  --ref-text "What is said in reference audio" \
  --output /tmp/clone_output.wav

Long reference (>15s): truncate first with ffmpeg -y -i <ref_audio> -t 15 -ar 24000 -ac 1 /tmp/ref_trimmed.wav.
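The trim step can also be scripted. This sketch wraps the same ffmpeg options (`-t 15`, `-ar 24000`, `-ac 1`) in Python; the function names are hypothetical, and ffmpeg is assumed to be on the PATH.

```python
import subprocess


def build_trim_command(src, dst="/tmp/ref_trimmed.wav", seconds=15):
    """ffmpeg arguments that keep the first `seconds` of `src` as 24 kHz mono."""
    return ["ffmpeg", "-y", "-i", str(src), "-t", str(seconds),
            "-ar", "24000", "-ac", "1", str(dst)]


def trim_reference(src, dst="/tmp/ref_trimmed.wav", seconds=15):
    """Run ffmpeg; raises subprocess.CalledProcessError if it exits non-zero."""
    subprocess.run(build_trim_command(src, dst, seconds), check=True)
    return dst
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.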

Op 5 · Transcribe (纯转文字)

Input: audio → Output: text

whisper <audio_file> --model small --output_format txt --output_dir /tmp --language <lang>

Languages: zh (Chinese), en (English), ja (Japanese). Omit for auto-detect.
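A small sketch of building that command from Python, dropping `--language` when auto-detection is wanted. The helper names are hypothetical. `transcript_path` assumes the whisper CLI writes `<audio basename>.txt` into the output directory, which matches current openai-whisper behavior but is worth verifying on the installed version.

```python
from pathlib import Path


def build_whisper_command(audio_file, language=None, model="small", output_dir="/tmp"):
    """whisper CLI arguments; `--language` is omitted so whisper auto-detects."""
    cmd = ["whisper", str(audio_file), "--model", model,
           "--output_format", "txt", "--output_dir", str(output_dir)]
    if language:
        cmd += ["--language", language]
    return cmd


def transcript_path(audio_file, output_dir="/tmp"):
    """Where the .txt transcript is expected to land (assumed naming)."""
    return Path(output_dir) / (Path(audio_file).stem + ".txt")
```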

Op 6 · Transcribe + Identify (转文字+识别)

Input: audio → Output: who said what

Run Op 5 and Op 1 in parallel, report both results together.
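One way to run the two operations side by side is a two-worker thread pool. In this sketch `transcribe` and `identify` are caller-supplied callables (for example, thin wrappers that shell out to whisper and scripts/voice_identify.py); they are placeholders, not functions the skill provides.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_and_identify(audio_path, transcribe, identify):
    """Run both callables on the same file concurrently and return a combined report."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(transcribe, audio_path)
        speaker_future = pool.submit(identify, audio_path)
        # .result() re-raises any exception from the worker thread.
        return {"speaker": speaker_future.result(), "text": text_future.result()}
```

Threads are sufficient if each callable mostly waits on an external process; CPU-bound in-process inference would need a different approach.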

Op 7 · Speaker Verification (声纹验证)

Input: two audio files → Output: same person or not

python3 scripts/voice_identify.py <audio_1> --threshold 0.75
python3 scripts/voice_identify.py <audio_2> --threshold 0.75

Compare the top-ranked speaker from both runs. If both runs return the same library speaker → same person. Two "unknown" results do not establish a match.

For direct pairwise comparison without a library, extract embeddings and compute cosine similarity (see voice_identify.py internals).
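For reference, cosine similarity between two embedding vectors can be computed as below. This sketch uses plain Python lists and does not show how the embeddings are extracted; reusing 0.75 as the pairwise threshold is an assumption carried over from the library default, not a value the source gives for this case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.75):
    """True when the two embeddings are at least `threshold` similar (assumed cutoff)."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```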

Op 8 · Voice Swap (声音换皮)

Input: audio + library speaker → Output: same words, different voice

  1. Transcribe input audio (Op 5)
  2. Clone with target speaker's voice (Op 4), using transcribed text
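The second step can be sketched as a command builder around scripts/cosyvoice_clone.py, using only the flags shown in Op 4. The function names are hypothetical, and `ref_text` is the transcript of the reference clip, not of the input audio.

```python
def build_clone_command(text, ref_audio, ref_text, output="/tmp/clone_output.wav"):
    """Argument list for scripts/cosyvoice_clone.py with the Op 4 flags."""
    return ["python3", "scripts/cosyvoice_clone.py",
            "--text", text,
            "--ref", ref_audio,
            "--ref-text", ref_text,
            "--output", output]


def voice_swap_command(transcript, ref_audio, ref_text, output="/tmp/clone_output.wav"):
    """Clone the transcribed words in the target speaker's voice."""
    text = transcript.strip()
    if not text:
        raise ValueError("transcription is empty; nothing to clone")
    return build_clone_command(text, ref_audio, ref_text, output)
```

The resulting list can be handed to `subprocess.run(..., check=True)` once SF_API_KEY is present in the environment.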

Op 9 · Persona Voice Reply — from Audio (人格化语音回复·语音版)

Input: audio question + library speaker → Output: AI answer in that speaker's voice

  1. Transcribe the question (Op 5)
  2. Generate answer text via LLM
  3. Clone answer with target speaker's voice (Op 4)

Op 10 · Persona Voice Reply — from Text (人格化语音回复·文字版)

Input: text question + library speaker → Output: AI answer in that speaker's voice

  1. Generate answer text via LLM
  2. Clone answer with target speaker's voice (Op 4)
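Op 9 and Op 10 differ only in whether the question is transcribed first, so both can share one pipeline. In this sketch `transcribe`, `generate_answer`, and `clone` are caller-supplied callables standing in for Op 5, the LLM call, and Op 4; none of them is an API defined by the skill.

```python
def persona_reply(speaker_ref, generate_answer, clone,
                  question_text=None, question_audio=None, transcribe=None):
    """Answer a question in the target speaker's voice.

    Pass `question_text` for the text variant (Op 10), or `question_audio`
    plus `transcribe` for the audio variant (Op 9).
    """
    if question_text is None:
        if question_audio is None or transcribe is None:
            raise ValueError("need question_text, or question_audio with transcribe")
        question_text = transcribe(question_audio)
    answer = generate_answer(question_text)
    return clone(answer, speaker_ref)
```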

Send Audio (Feishu)

set -a; source <env_file>; set +a
bash scripts/feishu_send_audio.sh <wav_file> <receive_id>

Converts wav → opus, uploads, sends as voice message.

Requires FEISHU_APP_ID + FEISHU_APP_SECRET env vars.
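A quick pre-flight check for those two variables can save a failed upload. The helper name is hypothetical; it only inspects the environment and does not contact Feishu.

```python
import os

REQUIRED_VARS = ("FEISHU_APP_ID", "FEISHU_APP_SECRET")


def missing_feishu_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means both credentials are present and scripts/feishu_send_audio.sh can be invoked.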

Extract Audio from Video

ffmpeg -y -i <video_file> -vn -ar 24000 -ac 1 /tmp/extracted_audio.wav

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-07 06:46, safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Suspicious
