audio-quality-check

Analyze audio recording quality: echo detection, loudness, speech intelligibility, SNR, spectral analysis. Use when the user wants to check a recording's quality.
tenequm
Uncategorized · clawhub · v0.1.1 · 1 version · 100000 · Key: not required

Overview

Audio Recording Quality Analyzer

Comprehensive audio quality analysis for call recordings. Handles dual-track M4A files (system audio + mic), single-track recordings, and AEC-processed files.

Quick Start

Run the bundled analysis script on a recording directory:

python <skill-path>/scripts/analyze_recording.py "/path/to/recording/directory"

Modes for focused analysis:

python <skill-path>/scripts/analyze_recording.py /path --tracks   # track info only
python <skill-path>/scripts/analyze_recording.py /path --echo     # echo detection only
python <skill-path>/scripts/analyze_recording.py /path --quality  # quality metrics (skip echo)

For Blackbox recordings, the directory is typically:

~/Library/Application Support/Blackbox/Recordings//

Dependencies

System: ffmpeg, ffprobe (brew install ffmpeg)

Python: numpy, soundfile, scipy, pyloudnorm, pesq, pystoi, librosa

Install all Python deps: pip3 install numpy soundfile scipy pyloudnorm pesq pystoi librosa

What Each Metric Tells You

EBU R128 Loudness (pyloudnorm)

  • What: Perceptual loudness in LUFS (Loudness Units relative to Full Scale)
  • Target: -16 to -24 LUFS for speech
  • Watch for: AEC/post-processed tracks being significantly louder than originals (indicates the processing is amplifying without normalizing)

Echo Detection - Autocorrelation

  • What: Detects delayed copies of the signal within a single track by correlating the signal with itself at various time offsets
  • How to read: Peaks in the 20-100ms range with correlation > 0.3 indicate signal duplication. The lag tells you the delay of the duplicate copy
  • Key insight: If you see a consistent peak at the same lag across multiple time segments, that's a systematic duplication (e.g., a virtual audio processor like Krisp introducing a delayed copy at ~53ms)
  • Normal values: Peaks below 0.15 are typically speech pitch harmonics (harmless). Peaks above 0.3 at consistent lags are echo
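The "consistent peak across multiple time segments" check can be sketched with NumPy alone. This is a minimal illustration, not the bundled script's implementation: the helper names `autocorr_peak` and `consistent_echo`, the 1-second segment length, and the 1 ms lag tolerance are all assumptions made for the example. It runs on a synthetic signal (white noise plus a half-amplitude copy delayed by 53 ms) rather than on a real recording.

```python
import numpy as np

def autocorr_peak(seg, sr, min_ms=20.0, max_ms=100.0):
    """Return (lag_ms, r) for the strongest normalized autocorrelation value in the lag window."""
    seg = seg - np.mean(seg)
    n = len(seg)
    # FFT-based autocorrelation; zero-pad to 2n so lags do not wrap around
    spec = np.fft.rfft(seg, 2 * n)
    ac = np.fft.irfft(spec * np.conj(spec))[:n]
    ac = ac / (ac[0] + 1e-12)  # normalize so that lag 0 equals 1
    lo = int(round(min_ms * sr / 1000))
    hi = int(round(max_ms * sr / 1000))
    k = lo + int(np.argmax(ac[lo:hi]))
    return 1000.0 * k / sr, float(ac[k])

def consistent_echo(data, sr, seg_s=1.0, r_min=0.3, tol_ms=1.0):
    """True if every segment peaks above r_min at (nearly) the same lag.

    seg_s and tol_ms are hypothetical defaults chosen for this sketch.
    """
    step = int(seg_s * sr)
    peaks = [autocorr_peak(data[i:i + step], sr)
             for i in range(0, len(data) - step + 1, step)]
    if not peaks:
        return False, peaks
    lags = np.array([p[0] for p in peaks])
    rs = np.array([p[1] for p in peaks])
    same_lag = (lags.max() - lags.min()) <= tol_ms
    return bool(np.all(rs > r_min) and same_lag), peaks

# Synthetic check: white noise plus a half-amplitude copy delayed by 53 ms
sr = 16000
rng = np.random.default_rng(0)
dry = rng.standard_normal(5 * sr)
delay = 848  # 53 ms at 16 kHz
wet = dry.copy()
wet[delay:] += 0.5 * dry[:-delay]

echo_found, echo_peaks = consistent_echo(wet, sr)
clean_found, _ = consistent_echo(dry, sr)
print("delayed copy:", echo_found, "| clean noise:", clean_found)
```

Requiring the same lag in every segment is what separates a systematic duplicate from pitch harmonics, whose lag drifts as the speaker's pitch changes.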

Cross-Track Correlation

  • What: Measures how much one track's content appears in another (e.g., system audio bleeding into the mic track)
  • How to read: Values near 0 mean no bleed. Values above 0.1 indicate the mic is picking up system audio
  • Coherence: Frequency-domain version of the same test. Voice-band coherence (300-3400Hz) is most relevant for speech echo
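As a rough sketch of the time-domain half of this test, the snippet below computes a normalized cross-correlation between two tracks over positive lags (mic delayed relative to system audio). The function name `cross_track_peak` and the 200 ms search window are assumptions for illustration; the bundled script may compute this differently, and the coherence measurement is not covered here. The tracks are synthetic white noise, with 30% of the "system" track leaked into the "mic" track 30 ms late.

```python
import numpy as np

def cross_track_peak(system, mic, sr, max_lag_ms=200.0):
    """Return (lag_ms, r): the lag where `system` best matches `mic`, and the correlation there."""
    n = min(len(system), len(mic))
    s = system[:n] - np.mean(system[:n])
    m = mic[:n] - np.mean(mic[:n])
    # FFT cross-correlation: xc[k] = sum_t s[t] * m[t + k], zero-padded to avoid wrap-around
    size = 2 * n
    xc = np.fft.irfft(np.conj(np.fft.rfft(s, size)) * np.fft.rfft(m, size))
    max_lag = int(round(max_lag_ms * sr / 1000))
    # Normalize by both track energies so the result behaves like a correlation coefficient
    xc = xc[:max_lag + 1] / (np.sqrt(np.sum(s ** 2) * np.sum(m ** 2)) + 1e-12)
    k = int(np.argmax(np.abs(xc)))
    return 1000.0 * k / sr, float(xc[k])

# Synthetic tracks: the mic picks up 30% of the system audio, 30 ms late
sr = 16000
rng = np.random.default_rng(1)
system = rng.standard_normal(3 * sr)
voice = rng.standard_normal(3 * sr)
delay = 480  # 30 ms at 16 kHz
mic_bleed = voice.copy()
mic_bleed[delay:] += 0.3 * system[:-delay]

bleed_lag_ms, bleed_r = cross_track_peak(system, mic_bleed, sr)
_, clean_r = cross_track_peak(system, voice, sr)
print(f"bleed: r={bleed_r:.3f} at {bleed_lag_ms:.1f} ms | clean: r={clean_r:.3f}")
```

Searching only positive lags reflects the physical setup: bleed reaches the mic after the system audio is played, never before.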

PESQ - Speech Quality (requires reference + degraded)

  • What: ITU-T P.862 standard. Gives a MOS (Mean Opinion Score) comparing a degraded signal against a reference
  • Scale: 1.0 (bad) to 4.5 (excellent). NB = narrowband (phone quality), WB = wideband
  • Use for: Comparing AEC-processed mic vs original mic to see if processing helps or hurts
  • Thresholds: 4.0+ excellent, 3.0-4.0 good, 2.5-3.0 fair, <2.5 poor

STOI - Speech Intelligibility (requires reference + degraded)

  • What: Short-Time Objective Intelligibility. Measures how understandable speech remains after processing
  • Scale: 0.0 to 1.0
  • Thresholds: >0.8 good, >0.6 fair, <0.6 poor
  • Key insight: If STOI drops significantly between original and processed, the processing is degrading intelligibility
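The PESQ and STOI thresholds listed above can be turned into a small lookup for reporting. This is only a sketch: the function names are hypothetical, and the 0.05 cut-off for a "significant" STOI drop is an assumption, since the text does not quantify it. Scores are assumed to come from the `pesq` and `pystoi` packages or any other implementation.

```python
def rate_pesq(mos):
    """Map a PESQ MOS score to the labels used in this document."""
    if mos >= 4.0:
        return "excellent"
    if mos >= 3.0:
        return "good"
    if mos >= 2.5:
        return "fair"
    return "poor"

def rate_stoi(score):
    """Map a STOI score (0.0 to 1.0) to the labels used in this document."""
    if score > 0.8:
        return "good"
    if score > 0.6:
        return "fair"
    return "poor"  # a score of exactly 0.6 is treated as poor here (assumption)

def stoi_degraded(original, processed, min_drop=0.05):
    """True if processing lowered STOI by more than min_drop (hypothetical cut-off)."""
    return (original - processed) > min_drop

print(rate_pesq(3.4), rate_stoi(0.72), stoi_degraded(0.91, 0.78))
```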

Spectral Analysis (librosa)

  • Centroid: Average frequency weighted by amplitude. Higher = brighter/harsher audio
  • Rolloff (85%): Frequency below which 85% of spectral energy sits. Lower = more bass-heavy
  • Zero-crossing rate: How often the signal crosses zero. Higher = noisier signal. Speech is typically 0.05-0.20; values above 0.30 suggest significant noise
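To make the three definitions concrete, here is a NumPy-only sketch that computes them over a whole signal. It is a swapped-in simplification, not librosa's method: librosa works frame by frame and its exact weighting may differ, so these numbers will not match its output exactly. The function name `spectral_summary` is an assumption, and rolloff is computed on power to follow the "spectral energy" wording above.

```python
import numpy as np

def spectral_summary(y, sr, rolloff_pct=0.85):
    """Whole-signal centroid (Hz), rolloff (Hz), and zero-crossing rate (crossings per sample)."""
    mag = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    # Centroid: amplitude-weighted mean frequency
    centroid = float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    # Rolloff: lowest frequency below which rolloff_pct of the spectral energy sits
    cumulative = np.cumsum(mag ** 2)
    idx = min(int(np.searchsorted(cumulative, rolloff_pct * cumulative[-1])), len(freqs) - 1)
    rolloff = float(freqs[idx])
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    zcr = float(np.mean(np.signbit(y[1:]) != np.signbit(y[:-1])))
    return centroid, rolloff, zcr

# Synthetic inputs: a pure 1 kHz tone and white noise, one second each
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
noise = np.random.default_rng(2).standard_normal(sr)

tone_centroid, tone_rolloff, tone_zcr = spectral_summary(tone, sr)
_, _, noise_zcr = spectral_summary(noise, sr)
print(f"tone: centroid={tone_centroid:.0f} Hz, rolloff={tone_rolloff:.0f} Hz, zcr={tone_zcr:.3f}")
print(f"noise: zcr={noise_zcr:.3f}")
```

The tone crosses zero twice per 16-sample period, while white noise changes sign on roughly half of all sample pairs, which is why a high zero-crossing rate points to a noisy signal.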

SNR - Signal-to-Noise Ratio

  • What: Ratio of speech energy to background noise energy (estimated via energy-based VAD)
  • Thresholds: >20dB excellent, >15dB good, >10dB fair, <10dB poor
  • Note: This measures background noise, not echo. A recording can have excellent SNR but still have echo problems
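An energy-based SNR estimate of this kind can be sketched as follows. The percentile rule is a hypothetical stand-in for the script's VAD, which is not described in detail here: the loudest quarter of 20 ms frames is treated as speech and the quietest quarter as background noise. Speech frames still contain noise, so the result is strictly (speech + noise) / noise, which is close to the true SNR when speech dominates.

```python
import numpy as np

def estimate_snr_db(y, sr, frame_ms=20.0, speech_pct=75, noise_pct=25):
    """Rough SNR in dB: mean power of the loudest frames over mean power of the quietest."""
    frame = int(sr * frame_ms / 1000)
    n_frames = len(y) // frame
    power = np.mean(y[:n_frames * frame].reshape(n_frames, frame) ** 2, axis=1)
    # Hypothetical VAD rule: percentile thresholds on per-frame power
    speech = power[power >= np.percentile(power, speech_pct)]
    noise = power[power <= np.percentile(power, noise_pct)]
    return float(10 * np.log10((np.mean(speech) + 1e-12) / (np.mean(noise) + 1e-12)))

# Synthetic recording: 4 s of background noise, with a 200 Hz tone standing in
# for speech during the second half, scaled for a true SNR of 20 dB
sr = 16000
rng = np.random.default_rng(3)
t = np.arange(4 * sr) / sr
y = 0.01 * rng.standard_normal(4 * sr)  # noise power 1e-4
y[2 * sr:] += np.sqrt(0.02) * np.sin(2 * np.pi * 200 * t[2 * sr:])  # tone power 1e-2

snr_db = estimate_snr_db(y, sr)
print(f"estimated SNR: {snr_db:.1f} dB")
```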

Per-Minute Energy

  • What: RMS energy and voice-band energy per minute of recording
  • Use for: Spotting segments that went silent (mic cut out), got unexpectedly loud (clipping risk), or had activity patterns that help identify when speakers were active
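A per-minute level scan can be sketched in a few lines of NumPy. This covers only the broadband RMS part; the voice-band energy mentioned above would need a band-pass filter first and is left out. The function names and the 20 dB drop threshold are assumptions for illustration, not values taken from the bundled script.

```python
import numpy as np

def per_minute_rms_db(y, sr):
    """RMS level in dBFS for each full or partial minute of a mono signal."""
    step = 60 * sr
    levels = []
    for start in range(0, len(y), step):
        chunk = y[start:start + step]
        rms = np.sqrt(np.mean(chunk ** 2))
        levels.append(float(20 * np.log10(rms + 1e-10)))
    return levels

def sudden_drops(levels_db, drop_db=20.0):
    """Indices of minutes whose level fell by more than drop_db versus the previous minute."""
    return [i for i in range(1, len(levels_db))
            if levels_db[i - 1] - levels_db[i] > drop_db]

# Synthetic 3-minute track whose middle minute is silent (e.g. the mic cut out)
sr = 1000  # low sample rate keeps the example small
rng = np.random.default_rng(4)
track = 0.1 * rng.standard_normal(3 * 60 * sr)
track[60 * sr:120 * sr] = 0.0

levels = per_minute_rms_db(track, sr)
dropped = sudden_drops(levels)
print([round(v, 1) for v in levels], "-> drop at minute index", dropped)
```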

Manual Analysis Recipes

When you need analysis beyond what the script provides, these patterns are useful.

Extract individual tracks from dual-track M4A

ffmpeg -y -i audio.m4a -map 0:0 -ac 1 -ar 16000 /tmp/system.wav
ffmpeg -y -i audio.m4a -map 0:1 -ac 1 -ar 16000 /tmp/mic.wav

Quick level check with sox (reports peak and RMS amplitude, not LUFS; sox must be installed separately)

sox audio.wav -n stat 2>&1

Check specific time range for echo (Python)

import numpy as np
import soundfile as sf
from scipy import signal

data, sr = sf.read('/tmp/system.wav')
if data.ndim > 1:
    data = data.mean(axis=1)  # downmix to mono if the file has several channels

# Analyze 5 seconds starting at 2 minutes
start = 120 * sr
seg = data[start:start + 5 * sr]
if len(seg) == 0:
    raise SystemExit("Recording is shorter than 2 minutes; pick an earlier start")
seg_norm = seg / (np.max(np.abs(seg)) + 1e-10)

# FFT-based autocorrelation (much faster than np.correlate on a 5 s segment)
autocorr = signal.correlate(seg_norm, seg_norm, mode='full', method='fft')
mid = len(seg_norm) - 1  # index of zero lag
autocorr = autocorr / (autocorr[mid] + 1e-10)

# Check 20-100ms range for echo peaks
min_lag = int(0.020 * sr)
max_lag = int(0.100 * sr)
region = autocorr[mid + min_lag:mid + max_lag]
peaks, props = signal.find_peaks(region, height=0.1)
for p, height in zip(peaks[:5], props['peak_heights'][:5]):
    lag_ms = (p + min_lag) / sr * 1000
    print(f"  Peak at {lag_ms:.1f}ms, r={height:.3f}")

Common Issues and What Causes Them

| Symptom | Likely cause | What to check |
|---|---|---|
| Speakers sound slightly doubled/echoed | Virtual audio processor (Krisp) creating delayed copy in system audio | Autocorrelation: consistent peak at 40-60ms |
| Mic track has remote speakers' voices | Acoustic echo (speakers to mic) | Cross-track correlation > 0.1 |
| AEC-processed file sounds worse | DTLN-aec degrading signal quality | PESQ/STOI comparing original vs processed |
| AEC-processed file is too loud | Missing loudness normalization after processing | Loudness: processed > -10 LUFS |
| Recording has hiss/noise | Low SNR, noisy mic, or AGC artifacts | SNR < 15dB, high zero-crossing rate |
| Quiet segments mid-recording | Mic cut out or device changed | Per-minute energy: sudden RMS drop |

Version history

1 version in total

  • v0.1.1 (current)
    2026-05-07 06:26 · Security scan: safe
