Overview

Music Analysis (Local, No External APIs)

Primary tool: a full listen that combines snapshot analysis, structure, groove, harmonic tension, temporal mood mapping, and optional Whisper lyric alignment into one report.

1. Full Listen — primary / recommended

python3 skills/music-analysis/scripts/listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/listen.py track.mp3 --json
python3 skills/music-analysis/scripts/listen.py track.mp3 --out report.txt
python3 skills/music-analysis/scripts/listen.py track.mp3 --json --out report.json
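
For scripted use, the same invocations can be driven from Python. This is a minimal sketch that assumes only the flags shown above (`--json`, `--out`). The helper names `build_listen_command` and `run_full_listen` are hypothetical, and because the JSON report schema is not documented here, the sketch parses the output without assuming any keys.

```python
import json
import subprocess
import sys
from pathlib import Path

# Script path as used in the commands above; adjust to your checkout.
LISTEN_SCRIPT = Path("skills/music-analysis/scripts/listen.py")


def build_listen_command(audio_path, as_json=False, out_path=None):
    """Build the argv list for listen.py (hypothetical helper)."""
    cmd = [sys.executable, str(LISTEN_SCRIPT), str(audio_path)]
    if as_json:
        cmd.append("--json")
    if out_path is not None:
        cmd += ["--out", str(out_path)]
    return cmd


def run_full_listen(audio_path):
    """Run listen.py with --json and return the parsed report."""
    result = subprocess.run(
        build_listen_command(audio_path, as_json=True),
        capture_output=True, text=True, check=True,
    )
    # Assumption: with --json and no --out, the report is printed to stdout.
    return json.loads(result.stdout)
```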

What it does in one pass:

Snapshot analysis: tempo, pulse stability, swing proxy, key clarity, harmonic tension, timbre, structure
Whisper lyric transcription and filtering, run first: keep only real lyric text and drop artifact tags like [MUSIC]
Temporal listen: windowed energy / mood / tension journey
Synthesis layer that aligns lyrics with peak / tension / quiet windows and lets the lyric layer override the final vibe when confidence is high
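
The lyric filtering step can be pictured roughly as follows. This is a sketch, not the filter that `listen.py` actually ships: the function name `filter_lyric_segments` is hypothetical, and treating every bracketed or parenthesized span as an artifact is an assumption (it would also drop lyrics that are genuinely printed in parentheses).

```python
import re

# Assumption: artifact tags look like "[MUSIC]" or "(applause)".
_TAG = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def filter_lyric_segments(segments):
    """Strip artifact tags and keep only segments that still contain letters."""
    kept = []
    for seg in segments:
        # Remove tags, then collapse any leftover runs of whitespace.
        text = " ".join(_TAG.sub(" ", seg).split())
        # Segments with no alphabetic characters (empty, "♪♪") are dropped.
        if any(ch.isalpha() for ch in text):
            kept.append(text)
    return kept
```

Segments that are nothing but a tag disappear entirely, while a tag embedded in a real line is removed and the line is kept.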

Human-readable output structure

SNAPSHOT
  groove/pocket
  structure summary + repeated sections
  harmony (key clarity + tension)
  timbre descriptor tags
INSTRUMENT READ
  likely instrument palette (strong/likely/possible confidence)
  per-section instrument entrances and exits
  how instruments color the emotional feel
  written as natural language, not clinical data
TEMPORAL JOURNEY
  opening / middle / closing mood-energy-tension read
  peak / quietest / tensest moments
  mood journey and transition count
EMOTIONAL READ
  explainable emotion summary based on measured features
LYRICS
  Whisper segment count
  excerpt or graceful skip note
SYNTHESIS
  lyric-energy/tension alignment
  peak / tension / quiet lyric moments
ALIGNED TIMELINE
  per-window moments where transitions / lyrics / tension spikes occur
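
If the text report needs to be consumed by another script, it can be split on the section names listed above. This is a sketch under a stated assumption: each section name appears alone on its own line, exactly as spelled here. The real report may decorate its headers differently, in which case the `--json` output is the safer interface. `split_report` is a hypothetical helper.

```python
SECTION_HEADERS = (
    "SNAPSHOT", "INSTRUMENT READ", "TEMPORAL JOURNEY", "EMOTIONAL READ",
    "LYRICS", "SYNTHESIS", "ALIGNED TIMELINE",
)


def split_report(text):
    """Group non-empty report lines under the section header they follow."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SECTION_HEADERS:
            current = stripped
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return sections
```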

2. Snapshot Analysis — standalone

python3 skills/music-analysis/scripts/analyze_music.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/analyze_music.py track.mp3 --json

Reports:

tempo / pulse stability / pulse confidence / swing proxy / pocket
key estimate / key clarity / chroma entropy / harmonic change / tonal motion / tension
timbre descriptors (brightness, richness, low-end, contrast, dynamic range)
section labels (A/B/C...) and repeated material detection
explainable emotional read with reasons
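
How `analyze_music.py` computes chroma entropy is not stated here. One common reading is the Shannon entropy of the 12-bin pitch-class energy distribution: low values mean the energy sits in a few pitch classes (usually a clearer key), high values mean it is spread across many. The sketch below illustrates that reading only; `chroma_entropy` is a hypothetical name and the input is assumed to be non-negative.

```python
import math


def chroma_entropy(chroma):
    """Shannon entropy, in bits, of a 12-bin pitch-class energy vector."""
    if len(chroma) != 12:
        raise ValueError("expected 12 pitch-class bins")
    total = sum(chroma)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for value in chroma:
        p = value / total
        if p > 0:  # bins with zero energy contribute nothing
            entropy -= p * math.log2(p)
    return entropy
```

Under this definition the value ranges from 0 bits (all energy in one pitch class) to log2(12), about 3.58 bits, for a perfectly flat distribution.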

3. Temporal Listen — standalone

python3 skills/music-analysis/scripts/temporal_listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/temporal_listen.py track.mp3 --json

Reports:

sliding-window timeline (4s windows, 2s hops)
energy contour
mood labels
harmonic tension + tonal motion
transition types (drop hits, pulls back, tightens harmonically, shifts color, evolves)
narrative arc (mountain / ascending / descending / plateau / wave)
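
The 4 s window / 2 s hop timeline means consecutive windows overlap by half. A small sketch of how such window boundaries can be laid out is shown here; `window_spans` is a hypothetical helper, and truncating the final window at the end of the track is an assumption about edge handling that `temporal_listen.py` may treat differently.

```python
def window_spans(duration, window=4.0, hop=2.0):
    """Return (start, end) times in seconds for each analysis window."""
    if duration <= 0 or window <= 0 or hop <= 0:
        raise ValueError("duration, window and hop must be positive")
    spans = []
    start = 0.0
    while start < duration:
        # Clip the last window so it never runs past the end of the audio.
        spans.append((start, min(start + window, duration)))
        if start + window >= duration:
            break
        start += hop
    return spans
```

For a 10-second clip this yields four windows: 0-4 s, 2-6 s, 4-8 s, and 6-10 s.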

Interpretation rules

Structure labels are similarity labels, not verse/chorus claims.
Swing proxy is a feel estimate, not drummer-grade microtiming truth.
Emotion is explainable, derived from pulse + timbre + harmonic tension rather than a black-box mood guess.
Lyrics can override the final vibe when filtered Whisper text is confident and emotionally clear.
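
To make the first rule concrete: a similarity label only says "this section resembles that earlier one". The sketch below assigns letters that way, using cosine similarity between per-section feature vectors. It illustrates the idea and is not the script's algorithm; `label_sections`, the feature vectors, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_sections(features, threshold=0.9):
    """Give each section the letter of the first earlier section it resembles."""
    labels = []
    exemplars = []  # one feature vector per letter, in order of first appearance
    for vec in features:
        for index, exemplar in enumerate(exemplars):
            if cosine_similarity(vec, exemplar) >= threshold:
                labels.append(chr(ord("A") + index))
                break
        else:
            # No earlier section is similar enough: start a new letter.
            exemplars.append(vec)
            labels.append(chr(ord("A") + len(exemplars) - 1))
    return labels
```

A result such as A, B, A, B says the third section sounds like the first. It does not say which of them is the verse or the chorus.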

Audio sourcing

The tool needs a real audio file on disk.

Direct file (mp3, wav, flac, ogg, m4a — anything ffmpeg/librosa can read)
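YouTube / supported URLs: yt-dlp -x --audio-format mp3 -o "output.%(ext)s" "URL" (to search instead of passing a URL, use "ytsearch1:QUERY"; a bare search string is rejected under yt-dlp's default settings)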

Whisper lyrics transcription

listen.py uses:

CLI: /opt/homebrew/bin/whisper-cli
Model: ~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin
Preprocess: convert input to mono 16kHz WAV via ffmpeg
Fallback: skip gracefully if Whisper is missing or errors
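
The preprocess-then-fallback flow above might look like the following sketch. The paths are the ones listed in this section; the function names are hypothetical, and the exact arguments `listen.py` passes to `whisper-cli` are not shown in this document, so only the basic `-m` (model) and `-f` (input file) options are used here.

```python
import os
import shutil
import subprocess
import tempfile

WHISPER_CLI = "/opt/homebrew/bin/whisper-cli"
WHISPER_MODEL = os.path.expanduser(
    "~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin"
)


def ffmpeg_to_wav_command(src, dst):
    """ffmpeg argv that converts src to mono 16 kHz 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000",
            "-c:a", "pcm_s16le", dst]


def transcribe_or_skip(audio_path, cli=WHISPER_CLI, model=WHISPER_MODEL):
    """Return raw Whisper stdout, or None if a tool is missing or a step fails."""
    if not (os.path.isfile(cli) and os.path.isfile(model)
            and shutil.which("ffmpeg")):
        return None  # graceful skip: the caller carries on without lyrics
    with tempfile.TemporaryDirectory() as tmp:
        wav = os.path.join(tmp, "input.wav")
        try:
            subprocess.run(ffmpeg_to_wav_command(audio_path, wav),
                           check=True, capture_output=True)
            result = subprocess.run([cli, "-m", model, "-f", wav],
                                    check=True, capture_output=True, text=True)
        except (subprocess.CalledProcessError, OSError):
            return None
    return result.stdout
```

Returning `None` instead of raising keeps the rest of the report intact when Whisper is unavailable, which matches the "skip gracefully" behavior described above.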

Dependencies

Python:

librosa
numpy

System:

ffmpeg
ffprobe
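
A quick preflight check for the dependencies listed above can save a confusing failure mid-run. This is a sketch; `missing_dependencies` is a hypothetical helper and not part of the skill's scripts.

```python
import importlib.util
import shutil

PYTHON_PACKAGES = ("librosa", "numpy")
SYSTEM_TOOLS = ("ffmpeg", "ffprobe")


def missing_dependencies():
    """Return labels such as 'python:librosa' for anything not found."""
    missing = [f"python:{name}" for name in PYTHON_PACKAGES
               if importlib.util.find_spec(name) is None]
    missing += [f"system:{name}" for name in SYSTEM_TOOLS
                if shutil.which(name) is None]
    return missing
```

An empty list means both Python packages are importable and both tools are on `PATH`.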

Workspace hygiene

Keep temporary audio files in a dedicated temp/output folder for the skill.
Avoid modifying unrelated project files while working on audio analysis tasks.

Version history

3 versions in total

v3.0.2 (current): 2026-03-29 09:00
v1.1.0: 2026-03-26 21:41
v1.2.0: 2026-03-14 01:24
