Generate transcriptions from YouTube videos using vlmrun for speech-to-text and optional timestamps. This skill:
Refer to vlmrun-cli-skill for vlmrun CLI setup, environment variables, and all vlmrun chat options.
.env for API key.env (or .env.local) contains VLMRUN_API_KEY.vlmrun commands.-i )..mp4). For YouTube, the skill first downloads the video with yt-dlp, then passes the file to vlmrun.vlmrun chat "Transcribe this video with timestamps for each section. Output the full transcript in a clear, readable format." -i -o .transcript.txt).vlmrun[cli])> See vlmrun-cli-skill for detailed vlmrun usage and examples (including video transcription).
From the youtube-transcription-generator directory:
Windows (PowerShell):
cd path\to\youtube-transcription-generator
uv venv
.venv\Scripts\Activate.ps1
uv pip install -r requirements.txt
macOS/Linux:
cd path/to/youtube-transcription-generator
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
Copy .env_template to .env and set VLMRUN_API_KEY.
# From youtube-transcription-generator directory, with venv activated
python scripts/run_transcription.py "https://www.youtube.com/watch?v=VIDEO_ID" -o ./output
This will:
output/transcript.txt (and keep artifacts in output/).# 1) Download with yt-dlp
yt-dlp -f "bv*[ext=mp4]+ba/best[ext=mp4]/best" -o video.mp4 "https://www.youtube.com/watch?v=VIDEO_ID"
# 2) Transcribe with vlmrun (see vlmrun-cli-skill for options)
vlmrun chat "Transcribe this video with timestamps for each section. Output the full transcript in a clear, readable format." -i video.mp4 -o ./output
Capture the vlmrun stdout and save it as your transcript, or use --json if you need structured output.
"Transcribe this video with timestamps for each section. Output the full transcript in a clear, readable format."
"Transcribe everything said in this video. Output only the spoken text, no timestamps."
Use --json and ask for a structured format in the prompt (e.g. list of { "time": "...", "text": "..." }).
vlmrun is installed and VLMRUN_API_KEY is set (see vlmrun-cli-skill).uv pip install -r requirements.txt (includes vlmrun[cli] and yt-dlp).python scripts/run_transcription.py -o ./output or download + vlmrun manually.output/transcript.txt). Activate the venv and run: uv pip install "vlmrun[cli]". See vlmrun-cli-skill.
Verify VLMRUN_API_KEY in .env or the current shell.
Update yt-dlp: uv pip install -U yt-dlp. Check the URL is a valid public YouTube video.
Use audio-only download in the script (e.g. -f bestaudio) to reduce size and speed up transcription.
共 1 个版本