A page embeds a Vimeo video using an iframe whose src is https://player.vimeo.com/video/{ID}?h={HASH}.
The video is "domain-restricted" — opening the player URL directly returns a privacy error,
and yt-dlp / browser automation also fail because Vimeo enforces the Referer check.
But you only need the transcript, not the video. Vimeo's auto-generated captions are
served from a separate signed URL that the player HTML embeds in a JSON config — and that
HTML is fetched whenever the Referer matches the allowed domain. So you can pull the
captions without ever playing the video.
Three commands. Replace {ID}, {HASH}, and {HOST} with values from the embed; step 2 uses the signed URL that step 1 prints.
# 1. Fetch the player HTML with the correct Referer and grep out the text_tracks JSON.
curl -s -H 'Referer: https://{HOST}/' \
'https://player.vimeo.com/video/{ID}?h={HASH}' \
| grep -oE '"text_tracks":\[[^]]*\]'
That returns something like:
"text_tracks":[{"id":303147651,"lang":"en-x-autogen",
"url":"https://captions.vimeo.com/captions/303147651.vtt?expires=...&sig=...",
"kind":"subtitles","label":"English (auto-generated)",
"provenance":"ai_generated","default":true}]
# 2. Download the VTT (the signed URL works without Referer).
curl -s 'https://captions.vimeo.com/captions/{CAP_ID}.vtt?expires=...&sig=...' \
-o /tmp/transcript.vtt
# 3. Strip WEBVTT cues/timestamps to get plain text.
awk '/-->/{next} /^[0-9]+$/{next} /^WEBVTT/{next} /^$/{next} {print}' \
/tmp/transcript.vtt > /tmp/transcript.txt
wc -w /tmp/transcript.txt
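The word count from step 3 is easier to judge next to the video's length. The sketch below is one way to get both from the VTT file itself, assuming cue timestamps look like HH:MM:SS.mmm or MM:SS.mmm. vtt_wpm is a hypothetical helper name, not part of any tool mentioned here.

```shell
#!/bin/sh
# vtt_wpm: hypothetical helper. Prints word count, duration, and words per
# minute for a WebVTT file, using the end time of the last cue as duration.
vtt_wpm() {
  awk '
    /-->/ {
      # Cue timing line: "start --> end"; the end time is the third field.
      n = split($3, t, ":")
      secs = 0
      for (i = 1; i <= n; i++) secs = secs * 60 + t[i]
      last = secs
      next
    }
    # Same skips as step 3: cue numbers, header, blank lines.
    /^[0-9]+$/ || /^WEBVTT/ || /^$/ { next }
    { words += NF }
    END {
      if (last > 0)
        printf "%d words, %.1f min, %.0f wpm\n", words, last / 60, words / (last / 60)
      else
        print "no cues found"
    }
  ' "$1"
}

# Usage: vtt_wpm /tmp/transcript.vtt
```

A rate far below ordinary conversational speech (often quoted at roughly 150 words per minute) may mean the captions are truncated or the download was cut short.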
From the embed iframe's src attribute on the host page:
https://player.vimeo.com/video/1195836424?h=a0154d0f4b
                               └───┬────┘   └───┬────┘
                                   ID          HASH
HOST is the hostname of the page that embeds the iframe (e.g. www.coatue.com). If you
don't have the iframe URL, open the host page in playwright/devtools and inspect:
[...document.querySelectorAll('iframe')].map(f => f.src)
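Once you have the iframe src and the page URL, the three placeholders can be split out in the shell instead of by eye. This is a minimal sketch using POSIX parameter expansion; parse_embed and page_host are hypothetical helper names, and the sketch assumes the src has the /video/{ID}?h={HASH} shape shown above.

```shell
#!/bin/sh
# parse_embed: hypothetical helper. Splits an iframe src into ID and HASH.
parse_embed() {
  src=$1
  id=${src#*/video/}        # drop everything through "/video/"
  id=${id%%[?/]*}           # keep up to the first "?" or "/"
  case $src in
    *[?\&]h=*)
      hash=${src#*[?\&]h=}  # drop everything through "h="
      hash=${hash%%\&*}     # keep up to the next "&", if any
      ;;
    *) hash= ;;             # no h= parameter present
  esac
  printf 'ID=%s HASH=%s\n' "$id" "$hash"
}

# page_host: hypothetical helper. Prints the hostname of a page URL.
page_host() {
  host=${1#*://}            # drop the scheme
  printf '%s\n' "${host%%/*}"
}

parse_embed 'https://player.vimeo.com/video/1195836424?h=a0154d0f4b'
# prints: ID=1195836424 HASH=a0154d0f4b
page_host 'https://www.coatue.com/blog/video/interview-with-claude-code-creator'
# prints: www.coatue.com
```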
The downloaded file is a WEBVTT header followed by numbered cues.
The captions are auto-generated ("provenance":"ai_generated"). Proper nouns and brand-name acronyms are often misheard ("Computer Use" → "CP", "Cherny" → "Cherney").
Re-read the transcript with that bias in mind, especially for names, product
codenames, and numbers.
The signed URL carries an expires parameter. It is typically valid for many days, but if it 403s, re-run step 1 to get a fresh signature.
If text_tracks returns [], the video has no captions enabled. Falling back to yt-dlp --write-auto-subs won't help (same Referer block); use Whisper on a screen recording instead.
Vimeo checks the Referer header only; no further protection sits on the player's HTML config payload, which leaks the captions URL. This has been
the behavior for years; if Vimeo ever closes it, the fallback is to drive the embed
inside playwright with the correct Referer and read player.getTextTracks() via the
Vimeo Player API.
text_tracks is an array: check the lang field for other available subtitle tracks beyond auto-generated English.
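To see every track rather than just the first, the step 1 output can be split per track. The sketch below prints one lang and URL pair per line. list_tracks is a hypothetical helper name. It assumes each track object is flat (no nested braces), as in the sample output, and it undoes \u0026 and \/ escapes in case the embedded JSON uses them.

```shell
#!/bin/sh
# list_tracks: hypothetical helper. Reads player HTML (or just the
# text_tracks snippet) on stdin and prints "lang<TAB>url" per caption track.
list_tracks() {
  grep -oE '"text_tracks":\[[^]]*\]' \
    | grep -oE '\{[^}]*\}' \
    | while IFS= read -r track; do
        lang=$(printf '%s\n' "$track" | sed -n 's/.*"lang":"\([^"]*\)".*/\1/p')
        # Pull the url field, then undo JSON escapes for "&" and "/".
        url=$(printf '%s\n' "$track" \
          | sed -n 's/.*"url":"\([^"]*\)".*/\1/p' \
          | sed -e 's/\\u0026/\&/g' -e 's#\\/#/#g')
        printf '%s\t%s\n' "$lang" "$url"
      done
}

# Usage (same request as step 1):
#   curl -s -H 'Referer: https://{HOST}/' \
#     'https://player.vimeo.com/video/{ID}?h={HASH}' | list_tracks
```

Piping the result through awk -F'\t' '$1 == "en-x-autogen" {print $2}' would select the auto-generated English URL for step 2.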
Coatue × Boris Cherny interview, May 2026:
Host page: https://www.coatue.com/blog/video/interview-with-claude-code-creator
Player URL: https://player.vimeo.com/video/1195836424?h=a0154d0f4b
Referer: https://www.coatue.com/ → captions URL leaked
Reference: Vimeo Player API (useful as a fallback when the HTML scrape stops working)