Authenticated Paper Fetcher
Use this skill to fetch individual academic PDFs through access the user already has: open-access copies, an institutional library proxy, an authenticated local browser profile, or a user-provided remote browser session over the Chrome DevTools Protocol (CDP).
Boundaries
- Proceed only for content the user is authorized to access or that is open access.
- Never ask for or store passwords, SSO secrets, 2FA codes, session cookies, or publisher API keys in chat.
- Do not bypass paywalls, CAPTCHAs, rate limits, DRM, robots controls, or account restrictions.
- Do not bulk-download unless the user confirms that the library or publisher license permits it. For mining-scale requests, prefer the publisher's official text-and-data-mining (TDM) API.
- Treat browser profiles, CDP endpoints, and downloaded PDFs as sensitive. Do not print tokens or cookie values.
- Before using a cloud browser for university login, tell the user to confirm their school permits entering SSO credentials into that provider.
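A minimal JavaScript sketch of one way to honor the "do not print tokens" rule when a CDP endpoint has to appear in a log, assuming that anything beyond the scheme, host, and port (user info, path, query, fragment) may hold a secret. `redactEndpoint` is a hypothetical name, not part of scripts/fetch-paper.mjs; it uses only Node's built-in `URL` class.

```javascript
// Hypothetical helper: reduce a CDP/WebSocket endpoint to a loggable form.
// Assumption: anything beyond scheme, host, and port may contain a secret.
function redactEndpoint(raw) {
  let url;
  try {
    url = new URL(raw); // WHATWG URL parser, global in Node.js
  } catch {
    return "<invalid endpoint>";
  }
  // url.host never includes "user:password@" credentials, but keeps the port.
  const origin = `${url.protocol}//${url.host}`;
  const hasSecretParts =
    url.pathname !== "/" || url.search !== "" || url.hash !== "";
  return hasSecretParts ? `${origin}/<redacted>` : origin;
}

console.log(redactEndpoint("wss://user:pw@cdp.example.com:9222/session/abc?token=secret"));
// wss://cdp.example.com:9222/<redacted>
```

Keeping the host visible still lets the user tell which provider or port was used when a connection fails, without exposing the session token.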
Preferred Workflow
- Normalize the request to a DOI, publisher URL, or library permalink.
- If the user gives an open-access URL or DOI, try normal direct retrieval first.
- If institutional access is needed, prefer a local persistent browser profile:
node /scripts/fetch-paper.mjs --url "" --out papers --pause-for-login
- If the user has a school proxy prefix, pass it explicitly:
node /scripts/fetch-paper.mjs --doi "" --proxy-prefix "https://ezproxy.example.edu/login?url=" --out papers --pause-for-login
- If the user provides a cloud browser or remote Chrome CDP endpoint, read references/cloud-browser-options.md first, then set PAPER_FETCH_CDP_ENDPOINT or pass --cdp.
- For SpringerLink URLs, after the authenticated page loads, the helper tries the PDF links on the page and the usual link.springer.com/content/pdf/.pdf pattern.
- Save the PDF and its sidecar metadata JSON. Report the saved path, the final article URL, and any entitlement or login problem.
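The normalization and URL-building steps in this workflow could be sketched as follows. This is a hypothetical illustration, not the helper's code: `extractDoi`, `candidateUrls`, and `springerPdfUrl` are illustrative names, the DOI regex is a simplification, the proxy prefix is assumed to take the target URL appended verbatim (as in the EZproxy example), and the SpringerLink path is assumed to be /content/pdf/ followed by the article DOI.

```javascript
// Hypothetical helpers, not the code of scripts/fetch-paper.mjs.
// Assumptions: a DOI is "10." + registrant code + "/" + suffix; the proxy
// prefix takes the target URL appended verbatim; the SpringerLink PDF path
// is /content/pdf/<article DOI>.pdf.

// Matches a DOI anywhere in a string: bare, "doi:"-prefixed, or inside a URL.
// Simplification: stops at whitespace, quotes, angle brackets, "?" and "#".
const DOI_PATTERN = /10\.\d{4,9}(?:\.\d+)*\/[^\s"<>?#]+/;

function extractDoi(input) {
  const match = String(input).match(DOI_PATTERN);
  // Drop trailing sentence punctuation that often sticks to pasted DOIs.
  return match ? match[0].replace(/[.,;]+$/, "") : null;
}

// Direct retrieval first; the proxied URL is only a fallback.
function candidateUrls(input, proxyPrefix = "") {
  const trimmed = String(input).trim();
  const doi = extractDoi(trimmed);
  const direct = /^https?:\/\//i.test(trimmed)
    ? trimmed
    : doi ? `https://doi.org/${doi}` : null;
  if (!direct) return [];
  return proxyPrefix ? [direct, proxyPrefix + direct] : [direct];
}

// SpringerLink article page -> conventional PDF URL, or null for other pages.
// Throws (via new URL) if articleUrl is not an absolute URL.
function springerPdfUrl(articleUrl) {
  const url = new URL(articleUrl);
  const m = url.pathname.match(/^\/article\/(10\..+)$/);
  if (url.hostname !== "link.springer.com" || !m) return null;
  return `https://link.springer.com/content/pdf/${m[1]}.pdf`;
}

console.log(candidateUrls("doi:10.1007/s00134-020-06033-2", "https://ezproxy.example.edu/login?url="));
console.log(springerPdfUrl("https://link.springer.com/article/10.1007/s00134-020-06033-2"));
// https://link.springer.com/content/pdf/10.1007/s00134-020-06033-2.pdf
```

Direct retrieval comes first in the returned list and the proxied URL second, mirroring the order of the workflow; a caller would move to the next candidate only when the previous one fails.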
Helper Script
Use scripts/fetch-paper.mjs for repeatable retrieval.
Examples (the CDP example sets the environment variable using PowerShell syntax):
node <skill-dir>/scripts/fetch-paper.mjs --doi "10.1007/s00134-020-06033-2" --out papers --pause-for-login
node <skill-dir>/scripts/fetch-paper.mjs --url "https://link.springer.com/article/10.1007/s00134-020-06033-2" --out papers --headless
$env:PAPER_FETCH_CDP_ENDPOINT="wss://<redacted-remote-browser-endpoint>"
node <skill-dir>/scripts/fetch-paper.mjs --url "https://link.springer.com/article/<doi>" --out papers
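A sketch of how a wrapper might decide between a remote CDP session and the local profile. The precedence (an explicit `--cdp <endpoint>` argument overrides PAPER_FETCH_CDP_ENDPOINT) is an assumption, not documented behavior of the helper, and `resolveCdpEndpoint` is hypothetical. The sketch logs only which mode it chose, never the endpoint itself.

```javascript
// Hypothetical: choose the CDP endpoint from argv and the environment.
// Assumption: "--cdp <endpoint>" wins over PAPER_FETCH_CDP_ENDPOINT.
// Returns null when neither is set, meaning "use the local browser profile".
function resolveCdpEndpoint(argv, env) {
  for (let i = 0; i < argv.length; i++) {
    // A trailing "--cdp" with no value is ignored and falls through to env.
    if (argv[i] === "--cdp" && i + 1 < argv.length) return argv[i + 1];
  }
  const fromEnv = env.PAPER_FETCH_CDP_ENDPOINT;
  return fromEnv && fromEnv.trim() !== "" ? fromEnv.trim() : null;
}

const endpoint = resolveCdpEndpoint(process.argv.slice(2), process.env);
// Report the mode only; the endpoint may embed a session token.
console.log(endpoint ? "Using remote browser over CDP" : "Using local browser profile");
```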
If Node reports that Playwright is missing, ask permission before installing dependencies. Typical local setup:
npm install --save-dev playwright
npx playwright install chromium
For cloud-only CDP usage, playwright-core may be sufficient if the provider supplies the browser:
npm install --save-dev playwright-core
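Before asking permission to install anything, a wrapper could check which Playwright package is already present. This hypothetical sketch tries `playwright` and then `playwright-core` through dynamic `import()`, treating only the ERR_MODULE_NOT_FOUND error code as "not installed" and rethrowing anything else; the injectable `load` parameter is there only to make the function testable.

```javascript
// Hypothetical: load whichever Playwright package is installed, preferring
// the full "playwright" package and falling back to "playwright-core".
// Returns null if neither is present, so the caller can ask the user
// before running npm install.
async function findPlaywright(load = (name) => import(name)) {
  for (const name of ["playwright", "playwright-core"]) {
    try {
      const mod = await load(name);
      // Named export under ESM; some interop paths expose it on default.
      return { name, chromium: mod.chromium ?? mod.default?.chromium };
    } catch (err) {
      // "Package not installed" is expected; surface any other failure.
      if (err && err.code === "ERR_MODULE_NOT_FOUND") continue;
      throw err;
    }
  }
  return null;
}

findPlaywright().then((pw) => {
  console.log(pw ? `Found ${pw.name}` : "Playwright missing: ask before installing");
});
```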
Handling Login
- If the script says access is unavailable, ask the user to log in through the opened local or cloud browser session, then rerun the same command.
- Use --pause-for-login only when the user is ready to complete SSO in the browser.
- Use --login-only to warm the profile/session without downloading.
- Do not automate 2FA, CAPTCHA solving, hidden proxy rotation, or anti-bot evasion.
When Retrieval Fails
Report the exact non-sensitive cause:
- no PDF link found on the authenticated page
- an HTTP status such as 403, 401, or 404
- the publisher says the article is not included in the user's entitlement
- Playwright/browser dependency is unavailable
- cloud browser endpoint is expired or not connected
Then suggest a lawful next path: user reauthenticates, provides an EZproxy/OpenAthens URL, uses a library permalink, uses an official publisher API/TDM route, or manually supplies the PDF.
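One way to derive a non-sensitive cause and the report fields named in the workflow, sketched under stated assumptions: a PDF file normally begins with the bytes `%PDF-`, so an HTML login or entitlement page served with HTTP 200 can be caught before it is saved. The "not a PDF" cause is an extra check added for illustration, and the function names and record fields (`savedPath`, `finalUrl`, `problem`) are hypothetical, not the helper's actual sidecar format.

```javascript
// Hypothetical: classify a fetch outcome into a non-sensitive cause and a
// sidecar-style record. Field names are illustrative only.

const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

// True if the byte array starts with the PDF header bytes.
function looksLikePdf(bytes) {
  return bytes.length >= PDF_MAGIC.length &&
    PDF_MAGIC.every((b, i) => bytes[i] === b);
}

function buildReport({ savedPath = null, finalUrl, status, bytes, pdfLinkFound = true }) {
  let problem = null;
  if (!pdfLinkFound) problem = "no PDF link found on the authenticated page";
  else if (status >= 400) problem = `HTTP status ${status}`;
  else if (!bytes || !looksLikePdf(bytes)) {
    problem = "response was not a PDF (possibly a login or entitlement page)";
  }
  // Never report a saved path for a failed download.
  return { savedPath: problem ? null : savedPath, finalUrl, problem };
}

const html = new TextEncoder().encode("<!doctype html><title>Sign in</title>");
console.log(buildReport({ savedPath: "papers/x.pdf", finalUrl: "https://example.org/a", status: 200, bytes: html }).problem);
// response was not a PDF (possibly a login or entitlement page)
```

The record contains only the path, URL, and cause, so it can be shown to the user or written next to the PDF without leaking cookies or tokens.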