Archive web content from your email links. This skill connects to Gmail via IMAP, filters emails by a subject prefix keyword, crawls every link using Playwright (headless Chromium), converts pages to Markdown, and saves them to your OpenClaw workspace.
bash references/setup.sh
This automatically installs:
- playwright (Python) + Chromium browser binary
- html2text for HTML→Markdown conversion

python3 references/gmail_link_archiver.py
The first run will prompt you for:
| Setting | Description | Default |
|---|---|---|
| IMAP server | Gmail IMAP host | imap.gmail.com |
| IMAP port | SSL port | 993 |
| Gmail address | Your full email address | — |
| App password | Gmail App Password (NOT your regular password) | — |
| Default mailbox | IMAP folder to search | INBOX |
| Subject prefix | Filter emails whose subject starts with this | — |
| Workspace path | Where to save Markdown files | ~/openclaw-workspace/mail-archive |
Credentials are saved locally to ~/.config/gmail-link-archiver/config.json with 0600 permissions (owner read/write only). They are never logged and are sent only to the configured IMAP server when logging in.
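As a rough illustration of the storage behaviour described above, the following sketch writes a JSON config that only the owner can read or write. The function names and the config key names are hypothetical; only the default values come from the settings table, and the actual script may be organised differently.

```python
import json
import os
import stat
from pathlib import Path

# Default values from the settings table; the key names are assumptions.
DEFAULTS = {
    "imap_server": "imap.gmail.com",
    "imap_port": 993,
    "mailbox": "INBOX",
    "workspace": "~/openclaw-workspace/mail-archive",
}


def save_config(config: dict, path: Path) -> None:
    """Write config as JSON, readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    # The mode passed to os.open only applies on creation, so also
    # tighten the permissions of a file that already existed.
    os.chmod(path, 0o600)


def load_config(path: Path) -> dict:
    """Read the saved config, filling in defaults for missing keys."""
    with open(path, encoding="utf-8") as fh:
        return {**DEFAULTS, **json.load(fh)}
```

Passing the mode to `os.open` rather than calling `chmod` after a plain `open` avoids a short window in which the file exists with default permissions.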
> Gmail App Password: You need to generate an App Password at
> https://myaccount.google.com/apppasswords (requires 2FA enabled).
After the first setup, subsequent runs will read credentials from the saved config:
# Use saved config defaults
python3 references/gmail_link_archiver.py
# Override mailbox and prefix on the fly
python3 references/gmail_link_archiver.py --mailbox "INBOX" --subject-prefix "[Newsletter]"
# Save to a different workspace
python3 references/gmail_link_archiver.py --workspace ~/my-archive
# Limit number of links to crawl
python3 references/gmail_link_archiver.py --max-links 10
# Re-run the setup interview
python3 references/gmail_link_archiver.py --reconfigure
Gmail IMAP ──► Filter by Subject ──► Extract Links
                                           │
                                           ▼
                           Playwright + Chromium (headless)
                                           │
                                           ▼
                              HTML → Markdown (html2text)
                                           │
                                           ▼
                              Save to OpenClaw Workspace
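The first three stages of the flow above could look roughly like the sketch below, which uses only the standard library. All function names and the URL regular expression are assumptions for illustration, not the script's actual code. IMAP `SUBJECT` search matches substrings, so the sketch re-checks the prefix locally; non-ASCII prefixes would need extra charset handling that is omitted here.

```python
import email
import imaplib
import re
from email.header import decode_header, make_header
from email.message import Message

# Simplified URL pattern (an assumption); stops at whitespace, quotes, brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def subject_matches(msg: Message, prefix: str) -> bool:
    """True if the decoded Subject header starts with the given prefix."""
    subject = str(make_header(decode_header(msg.get("Subject", ""))))
    return subject.strip().startswith(prefix)


def extract_links(msg: Message) -> list[str]:
    """Collect unique http(s) URLs from all text parts, in first-seen order."""
    seen: dict[str, None] = {}
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        for url in URL_RE.findall(text):
            # Drop trailing sentence punctuation picked up by the pattern.
            seen.setdefault(url.rstrip(".,;"), None)
    return list(seen)


def fetch_links(host, port, user, app_password, mailbox, prefix, max_links=50):
    """Needs network access and real credentials, so it is not exercised here."""
    links: list[str] = []
    with imaplib.IMAP4_SSL(host, port) as conn:
        conn.login(user, app_password)
        # Quote the mailbox so names with spaces ("[Gmail]/All Mail") work.
        conn.select('"%s"' % mailbox, readonly=True)
        quoted = '"%s"' % prefix.replace('"', r'\"')
        _, data = conn.search(None, "SUBJECT", quoted)
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(parts[0][1])
            if subject_matches(msg, prefix):
                links.extend(extract_links(msg))
    # De-duplicate across messages, then apply the crawl limit.
    return list(dict.fromkeys(links))[:max_links]
```

Opening the mailbox with `readonly=True` means fetching a message does not mark it as read.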
usage: gmail_link_archiver.py [-h] [--mailbox MAILBOX]
                              [--subject-prefix PREFIX]
                              [--workspace PATH]
                              [--max-links N]
                              [--reconfigure]

Options:
  --mailbox, -m          IMAP mailbox to search (default: from config)
  --subject-prefix, -s   Subject prefix to filter emails
  --workspace, -w        Directory to save Markdown files
  --max-links            Max number of links to crawl (default: 50)
  --reconfigure          Re-run the setup interview
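For readers who want to see how such a command line might be wired up, here is a minimal `argparse` sketch that mirrors the options listed above. The helper `resolve_settings` and the settings key names are hypothetical; the sketch assumes that command-line flags take precedence over saved config values, as the override examples earlier suggest.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same flags as the usage text."""
    parser = argparse.ArgumentParser(prog="gmail_link_archiver.py")
    parser.add_argument("--mailbox", "-m", metavar="MAILBOX",
                        help="IMAP mailbox to search (default: from config)")
    parser.add_argument("--subject-prefix", "-s", metavar="PREFIX",
                        help="Subject prefix to filter emails")
    parser.add_argument("--workspace", "-w", metavar="PATH",
                        help="Directory to save Markdown files")
    parser.add_argument("--max-links", type=int, default=50, metavar="N",
                        help="Max number of links to crawl (default: 50)")
    parser.add_argument("--reconfigure", action="store_true",
                        help="Re-run the setup interview")
    return parser


def resolve_settings(args: argparse.Namespace, saved: dict) -> dict:
    """Start from the saved config and let explicitly given flags win."""
    settings = dict(saved)
    for key in ("mailbox", "subject_prefix", "workspace"):
        value = getattr(args, key)
        if value is not None:
            settings[key] = value
    settings["max_links"] = args.max_links
    return settings
```

Leaving the string options without a default lets the code tell "flag not given" (`None`) apart from an explicit value, which is what makes the fallback to saved config possible.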
Each crawled page is saved as a Markdown file with YAML frontmatter:
---
source: https://example.com/article
crawled_at: 2026-03-27T12:00:00Z
---
# Article Title
Article content converted to clean Markdown...
Files are named using a sanitized version of the URL plus a short hash for uniqueness.
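One way to produce such file names and frontmatter is sketched below. The exact sanitization rules, the hash algorithm (SHA-256 here), the 8-character hash length, and the function names are all assumptions; the script itself may use different choices.

```python
import hashlib
import re
from datetime import datetime, timezone
from urllib.parse import urlsplit


def filename_for(url: str, max_len: int = 80) -> str:
    """Build '<sanitized host and path>-<short hash>.md' for a URL."""
    parts = urlsplit(url)
    raw = f"{parts.netloc}{parts.path}"
    # Replace every run of non-alphanumeric characters with a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).lower()[:max_len].strip("-")
    # Hash the full URL so pages differing only in query string stay distinct.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"{slug or 'page'}-{digest}.md"


def render_document(url: str, markdown_body: str, crawled_at=None) -> str:
    """Prepend YAML frontmatter with the source URL and a UTC timestamp."""
    crawled_at = crawled_at or datetime.now(timezone.utc)
    stamp = crawled_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"---\nsource: {url}\ncrawled_at: {stamp}\n---\n{markdown_body}\n"
```

Under these assumptions, `filename_for("https://example.com/article")` yields a name that begins with `example-com-article-` and ends in `.md`.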
Ask Claude to run the archiver:
> "Run the Gmail Link Archiver to crawl links from my emails with subject starting with '[ReadLater]'"
Claude will execute:
python3 references/gmail_link_archiver.py --subject-prefix "[ReadLater]"
Or to set up fresh:
> "Set up the Gmail Link Archiver with my credentials"
python3 references/gmail_link_archiver.py --reconfigure
"App password" rejected?
Playwright/Chromium issues?
# Reinstall Chromium
python3 -m playwright install chromium
# Install system dependencies (Linux)
sudo python3 -m playwright install-deps chromium
No emails found?
- Check the mailbox name (INBOX, [Gmail]/All Mail, etc.)

Permission denied on config file?
chmod 600 ~/.config/gmail-link-archiver/config.json
- Credentials are stored in ~/.config/gmail-link-archiver/config.json
- The config file uses 0600 permissions (owner read/write only)
- The config directory uses 0700 permissions