Required binary:
- python3 - Runs data collection and merge scripts.

Optional binaries:
- opencli - Preferred Twitter/X backend in auto mode, required for Xiaoyuzhou metadata discovery, and the documented supported Xiaoyuzhou transcript path.
- mail - msmtp-based mail command for email delivery.
- msmtp - SMTP transport used by mail.
- gog - Gmail CLI fallback for email delivery.
- gh - GitHub CLI fallback for repository auth.
- openssl - GitHub App JWT signing fallback.
- weasyprint - PDF rendering backend.
- yt-dlp - YouTube podcast metadata and transcript backend.

Environment variables:
| Name | Required | Description |
|---|---|---|
| TWITTER_API_BACKEND | No | Twitter backend: auto, opencli, getxapi, twitterapiio, or official. Default: auto; auto tries OpenCLI first. |
| OPENCLI_BIN | No | Optional path to the OpenCLI executable for Twitter/X and Xiaoyuzhou podcast sources. Used when OpenCLI is not available on PATH. |
| OPENCLI_MAX_WORKERS | No | Optional OpenCLI concurrency limit. Defaults to 10. |
| OPENCLI_CLOSE_TABS_AFTER_RUN | No | Close OpenCLI-created X/Twitter tabs after fetch when set to 1. Default: 1. |
| OPENCLI_CLOSE_CHROME_WINDOWS_AFTER_RUN | No | Close OpenCLI-created Chrome automation windows on macOS when set to 1. Default: 1. |
| GETX_API_KEY | No | GetXAPI key for Twitter/X fallback. |
| X_BEARER_TOKEN | No | Twitter/X API bearer token for KOL monitoring. |
| TWITTERAPI_IO_KEY | No | twitterapi.io API key for KOL monitoring. |
| TAVILY_API_KEY | No | Tavily Search API key. |
| WEB_SEARCH_BACKEND | No | Web search backend: auto, brave, or tavily. |
| BRAVE_API_KEYS | No | Brave Search API keys, comma-separated for rotation. |
| BRAVE_API_KEY | No | Brave Search API key, single key fallback. |
| GITHUB_TOKEN | No | GitHub token for higher API rate limits. |
| GH_APP_ID | No | GitHub App ID for automatic installation token generation. |
| GH_APP_INSTALL_ID | No | GitHub App Installation ID for automatic token generation. |
| GH_APP_KEY_FILE | No | Path to GitHub App private key PEM file. |
| YTDLP_BIN | No | Optional path to the yt-dlp executable for YouTube podcast metadata and transcript fetching. |
File access:
| Mode | Path | Purpose |
|---|---|---|
| Read | config/defaults/ | Default source and topic configurations. |
| Read | references/ | Prompt templates and output templates. |
| Read | scripts/ | Python pipeline scripts. |
| Read | | Previous digests for deduplication. |
| Write | /tmp/td-*.json | Temporary pipeline intermediate outputs. |
| Write | /tmp/td-email.html | Temporary email HTML body. |
| Write | /tmp/td-digest.pdf | Generated PDF digest. |
| Write | | Saved digest archives. |
Use one of the following paths according to user intent:
- run-pipeline.py first, then render with the requested template.
- validate-config.py before any long pipeline run.
- Individual fetch scripts (fetch-rss.py, fetch-web.py, fetch-github.py, fetch-podcast.py).

Priority rule:
Automated tech news digest system with unified data source model, internal ranking pipeline, and template-based output generation.
Default configuration files are in config/defaults/. Copy them to the workspace for customization:
```bash
mkdir -p workspace/config
cp config/defaults/sources.json workspace/config/follow-news-sources.json
cp config/defaults/topics.json workspace/config/follow-news-topics.json
```
- TWITTER_API_BACKEND - Twitter backend: auto|opencli|getxapi|twitterapiio|official (optional, default: auto)
- OPENCLI_BIN - OpenCLI executable path override for Twitter/X and Xiaoyuzhou podcast sources (optional)
- OPENCLI_MAX_WORKERS - OpenCLI concurrency limit (optional, default: 10)
- OPENCLI_CLOSE_TABS_AFTER_RUN - close OpenCLI-created X/Twitter tabs after fetch (optional, default: 1)
- OPENCLI_CLOSE_CHROME_WINDOWS_AFTER_RUN - close Chrome automation windows opened by OpenCLI on macOS (optional, default: 1)
- GETX_API_KEY - GetXAPI key for Twitter/X fallback (optional)
- TWITTERAPI_IO_KEY - twitterapi.io API key for Twitter/X fallback (optional)
- X_BEARER_TOKEN - Twitter/X official API bearer token for final fallback (optional)
- TAVILY_API_KEY - Tavily Search API key, alternative to Brave (optional)
- WEB_SEARCH_BACKEND - Web search backend: auto|brave|tavily (optional, default: auto)
- BRAVE_API_KEYS - Brave Search API keys, comma-separated for rotation (optional)
- BRAVE_API_KEY - Single Brave key fallback (optional)
- GITHUB_TOKEN - GitHub personal access token (optional, improves rate limits)
- YTDLP_BIN - yt-dlp executable path override for YouTube podcast metadata and transcripts (optional)

OpenCLI is the preferred Twitter/X backend in auto mode. In OpenClaw environments where jackwener/opencli is installed, the agent should use that skill to validate opencli doctor, browser bridge state, and X login before asking for API keys.
To use the OpenCLI backend, the user must install the OpenCLI executable and expose it on PATH, or set OPENCLI_BIN to its absolute path. OpenClaw users should also install the jackwener/opencli Skill so the agent can run opencli doctor and diagnose browser bridge or X login-state issues. OpenCLI requests default to 10 workers (OPENCLI_MAX_WORKERS=10). The fetcher closes X/Twitter tabs created during an OpenCLI run by default (OPENCLI_CLOSE_TABS_AFTER_RUN=1) and closes Chrome automation windows opened by OpenCLI on macOS (OPENCLI_CLOSE_CHROME_WINDOWS_AFTER_RUN=1) while preserving tabs and windows that existed before the run.
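The lookup and fallback rules described above can be sketched roughly as follows. This is an illustration only, not the skill's actual code: the function names `resolve_opencli` and `choose_twitter_backend` and the returned backend labels are assumptions, and the API-key order follows the fallback order this document states (GETX_API_KEY, then TWITTERAPI_IO_KEY, then X_BEARER_TOKEN).

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_opencli(env: Mapping[str, str]) -> Optional[str]:
    """Return a path to the OpenCLI executable, or None if none is found."""
    override = env.get("OPENCLI_BIN")
    if override:
        # An explicit override is used only if it points at an executable file.
        usable = os.path.isfile(override) and os.access(override, os.X_OK)
        return override if usable else None
    # Otherwise look for "opencli" on the PATH given in env (empty PATH -> None).
    return shutil.which("opencli", path=env.get("PATH", ""))


def choose_twitter_backend(env: Mapping[str, str],
                           opencli_path: Optional[str]) -> Optional[str]:
    """Pick a backend name (hypothetical labels) following the documented order."""
    requested = env.get("TWITTER_API_BACKEND", "auto")
    if requested != "auto":
        return requested  # an explicit choice is honoured as-is
    if opencli_path:
        return "opencli"  # auto mode tries OpenCLI first
    fallbacks = (
        ("getxapi", "GETX_API_KEY"),
        ("twitterapiio", "TWITTERAPI_IO_KEY"),
        ("official", "X_BEARER_TOKEN"),
    )
    for backend, key in fallbacks:
        if env.get(key):
            return backend
    return None  # nothing usable; the Twitter/X layer would be skipped
```

Calling `choose_twitter_backend(os.environ, resolve_opencli(os.environ))` would then report which backend an auto-mode run is likely to select, under these assumptions.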
Xiaoyuzhou podcast metadata discovery uses OpenCLI. The user must install, configure, and authenticate OpenCLI for Xiaoyuzhou metadata discovery. Xiaoyuzhou source URLs use https://www.xiaoyuzhoufm.com/podcast/ with platform: "xiaoyuzhou". Xiaoyuzhou metadata discovery has no direct API or HTML fallback. For transcripts, backend auto/opencli uses OpenCLI for Xiaoyuzhou episodes. The opencli transcript backend is only valid for Xiaoyuzhou sources.
```bash
# Unified pipeline (recommended) - runs all 7 source layers in parallel + merge
python3 scripts/run-pipeline.py \
--defaults config/defaults \
--config workspace/config \
--hours 24 --freshness pd \
--archive-dir workspace/archive/follow-news/ \
--output /tmp/td-merged.json --verbose --force
```
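As a rough illustration of what a `--hours` window implies, the sketch below keeps only items published within the last N hours. The item shape (a `published` ISO 8601 field) and the function name `within_window` are assumptions made for this example; the real fetchers may store and compare timestamps differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def within_window(items: List[Dict[str, str]], hours: int,
                  now: Optional[datetime] = None) -> List[Dict[str, str]]:
    """Keep items whose 'published' timestamp falls inside the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    kept = []
    for item in items:
        published = datetime.fromisoformat(item["published"])
        if published.tzinfo is None:
            # Assumption: naive timestamps are treated as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if cutoff <= published <= now:
            kept.append(item)
    return kept
```

With `--hours 24` the cutoff is one day before the run time, which is why a 1-hour window is a cheap smoke check and a 168-hour window suits a weekly digest.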
sources.json - Unified Data Sources
{
"sources": [
{
"id": "openai-rss",
"type": "rss",
"name": "OpenAI Blog",
"url": "https://openai.com/blog/rss.xml",
"enabled": true,
"priority": true,
"topics": ["llm", "ai-agent"],
"note": "Official OpenAI updates"
},
{
"id": "sama-twitter",
"type": "twitter",
"name": "Sam Altman",
"handle": "sama",
"enabled": true,
"priority": true,
"topics": ["llm", "frontier-tech"],
"note": "OpenAI CEO"
},
{
"id": "training-data-podcast",
"type": "podcast",
"name": "Training Data",
"url": "https://www.youtube.com/playlist?list=PLOhHNjZItNnMm5tdW61JpnyxeYH5NDDx8",
"platform": "youtube",
"enabled": true,
"priority": true,
"topics": ["podcast"],
"transcript": {
"enabled": true,
"backend": "yt-dlp",
"languages": ["en", "zh", "zh-Hans"]
},
"note": "YouTube podcast playlist with optional transcript enrichment"
},
{
"id": "xiaoyuzhou-example",
"type": "podcast",
"name": "Xiaoyuzhou Example",
"url": "https://www.xiaoyuzhoufm.com/podcast/686a1832222ae2de21fea940",
"platform": "xiaoyuzhou",
"enabled": true,
"topics": ["podcast"],
"transcript": {
"enabled": true,
"backend": "opencli",
"languages": ["zh"]
},
"note": "Xiaoyuzhou podcast using OpenCLI for metadata and transcripts"
}
]
}
topics.json - Enhanced Topic Definitions
{
"topics": [
{
"id": "llm",
"emoji": "🧠",
"label": "LLM / Large Models",
"description": "Large Language Models, foundation models, breakthroughs",
"search": {
"queries": ["LLM latest news", "large language model breakthroughs"],
"must_include": ["LLM", "large language model", "foundation model"],
"exclude": ["tutorial", "beginner guide"]
},
"display": {
"max_items": 8,
"style": "detailed"
}
}
]
}
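One plausible reading of the `search` block above is sketched here. The semantics are an assumption, not something the document confirms: any `exclude` term rejects an item, and at least one `must_include` term has to appear (case-insensitively) for it to be kept. The helper name `matches_topic` is hypothetical.

```python
from typing import Dict, List


def matches_topic(text: str, search: Dict[str, List[str]]) -> bool:
    """Return True if `text` passes the topic's must_include/exclude terms."""
    lowered = text.lower()
    # Assumption: a single excluded term is enough to drop the item.
    if any(term.lower() in lowered for term in search.get("exclude", [])):
        return False
    must = search.get("must_include", [])
    # Assumption: must_include is "any of"; an empty list accepts everything.
    return not must or any(term.lower() in lowered for term in must)
```

Under this reading, "New foundation model released" passes the `llm` topic, while "LLM tutorial for everyone" is dropped by the `exclude` list.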
run-pipeline.py - Unified Pipeline (Recommended)
python3 scripts/run-pipeline.py \
--defaults config/defaults [--config CONFIG_DIR] \
--hours 24 --freshness pd \
--archive-dir workspace/archive/follow-news/ \
--output /tmp/td-merged.json --verbose --force
- Metadata is saved to *.meta.json
- GitHub App token auto-generation applies when $GITHUB_TOKEN is not set
- OpenCLI uses OPENCLI_MAX_WORKERS=10 unless explicitly overridden.
- Use a shorter window (--hours) for smoke checks before full-window runs.

fetch-rss.py - RSS Feed Fetcher
python3 scripts/fetch-rss.py [--defaults DIR] [--config DIR] [--hours 24] [--output FILE] [--verbose]
fetch-twitter.py - Twitter/X KOL Monitor
python3 scripts/fetch-twitter.py [--defaults DIR] [--config DIR] [--hours 24] [--output FILE] [--backend auto|opencli|getxapi|twitterapiio|official]

fetch-web.py - Web Search Engine
python3 scripts/fetch-web.py [--defaults DIR] [--config DIR] [--freshness pd] [--output FILE]

fetch-github.py - GitHub Releases Monitor
python3 scripts/fetch-github.py [--defaults DIR] [--config DIR] [--hours 24] [--output FILE]
- Auth order: $GITHUB_TOKEN → GitHub App auto-generate → gh CLI → unauthenticated (60 req/hr)

fetch-github.py --trending - GitHub Trending Repos
python3 scripts/fetch-github.py --trending [--hours 24] [--output FILE] [--verbose]
- Topics: llm, ai-agent, frontier-tech

fetch-reddit.py - Reddit Posts Fetcher
python3 scripts/fetch-reddit.py [--defaults DIR] [--config DIR] [--hours 24] [--output FILE]
fetch-podcast.py - Podcast, YouTube, and Xiaoyuzhou Fetcher
python3 scripts/fetch-podcast.py [--defaults DIR] [--config DIR] [--hours 24] [--output FILE] [--verbose]
- Reads type: "podcast" sources from the unified source config.
- Supports platform: "youtube" when yt-dlp is available.
- YouTube sources use yt-dlp; set YTDLP_BIN when it is not available on PATH.
- Xiaoyuzhou sources use platform: "xiaoyuzhou" and URLs like https://www.xiaoyuzhoufm.com/podcast/.
- Xiaoyuzhou sources use OpenCLI; set OPENCLI_BIN when it is not available on PATH.
- The transcript backend is auto, yt-dlp, or opencli; missing yt-dlp fails only that YouTube podcast source instead of the full pipeline.
- For transcripts, backend auto/opencli uses OpenCLI for Xiaoyuzhou episodes. The opencli transcript backend is rejected for non-Xiaoyuzhou podcast sources.

enrich-articles.py - Article Full-Text Enrichment
python3 scripts/enrich-articles.py --input merged.json --output enriched.json [--min-score 10] [--max-articles 15] [--verbose]
merge-sources.py - Ranking & Deduplication
python3 scripts/merge-sources.py --rss FILE --twitter FILE --web FILE --github FILE --trending FILE --reddit FILE --podcast FILE

validate-config.py - Configuration Validator
python3 scripts/validate-config.py [--defaults DIR] [--config DIR] [--verbose]

generate-pdf.py - PDF Report Generator
python3 scripts/generate-pdf.py --input report.md --output digest.pdf [--verbose]
- Requires weasyprint.

sanitize-html.py - Safe HTML Email Converter
python3 scripts/sanitize-html.py --input report.md --output email.html [--verbose]

source-health.py - Source Health Monitor
python3 scripts/source-health.py --rss FILE --twitter FILE --github FILE --reddit FILE --web FILE [--verbose]

summarize-merged.py - Merged Data Summary
python3 scripts/summarize-merged.py --input merged.json [--top N] [--topic TOPIC]

Place custom configs in workspace/config/ to override defaults:
- Disable a source with "enabled": false
- Source with an existing id → user version takes precedence
- Source with a new id → appended to defaults
- Topic with an existing id → user version completely replaces default

// workspace/config/follow-news-sources.json
{
"sources": [
{
"id": "simonwillison-rss",
"enabled": false,
"note": "Disabled: too noisy for my use case"
},
{
"id": "my-custom-blog",
"type": "rss",
"name": "My Custom Tech Blog",
"url": "https://myblog.com/rss",
"enabled": true,
"priority": true,
"topics": ["frontier-tech"]
}
]
}
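The override rules for sources can be sketched as a small merge function. This is an assumption-laden illustration rather than the skill's own loader: `merge_sources` is a hypothetical name, and "takes precedence" is read here as a field-level update, since the example override for `simonwillison-rss` carries only `id`, `enabled`, and `note`.

```python
from typing import Any, Dict, List

Source = Dict[str, Any]


def merge_sources(defaults: List[Source], overrides: List[Source]) -> List[Source]:
    """Merge user overrides into default sources by id, then drop disabled ones."""
    merged: Dict[str, Source] = {s["id"]: dict(s) for s in defaults}
    for source in overrides:
        if source["id"] in merged:
            # Existing id: user fields win, untouched default fields are kept.
            merged[source["id"]].update(source)
        else:
            # New id: appended after the defaults (dicts keep insertion order).
            merged[source["id"]] = dict(source)
    # Assumption: a source without an "enabled" key counts as enabled.
    return [s for s in merged.values() if s.get("enabled", True)]
```

Applied to the example above, `simonwillison-rss` would be filtered out and `my-custom-blog` would appear after the remaining defaults.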
- Report partial coverage (e.g. 2/7 sources available) and avoid claiming completeness.

- references/summarize-tweets.md
- references/summarize-podcast.md
- references/translate.md

- Discord (references/templates/discord.md)
- Email (references/templates/email.md)
- PDF (references/templates/pdf.md)
- scripts/generate-pdf.py (requires weasyprint)

- llm, ai-agent, kol, hackernews, frontier-tech, podcast
- llm, ai-agent, kol, frontier-tech

All sources pre-configured with appropriate topic tags and priority levels.
pip install -r requirements.txt
Optional but Recommended:
- feedparser>=6.0.0 - Better RSS parsing (fallback to regex if unavailable)
- jsonschema>=4.0.0 - Configuration validation
- yt-dlp - Optional runtime binary for YouTube podcast metadata and transcripts. Set YTDLP_BIN to override lookup.
- opencli - Required for Xiaoyuzhou metadata discovery and the documented supported Xiaoyuzhou transcript path. Install, configure, and authenticate it for Xiaoyuzhou; set OPENCLI_BIN to override lookup.

All scripts work with Python 3.8+ standard library only.
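The "fallback to regex if unavailable" behaviour for feedparser can be pictured with an optional import. The sketch below is not the fetcher's real parser: `titles_via_regex` and `entry_titles` are hypothetical helpers, and the regex only handles simple RSS `<item>` titles.

```python
import re
from html import unescape
from typing import List

try:
    import feedparser  # optional dependency; may be absent
except ImportError:
    feedparser = None

# Match the first <title> inside each <item>, skipping the channel-level title.
ITEM_TITLE_RE = re.compile(r"<item\b.*?<title>(.*?)</title>", re.DOTALL | re.IGNORECASE)
CDATA_RE = re.compile(r"\s*<!\[CDATA\[(.*?)\]\]>\s*", re.DOTALL)


def titles_via_regex(xml_text: str) -> List[str]:
    """Standard-library fallback: extract item titles with a regex."""
    titles = []
    for raw in ITEM_TITLE_RE.findall(xml_text):
        cdata = CDATA_RE.fullmatch(raw)
        # CDATA content is taken literally; other text has entities decoded.
        titles.append((cdata.group(1) if cdata else unescape(raw)).strip())
    return titles


def entry_titles(xml_text: str) -> List[str]:
    """Use feedparser when it is installed, otherwise the regex fallback."""
    if feedparser is not None:
        return [entry.get("title", "") for entry in feedparser.parse(xml_text).entries]
    return titles_via_regex(xml_text)
```

The regex path is deliberately narrow; it trades robustness on unusual feeds for having no third-party dependency.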
# Validate configuration
python3 scripts/validate-config.py --verbose
# Test RSS feeds
python3 scripts/fetch-rss.py --hours 1 --verbose
# Check Twitter API
python3 scripts/fetch-twitter.py --hours 1 --verbose
1. python3 scripts/validate-config.py --verbose
2. python3 scripts/fetch-rss.py --hours 1 --verbose
3. python3 scripts/run-pipeline.py --defaults config/defaults --hours 24 --freshness pd --archive-dir workspace/archive/follow-news/ --output /tmp/td-merged.json --verbose

If all pass, run the default 24h pipeline, or pass a longer --hours window when the requested digest explicitly needs historical coverage.
- /archive/follow-news/
- validate-config.py fails:
- partial status and surface affected source.
- --hours 24) and compare.

Set in ~/.zshenv or similar:
# Twitter/X API keys (optional; used as fallbacks after OpenCLI in auto mode)
export TWITTERAPI_IO_KEY="your_key" # twitterapi.io key
export X_BEARER_TOKEN="your_bearer_token" # Official X API v2 (final fallback)
export TWITTER_API_BACKEND="auto" # auto|opencli|getxapi|twitterapiio|official (default: auto)
# Web Search (optional, enables web search layer)
export WEB_SEARCH_BACKEND="auto" # auto|brave|tavily (default: auto)
export TAVILY_API_KEY="tvly-xxx" # Tavily Search API (free 1000/mo)
# Brave Search (alternative)
export BRAVE_API_KEYS="key1,key2,key3" # Multiple keys, comma-separated rotation
export BRAVE_API_KEY="key1" # Single key fallback
export BRAVE_PLAN="free" # Override rate limit detection: free|pro
# GitHub (optional, improves rate limits)
export GITHUB_TOKEN="ghp_xxx" # PAT (simplest)
export GH_APP_ID="12345" # Or use GitHub App for auto-token
export GH_APP_INSTALL_ID="67890"
export GH_APP_KEY_FILE="/path/to/key.pem"
# Podcast transcripts (optional)
export YTDLP_BIN="/path/to/yt-dlp" # Optional; defaults to yt-dlp on PATH
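How the two Brave variables might combine is sketched below. Round-robin rotation is an assumption about what "rotation" means here, and `load_brave_keys` and `key_rotation` are hypothetical names, not functions from the skill.

```python
import itertools
from typing import Iterator, List, Mapping


def load_brave_keys(env: Mapping[str, str]) -> List[str]:
    """Read BRAVE_API_KEYS (comma-separated), falling back to BRAVE_API_KEY."""
    raw = env.get("BRAVE_API_KEYS", "")
    keys = [part.strip() for part in raw.split(",") if part.strip()]
    if not keys and env.get("BRAVE_API_KEY", "").strip():
        keys = [env["BRAVE_API_KEY"].strip()]
    return keys


def key_rotation(keys: List[str]) -> Iterator[str]:
    """Yield keys in round-robin order so requests are spread across them."""
    if not keys:
        raise ValueError("no Brave Search API key configured")
    return itertools.cycle(keys)
```

Each request would take `next(rotation)`; with a single key the cycle simply repeats it.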
- Twitter/X: OpenCLI is preferred in auto mode; API backends fall back in this order: GETX_API_KEY, TWITTERAPI_IO_KEY, X_BEARER_TOKEN
- YouTube podcast sources use yt-dlp for metadata and optional transcript fetching; set YTDLP_BIN if needed. Xiaoyuzhou podcast metadata discovery uses OpenCLI with platform: "xiaoyuzhou"; set OPENCLI_BIN if needed. Xiaoyuzhou metadata discovery has no direct API or HTML fallback. For transcripts, backend auto/opencli uses OpenCLI for Xiaoyuzhou episodes; non-Xiaoyuzhou podcast sources cannot use opencli.

The cron prompt should NOT hardcode the pipeline steps. Instead, reference references/digest-prompt.md and only pass configuration parameters. This ensures the pipeline logic stays in the skill repo and is consistent across all installations.
Read <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a daily digest.
Replace placeholders with:
- MODE = daily
- TIME_WINDOW = past 1-2 days
- FRESHNESS = pd
- RSS_HOURS = 48
- ITEMS_PER_SECTION = 3-5
- ENRICH = true
- BLOG_PICKS_COUNT = 3
- EXTRA_SECTIONS = (none)
- SUBJECT = Daily Tech Digest - YYYY-MM-DD
- WORKSPACE = <your workspace path>
- SKILL_DIR = <your skill install path>
- DISCORD_CHANNEL_ID = <your channel id>
- EMAIL = (optional)
- LANGUAGE = English
- TEMPLATE = discord
Follow every step in the prompt template strictly. Do not skip any steps.
Read <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a weekly digest.
Replace placeholders with:
- MODE = weekly
- TIME_WINDOW = past 7 days
- FRESHNESS = pw
- RSS_HOURS = 168
- ITEMS_PER_SECTION = 10-15
- ENRICH = true
- BLOG_PICKS_COUNT = 3-5
- EXTRA_SECTIONS = 📊 Weekly Trend Summary (2-3 sentences summarizing macro trends)
- SUBJECT = Weekly Tech Digest - YYYY-MM-DD
- WORKSPACE = <your workspace path>
- SKILL_DIR = <your skill install path>
- DISCORD_CHANNEL_ID = <your channel id>
- EMAIL = (optional)
- LANGUAGE = English
- TEMPLATE = discord
Follow every step in the prompt template strictly. Do not skip any steps.
- Pipeline logic lives in digest-prompt.md, not scattered across cron configs

OpenClaw enforces cross-provider isolation: a single session can only send messages to one provider (e.g., Discord OR Telegram, not both). If you need to deliver digests to multiple platforms, create separate cron jobs for each provider:
# Job 1: Discord + Email
- DISCORD_CHANNEL_ID = <your-discord-channel-id>
- EMAIL = user@example.com
- TEMPLATE = discord
# Job 2: Telegram DM
- DISCORD_CHANNEL_ID = (none)
- EMAIL = (none)
- TEMPLATE = telegram
Replace DISCORD_CHANNEL_ID delivery with the target platform's delivery in the second job's prompt.
This is a security feature, not a bug — it prevents accidental cross-context data leakage.
This skill uses a prompt template pattern: the agent reads digest-prompt.md and follows its instructions. This is the standard OpenClaw skill execution model — the agent interprets structured instructions from skill-provided files. All instructions are shipped with the skill bundle and can be audited before installation.
The Python scripts and configured helper CLIs make outbound requests to:
- RSS feed URLs (configured in follow-news-sources.json)
- Twitter/X API (api.x.com or api.twitterapi.io)
- Brave Search API (api.search.brave.com)
- Tavily Search API (api.tavily.com)
- GitHub API (api.github.com)
- Reddit JSON API (reddit.com)
- YouTube URLs for platform: "youtube" podcast sources, resolved through yt-dlp

API keys are read from environment variables declared in the skill metadata. OpenCLI may reuse authenticated browser sessions managed by the user's local browser/OpenCLI setup for Twitter/X and Xiaoyuzhou.
Email delivery uses send-email.py which constructs proper MIME multipart messages with HTML body + optional PDF attachment. Subject formats are hardcoded (Daily Tech Digest - YYYY-MM-DD). PDF generation uses generate-pdf.py via weasyprint. The prompt template explicitly prohibits interpolating untrusted content (article titles, tweet text, etc.) into shell arguments. Email addresses and subjects must be static placeholder values only.
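A minimal sketch of the kind of message described above, built with the standard library. It is not the contents of send-email.py: the function name `build_digest_email`, the plain-text alternative part, and the attachment filename are assumptions for illustration. Only the subject format comes from this document.

```python
from email.message import EmailMessage
from typing import Optional


def build_digest_email(sender: str, recipient: str, date_str: str,
                       html_body: str,
                       pdf_bytes: Optional[bytes] = None) -> EmailMessage:
    """Build a multipart message with an HTML body and an optional PDF attachment."""
    msg = EmailMessage()
    # The subject is a fixed format string; only the date varies.
    msg["Subject"] = f"Daily Tech Digest - {date_str}"
    msg["From"] = sender
    msg["To"] = recipient
    # Assumption: include a short plain-text part for clients without HTML.
    msg.set_content("This digest is best viewed in an HTML-capable mail client.")
    msg.add_alternative(html_body, subtype="html")
    if pdf_bytes is not None:
        msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                           filename="td-digest.pdf")
    return msg
```

Because the message is assembled through the `email` API rather than a shell command, fetched text can only ever end up inside the body parts, never in a command line.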
Scripts read from config/ and write to workspace/archive/. No files outside the workspace are accessed.
- Run with --verbose for details
- Run validate-config.py for specific issues
- Check the time window (--hours) and source enablement

All scripts support --verbose flag for detailed logging and troubleshooting.
- Adjust MAX_WORKERS in scripts for your system
- Adjust TIMEOUT for slow networks
- Adjust MAX_ARTICLES_PER_FEED based on needs

The digest prompt instructs agents to run Python scripts via shell commands. All script paths and arguments are skill-defined constants; no user input is interpolated into commands. Two scripts use subprocess:
- run-pipeline.py orchestrates child fetch scripts (all within scripts/ directory)
- fetch-github.py has two subprocess calls:
  - openssl dgst -sha256 -sign for JWT signing (only if GH_APP_* env vars are set; signs a self-constructed JWT payload, no user content involved)
  - gh auth token CLI fallback (only if gh is installed; reads from gh's own credential store)

No user-supplied or fetched content is ever interpolated into subprocess arguments. Email delivery uses send-email.py which builds MIME messages programmatically, with no shell interpolation. PDF generation uses generate-pdf.py via weasyprint. Email subjects are static format strings only, never constructed from fetched data.
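The general technique behind "no shell interpolation" is to pass arguments as a list instead of building a command string. The sketch below illustrates that idea under assumptions: `pipeline_argv` is a hypothetical helper, and it does not claim to mirror how run-pipeline.py launches its child scripts.

```python
import shlex
from typing import List, Optional


def pipeline_argv(hours: int, output: str,
                  config_dir: Optional[str] = None) -> List[str]:
    """Build an argument vector; each value stays a single argv element."""
    argv = [
        "python3", "scripts/run-pipeline.py",
        "--defaults", "config/defaults",
        "--hours", str(int(hours)),  # int() rejects non-numeric input early
        "--freshness", "pd",
        "--output", output,
    ]
    if config_dir:
        argv += ["--config", config_dir]
    return argv


if __name__ == "__main__":
    # Printing the quoted form is safe; running it would use
    # subprocess.run(argv, check=True) without shell=True.
    print(shlex.join(pipeline_argv(24, "/tmp/td-merged.json")))
```

With a list and no `shell=True`, a value such as `"/tmp/x.json; rm -rf ~"` is handed to the child process as one literal argument rather than being parsed by a shell.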
Scripts do not directly read ~/.config/, ~/.ssh/, or any credential files. API tokens used directly by the scripts are read from environment variables declared in the skill metadata. OpenCLI-backed Twitter/X and Xiaoyuzhou sources delegate authentication to the user's configured OpenCLI/browser session. The GitHub auth cascade is:
1. $GITHUB_TOKEN env var (you control what to provide)
2. GitHub App token (only if you set GH_APP_ID, GH_APP_INSTALL_ID, and GH_APP_KEY_FILE; uses inline JWT signing via openssl CLI, no external scripts involved)
3. gh auth token CLI (delegates to gh's own secure credential store)

If you prefer no automatic credential discovery, simply set $GITHUB_TOKEN and the script will use it directly without attempting steps 2-3.
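The cascade above reduces to a short decision function. This sketch only decides which method would apply; it performs no signing and no CLI calls, and the name `pick_github_auth` and its return labels are hypothetical.

```python
from typing import Mapping

APP_VARS = ("GH_APP_ID", "GH_APP_INSTALL_ID", "GH_APP_KEY_FILE")


def pick_github_auth(env: Mapping[str, str], gh_installed: bool) -> str:
    """Return which auth method the documented cascade would select."""
    if env.get("GITHUB_TOKEN"):
        return "token"  # step 1: an explicit token short-circuits the rest
    if all(env.get(name) for name in APP_VARS):
        return "github-app"  # step 2: needs all three GH_APP_* variables
    if gh_installed:
        return "gh-cli"  # step 3: delegate to the gh credential store
    return "unauthenticated"  # lowest rate limit (60 req/hr per this document)
```

In practice `gh_installed` could be computed as `shutil.which("gh") is not None`; it is a parameter here so the decision stays easy to test.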
This skill does not install any packages. requirements.txt lists optional dependencies (feedparser, jsonschema) for reference only. All scripts work with Python 3.8+ standard library. Users should install optional deps in a virtualenv if desired — the skill never runs pip install.
Scripts and helper CLIs make outbound HTTP requests to configured RSS feeds, podcast feeds, Twitter API, GitHub API, Reddit JSON API, Brave Search API, Tavily Search API, YouTube URLs handled by yt-dlp, and OpenCLI browser/session traffic for Twitter/X and Xiaoyuzhou sources. No inbound connections or listeners are created.