Automatically collect and report AI news from multiple sources with fallback browser scraping.
# Install dependencies
pip install -r references/requirements.txt
playwright install chromium
# Configure
python scripts/setup_config.py
# Run collection
python scripts/collect_ai_news.py
# Generate and push report
python scripts/push_to_feishu.py
| Source | Primary Method | Fallback Method |
|---|---|---|
| arXiv Papers | RSS API | Playwright browser |
| Hugging Face Papers | RSS Feed | Playwright browser |
| Product Hunt | RSS Feed | Playwright browser |
| YouTube AI Creators | yt-dlp | Playwright browser |
| PaperWeekly | RSS | requests |
| Custom RSS | feedparser | requests |
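
The primary-then-fallback pairing in the table above can be sketched as an ordered list of fetchers that are tried in turn. This is only an illustration of the pattern, not the code in `collect_ai_news.py`: the names `collect_with_fallback`, `fetch_via_rss`, and `fetch_via_browser` are hypothetical, and the two fetchers are stand-ins that do no network access.

```python
from typing import Callable, Dict, List, Sequence

Item = Dict[str, str]
Fetcher = Callable[[], List[Item]]


def collect_with_fallback(source: str, fetchers: Sequence[Fetcher]) -> List[Item]:
    """Try each fetcher in order and return the first non-empty result."""
    for fetch in fetchers:
        try:
            items = fetch()
        except Exception as exc:
            # Any failure (timeout, parse error, missing tool) moves on
            # to the next method in the list.
            print(f"[{source}] {fetch.__name__} failed: {exc}")
            continue
        if items:
            return items
    # Every method failed or returned nothing.
    return []


# Hypothetical stand-in for a primary RSS fetch that times out.
def fetch_via_rss() -> List[Item]:
    raise TimeoutError("feed timed out")


# Hypothetical stand-in for a Playwright-based scrape that succeeds.
def fetch_via_browser() -> List[Item]:
    return [{"title": "Example paper", "source": "arxiv"}]


result = collect_with_fallback("arxiv", [fetch_via_rss, fetch_via_browser])
```

Keeping the methods in a plain ordered list means a source with a different fallback (for example `requests` instead of Playwright) only needs a different list, not a different control flow.
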
Edit references/config.example.json or run scripts/setup_config.py:
{
"feishu": {
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/xxx",
"chat_id": "oc_xxx"
},
"sources": {
"arxiv": {"enabled": true, "categories": ["cs.CL", "cs.LG", "cs.AI"]},
"youtube": {
"enabled": true,
"creators": ["andrew_ng", "matt_wolfe", "ai_explained", "greg_isenberg"]
},
"paperweekly": {"enabled": true, "rss_url": ""}
}
}
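
As a rough illustration of how a configuration shaped like the one above might be consumed, the sketch below parses it and lists the enabled sources. The helper names `enabled_sources` and `sources_missing_rss_url` are hypothetical, and the JSON is inlined (with a shortened creator list) so the example runs without any files; the real scripts may read and interpret the file differently.

```python
import json

# Inlined copy of the example configuration (creator list shortened).
CONFIG_TEXT = """
{
  "feishu": {
    "webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/xxx",
    "chat_id": "oc_xxx"
  },
  "sources": {
    "arxiv": {"enabled": true, "categories": ["cs.CL", "cs.LG", "cs.AI"]},
    "youtube": {"enabled": true, "creators": ["andrew_ng", "matt_wolfe"]},
    "paperweekly": {"enabled": true, "rss_url": ""}
  }
}
"""


def enabled_sources(config: dict) -> list:
    """Return the names of sources whose "enabled" flag is true."""
    sources = config.get("sources", {})
    return [name for name, opts in sources.items() if opts.get("enabled")]


def sources_missing_rss_url(config: dict) -> list:
    """Return enabled sources that declare an "rss_url" key but leave it empty."""
    sources = config.get("sources", {})
    return [
        name
        for name, opts in sources.items()
        if opts.get("enabled") and "rss_url" in opts and not opts["rss_url"]
    ]


config = json.loads(CONFIG_TEXT)
names = enabled_sources(config)
missing = sources_missing_rss_url(config)
```

With the example values, `paperweekly` is enabled but has an empty `rss_url`, so a check like the second helper is one way to surface it before a collection run.
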
Available creator keys:
- `andrew_ng` - Andrew Ng (DeepLearning.AI)
- `matt_wolfe` - Matt Wolfe
- `ai_explained` - AI Explained
- `ai_with_oliver` - AI with Oliver
- `greg_isenberg` - Greg Isenberg

| Script | Purpose |
|---|---|
| collect_ai_news.py | Main collector with fallback logic |
| youtube_collector.py | YouTube video collection |
| rss_collector.py | RSS feed collection |
| browser_fallback.py | Browser-based fallback scraping |
| push_to_feishu.py | Report generation and Feishu push |
| daily_scheduler.py | Scheduled task runner |
| setup_config.py | Interactive configuration setup |
When primary methods (RSS/API/yt-dlp) fail:
Generated reports include:
- arXiv returns 0 papers: check the `days_back` parameter or the network connection.
- YouTube fails: ensure yt-dlp is installed; a Playwright fallback is available.
- RSS timeouts: the browser fallback will attempt direct requests.
- Feishu push fails: verify the webhook URL and `chat_id` in the config.
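
For the Feishu item above, a small pre-flight check can catch placeholder values before a push is attempted. This is a sketch under stated assumptions: `check_feishu_config` is a hypothetical helper, the URL prefix is taken from the example configuration, and the placeholder values (`xxx`, `oc_xxx`) are the ones shown there. It does not contact Feishu.

```python
from typing import List

# Prefix copied from the example configuration; treating it as the only
# valid prefix is an assumption.
WEBHOOK_PREFIX = "https://open.feishu.cn/open-apis/bot/v2/hook/"


def check_feishu_config(feishu: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    url = feishu.get("webhook_url", "")
    if not url.startswith(WEBHOOK_PREFIX):
        problems.append("webhook_url does not start with the expected prefix")
    elif url[len(WEBHOOK_PREFIX):] in ("", "xxx"):
        problems.append("webhook_url still holds the placeholder token")

    chat_id = feishu.get("chat_id", "")
    if chat_id in ("", "oc_xxx"):
        problems.append("chat_id is empty or still the placeholder")

    return problems


placeholder_problems = check_feishu_config(
    {"webhook_url": WEBHOOK_PREFIX + "xxx", "chat_id": "oc_xxx"}
)
ok_problems = check_feishu_config(
    {"webhook_url": WEBHOOK_PREFIX + "abc123", "chat_id": "oc_abc"}
)
```
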
- `rss` section in config
- `scripts/collect_ai_news.py`
- `browser_fallback.py`

See references/DEVELOPMENT.md for a detailed extension guide.