Automatically crawls news websites and RSS feeds, extracts their content, and generates summaries.
List the available news sources:
python3 scripts/rss_fetcher.py
Fetch news from a specific RSS feed:
python3 scripts/rss_fetcher.py <rss_url> [max_items]
Example:
python3 scripts/rss_fetcher.py https://www.solidot.org/index.rss 5
Crawl a web page and extract its content:
python3 scripts/crawl.py <url> [max_length]
Example:
python3 scripts/crawl.py https://example.com/news/article.html 3000
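Both commands can also be driven from Python. The sketch below is a minimal example that assumes each script prints a single JSON document to standard output, which this README suggests but does not state outright. `build_command` and `run_json` are hypothetical helper names, not part of the scripts.

```python
import json
import subprocess
import sys


def build_command(script, target, limit=None):
    """Build the argument list for one of the scripts (hypothetical helper)."""
    # sys.executable is the interpreter running this code; the README uses python3.
    cmd = [sys.executable, script, target]
    if limit is not None:
        # max_items or max_length, passed as the optional positional argument.
        cmd.append(str(limit))
    return cmd


def run_json(cmd, timeout=60):
    """Run a command and parse its stdout as JSON.

    Assumption: the command writes exactly one JSON document to stdout.
    Raises CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        cmd, capture_output=True, text=True, check=True, timeout=timeout
    )
    return json.loads(result.stdout)
```

Under those assumptions, `run_json(build_command("scripts/rss_fetcher.py", "https://www.solidot.org/index.rss", 5))` would return the parsed feed as a dictionary.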
Commonly used Chinese tech news sources:
International sources:
Output of rss_fetcher.py:
{
  "items": [
    {
      "title": "Article title",
      "link": "Article URL",
      "description": "Summary",
      "published": "Publication time"
    }
  ],
  "count": 10
}
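A consumer of this structure might look like the following sketch. It relies only on the field names shown above. The sample payload and the helper name `format_items` are made up for illustration.

```python
import json


def format_items(payload):
    """Return one 'published | title | link' line per feed item."""
    data = json.loads(payload)
    lines = []
    for item in data.get("items", []):
        # Use .get() so a missing field yields an empty string, not a KeyError.
        lines.append(" | ".join(
            item.get(key, "") for key in ("published", "title", "link")
        ))
    return lines


# Illustrative payload in the documented shape (not real feed data).
sample = json.dumps({
    "items": [{
        "title": "Example headline",
        "link": "https://example.com/a",
        "description": "Short summary",
        "published": "2024-01-01",
    }],
    "count": 1,
})

for line in format_items(sample):
    print(line)
```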
Output of crawl.py:
{
  "url": "Original URL",
  "title": "Page title",
  "content": "Body text",
  "length": 5000
}
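The fields above are enough to build a short preview of a crawled page. The sketch below assumes `length` is the character count of `content`, which the README does not confirm, so it measures the text itself rather than trusting that field. `make_preview` is a hypothetical helper.

```python
import json


def make_preview(payload, max_chars=200):
    """Return (title, preview) from a crawl.py-style JSON document."""
    page = json.loads(payload)
    content = page.get("content", "")
    # Cut at max_chars and mark the truncation with an ellipsis.
    if len(content) > max_chars:
        preview = content[:max_chars].rstrip() + "..."
    else:
        preview = content
    return page.get("title", ""), preview


# Illustrative payload in the documented shape (not real crawl output).
sample = json.dumps({
    "url": "https://example.com/news/article.html",
    "title": "Example page",
    "content": "word " * 100,
    "length": 500,
})

title, preview = make_preview(sample, max_chars=20)
print(title)
print(preview)
```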
To support additional features, see: