Web Fetcher

Smart web content fetcher - articles and videos from WeChat, Feishu, Bilibili, Zhihu, Toutiao, YouTube, etc. Triggers: '抓取文章', '下载网页', '保存文章', 'fetch URL', '...
alexxxiong
Data Analysis · clawhub · v0.1.1 · 1 version · 99897 · Key: not required
★ 0 stars · 📥 970 downloads · 💾 104 installs · 1 version · #latest

Overview

Web Fetcher

Smart web content fetcher for Claude Code. Automatically detects platform and uses the best strategy to fetch articles or download videos.

Quick Start

# Fetch an article
python3 {SKILL_DIR}/fetcher.py "URL" -o ~/docs/

# Download a video
python3 {SKILL_DIR}/fetcher.py "https://b23.tv/xxx" -o ~/videos/

# Batch fetch from file
python3 {SKILL_DIR}/fetcher.py --urls-file urls.txt -o ~/docs/

Install Dependencies

Install only what you need — dependencies are checked at runtime:

| Dependency | Purpose | Install |
|------------|---------|---------|
| scrapling | Article fetching (HTTP + browser) | `pip install scrapling` |
| yt-dlp | Video download | `pip install yt-dlp` |
| camoufox | Anti-detection browser (Xiaohongshu, Weibo) | `pip install camoufox && python3 -m camoufox fetch` |
| html2text | HTML to Markdown conversion | `pip install html2text` |
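
A runtime dependency check of the kind described above could look roughly like the following. This is a minimal sketch, not the fetcher's actual code: the `DEPENDENCIES` mapping and the helpers `is_available` and `install_hint` are hypothetical names, and the importable module names are assumptions based on each package's usual import name.

```python
import importlib.util

# Hypothetical mapping for illustration: dependency -> (importable module, install hint).
# The install hints are copied from the table above; the module names are assumptions.
DEPENDENCIES = {
    "scrapling": ("scrapling", "pip install scrapling"),
    "yt-dlp": ("yt_dlp", "pip install yt-dlp"),
    "camoufox": ("camoufox", "pip install camoufox && python3 -m camoufox fetch"),
    "html2text": ("html2text", "pip install html2text"),
}


def is_available(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def install_hint(name):
    """Return None if the dependency is present, otherwise the command to suggest."""
    module_name, hint = DEPENDENCIES[name]
    return None if is_available(module_name) else hint
```

Checking with `find_spec` avoids importing heavy packages just to see whether they exist, so the check stays cheap even when only one method is needed.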

Smart Routing

The fetcher automatically detects the platform from the URL:

| Platform | Method | Notes |
|----------|--------|-------|
| mp.weixin.qq.com | scrapling | Extracts data-src images, handles SVG placeholders |
| *.feishu.cn | Virtual scroll | Collects all blocks via scrolling, downloads images with cookies |
| zhuanlan.zhihu.com | scrapling | `.Post-RichText` selector |
| www.zhihu.com | scrapling | `.RichContent` selector |
| www.toutiao.com | scrapling | Handles toutiaoimg.com base64 placeholders |
| www.xiaohongshu.com | camoufox | Anti-bot protection requires stealth browser |
| www.weibo.com | camoufox | Anti-bot protection requires stealth browser |
| bilibili.com / b23.tv | yt-dlp | Video download, supports quality selection |
| youtube.com / youtu.be | yt-dlp | Video download |
| douyin.com | yt-dlp | Video download |
| Unknown URLs | scrapling | Generic fetch with fallback tiers |
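
Host-based routing like this can be approximated with a small lookup over the URL's hostname. The sketch below is illustrative only: `ROUTES` and `route_sketch` are hypothetical names, the method strings reuse the values accepted by `--method`, and the real `lib.router.route` returns a richer dictionary than a single method name.

```python
from urllib.parse import urlparse

# (host pattern, method) pairs taken from the table above.
# A leading "*." is stripped; every pattern matches the host itself or any subdomain.
ROUTES = [
    ("mp.weixin.qq.com", "scrapling"),
    ("*.feishu.cn", "feishu"),
    ("zhuanlan.zhihu.com", "scrapling"),
    ("www.zhihu.com", "scrapling"),
    ("www.toutiao.com", "scrapling"),
    ("www.xiaohongshu.com", "camoufox"),
    ("www.weibo.com", "camoufox"),
    ("bilibili.com", "ytdlp"),
    ("b23.tv", "ytdlp"),
    ("youtube.com", "ytdlp"),
    ("youtu.be", "ytdlp"),
    ("douyin.com", "ytdlp"),
]


def route_sketch(url):
    """Pick a fetch method from the URL's hostname, defaulting to scrapling."""
    host = (urlparse(url).hostname or "").lower()
    for pattern, method in ROUTES:
        suffix = pattern[2:] if pattern.startswith("*.") else pattern
        if host == suffix or host.endswith("." + suffix):
            return method
    return "scrapling"  # unknown URLs fall back to the generic fetch
```

Matching on a dot-prefixed suffix rather than a plain substring keeps a host such as `notbilibili.com` from being routed to the video downloader.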

CLI Reference

python3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]

Arguments:
  url                    URL to fetch

Options:
  -o, --output DIR       Output directory (default: current)
  -q, --quality N        Video quality, e.g. 1080, 720 (default: 1080)
  --method METHOD        Force method: scrapling, camoufox, ytdlp, feishu
  --selector CSS         Force CSS selector for content extraction
  --urls-file FILE       File with URLs (one per line, # for comments)
  --audio-only           Extract audio only (video downloads)
  --no-images            Skip image download (articles)
  --cookies-browser NAME Browser for cookies (e.g., chrome, firefox)
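
The `--urls-file` format (one URL per line, `#` for comments) can be parsed in a few lines. This is a sketch under one assumption: only lines that begin with `#` are treated as comments, since a `#` inside a URL may be a legitimate fragment. `parse_url_list` is a hypothetical helper, not part of the fetcher.

```python
def parse_url_list(text):
    """Return the URLs in a urls-file body, skipping blank lines and comment lines."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: a comment is a line whose first non-space character is "#".
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```

For example, a file containing a comment line, a blank line, and two URLs yields a list with just the two URLs in their original order.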

Platform Notes

WeChat (mp.weixin.qq.com)

  • Images use data-src attribute with mmbiz.qpic.cn URLs
  • Visible image tags contain SVG placeholders (lazy loading)
  • Image download requires Referer: https://mp.weixin.qq.com/ header
  • Scrapling GET usually works; no browser needed
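
The notes above can be illustrated with the standard library alone: collect the `data-src` values from `img` tags and attach the Referer header when requesting each image. This is a hedged sketch rather than the fetcher's implementation; `DataSrcCollector`, `extract_image_urls`, and `image_request` are hypothetical names, and the request object is only built here, not sent.

```python
from html.parser import HTMLParser
from urllib.request import Request


class DataSrcCollector(HTMLParser):
    """Collect the data-src attribute of every <img> tag, ignoring placeholder src values."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        data_src = dict(attrs).get("data-src")
        if data_src:
            self.urls.append(data_src)


def extract_image_urls(html):
    """Return the real image URLs found in data-src attributes, in document order."""
    collector = DataSrcCollector()
    collector.feed(html)
    return collector.urls


def image_request(url):
    """Build (but do not send) a request carrying the Referer the image host expects."""
    return Request(url, headers={"Referer": "https://mp.weixin.qq.com/"})
```

Passing the result of `image_request` to `urllib.request.urlopen` would perform the download; images without a `data-src` attribute are skipped.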

Feishu (*.feishu.cn)

  • Uses virtual scroll — content blocks are rendered on-demand
  • The fetcher scrolls through the entire document, collecting [data-block-id] elements
  • Images require authenticated fetch (cookies), downloaded via browser's fetch API
  • May show "Unable to print" artifacts which are auto-cleaned
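
Because a virtual-scroll page only renders the blocks near the viewport, each scroll step sees a partial and overlapping set of blocks. The accumulation step can be sketched without a browser as below. The input shape (a list of snapshots, each a list of `(block_id, text)` pairs read from `[data-block-id]` elements) and the name `merge_snapshots` are assumptions made for illustration.

```python
def merge_snapshots(snapshots):
    """Merge per-scroll snapshots of (block_id, text) pairs, keeping first-seen order."""
    merged = {}
    for snapshot in snapshots:
        for block_id, text in snapshot:
            # A block visible in two consecutive scroll positions is kept only once.
            merged.setdefault(block_id, text)
    return list(merged.values())
```

When the page is scrolled from top to bottom, first-seen order matches document order, so the merged list reads as the full document.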

Bilibili

  • Short links (b23.tv) are auto-resolved
  • For premium/member content, use --cookies-browser chrome
  • Default quality is 1080p, adjustable with -q
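
One plausible way to map these options onto yt-dlp is to build its command line directly. The sketch below uses real yt-dlp flags (`-P`, `-o`, `-f`, `--extract-audio`, `--cookies-from-browser`), but whether the fetcher invokes yt-dlp this way is an assumption, and `build_ytdlp_command` is a hypothetical helper.

```python
def build_ytdlp_command(url, output_dir, quality=1080, audio_only=False, cookies_browser=None):
    """Assemble a yt-dlp argument list; pass the result to subprocess.run to execute it."""
    cmd = ["yt-dlp", "-P", output_dir, "-o", "%(title)s.%(ext)s"]
    if audio_only:
        cmd.append("--extract-audio")
    else:
        # Prefer the best video at or below the requested height, merged with the best audio.
        cmd += ["-f", f"bestvideo[height<={quality}]+bestaudio/best[height<={quality}]"]
    if cookies_browser:
        # Reuse a logged-in browser session for premium or member content.
        cmd += ["--cookies-from-browser", cookies_browser]
    cmd.append(url)
    return cmd
```

Building the arguments as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.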

Troubleshooting

| Problem | Solution |
|---------|----------|
| scrapling not found | `pip install scrapling` |
| yt-dlp not found | `pip install yt-dlp` |
| Article content too short | Try `--method camoufox` for JS-heavy pages |
| Feishu returns login page | The doc may require authentication |
| Bilibili 403 | Use `--cookies-browser chrome` |
| Image download fails | Check network; WeChat images need Referer header (auto-handled) |

Manual Usage

When the CLI doesn't fit your needs, use the modules directly:

from lib.router import route, check_dependency
from lib.article import fetch_article
from lib.video import fetch_video
from lib.feishu import fetch_feishu

# Route a URL
r = route("https://mp.weixin.qq.com/s/xxx")
# {'type': 'article', 'method': 'scrapling', 'selector': '#js_content', 'post': 'wx_images'}

# Fetch article
fetch_article(url, output_dir="/tmp/out", route_config=r)

# Download video
fetch_video(url, output_dir="/tmp/out", quality="720")

# Fetch Feishu doc
fetch_feishu(url, output_dir="/tmp/out")

Version History

1 version in total

  • v0.1.1 (current)
    2026-03-31 01:37 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
