XPR Web Scraping
Tools for fetching and extracting cleaned text, metadata, and links from single or multiple web pages with format options and link filtering.
paulgnz
Developer Tools
clawhub
v0.2.11 1 version 99959.7 Key: not required
#extraction #latest #web-scraping #xpr
Overview
Web Scraping
You have web scraping tools for fetching and extracting data from web pages:
Single page:
scrape_url: fetch a URL and get cleaned text content plus metadata (title, description, link count)
- Use format="text" (default) for most tasks; it strips all HTML
- Use format="markdown" to preserve headings, links, lists, bold/italic
- Use format="html" only when you need raw HTML
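The format guidance above can be written down as a small decision helper. This is a minimal sketch, not part of the tool: `chooseFormat` and `buildScrapeUrlArgs` are hypothetical names, and the `url` argument key is an assumption, since the description only names the `format` parameter and its three values.

```javascript
// Hypothetical helper: maps what the caller needs to a scrape_url format value.
// Only the three format strings come from the tool description.
function chooseFormat({ needsRawHtml = false, needsStructure = false } = {}) {
  if (needsRawHtml) return "html"; // raw HTML only when explicitly required
  if (needsStructure) return "markdown"; // keep headings, links, lists
  return "text"; // default: all HTML stripped
}

// Hypothetical argument object for a scrape_url call.
// The "url" key name is an assumption; "format" is named in the description.
function buildScrapeUrlArgs(url, needs) {
  return { url, format: chooseFormat(needs) };
}
```

Under these assumptions, calling `buildScrapeUrlArgs("https://example.com", { needsStructure: true })` would yield an argument object with `format` set to `"markdown"`.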
Link discovery:
extract_links: fetch a page and extract all links with text and type (internal/external)
- Use the pattern parameter to filter by regex (e.g. "\\.pdf$" for PDF links)
- Links are deduplicated and resolved to absolute URLs
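To make the link-handling behaviour concrete, here is a rough sketch of how resolution, deduplication, regex filtering, and internal/external classification could fit together. It is an illustration only and not the tool's actual implementation: `filterLinks` is a hypothetical name, and comparing hosts to decide internal versus external is an assumption.

```javascript
// Sketch only: resolve hrefs against the page URL, deduplicate,
// filter by an optional regex pattern, and tag each link's type.
function filterLinks(hrefs, baseUrl, pattern) {
  const base = new URL(baseUrl);
  const regex = pattern ? new RegExp(pattern) : null;
  const seen = new Set();
  const links = [];
  for (const href of hrefs) {
    let absolute;
    try {
      absolute = new URL(href, base).href; // resolve relative hrefs
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    if (seen.has(absolute)) continue; // deduplicate on the absolute URL
    seen.add(absolute);
    if (regex && !regex.test(absolute)) continue;
    // Assumption: "internal" means same host as the page being scraped.
    const type = new URL(absolute).host === base.host ? "internal" : "external";
    links.push({ url: absolute, type });
  }
  return links;
}
```

In JavaScript source, the pattern from the description is written `"\\.pdf$"`, which produces the regex `\.pdf$`. Note that this pattern would not match a PDF URL that carries a query string.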
Multi-page research:
scrape_multiple: fetch up to 10 URLs in parallel for comparison/research
- One failure doesn't block others (uses Promise.allSettled)
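The failure-isolation behaviour comes from how Promise.allSettled works: it waits for every promise and reports each outcome separately instead of rejecting on the first error. The sketch below illustrates that pattern under stated assumptions: `scrapeAll` and its `fetchPage` argument are hypothetical stand-ins, and rejecting calls with more than 10 URLs is a guess at how the limit might be enforced.

```javascript
// Sketch only: fetch several URLs in parallel and report each outcome.
// fetchPage is a hypothetical function that fetches one URL.
async function scrapeAll(urls, fetchPage, maxUrls = 10) {
  if (urls.length > maxUrls) {
    // Assumption: the real tool may enforce its limit differently.
    throw new Error(`At most ${maxUrls} URLs per call`);
  }
  // The async wrapper turns synchronous throws into rejections too.
  const settled = await Promise.allSettled(
    urls.map(async (url) => fetchPage(url))
  );
  // Results come back in the same order as the input URLs.
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? { url: urls[i], ok: true, content: result.value }
      : { url: urls[i], ok: false, error: String(result.reason) }
  );
}
```

With Promise.all, a single rejected fetch would discard the results of every other page; Promise.allSettled keeps the successful pages and lets the caller inspect each failure.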
Best practices:
- Prefer "text" format for content extraction, "markdown" for preserving structure
- Don't scrape the same domain more than 5 times per minute
- Combine with store_deliverable to save scraped content as job evidence
- For very large pages, the content is limited to 5MB
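One way a caller could respect the 5-requests-per-minute guideline is a sliding-window counter keyed by hostname. The following is a minimal sketch under stated assumptions: `DomainRateLimiter` is a hypothetical class, it is not something the tool provides, and treating each hostname (including subdomains) as its own "domain" is an assumption.

```javascript
// Sketch only: sliding-window limiter, 5 requests per 60 seconds per hostname.
class DomainRateLimiter {
  constructor(maxRequests = 5, windowMs = 60000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.history = new Map(); // hostname -> timestamps of recent requests
  }

  // Returns true and records the request if the hostname is under its limit;
  // returns false without recording otherwise.
  tryAcquire(url, now = Date.now()) {
    const host = new URL(url).hostname;
    // Keep only timestamps that still fall inside the window.
    const recent = (this.history.get(host) || []).filter(
      (t) => now - t < this.windowMs
    );
    const allowed = recent.length < this.maxRequests;
    if (allowed) recent.push(now);
    this.history.set(host, recent);
    return allowed;
  }
}
```

A caller would check `limiter.tryAcquire(url)` before each scrape and wait or skip when it returns false. The `now` parameter exists only so the behaviour can be exercised without real delays.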
Version history
1 version in total
- v0.2.11 (current), 2026-03-28 17:39, safe
Security check
Tencent Cloud Security (Sanbu)
Safe, no risk