XPR Web Scraping

Tools for fetching and extracting cleaned text, metadata, and links from single or multiple web pages with format options and link filtering.


paulgnz

Developer tools · clawhub v0.2.11 · 1 version · 99959.7 · Key: not required

★ 0 stars · 📥 2,482 downloads · 💾 51 installs


#extraction #latest #web-scraping #xpr

Overview

Web Scraping

You have web scraping tools for fetching and extracting data from web pages:

Single page:

scrape_url: fetch a URL and return cleaned text content plus metadata (title, description, link count)
  Use format="text" (default) for most tasks; it strips all HTML
  Use format="markdown" to preserve headings, links, lists, and bold/italic
  Use format="html" only when you need the raw HTML
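The sketch below is a rough, hypothetical approximation of what format="text" and the metadata fields could involve; it is not the tool's actual code. The names htmlToText and extractMetadata are illustrative, and the regex-based parsing is a simplification that a production scraper would replace with a real HTML parser.

```typescript
// Hypothetical sketch: approximates format="text" output and page metadata.
// Regex-based HTML handling is a simplification; HTML entities are not decoded.

interface PageMetadata {
  title: string;
  description: string;
  linkCount: number;
}

// Drop <head>, <script>, and <style> blocks, strip remaining tags, collapse whitespace.
function htmlToText(html: string): string {
  return html
    .replace(/<head(\s[^>]*)?>[\s\S]*?<\/head>/gi, " ")
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Pull the <title>, the meta description, and a count of <a href> anchors.
function extractMetadata(html: string): PageMetadata {
  const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)?.[1]?.trim() ?? "";
  const description =
    /<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i.exec(html)?.[1] ?? "";
  const linkCount = (html.match(/<a\s[^>]*href=/gi) ?? []).length;
  return { title, description, linkCount };
}

// Usage with an inline sample page (no network access needed).
const page = `<html><head><title>Demo</title>
<meta name="description" content="A demo page"></head>
<body><h1>Hello</h1><script>var x = 1;</script>
<p>See <a href="/a">A</a> and <a href="https://example.com">B</a>.</p></body></html>`;
console.log(htmlToText(page));
console.log(extractMetadata(page));
```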

Link discovery:

extract_links: fetch a page and extract all links with their anchor text and type (internal/external)
  Use the pattern parameter to filter links by regex (e.g. the regex \.pdf$, written "\\.pdf$" as an escaped string, keeps only PDF links)
  Links are deduplicated and resolved to absolute URLs
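A minimal sketch of the link post-processing described above, assuming the page HTML has already been fetched. The extractLinks function, its signature, and the choices to drop non-HTTP links and strip #fragments before deduplicating are assumptions for illustration, not the documented behavior of extract_links.

```typescript
// Hypothetical sketch of extract_links-style processing: resolve hrefs to
// absolute URLs, deduplicate, classify by hostname, optionally filter by regex.

type LinkType = "internal" | "external";

interface ExtractedLink {
  url: string;
  text: string;
  type: LinkType;
}

// Resolve a possibly relative href against the page URL; null if unparseable.
function toAbsolute(href: string, base: URL): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

function extractLinks(html: string, pageUrl: string, pattern?: string): ExtractedLink[] {
  const base = new URL(pageUrl);
  const filter = pattern ? new RegExp(pattern) : null;
  const seen = new Set<string>();
  const links: ExtractedLink[] = [];
  // Regex-based anchor matching is a simplification of real HTML parsing.
  const anchorRe = /<a\s[^>]*href=["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;

  let match: RegExpExecArray | null;
  while ((match = anchorRe.exec(html)) !== null) {
    const absolute = toAbsolute(match[1], base);
    if (!absolute) continue;
    // Assumption: only http(s) links are kept (drops mailto:, javascript:, ...).
    if (absolute.protocol !== "http:" && absolute.protocol !== "https:") continue;
    absolute.hash = ""; // assumption: #fragments of one page count as one link
    const url = absolute.href;
    if (seen.has(url)) continue; // deduplicate on the absolute URL
    seen.add(url);
    if (filter && !filter.test(url)) continue; // regex filter on the absolute URL
    links.push({
      url,
      text: match[2].replace(/<[^>]+>/g, "").trim(),
      type: absolute.hostname === base.hostname ? "internal" : "external",
    });
  }
  return links;
}
```

Filtering on the resolved absolute URL means a pattern such as "\\.pdf$" behaves the same whether the page used relative or absolute hrefs.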

Multi-page research:

scrape_multiple: fetch up to 10 URLs in parallel for comparison or research
  A failure on one URL does not block the others (the tool uses Promise.allSettled)
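The fan-out pattern behind scrape_multiple can be sketched as follows. The scrapeMultiple function, the PageResult shape, the injected fetchPage callback, and rejecting calls with more than 10 URLs (rather than, say, truncating the list) are all assumptions made for illustration.

```typescript
// Hypothetical sketch of scrape_multiple's fan-out using Promise.allSettled.
// `fetchPage` stands in for the real single-page scraper (assumption).

const MAX_URLS = 10;

type PageResult =
  | { url: string; ok: true; content: string }
  | { url: string; ok: false; error: string };

async function scrapeMultiple(
  urls: string[],
  fetchPage: (url: string) => Promise<string>,
): Promise<PageResult[]> {
  // Assumption: exceeding the limit is an error rather than a silent truncation.
  if (urls.length > MAX_URLS) {
    throw new Error(`At most ${MAX_URLS} URLs are allowed, got ${urls.length}`);
  }
  // All fetches start at once; allSettled waits for every one, success or failure.
  const settled = await Promise.allSettled(urls.map((url) => fetchPage(url)));
  return settled.map((outcome, i): PageResult =>
    outcome.status === "fulfilled"
      ? { url: urls[i], ok: true, content: outcome.value }
      : { url: urls[i], ok: false, error: String(outcome.reason) },
  );
}
```

Because Promise.allSettled never rejects, each URL yields either content or an error message, so one failing site does not discard the results from the others.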

Best practices:

Prefer the "text" format for content extraction and "markdown" when you need to preserve structure
Don't scrape the same domain more than 5 times per minute
Combine with store_deliverable to save scraped content as job evidence
Content is limited to 5 MB per page, so very large pages may not be captured in full
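One way a caller might honor the per-domain rate guideline and the 5 MB cap is sketched below. DomainRateLimiter, capBody, the sliding-window approach, reading "5MB" as 5 MiB, and truncating rather than rejecting oversized bodies are assumptions, not the tool's documented internals.

```typescript
// Hypothetical sketch: a per-domain sliding-window limiter (5 requests per
// 60 s) and a byte cap on fetched bodies. All names here are illustrative.

const MAX_REQUESTS_PER_WINDOW = 5;
const WINDOW_MS = 60_000;
const MAX_CONTENT_BYTES = 5 * 1024 * 1024; // assumption: "5MB" read as 5 MiB

class DomainRateLimiter {
  // Timestamps (ms) of recent requests, keyed by hostname.
  private readonly history = new Map<string, number[]>();

  // Records the request and returns true if the domain is under its limit;
  // returns false (without recording) if the limit is already reached.
  tryAcquire(url: string, now: number = Date.now()): boolean {
    const domain = new URL(url).hostname;
    const recent = (this.history.get(domain) ?? []).filter((t) => now - t < WINDOW_MS);
    if (recent.length >= MAX_REQUESTS_PER_WINDOW) {
      this.history.set(domain, recent);
      return false;
    }
    recent.push(now);
    this.history.set(domain, recent);
    return true;
  }
}

// Assumption: oversized bodies are truncated to the cap rather than rejected.
function capBody(body: Uint8Array): { body: Uint8Array; truncated: boolean } {
  if (body.length <= MAX_CONTENT_BYTES) return { body, truncated: false };
  return { body: body.subarray(0, MAX_CONTENT_BYTES), truncated: true };
}
```

A sliding window is used here instead of fixed per-minute buckets so that a burst straddling a minute boundary cannot reach 10 requests in a few seconds.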

Version history

1 version in total

v0.2.11 (current)

2026-03-28 17:39 · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk


Tencent Cloud Security (Sanbu)

Safe, no risk


🔗 Related

developer-tools

CodeConductor.ai

larsonreever

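An AI-driven platform for building scalable products, offering rapid full-stack development, agents, workflow automation, and low-code AI integration.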

★ 68 📥 180,176

developer-tools

Gog

steipete

A Google Workspace command-line tool supporting Gmail, Calendar, Drive, Contacts, Sheets, and Docs.

★ 921 📥 185,799

productivity

XPR Code Sandbox

paulgnz

Runs JavaScript code in a secure sandbox for data processing, calculations, and quick expression evaluation, with no network or file system access.

★ 0 📥 1,318