
XPR Web Scraping

Tools for fetching and extracting cleaned text, metadata, and links from single or multiple web pages with format options and link filtering.
paulgnz
Developer Tools clawhub v0.2.11 1 version 99959.7 Key: not required
★ 0 Stars
📥 2,482 Downloads
💾 51 Installs
1 Version
#extraction #latest #web-scraping #xpr

Overview

Web Scraping

You have web scraping tools for fetching and extracting data from web pages:

Single page:

  • scrape_url — fetch a URL and get cleaned text content + metadata (title, description, link count)
  • Use format="text" (default) for most tasks — strips all HTML
  • Use format="markdown" to preserve headings, links, lists, bold/italic
  • Use format="html" only when you need raw HTML
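The format choice can be illustrated with a small argument builder. This is only a sketch: the `format` parameter and its three values come from the list above, while the helper name `buildScrapeArgs` and the `url` field name are assumptions made for illustration, not the skill's documented call shape.

```javascript
// Hypothetical helper that prepares arguments for a scrape_url call.
// Only `format` and its values ("text", "markdown", "html") come from the
// description above; the `url` field name is an assumption.
const FORMATS = ["text", "markdown", "html"];

function buildScrapeArgs(url, format = "text") {
  if (!FORMATS.includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  // new URL() throws on malformed input, so bad URLs fail before any request.
  return { url: new URL(url).href, format };
}

// Default: plain text with all HTML stripped.
console.log(buildScrapeArgs("https://example.com/docs"));
// Markdown keeps headings, links, lists, and emphasis.
console.log(buildScrapeArgs("https://example.com/docs", "markdown"));
```

Defaulting to "text" mirrors the guidance above; the other two formats have to be requested explicitly.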

Link discovery:

  • extract_links — fetch a page and extract all links with text and type (internal/external)
  • Use the pattern parameter to filter by regex (e.g. "\\.pdf$" for PDF links)
  • Links are deduplicated and resolved to absolute URLs
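The post-processing described above (resolve to absolute URLs, deduplicate, filter by regex, label internal or external) might look roughly like the sketch below. The function name `filterLinks`, matching the pattern against the resolved URL, and classifying links by comparing hosts are all assumptions; the skill's actual implementation is not shown in the source.

```javascript
// Hypothetical sketch of link post-processing; not the skill's actual code.
function filterLinks(hrefs, baseUrl, pattern) {
  const base = new URL(baseUrl);
  const regex = pattern ? new RegExp(pattern) : null;
  const seen = new Set();
  const links = [];
  for (const href of hrefs) {
    let resolved;
    try {
      resolved = new URL(href, base); // resolve relative hrefs against the page URL
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (seen.has(resolved.href)) continue; // deduplicate on the absolute URL
    seen.add(resolved.href);
    if (regex && !regex.test(resolved.href)) continue; // assumption: pattern is tested against the URL
    links.push({
      url: resolved.href,
      // Assumption: a link is "internal" when it shares the page's host.
      type: resolved.host === base.host ? "internal" : "external",
    });
  }
  return links;
}

const sample = [
  "/files/report.pdf",
  "report2.pdf",
  "/files/report.pdf", // duplicate, dropped
  "https://other.example/a.pdf",
  "/about", // not a PDF, filtered out
];
console.log(filterLinks(sample, "https://example.com/docs/", "\\.pdf$"));
```

Note that `"\\.pdf$"` is the string-literal form of the regex `\.pdf$`: the backslash is doubled so that the dot is matched literally once the string is compiled.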

Multi-page research:

  • scrape_multiple — fetch up to 10 URLs in parallel for comparison/research
  • One failure doesn't block others (uses Promise.allSettled)
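The failure isolation mentioned above follows from how `Promise.allSettled` behaves: it waits for every promise and reports each outcome separately instead of rejecting on the first error. A minimal sketch, assuming a caller-supplied `fetchPage` function (hypothetical, injected so the example runs without network access) and assuming that more than 10 URLs is rejected rather than truncated:

```javascript
const MAX_URLS = 10; // limit stated in the description above

// `fetchPage` is a hypothetical async function (url) => content.
async function scrapeMultiple(urls, fetchPage) {
  if (urls.length > MAX_URLS) {
    // Assumption: the real tool may truncate instead of rejecting.
    throw new Error(`At most ${MAX_URLS} URLs per call`);
  }
  // All requests start in parallel; allSettled never rejects.
  const settled = await Promise.allSettled(urls.map((url) => fetchPage(url)));
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? { url: urls[i], ok: true, content: result.value }
      : { url: urls[i], ok: false, error: String(result.reason) }
  );
}

// Stub fetcher for demonstration: any URL containing "broken" fails.
async function fakeFetch(url) {
  if (url.includes("broken")) throw new Error("HTTP 500");
  return `content of ${url}`;
}

scrapeMultiple(
  ["https://example.com/a", "https://example.com/broken"],
  fakeFetch
).then((results) => console.log(results));
```

With `Promise.all` the single failing URL would have rejected the whole batch; here the caller gets one entry per URL and can retry only the failures.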

Best practices:

  • Prefer "text" format for content extraction, "markdown" for preserving structure
  • Don't scrape the same domain more than 5 times per minute
  • Combine with store_deliverable to save scraped content as job evidence
  • For very large pages, the content is limited to 5MB
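The per-domain limit above can be respected with a small sliding-window counter on the caller's side. This is a sketch under assumptions: the class `DomainRateLimiter` is hypothetical and not part of the skill, and "domain" is taken to mean the URL's hostname.

```javascript
const MAX_PER_MINUTE = 5; // guideline from the list above
const WINDOW_MS = 60 * 1000;

// Hypothetical client-side guard; not part of the skill itself.
class DomainRateLimiter {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock, useful for testing
    this.history = new Map(); // hostname -> timestamps of recent requests
  }

  // Returns true and records the request if the domain is under the limit.
  tryAcquire(url) {
    const host = new URL(url).hostname;
    const cutoff = this.now() - WINDOW_MS;
    // Keep only the requests made within the last minute.
    const recent = (this.history.get(host) || []).filter((t) => t > cutoff);
    const allowed = recent.length < MAX_PER_MINUTE;
    if (allowed) recent.push(this.now());
    this.history.set(host, recent);
    return allowed;
  }
}

const limiter = new DomainRateLimiter();
for (let i = 1; i <= 6; i++) {
  // The first five calls are allowed; the sixth is refused.
  console.log(i, limiter.tryAcquire(`https://example.com/page/${i}`));
}
```

A caller would check `tryAcquire` before each scrape and wait or skip when it returns false.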

Version History

1 version in total

  • v0.2.11 Current
    2026-03-28 17:39 Safe Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
