
Website Scraper Pro

Run a local script to scrape a single web page into clean markdown or deterministic JSON with Crawl4AI. Use when the user needs direct page retrieval from a URL.
youpele52
Data Analysis · clawhub · v0.1.1 (#latest) · 2 versions · 100000 · Key: not required
Stars: 0 · Downloads: 795 · Installs: 12

Overview

Skill: Website Scraper Pro

When to use

  • The user wants the content of a single web page from a specific URL.
  • The user wants clean markdown extracted from an article, docs page, blog post, or landing page.
  • The user wants a JS-aware scrape for a page that depends on client-side rendering.
  • The user wants deterministic query-focused narrowing of one page without using an AI model inside the skill.
  • The user wants structured JSON output with markdown, title, links, and metadata.

When NOT to use

  • The user wants a broad web search across multiple sources.
  • The user wants a site-wide crawl, recursive crawl, or multi-page extraction workflow.
  • The user wants AI summarization, synthesis, or answer generation inside the scraper itself.
  • The user wants authenticated browser automation or interactive form submission.

Commands

Scrape a page to markdown

uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py <URL>

Scrape a JS-heavy page

uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py <URL> --js

Scrape a page and narrow by query

uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py <URL> --query "<TEXT>"

Return deterministic JSON

uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py <URL> --format json

Examples

# Default markdown scrape
uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py https://example.com

# JS-aware scrape
uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py https://example.com --js

# Query-focused retrieval
uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py https://example.com --query "documentation examples"

# JSON output
uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py https://example.com --format json
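When the skill is called from another program rather than typed by hand, the command forms above reduce to one argument list. The following is a minimal Python sketch, not part of the skill itself. `SCRIPT`, `build_command`, and `scrape` are hypothetical names for illustration, and the sketch assumes the scraper writes its result to standard output. Only the `--js`, `--query`, and `--format json` flags documented here are used.

```python
import subprocess

# Script path as shown in the commands above.
SCRIPT = "/root/.openclaw/workspace/skills/website-scraper-pro/src/main.py"


def build_command(url, js=False, query=None, json_output=False):
    """Assemble the argv list for one single-page scrape (hypothetical helper)."""
    cmd = ["uv", "run", SCRIPT, url]
    if js:
        cmd.append("--js")  # JS-aware scrape
    if query:
        cmd += ["--query", query]  # deterministic query-focused narrowing
    if json_output:
        cmd += ["--format", "json"]  # deterministic JSON instead of markdown
    return cmd


def scrape(url, **options):
    """Run the scraper and return its stdout (assumed to carry the result)."""
    result = subprocess.run(
        build_command(url, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs and query text. This page does not say whether the flags can be combined, so treat combinations as untested.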

Output

  • Default output is clean markdown for a single page.
  • --query keeps the output deterministic and non-LLM.
  • --format json returns deterministic JSON with fields such as title, url, markdown, links, and metadata when available.
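Because the JSON fields are present only "when available", a consumer should read them defensively. The sketch below assumes the output is a single top-level JSON object. `read_page` is a hypothetical helper, the sample payload is invented for illustration, and the shapes of `links` and `metadata` are not specified on this page, so they are passed through untouched.

```python
import json


def read_page(raw):
    """Parse the scraper's JSON text, tolerating missing optional fields."""
    data = json.loads(raw)
    return {
        "title": data.get("title"),
        "url": data.get("url"),
        "markdown": data.get("markdown", ""),
        "links": data.get("links"),        # shape not documented; passed through
        "metadata": data.get("metadata"),  # shape not documented; passed through
    }


# Hypothetical payload, not real scraper output.
sample = '{"title": "Example Domain", "url": "https://example.com", "markdown": "# Example Domain"}'
page = read_page(sample)
print(page["title"], len(page["markdown"]))
```

Using `dict.get` means a missing field yields `None` (or an empty string for `markdown`) instead of raising `KeyError`.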

Notes

  • This v1 does not use AI models internally. It is a deterministic retrieval tool only.
  • The skill is single-page only. It does not do deep crawling, site maps, schema extraction, or RAG.
  • uv run reads the inline # /// script dependency block in main.py and installs crawl4ai in an isolated environment.
  • If browser setup is missing, run one-time setup commands such as:
      uv run --with crawl4ai crawl4ai-setup
      uv run --with crawl4ai python -m playwright install chromium
  • Do NOT use web search for this workflow when a direct URL is available.
  • Call uv run on src/main.py directly, using the full path shown in the commands above.
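For readers unfamiliar with the inline `# /// script` block mentioned above: it is the inline script metadata format (PEP 723) that uv reads before running a file. The sketch below is a hypothetical skeleton, not the skill's actual main.py. The `requires-python` bound, the `parse_args` name, the argparse layout, and the `markdown` value for `--format` are all assumptions. Only the `crawl4ai` dependency and the three flags come from this page.

```python
# Assumed Python bound below; only the crawl4ai dependency is documented.
# /// script
# requires-python = ">=3.10"
# dependencies = ["crawl4ai"]
# ///
import argparse


def parse_args(argv=None):
    """Parse the documented flags (hypothetical layout, not the real main.py)."""
    parser = argparse.ArgumentParser(description="Scrape a single web page.")
    parser.add_argument("url", help="page to scrape")
    parser.add_argument("--js", action="store_true", help="JS-aware scrape")
    parser.add_argument("--query", help="deterministic query-focused narrowing")
    # "markdown" as an explicit value is an assumption; markdown is the default.
    parser.add_argument("--format", choices=["markdown", "json"], default="markdown")
    return parser.parse_args(argv)


args = parse_args(["https://example.com", "--format", "json"])
print(args.url, args.format, args.js)
```

With a header like this, `uv run` resolves the listed dependencies into an isolated environment before executing the script, so no separate install step is needed for `crawl4ai`.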

Version history

2 versions in total

  • v0.1.1 (current)
    2026-05-28 12:55
  • v0.1.0
    2026-03-29 20:32 Safe Safe

Security checks

Tencent Cloud Security (Keen)

Queued

Tencent Cloud Security (Sanbu)

Queued
