
Web Scraper

Web scraping skill with JavaScript rendering support. Extract data from websites using CSS selectors, XPath, or AI-powered extraction.
jpengcheng523-netizen
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · Key: not required
Stars: 0 · Downloads: 522 · Installs: 0 · Versions: 1 · #latest

Overview

Extract data from websites with support for dynamic content.

When to Use

  • The user wants to scrape data from a website
  • Structured data needs to be extracted from HTML
  • The target pages are rendered with JavaScript
  • Multiple pages need to be crawled

Features

  • Static pages: Fast HTML parsing
  • Dynamic pages: Playwright/Puppeteer rendering
  • Selectors: CSS, XPath, regex
  • AI extraction: Auto-detect data patterns

Usage

Simple scrape

python3 scripts/scrape.py \
  --url "https://example.com/products" \
  --selector ".product-name" \
  --output ./products.json
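
To illustrate what a class selector such as `.product-name` picks out of a static page, here is a minimal sketch using only the Python standard library. It is not the code in `scripts/scrape.py`, which is not shown on this page; the names `ClassTextExtractor` and `select_class_text` are hypothetical, and only a single class name is supported.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text fragments of the current match
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def select_class_text(html, class_name):
    """Return the text of all elements whose class list contains class_name."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return parser.results
```

A real scraper would more likely rely on a full selector engine; the sketch only shows the matching idea for well-formed markup.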

With JavaScript rendering

python3 scripts/scrape.py \
  --url "https://spa-example.com/data" \
  --render \
  --wait 2000 \
  --selector ".data-item"
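
The rendering step could look roughly like the following sketch, which uses Playwright's synchronous Python API. It assumes that `--wait 2000` means a fixed pause of 2000 milliseconds after navigation, which this page does not confirm; the function name `render_and_select` is hypothetical, and the `playwright` package and a Chromium build must be installed separately.

```python
def render_and_select(url, selector, wait_ms=2000):
    """Load a page in headless Chromium, wait, and return matching element texts.

    Assumption: wait_ms mirrors the --wait flag and is a fixed pause in
    milliseconds; the actual script may wait differently.
    """
    # Imported lazily so the module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            page.wait_for_timeout(wait_ms)  # give client-side scripts time to run
            return page.locator(selector).all_inner_texts()
        finally:
            browser.close()
```

A fixed pause is simple but brittle; waiting for the target selector to appear is often a more reliable choice when the page's behavior is known.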

Extract multiple fields

python3 scripts/scrape.py \
  --url "https://example.com/listings" \
  --fields '{
    "title": "h1.title",
    "price": ".price",
    "description": ".desc"
  }'
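
The `--fields` value is a JSON object that maps each output field name to a selector. The sketch below shows one way such a mapping could be turned into a record. It is an illustration under assumptions, not the script's implementation: `extract_fields` and `FirstMatch` are hypothetical names, only `tag`, `.class`, and `tag.class` selectors are handled, and each field takes the first matching element.

```python
import json
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def parse_simple_selector(selector):
    """Split 'tag.class', '.class', or 'tag' into (tag, class); None is a wildcard."""
    tag, _, cls = selector.partition(".")
    return (tag or None, cls or None)


class FirstMatch(HTMLParser):
    """Capture the text of the first element matching a simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.tag, self.cls = parse_simple_selector(selector)
        self.depth = 0
        self.parts = []
        self.text = None  # stays None when nothing matches

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.text is not None:
            return
        if self.depth:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.tag in (None, tag) and (self.cls is None or self.cls in classes):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.text = "".join(self.parts).strip()

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_fields(html, fields_json):
    """Build one record from a JSON mapping of field name to selector."""
    record = {}
    for name, selector in json.loads(fields_json).items():
        parser = FirstMatch(selector)
        parser.feed(html)
        parser.close()
        record[name] = parser.text
    return record
```

Fields whose selector matches nothing come back as `None` in this sketch, so a caller can tell a missing element apart from an empty one.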

Crawl multiple pages

python3 scripts/scrape.py \
  --url "https://example.com/page/1" \
  --crawl 'a[href*="/page/"]' \
  --max-pages 10 \
  --selector ".item"
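
A crawl of this kind can be pictured as a breadth-first walk over links whose `href` contains `/page/`, stopped once the page limit is reached. The following standard-library sketch assumes that behavior; the traversal order used by `scripts/scrape.py` is not documented here. The names `crawl` and `LinkCollector` are hypothetical, and the caller supplies `fetch`, a function that returns the HTML for a URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags whose href contains a substring,
    mimicking the CSS selector a[href*="..."]."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href and self.needle in href:
            self.hrefs.append(href)


def crawl(start_url, fetch, needle="/page/", max_pages=10):
    """Visit pages breadth-first and return the URLs fetched, in order."""
    queue = deque([start_url])
    seen = {start_url}   # prevents revisiting pages that link to each other
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        collector = LinkCollector(needle)
        collector.feed(html)
        for href in collector.hrefs:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Passing `fetch` in as a parameter keeps the loop testable with in-memory pages and leaves networking, delays, and rendering to the caller.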

AI-powered extraction

python3 scripts/scrape.py \
  --url "https://example.com/article" \
  --ai-extract "Extract the title, author, and publication date"

Output

{
  "success": true,
  "url": "https://example.com/products",
  "items": [
    {"name": "Product 1", "price": "$99"},
    {"name": "Product 2", "price": "$149"}
  ],
  "scraped_at": "2024-01-15T10:30:00Z"
}
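
Downstream code might consume this output as in the sketch below, which checks the `success` flag, converts prices to numbers, and parses the timestamp. The function name `load_result` is hypothetical, and the sketch assumes prices always use a leading `$` as in the sample above.

```python
import json
from datetime import datetime
from decimal import Decimal

SAMPLE = """
{
  "success": true,
  "url": "https://example.com/products",
  "items": [
    {"name": "Product 1", "price": "$99"},
    {"name": "Product 2", "price": "$149"}
  ],
  "scraped_at": "2024-01-15T10:30:00Z"
}
"""


def load_result(text):
    """Parse scraper output into (items, scraped_at); raise if the run failed."""
    result = json.loads(text)
    if not result.get("success"):
        raise ValueError(f"scrape failed for {result.get('url')}")
    # Replace the trailing 'Z' so older Python versions can parse the timestamp.
    scraped_at = datetime.fromisoformat(result["scraped_at"].replace("Z", "+00:00"))
    items = [
        {"name": item["name"], "price": Decimal(item["price"].lstrip("$"))}
        for item in result["items"]
    ]
    return items, scraped_at


items, scraped_at = load_result(SAMPLE)
```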

Rate Limiting

  • Default delay: 1 second between requests
  • Respects robots.txt
  • Customizable user agent
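
These three behaviors can be approximated with the standard library alone, as in the sketch below. It shows the general technique rather than the skill's own code: `PoliteFetcher` and the `example-scraper` user agent are hypothetical, and the robots.txt text is passed in directly instead of being downloaded.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Check robots.txt rules and enforce a minimum delay between requests."""

    def __init__(self, robots_txt, user_agent="example-scraper", delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.rules = RobotFileParser()
        self.rules.parse(robots_txt.splitlines())
        self.user_agent = user_agent
        self.delay = delay
        self.clock = clock      # injectable for testing
        self.sleep = sleep
        self.last_request = None

    def allowed(self, url):
        """Return True if robots.txt permits this user agent to fetch url."""
        return self.rules.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Block until at least `delay` seconds have passed since the last call."""
        if self.last_request is not None:
            remaining = self.delay - (self.clock() - self.last_request)
            if remaining > 0:
                self.sleep(remaining)
        self.last_request = self.clock()
```

A caller would check `allowed(url)` and then call `wait_turn()` before each request, sending `user_agent` in the request headers.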

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-03 05:53 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
