Scrapling AI

Use Scrapling to scrape websites with adaptive parsing, Cloudflare bypass, and MCP support. It handles dynamic content and anti-bot detection, and provides clean HTML.
Author: nanpaidashi
Content creation · clawhub · v1.0.0 (#latest) · 1 version · 100000 · Key: not required
Stars: 0 · Downloads: 557 · Installs: 53 · Versions: 1

Overview

Scrapling Skill

Use the scrapling CLI to scrape websites with adaptive parsing and anti-bot bypass.

When to Use

USE this skill to:

  • Scrape static or dynamic websites
  • Bypass Cloudflare, CAPTCHA, or bot detection
  • Extract structured data (HTML/JSON) from web pages
  • Handle JavaScript-rendered content
  • Get clean HTML without extra scripts/CSS

When NOT to Use

DON'T use this skill for:

  • Simple HTTP requests → use web_fetch
  • Full browser automation → use browser tool
  • API-based data → use direct API calls
  • Local file processing → use file tools

Setup

# Install CLI
pipx install scrapling
scrapling --version

Common Commands

Basic Scrape

# Get clean HTML
scrapling https://example.com -o html

# Get JSON structure
scrapling https://example.com -o json

# Save to file
scrapling https://example.com -o output.html

With Headers/Timeouts

# Custom headers
scrapling https://example.com --headers "User-Agent: Mozilla/5.0"

# Timeout (seconds)
scrapling https://slow-site.com --timeout 30

Extract Specific Elements

# XPath extraction
scrapling https://example.com -e "//div[@class='content']" -o html

# CSS selector
scrapling https://example.com -e "div.content" -o html

JSON Output with Fields

# Extract title, meta description
scrapling https://example.com \
  --fields 'title,meta_description' \
  -o json

MCP Integration

Scrapling supports MCP (Model Context Protocol) for AI agents:

# Start MCP server
scrapling mcp start

Then configure your agent to use the scrape tool via MCP.

Examples

Scrape News Article

scrapling https://example.com/news/article-123 \
  --fields 'title,author,publish_date,content' \
  -o json

Extract Product Data

scrapling https://shop.example.com/products \
  -e "//div[@class='product']" \
  -o html

Handle Cloudflare

# Scrapling auto-bypasses most protections
scrapling https://protected-site.com -o html
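
Because the bypass covers most protections rather than all of them, a caller may want to retry a scrape that fails. The sketch below is a minimal retry wrapper with a doubling pause. It assumes the CLI exits with a non-zero status on failure, which this document does not confirm. `retry_scrape` and the `SCRAPLING_BIN` override are hypothetical names introduced for illustration, and the `-o html` flag is copied from the examples in this document.

```shell
#!/bin/sh
# Hypothetical helper: retry a scrape up to N times with a growing pause.
# Assumes the scrapling CLI returns a non-zero exit status when a page
# could not be fetched (not confirmed by this document).
# SCRAPLING_BIN lets you substitute another command, e.g. for a dry run.
retry_scrape() {
  url="$1"
  max_attempts="${2:-3}"
  wait_seconds="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "${SCRAPLING_BIN:-scrapling}" "$url" -o html; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed for $url" >&2
    # Pause only if another attempt will follow, doubling the wait each time.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$wait_seconds"
      wait_seconds=$((wait_seconds * 2))
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

A call such as `retry_scrape https://protected-site.com 3 2` would try up to three times, waiting 2 and then 4 seconds between attempts.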

Notes

  • Default timeout: 10 seconds
  • Auto-detects best output format (html/json/text)
  • Handles dynamic content via headless browser when needed
  • Be rate-limit friendly: add delays between requests
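
The advice to add delays between requests can be sketched as a small loop that scrapes one URL at a time and sleeps in between. This is an illustration under assumptions, not part of the tool: `scrape_with_delay` and `SCRAPLING_BIN` are hypothetical names, and the `--timeout` and `-o` flags are copied from the examples in this document, so check them against your installed version.

```shell
#!/bin/sh
# Hypothetical helper: scrape a list of URLs one at a time, pausing
# between requests. The --timeout and -o flags are copied from this
# document's examples and are not verified here.
# SCRAPLING_BIN lets you substitute another command, e.g. echo for a dry run.
scrape_with_delay() {
  delay_seconds="$1"
  shift
  first=1
  for url in "$@"; do
    # Sleep before every request except the first one.
    if [ "$first" -eq 0 ]; then
      sleep "$delay_seconds"
    fi
    first=0
    # Raise the timeout above the 10-second default and keep going on failure.
    "${SCRAPLING_BIN:-scrapling}" "$url" --timeout 30 -o html \
      || echo "failed: $url" >&2
  done
}
```

Setting `SCRAPLING_BIN=echo` before calling `scrape_with_delay 2 https://example.com/a https://example.com/b` prints the commands that would run instead of running them.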

JSON Output Format

{
  "title": "Page Title",
  "meta_description": "Description text",
  "content": "<clean HTML>",
  "links": ["http://...", "..."],
  "images": [{"src": "...", "alt": "..."}]
}
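
Once output in this shape is saved to a file, individual fields can be pulled out with a JSON tool. The sketch below assumes `jq` is installed and that the file matches the structure shown above; the function names are hypothetical.

```shell
#!/bin/sh
# Assumes jq is installed and that the file passed as $1 holds JSON in
# the shape shown above. Function names are hypothetical.

# Print the page title.
page_title() {
  jq -r '.title' "$1"
}

# Print one link per line.
page_links() {
  jq -r '.links[]' "$1"
}

# Print "src<TAB>alt" for each image.
page_images() {
  jq -r '.images[] | "\(.src)\t\(.alt)"' "$1"
}
```

With scraped JSON stored in a file such as `page.json` (a hypothetical name), `page_links page.json` prints each URL on its own line, which is convenient for piping into other tools.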

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-30 03:19, security status: safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk