Overview

Scrapling Skill

Use the scrapling CLI to scrape websites with adaptive parsing and anti-bot bypass.

When to Use

✅ USE this skill when you need to:

Scrape static or dynamic websites
Bypass Cloudflare, captcha, or bot detection
Extract structured data (HTML/JSON) from web pages
Handle JavaScript-rendered content
Get clean HTML without extra scripts/CSS

When NOT to Use

❌ DON'T use this skill for:

Simple HTTP requests → use web_fetch
Full browser automation → use the browser tool
API-based data → use direct API calls
Local file processing → use file tools

Setup

# Install CLI
pipx install scrapling
scrapling --version

Common Commands

Basic Scrape

# Get clean HTML
scrapling https://example.com -o html

# Get JSON structure
scrapling https://example.com -o json

# Save to file
scrapling https://example.com -o output.html

With Headers/Timeouts

# Custom headers
scrapling https://example.com --headers "User-Agent: Mozilla/5.0"

# Timeout (seconds)
scrapling https://slow-site.com --timeout 30

Extract Specific Elements

# XPath extraction
scrapling https://example.com -e "//div[@class='content']" -o html

# CSS selector
scrapling https://example.com -e "div.content" -o html
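The XPath and CSS selectors in the two commands above look interchangeable, but in general they are not: `//div[@class='content']` matches only when the class attribute is exactly `content`, while `div.content` matches any div whose class list includes `content`. The sketch below illustrates the difference with Python's standard library on a made-up HTML snippet. It does not call Scrapling, and how Scrapling itself evaluates `-e` expressions is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup used only for illustration.
html = """
<html><body>
  <div class="content">exact</div>
  <div class="content wide">multi-class</div>
  <div class="sidebar">other</div>
</body></html>
""".strip()

root = ET.fromstring(html)

# XPath-style match: the class attribute must equal "content" exactly.
xpath_hits = [div.text for div in root.findall(".//div[@class='content']")]

# CSS-style match (emulated): "content" must be one of the class tokens.
css_like_hits = [
    div.text
    for div in root.iter("div")
    if "content" in div.get("class", "").split()
]

print(xpath_hits)     # ['exact']
print(css_like_hits)  # ['exact', 'multi-class']
```

If the target elements carry more than one class, the CSS form (or an XPath using `contains(...)`) is usually the safer choice.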

JSON Output with Fields

# Extract title, meta description
scrapling https://example.com \
  --fields 'title,meta_description' \
  -o json

MCP Integration

Scrapling supports MCP (Model Context Protocol) for AI agents:

# Start MCP server
scrapling mcp start

Then configure your agent to use the scrape tool via MCP.

Examples

Scrape News Article

scrapling https://example.com/news/article-123 \
  --fields 'title,author,publish_date,content' \
  -o json

Extract Product Data

scrapling https://shop.example.com/products \
  -e "//div[@class='product']" \
  -o html

Handle Cloudflare

# Scrapling auto-bypasses most protections
scrapling https://protected-site.com -o html

Notes

Default timeout: 10 seconds
Auto-detects best output format (html/json/text)
Handles dynamic content via headless browser when needed
Be rate-limit friendly: add delays between requests
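The note about adding delays between requests can be sketched as a small Python wrapper around the CLI. This is a minimal sketch, not part of Scrapling: `build_command` and `scrape_politely` are hypothetical helper names, the command simply mirrors the `-o json` invocation shown in Basic Scrape, and the 2-second delay and 30-second subprocess timeout are arbitrary assumptions.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def build_command(url: str) -> List[str]:
    # Mirrors the "Get JSON structure" command from Basic Scrape.
    return ["scrapling", url, "-o", "json"]


def _run_cli(cmd: List[str]) -> str:
    # Assumed guard: stop waiting on the subprocess after 30 seconds.
    done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return done.stdout


def scrape_politely(
    urls: Sequence[str],
    delay_seconds: float = 2.0,
    run: Optional[Callable[[List[str]], str]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Scrape each URL in turn, pausing between consecutive requests."""
    runner = run if run is not None else _run_cli
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(runner(build_command(url)))
    return results
```

Calling `scrape_politely(["https://example.com", "https://example.com/about"])` would run the CLI twice with one pause in between. The `run` and `sleep` parameters exist so the loop can be exercised without network access.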

JSON Output Format

{
  "title": "Page Title",
  "meta_description": "Description text",
  "content": "<clean HTML>",
  "links": ["http://...", "..."],
  "images": [{"src": "...", "alt": "..."}]
}
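A result in this shape can be consumed with any JSON parser. The following is a minimal sketch, assuming the output contains exactly the keys shown above; the `summarize` helper is a hypothetical name, and real output may include other fields or omit some of these.

```python
import json

# Sample payload copied from the JSON Output Format above.
sample = """
{
  "title": "Page Title",
  "meta_description": "Description text",
  "content": "<clean HTML>",
  "links": ["http://...", "..."],
  "images": [{"src": "...", "alt": "..."}]
}
"""


def summarize(payload: str) -> dict:
    """Pull the title, link count, and image sources out of a scrape result."""
    data = json.loads(payload)
    return {
        "title": data.get("title", ""),
        "link_count": len(data.get("links", [])),
        "image_sources": [img.get("src", "") for img in data.get("images", [])],
    }


summary = summarize(sample)
print(summary)
```

Using `.get()` with defaults keeps the helper from failing when a key is missing from a particular page's output.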


Version History

1 version in total

v1.0.0 (current)

2026-03-30 03:19, security status: safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
