Scrapling - Stealth Web Scraper

Web scraping using Scrapling — a Python framework with anti-bot bypass (Cloudflare Turnstile, fingerprint spoofing), adaptive element tracking, stealth headl...
damirikys
Developer Tools clawhub v1.0.3 1 version 100000 Key: not required

Overview

Scrapling Skill

Source: https://github.com/D4Vinci/Scrapling (open source, BSD-3-Clause license)

PyPI: scrapling — install before first use (see below)

> ⚠️ Only scrape sites you have permission to access. Respect robots.txt and Terms of Service. Do not use stealth modes to bypass paywalls or access restricted content without authorization.

Installation (one-time, confirm with user before running)

pip install "scrapling[all]"  # quoted so shells such as zsh do not expand the brackets
patchright install chromium  # required for stealth/dynamic modes
  • scrapling[all] installs patchright (a stealth fork of Playwright, bundled as a PyPI package — not a typo), curl_cffi, MCP server deps, and IPython shell.
  • patchright install chromium downloads Chromium (~100 MB) via patchright's own installer (same mechanism as playwright install chromium).
  • Confirm with user before running — installs ~200 MB of dependencies and browser binaries.
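
Because the install is large and needs user confirmation, it can help to check first whether anything is actually missing. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and it assumes the import names match the package names `scrapling` and `patchright`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: both packages are importable under these names once installed.
    missing = missing_packages(["scrapling", "patchright"])
    if missing:
        print("Not installed:", ", ".join(missing))
        print('Ask the user before running: pip install "scrapling[all]"')
    else:
        print("scrapling and patchright are importable; no install needed.")
```

This only checks that the Python packages are importable. It does not verify that the Chromium binary has been downloaded.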

Script

scripts/scrape.py — CLI wrapper for all three fetcher modes.

# Basic fetch (text output)
python3 ~/skills/scrapling/scripts/scrape.py <url> -q

# CSS selector extraction
python3 ~/skills/scrapling/scripts/scrape.py <url> --selector ".class" -q

# Stealth mode (Cloudflare bypass) — only on sites you're authorized to access
python3 ~/skills/scrapling/scripts/scrape.py <url> --mode stealth -q

# JSON output
python3 ~/skills/scrapling/scripts/scrape.py <url> --selector "h2" --json -q
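
When the CLI has to be called from other Python code, the invocations above can be assembled programmatically. This is a sketch under stated assumptions: the helper names `build_command` and `run_scrape` are hypothetical, the script path is the one shown in the examples, and only the flags that appear above (`--selector`, `--mode`, `--json`, `-q`) are used.

```python
import subprocess
import sys
from pathlib import Path

# Script location as shown in the examples above.
SCRIPT = Path.home() / "skills" / "scrapling" / "scripts" / "scrape.py"


def build_command(url, mode=None, selector=None, as_json=False):
    """Build the argument list for scrape.py from the documented flags."""
    cmd = [sys.executable, str(SCRIPT), url]
    if selector:
        cmd += ["--selector", selector]
    if mode:
        cmd += ["--mode", mode]
    if as_json:
        cmd.append("--json")
    cmd.append("-q")  # passed in every example above
    return cmd


def run_scrape(url, **options):
    """Run the CLI and return its standard output as text."""
    result = subprocess.run(
        build_command(url, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with selectors such as `".class > a"`.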

Fetcher Modes

  • http (default) — Fast HTTP with browser TLS fingerprint spoofing. Most sites.
  • stealth — Headless Chrome with anti-detect. For Cloudflare/anti-bot.
  • dynamic — Full Playwright browser. For heavy JS SPAs.

When to Use Each Mode

  • web_fetch returns 403/429/Cloudflare challenge → use --mode stealth
  • Page content requires JS execution → use --mode dynamic
  • Regular site, just need text/data → use --mode http (default)
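
The decision rules above can be sketched as a small function. The name `choose_mode` and its parameters are hypothetical, and the precedence (a block or challenge wins over JS rendering) is an assumption rather than something the skill specifies.

```python
def choose_mode(status_code=None, challenge_seen=False, needs_js=False):
    """Pick a fetcher mode from the result of an earlier plain fetch.

    status_code:    HTTP status returned by the earlier attempt, if any.
    challenge_seen: True if the response was a Cloudflare challenge page.
    needs_js:       True if the content only appears after JS execution.
    """
    if status_code in (403, 429) or challenge_seen:
        return "stealth"
    if needs_js:
        return "dynamic"
    return "http"


if __name__ == "__main__":
    print(choose_mode(status_code=403))                 # stealth
    print(choose_mode(status_code=200, needs_js=True))  # dynamic
    print(choose_mode(status_code=200))                 # http
```

Starting with `http` and escalating only when needed keeps most requests on the fastest path and avoids launching a browser unnecessarily.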

Python Inline Usage

For custom logic beyond the CLI, write inline Python. See references/patterns.md for:

  • Adaptive scraping (auto_save / adaptive — saves element fingerprints locally)
  • Session/cookie handling
  • Async usage
  • XPath, find_similar, attribute extraction
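
As a rough illustration of inline usage in the default `http` mode, the sketch below fetches a page and extracts text with a CSS selector. It assumes a recent Scrapling release where `Fetcher` is importable from `scrapling.fetchers` and the response supports `.css()` with the `::text` pseudo-element; the helper names `fetch_page` and `extract_texts` are hypothetical, and references/patterns.md remains the place to look for the patterns listed above.

```python
def fetch_page(url):
    """Fetch `url` with Scrapling's plain HTTP fetcher."""
    # Imported lazily so the rest of this module works without scrapling.
    from scrapling.fetchers import Fetcher

    return Fetcher.get(url)


def extract_texts(page, selector):
    """Return the non-empty, stripped text matches for a CSS selector.

    `page` is any object with a `.css(selector)` method that returns an
    iterable of string-like matches, such as a Scrapling response.
    """
    texts = (str(node).strip() for node in page.css(selector))
    return [text for text in texts if text]


if __name__ == "__main__":
    # Only run this against a site you are authorized to access.
    page = fetch_page("https://example.com")
    print(page.status)
    for text in extract_texts(page, "h1::text"):
        print(text)
```

Keeping the extraction step separate from the fetch means the same helper can be reused with a page returned by the stealth or dynamic fetchers.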

Notes

  • MCP server (scrapling mcp): starts a local network service for AI-native scraping. Only start if explicitly needed and trusted — it exposes a local HTTP server.
  • auto_save=True: persists element fingerprints to disk for adaptive re-scraping. Creates local state in working directory.
  • Stealth/dynamic modes use Chromium headless — no xvfb-run needed.
  • For large-scale crawls, use the Spider API (see Scrapling docs).

Version history

1 version in total

  • v1.0.3 (current)
    2026-03-29 06:36 Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
