Scrape any website. Bypass any bot protection. Free.
pip install scrapling
Scrapling installs Playwright automatically on first run. That's the only dependency.
# Single URL — prints clean markdown to stdout
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com
# Spider the whole site
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --deep
# Deep crawl with limits, save and combine
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --deep --limit 30 --combine
# JSON output — pipe into anything
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --json
echo 'alias flowcrawl="python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py"' >> ~/.zshrc
source ~/.zshrc
Then just: flowcrawl https://example.com
FlowCrawl uses a 3-tier fetcher cascade. Starts fast, escalates only when blocked:
| Tier | Method | Handles |
|---|---|---|
| 1 | Plain HTTP | Most sites, instant |
| 2 | Stealth + TLS spoof | Cloudflare, Imperva, basic WAFs |
| 3 | Full JS execution | SPAs, heavy JS, aggressive bot detection |
Auto-detects blocking (403, 503, "Just a moment...") and escalates silently.
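The escalation logic can be sketched roughly as follows. This is a minimal illustration, not FlowCrawl's actual source: `Response`, `looks_blocked`, `fetch_with_escalation`, and the three stub fetchers are hypothetical names, and the stubs stand in for the real tiers so the example runs without network access. The blocking signals are only the ones named above (403, 503, and the "Just a moment..." interstitial); a real detector would likely check more.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class Response(NamedTuple):
    """Hypothetical minimal response: HTTP status code plus page body."""
    status: int
    body: str


# Blocking signals taken from the description above.
BLOCK_STATUSES = {403, 503}
BLOCK_MARKERS = ("Just a moment...",)

Fetcher = Callable[[str], Response]


def looks_blocked(resp: Response) -> bool:
    """Return True when the response matches a known blocking signal."""
    if resp.status in BLOCK_STATUSES:
        return True
    return any(marker in resp.body for marker in BLOCK_MARKERS)


def fetch_with_escalation(url: str, tiers: Sequence[Fetcher]) -> Tuple[int, Response]:
    """Try each tier in order; return (tier number, response) for the first
    response that does not look blocked. If every tier is blocked, return
    the last tier's response so the caller can decide what to do."""
    if not tiers:
        raise ValueError("at least one fetcher tier is required")
    for number, fetch in enumerate(tiers, start=1):
        resp = fetch(url)
        if not looks_blocked(resp):
            return number, resp
    return len(tiers), resp


# Stub fetchers standing in for the three tiers (assumptions for the demo).
def plain_http(url: str) -> Response:
    return Response(403, "Forbidden")


def stealth(url: str) -> Response:
    return Response(200, "<title>Just a moment...</title>")


def full_js(url: str) -> Response:
    return Response(200, "<h1>Example Domain</h1>")


tier, resp = fetch_with_escalation("https://example.com", [plain_http, stealth, full_js])
print(tier, resp.status)
```

In this demo the first two stubs return blocked responses, so the call falls through to the third tier. Keeping the tiers as an ordered list of callables means the cheap path is always tried first and the expensive browser tier only runs when needed.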
| Flag | Description | Default |
|---|---|---|
| `--deep` | Spider whole site following internal links | off |
| `--depth N` | Max hop depth from start URL | 3 |
| `--limit N` | Max pages to crawl | 50 |
| `--combine` | Merge all pages into one file | off |
| `--format md\|txt` | Output format | md |
| `--output DIR` | Output directory | ./flowcrawl-output |
| `--json` | Structured JSON output | off |
| `--quiet` | Suppress progress logs | off |
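For readers who want to see how the flags and defaults above fit together, here is a sketch of an equivalent `argparse` declaration. It is an assumption about how such a CLI could be wired up, not the contents of `flowcrawl.py`; the `build_parser` helper and the help strings are illustrative, and only the flag names and defaults come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="flowcrawl")
    parser.add_argument("url", help="Start URL")
    parser.add_argument("--deep", action="store_true",
                        help="Spider whole site following internal links")
    parser.add_argument("--depth", type=int, default=3, metavar="N",
                        help="Max hop depth from start URL")
    parser.add_argument("--limit", type=int, default=50, metavar="N",
                        help="Max pages to crawl")
    parser.add_argument("--combine", action="store_true",
                        help="Merge all pages into one file")
    parser.add_argument("--format", choices=["md", "txt"], default="md",
                        help="Output format")
    parser.add_argument("--output", default="./flowcrawl-output", metavar="DIR",
                        help="Output directory")
    parser.add_argument("--json", action="store_true",
                        help="Structured JSON output")
    parser.add_argument("--quiet", action="store_true",
                        help="Suppress progress logs")
    return parser


# Same arguments as the "deep crawl with limits" example earlier in this document.
args = build_parser().parse_args(
    ["https://example.com", "--deep", "--limit", "30", "--combine"]
)
print(args.deep, args.depth, args.limit, args.combine, args.format)
```

Flags that are not passed keep their documented defaults, so here `--depth` stays at 3 and `--format` stays at `md` while `--limit` is overridden to 30.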