Use this skill to crawl any JavaScript-rendered webpage using real Chrome browsers from a distributed worker pool. Unlike headless browser solutions (Puppeteer/Playwright), OpenCrawl requires zero local browser installation — ideal for VPS and cloud environments.
```
OPENCRAWL_API_KEY=ak_your_key_here
OPENCRAWL_API_URL=http://39.105.206.76:9877
```
If you prefer to run your own OpenCrawl server, see the full deployment guide:
https://github.com/hlyylly/OpenCrawl
Then set OPENCRAWL_API_URL to your own server address.
How it works: Your request → OpenCrawl server → dispatched to a real Chrome browser worker → page rendered with full JavaScript → content extracted → uploaded to Cloudflare R2 → download URL returned to you.
Errors: On failure the script writes a JSON error to stderr and exits with code 1.
Use this to get the full rendered text content of any webpage, including JavaScript-rendered content that simple HTTP requests cannot retrieve.
Command:
python3 {baseDir}/tools/crawl.py --url "https://example.com"
Examples:
# Crawl a full page
python3 {baseDir}/tools/crawl.py --url "https://www.smzdm.com/p/170177008/"
# Crawl with CSS selector to extract specific content
python3 {baseDir}/tools/crawl.py --url "https://example.com" --selector ".article-content"
# Output raw JSON response (includes downloadUrl)
python3 {baseDir}/tools/crawl.py --url "https://example.com" --raw
Optional flags:
--selector ".css-selector" — Extract only matching elements--mode lite — Lite mode: no images/CSS, faster, 0.1 credit (default: full)--raw — Output full JSON response instead of just the text content--timeout 60 — Custom timeout in seconds (default: 60)Use this to search the web using multiple search engines (DuckDuckGo + Google + Bing + Baidu) through real Chrome browsers. Returns structured results compatible with Brave Search API format.
Command:
python3 {baseDir}/tools/crawl.py --search "your search query"
Examples:
# Lite search — DuckDuckGo only (0.1 credit)
python3 {baseDir}/tools/crawl.py --search "python web scraping"
# Full search — 4 engines parallel (3 credits, 20-30 deduplicated results)
python3 {baseDir}/tools/crawl.py --search "python web scraping" --mode full
Use this to check how many credits remain on the API key.
Command:
python3 {baseDir}/tools/crawl.py --balance
Use this to check the OpenCrawl platform status — how many workers are online, tasks completed, etc.
Command:
python3 {baseDir}/tools/crawl.py --status
| Action | Argument | Example |
|---|---|---|
| -------- | ---------- | --------- |
| Crawl (full) | --url | python3 {baseDir}/tools/crawl.py --url "https://example.com" |
| Crawl (lite) | --url --mode lite | python3 {baseDir}/tools/crawl.py --url "https://example.com" --mode lite |
| Search (lite) | --search | python3 {baseDir}/tools/crawl.py --search "python tutorial" |
| Search (full) | --search --mode full | python3 {baseDir}/tools/crawl.py --search "python tutorial" --mode full |
| Check balance | --balance | python3 {baseDir}/tools/crawl.py --balance |
| Check status | --status | python3 {baseDir}/tools/crawl.py --status |
Output: Crawl → rendered page text (or JSON with --raw). Search → JSON with web.results[] (Brave compatible). Balance → JSON. Status → JSON.
Requirements: Python 3.8+, requests library. No browser installation needed.
共 1 个版本