
crawl

Crawl any JavaScript-rendered webpage through distributed real Chrome browsers. No local browser needed — perfect for headless VPS.
Author: hlyylly
Category: uncategorized · Source: clawhub · Version: v1.0.2 · API key: required

Overview

OpenCrawl Skill

Use this skill to crawl any JavaScript-rendered webpage using real Chrome browsers from a distributed worker pool. Unlike headless browser solutions (Puppeteer/Playwright), OpenCrawl requires zero local browser installation — ideal for VPS and cloud environments.

Quick Start (use our public server)

  1. Visit http://39.105.206.76:9877 and click "Register" to get a free API Key (100 credits included)
  2. Set environment variables:

```
OPENCRAWL_API_KEY=ak_your_key_here
OPENCRAWL_API_URL=http://39.105.206.76:9877
```

  3. Start crawling!

Self-hosted (deploy your own server)

If you prefer to run your own OpenCrawl server, see the full deployment guide:

https://github.com/hlyylly/OpenCrawl

Then set OPENCRAWL_API_URL to your own server address.
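
Whichever server you use, it can help to check both variables before running anything. The following is a minimal pre-flight sketch, not part of the skill itself: `OpenCrawlConfig` and `load_config` are hypothetical names, and the only facts taken from this page are the two variable names.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class OpenCrawlConfig:
    """Hypothetical container for the two documented settings."""
    api_key: str
    api_url: str


def load_config(env=None):
    """Read OPENCRAWL_API_KEY / OPENCRAWL_API_URL and fail early if unusable."""
    env = os.environ if env is None else env
    api_key = env.get("OPENCRAWL_API_KEY", "").strip()
    # Drop a trailing slash so paths can be appended consistently later.
    api_url = env.get("OPENCRAWL_API_URL", "").strip().rstrip("/")
    if not api_key:
        raise ValueError("OPENCRAWL_API_KEY is not set")
    parsed = urlparse(api_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("OPENCRAWL_API_URL must be an http(s) URL")
    return OpenCrawlConfig(api_key=api_key, api_url=api_url)
```

Calling `load_config()` with no argument reads the real environment; passing a plain dict makes the check easy to exercise in isolation.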


How it works: Your request → OpenCrawl server → dispatched to a real Chrome browser worker → page rendered with full JavaScript → content extracted → uploaded to Cloudflare R2 → download URL returned to you.

Errors: On failure the script writes a JSON error to stderr and exits with code 1.
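
A caller that wraps the script can rely on that contract. The sketch below assumes only what is stated above (non-zero exit code, JSON on stderr); `run_tool` and `CrawlError` are hypothetical names, and the `error` field in the demo is made up because the actual error schema is not documented here.

```python
import json
import subprocess
import sys


class CrawlError(RuntimeError):
    """Raised when the wrapped command exits with a non-zero status."""

    def __init__(self, returncode, detail):
        super().__init__(f"command failed with exit code {returncode}: {detail}")
        self.returncode = returncode
        self.detail = detail


def run_tool(argv, timeout=120):
    """Run a command; return stdout on success, raise CrawlError on failure."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        try:
            detail = json.loads(proc.stderr)
        except json.JSONDecodeError:
            # stderr was not valid JSON; keep the raw text instead.
            detail = {"raw": proc.stderr.strip()}
        raise CrawlError(proc.returncode, detail)
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for crawl.py: writes a JSON error to stderr and exits with code 1.
    fake = [sys.executable, "-c",
            "import sys, json; sys.stderr.write(json.dumps({'error': 'demo'})); sys.exit(1)"]
    try:
        run_tool(fake)
    except CrawlError as exc:
        print(exc.returncode, exc.detail)
```

In real use, `argv` would be the crawl.py command line shown in the Tools section.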


Tools

1. Crawl Page

Use this to get the full rendered text content of any webpage, including JavaScript-rendered content that simple HTTP requests cannot retrieve.

Command:

python3 {baseDir}/tools/crawl.py --url "https://example.com"

Examples:

# Crawl a full page
python3 {baseDir}/tools/crawl.py --url "https://www.smzdm.com/p/170177008/"

# Crawl with CSS selector to extract specific content
python3 {baseDir}/tools/crawl.py --url "https://example.com" --selector ".article-content"

# Output raw JSON response (includes downloadUrl)
python3 {baseDir}/tools/crawl.py --url "https://example.com" --raw

Optional flags:

  • --selector ".css-selector" — Extract only matching elements
  • --mode lite — Lite mode: no images/CSS, faster, 0.1 credit (default: full)
  • --raw — Output full JSON response instead of just the text content
  • --timeout 60 — Custom timeout in seconds (default: 60)
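
When calling the tool from another program, the flags above can be assembled into an argument list rather than a shell string, which avoids quoting problems with URLs and selectors. This is a sketch only: `build_crawl_command` is a hypothetical helper, and it uses the current interpreter (`sys.executable`) in place of `python3`.

```python
import sys
from pathlib import Path


def build_crawl_command(base_dir, url, selector=None, mode=None, raw=False, timeout=None):
    """Assemble the argv list for tools/crawl.py from the documented flags."""
    if mode not in (None, "lite", "full"):
        raise ValueError("mode must be 'lite' or 'full'")
    argv = [sys.executable, str(Path(base_dir) / "tools" / "crawl.py"), "--url", url]
    if selector:
        argv += ["--selector", selector]
    if mode:
        argv += ["--mode", mode]
    if raw:
        argv.append("--raw")
    if timeout is not None:
        argv += ["--timeout", str(int(timeout))]
    return argv
```

The returned list can be passed directly to `subprocess.run`; omitted options fall back to the script's own defaults.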

2. Search (Brave Search API Compatible)

Use this to search the web through real Chrome browsers. The default lite mode queries DuckDuckGo only; --mode full queries four engines (DuckDuckGo, Google, Bing, Baidu) in parallel and returns deduplicated results. Results are structured to be compatible with the Brave Search API format.

Command:

python3 {baseDir}/tools/crawl.py --search "your search query"

Examples:

# Lite search — DuckDuckGo only (0.1 credit)
python3 {baseDir}/tools/crawl.py --search "python web scraping"

# Full search — 4 engines parallel (3 credits, 20-30 deduplicated results)
python3 {baseDir}/tools/crawl.py --search "python web scraping" --mode full

3. Check Balance

Use this to check how many credits remain on the API key.

Command:

python3 {baseDir}/tools/crawl.py --balance

4. Check Status

Use this to check the OpenCrawl platform status — how many workers are online, tasks completed, etc.

Command:

python3 {baseDir}/tools/crawl.py --status

Summary

| Action | Argument | Example |
|---|---|---|
| Crawl (full) | --url | python3 {baseDir}/tools/crawl.py --url "https://example.com" |
| Crawl (lite) | --url --mode lite | python3 {baseDir}/tools/crawl.py --url "https://example.com" --mode lite |
| Search (lite) | --search | python3 {baseDir}/tools/crawl.py --search "python tutorial" |
| Search (full) | --search --mode full | python3 {baseDir}/tools/crawl.py --search "python tutorial" --mode full |
| Check balance | --balance | python3 {baseDir}/tools/crawl.py --balance |
| Check status | --status | python3 {baseDir}/tools/crawl.py --status |

Output: Crawl → rendered page text (or JSON with --raw). Search → JSON with web.results[] (Brave compatible). Balance → JSON. Status → JSON.
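
As a rough illustration of consuming these outputs, the sketch below pulls the useful fields out of the JSON. Several things are assumptions: the `title`, `url` and `description` keys follow the Brave Search API's web result shape, `downloadUrl` is assumed to sit at the top level of the --raw response, and both function names are hypothetical.

```python
import json


def extract_search_results(payload):
    """Return (title, url, description) tuples from a Brave-style search response."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    results = data.get("web", {}).get("results", [])
    return [(r.get("title", ""), r.get("url", ""), r.get("description", "")) for r in results]


def extract_download_url(payload):
    """Return the downloadUrl from a --raw crawl response, or None if absent."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    # Assumption: downloadUrl is a top-level key of the raw response.
    return data.get("downloadUrl")
```

Both helpers accept either the raw stdout string or an already-parsed dict, and return empty values instead of raising when a key is missing.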

Requirements: Python 3.8+, requests library. No browser installation needed.

Version history

1 version in total

  • v1.0.2 (current)
    2026-05-02 02:25, security status: safe

Security checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
