
GetMarkdown

Scrape a single page or crawl a full website using WebCrawlerAPI. Trigger for: fetching page content, getting markdown from a URL, scraping a page, crawling...
By n10ty (source)
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · Key: required
★ 0 stars · 📥 365 downloads · 💾 0 installs · #latest

Overview

WebCrawlerAPI Skill

Use WebCrawlerAPI to get page content as markdown (single page scrape) or crawl entire websites (multi-page).

Setup — API Key

The API key must be set as an environment variable before running any curl commands:

export WEBCRAWLERAPI_API_KEY="your_api_key"

Get your key:

  1. Go to https://webcrawlerapi.com/
  2. Sign up at https://dash.webcrawlerapi.com/
  3. Visit https://dash.webcrawlerapi.com/access
  4. Copy your API key

If WEBCRAWLERAPI_API_KEY is not set, stop and ask the user to set it before proceeding.
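A minimal guard for this check, assuming a POSIX-compatible shell such as bash. require_api_key is a hypothetical helper name, not part of WebCrawlerAPI:

```shell
# Hypothetical helper: fail early if the API key is missing or empty.
require_api_key() {
  if [ -z "${WEBCRAWLERAPI_API_KEY:-}" ]; then
    echo "WEBCRAWLERAPI_API_KEY is not set. Run: export WEBCRAWLERAPI_API_KEY=\"your_api_key\"" >&2
    return 1
  fi
  return 0
}
```

Calling require_api_key || exit 1 at the top of a script stops it before any request is sent.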

Decision: Scrape vs Crawl

Scrape (single page) — default when user asks for:

  • Content/markdown of a specific page or URL
  • "Get me this page", "scrape this URL", "what does this page say"
  • No mention of "website", "full site", "all pages", "crawl"

Crawl (multi-page) — when user asks for:

  • "Crawl this website", "get all pages from", "full website content"
  • Mentions a domain broadly (not a specific path)
  • Wants multiple pages
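The keyword signals above can be approximated as a rough shell heuristic. suggest_mode is a hypothetical helper: it only checks the listed phrases (case-insensitively) and falls back to scrape, so it can misfire (for example, "scrape this website's pricing page" contains "website") and deciding whether a request names a whole domain or a specific page still needs judgment:

```shell
# Hypothetical heuristic: print "crawl" if the request text contains one of the
# crawl phrases listed above, otherwise "scrape" (the default).
suggest_mode() {
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *crawl*|*website*|*"full site"*|*"all pages"*) echo "crawl" ;;
    *) echo "scrape" ;;
  esac
}
```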

Scrape — Single Page

Use POST /v2/scrape. Synchronous — result is returned immediately.

curl --fail --silent --show-error \
  --request POST \
  --url "https://api.webcrawlerapi.com/v2/scrape" \
  --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
  --header "Content-Type: application/json" \
  --data '{
    "url": "<URL>",
    "output_formats": ["markdown"]
  }'
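Pasting a URL into the single-quoted JSON by hand breaks if the URL contains quotes or backslashes. A minimal sketch that builds the body with a JSON encoder instead, assuming python3 is available (the polling loop in this skill already relies on it). build_scrape_body and scrape_url are hypothetical helper names; scrape_url sends the same request as the curl command above:

```shell
# Hypothetical helper: print a correctly escaped JSON body for /v2/scrape.
build_scrape_body() {
  python3 -c 'import json, sys; print(json.dumps({"url": sys.argv[1], "output_formats": ["markdown"]}))' "$1"
}

# Hypothetical helper: scrape one URL and print the raw JSON response.
scrape_url() {
  curl --fail --silent --show-error \
    --request POST \
    --url "https://api.webcrawlerapi.com/v2/scrape" \
    --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
    --header "Content-Type: application/json" \
    --data "$(build_scrape_body "$1")"
}
```

Usage: scrape_url "https://example.com/docs"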

Scrape Response

The response contains the markdown field directly, so no polling is needed:

{
  "success": true,
  "status": "done",
  "markdown": "## Page Title\n\nPage content...",
  "page_status_code": 200,
  "page_title": "Page Title"
}

On success

Output the markdown content directly to the user. Scrape results do not need to be saved to files.

On failure

If success is false, show the error_code and error_message to the user.
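The success and failure handling can be sketched as a filter over the scrape response, again assuming python3. handle_scrape_response is a hypothetical name, and it reads only the fields named in this section (success, markdown, error_code, error_message):

```shell
# Hypothetical filter: read the scrape response JSON on stdin.
# On success print the markdown; otherwise print the error to stderr and exit 1.
handle_scrape_response() {
  python3 -c '
import json, sys
resp = json.load(sys.stdin)
if resp.get("success"):
    print(resp.get("markdown", ""))
else:
    code = resp.get("error_code", "unknown")
    message = resp.get("error_message", "")
    print("Scrape failed: " + str(code) + ": " + str(message), file=sys.stderr)
    sys.exit(1)
'
}
```

Piping the output of the scrape curl command above into handle_scrape_response prints either the page markdown or a one-line error.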


Crawl — Full Website

Use POST /v1/crawl. Asynchronous — returns a job ID, then poll for results.

Step 1: Start the crawl

curl --fail --silent --show-error \
  --request POST \
  --url "https://api.webcrawlerapi.com/v1/crawl" \
  --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
  --header "Content-Type: application/json" \
  --data '{
    "url": "<URL>",
    "items_limit": 25,
    "output_formats": ["markdown"]
  }'

Response:

{ "id": "<JOB_ID>" }
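To capture the job ID for the polling step, the response can be parsed with python3. A sketch under that assumption; start_crawl and extract_job_id are hypothetical helpers, and items_limit defaults to the 25 suggested in this skill:

```shell
# Hypothetical helper: read the crawl response on stdin and print its "id".
extract_job_id() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])'
}

# Hypothetical helper: start a crawl and print the job ID.
# $1: site URL, $2: optional items_limit (default 25).
start_crawl() {
  body=$(python3 -c 'import json, sys; print(json.dumps({"url": sys.argv[1], "items_limit": int(sys.argv[2]), "output_formats": ["markdown"]}))' "$1" "${2:-25}")
  curl --fail --silent --show-error \
    --request POST \
    --url "https://api.webcrawlerapi.com/v1/crawl" \
    --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
    --header "Content-Type: application/json" \
    --data "$body" | extract_job_id
}
```

Usage: JOB_ID=$(start_crawl "https://example.com" 25)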

Step 2: Poll job status (background loop)

Use a background Bash job to poll every 10 seconds until the status is done or error:

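JOB_ID="<JOB_ID>"
while true; do
  # Fetch the job; stop instead of looping forever if the request itself fails.
  if ! RESULT=$(curl --fail --silent --show-error \
      --request GET \
      --url "https://api.webcrawlerapi.com/v1/job/${JOB_ID}" \
      --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}"); then
    echo "Failed to fetch job ${JOB_ID}" >&2
    break
  fi
  # Extract the job-level status field from the JSON response.
  STATUS=$(printf '%s' "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
  echo "Job status: $STATUS"
  if [ "$STATUS" = "done" ] || [ "$STATUS" = "error" ]; then
    printf '%s\n' "$RESULT"
    break
  fi
  sleep 10
done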

Step 3: Download and save results

When the job is done, for each job_item with status done:

  • Fetch the content from markdown_content_url
  • Save it to .webcrawlerapi/<hostname>/<sanitized-filename>.md
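mkdir -p ".webcrawlerapi/<hostname>"

# For each job_item with status done, fetch its markdown_content_url and save it.
# --fail makes curl exit non-zero instead of saving an HTTP error page as markdown.
curl --fail --silent --show-error "<markdown_content_url>" \
  --output ".webcrawlerapi/<hostname>/<sanitized-filename>.md"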

Sanitize filenames: replace ://, /, ?, #, : with _. Trim leading underscores.
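Step 3 can be scripted end to end. This is a sketch under assumptions not stated in this skill: the job response lists its items under a job_items key, each item's page URL is in original_url, and the filename is derived from that page URL. All function names are hypothetical; python3 and sed do the parsing:

```shell
# Apply the sanitize rule: "://", "/", "?", "#", ":" -> "_", then trim leading "_".
sanitize_filename() {
  printf '%s' "$1" | sed -e 's|://|_|g' -e 's|[/?#:]|_|g' -e 's|^_*||'
}

# Print the hostname of a URL, used as the output directory name.
url_hostname() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.urlparse(sys.argv[1]).hostname)' "$1"
}

# Read job JSON on stdin; print "page_url<TAB>markdown_content_url" for done items.
# ASSUMPTION: items live under "job_items" and the page URL under "original_url".
list_done_items() {
  python3 -c '
import json, sys
for item in json.load(sys.stdin).get("job_items", []):
    if item.get("status") == "done":
        print(item.get("original_url", "") + "\t" + item.get("markdown_content_url", ""))
'
}

# Read job JSON on stdin and download each done item into .webcrawlerapi/<host>/.
# $1: the URL that was crawled.
download_items() {
  dir=".webcrawlerapi/$(url_hostname "$1")"
  mkdir -p "$dir"
  tab=$(printf '\t')
  list_done_items | while IFS="$tab" read -r page_url content_url; do
    [ -n "$content_url" ] || continue
    curl --fail --silent --show-error "$content_url" \
      --output "$dir/$(sanitize_filename "$page_url").md"
  done
}
```

Usage: printf '%s\n' "$RESULT" | download_items "https://example.com", where RESULT holds the final job JSON.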

Step 4: Report to user

After saving, tell the user:

  • Total pages crawled
  • How many succeeded vs failed
  • Where files were saved: .webcrawlerapi/<hostname>/
  • List the saved files
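The counts for this report can be computed from the final job JSON. A sketch assuming the items are listed under a job_items key (an assumption; this skill only calls them job items). summarize_job is a hypothetical name, and it counts any item whose status is not done as failed:

```shell
# Hypothetical helper: read final job JSON on stdin and print report counts.
# ASSUMPTION: items are under "job_items"; any status other than "done" is a failure.
summarize_job() {
  python3 -c '
import json, sys
items = json.load(sys.stdin).get("job_items", [])
done = sum(1 for item in items if item.get("status") == "done")
print("Total pages crawled:", len(items))
print("Succeeded:", done)
print("Failed:", len(items) - done)
'
}
```

The saved files themselves can then be listed with ls on the output directory.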

Notes

  • Default items_limit for crawl: 25 (ask user if they want more)
  • For scrape, just output the markdown — don't save to disk
  • For crawl, always save to .webcrawlerapi/ directory in current working dir
  • If the job returns an error status, show last_error from the job items and the job-level error, if present
  • Never hardcode the API key — always use ${WEBCRAWLERAPI_API_KEY}

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-21 14:04 · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related recommendations

data-analysis

AdMapix

fly0pants
AdMapix raw data layer, providing ad creatives, apps, rankings, downloads/revenue, and market metadata. Returns structured JSON from the AdMapix API; callers...
★ 297 📥 141,013
data-analysis

Stock Watcher

robin797860
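Manage and monitor a personal stock watchlist, with support for adding, removing, and listing stocks and summarizing recent performance using Tonghuashun (同花顺) data. Use when the user wants to track specific stocks, get performance summaries, or manage a watchlist.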
★ 112 📥 46,292
data-analysis

Data Analysis

ivangdavila
Data analysis and visualization. Query databases, generate reports, automate spreadsheets, and turn raw data into clear, actionable insights. Use for: (1) you…
★ 210 📥 68,802