Overview

WebCrawlerAPI Skill

Use WebCrawlerAPI to get page content as markdown (single page scrape) or crawl entire websites (multi-page).

Setup — API Key

The API key must be set as an environment variable before running any curl commands:

export WEBCRAWLERAPI_API_KEY="your_api_key"

Get your key:

Go to https://webcrawlerapi.com/
Sign up at https://dash.webcrawlerapi.com/
Visit https://dash.webcrawlerapi.com/access
Copy your API key

If WEBCRAWLERAPI_API_KEY is not set, stop and ask the user to set it before proceeding.
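
The check described above could be sketched as a small guard function. This is only an illustration: the function name require_api_key is hypothetical and not part of WebCrawlerAPI.

```shell
# Hypothetical helper: return non-zero (with a message on stderr)
# when the key is missing or empty.
require_api_key() {
  if [ -z "${WEBCRAWLERAPI_API_KEY:-}" ]; then
    echo "WEBCRAWLERAPI_API_KEY is not set. Export it before continuing." >&2
    return 1
  fi
}

# Example: guard a script before any request is made.
# require_api_key || exit 1
```

Using ${WEBCRAWLERAPI_API_KEY:-} keeps the test safe even when the shell runs with set -u.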

Decision: Scrape vs Crawl

Scrape (single page) — default when user asks for:

Content/markdown of a specific page or URL
"Get me this page", "scrape this URL", "what does this page say"
No mention of "website", "full site", "all pages", "crawl"

Crawl (multi-page) — when user asks for:

"Crawl this website", "get all pages from", "full website content"
Mentions a domain broadly (not a specific path)
Wants multiple pages
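
As a rough illustration of the rule above, the keyword test can be written as a shell function. The name choose_mode is hypothetical, and plain keyword matching is cruder than the judgment the rule asks for: a URL that itself contains "crawl" or "website" would be misclassified, so treat this as a sketch only.

```shell
# Hypothetical helper: print "crawl" when the request mentions one of the
# crawl keywords listed above, otherwise print "scrape" (the default).
choose_mode() {
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *crawl*|*website*|*"full site"*|*"all pages"*) echo "crawl" ;;
    *) echo "scrape" ;;
  esac
}

# Example:
# choose_mode "Get all pages from example.com"   # prints: crawl
# choose_mode "What does this page say?"         # prints: scrape
```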

Scrape — Single Page

Use POST /v2/scrape. Synchronous — result is returned immediately.

curl --fail --silent --show-error \
  --request POST \
  --url "https://api.webcrawlerapi.com/v2/scrape" \
  --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
  --header "Content-Type: application/json" \
  --data '{
    "url": "<URL>",
    "output_formats": ["markdown"]
  }'

Scrape Response

The response contains the markdown field directly, so no polling is needed:

{
  "success": true,
  "status": "done",
  "markdown": "## Page Title\n\nPage content...",
  "page_status_code": 200,
  "page_title": "Page Title"
}

On success

Output the markdown content directly to the user. No need to save to files for scrape.

On failure

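If success is false, show the error_code and error_message to the user. Because the request uses curl --fail, an HTTP error response (4xx or 5xx) makes curl exit non-zero without printing the body; in that case report curl's own error message instead.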
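
The success and failure handling can be sketched as one function that reads the response body on stdin. The field names (success, markdown, error_code, error_message) are the ones mentioned above; the function name handle_scrape_response is hypothetical, and the sketch assumes the body is valid JSON.

```shell
# Hypothetical helper: print the markdown on success; on failure print the
# error fields to stderr and return non-zero.
handle_scrape_response() {
  python3 -c '
import json, sys
resp = json.load(sys.stdin)
if resp.get("success"):
    print(resp.get("markdown", ""))
else:
    print("Scrape failed: {} - {}".format(
        resp.get("error_code"), resp.get("error_message")), file=sys.stderr)
    sys.exit(1)
'
}

# Example: pipe the body of the scrape request into the helper.
# printf '%s' "$RESPONSE" | handle_scrape_response
```

Here RESPONSE is an assumed variable holding the output of the scrape request.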

Crawl — Full Website

Use POST /v1/crawl. Asynchronous: the request returns a job ID, which is then polled for results.

Step 1: Start the crawl

curl --fail --silent --show-error \
  --request POST \
  --url "https://api.webcrawlerapi.com/v1/crawl" \
  --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
  --header "Content-Type: application/json" \
  --data '{
    "url": "<URL>",
    "items_limit": 25,
    "output_formats": ["markdown"]
  }'

Response:

{ "id": "<JOB_ID>" }
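
To carry the job ID into the next step without copying it by hand, the id field can be pulled out of the response. A minimal sketch, assuming the body is the JSON object shown above; the function name extract_job_id and the RESPONSE variable are hypothetical.

```shell
# Hypothetical helper: read the crawl response on stdin and print its id field.
extract_job_id() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])'
}

# Example: RESPONSE holds the body returned by the Step 1 request.
# JOB_ID=$(printf '%s' "$RESPONSE" | extract_job_id)
```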

Step 2: Poll job status (background loop)

Use a background Bash job to poll every 10 seconds until status is done or error:

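JOB_ID="<JOB_ID>"
while true; do
  # Fetch the job; stop polling if the request fails (network error or non-2xx response)
  if ! RESULT=$(curl --fail --silent --show-error \
    --request GET \
    --url "https://api.webcrawlerapi.com/v1/job/${JOB_ID}" \
    --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}"); then
    echo "Request failed; stopping." >&2
    break
  fi
  # Extract the status field; an empty or unparsable body leaves STATUS empty
  STATUS=$(printf '%s' "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])" 2>/dev/null)
  echo "Job status: ${STATUS:-unknown}"
  if [ "$STATUS" = "done" ] || [ "$STATUS" = "error" ]; then
    printf '%s\n' "$RESULT"
    break
  fi
  # Without a readable status the loop could never terminate, so give up
  if [ -z "$STATUS" ]; then
    echo "Could not read job status; stopping." >&2
    break
  fi
  sleep 10
done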

Step 3: Download and save results

When the job is done, for each job_item whose status is done:

Fetch the content from markdown_content_url
Save it to .webcrawlerapi/<hostname>/<sanitized-filename>.md

mkdir -p ".webcrawlerapi/<hostname>"

# For each job_item, fetch markdown_content_url and save it;
# --fail keeps an HTTP error body from being saved as page content
curl --fail --silent --show-error "<markdown_content_url>" \
  --output ".webcrawlerapi/<hostname>/<sanitized-filename>.md"
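
Selecting the items to download can be sketched as a filter over the job JSON. The per-item fields status and markdown_content_url come from the text above; the top-level key job_items and the function name list_done_markdown_urls are assumptions made for this example.

```shell
# Hypothetical helper: read the job JSON on stdin and print one
# markdown_content_url per line for every item whose status is "done".
# Assumes the items live under a top-level "job_items" array.
list_done_markdown_urls() {
  python3 -c '
import json, sys
job = json.load(sys.stdin)
for item in job.get("job_items", []):
    if item.get("status") == "done" and item.get("markdown_content_url"):
        print(item["markdown_content_url"])
'
}

# Example: RESULT is the final body captured by the polling loop.
# printf '%s' "$RESULT" | list_done_markdown_urls
```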

Sanitize filenames: replace ://, /, ?, #, : with _. Trim leading underscores.
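
The sanitizing rule can be expressed with sed. A minimal sketch; the function name sanitize_filename is hypothetical.

```shell
# Hypothetical helper: replace "://", "/", "?", "#" and ":" with "_",
# then strip any leading underscores.
sanitize_filename() {
  printf '%s' "$1" | sed -e 's|://|_|g' -e 's|[/?#:]|_|g' -e 's|^_*||'
}

# Example:
# sanitize_filename 'https://example.com/docs/intro?x=1#top'
#   prints: https_example.com_docs_intro_x=1_top
# sanitize_filename '/pricing'
#   prints: pricing
```

Replacing "://" before the single characters keeps the scheme separator from turning into three underscores.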

Step 4: Report to user

After saving, tell the user:

Total pages crawled
How many succeeded vs failed
Where files were saved: .webcrawlerapi/<hostname>/
List the saved files
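
The counts for this report can be derived from the finished job JSON. A sketch under the same assumption as before, namely that items sit under a top-level job_items array; the function name summarize_job is hypothetical, and every item whose status is not done is counted as failed.

```shell
# Hypothetical helper: read the finished job JSON on stdin and print
# the totals for the report. Items not marked "done" count as failed.
summarize_job() {
  python3 -c '
import json, sys
items = json.load(sys.stdin).get("job_items", [])
done = sum(1 for item in items if item.get("status") == "done")
print("Total pages crawled: {}".format(len(items)))
print("Succeeded: {}, failed: {}".format(done, len(items) - done))
'
}

# Example: RESULT is the final body captured by the polling loop.
# printf '%s' "$RESULT" | summarize_job
```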

Notes

Default items_limit for crawl: 25 (ask user if they want more)
For scrape, just output the markdown — don't save to disk
For crawl, always save to .webcrawlerapi/ directory in current working dir
If the job returns error status, show last_error from job items and the job-level error if present
Never hardcode the API key — always use ${WEBCRAWLERAPI_API_KEY}

Version History

1 version in total

v1.0.0 (current)

2026-05-21 14:04
