markdown.new/crawl

Overview

Use markdown.new/crawl for multi-page crawling and async Markdown generation.

Prefer local API calls with curl (or any suitable alternative tool).
Assume no authentication is required unless the service behavior changes.
Start with POST /crawl to get a job ID.
Poll GET /crawl/status/{jobId} until crawl completes.
Default to Markdown output; use ?format=json only when structured output is needed.
Keep crawl scope explicit (limit, depth, subdomain/external toggles, include/exclude patterns).
State default crawl behavior: same-domain links only, up to 500 pages per job.
If /crawl fails (network, timeout, blocked host), fall back to another method and state the fallback.

curl -X POST "https://markdown.new/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","limit":50}'

curl "https://markdown.new/crawl/status/<jobId>"

curl "https://markdown.new/crawl/status/<jobId>?format=json"

curl "https://markdown.new/crawl/https://docs.example.com"

curl -X DELETE "https://markdown.new/crawl/status/<jobId>"
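
The fallback guideline above can be sketched as a small wrapper around the POST call. This is a minimal sketch, not part of the service: the function name `submit_crawl`, the 30-second timeout, and the wording of the message are assumptions chosen for illustration, and the URL is interpolated into the JSON body without escaping, so it only suits simple URLs.

```shell
# Sketch: submit a crawl job; on any failure, say so explicitly so the
# caller can switch to another method. Names here are hypothetical.
submit_crawl() {
  local url="$1"
  local limit="${2:-50}"   # keep the scope explicit and small by default

  # -f: non-zero exit on HTTP errors; -sS: quiet, but still print errors;
  # --max-time: bound the whole request (30 s is an assumed value).
  if curl -fsS --max-time 30 -X POST "https://markdown.new/crawl" \
       -H "Content-Type: application/json" \
       -d "{\"url\":\"${url}\",\"limit\":${limit}}"; then
    return 0
  fi

  # State the fallback rather than failing silently.
  echo "markdown.new/crawl failed for ${url}; falling back to another method" >&2
  return 1
}
```

A call such as `submit_crawl "https://docs.example.com" 50` prints the service response on success; on failure it returns 1 and leaves the choice of fallback to the caller.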

POST /crawl: create async crawl job and return job ID.
GET /crawl/status/{jobId}: return crawl output and status.
DELETE /crawl/status/{jobId}: cancel a running crawl; completed pages remain available.
GET /crawl/{url}: browser-style shortcut that starts crawl and returns tracking page.

Rate model (as documented in the crawl page FAQ): each crawl costs 50 units against a 500 daily limit (about 10 crawls/day).
Prefer lower limit values for targeted extraction to reduce cost and runtime.
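
The budget arithmetic works out to 500 / 50 = 10 crawls per day. A tiny helper can make the remaining budget explicit before a crawl is started. The function name and the idea of tracking used units locally are assumptions; nothing in this document says the service reports usage.

```shell
# Sketch: how many crawls still fit in today's budget, given units already used.
CRAWL_COST=50     # units per crawl, per the FAQ figure quoted above
DAILY_LIMIT=500   # daily unit limit, per the same FAQ

crawls_remaining() {
  local used="${1:-0}"
  local left=$(( (DAILY_LIMIT - used) / CRAWL_COST ))
  # Clamp at zero if the locally tracked usage exceeds the limit.
  if [ "$left" -lt 0 ]; then left=0; fi
  echo "$left"
}
```

For example, `crawls_remaining 120` leaves 380 units, which covers 7 more crawls.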
