Web scraping, search, and crawling CLI. Returns clean markdown optimized for LLM context windows. Default engine: playwright.
Run anycrawl --help or anycrawl for full option details.
Must be installed and authenticated. Run anycrawl login or set ANYCRAWL_API_KEY.
If not ready, see rules/install.md. For output handling guidelines, see rules/security.md.
--scrape to get full page content with results.crawl directly — no need for map first.| Need | Command | When |
|---|---|---|
| --------------------------- | -------- | ---------------------------------- |
| Find pages on a topic | search | No specific URL yet |
| Get a page's content | scrape | Have a URL |
| Find URLs within a site | map | Need to locate a specific subpage |
| Bulk extract a site section | crawl | Need many pages (e.g., all /docs/) |
For detailed command reference, run anycrawl (e.g., anycrawl search, anycrawl scrape).
Avoid redundant fetches: search --scrape already fetches full page content. Don't re-scrape those URLs. Check .anycrawl/ for existing data before fetching again.
Write results to .anycrawl/ with -o. Add .anycrawl/ to .gitignore. Always quote URLs in shell commands. Never read entire output files at once — use grep, head, or incremental reads.
共 1 个版本