概述

Tavily Crawl

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.

Authentication

Get your API key at https://tavily.com and add to your OpenClaw config:

{
  "skills": {
    "entries": {
      "tavily-crawl": {
        "enabled": true,
        "apiKey": "tvly-YOUR_API_KEY_HERE"
      }
    }
  }
}

Or set in environment variable:

export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"

Quick Start

Using the Script

node {baseDir}/scripts/crawl.mjs "https://docs.example.com"
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --output ./docs
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 --limit 50

Examples

# Basic crawl
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"

# Deeper crawl with limits
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --limit 50

# Save to files
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --output ./docs

# Focused crawl with path filters
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 \
  --select "/docs/.*" --exclude "/blog/.*"

# With semantic instructions
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" \
  --instructions "Find API documentation" --chunks 3

Options

Option	Description	Default
--------	-------------	---------
`--depth`	Crawl depth (1-5)	1
`--breadth`	Links per page	20
`--limit`	Total pages cap	50
`--output`	Save pages to directory	-
`--instructions`	Natural language guidance	-
`--chunks`	Chunks per page (1-5, requires instructions)	-
`--depth-mode`	Extract depth: `basic` or `advanced`	`basic`
`--select`	Regex pattern to include	-
`--exclude`	Regex pattern to exclude	-
`--timeout`	Max wait time (10-150 seconds)	150
`--json`	Output raw JSON	false

Depth vs Performance

Depth	Typical Pages	Time
-------	---------------	------
1	10-50	Seconds
2	50-500	Minutes
3	500-5000	Many minutes

Start with --depth 1 and increase only if needed.

Crawl for Context vs Data Collection

For agentic use (feeding results into context): Always use --instructions + --chunks. This returns only relevant chunks instead of full pages, preventing context window explosion.

For data collection (saving to files): Omit --chunks to get full page content.

Tips

Always use --chunks for agentic workflows - prevents context explosion when feeding results to LLMs
Omit --chunks only for data collection - when saving full pages to files
Start conservative (--depth 1, --limit 20) and scale up
Use path patterns to focus on relevant sections
Always set a --limit to prevent runaway crawls

版本历史

共 1 个版本

v1.0.0 当前

2026-03-29 23:35 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)

安全，无风险

查看报告

Tavily Crawl

概述

Tavily Crawl

Authentication

Quick Start

Using the Script

Examples

Options

Depth vs Performance

Crawl for Context vs Data Collection

Tips

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

Tavily Search

Excel / XLSX

Data Analysis