
Tavily Crawl

Crawl any website and save pages as local markdown files. Ideal for downloading documentation, knowledge bases, or web content for offline access or analysis.
Author: matthew77
Category: Data Analysis · Source: clawhub · Version: v1.0.0 (latest, 1 version) · API key: required
Stars: 0 · Downloads: 924 · Installs: 25

Overview

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.

Authentication

Get your API key at https://tavily.com and add it to your OpenClaw config:

{
  "skills": {
    "entries": {
      "tavily-crawl": {
        "enabled": true,
        "apiKey": "tvly-YOUR_API_KEY_HERE"
      }
    }
  }
}

Or set it as an environment variable:

export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"
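
The two configuration routes above can be combined in a small lookup helper. The sketch below is illustrative only: `resolveApiKey` is a hypothetical function, and the lookup order (environment variable first, then the config entry) is an assumption, not documented behavior of the skill.

```javascript
// Hypothetical helper. The precedence (env var, then config file entry) is an
// assumption made for illustration; the skill may resolve the key differently.
function resolveApiKey(env, config) {
  const fromEnv = env.TAVILY_API_KEY;
  const fromConfig = config?.skills?.entries?.["tavily-crawl"]?.apiKey;
  const key = fromEnv || fromConfig;
  if (!key) {
    throw new Error("No Tavily API key found in TAVILY_API_KEY or the config entry");
  }
  return key;
}

// Usage: the config object mirrors the JSON shape shown above.
const exampleConfig = {
  skills: { entries: { "tavily-crawl": { enabled: true, apiKey: "tvly-from-config" } } },
};
console.log(resolveApiKey({ TAVILY_API_KEY: "tvly-from-env" }, exampleConfig));
console.log(resolveApiKey({}, exampleConfig));
```

Failing early with a clear message avoids a confusing authentication error later in the crawl.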

Quick Start

Using the Script

node {baseDir}/scripts/crawl.mjs "https://docs.example.com"
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --output ./docs
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 --limit 50

Examples

# Basic crawl
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"

# Deeper crawl with limits
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --limit 50

# Save to files
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --output ./docs

# Focused crawl with path filters
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 \
  --select "/docs/.*" --exclude "/blog/.*"

# With semantic instructions
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" \
  --instructions "Find API documentation" --chunks 3
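
To see what the path filters in the focused-crawl example would keep, here is a minimal local sketch. It assumes that patterns are tested against the URL path and that an exclude match overrides a select match; the service's actual matching rules may differ, and `shouldCrawl` is a hypothetical name.

```javascript
// Sketch only. Assumptions: patterns apply to the URL path, and --exclude
// takes priority over --select. Not taken from the crawl.mjs source.
function shouldCrawl(url, { select = [], exclude = [] } = {}) {
  const path = new URL(url).pathname;
  const matches = (patterns) => patterns.some((p) => new RegExp(p).test(path));
  if (matches(exclude)) return false;
  // With no select patterns, everything not excluded is allowed.
  return select.length === 0 || matches(select);
}

const exampleFilters = { select: ["/docs/.*"], exclude: ["/blog/.*"] };
for (const url of [
  "https://example.com/docs/api/auth",
  "https://example.com/blog/launch",
  "https://example.com/pricing",
]) {
  console.log(url, shouldCrawl(url, exampleFilters));
}
```

Checking patterns against a handful of known URLs before a crawl is a cheap way to catch a filter that is too broad or too narrow.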

Options

| Option | Description | Default |
|--------|-------------|---------|
| --depth | Crawl depth (1-5) | 1 |
| --breadth | Links per page | 20 |
| --limit | Total pages cap | 50 |
| --output | Save pages to directory | - |
| --instructions | Natural language guidance | - |
| --chunks | Chunks per page (1-5, requires instructions) | - |
| --depth-mode | Extract depth: basic or advanced | basic |
| --select | Regex pattern to include | - |
| --exclude | Regex pattern to exclude | - |
| --timeout | Max wait time (10-150 seconds) | 150 |
| --json | Output raw JSON | false |
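
The ranges and dependencies listed in the table can be checked before a crawl is launched. The following is a minimal sketch, assuming options have already been parsed into an object; `validateOptions` and its property names (`depthMode` for `--depth-mode`, and so on) are hypothetical and not part of `crawl.mjs`.

```javascript
// Hypothetical pre-flight check. The limits come from the options table;
// the function and the option object shape are assumptions for illustration.
function validateOptions(opts) {
  const errors = [];
  const inRange = (v, lo, hi) => Number.isInteger(v) && v >= lo && v <= hi;

  if (opts.depth !== undefined && !inRange(opts.depth, 1, 5)) {
    errors.push("--depth must be an integer from 1 to 5");
  }
  if (opts.chunks !== undefined) {
    if (!inRange(opts.chunks, 1, 5)) {
      errors.push("--chunks must be an integer from 1 to 5");
    }
    if (!opts.instructions) {
      errors.push("--chunks requires --instructions");
    }
  }
  if (opts.timeout !== undefined && !(opts.timeout >= 10 && opts.timeout <= 150)) {
    errors.push("--timeout must be between 10 and 150 seconds");
  }
  if (opts.depthMode !== undefined && !["basic", "advanced"].includes(opts.depthMode)) {
    errors.push("--depth-mode must be basic or advanced");
  }
  return errors;
}

// Usage: an empty array means the options satisfy the table's constraints.
console.log(validateOptions({ depth: 2, instructions: "Find API documentation", chunks: 3 }));
console.log(validateOptions({ depth: 6, chunks: 3 }));
```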

Depth vs Performance

| Depth | Typical Pages | Time |
|-------|---------------|------|
| 1 | 10-50 | Seconds |
| 2 | 50-500 | Minutes |
| 3 | 500-5000 | Many minutes |

Start with --depth 1 and increase only if needed.
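
The growth with depth is roughly geometric, which a back-of-the-envelope bound makes visible. The sketch below assumes the worst case, where every page yields `--breadth` new links and the start page is not counted; real crawls usually find far fewer pages because links repeat. `worstCasePages` is a hypothetical helper.

```javascript
// Worst-case page count: breadth + breadth^2 + ... + breadth^depth, capped by
// the limit. Assumes no duplicate links, so it is an upper bound, not a forecast.
function worstCasePages(depth, breadth = 20, limit = 50) {
  let total = 0;
  let frontier = 1; // the start URL
  for (let level = 1; level <= depth; level++) {
    frontier *= breadth;
    total += frontier;
    if (total >= limit) return limit;
  }
  return total;
}

// Usage: compare the uncapped bound with the default cap of 50 pages.
for (const depth of [1, 2, 3]) {
  console.log(depth, worstCasePages(depth, 20, Infinity), worstCasePages(depth));
}
```

With the default breadth of 20, the uncapped bound is 20, 420, and 8420 pages for depths 1 to 3, which is why `--limit` matters more with each extra level.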

Crawl for Context vs Data Collection

For agentic use (feeding results into context): Always use --instructions + --chunks. This returns only relevant chunks instead of full pages, preventing context window explosion.

For data collection (saving to files): Omit --chunks to get full page content.
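
The two modes differ only in which flags are passed. As a minimal sketch, the hypothetical helper below builds the argument list for either mode; the mode names `context` and `collect`, the function name, and the conservative defaults (`--depth 1`, `--limit 20`) are assumptions chosen for illustration.

```javascript
// Hypothetical argument builder for the two usage modes described above.
// "context": relevant chunks only. "collect": full pages saved to a directory.
function buildCrawlArgs({ url, mode, instructions, chunks = 3, output, depth = 1, limit = 20 }) {
  const args = [url, "--depth", String(depth), "--limit", String(limit)];
  if (mode === "context") {
    if (!instructions) throw new Error("context mode needs instructions");
    args.push("--instructions", instructions, "--chunks", String(chunks));
  } else if (mode === "collect") {
    if (!output) throw new Error("collect mode needs an output directory");
    args.push("--output", output); // no --chunks, so full page content is kept
  } else {
    throw new Error(`unknown mode: ${mode}`);
  }
  return args;
}

// Usage: pass the resulting array to the script, e.g. via child_process.spawn.
console.log(buildCrawlArgs({ url: "https://docs.example.com", mode: "context", instructions: "Find API documentation" }));
console.log(buildCrawlArgs({ url: "https://docs.example.com", mode: "collect", output: "./docs" }));
```

Returning an array rather than a single string avoids shell-quoting problems when the instructions contain spaces or quotes.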

Tips

  • Always use --chunks for agentic workflows - prevents context explosion when feeding results to LLMs
  • Omit --chunks only for data collection - when saving full pages to files
  • Start conservative (--depth 1, --limit 20) and scale up
  • Use path patterns to focus on relevant sections
  • Always set a --limit to prevent runaway crawls

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 23:35, security status: safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
