
Extract

Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.
barneyjm
Uncategorized · clawhub · v0.1.0 · 1 version · 99661 · Key: required

Overview

Extract Skill

Extract clean content from specific URLs. Ideal when you know which pages you want content from.

Prerequisites

Tavily API Key Required - Get your key at https://tavily.com

Add to ~/.claude/settings.json:

{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
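
For scripts that call the API directly instead of going through the skill, it can help to fail early when the key is missing. The following is a minimal sketch, assuming the key is exposed as the TAVILY_API_KEY environment variable as configured above; `load_api_key` is a hypothetical helper name and not part of the skill.

```python
import os


def load_api_key(env=None):
    """Return the Tavily API key from the environment, or raise a clear error.

    `load_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get("TAVILY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TAVILY_API_KEY is not set; add it to the env block of ~/.claude/settings.json"
        )
    return key
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.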

Quick Start

Using the Script

./scripts/extract.sh '<json>'

Examples:

# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'

Basic Extraction

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'

Multiple URLs with Query Focus

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
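
The same request can also be assembled from Python using only the standard library. This is a sketch under the assumption that the endpoint and headers are exactly those shown in the curl commands above; `build_extract_request` and `send` are hypothetical helper names, and nothing is sent until `send` is called.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.tavily.com/extract"  # endpoint from the curl examples


def build_extract_request(api_key, urls, **options):
    """Build (but do not send) a POST request for the extract endpoint.

    `options` carries optional body fields such as query or chunks_per_source.
    """
    body = {"urls": list(urls), **options}
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the body and headers before any network call happens.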

API Reference

Endpoint

POST https://api.tavily.com/extract

Headers

| Header | Value |
|---|---|
| Authorization | Bearer $TAVILY_API_KEY |
| Content-Type | application/json |

Request Body

| Field | Type | Default | Description |
|---|---|---|---|
| urls | array | Required | URLs to extract (max 20) |
| query | string | null | Reranks chunks by relevance |
| chunks_per_source | integer | 3 | Chunks per URL (1-5, requires query) |
| extract_depth | string | "basic" | basic or advanced (for JS pages) |
| format | string | "markdown" | markdown or text |
| include_images | boolean | false | Include image URLs |
| timeout | float | varies | Max wait (1-60 seconds) |
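
The limits in the table above can be checked locally before a request is sent. The sketch below is illustrative only: `validate_extract_body` is a hypothetical helper, it covers just the constraints listed here, and treating an empty urls array as invalid is an assumption. The API itself remains the authority on what it accepts.

```python
def validate_extract_body(body):
    """Return a list of problems found in an extract request body.

    Checks only the limits stated in the request body table.
    """
    problems = []
    urls = body.get("urls")
    if not isinstance(urls, list) or not urls:
        problems.append("urls is required and must be a non-empty array")
    elif len(urls) > 20:
        problems.append("urls allows at most 20 entries")

    if "chunks_per_source" in body:
        chunks = body["chunks_per_source"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(chunks, int) or isinstance(chunks, bool) or not 1 <= chunks <= 5:
            problems.append("chunks_per_source must be an integer from 1 to 5")
        if not body.get("query"):
            problems.append("chunks_per_source requires query")

    if body.get("extract_depth", "basic") not in ("basic", "advanced"):
        problems.append("extract_depth must be basic or advanced")
    if body.get("format", "markdown") not in ("markdown", "text"):
        problems.append("format must be markdown or text")

    if "timeout" in body:
        timeout = body["timeout"]
        if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or not 1 <= timeout <= 60:
            problems.append("timeout must be between 1 and 60 seconds")
    return problems
```

An empty list means the body passed every local check; otherwise each entry names one violated limit.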

Response Format

{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
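
A decoded response of this shape can be split into extracted content and failures with a few lines of Python. This is a sketch: `split_response` is a hypothetical helper, and because the structure of failed_results entries is not shown in this document, they are passed through unchanged rather than interpreted.

```python
def split_response(response):
    """Return (content_by_url, failed) from a decoded extract response.

    failed_results entries are returned as-is because their shape
    is not documented here.
    """
    content_by_url = {
        item["url"]: item.get("raw_content") or ""
        for item in response.get("results", [])
    }
    failed = list(response.get("failed_results", []))
    return content_by_url, failed


# Sample mirroring the response format shown above
sample = {
    "results": [
        {"url": "https://example.com/article", "raw_content": "# Article Title\n\nContent..."}
    ],
    "failed_results": [],
    "response_time": 2.3,
}
content, failed = split_response(sample)
```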

Extract Depth

| Depth | When to Use |
|---|---|
| basic | Simple text extraction, faster |
| advanced | Dynamic/JS-rendered pages, tables, structured data |
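
One way to combine the two depths is to try basic first and retry only the URLs that came back empty with advanced. The sketch below assumes a caller-supplied `extract(urls, depth)` function (hypothetical) that performs the API call and returns the decoded response, and it assumes each result echoes the requested URL unchanged. "Empty" is taken to mean a missing or whitespace-only raw_content.

```python
def extract_with_fallback(urls, extract):
    """Try basic extraction first, then retry unresolved URLs with advanced.

    `extract(urls, depth)` is a caller-supplied function (hypothetical here)
    that performs the API call and returns the decoded response dict.
    """
    content = {}
    pending = list(urls)
    for depth in ("basic", "advanced"):
        if not pending:
            break
        response = extract(pending, depth)
        for item in response.get("results", []):
            text = item.get("raw_content") or ""
            if text.strip():
                content[item["url"]] = text
        # Keep only URLs that still have no usable content
        pending = [u for u in pending if u not in content]
    # pending holds URLs that yielded nothing at either depth
    return content, pending
```

Injecting `extract` keeps the retry logic independent of how the HTTP call is made, so it can be exercised with a stub.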

Examples

Single URL Extraction

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'

Targeted Extraction with Query

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'

JavaScript-Heavy Pages

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'

Batch Extraction

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'

Tips

  • Max 20 URLs per request - batch larger lists
  • Use query + chunks_per_source to get only relevant content
  • Try basic first, fall back to advanced if content is missing
  • Set longer timeout for slow pages (up to 60s)
  • Check failed_results for URLs that couldn't be extracted
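
The first tip can be sketched as a small batching helper for lists longer than the per-request cap. `batch_urls` is a hypothetical name; the default of 20 comes from the limit stated above.

```python
def batch_urls(urls, size=20):
    """Yield consecutive slices of at most `size` URLs (20 is the stated per-request cap)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    urls = list(urls)
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

Each yielded batch can be sent as the urls field of its own request, with failed_results checked per batch.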

Version History

1 version in total

  • v0.1.0 (current)
    2026-05-21 12:20 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
