← 返回
未分类 Key

Tavily Best Practices

Build production-ready Tavily integrations with best practices baked in. Reference documentation for implementing web search, content extraction, crawling, a...
构建可投产的 Tavily 集成,内嵌最佳实践。参考文档实现网页搜索、内容提取、爬取等功能,...
matthew77
未分类 clawhub v1.0.0 1 版本 99295.8 Key: 需要
★ 0
Stars
📥 141
下载
💾 0
安装
1
版本
#latest

概述

Tavily Best Practices

Tavily is a search API designed for LLMs, enabling AI applications to access real-time web data.

Installation

Python:

pip install tavily-python

JavaScript:

npm install @tavily/core

Client Initialization

from tavily import TavilyClient

# Option 1: Uses TAVILY_API_KEY env var (recommended)
client = TavilyClient()

# Option 2: Explicit API key
client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Async client for parallel queries
from tavily import AsyncTavilyClient
async_client = AsyncTavilyClient()

Choosing the Right Method

For custom agents/workflows:

NeedMethod
--------------
Web search resultssearch()
Content from specific URLsextract()
Content from entire sitecrawl()
URL discovery from sitemap()

For out-of-the-box research:

NeedMethod
--------------
End-to-end research with AI synthesisresearch()

Quick Reference

search() - Web Search

response = client.search(
    query="quantum computing breakthroughs",  # Keep under 400 chars
    max_results=10,
    search_depth="advanced",  # highest relevance
    topic="general"  # or "news"
)

for result in response["results"]:
    print(f"{result['title']}: {result['score']}")

Key parameters:

  • query - Keep under 400 characters
  • max_results - 1-20
  • search_depth - ultra-fast, fast, basic, advanced
  • topic - general or news
  • include_domains, exclude_domains - Filter sources
  • time_range - day, week, month, year

extract() - URL Content Extraction

# Two-step pattern (recommended for control)
search_results = client.search(query="Python async best practices")
urls = [r["url"] for r in search_results["results"] if r["score"] > 0.5]
extracted = client.extract(
    urls=urls[:20],
    query="async patterns",  # Reranks chunks by relevance
    chunks_per_source=3  # Prevents context explosion
)

Key parameters:

  • urls - Max 20 URLs
  • extract_depth - basic or advanced
  • query - Reranks chunks by relevance
  • chunks_per_source - 1-5 (prevents context explosion)

crawl() - Site-Wide Extraction

response = client.crawl(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find API documentation pages",  # Semantic focus
    chunks_per_source=3,  # Token optimization
    select_paths=["/docs/.*", "/api/.*"]
)

Key parameters:

  • url - Root URL to crawl
  • max_depth - 1-5 (start with 1)
  • max_breadth - Links per page
  • limit - Total pages cap
  • instructions - Natural language guidance
  • chunks_per_source - 1-5 (for agentic use)
  • select_paths, exclude_paths - Regex patterns

map() - URL Discovery

response = client.map(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find all API and guide pages"
)
api_docs = [url for url in response["results"] if "/api/" in url]

Use map() when you only need URLs, not content (faster than crawl).

research() - AI-Powered Research

import time

# For comprehensive multi-topic research
result = client.research(
    input="Analyze competitive landscape for X in SMB market",
    model="pro"  # or "mini" for focused queries, "auto" when unsure
)
request_id = result["request_id"]

# Poll until completed
response = client.get_research(request_id)
while response["status"] not in ["completed", "failed"]:
    time.sleep(10)
    response = client.get_research(request_id)

print(response["content"])  # The research report

Key parameters:

  • input - Research topic or question
  • model - mini (quick), pro (comprehensive), auto
  • stream - Stream results as they arrive
  • output_schema - Structured JSON output
  • citation_format - Citation style

Search Depth Selection

DepthLatencyRelevanceUse Case
-------------------------------------
ultra-fastLowestLowerReal-time chat, autocomplete
fastLowGoodNeed chunks but latency matters
basicMediumHighGeneral-purpose, balanced
advancedHigherHighestPrecision matters, research

Rule of thumb: Start with basic, escalate to advanced for complex topics.

Model Selection for Research

Rule of thumb: "what does X do?" → mini. "X vs Y vs Z" or "best way to..." → pro.

ModelUse CaseSpeed
------------------------
miniSingle topic, targeted research~30s
proComprehensive multi-angle analysis~60-120s
autoAPI chooses based on complexityVaries

Crawl for Context vs Data Collection

For agentic use (feeding results into context):

Always use instructions + chunks_per_source. This returns only relevant chunks instead of full pages, preventing context window explosion.

For data collection (saving to files):

Omit chunks_per_source to get full page content.

Common Patterns

Pattern 1: Search + Extract

# Find relevant URLs first
search_results = client.search(query="React hooks documentation")
high_quality_urls = [r["url"] for r in search_results["results"] if r["score"] > 0.7]

# Extract content from best results
extracted = client.extract(
    urls=high_quality_urls[:10],
    query="useState and useEffect",
    chunks_per_source=3
)

Pattern 2: Map + Crawl

# Discover structure first
map_results = client.map(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find API documentation pages"
)

# Crawl only relevant sections
api_urls = [url for url in map_results["results"] if "/api/" in url]
crawl_results = client.crawl(
    url="https://docs.example.com/api",
    max_depth=1,
    limit=len(api_urls)
)

Pattern 3: Research with Citations

result = client.research(
    input="Compare LangGraph vs CrewAI for multi-agent systems",
    model="pro"
)

# The response includes citations
print(result["content"])  # AI-synthesized report
print(result["citations"])  # Source references

Performance Tips

  • Keep queries under 400 characters - Think search query, not prompt
  • Break complex queries into sub-queries - Better results than one massive query
  • Use include_domains to focus on trusted sources
  • Use time_range for recent information
  • Start conservative with crawl (max_depth=1, limit=20)
  • Always set a limit to prevent runaway crawls
  • Use chunks_per_source for agentic workflows - prevents context explosion

Cost Optimization

  • Use basic depth as default (cheaper than advanced)
  • Limit max_results to what you'll actually use
  • Disable include_raw_content unless needed
  • Use chunks_per_source instead of full content for context
  • Cache results locally for repeated queries

Error Handling

from tavily import TavilyClient
from tavily.errors import TavilyError

client = TavilyClient()

try:
    result = client.search(query="example")
except TavilyError as e:
    print(f"Tavily API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Framework Integrations

Tavily integrates with popular frameworks:

  • LangChain - TavilySearch tool
  • LlamaIndex - TavilySearch tool
  • CrewAI - Built-in Tavily tools
  • Vercel AI SDK - Direct API calls

See the official documentation for integration examples.

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-05-12 06:20 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

content-creation

Tavily Extract

matthew77
使用Tavily提取API从特定URL提取内容,返回干净的markdown或文本格式。
★ 1 📥 692
ai-intelligence

Tavily Search

matthew77
使用Tavily的LLM优化API进行网络搜索,返回包含内容片段、评分和元数据的相关结果。
★ 119 📥 45,641
data-analysis

Tavily Research

matthew77
基于网络数据的综合研究,附有明确引用。适用于多源综合——对比、时事、市场分析等需求。
★ 1 📥 950