Data Spider

Scrape any webpage and extract structured data as JSON, table, or list. Supports schema-guided extraction.
unixlamadev-spec
Productivity · clawhub · v1.1.0 · 99899.2 · Key: required
★ 1 star · 📥 971 downloads · 💾 17 installs · 2 versions · #latest

Overview

Data Spider

Scrape and extract structured data from any webpage. Supports schema-guided extraction to match a specific data shape, or auto-detection of structure. The result is returned as a JSON object, a table (columns + rows), or a flat list, depending on the format you choose.

When to Use

  • Extracting product information or pricing from pages
  • Gathering statistics and figures from articles
  • Building datasets from web sources
  • Schema-guided extraction to match your data model
  • Research and competitive analysis

Usage Flow

  1. Provide a webpage URL
  2. Optionally provide a schema object; data will be extracted to match that exact shape
  3. Optionally set format: json (default), table, or list
  4. AIProx routes the request to the data-spider agent
  5. The response contains the structured data in the requested format, plus a summary and the source URL
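
The first three steps can be sketched as a small helper that assembles the request body. This is a minimal illustration, not part of the skill itself: the function name `build_payload` and the `ALLOWED_FORMATS` constant are hypothetical, and the field names (`url`, `schema`, `format`, `task`) are taken from the request examples in this document.

```python
import json

# Formats named in the usage flow; "json" is the documented default.
ALLOWED_FORMATS = ("json", "table", "list")


def build_payload(url, schema=None, fmt="json", task=None):
    """Assemble a request body (hypothetical helper, not an official client)."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    payload = {"url": url, "format": fmt}
    if schema is not None:
        # Keys with null values describe the shape the data should match.
        payload["schema"] = schema
    if task is not None:
        payload["task"] = task
    return payload


body = json.dumps(build_payload(
    "https://example.com/pricing",
    schema={"free_tier": None, "pro_price": None, "enterprise": None},
))
print(body)
```

Validating `format` locally avoids spending a paid call on a request that names an unsupported format.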

Security Manifest

Permission | Scope              | Reason
---------- | ------------------ | -----------------------------------
Network    | aiprox.dev         | API calls to orchestration endpoint
Env Read   | AIPROX_SPEND_TOKEN | Authentication for paid API

Make Request — JSON with Schema

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "url": "https://example.com/pricing",
    "schema": {"free_tier": null, "pro_price": null, "enterprise": null},
    "format": "json"
  }'
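
For callers who prefer a script over curl, the same request could be built with Python's standard library roughly as follows. This is a sketch under assumptions: the endpoint and header names are copied from the curl command above, while `build_request` and `send` are hypothetical helper names, and the response is assumed to be a JSON body as in the examples in this document.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
ENDPOINT = "https://aiprox.dev/api/orchestrate"


def build_request(payload, token):
    """Build (but do not send) the POST request (hypothetical helper)."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Spend-Token": token,
        },
        method="POST",
    )


def send(payload, timeout=30):
    """Send the request using the token from AIPROX_SPEND_TOKEN.

    Assumes the endpoint answers with a JSON body.
    """
    token = os.environ["AIPROX_SPEND_TOKEN"]
    request = build_request(payload, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send({"url": "https://example.com/pricing", "format": "json"})` performs the network call; `build_request` alone is enough to inspect the request without spending anything.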

Response — JSON

{
  "data": {"free_tier": "$0/month, 1000 API calls", "pro_price": "$29/month", "enterprise": "custom pricing"},
  "summary": "SaaS pricing page with three tiers.",
  "source": "https://example.com/pricing",
  "format": "json"
}
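
Because the schema defines the expected shape, a caller may want to confirm that every requested key came back with a value. The check below is a minimal sketch; `missing_schema_keys` is a hypothetical helper, and treating `None` or an empty string as "missing" is an assumption rather than documented behaviour.

```python
def missing_schema_keys(schema, data):
    """Return schema keys that are absent from data or have no value."""
    return [key for key in schema if data.get(key) in (None, "")]


schema = {"free_tier": None, "pro_price": None, "enterprise": None}
response = {
    "data": {
        "free_tier": "$0/month, 1000 API calls",
        "pro_price": "$29/month",
        "enterprise": "custom pricing",
    },
    "summary": "SaaS pricing page with three tiers.",
    "source": "https://example.com/pricing",
    "format": "json",
}

missing = missing_schema_keys(schema, response["data"])
print("missing keys:", missing)
```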

Make Request — Table

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "task": "extract pricing tiers as a table",
    "url": "https://example.com/pricing",
    "format": "table"
  }'

Response — Table

{
  "columns": ["Plan", "Price", "API Calls"],
  "rows": [
    ["Free", "$0/month", "1,000"],
    ["Pro", "$29/month", "50,000"],
    ["Enterprise", "Custom", "Unlimited"]
  ],
  "summary": "Three-tier SaaS pricing.",
  "source": "https://example.com/pricing",
  "format": "table"
}
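
A table response is often easier to work with as one record per row. The sketch below pairs each row with the column names; `table_to_records` is a hypothetical helper, and it assumes every row has the same length as `columns`, as in the example above.

```python
def table_to_records(response):
    """Convert a columns + rows table response into a list of dicts."""
    columns = response["columns"]
    records = []
    for row in response["rows"]:
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} does not match columns {columns!r}")
        records.append(dict(zip(columns, row)))
    return records


table = {
    "columns": ["Plan", "Price", "API Calls"],
    "rows": [
        ["Free", "$0/month", "1,000"],
        ["Pro", "$29/month", "50,000"],
        ["Enterprise", "Custom", "Unlimited"],
    ],
    "format": "table",
}

records = table_to_records(table)
print(records[1]["Price"])  # prints: $29/month
```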

Response — List

{
  "items": ["$0/month — Free tier, 1000 API calls", "$29/month — Pro, 50,000 calls", "Enterprise — custom pricing"],
  "summary": "SaaS pricing tiers extracted as flat list.",
  "source": "https://example.com/pricing",
  "format": "list"
}
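
The three response shapes carry their payload under different keys (`data`, `columns`/`rows`, `items`). One way to handle them uniformly is to branch on the `format` field, as in this sketch. The helper name `extract_payload` is hypothetical, and falling back to `json` when `format` is absent is an assumption based on the documented default.

```python
def extract_payload(response):
    """Return the payload portion of a response based on its format field."""
    fmt = response.get("format", "json")  # assumed default, per the usage flow
    if fmt == "json":
        return response["data"]
    if fmt == "table":
        return {"columns": response["columns"], "rows": response["rows"]}
    if fmt == "list":
        return response["items"]
    raise ValueError(f"unknown format: {fmt!r}")


list_response = {
    "items": ["Enterprise: custom pricing"],
    "summary": "SaaS pricing tiers extracted as flat list.",
    "source": "https://example.com/pricing",
    "format": "list",
}
print(extract_payload(list_response))
```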

Trust Statement

Data Spider fetches and analyzes webpage contents via URL. Content is processed transiently and not stored. Analysis is performed by Claude via LightningProx. Respects robots.txt and rate limits. Your spend token is used for payment only.

Version History

2 versions in total

  • v1.1.0 (current)
    2026-03-29 08:01 · Safe
  • v1.0.1
    2026-03-26 22:25

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
