
CrawlHub

CrawlHub is a professional web data extraction platform that provides structured data from social media and messaging platforms (X/Twitter, Instagram, Telegr...
Source: wolflabs88
Uncategorized · clawhub v1.0.1 · 100000 · Key: required
★ 0 stars · 📥 363 downloads · 💾 0 installs · 1 version · #latest

Overview

CrawlHub Integration Skill

CrawlHub is a professional web data extraction platform that provides structured, normalized data from major social media and messaging platforms — via a clean REST API.

What CrawlHub Does

CrawlHub handles all the hard parts of web scraping:

  • Proxies & rate limit handling — avoiding IP blocks
  • Anti-bot circumvention — making requests look like real browsers
  • Parsing & normalization — turning raw HTML/JSON into clean structured records
  • Data delivery — via API (JSON), webhook, or push to S3/Postgres/warehouse

Supported platforms include: X/Twitter, Instagram, Telegram, LinkedIn, YouTube, TikTok, Facebook, Threads — and more.

Platform Overview

| Platform | Data Types Available |
|---|---|
| X / Twitter | User profiles, tweets, timelines, search, trending topics |
| Instagram | User profiles, posts, comments, hashtags, followers |
| Telegram | Channels, messages, groups, public content |
| LinkedIn | Company profiles, posts, job listings, people data |
| YouTube | Video metadata, channels, comments, search |
| TikTok | User profiles, videos, trending content |
| Facebook | Pages, posts, groups, public content |
| Threads | Posts, user profiles, threads search |
| + more | CrawlHub adds new platforms regularly |

API Reference

Base URL: https://api.thecrawlhub.com/api/v1

Authentication:

  • Login: POST /auth/login with {"email": "...", "password": "..."} → returns access_token and refresh_token
  • Use: Authorization: Bearer {access_token} header on all requests
  • Refresh: POST /auth/refresh with {"refresh_token": "..."}

Key Endpoints:

Platform Discovery

GET /scraper/platforms                          → List all available platforms
GET /scraper/platforms/{platform_id}             → List modules & endpoints of a platform
GET /scraper/endpoints/{endpoint_id}           → Get detailed info for a specific endpoint

Data Execution

GET  /execution/endpoints/{endpoint_id}/execute     → Execute with query params
POST /execution/endpoints/{endpoint_id}/execute     → Execute with JSON body
PATCH /execution/endpoints/{endpoint_id}/execute    → Partial update style execution
PUT  /execution/endpoints/{endpoint_id}/execute     → Full replacement style execution
DELETE /execution/endpoints/{endpoint_id}/execute    → Delete style execution
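Here is a minimal sketch of how the GET and POST execution styles differ. It uses only Python's standard library and assumes the base URL and Bearer scheme documented here. The endpoint ID `ep-123` and the parameter names `query` and `limit` are hypothetical. The real parameters for an endpoint come from GET /scraper/endpoints/{endpoint_id}. The requests are built here but not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.thecrawlhub.com/api/v1"


def build_execute_request(endpoint_id: str, params: dict, access_token: str,
                          method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) an execution request.

    GET carries params in the query string; POST carries them as a JSON body.
    """
    url = f"{BASE_URL}/execution/endpoints/{endpoint_id}/execute"
    headers = {"Authorization": f"Bearer {access_token}"}
    data = None
    if method == "GET":
        url += "?" + urlencode(params)
    elif method == "POST":
        headers["Content-Type"] = "application/json"
        data = json.dumps(params).encode("utf-8")
    else:
        raise ValueError(f"method not covered by this sketch: {method}")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical endpoint ID and parameters, for illustration only.
get_req = build_execute_request("ep-123", {"query": "crawlhub", "limit": 20}, "TOKEN")
post_req = build_execute_request("ep-123", {"query": "crawlhub"}, "TOKEN", method="POST")
```

To send one of these, pass it to `urllib.request.urlopen(get_req)`. The PATCH, PUT, and DELETE styles are left out because their body semantics are not described above.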

Authentication & Users

POST /auth/register       → Register new account
POST /auth/login          → Login (email + password)
POST /auth/refresh        → Refresh access token
POST /auth/logout         → Revoke tokens
POST /auth/password-reset → Request password reset email
GET  /auth/token-validate  → Validate current JWT
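Access tokens are JWTs, so a client can read their expiry locally and refresh a little early instead of waiting for a request to fail. This sketch assumes CrawlHub tokens carry the standard `exp` claim (seconds since the Unix epoch). It decodes the payload without verifying the signature. Use it only to schedule refreshes; GET /auth/token-validate remains the authoritative check. The helper names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim of a JWT, or None if absent. The signature is NOT verified."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected header.payload.signature")
    payload_b64 = parts[1]
    # JWT segments are unpadded base64url; restore padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    exp = claims.get("exp")
    return int(exp) if exp is not None else None


def needs_refresh(token: str, skew_seconds: int = 60, now: Optional[float] = None) -> bool:
    """True if the token expires within `skew_seconds` of `now`."""
    exp = jwt_expiry(token)
    if exp is None:
        # No readable expiry: rely on a 401 or /auth/token-validate instead.
        return False
    current = time.time() if now is None else now
    return exp - current <= skew_seconds
```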

Team Management

GET  /teams                        → List user's teams
POST /teams                        → Create a new team
GET  /teams/{team_id}              → List team members
POST /teams/{team_id}/invite       → Invite member to team
DELETE /teams/{team_id}/{member_id} → Remove member
GET  /teams/{team_id}/permissions  → Get current user's permissions
PUT  /teams/{team_id}/{member_id}/role → Change member role
GET  /teams/roles                  → List available team roles
GET  /teams/invite/validate        → Validate invite token
POST /teams/invite/accept          → Accept team invite

API Keys (Team)

GET  /teams/{team_id}/api-keys              → List team's API keys
POST /teams/{team_id}/api-keys              → Create new API key
PATCH /teams/{team_id}/api-keys/{api_key_id} → Enable/disable key
GET  /teams/{team_id}/api-keys/{api_key_id}/permissions → Get permission tree for a key
PUT  /teams/{team_id}/api-keys/{api_key_id}/permissions → Sync/set permissions

Billing & Subscription

GET /teams/{team_id}/billing/cycle          → Current billing cycle
GET /teams/{team_id}/billing/transactions   → Transaction history (paginated)
GET /teams/{team_id}/billing/wallet          → Wallet balance
GET /teams/{team_id}/subscription           → Current subscription plan
POST /teams/{team_id}/subscription          → Switch to different plan
PATCH /teams/{team_id}/subscription/policy  → Update subscription policy
GET /plans                                  → List all available plans

Request Logs

GET /teams/{team_id}/scraper/endpoints/{endpoint_id}/logs  → Request logs for an endpoint
     Query params: page, per_page, from, to, status_code, sort_key, sort_order
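As a sketch, here is a query for the last 24 hours of one endpoint's logs. The parameter names come from the list above. Several details are assumptions:

- the value formats: ISO 8601 for `from`/`to`, following the Notes section, and `desc` for `sort_order`
- the `created_at` sort key
- the cap of 100 on `per_page`, borrowed from the general pagination note
- the team and endpoint IDs, which are hypothetical

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.thecrawlhub.com/api/v1"


def build_logs_url(team_id: str, endpoint_id: str, since: datetime, until: datetime,
                   page: int = 1, per_page: int = 100,
                   status_code: Optional[int] = None) -> str:
    """Build the request-log URL for one endpoint over a time window."""
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    params = {
        "page": page,
        "per_page": per_page,
        "from": since.isoformat(),   # ISO 8601, e.g. 2026-05-06T14:52:00+00:00
        "to": until.isoformat(),
        "sort_key": "created_at",    # hypothetical sort key
        "sort_order": "desc",        # assumed accepted value
    }
    if status_code is not None:
        params["status_code"] = status_code
    path = f"/teams/{team_id}/scraper/endpoints/{endpoint_id}/logs"
    return f"{BASE_URL}{path}?{urlencode(params)}"


until = datetime(2026, 5, 7, 14, 52, tzinfo=timezone.utc)
url = build_logs_url("team-1", "ep-123", until - timedelta(days=1), until, status_code=503)
```

Filtering on `status_code=503` like this is one way to see how often an endpoint hit the "server busy" condition.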

User Profile

GET    /user/info    → Get current user info
PATCH  /user/update  → Update profile (name, address, phone, company)

Pricing Model

CrawlHub uses a per-record pricing model:

| Plan | Price | Rate Limit | Best For |
|---|---|---|---|
| Pay as you go | $1.79 / 1,000 records | 50 req / 15 min / endpoint | Testing, prototyping |
| Scaler | $299/month | 150 req / 15 min / endpoint | Teams in production |
| Business | $999/month | 600 req / 15 min / endpoint | High-scale data pipelines |
| Enterprise | Custom | Custom | Unique requirements, SLAs |

Rate limits apply per endpoint. Usage is counted by the records returned in each response, not by the number of requests.
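A worked example of the arithmetic: at $1.79 per 1,000 records, 10,000 returned records cost $17.90 on Pay as you go, however many requests fetched them. A limit of 50 requests per 15 minutes averages out to one request every 18 seconds per endpoint.

The sketch below encodes both numbers. It also adds a client-side sliding-window limiter so a client stays under its plan's per-endpoint limit. Rounding to whole cents and the plan keys are assumptions of this sketch. CrawlHub's server-side limits remain the authority.

```python
from collections import deque
from decimal import Decimal, ROUND_HALF_UP

WINDOW_SECONDS = 15 * 60
# Requests per 15-minute window per endpoint, from the plan table (keys are hypothetical).
PLAN_LIMITS = {"pay_as_you_go": 50, "scaler": 150, "business": 600}
PAYG_PRICE_PER_1000 = Decimal("1.79")


def estimate_payg_cost(records: int) -> Decimal:
    """Pay-as-you-go cost in USD, billed by records returned (not requests).

    Rounding to cents, half-up, is an assumption of this sketch.
    """
    cost = PAYG_PRICE_PER_1000 * records / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


class EndpointRateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds, tracked per endpoint."""

    def __init__(self, limit: int, window: float = WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._sent = {}  # endpoint_id -> deque of send timestamps

    def try_acquire(self, endpoint_id: str, now: float) -> bool:
        sent = self._sent.setdefault(endpoint_id, deque())
        # Forget timestamps that have slid out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) < self.limit:
            sent.append(now)
            return True
        return False
```

A caller would check `limiter.try_acquire(endpoint_id, time.monotonic())` before each request and wait when it returns False.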

Execution Response Format

Successful execution returns:

{
  "data": {
    "records": [
      { "title": "...", "url": "...", "created_at": "...", ... }
    ]
  },
  "http_status": 200
}

Error responses include a kind field (e.g., BAD_INPUT, ABORT_ERROR, HTTP_ERROR, REGISTRY_ERROR) and a details field.
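Here is a sketch of unpacking the execution envelope shown above. It assumes error bodies expose `kind` and `details` at the top level. The text implies this but does not show an example error body. `CrawlHubError` and `extract_records` are hypothetical names.

```python
import json
from typing import Any


class CrawlHubError(Exception):
    """Raised for an error payload; `kind` is e.g. BAD_INPUT or HTTP_ERROR."""

    def __init__(self, kind: str, details: Any, http_status: int):
        super().__init__(f"{kind} (HTTP {http_status}): {details}")
        self.kind = kind
        self.details = details
        self.http_status = http_status


def extract_records(status: int, body: str) -> list:
    """Return data.records from a successful execution, else raise CrawlHubError.

    Assumption: error bodies carry top-level "kind" and "details" keys.
    """
    payload = json.loads(body)
    if status >= 400 or "kind" in payload:
        raise CrawlHubError(payload.get("kind", "UNKNOWN"), payload.get("details"), status)
    return payload["data"]["records"]
```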

Use Cases

  • Brand Intelligence — Monitor brand mentions, sentiment, emerging narratives
  • Competitive Intelligence — Track competitor content, launches, audience movements
  • Threat Intelligence — Surface threats, leaks, coordinated inauthentic activity
  • Crypto & Web3 Intelligence — Monitor tokens, projects, communities across X + Telegram
  • News & Media Monitoring — Breaking event coverage across platforms
  • Lead Generation — Build targeted outreach lists from public platform data
  • Academic Research — Collect public social data for research projects

Authentication Flow (Step by Step)

  1. Register or log in to get tokens:

```
POST /auth/login
Body: {"email": "user@example.com", "password": "password"}
Response: {"data": {"access_token": "...", "refresh_token": "..."}}
```

  2. Use the access token in all subsequent requests:

```
Authorization: Bearer eyJhbGc...
```

  3. When the access token expires, refresh it:

```
POST /auth/refresh
Body: {"refresh_token": "eyJhbGc..."}
```

  4. Discover platforms and endpoints:

```
GET /scraper/platforms
GET /scraper/platforms/{platform_id}
GET /scraper/endpoints/{endpoint_id}
```

  5. Execute an endpoint to get data:

```
GET /execution/endpoints/{endpoint_id}/execute?param1=value1&param2=value2

POST /execution/endpoints/{endpoint_id}/execute
Body (JSON): {"param1": "value1", "param2": "value2"}
```
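The five steps can be wired together as a small client. This is a sketch under several assumptions:

- The HTTP transport is injected as a hypothetical callable, so the flow can be exercised without network access.
- An expired access token comes back as HTTP 401.
- /auth/refresh returns the same `data.access_token` shape as login.

The endpoint ID and parameters are placeholders.

```python
import json
from typing import Callable, Optional, Tuple
from urllib.parse import urlencode

BASE_URL = "https://api.thecrawlhub.com/api/v1"

# Hypothetical transport: (method, url, headers, body) -> (http_status, parsed JSON)
Transport = Callable[[str, str, dict, Optional[bytes]], Tuple[int, dict]]


class CrawlHubClient:
    def __init__(self, transport: Transport):
        self.transport = transport
        self.access_token: Optional[str] = None
        self.refresh_token: Optional[str] = None

    def _send(self, method, path, params=None, body=None, auth=True):
        url = BASE_URL + path + ("?" + urlencode(params) if params else "")
        headers = {}
        if auth:
            headers["Authorization"] = f"Bearer {self.access_token}"
        data = None
        if body is not None:
            headers["Content-Type"] = "application/json"
            data = json.dumps(body).encode("utf-8")
        return self.transport(method, url, headers, data)

    def _store_tokens(self, payload: dict) -> None:
        tokens = payload["data"]
        self.access_token = tokens["access_token"]
        # Keep the old refresh token if the server does not rotate it (assumption).
        self.refresh_token = tokens.get("refresh_token", self.refresh_token)

    def _auth_call(self, path: str, body: dict) -> None:
        status, payload = self._send("POST", path, body=body, auth=False)
        if status != 200:
            raise RuntimeError(f"{path} failed with HTTP {status}")
        self._store_tokens(payload)

    def login(self, email: str, password: str) -> None:  # step 1
        self._auth_call("/auth/login", {"email": email, "password": password})

    def refresh(self) -> None:  # step 3
        self._auth_call("/auth/refresh", {"refresh_token": self.refresh_token})

    def call(self, method, path, params=None, body=None):  # step 2: Bearer on every call
        status, payload = self._send(method, path, params, body)
        if status == 401 and self.refresh_token:  # assumed: expired token -> 401
            self.refresh()
            status, payload = self._send(method, path, params, body)
        return status, payload

    def list_platforms(self):  # step 4
        return self.call("GET", "/scraper/platforms")

    def execute(self, endpoint_id: str, params: dict):  # step 5, GET variant
        return self.call("GET", f"/execution/endpoints/{endpoint_id}/execute", params=params)
```

In production the transport would wrap an HTTP library such as `urllib.request`. Keeping it injectable makes the refresh-and-retry logic testable on its own.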

Error Handling

| HTTP Status | Kind | Cause |
|---|---|---|
| 400 | BAD_INPUT | Invalid request parameters |
| 401 | AUTH_HEADER_FORMAT | Missing or malformed Authorization header |
| 401 | INVALID_CREDENTIALS | Wrong email/password |
| 403 | ABORT_ERROR | Permission denied (endpoint-level) |
| 404 | REGISTRY_ERROR | Endpoint not found |
| 405 | METHOD_NOT_ALLOWED | Wrong HTTP method for endpoint |
| 502 | HTTP_ERROR | Upstream platform returned error |
| 503 | ABORT_ERROR | Server busy, retry later |
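One detail of this table matters for client code: ABORT_ERROR appears at both 403 (permission denied) and 503 (server busy). A handler should therefore dispatch on the (status, kind) pair, not on kind alone. The mapping below from causes to actions is this sketch's interpretation. The `Action` names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    FIX_REQUEST = "fix the parameters, endpoint ID, or HTTP method"
    REAUTHENTICATE = "log in again or fix the Authorization header"
    CHECK_PERMISSIONS = "check team or API-key permissions"
    RETRY_WITH_BACKOFF = "retry later with exponential backoff"
    INVESTIGATE = "inspect details before retrying"


# (HTTP status, kind) -> suggested action, transcribed from the table.
ERROR_ACTIONS = {
    (400, "BAD_INPUT"): Action.FIX_REQUEST,
    (401, "AUTH_HEADER_FORMAT"): Action.REAUTHENTICATE,
    (401, "INVALID_CREDENTIALS"): Action.REAUTHENTICATE,
    (403, "ABORT_ERROR"): Action.CHECK_PERMISSIONS,
    (404, "REGISTRY_ERROR"): Action.FIX_REQUEST,
    (405, "METHOD_NOT_ALLOWED"): Action.FIX_REQUEST,
    (502, "HTTP_ERROR"): Action.INVESTIGATE,
    (503, "ABORT_ERROR"): Action.RETRY_WITH_BACKOFF,
}


def classify(status: int, kind: str) -> Action:
    """Key on status AND kind; unknown combinations fall back to INVESTIGATE."""
    return ERROR_ACTIONS.get((status, kind), Action.INVESTIGATE)
```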

Best Practices

  • Use idempotent retries — pass X-Request-ID header when retrying to avoid duplicate billing
  • Check rate limits before executing: /plans lists every available plan, and /teams/{team_id}/subscription shows the plan your team is currently on
  • Monitor usage — via /teams/{team_id}/billing/transactions and request logs
  • Handle 503s gracefully — implement exponential backoff when server is busy
  • Store access tokens securely — never log them; refresh before expiry
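The first and fourth practices combine naturally: retry a 503 with exponential backoff, and send the same X-Request-ID on every attempt so the server can recognise repeats. This sketch uses "full jitter" backoff, which sleeps a random time up to base × 2^attempt. That policy is a common choice, not one the text prescribes. `send` is a hypothetical callable that issues the request with the given X-Request-ID and returns the HTTP status. Using a UUID for the ID is also an assumption.

```python
import random
import time
import uuid
from typing import Callable, Optional


def send_with_retries(send: Callable[[str], int], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep,
                      request_id: Optional[str] = None) -> int:
    """Call `send(request_id)` until it returns a non-503 status or attempts run out.

    The same request ID is reused on every attempt, so retries of one logical
    request share a single X-Request-ID.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    request_id = request_id or str(uuid.uuid4())
    for attempt in range(max_attempts):
        status = send(request_id)
        if status != 503 or attempt + 1 == max_attempts:
            return status
        # Exponential backoff with full jitter: sleep in [0, base * 2**attempt], capped.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
```

Injecting `sleep` keeps the function testable without waiting. In real use the default `time.sleep` applies.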

Notes

  • All timestamps are ISO 8601 date-time strings
  • Pagination uses page + per_page (max 100 per page)
  • All list endpoints return paged results
  • API keys (team-level) can have custom permission trees — useful for granular access control
  • CrawlHub adds new platforms and endpoints regularly — check /scraper/platforms periodically
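Since every list endpoint is paged with page and per_page, a caller can consume any of them through one generic iterator. This sketch relies only on page size. It assumes pages start at 1 and treats a short or empty page as the last one, because the response's pagination metadata is not described here. `fetch_page` is a hypothetical callable that requests `?page=..&per_page=..` and returns that page's items.

```python
from typing import Callable, Iterator, List

MAX_PER_PAGE = 100  # documented maximum page size


def iterate_pages(fetch_page: Callable[[int, int], List[dict]],
                  per_page: int = MAX_PER_PAGE) -> Iterator[dict]:
    """Yield every item from a paged list endpoint, one page at a time."""
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per_page must be between 1 and {MAX_PER_PAGE}")
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:  # short page: assumed to be the last
            return
        page += 1
```

If the API also returns a total count, checking it would avoid the extra empty request when the item count is an exact multiple of per_page.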

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-07 14:52 · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

data-analysis

Tavily Search

jacky1n7
Web search via the Tavily API (an alternative to Brave). Use when the user asks to search the web or find sources or links and Brave web search is unavailable.
★ 273 📥 100,309
data-analysis

Data Analysis

ivangdavila
Data analysis and visualization. Query databases, generate reports, automate spreadsheets, and turn raw data into clear, actionable insights. Use for: (1) you…
★ 208 📥 68,571
data-analysis

Stock Watcher

robin797860
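Manage and monitor a personal stock watchlist: add, remove, and list stocks and summarize recent performance using Tonghuashun (同花顺) data. Use when the user wants to track specific stocks, get performance summaries, or manage a watchlist.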
★ 112 📥 46,149