CrawlHub is a professional web data extraction platform that provides structured, normalized data from major social media and messaging platforms — via a clean REST API.
CrawlHub handles all the hard parts of web scraping for you.
Supported platforms include: X/Twitter, Instagram, Telegram, LinkedIn, YouTube, TikTok, Facebook, Threads — and more.
| Platform | Data Types Available |
|---|---|
| X / Twitter | User profiles, tweets, timelines, search, trending topics |
| Instagram | User profiles, posts, comments, hashtags, followers |
| Telegram | Channels, messages, groups, public content |
| LinkedIn | Company profiles, posts, job listings, people data |
| YouTube | Video metadata, channels, comments, search |
| TikTok | User profiles, videos, trending content |
| Facebook | Pages, posts, groups, public content |
| Threads | Posts, user profiles, threads search |
| + more | CrawlHub adds new platforms regularly |
Base URL: https://api.thecrawlhub.com/api/v1
Authentication:
1. Log in with `POST /auth/login` and `{"email": "...", "password": "..."}`. The response returns an `access_token` and a `refresh_token`.
2. Send an `Authorization: Bearer {access_token}` header on all requests.
3. When the access token expires, get a new one with `POST /auth/refresh` and `{"refresh_token": "..."}`.

Key Endpoints:
- `GET /scraper/platforms` → List all available platforms
- `GET /scraper/platforms/{platform_id}` → List modules & endpoints of a platform
- `GET /scraper/endpoints/{endpoint_id}` → Get detailed info for a specific endpoint
- `GET /execution/endpoints/{endpoint_id}/execute` → Execute with query params
- `POST /execution/endpoints/{endpoint_id}/execute` → Execute with a JSON body
- `PATCH /execution/endpoints/{endpoint_id}/execute` → Partial-update-style execution
- `PUT /execution/endpoints/{endpoint_id}/execute` → Full-replacement-style execution
- `DELETE /execution/endpoints/{endpoint_id}/execute` → Delete-style execution

Authentication endpoints:
- `POST /auth/register` → Register a new account
- `POST /auth/login` → Log in (email + password)
- `POST /auth/refresh` → Refresh the access token
- `POST /auth/logout` → Revoke tokens
- `POST /auth/password-reset` → Request a password-reset email
- `GET /auth/token-validate` → Validate the current JWT

Teams:
- `GET /teams` → List the user's teams
- `POST /teams` → Create a new team
- `GET /teams/{team_id}` → List team members
- `POST /teams/{team_id}/invite` → Invite a member to the team
- `DELETE /teams/{team_id}/{member_id}` → Remove a member
- `GET /teams/{team_id}/permissions` → Get the current user's permissions
- `PUT /teams/{team_id}/{member_id}/role` → Change a member's role
- `GET /teams/roles` → List available team roles
- `GET /teams/invite/validate` → Validate an invite token
- `POST /teams/invite/accept` → Accept a team invite

API keys:
- `GET /teams/{team_id}/api-keys` → List the team's API keys
- `POST /teams/{team_id}/api-keys` → Create a new API key
- `PATCH /teams/{team_id}/api-keys/{api_key_id}` → Enable/disable a key
- `GET /teams/{team_id}/api-keys/{api_key_id}/permissions` → Get the permission tree for a key
- `PUT /teams/{team_id}/api-keys/{api_key_id}/permissions` → Sync/set permissions

Billing, subscription & plans:
- `GET /teams/{team_id}/billing/cycle` → Current billing cycle
- `GET /teams/{team_id}/billing/transactions` → Transaction history (paginated)
- `GET /teams/{team_id}/billing/wallet` → Wallet balance
- `GET /teams/{team_id}/subscription` → Current subscription plan
- `POST /teams/{team_id}/subscription` → Switch to a different plan
- `PATCH /teams/{team_id}/subscription/policy` → Update the subscription policy
- `GET /plans` → List all available plans

Request logs:
- `GET /teams/{team_id}/scraper/endpoints/{endpoint_id}/logs` → Request logs for an endpoint
  - Query params: `page`, `per_page`, `from`, `to`, `status_code`, `sort_key`, `sort_order`

User:
- `GET /user/info` → Get current user info
- `PATCH /user/update` → Update profile (name, address, phone, company)
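
The request-log endpoint is paged with `page`/`per_page` (max 100 per page), and the transaction history is paginated too. Below is a minimal Python sketch of building the logs URL and walking every page. `logs_url`, `iter_pages`, and the `fetch_page` callback are hypothetical names, and two behaviours are assumptions not stated in this reference: pages are numbered from 1, and a page shorter than `per_page` is the last one.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode

BASE_URL = "https://api.thecrawlhub.com/api/v1"
# Query parameters documented for the request-log endpoint.
LOG_QUERY_PARAMS = {"page", "per_page", "from", "to", "status_code", "sort_key", "sort_order"}
MAX_PER_PAGE = 100  # documented page-size ceiling


def logs_url(team_id: str, endpoint_id: str, params: dict) -> str:
    """Build the request-log URL, rejecting parameters the docs don't list.

    `from` is a Python keyword, so parameters are passed as a plain dict.
    """
    unknown = set(params) - LOG_QUERY_PARAMS
    if unknown:
        raise ValueError(f"unsupported query params: {sorted(unknown)}")
    clean = dict(params)
    if "per_page" in clean:
        clean["per_page"] = min(int(clean["per_page"]), MAX_PER_PAGE)
    path = f"{BASE_URL}/teams/{team_id}/scraper/endpoints/{endpoint_id}/logs"
    return f"{path}?{urlencode(clean)}" if clean else path


def iter_pages(fetch_page: Callable[[int, int], List[dict]],
               per_page: int = MAX_PER_PAGE) -> Iterator[dict]:
    """Yield items page by page; assumes a short (or empty) page is the last one."""
    per_page = min(per_page, MAX_PER_PAGE)
    page = 1  # assumption: pages are 1-indexed
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:
            break
        page += 1
```

In practice `fetch_page(page, per_page)` would call `logs_url(...)` with those values, send the request with your bearer header, and return the list of log entries from the response body (whose exact shape this reference doesn't show).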
CrawlHub uses a per-record pricing model:
| Plan | Price | Rate Limit | Best For |
|---|---|---|---|
| Pay as you go | $1.79 / 1,000 records | 50 req/15min/endpoint | Testing, prototyping |
| Scaler | $299/month | 150 req/15min/endpoint | Teams in production |
| Business | $999/month | 600 req/15min/endpoint | High-scale data pipelines |
| Enterprise | Custom | Custom | Unique requirements, SLAs |
Rate limits apply per endpoint. Billing counts the records returned in each response, not the number of requests.
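
To stay under the per-endpoint limits and budget pay-as-you-go usage, a client can throttle itself and estimate cost up front. This Python sketch assumes the server's window semantics are not known; it uses a client-side sliding window, which is the conservative choice because staying under N requests in any 15-minute span also keeps you under N in any fixed 15-minute bucket. `EndpointRateLimiter`, `estimate_payg_cost`, and the `PLAN_LIMITS` keys are hypothetical names; the numbers come from the table above.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

PAYG_USD_PER_1000_RECORDS = 1.79
WINDOW_SECONDS = 15 * 60
# Requests per 15-minute window, per endpoint (hypothetical keys, values from the plan table).
PLAN_LIMITS = {"pay_as_you_go": 50, "scaler": 150, "business": 600}


def estimate_payg_cost(records: int) -> float:
    """Pay-as-you-go cost in USD; billing counts returned records, not requests."""
    return round(records / 1000 * PAYG_USD_PER_1000_RECORDS, 2)


class EndpointRateLimiter:
    """Client-side sliding-window limiter keeping one window per endpoint."""

    def __init__(self, max_requests: int, window: float = WINDOW_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent: Dict[str, Deque[float]] = defaultdict(deque)

    def try_acquire(self, endpoint_id: str) -> bool:
        """Record a request and return True if this endpoint still has budget."""
        now = self.clock()
        sent = self._sent[endpoint_id]
        # Drop timestamps that have slid out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) < self.max_requests:
            sent.append(now)
            return True
        return False
```

For example, `EndpointRateLimiter(PLAN_LIMITS["pay_as_you_go"])` allows 50 calls per endpoint in any 15-minute span, and `estimate_payg_cost(250_000)` gives roughly $447.50.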
Successful execution returns:
```json
{
  "data": {
    "records": [
      { "title": "...", "url": "...", "created_at": "...", ... }
    ]
  },
  "http_status": 200
}
```
Error responses include a `kind` field (e.g., `BAD_INPUT`, `ABORT_ERROR`, `HTTP_ERROR`, `REGISTRY_ERROR`) and a `details` field.
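
A client typically unwraps `data.records` on success and turns everything else into an exception carrying `kind` and `details`. In this minimal Python sketch, `extract_records` and `CrawlHubError` are hypothetical names, and the error-body layout is an assumption: this reference names the `kind` and `details` fields but not where they sit, so the code reads them from the top level of the body.

```python
from typing import Any, List


class CrawlHubError(Exception):
    """Hypothetical exception wrapping a CrawlHub error response."""

    def __init__(self, status: int, kind: str, details: Any = None):
        super().__init__(f"{status} {kind}: {details}")
        self.status = status
        self.kind = kind
        self.details = details


def extract_records(body: dict, status: int) -> List[dict]:
    """Return data.records from a successful execution, or raise CrawlHubError."""
    if 200 <= status < 300:
        return body.get("data", {}).get("records", [])
    # Assumption: error bodies carry `kind` and `details` at the top level.
    raise CrawlHubError(status, body.get("kind", "UNKNOWN"), body.get("details"))
```

Keeping `kind` on the exception lets callers branch on it, e.g. fix the request on `BAD_INPUT` but retry on a 503 `ABORT_ERROR`.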
```
POST /auth/login
Body: {"email": "user@example.com", "password": "password"}
Response: {"data": {"access_token": "...", "refresh_token": "..."}}
```
```
Authorization: Bearer eyJhbGc...
```
```
POST /auth/refresh
Body: {"refresh_token": "eyJhbGc..."}
```
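
The login, bearer-header, and refresh steps can be wrapped in a small token holder. This Python sketch uses only the standard library. `CrawlHubAuth`, `urllib_transport`, and the injectable `transport` callable are hypothetical; the sketch assumes `/auth/refresh` replies with the same `{"data": {...}}` shape as login and keeps the old refresh token if a new one isn't returned.

```python
import json
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.thecrawlhub.com/api/v1"

# A transport takes (method, url, headers, body) and returns the decoded JSON reply.
# Injecting it keeps the token logic testable without network access.
Transport = Callable[[str, str, dict, Optional[dict]], dict]


def urllib_transport(method: str, url: str, headers: dict, body: Optional[dict]) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class CrawlHubAuth:
    """Hypothetical helper holding the access/refresh token pair."""

    def __init__(self, transport: Transport = urllib_transport, base_url: str = BASE_URL):
        self.transport = transport
        self.base_url = base_url
        self.access_token: Optional[str] = None
        self.refresh_token: Optional[str] = None

    def _store(self, response: dict) -> None:
        # Login returns {"data": {"access_token": ..., "refresh_token": ...}};
        # assumption: /auth/refresh uses the same shape.
        data = response["data"]
        self.access_token = data["access_token"]
        self.refresh_token = data.get("refresh_token", self.refresh_token)

    def login(self, email: str, password: str) -> None:
        resp = self.transport("POST", f"{self.base_url}/auth/login",
                              {"Content-Type": "application/json"},
                              {"email": email, "password": password})
        self._store(resp)

    def refresh(self) -> None:
        if self.refresh_token is None:
            raise RuntimeError("log in before refreshing")
        resp = self.transport("POST", f"{self.base_url}/auth/refresh",
                              {"Content-Type": "application/json"},
                              {"refresh_token": self.refresh_token})
        self._store(resp)

    def headers(self) -> dict:
        """Authorization header to attach to every authenticated request."""
        if self.access_token is None:
            raise RuntimeError("not logged in")
        return {"Authorization": f"Bearer {self.access_token}"}
```

Passing a fake `transport` in tests avoids real credentials; in production the default `urllib_transport` sends the requests.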
```
GET /scraper/platforms
GET /scraper/platforms/{platform_id}
GET /scraper/endpoints/{endpoint_id}
```
```
GET /execution/endpoints/{endpoint_id}/execute?param1=value1&param2=value2
POST /execution/endpoints/{endpoint_id}/execute
Body (JSON): {"param1": "value1", "param2": "value2"}
```
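
The two execution styles differ only in where the parameters travel: the query string for `GET`, a JSON body for `POST`. This Python sketch builds (but does not send) either request with the standard library. `build_execute_request` is a hypothetical helper, and it deliberately covers only `GET` and `POST`, since this reference doesn't spell out how parameters are passed for the `PATCH`, `PUT`, and `DELETE` variants.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.thecrawlhub.com/api/v1"


def build_execute_request(endpoint_id: str, params: dict, token: str,
                          method: str = "GET") -> urllib.request.Request:
    """Build an execution request for one endpoint without sending it."""
    url = f"{BASE_URL}/execution/endpoints/{endpoint_id}/execute"
    headers = {"Authorization": f"Bearer {token}"}
    method = method.upper()
    if method == "GET":
        # GET: parameters go in the query string, joined with '&'.
        return urllib.request.Request(f"{url}?{urlencode(params)}",
                                      headers=headers, method="GET")
    if method == "POST":
        # POST: parameters go in a JSON body.
        headers["Content-Type"] = "application/json"
        body = json.dumps(params).encode("utf-8")
        return urllib.request.Request(url, data=body, headers=headers, method="POST")
    raise ValueError(f"this sketch only covers GET and POST, got {method}")
```

Send the result with `urllib.request.urlopen(req)`. Using the wrong verb for an endpoint should surface as a 405 `METHOD_NOT_ALLOWED`.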
| HTTP Status | Kind | Cause |
|---|---|---|
| 400 | BAD_INPUT | Invalid request parameters |
| 401 | AUTH_HEADER_FORMAT | Missing or malformed Authorization header |
| 401 | INVALID_CREDENTIALS | Wrong email/password |
| 403 | ABORT_ERROR | Permission denied (endpoint-level) |
| 404 | REGISTRY_ERROR | Endpoint not found |
| 405 | METHOD_NOT_ALLOWED | Wrong HTTP method for endpoint |
| 502 | HTTP_ERROR | Upstream platform returned error |
| 503 | ABORT_ERROR | Server busy, retry later |
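
Only some of these errors are worth retrying: 503 is documented as "server busy, retry later", while 4xx errors need a corrected request. This Python sketch retries with exponential backoff and reuses one `X-Request-ID` across attempts, which is what the tip about avoiding duplicate billing implies. Treating 502 as retryable is an assumption, and `execute_with_retry` and the `send` callback are hypothetical names.

```python
import random
import time
import uuid
from typing import Callable, Tuple

# 503 is documented as retryable; including 502 (upstream error) is an assumption.
RETRYABLE_STATUSES = {502, 503}

# send(headers) -> (status_code, body): a hypothetical hook wrapping your HTTP client.
Send = Callable[[dict], Tuple[int, dict]]


def execute_with_retry(send: Send, headers: dict, max_attempts: int = 4,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Retry transient failures, reusing one X-Request-ID for every attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    request_headers = dict(headers)
    # The same ID on every attempt lets the server recognise retries of one request.
    request_headers.setdefault("X-Request-ID", str(uuid.uuid4()))
    attempt = 1
    while True:
        status, body = send(request_headers)
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return status, body
        # Exponential backoff with jitter: up to 1s, 2s, 4s, ... (times base_delay).
        sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))
        attempt += 1
```

Generating the ID once, outside the loop, is the key design choice: a fresh ID per attempt would make each retry look like a new request.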

- Send an `X-Request-ID` header when retrying to avoid duplicate billing.
- Check `/plans` before executing to understand your current plan's rate limits.
- Track usage with `/teams/{team_id}/billing/transactions` and request logs.
- Paginate with `page` + `per_page` (max 100 per page).
- Re-fetch `/scraper/platforms` periodically to pick up newly added platforms.