← 返回
开发者工具 Key 中文

Twitter Scrape

Scrape Twitter profiles and tweets via GraphQL, export to JSON or database
Scrape Twitter profiles and tweets via GraphQL, export to JSON or database
phy041
开发者工具 clawhub v1.0.2 2 版本 100000 Key: 需要
★ 0
Stars
📥 1,012
下载
💾 18
安装
2
版本
#graphql#latest#scraping#twitter

概述

Twitter/X Scraper Skill

Scrape Twitter profiles and tweets using the rnet_twitter.py GraphQL client. Bypasses Cloudflare via TLS fingerprint emulation. Saves to local JSON and optionally imports to a database.


Prerequisites

# Install rnet (Rust HTTP client with Chrome TLS emulation)
pip install "rnet>=3.0.0rc20" --pre

# Required:
# 1. rnet_twitter.py — async Twitter GraphQL client
#    Get it: https://github.com/PHY041/rnet-twitter-client
# 2. Twitter cookies file (set TWITTER_COOKIES_PATH env var)
#    Format: [{"name": "auth_token", "value": "..."}, {"name": "ct0", "value": "..."}]

Getting Cookies

  1. Open Chrome -> go to x.com -> log in
  2. DevTools (F12) -> Application -> Cookies -> https://x.com
  3. Copy auth_token and ct0 values
  4. Save to JSON file. Cookies expire ~2 weeks.

Quick Usage

Scrape a Twitter User

import asyncio, json, os
from rnet_twitter import RnetTwitterClient

async def scrape_user(username: str, limit: int = 200):
    client = RnetTwitterClient()
    cookies_path = os.environ.get("TWITTER_COOKIES_PATH", "twitter_cookies.json")
    client.load_cookies(cookies_path)

    # Get user profile
    user = await client.get_user_by_screen_name(username)

    # Get tweets
    tweets = await client.get_user_tweets(user["rest_id"], count=limit)

    # Save to JSON
    output = {
        "scraped_at": datetime.now().isoformat(),
        "profile": {
            "id": user["rest_id"],
            "username": username,
            "name": user.get("name", ""),
            "bio": user.get("description", ""),
            "followers_count": user.get("followers_count", 0),
            "following_count": user.get("friends_count", 0),
        },
        "tweets": tweets,
        "tweets_count": len(tweets),
    }

    output_path = f"storage/twitter/{username}.json"
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(output, f, indent=2, default=str)

    return output_path

# Usage
asyncio.run(scrape_user("elonmusk", limit=200))

Output Format

{
  "scraped_at": "2026-01-22T10:30:00",
  "profile": {
    "id": "123456789",
    "username": "example",
    "name": "Example User",
    "bio": "...",
    "followers_count": 1234567,
    "following_count": 1234
  },
  "tweets": [
    {
      "id": "1234567890123456789",
      "text": "Tweet content...",
      "created_at": "Thu Jan 22 16:45:03 +0000 2026",
      "likes": 12345,
      "retweets": 1234,
      "replies": 567,
      "views": 1234567,
      "url": "https://x.com/example/status/1234567890123456789",
      "is_retweet": false
    }
  ],
  "tweets_count": 200
}

Optional: Database Import

To import scraped tweets into a PostgreSQL/Supabase database:

CREATE TABLE twitter_content (
  id TEXT PRIMARY KEY,
  username TEXT NOT NULL,
  text TEXT,
  created_at TIMESTAMPTZ,
  likes INTEGER DEFAULT 0,
  retweets INTEGER DEFAULT 0,
  replies INTEGER DEFAULT 0,
  views BIGINT,
  url TEXT,
  is_retweet BOOLEAN DEFAULT FALSE,
  imported_at TIMESTAMPTZ DEFAULT NOW()
);
from supabase import create_client
import os

supabase = create_client(
    os.environ["SUPABASE_URL"],
    os.environ["SUPABASE_KEY"]
)

# Import from scraped JSON
import json
data = json.load(open("storage/twitter/example.json"))
for tweet in data["tweets"]:
    supabase.table("twitter_content").upsert({
        "id": tweet["id"],
        "username": data["profile"]["username"],
        "text": tweet["text"],
        "likes": tweet.get("likes", 0),
        "retweets": tweet.get("retweets", 0),
        "replies": tweet.get("replies", 0),
        "views": tweet.get("views"),
        "url": tweet.get("url"),
        "is_retweet": tweet.get("is_retweet", False),
    }).execute()

Troubleshooting

ErrorCauseSolution
------------------------
403 ForbiddenCookies expiredRefresh auth_token + ct0 from Chrome
404 Not FoundGraphQL ID rotatedRe-extract from abs.twimg.com/.../main.*.js
User not foundUsername wrong/suspendedCheck on x.com
Rate limitedToo many requestsWait 15 minutes

Technical Notes

  • SearchTimeline requires POST (GET returns 404)
  • GraphQL endpoint IDs may rotate — re-extract from Twitter's JS bundle
  • Rate limits: ~300 requests/15min window
  • Uses Emulation.Chrome133 to bypass Cloudflare detection

版本历史

共 2 个版本

  • v1.0.2 当前
    2026-05-21 12:29 安全 安全
  • v1.0.0
    2026-03-30 03:42 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-intelligence

Phy Lenny Mentor

phy041
由300+期Lenny播客驱动的AI产品导师。提炼Brian Chesky、Shreyas Doshi、April Dunford等世界级领袖的智慧。T...
★ 1 📥 583
developer-tools

Github

steipete
使用 `gh` CLI 与 GitHub 交互,通过 `gh issue`、`gh pr`、`gh run` 和 `gh api` 管理议题、PR、CI 运行及高级查询。
★ 666 📥 323,714
developer-tools

CodeConductor.ai

larsonreever
AI驱动平台,提供快速全栈开发、智能体、工作流自动化及低代码AI集成的可扩展产品创建。
★ 65 📥 179,753