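X Tweet Auto-Scraper

X Tweet Auto-Scraper: scrapes tweets from a specified x.com user. By default it collects tweets from 00:00 to 23:59 (Beijing time) of the previous day, and custom time ranges are also supported. Tweets are translated into a side-by-side Chinese/English format, pushed to a Feishu document as a daily report, and the document link is sent to WeChat.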
user_e8dcb574
Uncategorized · community · v1.0.0 · 1 version · 100000 · Key: not required

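Overview

X Tweet Auto-Scraper

Scrapes tweets from an x.com user, generates a bilingual Chinese/English daily report, publishes it as a Feishu cloud document, and pushes the link to WeChat.

Full Workflow:

  1. Launch Chrome CDP → 2. Scrape tweets → 3. Translate and format → 4. Push to Feishu → 5. Send link to WeChat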

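Two time range modes are supported:

  • Default mode: the previous day, 00:00 to 23:59 (Beijing time), i.e. one full day
  • Custom mode: any start and end time, specified via TIME_START / TIME_END (ISO 8601 format)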
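The default window can be sketched as follows. This is an illustrative calculation only, not the code in scrape_tweets.js; the function name `default_window` is hypothetical, and the end of the window is assumed to be exactly 23:59:00 (the script may use 23:59:59 instead).

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Beijing time is a fixed UTC+8 offset with no daylight saving time.
BEIJING = timezone(timedelta(hours=8))


def default_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) for the previous calendar day in Beijing time.

    `now` must be timezone-aware. The end is assumed to be 23:59:00.
    """
    today_midnight = now.astimezone(BEIJING).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = today_midnight - timedelta(days=1)
    end = today_midnight - timedelta(minutes=1)
    return start, end


if __name__ == "__main__":
    start, end = default_window(datetime.now(timezone.utc))
    print(start.isoformat(), end.isoformat())
```

Converting to Beijing time before truncating to midnight matters: a run at 01:30 UTC is already 09:30 in Beijing, so "the previous day" must be computed on the Beijing calendar rather than the UTC one.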

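Trigger Criteria

Use this skill when the user makes any of the following requests:

  • Scrape an x.com user's tweets and push them to Feishu
  • Generate a "daily report" from x.com content
  • Fetch a user's tweets from 00:00 to 24:00 of the previous day (Beijing time)
  • Send the Feishu daily report link to WeChat
  • "Scrape tweets from x.com and push to Feishu, send link to WeChat"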

Workflow

Phase 1 — Launch Chrome with CDP

The scraping script requires a logged-in Chrome instance with DevTools Protocol enabled. The user's normal Chrome cannot be reused directly (sandbox restrictions); a temporary profile must be created.

  1. Kill existing Chrome:

```bash
pkill -9 -f "Google Chrome"
```

  2. Copy essential session files to a temp profile (do NOT copy the full profile: it is tens of GB and will hang):

```bash
rm -rf /tmp/chrome-debug-profile
mkdir -p /tmp/chrome-debug-profile/Default

# Copy only the session-related files from the Default profile
for f in "Cookies" "Cookies-journal" "Login Data" "Login Data-journal" \
         "Network" "Preferences" "Web Data" "Web Data-journal"; do
  src="$HOME/Library/Application Support/Google/Chrome/Default/$f"
  [ -e "$src" ] && cp -r "$src" /tmp/chrome-debug-profile/Default/ 2>/dev/null
done

# Also copy top-level files
for f in "Local State" "Last Version"; do
  src="$HOME/Library/Application Support/Google/Chrome/$f"
  [ -e "$src" ] && cp "$src" /tmp/chrome-debug-profile/ 2>/dev/null
done
```

  3. Launch Chrome with the temp profile and debugging port:

```bash
nohup /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-debug-profile \
  --no-first-run \
  --no-default-browser-check \
  > /tmp/chrome-cdp.log 2>&1 &
sleep 6
```

  4. Verify CDP is reachable:

```bash
curl -s --connect-timeout 5 http://127.0.0.1:9222/json/version
```

If this returns JSON with a Browser field, Chrome is ready.

All of the above MUST use dangerouslyDisableSandbox: true; otherwise the macOS sandbox blocks process management of system Chrome.
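The readiness check in step 4 can also be done programmatically. The sketch below is an assumption-laden illustration rather than part of the bundled scripts: `cdp_ready` and `fetch_version` are hypothetical helper names, and the only thing taken from the text above is the `/json/version` endpoint and the presence of a `Browser` field.

```python
import json
import urllib.request


def cdp_ready(body: str) -> bool:
    """Return True if `body` is a JSON object with a non-empty "Browser" field."""
    try:
        info = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(info, dict) and bool(info.get("Browser"))


def fetch_version(base_url: str = "http://127.0.0.1:9222") -> str:
    """Fetch /json/version from the debug port; returns "" if unreachable."""
    try:
        with urllib.request.urlopen(base_url + "/json/version", timeout=5) as resp:
            return resp.read().decode("utf-8")
    except OSError:
        # Covers connection refused and timeouts (URLError subclasses OSError).
        return ""


if __name__ == "__main__":
    print("Chrome ready" if cdp_ready(fetch_version()) else "CDP not reachable")
```

Separating the parsing from the network call keeps the check easy to retry in a loop instead of relying on a fixed `sleep 6`.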

Phase 2 — Scrape Tweets

Run the bundled scraping script.

Default time range (previous day, 00:00 to 23:59 Beijing time):

cd <workspace> && \
  TARGET_USERNAME="DeItaone" \
  OUTPUT_DIR="<workspace>" \
  NODE_OPTIONS="" \
  NODE_PATH=<workspace_node_modules> \
  <node_path> <skill_dir>/scripts/scrape_tweets.js

Custom time range:

cd <workspace> && \
  TARGET_USERNAME="DeItaone" \
  TIME_START="2026-05-24T09:00:00+08:00" \
  TIME_END="2026-05-26T09:00:00+08:00" \
  OUTPUT_DIR="<workspace>" \
  NODE_OPTIONS="" \
  NODE_PATH=<workspace_node_modules> \
  <node_path> <skill_dir>/scripts/scrape_tweets.js

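Environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| TARGET_USERNAME | x.com username (without @) | DeItaone |
| TIME_START | Start of the time window (ISO 8601), e.g. 2026-05-24T09:00:00+08:00 | Computed automatically (previous day 00:00 Beijing time) |
| TIME_END | End of the time window (ISO 8601), e.g. 2026-05-26T09:00:00+08:00 | Computed automatically (previous day 23:59 Beijing time) |
| OUTPUT_DIR | Output directory for tweets_raw.json | Current working directory |
| CDP_URL | Chrome DevTools Protocol address | http://127.0.0.1:9222 |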

Time Format:

  • Timezone offsets are supported: 2026-05-24T09:00:00+08:00 (Beijing time), 2026-05-24T01:00:00Z (UTC)
  • TIME_START and TIME_END must either both be provided or both be omitted
  • TIME_START must be earlier than TIME_END

The script:

If the script reports "Not logged in", the temp profile did not retain the session. In that case, ask the user to log into x.com in the debug Chrome window and re-run.

Phase 3 — Translate & Format

  1. Read tweets_raw.json and translate each tweet into Chinese. Keep ticker tags ($NVDA, $TSLA) in the translation. Use financial-news terminology.

  2. Write translations as a JSON mapping file translations.json:

```json
{
  "ENGLISH PREFIX TEXT...": "中文翻译...",
  ...
}
```

Use the first 60–80 characters of each English tweet as the key.

  3. Run the formatting script:

```bash
TRANSLATIONS_PATH=/translations.json \
python3 /scripts/format_for_feishu.py \
  /tweets_raw.json \
  /report.md \
  --title "Title Daily Report" \
  --author @username
```

  4. The script produces a markdown file with the structure defined in references/feishu_format.md. It matches translations by longest prefix match and inserts placeholders for any unmatched tweets.

  5. Scan the generated markdown for (翻译待补充) placeholders ("translation pending"). For any that remain, add the translations manually by editing the markdown file.
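The key convention from step 2 and the longest-prefix matching from step 4 can be illustrated as follows. This is a sketch of the general technique, not the code in format_for_feishu.py: `make_key` and `match_translation` are hypothetical names, the 80-character key length is one choice within the stated 60 to 80 range, and exact `startswith` comparison (no whitespace normalization) is an assumption.

```python
from typing import Mapping

PLACEHOLDER = "(翻译待补充)"  # the placeholder string named in step 5


def make_key(tweet_text: str, length: int = 80) -> str:
    """Use the leading characters of the English tweet as the mapping key."""
    return tweet_text[:length]


def match_translation(tweet_text: str, translations: Mapping[str, str]) -> str:
    """Return the translation whose key is the longest prefix of the tweet."""
    best_key = ""
    for key in translations:
        if key and tweet_text.startswith(key) and len(key) > len(best_key):
            best_key = key
    return translations[best_key] if best_key else PLACEHOLDER


if __name__ == "__main__":
    table = {
        "FED": "美联储",
        "FED'S POWELL": "美联储主席鲍威尔",
    }
    print(match_translation("FED'S POWELL: RATES UNCHANGED", table))
    print(match_translation("ECB HOLDS RATES", table))
```

Preferring the longest matching key avoids a short, generic prefix shadowing a more specific one when several tweets begin with the same words.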

Phase 4 — Push to Feishu

Use lark-cli to create a Feishu cloud document from the markdown:

LARK_CLI="<path-to-lark-cli>"
NODE_OPTIONS="" "$LARK_CLI" docs +create \
  --api-version v2 \
  --as user \
  --doc-format markdown \
  --content "@<path-to-report.md>" \
  --title "Title Daily Report"
  • Always use --api-version v2 and --doc-format markdown.
  • The @ prefix on --content signals a file path.
  • Run from the directory containing the markdown, or use an absolute path.
  • The command returns a Feishu document URL. Save this URL, since it is sent to WeChat in Phase 5. Also display it to the user.
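When the command is assembled from a script, the absolute-path advice can be applied automatically. The sketch below only rearranges the flags already shown in this phase; `build_create_command` is a hypothetical helper, and nothing here is part of lark-cli itself.

```python
from pathlib import Path
from typing import List


def build_create_command(lark_cli: str, report_path: str, title: str) -> List[str]:
    """Build the argument list for creating a Feishu doc from a markdown file."""
    # Resolve to an absolute path so the command works from any directory.
    absolute = Path(report_path).resolve()
    return [
        lark_cli, "docs", "+create",
        "--api-version", "v2",
        "--as", "user",
        "--doc-format", "markdown",
        "--content", f"@{absolute}",  # "@" marks the value as a file path
        "--title", title,
    ]


if __name__ == "__main__":
    # To actually run it, pass the list to subprocess.run with NODE_OPTIONS
    # cleared in the environment, mirroring the shell example above.
    print(build_create_command("<path-to-lark-cli>", "report.md", "Title Daily Report"))
```

Passing the arguments as a list avoids shell quoting problems with titles or paths that contain spaces.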

Phase 5 — Push Link to WeChat

After the Feishu document is created, send the link to the user's WeChat via the WorkBuddy Mini Program (a WeChat Mini Program) using deliver_attachments.

  1. Create a simple summary file containing the Feishu link:

```bash
cat > /feishu_link.md << 'EOF'
# 📊 Tweet daily report generated

Feishu Doc Link:

Author: @

Time Range:

Tweet Count:

---

Click the link above to view the full report.
EOF
```

  2. Use deliver_attachments to push the summary to WeChat:

```
deliver_attachments({
  attachments: ["/feishu_link.md"],
  explanation: "Push the Feishu daily report link to the WeChat Mini Program"
})
```

> Note: This requires the user to have the "产物回传到小程序" (Deliver Artifacts to Mini Program) toggle enabled in the WorkBuddy Mini Program connection settings.

Phase 6 — Cleanup

  • Close the debug Chrome: pkill -f "chrome-debug-profile"
  • The tweets_raw.json, translations.json, report.md, and feishu_link.md files in the workspace are intermediate artifacts. Keep them for traceability.

Key Pitfalls

| Problem | Cause | Fix |
| --- | --- | --- |
| CDP connection refused | Chrome not running with --remote-debugging-port | Re-launch Chrome per Phase 1 |
| "Not logged in" in script output | Temp profile missing cookies | Ask user to log in via the debug Chrome window |
| Full profile copy hangs | Chrome profile is 10–50 GB | Only copy the files listed in Phase 1 step 2 |
| xcancel.com pagination blocked | Anti-bot verification | Never use xcancel/Nitter; always use x.com via CDP |
| lark-cli auth expired | Token TTL | Re-run; the CLI auto-refreshes |

Bundle Contents

  • scripts/scrape_tweets.js — Playwright CDP scraper for x.com timelines
  • scripts/format_for_feishu.py — Generates bilingual markdown from raw tweets
  • references/feishu_format.md — Document structure and Feishu CLI reference

Version History

1 version in total

  • v1.0.0 Initial release (current)
    2026-05-28 15:17, security status: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
