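Overview

X Tweet Auto-Scraper

Scrapes a user's tweets from x.com, generates a bilingual Chinese-English daily report, publishes it as a Feishu cloud document, and pushes the link to WeChat.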

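Full Workflow:

1. Launch Chrome with CDP → 2. Scrape tweets → 3. Translate and format → 4. Push to Feishu → 5. Send the link to WeChat

Two time-range modes are supported:

- Default mode: the previous day, 00:00 to 23:59 Beijing time (one full calendar day)
- Custom mode: any start and end time, set through TIME_START / TIME_END in ISO 8601 format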
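The default window can be sketched as follows. This is a minimal illustration, not the bundled script's code: the helper name `default_window` is hypothetical, and ending the window at exactly 23:59:00 is an assumption about how "23:59" is interpreted.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # Beijing time is UTC+8 with no DST

def default_window(now=None):
    """Return (start, end) covering the previous Beijing calendar day."""
    now = (now or datetime.now(BEIJING)).astimezone(BEIJING)
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = today_midnight - timedelta(days=1)
    end = today_midnight - timedelta(minutes=1)  # assumed: 23:59:00 of the previous day
    return start, end

start, end = default_window(datetime(2026, 5, 28, 15, 17, tzinfo=BEIJING))
print(start.isoformat())  # 2026-05-27T00:00:00+08:00
print(end.isoformat())    # 2026-05-27T23:59:00+08:00
```

Converting `now` to Beijing time first means the result does not depend on the timezone of the machine running the job.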

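Trigger Criteria

Use this skill when the user asks to:

- Scrape a user's tweets from x.com and push them to Feishu
- Generate a "daily report" from x.com content
- Fetch a user's tweets from 00:00 to 24:00 of the previous day (Beijing time)
- Send the Feishu daily-report link to WeChat
- "Scrape tweets from x.com and push to Feishu, send link to WeChat"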

Workflow

Phase 1 — Launch Chrome with CDP

The scraping script requires a logged-in Chrome instance with the DevTools Protocol enabled. The user's normal Chrome cannot be reused directly (sandbox restrictions); a temporary profile must be created.

Kill existing Chrome:

```bash
pkill -9 -f "Google Chrome"
```

Copy essential session files to a temp profile (do NOT copy the full profile: it is tens of GB and will hang):

```bash
rm -rf /tmp/chrome-debug-profile
mkdir -p /tmp/chrome-debug-profile/Default

# Copy session files from the Default profile
for f in "Cookies" "Cookies-journal" "Login Data" "Login Data-journal" \
         "Network" "Preferences" "Web Data" "Web Data-journal"; do
  src="$HOME/Library/Application Support/Google/Chrome/Default/$f"
  [ -e "$src" ] && cp -r "$src" /tmp/chrome-debug-profile/Default/ 2>/dev/null
done

# Also copy top-level files
for f in "Local State" "Last Version"; do
  src="$HOME/Library/Application Support/Google/Chrome/$f"
  [ -e "$src" ] && cp "$src" /tmp/chrome-debug-profile/ 2>/dev/null
done
```

Launch Chrome with the temp profile and debugging port:

```bash
nohup /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-debug-profile \
  --no-first-run \
  --no-default-browser-check \
  > /tmp/chrome-cdp.log 2>&1 &
sleep 6
```

Verify CDP is reachable:

```bash
curl -s --connect-timeout 5 http://127.0.0.1:9222/json/version
```

If this returns JSON with a Browser field, Chrome is ready.

All of the above MUST use `dangerouslyDisableSandbox: true`; otherwise the macOS sandbox blocks process management of system Chrome.
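As an alternative to a fixed `sleep 6`, the readiness check can be polled. The sketch below is illustrative only: `is_cdp_ready` and `wait_for_cdp` are hypothetical helper names, and the attempt count and delay are arbitrary defaults rather than values from the skill.

```python
import json
import time
import urllib.request

def is_cdp_ready(body: str) -> bool:
    """True when the /json/version response is a JSON object with a Browser field."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and bool(data.get("Browser"))

def wait_for_cdp(url="http://127.0.0.1:9222/json/version", attempts=10, delay=1.0):
    """Poll the endpoint until Chrome answers or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_cdp_ready(resp.read().decode("utf-8")):
                    return True
        except OSError:
            pass  # connection refused or timed out: Chrome is not up yet
        time.sleep(delay)
    return False
```

Calling `wait_for_cdp()` after launching Chrome returns `True` as soon as the endpoint responds, and `False` if it never does within the allotted attempts.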

Phase 2 — Scrape Tweets

Run the bundled scraping script.

Default time range (previous day, 00:00 to 23:59 Beijing time):

```bash
cd <workspace> && \
  TARGET_USERNAME="DeItaone" \
  OUTPUT_DIR="<workspace>" \
  NODE_OPTIONS="" \
  NODE_PATH=<workspace_node_modules> \
  <node_path> <skill_dir>/scripts/scrape_tweets.js
```

Custom time range:

```bash
cd <workspace> && \
  TARGET_USERNAME="DeItaone" \
  TIME_START="2026-05-24T09:00:00+08:00" \
  TIME_END="2026-05-26T09:00:00+08:00" \
  OUTPUT_DIR="<workspace>" \
  NODE_OPTIONS="" \
  NODE_PATH=<workspace_node_modules> \
  <node_path> <skill_dir>/scripts/scrape_tweets.js
```

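Environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| `TARGET_USERNAME` | x.com username (without `@`) | `DeItaone` |
| `TIME_START` | Start of the time window (ISO 8601), e.g. `2026-05-24T09:00:00+08:00` | Computed automatically (previous day, 00:00 Beijing time) |
| `TIME_END` | End of the time window (ISO 8601), e.g. `2026-05-26T09:00:00+08:00` | Computed automatically (previous day, 23:59 Beijing time) |
| `OUTPUT_DIR` | Output directory for `tweets_raw.json` | Current working directory |
| `CDP_URL` | Chrome DevTools Protocol address | `http://127.0.0.1:9222` |

Time Format:

- Timezone offsets are supported: `2026-05-24T09:00:00+08:00` (Beijing time), `2026-05-24T01:00:00Z` (UTC)
- TIME_START and TIME_END must be provided together or omitted together
- TIME_START must be earlier than TIME_END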
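The rules above can be checked before the scraper is launched. The following is a sketch under stated assumptions, not the validation inside `scrape_tweets.js`: `parse_iso` and `resolve_window` are hypothetical names, and rejecting timestamps that carry no offset is an extra assumption that the rules do not state.

```python
from datetime import datetime

def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset; accept a trailing Z."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # fromisoformat rejects "Z" before Python 3.11
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is treated as an error
        raise ValueError(f"missing timezone offset: {value}")
    return parsed

def resolve_window(time_start, time_end):
    """Return (start, end), or None when both are omitted (default mode)."""
    if not time_start and not time_end:
        return None
    if not time_start or not time_end:
        raise ValueError("TIME_START and TIME_END must be set together")
    start, end = parse_iso(time_start), parse_iso(time_end)
    if start >= end:
        raise ValueError("TIME_START must be earlier than TIME_END")
    return start, end
```

Because both values are parsed into timezone-aware datetimes, a start given in UTC and an end given in Beijing time still compare correctly.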

The script connects to the debug Chrome over CDP (`CDP_URL`) and writes `tweets_raw.json` to `OUTPUT_DIR`.

If the script reports "Not logged in", the temp profile did not retain the session. In that case, ask the user to log into x.com in the debug Chrome window and re-run.

Phase 3 — Translate & Format

Read tweets_raw.json and translate each tweet into Chinese. Keep ticker tags ($NVDA, $TSLA) in the translation. Use financial-news terminology.

Write translations as a JSON mapping file translations.json:

```json
{
  "ENGLISH PREFIX TEXT...": "中文翻译...",
  ...
}
```

Use the first 60–80 characters of each English tweet as the key.
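One way to derive the keys is sketched below. It takes a plain list of tweet texts because the schema of `tweets_raw.json` is not described here; the function names, the choice of 80 characters, and the sample tweet are all illustrative assumptions.

```python
import json

KEY_LENGTH = 80  # assumed: the upper end of the 60-80 character range

def translation_key(text: str, length: int = KEY_LENGTH) -> str:
    """Use the leading characters of the English tweet as its lookup key."""
    return text.strip()[:length]

def translation_skeleton(texts):
    """Map each key to an empty string, ready to be filled with Chinese text."""
    return {translation_key(t): "" for t in texts}

texts = ["$NVDA SHARES RISE 3% IN PREMARKET TRADING"]  # made-up sample tweet
print(json.dumps(translation_skeleton(texts), ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps the Chinese translations readable in the JSON file once they are filled in.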

Run the formatting script:

```bash
TRANSLATIONS_PATH=/translations.json \
python3 /scripts/format_for_feishu.py \
  /tweets_raw.json \
  /report.md \
  --title "Title 日报" \
  --author @username
```

The script produces a markdown file with the structure defined in references/feishu_format.md. It matches each tweet to its translation by longest-prefix match and inserts a placeholder for any tweet left unmatched.
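Longest-prefix matching can be sketched as follows. This illustrates the technique only and is not the code in `format_for_feishu.py`; the function name `match_translation` and the exact-string comparison (no whitespace or case normalization) are assumptions.

```python
PLACEHOLDER = "（翻译待补充）"  # the "translation pending" marker used in the report

def match_translation(text, translations):
    """Return the translation whose key is the longest prefix of text."""
    best_key = ""
    for key in translations:
        # Keep the longest key that the tweet text starts with
        if key and text.startswith(key) and len(key) > len(best_key):
            best_key = key
    return translations[best_key] if best_key else PLACEHOLDER

translations = {"FED HOLDS": "A", "FED HOLDS RATES": "B"}  # made-up sample mapping
print(match_translation("FED HOLDS RATES STEADY", translations))  # B
```

Preferring the longest key matters when two tweets share an opening phrase: the more specific key wins, so each tweet gets its own translation.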

Scan the generated markdown for （翻译待补充） ("translation pending") placeholders. For any that remain, add the translations by editing the markdown file directly.
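The scan can be automated with a few lines. This is a minimal sketch; `find_placeholders` is a hypothetical helper, and the sample text stands in for the contents of the generated report.

```python
def find_placeholders(markdown: str, marker: str = "（翻译待补充）"):
    """Return the 1-based line numbers that still contain the placeholder."""
    return [
        number
        for number, line in enumerate(markdown.splitlines(), start=1)
        if marker in line
    ]

sample = "Tweet one\n（翻译待补充）\nTweet two\n已翻译"  # made-up report excerpt
print(find_placeholders(sample))  # [2]
```

An empty list means every tweet was matched and the report is ready to push.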

Phase 4 — Push to Feishu

Use lark-cli to create a Feishu cloud document from the markdown:

```bash
LARK_CLI="<path-to-lark-cli>"
NODE_OPTIONS="" "$LARK_CLI" docs +create \
  --api-version v2 \
  --as user \
  --doc-format markdown \
  --content "@<path-to-report.md>" \
  --title "Title 日报"
```

- Always use --api-version v2 and --doc-format markdown.
- The @ prefix on --content signals a file path.
- Run from the directory containing the markdown, or use an absolute path.
- The command returns a Feishu document URL. Save this URL for Phase 5, where it is sent to WeChat, and display it to the user.
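When the step is driven from a script, the command and URL capture might look like the sketch below. The flags are the ones shown in this phase; everything else is an assumption, in particular that the URL appears somewhere in the CLI's standard output, since the output format is not documented here. All three function names are hypothetical.

```python
import os
import re
import subprocess

def build_create_command(lark_cli, report_path, title):
    """Assemble the argv for `docs +create` using the flags from Phase 4."""
    return [
        lark_cli, "docs", "+create",
        "--api-version", "v2",
        "--as", "user",
        "--doc-format", "markdown",
        "--content", f"@{report_path}",  # "@" marks the value as a file path
        "--title", title,
    ]

def extract_url(output):
    """Return the first http(s) URL found in the output, or None (format assumed)."""
    match = re.search(r"https?://[^\s\"'<>]+", output)
    return match.group(0) if match else None

def create_document(lark_cli, report_path, title):
    """Run the CLI with NODE_OPTIONS cleared and return the document URL."""
    env = dict(os.environ, NODE_OPTIONS="")
    result = subprocess.run(
        build_create_command(lark_cli, report_path, title),
        capture_output=True, text=True, env=env, check=True,
    )
    return extract_url(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with titles that contain spaces or non-ASCII characters.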

Phase 5 — Push Link to WeChat

After the Feishu document is created, send the link to the user's WeChat via the WorkBuddy Mini Program (WeChat Mini Program) using deliver_attachments.

Create a simple summary file containing the Feishu link:

```bash
cat > /feishu_link.md << 'EOF'
# 📊 推文日报已生成

飞书文档链接 / Feishu Doc Link：

博主 / Author： @

时间范围 / Time Range：

推文数量 / Tweet Count：

---

点击上方链接查看完整日报 / Click the link above to view the full report.
EOF
```
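If the summary is generated programmatically, the template can be filled as sketched below. The labels are copied from the template above; placing each value directly after its label is an assumption, and `render_summary` is a hypothetical helper name.

```python
def render_summary(doc_url, author, time_range, tweet_count):
    """Fill the Phase 5 template; value placement after each label is assumed."""
    lines = [
        "# 📊 推文日报已生成",
        "",
        f"飞书文档链接 / Feishu Doc Link：{doc_url}",
        "",
        f"博主 / Author： @{author}",
        "",
        f"时间范围 / Time Range：{time_range}",
        "",
        f"推文数量 / Tweet Count：{tweet_count}",
        "",
        "---",
        "",
        "点击上方链接查看完整日报 / Click the link above to view the full report.",
    ]
    return "\n".join(lines) + "\n"
```

The blank line before `---` keeps it a horizontal rule; without it, markdown would render the preceding line as a heading.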

Use deliver_attachments to push the summary to WeChat:

```
deliver_attachments({
  attachments: ["/feishu_link.md"],
  explanation: "推送飞书日报链接到微信小程序"
})
```

> Note: This requires the user to have the "产物回传到小程序" (Deliver Artifacts to Mini Program) toggle enabled in the WorkBuddy Mini Program connection settings.

Phase 6 — Cleanup

- Close the debug Chrome: `pkill -f "chrome-debug-profile"`
- The tweets_raw.json, translations.json, report.md, and feishu_link.md files in the workspace are intermediate artifacts. Keep them for traceability.

Key Pitfalls

| Problem | Cause | Fix |
| --- | --- | --- |
| CDP connection refused | Chrome not running with `--remote-debugging-port` | Re-launch Chrome per Phase 1 |
| "Not logged in" in script output | Temp profile missing cookies | Ask user to log in via the debug Chrome window |
| Full profile copy hangs | Chrome profile is 10–50 GB | Only copy the files listed in Phase 1 step 2 |
| xcancel.com pagination blocked | Anti-bot verification | Never use xcancel/Nitter; always use x.com via CDP |
| `lark-cli` auth expired | Token TTL | Re-run; the CLI auto-refreshes |

Bundle Contents

scripts/scrape_tweets.js — Playwright CDP scraper for x.com timelines
scripts/format_for_feishu.py — Generates bilingual markdown from raw tweets
references/feishu_format.md — Document structure and Feishu CLI reference

Version History

1 version in total

v1.0.0 Initial release (current)

2026-05-28 15:17
