Fixes:
fetch_with_playwright now uses mobile Chromium (is_mobile=True + iPhone UA + 393×852 viewport), so temporary share links (tempkey) render correctly
data-src image loading
Comparison (v1.0 → v1.1):
| Item | Old | New |
|------|------|------|
| User Agent | Desktop Chrome | iPhone Safari |
| Viewport | 1280×900 | 393×852 |
| Temporary links | ❌ Fail to render | ✅ Work |
| Lazy-loaded images | ❌ | ✅ Triggered by scrolling |
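
The mobile settings in the table can be sketched roughly as follows. This is a minimal illustration, not the script's actual `fetch_with_playwright`: the helper names, the exact UA string, the `has_touch` option, and the scroll timing are all assumptions, and only the viewport size and `is_mobile=True` come from the changelog.

```python
# Hypothetical sketch: values mirror the v1.1 column of the table above.
# The UA string and helper names are illustrative, not taken from the script.
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
    "Mobile/15E148 Safari/604.1"
)


def mobile_context_options() -> dict:
    """Keyword arguments for Playwright's browser.new_context()."""
    return {
        "user_agent": MOBILE_UA,
        "viewport": {"width": 393, "height": 852},
        "is_mobile": True,
        "has_touch": True,  # assumption: not mentioned in the changelog
    }


def fetch_mobile_html(url: str, scroll_steps: int = 10) -> str:
    """Render a page in mobile Chromium and scroll to trigger lazy images."""
    # Imported lazily so the options helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**mobile_context_options())
        page = context.new_page()
        page.goto(url, wait_until="domcontentloaded")
        for _ in range(scroll_steps):
            page.mouse.wheel(0, 852)    # scroll one viewport height
            page.wait_for_timeout(300)  # give lazy loaders time to fire
        html = page.content()
        browser.close()
        return html
```

Scrolling in viewport-sized steps is one simple way to make `data-src` images load; the step count and delay would likely need tuning for long articles.
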
Convert WeChat Official Account articles (mp.weixin.qq.com) into clean, high-quality Markdown. The skill uses a Python script optimized for WeChat's unique DOM structure, featuring deep noise removal, smart code block detection, rich text preservation, and intelligent paragraph formatting.
User provides WeChat article URL?
├── Yes → Go to Step 1: Install Dependencies & Run Script
├── User wants to convert HTML directly?
│   └── Use Step 2: In-Line Conversion (for fetched HTML)
└── User asks about multiple URLs?
    └── Use batch mode with -f flag
```bash
pip install requests beautifulsoup4 markdownify
```
```bash
python scripts/wechat_to_md.py "
```
Options:
--no-images — Skip image downloading, keep remote URLs
--no-frontmatter — Omit YAML frontmatter
python scripts/wechat_to_md.py url1 url2 url3
```
└──
├──
└── images/
├── img_000.png
└── img_001.jpg
```
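
For batch runs, it may help to validate the URL list before handing it to the script. The sketch below assumes a plain-text list with one URL per line and `#` comments; that file format and the `load_urls` helper are hypothetical and not confirmed by the script.

```python
from urllib.parse import urlparse


def load_urls(text: str) -> list[str]:
    """Parse a URL list: one URL per line, '#' comments and blanks ignored."""
    urls: list[str] = []
    seen: set[str] = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Only WeChat Official Account article hosts are accepted.
        if urlparse(line).hostname != "mp.weixin.qq.com":
            continue
        if line not in seen:  # drop duplicates, keep first-seen order
            seen.add(line)
            urls.append(line)
    return urls
```

Filtering by hostname up front keeps unsupported links from surfacing later as parse failures.
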
If the HTML has already been fetched (e.g., via web_fetch), use the script's convert_simple() function programmatically:
import sys
sys.path.insert(0, "<SKILL_DIR>/scripts")
from wechat_to_md import convert_simple
# Basic usage: convert only, without downloading images
result = convert_simple("https://mp.weixin.qq.com/s/xxxxx")
markdown = result["markdown"] # Full Markdown string
metadata = result["metadata"] # {title, author, date, url, ...}
code_blocks = result["code_blocks"] # [{lang, code}, ...]
image_urls = result["image_urls"] # List of original image URLs
# Advanced usage: also download images to local disk
result = convert_simple(
    "https://mp.weixin.qq.com/s/xxxxx",
    download_imgs=True, # Enable image download
    output_dir="./my_article" # Output directory (optional)
)
markdown = result["markdown"] # Image links are replaced with local paths
image_mapping = result["image_mapping"] # URL -> local path mapping
output_dir = result["output_dir"] # Actual output directory
Return the Markdown content directly to the user or write it to a file.
.md file and present a summary.
The script removes 30+ WeChat-specific noise elements including:
.mp_profile_iframe, #ad_content)
.reward_area, .qr_code_pc)
#comment_container, #js_cmt_area)
mpvoice, mpvideo)
#relation_article)
display:none, visibility:hidden)
placeholders
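
As an illustration of the hidden-element rule (`display:none`, `visibility:hidden`), the check on an inline `style` attribute could look like the sketch below. The `is_hidden` helper and its regex are assumptions for illustration, not the script's own code.

```python
import re
from typing import Optional

# Matches inline styles that hide an element; whitespace around ':' is allowed.
_HIDDEN_RE = re.compile(
    r"(?:^|;)\s*(?:display\s*:\s*none|visibility\s*:\s*hidden)\b",
    re.IGNORECASE,
)


def is_hidden(style: Optional[str]) -> bool:
    """Return True when an inline style attribute hides the element."""
    return bool(style) and bool(_HIDDEN_RE.search(style))
```

With BeautifulSoup, one plausible use is to loop over `soup.find_all(style=True)` and call `tag.decompose()` on every tag whose style passes this check, while selector-based noise such as `.reward_area` can be removed via `soup.select(...)`.
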
Handles all 3 WeChat code block formats:
pre.code-snippet with data-lang attribute
.code-snippet__fix container with nested pre[data-lang]
pre[data-lang]
Features:
data-lang, CSS class, and code content
.code-snippet__line-index)
counter(line) garbage text)
→ , → , handles inline font-weight: bold
•, ·, 1., (1)) to proper Markdown lists
data-src → src)
→ space, zero-width spaces removed)
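
Two of the text-level features above, whitespace cleanup and conversion of pseudo-list markers (•, ·, 1., (1)), can be sketched with the standard library alone. The function names and regexes are hypothetical; the script's real rules may differ.

```python
import re

# Zero-width characters that commonly leak from rich-text editors.
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")
# Bullet glyphs followed by optional whitespace, e.g. "•item" or "· item".
_BULLET = re.compile(r"^\s*[•·]\s*(.+)$")
# Numbered markers such as "(1) item" or "1. item".
_PAREN_NUM = re.compile(r"^\s*\((\d+)\)\s*(.+)$")
_DOT_NUM = re.compile(r"^\s*(\d+)\.\s+(.+)$")


def clean_text(text: str) -> str:
    """Replace non-breaking spaces and strip zero-width characters."""
    return _ZERO_WIDTH.sub("", text.replace("\u00a0", " "))


def to_markdown_list_item(line: str) -> str:
    """Convert one pseudo-list line to Markdown; other lines pass through."""
    line = clean_text(line)
    if m := _BULLET.match(line):
        return f"- {m.group(1).strip()}"
    if m := _PAREN_NUM.match(line) or _DOT_NUM.match(line):
        return f"{m.group(1)}. {m.group(2).strip()}"
    return line
```

Requiring whitespace after `1.` keeps decimals such as `3.14` from being mistaken for list markers.
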
Generates YAML frontmatter:
---
title: "Article Title"
author: "Account Name"
date: "2026-04-08"
source: "https://mp.weixin.qq.com/s/xxxxx"
description: "Article description if available"
---
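
A frontmatter block of this shape could be produced along the following lines. This is a sketch only: `build_frontmatter` is a hypothetical helper, and it assumes the metadata dict already uses the same keys as the YAML fields shown above.

```python
def _yaml_quote(value: str) -> str:
    """Double-quote a scalar, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_frontmatter(metadata: dict) -> str:
    """Render the known keys in a fixed order; empty values are skipped."""
    keys = ("title", "author", "date", "source", "description")
    lines = ["---"]
    for key in keys:
        value = metadata.get(key)
        if value:
            # Collapse newlines so each field stays on a single line.
            lines.append(f"{key}: {_yaml_quote(' '.join(str(value).split()))}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Quoting every value avoids surprises when a title contains a colon or a leading `#`, which unquoted YAML would misread.
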
images/ subdirectory
images/img_000.png)
Enhanced image download features:
# Download images and get the URL-to-path mapping
from pathlib import Path
from wechat_to_md import download_images, replace_image_urls
# Download the images
url_to_local = download_images(
    img_urls=["https://mmbiz.qpic.cn/..."],
    output_dir=Path("./output"),
    concurrency=5, # Number of concurrent downloads
    timeout=30, # Timeout in seconds
    retries=2 # Number of retries
)
# Replace the image links in the Markdown
md = replace_image_urls(markdown, url_to_local)
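
To make the mapping step more concrete, here is a standalone sketch of how sequential local names (`images/img_000.png`) and link rewriting might work. Both helpers are hypothetical and are not the script's `download_images` or `replace_image_urls`; reading the extension from a `wx_fmt` query parameter is an assumption about how WeChat image URLs are typically formed.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse


def local_image_name(index: int, url: str) -> str:
    """Build names like images/img_000.png, guessing the extension."""
    parsed = urlparse(url)
    # Assumption: the image format is often carried in a wx_fmt parameter.
    fmt = parse_qs(parsed.query).get("wx_fmt", [""])[0].lower()
    ext = fmt or PurePosixPath(parsed.path).suffix.lstrip(".").lower() or "jpg"
    if ext == "jpeg":
        ext = "jpg"
    return f"images/img_{index:03d}.{ext}"


def rewrite_image_links(markdown: str, url_to_local: dict[str, str]) -> str:
    """Swap remote image targets for local paths inside ![alt](url) syntax."""
    def swap(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        # URLs without a local copy are left pointing at the remote source.
        return f"![{alt}]({url_to_local.get(url, url)})"

    return re.sub(r"!\[([^\]]*)\]\(([^)\s]+)\)", swap, markdown)
```

Leaving unmapped URLs untouched means a failed download degrades to a remote link instead of a broken one.
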
| Error | Cause | Resolution |
|-------|-------|------------|
| NetworkError | HTTP failure, timeout, 404 | Retries 3x with exponential backoff |
| CaptchaError | Captcha page detected | Inform user to wait and retry |
| ParseError | Content element not found | Check URL validity, may be restricted article |
| Missing dependencies | pip install not run | Install: pip install requests beautifulsoup4 markdownify |
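
The retry behavior listed for NetworkError can be illustrated with a small wrapper. This is a sketch under assumptions: it reads "3x" as three total attempts, uses a 1-second base delay, and defines its own stand-in `NetworkError` class, none of which is confirmed by the script.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NetworkError(Exception):
    """Stand-in for the script's NetworkError; defined here for the sketch."""


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except NetworkError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real waiting.
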
mp.weixin.qq.com domain articles
For detailed WeChat article DOM structure, selectors, and element handling, refer to:
references/wechat-dom-reference.md — Complete WeChat DOM structure documentation