Convert WeChat public account articles to high-fidelity PDF files, preserving original content, formatting, layout, and images.
To convert a WeChat article, run the bundled script:
python D:\wechat-to-pdf\scripts\wechat_to_pdf.py <URL> [-o OUTPUT] [-f FORMAT] [-t TIMEOUT]
When a user provides a WeChat article URL for PDF conversion, follow these steps:
Verify the URL belongs to mp.weixin.qq.com. Reject non-WeChat URLs with a clear explanation. Accepted URL patterns:
https://mp.weixin.qq.com/s/...
https://mp.weixin.qq.com/s?__biz=...
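The URL check above can be sketched in a few lines. This is an illustration only, not the bundled script's actual validation: the function name `is_wechat_article_url` is hypothetical, and restricting the scheme to `https` is an assumption based on the two patterns listed.

```python
from urllib.parse import parse_qs, urlparse

WECHAT_HOST = "mp.weixin.qq.com"


def is_wechat_article_url(url: str) -> bool:
    """Return True if url matches one of the two accepted patterns (hypothetical helper)."""
    parsed = urlparse(url.strip())
    # Compare the exact hostname so look-alikes such as
    # mp.weixin.qq.com.example.org are rejected.
    if parsed.scheme != "https" or parsed.hostname != WECHAT_HOST:
        return False
    # Short form: https://mp.weixin.qq.com/s/<id>
    if parsed.path.startswith("/s/") and len(parsed.path) > len("/s/"):
        return True
    # Long form: https://mp.weixin.qq.com/s?__biz=...
    return parsed.path == "/s" and "__biz" in parse_qs(parsed.query)
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting a WeChat-looking string embedded in another domain or in a query parameter.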
Before running the script, confirm that Playwright is installed. If it is missing, run:
pip install playwright
python -m playwright install chromium
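One way to perform the dependency check without importing Playwright is to ask the import system whether the package can be found. The helper name `playwright_available` is hypothetical. Note that this sketch only detects the Python package; it does not verify that the Chromium browser binary has been downloaded.

```python
import importlib.util

# The two install commands given above, kept together for display to the user.
INSTALL_COMMANDS = (
    "pip install playwright",
    "python -m playwright install chromium",
)


def playwright_available() -> bool:
    """Return True if the playwright package is importable (hypothetical helper)."""
    return importlib.util.find_spec("playwright") is not None


if __name__ == "__main__":
    if not playwright_available():
        print("Playwright is not installed. Run:")
        for command in INSTALL_COMMANDS:
            print("  " + command)
```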
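Determine the output location and pass it with the -o argument; when it is omitted, the script writes to {cwd}/{title}.pdf.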
Run the conversion script with appropriate arguments:
python D:\wechat-to-pdf\scripts\wechat_to_pdf.py "<URL>" -o "<output_path>" -f <format>
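When the command is launched from Python rather than typed into a shell, passing the arguments as a list sidesteps quoting problems with the `?` and `&` characters that WeChat URLs often contain. The sketch below uses the script path and flags shown above; the helper names `build_command` and `run_conversion` are hypothetical.

```python
import subprocess
import sys

# Script location as given in this document.
SCRIPT_PATH = r"D:\wechat-to-pdf\scripts\wechat_to_pdf.py"


def build_command(url, output=None, paper_format=None, timeout=None):
    """Assemble the argument list; optional flags are added only when set."""
    cmd = [sys.executable, SCRIPT_PATH, url]
    if output is not None:
        cmd += ["-o", str(output)]
    if paper_format is not None:
        cmd += ["-f", paper_format]
    if timeout is not None:
        cmd += ["-t", str(timeout)]
    return cmd


def run_conversion(url, **options):
    """Run the script and return its exit code."""
    # No shell is involved, so the URL needs no extra quoting.
    return subprocess.run(build_command(url, **options)).returncode
```

Leaving a flag out of the list lets the script apply its own default for that argument.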
The script handles all conversion steps internally, including images served through WeChat's data-src lazy-loading mechanism.
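To illustrate the lazy-loading step, the sketch below copies each image's `data-src` attribute into `src` before printing the page with Playwright's headless Chromium. It is an assumption about how such a step could look, not the bundled script's code: the function name `render_pdf`, the `img[data-src]` selector, and the choice of wait states are all hypothetical.

```python
# JavaScript evaluated inside the page: promote data-src to src so the
# browser actually fetches each lazily loaded image.
PROMOTE_LAZY_IMAGES_JS = """
() => {
  const images = document.querySelectorAll('img[data-src]');
  for (const img of images) {
    img.src = img.getAttribute('data-src');
  }
  return images.length;
}
"""


def render_pdf(url, output_path, paper_format="A4", timeout_s=60):
    """Load url, force lazy images to load, and write a PDF (hypothetical sketch)."""
    # Imported here so the module can be loaded even without Playwright.
    from playwright.sync_api import sync_playwright

    timeout_ms = timeout_s * 1000  # Playwright timeouts are in milliseconds
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            page.evaluate(PROMOTE_LAZY_IMAGES_JS)
            # Give the newly requested images time to finish downloading.
            page.wait_for_load_state("networkidle", timeout=timeout_ms)
            page.pdf(path=output_path, format=paper_format, print_background=True)
        finally:
            browser.close()
```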
After execution, confirm:
| Argument | Short | Required | Default | Description |
| ----------- | ----- | -------- | ------------------- | -------------------------------------------- |
| url | - | Yes | - | WeChat article URL |
| --output | -o | No | {cwd}/{title}.pdf | Output file path or directory |
| --format | -f | No | A4 | Paper format: A4, A3, Letter, Legal, Tabloid |
| --timeout | -t | No | 60 | Page load timeout in seconds |
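The argument table maps naturally onto an `argparse` parser. The following is a minimal sketch under stated assumptions: the real script's parser is not shown in this document, `build_parser` is a hypothetical name, and treating the timeout as an integer is a guess.

```python
import argparse

PAPER_FORMATS = ("A4", "A3", "Letter", "Legal", "Tabloid")


def build_parser():
    """Build a parser mirroring the argument table (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="wechat_to_pdf.py",
        description="Convert a WeChat article to PDF.",
    )
    parser.add_argument("url", help="WeChat article URL")
    parser.add_argument(
        "-o", "--output", default=None,
        help="Output file path or directory (default: {cwd}/{title}.pdf)",
    )
    parser.add_argument(
        "-f", "--format", default="A4", choices=PAPER_FORMATS,
        help="Paper format",
    )
    parser.add_argument(
        "-t", "--timeout", type=int, default=60,
        help="Page load timeout in seconds",
    )
    return parser
```

By default `argparse` exits with status 2 on a parse error, so a script that reports argument errors with a different exit code would need to override `ArgumentParser.error`.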
| Code | Meaning |
| ---- | --------------------------------------------- |
| 0 | Success |
| 1 | Argument error |
| 2 | Invalid URL (not a WeChat article) |
| 3 | Missing dependency (playwright not installed) |
| 4 | Page load failure (timeout, network error) |
| 5 | PDF generation failure |
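A caller that launches the script can translate the exit code into a readable message. The enum below restates the table; the names `ExitCode` and `describe` are hypothetical and are not exported by the script.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes from the table above."""
    SUCCESS = 0
    ARGUMENT_ERROR = 1
    INVALID_URL = 2
    MISSING_DEPENDENCY = 3
    PAGE_LOAD_FAILURE = 4
    PDF_GENERATION_FAILURE = 5


MEANINGS = {
    ExitCode.SUCCESS: "Success",
    ExitCode.ARGUMENT_ERROR: "Argument error",
    ExitCode.INVALID_URL: "Invalid URL (not a WeChat article)",
    ExitCode.MISSING_DEPENDENCY: "Missing dependency (playwright not installed)",
    ExitCode.PAGE_LOAD_FAILURE: "Page load failure (timeout, network error)",
    ExitCode.PDF_GENERATION_FAILURE: "PDF generation failure",
}


def describe(returncode):
    """Return the documented meaning of returncode, or a fallback message."""
    try:
        return MEANINGS[ExitCode(returncode)]
    except ValueError:
        # ExitCode() raises ValueError for values outside the table.
        return f"Unknown exit code {returncode}"
```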
| Scenario | Action |
| ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------- |
| Non-WeChat URL | Inform user that only mp.weixin.qq.com URLs are supported |
| Network timeout | Suggest checking network connection or increasing timeout with -t |
| Playwright not installed | Provide installation commands |
| Images missing in PDF | The script already handles lazy-loading; if issues persist, refer to references/wechat-css-selectors.md for DOM structure updates |
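The "Network timeout" row suggests raising the -t value. One possible way to automate that suggestion is to retry with a doubled timeout whenever the script exits with code 4. This is an illustrative policy, not behavior the script provides: `convert_with_retry` and its `run` callback are hypothetical, and `run(url, timeout)` is assumed to invoke the script and return its exit code.

```python
PAGE_LOAD_FAILURE = 4  # exit code for timeouts and network errors


def convert_with_retry(run, url, timeout=60, max_attempts=3):
    """Retry page-load failures with a doubled timeout.

    Returns a list of (timeout, exit_code) pairs, one per attempt.
    """
    attempts = []
    for _ in range(max_attempts):
        code = run(url, timeout)
        attempts.append((timeout, code))
        if code != PAGE_LOAD_FAILURE:
            break  # success, or an error that a longer timeout cannot fix
        timeout *= 2
    return attempts
```

Only exit code 4 triggers a retry, because the other failures (invalid URL, missing dependency, PDF generation failure) are not affected by the timeout.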
scripts/wechat_to_pdf.py - Core conversion script. Accepts CLI arguments, validates URLs, handles lazy-loaded images, and generates PDF via Playwright headless Chromium.
references/wechat-css-selectors.md - Documents WeChat article page DOM structure, CSS selectors for hidden elements, and the image lazy-loading mechanism. Consult this file when WeChat page structure changes break the conversion.