
Save Article With Images

Save web articles locally with images. Automatically downloads images, generates Markdown, and converts to PDF. Supports WeChat Official Account articles via...
barryqin9999
Uncategorized · clawhub · v1.0.1 · 1 version · 100000 · Key: not required

Overview

Save Article with Images

Save web articles to local storage, supporting articles with images. Automatically downloads images, generates Markdown, and converts to PDF.

Triggers

  • "save article"
  • "save this article"
  • "download article"
  • "clip article"

Quick Execution

Articles Without Images

1. Fetch article content (Jina Reader or browser)
2. Save to saved-articles/{title}-{date}.md
3. Send file to Feishu

Articles With Images

1. Create directory reports/{article-name}/
2. Create images/ subdirectory
3. Download all images to images/
4. Generate Markdown (relative path references)
5. Convert to PDF
6. Send PDF to Feishu

Complete Workflow

Step 1: Check if Article Has Images

Methods:

  • Jina Reader returns content containing ![Image](url) references
  • Or the original webpage contains <img> tags

Decision:

  • Images < 3 → Save Markdown directly, don't download images separately
  • Images ≥ 3 → Process with image workflow
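
The decision rule above can be sketched in Python. This is a minimal sketch, assuming the fetched content is Markdown and that images appear as ![alt](url) references; the names `count_images` and `choose_workflow` are hypothetical and not part of the skill.

```python
import re

# Matches Markdown image references such as ![alt](https://example.com/a.png)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")

IMAGE_THRESHOLD = 3  # threshold taken from the decision rule above


def count_images(markdown: str) -> int:
    """Return the number of Markdown image references in the text."""
    return len(IMAGE_PATTERN.findall(markdown))


def choose_workflow(markdown: str) -> str:
    """Return 'simple' for fewer than 3 images, otherwise 'with-images'."""
    if count_images(markdown) < IMAGE_THRESHOLD:
        return "simple"
    return "with-images"
```

Counting references in the Markdown keeps the check independent of how the content was fetched.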

Step 2: Create Directory Structure

mkdir -p ~/.openclaw/workspace/reports/{article-name}/images/

Directory Structure:

reports/{article-name}/
├── {article-name}.md      # Markdown file
├── {article-name}.html    # HTML intermediate (optional)
├── {article-name}.pdf     # Final output (optional)
└── images/                # Image directory
    ├── image1.jpg
    ├── image2.png
    └── ...
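
The same layout can be created from Python instead of mkdir. A minimal sketch; the function name `create_article_dirs` and the `base_dir` parameter are assumptions made for illustration.

```python
from pathlib import Path


def create_article_dirs(article_name: str, base_dir: Path) -> Path:
    """Create reports/{article-name}/images/ under base_dir and return the article directory."""
    article_dir = Path(base_dir) / "reports" / article_name
    # parents=True creates reports/ and the article directory in one call
    (article_dir / "images").mkdir(parents=True, exist_ok=True)
    return article_dir
```

For the workspace path used above, this would be called as `create_article_dirs("my-article", Path.home() / ".openclaw" / "workspace")`.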

Step 3: Fetch Article Content

Method A: Jina Reader (Recommended)

curl -s "https://r.jina.ai/URL"

Pros: Auto-converts to Markdown, extracts image links

Cons: Blocked by some sites
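
For scripted use, the curl call might be wrapped as below. This is a sketch using only the Python standard library; the helper names `jina_url` and `fetch_markdown` are hypothetical, and the only behaviour assumed of the endpoint is what the curl command shows: the article URL is appended to https://r.jina.ai/.

```python
import urllib.request

JINA_PREFIX = "https://r.jina.ai/"  # same endpoint as the curl command above


def jina_url(article_url: str) -> str:
    """Prefix the article URL with the Jina Reader endpoint."""
    return JINA_PREFIX + article_url


def fetch_markdown(article_url: str, timeout: float = 30.0) -> str:
    """Fetch the article through Jina Reader and return the response body as text."""
    with urllib.request.urlopen(jina_url(article_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the request fails or the site is blocked, `urlopen` raises an exception, which is the point at which to fall back to Method B.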

Method B: Browser Fetch

# Open webpage
browser action=open url=URL

# Get content
browser action=act kind=evaluate fn='() => document.body.innerText'

# Get images
browser action=act kind=evaluate fn='() => {
  const imgs = document.querySelectorAll("img");
  return JSON.stringify(Array.from(imgs).map(img => ({
    src: img.src,
    alt: img.alt
  })));
}'
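
The image snippet above returns a JSON string, which still has to be turned into a download list. A minimal sketch, assuming the result is passed to Python unchanged; `parse_image_list` is a hypothetical helper, and dropping duplicates and non-http sources is a design choice rather than something the skill prescribes.

```python
import json


def parse_image_list(raw_json: str) -> list[dict]:
    """Parse the JSON string from the browser snippet into unique http(s) images."""
    seen = set()
    images = []
    for item in json.loads(raw_json):
        src = item.get("src", "")
        # Skip empty sources, data: URIs, and URLs already collected
        if not src.startswith(("http://", "https://")) or src in seen:
            continue
        seen.add(src)
        images.append({"src": src, "alt": item.get("alt", "")})
    return images
```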

Step 4: Download Images

Single Image:

curl -o "images/image1.jpg" "https://example.com/image.jpg"

Batch Download (Python):

import requests
from pathlib import Path

def download_images(image_urls, output_dir):
    """Download each URL in image_urls into output_dir as image{i}.{ext}."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    for i, url in enumerate(image_urls, 1):
        try:
            # Get extension
            ext = url.split('.')[-1].split('?')[0]
            if ext not in ['jpg', 'jpeg', 'png', 'gif', 'webp']:
                ext = 'jpg'
            
            # Download
            resp = requests.get(url, timeout=30, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
            })
            
            if resp.status_code == 200:
                filename = f"image{i}.{ext}"
                (output_dir / filename).write_bytes(resp.content)
                print(f"✅ {filename}")
            else:
                print(f"❌ HTTP {resp.status_code}: {url}")
        except Exception as e:
            print(f"❌ {e}: {url}")

# Usage
# download_images(['url1', 'url2'], 'images/')

Image Naming:

  • Sequential: image1.jpg, image2.png, ...
  • By content: cover.jpg, screenshot.png, ...
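
Sequential naming can be isolated in a small helper. A sketch, assuming the extension is read from the URL path and falls back to jpg, as the batch script does; `sequential_name` is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTS = {"jpg", "jpeg", "png", "gif", "webp"}


def sequential_name(index: int, url: str, default_ext: str = "jpg") -> str:
    """Return image{index}.{ext}, using the URL path's extension when it is recognised."""
    # urlparse separates the query string, so "?x=1" never leaks into the extension
    ext = PurePosixPath(urlparse(url).path).suffix.lstrip(".").lower()
    if ext not in ALLOWED_EXTS:
        ext = default_ext
    return f"image{index}.{ext}"
```

Naming by content (cover.jpg, screenshot.png) needs human or model judgment, so it is not covered by this helper.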

Step 5: Generate Markdown

Template:

# {Article Title}

> Source: {URL}
> Author: {author}
> Published: {date}

---

![Cover](images/image1.jpg)

{Content}

---

## Images

![Figure 1: {description}](images/image2.jpg)
![Figure 2: {description}](images/image3.png)

---

*Saved: {timestamp}*

Image Reference Format:

![Description](images/filename.ext)
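
When the fetched Markdown still points at remote URLs, the references can be rewritten to the relative form above. A minimal sketch, assuming a mapping from original URL to downloaded file name is available; `localize_images` and `url_to_file` are hypothetical names.

```python
import re

# Groups: 1 = "![alt](", 2 = URL, 3 = optional title plus closing ")"
IMAGE_PATTERN = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)([^)]*\))")


def localize_images(markdown: str, url_to_file: dict[str, str]) -> str:
    """Replace remote image URLs with images/{filename} when a local file is known."""

    def swap(match: re.Match) -> str:
        filename = url_to_file.get(match.group(2))
        if filename is None:
            return match.group(0)  # leave images without a local copy untouched
        return f"{match.group(1)}images/{filename}{match.group(3)}"

    return IMAGE_PATTERN.sub(swap, markdown)
```

Images that failed to download keep their remote URL, so the saved Markdown still renders them when online.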

Step 6: Convert to PDF (Optional)

Using Preset Styles:

# CSS file
CSS_FILE=~/.openclaw/workspace/templates/mobile-friendly.css

# Convert to HTML
pandoc {article-name}.md -o {article-name}.html --standalone --css="$CSS_FILE"

# Generate PDF
weasyprint {article-name}.html {article-name}.pdf

PDF Configuration:

  • Body: 16pt, line-height 1.8
  • Page: 6×9 inches, margins 1.5cm
  • Font: Noto Sans CJK SC
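
The configuration values above could be expressed as a stylesheet. This is only a sketch: the contents of mobile-friendly.css are not shown in this document, so the rules below are an assumption of how those values might be written, and `write_pdf_css` is a hypothetical helper.

```python
from pathlib import Path

# Values taken from the PDF configuration list above; the rule layout is an assumption
PDF_CSS = """\
@page {
  size: 6in 9in;
  margin: 1.5cm;
}
body {
  font-family: "Noto Sans CJK SC", sans-serif;
  font-size: 16pt;
  line-height: 1.8;
}
"""


def write_pdf_css(path: Path) -> Path:
    """Write the stylesheet to path and return the path."""
    path = Path(path)
    path.write_text(PDF_CSS, encoding="utf-8")
    return path
```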

⚠️ Image Overflow Solution (Important)

Problem: Images that are too large (e.g., 1200px wide) overflow the PDF page, which is only about 432pt (6 inches) wide.

Solution: Create a CSS file that caps image max-width.

Required CSS:

/* Prevent image overflow */
img {
  max-width: 100%;
  height: auto;
  display: block;
  margin: 1em auto;
}

/* Images in images/ directory - 90% width */
img[src^="images/"] {
  max-width: 90%;
  margin: 0.5em auto;
}

/* Body styles */
body {
  max-width: 100%;
  padding: 1cm;
}

Correct PDF Generation Flow:

# 1. Create CSS file (in article directory)
cat > style.css << 'EOF'
img { max-width: 100%; height: auto; }
img[src^="images/"] { max-width: 90%; }
EOF

# 2. Generate HTML with CSS
pandoc {article-name}.md -o {article-name}.html --standalone --css=style.css

# 3. Generate PDF
weasyprint {article-name}.html {article-name}.pdf

Key Points:

  • ✅ Must add max-width: 100% or max-width: 90%
  • ✅ Use relative paths images/xxx.jpg
  • ❌ Don't render images at original size (will overflow)
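
The two-command flow can also be driven from Python. A sketch, assuming pandoc and weasyprint are on PATH and using only the flags shown above; `build_pdf_commands` and `convert_to_pdf` are hypothetical names.

```python
import subprocess
from pathlib import Path


def build_pdf_commands(name: str, css: str = "style.css") -> list[list[str]]:
    """Return the pandoc and weasyprint commands from the flow above as argument lists."""
    md, html, pdf = f"{name}.md", f"{name}.html", f"{name}.pdf"
    return [
        ["pandoc", md, "-o", html, "--standalone", f"--css={css}"],
        ["weasyprint", html, pdf],
    ]


def convert_to_pdf(article_dir: Path, name: str, css: str = "style.css") -> Path:
    """Run both commands inside article_dir so relative images/ paths resolve."""
    for cmd in build_pdf_commands(name, css):
        # check=True raises CalledProcessError if either tool fails
        subprocess.run(cmd, cwd=article_dir, check=True)
    return Path(article_dir) / f"{name}.pdf"
```

Running with `cwd` set to the article directory matters because both the CSS link and the image references are relative paths.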

Step 7: Send to Feishu

Send Markdown:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.md"

Send PDF:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.pdf"

Platform-Specific Handling

| Source | Fetch Method | Image Handling |
| --- | --- | --- |
| Twitter/X | Jina Reader | Download pbs.twimg.com images |
| WeChat Official Account | browser + Camoufox | Download mmbiz.qpic.cn images |
| General Webpages | Jina Reader | Download all img tags |
| Login Required Sites | browser | User manual screenshot |

Twitter/X Articles

Image URL Format:

https://pbs.twimg.com/media/XXXXX?format=jpg&name=small

Download Command:

# Get best quality
curl -o "images/image1.jpg" "https://pbs.twimg.com/media/XXXXX?format=jpg&name=large"
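
Switching a media URL to the large variant can be automated by rewriting its query string. A minimal sketch based only on the URL format shown above; `twitter_large_url` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def twitter_large_url(url: str) -> str:
    """Return the same pbs.twimg.com URL with the name parameter set to 'large'."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["name"] = "large"  # replaces name=small, or adds the parameter if absent
    return urlunparse(parts._replace(query=urlencode(query)))
```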

WeChat Official Account Articles

Problem: WeChat applies anti-hotlinking protection, so direct image downloads fail

Solutions:

  1. Use browser to open article
  2. Save screenshot
  3. Or use Camoufox tool

# Use tool from agent-reach
cd ~/.agent-reach/tools/wechat-article-for-ai
python3 main.py "https://mp.weixin.qq.com/s/ARTICLE_ID"

Checklist

After saving, verify:

□ Markdown file generated
□ All images downloaded successfully
□ Image relative paths correct
□ Images display correctly (local preview)
□ PDF generated successfully (optional)
□ File sent to Feishu
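
Part of this checklist can be verified automatically. A sketch that covers the image download and relative path items only, assuming images are referenced as images/filename.ext; `missing_images` is a hypothetical helper.

```python
import re
from pathlib import Path

# Captures the relative path of images referenced from the images/ directory
LOCAL_IMAGE = re.compile(r"!\[[^\]]*\]\((images/[^)\s]+)[^)]*\)")


def missing_images(markdown_file: Path) -> list[str]:
    """Return relative image paths referenced in the Markdown file that do not exist on disk."""
    markdown_file = Path(markdown_file)
    text = markdown_file.read_text(encoding="utf-8")
    base = markdown_file.parent  # relative paths resolve against the Markdown file's directory
    return [rel for rel in LOCAL_IMAGE.findall(text) if not (base / rel).is_file()]
```

An empty list means every referenced image is present; visual checks and the Feishu delivery still need to be confirmed separately.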

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| Image download failed | Anti-hotlinking/Network | Use browser or lower quality |
| PDF generation failed | Missing fonts/dependencies | Check weasyprint installation |
| Markdown images not showing | Path error | Check relative paths |
| Jina Reader blocked | Site restriction | Use browser fetch |

File Locations

| Type | Directory |
| --- | --- |
| Simple articles | saved-articles/{title}-{date}.md |
| Articles with images | reports/{article-name}/ |
| Temporary files | /tmp/article-{id}/ |
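
The simple-article path pattern can be built as follows. A sketch under two assumptions not stated in this document: the date is written in ISO format (YYYY-MM-DD), and characters that are awkward in file names are replaced with hyphens. Both helper names are hypothetical.

```python
import re
from datetime import date
from pathlib import Path


def safe_name(title: str) -> str:
    """Replace whitespace and characters that are awkward in file names with hyphens."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "-", title.strip())
    return cleaned.strip("-") or "untitled"


def simple_article_path(title: str, saved_on: date) -> Path:
    """Build saved-articles/{title}-{date}.md; the ISO date format is an assumption."""
    return Path("saved-articles") / f"{safe_name(title)}-{saved_on.isoformat()}.md"
```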

Skill Version: 1.0.0

Created: 2026-03-17

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-07 06:48 Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
