
Reddit Archive

Download and archive Reddit posts including images, GIFs, and videos from specified users or subreddits with filtering and sorting options.
terellison
Content creation clawhub v1.4.0 2 versions 99918.7 Key: not required

Overview

SKILL.md — Reddit Archive

_Download and archive Reddit posts (images, GIFs, videos) from users or subreddits._

Auto-Installation

This script automatically checks for and installs its dependencies on first run:

  • requests — Python HTTP library
  • yt-dlp — video downloader

If missing, it will attempt to install them via pip install --user. You can also:

  • Pre-install: pip3 install requests yt-dlp
  • Override yt-dlp path: export YTDLP_PATH=/your/custom/path/yt-dlp

Browser Login Required for Reddit Videos

As of mid-2026, downloading v.redd.it videos requires an authenticated Reddit session. yt-dlp's Reddit extractor reads cookies from your browser to satisfy this, so stay logged into Reddit in Safari (or another browser, see below) and the script handles it automatically.

  • Default browser: safari (macOS default).
  • Override: export REDDIT_COOKIES_BROWSER=chrome (or firefox, brave, edge, vivaldi). Set to none to skip cookie loading if you don't need Reddit videos.
  • Image-only and redgifs-only archives don't need a login: the cookie loader is harmless if you're not logged in, because those URLs never use Reddit credentials. v.redd.it posts, however, will fail with an "Account authentication is required" error.
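
As an illustration of how the REDDIT_COOKIES_BROWSER setting might translate into yt-dlp arguments, here is a minimal sketch. The function name cookie_args and the validation step are assumptions; only the --cookies-from-browser flag and the browser names come from the text above.

```python
import os

# Browsers named in this document; yt-dlp itself may accept others.
SUPPORTED_BROWSERS = {"safari", "chrome", "firefox", "brave", "edge", "vivaldi"}


def cookie_args(environ=os.environ):
    """Build the yt-dlp cookie arguments from REDDIT_COOKIES_BROWSER."""
    browser = environ.get("REDDIT_COOKIES_BROWSER", "safari").strip().lower()
    if browser == "none":
        # Cookie loading disabled: v.redd.it downloads are expected to fail.
        return []
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    return ["--cookies-from-browser", browser]
```

The returned list can be spliced into a yt-dlp command line, for example ["yt-dlp", *cookie_args(), url].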

When to Use

You want to archive content from Reddit — either from a specific user (u/username) or a subreddit (r/subname).

Usage

python3 ~/path/to/reddit_archive.py [options]

Options

| Flag | Description | Default |
|------|-------------|---------|
| -u, --user | Reddit username (either this OR --subreddit required) | |
| -s, --subreddit | Subreddit name (either this OR --user required) | |
| -o, --output | Output directory | ~/temp/.reddit_ |
| --sort | Sort order: hot, new, rising, top, controversial | hot |
| --time | Time filter for top/controversial: hour, day, week, month, year, all | |
| --after | Start date (YYYY-MM-DD) | No filter |
| --before | End date (YYYY-MM-DD) | No filter |
| --limit | Max posts to fetch (0 = unlimited) | 0 |
| --images | Download images (jpg, png, webp) | |
| --gifs | Download GIFs/videos (gfycat, redgifs, imgur) | |
| --skip-existing | Skip already-downloaded files | |
| --workers | Parallel download workers | 4 |
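
For readers who want to see how such a flag set is typically wired up, the sketch below declares a subset of the options with argparse. It is not the script's actual parser: treating --user and --subreddit as mutually exclusive is an assumption drawn from the "either this OR" wording, and -o, --after, and --before are left out for brevity.

```python
import argparse


def build_parser():
    """Declare a subset of the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="reddit_archive.py")

    # Exactly one archive target must be given (assumed to be exclusive).
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("-u", "--user", help="Reddit username")
    target.add_argument("-s", "--subreddit", help="Subreddit name")

    parser.add_argument(
        "--sort", default="hot",
        choices=["hot", "new", "rising", "top", "controversial"])
    parser.add_argument(
        "--time",
        choices=["hour", "day", "week", "month", "year", "all"])
    parser.add_argument("--limit", type=int, default=0,
                        help="max posts to fetch (0 = unlimited)")
    parser.add_argument("--images", action="store_true")
    parser.add_argument("--gifs", action="store_true")
    parser.add_argument("--skip-existing", action="store_true")
    parser.add_argument("--workers", type=int, default=4)
    return parser
```

Parsing ["-s", "funny", "--sort", "top", "--time", "all", "--limit", "10"] with this parser yields the same combination as the "top 10 of all time" example in the Examples section.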

Examples

# All posts from a user
python3 reddit_archive.py -u someuser

# Subreddit with date range
python3 reddit_archive.py -s orlando --after 2025-01-01 --before 2025-12-31

# Top 10 most upvoted posts of all time from a subreddit
python3 reddit_archive.py -s funny --sort top --time all --limit 10

# New posts only
python3 reddit_archive.py -s orlando --sort new

# GIFs only, specific user
python3 reddit_archive.py -u someguy --gifs

# Custom output dir
python3 reddit_archive.py -u someuser -o ~/Downloads/reddit_archive

Output

Downloads are saved to the output directory with the following structure:

output_directory/
├── Pictures/
│   ├── {target}_{post_id}.jpg
│   ├── {target}_{post_id}.png
│   └── ...
└── Videos/
    ├── {target}_{post_id}.mp4
    └── ...
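
The layout above can be expressed as a small path-building helper. This is a sketch under stated assumptions: output_path is a hypothetical name, and sending every non-image extension (including .gif) to Videos/ is a guess, since the tree only shows .jpg, .png, and .mp4.

```python
from pathlib import Path

# Extensions treated as still images; everything else goes to Videos/.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def output_path(output_dir, target, post_id, ext):
    """Return <output_dir>/<Pictures|Videos>/<target>_<post_id><ext>."""
    ext = ext.lower()
    folder = "Pictures" if ext in IMAGE_EXTS else "Videos"
    return Path(output_dir) / folder / f"{target}_{post_id}{ext}"
```

Here target stands for the username or subreddit being archived, and post_id for the Reddit post ID.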

File Organization

The skill is organized as:

reddit-archive/
├── SKILL.md              ← This file
└── scripts/
    ├── reddit_archive.py ← Main downloader script
    └── requirements.txt  ← Python dependencies

Rate Limiting

  • Pauses 0.8s between listing-page fetches
  • Presents as Safari on macOS (Reddit's anti-bot system blocks descriptive bot User-Agents in 2026)
  • Sets the over18 cookie so NSFW subreddits don't return an interstitial
  • Run one instance at a time; parallel runs trigger rate limits
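
One way to enforce a pause between listing-page fetches is a small throttle object, sketched below. The class name ListingThrottle is hypothetical, and the script itself may simply call time.sleep(0.8). This variant only sleeps for whatever part of the 0.8s interval has not already elapsed.

```python
import time


class ListingThrottle:
    """Enforce a minimum gap between consecutive listing-page fetches."""

    def __init__(self, min_interval=0.8, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous fetch, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling throttle.wait() immediately before each listing request keeps a single instance within the documented pacing; it does nothing to coordinate several instances, which is why parallel runs remain a problem.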

Technical Notes

  • Data source: scrapes old.reddit.com listing HTML (old.reddit.com/r/// or old.reddit.com/user//submitted/). Reddit's anonymous JSON API started returning 403 + an anti-bot HTML page in mid-2026, and the self-serve OAuth flow is gated behind a Responsible Builder Policy approval. old.reddit's server-rendered listings still work and embed the same metadata in attributes (schema stable since ~2010).
  • Pagination: uses the after=t3_ cursor extracted from the page's next › button rather than a JSON after field.
  • Galleries: old.reddit embeds preview.redd.it/. URLs for each gallery item inline. Each image is also available unsigned at i.redd.it/. (full resolution, no expiry), which is what we download.
  • v.redd.it videos: routed through yt-dlp with --cookies-from-browser (HTML scraping doesn't expose the DASH manifest URL the way the old JSON API did, and yt-dlp's Reddit extractor in 2026 needs an authenticated session to fetch the manifest itself).
  • GIF/video downloads use yt-dlp (redgifs, gfycat, v.redd.it); direct images and direct mp4/gif URLs are streamed via requests.
  • Date filtering is done client-side after fetching (filters by the post's created_utc, which we derive from data-timestamp).
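
Three of the notes above (the pagination cursor, the gallery URL rewrite, and client-side date filtering) can be sketched with the standard library alone. All function names are hypothetical. The sketch assumes the next-page link sits inside an element with class "next-button", that data-timestamp is in milliseconds since the Unix epoch, and that the --after/--before bounds are inclusive; none of these details are confirmed by this document.

```python
import re
from datetime import date, datetime, timezone
from urllib.parse import urlsplit

# Assumed markup: <span class="next-button"><a href="...after=t3_<id>">
_NEXT_RE = re.compile(r'class="next-button".*?after=(t3_[0-9a-z]+)', re.DOTALL)


def next_cursor(listing_html):
    """Return the t3_ cursor from the next-page link, or None if absent."""
    match = _NEXT_RE.search(listing_html)
    return match.group(1) if match else None


def unsigned_image_url(preview_url):
    """Map a preview.redd.it URL to i.redd.it, dropping the signed query."""
    parts = urlsplit(preview_url)
    if parts.hostname != "preview.redd.it":
        return preview_url
    return f"https://i.redd.it{parts.path}"


def in_date_range(data_timestamp_ms, after=None, before=None):
    """Check a post's UTC creation date against optional YYYY-MM-DD bounds."""
    created = datetime.fromtimestamp(
        int(data_timestamp_ms) / 1000, tz=timezone.utc).date()
    if after and created < date.fromisoformat(after):
        return False
    if before and created > date.fromisoformat(before):
        return False
    return True
```

Because filtering happens after fetching, a narrow date range on a busy subreddit still pages through every listing until the limit or the last page is reached.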

Version History

2 versions in total

  • v1.4.0 (current)
    2026-06-04 12:40
  • v1.3.0
    2026-03-29 03:19, security checks: safe
