Image Deduplicator

Detect and remove exact or similar duplicate images in folders using perceptual and MD5 hashing with configurable similarity and actions.
mingo-318
Content creation · clawhub · v1.0.0 · 1 version · 100000 · Key: not required

Overview

Find and remove duplicate or similar images in a folder using MD5 and perceptual hashing. Use it when the user wants to clean up duplicate images, find near-duplicates, or deduplicate an image dataset.

Features

  • Exact Duplicates: Find images with identical content
  • Similar Images: Detect visually similar images (threshold configurable)
  • Hash-based: Fast MD5 hashing for exact duplicates
  • Perceptual Hash: pHash for finding similar images
  • Batch Processing: Process large image folders
  • Multiple Actions: List, delete, or move duplicates
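The exact-duplicate feature can be illustrated with a minimal sketch. This is not the code in scripts/dedupe.py; the function names `md5_of_file` and `find_exact_duplicates` are hypothetical, and the recursive folder walk is an assumption. The idea is only that files with the same MD5 digest have (for practical purposes) identical bytes, so grouping by digest yields the duplicate groups.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder, extensions=("jpg", "jpeg", "png", "bmp")):
    """Group files under `folder` whose bytes are identical.

    Hypothetical helper: walks the folder recursively (an assumption) and
    returns a list of groups, each a sorted list of two or more paths.
    """
    wanted = {"." + ext.lower() for ext in extensions}
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            by_digest[md5_of_file(path)].append(path)
    # Only digests shared by at least two files are duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Because the paths are sorted before grouping, the first entry of each group is deterministic, which matters if later steps keep the first occurrence and act on the rest.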

Usage

# Find exact duplicates
python scripts/dedupe.py scan /path/to/images/

# Find similar images (90% similarity)
python scripts/dedupe.py scan /path/to/images/ --threshold 90

# Delete duplicates (keeps first occurrence)
python scripts/dedupe.py scan /path/to/images/ --action delete

# Move duplicates to a folder
python scripts/dedupe.py scan /path/to/images/ --action move --output /path/to/dupes/
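As a rough illustration of what the list, delete, and move actions might do, the sketch below keeps the first file in each group and handles the rest. The function `apply_action` and its parameters are hypothetical and are not taken from scripts/dedupe.py; how the real script handles name collisions in the output folder is not documented here.

```python
import shutil
from pathlib import Path


def apply_action(groups, action="list", output=None):
    """Handle every file after the first in each duplicate group.

    Hypothetical helper. Returns the duplicate paths that were listed,
    deleted, or moved; the first file of each group is always kept.
    """
    if action not in ("list", "delete", "move"):
        raise ValueError(f"unknown action: {action}")
    if action == "move":
        if output is None:
            raise ValueError("the move action needs an output folder")
        Path(output).mkdir(parents=True, exist_ok=True)

    handled = []
    for group in groups:
        for duplicate in group[1:]:  # group[0] is the kept occurrence
            duplicate = Path(duplicate)
            if action == "delete":
                duplicate.unlink()
            elif action == "move":
                # Assumption: file names are unique in the output folder;
                # a real tool would need to handle collisions.
                shutil.move(str(duplicate), str(Path(output) / duplicate.name))
            handled.append(duplicate)
    return handled
```

Running with the list action first and reviewing the result before deleting is the safer order, since deletion cannot be undone.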

Examples

$ python scripts/dedupe.py scan ./images/

Scanning images...
Found 150 images
Computing hashes...
Found 5 duplicate groups:

Group 1 (3 files):
  ./images/photo1.jpg
  ./images/photo1_copy.jpg
  ./images/photo1_final.jpg

Group 2 (2 files):
  ./images/screenshot.png
  ./images/screenshot (1).png

[... groups 3 to 5 omitted ...]

Total: 5 duplicate groups, 8 duplicate files

Installation

pip install pillow imagehash

Options

  • --threshold: Similarity threshold (0-100), default: 100 (exact)
  • --action: What to do with duplicates (list, delete, move)
  • --output: Output folder for --action move
  • --extensions: File extensions to scan (default: jpg,jpeg,png,bmp)
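The page does not say how the 0-100 threshold is applied to perceptual hashes. One plausible reading, sketched below, assumes a 64-bit hash and defines similarity as the share of matching bits, so the threshold becomes a maximum Hamming distance. The helper names `max_distance_for`, `hamming`, and `is_similar` are hypothetical, and hashes are plain integers here to keep the example dependency-free; with the imagehash library, subtracting two hash objects gives the same Hamming distance.

```python
def max_distance_for(threshold, hash_bits=64):
    """Convert a 0-100 similarity threshold into a maximum Hamming distance.

    Assumes similarity = 1 - distance / hash_bits (not confirmed by the tool).
    """
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    return hash_bits * (100 - threshold) // 100


def hamming(a, b):
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def is_similar(a, b, threshold, hash_bits=64):
    """True when two hashes differ in no more bits than the threshold allows."""
    return hamming(a, b) <= max_distance_for(threshold, hash_bits)


# Under this mapping, threshold 100 allows 0 differing bits (exact hash match)
# and threshold 90 allows up to 6 of 64 bits to differ.
```

Note that a perceptual-hash distance of 0 means the images look the same to the hash, not that the files are byte-identical, which is why a separate MD5 check is useful for exact duplicates.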

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-30 15:50 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
