
Dedupe

Deduplication reference: exact matching, fuzzy matching, hash-based dedup, bloom filters, and data quality. Use when removing duplicate records, files, or data.
Author: xueyetianya
Uncategorized · clawhub · v1.0.0 · 1 version · 99773.8 · Key: not required

Overview

Dedupe — Data Deduplication Reference

Quick-reference skill for deduplication strategies, algorithms, and data quality patterns.

When to Use

  • Removing duplicate rows from datasets or databases
  • Deduplicating files in storage systems
  • Implementing fuzzy matching for near-duplicate detection
  • Choosing between exact and probabilistic dedup methods
  • Building ETL pipelines with deduplication stages

Commands

intro

scripts/script.sh intro

Overview of deduplication — types, strategies, and tradeoffs.

exact

scripts/script.sh exact

Exact deduplication — hash-based, key-based, and sorting approaches.
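
The script's own output is not reproduced on this page. As a rough illustration of the hash-based, key-based approach, here is a minimal Python sketch. The field names, the normalisation rule (trim and lowercase), and the helper names `record_key` and `dedupe_exact` are assumptions made for the example, not part of the skill.

```python
import hashlib

def record_key(record, fields):
    """Build a stable SHA-256 digest from the chosen key fields."""
    # Trim and lowercase so trivially different spellings map to one key.
    parts = [str(record.get(f, "")).strip().lower() for f in fields]
    # Join with a unit separator so ("ab", "c") and ("a", "bc") stay distinct.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def dedupe_exact(records, fields):
    """Keep the first record seen for each key and preserve input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record, fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

rows = [
    {"email": "ann@example.com", "name": "Ann"},
    {"email": " ANN@example.com ", "name": "Ann B."},
    {"email": "bob@example.com", "name": "Bob"},
]
unique_rows = dedupe_exact(rows, ["email"])
```

Keeping the first occurrence is one possible survivorship rule; a pipeline might instead keep the newest or most complete record.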

fuzzy

scripts/script.sh fuzzy

Fuzzy deduplication — similarity measures, blocking, and record linkage.
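
To make the idea of a similarity measure combined with blocking concrete, the sketch below uses Python's standard `difflib.SequenceMatcher` as the similarity score and the first character of each name as the blocking key. Both choices, the 0.85 threshold, and the function names are assumptions for illustration; real record linkage usually needs better blocking keys and field-specific measures.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Case-insensitive similarity ratio in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_near_duplicates(names, threshold=0.85):
    """Return (a, b, score) for pairs in the same block at or above the threshold."""
    # Blocking: only compare names that share a first character,
    # which avoids scoring every possible pair.
    blocks = defaultdict(list)
    for name in names:
        blocks[name.strip().lower()[:1]].append(name)

    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs

names = ["Jon Smith", "John Smith", "Jane Doe", "Mary Major"]
matches = find_near_duplicates(names)
```

Blocking trades recall for speed: a typo in the first character would put two matching names in different blocks, so they would never be compared.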

files

scripts/script.sh files

File-level deduplication — fdupes, jdupes, rdfind, and storage dedup.
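
The sketch below is not how fdupes, jdupes, or rdfind are implemented. It is a simplified Python illustration of one common idea for file-level dedup: group files by size first, then hash only the files whose sizes collide. The function names and the demo files are hypothetical.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicate_files(root):
    """Return lists of paths under root that have identical content."""
    # Pass 1: group by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups

# Demo on a throwaway directory: a.txt and b.txt match, c.txt has the same size only.
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"diff")]:
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(data)
    duplicate_groups = [
        [os.path.basename(p) for p in group] for group in find_duplicate_files(tmp)
    ]
```

This only reports duplicates. Deleting or hard-linking them is a separate, riskier step that is worth confirming with a byte-by-byte comparison first.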

algorithms

scripts/script.sh algorithms

Dedup algorithms — bloom filters, HyperLogLog, MinHash, SimHash.
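
Of the structures listed, a bloom filter is the most direct fit for streaming dedup: it answers "probably seen" or "definitely not seen" in fixed memory. The following is a minimal Python sketch using the standard sizing formulas and double hashing derived from one SHA-256 digest. The class and function names are hypothetical, and the hashing scheme is one reasonable choice among several.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.size = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.hash_count = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )

def dedupe_stream(items, bloom):
    """Yield items not yet seen according to the filter."""
    for item in items:
        if item in bloom:
            continue  # probably seen; a false positive would drop a new item
        bloom.add(item)
        yield item

bloom = BloomFilter(expected_items=1000)
result = list(dedupe_stream(["a", "b", "a", "c", "b"], bloom))
```

A bloom filter never reports a seen item as unseen, but it can occasionally report a new item as seen, so this style of dedup may drop a small fraction of genuinely unique records.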

sql

scripts/script.sh sql

SQL deduplication patterns — ROW_NUMBER, DISTINCT, GROUP BY strategies.
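
As an illustration of the ROW_NUMBER pattern, the sketch below runs it against an in-memory SQLite database from Python. The `customers` table, its columns, and the "keep the newest row per email" rule are assumptions for the example. It also assumes an SQLite build with window-function support (3.25 or later); other databases use the same idea with slightly different DELETE syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT);
INSERT INTO customers (email, updated_at) VALUES
  ('ann@example.com', '2024-01-01'),
  ('ann@example.com', '2024-03-01'),
  ('bob@example.com', '2024-02-01');
""")

# Rank rows within each email, newest first, then delete every row ranked below 1.
conn.execute("""
DELETE FROM customers
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY email
                   ORDER BY updated_at DESC, id DESC
               ) AS rn
        FROM customers
    )
    WHERE rn > 1
)
""")

remaining = conn.execute(
    "SELECT id, email, updated_at FROM customers ORDER BY id"
).fetchall()
```

The `id DESC` tie-breaker makes the choice of survivor deterministic when two rows share the same timestamp.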

cli

scripts/script.sh cli

Command-line dedup tools — sort, uniq, awk, and stream processing.
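
Two behaviours of the command-line tools are easy to confuse: `uniq` only collapses adjacent repeated lines (which is why it is usually paired with `sort`), whereas the awk idiom `awk '!seen[$0]++'` removes all repeats while keeping first-seen order. The Python sketch below mirrors both behaviours on a stream of lines; the function names are hypothetical.

```python
import io

def unique_lines(lines):
    """Yield each distinct line once, in first-seen order (like awk '!seen[$0]++')."""
    seen = set()
    for line in lines:
        key = line.rstrip("\n")
        if key not in seen:
            seen.add(key)
            yield key

def collapse_adjacent(lines):
    """Drop only consecutive repeats, as uniq does on unsorted input."""
    previous = object()  # sentinel that never equals a real line
    for line in lines:
        key = line.rstrip("\n")
        if key != previous:
            yield key
        previous = key

sample = io.StringIO("b\na\nb\nb\na\n")
all_unique = list(unique_lines(sample))
sample.seek(0)
adjacent_only = list(collapse_adjacent(sample))
```

The first function holds every distinct line in memory, which is the same tradeoff the awk idiom makes; sorting first avoids that at the cost of losing the original order.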

checklist

scripts/script.sh checklist

Deduplication quality checklist and validation steps.
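
The checklist itself is not reproduced on this page. As one example of what a validation step might check, the Python sketch below compares a dataset before and after dedup: row counts, duplicates that survived, and keys that disappeared entirely (a sign of over-deletion). The function name, the report fields, and the sample data are assumptions.

```python
from collections import Counter

def validate_dedup(before, after, key):
    """Summarise a dedup run; key is a function that extracts the identity of a record."""
    before_keys = Counter(key(r) for r in before)
    after_keys = Counter(key(r) for r in after)
    return {
        "rows_in": len(before),
        "rows_out": len(after),
        "rows_removed": len(before) - len(after),
        # Extra copies of a key that are still present after dedup.
        "duplicates_remaining": sum(c - 1 for c in after_keys.values() if c > 1),
        # Keys present before but missing afterwards: dedup removed too much.
        "keys_lost": sorted(set(before_keys) - set(after_keys)),
        # Keys that appear only afterwards: something rewrote identities.
        "keys_invented": sorted(set(after_keys) - set(before_keys)),
    }

before = [{"id": 1}, {"id": 1}, {"id": 2}, {"id": 3}]
after = [{"id": 1}, {"id": 2}]
report = validate_dedup(before, after, key=lambda r: r["id"])
```

In this sample the run removed one true duplicate but also lost the record with id 3, which the `keys_lost` field surfaces.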

help

scripts/script.sh help

version

scripts/script.sh version

Configuration

Variable     Description
-----------  ------------------------------------
DEDUPE_DIR   Data directory (default: ~/.dedupe/)

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-31 00:25 · Safe · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
