Data Quality Checker

Validate CSV, JSON, and JSONL data files for quality issues. Detects missing values, duplicates, type inconsistencies, statistical outliers, format violations, and more.
Author: charlie-morrison · Source: clawhub · Version: v1.0.1 (1 version, #latest) · Key: not required
Stars: 0 · Downloads: 428 · Installs: 1

Overview

Validate CSV/JSON/JSONL data for quality issues. Pure Python, zero dependencies.

Quick Start

# Full quality check
python3 scripts/check_data_quality.py data.csv

# JSON/JSONL support
python3 scripts/check_data_quality.py data.json
python3 scripts/check_data_quality.py data.jsonl

# Markdown report
python3 scripts/check_data_quality.py data.csv --format markdown

# JSON report (for CI/CD)
python3 scripts/check_data_quality.py data.csv --format json

# Only specific checks
python3 scripts/check_data_quality.py data.csv --checks missing,duplicates,types

# Only warnings and critical
python3 scripts/check_data_quality.py data.csv --severity warning

# Save report
python3 scripts/check_data_quality.py data.csv --format markdown --output report.md

Schema Validation

# Generate schema from existing data
python3 scripts/check_data_quality.py data.csv --generate-schema schema.json

# Validate against schema
python3 scripts/check_data_quality.py data.csv --schema schema.json
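The script's actual schema inference is not shown here. As a rough illustration of what generating a schema from existing data could involve, the sketch below infers a type per column and marks columns without empty cells as required. The function names `infer_type` and `infer_schema` and the "no empty cells means required" rule are assumptions for this example, not the skill's documented behavior.

```python
import csv
import io


def infer_type(values):
    """Return 'integer', 'number', or 'string' for a column's non-empty values."""
    def parses(cast, text):
        try:
            cast(text)
            return True
        except ValueError:
            return False

    non_empty = [v for v in values if v.strip() != ""]
    if non_empty and all(parses(int, v) for v in non_empty):
        return "integer"
    if non_empty and all(parses(float, v) for v in non_empty):
        return "number"
    return "string"


def infer_schema(rows):
    """Build a schema dict from dict rows such as those from csv.DictReader."""
    columns = list(rows[0].keys()) if rows else []
    schema = {"required": [], "properties": {}}
    for col in columns:
        values = [row.get(col) or "" for row in rows]
        schema["properties"][col] = {"type": infer_type(values)}
        # Assumption: a column with no empty cells is treated as required.
        if all(v.strip() != "" for v in values):
            schema["required"].append(col)
    return schema


sample = "id,name,age\n1,Ann,34.5\n2,Bob,\n"
rows = list(csv.DictReader(io.StringIO(sample)))
schema = infer_schema(rows)
print(schema)
```

Here `age` is inferred as a number but left out of `required`, because the second row has no value for it.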

Checks Performed

| Check | Description | Severity |
| --- | --- | --- |
| missing | Missing/null/empty values per column | info → critical |
| duplicates | Duplicate rows and potential ID conflicts | warning |
| types | Mixed data types within columns | info → warning |
| outliers | Statistical outliers via IQR method | info → warning |
| formats | Email/phone/URL/date format violations | warning |
| whitespace | Leading/trailing whitespace | info |
| empty | Entirely empty columns | warning |
| drift | Extra/missing keys across rows (schema drift) | warning |
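To make the `outliers` check concrete, here is a minimal sketch of the standard IQR rule: a value is flagged when it lies more than `k` times the interquartile range below Q1 or above Q3. The function name `iqr_outliers`, the multiplier `k=1.5`, and the quartile method (Python's `statistics.quantiles` default) are assumptions. The script may compute quartiles or thresholds differently.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


ages = [23, 25, 27, 29, 31, 33, 35, 240]
print(iqr_outliers(ages))  # [240]
```

For this sample, Q1 is 25.5 and Q3 is 34.5, so the accepted range is 12 to 48 and only 240 is flagged.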

Quality Score

0-100 score based on weighted severity:

  • 90-100: Clean data, minor issues
  • 70-89: Usable but needs attention
  • 50-69: Significant issues
  • 0-49: Critical problems
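The exact weights behind the score are not documented in this section. The sketch below shows one way a severity-weighted score could be computed and mapped onto the bands above. The `PENALTY` values and the function names are hypothetical; only the band boundaries and labels come from the list.

```python
# Hypothetical per-issue penalties; the skill's real weights are not documented here.
PENALTY = {"info": 1, "warning": 5, "critical": 20}

# Band floors and labels taken from the list above.
BANDS = [
    (90, "Clean data, minor issues"),
    (70, "Usable but needs attention"),
    (50, "Significant issues"),
    (0, "Critical problems"),
]


def quality_score(issue_severities):
    """Subtract a penalty per issue from 100, never going below 0."""
    total = sum(PENALTY[s] for s in issue_severities)
    return max(0, 100 - total)


def band(score):
    """Map a 0-100 score to its descriptive band."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    raise ValueError(f"score out of range: {score}")


score = quality_score(["info", "warning", "warning"])
print(score, band(score))  # 89 Usable but needs attention
```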

Exit Codes

  • 0 — No warnings or critical issues
  • 1 — Warnings found
  • 2 — Critical issues found

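Use in CI: python3 scripts/check_data_quality.py data.csv || echo "Quality check failed"

Note that `|| echo` prints the message but leaves the overall command with exit status 0. Drop it if the CI step itself should fail.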
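A CI wrapper might treat warnings and critical issues differently. The following is a small sketch under that assumption: the helper `should_fail_build` and its `fail_on_warning` parameter are hypothetical names for this example, while the exit code meanings and the script path come from the text above.

```python
import subprocess
import sys

# Exit code meanings as listed above.
EXIT_MEANINGS = {
    0: "no warnings or critical issues",
    1: "warnings found",
    2: "critical issues found",
}


def should_fail_build(returncode, fail_on_warning=False):
    """Decide whether a CI step should fail for a given checker exit code."""
    if returncode == 0:
        return False
    if returncode == 1:
        return fail_on_warning
    return True  # 2 (critical) or any unexpected code


def run_check(path):
    """Run the checker on `path` and return its exit code."""
    cmd = [sys.executable, "scripts/check_data_quality.py", path]
    return subprocess.run(cmd).returncode
```

In a pipeline, something like `sys.exit(1 if should_fail_build(run_check("data.csv")) else 0)` would let warnings pass while still blocking on critical issues.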

Schema Format

JSON schema with validation rules:

{
  "required": ["id", "email", "name"],
  "properties": {
    "id": {"type": "integer", "minimum": 1},
    "email": {"type": "string", "pattern": "^[^@]+@[^@]+\\.[^@]+$"},
    "age": {"type": "number", "minimum": 0, "maximum": 150},
    "status": {"type": "string", "enum": ["active", "inactive", "pending"]}
  }
}
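To illustrate how rules like these could be applied to a single record, here is a minimal sketch covering the keywords used in the example schema (`required`, `type`, `minimum`, `maximum`, `pattern`, `enum`). The function `validate_row` and its error messages are assumptions for this example. The script's own validator may behave differently, for instance in how it treats empty strings.

```python
import re

SCHEMA = {
    "required": ["id", "email", "name"],
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "email": {"type": "string", "pattern": r"^[^@]+@[^@]+\.[^@]+$"},
        "age": {"type": "number", "minimum": 0, "maximum": 150},
        "status": {"type": "string", "enum": ["active", "inactive", "pending"]},
    },
}

TYPE_CHECKS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def validate_row(row, schema):
    """Return a list of human-readable violations for one record."""
    errors = []
    for key in schema.get("required", []):
        if row.get(key) in (None, ""):
            errors.append(f"{key}: required value is missing")
    for key, rules in schema.get("properties", {}).items():
        value = row.get(key)
        if value in (None, ""):
            continue  # absence is handled by the 'required' pass above
        check = TYPE_CHECKS.get(rules.get("type"))
        if check and not check(value):
            errors.append(f"{key}: expected {rules['type']}")
            continue
        is_number = TYPE_CHECKS["number"](value)
        if is_number and "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{key}: {value} is below minimum {rules['minimum']}")
        if is_number and "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{key}: {value} is above maximum {rules['maximum']}")
        if isinstance(value, str) and "pattern" in rules:
            if not re.search(rules["pattern"], value):
                errors.append(f"{key}: does not match pattern")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: {value!r} is not an allowed value")
    return errors


bad = {"id": 0, "email": "not-an-email", "age": 200, "status": "archived"}
for problem in validate_row(bad, SCHEMA):
    print(problem)
```

The sample record triggers one violation per rule type: a missing `name`, an `id` below the minimum, a malformed `email`, an `age` above the maximum, and a `status` outside the enum.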

Version History

1 version total

  • v1.0.1 (current)
    2026-05-07 04:20, security status: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
