Validate CSV/JSON/JSONL data for quality issues. Pure Python, zero dependencies.
# Full quality check
python3 scripts/check_data_quality.py data.csv
# JSON/JSONL support
python3 scripts/check_data_quality.py data.json
python3 scripts/check_data_quality.py data.jsonl
# Markdown report
python3 scripts/check_data_quality.py data.csv --format markdown
# JSON report (for CI/CD)
python3 scripts/check_data_quality.py data.csv --format json
# Only specific checks
python3 scripts/check_data_quality.py data.csv --checks missing,duplicates,types
# Only warnings and critical
python3 scripts/check_data_quality.py data.csv --severity warning
# Save report
python3 scripts/check_data_quality.py data.csv --format markdown --output report.md
# Generate schema from existing data
python3 scripts/check_data_quality.py data.csv --generate-schema schema.json
# Validate against schema
python3 scripts/check_data_quality.py data.csv --schema schema.json
| Check | Description | Severity |
|---|---|---|
| missing | Missing/null/empty values per column | info → critical |
| duplicates | Duplicate rows and potential ID conflicts | warning |
| types | Mixed data types within columns | info → warning |
| outliers | Statistical outliers via IQR method | info → warning |
| formats | Email/phone/URL/date format violations | warning |
| whitespace | Leading/trailing whitespace | info |
| empty | Entirely empty columns | warning |
| drift | Extra/missing keys across rows (schema drift) | warning |
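To make the `outliers` check more concrete, here is a minimal sketch of IQR-based outlier detection using only the standard library. The function name `iqr_outliers` and the multiplier `k=1.5` are assumptions for illustration; the script's actual thresholds and quartile method may differ.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: k=1.5 is the conventional fence multiplier,
    not necessarily the one used by check_data_quality.py.
    """
    nums = sorted(float(v) for v in values)
    if len(nums) < 4:
        return []  # too few points for meaningful quartiles
    # statistics.quantiles (Python 3.8+) returns the three quartile cut points
    q1, _median, q3 = statistics.quantiles(nums, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in nums if v < low or v > high]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 10, 95]))
```

A column of mostly similar values with one extreme entry, as in the call above, is the typical case this check is meant to flag.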
0-100 score based on weighted severity:
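One way a severity-weighted score could work is sketched below. The penalty values and the name `quality_score` are hypothetical; the actual weighting is not documented here, so treat this as an illustration of the idea rather than the script's formula.

```python
# Hypothetical penalty per issue; the real tool's weights are not documented here.
PENALTY = {"info": 1, "warning": 5, "critical": 20}

def quality_score(severities):
    """Start at 100, subtract a penalty per issue, and clamp at 0."""
    total_penalty = sum(PENALTY[s] for s in severities)
    return max(0, 100 - total_penalty)

print(quality_score(["info", "warning"]))
```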
Exit codes:

- 0: No warnings or critical issues
- 1: Warnings found
- 2: Critical issues found

Use in CI: python3 scripts/check_data_quality.py data.csv || echo "Quality check failed"
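The mapping from issue severities to the exit codes listed above can be sketched as follows. The function name `exit_code_for` is an assumption for illustration, not part of the script's documented interface.

```python
def exit_code_for(severities):
    """Map the severities found in a report to a process exit code.

    Hypothetical helper mirroring the documented codes:
    0 = clean (info-only findings count as clean), 1 = warnings, 2 = critical.
    """
    if "critical" in severities:
        return 2
    if "warning" in severities:
        return 1
    return 0

print(exit_code_for(["info", "warning"]))
```

Because any non-zero code is treated as failure by the shell, the `||` form in the CI line fires on warnings as well as critical issues.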
JSON schema with validation rules:
{
  "required": ["id", "email", "name"],
  "properties": {
    "id": {"type": "integer", "minimum": 1},
    "email": {"type": "string", "pattern": "^[^@]+@[^@]+\\.[^@]+$"},
    "age": {"type": "number", "minimum": 0, "maximum": 150},
    "status": {"type": "string", "enum": ["active", "inactive", "pending"]}
  }
}
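As a rough illustration of how rules like these could be applied to a single row, here is a minimal sketch. The names `validate_row`, `TYPES`, and `SCHEMA` are hypothetical, and the sketch assumes values have already been parsed into Python types (CSV cells arrive as strings and would need conversion first). It covers only the keywords shown in the example schema.

```python
import re

# Schema mirroring the example above.
SCHEMA = {
    "required": ["id", "email", "name"],
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "email": {"type": "string", "pattern": r"^[^@]+@[^@]+\.[^@]+$"},
        "age": {"type": "number", "minimum": 0, "maximum": 150},
        "status": {"type": "string", "enum": ["active", "inactive", "pending"]},
    },
}

# Schema type names mapped to Python types (assumption: only these three are needed).
TYPES = {"string": str, "integer": int, "number": (int, float)}

def validate_row(row, schema):
    """Return a list of human-readable violations for one row (empty if valid)."""
    errors = []
    for field in schema.get("required", []):
        if row.get(field) in (None, ""):
            errors.append(f"{field}: required field is missing")
    for field, rules in schema.get("properties", {}).items():
        value = row.get(field)
        if value in (None, ""):
            continue  # absence is handled by the required check above
        expected = TYPES.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            errors.append(f"{field}: expected {rules['type']}")
            continue
        if isinstance(value, (int, float)):
            if "minimum" in rules and value < rules["minimum"]:
                errors.append(f"{field}: below minimum {rules['minimum']}")
            if "maximum" in rules and value > rules["maximum"]:
                errors.append(f"{field}: above maximum {rules['maximum']}")
        if isinstance(value, str) and "pattern" in rules:
            if not re.search(rules["pattern"], value):
                errors.append(f"{field}: does not match pattern")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: not one of {rules['enum']}")
    return errors

bad_row = {"id": 0, "email": "not-an-email", "age": 200, "status": "archived"}
for problem in validate_row(bad_row, SCHEMA):
    print(problem)
```

Note that `name` is listed under `required` without an entry in `properties`, so in this sketch it is checked for presence only.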