Toolkit for profiling tabular data and validating its quality.
from scripts.data_profiler import DataProfiler
from scripts.schema_validator import SchemaValidator
# Profile a dataset
profiler = DataProfiler()
report = profiler.profile(df)  # df is a pandas DataFrame
print(report["missing"])
print(report["outliers"])
# Validate against schema
schema = {
    "age": {"type": "int", "min": 0, "max": 150},
    "email": {"type": "str", "regex": r"^\S+@\S+\.\S+$"},
    "id": {"type": "int", "unique": True},
}
validator = SchemaValidator(schema)
errors = validator.validate(df)
for err in errors:
    print(err)
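To make the schema format above concrete, here is a minimal sketch of how a validator could interpret the "type", "min", "max", "regex" and "unique" keys. It is not the code in scripts/schema_validator.py: the function name validate_frame, the error message wording, and the choice to skip missing values are all assumptions made for illustration.

```python
import re

import pandas as pd


def validate_frame(df, schema):
    """Check df against a schema dict; return a list of error strings.

    Hypothetical stand-in for SchemaValidator.validate, for illustration only.
    """
    errors = []
    for column, rules in schema.items():
        if column not in df.columns:
            errors.append(f"{column}: column is missing")
            continue
        values = df[column].dropna()  # assumption: missing values are not errors here

        # Type check first; value checks are skipped when the type is wrong.
        expected = rules.get("type")
        if expected == "int" and not pd.api.types.is_integer_dtype(values):
            errors.append(f"{column}: expected int, got dtype {values.dtype}")
            continue
        if expected == "str" and not all(isinstance(v, str) for v in values):
            errors.append(f"{column}: expected str values")
            continue

        # Per-value checks: range bounds and regex match.
        pattern = re.compile(rules["regex"]) if "regex" in rules else None
        for idx, value in values.items():
            if "min" in rules and value < rules["min"]:
                errors.append(f"{column}[{idx}]: {value} < min {rules['min']}")
            if "max" in rules and value > rules["max"]:
                errors.append(f"{column}[{idx}]: {value} > max {rules['max']}")
            if pattern is not None and not pattern.search(str(value)):
                errors.append(f"{column}[{idx}]: {value!r} does not match regex")

        # Column-level check: report values that occur more than once.
        if rules.get("unique") and values.duplicated().any():
            dupes = sorted(values[values.duplicated()].unique().tolist())
            errors.append(f"{column}: duplicate values {dupes}")
    return errors


# Small made-up DataFrame with one problem per rule kind.
df = pd.DataFrame({
    "age": [34, -2, 200],
    "email": ["a@example.com", "not-an-email", "b@example.org"],
    "id": [1, 2, 2],
})
schema = {
    "age": {"type": "int", "min": 0, "max": 150},
    "email": {"type": "str", "regex": r"^\S+@\S+\.\S+$"},
    "id": {"type": "int", "unique": True},
}
errors = validate_frame(df, schema)
for err in errors:
    print(err)
```

Returning a flat list of strings keeps the calling loop identical to the one shown above. A real engine might return structured error objects instead.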
scripts/data_profiler.py - Dataset profiling and summary stats
scripts/schema_validator.py - Schema-based validation engine
scripts/anomaly_detector.py - Statistical anomaly detection
references/validation_rules.md - Common validation patterns
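The profiling example reads report["missing"] and report["outliers"] without showing what those entries hold. The sketch below is one plausible shape, assuming per-column missing-value counts and an interquartile-range (IQR) rule for numeric outliers. The function name profile_frame, the iqr_factor parameter, and the IQR rule itself are assumptions; the method actually used by scripts/data_profiler.py is not stated here.

```python
import pandas as pd


def profile_frame(df, iqr_factor=1.5):
    """Return {"missing": ..., "outliers": ...} summarising df per column.

    Hypothetical stand-in for DataProfiler.profile, for illustration only.
    """
    # Count of missing (NaN/None) cells in every column.
    missing = {col: int(df[col].isna().sum()) for col in df.columns}

    # IQR rule: flag numeric values far outside the middle 50% of the data.
    outliers = {}
    for col in df.select_dtypes(include="number").columns:
        values = df[col].dropna()
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        spread = q3 - q1
        low, high = q1 - iqr_factor * spread, q3 + iqr_factor * spread
        outliers[col] = values[(values < low) | (values > high)].tolist()
    return {"missing": missing, "outliers": outliers}


# Small made-up DataFrame: one implausible age, two missing cities.
df = pd.DataFrame({
    "age": [31, 29, 35, 33, 30, 120],
    "city": ["Oslo", None, "Lima", "Pune", None, "Kyiv"],
})
report = profile_frame(df)
print(report["missing"])   # {'age': 0, 'city': 2}
print(report["outliers"])  # {'age': [120]}
```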