Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.
from scripts.main import ClinicalDataCleaner
# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')
# Clean data with default settings
cleaned = cleaner.clean(raw_data)
# Save with audit trail
cleaner.save_report('output.csv')
cleaner = ClinicalDataCleaner(domain='DM') # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)
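The `(is_valid, missing)` return shape suggests a required-column check. A minimal sketch of that idea follows, assuming the input is a list of column names. The `REQUIRED_COLUMNS` mapping and this standalone `validate_domain` function are hypothetical illustrations, not the library's implementation, and the column lists are placeholders rather than the project's actual specification.

```python
# Hypothetical required-column sets for illustration only; the real lists
# belong to the project's domain specifications and may differ.
REQUIRED_COLUMNS = {
    "DM": ["STUDYID", "DOMAIN", "USUBJID", "SUBJID"],
    "VS": ["STUDYID", "DOMAIN", "USUBJID", "VSTESTCD"],
}


def validate_domain(columns, domain):
    """Return (is_valid, missing) for a list of column names."""
    required = REQUIRED_COLUMNS[domain]
    # Compare case-insensitively so 'usubjid' and 'USUBJID' both match.
    present = {name.upper() for name in columns}
    missing = [name for name in required if name not in present]
    return (not missing, missing)


is_valid, missing = validate_domain(["STUDYID", "USUBJID", "AGE"], "DM")
print(is_valid, missing)
```

Returning the missing names alongside the boolean lets the caller report exactly which columns to add before cleaning starts.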
Required Fields:
cleaner = ClinicalDataCleaner(
domain='DM',
missing_strategy='median' # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)
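To make the five strategy names concrete, here is a rough standard-library sketch of what each one could do to a single numeric column. The `fill_missing` helper and its use of `None` for missing entries are assumptions for illustration; the library's own behavior may differ.

```python
from statistics import mean, median, mode


def fill_missing(values, strategy="median"):
    """Fill None entries in one numeric column using the chosen strategy."""
    observed = [v for v in values if v is not None]
    if strategy == "drop":
        # Discard missing entries entirely.
        return observed
    if strategy == "forward":
        # Carry the last observed value forward; leading gaps stay None
        # because there is nothing yet to carry.
        filled, last = [], None
        for v in values:
            last = v if v is not None else last
            filled.append(last)
        return filled
    if not observed:
        return list(values)  # nothing to estimate a fill value from
    estimators = {"mean": mean, "median": median, "mode": mode}
    fill = estimators[strategy](observed)
    return [fill if v is None else v for v in values]


print(fill_missing([60, None, 70, 80], "median"))
print(fill_missing([None, 5, None], "forward"))
```

The median is less sensitive to extreme values than the mean, which is one reason it is a common choice when a column may still contain outliers.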
cleaner = ClinicalDataCleaner(
domain='LB',
outlier_method='domain', # iqr, zscore, domain
outlier_action='flag' # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)
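The `iqr` and `zscore` methods are standard statistical rules. The sketch below shows both on a plain list of numbers. The function names, the 1.5 multiplier, and the default threshold of 3 are conventional choices assumed here, not values confirmed by the library.

```python
from statistics import mean, quantiles, stdev


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [False] * len(values)  # constant column: nothing can be an outlier
    return [abs(v - mu) / sigma > threshold for v in values]


readings = [10, 11, 12, 13, 14, 15, 100]
print(iqr_outliers(readings))
print(zscore_outliers(readings, threshold=2.0))
```

With the sample standard deviation, no z-score can exceed (n-1)/sqrt(n), so a threshold of 3 can only fire once a column has at least 11 values. For small datasets the IQR rule or a lower threshold is the more practical choice.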
Clinical Thresholds:
| Parameter | Range | Unit |
|---|---|---|
| Glucose | 50-500 | mg/dL |
| Hemoglobin | 5-20 | g/dL |
| Systolic BP | 70-220 | mmHg |
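As an illustration of how the `domain` method might combine with the three outlier actions, the sketch below applies the ranges from the table above to a list of records. The record layout (`parameter` and `value` keys), the helper name, and the choice to treat the bounds as inclusive are assumptions made for this example.

```python
# Plausibility ranges copied from the table above: (low, high, unit).
THRESHOLDS = {
    "Glucose": (50, 500, "mg/dL"),
    "Hemoglobin": (5, 20, "g/dL"),
    "Systolic BP": (70, 220, "mmHg"),
}


def apply_domain_thresholds(records, action="flag"):
    """Apply flag, remove, or cap to records with 'parameter' and 'value' keys."""
    result = []
    for rec in records:
        low, high, _unit = THRESHOLDS[rec["parameter"]]
        out_of_range = not (low <= rec["value"] <= high)  # bounds assumed inclusive
        if action == "flag":
            # Keep every row and mark the suspicious ones.
            result.append({**rec, "outlier": out_of_range})
        elif action == "remove":
            if not out_of_range:
                result.append(dict(rec))
        elif action == "cap":
            # Clamp the value to the nearest bound.
            result.append({**rec, "value": min(max(rec["value"], low), high)})
        else:
            raise ValueError(f"unknown action: {action}")
    return result


labs = [
    {"parameter": "Glucose", "value": 620},
    {"parameter": "Hemoglobin", "value": 13.5},
]
print(apply_domain_thresholds(labs, action="flag"))
```

Flagging is the most conservative of the three, since it keeps the original values available for review, whereas `remove` and `cap` alter the dataset.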
standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
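A minimal sketch of this kind of conversion using only the standard library is shown below. The `to_iso8601` helper and the list of accepted input layouts are assumptions for illustration. In particular, slash-separated dates are read as month/day/year here, which a real dataset would need to confirm.

```python
from datetime import datetime

# Input layouts to try, in order. This list is an assumption, not the tool's.
KNOWN_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%d-%b-%Y %H:%M",
    "%m/%d/%Y %H:%M",
    "%Y-%m-%d",
    "%d-%b-%Y",
    "%m/%d/%Y",
]


def to_iso8601(text):
    """Return an ISO 8601 string, or None if no known layout matches."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        # Keep date-only inputs date-only rather than inventing a time.
        if "%H" in fmt:
            return parsed.isoformat(timespec="seconds")
        return parsed.date().isoformat()
    return None


print(to_iso8601("15-Jan-2023 09:30"))
print(to_iso8601("01/15/2023"))
```

Returning `None` for unparseable input, instead of raising, leaves the decision about how to report bad dates to the caller.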
cleaner = ClinicalDataCleaner(
domain='DM',
missing_strategy='median',
outlier_method='iqr',
outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')
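For a sense of what a JSON audit trail next to the cleaned CSV could contain, here is a hedged sketch. The `write_audit_report` helper and every field name in the report are hypothetical. Only the `output.report.json` naming pattern comes from this document.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_audit_report(output_csv, domain, changes):
    """Write <stem>.report.json next to the cleaned CSV and return its path."""
    report_path = Path(output_csv).with_suffix(".report.json")
    report = {
        "domain": domain,
        "output_file": str(output_csv),
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "changes": changes,  # one entry per cleaning step that modified data
    }
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report_path


path = write_audit_report(
    "output.csv",
    "DM",
    [{"column": "AGE", "action": "imputed", "strategy": "median", "rows": 3}],
)
print(path)
```

Recording each change as a structured entry, and not as free text, makes the report easier to review and to compare between runs.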
Output Files:
- output.csv - Cleaned SDTM data
- output.report.json - Audit trail for regulatory submission

# Clean demographics
python scripts/main.py \
--input dm_raw.csv \
--domain DM \
--output dm_clean.csv \
--missing-strategy median \
--outlier-method iqr \
--outlier-action flag
# Clean lab data with clinical thresholds
python scripts/main.py \
--input lb_raw.csv \
--domain LB \
--output lb_clean.csv \
--outlier-method domain
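The flags in the commands above map naturally onto `argparse`. The sketch below shows one way such a command line could be declared. It is not the actual contents of `scripts/main.py`, and the defaults and `choices` lists are assumptions inferred from the option values mentioned in this document.

```python
import argparse


def build_parser():
    """Declare the command-line flags used in the examples above."""
    parser = argparse.ArgumentParser(
        description="Clean clinical trial data for one SDTM domain."
    )
    parser.add_argument("--input", required=True, help="raw CSV file")
    parser.add_argument("--domain", required=True, choices=["DM", "LB", "VS"])
    parser.add_argument("--output", required=True, help="cleaned CSV file")
    # Defaults below are assumed for illustration.
    parser.add_argument(
        "--missing-strategy",
        default="median",
        choices=["mean", "median", "mode", "forward", "drop"],
    )
    parser.add_argument(
        "--outlier-method", default="iqr", choices=["iqr", "zscore", "domain"]
    )
    parser.add_argument(
        "--outlier-action", default="flag", choices=["flag", "remove", "cap"]
    )
    return parser


args = build_parser().parse_args(
    ["--input", "lb_raw.csv", "--domain", "LB",
     "--output", "lb_clean.csv", "--outlier-method", "domain"]
)
print(args.domain, args.outlier_method, args.missing_strategy)
```

Using `choices` makes the parser reject a misspelled strategy at startup, before any data is read.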
See references/common-patterns.md for detailed usage examples.
See references/troubleshooting.md for the problem-solving guide.
Pre-Cleaning:
Post-Cleaning:
- references/sdtm_ig_guide.md - CDISC SDTM Implementation Guide
- references/domain_specs.json - Domain-specific field requirements
- references/outlier_thresholds.json - Clinical outlier thresholds
- references/common-patterns.md - Detailed usage patterns
- references/troubleshooting.md - Problem-solving guide

Skill ID: 189 | Version: 2.0 | License: MIT