← 返回
数据分析 中文

Clinical Data Cleaner

Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec...
用于临床试验数据清理、FDA/EMA申报数据准备、SDTM数据集标准化、缺失值处理...
renhaosu2024
数据分析 clawhub v0.1.1 1 版本 100000 Key: 无需
★ 0
Stars
📥 812
下载
💾 45
安装
1
版本
#latest

概述

Clinical Data Cleaner

Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.

Quick Start

from scripts.main import ClinicalDataCleaner

# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')

# Clean data with default settings
cleaned = cleaner.clean(raw_data)

# Save with audit trail
cleaner.save_report('output.csv')

Core Capabilities

1. SDTM Domain Validation

cleaner = ClinicalDataCleaner(domain='DM')  # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)

Required Fields:

  • DM: STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, SITEID, AGE, SEX, RACE
  • LB: STUDYID, USUBJID, LBTESTCD, LBCAT, LBORRES, LBORRESU, LBSTRESC, LBDTC
  • VS: STUDYID, USUBJID, VSTESTCD, VSORRES, VSORRESU, VSSTRESC, VSDTC

2. Missing Value Handling

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median'  # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)

3. Outlier Detection

cleaner = ClinicalDataCleaner(
    domain='LB',
    outlier_method='domain',  # iqr, zscore, domain
    outlier_action='flag'     # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)

Clinical Thresholds:

ParameterRangeUnit
------------------------
Glucose50-500mg/dL
Hemoglobin5-20g/dL
Systolic BP70-220mmHg

4. Date Standardization

standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00

5. Complete Pipeline

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median',
    outlier_method='iqr',
    outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')

Output Files:

  • output.csv - Cleaned SDTM data
  • output.report.json - Audit trail for regulatory submission

CLI Usage

# Clean demographics
python scripts/main.py \
  --input dm_raw.csv \
  --domain DM \
  --output dm_clean.csv \
  --missing-strategy median \
  --outlier-method iqr \
  --outlier-action flag

# Clean lab data with clinical thresholds
python scripts/main.py \
  --input lb_raw.csv \
  --domain LB \
  --output lb_clean.csv \
  --outlier-method domain

Common Patterns

See references/common-patterns.md for detailed examples:

  • Regulatory Submission Preparation
  • Interim Analysis Data Preparation
  • Database Migration Cleanup
  • External Lab Data Integration

Troubleshooting

See references/troubleshooting.md for solutions to:

  • Validation failures
  • Date parsing errors
  • Memory errors with large datasets
  • Outlier detection issues

Quality Checklist

Pre-Cleaning:

  • [ ] IACUC approval obtained (animal studies)
  • [ ] Sample size adequately powered
  • [ ] Randomization method documented

Post-Cleaning:

  • [ ] Validate against CDISC SDTM IG
  • [ ] Review all cleaning actions in audit trail
  • [ ] Test import to analysis software

References

  • references/sdtm_ig_guide.md - CDISC SDTM Implementation Guide
  • references/domain_specs.json - Domain-specific field requirements
  • references/outlier_thresholds.json - Clinical outlier thresholds
  • references/common-patterns.md - Detailed usage patterns
  • references/troubleshooting.md - Problem-solving guide

Skill ID: 189 | Version: 2.0 | License: MIT

版本历史

共 1 个版本

  • v0.1.1 当前
    2026-03-29 17:14 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

Excel / XLSX

ivangdavila
创建、检查和编辑 Microsoft Excel 工作簿及 XLSX 文件,支持可靠的公式、日期、类型、格式、重算及模板保留功能。
★ 367 📥 140,089
content-creation

3d Molecule Ray Tracer

renhaosu2024
生成PyMOL和UCSF ChimeraX照片级渲染脚本,创建发表级分子可视化,支持光线追踪与景深
★ 0 📥 663
data-analysis

Data Analysis

ivangdavila
{"answer":"数据分析与可视化。查询数据库、生成报告、自动化电子表格,将原始数据转化为清晰可行的见解。适用于:(1) 您……"}
★ 198 📥 64,922