Automatically converts uploaded drug application documents (Word/PDF) into an XML skeleton compliant with the eCTD 4.0/3.2.2 specifications.
scripts/main.py.
references/ for task-specific guidance.
python-docx>=0.8.11 # Word document parsing
PyPDF2>=3.0.0 # PDF text extraction
lxml>=4.9.0 # XML processing
cd "20260318/scientific-skills/Academic Writing/ectd-xml-compiler"
python -m py_compile scripts/main.py
python scripts/main.py --help
Example run plan:
CONFIG block or documented parameters if the script uses fixed settings.
python scripts/main.py with the validated inputs.
references/ contains supporting rules, prompts, or checklists.
Use these commands to verify that the packaged script entry point can be parsed before deeper execution. They are self-contained and avoid placeholder paths.
python -m py_compile scripts/main.py
python scripts/main.py --help
eCTD (electronic Common Technical Document) is the standard established by ICH for electronically submitting drug registration applications to regulatory agencies such as the FDA and EMA.
This tool parses uploaded drug application documents (Word/PDF) and converts them into an XML skeleton compliant with the eCTD 4.0/3.2.2 specifications.
eCTD/
├── m1/ # Module 1: Administrative Information and Prescribing Information (region-specific)
│ ├── m1.xml
│ └── ...
├── m2/ # Module 2: CTD Summaries
│ ├── m2.xml
│ └── ...
├── m3/ # Module 3: Quality
│ ├── m3.xml
│ └── ...
├── m4/ # Module 4: Nonclinical Study Reports
│ ├── m4.xml
│ └── ...
├── m5/ # Module 5: Clinical Study Reports
│ ├── m5.xml
│ └── ...
├── index.xml # Master index file
├── index-md5.txt # MD5 checksum file
└── dtd/ # DTD files
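The layout above can be sketched in a few lines of Python. This is a minimal illustration, not the logic of scripts/main.py: the function name `create_skeleton` is hypothetical, and the files it creates are empty placeholders rather than valid eCTD XML.

```python
from pathlib import Path

# Module names taken from the directory layout above.
MODULES = ("m1", "m2", "m3", "m4", "m5")


def create_skeleton(root):
    """Create the eCTD directory layout with empty placeholder files.

    Hypothetical helper: real module files must follow the eCTD DTD/schema.
    """
    root = Path(root)
    for module in MODULES:
        module_dir = root / module
        module_dir.mkdir(parents=True, exist_ok=True)
        # One placeholder XML file per module, e.g. m3/m3.xml
        (module_dir / f"{module}.xml").touch()
    (root / "dtd").mkdir(exist_ok=True)
    (root / "index.xml").touch()
    (root / "index-md5.txt").touch()
    return root
```

Calling `create_skeleton("./ectd-output/eCTD")` would produce the tree shown above, minus any real content.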
python skills/ectd-xml-compiler/scripts/main.py [options] <input_files...>
| Argument | Description |
|----------|-------------|
| input_files | Input Word/PDF file paths (supports multiple) |
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --output | -o | Output directory path | ./ectd-output |
| --module | -m | Target module (m1-m5, auto) | auto |
| --region | -r | Target region (FDA, EMA, ICH) | ICH |
| --version | -v | eCTD version (3.2.2, 4.0) | 4.0 |
| --dtd-path | -d | Custom DTD path | Built-in DTD |
| --validate | | Validate generated XML | False |
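For readers who want to see how the documented options fit together, the following is a sketch of an `argparse` parser that mirrors the two tables above. It is an assumption about how such a command line could be declared, not a copy of scripts/main.py; the function name `build_parser` and the use of `None` to stand for the built-in DTD are illustrative choices.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Convert Word/PDF documents into an eCTD XML skeleton.",
    )
    parser.add_argument("input_files", nargs="+",
                        help="Input Word/PDF file paths (supports multiple)")
    parser.add_argument("-o", "--output", default="./ectd-output",
                        help="Output directory path")
    parser.add_argument("-m", "--module", default="auto",
                        choices=["m1", "m2", "m3", "m4", "m5", "auto"],
                        help="Target module")
    parser.add_argument("-r", "--region", default="ICH",
                        choices=["FDA", "EMA", "ICH"],
                        help="Target region")
    parser.add_argument("-v", "--version", default="4.0",
                        choices=["3.2.2", "4.0"],
                        help="eCTD version")
    # Assumption: None means "use the built-in DTD".
    parser.add_argument("-d", "--dtd-path", default=None,
                        help="Custom DTD path")
    parser.add_argument("--validate", action="store_true",
                        help="Validate generated XML")
    return parser
```

With this sketch, `build_parser().parse_args(["-r", "FDA", "-v", "3.2.2", "a.pdf"])` yields `region="FDA"`, `version="3.2.2"`, and the remaining options at their documented defaults.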
# Basic usage - auto-detect module
python skills/ectd-xml-compiler/scripts/main.py document1.docx document2.pdf
# Specify output directory and module
python skills/ectd-xml-compiler/scripts/main.py -o ./my-ectd -m m3 quality-doc.docx
# FDA submission format
python skills/ectd-xml-compiler/scripts/main.py -r FDA -v 3.2.2 *.pdf
# Validate generated XML
python skills/ectd-xml-compiler/scripts/main.py --validate submission.pdf
| Keyword Pattern | Target Module |
|------------|----------|
| Administrative, Label, Package Insert | m1 |
| Summary, summary, Overview | m2 |
| Quality, quality, CMC, API, Drug Product | m3 |
| Nonclinical, Toxicology, Pharmacokinetics | m4 |
| Clinical, clinical, Study, Trial | m5 |
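The keyword table above can be read as a first-match lookup. The sketch below shows one way to implement it, under several assumptions that the source does not confirm: matching is case-insensitive, keywords must match whole words (so that "Clinical" does not fire inside "Nonclinical"), and modules are checked in table order. The name `detect_module` is hypothetical.

```python
import re

# Keywords copied from the table above (case variants merged).
MODULE_KEYWORDS = {
    "m1": ["Administrative", "Label", "Package Insert"],
    "m2": ["Summary", "Overview"],
    "m3": ["Quality", "CMC", "API", "Drug Product"],
    "m4": ["Nonclinical", "Toxicology", "Pharmacokinetics"],
    "m5": ["Clinical", "Study", "Trial"],
}


def detect_module(text, default=None):
    """Return the first module whose keyword appears in text, else default.

    Assumptions of this sketch: whole-word, case-insensitive matching,
    with modules tried in the order m1..m5.
    """
    for module, keywords in MODULE_KEYWORDS.items():
        for keyword in keywords:
            pattern = rf"\b{re.escape(keyword)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                return module
    return default
```

Because the first match wins, a title that contains keywords from several rows resolves to the earliest module in the table; passing `-m` explicitly avoids that ambiguity.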
The generated eCTD skeleton contains:
A master index file containing references and sequence information for all modules.
An XML skeleton for each module.
MD5 checksum values for each file to ensure integrity.
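As an illustration of the checksum step, the sketch below computes an MD5 digest per file with `hashlib` and writes the results to a manifest. The helper names and the one-line-per-file manifest format are assumptions; the eCTD specification defines what index-md5.txt must actually contain, so check it before relying on this layout.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(root, manifest_name="index-md5.txt"):
    """Write '<md5>  <relative path>' lines for every file under root.

    Hypothetical manifest format; the manifest itself is skipped.
    """
    root = Path(root)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == manifest_name:
            continue
        relative = path.relative_to(root).as_posix()
        lines.append(f"{md5_of_file(path)}  {relative}")
    (root / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Reading in chunks keeps memory use flat for large PDF inputs.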
# Install dependencies
pip install python-docx PyPDF2 lxml
Pass the --validate option to validate the generated XML.
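What --validate checks internally is not documented here. As a rough stand-in, the sketch below performs only a well-formedness check using the standard library; it is weaker than DTD validation, which would need a DTD-aware parser such as lxml. The function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_path):
    """Return (ok, message) for a basic XML well-formedness check.

    This does not validate against a DTD; it only confirms the file parses.
    """
    try:
        ET.parse(xml_path)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, "ok"
```

A check like this catches truncated or mis-nested output early, before a stricter validator is run.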
MIT License
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
No Python packages are required beyond those listed above (python-docx, PyPDF2, lxml).
Every final response should make these items explicit when they are relevant:
If scripts/main.py fails, report the failure point, summarize what can still be completed safely, and provide a manual fallback.
This skill accepts requests that match the documented purpose of ectd-xml-compiler and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> ectd-xml-compiler only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
Use the following fixed structure for non-trivial requests:
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.