← 返回
未分类

tidy-excel-data

Clean, normalize, validate, and summarize messy Excel workbooks. Use when Codex needs to organize .xlsx, .xls, .xlsm, or .csv data, including standardizing headers, removing blank rows and columns, trimming text, deduplicating records, detecting data quality issues, splitting or merging sheets, producing a cleaned workbook, or creating an audit summary for business spreadsheets.
Clean, normalize, validate, and summarize messy Excel workbooks. Use when Codex needs to organize .xlsx, .xls, .xlsm, or .csv data, including standardizing headers, removing blank rows and columns, trimming text, deduplicating records, detecting data quality issues, splitting or merging sheets, producing a cleaned workbook, or creating an audit summary for business spreadsheets.
user_32f7e774
未分类 community v1.0.0 1 版本 98305.1 Key: 无需
★ 0
Stars
📥 58
下载
💾 0
安装
1
版本
#latest

概述

Tidy Excel Data

Overview

Use this skill to turn messy spreadsheet data into a clean, auditable workbook. Favor a reversible workflow: inspect first, preserve raw data when useful, clean deterministically, and report every meaningful assumption.

Workflow

  1. Confirm the user's goal and output format when it is not obvious.
  2. Inspect workbook structure: sheet names, dimensions, merged cells, header rows, blank rows or columns, data types, duplicate candidates, and obvious ID/date/amount fields.
  3. Choose the safest cleaning level:
    • Light cleanup: trim text, remove empty rows/columns, normalize headers, preserve all records.
    • Standard cleanup: light cleanup plus duplicate removal and common type coercion.
    • Custom cleanup: apply user-provided rules for columns, filters, joins, pivots, or sheet splitting.
  4. Use scripts/tidy_excel.py for repeatable workbook cleanup when it fits the task.
  5. For complex business rules, write a short task-specific script or notebook-like Python snippet, then save the cleaned workbook and a QA summary.
  6. Verify the output by reopening the generated workbook or reading key sheets back with pandas/openpyxl.

Quick Start

Run the bundled script for common cleanup:

python scripts/tidy_excel.py input.xlsx --output cleaned.xlsx

Useful options:

python scripts/tidy_excel.py input.xlsx --output cleaned.xlsx --dedupe --coerce-types
python scripts/tidy_excel.py input.xlsx --output cleaned.xlsx --sheet Orders --summary qa_summary.csv
python scripts/tidy_excel.py input.csv --output cleaned.xlsx --header-row 2

The script writes cleaned sheets to an Excel workbook and adds a _tidy_summary sheet with row counts, column counts, duplicate counts, missing-value counts, and inferred column types.

Cleaning Rules

Apply these defaults unless the user gives different instructions:

  • Preserve the original file; write a new workbook instead of overwriting.
  • Normalize headers to lowercase snake_case; keep names unique with numeric suffixes.
  • Remove fully empty rows and columns.
  • Trim leading/trailing whitespace in text cells and collapse repeated internal whitespace.
  • Treat common empty markers (n/a, na, null, none, -, empty strings) as missing values.
  • Do not delete partial rows unless the user asks for filtering.
  • Deduplicate only when requested or when rows are exact duplicates and the user asked for "clean/organize".
  • Keep leading-zero identifiers as text unless the user explicitly wants numeric conversion.
  • Avoid destructive date or currency conversion when formats are ambiguous.

Data Quality Checks

Use references/quality-checks.md when deciding what to inspect or report. Include a short QA note in the final response covering:

  • workbook and sheet names processed
  • rows/columns before and after cleanup
  • duplicate rows removed, if any
  • columns with high missing rates
  • assumptions or conversions applied
  • output file path

Custom Tasks

For merges, pivots, category mapping, multi-row headers, or domain-specific rules, inspect the workbook first and adapt the workflow. Prefer pandas for tabular transformations and openpyxl when preserving workbook formatting, formulas, or sheet-level structure matters.

版本历史

共 1 个版本

  • v1.0.0 Initial release 当前
    2026-05-17 17:41 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-intelligence

ontology

oswalpalash
类型化知识图谱,用于结构化智能体记忆与可组合技能。支持创建/查询实体(人员、项目、任务、事件、文档)及关联...
★ 711 📥 243,717
developer-tools

Github

steipete
使用 `gh` CLI 与 GitHub 交互,通过 `gh issue`、`gh pr`、`gh run` 和 `gh api` 管理议题、PR、CI 运行及高级查询。
★ 668 📥 324,029
security-compliance

Skill Vetter

spclaudehome
AI智能体技能安全预审工具。安装ClawdHub、GitHub等来源技能前,检查风险信号、权限范围及可疑模式。
★ 1,215 📥 266,423