HR Report Converter

Overview

This skill converts HR data vendor Excel reports from major compensation/benefits survey providers (Aon, Mercer, WTW, KornFerry, IPAS) into a company's standardized internal Excel format. It handles field (header/column) mapping, job title matching, percentile notation normalization, unit standardization, and data filtering.

Key capabilities:

Auto-detect vendor from report structure
Interactive first-time configuration wizard
Reusable config file for subsequent conversions
Support for compensation, benefits, and human capital report types
Chinese/English bilingual field support

Workflow

User Request
    │
    ├── Has config file? ── Yes ──> Load config & convert directly
    │   (JSON config OR Excel mapping workbook)
    │
    └── No config file? ──> First-time Setup Wizard
         │
         ├── Step 1: User provides output template (result title)
         ├── Step 2: User provides field mapping table (vendor→internal)
         ├── Step 3: User provides job code matching table
         ├── Step 4: (Optional) User provides aging factor table
         ├── Step 5: Auto-detect vendor & validate mappings
         ├── Step 6: User confirms/edits any unmapped fields
         └── Step 7: Save config & execute conversion

Step-by-Step Procedure

Phase 1: Determine State (Config Exists?)

On each invocation, check whether the user already has a saved mapping configuration:

Ask: "Do you have an existing mapping configuration?" — accept either:

JSON config file (mapping-config.json style)
Excel mapping workbook (multi-sheet Excel with field mapping, job matching, and optional aging factor sheets)

If yes: load the config and skip to Phase 3.
If no: proceed to Phase 2 (First-time Setup).

Phase 2: First-Time Setup Wizard

This is a guided, interactive process. The user provides 3 required + 1 optional input files. Complete ALL steps before executing conversion.

Step 2.1: Collect Output Template + Company Job Master Data

Ask the user to upload their company's output template — an Excel file containing only the desired output column headers (1 row, no data).

Example: result title_简化案例.xlsx — 1 sheet with headers like X Job Code | X Job Family Group | Base 50th | TTDC 50th ...

Read and extract all column headers. These define the target internal format.
Classify columns into two groups:

Company columns (prefixed with X or company-specific): to be populated from the company's job master data + job code matching table via LEFT JOIN.
Market data columns (e.g., Base 50th, TTDC 65th): to be filled from vendor reports via field mapping.

The company job master data (job code, family, title, band, management level) is provided by the user — either inline in the output template's data rows, or as a separate file.
The job code matching table (from Step 2.3) provides the bridge: company job code → vendor job code. The output is assembled by:

Starting with the company job master data rows
LEFT JOINing vendor report data on job code (via the matching table)
Applying field mapping to select/rename vendor columns

If no template is available, ask the user to describe their desired output columns in plain text.

Step 2.2: Collect Field Mapping Table

Ask the user to upload their field mapping workbook — an Excel file with a sheet that maps internal field names to vendor field names.

Expected format (like Result data field match sheet):
Row 1: Column headers — Internal Field | Vendor A Field | Vendor A Field (Region) | Vendor B Field | ...
Subsequent rows: One row per internal field, with the corresponding vendor field name in each vendor column.
Use - to indicate a field does not exist for that vendor.
The same vendor may have separate columns for different regions (e.g., Mercer data field vs Mercer data field For India).

Read the mapping table and build the field mapping config:

For each internal field, find the corresponding vendor field based on the detected vendor and region.
Flag any - (missing) fields and prompt the user.

If the user does not have this file: auto-generate a suggested mapping by comparing the output template columns against the vendor report columns, then present it for user confirmation.

Step 2.3: Collect Job Code Matching Table

Ask the user to upload their job code matching table — an Excel file that maps company job codes to vendor job codes.

Expected format (like Non-Exec Job Match sheet):
Row 1: Column headers — X Job Code | X Job Family | X Job Title | X Band | X Management Level | Vendor A Job Code | Vendor B Job Code | ...
Subsequent rows: One row per company job, with multiple vendor job code columns (one per vendor).
One company job may map to multiple vendor job codes (1:N relationship).

Read and extract the relevant vendor's job code column.
If the user does not have this file: attempt fuzzy matching by job title, but warn the user that fuzzy matching is unreliable and they should verify.

Step 2.4: (Optional) Collect Aging Factor Table

Ask: "是否需要对市场数据进行 aging（老化）处理？如果需要，请提供按国家的老化系数表。"
If provided, expected format (like Aging Factor sheet):

Row 1: X Country | Aged to Date | Aging Factor
Subsequent rows: One row per country with the aging factor (decimal, e.g., 0.03 = 3%).

If not provided, skip aging and use vendor data as-is.

Step 2.5: Auto-Detect Vendor & Validate Mappings

Ask the user to upload the vendor Excel report.
Auto-detect vendor using column patterns (see Phase 3 Step 3.2 or --auto-detect).
Using the field mapping table from Step 2.2, validate that all market data columns in the output template have a valid vendor source field.
Present validation results:

| Output Column          | Vendor Source Field          | Status     |
|------------------------|-----------------------------|------------|
| Base 50th              | Base Salary Median_IW       | ✓ Matched  |
| Base 65th              | Base Salary Perc65_IW       | ✓ Matched  |
| TTDC 50th              | TDC Target (B-S) Median_IW  | ✓ Matched  |
| Survey Job Title       | (company column)             | — N/A      |
| TTDC 65th              | TDC Target (B-S) Perc75_IW  | ✓ Matched  |

For any unmapped market data columns, prompt the user to specify manually.
Handle special cases:

Percentile notation: Normalize vendor notation to internal notation (e.g., Mercer Median → 50th).
Currency: Note the vendor's currency and ask if conversion is needed.
Units: Check for K/M suffixes that need normalization.

> Mercer Dual-Row Header Matching Rule (CRITICAL):

> Mercer reports use 2-row headers: Row 1 = technical camelCase names (e.g., TotalGuaranteedAnnual), Row 2 = human-readable display names (e.g., Total Guaranteed Annual).

> When matching Mercer fields to the user's internal format, follow this priority:

> 1. First: Match against Row 2 (display names) — these are more intuitive and closer to common naming conventions.

> 2. Fallback: If no match found in Row 2, try matching against Row 1 (technical names).

> 3. No match: If neither row produces a match, flag the field as (unmapped) and explicitly prompt the user: "未找到匹配项：[字段名]（Row 2: [display name] / Row 1: [technical name]），请手动指定映射目标。"

> This rule applies to both Mercer Global and Mercer India reports.

Missing / Empty Value Handling Rules (CRITICAL)

Apply these rules throughout the entire conversion pipeline:

A. In user's mapping configuration (field mapping table / job match table):

Cells with -, empty, null, N/A, #N/A, or any placeholder indicating absence → skip — do not attempt to map this field.
Effectively treat as "this vendor does not provide this data point."

B. In vendor report data (actual source data):

Cells with -, empty, null, N/A, #N/A, or any placeholder → output as -- (double dash) in the result.
Cells with actual values (numbers, text, strings) → preserve original value and format exactly — do not round, truncate, or reformat unless the user's transformation rules explicitly require it.

C. In output Excel:

All -- entries should be formatted as plain text, not numbers.
Do not apply number formatting (comma separator, decimal places) to -- cells.

Step 2.6: Build and Save Config

Consolidate all collected information into a reusable config:

If using Excel mapping workbook: save the workbook path as the config reference (no JSON conversion needed).
If using AI-generated mapping: save as JSON config using assets/mapping-config-template.json as template.

Inform the user: "Configuration saved. Next time, provide this config to skip setup."

Phase 3: Execute Conversion

Step 3.1: Prepare Dependencies

Before running the script, ensure pandas and openpyxl are available:

pip install pandas openpyxl

If installing is not possible, fall back to AI-only processing mode: manually read the Excel, apply mappings in code, and write the output.

Step 3.2: Run Conversion Script

Execute the conversion:

python3 scripts/convert_report.py \
  --config hr-mapping-config.json \
  --input vendor_report.xlsx \
  --output internal_report.xlsx \
  --report-type compensation

Supported report types: compensation, benefits, human_capital.

Step 3.3: AI-Only Fallback (if script unavailable)

If the Python script cannot run (missing dependencies, environment issues), process the conversion using direct file operations:

Read the vendor Excel with pandas or openpyxl.
Apply column renaming per the config's field_mapping.
Apply job title substitution per job_title_mapping.
Apply data filters per filters.
Reorder columns to match internal format.
Write the output Excel with proper formatting (header styling, column widths).

Step 3.4: Quality Check

After conversion, verify the output:

Check row count matches expectations (data not lost).
Spot-check a few cells for correct values.
Verify column headers match the internal format.
Confirm job titles were correctly mapped (if applicable).
Report results to the user with a summary:

Conversion Summary:
  Vendor: Mercer
  Report Type: Compensation
  Input rows: 1,247
  Output rows: 1,247
  Columns mapped: 14/18 (4 preserved as-is)
  Job titles mapped: 892/1,247 (71%)
  Output: internal_report.xlsx

Key Knowledge: Vendor Differences (Based on Real Sample Data)

When building or verifying field mappings, reference references/vendor-report-structures.md for detailed vendor-specific field structures. All observations below are based on real vendor Excel samples stored in references/.

Aspect	Aon	Mercer	Mercer India	WTW	IPAS
--------	-----	--------	-------------	-----	------
Percentile notation	P25/P50/P65/P75/P90	Perc25/Median/Perc65/Perc75/Perc90	Same as Mercer	25th/50th/65th/75th/90th	P25/P50/P65/P75/P90
Header rows	1	2 (critical!)	2 (critical!)	1	1
Case convention	Title Case	camelCase (R1) / Title Case (R2)	Same as Mercer	lowercase	Mixed
Org count name	Company Count	NumOrgs	NumOrgs	#Orgs	Mkt # Cos
Employee count name	Employee Count	NumObs	NumObs	#Incs	Mkt #EEs
Fiscal year ref	CFY/LFY	DataEffectiveDate	DataEffectiveDate	N/A	Data Aged To
Car allowance	Explicitly excluded	Included in Guaranteed	Included in Guaranteed	N/A	Included in Base+
STI % base	% of CFY Base	% of Base Salary	% of Guaranteed Cash	% of Base Salary	Inc Tgt %
Columns (real sample)	97	85	85	82	61
Total Cash name	Target/Actual Total Cash excl. Car Allowance	TDC Target/Actual (B-S)	TDC Target/Actual (B-S)	Target/Actual Total Annual Compensation	TTC / ATC

> Critical parsing notes:

> - Mercer has 2-row headers: Row 1 = technical camelCase names, Row 2 = display names. Use Row 2 for mapping.

> - Aon headers contain newlines (\n): Strip them when parsing column names.

> - Mercer India differs from Mercer Global: STI % is of Guaranteed Cash (not Base). Has explicit TotalCashTarget/TotalCashActual.

> - IPAS uses abbreviated names: Base, Base+, Inc Tgt, Inc PYA, TTC, ATC.

> - WTW uses lowercase column names throughout.

> - Job title matching is NEVER 1:1 — always validate matches using job descriptions, not just titles.

Bundled Resources

scripts/convert_report.py

Python script for automated Excel conversion. Handles:

Excel reading (openpyxl engine)
Column renaming via config
Job title fuzzy matching
Percentile notation normalization
Unit normalization (K/M/B suffixes)
Currency conversion
Data filtering
Fuzzy country name matching for aging factors (5-level priority: exact → substring aging→vendor → substring vendor→aging → token overlap → no match)
Audit Log sheet generation — automatically records all special cases (fuzzy aging matches, missing aging matches, unmapped fields, unmatched jobs) with severity-based color coding (WARNING=red, INFO=yellow)
Formatted output Excel with styled headers, auto-fit columns, number formatting

CLI usage:

python3 scripts/convert_report.py --config <config.json> --input <vendor.xlsx> --output <internal.xlsx> --report-type compensation
python3 scripts/convert_report.py --input <vendor.xlsx> --auto-detect  # Detect vendor + suggest mapping

references/

Documentation and reference materials.

Files:

vendor-report-structures.md — Comprehensive field structure reference based on real vendor samples (Aon, Mercer, Mercer India, WTW, IPAS, KornFerry), with cross-vendor mapping table. All field information from the original Excel samples is captured here.

Note: Real vendor Excel sample files (.xlsx) are maintained separately by the user and are NOT included in the skill zip package due to SkillHub's binary file restrictions. The skill package only contains text-based files: .md, .py, .json. The Markdown reference document (vendor-report-structures.md) contains all vendor field information needed for the skill to function.

When to load: Always load vendor-report-structures.md during Phase 2 (setup) and when encountering unfamiliar vendor columns.

assets/mapping-config-template.json

JSON template for the mapping configuration file. Contains:

company_info: Company name, format version, default currency
job_title_mapping: Vendor title → internal title pairs
report_types: Per-type config (compensation/benefits/human_capital) with field_mapping, filters, output_format
transformation_rules: Currency conversion, rounding, unit normalization, date format

When to use: As the starting template when generating a new user config. Copy and populate during Phase 2 Step 2.5.

Output Format Rules (CRITICAL)

Only output ONE merged result file — never output individual vendor results separately. All vendor data is consolidated into a single file.
Output headers MUST come from the user-provided output template (B document). The output columns exactly match whatever headers the user provides in their output template Excel.
If no output template is provided, use the first column of the field mapping table (A document) as output headers — these are the company's internal field names.
Output headers are ALWAYS the company's internal field names — never the vendor's original field names. The vendor field names are only used internally to locate data in the vendor report.
All vendor data rows are stacked vertically in the merged output, using the same column format. No vendor name prefix or suffix is added to column headers.

Core Mapping Logic

The conversion follows a 3-layer lookup chain:

Output Template Header (B document)
        │
        ▼
Field Mapping Table (A document) - Column 1 = company internal field name
        │                              Other columns = vendor-specific field names
        ▼
Vendor Report (C document) - Find column matching vendor field name → extract data
        │
        ▼
Output Row: Company internal field name as header, vendor data as value

Step-by-step:

Read the output template (B document) headers → these are the definitive output column names.
For each output column header, look it up in the field mapping table (A document) — find it in Column 1 (internal field names).
Once found, read the corresponding vendor-specific column in that same row of A document — this gives you the vendor's field name.
Go to the vendor report (C document), find the column with that vendor field name, and extract the data.
Write the extracted data under the company's internal field name (from B document header) — NOT under the vendor's field name.
If a field exists in B document but has no mapping in A document for a particular vendor → output --.
If a field exists in A document but not in B document → it is NOT included in the output (B document is the authoritative format).

Important: The A document's structure is not hardcoded. Users may provide their mapping in any format — the key is to identify:

Which column contains company internal field names (these become output headers or match B document headers)
Which columns contain vendor-specific field names (these are used to look up data in vendor reports)

Aging Factor Rules (CRITICAL)

Fuzzy Country Name Matching

When applying aging factors, vendor country names may not exactly match the aging factor table. Apply this matching priority:

Exact match — vendor country name = aging table country name (e.g., China = China)
Substring: aging name in vendor name — aging table name found as substring of vendor name (e.g., China found in China(T1) → match, use factor for China)
Substring: vendor name in aging name — vendor name found as substring of aging table name (e.g., United States found in United States of America → match)
Token overlap — significant words overlap between vendor name and aging table name (e.g., South Korea → Korea, Republic of via shared korea)
No match — skip aging, log as WARNING

Critical rules:

The output country name always keeps the vendor's original text (e.g., output stays China(T1), NOT changed to China)
The aging factor is applied based on the matched aging table entry, but the display text is never altered
If no match is found at all, the aging factor is NOT applied — the raw vendor value is used as-is

Audit Log Sheet

The output Excel file MUST contain a second sheet called "Audit Log" that records all special cases for user review:

What to log:

Type	Severity	When to log
------	----------	-------------
`fuzzy_aging_match`	INFO	Country name didn't exactly match but was fuzzy-matched. Log the vendor name, matched aging table name, factor used, and match method
`no_aging_match`	WARNING	Country name has NO match in aging factor table — aging was SKIPPED entirely
`no_field_mapping`	INFO	An output template field has no corresponding vendor field mapping in the A document — output will be `--`
`no_job_match`	INFO	A company job code had no matching vendor job code — row output with all vendor fields as `--`

Audit Log format:

Column	Content
--------	---------
#	Sequential number
Severity	`WARNING` (red highlight) or `INFO` (yellow highlight)
Type	Event type code (see table above)
Vendor	Which vendor this applies to
Detail	Human-readable description of what happened

Style requirements:

WARNING rows: red background (#FFC7CE) with dark red text (#9C0006)
INFO rows: yellow background (#FFEB9C) with dark yellow text (#9C6500)
Header row: dark blue background with white bold text
Freeze header row for scrolling
Add summary row at top with timestamp and entry counts

Multi-Vendor Merge Guide

When consolidating data from multiple vendors into a single output table, follow this approach:

How it works

Read the user's output template (B document) to get the definitive output column headers.
For each vendor report:

a. Use the field mapping table (A document) to find the vendor-specific field name for each internal field.

b. Read the vendor data, matching on job codes via the job code matching table.

c. Extract data using vendor field names, but label columns with internal field names.

Stack all vendor data rows vertically in the same output file, using the same column headers from B document.
No vendor prefix/suffix on column names — all rows share the same format.

Efficiency tips

Process vendors in parallel (each conversion is independent).
The field mapping table already supports multi-vendor columns — just read the correct vendor column for each run.
Use the job code matching table's multiple vendor columns to JOIN different vendor results to the same company job rows.
For large datasets (10,000+ rows), consider using pandas merge() instead of AI-driven JOIN for performance.

版本历史

共 2 个版本

v1.0.1 从分别输出合并文档和独立文档更新为只生成合并文档（独立文档被包含进了合并文档场景）；更新了近似匹配；有特殊处理的数据都整合到单独的工作簿中以便用户检验。当前

2026-05-31 19:04 安全安全
v1.0.0 当前版本仅限薪酬报告的匹配，需要用户提供本公司内部与外部供应商的匹配逻辑即可快速将供应商数据回匹为内部可用的分析版本

2026-05-31 17:01 安全安全

人力资源供应商报告整合skill

概述