Use this skill as a normalization workflow, not as a universal migration program. The skill should understand the current batch, follow the standard template order, make mapping decisions explicit, confirm each step with the user in text, trial-run sample output for that step, and only then write a batch-specific migration script that outputs strict standard import templates.
outputs/anjuleyu/.
Use references/standard-data-template/ as the default standard template directory. Read templates in filename order, skipping Excel lock files beginning with ~$.
.xls files.
mapping-review.xlsx, issues.xlsx, and mapping.json are supporting evidence only; the user should not need to open them to understand or approve the plan.
.xls compatibility, and known source totals.
Always organize mapping and confirmation by this standard template order:
01项目主数据导入模板.xls
02门店信息导入模板.xlsx
03项目补充信息导入模板.xlsx
04户型导入模板.xlsx
05房源导入模板.xlsx
Confirm project master data first. Determine:
For 经度(高德)/纬度(高德), final generation is blocked until the source coordinate system is confirmed. When the source is Baidu BD-09 and the target is Gaode/GCJ-02, convert BD-09 to GCJ-02 and always keep 6 decimal places; never pass Baidu coordinates through to Gaode fields.
Trial-run requirement: output a small sample of project rows and building/floor rows, including examples with non-numeric floor labels and coordinates, then ask the user to confirm before continuing.
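A minimal sketch of the BD-09 to GCJ-02 step, assuming the widely circulated approximation formula is acceptable for this import. The function name `bd09_to_gcj02` is hypothetical and is not taken from the bundled script.

```python
import math

# Constant used by the commonly published BD-09 <-> GCJ-02 approximation.
X_PI = math.pi * 3000.0 / 180.0


def bd09_to_gcj02(lng: float, lat: float) -> tuple[float, float]:
    """Approximate BD-09 -> GCJ-02 conversion, rounded to 6 decimal places."""
    x = lng - 0.0065
    y = lat - 0.006
    z = math.sqrt(x * x + y * y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    return round(z * math.cos(theta), 6), round(z * math.sin(theta), 6)


lng, lat = bd09_to_gcj02(116.404, 39.915)
# Format with a fixed width so trailing zeros survive in the output cell.
print(f"{lng:.6f}", f"{lat:.6f}")
```

Writing the values as fixed 6-decimal strings is one way to keep the precision stable; whether the target fields expect text or numeric cells still needs to be confirmed against the template.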
Store data is optional. Check whether the source files contain store-level information.
Trial-run requirement: output a small sample of store rows or a clear "skip store import" decision, then ask the user to confirm.
Confirm fields in 03项目补充信息导入模板.xlsx, including project code, completion year, manager phones, associated/linked store, elevator/stairs, parking, deposit rules, property fee, water/electricity/gas/network/parking/other public service fees, and late-fee rules.
店长手机号 (store manager phone) and 管家手机号 (housekeeper phone) are required. If the source cannot identify valid values, ask the user for default values before generation; each provided default must already exist in the target system or an explicitly confirmed system user directory. Do not invent, fabricate, or silently reuse unrelated phone numbers.
For associated/linked store fields, determine whether the source has a project-to-store relationship in room pricing, room configuration, store configuration, or project metadata. Confirm the grain for conflicts (for example one project linked to multiple stores) and whether to join multiple stores or split project grain before generation.
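The conflict check described above could be sketched as follows. The row shape and the `project` / `store` key names are assumptions for illustration; the real keys come from the confirmed mapping.

```python
from collections import defaultdict


def find_store_conflicts(rows, project_key="project", store_key="store"):
    """Return projects that are linked to more than one distinct store."""
    stores_by_project = defaultdict(set)
    for row in rows:
        project = row.get(project_key)
        store = row.get(store_key)
        if project and store:
            stores_by_project[project].add(store)
    return {
        project: sorted(stores)
        for project, stores in stores_by_project.items()
        if len(stores) > 1
    }


sample = [
    {"project": "A", "store": "S1"},
    {"project": "A", "store": "S2"},
    {"project": "B", "store": "S1"},
]
print(find_store_conflicts(sample))
```

Any project that appears in the result needs a user decision (join the stores or split the project grain) before generation.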
Trial-run requirement: output a small sample of project-extra rows and call out blank or inferred non-required fields, including any project-to-store conflicts.
Confirm whether the source contains independent layout data.
项目 + 户型名称.
In 04户型导入模板.xlsx, 室, 厅, and 卫 cannot all be 0 at the same time. 厨 and 阳台 cannot be blank; default them to 0 unless the user confirms another valid value.
Trial-run requirement: output a small sample of layout rows and duplicate/conflict checks before continuing.
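A small sketch of the layout checks, assuming each layout row is a dict keyed by the template column names and that the count fields are already numeric or blank. `check_layout` is a hypothetical helper name.

```python
def check_layout(row, default_zero_fields=("厨", "阳台")):
    """Apply the 0 default to blank 厨/阳台 and flag rows where 室/厅/卫 are all 0."""
    fixed = dict(row)
    issues = []
    for field in default_zero_fields:
        if fixed.get(field) in (None, ""):
            fixed[field] = 0  # default only; the user may confirm another value
    if all(int(fixed.get(field) or 0) == 0 for field in ("室", "厅", "卫")):
        issues.append("室/厅/卫 are all 0")
    return fixed, issues


fixed, issues = check_layout({"室": 0, "厅": 0, "卫": 0, "厨": ""})
print(fixed, issues)
```

Rows that come back with issues belong in the issue list rather than in the generated workbook.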
Confirm room extraction last, after project and layout references are stable. Determine room uniqueness, project code reference, building/floor/room number parsing, layout reference, orientation, decoration, facilities/supporting amenities, area, lease status, tenant fields, rent, base price, VR URL, and video URL.
For room facilities/supporting amenities fields, inspect grouped configuration columns such as appliance, furniture, smart-device, utility, and public/private supporting fields. Confirm which truthy source values mean an amenity is present, normalize amenity names to the target wording, deduplicate them, and keep the output delimiter consistent across all room rows.
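One possible shape for the amenity extraction, as a sketch only. The truthy set, the name map, and the comma delimiter below are placeholders that must be replaced by the values the user confirms.

```python
# Hypothetical values for illustration; confirm the real ones with the user.
TRUTHY = {"1", "y", "yes", "true", "有", "是"}
NAME_MAP = {"空调机": "空调"}  # source column name -> target wording


def extract_amenities(row, columns, name_map=NAME_MAP, truthy=TRUTHY, delimiter=","):
    """Collect amenities whose source value is truthy, normalized and deduplicated."""
    seen = []
    for column in columns:
        value = str(row.get(column, "")).strip().lower()
        if value in truthy:
            name = name_map.get(column, column)
            if name not in seen:  # keep first-seen order, drop duplicates
                seen.append(name)
    return delimiter.join(seen)


row = {"空调机": "有", "空调": "1", "冰箱": "无", "洗衣机": "是"}
print(extract_amenities(row, ["空调机", "空调", "冰箱", "洗衣机"]))
```

Using one function for every room row keeps the delimiter and ordering consistent across the output.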
For room dictionary fields such as orientation, decoration, and lease status, read the target template's dictionary sheet or data-validation ranges and map source values into the system-allowed values instead of passing source labels through directly. For room numeric fields such as area and rent, confirm the target precision/format and write deterministic defaults in that same format.
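The dictionary mapping and numeric formatting could look roughly like this. The allowed values and the mapping table here are invented examples; the real ones come from the template's dictionary sheet and the confirmed mapping.

```python
from decimal import Decimal, ROUND_HALF_UP


def map_dictionary_value(source, mapping, allowed):
    """Return (target, None) for an allowed mapped value, else (None, issue text)."""
    target = mapping.get(str(source).strip())
    if target in allowed:
        return target, None
    return None, f"unmapped value: {source!r}"


def format_number(value, places=2):
    """Format a number with a fixed count of decimal places (half-up rounding)."""
    quantum = Decimal(1).scaleb(-places)  # e.g. Decimal('0.01') for places=2
    return str(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


allowed = {"南", "北"}          # hypothetical target dictionary values
mapping = {"朝南": "南"}        # hypothetical source -> target mapping
print(map_dictionary_value("朝南", mapping, allowed))
print(map_dictionary_value("东南", mapping, allowed))
print(format_number(35.5))
```

Unmapped values return an issue string instead of passing the source label through, which matches the conservative default described in this skill.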
建筑面积 (building area) and 租赁面积 (leased area) cannot be 0. If the user-provided source data contains zero, blank, missing, non-positive, or unparseable area values, report the affected counts/examples and ask the user how to avoid zero values before final generation; do not silently output 0.
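A sketch of how the affected area values might be collected for the report, assuming rows are dicts keyed by the template column names. `find_bad_areas` is a hypothetical helper.

```python
def find_bad_areas(rows, fields=("建筑面积", "租赁面积")):
    """List (row index, field, raw value) for blank, unparseable, or non-positive areas."""
    bad = []
    for index, row in enumerate(rows):
        for field in fields:
            raw = row.get(field)
            try:
                value = float(raw)
            except (TypeError, ValueError):
                value = None  # missing, blank, or unparseable
            if value is None or not value > 0:
                bad.append((index, field, raw))
    return bad


rows = [
    {"建筑面积": "35.5", "租赁面积": "0"},
    {"建筑面积": "", "租赁面积": "abc"},
    {"建筑面积": 40, "租赁面积": 38},
]
bad = find_bad_areas(rows)
print(len(bad), bad[:3])
```

The count and the first few examples are what the user needs to see before deciding how to handle these rows.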
If 出租状态 is 已出租 (rented), tenant fields are mandatory: fill both 租客名称 and 租客手机号 from confirmed source data or a user-confirmed mapping. If either tenant field is missing, block final generation for those rows and ask the user how to handle them.
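The tenant-required gate can be sketched as below, again assuming dict rows keyed by template column names. The helper name is hypothetical.

```python
def rented_rows_missing_tenant(
    rows,
    status_field="出租状态",
    rented_value="已出租",
    tenant_fields=("租客名称", "租客手机号"),
):
    """Return (row index, missing tenant fields) for rented rows that must be blocked."""
    blocked = []
    for index, row in enumerate(rows):
        if row.get(status_field) != rented_value:
            continue
        missing = [f for f in tenant_fields if not str(row.get(f) or "").strip()]
        if missing:
            blocked.append((index, missing))
    return blocked


rows = [
    {"出租状态": "已出租", "租客名称": "张三", "租客手机号": ""},
    {"出租状态": "未出租"},
]
print(rented_rows_missing_tenant(rows))
```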
Trial-run requirement: output a small sample of room rows and required-field missing counts before final generation. Include examples that exercise facilities extraction, dictionary-value mapping, area zero/non-positive handling, tenant-required checks for rented rooms, and numeric precision/default formatting.
python scripts/third_party_data_normalizer.py analyze --source-dir "<source_dir>" --output-dir "<customer_output_dir>"
Pass --template-dir only when overriding the bundled templates. The bundled script is a helper/example for profiling, template inspection, AJLY mapping review, and regression testing. Treat its generate path as a sample implementation for a known batch, not as the default answer for new third-party systems. For a new or uncertain batch, write a fresh script from the confirmed mapping instead of forcing the data through a generic migrator.
Always show these decisions directly in the conversation, then point to the Excel review files for details:
室/厅/卫 all zero, blank 厨/阳台), and missing manager-phone defaults.
Use conversation as the primary confirmation surface. Keep human-facing details in mapping-review.xlsx with Chinese sheet names and columns, but do not require the user to open it before they can understand or approve the mapping. Keep mapping.json only for the script.
Default behavior is conservative: fill deterministic fields, record inferred fields, and send ambiguous values to the issue list instead of inventing silent values.
Strictly reuse the files under references/standard-data-template/ unless the user explicitly provides a replacement template directory. For .xlsx templates, write only the 数据 sheet and keep the header row order unchanged.
When the source row count exceeds the template data capacity, split the output into numbered files instead of overfilling the template. The default split size comes from the smallest detected standard template data row capacity.
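Splitting by capacity might be sketched like this. The `_01`, `_02` suffix pattern is an assumption for illustration, not a naming rule confirmed by this skill.

```python
def split_rows(rows, capacity):
    """Split rows into consecutive chunks of at most `capacity` rows."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [rows[i:i + capacity] for i in range(0, len(rows), capacity)]


def numbered_names(template_name, parts):
    """Build output file names; the numbering pattern is hypothetical."""
    if parts <= 1:
        return [template_name]
    stem, _, ext = template_name.rpartition(".")
    return [f"{stem}_{n:02d}.{ext}" for n in range(1, parts + 1)]


chunks = split_rows(list(range(5)), 2)
print(chunks)
print(numbered_names("05房源导入模板.xlsx", len(chunks)))
```

Each chunk is then written into its own copy of the template, so no file exceeds the detected data row capacity.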
Legacy .xls templates are treated as fragile. This skill vendors xlrd/xlwt/xlutils for the project master workbook and writes the 项目 and 楼栋楼层 sheets from the copied standard template. If those dependencies fail in the local runtime, do not silently rewrite the file; generate a clear blocking note and ask the user to provide a writable template or handle that workbook manually.
For legacy .xls outputs, Python-written BIFF files may be readable but still fail when downstream Java/Apache POI writes an error workbook (for example HSSFWorkbook.write() record-cast errors). If Excel or WPS is available, add an optional post-generation native resave step for .xls files (open with Excel/WPS and SaveAs the same .xls format), and verify the resaved file, not just the Python-written file. When users report that a manual Excel save fixes the import, treat that as evidence of an .xls writer compatibility problem, not a data mapping problem.
Do not diagnose legacy .xls import failures only from file size. Check actual sheet nrows, non-empty rows, and data rows. A smaller file after Excel save can indicate native BIFF normalization, not necessarily removed blank rows.
Before calling final outputs ready, run automated checks for:
租客名称 and 租客手机号 populated.
厨/阳台 are non-blank with confirmed defaults such as 0.
店长手机号 and 管家手机号; user-provided defaults were verified as existing system values and were not fabricated.
地上层数/地下层数) computed only from numeric floor labels.
.xls compatibility after any native Excel/WPS resave step.
For Anjuleyu/AJLY batches, read references/ajly.md before mapping. The current known sample uses:
房间ID as the room merge key.
房源定价.xlsx plus 房源配置.xlsx as primary room sources.
物业地址/楼栋 as the default project grain.
门店+物业地址 only as a conflict fallback.
Treat these as batch observations and previously confirmed rules, not universal rules for other third-party systems.
references/standard-data-template/: bundled standard import templates used as the default target schema and confirmation order.
scripts/third_party_data_normalizer.py: profiling, template inspection, AJLY mapping review, and sample workbook generation helper.
scripts/test_third_party_data_normalizer.py: regression tests for template inspection, AJLY key analysis, confirmation gate, and split workbook generation.
Use scripts as deterministic helpers for evidence and verification, not as a replacement for field-by-field user confirmation.