Use this skill to replace sensitive document data with realistic fake values while preserving readable structure. Prefer the bundled runner so the workflow works in Codex and in other skill-loading tools without manual dependency setup.
1. See references/sensitive-field-rules.yaml for the bundled field rules.
2. Run scan first for unfamiliar or high-risk files.
3. Run anonymize to create new output files; the CLI refuses outputs that overwrite originals.
4. Run verify on anonymized outputs when the source values or terms file are available; verification ignores the CLI's built-in fake values but still reports custom terms. Residual findings make the CLI exit non-zero.
5. Update references/sensitive-field-rules.yaml as needed and add or update tests before rerunning anonymization.
From the skill directory:
python3 scripts/run_anonymize.py input.md
python3 scripts/run_anonymize.py input.docx --report report.json
python3 scripts/run_anonymize.py input.xlsx --mode scan
python3 scripts/run_anonymize.py input.pdf --mode scan
python3 scripts/run_anonymize.py ./docs --recursive --output-dir ./anonymized
python3 scripts/run_anonymize.py contract.md --terms sensitive_terms.txt --seed 20260603
python3 scripts/run_anonymize.py input.docx --field-rules custom_field_rules.yaml
python3 scripts/run_anonymize.py input.docx --no-field-rules
python3 scripts/run_anonymize.py output.anonymized.md --mode verify --terms sensitive_terms.txt
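The anonymize-then-verify sequence and its exit-code contract could be wrapped from Python roughly as follows. This is a minimal sketch, not part of the skill: `build_command` and `run_pipeline` are hypothetical helper names, the anonymized output path is supplied by the caller rather than inferred, and only flags that appear in the commands above are used. The scan step is left out because the exit-code behavior is only stated for residual findings during verification.

```python
import subprocess
import sys

RUNNER = "scripts/run_anonymize.py"  # path taken from the commands above


def build_command(path, mode=None, terms=None):
    """Build an argv list for the bundled runner (hypothetical helper)."""
    cmd = [sys.executable, RUNNER, str(path)]
    if mode:
        cmd += ["--mode", mode]
    if terms:
        cmd += ["--terms", str(terms)]
    return cmd


def run_pipeline(source, anonymized, terms=None, run=subprocess.run):
    """Anonymize `source`, then verify `anonymized`.

    Returns 0 on success, or the first non-zero exit code encountered.
    `anonymized` is the output path the caller expects the CLI to produce.
    """
    steps = [
        build_command(source, terms=terms),                     # default mode
        build_command(anonymized, mode="verify", terms=terms),  # residual check
    ]
    for cmd in steps:
        result = run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Passing `run` as a parameter keeps the sketch testable without invoking the CLI; in normal use the default `subprocess.run` is enough.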
scripts/run_anonymize.py automatically creates .venv, installs scripts/requirements.txt, and runs the real CLI through the virtual environment Python. This avoids polluting global Python installs and works even when the caller cannot activate a shell environment.
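The bootstrap behavior described above can be pictured as a short command plan. The following is a sketch under stated assumptions, not the runner's actual source: it assumes `.venv` lives in the skill directory, that the "real CLI" is `scripts/anonymize_files.py`, and the helper names `venv_python`, `plan_bootstrap`, and `run_plan` are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path


def venv_python(venv_dir):
    """Return the interpreter path inside a virtual environment."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def plan_bootstrap(skill_dir, cli_args, install=True):
    """List the commands a runner like this might execute, in order."""
    skill_dir = Path(skill_dir)
    venv_dir = skill_dir / ".venv"  # assumed location
    python = venv_python(venv_dir)
    commands = []
    if not python.exists():
        # Create the environment with the interpreter that launched us.
        commands.append([sys.executable, "-m", "venv", str(venv_dir)])
    if install:
        requirements = skill_dir / "scripts" / "requirements.txt"
        commands.append([str(python), "-m", "pip", "install", "-r", str(requirements)])
    # Run the lower-level CLI with the venv interpreter; no shell activation needed.
    cli = skill_dir / "scripts" / "anonymize_files.py"
    commands.append([str(python), str(cli), *cli_args])
    return commands


def run_plan(commands):
    """Execute each planned command, raising on the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)
```

Calling the venv interpreter by path is what makes activation unnecessary: the interpreter resolves its own site-packages regardless of the caller's shell state.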
For text-only work in an environment that already has PyYAML installed, skip installation:
python3 scripts/run_anonymize.py input.md --no-install
If the caller already manages dependencies, call the lower-level CLI directly:
python3 scripts/anonymize_files.py input.md
The lower-level CLI loads the bundled field rules by default, so PyYAML is required unless --no-field-rules is used. Use --field-rules to replace the bundled rules, or --no-field-rules only for debugging false positives.
Use fake data, not [REDACTED_*] placeholders. The CLI keeps the same real value mapped to the same fake value within one run, so repeated names, phone numbers, organizations, and custom terms remain internally consistent.
If a source value already matches the first fake candidate for its category, the CLI chooses a different fake value so the original is not preserved by accident.
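The two behaviors above (stable real-to-fake mapping within a run, and skipping a fake candidate that equals the real value) can be illustrated with a small mapper. This is a minimal sketch, not the CLI's implementation: `FakeMapper`, the pool contents, and the overflow naming scheme are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical candidate pools; values mirror the document's default examples.
FAKE_POOLS = {
    "name": ["张三", "李四", "王五"],
    "phone": ["19999999999", "18888888888"],
}


class FakeMapper:
    """Map each real value to one stable fake value for the lifetime of a run."""

    def __init__(self, pools):
        self._pools = pools
        self._mapping = {}                   # (category, real) -> fake
        self._next_index = defaultdict(int)  # category -> next unused candidate

    def fake_for(self, category, real):
        key = (category, real)
        if key in self._mapping:
            return self._mapping[key]        # repeated value: same fake again
        pool = self._pools[category]
        while True:
            index = self._next_index[category]
            self._next_index[category] += 1
            if index < len(pool):
                candidate = pool[index]
            else:
                # Assumed fallback once the pool is exhausted.
                candidate = f"{category}_{index:04d}"
            if candidate != real:            # never hand back the original
                break
        self._mapping[key] = candidate
        return candidate
```

For example, a document that really contains the name 张三 would not keep it: the mapper skips that candidate and assigns the next one.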
Default examples:
- 张三, 李四, 王五
- 19999999999, 18888888888
- zhangsan@example.com
- 北京星河科技有限公司
- 星云迁移项目
- fake_00000000000000000000000000000000
For detailed categories and term-file syntax, read references/anonymization-rules.md.
Use --terms for one-off names, organizations, customer names, project names, contract numbers, or other business-specific values that regexes and bundled field rules cannot infer.
Term file format:
name:李雷
org:星河集团
project:天枢计划
customer:华东重点客户
plain sensitive phrase
Lines without a category use the generic custom fake-value pool.
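To make the term-file format concrete, here is a minimal parsing sketch. It is not the CLI's parser: it assumes the only category prefixes are the four shown in the example, that blank lines are skipped, and it uses the hypothetical label `custom` for the generic pool. The authoritative syntax is in references/anonymization-rules.md.

```python
# Categories taken from the example above; the real CLI may accept more.
KNOWN_CATEGORIES = {"name", "org", "project", "customer"}


def parse_terms(text):
    """Return (category, term) pairs from term-file text."""
    terms = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no term
        prefix, sep, value = line.partition(":")
        prefix, value = prefix.strip(), value.strip()
        if sep and prefix in KNOWN_CATEGORIES and value:
            terms.append((prefix, value))
        else:
            # No recognized category: the whole line is a generic custom term.
            terms.append(("custom", line))
    return terms
```

Checking the prefix against a known set means a plain phrase that happens to contain a colon is still treated as a single custom term.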
The bundled rules in references/sensitive-field-rules.yaml apply to all supported file types, not only contracts. They cover common label/value fields such as emergency contacts, finance contacts, recipients, phone/email fields, addresses, bank accounts, bank routing codes, organizations, and common identifiers.
For Markdown, DOCX, and Excel tables, the CLI also checks adjacent cells: when a cell contains a configured label such as 财务联系人 (finance contact), 收件人 (recipient), or 联行号 (bank routing code), the cell immediately to its right is scanned with that rule. Excel also applies field rules to values below header-like rows, such as A1=姓名 (name), A2=某姓名. Field labels themselves are preserved; only values are replaced. Excel reports use safe coordinates such as sheet 1!B2 rather than raw worksheet titles.
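The table handling described above can be sketched on a plain list-of-rows table. This is an illustration under assumptions, not the CLI's code: the rule names in `LABEL_RULES`, the helper names, and the choice to treat row 0 as the header are all hypothetical.

```python
# Hypothetical label -> rule mapping; labels come from the text above.
LABEL_RULES = {
    "财务联系人": "name",
    "收件人": "name",
    "联行号": "bank_code",
    "姓名": "name",
}


def find_labeled_values(rows):
    """Return (row, col, rule) for each non-empty cell right of a label."""
    hits = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row[:-1]):
            rule = LABEL_RULES.get(str(cell).strip())
            if rule and str(row[c + 1]).strip():
                hits.append((r, c + 1, rule))
    return hits


def find_header_values(rows):
    """Return (row, col, rule) for cells below a header label in row 0."""
    hits = []
    if not rows:
        return hits
    for c, header in enumerate(rows[0]):
        rule = LABEL_RULES.get(str(header).strip())
        if not rule:
            continue
        for r in range(1, len(rows)):
            if c < len(rows[r]) and str(rows[r][c]).strip():
                hits.append((r, c, rule))
    return hits


def column_letter(index):
    """Convert a 0-based column index to spreadsheet letters (A, B, ..., AA)."""
    letters = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def safe_coord(sheet_number, row, col):
    """Report a cell by sheet position, never by its raw worksheet title."""
    return f"sheet {sheet_number}!{column_letter(col)}{row + 1}"
```

Both finders return positions of values only, so labels stay untouched. A fuller implementation would also need to decide which of the two strategies applies when a header row's neighbor is itself a value-like cell.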
- Mappings are written only to the file given by --mapping, never to stdout or reports.
- Treat skipped_inputs as incomplete processing. Explicit unsupported or missing input paths make the CLI exit non-zero.
- PDF processing uses apply_redactions(), and fails if detected text cannot be matched to redaction rectangles.
Read references/supported-formats.md when processing DOCX/Excel/PDF files, when preserving layout matters, or when the report contains warnings.