
AgentWard Sanitize

Detect and redact PII from text files. Supports 15 categories including credit cards, SSNs, emails, API keys, addresses, and more — with zero dependencies.
Author: agentward-ai
Category: Communication & Collaboration · Source: clawhub · Version: v1.0.0 · API key: not required

Overview

Detect and redact personally identifiable information (PII) from text files.

IMPORTANT — PII Safety Rules

  • Do NOT read the input file directly. It may contain sensitive PII.
  • ALWAYS use --output FILE to write sanitized output to a file.
  • Only read the OUTPUT file, never the raw input.
  • Only show the user the redacted output, never the raw input.
  • --json and --preview are safe — they do NOT print raw PII values to stdout.
  • The entity map (raw PII → placeholder mapping) is written to a separate sidecar file (*.entity-map.json) only when --output is used. Do NOT read the entity map file.
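The rules above can be sketched as a small wrapper that an agent might use. This is a minimal sketch, not part of the skill: the helper names `build_sanitize_command` and `read_sanitized` are hypothetical, and only the script path and the `--output` and `--categories` flags come from this document.

```python
import sys
from pathlib import Path


def build_sanitize_command(input_path, output_path, categories=None):
    """Build an argv list that always passes --output (hypothetical helper)."""
    cmd = [
        sys.executable,
        "scripts/sanitize.py",
        str(input_path),
        "--output",
        str(output_path),
    ]
    if categories:
        cmd += ["--categories", ",".join(categories)]
    return cmd


def read_sanitized(output_path):
    """Read only the sanitized output; refuse the entity-map sidecar."""
    path = Path(output_path)
    if path.name.endswith(".entity-map.json"):
        raise ValueError("refusing to read the entity map file")
    return path.read_text(encoding="utf-8")
```

A caller would pass the returned list to `subprocess.run(cmd, check=True)` and then call `read_sanitized` on the output path only. The wrapper never opens the input file itself.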

What it does

Scans files for PII (credit cards, CVVs, card expiry dates, SSNs, emails, phone numbers, API keys, IP addresses, mailing addresses, dates of birth, passport numbers, driver's license numbers, bank routing numbers, medical license numbers, and insurance member IDs) and replaces each instance with a numbered placeholder such as [CREDIT_CARD_1].
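To illustrate the numbered-placeholder idea, here is a minimal sketch covering two categories. The regular expressions are deliberately simplified stand-ins, not the script's actual detectors, and reusing one placeholder for a repeated value is an assumption about the behavior rather than something this document states.

```python
import re

# Simplified, hypothetical patterns; the real detectors are not shown here.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each match with [LABEL_n] and return (text, entity_map)."""
    entity_map = {}  # raw value -> placeholder
    counters = {}    # label -> last number handed out

    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            raw = match.group(0)
            if raw not in entity_map:
                counters[label] = counters.get(label, 0) + 1
                entity_map[raw] = f"[{label}_{counters[label]}]"
            return entity_map[raw]

        text = pattern.sub(substitute, text)
    return text, entity_map
```

The returned `entity_map` holds raw values, so in the spirit of the safety rules it should be written to disk and never printed.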

Usage

Sanitize a file (RECOMMENDED — always use --output)

python scripts/sanitize.py patient-notes.txt --output clean.txt

Preview mode (detect PII categories/offsets without showing raw values)

python scripts/sanitize.py notes.md --preview

JSON output (safe — no raw PII in stdout)

python scripts/sanitize.py report.txt --json --output clean.txt
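What "no raw PII in stdout" means can be sketched as follows. The actual schema of the `--json` output is not given in this document, so the field names (`category`, `start`, `end`, `placeholder`, `raw`) and the function `safe_report` are hypothetical; the point is only that the raw value is dropped before anything is serialized.

```python
import json

SAFE_FIELDS = ("category", "start", "end", "placeholder")


def safe_report(findings):
    """Serialize findings while leaving out the raw matched value."""
    safe = [{key: item[key] for key in SAFE_FIELDS} for item in findings]
    return json.dumps({"count": len(safe), "findings": safe}, indent=2)
```

Copying an explicit allow-list of fields, rather than deleting the raw field, keeps any sensitive field added later out of the report by default.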

Filter to specific categories

python scripts/sanitize.py log.txt --categories ssn,credit_card,email --output clean.txt
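A comma-separated category filter like the one above could be parsed and validated as in this sketch. The category names are taken from the table of supported categories. Rejecting unknown names and lowercasing the input are assumptions, and `parse_categories` is a hypothetical helper.

```python
SUPPORTED = frozenset({
    "credit_card", "ssn", "cvv", "expiry_date", "api_key",
    "email", "phone", "ip_address", "date_of_birth", "passport",
    "drivers_license", "bank_routing", "address",
    "medical_license", "insurance_id",
})


def parse_categories(value):
    """Split 'a,b,c' into names and reject anything not in SUPPORTED."""
    names = [name.strip().lower() for name in value.split(",") if name.strip()]
    unknown = sorted(set(names) - SUPPORTED)
    if unknown:
        raise ValueError(f"unknown categories: {', '.join(unknown)}")
    return names
```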

Supported PII categories

See references/SUPPORTED_PII.md for the full list with detection methods and false positive mitigation.

| Category | Pattern type | Example |
|---|---|---|
| credit_card | Luhn-validated 13-19 digits | 4111 1111 1111 1111 |
| ssn | 3-2-4 digit groups | 123-45-6789 |
| cvv | Keyword-anchored 3-4 digits | CVV: 123 |
| expiry_date | Keyword-anchored MM/YY | expiry 01/30 |
| api_key | Provider prefix patterns | sk-abc..., ghp_..., AKIA... |
| email | Standard email format | user@example.com |
| phone | US/intl phone numbers | +1 (555) 123-4567 |
| ip_address | IPv4 addresses | 192.168.1.100 |
| date_of_birth | Keyword-anchored dates | DOB: 03/15/1985 |
| passport | Keyword-anchored alphanumeric | Passport: AB1234567 |
| drivers_license | Keyword-anchored alphanumeric | DL: D12345678 |
| bank_routing | Keyword-anchored 9 digits | routing: 021000021 |
| address | Street + city/state/zip | 742 Evergreen Terrace Dr, Springfield, IL 62704 |
| medical_license | Keyword-anchored license ID | License: CA-MD-8827341 |
| insurance_id | Keyword-anchored member/policy ID | Member ID: BCB-2847193 |
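The "Luhn-validated" entry for credit_card refers to the standard Luhn checksum, which filters out most random digit runs. The sketch below shows the checksum itself; how the script extracts candidate digit sequences is not described here, and `luhn_valid` is a hypothetical name.

```python
def luhn_valid(candidate):
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

For example, `luhn_valid("4111 1111 1111 1111")` returns `True`, while changing the last digit to 2 makes it return `False`.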

Security and Privacy

  • All processing is local. The script makes zero network calls. No data leaves your machine.
  • Zero dependencies. Uses only Python standard library — no third-party packages to audit.
  • PII never reaches stdout. The --json and --preview modes strip raw PII values from output. The entity map (containing raw PII to placeholder mappings) is only written to a sidecar file on disk when --output is used.
  • Designed for agent safety. The skill instructions above tell the agent to never read the raw input file or the entity map file — only the sanitized output.
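The separation between sanitized output and entity map could look roughly like the sketch below. The exact sidecar naming rule is an assumption (here `clean.txt` maps to `clean.entity-map.json`), as are the function name `write_outputs` and the owner-only file mode; this document only states that the sidecar matches `*.entity-map.json` and is written when `--output` is used.

```python
import json
import os
from pathlib import Path


def write_outputs(output_path, sanitized_text, entity_map):
    """Write sanitized text and the entity map to two separate files."""
    output = Path(output_path)
    # Assumed naming: replace the final extension with .entity-map.json
    sidecar = output.with_name(output.stem + ".entity-map.json")

    output.write_text(sanitized_text, encoding="utf-8")

    # Create the sidecar with owner-only permissions where the OS supports it.
    fd = os.open(sidecar, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(entity_map, handle, indent=2)
    return sidecar
```

Nothing is printed, so raw values exist only in the sidecar file on disk, which an agent following the safety rules never opens.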

Requirements

  • Python 3.11+
  • No external dependencies (stdlib only)

About

Built by AgentWard — the open-source permission control plane for AI agents.

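Version history

1 version in total

  • v1.0.0 (current)
    2026-03-30 03:34, security status: safe

Security scans

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk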
