Overview

Redact Skill

Privacy redaction toolkit using PPStructureV3 OCR for text detection and replacement.

Scripts

Script	Format	Command
--------	--------	---------
`read.py`	Images / PDF / Word / PowerPoint	`read.py [--info] [--mode json]`
`redact-image.py`	Images (png, jpg, etc.)	`redact-image.py`
`redact-pdf.py`	PDF	`redact-pdf.py`
`redact-document.py`	Word (docx, doc)	`redact-document.py`
`redact-presentation.py`	PowerPoint (pptx, ppt)	`redact-presentation.py`

CSV Rules Format

target_text,replacement_text
张三,李四
手机号,
身份证号,

Rule	Effect
------	--------
`original text,new text`	Replace the target with the new text
`original text,`	Empty replacement: mask with █ characters (Word/PowerPoint) or a solid color block (images/PDF)
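
The rule semantics above can be sketched in a few lines of Python. This is a minimal illustration and not the toolkit's own parser: `Rule` and `parse_rules` are hypothetical names, and because the source does not say whether the `target_text,replacement_text` line is a required header row, the sketch is fed data rows only.

```python
import csv
import io
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """Hypothetical container for one CSV rule (not part of the toolkit)."""
    target: str
    replacement: Optional[str]  # None means "mask instead of replace"


def parse_rules(csv_text: str) -> list[Rule]:
    """Turn rule rows into Rule tuples; an empty second column means mask."""
    rules = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a target
        target = row[0].strip()
        replacement = row[1].strip() if len(row) > 1 else ""
        rules.append(Rule(target, replacement or None))
    return rules


rules = parse_rules("张三,李四\n手机号,\n身份证号,\n")
for rule in rules:
    action = "mask" if rule.replacement is None else f"replace with {rule.replacement}"
    print(f"{rule.target}: {action}")
```

Representing the empty replacement as `None` keeps the "mask" case explicit, so later code does not have to re-check for empty strings.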

Masking Behavior

Format	Empty Replacement
--------	-------------------
Images, PDF	Solid color block overlay
Word, PowerPoint	`█` characters (same length as target)
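
For the Word/PowerPoint case, same-length `█` masking can be illustrated on a plain string. This is a sketch under assumptions: `apply_rule` is a hypothetical helper that works on plain text only, whereas the real scripts operate on document structures, and the solid color overlay used for images and PDF is not shown.

```python
MASK_CHAR = "█"


def apply_rule(text: str, target: str, replacement: str = "") -> str:
    """Replace every occurrence of target; mask it when replacement is empty."""
    if not target:
        return text  # nothing to search for
    # An empty replacement becomes a run of █ as long as the target itself.
    filler = replacement if replacement else MASK_CHAR * len(target)
    return text.replace(target, filler)


print(apply_rule("Contact: 张三", "张三", "李四"))
print(apply_rule("Contact: 张三", "张三"))
```

Because the mask has the same number of characters as the target, the surrounding text keeps its length.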

Read Features

read.py supports:

Reading text from images, PDF, Word, and PowerPoint files
OCR for image files and embedded images
Page-aware output for PDF / Word / PowerPoint
--info structured output:
... for OCR text extracted from images

JSON Output

Document-like files (pdf, docx, doc, pptx) output:

{
  "type": "pptx",
  "pages": [
    {
      "page_index": 1,
      "content": [
        { "type": "text", "text": "..." },
        { "type": "image", "text": "ocr text..." }
      ]
    }
  ]
}

Image files output:

{
  "type": "image",
  "content": "..."
}
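
A consumer of this output might flatten both shapes into one list of text entries. The sketch below assumes only the fields shown above (`type`, `pages`, `page_index`, `content`, `text`); `collect_text` is a hypothetical helper, and the sample payload is made up for illustration.

```python
import json
from typing import Optional


def collect_text(result: dict) -> list[tuple[Optional[int], str, str]]:
    """Return (page_index, item_type, text) tuples for either output shape."""
    if result.get("type") == "image":
        # Image files carry a single content string and no pages.
        return [(None, "image", result.get("content", ""))]
    entries = []
    for page in result.get("pages", []):
        for item in page.get("content", []):
            entries.append(
                (page.get("page_index"), item.get("type", ""), item.get("text", ""))
            )
    return entries


sample = json.loads("""
{
  "type": "pptx",
  "pages": [
    {
      "page_index": 1,
      "content": [
        { "type": "text", "text": "slide title" },
        { "type": "image", "text": "ocr text" }
      ]
    }
  ]
}
""")
for page_index, item_type, text in collect_text(sample):
    print(page_index, item_type, text)
```

Keeping the item type in each tuple lets a caller tell native text apart from OCR text extracted from embedded images.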

Features

Feature	Image	PDF	Document	Presentation
---------	:-----:	:---:	:--------:	:------------:
Read text	✅	✅	✅	✅
JSON output	✅	✅	✅	✅
Text replacement	✅	✅	✅	✅
Solid color mask	✅	✅	-	-
█ character mask	-	-	✅	✅
OCR detection	✅	✅	✅ (images)	✅ (images)
Tables	-	✅	✅	✅
Headers/Footers	-	✅	✅	-
Embedded images	-	✅	✅	✅

Environment Setup

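Install dependencies with uv:

# Enter the skill directory
cd skills/redact

# Sync dependencies (creates the virtual environment and installs packages automatically)
uv sync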

Dependencies

Python 3.10+
PaddleOCR / PPStructureV3
python-docx, python-pptx, PyMuPDF, Pillow

Version History

1 version in total

v0.1.1 (current)

2026-05-03 10:27 Safe

Security Check

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu)