← 返回
未分类 中文

Image To Text Pdf

Convert a finished raster image, especially a generated poster or visual resume, into an image-based PDF with an additional selectable, copyable, searchable...
将已完成的栅格图像(如生成的海报或可视化简历)转换为图像型PDF,并提供可选择、可复制、可搜索等附加功能。
zjsxply
未分类 clawhub v0.1.0 1 版本 99537 Key: 无需
★ 0
Stars
📥 215
下载
💾 0
安装
1
版本
#latest

概述

Image to Text PDF

Turn a finished raster image into a PDF while preserving the image exactly and adding a transparent text layer for selection, copying, and search.

Workflow

  1. Get the complete image.
    • Treat the image as the visual source of truth.
    • Keep the original image dimensions; the layout coordinates should use the same pixel coordinate space whenever possible.
  1. Build a text-layer layout JSON.
    • Prefer OCR or a vision model that returns text boxes over trying to infer positions manually.
    • Use the user's source text to correct OCR transcription. OCR boxes determine position; the source text determines final copyable content.
    • Read references/ocr-alignment.md when you need guidance for extracting, correcting, or prompting for box-level text.
    • Read references/layout-json.md for the exact JSON schema.
  1. Generate both PDFs.
python scripts/compose_image_text_pdf.py \
  --image /path/to/image.png \
  --layout /path/to/layout.json \
  --output /path/to/image-text.pdf \
  --debug-output /path/to/image-text-check.pdf

For CJK or other non-Latin text, pass a Unicode font:

python scripts/compose_image_text_pdf.py \
  --image /path/to/image.png \
  --layout /path/to/layout.json \
  --output /path/to/image-text.pdf \
  --debug-output /path/to/image-text-check.pdf \
  --font-file /path/to/NotoSansCJK-Regular.ttc
  1. Inspect the debug PDF before delivering.
    • The final PDF should look like the image-only source.
    • The debug PDF should show highlighted text boxes and visible text where the hidden layer will be placed.
    • If a highlighted box is shifted, fix the layout JSON, not the image.
    • If copy/paste text is wrong, fix the text field in layout JSON, not the OCR image.

OCR Word Conversion

When an OCR tool returns word-level boxes, convert them to line-level layout items:

python scripts/ocr_words_to_layout.py \
  --ocr /path/to/ocr.json \
  --output /path/to/layout.json \
  --image-width 1536 \
  --image-height 2048 \
  --source-text /path/to/source.txt

The converter accepts common JSON shapes containing words, items, textAnnotations, or nested page/line/word objects. It groups nearby words into lines and can replace OCR text with the closest line from the source text when the match is strong.

Practical Rules

  • Keep text boxes line-level unless a paragraph must be selected as one unit. Line-level boxes are easier to position and debug.
  • Do not try to match the rasterized font exactly. The hidden layer only needs close geometry and correct text.
  • Use the visible inspection PDF as the validation artifact. It should make every embedded text span obvious.
  • For generated images, ask the image-generation step to keep text in large, separated blocks. Dense tiny text is harder for OCR and manual correction.
  • Preserve the image as the background instead of rebuilding the layout in presentation or web formats.

版本历史

共 1 个版本

  • v0.1.0 当前
    2026-05-23 23:47 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

Awesome Repo Builder

zjsxply
创建针对特定主题的GitHub awesome-list仓库框架,包含精炼的README、简洁的贡献规范、AI代理说明、URL验证等功能。
★ 0 📥 240

Semantic Scholar Library Feed

zjsxply
使用用户的 Semantic Scholar 账户阅读研究动态、检查私人文献库文件夹、向文件夹添加论文并解决相关论文问题。
★ 0 📥 254

Url Citation Search

zjsxply
查找引用指定URL的论文和预印本,尤其是博客、文档页、项目页或其他标准引文索引常遗漏的网页内容。
★ 0 📥 268