Converts a static image into an editable .pptx file where every text element is a selectable, editable text box over a clean inpainted background.
| Scenario | Recommendation |
|---|---|
| Slide with text on solid/flat background | Best results |
| Slide with photo background | Good — uses inpainting (warn about overlap areas) |
| Slide with solid background | Good — use --skip-inpaint for speed |
| Chinese/multilingual slide | Good — ch OCR handles both Chinese and English |
| Poster or infographic with text | Good — works well if text is separate from graphics |
| Dense chart with axis labels on bars | Caution — line grouping may over-merge crowded labels |
| Very thick/large decorative fonts | Caution — may exceed standard mask dilation range |
| Extract individual assets as PNGs | No — use px-asset-extract |
| Read text without creating PPTX | No — use OCR directly |
| Edit an existing .pptx file | No — use the pptx skill |
git clone https://github.com/JadeLiu-tech/px-image2pptx.git
cd px-image2pptx
pip install -e ".[all]"
px-image2pptx slide.png -o output.pptx
px-image2pptx slide.png -o output.pptx --lang ch
px-image2pptx slide.png -o output.pptx --skip-inpaint
px-image2pptx slide.png -o output.pptx --ocr-json text_regions.json
px-image2pptx slide.png -o output.pptx --work-dir ./debug/
from px_image2pptx import image_to_pptx
report = image_to_pptx("slide.png", "output.pptx")
# With options
report = image_to_pptx(
    "slide.png", "output.pptx",
    lang="ch",
    skip_inpaint=False,
    work_dir="./debug/",
)
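For a folder of slides, the call above can be wrapped in a loop. The following is a minimal sketch and not part of the package: `convert_folder`, its `convert` parameter, and `IMAGE_SUFFIXES` are hypothetical names, and it assumes only that `image_to_pptx(input_path, output_path, **options)` behaves as in the calls above. The converter is passed in as an argument so the loop can be tried with a stand-in function before any models are downloaded.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Assumption: the input formats worth attempting. WebP is left out because
# OCR is reported to find 0 regions on it.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def convert_folder(src_dir, out_dir, convert: Optional[Callable] = None,
                   **options) -> Dict[str, object]:
    """Convert each image in src_dir to <stem>.pptx in out_dir.

    Returns a mapping of input file name to whatever the converter returned.
    """
    if convert is None:
        # Imported lazily so the helper works with a stand-in converter.
        from px_image2pptx import image_to_pptx as convert

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    reports = {}
    for image in sorted(Path(src_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        target = out / (image.stem + ".pptx")
        reports[image.name] = convert(str(image), str(target), **options)
    return reports
```

Calling `convert_folder("slides/", "decks/", lang="ch")` would forward `lang="ch"` to every conversion. The structure of the returned report is not described here, so the sketch stores it untouched.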
| Option | Default | Description |
|---|---|---|
| -o, --output | output.pptx | Output PPTX path |
| --ocr-json | | Pre-computed OCR JSON (skips OCR) |
| --lang | auto | OCR language: auto, en, ch |
| --sensitivity | 16 | Textmask sensitivity (lower = more) |
| --dilation | 12 | Textmask dilation pixels |
| --min-font | 8 | Min font size in points |
| --max-font | 72 | Max font size in points |
| --skip-inpaint | | Skip LAMA inpainting |
| --work-dir | | Save intermediate files |
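When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only flags listed in the table above. `build_command` is a hypothetical helper, not something the package provides, and the dilation value of 20 in the example is an arbitrary illustration rather than a recommended setting.

```python
from typing import List, Optional


def build_command(image: str, output: str = "output.pptx", *,
                  lang: Optional[str] = None,
                  ocr_json: Optional[str] = None,
                  sensitivity: Optional[int] = None,
                  dilation: Optional[int] = None,
                  skip_inpaint: bool = False,
                  work_dir: Optional[str] = None) -> List[str]:
    """Build an argv list for px-image2pptx; unset options keep CLI defaults."""
    cmd = ["px-image2pptx", image, "-o", output]
    if lang is not None:
        if lang not in ("auto", "en", "ch"):
            raise ValueError(f"unsupported --lang value: {lang!r}")
        cmd += ["--lang", lang]
    if ocr_json is not None:
        cmd += ["--ocr-json", ocr_json]
    if sensitivity is not None:
        cmd += ["--sensitivity", str(sensitivity)]
    if dilation is not None:
        cmd += ["--dilation", str(dilation)]
    if skip_inpaint:
        cmd.append("--skip-inpaint")
    if work_dir is not None:
        cmd += ["--work-dir", work_dir]
    return cmd


if __name__ == "__main__":
    # Illustrative only: widen the mask beyond the default of 12 pixels.
    print(" ".join(build_command("slide.png", dilation=20, work_dir="./debug/")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.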
Downloaded automatically on first use (~370 MB total). All models are from official open-source repositories.
| Model | Size | License | Source |
|---|---|---|---|
| PP-OCRv5_server_det | 84 MB | Apache 2.0 | PaddlePaddle/PaddleOCR |
| PP-OCRv5_server_rec | 81 MB | Apache 2.0 | PaddlePaddle/PaddleOCR |
| big-lama | 196 MB | Apache 2.0 | advimman/lama |
Models are cached locally after first download (~/.paddlex/official_models/ for OCR, ~/.cache/torch/hub/checkpoints/ for LAMA). To skip model downloads entirely, use --ocr-json with pre-computed OCR and --skip-inpaint.
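To see whether a first run will trigger downloads, the two cache directories named above can be inspected. This is a rough sketch under stated assumptions: `cache_status` and `CACHE_DIRS` are hypothetical names, and the check only measures what is on disk. The torch hub checkpoint directory is shared with other projects, so a non-zero size there does not prove that big-lama specifically is present.

```python
from pathlib import Path
from typing import Dict, Optional

# Cache locations as documented above, relative to the home directory.
CACHE_DIRS = {
    "ocr": Path(".paddlex") / "official_models",
    "lama": Path(".cache") / "torch" / "hub" / "checkpoints",
}


def cache_status(home: Optional[Path] = None) -> Dict[str, int]:
    """Return the total bytes found under each cache directory (0 if absent)."""
    home = Path(home) if home is not None else Path.home()
    sizes = {}
    for name, rel in CACHE_DIRS.items():
        root = home / rel
        if root.is_dir():
            sizes[name] = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
        else:
            sizes[name] = 0
    return sizes


if __name__ == "__main__":
    for name, size in cache_status().items():
        print(f"{name}: {size / 1_000_000:.0f} MB cached")
```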
| Input | Impact | What to tell the user |
|---|---|---|
| Text on solid/flat background | Best results | No caveats needed |
| Text on textured background | Good results | LAMA handles repeating textures well |
| Text overlapping photos | Inpainting artifacts likely | "Areas where text covers photos may show blurring" |
| Dense chart with many labels | Over-merged labels | "Crowded labels may be grouped incorrectly" |
| Very thick/large fonts | Incomplete mask coverage | "Large fonts may exceed dilation range — try increasing --dilation" |
| Light text on dark background | Blockier inpainting | "White-on-dark text uses box masks instead of tight ink masks" |
| WebP image | OCR fails (0 regions) | Convert to PNG first: Image.open("in.webp").save("in.png") |
| Very large image (>4000px) | Slow inpainting | Suggest --skip-inpaint or downscaling |
| Decorative/handwritten fonts | Typeface won't match | "Fonts are reconstructed as Arial/Helvetica" |
| Centered/justified text | Left-aligned output | "Text alignment is not preserved" |
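The WebP workaround in the table can be wrapped in a small pre-processing step. The sketch below assumes Pillow is installed with WebP support. `ensure_png` is a hypothetical helper name, and writing the PNG next to the source file is a choice made for this example.

```python
from pathlib import Path


def ensure_png(image_path) -> Path:
    """Return a path the OCR step can read, converting WebP to PNG if needed."""
    path = Path(image_path)
    if path.suffix.lower() != ".webp":
        return path  # other formats are passed through unchanged

    # Pillow is imported lazily; it is only needed for WebP inputs.
    from PIL import Image

    png_path = path.with_suffix(".png")
    with Image.open(path) as img:
        img.save(png_path)  # output format is inferred from the .png suffix
    return png_path
```

The returned path can then be passed to `image_to_pptx` or to the CLI in place of the original file.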