Overview

PPX Parse

Use the local ppx CLI to parse PDFs and images into structured Markdown and JSON.

Runtime Requirements

Use Python >= 3.12.
Prefer installing PPX into a virtual environment instead of the system Python.
If ppx is missing, read references/troubleshooting.md and create a virtual environment before installing dependencies.
Keep this skill's frontmatter version synchronized from the repository pyproject.toml with scripts/sync_version.py.
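The two runtime conditions above (Python >= 3.12 and a ppx executable on PATH) can be sketched as a quick check. This is only an illustration: the helper names `python_is_supported` and `ppx_on_path` are hypothetical, and the sketch does not reproduce scripts/check_ppx_env.sh, whose contents are not shown here.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the runtime requirements


def python_is_supported(version=None):
    """Return True when the (major, minor) version is at least MIN_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def ppx_on_path():
    """Return the resolved path of the ppx executable, or None if it is missing."""
    return shutil.which("ppx")


if __name__ == "__main__":
    print("python ok:", python_is_supported())
    print("ppx path:", ppx_on_path())
```

If `ppx_on_path()` returns None, the next step is the virtual-environment setup described in references/troubleshooting.md.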

Workflow

Confirm the runtime uses Python >= 3.12.
Check the runtime with scripts/check_ppx_env.sh.
If ppx is missing, create or use a virtual environment and install PPX there.
Choose parsing options:

Use --ocr auto by default.
Use --ocr yes for scanned PDFs or screenshots.
Use --ocr no for native PDFs when OCR causes noise.
Use --table auto by default.
Use --table llm only when the user needs highest table accuracy and an LLM backend is configured.

Run ppx parse <input> -o <output_dir>, where <input> is the PDF or image and <output_dir> is the destination folder.
Inspect the output folder and report the main artifacts:

doc.md
doc.json
pages/
images/ when figures are extracted

If parsing fails, summarize the failing step and load the relevant note from references/.
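The inspection step in the workflow can be sketched as a small helper that records which of the listed artifacts are present. The function name `summarize_output` is hypothetical; only the artifact names (doc.md, doc.json, pages/, images/) come from the list above.

```python
from pathlib import Path


def summarize_output(output_dir):
    """Report which expected artifacts exist under the output directory."""
    root = Path(output_dir).resolve()
    return {
        "output_dir": str(root),                   # absolute path for the report
        "doc.md": (root / "doc.md").is_file(),
        "doc.json": (root / "doc.json").is_file(),
        "pages/": (root / "pages").is_dir(),
        "images/": (root / "images").is_dir(),     # present only when figures are extracted
    }
```

A missing images/ entry is not necessarily a failure, since that folder appears only when figures are extracted.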

Common Commands

ppx parse report.pdf -o output/
ppx parse scan.pdf --ocr yes -o output/
ppx parse figure.png -o output/
ppx parse report.pdf --pages "1-5,10" -o output/
ppx parse report.pdf --table llm --backend deepseek -o output/
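When the commands above are assembled programmatically, a helper along these lines could build the argument list. It is a minimal sketch: `build_parse_command` is a hypothetical name, it uses only the flags that appear in the commands above (--ocr, --table, --pages, --backend, -o), and it does not validate values against the real CLI.

```python
def build_parse_command(input_path, output_dir, ocr="auto", table="auto",
                        pages=None, backend=None):
    """Build the argument list for a ppx parse invocation."""
    cmd = ["ppx", "parse", str(input_path), "--ocr", ocr, "--table", table]
    if pages is not None:
        cmd += ["--pages", pages]        # e.g. "1-5,10"
    if backend is not None:
        cmd += ["--backend", backend]    # e.g. "deepseek" together with table="llm"
    cmd += ["-o", str(output_dir)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems with page ranges such as "1-5,10".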

Output Contract

Prefer returning the absolute output directory.
Mention whether the result came from doc.md, doc.json, or page-level files.
Call out OCR mode, table mode, and backend when they materially affect accuracy.
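One way to keep reports consistent with this contract is a short formatter like the one below. The name `format_report` and the exact wording of each line are assumptions made for illustration; the contract itself only specifies what to mention.

```python
from pathlib import Path


def format_report(output_dir, source, ocr="auto", table="auto", backend=None):
    """Format a short report that follows the output contract."""
    lines = [
        f"Output directory: {Path(output_dir).resolve()}",  # absolute path
        f"Result read from: {source}",                       # doc.md, doc.json, or a page-level file
        f"OCR mode: {ocr}; table mode: {table}",
    ]
    if backend is not None:
        lines.append(f"Backend: {backend}")
    return "\n".join(lines)
```

For example, `format_report("output/", "doc.md", ocr="yes")` yields three lines and omits the backend line because no backend was used.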

References

Read references/cli-options.md when choosing parse flags.
Read references/backend-config.md when using DeepSeek, Paddle, or GLM backends.
Read references/troubleshooting.md when PPX is missing, Python is too old, or runtime dependencies fail.

Version History

1 version in total

v1.0.0 (current)

2026-05-21 15:50, Safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
