Use the local ppx CLI to parse PDFs and images into structured Markdown and JSON.
Requires Python >= 3.12. The version is synchronized from the repository pyproject.toml with scripts/sync_version.py.

Environment:
- Run scripts/check_ppx_env.sh.
- If ppx is missing, read references/troubleshooting.md, then create or use a virtual environment and install PPX there before installing dependencies.

Flags:
- Use --ocr auto by default.
- Use --ocr yes for scanned PDFs or screenshots.
- Use --ocr no for native PDFs when OCR causes noise.
- Use --table auto by default.
- Use --table llm only when the user needs the highest table accuracy and an LLM backend is configured.

Output:
- Run ppx parse with -o to set the output directory.
- doc.md
- doc.json
- pages/
- images/ when figures are extracted

Reference material lives in references/.

Examples:
ppx parse report.pdf -o output/
ppx parse scan.pdf --ocr yes -o output/
ppx parse figure.png -o output/
ppx parse report.pdf --pages "1-5,10" -o output/
ppx parse report.pdf --table llm --backend deepseek -o output/
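
The OCR guidance and the example commands above can be folded into a small wrapper. The following is only a sketch under stated assumptions, not part of PPX: the function name build_ppx_cmd and the input labels (scanned, native-noisy, default) are hypothetical, and the function prints a command instead of running it. It uses only flags that appear in this document.

```shell
# Hypothetical helper (not shipped with PPX): prints a ppx parse command.
# Usage: build_ppx_cmd <input> [kind] [outdir]
#   kind is a label invented for this sketch: scanned | native-noisy | default
build_ppx_cmd() {
  local input kind outdir ocr
  input="$1"
  kind="${2:-default}"
  outdir="${3:-output/}"
  case "$kind" in
    scanned)      ocr="yes"  ;;  # scanned PDFs or screenshots
    native-noisy) ocr="no"   ;;  # native PDFs where OCR causes noise
    *)            ocr="auto" ;;  # default choice
  esac
  # Print only; the caller decides whether to execute the command.
  printf 'ppx parse %s --ocr %s -o %s\n' "$input" "$ocr" "$outdir"
}
```

For example, build_ppx_cmd scan.pdf scanned prints ppx parse scan.pdf --ocr yes -o output/. Paths containing spaces would need quoting before the printed command is executed.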
Read the results from doc.md, doc.json, or page-level files.

- Read references/cli-options.md when choosing parse flags.
- Read references/backend-config.md when using DeepSeek, Paddle, or GLM backends.
- Read references/troubleshooting.md when PPX is missing, Python is too old, or runtime dependencies fail.
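
A preflight check for the Python >= 3.12 requirement and a missing ppx might look like the sketch below. It is an illustration only and does not reproduce scripts/check_ppx_env.sh; the function names are hypothetical, and it assumes the interpreter is available as python3 on PATH.

```shell
# Returns 0 when a "major.minor[.patch]" version string is at least 3.12.
python_version_ok() {
  local major minor rest
  major="${1%%.*}"     # text before the first dot
  rest="${1#*.}"       # text after the first dot
  minor="${rest%%.*}"  # text before the next dot
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }
}

# Reports the first problem found; installs nothing.
check_env_sketch() {
  local version
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found"; return 1
  fi
  version="$(python3 -c 'import platform; print(platform.python_version())')"
  if ! python_version_ok "$version"; then
    echo "Python $version is older than 3.12"; return 1
  fi
  if ! command -v ppx >/dev/null 2>&1; then
    echo "ppx not found; see references/troubleshooting.md"; return 1
  fi
  echo "ok"
}
```

Keeping the version comparison in its own function lets it be exercised with plain strings, without depending on which interpreter is installed.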