
Ppx Parse

Parse PDFs and images into Markdown/JSON using the `ppx` CLI. Use when the user asks to OCR scanned PDFs or screenshots, extract tables from PDFs, convert PD...
lihanghang
Uncategorized clawhub v1.0.0 1 version 99494.9 Key: not required
#latest#markdown#ocr#pdf#table

Overview

PPX Parse

Use the local ppx CLI to parse PDFs and images into structured Markdown and JSON.

Runtime Requirements

  • Use Python >= 3.12.
  • Prefer installing PPX into a virtual environment instead of the system Python.
  • If ppx is missing, read references/troubleshooting.md and create a virtual environment before installing dependencies.
  • Keep this skill's frontmatter version synchronized from the repository pyproject.toml with scripts/sync_version.py.
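
The requirements above can be checked with a few lines of Python. This is a minimal sketch and not the contents of scripts/check_ppx_env.sh, which is not shown here; the function name `check_runtime` and its injectable parameters are hypothetical and exist only to make the check easy to test.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the requirements above


def check_runtime(version_info=sys.version_info, which=shutil.which, in_venv=None):
    """Return a list of human-readable problems; an empty list means ready."""
    if in_venv is None:
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        in_venv = sys.prefix != sys.base_prefix

    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} is older than 3.12"
        )
    if not in_venv:
        problems.append("not running inside a virtual environment")
    if which("ppx") is None:
        problems.append("ppx not found on PATH; see references/troubleshooting.md")
    return problems


if __name__ == "__main__":
    for problem in check_runtime():
        print(problem)
```

An empty result means the interpreter is new enough, a virtual environment is active, and a `ppx` executable is on PATH. It does not verify that PPX's own dependencies are installed.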

Workflow

  1. Confirm the runtime uses Python >= 3.12.
  2. Check the runtime with scripts/check_ppx_env.sh.
  3. If ppx is missing, create or use a virtual environment and install PPX there.
  4. Choose parsing options:
    • Use --ocr auto by default.
    • Use --ocr yes for scanned PDFs or screenshots.
    • Use --ocr no for native PDFs when OCR causes noise.
    • Use --table auto by default.
    • Use --table llm only when the user needs highest table accuracy and an LLM backend is configured.
  5. Run ppx parse with the input file and an output directory, for example ppx parse report.pdf -o output/.
  6. Inspect the output folder and report the main artifacts:
    • doc.md
    • doc.json
    • pages/
    • images/ when figures are extracted
  7. If parsing fails, summarize the failing step and load the relevant note from references/.
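
The option choices in step 4 can be sketched as a small helper that builds the argument list. It uses only the flags named in this document (`--ocr`, `--table`, `--backend`, `--pages`, `-o`). The function name `build_ppx_command` and its keyword arguments are hypothetical, and the sketch assumes that `--backend` may be omitted when a backend is configured elsewhere.

```python
def build_ppx_command(input_path, output_dir, *, scanned=False, ocr_noise=False,
                      llm_tables=False, backend=None, pages=None):
    """Build an argv list for `ppx parse` following the workflow above."""
    # OCR mode: yes for scans and screenshots, no when OCR adds noise to a
    # native PDF, otherwise the default auto.
    if scanned:
        ocr = "yes"
    elif ocr_noise:
        ocr = "no"
    else:
        ocr = "auto"

    # Table mode: llm only when highest table accuracy is requested.
    table = "llm" if llm_tables else "auto"

    cmd = ["ppx", "parse", input_path, "--ocr", ocr, "--table", table]
    if backend is not None:
        cmd += ["--backend", backend]
    if pages is not None:
        cmd += ["--pages", pages]
    cmd += ["-o", output_dir]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_ppx_command("scan.pdf", "output/", scanned=True)))
```

Returning a list instead of a string lets the result be passed directly to `subprocess.run` without shell quoting concerns.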

Common Commands

ppx parse report.pdf -o output/
ppx parse scan.pdf --ocr yes -o output/
ppx parse figure.png -o output/
ppx parse report.pdf --pages "1-5,10" -o output/
ppx parse report.pdf --table llm --backend deepseek -o output/

Output Contract

  • Prefer returning the absolute output directory.
  • Mention whether the result came from doc.md, doc.json, or page-level files.
  • Call out OCR mode, table mode, and backend when they materially affect accuracy.
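
One way to satisfy this contract is to collect the facts into a single record before reporting. The sketch below assumes the output folder layout listed in the workflow (doc.md, doc.json, pages/, images/). The function name `summarize_output` and the dictionary keys are hypothetical, not part of PPX.

```python
from pathlib import Path

# Artifact names taken from the workflow section of this document.
EXPECTED_ARTIFACTS = ("doc.md", "doc.json", "pages", "images")


def summarize_output(output_dir, ocr="auto", table="auto", backend=None):
    """Describe a ppx output folder: absolute path, artifacts, and modes used."""
    root = Path(output_dir).resolve()  # the contract prefers an absolute path
    found = [name for name in EXPECTED_ARTIFACTS if (root / name).exists()]
    return {
        "output_dir": str(root),
        "artifacts": found,
        "ocr": ocr,
        "table": table,
        "backend": backend,
    }


if __name__ == "__main__":
    print(summarize_output("output/"))
```

The `artifacts` list only reports which expected names exist; it does not check that the files are non-empty or well-formed.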

References

  • Read references/cli-options.md when choosing parse flags.
  • Read references/backend-config.md when using DeepSeek, Paddle, or GLM backends.
  • Read references/troubleshooting.md when PPX is missing, Python is too old, or runtime dependencies fail.

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-21 15:50 Safe Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
