Use this skill as a fallback workflow for PDFs that break normal analysis paths.
Prefer the built-in pdf tool first when it is likely to work. If it fails, hangs, times out, or the file is too large, switch to this local workflow.
Read references/patterns.md if you need the rationale, chunking heuristics, or fallback guidance.
Use this workflow when:
- The pdf tool aborts, provider-native upload fails, or file limits make direct analysis unlikely to work.
Then:
- Run scripts/extract_pdf.py to extract markdown locally.
- Add --url to download a remote PDF first.
- Add --chunk-dir when the output will be too large to read in one pass.
- Add --summary-out to generate a lightweight first-pass summary artifact.
Local file command:
python3 skills/resilient-pdf/scripts/extract_pdf.py <file.pdf> --out <output.md> --json
Remote URL command:
python3 skills/resilient-pdf/scripts/extract_pdf.py \
--url <https://example.com/file.pdf> \
--out <output.md> \
--download-to <downloaded.pdf> \
--json
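If the script is driven from Python rather than typed into a shell, the two invocations above can be assembled as an argument list. This is only a sketch: the helper name `build_extract_command` and its parameters are hypothetical, and it uses nothing beyond the flags already shown (`--url`, `--out`, `--download-to`, `--json`).

```python
from typing import List, Optional

# Script path as it appears in the commands above.
SCRIPT = "skills/resilient-pdf/scripts/extract_pdf.py"


def build_extract_command(
    out_path: str,
    pdf_path: Optional[str] = None,
    url: Optional[str] = None,
    download_to: Optional[str] = None,
) -> List[str]:
    """Return an argv list for a local-file or remote-URL extraction run.

    Hypothetical helper: it only arranges flags that the documented
    commands already use and does not run anything itself.
    """
    if (pdf_path is None) == (url is None):
        raise ValueError("pass exactly one of pdf_path or url")

    cmd = ["python3", SCRIPT]
    if url is not None:
        cmd += ["--url", url]
    else:
        cmd.append(pdf_path)
    cmd += ["--out", out_path]
    # --download-to is only shown alongside --url, so it is only added there.
    if url is not None and download_to is not None:
        cmd += ["--download-to", download_to]
    cmd.append("--json")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, timeout=...)`. Setting a timeout is a reasonable precaution here, since this workflow exists for PDFs that make other tools hang.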
Chunked plus summary command:
python3 skills/resilient-pdf/scripts/extract_pdf.py <file.pdf> \
--out <output.md> \
--chunk-dir <chunk-dir> \
--summary-out <summary.md> \
--chunk-chars 120000 \
--chunk-overlap 4000 \
--json
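To show what --chunk-chars and --chunk-overlap mean in practice, here is a minimal sketch of overlapping character-based chunking. It assumes a plain sliding window. The actual logic in extract_pdf.py may differ, for example by splitting on headings or page boundaries, and the function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_chars: int = 120_000, overlap: int = 4_000) -> List[str]:
    """Split text into windows of at most chunk_chars characters.

    Each window after the first starts `overlap` characters before the
    previous window ended, so content cut at a boundary appears in both.
    Illustrative only; not taken from extract_pdf.py.
    """
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= overlap < chunk_chars:
        raise ValueError("overlap must be >= 0 and smaller than chunk_chars")

    step = chunk_chars - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the values from the command above, a 250,000-character document would yield three chunks under this scheme, with each adjacent pair sharing 4,000 characters.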
The script:
- downloads the PDF first when --url is given
- relies on uvx
- runs uvx --from 'markitdown[pdf]' markitdown for the conversion
If uvx is not available, tell the operator the exact command to install it:
python3 -m pip install --user --break-system-packages uv
Do not silently install dependencies unless the user asked you to.
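A check along these lines can detect a missing uvx and report the install command without running it. This is a sketch using the standard-library `shutil.which`. The function name `uvx_status` and the message wording are hypothetical.

```python
import shutil

# Install command quoted from the instructions above; it is reported, never executed.
INSTALL_HINT = "python3 -m pip install --user --break-system-packages uv"


def uvx_status(tool: str = "uvx") -> str:
    """Return a message saying whether `tool` is on PATH.

    When the tool is missing, the message includes the exact install
    command so the operator can decide whether to run it.
    """
    path = shutil.which(tool)
    if path:
        return f"{tool} found at {path}"
    return f"{tool} not found. To install it, run:\n  {INSTALL_HINT}"


if __name__ == "__main__":
    print(uvx_status())
```

Reporting the command instead of executing it keeps the decision to install with the operator.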
A successful run should give you:
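- the extracted markdown at the --out path
- chunk files in --chunk-dir, when chunking was requested
- a summary file at the --summary-out path, when a summary was requested
- the downloaded PDF, when --url was used
Use those outputs as the source of truth for later summarization.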
This skill does not replace the pdf tool. It is the fallback when that path is unreliable.