Extract and read text from PDF files using PyMuPDF.
pip install pymupdf
# Extract text (first 10 pages by default)
python pdf_reader.py "path/to/file.pdf" 10
# Output to JSON file (for reading)
python pdf_reader.py "path/to/file.pdf" 10 --output=extracted.json
# Read a specific number of pages
python pdf_reader.py "path/to/file.pdf" 5
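The command shapes above (a PDF path, an optional page count, and an optional `--output=` flag) could be parsed roughly as follows. This is only a sketch using the standard library's argparse; the function name `parse_args` and the attribute names `pdf_path`, `max_pages`, and `output` are assumptions for illustration, not taken from pdf_reader.py itself.

```python
import argparse


def parse_args(argv=None):
    """Parse a command line shaped like the usage examples (hypothetical names)."""
    parser = argparse.ArgumentParser(
        prog="pdf_reader.py",
        description="Extract text from a PDF file.",
    )
    # Required positional argument: the PDF to read.
    parser.add_argument("pdf_path", help="path to the input .pdf file")
    # Optional positional argument: page limit, defaulting to 10 pages.
    parser.add_argument("max_pages", nargs="?", type=int, default=10,
                        help="number of pages to read (default: 10)")
    # Optional flag: write the result to a JSON file.
    parser.add_argument("--output", default=None,
                        help="path to the output .json file")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["path/to/file.pdf", "5", "--output=extracted.json"])
    print(args.pdf_path, args.max_pages, args.output)
```

Making the page count an optional positional keeps the shortest invocation to just the file path, while `--output=...` stays an explicit opt-in.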
For safety, the script enforces:
- Reads only .pdf files within the current working directory
- Writes only .json files within the current working directory
- No path traversal (../) allowed

Files:
pdf_reader.py - Main Python script
SKILL.md - This documentation
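The safety rules listed earlier (extension checks, and paths confined to the working directory) could be enforced along the following lines. This is a minimal sketch under stated assumptions: the helper name `resolve_inside_dir` and its parameters are hypothetical, and the actual checks in pdf_reader.py may differ.

```python
from pathlib import Path


def resolve_inside_dir(user_path, required_suffix, base_dir=None):
    """Return the resolved path, or raise ValueError if a rule is violated.

    Hypothetical helper: requires the given suffix (e.g. ".pdf" or ".json")
    and a location inside base_dir (the current working directory by default).
    """
    base = Path(base_dir).resolve() if base_dir is not None else Path.cwd().resolve()
    # Resolving collapses any "../" segments before the containment check.
    candidate = (base / user_path).resolve()

    if candidate.suffix.lower() != required_suffix:
        raise ValueError(f"expected a {required_suffix} file: {user_path}")

    try:
        # relative_to raises ValueError when candidate is outside base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes the working directory: {user_path}")

    return candidate
```

Resolving the path before comparing it is the important step: checking the raw string for "../" alone would miss absolute paths that point outside the directory.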