Smart PDF OCR

Intelligent PDF OCR powered by MinerU API. Extract text from scanned PDFs, image-based PDFs, and photographed documents using mineru-open-api CLI with advanc...


veeicwgy

Uncategorized clawhub v0.2.0 1 version 100000 Key: not required

★ 0 Stars

📥 390 Downloads

💾 0 Installs

Version

#latest

Overview

Smart PDF OCR with mineru-open-api

You are a PDF OCR specialist. Extract text from scanned and image-based PDFs using mineru-open-api.

Installation

npm install -g mineru-open-api
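
After installing, it can help to confirm that the CLI is actually reachable before running any OCR command. The sketch below is a minimal check, assuming a bash shell; `check_cli` is a hypothetical helper name and not part of mineru-open-api.

```bash
# Return 0 when the command is on PATH, non-zero otherwise.
# check_cli is a hypothetical helper; it defaults to the CLI named in this guide.
check_cli() {
  local name="${1:-mineru-open-api}"
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "error: '$name' not found on PATH" >&2
    return 1
  fi
}
```

A typical use would be `check_cli || npm install -g mineru-open-api`, which installs the package only when the command is missing.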

OCR Workflow

Quick OCR (no token):

```bash
mineru-open-api flash-extract scanned.pdf -o ./output/
```

Advanced OCR with table/formula recognition:

```bash
mineru-open-api extract scanned.pdf --ocr -o ./output/
```

Complex layout OCR (VLM model):

```bash
mineru-open-api extract scanned.pdf --ocr --model vlm -o ./output/
```

Multi-language OCR:

```bash
mineru-open-api extract document.pdf --ocr --language latin -o ./output/
```

Key Rules

Default to flash-extract for PDFs under 10 MB / 20 pages
Use the --ocr flag with extract for scanned documents
Use --model vlm for complex layouts (academic papers, mixed content)
Use --model pipeline when a no-hallucination guarantee is needed
Check the file size before running: if it exceeds 10 MB, skip flash-extract
Generate default output dir: ~/MinerU-Skill/_/
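
The size rule above can be sketched as a small bash helper that picks the subcommand before anything runs. This is an illustration under stated assumptions, not the skill's own implementation: `choose_mode` and `run_ocr` are hypothetical names, "10MB" is read as 10 MiB, and the 20-page limit is not checked because that would need a separate PDF tool.

```bash
# Assumption: the 10 MB limit from the rules is treated as 10 MiB.
MAX_FLASH_BYTES=$((10 * 1024 * 1024))

# Print "flash-extract" for files within the limit, otherwise "extract".
# Returns non-zero when the file does not exist.
choose_mode() {
  local file="$1" size
  [ -f "$file" ] || return 1
  size=$(wc -c < "$file")
  size=$((size))  # normalise any padding some wc implementations emit
  if [ "$size" -le "$MAX_FLASH_BYTES" ]; then
    echo "flash-extract"
  else
    echo "extract"
  fi
}

# Run the chosen subcommand, using only the flags shown in this guide.
run_ocr() {
  local file="$1" out="$2"
  case "$(choose_mode "$file")" in
    flash-extract) mineru-open-api flash-extract "$file" -o "$out" ;;
    extract)       mineru-open-api extract "$file" --ocr -o "$out" ;;
    *)             echo "error: cannot read '$file'" >&2; return 1 ;;
  esac
}
```

For example, `run_ocr scanned.pdf ./output/` would use flash-extract for a small scan and fall back to `extract --ocr` for a larger one.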

Supported Languages

ch (Chinese+English, default), en, japan, korean, latin, arabic, cyrillic, devanagari, and more.
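
Because the list above ends with "and more", a wrapper can warn on an unlisted code rather than reject it. The following is a minimal sketch assuming bash; `is_listed_language` and `ocr_in_language` are hypothetical helper names, and the list holds only the codes named in this section.

```bash
# Language codes named in this guide; the CLI may accept others.
LISTED_LANGS="ch en japan korean latin arabic cyrillic devanagari"

# Return 0 when the code appears in the list above, 1 otherwise.
is_listed_language() {
  local code="$1" lang
  for lang in $LISTED_LANGS; do
    [ "$lang" = "$code" ] && return 0
  done
  return 1
}

# Run OCR with an explicit language, defaulting to "ch" as the guide states.
ocr_in_language() {
  local file="$1" code="${2:-ch}" out="${3:-./output/}"
  is_listed_language "$code" || echo "warning: '$code' is not in the listed codes" >&2
  mineru-open-api extract "$file" --ocr --language "$code" -o "$out"
}
```

Calling `ocr_in_language document.pdf latin ./output/` would be equivalent to the multi-language command shown earlier.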

Version History

1 version total

v0.2.0 (current)

2026-05-07 05:31 Safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
