
OCR Document

OCR document extraction: extract text from scanned documents, photos, and images using OCR. Use when reading scanned PDFs, photographed pages, handwritten content, and similar material.
tanis90
clawhub · v1.0.0 · API key: not required

Overview

OCR Document - Extract Text from Scanned Documents and Images

Extract text from scanned documents and images using OCR via MinerU Open API. No API key required.

Quick Start

# OCR a scanned PDF
mineru-open-api flash-extract scanned.pdf

# OCR an image of a document
mineru-open-api flash-extract page-photo.jpg

# OCR from URL (no download needed)
mineru-open-api flash-extract https://example.com/scanned.pdf

# Specify language for better accuracy
mineru-open-api flash-extract scanned.pdf --language en

# Save OCR result to file
mineru-open-api flash-extract scanned.pdf -o ./output/
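
The single-file commands above can be wrapped in a loop when a whole folder of scans needs OCR. The following is a minimal sketch, not part of the CLI: the function name ocr_dir and the OCR_CMD variable are hypothetical, and it assumes that --language and -o can be combined in one call, which the examples above show only separately.

```shell
# Hypothetical helper: OCR every PDF in a directory with flash-extract.
# ocr_dir and OCR_CMD are illustrative names, not part of mineru-open-api.

# Command to run; set OCR_CMD to "echo" or "true" for a dry run.
OCR_CMD="${OCR_CMD:-mineru-open-api}"

ocr_dir() {
  local in_dir="$1" out_dir="$2" lang="${3:-ch}"   # "ch" is the documented default
  local f count=0
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.pdf; do
    [ -e "$f" ] || continue        # skip when the glob matches nothing
    # Assumption: --language and -o may be passed together.
    "$OCR_CMD" flash-extract "$f" --language "$lang" -o "$out_dir" || return 1
    count=$((count + 1))
  done
  echo "$count"                    # number of files processed
}
```

Calling ocr_dir ./scans ./output en would process each PDF in ./scans with the English language hint and print how many files were handled. The loop stops at the first failed call so that a problem is noticed early.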

Language Rule

You MUST reply to the user in the SAME language they use. This is non-negotiable.

Capabilities

  • OCR for scanned PDFs, photographed documents, images
  • Supports PDF, PNG, JPG, WebP, BMP, TIFF
  • Supports both local files and URLs directly
  • Language hint with --language (default: ch, use en for English)
  • No API key, no signup, no authentication
  • Max 10MB / 20 pages per document
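
The format and size limits listed above can be checked locally before anything is uploaded. This is a rough sketch under stated assumptions: the function names are hypothetical, "10MB" is read as 10 MiB, only the extensions named in the list are accepted, and the 20-page limit is not checked because that would need a PDF parser.

```shell
# Hypothetical pre-flight check; none of these functions belong to the CLI.
MAX_BYTES=$((10 * 1024 * 1024))    # assumption: "10MB" means 10 MiB

is_url() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

has_supported_ext() {
  # Lowercase the extension so "SCAN.PDF" is accepted too.
  local ext
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf|png|jpg|webp|bmp|tiff) return 0 ;;   # only the formats listed above
    *) return 1 ;;
  esac
}

can_flash_extract() {
  local input="$1" size
  if is_url "$input"; then
    return 0                       # remote size is unknown here; left to the API
  fi
  [ -f "$input" ] || { echo "not a file: $input" >&2; return 1; }
  has_supported_ext "$input" || { echo "unsupported type: $input" >&2; return 1; }
  size=$(( $(wc -c < "$input") ))  # arithmetic expansion strips wc padding
  [ "$size" -le "$MAX_BYTES" ] || { echo "too large: $input" >&2; return 1; }
}
```

A caller could then guard the upload with: can_flash_extract scanned.pdf && mineru-open-api flash-extract scanned.pdf. Whether the service also accepts aliases such as .jpeg or .tif is not stated in the source, so they are left out here.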

When to Use

  • User asks to "OCR" a document or image
  • User has a scanned PDF that needs text extraction
  • User shares a photo of a page and wants the text
  • User mentions "scan", "handwriting", or "recognize text"

CLI Reference

Run mineru-open-api flash-extract --help for all available options.

Data Privacy

  • flash-extract uploads the document to MinerU's cloud API for processing and returns the result. No account or API key is required.
  • Documents are processed in real-time and are not stored after extraction.
  • For details, see https://mineru.net

Notes

  • Best results with clear, high-resolution scans
  • For higher precision OCR with full layout preservation, use mineru-open-api extract --ocr (requires auth via mineru-open-api auth)
  • If the CLI cannot be installed via npm/uv/go, download it from https://mineru.net/ecosystem?tab=cli
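
The choice between the two modes mentioned in the notes can be sketched as a small helper that prints the command for review rather than running it. The function name and the mode labels "fast" and "precise" are hypothetical, and placing the file argument after extract --ocr is an assumption; check mineru-open-api extract --help for the actual syntax.

```shell
# Hypothetical helper: print (do not run) the command for a given mode.
# "fast" and "precise" are illustrative labels, not CLI options.
ocr_command() {
  local mode="$1" file="$2"
  case "$mode" in
    # No account needed; quick text extraction.
    fast)    printf 'mineru-open-api flash-extract %s\n' "$file" ;;
    # Requires prior "mineru-open-api auth"; argument order is an assumption.
    precise) printf 'mineru-open-api extract --ocr %s\n' "$file" ;;
    *)       echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

Printing the command first keeps the authenticated path explicit, since extract --ocr will only work after mineru-open-api auth has been completed.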

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-30 13:37, scanned safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
