
MinerU zero-setup document extraction: convert PDFs, images, Word, and PowerPoint to Markdown instantly. No login, no token, no configuration. Just run and get results.

Skill: mineru-extract (source: clawhub), v0.2.1, 2 versions. API key: not required.

Overview

Fast Document Extraction with mineru-open-api

Zero-setup, instant document parsing — no login, no token, no configuration needed. Supports tables and formulas (LaTeX).

Installation

npm install -g mineru-open-api

Or via Go (macOS/Linux):

go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Verify installation

mineru-open-api version

Quick start

mineru-open-api flash-extract report.pdf                     # PDF → Markdown (instant!)
mineru-open-api flash-extract report.pdf -o ./out/           # Save into the ./out/ directory
mineru-open-api flash-extract resume.docx                    # Word → Markdown
mineru-open-api flash-extract slides.pptx                    # PowerPoint → Markdown
mineru-open-api flash-extract photo.png                      # Image → Markdown (OCR)
mineru-open-api flash-extract https://example.com/doc.pdf    # URL → Markdown

Supported input formats

| Format | Supported |
|--------|:---------:|
| PDF (.pdf) | Yes |
| Images (.png, .jpg, .jpeg, .jp2, .webp, .gif, .bmp) | Yes |
| Word (.docx) | Yes |
| PowerPoint (.pptx) | Yes |
| URLs (remote files) | Yes |

Command: flash-extract

mineru-open-api flash-extract <file-or-url> [flags]

Flags

| Flag | Short | Default | Description |
|------|-------|---------|-------------|
| --output | -o | _(stdout)_ | Output path (file or directory) |
| --language | | ch | Document language |
| --pages | | _(all)_ | Page range, e.g. 1-10 |
| --timeout | | 900 | Timeout in seconds |

Supported --language values

Values are organized by script/language family — each value covers all languages in its group.

Standalone language packs

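| Value | Included languages | Notes |
|-------|--------------------|-------|
| ch | Chinese, English, Chinese Traditional | Chinese and English (default) |
| ch_server | Chinese, English, Chinese Traditional, Japanese | Traditional Chinese, handwriting |
| en | English | English only |
| japan | Chinese, English, Chinese Traditional, Japanese | Primarily Japanese |
| korean | Korean, English | Korean |
| chinese_cht | Chinese, English, Chinese Traditional, Japanese | Primarily Traditional Chinese |
| ta | Tamil, English | Tamil |
| te | Telugu, English | Telugu |
| ka | Kannada | Kannada |
| el | Greek, English | Greek |
| th | Thai, English | Thai |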

Language family packs

| Value | Script/Family | Included languages |
|-------|---------------|--------------------|
| latin | Latin script | French, German, Afrikaans, Italian, Spanish, Bosnian, Portuguese, Czech, Welsh, Danish, Estonian, Irish, Croatian, Uzbek, Hungarian, Serbian (Latin), Indonesian, Occitan, Icelandic, Lithuanian, Maori, Malay, Dutch, Norwegian, Polish, Slovak, Slovenian, Albanian, Swedish, Swahili, Tagalog, Turkish, Latin, Azerbaijani, Kurdish, Latvian, Maltese, Pali, Romanian, Vietnamese, Finnish, Basque, Galician, Luxembourgish, Romansh, Catalan, Quechua |
| arabic | Arabic script | Arabic, Persian, Uyghur, Urdu, Pashto, Kurdish, Sindhi, Balochi, English |
| cyrillic | Cyrillic script | Russian, Belarusian, Ukrainian, Serbian (Cyrillic), Bulgarian, Mongolian, Abkhazian, Adyghe, Kabardian, Avar, Dargin, Ingush, Chechen, Lak, Lezgin, Tabasaran, Kazakh, Kyrgyz, Tajik, Macedonian, Tatar, Chuvash, Bashkir, Malian, Moldovan, Udmurt, Komi, Ossetian, Buryat, Kalmyk, Tuvan, Sakha, Karakalpak, English |
| east_slavic | East Slavic | Russian, Belarusian, Ukrainian, English |
| devanagari | Devanagari script | Hindi, Marathi, Nepali, Bihari, Maithili, Angika, Bhojpuri, Magahi, Santali, Newari, Konkani, Sanskrit, Haryanvi, English |
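
Because each value covers a whole group, choosing a --language value amounts to a lookup from the document's language to its pack. The sketch below is illustrative only: language_pack_for is a hypothetical helper (not part of the CLI), it covers just a handful of languages from the tables above, and the fallback to ch simply mirrors the documented default.

```shell
# Hypothetical helper: map a language name to a --language value.
# Only a small, illustrative subset of the tables above is covered.
language_pack_for() {
  # Lower-case the input so "French" and "french" hit the same entry.
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    english)                      echo en ;;
    japanese)                     echo japan ;;
    korean)                       echo korean ;;
    french|german|spanish|italian|portuguese|dutch|polish|turkish|vietnamese)
                                  echo latin ;;
    arabic|persian|urdu)          echo arabic ;;
    bulgarian|kazakh|macedonian)  echo cyrillic ;;
    # Russian, Ukrainian, and Belarusian are also listed under cyrillic;
    # picking east_slavic here is an arbitrary choice for the sketch.
    russian|ukrainian|belarusian) echo east_slavic ;;
    hindi|marathi|nepali)         echo devanagari ;;
    *)                            echo ch ;;  # documented default
  esac
}

# Example (not run here):
#   mineru-open-api flash-extract report.pdf --language "$(language_pack_for French)"
```

Languages that appear in more than one pack (for example Kurdish under both latin and arabic) depend on the script actually used in the document, so a real lookup would need that extra input.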

Examples

mineru-open-api flash-extract report.pdf
mineru-open-api flash-extract report.pdf -o ./out/
mineru-open-api flash-extract report.pdf --language en
mineru-open-api flash-extract report.pdf --language latin
mineru-open-api flash-extract report.pdf --pages "1-5"
mineru-open-api flash-extract contract.docx -o ./out/
mineru-open-api flash-extract presentation.pptx -o ./out/
mineru-open-api flash-extract scan.jpg --language ch

Output behavior

  • No -o flag: result goes to stdout; status/progress messages go to stderr
  • With -o flag: result saved to file/directory; progress messages on stderr
  • Markdown output includes extracted images saved alongside the .md file
  • Tables are converted to Markdown tables
  • Formulas are converted to LaTeX format (inline $...$ and block $$...$$)
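
Since the result goes to stdout and progress goes to stderr, ordinary shell redirection can keep the two apart. A minimal sketch, assuming a POSIX-like shell; extract_to_file, MINERU_BIN, and the file names are hypothetical names introduced for illustration, not part of the CLI.

```shell
# Hypothetical helper: write the Markdown (stdout) and the progress
# messages (stderr) of one flash-extract run to separate files.
# MINERU_BIN is an illustrative override; it defaults to the real CLI name.
MINERU_BIN=${MINERU_BIN:-mineru-open-api}

extract_to_file() {
  # $1 = input file or URL, $2 = Markdown destination, $3 = log destination
  "$MINERU_BIN" flash-extract "$1" > "$2" 2> "$3"
}

# Example (not run here):
#   extract_to_file report.pdf report.md extract.log
```

The function returns the CLI's own exit status, so callers can still inspect it afterwards.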

Agent guidelines

When using this skill on behalf of the user:

  • Always use flash-extract for any input — whether it's a local file or a URL (e.g. https://cdn-mineru.openxlab.org.cn/demo/example.pdf). Do NOT assume a URL means "web page". flash-extract handles URLs to document files directly.
  • Quote file paths that contain spaces or special characters with double quotes. Example: mineru-open-api flash-extract "report 01.pdf".
  • Don't run commands blindly on errors — explain the exit code and troubleshooting steps instead of re-running the command.
  • Installation questions (e.g. "mineru 怎么安装", i.e. "how do I install mineru") should be answered with the install instructions above.

Default output directory

When the user does NOT specify -o, generate a default output directory:

~/MinerU-Skill/<name>_<hash>/
  • <name>: derived from the source, then sanitized (replace spaces and shell-unsafe characters with _, collapse consecutive _).
      • For URLs: last path segment (e.g. https://arxiv.org/pdf/2509.22186 → 2509.22186)
      • For local files: filename without extension (e.g. report.pdf → report)
  • <hash>: first 6 characters of the MD5 hash of the full original source.
echo -n "source" | md5sum | cut -c1-6   # Linux
echo -n "source" | md5 | cut -c1-6      # macOS

When the user specifies -o: use the user's path as-is.
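
The naming rule above can be sketched as a few small shell functions. This is one possible reading, not the skill's own implementation: the function names are hypothetical, and the set of "shell-unsafe" characters is an assumption (anything outside letters, digits, dot, underscore, and hyphen), since the rule does not enumerate them.

```shell
# Hypothetical helpers sketching the naming rule; not part of the CLI.

# Print the sanitized <name> part for a source (local path or URL).
mineru_source_name() {
  local src name
  src=$1
  case "$src" in
    http://*|https://*)
      name=${src%/}        # ignore a single trailing slash
      name=${name##*/}     # URL: keep the last path segment as-is
      ;;
    *)
      name=${src##*/}      # local file: drop the directory part
      name=${name%.*}      # ... and the final extension
      ;;
  esac
  # Assumption: every character outside A-Z a-z 0-9 . _ - counts as unsafe.
  printf '%s\n' "$name" | sed -e 's/[^A-Za-z0-9._-]/_/g' -e 's/__*/_/g'
}

# Print the first 6 hex characters of the MD5 of the full original source.
mineru_source_hash() {
  if command -v md5sum >/dev/null 2>&1; then
    printf '%s' "$1" | md5sum | cut -c1-6   # Linux
  else
    printf '%s' "$1" | md5 | cut -c1-6      # macOS
  fi
}

# Print ~/MinerU-Skill/<name>_<hash>/ for the given source.
default_output_dir() {
  printf '%s/MinerU-Skill/%s_%s/\n' "$HOME" \
    "$(mineru_source_name "$1")" "$(mineru_source_hash "$1")"
}
```

A caller would pass the result to -o only when the user gave no path, for example `mineru-open-api flash-extract report.pdf -o "$(default_output_dir report.pdf)"`. Using printf '%s' instead of echo -n avoids differences in how shells treat the -n option.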

Skill upgrade = CLI upgrade

When the user asks to upgrade this skill, re-install the CLI first:

npm install -g mineru-open-api@latest

Exit codes

| Code | Meaning | Recovery |
|------|---------|----------|
| 0 | Success | |
| 1 | General API or unknown error | Check network; retry; use --verbose |
| 2 | Invalid parameters / usage error | Check command syntax and flag values |
| 4 | File too large or page limit exceeded | Try a smaller file or fewer pages |
| 5 | Extraction failed | Document may be corrupted or unsupported |
| 6 | Timeout | Increase with --timeout |
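
One way to act on these codes without re-running blindly is to translate the exit status into its documented meaning and recovery hint. The sketch below is hypothetical: explain_exit_code and run_flash_extract are illustrative names, the messages are copied from the table above, and any other code is reported as undocumented rather than guessed at.

```shell
# Hypothetical helper: print the documented meaning and recovery hint
# for a flash-extract exit code.
explain_exit_code() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "General API or unknown error. Check network; retry; use --verbose" ;;
    2) echo "Invalid parameters / usage error. Check command syntax and flag values" ;;
    4) echo "File too large or page limit exceeded. Try a smaller file or fewer pages" ;;
    5) echo "Extraction failed. Document may be corrupted or unsupported" ;;
    6) echo "Timeout. Increase with --timeout" ;;
    *) echo "Exit code $1 is not documented" ;;
  esac
}

# Run one extraction and, on failure, report the hint on stderr
# instead of retrying automatically.
run_flash_extract() {
  local rc=0
  mineru-open-api flash-extract "$@" || rc=$?
  [ "$rc" -eq 0 ] || explain_exit_code "$rc" >&2
  return "$rc"
}

# Example (not run here):
#   run_flash_extract report.pdf -o ./out/
```

The wrapper passes the original exit status through, so scripts that call it can still branch on the code themselves.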

Troubleshooting

  • Timeout on large files: Increase with --timeout 1600
  • Extraction quality is poor: Try specifying --language to match the document language
  • HTTP 429: Rate limit hit. Wait a few minutes and retry.

Version history

2 versions in total

  • v0.2.1 (current)
    2026-05-03 05:18, security scan: safe
  • v1.0.0
    2026-03-31 05:21

Security scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
