universal-pdf-vision-parser

Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....

利用多模态视觉技术（Qwen-VL-Max）从PDF中提取多语文档内容及语言学习笔记（法语、德语、日语、西班牙语等）。

mingensiie

内容创作 clawhub v1.0.0 1 版本 100000 Key: 需要

★ 0

Stars

📥 650

下载

💾 81

安装

版本

#latest

概述

Universal PDF Vision Parser Skill

Version: 0.1

This skill is a high-end multilingual document digitizer. It uses multimodal vision to 'look' at each PDF page, making it perfect for language learning notes, bilingual documents, and complex layouts that standard OCR fails to capture.

Prerequisites

DashScope API Key: A valid key from Alibaba Cloud Bailian with qwen-vl-max access.
Environment:

pip install pymupdf dashscope

Usage

Basic Command

python scripts/vision_parse.py --pdf <path_to_pdf> --out <path_to_output.md> --api-key <YOUR_API_KEY> --max-pages 2

--max-pages: (Optional) Max pages to process. Defaults to 2. Set to -1 for all pages.

Agentic Workflow

Visual Scanning: Converts PDF pages to 300 DPI PNGs.
Expert Transcription: Qwen-VL-Max identifies the language and transcribes terms, translations, and explanations.
Markdown Structuring: Automatically formats content with bold keywords, italicized meanings, and clean tables.

Examples

User: "Convert this German-Chinese note to markdown: notes.pdf"

Agent Action:

python scripts/vision_parse.py --pdf notes.pdf --out notes.md

版本历史

共 1 个版本

v1.0.0 当前

2026-03-29 22:38 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)

安全，无风险

查看报告

🔗 相关推荐

content-creation

Baidu Wenku AIPPT

ide-rea

使用百度文库 AI 智能生成 PPT，自动根据内容选择模板。

★ 66 📥 46,191

content-creation

YouTube

byungkyu

使用托管OAuth集成YouTube Data API，支持搜索视频、管理播放列表、获取频道数据及评论互动，适用于用户需要时使用此技能。

★ 142 📥 41,063

content-creation

AdMapix

fly0pants

广告情报与应用数据分析助手，支持搜索广告素材、分析应用排名、下载量、收入及市场洞察，用于广告素材和竞品分析。

★ 295 📥 136,480