U2-doc-parser

Parse documents using UniDoc API for conversion to Markdown or JSON format. Supports both synchronous and asynchronous parsing with automatic status polling.
Author: aaiccee
Category: Developer tools · Source: clawhub · Version: v1.0.5 (2 versions, tag: latest) · API key: not required
Stars: 1 · Downloads: 1,072 · Installs: 7

Name: u2-doc-parser

UniDoc Document Parser
======================

Overview


Parse documents using UniDoc API for conversion to Markdown or JSON format. Supports both synchronous and asynchronous parsing with automatic status polling. Ideal for converting various document formats (PDF, DOC, DOCX, images) through a cloud-based API service.

⚠️ Important Privacy Notice

  • This skill uploads your documents to an external API service: https://unidoc.uat.hivoice.cn
  • Documents are transmitted over the internet and processed on third-party servers
  • No authentication or API key is required for this UAT environment
  • Do not use with sensitive, confidential, or private documents
  • By using this skill, you acknowledge that your files will be uploaded to external servers

Prereqs / when to read references


If you encounter API errors or network issues, or need to understand the API endpoints, read:

  • references/unidoc-notes.md

Quick start (single document)


# Output to terminal (default)
python scripts/unidoc_parse.py /path/to/file.pdf

# Save to file
python scripts/unidoc_parse.py /path/to/file.pdf --output result.md

# Convert to JSON format (async mode)
python scripts/unidoc_parse.py /path/to/file.docx --format json --mode async

Options


  • --format md|json (default: md)
    Output format: Markdown or JSON
  • --mode sync|async (default: sync)
    Synchronous mode: waits for conversion to complete
    Asynchronous mode: polls status until completion
  • --func METHOD (default: unisound)
    Conversion method/algorithm to use
  • --output FILE (optional)
    Save output to file instead of printing to terminal
    When not specified, results are printed directly to stdout
  • --uid UUID (optional)
    Custom user ID (auto-generated if not provided)
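
The options above map naturally onto a standard argparse declaration. The following is a minimal sketch only: it assumes the script uses Python's argparse module, and the names build_parser and parse_args are hypothetical. The real scripts/unidoc_parse.py may be organized differently, and the assumption that an auto-generated user ID is a random UUID comes from the --uid UUID placeholder, not from the script itself.

```python
import argparse
import uuid


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real script)."""
    parser = argparse.ArgumentParser(description="Parse a document via UniDoc")
    parser.add_argument("file", help="path to the document to convert")
    parser.add_argument("--format", choices=["md", "json"], default="md",
                        help="output format")
    parser.add_argument("--mode", choices=["sync", "async"], default="sync",
                        help="wait for the result (sync) or poll for it (async)")
    parser.add_argument("--func", default="unisound",
                        help="conversion method to use")
    parser.add_argument("--output", default=None,
                        help="write the result to this file instead of stdout")
    parser.add_argument("--uid", default=None,
                        help="custom user ID; generated when omitted")
    return parser


def parse_args(argv):
    """Parse argv and fill in a user ID when none was supplied."""
    args = build_parser().parse_args(argv)
    if args.uid is None:
        # Assumption: an auto-generated ID is a random UUID string.
        args.uid = str(uuid.uuid4())
    return args
```

Calling parse_args(["/path/to/file.docx", "--format", "json", "--mode", "async"]) would correspond to the third quick-start command.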

Output


  • Default: Prints converted content directly to terminal (stdout)
  • With --output: Saves to specified file path
  • Progress and error messages are sent to stderr
  • Can be piped to other commands: python scripts/unidoc_parse.py doc.pdf | grep "keyword"
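
Keeping converted content on stdout and everything else on stderr is what makes the piping example work. A minimal sketch of that split, assuming hypothetical helper names log and emit (the actual script may route output differently):

```python
import sys
from pathlib import Path
from typing import Optional


def log(message: str) -> None:
    """Send progress and error text to stderr so stdout stays clean for pipes."""
    print(message, file=sys.stderr)


def emit(content: str, output: Optional[str] = None) -> None:
    """Write converted content to a file if a path is given, otherwise to stdout."""
    if output:
        Path(output).write_text(content, encoding="utf-8")
        log(f"Saved result to {output}")
    else:
        sys.stdout.write(content)
        # End with a newline so shell prompts and pipes behave predictably.
        if not content.endswith("\n"):
            sys.stdout.write("\n")
```

With this arrangement, a downstream grep sees only the converted document, never the progress messages.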

Notes


  • Privacy: Your documents are uploaded to UniDoc's UAT servers for processing
  • No authentication: Current implementation does not require API keys or credentials
  • Network: Requires internet connectivity to https://unidoc.uat.hivoice.cn
  • Supported formats: PDF, DOC, DOCX, PNG, JPG, etc.
  • Async mode: Polls every 1 second until completion (max 5 minutes)
  • Limits: Max file size and rate limits depend on API service configuration
  • Recommendation: For large files or batch processing, prefer async mode
  • Security: Only use with non-sensitive test documents
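
The async behaviour described in the notes (one poll per second, giving up after five minutes) can be sketched as a small loop. This is an illustration under stated assumptions, not the script's actual code: poll_until_done is a hypothetical name, fetch_status stands in for whatever request the script makes, and the status strings "done" and "failed" are invented for the example.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    interval: float = 1.0,      # documented: poll every 1 second
    timeout: float = 300.0,     # documented: give up after 5 minutes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it reports a terminal state or time runs out.

    The status values "done" and "failed" are assumptions for this sketch;
    the real API may report completion differently.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("done", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout:.0f} seconds")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting; in normal use the defaults apply.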

Version history

2 versions in total

  • v1.0.5 (current)
    2026-03-29 08:50, security scan: safe
  • v1.0.0
    2026-03-19 04:12

Security scan

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
