
Agent PaddleOCR Vision

Multi-language document understanding with PaddleOCR
nhzallen
AI Intelligence · clawhub · v1.1.0 · 1 version · 100000 · Key: required
★ 0 stars · 📥 574 downloads · 💾 120 installs · 1 version
#agent-actions #batch #document-understanding #invoice #latest #ocr #paddleocr #searchable-pdf

Overview

Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

  • OCR extraction via PaddleOCR cloud API (requires credentials)
  • 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
  • Action suggestion with structured parameters
  • Batch processing
  • Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out
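
If the batch run needs to be triggered from another Python program rather than a shell, one option is to build the same command line and hand it to `subprocess`. The sketch below uses only the two flags shown above; the helper names `build_batch_command` and `run_batch` are hypothetical and not part of the skill, and the script path is assumed to be relative to the skill's root directory.

```python
import subprocess
import sys


def build_batch_command(batch_dir, output_dir, script="scripts/doc_vision.py"):
    """Return the argv list for a batch run (hypothetical helper).

    Only --batch-dir and --output-dir are used, as in the command above.
    """
    return [
        sys.executable,          # run with the current Python interpreter
        script,
        "--batch-dir", str(batch_dir),
        "--output-dir", str(output_dir),
    ]


def run_batch(batch_dir, output_dir):
    """Run the batch command and return its exit code."""
    completed = subprocess.run(
        build_batch_command(batch_dir, output_dir),
        check=False,             # let the caller decide how to handle failures
    )
    return completed.returncode


if __name__ == "__main__":
    print(build_batch_command("./inbox", "./out"))
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.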

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

Type            | Actions
----------------|-------------------------------------------
Invoice         | create_expense, archive, tax_report
Business Card   | add_contact, save_vcard
Receipt         | create_expense, split_bill
Table           | export_csv, analyze_data
Contract        | summarize, extract_dates, flag_obligations
ID Card         | extract_id_info, verify_age
Passport        | store_passport_info, check_validity
Bank Statement  | categorize_transactions, generate_report
Driver License  | store_license_info, check_expiry
Tax Form        | summarize_tax, suggest_deductions
General         | summarize, translate, search_keywords
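
An agent consuming this skill might keep the type-to-action mapping as a plain lookup table. The sketch below copies the action names from the table above; the lowercase snake_case keys and the `suggest_actions` helper are assumptions made for illustration, since the identifiers the script actually emits are documented in docs/README.zh.md and not here.

```python
# Action names are taken from the Supported Types table.
# The dictionary keys are assumed spellings, not confirmed output values.
SUGGESTED_ACTIONS = {
    "invoice": ["create_expense", "archive", "tax_report"],
    "business_card": ["add_contact", "save_vcard"],
    "receipt": ["create_expense", "split_bill"],
    "table": ["export_csv", "analyze_data"],
    "contract": ["summarize", "extract_dates", "flag_obligations"],
    "id_card": ["extract_id_info", "verify_age"],
    "passport": ["store_passport_info", "check_validity"],
    "bank_statement": ["categorize_transactions", "generate_report"],
    "driver_license": ["store_license_info", "check_expiry"],
    "tax_form": ["summarize_tax", "suggest_deductions"],
    "general": ["summarize", "translate", "search_keywords"],
}


def suggest_actions(doc_type):
    """Return the actions for a document type, falling back to 'general'."""
    key = doc_type.strip().lower().replace(" ", "_")
    return list(SUGGESTED_ACTIONS.get(key, SUGGESTED_ACTIONS["general"]))


if __name__ == "__main__":
    print(suggest_actions("Business Card"))
```

Falling back to the general actions keeps the caller working when a type label is unrecognized.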

Configuration

Required environment variables:

  • PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
  • PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

  • PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds
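
A caller can check these variables before invoking the script so that a missing token or a malformed endpoint fails early. The following is a minimal sketch, not the skill's own loader: the `PaddleOCRConfig` class and `load_config` function are hypothetical names, and only the three variable names, the /layout-parsing suffix, and the 600-second default come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PaddleOCRConfig:
    api_url: str
    access_token: str
    timeout: float = 600.0  # default from the Optional section, in seconds


def load_config(env=None):
    """Read and validate the PaddleOCR settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    api_url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "").strip()
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "").strip()

    if not api_url:
        raise ValueError("PADDLEOCR_DOC_PARSING_API_URL is not set")
    if not api_url.rstrip("/").endswith("/layout-parsing"):
        raise ValueError("PADDLEOCR_DOC_PARSING_API_URL must end in /layout-parsing")
    if not token:
        raise ValueError("PADDLEOCR_ACCESS_TOKEN is not set")

    # float() raises ValueError if the timeout is not numeric.
    timeout = float(env.get("PADDLEOCR_DOC_PARSING_TIMEOUT", "600"))
    return PaddleOCRConfig(api_url=api_url, access_token=token, timeout=timeout)


if __name__ == "__main__":
    print(load_config().api_url)
```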

Searchable PDF

With --make-searchable-pdf, the script embeds an OCR text layer aligned to the original layout using bounding boxes. This requires poppler (system package) and the Python packages pdf2image, reportlab, pypdf, and pillow.
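
To illustrate what bounding-box alignment involves, the sketch below converts an OCR box given in image pixels into a rectangle in PDF points. It is an assumption about how such alignment can be done, not the skill's actual code: the function name and the `(x0, y0, x1, y1)` box format with a top-left origin are hypothetical. The vertical flip is needed because PDF page coordinates place the origin at the bottom-left corner.

```python
def image_bbox_to_pdf_rect(bbox, image_size, page_size):
    """Map a pixel bbox (top-left origin) to a PDF rect (bottom-left origin).

    bbox:       (x0, y0, x1, y1) in image pixels (assumed format)
    image_size: (width, height) of the page image in pixels
    page_size:  (width, height) of the PDF page in points
    Returns (left, bottom, width, height) in points.
    """
    x0, y0, x1, y1 = bbox
    img_w, img_h = image_size
    page_w, page_h = page_size

    # Scale factors from pixels to points along each axis.
    sx = page_w / img_w
    sy = page_h / img_h

    left = x0 * sx
    width = (x1 - x0) * sx
    height = (y1 - y0) * sy
    # Flip the y axis: the box's lower edge (y1) is measured from the page bottom.
    bottom = page_h - y1 * sy
    return left, bottom, width, height


if __name__ == "__main__":
    print(image_bbox_to_pdf_rect((100, 200, 300, 250), (1000, 2000), (500, 1000)))
```

The resulting rectangle can then be used to position and size invisible text so that selecting or searching text in the PDF highlights the matching region of the scanned image.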

Full Documentation

Detailed usage, troubleshooting, and development guides are available in multiple languages under docs/:

  • 中文: docs/README.zh.md
  • English: docs/README.en.md
  • Español: docs/README.es.md
  • العربية: docs/README.ar.md

License

MIT-0


Made for OpenClaw. Let your agent see and act.

Version History

1 version in total

  • v1.1.0 (current)
    2026-03-29 22:55 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
