OCR with Python

Extract Chinese and English text from images and scanned PDFs, including documents like invoices and contracts, using PaddleOCR in Python.


roamerxv

Content Creation · clawhub v1.0.0 · 1 version · 98697.1 · Key: not required

★ 2 Stars · 📥 8,823 Downloads · 💾 2,154 Installs

Version: #latest

Overview

OCR Text Recognition

This skill uses PaddleOCR for text recognition, supporting both Chinese and English.

Quick Start

Basic Usage

Run OCR directly on an image or PDF file:

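from paddleocr import PaddleOCR

# The "ch" model recognizes both Chinese and English text
ocr = PaddleOCR(lang='ch')
# predict() takes an image or PDF path and returns a list of results
result = ocr.predict("file_path.jpg")
for res in result:
    print(res["rec_texts"])  # recognized text lines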

Dependency Installation

Install dependencies before first use:

pip3 install paddlepaddle paddleocr

Output Format

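Each recognition result (one per image or PDF page) exposes the following fields, which are also included in its JSON output:

rec_texts: list of recognized text lines
rec_scores: confidence score for each entry in rec_texts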
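When processing invoices or contracts, it is often useful to discard low-confidence lines before parsing them further. The sketch below assumes a single result that exposes rec_texts and rec_scores as parallel sequences, as described above; the function name confident_lines and the 0.5 threshold are illustrative assumptions, not part of PaddleOCR or this skill.

```python
from typing import Any, Mapping


def confident_lines(result: Mapping[str, Any], min_score: float = 0.5) -> list[tuple[str, float]]:
    """Pair each recognized text with its confidence and drop low-confidence entries.

    `result` is assumed to be one OCR result (or its JSON form) with the
    parallel sequences `rec_texts` and `rec_scores`. `min_score` is a
    hypothetical threshold chosen for illustration, not a PaddleOCR setting.
    """
    texts = result["rec_texts"]
    scores = result["rec_scores"]
    if len(texts) != len(scores):
        raise ValueError("rec_texts and rec_scores must have the same length")
    # Keep only entries whose confidence meets the threshold
    return [(text, float(score)) for text, score in zip(texts, scores) if score >= min_score]


# Hand-written sample shaped like the fields described above (not real OCR output)
sample = {"rec_texts": ["发票号码", "Total: 128.00", "~~"], "rec_scores": [0.98, 0.95, 0.31]}
print(confident_lines(sample))  # → [('发票号码', 0.98), ('Total: 128.00', 0.95)]
```

Raising the threshold trades recall for precision; a value that suits clean scans may drop legitimate text on blurry ones, so it is worth checking against a few of your own documents.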

Typical Use Cases

PDF Scans: Use PyMuPDF to convert the pages to images first, then run OCR
Image Text Recognition: Perform OCR directly on images
Multi-page PDFs: Process page by page
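The scanned-PDF and multi-page cases above (render with PyMuPDF, then OCR page by page) can be sketched as follows. This is a minimal sketch, assuming PyMuPDF is installed (pip install pymupdf, imported as fitz) and that PaddleOCR provides the predict() interface shown in Basic Usage; the function names, the 200 DPI default, and the pdf_pages output folder are hypothetical choices, not part of this skill's scripts.

```python
import os
from typing import Iterable


def render_pdf_pages(pdf_path: str, out_dir: str, dpi: int = 200) -> list[str]:
    """Render every page of a scanned PDF to a PNG with PyMuPDF; return paths in page order."""
    import fitz  # PyMuPDF; imported lazily so the helpers below work without it

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page at the requested resolution
            path = os.path.join(out_dir, f"page_{index + 1:03d}.png")
            pix.save(path)
            paths.append(path)
    return paths


def join_pages(page_texts: Iterable[list[str]]) -> str:
    """Join per-page text lines into one string, marking each page boundary."""
    blocks = []
    for number, lines in enumerate(page_texts, start=1):
        blocks.append(f"--- page {number} ---\n" + "\n".join(lines))
    return "\n\n".join(blocks)


def ocr_pdf(pdf_path: str, out_dir: str = "pdf_pages") -> str:
    """OCR a multi-page scanned PDF one page at a time and return the combined text."""
    from paddleocr import PaddleOCR  # imported lazily, only needed for the actual OCR run

    ocr = PaddleOCR(lang="ch")
    page_texts = []
    for image_path in render_pdf_pages(pdf_path, out_dir):
        lines = []
        for res in ocr.predict(image_path):  # one result per input image
            lines.extend(res["rec_texts"])
        page_texts.append(lines)
    return join_pages(page_texts)
```

Usage, with a hypothetical file name: print(ocr_pdf("contract.pdf")). Writing each page to disk keeps memory use roughly constant for long documents; a higher DPI may help with small print on invoices, at the cost of slower OCR.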

Scripts

Common scripts are located in the scripts/ directory.

Version History

1 version in total

v1.0.0 (current)

2026-03-28 23:02 · Safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

content-creation

AdMapix

fly0pants

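Ad intelligence and app data analysis assistant. Supports searching ad creatives and analyzing app rankings, downloads, revenue, and market insights; intended for ad creative and competitor analysis.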

★ 294 📥 136,396

content-creation

Humanizer

biostartechnology

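Removes traces of AI writing to make text more natural and authentic. Based on Wikipedia's "Signs of AI writing" guide, it identifies and fixes patterns such as inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule-of-three lists, AI vocabulary, negative parallelisms, and excessive conjunctive phrases.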

★ 857 📥 199,243

content-creation

YouTube

byungkyu

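Integrates the YouTube Data API with managed OAuth. Supports searching videos, managing playlists, retrieving channel data, and interacting with comments. Use this skill when the user needs these capabilities.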

★ 141 📥 41,013