概述

Image Reader - OCR Text Extraction

A high-performance OCR skill for extracting text from images. Powered by RapidOCR with PP-OCRv4 models, supporting Chinese and English text recognition.

Features

Multi-language: Chinese (simplified/traditional), English, and mixed text
High accuracy: PP-OCRv4 model with >95% accuracy on typical screenshots
Structured output: Text with confidence scores and bounding boxes
Image info: Dimensions, format, and color mode included
Fast: CPU-only, no GPU required

Quick Start

python scripts/read_image.py /path/to/image.jpg

Usage Examples

Extract text from a screenshot

python scripts/read_image.py screenshot.png

JSON Output

The script outputs structured JSON:

{
  "success": true,
  "text": "Full extracted text",
  "lines": [
    {
      "text": "Individual line",
      "confidence": 0.98,
      "box": [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    }
  ],
  "line_count": 5,
  "image_info": {
    "format": "PNG",
    "size": [1920, 1080],
    "mode": "RGB"
  }
}

Requirements

pip install rapidocr onnxruntime pillow

First run will download OCR models (~50MB) automatically.

Common Use Cases

UI Screenshots: Extract text from app/website screenshots
Document Photos: Read text from photographed documents
Diagrams: Extract labels and annotations
Receipts: Parse receipt/invoice data

Output Fields

Field	Type	Description
-------	------	-------------
success	bool	Whether OCR succeeded
text	string	All extracted text
lines	array	Individual text lines with metadata
line_count	int	Number of text lines detected
image_info	object	Image metadata

Technical Details

Engine: RapidOCR (ONNX Runtime backend)
Models: PP-OCRv4 (detection + recognition)
Languages: Chinese, English (auto-detected)
Performance: ~1-2 seconds per image on CPU

License

MIT License

Third-party dependencies:

RapidOCR - Apache 2.0 License
ONNX Runtime - MIT License

版本历史

共 1 个版本

v1.0.0 当前

2026-05-03 07:58 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)