Extract text from images by calling the Tesseract OCR engine directly from the command line.
Install the Tesseract OCR system package and the language data you need:
# Ubuntu/Debian:
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim
# macOS:
brew install tesseract tesseract-lang
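After installing, it can help to confirm that the language data you plan to pass to `-l` is actually present. `tesseract --list-langs` prints a header line followed by one installed language code per line. The sketch below assumes that output format; `has_lang` is a hypothetical helper name, not part of Tesseract.

```shell
# has_lang (hypothetical helper): read a language list on stdin and succeed
# only if some line matches the given code exactly (fixed string, whole line).
has_lang() {
  grep -qxF -- "$1"
}
```

For example, `tesseract --list-langs | has_lang chi_sim && echo "chi_sim is installed"` prints the message only when the Simplified Chinese data is available.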
# Use default language (English)
tesseract /path/to/image.png stdout
# Specify language (Chinese + English)
tesseract /path/to/image.png stdout -l chi_sim+eng
# Save to file (Tesseract appends .txt to the output base, so this writes output.txt)
tesseract /path/to/image.png output -l chi_sim+eng
# Multiple languages (each language's data must be installed, e.g. tesseract-ocr-jpn on Ubuntu/Debian)
tesseract /path/to/image.png stdout -l chi_sim+eng+jpn
| Language | Code |
|---|---|
| Simplified Chinese | chi_sim |
| Traditional Chinese | chi_tra |
| English | eng |
| Japanese | jpn |
| Korean | kor |
| Chinese + English | chi_sim+eng |
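As the last row suggests, the value passed to `-l` is just the individual codes joined with `+`. When the set of languages comes from a script or a config value, a small helper can build that string. This is a minimal sketch; `join_langs` is a hypothetical name introduced here for illustration.

```shell
# join_langs (hypothetical helper): join its arguments with "+".
# The body runs in a subshell so the IFS change does not leak to the caller.
join_langs() (
  IFS='+'
  printf '%s\n' "$*"
)
```

A call such as `tesseract image.png stdout -l "$(join_langs chi_sim eng)"` would then be equivalent to passing `-l chi_sim+eng` by hand.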
# OCR with Chinese support
tesseract image.jpg stdout -l chi_sim
# OCR with mixed Chinese and English
tesseract image.png stdout -l chi_sim+eng
# Save to file instead of stdout
tesseract document.png result -l chi_sim+eng
# Creates result.txt
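Because Tesseract appends `.txt` to the output base, a batch run usually wants the image path with its extension stripped as the base. The following is a rough sketch under that assumption; `ocr_base` and `ocr_all` are hypothetical helper names, and the extension stripping assumes every image file name actually has an extension.

```shell
# ocr_base (hypothetical helper): strip the final extension from a path,
# giving an output base such as "scans/page1" for "scans/page1.png".
ocr_base() {
  printf '%s\n' "${1%.*}"
}

# ocr_all (hypothetical helper): first argument is the -l language string,
# remaining arguments are image paths. Writes <image base>.txt next to each image.
# The body runs in a subshell so its variables do not leak to the caller.
ocr_all() (
  langs=$1
  shift
  for img in "$@"; do
    tesseract "$img" "$(ocr_base "$img")" -l "$langs"
  done
)
```

With these defined, `ocr_all chi_sim+eng scans/*.png` would run one `tesseract` call per image and leave a matching `.txt` file beside each one.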