Extract text from images with Tesseract.js. Recognition runs 100% locally and needs no API key (language data is downloaded once, on first use). Supports Chinese (Simplified and Traditional) and English.
Usage:
node {baseDir}/scripts/ocr.js /path/to/image.jpg
node {baseDir}/scripts/ocr.js /path/to/image.png --lang chi_sim
node {baseDir}/scripts/ocr.js /path/to/image.jpg --lang chi_tra+eng
Options:
--lang <codes>: Language codes (default: chi_sim+eng)
    chi_sim  Simplified Chinese
    chi_tra  Traditional Chinese
    eng      English
    Join several codes with +, e.g. chi_sim+eng
--json: Output as JSON instead of plain text

Examples:
# Recognize Chinese screenshot
node {baseDir}/scripts/ocr.js screenshot.png
# Recognize English document
node {baseDir}/scripts/ocr.js document.jpg --lang eng
# Mixed Chinese + English
node {baseDir}/scripts/ocr.js mixed.png --lang chi_sim+eng
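The options above could be parsed roughly as follows. This is a minimal, hypothetical sketch of a parser for the documented flags (one positional image path, `--lang` defaulting to `chi_sim+eng`, and `--json`). It is not the actual contents of `ocr.js`; the function name `parseOcrArgs` is made up, and restricting codes to the three listed in this README is a choice of the sketch, since Tesseract itself supports many more languages.

```javascript
// Hypothetical sketch: NOT the real scripts/ocr.js, just one way to parse
// the options documented above.
const SUPPORTED_LANGS = new Set(["chi_sim", "chi_tra", "eng"]); // codes listed in this README (assumption)

function parseOcrArgs(argv) {
  const opts = { image: null, lang: "chi_sim+eng", json: false }; // documented default
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--json") {
      opts.json = true;
    } else if (arg === "--lang") {
      const value = argv[++i]; // consume the next argument as the value
      if (value === undefined || value.startsWith("--")) {
        throw new Error("--lang requires a value");
      }
      opts.lang = value;
    } else if (arg.startsWith("--")) {
      throw new Error(`Unknown option: ${arg}`);
    } else if (opts.image === null) {
      opts.image = arg; // first positional argument is the image path
    } else {
      throw new Error(`Unexpected argument: ${arg}`);
    }
  }
  if (opts.image === null) throw new Error("Missing image path");

  // "+" joins several languages, e.g. "chi_tra+eng" -> ["chi_tra", "eng"]
  const langs = opts.lang.split("+").filter(Boolean);
  if (langs.length === 0) throw new Error("No language codes given");
  for (const code of langs) {
    if (!SUPPORTED_LANGS.has(code)) throw new Error(`Unsupported language code: ${code}`);
  }
  return { ...opts, langs };
}
```

Called as `parseOcrArgs(process.argv.slice(2))` for `node ocr.js mixed.png --lang chi_sim+eng`, this sketch returns `{ image: "mixed.png", lang: "chi_sim+eng", json: false, langs: ["chi_sim", "eng"] }`, and the `langs` array is what would be handed to the OCR engine.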
Language model files (.traineddata.gz) are downloaded automatically on first use and stored in:
{baseDir}/scripts/tessdata/
- chi_sim.traineddata.gz (Simplified Chinese)
- eng.traineddata.gz (English)
To manually download or update language data:
cd {baseDir}/scripts/tessdata
curl -O https://cdn.jsdelivr.net/npm/@tesseract.js-data/chi_sim/4.0.0_best_int/chi_sim.traineddata.gz
curl -O https://cdn.jsdelivr.net/npm/@tesseract.js-data/eng/4.0.0_best_int/eng.traineddata.gz
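To see which files a given `--lang` value still needs before running the `curl` commands above, a small helper like the one below could compare the requested codes against the `tessdata` directory. This is a hypothetical sketch: `traineddataUrl` and `missingLanguageFiles` are illustrative names, and it assumes the same `@tesseract.js-data/<code>/4.0.0_best_int/` URL pattern shown above also applies to other codes such as `chi_tra`.

```javascript
// Hypothetical helper: reports which language files a --lang value needs,
// which are missing from a tessdata directory, and where to fetch them.
// The URL pattern mirrors the curl commands above; that it holds for codes
// other than chi_sim and eng is an assumption.
const fs = require("node:fs");
const path = require("node:path");

function traineddataUrl(code) {
  return `https://cdn.jsdelivr.net/npm/@tesseract.js-data/${code}/4.0.0_best_int/${code}.traineddata.gz`;
}

function missingLanguageFiles(lang, tessdataDir) {
  return lang
    .split("+") // "chi_sim+eng" -> ["chi_sim", "eng"]
    .filter(Boolean)
    .map((code) => ({
      code,
      file: path.join(tessdataDir, `${code}.traineddata.gz`),
      url: traineddataUrl(code),
    }))
    .filter((entry) => !fs.existsSync(entry.file)); // keep only files not on disk yet
}
```

For example, printing `curl -o ${m.file} ${m.url}` for each entry of `missingLanguageFiles("chi_tra+eng", "scripts/tessdata")` yields download commands only for the files that are not already present.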