Image OCR — Local AI PC (Windows · GLM-OCR · llama.cpp Vulkan)

Model: ggml-org/GLM-OCR-GGUF (Q8_0, HuggingFace / hf-mirror)

Inference: llama-server (llama.cpp Vulkan prebuilt, HTTP API)

SKILL_VERSION: 2.0.0

Directory Structure (auto-created or user-specified)

<OCR_DIR>\                        ← auto-selected drive or user-specified (e.g. C:\image-ocr or D:\image-ocr)
├── llama.cpp\                    ← llama-server.exe and related binaries
└── models\
    └── GLM-OCR-GGUF\
        ├── GLM-OCR-Q8_0.gguf        ← main model (~950 MB)
        └── mmproj-GLM-OCR-Q8_0.gguf ← vision projection layer (~484 MB, required)

Python helper scripts (located alongside this SKILL.md):

File	Purpose
------	---------
`_ocr_cfg.py`	Shared helper: resolves OCR_DIR from env var or `.ocr_dir` config file
`ocr_setup.py`	All-in-one setup: preflight checks, llama.cpp install, model download & verify
`ocr_run.py`	Run GLM-OCR inference on an image (via llama-server HTTP API)

> ## ⚠️ Before You Install — Security & Compliance Disclosure

> This skill will perform the following actions on your system. Review carefully before granting autonomous execution.

> | Action | Source | Risk level |

> |--------|--------|------------|

> | Download and extract llama-server.exe and related binaries | github.com/ggml-org/llama.cpp releases | Medium — executes downloaded binary |

> | Download model files (~1.5 GB total) | HuggingFace (huggingface.co) or ModelScope (modelscope.cn) | Low — large file download |

> | Automatically install Miniforge (if Python not found) | github.com/conda-forge/miniforge | Medium — silently installs Python to %USERPROFILE%\miniforge3 |

> | Create directories and write files to disk | Local filesystem only | Low |

> Credentials: The GLM-OCR model (ggml-org/GLM-OCR-GGUF) is publicly available and does not

> require authentication. If you use a private or gated HuggingFace model, you may need to set

> $env:HUGGINGFACE_TOKEN before running the download commands — this skill does not request or

> store any tokens automatically.

> Trust: Verify you trust the following sources before proceeding:

> - https://github.com/ggml-org/llama.cpp/releases

> - https://huggingface.co/ggml-org/GLM-OCR-GGUF

> - https://github.com/conda-forge/miniforge (only if Miniforge auto-install is triggered)

> If you prefer to control each step manually rather than granting autonomous execution,

> run the commands in this SKILL.md one by one yourself.

Cached Installation

> AI Assistant: After a successful first-time setup, fill in this section with the resolved paths.

> On every subsequent invocation, read this section first and attempt the Fast Path before running setup.

OCR_DIR       =
LLAMA_EXE     =
MODEL_GGUF    =
MMPROJ_GGUF   =
LAST_VERIFIED =

(Leave blank until first install. AI fills these in after INSTALL_DONE=1 is printed by ocr_setup.py.)

⚠️ AI Assistant Instructions

0. Fast Path (try this first on every invocation)

Before running any setup steps, check the ## Cached Installation section above.

If OCR_DIR is filled in → attempt the fast path:

```

python ocr_run.py ""

```

Fast path succeeds (stdout contains recognized text) → return the result immediately. No setup steps needed.
Fast path fails (file not found, server error, model missing, etc.) → fall through to the full setup flow below and then retry inference.

Full Setup Flow (run only when Fast Path fails or on first use)

Execute one command at a time; wait for output before proceeding.
Stop immediately on error; refer to the Troubleshooting table at the end.
Wrap all paths in double quotes.
is the absolute working directory path, determined after Pre-flight.
Single goal: Recognize image content and return text results.

Execution flow (do not skip steps):

Pre-flight: python ocr_setup.py --check     → LLAMA_STATUS + MODEL_STATUS
Step 1:     python ocr_setup.py --llama     → only if LLAMA_STATUS=MISSING/OUTDATED
            (Check Python first — see Step 1 below)
Step 2:     python ocr_setup.py --download  → only if MODEL_STATUS=MISSING
Step 3:     python ocr_run.py <image>       → inference → return result
Post-setup: Update "Cached Installation" section above with resolved paths

Progress reporting: Announce each step before starting, e.g.: 🔍 Pre-flight: Checking environment…

After successful setup: Update the ## Cached Installation section in this SKILL.md with:

OCR_DIR (from ocr_setup.py output line OCR_DIR=...)
LLAMA_EXE = \llama.cpp\llama-server.exe
MODEL_GGUF = \models\GLM-OCR-GGUF\GLM-OCR-Q8_0.gguf
MMPROJ_GGUF = \models\GLM-OCR-GGUF\mmproj-GLM-OCR-Q8_0.gguf
LAST_VERIFIED = today's date

⚠️ Using Bash.exe on Windows (Git Bash / WSL)

If the AI assistant's terminal is bash.exe rather than PowerShell or CMD, environment variables

set via $env:OCR_DIR (PowerShell) or set OCR_DIR= (CMD) will not be visible to Python scripts

launched from bash. This was the most common cause of "cannot find llama.cpp / models" errors.

How it's solved now

All scripts use _ocr_cfg.py which resolves OCR_DIR via two methods (in order):

Environment variable OCR_DIR — works if properly export-ed in bash
.ocr_dir config file — written automatically by ocr_setup.py --workdir alongside the scripts

**As long as ocr_setup.py has been run once (from any shell), all subsequent scripts

will find the OCR directory automatically**, regardless of which shell is used.

Shell compatibility summary

Shell	Needs `export OCR_DIR=...`?	Works after first run of `ocr_setup.py`?
-------	----------------------------	------------------------------------------
PowerShell / CMD	No (auto-resolved)	✅ Yes
Git Bash	No (reads `.ocr_dir` file)	✅ Yes
WSL bash	No (reads `.ocr_dir` file)	✅ Yes

> Note: If running from WSL, subprocess.Popen does not support creationflags.

> ocr_run.py handles this automatically (_CREATE_NO_WINDOW is 0 when sys.platform != "win32").

Pre-flight: Check Environment

> 🔍 Pre-flight: Checking working directory, llama.cpp, and model files…

python ocr_setup.py --check

Output	Action
--------	--------
Both `READY`	✅ Skip to Step 3
`LLAMA_STATUS=MISSING/OUTDATED`	⬇️ Execute Step 1
`MODEL_STATUS=MISSING`	⬇️ Execute Step 2

Announce: ✅ Environment check complete. Execute steps as needed.

Step 1: Check Python + Install / Update llama.cpp Vulkan

> 🐍 Step 1a: Checking Python availability…

Check Python

> Note: This step requires an existing Python to run scripts. Use the snippet below to locate Python,

> record the path as PYTHON_EXE, and use that Python for all subsequent steps.

Option A — PowerShell (Windows default)

$customPythonExe = ""  # Optional: specify a path manually, e.g. "C:\Python311\python.exe"
$found = $null
if ($customPythonExe -and (Test-Path $customPythonExe)) { $found = $customPythonExe }
if (-not $found) {
    foreach ($cmd in @("python","python3","py")) {
        if (Get-Command $cmd -ErrorAction SilentlyContinue) { $found = (Get-Command $cmd).Source; break }
    }
}
if (-not $found) {
    foreach ($p in @(
        "$env:USERPROFILE\miniforge3\python.exe","$env:USERPROFILE\miniconda3\python.exe",
        "$env:USERPROFILE\anaconda3\python.exe")) {
        if (Test-Path $p) { $found = $p; break }
    }
}
if ($found) { $env:PYTHON_EXE = $found; Write-Host "PYTHON_STATUS=OK"; Write-Host "PYTHON_EXE=$found" }
else { Write-Host "PYTHON_STATUS=MISSING" }

Option B — Bash / Git Bash / WSL (fallback if PowerShell is unavailable)

CUSTOM_PYTHON_EXE=""  # Optional: e.g. "/c/Python311/python.exe"
found=""
if [ -n "$CUSTOM_PYTHON_EXE" ] && [ -x "$CUSTOM_PYTHON_EXE" ]; then found="$CUSTOM_PYTHON_EXE"; fi
if [ -z "$found" ]; then
    for cmd in python python3 py; do
        if command -v "$cmd" >/dev/null 2>&1; then found=$(command -v "$cmd"); break; fi
    done
fi
if [ -z "$found" ]; then
    for p in "$HOME/miniforge3/bin/python" "$HOME/miniconda3/bin/python" "$HOME/anaconda3/bin/python"; do
        if [ -x "$p" ]; then found="$p"; break; fi
    done
fi
if [ -n "$found" ]; then
    export PYTHON_EXE="$found"
    echo "PYTHON_STATUS=OK"
    echo "PYTHON_EXE=$found"
else
    echo "PYTHON_STATUS=MISSING"
fi

If Python is not found (PYTHON_STATUS=MISSING), install Miniforge:

> Consent required: Miniforge will be silently installed to %USERPROFILE%\miniforge3.

> This installs a Python runtime and conda/pip toolchain. No admin rights are needed.

> Source: github.com/conda-forge/miniforge. Confirm with the user before proceeding.

Option A — PowerShell

$mf = "$env:TEMP\Miniforge3-Windows-x86_64.exe"
Invoke-WebRequest -Uri "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Windows-x86_64.exe" -OutFile $mf
Start-Process $mf -ArgumentList "/S /D=$env:USERPROFILE\miniforge3" -Wait
Remove-Item $mf
$env:PYTHON_EXE = "$env:USERPROFILE\miniforge3\python.exe"
& $env:PYTHON_EXE --version
Write-Host "PYTHON_STATUS=OK"

Option B — Bash / Git Bash / WSL

MF_URL="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
MF_INSTALLER="/tmp/Miniforge3-install.sh"
curl -fsSL "$MF_URL" -o "$MF_INSTALLER"
bash "$MF_INSTALLER" -b -p "$HOME/miniforge3"
rm -f "$MF_INSTALLER"
export PYTHON_EXE="$HOME/miniforge3/bin/python"
"$PYTHON_EXE" --version
echo "PYTHON_STATUS=OK"

Output	Action
--------	--------
`PYTHON_STATUS=OK`	✅ Continue to Step 1b
`PYTHON_STATUS=MISSING` → Miniforge installed	✅ Continue to Step 1b
Miniforge download error	⛔ Check network, or install Python manually

Announce: ✅ Python ready. Checking llama.cpp…

> ⬇️ Step 1b: Installing / updating llama.cpp Vulkan… (only when LLAMA_STATUS=MISSING/OUTDATED)

> Consent required: Before proceeding, inform the user:

> - A ZIP (~50–100 MB) will be downloaded from github.com/ggml-org/llama.cpp/releases

> - It will be extracted to \llama.cpp\ and the original ZIP will be deleted

> - llama-server.exe will be placed on disk and launched as a local HTTP server by ocr_run.py

> Ask the user to confirm before running the download command.

Edit the LLAMA_TAG variable at the top of ocr_setup.py if a newer release is available.

python ocr_setup.py --llama

To install a specific tag:

python ocr_setup.py --llama --tag b9000

Output	Action
--------	--------
`LLAMA_INSTALL=DONE`	✅ Continue to Step 2
Download error	⛔ Check network, or manually download from browser and extract to `\llama.cpp\`

Announce: ✅ llama.cpp installed. Continue to Step 2 to download models.

Step 2: Download GLM-OCR Models

> 📦 Step 2: Downloading GLM-OCR models… (only when MODEL_STATUS=MISSING)

First-time Download Notice (required reading when MODEL_STATUS=MISSING)

Announce the following to the user, then ask whether to proceed:

📥 First-time model download is approximately 1.5 GB
   (GLM-OCR-Q8_0.gguf ~950 MB + mmproj ~484 MB).
   Estimated download time:
   • 100 Mbps connection: ~2 minutes
   •  50 Mbps connection: ~4 minutes
   •  10 Mbps connection: ~20 minutes

   Downloads support resumption — if interrupted, re-running this step
   will automatically continue from where it left off.

   ✅ Ready — start automatic download
   📂 I prefer to download manually — skip automatic download

User chooses automatic download → continue with download commands below
User chooses manual download → jump to the "Manual Download Fallback" section

Check Disk Space

python ocr_setup.py --disk

Output	Action
--------	--------
`DISK_STATUS=OK`	✅ Continue to Download Models
`DISK_STATUS=LOW`	⚠️ Ask user to free space before continuing

Download Models

Option A: HuggingFace (recommended)

To use the HF mirror (China), uncomment the HF_ENDPOINT line inside ocr_setup.py.

python ocr_setup.py --download hf

Option B: ModelScope (alternative for users in China)

python ocr_setup.py --download ms

Verify:

python ocr_setup.py --verify

Output	Action
--------	--------
`MODEL_DOWNLOAD=DONE`	✅ Continue to Step 3
Timeout / repeated failure	⚠️ Direct user to "Manual Download Fallback", or switch Option A / B and retry

Announce: ✅ Model download complete.

Manual Download Fallback

If automatic download repeatedly fails, guide the user to download manually:

⚠️ Automatic download failed. Please manually download the following two files:

1. GLM-OCR-Q8_0.gguf (~950 MB)
   HuggingFace: https://huggingface.co/ggml-org/GLM-OCR-GGUF/resolve/main/GLM-OCR-Q8_0.gguf
   HF Mirror:   https://hf-mirror.com/ggml-org/GLM-OCR-GGUF/resolve/main/GLM-OCR-Q8_0.gguf
   ModelScope:  https://modelscope.cn/models/ggml-org/GLM-OCR-GGUF/resolve/master/GLM-OCR-Q8_0.gguf

2. mmproj-GLM-OCR-Q8_0.gguf (~484 MB)
   HuggingFace: https://huggingface.co/ggml-org/GLM-OCR-GGUF/resolve/main/mmproj-GLM-OCR-Q8_0.gguf
   HF Mirror:   https://hf-mirror.com/ggml-org/GLM-OCR-GGUF/resolve/main/mmproj-GLM-OCR-Q8_0.gguf
   ModelScope:  https://modelscope.cn/models/ggml-org/GLM-OCR-GGUF/resolve/master/mmproj-GLM-OCR-Q8_0.gguf

Once downloaded, place both files into:
   <OCR_DIR>\models\GLM-OCR-GGUF\

Then run: python ocr_setup.py --verify

Step 3: Run Inference

> 🔍 Step 3: Running GLM-OCR recognition…

Determine Input Source

Situation	Action
-----------	--------
User message contains a local file path (e.g. `C:\Users\...\xxx.png`)	⬇️ Case A
User uploaded an image via the interface; AI tool provides a temp path	⬇️ Case B
Neither	⛔ Ask user to provide a local file path or upload an image

Case A: User Provides a Local File Path

python ocr_run.py "<file path extracted from user message>"

Case B: User Uploaded an Image via the Interface

python ocr_run.py "<temporary image path provided by the AI tool>"

Success criteria: stdout contains the recognized text content.

Format Output

Once the recognized text is obtained, process it according to the user's intent:

Scenario	Handling
----------	----------
General text extraction	Output the recognized text as-is, preserving original layout
Invoice / receipt	Extract structured fields from the text; output as JSON + human-readable format
Table	Reformat the recognized text as a Markdown table
Business card	Extract name, title, company, phone, email, address; output as JSON
ID / certificate	Output structured by original layout
Screenshot / document	Organize output by paragraph
User-defined	Process according to the user's stated requirements

Completion announcement:

✅ Recognition complete!
Let me know if you'd like to re-process, change the output format, or export to a file.

Situation	Handling
-----------	----------
`ERROR: File not found`	File path does not exist — ask user to verify the path
Empty / garbled output	Low image quality — ask user to retake or rescan
Blurry / low-resolution image	Ask user to retake or zoom in before retrying
No text detected	Inform user that no recognizable text was found in the image

Troubleshooting

Error	Cause	Solution
-------	-------	----------
`PYTHON_STATUS=MISSING`	Python not installed	Step 1 auto-installs Miniforge; confirm consent and retry
`llama-server` not found	llama-server.exe path not set correctly	Run `python ocr_setup.py --check` to verify path
Server failed to start	Wrong binary version, Vulkan error, or port conflict	Run `python ocr_setup.py --llama` to reinstall latest llama.cpp; check Vulkan drivers
`ggml_vulkan: no devices found`	Vulkan driver not installed	Update GPU driver
`error: unable to open model`	Incorrect model path	Run `python ocr_setup.py --check` to verify path
`MODEL_DOWNLOAD=` no output	Download interrupted	Switch between `--download hf` / `--download ms`, or configure proxy
Garbled / blank output	Low image quality	Improve image quality
VRAM insufficient / crash	Not enough GPU memory	Lower `-ngl` value, or use `--device none`
Fast path fails unexpectedly	Stale cached paths in SKILL.md	Clear the `## Cached Installation` block, re-run full setup flow

References

llama.cpp Releases: https://github.com/ggml-org/llama.cpp/releases
GLM-OCR GGUF: https://huggingface.co/ggml-org/GLM-OCR-GGUF

版本历史

共 3 个版本

v2.0.0 当前

2026-05-26 22:50 安全安全
v1.0.4

2026-05-21 12:40 安全安全
v1.0.3

2026-05-02 00:33 安全安全

Local GLM OCR with llama.cpp on AIPC(no API Key)

概述