
Screen Vision

macOS screen OCR & click automation via Apple Vision + ScreenCaptureKit. Capture any window or screen region, extract text with coordinates, find text, and c...
jackyun1024
Uncategorized · clawhub · v1.2.0 · 1 version · 99593.2 · Key: not required

Overview

screen-vision

macOS screen OCR & click automation CLI powered by Apple Vision + ScreenCaptureKit.

IMPORTANT: Auto-Setup (run first)

Before running any screen-vision command, check if the binary exists. If not, run the setup script:

command -v screen-vision &>/dev/null || bash "${CLAUDE_SKILL_DIR}/setup.sh"

This installs screen-vision (via Homebrew or source build) and cliclick automatically.

Requirements

  • macOS 14.0+ (Sonoma)
  • Screen Recording permission (System Settings > Privacy & Security > Screen Recording)
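
The macOS version requirement can be checked up front. A minimal sketch, assuming the version string is obtained from `sw_vers -productVersion`; the helper name `macos_version_ok` is hypothetical and not part of screen-vision. Screen Recording permission is not checked here.

```shell
# Hypothetical helper (not part of screen-vision): succeed when the given
# macOS version string, e.g. "14.2.1", has a major version of 14 or higher.
macos_version_ok() {
  major="${1%%.*}"                      # text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;            # reject empty or non-numeric input
  esac
  [ "$major" -ge 14 ]
}

# On a Mac, the real version can be supplied like this:
#   macos_version_ok "$(sw_vers -productVersion)" || echo "macOS 14.0+ required"
```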

Commands

| Command | Description | Output |
|---|---|---|
| screen-vision ocr [--app NAME] | Full OCR | JSON array [{text, x, y, w, h, confidence}] |
| screen-vision list [--app NAME] | OCR list | Human-readable text with coordinates |
| screen-vision find "text" [--app NAME] | Find text | JSON {text, x, y, found} |
| screen-vision has "text" [--app NAME] | Check text exists | Exit code 0 (found) / 1 (not found) |
| screen-vision tap "text" [--app NAME] [--retry N] | Find + click | JSON {text, x, y, tapped} |
| screen-vision wait "text" [--timeout SEC] | Poll until text appears | JSON {text, x, y, found} |
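
As a sketch of consuming the JSON that `find` returns, the function below prints `x,y` only when `found` is true. It assumes `jq` is installed and that `found` is a JSON boolean; `find_coords` is a hypothetical helper name, not a screen-vision command.

```shell
# Hypothetical helper: print "x,y" for the matched text.
# Prints nothing and returns a non-zero status when the text is not found.
# Assumes jq is installed and that `found` is a JSON boolean.
find_coords() {
  # All arguments are passed through, e.g.: find_coords "Submit" --app "MyApp"
  screen-vision find "$@" | jq -e -r 'select(.found == true) | "\(.x),\(.y)"'
}
```

The status comes from `jq -e`, which exits non-zero when the filter produces no output.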

Capture Priority

--region x,y,w,h  >  --app "AppName"  >  full screen (default)
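
The precedence can be illustrated with a small helper that reports which capture target wins when several are supplied. This only mirrors the documented order for illustration; `capture_target` is a hypothetical name and says nothing about how screen-vision implements the rule internally.

```shell
# Hypothetical helper: echo the capture target that takes priority.
# $1 = region ("x,y,w,h" or empty), $2 = app name (or empty).
capture_target() {
  region="$1"
  app="$2"
  if [ -n "$region" ]; then
    echo "region $region"      # --region beats everything else
  elif [ -n "$app" ]; then
    echo "app $app"            # --app is used only without a region
  else
    echo "full screen"         # default when neither is given
  fi
}
```

For example, `capture_target "0,0,800,600" "Safari"` reports the region, because `--region` outranks `--app`.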

Usage Patterns

OCR a specific app window

screen-vision list --app "Safari"

Check if text is visible (for conditionals)

screen-vision has "Submit" --app "MyApp" && echo "Found" || echo "Not found"
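
The `&&`/`||` one-liner is fine for two `echo` calls, but in general `A && B || C` also runs `C` when `B` itself fails. When the branches do real work, an explicit `if` on the exit code of `has` is safer. A sketch, with `report_visibility` as a hypothetical helper name:

```shell
# Hypothetical helper: branch on the exit code of `screen-vision has`
# (0 = found, 1 = not found, as listed in the Commands table).
report_visibility() {
  # Arguments are passed straight through, e.g. "Submit" --app "MyApp".
  if screen-vision has "$@"; then
    echo "Found"
  else
    echo "Not found"
  fi
}
```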

Click on text with retry

screen-vision tap "OK" --app "MyApp" --retry 3

Wait for text to appear (e.g. loading complete)

screen-vision wait "Complete" --timeout 30
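
A common pattern is to wait for a screen to finish loading and then click a control. The sketch below assumes `jq` is installed, that `wait` reports a timeout as `found: false` in its JSON, and that `tap` reports success as `tapped: true`. The exit codes of these two commands are not documented here, so the JSON fields are checked instead. `wait_then_tap` is a hypothetical name.

```shell
# Hypothetical helper: wait for one text, then click another.
# $1 = text to wait for, $2 = text to click, $3 = timeout in seconds (default 30).
wait_then_tap() {
  wait_text="$1"
  tap_text="$2"
  timeout="${3:-30}"

  # Stop early unless the awaited text was reported as found.
  screen-vision wait "$wait_text" --timeout "$timeout" \
    | jq -e '.found == true' >/dev/null || return 1

  # The function's status reflects whether the click was reported.
  screen-vision tap "$tap_text" --retry 3 \
    | jq -e '.tapped == true' >/dev/null
}
```

Usage: `wait_then_tap "Complete" "OK" 30 || echo "step failed"`.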

Full screen OCR as JSON (pipe to jq)

screen-vision ocr | jq '.[].text'
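
The JSON array can also be filtered before use. A sketch that keeps only text at or above a confidence threshold, assuming `jq` is installed and that `confidence` is a number between 0 and 1 (the scale is not stated here). `confident_text` is a hypothetical name; it reads the `ocr` output from standard input.

```shell
# Hypothetical helper: print the text of every OCR entry whose confidence
# is at least the given threshold (default 0.8). Reads JSON from stdin.
confident_text() {
  jq -r --argjson min "${1:-0.8}" '.[] | select(.confidence >= $min) | .text'
}

# Usage: screen-vision ocr --app "Safari" | confident_text 0.9
```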

$ARGUMENTS Handling

Parse the user's request to determine which command to run:

  • "화면에 뭐 있어?" / "what's on screen?" → screen-vision list
  • "~찾아" / "find ~" → screen-vision find "text"
  • "~클릭해" / "click ~" → screen-vision tap "text"
  • "~보여?" / "is ~ visible?" → screen-vision has "text"
  • "~뜰 때까지 기다려" / "wait for ~" → screen-vision wait "text"
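
The mapping above amounts to a small dispatch table. A sketch, assuming the request has already been reduced to an intent keyword and a target string; the intent names and the `dispatch_command` helper are hypothetical. It only prints the command line it would run.

```shell
# Hypothetical helper: map an intent keyword to a screen-vision command line.
# $1 = intent (list|find|click|visible|wait), $2 = target text (unused for list).
dispatch_command() {
  intent="$1"
  target="$2"
  case "$intent" in
    list)    printf '%s\n' 'screen-vision list' ;;
    find)    printf 'screen-vision find "%s"\n' "$target" ;;
    click)   printf 'screen-vision tap "%s"\n' "$target" ;;
    visible) printf 'screen-vision has "%s"\n' "$target" ;;
    wait)    printf 'screen-vision wait "%s"\n' "$target" ;;
    *)       printf 'unknown intent: %s\n' "$intent" >&2; return 1 ;;
  esac
}
```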

Version History

1 version in total

  • v1.2.0 (current)
    2026-04-30 22:35 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
