macOS GUI Automation

Automate macOS GUI via screen capture and OCR text reading, mouse and keyboard control, window management, and app launching using cliclick, screencapture, t...
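Automates the macOS graphical interface through screenshots, OCR text recognition, mouse and keyboard control, window management, and app launching.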
dhdragon
Developer Tools · clawhub · v1.0.0 · 1 version · 99866.9 · Key: not required
Stars: 1 · Downloads: 1,481 · Installs: 48 · Versions: 1 · #latest

Overview

macOS GUI Automation Skill

Capabilities

  • Screen Reading: Capture screenshots and OCR text
  • Mouse Control: Click, double-click, right-click, move, drag
  • Keyboard Input: Type text, press keys, shortcuts
  • Window Management: List windows, focus, resize, close
  • App Control: Launch, quit, bring to front

Tools Available

cliclick (Mouse/Keyboard)

# Click at coordinates
cliclick c:x,y

# Double click
cliclick dc:x,y

# Right click
cliclick rc:x,y

# Move mouse
cliclick m:x,y

# Drag from (x1,y1) to (x2,y2): press down at the start, release at the end
cliclick dd:x1,y1 du:x2,y2

# Type text
cliclick t:"text"

# Press key (Enter, Tab, etc.)
cliclick kp:enter

screencapture + tesseract (Screen Reading)

# Capture region to file
screencapture -R x,y,w,h /tmp/screen.png

# Capture full screen
screencapture /tmp/screen.png

# OCR from image
tesseract /tmp/screen.png stdout

# OCR with Chinese support
tesseract /tmp/screen.png stdout -l chi_sim+eng
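The capture and OCR commands above can be combined into one helper. This is a minimal sketch, not part of the skill itself: the function name ocr_region and the temp-file naming are assumptions made for illustration, and it presumes screencapture and tesseract are on PATH.

```shell
# ocr_region X Y W H [LANG]: capture a screen rectangle and print its OCR text.
# LANG defaults to "eng"; pass e.g. "chi_sim+eng" for Chinese plus English.
ocr_region() {
  local img rc
  # Hypothetical temp-file name; the PID keeps parallel runs apart.
  img="${TMPDIR:-/tmp}/ocr_region_$$.png"
  # -x silences the capture sound, -R selects the rectangle x,y,w,h.
  screencapture -x -R "$1,$2,$3,$4" "$img" || return 1
  tesseract "$img" stdout -l "${5:-eng}" 2>/dev/null
  rc=$?
  rm -f "$img"
  return $rc
}

# Usage (macOS):
# ocr_region 100 100 800 600 chi_sim+eng
```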

osascript (AppleScript - Window/App Control)

# List the names of all running processes (apps), not individual windows
osascript -e 'tell application "System Events" to get name of every process'

# Get window position/size as left, top, right, bottom
osascript -e 'tell application "Finder" to get bounds of front window'

# Click menu item (the item name must match the menu text exactly)
osascript -e 'tell application "System Events" to click menu item "Save" of menu "File" of menu bar 1 of process "TextEdit"'
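AppleScript reports window bounds as left, top, right, bottom, while screencapture -R expects x,y,w,h. A small converter bridges the two. The sketch below assumes the comma-separated form osascript prints (for example "0, 25, 800, 600"); bounds_to_region is a hypothetical helper name.

```shell
# bounds_to_region "LEFT, TOP, RIGHT, BOTTOM": print "x,y,w,h" for screencapture -R.
bounds_to_region() {
  # Strip spaces, split on commas, then turn the far corner into width/height.
  echo "$1" | tr -d ' ' | awk -F, 'NF == 4 {
    printf "%d,%d,%d,%d\n", $1, $2, $3 - $1, $4 - $2
  }'
}

# Usage (macOS): capture only the front Finder window.
# bounds=$(osascript -e 'tell application "Finder" to get bounds of front window')
# screencapture -x -R "$(bounds_to_region "$bounds")" /tmp/window.png
```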

Usage Patterns

Read Screen Text

# 1. Capture screen
screencapture -R 100,100,800,600 /tmp/region.png

# 2. OCR
tesseract /tmp/region.png stdout

Click Button at Position

cliclick c:500,300

Type in Field

# Click field first
cliclick c:400,200
# Then type
cliclick t:"hello world"
cliclick kp:enter
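cliclick accepts several commands in one invocation, so the three steps above can run as a single call. The wrapper below is a sketch under assumptions: click_and_type and the CLICLICK_DRY_RUN switch are hypothetical names introduced here, and the 100 ms delay passed to -w is an arbitrary choice.

```shell
# click_and_type X Y TEXT [KEY]: click a field, type TEXT, then press KEY (default: enter).
# Set CLICLICK_DRY_RUN=1 (hypothetical switch) to print the command instead of running it.
click_and_type() {
  local x="$1" y="$2" text="$3" key="${4:-enter}"
  if [ "${CLICLICK_DRY_RUN:-0}" = "1" ]; then
    printf 'cliclick -w 100 c:%s,%s t:%s kp:%s\n' "$x" "$y" "$text" "$key"
  else
    # -w 100 waits 100 ms after each event so the UI can keep up.
    cliclick -w 100 "c:$x,$y" "t:$text" "kp:$key"
  fi
}

# Usage:
# click_and_type 400 200 "hello world"
```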

Find and Click (OCR + Click)

# 1. Capture and OCR
screencapture /tmp/screen.png
text=$(tesseract /tmp/screen.png stdout)

# 2. Parse coordinates from the OCR result (plain-text output carries no positions; use tesseract's tsv or hocr output) or use image recognition

# 3. Click
cliclick c:x,y
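Step 2 can be sketched with tesseract's TSV output, which lists each recognized word with its left, top, width, and height in image pixels. The helper below is an illustration, not the skill's own method: find_word_center is a hypothetical name, it matches single words only, and the SCALE argument assumes the screenshot has more pixels than the screen has points (2 is common on Retina displays, 1 otherwise).

```shell
# find_word_center WORD [SCALE]: read tesseract TSV on stdin and print "x,y",
# the center of the first exact match for WORD, divided by SCALE.
# Exits non-zero when WORD is not found.
find_word_center() {
  awk -F'\t' -v word="$1" -v scale="${2:-1}" '
    NR > 1 && $12 == word {
      # Columns 7-10: left, top, width, height (image pixels); column 12: text.
      printf "%d,%d\n", ($7 + $9 / 2) / scale, ($8 + $10 / 2) / scale
      found = 1
      exit
    }
    END { exit !found }
  '
}

# Usage (macOS, assuming a 2x display):
# screencapture -x /tmp/screen.png
# pos=$(tesseract /tmp/screen.png stdout tsv | find_word_center "Save" 2) && cliclick "c:$pos"
```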

Limitations

  • Coordinates are absolute screen positions, so scripts break when the resolution, display scaling, or window layout changes
  • No built-in image recognition (add OpenCV or SikuliX for that)
  • OCR accuracy depends on screen DPI and font
  • Some apps may not be scriptable via AppleScript
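One way to soften the resolution dependence is to store positions as fractions of the screen and convert them at run time. This is only a sketch: rel_to_abs is a hypothetical helper, and the caller is assumed to supply the current screen width and height in the same units cliclick uses.

```shell
# rel_to_abs FX FY WIDTH HEIGHT: map fractions (0..1) of the screen to "x,y",
# rounded to the nearest whole coordinate.
rel_to_abs() {
  awk -v fx="$1" -v fy="$2" -v w="$3" -v h="$4" \
    'BEGIN { printf "%d,%d\n", fx * w + 0.5, fy * h + 0.5 }'
}

# Usage: click the center of a 1440x900 screen.
# cliclick "c:$(rel_to_abs 0.5 0.5 1440 900)"
```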

Security Notes

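  • Requires Accessibility permissions (and Screen Recording permission for screencapture) in System Settings
  • If permissions misbehave, run tccutil reset Accessibility, which clears Accessibility grants for all apps, then grant access again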
  • Some apps (browsers, secure apps) may block automation
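Before debugging permissions, it can help to rule out missing tools first. The check below is a minimal sketch; missing_tools is a hypothetical helper, and the tool list in the usage line simply mirrors the commands used in this document.

```shell
# missing_tools TOOL...: print each tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return $rc
}

# Usage:
# missing_tools cliclick screencapture tesseract osascript || echo "install the tools listed above"
```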

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 11:22 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
