UI Element Ops

Parse UI screenshots into structured element JSON (type, OCR text, bbox) and operate desktop UI from the parsed elements. Use it when you need to detect or locate UI elements.
murongg
Productivity clawhub v1.0.2 1 version 99885.5 Key: not required
★ 0 stars 📥 872 downloads 💾 29 installs #latest

Overview

UI Element Ops

Parse one or more screenshots into machine-readable JSON in which each element carries:

  • type (normalized UI element type)
  • bbox_px and bbox_norm
  • text (OCR/caption content when available)
  • clickable flag

The skill also provides:

  • optional overlay image with labeled boxes
  • desktop actions via scripts/operate_ui.py (click/type/key/hotkey/screenshot)
  • element query and orchestration via scripts/operate_ui.py (find, wait)
  • coordinate calibration profile for multi-display/DPI/window offset (calibrate)
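The coordinate calibration item can be pictured as a scale-plus-offset mapping from screenshot pixels to real screen coordinates. The sketch below is only an illustration under that assumption: the `Calibration` class, `make_calibration`, and `to_screen` are hypothetical names, and the profile that scripts/operate_ui.py actually stores may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical profile: per-axis scale plus a window/display offset."""
    scale_x: float
    scale_y: float
    offset_x: float = 0.0
    offset_y: float = 0.0


def make_calibration(parsed_size, actual_size, offset=(0.0, 0.0)):
    """Derive scale factors from the parsed screenshot size and the actual screen size."""
    parsed_w, parsed_h = parsed_size
    actual_w, actual_h = actual_size
    if parsed_w <= 0 or parsed_h <= 0:
        raise ValueError("parsed_size must be positive")
    return Calibration(actual_w / parsed_w, actual_h / parsed_h, offset[0], offset[1])


def to_screen(point, cal):
    """Map a point in screenshot pixels to screen coordinates."""
    x, y = point
    return (round(x * cal.scale_x + cal.offset_x), round(y * cal.scale_y + cal.offset_y))


# Example: a 2880x1800 capture of a screen whose logical size is 1440x900.
cal = make_calibration((2880, 1800), (1440, 900))
print(to_screen((1000, 500), cal))  # (500, 250)
```

A second display or an offset window would show up as a non-zero offset in the same mapping.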

Quick Start

  1. Prepare runtime once per machine:
     skills/ui-element-ops/scripts/bootstrap_omniparser_env.sh "$PWD"
  2. Parse one screenshot:
     skills/ui-element-ops/scripts/run_parse_ui.sh /abs/path/to/1.jpeg
  3. Read outputs:
    • .elements.json
    • .overlay.png
  4. One-step capture + parse with randomized names:
     skills/ui-element-ops/scripts/capture_and_parse.sh

Workflow

  1. Confirm screenshot path and desired output path.
  2. Run scripts/bootstrap_omniparser_env.sh when .venv or OmniParser weights are missing.
  3. Run scripts/run_parse_ui.sh for standard parsing.
  4. Report absolute output paths and summary counts: total, clickable, by_type.
  5. Call out obvious quality risks for tiny text or dense icon layouts.
  6. Execute desktop actions when requested:
    • list elements: python3 skills/ui-element-ops/scripts/operate_ui.py list --elements
    • find elements: python3 skills/ui-element-ops/scripts/operate_ui.py find --elements --type button --text-contains login
    • wait for appear/disappear: python3 skills/ui-element-ops/scripts/operate_ui.py wait --elements --state appear --text-contains continue
    • click by id: python3 skills/ui-element-ops/scripts/operate_ui.py click --elements --id e_0001
    • screenshot: python3 skills/ui-element-ops/scripts/operate_ui.py screenshot (defaults to user tmp dir)
    • calibrate coordinates: python3 skills/ui-element-ops/scripts/operate_ui.py calibrate --parsed-size --actual-size
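The summary counts in step 4 and the find and click actions in step 6 can be sketched in a few lines once the elements list is loaded. This is an illustration and not the code in scripts/operate_ui.py. It assumes that bbox_px is `[x1, y1, x2, y2]`, that type matching is exact, and that `--text-contains` is a case-insensitive substring test. The sample elements are made up.

```python
from collections import Counter


def summarize(elements):
    """Return total, clickable, and per-type counts for a list of element dicts."""
    return {
        "total": len(elements),
        "clickable": sum(1 for e in elements if e.get("clickable")),
        "by_type": dict(Counter(e.get("type", "unknown") for e in elements)),
    }


def find(elements, type_=None, text_contains=None):
    """Filter by exact type and case-insensitive text substring (assumed semantics)."""
    needle = text_contains.lower() if text_contains else None
    hits = []
    for e in elements:
        if type_ is not None and e.get("type") != type_:
            continue
        if needle is not None and needle not in (e.get("text") or "").lower():
            continue
        hits.append(e)
    return hits


def center(bbox_px):
    """Center of an [x1, y1, x2, y2] box, a natural click target."""
    x1, y1, x2, y2 = bbox_px
    return ((x1 + x2) // 2, (y1 + y2) // 2)


# Made-up sample data for illustration.
elements = [
    {"id": "e_0001", "type": "button", "text": "Login", "clickable": True,
     "bbox_px": [100, 200, 220, 240]},
    {"id": "e_0002", "type": "text", "text": "Welcome back", "clickable": False,
     "bbox_px": [100, 120, 400, 160]},
    {"id": "e_0003", "type": "button", "text": "Continue", "clickable": True,
     "bbox_px": [240, 200, 380, 240]},
]
print(summarize(elements))
match = find(elements, type_="button", text_contains="login")[0]
print(match["id"], center(match["bbox_px"]))  # e_0001 (160, 220)
```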

Tunables

  • Edit type mapping keywords in references/type_rules.example.json.
  • Use advanced parser args via scripts/parse_ui.py --help.
  • Use --use-paddleocr only when paddleocr/paddlepaddle are installed.
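To give a feel for what keyword-based type mapping might do, here is a minimal sketch. The rule layout (normalized type mapped to a keyword list), the `RULES` contents, and `normalize_type` are all hypothetical; the real structure of references/type_rules.example.json is not shown in this document and may differ.

```python
# Hypothetical rule format: normalized type -> keywords. The real layout of
# references/type_rules.example.json may differ.
RULES = {
    "button": ["button", "btn", "submit"],
    "input": ["input", "textbox", "search field"],
    "icon": ["icon", "logo"],
}


def normalize_type(raw_label, rules=RULES, default="unknown"):
    """Return the first type whose keyword occurs in the raw label."""
    label = raw_label.lower()
    for ui_type, keywords in rules.items():
        if any(keyword in label for keyword in keywords):
            return ui_type
    return default


print(normalize_type("Submit BTN"))      # button
print(normalize_type("a search field"))  # input
print(normalize_type("divider"))         # unknown
```

Because the first matching rule wins, the order of entries matters in a scheme like this one.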

Outputs

  • Main JSON output:
    • schema_version, pipeline, image, counts, elements
    • each element has id, type, bbox_px, bbox_norm, text, clickable
  • Overlay PNG output:
    • same screenshot with labeled detection boxes
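A consumer of the main JSON output may want to check its shape before acting on it. The sketch below uses only the key names listed above. Everything else is an assumption made for illustration: the sample values, the `image` object with `width` and `height`, the `[x1, y1, x2, y2]` box order, and the rule that bbox_norm is bbox_px divided by the image size.

```python
import json

TOP_LEVEL_KEYS = {"schema_version", "pipeline", "image", "counts", "elements"}
ELEMENT_KEYS = {"id", "type", "bbox_px", "bbox_norm", "text", "clickable"}


def missing_keys(doc):
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = [f"missing top-level key: {k}" for k in sorted(TOP_LEVEL_KEYS - set(doc))]
    for i, element in enumerate(doc.get("elements", [])):
        for k in sorted(ELEMENT_KEYS - set(element)):
            problems.append(f"elements[{i}] missing key: {k}")
    return problems


def normalize_bbox(bbox_px, width, height):
    """Convert [x1, y1, x2, y2] pixels to fractions of the image size (assumed rule)."""
    x1, y1, x2, y2 = bbox_px
    return [round(x1 / width, 4), round(y1 / height, 4),
            round(x2 / width, 4), round(y2 / height, 4)]


# Illustrative document: only the key names come from the list above.
doc = {
    "schema_version": "1.0",
    "pipeline": "omniparser",
    "image": {"width": 1920, "height": 1080},
    "counts": {"total": 1, "clickable": 1, "by_type": {"button": 1}},
    "elements": [
        {"id": "e_0001", "type": "button", "bbox_px": [960, 540, 1152, 594],
         "text": "Login", "clickable": True},
    ],
}
for element in doc["elements"]:
    element["bbox_norm"] = normalize_bbox(
        element["bbox_px"], doc["image"]["width"], doc["image"]["height"])

print(missing_keys(doc))                             # []
print(json.dumps(doc["elements"][0]["bbox_norm"]))  # [0.5, 0.5, 0.6, 0.55]
```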

Failure Handling

  • Missing dependencies or weights: run bootstrap script again.
  • Permission/cache errors under $HOME: keep temporary caches under /tmp (handled by run script).
  • CPU-only machine: expect slower inference.
  • Performance note: parse/capture-and-parse commands are heavy; avoid very tight loops and reuse recent elements.json when possible.
  • Headless environment limitation:
    • usable without GUI: parse/list/find/wait/calibrate on existing files.
    • requires GUI session: click/click-xy/type/key/hotkey/screenshot/screen-info.
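The headless split above lends itself to a guard that runs before any command is dispatched. In this sketch the two command sets are taken from the list, while the display check is an assumption: it treats a Linux session as GUI-capable only when `DISPLAY` or `WAYLAND_DISPLAY` is set and assumes other platforms have a GUI. `has_gui_session` and `can_run` are hypothetical helpers, not part of the skill's scripts.

```python
import os
import sys

# Command split taken from the list above.
HEADLESS_OK = {"parse", "list", "find", "wait", "calibrate"}
NEEDS_GUI = {"click", "click-xy", "type", "key", "hotkey", "screenshot", "screen-info"}


def has_gui_session(env=None, platform=None):
    """Heuristic: on Linux, look for an X11 or Wayland display variable."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return True  # assumption: macOS/Windows sessions are treated as GUI-capable


def can_run(command, env=None, platform=None):
    """Return True when the command is expected to work in this environment."""
    if command in HEADLESS_OK:
        return True
    if command in NEEDS_GUI:
        return has_gui_session(env, platform)
    raise ValueError(f"unknown command: {command}")


print(can_run("find", env={}, platform="linux"))                  # True
print(can_run("click", env={}, platform="linux"))                 # False
print(can_run("click", env={"DISPLAY": ":0"}, platform="linux"))  # True
```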

Version History

1 version in total

  • v1.0.2 (current)
    2026-03-29 16:10 Safe / Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
