Overview

UI Element Ops

Parse one or more screenshots into machine-readable JSON. Each detected element carries:

type (normalized UI element type)
bbox_px and bbox_norm (bounding box in pixel and normalized coordinates)
text (OCR/caption content when available)
clickable flag

The skill also provides:

an optional overlay image with labeled boxes
desktop actions via scripts/operate_ui.py (click/type/key/hotkey/screenshot)
element query and orchestration via scripts/operate_ui.py (find, wait)
a coordinate calibration profile for multi-display/DPI/window offset (calibrate)
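
To make the coordinate fields concrete, here is a minimal Python sketch of how a pixel box might relate to a normalized box and to a calibrated click point. It is an illustration under assumptions, not the code in scripts/operate_ui.py: the (x1, y1, x2, y2) box layout, normalization by image width and height, and the Calibration, normalize_bbox, calibration_from_sizes and click_point names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    """Hypothetical profile: per-axis scale plus a window/display offset."""
    scale_x: float
    scale_y: float
    offset_x: float = 0.0
    offset_y: float = 0.0


def normalize_bbox(bbox_px, image_w, image_h):
    """Convert an (x1, y1, x2, y2) pixel box to fractions of the image size.

    Assumption: bbox_norm is bbox_px divided by the screenshot dimensions.
    """
    x1, y1, x2, y2 = bbox_px
    return (x1 / image_w, y1 / image_h, x2 / image_w, y2 / image_h)


def calibration_from_sizes(parsed_size, actual_size, offset=(0.0, 0.0)):
    """Derive scale factors from the parsed image size and the real screen size.

    On a HiDPI display the screenshot may be, say, twice as large as the
    logical screen, which yields a scale of 0.5 on each axis.
    """
    parsed_w, parsed_h = parsed_size
    actual_w, actual_h = actual_size
    return Calibration(actual_w / parsed_w, actual_h / parsed_h,
                       offset[0], offset[1])


def click_point(bbox_px, cal):
    """Map the centre of a pixel box to screen coordinates using the profile."""
    x1, y1, x2, y2 = bbox_px
    centre_x = (x1 + x2) / 2
    centre_y = (y1 + y2) / 2
    return (centre_x * cal.scale_x + cal.offset_x,
            centre_y * cal.scale_y + cal.offset_y)
```

For example, a 2880x1800 screenshot of a 1440x900 logical screen gives a scale of 0.5, so a box centred at (200, 100) in the image would be clicked at (100, 50) on screen, plus any window offset.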

skills/ui-element-ops/scripts/bootstrap_omniparser_env.sh "$PWD"

skills/ui-element-ops/scripts/run_parse_ui.sh /abs/path/to/1.jpeg

skills/ui-element-ops/scripts/capture_and_parse.sh

Confirm screenshot path and desired output path.
Run scripts/bootstrap_omniparser_env.sh when .venv or OmniParser weights are missing.
Run scripts/run_parse_ui.sh for standard parsing.
Report absolute output paths and summary counts: total, clickable, by_type.
Call out obvious quality risks for tiny text or dense icon layouts.
Execute desktop actions when requested:

list elements: python3 skills/ui-element-ops/scripts/operate_ui.py list --elements
find elements: python3 skills/ui-element-ops/scripts/operate_ui.py find --elements --type button --text-contains login
wait for appear/disappear: python3 skills/ui-element-ops/scripts/operate_ui.py wait --elements --state appear --text-contains continue
click by id: python3 skills/ui-element-ops/scripts/operate_ui.py click --elements --id e_0001
screenshot: python3 skills/ui-element-ops/scripts/operate_ui.py screenshot (defaults to user tmp dir)
calibrate coordinates: python3 skills/ui-element-ops/scripts/operate_ui.py calibrate --parsed-size --actual-size
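
The summary counts (total, clickable, by_type) and a find-style filter can be sketched in a few lines of Python over an already-loaded element list. This is a hedged illustration, not the script's implementation: the summarize and find helpers are hypothetical, the records are assumed to be dicts with id, type, text and clickable keys, and the case-insensitive substring match is an assumption about how --text-contains behaves.

```python
from collections import Counter


def summarize(elements):
    """Return total, clickable and per-type counts for a list of element dicts."""
    by_type = Counter(e.get("type", "unknown") for e in elements)
    return {
        "total": len(elements),
        "clickable": sum(1 for e in elements if e.get("clickable")),
        "by_type": dict(by_type),
    }


def find(elements, type_=None, text_contains=None):
    """Filter by exact type and by case-insensitive substring of the text field."""
    needle = text_contains.lower() if text_contains else None
    hits = []
    for e in elements:
        if type_ is not None and e.get("type") != type_:
            continue
        # A missing or null text field is treated as an empty string.
        if needle is not None and needle not in (e.get("text") or "").lower():
            continue
        hits.append(e)
    return hits


if __name__ == "__main__":
    # Hand-written sample records; real output comes from the parse step.
    sample = [
        {"id": "e_0001", "type": "button", "text": "Login", "clickable": True},
        {"id": "e_0002", "type": "text", "text": "Welcome back", "clickable": False},
    ]
    print(summarize(sample))
    print([e["id"] for e in find(sample, type_="button", text_contains="login")])
```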

Missing dependencies or weights: rerun the bootstrap script.
Permission/cache errors under $HOME: keep temporary caches under /tmp (the run script already does this).
CPU-only machine: expect slower inference.
Performance note: parse/capture-and-parse commands are heavy; avoid very tight loops and reuse recent elements.json when possible.
Headless environment limitation:
usable without GUI: parse/list/find/wait/calibrate on existing files.
requires GUI session: click/click-xy/type/key/hotkey/screenshot/screen-info.
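
The headless split and the advice to reuse a recent elements.json can be expressed as two small guards. The sketch below is an assumption-laden illustration and not part of the skill: has_gui_session, can_run and is_fresh are hypothetical helpers, the DISPLAY/WAYLAND_DISPLAY check only covers Linux (other platforms are optimistically treated as having a GUI), and the 30-second freshness window is an arbitrary example value.

```python
import os
import sys
import time

# Commands the text above lists as requiring a GUI session.
GUI_COMMANDS = {"click", "click-xy", "type", "key", "hotkey",
                "screenshot", "screen-info"}


def has_gui_session(env=None, platform=None):
    """Best-effort check for a graphical session.

    Assumption: on Linux a display server is advertised through DISPLAY or
    WAYLAND_DISPLAY; on other platforms this sketch simply returns True.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return True


def can_run(command, env=None, platform=None):
    """File-based commands always pass; GUI commands need a session."""
    return command not in GUI_COMMANDS or has_gui_session(env, platform)


def is_fresh(path, max_age_s=30.0, now=None):
    """True if path exists and was modified within max_age_s seconds."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # missing or unreadable file: a new parse is needed
    now = time.time() if now is None else now
    return (now - mtime) <= max_age_s
```

A caller could check is_fresh on the last elements.json before launching another parse, and check can_run before attempting a click on a machine reached over SSH.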
