This skill provides tools and procedures for automating interactions with the Linux desktop environment.
Use wmctrl to find the exact name of the window you want to control.
wmctrl -l
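When the window list is long, it can help to filter it by title. The sketch below assumes the usual four-column layout of `wmctrl -l` (window ID, desktop number, host name, title); `find_windows` is a hypothetical helper name and is not part of this skill's scripts.

```shell
# Hypothetical helper: filter `wmctrl -l` output (read from stdin) by a
# case-sensitive title substring and print "<window id><TAB><title>".
# Assumes the columns are: window ID, desktop number, host name, title.
find_windows() {
    pattern="$1"
    while read -r win_id desktop host title; do
        case "$title" in
            *"$pattern"*) printf '%s\t%s\n' "$win_id" "$title" ;;
        esac
    done
}
```

Typical use would be `wmctrl -l | find_windows "Firefox"`. Reading from stdin keeps the helper independent of a running X session, so it can also be tried on saved output.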
For apps supporting accessibility (GNOME apps, Electron apps with --force-renderer-accessibility), use the inspection script to find button names without taking screenshots.
python3 scripts/inspect_ui.py "<app_name>"
Use xdotool via the helper script for common actions.
# Activate window
./scripts/gui_action.sh activate "<window_name>"
# Click coordinates
./scripts/gui_action.sh click 500 500
# Type text
./scripts/gui_action.sh type "Hello World"
# Press a key
./scripts/gui_action.sh key "Return"
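For orientation, the four actions above map naturally onto plain `xdotool` commands. The function below is only a sketch of what such a dispatcher might look like; it is not the contents of `scripts/gui_action.sh`, and the `gui_action` function name and the `XDOTOOL` override variable are assumptions made for illustration.

```shell
# Hypothetical dispatcher, NOT the real scripts/gui_action.sh.
# Set XDOTOOL=echo for a dry run that prints the arguments instead of acting.
gui_action() {
    xdo="${XDOTOOL:-xdotool}"
    action="$1"
    [ "$#" -gt 0 ] && shift
    case "$action" in
        # --name takes a regular expression matched against window titles
        activate) "$xdo" search --name "$1" windowactivate ;;
        # move the pointer to X Y, then press mouse button 1 (left)
        click)    "$xdo" mousemove "$1" "$2" click 1 ;;
        # send the string as keystrokes to the focused window
        type)     "$xdo" type "$1" ;;
        # send a single key by keysym name, e.g. Return or Tab
        key)      "$xdo" key "$1" ;;
        *)
            echo "usage: gui_action {activate|click|type|key} ARGS" >&2
            return 2
            ;;
    esac
}
```

With `XDOTOOL=echo`, calling `gui_action click 500 500` prints the `xdotool` arguments that would be used, which is a cheap way to check a sequence before sending real input.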
1. Run wmctrl -l to find the exact window name.
2. Run scripts/inspect_ui.py to get the list of buttons and inputs.
3. Use xdotool key Tab and Return to navigate, or click if coordinates are known.

Many modern apps (VS Code, Discord, Cider, Chrome) need a flag to expose their UI tree. Quit the app, then relaunch it with the flag:
pkill <app>
nohup <app> --force-renderer-accessibility > /dev/null 2>&1 &
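The two commands above can be wrapped so the relaunch waits for the old process to exit first. This is a minimal sketch, assuming the app's executable name on PATH is also its process name; `relaunch_with_a11y` is a hypothetical function name, and the five-second wait is an arbitrary choice.

```shell
# Hypothetical wrapper around the pkill + nohup pair shown above.
# Assumes "$1" is both the executable name on PATH and the process name.
relaunch_with_a11y() {
    app="$1"
    if [ -z "$app" ] || ! command -v "$app" >/dev/null 2>&1; then
        echo "relaunch_with_a11y: '$app' not found in PATH" >&2
        return 1
    fi
    # Ask any running instance to quit; ignore "no process matched".
    pkill -x "$app" || true
    # Wait up to about 5 seconds for the old instance to exit.
    tries=0
    while pgrep -x "$app" >/dev/null 2>&1 && [ "$tries" -lt 5 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    # Start detached from the terminal with the accessibility flag.
    nohup "$app" --force-renderer-accessibility >/dev/null 2>&1 &
}
```

Checking `command -v` first avoids killing a running app that could not be restarted afterwards. Without `-f`, `pkill` and `pgrep` match on the process name rather than the full command line, so very long names may need a different match.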