GUI Control

Control the GUI desktop on this machine using xdotool, scrot, and Firefox. Use when the user asks to open a browser, visit a website, take a screenshot, or click.
Author: vibes-me
Source: clawhub · v1.0.0 · Key: not required

Overview

GUI Control

Control the Linux desktop with a GUI display using shell tools.

Environment

  • Display: DISPLAY=:1 — ALWAYS prefix all GUI commands with this
  • This machine has a display — never say "I'm on a headless server"
  • Tools available: xdotool (keyboard/mouse), scrot (screenshots), firefox
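One way to make the display prefix hard to forget is a small wrapper function. This is only a minimal sketch: the name `gui` is hypothetical and not part of the original skill.

```shell
# Hypothetical helper: run any command with DISPLAY set to :1.
gui() {
    DISPLAY=:1 "$@"
}

# Usage (commands taken from the Quick Reference):
#   gui scrot /tmp/screenshot.png
#   gui xdotool key Return
```

The wrapper only sets the environment variable for the one command it runs, so it does not change the display of the surrounding shell.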

Quick Reference

Open Firefox with a URL

DISPLAY=:1 nohup firefox https://example.com > /dev/null 2>&1 &

Wait for page load before interacting:

sleep 5
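
A fixed sleep can be too short on a slow machine and wasteful on a fast one. As a rough sketch, the wait can be replaced by polling until the browser window exists. `wait_for` is a hypothetical helper, and the sketch assumes that `xdotool search` exits non-zero when nothing matches, which is worth checking on your system. A window appearing does not guarantee that the page has finished loading, so a short sleep afterwards may still be needed.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) runs out. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" > /dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        [ "$elapsed" -ge "$timeout_s" ] && return 1
        sleep 1
    done
    return 0
}

# Usage: wait up to 15 s for a visible window whose class matches "firefox".
#   DISPLAY=:1 wait_for 15 xdotool search --onlyvisible --class firefox
```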

Take a Screenshot

DISPLAY=:1 scrot /tmp/screenshot.png

Type Text into Active Window

DISPLAY=:1 xdotool type --delay 50 "Hello world"

Press a Key

DISPLAY=:1 xdotool key Return

Get Active Window Name

DISPLAY=:1 xdotool getactivewindow getwindowname

Close Firefox

DISPLAY=:1 pkill firefox

Workflow: Browse a Website and Interact

  1. Open Firefox with URL: DISPLAY=:1 nohup firefox https://example.com > /dev/null 2>&1 &
  2. Wait for load: sleep 5
  3. Take screenshot to verify: DISPLAY=:1 scrot /tmp/step.png
  4. Read screenshot to assess page state
  5. Interact using keyboard (preferred over mouse):
    • DISPLAY=:1 xdotool key Tab (move focus)
    • DISPLAY=:1 xdotool key Return (submit/confirm)
    • DISPLAY=:1 xdotool type --delay 50 "text" (type into focused field)
  6. After each action, screenshot to verify result
  7. Send screenshots to user with the message tool and media parameter
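
The sequence in steps 1 to 6 could be sketched as a single function. Everything named here is an assumption for illustration: `step`, `browse_and_submit`, the `DRY_RUN` switch, and the two screenshot file names. A single Tab will not reach the right field on every page, so treat the key presses as placeholders. Step 4 (reading the screenshot) and step 7 (sending it) stay with the agent.

```shell
# Hypothetical dry-run switch: with DRY_RUN=1 each command is printed
# instead of executed, so the sequence can be reviewed first.
step() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'DISPLAY=:1 %s\n' "$*"
    else
        DISPLAY=:1 "$@"
    fi
}

# Hypothetical function: open a URL, type text into the focused field,
# submit it, and take a screenshot before and after.
browse_and_submit() {
    url=$1
    text=$2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'DISPLAY=:1 nohup firefox %s > /dev/null 2>&1 &\n' "$url"
    else
        DISPLAY=:1 nohup firefox "$url" > /dev/null 2>&1 &   # step 1
        sleep 5                                              # step 2
    fi
    step scrot /tmp/step-before.png          # step 3: verify page state
    step xdotool key Tab                     # step 5: move focus
    step xdotool type --delay 50 "$text"     # step 5: type into field
    step xdotool key Return                  # step 5: submit
    step scrot /tmp/step-after.png           # step 6: verify result
}

# Preview without touching the desktop:
#   DRY_RUN=1 browse_and_submit https://example.com "Hello world"
```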

Tips

  • Prefer keyboard over mouse coordinates — Tab, Enter, arrow keys are more reliable than xdotool mousemove + click
  • YouTube shortcut: press / to focus the search bar
  • Always wait after page loads or actions before taking screenshots
  • Use nohup ... & for launching Firefox so it doesn't block the shell
  • Send screenshots to user using message(content="...", media=["/tmp/screenshot.png"])

Lessons Learned

Don't Over-Engineer

  • Start simple: xdotool + keyboard shortcuts work great. Don't jump to Selenium/Marionette unless absolutely needed.
  • One clean attempt > five messy ones — think before executing, don't retry the same failing approach.
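  • Don't open Firefox multiple times. Check whether it is already running first with pgrep -a firefox (ps aux | grep firefox also lists the grep process itself, which is easy to misread as a running browser)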
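
A possible way to apply the "check first" rule is sketched below. The helper names `firefox_running` and `open_url` are hypothetical. The sketch assumes the browser's process name contains `firefox` and that the `firefox` launcher is on the PATH; `--new-tab` is the Firefox command-line option that hands the URL to an already running instance.

```shell
# Succeeds if any process whose name contains "firefox" is running.
firefox_running() {
    pgrep firefox > /dev/null
}

# Hypothetical helper: reuse the running instance, or start a new one.
open_url() {
    if firefox_running; then
        # Ask the existing instance to open the URL in a new tab.
        DISPLAY=:1 firefox --new-tab "$1" > /dev/null 2>&1
    else
        # No instance yet: launch detached so the shell is not blocked.
        DISPLAY=:1 nohup firefox "$1" > /dev/null 2>&1 &
    fi
}

# Usage:
#   open_url https://example.com
```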

Keyboard Shortcuts by Website

  • YouTube: / focuses search bar, Tab navigates between elements, Return selects
  • General web: Ctrl+F opens find bar, Ctrl+L focuses address bar, Tab cycles focus
  • Don't use xdotool mousemove with hardcoded coordinates — they break on different resolutions and you might click the wrong element (e.g., address bar instead of YouTube search)
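
The shortcuts above can be combined into small keyboard-only helpers. This is a sketch under assumptions: `goto_url` and `youtube_search` are hypothetical names, the Firefox window is assumed to have focus, and `youtube_search` assumes a YouTube page is already loaded. `slash` is the X keysym name for the `/` key.

```shell
# Hypothetical helper: load a URL in the focused Firefox window using
# only keyboard input (Ctrl+L focuses the address bar).
goto_url() {
    DISPLAY=:1 xdotool key ctrl+l
    DISPLAY=:1 xdotool type --delay 50 "$1"
    DISPLAY=:1 xdotool key Return
}

# Hypothetical helper: search YouTube via its "/" shortcut, which targets
# the site's search bar rather than the browser's address bar.
youtube_search() {
    DISPLAY=:1 xdotool key slash
    DISPLAY=:1 xdotool type --delay 50 "$1"
    DISPLAY=:1 xdotool key Return
}

# Usage:
#   goto_url https://www.youtube.com
#   sleep 5
#   youtube_search "linux desktop automation"
```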

Common Mistakes to Avoid

  • Don't guess coordinates: xdotool mousemove 640 120 followed by a click will hit different things on different screens
  • Don't say "I'm on a headless server" — this machine HAS a display (DISPLAY=:1)
  • Don't use DISPLAY=:0 — the correct display is :1
  • Don't open multiple Firefox instances — reuse the existing one or close it first
  • Don't confuse the browser address bar with website search bars — use keyboard shortcuts to target the right element

Screenshot Workflow

  1. Take screenshot: DISPLAY=:1 scrot /tmp/screen.png
  2. Read it yourself: read_file("/tmp/screen.png") — this lets YOU see the screen
  3. Send to user: message(content="...", media=["/tmp/screen.png"])
  4. Always screenshot AFTER actions to verify results
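
Steps 1 and 4 could be wrapped in a helper that writes each capture to a fresh file and checks that the file really exists. `take_shot` and its file naming are hypothetical. A unique name per capture also avoids reading a stale image, since some scrot versions do not overwrite an existing file unless `--overwrite` is given.

```shell
# Hypothetical helper: capture the screen to a timestamped file in /tmp
# and print the path. Names have one-second resolution, so two captures
# within the same second would collide.
take_shot() {
    label=${1:-screen}
    out="/tmp/${label}-$(date +%Y%m%d-%H%M%S).png"
    DISPLAY=:1 scrot "$out" || return 1
    # Fail if scrot reported success but left no file or an empty one.
    [ -s "$out" ] || return 1
    printf '%s\n' "$out"
}

# Usage: capture, then hand the printed path to read_file(...) and
# message(..., media=[...]) as in steps 2 and 3.
#   path=$(take_shot after-login)
```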

Gateway + GUI

  • When running nanobot gateway, always start with DISPLAY=:1 so Telegram/Discord agents can use GUI
  • The gateway agent has its own context — it won't know about the display unless MEMORY.md says so
  • Write important system info to MEMORY.md so all channels stay informed
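
Recording the display in MEMORY.md could look like the sketch below. The helper name `remember`, the location of MEMORY.md, and the wording of the note are all assumptions; only `nanobot gateway` and `DISPLAY=:1` come from the text above.

```shell
# Hypothetical helper: append a note to a memory file only if that exact
# line is not already present, so repeated runs do not duplicate it.
remember() {
    file=$1
    note=$2
    grep -qxF -- "$note" "$file" 2> /dev/null || printf '%s\n' "$note" >> "$file"
}

# Usage (path and note text are assumptions):
#   remember ./MEMORY.md "- GUI display available: DISPLAY=:1 (xdotool, scrot, firefox)"
#   DISPLAY=:1 nanobot gateway
```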

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-03 09:25, marked safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
