Fast, Python-based browser automation CLI for AI agents
Agent Browser is a browser automation tool designed for AI agents. It provides a simple command-line interface for controlling web browsers through Playwright.
cd ~/.openclaw/workspace/skills/agent-browser
# Install Python dependencies
pip3 install -r requirements.txt
# Install Playwright browsers
python3 agent_browser.py install
python3 agent_browser.py open https://example.com
# Full accessibility tree
python3 agent_browser.py snapshot
# Interactive elements only
python3 agent_browser.py snapshot -i
# Compact output
python3 agent_browser.py snapshot -c
# Click element
python3 agent_browser.py click "#submit"
# Fill input field
python3 agent_browser.py fill "#email" "test@example.com"
# Type text
python3 agent_browser.py type "#search" "query"
# Get text content
python3 agent_browser.py get_text "#title"
# Get HTML
python3 agent_browser.py get_html "#content"
# Get current URL
python3 agent_browser.py get_url
# Get page title
python3 agent_browser.py get_title
# Normal screenshot
python3 agent_browser.py screenshot page.png
# Full page screenshot
python3 agent_browser.py screenshot page.png --full
# Wait for element
python3 agent_browser.py wait "#loader" --state hidden
# Wait for text
python3 agent_browser.py wait --text "Welcome"
# Wait for network idle
python3 agent_browser.py wait --load networkidle
# Find by role
python3 agent_browser.py find --role button --name "Submit"
# Find by text
python3 agent_browser.py find --text "Sign In"
# Find by label
python3 agent_browser.py find --label "Email"
python3 agent_browser.py close
# Fill form
python3 agent_browser.py fill "#name" "John Doe"
python3 agent_browser.py fill "#email" "john@example.com"
# Select dropdown
python3 agent_browser.py select "#country" "US"
# Check checkbox
python3 agent_browser.py check "#terms"
# Submit form
python3 agent_browser.py click "#submit"
python3 agent_browser.py upload "#file" file1.txt file2.txt
# Scroll down
python3 agent_browser.py scroll down 500
# Scroll up
python3 agent_browser.py scroll up 100
# Scroll element
python3 agent_browser.py scroll down 200 --selector "#main"
python3 agent_browser.py eval "document.title"
python3 agent_browser.py eval "window.innerWidth"
# Get input value
python3 agent_browser.py get_value "#email"
# Get attribute
python3 agent_browser.py get_attr "#link" href
# Get bounding box
python3 agent_browser.py get_box "#element"
# Count elements
python3 agent_browser.py count ".item"
# Headless mode (default)
python3 agent_browser.py open https://example.com --headless
# Show browser window
python3 agent_browser.py open https://example.com --headed
# Custom viewport
python3 agent_browser.py open https://example.com --viewport 1920x1080
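The `--viewport` value is a `WIDTHxHEIGHT` string. How the tool parses it internally is not shown here; the sketch below is one plausible way to turn such a string into the `{"width": ..., "height": ...}` mapping that Playwright accepts for a viewport. The function name `parse_viewport` is hypothetical.

```python
def parse_viewport(spec):
    """Parse a 'WIDTHxHEIGHT' string such as '1920x1080'.

    Hypothetical helper, not taken from agent_browser.py.
    """
    width_text, separator, height_text = spec.lower().partition("x")
    if not separator:
        raise ValueError(f"expected WIDTHxHEIGHT, got {spec!r}")
    # int() raises ValueError on empty or non-numeric parts.
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"viewport dimensions must be positive, got {spec!r}")
    return {"width": width, "height": height}
```

Validating the string before launching the browser gives a clear error for typos such as `1920*1080`.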
# Interactive elements only
python3 agent_browser.py snapshot -i
# Compact output
python3 agent_browser.py snapshot -c
# Limit depth
python3 agent_browser.py snapshot -d 3
# Full page
python3 agent_browser.py screenshot page.png --full
# Annotate with labels
python3 agent_browser.py screenshot page.png --annotate
# 1. Navigate to page
python3 agent_browser.py open https://example.com
# 2. Get snapshot with refs
python3 agent_browser.py snapshot -i
# 3. AI identifies target elements
# 4. Execute actions
python3 agent_browser.py click "@e1"
python3 agent_browser.py fill "@e2" "input text"
# 5. Get new snapshot if page changed
python3 agent_browser.py snapshot -i
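The five steps above can also be driven from a script. The following is a minimal sketch, assuming `agent_browser.py` is in the current working directory and that each command exits non-zero on failure. The helper names (`build_command`, `workflow_commands`, `run_all`) are illustrative and not part of the tool. For brevity the actions are passed in up front; a real agent would choose them after reading the first snapshot.

```python
import subprocess

# Assumption: agent_browser.py is in the current working directory.
BASE = ["python3", "agent_browser.py"]


def build_command(action, *args):
    """Return the argv list for one agent_browser.py invocation."""
    return [*BASE, action, *args]


def workflow_commands(url, actions):
    """Open the page, snapshot, apply the actions, then snapshot again."""
    commands = [build_command("open", url), build_command("snapshot", "-i")]
    commands.extend(build_command(*action) for action in actions)
    commands.append(build_command("snapshot", "-i"))
    return commands


def run_all(commands):
    """Run each command in order and collect its standard output."""
    outputs = []
    for argv in commands:
        # check=True stops the workflow at the first failing command.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs
```

Calling `run_all(workflow_commands("https://example.com", [("click", "@e1"), ("fill", "@e2", "input text")]))` would execute the same sequence as the commands above.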
# Open login page
python3 agent_browser.py open https://example.com/login
# Fill credentials
python3 agent_browser.py fill "#email" "user@example.com"
python3 agent_browser.py fill "#password" "secret"
# Click submit
python3 agent_browser.py click "#submit"
# Wait for dashboard
python3 agent_browser.py wait --url "**/dashboard"
# Take screenshot
python3 agent_browser.py screenshot dashboard.png
# Open page
python3 agent_browser.py open https://example.com/products
# Get product titles
python3 agent_browser.py get_text ".product-title"
# Get prices
python3 agent_browser.py get_text ".product-price"
# Take screenshot
python3 agent_browser.py screenshot products.png
# Open form
python3 agent_browser.py open https://example.com/contact
# Fill fields
python3 agent_browser.py fill "#name" "John Doe"
python3 agent_browser.py fill "#email" "john@example.com"
python3 agent_browser.py fill "#message" "Hello!"
# Select dropdown
python3 agent_browser.py select "#subject" "Support"
# Check terms
python3 agent_browser.py check "#terms"
# Submit
python3 agent_browser.py click "#submit"
# Wait for confirmation
python3 agent_browser.py wait --text "Thank you"
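All user inputs are sanitized before use.
Commands are designed to be safe and do not execute arbitrary code on their own. Keep in mind that eval runs whatever JavaScript expression you pass it in the page, so only give it expressions you trust.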
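The tool's own sanitization is not shown here. Independently of it, a caller can avoid shell injection by passing arguments as a list rather than as an interpolated shell string. The sketch below uses only the standard library; `fill_argv` and `as_shell_line` are hypothetical helper names.

```python
import shlex


def fill_argv(selector, value):
    """Build the argv for a fill command; each argument stays one token."""
    return ["python3", "agent_browser.py", "fill", selector, value]


def as_shell_line(argv):
    """Render argv as a safely quoted shell line, e.g. for logging."""
    return shlex.join(argv)
```

Passing the list from `fill_argv` to `subprocess.run` without `shell=True` means a value such as `x; rm -rf ~` reaches the tool as literal text and is never interpreted by a shell.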
# Install Playwright browsers
python3 agent_browser.py install
# Check if element exists
python3 agent_browser.py is_visible "#element"
# Get snapshot to verify
python3 agent_browser.py snapshot -i
# Wait for page to load
python3 agent_browser.py wait --load networkidle
# Take screenshot after wait
python3 agent_browser.py screenshot page.png
# Increase timeout
python3 agent_browser.py wait "#element" --timeout 60000
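Rather than always waiting with the largest timeout, a script can retry with progressively longer ones. This is a generic sketch: `retry_with_timeouts` is a hypothetical helper, and it assumes the wrapped action raises an exception when the wait times out (for example, `subprocess.run(..., check=True)` raising if the CLI exits non-zero on timeout, which is also an assumption).

```python
def retry_with_timeouts(action, timeouts_ms):
    """Call action(timeout_ms) with each timeout until one call succeeds.

    Returns the first successful result; re-raises the last error if
    every attempt fails.
    """
    last_error = None
    for timeout_ms in timeouts_ms:
        try:
            return action(timeout_ms)
        except Exception as error:  # keep the error and try a longer timeout
            last_error = error
    if last_error is None:
        raise ValueError("timeouts_ms must not be empty")
    raise last_error
```

An action for the command above could be a function that runs `wait "#element" --timeout <timeout_ms>` through `subprocess.run`, called with a schedule such as `[10000, 30000, 60000]`.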
For detailed API documentation, see docs/api.md.
from src.browser import BrowserAgent
# Initialize
agent = BrowserAgent(headless=True)
# Navigate
agent.open("https://example.com")
# Get snapshot
tree = agent.snapshot(interactive=True)
# Interact
agent.click("#submit")
agent.fill("#email", "test@test.com")
# Get info
text = agent.get_text("#title")
html = agent.get_html("#content")
# Screenshot
agent.screenshot("page.png")
# Close
agent.close()
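In the snippet above, `agent.close()` is skipped if an earlier call raises. One way to guarantee cleanup is `contextlib.closing`, which works with any object that has a `close()` method. The sketch below uses a stand-in class so it runs without a browser; `StubAgent` and `fetch_title` are hypothetical, and whether `BrowserAgent` itself supports the `with` statement is not stated here.

```python
from contextlib import closing


class StubAgent:
    """Stand-in exposing the same method names as the calls above."""

    def __init__(self):
        self.closed = False
        self.visited = []

    def open(self, url):
        self.visited.append(url)

    def get_text(self, selector):
        # Simulate a failure partway through a session.
        raise RuntimeError(f"no element matches {selector}")

    def close(self):
        self.closed = True


def fetch_title(agent, url):
    """Open url and return the title text; always closes the agent."""
    with closing(agent):
        agent.open(url)
        return agent.get_text("#title")
```

With a real agent, `fetch_title(BrowserAgent(headless=True), "https://example.com")` would release the browser even when `get_text` fails.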
MIT License - See LICENSE file for details.
For issues and questions:
Happy Automating!