
Fast Agent Browser

Python CLI tool for AI agents to automate web browsers with Playwright, supporting navigation, interaction, snapshots, screenshots, and form handling.
Author: leohuang8688
Version: v1.1.0 (latest), published on clawhub. API key: not required.


Agent Browser Skill

Fast, Python-based browser automation CLI for AI agents


Overview

Agent Browser is a browser automation tool designed for AI agents. It provides a simple CLI for controlling web browsers through Playwright.


Features

  • Fast CLI for browser automation
  • AI-friendly snapshot command
  • Full page interaction (click, fill, type, etc.)
  • Semantic element finding (role, text, label, etc.)
  • Smart waiting (element, text, URL, network)
  • Screenshot and PDF support
  • File upload support
  • JavaScript execution
  • Cookie and storage management

Installation

cd ~/.openclaw/workspace/skills/agent-browser

# Install Python dependencies
pip3 install -r requirements.txt

# Install Playwright browsers
python3 agent_browser.py install

Basic Usage

Open a URL

python3 agent_browser.py open https://example.com

Get Page Snapshot

# Full accessibility tree
python3 agent_browser.py snapshot

# Interactive elements only
python3 agent_browser.py snapshot -i

# Compact output
python3 agent_browser.py snapshot -c

Interact with Elements

# Click element
python3 agent_browser.py click "#submit"

# Fill input field
python3 agent_browser.py fill "#email" "test@example.com"

# Type text
python3 agent_browser.py type "#search" "query"

Get Information

# Get text content
python3 agent_browser.py get_text "#title"

# Get HTML
python3 agent_browser.py get_html "#content"

# Get current URL
python3 agent_browser.py get_url

# Get page title
python3 agent_browser.py get_title

Take Screenshot

# Normal screenshot
python3 agent_browser.py screenshot page.png

# Full page screenshot
python3 agent_browser.py screenshot page.png --full

Wait for Elements

# Wait for element
python3 agent_browser.py wait "#loader" --state hidden

# Wait for text
python3 agent_browser.py wait --text "Welcome"

# Wait for network idle
python3 agent_browser.py wait --load networkidle
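The wait commands above all follow the same idea: keep checking a condition until it holds or a timeout expires. The sketch below shows that general polling pattern in plain Python. It is an illustration only, not the tool's or Playwright's actual implementation; the name `wait_until` and its parameters are hypothetical.

```python
import time


def wait_until(predicate, timeout_ms=30000, interval_ms=100):
    """Poll predicate() until it returns a truthy value or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_ms / 1000)
```

A caller could pass any zero-argument check as `predicate`, for example a function that compares the current page title with an expected value.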

Find Elements

# Find by role
python3 agent_browser.py find --role button --name "Submit"

# Find by text
python3 agent_browser.py find --text "Sign In"

# Find by label
python3 agent_browser.py find --label "Email"

Close Browser

python3 agent_browser.py close

Advanced Usage

Form Automation

# Fill form
python3 agent_browser.py fill "#name" "John Doe"
python3 agent_browser.py fill "#email" "john@example.com"

# Select dropdown
python3 agent_browser.py select "#country" "US"

# Check checkbox
python3 agent_browser.py check "#terms"

# Submit form
python3 agent_browser.py click "#submit"

File Upload

python3 agent_browser.py upload "#file" file1.txt file2.txt

Scroll Page

# Scroll down
python3 agent_browser.py scroll down 500

# Scroll up
python3 agent_browser.py scroll up 100

# Scroll element
python3 agent_browser.py scroll down 200 --selector "#main"

Execute JavaScript

python3 agent_browser.py eval "document.title"
python3 agent_browser.py eval "window.innerWidth"

Get Element Info

# Get input value
python3 agent_browser.py get_value "#email"

# Get attribute
python3 agent_browser.py get_attr "#link" href

# Get bounding box
python3 agent_browser.py get_box "#element"

# Count elements
python3 agent_browser.py count ".item"

Options

Global Options

# Headless mode (default)
python3 agent_browser.py open https://example.com --headless

# Show browser window
python3 agent_browser.py open https://example.com --headed

# Custom viewport
python3 agent_browser.py open https://example.com --viewport 1920x1080
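The `--viewport` value is a `WIDTHxHEIGHT` string. The README does not show how the tool parses it, so the following is only a sketch of one way to turn such a string into the `{"width": ..., "height": ...}` mapping that Playwright accepts for a viewport. The function name `parse_viewport` is hypothetical.

```python
def parse_viewport(value):
    """Parse a 'WIDTHxHEIGHT' string into a Playwright-style viewport dict."""
    width, sep, height = value.strip().lower().partition("x")
    if not sep or not width.isdecimal() or not height.isdecimal():
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    size = {"width": int(width), "height": int(height)}
    if size["width"] == 0 or size["height"] == 0:
        raise ValueError("viewport dimensions must be positive")
    return size
```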

Snapshot Options

# Interactive elements only
python3 agent_browser.py snapshot -i

# Compact output
python3 agent_browser.py snapshot -c

# Limit depth
python3 agent_browser.py snapshot -d 3

Screenshot Options

# Full page
python3 agent_browser.py screenshot page.png --full

# Annotate with labels
python3 agent_browser.py screenshot page.png --annotate

AI Workflow

Optimal AI Agent Workflow

# 1. Navigate to page
python3 agent_browser.py open https://example.com

# 2. Get snapshot with refs
python3 agent_browser.py snapshot -i

# 3. AI identifies target elements

# 4. Execute actions
python3 agent_browser.py click "@e1"
python3 agent_browser.py fill "@e2" "input text"

# 5. Get new snapshot if page changed
python3 agent_browser.py snapshot -i
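An agent would normally drive these steps from code rather than by hand. The sketch below wraps the CLI with `subprocess`, passing arguments as a list so that no shell is involved. It rests on several assumptions not confirmed by this README: that the script is reachable as `agent_browser.py` from the working directory, that the browser session persists between invocations, and that element refs appear in the snapshot output as tokens of the form `@e1`. The helper names are hypothetical.

```python
import re
import subprocess
import sys

SCRIPT = "agent_browser.py"  # assumed path; adjust to your install location


def build_command(subcommand, *args):
    """Return the argv list for one CLI call (no shell string is built)."""
    return [sys.executable, SCRIPT, subcommand, *map(str, args)]


def run_step(subcommand, *args, timeout=60):
    """Run one CLI call and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(
        build_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout


def extract_refs(snapshot_text):
    """Collect tokens such as @e1 in first-seen order (assumed ref format)."""
    refs = []
    for ref in re.findall(r"@e\d+", snapshot_text):
        if ref not in refs:
            refs.append(ref)
    return refs
```

With these helpers, steps 1 and 2 become `run_step("open", "https://example.com")` followed by `extract_refs(run_step("snapshot", "-i"))`, and the resulting refs can be handed to whatever logic picks the target element.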

Examples

Example 1: Login Flow

# Open login page
python3 agent_browser.py open https://example.com/login

# Fill credentials
python3 agent_browser.py fill "#email" "user@example.com"
python3 agent_browser.py fill "#password" "secret"

# Click submit
python3 agent_browser.py click "#submit"

# Wait for dashboard
python3 agent_browser.py wait --url "**/dashboard"

# Take screenshot
python3 agent_browser.py screenshot dashboard.png

Example 2: Data Extraction

# Open page
python3 agent_browser.py open https://example.com/products

# Get product titles
python3 agent_browser.py get_text ".product-title"

# Get prices
python3 agent_browser.py get_text ".product-price"

# Take screenshot
python3 agent_browser.py screenshot products.png

Example 3: Form Submission

# Open form
python3 agent_browser.py open https://example.com/contact

# Fill fields
python3 agent_browser.py fill "#name" "John Doe"
python3 agent_browser.py fill "#email" "john@example.com"
python3 agent_browser.py fill "#message" "Hello!"

# Select dropdown
python3 agent_browser.py select "#subject" "Support"

# Check terms
python3 agent_browser.py check "#terms"

# Submit
python3 agent_browser.py click "#submit"

# Wait for confirmation
python3 agent_browser.py wait --text "Thank you"
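The fill, select, check, and click steps in Example 3 differ only in their data, so they can be generated from a description of the form. The following is a minimal sketch of that idea; `form_commands` is a hypothetical helper that returns the subcommand and arguments for each step, leaving execution to the caller.

```python
def form_commands(fields, selects=None, checks=None, submit="#submit"):
    """Build the CLI argument lists for filling in and submitting a form.

    fields and selects map selectors to values; checks is a list of selectors.
    """
    commands = [["fill", selector, value] for selector, value in fields.items()]
    commands += [["select", sel, value] for sel, value in (selects or {}).items()]
    commands += [["check", sel] for sel in (checks or [])]
    commands.append(["click", submit])
    return commands
```

Each returned list can be appended to `python3 agent_browser.py` to form one invocation, in the same order as the commands shown in the example.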

Security Notes

Input Sanitization

All user inputs are sanitized before use:

  • Selectors are validated
  • Text inputs are escaped
  • URLs are validated
  • JavaScript execution requires explicit command

Safe Commands

Commands are designed not to execute arbitrary code. The one exception is eval, which by design runs the JavaScript it is given:

  • No shell injection possible
  • No command injection possible
  • All inputs are validated
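The README states that URLs are validated but does not show how. As a caller-side safeguard, an agent can also check URLs before passing them to `open`. The sketch below accepts only `http` and `https` URLs that include a host; it is an assumption about what a reasonable check looks like, not the tool's own validation logic, and `is_allowed_url` is a hypothetical name.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url):
    """Return True only for http(s) URLs that include a host name."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. broken IPv6 literals
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)
```

This rejects inputs such as `javascript:` and `file:` URLs, which a browser would otherwise happily navigate to.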

Best Practices

  1. Use headless mode for automation
  2. Validate all inputs before use
  3. Use explicit selectors
  4. Close browser when done
  5. Use timeouts for waits

Troubleshooting

Browser Does Not Open

# Install Playwright browsers
python3 agent_browser.py install

Element Not Found

# Check whether the element is visible
python3 agent_browser.py is_visible "#element"

# Get snapshot to verify
python3 agent_browser.py snapshot -i

Screenshot Is Blank

# Wait for page to load
python3 agent_browser.py wait --load networkidle

# Take screenshot after wait
python3 agent_browser.py screenshot page.png

Timeout Errors

# Increase timeout
python3 agent_browser.py wait "#element" --timeout 60000
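Rather than jumping straight to the largest timeout, a script can retry the wait with progressively longer limits. The sketch below generates such a schedule and the matching `wait` arguments. It assumes `--timeout` is expressed in milliseconds, which the value 60000 suggests but the README does not state; both helper names are hypothetical.

```python
def escalating_timeouts(start_ms=15000, factor=2, cap_ms=60000):
    """Yield growing timeouts, ending with cap_ms exactly once."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    timeout = start_ms
    while timeout < cap_ms:
        yield timeout
        timeout *= factor
    yield cap_ms


def wait_commands(selector, **schedule):
    """Build one 'wait' argument list per timeout in the schedule."""
    return [
        ["wait", selector, "--timeout", str(timeout)]
        for timeout in escalating_timeouts(**schedule)
    ]
```

A caller would run these in order and stop at the first one that succeeds.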

API Reference

For detailed API documentation, see docs/api.md.

BrowserAgent Class

from src.browser import BrowserAgent

# Initialize
agent = BrowserAgent(headless=True)

# Navigate
agent.open("https://example.com")

# Get snapshot
tree = agent.snapshot(interactive=True)

# Interact
agent.click("#submit")
agent.fill("#email", "test@test.com")

# Get info
text = agent.get_text("#title")
html = agent.get_html("#content")

# Screenshot
agent.screenshot("page.png")

# Close
agent.close()
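In the snippet above, `agent.close()` is skipped if any earlier call raises. One way to guarantee cleanup is a small context manager around the agent. This is a sketch, not part of the documented API: `browser_session` is a hypothetical helper, and it assumes only what the snippet shows, namely that the agent is constructed with keyword arguments and has a `close()` method.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(agent_factory, **kwargs):
    """Create an agent, hand it to the with-block, and always close it."""
    agent = agent_factory(**kwargs)
    try:
        yield agent
    finally:
        agent.close()


# Usage, assuming the import path shown in this README:
#
#   from src.browser import BrowserAgent
#
#   with browser_session(BrowserAgent, headless=True) as agent:
#       agent.open("https://example.com")
#       agent.screenshot("page.png")
```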

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Open a Pull Request

License

MIT License - See LICENSE file for details.


Support

For issues and questions:

  • GitHub: https://github.com/leohuang8688/agent-browser
  • Documentation: See README.md and docs/api.md

Happy Automating!

Version History

1 version in total

  • v1.1.0 (current)
    2026-05-07 13:47, marked safe

Security Scan

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
