
browseanything ai browser agent

Drive a real Chromium browser with an autonomous AI agent to do anything on the web: book flights, scrape sites, fill forms, log into apps, extract data, and more.
Author: mehdi149
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · API key: required
★ 0 stars · 📥 308 downloads · 💾 0 installs · 1 version (#latest)

Overview

Browse Anything

This skill lets you delegate any web task to a real browser driven by an autonomous AI agent. You give a natural-language prompt, and BrowseAnything opens Chromium, navigates, clicks, types, solves CAPTCHAs, and returns the result, including a screenshot.

When to use

Trigger this skill whenever the task requires the live web, e.g.:

  • "Find the cheapest flight from X to Y next month"
  • "Log into my Notion and pull the latest entries from this database"
  • "Fill out this Google Form with the following answers"
  • "Check whether a given site is down right now"
  • "Buy item Z if it's under $50"
  • "Scrape the top 20 results for query Q from a given site"
  • "Take a screenshot of a page after clicking Accept"

Do not use it for tasks the model can answer from internal knowledge, or for tasks that have a dedicated MCP/API the user has already configured (prefer the more specific tool when available).

One-time setup

  1. The user must have a BrowseAnything API key (ba_live_...). Direct them to the BrowseAnything dashboard → Settings → API Keys to create one.
  2. They export it once:

```bash
export BROWSEANYTHING_API_KEY=ba_live_...
```

  3. (Optional, self-hosting) Set BROWSEANYTHING_API_URL=https://your-host to point at a self-hosted engine. The default is the hosted platform.

If BROWSEANYTHING_API_KEY is missing, the scripts exit with code 2 and a clear message. Surface that message to the user verbatim.
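Before fanning out several tasks, an agent may want to fail fast on a missing key instead of discovering it separately from each script. The sketch below assumes a hypothetical helper, ba_preflight, which is not part of the skill. It returns 2 like the scripts do, and it reports which backend the scripts will target. The rule it applies (an unset BROWSEANYTHING_API_URL means the hosted platform) comes from the setup steps.

```bash
# Hypothetical preflight helper, not shipped with the skill.
# Returns 2 (the scripts' auth/usage exit code) when the key is missing.
ba_preflight() {
  if [ -z "${BROWSEANYTHING_API_KEY:-}" ]; then
    echo "BROWSEANYTHING_API_KEY is not set; create a key under Settings → API Keys" >&2
    return 2
  fi
  # An unset BROWSEANYTHING_API_URL means the hosted platform is used.
  if [ -n "${BROWSEANYTHING_API_URL:-}" ]; then
    echo "using self-hosted engine at $BROWSEANYTHING_API_URL"
  else
    echo "using hosted platform"
  fi
}
```

A caller would typically run `ba_preflight || exit $?` once, before creating any tasks.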

Default workflow (high-level)

For 95% of requests, use the one-shot browse.py script. It creates a task, polls until the task finishes, and prints the result.

```bash
python3 {baseDir}/scripts/browse.py "Find the cheapest direct flight from CDG to NRT in May, return airline + price + booking URL."
```

Useful flags:

  • --model: override the LLM (e.g. gpt-5.2, kimi-k2.6)
  • --max-steps: cap the number of agent steps (default 80)
  • --proxy: proxy region, e.g. us, eu
  • --metadata '{"key":"value"}': attach JSON metadata
  • --timeout: maximum local wait in seconds (default 900)
  • --json: emit the full task object instead of a friendly summary

Exit codes:

| Code | Meaning |
|------|---------|
| 0 | Task completed successfully |
| 1 | Task failed (read stderr / error_message) |
| 2 | Auth or usage problem (missing key, insufficient credits, bad input) |
| 3 | Network unreachable |
| 4 | Local timeout (the task may still be running on the server) |
| 5 | Task is paused waiting for human input (see Handling human-in-the-loop) |
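An agent calling browse.py can branch on its exit status. The hypothetical ba_next_step function below is not part of the skill. It restates the exit-code table as a case statement. The action strings are illustrative wording, not messages the scripts print.

```bash
# Hypothetical dispatcher: map a browse.py exit code to the agent's next step.
ba_next_step() {
  case "$1" in
    0) echo "report result" ;;
    1) echo "read stderr and report the failure" ;;
    2) echo "show the auth/usage message to the user verbatim" ;;
    3) echo "check network connectivity, then retry" ;;
    4) echo "poll the task with get_task.py; it may still be running" ;;
    5) echo "ask the user, then answer with submit_input.py" ;;
    *) echo "unknown exit code $1"; return 1 ;;
  esac
}
```

Usage: `python3 {baseDir}/scripts/browse.py "Prompt..."; ba_next_step $?`.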

Low-level workflow (manual control)

Use these when you need to fire and forget, run many tasks in parallel, fetch screenshots mid-execution, or react to requires_input.

```bash
ID=$(python3 {baseDir}/scripts/create_task.py "Prompt...")
python3 {baseDir}/scripts/get_task.py "$ID" --field status
python3 {baseDir}/scripts/get_task.py "$ID"                # full JSON
python3 {baseDir}/scripts/get_screenshot.py "$ID" --out latest.png
python3 {baseDir}/scripts/list_tasks.py --limit 20
python3 {baseDir}/scripts/cancel_task.py "$ID"
python3 {baseDir}/scripts/status.py                        # backend capacity
```
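The low-level scripts leave polling to the caller. Below is a minimal polling loop, written under several assumptions:

- BA_DIR is a hypothetical variable standing in for {baseDir}.
- ba_status and ba_wait are illustrative names.
- completed, failed and requires_input are assumed to be the status strings that end a wait. Only requires_input is named as a status in this document, so check the status enum in REFERENCE.md.

The defaults of 5 s × 180 polls add up to browse.py's 900 s default timeout.

```bash
# Hypothetical wrapper around get_task.py; BA_DIR stands in for {baseDir}.
# Kept separate so it can be stubbed or given logging.
ba_status() {
  python3 "$BA_DIR/scripts/get_task.py" "$1" --field status
}

# Usage: ba_wait TASK_ID [INTERVAL_SECONDS] [MAX_POLLS]
# Prints the final status, or "timeout" (return 4, like browse.py's local timeout).
ba_wait() {
  local id="$1" interval="${2:-5}" max_polls="${3:-180}" task_status i=0
  while [ "$i" -lt "$max_polls" ]; do
    # A failed status fetch is treated like a network problem (exit code 3).
    task_status=$(ba_status "$id") || return 3
    case "$task_status" in
      # Assumed statuses that end the wait; confirm against REFERENCE.md.
      completed|failed|requires_input)
        echo "$task_status"
        return 0
        ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timeout"
  return 4
}
```

Usage: `ID=$(python3 {baseDir}/scripts/create_task.py "Prompt...")`, then `ba_wait "$ID"`. Launching several tasks first and then calling ba_wait on each ID gives simple parallelism.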

Handling human-in-the-loop

If a task can't proceed without information only the user has (a 2FA code, a clarification, a confirmation), it transitions to status requires_input. The high-level browse.py exits with code 5 and prints the question. To answer:

```bash
python3 {baseDir}/scripts/submit_input.py <task_id> "the user's answer"
```

Then resume polling the same task ID with get_task.py, doing by hand the polling that browse.py would otherwise do for you. Always ask the user before answering a requires_input prompt; never invent an answer.
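The requires_input round trip can be guarded so that an agent never answers a task that is not paused and never submits an empty (invented) reply. This is a sketch under assumptions: BA_DIR is a hypothetical stand-in for {baseDir}, and ba_status, ba_submit and ba_resume are illustrative wrapper names around get_task.py and submit_input.py.

```bash
# Hypothetical wrappers; BA_DIR stands in for {baseDir}.
ba_status() { python3 "$BA_DIR/scripts/get_task.py" "$1" --field status; }
ba_submit() { python3 "$BA_DIR/scripts/submit_input.py" "$1" "$2"; }

# Usage: ba_resume TASK_ID "answer the user actually gave"
ba_resume() {
  local id="$1" answer="$2"
  # Only answer tasks that are really paused on human input.
  if [ "$(ba_status "$id")" != "requires_input" ]; then
    echo "task $id is not waiting for input" >&2
    return 1
  fi
  # Never invent an answer: an empty reply means the user has not been asked yet.
  if [ -z "$answer" ]; then
    echo "ask the user before replying to task $id" >&2
    return 2
  fi
  ba_submit "$id" "$answer"
}
```

After a successful ba_resume, polling continues with get_task.py as usual.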

Authoring great prompts

The agent works best with prompts that are concrete and verifiable.

  • ✅ "On amazon.fr, search 'Sony WH-1000XM5', open the cheapest new listing shipped from Amazon, return seller + price + ETA."
  • ❌ "find me good headphones"

Tips:

  • Name the website explicitly when you know it
  • State the success criterion ("return X, Y, Z")
  • Mention any login state ("I'm already logged in, my session is in the saved profile"), but never pass credentials in plain text; prefer pre-saved sessions in the BrowseAnything dashboard
  • Cap scope: one task, one outcome
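The tips (name the site, keep to one task, state an explicit success criterion) can be folded into a tiny template. ba_prompt is a hypothetical helper. Its "On SITE, TASK. Return FIELDS." wording is just one reasonable phrasing, not a format the agent requires.

```bash
# Hypothetical prompt template: site + single task + fields to return.
# Usage: ba_prompt SITE TASK FIELD...
ba_prompt() {
  if [ "$#" -lt 3 ]; then
    echo "usage: ba_prompt SITE TASK FIELD..." >&2
    return 2
  fi
  local site="$1" task="$2" fields
  shift 2
  # Join the remaining arguments with ", " to form the success criterion.
  fields=$(printf '%s, ' "$@")
  fields=${fields%, }
  printf 'On %s, %s. Return %s.\n' "$site" "$task" "$fields"
}
```

For example, `python3 {baseDir}/scripts/browse.py "$(ba_prompt amazon.fr "search 'Sony WH-1000XM5' and open the cheapest new listing shipped from Amazon" seller price ETA)"` produces a prompt close to the ✅ example.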

Cost & limits

  • Tasks consume credits; step and concurrency caps depend on your tier
  • Default per-task hard cap: 80 steps, 20 minutes
  • Rate limit: 100 API requests per minute per key
  • Supported models include gpt-5.2, gpt-5.4, kimi-k2.6, anthropic/claude-haiku-4.5, gemini-3-flash-preview, gpt-4.1, llama-4, openai/gpt-oss-120b, plus mini variants. The available set depends on your tier. Unsupported values return a hard error rather than falling back, so copy the exact string from the API error message when retrying.
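Polling counts against the 100 requests/min/key limit. Polling N tasks every S seconds costs N × 60 / S requests per minute, so the smallest safe whole interval is S = ceil(N × 60 / budget). The hypothetical ba_min_poll_interval below computes it. Its default budget of 80 requests/min is an assumption, not a documented figure: it leaves headroom under the limit for create, screenshot and cancel calls.

```bash
# Hypothetical helper: smallest whole polling interval (seconds) that keeps
# TASKS parallel pollers within BUDGET requests/minute (default 80 of 100).
ba_min_poll_interval() {
  local tasks="$1" budget="${2:-80}"
  # Integer ceiling of tasks * 60 / budget.
  echo $(( (tasks * 60 + budget - 1) / budget ))
}
```

For 10 parallel tasks this gives ceil(600 / 80) = 8 s, which works out to 75 polling requests per minute.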

Pitfalls & troubleshooting

  • Model names are exact strings. The API validates the --model value strictly (e.g. gpt-5.2 works; gpt5.4, without the hyphen, does not). If you get Invalid model, retry with the exact name from the API error message.
  • Cancel only works on running tasks. cancel_task.py returns Task not found or cannot be cancelled for tasks that have already failed or completed. Check the status with get_task.py --field status first.
  • Human-in-the-loop pauses step usage but holds a slot. A task stuck on requires_input consumes concurrency but not steps; answer promptly or cancel it to free the slot.
  • Foreground timeouts may be clamped by the host environment. If the terminal tool rejects a 900 s wait, run browse.py in the background (background=true, notify_on_complete=true) and poll with get_task.py until it finishes.
  • Inspect requires_input messages before replying. The agent sometimes embeds the completed answer inside its question (e.g. a table of flight results). If the task is effectively done, cancel it rather than submitting unnecessary input.
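Two of these pitfalls suggest checking the status before cancelling: cancel only works on running tasks, and a task paused on requires_input should be cancelled to free its slot. Here is a sketch that uses hypothetical names (BA_DIR for {baseDir}, ba_status, ba_cancel, ba_safe_cancel). It assumes completed and failed are the status strings for finished tasks, so confirm them against the status enum in REFERENCE.md.

```bash
# Hypothetical wrappers; BA_DIR stands in for {baseDir}.
ba_status() { python3 "$BA_DIR/scripts/get_task.py" "$1" --field status; }
ba_cancel() { python3 "$BA_DIR/scripts/cancel_task.py" "$1"; }

# Cancel a task only if it has not already finished, avoiding the
# "Task not found or cannot be cancelled" error on completed or failed tasks.
ba_safe_cancel() {
  local id="$1" task_status
  task_status=$(ba_status "$id") || return 3
  case "$task_status" in
    # Assumed terminal status strings.
    completed|failed)
      echo "task $id already $task_status; nothing to cancel"
      ;;
    *)
      # Still running or paused on requires_input: cancelling frees the slot.
      ba_cancel "$id"
      ;;
  esac
}
```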

More

  • REFERENCE.md: full API surface, request/response shapes, status enum
  • EXAMPLES.md: copy-paste prompt patterns for common scenarios
  • README.md: install instructions for Claude Code, OpenClaw, Cursor, Codex, Gemini, Windsurf
  • references/recurring-scraping-pipeline.md: architecture for daily automated scraping, deduplication, enrichment, and dashboard reporting (real estate, price monitoring, job boards, etc.)

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-21 15:12 · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risks found

Tencent Cloud Security (Sanbu)

Safe, no risks found

🔗 Related

ai-agent

Find Skills

root
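Helps users discover and install agent skills; use it when the user asks things like "how do I do X", "find a skill for X", or "is there anything that can do ..."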
★ 1,518 📥 574,820
ai-agent

Agent Browser

rez0
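A browser automation CLI for AI agents. Use it when the user needs to interact with websites, including browsing pages, filling forms, clicking buttons, and taking screenshots.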
★ 865 📥 344,942
ai-agent

Self-Improving + Proactive Agent

ivangdavila
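Self-reflection + self-criticism + self-learning + self-organizing memory. The agent evaluates its own work, finds its mistakes, and keeps improving.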
★ 1,441 📥 328,499