This skill lets you delegate any web task to a real browser driven by an
autonomous AI agent. You give a natural-language prompt; BrowseAnything
opens Chromium, navigates, clicks, types, solves CAPTCHAs, and returns
the result — including a screenshot.
Trigger this skill whenever the task requires the live web, e.g.:
Do not use it for tasks the model can answer from internal knowledge,
or for tasks that have a dedicated MCP/API the user already configured
(prefer the more specific tool when available).
ba_live_...). Direct them to
```bash
export BROWSEANYTHING_API_KEY=ba_live_...
```
Set BROWSEANYTHING_API_URL=https://your-host to point at a self-hosted engine. The default is the hosted platform.
If BROWSEANYTHING_API_KEY is missing, the scripts exit with code 2 and
print a clear message; surface that message to the user verbatim.
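A pre-flight check along the following lines can catch a missing key before any script is launched. This is only a minimal sketch: the helper name `load_browseanything_env` and the error wording are hypothetical, and the hosted default URL is left as `None` because this document does not state it.

```python
import os


def load_browseanything_env(environ=None):
    """Return (api_key, api_url) read from the environment.

    api_url is None when BROWSEANYTHING_API_URL is unset, in which case
    the scripts use the hosted platform.
    """
    env = os.environ if environ is None else environ
    api_key = env.get("BROWSEANYTHING_API_KEY", "").strip()
    if not api_key:
        # Hypothetical wording; the real scripts print their own message
        # and exit with code 2.
        raise RuntimeError("BROWSEANYTHING_API_KEY is not set")
    api_url = env.get("BROWSEANYTHING_API_URL", "").strip() or None
    return api_key, api_url
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise without touching the real environment.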
For 95% of requests use the one-shot browse.py script. It creates a
task, polls until done, and prints the result.
python3 {baseDir}/scripts/browse.py "Find the cheapest direct flight from CDG to NRT in May, return airline + price + booking URL."
Useful flags:
- --model: override the LLM (e.g. gpt-5.2, kimi-k2.6)
- --max-steps: cap agent steps (default 80)
- --proxy: e.g. us, eu
- --metadata '{"key":"value"}': attach JSON metadata
- --timeout: max wait in seconds (default 900)
- --json: emit the full task object instead of a friendly summary

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Task completed successfully |
| 1 | Task failed (read stderr / error_message) |
| 2 | Auth/usage problem (missing key, insufficient credits, bad input) |
| 3 | Network unreachable |
| 4 | Local timeout (task may still be running on server) |
| 5 | Task is paused waiting for human input — see below |
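The flags and exit codes above can be wrapped in a small caller. The sketch below is an illustration, not part of the skill: `build_browse_command`, `run_browse`, and the short labels in `EXIT_MEANINGS` are hypothetical names, and it launches the script with the current interpreter (`sys.executable`) rather than a literal `python3`.

```python
import json
import subprocess
import sys

# Short labels for the exit codes listed in the table above (labels are ours).
EXIT_MEANINGS = {
    0: "completed",
    1: "failed",
    2: "auth_or_usage",
    3: "network_unreachable",
    4: "local_timeout",
    5: "requires_input",
}


def build_browse_command(script_path, prompt, model=None, max_steps=None,
                         proxy=None, metadata=None, timeout=None,
                         as_json=False):
    """Assemble the argv list for browse.py from the documented flags."""
    cmd = [sys.executable, script_path, prompt]
    if model is not None:
        cmd += ["--model", model]
    if max_steps is not None:
        cmd += ["--max-steps", str(max_steps)]
    if proxy is not None:
        cmd += ["--proxy", proxy]
    if metadata is not None:
        # --metadata takes a JSON string, so serialize the dict here.
        cmd += ["--metadata", json.dumps(metadata)]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if as_json:
        cmd.append("--json")
    return cmd


def run_browse(cmd):
    """Run the command and return (meaning, stdout, stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    meaning = EXIT_MEANINGS.get(proc.returncode, "unknown")
    return meaning, proc.stdout, proc.stderr
```

Passing the command as a list avoids shell quoting problems with prompts that contain quotes or spaces. A caller would typically branch on the returned label, for example relaying stdout to the user on `requires_input` and stderr on `failed`.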
Use these when you need to fire-and-forget, run many tasks in parallel,
fetch screenshots mid-execution, or react to requires_input.
ID=$(python3 {baseDir}/scripts/create_task.py "Prompt...")
python3 {baseDir}/scripts/get_task.py "$ID" --field status
python3 {baseDir}/scripts/get_task.py "$ID" # full JSON
python3 {baseDir}/scripts/get_screenshot.py "$ID" --out latest.png
python3 {baseDir}/scripts/list_tasks.py --limit 20
python3 {baseDir}/scripts/cancel_task.py "$ID"
python3 {baseDir}/scripts/status.py # backend capacity
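To run several tasks in parallel, the low-level scripts can be combined into a create-then-poll loop. The following is a rough sketch under stated assumptions: `run_script` and `wait_for_all` are hypothetical helpers, `scripts_dir` stands in for {baseDir}/scripts, and the terminal status names are guesses, since the authoritative status enum lives in REFERENCE.md. Only `requires_input` is taken from this document.

```python
import subprocess
import sys
import time

# Assumed terminal statuses; check REFERENCE.md for the real enum.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def run_script(scripts_dir, name, *args):
    """Run one low-level script and return its trimmed stdout.

    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    proc = subprocess.run(
        [sys.executable, f"{scripts_dir}/{name}", *args],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()


def wait_for_all(task_ids, get_status, interval=5.0, max_polls=180,
                 sleep=time.sleep):
    """Poll every task until it is terminal or waiting for input.

    Returns {task_id: last_status}. Tasks still running after max_polls
    rounds keep their last observed status.
    """
    pending = set(task_ids)
    last = {}
    for _ in range(max_polls):
        for task_id in list(pending):
            status = get_status(task_id)
            last[task_id] = status
            if status in TERMINAL_STATUSES or status == "requires_input":
                pending.discard(task_id)
        if not pending:
            break
        sleep(interval)
    return last


# Example wiring (not executed here):
# ids = [run_script(scripts_dir, "create_task.py", p) for p in prompts]
# statuses = wait_for_all(
#     ids,
#     lambda i: run_script(scripts_dir, "get_task.py", i, "--field", "status"),
# )
```

The status lookup is passed in as a callable so the loop can be tested, or pointed at a different client, without changing the polling logic.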
If a task can't proceed without information only the user has (a 2FA
code, a clarification, a confirmation), it transitions to status
requires_input. The high-level browse.py exits with code 5 and
prints the question. To answer:
python3 {baseDir}/scripts/submit_input.py <task_id> "the user's answer"
Then resume polling with get_task.py (in effect, repeating browse.py's
polling loop manually on the same id). Always ask the user rather than
inventing an answer for a requires_input prompt.
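The rule about not inventing answers can be enforced in code. This is a minimal sketch with hypothetical names (`answer_requires_input`, `submit_via_script`). It assumes only the `submit_input.py <task_id> "answer"` call shown above, and it treats an empty reply as "no answer" rather than submitting a guess.

```python
import subprocess
import sys


def answer_requires_input(task_id, question, ask_user, submit):
    """Relay the agent's question to the user and submit only a real answer.

    ask_user(question) returns the user's reply (or None); submit(task_id,
    answer) sends it. Returns True if an answer was submitted.
    """
    answer = ask_user(question)
    if answer is None or not answer.strip():
        # No reply from the user: leave the task paused so the caller can
        # ask again or cancel it, instead of making something up.
        return False
    # Surrounding whitespace is trimmed; the rest of the reply is untouched.
    submit(task_id, answer.strip())
    return True


def submit_via_script(scripts_dir):
    """Build a submit callable backed by submit_input.py."""
    def submit(task_id, answer):
        subprocess.run(
            [sys.executable, f"{scripts_dir}/submit_input.py", task_id, answer],
            check=True,
        )
    return submit
```

After a successful submit, the task id can be handed back to the polling step described above.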
The agent works best with prompts that are concrete and verifiable.
shipped from Amazon, return seller + price + ETA."
Tips:
saved profile") — though credentials should never be passed in plain text;
prefer pre-saved sessions in the BrowseAnything dashboard
gpt-5.2, gpt-5.4, kimi-k2.6, anthropic/claude-haiku-4.5, gemini-3-flash-preview, gpt-4.1,
llama-4, openai/gpt-oss-120b, plus mini variants. The available set
depends on your tier; unsupported values return a hard error rather than
falling back. Copy the exact string from the API error message when retrying.
The --model value is matched strictly (e.g. gpt-5.2 works, gpt5.4 without a hyphen does not). If you get
Invalid model, retry with the exact name from the API error message.
cancel_task.py returns Task not found or cannot be cancelled for tasks that have already failed
or completed. Check status with get_task.py --field status first.
requires_input consumes concurrency but not steps; answer promptly or cancel to free the slot.
If the terminal tool rejects a 900 s wait, run browse.py in the background
(background=true, notify_on_complete=true) and poll with
get_task.py until it finishes.
Read requires_input messages before replying. The agent sometimes
embeds the completed answer inside its question (e.g. a table of flight
results). If the task is effectively done, cancel it rather than submitting
unnecessary input.
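For the long-wait pitfall above, an environment whose terminal tool has no background option could launch browse.py from Python instead. This swaps in `subprocess.Popen` for the tool's own background flag; it is a sketch with hypothetical helper names (`start_in_background`, `finish`), not something the skill ships.

```python
import subprocess
import sys


def start_in_background(cmd, log_path):
    """Start cmd without waiting; stdout and stderr are written to log_path."""
    log = open(log_path, "w")
    proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc, log


def finish(proc, log, timeout=None):
    """Wait for the process, close the log file, and return the exit code.

    Raises subprocess.TimeoutExpired if the process is still running
    after timeout seconds.
    """
    try:
        return proc.wait(timeout=timeout)
    finally:
        log.close()
```

While the process runs, `proc.poll()` returns `None`; once it returns an integer, that value is the browse.py exit code from the table earlier in this document.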
- REFERENCE.md: full API surface, request/response shapes, status enum
- EXAMPLES.md: copy-paste prompt patterns for common scenarios
- README.md: install instructions for Claude Code, OpenClaw, Cursor, Codex, Gemini, Windsurf
- references/recurring-scraping-pipeline.md: architecture for daily automated scraping, deduplication, enrichment, and dashboard reporting (real estate, price monitoring, job boards, etc.)