AgentBrowse

Overview

AgentBrowse is the browser layer for agent tasks that happen on a real website.

Use this skill when the agent needs to:

launch a browser or attach to an existing one;
inspect the current page and decide from visible state;
click, type, select, and otherwise act on returned target refs;
navigate directly to a known URL;
extract structured data from the page;
capture screenshots or recover a stuck browser session.

AgentBrowse works well on its own for browser automation. It can also be paired with MagicPay later when a broader flow reaches an approved login, identity, or payment step.

Open source:

Browser library and docs: https://github.com/MercuryoAI/agentbrowse
CLI package: @mercuryo-ai/agentbrowse-cli

Setup

agentbrowse must be available on PATH. If it is missing or outdated, run npm i -g @mercuryo-ai/agentbrowse-cli@latest, then verify with agentbrowse --version.

agentbrowse launch needs an environment that can start a browser.

agentbrowse attach needs a reachable CDP endpoint.

Core browser commands such as launch, attach, navigate, act, browser-status, screenshot, and close do not need any API key.

AI-assisted features (observe with a natural-language goal, and extract) call an LLM through the gateway. Configure API access with agentbrowse init before using them. Pass a non-default API URL during init if needed.

agentbrowse doctor inspects the local config. Use it after init when AI-assisted observe or extract still fails.
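
The setup checks above can be wrapped in a small preflight helper. The sketch below is a minimal illustration, not part of the CLI: the function names have_agentbrowse and needs_api_access are hypothetical, and treating every observe call as AI-assisted is a conservative assumption (the text above only says that observe with a natural-language goal goes through the gateway).

```shell
#!/bin/sh
# Hypothetical preflight helpers; these names are not part of agentbrowse.

# Succeeds when the agentbrowse binary is on PATH.
have_agentbrowse() {
  command -v agentbrowse >/dev/null 2>&1
}

# Succeeds when the given subcommand may call the LLM gateway and therefore
# needs `agentbrowse init` to have been run first.
# Assumption: every observe call is treated as AI-assisted to stay on the safe side.
needs_api_access() {
  case "$1" in
    observe|extract) return 0 ;;
    *) return 1 ;;
  esac
}

if ! have_agentbrowse; then
  echo "agentbrowse not found; run: npm i -g @mercuryo-ai/agentbrowse-cli@latest"
fi
```

A caller could run needs_api_access "$subcommand" before a step and fall back to agentbrowse doctor when an AI-assisted step still fails after init.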

Core Loop

Start or connect to a browser with agentbrowse launch [url] or agentbrowse attach with a reachable CDP endpoint.

Read the page with agentbrowse observe.
Act on the returned refs with agentbrowse act [value].
Re-run agentbrowse observe after navigation or meaningful UI changes.
Use agentbrowse navigate when the destination is already known.
Use agentbrowse extract '' [scopeRef] when you need structured output instead of another page action.

Use agentbrowse screenshot or agentbrowse browser-status only for evidence and debugging.

Finish with agentbrowse close when the browser session is no longer needed.
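
The loop above can be sketched as a short script. This is an illustrative sequence under stated assumptions, not the tool's documented invocation: the run wrapper, the core_loop function, and the DRY_RUN variable are hypothetical helpers, and the URL is a placeholder. The act and extract steps are left as comments because their arguments depend on the refs that observe returns.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical function walking through the core loop once.
core_loop() {
  url="$1"
  run agentbrowse launch "$url"   # start a browser on the target page
  run agentbrowse observe         # read the page and collect refs
  # Act on a returned ref here with `agentbrowse act`, then observe again.
  run agentbrowse observe         # refresh refs after the UI changed
  # Use `agentbrowse extract` here if structured output is needed.
  run agentbrowse close           # teardown only, not a success signal
}

# Preview the sequence without touching a real browser.
DRY_RUN=1
core_loop "https://example.com"
```

With DRY_RUN=1 the script only prints the four commands in order, which makes it easy to review the plan before running it against a real session.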

When To Bring In Another Tool

Bring in a companion protected-flow tool when the site reaches:

a login step that needs approved protected values;
an identity form with protected personal data;
a payment step with protected card details or approval flow.

At that point AgentBrowse can stay the browsing layer around the protected step, but it should not invent its own secret-handling flow.

Ask-User Boundary

Ask the user only when:

the correct next step is still ambiguous after re-observing the page;
the environment cannot launch or attach to a browser;
the task crosses into a protected approval or payment boundary.

Operating Rules

Trust the visible page state, not assumptions about what should have happened.

Re-observe after meaningful page changes instead of reusing stale refs.
Keep browser work and protected-step handling separated.
close is only teardown or recovery. Never treat close as a success signal: task success comes from the visible page state before close.
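
The last rule can be made concrete with a small wrapper. The sketch below is one possible way to keep the verdict separate from teardown; finish_session, the AGENTBROWSE override variable, and the verdict strings are hypothetical names introduced for illustration. The verdict is decided from the last observed page state before close runs, and the exit status of close is deliberately ignored.

```shell
#!/bin/sh
# Hypothetical helper: report task success from a verdict decided BEFORE close.
# AGENTBROWSE lets another binary be substituted (an assumption, handy for testing).
finish_session() {
  verdict="$1"   # "success" or anything else, set from the last observe
  # Teardown only: whatever close returns is discarded.
  "${AGENTBROWSE:-agentbrowse}" close >/dev/null 2>&1 || true
  # The function's result depends on the verdict alone.
  [ "$verdict" = "success" ]
}
```

Calling finish_session success after a confirming observe returns 0 even if close itself fails, and finish_session failure returns non-zero even if close exits cleanly.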

More Detail

Open an extra reference only when it helps:

Operating guide for resume and recovery.
Command guide for every CLI command.
Failure recovery for common runtime states.
Boundaries and escalation for safety rules.

If a term (session, ref, targetRef, scopeRef, fillRef, pageRef) is unfamiliar, check the AgentBrowse API reference glossary.

Version History

1 version in total

v0.1.22 (current)

2026-05-03 04:54