Use magicbrowse to reach a target page when your runtime's own
page-control tool cannot do it reliably. "Page-control tool"
means a tool that drives browser pages programmatically and reports page state
back — not the user's desktop browser, and not screen-control of a
browser window. The planner runs two LLM loops per task and is slower
than direct browser control; prefer your own page-control tool
when it suffices. Use magicbrowse to reach a target page (search, navigation,
traversal through non-sensitive screens). At any login, identity, checkout,
donation, subscription, payment, or human-verification page, stop and surface
to the user — do not invent or type credentials, identity data, payment data,
or any value you do not legitimately have.
For a MagicPay product/payment workflow, use the MagicPay workflow-first
recipe instead of treating a standalone MagicBrowse browser as the product
parent: MagicPay starts the product session, then launches or attaches the
browser as a child resource.
Try in order. Do not start at layer 4 just because primitives exist.
control owned by your runtime. Screen-control (computer use) of an
already-open desktop browser does not qualify: takeover needs the
browser's CDP endpoint, a typical desktop browser starts without one,
and CDP cannot be enabled on a running browser without a restart. If
the session may need a MagicBrowse or MagicPay takeover mid-flow,
start from a browser with a known private CDP endpoint.
magicbrowse act "" — DOM-only navigator.magicbrowse act "" --use-vision — same goal, navigatorwith screenshots. Use only when the user is comfortable sending
screenshots/page context for this workflow. Vision is a retry mode
for the same task; keep the granule.
magicbrowse observe + primitives — click , type ,
fill , select ,
press . Use only when vision-mode act cannot make
progress, or when single-element precision is required. A primitive
completed result means the direct action ran; it is not a semantic
proof that the intended page state changed. Re-run observe before the
next decision. press is global — click first if focus matters.
For public navigation tasks, give act the semantic goal and a checkable
terminal condition:
✓ magicbrowse act "navigate to the public page that lists supported regions and stop when the region list is visible"
Avoid manually replaying snapshot ids before act has failed:
✗ magicbrowse observe → magicbrowse click 13 → magicbrowse observe → magicbrowse click 23
magicbrowse doctor first on a fresh install. It verifies thegateway config and reachability.
magicbrowse init (sign up at
https://agents.mercuryo.io/signup).
launch and act once doctor passes.> Consequential actions require approval. magicbrowse may
> navigate, inspect, draft, and prepare. It must stop and ask before
> submitting a form, posting or sending content, accepting terms,
> changing account data or settings, booking, buying, ordering,
> deleting or modifying remote data, or otherwise committing an
> irreversible or account-affecting action. After approval, re-run
> observe and execute only the approved final action. A successful typed
> MagicPay approval counts for that exact payment, signing, or confirmation
> action; ask again only if the approved page facts changed.
> Memory-managed data — never invent. Do not use act, type,
> fill, or select for any of the following on any page:
> - login or signup credentials (email, username, password, OTP),
> - identity-document fields (passport, ID, KYC address, DOB tied to
> identity),
> - payment-card or banking fields (PAN, CVV, expiry, IBAN, account),
> - any value sourced from a Memory or Memory store, or any value you
> do not legitimately have.
>
> Reach the page, stop before entering Memory values, and return the
> handoff to the orchestrator or MagicPay Memory fill workflow. Do not
> guess, placeholder, or fabricate Memory values. Be honest about what
> you cannot do.
> Use act before snapshot primitives. Do not start MagicBrowse work
> with observe plus click/type/select/press/fill before
> attempting act on the same goal. Why: the navigator keeps the goal,
> current page context, and completion check in one planner loop instead of
> spreading them across fragile snapshot ids. Use primitives only after
> DOM-only and vision-mode act cannot make progress, or when the recovery
> step is deliberately single-element.
> Target-ids are snapshot-scoped. Valid only for the observe
> snapshot that produced them. Re-run observe after any primitive that
> may change the page state before the next primitive — reusing an old id
> silently addresses a different element.
>
> ✓ observe → click 12 → observe → type 7 "hello"
> ✗ observe → click 12 → type 7 "hello"
> Primitive completion is not goal completion. For deterministic
> click/type/fill/select/press, status: "completed" means the
> browser action was dispatched through the direct action layer. It does not
> certify that a higher-level page condition is now true. If the next step
> depends on changed page state, observe again and branch on the fresh page
> state. If the task itself needs a completion check, use act with a
> checkable terminal condition instead of interpreting a primitive result as
> task success.
> One workflow per default home. The current-session pointer at
> $MAGICBROWSE_HOME/current-session.json (default ~/.magicbrowse/) is a
> singleton. Concurrent workflows on the same home overwrite each other. For
> parallel use, set a distinct MAGICBROWSE_HOME per workflow, or do not run
> the tasks in parallel.
> Fresh browser by default. Prefer an owned, fresh browser session.
> Use attach, --profile, or --user-data-dir only when the user
> explicitly approves that browser/session for the current task. One
> exception needs no separate approval: attaching to the browser child
> that MagicPay launched inside the current approved product workflow —
> that is the normal in-workflow bind of an owned disposable browser.
> Keep CDP endpoints private. Close the session before unrelated work.
> Page context can leave the browser. LLM-backed act sends page
> state to the gateway; --use-vision can include screenshots. Avoid
> private pages unless the user approves that workflow, and stop at login,
> identity, checkout, donation, subscription, or payment pages.
Contract: launch [url] → act … act → close. Sequential act calls in
one session preserve page state and planner memory.
magicbrowse launch — start a headless owned Chrome sessionpre-placed at the entry URL. Keep browser launches headless unless
the user explicitly asks for a visible browser or you are doing live
debugging. To attach to an existing CDP browser instead, first get
explicit user approval for that endpoint/session:
magicbrowse attach (positional, not a
--cdp-url flag).
magicbrowse act "" — natural-language browser step. Prompt is positional. act does not take --url; you cannot reset
the page from inside act. To re-anchor, close and launch again.
act for the next strategic granule.magicbrowse close — release the session when the overallMagicBrowse-owned browser task is done. If the workflow hands off to
another tool or the user on a sensitive page, keep the browser open until
that handoff completes. After the handoff completes, close only a
MagicBrowse-owned disposable browser that the user is not taking over; do
not close an external/user-owned attach without explicit approval.
magicbrowse run exists in the CLI for one-shot developer use. **It
is not part of this skill contract** — its bundled close destroys
continuity. Do not use it in an orchestrated workflow.
act where theorchestrator needs the next strategic decision. Tactics (which form
field first) live inside act; strategy (this partner is wrong, try
another) lives between act calls.
act; smaller is safer.** maxSteps: 100 is a safety ceiling. The planner
self-validates terminal status, so longer tasks have more room for
false-positive completion. Prefer smaller granules when the success
criterion cannot be checked externally.
task that reaches auth, CAPTCHA, or human verification ends with
status: needs_handoff, not failed. Plan tasks to end at such
a wall, not through it. magicbrowse does not solve CAPTCHA and
does not enter credentials. For a confirmed real CAPTCHA on the current
approved browser session, have the user or an external solver clear it;
after a successful solve, run magicbrowse mark-captcha-resolved before
the next act. Branch on handoff.kind: captcha means solve/mark,
auth means stop for user authentication, identity_verification means
stop for user/KYC handling, and memory_fill means hand off to the
MagicPay Memory fill workflow. Memory-fill handoffs include
resumeObjective; after the approved handler fills the form, continue
with that page-local objective. Never retry the same act against the
same wall. If the page asks for something you cannot legitimately
provide, be honest about it.
actcalls in one session preserve page state and planner memory. Do not
write "as we already found, continue with…" into goals — if you
feel the need to, the granularity is wrong.
on every DOM scan. Describe elements semantically.
act "click target 14"act "click the 'Continue' button under the price summary"criterion.**
act "get to checkout"act "navigate to a checkout page that shows passenger fields and total fare"launch, not as a separate step. To switch sites mid-workflow, either close and re-launch, or
describe the navigation inside the goal text.
> - Element indexes ([14], target 7) in goal text.
> - magicbrowse run for orchestrated multi-step workflows.
> - type / fill / select / act on Memory-managed fields. Stop at
> the form boundary; if act returns a memory-fill handoff, send it to
> the orchestrator or MagicPay Memory fill workflow and then resume with
> handoff.resumeObjective.
> - Letting act submit, post, book, buy, save, delete, or otherwise
> commit an account-affecting action without explicit approval or a matching
> typed MagicPay approval for unchanged page facts.
> - Trying to solve CAPTCHA through magicbrowse. On a confirmed real
> CAPTCHA, have the user or an external solver clear it, then
> magicbrowse mark-captcha-resolved before the next MagicBrowse step.
> - Attaching to a logged-in browser or named profile without explicit
> approval for the current task.
> - Closing a browser that was handed to another tool or the user before the
> overall task is actually done.
> - Re-narrating prior act results into the next goal — sequential
> act calls keep state.
> - Skipping the act-first path and starting at layer 4
> (observe + primitives).
> - Reusing a target-id from before a page mutation.
> - Treating a deterministic primitive's completed status as proof that
> the intended page state changed.
act returns `status: completed | blocked | needs_handoff |
needs_approval | failed | max_steps | cancelled. Branch on status`;
do not parse finalMessage to detect missing input, Memory
handoff, handoff subtype, or approval stops. For blocked, branch on
blockedReason: missing_input | item_unavailable | ambiguous | no_path.
For needs_handoff, branch on
handoff.kind: memory_fill | captcha | auth | identity_verification.
Layer-4 primitives return direct action results. Branch on their status
and reason, but verify page-state assumptions with a fresh observe;
primitive completed is not a substitute for a goal-level completion check.
finalMessage is the explanation to show the user or pass upstream.
Memory-fill handoff details are in handoff.resumeObjective. Exit
code 0 includes blocked, needs_handoff, and needs_approval; it
does not mean success.
example.
rules.
status handling.
共 7 个版本