opencli-browser

The first reader of this CLI is an agent, not a human. Every subcommand returns a structured envelope that tells you exactly what matched, how confident the match is, and what to do if it didn't. Lean on those envelopes — do not guess.

This skill is for driving a live browser to accomplish an agent task. If you are building a reusable adapter under ~/.opencli/clis// use opencli-adapter-author instead.

Prerequisites

opencli doctor

Until doctor is green, nothing else will work. Typical failures: Chrome not running, extension not installed, debug port blocked by 1Password / other extensions. The doctor output tells you which.

Window lifecycle

opencli browser * commands already keep the automation session alive between calls. The window stays open until you run opencli browser close or the idle timeout expires.
--focus (or OPENCLI_WINDOW_FOCUSED=1) opens the automation window in the foreground. Use it when you want to watch the page live.
--live (or OPENCLI_LIVE=1) is mainly for browser-backed adapter commands such as opencli xiaohongshu note .... It keeps the adapter's automation window open after the command returns so you can inspect the final page state.

Mental model

Selector-first target contract. Every interaction command (click, type, select, get text/value/attributes) takes one <target>, which is either a numeric ref from state/find or a CSS selector. Use --nth to disambiguate multiple CSS matches.
Every envelope reports matches_n and match_level. match_level is exact, stable, or reidentified — the CLI already rescued moderate DOM drift for you, but the level tells you how confident to be.
Compact output first, full payload on demand. state is a budget-aware snapshot; get html --as json supports --depth/--children-max/--text-max; network returns shape previews, and you re-fetch a single body with --detail. A giant payload burns context you did not need to spend.
Structured errors are machine-readable. On failure the CLI emits {error: {code, message, hint?, candidates?}}. Branch on code, not on message strings.

Critical rules

Always inspect before you act. Run state or find first. Never hard-code a ref or selector from memory across sessions — indices are per-snapshot.
Prefer numeric ref over CSS once you have it. Numeric refs survive mild DOM shifts because the CLI fingerprints each tagged element. A CSS selector written by hand will break the first time the site re-renders.
Read match_level after every write. exact = all good. stable = the element is the same but some soft attrs drifted — your action still applied. reidentified = the original ref was gone and the CLI found a unique replacement; double-check you hit the right element.
Use the compound field for form controls. Do not regex-guess a date format, do not state twice to get the full .
Verify writes that matter. After type, run get value. After select, run get value. Autocomplete widgets, React controlled inputs, and masked fields can all silently eat characters. The CLI cannot detect this for you.
state → action → state after a page change. Navigations, form submits, and SPA route changes invalidate refs. Take a fresh snapshot. Do not reuse refs from before the transition.
Chain with &&. A chained sequence runs in one shell so refs acquired by the first command stay live for the second. Separate shell invocations lose the session context you just set up.
eval is read-only. Wrap the JS in an IIFE and return JSON. If you need to change the page, use the structured click / type / select / keys commands instead — they produce structured output and fingerprints, eval does not.
Prefer network to screen-scraping. If a page you care about fetches its data from a JSON API, the API is almost always more reliable than scraping the rendered DOM. Capture once, inspect the shape, then --detail the body you need.
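The "verify writes that matter" rule can be sketched as a small read-back helper. This is only an illustration under stated assumptions: the `run` callable is hypothetical (it stands in for however you invoke the CLI and capture stdout), and the argument order `type <target> <text>` and `get value <target>` is assumed rather than confirmed by this document. Only the `value` and `error` envelope fields come from the text above.

```python
import json
from typing import Callable, List

# Hypothetical runner type: takes an argv list, returns the command's stdout.
Runner = Callable[[List[str]], str]


def type_and_verify(run: Runner, target: str, text: str) -> bool:
    """Type into a target, then read the value back and compare.

    Assumption: `opencli browser type <target> <text>` and
    `opencli browser get value <target>` are the command shapes.
    """
    typed = json.loads(run(["opencli", "browser", "type", target, text]))
    if "error" in typed:
        # The write itself failed; branch on typed["error"]["code"] upstream.
        return False
    readback = json.loads(run(["opencli", "browser", "get", "value", target]))
    # A mismatch means the widget silently altered or dropped characters.
    return readback.get("value") == text
```

Injecting the runner keeps the sketch independent of how the session is kept alive, so it works whether the caller chains commands in one shell or not.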

Target contract (<target> for click / type / select / get text|value|attributes)

<target> ::= <numeric-ref> | <css-selector>

Numeric ref — the [N] index from state or find. Cheap, resilient to soft DOM drift.
CSS selector — anything querySelectorAll accepts. Must be unambiguous on write ops, or pair with --nth.
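The two-branch grammar above can be mirrored on the caller side, for example to decide whether --nth could ever apply. A minimal sketch, assuming a numeric ref is always a plain non-negative decimal integer; `classify_target` is a hypothetical helper name, not part of the CLI.

```python
import re

# Assumption: numeric refs look like "3" or "42" (ASCII digits only).
_NUMERIC_REF = re.compile(r"[0-9]+")


def classify_target(target: str) -> str:
    """Return "numeric-ref" or "css-selector" for a <target> string."""
    text = target.strip()
    if not text:
        raise ValueError("empty target")
    # Anything that is not purely digits is treated as a CSS selector.
    return "numeric-ref" if _NUMERIC_REF.fullmatch(text) else "css-selector"


print(classify_target("3"))              # numeric-ref
print(classify_target("button.submit"))  # css-selector
```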

Envelope on success

{ "clicked": true, "target": "3", "matches_n": 1, "match_level": "exact" }

{ "value": "kalevin@example.com", "matches_n": 1, "match_level": "stable" }
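One way to consume these envelopes is to parse them into a small typed record before deciding anything. The sketch below assumes the envelope arrives as a single JSON object on stdout; the `Envelope` class and `parse_envelope` function are hypothetical names introduced for illustration. The field names (`matches_n`, `match_level`, `error.code`) are the ones this document describes.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Envelope:
    ok: bool                     # False when the envelope carries an error
    matches_n: Optional[int]     # number of elements the target matched
    match_level: Optional[str]   # "exact", "stable", or "reidentified"
    error_code: Optional[str]    # machine-readable code on failure
    raw: Dict[str, Any]          # the full parsed payload


def parse_envelope(stdout: str) -> Envelope:
    """Parse one JSON envelope (assumed to be the whole of stdout)."""
    data = json.loads(stdout)
    error = data.get("error")
    if isinstance(error, dict):
        return Envelope(False, None, None, error.get("code"), data)
    return Envelope(True, data.get("matches_n"), data.get("match_level"), None, data)
```

Keeping `raw` around means command-specific fields such as `clicked` or `value` stay available without widening the record.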

match_level

| level | meaning | you should |
|-------|---------|------------|
| exact | Fingerprint agreed on tag + strong IDs with at most one soft drift | Proceed. |
| stable | Tag + strong IDs still agree, soft signals (aria-label, role, text) drifted | Proceed, but if what you typed/clicked matters, re-check with get value or state. |
| reidentified | Original ref was gone; a unique live element matched the fingerprint and was re-tagged with the old ref | Double-check you hit the right element before chaining more writes. |
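The table above translates directly into a lookup that an agent loop can consult after each write. This is a sketch, not the CLI's own logic: the action labels (`proceed`, `proceed-and-verify`, `confirm-before-more-writes`) are hypothetical names chosen here to summarize the "you should" column.

```python
# Hypothetical action labels summarizing the "you should" column.
_POLICY = {
    "exact": "proceed",
    "stable": "proceed-and-verify",
    "reidentified": "confirm-before-more-writes",
}


def next_step(match_level: str) -> str:
    """Map a match_level from an envelope to the caller's next action."""
    try:
        return _POLICY[match_level]
    except KeyError:
        # An unrecognized level should halt the chain, not default to proceed.
        raise ValueError(f"unknown match_level: {match_level!r}") from None
```

Raising on an unknown level is a deliberate choice in this sketch: silently proceeding would defeat the point of reading the level at all.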

Structured error codes

Branch on these, not on the human message:

| code | meaning |
|------|---------|
| not_found | Numeric ref is no longer in the DOM. Re-state. |
| stale_ref | Ref exists but the element at that ref changed identity. Re-state. |
| invalid_selector | CSS was rejected by querySelectorAll. Fix the selector. |
| selector_not_found | CSS matches 0 elements. Try find with a looser selector. |
| selector_ambiguous | CSS matches >1 and no --nth. Add --nth or narrow the selector. |
| selector_nth_out_of_range | --nth beyond match count. |
| option_not_found | select couldn't find an option matching that label/value. Error envelope includes available: string[] of the real option labels. |
| not_a_select | select was called on an element that is not a select. |
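Branching on `code` can be as simple as a dictionary lookup. The sketch below paraphrases the recovery hints from the table; the entry for `selector_nth_out_of_range` and the fallback string are assumptions, since the table gives no explicit action for them. `recovery_for` is a hypothetical helper name.

```python
from typing import Any, Dict

# Recovery hints paraphrased from the table; keyed on error.code only.
RECOVERY = {
    "not_found": "re-run state",
    "stale_ref": "re-run state",
    "invalid_selector": "fix the selector",
    "selector_not_found": "try find with a looser selector",
    "selector_ambiguous": "add --nth or narrow the selector",
    # Assumption: the table names no action for this code.
    "selector_nth_out_of_range": "use an --nth within the match count",
    "option_not_found": "pick a label from the available list",
}


def recovery_for(envelope: Dict[str, Any]) -> str:
    """Choose a recovery hint from the error code, never from the message."""
    code = envelope["error"]["code"]
    # Assumption: unknown codes stop the chain so the hint can be inspected.
    return RECOVERY.get(code, "stop and inspect the hint")
```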
