Win11 Visible Browser Automation

Use this skill when OpenClaw runs in WSL2/Linux but should work in a visible Windows 11 Edge/Chrome browser that the human can watch, use, and take over.

This is for legitimate assisted browsing in a normal visible browser session. Do not use it to bypass site protections, automate prohibited activity, or hide automation from the user.

Safety gate

Before any state-changing action, state what will change, where, the risk, and the rollback, then wait for explicit confirmation. State-changing actions include editing OpenClaw config, creating Scheduled Tasks, changing Windows firewall or portproxy rules, starting or stopping browser processes, writing scripts outside the workspace, submitting forms or messages, making purchases, and other account actions.

ClawScan risk mitigations to preserve:

Prefer a dedicated browser profile by default; use a personal/logged-in profile only after explicit user approval.
Do not proceed with browser, account, payment, form, firewall, portproxy, config, or Scheduled Task changes unless the action and rollback are clear.
Verify Windows firewall rules are scoped to WSL/Hyper-V CIDR; never expose the CDP port to the LAN or Internet.
Create persistent Scheduled Tasks only after explicit approval, and keep rollback documented with Unregister-ScheduledTask.

Positioning

Prefer visible browser automation when the task benefits from:

existing tabs already open in the user's browser;
cookies, logins, extensions, and normal browser state;
visible step-by-step human oversight;
manual human help for login, captcha, 2FA, consent screens, account pickers, file dialogs, or sensitive approvals;
sites that do not work well through web_fetch or a fresh/headless browser.

Use safe wording: this skill gives the agent access to a normal visible browser while keeping the human in the loop. It does not try to bypass anti-bot systems.

Browser Resource Budget / Tab Hygiene

Visible browser control is expensive: it consumes tokens, Edge/Chrome memory, and CDP stability budget. Before using the visible browser, prefer cheaper tools when they satisfy the task.

Cost ladder

Use the cheapest sufficient tool, in this order:

Local files, project notes, memory, or CLI output.
First-class APIs/CLIs (clawhub, openclaw, curl, source-specific tools).
web_fetch for readable public pages.
Browser evaluate for structured extraction from an already-open page.
Browser snapshot/screenshot for UI understanding or evidence.
Visible Edge/Chrome CDP only when the task actually needs logins, cookies, human-in-the-loop steps, or visual verification, or when the site rejects cheaper access.

If the user explicitly asks to open a site/tab in the visible browser, do not debate whether the browser is necessary. Still check basic safety/resource risk first and report if opening another tab may overload the browser.

Existing tabs are not agent-owned

Treat all tabs that existed before the current task as user state.

Do not close, reload, navigate, or repurpose existing user tabs without explicit permission.
Memory used by existing tabs is already reflected in the free-memory figure; do not subtract it a second time in memory estimates.
They still count against CDP complexity: many existing pages/iframes/workers can make automation unstable.
If existing tabs leave too little resource budget, stop and ask for cleanup permission instead of taking over those tabs.

Classify tabs mentally:

User tabs: existed before the task; do not touch.
User-requested tabs: opened because the user explicitly asked; do not close unless asked.
Agent task tabs: opened by the agent for the current task; save useful URLs and close/clean them when done.
Archived tabs: URLs saved into a project file such as browser-tabs-YYYY-MM-DD.md; safe to close only after user approval.
Critical/manual tabs: login, captcha, payment, forms, account settings; human-in-the-loop only.

Preflight budget check

Before non-trivial visible-browser work, estimate both memory budget and CDP complexity budget. Prefer the helper:

{baseDir}/scripts/browser-budget-check.sh win-edge

If the helper is unavailable, inspect CDP directly:

WIN_IP=$(ip route | awk '/default/ {print $3; exit}')
curl -sS --max-time 8 "http://$WIN_IP:9223/json/list"

Decision inputs:

current page, iframe, and worker target counts;
whether reCAPTCHA/service workers are present;
current browser memory when measurable;
free system memory after existing user tabs;
minimum number of new tabs required by the task;
whether the task can use one reusable tab instead.

Approximate planning costs:

simple/static tab: 50-150 MB;
normal web app tab: 100-250 MB;
heavy SPA/account dashboard: 250+ MB;
iframe/worker-heavy or reCAPTCHA site: treat as high risk; do not open cards in parallel.

Memory rule of thumb:

keep at least 1 GB safety headroom;
if free memory after headroom is < estimated task cost, do not start;
if unsure, use one tab and write progress to project files rather than opening more tabs.
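
The memory rule above is plain arithmetic and can be sketched as a small shell check. This is a minimal sketch, not the bundled helper: the name mem_budget_ok and its argument order are illustrative, and it assumes the free-memory figure (in MB) has already been measured on the Windows host, since that is where the browser runs.

```shell
# Usage: mem_budget_ok FREE_MB TASK_COST_MB [HEADROOM_MB]
# Succeeds (exit 0) only if the task still fits after reserving the headroom.
# mem_budget_ok is a hypothetical name, not part of the bundled scripts.
mem_budget_ok() {
  free_mb=$1
  cost_mb=$2
  headroom_mb=${3:-1024}            # "at least 1 GB safety headroom"
  usable_mb=$((free_mb - headroom_mb))
  [ "$usable_mb" -ge "$cost_mb" ]
}

# Example: two normal web app tabs at the upper planning estimate (250 MB each)
# with 1400 MB free. 1400 - 1024 = 376 MB usable, which is below 500 MB.
if mem_budget_ok 1400 $((2 * 250)); then
  echo "start"
else
  echo "do not start"
fi
```

With these example numbers the check prints "do not start". Planning with the upper end of each cost range keeps the estimate conservative.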

CDP complexity stop rules:

targets > 30: stop and propose inventory/cleanup;
pages > 10: caution; avoid opening more tabs unless explicitly needed;
iframe + worker > max(6, pages * 2): stop; the site is spawning too much browser state;
any reCAPTCHA burst: stop automation and switch to human-in-the-loop or project-file workflow;
repeated targetId, Execution context destroyed, or timeout errors: refresh target inventory instead of retrying blindly.
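
The three numeric stop rules above can be sketched as one function. This is an illustration under stated assumptions, not the logic of browser-budget-check.sh: cdp_budget_verdict is a hypothetical name, the counts are passed in by the caller, and the reCAPTCHA and repeated-error rules are left to human judgment.

```shell
# Usage: cdp_budget_verdict PAGES IFRAMES WORKERS [TOTAL_TARGETS]
# Prints one of: stop, caution, ok.
# If TOTAL_TARGETS is omitted, it is taken as the sum of the three counts.
cdp_budget_verdict() {
  pages=$1
  iframes=$2
  workers=$3
  targets=${4:-$((pages + iframes + workers))}

  # limit = max(6, pages * 2)
  limit=$((pages * 2))
  if [ "$limit" -lt 6 ]; then
    limit=6
  fi

  if [ "$targets" -gt 30 ]; then
    echo stop            # too many targets: propose inventory/cleanup
  elif [ $((iframes + workers)) -gt "$limit" ]; then
    echo stop            # site is spawning too much browser state
  elif [ "$pages" -gt 10 ]; then
    echo caution         # avoid opening more tabs unless explicitly needed
  else
    echo ok
  fi
}

# Example: 4 pages, 6 iframes, 5 workers -> 11 > max(6, 8), so this prints "stop".
cdp_budget_verdict 4 6 5
```

One way to obtain the counts, assuming jq is installed in WSL, is to tally the type field of each entry returned by /json/list: curl -sS --max-time 8 "http://$WIN_IP:9223/json/list" | jq -r '.[].type' | sort | uniq -c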

Minimal-tab workflow

Do not use the browser as task memory.

Default to one list/search tab and, if needed, one reusable detail tab.
Do not open a fan-out of many result cards/resumes/products.
Extract links from a list into a project file first.
Visit/detail one item at a time, record the result, then reuse or close the tab.
Prefer evaluate to extract structured data; use snapshots only when the UI structure is unknown.
Keep final chat replies compact: summary + path to saved project file, not full DOM dumps.

Cleanup and archiving

At the end of a browser task:

Save useful URLs/data into the relevant project (browser-tabs-YYYY-MM-DD.md, candidates.md, sources.md, progress.md, etc.).
Report which agent-created tabs can be closed.
Close only agent-owned or user-approved archived tabs.
Leave user tabs and user-requested tabs alone.

For cleanup of a polluted browser session, first create a read-only inventory grouped by domain/type, deduplicate URLs, and write it to the project. Only after that, ask permission to close the relevant domain/targets.
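
A read-only inventory of this kind can be sketched with standard text tools. The function below only reads URLs from stdin and prints text; it never touches the browser. The name tab_inventory and the tab-separated output format are illustrative choices, not something the skill prescribes.

```shell
# Reads one URL per line on stdin and prints "domain<TAB>url" lines,
# deduplicated and sorted so entries for the same domain are adjacent.
# Non-http(s) entries (about:blank, edge://, chrome://) are skipped.
# tab_inventory is a hypothetical helper name.
tab_inventory() {
  awk -F/ '/^https?:\/\// { print $3 "\t" $0 }' | sort -u
}

# Example with literal URLs; the duplicate and about:blank are dropped.
printf '%s\n' \
  'https://b.example/x' \
  'https://a.example/1' \
  'https://b.example/x' \
  'about:blank' | tab_inventory
```

To feed it from the live browser, the page URLs can be taken from /json/list, for example with jq (assuming it is installed): jq -r '.[] | select(.type == "page") | .url'. Redirect the result into the project file before asking for permission to close anything.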

Recommended architecture

Use a dedicated Windows browser profile by default. Use the user's personal browser profile only when the user explicitly wants existing personal cookies/logins/tabs.

OpenClaw Gateway in WSL2
  → OpenClaw browser profile (example: win-edge)
  → http://WINDOWS_WSL_GATEWAY_IP:9223
  → Windows portproxy/firewall relay
  → 127.0.0.1:9222
  → visible Windows 11 Edge/Chrome profile

Recommended defaults:

OpenClaw browser profile: win-edge or win-chrome
Windows CDP local port: 9222
WSL-visible relay port: 9223
Dedicated browser profile: C:\ProgramData\OpenClaw\browser-profile
Startup/repair task: OpenClaw-Start-Windows-Browser-CDP

For implementation details, read {baseDir}/references/setup.md.

Diagnose first

Run read-only checks before repair:

openclaw browser profiles
openclaw browser --browser-profile win-edge doctor
WIN_IP=$(ip route | awk '/default/ {print $3; exit}')
curl -sS --max-time 5 "http://$WIN_IP:9223/json/version"

Or use the bundled helper:

{baseDir}/scripts/check-win11-visible-browser.sh win-edge

If CDP works, smoke-test real browser control:

openclaw browser --browser-profile win-edge open https://example.com
openclaw browser --browser-profile win-edge snapshot --format aria

Repair order

Repair in layers and stop when the layer works:

Confirm Windows Edge/Chrome is installed and can run visibly.
Start the browser with CDP on Windows localhost, usually 127.0.0.1:9222.
Expose it to WSL with a Windows relay/portproxy, usually 0.0.0.0:9223 → 127.0.0.1:9222.
Restrict Windows firewall to the current WSL/Hyper-V CIDR, not the whole LAN or Internet.
Configure an OpenClaw browser profile with cdpUrl pointing to the WSL-visible Windows endpoint and attachOnly: true.
Reload/restart Gateway if the profile is not visible.
Run doctor and a page/snapshot smoke test.

The bundled Windows repair script is {baseDir}/scripts/start-win11-browser-cdp-for-openclaw.ps1. Treat it as a template: review paths, profile name, browser path, ports, and firewall rule names before installing or running it.
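
To support the safety gate, the relay and firewall layers can be presented to the user as a dry-run plan before anything is changed. The sketch below only prints Windows commands and their rollback; it executes nothing. It is not the bundled script and may differ from it: print_relay_plan and the firewall rule name are hypothetical, and the CIDR in the example is a placeholder for the actual WSL/Hyper-V subnet.

```shell
# Usage: print_relay_plan WSL_CIDR [RELAY_PORT] [CDP_PORT]
# Prints the Windows-side commands and their rollback. Executes nothing.
print_relay_plan() {
  cidr=$1
  relay_port=${2:-9223}
  cdp_port=${3:-9222}
  rule_name="OpenClaw CDP relay (WSL only)"   # hypothetical rule name

  cat <<EOF
# Apply (elevated PowerShell on Windows):
netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=$relay_port connectaddress=127.0.0.1 connectport=$cdp_port
New-NetFirewallRule -DisplayName "$rule_name" -Direction Inbound -Action Allow -Protocol TCP -LocalPort $relay_port -RemoteAddress $cidr

# Rollback:
netsh interface portproxy delete v4tov4 listenaddress=0.0.0.0 listenport=$relay_port
Remove-NetFirewallRule -DisplayName "$rule_name"
EOF
}

# Example: placeholder subnet; substitute the current WSL/Hyper-V CIDR.
print_relay_plan 172.20.0.0/20
```

Printing the apply and rollback commands side by side gives the user the what, where, and rollback in one place before they confirm.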

Common blockers

No supported browser found: WSL cannot launch Windows Edge/Chrome as a local Linux browser; use remote CDP.
Windows CDP works but WSL curl times out: fix portproxy/firewall/WSL subnet.
Browser profile not found: OpenClaw config not loaded; reload/restart Gateway.
WSL gateway IP changed: update browser.profiles.<profile>.cdpUrl or rerun the documented repair flow.
Existing tabs/logins are missing: you are probably using a dedicated profile, not the user's real profile. Ask before switching profiles.
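
The "WSL gateway IP changed" blocker can be detected by comparing the host in the configured CDP URL with the current default gateway. A minimal sketch follows; both function names are hypothetical, and reading the configured URL out of the OpenClaw config is left to the caller because the config location is not covered here.

```shell
# Prints the host part of a URL, e.g. http://172.20.0.1:9223/json -> 172.20.0.1
cdp_url_host() {
  printf '%s\n' "$1" | sed -E 's#^[a-z]+://([^:/]+).*#\1#'
}

# Usage: cdp_ip_drifted CONFIGURED_CDP_URL CURRENT_GATEWAY_IP
# Succeeds (exit 0) when the configured host no longer matches the gateway.
cdp_ip_drifted() {
  [ "$(cdp_url_host "$1")" != "$2" ]
}

# Example with literal values; in practice the second argument would come from
#   ip route | awk '/default/ {print $3; exit}'
if cdp_ip_drifted "http://172.20.0.1:9223" "172.24.16.1"; then
  echo "gateway IP changed: update cdpUrl"
fi
```

The check is read-only, so it can run before every session without passing through the safety gate; changing cdpUrl afterwards still requires confirmation.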

Evidence to report

When done, report:

browser profile name and CDP URL tested;
openclaw browser --browser-profile <profile> doctor result;
/json/version result from WSL;
Windows task/log status if relevant;
smoke-test URL opened and snapshot result;
any remaining manual human step needed.

Data extraction

For structured data extraction (prices, search results, product specs, availability):

Snapshot the page once to understand the layout.
Use act kind=evaluate with a JavaScript function to extract clean data as JSON/strings in a single call.
Repeat evaluate for pagination or updated data; no new snapshot needed unless the DOM structure changes.

This typically uses far fewer tokens than a snapshot-per-action loop, because only the extracted fields are returned to the model rather than the full page tree.
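
An extraction function for the evaluate step might look like the one below. It is wrapped in a shell function only so it can be reviewed or saved to the project before use. The selectors (.product-card, .title, .price) are hypothetical and must be replaced with the real ones found in the snapshot. How the function text is handed to evaluate depends on the OpenClaw tool interface and is not shown.

```shell
# Prints a JavaScript arrow function that returns one object per result card.
# extract_fn and all CSS selectors below are illustrative assumptions.
extract_fn() {
  cat <<'JS'
() => Array.from(document.querySelectorAll('.product-card')).map((card) => ({
  name: card.querySelector('.title')?.textContent.trim() ?? null,
  price: card.querySelector('.price')?.textContent.trim() ?? null,
  link: card.querySelector('a[href]')?.href ?? null,
}))
JS
}

# Show the function text for review.
extract_fn
```

Returning null for missing fields keeps every row the same shape, which makes it easier to spot a selector that stopped matching after a layout change.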

Tips:

If content loads lazily on scroll, scroll the container into view via evaluate before extraction.
Extract all visible results in one pass: name, price, seller, delivery date, link.
Some sites (Ozon, Wildberries, М.Видео) trigger anti-bot challenges on web_fetch but load normally in the visible CDP-attached browser.
Яндекс Маркет generally works with both web_fetch and the visible browser.

Visual result presentation

Beyond text, the visible browser lets you show results directly to the user:

Open result tabs: after finding a product/article/video, open it in a labeled browser tab so the user can see and interact with it in real time.
Screenshot capture: take a screenshot of the relevant page section and attach it to your response for instant visual confirmation.
Multi-tab orchestration: when the user wants a side-by-side comparison and the resource budget allows, open a small number of results with distinct labels (label="product-1", label="product-2") so the user can compare them visually while you summarise.
Article/video handoff: for tutorials or reviews, open the content in a tab and snapshot the key section so the user can continue watching/reading.
Evidence delivery: when a specific piece of information is critical (price, address, phone, delivery date), snapshot exactly that block and deliver it for verification.
