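Browser automation skill for AI agents. Through the Chrome AI Action (CAA) bridge service, it controls the Chrome browser programmatically in Puppeteer (CDP) mode and supports 60+ operations, including navigation, clicking, typing, screenshots, content extraction, network interception, cookie management, and PDF export.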
| Scenario | Invoke |
|---|---|
| User asks to browse a web page, search, fill forms, extract data | Yes |
| User needs screenshots of a web page | Yes |
| User wants to automate browser interactions | Yes |
| User asks about writing code / debugging (no browser involved) | No |
> The bridge automatically encodes non-ASCII characters (Chinese, etc.) in the URL. The agent can pass Chinese characters directly in the URL — the bridge will handle encoding.
{"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=妻子的浪漫旅行"}}
> Only encode manually if you are in a terminal environment (PowerShell/cmd) where Chinese characters get garbled during input:
>
> 1. encodeURIComponent('妻子的浪漫旅行') → %E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C
> 2. Build URL: https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C
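
The manual encoding steps above can be sketched in Node.js as follows. `buildSearchUrl` is a hypothetical helper name used only for illustration; `encodeURIComponent` is the standard JavaScript built-in.

```javascript
// Hypothetical helper: percent-encodes the keyword and appends it to the
// Baidu search URL used in the example above.
function buildSearchUrl(keyword) {
  return `https://www.baidu.com/s?wd=${encodeURIComponent(keyword)}`;
}

console.log(buildSearchUrl('妻子的浪漫旅行'));
```

The resulting string can be passed as the `url` parameter of `navigate` when raw Chinese input is not reliable.
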
| Requirement | Check | Auto-resolve |
|---|---|---|
| Chrome / Chromium installed | Detected automatically (common install paths) | No (user must install) |
| Chrome running with CDP | Detected on startup | Yes (auto-launched) |
| Node.js 18+ | node --version | No |
When loaded for the first time, the agent MUST run the startup script. The script runs the bridge as a background child process — the agent does NOT need to manage the process separately.
node <skill_dir>/scripts/startup.js
1. GET /health on port 9876 → skip if OK
2. npm list -g chrome-ai-action → installs via npm install -g chrome-ai-action@2.0.2 if missing
3. chrome-ai-action --port 9876, waits for health check

| Variable | Default | Description |
|---|---|---|
| CAA_BRIDGE_PORT | 9876 | Bridge HTTP server port |
| CAA_STARTUP_TIMEOUT | 30000 | Max wait for bridge ready (ms) |
| CHROME_PATH | auto-detect | Custom Chrome executable path |
| CHROME_USER_DATA_DIR | platform-dependent | Chrome profile directory |
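
As an illustration of how these variables and their defaults fit together, here is a minimal sketch. How `startup.js` actually reads them is not shown in this document, so `resolveBridgeConfig` and the shape of the returned object are assumptions.

```javascript
// Hypothetical helper: resolves the documented environment variables,
// falling back to the defaults listed in the table above.
function resolveBridgeConfig(env = process.env) {
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value ?? '', 10);
    return Number.isFinite(n) ? n : fallback;
  };
  return {
    port: toInt(env.CAA_BRIDGE_PORT, 9876),
    startupTimeoutMs: toInt(env.CAA_STARTUP_TIMEOUT, 30000),
    // null stands for "auto-detect" / "platform-dependent" in this sketch.
    chromePath: env.CHROME_PATH || null,
    userDataDir: env.CHROME_USER_DATA_DIR || null,
  };
}

console.log(resolveBridgeConfig({ CAA_BRIDGE_PORT: '9900' }));
```
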
Endpoint: http://127.0.0.1:9876/
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check: returns bridge and CDP status |
| GET | /schema | Full action schema (64+ actions) |
| POST | / | Execute action(s) |
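
The /health endpoint can be polled to find out when the bridge is ready. The sketch below assumes that any 2xx status means "healthy", because the response body format is not specified here; `waitForHealth` and its options are hypothetical names. It relies on the global `fetch` available in Node.js 18+.

```javascript
// Hypothetical helper: polls GET /health until it answers with a 2xx status
// or the timeout elapses. Resolves to true when healthy, false on timeout.
async function waitForHealth({
  port = 9876,
  timeoutMs = 30000,
  intervalMs = 500,
  fetchImpl = fetch, // injectable so the loop can be exercised without a bridge
} = {}) {
  const url = `http://127.0.0.1:${port}/health`;
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetchImpl(url);
      if (res.ok) return true;
    } catch {
      // Connection refused: the bridge is not listening yet.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Usage (requires a running bridge):
// waitForHealth().then((ready) => console.log('bridge ready:', ready));
```
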
{"type": "action", "action": "<ACTION>", "params": {...}, "requestId": "optional-id"}
{"type": "batch", "actions": [
{"action": "navigate", "params": {"url": "https://example.com"}},
{"action": "getTitle"}
]}
{"success": true, "data": {...}, "requestId": "req-1", "timestamp": 1712345678901}
{"success": false, "error": {"code": "ACTION_ERROR", "message": "..."}, "requestId": "req-1", "timestamp": 1712345678901}
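
The request and response envelopes above can be built and unpacked with a few small helpers. This is a sketch only: `actionRequest`, `batchRequest`, and `unwrapResponse` are hypothetical names, and only the fields shown in the examples above are used.

```javascript
// Builds a single-action request; requestId is optional, as in the example.
function actionRequest(action, params = {}, requestId) {
  const req = { type: 'action', action, params };
  if (requestId !== undefined) req.requestId = requestId;
  return req;
}

// Builds a batch request from entries shaped like { action, params? }.
function batchRequest(actions) {
  return { type: 'batch', actions };
}

// Returns `data` on success; throws an Error carrying the bridge error code otherwise.
function unwrapResponse(res) {
  if (res.success) return res.data;
  const { code = 'UNKNOWN', message = '' } = res.error ?? {};
  const err = new Error(`${code}: ${message}`);
  err.code = code;
  throw err;
}

console.log(JSON.stringify(actionRequest('getTitle', {}, 'req-1')));
```

Throwing on `success: false` keeps calling code linear, and the `code` property remains available for the error handling described later.
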
navigate, goBack, goForward, reload, getUrl, getTitle
getText, getHtml, getLinks, getImages, getHeadings, getMetaTags, getFormFields, getFocusableElements
click, type, pressKey, scroll, scrollIntoView, findElement, focus, hover, select
getValue, getAttribute, getAttributeAll, getBoundingBox, getCookies, getPerformanceMetrics, getSelectedValue, getSelectOptions
evaluate, injectScript, injectCSS
screenshot (PNG/JPEG), getPdf (A4/Letter)
listTabs, newTab, closeTab, switchTab, getCurrentTab
waitForElement, waitForTimeout, waitForNavigation
setCookie, deleteCookie
blockUrls, unblockUrls, mockResponse, getNetworkRequests, clearNetworkRequests
getLocalStorage, setLocalStorage, removeLocalStorage, clearLocalStorage
uploadFile, setInputFiles, downloadFile
getViewport, setViewport
getConsoleLogs, clearConsoleLogs
getAccessibilityTree
ping, connect, disconnect, getBrowserInfo, highlight, dispatchEvent
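
One way to catch misspelled action names before sending a request is to check them against the names listed above. The sketch below copies that list into a local set; the authoritative list is whatever GET /schema returns, so treat this as a convenience check. `findUnknownActions` is a hypothetical helper.

```javascript
// Action names copied from the lists above.
const KNOWN_ACTIONS = new Set([
  'navigate', 'goBack', 'goForward', 'reload', 'getUrl', 'getTitle',
  'getText', 'getHtml', 'getLinks', 'getImages', 'getHeadings', 'getMetaTags',
  'getFormFields', 'getFocusableElements',
  'click', 'type', 'pressKey', 'scroll', 'scrollIntoView', 'findElement',
  'focus', 'hover', 'select',
  'getValue', 'getAttribute', 'getAttributeAll', 'getBoundingBox', 'getCookies',
  'getPerformanceMetrics', 'getSelectedValue', 'getSelectOptions',
  'evaluate', 'injectScript', 'injectCSS',
  'screenshot', 'getPdf',
  'listTabs', 'newTab', 'closeTab', 'switchTab', 'getCurrentTab',
  'waitForElement', 'waitForTimeout', 'waitForNavigation',
  'setCookie', 'deleteCookie',
  'blockUrls', 'unblockUrls', 'mockResponse', 'getNetworkRequests', 'clearNetworkRequests',
  'getLocalStorage', 'setLocalStorage', 'removeLocalStorage', 'clearLocalStorage',
  'uploadFile', 'setInputFiles', 'downloadFile',
  'getViewport', 'setViewport',
  'getConsoleLogs', 'clearConsoleLogs',
  'getAccessibilityTree',
  'ping', 'connect', 'disconnect', 'getBrowserInfo', 'highlight', 'dispatchEvent',
]);

// Returns the action names in a batch that do not appear in the list above.
function findUnknownActions(batch) {
  return batch.actions.map((a) => a.action).filter((name) => !KNOWN_ACTIONS.has(name));
}

console.log(findUnknownActions({ type: 'batch', actions: [{ action: 'navigate' }, { action: 'getTitel' }] }));
```
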
1. navigate → go to target URL (the bridge encodes Chinese in query params; encode manually only if terminal input is garbled)
2. waitForElement → wait for key content
3. getText / getHtml / getLinks → understand page
4. click / type / pressKey → perform actions
5. getText / screenshot / evaluate → get results
6. screenshot → visually verify

{"type": "batch", "actions": [
{"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C"}},
{"action": "waitForTimeout", "params": {"ms": 2000}},
{"action": "getText"}
]}
{"type": "batch", "actions": [
{"action": "navigate", "params": {"url": "https://example.com/login"}},
{"action": "waitForElement", "params": {"selector": "input[name=username]", "timeout": 10000}},
{"action": "type", "params": {"selector": "input[name=username]", "value": "myuser"}},
{"action": "type", "params": {"selector": "input[name=password]", "value": "mypassword"}},
{"action": "click", "params": {"selector": "button[type=submit]"}},
{"action": "waitForTimeout", "params": {"ms": 3000}},
{"action": "getCurrentTab"}
]}
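
A batch like the login example above could be generated and sent from Node.js roughly as follows. This is a sketch under assumptions: `loginBatch`, `toFetchArgs`, and `sendToBridge` are hypothetical helpers, the `Content-Type` header is assumed rather than documented, and the response is returned as parsed JSON without interpretation because the batch response shape is not shown here.

```javascript
const BRIDGE_URL = 'http://127.0.0.1:9876/';

// Parameterized version of the login batch shown above.
function loginBatch({ url, username, password }) {
  return {
    type: 'batch',
    actions: [
      { action: 'navigate', params: { url } },
      { action: 'waitForElement', params: { selector: 'input[name=username]', timeout: 10000 } },
      { action: 'type', params: { selector: 'input[name=username]', value: username } },
      { action: 'type', params: { selector: 'input[name=password]', value: password } },
      { action: 'click', params: { selector: 'button[type=submit]' } },
      { action: 'waitForTimeout', params: { ms: 3000 } },
      { action: 'getCurrentTab' },
    ],
  };
}

// Builds the [url, init] pair for fetch; kept separate so it can be inspected.
function toFetchArgs(payload, url = BRIDGE_URL) {
  return [url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }, // assumed header
    body: JSON.stringify(payload),
  }];
}

// POSTs the payload to the bridge and returns the parsed JSON response.
async function sendToBridge(payload, fetchImpl = fetch) {
  const res = await fetchImpl(...toFetchArgs(payload));
  return res.json();
}

// Usage (requires a running bridge):
// sendToBridge(loginBatch({ url: 'https://example.com/login', username: 'myuser', password: 'mypassword' }))
//   .then(console.log);
```
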
| Error Code | Meaning | Resolution |
|---|---|---|
| CDP_NOT_CONNECTED | Chrome not running with debug port | Bridge auto-launches Chrome, retries every 3s |
| ACTION_ERROR | Action execution failed | Check params, use getFocusableElements to find elements first |
| INVALID_REQUEST | Malformed request | Check request format |
| PARSE_ERROR | JSON parse failure | Send valid JSON |
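
The table above can be turned into a small lookup that tells a caller whether a retry is worthwhile. This is a sketch, not bridge behavior: `planRecovery` is a hypothetical helper, and the 3000 ms delay simply mirrors the bridge's documented 3-second retry interval.

```javascript
// Recovery hints derived from the error table above.
const RESOLUTIONS = {
  CDP_NOT_CONNECTED: { retry: true, delayMs: 3000, hint: 'Bridge is auto-launching Chrome; retry shortly' },
  ACTION_ERROR: { retry: false, hint: 'Check params; use getFocusableElements to find elements first' },
  INVALID_REQUEST: { retry: false, hint: 'Check request format' },
  PARSE_ERROR: { retry: false, hint: 'Send valid JSON' },
};

// Returns null for a successful response, otherwise a recovery plan.
function planRecovery(response) {
  if (response.success) return null;
  const code = response.error?.code;
  return RESOLUTIONS[code] ?? { retry: false, hint: `Unrecognized error code: ${code}` };
}

console.log(planRecovery({ success: false, error: { code: 'CDP_NOT_CONNECTED', message: '...' } }));
```
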
When you don't know what elements are on a page:
- getFocusableElements → all interactive elements (with positions)
- getFormFields → all form inputs with metadata
- getLinks → all links on page
- getHeadings → understand page structure
- getText → all visible text

Related files:

- references/bridge-api.md: Complete API reference with all 64+ actions
- references/setup-guide.md: Detailed setup and troubleshooting
- scripts/startup.js: Startup automation script