
Chrome AI Action — Browser Automation Skill

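A browser automation skill for AI agents. Through the Chrome AI Action (CAA) bridge service, it controls Chrome programmatically in Puppeteer (CDP) mode and supports 64+ operations, including navigation, clicking, typing, screenshots, content extraction, network interception, cookie management, and PDF export.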

When to Use

Scenario	Invoke?
---	---
User asks to browse a web page, search, fill forms, extract data	Yes
User needs screenshots of a web page	Yes
User wants to automate browser interactions	Yes
User asks about writing code / debugging (no browser involved)	No


⚠️ Chinese URL Encoding

> The bridge automatically percent-encodes non-ASCII characters (Chinese, etc.) in the URL, so the agent can pass Chinese characters directly in the URL:

{"type": "action", "action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=妻子的浪漫旅行"}}

> Encode manually only if you are in a terminal environment (PowerShell/cmd) where Chinese characters get garbled on input:
>
> 1. encodeURIComponent('妻子的浪漫旅行') → %E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C
> 2. Build the URL: https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C
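When the agent can run code instead of typing into a terminal, it can reproduce the manual step programmatically. The following is a minimal Python sketch that assumes the same Baidu search URL as above. `urllib.parse.quote` with `safe=""` percent-encodes the UTF-8 bytes of the term. For Chinese text the result matches `encodeURIComponent`, although `quote` also escapes a few ASCII characters (`!'()*`) that `encodeURIComponent` leaves alone. `build_search_url` is a hypothetical helper.

```python
from urllib.parse import quote


def build_search_url(term: str) -> str:
    """Build a Baidu search URL with the term percent-encoded as UTF-8.

    build_search_url is a hypothetical helper used for illustration only.
    """
    # safe="" escapes "/" too, the same way encodeURIComponent does
    return "https://www.baidu.com/s?wd=" + quote(term, safe="")


print(build_search_url("妻子的浪漫旅行"))
# https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C
```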

Prerequisites

Requirement	Check	Auto-resolve
---	---	---
Chrome / Chromium installed	Common install paths detected automatically	No (user must install)
Chrome running with CDP	Detected on startup	Yes (auto-launched)
Node.js 18+	`node --version`	No


Startup Protocol

When the skill is loaded for the first time, the agent MUST run the startup script. The script starts the bridge as a background child process, so the agent does NOT need to manage that process separately.

node <skill_dir>/scripts/startup.js

What it does

1. Check whether the bridge is already running: GET /health on port 9876. If it responds OK, the remaining steps are skipped.
2. Ensure the npm package is installed: npm list -g chrome-ai-action. If it is missing, install it with npm install -g chrome-ai-action@2.0.2.
3. Start the bridge: chrome-ai-action --port 9876, then wait for the health check to pass.
4. Auto-launch Chrome: if Chrome is not running with CDP, the bridge starts it automatically (cross-platform).
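The same four steps can be approximated from other languages. The following is a minimal Python sketch of that sequence. It is not the actual contents of `scripts/startup.js`. It assumes that `npm` and the `chrome-ai-action` binary are on `PATH`, and on Windows `npm` may need to be invoked as `npm.cmd`. As described above, launching Chrome (step 4) is left to the bridge. All helper names are hypothetical.

```python
import subprocess
import time
import urllib.request


def bridge_healthy(base_url: str) -> bool:
    """Step 1: True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False


def wait_until(check, timeout_s: float, interval_s: float = 0.5) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False


def ensure_bridge(port: int = 9876, timeout_s: float = 30.0) -> None:
    """Hypothetical re-creation of the startup sequence (steps 1-3)."""
    base_url = f"http://127.0.0.1:{port}"
    if bridge_healthy(base_url):
        return  # already running: skip the remaining steps
    # Step 2 (assumption: `npm list -g` exits non-zero when the package is absent)
    listed = subprocess.run(["npm", "list", "-g", "chrome-ai-action"], capture_output=True)
    if listed.returncode != 0:
        subprocess.run(["npm", "install", "-g", "chrome-ai-action@2.0.2"], check=True)
    # Step 3: start the bridge in the background; it launches Chrome itself (step 4)
    subprocess.Popen(["chrome-ai-action", "--port", str(port)],
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if not wait_until(lambda: bridge_healthy(base_url), timeout_s):
        raise TimeoutError(f"bridge not healthy on port {port} after {timeout_s}s")
```

Polling `/health` rather than sleeping for a fixed time lets the caller continue as soon as the bridge is ready, with `timeout_s` playing the role of `CAA_STARTUP_TIMEOUT`.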

Environment Variables

Variable	Default	Description
---	---	---
`CAA_BRIDGE_PORT`	`9876`	Bridge HTTP server port
`CAA_STARTUP_TIMEOUT`	`30000`	Maximum time to wait for the bridge to become ready (ms)
`CHROME_PATH`	auto-detect	Custom Chrome executable path
`CHROME_USER_DATA_DIR`	platform-dependent	Chrome profile directory
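An agent-side helper can read the same variables and apply the documented defaults. The following is a minimal Python sketch. `load_config` and the keys it returns are hypothetical. A `CHROME_PATH` or `CHROME_USER_DATA_DIR` value of `None` simply means "let the bridge decide."

```python
import os
from typing import Mapping, Optional


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the CAA variables, falling back to the defaults in the table above."""
    port = int(env.get("CAA_BRIDGE_PORT", "9876"))
    timeout_ms = int(env.get("CAA_STARTUP_TIMEOUT", "30000"))
    chrome_path: Optional[str] = env.get("CHROME_PATH")  # None -> bridge auto-detects
    user_data_dir: Optional[str] = env.get("CHROME_USER_DATA_DIR")  # None -> platform default
    return {
        "base_url": f"http://127.0.0.1:{port}",
        "startup_timeout_s": timeout_ms / 1000,
        "chrome_path": chrome_path,
        "user_data_dir": user_data_dir,
    }
```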

API Protocol

Base URL: http://127.0.0.1:9876/ (the port is the value of `CAA_BRIDGE_PORT`, default 9876)

Endpoints

Method	Path	Description
---	---	---
`GET`	`/health`	Health check — returns bridge & CDP status
`GET`	`/schema`	Full action schema (64+ actions)
`POST`	`/`	Execute action(s)
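Each of the three endpoints maps onto a plain HTTP call. The following is a minimal sketch that uses only the Python standard library. Sending `Content-Type: application/json` on the POST is an assumption, since the bridge may not require it. `build_request` and `send` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://127.0.0.1:9876"


def build_request(method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """GET /health, GET /schema, or POST / with a JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(req: urllib.request.Request, timeout_s: float = 60.0) -> Any:
    """Perform the call and decode the JSON body (non-2xx statuses raise HTTPError)."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running bridge):
# health = send(build_request("GET", "/health"))
# schema = send(build_request("GET", "/schema"))
# result = send(build_request("POST", "/", {"type": "action", "action": "getTitle"}))
```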

Request Format

{"type": "action", "action": "<ACTION>", "params": {...}, "requestId": "optional-id"}
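A single-action body has only three parts to fill in. The following Python sketch assembles one. `make_action` is hypothetical. The sketch omits `params` when none are given, which matches how `getTitle` is written in the batch example. It also generates a random `requestId` when the caller supplies none.

```python
import json
import uuid
from typing import Optional


def make_action(action: str, params: Optional[dict] = None,
                request_id: Optional[str] = None) -> dict:
    """Build a {"type": "action", ...} request body."""
    body: dict = {"type": "action", "action": action}
    if params is not None:
        body["params"] = params
    # requestId is optional; a generated one makes responses easy to correlate
    body["requestId"] = request_id or f"req-{uuid.uuid4().hex[:8]}"
    return body


print(json.dumps(make_action("navigate", {"url": "https://example.com"}, "req-1")))
```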

Batch Request

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://example.com"}},
  {"action": "getTitle"}
]}
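Each entry in a batch body is an action without its own `type`. The following Python sketch produces exactly the JSON shown above from `(action, params)` pairs. `make_batch` is hypothetical. The sketch makes no assumption about how the bridge reports a step that fails partway through a batch.

```python
from typing import Iterable, Optional, Tuple

Step = Tuple[str, Optional[dict]]


def make_batch(steps: Iterable[Step]) -> dict:
    """Build a {"type": "batch", "actions": [...]} body from (action, params) pairs."""
    actions = []
    for action, params in steps:
        entry: dict = {"action": action}
        if params is not None:  # parameterless actions carry no "params" key
            entry["params"] = params
        actions.append(entry)
    return {"type": "batch", "actions": actions}


batch = make_batch([
    ("navigate", {"url": "https://example.com"}),
    ("getTitle", None),
])
```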

Response Format

{"success": true, "data": {...}, "requestId": "req-1", "timestamp": 1712345678901}

Error Response

{"success": false, "error": {"code": "ACTION_ERROR", "message": "..."}, "requestId": "req-1", "timestamp": 1712345678901}
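Because success and failure share one envelope, a client can unwrap every response in a single place. The following Python sketch returns `data` on success and raises an exception carrying the bridge's error `code` otherwise. `BridgeError`, `unwrap`, and the `"UNKNOWN"` fallback code are hypothetical.

```python
from typing import Any


class BridgeError(Exception):
    """Raised for {"success": false, ...} responses; keeps the bridge error code."""

    def __init__(self, code: str, message: str, request_id: Any = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.request_id = request_id


def unwrap(response: dict) -> Any:
    """Return response["data"] on success, otherwise raise BridgeError."""
    if response.get("success"):
        return response.get("data")
    error = response.get("error") or {}
    # "UNKNOWN" is a hypothetical fallback for malformed error bodies
    raise BridgeError(error.get("code", "UNKNOWN"), error.get("message", ""),
                      response.get("requestId"))
```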

Available Actions (64+)

Navigation

navigate, goBack, goForward, reload, getUrl, getTitle

Page Content

getText, getHtml, getLinks, getImages, getHeadings, getMetaTags, getFormFields, getFocusableElements

Element Interaction

click, type, pressKey, scroll, scrollIntoView, findElement, focus, hover, select

Data Extraction

getValue, getAttribute, getAttributeAll, getBoundingBox, getCookies, getPerformanceMetrics, getSelectedValue, getSelectOptions

JavaScript Execution

evaluate, injectScript, injectCSS

Screenshot & Export

screenshot (PNG/JPEG), getPdf (A4/Letter)

Tab Management

listTabs, newTab, closeTab, switchTab, getCurrentTab

Waiting

waitForElement, waitForTimeout, waitForNavigation

Cookie Management

setCookie, deleteCookie

Network Interception

blockUrls, unblockUrls, mockResponse, getNetworkRequests, clearNetworkRequests

Local Storage

getLocalStorage, setLocalStorage, removeLocalStorage, clearLocalStorage

File Operations

uploadFile, setInputFiles, downloadFile

Viewport

getViewport, setViewport

Console Logs

getConsoleLogs, clearConsoleLogs

Accessibility

getAccessibilityTree

Utility

ping, connect, disconnect, getBrowserInfo, highlight, dispatchEvent

Typical Workflow

1. Navigate: navigate → go to the target URL (Chinese in query parameters can be passed as-is; the bridge encodes it)
2. Wait: waitForElement → wait for the key content to appear
3. Read: getText / getHtml / getLinks → understand the page
4. Interact: click / type / pressKey → perform actions
5. Extract: getText / screenshot / evaluate → collect results
6. Confirm: screenshot → verify visually
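The read-only steps of this workflow fit into one batch. In the following Python sketch, `workflow_steps` and the `h1` selector are hypothetical. The `selector`/`timeout` parameter names follow the login example in this document. Calling `screenshot` without params assumes the bridge applies sensible defaults. The Interact step is page-specific, so it is left out.

```python
from typing import List


def workflow_steps(url: str, ready_selector: str, timeout_ms: int = 10000) -> List[dict]:
    """Navigate -> Wait -> Read/Extract -> Confirm, as batch entries (no Interact step)."""
    return [
        {"action": "navigate", "params": {"url": url}},  # Chinese may be passed raw
        {"action": "waitForElement",
         "params": {"selector": ready_selector, "timeout": timeout_ms}},
        {"action": "getText"},  # Read / Extract
        {"action": "screenshot"},  # Confirm; default options assumed
    ]


request = {"type": "batch", "actions": workflow_steps("https://example.com", "h1")}
```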

Example: Search Baidu with a Chinese Query

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C"}},
  {"action": "waitForTimeout", "params": {"ms": 2000}},
  {"action": "getText"}
]}

Example: Full Login Flow

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://example.com/login"}},
  {"action": "waitForElement", "params": {"selector": "input[name=username]", "timeout": 10000}},
  {"action": "type", "params": {"selector": "input[name=username]", "value": "myuser"}},
  {"action": "type", "params": {"selector": "input[name=password]", "value": "mypassword"}},
  {"action": "click", "params": {"selector": "button[type=submit]"}},
  {"action": "waitForTimeout", "params": {"ms": 3000}},
  {"action": "getCurrentTab"}
]}
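The placeholder credentials above are fine for a demo, but inlining real ones into the request source is risky. The following Python sketch builds the same batch from values supplied at run time. The helper `login_batch` and the environment variable names in the usage comment (`LOGIN_USER`, `LOGIN_PASSWORD`) are hypothetical.

```python
def login_batch(login_url: str, username: str, password: str) -> dict:
    """Same steps as the login example, with credentials passed in rather than inlined."""
    return {"type": "batch", "actions": [
        {"action": "navigate", "params": {"url": login_url}},
        {"action": "waitForElement",
         "params": {"selector": "input[name=username]", "timeout": 10000}},
        {"action": "type", "params": {"selector": "input[name=username]", "value": username}},
        {"action": "type", "params": {"selector": "input[name=password]", "value": password}},
        {"action": "click", "params": {"selector": "button[type=submit]"}},
        {"action": "waitForTimeout", "params": {"ms": 3000}},
        {"action": "getCurrentTab"},  # inspect the resulting tab to confirm the login
    ]}


# Usage (hypothetical variable names):
# body = login_batch("https://example.com/login",
#                    os.environ["LOGIN_USER"], os.environ["LOGIN_PASSWORD"])
```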

Error Handling

Error Code	Meaning	Resolution
---	---	---
`CDP_NOT_CONNECTED`	Chrome not running with debug port	Bridge auto-launches Chrome, retries every 3s
`ACTION_ERROR`	Action execution failed	Check params; use `getFocusableElements` to locate elements first
`INVALID_REQUEST`	Malformed request	Check request format
`PARSE_ERROR`	JSON parse failure	Send valid JSON
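According to the table, `CDP_NOT_CONNECTED` resolves on its own once the bridge reconnects to Chrome, so a client can wait and resend. `INVALID_REQUEST` and `PARSE_ERROR` will fail the same way on every retry. The following Python sketch assumes the error shape shown under Error Response. `send` stands for any function that posts a body and returns the decoded JSON. The retry count is an arbitrary choice, and `send_with_retry` is hypothetical.

```python
import time
from typing import Callable


def send_with_retry(send: Callable[[dict], dict], body: dict,
                    attempts: int = 5, delay_s: float = 3.0) -> dict:
    """Resend while the bridge reports CDP_NOT_CONNECTED; return any other response."""
    response: dict = {}
    for attempt in range(attempts):
        response = send(body)
        code = (response.get("error") or {}).get("code")
        if response.get("success") or code != "CDP_NOT_CONNECTED":
            return response  # success, or an error that retrying will not fix
        if attempt < attempts - 1:
            time.sleep(delay_s)  # 3 s mirrors the bridge's own retry interval
    return response
```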

Discovery Tips

When you don't know what elements are on a page:

getFocusableElements → all interactive elements (with positions)
getFormFields → all form inputs with metadata
getLinks → all links on page
getHeadings → understand page structure
getText → all visible text
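The discovery calls above can be grouped into one batch so their results can be read side by side. The following Python sketch sends them without `params`. That is confirmed for `getText` by the Baidu example and assumed for the other four. `discovery_batch` is hypothetical.

```python
import json

DISCOVERY_ACTIONS = ["getFocusableElements", "getFormFields", "getLinks", "getHeadings", "getText"]


def discovery_batch(url: str) -> dict:
    """Navigate, then run every discovery action with default params (assumed)."""
    actions = [{"action": "navigate", "params": {"url": url}}]
    actions += [{"action": name} for name in DISCOVERY_ACTIONS]
    return {"type": "batch", "actions": actions}


print(json.dumps(discovery_batch("https://example.com"), indent=2))
```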

References

references/bridge-api.md — Complete API reference with all 64+ actions
references/setup-guide.md — Detailed setup and troubleshooting
scripts/startup.js — Startup automation script

Version History

2 versions in total.

v1.0.1 (current): dependency download now pins a specific version number
2026-05-11 00:32

v1.0.0: Initial release
2026-05-10 23:21
