
Babata Browser

Babata Browser v3.1 — Lightweight browser automation with CloakBrowser anti-detection (C++-level stealth Chromium). Scan-first, act-second. Playwright fallba...
Source: meta-evo-creator
Uncategorized · clawhub · v3.1.0 · 2 versions · 100000 · Key: not required
Stars: 0 · Downloads: 519 · Installs: 1 · Versions: 2
#anti-detection #browser #browser-automation #chinese #latest #playwright #scraping

Overview

Babata Browser 🦞 v3.1

> Lightweight browser automation with CloakBrowser anti-detection (C++-level stealth Chromium). Playwright fallback.

> reCAPTCHA v3: 0.9 | Cloudflare Turnstile: PASS | 30/30 bot tests


Backend (v3.1)

| Backend | reCAPTCHA v3 | Cloudflare | Gov sites | Default |
|:-------:|:------------:|:----------:|:---------:|:-------:|
| CloakBrowser | 0.9 | ✅ PASS | ✅ Strong | ✅ auto |
| Playwright | 0.1 | ❌ FAIL | ⚠️ Blocked | fallback |

  • backend='auto' (default) → CloakBrowser if available → Playwright fallback
  • backend='cloakbrowser' → force CloakBrowser, error if unavailable
  • backend='playwright' → force Playwright
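
The selection rules above can be sketched as a small resolver. This is only an illustration of the described behavior, not the skill's actual code: `resolve_backend` and its `is_available` parameter are hypothetical names, and treating "available" as "the `cloakbrowser` package is importable" is an assumption.

```python
import importlib.util


def resolve_backend(requested="auto", is_available=None):
    """Return the backend name to use for a requested `backend` value."""
    if is_available is None:
        # Assumption: CloakBrowser counts as available when its package can be imported.
        is_available = importlib.util.find_spec("cloakbrowser") is not None
    if requested == "playwright":
        return "playwright"
    if requested == "cloakbrowser":
        if not is_available:
            raise RuntimeError("CloakBrowser requested but not installed")
        return "cloakbrowser"
    if requested == "auto":
        # Prefer the stealth backend, fall back to Playwright otherwise.
        return "cloakbrowser" if is_available else "playwright"
    raise ValueError(f"unknown backend: {requested!r}")
```

Passing `is_available` explicitly keeps the rule easy to test without installing either backend.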

When to Use ✅

Gov policy sites (JS-rendered) / SPA data collection / Form filling / Screenshot evidence / web_fetch returns <500 chars / WeChat articles

When NOT to Use ❌

Static pages → web_fetch / API queries → fetch() / Text search → web_search (Tavily)
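
One way to apply the "web_fetch returns <500 chars" rule is to try the cheap fetch first and escalate only when the result looks too thin. The helper below is a hypothetical sketch: `needs_browser` is not part of the skill, and the only value taken from the text above is the 500-character threshold.

```python
def needs_browser(fetched_text, min_chars=500):
    """Return True when a plain fetch came back too short to be the rendered page."""
    return len(fetched_text.strip()) < min_chars


# A JS-rendered shell page usually contains little more than a mount point.
shell_page = "<div id='app'></div>"
use_browser = needs_browser(shell_page)
```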


Install

# CloakBrowser (recommended — anti-detection)
pip install cloakbrowser
cd skills/babata-browser && pip install -e .

# Playwright fallback (already installed)
pip install playwright && python -m playwright install chromium

Core Principles

1. Scan First (from smart-browser best practice)

Never snapshot blindly. Find interactive elements with JS first:

browser.execute_js(page, """
  (() => {
    const els = document.querySelectorAll('a[href], button, input, select, textarea, [role=button], [onclick]');
    return [...els]
      .filter(el => { const r = el.getBoundingClientRect(); return r.width > 0 && r.height > 0 && r.top < window.innerHeight; })
      .map((el, i) => ({ i, tag: el.tagName.toLowerCase(), text: (el.innerText || el.value || '').trim().slice(0, 50), id: el.id, href: el.href?.slice(0, 80) }));
  })()
""")

2. Click by Text, Not CSS

browser.click(page, text='Latest Policy')  # ✅ Stable
# ❌ browser.click(page, selector='#content > div:nth-child(3) > a')
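
To connect the two principles, the list returned by the scan can be searched by visible text before clicking. The sketch below assumes the scan result has the shape produced by the JS snippet in principle 1 (`i`, `tag`, `text`, `id`, `href`); `find_by_text` is a hypothetical helper and the sample data is made up.

```python
def find_by_text(elements, label):
    """Return the first scanned element matching `label`, preferring exact matches."""
    wanted = label.strip().lower()
    exact = [e for e in elements if e.get("text", "").strip().lower() == wanted]
    if exact:
        return exact[0]
    # Fall back to a substring match when no element text equals the label.
    partial = [e for e in elements if wanted in e.get("text", "").lower()]
    return partial[0] if partial else None


# Sample data in the shape returned by the scan snippet (values are invented).
scanned = [
    {"i": 0, "tag": "a", "text": "Home", "id": "", "href": "https://example.com/"},
    {"i": 1, "tag": "a", "text": "Latest Policy Archive", "id": "", "href": "https://example.com/archive"},
    {"i": 2, "tag": "a", "text": "Latest Policy", "id": "", "href": "https://example.com/policy"},
]
target = find_by_text(scanned, "latest policy")
```

The text of the matched element can then be passed to `browser.click(page, text=...)`, which avoids depending on a brittle CSS path.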

3. Smart Wait (Not Fixed Sleep)

browser.execute_js(page, """
  new Promise(resolve => {
    let tries = 0;
    const t = setInterval(() => {
      const found = document.body.innerText.includes('expected text');
      // Report 'found' from the check itself, so a match on the final poll is not misreported as a timeout.
      if (found || ++tries > 30) { clearInterval(t); resolve(found ? 'found' : 'timeout'); }
    }, 500);
  })
""")
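
The same polling idea can also be written on the Python side, which is handy when the condition depends on something other than page text. This is a generic sketch: `wait_until` is a hypothetical helper, and the 15-second default simply mirrors 30 polls at 500 ms from the JS version.

```python
import time


def wait_until(predicate, timeout=15.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A call such as `wait_until(lambda: 'expected text' in browser.get_text(page))` would then replace a fixed sleep, assuming `get_text` returns the current page text as a string.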

4. Layered Extraction

Accessibility Snapshot → find target region → get_text(selector=region)
  → still unclear? → screenshot (last resort)
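
The layered flow can be sketched as a cheapest-first chain that stops at the first usable result. Everything here is illustrative: `layered_extract` is a hypothetical helper, the steps are stubbed with plain callables, and what counts as "clear" is left to the caller.

```python
def layered_extract(steps, is_clear):
    """Run extraction steps in order; return (name, result) of the first clear one."""
    if not steps:
        raise ValueError("at least one extraction step is required")
    last = None
    for name, step in steps:
        result = step()
        last = (name, result)
        if is_clear(result):
            return last
    # Nothing was clear: hand back the final (most expensive) attempt.
    return last


# Stubbed steps standing in for snapshot, scoped get_text, and screenshot.
steps = [
    ("snapshot", lambda: ""),
    ("scoped_text", lambda: "Policy notice: new filing rules"),
    ("screenshot", lambda: "evidence.png"),
]
name, result = layered_extract(steps, is_clear=lambda r: len(r) > 15)
```

Because the steps are callables, the screenshot is never taken when an earlier layer already produced usable text.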

Usage

Quick (Natural Language)

from scripts.babata_browser import execute_task
result = execute_task('Open https://example.com, search policy, extract top 5 titles')

Precise Control

from scripts.babata_browser import BabataBrowser

browser = BabataBrowser(headless=True)
browser.start()
try:
    page = browser.new_page()
    browser.goto(page, 'https://example.com')
    browser.click(page, text='Agree')
    browser.fill(page, 'input[name="q"]', 'query')
    text = browser.get_text(page)
    browser.screenshot(page, path='evidence.png')
finally:
    browser.stop()  # ⚠️ Always close, even when a step fails

CLI

babata-browser 'Open GitHub Trending, extract top projects' --json

Capabilities

| Action | Description | Use Case |
|--------|-------------|----------|
| goto(url) | Navigate | Open target page |
| get_text(sel?) | Extract text (scoped optional) | Page body |
| get_links(limit) | All links | Navigation, search results |
| click(text=, sel=) | Click by text or CSS | Pagination, submit, nav |
| fill(sel, val) | Fill input | Search box, login form |
| screenshot(path) | Full-page screenshot | Evidence, visual verify |
| scroll(n) | Scroll | Lazy-loaded content |
| execute_js(code) | Run JS | Element scan, smart wait |
| extract_table(sel) | Table to dict list | Data tables |
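
To make "table to dict list" concrete, the sketch below shows the conversion on plain Python data. It illustrates the output shape only and is not the skill's `extract_table` implementation; `rows_to_dicts` is a hypothetical helper and the sample rows are invented.

```python
def rows_to_dicts(rows):
    """Turn [header_row, *data_rows] into a list of dicts keyed by header cell."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


table = [
    ["Title", "Date"],
    ["Notice A", "2026-01-01"],
    ["Notice B", "2026-02-01"],
]
records = rows_to_dicts(table)
```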

Errors

| Error | Fix |
|:-----:|-----|
| ERR_TIMED_OUT | Increase timeout: goto(page, url, timeout=60000) |
| Cloudflare "Just a moment..." | Blocked; switch data source |
| Element not found | Scan first, click by text not CSS |
| page.click: Timeout | Use smart wait, not fixed sleep |
| Orphaned browser process | Always call stop() in try/finally |
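
For the ERR_TIMED_OUT row, one possible pattern is to retry navigation with a longer timeout before giving up. The helper below is a hypothetical sketch: `goto_with_retry` is not part of the skill, the 30000 ms first attempt is an assumed default, and it assumes a timed-out navigation raises an exception.

```python
def goto_with_retry(goto, url, timeouts=(30000, 60000)):
    """Call `goto(url, timeout=ms)` with each timeout in turn until one succeeds."""
    if not timeouts:
        raise ValueError("at least one timeout is required")
    last_error = None
    for timeout in timeouts:
        try:
            return goto(url, timeout=timeout)
        except Exception as exc:  # Assumption: a timed-out navigation raises.
            last_error = exc
    raise last_error
```

With the API shown earlier, the call could look like `goto_with_retry(lambda url, timeout: browser.goto(page, url, timeout=timeout), 'https://example.com')`.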

Security

  • Never enter real credentials on untrusted sites
  • Check content before screenshot (avoid capturing sensitive data)
  • Default: headless=True
  • Mandatory: stop() after use
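
The mandatory `stop()` rule can be enforced structurally with a context manager, so cleanup no longer depends on remembering it at every call site. This is a sketch under the assumption that the browser object exposes `start()` and `stop()` as in the usage section; `managed` is a hypothetical wrapper.

```python
from contextlib import contextmanager


@contextmanager
def managed(browser):
    """Start `browser`, yield it, and always call stop() afterwards."""
    browser.start()
    try:
        yield browser
    finally:
        # Runs on normal exit and when the body raises.
        browser.stop()
```

Usage would then be `with managed(BabataBrowser(headless=True)) as browser: ...`, and the browser process is stopped even if a step inside the block fails.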

vs Playwright MCP

| | Playwright MCP | babata-browser v3.0 |
|---|---|---|
| Dependencies | Node + npx + Chromium | Python + Playwright + Chromium |
| AI decisions | MCP client | Babata LLM direct |
| Token efficiency | MCP protocol overhead | CLI, zero protocol cost |
| Best for | Long-running automation | High-frequency interaction, sampling |

Changelog

| Version | Date | Changes |
|:-------:|------|---------|
| v2.1 | 2026-05-11 | Smart scan JS, smart wait, layered extraction, error table, security rules |
| v3.0 | 2026-05-11 | Full English localization, streamlined structure, version bump |

Version History

2 versions in total

  • v3.1.0 (current)
    2026-05-21 13:18, security scans: safe
  • v2.0.0
    2026-05-09 17:14, security scans: safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
