
Web Scraper Trae

Opens a browser and scrapes webpage content using Playwright. Invoke when the user wants to crawl or scrape a webpage, extract data from a website, or retrieve web content.
Author: zhengjia626 (source)
Category: Uncategorized · clawhub · v1.0.1 · 99778.8 · API key: not required
★ 0 stars · 📥 451 downloads · 💾 1 install · 1 version · #latest

Overview


Opens a headless Chromium browser with Playwright and scrapes a webpage's title, visible text, and full HTML.

Prerequisites

npm install playwright
npx playwright install chromium
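To confirm both prerequisites are in place before running the scraper, a small check like the one below can help. It is a sketch, not part of the original skill: the function name checkPrerequisites is hypothetical, and it relies on require.resolve plus Playwright's chromium.executablePath(), which reports where Playwright expects its bundled browser. Place it in the same project as the scraper so module resolution finds the same node_modules.

```javascript
const fs = require('fs');

// Hypothetical helper: reports whether the `playwright` package resolves
// from this file's directory and whether the Chromium binary it expects exists.
function checkPrerequisites() {
  const status = { playwright: false, chromium: false };
  try {
    require.resolve('playwright');
    status.playwright = true;
  } catch {
    // Package not found: run `npm install playwright`
    return status;
  }
  try {
    const { chromium } = require('playwright');
    // executablePath() is where Playwright expects the bundled browser;
    // the file exists only after `npx playwright install chromium`
    status.chromium = fs.existsSync(chromium.executablePath());
  } catch {
    status.chromium = false;
  }
  return status;
}

console.log(checkPrerequisites());
```

If playwright is false, run the npm install step; if only chromium is false, run the npx playwright install step.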

Usage

When the user provides a URL, create a Node.js script (scrape.js) that scrapes the page:

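const { chromium } = require('playwright');

async function scrape(url) {
  // Launch headless Chromium; the sandbox flags are often needed in containers running as root
  const browser = await chromium.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });

  try {
    const page = await browser.newPage();
    // 'networkidle' waits until there are no network connections for at least 500 ms
    await page.goto(url, { waitUntil: 'networkidle', timeout: 60000 });

    const title = await page.title();
    // innerText returns the rendered, visible text; textContent would also include script/style text
    const text = await page.innerText('body');
    const html = await page.content();

    return { title, text, html, url };
  } finally {
    // Always release the browser, even if navigation or extraction fails
    await browser.close();
  }
}

const url = process.argv[2];
if (!url) {
  console.error('Please provide a URL argument');
  process.exit(1);
}

scrape(url).then(result => {
  console.log('=== SCRAPE_RESULT ===');
  console.log(JSON.stringify(result, null, 2));
}).catch(err => {
  console.error('Scrape failed:', err.message);
  process.exit(1);
});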

Execution

Run the script with:

node scrape.js "https://example.com"
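When the scraper is launched from another Node.js program rather than a shell, the same command can be run as a child process. This is a minimal sketch under assumptions: runScrapeScript is a hypothetical name, and scrape.js is assumed to be at the given path. Passing the URL as a separate argument avoids shell quoting issues, and maxBuffer is raised because full-page HTML can exceed Node's default 1 MB stdout limit.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: runs scrape.js with the current Node binary and returns raw stdout.
// Throws if the script exits non-zero (e.g. a failed scrape) or exceeds the timeout.
function runScrapeScript(url, scriptPath = 'scrape.js') {
  return execFileSync(process.execPath, [scriptPath, url], {
    encoding: 'utf8',
    // Full HTML output can be far larger than the 1 MB default
    maxBuffer: 64 * 1024 * 1024,
    // A little longer than the script's own 60 s navigation timeout
    timeout: 90000,
  });
}

// Usage (requires Playwright and scrape.js in the working directory):
// const stdout = runScrapeScript('https://example.com');
```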

Output Format

The script prints the marker line === SCRAPE_RESULT === followed by a JSON object with these fields:

  • title: Page title
  • text: Visible text content (HTML stripped)
  • html: Full HTML source
  • url: Original URL
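A caller reading the script's stdout can locate the marker line and parse the JSON that follows. The sketch below assumes the output format described above; parseScrapeOutput is a hypothetical name, and the sample input is hand-written, not real scraper output.

```javascript
const MARKER = '=== SCRAPE_RESULT ===';

// Hypothetical helper: extracts the JSON printed after the marker and checks the four fields.
function parseScrapeOutput(stdout) {
  const idx = stdout.indexOf(MARKER);
  if (idx === -1) {
    throw new Error('SCRAPE_RESULT marker not found in output');
  }
  // Everything after the marker is the pretty-printed JSON object
  const result = JSON.parse(stdout.slice(idx + MARKER.length));
  for (const key of ['title', 'text', 'html', 'url']) {
    if (typeof result[key] !== 'string') {
      throw new Error(`Missing or non-string field: ${key}`);
    }
  }
  return result;
}

// Usage with a hand-written sample
const sample = [
  MARKER,
  JSON.stringify({ title: 'Example', text: 'Hello', html: '<html></html>', url: 'https://example.com' }, null, 2),
].join('\n');
console.log(parseScrapeOutput(sample).title); // prints: Example
```

Searching for the marker instead of parsing all of stdout keeps the parser tolerant of any log lines printed before the result.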

Notes

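  • Use headless: true for server environments without a display
  • waitUntil: 'networkidle' waits until there have been no network connections for at least 500 ms; this usually captures content loaded after the initial HTML, but pages that poll or stream continuously may never go idle and will hit the timeout
  • The 60-second timeout (timeout: 60000) gives slow pages time to load; navigation fails with an error once it is exceeded
  • For single-page applications (SPAs) that render content dynamically, an idle network does not guarantee the content is rendered; wait for a specific element (for example with page.waitForSelector) before extracting
  • For pages that require interaction, use the playwright-cli skill instead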
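For an SPA, the extraction step can wait for the element that holds the content before reading its text. This is a sketch under assumptions: extractAfterRender is a hypothetical helper, page is a Playwright Page (such as the one created in scrape()), and the selector is whatever element your target page renders its content into.

```javascript
// Hypothetical helper: waits until `selector` is visible, then returns its rendered text.
// `page` is assumed to be a Playwright Page; `selector` is page-specific (e.g. '#app main').
async function extractAfterRender(page, selector, timeout = 60000) {
  // Resolves once the element is attached and visible; throws a TimeoutError after `timeout` ms
  await page.waitForSelector(selector, { state: 'visible', timeout });
  return page.innerText(selector);
}

// Usage inside scrape(), after page.goto(...):
// const text = await extractAfterRender(page, '#app');
```

Waiting on a concrete element is more targeted than network idle alone, since it succeeds as soon as the content you need is rendered.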

Version history

1 version in total

  • v1.0.1 (current)
    2026-05-07 04:44 · Security: safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related

data-analysis

Data Analysis

ivangdavila
Data analysis and visualization. Query databases, generate reports, automate spreadsheets, and turn raw data into clear, actionable insights. Use when: (1) you…
★ 208 📥 67,340
data-analysis

AdMapix

fly0pants
AdMapix raw data layer, providing ad creatives, apps, rankings, downloads/revenue, and market metadata. Returns structured JSON from the AdMapix API; the caller...
★ 296 📥 139,324
data-analysis

Tavily Search

jacky1n7
Web search via the Tavily API (an alternative to Brave). Use when the user asks to search the web or find sources or links and Brave web search is unavailable.
★ 273 📥 100,158