
Zoomin Docs Portal Scraper Tool

Scrape documentation content from Zoomin Software portals using Playwright browser automation to handle dynamic content loading. Use when standard web fetching fails.
recklessop
Content creation · clawhub · v1.0.2 · 1 version · 100000 · Key: not required
★ 0 stars · 📥 1,194 downloads · 💾 37 installs · #latest

Overview

Zoomin Scraper Skill

This skill provides a mechanism to robustly scrape content from documentation portals powered by Zoomin Software. It leverages Playwright to launch a headless Chromium browser, execute JavaScript, wait for dynamic content to load, and then extract the rendered text from the main article body.

Usage

To use this skill, you need to provide a file containing a list of URLs, one URL per line. The skill will then process each URL, saving the extracted content to a specified output directory.
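The one-URL-per-line input format can be read with a few lines of Python. This is a minimal sketch, not the skill's own loader: the function name `read_urls` is hypothetical, and skipping blank lines, `#` comment lines, and duplicate URLs is an assumption rather than documented behavior.

```python
from pathlib import Path


def read_urls(path):
    """Return the URLs listed in a text file, one per line, in file order."""
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Assumption: blank lines, '#' comments, and repeated URLs are ignored.
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

Preserving file order keeps the output predictable when the list is edited by hand.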

Prerequisites (Manual Setup)

This skill relies on Playwright. Before using this skill for the first time on a new system, you must manually install Playwright and its browser binaries by running the following commands in your terminal:

pip install playwright
playwright install chromium

These commands should be executed within the virtual environment you intend to use for this skill.
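Before a first run, it may help to confirm that the active interpreter is inside a virtual environment and can see the Playwright package. The check below is a rough sketch using only the standard library; the helper names are hypothetical, and it does not verify that the Chromium binaries from `playwright install chromium` are present.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def has_module(name):
    """True when the named top-level module can be located for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("playwright importable:", has_module("playwright"))
```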

Running the Scraper

To run the scraper, you will invoke the run_scraper.sh script, which is located within this skill's scripts/ directory. This wrapper script will activate your specified Python virtual environment before executing the main Python Playwright script.
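The wrapper's job can be approximated in Python for illustration. Note that this sketch swaps in a different technique: instead of activating the environment as run_scraper.sh does, it calls the environment's own interpreter directly, which has a similar effect. The script argument and the positional argument order passed to it are assumptions, since the source does not show the main script's interface.

```python
import os
import subprocess
from pathlib import Path


def venv_python(venv_path):
    """Path to the interpreter inside a virtual environment."""
    sub = Path("Scripts") / "python.exe" if os.name == "nt" else Path("bin") / "python"
    return Path(venv_path) / sub


def build_command(venv_path, script, urls_file, output_directory):
    """Build the argument list; the argument order here is hypothetical."""
    return [str(venv_python(venv_path)), str(script), str(urls_file), str(output_directory)]


def run(venv_path, script, urls_file, output_directory="scraped_docs_output"):
    """Run the scraper script with the venv interpreter and return its exit code."""
    cmd = build_command(venv_path, script, urls_file, output_directory)
    return subprocess.run(cmd, check=True).returncode
```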

Parameters for run_scraper.sh:

  • urls_file: The path to a text file containing the URLs to scrape, one URL per line.
  • output_directory (optional): The directory where the scraped content will be saved. If not provided, it defaults to scraped_docs_output.
  • venv_path: The absolute path to your Python virtual environment (e.g., /home/justin/scraper/.env).
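The parameter rules above can be sketched as a small parser for key=value arguments. The function name `parse_params` is hypothetical, and treating urls_file and venv_path as required is an inference from only output_directory being marked optional.

```python
DEFAULT_OUTPUT_DIR = "scraped_docs_output"
REQUIRED = ("urls_file", "venv_path")


def parse_params(args):
    """Parse ['key=value', ...] into a dict, applying the documented default."""
    params = {"output_directory": DEFAULT_OUTPUT_DIR}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        # Tolerate literal quotes in case the caller did not go through a shell.
        params[key] = value.strip('"')
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    return params
```

Calling `parse_params(["urls_file=urls.txt", "venv_path=/opt/venv"])` leaves output_directory at its default.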

Example:

Assuming your list of URLs is in path/to/urls.txt, you want to save the output to my_scraped_docs/, and your virtual environment is at path/to/my_venv:

zoomin-scraper urls_file="path/to/urls.txt" output_directory="my_scraped_docs" venv_path="path/to/my_venv"

The script will launch a headless Chromium browser, navigate to each URL, wait for the main content to load (targeting the main article body element), and then save the extracted text. It sets a user agent to mimic a regular browser and pauses briefly between requests to be polite to the server.
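The flow described above could look roughly like the following with Playwright's synchronous Python API. This is a sketch under stated assumptions, not the skill's actual script: the selector, user agent string, delay, timeout, and numbered output file names are all placeholders, because the source does not give the real values.

```python
import time
from contextlib import contextmanager
from pathlib import Path

# Placeholder values for illustration; the skill's real settings are not shown.
CONTENT_SELECTOR = "article"  # hypothetical selector for the main article body
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
DELAY_SECONDS = 1.0


@contextmanager
def playwright_fetcher(selector=CONTENT_SELECTOR, timeout_ms=30_000):
    """Yield a fetch(url) -> str function backed by headless Chromium."""
    # Imported lazily so the rest of the module works without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(user_agent=USER_AGENT)
        page = context.new_page()

        def fetch(url):
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            # Wait until the dynamically rendered content is visible.
            page.wait_for_selector(selector, timeout=timeout_ms)
            return page.inner_text(selector)

        try:
            yield fetch
        finally:
            browser.close()


def scrape_all(urls, output_dir, fetch, delay_s=DELAY_SECONDS):
    """Fetch each URL, write its text under output_dir, return the paths written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, url in enumerate(urls, start=1):
        try:
            text = fetch(url)
        except Exception as exc:  # keep going if a single page fails
            print(f"skipped {url}: {exc}")
            continue
        target = out / f"page_{index:03d}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target)
        time.sleep(delay_s)  # polite pause between requests
    return written
```

To use it, open `playwright_fetcher()` in a `with` statement and pass the yielded `fetch` to `scrape_all`. Passing the fetch function in as an argument keeps the loop testable without a browser.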

Version history

1 version in total

  • v1.0.2 (current)
    2026-03-29 06:33 · Safe · Safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
