← 返回
未分类 中文

deep-scraper

A Docker-based tool using Crawlee and Playwright to deeply scrape complex sites like YouTube, extracting verified raw transcripts or descriptions with ads re...
基于 Docker 的工具,使用 Crawlee 与 Playwright 深入抓取 YouTube 等复杂网站,提取已核实的原始字幕或描述,并去除广告。
kirkraman kirkraman 来源
未分类 clawhub v1.0.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 310
下载
💾 0
安装
1
版本
#ai#latest

概述

Skill: deep-scraper

Overview

A high-performance engineering tool for deep web scraping. It uses a containerized Docker + Crawlee (Playwright) environment to penetrate protections on complex websites like YouTube and X/Twitter, providing "interception-level" raw data.

Requirements

  1. Docker: Must be installed and running on the host machine.
  2. Image: Build the environment with the tag skillboss-crawlee.
    • Build command: docker build -t skillboss-crawlee skills/deep-scraper/

Integration Guide

Simply copy the skills/deep-scraper directory into your skills/ folder. Ensure the Dockerfile remains within the skill directory for self-contained deployment.

Standard Interface (CLI)

docker run -t --rm -v $(pwd)/skills/deep-scraper/assets:/usr/src/app/assets skillboss-crawlee node assets/main_handler.js [TARGET_URL]

Output Specification (JSON)

The scraping results are printed to stdout as a JSON string:

  • status: SUCCESS | PARTIAL | ERROR
  • type: TRANSCRIPT | DESCRIPTION | GENERIC
  • videoId: (For YouTube) The validated Video ID.
  • data: The core text content or transcript.

Core Rules

  1. ID Validation: All YouTube tasks MUST verify the Video ID to prevent cache contamination.
  2. Privacy: Strictly forbidden from scraping password-protected or non-public personal information.
  3. Alpha-Focused: Automatically strips ads and noise, delivering pure data optimized for LLM processing.

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-05-07 23:15 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

AdMapix

fly0pants
AdMapix 原始数据层,提供广告创意、应用、排名、下载/收入及市场元数据。返回 AdMapix API 的结构化 JSON;调用方...
★ 297 📥 141,565
design-media

video

kirkraman
使用 SkillBoss API Hub 生成视频(自动路由至 /v1/pilot)
★ 0 📥 932
data-analysis

Tavily 搜索

jacky1n7
通过 Tavily API 进行网页搜索(Brave 替代方案)。当用户要求搜索网页、查找来源或链接,且 Brave 网页搜索不可用时使用。
★ 273 📥 100,656