
Lightpanda Scraper

Fast headless-browser web scraping using Lightpanda (0.5s page loads, 90x faster than Chromium). Suited to OSINT recon, link extraction, and content scraping.
Author: hostilespider
Category: uncategorized · clawhub · v1.0.0 (#latest, 1 version) · Key: not required
Stars: 1 · Downloads: 404 · Installs: 2

Overview

Lightpanda Scraper — Fast Headless Browser for OSINT

Fast web scraping with Lightpanda, a Zig-based headless browser: roughly 0.5s per page versus roughly 45s for Chromium/Playwright, by the comparison below. Suited to OSINT recon, link extraction, and content scraping.

Prerequisites

Install the Lightpanda binary:

mkdir -p ~/.local/bin
curl -L https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux -o ~/.local/bin/lightpanda
chmod +x ~/.local/bin/lightpanda
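
Before running the scraper it can help to confirm that the binary landed where the commands above put it. The following is a minimal sketch, assuming the install directory `~/.local/bin` used above; `find_lightpanda` is a hypothetical helper for illustration and is not part of the skill.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_lightpanda(install_dir: Path = Path.home() / ".local" / "bin") -> Optional[Path]:
    """Return the path to an executable lightpanda binary, or None.

    Checks the install directory first, then falls back to a PATH lookup.
    """
    candidate = install_dir / "lightpanda"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    on_path = shutil.which("lightpanda")
    return Path(on_path) if on_path else None


if __name__ == "__main__":
    found = find_lightpanda()
    print(found if found else "lightpanda not found; re-run the install step")
```

If the function returns `None`, the download or the `chmod +x` step most likely did not complete.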

Quick Start

# Dump page as markdown
python3 {baseDir}/scripts/lp-scrape.py https://target.com

# Extract all links
python3 {baseDir}/scripts/lp-scrape.py https://target.com --links

# Get raw HTML
python3 {baseDir}/scripts/lp-scrape.py https://target.com --html

Options

  • --links — Extract and categorize all links from the page
  • --html — Dump raw HTML instead of markdown
  • --frames — Include iframe content
  • --js "code" — Evaluate JavaScript on the page
  • --output FILE — Save output to file
  • --wait MODE — Wait condition: networkidle (default), load, domcontentloaded
  • --strip TYPES — Comma-separated resource types to strip: js, css, images
  • --proxy URL — Use proxy (e.g., socks5://127.0.0.1:9050 for Tor)
  • --timeout SECS — Request timeout (default: 30)
  • --serve --port PORT — Start CDP server mode
  • --mcp — Start as MCP server (stdio)
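
When the scraper is driven from another program, assembling the argument list in code avoids shell-quoting mistakes. This is a rough sketch that covers only a subset of the flags listed above; `build_command` and its parameter names are hypothetical, and the script path is whatever `{baseDir}/scripts/lp-scrape.py` resolves to on your system.

```python
import shlex
from typing import List, Optional


def build_command(
    script: str,
    url: str,
    links: bool = False,
    html: bool = False,
    proxy: Optional[str] = None,
    timeout: Optional[int] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for lp-scrape.py from a subset of its flags."""
    argv = ["python3", script, url]
    if links:
        argv.append("--links")
    if html:
        argv.append("--html")
    if proxy:
        argv += ["--proxy", proxy]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    if output:
        argv += ["--output", output]
    return argv


if __name__ == "__main__":
    cmd = build_command(
        "scripts/lp-scrape.py",  # placeholder path, adjust to your install
        "https://target.com",
        links=True,
        proxy="socks5://127.0.0.1:9050",
        timeout=60,
    )
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` without `shell=True`.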

Use Cases

OSINT Recon

# Quick page dump for analysis
python3 {baseDir}/scripts/lp-scrape.py https://target.com > recon.md

# List the links on a page that mention "api"
python3 {baseDir}/scripts/lp-scrape.py https://target.com --links | grep -i api

# Fetch a page through Tor (SOCKS5 proxy on Tor's default port)
python3 {baseDir}/scripts/lp-scrape.py https://target.com --proxy socks5://127.0.0.1:9050
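
The `grep -i api` filter above can also be done in Python once the output has been captured. The exact format of the `--links` output is not documented here, so this sketch assumes only that it is text containing absolute URLs. `filter_links` is a hypothetical helper. Unlike `grep`, it matches against each URL, not the whole line, and drops duplicates.

```python
import re
from typing import List

# Matches absolute http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r'https?://[^\s<>"]+')


def filter_links(text: str, keyword: str) -> List[str]:
    """Return unique URLs in text that contain keyword, case-insensitively."""
    needle = keyword.lower()
    seen = set()
    matches = []
    for url in URL_RE.findall(text):
        if needle in url.lower() and url not in seen:
            seen.add(url)
            matches.append(url)
    return matches


if __name__ == "__main__":
    sample = (
        "https://target.com/about\n"
        "https://target.com/API/v1/users\n"
        "https://api.target.com/health\n"
        "https://target.com/API/v1/users\n"
    )
    for url in filter_links(sample, "api"):
        print(url)
```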

Bug Bounty Recon

# Fast subdomain content grab
for sub in api admin dev staging; do
  python3 {baseDir}/scripts/lp-scrape.py https://$sub.target.com --links 2>/dev/null
done
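
The same loop can be written in Python when the results need to be kept per subdomain instead of streamed to the terminal. This is a sketch under stated assumptions: `scrape_subdomains` and `run_scraper` are hypothetical names, the script path is a placeholder, and the `runner` parameter exists only so the loop can be exercised without the binary installed.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def run_scraper(argv: List[str]) -> str:
    """Run one scrape and return stdout; stderr is discarded like 2>/dev/null."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=False,  # a failing subdomain should not abort the whole loop
    )
    return result.stdout


def scrape_subdomains(
    script: str,
    domain: str,
    subdomains: Iterable[str],
    runner: Callable[[List[str]], str] = run_scraper,
) -> Dict[str, str]:
    """Map each https://<sub>.<domain> URL to the scraper's --links output."""
    results: Dict[str, str] = {}
    for sub in subdomains:
        url = f"https://{sub}.{domain}"
        results[url] = runner(["python3", script, url, "--links"])
    return results
```

Calling `scrape_subdomains("scripts/lp-scrape.py", "target.com", ["api", "admin", "dev", "staging"])` mirrors the shell loop above; subdomains that fail to resolve simply map to an empty string.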

Content Extraction

# Save clean markdown
python3 {baseDir}/scripts/lp-scrape.py https://article.com --output article.md

# JavaScript evaluation
python3 {baseDir}/scripts/lp-scrape.py https://app.com --js "document.querySelectorAll('a').length"

CDP Server Mode

# Start server for programmatic access
python3 {baseDir}/scripts/lp-scrape.py --serve --port 9222
# Then connect with any CDP client
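
A CDP client talks to the server by exchanging JSON messages over a WebSocket. The sketch below shows only how one command is framed, assuming the standard Chrome DevTools Protocol message shape (`id`, `method`, `params`). `cdp_message` is a hypothetical helper. Sending the message would require a WebSocket client connected to the port above, which is not shown here.

```python
import itertools
import json
from typing import Any, Dict, Optional

# Each command needs a unique integer id so replies can be matched to requests.
_ids = itertools.count(1)


def cdp_message(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one CDP command as a JSON string."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


if __name__ == "__main__":
    # Page.navigate is a standard CDP method; the URL is a placeholder.
    print(cdp_message("Page.navigate", {"url": "https://target.com"}))
```

In practice an existing CDP library handles this framing, along with session and target management, so hand-built messages are mainly useful for debugging.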

Speed Comparison

Tool                  Page Load   Memory   Binary Size
------------------------------------------------------
Lightpanda            ~0.5s       ~50MB    ~100MB
Chromium/Playwright   ~45s        ~500MB   ~300MB
curl/wget             ~0.3s       ~5MB     N/A

Lightpanda gives you Playwright-like page rendering at near-curl speeds. The catch: no complex JS interactions (use Playwright for those).
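
The "90x faster" figure follows directly from the approximate page-load times in the comparison above. This short worked calculation uses those figures as given; the 1,000-page workload is an illustrative assumption, and the function names are hypothetical.

```python
# Approximate page-load times in seconds, copied from the comparison above.
PAGE_LOAD_SECONDS = {
    "Lightpanda": 0.5,
    "Chromium/Playwright": 45.0,
    "curl/wget": 0.3,
}


def speedup(baseline: str, tool: str) -> float:
    """How many times faster `tool` loads a page than `baseline`."""
    return PAGE_LOAD_SECONDS[baseline] / PAGE_LOAD_SECONDS[tool]


def total_minutes(tool: str, pages: int) -> float:
    """Sequential wall-clock time, in minutes, to load `pages` pages."""
    return PAGE_LOAD_SECONDS[tool] * pages / 60


if __name__ == "__main__":
    print(speedup("Chromium/Playwright", "Lightpanda"))         # 90.0
    print(total_minutes("Chromium/Playwright", 1000))           # 750.0
    print(round(total_minutes("Lightpanda", 1000), 1))          # 8.3
```

At these rates a sequential 1,000-page run takes about 12.5 hours with Chromium/Playwright and a little over 8 minutes with Lightpanda.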

Notes

  • Lightpanda is in active development; some complex SPAs may not render perfectly
  • For authenticated sessions or complex JS interactions, use Playwright instead
  • The binary is ~100MB of Zig-compiled native code and runs on Linux x86_64
  • Supports HTTP/SOCKS5 proxies for Tor or VPN routing

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-03 07:37, security scans: safe, safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
