Parallel Extract

URL content extraction via the Parallel API. Extracts clean Markdown from webpages, articles, PDFs, and JS-heavy sites so that an LLM can read the content of specific URLs.
Author: normallygaussian
Category: Content creation · clawhub v1.0.3 · 2 versions · Key: not required
Stars: 0 · Downloads: 2,338 · Installs: 3

Overview

Extract clean, LLM-ready content from URLs. Handles webpages, articles, PDFs, and JavaScript-heavy sites that need rendering.

When to Use

Trigger this skill when the user asks for:

  • "read this URL", "fetch this page", "extract from..."
  • "get the content from [URL]"
  • "what does this article say?"
  • Reading PDFs, JS-heavy pages, or paywalled content
  • Getting clean markdown from messy web pages

Use Search to discover; use Extract to read.

Quick Start

parallel-cli extract "https://example.com/article" --json

CLI Reference

Basic Usage

parallel-cli extract "<url>" [options]

Common Flags

| Flag | Description |
|------|-------------|
| --url "<url>" | URL to extract (repeatable, max 10) |
| --objective "<text>" | Focus extraction on specific content |
| --json | Output as JSON |
| --excerpts / --no-excerpts | Include relevant excerpts (default: on) |
| --full-content / --no-full-content | Include full page content |
| --excerpts-max-chars N | Max chars per excerpt |
| --excerpts-max-total-chars N | Max total excerpt chars |
| --full-max-chars N | Max full content chars |
| -o <file> | Save output to file |

Examples

Basic extraction:

parallel-cli extract "https://example.com/article" --json

Focused extraction:

parallel-cli extract "https://example.com/pricing" \
  --objective "pricing tiers and features" \
  --json

Full content for PDFs:

parallel-cli extract "https://example.com/whitepaper.pdf" \
  --full-content \
  --json

Multiple URLs:

parallel-cli extract \
  --url "https://example.com/page1" \
  --url "https://example.com/page2" \
  --json
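
When the CLI is driven from a script, the multi-URL form above can be assembled programmatically. The following is a minimal sketch, not part of the skill itself: `build_extract_command` is a hypothetical helper, it uses only flags listed in the table above, and it assumes that `--objective` and `--full-content` may be combined with repeated `--url` flags.

```python
from typing import List, Optional

MAX_URLS = 10  # limit taken from the "--url" row of the flag table


def build_extract_command(
    urls: List[str],
    objective: Optional[str] = None,
    full_content: bool = False,
) -> List[str]:
    """Return an argv list for `parallel-cli extract` (hypothetical helper)."""
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per call, got {len(urls)}")

    argv = ["parallel-cli", "extract"]
    for url in urls:
        argv += ["--url", url]  # the flag is repeatable
    if objective:
        argv += ["--objective", objective]
    if full_content:
        argv.append("--full-content")
    argv.append("--json")
    return argv


print(build_extract_command(
    ["https://example.com/page1", "https://example.com/page2"],
    objective="pricing tiers and features",
))
```

Returning a list rather than a single string avoids shell quoting issues when the list is later passed to `subprocess.run`.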

Default Workflow

  1. Search with an objective + keyword queries
  2. Inspect titles/URLs/dates; choose the best sources
  3. Extract the specific pages you need (top N URLs)
  4. Answer using the extracted excerpts/content
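
Steps 2 and 3 of this workflow can be sketched as a small selection function. The shape of the search results is an assumption here: the sketch supposes they have already been parsed into a list of dictionaries with a `url` key, in ranked order. `choose_top_urls` is a hypothetical name, not something the skill provides.

```python
from typing import Dict, Iterable, List


def choose_top_urls(results: Iterable[Dict[str, str]], n: int = 3) -> List[str]:
    """Keep the first n distinct URLs, preserving the search ranking."""
    if not 1 <= n <= 10:  # extract accepts at most 10 URLs per call
        raise ValueError("n must be between 1 and 10")
    seen = set()
    picked: List[str] = []
    for item in results:
        url = item.get("url", "").strip()
        if not url or url in seen:
            continue  # skip entries without a URL and duplicates
        seen.add(url)
        picked.append(url)
        if len(picked) == n:
            break
    return picked


# Assumed result shape, for illustration only.
search_results = [
    {"title": "Pricing", "url": "https://example.com/pricing"},
    {"title": "Pricing (duplicate)", "url": "https://example.com/pricing"},
    {"title": "Docs", "url": "https://example.com/docs"},
]
print(choose_top_urls(search_results, n=2))
```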

Best-Practice Prompting

Objective

When extracting, provide context:

  • What specific information you're looking for
  • Why you need it (helps focus extraction)

Good: --objective "Find the installation steps and system requirements"

Poor: --objective "Read the page"

Response Format

Returns structured JSON with:

  • url — source URL
  • title — page title
  • excerpts[] — relevant text excerpts (if enabled)
  • full_content — complete page content (if enabled)
  • publish_date — when available
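
A consumer of this output might load the fields into a typed record so that optional fields have predictable defaults. This is a sketch under assumptions: it treats one result as a JSON object with exactly the keys listed above, and it does not assume anything about how multiple results are wrapped at the top level. `ExtractResult` and `parse_result` are hypothetical names, and the sample payload is invented for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ExtractResult:
    url: str
    title: Optional[str] = None
    excerpts: List[str] = field(default_factory=list)
    full_content: Optional[str] = None
    publish_date: Optional[str] = None


def parse_result(raw: Dict[str, Any]) -> ExtractResult:
    """Map one result object onto ExtractResult; only `url` is required."""
    return ExtractResult(
        url=raw["url"],
        title=raw.get("title"),
        excerpts=list(raw.get("excerpts") or []),  # missing or null -> []
        full_content=raw.get("full_content"),
        publish_date=raw.get("publish_date"),
    )


# Illustrative payload, not real CLI output.
sample = json.loads("""
{
  "url": "https://example.com/article",
  "title": "Example article",
  "excerpts": ["First relevant passage.", "Second relevant passage."]
}
""")
result = parse_result(sample)
print(result.title, len(result.excerpts), result.publish_date)
```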

Output Handling

When turning extracted content into a user-facing answer:

  • Keep content verbatim — do not paraphrase unnecessarily
  • Extract ALL list items exhaustively
  • Strip noise: nav menus, footers, ads, "click here" links
  • Preserve all facts, names, numbers, dates, quotes
  • Include URL + publish_date for transparency
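
Two of these rules, keeping excerpts verbatim and citing the URL with its publish date, lend themselves to a small formatter. The sketch below assumes a result dictionary with the field names from the Response Format section; `format_source_block` is a hypothetical helper, and it does not attempt noise stripping, which depends on the page.

```python
from typing import Any, Dict


def format_source_block(result: Dict[str, Any]) -> str:
    """Render one result with verbatim excerpts and a source line."""
    title = result.get("title") or result["url"]
    lines = [title, ""]
    for excerpt in result.get("excerpts") or []:
        lines.append(excerpt.strip())  # verbatim apart from outer whitespace
        lines.append("")
    source = f"Source: {result['url']}"
    if result.get("publish_date"):
        source += f" (published {result['publish_date']})"
    lines.append(source)
    return "\n".join(lines)


# Illustrative data, not real CLI output.
print(format_source_block({
    "url": "https://example.com/article",
    "title": "Example article",
    "excerpts": ["Install with one command."],
    "publish_date": "2024-01-15",
}))
```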

Running Out of Context?

For long conversations, save results and use sessions_spawn:

parallel-cli extract "<url>" --json -o /tmp/extract-<topic>.json

Then spawn a sub-agent:

{
  "tool": "sessions_spawn",
  "task": "Read /tmp/extract-<topic>.json and summarize the key content.",
  "label": "extract-summary"
}
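
The file name and the spawn request can be derived from a single topic string so the two stay in sync. A minimal sketch, assuming the `/tmp/extract-<topic>.json` naming shown above and the three request keys from the example; `output_path` and `spawn_request` are hypothetical helpers, and the slug rule (lowercase letters and digits joined by hyphens) is a choice made for this sketch.

```python
import json
import re


def output_path(topic: str) -> str:
    """Build /tmp/extract-<topic>.json from a free-form topic string."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    if not slug:
        raise ValueError("topic must contain at least one letter or digit")
    return f"/tmp/extract-{slug}.json"


def spawn_request(topic: str) -> dict:
    """Return a sessions_spawn request that points at the saved file."""
    path = output_path(topic)
    return {
        "tool": "sessions_spawn",
        "task": f"Read {path} and summarize the key content.",
        "label": "extract-summary",
    }


print(json.dumps(spawn_request("Pricing Tiers"), indent=2))
```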

Error Handling

| Exit Code | Meaning |
|-----------|---------|
| 0 | Success |
| 1 | Unexpected error (network, parse) |
| 2 | Invalid arguments |
| 3 | API error (non-2xx) |
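
A calling script might translate these exit codes into messages before deciding whether to retry or stop. The sketch below is one possible wrapper, not the skill's own code: `describe_exit` and `run_extract` are hypothetical names, and the assumption that error details arrive on stderr is not confirmed by this document.

```python
import subprocess
from typing import List, Tuple

# Meanings copied from the exit code table.
EXIT_MEANINGS = {
    0: "Success",
    1: "Unexpected error (network, parse)",
    2: "Invalid arguments",
    3: "API error (non-2xx)",
}


def describe_exit(code: int) -> str:
    """Return the documented meaning, or a fallback for unknown codes."""
    return EXIT_MEANINGS.get(code, f"Undocumented exit code {code}")


def run_extract(argv: List[str]) -> Tuple[int, str]:
    """Run the CLI; return (0, stdout) on success or (code, message) on failure."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 0:
        return 0, proc.stdout
    detail = proc.stderr.strip()  # assumed location of error details
    message = describe_exit(proc.returncode)
    return proc.returncode, f"{message}: {detail}" if detail else message


print(describe_exit(3))
```

`run_extract` expects an argv list such as `["parallel-cli", "extract", "https://example.com/article", "--json"]`.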

Prerequisites

Requires parallel-cli (installed and authenticated). If parallel-cli --version fails, or if a later command fails with an authentication error, tell the user to see https://docs.parallel.ai/integrations/cli and stop.
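
The version check can be automated before any extraction is attempted. This sketch covers only the `parallel-cli --version` part of the prerequisite; authentication errors still have to be caught when a later command fails. `check_cli` is a hypothetical helper, and its `which` and `run` parameters exist only so the function can be exercised without the CLI installed.

```python
import shutil
import subprocess
from typing import Callable, Optional

DOCS_URL = "https://docs.parallel.ai/integrations/cli"


def check_cli(
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> Optional[str]:
    """Return None when the CLI responds, else a message for the user."""
    if which("parallel-cli") is None:
        return f"parallel-cli not found; see {DOCS_URL}"
    try:
        proc = run(["parallel-cli", "--version"], capture_output=True, text=True)
    except OSError as exc:
        return f"parallel-cli could not be started ({exc}); see {DOCS_URL}"
    if proc.returncode != 0:
        return f"parallel-cli --version failed; see {DOCS_URL}"
    return None


problem = check_cli()
print(problem or "parallel-cli is available")
```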

Version History

2 versions in total

  • v1.0.3 (current)
    2026-05-08 12:10 · Safe
  • v1.0.0
    2026-03-28 17:09 · Safe
