Felo Web Extract

Extract web page content from a URL using Felo Web Extract API. Use when users ask to scrape/capture/fetch webpage content, extract article text from URL, co...
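Extracts web page content from a URL using the Felo Web Extract API. Intended for requests to scrape or fetch web page content, extract article body text, and similar tasks.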
Author: wangzhiming1999
Category: content creation · Source: clawhub · Version: v1.0.0 (latest, 1 version) · API key: required

Overview

Felo Web Extract Skill

When to Use

Trigger this skill when the user wants to:

  • Extract or scrape content from a webpage URL
  • Get article/main text from a link
  • Convert a webpage to Markdown or plain text
  • Capture readable content from a URL for summarization or processing

Trigger keywords (examples):

  • extract webpage, scrape URL, fetch page content, web extract, url to markdown
  • Explicit: /felo-web-extract, "use felo web extract"
  • Same intent in other languages (e.g. 网页抓取, 提取网页内容) also triggers this skill

Do NOT use for:

  • Real-time search or Q&A (use felo-search)
  • Generating slides (use felo-slides)
  • Local file content (read files directly)

Setup

1. Get API key

  1. Visit felo.ai
  2. Open Settings -> API Keys
  3. Create and copy your API key

2. Configure environment variable

Linux/macOS:

export FELO_API_KEY="your-api-key-here"

Windows PowerShell:

$env:FELO_API_KEY="your-api-key-here"

How to Execute

Option A: Use the bundled script or packaged CLI

Script (from repo):

node felo-web-extract/scripts/run_web_extract.mjs --url "https://example.com/article" [options]

Packaged CLI (after npm install -g felo-ai): same options, with short forms allowed:

felo web-extract -u "https://example.com" [options]
# Short forms: -u (url), -f (format), -t (timeout, seconds), -j (json)

Options:

| Option | Default | Description |
|--------|---------|-------------|
| --url | (required) | Webpage URL to extract |
| --format | markdown | Output format: html, text, markdown |
| --target-selector | - | CSS selector: extract only this element (e.g. article.main, #content) |
| --wait-for-selector | - | Wait for this selector before extracting (e.g. dynamic content) |
| --readability | false | Enable readability processing (main content only) |
| --crawl-mode | fast | fast or fine |
| --timeout | 60000 (script) / 60 (CLI) | Request timeout: script uses milliseconds, CLI uses seconds (e.g. -t 90) |
| --json / -j | false | Print full API response as JSON |
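
The options above appear to correspond to the request-body fields listed in the API reference further down. The sketch below shows one plausible mapping. The helper name `buildRequestBody` and the camelCase option keys are hypothetical, and the option-to-field mapping itself is an assumption inferred from the matching names, not taken from the bundled script.

```javascript
// Hypothetical helper: maps CLI-style options to the request-body parameter
// names from the API reference. The mapping is an assumption based on the
// matching option and parameter names.
function buildRequestBody(opts) {
  if (!opts.url) {
    throw new Error("--url is required");
  }
  const body = {
    url: opts.url,
    // The CLI table lists markdown as the default format.
    output_format: opts.format ?? "markdown",
    crawl_mode: opts.crawlMode ?? "fast",
  };
  if (opts.readability) body.with_readability = true;
  if (opts.targetSelector) body.target_selector = opts.targetSelector;
  if (opts.waitForSelector) body.wait_for_selector = opts.waitForSelector;
  if (opts.timeoutSeconds != null) {
    // The CLI takes seconds; the request body expects milliseconds.
    body.timeout = Math.round(opts.timeoutSeconds * 1000);
  }
  return body;
}
```

For example, `buildRequestBody({ url: "https://example.com", format: "text", timeoutSeconds: 90 })` would yield a body with `output_format: "text"` and `timeout: 90000`.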

How to write instructions (target_selector + output_format)

When the user wants a specific part of the page or a specific output format, phrase the command like this:

  • Output format: "Extract as text" / "Get markdown" / "Return html" → use --format text, --format markdown, or --format html.
  • Target one element: "Only the main article" / "Just the content inside #main" / "Extract only article.main-content" → use --target-selector "article.main" or the selector they give (e.g. #main, .main-content, article .post).

Examples of user intents and equivalent commands:

| User intent | Command |
|-------------|---------|
| "Extract this page as plain text" | --url "..." --format text |
| "Get only the main content area" | --url "..." --target-selector "main" (or "article") |
| "Extract the div with id=content as markdown" | --url "..." --target-selector "#content" --format markdown |
| "Just the article body, as HTML" | --url "..." --target-selector "article .body" --format html |

Examples:

# Basic: extract as Markdown
node felo-web-extract/scripts/run_web_extract.mjs --url "https://example.com"

# Article-style with readability
node felo-web-extract/scripts/run_web_extract.mjs --url "https://example.com/article" --readability --format markdown

# Raw HTML, printed as the full JSON API response
node felo-web-extract/scripts/run_web_extract.mjs --url "https://example.com" --format html --json

# Only the element matching a CSS selector (e.g. main article)
node felo-web-extract/scripts/run_web_extract.mjs --url "https://example.com" --target-selector "article.main" --format markdown

# Specific output format + target selector
node felo-web-extract/scripts/run_web_extract.mjs --url "https://example.com" --target-selector "#content" --format text

Option B: Call API with curl

curl -X POST "https://openapi.felo.ai/v2/web/extract" \
  -H "Authorization: Bearer $FELO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "output_format": "markdown", "with_readability": true}'
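
The same call can be sketched in Node using the global `fetch` (Node 18+). This is a minimal illustration, not the bundled script's implementation: the helper names `buildExtractRequest` and `extractWithFetch` are hypothetical, while the endpoint, headers, and body fields come from the curl command above.

```javascript
// Hypothetical helper: builds the URL and fetch options equivalent to the
// curl command. Kept separate from the network call so it can be inspected.
function buildExtractRequest(apiKey, body, baseUrl = "https://openapi.felo.ai") {
  return {
    // Strip trailing slashes so a custom base URL joins cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/v2/web/extract`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and returns the parsed JSON response.
async function extractWithFetch(apiKey, body) {
  const { url, init } = buildExtractRequest(apiKey, body);
  const response = await fetch(url, init);
  return response.json();
}
```

A call such as `extractWithFetch(process.env.FELO_API_KEY, { url: "https://example.com", output_format: "markdown", with_readability: true })` would then mirror the curl example.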

API Reference (summary)

  • Endpoint: POST /v2/web/extract
  • Base URL: https://openapi.felo.ai. Override with FELO_API_BASE env if needed.
  • Auth: Authorization: Bearer YOUR_API_KEY

Request body (JSON)

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| url | string | Yes | - | Webpage URL to extract |
| crawl_mode | string | No | fast | fast or fine |
| output_format | string | No | html | html, text, markdown |
| with_readability | boolean | No | - | Use readability (main content) |
| with_links_summary | boolean | No | - | Include links summary |
| with_images_summary | boolean | No | - | Include images summary |
| target_selector | string | No | - | CSS selector for target element |
| wait_for_selector | string | No | - | Wait for selector before extract |
| timeout | integer | No | - | Timeout in milliseconds |
| with_cache | boolean | No | true | Use cache |

Response

Success (200):

{
  "code": 0,
  "message": "success",
  "data": {
    "content": { ... }
  }
}

Extracted content is in data.content; structure depends on output_format.
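
A caller might unwrap the response along these lines. The function name `unwrapContent` is hypothetical, and treating `code === 0` as the success signal is an assumption based on the single sample response above.

```javascript
// Hypothetical helper: returns data.content from a parsed API response, or
// throws if the response does not look like the success sample.
// Assumption: code === 0 indicates success.
function unwrapContent(response) {
  if (!response || response.code !== 0) {
    const reason = response && response.message ? response.message : "unknown error";
    throw new Error(`Web extract failed: ${reason}`);
  }
  if (!response.data || response.data.content === undefined) {
    throw new Error("Web extract failed: response has no data.content");
  }
  return response.data.content;
}
```

Because the structure of `data.content` depends on `output_format`, the helper returns it unchanged and leaves interpretation to the caller.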

Error codes

| HTTP | Code | Description |
|------|------|-------------|
| 400 | - | Parameter validation failed |
| 401 | INVALID_API_KEY | API key invalid or revoked |
| 500/502 | WEB_EXTRACT_FAILED | Extract failed (server or page error) |
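
One way to turn the error table into user-facing guidance is sketched below. The function name `describeExtractError` and the suggestion wording are illustrative assumptions; only the HTTP statuses and codes come from the table.

```javascript
// Hypothetical helper: maps an HTTP status and optional error code from the
// table above to a label and a suggestion. Suggestion text is illustrative.
function describeExtractError(httpStatus, code) {
  if (httpStatus === 401 || code === "INVALID_API_KEY") {
    return {
      error: code || "HTTP 401",
      suggestion: "Check that FELO_API_KEY is set to a valid, unrevoked key.",
    };
  }
  if (httpStatus === 400) {
    return {
      error: code || "HTTP 400",
      suggestion: "Check the URL and the other request parameters.",
    };
  }
  if (httpStatus === 500 || httpStatus === 502 || code === "WEB_EXTRACT_FAILED") {
    return {
      error: code || `HTTP ${httpStatus}`,
      suggestion: "Retry, or raise the timeout for slow pages.",
    };
  }
  // Statuses not listed in the table fall through to a generic hint.
  return { error: code || `HTTP ${httpStatus}`, suggestion: "Retry later." };
}
```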

Output Format

On success (script without --json):

  • Print the extracted content only (for direct use or piping).

With --json:

  • Print full API response including code, message, data.

Error response to user:

## Web Extract Failed

- Error: <code or message>
- URL: <requested url>
- Suggestion: <e.g. check URL, retry, or use --timeout>
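
The error template above could be filled in with a small formatter like this one. The function name `formatExtractFailure` and its argument shape are hypothetical.

```javascript
// Hypothetical helper: renders the "Web Extract Failed" template with the
// given error, URL, and suggestion.
function formatExtractFailure({ error, url, suggestion }) {
  return [
    "## Web Extract Failed",
    "",
    `- Error: ${error}`,
    `- URL: ${url}`,
    `- Suggestion: ${suggestion}`,
  ].join("\n");
}
```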

Important Notes

  • Always check FELO_API_KEY before calling; if missing, return setup instructions.
  • For long articles or slow sites, raise --timeout (or timeout in the request body).
  • Use output_format: "markdown" and with_readability: true for clean article text.
  • API may cache results; use with_cache: false in body only when fresh content is required (script does not expose this by default).
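
The first note, checking FELO_API_KEY before any call, might be implemented as follows. The function name `resolveApiKey` and the result shape are hypothetical; the setup steps in the message are the ones from the Setup section.

```javascript
// Hypothetical helper: looks up FELO_API_KEY in an environment object and
// returns either the key or setup instructions to show the user.
function resolveApiKey(env) {
  const key = (env.FELO_API_KEY || "").trim();
  if (!key) {
    return {
      ok: false,
      message: [
        "FELO_API_KEY is not set.",
        "1. Visit felo.ai",
        "2. Open Settings -> API Keys",
        "3. Create and copy your API key",
        'Then run: export FELO_API_KEY="your-api-key-here"',
      ].join("\n"),
    };
  }
  return { ok: true, key };
}
```

Passing `process.env` as the argument checks the real environment; taking the environment as a parameter keeps the check easy to exercise in isolation.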

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-30 12:56, security status: safe

Security checks

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
