
LiteBrowse

Extracts and summarizes the most relevant webpage passages for focused, low-token research without loading or summarizing the full page.
Author: agitalent
Uncategorized clawhub v0.1.1 1 version 100000 Key: not required
#latest #search #token-efficient #web

Overview

LiteBrowse Skill

Direct access:

  • https://agitalent.github.io/LiteBrowse.md
  • https://github.com/agitalent/agitalent.github.io

Purpose

LiteBrowse is an OpenClaw skill for low-token webpage research.

Use it when:

  • the user wants facts from a specific webpage
  • the page is long or cluttered
  • token cost matters
  • you need the most relevant passages first instead of full-page dumps

Core Rule

Do not load or summarize the full page first.

Always run the local extractor before reasoning on webpage content:

python3 ./scripts/web_relevance_extract.py "<url-or-html-file>" "<query>"

The extractor returns only the most relevant blocks under a fixed character budget.

Use that compact output as the default context for answering.
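
For callers that drive the extractor from Python rather than a shell, the invocation above could be wrapped roughly as follows. This is a minimal sketch, not part of the skill: `build_extract_command` and `run_extractor` are hypothetical helper names, and the default values are taken from the Required Workflow section. Only the script path and the flags that appear in this document (`--top-k`, `--max-chars`, `--format`) are used.

```python
import subprocess
import sys

# Script path as given in this document; adjust if the skill lives elsewhere.
EXTRACTOR = "./scripts/web_relevance_extract.py"


def build_extract_command(source, query, top_k=5, max_chars=2400, fmt="json"):
    """Return the argv list for one extractor run (hypothetical helper)."""
    return [
        sys.executable,  # the current interpreter, standing in for python3
        EXTRACTOR,
        source,          # URL or path to a saved HTML file
        query,           # short query string describing the information target
        "--top-k", str(top_k),
        "--max-chars", str(max_chars),
        "--format", fmt,
    ]


def run_extractor(source, query, **options):
    """Run the extractor once and return its stdout as text."""
    cmd = build_extract_command(source, query, **options)
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or quotes.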

Required Workflow

  1. Restate the information target as a short query string.
  2. Run:

```bash
python3 ./scripts/web_relevance_extract.py "<url-or-html-file>" "<query>" --top-k 5 --max-chars 2400 --format json
```

  3. Read only the returned blocks.
  4. Answer from those blocks if they are sufficient.
  5. Only if recall is clearly insufficient, rerun with one controlled expansion:
    • increase --top-k
    • or increase --max-chars
    • or narrow / refine the query
  6. Do not jump to raw-page scraping unless the extractor failed.
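
The "run once, then at most one controlled expansion" rule in the workflow above can be sketched as a small control loop. Everything named here is hypothetical: `Budget`, `research`, and the injected `run` and `is_sufficient` callables are illustration only, and the default expansion (raising `top_k` by 3) is an arbitrary choice, not something the skill prescribes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Budget:
    """Extractor limits for one run; defaults mirror the workflow command."""
    top_k: int = 5
    max_chars: int = 2400


def research(
    run: Callable[[str, Budget], str],
    is_sufficient: Callable[[str], bool],
    query: str,
    budget: Budget = Budget(),
    expanded: Optional[Budget] = None,
) -> Tuple[str, int]:
    """Run once, then at most one expansion. Returns (output, number_of_runs)."""
    output = run(query, budget)
    if is_sufficient(output):
        return output, 1  # the first run already answers the question: stop
    # One controlled expansion only. Here top_k is raised; raising max_chars
    # or refining the query are the alternatives the workflow lists.
    if expanded is None:
        expanded = replace(budget, top_k=budget.top_k + 3)
    return run(query, expanded), 2
```

Injecting `run` keeps the loop independent of how the extractor is actually invoked, so it can be exercised with a stub before any network access happens.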

Budget Discipline

  • Prefer --max-chars 1200 to 2400 for narrow fact lookup.
  • Keep --top-k between 3 and 6 unless the user explicitly asks for breadth.
  • Narrow the query instead of widening the token budget when possible.
  • If the first run already contains the answer, stop there.
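
As a rough illustration, the ranges above could be checked before each run. `check_budget` is a hypothetical helper; the thresholds come straight from the bullets above, and the `breadth_requested` flag stands in for "the user explicitly asks for breadth".

```python
def check_budget(top_k, max_chars, breadth_requested=False):
    """Return warnings for settings outside the ranges suggested above."""
    warnings = []
    # --top-k should stay within 3-6 unless breadth was explicitly requested.
    if not breadth_requested and not 3 <= top_k <= 6:
        warnings.append(f"--top-k {top_k} is outside the suggested 3-6 range")
    # --max-chars 1200-2400 is the suggested range for narrow fact lookup.
    if not 1200 <= max_chars <= 2400:
        warnings.append(
            f"--max-chars {max_chars} is outside the suggested 1200-2400 range"
        )
    return warnings
```

An empty list means the settings fall inside the suggested ranges; a non-empty list is a prompt to narrow the query rather than widen the budget.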

Output Discipline

When answering:

  • cite which returned block supports the answer
  • say when the extractor output is incomplete or ambiguous
  • distinguish extracted text from your inference
  • do not claim the full page was reviewed unless it actually was
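
One way to keep these habits consistent is to attach the supporting block numbers and an extraction-versus-inference label to every answer sentence. The sketch below assumes that returned blocks can be referred to by their position in the extractor output; `format_answer` is a hypothetical helper, and this document does not specify the extractor's actual output schema.

```python
def format_answer(text, block_numbers, is_inference=False, note=""):
    """Attach supporting block numbers and caveats to one answer sentence."""
    label = "inferred from" if is_inference else "extracted from"
    noun = "block" if len(block_numbers) == 1 else "blocks"
    cited = ", ".join(str(n) for n in block_numbers)
    line = f"{text} ({label} {noun} {cited})"
    if note:
        # Use the note to flag incomplete or ambiguous extractor output.
        line += f" Note: {note}"
    return line
```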

Examples

Find pricing details from a long page:

python3 ./scripts/web_relevance_extract.py "https://example.com/pricing" "pricing tiers api limits enterprise" --max-chars 1600 --top-k 4 --format text

Find job requirements from a careers page:

python3 ./scripts/web_relevance_extract.py "https://example.com/jobs/ml-engineer" "requirements python llm retrieval location" --max-chars 1800 --top-k 5 --format json

Use a saved HTML file:

python3 ./scripts/web_relevance_extract.py "/tmp/page.html" "refund policy cancellation deadline" --max-chars 1200

Failure Handling

If the page cannot be fetched or parsed:

  • report the fetch or parse failure directly
  • ask for a local HTML copy if network access is blocked
  • do not fabricate an answer from URL guesses
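
A minimal sketch of this failure path, assuming the extractor signals fetch or parse errors with a non-zero exit status and a message on stderr (this document does not confirm that behavior). `run_or_report` is a hypothetical helper and the 60-second timeout is an arbitrary choice.

```python
import subprocess


def run_or_report(cmd):
    """Run an extractor command; return (ok, text) without guessing at content."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command could not be started or did not finish in time.
        return False, f"Extractor could not be run: {exc}"
    if result.returncode != 0:
        detail = result.stderr.strip() or "no error output"
        return False, (
            f"Fetch or parse failed (exit {result.returncode}): {detail}. "
            "If network access is blocked, please provide a local HTML copy."
        )
    return True, result.stdout
```

On failure the caller relays the message as-is and stops, rather than answering from what the URL appears to suggest.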

Version History

1 version in total

  • v0.1.1 (current)
    2026-03-31 06:31 Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
