paper-cluster-survey-v2-2

Extract structured paper records from one or more local PDFs, arXiv links, DOI links, or general paper URLs, then classify the papers and write an academic s...
Extract structured paper records from local PDFs, arXiv, DOI, or general paper links, then classify the papers and write an academic summary.
huang888596
Uncategorized · clawhub · v2.2.0 · 1 version · Key: not required

Paper Cluster Survey V2.2

Overview

Turn raw paper URLs and PDFs into usable review inputs. Extract structured metadata and text evidence first, then classify the papers, produce a classification table, and write a review that follows common academic survey conventions instead of a rigid fill-in-the-blanks template.

Workflow

1. Normalize the source set

  • Accept multiple local PDF paths, arXiv URLs, DOI URLs, and general paper URLs.
  • Use scripts/normalize-sources.mjs when the source set is mixed or should be stored as a reusable manifest.
  • Preserve the original source string for traceability.
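The normalization step can be sketched roughly as follows. This is an illustrative helper, not the bundled scripts/normalize-sources.mjs: the function names, the `kind`/`id`/`original` fields, and the manifest shape are all assumptions made for the example.

```javascript
// Hypothetical sketch; the real scripts/normalize-sources.mjs may differ.
function normalizeSource(raw) {
  const original = String(raw);
  const value = original.trim();
  const base = { original }; // untouched input, kept for traceability

  // arXiv abstract or PDF links, e.g. https://arxiv.org/abs/<id>
  const arxiv = value.match(
    /^https?:\/\/(?:www\.)?arxiv\.org\/(?:abs|pdf)\/([^?#\s]+?)(?:\.pdf)?(?:[?#].*)?$/i
  );
  if (arxiv) return { ...base, kind: "arxiv", id: arxiv[1] };

  // DOI resolver links, e.g. https://doi.org/10.xxxx/suffix
  const doi = value.match(/^https?:\/\/(?:dx\.)?doi\.org\/(10\.[^?#\s]+)/i);
  if (doi) return { ...base, kind: "doi", id: doi[1] };

  // Any other http(s) URL is treated as a general paper page.
  if (/^https?:\/\//i.test(value)) return { ...base, kind: "url", id: value };

  // Anything else ending in .pdf is assumed to be a local path.
  if (/\.pdf$/i.test(value)) return { ...base, kind: "pdf", id: value };

  return { ...base, kind: "unknown", id: value };
}

// Build a reusable manifest object from a mixed list of inputs.
function buildManifest(sources) {
  return { sources: sources.map(normalizeSource) };
}
```

Keeping `original` separate from the parsed `id` means a later step can always report exactly what the user supplied, even if the parsing guess turns out to be wrong.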

2. Extract paper records before reasoning

  • Use scripts/extract-paper-records.mjs to turn PDFs and URLs into structured records before classification.
  • The extraction pass should gather as much of the following as possible:
    • title
    • authors
    • year
    • venue
    • abstract
    • task
    • method
    • datasets
    • metrics
    • main_contribution
    • limitations
    • source
    • extraction_notes
  • Treat extracted records as the primary context for classification and survey drafting.
  • If important fields are missing, only fall back to direct source reading for the specific missing details.

Read extraction-pipeline.md when deciding how much to trust the extracted fields and when to re-open the raw source.
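As a rough illustration of the record shape and the targeted fallback described above, the sketch below builds an empty record with the listed fields and reports which important ones are still missing. The helper names and the default set of "important" fields are assumptions for the example; the skill itself does not specify them.

```javascript
// Field names come from the list above; everything else is hypothetical.
const RECORD_FIELDS = [
  "title", "authors", "year", "venue", "abstract", "task", "method",
  "datasets", "metrics", "main_contribution", "limitations",
  "source", "extraction_notes",
];

// Create a record with every field present but unset.
function emptyRecord(source) {
  const record = {};
  for (const field of RECORD_FIELDS) record[field] = null;
  record.source = source;
  return record;
}

// List the important fields that are still empty, so that only those
// details need a direct re-read of the raw source.
// The default `required` list is an assumption, not part of the skill.
function missingFields(record, required = ["title", "abstract", "task", "method"]) {
  return required.filter((field) => {
    const value = record[field];
    if (value === null || value === undefined) return true;
    if (typeof value === "string") return value.trim() === "";
    if (Array.isArray(value)) return value.length === 0;
    return false;
  });
}
```

A caller could re-open the raw source only when `missingFields(record)` is non-empty, and only for the fields it names.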

3. Verify evidence quality

  • Do not classify from titles alone when abstract or body text is available.
  • Prefer the abstract, introduction, and method section as evidence.
  • Mark uncertain inferences explicitly.
  • If the extractor had to fall back to weak methods, keep claims conservative.
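One way to make these evidence rules mechanical is sketched below. The `texts` object and its keys are hypothetical inputs (the skill does not define how section text is stored); the function only encodes the preference order and the "title alone is weak" rule stated above.

```javascript
// Hypothetical helper: decide which text to classify from and whether
// the resulting label should be marked as uncertain.
function selectEvidence(texts) {
  const preferred = ["abstract", "introduction", "method"];
  const hasText = (key) =>
    typeof texts[key] === "string" && texts[key].trim() !== "";

  // Use every preferred section that is actually available.
  const used = preferred.filter(hasText);
  if (used.length > 0) return { used, uncertain: false };

  // Only a title is available: allowed, but flagged as a weak inference.
  if (hasText("title")) return { used: ["title"], uncertain: true };

  // Nothing usable was extracted.
  return { used: [], uncertain: true };
}
```

The `uncertain` flag can then be carried into the rationale column so that title-only classifications are visibly marked.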

4. Design the classification scheme

  • Produce a classification scheme before writing the review.
  • Prefer task-based categories first.
  • If tasks are too similar, classify by method family.
  • Use application-domain categories only when they best explain the corpus.
  • Keep the taxonomy shallow unless the corpus is large.
  • Assign one primary category to each paper unless the user explicitly wants multi-label grouping.

Read taxonomy-guidelines.md when the category design is ambiguous.
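The task-first rule with a method-family fallback could be approximated as in the sketch below. It is a simplification under stated assumptions: "too similar" is reduced to "fewer than `minDistinctTasks` distinct task labels", labels are compared after lowercasing, and the function name and return shape are invented for the example.

```javascript
// Hypothetical sketch of choosing a grouping basis and assigning one
// primary category per paper.
function groupPapers(records, minDistinctTasks = 2) {
  const label = (value) =>
    typeof value === "string" && value.trim() !== ""
      ? value.trim().toLowerCase()
      : "unlabeled";

  // Count distinct, non-empty task labels across the corpus.
  const distinctTasks = new Set(records.map((r) => label(r.task)));
  distinctTasks.delete("unlabeled");

  // Prefer tasks; fall back to method families when tasks barely differ.
  const basis = distinctTasks.size >= minDistinctTasks ? "task" : "method";

  // Each paper lands in exactly one group (its primary category).
  const groups = new Map();
  for (const record of records) {
    const category = label(record[basis]);
    if (!groups.has(category)) groups.set(category, []);
    groups.get(category).push(record.title);
  }
  return { basis, groups };
}
```

In practice the category design still needs judgment; a count of distinct labels is only a starting signal, and free-text task labels usually need merging before they are comparable.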

5. Output the classification table

  • Always provide one classification table before the review.
  • Include:
    • paper
    • year
    • category
    • rationale
    • evidence used
  • Keep rationales brief and evidence-based.
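A minimal sketch of rendering the table with the five columns listed above is shown here. The pipe-delimited layout and the "unknown" placeholder for empty cells are assumptions of the example, not something the skill prescribes.

```javascript
// Column names follow the list above; the rendering format is hypothetical.
const COLUMNS = ["paper", "year", "category", "rationale", "evidence used"];

function renderClassificationTable(rows) {
  // Normalize whitespace, escape pipes, and mark empty cells explicitly
  // instead of guessing a value.
  const cell = (value) => {
    const text =
      value === null || value === undefined
        ? ""
        : String(value).replace(/\s+/g, " ").trim();
    return text === "" ? "unknown" : text.replace(/\|/g, "\\|");
  };

  const lines = [
    `| ${COLUMNS.join(" | ")} |`,
    `| ${COLUMNS.map(() => "---").join(" | ")} |`,
  ];
  for (const row of rows) {
    lines.push(`| ${COLUMNS.map((name) => cell(row[name])).join(" | ")} |`);
  }
  return lines.join("\n");
}
```

Writing "unknown" rather than leaving a blank keeps missing years or evidence visible to the reader, in line with the rule against inventing details.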

6. Decide the review shape

Default rule:

  • Write one integrated literature review for the entire corpus after the classification table.

Exception:

  • If the user explicitly asks for "each category write a survey", "分别写综述" (Chinese for "write a separate review for each category"), "per-category review", or an equivalent phrasing, write separate review sections for each category.
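The default and its exception can be expressed as a small decision function. The sketch below only matches the literal trigger phrases quoted above; recognizing "equivalent" requests would need judgment that a substring check cannot provide, and the function name and return values are invented for the example.

```javascript
// Trigger phrases are taken from the rule above; the helper is hypothetical.
const PER_CATEGORY_PHRASES = [
  "each category write a survey",
  "分别写综述",
  "per-category review",
];

// Return "per-category" only on an explicit request, else "integrated".
function reviewShape(userRequest) {
  const text = String(userRequest).toLowerCase();
  const explicit = PER_CATEGORY_PHRASES.some((phrase) =>
    text.includes(phrase.toLowerCase())
  );
  return explicit ? "per-category" : "integrated";
}
```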

7. Write the review in academic survey style

The review must read like a normal survey paper, not a bullet summary dump.

  • Use a concise academic title.
  • Include an abstract when the output is formal enough to justify it.
  • Include keywords when they help position the review.
  • Use an introduction that explains background, significance, scope, source selection, and the organizing logic of the review.
  • Organize the main body by the most defensible basis for the corpus:
    • chronology
    • research themes
    • method families
    • viewpoints or schools
  • End with a conclusion or concluding discussion.
  • Add future directions, outlook, or open problems when the corpus supports them.
  • List references in GB/T 7714 style when bibliographic data is available.

Typical sections in a strong review are:

  • title
  • abstract
  • keywords
  • introduction
  • themed main sections
  • discussion, conclusion, or both
  • future directions or open problems when useful
  • references

Not every output needs every section. Match the structure to the user's request, the corpus size, and the field while staying recognizably review-like.

Read review-paper-style.md when drafting the prose review or choosing section structure.

8. Keep the prose review-like

  • Prefer connected academic prose over bullet dumps.
  • Use tables only to support comparison, not to replace the review.
  • Do not invent datasets, metrics, or reference details.
  • If extracted metadata is incomplete, keep partial references and state what is missing.
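The last rule, keeping a partial reference and stating what is missing, might look like the sketch below. It deliberately does not implement GB/T 7714 punctuation or document-type markers; it only joins whichever of `authors`, `title`, `venue`, and `year` are present and appends a note naming the rest. The function name and the note format are assumptions.

```javascript
// Hypothetical helper: build a partial reference string without
// inventing any missing bibliographic detail.
function partialReference(record) {
  const fields = ["authors", "title", "venue", "year"];
  const present = (value) =>
    Array.isArray(value)
      ? value.length > 0
      : value !== null && value !== undefined && String(value).trim() !== "";

  const missing = fields.filter((field) => !present(record[field]));

  const parts = [];
  if (present(record.authors)) parts.push([].concat(record.authors).join(", "));
  if (present(record.title)) parts.push(String(record.title).trim());
  const tail = [record.venue, record.year].filter(present).map(String).join(", ");
  if (tail) parts.push(tail);

  const text = parts.length > 0 ? parts.join(". ") + "." : "";
  const note = missing.length > 0 ? `[missing: ${missing.join(", ")}]` : "";
  return [text, note].filter(Boolean).join(" ");
}
```

For a record with two authors, a title, and a year but no venue, the result ends with `[missing: venue]`, which makes the gap explicit in the reference list.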

Output Contract

Return results in this order unless the user asks otherwise:

  1. Corpus summary
  2. Classification scheme
  3. Classification table
  4. Formal review article
  5. References

If the user wants structured output, read output-schema.md.
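The default ordering can be enforced with a small assembler, sketched here. The section names are the five items listed above; the `sections` object, the function name, and the plain-text layout are assumptions for illustration and are unrelated to whatever output-schema.md defines.

```javascript
// Section order from the Output Contract; the helper is hypothetical.
const OUTPUT_ORDER = [
  "Corpus summary",
  "Classification scheme",
  "Classification table",
  "Formal review article",
  "References",
];

// Emit the available sections in contract order, skipping empty ones.
function assembleOutput(sections) {
  return OUTPUT_ORDER
    .filter((name) =>
      typeof sections[name] === "string" && sections[name].trim() !== "")
    .map((name) => `${name}\n\n${sections[name].trim()}`)
    .join("\n\n");
}
```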

Bundled Scripts

scripts/normalize-sources.mjs

  • Normalize mixed PDF and URL inputs into a JSON manifest.
  • Use when the source set is large, mixed, or should be reused.

scripts/extract-paper-records.mjs

  • Fetch URLs, resolve likely paper metadata, and extract paper text evidence from URLs or PDFs.
  • Prefer this script before asking the model to reason over a large source set.
  • Use its output as the main context object for classification and review drafting.

scripts/render-formal-review-template.mjs

  • Render a flexible academic-review scaffold from structured paper records.
  • Default to one integrated review.
  • Use --per-category only when the user explicitly asks for separate category reviews.

Quality Bar

  • Run extraction before classification unless the user already gave structured paper records.
  • Keep classification and review consistent with extracted evidence.
  • Use raw source re-reading only to fill important gaps.
  • If the extractor had to rely on weak fallbacks, say so.

Version History

1 version in total

  • v2.2.0 (current)
    2026-05-02 11:03, safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk