paper-summary-scripted: scripted paper summary generation

Overview

Use this skill when arXiv paper URLs or local paper files need deterministic preprocessing before the four-stage paper summarization workflow runs.

The bundled script downloads arXiv PDFs to local storage when URLs are provided, then handles extraction and cleaning.

Do not parse paper web pages or use HTML content as the paper source.

After preprocessing, run three independent generation stages from the same cleaned paper text, then a fourth verification stage that evaluates all three generated outputs against the original text.

Canonical inputs

Normalize the request into:

language
paperurls for arXiv inputs
paperfiles

Treat empty string, [], null, None, missing field, or blank list as empty.
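The emptiness rule above can be sketched as a small normalizer. This is only an illustration under stated assumptions: the function names `is_empty` and `normalize_request` are hypothetical, and treating the literal strings "null", "None", and "[]" as empty is one reading of the rule (useful when values arrive as command-line strings), not something the bundled script is confirmed to do.

```python
from typing import Any


def is_empty(value: Any) -> bool:
    """Return True for values this skill treats as empty (hypothetical helper)."""
    if value is None:
        return True
    if isinstance(value, str):
        # Assumption: placeholder strings passed on a command line count as empty too.
        return value.strip() in {"", "[]", "null", "None"}
    if isinstance(value, (list, tuple)):
        # A blank list is empty, and so is a list that holds only empty entries.
        return all(is_empty(v) for v in value)
    return False


def normalize_request(raw: dict) -> dict:
    """Reduce a raw request to language, paperurls, and paperfiles."""

    def as_list(value: Any) -> list:
        if is_empty(value):
            return []
        if isinstance(value, str):
            return [value.strip()]
        return [v.strip() if isinstance(v, str) else v
                for v in value if not is_empty(v)]

    language = raw.get("language")
    return {
        "language": "" if is_empty(language) else language.strip(),
        "paperurls": as_list(raw.get("paperurls")),
        "paperfiles": as_list(raw.get("paperfiles")),
    }
```

With this shape, a missing field and an explicit null collapse to the same empty list, so later steps only need to check list length.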

Workflow

If both paperurls and paperfiles are empty, return an error immediately.
Run the preprocessing script:

python scripts/process_papers.py --language "" --paperurls '' --paperfiles '' --output-dir ./runs/paper-summary

Read manifest.json in the output directory.
For each successful item, read the extracted_text_path file and treat its contents as cleaned_text.
Generate these three sections separately from the same cleaned_text:

summary version
detailed version
contribution extraction

After the three sections are complete, run quality judgment using:

original cleaned paper text
summary version
detailed version
contribution extraction

Merge the outputs using references/output-template.md.
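Reading the manifest and loading each cleaned text might look like the sketch below. Only `manifest.json` and `extracted_text_path` come from this document; the `items`, `status`, and `source` keys, the `"ok"` status value, and the rule for relative paths are assumptions made for illustration. The real field definitions are in references/script-usage.md.

```python
import json
from pathlib import Path


def load_cleaned_texts(output_dir: str) -> dict:
    """Map each successful manifest item to its full cleaned text."""
    out = Path(output_dir)
    manifest = json.loads((out / "manifest.json").read_text(encoding="utf-8"))
    cleaned = {}
    for item in manifest.get("items", []):   # "items" is an assumed key
        if item.get("status") != "ok":       # "status" and "ok" are assumed
            continue
        text_path = Path(item["extracted_text_path"])
        if not text_path.is_absolute():
            # Assumption: relative paths are relative to the output directory.
            text_path = out / text_path
        # Read the whole file; previews are never used as the paper source.
        key = item.get("source", str(text_path))  # "source" is an assumed key
        cleaned[key] = text_path.read_text(encoding="utf-8")
    return cleaned
```

Each value in the returned mapping is the cleaned_text that the three generation stages and the quality stage share for that paper.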

Preprocessing rules

The script does deterministic preprocessing only.

Treat URL inputs as arXiv identifiers, arXiv abstract URLs, or arXiv PDF URLs that must resolve to a PDF download.
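As a rough illustration of that rule, the sketch below maps the three accepted input forms to a PDF URL and rejects everything else. It is not the logic of scripts/process_papers.py; the function name `arxiv_pdf_url` and both regular expressions are assumptions, written against the public arXiv identifier formats (new-style such as 2401.01234v2, old-style such as hep-th/9901001).

```python
import re

# New-style IDs (2401.01234, optional vN) and old-style IDs (hep-th/9901001).
_ARXIV_ID = re.compile(
    r"(?:\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?"
)
# Abstract and PDF URLs on arxiv.org, with an optional trailing ".pdf".
_ARXIV_URL = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(?P<rest>.+?)(?:\.pdf)?/?$"
)


def arxiv_pdf_url(source: str) -> str:
    """Resolve an arXiv ID, abstract URL, or PDF URL to a PDF download URL."""
    source = source.strip()
    url_match = _ARXIV_URL.match(source)
    candidate = url_match.group("rest") if url_match else source
    if candidate.lower().startswith("arxiv:"):
        candidate = candidate[len("arxiv:"):]
    if _ARXIV_ID.fullmatch(candidate) is None:
        # Anything else (generic web pages, HTML) is rejected, never scraped.
        raise ValueError(f"not an arXiv identifier or arXiv URL: {source!r}")
    return f"https://arxiv.org/pdf/{candidate}"
```

Raising on unrecognized input keeps the preprocessing deterministic: a non-arXiv URL becomes a recorded failure instead of a silent fallback to page scraping.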

Do not attempt webpage parsing, HTML extraction, or generic site scraping.

Do not use the script's previews as a substitute for the full extracted text.

Treat manifest failures, partial extraction notes, or unsupported formats as evidence that the source may be incomplete.

Generation-stage rules

Consult references/prompts.md for the exact Dify-style prompt patterns and variable mapping.

Summary version

Generate in the requested language.

Must include when available:

original title
research background or pain point
core method name
at least one key experimental number

If no explicit experimental result is provided in the source, state "The source does not provide specific experimental data" (原文未提供具体实验数据 in Chinese) or the equivalent in the requested language.

Do not add praise or filler.

Detailed version

Generate in the requested language.

Use this exact structure:

### 1. Background and motivation
### 2. Core method
### 3. Experimental setup
### 4. Main results and ablation studies
### 5. Limitations (if any)

Only include content supported by the extracted text.
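A structural check on the detailed version can be done without depending on the heading language, by looking only at the `### N.` numbering. The sketch below is a hypothetical helper, not part of the bundled script. It takes the strict reading that all five headings must be present; whether heading 5 may be dropped when a paper states no limitations is not specified here.

```python
import re

# Matches lines such as "### 3. <heading text>" and captures the number.
_HEADING = re.compile(r"^### (\d+)\. \S", re.MULTILINE)


def check_detailed_structure(markdown: str) -> list:
    """Return structural problems; an empty list means the outline is intact."""
    numbers = [int(n) for n in _HEADING.findall(markdown)]
    problems = []
    if numbers != [1, 2, 3, 4, 5]:
        problems.append(
            f"expected headings 1 to 5 in order, found {numbers}"
        )
    return problems
```

Any problems it returns fit naturally into the concrete error list produced by the quality judgment stage.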

Contribution extraction

Generate in the requested language.

Each contribution must be an independent innovation point, not an experimental observation.

Each one must include source-grounded support evidence without inventing citations or page numbers.

Quality judgment

Run this only after the three generated sections exist.

Evaluate summary, detailed, and contribution outputs separately against the original cleaned text.

For each one, provide a 1-5 score and a concrete error list.
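One way to hold the judgment results is a small record per stage that enforces the 1-5 range. The class and function names below are hypothetical; the exact output shape is defined by references/output-template.md, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field

STAGES = ("summary", "detailed", "contribution")


@dataclass
class StageJudgment:
    stage: str                                   # one of STAGES
    score: int                                   # 1 (worst) to 5 (best)
    errors: list = field(default_factory=list)   # concrete error descriptions

    def __post_init__(self) -> None:
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")
        if not isinstance(self.score, int) or not 1 <= self.score <= 5:
            raise ValueError(f"score must be an integer from 1 to 5, got {self.score!r}")


def check_report(judgments: list) -> None:
    """Require exactly one judgment for each of the three generated outputs."""
    seen = sorted(j.stage for j in judgments)
    if seen != sorted(STAGES):
        raise ValueError(f"expected one judgment per stage, got {seen}")
```

Keeping one record per stage mirrors the rule that the three outputs are evaluated separately, not given a single combined score.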

Manifest-aware confidence rules

Downgrade confidence or mention extraction risk when the manifest shows:

download failure
arXiv source normalization failure
partial parsing
fallback decoding
missing quantitative evidence
unreadable PDF or DOCX parsing problems
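The conditions above could be turned into risk notes with a lookup like the following sketch. Every manifest flag name here (`download_failed`, `partial_parse`, and so on) is invented for illustration, as are the `"reduced"` and `"normal"` labels; the actual manifest fields are defined in references/script-usage.md and may differ.

```python
# Hypothetical manifest flags mapped to the risk conditions listed above.
RISK_FLAGS = {
    "download_failed": "download failure",
    "arxiv_normalization_failed": "arXiv source normalization failure",
    "partial_parse": "partial parsing",
    "fallback_decoding": "fallback decoding",
    "no_quantitative_evidence": "missing quantitative evidence",
    "unreadable_document": "unreadable PDF or DOCX parsing problems",
}


def extraction_risks(item: dict) -> list:
    """List the risk conditions that a manifest item reports as true."""
    return [label for flag, label in RISK_FLAGS.items() if item.get(flag)]


def confidence(item: dict) -> str:
    """Downgrade confidence as soon as any risk condition is present."""
    return "reduced" if extraction_risks(item) else "normal"
```

The returned labels can be quoted directly in the final response so the reader sees why confidence was lowered.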

Non-negotiable constraints

Never fabricate paper content missing from the extracted text.
Keep the three generation stages independent before the quality stage.
Preserve the requested language.
Keep different papers separate unless the user explicitly asks for a comparison.

Resources

scripts/process_papers.py: normalize arXiv inputs, download PDFs or read local files, extract text, clean text, and emit manifest.json
references/prompts.md: exact Dify-style prompt logic and variable mapping
references/output-template.md: final response template
references/script-usage.md: script I/O and manifest field definitions

Version history

1 version in total

v1.0.0 (current)

2026-05-07 06:46

Security checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
