Daily Literature Search

Automated daily literature search system for academic researchers. Runs scheduled searches on PubMed, OpenAlex, and Semantic Scholar, then automatically downloads and organizes the papers it finds.
wzr101622 · clawhub v1.0.0 · Key: required

Daily Literature Search Skill

Automated literature search system for academic researchers. Performs scheduled searches across multiple databases (PubMed, OpenAlex, Semantic Scholar), automatically deduplicates results, downloads open-access papers, and generates daily reports.

🎯 Use Cases

  • Daily literature monitoring for specific research topics
  • Automated paper collection for literature reviews
  • Stay updated on latest publications in your field
  • Build personal paper library with automatic categorization

📦 Components

1. Core Search Script (daily_literature_search.py)

Main execution script with the following features:

  • Multi-source search: PubMed, OpenAlex, Semantic Scholar
  • Automatic deduplication: By DOI (within batch + against local library)
  • OA detection: Uses Unpaywall API to identify open-access papers
  • Auto-download: Downloads OA papers from PubMed Central or publisher sites
  • Smart categorization: Classifies papers by topic (configurable keywords)
  • Daily reports: Generates Markdown reports with search statistics

2. Upload Analyzer (analyze_uploaded.py)

Analyzes and categorizes manually uploaded papers:

  • Filename-based classification: Uses keyword matching
  • DOI extraction: From filenames and metadata
  • Batch processing: Handles multiple files at once
  • Report generation: Creates categorization summary
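
How the analyzer recovers a DOI from a filename is not shown here. The following is a minimal sketch, assuming uploaded files are named after their DOI with `/` replaced by `_` (a hypothetical convention). `extract_doi_from_filename` is an illustrative name, not a function confirmed to exist in analyze_uploaded.py.

```python
import re
from pathlib import Path
from typing import Optional

# "10." + 4-9 digit registrant code, a separator, then the DOI suffix.
# Assumption: "/" in the DOI was replaced by "_" when the file was saved.
_DOI_IN_NAME = re.compile(r"(10\.\d{4,9})[_/](\S+)")


def extract_doi_from_filename(filename: str) -> Optional[str]:
    """Return a lower-cased DOI recovered from a PDF filename, or None."""
    stem = Path(filename).stem  # drops the directory part and the ".pdf" suffix
    match = _DOI_IN_NAME.search(stem)
    if match is None:
        return None
    prefix, suffix = match.groups()
    return f"{prefix}/{suffix}".lower()
```

A suffix that itself contains `_` cannot be told apart from an escaped `/`, so a real implementation would likely fall back to PDF metadata in ambiguous cases.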

⚙️ Configuration

Directory Structure

papers/
├── B-ALL/raw/          # Category 1 (e.g., B-ALL research)
├── MM/raw/             # Category 2 (e.g., Multiple Myeloma)
├── OTHER/raw/          # Other papers
├── daily_search_logs/  # Search logs and reports
└── upload_temp/        # Temporary upload directory

Search Keywords (Customizable)

Edit SEARCH_KEYWORDS in daily_literature_search.py:

SEARCH_KEYWORDS = [
    '"inotuzumab ozogamicin"',
    '"Elranatamab"',
    '"Teclistamab"',
    '"Talquetamab"',
    '"Blinatumomab"',
    '("CAR-T" AND "B-ALL")',
]

Classification Keywords

Edit B_ALL_KEYWORDS and MM_KEYWORDS in analyze_uploaded.py to match your research domains.

🚀 Usage

Manual Execution

# Run daily search
python3 papers/daily_literature_search.py

# Analyze uploaded papers
python3 papers/analyze_uploaded.py

Scheduled Execution (Cron)

Add to crontab for automatic daily searches:

# Daily search at 6:30 AM
30 6 * * * /usr/bin/python3 /path/to/papers/daily_literature_search.py >> /path/to/papers/daily_search_logs/cron.log 2>&1

Configuration Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| MAX_RESULTS_PER_KEYWORD | 10 | Max results per keyword per source |
| DATE_RANGE_DAYS | 7 | Search window (most recent N days) |
| SOURCES | ["pm", "oa", "s2"] | Search databases |
| USER_EMAIL | (none) | Contact email for polite API access (env var) |
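
To illustrate how these options might feed a query, here is a sketch that maps them onto a PubMed E-utilities `esearch` request. It assumes the script queries PubMed through `esearch`, which the description does not confirm. `build_pubmed_params` is a hypothetical helper.

```python
from typing import Dict, Optional

# Defaults taken from the Configuration Options above.
MAX_RESULTS_PER_KEYWORD = 10
DATE_RANGE_DAYS = 7

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_params(keyword: str, email: Optional[str] = None) -> Dict[str, str]:
    """Build esearch query parameters for one keyword (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": keyword,
        "retmax": str(MAX_RESULTS_PER_KEYWORD),
        "reldate": str(DATE_RANGE_DAYS),  # restrict to the last N days
        "datetype": "edat",               # filter on the date the record entered PubMed
        "retmode": "json",
    }
    if email:
        params["email"] = email           # identifies the caller to NCBI
    return params
```

The resulting dictionary can be passed as `requests.get(ESEARCH_URL, params=...)`. The OpenAlex and Semantic Scholar sources would need their own parameter builders.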

📊 Output

Daily Report Example

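# 📚 Daily Literature Search Report
**Search date:** 2026-03-18

## 📊 Search Summary
| Category | Found | Downloaded | Paywalled |
|----------|-------|------------|-----------|
| B-ALL | 28 | 0 | 28 |
| MM | 24 | 0 | 24 |
| Total | 53 | 0 | 53 |

## 🔀 Deduplication Statistics
- Raw search results: 130
- After deduplication: 110
- Duplicates within batch: 2
- Already in library: 18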

File Organization

  • Reports: papers/daily_search_logs/daily_report_YYYY-MM-DD.md
  • Logs: papers/daily_search_logs/daily_search_YYYY-MM-DD.log
  • Papers: papers/{CATEGORY}/raw/{DOI}.pdf
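
A DOI contains `/`, which cannot appear in a filename, so the `{DOI}.pdf` pattern implies some escaping step. The sketch below shows one possible scheme, assuming unsafe characters are replaced with `_`. The actual scheme used by the script is not documented, and `paper_path` is a hypothetical helper.

```python
import re
from pathlib import Path


def paper_path(root: Path, category: str, doi: str) -> Path:
    """Map a DOI to papers/{CATEGORY}/raw/{DOI}.pdf with a filesystem-safe name."""
    # Keep letters, digits, dot, underscore and hyphen; replace everything else.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", doi.strip())
    return root / category / "raw" / f"{safe_name}.pdf"


if __name__ == "__main__":
    print(paper_path(Path("papers"), "MM", "10.1234/example.5678"))
```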

🔧 Advanced Features

1. Library Deduplication

Automatically checks new results against existing library:

  • Scans all category directories for existing DOIs
  • Extracts DOIs from filenames and historical logs
  • Skips papers already in library
  • Reports duplicate statistics
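
The steps above can be sketched as a single pass over the results. This is an illustration under stated assumptions, not the script's actual code: each result is assumed to be a dict with an optional `doi` key, and `normalize_doi` and `deduplicate` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def deduplicate(
    results: Iterable[dict], library_dois: Iterable[str]
) -> Tuple[List[dict], Dict[str, int]]:
    """Drop in-batch duplicates and papers whose DOI is already in the library."""
    known: Set[str] = {normalize_doi(d) for d in library_dois}
    seen: Set[str] = set()
    fresh: List[dict] = []
    stats = {"raw": 0, "batch_duplicates": 0, "in_library": 0}
    for paper in results:
        stats["raw"] += 1
        doi = normalize_doi(paper.get("doi") or "")
        if not doi:
            fresh.append(paper)  # no DOI to compare, so keep the record
            continue
        if doi in seen:
            stats["batch_duplicates"] += 1
            continue
        seen.add(doi)
        if doi in known:
            stats["in_library"] += 1
            continue
        fresh.append(paper)
    stats["unique"] = len(fresh)
    return fresh, stats
```

Checking the batch set before the library set means a paper that is both repeated in the batch and already stored is counted once as "in library" and then as a batch duplicate, so the counters always sum to the raw total.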

2. Open Access Detection

Uses Unpaywall API to identify OA papers:

is_oa, oa_url = check_open_access(doi)
if is_oa:
    download_paper(oa_url, save_path)
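
The body of `check_open_access` is not shown. A possible shape, assuming it calls the Unpaywall v2 REST endpoint and reads the `is_oa` and `best_oa_location` fields of the response, is sketched below. It uses the standard library's `urllib` rather than `requests` so that it runs without extra packages, and `parse_unpaywall` is a hypothetical helper.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional, Tuple


def parse_unpaywall(record: dict) -> Tuple[bool, Optional[str]]:
    """Extract (is_oa, best URL) from one Unpaywall response object."""
    if not record.get("is_oa"):
        return False, None
    location = record.get("best_oa_location") or {}
    # Prefer a direct PDF link, fall back to the landing page.
    return True, location.get("url_for_pdf") or location.get("url")


def check_open_access(doi: str, email: Optional[str] = None) -> Tuple[bool, Optional[str]]:
    """Query Unpaywall for a DOI; any network or parse error counts as not OA."""
    email = email or os.environ.get("USER_EMAIL", "")
    url = "https://api.unpaywall.org/v2/{}?email={}".format(
        urllib.parse.quote(doi), urllib.parse.quote(email)
    )
    try:
        with urllib.request.urlopen(url, timeout=15) as response:
            return parse_unpaywall(json.load(response))
    except (OSError, ValueError):
        return False, None
```

Keeping the parsing separate from the HTTP call makes the OA decision testable without network access.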

3. PubMed Central Integration

Automatically tries PMC for biomedical papers:

if pmid and str(pmid).isdigit():
    download_from_pubmed(pmid, save_path)

🛠️ Customization Guide

Change Research Topics

  1. Edit SEARCH_KEYWORDS in daily_literature_search.py
  2. Update category names and keywords
  3. Modify directory structure if needed

Add New Categories

  1. Create new directory: papers/NEW_CATEGORY/raw/
  2. Add classification keywords in classify_paper() function
  3. Update report generation to include new category
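
As an illustration of step 2, keyword-based classification can be driven by one dictionary so that a new category is a single new entry. The signature of `classify_paper()` and the `CATEGORY_KEYWORDS` table below are assumptions, and the keyword lists are examples only.

```python
from typing import Dict, List

# Hypothetical keyword table; the first category with a matching keyword wins.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "B-ALL": ["b-all", "blinatumomab", "inotuzumab"],
    "MM": ["myeloma", "teclistamab", "elranatamab", "talquetamab"],
    # "NEW_CATEGORY": ["keyword one", "keyword two"],
}


def classify_paper(title: str, abstract: str = "") -> str:
    """Return the first category whose keyword occurs in the title or abstract."""
    text = f"{title} {abstract}".lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "OTHER"
```

Plain substring matching is simple but order-dependent: a paper mentioning keywords from two categories lands in whichever category is listed first.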

Integrate with Notification Systems

Add email/Slack/Discord notifications after search completion:

# At end of main()
send_notification(f"Daily search complete: {results['total']} papers found")
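
`send_notification` is left for the user to supply. One possible implementation, assuming a Slack incoming webhook whose URL is stored in a `SLACK_WEBHOOK_URL` environment variable (a hypothetical name), could look like this:

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build the JSON POST request that a Slack incoming webhook expects."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_notification(message: str) -> bool:
    """Post a message to Slack; return False if unconfigured or the call fails."""
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical variable name
    if not webhook_url:
        return False
    try:
        request = build_slack_request(webhook_url, message)
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except OSError:
        return False
```

Returning `False` instead of raising keeps a notification failure from aborting an otherwise successful search run.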

📋 Requirements

Python Dependencies

pip install requests
# Most other modules are standard library

API Access (Optional but Recommended)

  • Semantic Scholar API Key: Higher rate limits
  • OpenAlex API Key: Polite pool access
  • Unpaywall: Free, no key needed (email required)

Set environment variables:

export SEMANTIC_SCHOLAR_API_KEY="your-key"
export OPENALEX_API_KEY="your-key"
export USER_EMAIL="your@email.com"
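
On the Python side, these variables would typically be read once at startup. The sketch below assumes that pattern. `load_api_settings` and `semantic_scholar_headers` are hypothetical helpers, while `x-api-key` is the header Semantic Scholar documents for API keys.

```python
import os
from typing import Dict, Mapping, Optional


def load_api_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Collect optional API credentials; unset or empty variables become None."""
    return {
        "semantic_scholar_key": env.get("SEMANTIC_SCHOLAR_API_KEY") or None,
        "openalex_key": env.get("OPENALEX_API_KEY") or None,
        "email": env.get("USER_EMAIL") or None,
    }


def semantic_scholar_headers(settings: Mapping[str, Optional[str]]) -> Dict[str, str]:
    """Send the API key only when one is configured."""
    key = settings.get("semantic_scholar_key")
    return {"x-api-key": key} if key else {}
```

Because every value is optional, the search can still run unauthenticated at the lower rate limits.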

⚠️ Important Notes

  1. Rate Limits: Respect API rate limits, especially without API keys
  2. Storage: Monitor disk space for downloaded PDFs
  3. Copyright: Only download open-access or legally available papers
  4. Email: Set USER_EMAIL for polite API access
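
For the rate-limit note, a small throttle that spaces out successive requests is one way to stay within provider limits. The sketch below is illustrative only: the `Throttle` class is hypothetical, and the interval to use depends on each provider's documented limits.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Ensure successive calls to wait() are at least min_interval seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous call was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each HTTP request, with one `Throttle` instance per data source, keeps the sources from slowing each other down.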

🔄 Version History

  • 1.0.0 (2026-03-18): Initial release
      • Multi-source search (PubMed, OpenAlex, Semantic Scholar)
      • Automatic deduplication (batch + library)
      • OA detection and download
      • Smart categorization
      • Daily reports with statistics

🤝 Contributing

To contribute improvements:

  1. Fork the skill repository
  2. Test changes with your own literature search
  3. Submit pull request with description of improvements

📄 License

This skill is provided as-is for academic research purposes. Users are responsible for compliance with publisher terms and copyright laws.

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-02 00:24, security status: safe
