Tavily arXiv Paper Fetech Pipeline

Orchestrate a title-to-arXiv metadata workflow for: $ARGUMENTS

Overview

This skill turns a list of paper titles into a reproducible arXiv lookup result:

paper titles
-> Tavily search only
-> canonical arXiv abs URL
-> local arXiv metadata fetch
-> JSONL process log
-> markdown report

The goal is not a one-off answer but a reusable work folder that can be resumed later or consumed by another workflow.

Inputs

Extract from the arguments, or ask the user for:

TITLE_LIST: one or more paper titles
WORKDIR: optional working folder

If WORKDIR is omitted, default to:

0_docs/tavily_arxiv_paper_fetech

Accepted title formats:

a single title
multiple titles separated by newlines
markdown bullet lists

Constants

TAVILY_ONLY = true — Tavily is the only allowed search mechanism
SEQUENTIAL_MODE = true — process titles one by one to reduce rate limits
NO_FALLBACK_SEARCH = true — never switch to arXiv API search, guessed arXiv URLs, or another provider
LOW_CONTEXT_MODE = true — append one JSON line per title, then render markdown from JSONL

Work Folder Layout

Write all outputs under WORKDIR:

| Artifact | Purpose |
|----------|---------|
| `input_titles.md` | normalized input title list |
| `paper_fetches.jsonl` | one JSON line per processed title |
| `paper_fetch_report.md` | report rendered from the JSONL |

Execution Rule

Process titles in order. Do not parallelize Tavily calls.

If Tavily returns a rate-limit error:

wait briefly
retry Tavily only
if it still fails, record the error and continue
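The wait-and-retry rule above can be sketched as a small wrapper. This is a minimal illustration, not the skill's actual implementation: `RateLimitError`, `search_with_retry`, the retry count, and the wait time are all hypothetical, and `search` stands in for whatever callable performs the Tavily query.

```python
import time


class RateLimitError(Exception):
    """Hypothetical marker for a Tavily rate-limit response."""


def search_with_retry(search, query, retries=2, wait_seconds=5.0, sleep=time.sleep):
    """Call search(query); on a rate limit, wait briefly and retry the same call.

    Returns (results, error). error is None on success; on persistent
    failure results is None and error is a short message to log.
    """
    for attempt in range(retries + 1):
        try:
            return search(query), None
        except RateLimitError as exc:
            if attempt == retries:
                # Give up on this title: the caller records the error and moves on.
                return None, f"rate_limited: {exc}"
            sleep(wait_seconds)
```

The wrapper never switches providers; a persistent failure is returned as an error string so the caller can log it and continue with the next title.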

Phase 0: Initialize

Normalize the input title list.
Create WORKDIR.
Write the normalized title list to WORKDIR/input_titles.md.
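A possible sketch of Phase 0, assuming the three accepted input formats listed under Inputs. The function names are hypothetical, writing the titles back out as markdown bullets is an assumption about the file format, and treating a literal `\n` as a separator is an assumption based on the example invocation.

```python
import re
from pathlib import Path

# Leading markdown bullet marker ("- ", "* ", "+ ") with optional indentation.
_BULLET = re.compile(r"^\s*[-*+]\s+")


def normalize_titles(raw):
    """Split raw input into clean titles, dropping bullet markers and blank lines."""
    titles = []
    # Assumption: a literal backslash-n in the argument string separates titles too.
    for line in raw.replace("\\n", "\n").splitlines():
        title = _BULLET.sub("", line)
        title = " ".join(title.split())  # trim and collapse internal whitespace
        if title:
            titles.append(title)
    return titles


def write_input_titles(titles, workdir):
    """Create WORKDIR if needed and write the titles as a markdown bullet list."""
    path = Path(workdir) / "input_titles.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"- {t}\n" for t in titles), encoding="utf-8")
    return path
```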

Phase 1: Tavily resolution

For each title:

Search Tavily with:

"<paper title>" arXiv

Prefer results in this order:

https://arxiv.org/abs/...
https://arxiv.org/html/...
https://arxiv.org/pdf/...
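The preference order above, together with the conversion to a canonical abs URL, might look like the following sketch. The helper name and the regular expression are assumptions; the pattern covers common new-style and old-style arXiv identifiers and keeps any version suffix as found. It should only be applied to results that have already passed the title check, since it ranks by URL form alone.

```python
import re

# New-style ids (2307.15818, optional vN) and old-style ids (hep-th/9901001).
_ARXIV_URL = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(abs|html|pdf)/"
    r"((?:\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?)"
    r"(?:\.pdf)?/?(?:[?#].*)?$"
)
_PREFERENCE = {"abs": 0, "html": 1, "pdf": 2}


def pick_arxiv_url(result_urls):
    """Return the canonical abs URL of the most preferred arXiv result, or None."""
    ranked = []
    for position, url in enumerate(result_urls):
        match = _ARXIV_URL.match(url.strip())
        if match:
            # Sort by URL form first, then by original search-result position.
            ranked.append((_PREFERENCE[match.group(1)], position, match.group(2)))
    if not ranked:
        return None
    return f"https://arxiv.org/abs/{min(ranked)[2]}"
```

URLs that do not parse as arXiv abs, html, or pdf links are ignored rather than repaired, which keeps the sketch in line with the rule against guessed arXiv URLs.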

Accept a result only when the match is reliable:

exact title match after minor normalization
same title with only punctuation, Unicode, or math-formatting differences
arXiv page clearly shows the same paper title

If uncertain, record no_match instead of guessing.
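One conservative way to implement "exact title match after minor normalization" is sketched below. The function names are hypothetical, and the normalization (Unicode NFKC, case folding, punctuation replaced by spaces) is an assumption about what counts as minor; anything that still differs afterwards is recorded as no_match.

```python
import re
import unicodedata


def normalize_title(title):
    """Fold Unicode variants, case, and punctuation so near-identical titles compare equal."""
    text = unicodedata.normalize("NFKC", title).casefold()
    # Replace punctuation (including LaTeX markers such as $, \, {, }, ^, _) with spaces.
    text = re.sub(r"[^\w\s]|_", " ", text)
    return " ".join(text.split())


def match_status(input_title, result_title):
    """Return "ok" only when the normalized titles are identical, else "no_match"."""
    if not result_title:
        return "no_match"
    same = normalize_title(input_title) == normalize_title(result_title)
    return "ok" if same else "no_match"
```

The comparison deliberately errs toward no_match: a title that differs by even one word or digit after normalization is not accepted.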

Phase 2: Local arXiv fetch

Once a reliable arXiv URL is known, run:

python3 ".cursor/skills/tavily-arxiv-paper-fetech/scripts/fetch_arxiv_abs.py" "<arxiv-url>"

This returns compact JSON with:

canonical abs URL
arXiv id
title
authors
abstract
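If the helper is driven from Python rather than typed by hand, a thin wrapper along these lines could keep failures contained. The wrapper name, the timeout, and the `{"error": ...}` shape are assumptions; the only things taken from this document are the script path and the fact that it prints JSON.

```python
import json
import subprocess
import sys

FETCH_SCRIPT = ".cursor/skills/tavily-arxiv-paper-fetech/scripts/fetch_arxiv_abs.py"


def fetch_metadata(arxiv_url, script=FETCH_SCRIPT, timeout=60):
    """Run the helper script on one arXiv URL and return its parsed JSON output.

    On any failure, return {"error": "..."} instead of raising, so the
    caller can log the problem and continue with the next title.
    """
    try:
        proc = subprocess.run(
            [sys.executable, script, arxiv_url],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"error": str(exc)}
    if proc.returncode != 0:
        return {"error": proc.stderr.strip() or f"exit code {proc.returncode}"}
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON from helper: {exc}"}
```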

Phase 3: Low-context JSONL logging

For each title, immediately append one JSON line to:

WORKDIR/paper_fetches.jsonl

Each line should include at least:

index
input_title
tavily_status
tavily_error
arxiv_url
fetch

Status values:

ok
no_match
error
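A minimal sketch of the append step, using the field names and status values listed above. The function names are hypothetical, and the shape of `fetch` (the helper script's JSON output, or null when nothing was fetched) is an assumption. The second helper illustrates how a resumed run could skip titles that are already logged.

```python
import json
from pathlib import Path

VALID_STATUSES = {"ok", "no_match", "error"}


def append_record(workdir, index, input_title, tavily_status,
                  arxiv_url=None, fetch=None, tavily_error=None):
    """Append one JSON line for a processed title to WORKDIR/paper_fetches.jsonl."""
    if tavily_status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {tavily_status!r}")
    record = {
        "index": index,
        "input_title": input_title,
        "tavily_status": tavily_status,
        "tavily_error": tavily_error,
        "arxiv_url": arxiv_url,
        "fetch": fetch,  # assumed: helper script output, or None
    }
    path = Path(workdir) / "paper_fetches.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def processed_indices(workdir):
    """Return the set of indices already logged, so a resumed run can skip them."""
    path = Path(workdir) / "paper_fetches.jsonl"
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as fh:
        return {json.loads(line)["index"] for line in fh if line.strip()}
```

Writing each line immediately, rather than collecting results in memory, is what lets an interrupted run pick up where it stopped.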

Phase 4: Render markdown report

After all titles are processed, run:

python3 ".cursor/skills/tavily-arxiv-paper-fetech/scripts/jsonl_to_paper_fetch_md.py" \
  "WORKDIR/paper_fetches.jsonl" \
  "WORKDIR/paper_fetch_report.md"

Output Format

The rendered report should look like:

# Tavily arXiv Paper Fetech Report

## Results

### 1. Original Title
- Status: ok
- arXiv URL: https://arxiv.org/abs/xxxx.xxxxx
- arXiv ID: xxxx.xxxxx
- Resolved Title: ...
- Authors: ...
- Abstract: ...

Key Rules

Never fabricate a paper match.
Never use non-Tavily search for title resolution.
Keep all outputs inside WORKDIR.
Prefer the local helper script over bringing full arXiv page content into context.

Utility Scripts

scripts/fetch_arxiv_abs.py — fetch compact metadata from a known arXiv URL
scripts/jsonl_to_paper_fetch_md.py — render JSONL to markdown

Additional Resources

For copy-paste invocations and expected outputs, see examples.md.

Example Invocation

/tavily-arxiv-paper-fetech "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control\nOpenVLA: An Open-Source Vision-Language-Action Model" --workdir 0_docs/tavily_arxiv_lookup_run_01

Version History

1 version in total

v1.0.0 (current)

2026-05-07 08:31
