
PaperBanana

Generate publication-quality academic diagrams from paper methodology text
dwzhu-pku
Productivity · clawhub · v0.1.0 · 1 version · 99784.5 · Key: required
Stars: 1 · Downloads: 906 · Installs: 123 · Versions: 1 (#latest)

Overview

PaperBanana

Generate publication-quality academic diagrams and pipeline figures from a paper's methodology section and figure caption. PaperBanana orchestrates a multi-agent pipeline (Retriever, Planner, Stylist, Visualizer, Critic) to produce camera-ready figures suitable for venues like NeurIPS, ICML, and ACL.

Environment Setup

cd <repo-root>
uv pip install -r requirements.txt

Set your API key via environment variable or in configs/model_config.yaml.

Option 1 (Recommended): OpenRouter API key — one key for both text reasoning and image generation:

export OPENROUTER_API_KEY="sk-or-v1-..."

Option 2: Google API key — direct access to Gemini API:

export GOOGLE_API_KEY="your-key-here"

If both keys are configured, OpenRouter is used by default.
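
The precedence rule above can be sketched as a small lookup. This is an illustrative sketch only, not the project's actual code: the function name `resolve_provider` and the labels `"openrouter"` and `"google"` are assumptions, and keys set in configs/model_config.yaml are not considered here.

```python
import os


def resolve_provider(env=None):
    """Return the provider implied by the documented precedence.

    Hypothetical helper: the name and the returned labels are illustrative
    and are not taken from the PaperBanana code base.
    """
    env = os.environ if env is None else env
    # OpenRouter wins when both keys are present.
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("GOOGLE_API_KEY"):
        return "google"
    raise RuntimeError(
        "Set OPENROUTER_API_KEY or GOOGLE_API_KEY "
        "(or configure a key in configs/model_config.yaml)."
    )
```

Calling `resolve_provider()` with no argument inspects the current process environment, which is a quick way to check which key a run would pick up.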

Usage

python skill/run.py \
  --content "METHOD_TEXT" \
  --caption "FIGURE_CAPTION" \
  --task diagram \
  --output output.png

Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --content | Yes* | | Method section text to visualize |
| --content-file | Yes* | | Path to a file containing the method text (alternative to --content) |
| --caption | Yes | | Figure caption or visual intent |
| --task | No | diagram | Task type: diagram |
| --output | No | output.png | Output image file path |
| --aspect-ratio | No | 21:9 | Aspect ratio: 21:9, 16:9, or 3:2 |
| --max-critic-rounds | No | 3 | Maximum critic refinement iterations |
| --num-candidates | No | 10 | Number of parallel candidates to generate |
| --retrieval-setting | No | auto | Retrieval mode: auto, manual, random, or none |
| --main-model-name | No | gemini-3.1-pro-preview | Main model for VLM agents. Provider auto-detected from configured API key |
| --image-gen-model-name | No | gemini-3.1-flash-image-preview | Model for image generation. Also supports gemini-3-pro-image-preview |
| --exp-mode | No | demo_full | Pipeline: demo_full (with Stylist) or demo_planner_critic (without Stylist) |

*One of --content or --content-file is required.
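
When calling the script from another program, the argument list can be assembled in one place. The following is a minimal sketch: `build_command` is a hypothetical helper, it uses only flags listed in the table above, and rejecting the case where both `content` and `content_file` are given is a conservative assumption of this sketch rather than documented behavior.

```python
import sys


def build_command(caption, content=None, content_file=None,
                  output="output.png", num_candidates=None):
    """Build an argv list for skill/run.py (hypothetical helper)."""
    # Assumption: exactly one source of method text is passed.
    if (content is None) == (content_file is None):
        raise ValueError("Provide exactly one of content or content_file")

    cmd = [sys.executable, "skill/run.py"]
    if content is not None:
        cmd += ["--content", content]
    else:
        cmd += ["--content-file", content_file]
    cmd += ["--caption", caption, "--task", "diagram", "--output", output]
    if num_candidates is not None:
        cmd += ["--num-candidates", str(num_candidates)]
    return cmd
```

Passing the arguments as a list avoids shell quoting problems when the method text contains quotes or newlines.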

When --num-candidates > 1, each candidate is saved to its own file with an index suffix: _0.png, _1.png, etc.

Output

The absolute path of each saved image is printed to stdout, one per line.
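
Because the paths arrive one per line, a caller can collect them by splitting stdout. This is a sketch under the assumption that nothing other than the saved paths is written to stdout; `parse_output_paths` and `run_and_collect` are hypothetical names, not part of PaperBanana.

```python
import subprocess
from pathlib import Path


def parse_output_paths(stdout):
    """Turn the script's stdout into a list of paths, one per non-empty line."""
    return [Path(line.strip()) for line in stdout.splitlines() if line.strip()]


def run_and_collect(cmd):
    """Run a prepared command list and return the image paths it printed.

    Assumes stdout contains only the saved paths; adjust the filtering
    if the script also logs other messages there.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_output_paths(result.stdout)
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError` instead of silently returning an empty list.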

Examples

Diagram

python skill/run.py \
  --content "We propose a transformer-based encoder-decoder architecture. The encoder consists of 12 self-attention layers with residual connections. The decoder uses cross-attention to attend to encoder outputs and generates the target sequence autoregressively." \
  --caption "Figure 1: Overview of the proposed transformer architecture" \
  --task diagram \
  --output architecture.png

Important Notes

  • Runtime: A single candidate typically takes 3-10 minutes depending on model and network conditions. With the default 10 candidates running in parallel, expect ~10-30 minutes total. Plan accordingly.
  • API calls: Each candidate involves multiple LLM calls (Retriever + Planner + Stylist + Visualizer + up to 3 Critic rounds). Candidates run in parallel for efficiency.
  • Image generation: The Visualizer agent calls an image generation model (Gemini Image) to render diagrams.
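
For budgeting, the notes above suggest a rough count of agent stages per run. The sketch below assumes one call per agent stage and one call per Critic round, which is a simplification: the actual number of API requests may be higher (for example if a Critic round triggers a re-render) or lower (if a stage is shared across candidates). The function name is hypothetical.

```python
def estimate_agent_stages(num_candidates=10, max_critic_rounds=3, with_stylist=True):
    """Rough upper estimate of agent stages, assuming one call per stage."""
    # Retriever, Planner, Visualizer, plus Stylist in demo_full mode.
    stages_per_candidate = 3 + (1 if with_stylist else 0)
    return num_candidates * (stages_per_candidate + max_critic_rounds)


print(estimate_agent_stages())                    # defaults: 10 * (4 + 3) = 70
print(estimate_agent_stages(with_stylist=False))  # demo_planner_critic: 10 * (3 + 3) = 60
```

Lowering --num-candidates or --max-critic-rounds reduces this estimate proportionally, which is the simplest lever for shortening a run.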

About

PaperBanana is based on the PaperVizAgent framework, a reference-driven multi-agent system for automated academic illustration. It was developed as part of the research paper:

> PaperBanana: Automating Academic Illustration for AI Scientists

> Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon

> arXiv:2601.23265

The framework introduces a collaborative team of five specialized agents — Retriever, Planner, Stylist, Visualizer, and Critic — to transform raw scientific content into publication-quality diagrams. Evaluation is conducted on the PaperBananaBench benchmark.
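
The five-agent flow can be pictured as a short control loop. This is a toy sketch of the stage ordering described above, not the PaperVizAgent implementation: the `agents` dictionary, the call signatures, and the way Critic feedback is folded back into the plan are all assumptions made for illustration.

```python
def run_pipeline(content, caption, agents, max_critic_rounds=3, use_stylist=True):
    """Toy orchestration: Retriever -> Planner -> (Stylist) -> Visualizer -> Critic loop.

    `agents` maps stage names to plain callables (hypothetical interface).
    """
    references = agents["retriever"](content, caption)
    plan = agents["planner"](content, caption, references)
    if use_stylist:  # corresponds to demo_full; skipped in demo_planner_critic
        plan = agents["stylist"](plan)
    image = agents["visualizer"](plan)

    for _ in range(max_critic_rounds):
        feedback = agents["critic"](image, content, caption)
        if feedback is None:  # assumption: None means the Critic is satisfied
            break
        # Assumption: feedback is appended to the plan before re-rendering.
        plan = f"{plan}\n[revise] {feedback}"
        image = agents["visualizer"](plan)
    return image
```

Each stage is an ordinary callable here, so stubs can stand in for model calls when experimenting with the control flow.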

Version History

1 version in total

  • v0.1.0 (current)
    2026-03-30 23:32 · Safe · Safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
