
jarvis-research-idea-generator-pro

Generate and evaluate CS/AI/ML research ideas from a research domain or seed ideas. Use when the user asks for paper-grounded research questions, literature-based ideation, recent top-conference trend analysis, multi-agent idea brainstorming, or iterative research idea critique.
user_5afe0d61
Uncategorized community v1.0.1 2 versions 100000 Key: not required

Overview

Research Idea Generator Agent Skill

Purpose

Use this skill when the user wants to generate research ideas from a research domain, optionally with seed ideas. The skill retrieves recent top-conference papers, builds a domain development tree, discovers open problems through multi-agent brainstorming, ranks top questions, iteratively generates and evaluates ideas, and outputs final research questions and ideas with a concise process record.

This skill is designed for research ideation, not for claiming definitive literature coverage. When evidence is incomplete, explicitly say so and record the limitation.

Required Inputs

The user must provide:

domain: <research domain>

The user may optionally provide:

seed_ideas:
  - <initial thought, hypothesis, direction, or constraint>
constraints:
  target_year_range: <default: recent 3 complete publication years>
  min_papers: <default: 50>
  venues: <default: ICLR, ICML, NeurIPS, KDD, WWW, SIGIR, ACL, EMNLP, NAACL, CVPR, ICCV, ECCV, AAAI, IJCAI>
  output_language: <default: zh-CN>
  max_tree_nodes: <default: 60>
  min_tree_nodes: <default: 15>
  final_question_count: <default: 5>
  idea_retry_limit: <default: 5>
  replacement_limit: <default: 3>

If the domain is missing, ask the user for it. If seed ideas are missing, proceed without asking.

Output Directory Contract

At the start of each run, create or verify a dated run directory under outputs/.

Resolve:

RUN_DATE = current local date formatted as YYYY-M-D, for example 2026-5-26
RUN_OUTPUT_DIR = outputs/<RUN_DATE>

All run artifacts must be written under RUN_OUTPUT_DIR. Do not write directly under the top-level outputs/ directory.

Directory structure:

outputs/
  2026-5-26/
    corpus/
    tree_structure/
    questions/
    ideas/
    logs/
    final/
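
The directory contract above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function names `format_run_date` and `ensure_run_dirs` and the `root` parameter are assumptions introduced here.

```python
from datetime import date
from pathlib import Path

# Subdirectories listed in the directory structure above.
SUBDIRS = ("corpus", "tree_structure", "questions", "ideas", "logs", "final")


def format_run_date(day: date) -> str:
    """Format a date as YYYY-M-D without zero padding, e.g. 2026-5-26."""
    return f"{day.year}-{day.month}-{day.day}"


def ensure_run_dirs(root: Path, day: date) -> Path:
    """Create outputs/<RUN_DATE>/ and its subdirectories if they are missing."""
    run_dir = root / "outputs" / format_run_date(day)
    for name in SUBDIRS:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir
```

Calling `ensure_run_dirs(Path("."), date.today())` at the start of a run returns RUN_OUTPUT_DIR, and every later artifact path can be built from that return value so nothing lands directly under outputs/.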

Required final outputs:

outputs/<RUN_DATE>/tree_structure/domain_tree.md
outputs/<RUN_DATE>/tree_structure/domain_tree.json
outputs/<RUN_DATE>/tree_structure/node_paper_mapping.csv
outputs/<RUN_DATE>/final/final_questions_and_ideas.md
outputs/<RUN_DATE>/final/process_summary.md
outputs/<RUN_DATE>/logs/process_record.md

Recommended intermediate outputs:

outputs/<RUN_DATE>/corpus/paper_index.csv
outputs/<RUN_DATE>/corpus/paper_cards.jsonl
outputs/<RUN_DATE>/corpus/retrieval_report.md
outputs/<RUN_DATE>/questions/all_node_questions.json
outputs/<RUN_DATE>/questions/all_node_questions.md
outputs/<RUN_DATE>/questions/top_questions.json
outputs/<RUN_DATE>/questions/top_questions.md
outputs/<RUN_DATE>/ideas/candidate_ideas_round_<N>.json
outputs/<RUN_DATE>/ideas/novelty_checks_round_<N>.json
outputs/<RUN_DATE>/ideas/evaluation_round_<N>.json
outputs/<RUN_DATE>/ideas/rejection_memos_round_<N>.json

Reference Files

Before analysis, read and operationalize this file:

references/evaluation_of_good_ideas.md

Convert it into internal checklists. Do not merely summarize it. Use evaluation_of_good_ideas.md to score, reject, and iterate ideas.

If a reference file is missing, record the error in outputs/<RUN_DATE>/logs/process_record.md, use the fallback criteria embedded in this skill, and explicitly mention the fallback in the final process summary.

Main Pipeline

1. Parse User Input

Convert the user's request into a run configuration:

{
  "domain": "...",
  "seed_ideas": ["..."],
  "constraints": {
    "target_year_range": "recent_3_complete_publication_years",
    "min_papers": 50,
    "venues": ["ICLR", "ICML", "NeurIPS", "KDD", "WWW", "SIGIR", "ACL", "EMNLP", "NAACL", "CVPR", "ICCV", "ECCV", "AAAI", "IJCAI"],
    "output_language": "zh-CN",
    "max_tree_nodes": 60,
    "min_tree_nodes": 15,
    "final_question_count": 5,
    "idea_retry_limit": 5,
    "replacement_limit": 3
  }
}

Resolve “recent 3 years” as the most recent three complete publication years. If the current year's conference proceedings are incomplete, prefer the latest complete three years and document the decision.
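
One possible way to merge user input with the defaults and to resolve the year window is sketched below. The helper names and the `current_year_complete` flag are assumptions made for illustration; the skill only requires that the decision be documented.

```python
from copy import deepcopy

# Defaults copied from the run configuration above.
DEFAULT_CONSTRAINTS = {
    "target_year_range": "recent_3_complete_publication_years",
    "min_papers": 50,
    "venues": ["ICLR", "ICML", "NeurIPS", "KDD", "WWW", "SIGIR", "ACL",
               "EMNLP", "NAACL", "CVPR", "ICCV", "ECCV", "AAAI", "IJCAI"],
    "output_language": "zh-CN",
    "max_tree_nodes": 60,
    "min_tree_nodes": 15,
    "final_question_count": 5,
    "idea_retry_limit": 5,
    "replacement_limit": 3,
}


def build_run_config(domain, seed_ideas=None, constraints=None):
    """Merge user-supplied values over the defaults; the domain is mandatory."""
    if not domain or not domain.strip():
        raise ValueError("domain is required; ask the user for it")
    merged = deepcopy(DEFAULT_CONSTRAINTS)
    merged.update(constraints or {})
    return {
        "domain": domain.strip(),
        "seed_ideas": list(seed_ideas or []),
        "constraints": merged,
    }


def recent_complete_years(current_year, current_year_complete=False, span=3):
    """Return the latest `span` complete publication years, oldest first."""
    # Assumption: the current year counts only if its proceedings are complete.
    last = current_year if current_year_complete else current_year - 1
    return list(range(last - span + 1, last + 1))
```

For a run in 2026 with incomplete proceedings, `recent_complete_years(2026)` yields 2023 through 2025.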

2. Initialize Logs and Outputs

Create the output directories. Start an append-only process record at:

outputs/<RUN_DATE>/logs/process_record.md

The process record must include:

  • parsed input
  • reference loading status
  • search queries and sources
  • paper counts before and after filtering
  • tree construction decisions
  • question generation counts
  • voting summary
  • idea rejection and retry decisions
  • final success or failure counts
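
An append-only record can be kept with a small helper such as the one below. It is a sketch under assumptions: the name `append_record` and the heading-plus-bullets layout are illustrative choices, and a real run may prefer the fixed section structure given later in this skill.

```python
from pathlib import Path


def append_record(run_dir: Path, section: str, lines: list[str]) -> None:
    """Append one section to logs/process_record.md without rewriting earlier text."""
    path = run_dir / "logs" / "process_record.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file, which keeps the log append-only.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"\n## {section}\n")
        for line in lines:
            handle.write(f"- {line}\n")
```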

3. Retrieve Papers

Retrieve at least min_papers valid recent papers from top venues relevant to the domain.

Recommended authoritative sources:

  • OpenReview for ICLR and many NeurIPS/ICML records
  • PMLR for ICML and AISTATS
  • NeurIPS proceedings
  • ACL Anthology for ACL, EMNLP, NAACL, COLING
  • CVF Open Access for CVPR and ICCV; the ECVA paper archive for recent ECCV editions
  • ACM Digital Library or official proceedings pages for KDD, WWW, SIGIR, WSDM, CIKM
  • AAAI proceedings for AAAI, and official IJCAI proceedings pages for IJCAI where available
  • Semantic Scholar, DBLP, OpenAlex, or arXiv only as auxiliary metadata sources, not as sole evidence of venue inclusion

Run multi-pass retrieval:

  1. Search with the exact domain.
  2. Expand with synonyms, task names, benchmark names, and common method families.
  3. Expand with user seed ideas if provided.
  4. Extract high-frequency terms from retrieved titles/abstracts and search again.
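
Step 4 of the retrieval passes can be approximated with a simple document-frequency count, as sketched below. The stopword list, the tokenization regex, and the function name `expansion_terms` are all assumptions; a production run would likely use phrase extraction rather than single tokens.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "via", "from"}


def expansion_terms(texts, known_terms, top_k=5):
    """Return frequent tokens from titles/abstracts that are not already query terms."""
    known = {term.lower() for term in known_terms}
    counts = Counter()
    for text in texts:
        # Use a set so each term is counted at most once per title or abstract.
        tokens = set(re.findall(r"[a-z][a-z\-]+", text.lower()))
        counts.update(tokens - STOPWORDS - known)
    return [term for term, _ in counts.most_common(top_k)]
```

The returned terms feed the next search pass; every query derived this way belongs in the process record alongside the original ones.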

For every candidate paper, collect:

{
  "paper_id": "P-001",
  "title": "...",
  "authors": ["..."],
  "year": 2024,
  "venue": "...",
  "abstract": "...",
  "url": "...",
  "source_url": "...",
  "venue_verified_by": "...",
  "evidence_status": "verified_or_uncertain_or_auxiliary",
  "keywords": ["..."],
  "relevance_score": 0.0,
  "assigned_topics": []
}

Filtering rules:

  1. Keep only target-year papers or documented fallback-year papers.
  2. Keep only top-conference or strongly relevant conference papers.
  3. Prefer main-track/full papers.
  4. Exclude duplicates, workshop-only papers, demos, tutorials, posters without full papers, and obviously irrelevant papers.
  5. Surveys may be used as auxiliary background but must not count toward the min_papers minimum (default 50) unless the user explicitly asks for surveys.
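
The filtering rules above could be applied roughly as follows. This is a simplified sketch: the `paper_type` field, the `min_relevance` threshold of 0.5, and title-based deduplication are assumptions that do not appear in the candidate record, and "posters without full papers" is reduced here to a plain `poster` label.

```python
# Paper types excluded from the valid-paper count (hypothetical labels).
EXCLUDED_TYPES = {"workshop", "demo", "tutorial", "poster", "survey"}


def filter_papers(candidates, target_years, allowed_venues, min_relevance=0.5):
    """Keep in-window, in-venue, relevant, non-duplicate main-track papers."""
    seen_titles = set()
    kept = []
    for paper in candidates:
        # Normalize case and whitespace so trivially different titles collide.
        key = " ".join(paper["title"].lower().split())
        if key in seen_titles:
            continue
        if paper["year"] not in target_years:
            continue
        if paper["venue"] not in allowed_venues:
            continue
        if paper.get("paper_type", "main") in EXCLUDED_TYPES:
            continue
        if paper.get("relevance_score", 0.0) < min_relevance:
            continue
        seen_titles.add(key)
        kept.append(paper)
    return kept
```

A title is marked as seen only once a copy is kept, so a rejected workshop version does not block the main-track version of the same paper.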

If fewer than min_papers valid papers remain, apply recovery steps in order:

  1. Expand keywords.
  2. Add adjacent subfields.
  3. Add strongly related venues.
  4. Lower the relevance threshold carefully.
  5. Extend the time window by one year.

If still fewer than min_papers papers are available, stop full analysis and write:

outputs/<RUN_DATE>/logs/retrieval_failure_report.md

Include searched queries, venues, years, valid paper count, likely reason, and advice for adjusting the domain.

4. Create Paper Cards

For each valid paper, create a structured card:

{
  "paper_id": "P-001",
  "title": "...",
  "problem": "...",
  "core_method": "...",
  "main_contribution": "...",
  "technical_assumptions": ["..."],
  "datasets_or_benchmarks": ["..."],
  "claimed_improvements": ["..."],
  "limitations": ["..."],
  "future_work_signals": ["..."],
  "relevance_to_domain": "...",
  "possible_idea_hooks": ["..."]
}

Rules:

  • Limitations and future-work signals should be grounded in abstracts, method descriptions, experiments, or paper text when available.
  • If the evidence is weak, mark it as uncertain.
  • Do not invent paper claims.
  • Save cards to outputs/<RUN_DATE>/corpus/paper_cards.jsonl.
  • Save a concise index to outputs/<RUN_DATE>/corpus/paper_index.csv.
  • Save retrieval decisions to outputs/<RUN_DATE>/corpus/retrieval_report.md.

5. Build the Domain Tree Pyramid

Organize the literature into a tree-like pyramid.

Default levels:

Level 0: user domain
Level 1: major research directions
Level 2: subdirections
Level 3: specific tasks, method families, problem types, or application scenarios
Level 4+: deeper breakdown only when evidence supports it

Each tree node must follow:

{
  "node_id": "N-001",
  "level": 0,
  "title": "...",
  "definition": "...",
  "parent_id": null,
  "child_ids": ["N-002"],
  "representative_papers": ["P-001", "P-002"],
  "dominant_methods": ["..."],
  "common_assumptions": ["..."],
  "known_limitations": ["..."],
  "open_problem_hints": ["..."]
}

Tree constraints:

  1. Every valid paper must map to at least one leaf node.
  2. Each non-leaf node should generally have 2-7 children.
  3. Merge any single-child node into its parent unless the separation is conceptually necessary.
  4. Split nodes with more than 7 children into coherent clusters.
  5. Keep tree depth mostly between 3 and 5.
  6. Keep total nodes between min_tree_nodes and max_tree_nodes when possible.
  7. If below minimum nodes, further split high-density directions.
  8. If above maximum nodes, merge low-evidence or redundant nodes.
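
Several of the tree constraints are mechanical and can be checked automatically. The sketch below covers node count, child counts, and paper coverage; it assumes that a leaf's `representative_papers` list carries the paper mapping, which may differ from how node_paper_mapping.csv is actually populated. The name `check_tree` is hypothetical.

```python
def check_tree(nodes, paper_ids, min_nodes=15, max_nodes=60):
    """Return a list of human-readable constraint violations (empty if none)."""
    issues = []
    if not min_nodes <= len(nodes) <= max_nodes:
        issues.append(f"node count {len(nodes)} outside [{min_nodes}, {max_nodes}]")
    leaf_papers = set()
    for node in nodes:
        n_children = len(node["child_ids"])
        if n_children == 0:
            # Leaf node: its papers count toward coverage.
            leaf_papers.update(node["representative_papers"])
        elif n_children == 1:
            issues.append(f"{node['node_id']}: single child, consider merging")
        elif n_children > 7:
            issues.append(f"{node['node_id']}: {n_children} children, consider splitting")
    for paper_id in sorted(set(paper_ids) - leaf_papers):
        issues.append(f"{paper_id}: not mapped to any leaf node")
    return issues
```

The output is advisory: constraints phrased as "generally" or "when possible" still need a judgment call, and each merge or split decision should be written to the process record.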

Output:

outputs/<RUN_DATE>/tree_structure/domain_tree.md
outputs/<RUN_DATE>/tree_structure/domain_tree.json
outputs/<RUN_DATE>/tree_structure/node_paper_mapping.csv

The markdown tree must show node path, representative papers, common assumptions, limitations, and open-problem hints.

6. Configure Three Sub-Agents

Simulate or instantiate three sub-agents with distinct roles.

Agent A: Detail Digger

Focus on technical assumptions, failure modes, datasets, metrics, reproducibility, computational cost, and hidden experimental weaknesses.

Agent B: Cross-Domain Expert

Borrow methods, concepts, tasks, or evaluation protocols from a different field.

Pick a source field different from the user's domain:

  • If current domain is NLP, choose CV, data mining, systems, HCI, or cognitive science.
  • If current domain is CV, choose NLP, LLM, data mining, robotics, or graphics.
  • If current domain is LLM, choose IR, HCI, cognitive science, systems, or causal inference.
  • If current domain is data mining, choose NLP, CV, graph learning, causal inference, or economics.
  • Otherwise choose the closest non-overlapping field with useful methodology.

Agent C: Divergent Thinker

Focus on non-obvious combinations, new problem definitions, new benchmarks, new mechanisms, new theoretical framings, and high-risk high-reward ideas.

7. Discover Problems for Every Active Tree Node

Define active node:

A node with enough representative papers, known limitations, or open-problem hints to support at least one concrete research problem.

Usually include Level 1 and below. Include Level 0 only if it supports meaningful field-level problems.

For each candidate active node, decide evidence density:

high: at least 5 representative papers and multiple concrete limitations or open-problem hints
medium: 2-4 representative papers and at least one concrete limitation or open-problem hint
low: fewer than 2 representative papers or only vague limitations

Generate 0-3 final problems per candidate active node:

  1. High-density nodes should usually contribute 2-3 problems.
  2. Medium-density nodes should usually contribute 1-2 problems.
  3. Low-density nodes may contribute 0 problems and must record a skip reason.
  4. Never force a node to produce a problem if doing so would create a vague, unsupported, or pseudo-problem.
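
The density levels and per-node problem counts can be encoded as below. Two points are assumptions rather than rules from this skill: a node with five or more papers but only one concrete signal is treated as medium, and low-density nodes are given an upper bound of one problem.

```python
def evidence_density(n_papers, n_concrete_signals):
    """Classify a node from its paper count and its concrete limitations or hints."""
    if n_papers >= 5 and n_concrete_signals >= 2:
        return "high"
    if n_papers >= 2 and n_concrete_signals >= 1:
        return "medium"
    return "low"


# Usual (min, max) problems per node; the low-density upper bound is an assumption.
PROBLEM_RANGE = {"high": (2, 3), "medium": (1, 2), "low": (0, 1)}


def expected_question_range(densities):
    """Sum the per-node ranges to get the expected total number of questions."""
    low = sum(PROBLEM_RANGE[d][0] for d in densities)
    high = sum(PROBLEM_RANGE[d][1] for d in densities)
    return low, high
```

The summed range gives a sanity check for the "Expected question range" entry in the process record; falling below it is acceptable when nodes are skipped with a recorded reason.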

For each active node, run three rounds:

Round 1: Independent problem proposal

Each sub-agent proposes candidate problems with:

{
  "problem_statement": "...",
  "why_it_matters": "...",
  "evidence_from_papers": ["P-001", "P-002"],
  "current_limitation": "...",
  "affected_node_id": "N-010",
  "level": 2
}

Round 2: Cross-critique

Each sub-agent critiques the others' problems using:

  1. Is the problem real?
  2. Is it too broad?
  3. Is it merely incremental?
  4. Is it supported by paper evidence?
  5. Can it lead to a research idea?

Round 3: Node-level synthesis

The main agent synthesizes the final problems for the node according to its evidence density.

Each final node problem must be:

  1. specific
  2. researchable
  3. evidence-backed
  4. non-duplicate
  5. capable of leading to a solution idea

If a problem duplicates an earlier one, merge it. Generate a replacement only when the node still has enough evidence for another independent problem.

Output:

outputs/<RUN_DATE>/questions/all_node_questions.json
outputs/<RUN_DATE>/questions/all_node_questions.md

8. Vote for Top Questions

All generated questions are scored independently by the three sub-agents.

Use 1-5 scoring:

Significance
Novelty
Evidence
Tractability
Idea Potential
Fit to Good-Idea Checklist

Each score entry:

{
  "question_id": "Q-001",
  "scores": {
    "significance": 5,
    "novelty": 4,
    "evidence": 5,
    "tractability": 4,
    "idea_potential": 5,
    "fit_to_good_idea": 4
  },
  "rationale": "..."
}

The main agent computes:

final_score = mean(all_agent_dimension_scores)

Apply diversity constraints:

  1. Select final_question_count questions (default 5).
  2. At most two questions from the same Level 1 branch.
  3. Remove semantic duplicates and keep the higher-scoring one.
  4. Preserve node ID, level, and full node path.
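
The scoring and the branch-diversity rule can be sketched as follows. Semantic deduplication is left out because it needs a similarity judgment, and the `level1_branch` field is a hypothetical name for the Level 1 ancestor of a question's node.

```python
from statistics import mean


def final_score(agent_entries):
    """Mean over every dimension score from every agent for one question."""
    return mean(score for entry in agent_entries for score in entry["scores"].values())


def select_top(questions, count=5, max_per_branch=2):
    """Pick the highest-scoring questions, at most `max_per_branch` per Level 1 branch."""
    chosen = []
    per_branch = {}
    for question in sorted(questions, key=lambda q: q["final_score"], reverse=True):
        branch = question["level1_branch"]
        if per_branch.get(branch, 0) >= max_per_branch:
            continue  # this branch already has its quota
        chosen.append(question)
        per_branch[branch] = per_branch.get(branch, 0) + 1
        if len(chosen) == count:
            break
    return chosen
```

Questions skipped by the branch quota remain useful: they are natural candidates for the replacement pool if a selected question is later retired.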

Output:

outputs/<RUN_DATE>/questions/top_questions.json
outputs/<RUN_DATE>/questions/top_questions.md

9. Generate Ideas for Top Questions

For each selected top question, run three rounds.

Round 1: Independent solution proposals

The sub-agents should cover different solution origins:

  1. improvement of current-domain methods
  2. definition of new concepts, tasks, benchmarks, or metrics
  3. migration from other fields

Each proposal must include:

{
  "idea_title": "...",
  "target_question_id": "Q-001",
  "core_hypothesis": "...",
  "method_overview": "...",
  "what_is_new": "...",
  "why_it_might_work": "...",
  "required_resources": "...",
  "possible_experiments": ["..."],
  "risks": ["..."]
}

Round 2: Critique and fusion

Agents critique each proposal:

  1. Is it merely an engineering trick?
  2. Is it too close to existing work?
  3. Does it directly solve the target question?
  4. Is there a clear validation plan?
  5. Is the novelty defensible?

Round 3: Unified candidate idea

The main agent synthesizes one candidate idea per top question with:

  1. idea title
  2. target question
  3. node path
  4. motivation
  5. core hypothesis
  6. proposed method
  7. novelty claim
  8. difference from related work
  9. expected contribution
  10. experimental plan
  11. risks and mitigation
  12. why it satisfies the good-idea checklist

Output each round to:

outputs/<RUN_DATE>/ideas/candidate_ideas_round_<N>.json

10. Run Targeted Novelty Check, Evaluate, Reject, Retry, and Replace Ideas

Before scoring a candidate idea, run an idea-specific novelty check. Do not rely only on the initial domain corpus.

For each candidate idea:

  1. Search using the idea title, core mechanism, target question, and 3-5 alternative phrasings.
  2. Retrieve the closest prior work from authoritative venues and auxiliary indexes.
  3. Identify at least 3 closest related papers when possible.
  4. Write a concise delta from the closest prior work.
  5. Mark the novelty evidence as verified, uncertain, or insufficient.

Each novelty check must produce:

{
  "idea_id": "I-001",
  "queries": ["..."],
  "closest_prior_work": [
    {
      "paper_id_or_url": "...",
      "title": "...",
      "year": 2024,
      "venue": "...",
      "why_close": "...",
      "delta": "..."
    }
  ],
  "novelty_evidence_status": "verified_or_uncertain_or_insufficient",
  "novelty_risk": "..."
}

If no credible closest prior work can be found, do not automatically treat the idea as novel. Mark the novelty evidence as uncertain and record the search limitation.

Save novelty checks to:

outputs/<RUN_DATE>/ideas/novelty_checks_round_<N>.json

Use references/evaluation_of_good_ideas.md as the primary rubric. If unavailable, use the fallback rubric in this skill.

Default evaluation record:

{
  "idea_id": "I-001",
  "scores": {
    "novelty": 0,
    "importance": 0,
    "feasibility": 0,
    "clarity": 0,
    "technical_depth": 0,
    "evaluation_plan": 0,
    "difference_from_existing_work": 0,
    "risk_awareness": 0
  },
  "score_scale": "0-100_per_dimension",
  "weights": {
    "novelty": 0.15,
    "importance": 0.15,
    "feasibility": 0.12,
    "clarity": 0.10,
    "technical_depth": 0.15,
    "evaluation_plan": 0.13,
    "difference_from_existing_work": 0.10,
    "risk_awareness": 0.10
  },
  "total_score": 0,
  "decision": "pass",
  "novelty_evidence_status": "verified_or_uncertain_or_insufficient",
  "rejection_reasons": ["..."],
  "must_fix": ["..."]
}

All dimensions are scored from 0 to 100. Compute total_score as the weighted sum defined in references/evaluation_of_good_ideas.md, rounded to the nearest integer.

Default pass rule:

total_score >= 75
AND novelty >= 80
AND importance >= 80
AND feasibility >= 70
AND evaluation_plan >= 60
AND the idea is not a shallow combination of existing work
AND the idea has a clear experimental validation path
AND the idea can articulate contribution beyond prior work
AND novelty_evidence_status is not insufficient
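
The default pass rule can be expressed as a small function. The weights below are copied from the default evaluation record and are assumed to match the reference file; the three qualitative conditions are collapsed into one `qualitative_ok` flag, which is a simplification introduced for this sketch.

```python
WEIGHTS = {
    "novelty": 0.15,
    "importance": 0.15,
    "feasibility": 0.12,
    "clarity": 0.10,
    "technical_depth": 0.15,
    "evaluation_plan": 0.13,
    "difference_from_existing_work": 0.10,
    "risk_awareness": 0.10,
}

# Per-dimension minimums from the default pass rule.
THRESHOLDS = {"novelty": 80, "importance": 80, "feasibility": 70, "evaluation_plan": 60}


def total_score(scores):
    """Weighted sum of 0-100 dimension scores, rounded to an integer."""
    # Note: Python's round() resolves exact .5 ties to the nearest even integer.
    return round(sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS))


def passes(scores, novelty_evidence_status, qualitative_ok=True):
    """Apply the numeric thresholds plus the non-numeric conditions."""
    return (
        total_score(scores) >= 75
        and all(scores[dim] >= minimum for dim, minimum in THRESHOLDS.items())
        and qualitative_ok
        and novelty_evidence_status != "insufficient"
    )
```

An idea scoring 80 on every dimension has a total of 80 and passes, but the same idea fails if feasibility drops to 69 or if the novelty evidence is marked insufficient, even though the weighted total stays above 75.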

If an idea fails, generate a rejection memo:

{
  "idea_id": "I-001",
  "target_question_id": "Q-001",
  "failed_criteria": ["..."],
  "rejection_reasons": ["..."],
  "must_fix": ["..."],
  "forbidden_retry_patterns": ["..."]
}

Then rerun the three-round idea brainstorming for that question.

Retry rules:

  1. Discard the rejected idea.
  2. Do not preserve its core mechanism unless the rejection memo allows it.
  3. If novelty failed, change the core mechanism or framing.
  4. If feasibility failed, simplify the method or change the validation path.
  5. If evaluation failed, define clearer datasets, metrics, baselines, or ablations.
  6. Maximum retry count per question is idea_retry_limit, default 5.

Replacement rules:

If a question fails to produce a passing idea after maximum retries:

  1. Mark the question as retired.
  2. Select the next highest-ranked unused non-duplicate candidate question.
  3. Run idea generation and evaluation for the replacement question.
  4. Maximum replacements: replacement_limit, default 3.
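
The retry and replacement rules amount to a bounded loop, sketched below. `try_idea` is a hypothetical callable standing in for the full brainstorm, novelty check, and evaluation cycle. Reading idea_retry_limit as "retries after the first attempt" is an interpretation, so a question gets up to `1 + retry_limit` attempts here.

```python
def run_idea_loop(top_questions, reserve_questions, try_idea,
                  final_count=5, retry_limit=5, replacement_limit=3):
    """Return (accepted_ideas, retired_questions).

    try_idea(question, attempt) should return an idea if it passes, else None.
    """
    queue = list(top_questions)
    reserve = list(reserve_questions)  # next-ranked unused candidate questions
    accepted, retired = [], []
    replacements = 0
    while queue and len(accepted) < final_count:
        question = queue.pop(0)
        idea = None
        for attempt in range(1 + retry_limit):
            idea = try_idea(question, attempt)
            if idea is not None:
                break
        if idea is not None:
            accepted.append(idea)
            continue
        # No passing idea after all retries: retire and try a replacement question.
        retired.append(question)
        if replacements < replacement_limit and reserve:
            queue.append(reserve.pop(0))
            replacements += 1
    return accepted, retired
```

The length of `accepted` is the true number of passing ideas, which is the figure the final outputs must report.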

If fewer than final_question_count (default 5) passing ideas are obtained after all retries and replacements, final outputs must state the true number of passing ideas. Never present failed ideas as accepted final ideas.

11. Maintain Process Record

Maintain outputs/<RUN_DATE>/logs/process_record.md throughout the run.

Use this structure:

# Process Record

## 1. Input
- Domain:
- Seed ideas:
- Constraints:

## 2. Reference Loading
- evaluation_of_good_ideas:
- Fallback used:

## 3. Retrieval
- Year range:
- Venues:
- Search queries:
- Retrieved papers:
- Valid papers after filtering:
- Recovery actions if any:

## 4. Paper Analysis
- Paper cards generated:
- Uncertain claims:

## 5. Tree Construction
- Number of nodes:
- Number of levels:
- Main directions:
- Merge/split decisions:

## 6. Question Generation
- Active nodes:
- Expected question range:
- Actual questions:
- Skipped low-evidence nodes:
- Deduplication notes:

## 7. Voting
- Top questions:
- Diversity constraints applied:

## 8. Idea Generation and Evaluation
- Question:
- Iteration count:
- Novelty check status:
- Rejections:
- Final decision:

## 9. Final Results
- Qualified ideas:
- Retired questions:
- Known limitations:

Keep the process record concise. Do not include full private deliberation. Include decisions, evidence, and outcomes.

12. Produce Final Documents

Final questions and ideas

Write:

outputs/<RUN_DATE>/final/final_questions_and_ideas.md

Required sections:

# Final Questions and Ideas

## 1. Run Summary
- Domain:
- Number of papers analyzed:
- Number of tree nodes:
- Number of candidate questions:
- Number of final qualified ideas:

## 2. Selected Top Questions

| Question ID | Node Path | Level | Question | Score |
|---|---|---:|---|---:|

## 3. Final Ideas

### Idea 1: <title>

#### Target Question

#### Node Path

#### Motivation

#### Core Hypothesis

#### Proposed Method

#### Novelty

#### Difference from Existing Work

#### Targeted Novelty Check

#### Experimental Plan

#### Expected Contribution

#### Risks and Mitigation

#### Evaluation Score

Process summary

Write:

outputs/<RUN_DATE>/final/process_summary.md

Required sections:

# Process Summary

## Input

## Corpus

## Tree Structure Summary

## Question Discovery Summary

## Voting Summary

## Idea Iteration Summary

## Final Outcome

## Known Limitations

Fallback Good-Idea Checklist

Use this only if references/evaluation_of_good_ideas.md cannot be read.

A good research idea should be:

  1. Problem-driven rather than method-driven.
  2. Important to a recognizable research community.
  3. Clearly different from prior work.
  4. Technically plausible.
  5. Experimentally testable.
  6. Specific enough to execute.
  7. General enough to matter beyond a single toy setting.
  8. Honest about assumptions and risks.
  9. Capable of producing interpretable positive or negative results.
  10. Communicable as a concise contribution.

Fallback Evaluation Rubric

Use this only if references/evaluation_of_good_ideas.md cannot be read.

Score out of 100:

Novelty: 20
Importance: 15
Feasibility: 15
Clarity: 10
Technical Depth: 15
Evaluation Plan: 10
Difference from Existing Work: 10
Risk Awareness: 5

Pass if:

total >= 75
novelty >= 14/20
importance >= 10/15
feasibility >= 10/15
evaluation_plan >= 7/10

Reject if any of these are true:

  1. The idea is just “apply method X to domain Y” without a new mechanism, insight, task, or evaluation.
  2. The novelty claim cannot be distinguished from retrieved prior work.
  3. The method cannot be tested with available or plausible resources.
  4. The question being solved is vague or unsupported by literature evidence.
  5. The idea has no credible baseline or ablation plan.
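
The fallback rubric uses points rather than 0-100 dimension scores, so it needs its own check. The sketch below assumes snake_case dimension keys and a `reject` decision label, neither of which is fixed by this skill; `reject_flags` holds one boolean per automatic-rejection condition listed above.

```python
# Maximum points per dimension in the fallback rubric (sums to 100).
MAX_POINTS = {
    "novelty": 20,
    "importance": 15,
    "feasibility": 15,
    "clarity": 10,
    "technical_depth": 15,
    "evaluation_plan": 10,
    "difference_from_existing_work": 10,
    "risk_awareness": 5,
}

# Minimum points required on the gated dimensions.
MIN_POINTS = {"novelty": 14, "importance": 10, "feasibility": 10, "evaluation_plan": 7}


def fallback_decision(points, reject_flags=()):
    """Return (total, decision) under the fallback rubric."""
    for dim, cap in MAX_POINTS.items():
        if not 0 <= points[dim] <= cap:
            raise ValueError(f"{dim} must be between 0 and {cap}")
    total = sum(points[dim] for dim in MAX_POINTS)
    ok = (
        total >= 75
        and all(points[dim] >= minimum for dim, minimum in MIN_POINTS.items())
        and not any(reject_flags)
    )
    return total, ("pass" if ok else "reject")
```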

Safety and Honesty Rules

  • Do not fabricate paper titles, venues, years, or claims.
  • Do not count a paper as top-conference evidence unless the venue is verified or explicitly marked uncertain.
  • Do not hide failed retrieval, rejected ideas, or incomplete coverage.
  • Do not present speculative limitations as facts.
  • Do not claim exhaustive literature coverage unless the retrieval process actually supports it.
  • Distinguish clearly among verified evidence, reasonable inference, and speculation.

Version History

1 version in total

  • v1.0.1 Initial release (current)
    2026-05-26 23:57 Safe Safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
