Research Idea Generator Agent Skill

Purpose

Use this skill when the user wants to generate strong research ideas from a research domain, optionally with seed ideas. The skill retrieves recent top-conference papers, builds a domain development tree, constructs a tension map, discovers tension-driven open problems, stress-tests top questions, generates strong idea cards, runs nearest-prior challenges and reviewer red-team critique, reframes weak ideas, and outputs only final ideas that survive strict evaluation.

This skill is designed for research ideation, not for claiming definitive literature coverage. When evidence is incomplete, explicitly say so and record the limitation.

Required Inputs

The user must provide:

domain: <research domain>

The user may optionally provide:

seed_ideas:
  - <initial thought, hypothesis, direction, or constraint>
constraints:
  target_year_range: <default: recent 3 complete publication years>
  min_papers: <default: 50>
  venues: <default: ICLR, ICML, NeurIPS, KDD, WWW, SIGIR, ACL, EMNLP, NAACL, CVPR, ICCV, ECCV, AAAI, IJCAI>
  output_language: <default: zh-CN>
  max_tree_nodes: <default: 60>
  min_tree_nodes: <default: 15>
  final_question_count: <default: 8>
  final_idea_target: <default: 3>
  idea_retry_limit: <default: 5>
  replacement_limit: <default: 3>

If the domain is missing, ask the user for it. If seed ideas are missing, proceed without asking.

Output Directory Contract

At the start of each run, create or verify a dated run directory under outputs/.

Resolve:

RUN_DATE = current local date formatted as YYYY-M-D, for example 2026-5-26
RUN_OUTPUT_DIR = outputs/<RUN_DATE>

All run artifacts must be written under RUN_OUTPUT_DIR. Do not write directly under the top-level outputs/ directory.

Directory structure:

outputs/
  2026-5-26/
    corpus/
    tree_structure/
    tension_map/
    questions/
    ideas/
    logs/
    final/
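
The directory contract above can be sketched as follows. This is a minimal illustration, assuming the run is driven from Python; the helper names `resolve_run_date` and `ensure_run_dirs` are hypothetical and not part of the skill.

```python
from datetime import date
from pathlib import Path

# Subdirectories listed in the directory structure above.
SUBDIRS = ["corpus", "tree_structure", "tension_map", "questions", "ideas", "logs", "final"]


def resolve_run_date(today: date) -> str:
    """Format a date as YYYY-M-D without zero padding, e.g. 2026-5-26."""
    return f"{today.year}-{today.month}-{today.day}"


def ensure_run_dirs(root: Path, today: date) -> Path:
    """Create or verify outputs/<RUN_DATE>/ and its subdirectories (hypothetical helper)."""
    run_dir = root / "outputs" / resolve_run_date(today)
    for name in SUBDIRS:
        # exist_ok=True makes the call safe when the directory already exists.
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir
```

Calling `ensure_run_dirs(Path("."), date.today())` a second time on the same day leaves the existing directories untouched, which matches the "create or verify" wording.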

Required final outputs:

outputs/<RUN_DATE>/tree_structure/domain_tree.md
outputs/<RUN_DATE>/tree_structure/domain_tree.json
outputs/<RUN_DATE>/tree_structure/node_paper_mapping.csv
outputs/<RUN_DATE>/tension_map/tension_map.md
outputs/<RUN_DATE>/tension_map/tension_map.json
outputs/<RUN_DATE>/final/final_questions_and_ideas.md
outputs/<RUN_DATE>/final/process_summary.md
outputs/<RUN_DATE>/logs/process_record.md

Recommended intermediate outputs:

outputs/<RUN_DATE>/corpus/paper_index.csv
outputs/<RUN_DATE>/corpus/paper_cards.jsonl
outputs/<RUN_DATE>/corpus/retrieval_report.md
outputs/<RUN_DATE>/questions/all_node_questions.json
outputs/<RUN_DATE>/questions/all_node_questions.md
outputs/<RUN_DATE>/questions/top_questions.json
outputs/<RUN_DATE>/questions/top_questions.md
outputs/<RUN_DATE>/ideas/candidate_ideas_round_<N>.json
outputs/<RUN_DATE>/ideas/novelty_checks_round_<N>.json
outputs/<RUN_DATE>/ideas/red_team_reviews_round_<N>.json
outputs/<RUN_DATE>/ideas/reframed_ideas_round_<N>.json
outputs/<RUN_DATE>/ideas/evaluation_round_<N>.json
outputs/<RUN_DATE>/ideas/rejection_memos_round_<N>.json

Reference Files

Before analysis, read and operationalize this file:

references/evaluation_of_good_ideas.md

Convert it into internal checklists. Do not merely summarize it. Use evaluation_of_good_ideas.md to score, reject, and iterate ideas.

If a reference file is missing, record the error in outputs/<RUN_DATE>/logs/process_record.md, use the fallback criteria embedded in this skill, and explicitly mention the fallback in the final process summary.

Main Pipeline

1. Parse User Input

Convert the user's request into a run configuration:

{
  "domain": "...",
  "seed_ideas": ["..."],
  "constraints": {
    "target_year_range": "recent_3_complete_publication_years",
    "min_papers": 50,
    "venues": ["ICLR", "ICML", "NeurIPS", "KDD", "WWW", "SIGIR", "ACL", "EMNLP", "NAACL", "CVPR", "ICCV", "ECCV", "AAAI", "IJCAI"],
    "output_language": "zh-CN",
    "max_tree_nodes": 60,
    "min_tree_nodes": 15,
    "final_question_count": 8,
    "final_idea_target": 3,
    "idea_retry_limit": 5,
    "replacement_limit": 3
  }
}

Resolve “recent 3 years” as the most recent three complete publication years. If the current year's conference proceedings are incomplete, prefer the latest complete three years and document the decision.
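
One way to resolve the year range is sketched below. The function name and the `current_year_complete` flag are assumptions made for illustration; deciding whether the current year's proceedings are complete remains a judgment call that should be documented in the process record.

```python
def resolve_target_years(current_year: int, current_year_complete: bool, span: int = 3) -> list[int]:
    """Return the most recent `span` complete publication years (hypothetical helper)."""
    # Skip the current year when its proceedings are still incomplete.
    last = current_year if current_year_complete else current_year - 1
    return list(range(last - span + 1, last + 1))
```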

2. Initialize Logs and Outputs

Create the output directories. Start an append-only process record at:

outputs/<RUN_DATE>/logs/process_record.md

The process record must include:

parsed input
reference loading status
search queries and sources
paper counts before and after filtering
tree construction decisions
question generation counts
voting summary
idea rejection and retry decisions
final success or failure counts

3. Retrieve Papers

Retrieve at least min_papers valid recent papers from top venues relevant to the domain.

Recommended authoritative sources:

OpenReview for ICLR and many NeurIPS/ICML records
PMLR for ICML and AISTATS
NeurIPS proceedings
ACL Anthology for ACL, EMNLP, NAACL, COLING
CVF Open Access for CVPR, ICCV, ECCV
ACM Digital Library or official proceedings pages for KDD, WWW, SIGIR, WSDM, CIKM
AAAI proceedings for AAAI, and official IJCAI proceedings pages for IJCAI where available
Semantic Scholar, DBLP, OpenAlex, or arXiv only as auxiliary metadata sources, not as sole evidence of venue inclusion

Run multi-pass retrieval:

Search with the exact domain.
Expand with synonyms, task names, benchmark names, and common method families.
Expand with user seed ideas if provided.
Extract high-frequency terms from retrieved titles/abstracts and search again.

For every candidate paper, collect:

{
  "paper_id": "P-001",
  "title": "...",
  "authors": ["..."],
  "year": 2024,
  "venue": "...",
  "abstract": "...",
  "url": "...",
  "source_url": "...",
  "venue_verified_by": "...",
  "evidence_status": "verified_or_uncertain_or_auxiliary",
  "keywords": ["..."],
  "relevance_score": 0.0,
  "assigned_topics": []
}

Filtering rules:

Keep only target-year papers or documented fallback-year papers.
Keep only top-conference or strongly relevant conference papers.
Prefer main-track/full papers.
Exclude duplicates, workshop-only papers, demos, tutorials, posters without full papers, and obviously irrelevant papers.
Surveys may be used as auxiliary background but must not count toward the min_papers minimum unless the user explicitly asks for surveys.
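
A partial sketch of these filtering rules follows. It covers only the mechanical checks (year, venue, duplicates, survey counting); judging relevance and paper type still requires reading the record. The `is_survey` field and both function names are hypothetical and do not appear in the candidate schema above.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and strip punctuation so near-identical titles compare equal."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def filter_candidates(papers: list[dict], target_years: list[int], venues: list[str]) -> list[dict]:
    """Keep target-year, in-venue papers and drop duplicate titles (hypothetical helper)."""
    seen: set[str] = set()
    kept = []
    for paper in papers:
        key = normalize_title(paper["title"])
        if paper["year"] not in target_years or paper["venue"] not in venues:
            continue
        if key in seen:
            continue
        seen.add(key)
        kept.append(paper)
    return kept


def count_toward_minimum(papers: list[dict]) -> int:
    """Count papers toward min_papers, excluding surveys (is_survey is an assumed field)."""
    return sum(1 for paper in papers if not paper.get("is_survey", False))
```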

If fewer than min_papers valid papers remain, apply recovery steps in order:

Expand keywords.
Add adjacent subfields.
Add strongly related venues.
Lower the relevance threshold carefully.
Extend the time window by one year.

If still fewer than min_papers papers are available, stop full analysis and write:

outputs/<RUN_DATE>/logs/retrieval_failure_report.md

Include searched queries, venues, years, valid paper count, likely reason, and advice for adjusting the domain.

4. Create Paper Cards

For each valid paper, create a structured card:

{
  "paper_id": "P-001",
  "title": "...",
  "problem": "...",
  "core_method": "...",
  "main_contribution": "...",
  "technical_assumptions": ["..."],
  "datasets_or_benchmarks": ["..."],
  "claimed_improvements": ["..."],
  "limitations": ["..."],
  "future_work_signals": ["..."],
  "relevance_to_domain": "...",
  "possible_idea_hooks": ["..."]
}

Rules:

Limitations and future-work signals should be grounded in abstracts, method descriptions, experiments, or paper text when available.
If the evidence is weak, mark it as uncertain.
Do not invent paper claims.
Save cards to outputs/<RUN_DATE>/corpus/paper_cards.jsonl.
Save a concise index to outputs/<RUN_DATE>/corpus/paper_index.csv.
Save retrieval decisions to outputs/<RUN_DATE>/corpus/retrieval_report.md.

5. Build the Domain Tree Pyramid

Organize the literature into a tree-like pyramid.

Default levels:

Level 0: user domain
Level 1: major research directions
Level 2: subdirections
Level 3: specific tasks, method families, problem types, or application scenarios
Level 4+: deeper breakdown only when evidence supports it

Each tree node must follow:

{
  "node_id": "N-001",
  "level": 0,
  "title": "...",
  "definition": "...",
  "parent_id": null,
  "child_ids": ["N-002"],
  "representative_papers": ["P-001", "P-002"],
  "dominant_methods": ["..."],
  "common_assumptions": ["..."],
  "known_limitations": ["..."],
  "open_problem_hints": ["..."]
}

Tree constraints:

Every valid paper must map to at least one leaf node.
Each non-leaf node should generally have 2-7 children.
Merge any single-child node into its parent unless the separation is conceptually necessary.
Split nodes with more than 7 children into coherent clusters.
Keep tree depth mostly between 3 and 5.
Keep total nodes between min_tree_nodes and max_tree_nodes when possible.
If below minimum nodes, further split high-density directions.
If above maximum nodes, merge low-evidence or redundant nodes.
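
The structural constraints can be checked mechanically, as in the sketch below. It assumes each node follows the schema above and, as a simplification, treats `representative_papers` on leaf nodes as the paper-to-leaf mapping; the function name `check_tree` is hypothetical. Depth and conceptual coherence are not checked.

```python
def check_tree(nodes: list[dict], paper_ids: list[str],
               min_nodes: int = 15, max_nodes: int = 60) -> list[str]:
    """Return human-readable constraint violations for a domain tree (hypothetical helper)."""
    problems = []
    for node in nodes:
        child_count = len(node["child_ids"])
        if child_count == 1:
            problems.append(f"{node['node_id']}: single child, consider merging into parent")
        elif child_count > 7:
            problems.append(f"{node['node_id']}: {child_count} children, consider splitting")
    # Every valid paper must map to at least one leaf node.
    leaves = [node for node in nodes if not node["child_ids"]]
    mapped = {pid for leaf in leaves for pid in leaf["representative_papers"]}
    for pid in sorted(set(paper_ids) - mapped):
        problems.append(f"{pid}: not mapped to any leaf node")
    if not (min_nodes <= len(nodes) <= max_nodes):
        problems.append(f"node count {len(nodes)} outside [{min_nodes}, {max_nodes}]")
    return problems
```

An empty list means the mechanical constraints hold; any entries are candidates for the merge and split decisions recorded in the process record.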

Output:

outputs/<RUN_DATE>/tree_structure/domain_tree.md
outputs/<RUN_DATE>/tree_structure/domain_tree.json
outputs/<RUN_DATE>/tree_structure/node_paper_mapping.csv

The markdown tree must show node path, representative papers, common assumptions, limitations, and open-problem hints.

6. Build a Tension Map

Before generating questions, convert the domain tree and paper cards into a tension map. A strong idea should usually come from a real tension, contradiction, bottleneck, hidden assumption, or evaluation mismatch. Do not jump directly from a topic node to a solution.

For each high- or medium-evidence node, identify 1-4 tensions using these sources:

Hidden assumption: a repeated modeling, data, supervision, deployment, or evaluation assumption that may not hold.
Evaluation mismatch: benchmarks, metrics, or datasets reward behavior different from the real research or deployment need.
Scaling bottleneck: a method works at small scale but breaks in context length, graph size, data volume, model size, inference cost, or annotation cost.
Data or label bottleneck: progress is limited by unavailable, noisy, biased, stale, or non-representative data.
Deployment gap: papers optimize an offline setting while real use requires latency, reliability, privacy, interpretability, adaptation, or human control.
Theory-practice inconsistency: empirical success lacks a mechanistic explanation, or theoretical assumptions do not match practical regimes.
Contradiction between successful methods: two strong method families succeed under incompatible assumptions, suggesting a deeper unifying mechanism or boundary condition.

Each tension must include:

{
  "tension_id": "T-001",
  "node_id": "N-010",
  "node_path": "...",
  "tension_type": "hidden_assumption_or_evaluation_mismatch_or_scaling_bottleneck_or_data_bottleneck_or_deployment_gap_or_theory_practice_inconsistency_or_method_contradiction",
  "statement": "...",
  "evidence_from_papers": ["P-001", "P-002"],
  "why_existing_work_does_not_resolve_it": "...",
  "why_it_could_matter": "...",
  "strength": "high_or_medium_or_low",
  "uncertainty": "..."
}

Hard rules:

A tension with weak evidence may be kept as exploratory, but it cannot directly produce an accepted final idea.
Do not treat a missing paper as a tension. The tension must describe a substantive research gap, not a search failure.
Prefer fewer high-signal tensions over many generic limitations.

Output:

outputs/<RUN_DATE>/tension_map/tension_map.md
outputs/<RUN_DATE>/tension_map/tension_map.json

7. Configure Five Critical Roles (MUST Instantiate as Real Sub-Agents)

MANDATORY: Each of the five roles MUST be instantiated as a real sub-agent using sessions_spawn. It is FORBIDDEN to simulate the roles within the main agent's own reasoning, except under the documented fallback protocol when sub-agent spawning is unavailable.

Rationale: The five roles are designed to provide genuinely independent perspectives. Simulating them within a single reasoning process creates confirmation bias — the same model with the same context will produce correlated outputs, defeating the purpose of multi-perspective review. Real sub-agents run in isolated sessions with their own context, producing truly independent critiques.

Sub-Agent Instantiation Protocol

For each phase where roles are invoked (question stress-testing, idea generation, red-team review, reframe), spawn one sub-agent per role using:

sessions_spawn(
  task: "<role-specific prompt with full context>",
  runtime: "subagent",
  mode: "run",
  label: "<role_name>_<phase>"
)

Each spawned sub-agent receives:

Its role definition (below)
The relevant input (questions / ideas / tension map)
The domain tree and paper corpus summary
Clear output format requirements

The main agent collects all sub-agent outputs, then synthesizes. The main agent never generates role-specific content itself.

Fallback: If sub-agent spawning fails

If sessions_spawn is unavailable or fails repeatedly, record the failure in outputs/<RUN_DATE>/logs/process_record.md and fall back to internal role-play with the following protocol:

For each role, write the role's full definition as a separate system prompt.
Process roles sequentially, not in parallel — complete one role's full output before starting the next.
Explicitly mark all internally-generated role outputs with [FALLBACK: internal simulation, sub-agent unavailable].
Report the fallback in the final process summary.

This fallback is a degradation, not an equivalent. Prioritize fixing sub-agent availability.

Role A: Literature Attorney

Attack novelty. Find closest prior work, likely reviewer comparisons, hidden duplicated mechanisms, and claims that would not survive a related-work section.

Role B: Mechanism Builder

Demand a non-trivial mechanism. Reject ideas that are only module stacking, backbone swaps, loss tweaks, prompt tricks, or "apply X to Y" without a domain-specific mechanism.

Role C: Experimentalist

Design the smallest credible experiment that can falsify the core hypothesis. Check datasets, baselines, metrics, ablations, stress tests, compute cost, and failure interpretation.

Role D: Reviewer 2

Write the strongest top-conference rejection case. Focus on incremental novelty, unclear contribution, weak evidence, missing baselines, unrealistic assumptions, and overclaiming.

Role E: Reframer

Transform weak or incremental ideas by changing the problem framing, mechanism, evaluation target, or boundary condition. The Reframer must not merely polish wording.

8. Discover Tension-Driven Problems

Define active node:

A node with enough representative papers, known limitations, or open-problem hints to support at least one concrete research problem.

Usually include Level 1 and below. Include Level 0 only if it supports meaningful field-level problems.

For each candidate active node, decide evidence density:

high: at least 5 representative papers and multiple concrete limitations or open-problem hints
medium: 2-4 representative papers and at least one concrete limitation or open-problem hint
low: fewer than 2 representative papers or only vague limitations
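
The density thresholds can be expressed as a small classifier. This sketch assumes "multiple" means at least two concrete limitations or open-problem hints, and it treats a node with five or more papers but only one concrete signal as medium; neither case is spelled out above, and the function name is hypothetical.

```python
def evidence_density(num_papers: int, num_concrete_signals: int) -> str:
    """Classify a node as high, medium, or low evidence density (hypothetical helper).

    num_concrete_signals counts concrete limitations plus open-problem hints.
    """
    if num_papers >= 5 and num_concrete_signals >= 2:
        return "high"
    if num_papers >= 2 and num_concrete_signals >= 1:
        return "medium"
    return "low"
```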

Generate 0-3 final problems per candidate active node:

High-density nodes should usually contribute 2-3 problems.
Medium-density nodes should usually contribute 1-2 problems.
Low-evidence nodes may contribute 0 problems and must record a skip reason.
Never force a node to produce a problem if doing so would create a vague, unsupported, or pseudo-problem.
Prefer questions that originate from high- or medium-strength tensions.

For each active node or high-value tension, run three rounds:

Round 1: Independent problem proposal (spawn 4 sub-agents)

Spawn sub-agents for Roles A, B, C, and E. Each sub-agent independently proposes candidate problems. Do NOT merge their outputs until all have completed.

Each problem proposal must include:

{
  "question_id": "Q-001",
  "problem_statement": "...",
  "source_tension_ids": ["T-001"],
  "problem_source": "hidden_assumption_or_evaluation_mismatch_or_scaling_bottleneck_or_data_bottleneck_or_deployment_gap_or_theory_practice_inconsistency_or_method_contradiction",
  "why_it_matters": "...",
  "evidence_from_papers": ["P-001", "P-002"],
  "current_limitation": "...",
  "affected_node_id": "N-010",
  "level": 2,
  "what_would_change_if_solved": "..."
}

Round 2: Cross-critique (spawn 4 sub-agents)

Spawn sub-agents for Roles A-D. Each sub-agent independently critiques ALL proposals from Round 1. Do NOT merge their critiques until all have completed.

Each critique must address:

Is the problem real?
Is it too broad?
Is it merely incremental?
Is it supported by paper evidence?
Is it driven by a genuine tension rather than a taxonomy slot?
Can it lead to a strong research idea?

Round 3: Node-level synthesis

The main agent synthesizes the final problems for the node according to its evidence density.

Each final node problem must be:

specific
researchable
evidence-backed
non-duplicate
capable of leading to a solution idea

If a problem duplicates an earlier one, merge it. Generate a replacement only when the node still has enough evidence for another independent problem.

Output:

outputs/<RUN_DATE>/questions/all_node_questions.json
outputs/<RUN_DATE>/questions/all_node_questions.md

9. Stress-Test and Select Top Questions

All generated questions are scored independently by the five roles.

Use 1-5 scoring:

Significance
Novelty
Evidence
Tractability
Idea Potential
Fit to Good-Idea Checklist
Tension Strength
Falsifiability

Each score entry:

{
  "question_id": "Q-001",
  "scores": {
    "significance": 5,
    "novelty": 4,
    "evidence": 5,
    "tractability": 4,
    "idea_potential": 5,
    "fit_to_good_idea": 4,
    "tension_strength": 5,
    "falsifiability": 4
  },
  "rationale": "..."
}

The main agent computes:

final_score = mean(all_agent_dimension_scores)
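
A minimal sketch of this aggregation for one question, assuming the input is the list of score entries that the roles returned for that question (the function name is hypothetical):

```python
from statistics import mean


def final_score(score_entries: list[dict]) -> float:
    """Average every dimension score from every role for a single question."""
    values = [value for entry in score_entries for value in entry["scores"].values()]
    return mean(values)
```

Because all dimensions are pooled before averaging, every dimension and every role carries equal weight under this reading.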

Apply diversity constraints:

Select up to final_question_count, default 8.
At most two questions from the same Level 1 branch.
Remove semantic duplicates and keep the higher-scoring one.
Preserve node ID, level, and full node path.
Reject any question without a source tension unless it has unusually strong evidence.
Reject any question that cannot produce a falsifiable hypothesis.
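
The selection rules can be sketched as a greedy pass over questions sorted by score. The fields `final_score`, `level1_branch`, `strong_evidence`, and `falsifiable` are assumed annotations added for illustration and are not part of the question schema; semantic deduplication is left out because it needs a similarity judgment.

```python
def select_top_questions(questions: list[dict], final_question_count: int = 8,
                         max_per_branch: int = 2) -> list[dict]:
    """Greedy top-question selection with a per-branch cap (hypothetical helper)."""
    selected = []
    per_branch: dict[str, int] = {}
    for question in sorted(questions, key=lambda q: q["final_score"], reverse=True):
        # Reject questions with no source tension unless evidence is unusually strong.
        if not question["source_tension_ids"] and not question.get("strong_evidence", False):
            continue
        # Reject questions that cannot produce a falsifiable hypothesis.
        if not question.get("falsifiable", True):
            continue
        branch = question["level1_branch"]
        if per_branch.get(branch, 0) >= max_per_branch:
            continue
        selected.append(question)
        per_branch[branch] = per_branch.get(branch, 0) + 1
        if len(selected) == final_question_count:
            break
    return selected
```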

Output:

outputs/<RUN_DATE>/questions/top_questions.json
outputs/<RUN_DATE>/questions/top_questions.md

10. Generate Strong Idea Cards for Top Questions

For each selected top question, run three rounds. The goal is not to fill a quota; the goal is to produce a small number of defensible strong ideas.

Round 1: Independent solution proposals (spawn 3 sub-agents)

Spawn sub-agents for Roles B, C, and E. Each sub-agent independently proposes solutions. Do NOT merge until all have completed.

Roles B, C, and E should cover different solution origins:

domain-specific mechanism redesign
new problem, benchmark, metric, or evaluation target
cross-domain analogy with explicit adaptation
unification or boundary condition between conflicting successful methods

Each proposal must include:

{
  "idea_title": "...",
  "target_question_id": "Q-001",
  "source_tension_ids": ["T-001"],
  "one_sentence_core_insight": "...",
  "why_non_obvious": "...",
  "assumption_changed": "...",
  "core_hypothesis": "...",
  "method_overview": "...",
  "what_is_new": "...",
  "why_it_might_work": "...",
  "key_mechanism": "...",
  "why_not_x_plus_y": "...",
  "required_resources": "...",
  "possible_experiments": ["..."],
  "minimum_falsifiable_experiment": "...",
  "risks": ["..."]
}

Round 2: Critique and fusion (spawn 4 sub-agents)

Spawn sub-agents for Roles A-D. Each sub-agent independently critiques ALL proposals from Round 1. Do NOT merge until all have completed.

Each critique must address:

Is it merely an engineering trick?
Is it too close to existing work?
Does it directly solve the target question?
Is there a clear validation plan?
Is the novelty defensible?
What would Reviewer 2 write as the strongest rejection?
What is the one component that must be new for the idea to matter?

Round 3: Unified candidate idea

The main agent synthesizes at most one strong candidate idea per top question with:

idea title
target question
node path
motivation
source tensions
one-sentence core insight
why the insight is non-obvious
assumption changed or boundary condition exposed
core hypothesis
key mechanism
proposed method
novelty claim
why it is not merely "X + Y"
difference from related work
minimum falsifiable experiment
full experimental plan
expected contribution
risks and mitigation
strongest reviewer attack
why it satisfies the good-idea checklist

Hard reject candidate ideas immediately if any condition is true:

The idea is only "apply method X to domain Y" without a domain-specific mechanism.
The idea is mainly a backbone, loss, module, prompt, or dataset swap.
The core insight can be generated from the question in under five minutes by an informed researcher.
The idea cannot name the closest prior work it must beat or differ from.
The idea has no minimum falsifiable experiment.
The novelty depends mainly on wording, naming, or an unsearched literature gap.
The expected contribution is only a small benchmark improvement without a new mechanism, problem framing, or evaluation insight.

Output each round to:

outputs/<RUN_DATE>/ideas/candidate_ideas_round_<N>.json

11. Run Nearest-Prior Challenge, Reviewer Red-Team, Reframe Loop, and Evaluation

Before scoring a candidate idea, run an idea-specific novelty check. Do not rely only on the initial domain corpus.

For each candidate idea:

Search using the idea title, core mechanism, target question, and 3-5 alternative phrasings.
Retrieve the closest prior work from authoritative venues and auxiliary indexes.
Identify at least 3 closest related papers when possible.
Write a concise delta from the closest prior work.
Mark the novelty evidence as verified, uncertain, or insufficient.

Each nearest-prior challenge must produce:

{
  "idea_id": "I-001",
  "queries": ["..."],
  "closest_prior_work": [
    {
      "paper_id_or_url": "...",
      "title": "...",
      "year": 2024,
      "venue": "...",
      "why_close": "...",
      "delta": "...",
      "what_reviewer_would_claim": "..."
    }
  ],
  "novelty_evidence_status": "verified_or_uncertain_or_insufficient",
  "novelty_risk": "...",
  "defense_against_x_plus_y_claim": "...",
  "true_new_knowledge_claim": "..."
}

If no credible closest prior work can be found, do not automatically treat the idea as novel. Mark the novelty evidence as uncertain and record the search limitation.

Save novelty checks to:

outputs/<RUN_DATE>/ideas/novelty_checks_round_<N>.json

Run a reviewer red-team (spawn 5 sub-agents, one per role) before final evaluation. Each sub-agent independently writes its critique. The main agent collects all critiques and synthesizes the red-team verdict.

Each sub-agent output must include:

{
  "idea_id": "I-001",
  "reviewer_2_rejection": "...",
  "literature_attorney_objections": ["..."],
  "mechanism_builder_objections": ["..."],
  "experimentalist_objections": ["..."],
  "fatal_flaws": ["..."],
  "fixable_flaws": ["..."],
  "recommended_decision_before_reframe": "pass_or_reframe_or_reject"
}

Save red-team reviews to:

outputs/<RUN_DATE>/ideas/red_team_reviews_round_<N>.json

If flaws are fixable, run a reframe loop instead of superficial revision:

Change the problem framing, mechanism, evaluation target, or boundary condition.
Preserve the source tension unless the tension itself was invalid.
Do not keep the same core mechanism after a novelty failure.
Do not merely rename the idea, add modules, or add experiments.
Re-run nearest-prior challenge and red-team review after reframing.

Save reframed ideas to:

outputs/<RUN_DATE>/ideas/reframed_ideas_round_<N>.json

Use references/evaluation_of_good_ideas.md as the primary rubric. If unavailable, use the fallback rubric in this skill.

Default evaluation record:

{
  "idea_id": "I-001",
  "one_sentence_core_insight": "...",
  "scores": {
    "novelty": 0,
    "importance": 0,
    "feasibility": 0,
    "clarity": 0,
    "technical_depth": 0,
    "evaluation_plan": 0,
    "difference_from_existing_work": 0,
    "risk_awareness": 0
  },
  "score_scale": "0-10_per_dimension_(integer)",
  "weights": {
    "novelty": 0.15,
    "importance": 0.15,
    "feasibility": 0.12,
    "clarity": 0.10,
    "technical_depth": 0.15,
    "evaluation_plan": 0.13,
    "difference_from_existing_work": 0.10,
    "risk_awareness": 0.10
  },
  "total_score": 0,
  "decision": "pass",
  "novelty_evidence_status": "verified_or_uncertain_or_insufficient",
  "red_team_decision": "pass_or_reframe_or_reject",
  "minimum_falsifiable_experiment": "...",
  "closest_prior_work_delta": "...",
  "rejection_reasons": ["..."],
  "must_fix": ["..."]
}

All dimensions are scored from 0 to 10 (integers only). Compute total_score as the weighted sum defined in references/evaluation_of_good_ideas.md, rounded to the nearest integer.

Default pass rule:

total_score >= 8
AND novelty >= 9
AND importance >= 9
AND feasibility >= 8
AND clarity >= 8
AND technical_depth >= 8
AND evaluation_plan >= 7
AND difference_from_existing_work >= 8
AND risk_awareness >= 7
AND the idea is not a shallow combination of existing work
AND the idea is not a mere domain transfer without deep adaptation
AND the idea has a clear experimental validation path
AND the idea can articulate contribution beyond prior work
AND novelty_evidence_status is not insufficient
AND red_team_decision is pass
AND the idea has a one-sentence non-obvious core insight
AND the minimum falsifiable experiment can produce an interpretable negative result
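
The numeric part of the pass rule can be sketched as follows. The weights are the defaults from the evaluation record above, written as integer percentages so the weighted sum stays exact; the authoritative weights live in references/evaluation_of_good_ideas.md, which may differ. Rounding ties upward is an assumption, and the qualitative conditions (shallow combination, domain transfer, non-obvious insight, interpretable negative result) still need a reviewer judgment that this code does not make.

```python
# Default weights as integer percentages (sum to 100).
WEIGHTS = {
    "novelty": 15, "importance": 15, "feasibility": 12, "clarity": 10,
    "technical_depth": 15, "evaluation_plan": 13,
    "difference_from_existing_work": 10, "risk_awareness": 10,
}
# Per-dimension minimums from the default pass rule.
THRESHOLDS = {
    "novelty": 9, "importance": 9, "feasibility": 8, "clarity": 8,
    "technical_depth": 8, "evaluation_plan": 7,
    "difference_from_existing_work": 8, "risk_awareness": 7,
}


def total_score(scores: dict[str, int]) -> int:
    """Weighted sum of 0-10 integer scores, rounded to the nearest integer (ties round up)."""
    hundredths = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return (hundredths + 50) // 100


def passes_numeric_rule(scores: dict[str, int], novelty_evidence_status: str,
                        red_team_decision: str) -> bool:
    """Check only the numeric and status conditions of the default pass rule."""
    if total_score(scores) < 8:
        return False
    if any(scores[name] < minimum for name, minimum in THRESHOLDS.items()):
        return False
    return novelty_evidence_status != "insufficient" and red_team_decision == "pass"
```

For example, an idea scored exactly at every per-dimension minimum has a weighted sum of 8.07, which rounds to 8 and therefore clears the total threshold.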

If an idea fails, generate a rejection memo:

{
  "idea_id": "I-001",
  "target_question_id": "Q-001",
  "failed_criteria": ["..."],
  "rejection_reasons": ["..."],
  "must_fix": ["..."],
  "forbidden_retry_patterns": ["..."]
}

Then rerun the three-round idea brainstorming for that question.

Retry rules:

Discard the rejected idea.
Do not preserve its core mechanism unless the rejection memo allows it.
If novelty failed, change the core mechanism or framing.
If feasibility failed, simplify the method or change the validation path.
If evaluation failed, define clearer datasets, metrics, baselines, or ablations.
If red-team review found a fixable flaw, run the reframe loop before generating a totally new idea.
Maximum retry count per question is idea_retry_limit, default 5.

Replacement rules:

If a question fails to produce a passing idea after maximum retries:

Mark the question as retired.
Select the next highest-ranked unused non-duplicate candidate question.
Run idea generation and evaluation for the replacement question.
Maximum replacements: replacement_limit, default 3.

Aim for final_idea_target, default 3, but do not force the count. If fewer strong ideas pass after all retries and replacements, final outputs must state the true number of passing ideas. Never present failed ideas as accepted final ideas.
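
The retry and replacement rules can be sketched as a control loop. Several points are assumptions rather than statements of the skill: `try_idea` is a hypothetical callback that runs one full generate, challenge, red-team, and evaluate cycle and returns a passing idea or `None`; each question gets one initial attempt plus up to `idea_retry_limit` retries; and the loop stops early once `final_idea_target` ideas have passed.

```python
from collections import deque
from typing import Callable, Optional


def run_idea_search(top_questions: list[str], reserve_questions: list[str],
                    try_idea: Callable[[str, int], Optional[str]],
                    final_idea_target: int = 3, idea_retry_limit: int = 5,
                    replacement_limit: int = 3) -> tuple[list[str], list[str]]:
    """Return (accepted ideas, retired questions) under the retry and replacement limits."""
    queue = deque(top_questions)
    reserve = deque(reserve_questions)  # next highest-ranked unused candidates, in order
    accepted: list[str] = []
    retired: list[str] = []
    replacements_used = 0
    while queue and len(accepted) < final_idea_target:
        question = queue.popleft()
        idea = None
        for attempt in range(1 + idea_retry_limit):
            idea = try_idea(question, attempt)
            if idea is not None:
                break
        if idea is not None:
            accepted.append(idea)
            continue
        # No passing idea after all retries: retire and pull a replacement if allowed.
        retired.append(question)
        if reserve and replacements_used < replacement_limit:
            queue.append(reserve.popleft())
            replacements_used += 1
    return accepted, retired
```

The function returns whatever passed, even if that is fewer than the target, so the final outputs can state the true number of passing ideas.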

12. Maintain Process Record

Maintain outputs/<RUN_DATE>/logs/process_record.md throughout the run.

Use this structure:

# Process Record

## 1. Input
- Domain:
- Seed ideas:
- Constraints:

## 2. Reference Loading
- evaluation_of_good_ideas:
- Fallback used:

## 3. Retrieval
- Year range:
- Venues:
- Search queries:
- Retrieved papers:
- Valid papers after filtering:
- Recovery actions if any:

## 4. Paper Analysis
- Paper cards generated:
- Uncertain claims:

## 5. Tree Construction
- Number of nodes:
- Number of levels:
- Main directions:
- Merge/split decisions:

## 6. Tension Map
- Tensions generated:
- High-strength tensions:
- Weak or discarded tensions:
- Key contradictions or bottlenecks:

## 7. Question Generation
- Active nodes:
- Expected question range:
- Actual questions:
- Skipped low-evidence nodes:
- Questions rejected for weak tension or weak falsifiability:
- Deduplication notes:

## 8. Question Stress Test and Voting
- Top questions:
- Diversity constraints applied:

## 9. Idea Generation, Red-Team, and Evaluation
- Question:
- Iteration count:
- Novelty check status:
- Red-team decision:
- Reframe actions:
- Rejections:
- Final decision:

## 10. Final Results
- Qualified ideas:
- Retired questions:
- Known limitations:

Keep the process record concise. Do not include full private deliberation. Include decisions, evidence, and outcomes.

13. Produce Final Documents

Final questions and ideas

Write:

outputs/<RUN_DATE>/final/final_questions_and_ideas.md

Required sections:

# Final Questions and Ideas

## 1. Run Summary
- Domain:
- Number of papers analyzed:
- Number of tree nodes:
- Number of tensions generated:
- Number of high-strength tensions:
- Number of candidate questions:
- Number of final qualified ideas:

## 2. Key Tensions

| Tension ID | Node Path | Type | Statement | Strength |
|---|---|---|---|---|

## 3. Selected Top Questions

| Question ID | Source Tensions | Node Path | Level | Question | Score |
|---|---|---|---:|---|---:|

## 4. Final Strong Ideas

### Idea 1: <title>

#### Target Question

#### Node Path

#### Source Tensions

#### Motivation

#### One-Sentence Core Insight

#### Why This Insight Is Non-Obvious

#### Assumption Changed or Boundary Condition Exposed

#### Core Hypothesis

#### Key Mechanism

#### Proposed Method

#### Why This Is Not X + Y

#### Novelty

#### Difference from Existing Work

#### Nearest-Prior Challenge

#### Minimum Falsifiable Experiment

#### Full Experimental Plan

#### Expected Contribution

#### Risks and Mitigation

#### Reviewer Red-Team

#### Evaluation Score

Process summary

Write:

outputs/<RUN_DATE>/final/process_summary.md

Required sections:

# Process Summary

## Input

## Corpus

## Tree Structure Summary

## Tension Map Summary

## Question Discovery Summary

## Question Stress Test and Voting Summary

## Idea Red-Team and Reframe Summary

## Final Outcome

## Known Limitations

Fallback Good-Idea Checklist

Use this only if references/evaluation_of_good_ideas.md cannot be read.

A good research idea should be:

Problem-driven rather than method-driven.
Important to a recognizable research community.
Clearly different from prior work.
Technically plausible.
Experimentally testable.
Specific enough to execute.
General enough to matter beyond a single toy setting.
Honest about assumptions and risks.
Capable of producing interpretable positive or negative results.
Communicable as a concise contribution.

Fallback Evaluation Rubric

Use this only if references/evaluation_of_good_ideas.md cannot be read.

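Score each dimension from 0 to 10 (integers). Weight each dimension as listed below; each number is that dimension's maximum contribution to the 10-point total:

Novelty: 1.5
Importance: 1.5
Feasibility: 1.2
Clarity: 1.0
Technical Depth: 1.5
Evaluation Plan: 1.3
Difference from Existing Work: 1.0
Risk Awareness: 1.0
Total max: 10.0

Pass if all of the following hold (total is the weighted total; the dimension thresholds apply to the raw 0-10 scores):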

total >= 8
novelty >= 9
importance >= 9
feasibility >= 8
clarity >= 8
technical_depth >= 8
evaluation_plan >= 7
difference_from_existing_work >= 8
risk_awareness >= 7

Reject if any of these are true:

The idea is just “apply method X to domain Y” without a new mechanism, insight, task, or evaluation.
The novelty claim cannot be distinguished from retrieved prior work.
The method cannot be tested with available or plausible resources.
The question being solved is vague or unsupported by literature evidence.
The idea has no credible baseline or ablation plan.

Safety and Honesty Rules

Do not fabricate paper titles, venues, years, or claims.
Do not count a paper as top-conference evidence unless the venue is verified or explicitly marked uncertain.
Do not hide failed retrieval, rejected ideas, or incomplete coverage.
Do not present speculative limitations as facts.
Do not claim exhaustive literature coverage unless the retrieval process actually supports it.
Distinguish clearly among verified evidence, reasonable inference, and speculation.
