Use this skill when the user wants to generate strong research ideas from a research domain, optionally with seed ideas. The skill retrieves recent top-conference papers, builds a domain development tree, constructs a tension map, discovers tension-driven open problems, stress-tests top questions, generates strong idea cards, runs nearest-prior challenges and reviewer red-team critique, reframes weak ideas, and outputs only final ideas that survive strict evaluation.
This skill is designed for research ideation, not for claiming definitive literature coverage. When evidence is incomplete, explicitly say so and record the limitation.
The user must provide:
domain: <research domain>
The user may optionally provide:
seed_ideas:
- <initial thought, hypothesis, direction, or constraint>
constraints:
target_year_range: <default: recent 3 complete publication years>
min_papers: <default: 50>
venues: <default: ICLR, ICML, NeurIPS, KDD, WWW, SIGIR, ACL, EMNLP, NAACL, CVPR, ICCV, ECCV, AAAI, IJCAI>
output_language: <default: zh-CN>
max_tree_nodes: <default: 60>
min_tree_nodes: <default: 15>
final_question_count: <default: 8>
final_idea_target: <default: 3>
idea_retry_limit: <default: 5>
replacement_limit: <default: 3>
If the domain is missing, ask the user for it. If seed ideas are missing, proceed without asking.
At the start of each run, create or verify a dated run directory under outputs/.
Resolve:
RUN_DATE = current local date formatted as YYYY-M-D, for example 2026-5-26
RUN_OUTPUT_DIR = outputs/<RUN_DATE>
All run artifacts must be written under RUN_OUTPUT_DIR. Do not write directly under the top-level outputs/ directory.
Directory structure:
outputs/
  2026-5-26/
    corpus/
    tree_structure/
    tension_map/
    questions/
    ideas/
    logs/
    final/
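The date formatting and directory setup above can be sketched in Python. This is a minimal illustration, assuming a local filesystem; the helper names `run_date` and `ensure_run_dir` and the `root` parameter are hypothetical and not part of the skill.

```python
from datetime import date
from pathlib import Path

# Subdirectories listed in the directory structure above.
RUN_SUBDIRS = ("corpus", "tree_structure", "tension_map", "questions", "ideas", "logs", "final")


def run_date(today: date) -> str:
    """Format a date as YYYY-M-D without zero padding, e.g. 2026-5-26."""
    return f"{today.year}-{today.month}-{today.day}"


def ensure_run_dir(root: Path, today: date) -> Path:
    """Create (or verify) outputs/<RUN_DATE>/ and its subdirectories; return RUN_OUTPUT_DIR."""
    run_dir = root / "outputs" / run_date(today)
    for sub in RUN_SUBDIRS:
        # exist_ok=True makes the call idempotent: a second run on the same day verifies
        # the existing layout instead of failing.
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir
```

For example, `ensure_run_dir(Path("."), date.today())` returns the directory that every later artifact should be written under.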
Required final outputs:
outputs/<RUN_DATE>/tree_structure/domain_tree.md
outputs/<RUN_DATE>/tree_structure/domain_tree.json
outputs/<RUN_DATE>/tree_structure/node_paper_mapping.csv
outputs/<RUN_DATE>/tension_map/tension_map.md
outputs/<RUN_DATE>/tension_map/tension_map.json
outputs/<RUN_DATE>/final/final_questions_and_ideas.md
outputs/<RUN_DATE>/final/process_summary.md
outputs/<RUN_DATE>/logs/process_record.md
Recommended intermediate outputs:
outputs/<RUN_DATE>/corpus/paper_index.csv
outputs/<RUN_DATE>/corpus/paper_cards.jsonl
outputs/<RUN_DATE>/corpus/retrieval_report.md
outputs/<RUN_DATE>/questions/all_node_questions.json
outputs/<RUN_DATE>/questions/all_node_questions.md
outputs/<RUN_DATE>/questions/top_questions.json
outputs/<RUN_DATE>/questions/top_questions.md
outputs/<RUN_DATE>/ideas/candidate_ideas_round_<N>.json
outputs/<RUN_DATE>/ideas/novelty_checks_round_<N>.json
outputs/<RUN_DATE>/ideas/red_team_reviews_round_<N>.json
outputs/<RUN_DATE>/ideas/reframed_ideas_round_<N>.json
outputs/<RUN_DATE>/ideas/evaluation_round_<N>.json
outputs/<RUN_DATE>/ideas/rejection_memos_round_<N>.json
Before analysis, read and operationalize this file:
references/evaluation_of_good_ideas.md
Convert it into internal checklists. Do not merely summarize it. Use evaluation_of_good_ideas.md to score, reject, and iterate ideas.
If a reference file is missing, record the error in the process record (outputs/<RUN_DATE>/logs/process_record.md), use the fallback criteria embedded in this skill, and explicitly mention the fallback in the final process summary.
Convert the user's request into a run configuration:
{
"domain": "...",
"seed_ideas": ["..."],
"constraints": {
"target_year_range": "recent_3_complete_publication_years",
"min_papers": 50,
"venues": ["ICLR", "ICML", "NeurIPS", "KDD", "WWW", "SIGIR", "ACL", "EMNLP", "NAACL", "CVPR", "ICCV", "ECCV", "AAAI", "IJCAI"],
"output_language": "zh-CN",
"max_tree_nodes": 60,
"min_tree_nodes": 15,
"final_question_count": 8,
"final_idea_target": 3,
"idea_retry_limit": 5,
"replacement_limit": 3
}
}
Resolve “recent 3 years” as the most recent three complete publication years. If the current year's conference proceedings are incomplete, prefer the latest complete three years and document the decision.
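A sketch of how the request could be turned into the run configuration above and how the year range could be resolved. The function names are hypothetical, and `current_year_complete` is an assumed flag that the agent sets after checking whether this year's proceedings are actually out.

```python
from copy import deepcopy

# Defaults copied from the run-configuration template above.
DEFAULT_CONSTRAINTS = {
    "target_year_range": "recent_3_complete_publication_years",
    "min_papers": 50,
    "venues": ["ICLR", "ICML", "NeurIPS", "KDD", "WWW", "SIGIR", "ACL", "EMNLP",
               "NAACL", "CVPR", "ICCV", "ECCV", "AAAI", "IJCAI"],
    "output_language": "zh-CN",
    "max_tree_nodes": 60,
    "min_tree_nodes": 15,
    "final_question_count": 8,
    "final_idea_target": 3,
    "idea_retry_limit": 5,
    "replacement_limit": 3,
}


def build_run_config(domain, seed_ideas=None, constraints=None):
    """Merge user-supplied constraints over the defaults; domain is mandatory."""
    if not domain:
        raise ValueError("domain is required; ask the user for it")
    merged = deepcopy(DEFAULT_CONSTRAINTS)  # never mutate the shared defaults
    merged.update(constraints or {})
    return {"domain": domain, "seed_ideas": list(seed_ideas or []), "constraints": merged}


def resolve_year_range(current_year, current_year_complete):
    """Return the most recent three complete publication years."""
    last_complete = current_year if current_year_complete else current_year - 1
    return [last_complete - 2, last_complete - 1, last_complete]
```

With incomplete 2026 proceedings, `resolve_year_range(2026, False)` gives 2023 to 2025; the decision and its reason still belong in the process record.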
Create the output directories. Start an append-only process record at:
outputs/<RUN_DATE>/logs/process_record.md
The process record must include:
Retrieve at least min_papers valid recent papers from top venues relevant to the domain.
Recommended authoritative sources:
Run multi-pass retrieval:
For every candidate paper, collect:
{
"paper_id": "P-001",
"title": "...",
"authors": ["..."],
"year": 2024,
"venue": "...",
"abstract": "...",
"url": "...",
"source_url": "...",
"venue_verified_by": "...",
"evidence_status": "verified_or_uncertain_or_auxiliary",
"keywords": ["..."],
"relevance_score": 0.0,
"assigned_topics": []
}
Filtering rules:
If fewer than min_papers valid papers remain, apply recovery steps in order:
If still fewer than min_papers papers are available, stop full analysis and write:
outputs/<RUN_DATE>/logs/retrieval_failure_report.md
Include searched queries, venues, years, valid paper count, likely reason, and advice for adjusting the domain.
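The final min_papers gate can be sketched as follows. The concrete filtering rules are not reproduced here, so `is_valid` is an assumed stand-in (in-range year, listed venue, and not merely auxiliary evidence), and the function should only run after the recovery steps have been exhausted. The `TODO` fields are left for the agent to write.

```python
from pathlib import Path


def is_valid(paper, years, venues):
    # Assumed filter: in-range year, allowed venue, and evidence that is not merely auxiliary.
    return (paper.get("year") in years
            and paper.get("venue") in venues
            and paper.get("evidence_status") != "auxiliary")


def check_corpus(papers, years, venues, min_papers, queries, run_dir: Path):
    """Return the valid papers, or write logs/retrieval_failure_report.md and return None."""
    valid = [p for p in papers if is_valid(p, years, venues)]
    if len(valid) >= min_papers:
        return valid
    report = run_dir / "logs" / "retrieval_failure_report.md"
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text(
        "# Retrieval Failure Report\n"
        f"- Searched queries: {', '.join(queries)}\n"
        f"- Venues: {', '.join(venues)}\n"
        f"- Years: {', '.join(str(y) for y in years)}\n"
        f"- Valid paper count: {len(valid)} (required: {min_papers})\n"
        "- Likely reason: TODO\n"
        "- Advice for adjusting the domain: TODO\n",
        encoding="utf-8",
    )
    return None
```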
For each valid paper, create a structured card:
{
"paper_id": "P-001",
"title": "...",
"problem": "...",
"core_method": "...",
"main_contribution": "...",
"technical_assumptions": ["..."],
"datasets_or_benchmarks": ["..."],
"claimed_improvements": ["..."],
"limitations": ["..."],
"future_work_signals": ["..."],
"relevance_to_domain": "...",
"possible_idea_hooks": ["..."]
}
Rules:
uncertain.
Output:
outputs/<RUN_DATE>/corpus/paper_cards.jsonl
outputs/<RUN_DATE>/corpus/paper_index.csv
outputs/<RUN_DATE>/corpus/retrieval_report.md
Organize the literature into a tree-like pyramid.
Default levels:
Level 0: user domain
Level 1: major research directions
Level 2: subdirections
Level 3: specific tasks, method families, problem types, or application scenarios
Level 4+: deeper breakdown only when evidence supports it
Each tree node must follow:
{
"node_id": "N-001",
"level": 0,
"title": "...",
"definition": "...",
"parent_id": null,
"child_ids": ["N-002"],
"representative_papers": ["P-001", "P-002"],
"dominant_methods": ["..."],
"common_assumptions": ["..."],
"known_limitations": ["..."],
"open_problem_hints": ["..."]
}
Tree constraints:
min_tree_nodes and max_tree_nodes when possible.
Output:
outputs/<RUN_DATE>/tree_structure/domain_tree.md
outputs/<RUN_DATE>/tree_structure/domain_tree.json
outputs/<RUN_DATE>/tree_structure/node_paper_mapping.csv
The markdown tree must show node path, representative papers, common assumptions, limitations, and open-problem hints.
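Before writing domain_tree.json, a structural check on the node records can catch broken parent/child links. This is a sketch assuming nodes follow the node template above; `validate_tree` is a hypothetical helper, and because the node-count bounds apply only "when possible", a count outside them is reported as a warning.

```python
def validate_tree(nodes, min_nodes=15, max_nodes=60):
    """Return structural problems in a list of tree nodes (an empty list means consistent)."""
    problems = []
    by_id = {n["node_id"]: n for n in nodes}
    roots = [n for n in nodes if n["parent_id"] is None]
    if len(roots) != 1 or roots[0]["level"] != 0:
        problems.append("expected exactly one Level 0 root")
    for n in nodes:
        # Every listed child must exist, point back to this node, and sit one level deeper.
        for child_id in n["child_ids"]:
            child = by_id.get(child_id)
            if child is None:
                problems.append(f"{n['node_id']}: unknown child {child_id}")
            elif child["parent_id"] != n["node_id"] or child["level"] != n["level"] + 1:
                problems.append(f"{n['node_id']} -> {child_id}: parent/level mismatch")
        # Every non-root node must be listed by its parent.
        parent_id = n["parent_id"]
        if parent_id is not None:
            parent = by_id.get(parent_id)
            if parent is None or n["node_id"] not in parent["child_ids"]:
                problems.append(f"{n['node_id']}: not listed by parent {parent_id}")
    if not min_nodes <= len(nodes) <= max_nodes:
        problems.append(f"warning: {len(nodes)} nodes outside [{min_nodes}, {max_nodes}]")
    return problems
```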
Before generating questions, convert the domain tree and paper cards into a tension map. A strong idea should usually come from a real tension, contradiction, bottleneck, hidden assumption, or evaluation mismatch. Do not jump directly from a topic node to a solution.
For each high- or medium-evidence node, identify 1-4 tensions using these sources:
Each tension must include:
{
"tension_id": "T-001",
"node_id": "N-010",
"node_path": "...",
"tension_type": "hidden_assumption_or_evaluation_mismatch_or_scaling_bottleneck_or_data_bottleneck_or_deployment_gap_or_theory_practice_inconsistency_or_method_contradiction",
"statement": "...",
"evidence_from_papers": ["P-001", "P-002"],
"why_existing_work_does_not_resolve_it": "...",
"why_it_could_matter": "...",
"strength": "high_or_medium_or_low",
"uncertainty": "..."
}
Hard rules:
Output:
outputs/<RUN_DATE>/tension_map/tension_map.md
outputs/<RUN_DATE>/tension_map/tension_map.json
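Some of the tension requirements above are mechanical and can be checked before the map is written. A sketch, assuming `node_density` maps node IDs to the high/medium/low labels and `paper_ids` is the set of corpus IDs; the helper name is hypothetical and covers only the checkable constraints, not tension quality.

```python
from collections import Counter

# Allowed values, split out of the "..._or_..." placeholders in the tension template above.
TENSION_TYPES = {
    "hidden_assumption", "evaluation_mismatch", "scaling_bottleneck", "data_bottleneck",
    "deployment_gap", "theory_practice_inconsistency", "method_contradiction",
}
STRENGTHS = {"high", "medium", "low"}


def check_tension_map(tensions, node_density, paper_ids):
    """Return problems; tensions are dicts shaped like the tension template."""
    problems = []
    per_node = Counter(t["node_id"] for t in tensions)
    for node_id, density in node_density.items():
        # High- and medium-evidence nodes should each carry 1-4 tensions.
        if density in ("high", "medium") and not 1 <= per_node[node_id] <= 4:
            problems.append(f"{node_id}: {per_node[node_id]} tensions, expected 1-4")
    for t in tensions:
        tid = t["tension_id"]
        if t["tension_type"] not in TENSION_TYPES:
            problems.append(f"{tid}: unknown tension_type {t['tension_type']}")
        if t["strength"] not in STRENGTHS:
            problems.append(f"{tid}: unknown strength {t['strength']}")
        # Cited evidence must point at papers that exist in the corpus.
        missing = [p for p in t["evidence_from_papers"] if p not in paper_ids]
        if missing:
            problems.append(f"{tid}: evidence not in corpus: {', '.join(missing)}")
    return problems
```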
MANDATORY: Each of the five roles MUST be instantiated as a real sub-agent using sessions_spawn. It is FORBIDDEN to simulate the roles within the main agent's own reasoning.
Rationale: The five roles are designed to provide independent perspectives. Simulating them within a single reasoning process invites confirmation bias: the same model with the same context produces correlated outputs, which defeats the purpose of multi-perspective review. Real sub-agents run in isolated sessions with their own context, so no role's critique is conditioned on another role's reasoning.
For each phase where roles are invoked (question stress-testing, idea generation, red-team review, reframe), spawn one sub-agent per role using:
sessions_spawn(
task: "<role-specific prompt with full context>",
runtime: "subagent",
mode: "run",
label: "<role_name>_<phase>"
)
Each spawned sub-agent receives:
The main agent collects all sub-agent outputs, then synthesizes. The main agent never generates role-specific content itself.
If sessions_spawn is unavailable or fails repeatedly, record the failure in the process record (outputs/<RUN_DATE>/logs/process_record.md) and fall back to internal role-play with the following protocol:
[FALLBACK: internal simulation, sub-agent unavailable].
This fallback is a degradation, not an equivalent. Prioritize fixing sub-agent availability.
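The fan-out, wait-for-all, then synthesize pattern can be sketched in Python. `spawn` is a hypothetical wrapper around the sessions_spawn tool (assumed to pass runtime "subagent" and mode "run" internally and to raise `RuntimeError` on failure), and `simulate` is the degraded internal role-play; neither is a real library API.

```python
from concurrent.futures import ThreadPoolExecutor

FALLBACK_TAG = "[FALLBACK: internal simulation, sub-agent unavailable]"


def run_roles(phase, roles, build_prompt, spawn, simulate, max_attempts=2):
    """Spawn one sub-agent per role and return outputs only after every role has finished."""

    def one(role):
        for _ in range(max_attempts):
            try:
                # Label follows the "<role_name>_<phase>" convention.
                return spawn(task=build_prompt(role), label=f"{role}_{phase}")
            except RuntimeError:
                continue  # assumed failure signal; retry before degrading
        # Degraded path: internal role-play, explicitly tagged so it is never mistaken
        # for a real sub-agent output. The caller should also log this failure.
        return f"{FALLBACK_TAG} {simulate(role)}"

    with ThreadPoolExecutor(max_workers=len(roles)) as pool:
        # list(pool.map(...)) blocks until all roles return and preserves role order,
        # so nothing is merged while any role is still running.
        outputs = list(pool.map(one, roles))
    return dict(zip(roles, outputs))
```

The main agent then synthesizes from the returned dict only; it never writes a role's content itself unless the tagged fallback was taken.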
Role A (Literature Attorney): Attack novelty. Find the closest prior work, likely reviewer comparisons, hidden duplicated mechanisms, and claims that would not survive a related-work section.
Role B (Mechanism Builder): Demand a non-trivial mechanism. Reject ideas that are only module stacking, backbone swaps, loss tweaks, prompt tricks, or "apply X to Y" without a domain-specific mechanism.
Role C (Experimentalist): Design the smallest credible experiment that can falsify the core hypothesis. Check datasets, baselines, metrics, ablations, stress tests, compute cost, and failure interpretation.
Role D (Reviewer 2): Write the strongest top-conference rejection case. Focus on incremental novelty, unclear contribution, weak evidence, missing baselines, unrealistic assumptions, and overclaiming.
Role E (Reframer): Transform weak or incremental ideas by changing the problem framing, mechanism, evaluation target, or boundary condition. The Reframer must not merely polish wording.
Define active node:
A node with enough representative papers, known limitations, or open-problem hints to support at least one concrete research problem.
Usually include Level 1 and below. Include Level 0 only if it supports meaningful field-level problems.
For each candidate active node, decide evidence density:
high: at least 5 representative papers and multiple concrete limitations or open-problem hints
medium: 2-4 representative papers and at least one concrete limitation or open-problem hint
low: fewer than 2 representative papers or only vague limitations
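The three density thresholds can be written as a small classifier. This is a sketch with two stated assumptions: "only vague limitations" is read as zero concrete limitations or open-problem hints, and a node with 5 or more papers but only one concrete hint (a case the thresholds leave open) is treated as medium. The function name is hypothetical.

```python
def evidence_density(n_papers: int, n_concrete_hints: int) -> str:
    """Classify a candidate active node as high, medium, or low evidence density.

    n_concrete_hints counts concrete limitations plus concrete open-problem hints.
    """
    if n_papers < 2 or n_concrete_hints == 0:
        return "low"  # too few papers, or only vague limitations
    if n_papers >= 5 and n_concrete_hints >= 2:
        return "high"
    # 2-4 papers with at least one concrete hint; also (assumption) 5+ papers with one hint.
    return "medium"
```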
Generate 0-3 final problems per candidate active node:
For each active node or high-value tension, run three rounds:
Spawn sub-agents for Roles A, B, C, and E. Each sub-agent independently proposes candidate problems. Do NOT merge their outputs until all have completed.
Each problem proposal must include:
{
"question_id": "Q-001",
"problem_statement": "...",
"source_tension_ids": ["T-001"],
"problem_source": "hidden_assumption_or_evaluation_mismatch_or_scaling_bottleneck_or_data_bottleneck_or_deployment_gap_or_theory_practice_inconsistency_or_method_contradiction",
"why_it_matters": "...",
"evidence_from_papers": ["P-001", "P-002"],
"current_limitation": "...",
"affected_node_id": "N-010",
"level": 2,
"what_would_change_if_solved": "..."
}
Spawn sub-agents for Roles A-D. Each sub-agent independently critiques ALL proposals from Round 1. Do NOT merge their critiques until all have completed.
Each critique must address:
The main agent synthesizes the final problems for the node according to its evidence density.
Each final node problem must be:
If a problem duplicates an earlier one, merge it. Generate a replacement only when the node still has enough evidence for another independent problem.
Output:
outputs/<RUN_DATE>/questions/all_node_questions.json
outputs/<RUN_DATE>/questions/all_node_questions.md
All generated questions are scored independently by the five roles.
Use 1-5 scoring:
Significance
Novelty
Evidence
Tractability
Idea Potential
Fit to Good-Idea Checklist
Tension Strength
Falsifiability
Each score entry:
{
"question_id": "Q-001",
"scores": {
"significance": 5,
"novelty": 4,
"evidence": 5,
"tractability": 4,
"idea_potential": 5,
"fit_to_good_idea": 4,
"tension_strength": 5,
"falsifiability": 4
},
"rationale": "..."
}
The main agent computes:
final_score = mean(all_agent_dimension_scores)
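Read literally, final_score is a flat mean over every (role, dimension) pair, so each of the 8 dimensions from each role counts equally. A sketch assuming one score entry per role in the format above; the function name is hypothetical.

```python
from statistics import mean

DIMENSIONS = (
    "significance", "novelty", "evidence", "tractability",
    "idea_potential", "fit_to_good_idea", "tension_strength", "falsifiability",
)


def final_score(entries):
    """Mean of all role-by-dimension scores for one question (one entry per role)."""
    if len({e["question_id"] for e in entries}) != 1:
        raise ValueError("all entries must score the same question_id")
    values = []
    for entry in entries:
        for dim in DIMENSIONS:
            score = entry["scores"][dim]
            if not 1 <= score <= 5:
                raise ValueError(f"{dim} score {score} is outside 1-5")
            values.append(score)
    return mean(values)
```

For the example entry above the eight scores sum to 36, so a single entry gives 4.5.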
Apply diversity constraints:
final_question_count, default 8.
Output:
outputs/<RUN_DATE>/questions/top_questions.json
outputs/<RUN_DATE>/questions/top_questions.md
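One way to apply a diversity constraint while selecting the top questions is a greedy pass over the ranked list. The actual diversity constraints are not spelled out here, so the cap used below (at most `max_per_direction` questions per Level 1 direction) is a hypothetical stand-in; ties are broken by question ID to keep the selection deterministic.

```python
def select_top_questions(scored, k=8, max_per_direction=2):
    """scored: (question_id, final_score, level1_direction) tuples; returns chosen IDs."""
    chosen, per_direction = [], {}
    # Highest score first; question_id breaks ties deterministically.
    for qid, _score, direction in sorted(scored, key=lambda x: (-x[1], x[0])):
        if per_direction.get(direction, 0) >= max_per_direction:
            continue  # this direction is already saturated
        chosen.append(qid)
        per_direction[direction] = per_direction.get(direction, 0) + 1
        if len(chosen) == k:
            break
    return chosen
```

A greedy pass keeps the highest-scoring question in every direction; it may return fewer than k questions, which should then be reported rather than padded.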
For each selected top question, run three rounds. The goal is not to fill a quota; the goal is to produce a small number of defensible strong ideas.
Spawn sub-agents for Roles B, C, and E. Each sub-agent independently proposes solutions. Do NOT merge until all have completed.
Roles B, C, and E should cover different solution origins:
Each proposal must include:
{
"idea_title": "...",
"target_question_id": "Q-001",
"source_tension_ids": ["T-001"],
"one_sentence_core_insight": "...",
"why_non_obvious": "...",
"assumption_changed": "...",
"core_hypothesis": "...",
"method_overview": "...",
"what_is_new": "...",
"why_it_might_work": "...",
"key_mechanism": "...",
"why_not_x_plus_y": "...",
"required_resources": "...",
"possible_experiments": ["..."],
"minimum_falsifiable_experiment": "...",
"risks": ["..."]
}
Spawn sub-agents for Roles A-D. Each sub-agent independently critiques ALL proposals from Round 1. Do NOT merge until all have completed.
Each critique must address:
The main agent synthesizes at most one strong candidate idea per top question with:
Hard reject candidate ideas immediately if any condition is true:
Output each round to:
outputs/<RUN_DATE>/ideas/candidate_ideas_round_<N>.json
Before scoring a candidate idea, run an idea-specific novelty check. Do not rely only on the initial domain corpus.
For each candidate idea:
verified, uncertain, or insufficient.
Each nearest-prior challenge must produce:
{
"idea_id": "I-001",
"queries": ["..."],
"closest_prior_work": [
{
"paper_id_or_url": "...",
"title": "...",
"year": 2024,
"venue": "...",
"why_close": "...",
"delta": "...",
"what_reviewer_would_claim": "..."
}
],
"novelty_evidence_status": "verified_or_uncertain_or_insufficient",
"novelty_risk": "...",
"defense_against_x_plus_y_claim": "...",
"true_new_knowledge_claim": "..."
}
If no credible closest prior work can be found, do not automatically treat the idea as novel. Mark the novelty evidence as uncertain and record the search limitation.
Save novelty checks to:
outputs/<RUN_DATE>/ideas/novelty_checks_round_<N>.json
Run a reviewer red-team (spawn 5 sub-agents, one per role) before final evaluation. Each sub-agent independently writes its critique. The main agent collects all critiques and synthesizes the red-team verdict.
Each sub-agent output must include:
{
"idea_id": "I-001",
"reviewer_2_rejection": "...",
"literature_attorney_objections": ["..."],
"mechanism_builder_objections": ["..."],
"experimentalist_objections": ["..."],
"fatal_flaws": ["..."],
"fixable_flaws": ["..."],
"recommended_decision_before_reframe": "pass_or_reframe_or_reject"
}
Save red-team reviews to:
outputs/<RUN_DATE>/ideas/red_team_reviews_round_<N>.json
If flaws are fixable, run a reframe loop instead of superficial revision:
Save reframed ideas to:
outputs/<RUN_DATE>/ideas/reframed_ideas_round_<N>.json
Use references/evaluation_of_good_ideas.md as the primary rubric. If unavailable, use the fallback rubric in this skill.
Default evaluation record:
{
"idea_id": "I-001",
"one_sentence_core_insight": "...",
"scores": {
"novelty": 0,
"importance": 0,
"feasibility": 0,
"clarity": 0,
"technical_depth": 0,
"evaluation_plan": 0,
"difference_from_existing_work": 0,
"risk_awareness": 0
},
"score_scale": "0-10_per_dimension_(integer)",
"weights": {
"novelty": 0.15,
"importance": 0.15,
"feasibility": 0.12,
"clarity": 0.10,
"technical_depth": 0.15,
"evaluation_plan": 0.13,
"difference_from_existing_work": 0.10,
"risk_awareness": 0.10
},
"total_score": 0,
"decision": "pass_or_fail",
"novelty_evidence_status": "verified_or_uncertain_or_insufficient",
"red_team_decision": "pass_or_reframe_or_reject",
"minimum_falsifiable_experiment": "...",
"closest_prior_work_delta": "...",
"rejection_reasons": ["..."],
"must_fix": ["..."]
}
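All dimensions are scored from 0 to 10 (integers only). Compute total_score as the weighted sum of the dimension scores, using the weights defined in references/evaluation_of_good_ideas.md (or the default weights above if that file is unavailable), rounded to the nearest integer.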
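Written out, the total score for the dimension set D is as follows. Because the default weights sum to 1.00, the total stays on the same 0-10 scale as each dimension. The worked example is illustrative only.

```latex
\text{total\_score} = \operatorname{round}\Big(\sum_{d \in \mathcal{D}} w_d \, s_d\Big),
\qquad s_d \in \{0, 1, \dots, 10\}, \qquad \sum_{d \in \mathcal{D}} w_d = 1

% Example: s_d = 9 for every dimension except feasibility, with s_{feasibility} = 8:
\sum_{d \in \mathcal{D}} w_d \, s_d = 9 - 0.12 \cdot 1 = 8.88
\;\Rightarrow\; \text{total\_score} = 9
```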
Default pass rule:
total_score >= 8
AND novelty >= 9
AND importance >= 9
AND feasibility >= 8
AND clarity >= 8
AND technical_depth >= 8
AND evaluation_plan >= 7
AND difference_from_existing_work >= 8
AND risk_awareness >= 7
AND the idea is not a shallow combination of existing work
AND the idea is not a mere domain transfer without deep adaptation
AND the idea has a clear experimental validation path
AND the idea can articulate contribution beyond prior work
AND novelty_evidence_status is not insufficient
AND red_team_decision is pass
AND the idea has a one-sentence non-obvious core insight
AND the minimum falsifiable experiment can produce an interpretable negative result
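The numeric part of the pass rule is mechanical, so it can be checked in code, while the qualitative clauses stay reviewer judgments. In this sketch `judgments` is a hypothetical name-to-boolean map for those clauses (not a shallow combination, not a mere domain transfer, and so on). Rounding is done half up, which is an assumption: the rule says only "nearest integer", and Python's `round()` would send 8.5 to 8. The returned list can feed the failed_criteria field of a rejection memo.

```python
import math

# Default weights and per-dimension minimums from the evaluation record and pass rule above.
WEIGHTS = {
    "novelty": 0.15, "importance": 0.15, "feasibility": 0.12, "clarity": 0.10,
    "technical_depth": 0.15, "evaluation_plan": 0.13,
    "difference_from_existing_work": 0.10, "risk_awareness": 0.10,
}
MINIMUMS = {
    "novelty": 9, "importance": 9, "feasibility": 8, "clarity": 8,
    "technical_depth": 8, "evaluation_plan": 7,
    "difference_from_existing_work": 8, "risk_awareness": 7,
}


def total_score(scores):
    """Weighted sum of 0-10 integer scores, rounded half up to an integer."""
    raw = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)
    # Tiny epsilon absorbs float error such as 8.4999999999 for an exact 8.5.
    return math.floor(raw + 0.5 + 1e-9)


def failed_criteria(record, judgments):
    """Return the failed pass-rule criteria; an empty list means the idea passes."""
    scores = record["scores"]
    failed = []
    if total_score(scores) < 8:
        failed.append("total_score")
    failed += [d for d, minimum in MINIMUMS.items() if scores[d] < minimum]
    if record["novelty_evidence_status"] == "insufficient":
        failed.append("novelty_evidence_status")
    if record["red_team_decision"] != "pass":
        failed.append("red_team_decision")
    failed += [name for name, ok in judgments.items() if not ok]
    return failed
```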
If an idea fails, generate a rejection memo:
{
"idea_id": "I-001",
"target_question_id": "Q-001",
"failed_criteria": ["..."],
"rejection_reasons": ["..."],
"must_fix": ["..."],
"forbidden_retry_patterns": ["..."]
}
Then rerun the three-round idea brainstorming for that question.
Retry rules:
idea_retry_limit, default 5.
Replacement rules:
If a question fails to produce a passing idea after maximum retries:
retired.
replacement_limit, default 3.
Aim for final_idea_target, default 3, but do not force the count. If fewer strong ideas pass after all retries and replacements, the final outputs must state the true number of passing ideas. Never present failed ideas as accepted final ideas.
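The retry and replacement flow can be sketched as a loop. Several details are assumptions because only the limits are given here: idea_retry_limit is treated as the maximum number of brainstorming attempts per question, replacements are counted across the whole run, and replacement questions come from a ranked backup list. `attempt_idea` is a hypothetical callable that runs one three-round brainstorm and returns either a passing idea or a rejection memo.

```python
def run_idea_search(top_questions, backup_questions, attempt_idea,
                    retry_limit=5, replacement_limit=3):
    """attempt_idea(question_id, memos) -> (idea or None, rejection memo or None)."""
    accepted, retired = [], []
    queue = list(top_questions)
    backups = list(backup_questions)
    replacements = 0
    while queue:
        qid = queue.pop(0)
        memos = []  # earlier rejection memos (incl. forbidden_retry_patterns) guide each retry
        for _ in range(retry_limit):
            idea, memo = attempt_idea(qid, memos)
            if idea is not None:
                accepted.append(idea)
                break
            memos.append(memo)
        else:
            # No passing idea after the retry limit: retire the question, maybe replace it.
            retired.append(qid)
            if replacements < replacement_limit and backups:
                queue.append(backups.pop(0))
                replacements += 1
    # The true count is len(accepted); it is never padded to reach final_idea_target.
    return accepted, retired
```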
Maintain outputs/<RUN_DATE>/logs/process_record.md throughout the run.
Use this structure:
# Process Record
## 1. Input
- Domain:
- Seed ideas:
- Constraints:
## 2. Reference Loading
- evaluation_of_good_ideas:
- Fallback used:
## 3. Retrieval
- Year range:
- Venues:
- Search queries:
- Retrieved papers:
- Valid papers after filtering:
- Recovery actions if any:
## 4. Paper Analysis
- Paper cards generated:
- Uncertain claims:
## 5. Tree Construction
- Number of nodes:
- Number of levels:
- Main directions:
- Merge/split decisions:
## 6. Tension Map
- Tensions generated:
- High-strength tensions:
- Weak or discarded tensions:
- Key contradictions or bottlenecks:
## 7. Question Generation
- Active nodes:
- Expected question range:
- Actual questions:
- Skipped low-evidence nodes:
- Questions rejected for weak tension or weak falsifiability:
- Deduplication notes:
## 8. Question Stress Test and Voting
- Top questions:
- Diversity constraints applied:
## 9. Idea Generation, Red-Team, and Evaluation
- Question:
- Iteration count:
- Novelty check status:
- Red-team decision:
- Reframe actions:
- Rejections:
- Final decision:
## 10. Final Results
- Qualified ideas:
- Retired questions:
- Known limitations:
Keep the process record concise. Do not include full private deliberation. Include decisions, evidence, and outcomes.
Write:
outputs/<RUN_DATE>/final/final_questions_and_ideas.md
Required sections:
# Final Questions and Ideas
## 1. Run Summary
- Domain:
- Number of papers analyzed:
- Number of tree nodes:
- Number of tensions generated:
- Number of high-strength tensions:
- Number of candidate questions:
- Number of final qualified ideas:
## 2. Key Tensions
| Tension ID | Node Path | Type | Statement | Strength |
|---|---|---|---|---|
## 3. Selected Top Questions
| Question ID | Source Tensions | Node Path | Level | Question | Score |
|---|---|---|---:|---|---:|
## 4. Final Strong Ideas
### Idea 1: <title>
#### Target Question
#### Node Path
#### Source Tensions
#### Motivation
#### One-Sentence Core Insight
#### Why This Insight Is Non-Obvious
#### Assumption Changed or Boundary Condition Exposed
#### Core Hypothesis
#### Key Mechanism
#### Proposed Method
#### Why This Is Not X + Y
#### Novelty
#### Difference from Existing Work
#### Nearest-Prior Challenge
#### Minimum Falsifiable Experiment
#### Full Experimental Plan
#### Expected Contribution
#### Risks and Mitigation
#### Reviewer Red-Team
#### Evaluation Score
Write:
outputs/<RUN_DATE>/final/process_summary.md
Required sections:
# Process Summary
## Input
## Corpus
## Tree Structure Summary
## Tension Map Summary
## Question Discovery Summary
## Question Stress Test and Voting Summary
## Idea Red-Team and Reframe Summary
## Final Outcome
## Known Limitations
Use this only if references/evaluation_of_good_ideas.md cannot be read.
A good research idea should be:
Use this only if references/evaluation_of_good_ideas.md cannot be read.
Score each dimension as an integer from 0 to 10, then combine the scores with these weights (maximum contribution to the 10-point total in parentheses):
Novelty: 0.15 (max 1.5)
Importance: 0.15 (max 1.5)
Feasibility: 0.12 (max 1.2)
Clarity: 0.10 (max 1.0)
Technical Depth: 0.15 (max 1.5)
Evaluation Plan: 0.13 (max 1.3)
Difference from Existing Work: 0.10 (max 1.0)
Risk Awareness: 0.10 (max 1.0)
Total max: 10.0
Pass if:
total >= 8
novelty >= 9
importance >= 9
feasibility >= 8
clarity >= 8
technical_depth >= 8
evaluation_plan >= 7
difference_from_existing_work >= 8
risk_awareness >= 7
Reject if any of these are true: