Code Research Crafter
Craft comprehensive research proposals from code analysis to GitHub RFC publication.
Workflow
Phase 1: Problem Discovery & Code Analysis
- Ask the user for the target codebase (URL or local path) and the research topic. If either is missing, do not proceed; ask the user to clarify.
- Map the project structure: use glob `**/*.{ts,js,py,go,rs,java}`, keeping only the extensions that match the detected language. Read README.md, CONTRIBUTING.md, and docs in docs/ for context.
- Search for topic-relevant files: `grep "[keyword]" src/**` to locate key implementations.
- Read the top relevant files and document findings in `research-context.md`:
- Code Map: file paths and their roles (table format)
- Problem List: each problem with a `file:line` reference and severity (high/medium/low)
- Metrics: quantified issues (e.g., "3/10 modules lack error handling", "40% of functions have no tests")
- Search existing GitHub issues: `gh issue list -R [repo] --search "[topic]" --limit 20`.
Error handling: If the codebase is inaccessible, ask for an alternative URL or local path. If the topic is too broad, narrow down with the user before proceeding.
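The three Phase 1 sections of research-context.md can be sketched as a small renderer. This is only an illustration: the `Problem` dataclass, the `render_research_context` function, and the "most severe first" ordering are assumptions, not part of the workflow itself.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """One Problem List entry (hypothetical structure)."""
    path: str
    line: int
    severity: str  # "high", "medium", or "low"
    summary: str


def render_research_context(code_map: dict[str, str],
                            problems: list[Problem],
                            metrics: list[str]) -> str:
    """Render Code Map, Problem List, and Metrics as markdown."""
    severity_rank = {"high": 0, "medium": 1, "low": 2}
    out = ["# Research Context", "", "## Code Map", "",
           "| File | Role |", "|------|------|"]
    out += [f"| {path} | {role} |" for path, role in code_map.items()]
    out += ["", "## Problem List", ""]
    # Assumed ordering: most severe first; each entry keeps its file:line reference.
    for p in sorted(problems, key=lambda p: severity_rank[p.severity]):
        out.append(f"- [{p.severity}] {p.path}:{p.line} {p.summary}")
    out += ["", "## Metrics", ""]
    out += [f"- {m}" for m in metrics]
    return "\n".join(out) + "\n"
```

Writing the returned string to research-context.md gives Phase 2 a stable file to append to.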
Phase 2: Academic & Community Research
- Load `references/academic-research-guide.md` for search methodology.
- Use WebSearch for academic papers: `"site:arxiv.org [topic] 2024 2025"`, `"site:scholar.google.com [topic]"`.
- Use WebFetch to read the 3-5 most relevant papers and extract their algorithms, data structures, and evaluation methods.
- Search GitHub discussions: `gh api repos/[owner]/[repo]/discussions --jq '.[].title'` (if discussions are enabled).
- Analyze community sentiment from issues: note pain points, feature requests, and maintainer feedback patterns.
- Append findings to `research-context.md` under sections:
- Academic Insights: algorithms, approaches, evaluation metrics
- Community Pulse: top pain points, requested features, maintainer stance
- Gaps: current implementation vs. best practices
Error handling: If no academic papers are found, note the gap and proceed with community research only. If the repo has no issues/discussions, focus on academic research and documentation review.
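The two WebSearch query patterns above can be built from the topic string. The helper below is a minimal sketch; the function name and the default year tuple are assumptions chosen to mirror the patterns in this phase.

```python
def build_search_queries(topic: str,
                         years: tuple[int, ...] = (2024, 2025)) -> list[str]:
    """Build the arXiv and Google Scholar query strings for a topic."""
    year_part = " ".join(str(y) for y in years)
    return [
        f"site:arxiv.org {topic} {year_part}",
        f"site:scholar.google.com {topic}",
    ]
```

Passing a narrower topic string is usually more useful than adding more years.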
Phase 3: Solution Design
- Load `references/architecture-patterns.md` for proven design patterns.
- Define evidence-based design principles derived from Phase 1-2 findings.
- Design a layered architecture:
- Layer 1 — Foundation: data collection and storage
- Layer 2 — Enhancement: core features building on Foundation
- Layer 3 — Intelligence: AI/ML capabilities on accumulated data
- Layer 4 — Governance: control, monitoring, and policy enforcement
- Define data models: dual-track (user-defined/static + system-learned/dynamic).
- Plan phased implementation with milestones:
- Phase 1 → Foundation (weeks 1-4)
- Phase 2 → Enhancement (weeks 5-8)
- Phase 3 → Intelligence (weeks 9-12)
- Phase 4 → Governance (weeks 13-16)
- Document trade-offs: migration path, backward compatibility, performance cost, risk assessment.
Checkpoint: Present the proposed solution design to the user. Wait for approval before proceeding. If the user requests changes, iterate on the design and re-present.
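One way to picture the dual-track data model is a pair of key-value stores with a lookup rule between them. The sketch below is hypothetical: the class name, the string-valued settings, and the precedence (user-defined values win over system-learned ones) are assumptions that a real design would need to justify from the Phase 1-2 findings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DualTrackSettings:
    """Hypothetical dual-track model: static user values plus learned values."""
    user_defined: dict[str, str] = field(default_factory=dict)    # static track
    system_learned: dict[str, str] = field(default_factory=dict)  # dynamic track

    def learn(self, key: str, value: str) -> None:
        """Record a system-learned value without touching user-defined ones."""
        self.system_learned[key] = value

    def resolve(self, key: str, default: Optional[str] = None) -> Optional[str]:
        """Assumed precedence: user-defined value, then learned value, then default."""
        if key in self.user_defined:
            return self.user_defined[key]
        return self.system_learned.get(key, default)
```

Keeping the two tracks in separate stores means learned data can be discarded or retrained without losing anything the user set explicitly, which is one of the trade-offs worth documenting.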
Phase 4: Documentation Generation
- Determine language needs:
- If the target project's primary language is Chinese → generate bilingual (Chinese + English) documents
- If the target project is international → generate English-only documents
- Always generate the RFC in English (the lingua franca of open source)
- Generate a structured technical document using python-docx (if available) or markdown:
- Include: table of contents, numbered headings, citations, references section
- Use consistent terminology throughout
- Save as `proposal.md` (and `proposal.docx` if python-docx is available)
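The "python-docx if available, otherwise markdown" rule can be sketched as an optional import. The `save_proposal` function and its `sections` argument are assumptions made for illustration; the table of contents, citations, and references section are left out to keep the sketch short.

```python
from pathlib import Path

try:
    from docx import Document  # provided by the python-docx package
except ImportError:
    Document = None  # fall back to markdown only


def save_proposal(title: str, sections: dict[str, str],
                  out_dir: str = ".") -> list[Path]:
    """Write proposal.md, plus proposal.docx when python-docx is installed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    # The markdown version is always written, with numbered headings.
    md_lines = [f"# {title}", ""]
    for i, (heading, body) in enumerate(sections.items(), start=1):
        md_lines += [f"## {i}. {heading}", "", body, ""]
    md_path = out / "proposal.md"
    md_path.write_text("\n".join(md_lines), encoding="utf-8")
    written.append(md_path)

    if Document is not None:
        doc = Document()
        doc.add_heading(title, level=0)
        for i, (heading, body) in enumerate(sections.items(), start=1):
            doc.add_heading(f"{i}. {heading}", level=1)
            doc.add_paragraph(body)
        docx_path = out / "proposal.docx"
        doc.save(str(docx_path))
        written.append(docx_path)
    return written
```

The returned list shows which artifacts were actually produced, so later phases can report them accurately.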
Phase 5: RFC Writing
- Load `references/rfc-template.md` for the standard RFC template.
- Write the RFC in English with these required sections:
```
# RFC: [Title]
## Metadata
- Author: [name]
- Date: [YYYY-MM-DD]
- Status: Draft
- Related Issues: #[issue numbers]
## Problem Statement
[Quantified problem with code evidence and metrics]
## Prior Art
[Academic research, existing solutions, and community context]
## Proposed Solution
[Architecture, data models, API design, implementation phases]
## Trade-offs
[Cost analysis, migration path, backward compatibility, risks]
## Open Questions
[Unresolved decisions needing community input]
## Call for Collaboration
[How to get involved, what help is needed]
```
- Include code examples (with syntax highlighting) and ASCII architecture diagrams.
- Reference specific GitHub issues and discussions using `#123` format.
- Self-review: verify every claim has a citation (code location or paper reference).
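Part of the self-review can be automated by checking that the draft contains every required section from the template above. The sketch below assumes each section is a level-2 markdown heading with exactly the names listed; it does not check citations, which still need a manual pass.

```python
import re

REQUIRED_SECTIONS = [
    "Metadata", "Problem Statement", "Prior Art", "Proposed Solution",
    "Trade-offs", "Open Questions", "Call for Collaboration",
]


def missing_sections(rfc_markdown: str) -> list[str]:
    """Return required section names that have no matching '## ' heading."""
    found = {
        m.group(1).strip()
        for m in re.finditer(r"^##\s+(.+)$", rfc_markdown, re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

An empty result means the structure is complete; anything else lists what to add before publication.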
Phase 6: GitHub Publication
- Check authentication: `gh auth status`. If not authenticated, provide setup instructions and ask the user to configure.
- Save the RFC as `rfc-[slug].md` in the project's docs/ or proposals/ directory.
- Create a GitHub issue:
```bash
# gh rejects labels that do not exist in the repo; create them first or drop the flag
gh issue create -R [owner]/[repo] \
--title "RFC: [Title]" \
--body-file rfc-[slug].md \
--label "enhancement" --label "RFC"
```
- If the `gh` CLI is unavailable, try the GitHub API via curl:
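```bash
# Build the payload with jq (assumed installed) so the RFC body is JSON-escaped
jq -n --arg title "RFC: [Title]" --rawfile body rfc-[slug].md \
'{title: $title, body: $body, labels: ["enhancement", "RFC"]}' |
curl -X POST -H "Authorization: token $GITHUB_TOKEN" \
https://api.github.com/repos/[owner]/[repo]/issues \
-d @-
```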
- If all CLI options fail, output the RFC markdown with manual submission instructions:
- URL to create issue: `https://github.com/[owner]/[repo]/issues/new`
- Suggested title and labels
- Full RFC content to paste
- Reference related issues in the created issue body. Do NOT tag maintainers unless the user explicitly asks.
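For the manual fallback, the file slug and a prefilled new-issue link can be derived from the RFC title. This is a sketch under assumptions: the slug rule (lowercase alphanumeric runs joined by hyphens) is not defined by this workflow, and the `title` and `labels` query parameters rely on GitHub's support for prefilling the new-issue form from the URL.

```python
import re
from urllib.parse import urlencode


def slugify(title: str) -> str:
    """Lowercase the title and join alphanumeric runs with hyphens (assumed rule)."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def new_issue_url(owner: str, repo: str, title: str, labels: list[str]) -> str:
    """Build a new-issue URL with the suggested title and labels prefilled."""
    query = urlencode({"title": f"RFC: {title}", "labels": ",".join(labels)})
    return f"https://github.com/{owner}/{repo}/issues/new?{query}"
```

The RFC body is deliberately left out of the URL, since long bodies are better pasted by hand than encoded into a query string.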
Output Artifacts
| Artifact | Format | Description |
|----------|--------|-------------|
| research-context.md | Markdown | Running document updated through Phases 1-3 |
| proposal.md / proposal.docx | MD/DOCX | Structured technical document |
| rfc-[slug].md | Markdown | RFC in standard format |
| GitHub Issue | Web | Link to published RFC |
Best Practices
- Quote specific code locations — always reference file paths and line numbers
- Quantify problems — use metrics like "50% of files" or "3x performance improvement"
- Cite recent research — prefer papers from 2024-2025
- Design for adoption — include migration paths and gradual rollout plans
- Track costs — document token usage, performance implications, and resource requirements
- Engage early — reference existing issues and invite collaboration from the start
- Self-review citations — verify every claim has a code location or paper reference