Deep Research Pipeline

Deep Research Pipeline turns broad questions into cited, publication-quality reports through a staged research workflow: planning, multi-query retrieval, chunk selection, analysis, reflection, writing, and optional verification.

It is designed for research that should not be answered from memory or a single search result. The pipeline keeps claims tied to sources, surfaces contradictions, tracks gaps, and can resume from checkpoints.

Why Use It

Multi-stage research, not one-shot summarization — separate researcher, analyst, reflection, and writer stages.
Citation integrity — findings and final claims trace back to URLs/sources.
Reflection loops — the pipeline checks coverage and decides whether another cycle is needed.
Portable LLM config — supports LLM_API_KEY/LLM_API_BASE, OpenAI-compatible endpoints, or Z.AI GLM.
Operational controls — checkpoint/resume, time limits, token budgets, output formats, and mock mode.

When to Use

Deep research, comprehensive analysis, literature reviews, competitive analysis, fact-checking, technology deep-dives — anything needing multiple sources, synthesis, and verified citations.

Quick Start

cd skills/deep-research

# Optional: configure any OpenAI-compatible provider
export LLM_API_KEY="your-key"
export LLM_API_BASE="https://api.example.com/v1"
export LLM_MODEL="your-model"

# Or use OpenAI-compatible env names
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.example.com/v1"

# Run a report
python3 scripts/research_pipeline.py \
  "Compare Vercel, Netlify, and Cloudflare Pages in 2026" \
  --max-cycles 2 \
  --format report \
  --output report.md

# Test without API calls
python3 scripts/research_pipeline.py "test question" --mock --output report.md

If neither the LLM_* nor the OPENAI_* variables are set, the skill falls back to Z.AI via ZAI_API_KEY and, optionally, ZAI_API_ENDPOINT.

Architecture

ORCHESTRATOR (you)
    │
    ├── Plan → Decompose question into research dimensions
    │
    ├── REFLECTION LOOP (0-8 cycles)
    │   ├── Researcher Agent (parallel) → multi-query search + chunk selection
    │   ├── Analyst Agent → dedupe + themes + contradictions
    │   └── Reflection → coverage check, gap analysis, continue decision
    │
    ├── Writer Agent → polished report (report/summary/brief/json)
    │
    └── Verify (optional) → adversarial fact-check

Key principle: the orchestrator NEVER searches directly. Only each stage's cleaned output is passed on to the next stage.

Two Modes

Mode 1: Full Pipeline CLI (Recommended)

Use the enhanced research_pipeline.py for automated end-to-end research:

# Full research with all features
python3 scripts/research_pipeline.py "What is the state of quantum computing in 2026?" \
    --max-cycles 3 \
    --output report.md \
    --format report

# Mock mode (no API calls, for testing)
python3 scripts/research_pipeline.py "test question" --mock --output report.md

# With budget limits
python3 scripts/research_pipeline.py "question" \
    --max-cycles 3 --time-limit 300 --token-limit 40000

# Resume from checkpoint
python3 scripts/research_pipeline.py "question" \
    --resume checkpoint.json --output report.md

# Explicit dimensions
python3 scripts/research_pipeline.py "question" \
    --dimensions architecture benchmarks limitations \
    --output report.md --format summary

CLI Flags:

| Flag | Default | Description |
|------|---------|-------------|
| `--max-cycles` | 3 | Max research cycles (1-8) |
| `--mock` | false | Use mock data, no API calls |
| `--output` / `-o` | stdout | Output file path |
| `--format` / `-f` | report | Output format: `report`, `summary`, `brief`, `json` |
| `--time-limit` | 900 | Max seconds for entire pipeline |
| `--token-limit` | 60000 | Max estimated tokens |
| `--checkpoint` | none | Save checkpoints to path |
| `--resume` | none | Resume from checkpoint file |
| `--dimensions` | auto | Explicit research dimensions |
| `--no-parallel` | false | Research dimensions sequentially |
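
For readers who want to see how the flags and defaults above fit together, here is a minimal argparse sketch. It mirrors the table only; it is an assumption about how such a CLI could be declared, not the actual parser in research_pipeline.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the documented flags with the documented defaults."""
    p = argparse.ArgumentParser(prog="research_pipeline.py")
    p.add_argument("question", help="Research question")
    p.add_argument("--max-cycles", type=int, default=3,
                   choices=range(1, 9), metavar="N")
    p.add_argument("--mock", action="store_true")
    p.add_argument("--output", "-o", default=None)  # None means stdout
    p.add_argument("--format", "-f", default="report",
                   choices=["report", "summary", "brief", "json"])
    p.add_argument("--time-limit", type=int, default=900)
    p.add_argument("--token-limit", type=int, default=60000)
    p.add_argument("--checkpoint", default=None)
    p.add_argument("--resume", default=None)
    p.add_argument("--dimensions", nargs="+", default=None)  # None means auto
    p.add_argument("--no-parallel", action="store_true")
    return p
```

Because `--dimensions` takes one or more values, the question should come before it on the command line, as it does in the examples in this document.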

Output formats:

report — Full markdown: Executive Summary → Key Findings → Detailed Analysis → Contradictions → Gaps → Sources → Methodology
summary — Executive summary + top 5 findings + sources
brief — Bullet-point format for quick scanning
json — Structured JSON with annotated findings and metadata

Mode 2: Orchestrated Sub-Agents (For complex research)

Use when you need fine-grained control over each stage or parallel dimension research with sub-agents.

Workflow (Orchestrated Mode)

Phase 1: Planning

Analyze the question, create a slug, and make the memory/research/<slug>/ directory
Generate a research plan with dimensions and questions
Save it to plan.md
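
The slug step can be sketched as follows. The exact slug rules are not specified in this document, so the lowercase-and-hyphenate scheme and the 60-character cap are assumptions, and `slugify` and `research_dir` are hypothetical helper names.

```python
import re
from pathlib import Path


def slugify(question, max_len=60):
    """Lowercase, turn runs of non-alphanumerics into hyphens, and trim."""
    slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def research_dir(question, root="memory/research"):
    """Return the per-question directory path without creating it."""
    return Path(root) / slugify(question)
```

Calling `research_dir(question).mkdir(parents=True, exist_ok=True)` would then create the directory before plan.md is written.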

Phase 2: Research Cycle (repeat up to 8 times)

Step A: Spawn Researcher Agent(s)

Use sessions_spawn with a task brief (NOT the full query):

{
  "dimension": "technical architecture",
  "specific_questions": ["How does X work?", "What are Y's components?"],
  "context_limit": 5000,
  "max_sources": 10
}

Researcher agent does:

Multi-query generation — scripts/query_generator.py produces 3-5 variants
Parallel search — web_search for each variant
Content fetching — web_fetch for top results
LLM chunk selection — scripts/chunk_selector.py scores each chunk (≥0.7)
Context expansion — scripts/context_expander.py fetches surrounding content
Output: JSON findings with citations

You can spawn 2-3 researcher agents in parallel, each covering a different dimension.
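
The chunk selection step amounts to a threshold filter over scored chunks. The sketch below assumes the scores have already been produced by the LLM scorer and only shows the filtering. The `ScoredChunk` type and `select_chunks` function are hypothetical, and applying `max_sources` as a cap on distinct URLs is an assumption about what that field in the task brief means.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    source_url: str
    score: float  # relevance in [0.0, 1.0], assumed to come from the scorer


def select_chunks(chunks, threshold=0.7, max_sources=10):
    """Keep chunks scoring >= threshold, best first, from at most max_sources URLs."""
    kept = sorted(
        (c for c in chunks if c.score >= threshold),
        key=lambda c: c.score,
        reverse=True,
    )
    selected, seen_urls = [], set()
    for chunk in kept:
        # Skip chunks from a new URL once the source cap is reached.
        if chunk.source_url not in seen_urls and len(seen_urls) >= max_sources:
            continue
        seen_urls.add(chunk.source_url)
        selected.append(chunk)
    return selected
```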

Step B: Spawn Analyst Agent

After researcher(s) complete, spawn analyst with their combined output:

Deduplicate overlapping findings
Flag contradictions (explicit + implicit)
Group into thematic clusters
Identify gaps
Output: Cleaned JSON + gap list
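
As an illustration of the deduplication step, the sketch below merges findings whose claim text matches after normalization and unions their sources. The `claim` and `sources` field names are assumptions about the findings JSON, and `dedupe_findings` is a hypothetical name, not the `analyze_findings` function in analyst.py. Exact-match normalization will not catch paraphrases, which the real analyst may handle differently.

```python
import re


def normalize(claim):
    """Lowercase and collapse punctuation and whitespace for comparison."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", claim.lower()).split())


def dedupe_findings(findings):
    """Merge findings with the same normalized claim, keeping first-seen order."""
    merged = {}
    for finding in findings:
        key = normalize(finding["claim"])
        entry = merged.setdefault(key, {"claim": finding["claim"], "sources": []})
        for url in finding.get("sources", []):
            if url not in entry["sources"]:
                entry["sources"].append(url)
    return list(merged.values())
```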

Step C: Run Reflection

After analyst completes, run scripts/reflection.py:

What's covered? (themes + confidence scores)
What gaps remain? (unanswered questions)
What contradictions emerged?
New directions discovered?
Should continue? (coverage ≥ 0.8 + minor gaps → stop)

Save reflection to memory/research/<slug>/reflection-cycle-N.md

Continue Decision

Coverage ≥ 0.8 AND gaps minor → proceed to Phase 3
Major contradictions → spawn targeted researcher
Significant gaps → another researcher cycle
Hard stop at cycle 8
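
The continue decision can be written as a small function. This is a sketch under stated assumptions: the `Reflection` fields and the returned step names are hypothetical, and the precedence (hard stop first, then contradictions, then gaps and coverage) is one reasonable reading of the rules above, since the document does not state an order.

```python
from dataclasses import dataclass

MAX_CYCLES = 8         # hard stop from the rules above
COVERAGE_TARGET = 0.8  # coverage threshold from the rules above


@dataclass
class Reflection:
    coverage: float            # 0.0-1.0
    major_gaps: int            # count of significant unanswered questions
    major_contradictions: int  # count of unresolved major contradictions


def next_step(reflection, cycle):
    """Return 'write', 'targeted_research', or 'research' for the given cycle."""
    if cycle >= MAX_CYCLES:
        return "write"
    if reflection.major_contradictions > 0:
        return "targeted_research"
    if reflection.major_gaps > 0 or reflection.coverage < COVERAGE_TARGET:
        return "research"
    return "write"
```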

Phase 3: Write Report

Use the Writer Agent (scripts/writer.py) for publication-quality output:

# From Python
from writer import WriterAgent, OutputFormat, write_report, save_report

# analyst_output is the analyst stage's result (from analyst or run_analyst())
question = "What is RAG?"

# Generate a report using WriterAgent
agent = WriterAgent(use_llm=True)
result = agent.write_report(
    analyst_output,
    question=question,
    fmt=OutputFormat.REPORT,
)

# Or use the convenience function
result = write_report(analyst_output, question, fmt="report")

# Save to file
save_report(result, "output/report.md")

Report features:

🟢🟡🟠🔴 Confidence indicators on every finding
[source_url] inline citations throughout
⚠️ Contradiction callout boxes where sources disagree
Structured sections: Summary → Findings → Analysis → Contradictions → Gaps → Sources → Methodology
Template-based fallback when no LLM available

Phase 4: Verify (optional sub-agent)

Spawn adversarial verifier:

Anchor every claim to source
Verify URLs with web_fetch
Remove unsourced claims
Save to review.md
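
The "remove unsourced claims" step can be sketched as a filter. This version only checks that each claim carries a well-formed http(s) URL and does not fetch anything, so it is a weaker check than the web_fetch verification described above. The `source_url` field and both function names are assumptions.

```python
from urllib.parse import urlparse


def has_valid_source(claim):
    """True when the claim has an http(s) URL with a host part."""
    parsed = urlparse(claim.get("source_url") or "")
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_claims(claims):
    """Return (kept, removed) so removed claims can be listed in review.md."""
    kept = [c for c in claims if has_valid_source(c)]
    removed = [c for c in claims if not has_valid_source(c)]
    return kept, removed
```

Returning the removed claims as well, rather than dropping them silently, keeps the review auditable.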

Phase 5: Deliver

Fix any FATAL issues from review
Copy to final.md
Write provenance.md (date, cycles, sources, verification status)
Send summary to user
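
A provenance file with the four fields listed above could be rendered as shown below. The layout is an assumption, since the document names the fields but not the format, and `render_provenance` is a hypothetical helper.

```python
from datetime import date


def render_provenance(run_date, cycles, sources, verified):
    """Build the text of provenance.md from the run's metadata."""
    lines = [
        "Provenance",
        "",
        f"Date: {run_date.isoformat()}",
        f"Cycles completed: {cycles}",
        f"Verification status: {'verified' if verified else 'not verified'}",
        f"Sources ({len(sources)}):",
    ]
    lines += [f"- {url}" for url in sources]
    return "\n".join(lines) + "\n"
```

In line with the integrity rules later in this document, `verified` should be True only when the Phase 4 review was actually run.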

Python API

import sys, os
sys.path.insert(0, os.path.expanduser("~/.openclaw/workspace/skills/deep-research/scripts"))

from research_pipeline import run_enhanced_pipeline

result = run_enhanced_pipeline(
    question="What is the state of quantum computing in 2026?",
    max_cycles=3,
    dimensions=["hardware", "algorithms", "applications", "challenges"],
    mock_mode=False,
    output_format="report",
    time_limit=900,
    token_limit=60000,
    checkpoint_path="checkpoint.json",    # auto-saves progress
    parallel_dimensions=True,             # parallel research per dimension
)

# result["report"] = markdown string
# result["cycles_completed"] = int
# result["final_coverage"] = float (0.0-1.0)
# result["metadata"] = dict with timing, findings count, etc.
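
A caller might condense the returned dict into a one-line status. The sketch below relies only on the keys listed above; the 0.8 target is borrowed from the continue decision, and `summarize_result` is a hypothetical helper rather than part of the pipeline.

```python
def summarize_result(result, coverage_target=0.8):
    """Summarize cycles and coverage from a run_enhanced_pipeline-style result."""
    coverage = result["final_coverage"]
    status = "meets" if coverage >= coverage_target else "below"
    return (
        f"{result['cycles_completed']} cycle(s), "
        f"coverage {coverage:.2f} ({status} target {coverage_target:.2f})"
    )
```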

Scripts

| Script | Purpose | Usage |
|--------|---------|-------|
| `research_pipeline.py` | Full pipeline orchestration | `python3 scripts/research_pipeline.py "question" --max-cycles 3` |
| `query_generator.py` | Generate 3-5 search query variants | `python3 scripts/query_generator.py -q "..."` |
| `chunk_selector.py` | LLM scores chunks, filters by threshold | `python3 scripts/chunk_selector.py -q "..." -c chunks.json` |
| `context_expander.py` | Fetch surrounding context for incomplete chunks | `python3 scripts/context_expander.py -s selected.json -q "..."` |
| `reflection.py` | Mandatory gap/contradiction check | `python3 scripts/reflection.py -q "..." -f findings.json -c 1` |
| `writer.py` | Publication-quality report generation | `from writer import WriterAgent, write_report` |
| `analyst.py` | Dedup + themes + contradictions (no API needed) | `from analyst import analyze_findings` |
| `researcher.py` | Multi-source research orchestration | `from researcher import research, research_dimension` |
| `research_sources.py` | Search adapters (web, GitHub, docs) | `from research_sources import WebSearchSource` |
| `fact-checker.py` | Claim extraction + source ranking | `python3 scripts/fact-checker.py "text" --sources '["url1"]'` |

All LLM-enabled scripts use the shared provider-agnostic llm_client.py.

Provider resolution order:

LLM_API_KEY + LLM_API_BASE + optional LLM_MODEL
OPENAI_API_KEY + OPENAI_API_BASE / OPENAI_BASE_URL + optional OPENAI_MODEL
ZAI_API_KEY + optional ZAI_API_ENDPOINT / GLM_MODEL

If no key is configured, use --mock for local pipeline testing or rely on scripts with rule-based fallbacks where available.
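
The resolution order above can be sketched as a function over the environment. This is not the code in llm_client.py; `resolve_provider`, the returned dict shape, and the provider labels are hypothetical. It also assumes that the presence of the API key alone selects a provider, and that OPENAI_API_BASE is preferred over OPENAI_BASE_URL when both are set.

```python
import os


def resolve_provider(env=None):
    """Return the first configured provider in the documented order, or None."""
    env = os.environ if env is None else env
    if env.get("LLM_API_KEY"):
        return {
            "provider": "generic",
            "api_key": env["LLM_API_KEY"],
            "base_url": env.get("LLM_API_BASE"),
            "model": env.get("LLM_MODEL"),
        }
    if env.get("OPENAI_API_KEY"):
        return {
            "provider": "openai-compatible",
            "api_key": env["OPENAI_API_KEY"],
            "base_url": env.get("OPENAI_API_BASE") or env.get("OPENAI_BASE_URL"),
            "model": env.get("OPENAI_MODEL"),
        }
    if env.get("ZAI_API_KEY"):
        return {
            "provider": "zai",
            "api_key": env["ZAI_API_KEY"],
            "base_url": env.get("ZAI_API_ENDPOINT"),
            "model": env.get("GLM_MODEL"),
        }
    return None  # no key configured: use --mock or rule-based fallbacks
```

Passing a plain dict as `env` makes the order easy to check without touching the real environment.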

Examples

Example 1: Quick Competitive Analysis

python3 scripts/research_pipeline.py \
    "Compare Vercel vs Netlify vs Cloudflare Pages features and pricing 2026" \
    --max-cycles 2 \
    --dimensions features pricing performance ecosystem \
    --format summary \
    --output competitive-analysis.md

Example 2: Deep Technology Research

python3 scripts/research_pipeline.py \
    "What is the current state of AI agent frameworks?" \
    --max-cycles 4 \
    --time-limit 600 \
    --token-limit 80000 \
    --checkpoint /tmp/ai-agents-checkpoint.json \
    --format report \
    --output ai-agents-research.md

Example 3: Literature Review (mock mode for testing)

python3 scripts/research_pipeline.py \
    "What does the research say about transformer architecture efficiency?" \
    --mock \
    --max-cycles 3 \
    --format report \
    --output literature-review.md

Example 4: Bullet Brief for Quick Scanning

python3 scripts/research_pipeline.py \
    "What are the latest developments in Rust web frameworks?" \
    --max-cycles 2 \
    --format brief \
    --output rust-web-brief.md

Example 5: JSON Output for Programmatic Use

python3 scripts/research_pipeline.py \
    "What is the market size of edge computing?" \
    --max-cycles 2 \
    --format json \
    --output edge-computing-data.json

Integration with Night Shift

To queue research plans for Night Shift execution:

Create a research plan file at memory/research/queued/<slug>.json:

{
  "question": "What is the state of quantum computing in 2026?",
  "max_cycles": 3,
  "dimensions": ["hardware", "algorithms", "applications"],
  "output_format": "report",
  "output_path": "memory/research/quantum-2026/final.md",
  "time_limit": 600,
  "created_at": "2026-04-25T06:00:00Z"
}

Night Shift picks up queued plans and runs them via:

# $DIMENSIONS is left unquoted so it splits into one argument per dimension
python3 scripts/research_pipeline.py "$QUESTION" \
    --max-cycles "$MAX_CYCLES" \
    --dimensions $DIMENSIONS \
    --format "$FORMAT" \
    --output "$OUTPUT_PATH" \
    --time-limit "$TIME_LIMIT"
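
The mapping from a queued plan file to that command line can be sketched in Python. How Night Shift actually builds the command is not shown in this document, so `build_command` is a hypothetical helper. It uses the plan keys from the JSON example above and returns an argument list, which avoids shell quoting issues.

```python
def build_command(plan, script="scripts/research_pipeline.py"):
    """Translate a queued plan dict into an argv list for the pipeline CLI."""
    cmd = ["python3", script, plan["question"]]
    if "max_cycles" in plan:
        cmd += ["--max-cycles", str(plan["max_cycles"])]
    if plan.get("dimensions"):
        cmd += ["--dimensions", *plan["dimensions"]]
    if "output_format" in plan:
        cmd += ["--format", plan["output_format"]]
    if "output_path" in plan:
        cmd += ["--output", plan["output_path"]]
    if "time_limit" in plan:
        cmd += ["--time-limit", str(plan["time_limit"])]
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` after loading the plan with `json.load`.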

Results are saved to memory/research/<slug>/final.md with provenance metadata.

File Layout

memory/research/<slug>/
├── plan.md                    # Research plan with dimensions
├── reflection-cycle-1.md      # Reflection after each cycle
├── reflection-cycle-2.md
├── researcher-output-*.json   # Raw researcher findings
├── analyst-output.json        # Merged/deduped findings
├── draft.md                   # First draft
├── brief.md                   # Verified brief
├── review.md                  # Adversarial review (optional)
├── final.md                   # Final report
├── provenance.md              # Metadata + source verification status
└── checkpoint.json            # Pipeline checkpoint (auto-saved)

Quick Mode

Skip sub-agents and the full pipeline, and run 5-10 searches yourself. Still use evidence tables, verify URLs, and cite sources. The answer is shorter and delivered inline in chat.

Integrity Commandments

Never fabricate a source — no URL = don't mention it
Never claim existence without checking
Never extrapolate unread details
Read before summarizing
No fake certainty — never say "verified" unless checked
Never invent numbers/benchmarks/comparisons
Separate observations from inferences
Every claim traces to a source — citation integrity is mandatory
Reflection is not optional — run it after every cycle
Stage separation — orchestrator never searches, researchers never see full plan

Scale Decision

Single fact → Quick Mode (3-10 tool calls, no sub-agents)
2-3 item comparison → 2 parallel researcher sub-agents, 2-3 cycles
Broad/multi-faceted → 3-4 researcher sub-agents, 3-5 cycles
PhD-level deep dive → 4+ researchers, 5-8 cycles
