Reflect Critique Revise

Performs a multi-pass senior engineer critique and revision of code, improving quality by catching bugs, API misuse, and style issues across domains like iOS...
Author: stephenlthorn · Source: clawhub · Version: v1.0.0 (latest) · Key: not required

Overview

Reflect, Critique, Revise

The core insight: M2.7 JANGTQ-CRACK reviewing its own output in a different context catches a surprising percentage of its own mistakes. Generation context and review context activate different reasoning paths even in the same model.

The three prompts

Prompt 1 — Senior engineer critique

You are a senior {domain} engineer doing a thorough code review. You have no
investment in this code — your job is to find problems, not to validate it.

Task the author was trying to solve:
{task}

Code to review:

{draft}


{domain_specific_checklist}

For each issue you find, output:
- SEVERITY: critical | major | minor
- LOCATION: line number or function name
- ISSUE: specific problem
- FIX: concrete correction

If you find no issues, say "NO ISSUES FOUND" explicitly.

Be direct. Do not hedge. Do not explain why the code is good.
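
The structured output requested by this prompt lends itself to mechanical parsing, for example to count critical findings before deciding whether to revise. A minimal sketch, assuming the model emits one `KEY: value` field per line (optionally prefixed with a dash) and starts each finding with a SEVERITY line; `Issue` and `parse_critique` are hypothetical names, not part of the skill:

```python
import re
from dataclasses import dataclass

# Hypothetical container for one finding; field names mirror Prompt 1.
@dataclass
class Issue:
    severity: str
    location: str
    issue: str
    fix: str

# One field per line, with an optional leading "-" or "*" bullet.
_FIELD = re.compile(
    r"^\s*[-*]?\s*(SEVERITY|LOCATION|ISSUE|FIX)\s*:\s*(.*)$", re.IGNORECASE
)
_REQUIRED = ("severity", "location", "issue", "fix")

def parse_critique(text: str) -> list[Issue]:
    """Parse reviewer output into Issue records, dropping incomplete ones."""
    records, current = [], {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if not match:
            continue  # free-form commentary is ignored
        key, value = match.group(1).lower(), match.group(2).strip()
        if key == "severity":
            # A SEVERITY line opens a new finding.
            records.append(current)
            current = {}
            value = value.lower()
        current[key] = value
    records.append(current)
    return [Issue(**r) for r in records if all(k in r for k in _REQUIRED)]
```

A reply consisting only of "NO ISSUES FOUND" contains no field lines, so it parses to an empty list.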

Prompt 2 — Revise based on critique

Original task:
{task}

Original code:

{draft}


Review findings:
{critique}

Produce a revised version that addresses every critical and major issue.
Minor issues: fix them when the fix is clean; skip them when the fix would compromise clarity.

Output ONLY the revised code. No explanation, no preamble.
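
Even with an instruction to output only code, a model may wrap its reply in a Markdown fence. The source does not describe any cleanup step, so the following is an assumption: a minimal sketch of a hypothetical `strip_code_fence` helper that could be applied to the revised draft before the next pass.

```python
# Built programmatically so this example does not contain a literal fence.
FENCE = "`" * 3

def strip_code_fence(reply: str) -> str:
    """Return the body of a reply that is exactly one fenced block.

    Replies that are not fully wrapped in a fence are returned stripped
    but otherwise unchanged.
    """
    text = reply.strip()
    lines = text.splitlines()
    wrapped = (
        len(lines) >= 2
        and lines[0].startswith(FENCE)   # opening fence, optional language tag
        and lines[-1].strip() == FENCE   # closing fence on its own line
    )
    return "\n".join(lines[1:-1]) if wrapped else text
```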

Prompt 3 — Final confidence check (after final revision)

Task:
{task}

Final code:

{revised}


Rate your confidence in this code on a three-point scale:
- HIGH: you would ship this to production
- MEDIUM: works but has caveats you'd want reviewed
- LOW: has issues you can't fix without more context

Output exactly one of: HIGH, MEDIUM, LOW
Then one sentence explaining why.
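
The reply to this prompt has a fixed shape (a level, then one sentence), so it can be split into two fields. A minimal sketch, assuming the level appears at the very start of the reply, possibly with stray punctuation or different casing; `parse_confidence` is a hypothetical helper, and its MEDIUM fallback mirrors the default used in the execution logic:

```python
import re

# Leading non-word characters, the level as a whole word, then the reason.
_LEVEL = re.compile(r"\W*(HIGH|MEDIUM|LOW)\b\W*(.*)", re.IGNORECASE | re.DOTALL)

def parse_confidence(raw: str) -> tuple[str, str]:
    """Return (level, reason); fall back to MEDIUM when no level is recognised."""
    text = raw.strip()
    match = _LEVEL.match(text)
    if not match:
        return "MEDIUM", text
    return match.group(1).upper(), match.group(2).strip()
```

Matching the level as a whole word keeps a reply that starts with "Highly" from being read as HIGH.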

Domain-specific checklists

Inject the relevant checklist into Prompt 1:

iOS checklist

Review this Swift code for:
- Deprecated API usage (any API deprecated in iOS 17+)
- Missing @MainActor annotations on UI-touching code
- Improper Task / async handling (retain cycles, missing awaits)
- SwiftUI view hierarchy issues (missing @State, @Binding, @Observable)
- SwiftData/Core Data migration safety
- Force unwraps that could crash
- Missing availability checks for iOS 26+ APIs
- Incorrect concurrency patterns (Sendable violations)

Web/frontend checklist

Review this code for:
- React rendering issues (missing keys, stale closures, effect dependencies)
- Accessibility violations (missing aria labels, keyboard navigation)
- XSS vulnerabilities (unescaped user input)
- Memory leaks (event listeners not cleaned up)
- Bundle size concerns (large imports, unused code)
- TypeScript type safety (any usage, missing types)
- Responsive / mobile breakpoint handling

Python checklist

Review this code for:
- Resource leaks (unclosed files, connections, locks)
- Exception handling gaps (bare except, swallowed errors)
- Off-by-one errors in slices/ranges
- Mutable default arguments
- Race conditions in async/threading code
- SQL injection if building queries
- Unsafe pickle/eval/exec usage
- Missing input validation

Trading checklist

Review this code for:
- Lookahead bias in backtesting (using future data)
- Survivorship bias in data selection
- Slippage/fees ignored in signal generation
- Position sizing without risk limits
- Division by zero in ratio calculations
- Missing market hours / holiday checks
- Currency/unit mixing
- Float comparison issues (use Decimal for money)

VC/analysis checklist

Review this analysis for:
- Unit confusion (ARR vs MRR, net vs gross)
- Missing risk factors (competition, moat erosion, key-person risk)
- Overly optimistic market sizing (TAM bloat)
- Unit economics fundamentals (CAC payback, LTV accuracy)
- Counterfactual reasoning (what if thesis is wrong)
- Selection bias in comparables
- Benchmark staleness
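
Injecting a checklist can be as simple as a dictionary lookup followed by `str.format`. A minimal sketch, assuming checklists are keyed by short domain names; the keys, the abbreviated checklist text, and the shortened template below are illustrative placeholders rather than the skill's actual constants:

```python
# Hypothetical keys; each value would hold the full checklist text from above.
CHECKLISTS = {
    "ios": "Review this Swift code for:\n- Force unwraps that could crash",
    "python": "Review this code for:\n- Mutable default arguments",
}

# Shortened stand-in for Prompt 1 that keeps the same four placeholders.
PROMPT_1 = (
    "You are a senior {domain} engineer doing a thorough code review.\n\n"
    "Task the author was trying to solve:\n{task}\n\n"
    "Code to review:\n\n{draft}\n\n"
    "{domain_specific_checklist}\n"
)

def build_critique_prompt(task: str, draft: str, domain: str) -> str:
    """Fill the critique template, failing loudly on an unknown domain."""
    try:
        checklist = CHECKLISTS[domain]
    except KeyError:
        raise ValueError(f"no checklist for domain {domain!r}") from None
    return PROMPT_1.format(
        task=task,
        draft=draft,
        domain=domain,
        domain_specific_checklist=checklist,
    )
```

`str.format` does not re-scan substituted values, so braces inside the draft (common in Swift, TypeScript, and Python dict literals) pass through unchanged.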

Execution logic

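# Assumes `llm` (an async client), PROMPT_1..PROMPT_3 (the templates above),
# and CHECKLISTS (domain -> checklist text) are defined elsewhere.
async def reflect_critique_revise(task, draft, domain, num_passes=2):
    """Run up to num_passes critique/revise rounds, then a confidence check."""
    current = draft
    critique_history = []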

    for i in range(num_passes):
        # Pass N — critique
        critique = await llm.generate(
            prompt=PROMPT_1.format(
                task=task,
                draft=current,
                domain=domain,
                domain_specific_checklist=CHECKLISTS[domain]
            ),
            model="m27-jangtq-crack",
            system="You are a senior engineer code reviewer.",
            temperature=0.2,  # low temp for consistent critique
            max_tokens=2000
        )
        critique_history.append({"pass": i+1, "critique": critique})

        # Early exit if no issues
        if "NO ISSUES FOUND" in critique:
            break

        # Pass N — revise
        current = await llm.generate(
            prompt=PROMPT_2.format(task=task, draft=current, critique=critique),
            model="m27-jangtq-crack",
            system=f"You are a senior {domain} engineer revising code.",
            temperature=0.3,
            max_tokens=6000
        )

    # Final confidence check
    confidence_raw = await llm.generate(
        prompt=PROMPT_3.format(task=task, revised=current),
        model="m27-jangtq-crack",
        temperature=0.1,
        max_tokens=200
    )
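    # Normalize first so leading whitespace or lowercase output does not
    # fall through to the MEDIUM default.
    verdict = confidence_raw.strip().upper()[:10]
    confidence = (
        "HIGH" if "HIGH" in verdict else
        "LOW" if "LOW" in verdict else
        "MEDIUM"
    )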

    return {
        "code": current,
        "critique_history": critique_history,
        "confidence": confidence
    }
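
The function above depends on an `llm` object whose interface the source does not define. For a dry run without a model, a scripted stand-in is enough. A minimal sketch, assuming only the keyword arguments visible in the calls above; `StubLLM` and `demo` are hypothetical names:

```python
import asyncio

class StubLLM:
    """Scripted stand-in: returns canned replies in order and records each call."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.calls = []

    async def generate(self, prompt, model, temperature, max_tokens, system=None):
        # Record what was asked so a test can inspect prompts afterwards.
        self.calls.append({"prompt": prompt, "model": model, "system": system})
        if not self._replies:
            raise RuntimeError("StubLLM ran out of scripted replies")
        return self._replies.pop(0)

async def demo():
    # First reply plays the critique, second plays the confidence check.
    llm = StubLLM(["NO ISSUES FOUND", "HIGH\nNothing to fix."])
    critique = await llm.generate(
        prompt="review", model="stub", temperature=0.2, max_tokens=2000
    )
    verdict = await llm.generate(
        prompt="rate", model="stub", temperature=0.1, max_tokens=200
    )
    return critique, verdict, len(llm.calls)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With these two scripted replies, `reflect_critique_revise` would take the early exit after the first critique and go straight to the confidence check, for two calls in total.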

Cost / time

Each pass: ~2 LLM calls (critique + revise), ~5K tokens total.

Default 2 passes: ~10K tokens, ~4 minutes on M4 Max at 40 t/s.

For quick tasks you can drop to num_passes=1. For critical production code, run num_passes=3 and escalate to Claude Code if confidence != HIGH.
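
The figures above follow from simple arithmetic. A minimal sketch, assuming every token is generated at the quoted rate and ignoring the short final confidence call; `estimate_run` and `should_escalate` are hypothetical helpers:

```python
def estimate_run(num_passes: int,
                 tokens_per_pass: int = 5000,
                 tokens_per_second: float = 40.0) -> tuple[int, float]:
    """Return (total_tokens, minutes) from the rough per-pass figure above."""
    total_tokens = num_passes * tokens_per_pass
    minutes = total_tokens / tokens_per_second / 60
    return total_tokens, minutes

def should_escalate(confidence: str) -> bool:
    # Mirrors the rule above: anything other than HIGH is escalated.
    return confidence != "HIGH"

if __name__ == "__main__":
    for passes in (1, 2, 3):
        tokens, minutes = estimate_run(passes)
        print(f"{passes} pass(es): {tokens} tokens, about {minutes:.1f} min")
```

For the default of 2 passes this gives 10,000 tokens and 250 seconds, which matches the roughly 4 minutes quoted.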

Integration notes

This skill is called by coding-orchestrator as step 7. It can also be called standalone when the user says "review this code" or pastes code with "is this right?"

When called standalone, the caller must provide domain — use route-specialist to classify if not provided.
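
The standalone fallback can be expressed as a small wrapper. The interface of route-specialist is not shown in the source, so the sketch below only assumes it can be represented as an async callable that maps a task to a domain name; `resolve_domain` and `fake_classifier` are hypothetical:

```python
import asyncio

async def resolve_domain(task, domain=None, classify=None):
    """Return the caller's domain, or ask `classify` when none was given.

    `classify` stands in for route-specialist; its real interface is unknown.
    """
    if domain:
        return domain
    if classify is None:
        raise ValueError("domain not provided and no classifier available")
    return await classify(task)

async def fake_classifier(task):
    # Placeholder logic for the demo only, not how route-specialist works.
    return "python" if "def " in task else "web"

if __name__ == "__main__":
    print(asyncio.run(resolve_domain("def f(): pass", classify=fake_classifier)))
```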

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-07 10:09
