VibeCoding Pro

Transform your AI coding workflow from "write and hope" to "iterate with precision." VibeCoding Pro implements the Generator-Evaluator dual-agent pattern (in...
Source: zmy1006-sudo
Uncategorized · clawhub v1.0.0 · 100000 · Key: not required
★ 0 stars · 📥 290 downloads · 💾 0 installs · 1 version · #latest

Overview

VibeCoding Pro

> The AI coding upgrade that actually ships working software.

VibeCoding is fun. VibeCoding Pro is reliable.


What VibeCoding Gets Wrong

Most AI coding workflows look like this:

You → "build a login form" → AI generates → "looks good!" → ship it
                                            ↑
                                   This is the problem.

Why it's broken: The same AI that generated the code also judges whether it works. Having already committed to its approach, it can't objectively evaluate what it just built (a form of commitment bias). Bugs survive. Edge cases break. UX issues ship.

The evidence: Anthropic's 2026 engineering research ran an experiment in which solo Claude agents built 2D game makers whose core game loop was fundamentally broken: entities rendered but ignored all player input. The agent described its own output as "working." Only when a separate Evaluator agent actually clicked through the running game did it discover that the wiring between entity definitions and the game runtime was severed.


What VibeCoding Pro Gets Right

User Goal / Spec
      ↓
 ┌─────────────┐
 │  Generator  │ ← "Build X according to spec"
 │  (vibe)     │
 └──────┬──────┘
        │ artifact
        ↓
 ┌──────────────────────────────────────┐
 │              Evaluator               │
 │  • Reads SPEC (NOT generator output) │
 │  • Opens URL in real browser         │
 │  • Clicks, fills, navigates          │
 │  • Scores on rubric (0-100)          │
 │  • Returns structured JSON feedback  │
 └───────────────────┬──────────────────┘
                     │ score + feedback
                     ↓
          ┌────────────────────┐
          │ score ≥ threshold? │
          │ YES → Done         │
          │ NO  → Generator    │
          └──────────┬─────────┘
                     └── Loop (5-15 rounds)

The structural fix: the Evaluator never reads the Generator's code, reasoning, or commit messages. It reads only the SPEC and operates the deployed artifact, so it has nothing to anchor on. The bias is addressed architecturally, not through clever prompting.


When to Use VibeCoding Pro

| Scenario | Apply? | Why |
|---|---|---|
| React / H5 / Web UI with real interactions | ✅ Yes | Playwright can actually click through it |
| Multi-step form flows (wizard, checkout, onboarding) | ✅ Yes | Evaluator can exercise each step |
| API + frontend integration | ✅ Yes | Evaluator calls endpoints and checks DB state |
| Single utility function | ⚠️ Optional | Might be overkill |
| Pure backend logic (no UI) | ⚠️ Use API Evaluator template | Evaluator calls endpoints directly |
| Design-sensitive work (brand identity, layout) | ✅ Yes | Human-in-the-loop variant works best |

Quick Start

Step 1: Write a Spec Contract

The SPEC is the most important artifact. It's the Evaluator's only reference.

# Spec: [Feature Name] v1.0

## Goal
[One sentence: what exists when this is done?]

## Functional Requirements
- FR-001: [Specific, testable, observable]
- FR-002: [...]

## Interaction Specifications
- UI-001: [User clicks X → Y happens]
- UI-002: [Form accepts type Y, rejects type N]

## Acceptance Criteria
- AC-001: [Measurable outcome]
- AC-002: [...]

## Out of Scope
- [Explicitly NOT required]

## Test Scenarios
**Scenario 1:** Happy path — normal user completes primary action
**Scenario 2:** Edge case — empty data, error state
**Scenario 3:** Boundary — max input length, concurrent actions
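The Evaluator scores against the SPEC alone, so it helps to pull the requirement IDs out of the spec and report each one as passed or failed. The sketch below is minimal and assumes the ID convention from the template above: FR, UI, or AC plus three digits, at the start of a bullet. The `extract_requirement_ids` helper is hypothetical and not part of the skill's scripts.

```python
import re

# Hypothetical helper: matches bullets such as "- FR-001: ..." from the spec template.
# Assumes each ID is FR, UI, or AC followed by a hyphen and three digits.
REQ_PATTERN = re.compile(r"^[ \t]*-[ \t]*((?:FR|UI|AC)-\d{3}):[ \t]*(.+)$", re.MULTILINE)


def extract_requirement_ids(spec_text: str) -> dict[str, str]:
    """Return {requirement_id: description} in the order the IDs appear."""
    return {m.group(1): m.group(2).strip() for m in REQ_PATTERN.finditer(spec_text)}


if __name__ == "__main__":
    spec = """# Spec: Login Form v1.0

## Functional Requirements
- FR-001: Valid credentials redirect to /dashboard
- FR-002: Invalid credentials show an inline error

## Interaction Specifications
- UI-001: Clicking "Log in" with empty fields highlights both inputs
"""
    for req_id, text in extract_requirement_ids(spec).items():
        print(f"{req_id}: {text}")
```

The Evaluator's JSON reply can then be checked for one verdict per extracted ID, so no requirement is silently skipped.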

Step 2: Run the Loop

  1. Generator Agent receives: SPEC + iteration history + previous Evaluator feedback
  2. Generator builds artifact and deploys
  3. Evaluator Agent receives: SPEC + deployed URL (NOT generator code)
  4. Evaluator opens a browser, clicks through the test scenarios, takes screenshots, and scores the result
  5. Evaluator returns structured JSON with score breakdown
  6. If score ≥ threshold → done. If not → loop back to Generator.
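The six steps reduce to a small control loop. This is a hypothetical outline, not the contents of scripts/iteration_loop.py. `generate` and `evaluate` are placeholder callables you would wire to your own agents. The evaluator's signature accepts only the spec and the deployed URL, so step 3's isolation rule is enforced by the function boundary rather than by prompting.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Round:
    number: int
    score: int
    feedback: str


@dataclass
class LoopResult:
    passed: bool
    rounds: list[Round] = field(default_factory=list)


# Hypothetical signatures:
#   generate(spec, history) -> deployed URL (history carries prior scores and feedback)
#   evaluate(spec, url) -> (score 0-100, feedback); it never sees generator code
GenerateFn = Callable[[str, list[Round]], str]
EvaluateFn = Callable[[str, str], tuple[int, str]]


def run_loop(spec: str, generate: GenerateFn, evaluate: EvaluateFn,
             threshold: int = 85, max_rounds: int = 15) -> LoopResult:
    history: list[Round] = []
    for n in range(1, max_rounds + 1):
        url = generate(spec, history)          # steps 1-2: build and deploy
        score, feedback = evaluate(spec, url)  # steps 3-5: spec + URL only
        history.append(Round(n, score, feedback))
        if score >= threshold:                 # step 6: stop on pass
            return LoopResult(True, history)
    return LoopResult(False, history)          # out of rounds: escalate


if __name__ == "__main__":
    # Toy stand-ins: the "evaluator" returns a rising score each round.
    fake_scores = iter([50, 70, 90])
    result = run_loop(
        spec="# Spec: Login Form v1.0",
        generate=lambda spec, history: "http://localhost:3000",
        evaluate=lambda spec, url: (next(fake_scores), "see rubric notes"),
    )
    print(result.passed, [r.score for r in result.rounds])  # True [50, 70, 90]
```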

Architecture Reference

See references/architecture.md for:

  • Four architecture variants (Sequential / Parallel / Staged / Human-in-loop)
  • GAN theory deep-dive and why it works
  • Spec Contract template (copy-paste ready)
  • History format and loop control logic
  • Anti-patterns and how to fix them

Evaluator Templates

See references/evaluator-prompts.md for:

| Template | When to Use | Evaluator Mode |
|---|---|---|
| Web/H5 UI | React/Vue/H5/Web components | Playwright browser automation |
| API/Backend | REST endpoints, microservices | Direct HTTP calls |
| Content/Docs | Reports, copy, documentation | Structured text scoring |

Each template includes:

  • System prompt (calibrated for evaluator independence)
  • User prompt with rubric
  • Required JSON output schema
  • 4 calibration examples (30/60/85/95 score ranges)
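Whatever schema a template prescribes, it pays to reject evaluator replies that are not machine-checkable before they reach the Generator. The sketch assumes a hypothetical reply shape: a total `score`, per-dimension scores under `dimensions`, and a `failures` list. The actual schema lives in references/evaluator-prompts.md and may differ.

```python
import json

# Hypothetical schema: the real one is defined in references/evaluator-prompts.md.
REQUIRED_DIMENSIONS = {"functional", "interaction", "edge_cases", "quality", "craft"}


def parse_evaluator_reply(raw: str) -> dict:
    """Parse an evaluator reply and reject output the Generator cannot act on."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"evaluator reply is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("evaluator reply must be a JSON object")

    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError("'score' must be an integer from 0 to 100")

    dims = data.get("dimensions")
    if not isinstance(dims, dict) or REQUIRED_DIMENSIONS - set(dims):
        raise ValueError(f"'dimensions' must score all of {sorted(REQUIRED_DIMENSIONS)}")

    # Assumed rule: a score short of 100 must cite at least one concrete failure,
    # otherwise the feedback is as vague as "7/10 looks fine".
    if score < 100 and not data.get("failures"):
        raise ValueError("score below 100 but no failures listed")
    return data
```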

Iteration Loop Scripts

See scripts/iteration_loop.py for a complete Python implementation:

  • run_generator() — adapt to your agent (Claude API, OpenAI, subagent, etc.)
  • run_evaluator() — adapt to your QA stack (Playwright, HTTP client, etc.)
  • Full loop control: plateau detection, approach switching, escalation
  • CLI: python iteration_loop.py --spec spec.md --url http://localhost:3000 --threshold 85 --rounds 15
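The loop-control features can be expressed as a decision made after each round. This is a hypothetical sketch, not the logic in scripts/iteration_loop.py. It treats three consecutive rounds that fail to beat the best earlier score by at least `min_gain` points as a plateau, mirroring the "switch strategy after 3 plateauing rounds" fix listed under Common Mistakes. It escalates once the round budget is spent. `min_gain=2` is an assumed value.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"
    CONTINUE = "continue"
    SWITCH_APPROACH = "switch_approach"
    ESCALATE = "escalate"


def next_action(scores: list[int], threshold: int = 85, max_rounds: int = 15,
                plateau_window: int = 3, min_gain: int = 2) -> Action:
    """Decide what to do after the latest round, given all scores so far."""
    if scores and scores[-1] >= threshold:
        return Action.DONE
    if len(scores) >= max_rounds:
        return Action.ESCALATE  # budget spent: hand off to a human
    if len(scores) > plateau_window:
        best_before = max(scores[:-plateau_window])
        recent_best = max(scores[-plateau_window:])
        # Plateau: none of the last N rounds beat the earlier best by min_gain.
        if recent_best < best_before + min_gain:
            return Action.SWITCH_APPROACH
    return Action.CONTINUE
```

On `SWITCH_APPROACH`, the caller would tell the Generator to abandon its current strategy (for example, a different component structure) rather than keep patching it.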

See scripts/calibrate_evaluator.py for evaluator calibration utility:

  • Run on 4 known examples before production
  • Auto-detects score drift and suggests rubric adjustments
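A drift check of this kind can be as simple as comparing the evaluator's scores on the calibration examples with their expected scores. This is a minimal sketch, not the actual calibrate_evaluator.py. The expected scores reuse the 30/60/85/95 calibration points from the evaluator templates. The example names, the ±10-point tolerance, and the hint wording are all assumptions.

```python
from statistics import mean

# Expected scores for four hypothetical calibration examples (30/60/85/95 points).
EXPECTED = {"broken": 30, "partial": 60, "solid": 85, "polished": 95}
TOLERANCE = 10  # assumed: max acceptable |observed - expected| per example


def check_drift(observed: dict[str, int]) -> dict:
    """Compare observed evaluator scores with the expected calibration scores."""
    errors = {name: observed[name] - EXPECTED[name] for name in EXPECTED}
    bias = mean(errors.values())  # positive = inflation, negative = deflation
    report = {
        "errors": errors,
        "bias": bias,
        "out_of_tolerance": sorted(n for n, e in errors.items() if abs(e) > TOLERANCE),
    }
    if bias > TOLERANCE / 2:
        report["hint"] = "scores inflated: tighten rubric wording or add failing examples"
    elif bias < -TOLERANCE / 2:
        report["hint"] = "scores deflated: check the rubric is not over-penalizing"
    return report
```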

Scoring Rubric

Default rubric (adjust weights by domain):

| Dimension | Weight | Measures |
|---|---|---|
| Functional completeness | 30% | Every spec requirement works end-to-end |
| Interaction quality | 25% | Click/form/nav behavior as a real user would experience it |
| Edge case handling | 20% | Error states, empty data, boundary inputs |
| Code/design quality | 15% | Consistency, readability, no anti-patterns |
| Originality/craft | 10% | Avoids template defaults and AI slop patterns |
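If the Evaluator scores each dimension from 0 to 100, the overall score is the weighted sum of those dimension scores. The short sketch below assumes that per-dimension scale; a template may instead award points directly. The dictionary keys are illustrative names for the five rubric rows.

```python
# Default weights from the rubric table; adjust by domain, but keep the sum at 1.0.
WEIGHTS = {
    "functional_completeness": 0.30,
    "interaction_quality": 0.25,
    "edge_case_handling": 0.20,
    "code_design_quality": 0.15,
    "originality_craft": 0.10,
}


def weighted_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension 0-100 scores into a single 0-100 total."""
    if abs(sum(WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("rubric weights must sum to 1.0")
    return round(sum(w * dimension_scores[d] for d, w in WEIGHTS.items()), 1)


if __name__ == "__main__":
    # Strong core functionality, weak edge-case handling.
    total = weighted_score({
        "functional_completeness": 95,
        "interaction_quality": 90,
        "edge_case_handling": 60,
        "code_design_quality": 80,
        "originality_craft": 70,
    })
    print(total)  # 82.0, which fails an 85 threshold
```

The example shows why the weights matter: solid functionality cannot fully compensate for ignored edge cases at a user-facing threshold.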

Threshold guidelines:

| Use Case | PASS_THRESHOLD | MAX_ROUNDS |
|---|---|---|
| Internal prototype | 70 | 10 |
| User-facing feature | 85 | 15 |
| Production critical | 95 | 20 + human review |
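The guideline table maps naturally onto a small policy object. The key names, `LoopPolicy`, and the `is_shippable` helper below are hypothetical. In this sketch, human review for production-critical work is modeled as an extra gate applied after the score threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopPolicy:
    pass_threshold: int
    max_rounds: int
    human_review: bool = False


# Values from the threshold guidelines table.
POLICIES = {
    "internal_prototype": LoopPolicy(70, 10),
    "user_facing_feature": LoopPolicy(85, 15),
    "production_critical": LoopPolicy(95, 20, human_review=True),
}


def is_shippable(use_case: str, final_score: int, human_approved: bool = False) -> bool:
    """Pass only if the score meets the threshold and, where required, a human approved."""
    policy = POLICIES[use_case]
    if final_score < policy.pass_threshold:
        return False
    return human_approved or not policy.human_review
```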

Why This Works (Research Background)

Source: Anthropic Engineering, "Harness Design for Long-Running Application Development" (March 2026)

Key findings:

  • Solo Claude agents on a 16-feature game maker: core game loop broken, entity-runtime wiring severed
  • Full harness (Generator + Evaluator): fully working game maker with sprite animation, sound, and AI-assisted level design
  • Opus 4.6 vs 4.5: improved planning reduced the harness complexity needed
  • Evaluator value is situational: it is worth the cost when the task exceeds what the model reliably does solo

GAN theory parallel: the Generator tries to fool the Evaluator, and the Evaluator tries to catch failures the Generator misses. This adversarial tension drives quality upward. Unlike ML GANs, the feedback here is natural language, so it is fully inspectable and steerable.


Common Mistakes

| Mistake | Why It Fails | Fix |
|---|---|---|
| Same agent generates and evaluates | Cognitive anchoring bias | Separate agents with separate prompts |
| Evaluator reads generator's code | Judges intent, not reality | Show only the deployed URL |
| Skipping calibration | Score inflation/drift | Run 3-5 known examples first |
| Vague scoring ("7/10 looks fine") | Unactionable feedback | Require structured JSON per rubric |
| Too few rounds | Generator never converges | Minimum 10 rounds for complex UI |
| Never switching approach | Gets stuck in a local minimum | Switch strategy after 3 plateauing rounds |
| Using for trivial tasks | Overhead > value | Reserve for multi-feature/full-page work |

OpenClaw Integration

In OpenClaw, use the coder + tester subagents:

Generator → sessions_spawn(agentId="coder", ...)
Evaluator → sessions_spawn(agentId="tester", ...) + browser tool

The tester subagent should use the Playwright MCP tools:

  • browser_navigate → open URL
  • browser_click → interact
  • browser_fill → form input
  • browser_screenshot → capture evidence

Built on Anthropic's 2026 engineering research. Inspired by GAN theory and adversarial validation patterns.

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-07 18:29 · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
