
Agent Architecture Analysis

Use when auditing an agent codebase against the 12-Factor Agents methodology, reviewing LLM-powered system architecture, or assessing agentic app compliance....
Author: anderskev
Uncategorized · clawhub · v1.0.2 · 2 versions · Key: not required
Overview

12-Factor Agents Compliance Analysis

> Reference: 12-Factor Agents

Input Parameters

| Parameter | Description | Required |
|-----------|-------------|----------|
| docs_path | Path to documentation directory (for existing analyses) | Optional |
| codebase_path | Root path of the codebase to analyze | Required |

Analysis Framework

Factor 1: Natural Language to Tool Calls

Principle: Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.

Search Patterns:

# Look for Pydantic schemas
grep -r "class.*BaseModel" --include="*.py"
grep -r "TaskDAG\|TaskResponse\|ToolCall" --include="*.py"

# Look for JSON schema generation
grep -r "model_json_schema\|json_schema" --include="*.py"

# Look for structured output generation
grep -r "output_type\|response_model" --include="*.py"

File Patterns: **/agents/*.py, **/schemas/*.py, **/models/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | All LLM outputs use Pydantic/dataclass schemas with validators |
| Partial | Some outputs typed, but dict returns or unvalidated strings exist |
| Weak | LLM returns raw strings parsed manually or with regex |

Anti-patterns:

  • json.loads(llm_response) without schema validation
  • output.split() or regex parsing of LLM responses
  • dict[str, Any] return types from agents
  • No validation between LLM output and handler execution
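
A minimal sketch of the validation layer the Strong level describes, using a standard-library dataclass in place of Pydantic. The `ToolCall` fields, `ALLOWED_TOOLS`, and `parse_tool_call` are hypothetical names chosen for illustration, not part of any audited codebase.

```python
import json
from dataclasses import dataclass

ALLOWED_TOOLS = {"search_code", "read_file"}  # hypothetical tool names


@dataclass(frozen=True)
class ToolCall:
    tool: str
    path: str

    def __post_init__(self):
        # Reject anything outside the declared schema before a handler runs.
        if self.tool not in ALLOWED_TOOLS:
            raise ValueError(f"unknown tool: {self.tool!r}")
        if not isinstance(self.path, str) or not self.path:
            raise ValueError("path must be a non-empty string")


def parse_tool_call(llm_response: str) -> ToolCall:
    """Parse raw LLM text into a validated ToolCall or raise ValueError."""
    data = json.loads(llm_response)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"tool", "path"}:
        raise ValueError("expected exactly the keys 'tool' and 'path'")
    return ToolCall(**data)
```

Compared with a bare `json.loads(llm_response)`, every failure mode here surfaces as one exception type at one boundary, so the handler only ever sees a well-formed `ToolCall`.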

Factor 2: Own Your Prompts

Principle: Treat prompts as first-class code you control, version, and iterate on.

Search Patterns:

# Look for embedded prompts
grep -r "SYSTEM_PROMPT\|system_prompt" --include="*.py"
grep -r '""".*You are' --include="*.py"

# Look for template systems
grep -r "jinja\|Jinja\|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"

# Look for prompt directories
find . -type d -name "prompts"

File Patterns: **/prompts/**, **/templates/**, **/agents/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Prompts in separate files, templated (Jinja2), versioned |
| Partial | Prompts as module constants, some parameterization |
| Weak | Prompts hardcoded inline in functions, f-strings only |

Anti-patterns:

  • f"You are a {role}..." inline in agent methods
  • Prompts mixed with business logic
  • No way to iterate on prompts without code changes
  • No prompt versioning or A/B testing capability
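
One way to keep prompts out of agent methods is to load them from files and fill placeholders at call time. The sketch below uses the standard library's `string.Template` as a stand-in for Jinja2; the `render_prompt` helper and the `<name>.txt` layout are assumptions for illustration.

```python
from pathlib import Path
from string import Template


def render_prompt(prompts_dir: Path, name: str, **params: str) -> str:
    """Load prompts_dir/<name>.txt and fill its $placeholders."""
    text = (prompts_dir / f"{name}.txt").read_text(encoding="utf-8")
    # substitute() raises KeyError on a missing parameter instead of
    # silently sending a half-filled prompt to the model.
    return Template(text).substitute(**params)
```

Because the prompt text lives in its own file, it can be diffed, reviewed, and versioned without touching the code that calls `render_prompt`.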

Factor 3: Own Your Context Window

Principle: Control how history, state, and tool results are formatted for the LLM.

Search Patterns:

# Look for context/message management
grep -r "AgentMessage\|ChatMessage\|messages" --include="*.py"
grep -r "context_window\|context_compiler" --include="*.py"

# Look for custom serialization
grep -r "to_xml\|to_context\|serialize" --include="*.py"

# Look for token management
grep -r "token_count\|max_tokens\|truncate" --include="*.py"

File Patterns: **/context/*.py, **/state/*.py, **/core/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Custom context format, token optimization, typed events, compaction |
| Partial | Basic message history with some structure |
| Weak | Raw message accumulation, standard OpenAI format only |

Anti-patterns:

  • Unbounded message accumulation
  • Large artifacts embedded inline (diffs, files)
  • No agent-specific context filtering
  • Same context for all agent types
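
To make "bounded" concrete, here is a rough sketch of a context builder that serializes typed events into a custom format and keeps only the newest ones that fit a token budget. The `Event` type, the XML-like tags, and the four-characters-per-token estimate are all assumptions; a real system would use its model's tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # e.g. "user", "tool_result", "error"
    content: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def build_context(events: list[Event], budget: int) -> str:
    """Keep the newest events that fit the token budget, oldest first."""
    kept: list[str] = []
    used = 0
    for event in reversed(events):
        line = f"<{event.kind}>{event.content}</{event.kind}>"
        cost = estimate_tokens(line)
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return "\n".join(reversed(kept))
```

Dropping the oldest events is the simplest compaction policy; summarizing them instead would follow the same shape with a different branch where the loop currently breaks.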

Factor 4: Tools Are Structured Outputs

Principle: Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.

Search Patterns:

# Look for tool/response schemas
grep -r "class.*Response.*BaseModel" --include="*.py"
grep -r "ToolResult\|ToolOutput" --include="*.py"

# Look for deterministic handlers
grep -r "def handle_\|def execute_" --include="*.py"

# Look for validation layer
grep -r "model_validate\|parse_obj" --include="*.py"

File Patterns: **/tools/*.py, **/handlers/*.py, **/agents/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | All tool outputs schema-validated, handlers type-safe |
| Partial | Most tools typed, some loose dict returns |
| Weak | Tools return arbitrary dicts, no validation layer |

Anti-patterns:

  • Tool handlers that directly execute LLM output
  • eval() or exec() on LLM-generated code
  • No separation between decision (LLM) and execution (code)
  • Magic method dispatch based on string matching
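
The separation between decision and execution can be sketched with an explicit handler table keyed by schema type, which avoids both `eval()` and string-matching dispatch. The `AddLabel` and `CloseIssue` schemas and their handlers are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AddLabel:
    issue_id: int
    label: str


@dataclass(frozen=True)
class CloseIssue:
    issue_id: int


def handle_add_label(call: AddLabel) -> str:
    return f"labelled #{call.issue_id} as {call.label}"


def handle_close_issue(call: CloseIssue) -> str:
    return f"closed #{call.issue_id}"


# Explicit type -> handler table: the LLM picks a schema, code picks the action.
HANDLERS: dict[type, Callable] = {
    AddLabel: handle_add_label,
    CloseIssue: handle_close_issue,
}


def execute(call: object) -> str:
    handler = HANDLERS.get(type(call))
    if handler is None:
        raise TypeError(f"no handler registered for {type(call).__name__}")
    return handler(call)
```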

Factor 5: Unify Execution State

Principle: Merge execution state (step, retries) with business state (messages, results).

Search Patterns:

# Look for state models
grep -r "ExecutionState\|WorkflowState\|Thread" --include="*.py"

# Look for dual state systems
grep -r "checkpoint\|MemorySaver" --include="*.py"
grep -r "sqlite\|database\|repository" --include="*.py"

# Look for state reconstruction
grep -r "load_state\|restore\|reconstruct" --include="*.py"

File Patterns: **/state/*.py, **/models/*.py, **/database/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Single serializable state object with all execution metadata |
| Partial | State exists but split across systems (memory + DB) |
| Weak | Execution state scattered, requires multiple queries to reconstruct |

Anti-patterns:

  • Retry count stored separately from task state
  • Error history in logs but not in state
  • LangGraph checkpoints + separate database storage
  • No unified event thread
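
As an illustration of a single serializable state object, the sketch below keeps the step counter, retry count, and event history in one dataclass that round-trips through JSON. The `Thread` name echoes the search pattern above; its fields are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Thread:
    task_id: str
    step: int = 0
    retries: int = 0
    # Errors and tool results live in the same list as messages.
    events: list[dict] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Thread":
        return cls(**json.loads(raw))
```

With everything in one object, reconstructing a run takes a single read instead of joining a checkpoint store against a separate database.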

Factor 6: Launch/Pause/Resume

Principle: Agents support simple APIs for launching, pausing at any point, and resuming.

Search Patterns:

# Look for REST endpoints
grep -r "@router.post\|@app.post" --include="*.py"
grep -r "start_workflow\|pause\|resume" --include="*.py"

# Look for interrupt mechanisms
grep -r "interrupt_before\|interrupt_after" --include="*.py"

# Look for webhook handlers
grep -r "webhook\|callback" --include="*.py"

File Patterns: **/routes/*.py, **/api/*.py, **/orchestrator/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | REST API + webhook resume, pause at any point including mid-tool |
| Partial | Launch/pause/resume exists but only at coarse-grained points |
| Weak | CLI-only launch, no pause/resume capability |

Anti-patterns:

  • Blocking input() or confirm() calls
  • No way to resume after process restart
  • Approval only at plan level, not per-tool
  • No webhook-based resume from external systems
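
A small sketch of pause and resume that survives a process restart: the run loop returns serialized state when it reaches a step that needs outside input, and `resume` continues from that saved string. The `STEPS` list and the function names are hypothetical; a REST or webhook layer would call `launch` and `resume` rather than a blocking `input()`.

```python
import json

STEPS = ["plan", "await_approval", "apply", "report"]  # hypothetical workflow


def run(state: dict) -> dict:
    """Advance until the workflow finishes or needs outside input."""
    state = dict(state)  # work on a copy so the caller's dict is untouched
    while state["step"] < len(STEPS):
        name = STEPS[state["step"]]
        if name == "await_approval" and not state.get("approved"):
            state["status"] = "paused"
            return state
        state["step"] += 1
    state["status"] = "done"
    return state


def launch(task_id: str) -> str:
    return json.dumps(run({"task_id": task_id, "step": 0}))


def resume(saved: str, approved: bool) -> str:
    state = json.loads(saved)
    state["approved"] = approved
    return json.dumps(run(state))
```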

Factor 7: Contact Humans with Tools

Principle: Human contact is a tool call with question, options, and urgency.

Search Patterns:

# Look for human input mechanisms
grep -r "typer.confirm\|input(\|prompt(" --include="*.py"
grep -r "request_human_input\|human_contact" --include="*.py"

# Look for approval patterns
grep -r "approval\|approve\|reject" --include="*.py"

# Look for structured question formats
grep -r "question.*options\|HumanInputRequest" --include="*.py"

File Patterns: **/agents/*.py, **/tools/*.py, **/orchestrator/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | request_human_input tool with question/options/urgency/format |
| Partial | Approval gates exist but hardcoded in graph structure |
| Weak | Blocking CLI prompts, no tool-based human contact |

Anti-patterns:

  • typer.confirm() in agent code
  • Human contact hardcoded at specific graph nodes
  • No way for agents to ask clarifying questions
  • Single response format (yes/no only)
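
The Strong level can be pictured as a tool that returns a structured request instead of blocking on the terminal. This is a sketch under assumptions: the field names and the two response formats are illustrative, and delivery of the request (Slack, email, UI) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class HumanInputRequest:
    question: str
    options: tuple[str, ...] = ()
    urgency: Literal["low", "medium", "high"] = "medium"
    response_format: Literal["free_text", "choice"] = "free_text"


def request_human_input(question: str, options: tuple[str, ...] = (),
                        urgency: str = "medium") -> HumanInputRequest:
    """Build the request; the caller decides how to deliver it."""
    if urgency not in ("low", "medium", "high"):
        raise ValueError(f"unknown urgency: {urgency!r}")
    response_format = "choice" if options else "free_text"
    return HumanInputRequest(question, tuple(options), urgency, response_format)


def accept_answer(request: HumanInputRequest, answer: str) -> str:
    """Validate a human reply before it re-enters the agent's context."""
    if request.response_format == "choice" and answer not in request.options:
        raise ValueError(f"answer must be one of {request.options}")
    return answer
```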

Factor 8: Own Your Control Flow

Principle: Custom control flow, not framework defaults. Full control over routing, retries, compaction.

Search Patterns:

# Look for routing logic
grep -r "add_conditional_edges\|route_\|should_continue" --include="*.py"

# Look for custom loops
grep -r "while True\|for.*in.*range" --include="*.py" | grep -v test

# Look for execution mode control
grep -r "execution_mode\|agentic\|structured" --include="*.py"

File Patterns: **/orchestrator/*.py, **/graph/*.py, **/core/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Custom routing functions, conditional edges, execution mode control |
| Partial | Framework control flow with some customization |
| Weak | Default framework loop with no custom routing |

Anti-patterns:

  • Single path through graph with no branching
  • No distinction between tool types (all treated same)
  • Framework-default error handling only
  • No rate limiting or resource management
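
A custom routing function might look like the sketch below, which distinguishes tool types and enforces a step ceiling rather than relying on a framework's default loop. The tool names, the route labels, and the `max_steps` default are assumptions for illustration.

```python
READ_ONLY = {"read_file", "search_code"}      # hypothetical tool names
NEEDS_APPROVAL = {"deploy", "delete_branch"}  # hypothetical high-stakes tools


def route(next_tool: str, step: int, max_steps: int = 10) -> str:
    """Decide what the orchestrator does with the LLM's chosen tool."""
    if step >= max_steps:
        return "escalate"             # resource limit reached
    if next_tool == "done":
        return "finish"
    if next_tool in NEEDS_APPROVAL:
        return "pause_for_approval"   # high-stakes tools wait for a human
    if next_tool in READ_ONLY:
        return "execute"              # safe tools run immediately
    return "reject"                   # unknown tools never run
```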

Factor 9: Compact Errors into Context

Principle: Errors in context enable self-healing. Track consecutive errors, escalate after threshold.

Search Patterns:

# Look for error handling
grep -r "except.*Exception\|error_history\|consecutive_errors" --include="*.py"

# Look for retry logic
grep -r "retry\|backoff\|max_attempts" --include="*.py"

# Look for escalation
grep -r "escalate\|human_escalation" --include="*.py"

File Patterns: **/agents/*.py, **/orchestrator/*.py, **/core/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Errors in context, retry with threshold, automatic escalation |
| Partial | Errors logged and returned, no automatic retry loop |
| Weak | Errors logged only, not fed back to LLM, task fails immediately |

Anti-patterns:

  • logger.error() without adding to context
  • No retry mechanism (fail immediately)
  • No consecutive error tracking
  • No escalation to humans after repeated failures
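
The retry-with-threshold behaviour can be sketched as a loop that appends each error to the context, counts consecutive failures, and escalates once the threshold is hit. `run_with_self_healing` and the `attempt` callable are hypothetical; in practice `attempt` would call the LLM with the updated context.

```python
from typing import Any, Callable


def run_with_self_healing(attempt: Callable[[list[str]], Any],
                          context: list[str],
                          max_consecutive_errors: int = 3) -> tuple[str, Any]:
    """Retry `attempt`, feeding each error back into the context."""
    consecutive = 0
    while consecutive < max_consecutive_errors:
        try:
            result = attempt(context)
        except Exception as exc:
            consecutive += 1
            # The next attempt sees what went wrong, not just a log line.
            context.append(f"error: {type(exc).__name__}: {exc}")
            continue
        return "ok", result
    return "escalate", None
```

A success returns immediately, so the counter only ever reflects an unbroken run of failures.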

Factor 10: Small, Focused Agents

Principle: Each agent has narrow responsibility, 3-10 steps max.

Search Patterns:

# Look for agent classes
grep -r "class.*Agent\|class.*Architect\|class.*Developer" --include="*.py"

# Look for step definitions
grep -r "steps\|tasks" --include="*.py" | head -20

# Count methods per agent
grep -r "async def\|def " agents/*.py 2>/dev/null | wc -l

File Patterns: **/agents/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | 3+ specialized agents, each with single responsibility, step limits |
| Partial | Multiple agents but some have broad scope |
| Weak | Single "god" agent that handles everything |

Anti-patterns:

  • Single agent with 20+ tools
  • Agent with unbounded step count
  • Mixed responsibilities (planning + execution + review)
  • No step or time limits on agent execution

Factor 11: Trigger from Anywhere

Principle: Workflows triggerable from CLI, REST, WebSocket, Slack, webhooks, etc.

Search Patterns:

# Look for entry points
grep -r "@cli.command\|@router.post\|@app.post" --include="*.py"

# Look for WebSocket support
grep -r "WebSocket\|websocket" --include="*.py"

# Look for external integrations
grep -r "slack\|discord\|webhook" --include="*.py" -i

File Patterns: **/routes/*.py, **/cli/*.py, **/main.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | CLI + REST + WebSocket + webhooks + chat integrations |
| Partial | CLI + REST API available |
| Weak | CLI only, no programmatic access |

Anti-patterns:

  • Only if __name__ == "__main__" entry point
  • No REST API for external systems
  • No event streaming for real-time updates
  • Trigger logic tightly coupled to execution
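
Decoupling trigger logic from execution can be as simple as one entry point with thin adapters per channel. In this sketch, `start_workflow`, `from_cli`, and `from_webhook` are hypothetical names, and the returned dict stands in for whatever the real orchestrator enqueues.

```python
import json


def start_workflow(task: str, source: str) -> dict:
    """Single entry point; every trigger adapter funnels into this."""
    return {"task": task, "source": source, "status": "queued"}


def from_cli(argv: list[str]) -> dict:
    # Adapter: command-line arguments -> workflow request.
    return start_workflow(task=" ".join(argv), source="cli")


def from_webhook(body: bytes) -> dict:
    # Adapter: JSON webhook payload -> the same workflow request.
    payload = json.loads(body)
    return start_workflow(task=payload["task"], source="webhook")
```

Adding a Slack or WebSocket trigger then means writing another adapter, with no change to `start_workflow`.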

Factor 12: Stateless Reducer

Principle: Agents as pure functions: (state, input) -> (state, output). No side effects in agent logic.

Search Patterns:

# Look for state mutation patterns
grep -r "\.status = \|\.field = " --include="*.py"

# Look for immutable updates
grep -r "model_copy\|\.copy(\|with_" --include="*.py"

# Look for side effects in agents
grep -r "write_file\|subprocess\|requests\." agents/*.py 2>/dev/null

File Patterns: **/agents/*.py, **/nodes/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Immutable state updates, side effects isolated to tools/handlers |
| Partial | Mostly immutable, some in-place mutations |
| Weak | State mutated in place, side effects mixed with agent logic |

Anti-patterns:

  • state.field = new_value (mutation)
  • File writes inside agent methods
  • HTTP calls inside agent decision logic
  • Shared mutable state between agents
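
The `(state, input) -> (state, output)` shape can be sketched with a frozen dataclass and `dataclasses.replace`, the standard-library counterpart of Pydantic's `model_copy`. The `State` fields and the `agent_step` name are assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    step: int = 0
    log: tuple[str, ...] = ()


def agent_step(state: State, event: str) -> tuple[State, str]:
    """Pure reducer: returns a new State and never touches the old one."""
    new_state = replace(state, step=state.step + 1, log=state.log + (event,))
    return new_state, f"handled {event}"
```

Because `State` is frozen, an accidental `state.step = ...` raises `FrozenInstanceError` instead of silently mutating shared state.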

Factor 13: Pre-fetch Context

Principle: Fetch likely-needed data upfront rather than mid-workflow.

Search Patterns:

# Look for context pre-fetching
grep -r "pre_fetch\|prefetch\|fetch_context" --include="*.py"

# Look for RAG/embedding systems
grep -r "embedding\|vector\|semantic_search" --include="*.py"

# Look for related file discovery
grep -r "related_tests\|similar_\|find_relevant" --include="*.py"

File Patterns: **/context/*.py, **/retrieval/*.py, **/rag/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Automatic pre-fetch of related tests, files, docs before planning |
| Partial | Manual context passing, design doc support |
| Weak | No pre-fetching, LLM must request all context via tools |

Anti-patterns:

  • Architect starts with issue only, no codebase context
  • No semantic search for similar past work
  • Related tests/files discovered only during execution
  • No RAG or document retrieval system
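
As a small example of pre-fetching, the helper below finds test files that correspond to changed modules so they can be placed in context before planning starts. It assumes a `test_<module>.py` naming convention, which may not hold in every codebase; `related_tests` is a hypothetical name.

```python
from pathlib import PurePosixPath


def related_tests(changed: list[str], repo_files: list[str]) -> list[str]:
    """Return test files named test_<module>.py for each changed module."""
    wanted = {f"test_{PurePosixPath(p).stem}.py" for p in changed}
    return sorted(f for f in repo_files if PurePosixPath(f).name in wanted)
```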

Output Format

Gate order: Do not assign Strong / Partial / Weak or treat recommendations as observed facts until Hard gates (after Analysis Workflow) are satisfied for the factors in scope.

Executive Summary Table

| Factor | Status | Notes |
|--------|--------|-------|
| 1. Natural Language -> Tool Calls | **Strong/Partial/Weak** | [Key finding] |
| 2. Own Your Prompts | **Strong/Partial/Weak** | [Key finding] |
| ... | ... | ... |
| 13. Pre-fetch Context | **Strong/Partial/Weak** | [Key finding] |

**Overall**: X Strong, Y Partial, Z Weak

Per-Factor Analysis

For each factor, provide:

  1. Current Implementation
    • Evidence with file:line references
    • Code snippets showing patterns
  2. Compliance Level
    • Strong/Partial/Weak with justification
  3. Gaps
    • What's missing vs. 12-Factor ideal
  4. Recommendations
    • Actionable improvements with code examples

Analysis Workflow

  1. Initial Scan
    • Run search patterns for all factors
    • Identify key files for each factor
    • Note any existing compliance documentation
  2. Deep Dive (per factor)
    • Read identified files
    • Evaluate against compliance criteria
    • Document evidence with file paths
  3. Gap Analysis
    • Compare current vs. 12-Factor ideal
    • Identify anti-patterns present
    • Prioritize by impact
  4. Recommendations
    • Provide actionable improvements
    • Include before/after code examples
    • Reference roadmap if exists
  5. Summary
    • Compile executive summary table
    • Highlight strengths and critical gaps
    • Suggest priority order for improvements

Hard gates (evidence before scores)

Run these in order. Do not skip ahead: each Pass is an objective condition you can check (paths on disk, citations present), not internal certainty.

  1. Scan gate — After the initial scan (workflow step 1), Pass: for every factor (1–13) you have either (a) ≥1 repo-relative path or glob hit to inspect, or (b) a one-line note with rationale (e.g. search command/output, or “no matches — codebase may omit this concern”). Empty hand-waving (“looks fine”) fails this gate.
  2. Evidence gate (per factor) — Before writing Strong / Partial / Weak for that factor, Pass: “Current Implementation” includes ≥1 citation with file path plus line range or short quoted snippet from codebase_path, or an explicit no evidence located statement after targeted reads. If evidence is missing after search, default that factor to Weak unless the criterion is clearly N/A (say why).
  3. Synthesis gate — Executive summary table and per-factor analysis sections, Pass: only after gates 1–2 are satisfied for the factors in scope. Recommendations may name new files or patterns only as proposals; they must not be presented as observed facts without matching citations from step 2.
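
The evidence gate and the summary tally can be expressed as a short sketch. It is narrower than the gate text: it only recognizes citations of the form `path:line` or `path:start-end`, not quoted snippets, and the `gated_score` and `overall` names are hypothetical.

```python
import re
from collections import Counter

# Assumed citation shape, e.g. "src/agent.py:10-42".
CITATION = re.compile(r"^[^\s:]+:\d+(-\d+)?$")


def gated_score(proposed: str, citations: list[str], na_reason: str = "") -> str:
    """Apply the evidence gate before a score is written down."""
    if na_reason:
        return "N/A"   # criterion clearly not applicable; reason recorded
    if not any(CITATION.match(c) for c in citations):
        return "Weak"  # no evidence located: default to Weak
    return proposed


def overall(scores: dict[str, str]) -> str:
    """Build the Overall line of the executive summary."""
    tally = Counter(scores.values())
    return f"{tally['Strong']} Strong, {tally['Partial']} Partial, {tally['Weak']} Weak"
```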

Quick Reference: Compliance Scoring

| Score | Meaning | Action |
|-------|---------|--------|
| Strong | Fully implements principle | Maintain, minor optimizations |
| Partial | Some implementation, significant gaps | Planned improvements |
| Weak | Minimal or no implementation | High priority for roadmap |

When to Use This Skill

  • Evaluating new LLM-powered systems
  • Reviewing agent architecture decisions
  • Auditing production agentic applications
  • Planning improvements to existing agents
  • Comparing frameworks or implementations

Version History

2 versions in total

  • v1.0.2 (current)
    2026-05-03 06:06, security scan: safe
  • v1.0.0
    2026-03-31 01:39, security scan: safe

