
Guardian

Mandatory safety gatekeeper for AI agents performing destructive operations. Intercepts file deletion (rm/del/remove), database modifications (writes/deletes...
tooled-app
Uncategorized clawhub v1.2.0 3 versions 99900.9 Key: not required

Overview

Guardian — Mandatory Safety Gatekeeper (v1.1)

> "The agent knew it was wrong. The knowledge didn't matter." — PocketOS log, 2026

A mandatory safety skill that intercepts destructive AI agent operations before execution. It employs a Context-Aware Risk Scoring (CARS) system to balance security with operational velocity.

This skill is mandatory. No opt-out. No override by the executing agent.

Based on the principle that reasoning is not a guardrail.

The Core Protocol (v1.1)

BEFORE any tool call:
  1. SCAN operation against DESTRUCTIVE taxonomy
  2. IF destructive → ENTER Guardian Protocol
  3. EVALUATE Risk Level via CARS Matrix
  4. EXECUTE Decision Path:
     - LOW: Auto-Approve (Log only)
     - MEDIUM: Fast-Track (Verify Backup → Proceed)
     - HIGH: Hard Block (Verify Backup → Human Approval)
  5. IF JIT Window Active → HIGH still requires Human Approval (a JIT window never waives it)

Context-Aware Risk Scoring (CARS) Matrix

| Risk Level | Trigger Criteria | Action | Verification Required |
|:---|:---|:---|:---|
| Low | Files in `/tmp`, `sandbox/`, or `.cache`; single file deletions in non-critical paths. | Auto-Approve | None (Log only) |
| Medium | Edits to `.config` or `.env` files; deletions of < 5 files in a Git-tracked directory. | Fast-Track | Verified backup required (Git, snapshot, or cloud sync) |
| High | `rm -rf` on root/home; `DROP TABLE`; edits to system files; mass file deletions (>10). | Hard Block | Mandatory backup verification + Human Approval required regardless of backup status |
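The matrix can be read as an ordered set of checks, most severe first. The following is a minimal Python sketch of that reading, not the skill's actual implementation: the `Operation` type, the `SYSTEM_ROOTS` list, and the `classify` function are hypothetical names introduced for illustration. The matrix does not say how to treat cases it leaves open (for example, 5 to 10 files), so the sketch assumes they fail closed to High.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Operation:
    command: str                # raw shell or SQL text
    targets: tuple = ()         # affected paths
    git_tracked: bool = False   # targets sit in a Git-tracked directory


SCRATCH = {"tmp", "sandbox", ".cache"}                 # Low-risk locations from the matrix
CONFIG = {".config", ".env"}                           # Medium-risk names from the matrix
SYSTEM_ROOTS = ("/etc/", "/usr/", "/bin/", "/boot/")   # assumed meaning of "system files"


def _parts(path: str) -> set:
    """Return the path components as a set, e.g. {'/', 'tmp', 'a.log'}."""
    return set(PurePosixPath(path).parts)


def classify(op: Operation) -> Risk:
    # High-risk triggers are checked first so they can never be masked.
    if "drop table" in op.command.lower() or len(op.targets) > 10:
        return Risk.HIGH
    if any(t in ("/", "/home") or t.startswith(SYSTEM_ROOTS) for t in op.targets):
        return Risk.HIGH
    # Low: every target lives in a scratch location.
    if op.targets and all(_parts(t) & SCRATCH for t in op.targets):
        return Risk.LOW
    # Medium: configuration files, or a small deletion inside a Git work tree.
    if any(_parts(t) & CONFIG for t in op.targets):
        return Risk.MEDIUM
    if op.git_tracked and len(op.targets) < 5:
        return Risk.MEDIUM
    # Low: a single file in a path that matched none of the rules above.
    if len(op.targets) == 1:
        return Risk.LOW
    return Risk.HIGH  # assumption: cases the matrix leaves open fail closed
```

Checking the High triggers first is a design choice: a `DROP TABLE` issued from inside `sandbox/` is still classified High.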

Escalation Rules

| Scenario | Action |
|:---|:---|
| ANY destructive operation | Backup verification required |
| Low risk + verified backup | PROCEED |
| Low risk + no backup | PROCEED with warning |
| Medium risk + verified backup | PROCEED |
| Medium risk + no backup | HALT + Human approval required |
| High risk | ALWAYS HALT + Human approval required |
| Repeated same pattern | Flag pattern, require operator review |
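The risk and backup rows of this table reduce to a small pure function. The sketch below follows the table row by row; the function name `decide` and the string return values are assumptions made for illustration, and the repeated-pattern row is left out because it depends on history rather than on a single operation.

```python
def decide(risk: str, backup_verified: bool) -> str:
    """Map (risk level, backup status) to an action per the Escalation Rules."""
    risk = risk.lower()
    if risk not in {"low", "medium", "high"}:
        raise ValueError(f"unknown risk level: {risk!r}")
    if risk == "high":
        return "HALT"                   # always needs human approval
    if backup_verified:
        return "PROCEED"                # low or medium with a verified backup
    if risk == "low":
        return "PROCEED_WITH_WARNING"   # low risk, no backup
    return "HALT"                       # medium risk, no backup
```

For example, `decide("medium", backup_verified=False)` returns `"HALT"`, while `decide("low", backup_verified=False)` returns `"PROCEED_WITH_WARNING"`.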

JIT Window Override

A JIT (Just-In-Time) window can temporarily downgrade High risk to Medium, but it never removes the human approval requirement. High-risk destructive operations always require human approval.

The Guardian Protocol Detail

Step 1: Operation Scan (automatic)

Every tool call is scanned against the destructive operation taxonomy (see references/OPERATION-TAXONOMY.md). No agent discretion. No "I know what I'm doing."
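As a rough illustration of such a scan, the Python sketch below matches a command string against a tiny pattern table. The `TAXONOMY` dictionary and its two categories are hypothetical stand-ins based only on the operations named in the skill description (file deletion and database modification); the full classification lives in the referenced taxonomy file and is not reproduced here. Matching on command text is also easy to evade, so treat this as a sketch of the control flow only.

```python
import re
from typing import Optional

# Hypothetical, deliberately small stand-in for the full taxonomy.
TAXONOMY = {
    "file-deletion": re.compile(
        r"(^|[\s;&|])(rm|del|rmdir|unlink)\s", re.IGNORECASE
    ),
    "database-modification": re.compile(
        r"\b(drop\s+table|delete\s+from|truncate\s+table|update\s+\w+\s+set)\b",
        re.IGNORECASE,
    ),
}


def scan(command: str) -> Optional[str]:
    """Return the first matching taxonomy category, or None if nothing matches."""
    for category, pattern in TAXONOMY.items():
        if pattern.search(command):
            return category
    return None
```

A result of `None` means the call passes through untouched; any category name means the Guardian Protocol is entered.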

Step 2: Backup Verification (automatic)

VERIFY-BACKUP(target):
  1. Check if target is covered by active backup system
  2. Common indicators:
     - .git repository with clean status
     - Time Machine / File History active on target volume
     - Cloud sync (OneDrive, Dropbox, Google Drive, iCloud) with recent sync
     - Explicit backup tool (restic, duplicity, rsnapshot) with recent snapshot
     - Versioned storage (ZFS snapshots, S3 versioning)
  3. IF any indicator active AND recent → RETURN VERIFIED
  4. ELSE → RETURN UNVERIFIED

Fast path: Backup verification must complete in <2 seconds. No long-running checks.
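The procedure above can be sketched in Python as a list of fast indicator checks, each bounded by the 2-second limit. This is a sketch under stated assumptions, not the contents of the referenced scripts: `git_clean` and `verify_backup` are hypothetical names, only the Git indicator is shown, and the "recent" condition is not checked. A clean work tree also says nothing about whether commits were pushed anywhere.

```python
import subprocess
from pathlib import Path

TIMEOUT_SECONDS = 2  # from the fast-path rule: verification must finish in <2 seconds


def git_clean(directory: Path) -> bool:
    """True if `directory` is inside a Git work tree with no uncommitted changes."""
    try:
        result = subprocess.run(
            ["git", "-C", str(directory), "status", "--porcelain"],
            capture_output=True,
            text=True,
            timeout=TIMEOUT_SECONDS,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # a slow or missing git counts as "no indicator"
    return result.returncode == 0 and result.stdout.strip() == ""


def verify_backup(target: Path, indicators=(git_clean,)) -> str:
    """Return VERIFIED if any indicator reports coverage, else UNVERIFIED."""
    directory = target if target.is_dir() else target.parent
    for check in indicators:
        if check(directory):
            return "VERIFIED"
    return "UNVERIFIED"
```

Passing the indicators as a parameter keeps each check independent, so a Time Machine or restic probe could be added without touching the loop.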

Step 3: Decision Matrix (v1.1)

| Backup Status | Risk Level | Action |
|:---|:---|:---|
| VERIFIED ACTIVE | Low / Medium | PROCEED with execution |
| VERIFIED ACTIVE | High | HALT and ESCALATE to human |
| UNVERIFIED | Any | HALT and ESCALATE to human |
| UNKNOWN | Any | Treat as UNVERIFIED: HALT and ESCALATE |

Sidenote: An active JIT Window does not change this matrix for High Risk operations. They still HALT and ESCALATE to a human, even when the backup is verified.

Step 4: Escalation Format

When escalation is required, Guardian MUST output:

🛡️ GUARDIAN HALT
Operation: [specific tool call]
Target: [file/path/database/endpoint]
Category: [taxonomy category]
Risk Level: [CRITICAL/HIGH/MEDIUM]
Backup Status: [UNVERIFIED / last backup: X hours ago]

Proposed Action: [what the agent wants to do]
Potential Impact: [what could go wrong]

Options:
1. APPROVE — Proceed with execution (human responsibility)
2. DENY — Cancel operation
3. SNAPSHOT — Create quick backup first, then proceed
4. REVIEW — Agent provides additional justification

Guardian awaits human decision.
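One way to keep this output consistent is to render it from a small record type. The Python sketch below reproduces the template field for field; the `HaltNotice` class and its attribute names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaltNotice:
    operation: str
    target: str
    category: str
    risk_level: str
    backup_status: str
    proposed_action: str
    potential_impact: str

    def render(self) -> str:
        """Build the escalation message in the layout Guardian must output."""
        return "\n".join([
            "🛡️ GUARDIAN HALT",
            f"Operation: {self.operation}",
            f"Target: {self.target}",
            f"Category: {self.category}",
            f"Risk Level: {self.risk_level}",
            f"Backup Status: {self.backup_status}",
            "",
            f"Proposed Action: {self.proposed_action}",
            f"Potential Impact: {self.potential_impact}",
            "",
            "Options:",
            "1. APPROVE — Proceed with execution (human responsibility)",
            "2. DENY — Cancel operation",
            "3. SNAPSHOT — Create quick backup first, then proceed",
            "4. REVIEW — Agent provides additional justification",
            "",
            "Guardian awaits human decision.",
        ])
```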

Mandatory Rules

  1. No Self-Approval: The executing agent cannot approve its own destructive operation.
  2. No Confidence Override: High confidence does not bypass backup verification.
  3. No Silent Destruction: Every destructive operation is logged.
  4. No Assumption of Safety: "It looks safe" is not verification. Backup status is verification.
  5. No Escalation Fatigue: If an agent generates repeated escalations for the same pattern, Guardian flags the pattern, not just the instance.
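Rules 1 and 5 lend themselves to small mechanical checks. The Python sketch below shows one possible shape, assuming agents and approvers are identified by plain strings. `is_valid_approver`, `EscalationTracker`, and the threshold of 3 are all hypothetical: the rules do not say how many repeats count as a pattern.

```python
from collections import Counter

FLAG_THRESHOLD = 3  # hypothetical value; the rules do not define a repeat count


def is_valid_approver(requesting_agent: str, approver: str) -> bool:
    """Rule 1: an approval counts only if it comes from someone else."""
    return bool(approver) and approver != requesting_agent


class EscalationTracker:
    """Rule 5: count escalations per (agent, pattern) and flag repeats."""

    def __init__(self, threshold: int = FLAG_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts = Counter()

    def record(self, agent: str, pattern: str) -> bool:
        """Record one escalation; return True once the pattern should be flagged."""
        key = (agent, pattern)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold
```

With the default threshold, the third escalation of the same pattern by the same agent returns `True`, which is the point at which operator review would be requested.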

Integration

For OpenClaw / Agent Systems

Guardian operates at the tool-call layer, between the agent's decision and the tool's execution:

Agent Decision → Guardian Intercept → [Verify Backup] → Execute OR Escalate

For Standalone Agents

If the runtime doesn't support interception, Guardian operates as a mandatory pre-flight check:

BEFORE calling any tool:
  1. Agent MUST call Guardian check
  2. Guardian returns PROCEED or HALT
  3. Agent respects HALT, awaits escalation resolution
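In a Python agent, this pre-flight pattern could be approximated with a decorator that runs the check before every tool function. The sketch below is illustrative only: `guarded`, `GuardianHalt`, and `demo_check` are hypothetical names, and the check function stands in for whatever Guardian call the runtime provides. A decorator only covers tools that are actually wrapped, so it does not by itself enforce the mandatory property.

```python
import functools
from typing import Callable


class GuardianHalt(Exception):
    """Raised when the Guardian check does not return PROCEED."""


def guarded(check: Callable[[str, tuple, dict], str]):
    """Wrap a tool so that `check` runs before every call."""
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            verdict = check(tool.__name__, args, kwargs)
            if verdict != "PROCEED":  # anything other than PROCEED fails closed
                raise GuardianHalt(f"{tool.__name__} halted pending escalation")
            return tool(*args, **kwargs)
        return wrapper
    return decorator


def demo_check(name: str, args: tuple, kwargs: dict) -> str:
    # Hypothetical check: halt every call to a tool named delete_file.
    return "HALT" if name == "delete_file" else "PROCEED"


@guarded(demo_check)
def read_file(path: str) -> str:
    return f"read {path}"


@guarded(demo_check)
def delete_file(path: str) -> str:
    return f"deleted {path}"
```

Here `read_file("a.txt")` runs normally, while `delete_file("a.txt")` raises `GuardianHalt` before the tool body executes.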

Logging

Every Guardian decision is logged:

[Timestamp] [Operation] [Category] [Backup Status] [Decision] [Approver]

Logs are append-only. No deletion by the executing agent.

Sidenote: All operations within a JIT window are tagged with [JIT-GRANTED] in the audit log.
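A log line in this format can be produced and appended with a few lines of Python. In the sketch below, `format_entry` and `append_entry` are hypothetical helpers, the timestamp is ISO 8601 in UTC by assumption, and placing `[JIT-GRANTED]` at the end of the line is also an assumption, since the position of the tag is not specified. Opening a file in append mode only adds to the end; it does not by itself prevent another process from rewriting the log.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def format_entry(
    operation: str,
    category: str,
    backup_status: str,
    decision: str,
    approver: str,
    jit_granted: bool = False,
    now: Optional[datetime] = None,
) -> str:
    """Build one log line: [Timestamp] [Operation] ... [Approver]."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    fields = [stamp, operation, category, backup_status, decision, approver]
    if jit_granted:
        fields.append("JIT-GRANTED")  # assumed position: last field on the line
    return " ".join(f"[{field}]" for field in fields)


def append_entry(log_path: Path, entry: str) -> None:
    """Append one line to the log without touching earlier entries."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
```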

Scope

Vanilla: This skill is generic. Not specific to any agent, platform, or deployment.

Mandatory: Once enabled, all sessions load this skill. No opt-out.

Non-Blocking (when safe): Low- and Medium-risk operations with a verified backup proceed without delay. Routine maintenance with verified backups does not wait on a human.

References

  • references/OPERATION-TAXONOMY.md — Full destructive operation classification
  • references/DECISION-MATRIX.md — Detailed backup verification logic and escalation rules
  • scripts/verify-backup.ps1 — Windows backup detection script
  • scripts/verify-backup.sh — Linux/macOS backup detection script

Based On

  • AgentTrust (May 2026): Runtime safety evaluation and interception for AI agent tool use
  • Proof-of-Guardrail (Mar 2026): Cryptographic verification of guardrail claims
  • AgentDoG (Jan 2026): Diagnostic guardrail framework for AI agent safety and security
  • Confirm-Before-Destroy Pattern: Tool-level guardrails + prompt-level safeguards
  • Gemini CLI PR #25947: Versioned pre-write backups with agent-driven restore

Version history

3 versions in total

  • v1.2.0 (current)
    2026-05-26 17:27 Safe Safe
  • v1.1.0
    2026-05-23 16:13 Safe Safe
  • v1.0.0
    2026-05-20 05:23 Safe Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
