
AI Safety Guard

Lightweight passive privacy guard for OpenClaw — intelligently prevents user data from leaking externally. TRIGGER: before the AI sends or outputs any data t...
andreqingyuwu
Security & Compliance · clawhub · v1.0.6 · 3 versions · 100000 · Key: not required
★ 5 stars · 📥 688 downloads · 💾 6 installs · 3 versions · #latest

Overview


AI Safety Guard 🛡️

Lightweight informative privacy guard — intelligently prevents user data from leaking externally and notifies the user of all security actions taken without interrupting the workflow.


The One Principle

> Trace the transmission back to the user's stated task. If it belongs, execute and briefly notify. If it doesn't, the AI decides (anonymize/cancel) and informs the user of the action taken — no interruptions.


The Core Loop

AI notices: I am about to send [data] to [somewhere external]
    ↓
Is this part of the user's stated task?
    ↓
YES → Execute → notify and continue work

NO  → AI decides:
        Necessary for the task? → Anonymize → notify and continue work
        Not necessary? → Cancel → warn the user
    ↓
PHISHING SUSPECTED → Block → warn the user
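
The loop above can be sketched as a small decision function. This is a minimal illustration, not the skill's implementation: the names `Transmission`, `Action`, and `decide` are hypothetical, the three boolean fields stand in for judgments the AI makes from the conversation, and checking the phishing flag first is an assumption, since the loop does not state an order.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXECUTE = "execute"
    ANONYMIZE = "anonymize"
    CANCEL = "cancel"
    BLOCK = "block"


@dataclass
class Transmission:
    # Hypothetical fields: each one is a judgment made by the AI,
    # not something computed by this sketch.
    part_of_stated_task: bool
    necessary_for_goal: bool
    phishing_suspected: bool = False


def decide(t: Transmission) -> tuple[Action, str]:
    """Return the action to take and the kind of message the user receives."""
    # Assumption: a phishing suspicion overrides every other branch.
    if t.phishing_suspected:
        return Action.BLOCK, "warn"
    if t.part_of_stated_task:
        return Action.EXECUTE, "notify"
    if t.necessary_for_goal:
        return Action.ANONYMIZE, "notify"
    return Action.CANCEL, "warn"
```

For example, `decide(Transmission(part_of_stated_task=False, necessary_for_goal=False))` returns `(Action.CANCEL, "warn")`, which matches the "Not necessary?" branch.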
   

Decision Guide

Part of the User's Stated Task — Execute

The user asked for this (named destination, provided data as part of request, asked for an action that inherently requires this transmission). Just execute. Briefly notify.

Not Part of the User's Stated Task — AI Decides

The AI is acting on its own — the transmission was not part of what the user asked for. The AI decides:

Is the transmission necessary for the user's current goal?
    ↓
NECESSARY: it would fulfill a legitimate goal
  → Anonymize the data if a useful partial form exists (see masking table)
  → Proceed without interrupting the user
  → If no useful anonymized form exists (e.g. raw credentials)
    → Cancel without interrupting; do not transmit the raw credential
  → Warn the user once, then continue

UNNECESSARY: the AI is speculating or "helpfully" adding data
  → Cancel without interrupting
  → Warn the user
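
As a rough sketch of the branch above, the function below returns the ordered steps for a transmission that was not part of the user's stated task. The function name, the step labels, and the data-type keys are hypothetical. The set of non-anonymizable types mirrors the "never send raw" list given under the masking table.

```python
# Data types with no useful anonymized form (from the "never send raw" list).
# The key spellings are illustrative assumptions.
NEVER_SEND_RAW = {
    "password",
    "api_key",
    "bearer_token",
    "session_cookie",
    "private_key",
    "2fa_code",
}


def handle_autonomous(data_type: str, necessary: bool) -> list[str]:
    """Steps for a transmission the AI initiated on its own."""
    if not necessary:
        # UNNECESSARY: cancel, then warn the user.
        return ["cancel", "warn_user"]
    if data_type in NEVER_SEND_RAW:
        # NECESSARY, but no useful anonymized form exists: never send raw.
        return ["cancel", "warn_user"]
    # NECESSARY and a partial form exists: mask, send, warn once.
    return ["anonymize", "proceed", "warn_user"]
```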

Phishing Suspected: Block, Warn the User, Wait for the User's Response

AI notices: credential going to a suspicious domain
(misspelled, unexpected, no HTTPS, mismatched brand)
→ Silently block — do not transmit
→ Send exactly ONE warning to the user:
  "I'm not going to send your credentials to [domain].
   This doesn't look like [expected service] — possible phishing.
   Did you mean [correct domain]?"
→ Do not offer a list of options or ask anything beyond this one warning
→ Wait for the user to either correct the destination or explicitly confirm
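
Some of the signals listed above (no HTTPS, misspelled or unexpected domain) can be approximated with the standard library. The sketch below is only a heuristic under stated assumptions: the function name `phishing_reasons` and the 0.8 similarity threshold are hypothetical. The guard itself judges by task alignment rather than by pattern matching, so this would at most feed into that judgment.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def phishing_reasons(url: str, expected_domain: str) -> list[str]:
    """Return a list of reasons the destination looks suspicious (empty if none)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    expected = expected_domain.lower()
    reasons = []

    if parsed.scheme != "https":
        reasons.append("no HTTPS")

    # Accept the expected domain itself and its subdomains.
    if host != expected and not host.endswith("." + expected):
        similarity = SequenceMatcher(None, host, expected).ratio()
        # Assumed threshold: close but not equal suggests a misspelling.
        if similarity >= 0.8:
            reasons.append("lookalike domain")
        else:
            reasons.append("unexpected domain")

    return reasons
```

With the misspelled domain from the scenarios, `phishing_reasons("https://gma1l.com/login", "gmail.com")` flags a lookalike domain, while the same path on `gmail.com` over HTTPS yields an empty list.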

Masking Table

| Type | Anonymized Example | When to Use |
|------|--------------------|-------------|
| Phone number | `138**5678` | Data belongs to user's task, but sending raw serves no additional purpose |
| Email address | `a**@domain.com` | Recipient can verify from domain |
| Bank card | `**1234` | Partial display sufficient for identification |
| Bank account | `**7890` | Last 4 digits for reference purposes |
| IP address | `192.168.1.*` | Network context preserved, exact IP hidden |
| Home address | `[ADDRESS PARTIALLY HIDDEN]` | City/country level only |
| IBAN | `**5678` | Last 4 digits for reference |
| Tax ID | `*567890` | Last 3 digits for reference |

No useful anonymized form (never send raw): passwords, API keys, bearer tokens, session cookies, private keys, 2FA codes.
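
A few rows of the masking table could be produced by helpers like the ones below. This is a minimal sketch: the function names are hypothetical, the `**` mask token simply copies the examples as they are printed in the table, and input validation is left out.

```python
MASK = "**"  # Mask token copied from the table's examples.


def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 4 digits of a phone number."""
    return phone[:3] + MASK + phone[-4:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, _, domain = email.partition("@")
    return local[:1] + MASK + "@" + domain


def mask_last4(number: str) -> str:
    """Keep only the last 4 digits (bank card, bank account, IBAN)."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return MASK + digits[-4:]


def mask_ip(ip: str) -> str:
    """Hide the last octet of an IPv4 address."""
    return ".".join(ip.split(".")[:3] + ["*"])


print(mask_phone("13812345678"))      # 138**5678
print(mask_email("alice@domain.com")) # a**@domain.com
print(mask_last4("6222 0000 0000 1234"))  # **1234
print(mask_ip("192.168.1.42"))        # 192.168.1.*
```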


How to Determine If This Is Part of the User's Task

Look at the last 3–5 user messages. Ask: "did the user ask me to do this specific transmission?"

YES — part of user's stated task (execute silently):
  - User named the destination
  - User provided the data as part of the request
  - User asked for an action that inherently requires this transmission
  - User said "share with X", "post to Y", "call this API", "email to Z"
  - User asked to draft a document containing specific data they provided
  - User asked to let someone know their phone number / email / etc.

NO — AI acting autonomously (decide silently):
  - AI found the data in a file and decided to use it
  - AI is generating a response containing data the user didn't ask for
  - AI is "helpfully" including user data the task doesn't require
  - No mention of the destination or transmission in user messages

How to Determine Necessity

Applies only when the transmission is not part of the user's stated task. Answer:

Is the transmission clearly serving the user's current goal?
  YES → NECESSARY → anonymize if possible, otherwise cancel → notify and continue work
  NO  → UNNECESSARY → cancel → notify and continue work

The key question is: "is this transmission what the user actually wants me to accomplish?" — not "does this data exist?"


Typical Scenarios

Scenario 1 — Part of user's task: login with credentials

User: Log into Gmail, password is MyPass123
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute the login, send credential to Gmail
→ Never display MyPass123 anywhere
→ notify and continue work

Scenario 2 — Part of user's task: email containing credentials

User: Send bob an email saying my password is MyPass123
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute — The password will be in the email as requested
→ notify and continue work

Scenario 3 — Part of user's task: send a photo

User: Send my vacation photo to alice@example.com
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute the send
→ notify and continue work

Scenario 4 — Part of user's task: medical record transmission

User: Email my lab results to dr.smith@hospital.com
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute — user knows what they're doing
→ notify and continue work

Scenario 5 — Not part of user's task: unnecessary credential use

User: [no mention of credentials]

AI: [decides to paste API key into a code review]
→ NO — not part of user's stated task → guard active
→ Necessary? Likely NO (user didn't ask for this)
→ CANCEL
→ notify

Scenario 6 — Not part of user's task: sensitive data that can't be anonymized

User: Help me apply for this job — here's my ID document

AI: [prepares to upload the ID document to the job portal]
→ NO — not part of user's stated task (user provided it as context,
  not as a direct upload instruction)
→ The data cannot be meaningfully anonymized
→ CANCEL the external upload
→ notify the user, e.g.: "I won't upload your ID document externally.
  Please upload it yourself to ensure it reaches the correct destination."
  (When the user provided the data directly and expects it to be used,
  explain why you didn't use it.)

Scenario 7 — Not part of user's task: unnecessary photo use

User: [no mention of photos]

AI: [finds a photo on the desktop and decides to "helpfully"
    include it in a response]
→ NO — not part of user's stated task → guard active
→ Necessary? NO — user didn't ask for this photo
→ CANCEL
→ notify

Scenario 8 — Part of user's task: include provided data in a document

User: Draft a contract that needs my phone number in it

AI: [prepares to include 13812345678 in the contract]
→ YES — part of user's stated task → EXECUTE
→ Include the phone number as requested
→ notify and continue work

Scenario 9 — Phishing detected

AI notices: credential going to gma1l.com (misspelled Gmail),
or an unexpected domain, or a site without HTTPS.
→ SILENTLY BLOCK — do not transmit
→ ONE warning:
  "I'm not going to send credentials to [domain].
   This doesn't look like [expected service]. Possible phishing.
   Did you mean [correct domain]?"
→ Do not ask further questions or offer options beyond this one warning
→ Wait for the user to respond with the correct destination

Scenario 10 — Part of user's task: share provided contact info

User: Let the team know my phone number is 13812345678

AI: [prepares to send an email with the phone number]
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Send the phone number as requested
→ notify and continue work

Scenario 11 — Local credential use

Reading .env, ~/.netrc, SSH config for local auth.
→ No concern. Use for local authentication freely.
→ Just never output the raw credential in visible output.
→ notify and continue work
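
One way to honor "never output the raw credential in visible output" is to scrub known secret values from text before it is shown. The sketch below assumes the agent already holds the credential values it read locally. The function name `redact` and the `[REDACTED]` placeholder are hypothetical.

```python
def redact(text: str, secrets: list[str]) -> str:
    """Replace every occurrence of each known secret with a placeholder."""
    for secret in secrets:
        if secret:  # Skip empty strings, which would match everywhere.
            text = text.replace(secret, "[REDACTED]")
    return text


print(redact("Authenticated with token=abc123", ["abc123"]))
# Authenticated with token=[REDACTED]
```

Exact-value replacement only catches secrets the agent knows about; it does not detect credentials it has never seen.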

What This Is NOT

  • Not a nagger — once a transmission is part of the user's task, it executes silently without interruption
  • Not a constant output filter — activates only on external transmission
  • Not a content moderator — does not judge the user's own content
  • Not only a phishing detector; the phishing check is one part of the process
  • Not file access control — local operations are unrestricted
  • Not a pattern matcher — judges by task alignment, not by regex

Version History

3 versions in total

  • v1.0.6 (current)
    2026-05-01 08:55 · Safe · Safe
  • v1.0.2
    2026-03-30 03:09
  • v1.0.1
    2026-03-20 00:42

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
