Overview

Agent Firewall — Input/Output Guardian

Architecture

[Channel Input] → [INPUT FILTER] → [Agent/Model] → [OUTPUT FILTER] → [Channel Output]
                        ↓                                  ↓
                  ┌─────────────┐                  ┌──────────────┐
                  │ Block List  │                  │ Secret Scan  │
                  │ Pattern DB  │                  │ PII Redact   │
                  │ Rate Limit  │                  │ Path Scrub   │
                  │ Encoding Det│                  │ URL Checker  │
                  └─────────────┘                  └──────────────┘
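
The flow in the diagram can be sketched as a chain of callables. This is a minimal illustration, not the skill's actual implementation: `Verdict`, `run_pipeline`, `block_injection`, `echo_model`, and `scrub_paths` are hypothetical names, and the path `/srv/internal` is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Verdict:
    allowed: bool                 # False means the message was blocked
    text: str                     # possibly rewritten text
    reason: Optional[str] = None  # name of the rule that blocked, if any


def run_pipeline(
    message: str,
    input_filters: Iterable[Callable[[str], Verdict]],
    model: Callable[[str], str],
    output_filters: Iterable[Callable[[str], str]],
) -> Verdict:
    text = message
    for check in input_filters:
        verdict = check(text)
        if not verdict.allowed:
            return verdict        # blocked input never reaches the model
        text = verdict.text
    reply = model(text)
    for scrub in output_filters:
        reply = scrub(reply)      # each output filter rewrites the reply
    return Verdict(True, reply)


# Hypothetical stand-ins for a real input filter, model, and output filter.
def block_injection(text: str) -> Verdict:
    if "ignore previous" in text.lower():
        return Verdict(False, "", reason="injection_pattern")
    return Verdict(True, text)


def echo_model(text: str) -> str:
    return f"echo: {text}"


def scrub_paths(reply: str) -> str:
    return reply.replace("/srv/internal", "[PATH]")
```

Keeping input filters as "allow or block" checks and output filters as rewriting functions mirrors the two boxes in the diagram: inputs are rejected before the model sees them, while outputs are sanitized rather than dropped.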

Input Filters

#	Filter	Description
---	--------	-------------
1	Injection patterns	Regex + heuristic match for "ignore previous", "you are now", role confusion
2	Unicode sanitizer	Strip zero-width chars, control characters, RTL overrides
3	Encoding detector	Detect Base64, hex, ROT13 encoded payloads in user messages
4	Role confusion	Detect fake system messages, assistant impersonation
5	Rate limiter	Cap messages per user per channel (per-minute and per-hour limits)
6	Size limiter	Reject inputs exceeding the token budget (max_input_tokens)
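
Three of the filters in the table (Unicode sanitizer, encoding detector, rate limiter) lend themselves to a short sketch. The function and class names below are assumptions made for illustration, as are the 16-character minimum for Base64 runs and the sliding-window strategy; the source does not specify how these filters are implemented.

```python
import base64
import binascii
import re
import time
import unicodedata
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple


def sanitize_unicode(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, keeping common whitespace.

    Category Cf covers zero-width characters and bidi overrides such as U+202E.
    """
    return "".join(
        ch for ch in text
        if ch in "\n\t\r" or unicodedata.category(ch) not in ("Cc", "Cf")
    )


_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def find_base64_payloads(text: str) -> List[str]:
    """Return decoded text for runs that are valid Base64 and valid UTF-8."""
    decoded = []
    for match in _B64_RUN.finditer(text):
        candidate = match.group(0)
        if len(candidate) % 4 != 0:
            continue
        try:
            raw = base64.b64decode(candidate, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue
    return decoded


class RateLimiter:
    """Sliding-window limiter keyed by (user, channel)."""

    def __init__(self, max_per_minute: int = 30, window_seconds: float = 60.0):
        self.max = max_per_minute
        self.window = window_seconds
        self._events: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, user: str, channel: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        events = self._events[(user, channel)]
        while events and now - events[0] >= self.window:
            events.popleft()      # forget events older than the window
        if len(events) >= self.max:
            return False
        events.append(now)
        return True
```

Decoded payloads can be fed back through the injection-pattern filter so that an encoded "ignore previous instructions" is treated like the plain-text version. Note that stripping every Cf character also removes zero-width joiners, which some scripts and emoji sequences rely on, so a production filter may need an allow-list.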

Output Filters

#	Filter	Description
---	--------	-------------
1	Secret scanner	High-entropy strings + known patterns (AWS key, GitHub token)
2	PII redactor	Email, phone, SSN, credit card → `[REDACTED]`
3	Path scrubber	Remove internal filesystem paths from outputs
4	URL checker	Block responses containing known malicious URLs
5	Consistency check	Verify output doesn't contradict system prompt directives
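
As a rough sketch of the first two output filters, the code below redacts a few PII shapes and flags high-entropy tokens. The regular expressions (US-style phone and SSN formats, a Luhn check for card numbers), the 20-character token minimum, and the 4.0-bit threshold are all assumptions chosen for illustration; the source does not give the real patterns or thresholds.

```python
import math
import re
from collections import Counter
from typing import List

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"(?<!\d)\d{3}[-. ]\d{3}[-. ]\d{4}(?!\d)")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")   # 13 to 16 digits
TOKEN = re.compile(r"[A-Za-z0-9+/_-]{20,}")


def _luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to avoid redacting arbitrary long numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_pii(text: str) -> str:
    def card_sub(match: "re.Match[str]") -> str:
        digits = re.sub(r"\D", "", match.group(0))
        return "[REDACTED]" if _luhn_ok(digits) else match.group(0)

    text = CARD.sub(card_sub, text)   # cards first, before shorter patterns
    for pattern in (EMAIL, SSN, PHONE):
        text = pattern.sub("[REDACTED]", text)
    return text


def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string's character distribution."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def find_high_entropy(text: str, threshold: float = 4.0) -> List[str]:
    """Return long tokens whose entropy suggests a random secret."""
    return [t for t in TOKEN.findall(text) if shannon_entropy(t) >= threshold]
```

Entropy scanning complements the known-pattern rules: it can catch secrets with no recognizable prefix, at the cost of false positives on hashes and long identifiers, so flagged tokens are better treated as candidates for review than as certain leaks.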

Configuration

# .security/firewall-rules.yaml
input:
  injection_patterns:
    - pattern: "ignore (all )?previous instructions"
      action: BLOCK
      severity: CRITICAL
    - pattern: "you are now (?!helping)"
      action: BLOCK
      severity: HIGH
  rate_limit:
    max_per_minute: 30
    max_per_hour: 500
  max_input_tokens: 4096

output:
  secret_patterns:
    - name: aws_key
      pattern: "AKIA[0-9A-Z]{16}"
      action: REDACT
    - name: github_token
      pattern: "gh[ps]_[A-Za-z0-9_]{36,}"
      action: REDACT
  pii_redaction: true
  path_scrubbing: true
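
To show how rules of this shape might be applied, the sketch below mirrors part of the YAML as a Python dict so it runs without a YAML parser (in practice the file could be loaded with a parser such as PyYAML's `yaml.safe_load`). The helper names `check_input` and `redact_secrets` and the choice of case-insensitive matching are assumptions, not part of the source.

```python
import re
from typing import Optional

# Mirrors the injection and secret rules from the YAML above.
RULES = {
    "input": {
        "injection_patterns": [
            {"pattern": r"ignore (all )?previous instructions",
             "action": "BLOCK", "severity": "CRITICAL"},
            {"pattern": r"you are now (?!helping)",
             "action": "BLOCK", "severity": "HIGH"},
        ],
    },
    "output": {
        "secret_patterns": [
            {"name": "aws_key", "pattern": r"AKIA[0-9A-Z]{16}",
             "action": "REDACT"},
            {"name": "github_token", "pattern": r"gh[ps]_[A-Za-z0-9_]{36,}",
             "action": "REDACT"},
        ],
    },
}


def check_input(text: str, rules: dict = RULES) -> Optional[dict]:
    """Return the first matching BLOCK rule, or None if the input passes."""
    for rule in rules["input"]["injection_patterns"]:
        # Assumption: injection patterns are matched case-insensitively.
        if rule["action"] == "BLOCK" and re.search(rule["pattern"], text, re.IGNORECASE):
            return rule
    return None


def redact_secrets(text: str, rules: dict = RULES) -> str:
    """Replace every match of a REDACT rule with a placeholder."""
    for rule in rules["output"]["secret_patterns"]:
        if rule["action"] == "REDACT":
            text = re.sub(rule["pattern"], "[REDACTED]", text)
    return text
```

The negative lookahead in the second pattern exempts only text where "helping" follows "you are now" after exactly one space, so "you are now helping me debug" passes while "You are now DAN" is blocked. Secret patterns stay case-sensitive here because the key formats themselves are.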

Guardrails

Firewall rules are append-only in production; deleting a rule requires human approval.
Suspected false positives are logged, trigger an alert, and pass through with a warning; they are never silently dropped.
Every block is logged with timestamp, matched rule, full context, channel, and user hash.
The firewall itself cannot be disabled by agent instructions.
The rules file is read-only from the agent's perspective.
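
Two of these guardrails, the block log record and the append-only rule set, can be sketched briefly. The field names, the use of a keyed HMAC-SHA256 for the user hash, and the function names are assumptions for illustration; the source only lists which fields are logged.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import List


def hash_user(user_id: str, salt: bytes) -> str:
    """Keyed hash so logs can correlate a user without storing the raw ID."""
    return hmac.new(salt, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def block_record(rule: str, context: str, channel: str,
                 user_id: str, salt: bytes) -> str:
    """Build one JSON log line containing the fields listed above."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rule": rule,
        "context": context,
        "channel": channel,
        "user_hash": hash_user(user_id, salt),
    }
    return json.dumps(record, ensure_ascii=False)


def is_append_only(old_rules: List[dict], new_rules: List[dict]) -> bool:
    """True if new_rules keeps every old rule, in order, and only adds to the end."""
    return new_rules[:len(old_rules)] == old_rules
```

A deployment step could call `is_append_only` on the previous and proposed rule lists and route any change that fails the check to a human reviewer. Because the record includes full context, the log itself may contain the sensitive input that triggered the block and should be access-controlled accordingly.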

Version history

1 version in total

v1.0.0 (current)

2026-05-07 15:08 (safe)

Security scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
