概述

ZT4AI Self-Audit

Audit your agent's skills, workspace, and configuration against Zero Trust for AI principles.

Background

AI agents process instructions and data as indistinguishable tokens in a context window. This means:

Skill files loaded into context can inject behavioral instructions
Workspace files (SOUL.md, AGENTS.md) are both operating instructions AND attack surface
External inputs (web content, emails, ClawHub skills) can contain prompt injection
Credentials in plaintext config files have no access scoping or rotation

This skill applies three frameworks:

Microsoft ZT4AI — Verify explicitly, least privilege, assume breach
"Caging the Agents" (arXiv:2603.17419) — Four-layer defense: workload isolation, credential proxy, network egress, prompt integrity
OWASP Agentic AI Top 10 — Trust boundary violations, privilege escalation, resource exhaustion

Audit Process

Step 1: Inventory Skills

Scan all three skill locations:

echo "=== System ===" && ls /usr/lib/node_modules/openclaw/skills/ 2>/dev/null
echo "=== User ===" && ls ~/.openclaw/skills/ 2>/dev/null
echo "=== Workspace ===" && ls ~/.openclaw/workspace/skills/ 2>/dev/null

Step 2: Classify Each Skill

Assign every skill to a risk category using the classification guide in references/risk-classification.md.

Categories:

Behavioral modifiers (🔴 highest risk) — Skills that change how you think, override safety instincts, or inject decision-making patterns into your context
Credential handlers (🟡 elevated risk) — Skills that read, write, or transmit API keys, tokens, passwords
System modifiers (🟡 elevated risk) — Skills that write to config files, modify system state, or execute with elevated privileges
Tool wrappers (🟢 standard risk) — Skills that wrap external tools with well-scoped inputs/outputs
Read-only (🟢 low risk) — Skills that only read data and produce reports

Step 3: Audit Each Skill Against ZT4AI Principles

For each skill, evaluate against the checklist in references/audit-checklist.md.

Quick reference — the three questions:

Verify explicitly: Does this skill verify identity/authorization before acting? Does it distinguish owner from non-owner input?
Least privilege: Does this skill request only the access it needs? Could its scope be narrowed?
Assume breach: If this skill were compromised (poisoned update, prompt injection in its files), what's the worst outcome? How would you detect it?

Step 4: Check Scripts and Executables

Find all executable code in skills:

find ~/.openclaw/skills/ ~/.openclaw/workspace/skills/ \
  -type f \( -name "*.sh" -o -name "*.py" -o -name "*.js" \) \
  2>/dev/null | sort

For each script, check:

Does it access credentials? (grep -li "API_KEY\|SECRET\|TOKEN\|PASSWORD" )
Does it make network calls? (grep -li "curl\|wget\|requests\|fetch\|http" )
Does it write to system config? (grep -li "openclaw.json\|\.env\|/etc/" )
Does it execute arbitrary input? (grep -li "eval\|exec\|subprocess\|system(" )

Step 5: Generate Integrity Baseline

Create SHA256 checksums of all skill files for future drift detection:

find ~/.openclaw/skills/ ~/.openclaw/workspace/skills/ \
  -type f \( -name "*.md" -o -name "*.sh" -o -name "*.py" -o -name "*.js" -o -name "*.json" \) \
  -exec sha256sum {} \; | sort -k2 > memory/skill-integrity-baseline.md

To verify against an existing baseline:

sha256sum -c memory/skill-integrity-baseline.md 2>&1 | grep -v ": OK$"

Any output indicates modified files — investigate before trusting.

Step 6: Assess Workspace File Security

Check the self-modification surface:

Can the agent modify its own SOUL.md / AGENTS.md? (Yes by default — flag it)
Are memory files loaded into context? (Yes — they're instruction vectors)
Is MEMORY.md loaded in non-owner contexts? (Should NOT be — data leak risk)
Are there credentials in workspace files? (grep -rli "api_key\|password\|secret" ~/.openclaw/workspace/)

Step 7: Check Network Egress

Assess outbound network restrictions:

# Check for firewall rules
iptables -L OUTPUT -n 2>/dev/null || echo "No iptables access"
ufw status 2>/dev/null || echo "No UFW"

# Check what the agent can reach
curl -s -o /dev/null -w "%{http_code}" https://httpbin.org/get --max-time 5

If the agent has unrestricted outbound access, flag as a security gap — a compromised agent could exfiltrate data to any destination.

Step 8: Produce Report

Generate a structured report using the template in references/report-template.md. Include:

Risk classification for each skill
Specific findings with severity ratings
Recommended remediations with priority
Action tier assignments (see references/action-tiers.md)

Save report to memory/zt4ai-audit-YYYY-MM-DD.md.

Ongoing Monitoring

After the initial audit:

Re-verify integrity after any skill install/update (sha256sum -c against baseline)
Re-audit behavioral skills whenever they're updated — these are the highest risk
Update baseline after intentional skill modifications
Schedule periodic audits via cron (monthly recommended)

References

references/risk-classification.md — Detailed classification criteria with examples
references/audit-checklist.md — Per-skill audit checklist
references/action-tiers.md — Graduated trust model for agent actions
references/report-template.md — Audit report template

版本历史

共 1 个版本

v1.0.0 当前

2026-03-31 05:11 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)

安全，无风险

查看报告

ZT4AI Self-Audit

概述

ZT4AI Self-Audit

Background

Audit Process

Step 1: Inventory Skills

Step 2: Classify Each Skill

Step 3: Audit Each Skill Against ZT4AI Principles

Step 4: Check Scripts and Executables

Step 5: Generate Integrity Baseline

Step 6: Assess Workspace File Security

Step 7: Check Network Egress

Step 8: Produce Report

Ongoing Monitoring

References

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

MoltGuard - Security & Antivirus & Guardrails

1password

Tmux