
SharpAgent Content Safety

SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports...
yezhaowang888-stack

Overview

SharpAgent Content Safety Engine v1.0.0

> The last line of defense for content output.

> It's not about "should we say it" — it's "how should it be said in this jurisdiction."

> Independent from five-factor review (credibility ≠ compliance). Layer 3 of the four-layer architecture.

Architecture Position

Layer 1: Five-Factor Review     ← Trust verification (global, immutable)
Layer 2: Calibration Framework  ← Output adaptation (warm/professional/deep)
Layer 3: Content Safety Engine  ← Compliance interception (per-jurisdiction rules) ← YOU ARE HERE
Layer 4: Final Output

Why independent? Five-factor review asks "can I trust this?" Safety engine asks "can I say this?" The first is information quality, the second is compliance and safety. Mixing them contaminates both judgments.

Contract

contract:
  name: sharpagent-content-safety
  version: "1.0.0"
  category: analysis
  trust_level: verified
  reads:
    - Content
    - CompliancePolicy
  writes:
    - SafetyVerdict
  preconditions:
    - "At least one compliance policy loaded"
    - "Content is not empty"
  postconditions:
    - "Verdict is one of: pass | flag | block"
    - "If flag or block, reason and rule reference are provided"
  calibration:
    default_mode: professional
    modes_supported: [warm, professional, deep]
  compliance:
    jurisdiction: global
    safety_level: strict
  lifecycle:
    status: active
    publish_as: SharpAgent
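
The two postconditions in the contract can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `SafetyVerdict` dataclass with `verdict`, `reason`, and `rule_id` fields; the contract only names `SafetyVerdict` and does not define its fields, so the field names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

VALID_VERDICTS = ("pass", "flag", "block")


@dataclass(frozen=True)
class SafetyVerdict:
    """Hypothetical shape for the SafetyVerdict listed under `writes`."""

    verdict: str
    reason: Optional[str] = None
    rule_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Postcondition 1: verdict is one of pass | flag | block.
        if self.verdict not in VALID_VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        # Postcondition 2: flag and block carry a reason and a rule reference.
        if self.verdict != "pass" and not (self.reason and self.rule_id):
            raise ValueError("flag/block verdicts need a reason and a rule reference")
```

Validating at construction time means a malformed verdict cannot reach Layer 4, which is one way to honor the contract without a separate checking pass.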

Core Design

Pluggable Rule Engine

rules:
  - id: "global/PII-001"
    type: "block"
    description: "Detect and block personal identifiable information"
    patterns:
      - "email"
      - "phone_number"
      - "id_card"
      - "address"
    severity: "high"

  - id: "cn/content-001"
    type: "block"
    description: "Block prohibited content per China Internet regulations"
    jurisdiction: "cn"
    severity: "critical"

  - id: "us/export-001"
    type: "flag"
    description: "Flag export-controlled technology references"
    jurisdiction: "us"
    severity: "medium"

  - id: "global/hate-speech-001"
    type: "block"
    description: "Block hate speech and discriminatory content"
    severity: "high"

  - id: "global/privacy-003"
    type: "flag"
    description: "Flag privacy-sensitive content for human review"
    severity: "medium"

Rule Structure

rule:
  id: "{jurisdiction}/{name}-{seq}"   # Unique identifier
  type: "block" | "flag" | "pass"      # Action
  description: "..."                   # Human-readable
  jurisdiction: "cn" | "us" | "eu" | "global"  # Applicable jurisdiction
  patterns: [regex...]                 # Match patterns (optional)
  keywords: [string...]                # Keyword matching (optional)
  severity: "low" | "medium" | "high" | "critical"
  exemptions: [                        # Exceptions
    "educational context",
    "news reporting"
  ]
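
As an illustration of the rule structure above, a rule could be validated when it is loaded. This is a sketch under assumptions: the `Rule` class, `parse_rule` helper, and the ID regex are hypothetical, and falling back to the ID prefix when `jurisdiction` is omitted is a guess based on the example rules (the `global/...` rules carry no `jurisdiction` field).

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches "{jurisdiction}/{name}-{seq}", e.g. "global/hate-speech-001".
RULE_ID = re.compile(r"^(?P<jurisdiction>[a-z]+)/(?P<name>.+)-(?P<seq>\d+)$")
ACTIONS = {"block", "flag", "pass"}
SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass
class Rule:
    id: str
    type: str
    description: str
    severity: str
    jurisdiction: str = ""
    patterns: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    exemptions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        match = RULE_ID.match(self.id)
        if match is None:
            raise ValueError(f"rule id {self.id!r} is not '{{jurisdiction}}/{{name}}-{{seq}}'")
        if self.type not in ACTIONS:
            raise ValueError(f"unknown action: {self.type!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        # Assumption: a rule without an explicit jurisdiction inherits its id prefix.
        if not self.jurisdiction:
            self.jurisdiction = match.group("jurisdiction")


def parse_rule(raw: dict) -> Rule:
    """Build a Rule from one parsed YAML mapping; unknown keys raise TypeError."""
    return Rule(**raw)
```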

Jurisdiction Configuration

Runtime selection (multi-select):

safety_engine.load_policies(jurisdictions=["cn", "us", "eu"])

Each loaded jurisdiction stacks its rules. Conflicting rules: strictest wins.

Rule priority (high to low):
1. block → 2. flag → 3. pass
Cross-jurisdiction: take max severity
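
The "strictest wins" and "max severity" rules above amount to two independent maxima. A minimal sketch, assuming matches arrive as `(action, severity)` pairs; the `resolve` name and the pair representation are illustrative, not part of the engine's documented API.

```python
from typing import Iterable, Optional, Tuple

# Higher rank means stricter action or more severe match.
ACTION_RANK = {"pass": 0, "flag": 1, "block": 2}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def resolve(matches: Iterable[Tuple[str, str]]) -> Tuple[str, Optional[str]]:
    """Collapse matched rules from all active jurisdictions into one outcome."""
    matches = list(matches)
    if not matches:
        # No rule matched anywhere: the content passes.
        return "pass", None
    verdict = max((action for action, _ in matches), key=ACTION_RANK.__getitem__)
    severity = max((sev for _, sev in matches), key=SEVERITY_RANK.__getitem__)
    return verdict, severity
```

For example, `resolve([("flag", "medium"), ("block", "critical")])` yields `("block", "critical")`, because block outranks flag and critical outranks medium.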

Workflow

Step 1: Pre-Flight

  • Content empty?
  • Content too long? Chunk at ≤4096 chars.
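
The pre-flight checks can be sketched as a single function. Assumptions: empty content is rejected with an exception (the contract only states that content must not be empty), and consecutive chunks share 200 characters, taken from the overlap figure listed under Edge Cases. The `preflight` name is hypothetical.

```python
from typing import List


def preflight(content: str, max_chars: int = 4096, overlap: int = 200) -> List[str]:
    """Reject empty content and split long content into overlapping chunks."""
    if not content or not content.strip():
        raise ValueError("content is empty")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    if len(content) <= max_chars:
        return [content]

    step = max_chars - overlap  # each chunk starts `overlap` chars before the previous one ends
    chunks = []
    start = 0
    while True:
        chunks.append(content[start:start + max_chars])
        if start + max_chars >= len(content):
            break
        start += step
    return chunks
```

The overlap exists so that a sensitive phrase straddling a chunk boundary still appears whole in at least one chunk, provided the phrase is no longer than the overlap.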

Step 2: Rule Matching

For each chunk:
    for each loaded rule:
        skip if jurisdiction not active
        check patterns/keywords
        check exemptions
        record match
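
The loop above might look like this in Python. It is a sketch, not the engine's implementation: rules are plain dicts, `patterns` are treated as regular expressions, `keywords` as case-insensitive substrings, and exemptions are checked against a caller-supplied set of context labels. The `context` parameter and the treatment of rules without a `jurisdiction` field as global are assumptions.

```python
import re
from typing import Dict, Iterable, List, Tuple


def match_rules(
    chunks: Iterable[str],
    rules: Iterable[Dict],
    active: Iterable[str],
    context: Iterable[str] = (),
) -> Tuple[List[Dict], List[Dict]]:
    """Return (matches, exempted) for every chunk/rule pair that hits."""
    rules, active, context = list(rules), set(active), set(context)
    matches, exempted = [], []
    for index, chunk in enumerate(chunks):
        lowered = chunk.lower()
        for rule in rules:
            # Skip rules whose jurisdiction is not active (assumed default: global).
            if rule.get("jurisdiction", "global") not in active:
                continue
            hit = any(re.search(p, chunk) for p in rule.get("patterns", ())) or any(
                k.lower() in lowered for k in rule.get("keywords", ())
            )
            if not hit:
                continue
            record = {"rule": rule["id"], "type": rule["type"],
                      "severity": rule["severity"], "chunk": index}
            reasons = [e for e in rule.get("exemptions", ()) if e in context]
            if reasons:
                # Exemption applies: keep the reason for the log, but do not count the match.
                exempted.append({**record, "exempted_by": reasons})
            else:
                matches.append(record)
    return matches, exempted
```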

Step 3: Verdict

| Verdict | Meaning | Action |
| --- | --- | --- |
| ✅ pass | No matches | Let through to output |
| ⚠️ flag | Low severity match | Tag + allow + log |
| 🚫 block | High severity match | Block + return alternative content |

Block replacement:

[Content blocked by safety engine]
Reason: {top_reason}
Contact administrator for full content.
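
Applying a verdict to the outgoing content can be sketched as follows. The `apply_verdict` name is hypothetical; the template text is copied from the block replacement above, and treating an unrecognized verdict as a block follows the "Default block" quality gate.

```python
BLOCK_TEMPLATE = (
    "[Content blocked by safety engine]\n"
    "Reason: {top_reason}\n"
    "Contact administrator for full content."
)


def apply_verdict(content: str, verdict: str, top_reason: str = "") -> str:
    """Return the text that should reach Layer 4 for a given verdict."""
    if verdict in ("pass", "flag"):
        # Flagged content is still delivered; tagging and logging happen elsewhere.
        return content
    if verdict == "block":
        return BLOCK_TEMPLATE.format(top_reason=top_reason)
    # Ambiguous verdict: fail closed rather than leak the original content.
    return BLOCK_TEMPLATE.format(top_reason="ambiguous verdict")
```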

Step 4: Logging

{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [
    {"rule": "cn/content-001", "severity": "critical"}
  ],
  "verdict": "block",
  "timestamp": "2026-05-11T06:10:00Z",
  "agent": "sharpagent"
}
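
A log record in the shape shown above could be produced like this. It is a minimal sketch: `build_log_entry` is a hypothetical helper, and it expects a timezone-aware `datetime` so the trailing `Z` is accurate.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


def build_log_entry(
    jurisdictions: Iterable[str],
    rules_matched: List[Dict],
    verdict: str,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one safety check as a JSON line."""
    # Normalize to UTC; pass a timezone-aware datetime when overriding `now`.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    entry = {
        "event": "safety_check",
        "jurisdictions": sorted(jurisdictions),
        # Keep only the fields shown in the example record.
        "rules_matched": [
            {"rule": m["rule"], "severity": m["severity"]} for m in rules_matched
        ],
        "verdict": verdict,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": "sharpagent",
    }
    return json.dumps(entry)
```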

Ruleset Management

Built-in Rulesets

| Ruleset | Coverage | File |
| --- | --- | --- |
| global | Universal safety (hate speech/PII/privacy) | rules/global.yaml |
| cn | China internet content regulations | rules/cn.yaml |
| us | US export control/safe harbor | rules/us.yaml |
| eu | GDPR related | rules/eu.yaml |

Custom Rules

rules/custom/
├── my-company-policy.yaml
├── my-project-policy.yaml
└── README.md

Edge Cases

| Situation | Action |
| --- | --- |
| Conflicting jurisdiction rules | Strictest wins (block > flag > pass) |
| Rule false positive | Add exemption, log false positive |
| Cross-chunk sensitive phrase | Overlap scanning (±200 chars) |
| No jurisdiction configured | Load global only |
| Corrupt rule file | Skip + log error, don't crash engine |
| Exemption conditions met | Skip rule, log exemption reason |
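
Two of the edge cases above (no jurisdiction configured, corrupt rule file) concern loading. A sketch of how a loader might handle them, assuming a hypothetical `read_ruleset(name)` callable that reads and parses one ruleset file and raises on failure; the YAML parsing itself is left to that callable. Refusing to start with zero rules follows the "At least 1 ruleset" quality gate.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

logger = logging.getLogger("sharpagent.safety")


def load_rulesets(
    jurisdictions: Iterable[str],
    read_ruleset: Callable[[str], List[Dict]],
) -> Tuple[List[str], List[Dict]]:
    """Load rules for each requested jurisdiction, tolerating bad files."""
    # No jurisdiction configured: fall back to the global ruleset only.
    requested = list(jurisdictions) or ["global"]
    loaded, rules = [], []
    for name in requested:
        try:
            ruleset = read_ruleset(name)
        except Exception as exc:
            # Corrupt or unreadable file: skip it and log, but keep the engine running.
            logger.error("skipping ruleset %r: %s", name, exc)
            continue
        loaded.append(name)
        rules.extend(ruleset)
    if not rules:
        # With no rules nothing would ever be blocked, so do not start.
        raise RuntimeError("no rules loaded; refusing to start")
    return loaded, rules
```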

Quality Gates

| Check | What | Fail action |
| --- | --- | --- |
| At least 1 ruleset | No rules = nothing blocked | Don't start |
| Verdict unambiguous | pass/flag/block | Default block |
| Block provides reason | User knows why | Add reason |
| Complete audit log | Every check recorded | Backfill |
| Rules versioned | Updates don't break running checks | Semver rules |

Integration Points

Five-Factor Review

  • Safety engine output (compliance_check: fail) can trigger a five-factor review
  • Independent but cooperative

Calibration Framework

  • Safety engine sits between Layer 2 (calibration) and Layer 4 (output)
  • Calibration compliance field maps to safety engine rule selection

Self-Evolving

  • Safety false positives/negatives trigger self-evolving reflection
  • New rules as improvement hypotheses

Layered Memory

  • Safety logs go to L6 archive (legal compliance)

Version History

  • v1.0.0 — Initial release. Pluggable multi-jurisdiction content safety engine.

SharpAgent · MIT-0 · 2026-05-11

