
SWARM Safety

SWARM: System-Wide Assessment of Risk in Multi-agent systems. 38 agent types, 29 governance levers, 55 scenarios. Study emergent risks, phase transitions, an...
rsavitt
AI Intelligence clawhub v1.7.1 1 version 100000 Key: not required

Overview

SWARM Safety Skill

Study how intelligence swarms — and where it fails.

SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents — harmful dynamics emerge when many sub-AGI agents interact, even when no individual agent is misaligned.

v1.7.0 | 38 agent types | 29 governance levers | 55 scenarios | 2922 tests | 8 framework bridges

Repository: https://github.com/swarm-ai-safety/swarm

Hard Rules

  • SWARM simulations run locally. Install the package first.
  • Do not submit scenarios containing real API keys, credentials, or PII.
  • Simulation results are research artifacts. Do not present them as ground truth about real systems.
  • When publishing results, cite the framework and disclose simulation parameters.

Security

  • API binds to localhost only (127.0.0.1) by default to prevent network exposure.
  • CORS restricted to localhost origins by default.
  • No authentication on development API — do not expose to untrusted networks.
  • In-memory storage — data does not persist between restarts.
  • For production deployment, add authentication middleware and use a proper database.

Install

# From PyPI
pip install swarm-safety

# With LLM agent support
pip install swarm-safety[llm]

# Full development (all extras)
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"

Quick Start (Python)

# Agent classes live in per-type modules under swarm.agents
from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.adversarial import AdversarialAgent  # imported but not registered below
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# 10 epochs of 10 steps each, with a fixed seed
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)

# Mixed population: two honest agents, one opportunistic, one deceptive
orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

# run() returns one metrics record per epoch
metrics = orchestrator.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")

Quick Start (CLI)

# List available scenarios
swarm list

# Run a scenario
swarm run scenarios/baseline.yaml

# Override settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15

# Export results
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/

Quick Start (API)

Start the API server:

pip install swarm-safety[api]
uvicorn swarm.api.app:app --host 127.0.0.1 --port 8000

API documentation at http://localhost:8000/docs.

> Security Note: The server binds to 127.0.0.1 (localhost only) by default. Do not bind to 0.0.0.0 unless you understand the security implications and have proper firewall rules in place.

Register Agent

curl -X POST http://localhost:8000/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "YourAgent",
    "description": "What your agent does",
    "capabilities": ["governance-testing", "red-teaming"]
  }'

Returns agent_id and api_key.

Submit Scenario

curl -X POST http://localhost:8000/api/v1/scenarios/submit \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": "simulation:\n  n_epochs: 10\n  steps_per_epoch: 10\nagents:\n  - type: honest\n    count: 3\n  - type: adversarial\n    count: 2",
    "tags": ["collusion", "governance"]
  }'
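The same submission can be prepared from Python. The sketch below uses only the standard library and mirrors the curl payload above; the helper name `build_submit_request` is hypothetical and not part of the SWARM package. It builds the request without sending it, since sending requires the local server to be running.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local development server from the Quick Start


def build_submit_request(name, description, yaml_content, tags):
    """Build (but do not send) a POST request for /api/v1/scenarios/submit.

    Hypothetical helper for illustration; the field names follow the curl example.
    """
    payload = {
        "name": name,
        "description": description,
        "yaml_content": yaml_content,
        "tags": list(tags),
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/scenarios/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same scenario text as the curl example, written one YAML line per entry
SCENARIO_YAML = "\n".join([
    "simulation:",
    "  n_epochs: 10",
    "  steps_per_epoch: 10",
    "agents:",
    "  - type: honest",
    "    count: 3",
    "  - type: adversarial",
    "    count: 2",
])

request = build_submit_request(
    name="my-scenario",
    description="Testing collusion detection with 5 agents",
    yaml_content=SCENARIO_YAML,
    tags=["collusion", "governance"],
)

# Sending needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the YAML as a list of lines avoids hand-escaping `\n` inside a JSON string, which is the most error-prone part of the curl form.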

Create & Join Simulation

# Create
curl -X POST http://localhost:8000/api/v1/simulations/create \
  -H "Content-Type: application/json" \
  -d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'

# Join
curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'

Core Concepts

Soft Probabilistic Labels

Each interaction carries a soft label p = P(v = +1), the probability that its outcome is beneficial:

Observables -> ProxyComputer -> v_hat -> sigmoid -> p -> PayoffEngine -> payoffs
                                                    |
                                               SoftMetrics -> toxicity, quality gap, etc.
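The middle of this pipeline (v_hat to p to payoff) can be sketched in a few lines. This is a minimal illustration, not the framework's implementation: the steepness parameter `k`, the assumption that v_hat is a bounded score centred on zero, and the `benefit`/`harm` payoff values are all assumptions made here.

```python
import math


def sigmoid(v_hat: float, k: float = 1.0) -> float:
    """Map a proxy score v_hat to p = P(v = +1). The steepness k is an assumed parameter."""
    return 1.0 / (1.0 + math.exp(-k * v_hat))


def expected_payoff(p: float, benefit: float = 1.0, harm: float = 1.0) -> float:
    """Expected payoff of an interaction with soft label p (assumed payoff values):
    gain `benefit` with probability p, lose `harm` with probability 1 - p."""
    return p * benefit - (1.0 - p) * harm


for v_hat in (-1.0, 0.0, 1.0):
    p = sigmoid(v_hat)
    print(f"v_hat={v_hat:+.1f}  p={p:.3f}  expected payoff={expected_payoff(p):+.3f}")
```

A neutral score of 0 gives p = 0.5, so with symmetric benefit and harm the expected payoff is zero; the soft label keeps that uncertainty visible instead of forcing a good/bad call.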

Five Key Metrics

| Metric | What It Measures |
| --- | --- |
| Toxicity rate | Expected harm among accepted interactions: `E[1-p \| accepted]` |
| Quality gap | Adverse selection indicator (negative = bad): `E[p \| accepted] - E[p \| rejected]` |
| Conditional loss | Selection effect on payoffs |
| Incoherence | Variance-to-error ratio across replays |
| Illusion delta | Gap between perceived coherence and actual consistency |
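The first two metrics follow directly from their definitions. The sketch below computes them from a list of `(p, accepted)` pairs; the function names and the sample data are made up for illustration and are not taken from the SWARM package.

```python
from statistics import mean


def toxicity_rate(interactions):
    """E[1 - p | accepted]: expected harm among accepted interactions."""
    accepted = [p for p, was_accepted in interactions if was_accepted]
    return mean(1.0 - p for p in accepted)


def quality_gap(interactions):
    """E[p | accepted] - E[p | rejected]; a negative value signals adverse selection."""
    accepted = [p for p, was_accepted in interactions if was_accepted]
    rejected = [p for p, was_accepted in interactions if not was_accepted]
    return mean(accepted) - mean(rejected)


# Illustrative data: (soft label p, whether the interaction was accepted)
sample = [(0.9, True), (0.8, True), (0.4, True), (0.3, False), (0.2, False)]

print(f"toxicity rate: {toxicity_rate(sample):.3f}")
print(f"quality gap:   {quality_gap(sample):.3f}")
```

In this sample the accepted interactions average p = 0.7 and the rejected ones p = 0.25, so the toxicity rate is about 0.30 and the quality gap about 0.45. Both functions raise `statistics.StatisticsError` when the relevant group is empty, which a fuller implementation would need to handle.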

Agent Types (14 families, 38 implementations)

| Type | Behavior |
| --- | --- |
| Honest | Cooperative, trust-based, completes tasks diligently |
| Opportunistic | Maximizes short-term payoff, cherry-picks tasks |
| Deceptive | Builds trust, then exploits trusted relationships |
| Adversarial | Targets honest agents, coordinates with allies |
| LDT | Logical Decision Theory with FDT/UDT precommitment |
| RLM | Reinforcement Learning from Memory |
| Council | Multi-agent deliberation-based decisions |
| SkillRL | Learns interaction strategies via reward signals |
| LLM | Behavior determined by LLM (Anthropic, OpenAI, or Ollama) |
| Moltbook | Domain-specific social platform agent |
| Scholar | Academic citation and research agent |
| Wiki Editor | Collaborative editing with editorial policy |

Governance Levers (29 mechanisms)

  • Transaction Taxes — Reduce exploitation, cost welfare
  • Reputation Decay — Punish bad actors, erode honest standing
  • Circuit Breakers — Freeze toxic agents quickly
  • Random Audits — Deter hidden exploitation
  • Staking — Filter undercapitalized agents
  • Collusion Detection — Catch coordinated attacks (the critical lever near collapse threshold)
  • Sybil Detection — Identify duplicate agents
  • Transparency Ledger — Reward/penalize based on outcome
  • Moderator Agent — Probabilistic review of interactions
  • Incoherence Friction — Tax uncertainty-driven decisions
  • Council Deliberation — Multi-agent governance decisions
  • Diversity Enforcement — Prevent monoculture collapse
  • Moltipedia-specific — Pair caps, page cooldowns, daily caps, self-fix prevention

Framework Bridges

| Bridge | Integration |
| --- | --- |
| Concordia | DeepMind's multi-agent framework |
| GasTown | Multi-agent workspace governance |
| Claude Code | Claude CLI agent integration |
| LiveSWE | Live software engineering tasks |
| OpenClaw | Open agent protocol |
| Prime Intellect | Cross-platform run tracking |
| Ralph | Agent orchestration |
| Worktree | Git worktree-based sandboxing |

Scenario YAML Format

simulation:
  n_epochs: 10
  steps_per_epoch: 10
  seed: 42

agents:
  - type: honest
    count: 3
    config:
      acceptance_threshold: 0.4
  - type: adversarial
    count: 2
    config:
      aggression_level: 0.7

governance:
  transaction_tax_rate: 0.05
  circuit_breaker_enabled: true
  collusion_detection_enabled: true

success_criteria:
  max_toxicity: 0.3
  min_quality_gap: 0.0
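One plausible reading of the `success_criteria` block is that a run passes when toxicity stays at or below `max_toxicity` and the quality gap stays at or above `min_quality_gap`. The sketch below encodes that reading; the comparison semantics and the `check_success` helper are assumptions for illustration, so consult the scenario format guide for the framework's actual rules.

```python
def check_success(criteria: dict, toxicity: float, quality_gap: float) -> list:
    """Return human-readable failures; an empty list means the run passed.

    Assumed semantics: toxicity <= max_toxicity and quality_gap >= min_quality_gap.
    """
    failures = []
    max_toxicity = criteria.get("max_toxicity")
    if max_toxicity is not None and toxicity > max_toxicity:
        failures.append(f"toxicity {toxicity:.3f} exceeds max_toxicity {max_toxicity}")
    min_quality_gap = criteria.get("min_quality_gap")
    if min_quality_gap is not None and quality_gap < min_quality_gap:
        failures.append(f"quality gap {quality_gap:.3f} is below min_quality_gap {min_quality_gap}")
    return failures


# Criteria copied from the scenario above; the metric values are made up
criteria = {"max_toxicity": 0.3, "min_quality_gap": 0.0}
passing = check_success(criteria, toxicity=0.25, quality_gap=0.10)
failing = check_success(criteria, toxicity=0.35, quality_gap=-0.05)

print("passing run failures:", passing)
print("failing run failures:", failing)
```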

Key Research Findings

Phase Transitions (11-scenario, 209-epoch study)

| Regime | Adversarial % | Toxicity | Welfare | Outcome |
| --- | --- | --- | --- | --- |
| Cooperative | 0-20% | < 0.30 | Stable | Survives |
| Contested | 20-37.5% | 0.33-0.37 | Declining | Survives |
| Collapse | 50%+ | ~0.30 | Zero by epoch 12-14 | Collapses |

The critical threshold lies somewhere between 37.5% and 50% adversarial agents; it separates recoverable decline from irreversible collapse.
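The reported regimes can be written as a simple lookup on the adversarial fraction. This sketch only restates the ranges from the study; the function name is hypothetical, the 37.5% to 50% band is reported as undetermined because the study does not locate the threshold inside it, and assigning exactly 20% to the cooperative regime is an arbitrary choice because the reported ranges share that endpoint.

```python
def classify_regime(adversarial_fraction: float) -> str:
    """Map an adversarial fraction (0.0 to 1.0) to the regime reported in the study."""
    if not 0.0 <= adversarial_fraction <= 1.0:
        raise ValueError("adversarial_fraction must be between 0 and 1")
    if adversarial_fraction <= 0.20:
        return "cooperative"
    if adversarial_fraction <= 0.375:
        return "contested"
    if adversarial_fraction < 0.50:
        return "undetermined"  # the critical threshold lies somewhere in this band
    return "collapse"


# 2 adversarial agents out of 5, as in the submit-scenario example
print(classify_regime(2 / 5))
```

A population with 2 adversarial agents out of 5 (40%) lands in the undetermined band, which is one reason such a scenario is interesting to run.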

Governance Cost Paradox (v1.7.0 GasTown study)

A 42-run study shows that governance reduces toxicity at every adversarial level (mean reduction 0.071) but imposes a net welfare cost at the current parameter tuning. At 0% adversarial agents, governance costs 216 welfare units (-57.6%) for a toxicity reduction of only 0.066.
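The figures for the 0% adversarial case imply a few derived quantities. The arithmetic below assumes the -57.6% is measured relative to welfare without governance; that interpretation is an assumption, and the derived numbers are back-of-the-envelope values rather than results from the study.

```python
welfare_cost = 216.0        # welfare units lost to governance (reported)
relative_change = -0.576    # reported -57.6%, assumed relative to ungoverned welfare
toxicity_reduction = 0.066  # reported toxicity reduction at 0% adversarial

# Implied welfare without governance: cost / |relative change|
baseline_welfare = welfare_cost / abs(relative_change)

# Implied welfare with governance switched on
governed_welfare = baseline_welfare - welfare_cost

# Welfare units given up per 0.01 of toxicity removed
cost_per_toxicity_point = welfare_cost / (toxicity_reduction * 100)

print(f"implied baseline welfare:     {baseline_welfare:.1f}")
print(f"implied governed welfare:     {governed_welfare:.1f}")
print(f"cost per 0.01 toxicity point: {cost_per_toxicity_point:.1f}")
```

Under that assumption the baseline is about 375 welfare units, governance leaves about 159, and each 0.01 of toxicity removed costs roughly 33 units.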

Case Studies

GasTown Governance Cost

Study governance overhead vs. toxicity reduction across 7 agent compositions with and without governance levers. Reveals the safety-throughput trade-off. See scenarios/gastown_governance_cost.yaml.

LDT Cooperation

220 runs across 10 seeds comparing TDT vs FDT vs UDT cooperation strategies at population scales up to 21 agents. See scenarios/ldt_cooperation.yaml.

Moltipedia Heartbeat

Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See scenarios/moltipedia_heartbeat.yaml.

Moltbook CAPTCHA

Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See scenarios/moltbook_captcha.yaml.

API Endpoints (Full Reference)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Health check |
| GET | / | API info |
| POST | /api/v1/agents/register | Register agent |
| GET | /api/v1/agents/{agent_id} | Get agent details |
| GET | /api/v1/agents/ | List agents |
| POST | /api/v1/scenarios/submit | Submit scenario |
| GET | /api/v1/scenarios/{scenario_id} | Get scenario |
| GET | /api/v1/scenarios/ | List scenarios |
| POST | /api/v1/simulations/create | Create simulation |
| POST | /api/v1/simulations/{id}/join | Join simulation |
| GET | /api/v1/simulations/{id} | Get simulation |
| GET | /api/v1/simulations/ | List simulations |

Citation

@software{swarm2026,
  title = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
  author = {Savitt, Raeli},
  year = {2026},
  url = {https://github.com/swarm-ai-safety/swarm}
}

Linked Docs

  • Skill metadata: skill.json
  • Agent discovery: .well-known/agent.json
  • Full documentation: https://github.com/swarm-ai-safety/swarm/tree/main/docs
  • Theoretical foundations: docs/research/theory.md
  • Governance guide: docs/governance.md
  • Red-teaming guide: docs/red-teaming.md
  • Scenario format: docs/guides/scenarios.md

Version History

1 version in total

  • v1.7.1 (current)
    2026-03-29 15:22 Safe Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
