← 返回
AI智能 中文

Cluster Agent Swarm

Complete Platform Agent Swarm — A coordinated multi-agent system for Kubernetes and OpenShift platform operations. Includes Orchestrator (Jarvis), Cluster Op...
完整平台代理群——Kubernetes 与 OpenShift 平台运营的协同多代理系统,包含编排器(Jarvis)及集群运维功能。
kcns008
AI智能 clawhub v0.1.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 721
下载
💾 52
安装
1
版本
#latest

概述

Cluster Agent Swarm — Complete Platform Operations

This is the complete cluster-agent-swarm skill package. When you add this skill, you get

access to ALL 7 specialized agents working together as a coordinated swarm.

Installation Options

Install All Skills (Recommended)

npx skills add https://github.com/kcns008/cluster-agent-swarm-skills

This installs all 7 agents as a single combined skill with access to all capabilities.

Install Individual Skills

Each agent can also be installed separately:

# Orchestrator - Task routing and coordination
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/orchestrator

# Cluster Ops - Atlas (cluster operations)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/cluster-ops

# GitOps - Flow (ArgoCD, Helm, Kustomize)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/gitops

# Security - Shield (RBAC, policies, CVEs)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/security

# Observability - Pulse (metrics, alerts, incidents)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/observability

# Artifacts - Cache (registries, SBOM, promotions)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/artifacts

# Developer Experience - Desk (namespaces, onboarding)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/developer-experience

The Swarm — Agent Roster

AgentCode NameSession KeyDomain
---------------------------------------
OrchestratorJarvisagent:platform:orchestratorTask routing, coordination, standups
Cluster OpsAtlasagent:platform:cluster-opsCluster lifecycle, nodes, upgrades
GitOpsFlowagent:platform:gitopsArgoCD, Helm, Kustomize, deploys
SecurityShieldagent:platform:securityRBAC, policies, secrets, scanning
ObservabilityPulseagent:platform:observabilityMetrics, logs, alerts, incidents
ArtifactsCacheagent:platform:artifactsRegistries, SBOM, promotion, CVEs
Developer ExperienceDeskagent:platform:developer-experienceNamespaces, onboarding, support

Agent Capabilities Summary

What Agents CAN Do

  • Read cluster state (kubectl get, kubectl describe, oc get)
  • Deploy via GitOps (argocd app sync, Flux reconciliation)
  • Create documentation and reports
  • Investigate and triage incidents
  • Provision standard resources (namespaces, quotas, RBAC)
  • Run health checks and audits
  • Scan images and generate SBOMs
  • Query metrics and logs
  • Execute pre-approved runbooks

What Agents CANNOT Do (Human-in-the-Loop Required)

  • Delete production resources (kubectl delete in prod)
  • Modify cluster-wide policies (NetworkPolicy, OPA, Kyverno cluster policies)
  • Make direct changes to secrets without rotation workflow
  • Modify network routes or service mesh configuration
  • Scale beyond defined resource limits
  • Perform irreversible cluster upgrades
  • Approve production deployments (can prepare, human approves)
  • Change RBAC at cluster-admin level

Communication Patterns

@Mentions

Agents communicate via @mentions in shared task comments:

@Shield Please review the RBAC for payment-service v3.2 before I sync.
@Pulse Is the CPU spike related to the deployment or external traffic?
@Atlas The staging cluster needs 2 more worker nodes.

Thread Subscriptions

  • Commenting on a task → auto-subscribe
  • Being @mentioned → auto-subscribe
  • Being assigned → auto-subscribe
  • Once subscribed → receive ALL future comments on heartbeat

Escalation Path

  1. Agent detects issue
  2. Agent attempts resolution within guardrails
  3. If blocked → @mention another agent or escalate to human
  4. P1 incidents → all relevant agents auto-notified

Heartbeat Schedule

Agents wake on staggered 5-minute intervals:

*/5  * * * *  Atlas   (Cluster Ops - needs fast response for incidents)
*/5  * * * *  Pulse   (Observability - needs fast response for alerts)
*/5  * * * *  Shield  (Security - fast response for CVEs and threats)
*/10 * * * *  Flow    (GitOps - deployments can wait a few minutes)
*/10 * * * *  Cache   (Artifacts - promotions are scheduled)
*/15 * * * *  Desk    (DevEx - developer requests aren't usually urgent)
*/15 * * * *  Orchestrator (Coordination - overview and standups)

Key Principles

  • Roles over genericism — Each agent has a defined SOUL with exactly who they are
  • Files over mental notes — Only files persist between sessions
  • Staggered schedules — Don't wake all agents at once
  • Shared context — One source of truth for tasks and communication
  • Heartbeat, not always-on — Balance responsiveness with cost
  • Human-in-the-loop — Critical actions require approval
  • Guardrails over freedom — Define what agents can and cannot do
  • Audit everything — Every action logged to activity feed
  • Reliability first — System stability always wins over new features
  • Security by default — Deny access, approve by exception

Detailed Agent Capabilities

Orchestrator (Jarvis)

  • Task routing: determining which agent should handle which request
  • Workflow orchestration: coordinating multi-agent operations
  • Daily standups: compiling swarm-wide status reports
  • Priority management: determining urgency and sequencing of work
  • Cross-agent communication: facilitating collaboration
  • Accountability: tracking what was promised vs what was delivered

Cluster Ops (Atlas)

  • OpenShift/Kubernetes cluster operations (upgrades, scaling, patching)
  • Node pool management and autoscaling
  • Resource quota management and capacity planning
  • Network troubleshooting (OVN-Kubernetes, Cilium, Calico)
  • Storage class management and PVC/CSI issues
  • etcd backup, restore, and health monitoring
  • Multi-platform expertise (OCP, EKS, AKS, GKE, ROSA, ARO)

GitOps (Flow)

  • ArgoCD application management (sync, rollback, sync waves, hooks)
  • Helm chart development, debugging, and templating
  • Kustomize overlays and patch generation
  • ApplicationSet templates for multi-cluster deployments
  • Deployment strategy management (canary, blue-green, rolling)
  • Git repository management and branching strategies
  • Drift detection and remediation
  • Secrets management integration (Vault, Sealed Secrets, External Secrets)

Security (Shield)

  • RBAC audit and management
  • NetworkPolicy review and enforcement
  • Security policy validation (OPA, Kyverno)
  • Vulnerability scanning (image scanning, CVE triage)
  • Secret rotation workflows
  • Security incident investigation
  • Compliance reporting

Observability (Pulse)

  • Prometheus/Grafana metric queries
  • Log aggregation and search (Loki, Elasticsearch)
  • Alert triage and investigation
  • SLO tracking and error budget monitoring
  • Incident response coordination
  • Dashboards and visualization
  • Telemetry pipeline troubleshooting

Artifacts (Cache)

  • Container registry management
  • Image scanning and CVE analysis
  • SBOM generation and tracking
  • Artifact promotion workflows
  • Version management
  • Registry caching and proxying

Developer Experience (Desk)

  • Namespace provisioning
  • Resource quota and limit range management
  • Developer onboarding
  • Template generation
  • Developer support and troubleshooting
  • Documentation generation

File Structure

cluster-agent-swarm-skills/
├── SKILL.md                    # This file - combined swarm
├── AGENTS.md                   # Swarm configuration and protocols
├── skills/
│   ├── orchestrator/           # Jarvis - task routing
│   │   └── SKILL.md
│   ├── cluster-ops/            # Atlas - cluster operations
│   │   └── SKILL.md
│   ├── gitops/                 # Flow - GitOps
│   │   └── SKILL.md
│   ├── security/               # Shield - security
│   │   └── SKILL.md
│   ├── observability/          # Pulse - monitoring
│   │   └── SKILL.md
│   ├── artifacts/              # Cache - artifacts
│   │   └── SKILL.md
│   └── developer-experience/   # Desk - DevEx
│       └── SKILL.md
├── scripts/                    # Shared scripts
└── references/                 # Shared documentation

Reference Documentation

For detailed capabilities of each agent, refer to individual SKILL.md files:

  • skills/orchestrator/SKILL.md - Full Orchestrator documentation
  • skills/cluster-ops/SKILL.md - Full Cluster Ops documentation
  • skills/gitops/SKILL.md - Full GitOps documentation
  • skills/security/SKILL.md - Full Security documentation
  • skills/observability/SKILL.md - Full Observability documentation
  • skills/artifacts/SKILL.md - Full Artifacts documentation
  • skills/developer-experience/SKILL.md - Full Developer Experience documentation

版本历史

共 1 个版本

  • v0.1.0 当前
    2026-03-30 08:21 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-intelligence

self-improving agent

pskoett
捕获经验教训、错误和纠正,以实现持续改进。使用时机:(1)命令或操作意外失败;(2)用户纠正……
★ 4,055 📥 795,889
ai-intelligence

ontology

oswalpalash
类型化知识图谱,用于结构化智能体记忆与可组合技能。支持创建/查询实体(人员、项目、任务、事件、文档)及关联...
★ 709 📥 243,527
developer-tools

Kubernetes Agent Swarm

kcns008
Kubernetes 与 OpenShift 平台代理集群 — 用于集群操作的协调式多代理系统,包含编排器 (Jarvis)、集群运维 (Atlas) 等。
★ 6 📥 8,239