Pilot ML Training Pipeline Setup

Deploy an end-to-end ML training pipeline with 4 agents. Use this skill when: 1. User wants to set up a machine learning training pipeline 2. User is configu...
Author: teoslayer
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · Key: not required

Overview

ML Training Pipeline Setup

Deploy 4 agents spanning data prep, training, evaluation, and serving.

Roles

| Role | Hostname | Skills | Purpose |
|------|----------|--------|---------|
| data-prep | <prefix>-data-prep | pilot-dataset, pilot-share, pilot-task-chain | Cleans and transforms datasets |
| trainer | <prefix>-trainer | pilot-dataset, pilot-model-share, pilot-metrics, pilot-task-chain | Trains models, tracks metrics |
| evaluator | <prefix>-evaluator | pilot-model-share, pilot-metrics, pilot-review, pilot-task-chain | Evaluates and gates promotion |
| serving | <prefix>-serving | pilot-model-share, pilot-health, pilot-webhook-bridge, pilot-load-balancer, pilot-metrics | Serves inference requests |

Setup Procedure

Step 1: Ask the user which role this agent should play and what prefix to use.

Step 2: Install the skills for the chosen role:

# For data-prep:
clawhub install pilot-dataset pilot-share pilot-task-chain
# For trainer:
clawhub install pilot-dataset pilot-model-share pilot-metrics pilot-task-chain
# For evaluator:
clawhub install pilot-model-share pilot-metrics pilot-review pilot-task-chain
# For serving:
clawhub install pilot-model-share pilot-health pilot-webhook-bridge pilot-load-balancer pilot-metrics
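
The per-role install commands above can also be generated from a single lookup table, which helps keep the skill lists from drifting apart. The following is a minimal sketch: `ROLE_SKILLS` and `install_command` are hypothetical names introduced here for illustration, and the only CLI usage assumed is the `clawhub install <skills...>` form shown above.

```python
# Role-to-skill mapping copied from the Roles section of this document.
ROLE_SKILLS = {
    "data-prep": ["pilot-dataset", "pilot-share", "pilot-task-chain"],
    "trainer": ["pilot-dataset", "pilot-model-share", "pilot-metrics", "pilot-task-chain"],
    "evaluator": ["pilot-model-share", "pilot-metrics", "pilot-review", "pilot-task-chain"],
    "serving": ["pilot-model-share", "pilot-health", "pilot-webhook-bridge",
                "pilot-load-balancer", "pilot-metrics"],
}


def install_command(role: str) -> list[str]:
    """Return the argv for installing one role's skills (hypothetical helper)."""
    if role not in ROLE_SKILLS:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(ROLE_SKILLS)}")
    return ["clawhub", "install", *ROLE_SKILLS[role]]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(install_command("trainer")))
```

The helper only builds the argument list; passing it to `subprocess.run` is left to the caller.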

Step 3: Set the hostname:

pilotctl --json set-hostname <prefix>-<role>
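
Since every hostname follows the `<prefix>-<role>` pattern, it may be worth validating both parts before calling `pilotctl`. The sketch below is illustrative only: `hostname_for` and `set_hostname_command` are hypothetical helpers, and the prefix rule (lowercase letters, digits, and inner hyphens) is an assumption borrowed from DNS label conventions, not something this document says `pilotctl` enforces.

```python
import re

ROLES = ("data-prep", "trainer", "evaluator", "serving")

# Assumption: DNS-label-style prefixes. The actual rules pilotctl applies
# to hostnames are not stated in this document.
_PREFIX_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def hostname_for(prefix: str, role: str) -> str:
    """Compose '<prefix>-<role>' after checking both parts."""
    if role not in ROLES:
        raise ValueError(f"unknown role {role!r}")
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"invalid prefix {prefix!r}")
    return f"{prefix}-{role}"


def set_hostname_command(prefix: str, role: str) -> list[str]:
    """Return the argv matching the set-hostname command shown above."""
    return ["pilotctl", "--json", "set-hostname", hostname_for(prefix, role)]


if __name__ == "__main__":
    print(" ".join(set_hostname_command("acme", "data-prep")))
```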

Step 4: Write the role-specific JSON manifest to ~/.pilot/setups/ml-training-pipeline.json.
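
One way to write the manifest is sketched below. `write_manifest` is a hypothetical helper, not part of any Pilot tooling; it creates the `~/.pilot/setups/` directory if needed and writes through a temporary file, so a crash mid-write should not leave a half-written manifest behind. The write-then-rename approach is a design choice of this sketch, not a requirement stated by the source.

```python
import json
from pathlib import Path
from typing import Optional

# Path from Step 4, relative to the user's home directory.
MANIFEST_RELATIVE = Path(".pilot") / "setups" / "ml-training-pipeline.json"


def write_manifest(manifest: dict, path: Optional[Path] = None) -> Path:
    """Serialize the manifest as JSON and return the path written."""
    target = path if path is not None else Path.home() / MANIFEST_RELATIVE
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a sibling temp file first, then rename it into place.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    tmp.replace(target)
    return target
```

Passing an explicit `path` makes it possible to try the helper in a scratch directory before touching the real location.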

Step 5: Tell the user to initiate handshakes with the peers this agent communicates with directly (the hostnames listed under handshakes_needed in its manifest).

Manifest Templates Per Role

data-prep

{
  "setup": "ml-training-pipeline", "role": "data-prep", "role_name": "Data Preparation",
  "hostname": "<prefix>-data-prep",
  "description": "Cleans, validates, and transforms raw datasets. Shares processed data with the trainer.",
  "skills": {
    "pilot-dataset": "Exchange structured datasets with schema negotiation.",
    "pilot-share": "Send cleaned dataset files to <prefix>-trainer.",
    "pilot-task-chain": "Chain data prep steps into sequential pipeline."
  },
  "peers": [{ "role": "trainer", "hostname": "<prefix>-trainer", "description": "Receives prepared datasets" }],
  "data_flows": [{ "direction": "send", "peer": "<prefix>-trainer", "port": 1001, "topic": "dataset-ready", "description": "Cleaned datasets" }],
  "handshakes_needed": ["<prefix>-trainer"]
}
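
Each template uses the literal placeholder `<prefix>`, which has to be replaced with the user's chosen prefix before the manifest is written. A minimal sketch of that substitution follows; `fill_prefix` is a hypothetical helper, and `TEMPLATE` is an abbreviated copy of the data-prep manifest above (the `description` and `skills` fields are omitted for brevity).

```python
import json

# Abbreviated copy of the data-prep template above.
TEMPLATE = {
    "setup": "ml-training-pipeline",
    "role": "data-prep",
    "hostname": "<prefix>-data-prep",
    "peers": [{"role": "trainer", "hostname": "<prefix>-trainer"}],
    "data_flows": [{"direction": "send", "peer": "<prefix>-trainer",
                    "port": 1001, "topic": "dataset-ready"}],
    "handshakes_needed": ["<prefix>-trainer"],
}


def fill_prefix(node, prefix: str):
    """Recursively replace '<prefix>' in every string of a JSON-like value."""
    if isinstance(node, str):
        return node.replace("<prefix>", prefix)
    if isinstance(node, list):
        return [fill_prefix(item, prefix) for item in node]
    if isinstance(node, dict):
        return {key: fill_prefix(value, prefix) for key, value in node.items()}
    return node  # numbers, booleans, and None pass through unchanged


if __name__ == "__main__":
    print(json.dumps(fill_prefix(TEMPLATE, "acme"), indent=2))
```

The function returns a new structure and leaves the template untouched, so the same template can be reused for several prefixes.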

trainer

{
  "setup": "ml-training-pipeline", "role": "trainer", "role_name": "Model Trainer",
  "hostname": "<prefix>-trainer",
  "description": "Receives prepared datasets, runs training jobs, tracks metrics, and shares trained model artifacts.",
  "skills": {
    "pilot-dataset": "Receive prepared datasets from data-prep.",
    "pilot-model-share": "Send trained model checkpoints to evaluator.",
    "pilot-metrics": "Track and publish training loss, accuracy, epochs.",
    "pilot-task-chain": "Chain training steps sequentially."
  },
  "peers": [
    { "role": "data-prep", "hostname": "<prefix>-data-prep", "description": "Sends prepared datasets" },
    { "role": "evaluator", "hostname": "<prefix>-evaluator", "description": "Receives trained models" }
  ],
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-data-prep", "port": 1001, "topic": "dataset-ready", "description": "Cleaned datasets" },
    { "direction": "send", "peer": "<prefix>-evaluator", "port": 1001, "topic": "training-complete", "description": "Model checkpoints and metrics" }
  ],
  "handshakes_needed": ["<prefix>-data-prep", "<prefix>-evaluator"]
}

evaluator

{
  "setup": "ml-training-pipeline", "role": "evaluator", "role_name": "Model Evaluator",
  "hostname": "<prefix>-evaluator",
  "description": "Scores trained models against benchmarks and gates promotion to serving.",
  "skills": {
    "pilot-model-share": "Receive models from trainer, promote approved models to serving.",
    "pilot-metrics": "Compare benchmarks, detect drift.",
    "pilot-review": "Gate model promotion with approval workflow.",
    "pilot-task-chain": "Chain evaluation steps."
  },
  "peers": [
    { "role": "trainer", "hostname": "<prefix>-trainer", "description": "Sends trained models" },
    { "role": "serving", "hostname": "<prefix>-serving", "description": "Receives approved models" }
  ],
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-trainer", "port": 1001, "topic": "training-complete", "description": "Model checkpoints" },
    { "direction": "send", "peer": "<prefix>-serving", "port": 1001, "topic": "model-approved", "description": "Approved models" },
    { "direction": "receive", "peer": "<prefix>-serving", "port": 1002, "topic": "inference-metrics", "description": "Drift detection data" }
  ],
  "handshakes_needed": ["<prefix>-trainer", "<prefix>-serving"]
}

serving

{
  "setup": "ml-training-pipeline", "role": "serving", "role_name": "Model Server",
  "hostname": "<prefix>-serving",
  "description": "Loads approved models, serves inference, monitors health, and load-balances.",
  "skills": {
    "pilot-model-share": "Receive approved models from evaluator.",
    "pilot-health": "Monitor inference endpoint health and latency.",
    "pilot-webhook-bridge": "Trigger external alerts on serving failures.",
    "pilot-load-balancer": "Distribute inference requests across replicas.",
    "pilot-metrics": "Report QPS, latency, drift metrics to evaluator."
  },
  "peers": [{ "role": "evaluator", "hostname": "<prefix>-evaluator", "description": "Sends approved models, receives metrics" }],
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-evaluator", "port": 1001, "topic": "model-approved", "description": "Approved models" },
    { "direction": "send", "peer": "<prefix>-evaluator", "port": 1002, "topic": "inference-metrics", "description": "Inference metrics for drift" }
  ],
  "handshakes_needed": ["<prefix>-evaluator"]
}
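
In all four templates above, the hostnames under `peers`, the `peer` values in `data_flows`, and the entries in `handshakes_needed` describe the same set of machines. A small consistency check can catch a manifest where these have drifted apart after hand editing. This is a sketch under that observation, not a rule documented by the Pilot tooling; `check_manifest` is a hypothetical helper and `SERVING` is an abbreviated copy of the serving template.

```python
def check_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means consistent."""
    problems = []
    peer_hosts = {peer["hostname"] for peer in manifest.get("peers", [])}
    flow_hosts = {flow["peer"] for flow in manifest.get("data_flows", [])}
    handshakes = set(manifest.get("handshakes_needed", []))

    undeclared = flow_hosts - peer_hosts
    if undeclared:
        problems.append(f"data_flows reference undeclared peers: {sorted(undeclared)}")
    if handshakes != peer_hosts:
        problems.append(
            f"handshakes_needed {sorted(handshakes)} differs from peers {sorted(peer_hosts)}"
        )
    for flow in manifest.get("data_flows", []):
        if flow.get("direction") not in ("send", "receive"):
            problems.append(f"bad direction in flow to {flow.get('peer')!r}")
    return problems


# Abbreviated copy of the serving template above.
SERVING = {
    "role": "serving",
    "hostname": "<prefix>-serving",
    "peers": [{"role": "evaluator", "hostname": "<prefix>-evaluator"}],
    "data_flows": [
        {"direction": "receive", "peer": "<prefix>-evaluator", "port": 1001, "topic": "model-approved"},
        {"direction": "send", "peer": "<prefix>-evaluator", "port": 1002, "topic": "inference-metrics"},
    ],
    "handshakes_needed": ["<prefix>-evaluator"],
}

if __name__ == "__main__":
    print(check_manifest(SERVING))
```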

Data Flows

  • data-prep → trainer : cleaned datasets (port 1001)
  • trainer → evaluator : model checkpoints and metrics (port 1001)
  • evaluator → serving : approved models (port 1001)
  • serving → evaluator : inference metrics for drift detection (port 1002)
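
The four flows listed above fully determine each role's `data_flows` entries: a role sends on the flows where it is the source and receives on the flows where it is the destination. The sketch below derives those entries from one shared table. `FLOWS` and `data_flows_for` are hypothetical names; the topic names are taken from the manifest templates, and the `description` field is left out for brevity.

```python
# (sender role, receiver role, port, topic), in the order listed above.
FLOWS = [
    ("data-prep", "trainer", 1001, "dataset-ready"),
    ("trainer", "evaluator", 1001, "training-complete"),
    ("evaluator", "serving", 1001, "model-approved"),
    ("serving", "evaluator", 1002, "inference-metrics"),
]


def data_flows_for(role: str, prefix: str) -> list[dict]:
    """Build the data_flows entries for one role from the shared flow table."""
    entries = []
    for sender, receiver, port, topic in FLOWS:
        if role == sender:
            entries.append({"direction": "send", "peer": f"{prefix}-{receiver}",
                            "port": port, "topic": topic})
        elif role == receiver:
            entries.append({"direction": "receive", "peer": f"{prefix}-{sender}",
                            "port": port, "topic": topic})
    return entries


if __name__ == "__main__":
    for entry in data_flows_for("evaluator", "acme"):
        print(entry)
```

Keeping the flows in one table means a port or topic change is made once, and both ends of the flow pick it up.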

Workflow Example

# On data-prep:
pilotctl --json send-file <prefix>-trainer ./datasets/training-v5.parquet
pilotctl --json publish <prefix>-trainer dataset-ready '{"name":"training-v5","rows":150000}'
# On trainer:
pilotctl --json send-file <prefix>-evaluator ./models/resnet-v5.pt
pilotctl --json publish <prefix>-evaluator training-complete '{"model":"resnet-v5","accuracy":0.967}'
# On evaluator:
pilotctl --json send-file <prefix>-serving ./models/resnet-v5.pt
pilotctl --json publish <prefix>-serving model-approved '{"model":"resnet-v5","benchmark":0.971}'
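
Each stage in the workflow above follows the same two-step handoff: send the artifact with `send-file`, then announce it with `publish`. When scripting this, building the JSON payload with a serializer and quoting it with `shlex` avoids hand-escaping mistakes. The sketch below only prints the commands; `handoff_commands` is a hypothetical helper, `acme` is an example prefix, and the argument order is copied from the examples above rather than from `pilotctl` documentation.

```python
import json
import shlex


def handoff_commands(peer: str, artifact: str, topic: str, payload: dict) -> list[list[str]]:
    """Return the send-file and publish argv lists for one handoff."""
    send = ["pilotctl", "--json", "send-file", peer, artifact]
    # Compact separators reproduce the payload style used in the examples.
    message = json.dumps(payload, separators=(",", ":"))
    publish = ["pilotctl", "--json", "publish", peer, topic, message]
    return [send, publish]


if __name__ == "__main__":
    commands = handoff_commands(
        "acme-trainer",
        "./datasets/training-v5.parquet",
        "dataset-ready",
        {"name": "training-v5", "rows": 150000},
    )
    for argv in commands:
        # shlex.join quotes the JSON payload so the line is safe to paste into a shell.
        print(shlex.join(argv))
```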

Dependencies

Requires the pilot-protocol skill, the pilotctl and clawhub binaries, and a running daemon.
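
A quick preflight check for the two binaries can be sketched as follows. `missing_binaries` is a hypothetical helper that only looks for `pilotctl` and `clawhub` on `PATH`; it does not verify that the pilot-protocol skill is installed or that the daemon is running, since this document does not say how to query either.

```python
import shutil

REQUIRED_BINARIES = ("pilotctl", "clawhub")


def missing_binaries(which=shutil.which) -> list[str]:
    """Return the required binaries that cannot be found on PATH."""
    # `which` is injectable so the check can be exercised without the tools installed.
    return [name for name in REQUIRED_BINARIES if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing binaries:", ", ".join(missing))
    else:
        print("pilotctl and clawhub found on PATH")
```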

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-07 17:59 · Safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk