
LoRA Pipeline

Manages end-to-end LoRA training: collects and verifies reference photos, scrapes datasets, applies quality checks, captions images locally, and trains the LoRA model with Kohya on RunPod.
Author: iskwang
Source: clawhub | Version: v1.0.0 (latest, 1 version) | Category: uncategorized | API key: not required

Overview

LoRA Pipeline

Orchestrates the full LoRA dataset-to-model pipeline. Each phase is self-contained and can be delegated to a sub-agent independently.


Pipeline Overview

Phase 1: Collect reference photos → collect 3–6 reference face photos
Phase 2: Confirm face identity    → user confirms refs; DeepFace cross-check
Phase 3: Collect datasets         → scrape web sources guided by face features
Phase 4: Verify photos            → face verify + dedup + quality filter + crop
Phase 5: Caption                  → WD14 local tagging + trigger word
Phase 6: LoRA training            → RunPod Kohya training → retrieve outputs
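
The dedup step in Phase 4 can be sketched as follows. This is a minimal sketch, assuming exact byte-level duplicates only (SHA-256 of file contents); the pipeline's actual dedup method is not described here, and near-duplicate detection would need a different technique such as perceptual hashing. The function names `file_digest` and `find_exact_duplicates` are hypothetical. Files are hashed as opaque blobs, in line with the privacy rules.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(directory: Path) -> list[Path]:
    """Return files whose bytes match an earlier file (in name order)."""
    seen: dict[str, Path] = {}
    duplicates: list[Path] = []
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)  # same bytes as seen[digest]
        else:
            seen[digest] = path
    return duplicates
```

The first file in name order is treated as the original; what to do with the returned duplicates (delete, move aside) is left to the caller.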

Phase Index

| Phase | File | Can Sub-Agent | Model | Est. Time |
|-------|------|:---:|-------|-----------|
| 01: Reference Collection | phases/01-reference.md | | Haiku (Worker) | 5–10 min |
| 02: Scraping | phases/02-scraping.md | | Haiku (Worker) | 10–30 min |
| 03: Verify & Clean | phases/03-verify.md | | Haiku (Worker) | 2–5 min |
| 04: Caption | phases/04-caption.md | | Haiku (Worker) | 1–3 min |
| 05: Training | phases/05-training.md | | Haiku (Worker) + Sentry | 15–30 min |

To load a specific phase: read skills/lora-pipeline/phases/ — each file is independently readable.
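
As an illustration of loading a single phase, the lookup below maps a phase number to its file. It is a sketch only: the file names are copied from the Phase Index, while `PHASE_FILES` and `phase_file` are hypothetical names and not part of the skill.

```python
from pathlib import Path

PHASES_DIR = Path("skills/lora-pipeline/phases")

# File names as listed in the Phase Index.
PHASE_FILES = {
    1: "01-reference.md",
    2: "02-scraping.md",
    3: "03-verify.md",
    4: "04-caption.md",
    5: "05-training.md",
}


def phase_file(number: int) -> Path:
    """Return the relative path of the instruction file for a phase."""
    try:
        return PHASES_DIR / PHASE_FILES[number]
    except KeyError:
        raise ValueError(f"unknown phase {number}; expected 1-5") from None
```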


Directory Structure

~/.openclaw/workspace/
└── datasets/
    ├── face_references/
    │   └── <lora_name>/          # Phase 1–2: Gold standard refs (3–6 photos)
    │       ├── ref_01.jpg
    │       └── ...
    ├── <lora_name>_raw/          # Phase 3: Raw scraped images (pre-verification)
    │   └── ...
    └── <lora_name>/              # Phase 4–5: Verified + captioned training set
        ├── image001.png
        ├── image001.txt
        └── ...
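
The layout above could be created up front with a small helper. This is a sketch under assumptions: the helper names and the dictionary keys (`references`, `raw`, `final`) are hypothetical, and only the directory names come from the tree.

```python
from pathlib import Path

DEFAULT_WORKSPACE = Path.home() / ".openclaw" / "workspace"


def dataset_dirs(lora_name: str, workspace: Path = DEFAULT_WORKSPACE) -> dict[str, Path]:
    """Return the three dataset directories for one LoRA, without creating them."""
    reserved = {"", ".", "..", "face_references"}
    if lora_name in reserved or "/" in lora_name or "\\" in lora_name:
        raise ValueError(f"invalid lora_name: {lora_name!r}")
    root = workspace / "datasets"
    return {
        "references": root / "face_references" / lora_name,  # gold-standard refs
        "raw": root / f"{lora_name}_raw",                     # scraped, unverified
        "final": root / lora_name,                            # verified + captioned
    }


def create_dataset_dirs(lora_name: str, workspace: Path = DEFAULT_WORKSPACE) -> dict[str, Path]:
    """Create the directories if missing and return their paths."""
    dirs = dataset_dirs(lora_name, workspace)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Rejecting names that contain path separators keeps a LoRA name from escaping the `datasets/` folder.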

Privacy Rules (CRITICAL — All Phases)

  • NO DATA INSPECTION: Do NOT cat, read, or analyze image file contents or .txt caption files.
  • NO CLOUD UPLOAD: All face verification (DeepFace) must run locally. Never send images to cloud APIs.
  • NO DATA LEAKAGE: Do not describe dataset details (person names, attributes) to the LLM unnecessarily.
  • Treat datasets as opaque binary blobs except when running local scripts.

Quality Standards (SDXL)

  • Resolution: 1024×1024 minimum after crop
  • Format: Convert all to PNG before training
  • No black borders: Run autocrop before final save
  • Dataset diversity: ≥30% clothed/natural skin shots
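
The resolution rule can be checked without decoding any pixels. The sketch below reads only the 24-byte PNG header in a local script, using the standard library; it assumes the image has already been converted to PNG. The helper names are hypothetical, and the skill's own scripts may check resolution differently.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 1024  # SDXL minimum from the quality standards


def png_size(path: Path) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk."""
    with path.open("rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def meets_min_resolution(path: Path, min_side: int = MIN_SIDE) -> bool:
    """True if both sides are at least min_side pixels."""
    width, height = png_size(path)
    return width >= min_side and height >= min_side
```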

Scripts

| Script | Location | Purpose |
|--------|----------|---------|
| tag_batch.py | skills/lora-pipeline/scripts/tag_batch.py | Local WD14 ONNX tagger for a directory |
| smart_crop.py | skills/lora-pipeline/scripts/smart_crop.py | Interactive or automated single-subject cropping |
| batch_lora_train.py | skills/lora-pipeline/scripts/batch_lora_train.py | Kohya batch training runner for RunPod |

Sub-Agent Protocol

Each phase file contains:

  1. Input Contract — what must already exist before this phase starts
  2. Output Contract — what this phase produces
  3. Completion Signal — how to report back (sessions_send + status file fallback)
  4. Error Escalation — sub-agent reports to parent, never self-escalates model tier
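
The status-file fallback in the completion signal might look like the sketch below. The JSON field names and the `write_status` helper are assumptions, not the skill's actual schema, and the `sessions_send` call is not shown because its interface is not described here. The message should carry no dataset details, in keeping with the privacy rules.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_status(status_path: Path, phase: str, ok: bool, message: str = "") -> dict:
    """Write a phase status file atomically and return the payload."""
    payload = {
        "phase": phase,
        "status": "done" if ok else "error",
        "message": message,  # keep generic: no names or attributes
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    status_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename, so the
    # parent agent never reads a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=status_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    os.replace(tmp_name, status_path)
    return payload
```

A parent agent can then poll for the file and read `status` to decide whether to start the next phase or handle the reported error.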

Version History

1 version total

  • v1.0.0 (current)
    2026-05-01 19:22 | Security: safe (both scans)

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
