
F5tts Monitor

Monitor F5-TTS distributed training on the 9-GPU mining rig (Local-LLM) without interfering with the process.
pbseiya
AI Intelligence clawhub v1.0.0 1 version 100000 Key: not required

Overview

F5-TTS Mining Rig Monitor Skill

This skill provides instructions for ADA to safely monitor the ongoing F5-TTS training process on the 9-GPU mining rig (Local-LLM), without interfering with the data or environment.

IMPORTANT:

  1. The training dataset and checkpoints are strictly located on the HDD of the mining rig at /mnt/toshiba/projects/F5-TTS/.
  2. Do not attempt to run training locally on asus-z170k.
  3. Use uv exclusively when interacting with the Python environment on the mining rig.

Steps to Monitor Training

1. Check GPU Utilization

To confirm that all 9 GPUs are actively training and that none is bottlenecked or has hit an out-of-memory (OOM) error, run the following command over SSH. If you wrap it in watch, allocate a pseudo-terminal with ssh -t:

ssh Local-LLM "nvidia-smi"

You should see 9 python3 processes consistently consuming ~11GB of VRAM each.
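
The same check can be scripted. The sketch below assumes nvidia-smi's CSV query mode (--query-gpu with --format=csv,noheader,nounits) is available on the rig; the function name count_busy_gpus and the 10000 MiB threshold are illustrative choices, not part of this skill.

```shell
# Hypothetical helper: count GPUs whose used memory is at or above a threshold.
# Reads "index, memory.used, utilization.gpu, temperature.gpu" CSV lines on stdin.
# The default threshold (10000 MiB) is an assumption, set a little below the
# ~11GB per GPU that a healthy run is expected to show.
count_busy_gpus() {
    awk -F', *' -v min="${1:-10000}" '$2 + 0 >= min { n++ } END { print n + 0 }'
}

# Assumed usage (a read-only query that does not touch the training process):
# ssh Local-LLM "nvidia-smi --query-gpu=index,memory.used,utilization.gpu,temperature.gpu --format=csv,noheader,nounits" | count_busy_gpus
```

A result below 9 suggests that at least one training process has exited or is not holding its expected memory. The fourth CSV column also supplies the GPU temperature needed for the heartbeat report.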

2. Check Training Epoch Progress

Check the Accelerate training logs to see the current epoch and global step:

ssh Local-LLM "tail -n 100 /mnt/toshiba/projects/F5-TTS/outputs/training_mining_rig.log"

Look for the Epoch: and Step: values and confirm that they advance between checks.
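
To pull out only the latest progress line, a small filter can be applied to the tail output. This is a sketch: the exact log line format is not shown in this skill, so the pattern below (a line containing Epoch followed by Step) is an assumption, and latest_progress is a hypothetical name.

```shell
# Hypothetical helper: print the most recent log line mentioning Epoch and Step.
# Carriage returns are converted to newlines first, because progress bars often
# rewrite the same terminal line. The 'Epoch.*Step' pattern is an assumption
# about the log format; adjust it to match the real lines.
latest_progress() {
    tr '\r' '\n' | grep -E 'Epoch.*Step' | tail -n 1
}

# Assumed usage:
# ssh Local-LLM "tail -n 100 /mnt/toshiba/projects/F5-TTS/outputs/training_mining_rig.log" | latest_progress
```

If the function prints nothing, the last 100 lines contain no matching line, which may mean the pattern needs adjusting or the run has stalled.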

3. Check System RAM and CPU Load

The mining rig only has a 2-core Pentium CPU and 16GB of RAM. Make sure the system isn't buckling under the DDP overhead:

ssh Local-LLM "free -h && uptime"
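
For a scripted check, the relevant numbers can be extracted rather than read by eye. The sketch below assumes the procps-ng layout of free -m (where the seventh field of the Mem: line is the available memory) and the standard /proc/loadavg format; both function names are hypothetical.

```shell
# Hypothetical helper: print the "available" column (MiB) from `free -m` output.
# Assumes the column order: total used free shared buff/cache available.
mem_available_mb() {
    awk '$1 == "Mem:" { print $7 }'
}

# Hypothetical helper: succeed (exit 0) when the 1-minute load average exceeds
# the core count. Reads /proc/loadavg-style text on stdin; the core count
# defaults to 2 to match the rig's Pentium CPU.
load_exceeds_cores() {
    awk -v cores="${1:-2}" 'NR == 1 { over = ($1 + 0 > cores) } END { exit !over }'
}

# Assumed usage:
# ssh Local-LLM "free -m" | mem_available_mb
# ssh Local-LLM "cat /proc/loadavg" | load_exceeds_cores 2 && echo "load above core count"
```

With nine data-loading processes on two cores, a load above the core count may be normal for this rig, so treat the second check as a hint to compare against earlier readings rather than as an alarm.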

4. Update the Heartbeat

After successfully probing the status, update your HEARTBEAT.md files locally to report the current Epoch, Step, GPU temperature, and estimated time remaining to Master Seiya.
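
A minimal sketch of the heartbeat update follows. The skill does not specify the layout of HEARTBEAT.md, so the one-line entry format, the choice to append rather than overwrite, and the function name append_heartbeat are all assumptions.

```shell
# Hypothetical helper: append one timestamped status line to a heartbeat file.
# Arguments: file, epoch, step, GPU temperature (C), estimated time remaining.
# The entry layout is an assumption; adapt it to the real HEARTBEAT.md format.
append_heartbeat() {
    hb_file="$1"
    hb_line="- $(date -u +%Y-%m-%dT%H:%M:%SZ) | Epoch $2 | Step $3 | GPU max temp $4 C | ETA $5"
    printf '%s\n' "$hb_line" >> "$hb_file"
}

# Assumed usage, with values gathered from the earlier checks:
# append_heartbeat HEARTBEAT.md 3 1300 71 "unknown"
```

The file is written on the local machine only, so this step does not touch the mining rig or the training process.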

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-30 15:32, both security scans: safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
