
Weights & Biases Monitor

Monitor and analyze Weights & Biases training runs. Use when checking training status, detecting failures, analyzing loss curves, comparing runs, or monitoring experiments. Triggers on "wandb", "training runs", "how's training", "did my run finish", "any failures", "check experiments", "loss curve", "gradient norm", "compare runs".
chrisvoncsefalvay
Data analysis · clawhub · v1.0.0 · 1 version · API key required
★ 1 star · 📥 2,120 downloads · 💾 42 installs · #latest

Overview

Weights & Biases

Monitor, analyze, and compare W&B training runs.

Setup

wandb login
# Or set WANDB_API_KEY in environment
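Both credential sources above can be checked before running any script. The sketch below is a minimal, stdlib-only check that does not import wandb. It assumes that `wandb login` saves the key in `~/.netrc` under the host `api.wandb.ai`, which is the default for the hosted service. The function name `find_wandb_api_key` is hypothetical and not part of the wandb package.

```python
import netrc
import os
from typing import Optional


def find_wandb_api_key(netrc_path: Optional[str] = None,
                       host: str = "api.wandb.ai") -> Optional[str]:
    """Hypothetical helper: locate a W&B API key without importing wandb.

    Checks WANDB_API_KEY first, then a netrc entry for `host`
    (assumed to be where `wandb login` saves the key).
    """
    key = os.environ.get("WANDB_API_KEY")
    if key:
        return key
    path = netrc_path or os.path.expanduser("~/.netrc")
    try:
        entry = netrc.netrc(path).authenticators(host)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed netrc file: no stored credentials.
        return None
    # authenticators() returns a (login, account, password) tuple or None.
    return entry[2] if entry else None


if __name__ == "__main__":
    print("API key found" if find_wandb_api_key() else "Run `wandb login` first")
```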

Scripts

Characterize a Run (Full Health Analysis)

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/characterize_run.py ENTITY/PROJECT/RUN_ID

Analyzes:

  • Loss curve trend (start → current, % change, direction)
  • Gradient norm health (exploding/vanishing detection)
  • Eval metrics (if present)
  • Stall detection (heartbeat age)
  • Progress & ETA estimate
  • Config highlights
  • Overall health verdict

Options: --json for machine-readable output.
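The loss-trend and ETA items in the list above reduce to simple arithmetic. The sketch below only illustrates the idea and is not the logic inside `characterize_run.py`. The function names, the 1% "flat" tolerance, and the linear (constant-rate) ETA model are all assumptions.

```python
from typing import Sequence

# Hypothetical helpers for illustration; not the code in characterize_run.py.


def loss_trend(losses: Sequence[float], flat_tol_pct: float = 1.0) -> dict:
    """Summarize a loss series as start -> current, % change, and direction.

    flat_tol_pct is a hypothetical tolerance: changes smaller than this
    many percent are reported as "flat".
    """
    if len(losses) < 2:
        raise ValueError("need at least two loss values")
    start, current = losses[0], losses[-1]
    if start == 0:
        raise ValueError("starting loss is zero; percent change is undefined")
    pct_change = (current - start) / abs(start) * 100.0
    if pct_change <= -flat_tol_pct:
        direction = "decreasing"
    elif pct_change >= flat_tol_pct:
        direction = "increasing"
    else:
        direction = "flat"
    return {"start": start, "current": current,
            "pct_change": pct_change, "direction": direction}


def eta_seconds(current_step: int, total_steps: int, elapsed_seconds: float) -> float:
    """Linear ETA: assume the remaining steps run at the average rate so far."""
    if current_step <= 0:
        raise ValueError("no progress yet; ETA is undefined")
    remaining = max(total_steps - current_step, 0)
    return elapsed_seconds * remaining / current_step


print(loss_trend([2.0, 1.5, 1.0]))  # → {'start': 2.0, 'current': 1.0, 'pct_change': -50.0, 'direction': 'decreasing'}
print(eta_seconds(250, 1000, 3600))  # → 10800.0
```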

Watch All Running Jobs

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/watch_runs.py ENTITY [--projects p1,p2]

Quick health summary of all running jobs plus recent failures/completions. Ideal for morning briefings.

Options:

  • --projects p1,p2 — Specific projects to check
  • --all-projects — Check all projects
  • --hours N — Hours to look back for finished runs (default: 24)
  • --json — Machine-readable output
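You can think of the `--hours` window as a filter over run states and end times. Here is a minimal sketch. It assumes each run is a plain dict with hypothetical `name`, `state`, and timezone-aware `ended_at` fields, which is not necessarily the script's actual data model. It also treats both `failed` and `crashed` as failures.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional

# Hypothetical run records: plain dicts with name/state/ended_at fields.


def briefing(runs: Iterable[dict], hours: float = 24,
             now: Optional[datetime] = None) -> Dict[str, List[str]]:
    """Group run names into running / failed / completed for a briefing.

    Running jobs are always listed; finished runs count only if they
    ended within the last `hours` hours.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    out: Dict[str, List[str]] = {"running": [], "failed": [], "completed": []}
    for run in runs:
        state = run["state"]
        if state == "running":
            out["running"].append(run["name"])
        elif run.get("ended_at") is not None and run["ended_at"] >= cutoff:
            if state in ("failed", "crashed"):
                out["failed"].append(run["name"])
            elif state == "finished":
                out["completed"].append(run["name"])
    return out
```

A shorter `hours` value narrows the finished/failed lists without affecting which running jobs appear.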

Compare Two Runs

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/compare_runs.py ENTITY/PROJECT/RUN_A ENTITY/PROJECT/RUN_B

Side-by-side comparison:

  • Config differences (highlights important params)
  • Loss curves at same steps
  • Gradient norm comparison
  • Eval metrics
  • Performance (tokens/sec, steps/hour)
  • Winner verdict
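Two small helpers can sketch the config-diff and "loss curves at same steps" items. This is an illustrative assumption, not the implementation in `compare_runs.py`. `config_diff` takes plain dicts such as `dict(run.config)`, and `aligned_losses` expects step-to-loss mappings built from each run's history. Both names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical helpers; compare_runs.py may work differently.
MISSING = "<unset>"  # placeholder for keys present in only one config


def config_diff(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    diff = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, MISSING), b.get(key, MISSING)
        if va != vb:
            diff[key] = (va, vb)
    return diff


def aligned_losses(a: Dict[int, float], b: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Pair up losses at steps logged by both runs, in step order."""
    return [(step, a[step], b[step]) for step in sorted(set(a) & set(b))]


print(config_diff({"lr": 3e-4, "batch_size": 32}, {"lr": 1e-4, "batch_size": 32, "warmup": 100}))
# → {'lr': (0.0003, 0.0001), 'warmup': ('<unset>', 100)}
```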

Python API Quick Reference

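import wandb

api = wandb.Api()  # authenticates via WANDB_API_KEY or the `wandb login` credentials

# Get runs (filters use MongoDB-style query syntax)
runs = api.runs("entity/project", {"state": "running"})

for run in runs:
    # Run properties
    run.state          # running | finished | failed | crashed | canceled
    run.name           # display name
    run.id             # unique identifier
    run.summary        # final/current metrics
    run.config         # hyperparameters
    run.heartbeat_at   # last heartbeat timestamp, used for stall detection

    # Get history (only rows that contain every requested key are returned)
    history = list(run.scan_history(keys=["train/loss", "train/grad_norm"]))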
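Rows returned by `scan_history` are plain dicts, and some metric values may be missing or non-finite (NaN, inf). The post-processing sketch below assumes you add a step key such as `"_step"` to the `keys` list so every row carries it. `to_series` is a hypothetical helper, not part of the wandb API, and the example rows are made up.

```python
import math
from typing import Iterable, List, Mapping, Tuple


def to_series(rows: Iterable[Mapping], key: str,
              step_key: str = "_step") -> List[Tuple[int, float]]:
    """Hypothetical helper: extract (step, value) pairs for `key`,
    skipping rows where the step or value is missing or non-finite."""
    series = []
    for row in rows:
        value = row.get(key)
        step = row.get(step_key)
        if step is None or value is None:
            continue
        if isinstance(value, (int, float)) and math.isfinite(value):
            series.append((int(step), float(value)))
    return series


# Made-up rows shaped like scan_history output
rows = [
    {"_step": 0, "train/loss": 2.3},
    {"_step": 10, "train/loss": float("nan")},
    {"_step": 20, "train/loss": 1.9},
]
print(to_series(rows, "train/loss"))  # → [(0, 2.3), (20, 1.9)]
```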

Metric Key Variations

Scripts handle these automatically:

  • Loss: train/loss, loss, train_loss, training_loss
  • Gradients: train/grad_norm, grad_norm, gradient_norm
  • Steps: train/global_step, global_step, step, _step
  • Eval: eval/loss, eval_loss, eval/accuracy, eval_acc
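One way to handle these variations is to try each alias in a fixed order of preference against the keys a run actually logged, such as the keys of `run.summary`. Here is a minimal sketch. The alias table mirrors the list above. The preference order, the split of the eval keys into separate loss and accuracy entries, and the `resolve_key` name are assumptions.

```python
from typing import Dict, Iterable, List, Optional

# Aliases from the list above, in an assumed order of preference (hypothetical table).
METRIC_ALIASES: Dict[str, List[str]] = {
    "loss": ["train/loss", "loss", "train_loss", "training_loss"],
    "grad_norm": ["train/grad_norm", "grad_norm", "gradient_norm"],
    "step": ["train/global_step", "global_step", "step", "_step"],
    "eval_loss": ["eval/loss", "eval_loss"],
    "eval_accuracy": ["eval/accuracy", "eval_acc"],
}


def resolve_key(available: Iterable[str], metric: str) -> Optional[str]:
    """Return the first alias of `metric` that appears in `available`, else None."""
    present = set(available)
    for alias in METRIC_ALIASES[metric]:
        if alias in present:
            return alias
    return None


logged = ["_step", "train_loss", "grad_norm", "eval/accuracy"]
print({m: resolve_key(logged, m) for m in METRIC_ALIASES})
```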

Health Thresholds

  • Gradient norm > 10: Exploding (critical)
  • Gradient norm > 5: Spiky (warning)
  • Gradient norm < 0.0001: Vanishing (warning)
  • Last heartbeat > 30 min ago: Stalled (critical)
  • Last heartbeat > 10 min ago: Slow (warning)
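These thresholds translate directly into two classifiers, each checked from most to least severe. This sketch rests on a few assumptions. The heartbeat is an ISO 8601 string such as the value of `run.heartbeat_at`, and timestamps without a timezone are treated as UTC. The function names and status strings are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical classifiers using the thresholds listed above.


def grad_norm_status(grad_norm: float) -> str:
    """Classify a gradient norm, most severe condition first."""
    if grad_norm > 10:
        return "critical: exploding"
    if grad_norm > 5:
        return "warning: spiky"
    if grad_norm < 1e-4:
        return "warning: vanishing"
    return "ok"


def heartbeat_status(heartbeat_at: str, now: Optional[datetime] = None) -> str:
    """Classify a run by the age of its last heartbeat, in minutes."""
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
    hb = datetime.fromisoformat(heartbeat_at.replace("Z", "+00:00"))
    if hb.tzinfo is None:
        hb = hb.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    now = now or datetime.now(timezone.utc)
    age_min = (now - hb).total_seconds() / 60
    if age_min > 30:
        return "critical: stalled"
    if age_min > 10:
        return "warning: slow"
    return "ok"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(grad_norm_status(12.0))                          # → critical: exploding
print(heartbeat_status("2024-01-01T11:45:00Z", now))   # → warning: slow
```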

Integration Notes

For morning briefings, use watch_runs.py --json and parse the output.

For detailed analysis of a specific run, use characterize_run.py.

For A/B testing or hyperparameter comparisons, use compare_runs.py.

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-28 16:13 · Safe

Security checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk

🔗 Related recommendations

data-analysis

A-Share Quant AkShare

mbpz
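A quantitative data analysis tool for China A-shares that uses the AkShare library to fetch A-share market quotes, financial data, sector information, and more. Use it to answer questions about A-share stock lookups, market data, financial analysis, and stock screening.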
★ 166 📥 60,300
data-analysis

Data Analysis

ivangdavila
Data analysis and visualization. Query databases, generate reports, automate spreadsheets, and turn raw data into clear, actionable insights. Use for: (1) you…
★ 199 📥 65,292
data-analysis

Excel / XLSX

ivangdavila
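Create, inspect, and edit Microsoft Excel workbooks and XLSX files, with support for reliable formulas, dates, types, formatting, recalculation, and template preservation.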
★ 368 📥 140,935