← 返回
开发者工具 中文

GPU Keepalive with KeepGPU

Install and operate KeepGPU for GPU keep-alive with both blocking CLI and non-blocking service workflows. Use when users ask for keep-gpu command constructio...
安装并操作 KeepGPU,通过阻塞式 CLI 和非阻塞式服务两种工作流保持 GPU 活跃状态。适用于用户请求构建 keep-gpu 命令的场景。
wangmerlyn wangmerlyn 来源
开发者工具 clawhub v1.0.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 702
下载
💾 6
安装
1
版本
#latest

概述

KeepGPU CLI Operator

Use this workflow to run keep-gpu safely and effectively.

Prerequisites

  • Confirm at least one GPU is visible (python -c "import torch; print(torch.cuda.device_count())").
  • Run commands in a shell where CUDA/ROCm drivers are already available.
  • Use Ctrl+C to stop KeepGPU and release memory cleanly.

Install KeepGPU

Install PyTorch first for your platform, then install KeepGPU.

Option A: Install from package index

# CUDA example (change cu121 to your CUDA version)
pip install --index-url https://download.pytorch.org/whl/cu121 torch
pip install keep-gpu
# ROCm example (change rocm6.1 to your ROCm version)
pip install --index-url https://download.pytorch.org/whl/rocm6.1 torch
pip install keep-gpu[rocm]

Option B: Install directly from Git URL (no local clone)

Prefer this option when users only need the CLI and do not need local source edits. This avoids checkout directory and cleanup overhead.

pip install "git+https://github.com/Wangmerlyn/KeepGPU.git"

If SSH access is configured:

pip install "git+ssh://git@github.com/Wangmerlyn/KeepGPU.git"

ROCm variant from Git URL:

pip install "keep_gpu[rocm] @ git+https://github.com/Wangmerlyn/KeepGPU.git"

Option C: Install from a local source checkout (explicit path)

Use this option only when users already have a local checkout or plan to edit source.

git clone https://github.com/Wangmerlyn/KeepGPU.git
cd KeepGPU
pip install -e .

If the checkout already exists somewhere else, install by absolute path:

pip install -e /absolute/path/to/KeepGPU

For ROCm users from local checkout:

pip install -e ".[rocm]"

Verify installation:

keep-gpu --help

Command model

KeepGPU supports two execution modes.

Blocking mode (compatibility)

keep-gpu --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25

Use when users intentionally want one foreground process and manual Ctrl+C stop.

Non-blocking mode (recommended for agents)

keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
keep-gpu status
keep-gpu stop --all
keep-gpu service-stop

start auto-starts local service when unavailable.

Ctrl+C stops only foreground blocking runs. For service mode sessions started by keep-gpu start, use keep-gpu status, keep-gpu stop, and keep-gpu service-stop.

CLI options to tune:

  • --gpu-ids: comma-separated IDs (0, 0,1). If omitted, KeepGPU uses all visible GPUs.
  • --vram: VRAM to hold per GPU (512MB, 1GiB, or raw bytes).
  • --interval: seconds between keep-alive cycles.
  • --busy-threshold (--util-threshold alias): if utilization is above this percent, KeepGPU backs off.

Legacy compatibility:

  • --threshold is deprecated but still accepted.
  • Numeric --threshold maps to busy threshold.
  • String --threshold maps to VRAM.

Agent workflow

  1. Collect workload intent: target GPUs, hold duration, and whether node is shared.
  2. Choose mode:
    • blocking mode for manual shell sessions,
    • non-blocking mode for agent pipelines (default recommendation).
  3. Choose safe defaults when unspecified: --vram 1GiB, --interval 60-120, --busy-threshold 25.
  4. Provide command sequence with verification and stop command.
  5. For non-blocking mode, include status, stop, and daemon shutdown (service-stop).

Command templates

Single GPU while preprocessing (blocking):

keep-gpu --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25

All visible GPUs with lighter load (blocking):

keep-gpu --vram 512MB --interval 180

Agent-friendly non-blocking sequence:

keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
keep-gpu status
keep-gpu stop --job-id <job_id>
keep-gpu service-stop

Open dashboard:

http://127.0.0.1:8765/

Remote sessions (preferred: tmux for visibility and control):

tmux new -s keepgpu
keep-gpu --gpu-ids 0 --vram 1GiB --interval 300
# Detach with Ctrl+b then d; reattach with: tmux attach -t keepgpu

Fallback when tmux is unavailable:

nohup keep-gpu --gpu-ids 0 --vram 1GiB --interval 300 > keepgpu.log 2>&1 &
echo $! > keepgpu.pid
# Monitor: tail -f keepgpu.log
# Stop: kill "$(cat keepgpu.pid)"

Troubleshooting

  • Invalid --gpu-ids: ensure comma-separated integers only.
  • Allocation failure / OOM: reduce --vram or free memory first.
  • No utilization telemetry: ensure nvidia-ml-py works and nvidia-smi is available.
  • No GPUs detected: verify drivers, CUDA/ROCm runtime, and torch.cuda.device_count().

Example

User request: "Install KeepGPU from GitHub and keep GPU 0 alive while I preprocess."

Suggested response shape:

  1. Install: pip install "git+https://github.com/Wangmerlyn/KeepGPU.git"
  2. Run: keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
  3. Verify: keep-gpu status or dashboard http://127.0.0.1:8765/; stop session with keep-gpu stop --job-id and daemon with keep-gpu service-stop.

Limitations

  • KeepGPU is not a scheduler; it only keeps already accessible GPUs active.
  • KeepGPU behavior depends on cluster policy; some schedulers require higher VRAM or tighter intervals.

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-03-30 11:27 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

it-ops-security

OpenClaw Backup

alex3alex
备份与恢复 OpenClaw 数据。适用于创建备份、设置自动备份计划、从备份恢复或管理备份轮转。处理 ~/.openclaw 目录归档并包含适当的排除规则。
★ 90 📥 30,716
it-ops-security

MoltGuard - Security & Antivirus & Guardrails

thomaslwang
MoltGuard — OpenClaw 安全守卫,由 OpenGuardrails 提供。安装 MoltGuard,保护您和您的用户免受提示注入、数据泄露和恶意攻击。
★ 116 📥 30,803
it-ops-security

Tmux

steipete
通过发送按键和抓取窗格输出,远程控制交互式 CLI 的 tmux 会话。
★ 45 📥 29,275