← 返回
未分类 中文

Dgx Spark Setup

Set up and maintain an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) as a local LLM inference server running vLLM + LiteLLM + OpenClaw. Use when in...
Set up and maintain an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) as a local LLM inference server running vLLM + LiteLLM + OpenClaw. Use when in...
jimmy-hernandez jimmy-hernandez 来源
未分类 clawhub v1.0.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 322
下载
💾 0
安装
1
版本
#latest

概述

DGX Spark Setup

Complete setup guide for running Nemotron Super 120B (NVFP4) on a DGX Spark as a private OpenClaw backend with multi-user LiteLLM routing.

Architecture

MacBook (remote) ──Tailscale──► Mac Mini (OpenClaw host, SatPicks worker)
                                      │ LAN SSH
                                      ▼
                               DGX Spark (192.168.1.234)
                               ├── vLLM :8000  (inference)
                               └── LiteLLM :4000 (auth/routing)

Prerequisites

  • DGX Spark with Ubuntu (user: jhernandez)
  • Model downloaded to /home/jhernandez/models/nemotron-super-120b-nvfp4
  • Python 3.12 available (python3 --version)
  • uv installed (curl -LsSf https://astral.sh/uv/install.sh | sh)

1. vLLM Environment Setup

The DGX Spark uses the GB10 Blackwell chip (sm_121). Stock PyPI packages do NOT support sm_121 — everything must be custom built or sourced from specific index URLs.

mkdir -p ~/vllm-install
cd ~/vllm-install
uv venv .vllm --python 3.12
source .vllm/bin/activate

Install PyTorch (CUDA 13.0)

Must use uv pip install with the cu130 index — regular pip may resolve the wrong CUDA variant:

uv pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu130

Verify: python3 -c "import torch; print(torch.__version__)" → should show 2.11.0+cu130

Build Custom Triton (sm_121 support)

Stock Triton does not support sm_121. Must build from this exact commit:

cd ~/vllm-install
git clone https://github.com/triton-lang/triton.git
cd triton
git checkout 4caa0328bf8df64896dd5f6fb9df41b0eb2e750a
pip install ninja cmake wheel
pip install -e python/

Verify: python3 -c "import triton; print(triton.__version__)" → should show 3.5.0+git4caa0328

Install flashinfer

Versions must match exactly — mismatched cubin/flashinfer causes silent failures:

pip install flashinfer-python
pip install flashinfer  # cubin package — must match flashinfer-python version

Install vLLM from Source

cd ~/vllm-install
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 66a168a197ba214a5b70a74fa2e713c9eeb3251a
pip install -e . --no-build-isolation

2. Running vLLM

Always launch inside the tmux session so it survives SSH disconnects:

tmux new-session -s nemotron   # or: tmux attach -t nemotron

export PATH=$HOME/.local/bin:$PATH
source ~/vllm-install/.vllm/bin/activate

TORCH_CUDA_ARCH_LIST=12.1a \
VLLM_USE_FLASHINFER_MXFP4_MOE=1 \
TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas \
  python -m vllm.entrypoints.openai.api_server \
  --model /home/jhernandez/models/nemotron-super-120b-nvfp4 \
  --trust-remote-code --max-model-len 8192 \
  --gpu-memory-utilization 0.85 --port 8000

Startup takes ~8 minutes (loading 17 safetensor shards). Ready when log shows Application startup complete.

Note: nvidia-smi shows N/A for memory on the GB10 (unified memory architecture) — this is normal, not a bug.

3. LiteLLM Setup

LiteLLM proxies vLLM and handles per-user auth and rate limiting.

Install

pip install litellm

Config (~/litellm-config.yaml)

See references/litellm-config-template.yaml for a full config with virtual keys and rate limits.

Run as systemd service

mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/litellm.service << 'EOF'
[Unit]
Description=LiteLLM Proxy
After=network.target

[Service]
ExecStart=/home/jhernandez/.local/bin/litellm --config /home/jhernandez/litellm-config.yaml --port 4000
Restart=on-failure
RestartSec=5
StandardOutput=append:/home/jhernandez/litellm.log
StandardError=append:/home/jhernandez/litellm.log
Environment=PATH=/home/jhernandez/.local/bin:/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable litellm
systemctl --user start litellm

Verify: curl http://localhost:4000/health/liveliness"I'm alive!"

4. Tailscale

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Visit the auth URL shown, then approve in Tailscale admin
tailscale ip -4  # note this IP for OpenClaw client configs

5. OpenClaw Client Config

Point any OpenClaw instance at LiteLLM:

model:
  provider: openai-compatible
  baseUrl: http://<dgx-tailscale-ip>:4000/v1
  apiKey: <virtual-key>
  model: nemotron-super

Troubleshooting

See references/troubleshooting.md for common failure modes and fixes.

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-05-07 09:00 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

it-ops-security

1password

steipete
设置和使用 1Password CLI (op)。适用于:安装 CLI、启用桌面应用集成、登录(单/多账户)、通过 op 读取/注入/运行密钥。
★ 53 📥 31,625
it-ops-security

Tmux

steipete
通过发送按键和抓取窗格输出,远程控制交互式 CLI 的 tmux 会话。
★ 46 📥 29,527
it-ops-security

OpenClaw Backup

alex3alex
备份与恢复 OpenClaw 数据。适用于创建备份、设置自动备份计划、从备份恢复或管理备份轮转。处理 ~/.openclaw 目录归档并包含适当的排除规则。
★ 90 📥 30,935