概述

DGX Spark Setup

Complete setup guide for running Nemotron Super 120B (NVFP4) on a DGX Spark as a private OpenClaw backend with multi-user LiteLLM routing.

Architecture

MacBook (remote) ──Tailscale──► Mac Mini (OpenClaw host, SatPicks worker)
                                      │ LAN SSH
                                      ▼
                               DGX Spark (192.168.1.234)
                               ├── vLLM :8000  (inference)
                               └── LiteLLM :4000 (auth/routing)

Prerequisites

DGX Spark with Ubuntu (user: jhernandez)
Model downloaded to /home/jhernandez/models/nemotron-super-120b-nvfp4
Python 3.12 available (python3 --version)
uv installed (curl -LsSf https://astral.sh/uv/install.sh | sh)

1. vLLM Environment Setup

The DGX Spark uses the GB10 Blackwell chip (sm_121). Stock PyPI packages do NOT support sm_121 — everything must be custom built or sourced from specific index URLs.

mkdir -p ~/vllm-install
cd ~/vllm-install
uv venv .vllm --python 3.12
source .vllm/bin/activate

Install PyTorch (CUDA 13.0)

Must use uv pip install with the cu130 index — regular pip may resolve the wrong CUDA variant:

uv pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu130

Verify: python3 -c "import torch; print(torch.__version__)" → should show 2.11.0+cu130

Build Custom Triton (sm_121 support)

Stock Triton does not support sm_121. Must build from this exact commit:

cd ~/vllm-install
git clone https://github.com/triton-lang/triton.git
cd triton
git checkout 4caa0328bf8df64896dd5f6fb9df41b0eb2e750a
pip install ninja cmake wheel
pip install -e python/

Verify: python3 -c "import triton; print(triton.__version__)" → should show 3.5.0+git4caa0328

Install flashinfer

Versions must match exactly — mismatched cubin/flashinfer causes silent failures:

pip install flashinfer-python
pip install flashinfer  # cubin package — must match flashinfer-python version

Install vLLM from Source

cd ~/vllm-install
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 66a168a197ba214a5b70a74fa2e713c9eeb3251a
pip install -e . --no-build-isolation

2. Running vLLM

Always launch inside the tmux session so it survives SSH disconnects:

tmux new-session -s nemotron   # or: tmux attach -t nemotron

export PATH=$HOME/.local/bin:$PATH
source ~/vllm-install/.vllm/bin/activate

TORCH_CUDA_ARCH_LIST=12.1a \
VLLM_USE_FLASHINFER_MXFP4_MOE=1 \
TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas \
  python -m vllm.entrypoints.openai.api_server \
  --model /home/jhernandez/models/nemotron-super-120b-nvfp4 \
  --trust-remote-code --max-model-len 8192 \
  --gpu-memory-utilization 0.85 --port 8000

Startup takes ~8 minutes (loading 17 safetensor shards). Ready when log shows Application startup complete.

Note: nvidia-smi shows N/A for memory on the GB10 (unified memory architecture) — this is normal, not a bug.

3. LiteLLM Setup

LiteLLM proxies vLLM and handles per-user auth and rate limiting.

Install

pip install litellm

Config (`~/litellm-config.yaml`)

See references/litellm-config-template.yaml for a full config with virtual keys and rate limits.

Run as systemd service

mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/litellm.service << 'EOF'
[Unit]
Description=LiteLLM Proxy
After=network.target

[Service]
ExecStart=/home/jhernandez/.local/bin/litellm --config /home/jhernandez/litellm-config.yaml --port 4000
Restart=on-failure
RestartSec=5
StandardOutput=append:/home/jhernandez/litellm.log
StandardError=append:/home/jhernandez/litellm.log
Environment=PATH=/home/jhernandez/.local/bin:/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable litellm
systemctl --user start litellm

Verify: curl http://localhost:4000/health/liveliness → "I'm alive!"

4. Tailscale

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Visit the auth URL shown, then approve in Tailscale admin
tailscale ip -4  # note this IP for OpenClaw client configs

5. OpenClaw Client Config

Point any OpenClaw instance at LiteLLM:

model:
  provider: openai-compatible
  baseUrl: http://<dgx-tailscale-ip>:4000/v1
  apiKey: <virtual-key>
  model: nemotron-super

Troubleshooting

See references/troubleshooting.md for common failure modes and fixes.

版本历史

共 1 个版本

v1.0.0 当前

2026-05-07 09:00 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)

安全，无风险

查看报告

Dgx Spark Setup

概述

DGX Spark Setup

Architecture

Prerequisites

1. vLLM Environment Setup

Install PyTorch (CUDA 13.0)

Build Custom Triton (sm_121 support)

Install flashinfer

Install vLLM from Source

2. Running vLLM

3. LiteLLM Setup

Install

Config (`~/litellm-config.yaml`)

Run as systemd service

4. Tailscale

5. OpenClaw Client Config

Troubleshooting

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

1password

Tmux

OpenClaw Backup

Dgx Spark Setup

概述

DGX Spark Setup

Architecture

Prerequisites

1. vLLM Environment Setup

Install PyTorch (CUDA 13.0)

Build Custom Triton (sm_121 support)

Install flashinfer

Install vLLM from Source

2. Running vLLM

3. LiteLLM Setup

Install

Config (~/litellm-config.yaml)

Run as systemd service

4. Tailscale

5. OpenClaw Client Config

Troubleshooting

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

1password

Tmux

OpenClaw Backup

Config (`~/litellm-config.yaml`)