Guide for generating PyTorch code that runs on Moore Threads (摩尔线程) MUSA GPUs using torch_musa.
MUSA (Metaverse Unified System Architecture) is Moore Threads' GPU computing platform. This skill helps generate code that targets it through `torch_musa`.
| CUDA | MUSA |
| ------------------------------ | ------------------------------ |
| torch.cuda | torch.musa |
| torch.device("cuda") | torch.device("musa") |
| torch.cuda.is_available() | torch.musa.is_available() |
| backend='nccl' | backend='mccl' |
| torch.cuda.device_count() | torch.musa.device_count() |
| torch.cuda.get_device_name() | torch.musa.get_device_name() |
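The mapping above can also be applied at runtime when one script has to run on either backend. The following is a minimal sketch, not part of torch_musa: `pick_backend` and `Backend` are hypothetical names, and the `gloo` CPU fallback is an assumption (it is a standard PyTorch backend, but this guide does not mention it).

```python
from typing import NamedTuple


class Backend(NamedTuple):
    device: str        # string to pass to torch.device(...)
    dist_backend: str  # string to pass to dist.init_process_group(backend=...)


def pick_backend(musa_available: bool, cuda_available: bool = False) -> Backend:
    """Choose device and distributed backend names, preferring MUSA."""
    if musa_available:
        return Backend("musa", "mccl")
    if cuda_available:
        return Backend("cuda", "nccl")
    # Assumption: fall back to CPU with the gloo backend.
    return Backend("cpu", "gloo")


print(pick_backend(musa_available=True))
```

In real code the first argument would typically come from `torch.musa.is_available()` after `import torch_musa`.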
DO NOT install PyTorch, vLLM, or related packages manually. MUSA environments are custom-built, and installing standard packages from PyPI will break them.
MUSA provides pre-configured conda environments. Common environment names:
- `v1.2`: MUSA SDK v1.2 environment
- `v1.3`: MUSA SDK v1.3 environment (newer)
```bash
# List available MUSA environments
conda env list | grep -E "(v1\.2|v1\.3|musa)"

# Activate the appropriate environment
conda activate v1.2  # or v1.3

# Verify MUSA availability
python -c "import torch_musa; import torch; print(torch.musa.is_available())"
```
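When the environment lookup has to happen from a script rather than a shell, the same filter can be applied to the text printed by `conda env list`. This is a rough sketch: `find_musa_envs` is a hypothetical helper, and it matches on the environment name only, which is slightly stricter than the grep command above (grep also matches the path column).

```python
import re

# Same alternatives as the grep pattern: v1.2, v1.3, or anything containing "musa".
MUSA_ENV_PATTERN = re.compile(r"v1\.2|v1\.3|musa")


def find_musa_envs(conda_env_list_output: str) -> list[str]:
    """Return environment names that look like MUSA environments."""
    names = []
    for line in conda_env_list_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        name = line.split()[0]  # first column is the environment name
        if MUSA_ENV_PATTERN.search(name):
            names.append(name)
    return names


# Illustrative output only; real paths will differ.
sample = """# conda environments:
#
base                  *  /opt/conda
v1.2                     /opt/conda/envs/v1.2
v1.3                     /opt/conda/envs/v1.3
"""
print(find_musa_envs(sample))
```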
If no MUSA conda environment is detected:
```bash
which musaInfo # Should show musaInfo path
ls /usr/local/musa/ # MUSA SDK location
```
Use the `musa-env-setup` skill for complete environment installation, or look for existing environments in the common conda locations:
- /opt/conda/envs/
- ~/conda/envs/
- /usr/local/conda/envs/
| Variable | Purpose |
| ------------------------------ | ------------------------- |
| MUSA_VISIBLE_DEVICES=0,1,2,3 | Control visible GPU IDs |
| MUSA_LAUNCH_BLOCKING=1 | Synchronous kernel launch |
| MUDNN_LOG_LEVEL=INFO | Enable MUDNN logging |
| TORCH_SHOW_CPP_STACKTRACES=1 | Show C++ stack traces |
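These variables can also be set from Python. The sketch below uses a hypothetical helper, `configure_musa_env`, and assumes (by analogy with the CUDA equivalents) that the variables are read when the runtime initializes, so it should be called before importing `torch` or `torch_musa`.

```python
import os


def configure_musa_env(device_ids, blocking=False, env=None):
    """Set MUSA variables in `env` (default: os.environ) and return what was set."""
    env = os.environ if env is None else env
    settings = {"MUSA_VISIBLE_DEVICES": ",".join(str(i) for i in device_ids)}
    if blocking:
        # Synchronous kernel launch, as listed in the table above.
        settings["MUSA_LAUNCH_BLOCKING"] = "1"
    env.update(settings)
    return settings


# Example: expose GPUs 0 and 1 with synchronous launches, written to a scratch dict.
print(configure_musa_env([0, 1], blocking=True, env={}))
```

Passing a plain dict as `env` keeps the example side-effect free; omit it to modify the current process environment.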
When generating PyTorch code for MUSA:
```python
import torch_musa # Must import before using torch.musa
```
```python
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")
tensor = torch.tensor([1.0, 2.0], device=device)
```
```python
dist.init_process_group(backend='mccl', ...)
```
```python
from torch.cuda.amp import autocast, GradScaler # Same API
```
Set `torch.backends.musa.matmul.allow_tf32 = True` to enable TensorFloat32.
For common model types, see templates in references/:
- `reference.md`: Complete MUSA API reference
```python
import torch
import torch_musa

print(f"MUSA available: {torch.musa.is_available()}")
print(f"Device count: {torch.musa.device_count()}")
print(f"Device name: {torch.musa.get_device_name(0)}")
```
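The availability check fails with an exception when `torch_musa` is not installed or no device is visible (`get_device_name(0)` needs at least one device). A more defensive variant might look like the sketch below; `describe_musa` is a hypothetical helper, and it only uses the `torch.musa` calls already shown in this guide.

```python
def describe_musa():
    """Return a summary dict; does not raise if torch or torch_musa is missing."""
    try:
        import torch
        import torch_musa  # noqa: F401  (importing it makes torch.musa usable)
    except (ImportError, OSError) as exc:
        return {"available": False, "reason": f"import failed: {exc}"}

    if not torch.musa.is_available():
        return {"available": False, "reason": "no MUSA device visible"}

    count = torch.musa.device_count()
    names = [torch.musa.get_device_name(i) for i in range(count)]
    return {"available": True, "count": count, "names": names}


print(describe_musa())
```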
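```python
import torch
import torch_musa  # must be imported before using torch.musa

# Assumes model, inputs, targets, criterion, and optimizer are already defined.
# Device setup
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")

# Model and data to device
model = model.to(device)
inputs = inputs.to(device)
targets = targets.to(device)

# Training (same as CUDA)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()
```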
```python
import torch
import torch.distributed as dist
import torch_musa

# Assumes rank, world_size, and local_rank are provided by the launcher.
# Initialize with mccl backend
dist.init_process_group(backend='mccl', rank=rank, world_size=world_size)

# Bind this process to its local MUSA device (torch.musa mirrors the torch.cuda API)
torch.musa.set_device(local_rank)
```
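The `rank`, `world_size`, and `local_rank` values are usually supplied by the launcher. A minimal sketch, assuming the job is started with `torchrun` (which exports `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`); `dist_config_from_env` is a hypothetical helper, and the single-process defaults are an assumption for local runs.

```python
import os


def dist_config_from_env(env=None, backend="mccl"):
    """Collect distributed settings from launcher-provided environment variables."""
    env = os.environ if env is None else env
    return {
        "backend": backend,
        "rank": int(env.get("RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
    }


# Example with explicit values instead of the real environment.
cfg = dist_config_from_env({"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "3"})
print(cfg)
```

The `backend`, `rank`, and `world_size` entries map directly onto the `dist.init_process_group` arguments, and `local_rank` selects the device for the process.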
When converting existing CUDA code to MUSA:
- Add `import torch_musa` at the top
- Replace `cuda` with `musa` in device strings
- Replace `nccl` with `mccl` for the distributed backend
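The conversion steps can be roughed out as a text rewrite. This is a naive sketch, not an official converter: `cuda_to_musa` is a hypothetical function, it works on source text rather than a parsed AST, and it does not handle method calls such as `.cuda()`, so the output should be reviewed by hand.

```python
import re


def cuda_to_musa(source: str) -> str:
    """Naively rewrite CUDA-specific PyTorch source text for MUSA."""
    out = source.replace("torch.cuda", "torch.musa")
    # Device strings: "cuda", 'cuda', "cuda:0", ...
    out = re.sub(r"""(["'])cuda(:\d+)?\1""", r"\1musa\2\1", out)
    # Distributed backend name.
    out = re.sub(r"""(["'])nccl\1""", r"\1mccl\1", out)
    # Add the torch_musa import at the top if it is not already present.
    if "import torch_musa" not in out:
        out = "import torch_musa\n" + out
    return out


example = (
    "import torch\n"
    'device = torch.device("cuda:0")\n'
    "ok = torch.cuda.is_available()\n"
)
print(cuda_to_musa(example))
```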
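
Common troubleshooting steps:
- Add the current user to the `render` group: `sudo usermod -aG render $(whoami)`
- Ensure `LD_LIBRARY_PATH` includes `/usr/local/musa/lib/`
- Rebuild from a clean state: `python setup.py clean && bash build.sh`
- When running in Docker, pass `--env MTHREADS_VISIBLE_DEVICES=all`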
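The library path item is easy to verify programmatically. A small sketch, assuming a POSIX system with a colon-separated `LD_LIBRARY_PATH`; `missing_musa_lib_path` is a hypothetical helper.

```python
import os


def missing_musa_lib_path(ld_library_path, required="/usr/local/musa/lib"):
    """Return True if `required` is absent from a colon-separated path list."""
    # normpath removes trailing slashes so "/usr/local/musa/lib/" also matches.
    entries = [os.path.normpath(p) for p in ld_library_path.split(":") if p]
    return os.path.normpath(required) not in entries


# Check the current process environment.
print(missing_musa_lib_path(os.environ.get("LD_LIBRARY_PATH", "")))
```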
For detailed API reference and examples, see references/reference.md.