概述

WSL2 Local AI — Windows Developer LLM Stack

Develop AI apps on Windows with full Linux performance. WSL2 gives you native Linux inside Windows with NVIDIA GPU passthrough — your RTX GPU runs CUDA in WSL2 at near-native speed. Ollama Herd routes AI requests across WSL2 instances and native Windows machines.

Why WSL2 for local AI

Full Linux + Windows GPU — WSL2 passes your NVIDIA GPU directly to Linux. CUDA works in WSL2.
Docker integration — Docker Desktop on Windows uses WSL2 backend. Containerize your AI workflows.
Best of both — VS Code on Windows, Ollama in WSL2, GPU shared between them.
Development workflow — write code on Windows, run inference in WSL2, same filesystem.

WSL2 AI setup

Step 1: Enable WSL2 with GPU support

# PowerShell (admin)
wsl --install -d Ubuntu
wsl --set-default-version 2

Verify WSL2 NVIDIA GPU access:

# Inside WSL2
nvidia-smi    # should show your RTX GPU

Step 2: Install Ollama in WSL2

# Inside WSL2
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &

Step 3: Install WSL2 Ollama Herd

# Inside WSL2
pip install ollama-herd
herd          # start WSL2 AI router on port 11435
herd-node     # register WSL2 as a node

Step 4: Access from Windows

Your WSL2 AI endpoint is accessible from Windows at http://localhost:11435 — WSL2 forwards ports automatically.

# From Windows PowerShell
curl http://localhost:11435/api/tags    # see WSL2 AI models

Use WSL2 AI

Python (from Windows or WSL2)

from openai import OpenAI

# Same URL works from Windows and WSL2
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# WSL2 handles the inference via NVIDIA GPU
response = client.chat.completions.create(
    model="qwen3.5:32b",
    messages=[{"role": "user", "content": "Write a Docker Compose file for a Python API"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

VS Code + WSL2 AI

// .vscode/settings.json — Continue.dev configuration
{
  "continue.models": [{
    "title": "WSL2 Local",
    "provider": "openai",
    "model": "codestral",
    "apiBase": "http://localhost:11435/v1",
    "apiKey": "not-needed"
  }]
}

curl from WSL2

# WSL2 inference
curl http://localhost:11435/api/chat -d '{
  "model": "codestral",
  "messages": [{"role": "user", "content": "Refactor this Python function"}],
  "stream": false
}'

WSL2 + Docker AI workflow

Run Ollama in Docker on WSL2 for containerized AI:

# WSL2 Docker + Ollama
docker run -d --gpus all -p 11434:11434 ollama/ollama

# Herd routes between Docker Ollama and native Ollama
pip install ollama-herd
herd &
herd-node

WSL2 AI hardware guide

Windows PC	GPU	WSL2 AI models
------------	-----	---------------
RTX 4090 desktop	24GB shared with WSL2	`llama3.3:70b`, `qwen3.5:32b`
RTX 4080 desktop	16GB shared with WSL2	`phi4`, `codestral`, `qwen3.5:14b`
RTX 4060 laptop	8GB shared with WSL2	`phi4-mini`, `gemma3:4b`

> WSL2 shares GPU memory with Windows. Close GPU-heavy Windows apps for more WSL2 AI vRAM.

WSL2 AI environment

# WSL2 Ollama optimization
export OLLAMA_KEEP_ALIVE=-1
export OLLAMA_MAX_LOADED_MODELS=-1

# Add to ~/.bashrc for persistence in WSL2
echo 'export OLLAMA_KEEP_ALIVE=-1' >> ~/.bashrc
echo 'export OLLAMA_MAX_LOADED_MODELS=-1' >> ~/.bashrc

Monitor WSL2 AI

# WSL2 fleet status
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# WSL2 health checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard — accessible from both Windows browser and WSL2.

Also available on WSL2 AI

Image generation

curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "developer workspace", "width": 1024, "height": 1024}'

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "WSL2 Windows development AI"}'

Full documentation

Contribute

Ollama Herd is open source (MIT). WSL2 developers welcome:

Guardrails

WSL2 AI model downloads require explicit user confirmation.
WSL2 AI model deletion requires explicit user confirmation.
Never delete or modify files in ~/.fleet-manager/.
No models are downloaded automatically — all pulls are user-initiated or require opt-in.

版本历史

共 1 个版本

v1.0.0 当前

2026-05-07 05:57 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)