
Mac Mini AI — Mac Mini Local LLM, Image Gen, STT on Apple Silicon

Mac Mini AI — run LLMs, image generation, speech-to-text, and embeddings on your Mac Mini. M4 (16-32GB) and M4 Pro (24-64GB) configurations make the Mac Mini...
Author: twinsgeeks · Source: clawhub · Version: v1.0.1 · API key: not required
Tags: #affordable-ai #apple-silicon #embedding #fleet #homelab #latest #llm #local-ai #m4 #m4-pro #mac-cluster #mac-mini #ollama #phi4 #self-hosted

Overview

Mac Mini AI — The $599 AI Node

The Mac Mini is among the most cost-effective hardware for local AI. Starting at $599 with 16GB of unified memory, it runs 3B-7B models comfortably. Three Mac Minis cost about as much as one month of cloud GPU rental, and once purchased they carry no recurring API or rental fees.

This skill turns one Mac Mini into an AI server and multiple Mac Minis into a fleet.

Mac Mini configurations for AI

| Config | Chip | Unified Memory | Price | LLM Sweet Spot |
|---|---|---|---|---|
| Mac Mini M4 (16GB) | M4 | 16GB | $599 | 3B-7B models (phi4-mini, llama3.2:3b) |
| Mac Mini M4 (24GB) | M4 | 24GB | $799 | 7B-14B models (phi4, gemma3:12b) |
| Mac Mini M4 (32GB) | M4 | 32GB | $999 | 14B-22B models (qwen3:14b, codestral) |
| Mac Mini M4 Pro (48GB) | M4 Pro | 48GB | $1,399 | 22B-32B models (qwen3:32b) |
| Mac Mini M4 Pro (64GB) | M4 Pro | 64GB | $1,799 | 32B-70B models (llama3.3:70b quantized) |
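
One way to sanity-check these tiers is a back-of-the-envelope memory estimate. The sketch below is a rule of thumb, not a measurement and not part of Ollama Herd: the 4.5 bits per weight, the 20% overhead for KV cache and runtime buffers, and the 75% usable share of unified memory are all assumed values chosen for illustration.

```python
def estimate_model_gb(params_billion: float, bits_per_weight: float = 4.5,
                      overhead: float = 0.20) -> float:
    """Rough memory footprint in GB for a quantized model.

    bits_per_weight=4.5 approximates a 4-bit quantization including scales;
    overhead stands in for KV cache and runtime buffers. Both are assumptions.
    """
    weights_gb = params_billion * bits_per_weight / 8  # billions of params * bytes per param
    return round(weights_gb * (1 + overhead), 1)


def fits(params_billion: float, ram_gb: int, usable_fraction: float = 0.75) -> bool:
    """True if the estimate fits in the share of unified memory assumed usable for models."""
    return estimate_model_gb(params_billion) <= ram_gb * usable_fraction


# Print the estimate for a few common model sizes.
for size in (7, 14, 32, 70):
    print(f"{size}B: about {estimate_model_gb(size)} GB")
```

Under these assumptions a quantized 70B model fits on the 64GB configuration but not on a 32GB one, which matches the direction of the table. Real usage depends on the quantization and context length you choose.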

The Mac Mini fleet strategy

Three Mac Minis (32GB each) cost about $3,000 (3 × $999) and give you:

  • 96GB total unified memory across the fleet
  • Each runs a different model simultaneously
  • The router picks the best device for every request
  • $0/month after purchase — no cloud API costs

Mac Mini #1 (32GB): deepseek-r1:14b          ─┐
Mac Mini #2 (32GB): codestral + phi4          ├──→  Router  ←──  Your apps
Mac Mini #3 (32GB): qwen3:14b + embeddings   ─┘
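
To make "the router picks the best device" concrete, here is a toy selection rule. It is a hypothetical sketch, not Ollama Herd's actual algorithm: the `Node` class, the `pick_node` function, and the example fleet are invented for illustration, and the rule is simply to prefer nodes that already have the model loaded and then take the shortest queue.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical view of one Mac Mini as a router might track it."""
    name: str
    loaded_models: Set[str] = field(default_factory=set)
    queue_depth: int = 0


def pick_node(nodes: List[Node], model: str) -> Optional[Node]:
    """Among nodes with the model loaded, return the one with the shortest queue."""
    candidates = [n for n in nodes if model in n.loaded_models]
    if not candidates:
        return None  # a real router might load the model on some node instead
    return min(candidates, key=lambda n: n.queue_depth)


# Example fleet (made-up state): two nodes can serve phi4, one of them is idle.
fleet = [
    Node("mini-1", {"phi4"}, queue_depth=2),
    Node("mini-2", {"codestral", "phi4"}, queue_depth=0),
    Node("mini-3", {"qwen3:14b", "nomic-embed-text"}, queue_depth=1),
]
print(pick_node(fleet, "phi4").name)
```

A production router would likely weigh more signals, such as free memory or measured throughput; queue depth alone keeps the sketch short.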

Setup

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/

On one Mac Mini (the router):

herd

On every other Mac Mini:

herd-node

Devices discover each other automatically. No IP configuration, no Docker, no Kubernetes.

Use your Mac Mini

Chat with an LLM

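from openai import OpenAI

# Point the OpenAI client at the router. The key is a placeholder; none is required.
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="phi4",
    messages=[{"role": "user", "content": "Write a Python web scraper"}],
    stream=True,
)
# Print tokens as they arrive; some chunks carry no choices or no text.
for chunk in response:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
print()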

Ollama API

curl http://localhost:11435/api/chat -d '{
  "model": "gemma3:12b",
  "messages": [{"role": "user", "content": "Explain recursion simply"}],
  "stream": false
}'
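
The same call can be made from Python with only the standard library. This is a minimal sketch: the helper names `build_chat_request` and `chat` are made up here, and it assumes the router returns the reply under `message.content`, as Ollama's own non-streaming `/api/chat` does.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435"  # router address used throughout this page


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=120) as resp:
        body = json.load(resp)
    # Assumed response shape: {"message": {"role": "assistant", "content": "..."}}
    return body["message"]["content"]
```

With the router running, `print(chat("gemma3:12b", "Explain recursion simply"))` would print the model's answer.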

Image generation (optional)

uv tool install mflux    # Install on any Mac Mini
curl -o art.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model": "z-image-turbo", "prompt": "a stack of Mac Minis glowing", "width": 512, "height": 512}'

Speech-to-text

curl http://localhost:11435/api/transcribe -F "file=@meeting.wav" -F "model=qwen3-asr"
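
`curl -F` sends the audio as multipart/form-data. A standard-library Python equivalent might look like the sketch below. The helper names are invented for illustration, and because the response schema is not documented on this page, the sketch assumes a JSON body and returns it unparsed.

```python
import json
import urllib.request
import uuid
from pathlib import Path

TRANSCRIBE_URL = "http://localhost:11435/api/transcribe"


def encode_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        "\r\n\r\n".encode("utf-8") + data + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "qwen3-asr"):
    """Upload an audio file and return the parsed response (assumed to be JSON)."""
    audio = Path(path)
    body, content_type = encode_multipart(
        {"model": model}, "file", audio.name, audio.read_bytes()
    )
    req = urllib.request.Request(
        TRANSCRIBE_URL, data=body, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

Calling `transcribe("meeting.wav")` mirrors the curl command above.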

Embeddings for RAG

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Mac Mini home server local AI"}'
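
For RAG, the embeddings are typically compared by cosine similarity to find the passages closest to a query. The following is a minimal sketch of that retrieval step. The function names are hypothetical, and `embed` assumes the Ollama-style response shape `{"embeddings": [[...]]}`.

```python
import json
import math
import urllib.request
from typing import Dict, List

EMBED_URL = "http://localhost:11435/api/embed"


def embed(text: str, model: str = "nomic-embed-text") -> List[float]:
    """Fetch one embedding vector from the router."""
    req = urllib.request.Request(
        EMBED_URL,
        data=json.dumps({"model": model, "input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"][0]  # assumed response shape


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]
```

In use, you would embed each document once, store the vectors, then call `top_k(embed(question), stored_vectors)` and pass the matching passages to the chat model as context.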

Best models for Mac Mini

| RAM | Best models | Why |
|---|---|---|
| 16GB | phi4-mini (3.8B), gemma3:4b, nomic-embed-text | Small but capable, leaves room for OS |
| 24GB | phi4 (14B), gemma3:12b, codestral | Sweet spot for single-model use |
| 32GB | qwen3:14b, deepseek-r1:14b, codestral + phi4-mini | Two models simultaneously |
| 48GB | qwen3:32b, deepseek-r1:32b | Larger models, great quality |
| 64GB | llama3.3:70b (quantized) | Near-frontier quality on a Mac Mini |
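
If you script your setup, the table can be encoded as a simple lookup. This is an illustrative sketch only: `BEST_MODELS` and `recommend` are hypothetical names, and the rule of falling back to the largest tier that does not exceed the machine's RAM is an assumption, not something the skill prescribes.

```python
from typing import Dict, List

# Recommendations copied from the table above, keyed by unified memory in GB.
BEST_MODELS: Dict[int, List[str]] = {
    16: ["phi4-mini", "gemma3:4b", "nomic-embed-text"],
    24: ["phi4", "gemma3:12b", "codestral"],
    32: ["qwen3:14b", "deepseek-r1:14b", "codestral", "phi4-mini"],
    48: ["qwen3:32b", "deepseek-r1:32b"],
    64: ["llama3.3:70b"],
}


def recommend(ram_gb: int) -> List[str]:
    """Return the models listed for the largest tier that fits in ram_gb."""
    tiers = [tier for tier in sorted(BEST_MODELS) if tier <= ram_gb]
    if not tiers:
        raise ValueError(f"No recommendation for {ram_gb}GB; the table starts at 16GB")
    return BEST_MODELS[tiers[-1]]


print(recommend(24))
```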

Monitor your Mac Mini fleet

Dashboard at http://localhost:11435/dashboard — see every Mac Mini's status, loaded models, and queue depths.

# Fleet overview
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Model recommendations for your hardware
curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool
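
The same two endpoints can be polled from Python, for example inside a cron job or a health check. The sketch below uses only the standard library. The function names are made up, and since the response fields are not documented on this page, it pretty-prints whatever JSON comes back rather than assuming a schema.

```python
import json
import urllib.request

ROUTER = "http://localhost:11435"


def pretty(raw: bytes) -> str:
    """Format a JSON response body with a 4-space indent, similar to json.tool."""
    return json.dumps(json.loads(raw), indent=4)


def fetch(path: str, base_url: str = ROUTER, timeout: float = 5.0) -> bytes:
    """GET an endpoint on the router and return the raw response body."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return resp.read()


def report(paths=("/fleet/status", "/dashboard/api/recommendations")) -> None:
    """Print each endpoint's JSON under a small header line."""
    for path in paths:
        print(f"# {path}")
        print(pretty(fetch(path)))
```

Calling `report()` on the router machine prints both responses; pass a different `base_url` to `fetch` to query the router from another device.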

Works with any OpenAI-compatible tool

| Tool | Connection |
|---|---|
| Open WebUI | Ollama URL: http://mac-mini-ip:11435 |
| Aider | aider --openai-api-base http://mac-mini-ip:11435/v1 |
| Continue.dev | Base URL: http://mac-mini-ip:11435/v1 |
| LangChain | ChatOpenAI(base_url="http://mac-mini-ip:11435/v1") |

Full documentation

Contribute

Ollama Herd is open source (MIT). Built for the Mac Mini fleet community:

  • Star on GitHub — help other Mac Mini owners find us
  • Open an issue — share your Mac Mini fleet setup
  • PRs welcome from humans and AI agents. CLAUDE.md gives full context.
  • Running a Mac Mini cluster? We'd love to hear about it.

Guardrails

  • No automatic downloads — model pulls require explicit user confirmation.
  • Model deletion requires explicit user confirmation.
  • All requests stay local — no data leaves your network.
  • Never delete or modify files in ~/.fleet-manager/.

Version history

1 version in total

  • v1.0.1 (current), 2026-05-07 08:53
