Overview

llamacpp-bench

Run standardized benchmarks on GGUF models using llama.cpp's llama-bench tool.

Quick Start

# Basic benchmark
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99

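# With specific backend (assumes your build or wrapper honors LLAMA_BACKEND; upstream llama-bench uses the backends compiled into the binary)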
LLAMA_BACKEND=vulkan llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99

Benchmark Parameters

Parameter	Description	Default
-----------	-------------	---------
`-m`	Model path (GGUF file)	`models/7B/ggml-model-q4_0.gguf` (set explicitly in practice)
`-p`	Prompt sizes to test	512
`-n`	Generation lengths to test	128
`-ngl`	GPU layers to offload	99
`-t`	CPU threads	auto
`-dev`	Device selection	auto

Standard Test Suite

For consistent comparisons across models, use:

-p 512,1024,2048 -n 128,256 -ngl 99

This tests:

Prompt processing: 512, 1024, 2048 tokens
Token generation: 128, 256 tokens
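
To keep the suite identical between runs, the arguments can be kept in one place. The following is a minimal sketch, assuming a POSIX-style shell; `standard_suite_args`, `run_standard_suite`, and the `BENCH_BIN` override variable are hypothetical names introduced here, not part of llama.cpp.

```shell
# Hypothetical helpers; only the flags themselves come from llama-bench.
standard_suite_args() {
  # Prompt sizes, generation lengths, and GPU offload used for all comparisons.
  echo "-p 512,1024,2048 -n 128,256 -ngl 99"
}

run_standard_suite() {
  model=$1
  # Word splitting of the argument string is intentional here.
  # BENCH_BIN (an assumption) lets you point at a specific llama-bench binary.
  "${BENCH_BIN:-llama-bench}" -m "$model" $(standard_suite_args)
}
```

Calling `run_standard_suite model.gguf` would then run the same command as the Quick Start example.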

Interpreting Results

Metric	Meaning	Rough target (varies with model size, quantization, and hardware)
--------	---------	------------------
`pp512`	Prompt processing speed at 512 tokens	>1000 t/s
`pp1024`	Prompt processing speed at 1024 tokens	>1000 t/s
`pp2048`	Prompt processing speed at 2048 tokens	>1000 t/s
`tg128`	Token generation speed (128 tokens)	>50 t/s
`tg256`	Token generation speed (256 tokens)	>50 t/s
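
Throughput figures are easier to judge when converted into wall-clock time. The sketch below does that arithmetic with `awk`; `seconds_for` is a hypothetical helper name, and the inputs are the threshold values from the table rather than measured results.

```shell
# Convert a throughput figure (tokens per second) into seconds for a token count.
seconds_for() {
  tokens=$1
  tokens_per_sec=$2
  awk -v n="$tokens" -v r="$tokens_per_sec" 'BEGIN { printf "%.2f\n", n / r }'
}

seconds_for 2048 1000   # time to process a 2048-token prompt at 1000 t/s: 2.05
seconds_for 256 50      # time to generate 256 tokens at 50 t/s: 5.12
```

At these thresholds, generation dominates: producing 256 tokens takes longer than ingesting a 2048-token prompt.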

Backend Selection

llama-bench auto-detects available backends. Priority order:

CUDA (NVIDIA GPUs)
ROCm (AMD GPUs)
Vulkan (cross-platform GPU)
CPU (fallback)

To force a backend, set the environment variable shown in Quick Start. This only works if the backend was compiled into your build, which you can check with:

# Check available backends
llama-bench --help | grep -i "backend\|cuda\|rocm\|vulkan"
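
The priority order above can be modeled as a first-match lookup. This is only an illustration of the documented order, not how llama.cpp implements backend selection internally; `pick_backend` is a hypothetical helper that takes a space-separated list of available backends.

```shell
# Hypothetical helper: pick the first backend from the priority list
# that appears in a space-separated list of available backends.
pick_backend() {
  available=" $1 "
  for backend in cuda rocm vulkan cpu; do
    case "$available" in
      *" $backend "*) echo "$backend"; return 0 ;;
    esac
  done
  # Nothing in the list matched a known backend.
  return 1
}

pick_backend "cpu vulkan"   # prints: vulkan
```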

Batch Benchmarking

Use the provided script for benchmarking multiple models:

./scripts/benchmark_models.sh /path/to/models/*.gguf
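
If the script is not at hand, a loop along these lines gives similar behavior. It is a hypothetical sketch, not the contents of `scripts/benchmark_models.sh`; `bench_models` and the `BENCH_BIN` override variable are names assumed for illustration.

```shell
# Hypothetical sketch; the real scripts/benchmark_models.sh may differ.
bench_models() {
  for model in "$@"; do
    # Skip arguments that are not existing files (for example, an unmatched glob).
    [ -f "$model" ] || { echo "skipping $model: not a file" >&2; continue; }
    echo "== $model =="
    "${BENCH_BIN:-llama-bench}" -m "$model" -p 512,1024,2048 -n 128,256 -ngl 99
  done
}
```

It would be called the same way as the script, for example `bench_models /path/to/models/*.gguf`.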

Saving Results

Output can be redirected to a file:

llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > results.txt

Alternatively, use the benchmark script (scripts/benchmark_models.sh), which saves results to timestamped files automatically.
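
For manual runs, a timestamped file name can be generated in the shell. In this sketch, `results_file` is a hypothetical helper and the naming scheme is an assumption (the script's actual file names are not documented here); `-o csv` is llama-bench's option for CSV output.

```shell
# Hypothetical helper: build a name like results_YYYYMMDD_HHMMSS.<ext>
results_file() {
  echo "results_$(date +%Y%m%d_%H%M%S).${1:-txt}"
}

# Example (left commented so the sketch can be sourced without running a benchmark):
# llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 -o csv > "$(results_file csv)"
```

Machine-readable output such as CSV is easier to compare across runs than the default table.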

Common Issues

Out of memory: Reduce -ngl (offload fewer layers to the GPU) or test smaller prompt sizes.
Slow CPU performance: Set -t to match the CPU's physical core count.
Backend not found: Check that llama.cpp was built with the desired backend.
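
The out-of-memory advice can be automated by retrying with progressively fewer offloaded layers. This is a sketch under two assumptions: that the benchmark binary exits with a non-zero status when it cannot allocate memory, and that the step values 99, 48, 24, 0 suit your model. `bench_with_ngl_fallback` and `BENCH_BIN` are hypothetical names.

```shell
# Hypothetical helper: retry with fewer offloaded layers after a failed run.
# Assumes the benchmark binary exits non-zero when it runs out of memory.
bench_with_ngl_fallback() {
  model=$1
  for ngl in 99 48 24 0; do
    if "${BENCH_BIN:-llama-bench}" -m "$model" -p 512 -n 128 -ngl "$ngl"; then
      echo "completed with -ngl $ngl"
      return 0
    fi
    echo "run failed with -ngl $ngl, trying fewer layers" >&2
  done
  # Even the CPU-only run (-ngl 0) failed.
  return 1
}
```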

Building / Updating llama.cpp

Check Current Version

./scripts/build_llamacpp.sh -v

Shows:

Current Git commit and branch
Build date
Whether behind upstream
Available backends

Build or Update

# Interactive mode (prompts for backend selection)
./scripts/build_llamacpp.sh -u

# Specify backend directly
./scripts/build_llamacpp.sh -u -b vulkan   # Vulkan (cross-platform GPU, e.g. AMD/Intel)
./scripts/build_llamacpp.sh -u -b cuda     # CUDA (NVIDIA GPUs)
./scripts/build_llamacpp.sh -u -b rocm     # ROCm (AMD GPUs)
./scripts/build_llamacpp.sh -u -b cpu      # CPU only

# Clean rebuild
./scripts/build_llamacpp.sh -c -b vulkan

# Custom build directory
./scripts/build_llamacpp.sh -u -b cuda -d /custom/path

Build Options

Flag	Description
------	-------------
`-v`	Show version info and exit
`-u`	Update to latest from GitHub
`-c`	Clean build (remove existing)
`-b`	Backend: vulkan, cuda, rocm, cpu
`-d`	Build directory path
`-j`	Parallel jobs (default: CPU count)
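
For orientation, the `-b` values plausibly correspond to llama.cpp's CMake options. The mapping below is an assumption about what the script does, not taken from it; `backend_cmake_flag` is a hypothetical helper, and the option names are those used by recent llama.cpp trees (older trees named the ROCm option differently).

```shell
# Hypothetical mapping from the script's -b value to an upstream CMake option.
backend_cmake_flag() {
  case "$1" in
    vulkan) echo "-DGGML_VULKAN=ON" ;;
    cuda)   echo "-DGGML_CUDA=ON" ;;
    rocm)   echo "-DGGML_HIP=ON" ;;   # older trees used a different option name
    cpu)    echo "" ;;                # CPU-only builds need no extra option
    *)      echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Example (commented out; run from a llama.cpp checkout):
# cmake -B build $(backend_cmake_flag vulkan) && cmake --build build --config Release
```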

Finding llama-bench

The benchmark script auto-detects llama-bench in these locations:

/DATA/Benchmark/llama.cpp/build/bin/llama-bench
~/Repo/llama.cpp/build/bin/llama-bench
~/lab/build/bin/llama-bench

If llama-bench is not in any of these locations, the script searches your home directory. Alternatively, build it with the build script described above.
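
The lookup can be approximated as a first-match search over candidate paths. This is a hypothetical sketch, not the script's actual logic; `find_llama_bench` is an assumed name, and falling back to `PATH` is a simplification swapped in for the home-directory search.

```shell
# Hypothetical sketch of the lookup; the real script's logic may differ.
find_llama_bench() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  # Simplification: fall back to PATH instead of searching the home directory.
  command -v llama-bench
}

# Example with the locations listed above:
# find_llama_bench /DATA/Benchmark/llama.cpp/build/bin/llama-bench \
#   "$HOME/Repo/llama.cpp/build/bin/llama-bench" "$HOME/lab/build/bin/llama-bench"
```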

Version History

1 version in total

v1.0.0 (current)

2026-05-07 06:17
