Run standardized benchmarks on GGUF models using llama.cpp's llama-bench tool.
# Basic benchmark
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99
# With specific backend
LLAMA_BACKEND=vulkan llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99
| Parameter | Description | Default |
|---|---|---|
| `-m` | Model path (GGUF file) | required |
| `-p` | Prompt sizes to test | 512 |
| `-n` | Generation lengths to test | 128 |
| `-ngl` | GPU layers to offload | 99 |
| `-t` | CPU threads | auto |
| `-dev` | Device selection | auto |
For consistent comparisons across models, use:
-p 512,1024,2048 -n 128,256 -ngl 99
This tests:
| Metric | Meaning | Good Performance |
|---|---|---|
| pp512 | Prompt processing speed at 512 tokens | >1000 t/s |
| pp1024 | Prompt processing speed at 1024 tokens | >1000 t/s |
| pp2048 | Prompt processing speed at 2048 tokens | >1000 t/s |
| tg128 | Token generation speed (128 tokens) | >50 t/s |
| tg256 | Token generation speed (256 tokens) | >50 t/s |
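The targets in the table can be checked mechanically once you have a tokens-per-second figure for a test. The following is a minimal sketch, not part of llama-bench: `rate_result` is a hypothetical helper, and the sample numbers passed to it are made up for illustration.

```shell
#!/usr/bin/env bash
# Rate one benchmark result against the rule-of-thumb targets in the table.
# rate_result is a hypothetical helper, not a llama.cpp command.
# Usage: rate_result <test-name> <tokens-per-second>
rate_result() {
  local test_name="$1" tps="$2" target
  case "$test_name" in
    pp*) target=1000 ;;  # prompt processing target (t/s)
    tg*) target=50 ;;    # token generation target (t/s)
    *) echo "unknown"; return 1 ;;
  esac
  # awk does the floating-point comparison that plain [ ] cannot
  if awk -v v="$tps" -v t="$target" 'BEGIN { exit !(v > t) }'; then
    echo "good"
  else
    echo "below target"
  fi
}

# Illustrative values only, not measured results
rate_result pp512 1532.4   # prints "good"
rate_result tg128 38.7     # prints "below target"
```

The thresholds are the ones from the table; what counts as good depends on your hardware and model size, so adjust them as needed.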
llama-bench auto-detects available backends. Priority order:
To force a backend, set an environment variable or check which backends your build includes:
# Check available backends
llama-bench --help | grep -i "backend\|cuda\|rocm\|vulkan"
Use the provided script for benchmarking multiple models:
./scripts/benchmark_models.sh /path/to/models/*.gguf
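The contents of `benchmark_models.sh` are not shown here, so the following is only a sketch of what a multi-model loop might look like, using the standard test configuration. `bench_all` and the `LLAMA_BENCH` override variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Run the standard benchmark configuration on every model passed as an argument.
# bench_all and LLAMA_BENCH are hypothetical; the real script may work differently.
bench_all() {
  local bench="${LLAMA_BENCH:-llama-bench}" model
  for model in "$@"; do
    # Skip arguments that are not regular files (e.g. an unmatched glob)
    if [ ! -f "$model" ]; then
      echo "skip: $model (not a file)" >&2
      continue
    fi
    echo "== $model =="
    "$bench" -m "$model" -p 512,1024,2048 -n 128,256 -ngl 99
  done
}

# Example call (commented out so the snippet has no side effects):
# bench_all /path/to/models/*.gguf
```

Setting `LLAMA_BENCH=echo` turns the loop into a dry run that prints each command line instead of executing it.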
Output can be redirected to a file:
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > results.txt
Or use the benchmark script, which auto-saves results to timestamped files.
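If you are not using the script, you can get timestamped result files by hand. This is a minimal sketch; `results_file` is a hypothetical helper, and its naming scheme is an assumption, not necessarily the one the benchmark script uses.

```shell
#!/usr/bin/env bash
# Build an output name like "model_20250101_120000.txt" from a model path.
# results_file and its naming scheme are assumptions for illustration.
results_file() {
  local model="$1" stamp
  stamp="$(date +%Y%m%d_%H%M%S)"
  echo "$(basename "$model" .gguf)_${stamp}.txt"
}

# Run only when llama-bench is on PATH and the model file exists
if command -v llama-bench >/dev/null 2>&1 && [ -f model.gguf ]; then
  # tee shows the table on screen while also saving it
  llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 \
    | tee "$(results_file model.gguf)"
fi
```

Using `tee` instead of `>` keeps the results visible in the terminal while still writing them to the file.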
- Lower `-ngl` (GPU layers) or test smaller prompt sizes
- Check that `-t` matches CPU core count

./scripts/build_llamacpp.sh -v
Shows:
# Interactive mode (prompts for backend selection)
./scripts/build_llamacpp.sh -u
# Specify backend directly
./scripts/build_llamacpp.sh -u -b vulkan # Vulkan (AMD/Intel GPUs)
./scripts/build_llamacpp.sh -u -b cuda # CUDA (NVIDIA GPUs)
./scripts/build_llamacpp.sh -u -b rocm # ROCm (AMD GPUs)
./scripts/build_llamacpp.sh -u -b cpu # CPU only
# Clean rebuild
./scripts/build_llamacpp.sh -c -b vulkan
# Custom build directory
./scripts/build_llamacpp.sh -u -b cuda -d /custom/path
| Flag | Description |
|---|---|
| `-v` | Show version info and exit |
| `-u` | Update to latest from GitHub |
| `-c` | Clean build (remove existing) |
| `-b` | Backend: vulkan, cuda, rocm, cpu |
| `-d` | Build directory path |
| `-j` | Parallel jobs (default: CPU count) |
The benchmark script auto-detects llama-bench in these locations:
- /DATA/Benchmark/llama.cpp/build/bin/llama-bench
- ~/Repo/llama.cpp/build/bin/llama-bench
- ~/lab/build/bin/llama-bench

If not found, it will search your home directory, or you can build it using the script above.
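The lookup described above can be approximated as follows. This is a sketch under assumptions, not the script's actual code: `find_llama_bench` is a hypothetical function that checks candidate paths in order and then falls back to searching the home directory.

```shell
#!/usr/bin/env bash
# Print the first executable llama-bench among the candidate paths given as
# arguments, falling back to a search of $HOME.
# find_llama_bench is a hypothetical helper; the real script may differ.
find_llama_bench() {
  local candidate
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  # Fallback: first executable file named llama-bench under $HOME.
  # grep . makes the function return non-zero when nothing was found.
  find "$HOME" -type f -name llama-bench -perm -u+x -print -quit 2>/dev/null | grep .
}

# Example call with the locations listed above (commented out because the
# fallback search can be slow on a large home directory):
# find_llama_bench \
#   /DATA/Benchmark/llama.cpp/build/bin/llama-bench \
#   "$HOME/Repo/llama.cpp/build/bin/llama-bench" \
#   "$HOME/lab/build/bin/llama-bench" \
#   || echo "llama-bench not found; build it first" >&2
```

Checking the fixed paths first keeps the common case fast; the home-directory search only runs when none of them exists.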