Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.
Thinking mode: Use ultrathink for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.
Dependencies:

```bash
go install golang.org/x/perf/cmd/benchstat@latest
```

Performance improvement does not exist without measurement: if you cannot measure it, you cannot improve it.
This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → see the samber/cc-skills-golang@golang-performance skill. For pprof setup on running services, → see the samber/cc-skills-golang@golang-troubleshooting skill.
b.Loop() (Go 1.24+): preferred

For Go 1.24+, prefer b.Loop() for new benchmarks. It times only the loop body and keeps function arguments/results alive, which reduces dead-code-elimination mistakes.
```go
func BenchmarkParse(b *testing.B) {
	data := loadFixture("large.json") // setup: excluded from timing
	for b.Loop() {
		Parse(data) // compiler cannot eliminate this call
	}
}
```
Legacy b.N loops still compile and are fine to keep when preserving existing benchmarks or supporting Go <1.24. They are easier to get wrong: setup may need b.ResetTimer(), and results may need a sink if the compiler can eliminate the work. Go 1.26 fixed an earlier b.Loop() inlining limitation — benchmarks on 1.24–1.25 already benefit from b.Loop() but may miss inlining optimizations that 1.26 delivers.
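A minimal sketch of the legacy b.N style for comparison, using strings.Fields as a stand-in workload and a hypothetical buildFixture helper. Because the testing package calls a b.N benchmark several times with a growing b.N, the setup reruns on each call; b.ResetTimer() keeps it out of the measurement, and the package-level sink keeps the result reachable.

```go
package main

import (
	"strings"
	"testing"
)

// sinkFields keeps the benchmark result reachable so the compiler
// cannot treat the strings.Fields call as dead code.
var sinkFields []string

// buildFixture is a hypothetical setup helper standing in for
// expensive fixture loading.
func buildFixture() string {
	return strings.Repeat("alpha beta gamma ", 1000)
}

// BenchmarkFieldsLegacy uses a b.N loop, which works on any Go version.
func BenchmarkFieldsLegacy(b *testing.B) {
	input := buildFixture() // setup: reruns for every b.N ramp-up call
	b.ResetTimer()          // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		sinkFields = strings.Fields(input)
	}
}
```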
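```go
// sink is package-level, so each slice escapes to the heap and the
// allocation is actually measured instead of being placed on the stack
// or optimized away.
var sink []byte

func BenchmarkAlloc(b *testing.B) {
	b.ReportAllocs() // or run with the -benchmem flag
	for b.Loop() {
		sink = make([]byte, 1024)
	}
}
```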
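Allocation budgets can also be enforced in an ordinary test, so a regression fails CI even when benchmarks are not run. A minimal sketch using the standard library's testing.AllocsPerRun, which returns the average number of allocations per call of a function; appendDigits is a hypothetical function under test that should not allocate when its buffer has spare capacity.

```go
package main

import (
	"strconv"
	"testing"
)

// appendDigits is a hypothetical function under test: it appends the
// decimal form of n to dst and should not allocate when dst has spare capacity.
func appendDigits(dst []byte, n int) []byte {
	return strconv.AppendInt(dst, int64(n), 10)
}

// appendDigitsAllocs measures the average heap allocations per call.
func appendDigitsAllocs() float64 {
	buf := make([]byte, 0, 32) // allocated once, before measurement starts
	return testing.AllocsPerRun(100, func() {
		buf = appendDigits(buf[:0], 123456789)
	})
}

// TestAppendDigitsAllocs fails if appendDigits starts allocating.
func TestAppendDigitsAllocs(t *testing.T) {
	if allocs := appendDigitsAllocs(); allocs != 0 {
		t.Fatalf("appendDigits: got %.1f allocs/op, want 0", allocs)
	}
}
```

In a real package, the test would sit in a _test.go file next to the code it guards.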
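b.ReportMetric() adds custom metrics (e.g., throughput). Call it after the loop: b.Loop() resets the timer when it starts, which deletes metrics reported earlier, and once the loop ends b.Elapsed() covers the whole timed loop. Here totalBytes stands for the bytes processed across all iterations:

```go
b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s") // call after the b.Loop() loop finishes
```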
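A minimal sketch of a complete throughput benchmark, using hash/crc32 from the standard library as a stand-in workload. b.SetBytes makes go test print an MB/s column on its own; the custom bytes/s metric is computed from b.Elapsed() and reported after the loop. Both are shown for illustration, though one is usually enough.

```go
package main

import (
	"hash/crc32"
	"testing"
)

// BenchmarkChecksum measures CRC-32 throughput over a 64 KiB buffer.
func BenchmarkChecksum(b *testing.B) {
	data := make([]byte, 64<<10) // setup: excluded from timing by b.Loop
	b.SetBytes(int64(len(data))) // built-in MB/s column
	var totalBytes int64
	for b.Loop() {
		crc32.ChecksumIEEE(data) // result kept alive by b.Loop
		totalBytes += int64(len(data))
	}
	// The loop has finished, so b.Elapsed() is the total timed duration.
	b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
}
```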
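Sub-benchmarks with b.Run() measure the same code across several input sizes:

```go
func BenchmarkEncode(b *testing.B) {
	for _, size := range []int{64, 256, 4096} {
		b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
			data := make([]byte, size) // per-size setup, excluded from timing
			for b.Loop() {
				Encode(data)
			}
		})
	}
}
```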
Running benchmarks:

```bash
go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt
```
| Flag                 | Purpose                                                                   |
| -------------------- | ------------------------------------------------------------------------- |
| -bench=.             | Run all benchmarks (the value is a regexp filter)                         |
| -benchmem            | Report allocations (B/op, allocs/op)                                      |
| -count=10            | Run each benchmark 10 times so benchstat can assess variance              |
| -benchtime=3s        | Target run time per benchmark (default 1s); Nx runs exactly N iterations |
| -cpu=1,2,4           | Run with each listed GOMAXPROCS value                                     |
| -cpuprofile=cpu.prof | Write CPU profile                                                         |
| -memprofile=mem.prof | Write memory profile                                                      |
| -trace=trace.out     | Write execution trace                                                     |
Output format:

```text
BenchmarkEncode/size=64-8   5000000   230.5 ns/op   128 B/op   2 allocs/op
```

The -8 suffix is GOMAXPROCS, 5000000 is the number of iterations run, ns/op is time per operation, B/op is bytes allocated per op, and allocs/op is the number of heap allocations per op.
Paste benchstat output in the commit body when the change has a measurable performance impact. This documents _why_ an optimization was made, prevents future readers from reverting it, and lets reviewers verify the claim without re-running benchmarks.
Commit format:
```text
perf(parser): reduce Parse allocations 50% with sync.Pool

Replace per-call []byte allocation with a pooled buffer.

goos: linux / goarch: amd64 / cpu: AMD Ryzen 9 5950X

│ old │ new │
│ sec/op │ sec/op vs base │
Parse-32 4.592µ ± 2% 3.041µ ± 1% -33.78% (p=0.000 n=10)

│ old │ new │
│ B/op │ B/op vs base │
Parse-32 1.024Ki ± 0% 0.512Ki ± 0% -50.00% (p=0.000 n=10)

│ old │ new │
│ allocs/op │ allocs/op vs base │
Parse-32 12.00 ± 0% 6.000 ± 0% -50.00% (p=0.000 n=10)
```
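To read the vs base column in the benchstat output above: benchstat summarizes each side by its median and reports the relative change between the two medians; the ± value is a confidence interval and the p-value comes from a Mann-Whitney U test, neither of which is reproduced here. A minimal sketch of the median and percent-change arithmetic only, not benchstat's implementation:

```go
package main

import (
	"math"
	"sort"
)

// median returns the middle value of samples (the mean of the two middle
// values for an even count). samples must be non-empty; it is not modified.
func median(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// percentDelta is the relative change from oldMedian to newMedian, in percent.
func percentDelta(oldMedian, newMedian float64) float64 {
	return (newMedian - oldMedian) / oldMedian * 100
}

// roundTo2 rounds to two decimals, as benchstat displays deltas.
func roundTo2(x float64) float64 {
	return math.Round(x*100) / 100
}
```

For the Parse example, roundTo2(percentDelta(4.592, 3.041)) is -33.78, matching the sec/op row.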
Rules:

- If benchstat shows ~ (no statistical significance), the improvement cannot be claimed
- Include the environment header (goos/goarch/cpu) so results are reproducible
- Use the perf(scope): commit type for performance-only changes

Generate profiles directly from benchmark runs; no HTTP server is needed:
```bash
# CPU profile
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof

# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof

# Execution trace
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out
```
For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see pprof Reference. For execution trace interpretation, see Trace Reference. For statistical comparison, see benchstat Reference.
prometheus/client_golang. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between runtime/metrics (Go internal data) and Prometheus metrics (what you scrape from /metrics). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.

- samber/cc-skills-golang@golang-performance skill for optimization patterns to apply after measuring ("if X bottleneck, apply Y")
- samber/cc-skills-golang@golang-troubleshooting skill for pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodology
- samber/cc-skills-golang@golang-observability skill for everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)
- samber/cc-skills-golang@golang-testing skill for general testing practices
- samber/cc-skills@promql-cli skill for querying Prometheus runtime metrics in production to validate benchmark findings