Linux Kernel Crash Debug

Debug Linux kernel crashes using the crash utility and memory debugging tools. Use when users mention kernel crash, kernel panic, vmcore analysis, kernel dump, and similar situations.
Author: crazyss
Category: Data Analysis · clawhub · v1.3.2 · 4 versions · API key: not required

Linux Kernel Crash Debugging

This skill guides you through analyzing Linux kernel crash dumps using the crash utility.

Installation

Claude Code

claude skill install linux-kernel-crash-debug.skill

OpenClaw

# Method 1: Install via ClawHub
clawhub install linux-kernel-crash-debug

# Method 2: Manual installation
mkdir -p ~/.openclaw/workspace/skills/linux-kernel-crash-debug
cp SKILL.md ~/.openclaw/workspace/skills/linux-kernel-crash-debug/

Quick Start

Starting a Session

# Analyze a dump file
crash vmlinux vmcore

# Debug a running system
crash vmlinux

# Raw RAM dump
crash vmlinux ddr.bin@0x80000000

Core Debugging Workflow

1. crash> sys              # Confirm panic reason
2. crash> log              # View kernel log
3. crash> bt               # Analyze call stack
4. crash> struct <type>    # Inspect data structures
5. crash> kmem <addr>      # Memory analysis

🤖 Agent Execution Directives

If you are an AI/Agent using this skill, do not invoke crash interactively as it will block your subshell.

  1. Use the bundled wrapper ./scripts/agent-crash.sh which maps precisely to the workflows below but safely truncates outputs:
    • ./scripts/agent-crash.sh -k vmlinux -c vmcore triage - Safely runs initial sys, log, and bt.
    • ./scripts/agent-crash.sh -k vmlinux -c vmcore flow-oom - Top 15 memory checks.
    • ./scripts/agent-crash.sh -k vmlinux -c vmcore flow-deadlock - Pulls UN task stacks.
    • ./scripts/agent-crash.sh -k vmlinux -c vmcore dis-regs - Assembly regression.
    • ./scripts/agent-crash.sh -k vmlinux -c vmcore check-poison - Pattern match memory poisons.
  2. Fallback Strategy: If macros don't solve the issue, fall back to basic primitives manually: ./scripts/agent-crash.sh -k vmlinux -c vmcore run "rd ffff880123456780".
  3. Check references/agentic-heuristics.md for extended expert methodologies.
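
When the bundled wrapper is not at hand, the same non-blocking idea can be sketched with crash's own batch options. This is a minimal sketch, not the implementation of agent-crash.sh: the helper names `write_triage_cmds` and `run_crash_batch` and the 200-line default cap are assumptions made for illustration. It relies on `crash -i` (read commands from a file) and `-s` (silent start-up), and it ends the command file with `exit` so the session never waits at a prompt.

```shell
# Hypothetical helper: write a command file for a non-interactive triage run.
write_triage_cmds() {
    # $1: path of the command file to create
    cat > "$1" <<'EOF'
sys
log
bt
exit
EOF
}

# Hypothetical helper: run crash in batch mode and keep only the first N lines.
run_crash_batch() {
    # $1: vmlinux, $2: vmcore, $3: command file, $4: max lines (default 200)
    crash -s -i "$3" "$1" "$2" 2>&1 | head -n "${4:-200}"
}
```

A typical call would be `write_triage_cmds /tmp/triage.cmds && run_crash_batch vmlinux vmcore /tmp/triage.cmds 100`.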

Prerequisites

| Item | Requirement |
|------|-------------|
| vmlinux | Must have debug symbols (CONFIG_DEBUG_INFO=y) |
| vmcore | kdump/netdump/diskdump/ELF format |
| Version | vmlinux must exactly match the vmcore kernel version |
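
The version requirement can be checked before a session is started. The following is a minimal sketch under two assumptions: the vmlinux banner is readable with `strings`, and the vmcore is a kdump file whose header `crash --osrelease` can read. `banner_release` and `releases_match` are hypothetical helper names.

```shell
# Hypothetical helper: print the release (third word) of a "Linux version ..." banner.
banner_release() {
    printf '%s\n' "$1" | awk '/^Linux version /{ print $3; exit }'
}

# Hypothetical helper: succeed only when both release strings are non-empty and equal.
releases_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Typical use (commented out so the block has no side effects):
#   vmlinux_rel=$(banner_release "$(strings vmlinux | grep -m1 '^Linux version ')")
#   vmcore_rel=$(crash --osrelease vmcore)
#   releases_match "$vmlinux_rel" "$vmcore_rel" || echo "vmlinux/vmcore mismatch" >&2
```

Equal release strings are a necessary check, not proof that both files come from the same build.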

Package Installation

Anolis OS / Alibaba Cloud Linux

# Install crash utility
sudo dnf install crash

# Install kernel debuginfo (match your kernel version)
sudo dnf install kernel-debuginfo-$(uname -r)

# Install additional analysis tools
sudo dnf install gdb binutils makedumpfile   # binutils provides readelf and objdump

# Optional: Install kernel-devel for source code reference
sudo dnf install kernel-devel-$(uname -r)

RHEL / CentOS / Rocky / AlmaLinux

sudo dnf install crash kernel-debuginfo-$(uname -r)
sudo dnf install gdb binutils makedumpfile

Ubuntu / Debian

sudo apt install crash linux-crashdump gdb binutils makedumpfile
sudo apt install linux-image-$(uname -r)-dbgsym

Self-compiled Kernel

# Enable debug symbols in kernel config
make menuconfig  # Enable CONFIG_DEBUG_INFO, CONFIG_DEBUG_INFO_REDUCED=n

# Or set directly
scripts/config --enable CONFIG_DEBUG_INFO
scripts/config --enable CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT

Verify Installation

# Check crash version
crash --version

# Verify debuginfo matches kernel
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /proc/kcore
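
The debug vmlinux does not live in the same place on every distribution. As a minimal sketch, the hypothetical helper `find_debug_vmlinux` below probes the path used above plus `/usr/lib/debug/boot/vmlinux-<release>`, which is where Ubuntu dbgsym packages commonly install it. The optional second argument (a root prefix) is an assumption added so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: print the first debug vmlinux found for a kernel release.
find_debug_vmlinux() {
    # $1: kernel release (e.g. the output of uname -r), $2: optional root prefix
    rel=$1
    root=${2:-}
    for p in \
        "$root/usr/lib/debug/lib/modules/$rel/vmlinux" \
        "$root/usr/lib/debug/boot/vmlinux-$rel"
    do
        if [ -f "$p" ]; then
            printf '%s\n' "$p"
            return 0
        fi
    done
    return 1
}
```

For example, `crash "$(find_debug_vmlinux "$(uname -r)")" /proc/kcore` starts a live session with whichever file was found.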

Core Command Reference

Debugging Analysis

| Command | Purpose | Example |
|---------|---------|---------|
| `sys` | System info/panic reason | `sys`, `sys -i` |
| `log` | Kernel message buffer | `log`, `log \| tail` |
| `bt` | Stack backtrace | `bt`, `bt -a`, `bt -f` |
| `struct` | View structures | `struct task_struct` |
| `p`/`px`/`pd` | Print variables | `p jiffies`, `px current` |
| `kmem` | Memory analysis | `kmem -i`, `kmem -S` |

Tasks and Processes

| Command | Purpose | Example |
|---------|---------|---------|
| `ps` | Process list | `ps`, `ps -m \| grep UN` |
| `set` | Switch context | `set <pid>`, `set -p` |
| `foreach` | Batch task operations | `foreach bt`, `foreach UN bt` |
| `task` | task_struct contents | `task` |
| `files` | Open files | `files` |

Memory Operations

| Command | Purpose | Example |
|---------|---------|---------|
| `rd` | Read memory | `rd <addr>`, `rd -p <addr>` |
| `search` | Search memory | `search -k deadbeef` |
| `vtop` | Address translation | `vtop <addr>` |
| `list` | Traverse linked lists | `list task_struct.tasks -h <addr>` |

bt Command Details

The most important debugging command:

crash> bt              # Current task stack
crash> bt -a           # All CPU active tasks
crash> bt -f           # Expand stack frame raw data
crash> bt -F           # Symbolic stack frame data
crash> bt -l           # Show source file and line number
crash> bt -e           # Search for exception frames
crash> bt -v           # Check stack overflow
crash> bt -R <sym>     # Only show stacks referencing symbol
crash> bt <pid>        # Specific process

Context Management

A crash session has a "current context" that determines which task the bt, files, and vm commands operate on:

crash> set              # View current context
crash> set <pid>        # Switch to specified PID
crash> set <task_addr>  # Switch to task address
crash> set -p           # Restore to panic task

Session Control

# Output control
crash> set scroll off   # Disable pagination
crash> sf               # Alias for scroll off

# Output redirection
crash> foreach bt > bt.all

# GDB passthrough
crash> gdb bt           # Single gdb invocation
crash> set gdb on       # Enter gdb mode (prompt changes to gdb>)
gdb> info registers
gdb> set gdb off        # Return to crash mode

# Read commands from file
crash> < commands.txt

ARM64 / x86_64 Quick Reference

Architecture Differences in crash Analysis

| Aspect | x86_64 | ARM64 |
|--------|--------|-------|
| crash command | `crash vmlinux vmcore` | `crash_arm64 ... -m ... vmlinux vmcore` |
| KASLR | Handled automatically from VMCOREINFO | Pass `--kaslr <offset>` if not auto-detected |
| Virtual address bits | Fixed | Pass `-m vabits_actual=<bits>` if not auto-detected |
| Physical base | `phys_base` from VMCOREINFO | Pass `-m phys_offset=<addr>` if not auto-detected |
| VA-PA offset | `__START_KERNEL_map` | Pass `-m kimage_voffset=<addr>` if not auto-detected |
| Frame pointer | RBP (often optimized away) | FP (x29), explicit |
| Calling convention | RDI/RSI/RDX/RCX/R8/R9 | X0-X7 |

> For complete ARM64 address parameter derivation, see references/arm64-crash-params.md.

> For kdump end-to-end setup, see references/kdump-setup-guide.md.

ARM64 Crash Command Template

crash_arm64 \
  -m vabits_actual=39 \
  -m phys_offset=0x80000000 \
  -m kimage_voffset=0xffffffc000000000 \
  --kaslr 0x0 \
  vmlinux vmcore

> A KASLR offset of 0 means KASLR is disabled. Otherwise derive the offset from /proc/kallsyms or from the KERNELOFFSET entry in VMCOREINFO.

Typical Debugging Scenarios

Kernel BUG Location

crash> sys                    # Confirm panic
crash> log | tail -50         # View logs
crash> bt                     # Call stack
crash> bt -f                  # Expand frames for parameters
crash> struct <type> <addr>   # Inspect data structures

Deadlock Analysis

crash> bt -a                  # All CPU call stacks
crash> ps -m | grep UN        # Uninterruptible processes
crash> foreach UN bt          # View waiting reasons
crash> struct mutex <addr>    # Inspect lock state

Memory Issues

crash> kmem -i                # Memory statistics
crash> kmem -S <cache>        # Inspect slab
crash> vm <pid>               # Process memory mapping
crash> search -k <pattern>    # Search memory

Stack Overflow

crash> bt -v                  # Check stack overflow
crash> bt -r                  # Raw stack data

Advanced Techniques

Deriving Lock Pointers from Stack Backtrace (ARM64)

> Source: Kernel panic Lab (Kernel panic 实验室), "Kernel panic in practice: deriving a read-write lock" (Kernel panic 实战之读写锁推导)

When a task is blocked waiting for a lock, you can derive the lock address by reading callee-saved registers from the stack:

# 1. Find FP (frame pointer) from backtrace
#    The value in [...] is the FP of that function
crash> bt
PID: 1234
#3 [fffffc09c4f3ab0] schedule_preempt_disable
#4 [fffffc09c4f3b30] rwsem_down_write_slowpath
#5 [fffffc09c4f3b90] down_write

# 2. Disassemble the calling function to find where it puts the lock pointer
crash> dis -xl down_write
    mov  x0, x19                # x0 = x19 (lock pointer)
    mov  w1, #0x2
    bl   rwsem_down_write_slowpath

# 3. Disassemble the callee to find where x19 is saved to stack
crash> dis -xl rwsem_down_write_slowpath
    stp  x20, x19, [sp, #176]   # x20 saved at sp+176, x19 at sp+184

# 4. Calculate SP from FP: SP = FP - 0x60 (from "add x29, sp, #0x60")
#    rwsem_down_write_slowpath FP = 0xfffffc09c4f3b30
#    SP = 0xfffffc09c4f3b30 - 0x60 = 0xfffffc09c4f3ad0

# 5. Read x19 from stack: SP + 184 = 0xfffffc09c4f3b88
crash> rd 0xfffffc09c4f3b88
    fffffc09c4f3b88:  fffff80f78b0b00    ← This is the lock address!

# 6. Inspect the lock
crash> struct rw_semaphore fffff80f78b0b00 -x

Why it works: x19-x28 are callee-saved in the AArch64 ABI, so a callee must save them on the stack before clobbering them. By locating where the callee saved the register, you can recover the lock address that the caller kept in it.
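
The address arithmetic in the steps above can be checked with a small worked example. This is a sketch using the values from the backtrace; `saved_reg_addr` is a hypothetical helper name. It assumes the prologue sets `x29 = sp + 0x60`, and that `stp x20, x19, [sp, #176]` stores x20 at sp+176 and x19 eight bytes above it, at sp+184.

```shell
# Hypothetical helper: address of a saved register inside a callee's frame.
#   $1: frame pointer (FP) of the callee, taken from the bt output
#   $2: distance from SP to FP (the immediate in "add x29, sp, #imm")
#   $3: offset of the register's stack slot from SP
saved_reg_addr() {
    printf '0x%x\n' $(( $1 - $2 + $3 ))
}

# x19 is the second register of the stp pair, so its slot is 176 + 8 = 184.
saved_reg_addr 0xfffffc09c4f3b30 0x60 184   # prints 0xfffffc09c4f3b88
```

The printed address is the one passed to `rd` to read the saved lock pointer.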

> x86_64 equivalent: Use RBP chain with bt -f. Note that with -fomit-frame-pointer, this technique may fail; in that case use bt -F or look for explicit stack frames.

Memory Leak Diagnostic (Three-Layer Check)

> Source: Kernel panic Lab (Kernel panic 实验室), "Kernel driver memory leak troubleshooting guide" (Kernel driver 内存泄露问题排查指南)

Three independent paths to diagnose memory leaks:

# === Layer 1: the /proc trio (read from a running system or from captured info) ===
# MemAvailable keeps falling while SUnreclaim keeps rising → slab memory leak
cat /proc/meminfo
cat /proc/slabinfo
cat /proc/buddyinfo

# === Layer 2: SLAB-specific (slub_debug) ===
# In bootargs: slub_debug=u,kmalloc-512
# Then read:
cat /sys/kernel/debug/slab/kmalloc-512/alloc_traces
cat /sys/kernel/debug/slab/kmalloc-512/free_traces

# === Layer 3: >8K allocations (page_owner) ===
# SUnreclaim rises but slabinfo stays flat → kmalloc > 8K goes straight to alloc_pages
# Enable CONFIG_PAGE_OWNER and boot with page_owner=on
# Capture one dump early and another after the leak has grown:
cat /sys/kernel/debug/page_owner > page_owner_begin.txt
cat /sys/kernel/debug/page_owner > page_owner_end.txt
# Cull each dump (page_owner_sort takes an input file and an output file), then diff:
./page_owner_sort --cull name,ator,stacktrace page_owner_begin.txt begin.txt
./page_owner_sort --cull name,ator,stacktrace page_owner_end.txt   end.txt
# Compare begin.txt vs end.txt: stacks whose counts keep rising are leak candidates

# === Alternative: kmemleak ===
# CONFIG_DEBUG_KMEMLEAK + kmemleak=on bootarg
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
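
The Layer 1 check depends on a trend, so it helps to sample the same field more than once. A minimal sketch follows: `meminfo_kb` and `growth_kb` are hypothetical helper names, and the optional file argument is an assumption added so that a saved copy of /proc/meminfo can be examined as well.

```shell
# Hypothetical helper: print one field (in kB) from a meminfo-style file.
meminfo_kb() {
    # $1: field name such as SUnreclaim, $2: file (defaults to /proc/meminfo)
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}

# Hypothetical helper: growth in kB between an earlier and a later sample.
growth_kb() {
    echo $(( $2 - $1 ))
}

# Typical use (commented out so the block has no side effects):
#   before=$(meminfo_kb SUnreclaim); sleep 600; after=$(meminfo_kb SUnreclaim)
#   echo "SUnreclaim grew by $(growth_kb "$before" "$after") kB"
```

A single positive delta proves little; growth that persists across many samples while the workload is steady is what points at a leak.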

Chained Queries

crash> bt -f                  # Get pointers
crash> struct file.f_dentry <addr>
crash> struct dentry.d_inode <addr>
crash> struct inode.i_pipe <addr>

Batch Slab Inspection

crash> kmem -S inode_cache | grep counter | grep -v "= 1"

Kernel Linked List Traversal

crash> list task_struct.tasks -s task_struct.pid -h <start>
crash> list -h <addr> -s dentry.d_name.name

Extended Reference

For detailed information, refer to the following reference files:

| File | Content |
|------|---------|
| references/advanced-commands.md | Advanced commands: list, rd, search, vtop, kmem, foreach |
| references/vmcore-format.md | vmcore file format, ELF structure, VMCOREINFO |
| references/case-studies.md | Debugging cases: kernel BUG, deadlock, OOM, NULL pointer, stack overflow |
| references/debug-tools-guide.md | Advanced debugging tools: KASAN, Kprobes, Kmemleak, UBSAN (require kernel rebuild) |
| references/kdump-setup-guide.md | NEW: End-to-end kdump configuration (x86_64 + ARM64, crashkernel syntax, sysrq triggers) |
| references/arm64-crash-params.md | NEW: ARM64-specific crash address parameters (vabits_actual, phys_offset, kimage_voffset, kaslr) |
| references/sources.md | NEW: Complete bibliography of reference materials used to enhance this skill |

Usage:

crash> help <command>        # Built-in help
# Or ask Claude to view reference files

Common Errors

crash: vmlinux and vmcore do not match!
# -> Ensure vmlinux version exactly matches vmcore

crash: cannot find booted kernel
# -> Specify vmlinux path explicitly

crash: cannot resolve symbol
# -> Check if vmlinux has debug symbols

Security Warnings

⚠️ Dangerous Operations

The following commands can cause system damage or data loss:

| Command | Risk | Recommendation |
|---------|------|----------------|
| `wr` | Writes to live kernel memory | NEVER use on production systems - can crash or corrupt running kernel |
| GDB passthrough | Unrestricted memory access | Use with caution, may modify memory or registers |

🔒 Sensitive Data Handling

  • vmcore files contain complete kernel memory, potentially including:
    • User process memory and credentials
    • Encryption keys and secrets
    • Network connection data and passwords
  • Access control: Restrict vmcore file access to authorized personnel
  • Secure storage: Store dump files in encrypted or access-controlled directories
  • Secure disposal: Use shred or another secure-delete tool when disposing of vmcore files
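
The access-control and disposal points above can be sketched as two small helpers. This is an illustration under assumptions, not a complete policy: `lock_down_dump` and `dispose_dump` are hypothetical names, and `shred` is taken to be the GNU coreutils tool.

```shell
# Hypothetical helper: make a dump readable and writable by its owner only.
lock_down_dump() {
    chmod 600 "$1"
}

# Hypothetical helper: overwrite and unlink a dump that is no longer needed.
# Overwriting in place is less effective on copy-on-write or journaling
# file systems and on SSDs.
dispose_dump() {
    shred -u "$1"
}
```

For example, `lock_down_dump /var/crash/vmcore` right after capture, and `dispose_dump /var/crash/vmcore` once the analysis is archived; the path is only a placeholder.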

🛡️ Best Practices

  1. Only analyze vmcore files in isolated/test environments when possible
  2. Never share raw vmcore files publicly without sanitization
  3. Consider using makedumpfile -d <dump_level> to exclude user-process and page-cache pages (for example, -d 31) before analysis
  4. Document and audit all crash analysis sessions for compliance
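
The `-d` value passed to makedumpfile is a bitmask of page types to exclude: 1 for zero pages, 2 for non-private cache pages, 4 for private cache pages, 8 for user-process data, and 16 for free pages. The sketch below builds that number from names; `dump_level` and the names it accepts are hypothetical conveniences, not part of makedumpfile.

```shell
# Hypothetical helper: turn page-type names into a makedumpfile dump level.
dump_level() {
    level=0
    for kind in "$@"; do
        case "$kind" in
            zero)          level=$(( level | 1 )) ;;
            cache)         level=$(( level | 2 )) ;;
            cache_private) level=$(( level | 4 )) ;;
            user)          level=$(( level | 8 )) ;;
            free)          level=$(( level | 16 )) ;;
            *) echo "unknown page type: $kind" >&2; return 1 ;;
        esac
    done
    echo "$level"
}

dump_level zero cache cache_private user free   # prints 31
```

Excluding user data (bit 8) removes most process memory from the dump, which also means user-space state is no longer available during analysis.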

Important Notes

  1. Version Match: vmlinux must exactly match the vmcore kernel version
  2. Debug Info: Must use vmlinux with debug symbols
  3. Context Awareness: the bt, files, and vm commands operate on the current context
  4. Live System Modification: the wr command modifies the running kernel and is extremely dangerous

Resources

Contributing

This is an open-source project. Contributions are welcome!

  • GitHub Repository: https://github.com/crazyss/linux-kernel-crash-debug
  • Report Issues: GitHub Issues
  • Submit PRs: Pull requests are welcome for bug fixes, new features, or documentation improvements

See CONTRIBUTING.md for guidelines.

Version History

4 versions in total

  • v1.3.2 (current)
    2026-06-14 22:56
  • v1.0.4
    2026-05-03 03:24 Safe
  • v1.0.2
    2026-03-29 09:36
  • v1.0.0
    2026-03-26 21:40

