Sungju's Slow Life

Personal journal


kernel

  • How to track SLAB usage using slub_debug=U

    When kernel-side memory usage is high, the most common cause is high SLAB usage. Because SLAB memory blocks contain only data, it is hard to tell which part of the code is consuming them. Fortunately, the kernel option slub_debug=U saves a backtrace of each allocation and free call.… Continue reading
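
    Before looking for the saved backtraces, it helps to confirm that the running kernel was actually booted with the option. The following is a minimal sketch, not taken from the post: the helper name slub_debug_tracks_users() is hypothetical, and it only recognizes the explicit slub_debug=<flags>[,<slabs>] form on a command line string such as the contents of /proc/cmdline.

```c
#include <string.h>

/*
 * Return 1 if the given kernel command line contains a slub_debug=
 * parameter whose flag list includes 'U' (user tracking), else 0.
 * Hypothetical helper for illustration only.
 */
int slub_debug_tracks_users(const char *cmdline)
{
    const char *key = "slub_debug=";
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept the match only at the start of a parameter. */
        if (p == cmdline || p[-1] == ' ') {
            const char *f = p + strlen(key);

            /* Flags end at ',' (slab name list), whitespace, or end of string. */
            for (; *f && *f != ',' && *f != ' ' && *f != '\n'; f++)
                if (*f == 'U')
                    return 1;
        }
        p += strlen(key);
    }
    return 0;
}
```

    In practice you would read /proc/cmdline into a buffer and pass it to this function; a result of 0 means the U flag was not requested in that form.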

  • 5-level paging in KVM

    Recent kernels, such as those in RHEL 8 and above, support 5-level page tables. You can check whether this is available/enabled by looking at /boot/config-$(uname -r). If you don’t want to use 5-level page tables, you can disable them by adding ‘no5lvl’ as a kernel parameter in the grub file (/boot/grub2/grubenv) and rebooting the system. This 5 level… Continue reading
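
    The config check can be sketched in a few lines. This is an illustration only: the excerpt does not name the option, so CONFIG_X86_5LEVEL is an assumption based on the upstream option name in kernels of that generation, and config_has_5level() is a hypothetical helper that scans config file text already read into memory.

```c
#include <string.h>

/*
 * Return 1 if the config text contains the exact line
 * "CONFIG_X86_5LEVEL=y", else 0. The option name is an assumption;
 * verify it against your kernel's config file.
 */
int config_has_5level(const char *config_text)
{
    const char *opt = "CONFIG_X86_5LEVEL=y";
    const char *p = config_text;
    size_t n = strlen(opt);

    while ((p = strstr(p, opt)) != NULL) {
        int at_line_start = (p == config_text || p[-1] == '\n');
        int at_line_end = (p[n] == '\0' || p[n] == '\n');

        if (at_line_start && at_line_end)
            return 1;
        p += n;
    }
    return 0;
}
```

    A commented-out entry such as "# CONFIG_X86_5LEVEL is not set" does not match, because the function requires the option at the start of a line followed by "=y".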

  • How intel_idle works

    When the system is idle, meaning there is nothing to run and swapper is running, it calls cpuidle_idle_call() as shown below. cpuidle_idle_call() is called from arch_cpu_idle(). It is the main idle loop, which checks the idle driver and takes further steps if the driver is installed and active. It can change… Continue reading

  • What is ‘page_cache’, how is it managed, and how does ‘drop_caches’ drop these pages?

    - The “buffers/cache” values reported by free include the page cache, but not the dentry cache, which is kept in the ‘dentry_cache’ slab.
    - The page cache grows and shrinks with disk access activity and is managed per super block (that is, per mounted filesystem).
    - ‘echo 1 > /proc/sys/vm/drop_caches’ frees page caches by calling… Continue reading
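
    The echo command above is just a write to a sysctl file, which can be sketched in C as follows. write_drop_caches() is a hypothetical helper, not something from the post; it takes the path as a parameter so it can be tried against an ordinary file, and it accepts only the values 1 (page cache), 2 (dentries and inodes) and 3 (both).

```c
#include <stdio.h>

/*
 * Write a drop_caches mode (1, 2 or 3) to the given path.
 * Returns 0 on success and -1 on an invalid mode or I/O error.
 * Against the real /proc/sys/vm/drop_caches this needs root.
 */
int write_drop_caches(const char *path, int mode)
{
    FILE *f;
    int ok;

    if (mode < 1 || mode > 3)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", mode) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

    Calling write_drop_caches("/proc/sys/vm/drop_caches", 1) as root corresponds to the echo 1 form. Only clean pages can be dropped this way, so dirty pages stay cached until they are written back.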

  • What happens if numa=off is provided as a kernel parameter?

    If “numa=off” is given as a kernel boot parameter, it sets the ‘numa_off’ global variable, which is checked in the initialization function, ‘x86_numa_init()’ on x86_64. If numa_off is 1, ‘numa_init’ is not called.

        static __init int numa_setup(char *opt)
        {
            if (!opt)
                return -EINVAL;
            if (!strncmp(opt, "off", 3))
                numa_off = 1;
        #ifdef CONFIG_NUMA_EMU… Continue reading
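
    The visible part of numa_setup() can be tried out in user space. The sketch below mirrors only the lines shown in the excerpt; numa_setup_sketch() and numa_is_off() are hypothetical names, the static numa_off is a stand-in for the kernel global, and everything after the truncation point is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for the kernel's numa_off global. */
static int numa_off;

/* Mirrors the visible part of numa_setup(): opt is the text after "numa=". */
int numa_setup_sketch(const char *opt)
{
    if (!opt)
        return -EINVAL;
    /* Only the first three characters are compared. */
    if (!strncmp(opt, "off", 3))
        numa_off = 1;
    return 0;
}

/* Accessor so callers can observe the flag. */
int numa_is_off(void)
{
    return numa_off;
}
```

    Because strncmp() compares only three characters, any value that starts with "off" sets the flag, not just the exact string "off".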

  • What’s TAINT_WARN?

    TAINT_WARN is explained in kernel/panic.c as ‘Taint on warning’.

        static const struct tnt tnts[] = {
            …
            { TAINT_WARN, 'W', ' ' },
        }

        /**
         …
         * 'W' - Taint on warning.
         …
         */

    This flag is set from “__WARN()” to record that the system has issued a ‘WARNING’ message at least once.

        #define __WARN()… Continue reading
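
    From user space, the same flag shows up as one bit in the taint mask reported by /proc/sys/kernel/tainted. The following is a small sketch for testing that bit. The bit position 9 is not given in the excerpt; it is taken from the upstream definition of TAINT_WARN and should be checked against the kernel you are analysing. The helper name is hypothetical.

```c
/* Assumed bit position of TAINT_WARN (upstream value; verify for your kernel). */
#define TAINT_WARN_BIT 9

/* Return 1 if the taint mask has the 'W' (warning) bit set, else 0. */
int taint_mask_has_warn(unsigned long mask)
{
    return (int)((mask >> TAINT_WARN_BIT) & 1UL);
}
```

    With this assumption, a mask of 512 means only ‘W’ is set, and a mask of 513 means ‘W’ plus bit 0.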

  • What’s the virtual address limit of the 32-bit/64-bit Linux kernel?

    RHEL 5 code

    32bit: include/asm-i386/processor.h

        /*
         * User space process size: 3GB (default).
         */
        #define TASK_SIZE (PAGE_OFFSET)

    64bit: include/asm-x86_64/processor.h

        /*
         * User space process size. 47bits minus one guard page.
         */
        #define TASK_SIZE64 (0x800000000000UL - 4096)

        /* This decides where the kernel will search for a free chunk of vm
         * space during mmap's.
         */… Continue reading
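
    The 64-bit value is easier to read once the arithmetic is spelled out: 0x800000000000 is 2^47, and the guard page is 4096 bytes. A minimal sketch, using a hypothetical function name and unsigned long long so that it also evaluates correctly on a 32-bit build:

```c
/*
 * Compute the 64-bit user space limit from the excerpt:
 * 2^47 bytes minus one 4096-byte guard page.
 */
unsigned long long user_space_limit_64(void)
{
    return (1ULL << 47) - 4096ULL;
}
```

    The result is 0x7ffffffff000, that is 140737488351232 bytes, or just under 128 TiB of user address space.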

  • Personal memo for ‘Automatic NUMA Balancing’

    Automatic NUMA Balancing

    It is described in Documentation/sysctl/kernel.txt

    numa_balancing

    Enables/disables automatic page fault based NUMA memory balancing. Memory is moved automatically to nodes that access it often.

    Enables/disables automatic NUMA memory balancing. On NUMA machines, there is a performance penalty if remote memory is accessed by a CPU. When this feature is enabled the kernel… Continue reading

  • Tracing a function with jprobes

    One problem with kprobes is that you can’t check the validity of the arguments passed to the function you are monitoring. That is where jprobes come in. A jprobe is basically a wrapper for the existing function that is called instead, without making any changes to the existing function. jprobes is an extension to the kprobes… Continue reading

  • Tracing an instruction or a function with kprobes

    As the kernel is running on top of all other services, it’s hard to debug it on a live system. You can use ‘gdb’ on a live system, but you can only check the current values of some exported symbols. You can’t use a breakpoint on a running kernel. If you set a breakpoint, it’ll stop… Continue reading

  • Extending SysRq

    Basics about SysRq

    During kernel debugging, you can use SysRq to get details about the system status at a given point or to execute some commands without typing them. We can use one of the methods below to trigger the operation.

    Method 1.

        $ echo 1 > /proc/sys/kernel/sysrq

    Press ‘Alt-SysRq-[key]’ combination to trigger… Continue reading

  • Jump into vmcore analysis – Step 8

    There are times when you want to check the local variables or other entries on the stack. Below is an example that crashed in ‘kmem_freepages’, where we needed to check why it crashed while freeing.

        PID: 26 TASK: ffff81027f9197a0 CPU: 0 COMMAND: "events/0"
         #0 [ffff81027f92fa90] crash_kexec at ffffffff800aaa0c
         #1 [ffff81027f92fb50] __die at ffffffff8006520f
         #2 [ffff81027f92fb90]… Continue reading