How intel_idle works

When the system is idle, meaning there is nothing to run and the swapper task is on the CPU, it calls cpuidle_idle_call(), as shown below.

    [exception RIP: cpuidle_enter_state+0x57]
    RIP: ffffffff833c1a67  RSP: ffff8917ce84be60  RFLAGS: 00000202
    RAX: 0001332321398719  RBX: ffffffff82eaf37b  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff8917ce84bfd8  RDI: 0001332321398719
    RBP: ffff8917ce84be88   R8: 0000000000000130   R9: 0000000000000018
    R10: 00000000000000c3  R11: 0000000000000400…

What is ‘page_cache’, how is it managed, and how does ‘drop_caches’ drop these pages?

- The "buffers/cache" values reported by free include the page cache, but not the dentry cache, which is kept in the slab 'dentry_cache'.
- The page cache grows and shrinks with disk access activity and is managed per super block (in effect, per disk).
- 'echo 1 > /proc/sys/vm/drop_caches' frees page caches by calling…
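As a rough illustration of the 'echo 1 > /proc/sys/vm/drop_caches' step above, here is a minimal user-space C sketch. The helper name write_drop_caches() and its path parameter are hypothetical, introduced only so the function can be pointed at a scratch file. On a real system the path is /proc/sys/vm/drop_caches and the write requires root. The three mode values follow Documentation/sysctl/vm.txt.

```c
#include <stdio.h>

/* Values accepted by /proc/sys/vm/drop_caches (Documentation/sysctl/vm.txt). */
enum drop_mode {
    DROP_PAGECACHE = 1,   /* free page cache only */
    DROP_SLAB      = 2,   /* free reclaimable slab objects (dentries, inodes) */
    DROP_ALL       = 3    /* free both of the above */
};

/*
 * Hypothetical helper: write `mode` to `path`, mimicking
 * `echo <mode> > path`. Returns 0 on success, -1 on failure.
 */
int write_drop_caches(const char *path, int mode)
{
    FILE *fp;
    int ok;

    /* Reject anything outside the documented 1..3 range. */
    if (mode < DROP_PAGECACHE || mode > DROP_ALL)
        return -1;

    fp = fopen(path, "w");
    if (!fp)
        return -1;

    ok = fprintf(fp, "%d\n", mode) > 0;
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling write_drop_caches("/proc/sys/vm/drop_caches", DROP_PAGECACHE) as root would correspond to the echo command above. Only clean pages can be dropped this way, so running sync beforehand is the usual advice.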

What happens if numa=off is passed as a kernel parameter?

If "numa=off" is given as a kernel boot parameter, it sets the global variable 'numa_off', which is checked by the initialization function, 'x86_numa_init()' on x86_64. As a result, 'numa_init' is not called if numa_off is 1.

    static __init int numa_setup(char *opt)
    {
            if (!opt)
                    return -EINVAL;
            if (!strncmp(opt, "off", 3))
                    numa_off = 1;
    #ifdef CONFIG_NUMA_EMU…
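To make the flag handling above concrete, here is a user-space sketch of the visible part of numa_setup(). It is not the kernel code: the __init annotation is dropped, the parameter is made const, numa_off is an ordinary global, and x86_numa_init_sketch() is a hypothetical stand-in that only reports whether 'numa_init' would be attempted, following the description above.

```c
#include <errno.h>
#include <string.h>

/* Stand-in for the kernel's global flag. */
int numa_off;

/*
 * Mirrors the visible part of numa_setup(): an option string
 * beginning with "off" sets numa_off. The rest of the kernel
 * function is elided in the excerpt and is not reproduced here.
 */
int numa_setup(const char *opt)
{
    if (!opt)
        return -EINVAL;
    if (!strncmp(opt, "off", 3))
        numa_off = 1;
    return 0;
}

/*
 * Hypothetical stand-in for the check in x86_numa_init():
 * returns 1 if numa_init would be called, 0 if it is skipped.
 */
int x86_numa_init_sketch(void)
{
    return numa_off ? 0 : 1;
}
```

Because the comparison is strncmp(opt, "off", 3), any value that merely starts with "off" also sets the flag.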

What’s the virtual address limit of the 32-bit/64-bit Linux kernel?

RHEL 5 code

32bit: include/asm-i386/processor.h

    /*
     * User space process size: 3GB (default).
     */
    #define TASK_SIZE       (PAGE_OFFSET)

64bit: include/asm-x86_64/processor.h

    /*
     * User space process size. 47bits minus one guard page.
     */
    #define TASK_SIZE64     (0x800000000000UL - 4096)

    /* This decides where the kernel will search for a free chunk of vm
     * space during mmap's.
     */…
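The arithmetic behind these two definitions can be checked with a small user-space sketch. The 64-bit value comes straight from the macro above. The 32-bit value assumes the default PAGE_OFFSET of 0xC0000000 (the 3GB/1GB split), which is not shown in the excerpt. All names prefixed with SKETCH_ and the helper functions are made up for this illustration.

```c
#include <stdint.h>

/* Assumption: default i386 PAGE_OFFSET (3GB user / 1GB kernel split). */
#define SKETCH_PAGE_OFFSET_32  UINT64_C(0xC0000000)

/* From the 64-bit macro above: 47 bits minus one 4096-byte guard page. */
#define SKETCH_TASK_SIZE64     (UINT64_C(0x800000000000) - 4096)

/* 32-bit user-space limit: TASK_SIZE is defined as PAGE_OFFSET. */
uint64_t task_size_32(void)
{
    return SKETCH_PAGE_OFFSET_32;
}

/* 64-bit user-space limit: TASK_SIZE64. */
uint64_t task_size_64(void)
{
    return SKETCH_TASK_SIZE64;
}

/* Convert a byte count to whole GiB, rounding down. */
uint64_t bytes_to_gib(uint64_t bytes)
{
    return bytes >> 30;
}
```

With these values, the 32-bit limit is exactly 3 GiB, and the 64-bit limit is one page short of 2^47 bytes (128 TiB), so bytes_to_gib() rounds it down to 131071 GiB.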

Personal memo for ‘Automatic NUMA Balancing’

Automatic NUMA Balancing

It is described in Documentation/sysctl/kernel.txt:

    numa_balancing

    Enables/disables automatic page fault based NUMA memory balancing. Memory is moved automatically to nodes that access it often.

    Enables/disables automatic NUMA memory balancing. On NUMA machines, there is a performance penalty if remote memory is accessed by a CPU. When this feature is enabled the kernel…

Tracing a function with jprobes

One problem with kprobes is that you can't check the validity of the arguments passed to the function you are monitoring. This is where jprobes come in. A jprobe basically makes a wrapper for the existing function, which is called with the same arguments without making any changes to the existing function. jprobes is an extension to kprobes…

Jump into vmcore analysis – Step 8

There are times when you want to check local variables or other entries on the stack. Below is an example from a system that crashed in 'kmem_freepages', where we needed to find out why it crashed while freeing.

    PID: 26  TASK: ffff81027f9197a0  CPU: 0  COMMAND: "events/0"
     #0 [ffff81027f92fa90] crash_kexec at ffffffff800aaa0c
     #1 [ffff81027f92fb50] __die at ffffffff8006520f
     #2 [ffff81027f92fb90]…

Jump into vmcore analysis – Step 6

The real merit of a vmcore is that you can trace the code using the value each variable held at the time of the crash. Here is one example that traces a filesystem that somehow ended up with a corrupted data entry.

    crash> bt
    PID: 6326  TASK: ffff810402165820  CPU: 1  COMMAND: "fuser"
     #0 [ffff8103b54efa80] crash_kexec at ffffffff800b099c…