Sungju's Slow Life

Personal journal


vmcore

  • How to track SLAB usage using slub_debug=U

    When kernel-side memory usage is high, the most common cause is high SLAB usage. Because SLAB memory blocks are all data, it is hard to tell which parts of the code were consuming these blocks. Fortunately, the kernel option slub_debug=U saves a backtrace of the allocating and freeing calls.… Continue reading

  • Print callgraph of a function

    Sometimes you may want to see which functions a function calls, across multiple levels. The command below, from my extension, may help. crash> edis -c irq_exit {irq_exit} -+- {rcu_irq_exit} -+- {warn_slowpath_null} |- {idle_cpu} |- {tick_nohz_stop_sched_tick} -+- {ktime_get} | |- {update_ts_time_stats} | |- {sched_clock_idle_sleep_event} | |- {rcu_needs_cpu} | |- {select_nohz_load_balancer} | |- {rcu_enter_nohz} | |-… Continue reading

  • An example case with some of my commands

    The system had a high load average and had not been responding for a long time, which is a typical hang situation. It showed a total of 56 tasks in D (Uninterruptible) state, and 5 of them had been in D state for longer than 120 seconds, which is considered a hung task. Let’s see what this process was waiting for. Alright,… Continue reading

  • pycrashext – A rich python extension

    Based on Pykdump, I wrote a set of plugins named ‘pycrashext’, which aims to reduce troubleshooting time. My favorite command in this set is ‘edis’, which can display source code in between disassembled lines. This requires an additional source code server that holds the sources, but once you have it, it… Continue reading

  • Python/CRASH API aka pykdump

    Vmcore analysis makes up most of my daily work. To speed up the analysis, I needed an extra set of commands on top of the ones ‘crash’ provides. Luckily, there is a tool named ‘pykdump’, which is a crash extension and also provides a way to implement extensions using Python. I… Continue reading

  • How to disassemble a module from a vmcore

    There are times when you have to deal with a module for which you don’t have the source code. The only thing you can do is disassemble it, but if you don’t have the actual module binary, this is also tough. Luckily, the vmcore has all the code that was loaded into memory. So, here are the steps to get the disassembled code… Continue reading

  • crash extension ‘pstree’

    ‘crash’ is a useful tool for analysing system crashes and for debugging on Linux systems. It has many useful commands, but sometimes I wanted a full picture of the processes that were running at the time of the crash. You can get the process list with ‘ps’, but if you want a hierarchical view, only ‘ps -p’… Continue reading

  • Jump into vmcore analysis – Step 8

    There are times when you want to check the local variables or other entries on the stack. Below is an example of a crash in ‘kmem_freepages’ where we needed to find out why it crashed while freeing pages. PID: 26 TASK: ffff81027f9197a0 CPU: 0 COMMAND: "events/0" #0 [ffff81027f92fa90] crash_kexec at ffffffff800aaa0c #1 [ffff81027f92fb50] __die at ffffffff8006520f #2 [ffff81027f92fb90]… Continue reading

  • Jump into vmcore analysis – Step 7

    If the vmcore was generated manually by a person and you want to find out who it was, you need to check the related process. The ‘ps’ command has various options, so you can check it with the steps below. crash> ps -a 6326 PID: 6326 TASK: ffff810402165820 CPU: 1 COMMAND: "fuser" ARG: fuser… Continue reading

  • Jump into vmcore analysis – Step 6

    The real merit of a vmcore is that you can trace the code with the value each variable held at the time. Here is one example that traces a filesystem that somehow ended up with a corrupted data entry. crash> bt PID: 6326 TASK: ffff810402165820 CPU: 1 COMMAND: "fuser" #0 [ffff8103b54efa80] crash_kexec at ffffffff800b099c… Continue reading

  • Jump into vmcore analysis – Step 5

    With the ‘bt’ command you can check the processes that occupied the CPUs at the time of the crash. But there are times when you want to check other processes’ backtraces to see how the processes were interacting. You can do that with either of the two methods below. crash> set 24960 PID: 24960 COMMAND:… Continue reading

  • Jump into vmcore analysis – Step 4

    Sometimes a crash happens because of a lack of memory, or the system struggles because of a memory issue. To check for that, you can use the command below. crash> kmem -i PAGES TOTAL PERCENTAGE TOTAL MEM 16464824 62.8 GB ---- FREE 3807386 14.5 GB 23% of TOTAL MEM USED 12657438… Continue reading
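
    The PAGES and TOTAL columns in the ‘kmem -i’ excerpt above are related by the page size. The following is only a minimal sketch, assuming 4 KiB pages (the excerpt does not state the page size) and using hypothetical helper names; it recomputes the TOTAL MEM and FREE figures from their page counts.

```python
# Assumption: 4 KiB pages, which the excerpt does not state explicitly.
PAGE_SIZE = 4096


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB (hypothetical helper for illustration)."""
    return pages * page_size / 2**30


def percent_of(part, total):
    """Integer percentage of part relative to total, truncated."""
    return 100 * part // total


# Page counts copied from the 'kmem -i' excerpt.
total_pages = 16464824
free_pages = 3807386

print(f"TOTAL MEM {total_pages} {pages_to_gib(total_pages):.1f} GB")
print(f"FREE      {free_pages} {pages_to_gib(free_pages):.1f} GB "
      f"{percent_of(free_pages, total_pages)}% of TOTAL MEM")
```

    Under that assumption the sketch reproduces the 62.8 GB, 14.5 GB and 23% shown in the excerpt, which is a quick way to sanity-check page counts when reading this kind of output.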