How to write an mpykdump extension

If you work with a vmcore (a Linux memory dump), you are probably familiar with 'crash'. It is a powerful tool, but it doesn't cover all the data you can find in the Linux kernel. That is where 'mpykdump' comes in: a crash extension that understands Python code. mpykdump comes with many prebuilt commands that you…

Print callgraph of a function

Sometimes you may want to see which functions a given function calls, across multiple levels. The command below, from my extension, may help.

crash> edis -c irq_exit
{irq_exit} -+- {rcu_irq_exit} -+- {warn_slowpath_null}
            |- {idle_cpu}
            |- {tick_nohz_stop_sched_tick} -+- {ktime_get}
            |                               |- {update_ts_time_stats}
            |                               |- {sched_clock_idle_sleep_event}
            |                               |- {rcu_needs_cpu}
            |                               |- {select_nohz_load_balancer}
            |                               |- {rcu_enter_nohz}
            |                               |-…
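The tree layout in that output can be reproduced with a small recursive renderer. The sketch below is only an illustration in plain Python, not the code behind 'edis': the function name `render_callgraph`, the hand-written `graph` dictionary, and the choice to use "|-" for every branch are assumptions, and the step that actually extracts callees from the disassembly is not shown.

```python
def render_callgraph(name, graph, depth):
    """Return the lines of a pstree-style call graph rooted at `name`.

    `graph` maps a function name to the list of functions it calls
    (a hypothetical, hand-built structure for this sketch).
    """
    label = "{" + name + "}"
    # Stop descending once the depth budget is spent; this also keeps
    # recursive or mutually recursive functions from looping forever.
    callees = graph.get(name, []) if depth > 0 else []
    if not callees:
        return [label]

    pad = " " * (len(label) + 2)  # column of the '+' in "label -+- "
    lines = []
    for i, callee in enumerate(callees):
        sub = render_callgraph(callee, graph, depth - 1)
        last = i == len(callees) - 1
        for j, text in enumerate(sub):
            if i == 0 and j == 0:
                # First callee shares the line with the caller.
                lines.append(label + " -+- " + text)
            elif j == 0:
                # Every other callee starts a new branch.
                lines.append(pad + "|- " + text)
            else:
                # Continuation lines of a callee's own subtree.
                lines.append(pad + ("   " if last else "|  ") + text)
    return lines


# Hand-written sample using names from the output above.
graph = {
    "irq_exit": ["rcu_irq_exit", "idle_cpu", "tick_nohz_stop_sched_tick"],
    "rcu_irq_exit": ["warn_slowpath_null"],
    "tick_nohz_stop_sched_tick": ["ktime_get", "update_ts_time_stats"],
}
print("\n".join(render_callgraph("irq_exit", graph, depth=2)))
```

The `depth` argument plays the role of "multiple levels": with `depth=1` only the direct callees of the root are listed.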

Why doesn't an error message go to a pipe or a redirected path in ‘crash’?

In the example below, the error message always shows up on the console, even though the output is redirected.

crash> sym ffffffffa02ef86 > /dev/null
sym: invalid address: ffffffffa02ef86

The 'sym' command is implemented in the 'void cmd_sym(void)' function in crash.

/*
 * This command may be used to:
 *
 * 1. Translate a symbol to its value.
 * 2. Translate a value to it…

An example case with some of my commands

The system had a high load average and had been unresponsive for a long time, which is a typical hang situation.

crash> sys | egrep -e LOAD -e CPUS
CPUS: 14
LOAD AVERAGE: 520.69, 210.35, 79.69
crash> hangcheck
[0 00:00:00.003] [UN] PID: 5507 TASK: ffff8d257723cf10 CPU: 6 COMMAND: "ora_dia0_gladp6"
[0 00:00:00.006] [UN] PID: 6068 TASK: ffff8d266239cf10 CPU: 7 COMMAND:…
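To put those numbers in proportion, it can help to divide each load average by the CPU count. The snippet below is a rough sketch in plain Python that works on the text of the 'sys' output quoted above; the helper name `load_per_cpu` is made up for illustration and is not one of the extension's commands.

```python
import re


def load_per_cpu(sys_output):
    """Return the 1, 5 and 15 minute load averages divided by the CPU count."""
    cpus = int(re.search(r"CPUS:\s*(\d+)", sys_output).group(1))
    loads = re.search(
        r"LOAD AVERAGE:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", sys_output
    ).groups()
    return [float(value) / cpus for value in loads]


# Text copied from the 'sys' output in the excerpt.
sample = "CPUS: 14\nLOAD AVERAGE: 520.69, 210.35, 79.69"
for window, ratio in zip(("1 min", "5 min", "15 min"), load_per_cpu(sample)):
    print(f"{window}: {ratio:.1f} per CPU")
```

A 1-minute figure of roughly 37 per CPU, rising sharply from the 15-minute figure, fits the description of a system that stopped making progress.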

Jump into vmcore analysis – Step 8

There are times when you want to check local variables or other entries on the stack. Below is an example of a system that crashed in 'kmem_freepages', where we needed to find out why it crashed while freeing it.

PID: 26 TASK: ffff81027f9197a0 CPU: 0 COMMAND: "events/0"
#0 [ffff81027f92fa90] crash_kexec at ffffffff800aaa0c
#1 [ffff81027f92fb50] __die at ffffffff8006520f
#2 [ffff81027f92fb90]…
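When hunting for values on the stack, it is useful to know how much stack lies between two frames. The sketch below, in plain Python, parses the complete frame lines quoted above and subtracts consecutive bracketed addresses. The helper names are hypothetical, and reading that span as the region belonging to the lower-numbered frame (the data 'bt -f' would list for it) is an assumption of this sketch.

```python
import re

# "#0 [ffff81027f92fa90] crash_kexec at ffffffff800aaa0c"
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")


def parse_frames(bt_text):
    """Return (frame number, stack address, symbol, text address) tuples."""
    return [
        (int(num), int(stack, 16), symbol, int(text, 16))
        for num, stack, symbol, text in FRAME_RE.findall(bt_text)
    ]


def frame_spans(frames, word_size=8):
    """Return (frame number, symbol, bytes, words) for each adjacent frame pair."""
    spans = []
    for (num, stack, symbol, _), (_, next_stack, _, _) in zip(frames, frames[1:]):
        size = next_stack - stack
        spans.append((num, symbol, size, size // word_size))
    return spans


# Only the two frames that appear complete in the excerpt are used.
bt_text = """\
#0 [ffff81027f92fa90] crash_kexec at ffffffff800aaa0c
#1 [ffff81027f92fb50] __die at ffffffff8006520f
"""
for num, symbol, size, words in frame_spans(parse_frames(bt_text)):
    print(f"#{num} {symbol}: {size} bytes ({words} 8-byte words)")
```

The default `word_size` of 8 assumes a 64-bit kernel, which matches the width of the addresses in the backtrace.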

Jump into vmcore analysis – Step 6

The real merit of a vmcore is that you can trace the code with the actual value each variable holds. Here is one example that traces a filesystem that somehow ended up with a corrupted data entry.

crash> bt
PID: 6326 TASK: ffff810402165820 CPU: 1 COMMAND: "fuser"
#0 [ffff8103b54efa80] crash_kexec at ffffffff800b099c…

Jump into vmcore analysis – Step 5

With the 'bt' command you can check the processes that were occupying the CPUs at the time of the crash. But there are times when you want to check another process's backtrace to see how the processes were interacting. You can do that with one of the two methods below.

crash> set 24960
PID: 24960 COMMAND:…

Jump into vmcore analysis – Step 3

Now we are ready to dive deep into vmcore analysis to confirm what went wrong in the system. In general, the first thing I check is the system log.

crash> log
....
....
end_request: I/O error, dev sdajm, sector 0
end_request: I/O error, dev sdajm, sector 8
end_request: I/O error, dev sdajm, sector 0
NMI Watchdog…