Sungju's Slow Life

Personal journal


Jump into vmcore analysis – Step 5

With ‘bt’ command you can check the processes that occupied the CPU at the time of crash. But, there is the time you want to check other processes’s backtrace to see what interaction had been established between processes. You can check it by using one of below two methods.

crash> set 24960
    PID: 24960
COMMAND: "oracle"
   TASK: ffff8102088b2080  [THREAD_INFO: ffff810399952000]
    CPU: 6
  STATE: TASK_RUNNING (ACTIVE)
crash> bt
PID: 24960  TASK: ffff8102088b2080  CPU: 6   COMMAND: "oracle"
 #0 [ffff81048e1f6f20] crash_nmi_callback at ffffffff8007bf44
 #1 [ffff81048e1f6f40] do_nmi at ffffffff8006688a
 #2 [ffff81048e1f6f50] nmi at ffffffff80065eef
    [exception RIP: .text.lock.spinlock+17]
    RIP: ffffffff80065c0b  RSP: ffff810399953be0  RFLAGS: 00000086
    RAX: 0000000000000246  RBX: 0000000010000000  RCX: 000000005049177f
    RDX: 0000000000000100  RSI: 0000000006edaf33  RDI: ffff810dfec33198
    RBP: ffff810dfec33188   R8: 0000000000000000   R9: 0000000000000001
    R10: ffff810399953d98  R11: ffff810399953dd0  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000003
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
---  ---
 #3 [ffff810399953be0] .text.lock.spinlock at ffffffff80065c0b (via _spin_lock_irqsave)
 #4 [ffff810399953be0] ocfs2_cluster_lock at ffffffff88c50ff0 [ocfs2]
 #5 [ffff810399953cb0] ocfs2_rw_lock at ffffffff88c566f6 [ocfs2]
 #6 [ffff810399953d70] audit_sockaddr at ffffffff800b9197
 #7 [ffff810399953dc0] lock_timer_base at ffffffff8003ddd5
 #8 [ffff810399953de0] __mod_timer at ffffffff8001cb9f
 #9 [ffff810399953e20] aio_pwrite at ffffffff800efd18
#10 [ffff810399953e40] aio_run_iocb at ffffffff800f07d9
#11 [ffff810399953e60] io_submit_one at ffffffff800f1343
#12 [ffff810399953eb0] sys_io_submit at ffffffff800f195a
#13 [ffff810399953f80] tracesys at ffffffff8005e28d (via system_call)
    RIP: 0000003bc6800607  RSP: 00007fff674204a8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: ffffffff8005e28d  RCX: ffffffffffffffff
    RDX: 00007fff674204b0  RSI: 0000000000000001  RDI: 00002b006d4fe000
    RBP: 00002b006d3f19d0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000202  R12: 00007fff67422d70
    R13: 0000000000000001  R14: 0000000000000001  R15: 00007fff674224f0
    ORIG_RAX: 00000000000000d1  CS: 0033  SS: 002b

or

crash> bt 5544
PID: 5544   TASK: ffff810c7fce8080  CPU: 1   COMMAND: "dlm_thread"
 #0 [ffff81048e10df20] crash_nmi_callback at ffffffff8007bf44
 #1 [ffff81048e10df40] do_nmi at ffffffff8006688a
 #2 [ffff81048e10df50] nmi at ffffffff80065eef
    [exception RIP: .text.lock.spinlock+2]
    RIP: ffffffff80065bfc  RSP: ffff810c8e197e18  RFLAGS: 00000082
    RAX: ffff810dfec331b0  RBX: ffff810dfec33188  RCX: ffff810dfec331b0
    RDX: 0000000000000000  RSI: 0000000000000145  RDI: ffff81045b1a9670
    RBP: ffff81045b1a9000   R8: 0000000000000045   R9: ffff810851b5c6c0
    R10: 0000000100f8ee45  R11: ffffffff88c57879  R12: 0000000000000246
    R13: 0000000000000003  R14: ffff81045b1a9000  R15: ffff810b30c34338
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
---  ---
 #3 [ffff810c8e197e18] .text.lock.spinlock at ffffffff80065bfc (via _spin_lock)
 #4 [ffff810c8e197e18] ocfs2_schedule_blocked_lock at ffffffff88c503cb [ocfs2]
 #5 [ffff810c8e197e38] ocfs2_blocking_ast at ffffffff88c57bdd [ocfs2]
 #6 [ffff810c8e197e68] dlm_thread at ffffffff88b549b9 [ocfs2_dlm]
 #7 [ffff810c8e197ee8] kthread at ffffffff80032bdc
 #8 [ffff810c8e197f48] kernel_thread at ffffffff8005efb1

If there were system hang at the time of the crash, it might be worth to check blocked processes. Especially checking ‘Uninterruptible’ state would be useful.

crash> ps | grep UN
...
   5685   5250   4  ffff810c3f0e4820  UN   0.2  142536 121012  osysmond
   5739    563   0  ffff810871c220c0  UN   0.0       0      0  [dlm_thread]
...
crash> set 5739
    PID: 5739
COMMAND: "dlm_thread"
   TASK: ffff810871c220c0  [THREAD_INFO: ffff810c75f44000]
    CPU: 0
  STATE: TASK_UNINTERRUPTIBLE 
crash> bt
PID: 5739   TASK: ffff810871c220c0  CPU: 0   COMMAND: "dlm_thread"
 #0 [ffff810c75f45b70] schedule at ffffffff80063f96
 #1 [ffff810c75f45c48] o2net_send_message_vec at ffffffff88b033c5 [ocfs2_nodemanager]
 #2 [ffff810c75f45e18] __dlm_lockres_unused at ffffffff88b532d4 [ocfs2_dlm]
 #3 [ffff810c75f45e28] __dlm_lockres_calc_usage at ffffffff88b533ca [ocfs2_dlm]
 #4 [ffff810c75f45e68] dlm_thread at ffffffff88b54918 [ocfs2_dlm]
 #5 [ffff810c75f45ee8] kthread at ffffffff80032bdc
 #6 [ffff810c75f45f48] kernel_thread at ffffffff8005efb1
crash> task -R last_ran
PID: 5739   TASK: ffff810871c220c0  CPU: 0   COMMAND: "dlm_thread"
  last_ran = 7004996775750, 

crash> runq | grep "CPU 0"
CPU 0 RUNQUEUE: ffff810009004420
crash> rq ffff810009004420 | grep timestamp_last_tick
  timestamp_last_tick = 7017069075482, 
crash> pd (7017069075482 - 7004996775750)/1000000000
$2 = 12
crash> pd 12/60
$3 = 0

From above command, you can confirm that ‘dlm_thread’ was in Uninterruptible state for 12 seconds.

To check active processes in each CPU without ‘bt -a’, you can use below command.

crash> ps | grep "> "
>     0      0   0  ffffffff80309b60  RU   0.0       0      0  [swapper]
>     0      1   3  ffff81107ffc97e0  RU   0.0       0      0  [swapper]
>     0      1   4  ffff81107ffbf0c0  RU   0.0       0      0  [swapper]
>     0      1   5  ffff81048e1c30c0  RU   0.0       0      0  [swapper]
>     0      1   7  ffff81048e1c4100  RU   0.0       0      0  [swapper]
>  5543    563   2  ffff81087e14b0c0  RU   0.0       0      0  [ocfs2dc]
>  5544    563   1  ffff810c7fce8080  RU   0.0       0      0  [dlm_thread]
> 24960      1   6  ffff8102088b2080  RU   0.1 3329088  94724  oracle


Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

About Me

A software engineer who loves any technologies that makes life easier. That’s why I love Linux and Mac at the same time.

Newsletter

%d bloggers like this: