With ‘bt’ command you can check the processes that occupied the CPU at the time of crash. But, there is the time you want to check other processes’s backtrace to see what interaction had been established between processes. You can check it by using one of below two methods.
crash> set 24960 PID: 24960 COMMAND: "oracle" TASK: ffff8102088b2080 [THREAD_INFO: ffff810399952000] CPU: 6 STATE: TASK_RUNNING (ACTIVE) crash> bt PID: 24960 TASK: ffff8102088b2080 CPU: 6 COMMAND: "oracle" #0 [ffff81048e1f6f20] crash_nmi_callback at ffffffff8007bf44 #1 [ffff81048e1f6f40] do_nmi at ffffffff8006688a #2 [ffff81048e1f6f50] nmi at ffffffff80065eef [exception RIP: .text.lock.spinlock+17] RIP: ffffffff80065c0b RSP: ffff810399953be0 RFLAGS: 00000086 RAX: 0000000000000246 RBX: 0000000010000000 RCX: 000000005049177f RDX: 0000000000000100 RSI: 0000000006edaf33 RDI: ffff810dfec33198 RBP: ffff810dfec33188 R8: 0000000000000000 R9: 0000000000000001 R10: ffff810399953d98 R11: ffff810399953dd0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- --- #3 [ffff810399953be0] .text.lock.spinlock at ffffffff80065c0b (via _spin_lock_irqsave) #4 [ffff810399953be0] ocfs2_cluster_lock at ffffffff88c50ff0 [ocfs2] #5 [ffff810399953cb0] ocfs2_rw_lock at ffffffff88c566f6 [ocfs2] #6 [ffff810399953d70] audit_sockaddr at ffffffff800b9197 #7 [ffff810399953dc0] lock_timer_base at ffffffff8003ddd5 #8 [ffff810399953de0] __mod_timer at ffffffff8001cb9f #9 [ffff810399953e20] aio_pwrite at ffffffff800efd18 #10 [ffff810399953e40] aio_run_iocb at ffffffff800f07d9 #11 [ffff810399953e60] io_submit_one at ffffffff800f1343 #12 [ffff810399953eb0] sys_io_submit at ffffffff800f195a #13 [ffff810399953f80] tracesys at ffffffff8005e28d (via system_call) RIP: 0000003bc6800607 RSP: 00007fff674204a8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: ffffffff8005e28d RCX: ffffffffffffffff RDX: 00007fff674204b0 RSI: 0000000000000001 RDI: 00002b006d4fe000 RBP: 00002b006d3f19d0 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff67422d70 R13: 0000000000000001 R14: 0000000000000001 R15: 00007fff674224f0 ORIG_RAX: 00000000000000d1 CS: 0033 SS: 002b
or
crash> bt 5544 PID: 5544 TASK: ffff810c7fce8080 CPU: 1 COMMAND: "dlm_thread" #0 [ffff81048e10df20] crash_nmi_callback at ffffffff8007bf44 #1 [ffff81048e10df40] do_nmi at ffffffff8006688a #2 [ffff81048e10df50] nmi at ffffffff80065eef [exception RIP: .text.lock.spinlock+2] RIP: ffffffff80065bfc RSP: ffff810c8e197e18 RFLAGS: 00000082 RAX: ffff810dfec331b0 RBX: ffff810dfec33188 RCX: ffff810dfec331b0 RDX: 0000000000000000 RSI: 0000000000000145 RDI: ffff81045b1a9670 RBP: ffff81045b1a9000 R8: 0000000000000045 R9: ffff810851b5c6c0 R10: 0000000100f8ee45 R11: ffffffff88c57879 R12: 0000000000000246 R13: 0000000000000003 R14: ffff81045b1a9000 R15: ffff810b30c34338 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- --- #3 [ffff810c8e197e18] .text.lock.spinlock at ffffffff80065bfc (via _spin_lock) #4 [ffff810c8e197e18] ocfs2_schedule_blocked_lock at ffffffff88c503cb [ocfs2] #5 [ffff810c8e197e38] ocfs2_blocking_ast at ffffffff88c57bdd [ocfs2] #6 [ffff810c8e197e68] dlm_thread at ffffffff88b549b9 [ocfs2_dlm] #7 [ffff810c8e197ee8] kthread at ffffffff80032bdc #8 [ffff810c8e197f48] kernel_thread at ffffffff8005efb1
If there were system hang at the time of the crash, it might be worth to check blocked processes. Especially checking ‘Uninterruptible’ state would be useful.
crash> ps | grep UN ... 5685 5250 4 ffff810c3f0e4820 UN 0.2 142536 121012 osysmond 5739 563 0 ffff810871c220c0 UN 0.0 0 0 [dlm_thread] ... crash> set 5739 PID: 5739 COMMAND: "dlm_thread" TASK: ffff810871c220c0 [THREAD_INFO: ffff810c75f44000] CPU: 0 STATE: TASK_UNINTERRUPTIBLE crash> bt PID: 5739 TASK: ffff810871c220c0 CPU: 0 COMMAND: "dlm_thread" #0 [ffff810c75f45b70] schedule at ffffffff80063f96 #1 [ffff810c75f45c48] o2net_send_message_vec at ffffffff88b033c5 [ocfs2_nodemanager] #2 [ffff810c75f45e18] __dlm_lockres_unused at ffffffff88b532d4 [ocfs2_dlm] #3 [ffff810c75f45e28] __dlm_lockres_calc_usage at ffffffff88b533ca [ocfs2_dlm] #4 [ffff810c75f45e68] dlm_thread at ffffffff88b54918 [ocfs2_dlm] #5 [ffff810c75f45ee8] kthread at ffffffff80032bdc #6 [ffff810c75f45f48] kernel_thread at ffffffff8005efb1 crash> task -R last_ran PID: 5739 TASK: ffff810871c220c0 CPU: 0 COMMAND: "dlm_thread" last_ran = 7004996775750, crash> runq | grep "CPU 0" CPU 0 RUNQUEUE: ffff810009004420 crash> rq ffff810009004420 | grep timestamp_last_tick timestamp_last_tick = 7017069075482, crash> pd (7017069075482 - 7004996775750)/1000000000 $2 = 12 crash> pd 12/60 $3 = 0
From above command, you can confirm that ‘dlm_thread’ was in Uninterruptible state for 12 seconds.
To check active processes in each CPU without ‘bt -a’, you can use below command.
crash> ps | grep "> " > 0 0 0 ffffffff80309b60 RU 0.0 0 0 [swapper] > 0 1 3 ffff81107ffc97e0 RU 0.0 0 0 [swapper] > 0 1 4 ffff81107ffbf0c0 RU 0.0 0 0 [swapper] > 0 1 5 ffff81048e1c30c0 RU 0.0 0 0 [swapper] > 0 1 7 ffff81048e1c4100 RU 0.0 0 0 [swapper] > 5543 563 2 ffff81087e14b0c0 RU 0.0 0 0 [ocfs2dc] > 5544 563 1 ffff810c7fce8080 RU 0.0 0 0 [dlm_thread] > 24960 1 6 ffff8102088b2080 RU 0.1 3329088 94724 oracle
Leave a Reply