There’s a time you want to check the local variables or other entries in the stack.
Below is an example that was crashed in ‘kmem_freepages’ and needed to check why it’s crashed whiling freeing it.
PID: 26 TASK: ffff81027f9197a0 CPU: 0 COMMAND: "events/0" #0 [ffff81027f92fa90] crash_kexec at ffffffff800aaa0c #1 [ffff81027f92fb50] __die at ffffffff8006520f #2 [ffff81027f92fb90] do_page_fault at ffffffff80066e1c #3 [ffff81027f92fc80] error_exit at ffffffff8005dde9 [exception RIP: kmem_freepages+58] RIP: ffffffff800d6bd9 RSP: ffff81027f92fd30 RFLAGS: 00010003 RAX: 000000000ffffffe RBX: ffff810108ca30c0 RCX: 0000000000000000 RDX: ffffffe97f684000 RSI: ffffffe97f684000 RDI: ffff810108ca30c0 RBP: 0000000000000001 R8: 00000000001857ea R9: 0000000000000000 R10: 0000000200000000 R11: ffff81027f9197a0 R12: ffff810108ca30c0 R13: ffff80e97f684000 R14: ffff81017f684660 R15: 0000000000000010 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #4 [ffff81027f92fd58] slab_destroy at ffffffff800d72c1 #5 [ffff81027f92fd88] free_block at ffffffff800d73e6 #6 [ffff81027f92fdc8] drain_array at ffffffff800d74af #7 [ffff81027f92fe08] cache_reap at ffffffff800d7fd3 #8 [ffff81027f92fe38] run_workqueue at ffffffff8004d139 #9 [ffff81027f92fe78] worker_thread at ffffffff80049aaa #10 [ffff81027f92fee8] kthread at ffffffff80032360 #11 [ffff81027f92ff48] kernel_thread at ffffffff8005dfb1
http://pagead2.googlesyndication.com/pagead/show_ads.js
First thing is disassemble the kmem_freepages to compare with the original C code.
crash> dis -rl kmem_freepages+58 /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1598 0xffffffff800d6b9f : push %r13 /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1599 0xffffffff800d6ba1 : mov $0x1,%eax /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1598 0xffffffff800d6ba6 : mov %rsi,%r13 0xffffffff800d6ba9 : push %r12 0xffffffff800d6bab : mov %rdi,%r12 0xffffffff800d6bae : push %rbp 0xffffffff800d6baf : push %rbx 0xffffffff800d6bb0 : sub $0x8,%rsp /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1599 0xffffffff800d6bb4 : mov 0xa10(%rdi),%ecx 0xffffffff800d6bba : shl %cl,%eax /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/include/linux/mmzone.h: 562 0xffffffff800d6bbc : xor %ecx,%ecx /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1599 0xffffffff800d6bbe : movslq %eax,%rbp /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1600 0xffffffff800d6bc1 : movabs $0x7f0000000000,%rax 0xffffffff800d6bcb : lea (%rsi,%rax,1),%rax 0xffffffff800d6bcf : mov %rax,%rsi /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/include/linux/mmzone.h: 603 0xffffffff800d6bd2 : mov %rax,%rdx /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/include/linux/mmzone.h: 562 0xffffffff800d6bd5 : shr $0x24,%rax 0xffffffff800d6bd9 : mov -0x7fb70480(,%rax,8),%rax
So, from above, we can confirm that it was crashed in the ‘mov’ operation for the value in rax register.
Let’s first check what -0x7fb70480 actually is.
crash> eval -0x7fb70480 hexadecimal: ffffffff8048fb80 decimal: 18446744071566850944 (-2142700672) octal: 1777777777760022175600 binary: 1111111111111111111111111111111110000000010010001111101110000000 crash> sym ffffffff8048fb80 ffffffff8048fb80 (B) mem_section
It is printed wrongly in dis command, but actually it’s pointing to ‘mem_section’. From the ‘bt’ output, we can see that ‘RAX’ has value ‘000000000ffffffe’. This value was calculated by converting a virtual address into a physical address with the operation ‘shr $0x24,%rax’. The actual address is ‘ffffffe97f684000’ which you can find in %rsi and if you try to check the address, it will show below error messages. Once ran the above shift operation to make it as a PFN (page frame number), the value became ‘000000000ffffffe’.
crash> rd ffffffe97f684000 rd: read error: kernel virtual address: ffffffe97f684000 type: "64-bit KVADDR"
From the disassembled code, we can see that %rax is copied to %rsi and %rdx. %rsi is saved into %r13 and %r13 also has the same value.
RSI: ffffffe97f684000 R13: ffff80e97f684000
So, the second argument to kmem_freepages() was a bogus address. Next thing to check is where this data came from. The previous function in the call stack is ‘slab_destroy’ and you can see the code here.
static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp) { void *addr = slabp->s_mem - slabp->colouroff; slab_destroy_objs(cachep, slabp); if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) { struct slab_rcu *slab_rcu; slab_rcu = (struct slab_rcu *)slabp; slab_rcu->cachep = cachep; slab_rcu->addr = addr; call_rcu(&slab_rcu->head, kmem_rcu_free); } else { kmem_freepages(cachep, addr); if (OFF_SLAB(cachep)) kmem_cache_free(cachep->slabp_cache, slabp); } }
Now disassemble the caller of slab_destroy to check the second argument ‘slabp’.
crash> dis -r ffffffff800d73e6 ... 0xffffffff800d73d4 : mov %rbx,%rsi <-- It's the second argument 0xffffffff800d73d7 : mov %r12,%rdi 0xffffffff800d73da : sub %rax,%rdx 0xffffffff800d73dd : mov %rdx,0x30(%rbp) 0xffffffff800d73e1 : callq 0xffffffff800d7241
We can see that slab pointer is saved in %rbx by disaseembling slab_destroy.
crash> dis slab_destroy 0xffffffff800d7241 : push %r13 0xffffffff800d7243 : xor %r13d,%r13d 0xffffffff800d7246 : push %r12 0xffffffff800d7248 : push %rbp 0xffffffff800d7249 : mov %rsi,%rbp 0xffffffff800d724c : push %rbx <-- slab_destroy saved the second argument into the stack as fourth entry. ...
To retrieve the data from the stack, you should remember below fact.
- Using stack decreament address
- First entry is the return address (ffffffff800d73e6)
crash> bt -f PID: 26 TASK: ffff81027f9197a0 CPU: 0 COMMAND: "events/0" ... #4 [ffff81027f92fd58] slab_destroy at ffffffff800d72c1 ffff81027f92fd60: 0000000000000000 ffff81017f684000 <-- %rbx ffff81027f92fd70: ffff810108c99c40 ffff810108ca30c0 ffff81027f92fd80: ffff810108ca4098 ffffffff800d73e6 <-- Return address #5 [ffff81027f92fd88] free_block at ffffffff800d73e6 <-- Caller address
Now check the slabp as we know the address from the stack.
crash> slab ffff81017f684000 struct slab { list = { next = 0x100100, prev = 0x200200 }, colouroff = 103079215224, s_mem = 0xffff81017f684078, inuse = 0, free = 7, nodeid = 0 }
The code uses slabp->s_mem – slabp->colour off which is the bad address 0xffff80e97f684000.
crash> rd 0xffff81017f684078 ffff81017f684078: 0000001000000000 ........
s_mem has correct pointer, but the colour off has way too big value.
Checking slab entries also shows the some corrupted links.
crash> kmem -s ffff81017f684000 kmem: dentry_cache: partial list: slab: ffff81017f684000 bad next pointer: 100100 kmem: dentry_cache: full list: slab: ffff81017f684000 bad next pointer: 100100 kmem: dentry_cache: free list: slab: ffff81017f684000 bad next pointer: 100100 kmem: dentry_cache: address not found in cache: ffff81017f684000
In this case some code in the kernel corrupt the entry, so, it might be necessary to test with the debug kernel to check who tries to write beyond the allowed memory range.
In this case, it doesn’t show the cause of the issue, but can learn how we can dig into the stack to check the possible cause.
Leave a Reply to Vince Cancel reply