Sungju's Slow Life

Personal journal


Jump into vmcore analysis – Step 8

There’s a time you want to check the local variables or other entries in the stack.

Below is an example that was crashed in ‘kmem_freepages’ and needed to check why it’s crashed whiling freeing it.

PID: 26     TASK: ffff81027f9197a0  CPU: 0   COMMAND: "events/0"
 #0 [ffff81027f92fa90] crash_kexec at ffffffff800aaa0c
 #1 [ffff81027f92fb50] __die at ffffffff8006520f
 #2 [ffff81027f92fb90] do_page_fault at ffffffff80066e1c
 #3 [ffff81027f92fc80] error_exit at ffffffff8005dde9
    [exception RIP: kmem_freepages+58]
    RIP: ffffffff800d6bd9  RSP: ffff81027f92fd30  RFLAGS: 00010003
    RAX: 000000000ffffffe  RBX: ffff810108ca30c0  RCX: 0000000000000000
    RDX: ffffffe97f684000  RSI: ffffffe97f684000  RDI: ffff810108ca30c0
    RBP: 0000000000000001   R8: 00000000001857ea   R9: 0000000000000000
    R10: 0000000200000000  R11: ffff81027f9197a0  R12: ffff810108ca30c0
    R13: ffff80e97f684000  R14: ffff81017f684660  R15: 0000000000000010
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff81027f92fd58] slab_destroy at ffffffff800d72c1
 #5 [ffff81027f92fd88] free_block at ffffffff800d73e6
 #6 [ffff81027f92fdc8] drain_array at ffffffff800d74af
 #7 [ffff81027f92fe08] cache_reap at ffffffff800d7fd3
 #8 [ffff81027f92fe38] run_workqueue at ffffffff8004d139
 #9 [ffff81027f92fe78] worker_thread at ffffffff80049aaa
#10 [ffff81027f92fee8] kthread at ffffffff80032360
#11 [ffff81027f92ff48] kernel_thread at ffffffff8005dfb1

http://pagead2.googlesyndication.com/pagead/show_ads.js

First thing is disassemble the kmem_freepages to compare with the original C code.

crash> dis -rl kmem_freepages+58
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1598
0xffffffff800d6b9f :    push   %r13
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1599
0xffffffff800d6ba1 :  mov    $0x1,%eax
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1598
0xffffffff800d6ba6 :  mov    %rsi,%r13
0xffffffff800d6ba9 : push   %r12
0xffffffff800d6bab : mov    %rdi,%r12
0xffffffff800d6bae : push   %rbp
0xffffffff800d6baf : push   %rbx
0xffffffff800d6bb0 : sub    $0x8,%rsp
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1599
0xffffffff800d6bb4 : mov    0xa10(%rdi),%ecx
0xffffffff800d6bba : shl    %cl,%eax
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/include/linux/mmzone.h: 562
0xffffffff800d6bbc : xor    %ecx,%ecx
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1599
0xffffffff800d6bbe : movslq %eax,%rbp
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/mm/slab.c: 1600
0xffffffff800d6bc1 : movabs $0x7f0000000000,%rax
0xffffffff800d6bcb : lea    (%rsi,%rax,1),%rax
0xffffffff800d6bcf : mov    %rax,%rsi
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/include/linux/mmzone.h: 603
0xffffffff800d6bd2 : mov    %rax,%rdx
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/include/linux/mmzone.h: 562
0xffffffff800d6bd5 : shr    $0x24,%rax
0xffffffff800d6bd9 : mov    -0x7fb70480(,%rax,8),%rax

So, from above, we can confirm that it was crashed in the ‘mov’ operation for the value in rax register.

Let’s first check what -0x7fb70480 actually is.

crash> eval -0x7fb70480
hexadecimal: ffffffff8048fb80  
    decimal: 18446744071566850944  (-2142700672)
      octal: 1777777777760022175600
     binary: 1111111111111111111111111111111110000000010010001111101110000000

crash> sym ffffffff8048fb80
ffffffff8048fb80 (B) mem_section

It is printed wrongly in dis command, but actually it’s pointing to ‘mem_section’. From the ‘bt’ output, we can see that ‘RAX’ has value ‘000000000ffffffe’. This value was calculated by converting a virtual address into a physical address with the operation ‘shr $0x24,%rax’. The actual address is ‘ffffffe97f684000’ which you can find in %rsi and if you try to check the address, it will show below error messages. Once ran the above shift operation to make it as a PFN (page frame number), the value became ‘000000000ffffffe’.

crash> rd ffffffe97f684000
rd: read error: kernel virtual address: ffffffe97f684000  type: "64-bit KVADDR"

From the disassembled code, we can see that %rax is copied to %rsi and %rdx. %rsi is saved into %r13 and %r13 also has the same value.

RSI: ffffffe97f684000
R13: ffff80e97f684000

So, the second argument to kmem_freepages() was a bogus address. Next thing to check is where this data came from. The previous function in the call stack is ‘slab_destroy’ and you can see the code here.

static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
{
    void *addr = slabp->s_mem - slabp->colouroff;
 
    slab_destroy_objs(cachep, slabp);
    if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
        struct slab_rcu *slab_rcu;
 
        slab_rcu = (struct slab_rcu *)slabp;
        slab_rcu->cachep = cachep;
        slab_rcu->addr = addr;
        call_rcu(&slab_rcu->head, kmem_rcu_free);
    } else {
        kmem_freepages(cachep, addr);
        if (OFF_SLAB(cachep))
            kmem_cache_free(cachep->slabp_cache, slabp);
    }
}

Now disassemble the caller of slab_destroy to check the second argument ‘slabp’.

crash> dis -r ffffffff800d73e6
...
0xffffffff800d73d4 :    mov    %rbx,%rsi  <-- It's the second argument
0xffffffff800d73d7 :    mov    %r12,%rdi
0xffffffff800d73da :    sub    %rax,%rdx
0xffffffff800d73dd :    mov    %rdx,0x30(%rbp)
0xffffffff800d73e1 :    callq  0xffffffff800d7241 

We can see that slab pointer is saved in %rbx by disaseembling slab_destroy.

crash> dis slab_destroy
0xffffffff800d7241 :      push   %r13
0xffffffff800d7243 :    xor    %r13d,%r13d
0xffffffff800d7246 :    push   %r12
0xffffffff800d7248 :    push   %rbp
0xffffffff800d7249 :    mov    %rsi,%rbp
0xffffffff800d724c :   push   %rbx     <-- slab_destroy saved the second argument into the stack as fourth entry.
...

To retrieve the data from the stack, you should remember below fact.

  1. Using stack decreament address
  2. First entry is the return address (ffffffff800d73e6)
crash> bt -f
PID: 26     TASK: ffff81027f9197a0  CPU: 0   COMMAND: "events/0"
...
 #4 [ffff81027f92fd58] slab_destroy at ffffffff800d72c1
    ffff81027f92fd60: 0000000000000000 ffff81017f684000  <-- %rbx
    ffff81027f92fd70: ffff810108c99c40 ffff810108ca30c0 
    ffff81027f92fd80: ffff810108ca4098 ffffffff800d73e6  <--  Return address
 #5 [ffff81027f92fd88] free_block at ffffffff800d73e6    <--  Caller address

Now check the slabp as we know the address from the stack.

crash> slab ffff81017f684000
struct slab {
  list = {
    next = 0x100100, 
    prev = 0x200200
  }, 
  colouroff = 103079215224, 
  s_mem = 0xffff81017f684078, 
  inuse = 0, 
  free = 7, 
  nodeid = 0
}

The code uses slabp->s_mem – slabp->colour off which is the bad address 0xffff80e97f684000.

crash> rd 0xffff81017f684078
ffff81017f684078:  0000001000000000                    ........

s_mem has correct pointer, but the colour off has way too big value.

Checking slab entries also shows the some corrupted links.

crash> kmem -s ffff81017f684000
kmem: dentry_cache: partial list: slab: ffff81017f684000  bad next pointer: 100100
kmem: dentry_cache: full list: slab: ffff81017f684000  bad next pointer: 100100
kmem: dentry_cache: free list: slab: ffff81017f684000  bad next pointer: 100100
kmem: dentry_cache: address not found in cache: ffff81017f684000

In this case some code in the kernel corrupt the entry, so, it might be necessary to test with the debug kernel to check who tries to write beyond the allowed memory range.

In this case, it doesn’t show the cause of the issue, but can learn how we can dig into the stack to check the possible cause.



4 responses to “Jump into vmcore analysis – Step 8”

  1. Thanks for your note.

  2. “…
    0xffffffff800d6bd9 : mov -0x7fb70480(,%rax,8),%rax

    From the ‘bt’ output, we can see that ‘RAX’ has value ‘000000000ffffffe’. As RAX is the index in the mem_section, the actual address is ‘ffffffe97f684000’ …”

    Do you mean the actual address of “-0x7fb70480(,%rax,8)”, how can it be ‘ffffffe97f684000’ assuming [base+index*scale+disp]?

    1. Hey Vince,

      It’s updated to make clear. I meant to say the original virtual address is ffffffe97f684000.

      Cheers.

      1. Thanks for your nice post!

Leave a Reply to Bo Cancel reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

About Me

A software engineer who loves any technologies that makes life easier. That’s why I love Linux and Mac at the same time.

Newsletter

%d bloggers like this: