How to locate a Linux crash: CPU 0 Unable to handle kernel paging request at virtual address

Linux crashed while running and printed the following:

CPU 0 Unable to handle kernel paging request at virtual address 0000000000000318, epc == ffffffffc0445a10, ra == ffffffffc04459dc
Oops[#1]:
Cpu 0
0 : 0000000000000000 ffffffff808b1da0 0000000000000300 0000000000000030
4 : 0000000000000000 a8000000029d2160 000000000000002e a800000002559000
8 : a8000000029d2140 0000000000000001 0000000000000000 0000000000000018
12 : 0000000000000000 000000001000001f a800000031180000 0000000000000000
16 : a8000000029d214e 0000000000000300 a8000000012d1600 a8000000029c8580
20 : a8000000012d1870 ffffffff812408e8 0000000000000806 0000000000000000
24 : 00000000000002b1 000000555d5887b0
28 : ffffffff811c4000 ffffffff811c7970 ffffffff811c7970 ffffffffc04459dc
Hi : 0000000000000000
Lo : 0000000000000000
epc : ffffffffc0445a10 rlb_arp_recv+0x128/0x228 [bonding]
Tainted: P
ra : ffffffffc04459dc rlb_arp_recv+0xf4/0x228 [bonding]
Status: 1010cce3 KX SX UX KERNEL EXL IE
Cause : 00800008
BadVA : 0000000000000318
PrId : 000d9202 (Cavium Octeon II)
Modules linked in: bonding run(P) raid vscsih iscsitgt disk vdisk cache(P) service gmeta mpt2sas netlink bubble platform octeon_ethernet at24
Process swapper (pid: 0, threadinfo=ffffffff811c4000, task=ffffffff811e5280, tls=0000000000000000)
Stack : 0000000000000003 ffffffff81241498 ffffffff812414d8 a8000000029c8580
a8000000029c8644 a800000002559000 ffffffff811c79b0 ffffffff807a7648
000d0300000d0300 ffffffff808b22e0 000000000000003c a800000002559600
a8000000029c8580 a800000002b7d280 0000000000000000 0000000000000001
0000000000000001 0000000000000001 ffffffff811c7a10 ffffffffc0010154
ffffffff811c7b80 ffffffff802d22e8 0000000000000000 ffffffff80356140
0000000000000000 0000000000000000 8001670000000000 0000000000000001
0000000000000003 0000000000000001 0000000000000000 000000000000ffff
0000000000000000 ffffffffc001ac00 0000000000000020 000000011000001f
a800000031180000 0000000000000000 ffffffff811d2a00 8001670000000100
...
Call Trace:
[] rlb_arp_recv+0x128/0x228 [bonding]
[] netif_receive_skb+0x3f0/0x4d8
[] cvm_oct_napi_poll_38+0x7ac/0x10e8 [octeon_ethernet]
[] net_rx_action+0x128/0x280
[] __do_softirq+0x130/0x248
[] do_softirq+0x88/0x90
[] irq_exit+0x70/0x88
[] do_IRQ+0x48/0x60
[] octeon_irq_ip2_ciu+0x94/0xb8
[] plat_irq_dispatch+0x80/0xd0
[] ret_from_irq+0x0/0x4
[] r4k_wait+0x20/0x40
[] cpu_idle+0x84/0xa0
[] rest_init+0x80/0x98
[] start_kernel+0x37c/0x4c4

Code: de440268 70431003 0082882d <92230018> 10600007 3c02808b 8a020018 8e230000 9a02001b
Kernel panic - not syncing: Fatal exception in interrupt

*** NMI Watchdog interrupt on Core 0x01 ***
$0 0x0000000000000000 at 0xffffffff803471bc
v0 0xffffffff802d24c0 v1 0x0000000000000001
a0 0xfffffffffffffffd a1 0x0000000000000000
a2 0xffffffff812403c8 a3 0x0000000000000001
a4 0x0000000000000800 a5 0x0000000000000020
a6 0x0000000000000000 a7 0x000000aaab43b498
t0 0x0000000000000000 t1 0x000000001000001f
t2 0xa800000031188000 t3 0x0000000000000000
s0 0xffffffff853e0000 s1 0xffffffff853f0000
s2 0xffffffff811c8980 s3 0x0000000000000000
s4 0x0000000000000002 s5 0x0000000000200200
s6 0xffffffff811c8990 s7 0xffffffff811287d0
t8 0x0000000000000000 t9 0x0000005561b7f7b0
k0 0x0000000000000000 k1 0x0000000000000000
gp 0xa8000000310fc000 sp 0xa8000000310ffb10
s8 0xa8000000310ffb10 ra 0xffffffff802dbc18
err_epc 0xffffffff802d24e0 epc 0xffffffff802d24e0
status 0x000000001058cce4 cause 0x0000000040808800
sum0 0x0000000000000000 en0 0x0000000000000000
*** Chip soft reset soon ***

The key lines are these:

epc   : ffffffffc0445a10 rlb_arp_recv+0x128/0x228
Call Trace:
[] rlb_arp_recv+0x128/0x228 [bonding]
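The epc line packs four pieces of information into one token: the function name, the offset of the faulting instruction inside that function, the total size of the function, and the module. As a minimal sketch (the helper name parse_symbol and the regular expression are my own, not part of any kernel tooling), they can be pulled apart like this:

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE [module]" form used in the oops output.
# The "[module]" suffix is optional because built-in symbols do not have one.
SYM_RE = re.compile(
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

def parse_symbol(text):
    """Return symbol, offset, size and module from an oops line, or None."""
    m = SYM_RE.search(text)
    if m is None:
        return None
    return {
        "symbol": m.group("sym"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("mod"),
    }

line = "epc   : ffffffffc0445a10 rlb_arp_recv+0x128/0x228 [bonding]"
info = parse_symbol(line)
print(info["symbol"], hex(info["offset"]), hex(info["size"]), info["module"])
```

So the crash is 0x128 bytes into rlb_arp_recv, a function that is 0x228 bytes long, inside the bonding module.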

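Disassemble the .ko module in which the crash occurred (-S interleaves the source code with the assembly, which requires that the module was built with debug information):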

mips64-octeon-linux-gnu-objdump -S  bonding.ko

Search for the base address of rlb_arp_recv and compute the crash location:

000000000000e8e8 <rlb_arp_recv>:

0xe8e8 + 0x128 = 0xea10
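The same arithmetic can be cross-checked against the ra line of the oops. This is only a sketch using the values printed above; the variable names are mine, and it assumes the addresses in the objdump listing are offsets from the start of the module code as loaded. If that holds, epc and ra must both imply the same load address:

```python
FUNC_BASE = 0xe8e8            # rlb_arp_recv in the objdump listing
EPC = 0xffffffffc0445a10      # epc : rlb_arp_recv+0x128
RA = 0xffffffffc04459dc       # ra  : rlb_arp_recv+0xf4

# Offsets of the two instructions inside bonding.ko
epc_offset = FUNC_BASE + 0x128
ra_offset = FUNC_BASE + 0xf4

# Run-time address minus file offset gives the implied load address
base_from_epc = EPC - epc_offset
base_from_ra = RA - ra_offset

print(hex(epc_offset))        # 0xea10
print(hex(ra_offset))         # 0xe9dc
print(hex(base_from_epc))     # 0xffffffffc0437000
print(base_from_epc == base_from_ra)
```

Both lines give the same page-aligned base, which is a good sign that 0xea10 really is the instruction to look at in the listing.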

In other words, the faulting location is the statement if ((client_info->assigned) &&, shown here in the disassembly:

 _lock_rx_hashtbl(bond);

hash_index = _simple_hash((u8*)&(arp->ip_src), sizeof(arp->ip_src));
client_info = &(bond_info->rx_hashtbl[hash_index]);
e9fc: 7c82f803 dext v0,a0,0x0,0x20
ea00: 24030030 li v1,48
ea04: de440268 ld a0,616(s2)
ea08: 70431003 dmul v0,v0,v1
ea0c: 0082882d daddu s1,a0,v0

if ((client_info->assigned) &&
ea10: 92230018 lbu v1,24(s1)
ea14: 10600007 beqz v1,ea34
ea18: 3c020000 lui v0,0x0
ea1c: 8a020018 lwl v0,24(s0)
ea20: 8e230000 lw v1,0(s1)
ea24: 9a02001b lwr v0,27(s0)
ea28: 10620019 beq v1,v0,ea90
ea2c: 00000000 nop
spin_lock_bh(&(BOND_ALB_INFO(bond).rx_hashtbl_lock));
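The listing can be tied back to the register dump. The word marked with <> in the Code: line of the oops is 92230018, the same encoding as the lbu at ea10. The sketch below decodes that word as a MIPS I-type instruction and recomputes the faulting address from the register values in the dump (register 2 is v0, 3 is v1, 4 is a0, 17 is s1). The function name decode_itype is my own, and the reading of the result is an inference from the dump rather than something the log states directly:

```python
def decode_itype(word):
    """Split a 32-bit MIPS I-type instruction into opcode, rs, rt, imm."""
    opcode = (word >> 26) & 0x3f
    rs = (word >> 21) & 0x1f
    rt = (word >> 16) & 0x1f
    imm = word & 0xffff
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm

LBU_OPCODE = 0x24             # MIPS opcode for lbu

# Values copied from the register dump in the oops
regs = {2: 0x300, 3: 0x30, 4: 0x0, 17: 0x300}

opcode, rs, rt, imm = decode_itype(0x92230018)
print(opcode == LBU_OPCODE, rs, rt, imm)   # lbu v1,24(s1)

# Address the load tried to read: base register plus immediate
fault_addr = regs[rs] + imm
print(hex(fault_addr))                     # 0x318, equal to BadVA

# s1 = a0 + v0 (daddu at ea0c), and v0 = hash_index * 48 (dmul at ea08)
table_ptr = regs[4]
hash_index = regs[2] // regs[3]
print(hex(table_ptr), hash_index)
```

The recomputed address matches BadVA (0x318). Since a0 is 0 and v0 is 0x300, the dump is consistent with the table pointer loaded by ld a0,616(s2) (bond_info->rx_hashtbl in the source line above it) being NULL, with hash_index equal to 16. That is where I would look next, although the oops alone does not prove why the pointer was NULL.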

epc: exception program counter, the address of the instruction that raised the exception. ra: return address.
