Linux crashed while running and printed the following:
CPU 0 Unable to handle kernel paging request at virtual address 0000000000000318, epc == ffffffffc0445a10, ra == ffffffffc04459dc
Oops[#1]:
Cpu 0
$ 0 : 0000000000000000 ffffffff808b1da0 0000000000000300 0000000000000030
$ 4 : 0000000000000000 a8000000029d2160 000000000000002e a800000002559000
$ 8 : a8000000029d2140 0000000000000001 0000000000000000 0000000000000018
$12 : 0000000000000000 000000001000001f a800000031180000 0000000000000000
$16 : a8000000029d214e 0000000000000300 a8000000012d1600 a8000000029c8580
$20 : a8000000012d1870 ffffffff812408e8 0000000000000806 0000000000000000
$24 : 00000000000002b1 000000555d5887b0
$28 : ffffffff811c4000 ffffffff811c7970 ffffffff811c7970 ffffffffc04459dc
Hi : 0000000000000000
Lo : 0000000000000000
epc : ffffffffc0445a10 rlb_arp_recv+0x128/0x228 [bonding] Tainted: P
ra : ffffffffc04459dc rlb_arp_recv+0xf4/0x228 [bonding]
Status: 1010cce3 KX SX UX KERNEL EXL IE
Cause : 00800008
BadVA : 0000000000000318
PrId : 000d9202 (Cavium Octeon II)
Modules linked in: bonding run(P) raid vscsih iscsitgt disk vdisk cache(P) service gmeta mpt2sas netlink bubble platform octeon_ethernet at24
Process swapper (pid: 0, threadinfo=ffffffff811c4000, task=ffffffff811e5280, tls=0000000000000000)
Stack : 0000000000000003 ffffffff81241498 ffffffff812414d8 a8000000029c8580
        a8000000029c8644 a800000002559000 ffffffff811c79b0 ffffffff807a7648
        000d0300000d0300 ffffffff808b22e0 000000000000003c a800000002559600
        a8000000029c8580 a800000002b7d280 0000000000000000 0000000000000001
        0000000000000001 0000000000000001 ffffffff811c7a10 ffffffffc0010154
        ffffffff811c7b80 ffffffff802d22e8 0000000000000000 ffffffff80356140
        0000000000000000 0000000000000000 8001670000000000 0000000000000001
        0000000000000003 0000000000000001 0000000000000000 000000000000ffff
        0000000000000000 ffffffffc001ac00 0000000000000020 000000011000001f
        a800000031180000 0000000000000000 ffffffff811d2a00 8001670000000100
        ...
Call Trace:
rlb_arp_recv+0x128/0x228 [bonding]
netif_receive_skb+0x3f0/0x4d8
cvm_oct_napi_poll_38+0x7ac/0x10e8 [octeon_ethernet]
net_rx_action+0x128/0x280
__do_softirq+0x130/0x248
do_softirq+0x88/0x90
irq_exit+0x70/0x88
do_IRQ+0x48/0x60
octeon_irq_ip2_ciu+0x94/0xb8
plat_irq_dispatch+0x80/0xd0
ret_from_irq+0x0/0x4
r4k_wait+0x20/0x40
cpu_idle+0x84/0xa0
rest_init+0x80/0x98
start_kernel+0x37c/0x4c4

Code: de440268 70431003 0082882d <92230018> 10600007 3c02808b 8a020018 8e230000 9a02001b
Kernel panic - not syncing: Fatal exception in interrupt

*** NMI Watchdog interrupt on Core 0x01 ***
$0 0x0000000000000000 at 0xffffffff803471bc
v0 0xffffffff802d24c0 v1 0x0000000000000001
a0 0xfffffffffffffffd a1 0x0000000000000000
a2 0xffffffff812403c8 a3 0x0000000000000001
a4 0x0000000000000800 a5 0x0000000000000020
a6 0x0000000000000000 a7 0x000000aaab43b498
t0 0x0000000000000000 t1 0x000000001000001f
t2 0xa800000031188000 t3 0x0000000000000000
s0 0xffffffff853e0000 s1 0xffffffff853f0000
s2 0xffffffff811c8980 s3 0x0000000000000000
s4 0x0000000000000002 s5 0x0000000000200200
s6 0xffffffff811c8990 s7 0xffffffff811287d0
t8 0x0000000000000000 t9 0x0000005561b7f7b0
k0 0x0000000000000000 k1 0x0000000000000000
gp 0xa8000000310fc000 sp 0xa8000000310ffb10
s8 0xa8000000310ffb10 ra 0xffffffff802dbc18
err_epc 0xffffffff802d24e0 epc 0xffffffff802d24e0
status 0x000000001058cce4 cause 0x0000000040808800
sum0 0x0000000000000000 en0 0x0000000000000000
*** Chip soft reset soon ***

The key lines are these:

epc : ffffffffc0445a10 rlb_arp_recv+0x128/0x228
Call Trace:
rlb_arp_recv+0x128/0x228 [bonding]

Both say the fault occurred 0x128 bytes into rlb_arp_recv, a function 0x228 bytes long, in the bonding module.

Disassemble the .ko module in which the crash occurred:

mips64-octeon-linux-gnu-objdump -S bonding.ko

Search the output for the base address of rlb_arp_recv and add the offset to get the crash location:

000000000000e8e8 <rlb_arp_recv>:

0xe8e8 + 0x128 = 0xea10

In other words, the faulting source line is `if ((client_info->assigned) &&`, as the listing around 0xea10 shows:

	_lock_rx_hashtbl(bond);

	hash_index = _simple_hash((u8*)&(arp->ip_src), sizeof(arp->ip_src));
	client_info = &(bond_info->rx_hashtbl[hash_index]);
    e9fc:	7c82f803 	dext	v0,a0,0x0,0x20
    ea00:	24030030 	li	v1,48
    ea04:	de440268 	ld	a0,616(s2)
    ea08:	70431003 	dmul	v0,v0,v1
    ea0c:	0082882d 	daddu	s1,a0,v0

	if ((client_info->assigned) &&
    ea10:	92230018 	lbu	v1,24(s1)
    ea14:	10600007 	beqz	v1,ea34
    ea18:	3c020000 	lui	v0,0x0
    ea1c:	8a020018 	lwl	v0,24(s0)
    ea20:	8e230000 	lw	v1,0(s1)
    ea24:	9a02001b 	lwr	v0,27(s0)
    ea28:	10620019 	beq	v1,v0,ea90
    ea2c:	00000000 	nop
	spin_lock_bh(&(BOND_ALB_INFO(bond).rx_hashtbl_lock));

The instruction word at ea10, 92230018, is the same word marked <92230018> in the Code: line of the oops, which confirms the match.

Terminology:
epc: exception program counter, the address of the instruction that raised the exception.
ra: return address.
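
The offset arithmetic above can be double-checked with a short script. What follows is a minimal Python sketch, not part of the original analysis; the helper names `parse_symbol_offset` and `fault_address` are hypothetical, and every constant is copied from the oops and the objdump listing. It also cross-checks the faulting instruction against the register dump: `lbu v1,24(s1)` reads from s1 + 24, and the dump shows $17 (s1) = 0x300, which gives 0x318, the reported BadVA. The dump further shows $4 (a0) = 0, the value loaded by `ld a0,616(s2)`. That suggests `bond_info->rx_hashtbl` was NULL when the ARP packet arrived, though this is an inference from the registers rather than something the log states directly.

```python
import re


def parse_symbol_offset(entry: str):
    """Split an oops entry 'func+0xOFF/0xSIZE' into (name, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", entry)
    if m is None:
        raise ValueError(f"unrecognized symbol entry: {entry!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def fault_address(symbol_base: int, entry: str) -> int:
    """Address inside the objdump listing that corresponds to the oops entry."""
    _, offset, size = parse_symbol_offset(entry)
    if offset >= size:
        raise ValueError("offset lies outside the function")
    return symbol_base + offset


# Base address of rlb_arp_recv as shown by objdump on bonding.ko.
RLB_ARP_RECV_BASE = 0xE8E8

addr = fault_address(RLB_ARP_RECV_BASE, "rlb_arp_recv+0x128/0x228")
print(hex(addr))  # 0xea10

# Register values copied from the oops dump ($2 = v0, $4 = a0, $17 = s1).
V0 = 0x300   # hash_index * 48 after the dmul
A0 = 0x0     # value loaded by 'ld a0,616(s2)', i.e. the rx_hashtbl pointer
S1 = 0x300   # result of 'daddu s1,a0,v0'

assert A0 + V0 == S1          # daddu is consistent with the dump
bad_va = S1 + 24              # effective address of 'lbu v1,24(s1)'
print(hex(bad_va))            # 0x318, matches BadVA
print(V0 // 48)               # 16, the hash index implied by the dump
```

The same addition can be done by hand or in any calculator. The script only helps when several oops entries need to be mapped into an objdump listing.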