Message ID | 1452457610.16445.13.camel@edumazet-glaptop2.roam.corp.google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10.01.2016 22:26, Eric Dumazet wrote: > > I do not see anything wrong with this commit. > > It must uncover a prior bug. > Could be, but it was working fine in 3.19. > Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ? > > > diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c > index a58680016472..fd2f44940bd7 100644 > --- a/net/phonet/pn_dev.c > +++ b/net/phonet/pn_dev.c The patch makes difference: <3>[ 118.943542] et8ek8 3-003e: could not get clock <6>[ 119.028045] IPv6: ADDRCONF(NETDEV_UP): phonet0: link is not ready <4>[ 120.945922] sched: RT throttling activated <5>[ 158.556091] ssi-protocol ssi-protocol: WAKELINES TEST OK <6>[ 158.561828] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready <1>[ 158.708831] Unable to handle kernel NULL pointer dereference at virtual address 0000005c <1>[ 158.717498] pgd = ce548000 <1>[ 158.720367] [0000005c] *pgd=8e53a831, *pte=00000000, *ppte=00000000 <0>[ 158.727020] Internal error: Oops: 17 [#1] PREEMPT ARM <4>[ 158.732330] Modules linked in: cmt_speech nokia_modem ssi_protocol vfat fat rfcomm sd_mod scsi_mod bnep bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4 wl1251_spi wl1251 isp1704_charger gpio_keys mac80211 smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 leds_lp5523 leds_lp55xx_common adp1653 bq27xxx_battery tsl2563 twl4030_wdt rtc_twl et8ek8 smiaregs ad5820 v4l2_common twl4030_vibra ff_memless videodev lis3lv02d_i2c media lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal thermal_sys hwmon rx51_battery <4>[ 158.789154] CPU: 0 PID: 1038 Comm: csd Not tainted 4.4.0-rc7+ #14 <4>[ 158.795562] Hardware name: Nokia RX-51 board <4>[ 158.800048] task: ce4d5900 ti: ce4e4000 task.ti: ce4e4000 <4>[ 158.805725] PC is at __netif_receive_skb_core+0x7c0/0x92c <4>[ 158.811401] LR is at sock_queue_rcv_skb+0x208/0x214 <4>[ 158.816528] pc : [<c0394218>] lr : [<c0385648>] psr: 00000113 <4>[ 158.816528] sp : ce4e5e98 ip : 14400000 fp : cd4ab380 <4>[ 158.828582] r10: c249405c r9 : 00000000 r8 : c24b2d74 <4>[ 158.834045] r7 : 0000f500 r6 : c2494000 r5 : c2494048 r4 : c24b2cc0 <4>[ 158.840911] r3 : 00000000 r2 : 00000002 r1 : 00000000 r0 : 00000000 <4>[ 158.847778] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none <4>[ 158.855285] Control: 10c5387d Table: 8e548019 DAC: 00000051 <0>[ 158.861297] Process csd (pid: 1038, stack limit = 0xce4e4210) <0>[ 158.867340] Stack: (0xce4e5e98 to 0xce4e6000) <0>[ 158.871917] 5e80: cf3ab000 cf2e4d90 <0>[ 158.880554] 5ea0: 00000002 c24b2cc0 c06534dc c243b67c c5dcae00 c00685ac c243b67c 40000113 <0>[ 158.889129] 5ec0: ffffffff bf3b0fac bf3b0d88 ffffffff 00000000 c06780c0 c24b2cc0 00000000 <0>[ 158.897735] 5ee0: c06780b4 c0678088 ce4e5f18 00000040 ce4e5f20 c0396950 c03968dc c06780c0 <0>[ 158.906341] 5f00: 00000001 0000012c 00000040 c0678357 ffffc8d0 c03970a4 ce4e5f18 ce4e5f18 <0>[ 158.914947] 5f20: ce4e5f20 ce4e5f20 cf3ab000 00000008 c0679380 00000008 c0679340 c067938c <0>[ 158.923522] 5f40: 00000100 00400100 c0441be8 c0030e64 cf807f00 fa2000cc 00000003 00000003 <0>[ 158.932128] 5f60: ffffc8cf 00000004 10c53c7d 00000000 00000000 cf802000 00000001 10c53c7d <0>[ 158.940734] 5f80: 00001a30 0000026c beba476c c0031214 00000000 c005b4c4 ce4e5fb0 b6e3f82c <0>[ 158.949310] 5fa0: 40000010 ffffffff 10c5387d c043c990 00000014 00000000 0001ef78 00000014 <0>[ 158.957946] 5fc0: 0001edf8 00000001 00000000 00000000 00000000 00001a30 0000026c beba476c <0>[ 158.966522] 5fe0: b6e4d82c beba4748 b6e3f82c b6e3f82c 40000010 ffffffff ddee3fd6 e7f5d5dd <4>[ 158.975158] [<c0394218>] (__netif_receive_skb_core) from [<c0396950>] (process_backlog+0x74/0xf4) <4>[ 158.984497] [<c0396950>] (process_backlog) from [<c03970a4>] (net_rx_action+0xd0/0x284) <4>[ 158.992919] [<c03970a4>] (net_rx_action) from [<c0030e64>] (__do_softirq+0xb0/0x208) <4>[ 159.001037] [<c0030e64>] (__do_softirq) from [<c0031214>] (irq_exit+0x80/0xe4) <4>[ 159.008636] [<c0031214>] (irq_exit) from [<c005b4c4>] (__handle_domain_irq+0x88/0xa8) <4>[ 159.016906] [<c005b4c4>] (__handle_domain_irq) from [<c043c990>] (__irq_usr+0x50/0x80) <0>[ 159.025207] Code: e59d400c e5943014 e1530006 0a000025 (e593505c) <4>[ 159.031677] ---[ end trace c52d1e36f2d07d67 ]--- <0>[ 159.038146] Kernel panic - not syncing: Fatal exception in interrupt --[ end trace c52d1e36f2d07d67 ]--- Could you provide some hints on where to put some printks or dump some data to get a clearer picture on what happens? Thanks, Ivo -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Would you be able to disassemble your kernel so we could tell where the null pointer dereference happens? Thanks, Salam On Mon, Jan 11, 2016 at 1:03 PM, Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> wrote: > > > On 10.01.2016 22:26, Eric Dumazet wrote: >> >> >> I do not see anything wrong with this commit. >> >> It must uncover a prior bug. >> > > Could be, but it was working fine in 3.19. > >> Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ? >> >> >> diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c >> index a58680016472..fd2f44940bd7 100644 >> --- a/net/phonet/pn_dev.c >> +++ b/net/phonet/pn_dev.c > > > The patch makes difference: > > <3>[ 118.943542] et8ek8 3-003e: could not get clock > <6>[ 119.028045] IPv6: ADDRCONF(NETDEV_UP): phonet0: link is not ready > <4>[ 120.945922] sched: RT throttling activated > <5>[ 158.556091] ssi-protocol ssi-protocol: WAKELINES TEST OK > <6>[ 158.561828] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready > <1>[ 158.708831] Unable to handle kernel NULL pointer dereference at > virtual address 0000005c > <1>[ 158.717498] pgd = ce548000 > <1>[ 158.720367] [0000005c] *pgd=8e53a831, *pte=00000000, *ppte=00000000 > <0>[ 158.727020] Internal error: Oops: 17 [#1] PREEMPT ARM > <4>[ 158.732330] Modules linked in: cmt_speech nokia_modem ssi_protocol > vfat fat rfcomm sd_mod scsi_mod bnep bluetooth omaplfb pvrsrvkm ipv6 > bq2415x_charger uinput hsi_char radio_platform_si4713 joydev omap_ssi_port > video_bus_switch arc4 wl1251_spi wl1251 isp1704_charger gpio_keys mac80211 > smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 > leds_lp5523 leds_lp55xx_common adp1653 bq27xxx_battery tsl2563 twl4030_wdt > rtc_twl et8ek8 smiaregs ad5820 v4l2_common twl4030_vibra ff_memless videodev > lis3lv02d_i2c media lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal > thermal_sys hwmon rx51_battery > <4>[ 158.789154] CPU: 0 PID: 1038 Comm: csd Not tainted 4.4.0-rc7+ #14 > <4>[ 158.795562] Hardware name: Nokia RX-51 board > <4>[ 158.800048] task: ce4d5900 ti: ce4e4000 task.ti: ce4e4000 > <4>[ 158.805725] PC is at __netif_receive_skb_core+0x7c0/0x92c > <4>[ 158.811401] LR is at sock_queue_rcv_skb+0x208/0x214 > <4>[ 158.816528] pc : [<c0394218>] lr : [<c0385648>] psr: 00000113 > <4>[ 158.816528] sp : ce4e5e98 ip : 14400000 fp : cd4ab380 > <4>[ 158.828582] r10: c249405c r9 : 00000000 r8 : c24b2d74 > <4>[ 158.834045] r7 : 0000f500 r6 : c2494000 r5 : c2494048 r4 : c24b2cc0 > <4>[ 158.840911] r3 : 00000000 r2 : 00000002 r1 : 00000000 r0 : 00000000 > <4>[ 158.847778] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM > Segment none > <4>[ 158.855285] Control: 10c5387d Table: 8e548019 DAC: 00000051 > <0>[ 158.861297] Process csd (pid: 1038, stack limit = 0xce4e4210) > <0>[ 158.867340] Stack: (0xce4e5e98 to 0xce4e6000) > <0>[ 158.871917] 5e80: cf3ab000 cf2e4d90 > <0>[ 158.880554] 5ea0: 00000002 c24b2cc0 c06534dc c243b67c c5dcae00 > c00685ac c243b67c 40000113 > <0>[ 158.889129] 5ec0: ffffffff bf3b0fac bf3b0d88 ffffffff 00000000 > c06780c0 c24b2cc0 00000000 > <0>[ 158.897735] 5ee0: c06780b4 c0678088 ce4e5f18 00000040 ce4e5f20 > c0396950 c03968dc c06780c0 > <0>[ 158.906341] 5f00: 00000001 0000012c 00000040 c0678357 ffffc8d0 > c03970a4 ce4e5f18 ce4e5f18 > <0>[ 158.914947] 5f20: ce4e5f20 ce4e5f20 cf3ab000 00000008 c0679380 > 00000008 c0679340 c067938c > <0>[ 158.923522] 5f40: 00000100 00400100 c0441be8 c0030e64 cf807f00 > fa2000cc 00000003 00000003 > <0>[ 158.932128] 5f60: ffffc8cf 00000004 10c53c7d 00000000 00000000 > cf802000 00000001 10c53c7d > <0>[ 158.940734] 5f80: 00001a30 0000026c beba476c c0031214 00000000 > c005b4c4 ce4e5fb0 b6e3f82c > <0>[ 158.949310] 5fa0: 40000010 ffffffff 10c5387d c043c990 00000014 > 00000000 0001ef78 00000014 > <0>[ 158.957946] 5fc0: 0001edf8 00000001 00000000 00000000 00000000 > 00001a30 0000026c beba476c > <0>[ 158.966522] 5fe0: b6e4d82c beba4748 b6e3f82c b6e3f82c 40000010 > ffffffff ddee3fd6 e7f5d5dd > <4>[ 158.975158] [<c0394218>] (__netif_receive_skb_core) from [<c0396950>] > (process_backlog+0x74/0xf4) > <4>[ 158.984497] [<c0396950>] (process_backlog) from [<c03970a4>] > (net_rx_action+0xd0/0x284) > <4>[ 158.992919] [<c03970a4>] (net_rx_action) from [<c0030e64>] > (__do_softirq+0xb0/0x208) > <4>[ 159.001037] [<c0030e64>] (__do_softirq) from [<c0031214>] > (irq_exit+0x80/0xe4) > <4>[ 159.008636] [<c0031214>] (irq_exit) from [<c005b4c4>] > (__handle_domain_irq+0x88/0xa8) > <4>[ 159.016906] [<c005b4c4>] (__handle_domain_irq) from [<c043c990>] > (__irq_usr+0x50/0x80) > <0>[ 159.025207] Code: e59d400c e5943014 e1530006 0a000025 (e593505c) > <4>[ 159.031677] ---[ end trace c52d1e36f2d07d67 ]--- > <0>[ 159.038146] Kernel panic - not syncing: Fatal exception in interrupt > --[ end trace c52d1e36f2d07d67 ]--- > > Could you provide some hints on where to put some printks or dump some data > to get a clearer picture on what happens? > > Thanks, > Ivo -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12.01.2016 00:11, Salam Noureddine wrote: > Would you be able to disassemble your kernel so we could tell where > the null pointer dereference happens? > Sure, but wouldn't it be better to provide the object file containing the debug symbols as well? Otherwise, the null pointer dereference happens somewhere in: (gdb) l *__netif_receive_skb_core+0x7c0 0x1318 is in __netif_receive_skb_core (include/linux/compiler.h:218). 213 }) 214 215 static __always_inline 216 void __read_once_size(const volatile void *p, void *res, int size) 217 { 218 __READ_ONCE_SIZE; 219 } 220 221 #ifdef CONFIG_KASAN 222 /* (gdb) l *__netif_receive_skb_core+0x7bc 0x1314 is in __netif_receive_skb_core (net/core/dev.c:3934). 3929 } 3930 3931 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type, 3932 &orig_dev->ptype_specific); 3933 3934 if (unlikely(skb->dev != orig_dev)) { 3935 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type, 3936 &skb->dev->ptype_specific); 3937 } 3938 (gdb) l *__netif_receive_skb_core+0x7c4 0x131c is in __netif_receive_skb_core (net/core/dev.c:3935). 3930 3931 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type, 3932 &orig_dev->ptype_specific); 3933 3934 if (unlikely(skb->dev != orig_dev)) { 3935 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type, 3936 &skb->dev->ptype_specific); 3937 } 3938 3939 if (pt_prev) { 0x00001300 <+1960>: cmp r10, r3 0x00001304 <+1964>: bne 0x1284 <__netif_receive_skb_core+1836> 0x00001308 <+1968>: ldr r4, [sp, #12] 0x0000130c <+1972>: ldr r3, [r4, #20] 0x00001310 <+1976>: cmp r3, r6 0x00001314 <+1980>: beq 0x13b0 <__netif_receive_skb_core+2136> 0x00001318 <+1984>: ldr r5, [r3, #92] ; 0x5c <-FAULT r3 seems to be skb->dev 0x0000131c <+1988>: add r10, r3, #92 ; 0x5c 0x00001320 <+1992>: add r8, r4, #180 ; 0xb4 0x00001324 <+1996>: sub r5, r5, #20 0x00001328 <+2000>: b 0x13a4 <__netif_receive_skb_core+2124> 0x0000132c <+2004>: ldrh r3, [r5] 0x00001330 <+2008>: cmp r3, r7 I put some additional printks around that code, and it turned out that skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but "&skb->dev->ptype_specific" oopses. Thanks, Ivo -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2016-01-12 at 02:51 +0200, Ivaylo Dimitrov wrote: > > I put some additional printks around that code, and it turned out that > skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but > "&skb->dev->ptype_specific" oopses. Nice find ! Now lets find what possibly called netif_rx() with skb->dev == NULL (This is illegal) -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
It must be that skb->dev was changed to NULL inside of __netif_receive_skb_core, otherwise we would have crashed much earlier. Also, orig_dev is saved at the beginning. Possibly a device is layered on top of the original device. On Mon, Jan 11, 2016 at 5:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2016-01-12 at 02:51 +0200, Ivaylo Dimitrov wrote: > >> >> I put some additional printks around that code, and it turned out that >> skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but >> "&skb->dev->ptype_specific" oopses. > > Nice find ! > > Now lets find what possibly called netif_rx() with skb->dev == NULL > > (This is illegal) > > > -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote: > It must be that skb->dev was changed to NULL inside of > __netif_receive_skb_core, otherwise we would have crashed much > earlier. Also, orig_dev is saved at the beginning. Possibly a device > is layered on top of the original device. Please do not top post on netdev / lkml mailing lists. My guess is a protocol handler queued the skb without calling skb_share_check() -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c index a58680016472..fd2f44940bd7 100644 --- a/net/phonet/pn_dev.c +++ b/net/phonet/pn_dev.c @@ -63,11 +63,11 @@ struct phonet_device_list *phonet_device_list(struct net *net) static struct phonet_device *__phonet_device_alloc(struct net_device *dev) { struct phonet_device_list *pndevs = phonet_device_list(dev_net(dev)); - struct phonet_device *pnd = kmalloc(sizeof(*pnd), GFP_ATOMIC); - if (pnd == NULL) + struct phonet_device *pnd = kzalloc(sizeof(*pnd), GFP_KERNEL); + + if (!pnd) return NULL; pnd->netdev = dev; - bitmap_zero(pnd->addrs, 64); BUG_ON(!mutex_is_locked(&pndevs->lock)); list_add_rcu(&pnd->list, &pndevs->list); @@ -117,7 +117,7 @@ static void phonet_device_destroy(struct net_device *dev) for_each_set_bit(addr, pnd->addrs, 64) phonet_address_notify(RTM_DELADDR, dev, addr); - kfree(pnd); + kfree_rcu(pnd, rcu); } } @@ -301,6 +301,7 @@ static int phonet_device_notify(struct notifier_block *me, unsigned long what, if (dev->type == ARPHRD_PHONET) phonet_device_autoconf(dev); break; + case NETDEV_DOWN: case NETDEV_UNREGISTER: phonet_device_destroy(dev); phonet_route_autodel(dev);