Message ID | 20210203165458.28717-8-vadym.kochan@plvision.eu (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Marvell Prestera Switchdev misc updates | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 2 maintainers not CCed: tchornyi@marvell.com vkochan@marvell.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 9 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Wed, 3 Feb 2021 18:54:58 +0200 Vadym Kochan wrote: > For some reason there might be a crash during ports creation if port > events are handling at the same time because fw may send initial > port event with down state. > > The crash points to cancel_delayed_work() which is called when port went > is down. Currently I did not find out the real cause of the issue, so > fixed it by cancel port stats work only if previous port's state was up > & runnig. Maybe you just need to move the DELAYED_WORK_INIT() earlier? Not sure why it's at the end of prestera_port_create(), it just initializes some fields. > [ 28.489791] Call trace: > [ 28.492259] get_work_pool+0x48/0x60 > [ 28.495874] cancel_delayed_work+0x38/0xb0 > [ 28.500011] prestera_port_handle_event+0x90/0xa0 [prestera] > [ 28.505743] prestera_evt_recv+0x98/0xe0 [prestera] > [ 28.510683] prestera_fw_evt_work_fn+0x180/0x228 [prestera_pci] > [ 28.516660] process_one_work+0x1e8/0x360 > [ 28.520710] worker_thread+0x44/0x480 > [ 28.524412] kthread+0x154/0x160 > [ 28.527670] ret_from_fork+0x10/0x38 > [ 28.531290] Code: a8c17bfd d50323bf d65f03c0 9278dc21 (f9400020) > [ 28.537429] ---[ end trace 5eced933df3a080b ]--- > > Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> > --- > drivers/net/ethernet/marvell/prestera/prestera_main.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c > index 39465e65d09b..122324dae47d 100644 > --- a/drivers/net/ethernet/marvell/prestera/prestera_main.c > +++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c > @@ -433,7 +433,8 @@ static void prestera_port_handle_event(struct prestera_switch *sw, > netif_carrier_on(port->dev); > if (!delayed_work_pending(caching_dw)) > queue_delayed_work(prestera_wq, caching_dw, 0); > - } else { > + } else if (netif_running(port->dev) && > + netif_carrier_ok(port->dev)) { > netif_carrier_off(port->dev); > if (delayed_work_pending(caching_dw)) > cancel_delayed_work(caching_dw);
Hi Jakub, On Thu, Feb 04, 2021 at 09:19:34PM -0800, Jakub Kicinski wrote: > On Wed, 3 Feb 2021 18:54:58 +0200 Vadym Kochan wrote: > > For some reason there might be a crash during ports creation if port > > events are handling at the same time because fw may send initial > > port event with down state. > > > > The crash points to cancel_delayed_work() which is called when port went > > is down. Currently I did not find out the real cause of the issue, so > > fixed it by cancel port stats work only if previous port's state was up > > & runnig. > > Maybe you just need to move the DELAYED_WORK_INIT() earlier? > Not sure why it's at the end of prestera_port_create(), it > just initializes some fields. > Thanks for suggestion, but it does not help. Will try to find-out the real root cause but this is the only fix I 'v came up. > > [ 28.489791] Call trace: > > [ 28.492259] get_work_pool+0x48/0x60 > > [ 28.495874] cancel_delayed_work+0x38/0xb0 > > [ 28.500011] prestera_port_handle_event+0x90/0xa0 [prestera] > > [ 28.505743] prestera_evt_recv+0x98/0xe0 [prestera] > > [ 28.510683] prestera_fw_evt_work_fn+0x180/0x228 [prestera_pci] > > [ 28.516660] process_one_work+0x1e8/0x360 > > [ 28.520710] worker_thread+0x44/0x480 > > [ 28.524412] kthread+0x154/0x160 > > [ 28.527670] ret_from_fork+0x10/0x38 > > [ 28.531290] Code: a8c17bfd d50323bf d65f03c0 9278dc21 (f9400020) > > [ 28.537429] ---[ end trace 5eced933df3a080b ]--- > > > > Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> > > --- > > drivers/net/ethernet/marvell/prestera/prestera_main.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c > > index 39465e65d09b..122324dae47d 100644 > > --- a/drivers/net/ethernet/marvell/prestera/prestera_main.c > > +++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c > > @@ -433,7 +433,8 @@ static void prestera_port_handle_event(struct prestera_switch *sw, > > netif_carrier_on(port->dev); > > if (!delayed_work_pending(caching_dw)) > > queue_delayed_work(prestera_wq, caching_dw, 0); > > - } else { > > + } else if (netif_running(port->dev) && > > + netif_carrier_ok(port->dev)) { > > netif_carrier_off(port->dev); > > if (delayed_work_pending(caching_dw)) > > cancel_delayed_work(caching_dw); >
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c index 39465e65d09b..122324dae47d 100644 --- a/drivers/net/ethernet/marvell/prestera/prestera_main.c +++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c @@ -433,7 +433,8 @@ static void prestera_port_handle_event(struct prestera_switch *sw, netif_carrier_on(port->dev); if (!delayed_work_pending(caching_dw)) queue_delayed_work(prestera_wq, caching_dw, 0); - } else { + } else if (netif_running(port->dev) && + netif_carrier_ok(port->dev)) { netif_carrier_off(port->dev); if (delayed_work_pending(caching_dw)) cancel_delayed_work(caching_dw);
For some reason there might be a crash during ports creation if port events are handling at the same time because fw may send initial port event with down state. The crash points to cancel_delayed_work() which is called when port went is down. Currently I did not find out the real cause of the issue, so fixed it by cancel port stats work only if previous port's state was up & runnig. The following is the crash which can be triggered: [ 28.311104] Unable to handle kernel paging request at virtual address 000071775f776600 [ 28.319097] Mem abort info: [ 28.321914] ESR = 0x96000004 [ 28.324996] EC = 0x25: DABT (current EL), IL = 32 bits [ 28.330350] SET = 0, FnV = 0 [ 28.333430] EA = 0, S1PTW = 0 [ 28.336597] Data abort info: [ 28.339499] ISV = 0, ISS = 0x00000004 [ 28.343362] CM = 0, WnR = 0 [ 28.346354] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100bf7000 [ 28.352842] [000071775f776600] pgd=0000000000000000, p4d=0000000000000000 [ 28.359695] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 28.365310] Modules linked in: prestera_pci(+) prestera uio_pdrv_genirq [ 28.372005] CPU: 0 PID: 1291 Comm: kworker/0:1H Not tainted 5.11.0-rc4 #1 [ 28.378846] Hardware name: DNI AmazonGo1 A7040 board (DT) [ 28.384283] Workqueue: prestera_fw_wq prestera_fw_evt_work_fn [prestera_pci] [ 28.391413] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--) [ 28.397468] pc : get_work_pool+0x48/0x60 [ 28.401442] lr : try_to_grab_pending+0x6c/0x1b0 [ 28.406018] sp : ffff80001391bc60 [ 28.409358] x29: ffff80001391bc60 x28: 0000000000000000 [ 28.414725] x27: ffff000104fc8b40 x26: ffff80001127de88 [ 28.420089] x25: 0000000000000000 x24: ffff000106119760 [ 28.425452] x23: ffff00010775dd60 x22: ffff00010567e000 [ 28.430814] x21: 0000000000000000 x20: ffff80001391bcb0 [ 28.436175] x19: ffff00010775deb8 x18: 00000000000000c0 [ 28.441537] x17: 0000000000000000 x16: 000000008d9b0e88 [ 28.446898] x15: 0000000000000001 x14: 00000000000002ba [ 28.452261] x13: 80a3002c00000002 x12: 00000000000005f4 [ 28.457622] x11: 0000000000000030 x10: 000000000000000c [ 28.462985] x9 : 000000000000000c x8 : 0000000000000030 [ 28.468346] x7 : ffff800014400000 x6 : ffff000106119758 [ 28.473708] x5 : 0000000000000003 x4 : ffff00010775dc60 [ 28.479068] x3 : 0000000000000000 x2 : 0000000000000060 [ 28.484429] x1 : 000071775f776600 x0 : ffff00010775deb8 [ 28.489791] Call trace: [ 28.492259] get_work_pool+0x48/0x60 [ 28.495874] cancel_delayed_work+0x38/0xb0 [ 28.500011] prestera_port_handle_event+0x90/0xa0 [prestera] [ 28.505743] prestera_evt_recv+0x98/0xe0 [prestera] [ 28.510683] prestera_fw_evt_work_fn+0x180/0x228 [prestera_pci] [ 28.516660] process_one_work+0x1e8/0x360 [ 28.520710] worker_thread+0x44/0x480 [ 28.524412] kthread+0x154/0x160 [ 28.527670] ret_from_fork+0x10/0x38 [ 28.531290] Code: a8c17bfd d50323bf d65f03c0 9278dc21 (f9400020) [ 28.537429] ---[ end trace 5eced933df3a080b ]--- Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> --- drivers/net/ethernet/marvell/prestera/prestera_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)