Message ID | 20230131213703.1347761-2-anthony.l.nguyen@intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [net,1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock | expand |
On Tue, Jan 31, 2023 at 01:36:58PM -0800, Tony Nguyen wrote: > From: Dave Ertman <david.m.ertman@intel.com> > > RDMA is not supported in ice on a PF that has been added to a bonded > interface. To enforce this, when an interface enters a bond, we unplug > the auxiliary device that supports RDMA functionality. This unplug > currently happens in the context of handling the netdev bonding event. > This event is sent to the ice driver under RTNL context. This is causing > a deadlock where the RDMA driver is waiting for the RTNL lock to complete > the removal. > > Defer the unplugging/re-plugging of the auxiliary device to the service > task so that it is not performed under the RTNL lock context. > > Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> > Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/ > Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave") > Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed") > Signed-off-by: Dave Ertman <david.m.ertman@intel.com> > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> > --- > drivers/net/ethernet/intel/ice/ice.h | 14 +++++--------- > drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++---------- > 2 files changed, 12 insertions(+), 19 deletions(-) <...> > index 5f86e4111fa9..055494dbcce0 100644 > --- a/drivers/net/ethernet/intel/ice/ice_main.c > +++ b/drivers/net/ethernet/intel/ice/ice_main.c > @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work) > } > } > > - if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) { > - /* Plug aux device per request */ > + /* Plug aux device per request */ > + if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) Very interesting pattern. You are not holding any locks while running ice_service_task() and clear bits before you actually performed requested operation. How do you protect from races while testing bits in other places of ice driver? Thanks > ice_plug_aux_dev(pf); > > - /* Mark plugging as done but check whether unplug was > - * requested during ice_plug_aux_dev() call > - * (e.g. from ice_clear_rdma_cap()) and if so then > - * plug aux device. > - */ > - if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) > - ice_unplug_aux_dev(pf); > - } > + /* unplug aux dev per request, if an unplug request came in > + * while processing a plug request, this will handle it > + */ > + if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags)) > + ice_unplug_aux_dev(pf); > > if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) { > struct iidc_event *event; > -- > 2.38.1 >
On 2/1/2023 1:49 AM, Leon Romanovsky wrote: > On Tue, Jan 31, 2023 at 01:36:58PM -0800, Tony Nguyen wrote: >> From: Dave Ertman <david.m.ertman@intel.com> >> >> RDMA is not supported in ice on a PF that has been added to a bonded >> interface. To enforce this, when an interface enters a bond, we unplug >> the auxiliary device that supports RDMA functionality. This unplug >> currently happens in the context of handling the netdev bonding event. >> This event is sent to the ice driver under RTNL context. This is causing >> a deadlock where the RDMA driver is waiting for the RTNL lock to complete >> the removal. >> >> Defer the unplugging/re-plugging of the auxiliary device to the service >> task so that it is not performed under the RTNL lock context. >> >> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> >> Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/ >> Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave") >> Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed") >> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> >> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> >> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) >> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> >> --- >> drivers/net/ethernet/intel/ice/ice.h | 14 +++++--------- >> drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++---------- >> 2 files changed, 12 insertions(+), 19 deletions(-) > > <...> > >> index 5f86e4111fa9..055494dbcce0 100644 >> --- a/drivers/net/ethernet/intel/ice/ice_main.c >> +++ b/drivers/net/ethernet/intel/ice/ice_main.c >> @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work) >> } >> } >> >> - if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) { >> - /* Plug aux device per request */ >> + /* Plug aux device per request */ >> + if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) > > Very interesting pattern. You are not holding any locks while running > ice_service_task() and clear bits before you actually performed requested > operation. > > How do you protect from races while testing bits in other places of ice > driver? I haven't heard from Dave so I'm going to drop this from the series so that the other patches can move on. Thanks, Tony
> -----Original Message----- > From: Leon Romanovsky <leonro@nvidia.com> > Sent: Wednesday, February 1, 2023 1:50 AM > Subject: Re: [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug > under RTNL lock > > On Tue, Jan 31, 2023 at 01:36:58PM -0800, Tony Nguyen wrote: > > From: Dave Ertman <david.m.ertman@intel.com> > > > > RDMA is not supported in ice on a PF that has been added to a bonded > > interface. To enforce this, when an interface enters a bond, we unplug > > the auxiliary device that supports RDMA functionality. This unplug > > currently happens in the context of handling the netdev bonding event. > > This event is sent to the ice driver under RTNL context. This is causing > > a deadlock where the RDMA driver is waiting for the RTNL lock to complete > > the removal. > > > > Defer the unplugging/re-plugging of the auxiliary device to the service > > task so that it is not performed under the RTNL lock context. > > > > Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> > > Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb- > 0487c95e395d@leemhuis.info/ > > Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave") > > Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed") > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com> > > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > > Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent > worker at Intel) > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> > > --- > > drivers/net/ethernet/intel/ice/ice.h | 14 +++++--------- > > drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++---------- > > 2 files changed, 12 insertions(+), 19 deletions(-) > > <...> > > > index 5f86e4111fa9..055494dbcce0 100644 > > --- a/drivers/net/ethernet/intel/ice/ice_main.c > > +++ b/drivers/net/ethernet/intel/ice/ice_main.c > > @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct > *work) > > } > > } > > > > - if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) { > > - /* Plug aux device per request */ > > + /* Plug aux device per request */ > > + if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) > > Very interesting pattern. You are not holding any locks while running > ice_service_task() and clear bits before you actually performed requested > operation. > > How do you protect from races while testing bits in other places of ice > driver? Leon, Thanks for the review and sorry for the late reply, got sidetracked into another project. Your review caused us to re-evaluate the plug/unplug flow, and since these bits are only set/cleared in the bonding event flow, and the UNPLUG bit set clears the PLUG bit, we attain the desired outcome in all cases if we swap the order that we evaluate the bits in the service task. Any multi-event situation that happens between or during service task will be handled in the expected way. DaveE > > Thanks > > > ice_plug_aux_dev(pf); > > > > - /* Mark plugging as done but check whether unplug was > > - * requested during ice_plug_aux_dev() call > > - * (e.g. from ice_clear_rdma_cap()) and if so then > > - * plug aux device. > > - */ > > - if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf- > >flags)) > > - ice_unplug_aux_dev(pf); > > - } > > + /* unplug aux dev per request, if an unplug request came in > > + * while processing a plug request, this will handle it > > + */ > > + if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags)) > > + ice_unplug_aux_dev(pf); > > > > if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) { > > struct iidc_event *event; > > -- > > 2.38.1 > >
On Tue, Feb 14, 2023 at 10:24:04PM +0000, Ertman, David M wrote: > > -----Original Message----- > > From: Leon Romanovsky <leonro@nvidia.com> > > Sent: Wednesday, February 1, 2023 1:50 AM > > Subject: Re: [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug > > under RTNL lock > > > > On Tue, Jan 31, 2023 at 01:36:58PM -0800, Tony Nguyen wrote: > > > From: Dave Ertman <david.m.ertman@intel.com> > > > > > > RDMA is not supported in ice on a PF that has been added to a bonded > > > interface. To enforce this, when an interface enters a bond, we unplug > > > the auxiliary device that supports RDMA functionality. This unplug > > > currently happens in the context of handling the netdev bonding event. > > > This event is sent to the ice driver under RTNL context. This is causing > > > a deadlock where the RDMA driver is waiting for the RTNL lock to complete > > > the removal. > > > > > > Defer the unplugging/re-plugging of the auxiliary device to the service > > > task so that it is not performed under the RTNL lock context. > > > > > > Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> > > > Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb- > > 0487c95e395d@leemhuis.info/ > > > Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave") > > > Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed") > > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com> > > > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > > > Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent > > worker at Intel) > > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> > > > --- > > > drivers/net/ethernet/intel/ice/ice.h | 14 +++++--------- > > > drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++---------- > > > 2 files changed, 12 insertions(+), 19 deletions(-) > > > > <...> > > > > > index 5f86e4111fa9..055494dbcce0 100644 > > > --- a/drivers/net/ethernet/intel/ice/ice_main.c > > > +++ b/drivers/net/ethernet/intel/ice/ice_main.c > > > @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct > > *work) > > > } > > > } > > > > > > - if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) { > > > - /* Plug aux device per request */ > > > + /* Plug aux device per request */ > > > + if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) > > > > Very interesting pattern. You are not holding any locks while running > > ice_service_task() and clear bits before you actually performed requested > > operation. > > > > How do you protect from races while testing bits in other places of ice > > driver? > > Leon, > > Thanks for the review and sorry for the late reply, got sidetracked into another project. > > Your review caused us to re-evaluate the plug/unplug flow, and since these bits are only set/cleared in > the bonding event flow, and the UNPLUG bit set clears the PLUG bit, we attain the desired outcome > in all cases if we swap the order that we evaluate the bits in the service task. I afraid that it won't make ice state machine more understandable. :) Thanks > > Any multi-event situation that happens between or during service task will be handled in the expected way. > > DaveE > > > > > Thanks > > > > > ice_plug_aux_dev(pf); > > > > > > - /* Mark plugging as done but check whether unplug was > > > - * requested during ice_plug_aux_dev() call > > > - * (e.g. from ice_clear_rdma_cap()) and if so then > > > - * plug aux device. > > > - */ > > > - if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf- > > >flags)) > > > - ice_unplug_aux_dev(pf); > > > - } > > > + /* unplug aux dev per request, if an unplug request came in > > > + * while processing a plug request, this will handle it > > > + */ > > > + if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags)) > > > + ice_unplug_aux_dev(pf); > > > > > > if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) { > > > struct iidc_event *event; > > > -- > > > 2.38.1 > > >
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 713069f809ec..3cad5e6b2ad1 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -506,6 +506,7 @@ enum ice_pf_flags { ICE_FLAG_VF_VLAN_PRUNING, ICE_FLAG_LINK_LENIENT_MODE_ENA, ICE_FLAG_PLUG_AUX_DEV, + ICE_FLAG_UNPLUG_AUX_DEV, ICE_FLAG_MTU_CHANGED, ICE_FLAG_GNSS, /* GNSS successfully initialized */ ICE_PF_FLAGS_NBITS /* must be last */ @@ -950,16 +951,11 @@ static inline void ice_set_rdma_cap(struct ice_pf *pf) */ static inline void ice_clear_rdma_cap(struct ice_pf *pf) { - /* We can directly unplug aux device here only if the flag bit - * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev() - * could race with ice_plug_aux_dev() called from - * ice_service_task(). In this case we only clear that bit now and - * aux device will be unplugged later once ice_plug_aux_device() - * called from ice_service_task() finishes (see ice_service_task()). + /* defer unplug to service task to avoid RTNL lock and + * clear PLUG bit so that pending plugs don't interfere */ - if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) - ice_unplug_aux_dev(pf); - + clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags); + set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags); clear_bit(ICE_FLAG_RDMA_ENA, pf->flags); } #endif /* _ICE_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 5f86e4111fa9..055494dbcce0 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work) } } - if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) { - /* Plug aux device per request */ + /* Plug aux device per request */ + if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) ice_plug_aux_dev(pf); - /* Mark plugging as done but check whether unplug was - * requested during ice_plug_aux_dev() call - * (e.g. from ice_clear_rdma_cap()) and if so then - * plug aux device. - */ - if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) - ice_unplug_aux_dev(pf); - } + /* unplug aux dev per request, if an unplug request came in + * while processing a plug request, this will handle it + */ + if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags)) + ice_unplug_aux_dev(pf); if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) { struct iidc_event *event;