Message ID | 20220830081627.1205872-1-ivecera@redhat.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] iavf: Detach device during reset task | expand |
> -----Original Message----- > From: Ivan Vecera <ivecera@redhat.com> > Sent: Tuesday, August 30, 2022 10:16 AM > Subject: [PATCH net v2] iavf: Detach device during reset task > > iavf_reset_task() takes crit_lock at the beginning and holds it during whole call. > The function subsequently calls > iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when userspace > initiates during the reset task any ndo callback that runs under RTNL like > iavf_open() because some of that functions tries to take crit_lock. This leads to > classic A-B B-A deadlock scenario. > > To resolve this situation the device should be detached in > iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running under > RTNL and reattach the device at the end. > > Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around > i40evf_set_interrupt_capability") > Cc: Jacob Keller <jacob.e.keller@intel.com> > Cc: Patryk Piotrowski <patryk.piotrowski@intel.com> > Cc: SlawomirX Laba <slawomirx.laba@intel.com> > Tested-by: Vitaly Grinberg <vgrinber@redhat.com> > Signed-off-by: Ivan Vecera <ivecera@redhat.com> > > @@ -2884,7 +2889,7 @@ static void iavf_reset_task(struct work_struct *work) > if (adapter->state != __IAVF_REMOVE) > queue_work(iavf_wq, &adapter->reset_task); > > - return; > + goto reset_finish; > } > > while (!mutex_trylock(&adapter->client_lock)) Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish label? if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) { dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n", reg_val); iavf_disable_vf(adapter); mutex_unlock(&adapter->client_lock); mutex_unlock(&adapter->crit_lock); return; /* Do not attempt to reinit. It's dead, Jim. */ } I am concerned that if the reset never finishes and iavf goes into disabled state, and then for example if driver reload operation is performed, bad things can happen. [1] https://elixir.bootlin.com/linux/v6.0-rc3/source/drivers/net/ethernet/intel/iavf/iavf_main.c#L2939
On Tue, 30 Aug 2022 20:49:54 +0000 "Laba, SlawomirX" <slawomirx.laba@intel.com> wrote: > Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish label? > > if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) { > dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n", > reg_val); > iavf_disable_vf(adapter); > mutex_unlock(&adapter->client_lock); > mutex_unlock(&adapter->crit_lock); > return; /* Do not attempt to reinit. It's dead, Jim. */ > } > > I am concerned that if the reset never finishes and iavf goes into disabled state, and then for example if driver reload operation is performed, bad things can happen. I think we should not re-attach device back as the VF is disabled. Detached device causes that userspace (user) won't be able to configure associated interface that is correct. Driver reload won't cause anything special in this situation because during module removal the interface is unregistered. Thanks, Ivan
> -----Original Message----- > From: Ivan Vecera <ivecera@redhat.com> > Sent: Wednesday, August 31, 2022 12:06 AM > To: Laba, SlawomirX <slawomirx.laba@intel.com> > Cc: netdev@vger.kernel.org; Keller, Jacob E <jacob.e.keller@intel.com>; > Piotrowski, Patryk <patryk.piotrowski@intel.com>; Vitaly Grinberg > <vgrinber@redhat.com>; Brandeburg, Jesse <jesse.brandeburg@intel.com>; > Nguyen, Anthony L <anthony.l.nguyen@intel.com>; David S. Miller > <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski > <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Jeff Kirsher > <jeffrey.t.kirsher@intel.com>; moderated list:INTEL ETHERNET DRIVERS <intel- > wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org> > Subject: Re: [PATCH net v2] iavf: Detach device during reset task > > On Tue, 30 Aug 2022 20:49:54 +0000 > "Laba, SlawomirX" <slawomirx.laba@intel.com> wrote: > > > Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish > label? > > > > if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) { > > dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n", > > reg_val); > > iavf_disable_vf(adapter); > > mutex_unlock(&adapter->client_lock); > > mutex_unlock(&adapter->crit_lock); > > return; /* Do not attempt to reinit. It's dead, Jim. */ > > } > > > > I am concerned that if the reset never finishes and iavf goes into disabled state, > and then for example if driver reload operation is performed, bad things can > happen. > > I think we should not re-attach device back as the VF is disabled. Detached device > causes that userspace (user) won't be able to configure associated interface > that is correct. Driver reload won't cause anything special in this situation > because during module removal the interface is unregistered. > > Thanks, > Ivan I agree. It's safe to remove a detached device. Thanks, Jake
> -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of > Ivan Vecera > Sent: Tuesday, August 30, 2022 10:16 AM > To: netdev@vger.kernel.org > Cc: Paolo Abeni <pabeni@redhat.com>; Eric Dumazet > <edumazet@google.com>; moderated list:INTEL ETHERNET DRIVERS <intel- > wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>; > Piotrowski, Patryk <patryk.piotrowski@intel.com>; Jeff Kirsher > <jeffrey.t.kirsher@intel.com>; Jakub Kicinski <kuba@kernel.org>; Vitaly > Grinberg <vgrinber@redhat.com>; David S. Miller <davem@davemloft.net> > Subject: [Intel-wired-lan] [PATCH net v2] iavf: Detach device during reset > task > > iavf_reset_task() takes crit_lock at the beginning and holds it during whole > call. The function subsequently calls > iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when > userspace initiates during the reset task any ndo callback that runs under > RTNL like iavf_open() because some of that functions tries to take crit_lock. > This leads to classic A-B B-A deadlock scenario. > > To resolve this situation the device should be detached in > iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running > under RTNL and reattach the device at the end. > > Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around > i40evf_set_interrupt_capability") > Cc: Jacob Keller <jacob.e.keller@intel.com> > Cc: Patryk Piotrowski <patryk.piotrowski@intel.com> > Cc: SlawomirX Laba <slawomirx.laba@intel.com> > Tested-by: Vitaly Grinberg <vgrinber@redhat.com> > Signed-off-by: Ivan Vecera <ivecera@redhat.com> > --- > drivers/net/ethernet/intel/iavf/iavf_main.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c > b/drivers/net/ethernet/intel/iavf/iavf_main.c > index f39440ad5c50..10aa99dfdcdb 100644 > --- a/drivers/net/ethernet/intel/iavf/iavf_main.c > +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index f39440ad5c50..10aa99dfdcdb 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -2877,6 +2877,11 @@ static void iavf_reset_task(struct work_struct *work) int i = 0, err; bool running; + /* Detach interface to avoid subsequent NDO callbacks */ + rtnl_lock(); + netif_device_detach(netdev); + rtnl_unlock(); + /* When device is being removed it doesn't make sense to run the reset * task, just return in such a case. */ @@ -2884,7 +2889,7 @@ static void iavf_reset_task(struct work_struct *work) if (adapter->state != __IAVF_REMOVE) queue_work(iavf_wq, &adapter->reset_task); - return; + goto reset_finish; } while (!mutex_trylock(&adapter->client_lock)) @@ -2954,7 +2959,6 @@ static void iavf_reset_task(struct work_struct *work) if (running) { netif_carrier_off(netdev); - netif_tx_stop_all_queues(netdev); adapter->link_up = false; iavf_napi_disable_all(adapter); } @@ -3084,7 +3088,7 @@ static void iavf_reset_task(struct work_struct *work) mutex_unlock(&adapter->client_lock); mutex_unlock(&adapter->crit_lock); - return; + goto reset_finish; reset_err: if (running) { set_bit(__IAVF_VSI_DOWN, adapter->vsi.state); @@ -3095,6 +3099,10 @@ static void iavf_reset_task(struct work_struct *work) mutex_unlock(&adapter->client_lock); mutex_unlock(&adapter->crit_lock); dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n"); +reset_finish: + rtnl_lock(); + netif_device_attach(netdev); + rtnl_unlock(); } /**