diff mbox series

[net,v2] iavf: Detach device during reset task

Message ID 20220830081627.1205872-1-ivecera@redhat.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] iavf: Detach device during reset task | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 44 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ivan Vecera Aug. 30, 2022, 8:16 a.m. UTC
iavf_reset_task() takes crit_lock at the beginning and holds
it during whole call. The function subsequently calls
iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs
when userspace initiates during the reset task any ndo callback
that runs under RTNL like iavf_open() because some of that
functions tries to take crit_lock. This leads to classic A-B B-A
deadlock scenario.

To resolve this situation the device should be detached in
iavf_reset_task() prior taking crit_lock to avoid subsequent
ndos running under RTNL and reattach the device at the end.

Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around i40evf_set_interrupt_capability")
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
Cc: SlawomirX Laba <slawomirx.laba@intel.com>
Tested-by: Vitaly Grinberg <vgrinber@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Laba, SlawomirX Aug. 30, 2022, 8:49 p.m. UTC | #1
> -----Original Message-----
> From: Ivan Vecera <ivecera@redhat.com>
> Sent: Tuesday, August 30, 2022 10:16 AM
> Subject: [PATCH net v2] iavf: Detach device during reset task
> 
> iavf_reset_task() takes crit_lock at the beginning and holds it during whole call.
> The function subsequently calls
> iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when userspace
> initiates during the reset task any ndo callback that runs under RTNL like
> iavf_open() because some of that functions tries to take crit_lock. This leads to
> classic A-B B-A deadlock scenario.
> 
> To resolve this situation the device should be detached in
> iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running under
> RTNL and reattach the device at the end.
> 
> Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around
> i40evf_set_interrupt_capability")
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>
> Tested-by: Vitaly Grinberg <vgrinber@redhat.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
>
> @@ -2884,7 +2889,7 @@ static void iavf_reset_task(struct work_struct *work)
>  		if (adapter->state != __IAVF_REMOVE)
>  			queue_work(iavf_wq, &adapter->reset_task);
> 
> -		return;
> +		goto reset_finish;
>  	}
> 
>  	while (!mutex_trylock(&adapter->client_lock))

Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish label?

	if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) {
		dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
			reg_val);
		iavf_disable_vf(adapter);
		mutex_unlock(&adapter->client_lock);
		mutex_unlock(&adapter->crit_lock);
		return; /* Do not attempt to reinit. It's dead, Jim. */
	}

I am concerned that if the reset never finishes and iavf goes into disabled state, and then for example if driver reload operation is performed, bad things can happen.

[1] https://elixir.bootlin.com/linux/v6.0-rc3/source/drivers/net/ethernet/intel/iavf/iavf_main.c#L2939
Ivan Vecera Aug. 31, 2022, 7:05 a.m. UTC | #2
On Tue, 30 Aug 2022 20:49:54 +0000
"Laba, SlawomirX" <slawomirx.laba@intel.com> wrote:

> Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish label?
> 
> 	if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) {
> 		dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
> 			reg_val);
> 		iavf_disable_vf(adapter);
> 		mutex_unlock(&adapter->client_lock);
> 		mutex_unlock(&adapter->crit_lock);
> 		return; /* Do not attempt to reinit. It's dead, Jim. */
> 	}
> 
> I am concerned that if the reset never finishes and iavf goes into disabled state, and then for example if driver reload operation is performed, bad things can happen.

I think we should not re-attach device back as the VF is disabled. Detached device causes that userspace (user) won't be able to configure associated interface
that is correct. Driver reload won't cause anything special in this situation because during module removal the interface is unregistered.

Thanks,
Ivan
Jacob Keller Aug. 31, 2022, 8:09 p.m. UTC | #3
> -----Original Message-----
> From: Ivan Vecera <ivecera@redhat.com>
> Sent: Wednesday, August 31, 2022 12:06 AM
> To: Laba, SlawomirX <slawomirx.laba@intel.com>
> Cc: netdev@vger.kernel.org; Keller, Jacob E <jacob.e.keller@intel.com>;
> Piotrowski, Patryk <patryk.piotrowski@intel.com>; Vitaly Grinberg
> <vgrinber@redhat.com>; Brandeburg, Jesse <jesse.brandeburg@intel.com>;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; David S. Miller
> <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Jeff Kirsher
> <jeffrey.t.kirsher@intel.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH net v2] iavf: Detach device during reset task
> 
> On Tue, 30 Aug 2022 20:49:54 +0000
> "Laba, SlawomirX" <slawomirx.laba@intel.com> wrote:
> 
> > Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish
> label?
> >
> > 	if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) {
> > 		dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
> > 			reg_val);
> > 		iavf_disable_vf(adapter);
> > 		mutex_unlock(&adapter->client_lock);
> > 		mutex_unlock(&adapter->crit_lock);
> > 		return; /* Do not attempt to reinit. It's dead, Jim. */
> > 	}
> >
> > I am concerned that if the reset never finishes and iavf goes into disabled state,
> and then for example if driver reload operation is performed, bad things can
> happen.
> 
> I think we should not re-attach device back as the VF is disabled. Detached device
> causes that userspace (user) won't be able to configure associated interface
> that is correct. Driver reload won't cause anything special in this situation
> because during module removal the interface is unregistered.
> 
> Thanks,
> Ivan

I agree. It's safe to remove a detached device.

Thanks,
Jake
Jankowski, Konrad0 Sept. 2, 2022, 6:05 a.m. UTC | #4
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Ivan Vecera
> Sent: Tuesday, August 30, 2022 10:16 AM
> To: netdev@vger.kernel.org
> Cc: Paolo Abeni <pabeni@redhat.com>; Eric Dumazet
> <edumazet@google.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>;
> Piotrowski, Patryk <patryk.piotrowski@intel.com>; Jeff Kirsher
> <jeffrey.t.kirsher@intel.com>; Jakub Kicinski <kuba@kernel.org>; Vitaly
> Grinberg <vgrinber@redhat.com>; David S. Miller <davem@davemloft.net>
> Subject: [Intel-wired-lan] [PATCH net v2] iavf: Detach device during reset
> task
> 
> iavf_reset_task() takes crit_lock at the beginning and holds it during whole
> call. The function subsequently calls
> iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when
> userspace initiates during the reset task any ndo callback that runs under
> RTNL like iavf_open() because some of that functions tries to take crit_lock.
> This leads to classic A-B B-A deadlock scenario.
> 
> To resolve this situation the device should be detached in
> iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running
> under RTNL and reattach the device at the end.
> 
> Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around
> i40evf_set_interrupt_capability")
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>
> Tested-by: Vitaly Grinberg <vgrinber@redhat.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_main.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c
> b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index f39440ad5c50..10aa99dfdcdb 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c

Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index f39440ad5c50..10aa99dfdcdb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2877,6 +2877,11 @@  static void iavf_reset_task(struct work_struct *work)
 	int i = 0, err;
 	bool running;
 
+	/* Detach interface to avoid subsequent NDO callbacks */
+	rtnl_lock();
+	netif_device_detach(netdev);
+	rtnl_unlock();
+
 	/* When device is being removed it doesn't make sense to run the reset
 	 * task, just return in such a case.
 	 */
@@ -2884,7 +2889,7 @@  static void iavf_reset_task(struct work_struct *work)
 		if (adapter->state != __IAVF_REMOVE)
 			queue_work(iavf_wq, &adapter->reset_task);
 
-		return;
+		goto reset_finish;
 	}
 
 	while (!mutex_trylock(&adapter->client_lock))
@@ -2954,7 +2959,6 @@  static void iavf_reset_task(struct work_struct *work)
 
 	if (running) {
 		netif_carrier_off(netdev);
-		netif_tx_stop_all_queues(netdev);
 		adapter->link_up = false;
 		iavf_napi_disable_all(adapter);
 	}
@@ -3084,7 +3088,7 @@  static void iavf_reset_task(struct work_struct *work)
 	mutex_unlock(&adapter->client_lock);
 	mutex_unlock(&adapter->crit_lock);
 
-	return;
+	goto reset_finish;
 reset_err:
 	if (running) {
 		set_bit(__IAVF_VSI_DOWN, adapter->vsi.state);
@@ -3095,6 +3099,10 @@  static void iavf_reset_task(struct work_struct *work)
 	mutex_unlock(&adapter->client_lock);
 	mutex_unlock(&adapter->crit_lock);
 	dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
+reset_finish:
+	rtnl_lock();
+	netif_device_attach(netdev);
+	rtnl_unlock();
 }
 
 /**