[net,v2] iavf: fix hang on reboot with ice

Message ID 20230313160645.3332457-1-sassmann@kpanic.de (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 18 this patch: 18
netdev/cc_maintainers           fail     3 blamed authors not CCed: mateusz.palczewski@intel.com jacob.e.keller@intel.com phani.r.burra@intel.com; 8 maintainers not CCed: phani.r.burra@intel.com pabeni@redhat.com jesse.brandeburg@intel.com kuba@kernel.org edumazet@google.com mateusz.palczewski@intel.com jacob.e.keller@intel.com davem@davemloft.net
netdev/build_clang              success  Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 18 this patch: 18
netdev/checkpatch               warning  WARNING: Reported-by: should be immediately followed by Link: with a URL to the report
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Stefan Assmann March 13, 2023, 4:06 p.m. UTC
When a system with an E810 that has existing VFs is rebooted, the
following hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot, all drivers' PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However, iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case, it sleeps in a loop forever.
So if iavf_shutdown() is invoked before iavf_remove(), the system
hangs indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <mcornea@redhat.com>
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
v2: return instead of breaking the while (1) loop
    This avoids going through remove code twice and is how things worked
    before a8417330f8a5.

 drivers/net/ethernet/intel/iavf/iavf_main.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Michal Kubiak March 14, 2023, 2:24 p.m. UTC | #1
On Mon, Mar 13, 2023 at 05:06:45PM +0100, Stefan Assmann wrote:
> When a system with E810 with existing VFs gets rebooted the following
> hang may be observed.
> 
>  Pid 1 is hung in iavf_remove(), part of a network driver:
>  PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
>   #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
>   #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
>   #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
>   #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
>   #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
>   #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
>   #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
>   #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
>   #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
>   #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
>  #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
>  #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
>  #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
>  #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
>  #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
>  #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
>  #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
>  #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
>  #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
>  #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
>  #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
>  #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
>  #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
>      RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
>      RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
>      RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
>      RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
>      R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
>      R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
>      ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b
> 
> During reboot all drivers PM shutdown callbacks are invoked.
> In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
> In ice_shutdown() the call chain above is executed, which at some point
> calls iavf_remove(). However iavf_remove() expects the VF to be in one
> of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
> that's not the case it sleeps forever.
> So if iavf_shutdown() gets invoked before iavf_remove() the system will
> hang indefinitely because the adapter is already in state __IAVF_REMOVE.
> 
> Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
> as we already went through iavf_shutdown().
> 
> Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
> Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
> Reported-by: Marius Cornea <mcornea@redhat.com>
> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
> ---
> v2: return instead of breaking the while (1) loop
>     This avoids going through remove code twice and is how things worked
>     before a8417330f8a5.

Good catch. Indeed there was such a logic before that patch.

Thanks,
Michal

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>

> 
>  drivers/net/ethernet/intel/iavf/iavf_main.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 3273aeb8fa67..ce7071e9af15 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -5066,6 +5066,11 @@ static void iavf_remove(struct pci_dev *pdev)
>  			mutex_unlock(&adapter->crit_lock);
>  			break;
>  		}
> +		/* Simply return if we already went through iavf_shutdown */
> +		if (adapter->state == __IAVF_REMOVE) {
> +			mutex_unlock(&adapter->crit_lock);
> +			return;
> +		}
>  
>  		mutex_unlock(&adapter->crit_lock);
>  		usleep_range(500, 1000);
> -- 
> 2.39.1
>
Patch

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 3273aeb8fa67..ce7071e9af15 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -5066,6 +5066,11 @@  static void iavf_remove(struct pci_dev *pdev)
 			mutex_unlock(&adapter->crit_lock);
 			break;
 		}
+		/* Simply return if we already went through iavf_shutdown */
+		if (adapter->state == __IAVF_REMOVE) {
+			mutex_unlock(&adapter->crit_lock);
+			return;
+		}
 
 		mutex_unlock(&adapter->crit_lock);
 		usleep_range(500, 1000);
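
For readers who want to see the effect of the hunk in isolation, the wait loop can be approximated by a small user-space model. This is only a sketch under stated assumptions, not the driver code: the enum values, the model_remove_wait() helper and the max_polls bound are hypothetical names introduced for illustration, locking and the real sleep are omitted, and the state is held constant because, per the commit message, nothing moves the adapter out of __IAVF_REMOVE once iavf_shutdown() has run.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the adapter states named in the commit message. */
enum model_state {
	MODEL_RUNNING,     /* stands in for __IAVF_RUNNING */
	MODEL_DOWN,        /* stands in for __IAVF_DOWN */
	MODEL_INIT_FAILED, /* stands in for __IAVF_INIT_FAILED */
	MODEL_REMOVE,      /* stands in for __IAVF_REMOVE */
};

enum model_result {
	MODEL_PROCEED,  /* loop left via break: teardown continues */
	MODEL_RETURNED, /* early return added by the patch */
	MODEL_STUCK,    /* poll budget exhausted: the real loop would never exit */
};

/*
 * Model of the wait loop at the top of iavf_remove().
 * The real loop is while (1); max_polls is an assumption added here so
 * the model terminates and can report the "would hang" case.
 */
static enum model_result model_remove_wait(enum model_state state,
					   bool patched,
					   unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		/* States the loop accepts: break out and continue removal. */
		if (state == MODEL_RUNNING || state == MODEL_DOWN ||
		    state == MODEL_INIT_FAILED)
			return MODEL_PROCEED;

		/* The check added by the patch: shutdown already ran. */
		if (patched && state == MODEL_REMOVE)
			return MODEL_RETURNED;

		/* Driver: unlock crit_lock, usleep_range(500, 1000), retry. */
	}
	return MODEL_STUCK;
}
```

With this model, model_remove_wait(MODEL_REMOVE, false, 1000) reports MODEL_STUCK, which corresponds to the hang in the trace, while the same call with patched set to true reports MODEL_RETURNED. The three accepted states yield MODEL_PROCEED in both variants, so the normal removal path is unaffected.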