[net,1/8] net/mlx5: Always drain health in shutdown callback

Message ID 20240730061638.1831002-2-tariqt@nvidia.com (mailing list archive)
State Accepted
Commit 1b75da22ed1e6171e261bc9265370162553d5393
Delegated to: Netdev Maintainers
Series mlx5 misc fixes 2024-07-30

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 42 this patch: 42
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           fail     1 blamed authors not CCed: parav@nvidia.com; 3 maintainers not CCed: parav@nvidia.com jiri@resnulli.us linux-rdma@vger.kernel.org
netdev/build_clang              success  Errors and warnings before: 43 this patch: 43
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 43 this patch: 43
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 21 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-07-31--12-00 (tests: 707)

Commit Message

Tariq Toukan July 30, 2024, 6:16 a.m. UTC
From: Shay Drory <shayd@nvidia.com>

There is no point in recovery during device shutdown. If the health
work has already started, shutdown needs to wait for it to finish in
order to avoid races and NULL pointer accesses.

Hence, drain the health WQ in the shutdown callback.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Fixes: d2aa060d40fa ("net/mlx5: Cancel health poll before sending panic teardown command")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c          | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

Comments

Wojciech Drewek July 30, 2024, 9:33 a.m. UTC | #1
On 30.07.2024 08:16, Tariq Toukan wrote:
> From: Shay Drory <shayd@nvidia.com>
> 
> There is no point in recovery during device shutdown. If the health
> work has already started, shutdown needs to wait for it to finish in
> order to avoid races and NULL pointer accesses.
> 
> Hence, drain the health WQ in the shutdown callback.
> 
> Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
> Fixes: d2aa060d40fa ("net/mlx5: Cancel health poll before sending panic teardown command")
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>

>  drivers/net/ethernet/mellanox/mlx5/core/main.c          | 2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 527da58c7953..5b7e6f4b5c7e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -2142,7 +2142,6 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
>  	/* Panic tear down fw command will stop the PCI bus communication
>  	 * with the HCA, so the health poll is no longer needed.
>  	 */
> -	mlx5_drain_health_wq(dev);
>  	mlx5_stop_health_poll(dev, false);
>  
>  	ret = mlx5_cmd_fast_teardown_hca(dev);
> @@ -2177,6 +2176,7 @@ static void shutdown(struct pci_dev *pdev)
>  
>  	mlx5_core_info(dev, "Shutdown was called\n");
>  	set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
> +	mlx5_drain_health_wq(dev);
>  	err = mlx5_try_fast_unload(dev);
>  	if (err)
>  		mlx5_unload_one(dev, false);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
> index b2986175d9af..b706f1486504 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
> @@ -112,6 +112,7 @@ static void mlx5_sf_dev_shutdown(struct auxiliary_device *adev)
>  	struct mlx5_core_dev *mdev = sf_dev->mdev;
>  
>  	set_bit(MLX5_BREAK_FW_WAIT, &mdev->intf_state);
> +	mlx5_drain_health_wq(mdev);
>  	mlx5_unload_one(mdev, false);
>  }
>
Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 527da58c7953..5b7e6f4b5c7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -2142,7 +2142,6 @@  static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
 	/* Panic tear down fw command will stop the PCI bus communication
 	 * with the HCA, so the health poll is no longer needed.
 	 */
-	mlx5_drain_health_wq(dev);
 	mlx5_stop_health_poll(dev, false);
 
 	ret = mlx5_cmd_fast_teardown_hca(dev);
@@ -2177,6 +2176,7 @@  static void shutdown(struct pci_dev *pdev)
 
 	mlx5_core_info(dev, "Shutdown was called\n");
 	set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
+	mlx5_drain_health_wq(dev);
 	err = mlx5_try_fast_unload(dev);
 	if (err)
 		mlx5_unload_one(dev, false);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index b2986175d9af..b706f1486504 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -112,6 +112,7 @@  static void mlx5_sf_dev_shutdown(struct auxiliary_device *adev)
 	struct mlx5_core_dev *mdev = sf_dev->mdev;
 
 	set_bit(MLX5_BREAK_FW_WAIT, &mdev->intf_state);
+	mlx5_drain_health_wq(mdev);
 	mlx5_unload_one(mdev, false);
 }
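
The ordering argument behind the patch can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not driver code: every name below (`struct model_dev`, `model_health_work`, `model_drain_health`, `model_teardown`, `model_shutdown`) is hypothetical, "draining" is modelled simply as running any pending health work to completion, and an access to already-released state stands in for the NULL pointer access mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are mlx5 APIs. */
struct model_dev {
	bool health_work_pending; /* a health recovery work item is queued */
	bool resources_valid;     /* false once teardown has released state */
	int recoveries_run;       /* work items that ran against valid state */
	int invalid_accesses;     /* stands in for a NULL pointer access */
};

/* Models the health work: it touches device state when it runs. */
static void model_health_work(struct model_dev *d)
{
	if (!d->health_work_pending)
		return;
	d->health_work_pending = false;
	if (d->resources_valid)
		d->recoveries_run++;
	else
		d->invalid_accesses++;
}

/* Assumption: draining means pending work finishes before we return. */
static void model_drain_health(struct model_dev *d)
{
	model_health_work(d);
}

/* Models unload/teardown releasing the state the health work relies on. */
static void model_teardown(struct model_dev *d)
{
	d->resources_valid = false;
}

/*
 * Models a shutdown callback. With drain_first set, the pending work
 * completes before teardown; without it, the work runs afterwards.
 * Returns the number of accesses made to released state.
 */
static int model_shutdown(struct model_dev *d, bool drain_first)
{
	if (drain_first)
		model_drain_health(d);
	model_teardown(d);
	model_health_work(d); /* any work still pending fires late */
	return d->invalid_accesses;
}
```

Calling `model_shutdown()` with `drain_first` set on a device that has pending work reports no invalid accesses, while the same call without the drain reports one. In the model, that is the difference between draining at the top of the shutdown path and not draining at all.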