Message ID | 20240730061638.1831002-2-tariqt@nvidia.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 1b75da22ed1e6171e261bc9265370162553d5393 |
Delegated to: | Netdev Maintainers |
Series | mlx5 misc fixes 2024-07-30 |
On 30.07.2024 08:16, Tariq Toukan wrote:
> From: Shay Drory <shayd@nvidia.com>
>
> There is no point in recovery during device shutdown. if health
> work started need to wait for it to avoid races and NULL pointer
> access.
>
> Hence, drain health WQ on shutdown callback.
>
> Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
> Fixes: d2aa060d40fa ("net/mlx5: Cancel health poll before sending panic teardown command")
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>

>  drivers/net/ethernet/mellanox/mlx5/core/main.c          | 2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 527da58c7953..5b7e6f4b5c7e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -2142,7 +2142,6 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
>  	/* Panic tear down fw command will stop the PCI bus communication
>  	 * with the HCA, so the health poll is no longer needed.
>  	 */
> -	mlx5_drain_health_wq(dev);
>  	mlx5_stop_health_poll(dev, false);
>
>  	ret = mlx5_cmd_fast_teardown_hca(dev);
> @@ -2177,6 +2176,7 @@ static void shutdown(struct pci_dev *pdev)
>
>  	mlx5_core_info(dev, "Shutdown was called\n");
>  	set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
> +	mlx5_drain_health_wq(dev);
>  	err = mlx5_try_fast_unload(dev);
>  	if (err)
>  		mlx5_unload_one(dev, false);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
> index b2986175d9af..b706f1486504 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
> @@ -112,6 +112,7 @@ static void mlx5_sf_dev_shutdown(struct auxiliary_device *adev)
>  	struct mlx5_core_dev *mdev = sf_dev->mdev;
>
>  	set_bit(MLX5_BREAK_FW_WAIT, &mdev->intf_state);
> +	mlx5_drain_health_wq(mdev);
>  	mlx5_unload_one(mdev, false);
>  }
>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 527da58c7953..5b7e6f4b5c7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -2142,7 +2142,6 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
 	/* Panic tear down fw command will stop the PCI bus communication
 	 * with the HCA, so the health poll is no longer needed.
 	 */
-	mlx5_drain_health_wq(dev);
 	mlx5_stop_health_poll(dev, false);
 
 	ret = mlx5_cmd_fast_teardown_hca(dev);
@@ -2177,6 +2176,7 @@ static void shutdown(struct pci_dev *pdev)
 
 	mlx5_core_info(dev, "Shutdown was called\n");
 	set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
+	mlx5_drain_health_wq(dev);
 	err = mlx5_try_fast_unload(dev);
 	if (err)
 		mlx5_unload_one(dev, false);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index b2986175d9af..b706f1486504 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -112,6 +112,7 @@ static void mlx5_sf_dev_shutdown(struct auxiliary_device *adev)
 	struct mlx5_core_dev *mdev = sf_dev->mdev;
 
 	set_bit(MLX5_BREAK_FW_WAIT, &mdev->intf_state);
+	mlx5_drain_health_wq(mdev);
 	mlx5_unload_one(mdev, false);
 }
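
For readers unfamiliar with the pattern, the ordering this patch enforces (drain the health work first, then unload) can be sketched in userspace C with pthreads. This is only an analogy under stated assumptions, not the mlx5 implementation: `struct demo_dev` and every `demo_*` function are hypothetical names invented for illustration, and the `draining` flag is an assumption of the sketch rather than a description of what `mlx5_drain_health_wq()` does internally.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a device with one health worker. */
struct demo_dev {
	pthread_mutex_t lock;
	pthread_t worker;
	bool worker_started;	/* a worker exists and has not been joined */
	bool draining;		/* once set, no new health work is accepted */
	int *resources;		/* freed by teardown, used by the worker */
	int recoveries;		/* how many times the worker ran */
};

/* Health work: touches resources that teardown will free. */
static void *demo_health_work(void *arg)
{
	struct demo_dev *dev = arg;

	dev->resources[0]++;
	dev->recoveries++;
	return NULL;
}

static int demo_dev_init(struct demo_dev *dev)
{
	dev->worker_started = false;
	dev->draining = false;
	dev->recoveries = 0;
	dev->resources = calloc(1, sizeof(*dev->resources));
	if (!dev->resources)
		return -1;
	if (pthread_mutex_init(&dev->lock, NULL) != 0) {
		free(dev->resources);
		dev->resources = NULL;
		return -1;
	}
	return 0;
}

/* Queue health work unless draining or a worker is already pending. */
static int demo_queue_health_work(struct demo_dev *dev)
{
	int ret = -1;

	pthread_mutex_lock(&dev->lock);
	if (!dev->draining && !dev->worker_started &&
	    pthread_create(&dev->worker, NULL, demo_health_work, dev) == 0) {
		dev->worker_started = true;
		ret = 0;
	}
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Drain: refuse new work, then wait for any work already in flight. */
static void demo_drain_health(struct demo_dev *dev)
{
	bool join;

	pthread_mutex_lock(&dev->lock);
	dev->draining = true;
	join = dev->worker_started;
	dev->worker_started = false;
	pthread_mutex_unlock(&dev->lock);

	if (join)
		pthread_join(dev->worker, NULL);
}

/* Teardown frees what the worker dereferences. */
static void demo_teardown(struct demo_dev *dev)
{
	free(dev->resources);
	dev->resources = NULL;
}

/* Shutdown path: drain first, so teardown cannot race with the worker. */
static void demo_shutdown(struct demo_dev *dev)
{
	demo_drain_health(dev);
	demo_teardown(dev);
}

/* Returns 0 if the work ran exactly once and late work was refused. */
int demo_run(void)
{
	struct demo_dev dev;
	bool ok;

	if (demo_dev_init(&dev) != 0)
		return -1;
	if (demo_queue_health_work(&dev) != 0) {
		demo_teardown(&dev);
		pthread_mutex_destroy(&dev.lock);
		return -1;
	}
	demo_shutdown(&dev);
	ok = dev.recoveries == 1 && dev.resources == NULL &&
	     demo_queue_health_work(&dev) == -1;
	pthread_mutex_destroy(&dev.lock);
	return ok ? 0 : -1;
}
```

Calling `demo_run()` from a `main()` exercises the sequence. If `demo_teardown()` ran before `demo_drain_health()`, the worker could dereference `resources` after it was freed, which is the kind of race and NULL pointer access the commit message describes.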