Message ID | 1742331077-102038-3-git-send-email-tariqt@nvidia.com (mailing list archive) |
---|---|
State | New |
Series | mlx5 misc fixes 2025-03-18 |
On Tue, Mar 18, 2025 at 10:51:17PM +0200, Tariq Toukan wrote:
> From: Moshe Shemesh <moshe@nvidia.com>
>
> The health poll mechanism performs periodic checks to detect firmware
> errors. One of the checks verifies the function is still enabled on
> firmware side, but the function is enabled only after enable_hca command
> completed. Start health poll after enable_hca command to avoid a race
> between function enabled and first health polling.
>
> Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index ec956c4bcebd..7c3312d6aed9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1205,24 +1205,24 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>  	dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
>
> -	mlx5_start_health_poll(dev);
> -
>  	err = mlx5_core_enable_hca(dev, 0);
>  	if (err) {
>  		mlx5_core_err(dev, "enable hca failed\n");
> -		goto stop_health_poll;
> +		goto err_cmd_cleanup;
>  	}
>
> +	mlx5_start_health_poll(dev);
> +
>  	err = mlx5_core_set_issi(dev);
>  	if (err) {
>  		mlx5_core_err(dev, "failed to set issi\n");
> -		goto err_disable_hca;
> +		goto stop_health_poll;
>  	}
>
>  	err = mlx5_satisfy_startup_pages(dev, 1);
>  	if (err) {
>  		mlx5_core_err(dev, "failed to allocate boot pages\n");
> -		goto err_disable_hca;
> +		goto stop_health_poll;
>  	}
>
>  	err = mlx5_tout_query_dtor(dev);
> @@ -1235,10 +1235,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>
>  reclaim_boot_pages:
>  	mlx5_reclaim_startup_pages(dev);
> -err_disable_hca:
> -	mlx5_core_disable_hca(dev, 0);
>  stop_health_poll:
>  	mlx5_stop_health_poll(dev, boot);
> +	mlx5_core_disable_hca(dev, 0);
>  err_cmd_cleanup:
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
>  	mlx5_cmd_disable(dev);
> @@ -1249,8 +1248,8 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>  static void mlx5_function_disable(struct mlx5_core_dev *dev, bool boot)
>  {
>  	mlx5_reclaim_startup_pages(dev);
> -	mlx5_core_disable_hca(dev, 0);
>  	mlx5_stop_health_poll(dev, boot);
> +	mlx5_core_disable_hca(dev, 0);
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
>  	mlx5_cmd_disable(dev);
>  }

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

> --
> 2.31.1

On Wed, Mar 19, 2025 at 2:22 AM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Moshe Shemesh <moshe@nvidia.com>
>
> The health poll mechanism performs periodic checks to detect firmware
> errors. One of the checks verifies the function is still enabled on
> firmware side, but the function is enabled only after enable_hca command
> completed. Start health poll after enable_hca command to avoid a race
> between function enabled and first health polling.
>
> Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ec956c4bcebd..7c3312d6aed9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1205,24 +1205,24 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 	dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
 
-	mlx5_start_health_poll(dev);
-
 	err = mlx5_core_enable_hca(dev, 0);
 	if (err) {
 		mlx5_core_err(dev, "enable hca failed\n");
-		goto stop_health_poll;
+		goto err_cmd_cleanup;
 	}
 
+	mlx5_start_health_poll(dev);
+
 	err = mlx5_core_set_issi(dev);
 	if (err) {
 		mlx5_core_err(dev, "failed to set issi\n");
-		goto err_disable_hca;
+		goto stop_health_poll;
 	}
 
 	err = mlx5_satisfy_startup_pages(dev, 1);
 	if (err) {
 		mlx5_core_err(dev, "failed to allocate boot pages\n");
-		goto err_disable_hca;
+		goto stop_health_poll;
 	}
 
 	err = mlx5_tout_query_dtor(dev);
@@ -1235,10 +1235,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 
 reclaim_boot_pages:
 	mlx5_reclaim_startup_pages(dev);
-err_disable_hca:
-	mlx5_core_disable_hca(dev, 0);
 stop_health_poll:
 	mlx5_stop_health_poll(dev, boot);
+	mlx5_core_disable_hca(dev, 0);
 err_cmd_cleanup:
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
 	mlx5_cmd_disable(dev);
@@ -1249,8 +1248,8 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 static void mlx5_function_disable(struct mlx5_core_dev *dev, bool boot)
 {
 	mlx5_reclaim_startup_pages(dev);
-	mlx5_core_disable_hca(dev, 0);
 	mlx5_stop_health_poll(dev, boot);
+	mlx5_core_disable_hca(dev, 0);
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
 	mlx5_cmd_disable(dev);
 }
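
For readers who want to see the ordering argument in isolation, here is a minimal userspace sketch of the post-patch sequence in mlx5_function_enable(): enable the function first, arm the health poll second, and unwind in the reverse order. It is only an illustration under stated assumptions, not driver code. Every name in it (struct toy_dev, toy_function_enable(), toy_health_tick(), enum toy_fail) is hypothetical, the firmware commands are replaced by flag updates, and the trace string exists only to make the order observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the device; NOT the real struct mlx5_core_dev. */
struct toy_dev {
	bool hca_enabled;	/* set once the simulated enable_hca step succeeds */
	bool polling;		/* true while the simulated health poll is armed */
	char trace[64];		/* one letter per step, in execution order */
};

/* Append one step letter to the trace (the struct must start zeroed). */
static void toy_trace(struct toy_dev *d, char c)
{
	size_t n = strlen(d->trace);

	assert(n + 1 < sizeof(d->trace));
	d->trace[n] = c;
	d->trace[n + 1] = '\0';
}

/*
 * One simulated poll tick: returns true if the "function still enabled"
 * check would fire, i.e. the poll is armed while the function is not enabled.
 */
static bool toy_health_tick(const struct toy_dev *d)
{
	return d->polling && !d->hca_enabled;
}

/* Which simulated step should fail (assumption: only two steps are modeled). */
enum toy_fail { FAIL_NONE, FAIL_ENABLE_HCA, FAIL_SET_ISSI };

/* Post-patch ordering: enable, then poll; error paths unwind in reverse. */
static int toy_function_enable(struct toy_dev *d, enum toy_fail fail)
{
	int err;

	err = (fail == FAIL_ENABLE_HCA) ? -1 : 0;
	if (err)
		goto err_cmd_cleanup;	/* poll was never armed, nothing to stop */
	d->hca_enabled = true;
	toy_trace(d, 'E');

	d->polling = true;		/* armed only after the function is enabled */
	toy_trace(d, 'P');

	err = (fail == FAIL_SET_ISSI) ? -1 : 0;
	if (err)
		goto stop_health_poll;

	return 0;

stop_health_poll:
	d->polling = false;		/* stop polling first ... */
	toy_trace(d, 'p');
	d->hca_enabled = false;		/* ... then disable the function */
	toy_trace(d, 'e');
err_cmd_cleanup:
	toy_trace(d, 'c');
	return err;
}
```

In this model the poll is never armed while hca_enabled is false, so toy_health_tick() cannot fire during bring-up or unwind. With the pre-patch order (arm the poll, then enable), a tick landing between those two steps would see polling set and hca_enabled clear, which is the race the commit message describes.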