diff mbox series

[rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow

Message ID 25dfefddb0ebefa668c32e06a94d84e3216257cf.1740033937.git.leon@kernel.org (mailing list archive)
State New
Headers show
Series [rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow | expand

Commit Message

Leon Romanovsky Feb. 20, 2025, 6:47 a.m. UTC
From: Patrisious Haddad <phaddad@nvidia.com>

When there is a failure during bind QP, the cleanup flow destroys the
counter regardless if it is the one that created it or not, which is
problematic since if it isn't the one that created it, that counter could
still be in use.

Fix that by destroying the counter only if it was created during this call.

Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/counters.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Zhu Yanjun Feb. 20, 2025, 12:23 p.m. UTC | #1
在 2025/2/20 7:47, Leon Romanovsky 写道:
> From: Patrisious Haddad <phaddad@nvidia.com>
> 
> When there is a failure during bind QP, the cleanup flow destroys the
> counter regardless if it is the one that created it or not, which is
> problematic since if it isn't the one that created it, that counter could
> still be in use.
> 
> Fix that by destroying the counter only if it was created during this call.
> 
> Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration")
> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
> Reviewed-by: Mark Zhang <markzhang@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>   drivers/infiniband/hw/mlx5/counters.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
> index 4f6c1968a2ee..81cfa74147a1 100644
> --- a/drivers/infiniband/hw/mlx5/counters.c
> +++ b/drivers/infiniband/hw/mlx5/counters.c
> @@ -546,6 +546,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
>   				   struct ib_qp *qp)
>   {
>   	struct mlx5_ib_dev *dev = to_mdev(qp->device);
> +	bool new = false;
>   	int err;
>   
>   	if (!counter->id) {
> @@ -560,6 +561,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
>   			return err;
>   		counter->id =
>   			MLX5_GET(alloc_q_counter_out, out, counter_set_id);
> +		new = true;
It seems that there is no other better method except that a new bool 
variable is used. IMO, this method can fix this problem.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

>   	}
>   
>   	err = mlx5_ib_qp_set_counter(qp, counter);
> @@ -569,8 +571,10 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
>   	return 0;
>   
>   fail_set_counter:
> -	mlx5_ib_counter_dealloc(counter);
> -	counter->id = 0;
> +	if (new) {
> +		mlx5_ib_counter_dealloc(counter);
> +		counter->id = 0;
> +	}
>   
>   	return err;
>   }
diff mbox series

Patch

diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index 4f6c1968a2ee..81cfa74147a1 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -546,6 +546,7 @@  static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
 				   struct ib_qp *qp)
 {
 	struct mlx5_ib_dev *dev = to_mdev(qp->device);
+	bool new = false;
 	int err;
 
 	if (!counter->id) {
@@ -560,6 +561,7 @@  static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
 			return err;
 		counter->id =
 			MLX5_GET(alloc_q_counter_out, out, counter_set_id);
+		new = true;
 	}
 
 	err = mlx5_ib_qp_set_counter(qp, counter);
@@ -569,8 +571,10 @@  static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
 	return 0;
 
 fail_set_counter:
-	mlx5_ib_counter_dealloc(counter);
-	counter->id = 0;
+	if (new) {
+		mlx5_ib_counter_dealloc(counter);
+		counter->id = 0;
+	}
 
 	return err;
 }