diff mbox

[rdma-core,1/1] mlx5: Account for at least 1 medium bfreg in low_lat_uuars check

Message ID 1524506327-24417-1-git-send-email-rzambre@uci.edu (mailing list archive)
State Superseded
Headers show

Commit Message

Rohit Zambre April 23, 2018, 5:58 p.m. UTC
The calculation of need_uuar_lock rightly accounts for at least 1 medium
bfreg. However, the user can wrongly request for all but one uuars to be
low latency uuars.

For example, setting MLX5_TOTAL_UUARS=16 and MLX5_NUM_LOW_LAT_UUARS=15
doesn't throw any error but the user gets only 14 low latency uuars;
need_lock for bf_1 evaluates to 1. In this case, the first QP of the
context is mapped to bf_1 which is not a low latency uuar.

Signed-off-by: Rohit Zambre <rzambre@uci.edu>
---
 providers/mlx5/mlx5.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Doug Ledford April 27, 2018, 5:49 p.m. UTC | #1
On Mon, 2018-04-23 at 17:58 +0000, Rohit Zambre wrote:
> The calculation of need_uuar_lock rightly accounts for at least 1 medium
> bfreg. However, the user can wrongly request for all but one uuars to be
> low latency uuars.
> 
> For example, setting MLX5_TOTAL_UUARS=16 and MLX5_NUM_LOW_LAT_UUARS=15
> doesn't throw any error but the user gets only 14 low latency uuars;
> need_lock for bf_1 evaluates to 1. In this case, the first QP of the
> context is mapped to bf_1 which is not a low latency uuar.
> 
> Signed-off-by: Rohit Zambre <rzambre@uci.edu>
> ---
>  providers/mlx5/mlx5.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/providers/mlx5/mlx5.c b/providers/mlx5/mlx5.c
> index 5590241..2a09a95 100644
> --- a/providers/mlx5/mlx5.c
> +++ b/providers/mlx5/mlx5.c
> @@ -1026,7 +1026,8 @@ static struct verbs_context *mlx5_alloc_context(struct ibv_device *ibdev,
>  		goto err_free;
>  	}
>  
> -	if (low_lat_uuars > tot_uuars - 1) {
> +	/* account for at least 1 med bfreg and the 0th uuar */
> +	if (low_lat_uuars > tot_uuars - 2) {
>  		errno = ENOMEM;
>  		goto err_free;
>  	}

It would be nice if someone from Mellanox who knows the hardware in more
detail would confirm this and the matching fix for the kernel.
Leon Romanovsky April 29, 2018, 6:48 a.m. UTC | #2
On Fri, Apr 27, 2018 at 01:49:55PM -0400, Doug Ledford wrote:
> On Mon, 2018-04-23 at 17:58 +0000, Rohit Zambre wrote:
> > The calculation of need_uuar_lock rightly accounts for at least 1 medium
> > bfreg. However, the user can wrongly request for all but one uuars to be
> > low latency uuars.
> >
> > For example, setting MLX5_TOTAL_UUARS=16 and MLX5_NUM_LOW_LAT_UUARS=15
> > doesn't throw any error but the user gets only 14 low latency uuars;
> > need_lock for bf_1 evaluates to 1. In this case, the first QP of the
> > context is mapped to bf_1 which is not a low latency uuar.
> >
> > Signed-off-by: Rohit Zambre <rzambre@uci.edu>
> > ---
> >  providers/mlx5/mlx5.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/providers/mlx5/mlx5.c b/providers/mlx5/mlx5.c
> > index 5590241..2a09a95 100644
> > --- a/providers/mlx5/mlx5.c
> > +++ b/providers/mlx5/mlx5.c
> > @@ -1026,7 +1026,8 @@ static struct verbs_context *mlx5_alloc_context(struct ibv_device *ibdev,
> >  		goto err_free;
> >  	}
> >
> > -	if (low_lat_uuars > tot_uuars - 1) {
> > +	/* account for at least 1 med bfreg and the 0th uuar */
> > +	if (low_lat_uuars > tot_uuars - 2) {
> >  		errno = ENOMEM;
> >  		goto err_free;
> >  	}
>
> It would be nice if someone from Mellanox who knows the hardware in more
> detail would confirm this and the matching fix for the kernel.

Thanks Doug for waiting to us, we started to look on it before weekend,
but didn't finish. Yishai will try his best to answer today.

Thanks

>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
Yishai Hadas April 29, 2018, 4:12 p.m. UTC | #3
On 4/23/2018 8:58 PM, Rohit Zambre wrote:
> The calculation of need_uuar_lock rightly accounts for at least 1 medium
> bfreg. 

The code should support even 0 medium bfregs comparing the suggested 
patch below.

However, the user can wrongly request for all but one uuars to be
> low latency uuars.
> 
> For example, setting MLX5_TOTAL_UUARS=16 and MLX5_NUM_LOW_LAT_UUARS=15
> doesn't throw any error but the user gets only 14 low latency uuars;
> need_lock for bf_1 evaluates to 1. In this case, the first QP of the
> context is mapped to bf_1 which is not a low latency uuar.
> 
> Signed-off-by: Rohit Zambre <rzambre@uci.edu>
> ---
>   providers/mlx5/mlx5.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/providers/mlx5/mlx5.c b/providers/mlx5/mlx5.c
> index 5590241..2a09a95 100644
> --- a/providers/mlx5/mlx5.c
> +++ b/providers/mlx5/mlx5.c
> @@ -1026,7 +1026,8 @@ static struct verbs_context *mlx5_alloc_context(struct ibv_device *ibdev,
>   		goto err_free;
>   	}
>   
> -	if (low_lat_uuars > tot_uuars - 1) {
> +	/* account for at least 1 med bfreg and the 0th uuar */
> +	if (low_lat_uuars > tot_uuars - 2) {

The expected fix should not be here but in need_uuar_lock. Please see 
below a candidate suggestion from Eli (cc).

>   		errno = ENOMEM;
>   		goto err_free;
>   	}
> 


  static int need_uuar_lock(struct mlx5_context *ctx, int uuarn)
  {
+       int i;
+
         if (uuarn == 0 || mlx5_single_threaded)
                 return 0;

-       if (uuarn >= (ctx->tot_uuars - ctx->low_lat_uuars) * 2)
+       i = (uuarn / 2) + (uuarn % 2);
+       if (i >= ctx->tot_uuars - ctx->low_lat_uuars)
                 return 0;

Can you please test and confirm whether it works for you ?

Yishai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rohit Zambre April 29, 2018, 5:34 p.m. UTC | #4
On Sun, Apr 29, 2018 at 11:12 AM, Yishai Hadas
<yishaih@dev.mellanox.co.il> wrote:
>
> On 4/23/2018 8:58 PM, Rohit Zambre wrote:
>>
>> The calculation of need_uuar_lock rightly accounts for at least 1 medium
>> bfreg.
>
>
> The code should support even 0 medium bfregs comparing the suggested patch below.

So when the user sets MLX5_TOTAL_UUARS=16 and
MLX5_NUM_LOW_LAT_UUARS=15, the first 15 QPs will map to the 15 low
latency bfregs. To which bfreg will the 16th QP map to?

-Rohit
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/providers/mlx5/mlx5.c b/providers/mlx5/mlx5.c
index 5590241..2a09a95 100644
--- a/providers/mlx5/mlx5.c
+++ b/providers/mlx5/mlx5.c
@@ -1026,7 +1026,8 @@  static struct verbs_context *mlx5_alloc_context(struct ibv_device *ibdev,
 		goto err_free;
 	}
 
-	if (low_lat_uuars > tot_uuars - 1) {
+	/* account for at least 1 med bfreg and the 0th uuar */
+	if (low_lat_uuars > tot_uuars - 2) {
 		errno = ENOMEM;
 		goto err_free;
 	}