diff mbox

[rdma-core] mlx5: Return pointer to CQ doorbell

Message ID 1502976998-20906-1-git-send-email-yishaih@mellanox.com (mailing list archive)
State Accepted
Headers show

Commit Message

Yishai Hadas Aug. 17, 2017, 1:36 p.m. UTC
From: Leon Romanovsky <leonro@mellanox.com>

The returned UAR pointer is actually void ** and points to whole
UAR database.

Because user is not supposed to access it and expected to use
the CQ doorbell, we will return uar[0] (CQ doorbell) directly
and eliminate the following logic in the applications:

uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) +
                     MLX5_CQ_DOORBELL;

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
---

Pull request was sent:
https://github.com/linux-rdma/rdma-core/pull/184

 providers/mlx5/mlx5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jason Gunthorpe Aug. 17, 2017, 5:33 p.m. UTC | #1
On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> The returned UAR pointer is actually void ** and points to whole
> UAR database.
> 
> Because user is not supposed to access it and expected to use
> the CQ doorbell, we will return uar[0] (CQ doorbell) directly
> and eliminate the following logic in the applications:
> 
> uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) +
>                      MLX5_CQ_DOORBELL;

NAK, at least, as is..

This changes the ABI and mlx5dv is a public interface now, so you have
to do it properly.

Looking at it, my advice is to rev the symbol version of
mlx5dv_init_obj and consider providing a compat symbol.

Also, you need to change the name of the 'uar' field:

> -	cq_out->uar	  = mctx->uar;
> +	cq_out->uar	  = mctx->uar[0];

To something else, to make it clear, which of the two ABIs the
application code is coded to - for a change like this, you *MUST*
force old apps to have a compile failure.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Aug. 17, 2017, 5:39 p.m. UTC | #2
On Thu, Aug 17, 2017 at 11:33:20AM -0600, Jason Gunthorpe wrote:
> On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > The returned UAR pointer is actually void ** and points to whole
> > UAR database.
> >
> > Because user is not supposed to access it and expected to use
> > the CQ doorbell, we will return uar[0] (CQ doorbell) directly
> > and eliminate the following logic in the applications:
> >
> > uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) +
> >                      MLX5_CQ_DOORBELL;
>
> NAK, at least, as is..
>
> This changes the ABI and mlx5dv is a public interface now, so you have
> to do it properly.
>
> Looking at it, my advice is to rev the symbol version of
> mlx5dv_init_obj and consider providing a compat symbol.
>
> Also, you need to change the name of the 'uar' field:
>
> > -	cq_out->uar	  = mctx->uar;
> > +	cq_out->uar	  = mctx->uar[0];
>
> To something else, to make it clear, which of the two ABIs the
> application code is coded to - for a change like this, you *MUST*
> force old apps to have a compile failure.

I know exactly who coded on top of this interface. They are the ones who
asked for this change and they are fully understand the implications,
inability to work with this interface on "old" libraries.

Thanks

>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Aug. 17, 2017, 6:03 p.m. UTC | #3
On Thu, Aug 17, 2017 at 08:39:32PM +0300, Leon Romanovsky wrote:
> On Thu, Aug 17, 2017 at 11:33:20AM -0600, Jason Gunthorpe wrote:
> > On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote:
> > > From: Leon Romanovsky <leonro@mellanox.com>
> > >
> > > The returned UAR pointer is actually void ** and points to whole
> > > UAR database.
> > >
> > > Because user is not supposed to access it and expected to use
> > > the CQ doorbell, we will return uar[0] (CQ doorbell) directly
> > > and eliminate the following logic in the applications:
> > >
> > > uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) +
> > >                      MLX5_CQ_DOORBELL;
> >
> > NAK, at least, as is..
> >
> > This changes the ABI and mlx5dv is a public interface now, so you have
> > to do it properly.
> >
> > Looking at it, my advice is to rev the symbol version of
> > mlx5dv_init_obj and consider providing a compat symbol.
> >
> > Also, you need to change the name of the 'uar' field:
> >
> > > -	cq_out->uar	  = mctx->uar;
> > > +	cq_out->uar	  = mctx->uar[0];
> >
> > To something else, to make it clear, which of the two ABIs the
> > application code is coded to - for a change like this, you *MUST*
> > force old apps to have a compile failure.
> 
> I know exactly who coded on top of this interface. They are the ones who
> asked for this change and they are fully understand the implications,
> inability to work with this interface on "old" libraries.

I don't care who they are - you need to follow sane ABI rules if you
are pushing public shared libraries into distros. This is not negotiable.

You can maybe avoid providing compat, but not the rest of it.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yishai Hadas Aug. 21, 2017, 9:06 a.m. UTC | #4
On 8/17/2017 9:03 PM, Jason Gunthorpe wrote:
> On Thu, Aug 17, 2017 at 08:39:32PM +0300, Leon Romanovsky wrote:
>> On Thu, Aug 17, 2017 at 11:33:20AM -0600, Jason Gunthorpe wrote:
>>> On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote:
>>>> From: Leon Romanovsky <leonro@mellanox.com>
>>>>
>>>> The returned UAR pointer is actually void ** and points to whole
>>>> UAR database.
>>>>
>>>> Because user is not supposed to access it and expected to use
>>>> the CQ doorbell, we will return uar[0] (CQ doorbell) directly
>>>> and eliminate the following logic in the applications:
>>>>
>>>> uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) +
>>>>                      MLX5_CQ_DOORBELL;
>>>
>>> NAK, at least, as is..
>>>
>>> This changes the ABI and mlx5dv is a public interface now, so you have
>>> to do it properly.
>>>
>>> Looking at it, my advice is to rev the symbol version of
>>> mlx5dv_init_obj and consider providing a compat symbol.
>>>
>>> Also, you need to change the name of the 'uar' field:
>>>
>>>> -	cq_out->uar	  = mctx->uar;
>>>> +	cq_out->uar	  = mctx->uar[0];
>>>
>>> To something else, to make it clear, which of the two ABIs the
>>> application code is coded to - for a change like this, you *MUST*
>>> force old apps to have a compile failure.
>>
>> I know exactly who coded on top of this interface. They are the ones who
>> asked for this change and they are fully understand the implications,
>> inability to work with this interface on "old" libraries.
>
> I don't care who they are - you need to follow sane ABI rules if you
> are pushing public shared libraries into distros. This is not negotiable.
>

There is no ABI/API change but just a bug fix in mlx5 driver.
The API talked about void* but by mistake the mlx5 code returned void**, 
this patch fixes that.

We don't care about potential break for direct mlx5 applications in this 
arm_cq flow as code was buggy and couldn't be used properly.

Patch was merged.




--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Aug. 21, 2017, 3:22 p.m. UTC | #5
On Mon, Aug 21, 2017 at 12:06:57PM +0300, Yishai Hadas wrote:
> On 8/17/2017 9:03 PM, Jason Gunthorpe wrote:
> >On Thu, Aug 17, 2017 at 08:39:32PM +0300, Leon Romanovsky wrote:
> >>On Thu, Aug 17, 2017 at 11:33:20AM -0600, Jason Gunthorpe wrote:
> >>>On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote:
> >>>>From: Leon Romanovsky <leonro@mellanox.com>
> >>>>
> >>>>The returned UAR pointer is actually void ** and points to whole
> >>>>UAR database.
> >>>>
> >>>>Because user is not supposed to access it and expected to use
> >>>>the CQ doorbell, we will return uar[0] (CQ doorbell) directly
> >>>>and eliminate the following logic in the applications:
> >>>>
> >>>>uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) +
> >>>>                     MLX5_CQ_DOORBELL;
> >>>
> >>>NAK, at least, as is..
> >>>
> >>>This changes the ABI and mlx5dv is a public interface now, so you have
> >>>to do it properly.
> >>>
> >>>Looking at it, my advice is to rev the symbol version of
> >>>mlx5dv_init_obj and consider providing a compat symbol.
> >>>
> >>>Also, you need to change the name of the 'uar' field:
> >>>
> >>>>-	cq_out->uar	  = mctx->uar;
> >>>>+	cq_out->uar	  = mctx->uar[0];
> >>>
> >>>To something else, to make it clear, which of the two ABIs the
> >>>application code is coded to - for a change like this, you *MUST*
> >>>force old apps to have a compile failure.
> >>
> >>I know exactly who coded on top of this interface. They are the ones who
> >>asked for this change and they are fully understand the implications,
> >>inability to work with this interface on "old" libraries.
> >
> >I don't care who they are - you need to follow sane ABI rules if you
> >are pushing public shared libraries into distros. This is not negotiable.
> >
> 
> There is no ABI/API change but just a bug fix in mlx5 driver.
> The API talked about void* but by mistake the mlx5 code returned void**,
> this patch fixes that.

The commit message is garbage then.

Is there existing code out there that uses cq_out->uar and works
properly today? Yes or No?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yishai Hadas Aug. 22, 2017, 8:36 a.m. UTC | #6
On 8/21/2017 6:22 PM, Jason Gunthorpe wrote:
> Is there existing code out there that uses cq_out->uar and works
> properly today? Yes or No?

No, only this fix enables that to work properly.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Aug. 22, 2017, 4:30 p.m. UTC | #7
On Tue, Aug 22, 2017 at 11:36:42AM +0300, Yishai Hadas wrote:
> On 8/21/2017 6:22 PM, Jason Gunthorpe wrote:
> >Is there existing code out there that uses cq_out->uar and works
> >properly today? Yes or No?
> 
> No, only this fix enables that to work properly.

This is still bad reasoning.

rdma-core 14 does not support the 'uar' field, it may as well have
been marked reserved.

rdma-core 15 supports 'uar'.

This is an ABI change, and need to be handled properly to protect the
users against mixing incompatible libraries.

The fact that rdma-core 14 intended to support uar but screwed it up
doesn't really matter from the user perspective. All they care about
is that an app using using uar needs to be build against >= 15 or it
silently explodes at runtime.

This sorts of issues are exactly what we are supposed to be using
symbol versions to protect against, and largely why they exist at
all. If you do it properly then tools like RPM and the dynanmic linker
take care of everything for the end user.

You don't get to ignore sane shared library versioning rules just
because it is your company's library - part of the point of rdma-core
is to have a much stronger stance on ABI issues (our community has
done a terrible job over the years) and I do not like this weakening.

Particularly since a patch to do this with proper compatibility
exists, and there is no reason at all to take a shortcut.

Doug? This needs to be decided before you mark rdma-core 15 final.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yishai Hadas Aug. 23, 2017, 12:13 p.m. UTC | #8
On 8/22/2017 7:30 PM, Jason Gunthorpe wrote:
> On Tue, Aug 22, 2017 at 11:36:42AM +0300, Yishai Hadas wrote:
>> On 8/21/2017 6:22 PM, Jason Gunthorpe wrote:
>>> Is there existing code out there that uses cq_out->uar and works
>>> properly today? Yes or No?
>>
>> No, only this fix enables that to work properly.

> Particularly since a patch to do this with proper compatibility
> exists, and there is no reason at all to take a shortcut.
>

We are not looking for a shortcut but for a solution that will match 
majority users if not all.

Changing the symbol version for this API will cause that applications 
that will be linked against this version won't be able to run at all on 
previous RH packages (e.g RH 7.3/7.4) even if this flow wasn't used at 
all as we expect. As pointed, from our knowledge there are no customers 
for that specific *rare* flow in mlx5 DV and we prefer to stay with 
current bug fix and same API version.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Aug. 23, 2017, 2:06 p.m. UTC | #9
On Wed, Aug 23, 2017 at 03:13:38PM +0300, Yishai Hadas wrote:
> On 8/22/2017 7:30 PM, Jason Gunthorpe wrote:
> >On Tue, Aug 22, 2017 at 11:36:42AM +0300, Yishai Hadas wrote:
> >>On 8/21/2017 6:22 PM, Jason Gunthorpe wrote:
> >>>Is there existing code out there that uses cq_out->uar and works
> >>>properly today? Yes or No?
> >>
> >>No, only this fix enables that to work properly.
> 
> >Particularly since a patch to do this with proper compatibility
> >exists, and there is no reason at all to take a shortcut.
> 
> We are not looking for a shortcut but for a solution that will match
> majority users if not all.

It is a short cut.

> Changing the symbol version for this API will cause that applications that
> will be linked against this version won't be able to run at all on
> previous

Then provide the compat symbol too, as I said in the first place, easy
to do.

There is no excuse here.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/providers/mlx5/mlx5.c b/providers/mlx5/mlx5.c
index ede91cb..c7a6efb 100644
--- a/providers/mlx5/mlx5.c
+++ b/providers/mlx5/mlx5.c
@@ -678,7 +678,7 @@  static int mlx5dv_get_cq(struct ibv_cq *cq_in,
 	cq_out->cqe_size  = mcq->cqe_sz;
 	cq_out->buf       = mcq->active_buf->buf;
 	cq_out->dbrec     = mcq->dbrec;
-	cq_out->uar	  = mctx->uar;
+	cq_out->uar	  = mctx->uar[0];
 
 	mcq->flags	 |= MLX5_CQ_FLAGS_DV_OWNED;