Message ID | 1502976998-20906-1-git-send-email-yishaih@mellanox.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote: > From: Leon Romanovsky <leonro@mellanox.com> > > The returned UAR pointer is actually void ** and points to whole > UAR database. > > Because user is not supposed to access it and expected to use > the CQ doorbell, we will return uar[0] (CQ doorbell) directly > and eliminate the following logic in the applications: > > uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) + > MLX5_CQ_DOORBELL; NAK, at least, as is.. This changes the ABI and mlx5dv is a public interface now, so you have to do it properly. Looking at it, my advice is to rev the symbol version of mlx5dv_init_obj and consider providing a compat symbol. Also, you need to change the name of the 'uar' field: > - cq_out->uar = mctx->uar; > + cq_out->uar = mctx->uar[0]; To something else, to make it clear, which of the two ABIs the application code is coded to - for a change like this, you *MUST* force old apps to have a compile failure. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 17, 2017 at 11:33:20AM -0600, Jason Gunthorpe wrote: > On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote: > > From: Leon Romanovsky <leonro@mellanox.com> > > > > The returned UAR pointer is actually void ** and points to whole > > UAR database. > > > > Because user is not supposed to access it and expected to use > > the CQ doorbell, we will return uar[0] (CQ doorbell) directly > > and eliminate the following logic in the applications: > > > > uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) + > > MLX5_CQ_DOORBELL; > > NAK, at least, as is.. > > This changes the ABI and mlx5dv is a public interface now, so you have > to do it properly. > > Looking at it, my advice is to rev the symbol version of > mlx5dv_init_obj and consider providing a compat symbol. > > Also, you need to change the name of the 'uar' field: > > > - cq_out->uar = mctx->uar; > > + cq_out->uar = mctx->uar[0]; > > To something else, to make it clear, which of the two ABIs the > application code is coded to - for a change like this, you *MUST* > force old apps to have a compile failure. I know exactly who coded on top of this interface. They are the ones who asked for this change and they are fully understand the implications, inability to work with this interface on "old" libraries. Thanks > > Jason > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 17, 2017 at 08:39:32PM +0300, Leon Romanovsky wrote: > On Thu, Aug 17, 2017 at 11:33:20AM -0600, Jason Gunthorpe wrote: > > On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote: > > > From: Leon Romanovsky <leonro@mellanox.com> > > > > > > The returned UAR pointer is actually void ** and points to whole > > > UAR database. > > > > > > Because user is not supposed to access it and expected to use > > > the CQ doorbell, we will return uar[0] (CQ doorbell) directly > > > and eliminate the following logic in the applications: > > > > > > uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) + > > > MLX5_CQ_DOORBELL; > > > > NAK, at least, as is.. > > > > This changes the ABI and mlx5dv is a public interface now, so you have > > to do it properly. > > > > Looking at it, my advice is to rev the symbol version of > > mlx5dv_init_obj and consider providing a compat symbol. > > > > Also, you need to change the name of the 'uar' field: > > > > > - cq_out->uar = mctx->uar; > > > + cq_out->uar = mctx->uar[0]; > > > > To something else, to make it clear, which of the two ABIs the > > application code is coded to - for a change like this, you *MUST* > > force old apps to have a compile failure. > > I know exactly who coded on top of this interface. They are the ones who > asked for this change and they are fully understand the implications, > inability to work with this interface on "old" libraries. I don't care who they are - you need to follow sane ABI rules if you are pushing public shared libraries into distros. This is not negotiable. You can maybe avoid providing compat, but not the rest of it. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/17/2017 9:03 PM, Jason Gunthorpe wrote: > On Thu, Aug 17, 2017 at 08:39:32PM +0300, Leon Romanovsky wrote: >> On Thu, Aug 17, 2017 at 11:33:20AM -0600, Jason Gunthorpe wrote: >>> On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote: >>>> From: Leon Romanovsky <leonro@mellanox.com> >>>> >>>> The returned UAR pointer is actually void ** and points to whole >>>> UAR database. >>>> >>>> Because user is not supposed to access it and expected to use >>>> the CQ doorbell, we will return uar[0] (CQ doorbell) directly >>>> and eliminate the following logic in the applications: >>>> >>>> uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) + >>>> MLX5_CQ_DOORBELL; >>> >>> NAK, at least, as is.. >>> >>> This changes the ABI and mlx5dv is a public interface now, so you have >>> to do it properly. >>> >>> Looking at it, my advice is to rev the symbol version of >>> mlx5dv_init_obj and consider providing a compat symbol. >>> >>> Also, you need to change the name of the 'uar' field: >>> >>>> - cq_out->uar = mctx->uar; >>>> + cq_out->uar = mctx->uar[0]; >>> >>> To something else, to make it clear, which of the two ABIs the >>> application code is coded to - for a change like this, you *MUST* >>> force old apps to have a compile failure. >> >> I know exactly who coded on top of this interface. They are the ones who >> asked for this change and they are fully understand the implications, >> inability to work with this interface on "old" libraries. > > I don't care who they are - you need to follow sane ABI rules if you > are pushing public shared libraries into distros. This is not negotiable. > There is no ABI/API change but just a bug fix in mlx5 driver. The API talked about void* but by mistake the mlx5 code returned void**, this patch fixes that. We don't care about potential break for direct mlx5 applications in this arm_cq flow as code was buggy and couldn't be used properly. Patch was merged. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 21, 2017 at 12:06:57PM +0300, Yishai Hadas wrote: > On 8/17/2017 9:03 PM, Jason Gunthorpe wrote: > >On Thu, Aug 17, 2017 at 08:39:32PM +0300, Leon Romanovsky wrote: > >>On Thu, Aug 17, 2017 at 11:33:20AM -0600, Jason Gunthorpe wrote: > >>>On Thu, Aug 17, 2017 at 04:36:38PM +0300, Yishai Hadas wrote: > >>>>From: Leon Romanovsky <leonro@mellanox.com> > >>>> > >>>>The returned UAR pointer is actually void ** and points to whole > >>>>UAR database. > >>>> > >>>>Because user is not supposed to access it and expected to use > >>>>the CQ doorbell, we will return uar[0] (CQ doorbell) directly > >>>>and eliminate the following logic in the applications: > >>>> > >>>>uint64_t cq_db_reg = (uint64_t *)(((uint64_t)(uint64_t *)rxq->cq_uar) + > >>>> MLX5_CQ_DOORBELL; > >>> > >>>NAK, at least, as is.. > >>> > >>>This changes the ABI and mlx5dv is a public interface now, so you have > >>>to do it properly. > >>> > >>>Looking at it, my advice is to rev the symbol version of > >>>mlx5dv_init_obj and consider providing a compat symbol. > >>> > >>>Also, you need to change the name of the 'uar' field: > >>> > >>>>- cq_out->uar = mctx->uar; > >>>>+ cq_out->uar = mctx->uar[0]; > >>> > >>>To something else, to make it clear, which of the two ABIs the > >>>application code is coded to - for a change like this, you *MUST* > >>>force old apps to have a compile failure. > >> > >>I know exactly who coded on top of this interface. They are the ones who > >>asked for this change and they are fully understand the implications, > >>inability to work with this interface on "old" libraries. > > > >I don't care who they are - you need to follow sane ABI rules if you > >are pushing public shared libraries into distros. This is not negotiable. > > > > There is no ABI/API change but just a bug fix in mlx5 driver. > The API talked about void* but by mistake the mlx5 code returned void**, > this patch fixes that. The commit message is garbage then. Is there existing code out there that uses cq_out->uar and works properly today? Yes or No? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/21/2017 6:22 PM, Jason Gunthorpe wrote: > Is there existing code out there that uses cq_out->uar and works > properly today? Yes or No? No, only this fix enables that to work properly. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 22, 2017 at 11:36:42AM +0300, Yishai Hadas wrote: > On 8/21/2017 6:22 PM, Jason Gunthorpe wrote: > >Is there existing code out there that uses cq_out->uar and works > >properly today? Yes or No? > > No, only this fix enables that to work properly. This is still bad reasoning. rdma-core 14 does not support the 'uar' field, it may as well have been marked reserved. rdma-core 15 supports 'uar'. This is an ABI change, and need to be handled properly to protect the users against mixing incompatible libraries. The fact that rdma-core 14 intended to support uar but screwed it up doesn't really matter from the user perspective. All they care about is that an app using using uar needs to be build against >= 15 or it silently explodes at runtime. This sorts of issues are exactly what we are supposed to be using symbol versions to protect against, and largely why they exist at all. If you do it properly then tools like RPM and the dynanmic linker take care of everything for the end user. You don't get to ignore sane shared library versioning rules just because it is your company's library - part of the point of rdma-core is to have a much stronger stance on ABI issues (our community has done a terrible job over the years) and I do not like this weakening. Particularly since a patch to do this with proper compatibility exists, and there is no reason at all to take a shortcut. Doug? This needs to be decided before you mark rdma-core 15 final. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/22/2017 7:30 PM, Jason Gunthorpe wrote: > On Tue, Aug 22, 2017 at 11:36:42AM +0300, Yishai Hadas wrote: >> On 8/21/2017 6:22 PM, Jason Gunthorpe wrote: >>> Is there existing code out there that uses cq_out->uar and works >>> properly today? Yes or No? >> >> No, only this fix enables that to work properly. > Particularly since a patch to do this with proper compatibility > exists, and there is no reason at all to take a shortcut. > We are not looking for a shortcut but for a solution that will match majority users if not all. Changing the symbol version for this API will cause that applications that will be linked against this version won't be able to run at all on previous RH packages (e.g RH 7.3/7.4) even if this flow wasn't used at all as we expect. As pointed, from our knowledge there are no customers for that specific *rare* flow in mlx5 DV and we prefer to stay with current bug fix and same API version. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 23, 2017 at 03:13:38PM +0300, Yishai Hadas wrote: > On 8/22/2017 7:30 PM, Jason Gunthorpe wrote: > >On Tue, Aug 22, 2017 at 11:36:42AM +0300, Yishai Hadas wrote: > >>On 8/21/2017 6:22 PM, Jason Gunthorpe wrote: > >>>Is there existing code out there that uses cq_out->uar and works > >>>properly today? Yes or No? > >> > >>No, only this fix enables that to work properly. > > >Particularly since a patch to do this with proper compatibility > >exists, and there is no reason at all to take a shortcut. > > We are not looking for a shortcut but for a solution that will match > majority users if not all. It is a short cut. > Changing the symbol version for this API will cause that applications that > will be linked against this version won't be able to run at all on > previous Then provide the compat symbol too, as I said in the first place, easy to do. There is no excuse here. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/providers/mlx5/mlx5.c b/providers/mlx5/mlx5.c index ede91cb..c7a6efb 100644 --- a/providers/mlx5/mlx5.c +++ b/providers/mlx5/mlx5.c @@ -678,7 +678,7 @@ static int mlx5dv_get_cq(struct ibv_cq *cq_in, cq_out->cqe_size = mcq->cqe_sz; cq_out->buf = mcq->active_buf->buf; cq_out->dbrec = mcq->dbrec; - cq_out->uar = mctx->uar; + cq_out->uar = mctx->uar[0]; mcq->flags |= MLX5_CQ_FLAGS_DV_OWNED;