Message ID | 1450446906-10336-11-git-send-email-hch@lst.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On 12/18/2015 02:55 PM, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig <hch@lst.de> Shouldn't the description of this patch be changed into something like "Remove the usecnt field from ib_mr since it is always zero" ? Anyway: Reviewed-by: Bart Van Assche <bvanassche@sandisk.com> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/18/2015 4:14 PM, Bart Van Assche wrote: > On 12/18/2015 02:55 PM, Christoph Hellwig wrote: >> Signed-off-by: Christoph Hellwig <hch@lst.de> > > Shouldn't the description of this patch be changed into something like > "Remove the usecnt field from ib_mr since it is always zero" ? Agree. I would like us to avoid empty changes log for IB core patches and actually all over the rdma subsystem. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/18/2015 3:55 PM, Christoph Hellwig wrote: > > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h > index 284916d..e45776e 100644 > --- a/include/rdma/ib_verbs.h > +++ b/include/rdma/ib_verbs.h > @@ -1306,7 +1306,6 @@ struct ib_mr { > u64 iova; > u32 length; > unsigned int page_size; > - atomic_t usecnt; /* count number of MWs */ > }; This comment is part of Roland's uverbs commit. I wonder if LL driver supporting the IB_WR_BIND_MW op ref the MR on port send and deref it on completion? Or. > > struct ib_mw { > -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/20/2015 9:25 AM, Or Gerlitz wrote: > On 12/18/2015 3:55 PM, Christoph Hellwig wrote: >> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h >> index 284916d..e45776e 100644 >> --- a/include/rdma/ib_verbs.h >> +++ b/include/rdma/ib_verbs.h >> @@ -1306,7 +1306,6 @@ struct ib_mr { >> u64 iova; >> u32 length; >> unsigned int page_size; >> - atomic_t usecnt; /* count number of MWs */ >> }; > > This comment is part of Roland's uverbs commit. I saw now that you removed in-kernel support for MWs as a downstream patch of this series. I guess this cleanup needs to go there as the refcount field of kernel MRs relates to MWs. This will also help someone coming in the future and returning the in-kernel MW support to avoid forgetting on doing the refs. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h >> index 284916d..e45776e 100644 >> --- a/include/rdma/ib_verbs.h >> +++ b/include/rdma/ib_verbs.h >> @@ -1306,7 +1306,6 @@ struct ib_mr { >> u64 iova; >> u32 length; >> unsigned int page_size; >> - atomic_t usecnt; /* count number of MWs */ >> }; > > This comment is part of Roland's uverbs commit. > > I wonder if LL driver supporting the IB_WR_BIND_MW op > ref the MR on port send and deref it on completion? I don't see why any driver would ref the underlying MR at post_send(IB_WR_BIND_MW) time. It doesn't really make sense. However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs: Implement memory windows support in uverbs") is wrong. A memory window allocation should really reference the MR and not the PD (which is referenced by the MR). Otherwise the MR deregistration is allowed with windows bound to it. If this is the case then this field is not useless and we should fix ib_uverbs_alloc_mw()? Haggai, can you shed some light here? It's Shani's and your code. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/20/2015 11:31 AM, Sagi Grimberg wrote: > However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs: > Implement memory windows support in uverbs") is wrong. A memory > window allocation should really reference the MR and not the PD (which > is referenced by the MR). Otherwise the MR deregistration is allowed > with windows bound to it. If this is the case then this field is not > useless and we should fix ib_uverbs_alloc_mw()? I tend to agree, lets not remove the field this is clarified with Haggai and Co > > Haggai, can you shed some light here? It's Shani's and your code. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/20/2015 11:31 AM, Sagi Grimberg wrote: > However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs: > Implement memory windows support in uverbs") is wrong. A memory > window allocation should really reference the MR and not the PD (which > is referenced by the MR). Otherwise the MR deregistration is allowed > with windows bound to it. If this is the case then this field is not > useless and we should fix ib_uverbs_alloc_mw()? I tend to agree, lets not remove the field till this is clarified with Haggai and Co > > Haggai, can you shed some light here? It's Shani's and your code. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 20/12/2015 11:31, Sagi Grimberg wrote: > >>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h >>> index 284916d..e45776e 100644 >>> --- a/include/rdma/ib_verbs.h >>> +++ b/include/rdma/ib_verbs.h >>> @@ -1306,7 +1306,6 @@ struct ib_mr { >>> u64 iova; >>> u32 length; >>> unsigned int page_size; >>> - atomic_t usecnt; /* count number of MWs */ >>> }; >> >> This comment is part of Roland's uverbs commit. >> >> I wonder if LL driver supporting the IB_WR_BIND_MW op >> ref the MR on port send and deref it on completion? > > I don't see why any driver would ref the underlying MR at > post_send(IB_WR_BIND_MW) time. It doesn't really make sense. > > However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs: > Implement memory windows support in uverbs") is wrong. A memory > window allocation should really reference the MR and not the PD (which > is referenced by the MR). Otherwise the MR deregistration is allowed > with windows bound to it. If this is the case then this field is not > useless and we should fix ib_uverbs_alloc_mw()? > > Haggai, can you shed some light here? It's Shani's and your code. A memory window takes reference on its PD during creation. It can only be used by that PD, and it isn't bound to any memory region yet. You are right that we want to prevent deregistration of an MR when there are memory windows bound to it. This is suggested (although not required) by the IBA specifications (section 10.6.7.2.6 Deregistering Regions with Bound Windows). The problem with the usecnt field in ib_mr as you wrote is that it needs to reflect what happens at bind / invalidate time, but bind can happen in user-space without the kernel knowing, and invalidation can similarly occur due to a send-with-invalidate message being received. This is why in the ConnectX-3 implementation of memory windows we didn't use the ib_mr.usecnt field, but relied on a hardware implementation of the reference count. As I said, the spec also allows a device to proceed with deregistering an MR, leaving an orphan memory window. I don't think we can prevent this in the kernel, and I'm not sure it would be a good idea anyway, so I think it is okay to remove the use_cnt field. Regards, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs: >> Implement memory windows support in uverbs") is wrong. A memory >> window allocation should really reference the MR and not the PD (which >> is referenced by the MR). Otherwise the MR deregistration is allowed >> with windows bound to it. If this is the case then this field is not >> useless and we should fix ib_uverbs_alloc_mw()? >> >> Haggai, can you shed some light here? It's Shani's and your code. > A memory window takes reference on its PD during creation. It can only > be used by that PD, and it isn't bound to any memory region yet. Right! it's still not bound. I mixed up the details. Thanks for reminding me. > You are right that we want to prevent deregistration of an MR when there > are memory windows bound to it. This is suggested (although not > required) by the IBA specifications (section 10.6.7.2.6 Deregistering > Regions with Bound Windows). The problem with the usecnt field in ib_mr > as you wrote is that it needs to reflect what happens at bind / > invalidate time, but bind can happen in user-space without the kernel > knowing, and invalidation can similarly occur due to a > send-with-invalidate message being received. That's true. I didn't see user-space do that either but if it's not required it's probably a good idea not to complicate things. > This is why in the ConnectX-3 implementation of memory windows we didn't use the > ib_mr.usecnt field, but relied on a hardware implementation of the > reference count. > > As I said, the spec also allows a device to proceed with deregistering > an MR, leaving an orphan memory window. I don't think we can prevent > this in the kernel, and I'm not sure it would be a good idea anyway, so > I think it is okay to remove the use_cnt field. I completely agree. we don't have any way to maintain mw<->mr reference so let's drop it. Reviewed-by: Sagi Grimberg <sagig@mellanox.com> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 18, 2015 at 03:14:08PM +0100, Bart Van Assche wrote: > On 12/18/2015 02:55 PM, Christoph Hellwig wrote: >> Signed-off-by: Christoph Hellwig <hch@lst.de> > > Shouldn't the description of this patch be changed into something like > "Remove the usecnt field from ib_mr since it is always zero" ? In software context unused for my also includes something that's only written to. But I can change it to something like: IB: remove the write only usecnt field from struct ib_mr to be a little more precise. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 48776bb..57be54e 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -992,7 +992,6 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file, mr->pd = pd; mr->uobject = uobj; atomic_inc(&pd->usecnt); - atomic_set(&mr->usecnt, 0); uobj->object = mr; ret = idr_add_uobj(&ib_uverbs_mr_idr, uobj); @@ -1090,11 +1089,6 @@ ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file, } } - if (atomic_read(&mr->usecnt)) { - ret = -EBUSY; - goto put_uobj_pd; - } - old_pd = mr->pd; ret = mr->device->rereg_user_mr(mr, cmd.flags, cmd.start, cmd.length, cmd.hca_va, diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 2858b35..cce080f 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1209,7 +1209,6 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags) mr->pd = pd; mr->uobject = NULL; atomic_inc(&pd->usecnt); - atomic_set(&mr->usecnt, 0); } return mr; @@ -1218,13 +1217,9 @@ EXPORT_SYMBOL(ib_get_dma_mr); int ib_dereg_mr(struct ib_mr *mr) { - struct ib_pd *pd; + struct ib_pd *pd = mr->pd; int ret; - if (atomic_read(&mr->usecnt)) - return -EBUSY; - - pd = mr->pd; ret = mr->device->dereg_mr(mr); if (!ret) atomic_dec(&pd->usecnt); @@ -1260,7 +1255,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd, mr->pd = pd; mr->uobject = NULL; atomic_inc(&pd->usecnt); - atomic_set(&mr->usecnt, 0); } return mr; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 097eb93..f806289 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -458,9 +458,6 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr) u32 mmid; PDBG("%s ib_mr %p\n", __func__, ib_mr); - /* There can be no memory windows */ - if (atomic_read(&ib_mr->usecnt)) - return -EINVAL; mhp = to_iwch_mr(ib_mr); kfree(mhp->pages); diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index 1eb833a..7849890 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -704,9 +704,6 @@ int c4iw_dereg_mr(struct ib_mr *ib_mr) u32 mmid; PDBG("%s ib_mr %p\n", __func__, ib_mr); - /* There can be no memory windows */ - if (atomic_read(&ib_mr->usecnt)) - return -EINVAL; mhp = to_c4iw_mr(ib_mr); rhp = mhp->rhp; diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c b/drivers/staging/rdma/ehca/ehca_mrmw.c index 1814af7..06b832b 100644 --- a/drivers/staging/rdma/ehca/ehca_mrmw.c +++ b/drivers/staging/rdma/ehca/ehca_mrmw.c @@ -1339,7 +1339,6 @@ int ehca_reg_internal_maxmr( e_mr->ib.ib_mr.pd = &e_pd->ib_pd; e_mr->ib.ib_mr.uobject = NULL; atomic_inc(&(e_pd->ib_pd.usecnt)); - atomic_set(&(e_mr->ib.ib_mr.usecnt), 0); *e_maxmr = e_mr; return 0; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 284916d..e45776e 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1306,7 +1306,6 @@ struct ib_mr { u64 iova; u32 length; unsigned int page_size; - atomic_t usecnt; /* count number of MWs */ }; struct ib_mw {
Signed-off-by: Christoph Hellwig <hch@lst.de> --- drivers/infiniband/core/uverbs_cmd.c | 6 ------ drivers/infiniband/core/verbs.c | 8 +------- drivers/infiniband/hw/cxgb3/iwch_provider.c | 3 --- drivers/infiniband/hw/cxgb4/mem.c | 3 --- drivers/staging/rdma/ehca/ehca_mrmw.c | 1 - include/rdma/ib_verbs.h | 1 - 6 files changed, 1 insertion(+), 21 deletions(-)