diff mbox

[10/10] IB: remove the unused usecnt field from struct ib_mr

Message ID 1450446906-10336-11-git-send-email-hch@lst.de (mailing list archive)
State Superseded
Headers show

Commit Message

Christoph Hellwig Dec. 18, 2015, 1:55 p.m. UTC
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/uverbs_cmd.c        | 6 ------
 drivers/infiniband/core/verbs.c             | 8 +-------
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 3 ---
 drivers/infiniband/hw/cxgb4/mem.c           | 3 ---
 drivers/staging/rdma/ehca/ehca_mrmw.c       | 1 -
 include/rdma/ib_verbs.h                     | 1 -
 6 files changed, 1 insertion(+), 21 deletions(-)

Comments

Bart Van Assche Dec. 18, 2015, 2:14 p.m. UTC | #1
On 12/18/2015 02:55 PM, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Shouldn't the description of this patch be changed into something like 
"Remove the usecnt field from ib_mr since it is always zero" ?

Anyway:

Reviewed-by: Bart Van Assche <bvanassche@sandisk.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 20, 2015, 7:16 a.m. UTC | #2
On 12/18/2015 4:14 PM, Bart Van Assche wrote:
> On 12/18/2015 02:55 PM, Christoph Hellwig wrote:
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Shouldn't the description of this patch be changed into something like 
> "Remove the usecnt field from ib_mr since it is always zero" ?


Agree.

I would like us to avoid empty changes log for IB core patches and 
actually all over the rdma subsystem.

Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 20, 2015, 7:25 a.m. UTC | #3
On 12/18/2015 3:55 PM, Christoph Hellwig wrote:
>   
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 284916d..e45776e 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1306,7 +1306,6 @@ struct ib_mr {
>   	u64		   iova;
>   	u32		   length;
>   	unsigned int	   page_size;
> -	atomic_t	   usecnt; /* count number of MWs */
>   };

This comment is part of Roland's uverbs commit.

I wonder if LL driver supporting the  IB_WR_BIND_MW op
ref the MR on port send and deref it on completion?

Or.


>   
>   struct ib_mw {
> -- 1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 20, 2015, 7:36 a.m. UTC | #4
On 12/20/2015 9:25 AM, Or Gerlitz wrote:
> On 12/18/2015 3:55 PM, Christoph Hellwig wrote:
>>   diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index 284916d..e45776e 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -1306,7 +1306,6 @@ struct ib_mr {
>>       u64           iova;
>>       u32           length;
>>       unsigned int       page_size;
>> -    atomic_t       usecnt; /* count number of MWs */
>>   };
>
> This comment is part of Roland's uverbs commit. 

I saw now that you removed in-kernel support for MWs as a downstream patch
of this series. I guess this cleanup needs to go there as the refcount 
field of
kernel MRs relates to MWs. This will also help someone coming in the future
and returning the in-kernel MW support to avoid forgetting on doing the 
refs.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Dec. 20, 2015, 9:31 a.m. UTC | #5
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index 284916d..e45776e 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -1306,7 +1306,6 @@ struct ib_mr {
>>       u64           iova;
>>       u32           length;
>>       unsigned int       page_size;
>> -    atomic_t       usecnt; /* count number of MWs */
>>   };
>
> This comment is part of Roland's uverbs commit.
>
> I wonder if LL driver supporting the  IB_WR_BIND_MW op
> ref the MR on port send and deref it on completion?

I don't see why any driver would ref the underlying MR at
post_send(IB_WR_BIND_MW) time. It doesn't really make sense.

However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs:
Implement memory windows support in uverbs") is wrong. A memory
window allocation should really reference the MR and not the PD (which
is referenced by the MR). Otherwise the MR deregistration is allowed
with windows bound to it. If this is the case then this field is not
useless and we should fix ib_uverbs_alloc_mw()?

Haggai, can you shed some light here? It's Shani's and your code.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 20, 2015, 11:27 a.m. UTC | #6
On 12/20/2015 11:31 AM, Sagi Grimberg wrote:
> However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs:
> Implement memory windows support in uverbs") is wrong. A memory
> window allocation should really reference the MR and not the PD (which
> is referenced by the MR). Otherwise the MR deregistration is allowed
> with windows bound to it. If this is the case then this field is not
> useless and we should fix ib_uverbs_alloc_mw()?

I tend to agree, lets not remove the field this is clarified with Haggai 
and Co

>
> Haggai, can you shed some light here? It's Shani's and your code. 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 20, 2015, 11:28 a.m. UTC | #7
On 12/20/2015 11:31 AM, Sagi Grimberg wrote:
> However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs:
> Implement memory windows support in uverbs") is wrong. A memory
> window allocation should really reference the MR and not the PD (which
> is referenced by the MR). Otherwise the MR deregistration is allowed
> with windows bound to it. If this is the case then this field is not
> useless and we should fix ib_uverbs_alloc_mw()?

I tend to agree, lets not remove the field till this is clarified with 
Haggai and Co

>
> Haggai, can you shed some light here? It's Shani's and your code. 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran Dec. 20, 2015, 11:33 a.m. UTC | #8
On 20/12/2015 11:31, Sagi Grimberg wrote:
> 
>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>>> index 284916d..e45776e 100644
>>> --- a/include/rdma/ib_verbs.h
>>> +++ b/include/rdma/ib_verbs.h
>>> @@ -1306,7 +1306,6 @@ struct ib_mr {
>>>       u64           iova;
>>>       u32           length;
>>>       unsigned int       page_size;
>>> -    atomic_t       usecnt; /* count number of MWs */
>>>   };
>>
>> This comment is part of Roland's uverbs commit.
>>
>> I wonder if LL driver supporting the  IB_WR_BIND_MW op
>> ref the MR on port send and deref it on completion?
> 
> I don't see why any driver would ref the underlying MR at
> post_send(IB_WR_BIND_MW) time. It doesn't really make sense.
> 
> However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs:
> Implement memory windows support in uverbs") is wrong. A memory
> window allocation should really reference the MR and not the PD (which
> is referenced by the MR). Otherwise the MR deregistration is allowed
> with windows bound to it. If this is the case then this field is not
> useless and we should fix ib_uverbs_alloc_mw()?
> 
> Haggai, can you shed some light here? It's Shani's and your code.
A memory window takes reference on its PD during creation. It can only
be used by that PD, and it isn't bound to any memory region yet. You 
are right that we want to prevent deregistration of an MR when there 
are memory windows bound to it. This is suggested (although not 
required) by the IBA specifications (section 10.6.7.2.6 Deregistering 
Regions with Bound Windows). The problem with the usecnt field in ib_mr 
as you wrote is that it needs to reflect what happens at bind / 
invalidate time, but bind can happen in user-space without the kernel 
knowing, and invalidation can similarly occur due to a 
send-with-invalidate message being received. This is why in the 
ConnectX-3 implementation of memory windows we didn't use the 
ib_mr.usecnt field, but relied on a hardware implementation of the 
reference count.

As I said, the spec also allows a device to proceed with deregistering 
an MR, leaving an orphan memory window. I don't think we can prevent 
this in the kernel, and I'm not sure it would be a good idea anyway, so 
I think it is okay to remove the use_cnt field.

Regards,
Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Dec. 20, 2015, 11:42 a.m. UTC | #9
>> However, I think that the patch from Shani 6b52a12bc3fc ("IB/uverbs:
>> Implement memory windows support in uverbs") is wrong. A memory
>> window allocation should really reference the MR and not the PD (which
>> is referenced by the MR). Otherwise the MR deregistration is allowed
>> with windows bound to it. If this is the case then this field is not
>> useless and we should fix ib_uverbs_alloc_mw()?
>>
>> Haggai, can you shed some light here? It's Shani's and your code.
> A memory window takes reference on its PD during creation. It can only
> be used by that PD, and it isn't bound to any memory region yet.

Right! it's still not bound. I mixed up the details. Thanks for
reminding me.

> You are right that we want to prevent deregistration of an MR when there
> are memory windows bound to it. This is suggested (although not
> required) by the IBA specifications (section 10.6.7.2.6 Deregistering
> Regions with Bound Windows). The problem with the usecnt field in ib_mr
> as you wrote is that it needs to reflect what happens at bind /
> invalidate time, but bind can happen in user-space without the kernel
> knowing, and invalidation can similarly occur due to a
> send-with-invalidate message being received.

That's true. I didn't see user-space do that either but if it's not
required it's probably a good idea not to complicate things.

> This is why in the ConnectX-3 implementation of memory windows we didn't use the
> ib_mr.usecnt field, but relied on a hardware implementation of the
> reference count.
>
> As I said, the spec also allows a device to proceed with deregistering
> an MR, leaving an orphan memory window. I don't think we can prevent
> this in the kernel, and I'm not sure it would be a good idea anyway, so
> I think it is okay to remove the use_cnt field.

I completely agree. we don't have any way to maintain mw<->mr reference
so let's drop it.

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Dec. 20, 2015, 3:42 p.m. UTC | #10
On Fri, Dec 18, 2015 at 03:14:08PM +0100, Bart Van Assche wrote:
> On 12/18/2015 02:55 PM, Christoph Hellwig wrote:
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Shouldn't the description of this patch be changed into something like 
> "Remove the usecnt field from ib_mr since it is always zero" ?

In software context unused for my also includes something that's
only written to.  But I can change it to something like:

IB: remove the write only usecnt field from struct ib_mr

to be a little more precise.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 48776bb..57be54e 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -992,7 +992,6 @@  ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	mr->pd      = pd;
 	mr->uobject = uobj;
 	atomic_inc(&pd->usecnt);
-	atomic_set(&mr->usecnt, 0);
 
 	uobj->object = mr;
 	ret = idr_add_uobj(&ib_uverbs_mr_idr, uobj);
@@ -1090,11 +1089,6 @@  ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
 		}
 	}
 
-	if (atomic_read(&mr->usecnt)) {
-		ret = -EBUSY;
-		goto put_uobj_pd;
-	}
-
 	old_pd = mr->pd;
 	ret = mr->device->rereg_user_mr(mr, cmd.flags, cmd.start,
 					cmd.length, cmd.hca_va,
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2858b35..cce080f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1209,7 +1209,6 @@  struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 		mr->pd      = pd;
 		mr->uobject = NULL;
 		atomic_inc(&pd->usecnt);
-		atomic_set(&mr->usecnt, 0);
 	}
 
 	return mr;
@@ -1218,13 +1217,9 @@  EXPORT_SYMBOL(ib_get_dma_mr);
 
 int ib_dereg_mr(struct ib_mr *mr)
 {
-	struct ib_pd *pd;
+	struct ib_pd *pd = mr->pd;
 	int ret;
 
-	if (atomic_read(&mr->usecnt))
-		return -EBUSY;
-
-	pd = mr->pd;
 	ret = mr->device->dereg_mr(mr);
 	if (!ret)
 		atomic_dec(&pd->usecnt);
@@ -1260,7 +1255,6 @@  struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
 		mr->pd      = pd;
 		mr->uobject = NULL;
 		atomic_inc(&pd->usecnt);
-		atomic_set(&mr->usecnt, 0);
 	}
 
 	return mr;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 097eb93..f806289 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -458,9 +458,6 @@  static int iwch_dereg_mr(struct ib_mr *ib_mr)
 	u32 mmid;
 
 	PDBG("%s ib_mr %p\n", __func__, ib_mr);
-	/* There can be no memory windows */
-	if (atomic_read(&ib_mr->usecnt))
-		return -EINVAL;
 
 	mhp = to_iwch_mr(ib_mr);
 	kfree(mhp->pages);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 1eb833a..7849890 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -704,9 +704,6 @@  int c4iw_dereg_mr(struct ib_mr *ib_mr)
 	u32 mmid;
 
 	PDBG("%s ib_mr %p\n", __func__, ib_mr);
-	/* There can be no memory windows */
-	if (atomic_read(&ib_mr->usecnt))
-		return -EINVAL;
 
 	mhp = to_c4iw_mr(ib_mr);
 	rhp = mhp->rhp;
diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c b/drivers/staging/rdma/ehca/ehca_mrmw.c
index 1814af7..06b832b 100644
--- a/drivers/staging/rdma/ehca/ehca_mrmw.c
+++ b/drivers/staging/rdma/ehca/ehca_mrmw.c
@@ -1339,7 +1339,6 @@  int ehca_reg_internal_maxmr(
 	e_mr->ib.ib_mr.pd = &e_pd->ib_pd;
 	e_mr->ib.ib_mr.uobject = NULL;
 	atomic_inc(&(e_pd->ib_pd.usecnt));
-	atomic_set(&(e_mr->ib.ib_mr.usecnt), 0);
 	*e_maxmr = e_mr;
 	return 0;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 284916d..e45776e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1306,7 +1306,6 @@  struct ib_mr {
 	u64		   iova;
 	u32		   length;
 	unsigned int	   page_size;
-	atomic_t	   usecnt; /* count number of MWs */
 };
 
 struct ib_mw {