rds: rds_ib_device.refcount overflow

Message ID 1436164511-2411-1-git-send-email-wen.gang.wang@oracle.com (mailing list archive)
State Accepted
Commit Message

Wengang Wang July 6, 2015, 6:35 a.m. UTC
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")

rds_ib_device.refcount is not dropped when rds_ib_alloc_fmr fails (MR pool
exhausted). Each such failure leaks one reference, which eventually overflows
the refcount.

The BUG_ON at line 117 (see below) was hit. From the vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
This is evidence that the MR pool was used up, so rds_ib_alloc_fmr is very likely
to have returned ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }
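
The vmcore values quoted above can be sanity-checked with a little 32-bit arithmetic. The following is a minimal userspace sketch, not kernel code: it assumes the refcount is a 32-bit signed counter that wrapped once, and the helper names (net_increments, put_would_bug) are hypothetical. Reinterpreting the negative refcount as unsigned gives the net number of increments modulo 2^32, which lands within a few thousand of the pool-depleted count.

```c
#include <stdint.h>
#include <limits.h>

/* Values quoted from the vmcore in the commit message. */
#define VMCORE_POOL_DEPLETED 2147485544ULL
#define VMCORE_REFCOUNT      (-2147475448)

/*
 * Reinterpret a wrapped 32-bit signed counter as the net number of
 * increments it has seen, modulo 2^32. Signed-to-unsigned conversion
 * is well defined in C (reduction modulo 2^32).
 */
static inline uint32_t net_increments(int32_t wrapped)
{
    return (uint32_t)wrapped;
}

/* Mirrors the condition checked by BUG_ON() on line 117. */
static inline int put_would_bug(int32_t refcount)
{
    return refcount <= 0;
}
```

With these values, net_increments(VMCORE_REFCOUNT) is 2147491848, which is 6304 more than the depleted count of 2147485544. Both numbers are just past INT_MAX, which is consistent with roughly one leaked reference per depleted-pool failure.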

The fix is to drop the refcount when rds_ib_alloc_fmr fails.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
---
 net/rds/ib_rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Wengang Wang July 13, 2015, 1:18 a.m. UTC | #1
Hi Doug,

What do you think about this patch?

thanks,
wengang


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford July 29, 2015, 2:36 p.m. UTC | #2
On 07/12/2015 09:18 PM, Wengang Wang wrote:
> Hi Doug,
> 
> What do you think about this patch?

Sorry, I picked this up already.  I must have missed sending out the
acknowledgment on this one.

Wengang Wang July 30, 2015, 5:35 a.m. UTC | #3
Doug,

No problem. I see the patch has been picked up.

thanks,
wengang


Patch

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 273b8bf..657ba9f 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -759,8 +759,10 @@  void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 	}
 
 	ibmr = rds_ib_alloc_fmr(rds_ibdev);
-	if (IS_ERR(ibmr))
+	if (IS_ERR(ibmr)) {
+		rds_ib_dev_put(rds_ibdev);
 		return ibmr;
+	}
 
 	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
 	if (ret == 0)
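
To illustrate the imbalance the patch removes, here is a simplified userspace model of the error path. It is a sketch under stated assumptions, not the RDS code: every model_* name is hypothetical, a plain int stands in for the atomic refcount, and the model assumes (as the patch implies) that the device reference is taken before the allocation is attempted. The error-pointer helpers imitate the kernel's ERR_PTR()/IS_ERR() convention.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(intptr_t err)
{
    return (void *)err;
}

static inline int model_is_err(const void *p)
{
    /* Error pointers occupy the top MODEL_MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_dev {
    int refcount;   /* plain int standing in for the atomic refcount */
    int pool_free;  /* MRs left in the pool */
};

static int model_mr;    /* dummy object returned on success */

static inline void model_dev_get(struct model_dev *d) { d->refcount++; }
static inline void model_dev_put(struct model_dev *d) { d->refcount--; }

/* Fails with -EAGAIN when the pool is depleted. */
static inline void *model_alloc_fmr(struct model_dev *d)
{
    if (d->pool_free == 0)
        return model_err_ptr(-EAGAIN);
    d->pool_free--;
    return &model_mr;
}

/* apply_fix = 0 models the code before the patch, 1 the code after it. */
static inline void *model_get_mr(struct model_dev *d, int apply_fix)
{
    void *ibmr;

    model_dev_get(d);               /* reference taken for the new MR */
    ibmr = model_alloc_fmr(d);
    if (model_is_err(ibmr)) {
        if (apply_fix)
            model_dev_put(d);       /* the call the patch adds */
        return ibmr;
    }
    return ibmr;                    /* success: the MR keeps the reference */
}

/* Run `calls` allocations against an empty pool; return leaked references. */
static inline int model_leaked_refs(int calls, int apply_fix)
{
    struct model_dev d = { .refcount = 1, .pool_free = 0 };

    for (int i = 0; i < calls; i++)
        model_get_mr(&d, apply_fix);
    return d.refcount - 1;
}
```

In this model, model_leaked_refs(n, 0) grows by one per failed call while model_leaked_refs(n, 1) stays at zero, which is the behaviour the commit message describes: with the pool depleted, enough failed calls push the counter past INT_MAX.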