Message ID | 1435121680-11491-1-git-send-email-wen.gang.wang@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Hi Doug,

Could you please review this patch?

thanks,
wengang

On 2015-06-24 12:54, Wengang Wang wrote:
> There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
> failed(mr pool running out). this lead to the refcount overflow.
>
> A complain in line 117(see following) is seen. From vmcore:
> s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
> That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
> to return ERR_PTR(-EAGAIN).
>
> 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
> 116 {
> 117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
> 118         if (atomic_dec_and_test(&rds_ibdev->refcount))
> 119                 queue_work(rds_wq, &rds_ibdev->free_work);
> 120 }
>
> fix is to drop refcount when rds_ib_alloc_fmr failed.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  net/rds/ib_rdma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
> index 273b8bf..657ba9f 100644
> --- a/net/rds/ib_rdma.c
> +++ b/net/rds/ib_rdma.c
> @@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
>  	}
>
>  	ibmr = rds_ib_alloc_fmr(rds_ibdev);
> -	if (IS_ERR(ibmr))
> +	if (IS_ERR(ibmr)) {
> +		rds_ib_dev_put(rds_ibdev);
>  		return ibmr;
> +	}
>
>  	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
>  	if (ret == 0)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24/06/2015 07:54, Wengang Wang wrote:
> There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
> failed(mr pool running out). this lead to the refcount overflow.
>
> A complain in line 117(see following) is seen. From vmcore:
> s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
> That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
> to return ERR_PTR(-EAGAIN).
>
> 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
> 116 {
> 117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
> 118         if (atomic_dec_and_test(&rds_ibdev->refcount))
> 119                 queue_work(rds_wq, &rds_ibdev->free_work);
> 120 }
>
> fix is to drop refcount when rds_ib_alloc_fmr failed.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  net/rds/ib_rdma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
> index 273b8bf..657ba9f 100644
> --- a/net/rds/ib_rdma.c
> +++ b/net/rds/ib_rdma.c
> @@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
>  	}
>
>  	ibmr = rds_ib_alloc_fmr(rds_ibdev);
> -	if (IS_ERR(ibmr))
> +	if (IS_ERR(ibmr)) {
> +		rds_ib_dev_put(rds_ibdev);
>  		return ibmr;
> +	}
>
>  	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
>  	if (ret == 0)
>

It seems like the function indeed is missing a put on the rds_ibdev in
that case.

Reviewed-by: Haggai Eran <haggaie@mellanox.com>

You may also want to add:
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
Haggai,

Thanks for the review! I will add the message you suggested and re-post.

thanks,
wengang

On 2015-07-06 14:18, Haggai Eran wrote:
> On 24/06/2015 07:54, Wengang Wang wrote:
>> There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
>> failed(mr pool running out). this lead to the refcount overflow.
>>
>> A complain in line 117(see following) is seen. From vmcore:
>> s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
>> That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
>> to return ERR_PTR(-EAGAIN).
>>
>> 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
>> 116 {
>> 117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
>> 118         if (atomic_dec_and_test(&rds_ibdev->refcount))
>> 119                 queue_work(rds_wq, &rds_ibdev->free_work);
>> 120 }
>>
>> fix is to drop refcount when rds_ib_alloc_fmr failed.
>>
>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> ---
>>  net/rds/ib_rdma.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
>> index 273b8bf..657ba9f 100644
>> --- a/net/rds/ib_rdma.c
>> +++ b/net/rds/ib_rdma.c
>> @@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
>>  	}
>>
>>  	ibmr = rds_ib_alloc_fmr(rds_ibdev);
>> -	if (IS_ERR(ibmr))
>> +	if (IS_ERR(ibmr)) {
>> +		rds_ib_dev_put(rds_ibdev);
>>  		return ibmr;
>> +	}
>>
>>  	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
>>  	if (ret == 0)
>>
> It seems like the function indeed is missing a put on the rds_ibdev in
> that case.
>
> Reviewed-by: Haggai Eran <haggaie@mellanox.com>
>
> You may also want to add:
> Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct
> rds_ib_device")
diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 273b8bf..657ba9f 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 	}
 
 	ibmr = rds_ib_alloc_fmr(rds_ibdev);
-	if (IS_ERR(ibmr))
+	if (IS_ERR(ibmr)) {
+		rds_ib_dev_put(rds_ibdev);
 		return ibmr;
+	}
 
 	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
 	if (ret == 0)
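The hunk above adds a put on the early-return path. As a rough illustration of why that matters, here is a minimal userspace sketch of the same get/put pattern. It is not the kernel code: `demo_dev`, `demo_alloc`, `demo_get_mr` and the `apply_fix` switch are hypothetical names invented for this example, a plain `int` stands in for the atomic refcount, and the assumption that a successfully returned object keeps the reference is part of the model only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rds_ib_device: only the counter. */
struct demo_dev {
	int refcount;
};

static void demo_dev_get(struct demo_dev *dev) { dev->refcount++; }
static void demo_dev_put(struct demo_dev *dev) { dev->refcount--; }

/* Hypothetical stand-in for the allocator: fails when the pool is empty. */
static void *demo_alloc(bool pool_empty)
{
	static int dummy_mr;

	return pool_empty ? NULL : &dummy_mr;
}

/*
 * Models the shape of the patched error path: a reference is taken
 * first, and must be dropped again if the allocation fails.
 */
static void *demo_get_mr(struct demo_dev *dev, bool pool_empty, bool apply_fix)
{
	void *mr;

	demo_dev_get(dev);		/* reference taken before allocating */
	mr = demo_alloc(pool_empty);
	if (mr == NULL) {
		if (apply_fix)
			demo_dev_put(dev);	/* the put the patch adds */
		return NULL;		/* without the put, one ref leaks */
	}
	return mr;	/* model assumption: the result keeps the reference */
}
```

With `apply_fix` set to false, every failed call leaves the counter one higher than before, which is the leak the patch describes; with it set to true, a failed call leaves the counter unchanged.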
rds_ib_get_mr() does not drop its reference on rds_ib_device.refcount when
rds_ib_alloc_fmr() fails (MR pool running out). This leads to the refcount
overflowing.

The BUG_ON at line 117 (see below) was hit. From the vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is evidence that the MR pool was used up, so rds_ib_alloc_fmr() is very
likely to have returned ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

The fix is to drop the refcount when rds_ib_alloc_fmr() fails.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 net/rds/ib_rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
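The two vmcore numbers can be checked against each other with a little arithmetic. The sketch below assumes a 32-bit signed counter that wraps modulo 2^32, and `wrapped_count` is a hypothetical helper written for this note, not a kernel function. It reports what such a counter would read after a given number of increments from zero.

```c
#include <stdint.h>

/*
 * Value a 32-bit signed counter would show after `increments`
 * increments starting from 0, assuming wraparound modulo 2^32.
 * The arithmetic is done in 64 bits to avoid signed overflow here.
 */
static int32_t wrapped_count(uint64_t increments)
{
	uint32_t low = (uint32_t)increments;	/* reduce modulo 2^32 */

	if (low > (uint32_t)INT32_MAX)
		return (int32_t)((int64_t)low - 4294967296LL);
	return (int32_t)low;
}
```

Under that assumption, 2147485544 increments read back as -2147481752, and 2147485544 + 6304 increments read back as -2147475448, the refcount seen in the vmcore. So the refcount sits 6304 above the depleted counter modulo 2^32, which would be consistent with roughly one leaked reference per pool-depleted event plus some references that were legitimately held, though the thread itself does not state that breakdown.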