[v3,1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance

Message ID 158152394998.433502.5623790463334839091.stgit@morisot.1015granger.net
State New
Series
  • Fix NFS/RDMA operation with Ryzen IOMMU

Commit Message

Chuck Lever Feb. 12, 2020, 4:12 p.m. UTC
The @nents value that was passed to ib_dma_map_sg() has to be passed
to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
concatenate sg entries, it will return a different nents value than
it was passed.

The bug was exposed by recent changes to the AMD IOMMU driver, which
enabled sg entry concatenation.

Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
new memory registration API") and reviewing other kernel ULPs, it's
not clear that the frwr_map() logic was ever correct for this case.

Reported-by: Andre Tomt <andre@tomt.net>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v5.5
---
 net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
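The nents rule in the commit message can be illustrated with a small user-space model. This is a sketch only: `struct sg_ent`, `mock_map_sg()` and `mock_unmap_sg()` are hypothetical stand-ins, not the kernel's `struct scatterlist` or `ib_dma_map_sg()`/`ib_dma_unmap_sg()`. The mock map assumes an identity address translation and merges physically contiguous neighbours (roughly what an IOMMU that concatenates sg entries may do), then returns the number of DMA segments; the mock unmap only accepts the nents value originally passed to map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one scatterlist entry (not struct scatterlist). */
struct sg_ent {
	uint64_t phys;		/* CPU-side address of the buffer */
	uint32_t length;	/* CPU-side length */
	uint64_t dma_addr;	/* written by mock_map_sg() */
	uint32_t dma_len;	/* written by mock_map_sg() */
};

static int mapped_nents = -1;	/* nents passed to the last map call */

/*
 * Mock map: uses an identity "IOVA" and merges an entry into the previous
 * DMA segment when the two are contiguous. Returns the number of DMA
 * segments, which can be smaller than nents.
 */
static int mock_map_sg(struct sg_ent *sg, int nents)
{
	int out = 0;

	assert(nents > 0);
	for (int i = 0; i < nents; i++) {
		if (out > 0 &&
		    sg[out - 1].dma_addr + sg[out - 1].dma_len == sg[i].phys) {
			sg[out - 1].dma_len += sg[i].length;	/* concatenate */
			continue;
		}
		sg[out].dma_addr = sg[i].phys;
		sg[out].dma_len = sg[i].length;
		out++;
	}
	mapped_nents = nents;
	return out;
}

/*
 * Mock unmap: succeeds (returns 0) only when called with the nents value
 * that was passed to mock_map_sg(), not the value it returned.
 */
static int mock_unmap_sg(struct sg_ent *sg, int nents)
{
	(void)sg;
	if (nents != mapped_nents)
		return -1;	/* contract violated */
	mapped_nents = -1;
	return 0;
}
```

In this model a caller walks the first `dma_nents` entries when consuming DMA addresses, but keeps the original count for the unmap call; once entries are concatenated, the two values differ.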

Comments

Jason Gunthorpe Feb. 12, 2020, 6:26 p.m. UTC | #1

Yep

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason
Chuck Lever Feb. 12, 2020, 6:38 p.m. UTC | #2
> On Feb 12, 2020, at 1:26 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> Yep
> 
> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Thanks.

Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".

Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")

--
Chuck Lever
Jason Gunthorpe Feb. 12, 2020, 7:05 p.m. UTC | #3
On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
> 
> Thanks.
> 
> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
> 
> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")

Not really, this was broken for other configurations besides AMD

Jason
Chuck Lever Feb. 12, 2020, 7:09 p.m. UTC | #4
> On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
>> 
>> Thanks.
>> 
>> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
>> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
>> 
>> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
> 
> Not really, this was broken for other configurations besides AMD

Agreed, but the bug seems to have been inconsequential until now?

Otherwise we should explore backporting farther into the past.


--
Chuck Lever
Jason Gunthorpe Feb. 12, 2020, 7:30 p.m. UTC | #5
On Wed, Feb 12, 2020 at 02:09:03PM -0500, Chuck Lever wrote:
> 
> 
> > On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
> >> 
> >> Thanks.
> >> 
> >> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
> >> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
> >> 
> >> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
> > 
> > Not really, this was broken for other configurations besides AMD
> 
> Agreed, but the bug seems to have been inconsequential until now?

I imagine it would get you on ARM or other archs, IIRC.

Jason
Chuck Lever Feb. 13, 2020, 2:33 p.m. UTC | #6
> On Feb 12, 2020, at 2:30 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Wed, Feb 12, 2020 at 02:09:03PM -0500, Chuck Lever wrote:
>> 
>> 
>>> On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>> 
>>> On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
>>>> 
>>>> Thanks.
>>>> 
>>>> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
>>>> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
>>>> 
>>>> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
>>> 
>>> Not really, this was broken for other configurations besides AMD
>> 
>> Agreed, but the bug seems to have been inconsequential until now?
> 
> I imagine it would get you on ARM or other archs, IIRC.

That's certainly plausible, but I haven't received explicit bug reports
in this area. (I'm not at all saying that such bugs categorically do
not exist).

In any event, practical matters: the posted patch applies back to v5.4,
but fails to apply starting with v5.3.

I think we can leave the "Cc: stable # v5.5"; and I'm open to requests
to backport this simple fix onto earlier stable kernels (back to v4.4),
which can be handled case-by-case. 'Salright?

--
Chuck Lever
Jason Gunthorpe Feb. 13, 2020, 2:56 p.m. UTC | #7
On Thu, Feb 13, 2020 at 09:33:23AM -0500, Chuck Lever wrote:
> 
> 
> > On Feb 12, 2020, at 2:30 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Wed, Feb 12, 2020 at 02:09:03PM -0500, Chuck Lever wrote:
> >> 
> >> 
> >>> On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>> 
> >>> On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
> >>>> 
> >>>> Thanks.
> >>>> 
> >>>> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
> >>>> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
> >>>> 
> >>>> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
> >>> 
> >>> Not really, this was broken for other configurations besides AMD
> >> 
> >> Agreed, but the bug seems to have been inconsequential until now?
> > 
> > I imagine it would get you on ARM or other archs, IIRC.
> 
> That's certainly plausible, but I haven't received explicit bug reports
> in this area. (I'm not at all saying that such bugs categorically do
> not exist).

Usually I encourage people to point the Fixes line at the commit that
is being fixed; pointing at some other commit that merely happens to
expose the bug is not the best.

> In any event, practical matters: the posted patch applies back to v5.4,
> but fails to apply starting with v5.3.
> 
> I think we can leave the "Cc: stable # v5.5"; and I'm open to requests
> to backport this simple fix onto earlier stable kernels (back to v4.4),
> which can be handled case-by-case. 'Salright?

I'd just put Cc: stable; the stable folks will reject it on earlier
versions because of conflicts, and we can leave it.

Jason

Patch

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 095be887753e..125297c9aa3e 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -288,8 +288,8 @@  struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
 {
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	struct ib_reg_wr *reg_wr;
+	int i, n, dma_nents;
 	struct ib_mr *ibmr;
-	int i, n;
 	u8 key;
 
 	if (nsegs > ia->ri_max_frwr_depth)
@@ -313,15 +313,16 @@  struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
 			break;
 	}
 	mr->mr_dir = rpcrdma_data_dir(writing);
+	mr->mr_nents = i;
 
-	mr->mr_nents =
-		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
-	if (!mr->mr_nents)
+	dma_nents = ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, mr->mr_nents,
+				  mr->mr_dir);
+	if (!dma_nents)
 		goto out_dmamap_err;
 
 	ibmr = mr->frwr.fr_mr;
-	n = ib_map_mr_sg(ibmr, mr->mr_sg, mr->mr_nents, NULL, PAGE_SIZE);
-	if (unlikely(n != mr->mr_nents))
+	n = ib_map_mr_sg(ibmr, mr->mr_sg, dma_nents, NULL, PAGE_SIZE);
+	if (n != dma_nents)
 		goto out_mapmr_err;
 
 	ibmr->iova &= 0x00000000ffffffff;
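The shape of the patched frwr_map() can be modelled in user space to show why keeping the two counts separate matters. All names here (`struct model_mr`, `model_dma_map_sg()`, `model_map_mr_sg()`, `model_frwr_map()` and so on) are hypothetical; the model simply assumes that the mapping step merges neighbouring entries pairwise and that the registration step consumes every DMA segment it is handed. The pre-fix variant overwrites mr_nents with the returned count, so its later unmap passes the wrong value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct rpcrdma_mr field that matters here. */
struct model_mr {
	int mr_nents;	/* value later handed to the unmap call */
};

static int last_map_nents;	/* nents seen by the most recent map call */

/* Model of a DMA map step that merges neighbouring entries pairwise,
 * so it returns ceil(nents / 2) segments. */
static int model_dma_map_sg(int nents)
{
	assert(nents > 0);
	last_map_nents = nents;
	return (nents + 1) / 2;
}

/* Model of the unmap contract: nents must match what was passed to map. */
static bool model_dma_unmap_sg(int nents)
{
	return nents == last_map_nents;
}

/* Model of a registration step that accepts every DMA segment it is given. */
static int model_map_mr_sg(int dma_nents)
{
	return dma_nents;
}

/* Pre-fix shape: the returned count overwrites mr_nents. */
static int model_frwr_map_old(struct model_mr *mr, int nsegs)
{
	mr->mr_nents = model_dma_map_sg(nsegs);
	if (!mr->mr_nents)
		return -1;
	if (model_map_mr_sg(mr->mr_nents) != mr->mr_nents)
		return -1;
	return 0;
}

/* Post-fix shape: mr_nents keeps the original count,
 * dma_nents drives registration. */
static int model_frwr_map(struct model_mr *mr, int nsegs)
{
	int dma_nents, n;

	mr->mr_nents = nsegs;
	dma_nents = model_dma_map_sg(mr->mr_nents);
	if (!dma_nents)
		return -1;		/* cf. out_dmamap_err */
	n = model_map_mr_sg(dma_nents);
	if (n != dma_nents)
		return -1;		/* cf. out_mapmr_err */
	return 0;
}

/* Unmap using whatever mr_nents holds, as a teardown path would. */
static bool model_frwr_unmap(const struct model_mr *mr)
{
	return model_dma_unmap_sg(mr->mr_nents);
}
```

With nsegs == 1 the old and new variants behave identically in this model, because nothing can be merged; that is one way such a bug can stay hidden until entries are actually concatenated.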