diff mbox series

[v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance

Message ID 158145102079.515252.3226617475691911684.stgit@morisot.1015granger.net (mailing list archive)
State New, archived
Series [v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance

Commit Message

Chuck Lever III Feb. 11, 2020, 7:58 p.m. UTC
The @nents value that was passed to ib_dma_map_sg() has to be passed
to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
concatenate sg entries, it will return a different nents value than
it was passed.

The bug was exposed by recent changes to the AMD IOMMU driver.

Reported-by: Andre Tomt <andre@tomt.net>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/frwr_ops.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Hey Andre, please try this out. It just reverts the bit of brokenness that
Robin observed this morning. I've done basic testing here with Intel
IOMMU systems and saw no change in behavior (i.e., all good to go).

Comments

Andre Tomt Feb. 11, 2020, 8:50 p.m. UTC | #1
On 11.02.2020 20:58, Chuck Lever wrote:
> The @nents value that was passed to ib_dma_map_sg() has to be passed
> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
> concatenate sg entries, it will return a different nents value than
> it was passed.
> 
> The bug was exposed by recent changes to the AMD IOMMU driver.

This seems to fail differently on my system; mount fails with:
mount.nfs: mount system call failed

and the kernel log reports:
[   38.890344] NFS: Registering the id_resolver key type
[   38.890351] Key type id_resolver registered
[   38.890352] Key type id_legacy registered
[   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
[   38.901817] NFS4: Couldn't follow remote path

amd_iommu=off still works

One detail I accidentally left out of the original report is that the
server (Intel system) is running Ubuntu 20.04 ("beta") userspace, and the
AMD clients are running Ubuntu 19.10 userspace, although I don't believe
this matters at this point.

> Reported-by: Andre Tomt <andre@tomt.net>
> Suggested-by: Robin Murphy <robin.murphy@arm.com>
> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   net/sunrpc/xprtrdma/frwr_ops.c |    5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> Hey Andre, please try this out. It just reverts the bit of brokenness that
> Robin observed this morning. I've done basic testing here with Intel
> IOMMU systems, no change in behavior (ie, all good to go).
> 
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 095be887753e..449bb51e4fe8 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
>   			break;
>   	}
>   	mr->mr_dir = rpcrdma_data_dir(writing);
> +	mr->mr_nents = i;
>   
> -	mr->mr_nents =
> -		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
> -	if (!mr->mr_nents)
> +	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
>   		goto out_dmamap_err;
>   
>   	ibmr = mr->frwr.fr_mr;
> 
>

Chuck Lever III Feb. 11, 2020, 10 p.m. UTC | #2
Hi Andre, thanks for trying this out.

> On Feb 11, 2020, at 3:50 PM, Andre Tomt <andre@tomt.net> wrote:
> 
> On 11.02.2020 20:58, Chuck Lever wrote:
>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
>> concatenate sg entries, it will return a different nents value than
>> it was passed.
>> The bug was exposed by recent changes to the AMD IOMMU driver.
> 
> This seems to fail differently on my system; mount fails with:
> mount.nfs: mount system call failed
> 
> and the kernel log reports:
> [   38.890344] NFS: Registering the id_resolver key type
> [   38.890351] Key type id_resolver registered
> [   38.890352] Key type id_legacy registered
> [   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
> [   38.901817] NFS4: Couldn't follow remote path
> 
> amd_iommu=off still works
> 
> One detail I accidentally left out of the original report is that the server (Intel system) is running Ubuntu 20.04 ("beta") userspace, and the AMD clients are running Ubuntu 19.10 userspace, although I don't believe this matters at this point.

Next thing to try:

# trace-cmd record -e sunrpc -e rpcrdma

then issue the mount command. Once it completes, ^C the trace-cmd and send me trace.dat.

Try this with both the v5.4 kernel and the v5.5 kernel (and note that trace-cmd overwrites trace.dat, so copy it out between tests).


>> Reported-by: Andre Tomt <andre@tomt.net>
>> Suggested-by: Robin Murphy <robin.murphy@arm.com>
>> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  net/sunrpc/xprtrdma/frwr_ops.c |    5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> Hey Andre, please try this out. It just reverts the bit of brokenness that
>> Robin observed this morning. I've done basic testing here with Intel
>> IOMMU systems, no change in behavior (ie, all good to go).
>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
>> index 095be887753e..449bb51e4fe8 100644
>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
>>  			break;
>>  	}
>>  	mr->mr_dir = rpcrdma_data_dir(writing);
>> +	mr->mr_nents = i;
>>  
>> -	mr->mr_nents =
>> -		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>> -	if (!mr->mr_nents)
>> +	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
>>  		goto out_dmamap_err;
>>  
>>  	ibmr = mr->frwr.fr_mr;
> 

--
Chuck Lever

Andre Tomt Feb. 12, 2020, 12:16 a.m. UTC | #3
On 11.02.2020 23:00, Chuck Lever wrote:
> Hi Andre, thanks for trying this out.
> 
>> On Feb 11, 2020, at 3:50 PM, Andre Tomt <andre@tomt.net> wrote:
>>
>> On 11.02.2020 20:58, Chuck Lever wrote:
>>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
>>> concatenate sg entries, it will return a different nents value than
>>> it was passed.
>>> The bug was exposed by recent changes to the AMD IOMMU driver.
>>
>> This seems to fail differently on my system; mount fails with:
>> mount.nfs: mount system call failed
>>
>> and the kernel log reports:
>> [   38.890344] NFS: Registering the id_resolver key type
>> [   38.890351] Key type id_resolver registered
>> [   38.890352] Key type id_legacy registered
>> [   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
>> [   38.901817] NFS4: Couldn't follow remote path
>>
>> amd_iommu=off still works
>>
>> One detail I accidentally left out of the original report is that the server (Intel system) is running Ubuntu 20.04 ("beta") userspace, and the AMD clients are running Ubuntu 19.10 userspace, although I don't believe this matters at this point.
> 
> Next thing to try:
> 
> # trace-cmd record -e sunrpc -e rpcrdma
> 
> then issue the mount command. Once it completes, ^C the trace-cmd and send me trace.dat.
> 
> Try this with both the v5.4 kernel and the v5.5 kernel (and note that trace-cmd overwrites trace.dat, so copy it out between tests).

I've uploaded them to https://tomt.net/temp/rdmaiommubug/
I'll probably do a 5.5.3 with the v1 fix as well, should show up 
momentarily.

>>> Reported-by: Andre Tomt <andre@tomt.net>
>>> Suggested-by: Robin Murphy <robin.murphy@arm.com>
>>> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>   net/sunrpc/xprtrdma/frwr_ops.c |    5 ++---
>>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>> Hey Andre, please try this out. It just reverts the bit of brokenness that
>>> Robin observed this morning. I've done basic testing here with Intel
>>> IOMMU systems, no change in behavior (ie, all good to go).
>>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
>>> index 095be887753e..449bb51e4fe8 100644
>>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>>> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
>>>   			break;
>>>   	}
>>>   	mr->mr_dir = rpcrdma_data_dir(writing);
>>> +	mr->mr_nents = i;
>>>   
>>> -	mr->mr_nents =
>>> -		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>>> -	if (!mr->mr_nents)
>>> +	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
>>>   		goto out_dmamap_err;
>>>   
>>>   	ibmr = mr->frwr.fr_mr;
>>
> 
> --
> Chuck Lever
> 
> 
>

Patch

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 095be887753e..449bb51e4fe8 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -313,10 +313,9 @@  struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
 			break;
 	}
 	mr->mr_dir = rpcrdma_data_dir(writing);
+	mr->mr_nents = i;
 
-	mr->mr_nents =
-		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
-	if (!mr->mr_nents)
+	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
 		goto out_dmamap_err;
 
 	ibmr = mr->frwr.fr_mr;