[for-next,v2] RDMA/rxe: fix regression caused by recent patch

Message ID 20201030171106.4191-1-rpearson@hpe.com (mailing list archive)
State Superseded

Commit Message

Bob Pearson Oct. 30, 2020, 5:11 p.m. UTC
The commit referenced below performs additional checking on
devices used for DMA. Specifically it checks that

device->dma_mask != NULL

Rdma_rxe uses this device when pinning MR memory but did not
set the value of dma_mask. In fact rdma_rxe does not perform
any DMA operations so the value is never used but is checked.

This patch gives dma_mask a valid value extracted from the device
backing the ndev used by rxe.

Without this patch rdma_rxe does not function.

N.B. This patch needs to be applied before the recent fix to add back
IB_USER_VERBS_CMD_POST_SEND to uverbs_cmd_mask.

Dennis Dallesandro reported that Parav's similar patch did not apply
cleanly to rxe. This one applies cleanly to the for-next head of tree
as of yesterday.

Fixes: f959dcd6ddfd2 ("dma-direct: Fix potential NULL pointer dereference")
Signed-off-by: Bob Pearson <rpearson@hpe.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
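
To make the failure mode in the commit message concrete, here is a minimal userspace sketch, not kernel code. The names mock_device, mock_dma_map_check and mock_coerce_mask_and_coherent, and the -EIO return value, are assumptions made for illustration. Only the NULL check on dma_mask comes from the message above. Pointing dma_mask at coherent_dma_mask mirrors what the kernel helper dma_coerce_mask_and_coherent() does before setting both masks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the DMA fields of struct device. */
struct mock_device {
	uint64_t *dma_mask;         /* NULL until a driver sets it */
	uint64_t coherent_dma_mask;
};

/*
 * Models the check described in the commit message: a device whose
 * dma_mask pointer is NULL is rejected. The error code is an assumption.
 */
static int mock_dma_map_check(const struct mock_device *dev)
{
	assert(dev != NULL);
	if (dev->dma_mask == NULL)
		return -EIO;
	return 0;
}

/*
 * Models the effect of coercing the mask: dma_mask is pointed at the
 * device's own coherent_dma_mask storage, then both take the same value.
 */
static int mock_coerce_mask_and_coherent(struct mock_device *dev, uint64_t mask)
{
	assert(dev != NULL);
	dev->dma_mask = &dev->coherent_dma_mask;
	*dev->dma_mask = mask;
	return 0;
}
```

In this model a zero-initialized device fails mock_dma_map_check() until mock_coerce_mask_and_coherent() has been called on it, which is the situation the patch addresses for the rxe ib_device.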

Comments

Jason Gunthorpe Oct. 30, 2020, 5:36 p.m. UTC | #1
On Fri, Oct 30, 2020 at 12:11:07PM -0500, Bob Pearson wrote:
> The commit referenced below performs additional checking on
> devices used for DMA. Specifically it checks that
> 
> device->dma_mask != NULL
> 
> Rdma_rxe uses this device when pinning MR memory but did not
> set the value of dma_mask. In fact rdma_rxe does not perform
> any DMA operations so the value is never used but is checked.
> 
> This patch gives dma_mask a valid value extracted from the device
> backing the ndev used by rxe.
> 
> Without this patch rdma_rxe does not function.
> 
> N.B. This patch needs to be applied before the recent fix to add back
> IB_USER_VERBS_CMD_POST_SEND to uverbs_cmd_mask.
> 
> Dennis Dallesandro reported that Parav's similar patch did not apply
> cleanly to rxe. This one does to for-next head of tree as of yesterday.
> 
> Fixes: f959dcd6ddfd2 ("dma-direct: Fix potential NULL pointer dereference")
> Signed-off-by: Bob Pearson <rpearson@hpe.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_verbs.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 7652d53af2c1..c857e83323ed 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -1128,19 +1128,32 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
>  	int err;
>  	struct ib_device *dev = &rxe->ib_dev;
>  	struct crypto_shash *tfm;
> +	u64 dma_mask;
>  
>  	strlcpy(dev->node_desc, "rxe", sizeof(dev->node_desc));
>  
>  	dev->node_type = RDMA_NODE_IB_CA;
>  	dev->phys_port_cnt = 1;
>  	dev->num_comp_vectors = num_possible_cpus();
> -	dev->dev.parent = rxe_dma_device(rxe);
>  	dev->local_dma_lkey = 0;
>  	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
>  			    rxe->ndev->dev_addr);
>  	dev->dev.dma_parms = &rxe->dma_parms;
>  	dma_set_max_seg_size(&dev->dev, UINT_MAX);
> -	dma_set_coherent_mask(&dev->dev, dma_get_required_mask(&dev->dev));
> +
> +	/* rdma_rxe never does real DMA but does rely on
> +	 * pinning user memory in MRs to avoid page faults
> +	 * in responder and completer tasklets. This code
> +	 * supplies a valid dma_mask from the underlying
> +	 * network device. It is never used but is checked.
> +	 */
> +	dev->dev.parent = rxe_dma_device(rxe);

Oh! This is another bug, the parent of an ib_device should never be
set to a net_device!! This is probably why we get all those mysterious
syzkaller faults :| Just leave it NULL

> +	dma_mask = *(dev->dev.parent->dma_mask);
> +	err = dma_coerce_mask_and_coherent(&dev->dev, dma_mask);

Why not use Parav's logic?

Jason

Bob Pearson Oct. 30, 2020, 5:45 p.m. UTC | #2
On 10/30/20 12:36 PM, Jason Gunthorpe wrote:
> On Fri, Oct 30, 2020 at 12:11:07PM -0500, Bob Pearson wrote:
>> The commit referenced below performs additional checking on
>> devices used for DMA. Specifically it checks that
>>
>> device->dma_mask != NULL
>>
>> Rdma_rxe uses this device when pinning MR memory but did not
>> set the value of dma_mask. In fact rdma_rxe does not perform
>> any DMA operations so the value is never used but is checked.
>>
>> This patch gives dma_mask a valid value extracted from the device
>> backing the ndev used by rxe.
>>
>> Without this patch rdma_rxe does not function.
>>
>> N.B. This patch needs to be applied before the recent fix to add back
>> IB_USER_VERBS_CMD_POST_SEND to uverbs_cmd_mask.
>>
>> Dennis Dallesandro reported that Parav's similar patch did not apply
>> cleanly to rxe. This one does to for-next head of tree as of yesterday.
>>
>> Fixes: f959dcd6ddfd2 ("dma-direct: Fix potential NULL pointer dereference")
>> Signed-off-by: Bob Pearson <rpearson@hpe.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_verbs.c | 18 ++++++++++++++++--
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
>> index 7652d53af2c1..c857e83323ed 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
>> @@ -1128,19 +1128,32 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
>>  	int err;
>>  	struct ib_device *dev = &rxe->ib_dev;
>>  	struct crypto_shash *tfm;
>> +	u64 dma_mask;
>>  
>>  	strlcpy(dev->node_desc, "rxe", sizeof(dev->node_desc));
>>  
>>  	dev->node_type = RDMA_NODE_IB_CA;
>>  	dev->phys_port_cnt = 1;
>>  	dev->num_comp_vectors = num_possible_cpus();
>> -	dev->dev.parent = rxe_dma_device(rxe);
>>  	dev->local_dma_lkey = 0;
>>  	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
>>  			    rxe->ndev->dev_addr);
>>  	dev->dev.dma_parms = &rxe->dma_parms;
>>  	dma_set_max_seg_size(&dev->dev, UINT_MAX);
>> -	dma_set_coherent_mask(&dev->dev, dma_get_required_mask(&dev->dev));
>> +
>> +	/* rdma_rxe never does real DMA but does rely on
>> +	 * pinning user memory in MRs to avoid page faults
>> +	 * in responder and completer tasklets. This code
>> +	 * supplies a valid dma_mask from the underlying
>> +	 * network device. It is never used but is checked.
>> +	 */
>> +	dev->dev.parent = rxe_dma_device(rxe);
> 
> Oh! This is another bug, the parent of an ib_device should never be
> set to a net_device!! This is probably why we get all those mysterious
> syzkaller faults :| Just leave it NULL
> 
>> +	dma_mask = *(dev->dev.parent->dma_mask);
>> +	err = dma_coerce_mask_and_coherent(&dev->dev, dma_mask);
> 
> Why not use Parav's logic?
> 
> Jason
> 

It's not the network device. It is the parent of the network device.
On 64 bit machines it gives 0xffffffffffffffff as dma_mask.

struct device *rxe_dma_device(struct rxe_dev *rxe)
{
        struct net_device *ndev;

        ndev = rxe->ndev;

        if (is_vlan_dev(ndev))
                ndev = vlan_dev_real_dev(ndev);

        return ndev->dev.parent;
}

His should work too. They will behave the same at the end of the day.
I don't really know what the rxe_dma_device() code was trying to do in the
first place so I didn't change it. But it was a handy place to get a dma_mask
that should work on any architecture. If there is no reason to set dev.parent
I can get rid of rxe_dma_device.

Bob
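
For reference, the value 0xffffffffffffffff quoted above is an all-ones 64-bit mask, which is also what the kernel's DMA_BIT_MASK(64) evaluates to. The sketch below is a userspace illustration only: SKETCH_DMA_BIT_MASK is a local copy of that macro's shape, and sketch_addr_fits_mask is a hypothetical helper showing what such a mask expresses, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the shape of the kernel's DMA_BIT_MASK(n) macro. */
#define SKETCH_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper: returns nonzero when every set bit of addr is
 * covered by mask, i.e. the address is reachable under that mask.
 */
static int sketch_addr_fits_mask(uint64_t addr, uint64_t mask)
{
	assert(mask != 0); /* a zero mask would cover no nonzero address */
	return (addr & ~mask) == 0;
}
```

With a 64-bit mask every address fits, which is consistent with the point made in the thread that the value is checked but never actually constrains anything for rxe.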
Jason Gunthorpe Oct. 30, 2020, 5:47 p.m. UTC | #3
On Fri, Oct 30, 2020 at 12:45:54PM -0500, Bob Pearson wrote:
> >> +
> >> +	/* rdma_rxe never does real DMA but does rely on
> >> +	 * pinning user memory in MRs to avoid page faults
> >> +	 * in responder and completer tasklets. This code
> >> +	 * supplies a valid dma_mask from the underlying
> >> +	 * network device. It is never used but is checked.
> >> +	 */
> >> +	dev->dev.parent = rxe_dma_device(rxe);
> > 
> > Oh! This is another bug, the parent of an ib_device should never be
> > set to a net_device!! This is probably why we get all those mysterious
> > syzkaller faults :| Just leave it NULL
> > 
> >> +	dma_mask = *(dev->dev.parent->dma_mask);
> >> +	err = dma_coerce_mask_and_coherent(&dev->dev, dma_mask);
> > 
> > Why not use Parav's logic?
> > 
> > Jason
> 
> It's not the network device. It is the parent of the network device.
> On 64 bit machines it gives 0xffffffffffffffff as dma_mask.

No, it is some weird thing because network devices don't always have
physical device parents.

There is no relation between the netdevice RXE is running on and the
DMA mask to use for the dummy dma ops, AFAICT

> that should work on any architecture. If there is no reason to set
> dev.parent I can get rid of rxe_dma_device.

Please, that arrangement is causing bugs.

Jason
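
The hazard raised here can be illustrated in userspace. The patch reads *(dev->dev.parent->dma_mask) unconditionally, and the objection above is that a network device does not always have a physical parent. The sketch below is an assumption-laden model, not kernel code: sketch_device, sketch_parent_dma_mask and the -ENODEV return value are hypothetical names and choices used only to show the two pointers that would need checking before such a dereference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parent/dma_mask fields of struct device. */
struct sketch_device {
	struct sketch_device *parent; /* NULL when there is no parent device */
	uint64_t *dma_mask;           /* NULL when no mask was ever set */
};

/*
 * Copies the parent's DMA mask into *out. Returns -ENODEV (an assumed
 * error code) when there is no parent or the parent has no mask, instead
 * of dereferencing a NULL pointer.
 */
static int sketch_parent_dma_mask(const struct sketch_device *dev, uint64_t *out)
{
	assert(dev != NULL && out != NULL);
	if (dev->parent == NULL || dev->parent->dma_mask == NULL)
		return -ENODEV;
	*out = *dev->parent->dma_mask;
	return 0;
}
```

This only shows why borrowing the mask from a parent is fragile. The direction requested in the thread is different: drop the parent assignment entirely rather than guard it.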
Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 7652d53af2c1..c857e83323ed 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1128,19 +1128,32 @@  int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	int err;
 	struct ib_device *dev = &rxe->ib_dev;
 	struct crypto_shash *tfm;
+	u64 dma_mask;
 
 	strlcpy(dev->node_desc, "rxe", sizeof(dev->node_desc));
 
 	dev->node_type = RDMA_NODE_IB_CA;
 	dev->phys_port_cnt = 1;
 	dev->num_comp_vectors = num_possible_cpus();
-	dev->dev.parent = rxe_dma_device(rxe);
 	dev->local_dma_lkey = 0;
 	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
 			    rxe->ndev->dev_addr);
 	dev->dev.dma_parms = &rxe->dma_parms;
 	dma_set_max_seg_size(&dev->dev, UINT_MAX);
-	dma_set_coherent_mask(&dev->dev, dma_get_required_mask(&dev->dev));
+
+	/* rdma_rxe never does real DMA but does rely on
+	 * pinning user memory in MRs to avoid page faults
+	 * in responder and completer tasklets. This code
+	 * supplies a valid dma_mask from the underlying
+	 * network device. It is never used but is checked.
+	 */
+	dev->dev.parent = rxe_dma_device(rxe);
+	dma_mask = *(dev->dev.parent->dma_mask);
+	err = dma_coerce_mask_and_coherent(&dev->dev, dma_mask);
+	if (err) {
+		pr_warn("dma_mask not supported\n");
+		return err;
+	}
 
 	dev->uverbs_cmd_mask |= BIT_ULL(IB_USER_VERBS_CMD_REQ_NOTIFY_CQ);