diff mbox series

[-next] RDMA/hns: Fix return in hns_roce_rereg_user_mr()

Message ID 20210804125939.20516-1-yuehaibing@huawei.com (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Series [-next] RDMA/hns: Fix return in hns_roce_rereg_user_mr()

Commit Message

Yue Haibing Aug. 4, 2021, 12:59 p.m. UTC
If re-registering an MR succeeds in hns_roce_rereg_user_mr(), we should
return NULL instead of passing 0 to ERR_PTR.

Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
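
For readers unfamiliar with the pointer-error convention, here is a minimal userspace sketch. The helpers are simplified stand-ins modeled on the kernel's include/linux/err.h, and rereg_result() is a hypothetical function that only mirrors the shape of the patched return path; none of this is the driver's code. In this model ERR_PTR(0) already evaluates to a null pointer, so the patched form makes the success return explicit rather than changing the value handed back.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095UL

/* Simplified userspace stand-ins for the kernel's err.h helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper with the same shape as the patched return path. */
static void *rereg_result(int ret)
{
	if (ret)
		return ERR_PTR(ret);	/* failure: encode the negative errno */
	return NULL;			/* success: keep using the old MR */
}
```

A caller would test the result with IS_ERR() first and only then distinguish NULL from a valid pointer.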

Comments

Leon Romanovsky Aug. 4, 2021, 1:53 p.m. UTC | #1
On Wed, Aug 04, 2021 at 08:59:39PM +0800, YueHaibing wrote:
> If re-registering an MR in hns_roce_rereg_user_mr(), we should
> return NULL instead of pass 0 to ERR_PTR.
> 
> Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> index 006c84bb3f9f..7089ac780291 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> @@ -352,7 +352,9 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
>  free_cmd_mbox:
>  	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
>  
> -	return ERR_PTR(ret);
> +	if (ret)
> +		return ERR_PTR(ret);
> +	return NULL;
>  }

I don't understand this function. It returns either ERR_PTR() or NULL, but
should return &mr->ibmr in the success path. How does it work?

Thanks

>  
>  int hns_roce_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> -- 
> 2.17.1
>
Yue Haibing Aug. 5, 2021, 2:36 a.m. UTC | #2
On 2021/8/4 21:53, Leon Romanovsky wrote:
> On Wed, Aug 04, 2021 at 08:59:39PM +0800, YueHaibing wrote:
>> If re-registering an MR in hns_roce_rereg_user_mr(), we should
>> return NULL instead of pass 0 to ERR_PTR.
>>
>> Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>> ---
>>  drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
>> index 006c84bb3f9f..7089ac780291 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
>> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
>> @@ -352,7 +352,9 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
>>  free_cmd_mbox:
>>  	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
>>  
>> -	return ERR_PTR(ret);
>> +	if (ret)
>> +		return ERR_PTR(ret);
>> +	return NULL;
>>  }
> 
> I don't understand this function, it returns or ERR_PTR() or NULL, but
> should return &mr->ibmr in success path. How does it work?

Did you mean hns_roce_reg_user_mr()?

hns_roce_rereg_user_mr() returns ERR_PTR() on failure and NULL on success.

In ib_uverbs_rereg_mr(), the old mr is used if rereg_user_mr() returns NULL, see:

        new_mr = ib_dev->ops.rereg_user_mr(mr, cmd.flags, cmd.start, cmd.length,
                                           cmd.hca_va, cmd.access_flags, new_pd,
                                           &attrs->driver_udata);
        if (IS_ERR(new_mr)) {
                ret = PTR_ERR(new_mr);
                goto put_new_uobj;
        }
        if (new_mr) {
.....
                mr = new_mr;
        } else {
                if (cmd.flags & IB_MR_REREG_PD) {
                        atomic_dec(&orig_pd->usecnt);
                        mr->pd = new_pd;
                        atomic_inc(&new_pd->usecnt);
                }
                if (cmd.flags & IB_MR_REREG_TRANS)
                        mr->iova = cmd.hca_va;
        }
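
The three outcomes the caller distinguishes can be modeled in userspace roughly as follows. This is only a sketch: struct fake_mr, REREG_TRANS, is_err() and finish_rereg() are hypothetical names invented for illustration, and the part of the real function that is elided above is not reproduced.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095UL
#define REREG_TRANS 0x1	/* hypothetical flag standing in for IB_MR_REREG_TRANS */

struct fake_mr {
	uint64_t iova;
};

/* Simplified stand-in for the kernel's IS_ERR(). */
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Model of the caller: returns a negative errno on failure; otherwise
 * *mr points at the MR that stays in use.
 */
static int finish_rereg(struct fake_mr **mr, struct fake_mr *new_mr,
			int flags, uint64_t hca_va)
{
	if (is_err(new_mr))
		return (int)(intptr_t)new_mr;	/* driver reported an error */

	if (new_mr) {
		*mr = new_mr;			/* driver handed back a replacement MR */
	} else if (flags & REREG_TRANS) {
		(*mr)->iova = hca_va;		/* NULL: old MR was updated in place */
	}
	return 0;
}
```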


> 
> Thanks
> 
>>  
>>  int hns_roce_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>> -- 
>> 2.17.1
>>
> .
>
Leon Romanovsky Aug. 5, 2021, 3:40 a.m. UTC | #3
On Thu, Aug 05, 2021 at 10:36:03AM +0800, YueHaibing wrote:
> On 2021/8/4 21:53, Leon Romanovsky wrote:
> > On Wed, Aug 04, 2021 at 08:59:39PM +0800, YueHaibing wrote:
> >> If re-registering an MR in hns_roce_rereg_user_mr(), we should
> >> return NULL instead of pass 0 to ERR_PTR.
> >>
> >> Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
> >> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> >> ---
> >>  drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> >> index 006c84bb3f9f..7089ac780291 100644
> >> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> >> @@ -352,7 +352,9 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
> >>  free_cmd_mbox:
> >>  	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
> >>  
> >> -	return ERR_PTR(ret);
> >> +	if (ret)
> >> +		return ERR_PTR(ret);
> >> +	return NULL;
> >>  }
> > 
> > I don't understand this function, it returns or ERR_PTR() or NULL, but
> > should return &mr->ibmr in success path. How does it work?
> 
> Did you means hns_roce_reg_user_mr()?
> 
> hns_roce_rereg_user_mr() returns ERR_PTR() on failure, and return NULL on success,
> 
> In ib_uverbs_rereg_mr(), old mr will be used if rereg_user_mr() return NULL, see:
> 
>  829         new_mr = ib_dev->ops.rereg_user_mr(mr, cmd.flags, cmd.start, cmd.length,
>  830                                            cmd.hca_va, cmd.access_flags, new_pd,
>  831                                            &attrs->driver_udata);
>  832         if (IS_ERR(new_mr)) {
>  833                 ret = PTR_ERR(new_mr);
>  834                 goto put_new_uobj;
>  835         }
>  836         if (new_mr) {
> .....
>  860                 mr = new_mr;
>  861         } else {
>  862                 if (cmd.flags & IB_MR_REREG_PD) {
>  863                         atomic_dec(&orig_pd->usecnt);
>  864                         mr->pd = new_pd;
>  865                         atomic_inc(&new_pd->usecnt);
>  866                 }
>  867                 if (cmd.flags & IB_MR_REREG_TRANS)
>  868                         mr->iova = cmd.hca_va;
>  869         }

You overwrite various fields in old_mr when executing hns_roce_rereg_user_mr().
For example, mr->access flags are not restored to their original
state after a failure.

Also, I'm not sure whether it is valid to return NULL in all flows.

Thanks

> 
> 
> > 
> > Thanks
> > 
> >>  
> >>  int hns_roce_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> >> -- 
> >> 2.17.1
> >>
> > .
> >
Yue Haibing Aug. 5, 2021, 9:29 a.m. UTC | #4
On 2021/8/5 11:40, Leon Romanovsky wrote:
> On Thu, Aug 05, 2021 at 10:36:03AM +0800, YueHaibing wrote:
>> On 2021/8/4 21:53, Leon Romanovsky wrote:
>>> On Wed, Aug 04, 2021 at 08:59:39PM +0800, YueHaibing wrote:
>>>> If re-registering an MR in hns_roce_rereg_user_mr(), we should
>>>> return NULL instead of pass 0 to ERR_PTR.
>>>>
>>>> Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
>>>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>>>> ---
>>>>  drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
>>>> index 006c84bb3f9f..7089ac780291 100644
>>>> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
>>>> @@ -352,7 +352,9 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
>>>>  free_cmd_mbox:
>>>>  	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
>>>>  
>>>> -	return ERR_PTR(ret);
>>>> +	if (ret)
>>>> +		return ERR_PTR(ret);
>>>> +	return NULL;
>>>>  }
>>>
>>> I don't understand this function, it returns or ERR_PTR() or NULL, but
>>> should return &mr->ibmr in success path. How does it work?
>>
>> Did you means hns_roce_reg_user_mr()?
>>
>> hns_roce_rereg_user_mr() returns ERR_PTR() on failure, and return NULL on success,
>>
>> In ib_uverbs_rereg_mr(), old mr will be used if rereg_user_mr() return NULL, see:
>>
>>  829         new_mr = ib_dev->ops.rereg_user_mr(mr, cmd.flags, cmd.start, cmd.length,
>>  830                                            cmd.hca_va, cmd.access_flags, new_pd,
>>  831                                            &attrs->driver_udata);
>>  832         if (IS_ERR(new_mr)) {
>>  833                 ret = PTR_ERR(new_mr);
>>  834                 goto put_new_uobj;
>>  835         }
>>  836         if (new_mr) {
>> .....
>>  860                 mr = new_mr;
>>  861         } else {
>>  862                 if (cmd.flags & IB_MR_REREG_PD) {
>>  863                         atomic_dec(&orig_pd->usecnt);
>>  864                         mr->pd = new_pd;
>>  865                         atomic_inc(&new_pd->usecnt);
>>  866                 }
>>  867                 if (cmd.flags & IB_MR_REREG_TRANS)
>>  868                         mr->iova = cmd.hca_va;
>>  869         }
> 
> You overwrite various fields in old_mr when executing hns_roce_rereg_user_mr().
> For example mr->access flags, which is not returned to the original
> state after all failures.

IMO, if ibv_rereg_mr fails, the mr is in an undefined state and the user needs to call
ibv_dereg_mr to release it, so there is no need to recover the original state.

Also, mlx4_ib_rereg_user_mr seems to do the same thing.

> 
> Also I'm not so sure about if it is valid to return NULL in all flows.
> 
> Thanks
> 
>>
>>
>>>
>>> Thanks
>>>
>>>>  
>>>>  int hns_roce_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>>>> -- 
>>>> 2.17.1
>>>>
>>> .
>>>
> .
>
Leon Romanovsky Aug. 5, 2021, 10:58 a.m. UTC | #5
On Thu, Aug 05, 2021 at 05:29:25PM +0800, YueHaibing wrote:
> On 2021/8/5 11:40, Leon Romanovsky wrote:
> > On Thu, Aug 05, 2021 at 10:36:03AM +0800, YueHaibing wrote:
> >> On 2021/8/4 21:53, Leon Romanovsky wrote:
> >>> On Wed, Aug 04, 2021 at 08:59:39PM +0800, YueHaibing wrote:
> >>>> If re-registering an MR in hns_roce_rereg_user_mr(), we should
> >>>> return NULL instead of pass 0 to ERR_PTR.
> >>>>
> >>>> Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
> >>>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> >>>> ---
> >>>>  drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
> >>>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> >>>> index 006c84bb3f9f..7089ac780291 100644
> >>>> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> >>>> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> >>>> @@ -352,7 +352,9 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
> >>>>  free_cmd_mbox:
> >>>>  	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
> >>>>  
> >>>> -	return ERR_PTR(ret);
> >>>> +	if (ret)
> >>>> +		return ERR_PTR(ret);
> >>>> +	return NULL;
> >>>>  }
> >>>
> >>> I don't understand this function, it returns or ERR_PTR() or NULL, but
> >>> should return &mr->ibmr in success path. How does it work?
> >>
> >> Did you means hns_roce_reg_user_mr()?
> >>
> >> hns_roce_rereg_user_mr() returns ERR_PTR() on failure, and return NULL on success,
> >>
> >> In ib_uverbs_rereg_mr(), old mr will be used if rereg_user_mr() return NULL, see:
> >>
> >>  829         new_mr = ib_dev->ops.rereg_user_mr(mr, cmd.flags, cmd.start, cmd.length,
> >>  830                                            cmd.hca_va, cmd.access_flags, new_pd,
> >>  831                                            &attrs->driver_udata);
> >>  832         if (IS_ERR(new_mr)) {
> >>  833                 ret = PTR_ERR(new_mr);
> >>  834                 goto put_new_uobj;
> >>  835         }
> >>  836         if (new_mr) {
> >> .....
> >>  860                 mr = new_mr;
> >>  861         } else {
> >>  862                 if (cmd.flags & IB_MR_REREG_PD) {
> >>  863                         atomic_dec(&orig_pd->usecnt);
> >>  864                         mr->pd = new_pd;
> >>  865                         atomic_inc(&new_pd->usecnt);
> >>  866                 }
> >>  867                 if (cmd.flags & IB_MR_REREG_TRANS)
> >>  868                         mr->iova = cmd.hca_va;
> >>  869         }
> > 
> > You overwrite various fields in old_mr when executing hns_roce_rereg_user_mr().
> > For example mr->access flags, which is not returned to the original
> > state after all failures.
> 
> IMO, if ibv_rereg_mr failed, the mr is in undefined state, user needs to call
> ibv_dereg_mr in order to release it, so there no need to recover the original state.

The thing is that it is an undefined state in the kernel.
What happens if the user changes access_flags and tries to use that
"broken" MR anyway? Will you catch it?

> 
> Also, mlx4_ib_rereg_user_mr seems to do the same thing.

mlx4 does many crazy things.

> 
> > 
> > Also I'm not so sure about if it is valid to return NULL in all flows.
> > 
> > Thanks
> > 
> >>
> >>
> >>>
> >>> Thanks
> >>>
> >>>>  
> >>>>  int hns_roce_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> >>>> -- 
> >>>> 2.17.1
> >>>>
> >>> .
> >>>
> > .
> >
Jason Gunthorpe Aug. 5, 2021, 12:23 p.m. UTC | #6
On Thu, Aug 05, 2021 at 01:58:53PM +0300, Leon Romanovsky wrote:

> > IMO, if ibv_rereg_mr failed, the mr is in undefined state, user
> > needs to call ibv_dereg_mr in order to release it, so there no
> > need to recover the original state.
> 
> The thing is that it undefined state in the kernel.  What will be if
> user will change access_flags and try to use that "broken" MR
> anyway? Will you catch it?

rereg is not atomic, if the rereg fails in the middle the mr should be
left in some safe state.

Jason
Leon Romanovsky Aug. 5, 2021, 12:30 p.m. UTC | #7
On Thu, Aug 05, 2021 at 09:23:11AM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 05, 2021 at 01:58:53PM +0300, Leon Romanovsky wrote:
> 
> > > IMO, if ibv_rereg_mr failed, the mr is in undefined state, user
> > > needs to call ibv_dereg_mr in order to release it, so there no
> > > need to recover the original state.
> > 
> > The thing is that it undefined state in the kernel.  What will be if
> > user will change access_flags and try to use that "broken" MR
> > anyway? Will you catch it?
> 
> rereg is not atomic, if the rereg fails in the middle the mr should be
> left in some safe state.

That is not the case in the hns flow; they leave such an MR in a limbo state.

> 
> Jason
Jason Gunthorpe Aug. 19, 2021, 2:18 p.m. UTC | #8
On Wed, Aug 04, 2021 at 08:59:39PM +0800, YueHaibing wrote:
> If re-registering an MR in hns_roce_rereg_user_mr(), we should
> return NULL instead of pass 0 to ERR_PTR.
> 
> Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>  drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Applied to for-next, though hns should be checked to ensure MRs are
not left in some broken state after rereg failure.

Thanks,
Jason
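
One way to keep an MR usable after a failed re-registration is to snapshot the fields that will be overwritten and put them back on error. The sketch below is a userspace illustration under stated assumptions only: struct fake_mr, hw_update() and rereg_with_rollback() are hypothetical, and the thread asks for "some safe state" without prescribing this particular approach or saying how hns should implement it.

```c
#include <stdint.h>

struct fake_mr {
	int access;
	uint64_t iova;
};

/* Hypothetical hardware step; fails on request, standing in for a mailbox command. */
static int hw_update(int should_fail)
{
	return should_fail ? -5 : 0;
}

static int rereg_with_rollback(struct fake_mr *mr, int access, uint64_t iova,
			       int should_fail)
{
	struct fake_mr saved = *mr;	/* snapshot before touching anything */
	int ret;

	mr->access = access;
	mr->iova = iova;

	ret = hw_update(should_fail);
	if (ret)
		*mr = saved;		/* restore so the MR is not left half-updated */
	return ret;
}
```

An alternative with the same goal would be to stage the new values in locals and commit them to the MR only after the hardware step succeeds.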

Patch

diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index 006c84bb3f9f..7089ac780291 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -352,7 +352,9 @@  struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
 free_cmd_mbox:
 	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
 
-	return ERR_PTR(ret);
+	if (ret)
+		return ERR_PTR(ret);
+	return NULL;
 }
 
 int hns_roce_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)