diff mbox series

[for-rc,2/2] IB/mlx5: Fix how advise_mr() launches async work

Message ID 20190112023124.21647-3-jgg@ziepe.ca (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Headers show
Series Fix poor reference counting in advice_mr work | expand

Commit Message

Jason Gunthorpe Jan. 12, 2019, 2:31 a.m. UTC
From: Jason Gunthorpe <jgg@mellanox.com>

Work must hold a kref on the ib_device otherwise the dev pointer can
become free before the work runs.

Remove the bogus use of 'reg_state':
 - While in uverbs the reg_state is guaranteed to always be
   REGISTERED
 - Testing reg_state with no locking is bogus. Use ib_device_try_get()
   to get back into a region that prevents unregistration.

Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Leon Romanovsky Jan. 12, 2019, 3:06 p.m. UTC | #1
On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> Work must hold a kref on the ib_device otherwise the dev pointer can
> become free before the work runs.

How can it occur exactly?

"dev" will be released after mlx5_ib will exist, mlx5_ib exist will
drain advise_mr queue before it. Such drain will ensure that no works
are running.

>
> Remove the bogus use of 'reg_state':
>  - While in uverbs the reg_state is guaranteed to always be
>    REGISTERED
>  - Testing reg_state with no locking is bogus. Use ib_device_try_get()
>    to get back into a region that prevents unregistration.
>
> Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
>  drivers/infiniband/hw/mlx5/odp.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> index 8d46b1dc56580f..a241a65834d4ca 100644
> --- a/drivers/infiniband/hw/mlx5/odp.c
> +++ b/drivers/infiniband/hw/mlx5/odp.c
> @@ -1595,10 +1595,12 @@ static void mlx5_ib_prefetch_mr_work(struct work_struct *work)
>  	struct prefetch_mr_work *w =
>  		container_of(work, struct prefetch_mr_work, work);
>
> -	if (w->dev->ib_dev.reg_state == IB_DEV_REGISTERED)
> +	if (ib_device_try_get(&w->dev->ib_dev)) {
>  		mlx5_ib_prefetch_sg_list(w->dev, w->pf_flags, w->sg_list,
>  					 w->num_sge);
> -
> +		ib_device_put(&w->dev->ib_dev);
> +	}
> +	put_device(&w->dev->ib_dev.dev);
>  	kfree(w);
>  }
>
> @@ -1617,15 +1619,13 @@ int mlx5_ib_advise_mr_prefetch(struct ib_pd *pd,
>  		return mlx5_ib_prefetch_sg_list(dev, pf_flags, sg_list,
>  						num_sge);
>
> -	if (dev->ib_dev.reg_state != IB_DEV_REGISTERED)
> -		return -ENODEV;
> -
>  	work = kvzalloc(struct_size(work, sg_list, num_sge), GFP_KERNEL);
>  	if (!work)
>  		return -ENOMEM;
>
>  	memcpy(work->sg_list, sg_list, num_sge * sizeof(struct ib_sge));
>
> +	get_device(&dev->ib_dev.dev);

It looks extremely wrong to sprinkle driver code with general
get_device/put_device.

>  	work->dev = dev;
>  	work->pf_flags = pf_flags;
>  	work->num_sge = num_sge;
> --
> 2.20.1
>
Jason Gunthorpe Jan. 12, 2019, 5:05 p.m. UTC | #2
On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> >
> > Work must hold a kref on the ib_device otherwise the dev pointer can
> > become free before the work runs.
> 
> How can it occur exactly?
> 
> "dev" will be released after mlx5_ib will exist, mlx5_ib exist will
> drain advise_mr queue before it. Such drain will ensure that no works
> are running.

Okay, that is fine for the get_device, it can be removed.

Are we OK with the reg_state hunk?

Jason
Parav Pandit Jan. 12, 2019, 5:32 p.m. UTC | #3
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> Sent: Saturday, January 12, 2019 11:06 AM
> To: Leon Romanovsky <leon@kernel.org>
> Cc: linux-rdma@vger.kernel.org; Moni Shoua <monis@mellanox.com>
> Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> work
> 
> On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > From: Jason Gunthorpe <jgg@mellanox.com>
> > >
> > > Work must hold a kref on the ib_device otherwise the dev pointer can
> > > become free before the work runs.
> >
> > How can it occur exactly?
> >
> > "dev" will be released after mlx5_ib will exist, mlx5_ib exist will
> > drain advise_mr queue before it. Such drain will ensure that no works
> > are running.
> 
> Okay, that is fine for the get_device, it can be removed.
> 
> Are we OK with the reg_state hunk?
> 
> Jason

Unrelated to your question,
I noticed that advise_mr does uverbs_attr_ptr_get_array_size() for the SGEs.
I think this is  incorrect.
It should have been mr handles whose ownership can be check against the PID or ucontext that registered those MRs.

With ib_sge, now one user can start page faulting MRs of another process of another user?
How is this protected? Uverbs shouldn't be deriving uverbs handle from mr key of ib_sge.

Who enqueues work to dev->advise_mr_wq? schedule_work() in mlx5_ib_advise_mr_prefetch() does to global workqueue.
It should be done to advise_mr_wq().
This will give chance to flush the wq when IB device is unregistered by the core.
Jason Gunthorpe Jan. 12, 2019, 5:38 p.m. UTC | #4
On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote:
> 
> 
> > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > Sent: Saturday, January 12, 2019 11:06 AM
> > To: Leon Romanovsky <leon@kernel.org>
> > Cc: linux-rdma@vger.kernel.org; Moni Shoua <monis@mellanox.com>
> > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > work
> > 
> > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > >
> > > > Work must hold a kref on the ib_device otherwise the dev pointer can
> > > > become free before the work runs.
> > >
> > > How can it occur exactly?
> > >
> > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist will
> > > drain advise_mr queue before it. Such drain will ensure that no works
> > > are running.
> > 
> > Okay, that is fine for the get_device, it can be removed.
> > 
> > Are we OK with the reg_state hunk?
> > 
> > Jason
> 
> Unrelated to your question, I noticed that advise_mr does
> uverbs_attr_ptr_get_array_size() for the SGEs.  I think this is
> incorrect.  It should have been mr handles whose ownership can be
> check against the PID or ucontext that registered those MRs.

The SGE's have lkeys in them, and the lkeys are local to the PD passed
in as argument.

So the caller cannot touch resources outside the PD, hopefully the
driver implementation checks this..
 
> Who enqueues work to dev->advise_mr_wq? schedule_work() in
> mlx5_ib_advise_mr_prefetch() does to global workqueue. 

Ah! I *thought* I checked this, yes, using the system work queue is
why I added the put_device :)

> It should be done to advise_mr_wq().  This will give chance to flush
> the wq when IB device is unregistered by the core.

Good question - Moni??

Jason
Parav Pandit Jan. 12, 2019, 6:06 p.m. UTC | #5
> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Saturday, January 12, 2019 11:39 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org; Moni
> Shoua <monis@mellanox.com>
> Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> work
> 
> On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote:
> >
> >
> > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > > Sent: Saturday, January 12, 2019 11:06 AM
> > > To: Leon Romanovsky <leon@kernel.org>
> > > Cc: linux-rdma@vger.kernel.org; Moni Shoua <monis@mellanox.com>
> > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr()
> > > launches async work
> > >
> > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > >
> > > > > Work must hold a kref on the ib_device otherwise the dev pointer
> > > > > can become free before the work runs.
> > > >
> > > > How can it occur exactly?
> > > >
> > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist
> > > > will drain advise_mr queue before it. Such drain will ensure that
> > > > no works are running.
> > >
> > > Okay, that is fine for the get_device, it can be removed.
> > >
> > > Are we OK with the reg_state hunk?
> > >
> > > Jason
> >
> > Unrelated to your question, I noticed that advise_mr does
> > uverbs_attr_ptr_get_array_size() for the SGEs.  I think this is
> > incorrect.  It should have been mr handles whose ownership can be
> > check against the PID or ucontext that registered those MRs.
> 
> The SGE's have lkeys in them, and the lkeys are local to the PD passed in as
> argument.
> 
> So the caller cannot touch resources outside the PD, hopefully the driver
> implementation checks this..
>
pagefault_single_data_segment () invokes __mlx5_mr_lookup() to lookup a MR per device and not per PD.
mlx5_ib_advise_mr_prefetch() doesn't touch pd.
work item doesn't have PD in it.
Lets say PD check is added and it passes in pagefault_single_data_segment() under device overloaded srcu lock in work item context.

After that MR got reallocated to other process before work item can do the work and now it fault's some other process.

To avoid this, instead of overloading dev->mr_srcu, this work item should be enqueued to advise_mr_wq and that should be flushed during destroy_mr() which ensures that nothing dangling for this MR.
Such code will create right synchronization without depending on randomness of key, number of MR etc parameters.

> > Who enqueues work to dev->advise_mr_wq? schedule_work() in
> > mlx5_ib_advise_mr_prefetch() does to global workqueue.
> 
> Ah! I *thought* I checked this, yes, using the system work queue is why I
> added the put_device :)
> 
> > It should be done to advise_mr_wq().  This will give chance to flush
> > the wq when IB device is unregistered by the core.
> 
> Good question - Moni??
> 
> Jason
Jason Gunthorpe Jan. 12, 2019, 6:41 p.m. UTC | #6
On Sat, Jan 12, 2019 at 11:06:51AM -0700, Parav Pandit wrote:
> 
> 
> > From: Jason Gunthorpe
> > Sent: Saturday, January 12, 2019 11:39 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org; Moni
> > Shoua <monis@mellanox.com>
> > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > work
> > 
> > On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote:
> > >
> > >
> > > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > > > Sent: Saturday, January 12, 2019 11:06 AM
> > > > To: Leon Romanovsky <leon@kernel.org>
> > > > Cc: linux-rdma@vger.kernel.org; Moni Shoua <monis@mellanox.com>
> > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr()
> > > > launches async work
> > > >
> > > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > > >
> > > > > > Work must hold a kref on the ib_device otherwise the dev pointer
> > > > > > can become free before the work runs.
> > > > >
> > > > > How can it occur exactly?
> > > > >
> > > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist
> > > > > will drain advise_mr queue before it. Such drain will ensure that
> > > > > no works are running.
> > > >
> > > > Okay, that is fine for the get_device, it can be removed.
> > > >
> > > > Are we OK with the reg_state hunk?
> > > >
> > > > Jason
> > >
> > > Unrelated to your question, I noticed that advise_mr does
> > > uverbs_attr_ptr_get_array_size() for the SGEs.  I think this is
> > > incorrect.  It should have been mr handles whose ownership can be
> > > check against the PID or ucontext that registered those MRs.
> > 
> > The SGE's have lkeys in them, and the lkeys are local to the PD passed in as
> > argument.
> > 
> > So the caller cannot touch resources outside the PD, hopefully the driver
> > implementation checks this..
> >
> pagefault_single_data_segment () invokes __mlx5_mr_lookup() to
> lookup a MR per device and not per PD.  mlx5_ib_advise_mr_prefetch()
> doesn't touch pd.  work item doesn't have PD in it.

Ugh, Moni?

I don't think we should alow processes to page fault outside their PD,
that sounds like an avenue for a side-channel security problem.

> After that MR got reallocated to other process before work item can
> do the work and now it fault's some other process.

The locking around mr_srcu and MR lifetime is all broken, so this is
not too surprising..
 
> To avoid this, instead of overloading dev->mr_srcu, this work item
> should be enqueued to advise_mr_wq and that should be flushed during
> destroy_mr() which ensures that nothing dangling for this MR.  Such
> code will create right synchronization without depending on
> randomness of key, number of MR etc parameters.

Moni?

Jason
Leon Romanovsky Jan. 13, 2019, 8:26 a.m. UTC | #7
On Sat, Jan 12, 2019 at 05:05:49PM +0000, Jason Gunthorpe wrote:
> On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > From: Jason Gunthorpe <jgg@mellanox.com>
> > >
> > > Work must hold a kref on the ib_device otherwise the dev pointer can
> > > become free before the work runs.
> >
> > How can it occur exactly?
> >
> > "dev" will be released after mlx5_ib will exist, mlx5_ib exist will
> > drain advise_mr queue before it. Such drain will ensure that no works
> > are running.
>
> Okay, that is fine for the get_device, it can be removed.
>
> Are we OK with the reg_state hunk?

I'm not sure about that
Maybe current code is not nice as you suggested but IB_DEV_REGISTERED
check also valid approach.

More general, I don't like how IB/core handles states and IMHO we should
push our stack to be similar to block layer elevator. Block layer implements
clear stages and calls to callbacks provided by drivers. It removes the need
to manage and check any state in those drivers.

If we follow this approach, we will fix natively all those unregister/reset
bogus flows and eliminate the need to do kref on everything.

Thanks

>
> Jason
Leon Romanovsky Jan. 13, 2019, 8:46 a.m. UTC | #8
On Sat, Jan 12, 2019 at 05:38:52PM +0000, Jason Gunthorpe wrote:
> On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote:
> >
> >
> > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > > Sent: Saturday, January 12, 2019 11:06 AM
> > > To: Leon Romanovsky <leon@kernel.org>
> > > Cc: linux-rdma@vger.kernel.org; Moni Shoua <monis@mellanox.com>
> > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > > work
> > >
> > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > >
> > > > > Work must hold a kref on the ib_device otherwise the dev pointer can
> > > > > become free before the work runs.
> > > >
> > > > How can it occur exactly?
> > > >
> > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist will
> > > > drain advise_mr queue before it. Such drain will ensure that no works
> > > > are running.
> > >
> > > Okay, that is fine for the get_device, it can be removed.
> > >
> > > Are we OK with the reg_state hunk?
> > >
> > > Jason
> >
> > Unrelated to your question, I noticed that advise_mr does
> > uverbs_attr_ptr_get_array_size() for the SGEs.  I think this is
> > incorrect.  It should have been mr handles whose ownership can be
> > check against the PID or ucontext that registered those MRs.
>
> The SGE's have lkeys in them, and the lkeys are local to the PD passed
> in as argument.
>
> So the caller cannot touch resources outside the PD, hopefully the
> driver implementation checks this..
>
> > Who enqueues work to dev->advise_mr_wq? schedule_work() in
> > mlx5_ib_advise_mr_prefetch() does to global workqueue.
>
> Ah! I *thought* I checked this, yes, using the system work queue is
> why I added the put_device :)
>
> > It should be done to advise_mr_wq().  This will give chance to flush
> > the wq when IB device is unregistered by the core.
>
> Good question - Moni??

It is definitely missed in my review, but if the IB/core doesn't flush
general system queue on driver removal, we are better to fix this.

First addition of new works should be stopped and second system
queue should be flushed. My general guideline is to avoid creating
extra workqueues and I saw number of drivers are using system
queues.

Thanks

>
> Jason
Moni Shoua Jan. 13, 2019, 4:57 p.m. UTC | #9
> Ah! I *thought* I checked this, yes, using the system work queue is
> why I added the put_device :)
>
> > It should be done to advise_mr_wq().  This will give chance to flush
> > the wq when IB device is unregistered by the core.
>
> Good question - Moni??
>
> Jason
Thanks.
Using schedule_work() is obviously a bug. The intention was to use the
advise_mr_wq. I'll send a fix for that.
However, After this will be fixed there is no need to get_device()
since it is assured that no work items will be processed after
mlx5_ib_stage_init_cleanup() finishes.
Moni Shoua Jan. 13, 2019, 5:19 p.m. UTC | #10
On Sat, Jan 12, 2019 at 8:42 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Sat, Jan 12, 2019 at 11:06:51AM -0700, Parav Pandit wrote:
> >
> >
> > > From: Jason Gunthorpe
> > > Sent: Saturday, January 12, 2019 11:39 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org; Moni
> > > Shoua <monis@mellanox.com>
> > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > > work
> > >
> > > On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > > > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > > > > Sent: Saturday, January 12, 2019 11:06 AM
> > > > > To: Leon Romanovsky <leon@kernel.org>
> > > > > Cc: linux-rdma@vger.kernel.org; Moni Shoua <monis@mellanox.com>
> > > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr()
> > > > > launches async work
> > > > >
> > > > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > > > >
> > > > > > > Work must hold a kref on the ib_device otherwise the dev pointer
> > > > > > > can become free before the work runs.
> > > > > >
> > > > > > How can it occur exactly?
> > > > > >
> > > > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist
> > > > > > will drain advise_mr queue before it. Such drain will ensure that
> > > > > > no works are running.
> > > > >
> > > > > Okay, that is fine for the get_device, it can be removed.
> > > > >
> > > > > Are we OK with the reg_state hunk?
> > > > >
> > > > > Jason
> > > >
> > > > Unrelated to your question, I noticed that advise_mr does
> > > > uverbs_attr_ptr_get_array_size() for the SGEs.  I think this is
> > > > incorrect.  It should have been mr handles whose ownership can be
> > > > check against the PID or ucontext that registered those MRs.
> > >
> > > The SGE's have lkeys in them, and the lkeys are local to the PD passed in as
> > > argument.
> > >
> > > So the caller cannot touch resources outside the PD, hopefully the driver
> > > implementation checks this..
> > >
> > pagefault_single_data_segment () invokes __mlx5_mr_lookup() to
> > lookup a MR per device and not per PD.  mlx5_ib_advise_mr_prefetch()
> > doesn't touch pd.  work item doesn't have PD in it.
>
> Ugh, Moni?
>
> I don't think we should alow processes to page fault outside their PD,
> that sounds like an avenue for a side-channel security problem.
>

I can pass pd from prefetch_mr down to pagefault_single_data_segment()
and force that that lookup will be also for pd.
However there is another flow that goes through
pagefault_single_data_segment()  that starts from hardware page-fault.
What will be the pd in this case? Perhaps the pd of the
source/destination QP of the packet that generated the pagefault? But
I can do this
only if the type of page-fault is WQE. In this case the hardware
provides the QPN. For page-fault of type RDMA the hardware doesn't
provide the QPN, only the lkey, address and size (dev is the one that
fired the event).
So if there is a potential security risk in the advise_mr flow doesn't
it exist in the page-fault flow? A malicious packet can do the same
thing a malicious process does.
What do you think?

> > After that MR got reallocated to other process before work item can
> > do the work and now it fault's some other process.
>
> The locking around mr_srcu and MR lifetime is all broken, so this is
> not too surprising..
>
> > To avoid this, instead of overloading dev->mr_srcu, this work item
> > should be enqueued to advise_mr_wq and that should be flushed during
> > destroy_mr() which ensures that nothing dangling for this MR.  Such
> > code will create right synchronization without depending on
> > randomness of key, number of MR etc parameters.
>
> Moni?
>
> Jason
Parav Pandit Jan. 13, 2019, 5:26 p.m. UTC | #11
> -----Original Message-----
> From: Moni Shoua <monis@mellanox.com>
> Sent: Sunday, January 13, 2019 11:19 AM
> To: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Parav Pandit <parav@mellanox.com>; Leon Romanovsky
> <leon@kernel.org>; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> work
> 
> On Sat, Jan 12, 2019 at 8:42 PM Jason Gunthorpe <jgg@mellanox.com>
> wrote:
> >
> > On Sat, Jan 12, 2019 at 11:06:51AM -0700, Parav Pandit wrote:
> > >
> > >
> > > > From: Jason Gunthorpe
> > > > Sent: Saturday, January 12, 2019 11:39 AM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org;
> > > > Moni Shoua <monis@mellanox.com>
> > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr()
> > > > launches async work
> > > >
> > > > On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > > > > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > > > > > Sent: Saturday, January 12, 2019 11:06 AM
> > > > > > To: Leon Romanovsky <leon@kernel.org>
> > > > > > Cc: linux-rdma@vger.kernel.org; Moni Shoua
> > > > > > <monis@mellanox.com>
> > > > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr()
> > > > > > launches async work
> > > > > >
> > > > > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > > > > >
> > > > > > > > Work must hold a kref on the ib_device otherwise the dev
> > > > > > > > pointer can become free before the work runs.
> > > > > > >
> > > > > > > How can it occur exactly?
> > > > > > >
> > > > > > > "dev" will be released after mlx5_ib will exist, mlx5_ib
> > > > > > > exist will drain advise_mr queue before it. Such drain will
> > > > > > > ensure that no works are running.
> > > > > >
> > > > > > Okay, that is fine for the get_device, it can be removed.
> > > > > >
> > > > > > Are we OK with the reg_state hunk?
> > > > > >
> > > > > > Jason
> > > > >
> > > > > Unrelated to your question, I noticed that advise_mr does
> > > > > uverbs_attr_ptr_get_array_size() for the SGEs.  I think this is
> > > > > incorrect.  It should have been mr handles whose ownership can
> > > > > be check against the PID or ucontext that registered those MRs.
> > > >
> > > > The SGE's have lkeys in them, and the lkeys are local to the PD
> > > > passed in as argument.
> > > >
> > > > So the caller cannot touch resources outside the PD, hopefully the
> > > > driver implementation checks this..
> > > >
> > > pagefault_single_data_segment () invokes __mlx5_mr_lookup() to
> > > lookup a MR per device and not per PD.  mlx5_ib_advise_mr_prefetch()
> > > doesn't touch pd.  work item doesn't have PD in it.
> >
> > Ugh, Moni?
> >
> > I don't think we should alow processes to page fault outside their PD,
> > that sounds like an avenue for a side-channel security problem.
> >
> 
> I can pass pd from prefetch_mr down to pagefault_single_data_segment()
> and force that that lookup will be also for pd.
> However there is another flow that goes through
> pagefault_single_data_segment()  that starts from hardware page-fault.
> What will be the pd in this case? Perhaps the pd of the source/destination
> QP of the packet that generated the pagefault? But I can do this only if the
> type of page-fault is WQE. In this case the hardware provides the QPN. For
> page-fault of type RDMA the hardware doesn't provide the QPN, only the
> lkey, address and size (dev is the one that fired the event).
> So if there is a potential security risk in the advise_mr flow doesn't it exist in
> the page-fault flow? A malicious packet can do the same thing a malicious
> process does.
> What do you think?

When hw triggered the page fault it has passed the checks of PD for a QP and MR from the QP and MR context including honoring read/write/atomic access rights.
Once those checks pass, it should ask for page fault.
If MR gets destroyed in this process before page fault, host driver will be terminating the MR and all pending transactions.
Parav Pandit Jan. 13, 2019, 6:16 p.m. UTC | #12
> -----Original Message-----
> From: Moni Shoua <monis@mellanox.com>
> Sent: Sunday, January 13, 2019 10:57 AM
> To: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Parav Pandit <parav@mellanox.com>; Leon Romanovsky
> <leon@kernel.org>; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> work
> 
> > Ah! I *thought* I checked this, yes, using the system work queue is
> > why I added the put_device :)
> >
> > > It should be done to advise_mr_wq().  This will give chance to flush
> > > the wq when IB device is unregistered by the core.
> >
> > Good question - Moni??
> >
> > Jason
> Thanks.
> Using schedule_work() is obviously a bug. The intention was to use the
> advise_mr_wq. I'll send a fix for that.
> However, After this will be fixed there is no need to get_device() since it is
> assured that no work items will be processed after
> mlx5_ib_stage_init_cleanup() finishes.

Also you either need to flush the advise_mr_wq or cancel the pending work item when MR is destroyed to make sure that whatever page fault user triggered are canceled. Otherwise a MR can get recycled (dealloc, alloc) (looked up by MR key) for same PD which is page faulting now, which user didn't ask to.
Moni Shoua Jan. 14, 2019, 1:09 p.m. UTC | #13
On Sun, Jan 13, 2019 at 8:18 PM Parav Pandit <parav@mellanox.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Moni Shoua <monis@mellanox.com>
> > Sent: Sunday, January 13, 2019 10:57 AM
> > To: Jason Gunthorpe <jgg@mellanox.com>
> > Cc: Parav Pandit <parav@mellanox.com>; Leon Romanovsky
> > <leon@kernel.org>; linux-rdma@vger.kernel.org
> > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > work
> >
> > > Ah! I *thought* I checked this, yes, using the system work queue is
> > > why I added the put_device :)
> > >
> > > > It should be done to advise_mr_wq().  This will give chance to flush
> > > > the wq when IB device is unregistered by the core.
> > >
> > > Good question - Moni??
> > >
> > > Jason
> > Thanks.
> > Using schedule_work() is obviously a bug. The intention was to use the
> > advise_mr_wq. I'll send a fix for that.
> > However, After this will be fixed there is no need to get_device() since it is
> > assured that no work items will be processed after
> > mlx5_ib_stage_init_cleanup() finishes.
>
> Also you either need to flush the advise_mr_wq or cancel the pending work item when MR is destroyed to make sure that whatever page fault user triggered are canceled. Otherwise a MR can get recycled (dealloc, alloc) (looked up by MR key) for same PD which is page faulting now, which user didn't ask to.
Moni Shoua Jan. 14, 2019, 1:13 p.m. UTC | #14
On Mon, Jan 14, 2019 at 3:09 PM Moni Shoua <monis@mellanox.com> wrote:
>
> On Sun, Jan 13, 2019 at 8:18 PM Parav Pandit <parav@mellanox.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Moni Shoua <monis@mellanox.com>
> > > Sent: Sunday, January 13, 2019 10:57 AM
> > > To: Jason Gunthorpe <jgg@mellanox.com>
> > > Cc: Parav Pandit <parav@mellanox.com>; Leon Romanovsky
> > > <leon@kernel.org>; linux-rdma@vger.kernel.org
> > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > > work
> > >
> > > > Ah! I *thought* I checked this, yes, using the system work queue is
> > > > why I added the put_device :)
> > > >
> > > > > It should be done to advise_mr_wq().  This will give chance to flush
> > > > > the wq when IB device is unregistered by the core.
> > > >
> > > > Good question - Moni??
> > > >
> > > > Jason
> > > Thanks.
> > > Using schedule_work() is obviously a bug. The intention was to use the
> > > advise_mr_wq. I'll send a fix for that.
> > > However, After this will be fixed there is no need to get_device() since it is
> > > assured that no work items will be processed after
> > > mlx5_ib_stage_init_cleanup() finishes.
> >
> > Also you either need to flush the advise_mr_wq or cancel the pending work item when MR is destroyed to make sure that whatever page fault user triggered are canceled. Otherwise a MR can get recycled (dealloc, alloc) (looked up by MR key) for same PD which is page faulting now, which user didn't ask to.

There is also a check for valid address. So, if the the new MR has
different address range then the operation will not take and if the
address is in range of the new MR so there is no difference.
Jason Gunthorpe Jan. 14, 2019, 5:32 p.m. UTC | #15
On Sun, Jan 13, 2019 at 10:26:03AM -0700, Parav Pandit wrote:
> 
> 
> > From: Moni Shoua <monis@mellanox.com>
> > Sent: Sunday, January 13, 2019 11:19 AM
> > To: Jason Gunthorpe <jgg@mellanox.com>
> > Cc: Parav Pandit <parav@mellanox.com>; Leon Romanovsky
> > <leon@kernel.org>; linux-rdma@vger.kernel.org
> > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > work
> > 
> > On Sat, Jan 12, 2019 at 8:42 PM Jason Gunthorpe <jgg@mellanox.com>
> > wrote:
> > >
> > > On Sat, Jan 12, 2019 at 11:06:51AM -0700, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Jason Gunthorpe
> > > > > Sent: Saturday, January 12, 2019 11:39 AM
> > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > Cc: Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org;
> > > > > Moni Shoua <monis@mellanox.com>
> > > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr()
> > > > > launches async work
> > > > >
> > > > > On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote:
> > > > > >
> > > > > >
> > > > > > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > > > > > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > > > > > > Sent: Saturday, January 12, 2019 11:06 AM
> > > > > > > To: Leon Romanovsky <leon@kernel.org>
> > > > > > > Cc: linux-rdma@vger.kernel.org; Moni Shoua
> > > > > > > <monis@mellanox.com>
> > > > > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr()
> > > > > > > launches async work
> > > > > > >
> > > > > > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > > > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > > > > > >
> > > > > > > > > Work must hold a kref on the ib_device otherwise the dev
> > > > > > > > > pointer can become free before the work runs.
> > > > > > > >
> > > > > > > > How can it occur exactly?
> > > > > > > >
> > > > > > > > "dev" will be released after mlx5_ib will exist, mlx5_ib
> > > > > > > > exist will drain advise_mr queue before it. Such drain will
> > > > > > > > ensure that no works are running.
> > > > > > >
> > > > > > > Okay, that is fine for the get_device, it can be removed.
> > > > > > >
> > > > > > > Are we OK with the reg_state hunk?
> > > > > > >
> > > > > > > Jason
> > > > > >
> > > > > > Unrelated to your question, I noticed that advise_mr does
> > > > > > uverbs_attr_ptr_get_array_size() for the SGEs.  I think this is
> > > > > > incorrect.  It should have been mr handles whose ownership can
> > > > > > be check against the PID or ucontext that registered those MRs.
> > > > >
> > > > > The SGE's have lkeys in them, and the lkeys are local to the PD
> > > > > passed in as argument.
> > > > >
> > > > > So the caller cannot touch resources outside the PD, hopefully the
> > > > > driver implementation checks this..
> > > > >
> > > > pagefault_single_data_segment () invokes __mlx5_mr_lookup() to
> > > > lookup a MR per device and not per PD.  mlx5_ib_advise_mr_prefetch()
> > > > doesn't touch pd.  work item doesn't have PD in it.
> > >
> > > Ugh, Moni?
> > >
> > > I don't think we should alow processes to page fault outside their PD,
> > > that sounds like an avenue for a side-channel security problem.
> > >
> > 
> > I can pass pd from prefetch_mr down to pagefault_single_data_segment()
> > and force that that lookup will be also for pd.
> > However there is another flow that goes through
> > pagefault_single_data_segment()  that starts from hardware page-fault.
> > What will be the pd in this case? Perhaps the pd of the source/destination
> > QP of the packet that generated the pagefault? But I can do this only if the
> > type of page-fault is WQE. In this case the hardware provides the QPN. For
> > page-fault of type RDMA the hardware doesn't provide the QPN, only the
> > lkey, address and size (dev is the one that fired the event).
> > So if there is a potential security risk in the advise_mr flow doesn't it exist in
> > the page-fault flow? A malicious packet can do the same thing a malicious
> > process does.
> > What do you think?
> 
> When hw triggered the page fault it has passed the checks of PD for
> a QP and MR from the QP and MR context including honoring
> read/write/atomic access rights.  Once those checks pass, it should
> ask for page fault.  If MR gets destroyed in this process before
> page fault, host driver will be terminating the MR and all pending
> transactions.

I agree, the HW needs to validate all on-the-wire rkeys against the
Rx'ing QP's PD before passing them into the driver.

So if the HW is right then the HW driver flow should ignore the PD as
it is already checked.

Jason
Jason Gunthorpe Jan. 14, 2019, 5:34 p.m. UTC | #16
On Sun, Jan 13, 2019 at 10:46:08AM +0200, Leon Romanovsky wrote:

> > > Who enqueues work to dev->advise_mr_wq? schedule_work() in
> > > mlx5_ib_advise_mr_prefetch() does to global workqueue.
> >
> > Ah! I *thought* I checked this, yes, using the system work queue is
> > why I added the put_device :)
> >
> > > It should be done to advise_mr_wq().  This will give chance to flush
> > > the wq when IB device is unregistered by the core.
> >
> > Good question - Moni??
> 
> It is definitely missed in my review, but if the IB/core doesn't flush
> general system queue on driver removal, we are better to fix this.
> 
> First addition of new works should be stopped and second system
> queue should be flushed. My general guideline is to avoid creating
> extra workqueues and I saw number of drivers are using system
> queues.

It is up to each driver to flush all the work queues it uses after
ib_unregister_driver completes and before it allowed module unload to
proceed.

Jason
Jason Gunthorpe Jan. 14, 2019, 5:36 p.m. UTC | #17
On Sun, Jan 13, 2019 at 10:26:27AM +0200, Leon Romanovsky wrote:
> On Sat, Jan 12, 2019 at 05:05:49PM +0000, Jason Gunthorpe wrote:
> > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > >
> > > > Work must hold a kref on the ib_device otherwise the dev pointer can
> > > > become free before the work runs.
> > >
> > > How can it occur exactly?
> > >
> > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist will
> > > drain advise_mr queue before it. Such drain will ensure that no works
> > > are running.
> >
> > Okay, that is fine for the get_device, it can be removed.
> >
> > Are we OK with the reg_state hunk?
> 
> I'm not sure about that
> Maybe current code is not nice as you suggested but IB_DEV_REGISTERED
> check also valid approach.

It is not. Checking the state without holding the appropriate locks
does nothing useful.

We need to delete the test or properly hold the registration lock.

> More general, I don't like how IB/core handles states and IMHO we should
> push our stack to be similar to block layer elevator. Block layer implements
> clear stages and calls to callbacks provided by drivers. It removes the need
> to manage and check any state in those drivers.

I'm not super familiar with the block layer, but I don't think more
callbacks will be in anyway more clear here. We have a very
complicated lifetime model already..

Jason
Leon Romanovsky Jan. 14, 2019, 6:26 p.m. UTC | #18
On Mon, Jan 14, 2019 at 05:36:52PM +0000, Jason Gunthorpe wrote:
> On Sun, Jan 13, 2019 at 10:26:27AM +0200, Leon Romanovsky wrote:
> > On Sat, Jan 12, 2019 at 05:05:49PM +0000, Jason Gunthorpe wrote:
> > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > >
> > > > > Work must hold a kref on the ib_device otherwise the dev pointer can
> > > > > become free before the work runs.
> > > >
> > > > How can it occur exactly?
> > > >
> > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist will
> > > > drain advise_mr queue before it. Such drain will ensure that no works
> > > > are running.
> > >
> > > Okay, that is fine for the get_device, it can be removed.
> > >
> > > Are we OK with the reg_state hunk?
> >
> > I'm not sure about that
> > Maybe current code is not nice as you suggested but IB_DEV_REGISTERED
> > check also valid approach.
>
> It is not. Checking the state without holding the appropriate locks
> does nothing useful.
>
> We need to delete the test or properly hold the registration lock.
>
> > More general, I don't like how IB/core handles states and IMHO we should
> > push our stack to be similar to block layer elevator. Block layer implements
> > clear stages and calls to callbacks provided by drivers. It removes the need
> > to manage and check any state in those drivers.
>
> I'm not super familiar with the block layer, but I don't think more
> callbacks will be in anyway more clear here. We have a very
> complicated lifetime model already..

The idea is to provide very clear device state machine which will ensure
that once driver was called, IB/core won't change state without need to
annotate driver code with various get/put.

Thanks

>
> Jason
Jason Gunthorpe Jan. 14, 2019, 6:36 p.m. UTC | #19
On Mon, Jan 14, 2019 at 08:26:20PM +0200, Leon Romanovsky wrote:
> On Mon, Jan 14, 2019 at 05:36:52PM +0000, Jason Gunthorpe wrote:
> > On Sun, Jan 13, 2019 at 10:26:27AM +0200, Leon Romanovsky wrote:
> > > On Sat, Jan 12, 2019 at 05:05:49PM +0000, Jason Gunthorpe wrote:
> > > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > > >
> > > > > > Work must hold a kref on the ib_device otherwise the dev pointer can
> > > > > > become free before the work runs.
> > > > >
> > > > > How can it occur exactly?
> > > > >
> > > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist will
> > > > > drain advise_mr queue before it. Such drain will ensure that no works
> > > > > are running.
> > > >
> > > > Okay, that is fine for the get_device, it can be removed.
> > > >
> > > > Are we OK with the reg_state hunk?
> > >
> > > I'm not sure about that
> > > Maybe current code is not nice as you suggested but IB_DEV_REGISTERED
> > > check also valid approach.
> >
> > It is not. Checking the state without holding the appropriate locks
> > does nothing useful.
> >
> > We need to delete the test or properly hold the registration lock.
> >
> > > More general, I don't like how IB/core handles states and IMHO we should
> > > push our stack to be similar to block layer elevator. Block layer implements
> > > clear stages and calls to callbacks provided by drivers. It removes the need
> > > to manage and check any state in those drivers.
> >
> > I'm not super familiar with the block layer, but I don't think more
> > callbacks will be in anyway more clear here. We have a very
> > complicated lifetime model already..
> 
> The idea is to provide very clear device state machine which will ensure
> that once driver was called, IB/core won't change state without need to
> annotate driver code with various get/put.

We do that.

Today the driver is in control of the lifetime of both the memory and
the 'registration', as it is the one that calls ib_device_unregister
and the badly named ib_dealloc

The registration lock is a tool the driver can use to prevent progress
through unregistration in cases where it cares about that. This is
much better than having every driver try and build their own lock, as
I saw in bxnt and rxe for instance.

Jason
Moni Shoua Jan. 21, 2019, 5:05 p.m. UTC | #20
On Sat, Jan 12, 2019 at 4:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> Work must hold a kref on the ib_device otherwise the dev pointer can
> become free before the work runs.
>
> Remove the bogus use of 'reg_state':
>  - While in uverbs the reg_state is guaranteed to always be
>    REGISTERED
>  - Testing reg_state with no locking is bogus. Use ib_device_try_get()
>    to get back into a region that prevents unregistration.
>
> Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
>  drivers/infiniband/hw/mlx5/odp.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> index 8d46b1dc56580f..a241a65834d4ca 100644
> --- a/drivers/infiniband/hw/mlx5/odp.c
> +++ b/drivers/infiniband/hw/mlx5/odp.c
> @@ -1595,10 +1595,12 @@ static void mlx5_ib_prefetch_mr_work(struct work_struct *work)
>         struct prefetch_mr_work *w =
>                 container_of(work, struct prefetch_mr_work, work);
>
> -       if (w->dev->ib_dev.reg_state == IB_DEV_REGISTERED)
> +       if (ib_device_try_get(&w->dev->ib_dev)) {
>                 mlx5_ib_prefetch_sg_list(w->dev, w->pf_flags, w->sg_list,
>                                          w->num_sge);
> -
> +               ib_device_put(&w->dev->ib_dev);
> +       }
> +       put_device(&w->dev->ib_dev.dev);
>         kfree(w);
>  }
>
> @@ -1617,15 +1619,13 @@ int mlx5_ib_advise_mr_prefetch(struct ib_pd *pd,
>                 return mlx5_ib_prefetch_sg_list(dev, pf_flags, sg_list,
>                                                 num_sge);
>
> -       if (dev->ib_dev.reg_state != IB_DEV_REGISTERED)
> -               return -ENODEV;
> -
>         work = kvzalloc(struct_size(work, sg_list, num_sge), GFP_KERNEL);
>         if (!work)
>                 return -ENOMEM;
>
>         memcpy(work->sg_list, sg_list, num_sge * sizeof(struct ib_sge));
>
> +       get_device(&dev->ib_dev.dev);
>         work->dev = dev;
>         work->pf_flags = pf_flags;
>         work->num_sge = num_sge;
> --
> 2.20.1
>
Looks good

Reviewed-by: Moni Shoua <monis@mellanox.com>
diff mbox series

Patch

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 8d46b1dc56580f..a241a65834d4ca 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -1595,10 +1595,12 @@  static void mlx5_ib_prefetch_mr_work(struct work_struct *work)
 	struct prefetch_mr_work *w =
 		container_of(work, struct prefetch_mr_work, work);
 
-	if (w->dev->ib_dev.reg_state == IB_DEV_REGISTERED)
+	if (ib_device_try_get(&w->dev->ib_dev)) {
 		mlx5_ib_prefetch_sg_list(w->dev, w->pf_flags, w->sg_list,
 					 w->num_sge);
-
+		ib_device_put(&w->dev->ib_dev);
+	}
+	put_device(&w->dev->ib_dev.dev);
 	kfree(w);
 }
 
@@ -1617,15 +1619,13 @@  int mlx5_ib_advise_mr_prefetch(struct ib_pd *pd,
 		return mlx5_ib_prefetch_sg_list(dev, pf_flags, sg_list,
 						num_sge);
 
-	if (dev->ib_dev.reg_state != IB_DEV_REGISTERED)
-		return -ENODEV;
-
 	work = kvzalloc(struct_size(work, sg_list, num_sge), GFP_KERNEL);
 	if (!work)
 		return -ENOMEM;
 
 	memcpy(work->sg_list, sg_list, num_sge * sizeof(struct ib_sge));
 
+	get_device(&dev->ib_dev.dev);
 	work->dev = dev;
 	work->pf_flags = pf_flags;
 	work->num_sge = num_sge;