Message ID | 20190112023124.21647-1-jgg@ziepe.ca (mailing list archive) |
---|---|
Headers | show |
Series | Fix poor reference counting in advice_mr work | expand |
On Fri, Jan 11, 2019 at 07:31:22PM -0700, Jason Gunthorpe wrote: > From: Jason Gunthorpe <jgg@mellanox.com> > > Work needs to hold a reference on the pointers that it plans to use otherwise > they can become free'd or unregistered. Expose the existing registration lock > that netlink uses for this purpose. > > Jason Gunthorpe (2): > RDMA/device: Expose ib_device_try_get(() > IB/mlx5: Fix how advise_mr() launches async work I've applied both of these to for-rc, with the change in comment that Parav gave. We can revisit if get_device/put_device is the best way to handle this when the work queue issue is resolved, most probably we can drop the get/put in the driver once the work queue is flushed prior to deallocing the device. Jason
On Mon, Jan 21, 2019 at 02:43:41PM -0700, Jason Gunthorpe wrote: > On Fri, Jan 11, 2019 at 07:31:22PM -0700, Jason Gunthorpe wrote: > > From: Jason Gunthorpe <jgg@mellanox.com> > > > > Work needs to hold a reference on the pointers that it plans to use otherwise > > they can become free'd or unregistered. Expose the existing registration lock > > that netlink uses for this purpose. > > > > Jason Gunthorpe (2): > > RDMA/device: Expose ib_device_try_get(() > > IB/mlx5: Fix how advise_mr() launches async work > > I've applied both of these to for-rc, with the change in comment that > Parav gave. > > We can revisit if get_device/put_device is the best way to handle this > when the work queue issue is resolved, most probably we can drop the > get/put in the driver once the work queue is flushed prior to > deallocing the device. Didn't we agree that the right fix is to delete advise_mr and sync system queue on exit? It will be much more clean to do it right from the beginning instead of adding new get_device/put_device and removing them immediately in following patch. Thanks > > Jason
On Tue, Jan 22, 2019 at 07:52:20AM +0200, Leon Romanovsky wrote: > On Mon, Jan 21, 2019 at 02:43:41PM -0700, Jason Gunthorpe wrote: > > On Fri, Jan 11, 2019 at 07:31:22PM -0700, Jason Gunthorpe wrote: > > > From: Jason Gunthorpe <jgg@mellanox.com> > > > > > > Work needs to hold a reference on the pointers that it plans to use otherwise > > > they can become free'd or unregistered. Expose the existing registration lock > > > that netlink uses for this purpose. > > > > > > Jason Gunthorpe (2): > > > RDMA/device: Expose ib_device_try_get(() > > > IB/mlx5: Fix how advise_mr() launches async work > > > > I've applied both of these to for-rc, with the change in comment that > > Parav gave. > > > > We can revisit if get_device/put_device is the best way to handle this > > when the work queue issue is resolved, most probably we can drop the > > get/put in the driver once the work queue is flushed prior to > > deallocing the device. > > Didn't we agree that the right fix is to delete advise_mr and sync > system queue on exit? I need something for -rc, and that fix was way too big. > It will be much more clean to do it right from the beginning instead of > adding new get_device/put_device and removing them immediately in > following patch. The WQ cleanup patch will be in -next.. Jason
From: Jason Gunthorpe <jgg@mellanox.com> Work needs to hold a reference on the pointers that it plans to use otherwise they can become free'd or unregistered. Expose the existing registration lock that netlink uses for this purpose. Jason Gunthorpe (2): RDMA/device: Expose ib_device_try_get(() IB/mlx5: Fix how advise_mr() launches async work drivers/infiniband/core/core_priv.h | 1 - drivers/infiniband/core/device.c | 6 +++--- drivers/infiniband/hw/mlx5/odp.c | 10 +++++----- include/rdma/ib_verbs.h | 24 ++++++++++++++++++++++-- 4 files changed, 30 insertions(+), 11 deletions(-)