Message ID | 20180321013008.GC6235@ziepe.ca (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
On Tue, Mar 20, 2018 at 07:30:08PM -0600, Jason Gunthorpe wrote: > On Wed, Mar 21, 2018 at 12:11:59AM +0000, Parav Pandit wrote: > > > > > Right way to do ib core to have open() and close() callback. > > > > > > Probably.. But since we already accepted the patches that cause this bug for this > > > merge window we may be stuck accepting a hack. > > > > > > The hack is to let mlx5 opt out of resource tracking for it's internal objects. > > > There are some patches from Steve that I didn't review deeply, but > > those patches let some internal data structure of the provider > > driver to be exposed to user for debugging purpose, which is very > > useful. > > > > Given UMR QP is one such core QP that might need debugging as well > > in future and some other internal UD QPs that Bodong is working on. > > Given that it might be useful to debug such internal QP as > > well. Having them resource tracked is useful instead of opting out. > > So not tracking them further creates blockers to not able to debug > > them in future. > > Yes, that seems to be likely. Someone will ultimately have to add new > methods and revise mlx5. It really shouldn't be creating resources > prior to registering.. There is main difference between Steve's patches and UMR flows. Steve is exposing extra information for standard QPs which are created by ib_core. UMR QPs are something different, they are created by mlx5_ib and ib_core doesn't aware of them, for example their type is something different from known to rdmatool and ib_core. Mark and me discussed the solution to the current situation and proposed current patch. We are all agree that UMR flow needs to be revisited and ideal solution will be create symmetrical create/destroy UMR flows. However we don't have clear vision how to do it in -rc6. This is why it is safe for now to skip UMR completely from restrack. > > > > At least that way we still protect the ULPs on other non-broken drivers. > > Not sure if any other drivers have dangling resources after ib_unregister_device() is called. > > Otherwise Leon' patch would have notrack() API invoked in other provider drivers too. > > It is more the ULPs that worry me. ULPs leaving dangling resources > after client unregister seems kind of likely. > > > > Maybe restrack_init should be moved to the register as well in this patch? > > > That alleast avoids the confusing code. I may admit that names are confusing, but the place is right, ib_alloc_device is responsible to initialize various structures and this is exactly what restrack_init does. The rdma_restrack_clean() will be renamed to something like rdma_restrack_check_leaks() in near future, because I see a need to rewrite the printed output from that function to make debug more sane. > > What if we add this hunk to the patch? Leon? In -rc6? No way. I'll add it after I'll rewrite rdma_restrack_clean(), so people won't get scary messages without ability to debug. Thanks > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c > index 0ab99e62cc5ce0..247f2be0a1516b 100644 > --- a/drivers/infiniband/core/device.c > +++ b/drivers/infiniband/core/device.c > @@ -464,6 +464,12 @@ int ib_register_device(struct ib_device *device, > struct ib_udata uhw = {.outlen = 0, .inlen = 0}; > struct device *parent = device->dev.parent; > > + /* > + * Nothing is permitted to create objects prior to calling > + * register_device. > + */ > + rdma_restrack_clean(&device->res); > + > WARN_ON_ONCE(device->dma_device); > if (device->dev.dma_ops) { > /* > > Jason
> -----Original Message----- > From: Leon Romanovsky [mailto:leon@kernel.org] > Sent: Wednesday, March 21, 2018 12:30 AM > To: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Parav Pandit <parav@mellanox.com>; Doug Ledford > <dledford@redhat.com>; RDMA mailing list <linux-rdma@vger.kernel.org>; > Mark Bloch <markb@mellanox.com> > Subject: Re: [PATCH rdma-rc 1/2] RDMA/restrack: Add ability to create non- > traceable restrack objects > > On Tue, Mar 20, 2018 at 07:30:08PM -0600, Jason Gunthorpe wrote: > > On Wed, Mar 21, 2018 at 12:11:59AM +0000, Parav Pandit wrote: > > > > > > > Right way to do ib core to have open() and close() callback. > > > > > > > > Probably.. But since we already accepted the patches that cause > > > > this bug for this merge window we may be stuck accepting a hack. > > > > > > > > The hack is to let mlx5 opt out of resource tracking for it's internal objects. > > > > > There are some patches from Steve that I didn't review deeply, but > > > those patches let some internal data structure of the provider > > > driver to be exposed to user for debugging purpose, which is very > > > useful. > > > > > > Given UMR QP is one such core QP that might need debugging as well > > > in future and some other internal UD QPs that Bodong is working on. > > > Given that it might be useful to debug such internal QP as well. > > > Having them resource tracked is useful instead of opting out. > > > So not tracking them further creates blockers to not able to debug > > > them in future. > > > > Yes, that seems to be likely. Someone will ultimately have to add new > > methods and revise mlx5. It really shouldn't be creating resources > > prior to registering.. > > There is main difference between Steve's patches and UMR flows. Steve is > exposing extra information for standard QPs which are created by ib_core. > > UMR QPs are something different, they are created by mlx5_ib and ib_core UMR QP is of type IB_QPT_RESERVED1 defined by ib_core. > doesn't aware of them, for example their type is something different from > known to rdmatool and ib_core. > > Mark and me discussed the solution to the current situation and proposed > current patch. We are all agree that UMR flow needs to be revisited and ideal > solution will be create symmetrical create/destroy UMR flows. > However we don't have clear vision how to do it in -rc6. > > This is why it is safe for now to skip UMR completely from restrack. restack_clean() in dealloc_device() eliminates this issue if for-rc is the only concern from the response it seems. > > > > > > > At least that way we still protect the ULPs on other non-broken drivers. All provider drivers > > > Not sure if any other drivers have dangling resources after > ib_unregister_device() is called. > > > Otherwise Leon' patch would have notrack() API invoked in other provider > drivers too. > > > > It is more the ULPs that worry me. ULPs leaving dangling resources > > after client unregister seems kind of likely. Given that ib_dealloc_device() still catches those ULP errors. Most drivers call ib_dealloc_device() soon after ib_unregister_device(). > > > > > > Maybe restrack_init should be moved to the register as well in this patch? > > > > > That alleast avoids the confusing code. > > I may admit that names are confusing, but the place is right, ib_alloc_device is > responsible to initialize various structures and this is exactly what restrack_init > does. > > The rdma_restrack_clean() will be renamed to something like > rdma_restrack_check_leaks() in near future, because I see a need to rewrite the > printed output from that function to make debug more sane. > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 21, 2018 at 06:22:53AM +0000, Parav Pandit wrote: > > > > -----Original Message----- > > From: Leon Romanovsky [mailto:leon@kernel.org] > > Sent: Wednesday, March 21, 2018 12:30 AM > > To: Jason Gunthorpe <jgg@ziepe.ca> > > Cc: Parav Pandit <parav@mellanox.com>; Doug Ledford > > <dledford@redhat.com>; RDMA mailing list <linux-rdma@vger.kernel.org>; > > Mark Bloch <markb@mellanox.com> > > Subject: Re: [PATCH rdma-rc 1/2] RDMA/restrack: Add ability to create non- > > traceable restrack objects > > > > On Tue, Mar 20, 2018 at 07:30:08PM -0600, Jason Gunthorpe wrote: > > > On Wed, Mar 21, 2018 at 12:11:59AM +0000, Parav Pandit wrote: > > > > > > > > > Right way to do ib core to have open() and close() callback. > > > > > > > > > > Probably.. But since we already accepted the patches that cause > > > > > this bug for this merge window we may be stuck accepting a hack. > > > > > > > > > > The hack is to let mlx5 opt out of resource tracking for it's internal objects. > > > > > > > There are some patches from Steve that I didn't review deeply, but > > > > those patches let some internal data structure of the provider > > > > driver to be exposed to user for debugging purpose, which is very > > > > useful. > > > > > > > > Given UMR QP is one such core QP that might need debugging as well > > > > in future and some other internal UD QPs that Bodong is working on. > > > > Given that it might be useful to debug such internal QP as well. > > > > Having them resource tracked is useful instead of opting out. > > > > So not tracking them further creates blockers to not able to debug > > > > them in future. > > > > > > Yes, that seems to be likely. Someone will ultimately have to add new > > > methods and revise mlx5. It really shouldn't be creating resources > > > prior to registering.. > > > > There is main difference between Steve's patches and UMR flows. Steve is > > exposing extra information for standard QPs which are created by ib_core. > > > > UMR QPs are something different, they are created by mlx5_ib and ib_core > UMR QP is of type IB_QPT_RESERVED1 defined by ib_core. > > > doesn't aware of them, for example their type is something different from > > known to rdmatool and ib_core. > > > > Mark and me discussed the solution to the current situation and proposed > > current patch. We are all agree that UMR flow needs to be revisited and ideal > > solution will be create symmetrical create/destroy UMR flows. > > However we don't have clear vision how to do it in -rc6. > > > > This is why it is safe for now to skip UMR completely from restrack. > restack_clean() in dealloc_device() eliminates this issue if for-rc is the only concern from the response it seems. Ok, I tried it now and it fixed the issue, so your solution is even better for -rc. Someone needs to clean UMR mess anyway. Thanks
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 0ab99e62cc5ce0..247f2be0a1516b 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -464,6 +464,12 @@ int ib_register_device(struct ib_device *device, struct ib_udata uhw = {.outlen = 0, .inlen = 0}; struct device *parent = device->dev.parent; + /* + * Nothing is permitted to create objects prior to calling + * register_device. + */ + rdma_restrack_clean(&device->res); + WARN_ON_ONCE(device->dma_device); if (device->dev.dma_ops) { /*