Message ID | CAAKD3BDFrTMMgX0nErD50rp2je=HC9zeaYWHDKf0mqQwc5fM9g@mail.gmail.com (mailing list archive) |
---|---|
State | Superseded |
On Thu, Jul 27, 2017 at 03:54:07PM +0300, Matan Barak wrote:

> Digging a bit, we found a fix that might be related to this issue.
> I would be happy if you could try that and report if it solved this problem.
> We plan to send it soon.

Yep this looks like it.

FWIW, it causes random kernel memory corruption and failures in my
experience, I was very lucky to get such a clean oops the first time..

> commit 1d4ecbf034193f000fe6686586c40ab4b2a95da1
> Author: Yishai Hadas <yishaih@mellanox.com>
> Date: Thu Jul 27 15:49:00 2017 +0200
>
> IB/uverbs: Fix device cleanup
>
> Uverbs device should be cleaned up only when there is no
> potential usage of.
>
> As part of ib_uverbs_remove_one which might be triggered upon reset flow
> the device reference count is decreased as expected and leave the final
> cleanup to the FDs that were opened.
>
> Current code increases reference count upon opening a new command FD and
> decreases it upon closing the file. The event FD is opened internally
> and rely on the command FD by taking on it a reference count.
>
> In case that the command FD was closed and just later the event FD we
> may ensure that the device resources as of srcu are still alive as they
> are still in use.
>
> Fixing the above by moving the reference count decreasing to the place
> where the command FD is really freed instead of doing that when it was
> just closed.
>
> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> Reviewed-by: Matan Barak <matanb@mellanox.com>

Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Please add a fixes line

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 27, 2017 at 02:44:37PM -0600, Jason Gunthorpe wrote:

> On Thu, Jul 27, 2017 at 03:54:07PM +0300, Matan Barak wrote:
>
> > Digging a bit, we found a fix that might be related to this issue.
> > I would be happy if you could try that and report if it solved this problem.
> > We plan to send it soon.
>
> Yep this looks like it.
>
> FWIW, it causes random kernel memory corruption and failures in my
> experience, I was very lucky to get such a clean oops the first time..
>
> > commit 1d4ecbf034193f000fe6686586c40ab4b2a95da1
> > Author: Yishai Hadas <yishaih@mellanox.com>
> > Date: Thu Jul 27 15:49:00 2017 +0200
> >
> > IB/uverbs: Fix device cleanup
> >
> > Uverbs device should be cleaned up only when there is no
> > potential usage of.
> >
> > As part of ib_uverbs_remove_one which might be triggered upon reset flow
> > the device reference count is decreased as expected and leave the final
> > cleanup to the FDs that were opened.
> >
> > Current code increases reference count upon opening a new command FD and
> > decreases it upon closing the file. The event FD is opened internally
> > and rely on the command FD by taking on it a reference count.
> >
> > In case that the command FD was closed and just later the event FD we
> > may ensure that the device resources as of srcu are still alive as they
> > are still in use.
> >
> > Fixing the above by moving the reference count decreasing to the place
> > where the command FD is really freed instead of doing that when it was
> > just closed.
> >
> > Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> > Reviewed-by: Matan Barak <matanb@mellanox.com>
>
> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
>
> Please add a fixes line

Hi Jason,

I queued it [1] for submission, once the IPoIB fixes [2] will be
accepted, I'll submit it.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=rdma-rc&id=38a974d578451dbbde0c40fc2d81fba44027a338
[2] http://marc.info/?l=linux-rdma&m=150109276402195&w=2

>
> Jason
On Sun, Jul 30, 2017 at 01:25:14PM +0300, Leon Romanovsky wrote:

> > Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> > Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> >
> > Please add a fixes line
>
> Hi Jason,
>
> I queued it [1] for submission, once the IPoIB fixes [2] will be
> accepted, I'll submit it.

Isn't fixing random kernel memory corruption triggerable by userspace
exactly the sort of thing we should be rushing into Linus's tree?

Jason
On Sun, Jul 30, 2017 at 09:52:08PM -0600, Jason Gunthorpe wrote:

> On Sun, Jul 30, 2017 at 01:25:14PM +0300, Leon Romanovsky wrote:
> > > Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> > > Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> > >
> > > Please add a fixes line
> >
> > Hi Jason,
> >
> > I queued it [1] for submission, once the IPoIB fixes [2] will be
> > accepted, I'll submit it.
>
> Isn't fixing random kernel memory corruption triggerable by userspace
> exactly the sort of thing we should be rushing into Linus's tree?

I'm still looking on the easiest way to submit patches and want to see
that everything is working before "rushing".

Thanks

>
> Jason
On Mon, Jul 31, 2017 at 08:39:01AM +0300, Leon Romanovsky wrote:

> On Sun, Jul 30, 2017 at 09:52:08PM -0600, Jason Gunthorpe wrote:
> > On Sun, Jul 30, 2017 at 01:25:14PM +0300, Leon Romanovsky wrote:
> > > > Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> > > > Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> > > >
> > > > Please add a fixes line
> > >
> > > Hi Jason,
> > >
> > > I queued it [1] for submission, once the IPoIB fixes [2] will be
> > > accepted, I'll submit it.
> >
> > Isn't fixing random kernel memory corruption triggerable by userspace
> > exactly the sort of thing we should be rushing into Linus's tree?
>
> I'm still looking on the easiest way to submit patches and want to see
> that everything is working before "rushing".

OK, I followed the advice [1] - "I would prefer to see is one
submission, then two or three days (so the first submission has had
some bake time), then the next one and it should assume the first is
applied."

And submitted the fix [2].

[1] http://marc.info/?l=linux-rdma&m=150020922014003&w=2
[2] https://patchwork.kernel.org/patch/9871109/

>
> Thanks
>
> >
> > Jason
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index a88d223..cb1729a 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -251,6 +251,7 @@ void ib_uverbs_release_file(struct kref *ref)
 	if (atomic_dec_and_test(&file->device->refcount))
 		ib_uverbs_comp_dev(file->device);
 
+	kobject_put(&file->device->kobj);
 	kfree(file);
 }
 
@@ -918,7 +919,6 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 static int ib_uverbs_close(struct inode *inode, struct file *filp)
 {
 	struct ib_uverbs_file *file = filp->private_data;
-	struct ib_uverbs_device *dev = file->device;
 
 	mutex_lock(&file->cleanup_mutex);
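
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct device`, `struct cmd_file`, and every function name below are hypothetical stand-ins, plain integer counters replace `kref`/`kobject`, and a `freed` flag stands in for tearing down device resources such as SRCU. It shows the behaviour the fix aims for: the device reference is dropped where the command file object is really freed, so the device outlives an event FD that is closed after the command FD.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the uverbs device; not the real struct. */
struct device {
	int refs;   /* models the device reference count */
	int freed;  /* set once the last reference is dropped */
};

/* Hypothetical stand-in for the command file object. */
struct cmd_file {
	int refs;            /* models the reference count on the command file */
	struct device *dev;
};

static void device_put(struct device *dev)
{
	if (--dev->refs == 0)
		dev->freed = 1;  /* real code would release device resources here */
}

/* Called only when the last reference to the command file is gone. */
static void cmd_file_release(struct cmd_file *f)
{
	/* Modelled fix: drop the device reference here, not at close time. */
	device_put(f->dev);
	free(f);
}

static void cmd_file_put(struct cmd_file *f)
{
	if (--f->refs == 0)
		cmd_file_release(f);
}

/* Opening a command FD takes a reference on the device. */
static struct cmd_file *cmd_file_open(struct device *dev)
{
	struct cmd_file *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->refs = 1;
	f->dev = dev;
	dev->refs++;
	return f;
}

/* The event FD relies on the command file by holding a reference on it. */
static void event_file_open(struct cmd_file *f)  { f->refs++; }
static void event_file_close(struct cmd_file *f) { cmd_file_put(f); }
static void cmd_file_close(struct cmd_file *f)   { cmd_file_put(f); }

/* Returns 1 if the device survives until the event FD is closed. */
int refcount_demo(void)
{
	struct device dev = { .refs = 1, .freed = 0 };
	struct cmd_file *f = cmd_file_open(&dev);
	int alive_after_cmd_close;

	assert(f != NULL);
	event_file_open(f);
	device_put(&dev);        /* models the remove path dropping its reference */
	cmd_file_close(f);       /* command FD closed first */
	alive_after_cmd_close = !dev.freed;
	event_file_close(f);     /* last user gone: file freed, device released */
	return alive_after_cmd_close && dev.freed;
}
```

Calling `refcount_demo()` from a `main` function exercises the "command FD closed first, event FD closed later" sequence. If `device_put()` were instead called from `cmd_file_close()`, the device in this model would be marked freed while the event FD still held a reference on the command file, which is the situation the patch is meant to avoid.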