Message ID | 1467548899-21923-2-git-send-email-leon@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Sun, 2016-07-03 at 15:28 +0300, Leon Romanovsky wrote: > From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> > > Fixes an oops that might happen if uverbs_close races with > remove_one. > > Both contexts may run ib_uverbs_cleanup_ucontext, it depends > on the flow. > > Currently, there is no protection for a case that remove_one > didn't make the cleanup it runs to its end, the underlying > ib_device was freed then uverbs_close will call > ib_uverbs_cleanup_ucontext and OOPs. > > Above might happen if uverbs_close deleted the file from the list > then remove_one didn't find it and runs to its end. > > Fixes to protect against that case by a new cleanup lock so that > ib_uverbs_cleanup_ucontext will be called always before that > remove_one is ended. > > Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and > remove_one") > Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> > Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> > Signed-off-by: Yishai Hadas <yishaih@mellanox.com> > Signed-off-by: Leon Romanovsky <leon@kernel.org> The only reason I hadn't taken this patch before is because Jason said it was totally untested and someone (Devesh in this case) needed to test it to make sure it resolved their problem. I don't see a test-by line here, so has this happened?
On Tue, Aug 2, 2016 at 9:31 PM, Doug Ledford <dledford@redhat.com> wrote: > On Sun, 2016-07-03 at 15:28 +0300, Leon Romanovsky wrote: >> From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> >> >> Fixes an oops that might happen if uverbs_close races with >> remove_one. >> >> Both contexts may run ib_uverbs_cleanup_ucontext, it depends >> on the flow. >> >> Currently, there is no protection for a case that remove_one >> didn't make the cleanup it runs to its end, the underlying >> ib_device was freed then uverbs_close will call >> ib_uverbs_cleanup_ucontext and OOPs. >> >> Above might happen if uverbs_close deleted the file from the list >> then remove_one didn't find it and runs to its end. >> >> Fixes to protect against that case by a new cleanup lock so that >> ib_uverbs_cleanup_ucontext will be called always before that >> remove_one is ended. >> >> Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and >> remove_one") >> Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> >> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> >> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> >> Signed-off-by: Leon Romanovsky <leon@kernel.org> > > The only reason I hadn't taken this patch before is because Jason said > it was totally untested and someone (Devesh in this case) needed to > test it to make sure it resolved their problem. I don't see a test-by > line here, so has this happened? I can't say about Devesh, but we verified it as part of our verification phase. Do you need anything special except Yishai's SOB? > > -- > Doug Ledford <dledford@redhat.com> > GPG KeyID: 0E572FDD -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 02, 2016 at 09:50:14PM +0300, Leon Romanovsky wrote: > > The only reason I hadn't taken this patch before is because Jason said > > it was totally untested and someone (Devesh in this case) needed to > > test it to make sure it resolved their problem. I don't see a test-by > > line here, so has this happened? > > I can't say about Devesh, but we verified it as part of our verification phase. > Do you need anything special except Yishai's SOB? I was hoping to see Devesh confirm that the race he was able to reproduce is in fact closed, unless Mellanox did the reproduction? At least your testing shows it doesn't make things worse, so it is probably OK to go ahead if Devesh does not respond soon. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2016-08-02 at 13:26 -0600, Jason Gunthorpe wrote: > On Tue, Aug 02, 2016 at 09:50:14PM +0300, Leon Romanovsky wrote: > > > > > > > > The only reason I hadn't taken this patch before is because Jason > > > said > > > it was totally untested and someone (Devesh in this case) needed > > > to > > > test it to make sure it resolved their problem. I don't see a > > > test-by > > > line here, so has this happened? > > > > I can't say about Devesh, but we verified it as part of our > > verification phase. > > Do you need anything special except Yishai's SOB? > > I was hoping to see Devesh confirm that the race he was able to > reproduce is in fact closed, unless Mellanox did the reproduction? > > At least your testing shows it doesn't make things worse, so it is > probably OK to go ahead if Devesh does not respond soon. > > Jason I'm not sure what Devesh is off doing now a days, so I went ahead and took this. With the fact that Mellanox tested it, I'd rather have it in than out.
On Tue, Aug 02, 2016 at 10:43:14PM -0400, Doug Ledford wrote: > On Tue, 2016-08-02 at 13:26 -0600, Jason Gunthorpe wrote: > > On Tue, Aug 02, 2016 at 09:50:14PM +0300, Leon Romanovsky wrote: > > > > > > > > > > > The only reason I hadn't taken this patch before is because Jason > > > > said > > > > it was totally untested and someone (Devesh in this case) needed > > > > to > > > > test it to make sure it resolved their problem. I don't see a > > > > test-by > > > > line here, so has this happened? > > > > > > I can't say about Devesh, but we verified it as part of our > > > verification phase. > > > Do you need anything special except Yishai's SOB? > > > > I was hoping to see Devesh confirm that the race he was able to > > reproduce is in fact closed, unless Mellanox did the reproduction? > > > > At least your testing shows it doesn't make things worse, so it is > > probably OK to go ahead if Devesh does not respond soon. > > > > Jason > > I'm not sure what Devesh is off doing now a days, so I went ahead and > took this. With the fact that Mellanox tested it, I'd rather have it > in than out. Thanks > > -- > Doug Ledford <dledford@redhat.com> > GPG KeyID: 0E572FDD
Hi Jason et al. I could not respond to this mail chain. I was quite busy somewhere else and did not had any bandwidth at all to get back to this one. Thanks for taking care of this issue. I would try to see If I could confirm that the fix works for ocrdma. -Regards On Wed, Aug 3, 2016 at 10:48 AM, Leon Romanovsky <leon@kernel.org> wrote: > On Tue, Aug 02, 2016 at 10:43:14PM -0400, Doug Ledford wrote: >> On Tue, 2016-08-02 at 13:26 -0600, Jason Gunthorpe wrote: >> > On Tue, Aug 02, 2016 at 09:50:14PM +0300, Leon Romanovsky wrote: >> > > >> > > > >> > > > The only reason I hadn't taken this patch before is because Jason >> > > > said >> > > > it was totally untested and someone (Devesh in this case) needed >> > > > to >> > > > test it to make sure it resolved their problem. I don't see a >> > > > test-by >> > > > line here, so has this happened? >> > > >> > > I can't say about Devesh, but we verified it as part of our >> > > verification phase. >> > > Do you need anything special except Yishai's SOB? >> > >> > I was hoping to see Devesh confirm that the race he was able to >> > reproduce is in fact closed, unless Mellanox did the reproduction? >> > >> > At least your testing shows it doesn't make things worse, so it is >> > probably OK to go ahead if Devesh does not respond soon. >> > >> > Jason >> >> I'm not sure what Devesh is off doing now a days, so I went ahead and >> took this. With the fact that Mellanox tested it, I'd rather have it >> in than out. > > Thanks > >> >> -- >> Doug Ledford <dledford@redhat.com> >> GPG KeyID: 0E572FDD > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index b7f3b8d..df26a74 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -116,6 +116,7 @@ struct ib_uverbs_event_file { struct ib_uverbs_file { struct kref ref; struct mutex mutex; + struct mutex cleanup_mutex; /* protect cleanup */ struct ib_uverbs_device *device; struct ib_ucontext *ucontext; struct ib_event_handler event_handler; diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 426e0ac..0012fa5 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -969,6 +969,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp) file->async_file = NULL; kref_init(&file->ref); mutex_init(&file->mutex); + mutex_init(&file->cleanup_mutex); filp->private_data = file; kobject_get(&dev->kobj); @@ -994,18 +995,20 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp) { struct ib_uverbs_file *file = filp->private_data; struct ib_uverbs_device *dev = file->device; - struct ib_ucontext *ucontext = NULL; + + mutex_lock(&file->cleanup_mutex); + if (file->ucontext) { + ib_uverbs_cleanup_ucontext(file, file->ucontext); + file->ucontext = NULL; + } + mutex_unlock(&file->cleanup_mutex); mutex_lock(&file->device->lists_mutex); - ucontext = file->ucontext; - file->ucontext = NULL; if (!file->is_closed) { list_del(&file->list); file->is_closed = 1; } mutex_unlock(&file->device->lists_mutex); - if (ucontext) - ib_uverbs_cleanup_ucontext(file, ucontext); if (file->async_file) kref_put(&file->async_file->ref, ib_uverbs_release_event_file); @@ -1219,22 +1222,30 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev, mutex_lock(&uverbs_dev->lists_mutex); while (!list_empty(&uverbs_dev->uverbs_file_list)) { struct ib_ucontext *ucontext; - file = list_first_entry(&uverbs_dev->uverbs_file_list, struct ib_uverbs_file, list); file->is_closed = 1; - ucontext = file->ucontext; list_del(&file->list); - file->ucontext = NULL; kref_get(&file->ref); mutex_unlock(&uverbs_dev->lists_mutex); - /* We must release the mutex before going ahead and calling - * disassociate_ucontext. disassociate_ucontext might end up - * indirectly calling uverbs_close, for example due to freeing - * the resources (e.g mmput). - */ + ib_uverbs_event_handler(&file->event_handler, &event); + + mutex_lock(&file->cleanup_mutex); + ucontext = file->ucontext; + file->ucontext = NULL; + mutex_unlock(&file->cleanup_mutex); + + /* At this point ib_uverbs_close cannot be running + * ib_uverbs_cleanup_ucontext + */ if (ucontext) { + /* We must release the mutex before going ahead and + * calling disassociate_ucontext. disassociate_ucontext + * might end up indirectly calling uverbs_close, + * for example due to freeing the resources + * (e.g mmput). + */ ib_dev->disassociate_ucontext(ucontext); ib_uverbs_cleanup_ucontext(file, ucontext); }