Message ID | 16d6916702fe746e08a7e9b20d90dc2fbfe92184.1536688044.git.swise@opengridcomputing.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | [v2] ib_uverbs: atomically flush and mark closed the comp event queue | expand |
Ignore the old date... Jason, is this good to go? Steve. On 8/31/2018 9:16 AM, Steve Wise wrote: > Currently a uverbs completion event queue is flushed of events in > ib_uverbs_comp_event_close() with the queue spinlock held and then > released. Yet setting ev_queue->is_closed is not set until later in > uverbs_hot_unplug_completion_event_file(). > > In between the time ib_uverbs_comp_event_close() releases the lock > and uverbs_hot_unplug_completion_event_file() acquires the lock, a > completion event can arrive and be inserted into the event queue by > ib_uverbs_comp_handler(). > > This can cause a "double add" list_add warning or crash depending on > the kernel configuration, or a memory leak because the event is never > dequeued since the queue is already closed down. > > So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close(). > > Cc: stable@vger.kernel.org > Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema") > Signed-off-by: Steve Wise <swise@opengridcomputing.com> > --- > > Changes since v1: > > leave setting ev_queue.is_closed to 1 in > uverbs_hot_unplug_completion_event_file(). > > --- > drivers/infiniband/core/uverbs_main.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c > index 823beca448e1..b3ef7db68bde 100644 > --- a/drivers/infiniband/core/uverbs_main.c > +++ b/drivers/infiniband/core/uverbs_main.c > @@ -440,6 +440,7 @@ static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp) > list_del(&entry->obj_list); > kfree(entry); > } > + file->ev_queue.is_closed = 1; > spin_unlock_irq(&file->ev_queue.lock); > > uverbs_close_fd(filp); >
On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote: > Ignore the old date... > > Jason, is this good to go? I'm not sure, I think your old patch might have been the right one.. But I haven't checked it all through again yet. But I'm still not sure if the kref should be allowed to work like this at all.. Leaving a long term dangling filep in the object pointer is not how this was really intended to work. Jason
On 9/11/2018 4:25 PM, Jason Gunthorpe wrote: > On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote: >> Ignore the old date... >> >> Jason, is this good to go? > > I'm not sure, I think your old patch might have been the right > one.. But I haven't checked it all through again yet. > > But I'm still not sure if the kref should be allowed to work like > this at all.. > > Leaving a long term dangling filep in the object pointer is not how > this was really intended to work. > > Jason > Looking at the original commit (1e7710f3f656), ib_uverbs_async_event_close() sets is_closed, yet ib_uverbs_comp_event_close() does not. Seems like an oversight in that commit maybe? But as for the whole design of this part of the rdma core, I'm not to familiar with how it "should work" to comment further. I just saw the evidence in the dump and tracked down the issue. :) This is certainly a regression from cxgb4's perspective: These tests past prior to commit 1e7710f3f656, I believe. Steve
On Tue, Sep 11, 2018 at 05:18:32PM -0500, Steve Wise wrote: > > > On 9/11/2018 4:25 PM, Jason Gunthorpe wrote: > > On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote: > >> Ignore the old date... > >> > >> Jason, is this good to go? > > > > I'm not sure, I think your old patch might have been the right > > one.. But I haven't checked it all through again yet. > > > > But I'm still not sure if the kref should be allowed to work like > > this at all.. > > > > Leaving a long term dangling filep in the object pointer is not how > > this was really intended to work. > > > > Jason > > > > > Looking at the original commit (1e7710f3f656), > ib_uverbs_async_event_close() sets is_closed, yet > ib_uverbs_comp_event_close() does not. Seems like an oversight in that > commit maybe? But as for the whole design of this part of the rdma > core, I'm not to familiar with how it "should work" to comment further. > I just saw the evidence in the dump and tracked down the issue. :) Hmm... Okay, the other reason this is so confusing is because the is_closed in uverbs_hot_unplug_completion_event_file() is not doing what it needs to do .. With the below it starts to make more sense and has sensible locking.. Once the HW disassociates is_closed becomes set and poll indicates readable and read returns EIO, which is the standard semantic for this sort of case. So, yes, this is the right version of this, we should be keeping the is_closed in uverbs_hot_unplug_completion_event_file().. diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 50152c1b100452..4a9d84038762fb 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -281,25 +281,20 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue, spin_lock_irq(&ev_queue->lock); while (list_empty(&ev_queue->event_list)) { + bool closed = ev_queue->is_closed; + spin_unlock_irq(&ev_queue->lock); + if (closed) + return -EIO; if (filp->f_flags & O_NONBLOCK) return -EAGAIN; if (wait_event_interruptible(ev_queue->poll_wait, (!list_empty(&ev_queue->event_list) || - /* The barriers built into wait_event_interruptible() - * and wake_up() guarentee this will see the null set - * without using RCU - */ - !uverbs_file->device->ib_dev))) + ev_queue->is_closed))) return -ERESTARTSYS; - /* If device was disassociated and no event exists set an error */ - if (list_empty(&ev_queue->event_list) && - !uverbs_file->device->ib_dev) - return -EIO; - spin_lock_irq(&ev_queue->lock); } @@ -361,7 +356,7 @@ static __poll_t ib_uverbs_event_poll(struct ib_uverbs_event_queue *ev_queue, poll_wait(filp, &ev_queue->poll_wait, wait); spin_lock_irq(&ev_queue->lock); - if (!list_empty(&ev_queue->event_list)) + if (!list_empty(&ev_queue->event_list) || ev_queue->is_closed) pollflags = EPOLLIN | EPOLLRDNORM; spin_unlock_irq(&ev_queue->lock);
On Fri, Aug 31, 2018 at 07:16:03AM -0700, Steve Wise wrote: > Currently a uverbs completion event queue is flushed of events in > ib_uverbs_comp_event_close() with the queue spinlock held and then > released. Yet setting ev_queue->is_closed is not set until later in > uverbs_hot_unplug_completion_event_file(). > > In between the time ib_uverbs_comp_event_close() releases the lock > and uverbs_hot_unplug_completion_event_file() acquires the lock, a > completion event can arrive and be inserted into the event queue by > ib_uverbs_comp_handler(). > > This can cause a "double add" list_add warning or crash depending on > the kernel configuration, or a memory leak because the event is never > dequeued since the queue is already closed down. > > So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close(). > > Cc: stable@vger.kernel.org > Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema") > Signed-off-by: Steve Wise <swise@opengridcomputing.com> > --- > > Changes since v1: > > leave setting ev_queue.is_closed to 1 in > uverbs_hot_unplug_completion_event_file(). > > --- > drivers/infiniband/core/uverbs_main.c | 1 + > 1 file changed, 1 insertion(+) Applied to for-rc thanks Jason
On 9/12/2018 4:33 PM, Jason Gunthorpe wrote: > On Tue, Sep 11, 2018 at 05:18:32PM -0500, Steve Wise wrote: >> >> >> On 9/11/2018 4:25 PM, Jason Gunthorpe wrote: >>> On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote: >>>> Ignore the old date... >>>> >>>> Jason, is this good to go? >>> >>> I'm not sure, I think your old patch might have been the right >>> one.. But I haven't checked it all through again yet. >>> >>> But I'm still not sure if the kref should be allowed to work like >>> this at all.. >>> >>> Leaving a long term dangling filep in the object pointer is not how >>> this was really intended to work. >>> >>> Jason >>> >> >> >> Looking at the original commit (1e7710f3f656), >> ib_uverbs_async_event_close() sets is_closed, yet >> ib_uverbs_comp_event_close() does not. Seems like an oversight in that >> commit maybe? But as for the whole design of this part of the rdma >> core, I'm not to familiar with how it "should work" to comment further. >> I just saw the evidence in the dump and tracked down the issue. :) > > Hmm... > > Okay, the other reason this is so confusing is because the is_closed > in uverbs_hot_unplug_completion_event_file() is not doing what it > needs to do .. > > With the below it starts to make more sense and has sensible > locking.. Once the HW disassociates is_closed becomes set and poll > indicates readable and read returns EIO, which is the standard > semantic for this sort of case. > > So, yes, this is the right version of this, we should be keeping > the is_closed in uverbs_hot_unplug_completion_event_file().. > > diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c > index 50152c1b100452..4a9d84038762fb 100644 > --- a/drivers/infiniband/core/uverbs_main.c > +++ b/drivers/infiniband/core/uverbs_main.c > @@ -281,25 +281,20 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue, > spin_lock_irq(&ev_queue->lock); > > while (list_empty(&ev_queue->event_list)) { > + bool closed = ev_queue->is_closed; > + > spin_unlock_irq(&ev_queue->lock); > + if (closed) > + return -EIO; Shouldn't reading ev_queue->is_closed be done inside the lock? What if it is set immediately after reading it and before this code acquires the lock? > > if (filp->f_flags & O_NONBLOCK) > return -EAGAIN; > > if (wait_event_interruptible(ev_queue->poll_wait, > (!list_empty(&ev_queue->event_list) || > - /* The barriers built into wait_event_interruptible() > - * and wake_up() guarentee this will see the null set > - * without using RCU > - */ > - !uverbs_file->device->ib_dev))) > + ev_queue->is_closed))) > return -ERESTARTSYS; > > - /* If device was disassociated and no event exists set an error */ > - if (list_empty(&ev_queue->event_list) && > - !uverbs_file->device->ib_dev) > - return -EIO; > - > spin_lock_irq(&ev_queue->lock); > } > > @@ -361,7 +356,7 @@ static __poll_t ib_uverbs_event_poll(struct ib_uverbs_event_queue *ev_queue, > poll_wait(filp, &ev_queue->poll_wait, wait); > > spin_lock_irq(&ev_queue->lock); > - if (!list_empty(&ev_queue->event_list)) > + if (!list_empty(&ev_queue->event_list) || ev_queue->is_closed) > pollflags = EPOLLIN | EPOLLRDNORM; > spin_unlock_irq(&ev_queue->lock); > >
On Wed, Sep 12, 2018 at 05:04:11PM -0500, Steve Wise wrote: > > > On 9/12/2018 4:33 PM, Jason Gunthorpe wrote: > > On Tue, Sep 11, 2018 at 05:18:32PM -0500, Steve Wise wrote: > >> > >> > >> On 9/11/2018 4:25 PM, Jason Gunthorpe wrote: > >>> On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote: > >>>> Ignore the old date... > >>>> > >>>> Jason, is this good to go? > >>> > >>> I'm not sure, I think your old patch might have been the right > >>> one.. But I haven't checked it all through again yet. > >>> > >>> But I'm still not sure if the kref should be allowed to work like > >>> this at all.. > >>> > >>> Leaving a long term dangling filep in the object pointer is not how > >>> this was really intended to work. > >>> > >>> Jason > >>> > >> > >> > >> Looking at the original commit (1e7710f3f656), > >> ib_uverbs_async_event_close() sets is_closed, yet > >> ib_uverbs_comp_event_close() does not. Seems like an oversight in that > >> commit maybe? But as for the whole design of this part of the rdma > >> core, I'm not to familiar with how it "should work" to comment further. > >> I just saw the evidence in the dump and tracked down the issue. :) > > > > Hmm... > > > > Okay, the other reason this is so confusing is because the is_closed > > in uverbs_hot_unplug_completion_event_file() is not doing what it > > needs to do .. > > > > With the below it starts to make more sense and has sensible > > locking.. Once the HW disassociates is_closed becomes set and poll > > indicates readable and read returns EIO, which is the standard > > semantic for this sort of case. > > > > So, yes, this is the right version of this, we should be keeping > > the is_closed in uverbs_hot_unplug_completion_event_file().. > > > > diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c > > index 50152c1b100452..4a9d84038762fb 100644 > > +++ b/drivers/infiniband/core/uverbs_main.c > > @@ -281,25 +281,20 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue, > > spin_lock_irq(&ev_queue->lock); > > > > while (list_empty(&ev_queue->event_list)) { > > + bool closed = ev_queue->is_closed; > > + > > spin_unlock_irq(&ev_queue->lock); > > + if (closed) > > + return -EIO; > > Shouldn't reading ev_queue->is_closed be done inside the lock? It is is read inside the lock, isn't it? > What if it is set immediately after reading it and before this code > acquires the lock? If it is set after unlock then the wake_up_interruptible() sequence after the set will cause wait_event_interruptible() to wakeup/not sleep. Jason
On 9/12/2018 5:10 PM, Jason Gunthorpe wrote: > On Wed, Sep 12, 2018 at 05:04:11PM -0500, Steve Wise wrote: >> >> >> On 9/12/2018 4:33 PM, Jason Gunthorpe wrote: >>> On Tue, Sep 11, 2018 at 05:18:32PM -0500, Steve Wise wrote: >>>> >>>> >>>> On 9/11/2018 4:25 PM, Jason Gunthorpe wrote: >>>>> On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote: >>>>>> Ignore the old date... >>>>>> >>>>>> Jason, is this good to go? >>>>> >>>>> I'm not sure, I think your old patch might have been the right >>>>> one.. But I haven't checked it all through again yet. >>>>> >>>>> But I'm still not sure if the kref should be allowed to work like >>>>> this at all.. >>>>> >>>>> Leaving a long term dangling filep in the object pointer is not how >>>>> this was really intended to work. >>>>> >>>>> Jason >>>>> >>>> >>>> >>>> Looking at the original commit (1e7710f3f656), >>>> ib_uverbs_async_event_close() sets is_closed, yet >>>> ib_uverbs_comp_event_close() does not. Seems like an oversight in that >>>> commit maybe? But as for the whole design of this part of the rdma >>>> core, I'm not to familiar with how it "should work" to comment further. >>>> I just saw the evidence in the dump and tracked down the issue. :) >>> >>> Hmm... >>> >>> Okay, the other reason this is so confusing is because the is_closed >>> in uverbs_hot_unplug_completion_event_file() is not doing what it >>> needs to do .. >>> >>> With the below it starts to make more sense and has sensible >>> locking.. Once the HW disassociates is_closed becomes set and poll >>> indicates readable and read returns EIO, which is the standard >>> semantic for this sort of case. >>> >>> So, yes, this is the right version of this, we should be keeping >>> the is_closed in uverbs_hot_unplug_completion_event_file().. >>> >>> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c >>> index 50152c1b100452..4a9d84038762fb 100644 >>> +++ b/drivers/infiniband/core/uverbs_main.c >>> @@ -281,25 +281,20 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue, >>> spin_lock_irq(&ev_queue->lock); >>> >>> while (list_empty(&ev_queue->event_list)) { >>> + bool closed = ev_queue->is_closed; >>> + >>> spin_unlock_irq(&ev_queue->lock); >>> + if (closed) >>> + return -EIO; >> >> Shouldn't reading ev_queue->is_closed be done inside the lock? > > It is is read inside the lock, isn't it? Sorry, I misread it. :( > >> What if it is set immediately after reading it and before this code >> acquires the lock? > > If it is set after unlock then the wake_up_interruptible() sequence > after the set will cause wait_event_interruptible() to wakeup/not > sleep. > > Jason >
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 823beca448e1..b3ef7db68bde 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -440,6 +440,7 @@ static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp) list_del(&entry->obj_list); kfree(entry); } + file->ev_queue.is_closed = 1; spin_unlock_irq(&file->ev_queue.lock); uverbs_close_fd(filp);
Currently a uverbs completion event queue is flushed of events in ib_uverbs_comp_event_close() with the queue spinlock held and then released. Yet setting ev_queue->is_closed is not set until later in uverbs_hot_unplug_completion_event_file(). In between the time ib_uverbs_comp_event_close() releases the lock and uverbs_hot_unplug_completion_event_file() acquires the lock, a completion event can arrive and be inserted into the event queue by ib_uverbs_comp_handler(). This can cause a "double add" list_add warning or crash depending on the kernel configuration, or a memory leak because the event is never dequeued since the queue is already closed down. So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close(). Cc: stable@vger.kernel.org Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema") Signed-off-by: Steve Wise <swise@opengridcomputing.com> --- Changes since v1: leave setting ev_queue.is_closed to 1 in uverbs_hot_unplug_completion_event_file(). --- drivers/infiniband/core/uverbs_main.c | 1 + 1 file changed, 1 insertion(+)