diff mbox series

[v2] ib_uverbs: atomically flush and mark closed the comp event queue

Message ID 16d6916702fe746e08a7e9b20d90dc2fbfe92184.1536688044.git.swise@opengridcomputing.com (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Headers show
Series [v2] ib_uverbs: atomically flush and mark closed the comp event queue | expand

Commit Message

Steve Wise Aug. 31, 2018, 2:16 p.m. UTC
Currently a uverbs completion event queue is flushed of events in
ib_uverbs_comp_event_close() with the queue spinlock held and then
released.  Yet setting ev_queue->is_closed is not set until later in
uverbs_hot_unplug_completion_event_file().

In between the time ib_uverbs_comp_event_close() releases the lock
and uverbs_hot_unplug_completion_event_file() acquires the lock, a
completion event can arrive and be inserted into the event queue by
ib_uverbs_comp_handler().

This can cause a "double add" list_add warning or crash depending on
the kernel configuration, or a memory leak because the event is never
dequeued since the queue is already closed down.

So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().

Cc: stable@vger.kernel.org
Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

Changes since v1:

leave setting ev_queue.is_closed to 1 in
uverbs_hot_unplug_completion_event_file().

---
 drivers/infiniband/core/uverbs_main.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Steve Wise Sept. 11, 2018, 9:20 p.m. UTC | #1
Ignore the old date...

Jason, is this good to go?

Steve.

On 8/31/2018 9:16 AM, Steve Wise wrote:
> Currently a uverbs completion event queue is flushed of events in
> ib_uverbs_comp_event_close() with the queue spinlock held and then
> released.  Yet setting ev_queue->is_closed is not set until later in
> uverbs_hot_unplug_completion_event_file().
> 
> In between the time ib_uverbs_comp_event_close() releases the lock
> and uverbs_hot_unplug_completion_event_file() acquires the lock, a
> completion event can arrive and be inserted into the event queue by
> ib_uverbs_comp_handler().
> 
> This can cause a "double add" list_add warning or crash depending on
> the kernel configuration, or a memory leak because the event is never
> dequeued since the queue is already closed down.
> 
> So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().
> 
> Cc: stable@vger.kernel.org
> Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
> 
> Changes since v1:
> 
> leave setting ev_queue.is_closed to 1 in
> uverbs_hot_unplug_completion_event_file().
> 
> ---
>  drivers/infiniband/core/uverbs_main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 823beca448e1..b3ef7db68bde 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -440,6 +440,7 @@ static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp)
>  			list_del(&entry->obj_list);
>  		kfree(entry);
>  	}
> +	file->ev_queue.is_closed = 1;
>  	spin_unlock_irq(&file->ev_queue.lock);
>  
>  	uverbs_close_fd(filp);
>
Jason Gunthorpe Sept. 11, 2018, 9:25 p.m. UTC | #2
On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote:
> Ignore the old date...
> 
> Jason, is this good to go?

I'm not sure, I think your old patch might have been the right
one.. But I haven't checked it all through again yet.

But I'm still not sure if the kref should be allowed to work like
this at all..

Leaving a long term dangling filep in the object pointer is not how
this was really intended to work.

Jason
Steve Wise Sept. 11, 2018, 10:18 p.m. UTC | #3
On 9/11/2018 4:25 PM, Jason Gunthorpe wrote:
> On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote:
>> Ignore the old date...
>>
>> Jason, is this good to go?
> 
> I'm not sure, I think your old patch might have been the right
> one.. But I haven't checked it all through again yet.
> 
> But I'm still not sure if the kref should be allowed to work like
> this at all..
> 
> Leaving a long term dangling filep in the object pointer is not how
> this was really intended to work.
> 
> Jason
> 


Looking at the original commit (1e7710f3f656),
ib_uverbs_async_event_close() sets is_closed, yet
ib_uverbs_comp_event_close() does not. Seems like an oversight in that
commit maybe?  But as for the whole design of this part of the rdma
core, I'm not to familiar with how it "should work" to comment further.
I just saw the evidence in the dump and tracked down the issue. :)

This is certainly a regression from cxgb4's perspective:  These tests
past prior to commit 1e7710f3f656, I believe.

Steve
Jason Gunthorpe Sept. 12, 2018, 9:33 p.m. UTC | #4
On Tue, Sep 11, 2018 at 05:18:32PM -0500, Steve Wise wrote:
> 
> 
> On 9/11/2018 4:25 PM, Jason Gunthorpe wrote:
> > On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote:
> >> Ignore the old date...
> >>
> >> Jason, is this good to go?
> > 
> > I'm not sure, I think your old patch might have been the right
> > one.. But I haven't checked it all through again yet.
> > 
> > But I'm still not sure if the kref should be allowed to work like
> > this at all..
> > 
> > Leaving a long term dangling filep in the object pointer is not how
> > this was really intended to work.
> > 
> > Jason
> > 
> 
> 
> Looking at the original commit (1e7710f3f656),
> ib_uverbs_async_event_close() sets is_closed, yet
> ib_uverbs_comp_event_close() does not. Seems like an oversight in that
> commit maybe?  But as for the whole design of this part of the rdma
> core, I'm not to familiar with how it "should work" to comment further.
> I just saw the evidence in the dump and tracked down the issue. :)

Hmm...

Okay, the other reason this is so confusing is because the is_closed
in uverbs_hot_unplug_completion_event_file() is not doing what it
needs to do ..

With the below it starts to make more sense and has sensible
locking.. Once the HW disassociates is_closed becomes set and poll
indicates readable and read returns EIO, which is the standard
semantic for this sort of case.

So, yes, this is the right version of this, we should be keeping
the is_closed in uverbs_hot_unplug_completion_event_file()..

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 50152c1b100452..4a9d84038762fb 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -281,25 +281,20 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue,
 	spin_lock_irq(&ev_queue->lock);
 
 	while (list_empty(&ev_queue->event_list)) {
+		bool closed = ev_queue->is_closed;
+
 		spin_unlock_irq(&ev_queue->lock);
+		if (closed)
+			return -EIO;
 
 		if (filp->f_flags & O_NONBLOCK)
 			return -EAGAIN;
 
 		if (wait_event_interruptible(ev_queue->poll_wait,
 					     (!list_empty(&ev_queue->event_list) ||
-			/* The barriers built into wait_event_interruptible()
-			 * and wake_up() guarentee this will see the null set
-			 * without using RCU
-			 */
-					     !uverbs_file->device->ib_dev)))
+					      ev_queue->is_closed)))
 			return -ERESTARTSYS;
 
-		/* If device was disassociated and no event exists set an error */
-		if (list_empty(&ev_queue->event_list) &&
-		    !uverbs_file->device->ib_dev)
-			return -EIO;
-
 		spin_lock_irq(&ev_queue->lock);
 	}
 
@@ -361,7 +356,7 @@ static __poll_t ib_uverbs_event_poll(struct ib_uverbs_event_queue *ev_queue,
 	poll_wait(filp, &ev_queue->poll_wait, wait);
 
 	spin_lock_irq(&ev_queue->lock);
-	if (!list_empty(&ev_queue->event_list))
+	if (!list_empty(&ev_queue->event_list) || ev_queue->is_closed)
 		pollflags = EPOLLIN | EPOLLRDNORM;
 	spin_unlock_irq(&ev_queue->lock);
Jason Gunthorpe Sept. 12, 2018, 9:44 p.m. UTC | #5
On Fri, Aug 31, 2018 at 07:16:03AM -0700, Steve Wise wrote:
> Currently a uverbs completion event queue is flushed of events in
> ib_uverbs_comp_event_close() with the queue spinlock held and then
> released.  Yet setting ev_queue->is_closed is not set until later in
> uverbs_hot_unplug_completion_event_file().
> 
> In between the time ib_uverbs_comp_event_close() releases the lock
> and uverbs_hot_unplug_completion_event_file() acquires the lock, a
> completion event can arrive and be inserted into the event queue by
> ib_uverbs_comp_handler().
> 
> This can cause a "double add" list_add warning or crash depending on
> the kernel configuration, or a memory leak because the event is never
> dequeued since the queue is already closed down.
> 
> So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().
> 
> Cc: stable@vger.kernel.org
> Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
> 
> Changes since v1:
> 
> leave setting ev_queue.is_closed to 1 in
> uverbs_hot_unplug_completion_event_file().
> 
> ---
>  drivers/infiniband/core/uverbs_main.c | 1 +
>  1 file changed, 1 insertion(+)

Applied to for-rc thanks

Jason
Steve Wise Sept. 12, 2018, 10:04 p.m. UTC | #6
On 9/12/2018 4:33 PM, Jason Gunthorpe wrote:
> On Tue, Sep 11, 2018 at 05:18:32PM -0500, Steve Wise wrote:
>>
>>
>> On 9/11/2018 4:25 PM, Jason Gunthorpe wrote:
>>> On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote:
>>>> Ignore the old date...
>>>>
>>>> Jason, is this good to go?
>>>
>>> I'm not sure, I think your old patch might have been the right
>>> one.. But I haven't checked it all through again yet.
>>>
>>> But I'm still not sure if the kref should be allowed to work like
>>> this at all..
>>>
>>> Leaving a long term dangling filep in the object pointer is not how
>>> this was really intended to work.
>>>
>>> Jason
>>>
>>
>>
>> Looking at the original commit (1e7710f3f656),
>> ib_uverbs_async_event_close() sets is_closed, yet
>> ib_uverbs_comp_event_close() does not. Seems like an oversight in that
>> commit maybe?  But as for the whole design of this part of the rdma
>> core, I'm not to familiar with how it "should work" to comment further.
>> I just saw the evidence in the dump and tracked down the issue. :)
> 
> Hmm...
> 
> Okay, the other reason this is so confusing is because the is_closed
> in uverbs_hot_unplug_completion_event_file() is not doing what it
> needs to do ..
> 
> With the below it starts to make more sense and has sensible
> locking.. Once the HW disassociates is_closed becomes set and poll
> indicates readable and read returns EIO, which is the standard
> semantic for this sort of case.
> 
> So, yes, this is the right version of this, we should be keeping
> the is_closed in uverbs_hot_unplug_completion_event_file()..
> 
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 50152c1b100452..4a9d84038762fb 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -281,25 +281,20 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue,
>  	spin_lock_irq(&ev_queue->lock);
>  
>  	while (list_empty(&ev_queue->event_list)) {
> +		bool closed = ev_queue->is_closed;
> +
>  		spin_unlock_irq(&ev_queue->lock);
> +		if (closed)
> +			return -EIO;

Shouldn't reading ev_queue->is_closed be done inside the lock?  What if
it is set immediately after reading it and before this code acquires the
lock?



>  
>  		if (filp->f_flags & O_NONBLOCK)
>  			return -EAGAIN;
>  
>  		if (wait_event_interruptible(ev_queue->poll_wait,
>  					     (!list_empty(&ev_queue->event_list) ||
> -			/* The barriers built into wait_event_interruptible()
> -			 * and wake_up() guarentee this will see the null set
> -			 * without using RCU
> -			 */
> -					     !uverbs_file->device->ib_dev)))
> +					      ev_queue->is_closed)))
>  			return -ERESTARTSYS;
>  
> -		/* If device was disassociated and no event exists set an error */
> -		if (list_empty(&ev_queue->event_list) &&
> -		    !uverbs_file->device->ib_dev)
> -			return -EIO;
> -
>  		spin_lock_irq(&ev_queue->lock);
>  	}
>  
> @@ -361,7 +356,7 @@ static __poll_t ib_uverbs_event_poll(struct ib_uverbs_event_queue *ev_queue,
>  	poll_wait(filp, &ev_queue->poll_wait, wait);
>  
>  	spin_lock_irq(&ev_queue->lock);
> -	if (!list_empty(&ev_queue->event_list))
> +	if (!list_empty(&ev_queue->event_list) || ev_queue->is_closed)
>  		pollflags = EPOLLIN | EPOLLRDNORM;
>  	spin_unlock_irq(&ev_queue->lock);
>  
>
Jason Gunthorpe Sept. 12, 2018, 10:10 p.m. UTC | #7
On Wed, Sep 12, 2018 at 05:04:11PM -0500, Steve Wise wrote:
> 
> 
> On 9/12/2018 4:33 PM, Jason Gunthorpe wrote:
> > On Tue, Sep 11, 2018 at 05:18:32PM -0500, Steve Wise wrote:
> >>
> >>
> >> On 9/11/2018 4:25 PM, Jason Gunthorpe wrote:
> >>> On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote:
> >>>> Ignore the old date...
> >>>>
> >>>> Jason, is this good to go?
> >>>
> >>> I'm not sure, I think your old patch might have been the right
> >>> one.. But I haven't checked it all through again yet.
> >>>
> >>> But I'm still not sure if the kref should be allowed to work like
> >>> this at all..
> >>>
> >>> Leaving a long term dangling filep in the object pointer is not how
> >>> this was really intended to work.
> >>>
> >>> Jason
> >>>
> >>
> >>
> >> Looking at the original commit (1e7710f3f656),
> >> ib_uverbs_async_event_close() sets is_closed, yet
> >> ib_uverbs_comp_event_close() does not. Seems like an oversight in that
> >> commit maybe?  But as for the whole design of this part of the rdma
> >> core, I'm not to familiar with how it "should work" to comment further.
> >> I just saw the evidence in the dump and tracked down the issue. :)
> > 
> > Hmm...
> > 
> > Okay, the other reason this is so confusing is because the is_closed
> > in uverbs_hot_unplug_completion_event_file() is not doing what it
> > needs to do ..
> > 
> > With the below it starts to make more sense and has sensible
> > locking.. Once the HW disassociates is_closed becomes set and poll
> > indicates readable and read returns EIO, which is the standard
> > semantic for this sort of case.
> > 
> > So, yes, this is the right version of this, we should be keeping
> > the is_closed in uverbs_hot_unplug_completion_event_file()..
> > 
> > diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> > index 50152c1b100452..4a9d84038762fb 100644
> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -281,25 +281,20 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue,
> >  	spin_lock_irq(&ev_queue->lock);
> >  
> >  	while (list_empty(&ev_queue->event_list)) {
> > +		bool closed = ev_queue->is_closed;
> > +
> >  		spin_unlock_irq(&ev_queue->lock);
> > +		if (closed)
> > +			return -EIO;
> 
> Shouldn't reading ev_queue->is_closed be done inside the lock?  

It is is read inside the lock, isn't it?

> What if it is set immediately after reading it and before this code
> acquires the lock?

If it is set after unlock then the wake_up_interruptible() sequence
after the set will cause wait_event_interruptible() to wakeup/not
sleep.

Jason
Steve Wise Sept. 12, 2018, 10:12 p.m. UTC | #8
On 9/12/2018 5:10 PM, Jason Gunthorpe wrote:
> On Wed, Sep 12, 2018 at 05:04:11PM -0500, Steve Wise wrote:
>>
>>
>> On 9/12/2018 4:33 PM, Jason Gunthorpe wrote:
>>> On Tue, Sep 11, 2018 at 05:18:32PM -0500, Steve Wise wrote:
>>>>
>>>>
>>>> On 9/11/2018 4:25 PM, Jason Gunthorpe wrote:
>>>>> On Tue, Sep 11, 2018 at 04:20:50PM -0500, Steve Wise wrote:
>>>>>> Ignore the old date...
>>>>>>
>>>>>> Jason, is this good to go?
>>>>>
>>>>> I'm not sure, I think your old patch might have been the right
>>>>> one.. But I haven't checked it all through again yet.
>>>>>
>>>>> But I'm still not sure if the kref should be allowed to work like
>>>>> this at all..
>>>>>
>>>>> Leaving a long term dangling filep in the object pointer is not how
>>>>> this was really intended to work.
>>>>>
>>>>> Jason
>>>>>
>>>>
>>>>
>>>> Looking at the original commit (1e7710f3f656),
>>>> ib_uverbs_async_event_close() sets is_closed, yet
>>>> ib_uverbs_comp_event_close() does not. Seems like an oversight in that
>>>> commit maybe?  But as for the whole design of this part of the rdma
>>>> core, I'm not to familiar with how it "should work" to comment further.
>>>> I just saw the evidence in the dump and tracked down the issue. :)
>>>
>>> Hmm...
>>>
>>> Okay, the other reason this is so confusing is because the is_closed
>>> in uverbs_hot_unplug_completion_event_file() is not doing what it
>>> needs to do ..
>>>
>>> With the below it starts to make more sense and has sensible
>>> locking.. Once the HW disassociates is_closed becomes set and poll
>>> indicates readable and read returns EIO, which is the standard
>>> semantic for this sort of case.
>>>
>>> So, yes, this is the right version of this, we should be keeping
>>> the is_closed in uverbs_hot_unplug_completion_event_file()..
>>>
>>> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
>>> index 50152c1b100452..4a9d84038762fb 100644
>>> +++ b/drivers/infiniband/core/uverbs_main.c
>>> @@ -281,25 +281,20 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue,
>>>  	spin_lock_irq(&ev_queue->lock);
>>>  
>>>  	while (list_empty(&ev_queue->event_list)) {
>>> +		bool closed = ev_queue->is_closed;
>>> +
>>>  		spin_unlock_irq(&ev_queue->lock);
>>> +		if (closed)
>>> +			return -EIO;
>>
>> Shouldn't reading ev_queue->is_closed be done inside the lock?  
> 
> It is is read inside the lock, isn't it?

Sorry, I misread it. :(

> 
>> What if it is set immediately after reading it and before this code
>> acquires the lock?
> 
> If it is set after unlock then the wake_up_interruptible() sequence
> after the set will cause wait_event_interruptible() to wakeup/not
> sleep.
> 
> Jason
>
diff mbox series

Patch

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 823beca448e1..b3ef7db68bde 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -440,6 +440,7 @@  static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp)
 			list_del(&entry->obj_list);
 		kfree(entry);
 	}
+	file->ev_queue.is_closed = 1;
 	spin_unlock_irq(&file->ev_queue.lock);
 
 	uverbs_close_fd(filp);