
[V2] IB/uverbs: Fix race between uverbs_close and remove_one

Message ID 20160307190833.GA1886@obsidianresearch.com (mailing list archive)
State Superseded
Headers show

Commit Message

Jason Gunthorpe March 7, 2016, 7:08 p.m. UTC
On Mon, Mar 07, 2016 at 04:44:33AM -0500, Devesh Sharma wrote:

> [67140.260665]  [<ffffffff810c16a0>] ? prepare_to_wait_event+0xf0/0xf0
> [67140.268337]  [<ffffffffa04cabc3>] ? ib_dereg_mr+0x23/0x30 [ib_core]

So, ib_dereg_mr at this point:

        ret = mr->device->dereg_mr(mr);

Is running when mr->device is already freed?

> During rmmod <vendor-driver> "ib_uverbs_close()" context is
> still running, while "ib_uverbs_remove_one()" context completes and
> ends up freeing ib_dev pointer, thus causing a Kernel Panic.

Hurm..

So ib_uverbs_close is busy running in ib_uverbs_cleanup_ucontext and
then ib_uverbs_free_hw_resources is called?

At first blush it certainly looks like the locking around
ib_uverbs_cleanup_ucontext is wrong.

> This patch fixes the race. ib_uverbs_close validates dev->ib_dev against NULL
> inside an srcu lock. If it is NULL, it waits for a completion and drops the srcu
> else continues with the normal flow.

Hum.. So this is really weird, this patch is basically duplicating a
mutex with srcu and a completion??

What is wrong with simply this:


Noting that ib_uverbs_free_hw_resources holds lists_mutex while
calling ib_uverbs_cleanup_ucontext, so it should be safe, or we have
another bug?

Certainly, the above is closer to the original intent of how this was
supposed to work...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Devesh Sharma March 8, 2016, 10:54 a.m. UTC | #1
On Tue, Mar 8, 2016 at 12:38 AM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Mon, Mar 07, 2016 at 04:44:33AM -0500, Devesh Sharma wrote:
>
>> [67140.260665]  [<ffffffff810c16a0>] ? prepare_to_wait_event+0xf0/0xf0
>> [67140.268337]  [<ffffffffa04cabc3>] ? ib_dereg_mr+0x23/0x30 [ib_core]
>
> So, ib_dereg_mr at this point:
>
>         ret = mr->device->dereg_mr(mr);
>
> Is running when mr->device is already freed?

Yes.

>
>> During rmmod <vendor-driver> "ib_uverbs_close()" context is
>> still running, while "ib_uverbs_remove_one()" context completes and
>> ends up freeing ib_dev pointer, thus causing a Kernel Panic.
>
> Hurm..
>
> So ib_uverbs_close is busy running in ib_uverbs_cleanup_ucontext and
> then ib_uverbs_free_hw_resources is called?

Yes, and it also completed, unblocking ib_unregister_device(), which
actually frees up the device pointer.

>
> At first blush it certainly looks like the locking around
> ib_uverbs_cleanup_ucontext is wrong.

I agree; it is called from both locations without any synchronization.

>
>> This patch fixes the race. ib_uverbs_close validates dev->ib_dev against NULL
>> inside an srcu lock. If it is NULL, it waits for a completion and drops the srcu
>> else continues with the normal flow.
>
> Hum.. So this is really weird, this patch is basically duplicating a
> mutex with srcu and a completion??

Agreed.

>
> What is wrong with simply this:
>
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -962,9 +962,9 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>                 list_del(&file->list);
>                 file->is_closed = 1;
>         }
> -       mutex_unlock(&file->device->lists_mutex);
>         if (ucontext)
>                 ib_uverbs_cleanup_ucontext(file, ucontext);
> +       mutex_unlock(&file->device->lists_mutex);
>
>
> ??

There is the following comment about lists_mutex in uverbs_main.c around
line number 1200:
/* We must release the mutex before going ahead and calling
 * disassociate_ucontext. disassociate_ucontext might end up
 * indirectly calling uverbs_close, for example due to freeing
 * the resources (e.g mmput).
 */

>
> Noting that ib_uverbs_free_hw_resources holds lists_mutex while
> calling ib_uverbs_cleanup_ucontext, so it should be safe, or we have
> another bug?

No, ib_uverbs_cleanup_ucontext is called outside the mutex from this
context: the code takes a reference on the file pointer from the list,
releases the mutex, then calls ib_uverbs_cleanup_ucontext. After the
call is done, the mutex is acquired again.

>
> Certainly, the above is closer to the original intent of how this was
> supposed to work...
>
> Jason
Yishai Hadas March 8, 2016, 2:33 p.m. UTC | #2
On 3/8/2016 12:54 PM, Devesh Sharma wrote:
>> What is wrong with simply this:
>>
>> --- a/drivers/infiniband/core/uverbs_main.c
>> +++ b/drivers/infiniband/core/uverbs_main.c
>> @@ -962,9 +962,9 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>>                  list_del(&file->list);
>>                  file->is_closed = 1;
>>          }
>> -       mutex_unlock(&file->device->lists_mutex);
>>          if (ucontext)
>>                  ib_uverbs_cleanup_ucontext(file, ucontext);
>> +       mutex_unlock(&file->device->lists_mutex);
>>
>>
>> ??
>
> There is the following comment about lists_mutex in uverbs_main.c around
> line number 1200:
> /* We must release the mutex before going ahead and calling
>   * disassociate_ucontext. disassociate_ucontext might end up
>   * indirectly calling uverbs_close, for example due to freeing
>   * the resources (e.g mmput).
>   */
>

Correct.

In addition, it's very *bad/incorrect* to call ib_uverbs_cleanup_ucontext 
under the lists mutex, as that will prevent parallel cleanups of different 
contexts; this may end up with soft lockups, since cleanup may involve FW 
commands and can take time.

The correct solution is as I described in my comments on V2; a V3 with 
those fixes is needed.




Jason Gunthorpe March 8, 2016, 5:53 p.m. UTC | #3
On Tue, Mar 08, 2016 at 04:24:51PM +0530, Devesh Sharma wrote:

> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -962,9 +962,9 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
> >                 list_del(&file->list);
> >                 file->is_closed = 1;
> >         }
> > -       mutex_unlock(&file->device->lists_mutex);
> >         if (ucontext)
> >                 ib_uverbs_cleanup_ucontext(file, ucontext);
> > +       mutex_unlock(&file->device->lists_mutex);
> >
> >
> > ??
> 
> There is the following comment about lists_mutex in uverbs_main.c around
> line number 1200:
> /* We must release the mutex before going ahead and calling
>  * disassociate_ucontext. disassociate_ucontext might end up
>  * indirectly calling uverbs_close, for example due to freeing
>  * the resources (e.g mmput).
>  */

Okay, now I remember this discussion, and being unhappy about this
during review.

However, this comment is talking about disassociate_ucontext; the bug
is with ib_uverbs_cleanup_ucontext. We can't re-enter uverbs_close
while we are already in uverbs_close, so that doesn't explain why it
cannot be called under the mutex.

So, Yishai, what is the problem with the above lock placement?

The only issue you raised was with multi-file close concurrency, and
that is trivially solved with another mutex.

I'd rather see another mutex added than this ugly ad-hoc
srcu/completion thing.

Jason
Yishai Hadas March 9, 2016, 4:48 p.m. UTC | #4
On 3/8/2016 7:53 PM, Jason Gunthorpe wrote:
> On Tue, Mar 08, 2016 at 04:24:51PM +0530, Devesh Sharma wrote:
>
>>> +++ b/drivers/infiniband/core/uverbs_main.c
>>> @@ -962,9 +962,9 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>>>                  list_del(&file->list);
>>>                  file->is_closed = 1;
>>>          }
>>> -       mutex_unlock(&file->device->lists_mutex);
>>>          if (ucontext)
>>>                  ib_uverbs_cleanup_ucontext(file, ucontext);
>>> +       mutex_unlock(&file->device->lists_mutex);
>>>
>>>
>>> ??
>>
>> There is the following comment about lists_mutex in uverbs_main.c around
>> line number 1200:
>> /* We must release the mutex before going ahead and calling
>>   * disassociate_ucontext. disassociate_ucontext might end up
>>   * indirectly calling uverbs_close, for example due to freeing
>>   * the resources (e.g mmput).
>>   */
>
> Okay, now I remember this discussion, and being unhappy about this
> during review.
>
> However, this comment is talking about disassociate_ucontext, the bug
> is with ib_uverbs_cleanup_ucontext. We can't re-enter uverbs_close
> while we are already in uverbs_close, so that doesn't explain why it
> cannot be in the mutex.
>
> So, Yishai, what is the problem with the above lock placement?
>
> The only issue you raised was with multi-file close concurrency, and
> that is trivially solved with another mutex.
>
> I'd rather see another mutex added than this ugly ad-hoc
> srcu/completion thing.

The srcu with the NULL check by itself can prevent the race; there is no 
need for the "completion" mechanism for that. ib_uverbs_free_hw_resources 
calls synchronize_srcu just after ib_dev is set to NULL as part of 
ib_uverbs_remove_one.

The reason for the extra "completion" that I suggested is to make sure 
that when an application returns from its close API the underlying 
resources were really freed; this is left open in the current code even 
if the race *wasn't* hit.

As we need to enable parallel closing, it seems to be the preferred way 
to go.

Devesh, can you send a V3 with the above suggestion to help people review 
it? If you have some other mutex-based solution that addresses the above 
points, please post it to the list for review.






Devesh Sharma March 10, 2016, 8:16 a.m. UTC | #5
On Wed, Mar 9, 2016 at 10:18 PM, Yishai Hadas
<yishaih@dev.mellanox.co.il> wrote:
> On 3/8/2016 7:53 PM, Jason Gunthorpe wrote:
>>
>> On Tue, Mar 08, 2016 at 04:24:51PM +0530, Devesh Sharma wrote:
>>
>>>> +++ b/drivers/infiniband/core/uverbs_main.c
>>>> @@ -962,9 +962,9 @@ static int ib_uverbs_close(struct inode *inode,
>>>> struct file *filp)
>>>>                  list_del(&file->list);
>>>>                  file->is_closed = 1;
>>>>          }
>>>> -       mutex_unlock(&file->device->lists_mutex);
>>>>          if (ucontext)
>>>>                  ib_uverbs_cleanup_ucontext(file, ucontext);
>>>> +       mutex_unlock(&file->device->lists_mutex);
>>>>
>>>>
>>>> ??
>>>
>>>
>>> There is the following comment about lists_mutex in uverbs_main.c around
>>> line number 1200:
>>> /* We must release the mutex before going ahead and calling
>>>   * disassociate_ucontext. disassociate_ucontext might end up
>>>   * indirectly calling uverbs_close, for example due to freeing
>>>   * the resources (e.g mmput).
>>>   */
>>
>>
>> Okay, now I remember this discussion, and being unhappy about this
>> during review.
>>
>> However, this comment is talking about disassociate_ucontext, the bug
>> is with ib_uverbs_cleanup_ucontext. We can't re-enter uverbs_close
>> while we are already in uverbs_close, so that doesn't explain why it
>> cannot be in the mutex.
>>
>> So, Yishai, what is the problem with the above lock placement?
>>
>> The only issue you raised was with multi-file close concurrency, and
>> that is trivially solved with another mutex.
>>
>> I'd rather see another mutex added than this ugly ad-hoc
>> srcu/completion thing.
>
>
> The srcu with NULL checking by itself can prevent the race, no need for the
> "completion" mechanism. ib_uverbs_free_hw_resources uses synchronize_srcu
> just after that ib_dev was set to NULL as part of ib_uverbs_remove_one.
>
> The reason for the extra "completion" that I suggested comes to make sure
> that when an application returns from its close API the underlying resources
> were really freed, this is open in current code even if the race *wasn't*
> hit.
>
> As we need to enable parallel closing it seems to be the preferred way to
> go.
>
> Devesh, can you send a V3 with the above suggestion to help people review
> it? If you have some other mutex-based solution that addresses the above
> points, please post it to the list for review.

Sure, I should be able to post it today.


Patch

--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -962,9 +962,9 @@  static int ib_uverbs_close(struct inode *inode, struct file *filp)
                list_del(&file->list);
                file->is_closed = 1;
        }
-       mutex_unlock(&file->device->lists_mutex);
        if (ucontext)
                ib_uverbs_cleanup_ucontext(file, ucontext);
+       mutex_unlock(&file->device->lists_mutex);
 

??