diff mbox

[1/1] bcache: use llist_for_each_entry_safe() in __closure_wake_up()

Message ID 20170926095412.74028-2-colyli@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Coly Li Sept. 26, 2017, 9:54 a.m. UTC
Commit 09b3efec ("bcache: Don't reinvent the wheel but use existing llist
API") replaces the following while loop by llist_for_each_entry(),

-
-	while (reverse) {
-		cl = container_of(reverse, struct closure, list);
-		reverse = llist_next(reverse);
-
+	llist_for_each_entry(cl, reverse, list) {
 		closure_set_waiting(cl, 0);
 		closure_sub(cl, CLOSURE_WAITING + 1);
 	}

This modification introduces a potential race by iterating a corrupted
list. Here is how it happens.

In the above modification, closure_sub() may wake up a process which is
waiting on reverse list. If this process decides to wait again by calling
closure_wait(), its cl->list will be added to another wait list. Then
when llist_for_each_entry() continues to iterate next node, it will travel
on another new wait list which is added in closure_wait(), not the
original reverse list in __closure_wake_up(). It is more probably to
happen on UP machine because the waked up process may preempt the process
which wakes up it.

Use llist_for_each_entry_safe() will fix the issue, the safe version fetch
next node before waking up a process. Then the copy of next node will make
sure list iteration stays on original reverse list.

Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Byungchul Park <byungchul.park@lge.com>
---
 drivers/md/bcache/closure.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Coly Li Sept. 27, 2017, 7:16 p.m. UTC | #1
Hi Jens,

Could you please take a look on this patch? It will be helpful if we can
have it in 4.14, then we can fix a bug introduced in 4.14-rc1.

This patch is reported by Michael Lyle, reviewed by Byungchul Park, and
finally verified by Michael Lyle after I posted the patch.

Many thanks in advance.

Coly Li

On 2017/9/26 下午5:54, Coly Li wrote:
> Commit 09b3efec ("bcache: Don't reinvent the wheel but use existing llist
> API") replaces the following while loop by llist_for_each_entry(),
> 
> -
> -	while (reverse) {
> -		cl = container_of(reverse, struct closure, list);
> -		reverse = llist_next(reverse);
> -
> +	llist_for_each_entry(cl, reverse, list) {
>  		closure_set_waiting(cl, 0);
>  		closure_sub(cl, CLOSURE_WAITING + 1);
>  	}
> 
> This modification introduces a potential race by iterating a corrupted
> list. Here is how it happens.
> 
> In the above modification, closure_sub() may wake up a process which is
> waiting on reverse list. If this process decides to wait again by calling
> closure_wait(), its cl->list will be added to another wait list. Then
> when llist_for_each_entry() continues to iterate next node, it will travel
> on another new wait list which is added in closure_wait(), not the
> original reverse list in __closure_wake_up(). It is more probably to
> happen on UP machine because the waked up process may preempt the process
> which wakes up it.
> 
> Use llist_for_each_entry_safe() will fix the issue, the safe version fetch
> next node before waking up a process. Then the copy of next node will make
> sure list iteration stays on original reverse list.
> 
> Signed-off-by: Coly Li <colyli@suse.de>
> Reported-by: Michael Lyle <mlyle@lyle.org>
> Reviewed-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  drivers/md/bcache/closure.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/bcache/closure.c b/drivers/md/bcache/closure.c
> index 7d5286b05036..1841d0359bac 100644
> --- a/drivers/md/bcache/closure.c
> +++ b/drivers/md/bcache/closure.c
> @@ -64,7 +64,7 @@ EXPORT_SYMBOL(closure_put);
>  void __closure_wake_up(struct closure_waitlist *wait_list)
>  {
>  	struct llist_node *list;
> -	struct closure *cl;
> +	struct closure *cl, *t;
>  	struct llist_node *reverse = NULL;
>  
>  	list = llist_del_all(&wait_list->list);
> @@ -73,7 +73,7 @@ void __closure_wake_up(struct closure_waitlist *wait_list)
>  	reverse = llist_reverse_order(list);
>  
>  	/* Then do the wakeups */
> -	llist_for_each_entry(cl, reverse, list) {
> +	llist_for_each_entry_safe(cl, t, reverse, list) {
>  		closure_set_waiting(cl, 0);
>  		closure_sub(cl, CLOSURE_WAITING + 1);
>  	}
>
Jens Axboe Sept. 27, 2017, 8:27 p.m. UTC | #2
On 09/27/2017 09:16 PM, Coly Li wrote:
> Hi Jens,
> 
> Could you please take a look on this patch? It will be helpful if we can
> have it in 4.14, then we can fix a bug introduced in 4.14-rc1.
> 
> This patch is reported by Michael Lyle, reviewed by Byungchul Park, and
> finally verified by Michael Lyle after I posted the patch.

It looks fine to me, I'll get it queued up. BTW, it's technically
a use-after-free bug, not a race condition.
Jens Axboe Sept. 27, 2017, 8:29 p.m. UTC | #3
On 09/26/2017 11:54 AM, Coly Li wrote:
> Commit 09b3efec ("bcache: Don't reinvent the wheel but use existing llist
> API") replaces the following while loop by llist_for_each_entry(),

If it's fixing a specific commit, please add a proper

Fixes: sha ("title")

line to your patch. That way automated scripts can pick it up, if they
backport the problematic patch.

Also, as mentioned in the other email, this is technically a
use-after-free problem, not a race.

Can you resend with commit message fixed up?
Michael Lyle Sept. 27, 2017, 8:48 p.m. UTC | #4
Jens--

I think it's a race condition-- the individual closures remain valid.
It's just that the list element has different meanings-- it's either a
list actively being used to wake, or a linkage on one of several lists
that is being used to await wake.  If a closure goes back to wait very
quickly after being woken, it can end up connecting its new wait-list
with the being-woken list.

Mike

On Wed, Sep 27, 2017 at 1:27 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 09/27/2017 09:16 PM, Coly Li wrote:
>> Hi Jens,
>>
>> Could you please take a look on this patch? It will be helpful if we can
>> have it in 4.14, then we can fix a bug introduced in 4.14-rc1.
>>
>> This patch is reported by Michael Lyle, reviewed by Byungchul Park, and
>> finally verified by Michael Lyle after I posted the patch.
>
> It looks fine to me, I'll get it queued up. BTW, it's technically
> a use-after-free bug, not a race condition.
>
> --
> Jens Axboe
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe Sept. 27, 2017, 8:52 p.m. UTC | #5
On 09/27/2017 10:48 PM, Michael Lyle wrote:
> Jens--
> 
> I think it's a race condition-- the individual closures remain valid.
> It's just that the list element has different meanings-- it's either a
> list actively being used to wake, or a linkage on one of several lists
> that is being used to await wake.  If a closure goes back to wait very
> quickly after being woken, it can end up connecting its new wait-list
> with the being-woken list.

Reading it a second time, looks like you are right. OK, then let's just
queue it up as-is, I'll add the Fixes line this time.
Coly Li Sept. 28, 2017, 12:46 a.m. UTC | #6
On 2017/9/28 上午4:52, Jens Axboe wrote:
> On 09/27/2017 10:48 PM, Michael Lyle wrote:
>> Jens--
>>
>> I think it's a race condition-- the individual closures remain valid.
>> It's just that the list element has different meanings-- it's either a
>> list actively being used to wake, or a linkage on one of several lists
>> that is being used to await wake.  If a closure goes back to wait very
>> quickly after being woken, it can end up connecting its new wait-list
>> with the being-woken list.
> 
> Reading it a second time, looks like you are right. OK, then let's just
> queue it up as-is, I'll add the Fixes line this time.
> 
Hi Jens,

Copied, next time for similar situation, Fixes will be there.

Thanks for doing this.
diff mbox

Patch

diff --git a/drivers/md/bcache/closure.c b/drivers/md/bcache/closure.c
index 7d5286b05036..1841d0359bac 100644
--- a/drivers/md/bcache/closure.c
+++ b/drivers/md/bcache/closure.c
@@ -64,7 +64,7 @@  EXPORT_SYMBOL(closure_put);
 void __closure_wake_up(struct closure_waitlist *wait_list)
 {
 	struct llist_node *list;
-	struct closure *cl;
+	struct closure *cl, *t;
 	struct llist_node *reverse = NULL;
 
 	list = llist_del_all(&wait_list->list);
@@ -73,7 +73,7 @@  void __closure_wake_up(struct closure_waitlist *wait_list)
 	reverse = llist_reverse_order(list);
 
 	/* Then do the wakeups */
-	llist_for_each_entry(cl, reverse, list) {
+	llist_for_each_entry_safe(cl, t, reverse, list) {
 		closure_set_waiting(cl, 0);
 		closure_sub(cl, CLOSURE_WAITING + 1);
 	}