
[12/12] closures: fix a race on wakeup from closure_sync

Message ID 20190610191420.27007-13-kent.overstreet@gmail.com (mailing list archive)
State New, archived
Series [01/12] Compiler Attributes: add __flatten

Commit Message

Kent Overstreet June 10, 2019, 7:14 p.m. UTC
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 lib/closure.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Coly Li July 16, 2019, 10:47 a.m. UTC | #1
Hi Kent,

On 2019/6/11 3:14 a.m., Kent Overstreet wrote:
> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Coly Li <colyli@suse.de>

I have also received a report of a suspected closure race condition in
bcache, and people have asked for this patch to go into Linux v5.3.

So before this patch is merged upstream, I plan to rebase it onto
drivers/md/bcache/closure.c for now. Of course, you remain the author.

Once lib/closure.c is merged upstream, I will convert all closure usage
in bcache to lib/closure.{c,h}.

Thanks in advance.

Coly Li

> ---
>  lib/closure.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/closure.c b/lib/closure.c
> index 46cfe4c382..3e6366c262 100644
> --- a/lib/closure.c
> +++ b/lib/closure.c
> @@ -104,8 +104,14 @@ struct closure_syncer {
>  
>  static void closure_sync_fn(struct closure *cl)
>  {
> -	cl->s->done = 1;
> -	wake_up_process(cl->s->task);
> +	struct closure_syncer *s = cl->s;
> +	struct task_struct *p;
> +
> +	rcu_read_lock();
> +	p = READ_ONCE(s->task);
> +	s->done = 1;
> +	wake_up_process(p);
> +	rcu_read_unlock();
>  }
>  
>  void __sched __closure_sync(struct closure *cl)
>
Coly Li July 18, 2019, 7:46 a.m. UTC | #2
On 2019/7/16 6:47 p.m., Coly Li wrote:
> Hi Kent,
> 
> On 2019/6/11 3:14 a.m., Kent Overstreet wrote:
>> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
> Acked-by: Coly Li <colyli@suse.de>
> 
> I have also received a report of a suspected closure race condition in
> bcache, and people have asked for this patch to go into Linux v5.3.
> 
> So before this patch is merged upstream, I plan to rebase it onto
> drivers/md/bcache/closure.c for now. Of course, you remain the author.
> 
> Once lib/closure.c is merged upstream, I will convert all closure usage
> in bcache to lib/closure.{c,h}.

Hi Kent,

The reporter of the race bug tells me that the closure race is very hard
to reproduce, so after applying the patch and testing they are not sure
whether their closure race problem is fixed.

I also notice that rcu_read_lock()/rcu_read_unlock() is used here, but it
is not clear to me what the RCU read lock does in closure_sync_fn(). I
believe you have a reason for using RCU here; could you please give some
hints to help me understand the change better?

Thanks in advance.

Coly Li

>> ---
>>  lib/closure.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/closure.c b/lib/closure.c
>> index 46cfe4c382..3e6366c262 100644
>> --- a/lib/closure.c
>> +++ b/lib/closure.c
>> @@ -104,8 +104,14 @@ struct closure_syncer {
>>  
>>  static void closure_sync_fn(struct closure *cl)
>>  {
>> -	cl->s->done = 1;
>> -	wake_up_process(cl->s->task);
>> +	struct closure_syncer *s = cl->s;
>> +	struct task_struct *p;
>> +
>> +	rcu_read_lock();
>> +	p = READ_ONCE(s->task);
>> +	s->done = 1;
>> +	wake_up_process(p);
>> +	rcu_read_unlock();
>>  }
>>  
>>  void __sched __closure_sync(struct closure *cl)
Kent Overstreet July 22, 2019, 5:22 p.m. UTC | #3
On Thu, Jul 18, 2019 at 03:46:46PM +0800, Coly Li wrote:
> On 2019/7/16 6:47 p.m., Coly Li wrote:
> > Hi Kent,
> > 
> > On 2019/6/11 3:14 a.m., Kent Overstreet wrote:
> >> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
> > Acked-by: Coly Li <colyli@suse.de>
> > 
> > I have also received a report of a suspected closure race condition in
> > bcache, and people have asked for this patch to go into Linux v5.3.
> > 
> > So before this patch is merged upstream, I plan to rebase it onto
> > drivers/md/bcache/closure.c for now. Of course, you remain the author.
> > 
> > Once lib/closure.c is merged upstream, I will convert all closure usage
> > in bcache to lib/closure.{c,h}.
> 
> Hi Kent,
> 
> The reporter of the race bug tells me that the closure race is very hard
> to reproduce, so after applying the patch and testing they are not sure
> whether their closure race problem is fixed.
> 
> I also notice that rcu_read_lock()/rcu_read_unlock() is used here, but it
> is not clear to me what the RCU read lock does in closure_sync_fn(). I
> believe you have a reason for using RCU here; could you please give some
> hints to help me understand the change better?

The race happens when a thread using closure_sync() sees cl->s->done == 1 before
the thread calling closure_put() has called wake_up_process(). The waiting thread
can then return and exit just before wake_up_process() is called, so we end up
trying to wake a process that no longer exists.

rcu_read_lock() is sufficient to protect against this, since the process teardown
path goes through an RCU grace period before the task_struct is freed.
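
To make the ordering concrete, here is a rough userspace sketch of the same
pattern using pthreads. It is only an analogy, not the kernel code: struct
waiter, struct closure_demo, wake(), wait_for_done() and run_demo() are
made-up names, and a long-lived struct waiter stands in for the guarantee
that rcu_read_lock() gives for the task_struct. It also assumes that the
syncer lives on the waiting thread's stack, which this diff does not show.
The point it illustrates is that the waker snapshots the task pointer before
publishing done, and touches only that snapshot afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for task_struct. It outlives the syncer, which is
 * what the RCU read-side section is meant to guarantee in the kernel. */
struct waiter {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int wakeups;
};

/* Modelled on struct closure_syncer; assumed to live on the waiter's stack. */
struct syncer {
	struct waiter *task;
	atomic_int done;
};

struct closure_demo {
	struct syncer *s;
};

static void wake(struct waiter *w)
{
	pthread_mutex_lock(&w->lock);
	w->wakeups++;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Analogue of the patched closure_sync_fn(). */
static void *sync_fn(void *arg)
{
	struct closure_demo *cl = arg;
	struct syncer *s = cl->s;
	struct waiter *p = s->task;	/* snapshot while *s is still valid */

	atomic_store(&s->done, 1);	/* from here on *s may be gone */
	wake(p);			/* touches only p, never s */
	return NULL;
}

/* Analogue of __closure_sync(): returns as soon as done is observed. */
static void wait_for_done(struct closure_demo *cl, struct waiter *w,
			  pthread_t *waker)
{
	struct syncer s = { .task = w, .done = 0 };
	int rc;

	cl->s = &s;
	rc = pthread_create(waker, NULL, sync_fn, cl);
	assert(rc == 0);
	(void)rc;

	pthread_mutex_lock(&w->lock);
	while (!atomic_load(&s.done))
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}	/* s goes out of scope here, possibly before wake(p) has run */

/* Runs one waiter/waker pair and returns how many wakeups were delivered. */
static int run_demo(void)
{
	struct waiter w = { PTHREAD_MUTEX_INITIALIZER,
			    PTHREAD_COND_INITIALIZER, 0 };
	struct closure_demo cl = { 0 };
	pthread_t waker;

	wait_for_done(&cl, &w, &waker);
	pthread_join(waker, NULL);
	return w.wakeups;
}
```

Calling run_demo() should deliver exactly one wakeup, even when the waiter
has already returned by the time wake() runs. If sync_fn() instead read
s->task after storing done, it could read from a stack frame that no longer
exists.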

Patch

diff --git a/lib/closure.c b/lib/closure.c
index 46cfe4c382..3e6366c262 100644
--- a/lib/closure.c
+++ b/lib/closure.c
@@ -104,8 +104,14 @@  struct closure_syncer {
 
 static void closure_sync_fn(struct closure *cl)
 {
-	cl->s->done = 1;
-	wake_up_process(cl->s->task);
+	struct closure_syncer *s = cl->s;
+	struct task_struct *p;
+
+	rcu_read_lock();
+	p = READ_ONCE(s->task);
+	s->done = 1;
+	wake_up_process(p);
+	rcu_read_unlock();
 }
 
 void __sched __closure_sync(struct closure *cl)