
bdi: Fix oops in wb_workfn()

Message ID 20180503162626.27753-1-jack@suse.cz (mailing list archive)
State New, archived

Commit Message

Jan Kara May 3, 2018, 4:26 p.m. UTC
Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb->bdi->dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi, which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before the bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:

	mod_delayed_work(bdi_wq, &wb->dwork, 0);

and if this happens while wb_shutdown() is waiting in:

	flush_delayed_work(&wb->dwork);

the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb->bdi->dev.

Make wb_workfn() use wb_wakeup() for requeueing the work, which takes all
the necessary precautions against racing with bdi unregistration.

CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
CC: Tejun Heo <tj@kernel.org>
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
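The guard the fix relies on can be illustrated with a small userspace sketch. It assumes, as the description above implies, that wb_wakeup() tests WB_registered under wb->work_lock before requeueing and that wb_shutdown() clears that bit under the same lock. struct model_wb and the model_* functions are hypothetical stand-ins, not kernel code, and a C11 atomic_flag spinlock stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the guard used by wb_wakeup(): the
 * delayed work may only be (re)queued while a "registered" flag is set,
 * and that flag is tested and cleared under one lock. None of these
 * names are kernel APIs.
 */
struct model_wb {
	atomic_flag work_lock; /* stands in for wb->work_lock */
	bool registered;       /* stands in for the WB_registered bit */
	int queued;            /* how often the dwork would have been queued */
};

#define MODEL_WB_INIT { .work_lock = ATOMIC_FLAG_INIT, .registered = true, .queued = 0 }

static void model_lock(struct model_wb *wb)
{
	while (atomic_flag_test_and_set_explicit(&wb->work_lock, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_unlock(struct model_wb *wb)
{
	atomic_flag_clear_explicit(&wb->work_lock, memory_order_release);
}

/* Before the fix: wb_workfn() requeued with a bare mod_delayed_work(). */
static void model_requeue_unguarded(struct model_wb *wb)
{
	wb->queued++;
}

/* After the fix: like wb_wakeup(), queue only while still registered. */
static bool model_wakeup(struct model_wb *wb)
{
	bool queued = false;

	model_lock(wb);
	if (wb->registered) {
		wb->queued++;
		queued = true;
	}
	model_unlock(wb);
	return queued;
}

/* First step of wb_shutdown(): clear the flag under the lock so that no
 * later model_wakeup() can queue work. Returns false if already cleared. */
static bool model_shutdown(struct model_wb *wb)
{
	bool was_registered;

	model_lock(wb);
	was_registered = wb->registered;
	wb->registered = false;
	model_unlock(wb);
	return was_registered;
}

/* Requeue attempts after shutdown: only the unguarded path gets through. */
static void model_example(void)
{
	struct model_wb wb = MODEL_WB_INIT;
	bool ok;

	ok = model_wakeup(&wb);       /* still registered: work is queued */
	assert(ok);
	ok = model_shutdown(&wb);     /* unregistration clears the flag */
	assert(ok);
	ok = model_wakeup(&wb);       /* guarded requeue is now refused */
	assert(!ok);
	model_requeue_unguarded(&wb); /* the old code path still queues */
	assert(wb.queued == 2);       /* one legitimate queue, one stray */
}
```

The point of the design is that testing the flag and queueing happen under the same lock that shutdown takes to clear it, so a requeue either happens before shutdown starts (and is then covered by its flush) or not at all.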

Comments

Dave Chinner May 3, 2018, 9:55 p.m. UTC | #1
On Thu, May 03, 2018 at 06:26:26PM +0200, Jan Kara wrote:
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 47d7c151fcba..471d863958bc 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1961,7 +1961,7 @@ void wb_workfn(struct work_struct *work)
>  	}
>  
>  	if (!list_empty(&wb->work_list))
> -		mod_delayed_work(bdi_wq, &wb->dwork, 0);
> +		wb_wakeup(wb);
>  	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>  		wb_wakeup_delayed(wb);

Yup, looks fine - I can't see any more of these open-coded wakeups,
either, so we should be good here.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

As an aside, why is half the wb infrastructure in fs/fs-writeback.c
and the other half in mm/backing-dev.c? It seems pretty random as to
what is where, e.g. wb_wakeup() and wb_wakeup_delayed() are almost
identical, but are in completely different files...

Cheers,

Dave.
Jens Axboe May 3, 2018, 9:57 p.m. UTC | #2
On 5/3/18 3:55 PM, Dave Chinner wrote:
> As an aside, why is half the wb infrastructure in fs/fs-writeback.c
> and the other half in mm/backing-dev.c? It seems pretty random as to
> what is where, e.g. wb_wakeup() and wb_wakeup_delayed() are almost
> identical, but are in completely different files...

That's always bothered me too, it's due for a cleanup and bringing it
all into one location.
Tetsuo Handa May 3, 2018, 10:35 p.m. UTC | #3
Jan Kara wrote:
> Make wb_workfn() use wb_wakeup() for requeueing the work, which takes all
> the necessary precautions against racing with bdi unregistration.

Yes, this patch will solve the NULL pointer dereference bug. But is it OK to
leave wb->work_list non-empty (list_empty(&wb->work_list) == false)? Who
takes over the role of emptying it (making list_empty(&wb->work_list) == true)?

Just to double-check, since Fabiano Rosas is facing a problem where "write call
hangs in kernel space after virtio hot-remove" and thinks that we might
need to go in the opposite direction
( http://lkml.kernel.org/r/f0787b79-1e50-5f55-a400-44f715451777@linux.ibm.com ).
Jan Kara May 9, 2018, 9:47 a.m. UTC | #4
On Fri 04-05-18 07:35:34, Tetsuo Handa wrote:
> Jan Kara wrote:
> > Make wb_workfn() use wb_wakeup() for requeueing the work, which takes all
> > the necessary precautions against racing with bdi unregistration.
> 
> Yes, this patch will solve the NULL pointer dereference bug. But is it OK to
> leave wb->work_list non-empty (list_empty(&wb->work_list) == false)? Who
> takes over the role of emptying it (making list_empty(&wb->work_list) == true)?

That's a good question. The answer is that the last running instance of
wb_workfn() cannot exit with the work_list non-empty. Once WB_registered
is cleared, no new entries can be added to work_list. wb_shutdown() then
queues and flushes one last wb_workfn() run, which cleans up the list. The
NULL ptr deref was triggered not by this last wb_workfn() run but by one
running independently, in parallel with wb_shutdown(). So something like:

CPU0			CPU1			CPU2
wb_workfn()
  do {
    ...
  } while (!list_empty(&wb->work_list));
			wb_queue_work()
			  if (test_bit(WB_registered, &wb->state)) {
			    list_add_tail(&work->list, &wb->work_list);
			    mod_delayed_work(bdi_wq, &wb->dwork, 0);
			  }
						wb_shutdown()
						  if (!test_and_clear_bit(WB_registered, &wb->state)) {
						  ...
						  mod_delayed_work(bdi_wq, &wb->dwork, 0);
						  flush_delayed_work(&wb->dwork);
  if (!list_empty(&wb->work_list))
    mod_delayed_work(bdi_wq, &wb->dwork, 0); -> queues buggy work
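A hypothetical single-threaded replay of the interleaving above, assuming the steps run in exactly the order drawn, shows why only the unguarded requeue leaks a work item past wb_shutdown() and why the final flushed run still empties work_list. The replay_* names and fields are illustrative stand-ins for wb->work_list, WB_registered and queueing of wb->dwork, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical replay of the CPU0/CPU1/CPU2 diagram, one step at a time.
 * stray_queues counts requeues of the dwork issued by someone other than
 * wb_shutdown() after WB_registered was cleared: the "buggy work" that
 * wb_shutdown() cannot wait for.
 */
struct replay_wb {
	int work_items;   /* entries on wb->work_list */
	bool registered;  /* WB_registered */
	int stray_queues; /* dwork queued behind wb_shutdown()'s back */
};

/* CPU1: wb_queue_work() only adds work while registered. */
static void replay_queue_work(struct replay_wb *wb)
{
	if (wb->registered)
		wb->work_items++;
}

/* CPU0: the tail of wb_workfn(), with or without the fix. */
static void replay_workfn_tail(struct replay_wb *wb, bool use_wb_wakeup)
{
	if (wb->work_items == 0)
		return;
	if (use_wb_wakeup && !wb->registered)
		return; /* wb_wakeup() sees !WB_registered and does nothing */
	if (!wb->registered)
		wb->stray_queues++; /* bare mod_delayed_work() after shutdown began */
}

/* CPU2: the final wb_workfn() run queued and flushed by wb_shutdown();
 * with !WB_registered it drains the whole list. */
static void replay_final_drain(struct replay_wb *wb)
{
	while (wb->work_items > 0)
		wb->work_items--;    /* one item written back per iteration */
	assert(wb->work_items == 0); /* mirrors a WARN_ON(!list_empty()) check */
}

static int replay(bool use_wb_wakeup)
{
	/* CPU0: the do { } while (!list_empty()) loop has just exited. */
	struct replay_wb wb = { .work_items = 0, .registered = true, .stray_queues = 0 };

	replay_queue_work(&wb);                 /* CPU1: new work, still registered */
	wb.registered = false;                  /* CPU2: test_and_clear_bit() */
	replay_workfn_tail(&wb, use_wb_wakeup); /* CPU0: requeue check */
	replay_final_drain(&wb);                /* CPU2: flushed final run */
	return wb.stray_queues;
}
```

With the old tail, replay(false) reports one stray requeue; with the wb_wakeup() tail, replay(true) reports none, while the final run still leaves the list empty.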

> Just to double-check, since Fabiano Rosas is facing a problem where "write call
> hangs in kernel space after virtio hot-remove" and thinks that we might
> need to go in the opposite direction
> ( http://lkml.kernel.org/r/f0787b79-1e50-5f55-a400-44f715451777@linux.ibm.com ).

Yes, I'm aware of that report and I think it should be solved
differently than what Fabiano suggests.

								Honza
Jan Kara May 9, 2018, 9:48 a.m. UTC | #5
On Fri 04-05-18 07:55:58, Dave Chinner wrote:
> Yup, looks fine - I can't see any more of these open-coded wakeups,
> either, so we should be good here.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Thanks!

> As an aside, why is half the wb infrastructure in fs/fs-writeback.c
> and the other half in mm/backing-dev.c? It seems pretty random as to
> what is where, e.g. wb_wakeup() and wb_wakeup_delayed() are almost
> identical, but are in completely different files...

Yeah, it deserves a cleanup.

								Honza
Jan Kara May 9, 2018, 10:31 a.m. UTC | #6
Jens, can you please pick up this patch? Probably for the next merge window
(I don't see a reason to rush this at this point in the release cycle). Thanks!

								Honza

Jens Axboe May 9, 2018, 2:42 p.m. UTC | #7
On 5/9/18 4:31 AM, Jan Kara wrote:
> Jens, can you please pick up this patch? Probably for the next merge window
> (I don't see a reason to rush this at this point in the release cycle). Thanks!

Looks like I never replied to say so, but I did pick it up, and it
went out last week for this series. So we should be all good. I
didn't see a need to postpone it; it's obviously correct and fixes
a real issue.

Patch

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 47d7c151fcba..471d863958bc 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1961,7 +1961,7 @@  void wb_workfn(struct work_struct *work)
 	}
 
 	if (!list_empty(&wb->work_list))
-		mod_delayed_work(bdi_wq, &wb->dwork, 0);
+		wb_wakeup(wb);
 	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
 		wb_wakeup_delayed(wb);