diff mbox

bdi: Fix another oops in wb_workfn()

Message ID 314ae2e0-c873-04ce-9cd5-fe2acadaee26@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived
Headers show

Commit Message

Tetsuo Handa May 27, 2018, 2:21 a.m. UTC
From 8a8222698163d1fe180258566e9a3ff43f54fcd9 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sun, 27 May 2018 11:08:20 +0900
Subject: [PATCH] bdi: Fix another oops in wb_workfn()

syzbot is still hitting NULL pointer dereference at wb_workfn() [1].
This might be because we overlooked that delayed_work_timer_fn() does not
check WB_registered before calling __queue_work() while mod_delayed_work()
does not wait for already started delayed_work_timer_fn() because it uses
del_timer() rather than del_timer_sync().

Make wb_shutdown() be careful about wb_wakeup_delayed() path.

[1] https://syzkaller.appspot.com/bug?id=e0818ccb7e46190b3f1038b0c794299208ed4206

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
---
 mm/backing-dev.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Tejun Heo May 27, 2018, 2:36 a.m. UTC | #1
On Sun, May 27, 2018 at 11:21:25AM +0900, Tetsuo Handa wrote:
> From 8a8222698163d1fe180258566e9a3ff43f54fcd9 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Sun, 27 May 2018 11:08:20 +0900
> Subject: [PATCH] bdi: Fix another oops in wb_workfn()
> 
> syzbot is still hitting NULL pointer dereference at wb_workfn() [1].
> This might be because we overlooked that delayed_work_timer_fn() does not
> check WB_registered before calling __queue_work() while mod_delayed_work()
> does not wait for already started delayed_work_timer_fn() because it uses
> del_timer() rather than del_timer_sync().

It shouldn't be that as dwork timer is an irq safe timer.  Even if
that's the case, the right thing to do would be fixing workqueue
rather than reaching into workqueue internals from backing-dev code.

Thanks.
Tetsuo Handa May 27, 2018, 4:43 a.m. UTC | #2
Tejun Heo wrote:
> On Sun, May 27, 2018 at 11:21:25AM +0900, Tetsuo Handa wrote:
> > syzbot is still hitting NULL pointer dereference at wb_workfn() [1].
> > This might be because we overlooked that delayed_work_timer_fn() does not
> > check WB_registered before calling __queue_work() while mod_delayed_work()
> > does not wait for already started delayed_work_timer_fn() because it uses
> > del_timer() rather than del_timer_sync().
> 
> It shouldn't be that as dwork timer is an irq safe timer.  Even if
> that's the case, the right thing to do would be fixing workqueue
> rather than reaching into workqueue internals from backing-dev code.
> 

Do you think that there is possibility that __queue_work() is almost concurrently
executed from two CPUs, one from mod_delayed_work(bdi_wq, &wb->dwork, 0) from
wb_shutdown() path (which is called without spin_lock_bh(&wb->work_lock)) and
the other from delayed_work_timer_fn() path (which is called without checking
WB_registered bit under spin_lock_bh(&wb->work_lock)) ?
Tejun Heo May 29, 2018, 1:46 p.m. UTC | #3
On Sun, May 27, 2018 at 01:43:45PM +0900, Tetsuo Handa wrote:
> Tejun Heo wrote:
> > On Sun, May 27, 2018 at 11:21:25AM +0900, Tetsuo Handa wrote:
> > > syzbot is still hitting NULL pointer dereference at wb_workfn() [1].
> > > This might be because we overlooked that delayed_work_timer_fn() does not
> > > check WB_registered before calling __queue_work() while mod_delayed_work()
> > > does not wait for already started delayed_work_timer_fn() because it uses
> > > del_timer() rather than del_timer_sync().
> > 
> > It shouldn't be that as dwork timer is an irq safe timer.  Even if
> > that's the case, the right thing to do would be fixing workqueue
> > rather than reaching into workqueue internals from backing-dev code.
> > 
> 
> Do you think that there is possibility that __queue_work() is almost concurrently
> executed from two CPUs, one from mod_delayed_work(bdi_wq, &wb->dwork, 0) from
> wb_shutdown() path (which is called without spin_lock_bh(&wb->work_lock)) and
> the other from delayed_work_timer_fn() path (which is called without checking
> WB_registered bit under spin_lock_bh(&wb->work_lock)) ?

__queue_work() is gated by WORK_STRUCT_PENDING_BIT, so I don't see how
multiple instances would execute concurrently for the same work item.

Thanks.
diff mbox

Patch

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7441bd9..31e1d7e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -372,11 +372,24 @@  static void wb_shutdown(struct bdi_writeback *wb)
 
 	cgwb_remove_from_bdi_list(wb);
 	/*
+	 * mod_delayed_work() is not appropriate here, for
+	 * delayed_work_timer_fn() from wb_wakeup_delayed() does not check
+	 * WB_registered before calling __queue_work().
+	 */
+	del_timer_sync(&wb->dwork.timer);
+	/*
+	 * Clear WORK_STRUCT_PENDING_BIT in order to make sure that next
+	 * queue_delayed_work() actually enqueues this work to the tail, for
+	 * wb_wakeup_delayed() already set WORK_STRUCT_PENDING_BIT before
+	 * scheduling delayed_work_timer_fn().
+	 */
+	cancel_delayed_work_sync(&wb->dwork);
+	/*
 	 * Drain work list and shutdown the delayed_work.  !WB_registered
 	 * tells wb_workfn() that @wb is dying and its work_list needs to
 	 * be drained no matter what.
 	 */
-	mod_delayed_work(bdi_wq, &wb->dwork, 0);
+	queue_delayed_work(bdi_wq, &wb->dwork, 0);
 	flush_delayed_work(&wb->dwork);
 	WARN_ON(!list_empty(&wb->work_list));
 	/*