[26/37] bcache: fix potential deadlock in cached_def_free()

When enable lockdep and reboot system with a writeback mode bcache
device, the following potential deadlock warning is reported by lockdep
engine.

[  101.536569][  T401] kworker/2:2/401 is trying to acquire lock:
[  101.538575][  T401] 00000000bbf6e6c7 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
[  101.542054][  T401]
[  101.542054][  T401] but task is already holding lock:
[  101.544587][  T401] 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
[  101.548386][  T401]
[  101.548386][  T401] which lock already depends on the new lock.
[  101.548386][  T401]
[  101.551874][  T401]
[  101.551874][  T401] the existing dependency chain (in reverse order) is:
[  101.555000][  T401]
[  101.555000][  T401] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
[  101.557860][  T401]        process_one_work+0x277/0x640
[  101.559661][  T401]        worker_thread+0x39/0x3f0
[  101.561340][  T401]        kthread+0x125/0x140
[  101.562963][  T401]        ret_from_fork+0x3a/0x50
[  101.564718][  T401]
[  101.564718][  T401] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
[  101.567701][  T401]        lock_acquire+0xb4/0x1c0
[  101.569651][  T401]        flush_workqueue+0xae/0x4c0
[  101.571494][  T401]        drain_workqueue+0xa9/0x180
[  101.573234][  T401]        destroy_workqueue+0x17/0x250
[  101.575109][  T401]        cached_dev_free+0x44/0x120 [bcache]
[  101.577304][  T401]        process_one_work+0x2a4/0x640
[  101.579357][  T401]        worker_thread+0x39/0x3f0
[  101.581055][  T401]        kthread+0x125/0x140
[  101.582709][  T401]        ret_from_fork+0x3a/0x50
[  101.584592][  T401]
[  101.584592][  T401] other info that might help us debug this:
[  101.584592][  T401]
[  101.588355][  T401]  Possible unsafe locking scenario:
[  101.588355][  T401]
[  101.590974][  T401]        CPU0                    CPU1
[  101.592889][  T401]        ----                    ----
[  101.594743][  T401]   lock((work_completion)(&cl->work)#2);
[  101.596785][  T401]                                lock((wq_completion)bcache_writeback_wq);
[  101.600072][  T401]                                lock((work_completion)(&cl->work)#2);
[  101.602971][  T401]   lock((wq_completion)bcache_writeback_wq);
[  101.605255][  T401]
[  101.605255][  T401]  *** DEADLOCK ***
[  101.605255][  T401]
[  101.608310][  T401] 2 locks held by kworker/2:2/401:
[  101.610208][  T401]  #0: 00000000cf2c7d17 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
[  101.613709][  T401]  #1: 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
[  101.617480][  T401]
[  101.617480][  T401] stack backtrace:
[  101.619539][  T401] CPU: 2 PID: 401 Comm: kworker/2:2 Tainted: G        W         5.2.0-rc4-lp151.20-default+ #1
[  101.623225][  T401] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[  101.627210][  T401] Workqueue: events cached_dev_free [bcache]
[  101.629239][  T401] Call Trace:
[  101.630360][  T401]  dump_stack+0x85/0xcb
[  101.631777][  T401]  print_circular_bug+0x19a/0x1f0
[  101.633485][  T401]  __lock_acquire+0x16cd/0x1850
[  101.635184][  T401]  ? __lock_acquire+0x6a8/0x1850
[  101.636863][  T401]  ? lock_acquire+0xb4/0x1c0
[  101.638421][  T401]  ? find_held_lock+0x34/0xa0
[  101.640015][  T401]  lock_acquire+0xb4/0x1c0
[  101.641513][  T401]  ? flush_workqueue+0x87/0x4c0
[  101.643248][  T401]  flush_workqueue+0xae/0x4c0
[  101.644832][  T401]  ? flush_workqueue+0x87/0x4c0
[  101.646476][  T401]  ? drain_workqueue+0xa9/0x180
[  101.648303][  T401]  drain_workqueue+0xa9/0x180
[  101.649867][  T401]  destroy_workqueue+0x17/0x250
[  101.651503][  T401]  cached_dev_free+0x44/0x120 [bcache]
[  101.653328][  T401]  process_one_work+0x2a4/0x640
[  101.655029][  T401]  worker_thread+0x39/0x3f0
[  101.656693][  T401]  ? process_one_work+0x640/0x640
[  101.658501][  T401]  kthread+0x125/0x140
[  101.660012][  T401]  ? kthread_create_worker_on_cpu+0x70/0x70
[  101.661985][  T401]  ret_from_fork+0x3a/0x50
[  101.691318][  T401] bcache: bcache_device_free() bcache0 stopped

Here is how the above potential deadlock may happen in reboot/shutdown
code path,
1) bcache_reboot() is called firstly in the reboot/shutdown code path,
   then in bcache_reboot(), bcache_device_stop() is called.
2) bcache_device_stop() sets BCACHE_DEV_CLOSING on d->falgs, then call
   closure_queue(&d->cl) to invoke cached_dev_flush(). And in turn
   cached_dev_flush() calls cached_dev_free() via closure_at()
3) In cached_dev_free(), after stopped writebach kthread
   dc->writeback_thread, the kwork dc->writeback_write_wq is stopping by
   destroy_workqueue().
4) Inside destroy_workqueue(), drain_workqueue() is called. Inside
   drain_workqueue(), flush_workqueue() is called. Then wq->lockdep_map
   is acquired by lock_map_acquire() in flush_workqueue(). After the
   lock acquired the rest part of flush_workqueue() just wait for the
   workqueue to complete.
5) Now we look back at writeback thread routine bch_writeback_thread(),
   in the main while-loop, write_dirty() is called via continue_at() in
   read_dirty_submit(), which is called via continue_at() in while-loop
   level called function read_dirty(). Inside write_dirty() it may be
   re-called on workqueeu dc->writeback_write_wq via continue_at().
   It means when the writeback kthread is stopped in cached_dev_free()
   there might be still one kworker queued on dc->writeback_write_wq
   to execute write_dirty() again.
6) Now this kworker is scheduled on dc->writeback_write_wq to run by
   process_one_work() (which is called by worker_thread()). Before
   calling the kwork routine, wq->lockdep_map is acquired.
7) But wq->lockdep_map is acquired already in step 4), so a A-A lock
   (lockdep terminology) scenario happens.

Indeed on multiple cores syatem, the above deadlock is very rare to
happen, just as the code comments in process_one_work() says,
2263     * AFAICT there is no possible deadlock scenario between the
2264     * flush_work() and complete() primitives (except for
	   single-threaded
2265     * workqueues), so hiding them isn't a problem.

But it is still good to fix such lockdep warning, even no one running
bcache on single core system.

The fix is simple. This patch solves the above potential deadlock by,
- Do not destroy workqueue dc->writeback_write_wq in cached_dev_free().
- Flush and destroy dc->writeback_write_wq in writebach kthread routine
  bch_writeback_thread(), where after quit the thread main while-loop
  and before cached_dev_put() is called.

By this fix, dc->writeback_write_wq will be stopped and destroy before
the writeback kthread stopped, so the chance for a A-A locking on
wq->lockdep_map is disappeared, such A-A deadlock won't happen
any more.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c     | 2 --
 drivers/md/bcache/writeback.c | 4 ++++
 2 files changed, 4 insertions(+), 2 deletions(-)

Message ID	20190628120000.40753-27-colyli@suse.de (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-block-owner@kernel.org> From: Coly Li <colyli@suse.de> To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li <colyli@suse.de> Subject: [PATCH 26/37] bcache: fix potential deadlock in cached_def_free() Date: Fri, 28 Jun 2019 19:59:49 +0800 Message-Id: <20190628120000.40753-27-colyli@suse.de> In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk
Series	bcache patches for Linux v5.3 \| expand [00/37] bcache patches for Linux v5.3 [01/37] bcache: don't set max writeback rate if gc is running [02/37] bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() [03/37] bcache: fix return value error in bch_journal_read() [04/37] Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()" [05/37] bcache: avoid flushing btree node in cache_set_flush() if io disabled [06/37] bcache: ignore read-ahead request failure on backing device [07/37] bcache: add io error counting in write_bdev_super_endio() [08/37] bcache: remove unnecessary prefetch() in bset_search_tree() [09/37] bcache: use sysfs_match_string() instead of __sysfs_match_string() [10/37] bcache: add return value check to bch_cached_dev_run() [11/37] bcache: remove unncessary code in bch_btree_keys_init() [12/37] bcache: check CACHE_SET_IO_DISABLE in allocator code [13/37] bcache: check CACHE_SET_IO_DISABLE bit in bch_journal() [14/37] bcache: more detailed error message to bcache_device_link() [15/37] bcache: add more error message in bch_cached_dev_attach() [16/37] bcache: improve error message in bch_cached_dev_run() [17/37] bcache: remove "XXX:" comment line from run_cache_set() [18/37] bcache: make bset_search_tree() be more understandable [19/37] bcache: add pendings_cleanup to stop pending bcache device [20/37] bcache: fix mistaken sysfs entry for io_error counter [21/37] bcache: destroy dc->writeback_write_wq if failed to create dc->writeback_thread [22/37] bcache: stop writeback kthread and kworker when bch_cached_dev_run() failed [23/37] bcache: avoid a deadlock in bcache_reboot() [24/37] bcache: acquire bch_register_lock later in cached_dev_detach_finish() [25/37] bcache: acquire bch_register_lock later in cached_dev_free() [26/37] bcache: fix potential deadlock in cached_def_free() [27/37] bcache: add code comments for journal_read_bucket() [28/37] bcache: set largest seq to ja->seq[bucket_index] in journal_read_bucket() [29/37] bcache: shrink btree node cache after bch_btree_check() [30/37] bcache: Revert "bcache: free heap cache_set->flush_btree in bch_journal_free" [31/37] bcache: Revert "bcache: fix high CPU occupancy during journal" [32/37] bcache: only clear BTREE_NODE_dirty bit when it is set [33/37] bcache: add comments for mutex_lock(&b->write_lock) [34/37] bcache: remove retry_flush_write from struct cache_set [35/37] bcache: fix race in btree_flush_write() [36/37] bcache: performance improvement for btree_flush_write() [37/37] bcache: add reclaimed_journal_buckets to struct cache_set

[26/37] bcache: fix potential deadlock in cached_def_free()

Commit Message

Patch