[3/3] block: model freeze & enter queue as lock for supporting lockdep

Recently we got several deadlock reports[1][2][3] caused by
blk_mq_freeze_queue() and blk_queue_enter().

It turns out the two behave just like acquiring a read/write lock, so model
them as a read/write lock to support lockdep:

1) model q->q_usage_counter as two locks (io lock and queue lock)

- the queue lock covers sync with blk_queue_enter()

- the io lock covers sync with bio_queue_enter()

2) make the lockdep class/key per-queue:

- different subsystems have very different lock usage patterns, so a shared
  lock class easily causes false positives

- freeze_queue degrades to no lock when the disk state becomes DEAD, because
  bio_queue_enter() won't be blocked any more

- freeze_queue degrades to no lock when the request queue becomes dying,
  because blk_queue_enter() won't be blocked any more

3) model blk_mq_freeze_queue() as acquire_exclusive & try_lock

- it is an exclusive lock, so the dependency with blk_queue_enter() is
  covered

- it is a trylock because blk_mq_freeze_queue() calls are allowed to run
  concurrently

4) model blk_queue_enter() & bio_queue_enter() as acquire_read()

- nested blk_queue_enter() calls are allowed

- the dependency with blk_mq_freeze_queue() is covered

- blk_queue_exit() is often called from other contexts (such as irq), where
  it can't be annotated as lock_release(), so simply do the release in
  blk_queue_enter(); this still covers as many cases as possible
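
The four modelling rules above can be sketched in userspace C. This is only
an illustration under stated assumptions: struct sketch_map, struct
sketch_queue and every sketch_* function are hypothetical names standing in
for the kernel's lockdep maps and annotations. They merely count events; they
do not reproduce lockdep or the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a lockdep map; not the kernel type. */
struct sketch_map {
	const char *name;
	int held;          /* freeze-side annotations currently held */
	int excl_trylocks; /* exclusive trylock acquisitions recorded */
	int read_acquires; /* shared (read) acquisitions recorded */
};

/* Rules 1 and 2: two maps per queue, owned by the queue itself. */
struct sketch_queue {
	struct sketch_map io_map;    /* sync with bio_queue_enter() */
	struct sketch_map queue_map; /* sync with blk_queue_enter() */
};

static void sketch_queue_init(struct sketch_queue *q)
{
	*q = (struct sketch_queue){
		.io_map    = { .name = "q_usage_counter(io)" },
		.queue_map = { .name = "q_usage_counter(queue)" },
	};
}

static void sketch_excl_trylock(struct sketch_map *m)
{
	m->excl_trylocks++;
	m->held++; /* trylock: concurrent freezers may stack up */
}

static void sketch_release(struct sketch_map *m)
{
	assert(m->held > 0);
	m->held--;
}

/*
 * Rule 3: freeze is an exclusive trylock. A dead disk or a dying queue
 * no longer blocks the matching enter path, so that map is skipped.
 */
static void sketch_freeze_acquire(struct sketch_queue *q, bool disk_dead,
				  bool queue_dying)
{
	if (!disk_dead)
		sketch_excl_trylock(&q->io_map);
	if (!queue_dying)
		sketch_excl_trylock(&q->queue_map);
}

/* Must be called with the same flags as the matching freeze. */
static void sketch_unfreeze_release(struct sketch_queue *q, bool disk_dead,
				    bool queue_dying)
{
	if (!disk_dead)
		sketch_release(&q->io_map);
	if (!queue_dying)
		sketch_release(&q->queue_map);
}

/*
 * Rule 4: enter is a read acquire that is released immediately, because
 * the real exit may run in another context such as irq. Only the
 * dependency is recorded; nothing stays held afterwards.
 */
static void sketch_enter(struct sketch_map *m)
{
	m->read_acquires++;
}

static void sketch_blk_queue_enter(struct sketch_queue *q)
{
	sketch_enter(&q->queue_map);
}

static void sketch_bio_queue_enter(struct sketch_queue *q)
{
	sketch_enter(&q->io_map);
}
```

In this sketch, two back-to-back sketch_freeze_acquire() calls both succeed,
which mirrors the trylock choice, and freezing with disk_dead set touches
only the queue map.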

With lockdep support, such deadlocks can be reported as soon as the
problematic dependency is seen, without waiting until the real deadlock is
triggered.

For example, the following lockdep report can be triggered for the case in
report[3].

[   31.671822] ======================================================
[   31.673169] WARNING: possible circular locking dependency detected
[   31.674456] 6.11.0_nbd+ #411 Not tainted
[   31.675220] ------------------------------------------------------
[   31.676379] bash/1425 is trying to acquire lock:
[   31.676861] ffff990b8ea27530 (&q->limits_lock){+.+.}-{3:3}, at: queue_wc_store+0x8e/0x180
[   31.677268]
               but task is already holding lock:
[   31.677548] ffff990b8ea27410 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x75/0xc0
[   31.677931]
               which lock already depends on the new lock.

[   31.678315]
               the existing dependency chain (in reverse order) is:
[   31.678664]
               -> #2 (&q->sysfs_lock){+.+.}-{3:3}:
[   31.678951]        __mutex_lock+0xad/0xb20
[   31.679157]        queue_attr_store+0x75/0xc0
[   31.679366]        kernfs_fop_write_iter+0x15c/0x210
[   31.679608]        vfs_write+0x2a7/0x540
[   31.679801]        ksys_write+0x75/0x100
[   31.679999]        do_syscall_64+0x95/0x180
[   31.680209]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   31.680488]
               -> #1 (&q->q_usage_counter(queue)#2){++++}-{0:0}:
[   31.680839]        blk_queue_enter+0x195/0x1d0
[   31.681060]        blk_mq_alloc_request+0x136/0x2d0
[   31.681301]        scsi_execute_cmd+0x9c/0x4c0
[   31.681528]        read_capacity_16+0x116/0x410
[   31.681765]        sd_revalidate_disk.isra.0+0x54d/0x2f00
[   31.682044]        sd_probe+0x2ec/0x520
[   31.682238]        really_probe+0xd3/0x390
[   31.682445]        __driver_probe_device+0x78/0x150
[   31.682682]        driver_probe_device+0x1f/0x90
[   31.682908]        __device_attach_driver+0x89/0x110
[   31.683161]        bus_for_each_drv+0x95/0xf0
[   31.683377]        __device_attach_async_helper+0xa7/0xf0
[   31.683639]        async_run_entry_fn+0x31/0x130
[   31.683875]        process_one_work+0x212/0x700
[   31.684100]        worker_thread+0x1ce/0x380
[   31.684308]        kthread+0xd2/0x110
[   31.684490]        ret_from_fork+0x31/0x50
[   31.684700]        ret_from_fork_asm+0x1a/0x30
[   31.684922]
               -> #0 (&q->limits_lock){+.+.}-{3:3}:
[   31.685499]        __lock_acquire+0x15c0/0x23e0
[   31.685872]        lock_acquire+0xd8/0x300
[   31.686207]        __mutex_lock+0xad/0xb20
[   31.686535]        queue_wc_store+0x8e/0x180
[   31.686877]        queue_attr_store+0x84/0xc0
[   31.687231]        kernfs_fop_write_iter+0x15c/0x210
[   31.687594]        vfs_write+0x2a7/0x540
[   31.687907]        ksys_write+0x75/0x100
[   31.688219]        do_syscall_64+0x95/0x180
[   31.688534]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   31.688910]
               other info that might help us debug this:

[   31.689621] Chain exists of:
                 &q->limits_lock --> &q->q_usage_counter(queue)#2 --> &q->sysfs_lock

[   31.690549]  Possible unsafe locking scenario:

[   31.691060]        CPU0                    CPU1
[   31.691389]        ----                    ----
[   31.691716]   lock(&q->sysfs_lock);
[   31.691999]                                lock(&q->q_usage_counter(queue)#2);
[   31.692460]                                lock(&q->sysfs_lock);
[   31.692863]   lock(&q->limits_lock);
[   31.693155]
                *** DEADLOCK ***

[   31.693746] 6 locks held by bash/1425:
[   31.694043]  #0: ffff990b8007e420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x75/0x100
[   31.694543]  #1: ffff990bcf1a3288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x115/0x210
[   31.695119]  #2: ffff990b91888378 (kn->active#166){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x11e/0x210
[   31.695685]  #3: ffff990b8ea26ee8 (&q->q_usage_counter(io)#2){++++}-{0:0}, at: queue_attr_store+0x60/0xc0
[   31.696269]  #4: ffff990b8ea26f20 (&q->q_usage_counter(queue)#2){++++}-{0:0}, at: queue_attr_store+0x60/0xc0
[   31.696846]  #5: ffff990b8ea27410 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x75/0xc0
[   31.697381]
               stack backtrace:
[   31.697826] CPU: 9 UID: 0 PID: 1425 Comm: bash Not tainted 6.11.0_nbd+ #411
[   31.698285] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014
[   31.698807] Call Trace:
[   31.699058]  <TASK>
[   31.699289]  dump_stack_lvl+0x93/0xf0
[   31.699598]  print_circular_bug+0x26e/0x340
[   31.699924]  check_noncircular+0x16c/0x190
[   31.700251]  ? lock_acquire+0x2a1/0x300
[   31.700561]  __lock_acquire+0x15c0/0x23e0
[   31.700877]  lock_acquire+0xd8/0x300
[   31.701181]  ? queue_wc_store+0x8e/0x180
[   31.701502]  __mutex_lock+0xad/0xb20
[   31.701806]  ? queue_wc_store+0x8e/0x180
[   31.702128]  ? queue_wc_store+0x8e/0x180
[   31.702446]  ? queue_wc_store+0x8e/0x180
[   31.702761]  queue_wc_store+0x8e/0x180
[   31.703084]  ? __mutex_lock+0xad/0xb20
[   31.703385]  ? __mutex_lock+0x6e4/0xb20
[   31.703691]  ? mark_held_locks+0x40/0x70
[   31.704004]  ? queue_attr_store+0x75/0xc0
[   31.704317]  queue_attr_store+0x84/0xc0
[   31.704643]  kernfs_fop_write_iter+0x15c/0x210
[   31.704987]  vfs_write+0x2a7/0x540
[   31.705274]  ksys_write+0x75/0x100
[   31.705559]  do_syscall_64+0x95/0x180
[   31.705864]  ? do_user_addr_fault+0x361/0x790
[   31.706239]  ? trace_hardirqs_off+0x4b/0xc0
[   31.706564]  ? clear_bhb_loop+0x25/0x80
[   31.706966]  ? clear_bhb_loop+0x25/0x80
[   31.707272]  ? clear_bhb_loop+0x25/0x80
[   31.707568]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   31.707927] RIP: 0033:0x7fb69d85e174
[   31.708227] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 6d b4 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[   31.709343] RSP: 002b:00007ffed933fb48 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   31.709834] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb69d85e174
[   31.710311] RDX: 0000000000000005 RSI: 000055713d6fb7d0 RDI: 0000000000000001
[   31.710779] RBP: 00007ffed933fb70 R08: 0000000000000073 R09: 0000000000000001
[   31.711263] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000005
[   31.711755] R13: 000055713d6fb7d0 R14: 00007fb69d932780 R15: 0000000000000005
[   31.712242]  </TASK>
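
To read the "Chain exists of" part of the report above: the recorded order
is limits_lock --> q_usage_counter(queue) --> sysfs_lock, and the task asks
for limits_lock while holding sysfs_lock, which would close the loop. The toy
reachability check below shows the idea. It is a minimal sketch, not
lockdep's implementation; the enum values and function names are
hypothetical, and only the three lock classes from this report are modelled.

```c
#include <stdbool.h>

/* The three lock classes named in the report's dependency chain. */
enum { LIMITS_LOCK, Q_USAGE_QUEUE, SYSFS_LOCK, NR_CLASSES };

/* dep[a][b] means "b was acquired while a was held". */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reaches(next, to, seen))
			return true;
	return false;
}

/* True if taking 'wanted' while holding 'held' would close a cycle. */
static bool would_deadlock(int held, int wanted)
{
	bool seen[NR_CLASSES] = { false };

	return reaches(wanted, held, seen);
}

/* Record the chain printed by the report. */
static void record_chain_from_report(void)
{
	dep[LIMITS_LOCK][Q_USAGE_QUEUE] = true;
	dep[Q_USAGE_QUEUE][SYSFS_LOCK] = true;
}
```

With the chain recorded, asking for LIMITS_LOCK while holding SYSFS_LOCK is
flagged, while the opposite order is not.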

[1] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
https://bugzilla.kernel.org/show_bug.cgi?id=219166

[2] del_gendisk() vs blk_queue_enter() race condition
https://lore.kernel.org/linux-block/20241003085610.GK11458@google.com/

[3] queue_freeze & queue_enter deadlock in scsi
https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 18 ++++++++++++++++--
 block/blk-mq.c         | 26 ++++++++++++++++++++++----
 block/blk.h            | 29 ++++++++++++++++++++++++++---
 block/genhd.c          | 15 +++++++++++----
 include/linux/blkdev.h |  6 ++++++
 5 files changed, 81 insertions(+), 13 deletions(-)

Message-ID: <20241023095438.3451156-4-ming.lei@redhat.com>
In-Reply-To: <20241023095438.3451156-1-ming.lei@redhat.com>
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>, Peter Zijlstra <peterz@infradead.org>,
    Waiman Long <longman@redhat.com>, Boqun Feng <boqun.feng@gmail.com>,
    Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
    linux-kernel@vger.kernel.org, Bart Van Assche <bvanassche@acm.org>,
    Ming Lei <ming.lei@redhat.com>
Subject: [PATCH 3/3] block: model freeze & enter queue as lock for supporting lockdep
Date: Wed, 23 Oct 2024 17:54:35 +0800
State: New

Series: block: model freeze/enter queue as lock for lockdep
 [0/3] block: model freeze/enter queue as lock for lockdep
 [1/3] blk-mq: add non_owner variant of start_freeze/unfreeze queue APIs
 [2/3] nvme: core: switch to non_owner variant of start_freeze/unfreeze queue
 [3/3] block: model freeze & enter queue as lock for supporting lockdep
