diff mbox series

[2/3] block: Fix __blk_mq_update_nr_hw_queues() queue freeze and limits lock order

Message ID 20250104132522.247376-3-dlemoal@kernel.org (mailing list archive)
State New
Headers show
Series Fix queue freeze and limit locking order | expand

Commit Message

Damien Le Moal Jan. 4, 2025, 1:25 p.m. UTC
__blk_mq_update_nr_hw_queues() freezes a device queues during operation,
which also includes updating the BLK_FEAT_POLL feature flag for the
device queues using queue_limits_start_update() and
queue_limits_commit_update(). This call order thus creates an invalid
ordering of a queue freeze and queue limit locking which can lead to a
deadlock when the device driver must issue commands to probe the device
when revalidating its limits.

Avoid this issue by moving the update of the BLK_FEAT_POLL feature flag
out of the main queue remapping loop to the end of
__blk_mq_update_nr_hw_queues(), after the device queues have been
unfrozen.

Fixes: 8023e144f9d6 ("block: move the poll flag to queue_limits")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-mq.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

Comments

Christoph Hellwig Jan. 6, 2025, 8:30 a.m. UTC | #1
On Sat, Jan 04, 2025 at 10:25:21PM +0900, Damien Le Moal wrote:
> __blk_mq_update_nr_hw_queues() freezes a device queues during operation,
> which also includes updating the BLK_FEAT_POLL feature flag for the
> device queues using queue_limits_start_update() and
> queue_limits_commit_update(). This call order thus creates an invalid
> ordering of a queue freeze and queue limit locking which can lead to a
> deadlock when the device driver must issue commands to probe the device
> when revalidating its limits.
> 
> Avoid this issue by moving the update of the BLK_FEAT_POLL feature flag
> out of the main queue remapping loop to the end of
> __blk_mq_update_nr_hw_queues(), after the device queues have been
> unfrozen.

What happens if I/O is queued after the unfreeze, but before clearing
the poll flag?
Damien Le Moal Jan. 6, 2025, 9:58 a.m. UTC | #2
On 1/6/25 5:30 PM, Christoph Hellwig wrote:
> On Sat, Jan 04, 2025 at 10:25:21PM +0900, Damien Le Moal wrote:
>> __blk_mq_update_nr_hw_queues() freezes a device queues during operation,
>> which also includes updating the BLK_FEAT_POLL feature flag for the
>> device queues using queue_limits_start_update() and
>> queue_limits_commit_update(). This call order thus creates an invalid
>> ordering of a queue freeze and queue limit locking which can lead to a
>> deadlock when the device driver must issue commands to probe the device
>> when revalidating its limits.
>>
>> Avoid this issue by moving the update of the BLK_FEAT_POLL feature flag
>> out of the main queue remapping loop to the end of
>> __blk_mq_update_nr_hw_queues(), after the device queues have been
>> unfrozen.
> 
> What happens if I/O is queued after the unfreeze, but before clearing
> the poll flag?

Ah, yes, that would potentially be an issue... Hmmm... Maybe a better solution
would be to move the start update out of the main loop and do it first, before
the freeze. What I do not fully understand with the code of this function is
that it does freeze and limit update for each tag list of the tag set, but
there is only a single request queue for all of them. So I am confused. Why
does the blk_mq_freeze_queue and poll limit setting have to be done in the loop
multiple times for each tag list ? I do not see it... If we can move these out
of the tag list loops, then correcting the ordering becomes easy.
Christoph Hellwig Jan. 6, 2025, 10 a.m. UTC | #3
On Mon, Jan 06, 2025 at 06:58:00PM +0900, Damien Le Moal wrote:
> Ah, yes, that would potentially be an issue... Hmmm... Maybe a better solution
> would be to move the start update out of the main loop and do it first, before
> the freeze. What I do not fully understand with the code of this function is
> that it does freeze and limit update for each tag list of the tag set, but
> there is only a single request queue for all of them. So I am confused. Why
> does the blk_mq_freeze_queue and poll limit setting have to be done in the loop
> multiple times for each tag list ? I do not see it... If we can move these out
> of the tag list loops, then correcting the ordering becomes easy.

I've come up with a version that simply never updates the flag. About
to send it out.
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8ac19d4ae3c0..1f63067790c8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -5021,8 +5021,6 @@  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 fallback:
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
-		struct queue_limits lim;
-
 		blk_mq_realloc_hw_ctxs(set, q);
 
 		if (q->nr_hw_queues != set->nr_hw_queues) {
@@ -5036,13 +5034,6 @@  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 			set->nr_hw_queues = prev_nr_hw_queues;
 			goto fallback;
 		}
-		lim = queue_limits_start_update(q);
-		if (blk_mq_can_poll(set))
-			lim.features |= BLK_FEAT_POLL;
-		else
-			lim.features &= ~BLK_FEAT_POLL;
-		if (queue_limits_commit_update(q, &lim) < 0)
-			pr_warn("updating the poll flag failed\n");
 		blk_mq_map_swqueue(q);
 	}
 
@@ -5059,6 +5050,22 @@  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_unfreeze_queue(q);
 
+	list_for_each_entry(q, &set->tag_list, tag_set_list) {
+		struct queue_limits lim;
+		int ret;
+
+		lim = queue_limits_start_update(q);
+		if (blk_mq_can_poll(set))
+			lim.features |= BLK_FEAT_POLL;
+		else
+			lim.features &= ~BLK_FEAT_POLL;
+		blk_mq_freeze_queue(q);
+		ret = queue_limits_commit_update(q, &lim);
+		blk_mq_unfreeze_queue(q);
+		if (ret < 0)
+			pr_warn("updating the poll flag failed\n");
+	}
+
 	/* Free the excess tags when nr_hw_queues shrink. */
 	for (i = set->nr_hw_queues; i < prev_nr_hw_queues; i++)
 		__blk_mq_free_map_and_rqs(set, i);