Message ID | 20250104132522.247376-3-dlemoal@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Fix queue freeze and limit locking order | expand |
On Sat, Jan 04, 2025 at 10:25:21PM +0900, Damien Le Moal wrote: > __blk_mq_update_nr_hw_queues() freezes a device queues during operation, > which also includes updating the BLK_FEAT_POLL feature flag for the > device queues using queue_limits_start_update() and > queue_limits_commit_update(). This call order thus creates an invalid > ordering of a queue freeze and queue limit locking which can lead to a > deadlock when the device driver must issue commands to probe the device > when revalidating its limits. > > Avoid this issue by moving the update of the BLK_FEAT_POLL feature flag > out of the main queue remapping loop to the end of > __blk_mq_update_nr_hw_queues(), after the device queues have been > unfrozen. What happens if I/O is queued after the unfreeze, but before clearing the poll flag?
On 1/6/25 5:30 PM, Christoph Hellwig wrote: > On Sat, Jan 04, 2025 at 10:25:21PM +0900, Damien Le Moal wrote: >> __blk_mq_update_nr_hw_queues() freezes a device queues during operation, >> which also includes updating the BLK_FEAT_POLL feature flag for the >> device queues using queue_limits_start_update() and >> queue_limits_commit_update(). This call order thus creates an invalid >> ordering of a queue freeze and queue limit locking which can lead to a >> deadlock when the device driver must issue commands to probe the device >> when revalidating its limits. >> >> Avoid this issue by moving the update of the BLK_FEAT_POLL feature flag >> out of the main queue remapping loop to the end of >> __blk_mq_update_nr_hw_queues(), after the device queues have been >> unfrozen. > > What happens if I/O is queued after the unfreeze, but before clearing > the poll flag? Ah, yes, that would potentially be an issue... Hmmm... Maybe a better solution would be to move the start update out of the main loop and do it first, before the freeze. What I do not fully understand with the code of this function is that it does freeze and limit update for each tag list of the tag set, but there is only a single request queue for all of them. So I am confused. Why does the blk_mq_freeze_queue and poll limit setting have to be done in the loop multiple times for each tag list ? I do not see it... If we can move these out of the tag list loops, then correcting the ordering becomes easy.
On Mon, Jan 06, 2025 at 06:58:00PM +0900, Damien Le Moal wrote: > Ah, yes, that would potentially be an issue... Hmmm... Maybe a better solution > would be to move the start update out of the main loop and do it first, before > the freeze. What I do not fully understand with the code of this function is > that it does freeze and limit update for each tag list of the tag set, but > there is only a single request queue for all of them. So I am confused. Why > does the blk_mq_freeze_queue and poll limit setting have to be done in the loop > multiple times for each tag list ? I do not see it... If we can move these out > of the tag list loops, then correcting the ordering becomes easy. I've come up with a version that simply never updates the flag. About to send it out.
diff --git a/block/blk-mq.c b/block/blk-mq.c index 8ac19d4ae3c0..1f63067790c8 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -5021,8 +5021,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, fallback: blk_mq_update_queue_map(set); list_for_each_entry(q, &set->tag_list, tag_set_list) { - struct queue_limits lim; - blk_mq_realloc_hw_ctxs(set, q); if (q->nr_hw_queues != set->nr_hw_queues) { @@ -5036,13 +5034,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, set->nr_hw_queues = prev_nr_hw_queues; goto fallback; } - lim = queue_limits_start_update(q); - if (blk_mq_can_poll(set)) - lim.features |= BLK_FEAT_POLL; - else - lim.features &= ~BLK_FEAT_POLL; - if (queue_limits_commit_update(q, &lim) < 0) - pr_warn("updating the poll flag failed\n"); blk_mq_map_swqueue(q); } @@ -5059,6 +5050,22 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, list_for_each_entry(q, &set->tag_list, tag_set_list) blk_mq_unfreeze_queue(q); + list_for_each_entry(q, &set->tag_list, tag_set_list) { + struct queue_limits lim; + int ret; + + lim = queue_limits_start_update(q); + if (blk_mq_can_poll(set)) + lim.features |= BLK_FEAT_POLL; + else + lim.features &= ~BLK_FEAT_POLL; + blk_mq_freeze_queue(q); + ret = queue_limits_commit_update(q, &lim); + blk_mq_unfreeze_queue(q); + if (ret < 0) + pr_warn("updating the poll flag failed\n"); + } + /* Free the excess tags when nr_hw_queues shrink. */ for (i = set->nr_hw_queues; i < prev_nr_hw_queues; i++) __blk_mq_free_map_and_rqs(set, i);
__blk_mq_update_nr_hw_queues() freezes a device queues during operation, which also includes updating the BLK_FEAT_POLL feature flag for the device queues using queue_limits_start_update() and queue_limits_commit_update(). This call order thus creates an invalid ordering of a queue freeze and queue limit locking which can lead to a deadlock when the device driver must issue commands to probe the device when revalidating its limits. Avoid this issue by moving the update of the BLK_FEAT_POLL feature flag out of the main queue remapping loop to the end of __blk_mq_update_nr_hw_queues(), after the device queues have been unfrozen. Fixes: 8023e144f9d6 ("block: move the poll flag to queue_limits") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> --- block/blk-mq.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-)