[1/2] block: Revert "block: Fix potential deadlock while freezing queue and acquiring sysfs_lock"

Message ID 20241218101617.3275704-2-ming.lei@redhat.com (mailing list archive)
State New
Series block: fix two regressions in v6.13 dev cycle

Commit Message

Ming Lei Dec. 18, 2024, 10:16 a.m. UTC
This reverts commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2.

Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and
acquiring sysfs_loc") actually reverts commit 22465bbac53c ("blk-mq: move cpuhp
callback registering out of q->sysfs_lock"), and causes the original resctrl
lockdep warning.

So revert it and we need to fix the issue in another way.

Cc: Nilay Shroff <nilay@linux.ibm.com>
Fixes: be26ba96421a ("block: Fix potential deadlock while freezing queue and acquiring sysfs_lock")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sysfs.c | 16 ++++++++++------
 block/blk-mq.c       | 29 +++++++++++------------------
 block/blk-sysfs.c    |  4 ++--
 3 files changed, 23 insertions(+), 26 deletions(-)

Comments

Nilay Shroff Dec. 20, 2024, 10:23 a.m. UTC | #1
On 12/18/24 15:46, Ming Lei wrote:
> This reverts commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2.
> 
> Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and
> acquiring sysfs_lock") actually reverts commit 22465bbac53c ("blk-mq: move
> cpuhp callback registering out of q->sysfs_lock"), and reintroduces the
> original resctrl lockdep warning.
> 
> So revert it; we need to fix the issue in another way.
> 
Hi Ming,

Can we wait a bit longer before reverting this? It is currently being
discussed[1], and we don't yet know how it may be fixed.

[1] https://lore.kernel.org/all/20241219061514.GA19575@lst.de/

Thanks,
--Nilay
Jens Axboe Dec. 20, 2024, 3:09 p.m. UTC | #2
On 12/20/24 3:23 AM, Nilay Shroff wrote:
> On 12/18/24 15:46, Ming Lei wrote:
>> This reverts commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2.
>>
>> Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and
>> acquiring sysfs_lock") actually reverts commit 22465bbac53c ("blk-mq: move
>> cpuhp callback registering out of q->sysfs_lock"), and reintroduces the
>> original resctrl lockdep warning.
>>
>> So revert it; we need to fix the issue in another way.
>>
> Hi Ming,
> 
> Can we wait a bit longer before reverting this? It is currently being
> discussed[1], and we don't yet know how it may be fixed.
> 
> [1] https://lore.kernel.org/all/20241219061514.GA19575@lst.de/

It's already queued up and will go out today. Doesn't exclude people
working on solving this for real.
Nilay Shroff Dec. 20, 2024, 3:18 p.m. UTC | #3
On 12/20/24 20:39, Jens Axboe wrote:
> On 12/20/24 3:23 AM, Nilay Shroff wrote:
>> On 12/18/24 15:46, Ming Lei wrote:
>>> This reverts commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2.
>>>
>>> Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and
>>> acquiring sysfs_lock") actually reverts commit 22465bbac53c ("blk-mq: move
>>> cpuhp callback registering out of q->sysfs_lock"), and reintroduces the
>>> original resctrl lockdep warning.
>>>
>>> So revert it; we need to fix the issue in another way.
>>>
>> Hi Ming,
>>
>> Can we wait a bit longer before reverting this? It is currently being
>> discussed[1], and we don't yet know how it may be fixed.
>>
>> [1] https://lore.kernel.org/all/20241219061514.GA19575@lst.de/
> 
> It's already queued up and will go out today. Doesn't exclude people
> working on solving this for real.
> 
Sure, my bad. I will keep this in mind in the future.

Thanks,
--Nilay
Ming Lei Dec. 20, 2024, 3:24 p.m. UTC | #4
On Fri, Dec 20, 2024 at 11:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 12/20/24 3:23 AM, Nilay Shroff wrote:
> > On 12/18/24 15:46, Ming Lei wrote:
> >> This reverts commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2.
> >>
> >> Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and
> >> acquiring sysfs_lock") actually reverts commit 22465bbac53c ("blk-mq: move
> >> cpuhp callback registering out of q->sysfs_lock"), and reintroduces the
> >> original resctrl lockdep warning.
> >>
> >> So revert it; we need to fix the issue in another way.
> >>
> > Hi Ming,
> >
> > Can we wait a bit longer before reverting this? It is currently being
> > discussed[1], and we don't yet know how it may be fixed.
> >
> > [1] https://lore.kernel.org/all/20241219061514.GA19575@lst.de/
>
> It's already queued up and will go out today. Doesn't exclude people
> working on solving this for real.

IMO, it is correct to cut the dependency between q->sysfs_lock and the
global cpu hotplug lock; otherwise more subsystems can become involved.
So let's revert it first.
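
For illustration, the inverted ordering is roughly this (a simplified
call-chain sketch, not the exact chains from the lockdep report):

  /* with be26ba96421a applied */
  __blk_mq_update_nr_hw_queues()
    mutex_lock(&q->sysfs_lock);          /* q->sysfs_lock taken first ... */
    blk_mq_freeze_queue(q);
    blk_mq_realloc_hw_ctxs(set, q);
      blk_mq_remove_hw_queues_cpuhp(q);  /* still under q->sysfs_lock */
        cpuhp_state_remove_instance_nocalls(...)
          cpus_read_lock();              /* ... then cpu_hotplug_lock */

  /*
   * Elsewhere (e.g. the resctrl report) lockdep has already recorded
   * cpu_hotplug_lock taken before a chain that reaches q->sysfs_lock,
   * so the two orders close a potential ABBA cycle.
   */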

Christoph is also working on the q->sysfs_lock warning, and we can try to
figure out other solutions, given that the involved locks (or most of them)
are internal to the block layer.

https://lore.kernel.org/linux-block/20241219061514.GA19575@lst.de/

Thanks,
Ming
Nilay Shroff Dec. 21, 2024, 12:31 p.m. UTC | #5
On 12/20/24 20:54, Ming Lei wrote:
> On Fri, Dec 20, 2024 at 11:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 12/20/24 3:23 AM, Nilay Shroff wrote:
>>> On 12/18/24 15:46, Ming Lei wrote:
>>>> This reverts commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2.
>>>>
>>>> Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and
>>>> acquiring sysfs_lock") actually reverts commit 22465bbac53c ("blk-mq: move
>>>> cpuhp callback registering out of q->sysfs_lock"), and reintroduces the
>>>> original resctrl lockdep warning.
>>>>
>>>> So revert it; we need to fix the issue in another way.
>>>>
>>> Hi Ming,
>>>
>>> Can we wait a bit longer before reverting this? It is currently being
>>> discussed[1], and we don't yet know how it may be fixed.
>>>
>>> [1] https://lore.kernel.org/all/20241219061514.GA19575@lst.de/
>>
>> It's already queued up and will go out today. Doesn't exclude people
>> working on solving this for real.
> 
> IMO, it is correct to cut the dependency between q->sysfs_lock and the
> global cpu hotplug lock; otherwise more subsystems can become involved.
> So let's revert it first.
> 
Yeah, agreed. In that case we may want to just remove the lockdep assert on
q->sysfs_lock from blk_mq_realloc_hw_ctxs() and also remove q->sysfs_lock
from blk_mq_init_allocated_queue(). But that's OK. Let's see how we pursue
this further and solve the other limits-lock and queue-freeze issues.
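
Something like this, roughly (an untested sketch of that idea against the
tree with be26ba96421a still applied; hunk headers abbreviated, not a real
patch):

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 	unsigned long i, j;
 
-	/* protect against switching io scheduler  */
-	lockdep_assert_held(&q->sysfs_lock);
-
 	for (i = 0; i < set->nr_hw_queues; i++) {
@@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	xa_init(&q->hctx_table);
 
-	mutex_lock(&q->sysfs_lock);
-
 	blk_mq_realloc_hw_ctxs(set, q);
 	if (!q->nr_hw_queues)
 		goto err_hctxs;
 
-	mutex_unlock(&q->sysfs_lock);
-
@@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 err_hctxs:
-	mutex_unlock(&q->sysfs_lock);
 	blk_mq_release(q);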

> Christoph is also working on the q->sysfs_lock warning, and we can try to
> figure out other solutions, given that the involved locks (or most of them)
> are internal to the block layer.
> 
> https://lore.kernel.org/linux-block/20241219061514.GA19575@lst.de/
> 
 
Thanks,
--Nilay

Patch

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index cd5ea6eaa76b..156e9bb07abf 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -275,13 +275,15 @@  void blk_mq_sysfs_unregister_hctxs(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
-	lockdep_assert_held(&q->sysfs_dir_lock);
-
+	mutex_lock(&q->sysfs_dir_lock);
 	if (!q->mq_sysfs_init_done)
-		return;
+		goto unlock;
 
 	queue_for_each_hw_ctx(q, hctx, i)
 		blk_mq_unregister_hctx(hctx);
+
+unlock:
+	mutex_unlock(&q->sysfs_dir_lock);
 }
 
 int blk_mq_sysfs_register_hctxs(struct request_queue *q)
@@ -290,10 +292,9 @@  int blk_mq_sysfs_register_hctxs(struct request_queue *q)
 	unsigned long i;
 	int ret = 0;
 
-	lockdep_assert_held(&q->sysfs_dir_lock);
-
+	mutex_lock(&q->sysfs_dir_lock);
 	if (!q->mq_sysfs_init_done)
-		return ret;
+		goto unlock;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_register_hctx(hctx);
@@ -301,5 +302,8 @@  int blk_mq_sysfs_register_hctxs(struct request_queue *q)
 			break;
 	}
 
+unlock:
+	mutex_unlock(&q->sysfs_dir_lock);
+
 	return ret;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6b6111513986..92e8ddf34575 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4453,8 +4453,7 @@  static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 	unsigned long i, j;
 
 	/* protect against switching io scheduler  */
-	lockdep_assert_held(&q->sysfs_lock);
-
+	mutex_lock(&q->sysfs_lock);
 	for (i = 0; i < set->nr_hw_queues; i++) {
 		int old_node;
 		int node = blk_mq_get_hctx_node(set, i);
@@ -4487,6 +4486,7 @@  static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 
 	xa_for_each_start(&q->hctx_table, j, hctx, j)
 		blk_mq_exit_hctx(q, set, hctx, j);
+	mutex_unlock(&q->sysfs_lock);
 
 	/* unregister cpuhp callbacks for exited hctxs */
 	blk_mq_remove_hw_queues_cpuhp(q);
@@ -4518,14 +4518,10 @@  int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
 	xa_init(&q->hctx_table);
 
-	mutex_lock(&q->sysfs_lock);
-
 	blk_mq_realloc_hw_ctxs(set, q);
 	if (!q->nr_hw_queues)
 		goto err_hctxs;
 
-	mutex_unlock(&q->sysfs_lock);
-
 	INIT_WORK(&q->timeout_work, blk_mq_timeout_work);
 	blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
 
@@ -4544,7 +4540,6 @@  int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	return 0;
 
 err_hctxs:
-	mutex_unlock(&q->sysfs_lock);
 	blk_mq_release(q);
 err_exit:
 	q->mq_ops = NULL;
@@ -4925,12 +4920,12 @@  static bool blk_mq_elv_switch_none(struct list_head *head,
 		return false;
 
 	/* q->elevator needs protection from ->sysfs_lock */
-	lockdep_assert_held(&q->sysfs_lock);
+	mutex_lock(&q->sysfs_lock);
 
 	/* the check has to be done with holding sysfs_lock */
 	if (!q->elevator) {
 		kfree(qe);
-		goto out;
+		goto unlock;
 	}
 
 	INIT_LIST_HEAD(&qe->node);
@@ -4940,7 +4935,9 @@  static bool blk_mq_elv_switch_none(struct list_head *head,
 	__elevator_get(qe->type);
 	list_add(&qe->node, head);
 	elevator_disable(q);
-out:
+unlock:
+	mutex_unlock(&q->sysfs_lock);
+
 	return true;
 }
 
@@ -4969,9 +4966,11 @@  static void blk_mq_elv_switch_back(struct list_head *head,
 	list_del(&qe->node);
 	kfree(qe);
 
+	mutex_lock(&q->sysfs_lock);
 	elevator_switch(q, t);
 	/* drop the reference acquired in blk_mq_elv_switch_none */
 	elevator_put(t);
+	mutex_unlock(&q->sysfs_lock);
 }
 
 static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
@@ -4991,11 +4990,8 @@  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues)
 		return;
 
-	list_for_each_entry(q, &set->tag_list, tag_set_list) {
-		mutex_lock(&q->sysfs_dir_lock);
-		mutex_lock(&q->sysfs_lock);
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_freeze_queue(q);
-	}
 	/*
 	 * Switch IO scheduler to 'none', cleaning up the data associated
 	 * with the previous scheduler. We will switch back once we are done
@@ -5051,11 +5047,8 @@  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_elv_switch_back(&head, q);
 
-	list_for_each_entry(q, &set->tag_list, tag_set_list) {
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_unfreeze_queue(q);
-		mutex_unlock(&q->sysfs_lock);
-		mutex_unlock(&q->sysfs_dir_lock);
-	}
 
 	/* Free the excess tags when nr_hw_queues shrink. */
 	for (i = set->nr_hw_queues; i < prev_nr_hw_queues; i++)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 64f70c713d2f..767598e719ab 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -706,11 +706,11 @@  queue_attr_store(struct kobject *kobj, struct attribute *attr,
 	if (entry->load_module)
 		entry->load_module(disk, page, length);
 
-	mutex_lock(&q->sysfs_lock);
 	blk_mq_freeze_queue(q);
+	mutex_lock(&q->sysfs_lock);
 	res = entry->store(disk, page, length);
-	blk_mq_unfreeze_queue(q);
 	mutex_unlock(&q->sysfs_lock);
+	blk_mq_unfreeze_queue(q);
 	return res;
 }