diff mbox series

[1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits

Message ID 20241216080206.2850773-2-ming.lei@redhat.com (mailing list archive)
State New
Headers show
Series block: fix deadlock caused by atomic limits update | expand

Commit Message

Ming Lei Dec. 16, 2024, 8:02 a.m. UTC
Commit d690cb8ae14b ("block: add an API to atomically update queue limits")
adds APIs for updating queue limits atomically. And q->limits_lock is
grabbed in queue_limits_start_update(), and released in queue_limits_commit_update().

This way is very fragile and easy to introduce deadlock[1][2].

More importantly, queue_limits_start_update() returns one local copy of
q->limits, then the API user overwrites the local copy, and commit the
copy by queue_limits_commit_update() finally.

So holding q->limits_lock protects nothing for the local copy, and not see
any real help by preventing new update & commit from happening, cause
what matters is that we do validation & commit atomically.

Changes the API to not hold q->limits_lock across atomic queue limits update
APIs for fixing deadlock & making it easy to use.

[1] https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
[2] https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/

Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-settings.c   | 2 +-
 include/linux/blkdev.h | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Dec. 16, 2024, 3:49 p.m. UTC | #1
On Mon, Dec 16, 2024 at 04:02:03PM +0800, Ming Lei wrote:
> More importantly, queue_limits_start_update() returns one local copy of
> q->limits, then the API user overwrites the local copy, and commit the
> copy by queue_limits_commit_update() finally.
> 
> So holding q->limits_lock protects nothing for the local copy, and not see
> any real help by preventing new update & commit from happening, cause
> what matters is that we do validation & commit atomically.

It protects against someone else changing the copy in the queue while
the current thread is updating the local copy, and thus incoherent
updates.  So no, we can't just remove it.
Ming Lei Dec. 17, 2024, 1:52 a.m. UTC | #2
On Mon, Dec 16, 2024 at 04:49:01PM +0100, Christoph Hellwig wrote:
> On Mon, Dec 16, 2024 at 04:02:03PM +0800, Ming Lei wrote:
> > More importantly, queue_limits_start_update() returns one local copy of
> > q->limits, then the API user overwrites the local copy, and commit the
> > copy by queue_limits_commit_update() finally.
> > 
> > So holding q->limits_lock protects nothing for the local copy, and not see
> > any real help by preventing new update & commit from happening, cause
> > what matters is that we do validation & commit atomically.
> 
> It protects against someone else changing the copy in the queue while
> the current thread is updating the local copy, and thus incoherent
> updates.  So no, we can't just remove it.

The local copy can be updated in any way with any data, so does another
concurrent update on q->limits really matter?


thanks,
Ming
Christoph Hellwig Dec. 17, 2024, 4:40 a.m. UTC | #3
On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> The local copy can be updated in any way with any data, so does another
> concurrent update on q->limits really matter?

Yes, because that means one of the updates get lost even if it is
for entirely separate fields.
Ming Lei Dec. 17, 2024, 7:05 a.m. UTC | #4
On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> > The local copy can be updated in any way with any data, so does another
> > concurrent update on q->limits really matter?
> 
> Yes, because that means one of the updates get lost even if it is
> for entirely separate fields.

Right, but the limits are still valid anytime.

Any suggestion for fixing this deadlock?

One idea is to add queue_limits_start_try_update() and apply it in
sysfs ->store(), in which update failure could be tolerable.


Thanks,
Ming
Christoph Hellwig Dec. 17, 2024, 7:19 a.m. UTC | #5
On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> > On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> > > The local copy can be updated in any way with any data, so does another
> > > concurrent update on q->limits really matter?
> > 
> > Yes, because that means one of the updates get lost even if it is
> > for entirely separate fields.
> 
> Right, but the limits are still valid anytime.
> 
> Any suggestion for fixing this deadlock?

What is "this deadlock"?
Ming Lei Dec. 17, 2024, 7:30 a.m. UTC | #6
On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
> > On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> > > On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> > > > The local copy can be updated in any way with any data, so does another
> > > > concurrent update on q->limits really matter?
> > > 
> > > Yes, because that means one of the updates get lost even if it is
> > > for entirely separate fields.
> > 
> > Right, but the limits are still valid anytime.
> > 
> > Any suggestion for fixing this deadlock?
> 
> What is "this deadlock"?

The commit log provides two reports:

- lockdep warning

https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/

- real deadlock report

https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/

It is actually one simple ABBA lock:

1) queue_attr_store()

      blk_mq_freeze_queue(q);					//queue freeze lock
      res = entry->store(disk, page, length);
	  			queue_limits_start_update		//->limits_lock
				...
				queue_limits_commit_update
      blk_mq_unfreeze_queue(q);

2) sd_revalidate_disk()

queue_limits_start_update					//->limits_lock
	sd_read_capacity()
		scsi_execute_cmd
			scsi_alloc_request
				blk_queue_enter					//queue freeze lock


Thanks,
Ming
Damien Le Moal Dec. 17, 2024, 4:07 p.m. UTC | #7
On 2024/12/16 23:30, Ming Lei wrote:
> On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
>> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
>>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
>>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
>>>>> The local copy can be updated in any way with any data, so does another
>>>>> concurrent update on q->limits really matter?
>>>>
>>>> Yes, because that means one of the updates get lost even if it is
>>>> for entirely separate fields.
>>>
>>> Right, but the limits are still valid anytime.
>>>
>>> Any suggestion for fixing this deadlock?
>>
>> What is "this deadlock"?
> 
> The commit log provides two reports:
> 
> - lockdep warning
> 
> https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
> 
> - real deadlock report
> 
> https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
> 
> It is actually one simple ABBA lock:
> 
> 1) queue_attr_store()
> 
>       blk_mq_freeze_queue(q);					//queue freeze lock
>       res = entry->store(disk, page, length);
> 	  			queue_limits_start_update		//->limits_lock
> 				...
> 				queue_limits_commit_update
>       blk_mq_unfreeze_queue(q);

The locking + freeze pattern should be:

	lim = queue_limits_start_update(q);
	...
	blk_mq_freeze_queue(q);
	ret = queue_limits_commit_update(q, &lim);
	blk_mq_unfreeze_queue(q);

This pattern is used in most places and anything that does not use it is likely
susceptible to a similar ABBA deadlock. We should probably look into trying to
integrate the freeze/unfreeze calls directly into queue_limits_commit_update().

Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
sd_revalidate_disk().

> 
> 2) sd_revalidate_disk()
> 
> queue_limits_start_update					//->limits_lock
> 	sd_read_capacity()
> 		scsi_execute_cmd
> 			scsi_alloc_request
> 				blk_queue_enter					//queue freeze lock
> 
> 
> Thanks,
> Ming
> 
>
Ming Lei Dec. 18, 2024, 2:09 a.m. UTC | #8
On Tue, Dec 17, 2024 at 08:07:06AM -0800, Damien Le Moal wrote:
> On 2024/12/16 23:30, Ming Lei wrote:
> > On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
> >> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
> >>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> >>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> >>>>> The local copy can be updated in any way with any data, so does another
> >>>>> concurrent update on q->limits really matter?
> >>>>
> >>>> Yes, because that means one of the updates get lost even if it is
> >>>> for entirely separate fields.
> >>>
> >>> Right, but the limits are still valid anytime.
> >>>
> >>> Any suggestion for fixing this deadlock?
> >>
> >> What is "this deadlock"?
> > 
> > The commit log provides two reports:
> > 
> > - lockdep warning
> > 
> > https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
> > 
> > - real deadlock report
> > 
> > https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
> > 
> > It is actually one simple ABBA lock:
> > 
> > 1) queue_attr_store()
> > 
> >       blk_mq_freeze_queue(q);					//queue freeze lock
> >       res = entry->store(disk, page, length);
> > 	  			queue_limits_start_update		//->limits_lock
> > 				...
> > 				queue_limits_commit_update
> >       blk_mq_unfreeze_queue(q);
> 
> The locking + freeze pattern should be:
> 
> 	lim = queue_limits_start_update(q);
> 	...
> 	blk_mq_freeze_queue(q);
> 	ret = queue_limits_commit_update(q, &lim);
> 	blk_mq_unfreeze_queue(q);
> 
> This pattern is used in most places and anything that does not use it is likely
> susceptible to a similar ABBA deadlock. We should probably look into trying to
> integrate the freeze/unfreeze calls directly into queue_limits_commit_update().
> 
> Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
> sd_revalidate_disk().

This way looks good, just commit af2814149883 ("block: freeze the queue in
queue_attr_store") needs to be reverted, and freeze/unfreeze has to be
added to each queue attribute .store() handler.



Thanks,
Ming
Nilay Shroff Dec. 18, 2024, 11:33 a.m. UTC | #9
On 12/18/24 07:39, Ming Lei wrote:
> On Tue, Dec 17, 2024 at 08:07:06AM -0800, Damien Le Moal wrote:
>> On 2024/12/16 23:30, Ming Lei wrote:
>>> On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
>>>> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
>>>>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
>>>>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
>>>>>>> The local copy can be updated in any way with any data, so does another
>>>>>>> concurrent update on q->limits really matter?
>>>>>>
>>>>>> Yes, because that means one of the updates get lost even if it is
>>>>>> for entirely separate fields.
>>>>>
>>>>> Right, but the limits are still valid anytime.
>>>>>
>>>>> Any suggestion for fixing this deadlock?
>>>>
>>>> What is "this deadlock"?
>>>
>>> The commit log provides two reports:
>>>
>>> - lockdep warning
>>>
>>> https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
>>>
>>> - real deadlock report
>>>
>>> https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
>>>
>>> It is actually one simple ABBA lock:
>>>
>>> 1) queue_attr_store()
>>>
>>>       blk_mq_freeze_queue(q);					//queue freeze lock
>>>       res = entry->store(disk, page, length);
>>> 	  			queue_limits_start_update		//->limits_lock
>>> 				...
>>> 				queue_limits_commit_update
>>>       blk_mq_unfreeze_queue(q);
>>
>> The locking + freeze pattern should be:
>>
>> 	lim = queue_limits_start_update(q);
>> 	...
>> 	blk_mq_freeze_queue(q);
>> 	ret = queue_limits_commit_update(q, &lim);
>> 	blk_mq_unfreeze_queue(q);
>>
>> This pattern is used in most places and anything that does not use it is likely
>> susceptible to a similar ABBA deadlock. We should probably look into trying to
>> integrate the freeze/unfreeze calls directly into queue_limits_commit_update().
>>
>> Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
>> sd_revalidate_disk().
> 
> This way looks good, just commit af2814149883 ("block: freeze the queue in
> queue_attr_store") needs to be reverted, and freeze/unfreeze has to be
> added to each queue attribute .store() handler.
> 
Wouldn't it be feasible to add blk-mq freeze in queue_limits_start_update()
and blk-mq unfreeze in queue_limits_commit_update()? If we do this then 
the pattern would be, 

queue_limits_start_update(): limit-lock + freeze
queue_limits_commit_update() : unfreeze + limit-unlock  

Then in queue_attr_store() we shall just remove freeze/unfreeze.

We also need to fix few call sites where we've code block,

{
    blk_mq_freeze_queue()
    ...
    queue_limits_start_update()
    ...    
    queue_limits_commit_update()
    ...
    blk_mq_unfreeze_queue()
    
}

In the above code block, we may then replace blk_mq_freeze_queue() with
queue_limits_commit_update() and similarly replace blk_mq_unfreeze_queue() 
with queue_limits_commit_update().

{
    queue_limits_start_update()
    ...
    ...
    ...
    queue_limits_commit_update()

}

Thanks,
--Nilay
diff mbox series

Patch

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8f09e33f41f6..b737428c6084 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -422,6 +422,7 @@  int queue_limits_commit_update(struct request_queue *q,
 {
 	int error;
 
+	mutex_lock(&q->limits_lock);
 	error = blk_validate_limits(lim);
 	if (error)
 		goto out_unlock;
@@ -456,7 +457,6 @@  EXPORT_SYMBOL_GPL(queue_limits_commit_update);
  */
 int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
 {
-	mutex_lock(&q->limits_lock);
 	return queue_limits_commit_update(q, lim);
 }
 EXPORT_SYMBOL_GPL(queue_limits_set);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 378d3a1a22fc..6cc20ca79adc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -944,8 +944,13 @@  static inline unsigned int blk_boundary_sectors_left(sector_t offset,
 static inline struct queue_limits
 queue_limits_start_update(struct request_queue *q)
 {
+	struct queue_limits lim;
+
 	mutex_lock(&q->limits_lock);
-	return q->limits;
+	lim = q->limits;
+	mutex_unlock(&q->limits_lock);
+
+	return lim;
 }
 int queue_limits_commit_update(struct request_queue *q,
 		struct queue_limits *lim);
@@ -962,7 +967,6 @@  int blk_validate_limits(struct queue_limits *lim);
  */
 static inline void queue_limits_cancel_update(struct request_queue *q)
 {
-	mutex_unlock(&q->limits_lock);
 }
 
 /*