Message ID | c8f3fba4-9cc1-4e7c-baf3-afb10ab7605d@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] Block fix for 6.11-final | expand |
On Thu, 12 Sept 2024 at 15:44, Jens Axboe <axboe@kernel.dk> wrote: > > Just a single fix for a deadlock issue that can happen if someone > attempts to change the root disk IO scheduler with a module that > requires loading from disk. Changing the scheduler freezes the queue > while that operation is happening, hence causing a deadlock. Side note: I do think that doing the blk_mq_freeze_queue() outside the sysfs_lock mutex is also a mistake, and will deadlock if anybody then needs to do any IO (like a user space access) inside the sysfs_lock mutex somewhere else. It wasn't what caused Jesper's problems, and maybe nothing actually does that, but it still looks rather questionable in queue_attr_store(). I mean, imagine holding q->sysfs_lock, and doing something as simple as just a memory allocation that wants to do swapping, but somebody else did that queue_attr_store(), which freezed the queues and is now waiting for the lock and won't unfreeze them until it gets it... Yeah, yeah, very very unlikely to hit in real life, but still. Seems very wrong. Linus
The pull request you sent on Thu, 12 Sep 2024 16:44:21 -0600:
> git://git.kernel.dk/linux.git tags/block-6.11-20240912
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/b8e7cd09ae543c1d384677b3d43e009a0e8647ca
Thank you!
On 9/12/24 5:21 PM, Linus Torvalds wrote: > On Thu, 12 Sept 2024 at 15:44, Jens Axboe <axboe@kernel.dk> wrote: >> >> Just a single fix for a deadlock issue that can happen if someone >> attempts to change the root disk IO scheduler with a module that >> requires loading from disk. Changing the scheduler freezes the queue >> while that operation is happening, hence causing a deadlock. > > Side note: I do think that doing the blk_mq_freeze_queue() outside the > sysfs_lock mutex is also a mistake, and will deadlock if anybody then > needs to do any IO (like a user space access) inside the sysfs_lock > mutex somewhere else. > > It wasn't what caused Jesper's problems, and maybe nothing actually > does that, but it still looks rather questionable in > queue_attr_store(). > > I mean, imagine holding q->sysfs_lock, and doing something as simple > as just a memory allocation that wants to do swapping, but somebody > else did that queue_attr_store(), which freezed the queues and is now > waiting for the lock and won't unfreeze them until it gets it... > > Yeah, yeah, very very unlikely to hit in real life, but still. Seems > very wrong. Yep I agree, it does feel like the wrong thing and could deadlock under reclaim. I didn't get around to replying to the other one, but I didn't want to risk changing the ordering right before release. Will do a separate patch for testing.