| Message ID | 20240322081005.1112401-1-yukuai1@huaweicloud.com (mailing list archive) |
|---|---|
| State | Accepted, archived |
| Series | md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING |
On Fri, Mar 22, 2024 at 1:17 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
[...]
>
> Refer to the implementation from raid1 and raid10, fix this problem by
> skipping issue IO if MD_SB_CHANGE_PENDING is still set after
> md_check_recovery(), daemon thread will be woken up when 'reconfig_mutex'
> is released. Meanwhile, the hang problem will be fixed as well.
>
> Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d")
> Reported-and-tested-by: Dan Moulding <dan@danm.net>
> Closes: https://lore.kernel.org/all/20240123005700.9302-1-dan@danm.net/
> Investigated-by: Junxiao Bi <junxiao.bi@oracle.com>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Applied to md-6.10. Thanks!

Song
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d874abfc1836..2bd1ce9b3922 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -36,7 +36,6 @@
  */

 #include <linux/blkdev.h>
-#include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/raid/pq.h>
 #include <linux/async_tx.h>
@@ -6734,6 +6733,9 @@ static void raid5d(struct md_thread *thread)
 		int batch_size, released;
 		unsigned int offset;

+		if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
+			break;
+
 		released = release_stripe_list(conf, conf->temp_inactive_list);
 		if (released)
 			clear_bit(R5_DID_ALLOC, &conf->cache_state);
@@ -6770,18 +6772,7 @@ static void raid5d(struct md_thread *thread)
 			spin_unlock_irq(&conf->device_lock);
 			md_check_recovery(mddev);
 			spin_lock_irq(&conf->device_lock);
-
-			/*
-			 * Waiting on MD_SB_CHANGE_PENDING below may deadlock
-			 * seeing md_check_recovery() is needed to clear
-			 * the flag when using mdmon.
-			 */
-			continue;
 		}
-
-		wait_event_lock_irq(mddev->sb_wait,
-				    !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags),
-				    conf->device_lock);
 	}

 	pr_debug("%d stripes handled\n", handled);