Message ID | 20231107212412.51470-2-junxiao.bi@oracle.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | [V2,1/2] md: bypass block throttle for superblock update | expand |
Hi, 在 2023/11/08 5:24, Junxiao Bi 写道: > This reverts commit 5e2cf333b7bd5d3e62595a44d598a254c697cd74. > > That commit introduced the following race and can cause system hung. > > md_write_start: raid5d: > if (mddev->safemode == 1) > mddev->safemode = 0; > /* sync_checkers is always 0 when writes_pending is in per-cpu mode */ > if (mddev->in_sync || mddev->sync_checkers) { > spin_lock(&mddev->lock); > if (mddev->in_sync) { > mddev->in_sync = 0; > set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); > set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); > >>> running before md_write_start wake up it > if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) { > spin_unlock_irq(&conf->device_lock); > md_check_recovery(mddev); > spin_lock_irq(&conf->device_lock); > > /* > * Waiting on MD_SB_CHANGE_PENDING below may deadlock > * seeing md_check_recovery() is needed to clear > * the flag when using mdmon. > */ > continue; > } > > wait_event_lock_irq(mddev->sb_wait, >>>>>>>>>>> hung > !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags), > conf->device_lock); > md_wakeup_thread(mddev->thread); > did_change = 1; > } > spin_unlock(&mddev->lock); > } > Forgot to mention this, above extrem long lines really is not good, please fix it. I think following should be enough: raid5d md_write_start if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) // failed set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) md_wakeup_thread wait_event_lock_irq /* * wait for MD_SB_CHANGE_PENDING to be cleared, while * md_write_start expect daemon thread to clear it. */ Thanks, Kuai > ... > > wait_event(mddev->sb_wait, >>>>>>>>>> hung > !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) || > mddev->suspended); > > The issue that reverted commit is fixing is fixed by last patch in a new way. > > Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d") > Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> > --- > drivers/md/raid5.c | 12 ------------ > 1 file changed, 12 deletions(-) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index dc031d42f53b..fcc8a44dd4fd 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -36,7 +36,6 @@ > */ > > #include <linux/blkdev.h> > -#include <linux/delay.h> > #include <linux/kthread.h> > #include <linux/raid/pq.h> > #include <linux/async_tx.h> > @@ -6820,18 +6819,7 @@ static void raid5d(struct md_thread *thread) > spin_unlock_irq(&conf->device_lock); > md_check_recovery(mddev); > spin_lock_irq(&conf->device_lock); > - > - /* > - * Waiting on MD_SB_CHANGE_PENDING below may deadlock > - * seeing md_check_recovery() is needed to clear > - * the flag when using mdmon. > - */ > - continue; > } > - > - wait_event_lock_irq(mddev->sb_wait, > - !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags), > - conf->device_lock); > } > pr_debug("%d stripes handled\n", handled); > >
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index dc031d42f53b..fcc8a44dd4fd 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -36,7 +36,6 @@ */ #include <linux/blkdev.h> -#include <linux/delay.h> #include <linux/kthread.h> #include <linux/raid/pq.h> #include <linux/async_tx.h> @@ -6820,18 +6819,7 @@ static void raid5d(struct md_thread *thread) spin_unlock_irq(&conf->device_lock); md_check_recovery(mddev); spin_lock_irq(&conf->device_lock); - - /* - * Waiting on MD_SB_CHANGE_PENDING below may deadlock - * seeing md_check_recovery() is needed to clear - * the flag when using mdmon. - */ - continue; } - - wait_event_lock_irq(mddev->sb_wait, - !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags), - conf->device_lock); } pr_debug("%d stripes handled\n", handled);