Message ID | 20240125082131.788600-1-song@kernel.org (mailing list archive) |
---|---|
State | Accepted, archived |
Series | Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"" |
On 2024/01/25 16:21, Song Liu wrote:
> This reverts commit bed9e27baf52a09b7ba2a3714f1e24e17ced386d.
>
> The original set [1][2] was expected to undo a suboptimal fix in [2], and
> replace it with a better fix [1]. However, as reported by Dan Moulding, [2]
> causes an issue with raid5 with journal device.
>
> Revert [2] for now to close the issue. We will follow up on another issue
> reported by Junxiao Bi, as [2] is expected to fix it. We believe this is a
> good trade-off, because the latter issue happens less frequently.
>
> In the meanwhile, we will NOT revert [1], as it contains the right logic.
>
> Reported-by: Dan Moulding <dan@danm.net>
> Closes: https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@danm.net/
> Fixes: bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
> Cc: stable@vger.kernel.org # v5.19+
> Cc: Junxiao Bi <junxiao.bi@oracle.com>
> Cc: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Song Liu <song@kernel.org>
>
> [1] commit d6e035aad6c0 ("md: bypass block throttle for superblock update")
> [2] commit bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")

LGTM

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Thank you, Song. Let me know if there is any more information I can provide to help diagnose or reproduce this.

-- Dan
Should we get some understanding of what the issue is before reverting the commit? I am not clear on what the issue is; I have already asked Dan in another thread.

Thanks,
Junxiao.

On 1/25/24 12:21 AM, Song Liu wrote:
> This reverts commit bed9e27baf52a09b7ba2a3714f1e24e17ced386d.
>
> The original set [1][2] was expected to undo a suboptimal fix in [2], and
> replace it with a better fix [1]. However, as reported by Dan Moulding, [2]
> causes an issue with raid5 with journal device.
>
> Revert [2] for now to close the issue. We will follow up on another issue
> reported by Junxiao Bi, as [2] is expected to fix it. We believe this is a
> good trade-off, because the latter issue happens less frequently.
>
> In the meanwhile, we will NOT revert [1], as it contains the right logic.
>
> Reported-by: Dan Moulding <dan@danm.net>
> Closes: https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@danm.net/
> Fixes: bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
> Cc: stable@vger.kernel.org # v5.19+
> Cc: Junxiao Bi <junxiao.bi@oracle.com>
> Cc: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Song Liu <song@kernel.org>
>
> [1] commit d6e035aad6c0 ("md: bypass block throttle for superblock update")
> [2] commit bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
> ---
>  drivers/md/raid5.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8497880135ee..2b2f03705990 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -36,6 +36,7 @@
>   */
>
>  #include <linux/blkdev.h>
> +#include <linux/delay.h>
>  #include <linux/kthread.h>
>  #include <linux/raid/pq.h>
>  #include <linux/async_tx.h>
> @@ -6773,7 +6774,18 @@ static void raid5d(struct md_thread *thread)
>  			spin_unlock_irq(&conf->device_lock);
>  			md_check_recovery(mddev);
>  			spin_lock_irq(&conf->device_lock);
> +
> +			/*
> +			 * Waiting on MD_SB_CHANGE_PENDING below may deadlock
> +			 * seeing md_check_recovery() is needed to clear
> +			 * the flag when using mdmon.
> +			 */
> +			continue;
>  		}
> +
> +		wait_event_lock_irq(mddev->sb_wait,
> +			!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags),
> +			conf->device_lock);
>  	}
>  	pr_debug("%d stripes handled\n", handled);
>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8497880135ee..2b2f03705990 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -36,6 +36,7 @@
  */
 
 #include <linux/blkdev.h>
+#include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/raid/pq.h>
 #include <linux/async_tx.h>
@@ -6773,7 +6774,18 @@ static void raid5d(struct md_thread *thread)
 			spin_unlock_irq(&conf->device_lock);
 			md_check_recovery(mddev);
 			spin_lock_irq(&conf->device_lock);
+
+			/*
+			 * Waiting on MD_SB_CHANGE_PENDING below may deadlock
+			 * seeing md_check_recovery() is needed to clear
+			 * the flag when using mdmon.
+			 */
+			continue;
 		}
+
+		wait_event_lock_irq(mddev->sb_wait,
+			!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags),
+			conf->device_lock);
 	}
 	pr_debug("%d stripes handled\n", handled);
This reverts commit bed9e27baf52a09b7ba2a3714f1e24e17ced386d.

The original set [1][2] was expected to undo a suboptimal fix in [2], and
replace it with a better fix [1]. However, as reported by Dan Moulding, [2]
causes an issue with raid5 with journal device.

Revert [2] for now to close the issue. We will follow up on another issue
reported by Junxiao Bi, as [2] is expected to fix it. We believe this is a
good trade-off, because the latter issue happens less frequently.

In the meanwhile, we will NOT revert [1], as it contains the right logic.

Reported-by: Dan Moulding <dan@danm.net>
Closes: https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@danm.net/
Fixes: bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
Cc: stable@vger.kernel.org # v5.19+
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>

[1] commit d6e035aad6c0 ("md: bypass block throttle for superblock update")
[2] commit bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
---
 drivers/md/raid5.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
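
For readers unfamiliar with the pattern, the wait_event_lock_irq() call restored by this patch sleeps until MD_SB_CHANGE_PENDING is clear, dropping conf->device_lock while asleep and retaking it before returning. The following is only a rough userspace analogy using POSIX threads, not kernel code: struct fake_mddev, sb_writer(), worker_pass() and demo_wait_for_sb_flag() are hypothetical names invented for this sketch, and a mutex plus condition variable stand in for the spinlock and wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct fake_mddev {
	pthread_mutex_t device_lock;	/* plays the role of conf->device_lock */
	pthread_cond_t sb_wait;		/* plays the role of mddev->sb_wait */
	bool sb_change_pending;		/* plays the role of MD_SB_CHANGE_PENDING */
	int stripes_handled;
};

/* Stand-in for the superblock update finishing: clear the flag, wake waiters. */
static void *sb_writer(void *arg)
{
	struct fake_mddev *m = arg;

	pthread_mutex_lock(&m->device_lock);
	m->sb_change_pending = false;
	pthread_cond_broadcast(&m->sb_wait);
	pthread_mutex_unlock(&m->device_lock);
	return NULL;
}

/*
 * Stand-in for one pass of the worker loop: with the lock held, wait until
 * the flag is clear. pthread_cond_wait() releases the mutex while sleeping
 * and reacquires it before returning, so the flag is re-checked under the lock.
 */
static int worker_pass(struct fake_mddev *m)
{
	int handled;

	pthread_mutex_lock(&m->device_lock);
	while (m->sb_change_pending)
		pthread_cond_wait(&m->sb_wait, &m->device_lock);

	/* Here the flag is clear and the lock is held again. */
	m->stripes_handled++;
	handled = m->stripes_handled;
	pthread_mutex_unlock(&m->device_lock);
	return handled;
}

/* Runs one writer and one worker pass; returns stripes handled, or -1 on error. */
int demo_wait_for_sb_flag(void)
{
	struct fake_mddev m;
	pthread_t writer;
	int handled;

	pthread_mutex_init(&m.device_lock, NULL);
	pthread_cond_init(&m.sb_wait, NULL);
	m.sb_change_pending = true;
	m.stripes_handled = 0;

	if (pthread_create(&writer, NULL, sb_writer, &m) != 0)
		return -1;

	handled = worker_pass(&m);
	pthread_join(writer, NULL);

	assert(!m.sb_change_pending);
	pthread_cond_destroy(&m.sb_wait);
	pthread_mutex_destroy(&m.device_lock);
	return handled;
}
```

Calling demo_wait_for_sb_flag() from a small main() (built with -pthread) exercises the wait once. The sketch also hints at why the patch adds `continue` after md_check_recovery(): if the only code able to clear the flag ran in the waiting thread itself, as the patch comment says can happen with mdmon, the wait could never finish.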