Message ID | 20240830072721.2112006-1-yukuai1@huaweicloud.com (mailing list archive) |
---|---|
Series | md: enhance faulty chekcing for blocked handling |
On Fri, 30 Aug 2024 15:27:14 +0800
Yu Kuai <yukuai1@huaweicloud.com> wrote:

> From: Yu Kuai <yukuai3@huawei.com>
>
> The lifetime of badblocks:
>
> - IO error occurs; badblocks are recorded and sb_flags is set;
> - write IO finds that the rdev has badblocks that are not yet
>   acknowledged, so this IO is blocked;
> - the daemon finds that sb_flags is set, updates the superblock and
>   flushes badblocks;
> - write IO continues.
>
> The main idea is that badblocks are set in memory first; until they are
> acknowledged, new write requests must be blocked to prevent reading old
> data after a power failure. This behaviour is not necessary if the rdev
> is faulty in the first place.
>
> Yu Kuai (7):
>   md: add a new helper rdev_blocked()
>   md: don't wait faulty rdev in md_wait_for_blocked_rdev()
>   md: don't record new badblocks for faulty rdev
>   md/raid1: factor out helper to handle blocked rdev from
>     raid1_write_request()
>   md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
>   md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
>   md/raid5: don't set Faulty rdev for blocked_rdev
>
>  drivers/md/md.c     |  8 +++--
>  drivers/md/md.h     | 24 +++++++++++++++
>  drivers/md/raid1.c  | 75 +++++++++++++++++++++++---------------------
>  drivers/md/raid10.c | 40 +++++++++++-------------
>  drivers/md/raid5.c  | 13 ++++----
>  5 files changed, 92 insertions(+), 68 deletions(-)
>

Hi Song,

We need to test this with external metadata, so please wait for our green
light before you take this. I checked the code and it looks safe, but I need
to double-check it to avoid hung tasks.

Thanks,
Mariusz

On Fri, 30 Aug 2024 15:27:14 +0800
Yu Kuai <yukuai1@huaweicloud.com> wrote:

> From: Yu Kuai <yukuai3@huawei.com>
>
> The lifetime of badblocks:
>
> - IO error occurs; badblocks are recorded and sb_flags is set;
> - write IO finds that the rdev has badblocks that are not yet
>   acknowledged, so this IO is blocked;
> - the daemon finds that sb_flags is set, updates the superblock and
>   flushes badblocks;
> - write IO continues.
>
> The main idea is that badblocks are set in memory first; until they are
> acknowledged, new write requests must be blocked to prevent reading old
> data after a power failure. This behaviour is not necessary if the rdev
> is faulty in the first place.
>
> Yu Kuai (7):
>   md: add a new helper rdev_blocked()
>   md: don't wait faulty rdev in md_wait_for_blocked_rdev()
>   md: don't record new badblocks for faulty rdev
>   md/raid1: factor out helper to handle blocked rdev from
>     raid1_write_request()
>   md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
>   md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
>   md/raid5: don't set Faulty rdev for blocked_rdev
>
>  drivers/md/md.c     |  8 +++--
>  drivers/md/md.h     | 24 +++++++++++++++
>  drivers/md/raid1.c  | 75 +++++++++++++++++++++++---------------------
>  drivers/md/raid10.c | 40 +++++++++++-------------
>  drivers/md/raid5.c  | 13 ++++----
>  5 files changed, 92 insertions(+), 68 deletions(-)
>

Hi,

We tested this patchset.

mdmon rework:
https://github.com/md-raid-utilities/mdadm/pull/66

Kernel build torvalds/linux.git master:
commit e32cde8d2bd7d251a8f9b434143977ddf13dcec6

I applied this patchset on top of that.

My tests showed that:
- If only the mdmon PR is applied, hangs are reproducible.
- If only this patchset is applied, hangs are reproducible.
- If both the kernel patchset and the mdmon rework are applied, hangs are
  not reproducible (at least so far).

It was a tricky topic (I needed to deal with weird issues related to shared
descriptors in mdmon).

Most importantly, no regression was detected.

Thanks,
Mariusz

Dear Kuai,

Thank you for this patch series. Just a note about the typo in che*ck*ing.

Kind regards,

Paul

Hi,

On 2024/10/09 15:14, Mariusz Tkaczyk wrote:
> On Fri, 30 Aug 2024 15:27:14 +0800
> Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> The lifetime of badblocks:
>>
>> - IO error occurs; badblocks are recorded and sb_flags is set;
>> - write IO finds that the rdev has badblocks that are not yet
>>   acknowledged, so this IO is blocked;
>> - the daemon finds that sb_flags is set, updates the superblock and
>>   flushes badblocks;
>> - write IO continues.
>>
>> The main idea is that badblocks are set in memory first; until they are
>> acknowledged, new write requests must be blocked to prevent reading old
>> data after a power failure. This behaviour is not necessary if the rdev
>> is faulty in the first place.
>>
>> Yu Kuai (7):
>>   md: add a new helper rdev_blocked()
>>   md: don't wait faulty rdev in md_wait_for_blocked_rdev()
>>   md: don't record new badblocks for faulty rdev
>>   md/raid1: factor out helper to handle blocked rdev from
>>     raid1_write_request()
>>   md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
>>   md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
>>   md/raid5: don't set Faulty rdev for blocked_rdev
>>
>>  drivers/md/md.c     |  8 +++--
>>  drivers/md/md.h     | 24 +++++++++++++++
>>  drivers/md/raid1.c  | 75 +++++++++++++++++++++++---------------------
>>  drivers/md/raid10.c | 40 +++++++++++-------------
>>  drivers/md/raid5.c  | 13 ++++----
>>  5 files changed, 92 insertions(+), 68 deletions(-)
>>
>
> Hi,
> We tested this patchset.
>
> mdmon rework:
> https://github.com/md-raid-utilities/mdadm/pull/66
>
> Kernel build torvalds/linux.git master:
> commit e32cde8d2bd7d251a8f9b434143977ddf13dcec6
>
> I applied this patchset on top of that.
>
> My tests showed that:
> - If only the mdmon PR is applied, hangs are reproducible.
> - If only this patchset is applied, hangs are reproducible.
> - If both the kernel patchset and the mdmon rework are applied, hangs are
>   not reproducible (at least so far).
>
> It was a tricky topic (I needed to deal with weird issues related to shared
> descriptors in mdmon).
>
> Most importantly, no regression was detected.

Good to hear that, I'll send a V2 then. Normally this set would land in
v6.13, because it doesn't look like a kernel fix. :)

Thanks,
Kuai

>
> Thanks,
> Mariusz
>

Hi,

On 2024/10/09 16:52, Paul Menzel wrote:
> Dear Kuai,
>
>
> Thank you for this patch series. Just a note about the typo in che*ck*ing.

Thanks for the notice. :)

Kuai

>
>
> Kind regards,
>
> Paul
>

From: Yu Kuai <yukuai3@huawei.com>

The lifetime of badblocks:

- IO error occurs; badblocks are recorded and sb_flags is set;
- write IO finds that the rdev has badblocks that are not yet
  acknowledged, so this IO is blocked;
- the daemon finds that sb_flags is set, updates the superblock and
  flushes badblocks;
- write IO continues.

The main idea is that badblocks are set in memory first; until they are
acknowledged, new write requests must be blocked to prevent reading old
data after a power failure. This behaviour is not necessary if the rdev
is faulty in the first place.

Yu Kuai (7):
  md: add a new helper rdev_blocked()
  md: don't wait faulty rdev in md_wait_for_blocked_rdev()
  md: don't record new badblocks for faulty rdev
  md/raid1: factor out helper to handle blocked rdev from
    raid1_write_request()
  md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
  md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
  md/raid5: don't set Faulty rdev for blocked_rdev

 drivers/md/md.c     |  8 +++--
 drivers/md/md.h     | 24 +++++++++++++++
 drivers/md/raid1.c  | 75 +++++++++++++++++++++++---------------------
 drivers/md/raid10.c | 40 +++++++++++-------------
 drivers/md/raid5.c  | 13 ++++----
 5 files changed, 92 insertions(+), 68 deletions(-)
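
For readers who want the badblocks lifetime from the cover letter in concrete form, here is a minimal userspace sketch in C. It is not the code from the series: the body of rdev_blocked() is not shown in this thread, so `struct model_rdev` and every `model_*` function below are hypothetical names invented for illustration, and the logic is a simplification of the rule "block new writes until badblocks are acknowledged, unless the rdev is already faulty".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model. This is NOT the kernel's struct md_rdev;
 * the two fields only mirror the states named in the cover letter.
 */
struct model_rdev {
	bool faulty;            /* device has already been marked faulty */
	bool unacked_badblocks; /* badblocks set in memory, not yet flushed */
};

/* Step 1: an IO error records badblocks in memory first. */
static inline void model_record_badblocks(struct model_rdev *rdev)
{
	/* Assumption based on the title of patch 3: skip faulty devices. */
	if (rdev->faulty)
		return;
	rdev->unacked_badblocks = true;
}

/* Step 2: decide whether a new write request has to wait on this rdev. */
static inline bool model_write_must_wait(const struct model_rdev *rdev)
{
	/* A faulty device is not written to again, so waiting is pointless. */
	if (rdev->faulty)
		return false;
	/* Otherwise the write waits until the badblocks are acknowledged. */
	return rdev->unacked_badblocks;
}

/* Step 3: the daemon updates the superblock and acknowledges badblocks. */
static inline void model_daemon_ack(struct model_rdev *rdev)
{
	rdev->unacked_badblocks = false;
}
```

In this model a healthy rdev makes writes wait between model_record_badblocks() and model_daemon_ack(), while an rdev with `faulty` set never makes a write wait, even if unacknowledged badblocks were already present when it failed.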