Message ID | 20240830072721.2112006-4-yukuai1@huaweicloud.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | md: enhance faulty chekcing for blocked handling | expand |
Context | Check | Description |
---|---|---|
mdraidci/vmtest-md-6_12-PR | success | PR summary |
mdraidci/vmtest-md-6_12-VM_Test-0 | success | Logs for per-patch-testing |
On Fri, 30 Aug 2024 15:27:17 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote: > From: Yu Kuai <yukuai3@huawei.com> > > Faulty will be checked before issuing IO to the rdev, however, rdev can > be faulty at any time, hence it's possible that rdev_set_badblocks() > will be called for faulty rdev. In this case, mddev->sb_flags will be > set and some other path can be blocked by updating super block. > > Since faulty rdev will not be accesed anymore, there is no need to > record new babblocks for faulty rdev and forcing updating super block. > > Noted this is not a bugfix, just prevent updating superblock in some > corner cases, and will help to slice a bug related to external > metadata[1]. > > [1] > https://lore.kernel.org/all/f34452df-810b-48b2-a9b4-7f925699a9e7@linux.intel.com/ > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > drivers/md/md.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 675d89597c7b..a3f7f407fe42 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -9757,6 +9757,10 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t > s, int sectors, { > struct mddev *mddev = rdev->mddev; > int rv; > + > + if (test_bit(Faulty, &rdev->flags)) > + return 1; > + Blame is volatile, this is why we need a comment here :) Otherwise, someone may remove that. Thanks, Mariusz
Hi, 在 2024/08/30 18:28, Mariusz Tkaczyk 写道: > On Fri, 30 Aug 2024 15:27:17 +0800 > Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> From: Yu Kuai <yukuai3@huawei.com> >> >> Faulty will be checked before issuing IO to the rdev, however, rdev can >> be faulty at any time, hence it's possible that rdev_set_badblocks() >> will be called for faulty rdev. In this case, mddev->sb_flags will be >> set and some other path can be blocked by updating super block. >> >> Since faulty rdev will not be accesed anymore, there is no need to >> record new babblocks for faulty rdev and forcing updating super block. >> >> Noted this is not a bugfix, just prevent updating superblock in some >> corner cases, and will help to slice a bug related to external >> metadata[1]. >> >> [1] >> https://lore.kernel.org/all/f34452df-810b-48b2-a9b4-7f925699a9e7@linux.intel.com/ >> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >> --- >> drivers/md/md.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 675d89597c7b..a3f7f407fe42 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -9757,6 +9757,10 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t >> s, int sectors, { >> struct mddev *mddev = rdev->mddev; >> int rv; >> + >> + if (test_bit(Faulty, &rdev->flags)) >> + return 1; >> + > > Blame is volatile, this is why we need a comment here :) > Otherwise, someone may remove that. Perhaps something like following? /* * record new babblocks for faulty rdev will force unnecessary * super block updating. */ Thanks, Kuai > > Thanks, > Mariusz > > > . >
On Sat, 31 Aug 2024 09:14:39 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote: > Hi, > > 在 2024/08/30 18:28, Mariusz Tkaczyk 写道: > > On Fri, 30 Aug 2024 15:27:17 +0800 > > Yu Kuai <yukuai1@huaweicloud.com> wrote: > > > >> From: Yu Kuai <yukuai3@huawei.com> > >> > >> Faulty will be checked before issuing IO to the rdev, however, rdev can > >> be faulty at any time, hence it's possible that rdev_set_badblocks() > >> will be called for faulty rdev. In this case, mddev->sb_flags will be > >> set and some other path can be blocked by updating super block. > >> > >> Since faulty rdev will not be accesed anymore, there is no need to > >> record new babblocks for faulty rdev and forcing updating super block. > >> > >> Noted this is not a bugfix, just prevent updating superblock in some > >> corner cases, and will help to slice a bug related to external > >> metadata[1]. > >> > >> [1] > >> https://lore.kernel.org/all/f34452df-810b-48b2-a9b4-7f925699a9e7@linux.intel.com/ > >> > >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> > >> --- > >> drivers/md/md.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c > >> index 675d89597c7b..a3f7f407fe42 100644 > >> --- a/drivers/md/md.c > >> +++ b/drivers/md/md.c > >> @@ -9757,6 +9757,10 @@ int rdev_set_badblocks(struct md_rdev *rdev, > >> sector_t s, int sectors, { > >> struct mddev *mddev = rdev->mddev; > >> int rv; > >> + > >> + if (test_bit(Faulty, &rdev->flags)) > >> + return 1; > >> + > > > > Blame is volatile, this is why we need a comment here :) > > Otherwise, someone may remove that. > > Perhaps something like following? > > /* > * record new babblocks for faulty rdev will force unnecessary > * super block updating. > */ > Almost, we need to refer to external context because this is important to mention where to expect issues: /* * Recording new badblocks for faulty rdev will force unnecessary * super block updating. This is fragile for external management because * userspace daemon may trying to remove this device and deadlock may * occur. This will be probably solved in the mdadm, but it is safer to avoid * it. */ In my testing, I observed that it improves failing bios and device removal path (recording badblock is simply expensive if there are many badblocks) so the devices are removed faster but I don't have data here, this is what I saw. Obviously, it is optimization. Mariusz
Hi, 在 2024/09/02 16:55, Mariusz Tkaczyk 写道: > On Sat, 31 Aug 2024 09:14:39 +0800 > Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> Hi, >> >> 在 2024/08/30 18:28, Mariusz Tkaczyk 写道: >>> On Fri, 30 Aug 2024 15:27:17 +0800 >>> Yu Kuai <yukuai1@huaweicloud.com> wrote: >>> >>>> From: Yu Kuai <yukuai3@huawei.com> >>>> >>>> Faulty will be checked before issuing IO to the rdev, however, rdev can >>>> be faulty at any time, hence it's possible that rdev_set_badblocks() >>>> will be called for faulty rdev. In this case, mddev->sb_flags will be >>>> set and some other path can be blocked by updating super block. >>>> >>>> Since faulty rdev will not be accesed anymore, there is no need to >>>> record new babblocks for faulty rdev and forcing updating super block. >>>> >>>> Noted this is not a bugfix, just prevent updating superblock in some >>>> corner cases, and will help to slice a bug related to external >>>> metadata[1]. >>>> >>>> [1] >>>> https://lore.kernel.org/all/f34452df-810b-48b2-a9b4-7f925699a9e7@linux.intel.com/ >>>> >>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >>>> --- >>>> drivers/md/md.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>>> index 675d89597c7b..a3f7f407fe42 100644 >>>> --- a/drivers/md/md.c >>>> +++ b/drivers/md/md.c >>>> @@ -9757,6 +9757,10 @@ int rdev_set_badblocks(struct md_rdev *rdev, >>>> sector_t s, int sectors, { >>>> struct mddev *mddev = rdev->mddev; >>>> int rv; >>>> + >>>> + if (test_bit(Faulty, &rdev->flags)) >>>> + return 1; >>>> + >>> >>> Blame is volatile, this is why we need a comment here :) >>> Otherwise, someone may remove that. >> >> Perhaps something like following? >> >> /* >> * record new babblocks for faulty rdev will force unnecessary >> * super block updating. >> */ >> > > Almost, we need to refer to external context because this is important to > mention where to expect issues: > > /* > * Recording new badblocks for faulty rdev will force unnecessary > * super block updating. This is fragile for external management because > * userspace daemon may trying to remove this device and deadlock may > * occur. This will be probably solved in the mdadm, but it is safer to avoid > * it. > */ > > In my testing, I observed that it improves failing bios and device removal > path (recording badblock is simply expensive if there are many badblocks) so > the devices are removed faster but I don't have data here, this is what I saw. I'll mention this in the commit message, and add the above comment in v2. Thanks, Kuai > > Obviously, it is optimization. > > Mariusz > > . >
diff --git a/drivers/md/md.c b/drivers/md/md.c index 675d89597c7b..a3f7f407fe42 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9757,6 +9757,10 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors, { struct mddev *mddev = rdev->mddev; int rv; + + if (test_bit(Faulty, &rdev->flags)) + return 1; + if (is_new) s += rdev->new_data_offset; else