Message ID | 20250408074909.729471-1-meir.elisha@volumez.com (mailing list archive) |
---|---|
State | New |
Series | md/raid1: Add check for missing source disk in process_checks() |
Hi,

On 2025/04/08 15:49, Meir Elisha wrote:
> During recovery/check operations, the process_checks function loops
> through available disks to find a 'primary' source with successfully
> read data.
>
> If no suitable source disk is found after checking all possibilities,
> the 'primary' index will reach conf->raid_disks * 2. Add an explicit
> check for this condition after the loop. If no source disk was found,
> print an error message and return early to prevent further processing
> without a valid primary source.
>
> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
> ---
> This was observed when forcefully disconnecting all iSCSI devices backing
> a RAID1 array (using the --failfast flag) during a check operation, causing
> all reads within process_checks to fail. The resulting kernel oops shows:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000040
> RIP: 0010:process_checks+0x25e/0x5e0 [raid1]
> Code: ... <4c> 8b 53 40 ... // mov r10,[rbx+0x40]
> Call Trace:
>  process_checks
>  sync_request_write
>  raid1d
>  md_thread
>  kthread
>  ret_from_fork
>
>  drivers/md/raid1.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 0efc03cea24e..b6a52c137f53 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2296,6 +2296,12 @@ static void process_checks(struct r1bio *r1_bio)
>  			rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
>  			break;
>  		}
> +
> +	if (primary >= conf->raid_disks * 2) {
> +		pr_err_ratelimited("md/raid1:%s: unable to find source disk\n", mdname(mddev));
> +		return;
> +	}

This means all reads failed, so I'm a bit confused why
sync_request_write() doesn't return early:

end_sync_read
	if (!bio->bi_status)
		// uptodate is not set
		set_bit(R1BIO_Uptodate, &r1_bio->state);

sync_request_write
	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
		if (!fix_sync_read_error(r1_bio))
			// why not return here?
			return;

	process_checks(r1_bio);

The answer is that fix_sync_read_error() is meant for the case where just
one rdev is read: it only handles the bio from r1_bio->read_disk and
assumes that a single rdev was read. And if rdev_set_badblocks() succeeds,
fix_sync_read_error() will clear R1BIO_Uptodate and bi_status and
finally cause this problem.

In this case, however, all member disks are read for check/repair, hence I
think fix_sync_read_error() is not supposed to be called here, and a
better solution would be:

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d422bab77580..dafda9095c73 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2346,10 +2346,15 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
 	int disks = conf->raid_disks * 2;
 	struct bio *wbio;
 
-	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
-		/* ouch - failed to read all of that. */
-		if (!fix_sync_read_error(r1_bio))
+	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
+		/*
+		 * ouch - failed to read all of that. No need to fix the read
+		 * error for check or repair, because reads of all member
+		 * disks failed.
+		 */
+		if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) ||
+		    !fix_sync_read_error(r1_bio))
 			return;
+	}
 
 	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
 		process_checks(r1_bio);

Thanks,
Kuai

> +
> 	r1_bio->read_disk = primary;
> 	for (i = 0; i < conf->raid_disks * 2; i++) {
> 		int j = 0;
>
Hi,

On 2025/04/08 16:28, Yu Kuai wrote:
> Hi,
>
> On 2025/04/08 15:49, Meir Elisha wrote:
>> During recovery/check operations, the process_checks function loops
>> through available disks to find a 'primary' source with successfully
>> read data.
>>
>> If no suitable source disk is found after checking all possibilities,
>> the 'primary' index will reach conf->raid_disks * 2. Add an explicit
>> check for this condition after the loop. If no source disk was found,
>> print an error message and return early to prevent further processing
>> without a valid primary source.
>>
>> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
>> ---
>> This was observed when forcefully disconnecting all iSCSI devices backing
>> a RAID1 array (using the --failfast flag) during a check operation, causing
>> all reads within process_checks to fail. The resulting kernel oops shows:
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000040
>> RIP: 0010:process_checks+0x25e/0x5e0 [raid1]
>> Code: ... <4c> 8b 53 40 ... // mov r10,[rbx+0x40]
>> Call Trace:
>>  process_checks
>>  sync_request_write
>>  raid1d
>>  md_thread
>>  kthread
>>  ret_from_fork
>>
>>  drivers/md/raid1.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 0efc03cea24e..b6a52c137f53 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -2296,6 +2296,12 @@ static void process_checks(struct r1bio *r1_bio)
>>  			rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
>>  			break;
>>  		}
>> +
>> +	if (primary >= conf->raid_disks * 2) {
>> +		pr_err_ratelimited("md/raid1:%s: unable to find source disk\n", mdname(mddev));
>> +		return;
>> +	}
>
> This means all reads failed, so I'm a bit confused why
> sync_request_write() doesn't return early:
>
> end_sync_read
> 	if (!bio->bi_status)
> 		// uptodate is not set
> 		set_bit(R1BIO_Uptodate, &r1_bio->state);
>
> sync_request_write
> 	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> 		if (!fix_sync_read_error(r1_bio))
> 			// why not return here?
> 			return;
>
> 	process_checks(r1_bio);
>
> The answer is that fix_sync_read_error() is meant for the case where just
> one rdev is read: it only handles the bio from r1_bio->read_disk and
> assumes that a single rdev was read. And if rdev_set_badblocks() succeeds,
> fix_sync_read_error() will clear R1BIO_Uptodate and bi_status and
> finally cause this problem.
>
> In this case, however, all member disks are read for check/repair, hence I
> think fix_sync_read_error() is not supposed to be called here, and a
> better solution would be:
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d422bab77580..dafda9095c73 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2346,10 +2346,15 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
>  	int disks = conf->raid_disks * 2;
>  	struct bio *wbio;
>
> -	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> -		/* ouch - failed to read all of that. */
> -		if (!fix_sync_read_error(r1_bio))
> +	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
> +		/*
> +		 * ouch - failed to read all of that. No need to fix the read
> +		 * error for check or repair, because reads of all member
> +		 * disks failed.
> +		 */
> +		if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) ||
> +		    !fix_sync_read_error(r1_bio))
>  			return;
> +	}
>
>  	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
>  		process_checks(r1_bio);

Sorry, I forgot to release the r1bio; please check the following patch:

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d422bab77580..1a04dc91a8aa 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2200,14 +2200,9 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			if (!rdev_set_badblocks(rdev, sect, s, 0))
 				abort = 1;
 		}
-		if (abort) {
-			conf->recovery_disabled =
-				mddev->recovery_disabled;
-			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_done_sync(mddev, r1_bio->sectors, 0);
-			put_buf(r1_bio);
+		if (abort)
 			return 0;
-		}
+
 		/* Try next page */
 		sectors -= s;
 		sect += s;
@@ -2346,10 +2341,20 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
 	int disks = conf->raid_disks * 2;
 	struct bio *wbio;
 
-	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
-		/* ouch - failed to read all of that. */
-		if (!fix_sync_read_error(r1_bio))
+	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
+		/*
+		 * ouch - failed to read all of that. No need to fix the read
+		 * error for check or repair, because all member disks are
+		 * read.
+		 */
+		if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) ||
+		    !fix_sync_read_error(r1_bio)) {
+			conf->recovery_disabled = mddev->recovery_disabled;
+			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+			md_done_sync(mddev, r1_bio->sectors, 0);
+			put_buf(r1_bio);
 			return;
+		}
+	}
 
 	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
 		process_checks(r1_bio);

Thanks,
Kuai

>
> Thanks,
> Kuai
>
>> +
>> 	r1_bio->read_disk = primary;
>> 	for (i = 0; i < conf->raid_disks * 2; i++) {
>> 		int j = 0;
>>
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0efc03cea24e..b6a52c137f53 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2296,6 +2296,12 @@ static void process_checks(struct r1bio *r1_bio)
 			rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
 			break;
 		}
+
+	if (primary >= conf->raid_disks * 2) {
+		pr_err_ratelimited("md/raid1:%s: unable to find source disk\n", mdname(mddev));
+		return;
+	}
+
 	r1_bio->read_disk = primary;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		int j = 0;
During recovery/check operations, the process_checks function loops
through available disks to find a 'primary' source with successfully
read data.

If no suitable source disk is found after checking all possibilities,
the 'primary' index will reach conf->raid_disks * 2. Add an explicit
check for this condition after the loop. If no source disk was found,
print an error message and return early to prevent further processing
without a valid primary source.

Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
---
This was observed when forcefully disconnecting all iSCSI devices backing
a RAID1 array (using the --failfast flag) during a check operation, causing
all reads within process_checks to fail. The resulting kernel oops shows:

BUG: kernel NULL pointer dereference, address: 0000000000000040
RIP: 0010:process_checks+0x25e/0x5e0 [raid1]
Code: ... <4c> 8b 53 40 ... // mov r10,[rbx+0x40]
Call Trace:
 process_checks
 sync_request_write
 raid1d
 md_thread
 kthread
 ret_from_fork

 drivers/md/raid1.c | 6 ++++++
 1 file changed, 6 insertions(+)