diff mbox series

md/raid1: Add check for missing source disk in process_checks()

Message ID 20250408074909.729471-1-meir.elisha@volumez.com (mailing list archive)
State New
Headers show
Series md/raid1: Add check for missing source disk in process_checks() | expand

Commit Message

Meir Elisha April 8, 2025, 7:49 a.m. UTC
During recovery/check operations, the process_checks function loops
through available disks to find a 'primary' source with successfully
read data.

If no suitable source disk is found after checking all possibilities,
the 'primary' index will reach conf->raid_disks * 2. Add an explicit
check for this condition after the loop. If no source disk was found,
print an error message and return early to prevent further processing
without a valid primary source.

Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
---
This was observed when forcefully disconnecting all iSCSI devices backing
a RAID1 array(using --failfast flag) during a check operation, causing
all reads within process_checks to fail. The resulting kernel oops shows:

  BUG: kernel NULL pointer dereference, address: 0000000000000040
  RIP: 0010:process_checks+0x25e/0x5e0 [raid1]
  Code: ... <4c> 8b 53 40 ... // mov r10,[rbx+0x40]
  Call Trace:
   process_checks
   sync_request_write
   raid1d
   md_thread
   kthread
   ret_from_fork

 drivers/md/raid1.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Yu Kuai April 8, 2025, 8:28 a.m. UTC | #1
Hi,

在 2025/04/08 15:49, Meir Elisha 写道:
> During recovery/check operations, the process_checks function loops
> through available disks to find a 'primary' source with successfully
> read data.
> 
> If no suitable source disk is found after checking all possibilities,
> the 'primary' index will reach conf->raid_disks * 2. Add an explicit
> check for this condition after the loop. If no source disk was found,
> print an error message and return early to prevent further processing
> without a valid primary source.
> 
> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
> ---
> This was observed when forcefully disconnecting all iSCSI devices backing
> a RAID1 array(using --failfast flag) during a check operation, causing
> all reads within process_checks to fail. The resulting kernel oops shows:
> 
>    BUG: kernel NULL pointer dereference, address: 0000000000000040
>    RIP: 0010:process_checks+0x25e/0x5e0 [raid1]
>    Code: ... <4c> 8b 53 40 ... // mov r10,[rbx+0x40]
>    Call Trace:
>     process_checks
>     sync_request_write
>     raid1d
>     md_thread
>     kthread
>     ret_from_fork
> 
>   drivers/md/raid1.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 0efc03cea24e..b6a52c137f53 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2296,6 +2296,12 @@ static void process_checks(struct r1bio *r1_bio)
>   			rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
>   			break;
>   		}
> +
> +	if (primary >= conf->raid_disks * 2) {
> +		pr_err_ratelimited("md/raid1:%s: unable to find source disk\n", mdname(mddev));
> +		return;
> +	}

This means all reads failed, then I'm a bit confused why
sync_request_write() doesn't return early?

end_sync_read
  if (!bio->bi_status)
   // uptodate is not set
   set_bit(R1BIO_Uptodate, &r1_bio->state);

sync_request_write
  if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
   if (!fix_sync_read_error(r1_bio))
    // why not return here?
    return;

  process_checks(r1_bio);

The answer is that fix_sync_read_error() is used for the case just one
rdev is read, it just handle the bio from r1_bio->read_disk, and suppose 
that just one rdev is read. And if rdev_set_badblocks() succeed,
fix_sync_read_error() will clear R1BIO_Uptodate and bi_status and
finally cause this problem.

While in this case, all member disks are read for check/repair, hence I
think fix_sync_read_error() is not supposed to be called here, and
better solution will be:

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d422bab77580..dafda9095c73 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2346,10 +2346,15 @@ static void sync_request_write(struct mddev 
*mddev, struct r1bio *r1_bio)
         int disks = conf->raid_disks * 2;
         struct bio *wbio;

-       if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
-               /* ouch - failed to read all of that. */
-               if (!fix_sync_read_error(r1_bio))
+       if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
+               /*
+                * ouch - failed to read all of that. No need to fix 
read error
+                * for check or repair because read all member disks failed.
+                */
+               if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) ||
+                   !fix_sync_read_error(r1_bio))
                         return;
+       }

         if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
                 process_checks(r1_bio);

Thanks,
Kuai

> +
>   	r1_bio->read_disk = primary;
>   	for (i = 0; i < conf->raid_disks * 2; i++) {
>   		int j = 0;
>
Yu Kuai April 8, 2025, 8:32 a.m. UTC | #2
Hi,

在 2025/04/08 16:28, Yu Kuai 写道:
> Hi,
> 
> 在 2025/04/08 15:49, Meir Elisha 写道:
>> During recovery/check operations, the process_checks function loops
>> through available disks to find a 'primary' source with successfully
>> read data.
>>
>> If no suitable source disk is found after checking all possibilities,
>> the 'primary' index will reach conf->raid_disks * 2. Add an explicit
>> check for this condition after the loop. If no source disk was found,
>> print an error message and return early to prevent further processing
>> without a valid primary source.
>>
>> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
>> ---
>> This was observed when forcefully disconnecting all iSCSI devices backing
>> a RAID1 array(using --failfast flag) during a check operation, causing
>> all reads within process_checks to fail. The resulting kernel oops shows:
>>
>>    BUG: kernel NULL pointer dereference, address: 0000000000000040
>>    RIP: 0010:process_checks+0x25e/0x5e0 [raid1]
>>    Code: ... <4c> 8b 53 40 ... // mov r10,[rbx+0x40]
>>    Call Trace:
>>     process_checks
>>     sync_request_write
>>     raid1d
>>     md_thread
>>     kthread
>>     ret_from_fork
>>
>>   drivers/md/raid1.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 0efc03cea24e..b6a52c137f53 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -2296,6 +2296,12 @@ static void process_checks(struct r1bio *r1_bio)
>>               rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
>>               break;
>>           }
>> +
>> +    if (primary >= conf->raid_disks * 2) {
>> +        pr_err_ratelimited("md/raid1:%s: unable to find source 
>> disk\n", mdname(mddev));
>> +        return;
>> +    }
> 
> This means all reads failed, then I'm a bit confused why
> sync_request_write() doesn't return early?
> 
> end_sync_read
>   if (!bio->bi_status)
>    // uptodate is not set
>    set_bit(R1BIO_Uptodate, &r1_bio->state);
> 
> sync_request_write
>   if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
>    if (!fix_sync_read_error(r1_bio))
>     // why not return here?
>     return;
> 
>   process_checks(r1_bio);
> 
> The answer is that fix_sync_read_error() is used for the case just one
> rdev is read, it just handle the bio from r1_bio->read_disk, and suppose 
> that just one rdev is read. And if rdev_set_badblocks() succeed,
> fix_sync_read_error() will clear R1BIO_Uptodate and bi_status and
> finally cause this problem.
> 
> While in this case, all member disks are read for check/repair, hence I
> think fix_sync_read_error() is not supposed to be called here, and
> better solution will be:
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d422bab77580..dafda9095c73 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2346,10 +2346,15 @@ static void sync_request_write(struct mddev 
> *mddev, struct r1bio *r1_bio)
>          int disks = conf->raid_disks * 2;
>          struct bio *wbio;
> 
> -       if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> -               /* ouch - failed to read all of that. */
> -               if (!fix_sync_read_error(r1_bio))
> +       if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
> +               /*
> +                * ouch - failed to read all of that. No need to fix 
> read error
> +                * for check or repair because read all member disks 
> failed.
> +                */
> +               if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) ||
> +                   !fix_sync_read_error(r1_bio))
>                          return;
> +       }
> 
>          if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
>                  process_checks(r1_bio);
Sorry that I forgot to release the r1bio, please check the following
patch:

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d422bab77580..1a04dc91a8aa 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2200,14 +2200,9 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
                                 if (!rdev_set_badblocks(rdev, sect, s, 0))
                                         abort = 1;
                         }
-                       if (abort) {
-                               conf->recovery_disabled =
-                                       mddev->recovery_disabled;
-                               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-                               md_done_sync(mddev, r1_bio->sectors, 0);
-                               put_buf(r1_bio);
+                       if (abort)
                                 return 0;
-                       }
+
                         /* Try next page */
                         sectors -= s;
                         sect += s;
@@ -2346,10 +2341,20 @@ static void sync_request_write(struct mddev 
*mddev, struct r1bio *r1_bio)
         int disks = conf->raid_disks * 2;
         struct bio *wbio;

-       if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
-               /* ouch - failed to read all of that. */
-               if (!fix_sync_read_error(r1_bio))
+       if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
+               /*
+                * ouch - failed to read all of that. No need to fix 
read error
+                * for check or repair because all member disks are read.
+                */
+               if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) ||
+                   !fix_sync_read_error(r1_bio)) {
+                       conf->recovery_disabled = mddev->recovery_disabled;
+                       set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+                       md_done_sync(mddev, r1_bio->sectors, 0);
+                       put_buf(r1_bio);
                         return;
+               }
+       }

         if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
                 process_checks(r1_bio);

Thanks,
Kuai
> 
> Thanks,
> Kuai
> 
>> +
>>       r1_bio->read_disk = primary;
>>       for (i = 0; i < conf->raid_disks * 2; i++) {
>>           int j = 0;
>>
> 
> .
>
diff mbox series

Patch

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0efc03cea24e..b6a52c137f53 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2296,6 +2296,12 @@  static void process_checks(struct r1bio *r1_bio)
 			rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
 			break;
 		}
+
+	if (primary >= conf->raid_disks * 2) {
+		pr_err_ratelimited("md/raid1:%s: unable to find source disk\n", mdname(mddev));
+		return;
+	}
+
 	r1_bio->read_disk = primary;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		int j = 0;