diff mbox series

[v1,14/15] md: Ensure resync is reported after it starts

Message ID 20220519191311.17119-15-logang@deltatee.com (mailing list archive)
State New, archived
Headers show
Series Bug fixes for mdadm tests | expand

Commit Message

Logan Gunthorpe May 19, 2022, 7:13 p.m. UTC
The 07layouts test in mdadm fails on some systems. The failure
presents itself as the backup file not being removed before the next
layout is grown into:

  mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
      File exists

This is because the background mdadm process, which is responsible for
cleaning up this backup file gets into an infinite loop waiting for
the reshape to start. mdadm checks the mdstat file if a reshape is
going and, if it is not, it waits for an event on the file or times
out in 5 seconds. On faster machines, the reshape may complete before
the 5 seconds times out, and thus the background mdadm process loops
waiting for a reshape to start that has already occurred.

mdadm reads the mdstat file to start, but mdstat does not report that the
reshape has begun, even though it has indeed begun. So the mdstat_wait()
call (in mdadm) which polls on the mdstat file won't ever return until
timing out.

The reason mdstat reports the reshape has started is due to an issue
in status_resync(). recovery_active is subtracted from curr_resync which
will result in a value of zero for the first chunk of reshaped data, and
the resulting read will report no reshape in progress.

To fix this, if "resync - recovery_active" is zero: force the value to
be 4 so the code reports a resync in progress.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/md.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig May 21, 2022, 11:51 a.m. UTC | #1
On Thu, May 19, 2022 at 01:13:10PM -0600, Logan Gunthorpe wrote:
> The 07layouts test in mdadm fails on some systems. The failure
> presents itself as the backup file not being removed before the next
> layout is grown into:
> 
>   mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
>       File exists
> 
> This is because the background mdadm process, which is responsible for
> cleaning up this backup file gets into an infinite loop waiting for
> the reshape to start. mdadm checks the mdstat file if a reshape is
> going and, if it is not, it waits for an event on the file or times
> out in 5 seconds. On faster machines, the reshape may complete before
> the 5 seconds times out, and thus the background mdadm process loops
> waiting for a reshape to start that has already occurred.
> 
> mdadm reads the mdstat file to start, but mdstat does not report that the
> reshape has begun, even though it has indeed begun. So the mdstat_wait()
> call (in mdadm) which polls on the mdstat file won't ever return until
> timing out.
> 
> The reason mdstat reports the reshape has started is due to an issue
> in status_resync(). recovery_active is subtracted from curr_resync which
> will result in a value of zero for the first chunk of reshaped data, and
> the resulting read will report no reshape in progress.
> 
> To fix this, if "resync - recovery_active" is zero: force the value to
> be 4 so the code reports a resync in progress.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>  drivers/md/md.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8273ac5eef06..dbac63c8e35c 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8022,10 +8022,18 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
>  		if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>  			/* Still cleaning up */
>  			resync = max_sectors;
> -	} else if (resync > max_sectors)
> +	} else if (resync > max_sectors) {
>  		resync = max_sectors;
> -	else
> +	} else {
>  		resync -= atomic_read(&mddev->recovery_active);
> +		if (!resync) {
> +			/*
> +			 * Resync has started, but if it's zero, ensure
> +			 * it is still reported, by forcing it to be 4
> +			 */
> +			resync = 4;

Where does this magic 4 come from?
Logan Gunthorpe May 24, 2022, 3:45 p.m. UTC | #2
On 2022-05-21 05:51, Christoph Hellwig wrote:
> On Thu, May 19, 2022 at 01:13:10PM -0600, Logan Gunthorpe wrote:
>> The 07layouts test in mdadm fails on some systems. The failure
>> presents itself as the backup file not being removed before the next
>> layout is grown into:
>>
>>   mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
>>       File exists
>>
>> This is because the background mdadm process, which is responsible for
>> cleaning up this backup file gets into an infinite loop waiting for
>> the reshape to start. mdadm checks the mdstat file if a reshape is
>> going and, if it is not, it waits for an event on the file or times
>> out in 5 seconds. On faster machines, the reshape may complete before
>> the 5 seconds times out, and thus the background mdadm process loops
>> waiting for a reshape to start that has already occurred.
>>
>> mdadm reads the mdstat file to start, but mdstat does not report that the
>> reshape has begun, even though it has indeed begun. So the mdstat_wait()
>> call (in mdadm) which polls on the mdstat file won't ever return until
>> timing out.
>>
>> The reason mdstat reports the reshape has started is due to an issue
>> in status_resync(). recovery_active is subtracted from curr_resync which
>> will result in a value of zero for the first chunk of reshaped data, and
>> the resulting read will report no reshape in progress.
>>
>> To fix this, if "resync - recovery_active" is zero: force the value to
>> be 4 so the code reports a resync in progress.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>  drivers/md/md.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 8273ac5eef06..dbac63c8e35c 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8022,10 +8022,18 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
>>  		if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>>  			/* Still cleaning up */
>>  			resync = max_sectors;
>> -	} else if (resync > max_sectors)
>> +	} else if (resync > max_sectors) {
>>  		resync = max_sectors;
>> -	else
>> +	} else {
>>  		resync -= atomic_read(&mddev->recovery_active);
>> +		if (!resync) {
>> +			/*
>> +			 * Resync has started, but if it's zero, ensure
>> +			 * it is still reported, by forcing it to be 4
>> +			 */
>> +			resync = 4;
> 
> Where does this magic 4 come from?

There are a bunch of existing magic numbers in this code. Just before
this hunk there's a if (resync <= 3) and just after the code there's an
if (resync < 3).

There's a comment in md_do_sync() indicating overloaded values for 1 and
2. I can try and turn this into an enum for v2.

Logan
diff mbox series

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8273ac5eef06..dbac63c8e35c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8022,10 +8022,18 @@  static int status_resync(struct seq_file *seq, struct mddev *mddev)
 		if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
 			/* Still cleaning up */
 			resync = max_sectors;
-	} else if (resync > max_sectors)
+	} else if (resync > max_sectors) {
 		resync = max_sectors;
-	else
+	} else {
 		resync -= atomic_read(&mddev->recovery_active);
+		if (!resync) {
+			/*
+			 * Resync has started, but if it's zero, ensure
+			 * it is still reported, by forcing it to be 4
+			 */
+			resync = 4;
+		}
+	}
 
 	if (resync == 0) {
 		if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery)) {