
[v3,3/3] md/raid1: check array size before reshape

Message ID 20230719070954.3084379-4-xueshi.hu@smartx.com (mailing list archive)
State Superseded, archived
Series don't change mempool if in-flight r1bio exists

Commit Message

Xueshi Hu July 19, 2023, 7:09 a.m. UTC
If array size doesn't changed, nothing need to do.

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 drivers/md/raid1.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Paul Menzel July 19, 2023, 7:38 a.m. UTC | #1
Dear Xueshi,


Thank you for your patches.

On 19.07.23 at 09:09, Xueshi Hu wrote:
> If array size doesn't changed, nothing need to do.

Maybe: … nothing needs to be done.

Do you have a test case to reproduce it?


Kind regards,

Paul


> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
>   drivers/md/raid1.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 62e86b7d1561..5840b8b0f9b7 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
>   	int d, d2;
>   	int ret;
>   
> -	memset(&newpool, 0, sizeof(newpool));
> -	memset(&oldpool, 0, sizeof(oldpool));
> -
>   	/* Cannot change chunk_size, layout, or level */
>   	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
>   	    mddev->layout != mddev->new_layout ||
> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>   		return -EINVAL;
>   	}
>   
> +	if (mddev->delta_disks == 0)
> +		return 0; /* nothing to do */
> +
> +	memset(&newpool, 0, sizeof(newpool));
> +	memset(&oldpool, 0, sizeof(oldpool));
> +
>   	if (!mddev_is_clustered(mddev))
>   		md_allow_write(mddev);
>
Xueshi Hu July 19, 2023, 11:51 a.m. UTC | #2
On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> Dear Xueshi,
> 
> 
> Thank you for your patches.
> 
> Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> > If array size doesn't changed, nothing need to do.
> 
> Maybe: … nothing needs to be done.
I was in too much of a hurry. I'll be more careful next time and check the
sentences with tools.
> 
> Do you have a test case to reproduce it?
> 
Userspace command:

	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks

Kernel function calling flow:

md_attr_store()
	raid_disks_store()
		update_raid_disks()
			raid1_reshape()

Maybe I should provide more information when submitting patches; thank you for
reminding me.
> 
> Kind regards,
> 
> Paul
> 
> 
> > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > ---
> >   drivers/md/raid1.c | 9 ++++++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 62e86b7d1561..5840b8b0f9b7 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> >   	int d, d2;
> >   	int ret;
> > -	memset(&newpool, 0, sizeof(newpool));
> > -	memset(&oldpool, 0, sizeof(oldpool));
> > -
> >   	/* Cannot change chunk_size, layout, or level */
> >   	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> >   	    mddev->layout != mddev->new_layout ||
> > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> >   		return -EINVAL;
> >   	}
> > +	if (mddev->delta_disks == 0)
> > +		return 0; /* nothing to do */
> > +
> > +	memset(&newpool, 0, sizeof(newpool));
> > +	memset(&oldpool, 0, sizeof(oldpool));
> > +
> >   	if (!mddev_is_clustered(mddev))
> >   		md_allow_write(mddev);
Yu Kuai July 20, 2023, 1:28 a.m. UTC | #3
Hi,

On 2023/07/19 19:51, Xueshi Hu wrote:
> On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
>> Dear Xueshi,
>>
>>
>> Thank you for your patches.
>>
>> Am 19.07.23 um 09:09 schrieb Xueshi Hu:
>>> If array size doesn't changed, nothing need to do.
>>
>> Maybe: … nothing needs to be done.
> What a hurry, I'll be cautious next time and check the sentences with
> tools.
>>
>> Do you have a test case to reproduce it?
>>
> Userspace command:
> 
> 	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
> 
> Kernel function calling flow:
> 
> md_attr_store()
> 	raid_disks_store()
> 		update_raid_disks()
> 			raid1_reshape()
> 
> Maybe I shall provide more information when submit patches, thank you for
> reminding me.
>>
>> Kind regards,
>>
>> Paul
>>
>>
>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>> ---
>>>    drivers/md/raid1.c | 9 ++++++---
>>>    1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 62e86b7d1561..5840b8b0f9b7 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
>>>    	int d, d2;
>>>    	int ret;
>>> -	memset(&newpool, 0, sizeof(newpool));
>>> -	memset(&oldpool, 0, sizeof(oldpool));
>>> -
>>>    	/* Cannot change chunk_size, layout, or level */
>>>    	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
>>>    	    mddev->layout != mddev->new_layout ||
>>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>>>    		return -EINVAL;
>>>    	}
>>> +	if (mddev->delta_disks == 0)
>>> +		return 0; /* nothing to do */

I think this is wrong, you should at least keep following:

         set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
         set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
         md_wakeup_thread(mddev->thread);

Thanks,
Kuai

>>> +
>>> +	memset(&newpool, 0, sizeof(newpool));
>>> +	memset(&oldpool, 0, sizeof(oldpool));
>>> +
>>>    	if (!mddev_is_clustered(mddev))
>>>    		md_allow_write(mddev);
> .
>
Xueshi Hu July 28, 2023, 2:42 p.m. UTC | #4
On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2023/07/19 19:51, Xueshi Hu 写道:
> > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> > > Dear Xueshi,
> > > 
> > > 
> > > Thank you for your patches.
> > > 
> > > Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> > > > If array size doesn't changed, nothing need to do.
> > > 
> > > Maybe: … nothing needs to be done.
> > What a hurry, I'll be cautious next time and check the sentences with
> > tools.
> > > 
> > > Do you have a test case to reproduce it?
> > > 
> > Userspace command:
> > 
> > 	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
> > 
> > Kernel function calling flow:
> > 
> > md_attr_store()
> > 	raid_disks_store()
> > 		update_raid_disks()
> > 			raid1_reshape()
> > 
> > Maybe I shall provide more information when submit patches, thank you for
> > reminding me.
> > > 
> > > Kind regards,
> > > 
> > > Paul
> > > 
> > > 
> > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > > ---
> > > >    drivers/md/raid1.c | 9 ++++++---
> > > >    1 file changed, 6 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > > index 62e86b7d1561..5840b8b0f9b7 100644
> > > > --- a/drivers/md/raid1.c
> > > > +++ b/drivers/md/raid1.c
> > > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> > > >    	int d, d2;
> > > >    	int ret;
> > > > -	memset(&newpool, 0, sizeof(newpool));
> > > > -	memset(&oldpool, 0, sizeof(oldpool));
> > > > -
> > > >    	/* Cannot change chunk_size, layout, or level */
> > > >    	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> > > >    	    mddev->layout != mddev->new_layout ||
> > > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> > > >    		return -EINVAL;
> > > >    	}
> > > > +	if (mddev->delta_disks == 0)
> > > > +		return 0; /* nothing to do */
> 
> I think this is wrong, you should at least keep following:
> 
>         set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>         set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>         md_wakeup_thread(mddev->thread);
> 
I fail to comprehend the rationale behind the kernel's need to invoke
raid1d() repeatedly despite the absence of any modifications.
It appears that raid1d() is responsible for invoking md_check_recovery()
and resubmitting failed I/O.

Evidently, it is not within the scope of raid1_reshape() to resubmit
failed I/O.

Moreover, upon reviewing the Documentation[1], I could not find any
indications that reshape must undertake the actions as specified in
md_check_recovery()'s documentation, such as eliminating faulty devices.

[1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
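
For completeness, raid1d() looks roughly like this to me (heavily
abridged and paraphrased from drivers/md/raid1.c; locking, plugging,
barriers and the write-error paths are elided, so this is not verbatim):

	static void raid1d(struct md_thread *thread)
	{
		struct mddev *mddev = thread->mddev;
		struct r1conf *conf = mddev->private;
		struct r1bio *r1_bio;

		md_check_recovery(mddev);	/* generic md housekeeping */

		/* ... flush conf->bio_end_io_list once the superblock is stable ... */

		for (;;) {
			flush_pending_writes(conf);

			/* pop the next r1bio from conf->retry_list (locking elided) */
			if (list_empty(&conf->retry_list))
				break;
			r1_bio = list_entry(conf->retry_list.prev,
					    struct r1bio, retry_list);
			list_del(&r1_bio->retry_list);

			if (test_bit(R1BIO_IsSync, &r1_bio->state))
				sync_request_write(mddev, r1_bio);
			else if (test_bit(R1BIO_ReadError, &r1_bio->state))
				handle_read_error(conf, r1_bio);
			/* ... MadeGood/WriteError handling elided ... */
		}
	}

In other words, apart from calling md_check_recovery(), raid1d() mainly
dispatches queued retries; it does not make reshape decisions itself.
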
> Thanks,
> Kuai
> 
> > > > +
> > > > +	memset(&newpool, 0, sizeof(newpool));
> > > > +	memset(&oldpool, 0, sizeof(oldpool));
> > > > +
> > > >    	if (!mddev_is_clustered(mddev))
> > > >    		md_allow_write(mddev);
> > .
> > 
> 
Thanks,
Hu
Yu Kuai July 29, 2023, 12:58 a.m. UTC | #5
Hi,

On 2023/07/28 22:42, Xueshi Hu wrote:
> On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/07/19 19:51, Xueshi Hu 写道:
>>> On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
>>>> Dear Xueshi,
>>>>
>>>>
>>>> Thank you for your patches.
>>>>
>>>> Am 19.07.23 um 09:09 schrieb Xueshi Hu:
>>>>> If array size doesn't changed, nothing need to do.
>>>>
>>>> Maybe: … nothing needs to be done.
>>> What a hurry, I'll be cautious next time and check the sentences with
>>> tools.
>>>>
>>>> Do you have a test case to reproduce it?
>>>>
>>> Userspace command:
>>>
>>> 	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
>>>
>>> Kernel function calling flow:
>>>
>>> md_attr_store()
>>> 	raid_disks_store()
>>> 		update_raid_disks()
>>> 			raid1_reshape()
>>>
>>> Maybe I shall provide more information when submit patches, thank you for
>>> reminding me.
>>>>
>>>> Kind regards,
>>>>
>>>> Paul
>>>>
>>>>
>>>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>>>> ---
>>>>>     drivers/md/raid1.c | 9 ++++++---
>>>>>     1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>>> index 62e86b7d1561..5840b8b0f9b7 100644
>>>>> --- a/drivers/md/raid1.c
>>>>> +++ b/drivers/md/raid1.c
>>>>> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
>>>>>     	int d, d2;
>>>>>     	int ret;
>>>>> -	memset(&newpool, 0, sizeof(newpool));
>>>>> -	memset(&oldpool, 0, sizeof(oldpool));
>>>>> -
>>>>>     	/* Cannot change chunk_size, layout, or level */
>>>>>     	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
>>>>>     	    mddev->layout != mddev->new_layout ||
>>>>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>>>>>     		return -EINVAL;
>>>>>     	}
>>>>> +	if (mddev->delta_disks == 0)
>>>>> +		return 0; /* nothing to do */
>>
>> I think this is wrong, you should at least keep following:
>>
>>          set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>          set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>          md_wakeup_thread(mddev->thread);
>>
> I fail to comprehend the rationale behind the kernel's need to invoke
> raid1d() repeatedly despite the absence of any modifications.
> It appears that raid1d() is responsible for invoking md_check_recovery()
> and resubmit the failed io.

No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
so that md_check_recovery() can start to reshape.

Thanks,
Kuai
> 
> Evidently, it is not within the scope of raid1_reshape() to resubmit
> failed io.
> 
> Moreover, upon reviewing the Documentation[1], I could not find any
> indications that reshape must undertake the actions as specified in
> md_check_recovery()'s documentation, such as eliminating faulty devices.
> 
> [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
>> Thanks,
>> Kuai
>>
>>>>> +
>>>>> +	memset(&newpool, 0, sizeof(newpool));
>>>>> +	memset(&oldpool, 0, sizeof(oldpool));
>>>>> +
>>>>>     	if (!mddev_is_clustered(mddev))
>>>>>     		md_allow_write(mddev);
>>> .
>>>
>>
> Thanks,
> Hu
> .
>
Xueshi Hu July 29, 2023, 3:29 a.m. UTC | #6
On Sat, Jul 29, 2023 at 08:58:45AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2023/07/28 22:42, Xueshi Hu 写道:
> > On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > 在 2023/07/19 19:51, Xueshi Hu 写道:
> > > > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> > > > > Dear Xueshi,
> > > > > 
> > > > > 
> > > > > Thank you for your patches.
> > > > > 
> > > > > Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> > > > > > If array size doesn't changed, nothing need to do.
> > > > > 
> > > > > Maybe: … nothing needs to be done.
> > > > What a hurry, I'll be cautious next time and check the sentences with
> > > > tools.
> > > > > 
> > > > > Do you have a test case to reproduce it?
> > > > > 
> > > > Userspace command:
> > > > 
> > > > 	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
> > > > 
> > > > Kernel function calling flow:
> > > > 
> > > > md_attr_store()
> > > > 	raid_disks_store()
> > > > 		update_raid_disks()
> > > > 			raid1_reshape()
> > > > 
> > > > Maybe I shall provide more information when submit patches, thank you for
> > > > reminding me.
> > > > > 
> > > > > Kind regards,
> > > > > 
> > > > > Paul
> > > > > 
> > > > > 
> > > > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > > > > ---
> > > > > >     drivers/md/raid1.c | 9 ++++++---
> > > > > >     1 file changed, 6 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > > > > index 62e86b7d1561..5840b8b0f9b7 100644
> > > > > > --- a/drivers/md/raid1.c
> > > > > > +++ b/drivers/md/raid1.c
> > > > > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> > > > > >     	int d, d2;
> > > > > >     	int ret;
> > > > > > -	memset(&newpool, 0, sizeof(newpool));
> > > > > > -	memset(&oldpool, 0, sizeof(oldpool));
> > > > > > -
> > > > > >     	/* Cannot change chunk_size, layout, or level */
> > > > > >     	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> > > > > >     	    mddev->layout != mddev->new_layout ||
> > > > > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> > > > > >     		return -EINVAL;
> > > > > >     	}
> > > > > > +	if (mddev->delta_disks == 0)
> > > > > > +		return 0; /* nothing to do */
> > > 
> > > I think this is wrong, you should at least keep following:
> > > 
> > >          set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > >          set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > >          md_wakeup_thread(mddev->thread);
> > > 
> > I fail to comprehend the rationale behind the kernel's need to invoke
> > raid1d() repeatedly despite the absence of any modifications.
> > It appears that raid1d() is responsible for invoking md_check_recovery()
> > and resubmit the failed io.
> 
> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> so that md_check_recovery() can start to reshape.
I apologize, but I am still unable to comprehend your idea.
If mddev->delta_disks == 0, what is the purpose of invoking
md_check_recovery() again?
> 
> Thanks,
> Kuai
> > 
> > Evidently, it is not within the scope of raid1_reshape() to resubmit
> > failed io.
> > 
> > Moreover, upon reviewing the Documentation[1], I could not find any
> > indications that reshape must undertake the actions as specified in
> > md_check_recovery()'s documentation, such as eliminating faulty devices.
> > 
> > [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
> > > Thanks,
> > > Kuai
> > > 
> > > > > > +
> > > > > > +	memset(&newpool, 0, sizeof(newpool));
> > > > > > +	memset(&oldpool, 0, sizeof(oldpool));
> > > > > > +
> > > > > >     	if (!mddev_is_clustered(mddev))
> > > > > >     		md_allow_write(mddev);
> > > > .
> > > > 
> > > 
> > Thanks,
> > Hu
> > .
> > 
>
Yu Kuai July 29, 2023, 3:36 a.m. UTC | #7
Hi,

On 2023/07/29 11:29, Xueshi Hu wrote:
>>>> I think this is wrong, you should at least keep following:
>>>>
>>>>           set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>           set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>           md_wakeup_thread(mddev->thread);
>>>>
>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>> raid1d() repeatedly despite the absence of any modifications.
>>> It appears that raid1d() is responsible for invoking md_check_recovery()
>>> and resubmit the failed io.
>>
>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>> so that md_check_recovery() can start to reshape.
> I apologize, but I am still unable to comprehend your idea.
> If mmdev->delta_disks == 0 , what is the purpose of invoking
> md_check_recovery() again?

It sounds like you think raid1_reshape() can only be called from
md_check_recovery(); please take a look at the other callers.

Thanks,
Kuai
Yu Kuai July 29, 2023, 3:51 a.m. UTC | #8
Hi,

On 2023/07/29 11:36, Yu Kuai wrote:
> Hi,
> 
> 在 2023/07/29 11:29, Xueshi Hu 写道:
>>>>> I think this is wrong, you should at least keep following:
>>>>>
>>>>>           set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>>           set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>>           md_wakeup_thread(mddev->thread);
>>>>>
>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>> raid1d() repeatedly despite the absence of any modifications.
>>>> It appears that raid1d() is responsible for invoking 
>>>> md_check_recovery()
>>>> and resubmit the failed io.
>>>
>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>> so that md_check_recovery() can start to reshape.
>> I apologize, but I am still unable to comprehend your idea.
>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>> md_check_recovery() again?
> 
> Sounds like you think raid1_reshape() can only be called from
> md_check_recovery(), please take a look at other callers.

And even if raid1_reshape() is called from md_check_recovery(),
which means a reshape was interrupted, then MD_RECOVERY_RECOVER
and MD_RECOVERY_NEEDED need to be set, so that the reshape can
continue.

Thanks,
Kuai
> 
> Thanks,
> Kuai
> 
> .
>
Xueshi Hu July 29, 2023, 6:16 a.m. UTC | #9
On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2023/07/29 11:36, Yu Kuai 写道:
> > Hi,
> > 
> > 在 2023/07/29 11:29, Xueshi Hu 写道:
> > > > > > I think this is wrong, you should at least keep following:
> > > > > > 
> > > > > >           set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > >           set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > >           md_wakeup_thread(mddev->thread);
> > > > > > 
> > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > It appears that raid1d() is responsible for invoking
> > > > > md_check_recovery()
> > > > > and resubmit the failed io.
> > > > 
> > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > so that md_check_recovery() can start to reshape.
> > > I apologize, but I am still unable to comprehend your idea.
> > > If mmdev->delta_disks == 0 , what is the purpose of invoking
> > > md_check_recovery() again?
> > 
> > Sounds like you think raid1_reshape() can only be called from
> > md_check_recovery(), please take a look at other callers.
Thank you for the quick reply and patience.

Of course, I have checked all the callers of md_personality::check_reshape.

- layout_store
- action_store
- chunk_size_store
- md_ioctl
	__md_set_array_info 
		update_array_info 
- md_check_recovery
- md_ioctl
	__md_set_array_info
		update_array_info
			update_raid_disks
- process_metadata_update
	md_reload_sb
		check_sb_changes
			update_raid_disks
- raid_disks_store
	update_raid_disks

There are three categories of callers other than md_check_recovery():
1. writes to sysfs
2. ioctl
3. receiving instructions from an md cluster peer
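
For the first category, the sysfs path is roughly the following
(abridged from drivers/md/md.c; the inactive-array and error handling
branches are elided, so this is a sketch, not verbatim code):

	static ssize_t
	raid_disks_store(struct mddev *mddev, const char *buf, size_t len)
	{
		unsigned int n;
		int err;

		err = kstrtouint(buf, 10, &n);
		if (err < 0)
			return err;

		err = mddev_lock(mddev);
		if (err)
			return err;

		if (mddev->pers)	/* active array: ask the personality */
			err = update_raid_disks(mddev, n);
		/* ... inactive-array handling elided ... */

		mddev_unlock(mddev);
		return err ? err : len;
	}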

Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
example, if the mddev::raid_disks is already 4, I don't think 
raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
bit, then wake up the mddev::thread to call md_check_recovery().

Is there any counterexample to demonstrate the issues that may arise if
md_check_recovery() is not called out of raid1_reshape() ?

> 
> And even if raid1_reshape() is called from md_check_recovery(),
> which means reshape is interupted, then MD_RECOVERY_RECOVER
> and MD_RECOVERY_RECOVER need to be set, so that reshape can
> continue.

raid1 only registers md_personality::check_reshape; all the work related
to reshape is handled in md_personality::check_reshape.
What does "reshape can continue" mean? In other words, what additional
work needs to be done if mddev::raid_disks doesn't change?

Thanks,
Hu

> 
> Thanks,
> Kuai
> > 
> > Thanks,
> > Kuai
> > 
> > .
> > 
>
Yu Kuai July 29, 2023, 7:37 a.m. UTC | #10
Hi,

On 2023/07/29 14:16, Xueshi Hu wrote:
> On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/07/29 11:36, Yu Kuai 写道:
>>> Hi,
>>>
>>> 在 2023/07/29 11:29, Xueshi Hu 写道:
>>>>>>> I think this is wrong, you should at least keep following:
>>>>>>>
>>>>>>>            set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>>>>            set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>>>>            md_wakeup_thread(mddev->thread);
>>>>>>>
>>>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>>>> raid1d() repeatedly despite the absence of any modifications.
>>>>>> It appears that raid1d() is responsible for invoking
>>>>>> md_check_recovery()
>>>>>> and resubmit the failed io.
>>>>>
>>>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>>>> so that md_check_recovery() can start to reshape.
>>>> I apologize, but I am still unable to comprehend your idea.
>>>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>>>> md_check_recovery() again?
>>>
>>> Sounds like you think raid1_reshape() can only be called from
>>> md_check_recovery(), please take a look at other callers.
> Thank you for the quick reply and patience.
> 
> Of course, I have checked all the caller of md_personality::check_reshape.
> 
> - layout_store
> - action_store
> - chunk_size_store
> - md_ioctl
> 	__md_set_array_info
> 		update_array_info
> - md_check_recovery
> - md_ioctl
> 	__md_set_array_info
> 		update_array_info
> 			update_raid_disks
> - process_metadata_update
> 	md_reload_sb
> 		check_sb_changes
> 			update_raid_disks
> - raid_disks_store
> 	update_raid_disks
> 
> There are three categories of callers except md_check_recovery().
> 1. write to sysfs
> 2. ioctl
> 3. revice instructions from md cluster peer
> 
> Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> example, if the mddev::raid_disks is already 4, I don't think
> raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> bit, then wake up the mddev::thread to call md_check_recovery().
> 
> Is there any counterexample to demonstrate the issues that may arise if
> md_check_recovery() is not called out of raid1_reshape() ?
> 
>>
>> And even if raid1_reshape() is called from md_check_recovery(),
>> which means reshape is interupted, then MD_RECOVERY_RECOVER
>> and MD_RECOVERY_RECOVER need to be set, so that reshape can
>> continue.
> 
> raid1 only register md_personality::check_reshape, all the work related
> with reshape are handle in md_personality::check_reshape.
> What's the meaning of "reshape can continue" ? In another word, what's
> the additional work need to do if mddev::raid_disks doesn't change ?

Okay, I missed that raid1 doesn't have 'start_reshape'; reshape here
really means recovery, synchronizing data to new disks. Never mind what
I said about "reshape can continue", it's not possible for raid1.

Then the problem is the same from the 'recovery' point of view:
if the 'recovery' is interrupted, then before this patch, even if raid_disks
is the same, raid1_reshape() will still set the flags and then
md_check_recovery() will try to continue the recovery. I'd like to
keep this behaviour until it can be confirmed that no user will be
affected.

Thanks,
Kuai

> 
> Thanks,
> Hu
> 
>>
>> Thanks,
>> Kuai
>>>
>>> Thanks,
>>> Kuai
>>>
>>> .
>>>
>>
> .
>
Xueshi Hu July 29, 2023, 12:23 p.m. UTC | #11
On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2023/07/29 14:16, Xueshi Hu 写道:
> > On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > 在 2023/07/29 11:36, Yu Kuai 写道:
> > > > Hi,
> > > > 
> > > > 在 2023/07/29 11:29, Xueshi Hu 写道:
> > > > > > > > I think this is wrong, you should at least keep following:
> > > > > > > > 
> > > > > > > >            set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > > > >            set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > > > >            md_wakeup_thread(mddev->thread);
> > > > > > > > 
> > > > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > > > It appears that raid1d() is responsible for invoking
> > > > > > > md_check_recovery()
> > > > > > > and resubmit the failed io.
> > > > > > 
> > > > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > > > so that md_check_recovery() can start to reshape.
> > > > > I apologize, but I am still unable to comprehend your idea.
> > > > > If mmdev->delta_disks == 0 , what is the purpose of invoking
> > > > > md_check_recovery() again?
> > > > 
> > > > Sounds like you think raid1_reshape() can only be called from
> > > > md_check_recovery(), please take a look at other callers.
> > Thank you for the quick reply and patience.
> > 
> > Of course, I have checked all the caller of md_personality::check_reshape.
> > 
> > - layout_store
> > - action_store
> > - chunk_size_store
> > - md_ioctl
> > 	__md_set_array_info
> > 		update_array_info
> > - md_check_recovery
> > - md_ioctl
> > 	__md_set_array_info
> > 		update_array_info
> > 			update_raid_disks
> > - process_metadata_update
> > 	md_reload_sb
> > 		check_sb_changes
> > 			update_raid_disks
> > - raid_disks_store
> > 	update_raid_disks
> > 
> > There are three categories of callers except md_check_recovery().
> > 1. write to sysfs
> > 2. ioctl
> > 3. revice instructions from md cluster peer
> > 
> > Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> > example, if the mddev::raid_disks is already 4, I don't think
> > raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> > bit, then wake up the mddev::thread to call md_check_recovery().
> > 
> > Is there any counterexample to demonstrate the issues that may arise if
> > md_check_recovery() is not called out of raid1_reshape() ?
> > 
> > > 
> > > And even if raid1_reshape() is called from md_check_recovery(),
> > > which means reshape is interupted, then MD_RECOVERY_RECOVER
> > > and MD_RECOVERY_RECOVER need to be set, so that reshape can
> > > continue.
> > 
> > raid1 only register md_personality::check_reshape, all the work related
> > with reshape are handle in md_personality::check_reshape.
> > What's the meaning of "reshape can continue" ? In another word, what's
> > the additional work need to do if mddev::raid_disks doesn't change ?
> 
> Okay, I missed that raid1 doesn't have 'start_reshape', reshape here
> really means recovery, synchronize data to new disks. Never mind what
> I said "reshape can continue", it's not possible for raid1.
> 
> Then the problem is the same from 'recovery' point of view:
> if the 'recovery' is interrupted, before this patch, even if raid_disks
> is the same, raid1_reshape() will still set the flag and then
> md_check_recovery() will try to continue the recovery. I'd like to
> keep this behaviour untill it can be sure that no user will be
> affected.

But md_check_recovery() will never call raid1_reshape().

md_personality::check_reshape() is called from md_check_recovery() only when
a reshape is in progress. But raid1 is special, as
mddev::reshape_position always equals MaxSector in raid1.
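
The guard I am referring to looks roughly like this (a paraphrased
fragment of md_check_recovery() in drivers/md/md.c; the surrounding
logic and the remaining branches are elided, so not verbatim):

	if (mddev->reshape_position != MaxSector) {
		/* an interrupted reshape: ask the personality to continue it */
		if (mddev->pers->check_reshape == NULL ||
		    mddev->pers->check_reshape(mddev) != 0)
			/* Cannot proceed */
			goto not_running;
		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
		clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
	} else if ((spares = remove_and_add_spares(mddev, NULL))) {
		/* new or re-added devices: start a recovery instead */
		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
	}

Since raid1 never moves mddev::reshape_position away from MaxSector, the
check_reshape() branch above is unreachable for raid1.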

By the way, the concern about the V2 patch[1] is unnecessary for the same
reason.

[1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t

> 
> Thanks,
> Kuai
> 
> > 
> > Thanks,
> > Hu
> > 
> > > 
> > > Thanks,
> > > Kuai
> > > > 
> > > > Thanks,
> > > > Kuai
> > > > 
> > > > .
> > > > 
> > > 
> > .
> > 
>
Yu Kuai July 31, 2023, 1:03 a.m. UTC | #12
Hi,

On 2023/07/29 20:23, Xueshi Hu wrote:
> On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/07/29 14:16, Xueshi Hu 写道:
>>> On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> 在 2023/07/29 11:36, Yu Kuai 写道:
>>>>> Hi,
>>>>>
>>>>> 在 2023/07/29 11:29, Xueshi Hu 写道:
>>>>>>>>> I think this is wrong, you should at least keep following:
>>>>>>>>>
>>>>>>>>>             set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>>>>>>             set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>>>>>>             md_wakeup_thread(mddev->thread);
>>>>>>>>>
>>>>>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>>>>>> raid1d() repeatedly despite the absence of any modifications.
>>>>>>>> It appears that raid1d() is responsible for invoking
>>>>>>>> md_check_recovery()
>>>>>>>> and resubmit the failed io.
>>>>>>>
>>>>>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>>>>>> so that md_check_recovery() can start to reshape.
>>>>>> I apologize, but I am still unable to comprehend your idea.
>>>>>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>>>>>> md_check_recovery() again?
>>>>>
>>>>> Sounds like you think raid1_reshape() can only be called from
>>>>> md_check_recovery(), please take a look at other callers.
>>> Thank you for the quick reply and patience.
>>>
>>> Of course, I have checked all the caller of md_personality::check_reshape.
>>>
>>> - layout_store
>>> - action_store
>>> - chunk_size_store
>>> - md_ioctl
>>> 	__md_set_array_info
>>> 		update_array_info
>>> - md_check_recovery
>>> - md_ioctl
>>> 	__md_set_array_info
>>> 		update_array_info
>>> 			update_raid_disks
>>> - process_metadata_update
>>> 	md_reload_sb
>>> 		check_sb_changes
>>> 			update_raid_disks
>>> - raid_disks_store
>>> 	update_raid_disks
>>>
>>> There are three categories of callers except md_check_recovery().
>>> 1. write to sysfs
>>> 2. ioctl
>>> 3. revice instructions from md cluster peer
>>>
>>> Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
>>> example, if the mddev::raid_disks is already 4, I don't think
>>> raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
>>> bit, then wake up the mddev::thread to call md_check_recovery().
>>>
>>> Is there any counterexample to demonstrate the issues that may arise if
>>> md_check_recovery() is not called out of raid1_reshape() ?
>>>
>>>>
>>>> And even if raid1_reshape() is called from md_check_recovery(),
>>>> which means reshape is interupted, then MD_RECOVERY_RECOVER
>>>> and MD_RECOVERY_RECOVER need to be set, so that reshape can
>>>> continue.
>>>
>>> raid1 only register md_personality::check_reshape, all the work related
>>> with reshape are handle in md_personality::check_reshape.
>>> What's the meaning of "reshape can continue" ? In another word, what's
>>> the additional work need to do if mddev::raid_disks doesn't change ?
>>
>> Okay, I missed that raid1 doesn't have 'start_reshape', reshape here
>> really means recovery, synchronize data to new disks. Never mind what
>> I said "reshape can continue", it's not possible for raid1.
>>
>> Then the problem is the same from 'recovery' point of view:
>> if the 'recovery' is interrupted, before this patch, even if raid_disks
>> is the same, raid1_reshape() will still set the flag and then
>> md_check_recovery() will try to continue the recovery. I'd like to
>> keep this behaviour untill it can be sure that no user will be
>> affected.
> 
> But md_check_recovery() will never call raid1_reshape().
> 
> md_personality::check_reshape() is called in md_check_recovery() when
> the reshape is in process. But, raid1 is speicial as
> mddev::reshape_position always equals to MaxSector in raid1.
> 
> By the way, the concern in V2 patch[1] is unnecessary out of the same
> reason.
> 

Well... I just said that "reshape can continue" is not possible for raid1, and
this patch will cause recovery to not continue in some cases.

Thanks,
Kuai

> [1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t
> 
>>
>> Thanks,
>> Kuai
>>
>>>
>>> Thanks,
>>> Hu
>>>
>>>>
>>>> Thanks,
>>>> Kuai
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>
>>>>> .
>>>>>
>>>>
>>> .
>>>
>>
> .
>
Xueshi Hu July 31, 2023, 3:48 a.m. UTC | #13
On Mon, Jul 31, 2023 at 09:03:52AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2023/07/29 20:23, Xueshi Hu 写道:
> > On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > 在 2023/07/29 14:16, Xueshi Hu 写道:
> > > > On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> > > > > Hi,
> > > > > 
> > > > > 在 2023/07/29 11:36, Yu Kuai 写道:
> > > > > > Hi,
> > > > > > 
> > > > > > 在 2023/07/29 11:29, Xueshi Hu 写道:
> > > > > > > > > > I think this is wrong, you should at least keep following:
> > > > > > > > > > 
> > > > > > > > > >             set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > > > > > >             set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > > > > > >             md_wakeup_thread(mddev->thread);
> > > > > > > > > > 
> > > > > > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > > > > > It appears that raid1d() is responsible for invoking
> > > > > > > > > md_check_recovery()
> > > > > > > > > and resubmit the failed io.
> > > > > > > > 
> > > > > > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > > > > > so that md_check_recovery() can start to reshape.
> > > > > > > I apologize, but I am still unable to comprehend your idea.
> > > > > > > If mmdev->delta_disks == 0 , what is the purpose of invoking
> > > > > > > md_check_recovery() again?
> > > > > > 
> > > > > > Sounds like you think raid1_reshape() can only be called from
> > > > > > md_check_recovery(), please take a look at other callers.
> > > > Thank you for the quick reply and patience.
> > > > 
> > > > Of course, I have checked all the caller of md_personality::check_reshape.
> > > > 
> > > > - layout_store
> > > > - action_store
> > > > - chunk_size_store
> > > > - md_ioctl
> > > > 	__md_set_array_info
> > > > 		update_array_info
> > > > - md_check_recovery
> > > > - md_ioctl
> > > > 	__md_set_array_info
> > > > 		update_array_info
> > > > 			update_raid_disks
> > > > - process_metadata_update
> > > > 	md_reload_sb
> > > > 		check_sb_changes
> > > > 			update_raid_disks
> > > > - raid_disks_store
> > > > 	update_raid_disks
> > > > 
> > > > There are three categories of callers except md_check_recovery().
> > > > 1. write to sysfs
> > > > 2. ioctl
> > > > 3. revice instructions from md cluster peer
> > > > 
> > > > Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> > > > example, if the mddev::raid_disks is already 4, I don't think
> > > > raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> > > > bit, then wake up the mddev::thread to call md_check_recovery().
> > > > 
> > > > Is there any counterexample to demonstrate the issues that may arise if
> > > > md_check_recovery() is not called out of raid1_reshape() ?
> > > > 
> > > > > 
> > > > > And even if raid1_reshape() is called from md_check_recovery(),
> > > > > which means reshape is interupted, then MD_RECOVERY_RECOVER
> > > > > and MD_RECOVERY_RECOVER need to be set, so that reshape can
> > > > > continue.
> > > > 
> > > > raid1 only register md_personality::check_reshape, all the work related
> > > > with reshape are handle in md_personality::check_reshape.
> > > > What's the meaning of "reshape can continue" ? In another word, what's
> > > > the additional work need to do if mddev::raid_disks doesn't change ?
> > > 
> > > Okay, I missed that raid1 doesn't have 'start_reshape', reshape here
> > > really means recovery, synchronize data to new disks. Never mind what
> > > I said "reshape can continue", it's not possible for raid1.
> > > 
> > > Then the problem is the same from 'recovery' point of view:
> > > if the 'recovery' is interrupted, before this patch, even if raid_disks
> > > is the same, raid1_reshape() will still set the flag and then
> > > md_check_recovery() will try to continue the recovery. I'd like to
> > > keep this behaviour untill it can be sure that no user will be
> > > affected.
> > 
> > But md_check_recovery() will never call raid1_reshape().
> > 
> > md_personality::check_reshape() is called in md_check_recovery() when
> > the reshape is in process. But, raid1 is speicial as
> > mddev::reshape_position always equals to MaxSector in raid1.
> > 
> > By the way, the concern in V2 patch[1] is unnecessary out of the same
> > reason.
> > 
> 
> Well... I just said reshape can continue is not possible for raid1, and
> this patch will cause that recovery can't continue is some cases.
I see. I will reread the relevant code to gain a better understanding of
"some cases".

Thanks,
Hu
> 
> Thanks,
> Kuai
> 
> > [1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t
> > 
> > > 
> > > Thanks,
> > > Kuai
> > > 
> > > > 
> > > > Thanks,
> > > > Hu
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > Kuai
> > > > > > 
> > > > > > Thanks,
> > > > > > Kuai
> > > > > > 
> > > > > > .
> > > > > > 
> > > > > 
> > > > .
> > > > 
> > > 
> > .
> > 
>
Yu Kuai July 31, 2023, 6:22 a.m. UTC | #14
Hi,

On 2023/07/31 11:48, Xueshi Hu wrote:
>> Well... I just said reshape can continue is not possible for raid1, and
>> this patch will cause that recovery can't continue is some cases.

> I see. I will reread the relevant code to gain a better understanding of
> "some cases".

It's not that complex at all; the key point is whether your changes
introduce functional changes and, if so, whether the effect has been fully evaluated.

In this case, raid1_reshape() (from all callers) will set
MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED to indicate that the daemon
thread must check whether recovery is needed and, if so, start it.

I agree that recovery is probably not needed in this case, where
raid_disks stays the same. However, note that recovery can be interrupted
by the user, and raid1_reshape() can be triggered by the user as well, which
means the following procedure will end up behaving differently:

1. trigger a recovery;
2. interrupt the recovery;
3. trigger a raid1_reshape();

Before this patch, the interrupted recovery will continue;

This is minor, but I can't say that no user will be affected, and I
really prefer to keep this behaviour, which means this patch can just do
some cleanup without introducing functional changes.
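
If the goal is to keep the early return without the functional change, a
hypothetical shape (just a sketch, untested and not something that has
been posted) could be:

	if (mddev->delta_disks == 0) {
		/* No resize requested: skip the mempool churn, but still kick
		 * the daemon thread so an interrupted recovery can resume.
		 */
		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
		return 0;
	}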

Thanks,
Kuai
Xueshi Hu July 31, 2023, 2:12 p.m. UTC | #15
On Mon, Jul 31, 2023 at 02:22:31PM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2023/07/31 11:48, Xueshi Hu 写道:
> > > Well... I just said reshape can continue is not possible for raid1, and
> > > this patch will cause that recovery can't continue is some cases.
> 
> > I see. I will reread the relevant code to gain a better understanding of
> > "some cases".
> 
> It's not that complex at all, the key point is whether your changes
> introduce functional changes, if so, does the effect fully evaluated.
> 
> In this case, raid1_reshape(all the callers) will set
> MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED to indicate that daemon
> thread must check if recovery is needed, if so it will start recovery.
> 
> I agree that recovery is probably not needed in this case that
> raid_disks is the same, however, note that recovery can be interrupted
> by user, and raid1_reshape() can be triggered by user as well, this
> means following procedures will end up different:
> 
> 1. trigger a recovery;
> 2. interupt the recovery;
> 3. trigger a raid1_reshape();
Sincere gratitude for your detailed response. I have been consistently
misunderstanding the concept of "interrupted recovery." I now understand
your concerns.

Thanks,
Hu
> 
> Before this patch, the interupted recovery will continue;
> 
> This is minor, but I can't say that no user will be affected, and I
> really prefer to keep this behaviour, which means this patch can just do
> some cleanup without introducing functional changes.
> 
> Thanks,
> Kuai
>

Patch

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 62e86b7d1561..5840b8b0f9b7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3282,9 +3282,6 @@  static int raid1_reshape(struct mddev *mddev)
 	int d, d2;
 	int ret;
 
-	memset(&newpool, 0, sizeof(newpool));
-	memset(&oldpool, 0, sizeof(oldpool));
-
 	/* Cannot change chunk_size, layout, or level */
 	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
 	    mddev->layout != mddev->new_layout ||
@@ -3295,6 +3292,12 @@  static int raid1_reshape(struct mddev *mddev)
 		return -EINVAL;
 	}
 
+	if (mddev->delta_disks == 0)
+		return 0; /* nothing to do */
+
+	memset(&newpool, 0, sizeof(newpool));
+	memset(&oldpool, 0, sizeof(oldpool));
+
 	if (!mddev_is_clustered(mddev))
 		md_allow_write(mddev);
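
For context on why moving the memsets below the validation is safe: in
raid1.c of this era, newpool and oldpool are not touched again until the
new pool is actually set up further down in raid1_reshape(), roughly as
follows (paraphrased from memory; helper names and details may be
slightly off, so not verbatim):

	raid_disks = mddev->raid_disks + mddev->delta_disks;

	/* ... refuse to shrink below the number of populated slots ... */

	newpoolinfo = kmalloc(sizeof(*newpoolinfo), GFP_KERNEL);
	if (!newpoolinfo)
		return -ENOMEM;
	newpoolinfo->mddev = mddev;
	newpoolinfo->raid_disks = raid_disks * 2;

	/* first real use of newpool; everything above only validated inputs */
	ret = mempool_init(&newpool, NR_RAID_BIOS, r1bio_pool_alloc,
			   rbio_pool_free, newpoolinfo);
	if (ret) {
		kfree(newpoolinfo);
		return ret;
	}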