| Message ID | 20230719070954.3084379-4-xueshi.hu@smartx.com (mailing list archive) |
|---|---|
| State | Superseded, archived |
| Series | don't change mempool if in-flight r1bio exists |
Dear Xueshi,

Thank you for your patches.

Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> If array size doesn't changed, nothing need to do.

Maybe: … nothing needs to be done.

Do you have a test case to reproduce it?

Kind regards,

Paul

> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> […]
On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> Dear Xueshi,
>
> Thank you for your patches.
>
> Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> > If array size doesn't changed, nothing need to do.
>
> Maybe: … nothing needs to be done.

I was in too much of a hurry; I'll be more careful next time and check the sentences with tools.

> Do you have a test case to reproduce it?

Userspace command:

    echo 4 > /sys/devices/virtual/block/md10/md/raid_disks

Kernel function calling flow:

    md_attr_store()
      raid_disks_store()
        update_raid_disks()
          raid1_reshape()

Maybe I should provide more information when submitting patches; thank you for reminding me.

> Kind regards,
>
> Paul
>
> […]
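A slightly fuller version of the reproducer above might look as follows; the array name and member devices are hypothetical, and only the final write is the command from the report above:

    # Create a 4-disk RAID1 array (hypothetical devices), then write the
    # current raid_disks value back; the write still enters raid1_reshape().
    mdadm --create /dev/md10 --run --level=1 --raid-devices=4 /dev/sd[b-e]
    echo 4 > /sys/devices/virtual/block/md10/md/raid_disks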
Hi,

在 2023/07/19 19:51, Xueshi Hu 写道:
> […]
>>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>>>  		return -EINVAL;
>>>  	}
>>> + if (mddev->delta_disks == 0)
>>> + return 0; /* nothing to do */

I think this is wrong; you should at least keep the following:

    set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
    set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
    md_wakeup_thread(mddev->thread);

Thanks,
Kuai
On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/19 19:51, Xueshi Hu 写道:
> > […]
>
> I think this is wrong; you should at least keep the following:
>
>     set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>     set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>     md_wakeup_thread(mddev->thread);

I fail to comprehend the rationale behind the kernel's need to invoke raid1d() repeatedly despite the absence of any modifications. It appears that raid1d() is responsible for invoking md_check_recovery() and resubmitting failed I/O.

Evidently, it is not within the scope of raid1_reshape() to resubmit failed I/O.

Moreover, upon reviewing the Documentation[1], I could not find any indication that reshape must undertake the actions specified in md_check_recovery()'s documentation, such as eliminating faulty devices.

[1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html

> Thanks,
> Kuai
>
> […]

Thanks,
Hu
Hi,

在 2023/07/28 22:42, Xueshi Hu 写道:
> On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
>> […]
>>
>> I think this is wrong; you should at least keep the following:
>>
>>     set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>     set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>     md_wakeup_thread(mddev->thread);
>>
> I fail to comprehend the rationale behind the kernel's need to invoke
> raid1d() repeatedly despite the absence of any modifications.
> It appears that raid1d() is responsible for invoking md_check_recovery()
> and resubmitting failed I/O.

No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER, so that md_check_recovery() can start to reshape.

Thanks,
Kuai

> […]
On Sat, Jul 29, 2023 at 08:58:45AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/28 22:42, Xueshi Hu 写道:
> > […]
>
> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> so that md_check_recovery() can start to reshape.

I apologize, but I am still unable to comprehend your idea. If mddev->delta_disks == 0, what is the purpose of invoking md_check_recovery() again?

> Thanks,
> Kuai
>
> […]
Hi,

在 2023/07/29 11:29, Xueshi Hu 写道:
> […]
>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>> so that md_check_recovery() can start to reshape.
> I apologize, but I am still unable to comprehend your idea.
> If mddev->delta_disks == 0, what is the purpose of invoking
> md_check_recovery() again?

Sounds like you think raid1_reshape() can only be called from md_check_recovery(); please take a look at the other callers.

Thanks,
Kuai
Hi,

在 2023/07/29 11:36, Yu Kuai 写道:
> Hi,
>
> 在 2023/07/29 11:29, Xueshi Hu 写道:
>> […]
>
> Sounds like you think raid1_reshape() can only be called from
> md_check_recovery(); please take a look at the other callers.

And even if raid1_reshape() is called from md_check_recovery(), which means a reshape was interrupted, then MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED need to be set so that the reshape can continue.

Thanks,
Kuai
On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/29 11:36, Yu Kuai 写道:
> > Hi,
> >
> > 在 2023/07/29 11:29, Xueshi Hu 写道:
> > > […]
> >
> > Sounds like you think raid1_reshape() can only be called from
> > md_check_recovery(); please take a look at the other callers.

Thank you for the quick reply and patience.

Of course, I have checked all the callers of md_personality::check_reshape:

- layout_store
- action_store
- chunk_size_store
- md_ioctl
    __md_set_array_info
      update_array_info
- md_check_recovery
- md_ioctl
    __md_set_array_info
      update_array_info
        update_raid_disks
- process_metadata_update
    md_reload_sb
      check_sb_changes
        update_raid_disks
- raid_disks_store
    update_raid_disks

There are three categories of callers besides md_check_recovery():
1. write to sysfs
2. ioctl
3. receive instructions from an md cluster peer

Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an example: if mddev::raid_disks is already 4, I don't think raid1_reshape() should set the MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED bits and then wake up mddev::thread to call md_check_recovery().

Is there any counterexample to demonstrate the issues that may arise if md_check_recovery() is not called from raid1_reshape()?

> And even if raid1_reshape() is called from md_check_recovery(),
> which means a reshape was interrupted, then MD_RECOVERY_RECOVER
> and MD_RECOVERY_NEEDED need to be set so that the reshape can
> continue.

raid1 only registers md_personality::check_reshape; all the work related to reshape is handled in md_personality::check_reshape. What is the meaning of "reshape can continue"? In other words, what additional work needs to be done if mddev::raid_disks doesn't change?

Thanks,
Hu

> Thanks,
> Kuai
>
> […]
Hi,

在 2023/07/29 14:16, Xueshi Hu 写道:
> […]
>
> raid1 only registers md_personality::check_reshape; all the work related
> to reshape is handled in md_personality::check_reshape.
> What is the meaning of "reshape can continue"? In other words, what
> additional work needs to be done if mddev::raid_disks doesn't change?

Okay, I missed that raid1 doesn't have 'start_reshape'; reshape here really means recovery, synchronizing data to new disks. Never mind what I said about "reshape can continue", it's not possible for raid1.

Then the problem is the same from the 'recovery' point of view: if the 'recovery' is interrupted, then before this patch, even if raid_disks is the same, raid1_reshape() will still set the flags and md_check_recovery() will try to continue the recovery. I'd like to keep this behaviour until we can be sure that no user will be affected.

Thanks,
Kuai

> Thanks,
> Hu
>
> […]
On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/29 14:16, Xueshi Hu 写道:
> > […]
>
> Okay, I missed that raid1 doesn't have 'start_reshape'; reshape here
> really means recovery, synchronizing data to new disks. Never mind what
> I said about "reshape can continue", it's not possible for raid1.
>
> Then the problem is the same from the 'recovery' point of view:
> if the 'recovery' is interrupted, then before this patch, even if
> raid_disks is the same, raid1_reshape() will still set the flags and
> md_check_recovery() will try to continue the recovery. I'd like to
> keep this behaviour until we can be sure that no user will be
> affected.

But md_check_recovery() will never call raid1_reshape().

md_personality::check_reshape() is called in md_check_recovery() when a reshape is in progress. But raid1 is special, as mddev::reshape_position always equals MaxSector in raid1.

By the way, the concern in the V2 patch[1] is unnecessary for the same reason.

[1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t

> Thanks,
> Kuai
>
> […]
Hi,

在 2023/07/29 20:23, Xueshi Hu 写道:
> On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
>> […]
>>
>> Then the problem is the same from the 'recovery' point of view:
>> if the 'recovery' is interrupted, then before this patch, even if
>> raid_disks is the same, raid1_reshape() will still set the flags and
>> md_check_recovery() will try to continue the recovery. I'd like to
>> keep this behaviour until we can be sure that no user will be
>> affected.
>
> But md_check_recovery() will never call raid1_reshape().
>
> md_personality::check_reshape() is called in md_check_recovery() when a
> reshape is in progress. But raid1 is special, as mddev::reshape_position
> always equals MaxSector in raid1.
>
> By the way, the concern in the V2 patch[1] is unnecessary for the same
> reason.
>
> [1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t

Well... I just said "reshape can continue" is not possible for raid1, and this patch will cause that recovery can't continue in some cases.

Thanks,
Kuai
On Mon, Jul 31, 2023 at 09:03:52AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/29 20:23, Xueshi Hu 写道:
> > […]
>
> Well... I just said "reshape can continue" is not possible for raid1, and
> this patch will cause that recovery can't continue in some cases.

I see. I will reread the relevant code to gain a better understanding of "some cases".

Thanks,
Hu

> Thanks,
> Kuai
>
> […]
Hi,

在 2023/07/31 11:48, Xueshi Hu 写道:
>> Well... I just said "reshape can continue" is not possible for raid1, and
>> this patch will cause that recovery can't continue in some cases.
> I see. I will reread the relevant code to gain a better understanding of
> "some cases".

It's not that complex at all. The key point is whether your changes introduce functional changes and, if so, whether the effect has been fully evaluated.

In this case, raid1_reshape() (from all the callers) will set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED to indicate that the daemon thread must check whether recovery is needed; if so, it will start recovery.

I agree that recovery is probably not needed in this case, where raid_disks is the same. However, note that recovery can be interrupted by the user, and raid1_reshape() can be triggered by the user as well, which means the following procedure will end up behaving differently:

1. trigger a recovery;
2. interrupt the recovery;
3. trigger a raid1_reshape();

Before this patch, the interrupted recovery will continue.

This is minor, but I can't say that no user will be affected, and I really prefer to keep this behaviour, which means this patch can just do some cleanup without introducing functional changes.

Thanks,
Kuai
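In sysfs terms, the sequence described above could be exercised roughly like this (a sketch assuming the standard md sysfs interface and the example array used earlier in the thread; not a verified test case):

    # 1. trigger a recovery
    echo recover > /sys/devices/virtual/block/md10/md/sync_action
    # 2. interrupt the recovery
    echo idle > /sys/devices/virtual/block/md10/md/sync_action
    # 3. trigger raid1_reshape() with an unchanged disk count
    echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
    # Before the patch, step 3 sets the MD_RECOVERY_* flags and wakes the
    # daemon thread, so the interrupted recovery can resume; with the early
    # return it would not.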
On Mon, Jul 31, 2023 at 02:22:31PM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/31 11:48, Xueshi Hu 写道:
> > […]
>
> I agree that recovery is probably not needed in this case, where
> raid_disks is the same. However, note that recovery can be interrupted
> by the user, and raid1_reshape() can be triggered by the user as well,
> which means the following procedure will end up behaving differently:
>
> 1. trigger a recovery;
> 2. interrupt the recovery;
> 3. trigger a raid1_reshape();

Sincere gratitude for your detailed response. I have been consistently misunderstanding the concept of "interrupted recovery." I now understand your concerns.

Thanks,
Hu

> Before this patch, the interrupted recovery will continue.
>
> This is minor, but I can't say that no user will be affected, and I
> really prefer to keep this behaviour, which means this patch can just do
> some cleanup without introducing functional changes.
>
> Thanks,
> Kuai
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 62e86b7d1561..5840b8b0f9b7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
 	int d, d2;
 	int ret;
 
-	memset(&newpool, 0, sizeof(newpool));
-	memset(&oldpool, 0, sizeof(oldpool));
-
 	/* Cannot change chunk_size, layout, or level */
 	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
 	    mddev->layout != mddev->new_layout ||
@@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
 		return -EINVAL;
 	}
 
+	if (mddev->delta_disks == 0)
+		return 0; /* nothing to do */
+
+	memset(&newpool, 0, sizeof(newpool));
+	memset(&oldpool, 0, sizeof(oldpool));
+
 	if (!mddev_is_clustered(mddev))
 		md_allow_write(mddev);
If array size doesn't changed, nothing need to do.

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 drivers/md/raid1.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)