mdadm reshape hangs on external grow chunk

Message ID 20220923142635.470305-1-ncroxon@redhat.com (mailing list archive)
State Rejected, archived
Series mdadm reshape hangs on external grow chunk

Commit Message

Nigel Croxon Sept. 23, 2022, 2:26 p.m. UTC
After creating a RAID array on top of an IMSM container, an attempt to
grow the chunk size makes the reshape hang with zero progress.
The reason is the computation of the sync_max_to_set value:

    if (before_data_disks <= data_disks)
        sync_max_to_set = sra->reshape_progress / data_disks;
    else
        sync_max_to_set = (sra->component_size * data_disks
                           - sra->reshape_progress) / data_disks;

This can produce a zero result, which is then used to set the maximum
sync value, so the reshape makes no progress. The change is to
test whether sync_max_to_set is zero and, if so, set the sysfs
sync_max to "max".
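
The zero case can be illustrated with a minimal sketch. It assumes that a chunk-size change leaves the number of data disks unchanged, so the first branch is taken, and that reshape_progress is still 0 when the reshape starts. `struct reshape_state` and `compute_sync_max` are hypothetical names used for illustration only; they are not mdadm code.

```c
#include <assert.h>

/* Simplified stand-in for the two struct mdinfo fields used here;
 * the real structure in mdadm has many more members. */
struct reshape_state {
	unsigned long long reshape_progress;	/* progress made so far */
	unsigned long long component_size;	/* size of one member device */
};

/* Mirrors the computation quoted in the commit message. */
static unsigned long long
compute_sync_max(const struct reshape_state *sra,
		 int before_data_disks, int data_disks)
{
	assert(data_disks > 0);

	if (before_data_disks <= data_disks)
		return sra->reshape_progress / data_disks;

	return (sra->component_size * data_disks
		- sra->reshape_progress) / data_disks;
}
```

Under these assumptions, a state with reshape_progress equal to 0 and two data disks before and after yields 0, which is the value that would then be written as the sync limit.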

Steps to Reproduce:
1. Create a container and RAID0 array
mdadm -CR /dev/md/imsm -e imsm -n2 /dev/nvme0n1 /dev/nvme1n1
mdadm -CR /dev/md/vol -l0 --chunk=16 -n2 /dev/nvme0n1 /dev/nvme1n1
2. Wait for resync
3. Try to grow the chunk size
mdadm --grow /dev/md/vol --chunk=256

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
---
 Grow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Kinga Stefaniuk Sept. 29, 2022, 9:35 a.m. UTC | #1
On Fri, 23 Sep 2022 10:26:35 -0400
Nigel Croxon <ncroxon@redhat.com> wrote:

> After creating a RAID array on top of an IMSM container, an attempt to
> grow the chunk size makes the reshape hang with zero progress.
> The reason is the computation of the sync_max_to_set value:

Hi Nigel,

I retested with your patch but still see the defect. I analyzed it
and found another cause. The validate_geometry_imsm function checks
freesize and super and returns 1 if either of them is NULL. In my
opinion it should return 0 here, because this is an error and the
reshape should stop at this point. I will prepare a proper patch and
send it for review immediately.

Kind regards,
Kinga Tanska
Mariusz Tkaczyk Nov. 17, 2022, 2:07 p.m. UTC | #2
On Thu, 29 Sep 2022 11:35:21 +0200
Kinga Tanska <kinga.tanska@linux.intel.com> wrote:

> Hi Nigel,
> 
> I retested with your patch but still see the defect. I analyzed it
> and found another cause. The validate_geometry_imsm function checks
> freesize and super and returns 1 if either of them is NULL. In my
> opinion it should return 0 here, because this is an error and the
> reshape should stop at this point. I will prepare a proper patch and
> send it for review immediately.
> 
Hi Nigel,
I agree with Kinga.
https://patchwork.kernel.org/project/linux-raid/patch/20221028025117.27048-1-kinga.tanska@intel.com/
Could you please retest the proposed patch on your side and provide feedback?

Thanks,
Mariusz
Mariusz Tkaczyk Feb. 1, 2023, 1:37 p.m. UTC | #3
Hi Nigel,
Ping?

Thanks,
Mariusz

Jes Sorensen March 8, 2023, 7:34 p.m. UTC | #4
On 2/1/23 08:37, Mariusz Tkaczyk wrote:
> Hi Nigel,
> Ping?

..... crickets ..... I'll close this one in patchwork if we don't hear
anything soon.

Thanks,
Jes

Patch

diff --git a/Grow.c b/Grow.c
index 0f07a894..6c5021bc 100644
--- a/Grow.c
+++ b/Grow.c
@@ -943,7 +943,7 @@  int start_reshape(struct mdinfo *sra, int already_running,
 	if (!already_running)
 		sysfs_set_num(sra, NULL, "sync_min", sync_max_to_set);
 
-        if (st->ss->external)
+        if (sync_max_to_set)
 		err = err ?: sysfs_set_num(sra, NULL, "sync_max", sync_max_to_set);
 	else
 		err = err ?: sysfs_set_str(sra, NULL, "sync_max", "max");
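
The effect of this one-line change on what is written to sync_max can be sketched as a small decision function. This is an illustration only: the enum and function names are hypothetical, and only the two conditions (st->ss->external before, sync_max_to_set after) are taken from the patch.

```c
#include <stdbool.h>

/* What gets written to the sysfs sync_max attribute. */
enum sync_max_write {
	WRITE_NUMBER,	/* sysfs_set_num(..., "sync_max", sync_max_to_set) */
	WRITE_MAX	/* sysfs_set_str(..., "sync_max", "max") */
};

/* Before the patch: the choice depends only on external metadata. */
static enum sync_max_write choose_before(bool external)
{
	return external ? WRITE_NUMBER : WRITE_MAX;
}

/* After the patch: the choice depends only on the computed value. */
static enum sync_max_write choose_after(unsigned long long sync_max_to_set)
{
	return sync_max_to_set ? WRITE_NUMBER : WRITE_MAX;
}
```

With external metadata and a computed value of 0, choose_before selects WRITE_NUMBER, so sync_max is set to 0, while choose_after selects WRITE_MAX. After the change, an array without external metadata and a non-zero computed value would also receive a numeric limit, since the metadata type is no longer consulted.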