
[04/10] mdadm/Grow: sleep a while after removing disk in impose_level

Message ID 20240828021150.63240-5-xni@redhat.com (mailing list archive)
State New
Series mdadm tests fix

Checks

Context Check Description
mdraidci/vmtest-md-6_11-PR fail merge-conflict
mdraidci/vmtest-md-6_12-PR fail merge-conflict

Commit Message

Xiao Ni Aug. 28, 2024, 2:11 a.m. UTC
Reshaping from raid456 to raid0 requires removing disks. When a disk
is removed, the kernel sets MD_RECOVERY_RUNNING, and changing the level
fails while that flag is set. So wait a while to let the md thread
clear the flag.

This is found by test case 05r6tor0.

Signed-off-by: Xiao Ni <xni@redhat.com>
---
 Grow.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Xiao Ni Sept. 3, 2024, 12:53 a.m. UTC | #1
On Mon, Sep 2, 2024 at 6:14 PM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> On Wed, 28 Aug 2024 10:11:44 +0800
> Xiao Ni <xni@redhat.com> wrote:
>
> > It needs to remove disks when reshaping from raid456 to raid0. In
> > kernel space it sets MD_RECOVERY_RUNNING. And it will fail to change
> > level. So wait sometime to let md thread to clear this flag.
> >
> > This is found by test case 05r6tor0.
> >
> > Signed-off-by: Xiao Ni <xni@redhat.com>
> > ---
> >  Grow.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/Grow.c b/Grow.c
> > index 2a7587315817..aaf349e9722f 100644
> > --- a/Grow.c
> > +++ b/Grow.c
> > @@ -3028,6 +3028,12 @@ static int impose_level(int fd, int level, char
> > *devname, int verbose) makedev(disk.major, disk.minor));
> >                       hot_remove_disk(fd, makedev(disk.major, disk.minor),
> > 1); }
> > +             /*
> > +              * hot_remove_disk lets kernel set MD_RECOVERY_RUNNING
> > +              * and it can't set level. It needs to wait sometime
> > +              * to let md thread to clear the flag.
> > +              */
> > +             sleep_for(5, 0, true);
>

Hi Mariusz

> Shouldn't we check sysfs in shorter intervals? I know that this is the simplest
> way, but big sleeps are generally not good.
>
> I will merge it if you don't want to rework it, but you need to add a log
> message saying that we are waiting 5 seconds, so the user does not panic that
> it is frozen.

Which sysfs do you mean? If we have a better way, I want to choose it.

Best Regards
Xiao

>
> Thanks,
> Mariusz
>
Mariusz Tkaczyk Sept. 3, 2024, 7:22 a.m. UTC | #2
On Tue, 3 Sep 2024 08:53:42 +0800
Xiao Ni <xni@redhat.com> wrote:

> On Mon, Sep 2, 2024 at 6:14 PM Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> >
> > On Wed, 28 Aug 2024 10:11:44 +0800
> > Xiao Ni <xni@redhat.com> wrote:
> >  
> > > It needs to remove disks when reshaping from raid456 to raid0. In
> > > kernel space it sets MD_RECOVERY_RUNNING. And it will fail to change
> > > level. So wait sometime to let md thread to clear this flag.
> > >
> > > This is found by test case 05r6tor0.
> > >
> > > Signed-off-by: Xiao Ni <xni@redhat.com>
> > > ---
> > >  Grow.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/Grow.c b/Grow.c
> > > index 2a7587315817..aaf349e9722f 100644
> > > --- a/Grow.c
> > > +++ b/Grow.c
> > > @@ -3028,6 +3028,12 @@ static int impose_level(int fd, int level, char
> > > *devname, int verbose) makedev(disk.major, disk.minor));
> > >                       hot_remove_disk(fd, makedev(disk.major, disk.minor),
> > > 1); }
> > > +             /*
> > > +              * hot_remove_disk lets kernel set MD_RECOVERY_RUNNING
> > > +              * and it can't set level. It needs to wait sometime
> > > +              * to let md thread to clear the flag.
> > > +              */
> > > +             sleep_for(5, 0, true);  
> >  
> 
> Hi Mariusz
> 
> > Shouldn't we check sysfs is shorter intervals? I know that is the simplest
> > way but big sleeps are generally not good.
> >
> > I will merge it if you don't want to rework it but you need to add log that
> > we are waiting 5 second for the user to not panic that it is frozen.  
> 
> Which sysfs do you mean? If we have a better way, I want to choose it.
> 

If we are sending a hot remove for the disk, we can check whether the path
/sys/block/<mddev>/md/dev-{devnm} still exists.
If it does not, the device has finally been removed.
Alternatively, we can see the same in mdstat, but checking the path looks
simpler to me.

Thanks,
Mariusz
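
A minimal sketch of the polling idea above, written as a standalone helper rather than actual mdadm code. The function name `wait_for_path_removal` and its parameters are hypothetical; the caller would pass the /sys/block/<mddev>/md/dev-{devnm} path of the removed member. It only reports whether the sysfs entry has disappeared and says nothing about kernel flags such as MD_RECOVERY_RUNNING.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of mdadm: poll until `path` no longer
 * exists, checking every `interval_ms` milliseconds and giving up once
 * `timeout_ms` milliseconds of sleeping have accumulated.
 * Returns true if the path disappeared, false on timeout.
 */
static bool wait_for_path_removal(const char *path, long interval_ms,
				  long timeout_ms)
{
	assert(path != NULL && interval_ms > 0 && timeout_ms >= 0);

	struct timespec delay = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	long waited_ms = 0;

	for (;;) {
		/* ENOENT means the entry is gone; other errors keep polling. */
		if (access(path, F_OK) != 0 && errno == ENOENT)
			return true;
		if (waited_ms >= timeout_ms)
			return false;
		nanosleep(&delay, NULL);
		waited_ms += interval_ms;
	}
}
```

With an interval of, say, 100 ms and a timeout of 5000 ms, the worst case matches the proposed 5-second sleep, while the common case returns as soon as the entry is gone.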
Xiao Ni Sept. 11, 2024, 8:42 a.m. UTC | #3
On 2024/9/3 at 3:22 PM, Mariusz Tkaczyk wrote:
> On Tue, 3 Sep 2024 08:53:42 +0800
> Xiao Ni <xni@redhat.com> wrote:
>
>> On Mon, Sep 2, 2024 at 6:14 PM Mariusz Tkaczyk
>> <mariusz.tkaczyk@linux.intel.com> wrote:
>>> On Wed, 28 Aug 2024 10:11:44 +0800
>>> Xiao Ni <xni@redhat.com> wrote:
>>>   
>>>> It needs to remove disks when reshaping from raid456 to raid0. In
>>>> kernel space it sets MD_RECOVERY_RUNNING. And it will fail to change
>>>> level. So wait sometime to let md thread to clear this flag.
>>>>
>>>> This is found by test case 05r6tor0.
>>>>
>>>> Signed-off-by: Xiao Ni <xni@redhat.com>
>>>> ---
>>>>   Grow.c | 6 ++++++
>>>>   1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/Grow.c b/Grow.c
>>>> index 2a7587315817..aaf349e9722f 100644
>>>> --- a/Grow.c
>>>> +++ b/Grow.c
>>>> @@ -3028,6 +3028,12 @@ static int impose_level(int fd, int level, char
>>>> *devname, int verbose) makedev(disk.major, disk.minor));
>>>>                        hot_remove_disk(fd, makedev(disk.major, disk.minor),
>>>> 1); }
>>>> +             /*
>>>> +              * hot_remove_disk lets kernel set MD_RECOVERY_RUNNING
>>>> +              * and it can't set level. It needs to wait sometime
>>>> +              * to let md thread to clear the flag.
>>>> +              */
>>>> +             sleep_for(5, 0, true);
>>>   
>> Hi Mariusz
>>
>>> Shouldn't we check sysfs is shorter intervals? I know that is the simplest
>>> way but big sleeps are generally not good.
>>>
>>> I will merge it if you don't want to rework it but you need to add log that
>>> we are waiting 5 second for the user to not panic that it is frozen.
>> Which sysfs do you mean? If we have a better way, I want to choose it.
>>
> If we are sending hot remove to the disk, we can check if there is path
> available: /sys/block/<mddev>/md/dev-{devnm}
> if not, then device has been finally removed.
> Eventually, we can see same in mdstat but checking path looks simpler to me.
>
> Thanks,
> Mariusz


Hi Mariusz

I checked your method and it doesn't work. There are two steps in kernel
space and they are asynchronous.

1. Remove the disk, which includes removing the sysfs directory, setting
MD_RECOVERY_NEEDED and waking up the md thread.

2. Because MD_RECOVERY_NEEDED is set, the kernel sets
MD_RECOVERY_RUNNING and queues a sync work item. The sync work has nothing
to do, so it just clears MD_RECOVERY_RUNNING.

So there is a time window, and its length depends on the machine. Sometimes
setting the new level fails because MD_RECOVERY_RUNNING is still set. Maybe
we can add a check when removing a disk: if no sync/recovery is needed, we
don't need to set MD_RECOVERY_NEEDED. But for now, we can add a sleep here
as a solution. I'll add a log message here to inform the admin.

Best Regards

Xiao
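
One possible alternative to a fixed 5-second sleep is to retry the failing operation at short intervals, with an upper bound on the number of attempts. The sketch below is illustrative only and is not what the patch does: `retry_while_busy` and `fake_busy_op` are hypothetical names, and it assumes the level change reports the in-progress recovery as EBUSY, which would need to be verified against the kernel before relying on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Operation to retry: returns 0 on success, -1 with errno set on failure. */
typedef int (*retry_op)(void *ctx);

/*
 * Hypothetical helper, not part of mdadm: call `op` until it succeeds,
 * fails with something other than EBUSY, or `max_tries` attempts have
 * been made. Sleeps `interval_ms` milliseconds between attempts.
 * Returns the result of the last call to `op`.
 */
static int retry_while_busy(retry_op op, void *ctx, long interval_ms,
			    int max_tries)
{
	assert(op != NULL && interval_ms > 0 && max_tries > 0);

	struct timespec delay = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};

	for (int attempt = 1; ; attempt++) {
		errno = 0;
		int rv = op(ctx);

		/* Stop on success, on a non-EBUSY error, or when out of tries. */
		if (rv == 0 || errno != EBUSY || attempt >= max_tries)
			return rv;
		nanosleep(&delay, NULL);
	}
}

/*
 * Stand-in for the real operation, for demonstration only: fails with
 * EBUSY until the counter pointed to by `ctx` reaches zero.
 */
static int fake_busy_op(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```

In a real caller, the callback would wrap whatever sets the new level, and a log message before the first retry would tell the admin that the tool is waiting rather than frozen.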

Patch

diff --git a/Grow.c b/Grow.c
index 2a7587315817..aaf349e9722f 100644
--- a/Grow.c
+++ b/Grow.c
@@ -3028,6 +3028,12 @@  static int impose_level(int fd, int level, char *devname, int verbose)
 			      makedev(disk.major, disk.minor));
 			hot_remove_disk(fd, makedev(disk.major, disk.minor), 1);
 		}
+		/*
+		 * hot_remove_disk makes the kernel set MD_RECOVERY_RUNNING,
+		 * and the level can't be changed while that flag is set.
+		 * Wait a while to let the md thread clear the flag.
+		 */
+		sleep_for(5, 0, true);
 	}
 	c = map_num(pers, level);
 	if (c) {