Message ID | 20240904235453.99120-1-xni@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [V3,md-6.12,1/1] md: add new_level sysfs interface | expand |
Context | Check | Description |
---|---|---|
mdraidci/vmtest-md-6_12-PR | success | PR summary |
mdraidci/vmtest-md-6_12-VM_Test-0 | success | Logs for per-patch-testing |
On Wed, Sep 4, 2024 at 4:55 PM Xiao Ni <xni@redhat.com> wrote: > > Reshape needs to specify a backup file when it can't update data offset > of member disks. For this situation, first, it starts reshape and then > it kicks off mdadm-grow-continue service which does backup job and > monitors the reshape process. The service is a new process, so it needs > to read superblock from member disks to get information. > > But in the first step, it doesn't update new level in superblock. So > in the second step, the new level that systemd service reads from > superblock is wrong. It can't change to the right new level after reshape > finishes. This interface is used to update new level during reshape > progress. > > Reproduce steps: > mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3] > mdadm --wait /dev/md0 > mdadm /dev/md0 --grow -l5 --backup=backup > cat /proc/mdstat > Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] > md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0] > > Test case 07changelevels from mdadm regression tests can trigger this > problem. Please briefly describe how the user space (mdadm or something else) will use the new API to fix this issue. No need to resend the patch. You can just reply here. Thanks, Song > > Signed-off-by: Xiao Ni <xni@redhat.com> > --- > V3: explain more about the root cause > V2: add detail about test information > drivers/md/md.c | 29 +++++++++++++++++++++++++++++ > 1 file changed, 29 insertions(+) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index d3a837506a36..3c354e7a7825 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -4141,6 +4141,34 @@ level_store(struct mddev *mddev, const char *buf, size_t len) > static struct md_sysfs_entry md_level = > __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store); > > +static ssize_t > +new_level_show(struct mddev *mddev, char *page) > +{ > + return sprintf(page, "%d\n", mddev->new_level); > +} > + > +static ssize_t > +new_level_store(struct mddev *mddev, const char *buf, size_t len) > +{ > + unsigned int n; > + int err; > + > + err = kstrtouint(buf, 10, &n); > + if (err < 0) > + return err; > + err = mddev_lock(mddev); > + if (err) > + return err; > + > + mddev->new_level = n; > + md_update_sb(mddev, 1); > + > + mddev_unlock(mddev); > + return len; > +} > +static struct md_sysfs_entry md_new_level = > +__ATTR(new_level, 0664, new_level_show, new_level_store); > + > static ssize_t > layout_show(struct mddev *mddev, char *page) > { > @@ -5666,6 +5694,7 @@ __ATTR(serialize_policy, S_IRUGO | S_IWUSR, serialize_policy_show, > > static struct attribute *md_default_attrs[] = { > &md_level.attr, > + &md_new_level.attr, > &md_layout.attr, > &md_raid_disks.attr, > &md_uuid.attr, > -- > 2.32.0 (Apple Git-132) >
On Wed, Sep 4, 2024 at 4:55 PM Xiao Ni <xni@redhat.com> wrote: > > Reshape needs to specify a backup file when it can't update data offset > of member disks. For this situation, first, it starts reshape and then > it kicks off mdadm-grow-continue service which does backup job and > monitors the reshape process. The service is a new process, so it needs > to read superblock from member disks to get information. > > But in the first step, it doesn't update new level in superblock. So > in the second step, the new level that systemd service reads from > superblock is wrong. It can't change to the right new level after reshape > finishes. This interface is used to update new level during reshape > progress. > > Reproduce steps: > mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3] > mdadm --wait /dev/md0 > mdadm /dev/md0 --grow -l5 --backup=backup > cat /proc/mdstat > Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] > md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0] > > Test case 07changelevels from mdadm regression tests can trigger this > problem. > > Signed-off-by: Xiao Ni <xni@redhat.com> > --- > V3: explain more about the root cause > V2: add detail about test information > drivers/md/md.c | 29 +++++++++++++++++++++++++++++ > 1 file changed, 29 insertions(+) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index d3a837506a36..3c354e7a7825 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -4141,6 +4141,34 @@ level_store(struct mddev *mddev, const char *buf, size_t len) > static struct md_sysfs_entry md_level = > __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store); > > +static ssize_t > +new_level_show(struct mddev *mddev, char *page) > +{ > + return sprintf(page, "%d\n", mddev->new_level); > +} > + > +static ssize_t > +new_level_store(struct mddev *mddev, const char *buf, size_t len) > +{ > + unsigned int n; > + int err; > + > + err = kstrtouint(buf, 10, &n); > + if (err < 0) > + return err; > + err = mddev_lock(mddev); > + if (err) > + return err; > + > + mddev->new_level = n; > + md_update_sb(mddev, 1); > + > + mddev_unlock(mddev); > + return len; > +} > +static struct md_sysfs_entry md_new_level = > +__ATTR(new_level, 0664, new_level_show, new_level_store); Actually, since we can read and write "new_level", please describe how it will be used (read and write). Thanks, Song
On Thu, Sep 5, 2024 at 1:55 PM Song Liu <song@kernel.org> wrote: > > On Wed, Sep 4, 2024 at 4:55 PM Xiao Ni <xni@redhat.com> wrote: > > > > Reshape needs to specify a backup file when it can't update data offset > > of member disks. For this situation, first, it starts reshape and then > > it kicks off mdadm-grow-continue service which does backup job and > > monitors the reshape process. The service is a new process, so it needs > > to read superblock from member disks to get information. > > > > But in the first step, it doesn't update new level in superblock. So > > in the second step, the new level that systemd service reads from > > superblock is wrong. It can't change to the right new level after reshape > > finishes. This interface is used to update new level during reshape > > progress. > > > > Reproduce steps: > > mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3] > > mdadm --wait /dev/md0 > > mdadm /dev/md0 --grow -l5 --backup=backup > > cat /proc/mdstat > > Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] > > md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0] > > > > Test case 07changelevels from mdadm regression tests can trigger this > > problem. > > > > Signed-off-by: Xiao Ni <xni@redhat.com> > > --- > > V3: explain more about the root cause > > V2: add detail about test information > > drivers/md/md.c | 29 +++++++++++++++++++++++++++++ > > 1 file changed, 29 insertions(+) > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > index d3a837506a36..3c354e7a7825 100644 > > --- a/drivers/md/md.c > > +++ b/drivers/md/md.c > > @@ -4141,6 +4141,34 @@ level_store(struct mddev *mddev, const char *buf, size_t len) > > static struct md_sysfs_entry md_level = > > __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store); > > > > +static ssize_t > > +new_level_show(struct mddev *mddev, char *page) > > +{ > > + return sprintf(page, "%d\n", mddev->new_level); > > +} > > + > > +static ssize_t > > +new_level_store(struct mddev *mddev, const char *buf, size_t len) > > +{ > > + unsigned int n; > > + int err; > > + > > + err = kstrtouint(buf, 10, &n); > > + if (err < 0) > > + return err; > > + err = mddev_lock(mddev); > > + if (err) > > + return err; > > + > > + mddev->new_level = n; > > + md_update_sb(mddev, 1); > > + > > + mddev_unlock(mddev); > > + return len; > > +} > > +static struct md_sysfs_entry md_new_level = > > +__ATTR(new_level, 0664, new_level_show, new_level_store); > > Actually, since we can read and write "new_level", please describe how > it will be used (read and write). > > Thanks, > Song > Hi Song I rewrite the comments like this: Now reshape supports two ways: with backup file or without backup file. For the situation without backup file, it needs to change data offset. It doesn't need systemd service mdadm-grow-continue. So it can finish the reshape job in one process environment. It can know the new level from mdadm --grow command and can change to new level after reshape finishes. For the situation with backup file, it needs systemd service mdadm-grow-continue to monitor reshape progress. So there are two process environments. One is mdadm --grow command whick kicks off reshape. It doesn't wait reshape to finish and wake up mdadm-grow-continue service. Two is the service. But it doesn't know the new level. Because in the first step, it doesn't sync the information to kernel space. So the new level which reads from superblock is wrong. In kernel space mddev->new_level is used to record the new level when doing reshape. This patch tries to add a new interface to help mdadm can update new level and sync it to metadata. So it can read the right new level when mdadm-grow-continue starts. And it can change to the new level after reshape finishes. Reproduce steps: mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3] mdadm --wait /dev/md0 mdadm /dev/md0 --grow -l5 --backup=backup cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0] Test case 07changelevels from mdadm regression tests can trigger this problem. If you want me to re-send a formal patch, I'll do it. Best Regards Xiao
On Thu, Sep 5, 2024 at 7:07 PM Xiao Ni <xni@redhat.com> wrote: > [...] > > Hi Song > > I rewrite the comments like this: > > Now reshape supports two ways: with backup file or without backup file. > For the situation without backup file, it needs to change data offset. > It doesn't need systemd service mdadm-grow-continue. So it can finish > the reshape job in one process environment. It can know the new level > from mdadm --grow command and can change to new level after reshape > finishes. > > For the situation with backup file, it needs systemd service > mdadm-grow-continue to monitor reshape progress. So there are two process > environments. One is mdadm --grow command whick kicks off reshape. It > doesn't wait reshape to finish and wake up mdadm-grow-continue service. > Two is the service. But it doesn't know the new level. Because in the > first step, it doesn't sync the information to kernel space. So the new > level which reads from superblock is wrong. > > In kernel space mddev->new_level is used to record the new level when > doing reshape. This patch tries to add a new interface to help mdadm > can update new level and sync it to metadata. So it can read the right > new level when mdadm-grow-continue starts. And it can change to the new > level after reshape finishes. > > Reproduce steps: > mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3] > mdadm --wait /dev/md0 > mdadm /dev/md0 --grow -l5 --backup=backup > cat /proc/mdstat > Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] > md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0] > > Test case 07changelevels from mdadm regression tests can trigger this > problem. Applied to md-6.12 with a commit log based on this version. Thanks, Song
diff --git a/drivers/md/md.c b/drivers/md/md.c index d3a837506a36..3c354e7a7825 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -4141,6 +4141,34 @@ level_store(struct mddev *mddev, const char *buf, size_t len) static struct md_sysfs_entry md_level = __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store); +static ssize_t +new_level_show(struct mddev *mddev, char *page) +{ + return sprintf(page, "%d\n", mddev->new_level); +} + +static ssize_t +new_level_store(struct mddev *mddev, const char *buf, size_t len) +{ + unsigned int n; + int err; + + err = kstrtouint(buf, 10, &n); + if (err < 0) + return err; + err = mddev_lock(mddev); + if (err) + return err; + + mddev->new_level = n; + md_update_sb(mddev, 1); + + mddev_unlock(mddev); + return len; +} +static struct md_sysfs_entry md_new_level = +__ATTR(new_level, 0664, new_level_show, new_level_store); + static ssize_t layout_show(struct mddev *mddev, char *page) { @@ -5666,6 +5694,7 @@ __ATTR(serialize_policy, S_IRUGO | S_IWUSR, serialize_policy_show, static struct attribute *md_default_attrs[] = { &md_level.attr, + &md_new_level.attr, &md_layout.attr, &md_raid_disks.attr, &md_uuid.attr,
Reshape needs to specify a backup file when it can't update data offset of member disks. For this situation, first, it starts reshape and then it kicks off mdadm-grow-continue service which does backup job and monitors the reshape process. The service is a new process, so it needs to read superblock from member disks to get information. But in the first step, it doesn't update new level in superblock. So in the second step, the new level that systemd service reads from superblock is wrong. It can't change to the right new level after reshape finishes. This interface is used to update new level during reshape progress. Reproduce steps: mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3] mdadm --wait /dev/md0 mdadm /dev/md0 --grow -l5 --backup=backup cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0] Test case 07changelevels from mdadm regression tests can trigger this problem. Signed-off-by: Xiao Ni <xni@redhat.com> --- V3: explain more about the root cause V2: add detail about test information drivers/md/md.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+)