[V3,md-6.12,1/1] md: add new_level sysfs interface

Message ID	20240904235453.99120-1-xni@redhat.com (mailing list archive)
State	Accepted
Headers	show Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DBE418454D for <linux-raid@vger.kernel.org>; Wed, 4 Sep 2024 23:55:12 +0000 (UTC) From: Xiao Ni <xni@redhat.com> To: song@kernel.org Cc: linux-raid@vger.kernel.org, pmenzel@molgen.mpg.de, yukuai1@huaweicloud.com Subject: [PATCH V3 md-6.12 1/1] md: add new_level sysfs interface Date: Thu, 5 Sep 2024 07:54:53 +0800 Message-Id: <20240904235453.99120-1-xni@redhat.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[V3,md-6.12,1/1] md: add new_level sysfs interface \| expand [V3,md-6.12,1/1] md: add new_level sysfs interface

Context	Check	Description
mdraidci/vmtest-md-6_12-PR	success	PR summary
mdraidci/vmtest-md-6_12-VM_Test-0	success	Logs for per-patch-testing

Xiao Ni Sept. 4, 2024, 11:54 p.m. UTC

Reshape needs to specify a backup file when it can't update data offset
of member disks. For this situation, first, it starts reshape and then
it kicks off mdadm-grow-continue service which does backup job and
monitors the reshape process. The service is a new process, so it needs
to read superblock from member disks to get information.

But in the first step, it doesn't update new level in superblock. So
in the second step, the new level that systemd service reads from
superblock is wrong. It can't change to the right new level after reshape
finishes. This interface is used to update new level during reshape
progress.

Reproduce steps:
mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3]
mdadm --wait /dev/md0
mdadm /dev/md0 --grow -l5 --backup=backup
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0]

Test case 07changelevels from mdadm regression tests can trigger this
problem.

Signed-off-by: Xiao Ni <xni@redhat.com>
---
V3: explain more about the root cause
V2: add detail about test information
 drivers/md/md.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

Song Liu Sept. 5, 2024, 5:52 a.m. UTC | #1

On Wed, Sep 4, 2024 at 4:55 PM Xiao Ni <xni@redhat.com> wrote:
>
> Reshape needs to specify a backup file when it can't update data offset
> of member disks. For this situation, first, it starts reshape and then
> it kicks off mdadm-grow-continue service which does backup job and
> monitors the reshape process. The service is a new process, so it needs
> to read superblock from member disks to get information.
>
> But in the first step, it doesn't update new level in superblock. So
> in the second step, the new level that systemd service reads from
> superblock is wrong. It can't change to the right new level after reshape
> finishes. This interface is used to update new level during reshape
> progress.
>
> Reproduce steps:
> mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3]
> mdadm --wait /dev/md0
> mdadm /dev/md0 --grow -l5 --backup=backup
> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
> md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0]
>
> Test case 07changelevels from mdadm regression tests can trigger this
> problem.

Please briefly describe how the user space (mdadm or something else)
will use the new API to fix this issue.

No need to resend the patch. You can just reply here.

Thanks,
Song

>
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
> V3: explain more about the root cause
> V2: add detail about test information
>  drivers/md/md.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index d3a837506a36..3c354e7a7825 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4141,6 +4141,34 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
>  static struct md_sysfs_entry md_level =
>  __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store);
>
> +static ssize_t
> +new_level_show(struct mddev *mddev, char *page)
> +{
> +       return sprintf(page, "%d\n", mddev->new_level);
> +}
> +
> +static ssize_t
> +new_level_store(struct mddev *mddev, const char *buf, size_t len)
> +{
> +       unsigned int n;
> +       int err;
> +
> +       err = kstrtouint(buf, 10, &n);
> +       if (err < 0)
> +               return err;
> +       err = mddev_lock(mddev);
> +       if (err)
> +               return err;
> +
> +       mddev->new_level = n;
> +       md_update_sb(mddev, 1);
> +
> +       mddev_unlock(mddev);
> +       return len;
> +}
> +static struct md_sysfs_entry md_new_level =
> +__ATTR(new_level, 0664, new_level_show, new_level_store);
> +
>  static ssize_t
>  layout_show(struct mddev *mddev, char *page)
>  {
> @@ -5666,6 +5694,7 @@ __ATTR(serialize_policy, S_IRUGO | S_IWUSR, serialize_policy_show,
>
>  static struct attribute *md_default_attrs[] = {
>         &md_level.attr,
> +       &md_new_level.attr,
>         &md_layout.attr,
>         &md_raid_disks.attr,
>         &md_uuid.attr,
> --
> 2.32.0 (Apple Git-132)
>

Song Liu Sept. 5, 2024, 5:55 a.m. UTC | #2

On Wed, Sep 4, 2024 at 4:55 PM Xiao Ni <xni@redhat.com> wrote:
>
> Reshape needs to specify a backup file when it can't update data offset
> of member disks. For this situation, first, it starts reshape and then
> it kicks off mdadm-grow-continue service which does backup job and
> monitors the reshape process. The service is a new process, so it needs
> to read superblock from member disks to get information.
>
> But in the first step, it doesn't update new level in superblock. So
> in the second step, the new level that systemd service reads from
> superblock is wrong. It can't change to the right new level after reshape
> finishes. This interface is used to update new level during reshape
> progress.
>
> Reproduce steps:
> mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3]
> mdadm --wait /dev/md0
> mdadm /dev/md0 --grow -l5 --backup=backup
> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
> md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0]
>
> Test case 07changelevels from mdadm regression tests can trigger this
> problem.
>
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
> V3: explain more about the root cause
> V2: add detail about test information
>  drivers/md/md.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index d3a837506a36..3c354e7a7825 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4141,6 +4141,34 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
>  static struct md_sysfs_entry md_level =
>  __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store);
>
> +static ssize_t
> +new_level_show(struct mddev *mddev, char *page)
> +{
> +       return sprintf(page, "%d\n", mddev->new_level);
> +}
> +
> +static ssize_t
> +new_level_store(struct mddev *mddev, const char *buf, size_t len)
> +{
> +       unsigned int n;
> +       int err;
> +
> +       err = kstrtouint(buf, 10, &n);
> +       if (err < 0)
> +               return err;
> +       err = mddev_lock(mddev);
> +       if (err)
> +               return err;
> +
> +       mddev->new_level = n;
> +       md_update_sb(mddev, 1);
> +
> +       mddev_unlock(mddev);
> +       return len;
> +}
> +static struct md_sysfs_entry md_new_level =
> +__ATTR(new_level, 0664, new_level_show, new_level_store);

Actually, since we can read and write "new_level", please describe how
it will be used (read and write).

Thanks,
Song

Xiao Ni Sept. 6, 2024, 2:07 a.m. UTC | #3

On Thu, Sep 5, 2024 at 1:55 PM Song Liu <song@kernel.org> wrote:
>
> On Wed, Sep 4, 2024 at 4:55 PM Xiao Ni <xni@redhat.com> wrote:
> >
> > Reshape needs to specify a backup file when it can't update data offset
> > of member disks. For this situation, first, it starts reshape and then
> > it kicks off mdadm-grow-continue service which does backup job and
> > monitors the reshape process. The service is a new process, so it needs
> > to read superblock from member disks to get information.
> >
> > But in the first step, it doesn't update new level in superblock. So
> > in the second step, the new level that systemd service reads from
> > superblock is wrong. It can't change to the right new level after reshape
> > finishes. This interface is used to update new level during reshape
> > progress.
> >
> > Reproduce steps:
> > mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3]
> > mdadm --wait /dev/md0
> > mdadm /dev/md0 --grow -l5 --backup=backup
> > cat /proc/mdstat
> > Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
> > md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0]
> >
> > Test case 07changelevels from mdadm regression tests can trigger this
> > problem.
> >
> > Signed-off-by: Xiao Ni <xni@redhat.com>
> > ---
> > V3: explain more about the root cause
> > V2: add detail about test information
> >  drivers/md/md.c | 29 +++++++++++++++++++++++++++++
> >  1 file changed, 29 insertions(+)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index d3a837506a36..3c354e7a7825 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4141,6 +4141,34 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
> >  static struct md_sysfs_entry md_level =
> >  __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store);
> >
> > +static ssize_t
> > +new_level_show(struct mddev *mddev, char *page)
> > +{
> > +       return sprintf(page, "%d\n", mddev->new_level);
> > +}
> > +
> > +static ssize_t
> > +new_level_store(struct mddev *mddev, const char *buf, size_t len)
> > +{
> > +       unsigned int n;
> > +       int err;
> > +
> > +       err = kstrtouint(buf, 10, &n);
> > +       if (err < 0)
> > +               return err;
> > +       err = mddev_lock(mddev);
> > +       if (err)
> > +               return err;
> > +
> > +       mddev->new_level = n;
> > +       md_update_sb(mddev, 1);
> > +
> > +       mddev_unlock(mddev);
> > +       return len;
> > +}
> > +static struct md_sysfs_entry md_new_level =
> > +__ATTR(new_level, 0664, new_level_show, new_level_store);
>
> Actually, since we can read and write "new_level", please describe how
> it will be used (read and write).
>
> Thanks,
> Song
>

Hi Song

I rewrite the comments like this:

Now reshape supports two ways: with backup file or without backup file.
For the situation without backup file, it needs to change data offset.
It doesn't need systemd service mdadm-grow-continue. So it can finish
the reshape job in one process environment. It can know the new level
from mdadm --grow command and can change to new level after reshape
finishes.

For the situation with backup file, it needs systemd service
mdadm-grow-continue to monitor reshape progress. So there are two process
environments. One is mdadm --grow command whick kicks off reshape. It
doesn't wait reshape to finish and wake up mdadm-grow-continue service.
Two is the service. But it doesn't know the new level. Because in the
first step, it doesn't sync the information to kernel space. So the new
level which reads from superblock is wrong.

In kernel space mddev->new_level is used to record the new level when
doing reshape. This patch tries to add a new interface to help mdadm
can update new level and sync it to metadata. So it can read the right
new level when mdadm-grow-continue starts. And it can change to the new
level after reshape finishes.

Reproduce steps:
mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3]
mdadm --wait /dev/md0
mdadm /dev/md0 --grow -l5 --backup=backup
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0]

Test case 07changelevels from mdadm regression tests can trigger this
problem.

If you want me to re-send a formal patch, I'll do it.

Best Regards
Xiao

Song Liu Sept. 6, 2024, 6:15 p.m. UTC | #4

On Thu, Sep 5, 2024 at 7:07 PM Xiao Ni <xni@redhat.com> wrote:
>
[...]
>
> Hi Song
>
> I rewrite the comments like this:
>
> Now reshape supports two ways: with backup file or without backup file.
> For the situation without backup file, it needs to change data offset.
> It doesn't need systemd service mdadm-grow-continue. So it can finish
> the reshape job in one process environment. It can know the new level
> from mdadm --grow command and can change to new level after reshape
> finishes.
>
> For the situation with backup file, it needs systemd service
> mdadm-grow-continue to monitor reshape progress. So there are two process
> environments. One is mdadm --grow command whick kicks off reshape. It
> doesn't wait reshape to finish and wake up mdadm-grow-continue service.
> Two is the service. But it doesn't know the new level. Because in the
> first step, it doesn't sync the information to kernel space. So the new
> level which reads from superblock is wrong.
>
> In kernel space mddev->new_level is used to record the new level when
> doing reshape. This patch tries to add a new interface to help mdadm
> can update new level and sync it to metadata. So it can read the right
> new level when mdadm-grow-continue starts. And it can change to the new
> level after reshape finishes.
>
> Reproduce steps:
> mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3]
> mdadm --wait /dev/md0
> mdadm /dev/md0 --grow -l5 --backup=backup
> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
> md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0]
>
> Test case 07changelevels from mdadm regression tests can trigger this
> problem.

Applied to md-6.12 with a commit log based on this version.

Thanks,
Song

[V3,md-6.12,1/1] md: add new_level sysfs interface

Checks

Commit Message

Comments

Patch