md/raid6: use valid sector values to determine if an I/O should wait on the reshape

Message ID 20231128181233.6187-1-djeffery@redhat.com (mailing list archive)
State Accepted, archived
Series md/raid6: use valid sector values to determine if an I/O should wait on the reshape

Commit Message

David Jeffery Nov. 28, 2023, 6:11 p.m. UTC
During a reshape of a RAID6 array, such as expanding by adding an
additional disk, I/Os to the region of the array which has not yet been
reshaped can stall indefinitely. This is caused by errors in the
stripe_ahead_of_reshape function which make md think the I/O targets a
region actively undergoing the reshape.

stripe_ahead_of_reshape fails to account for the q disk having a sector
value of 0. Because the q disk is not excluded from the for loop, raid6
always generates a min_sector value of 0, producing a return value which
stalls the I/O.

The function's max_sector calculation also uses min() when it should use
max(), causing the max_sector value to always be 0. During a backwards
rebuild this can cause the opposite problem, allowing I/O to advance when
it should wait.

Fixing these errors allows safe I/O to advance in a timely manner and
delays only I/O which is unsafe due to stripes in the middle of the
reshape.

Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
---
 drivers/md/raid5.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
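
To see concretely why both changes matter, here is a minimal user-space
sketch of the scan loop. It assumes a hypothetical 6-disk stripe whose
data disks sit at sector 1024 while the p and q parity slots stay at 0,
as the commit message describes for the q disk; the fake_dev type, the
DISKS count, and the sector values are all illustrative, not the kernel
code:

#include <stdio.h>

#define DISKS 6

struct fake_dev { unsigned long long sector; };

int main(void)
{
	/* Data disks at sector 1024; the parity slots are left at 0,
	 * modeling the q disk behavior described in the commit message. */
	struct fake_dev dev[DISKS] = {
		{ 1024 }, { 1024 }, { 1024 }, { 1024 },
		{ 0 }, /* pd_idx */
		{ 0 }, /* qd_idx */
	};
	int pd_idx = 4, qd_idx = 5;
	unsigned long long min_sector = ~0ULL, max_sector = 0;

	for (int i = 0; i < DISKS; i++) {
		/* The buggy loop skipped only pd_idx, so qd_idx's 0
		 * dragged min_sector to 0 on every raid6 stripe. */
		if (i == pd_idx || i == qd_idx)
			continue;

		if (dev[i].sector < min_sector)
			min_sector = dev[i].sector;
		/* The buggy code used min() here too, pinning
		 * max_sector at its initial value of 0. */
		if (dev[i].sector > max_sector)
			max_sector = dev[i].sector;
	}

	printf("min_sector=%llu max_sector=%llu\n", min_sector, max_sector);
	return 0;
}

With both fixes the sketch prints min_sector=1024 max_sector=1024.
Skipping only pd_idx drags min_sector to 0, and using min() for
max_sector pins it at 0; either way the stripe is misclassified against
the reshape window.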

Comments

Song Liu Nov. 28, 2023, 6:44 p.m. UTC | #1
Hi David and Laurence,

On Tue, Nov 28, 2023 at 10:13 AM David Jeffery <djeffery@redhat.com> wrote:
>
> During a reshape of a RAID6 array, such as expanding by adding an
> additional disk, I/Os to the region of the array which has not yet been
> reshaped can stall indefinitely. This is caused by errors in the
> stripe_ahead_of_reshape function which make md think the I/O targets a
> region actively undergoing the reshape.
>
> stripe_ahead_of_reshape fails to account for the q disk having a sector
> value of 0. Because the q disk is not excluded from the for loop, raid6
> always generates a min_sector value of 0, producing a return value which
> stalls the I/O.
>
> The function's max_sector calculation also uses min() when it should use
> max(), causing the max_sector value to always be 0. During a backwards
> rebuild this can cause the opposite problem, allowing I/O to advance when
> it should wait.
>
> Fixing these errors allows safe I/O to advance in a timely manner and
> delays only I/O which is unsafe due to stripes in the middle of the
> reshape.
>
> Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>

Thanks for the fix!

Can we add a test to mdadm/tests to cover this case? (Not sure how reliable
the test will be).

Song

> ---
>  drivers/md/raid5.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index dc031d42f53b..26e1e8a5e941 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
>         int dd_idx;
>
>         for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> -               if (dd_idx == sh->pd_idx)
> +               if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
>                         continue;
>
>                 min_sector = min(min_sector, sh->dev[dd_idx].sector);
> -               max_sector = min(max_sector, sh->dev[dd_idx].sector);
> +               max_sector = max(max_sector, sh->dev[dd_idx].sector);
>         }
>
>         spin_lock_irq(&conf->device_lock);
> --
> 2.43.0
>
>
Laurence Oberman Nov. 28, 2023, 6:50 p.m. UTC | #2
On Tue, 2023-11-28 at 13:11 -0500, David Jeffery wrote:
> During a reshape of a RAID6 array, such as expanding by adding an
> additional disk, I/Os to the region of the array which has not yet
> been reshaped can stall indefinitely. This is caused by errors in the
> stripe_ahead_of_reshape function which make md think the I/O targets
> a region actively undergoing the reshape.
> 
> stripe_ahead_of_reshape fails to account for the q disk having a
> sector value of 0. Because the q disk is not excluded from the for
> loop, raid6 always generates a min_sector value of 0, producing a
> return value which stalls the I/O.
> 
> The function's max_sector calculation also uses min() when it should
> use max(), causing the max_sector value to always be 0. During a
> backwards rebuild this can cause the opposite problem, allowing I/O
> to advance when it should wait.
> 
> Fixing these errors allows safe I/O to advance in a timely manner and
> delays only I/O which is unsafe due to stripes in the middle of the
> reshape.
> 
> Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> ---
>  drivers/md/raid5.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index dc031d42f53b..26e1e8a5e941 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
>         int dd_idx;
>  
>         for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> -               if (dd_idx == sh->pd_idx)
> +               if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
>                         continue;
>  
>                 min_sector = min(min_sector, sh->dev[dd_idx].sector);
> -               max_sector = min(max_sector, sh->dev[dd_idx].sector);
> +               max_sector = max(max_sector, sh->dev[dd_idx].sector);
>         }
>  
>         spin_lock_irq(&conf->device_lock);

Nice work David as usual, this issue has lurked for a while it seems.
Thanks for all you do for Red Hat and upstream.

Regards
Laurence
Laurence Oberman Nov. 28, 2023, 7:01 p.m. UTC | #3
On Tue, 2023-11-28 at 10:44 -0800, Song Liu wrote:
> Hi David and Laurence,
> 
> On Tue, Nov 28, 2023 at 10:13 AM David Jeffery <djeffery@redhat.com>
> wrote:
> > 
> > During a reshape of a RAID6 array, such as expanding by adding an
> > additional disk, I/Os to the region of the array which has not yet
> > been reshaped can stall indefinitely. This is caused by errors in
> > the stripe_ahead_of_reshape function which make md think the I/O
> > targets a region actively undergoing the reshape.
> > 
> > stripe_ahead_of_reshape fails to account for the q disk having a
> > sector value of 0. Because the q disk is not excluded from the for
> > loop, raid6 always generates a min_sector value of 0, producing a
> > return value which stalls the I/O.
> > 
> > The function's max_sector calculation also uses min() when it should
> > use max(), causing the max_sector value to always be 0. During a
> > backwards rebuild this can cause the opposite problem, allowing I/O
> > to advance when it should wait.
> > 
> > Fixing these errors allows safe I/O to advance in a timely manner
> > and delays only I/O which is unsafe due to stripes in the middle of
> > the reshape.
> > 
> > Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> > Signed-off-by: David Jeffery <djeffery@redhat.com>
> > Tested-by: Laurence Oberman <loberman@redhat.com>
> 
> Thanks for the fix!
> 
> Can we add a test to mdadm/tests to cover this case? (Not sure how
> reliable the test will be).
> 
> Song
> 
> > ---
> >  drivers/md/raid5.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index dc031d42f53b..26e1e8a5e941 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
> >         int dd_idx;
> > 
> >         for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> > -               if (dd_idx == sh->pd_idx)
> > +               if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
> >                         continue;
> > 
> >                 min_sector = min(min_sector, sh->dev[dd_idx].sector);
> > -               max_sector = min(max_sector, sh->dev[dd_idx].sector);
> > +               max_sector = max(max_sector, sh->dev[dd_idx].sector);
> >         }
> > 
> >         spin_lock_irq(&conf->device_lock);
> > --
> > 2.43.0
> > 
> > 
> 
Hello Song

The way this was discovered is that the customer forced a reshape of a
raid6 device by adding an additional device. The reshape then sees the
hang.

# mdadm /dev/md0 -a /dev/nvme18n1
mdadm: added /dev/nvme18n1

# mdadm --grow --raid-devices=6 /dev/md0

The problem was that we needed a test kernel just for the customer to
recover, because on reboot the reshape starts again and hangs access to
the device.

Such steps could be added as a test, but David may know an easier way
to test this.

Regards
Laurence
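
Following up on Song's question about an mdadm/tests coverage, here is a
rough sketch of what a reproduction script might look like, built from
Laurence's steps above. The loop-device names, file paths, sizes, and
mount point are all hypothetical, and this is not a verified mdadm/tests
case; as Song notes, whether the stall window is hit reliably on small
devices is uncertain:

#!/bin/bash
# Rough reproduction sketch; device names and sizes are hypothetical.
set -e

# Back five members plus one spare with small loop devices.
for i in 0 1 2 3 4 5; do
	dd if=/dev/zero of=/tmp/md-member$i bs=1M count=256
	losetup "/dev/loop$i" "/tmp/md-member$i"
done

# Create a 5-disk RAID6 array and put a filesystem on it.
mdadm --create --run /dev/md0 --level=6 --raid-devices=5 \
	/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt

# Add the sixth device and start the reshape, as in the report.
mdadm /dev/md0 -a /dev/loop5
mdadm --grow --raid-devices=6 /dev/md0

# On an affected kernel, I/O aimed at the not-yet-reshaped region
# can stall indefinitely; on a fixed kernel it completes.
dd if=/dev/zero of=/mnt/testfile bs=1M count=64 oflag=direct
echo "I/O completed; no stall observed"

A real test would also need cleanup (unmount, mdadm --stop, losetup -d)
and a timeout around the final dd so a hang fails the test rather than
blocking the harness.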

Patch

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc031d42f53b..26e1e8a5e941 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5892,11 +5892,11 @@  static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
 	int dd_idx;
 
 	for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
-		if (dd_idx == sh->pd_idx)
+		if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
 			continue;
 
 		min_sector = min(min_sector, sh->dev[dd_idx].sector);
-		max_sector = min(max_sector, sh->dev[dd_idx].sector);
+		max_sector = max(max_sector, sh->dev[dd_idx].sector);
 	}
 
 	spin_lock_irq(&conf->device_lock);