Message ID | 20231128181233.6187-1-djeffery@redhat.com
---|---
State | Accepted, archived
Series | md/raid6: use valid sector values to determine if an I/O should wait on the reshape
Hi David and Laurence,

On Tue, Nov 28, 2023 at 10:13 AM David Jeffery <djeffery@redhat.com> wrote:
>
> During a reshape of a RAID6 array, such as expanding it by adding an
> additional disk, I/Os to the region of the array which has not yet been
> reshaped can stall indefinitely. This is caused by errors in the
> stripe_ahead_of_reshape function which make md think the I/O targets a
> region actively undergoing the reshape.
>
> stripe_ahead_of_reshape fails to account for the q disk having a sector
> value of 0. By not excluding the q disk from the for loop, raid6 will
> always generate a min_sector value of 0, causing a return value which
> stalls.
>
> The function's max_sector calculation also uses min() when it should use
> max(), causing the max_sector value to always be 0. During a backwards
> rebuild this can cause the opposite problem, where it allows I/O to
> advance when it should wait.
>
> Fixing these errors allows safe I/O to advance in a timely manner and
> delays only I/O which is unsafe because its stripes are in the middle of
> the reshape.
>
> Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>

Thanks for the fix!

Can we add a test to mdadm/tests to cover this case? (Not sure how
reliable the test will be.)

Song

> ---
>  drivers/md/raid5.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index dc031d42f53b..26e1e8a5e941 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
>  	int dd_idx;
>
>  	for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> -		if (dd_idx == sh->pd_idx)
> +		if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
>  			continue;
>
>  		min_sector = min(min_sector, sh->dev[dd_idx].sector);
> -		max_sector = min(max_sector, sh->dev[dd_idx].sector);
> +		max_sector = max(max_sector, sh->dev[dd_idx].sector);
>  	}
>
>  	spin_lock_irq(&conf->device_lock);
> --
> 2.43.0
>
On Tue, 2023-11-28 at 13:11 -0500, David Jeffery wrote:
> During a reshape of a RAID6 array, such as expanding it by adding an
> additional disk, I/Os to the region of the array which has not yet been
> reshaped can stall indefinitely. This is caused by errors in the
> stripe_ahead_of_reshape function which make md think the I/O targets a
> region actively undergoing the reshape.
>
> stripe_ahead_of_reshape fails to account for the q disk having a sector
> value of 0. By not excluding the q disk from the for loop, raid6 will
> always generate a min_sector value of 0, causing a return value which
> stalls.
>
> The function's max_sector calculation also uses min() when it should use
> max(), causing the max_sector value to always be 0. During a backwards
> rebuild this can cause the opposite problem, where it allows I/O to
> advance when it should wait.
>
> Fixing these errors allows safe I/O to advance in a timely manner and
> delays only I/O which is unsafe because its stripes are in the middle of
> the reshape.
>
> Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> ---
>  drivers/md/raid5.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index dc031d42f53b..26e1e8a5e941 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
>  	int dd_idx;
>
>  	for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> -		if (dd_idx == sh->pd_idx)
> +		if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
>  			continue;
>
>  		min_sector = min(min_sector, sh->dev[dd_idx].sector);
> -		max_sector = min(max_sector, sh->dev[dd_idx].sector);
> +		max_sector = max(max_sector, sh->dev[dd_idx].sector);
>  	}
>
>  	spin_lock_irq(&conf->device_lock);

Nice work as usual, David; this issue seems to have lurked for a while.
Thanks for all you do for Red Hat and upstream.

Regards
Laurence
On Tue, 2023-11-28 at 10:44 -0800, Song Liu wrote:
> Hi David and Laurence,
>
> On Tue, Nov 28, 2023 at 10:13 AM David Jeffery <djeffery@redhat.com> wrote:
> >
> > During a reshape of a RAID6 array, such as expanding it by adding an
> > additional disk, I/Os to the region of the array which has not yet been
> > reshaped can stall indefinitely. This is caused by errors in the
> > stripe_ahead_of_reshape function which make md think the I/O targets a
> > region actively undergoing the reshape.
> >
> > stripe_ahead_of_reshape fails to account for the q disk having a sector
> > value of 0. By not excluding the q disk from the for loop, raid6 will
> > always generate a min_sector value of 0, causing a return value which
> > stalls.
> >
> > The function's max_sector calculation also uses min() when it should use
> > max(), causing the max_sector value to always be 0. During a backwards
> > rebuild this can cause the opposite problem, where it allows I/O to
> > advance when it should wait.
> >
> > Fixing these errors allows safe I/O to advance in a timely manner and
> > delays only I/O which is unsafe because its stripes are in the middle of
> > the reshape.
> >
> > Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> > Signed-off-by: David Jeffery <djeffery@redhat.com>
> > Tested-by: Laurence Oberman <loberman@redhat.com>
>
> Thanks for the fix!
>
> Can we add a test to mdadm/tests to cover this case? (Not sure how
> reliable the test will be.)
>
> Song
>
> > ---
> >  drivers/md/raid5.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index dc031d42f53b..26e1e8a5e941 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
> >  	int dd_idx;
> >
> >  	for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> > -		if (dd_idx == sh->pd_idx)
> > +		if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
> >  			continue;
> >
> >  		min_sector = min(min_sector, sh->dev[dd_idx].sector);
> > -		max_sector = min(max_sector, sh->dev[dd_idx].sector);
> > +		max_sector = max(max_sector, sh->dev[dd_idx].sector);
> >  	}
> >
> >  	spin_lock_irq(&conf->device_lock);
> > --
> > 2.43.0
> >

Hello Song,

The way this was discovered is that the customer forced a reshape of a
raid6 device by adding an additional device. The reshape then sees the
hang.

# mdadm /dev/md0 -a /dev/nvme18n1
mdadm: added /dev/nvme18n1
# mdadm --grow --raid-devices=6 /dev/md0

The problem was that we needed a test kernel just so the customer could
recover, because on reboot the reshape starts again and hangs access to
the device.

Such steps could be added as a test, but David may know an easier way to
test this.

Regards
Laurence
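[Editor's note: for Song's question about coverage in mdadm/tests, the following is a rough, untested sketch of what such a test could look like, built on loop devices rather than the customer's NVMe drives. The device names, sizes, grow target, and timeout are illustrative assumptions, not taken from the report; the check only asserts that I/O issued while the reshape is running completes instead of stalling, and it may not reproduce the hang reliably on every configuration.]

#!/bin/bash
# Sketch only: exercise I/O against a RAID6 array while it is being grown.
set -e

# Create five small loop-backed scratch devices.
devs=()
for i in 0 1 2 3 4; do
	truncate -s 200M "/tmp/raid6_$i.img"
	devs+=("$(losetup -f --show "/tmp/raid6_$i.img")")
done

# Build a minimal 4-disk RAID6 array and put a filesystem on it.
mdadm --create /dev/md0 --level=6 --raid-devices=4 --run "${devs[@]:0:4}"
mkfs.ext4 -q /dev/md0
mkdir -p /mnt/md0test
mount /dev/md0 /mnt/md0test

# Kick off the reshape: add a fifth disk and grow the array, mirroring the
# reproduction steps above.
mdadm /dev/md0 -a "${devs[4]}"
mdadm --grow --raid-devices=5 /dev/md0

# I/O issued during the reshape should complete promptly. Without the fix it
# can stall indefinitely, so bound it with a timeout and fail loudly.
if ! timeout 120 dd if=/dev/zero of=/mnt/md0test/testfile bs=1M count=64 \
	oflag=direct conv=fsync; then
	echo "FAIL: I/O stalled during RAID6 reshape"
	exit 1
fi
echo "PASS"

A real test would presumably use the mdadm/tests harness helpers for creating and tearing down scratch arrays rather than raw losetup and mount calls, and would clean up the loop devices afterwards.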
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc031d42f53b..26e1e8a5e941 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
 	int dd_idx;
 
 	for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
-		if (dd_idx == sh->pd_idx)
+		if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
 			continue;
 
 		min_sector = min(min_sector, sh->dev[dd_idx].sector);
-		max_sector = min(max_sector, sh->dev[dd_idx].sector);
+		max_sector = max(max_sector, sh->dev[dd_idx].sector);
 	}
 
 	spin_lock_irq(&conf->device_lock);