Message ID | 20240312233006.2461827-1-david@fromorbit.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xfs: allow sunit mount option to repair bad primary sb stripe values | expand |
On Wed, Mar 13, 2024 at 10:30:06AM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > If a filesystem has a busted stripe alignment configuration on disk > (e.g. because broken RAID firmware told mkfs that swidth was smaller > than sunit), then the filesystem will refuse to mount due to the > stripe validation failing. This failure is triggering during distro > upgrades from old kernels lacking this check to newer kernels with > this check, and currently the only way to fix it is with offline > xfs_db surgery. > > This runtime validity checking occurs when we read the superblock > for the first time and causes the mount to fail immediately. This > prevents the rewrite of stripe unit/width via > mount options that occurs later in the mount process. Hence there is > no way to recover this situation without resorting to offline xfs_db > rewrite of the values. > > However, we parse the mount options long before we read the > superblock, and we know if the mount has been asked to re-write the > stripe alignment configuration when we are reading the superblock > and verifying it for the first time. Hence we can conditionally > ignore stripe verification failures if the mount options specified > will correct the issue. > > We validate that the new stripe unit/width are valid before we > overwrite the superblock values, so we can ignore the invalid config > at verification and fail the mount later if the new values are not > valid. This, at least, gives users the chance of correcting the > issue after a kernel upgrade without having to resort to xfs-db > hacks. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/libxfs/xfs_sb.c | 40 +++++++++++++++++++++++++++++++--------- > fs/xfs/libxfs/xfs_sb.h | 3 ++- > 2 files changed, 33 insertions(+), 10 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > index d991eec05436..f51b1efa2cae 100644 > --- a/fs/xfs/libxfs/xfs_sb.c > +++ b/fs/xfs/libxfs/xfs_sb.c > @@ -530,7 +530,8 @@ xfs_validate_sb_common( > } > > if (!xfs_validate_stripe_geometry(mp, XFS_FSB_TO_B(mp, sbp->sb_unit), > - XFS_FSB_TO_B(mp, sbp->sb_width), 0, false)) > + XFS_FSB_TO_B(mp, sbp->sb_width), 0, > + xfs_buf_daddr(bp) == XFS_SB_DADDR, false)) > return -EFSCORRUPTED; > > /* > @@ -1323,8 +1324,10 @@ xfs_sb_get_secondary( > } > > /* > - * sunit, swidth, sectorsize(optional with 0) should be all in bytes, > - * so users won't be confused by values in error messages. > + * sunit, swidth, sectorsize(optional with 0) should be all in bytes, so users > + * won't be confused by values in error messages. This returns false if a value > + * is invalid and it is not the primary superblock that going to be corrected > + * later in the mount process. Hmm, I found this last sentence a little confusing. How about: "This function returns false if the stripe geometry is invalid and no attempt will be made to correct it later in the mount process." > */ > bool > xfs_validate_stripe_geometry( > @@ -1332,20 +1335,21 @@ xfs_validate_stripe_geometry( > __s64 sunit, > __s64 swidth, > int sectorsize, > + bool primary_sb, > bool silent) > { > if (swidth > INT_MAX) { > if (!silent) > xfs_notice(mp, > "stripe width (%lld) is too large", swidth); > - return false; > + goto check_override; > } > > if (sunit > swidth) { > if (!silent) > xfs_notice(mp, > "stripe unit (%lld) is larger than the stripe width (%lld)", sunit, swidth); > - return false; > + goto check_override; > } > > if (sectorsize && (int)sunit % sectorsize) { > @@ -1353,21 +1357,21 @@ xfs_validate_stripe_geometry( > xfs_notice(mp, > "stripe unit (%lld) must be a multiple of the sector size (%d)", > sunit, sectorsize); > - return false; > + goto check_override; > } > > if (sunit && !swidth) { > if (!silent) > xfs_notice(mp, > "invalid stripe unit (%lld) and stripe width of 0", sunit); > - return false; > + goto check_override; > } > > if (!sunit && swidth) { > if (!silent) > xfs_notice(mp, > "invalid stripe width (%lld) and stripe unit of 0", swidth); > - return false; > + goto check_override; > } > > if (sunit && (int)swidth % (int)sunit) { > @@ -1375,9 +1379,27 @@ xfs_validate_stripe_geometry( > xfs_notice(mp, > "stripe width (%lld) must be a multiple of the stripe unit (%lld)", > swidth, sunit); > - return false; > + goto check_override; > } > return true; > + > +check_override: > + if (!primary_sb) > + return false; > + /* > + * During mount, mp->m_dalign will not be set unless the sunit mount > + * option was set. If it was set, ignore the bad stripe alignment values > + * and allow the validation and overwrite later in the mount process to > + * attempt to overwrite the bad stripe alignment values with the values > + * supplied by mount options. What catches the case of if m_dalign/m_swidth also being garbage values? Is it xfs_check_new_dalign? Should that fail the mount if the replacement values are also garbage? > + */ > + if (!mp->m_dalign) > + return false; > + if (!silent) > + xfs_notice(mp, > +"Will try to correct with specified mount options sunit (%d) and swidth (%d)", > + BBTOB(mp->m_dalign), BBTOB(mp->m_swidth)); > + return true; > } > > /* > diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h > index 67a40069724c..58798b9c70ba 100644 > --- a/fs/xfs/libxfs/xfs_sb.h > +++ b/fs/xfs/libxfs/xfs_sb.h > @@ -35,7 +35,8 @@ extern int xfs_sb_get_secondary(struct xfs_mount *mp, > struct xfs_buf **bpp); > > extern bool xfs_validate_stripe_geometry(struct xfs_mount *mp, This declaration might as well lose the extern here too. > - __s64 sunit, __s64 swidth, int sectorsize, bool silent); > + __s64 sunit, __s64 swidth, int sectorsize, bool primary_sb, > + bool silent); What should value for @primary_sb should mkfs pass into xfs_validate_stripe_geometry from calc_stripe_factors? --D > > uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents); > > -- > 2.43.0 > >
On Tue, Mar 12, 2024 at 09:56:34PM -0700, Darrick J. Wong wrote: > On Wed, Mar 13, 2024 at 10:30:06AM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > If a filesystem has a busted stripe alignment configuration on disk > > (e.g. because broken RAID firmware told mkfs that swidth was smaller > > than sunit), then the filesystem will refuse to mount due to the > > stripe validation failing. This failure is triggering during distro > > upgrades from old kernels lacking this check to newer kernels with > > this check, and currently the only way to fix it is with offline > > xfs_db surgery. > > > > This runtime validity checking occurs when we read the superblock > > for the first time and causes the mount to fail immediately. This > > prevents the rewrite of stripe unit/width via > > mount options that occurs later in the mount process. Hence there is > > no way to recover this situation without resorting to offline xfs_db > > rewrite of the values. > > > > However, we parse the mount options long before we read the > > superblock, and we know if the mount has been asked to re-write the > > stripe alignment configuration when we are reading the superblock > > and verifying it for the first time. Hence we can conditionally > > ignore stripe verification failures if the mount options specified > > will correct the issue. > > > > We validate that the new stripe unit/width are valid before we > > overwrite the superblock values, so we can ignore the invalid config > > at verification and fail the mount later if the new values are not > > valid. This, at least, gives users the chance of correcting the > > issue after a kernel upgrade without having to resort to xfs-db > > hacks. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > fs/xfs/libxfs/xfs_sb.c | 40 +++++++++++++++++++++++++++++++--------- > > fs/xfs/libxfs/xfs_sb.h | 3 ++- > > 2 files changed, 33 insertions(+), 10 deletions(-) > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > > index d991eec05436..f51b1efa2cae 100644 > > --- a/fs/xfs/libxfs/xfs_sb.c > > +++ b/fs/xfs/libxfs/xfs_sb.c > > @@ -530,7 +530,8 @@ xfs_validate_sb_common( > > } > > > > if (!xfs_validate_stripe_geometry(mp, XFS_FSB_TO_B(mp, sbp->sb_unit), > > - XFS_FSB_TO_B(mp, sbp->sb_width), 0, false)) > > + XFS_FSB_TO_B(mp, sbp->sb_width), 0, > > + xfs_buf_daddr(bp) == XFS_SB_DADDR, false)) > > return -EFSCORRUPTED; > > > > /* > > @@ -1323,8 +1324,10 @@ xfs_sb_get_secondary( > > } > > > > /* > > - * sunit, swidth, sectorsize(optional with 0) should be all in bytes, > > - * so users won't be confused by values in error messages. > > + * sunit, swidth, sectorsize(optional with 0) should be all in bytes, so users > > + * won't be confused by values in error messages. This returns false if a value > > + * is invalid and it is not the primary superblock that going to be corrected > > + * later in the mount process. > > Hmm, I found this last sentence a little confusing. How about: > > "This function returns false if the stripe geometry is invalid and no > attempt will be made to correct it later in the mount process." Sure. ..... > > @@ -1375,9 +1379,27 @@ xfs_validate_stripe_geometry( > > xfs_notice(mp, > > "stripe width (%lld) must be a multiple of the stripe unit (%lld)", > > swidth, sunit); > > - return false; > > + goto check_override; > > } > > return true; > > + > > +check_override: > > + if (!primary_sb) > > + return false; > > + /* > > + * During mount, mp->m_dalign will not be set unless the sunit mount > > + * option was set. If it was set, ignore the bad stripe alignment values > > + * and allow the validation and overwrite later in the mount process to > > + * attempt to overwrite the bad stripe alignment values with the values > > + * supplied by mount options. > > What catches the case of if m_dalign/m_swidth also being garbage values? > Is it xfs_check_new_dalign? Should that fail the mount if the > replacement values are also garbage? xfs_validate_new_dalign(). It returns -EINVAL is the new striped alignment cannot be used (i.e. not aligned to block or ag sizes) and that causes the mount to fail. > > diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h > > index 67a40069724c..58798b9c70ba 100644 > > --- a/fs/xfs/libxfs/xfs_sb.h > > +++ b/fs/xfs/libxfs/xfs_sb.h > > @@ -35,7 +35,8 @@ extern int xfs_sb_get_secondary(struct xfs_mount *mp, > > struct xfs_buf **bpp); > > > > extern bool xfs_validate_stripe_geometry(struct xfs_mount *mp, > > This declaration might as well lose the extern here too. > > > - __s64 sunit, __s64 swidth, int sectorsize, bool silent); > > + __s64 sunit, __s64 swidth, int sectorsize, bool primary_sb, > > + bool silent); > > What should value for @primary_sb should mkfs pass into > xfs_validate_stripe_geometry from calc_stripe_factors? Userspace problem, really. i.e. mkfs is already abusing this function by passing a NULL xfs_mount and so will crash if the function tries to dereference mp. Hence it can pass false for "primary_sb" so it doesn't run any of the kernel side primary sb recovery code that dereferences mp because it doesn't need to (can't!) run it. -Dave.
On Wed, Mar 13, 2024 at 05:05:16PM +1100, Dave Chinner wrote: > On Tue, Mar 12, 2024 at 09:56:34PM -0700, Darrick J. Wong wrote: > > On Wed, Mar 13, 2024 at 10:30:06AM +1100, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > If a filesystem has a busted stripe alignment configuration on disk > > > (e.g. because broken RAID firmware told mkfs that swidth was smaller > > > than sunit), then the filesystem will refuse to mount due to the > > > stripe validation failing. This failure is triggering during distro > > > upgrades from old kernels lacking this check to newer kernels with > > > this check, and currently the only way to fix it is with offline > > > xfs_db surgery. > > > > > > This runtime validity checking occurs when we read the superblock > > > for the first time and causes the mount to fail immediately. This > > > prevents the rewrite of stripe unit/width via > > > mount options that occurs later in the mount process. Hence there is > > > no way to recover this situation without resorting to offline xfs_db > > > rewrite of the values. > > > > > > However, we parse the mount options long before we read the > > > superblock, and we know if the mount has been asked to re-write the > > > stripe alignment configuration when we are reading the superblock > > > and verifying it for the first time. Hence we can conditionally > > > ignore stripe verification failures if the mount options specified > > > will correct the issue. > > > > > > We validate that the new stripe unit/width are valid before we > > > overwrite the superblock values, so we can ignore the invalid config > > > at verification and fail the mount later if the new values are not > > > valid. This, at least, gives users the chance of correcting the > > > issue after a kernel upgrade without having to resort to xfs-db > > > hacks. > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > --- > > > fs/xfs/libxfs/xfs_sb.c | 40 +++++++++++++++++++++++++++++++--------- > > > fs/xfs/libxfs/xfs_sb.h | 3 ++- > > > 2 files changed, 33 insertions(+), 10 deletions(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > > > index d991eec05436..f51b1efa2cae 100644 > > > --- a/fs/xfs/libxfs/xfs_sb.c > > > +++ b/fs/xfs/libxfs/xfs_sb.c > > > @@ -530,7 +530,8 @@ xfs_validate_sb_common( > > > } > > > > > > if (!xfs_validate_stripe_geometry(mp, XFS_FSB_TO_B(mp, sbp->sb_unit), > > > - XFS_FSB_TO_B(mp, sbp->sb_width), 0, false)) > > > + XFS_FSB_TO_B(mp, sbp->sb_width), 0, > > > + xfs_buf_daddr(bp) == XFS_SB_DADDR, false)) > > > return -EFSCORRUPTED; > > > > > > /* > > > @@ -1323,8 +1324,10 @@ xfs_sb_get_secondary( > > > } > > > > > > /* > > > - * sunit, swidth, sectorsize(optional with 0) should be all in bytes, > > > - * so users won't be confused by values in error messages. > > > + * sunit, swidth, sectorsize(optional with 0) should be all in bytes, so users > > > + * won't be confused by values in error messages. This returns false if a value > > > + * is invalid and it is not the primary superblock that going to be corrected > > > + * later in the mount process. > > > > Hmm, I found this last sentence a little confusing. How about: > > > > "This function returns false if the stripe geometry is invalid and no > > attempt will be made to correct it later in the mount process." > > Sure. > > ..... > > > @@ -1375,9 +1379,27 @@ xfs_validate_stripe_geometry( > > > xfs_notice(mp, > > > "stripe width (%lld) must be a multiple of the stripe unit (%lld)", > > > swidth, sunit); > > > - return false; > > > + goto check_override; > > > } > > > return true; > > > + > > > +check_override: > > > + if (!primary_sb) > > > + return false; > > > + /* > > > + * During mount, mp->m_dalign will not be set unless the sunit mount > > > + * option was set. If it was set, ignore the bad stripe alignment values > > > + * and allow the validation and overwrite later in the mount process to > > > + * attempt to overwrite the bad stripe alignment values with the values > > > + * supplied by mount options. > > > > What catches the case of if m_dalign/m_swidth also being garbage values? > > Is it xfs_check_new_dalign? Should that fail the mount if the > > replacement values are also garbage? > > xfs_validate_new_dalign(). > > It returns -EINVAL is the new striped alignment cannot be used (i.e. > not aligned to block or ag sizes) and that causes the mount to fail. Ok good. > > > diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h > > > index 67a40069724c..58798b9c70ba 100644 > > > --- a/fs/xfs/libxfs/xfs_sb.h > > > +++ b/fs/xfs/libxfs/xfs_sb.h > > > @@ -35,7 +35,8 @@ extern int xfs_sb_get_secondary(struct xfs_mount *mp, > > > struct xfs_buf **bpp); > > > > > > extern bool xfs_validate_stripe_geometry(struct xfs_mount *mp, > > > > This declaration might as well lose the extern here too. > > > > > - __s64 sunit, __s64 swidth, int sectorsize, bool silent); > > > + __s64 sunit, __s64 swidth, int sectorsize, bool primary_sb, > > > + bool silent); > > > > What should value for @primary_sb should mkfs pass into > > xfs_validate_stripe_geometry from calc_stripe_factors? > > Userspace problem, really. i.e. mkfs is already abusing this > function by passing a NULL xfs_mount and so will crash if the > function tries to dereference mp. Hence it can pass false for > "primary_sb" so it doesn't run any of the kernel side primary sb > recovery code that dereferences mp because it doesn't need to > (can't!) run it. At least it's mkfs so there's nothing /to/ recover ;) I've vaguely wondered if this function ought to drop the @mp argument and instead pass back an error enum, and then the caller can decide if it wants to emit a message or do something else. That would be a worthwhile cleanup for the lurking logic bomb in mkfs; maybe I'll try to code something up later. --D > -Dave. > -- > Dave Chinner > david@fromorbit.com >
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index d991eec05436..f51b1efa2cae 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -530,7 +530,8 @@ xfs_validate_sb_common( } if (!xfs_validate_stripe_geometry(mp, XFS_FSB_TO_B(mp, sbp->sb_unit), - XFS_FSB_TO_B(mp, sbp->sb_width), 0, false)) + XFS_FSB_TO_B(mp, sbp->sb_width), 0, + xfs_buf_daddr(bp) == XFS_SB_DADDR, false)) return -EFSCORRUPTED; /* @@ -1323,8 +1324,10 @@ xfs_sb_get_secondary( } /* - * sunit, swidth, sectorsize(optional with 0) should be all in bytes, - * so users won't be confused by values in error messages. + * sunit, swidth, sectorsize(optional with 0) should be all in bytes, so users + * won't be confused by values in error messages. This returns false if a value + * is invalid and it is not the primary superblock that going to be corrected + * later in the mount process. */ bool xfs_validate_stripe_geometry( @@ -1332,20 +1335,21 @@ xfs_validate_stripe_geometry( __s64 sunit, __s64 swidth, int sectorsize, + bool primary_sb, bool silent) { if (swidth > INT_MAX) { if (!silent) xfs_notice(mp, "stripe width (%lld) is too large", swidth); - return false; + goto check_override; } if (sunit > swidth) { if (!silent) xfs_notice(mp, "stripe unit (%lld) is larger than the stripe width (%lld)", sunit, swidth); - return false; + goto check_override; } if (sectorsize && (int)sunit % sectorsize) { @@ -1353,21 +1357,21 @@ xfs_validate_stripe_geometry( xfs_notice(mp, "stripe unit (%lld) must be a multiple of the sector size (%d)", sunit, sectorsize); - return false; + goto check_override; } if (sunit && !swidth) { if (!silent) xfs_notice(mp, "invalid stripe unit (%lld) and stripe width of 0", sunit); - return false; + goto check_override; } if (!sunit && swidth) { if (!silent) xfs_notice(mp, "invalid stripe width (%lld) and stripe unit of 0", swidth); - return false; + goto check_override; } if (sunit && (int)swidth % (int)sunit) { @@ -1375,9 +1379,27 @@ xfs_validate_stripe_geometry( xfs_notice(mp, "stripe width (%lld) must be a multiple of the stripe unit (%lld)", swidth, sunit); - return false; + goto check_override; } return true; + +check_override: + if (!primary_sb) + return false; + /* + * During mount, mp->m_dalign will not be set unless the sunit mount + * option was set. If it was set, ignore the bad stripe alignment values + * and allow the validation and overwrite later in the mount process to + * attempt to overwrite the bad stripe alignment values with the values + * supplied by mount options. + */ + if (!mp->m_dalign) + return false; + if (!silent) + xfs_notice(mp, +"Will try to correct with specified mount options sunit (%d) and swidth (%d)", + BBTOB(mp->m_dalign), BBTOB(mp->m_swidth)); + return true; } /* diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h index 67a40069724c..58798b9c70ba 100644 --- a/fs/xfs/libxfs/xfs_sb.h +++ b/fs/xfs/libxfs/xfs_sb.h @@ -35,7 +35,8 @@ extern int xfs_sb_get_secondary(struct xfs_mount *mp, struct xfs_buf **bpp); extern bool xfs_validate_stripe_geometry(struct xfs_mount *mp, - __s64 sunit, __s64 swidth, int sectorsize, bool silent); + __s64 sunit, __s64 swidth, int sectorsize, bool primary_sb, + bool silent); uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);