diff mbox series

[1/3] xfs_repair: detect v5 featureset mismatches in secondary supers

Message ID 165176675148.248791.14783205262181556770.stgit@magnolia (mailing list archive)
State Accepted
Headers show
Series xfs_repair: various small fixes | expand

Commit Message

Darrick J. Wong May 5, 2022, 4:05 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Make sure we detect and correct mismatches between the V5 features
described in the primary and the secondary superblocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/agheader.c |   88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

Comments

Eric Sandeen May 12, 2022, 7:02 p.m. UTC | #1
On 5/5/22 11:05 AM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Make sure we detect and correct mismatches between the V5 features
> described in the primary and the secondary superblocks.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---


> +	if ((mp->m_sb.sb_features_incompat ^ sb->sb_features_incompat) &
> +			~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR) {

I'd like to add a comment about why XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR is special.
(Why is XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR special? Just because userspace doesn't
bother to set it on all superblocks in the upgrade paths, right?)

-Eric
Darrick J. Wong May 12, 2022, 7:13 p.m. UTC | #2
On Thu, May 12, 2022 at 02:02:33PM -0500, Eric Sandeen wrote:
> On 5/5/22 11:05 AM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Make sure we detect and correct mismatches between the V5 features
> > described in the primary and the secondary superblocks.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> 
> 
> > +	if ((mp->m_sb.sb_features_incompat ^ sb->sb_features_incompat) &
> > +			~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR) {
> 
> I'd like to add a comment about why XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR is special.
> (Why is XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR special? Just because userspace doesn't
> bother to set it on all superblocks in the upgrade paths, right?)

Right -- we only set it on the primary super to force users to run
xfs_repair.  Repair will clear NEEDSREPAIR on the primary and all
secondaries before it exits, so there's no point in complaining about
discrepancies in that particular feature bit.

--D

> -Eric
>
Dave Chinner May 16, 2022, 8:11 a.m. UTC | #3
On Thu, May 12, 2022 at 12:13:29PM -0700, Darrick J. Wong wrote:
> On Thu, May 12, 2022 at 02:02:33PM -0500, Eric Sandeen wrote:
> > On 5/5/22 11:05 AM, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Make sure we detect and correct mismatches between the V5 features
> > > described in the primary and the secondary superblocks.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > 
> > 
> > > +	if ((mp->m_sb.sb_features_incompat ^ sb->sb_features_incompat) &
> > > +			~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR) {
> > 
> > I'd like to add a comment about why XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR is special.
> > (Why is XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR special? Just because userspace doesn't
> > bother to set it on all superblocks in the upgrade paths, right?)
> 
> Right -- we only set it on the primary super to force users to run
> xfs_repair.  Repair will clear NEEDSREPAIR on the primary and all
> secondaries before it exits

Oh no it doesn't! :)

I just debugged the persistent xfs/158 failure down to repair only
writing the secondary superblocks with "features_changed" is true.

xfs/158 trashes the repair process after setting the inobtcnt and
needrepair feature bits in the primary superblock. That's the only
code that sets the internal "features_changed" flag that triggers
secondary superblock writeback. If repair then dies after this it
won't get set on the next run without the upgrade options set on the
command line. Hence we re-run repair without the upgrade feature
being enabled to check that it still recovers correctly.

The result is that repair does the right thing in completing the
feature upgrade because it sees the feature flag in the primary
superblock, but it *never sets "features_changed"* and so the
secondary superblocks are not updated when it is done. It then goes
on to check NEEDSREPAIR in the primary superblock and clears it,
allowing the fileystem to mount again.

This results in secondary superblocks with in-progress set and
mis-matched feature bits, resulting in xfs_scrub reporting corrupt
secondary superblocks and so failing the test.

This change will probably mask that specific bug - it'll still be
lying their waiting to pounce on some unsuspecting bystander, but it
will be masked by other code detecting the feature mismatch that it
causes and correcting it that way...

Cheers,

Dave.
Darrick J. Wong May 16, 2022, 5:59 p.m. UTC | #4
On Mon, May 16, 2022 at 06:11:53PM +1000, Dave Chinner wrote:
> On Thu, May 12, 2022 at 12:13:29PM -0700, Darrick J. Wong wrote:
> > On Thu, May 12, 2022 at 02:02:33PM -0500, Eric Sandeen wrote:
> > > On 5/5/22 11:05 AM, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Make sure we detect and correct mismatches between the V5 features
> > > > described in the primary and the secondary superblocks.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > 
> > > 
> > > > +	if ((mp->m_sb.sb_features_incompat ^ sb->sb_features_incompat) &
> > > > +			~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR) {
> > > 
> > > I'd like to add a comment about why XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR is special.
> > > (Why is XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR special? Just because userspace doesn't
> > > bother to set it on all superblocks in the upgrade paths, right?)
> > 
> > Right -- we only set it on the primary super to force users to run
> > xfs_repair.  Repair will clear NEEDSREPAIR on the primary and all
> > secondaries before it exits
> 
> Oh no it doesn't! :)
> 
> I just debugged the persistent xfs/158 failure down to repair only
> writing the secondary superblocks with "features_changed" is true.
> 
> xfs/158 trashes the repair process after setting the inobtcnt and
> needrepair feature bits in the primary superblock. That's the only
> code that sets the internal "features_changed" flag that triggers
> secondary superblock writeback. If repair then dies after this it
> won't get set on the next run without the upgrade options set on the
> command line. Hence we re-run repair without the upgrade feature
> being enabled to check that it still recovers correctly.
> 
> The result is that repair does the right thing in completing the
> feature upgrade because it sees the feature flag in the primary
> superblock, but it *never sets "features_changed"* and so the
> secondary superblocks are not updated when it is done. It then goes
> on to check NEEDSREPAIR in the primary superblock and clears it,
> allowing the fileystem to mount again.
> 
> This results in secondary superblocks with in-progress set and
> mis-matched feature bits, resulting in xfs_scrub reporting corrupt
> secondary superblocks and so failing the test.
> 
> This change will probably mask that specific bug - it'll still be
> lying their waiting to pounce on some unsuspecting bystander, but it
> will be masked by other code detecting the feature mismatch that it
> causes and correcting it that way...

Ah, ok.  You're pointing out that repair needs to set features_changed
right after printing "clearing needsrepair flag and regenerating
metadata", and that this patch sort of papers over the underlying issue.

The genesis of this patch wasn't xfs/158 failing, it was the secondary
sb fuzz testing noticing that xfs_scrub reported a failure whereas
xfs_repair didn't.

So, I'll go add an extra patch to fix the upgrade case, and I think we
can let Eric add this one to his tree.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/repair/agheader.c b/repair/agheader.c
index d8f912f2..650ffb2e 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -220,6 +220,92 @@  compare_sb(xfs_mount_t *mp, xfs_sb_t *sb)
 	return(XR_OK);
 }
 
+/*
+ * If the fs feature bits on a secondary superblock don't match the
+ * primary, we need to update them.
+ */
+static inline int
+check_v5_feature_mismatch(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct xfs_sb		*sb)
+{
+	bool			dirty = false;
+
+	if (!xfs_has_crc(mp) || agno == 0)
+		return 0;
+
+	if (mp->m_sb.sb_features_compat != sb->sb_features_compat) {
+		if (no_modify) {
+			do_warn(
+	_("would fix compat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+					agno, mp->m_sb.sb_features_compat,
+					sb->sb_features_compat);
+		} else {
+			do_warn(
+	_("will fix compat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+					agno, mp->m_sb.sb_features_compat,
+					sb->sb_features_compat);
+			dirty = true;
+		}
+	}
+
+	if ((mp->m_sb.sb_features_incompat ^ sb->sb_features_incompat) &
+			~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR) {
+		if (no_modify) {
+			do_warn(
+	_("would fix incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+					agno, mp->m_sb.sb_features_incompat,
+					sb->sb_features_incompat);
+		} else {
+			do_warn(
+	_("will fix incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+					agno, mp->m_sb.sb_features_incompat,
+					sb->sb_features_incompat);
+			dirty = true;
+		}
+	}
+
+	if (mp->m_sb.sb_features_ro_compat != sb->sb_features_ro_compat) {
+		if (no_modify) {
+			do_warn(
+	_("would fix ro compat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+					agno, mp->m_sb.sb_features_ro_compat,
+					sb->sb_features_ro_compat);
+		} else {
+			do_warn(
+	_("will fix ro compat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+					agno, mp->m_sb.sb_features_ro_compat,
+					sb->sb_features_ro_compat);
+			dirty = true;
+		}
+	}
+
+	if (mp->m_sb.sb_features_log_incompat != sb->sb_features_log_incompat) {
+		if (no_modify) {
+			do_warn(
+	_("would fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+					agno, mp->m_sb.sb_features_log_incompat,
+					sb->sb_features_log_incompat);
+		} else {
+			do_warn(
+	_("will fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+					agno, mp->m_sb.sb_features_log_incompat,
+					sb->sb_features_log_incompat);
+			dirty = true;
+		}
+	}
+
+	if (!dirty)
+		return 0;
+
+	sb->sb_features_compat = mp->m_sb.sb_features_compat;
+	sb->sb_features_ro_compat = mp->m_sb.sb_features_ro_compat;
+	sb->sb_features_incompat = mp->m_sb.sb_features_incompat;
+	sb->sb_features_log_incompat = mp->m_sb.sb_features_log_incompat;
+	return XR_AG_SB_SEC;
+}
+
 /*
  * Possible fields that may have been set at mkfs time,
  * sb_inoalignmt, sb_unit, sb_width and sb_dirblklog.
@@ -452,6 +538,8 @@  secondary_sb_whack(
 			rval |= XR_AG_SB_SEC;
 	}
 
+	rval |= check_v5_feature_mismatch(mp, i, sb);
+
 	if (xfs_sb_version_needsrepair(sb)) {
 		if (i == 0) {
 			if (!no_modify)