diff mbox series

[2/3] xfs: define a new "needrepair" feature

Message ID 160679385127.447856.3129099457617444604.stgit@magnolia (mailing list archive)
State Superseded, archived
Headers show
Series xfs: add the ability to flag a fs for repair | expand

Commit Message

Darrick J. Wong Dec. 1, 2020, 3:37 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Define an incompat feature flag to indicate that the filesystem needs to
be repaired.  While libxfs will recognize this feature, the kernel will
refuse to mount if the feature flag is set, and only xfs_repair will be
able to clear the flag.  The goal here is to force the admin to run
xfs_repair to completion after upgrading the filesystem, or if we
otherwise detect anomalies.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h |    7 +++++++
 fs/xfs/xfs_mount.c         |    6 ++++++
 2 files changed, 13 insertions(+)

Comments

Brian Foster Dec. 1, 2020, 4:18 p.m. UTC | #1
On Mon, Nov 30, 2020 at 07:37:31PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Define an incompat feature flag to indicate that the filesystem needs to
> be repaired.  While libxfs will recognize this feature, the kernel will
> refuse to mount if the feature flag is set, and only xfs_repair will be
> able to clear the flag.  The goal here is to force the admin to run
> xfs_repair to completion after upgrading the filesystem, or if we
> otherwise detect anomalies.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

IIUC, we're using an incompat bit to intentionally ensure the filesystem
cannot mount, even on kernels that predate this particular "needs
repair" feature. The only difference is that an older kernel would
complain about an unknown feature and return a different error code.
Right?

That seems reasonable, but out of curiousity is there a need/reason for
using an incompat bit over an ro_compat bit?

Brian

>  fs/xfs/libxfs/xfs_format.h |    7 +++++++
>  fs/xfs/xfs_mount.c         |    6 ++++++
>  2 files changed, 13 insertions(+)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index dd764da08f6f..5d8ba609ac0b 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -468,6 +468,7 @@ xfs_sb_has_ro_compat_feature(
>  #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
>  #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
>  #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
> +#define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
>  #define XFS_SB_FEAT_INCOMPAT_ALL \
>  		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
>  		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
> @@ -584,6 +585,12 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>  		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
>  }
>  
> +static inline bool xfs_sb_version_needsrepair(struct xfs_sb *sbp)
> +{
> +	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
> +		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR);
> +}
> +
>  /*
>   * end of superblock version macros
>   */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 7bc7901d648d..2853ad49b27d 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -266,6 +266,12 @@ xfs_sb_validate_mount(
>  	struct xfs_buf		*bp,
>  	struct xfs_sb		*sbp)
>  {
> +	/* Filesystem claims it needs repair, so refuse the mount. */
> +	if (xfs_sb_version_needsrepair(&mp->m_sb)) {
> +		xfs_warn(mp, "Filesystem needs repair.  Please run xfs_repair.");
> +		return -EFSCORRUPTED;
> +	}
> +
>  	/*
>  	 * Don't touch the filesystem if a user tool thinks it owns the primary
>  	 * superblock.  mkfs doesn't clear the flag from secondary supers, so
>
Darrick J. Wong Dec. 1, 2020, 4:25 p.m. UTC | #2
On Tue, Dec 01, 2020 at 11:18:12AM -0500, Brian Foster wrote:
> On Mon, Nov 30, 2020 at 07:37:31PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Define an incompat feature flag to indicate that the filesystem needs to
> > be repaired.  While libxfs will recognize this feature, the kernel will
> > refuse to mount if the feature flag is set, and only xfs_repair will be
> > able to clear the flag.  The goal here is to force the admin to run
> > xfs_repair to completion after upgrading the filesystem, or if we
> > otherwise detect anomalies.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> IIUC, we're using an incompat bit to intentionally ensure the filesystem
> cannot mount, even on kernels that predate this particular "needs
> repair" feature. The only difference is that an older kernel would
> complain about an unknown feature and return a different error code.
> Right?
> 
> That seems reasonable, but out of curiousity is there a need/reason for
> using an incompat bit over an ro_compat bit?

The general principle is to prevent /any/ mounting of the fs until the
admin runs repair, even if it's readonly mounting.  The specific reason
is so that xfs_db can set some other feature flag as part of an upgrade
and then set the incompat bit to force a repair run (which xfs_admin
will immediately take care of).

Hm.  Now that you got me thinking, maybe there should be an exception
for a norecovery mount?

--D

> Brian
> 
> >  fs/xfs/libxfs/xfs_format.h |    7 +++++++
> >  fs/xfs/xfs_mount.c         |    6 ++++++
> >  2 files changed, 13 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > index dd764da08f6f..5d8ba609ac0b 100644
> > --- a/fs/xfs/libxfs/xfs_format.h
> > +++ b/fs/xfs/libxfs/xfs_format.h
> > @@ -468,6 +468,7 @@ xfs_sb_has_ro_compat_feature(
> >  #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
> >  #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
> >  #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
> > +#define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
> >  #define XFS_SB_FEAT_INCOMPAT_ALL \
> >  		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
> >  		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
> > @@ -584,6 +585,12 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
> >  		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
> >  }
> >  
> > +static inline bool xfs_sb_version_needsrepair(struct xfs_sb *sbp)
> > +{
> > +	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
> > +		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR);
> > +}
> > +
> >  /*
> >   * end of superblock version macros
> >   */
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 7bc7901d648d..2853ad49b27d 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -266,6 +266,12 @@ xfs_sb_validate_mount(
> >  	struct xfs_buf		*bp,
> >  	struct xfs_sb		*sbp)
> >  {
> > +	/* Filesystem claims it needs repair, so refuse the mount. */
> > +	if (xfs_sb_version_needsrepair(&mp->m_sb)) {
> > +		xfs_warn(mp, "Filesystem needs repair.  Please run xfs_repair.");
> > +		return -EFSCORRUPTED;
> > +	}
> > +
> >  	/*
> >  	 * Don't touch the filesystem if a user tool thinks it owns the primary
> >  	 * superblock.  mkfs doesn't clear the flag from secondary supers, so
> > 
>
Brian Foster Dec. 1, 2020, 5:09 p.m. UTC | #3
On Tue, Dec 01, 2020 at 08:25:39AM -0800, Darrick J. Wong wrote:
> On Tue, Dec 01, 2020 at 11:18:12AM -0500, Brian Foster wrote:
> > On Mon, Nov 30, 2020 at 07:37:31PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Define an incompat feature flag to indicate that the filesystem needs to
> > > be repaired.  While libxfs will recognize this feature, the kernel will
> > > refuse to mount if the feature flag is set, and only xfs_repair will be
> > > able to clear the flag.  The goal here is to force the admin to run
> > > xfs_repair to completion after upgrading the filesystem, or if we
> > > otherwise detect anomalies.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > 
> > IIUC, we're using an incompat bit to intentionally ensure the filesystem
> > cannot mount, even on kernels that predate this particular "needs
> > repair" feature. The only difference is that an older kernel would
> > complain about an unknown feature and return a different error code.
> > Right?
> > 
> > That seems reasonable, but out of curiousity is there a need/reason for
> > using an incompat bit over an ro_compat bit?
> 
> The general principle is to prevent /any/ mounting of the fs until the
> admin runs repair, even if it's readonly mounting.  The specific reason
> is so that xfs_db can set some other feature flag as part of an upgrade
> and then set the incompat bit to force a repair run (which xfs_admin
> will immediately take care of).
> 
> Hm.  Now that you got me thinking, maybe there should be an exception
> for a norecovery mount?
> 

Yeah, I was more thinking about for recovery purposes if something
happens to go wrong, so that should imply norecovery as well. Eh, I
suppose one could always clear the bit too in that case so it's not that
big of a deal. Either way:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> --D
> 
> > Brian
> > 
> > >  fs/xfs/libxfs/xfs_format.h |    7 +++++++
> > >  fs/xfs/xfs_mount.c         |    6 ++++++
> > >  2 files changed, 13 insertions(+)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > > index dd764da08f6f..5d8ba609ac0b 100644
> > > --- a/fs/xfs/libxfs/xfs_format.h
> > > +++ b/fs/xfs/libxfs/xfs_format.h
> > > @@ -468,6 +468,7 @@ xfs_sb_has_ro_compat_feature(
> > >  #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
> > >  #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
> > >  #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
> > > +#define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
> > >  #define XFS_SB_FEAT_INCOMPAT_ALL \
> > >  		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
> > >  		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
> > > @@ -584,6 +585,12 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
> > >  		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
> > >  }
> > >  
> > > +static inline bool xfs_sb_version_needsrepair(struct xfs_sb *sbp)
> > > +{
> > > +	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
> > > +		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR);
> > > +}
> > > +
> > >  /*
> > >   * end of superblock version macros
> > >   */
> > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > index 7bc7901d648d..2853ad49b27d 100644
> > > --- a/fs/xfs/xfs_mount.c
> > > +++ b/fs/xfs/xfs_mount.c
> > > @@ -266,6 +266,12 @@ xfs_sb_validate_mount(
> > >  	struct xfs_buf		*bp,
> > >  	struct xfs_sb		*sbp)
> > >  {
> > > +	/* Filesystem claims it needs repair, so refuse the mount. */
> > > +	if (xfs_sb_version_needsrepair(&mp->m_sb)) {
> > > +		xfs_warn(mp, "Filesystem needs repair.  Please run xfs_repair.");
> > > +		return -EFSCORRUPTED;
> > > +	}
> > > +
> > >  	/*
> > >  	 * Don't touch the filesystem if a user tool thinks it owns the primary
> > >  	 * superblock.  mkfs doesn't clear the flag from secondary supers, so
> > > 
> > 
>
Eric Sandeen Dec. 4, 2020, 8:07 p.m. UTC | #4
On 12/1/20 10:18 AM, Brian Foster wrote:
> On Mon, Nov 30, 2020 at 07:37:31PM -0800, Darrick J. Wong wrote:
>> From: Darrick J. Wong <darrick.wong@oracle.com>
>>
>> Define an incompat feature flag to indicate that the filesystem needs to
>> be repaired.  While libxfs will recognize this feature, the kernel will
>> refuse to mount if the feature flag is set, and only xfs_repair will be
>> able to clear the flag.  The goal here is to force the admin to run
>> xfs_repair to completion after upgrading the filesystem, or if we
>> otherwise detect anomalies.
>>
>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>> ---
> IIUC, we're using an incompat bit to intentionally ensure the filesystem
> cannot mount, even on kernels that predate this particular "needs
> repair" feature. The only difference is that an older kernel would
> complain about an unknown feature and return a different error code.
> Right?
> 
> That seems reasonable, but out of curiousity is there a need/reason for
> using an incompat bit over an ro_compat bit?

I'm a fan of a straight-up incompat, because we don't really know what
format changes in the future might require this flag to be set; nothing
guarantees that future changes will be ro-compat-safe, right?

-Eric
Darrick J. Wong Dec. 4, 2020, 9:36 p.m. UTC | #5
On Fri, Dec 04, 2020 at 02:07:49PM -0600, Eric Sandeen wrote:
> On 12/1/20 10:18 AM, Brian Foster wrote:
> > On Mon, Nov 30, 2020 at 07:37:31PM -0800, Darrick J. Wong wrote:
> >> From: Darrick J. Wong <darrick.wong@oracle.com>
> >>
> >> Define an incompat feature flag to indicate that the filesystem needs to
> >> be repaired.  While libxfs will recognize this feature, the kernel will
> >> refuse to mount if the feature flag is set, and only xfs_repair will be
> >> able to clear the flag.  The goal here is to force the admin to run
> >> xfs_repair to completion after upgrading the filesystem, or if we
> >> otherwise detect anomalies.
> >>
> >> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >> ---
> > IIUC, we're using an incompat bit to intentionally ensure the filesystem
> > cannot mount, even on kernels that predate this particular "needs
> > repair" feature. The only difference is that an older kernel would
> > complain about an unknown feature and return a different error code.
> > Right?
> > 
> > That seems reasonable, but out of curiousity is there a need/reason for
> > using an incompat bit over an ro_compat bit?
> 
> I'm a fan of a straight-up incompat, because we don't really know what
> format changes in the future might require this flag to be set; nothing
> guarantees that future changes will be ro-compat-safe, right?

Correct.  In the case of the inobtcount upgrade, we know that the
inobt/finobt blockcounts in the AGI are zero (and thus wrong) right
after the upgrade.  If we made it a rocompat bit then we'd allow ro
mounts but we'd also have to be careful to prohibit a ro->rw remount,
at which point the admin gets a Big Surprise.

Why not just make the admin repair the system right then and there?
I mean, xfs_admin is already going to run repair anyway, so in practice
there shouldn't be that many people trying to push an "upgraded but
needs repair" fs at the kernel anyway.

--D

> -Eric
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index dd764da08f6f..5d8ba609ac0b 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -468,6 +468,7 @@  xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
 #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
 #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
+#define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
@@ -584,6 +585,12 @@  static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
 }
 
+static inline bool xfs_sb_version_needsrepair(struct xfs_sb *sbp)
+{
+	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR);
+}
+
 /*
  * end of superblock version macros
  */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 7bc7901d648d..2853ad49b27d 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -266,6 +266,12 @@  xfs_sb_validate_mount(
 	struct xfs_buf		*bp,
 	struct xfs_sb		*sbp)
 {
+	/* Filesystem claims it needs repair, so refuse the mount. */
+	if (xfs_sb_version_needsrepair(&mp->m_sb)) {
+		xfs_warn(mp, "Filesystem needs repair.  Please run xfs_repair.");
+		return -EFSCORRUPTED;
+	}
+
 	/*
 	 * Don't touch the filesystem if a user tool thinks it owns the primary
 	 * superblock.  mkfs doesn't clear the flag from secondary supers, so