diff mbox series

[2/2] xfs_repair: clear the needsrepair flag

Message ID 161076029319.3386490.2011901341184065451.stgit@magnolia (mailing list archive)
State Superseded
Headers show
Series xfs: add the ability to flag a fs for repair | expand

Commit Message

Darrick J. Wong Jan. 16, 2021, 1:24 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Clear the needsrepair flag, since it's used to prevent mounting of an
inconsistent filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/agheader.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Brian Foster Jan. 19, 2021, 2:37 p.m. UTC | #1
On Fri, Jan 15, 2021 at 05:24:53PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Clear the needsrepair flag, since it's used to prevent mounting of an
> inconsistent filesystem.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---

Code/errors look much cleaner. Though looking at the repair code again,
I wonder... if we clear the needsrepair bit and dirty/write the sb in
phase 2 and then xfs_repair happens to crash, do we risk clearing the
bit and thus allowing a potential mount before whatever requisite
metadata updates have been made?

Brian

>  repair/agheader.c |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> 
> diff --git a/repair/agheader.c b/repair/agheader.c
> index 8bb99489..d9b72d3a 100644
> --- a/repair/agheader.c
> +++ b/repair/agheader.c
> @@ -452,6 +452,21 @@ secondary_sb_whack(
>  			rval |= XR_AG_SB_SEC;
>  	}
>  
> +	if (xfs_sb_version_needsrepair(sb)) {
> +		if (!no_modify)
> +			sb->sb_features_incompat &=
> +					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> +		if (i == 0) {
> +			if (!no_modify)
> +				do_warn(
> +	_("clearing needsrepair flag and regenerating metadata\n"));
> +			else
> +				do_warn(
> +	_("would clear needsrepair flag and regenerate metadata\n"));
> +		}
> +		rval |= XR_AG_SB_SEC;
> +	}
> +
>  	return(rval);
>  }
>  
>
Darrick J. Wong Jan. 19, 2021, 6:15 p.m. UTC | #2
On Tue, Jan 19, 2021 at 09:37:54AM -0500, Brian Foster wrote:
> On Fri, Jan 15, 2021 at 05:24:53PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Clear the needsrepair flag, since it's used to prevent mounting of an
> > inconsistent filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> 
> Code/errors look much cleaner. Though looking at the repair code again,
> I wonder... if we clear the needsrepair bit and dirty/write the sb in
> phase 2 and then xfs_repair happens to crash, do we risk clearing the
> bit and thus allowing a potential mount before whatever requisite
> metadata updates have been made?

[Oh good, now mail.kernel.org is having problems...]

Yes, though I think that falls into the realm of "sysadmins should be
sufficiently self-aware not to expect mount to work after repair
fails/system crashes during an upgrade".

I've thought about how to solve the general problem of preventing people
from mounting filesystems if repair doesn't run to completion.  I think
xfs_repair could be modified so that once it finds the primary super, it
writes it back out with NEEDSREPAIR set (V5) or inprogress set (V4).
Once we've finished the buffer cache flush at the end of repair, we
clear needsrepair/inprogress and write the primary super again.

An optimization on that would be to find a way to avoid that first super
write until we flush the first dirty buffer.

Another way to make repair more "transactional" would be to do it would
be to fiddle with the buffer manager so that writes are sent to a
metadump file which could be mdrestore'd if repair completes
successfully.  But that's a short-circuit around the even bigger project
of porting the kernel logging code to userspace and use that in repair.

--D

> Brian
> 
> >  repair/agheader.c |   15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > 
> > diff --git a/repair/agheader.c b/repair/agheader.c
> > index 8bb99489..d9b72d3a 100644
> > --- a/repair/agheader.c
> > +++ b/repair/agheader.c
> > @@ -452,6 +452,21 @@ secondary_sb_whack(
> >  			rval |= XR_AG_SB_SEC;
> >  	}
> >  
> > +	if (xfs_sb_version_needsrepair(sb)) {
> > +		if (!no_modify)
> > +			sb->sb_features_incompat &=
> > +					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> > +		if (i == 0) {
> > +			if (!no_modify)
> > +				do_warn(
> > +	_("clearing needsrepair flag and regenerating metadata\n"));
> > +			else
> > +				do_warn(
> > +	_("would clear needsrepair flag and regenerate metadata\n"));
> > +		}
> > +		rval |= XR_AG_SB_SEC;
> > +	}
> > +
> >  	return(rval);
> >  }
> >  
> > 
>
Brian Foster Jan. 19, 2021, 7:44 p.m. UTC | #3
On Tue, Jan 19, 2021 at 10:15:49AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 19, 2021 at 09:37:54AM -0500, Brian Foster wrote:
> > On Fri, Jan 15, 2021 at 05:24:53PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Clear the needsrepair flag, since it's used to prevent mounting of an
> > > inconsistent filesystem.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > 
> > Code/errors look much cleaner. Though looking at the repair code again,
> > I wonder... if we clear the needsrepair bit and dirty/write the sb in
> > phase 2 and then xfs_repair happens to crash, do we risk clearing the
> > bit and thus allowing a potential mount before whatever requisite
> > metadata updates have been made?
> 
> [Oh good, now mail.kernel.org is having problems...]
> 
> Yes, though I think that falls into the realm of "sysadmins should be
> sufficiently self-aware not to expect mount to work after repair
> fails/system crashes during an upgrade".
> 
> I've thought about how to solve the general problem of preventing people
> from mounting filesystems if repair doesn't run to completion.  I think
> xfs_repair could be modified so that once it finds the primary super, it
> writes it back out with NEEDSREPAIR set (V5) or inprogress set (V4).
> Once we've finished the buffer cache flush at the end of repair, we
> clear needsrepair/inprogress and write the primary super again.
> 

That's kind of what I was thinking.. set a global flag if/when we come
across the bit set on disk and clear it as a final step.

> An optimization on that would be to find a way to avoid that first super
> write until we flush the first dirty buffer.
> 
> Another way to make repair more "transactional" would be to do it would
> be to fiddle with the buffer manager so that writes are sent to a
> metadump file which could be mdrestore'd if repair completes
> successfully.  But that's a short-circuit around the even bigger project
> of porting the kernel logging code to userspace and use that in repair.
> 

Yeah, I wouldn't want to have clearing a feature bit depend on such a
significant rework of core functionality. The bit is not going to be set
in most cases, so I'd suspect an extra superblock write at the end of
repair wouldn't be much of a problem. In fact, it looks like main()
already has an unconditional sb write before the final libxfs_unmount()
call. Perhaps we could just hitch onto that for the primary super
update (and continue to clear the secondaries earlier as the current
patch does)..?

Brian

> --D
> 
> > Brian
> > 
> > >  repair/agheader.c |   15 +++++++++++++++
> > >  1 file changed, 15 insertions(+)
> > > 
> > > 
> > > diff --git a/repair/agheader.c b/repair/agheader.c
> > > index 8bb99489..d9b72d3a 100644
> > > --- a/repair/agheader.c
> > > +++ b/repair/agheader.c
> > > @@ -452,6 +452,21 @@ secondary_sb_whack(
> > >  			rval |= XR_AG_SB_SEC;
> > >  	}
> > >  
> > > +	if (xfs_sb_version_needsrepair(sb)) {
> > > +		if (!no_modify)
> > > +			sb->sb_features_incompat &=
> > > +					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> > > +		if (i == 0) {
> > > +			if (!no_modify)
> > > +				do_warn(
> > > +	_("clearing needsrepair flag and regenerating metadata\n"));
> > > +			else
> > > +				do_warn(
> > > +	_("would clear needsrepair flag and regenerate metadata\n"));
> > > +		}
> > > +		rval |= XR_AG_SB_SEC;
> > > +	}
> > > +
> > >  	return(rval);
> > >  }
> > >  
> > > 
> > 
>
Darrick J. Wong Jan. 19, 2021, 8:31 p.m. UTC | #4
On Tue, Jan 19, 2021 at 02:44:06PM -0500, Brian Foster wrote:
> On Tue, Jan 19, 2021 at 10:15:49AM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 19, 2021 at 09:37:54AM -0500, Brian Foster wrote:
> > > On Fri, Jan 15, 2021 at 05:24:53PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Clear the needsrepair flag, since it's used to prevent mounting of an
> > > > inconsistent filesystem.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > 
> > > Code/errors look much cleaner. Though looking at the repair code again,
> > > I wonder... if we clear the needsrepair bit and dirty/write the sb in
> > > phase 2 and then xfs_repair happens to crash, do we risk clearing the
> > > bit and thus allowing a potential mount before whatever requisite
> > > metadata updates have been made?
> > 
> > [Oh good, now mail.kernel.org is having problems...]
> > 
> > Yes, though I think that falls into the realm of "sysadmins should be
> > sufficiently self-aware not to expect mount to work after repair
> > fails/system crashes during an upgrade".
> > 
> > I've thought about how to solve the general problem of preventing people
> > from mounting filesystems if repair doesn't run to completion.  I think
> > xfs_repair could be modified so that once it finds the primary super, it
> > writes it back out with NEEDSREPAIR set (V5) or inprogress set (V4).
> > Once we've finished the buffer cache flush at the end of repair, we
> > clear needsrepair/inprogress and write the primary super again.
> > 
> 
> That's kind of what I was thinking.. set a global flag if/when we come
> across the bit set on disk and clear it as a final step.

I think if I found it set on the primary sb I would leave it alone, and
if the bit wasn't set then I'd set it and bwrite the buffer immediately.
Probably we're talking about more or less the same thing...

> > An optimization on that would be to find a way to avoid that first super
> > write until we flush the first dirty buffer.
> > 
> > Another way to make repair more "transactional" would be to do it would
> > be to fiddle with the buffer manager so that writes are sent to a
> > metadump file which could be mdrestore'd if repair completes
> > successfully.  But that's a short-circuit around the even bigger project
> > of porting the kernel logging code to userspace and use that in repair.
> > 
> 
> Yeah, I wouldn't want to have clearing a feature bit depend on such a
> significant rework of core functionality. The bit is not going to be set
> in most cases, so I'd suspect an extra superblock write at the end of
> repair wouldn't be much of a problem. In fact, it looks like main()
> already has an unconditional sb write before the final libxfs_unmount()
> call. Perhaps we could just hitch onto that for the primary super
> update (and continue to clear the secondaries earlier as the current
> patch does)..?

Clearing the bit has to happen after the bcache purge and disk flush,
because we need to make sure that everything else we wrote actually made
it to stable storage.

Hm, I guess we could export libxfs_flush_mount so that repair could call
that, clear the fields in the sb, and then go ahead with the
libxfs_umount.

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > >  repair/agheader.c |   15 +++++++++++++++
> > > >  1 file changed, 15 insertions(+)
> > > > 
> > > > 
> > > > diff --git a/repair/agheader.c b/repair/agheader.c
> > > > index 8bb99489..d9b72d3a 100644
> > > > --- a/repair/agheader.c
> > > > +++ b/repair/agheader.c
> > > > @@ -452,6 +452,21 @@ secondary_sb_whack(
> > > >  			rval |= XR_AG_SB_SEC;
> > > >  	}
> > > >  
> > > > +	if (xfs_sb_version_needsrepair(sb)) {
> > > > +		if (!no_modify)
> > > > +			sb->sb_features_incompat &=
> > > > +					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> > > > +		if (i == 0) {
> > > > +			if (!no_modify)
> > > > +				do_warn(
> > > > +	_("clearing needsrepair flag and regenerating metadata\n"));
> > > > +			else
> > > > +				do_warn(
> > > > +	_("would clear needsrepair flag and regenerate metadata\n"));
> > > > +		}
> > > > +		rval |= XR_AG_SB_SEC;
> > > > +	}
> > > > +
> > > >  	return(rval);
> > > >  }
> > > >  
> > > > 
> > > 
> > 
>
Darrick J. Wong Jan. 19, 2021, 11:49 p.m. UTC | #5
On Tue, Jan 19, 2021 at 12:31:10PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 19, 2021 at 02:44:06PM -0500, Brian Foster wrote:
> > On Tue, Jan 19, 2021 at 10:15:49AM -0800, Darrick J. Wong wrote:
> > > On Tue, Jan 19, 2021 at 09:37:54AM -0500, Brian Foster wrote:
> > > > On Fri, Jan 15, 2021 at 05:24:53PM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > Clear the needsrepair flag, since it's used to prevent mounting of an
> > > > > inconsistent filesystem.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > ---
> > > > 
> > > > Code/errors look much cleaner. Though looking at the repair code again,
> > > > I wonder... if we clear the needsrepair bit and dirty/write the sb in
> > > > phase 2 and then xfs_repair happens to crash, do we risk clearing the
> > > > bit and thus allowing a potential mount before whatever requisite
> > > > metadata updates have been made?
> > > 
> > > [Oh good, now mail.kernel.org is having problems...]
> > > 
> > > Yes, though I think that falls into the realm of "sysadmins should be
> > > sufficiently self-aware not to expect mount to work after repair
> > > fails/system crashes during an upgrade".
> > > 
> > > I've thought about how to solve the general problem of preventing people
> > > from mounting filesystems if repair doesn't run to completion.  I think
> > > xfs_repair could be modified so that once it finds the primary super, it
> > > writes it back out with NEEDSREPAIR set (V5) or inprogress set (V4).
> > > Once we've finished the buffer cache flush at the end of repair, we
> > > clear needsrepair/inprogress and write the primary super again.
> > > 
> > 
> > That's kind of what I was thinking.. set a global flag if/when we come
> > across the bit set on disk and clear it as a final step.
> 
> I think if I found it set on the primary sb I would leave it alone, and
> if the bit wasn't set then I'd set it and bwrite the buffer immediately.
> Probably we're talking about more or less the same thing...

...though after talking to Eric some more, I think I should change this
patch to clear needsrepair at the end of repair, like you said.

--D

> > > An optimization on that would be to find a way to avoid that first super
> > > write until we flush the first dirty buffer.
> > > 
> > > Another way to make repair more "transactional" would be to do it would
> > > be to fiddle with the buffer manager so that writes are sent to a
> > > metadump file which could be mdrestore'd if repair completes
> > > successfully.  But that's a short-circuit around the even bigger project
> > > of porting the kernel logging code to userspace and use that in repair.
> > > 
> > 
> > Yeah, I wouldn't want to have clearing a feature bit depend on such a
> > significant rework of core functionality. The bit is not going to be set
> > in most cases, so I'd suspect an extra superblock write at the end of
> > repair wouldn't be much of a problem. In fact, it looks like main()
> > already has an unconditional sb write before the final libxfs_unmount()
> > call. Perhaps we could just hitch onto that for the primary super
> > update (and continue to clear the secondaries earlier as the current
> > patch does)..?
> 
> Clearing the bit has to happen after the bcache purge and disk flush,
> because we need to make sure that everything else we wrote actually made
> it to stable storage.
> 
> Hm, I guess we could export libxfs_flush_mount so that repair could call
> that, clear the fields in the sb, and then go ahead with the
> libxfs_umount.
> 
> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > >  repair/agheader.c |   15 +++++++++++++++
> > > > >  1 file changed, 15 insertions(+)
> > > > > 
> > > > > 
> > > > > diff --git a/repair/agheader.c b/repair/agheader.c
> > > > > index 8bb99489..d9b72d3a 100644
> > > > > --- a/repair/agheader.c
> > > > > +++ b/repair/agheader.c
> > > > > @@ -452,6 +452,21 @@ secondary_sb_whack(
> > > > >  			rval |= XR_AG_SB_SEC;
> > > > >  	}
> > > > >  
> > > > > +	if (xfs_sb_version_needsrepair(sb)) {
> > > > > +		if (!no_modify)
> > > > > +			sb->sb_features_incompat &=
> > > > > +					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> > > > > +		if (i == 0) {
> > > > > +			if (!no_modify)
> > > > > +				do_warn(
> > > > > +	_("clearing needsrepair flag and regenerating metadata\n"));
> > > > > +			else
> > > > > +				do_warn(
> > > > > +	_("would clear needsrepair flag and regenerate metadata\n"));
> > > > > +		}
> > > > > +		rval |= XR_AG_SB_SEC;
> > > > > +	}
> > > > > +
> > > > >  	return(rval);
> > > > >  }
> > > > >  
> > > > > 
> > > > 
> > > 
> >
Darrick J. Wong Jan. 20, 2021, 4:31 a.m. UTC | #6
From: Darrick J. Wong <djwong@kernel.org>

Clear the needsrepair flag, since it's used to prevent mounting of an
inconsistent filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v2.1: only remove needsrepair at the end of the xfs_repair run
---
 include/xfs_mount.h |    1 +
 libxfs/init.c       |    2 +-
 repair/agheader.c   |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/agheader.h   |    2 ++
 repair/xfs_repair.c |    4 +++-
 5 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 36594643..75230ca5 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -181,6 +181,7 @@ xfs_perag_resv(
 
 extern xfs_mount_t	*libxfs_mount (xfs_mount_t *, xfs_sb_t *,
 				dev_t, dev_t, dev_t, int);
+int libxfs_flush_mount(struct xfs_mount *mp);
 int		libxfs_umount(struct xfs_mount *mp);
 extern void	libxfs_rtmount_destroy (xfs_mount_t *);
 
diff --git a/libxfs/init.c b/libxfs/init.c
index 9fe13b8d..083060bf 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -870,7 +870,7 @@ _("%s: Flushing the %s failed, err=%d!\n"),
  * Flush all dirty buffers to stable storage and report on writes that didn't
  * make it to stable storage.
  */
-static int
+int
 libxfs_flush_mount(
 	struct xfs_mount	*mp)
 {
diff --git a/repair/agheader.c b/repair/agheader.c
index 8bb99489..56a7f45c 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -220,6 +220,40 @@ compare_sb(xfs_mount_t *mp, xfs_sb_t *sb)
 	return(XR_OK);
 }
 
+/* Clear needsrepair after a successful repair run. */
+int
+clear_needsrepair(
+	struct xfs_mount	*mp)
+{
+	struct xfs_buf		*bp;
+	int			error;
+
+	if (!xfs_sb_version_needsrepair(&mp->m_sb) || no_modify)
+		return 0;
+
+	/* We must succeed at flushing all dirty buffers to disk. */
+	error = -libxfs_flush_mount(mp);
+	if (error)
+		return error;
+
+	/* Clear needsrepair from the superblock. */
+	bp = libxfs_getsb(mp);
+	if (!bp)
+		return ENOMEM;
+	if (bp->b_error) {
+		error = bp->b_error;
+		libxfs_buf_relse(bp);
+		return -error;
+	}
+
+	mp->m_sb.sb_features_incompat &= ~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
+
+	libxfs_sb_to_disk(bp->b_addr, &mp->m_sb);
+	libxfs_buf_mark_dirty(bp);
+	libxfs_buf_relse(bp);
+	return 0;
+}
+
 /*
  * Possible fields that may have been set at mkfs time,
  * sb_inoalignmt, sb_unit, sb_width and sb_dirblklog.
@@ -452,6 +486,27 @@ secondary_sb_whack(
 			rval |= XR_AG_SB_SEC;
 	}
 
+	if (xfs_sb_version_needsrepair(sb)) {
+		if (!no_modify)
+			sb->sb_features_incompat &=
+					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
+		if (i == 0) {
+			if (!no_modify)
+				do_warn(
+	_("clearing needsrepair flag and regenerating metadata\n"));
+			else
+				do_warn(
+	_("would clear needsrepair flag and regenerate metadata\n"));
+		} else {
+			/*
+			 * Quietly clear needsrepair on the secondary supers as
+			 * part of ensuring them.  If needsrepair is set on the
+			 * primary, it will be done at the very end of repair.
+			 */
+			rval |= XR_AG_SB_SEC;
+		}
+	}
+
 	return(rval);
 }
 
diff --git a/repair/agheader.h b/repair/agheader.h
index a63827c8..552c1f70 100644
--- a/repair/agheader.h
+++ b/repair/agheader.h
@@ -82,3 +82,5 @@ typedef struct fs_geo_list  {
 #define XR_AG_AGF	0x2
 #define XR_AG_AGI	0x4
 #define XR_AG_SB_SEC	0x8
+
+int clear_needsrepair(struct xfs_mount *mp);
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 9409f0d8..d36c5a21 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -1133,7 +1133,9 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
 	format_log_max_lsn(mp);
 
 	/* Report failure if anything failed to get written to our fs. */
-	error = -libxfs_umount(mp);
+	error = clear_needsrepair(mp);
+	if (!error)
+		error = -libxfs_umount(mp);
 	if (error)
 		do_error(
 	_("File system metadata writeout failed, err=%d.  Re-run xfs_repair.\n"),
Brian Foster Jan. 20, 2021, 5:38 p.m. UTC | #7
On Tue, Jan 19, 2021 at 08:31:28PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Clear the needsrepair flag, since it's used to prevent mounting of an
> inconsistent filesystem.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> v2.1: only remove needsrepair at the end of the xfs_repair run
> ---
>  include/xfs_mount.h |    1 +
>  libxfs/init.c       |    2 +-
>  repair/agheader.c   |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  repair/agheader.h   |    2 ++
>  repair/xfs_repair.c |    4 +++-
>  5 files changed, 62 insertions(+), 2 deletions(-)
> 
...
> diff --git a/repair/agheader.c b/repair/agheader.c
> index 8bb99489..56a7f45c 100644
> --- a/repair/agheader.c
> +++ b/repair/agheader.c
> @@ -220,6 +220,40 @@ compare_sb(xfs_mount_t *mp, xfs_sb_t *sb)
>  	return(XR_OK);
>  }
>  
> +/* Clear needsrepair after a successful repair run. */
> +int
> +clear_needsrepair(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_buf		*bp;
> +	int			error;
> +
> +	if (!xfs_sb_version_needsrepair(&mp->m_sb) || no_modify)
> +		return 0;
> +
> +	/* We must succeed at flushing all dirty buffers to disk. */
> +	error = -libxfs_flush_mount(mp);
> +	if (error)
> +		return error;
> +

Do we really need a new helper and buf get/relse cycle just to
incorporate the above flush? ISTM we could either lift the
libxfs_bcache_flush() call above the superblock update in the caller,
insert the libxfs_flush_mount() right after that, and just do:

	dsb->sb_features_incompat &= ~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;

... in the hunk that also updates the quota flags.

Though perhaps cleaner would be to keep the helper, but genericize it a
bit to something like final_sb_update() and fold in the qflags update
and whatever flush/ordering is required for the feature bit.

> +	/* Clear needsrepair from the superblock. */
> +	bp = libxfs_getsb(mp);
> +	if (!bp)
> +		return ENOMEM;
> +	if (bp->b_error) {
> +		error = bp->b_error;
> +		libxfs_buf_relse(bp);
> +		return -error;
> +	}
> +
> +	mp->m_sb.sb_features_incompat &= ~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> +
> +	libxfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> +	libxfs_buf_mark_dirty(bp);
> +	libxfs_buf_relse(bp);
> +	return 0;
> +}
> +
>  /*
>   * Possible fields that may have been set at mkfs time,
>   * sb_inoalignmt, sb_unit, sb_width and sb_dirblklog.
> @@ -452,6 +486,27 @@ secondary_sb_whack(
>  			rval |= XR_AG_SB_SEC;
>  	}
>  
> +	if (xfs_sb_version_needsrepair(sb)) {
> +		if (!no_modify)
> +			sb->sb_features_incompat &=
> +					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;

I suspect this should be folded into the check below so we don't modify
the primary sb by accident (should some other check dirty it).

Brian

> +		if (i == 0) {
> +			if (!no_modify)
> +				do_warn(
> +	_("clearing needsrepair flag and regenerating metadata\n"));
> +			else
> +				do_warn(
> +	_("would clear needsrepair flag and regenerate metadata\n"));
> +		} else {
> +			/*
> +			 * Quietly clear needsrepair on the secondary supers as
> +			 * part of ensuring them.  If needsrepair is set on the
> +			 * primary, it will be done at the very end of repair.
> +			 */
> +			rval |= XR_AG_SB_SEC;
> +		}
> +	}
> +
>  	return(rval);
>  }
>  
> diff --git a/repair/agheader.h b/repair/agheader.h
> index a63827c8..552c1f70 100644
> --- a/repair/agheader.h
> +++ b/repair/agheader.h
> @@ -82,3 +82,5 @@ typedef struct fs_geo_list  {
>  #define XR_AG_AGF	0x2
>  #define XR_AG_AGI	0x4
>  #define XR_AG_SB_SEC	0x8
> +
> +int clear_needsrepair(struct xfs_mount *mp);
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index 9409f0d8..d36c5a21 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -1133,7 +1133,9 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
>  	format_log_max_lsn(mp);
>  
>  	/* Report failure if anything failed to get written to our fs. */
> -	error = -libxfs_umount(mp);
> +	error = clear_needsrepair(mp);
> +	if (!error)
> +		error = -libxfs_umount(mp);
>  	if (error)
>  		do_error(
>  	_("File system metadata writeout failed, err=%d.  Re-run xfs_repair.\n"),
>
Darrick J. Wong Feb. 2, 2021, 10:22 p.m. UTC | #8
On Wed, Jan 20, 2021 at 12:38:20PM -0500, Brian Foster wrote:
> On Tue, Jan 19, 2021 at 08:31:28PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Clear the needsrepair flag, since it's used to prevent mounting of an
> > inconsistent filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > v2.1: only remove needsrepair at the end of the xfs_repair run
> > ---
> >  include/xfs_mount.h |    1 +
> >  libxfs/init.c       |    2 +-
> >  repair/agheader.c   |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  repair/agheader.h   |    2 ++
> >  repair/xfs_repair.c |    4 +++-
> >  5 files changed, 62 insertions(+), 2 deletions(-)
> > 
> ...
> > diff --git a/repair/agheader.c b/repair/agheader.c
> > index 8bb99489..56a7f45c 100644
> > --- a/repair/agheader.c
> > +++ b/repair/agheader.c
> > @@ -220,6 +220,40 @@ compare_sb(xfs_mount_t *mp, xfs_sb_t *sb)
> >  	return(XR_OK);
> >  }
> >  
> > +/* Clear needsrepair after a successful repair run. */
> > +int
> > +clear_needsrepair(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct xfs_buf		*bp;
> > +	int			error;
> > +
> > +	if (!xfs_sb_version_needsrepair(&mp->m_sb) || no_modify)
> > +		return 0;
> > +
> > +	/* We must succeed at flushing all dirty buffers to disk. */
> > +	error = -libxfs_flush_mount(mp);
> > +	if (error)
> > +		return error;
> > +
> 
> Do we really need a new helper and buf get/relse cycle just to
> incorporate the above flush? ISTM we could either lift the

Yes.  If quotacheck thinks the dquots are bad, we always want to try to
clear the CHKD flags in the primary superblock, even if other disk
writes fail due to IO errors or write verifiers failing, etc.  Note that
libxfs_bcache_flush only pwrite()s the dirty buffers to disk, it doesn't
force the disks to persist to stable media.

However, if NEEDSREPAIR was set and /any/ part of persisting the dirty
buffers fails (write verifier trips, media errors, etc.) then we don't
even want to try to clear NEEDSREPAIR.

Because of that requirement, the two writes have to be separate steps,
separated by a big flush.

--D

> libxfs_bcache_flush() call above the superblock update in the caller,
> insert the libxfs_flush_mount() right after that, and just do:
> 
> 	dsb->sb_features_incompat &= ~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> 
> ... in the hunk that also updates the quota flags.
> 
> Though perhaps cleaner would be to keep the helper, but genericize it a
> bit to something like final_sb_update() and fold in the qflags update
> and whatever flush/ordering is required for the feature bit.
> 
> > +	/* Clear needsrepair from the superblock. */
> > +	bp = libxfs_getsb(mp);
> > +	if (!bp)
> > +		return ENOMEM;
> > +	if (bp->b_error) {
> > +		error = bp->b_error;
> > +		libxfs_buf_relse(bp);
> > +		return -error;
> > +	}
> > +
> > +	mp->m_sb.sb_features_incompat &= ~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> > +
> > +	libxfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > +	libxfs_buf_mark_dirty(bp);
> > +	libxfs_buf_relse(bp);
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Possible fields that may have been set at mkfs time,
> >   * sb_inoalignmt, sb_unit, sb_width and sb_dirblklog.
> > @@ -452,6 +486,27 @@ secondary_sb_whack(
> >  			rval |= XR_AG_SB_SEC;
> >  	}
> >  
> > +	if (xfs_sb_version_needsrepair(sb)) {
> > +		if (!no_modify)
> > +			sb->sb_features_incompat &=
> > +					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> 
> I suspect this should be folded into the check below so we don't modify
> the primary sb by accident (should some other check dirty it).

Yup.  Fixed.

--D

> 
> Brian
> 
> > +		if (i == 0) {
> > +			if (!no_modify)
> > +				do_warn(
> > +	_("clearing needsrepair flag and regenerating metadata\n"));
> > +			else
> > +				do_warn(
> > +	_("would clear needsrepair flag and regenerate metadata\n"));
> > +		} else {
> > +			/*
> > +			 * Quietly clear needsrepair on the secondary supers as
> > +			 * part of ensuring them.  If needsrepair is set on the
> > +			 * primary, it will be done at the very end of repair.
> > +			 */
> > +			rval |= XR_AG_SB_SEC;
> > +		}
> > +	}
> > +
> >  	return(rval);
> >  }
> >  
> > diff --git a/repair/agheader.h b/repair/agheader.h
> > index a63827c8..552c1f70 100644
> > --- a/repair/agheader.h
> > +++ b/repair/agheader.h
> > @@ -82,3 +82,5 @@ typedef struct fs_geo_list  {
> >  #define XR_AG_AGF	0x2
> >  #define XR_AG_AGI	0x4
> >  #define XR_AG_SB_SEC	0x8
> > +
> > +int clear_needsrepair(struct xfs_mount *mp);
> > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > index 9409f0d8..d36c5a21 100644
> > --- a/repair/xfs_repair.c
> > +++ b/repair/xfs_repair.c
> > @@ -1133,7 +1133,9 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
> >  	format_log_max_lsn(mp);
> >  
> >  	/* Report failure if anything failed to get written to our fs. */
> > -	error = -libxfs_umount(mp);
> > +	error = clear_needsrepair(mp);
> > +	if (!error)
> > +		error = -libxfs_umount(mp);
> >  	if (error)
> >  		do_error(
> >  	_("File system metadata writeout failed, err=%d.  Re-run xfs_repair.\n"),
> > 
>
diff mbox series

Patch

diff --git a/repair/agheader.c b/repair/agheader.c
index 8bb99489..d9b72d3a 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -452,6 +452,21 @@  secondary_sb_whack(
 			rval |= XR_AG_SB_SEC;
 	}
 
+	if (xfs_sb_version_needsrepair(sb)) {
+		if (!no_modify)
+			sb->sb_features_incompat &=
+					~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
+		if (i == 0) {
+			if (!no_modify)
+				do_warn(
+	_("clearing needsrepair flag and regenerating metadata\n"));
+			else
+				do_warn(
+	_("would clear needsrepair flag and regenerate metadata\n"));
+		}
+		rval |= XR_AG_SB_SEC;
+	}
+
 	return(rval);
 }