diff mbox series

[6/7] xfs_repair: check that metadata updates have been committed

Message ID 158258946575.451075.126426300036283442.stgit@magnolia (mailing list archive)
State Superseded
Headers show
Series xfsprogs: actually check that writes succeeded | expand

Commit Message

Darrick J. Wong Feb. 25, 2020, 12:11 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure that any metadata that we repaired or regenerated has been
written to disk.  If that fails, exit with 1 to signal that there are
still errors in the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/xfs_repair.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Brian Foster Feb. 25, 2020, 3:08 p.m. UTC | #1
On Mon, Feb 24, 2020 at 04:11:05PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Make sure that any metadata that we repaired or regenerated has been
> written to disk.  If that fails, exit with 1 to signal that there are
> still errors in the filesystem.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  repair/xfs_repair.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index eb1ce546..ccb13f4a 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -703,6 +703,7 @@ main(int argc, char **argv)
>  	struct xfs_sb	psb;
>  	int		rval;
>  	struct xfs_ino_geometry	*igeo;
> +	int		error;
>  
>  	progname = basename(argv[0]);
>  	setlocale(LC_ALL, "");
> @@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
>  	 */
>  	libxfs_bcache_flush();
>  	format_log_max_lsn(mp);
> -	libxfs_umount(mp);
> +
> +	/* Report failure if anything failed to get written to our fs. */
> +	error = -libxfs_umount(mp);
> +	if (error)
> +		exit(1);

I wonder a bit whether repair should really exit like this vs. report
the error as it does for most others, but I could go either way. I'll
defer to Eric:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  
>  	if (x.rtdev)
>  		libxfs_device_close(x.rtdev);
>
Darrick J. Wong Feb. 25, 2020, 3:14 p.m. UTC | #2
On Tue, Feb 25, 2020 at 10:08:17AM -0500, Brian Foster wrote:
> On Mon, Feb 24, 2020 at 04:11:05PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Make sure that any metadata that we repaired or regenerated has been
> > written to disk.  If that fails, exit with 1 to signal that there are
> > still errors in the filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  repair/xfs_repair.c |    7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > index eb1ce546..ccb13f4a 100644
> > --- a/repair/xfs_repair.c
> > +++ b/repair/xfs_repair.c
> > @@ -703,6 +703,7 @@ main(int argc, char **argv)
> >  	struct xfs_sb	psb;
> >  	int		rval;
> >  	struct xfs_ino_geometry	*igeo;
> > +	int		error;
> >  
> >  	progname = basename(argv[0]);
> >  	setlocale(LC_ALL, "");
> > @@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
> >  	 */
> >  	libxfs_bcache_flush();
> >  	format_log_max_lsn(mp);
> > -	libxfs_umount(mp);
> > +
> > +	/* Report failure if anything failed to get written to our fs. */
> > +	error = -libxfs_umount(mp);
> > +	if (error)
> > +		exit(1);
> 
> I wonder a bit whether repair should really exit like this vs. report
> the error as it does for most others, but I could go either way. I'll
> defer to Eric:

I suppose I could do:

	error = -libxfs_umount();
	if (error)
		do_error(_("fs unmount failed (err=%d), re-run repair!\n"),
				error);

Though then you'd end up with:

	# xfs_repair /dev/fd0
	...
	Refusing to write corrupted metadata to the data device!
	fs unmount failed (err=117), re-run repair!
	# echo $?
	1

Which seems a little redundant.  But let's see what Eric thinks.

> Reviewed-by: Brian Foster <bfoster@redhat.com>

Anyway, thanks for reviewing!

--D

> >  
> >  	if (x.rtdev)
> >  		libxfs_device_close(x.rtdev);
> > 
>
Christoph Hellwig Feb. 25, 2020, 5:38 p.m. UTC | #3
On Tue, Feb 25, 2020 at 07:14:27AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 25, 2020 at 10:08:17AM -0500, Brian Foster wrote:
> > On Mon, Feb 24, 2020 at 04:11:05PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Make sure that any metadata that we repaired or regenerated has been
> > > written to disk.  If that fails, exit with 1 to signal that there are
> > > still errors in the filesystem.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  repair/xfs_repair.c |    7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > 
> > > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > > index eb1ce546..ccb13f4a 100644
> > > --- a/repair/xfs_repair.c
> > > +++ b/repair/xfs_repair.c
> > > @@ -703,6 +703,7 @@ main(int argc, char **argv)
> > >  	struct xfs_sb	psb;
> > >  	int		rval;
> > >  	struct xfs_ino_geometry	*igeo;
> > > +	int		error;
> > >  
> > >  	progname = basename(argv[0]);
> > >  	setlocale(LC_ALL, "");
> > > @@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
> > >  	 */
> > >  	libxfs_bcache_flush();
> > >  	format_log_max_lsn(mp);
> > > -	libxfs_umount(mp);
> > > +
> > > +	/* Report failure if anything failed to get written to our fs. */
> > > +	error = -libxfs_umount(mp);
> > > +	if (error)
> > > +		exit(1);
> > 
> > I wonder a bit whether repair should really exit like this vs. report
> > the error as it does for most others, but I could go either way. I'll
> > defer to Eric:
> 
> I suppose I could do:
> 
> 	error = -libxfs_umount();
> 	if (error)
> 		do_error(_("fs unmount failed (err=%d), re-run repair!\n"),
> 				error);
> 
> Though then you'd end up with:
> 
> 	# xfs_repair /dev/fd0
> 	...
> 	Refusing to write corrupted metadata to the data device!
> 	fs unmount failed (err=117), re-run repair!
> 	# echo $?
> 	1
> 
> Which seems a little redundant.  But let's see what Eric thinks.

I think this message would be at last somewhat useful.
Eric Sandeen Feb. 29, 2020, 10:37 p.m. UTC | #4
On 2/25/20 9:38 AM, Christoph Hellwig wrote:
> On Tue, Feb 25, 2020 at 07:14:27AM -0800, Darrick J. Wong wrote:
>> On Tue, Feb 25, 2020 at 10:08:17AM -0500, Brian Foster wrote:
>>> On Mon, Feb 24, 2020 at 04:11:05PM -0800, Darrick J. Wong wrote:
>>>> From: Darrick J. Wong <darrick.wong@oracle.com>
>>>>
>>>> Make sure that any metadata that we repaired or regenerated has been
>>>> written to disk.  If that fails, exit with 1 to signal that there are
>>>> still errors in the filesystem.
>>>>
>>>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>>>> ---
>>>>  repair/xfs_repair.c |    7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>>
>>>> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
>>>> index eb1ce546..ccb13f4a 100644
>>>> --- a/repair/xfs_repair.c
>>>> +++ b/repair/xfs_repair.c
>>>> @@ -703,6 +703,7 @@ main(int argc, char **argv)
>>>>  	struct xfs_sb	psb;
>>>>  	int		rval;
>>>>  	struct xfs_ino_geometry	*igeo;
>>>> +	int		error;
>>>>  
>>>>  	progname = basename(argv[0]);
>>>>  	setlocale(LC_ALL, "");
>>>> @@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
>>>>  	 */
>>>>  	libxfs_bcache_flush();
>>>>  	format_log_max_lsn(mp);
>>>> -	libxfs_umount(mp);
>>>> +
>>>> +	/* Report failure if anything failed to get written to our fs. */
>>>> +	error = -libxfs_umount(mp);
>>>> +	if (error)
>>>> +		exit(1);
>>>
>>> I wonder a bit whether repair should really exit like this vs. report
>>> the error as it does for most others, but I could go either way. I'll
>>> defer to Eric:
>>
>> I suppose I could do:
>>
>> 	error = -libxfs_umount();
>> 	if (error)
>> 		do_error(_("fs unmount failed (err=%d), re-run repair!\n"),
>> 				error);
>>
>> Though then you'd end up with:
>>
>> 	# xfs_repair /dev/fd0
>> 	...
>> 	Refusing to write corrupted metadata to the data device!
>> 	fs unmount failed (err=117), re-run repair!
>> 	# echo $?
>> 	1
>>
>> Which seems a little redundant.  But let's see what Eric thinks.
> 
> I think this message would be at last somewhat useful.

I also think this is good, thank you guys for working it out while I'm
half-traveling/half-working (as was hch but he's keeping up!)  :)

-Eric
diff mbox series

Patch

diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index eb1ce546..ccb13f4a 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -703,6 +703,7 @@  main(int argc, char **argv)
 	struct xfs_sb	psb;
 	int		rval;
 	struct xfs_ino_geometry	*igeo;
+	int		error;
 
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
@@ -1104,7 +1105,11 @@  _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
 	 */
 	libxfs_bcache_flush();
 	format_log_max_lsn(mp);
-	libxfs_umount(mp);
+
+	/* Report failure if anything failed to get written to our fs. */
+	error = -libxfs_umount(mp);
+	if (error)
+		exit(1);
 
 	if (x.rtdev)
 		libxfs_device_close(x.rtdev);