diff mbox

[04/16] xfs_scrub: don't ask user to run xfs_repair for only warnings

Message ID 151993160036.22223.2742740914776483127.stgit@magnolia (mailing list archive)
State Superseded
Headers show

Commit Message

Darrick J. Wong March 1, 2018, 7:13 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Don't advise the user to run xfs_repair on a filesystem that triggers
warnings but no errors; there's no corruption for it to fix.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/xfs_scrub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Sandeen March 6, 2018, 5:16 p.m. UTC | #1
On 3/1/18 1:13 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Don't advise the user to run xfs_repair on a filesystem that triggers
> warnings but no errors; there's no corruption for it to fix.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

I went looking for why ->need_repair is set if repair isn't needed, and:

C symbol: need_repair

  File              Function	   Line
0 scrub/xfs_scrub.h <global>        98 bool need_repair;
1 scrub/phase1.c    xfs_setup_fs   239 ctx->need_repair = true;
2 scrub/xfs_scrub.c report_outcome 517 if (ctx->need_repair)

um, when is ->need_repair ever false?  What am I missing?

> ---
>  scrub/xfs_scrub.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
> index ab26e63..53a105a 100644
> --- a/scrub/xfs_scrub.c
> +++ b/scrub/xfs_scrub.c
> @@ -514,7 +514,7 @@ report_outcome(
>  		fprintf(stderr, _("%s: errors found: %llu; warnings found: %llu\n"),
>  				ctx->mntpoint, total_errors,
>  				ctx->warnings_found);
> -	if (ctx->need_repair)
> +	if (ctx->need_repair && total_errors > 0)
>  		fprintf(stderr, _("%s: Unmount and run xfs_repair.\n"),
>  				ctx->mntpoint);
>  }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 6, 2018, 5:27 p.m. UTC | #2
On Tue, Mar 06, 2018 at 11:16:50AM -0600, Eric Sandeen wrote:
> 
> 
> On 3/1/18 1:13 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Don't advise the user to run xfs_repair on a filesystem that triggers
> > warnings but no errors; there's no corruption for it to fix.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> I went looking for why ->need_repair is set if repair isn't needed, and:
> 
> C symbol: need_repair
> 
>   File              Function	   Line
> 0 scrub/xfs_scrub.h <global>        98 bool need_repair;
> 1 scrub/phase1.c    xfs_setup_fs   239 ctx->need_repair = true;
> 2 scrub/xfs_scrub.c report_outcome 517 if (ctx->need_repair)
> 
> um, when is ->need_repair ever false?  What am I missing?

In main():

struct scrub_ctx	ctx = {0};

ctx.need_repair is false from the start of the program until the end of
phase 1 when we've decided that yes we can check this xfs filesystem.

--D

> > ---
> >  scrub/xfs_scrub.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > 
> > diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
> > index ab26e63..53a105a 100644
> > --- a/scrub/xfs_scrub.c
> > +++ b/scrub/xfs_scrub.c
> > @@ -514,7 +514,7 @@ report_outcome(
> >  		fprintf(stderr, _("%s: errors found: %llu; warnings found: %llu\n"),
> >  				ctx->mntpoint, total_errors,
> >  				ctx->warnings_found);
> > -	if (ctx->need_repair)
> > +	if (ctx->need_repair && total_errors > 0)
> >  		fprintf(stderr, _("%s: Unmount and run xfs_repair.\n"),
> >  				ctx->mntpoint);
> >  }
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen March 6, 2018, 6:34 p.m. UTC | #3
On 3/6/18 11:27 AM, Darrick J. Wong wrote:
> On Tue, Mar 06, 2018 at 11:16:50AM -0600, Eric Sandeen wrote:
>>
>>
>> On 3/1/18 1:13 PM, Darrick J. Wong wrote:
>>> From: Darrick J. Wong <darrick.wong@oracle.com>
>>>
>>> Don't advise the user to run xfs_repair on a filesystem that triggers
>>> warnings but no errors; there's no corruption for it to fix.
>>>
>>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>>
>> I went looking for why ->need_repair is set if repair isn't needed, and:
>>
>> C symbol: need_repair
>>
>>   File              Function	   Line
>> 0 scrub/xfs_scrub.h <global>        98 bool need_repair;
>> 1 scrub/phase1.c    xfs_setup_fs   239 ctx->need_repair = true;
>> 2 scrub/xfs_scrub.c report_outcome 517 if (ctx->need_repair)
>>
>> um, when is ->need_repair ever false?  What am I missing?
> 
> In main():
> 
> struct scrub_ctx	ctx = {0};
> 
> ctx.need_repair is false from the start of the program until the end of
> phase 1 when we've decided that yes we can check this xfs filesystem.

Ok so after more looking & discussion, what ->need_repair really means
is "we got far enough to run the scrub ioctl?"

If that's true, and errors remain for any reason (?), the user is told
to run repair.

So while I see that this patch improves the user experience, I wonder
if we shouldn't take this opportunity to improve the developer experience
by renaming ->need_repair to ->scrub_ran or something, because I think
that makes a bit more sense semantically:

if (scrub ioctl ran && errors remain)
	tell_user("run repair")

My other quibble is that if (scrub ioctl ran && errors remain) is true only
because "-n" was specified, it seems a little odd to instruct the user
to run repair, when the errors may remain only because of -n.  But that's
a separate issue, I guess.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 6, 2018, 6:53 p.m. UTC | #4
On Tue, Mar 06, 2018 at 12:34:20PM -0600, Eric Sandeen wrote:
> 
> 
> On 3/6/18 11:27 AM, Darrick J. Wong wrote:
> > On Tue, Mar 06, 2018 at 11:16:50AM -0600, Eric Sandeen wrote:
> >>
> >>
> >> On 3/1/18 1:13 PM, Darrick J. Wong wrote:
> >>> From: Darrick J. Wong <darrick.wong@oracle.com>
> >>>
> >>> Don't advise the user to run xfs_repair on a filesystem that triggers
> >>> warnings but no errors; there's no corruption for it to fix.
> >>>
> >>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >>
> >> I went looking for why ->need_repair is set if repair isn't needed, and:
> >>
> >> C symbol: need_repair
> >>
> >>   File              Function	   Line
> >> 0 scrub/xfs_scrub.h <global>        98 bool need_repair;
> >> 1 scrub/phase1.c    xfs_setup_fs   239 ctx->need_repair = true;
> >> 2 scrub/xfs_scrub.c report_outcome 517 if (ctx->need_repair)
> >>
> >> um, when is ->need_repair ever false?  What am I missing?
> > 
> > In main():
> > 
> > struct scrub_ctx	ctx = {0};
> > 
> > ctx.need_repair is false from the start of the program until the end of
> > phase 1 when we've decided that yes we can check this xfs filesystem.
> 
> Ok so after more looking & discussion, what ->need_repair really means
> is "we got far enough to run the scrub ioctl?"
> 
> If that's true, and errors remain for any reason (?), the user is told
> to run repair.
> 
> So while I see that this patch improves the user experience, I wonder
> if we shouldn't take this opportunity to improve the developer experience
> by renaming ->need_repair to ->scrub_ran or something, because I think
> that makes a bit more sense semantically:
> 
> if (scrub ioctl ran && errors remain)
> 	tell_user("run repair")

Ok.  I'll update the name.

> My other quibble is that if (scrub ioctl ran && errors remain) is true only
> because "-n" was specified, it seems a little odd to instruct the user
> to run repair, when the errors may remain only because of -n.  But that's
> a separate issue, I guess.

My thought process here is that any time we leave errors behind on the
filesystem we should advise the caller to run xfs_repair, whether that's
because the caller told us to fix things and we failed, or because the
caller trusts xfs_scrub to find the errors but not to fix them and
therefore ran xfs_scrub -n.  Either way you have a broken fs and need to
repair it.

However, I wonder if you're thinking "the user told (scrub) they didn't
want to change anything, so why would we advise the user to run a
(repair) tool that changes things"?

--D

> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen March 6, 2018, 7 p.m. UTC | #5
On 3/6/18 12:53 PM, Darrick J. Wong wrote:
> On Tue, Mar 06, 2018 at 12:34:20PM -0600, Eric Sandeen wrote:

...

>> My other quibble is that if (scrub ioctl ran && errors remain) is true only
>> because "-n" was specified, it seems a little odd to instruct the user
>> to run repair, when the errors may remain only because of -n.  But that's
>> a separate issue, I guess.
> 
> My thought process here is that any time we leave errors behind on the
> filesystem we should advise the caller to run xfs_repair, whether that's
> because the caller told us to fix things and we failed, or because the
> caller trusts xfs_scrub to find the errors but not to fix them and
> therefore ran xfs_scrub -n.  Either way you have a broken fs and need to
> repair it.
> 
> However, I wonder if you're thinking "the user told (scrub) they didn't
> want to change anything, so why would we advise the user to run a
> (repair) tool that changes things"?

I guess my thinking is that in reality the user has two options and the
tool is issuing a specific instruction to use only one of them.  I don't
think we can guess what the user does or doesn't trust.

Perhaps just something along the lines of

        if (ctx->need_repair) {
                fprintf(stderr, _("%s: Unmount and run xfs_repair.\n"),
                                ctx->mntpoint);
                if (ctx->mode = SCRUB_MODE_DRY_RUN)
                        fprintf(stderr, _("%s: Or, re-run without '-n'.\n"),
                                        ctx->mntpoint);
        }

or whatever ordering/phrasing makes sense?

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 6, 2018, 11:24 p.m. UTC | #6
On Tue, Mar 06, 2018 at 01:00:47PM -0600, Eric Sandeen wrote:
> On 3/6/18 12:53 PM, Darrick J. Wong wrote:
> > On Tue, Mar 06, 2018 at 12:34:20PM -0600, Eric Sandeen wrote:
> 
> ...
> 
> >> My other quibble is that if (scrub ioctl ran && errors remain) is true only
> >> because "-n" was specified, it seems a little odd to instruct the user
> >> to run repair, when the errors may remain only because of -n.  But that's
> >> a separate issue, I guess.
> > 
> > My thought process here is that any time we leave errors behind on the
> > filesystem we should advise the caller to run xfs_repair, whether that's
> > because the caller told us to fix things and we failed, or because the
> > caller trusts xfs_scrub to find the errors but not to fix them and
> > therefore ran xfs_scrub -n.  Either way you have a broken fs and need to
> > repair it.
> > 
> > However, I wonder if you're thinking "the user told (scrub) they didn't
> > want to change anything, so why would we advise the user to run a
> > (repair) tool that changes things"?
> 
> I guess my thinking is that in reality the user has two options and the
> tool is issuing a specific instruction to use only one of them.  I don't
> think we can guess what the user does or doesn't trust.
> 
> Perhaps just something along the lines of
> 
>         if (ctx->need_repair) {
>                 fprintf(stderr, _("%s: Unmount and run xfs_repair.\n"),
>                                 ctx->mntpoint);

"need_repair" has been changed to "scrub_setup_succeeded".

>                 if (ctx->mode = SCRUB_MODE_DRY_RUN)
>                         fprintf(stderr, _("%s: Or, re-run without '-n'.\n"),
>                                         ctx->mntpoint);

I'll do that, but not until the patch that adds fs repair
functionality to xfs_scrub.

--D

>         }
> 
> or whatever ordering/phrasing makes sense?
> 
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index ab26e63..53a105a 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -514,7 +514,7 @@  report_outcome(
 		fprintf(stderr, _("%s: errors found: %llu; warnings found: %llu\n"),
 				ctx->mntpoint, total_errors,
 				ctx->warnings_found);
-	if (ctx->need_repair)
+	if (ctx->need_repair && total_errors > 0)
 		fprintf(stderr, _("%s: Unmount and run xfs_repair.\n"),
 				ctx->mntpoint);
 }