[08/10] xfs_scrub: only retry non-permanent repair failures

Message ID 153006772140.20121.17551661183487526760.stgit@magnolia (mailing list archive)
State Accepted

Commit Message

Darrick J. Wong June 27, 2018, 2:48 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

If a repair fails, we want to retry the repair if the error was a
transient one, such as ENOMEM.  For "permanent" ones (shutdown fs,
repair not supported by kernel, readonly fs) there's no point in
retrying them, so just error out immediately.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/scrub.c |   37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Sandeen July 26, 2018, 1:16 a.m. UTC | #1
On 6/26/18 7:48 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If a repair fails, we want to retry the repair if the error was a
> transient one, such as ENOMEM.  For "permanent" ones (shutdown fs,
> repair not supported by kernel, readonly fs) there's no point to
> retrying them so just error out immediately.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Is there something that stops infinite retries?
(I need to get this all applied for context I guess)

But anyway,
Reviewed-by: Eric Sandeen <sandeen@redhat.com>

Darrick J. Wong July 26, 2018, 1:19 a.m. UTC | #2
On Wed, Jul 25, 2018 at 06:16:15PM -0700, Eric Sandeen wrote:
> On 6/26/18 7:48 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > If a repair fails, we want to retry the repair if the error was a
> > transient one, such as ENOMEM.  For "permanent" ones (shutdown fs,
> > repair not supported by kernel, readonly fs) there's no point to
> > retrying them so just error out immediately.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Is there something that stops infinite retries?
> (I need to get this all applied for context I guess)

If scrub notices the set of unrepaired things stops shrinking it'll try
them all again single-threaded and complain about anything that didn't
get fixed.

--D
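
The termination rule described above can be sketched roughly as follows. This is a minimal illustration, not the xfs_scrub code: repair_fn, repair_until_no_progress and demo_repair_even are hypothetical names, the work queue is a plain array, and the sketch is single-threaded throughout, whereas the reply implies that only the last pass is.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical callback: attempt one repair.  Returns true if the item
 * needs no further attempts, false if it should be requeued.  When
 * complain is true the callee is expected to report failures itself.
 */
typedef bool (*repair_fn)(int item, bool complain);

/*
 * Keep retrying quietly while each pass shrinks the queue.  Once a pass
 * makes no progress (or the queue is empty), run one last pass with
 * complain = true.  Returns the number of items still unfixed.
 */
static size_t
repair_until_no_progress(int *items, size_t nr, repair_fn try_repair)
{
	while (nr > 0) {
		size_t kept = 0;

		for (size_t i = 0; i < nr; i++)
			if (!try_repair(items[i], false))
				items[kept++] = items[i];	/* requeue */

		if (kept == nr)
			break;		/* queue stopped shrinking */
		nr = kept;
	}

	/* Final pass: anything that fails now is reported, not requeued. */
	size_t unfixed = 0;

	for (size_t i = 0; i < nr; i++)
		if (!try_repair(items[i], true))
			unfixed++;
	return unfixed;
}

/* Hypothetical stand-in for a real repair: only even items are fixable. */
static bool
demo_repair_even(int item, bool complain)
{
	(void)complain;
	return item % 2 == 0;
}
```

The quiet loop cannot spin forever: every iteration either removes at least one item from the queue or breaks out, so it runs at most nr times before the final pass.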

> But anyway,
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>

Patch

diff --git a/scrub/scrub.c b/scrub/scrub.c
index 2ac146a9..b20c1cbe 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -758,13 +758,6 @@  xfs_repair_metadata(
 		str_info(ctx, buf, _("Attempting optimization."));
 
 	error = ioctl(fd, XFS_IOC_SCRUB_METADATA, &meta);
-	/*
-	 * If the caller doesn't want us to complain, tell the caller to
-	 * requeue the repair for later and don't say a thing.
-	 */
-	if (!(repair_flags & XRM_COMPLAIN_IF_UNFIXED) &&
-	    (error || needs_repair(&meta)))
-		return CHECK_RETRY;
 	if (error) {
 		switch (errno) {
 		case EDEADLOCK:
@@ -781,6 +774,16 @@  _("Filesystem is shut down, aborting."));
 			return CHECK_ABORT;
 		case ENOTTY:
 		case EOPNOTSUPP:
+			/*
+			 * If we're in no-complain mode, requeue the check for
+			 * later.  It's possible that an error in another
+			 * component caused us to flag an error in this
+			 * component.  Even if the kernel didn't think it
+			 * could fix this, it's at least worth trying the scan
+			 * again to see if another repair fixed it.
+			 */
+			if (!(repair_flags & XRM_COMPLAIN_IF_UNFIXED))
+				return CHECK_RETRY;
 			/*
 			 * If we forced repairs or this is a preen, don't
 			 * error out if the kernel doesn't know how to fix.
@@ -810,7 +813,14 @@  _("Read-only filesystem; cannot make changes."));
 				return CHECK_DONE;
 			/* fall through */
 		default:
-			/* Operational error. */
+			/*
+			 * Operational error.  If the caller doesn't want us
+			 * to complain about repair failures, tell the caller
+			 * to requeue the repair for later and don't say a
+			 * thing.  Otherwise, print error and bail out.
+			 */
+			if (!(repair_flags & XRM_COMPLAIN_IF_UNFIXED))
+				return CHECK_RETRY;
 			str_errno(ctx, buf);
 			return CHECK_DONE;
 		}
@@ -818,9 +828,14 @@  _("Read-only filesystem; cannot make changes."));
 	if (repair_flags & XRM_COMPLAIN_IF_UNFIXED)
 		xfs_scrub_warn_incomplete_scrub(ctx, buf, &meta);
 	if (needs_repair(&meta)) {
-		/* Still broken, try again or fix offline. */
-		if ((repair_flags & XRM_COMPLAIN_IF_UNFIXED) || debug)
-			str_error(ctx, buf,
+		/*
+		 * Still broken; if we've been told not to complain then we
+		 * just requeue this and try again later.  Otherwise we
+		 * log the error loudly and don't try again.
+		 */
+		if (!(repair_flags & XRM_COMPLAIN_IF_UNFIXED))
+			return CHECK_RETRY;
+		str_error(ctx, buf,
 _("Repair unsuccessful; offline repair required."));
 	} else {
 		/* Clean metadata, no corruption remains. */
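
As a reading aid, the decision logic that the hunks above appear to produce can be condensed into one standalone function. This is a simplified sketch under several assumptions: the enum and function names are hypothetical, and ESHUTDOWN and EROFS are assumed to be the errno values behind the "shut down" and "read-only" messages, since those case labels fall outside the hunks shown. It also leaves out message printing, the EDEADLOCK case (its handling is not shown either) and the preen/forced-repair special cases.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome names; scrub/scrub.c uses CHECK_* values. */
enum repair_outcome {
	REPAIR_DONE,	/* nothing more to do for this item */
	REPAIR_RETRY,	/* requeue the item and try again later */
	REPAIR_ABORT,	/* stop the whole run */
};

/*
 * err:          0 if the repair call succeeded, otherwise an errno value.
 * still_broken: the metadata is still flagged as needing repair.
 * complain:     the caller wants failures reported, not silently requeued.
 */
static enum repair_outcome
classify_repair(int err, bool still_broken, bool complain)
{
	switch (err) {
	case 0:
		/* Call succeeded; requeue only if the damage remains. */
		if (still_broken && !complain)
			return REPAIR_RETRY;
		return REPAIR_DONE;
	case ESHUTDOWN:		/* assumed label for "shut down" */
		return REPAIR_ABORT;
	case EROFS:		/* assumed label for "read-only" */
		return REPAIR_DONE;
	case ENOTTY:
	case EOPNOTSUPP:
		/* Another repair may have fixed this; worth a rescan. */
		return complain ? REPAIR_DONE : REPAIR_RETRY;
	default:
		/* Operational error such as ENOMEM. */
		return complain ? REPAIR_DONE : REPAIR_RETRY;
	}
}
```

For example, classify_repair(ENOMEM, true, false) asks for a requeue, while the same error with complain set ends the attempt, which is the point at which the real code prints the error.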