From patchwork Sun Dec 31 22:36:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507914
Date: Sun, 31 Dec 2023 14:36:41 -0800
Subject: [PATCH 1/7] xfs_scrub: flush stdout after printing to it
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404998660.1797322.4141893748731169587.stgit@frogsfrogsfrogs>
In-Reply-To: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
References: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Make sure we flush stdout after printf'ing to it, especially before we
start any operation that could take a while to complete. Most of scrub
already does this, but we missed a couple of spots.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/xfs_scrub.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index a1b67544391..752180d646b 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -535,6 +535,7 @@ _("%s: repairs made: %llu.\n"),
 		fprintf(stdout,
 _("%s: optimizations made: %llu.\n"),
 				ctx->mntpoint, ctx->preens);
+	fflush(stdout);
 }

 static void
@@ -620,6 +621,7 @@ main(
 	int			error;

 	fprintf(stdout, "EXPERIMENTAL xfs_scrub program in use! Use at your own risk!\n");
+	fflush(stdout);

 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");

From patchwork Sun Dec 31 22:36:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507915
Date: Sun, 31 Dec 2023 14:36:57 -0800
Subject: [PATCH 2/7] xfs_scrub: don't report media errors for space with unknowable owner
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404998673.1797322.18151501437365003721.stgit@frogsfrogsfrogs>
In-Reply-To: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
References: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

On filesystems that don't have the reverse mapping feature enabled, the
GETFSMAP call cannot tell us much about the owner of a space extent --
we're limited to static fs metadata, free space, or "unknown". In this
case, nothing is corrupt, so str_corrupt is not an appropriate logging
function. Relax this to str_info so that the user sees a notice that
media errors have been found so that the user knows something bad
happened even if the directory tree walker cannot find the file owning
the space where the media error was found.

Filesystems with rmap enabled are never supposed to return OWN_UNKNOWN
from a GETFSMAP report, so continue to report that as a corruption.

This fixes a regression in xfs/556.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/phase6.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/scrub/phase6.c b/scrub/phase6.c
index 33c3c8bde3c..99a32bc7962 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -397,7 +397,18 @@ report_ioerr_fsmap(
 		snprintf(buf, DESCR_BUFSZ, _("disk offset %"PRIu64),
 				(uint64_t)map->fmr_physical + err_off);
 		type = decode_special_owner(map->fmr_owner);
-		str_corrupt(ctx, buf, _("media error in %s."), type);
+		/*
+		 * On filesystems that don't store reverse mappings, the
+		 * GETFSMAP call returns OWNER_UNKNOWN for allocated space.
+		 * We'll have to let the directory tree walker find the file
+		 * that lost data.
+		 */
+		if (!(ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT) &&
+		    map->fmr_owner == XFS_FMR_OWN_UNKNOWN) {
+			str_info(ctx, buf, _("media error detected."));
+		} else {
+			str_corrupt(ctx, buf, _("media error in %s."), type);
+		}
 	}

 	/* Report extent maps */

From patchwork Sun Dec 31 22:37:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507916
Date: Sun, 31 Dec 2023 14:37:13 -0800
Subject: [PATCH 3/7] xfs_scrub: remove ALP_* flags namespace
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404998686.1797322.5266294710113595532.stgit@frogsfrogsfrogs>
In-Reply-To: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
References: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

In preparation to move all the repair code to repair.[ch], remove the
ALP_* flags namespace since it mostly overlaps with XRM_*. Rename the
clunky "COMPLAIN_IF_UNFIXED" flag to "FINAL_WARNING", because that's
what it really means.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/phase3.c |    2 +-
 scrub/phase4.c |    2 +-
 scrub/phase5.c |    2 +-
 scrub/phase7.c |    2 +-
 scrub/repair.c |    4 ++--
 scrub/repair.h |   16 ++++++++++++----
 scrub/scrub.c  |   10 +++++-----
 scrub/scrub.h  |   10 ----------
 8 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/scrub/phase3.c b/scrub/phase3.c
index 4235c228c0e..9a26b92036c 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -88,7 +88,7 @@ try_inode_repair(
 		return 0;

 	ret = action_list_process(ictx->ctx, fd, alist,
-			ALP_REPAIR_ONLY | ALP_NOPROGRESS);
+			XRM_REPAIR_ONLY | XRM_NOPROGRESS);
 	if (ret)
 		return ret;

diff --git a/scrub/phase4.c b/scrub/phase4.c
index 8807f147aed..d42e67637d8 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -54,7 +54,7 @@ repair_ag(
 	} while (unfixed > 0);

 	/* Try once more, but this time complain if we can't fix things. */
-	flags |= ALP_COMPLAIN_IF_UNFIXED;
+	flags |= XRM_FINAL_WARNING;
 	ret = action_list_process(ctx, -1, alist, flags);
 	if (ret)
 		*aborted = true;
diff --git a/scrub/phase5.c b/scrub/phase5.c
index b4c635d3452..940e434c3cd 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -422,7 +422,7 @@ fs_scan_worker(
 	}

 	ret = action_list_process(ctx, ctx->mnt.fd, &item->alist,
-			ALP_COMPLAIN_IF_UNFIXED | ALP_NOPROGRESS);
+			XRM_FINAL_WARNING | XRM_NOPROGRESS);
 	if (ret) {
 		str_liberror(ctx, ret, _("repairing fs scan metadata"));
 		*item->abortedp = true;
diff --git a/scrub/phase7.c b/scrub/phase7.c
index 93a074f1151..820a68f99a4 100644
--- a/scrub/phase7.c
+++ b/scrub/phase7.c
@@ -122,7 +122,7 @@ phase7_func(
 	if (error)
 		return error;
 	error = action_list_process(ctx, -1, &alist,
-			ALP_COMPLAIN_IF_UNFIXED | ALP_NOPROGRESS);
+			XRM_FINAL_WARNING | XRM_NOPROGRESS);
 	if (error)
 		return error;

diff --git a/scrub/repair.c b/scrub/repair.c
index 9ade805e1b6..61d62ab6b49 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -274,7 +274,7 @@ action_list_process(
 		fix = xfs_repair_metadata(ctx, xfdp, aitem, repair_flags);
 		switch (fix) {
 		case CHECK_DONE:
-			if (!(repair_flags & ALP_NOPROGRESS))
+			if (!(repair_flags & XRM_NOPROGRESS))
 				progress_add(1);
 			alist->nr--;
 			list_del(&aitem->list);
@@ -316,7 +316,7 @@ action_list_process_or_defer(
 	int				ret;

 	ret = action_list_process(ctx, -1, alist,
-			ALP_REPAIR_ONLY | ALP_NOPROGRESS);
+			XRM_REPAIR_ONLY | XRM_NOPROGRESS);
 	if (ret)
 		return ret;

diff --git a/scrub/repair.h b/scrub/repair.h
index aa3ea13615f..6b6f64691a3 100644
--- a/scrub/repair.h
+++ b/scrub/repair.h
@@ -32,10 +32,18 @@ void action_list_find_mustfix(struct action_list *actions,
 		unsigned long long *broken_primaries,
 		unsigned long long *broken_secondaries);

-/* Passed through to xfs_repair_metadata() */
-#define ALP_REPAIR_ONLY		(XRM_REPAIR_ONLY)
-#define ALP_COMPLAIN_IF_UNFIXED	(XRM_COMPLAIN_IF_UNFIXED)
-#define ALP_NOPROGRESS		(1U << 31)
+/*
+ * Only ask the kernel to repair this object if the kernel directly told us it
+ * was corrupt. Objects that are only flagged as having cross-referencing
+ * errors or flagged as eligible for optimization are left for later.
+ */
+#define XRM_REPAIR_ONLY		(1U << 0)
+
+/* This is the last repair attempt; complain if still broken even after fix. */
+#define XRM_FINAL_WARNING	(1U << 1)
+
+/* Don't call progress_add after repairing an item. */
+#define XRM_NOPROGRESS		(1U << 2)

 int action_list_process(struct scrub_ctx *ctx, int fd,
 		struct action_list *alist, unsigned int repair_flags);
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 7cb94af3d15..f4b152a1c9c 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -743,7 +743,7 @@ _("Filesystem is shut down, aborting."));
 		 * could fix this, it's at least worth trying the scan
 		 * again to see if another repair fixed it.
 		 */
-		if (!(repair_flags & XRM_COMPLAIN_IF_UNFIXED))
+		if (!(repair_flags & XRM_FINAL_WARNING))
 			return CHECK_RETRY;
 		fallthrough;
 	case EINVAL:
@@ -773,13 +773,13 @@ _("Read-only filesystem; cannot make changes."));
 		 * to requeue the repair for later and don't say a
 		 * thing. Otherwise, print error and bail out.
 		 */
-		if (!(repair_flags & XRM_COMPLAIN_IF_UNFIXED))
+		if (!(repair_flags & XRM_FINAL_WARNING))
 			return CHECK_RETRY;
 		str_liberror(ctx, error, descr_render(&dsc));
 		return CHECK_DONE;
 	}

-	if (repair_flags & XRM_COMPLAIN_IF_UNFIXED)
+	if (repair_flags & XRM_FINAL_WARNING)
 		scrub_warn_incomplete_scrub(ctx, &dsc, &meta);
 	if (needs_repair(&meta)) {
 		/*
@@ -787,7 +787,7 @@ _("Read-only filesystem; cannot make changes."));
 		 * just requeue this and try again later. Otherwise we
 		 * log the error loudly and don't try again.
 		 */
-		if (!(repair_flags & XRM_COMPLAIN_IF_UNFIXED))
+		if (!(repair_flags & XRM_FINAL_WARNING))
 			return CHECK_RETRY;
 		str_corrupt(ctx, descr_render(&dsc),
 _("Repair unsuccessful; offline repair required."));
@@ -799,7 +799,7 @@ _("Repair unsuccessful; offline repair required."));
 		 * caller to run xfs_repair; otherwise, we'll keep trying to
 		 * reverify the cross-referencing as repairs progress.
 		 */
-		if (repair_flags & XRM_COMPLAIN_IF_UNFIXED) {
+		if (repair_flags & XRM_FINAL_WARNING) {
 			str_info(ctx, descr_render(&dsc),
  _("Seems correct but cross-referencing failed; offline repair recommended."));
 		} else {
diff --git a/scrub/scrub.h b/scrub/scrub.h
index cb33ddb46f3..5359548b06f 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -54,16 +54,6 @@ struct action_item {
 	__u32			agno;
 };

-/*
- * Only ask the kernel to repair this object if the kernel directly told us it
- * was corrupt. Objects that are only flagged as having cross-referencing
- * errors or flagged as eligible for optimization are left for later.
- */
-#define XRM_REPAIR_ONLY		(1U << 0)
-
-/* Complain if still broken even after fix. */
-#define XRM_COMPLAIN_IF_UNFIXED	(1U << 1)
-
 enum check_outcome xfs_repair_metadata(struct scrub_ctx *ctx,
 		struct xfs_fd *xfdp, struct action_item *aitem,
 		unsigned int repair_flags);

From patchwork Sun Dec 31 22:37:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507917
Date: Sun, 31 Dec 2023 14:37:29 -0800
Subject: [PATCH 4/7] xfs_scrub: move repair functions to repair.c
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404998699.1797322.6425738245640649842.stgit@frogsfrogsfrogs>
In-Reply-To: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
References: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Move all the repair functions to repair.c.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/phase1.c        |    2
 scrub/repair.c        |  169 +++++++++++++++++++++++++++++++++++++++++
 scrub/scrub.c         |  204 +------------------------------------------------
 scrub/scrub.h         |    6 -
 scrub/scrub_private.h |   55 +++++++++++++
 5 files changed, 230 insertions(+), 206 deletions(-)
 create mode 100644 scrub/scrub_private.h

diff --git a/scrub/phase1.c b/scrub/phase1.c
index 96138e03e71..81b0918a1c8 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -210,7 +210,7 @@ _("Kernel metadata scrubbing facility is not available."));
 	}

 	/* Do we need kernel-assisted metadata repair? */
-	if (ctx->mode != SCRUB_MODE_DRY_RUN && !xfs_can_repair(ctx)) {
+	if (ctx->mode != SCRUB_MODE_DRY_RUN && !can_repair(ctx)) {
 		str_error(ctx, ctx->mntpoint,
 _("Kernel metadata repair facility is not available. Use -n to scrub."));
 		return ECANCELED;
diff --git a/scrub/repair.c b/scrub/repair.c
index 61d62ab6b49..54bd09575c0 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -10,11 +10,180 @@
 #include
 #include "list.h"
 #include "libfrog/paths.h"
+#include "libfrog/fsgeom.h"
+#include "libfrog/scrub.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "scrub.h"
 #include "progress.h"
 #include "repair.h"
+#include "descr.h"
+#include "scrub_private.h"
+
+/* General repair routines. */
+
+/* Repair some metadata. */
+static enum check_outcome
+xfs_repair_metadata(
+	struct scrub_ctx		*ctx,
+	struct xfs_fd			*xfdp,
+	struct action_item		*aitem,
+	unsigned int			repair_flags)
+{
+	struct xfs_scrub_metadata	meta = { 0 };
+	struct xfs_scrub_metadata	oldm;
+	DEFINE_DESCR(dsc, ctx, format_scrub_descr);
+	int				error;
+
+	assert(aitem->type < XFS_SCRUB_TYPE_NR);
+	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
+	meta.sm_type = aitem->type;
+	meta.sm_flags = aitem->flags | XFS_SCRUB_IFLAG_REPAIR;
+	if (use_force_rebuild)
+		meta.sm_flags |= XFS_SCRUB_IFLAG_FORCE_REBUILD;
+	switch (xfrog_scrubbers[aitem->type].group) {
+	case XFROG_SCRUB_GROUP_AGHEADER:
+	case XFROG_SCRUB_GROUP_PERAG:
+		meta.sm_agno = aitem->agno;
+		break;
+	case XFROG_SCRUB_GROUP_INODE:
+		meta.sm_ino = aitem->ino;
+		meta.sm_gen = aitem->gen;
+		break;
+	default:
+		break;
+	}
+
+	if (!is_corrupt(&meta) && (repair_flags & XRM_REPAIR_ONLY))
+		return CHECK_RETRY;
+
+	memcpy(&oldm, &meta, sizeof(oldm));
+	descr_set(&dsc, &oldm);
+
+	if (needs_repair(&meta))
+		str_info(ctx, descr_render(&dsc), _("Attempting repair."));
+	else if (debug || verbose)
+		str_info(ctx, descr_render(&dsc),
+				_("Attempting optimization."));
+
+	error = -xfrog_scrub_metadata(xfdp, &meta);
+	switch (error) {
+	case 0:
+		/* No operational errors encountered. */
+		break;
+	case EDEADLOCK:
+	case EBUSY:
+		/* Filesystem is busy, try again later. */
+		if (debug || verbose)
+			str_info(ctx, descr_render(&dsc),
+_("Filesystem is busy, deferring repair."));
+		return CHECK_RETRY;
+	case ESHUTDOWN:
+		/* Filesystem is already shut down, abort. */
+		str_error(ctx, descr_render(&dsc),
+_("Filesystem is shut down, aborting."));
+		return CHECK_ABORT;
+	case ENOTTY:
+	case EOPNOTSUPP:
+		/*
+		 * If the kernel cannot perform the optimization that we
+		 * requested; or we forced a repair but the kernel doesn't know
+		 * how to perform the repair, don't requeue the request. Mark
+		 * it done and move on.
+		 */
+		if (is_unoptimized(&oldm) ||
+		    debug_tweak_on("XFS_SCRUB_FORCE_REPAIR"))
+			return CHECK_DONE;
+		/*
+		 * If we're in no-complain mode, requeue the check for
+		 * later. It's possible that an error in another
+		 * component caused us to flag an error in this
+		 * component. Even if the kernel didn't think it
+		 * could fix this, it's at least worth trying the scan
+		 * again to see if another repair fixed it.
+		 */
+		if (!(repair_flags & XRM_FINAL_WARNING))
+			return CHECK_RETRY;
+		fallthrough;
+	case EINVAL:
+		/* Kernel doesn't know how to repair this? */
+		str_corrupt(ctx, descr_render(&dsc),
+_("Don't know how to fix; offline repair required."));
+		return CHECK_DONE;
+	case EROFS:
+		/* Read-only filesystem, can't fix. */
+		if (verbose || debug || needs_repair(&oldm))
+			str_error(ctx, descr_render(&dsc),
+_("Read-only filesystem; cannot make changes."));
+		return CHECK_ABORT;
+	case ENOENT:
+		/* Metadata not present, just skip it. */
+		return CHECK_DONE;
+	case ENOMEM:
+	case ENOSPC:
+		/* Don't care if preen fails due to low resources. */
+		if (is_unoptimized(&oldm) && !needs_repair(&oldm))
+			return CHECK_DONE;
+		fallthrough;
+	default:
+		/*
+		 * Operational error. If the caller doesn't want us
+		 * to complain about repair failures, tell the caller
+		 * to requeue the repair for later and don't say a
+		 * thing. Otherwise, print error and bail out.
+		 */
+		if (!(repair_flags & XRM_FINAL_WARNING))
+			return CHECK_RETRY;
+		str_liberror(ctx, error, descr_render(&dsc));
+		return CHECK_DONE;
+	}
+
+	if (repair_flags & XRM_FINAL_WARNING)
+		scrub_warn_incomplete_scrub(ctx, &dsc, &meta);
+	if (needs_repair(&meta)) {
+		/*
+		 * Still broken; if we've been told not to complain then we
+		 * just requeue this and try again later. Otherwise we
+		 * log the error loudly and don't try again.
+		 */
+		if (!(repair_flags & XRM_FINAL_WARNING))
+			return CHECK_RETRY;
+		str_corrupt(ctx, descr_render(&dsc),
+_("Repair unsuccessful; offline repair required."));
+	} else if (xref_failed(&meta)) {
+		/*
+		 * This metadata object itself looks ok, but we still noticed
+		 * inconsistencies when comparing it with the other filesystem
+		 * metadata. If we're in "final warning" mode, advise the
+		 * caller to run xfs_repair; otherwise, we'll keep trying to
+		 * reverify the cross-referencing as repairs progress.
+		 */
+		if (repair_flags & XRM_FINAL_WARNING) {
+			str_info(ctx, descr_render(&dsc),
+ _("Seems correct but cross-referencing failed; offline repair recommended."));
+		} else {
+			if (verbose)
+				str_info(ctx, descr_render(&dsc),
+ _("Seems correct but cross-referencing failed; will keep checking."));
+			return CHECK_RETRY;
+		}
+	} else {
+		/* Clean operation, no corruption detected. */
+		if (is_corrupt(&oldm))
+			record_repair(ctx, descr_render(&dsc),
+					_("Repairs successful."));
+		else if (xref_disagrees(&oldm))
+			record_repair(ctx, descr_render(&dsc),
+ _("Repairs successful after discrepancy in cross-referencing."));
+		else if (xref_failed(&oldm))
+			record_repair(ctx, descr_render(&dsc),
+ _("Repairs successful after cross-referencing failure."));
+		else
+			record_preen(ctx, descr_render(&dsc),
+					_("Optimization successful."));
+	}
+	return CHECK_DONE;
+}

 /*
  * Prioritize action items in order of how long we can wait.
diff --git a/scrub/scrub.c b/scrub/scrub.c
index f4b152a1c9c..59583913031 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -20,11 +20,12 @@
 #include "scrub.h"
 #include "repair.h"
 #include "descr.h"
+#include "scrub_private.h"

 /* Online scrub and repair wrappers. */

 /* Format a scrub description. */
-static int
+int
 format_scrub_descr(
 	struct scrub_ctx		*ctx,
 	char				*buf,
@@ -52,46 +53,8 @@ format_scrub_descr(
 	return -1;
 }

-/* Predicates for scrub flag state. */
-
-static inline bool is_corrupt(struct xfs_scrub_metadata *sm)
-{
-	return sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT;
-}
-
-static inline bool is_unoptimized(struct xfs_scrub_metadata *sm)
-{
-	return sm->sm_flags & XFS_SCRUB_OFLAG_PREEN;
-}
-
-static inline bool xref_failed(struct xfs_scrub_metadata *sm)
-{
-	return sm->sm_flags & XFS_SCRUB_OFLAG_XFAIL;
-}
-
-static inline bool xref_disagrees(struct xfs_scrub_metadata *sm)
-{
-	return sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT;
-}
-
-static inline bool is_incomplete(struct xfs_scrub_metadata *sm)
-{
-	return sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE;
-}
-
-static inline bool is_suspicious(struct xfs_scrub_metadata *sm)
-{
-	return sm->sm_flags & XFS_SCRUB_OFLAG_WARNING;
-}
-
-/* Should we fix it? */
-static inline bool needs_repair(struct xfs_scrub_metadata *sm)
-{
-	return is_corrupt(sm) || xref_disagrees(sm);
-}
-
 /* Warn about strange circumstances after scrub. */
-static inline void
+void
 scrub_warn_incomplete_scrub(
 	struct scrub_ctx		*ctx,
 	struct descr			*dsc,
@@ -647,7 +610,7 @@ can_scrub_parent(
 }

 bool
-xfs_can_repair(
+can_repair(
 	struct scrub_ctx	*ctx)
 {
 	return __scrub_test(ctx, XFS_SCRUB_TYPE_PROBE, XFS_SCRUB_IFLAG_REPAIR);
@@ -660,162 +623,3 @@ can_force_rebuild(
 	return __scrub_test(ctx, XFS_SCRUB_TYPE_PROBE,
 			XFS_SCRUB_IFLAG_REPAIR | XFS_SCRUB_IFLAG_FORCE_REBUILD);
 }
-
-/* General repair routines. */
-
-/* Repair some metadata. */
-enum check_outcome
-xfs_repair_metadata(
-	struct scrub_ctx		*ctx,
-	struct xfs_fd			*xfdp,
-	struct action_item		*aitem,
-	unsigned int			repair_flags)
-{
-	struct xfs_scrub_metadata	meta = { 0 };
-	struct xfs_scrub_metadata	oldm;
-	DEFINE_DESCR(dsc, ctx, format_scrub_descr);
-	int				error;
-
-	assert(aitem->type < XFS_SCRUB_TYPE_NR);
-	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
-	meta.sm_type = aitem->type;
-	meta.sm_flags = aitem->flags | XFS_SCRUB_IFLAG_REPAIR;
-	if (use_force_rebuild)
-		meta.sm_flags |= XFS_SCRUB_IFLAG_FORCE_REBUILD;
-	switch (xfrog_scrubbers[aitem->type].group) {
-	case XFROG_SCRUB_GROUP_AGHEADER:
-	case XFROG_SCRUB_GROUP_PERAG:
-		meta.sm_agno = aitem->agno;
-		break;
-	case XFROG_SCRUB_GROUP_INODE:
-		meta.sm_ino = aitem->ino;
-		meta.sm_gen = aitem->gen;
-		break;
-	default:
-		break;
-	}
-
-	if (!is_corrupt(&meta) && (repair_flags & XRM_REPAIR_ONLY))
-		return CHECK_RETRY;
-
-	memcpy(&oldm, &meta, sizeof(oldm));
-	descr_set(&dsc, &oldm);
-
-	if (needs_repair(&meta))
-		str_info(ctx, descr_render(&dsc), _("Attempting repair."));
-	else if (debug || verbose)
-		str_info(ctx, descr_render(&dsc),
-				_("Attempting optimization."));
-
-	error = -xfrog_scrub_metadata(xfdp, &meta);
-	switch (error) {
-	case 0:
-		/* No operational errors encountered. */
-		break;
-	case EDEADLOCK:
-	case EBUSY:
-		/* Filesystem is busy, try again later. */
-		if (debug || verbose)
-			str_info(ctx, descr_render(&dsc),
-_("Filesystem is busy, deferring repair."));
-		return CHECK_RETRY;
-	case ESHUTDOWN:
-		/* Filesystem is already shut down, abort. */
-		str_error(ctx, descr_render(&dsc),
-_("Filesystem is shut down, aborting."));
-		return CHECK_ABORT;
-	case ENOTTY:
-	case EOPNOTSUPP:
-		/*
-		 * If the kernel cannot perform the optimization that we
-		 * requested; or we forced a repair but the kernel doesn't know
-		 * how to perform the repair, don't requeue the request. Mark
-		 * it done and move on.
-		 */
-		if (is_unoptimized(&oldm) ||
-		    debug_tweak_on("XFS_SCRUB_FORCE_REPAIR"))
-			return CHECK_DONE;
-		/*
-		 * If we're in no-complain mode, requeue the check for
-		 * later. It's possible that an error in another
-		 * component caused us to flag an error in this
-		 * component. Even if the kernel didn't think it
-		 * could fix this, it's at least worth trying the scan
-		 * again to see if another repair fixed it.
-		 */
-		if (!(repair_flags & XRM_FINAL_WARNING))
-			return CHECK_RETRY;
-		fallthrough;
-	case EINVAL:
-		/* Kernel doesn't know how to repair this? */
-		str_corrupt(ctx, descr_render(&dsc),
-_("Don't know how to fix; offline repair required."));
-		return CHECK_DONE;
-	case EROFS:
-		/* Read-only filesystem, can't fix. */
-		if (verbose || debug || needs_repair(&oldm))
-			str_error(ctx, descr_render(&dsc),
-_("Read-only filesystem; cannot make changes."));
-		return CHECK_ABORT;
-	case ENOENT:
-		/* Metadata not present, just skip it. */
-		return CHECK_DONE;
-	case ENOMEM:
-	case ENOSPC:
-		/* Don't care if preen fails due to low resources. */
-		if (is_unoptimized(&oldm) && !needs_repair(&oldm))
-			return CHECK_DONE;
-		fallthrough;
-	default:
-		/*
-		 * Operational error. If the caller doesn't want us
-		 * to complain about repair failures, tell the caller
-		 * to requeue the repair for later and don't say a
-		 * thing. Otherwise, print error and bail out.
-		 */
-		if (!(repair_flags & XRM_FINAL_WARNING))
-			return CHECK_RETRY;
-		str_liberror(ctx, error, descr_render(&dsc));
-		return CHECK_DONE;
-	}
-
-	if (repair_flags & XRM_FINAL_WARNING)
-		scrub_warn_incomplete_scrub(ctx, &dsc, &meta);
-	if (needs_repair(&meta)) {
-		/*
-		 * Still broken; if we've been told not to complain then we
-		 * just requeue this and try again later. Otherwise we
-		 * log the error loudly and don't try again.
- */ - if (!(repair_flags & XRM_FINAL_WARNING)) - return CHECK_RETRY; - str_corrupt(ctx, descr_render(&dsc), -_("Repair unsuccessful; offline repair required.")); - } else if (xref_failed(&meta)) { - /* - * This metadata object itself looks ok, but we still noticed - * inconsistencies when comparing it with the other filesystem - * metadata. If we're in "final warning" mode, advise the - * caller to run xfs_repair; otherwise, we'll keep trying to - * reverify the cross-referencing as repairs progress. - */ - if (repair_flags & XRM_FINAL_WARNING) { - str_info(ctx, descr_render(&dsc), - _("Seems correct but cross-referencing failed; offline repair recommended.")); - } else { - if (verbose) - str_info(ctx, descr_render(&dsc), - _("Seems correct but cross-referencing failed; will keep checking.")); - return CHECK_RETRY; - } - } else { - /* Clean operation, no corruption detected. */ - if (needs_repair(&oldm)) - record_repair(ctx, descr_render(&dsc), - _("Repairs successful.")); - else - record_preen(ctx, descr_render(&dsc), - _("Optimization successful.")); - } - return CHECK_DONE; -} diff --git a/scrub/scrub.h b/scrub/scrub.h index 5359548b06f..133445e8da6 100644 --- a/scrub/scrub.h +++ b/scrub/scrub.h @@ -38,7 +38,7 @@ bool can_scrub_dir(struct scrub_ctx *ctx); bool can_scrub_attr(struct scrub_ctx *ctx); bool can_scrub_symlink(struct scrub_ctx *ctx); bool can_scrub_parent(struct scrub_ctx *ctx); -bool xfs_can_repair(struct scrub_ctx *ctx); +bool can_repair(struct scrub_ctx *ctx); bool can_force_rebuild(struct scrub_ctx *ctx); int scrub_file(struct scrub_ctx *ctx, int fd, const struct xfs_bulkstat *bstat, @@ -54,8 +54,4 @@ struct action_item { __u32 agno; }; -enum check_outcome xfs_repair_metadata(struct scrub_ctx *ctx, - struct xfs_fd *xfdp, struct action_item *aitem, - unsigned int repair_flags); - #endif /* XFS_SCRUB_SCRUB_H_ */ diff --git a/scrub/scrub_private.h b/scrub/scrub_private.h new file mode 100644 index 00000000000..a24d485a286 --- /dev/null +++ 
b/scrub/scrub_private.h @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2021-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef XFS_SCRUB_SCRUB_PRIVATE_H_ +#define XFS_SCRUB_SCRUB_PRIVATE_H_ + +/* Shared code between scrub.c and repair.c. */ + +int format_scrub_descr(struct scrub_ctx *ctx, char *buf, size_t buflen, + void *where); + +/* Predicates for scrub flag state. */ + +static inline bool is_corrupt(struct xfs_scrub_metadata *sm) +{ + return sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT; +} + +static inline bool is_unoptimized(struct xfs_scrub_metadata *sm) +{ + return sm->sm_flags & XFS_SCRUB_OFLAG_PREEN; +} + +static inline bool xref_failed(struct xfs_scrub_metadata *sm) +{ + return sm->sm_flags & XFS_SCRUB_OFLAG_XFAIL; +} + +static inline bool xref_disagrees(struct xfs_scrub_metadata *sm) +{ + return sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT; +} + +static inline bool is_incomplete(struct xfs_scrub_metadata *sm) +{ + return sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE; +} + +static inline bool is_suspicious(struct xfs_scrub_metadata *sm) +{ + return sm->sm_flags & XFS_SCRUB_OFLAG_WARNING; +} + +/* Should we fix it? */ +static inline bool needs_repair(struct xfs_scrub_metadata *sm) +{ + return is_corrupt(sm) || xref_disagrees(sm); +} + +void scrub_warn_incomplete_scrub(struct scrub_ctx *ctx, struct descr *dsc, + struct xfs_scrub_metadata *meta); + +#endif /* XFS_SCRUB_SCRUB_PRIVATE_H_ */ From patchwork Sun Dec 31 22:37:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
 Wong"
X-Patchwork-Id: 13507918
Date: Sun, 31 Dec 2023 14:37:44 -0800
Subject: [PATCH 5/7] xfs_scrub: log when a repair was unnecessary
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404998713.1797322.10083768087196595064.stgit@frogsfrogsfrogs>
In-Reply-To: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
References: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

If the kernel tells us that a filesystem object didn't need repairs, we
should log that with a message specific to that outcome.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/repair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/scrub/repair.c b/scrub/repair.c
index 54bd09575c0..50f168d24fe 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -167,6 +167,10 @@ _("Repair unsuccessful; offline repair required."));
 _("Seems correct but cross-referencing failed; will keep checking."));
 			return CHECK_RETRY;
 		}
+	} else if (meta.sm_flags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED) {
+		if (verbose)
+			str_info(ctx, descr_render(&dsc),
+					_("No modification needed."));
 	} else {
 		/* Clean operation, no corruption detected. */
 		if (is_corrupt(&oldm))

From patchwork Sun Dec 31 22:38:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507919
Date: Sun, 31 Dec 2023 14:38:00 -0800
Subject: [PATCH 6/7] xfs_scrub: require primary
 superblock repairs to complete before proceeding
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404998726.1797322.10938356067740131087.stgit@frogsfrogsfrogs>
In-Reply-To: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
References: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Phase 2 of the xfs_scrub program calls the kernel to check the primary
superblock before scanning the rest of the filesystem. Though doing so
is a no-op now (since the primary super must pass all checks as a
prerequisite for mounting), the goal of this code is to enable future
kernel code to intercept an xfs_scrub run before it actually does
anything.

If this some day involves fixing the primary superblock, it seems
reasonable to require that /all/ repairs complete successfully before
moving on to the rest of the filesystem. Unfortunately, that's not what
xfs_scrub does now -- primary super repairs that fail are theoretically
deferred to phase 4! So make this mandatory.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/phase2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scrub/phase2.c b/scrub/phase2.c
index 80c77b2876f..2d49c604eae 100644
--- a/scrub/phase2.c
+++ b/scrub/phase2.c
@@ -174,7 +174,8 @@ phase2_func(
 	ret = scrub_primary_super(ctx, &alist);
 	if (ret)
 		goto out_wq;
-	ret = action_list_process_or_defer(ctx, 0, &alist);
+	ret = action_list_process(ctx, -1, &alist,
+			XRM_FINAL_WARNING | XRM_NOPROGRESS);
 	if (ret)
 		goto out_wq;
 

From patchwork Sun Dec 31 22:38:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 13507920
Date: Sun, 31 Dec 2023 14:38:15 -0800
Subject: [PATCH 7/7] xfs_scrub: actually try to fix summary counters ahead of repairs
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404998739.1797322.2676507559663047353.stgit@frogsfrogsfrogs>
In-Reply-To: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
References: <170404998642.1797322.3177048972598846181.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

A while ago, I decided to make phase 4 check the summary counters before
it starts any other repairs, having observed that repairs of primary
metadata can fail because the summary counters (incorrectly) claim that
there aren't enough free resources in the filesystem.

However, if problems are found in the summary counters, the repair work
will be run as part of the AG 0 repairs, which means that it runs
concurrently with other scrubbers. This doesn't quite get us to the
intended goal, so try to fix the summary counters ahead of time. If that
fails, tough, we'll get back to it in phase 7 if scrub gets that far.

Fixes: cbaf1c9d91a0 ("xfs_scrub: check summary counters")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/phase4.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/scrub/phase4.c b/scrub/phase4.c
index d42e67637d8..0c67abf64a3 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -129,6 +129,7 @@ phase4_func(
 	struct scrub_ctx	*ctx)
 {
 	struct xfs_fsop_geom	fsgeom;
+	struct action_list	alist;
 	int			ret;
 
 	if (!have_action_items(ctx))
@@ -136,11 +137,13 @@ phase4_func(
 
 	/*
 	 * Check the summary counters early. Normally we do this during phase
-	 * seven, but some of the cross-referencing requires fairly-accurate
-	 * counters, so counter repairs have to be put on the list now so that
-	 * they get fixed before we stop retrying unfixed metadata repairs.
+	 * seven, but some of the cross-referencing requires fairly accurate
+	 * summary counters. Check and try to repair them now to minimize the
+	 * chance that repairs of primary metadata fail due to secondary
+	 * metadata. If repairs fail, we'll come back during phase 7.
 	 */
-	ret = scrub_fs_counters(ctx, &ctx->action_lists[0]);
+	action_list_init(&alist);
+	ret = scrub_fs_counters(ctx, &alist);
 	if (ret)
 		return ret;
 
@@ -155,11 +158,18 @@ phase4_func(
 		return ret;
 
 	if (fsgeom.sick & XFS_FSOP_GEOM_SICK_QUOTACHECK) {
-		ret = scrub_quotacheck(ctx, &ctx->action_lists[0]);
+		ret = scrub_quotacheck(ctx, &alist);
 		if (ret)
 			return ret;
 	}
 
+	/* Repair counters before starting on the rest. */
+	ret = action_list_process(ctx, -1, &alist,
+			XRM_REPAIR_ONLY | XRM_NOPROGRESS);
+	if (ret)
+		return ret;
+	action_list_discard(&alist);
+
 	ret = repair_everything(ctx);
 	if (ret)
 		return ret;
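
Editor's note on patch 7/7: the ordering argument can be illustrated with
a toy model. What follows is only a minimal sketch and not the xfs_scrub
code. struct fs_state, the FIX_* kinds, list_process() and phase4_sketch()
are hypothetical stand-ins invented for illustration, and the model
assumes (as the commit message describes) that a primary metadata repair
fails for as long as the summary counters are wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in xfs_scrub. */
#define MAX_ITEMS 8

enum kind { FIX_COUNTERS, FIX_PRIMARY };

struct fs_state {
	bool	counters_ok;	/* are the summary counters accurate? */
	int	repaired;	/* number of successful repairs so far */
};

struct action_list {
	enum kind	items[MAX_ITEMS];
	size_t		nr;
};

static void list_init(struct action_list *al)
{
	al->nr = 0;
}

static void list_add(struct action_list *al, enum kind k)
{
	assert(al->nr < MAX_ITEMS);
	al->items[al->nr++] = k;
}

/* Assumption: primary metadata repairs fail while the counters are bad. */
static bool repair_one(struct fs_state *fs, enum kind k)
{
	if (k == FIX_PRIMARY && !fs->counters_ok)
		return false;
	if (k == FIX_COUNTERS)
		fs->counters_ok = true;
	fs->repaired++;
	return true;
}

/* Try every queued item once, in order; return how many repairs failed. */
static size_t list_process(struct fs_state *fs, struct action_list *al)
{
	size_t failed = 0;

	for (size_t i = 0; i < al->nr; i++)
		if (!repair_one(fs, al->items[i]))
			failed++;
	al->nr = 0;	/* the list is emptied whether or not repairs worked */
	return failed;
}

/* Counters go on a private list that is processed before everything else. */
static size_t phase4_sketch(struct fs_state *fs, struct action_list *rest)
{
	struct action_list counters;

	list_init(&counters);
	if (!fs->counters_ok)
		list_add(&counters, FIX_COUNTERS);

	/* A failed counter repair is tolerated here; a later phase rechecks. */
	(void)list_process(fs, &counters);

	return list_process(fs, rest);
}
```

In this model, a primary repair that shares a list with the counter
repair and happens to run first fails, whereas phase4_sketch() lets the
same primary repair succeed because the counters are already fixed by
the time it runs.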