From patchwork Fri Dec 30 22:18:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FC6CC4332F for ; Sat, 31 Dec 2022 00:27:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235962AbiLaA1m (ORCPT ); Fri, 30 Dec 2022 19:27:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60100 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235614AbiLaA1i (ORCPT ); Fri, 30 Dec 2022 19:27:38 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B4E871EAF0 for ; Fri, 30 Dec 2022 16:27:32 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 39EEC61D2F for ; Sat, 31 Dec 2022 00:27:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9AAFFC433EF; Sat, 31 Dec 2022 00:27:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672446451; bh=SIXqsmji5aPkWUHC/tY/HcaiXEhzLXTS6EetHWoXSQ0=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=B5VjabgLEkHlyqUaXtuLj+rDjHeLhql65S+b4UHcRE/r+UmHDpKf7xJeGkcioCGza 8+os369e07Zx+MvpTugzNgsgjz2bG+cdv7Pk7zjk1D+9Xui1hkZTLiyPoFRB8p5Ldc D4QA2DLAKyhuj5b94KCOIhMRMC42xNz6FviF2Cfx0tVdTto1FLX42VjWklAz6uCcSP 59T+o7TvYksQZXEGVTLSf9+r6ni8FwZak70kf6nZ+QjTXJ/w4dCHkZ0ET3zWjbltL7 NlSlg1MUHf6YSGxczoEcYAGItBWUpgGodvXePTj1vuUlIwg9zYvtaRFJL2sSabLfSY 7m3B7LYkWA3TA== Subject: [PATCH 1/9] xfs_scrub: track repair items 
by principal, not by individual repairs From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:18:17 -0800 Message-ID: <167243869726.715746.4221238033641226521.stgit@magnolia> In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia> References: <167243869711.715746.14725730988345960302.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Create a new structure to track scrub and repair state by principal filesystem object (e.g. ag number or inode number/generation) so that we can more easily examine and ensure that we satisfy repair order dependencies. This transposition will eventually enable bulk scrub operations and will also save a lot of memory if a given object needs a lot of work. Signed-off-by: Darrick J. Wong --- scrub/phase1.c | 4 ++ scrub/phase2.c | 14 ++++++-- scrub/phase3.c | 19 ++++++----- scrub/phase4.c | 6 ++-- scrub/phase5.c | 5 ++- scrub/phase7.c | 4 ++ scrub/scrub.c | 68 ++++++++++++++++++++++++++++++++-------- scrub/scrub.h | 83 +++++++++++++++++++++++++++++++++++++++++++++---- scrub/scrub_private.h | 19 +++++++++++ 9 files changed, 185 insertions(+), 37 deletions(-) diff --git a/scrub/phase1.c b/scrub/phase1.c index 047631802e4..3113fc5ccf6 100644 --- a/scrub/phase1.c +++ b/scrub/phase1.c @@ -52,6 +52,7 @@ static int report_to_kernel( struct scrub_ctx *ctx) { + struct scrub_item sri; struct action_list alist; int ret; @@ -60,8 +61,9 @@ report_to_kernel( ctx->warnings_found) return 0; + scrub_item_init_fs(&sri); action_list_init(&alist); - ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_HEALTHY, 0, &alist); + ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_HEALTHY, 0, &alist, &sri); if (ret) return ret; diff --git a/scrub/phase2.c b/scrub/phase2.c index a78d15aac1f..50c2c88276f 100644 --- a/scrub/phase2.c +++ b/scrub/phase2.c @@ -57,6 +57,7 @@ scan_ag_metadata( xfs_agnumber_t agno, void 
*arg) { + struct scrub_item sri; struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx; struct scan_ctl *sctl = arg; struct action_list alist; @@ -68,6 +69,7 @@ scan_ag_metadata( if (sctl->aborted) return; + scrub_item_init_ag(&sri, agno); action_list_init(&alist); action_list_init(&immediate_alist); snprintf(descr, DESCR_BUFSZ, _("AG %u"), agno); @@ -76,7 +78,7 @@ scan_ag_metadata( * First we scrub and fix the AG headers, because we need * them to work well enough to check the AG btrees. */ - ret = scrub_ag_headers(ctx, agno, &alist); + ret = scrub_ag_headers(ctx, agno, &alist, &sri); if (ret) goto err; @@ -86,7 +88,7 @@ scan_ag_metadata( goto err; /* Now scrub the AG btrees. */ - ret = scrub_ag_metadata(ctx, agno, &alist); + ret = scrub_ag_metadata(ctx, agno, &alist, &sri); if (ret) goto err; @@ -120,6 +122,7 @@ scan_metafile( xfs_agnumber_t type, void *arg) { + struct scrub_item sri; struct action_list alist; struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx; struct scan_ctl *sctl = arg; @@ -129,8 +132,9 @@ scan_metafile( if (sctl->aborted) goto out; + scrub_item_init_fs(&sri); action_list_init(&alist); - ret = scrub_metadata_file(ctx, type, &alist); + ret = scrub_metadata_file(ctx, type, &alist, &sri); if (ret) { sctl->aborted = true; goto out; @@ -162,6 +166,7 @@ phase2_func( .rbm_done = false, }; struct action_list alist; + struct scrub_item sri; const struct xfrog_scrub_descr *sc = xfrog_scrubbers; xfs_agnumber_t agno; unsigned int type; @@ -183,8 +188,9 @@ phase2_func( * upgrades) off of the sb 0 scrubber (which currently does nothing). * If errors occur, this function will log them and return nonzero. 
*/ + scrub_item_init_ag(&sri, 0); action_list_init(&alist); - ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_SB, 0, &alist); + ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_SB, 0, &alist, &sri); if (ret) goto out_wq; ret = action_list_process(ctx, -1, &alist, diff --git a/scrub/phase3.c b/scrub/phase3.c index ef41ee8049d..ef22a1d11c1 100644 --- a/scrub/phase3.c +++ b/scrub/phase3.c @@ -105,12 +105,14 @@ scrub_inode( void *arg) { struct action_list alist; + struct scrub_item sri; struct scrub_inode_ctx *ictx = arg; struct ptcounter *icount = ictx->icount; xfs_agnumber_t agno; int fd = -1; int error; + scrub_item_init_file(&sri, bstat); action_list_init(&alist); agno = cvt_ino_to_agno(&ctx->mnt, bstat->bs_ino); background_sleep(); @@ -143,7 +145,7 @@ scrub_inode( fd = scrub_open_handle(handle); /* Scrub the inode. */ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_INODE, &alist); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_INODE, &alist, &sri); if (error) goto out; @@ -152,13 +154,13 @@ scrub_inode( goto out; /* Scrub all block mappings. */ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTD, &alist); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTD, &alist, &sri); if (error) goto out; - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTA, &alist); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTA, &alist, &sri); if (error) goto out; - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTC, &alist); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTC, &alist, &sri); if (error) goto out; @@ -169,21 +171,22 @@ scrub_inode( if (S_ISLNK(bstat->bs_mode)) { /* Check symlink contents. */ error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_SYMLINK, - &alist); + &alist, &sri); } else if (S_ISDIR(bstat->bs_mode)) { /* Check the directory entries. 
*/ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_DIR, &alist); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_DIR, &alist, + &sri); } if (error) goto out; /* Check all the extended attributes. */ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_XATTR, &alist); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_XATTR, &alist, &sri); if (error) goto out; /* Check parent pointers. */ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_PARENT, &alist); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_PARENT, &alist, &sri); if (error) goto out; diff --git a/scrub/phase4.c b/scrub/phase4.c index df9b066cfd2..31939653bda 100644 --- a/scrub/phase4.c +++ b/scrub/phase4.c @@ -130,6 +130,7 @@ phase4_func( { struct xfs_fsop_geom fsgeom; struct action_list alist; + struct scrub_item sri; int ret; if (!have_action_items(ctx)) @@ -142,8 +143,9 @@ phase4_func( * chance that repairs of primary metadata fail due to secondary * metadata. If repairs fails, we'll come back during phase 7. */ + scrub_item_init_fs(&sri); action_list_init(&alist); - ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_FSCOUNTERS, 0, &alist); + ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_FSCOUNTERS, 0, &alist, &sri); if (ret) return ret; @@ -159,7 +161,7 @@ phase4_func( if (fsgeom.sick & XFS_FSOP_GEOM_SICK_QUOTACHECK) { ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_QUOTACHECK, 0, - &alist); + &alist, &sri); if (ret) return ret; } diff --git a/scrub/phase5.c b/scrub/phase5.c index e598ffd3985..ea77c2a5298 100644 --- a/scrub/phase5.c +++ b/scrub/phase5.c @@ -384,6 +384,7 @@ check_fs_label( } struct iscan_item { + struct scrub_item sri; struct action_list alist; bool *abortedp; unsigned int scrub_type; @@ -411,7 +412,8 @@ iscan_worker( nanosleep(&tv, NULL); } - ret = scrub_meta_type(ctx, item->scrub_type, 0, &item->alist); + ret = scrub_meta_type(ctx, item->scrub_type, 0, &item->alist, + &item->sri); if (ret) { str_liberror(ctx, ret, _("checking iscan metadata")); *item->abortedp = true; @@ -449,6 +451,7 @@ 
queue_iscan( str_liberror(ctx, ret, _("setting up iscan")); return ret; } + scrub_item_init_fs(&item->sri); action_list_init(&item->alist); item->scrub_type = scrub_type; item->abortedp = abortedp; diff --git a/scrub/phase7.c b/scrub/phase7.c index e9cb40f48d8..ddc1e3b24e3 100644 --- a/scrub/phase7.c +++ b/scrub/phase7.c @@ -99,6 +99,7 @@ phase7_func( struct scrub_ctx *ctx) { struct summary_counts totalcount = {0}; + struct scrub_item sri; struct action_list alist; struct ptvar *ptvar; unsigned long long used_data; @@ -117,8 +118,9 @@ phase7_func( int error; /* Check and fix the summary metadata. */ + scrub_item_init_fs(&sri); action_list_init(&alist); - error = scrub_summary_metadata(ctx, &alist); + error = scrub_summary_metadata(ctx, &alist, &sri); if (error) return error; error = action_list_process(ctx, -1, &alist, diff --git a/scrub/scrub.c b/scrub/scrub.c index fe4603f863b..55653b31c4c 100644 --- a/scrub/scrub.c +++ b/scrub/scrub.c @@ -264,7 +264,8 @@ scrub_meta_type( struct scrub_ctx *ctx, unsigned int type, xfs_agnumber_t agno, - struct action_list *alist) + struct action_list *alist, + struct scrub_item *sri) { struct xfs_scrub_metadata meta = { .sm_type = type, @@ -283,11 +284,13 @@ scrub_meta_type( case CHECK_ABORT: return ECANCELED; case CHECK_REPAIR: + scrub_item_save_state(sri, type, meta.sm_flags); ret = scrub_save_repair(ctx, alist, &meta); if (ret) return ret; fallthrough; case CHECK_DONE: + scrub_item_clean_state(sri, type); return 0; default: /* CHECK_RETRY should never happen. 
*/ @@ -305,7 +308,8 @@ scrub_group( struct scrub_ctx *ctx, enum xfrog_scrub_group group, xfs_agnumber_t agno, - struct action_list *alist) + struct action_list *alist, + struct scrub_item *sri) { const struct xfrog_scrub_descr *sc; unsigned int type; @@ -317,7 +321,7 @@ scrub_group( if (sc->group != group) continue; - ret = scrub_meta_type(ctx, type, agno, alist); + ret = scrub_meta_type(ctx, type, agno, alist, sri); if (ret) return ret; } @@ -330,9 +334,10 @@ int scrub_ag_headers( struct scrub_ctx *ctx, xfs_agnumber_t agno, - struct action_list *alist) + struct action_list *alist, + struct scrub_item *sri) { - return scrub_group(ctx, XFROG_SCRUB_GROUP_AGHEADER, agno, alist); + return scrub_group(ctx, XFROG_SCRUB_GROUP_AGHEADER, agno, alist, sri); } /* Scrub each AG's metadata btrees. */ @@ -340,9 +345,10 @@ int scrub_ag_metadata( struct scrub_ctx *ctx, xfs_agnumber_t agno, - struct action_list *alist) + struct action_list *alist, + struct scrub_item *sri) { - return scrub_group(ctx, XFROG_SCRUB_GROUP_PERAG, agno, alist); + return scrub_group(ctx, XFROG_SCRUB_GROUP_PERAG, agno, alist, sri); } /* Scrub one metadata file */ @@ -350,20 +356,22 @@ int scrub_metadata_file( struct scrub_ctx *ctx, unsigned int type, - struct action_list *alist) + struct action_list *alist, + struct scrub_item *sri) { ASSERT(xfrog_scrubbers[type].group == XFROG_SCRUB_GROUP_METAFILES); - return scrub_meta_type(ctx, type, 0, alist); + return scrub_meta_type(ctx, type, 0, alist, sri); } /* Scrub all FS summary metadata. */ int scrub_summary_metadata( struct scrub_ctx *ctx, - struct action_list *alist) + struct action_list *alist, + struct scrub_item *sri) { - return scrub_group(ctx, XFROG_SCRUB_GROUP_SUMMARY, 0, alist); + return scrub_group(ctx, XFROG_SCRUB_GROUP_SUMMARY, 0, alist, sri); } /* How many items do we have to check? 
*/ @@ -425,7 +433,8 @@ scrub_file( int fd, const struct xfs_bulkstat *bstat, unsigned int type, - struct action_list *alist) + struct action_list *alist, + struct scrub_item *sri) { struct xfs_scrub_metadata meta = {0}; struct xfs_fd xfd; @@ -454,12 +463,45 @@ scrub_file( fix = xfs_check_metadata(ctx, xfdp, &meta, true); if (fix == CHECK_ABORT) return ECANCELED; - if (fix == CHECK_DONE) + if (fix == CHECK_DONE) { + scrub_item_clean_state(sri, type); return 0; + } + scrub_item_save_state(sri, type, meta.sm_flags); return scrub_save_repair(ctx, alist, &meta); } +/* Dump a scrub item for debugging purposes. */ +void +scrub_item_dump( + struct scrub_item *sri, + unsigned int group_mask, + const char *tag) +{ + unsigned int i; + + if (group_mask == 0) + group_mask = -1U; + + printf("DUMP SCRUB ITEM FOR %s\n", tag); + if (sri->sri_ino != -1ULL) + printf("ino 0x%llx gen %u\n", (unsigned long long)sri->sri_ino, + sri->sri_gen); + if (sri->sri_agno != -1U) + printf("agno %u\n", sri->sri_agno); + + foreach_scrub_type(i) { + unsigned int g = 1U << xfrog_scrubbers[i].group; + + if (g & group_mask) + printf("[%u]: type '%s' state 0x%x\n", i, + xfrog_scrubbers[i].name, + sri->sri_state[i]); + } + fflush(stdout); +} + /* * Test the availability of a kernel scrub command. If errors occur (or the * scrub ioctl is rejected) the errors will be logged and this function will diff --git a/scrub/scrub.h b/scrub/scrub.h index b02e8f16815..546651b2818 100644 --- a/scrub/scrub.h +++ b/scrub/scrub.h @@ -16,17 +16,85 @@ enum check_outcome { struct action_item; +/* + * These flags record the metadata object state that the kernel returned. + * We want to remember if the object was corrupt, if the cross-referencing + * revealed inconsistencies (xcorrupt), if the cross referencing itself failed + * (xfail) or if the object is correct but could be optimised (preen). 
+ */ +#define SCRUB_ITEM_CORRUPT (XFS_SCRUB_OFLAG_CORRUPT) /* (1 << 1) */ +#define SCRUB_ITEM_PREEN (XFS_SCRUB_OFLAG_PREEN) /* (1 << 2) */ +#define SCRUB_ITEM_XFAIL (XFS_SCRUB_OFLAG_XFAIL) /* (1 << 3) */ +#define SCRUB_ITEM_XCORRUPT (XFS_SCRUB_OFLAG_XCORRUPT) /* (1 << 4) */ + +/* All of the state flags that we need to prioritize repair work. */ +#define SCRUB_ITEM_REPAIR_ANY (SCRUB_ITEM_CORRUPT | \ + SCRUB_ITEM_PREEN | \ + SCRUB_ITEM_XFAIL | \ + SCRUB_ITEM_XCORRUPT) + +struct scrub_item { + /* + * Information we need to call the scrub and repair ioctls. Per-AG + * items should set the ino/gen fields to -1; per-inode items should + * set sri_agno to -1; and per-fs items should set all three fields to + * -1. Or use the macros below. + */ + __u64 sri_ino; + __u32 sri_gen; + __u32 sri_agno; + + /* Scrub item state flags, one for each XFS_SCRUB_TYPE. */ + __u8 sri_state[XFS_SCRUB_TYPE_NR]; +}; + +#define foreach_scrub_type(loopvar) \ + for ((loopvar) = 0; (loopvar) < XFS_SCRUB_TYPE_NR; (loopvar)++) + +static inline void +scrub_item_init_ag(struct scrub_item *sri, xfs_agnumber_t agno) +{ + memset(sri, 0, sizeof(*sri)); + sri->sri_agno = agno; + sri->sri_ino = -1ULL; + sri->sri_gen = -1U; +} + +static inline void +scrub_item_init_fs(struct scrub_item *sri) +{ + memset(sri, 0, sizeof(*sri)); + sri->sri_agno = -1U; + sri->sri_ino = -1ULL; + sri->sri_gen = -1U; +} + +static inline void +scrub_item_init_file(struct scrub_item *sri, struct xfs_bulkstat *bstat) +{ + memset(sri, 0, sizeof(*sri)); + sri->sri_agno = -1U; + sri->sri_ino = bstat->bs_ino; + sri->sri_gen = bstat->bs_gen; +} + +void scrub_item_dump(struct scrub_item *sri, unsigned int group_mask, + const char *tag); + void scrub_report_preen_triggers(struct scrub_ctx *ctx); int scrub_ag_headers(struct scrub_ctx *ctx, xfs_agnumber_t agno, - struct action_list *alist); + struct action_list *alist, struct scrub_item *sri); int scrub_ag_metadata(struct scrub_ctx *ctx, xfs_agnumber_t agno, - struct action_list *alist); + 
struct action_list *alist, struct scrub_item *sri); int scrub_metadata_file(struct scrub_ctx *ctx, unsigned int scrub_type, - struct action_list *alist); -int scrub_iscan_metadata(struct scrub_ctx *ctx, struct action_list *alist); -int scrub_summary_metadata(struct scrub_ctx *ctx, struct action_list *alist); + struct action_list *alist, struct scrub_item *sri); +int scrub_iscan_metadata(struct scrub_ctx *ctx, struct action_list *alist, + struct scrub_item *sri); +int scrub_summary_metadata(struct scrub_ctx *ctx, struct action_list *alist, + struct scrub_item *sri); int scrub_meta_type(struct scrub_ctx *ctx, unsigned int type, - xfs_agnumber_t agno, struct action_list *alist); + xfs_agnumber_t agno, struct action_list *alist, + struct scrub_item *sri); bool can_scrub_fs_metadata(struct scrub_ctx *ctx); bool can_scrub_inode(struct scrub_ctx *ctx); @@ -39,7 +107,8 @@ bool can_repair(struct scrub_ctx *ctx); bool can_force_rebuild(struct scrub_ctx *ctx); int scrub_file(struct scrub_ctx *ctx, int fd, const struct xfs_bulkstat *bstat, - unsigned int type, struct action_list *alist); + unsigned int type, struct action_list *alist, + struct scrub_item *sri); /* Repair parameters are the scrub inputs and retry count. 
*/ struct action_item { diff --git a/scrub/scrub_private.h b/scrub/scrub_private.h index 8bc0c521463..f91c65383d1 100644 --- a/scrub/scrub_private.h +++ b/scrub/scrub_private.h @@ -52,4 +52,23 @@ static inline bool needs_repair(struct xfs_scrub_metadata *sm) void scrub_warn_incomplete_scrub(struct scrub_ctx *ctx, struct descr *dsc, struct xfs_scrub_metadata *meta); +/* Scrub item functions */ + +static inline void +scrub_item_save_state( + struct scrub_item *sri, + unsigned int scrub_type, + unsigned int scrub_flags) +{ + sri->sri_state[scrub_type] = scrub_flags & SCRUB_ITEM_REPAIR_ANY; +} + +static inline void +scrub_item_clean_state( + struct scrub_item *sri, + unsigned int scrub_type) +{ + sri->sri_state[scrub_type] = 0; +} + #endif /* XFS_SCRUB_SCRUB_PRIVATE_H_ */ From patchwork Fri Dec 30 22:18:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085134 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85DC1C4332F for ; Sat, 31 Dec 2022 00:28:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235615AbiLaA2Q (ORCPT ); Fri, 30 Dec 2022 19:28:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235991AbiLaA2B (ORCPT ); Fri, 30 Dec 2022 19:28:01 -0500 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A3BD01EC6D for ; Fri, 30 Dec 2022 16:27:50 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) 
	with ESMTPS id E2960CE1AC2 for ; Sat, 31 Dec 2022 00:27:48 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 284C1C433D2;
	Sat, 31 Dec 2022 00:27:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1672446467;
	bh=B4giqe5pVQwJIUNSwJwY8WxPFwXDF5li50f81W3jSGA=;
	h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
	b=oXi4W/Jd0plz5cZaOK00X5Xy/NcPEohA+SWfopPgwJQGH/keaJDtc1g+S0k54IJiP
	 4YZ+v1EQXhRGo/6Hz+y/fKCS6nMtjYmswANtupMfRLz05AjvGG/TaVgK75BJJ/uFWz
	 47y9o0r4bnK01JtouXXq8SwdknDrVLuuz5ZDpRuOQKuMw5g8VLPOGzw9RNK4X15L02
	 ZAZR22iqzHVTbe0Td1AaHAeGv165bZbdgGi5KcbJ/5YwBs4mONd4vfv/7b92quC+cy
	 oHeaKXEbFtJWjRYeVDZcxJU2y0ooPDm2ReSjl+NCg/q3sGnt00Z0xtzfD/6q7kHwvA
	 NxFNsQ/MBNmnQ==
Subject: [PATCH 2/9] xfs_scrub: use repair_item to direct repair activities
From: "Darrick J. Wong"
To: cem@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:18:17 -0800
Message-ID: <167243869740.715746.16223922246286993336.stgit@magnolia>
In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia>
References: <167243869711.715746.14725730988345960302.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Now that the new scrub_item tracks the state of any filesystem object
needing any kind of repair, use it to drive filesystem repairs and
updates to the in-kernel health status when repair finishes.

Signed-off-by: Darrick J.
Wong --- scrub/phase1.c | 2 scrub/phase2.c | 24 ++-- scrub/phase3.c | 57 ++++---- scrub/phase4.c | 7 - scrub/phase5.c | 2 scrub/phase7.c | 3 scrub/repair.c | 381 +++++++++++++++++++++++++++++++------------------------- scrub/repair.h | 45 +++++-- scrub/scrub.c | 44 ------ scrub/scrub.h | 12 -- 10 files changed, 298 insertions(+), 279 deletions(-) diff --git a/scrub/phase1.c b/scrub/phase1.c index 3113fc5ccf6..2c0ff7c8327 100644 --- a/scrub/phase1.c +++ b/scrub/phase1.c @@ -71,7 +71,7 @@ report_to_kernel( * Complain if we cannot fail the clean bill of health, unless we're * just testing repairs. */ - if (action_list_length(&alist) > 0 && + if (repair_item_count_needsrepair(&sri) != 0 && !debug_tweak_on("XFS_SCRUB_FORCE_REPAIR")) { str_info(ctx, _("Couldn't upload clean bill of health."), NULL); action_list_discard(&alist); diff --git a/scrub/phase2.c b/scrub/phase2.c index 50c2c88276f..83c467347fe 100644 --- a/scrub/phase2.c +++ b/scrub/phase2.c @@ -58,6 +58,7 @@ scan_ag_metadata( void *arg) { struct scrub_item sri; + struct scrub_item fix_now; struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx; struct scan_ctl *sctl = arg; struct action_list alist; @@ -83,7 +84,7 @@ scan_ag_metadata( goto err; /* Repair header damage. */ - ret = action_list_process_or_defer(ctx, agno, &alist); + ret = repair_item_corruption(ctx, &sri); if (ret) goto err; @@ -99,17 +100,19 @@ scan_ag_metadata( * the inobt from rmapbt data, but if the rmapbt is broken even * at this early phase then we are sunk. */ - difficulty = action_list_difficulty(&alist); - action_list_find_mustfix(&alist, &immediate_alist); + difficulty = repair_item_difficulty(&sri); + repair_item_mustfix(&sri, &fix_now); warn_repair_difficulties(ctx, difficulty, descr); /* Repair (inode) btree damage. */ - ret = action_list_process_or_defer(ctx, agno, &immediate_alist); + ret = repair_item_corruption(ctx, &fix_now); if (ret) goto err; /* Everything else gets fixed during phase 4. 
*/ - action_list_defer(ctx, agno, &alist); + ret = repair_item_defer(ctx, &sri); + if (ret) + goto err; return; err: sctl->aborted = true; @@ -141,10 +144,14 @@ scan_metafile( } /* Complain about metadata corruptions that might not be fixable. */ - difficulty = action_list_difficulty(&alist); + difficulty = repair_item_difficulty(&sri); warn_repair_difficulties(ctx, difficulty, xfrog_scrubbers[type].descr); - action_list_defer(ctx, 0, &alist); + ret = repair_item_defer(ctx, &sri); + if (ret) { + sctl->aborted = true; + goto out; + } out: if (type == XFS_SCRUB_TYPE_RTBITMAP) { @@ -193,8 +200,7 @@ phase2_func( ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_SB, 0, &alist, &sri); if (ret) goto out_wq; - ret = action_list_process(ctx, -1, &alist, - XRM_FINAL_WARNING | XRM_NOPROGRESS); + ret = repair_item_completely(ctx, &sri); if (ret) goto out_wq; diff --git a/scrub/phase3.c b/scrub/phase3.c index ef22a1d11c1..7e09c48ce18 100644 --- a/scrub/phase3.c +++ b/scrub/phase3.c @@ -55,45 +55,48 @@ report_close_error( * Defer all the repairs until phase 4, being careful about locking since the * inode scrub threads are not per-AG. */ -static void +static int defer_inode_repair( - struct scrub_inode_ctx *ictx, - xfs_agnumber_t agno, - struct action_list *alist) + struct scrub_inode_ctx *ictx, + const struct xfs_bulkstat *bstat, + struct scrub_item *sri) { - if (alist->nr == 0) - return; + struct action_item *aitem = NULL; + xfs_agnumber_t agno; + int ret; + ret = repair_item_to_action_item(ictx->ctx, sri, &aitem); + if (ret || !aitem) + return ret; + + agno = cvt_ino_to_agno(&ictx->ctx->mnt, bstat->bs_ino); pthread_mutex_lock(&ictx->locks[agno]); - action_list_defer(ictx->ctx, agno, alist); + action_list_add(&ictx->ctx->action_lists[agno], aitem); pthread_mutex_unlock(&ictx->locks[agno]); + return 0; } -/* Run repair actions now and defer unfinished items for later. */ +/* Run repair actions now and leave unfinished items for later. 
*/ static int try_inode_repair( - struct scrub_inode_ctx *ictx, - int fd, - xfs_agnumber_t agno, - struct action_list *alist) + struct scrub_inode_ctx *ictx, + struct scrub_item *sri, + int fd, + const struct xfs_bulkstat *bstat) { - int ret; - /* * If at the start of phase 3 we already had ag/rt metadata repairs * queued up for phase 4, leave the action list untouched so that file - * metadata repairs will be deferred in scan order until phase 4. + * metadata repairs will be deferred until phase 4. */ if (ictx->always_defer_repairs) return 0; - ret = action_list_process(ictx->ctx, fd, alist, - XRM_REPAIR_ONLY | XRM_NOPROGRESS); - if (ret) - return ret; - - defer_inode_repair(ictx, agno, alist); - return 0; + /* + * Try to repair the file metadata. Unfixed metadata will remain in + * the scrub item state to be queued as a single action item. + */ + return repair_file_corruption(ictx->ctx, sri, fd); } /* Verify the contents, xattrs, and extent maps of an inode. */ @@ -108,13 +111,11 @@ scrub_inode( struct scrub_item sri; struct scrub_inode_ctx *ictx = arg; struct ptcounter *icount = ictx->icount; - xfs_agnumber_t agno; int fd = -1; int error; scrub_item_init_file(&sri, bstat); action_list_init(&alist); - agno = cvt_ino_to_agno(&ctx->mnt, bstat->bs_ino); background_sleep(); /* @@ -149,7 +150,7 @@ scrub_inode( if (error) goto out; - error = try_inode_repair(ictx, fd, agno, &alist); + error = try_inode_repair(ictx, &sri, fd, bstat); if (error) goto out; @@ -164,7 +165,7 @@ scrub_inode( if (error) goto out; - error = try_inode_repair(ictx, fd, agno, &alist); + error = try_inode_repair(ictx, &sri, fd, bstat); if (error) goto out; @@ -191,7 +192,7 @@ scrub_inode( goto out; /* Try to repair the file while it's open. 
*/ - error = try_inode_repair(ictx, fd, agno, &alist); + error = try_inode_repair(ictx, &sri, fd, bstat); if (error) goto out; @@ -208,7 +209,7 @@ scrub_inode( progress_add(1); if (!error && !ictx->aborted) - defer_inode_repair(ictx, agno, &alist); + error = defer_inode_repair(ictx, bstat, &sri); if (fd >= 0) { int err2; diff --git a/scrub/phase4.c b/scrub/phase4.c index 31939653bda..3afd04af47e 100644 --- a/scrub/phase4.c +++ b/scrub/phase4.c @@ -40,7 +40,7 @@ repair_ag( /* Repair anything broken until we fail to make progress. */ do { - ret = action_list_process(ctx, -1, alist, flags); + ret = action_list_process(ctx, alist, flags); if (ret) { *aborted = true; return; @@ -55,7 +55,7 @@ repair_ag( /* Try once more, but this time complain if we can't fix things. */ flags |= XRM_FINAL_WARNING; - ret = action_list_process(ctx, -1, alist, flags); + ret = action_list_process(ctx, alist, flags); if (ret) *aborted = true; } @@ -167,8 +167,7 @@ phase4_func( } /* Repair counters before starting on the rest. 
*/ - ret = action_list_process(ctx, -1, &alist, - XRM_REPAIR_ONLY | XRM_NOPROGRESS); + ret = repair_item_corruption(ctx, &sri); if (ret) return ret; action_list_discard(&alist); diff --git a/scrub/phase5.c b/scrub/phase5.c index ea77c2a5298..b7801b46760 100644 --- a/scrub/phase5.c +++ b/scrub/phase5.c @@ -420,7 +420,7 @@ iscan_worker( goto out; } - ret = action_list_process(ctx, ctx->mnt.fd, &item->alist, + ret = action_list_process(ctx, &item->alist, XRM_FINAL_WARNING | XRM_NOPROGRESS); if (ret) { str_liberror(ctx, ret, _("repairing iscan metadata")); diff --git a/scrub/phase7.c b/scrub/phase7.c index ddc1e3b24e3..15540778ffa 100644 --- a/scrub/phase7.c +++ b/scrub/phase7.c @@ -123,8 +123,7 @@ phase7_func( error = scrub_summary_metadata(ctx, &alist, &sri); if (error) return error; - error = action_list_process(ctx, -1, &alist, - XRM_FINAL_WARNING | XRM_NOPROGRESS); + error = repair_item_completely(ctx, &sri); if (error) return error; diff --git a/scrub/repair.c b/scrub/repair.c index 6be5d7684b3..cadd2c20627 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -27,7 +27,8 @@ static enum check_outcome xfs_repair_metadata( struct scrub_ctx *ctx, struct xfs_fd *xfdp, - struct action_item *aitem, + unsigned int scrub_type, + struct scrub_item *sri, unsigned int repair_flags) { struct xfs_scrub_metadata meta = { 0 }; @@ -35,20 +36,20 @@ xfs_repair_metadata( DEFINE_DESCR(dsc, ctx, format_scrub_descr); int error; - assert(aitem->type < XFS_SCRUB_TYPE_NR); + assert(scrub_type < XFS_SCRUB_TYPE_NR); assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL")); - meta.sm_type = aitem->type; - meta.sm_flags = aitem->flags | XFS_SCRUB_IFLAG_REPAIR; + meta.sm_type = scrub_type; + meta.sm_flags = XFS_SCRUB_IFLAG_REPAIR; if (use_force_rebuild) meta.sm_flags |= XFS_SCRUB_IFLAG_FORCE_REBUILD; - switch (xfrog_scrubbers[aitem->type].group) { + switch (xfrog_scrubbers[scrub_type].group) { case XFROG_SCRUB_GROUP_AGHEADER: case XFROG_SCRUB_GROUP_PERAG: - meta.sm_agno = aitem->agno; + meta.sm_agno = 
sri->sri_agno; break; case XFROG_SCRUB_GROUP_INODE: - meta.sm_ino = aitem->ino; - meta.sm_gen = aitem->gen; + meta.sm_ino = sri->sri_ino; + meta.sm_gen = sri->sri_gen; break; default: break; @@ -58,9 +59,10 @@ xfs_repair_metadata( return CHECK_RETRY; memcpy(&oldm, &meta, sizeof(oldm)); + oldm.sm_flags = sri->sri_state[scrub_type] & SCRUB_ITEM_REPAIR_ANY; descr_set(&dsc, &oldm); - if (needs_repair(&meta)) + if (needs_repair(&oldm)) str_info(ctx, descr_render(&dsc), _("Attempting repair.")); else if (debug || verbose) str_info(ctx, descr_render(&dsc), @@ -100,13 +102,16 @@ _("Filesystem is shut down, aborting.")); * error out if the kernel doesn't know how to fix. */ if (is_unoptimized(&oldm) || - debug_tweak_on("XFS_SCRUB_FORCE_REPAIR")) + debug_tweak_on("XFS_SCRUB_FORCE_REPAIR")) { + scrub_item_clean_state(sri, scrub_type); return CHECK_DONE; + } fallthrough; case EINVAL: /* Kernel doesn't know how to repair this? */ str_corrupt(ctx, descr_render(&dsc), _("Don't know how to fix; offline repair required.")); + scrub_item_clean_state(sri, scrub_type); return CHECK_DONE; case EROFS: /* Read-only filesystem, can't fix. */ @@ -116,23 +121,28 @@ _("Read-only filesystem; cannot make changes.")); return CHECK_ABORT; case ENOENT: /* Metadata not present, just skip it. */ + scrub_item_clean_state(sri, scrub_type); return CHECK_DONE; case ENOMEM: case ENOSPC: /* Don't care if preen fails due to low resources. */ - if (is_unoptimized(&oldm) && !needs_repair(&oldm)) + if (is_unoptimized(&oldm) && !needs_repair(&oldm)) { + scrub_item_clean_state(sri, scrub_type); return CHECK_DONE; + } fallthrough; default: /* - * Operational error. If the caller doesn't want us - * to complain about repair failures, tell the caller - * to requeue the repair for later and don't say a - * thing. Otherwise, print error and bail out. + * Operational error. If the caller doesn't want us to + * complain about repair failures, tell the caller to requeue + * the repair for later and don't say a thing. 
Otherwise, + * print an error, mark the item clean because we're done with + * trying to repair it, and bail out. */ if (!(repair_flags & XRM_FINAL_WARNING)) return CHECK_RETRY; str_liberror(ctx, error, descr_render(&dsc)); + scrub_item_clean_state(sri, scrub_type); return CHECK_DONE; } @@ -178,12 +188,13 @@ _("Read-only filesystem; cannot make changes.")); record_preen(ctx, descr_render(&dsc), _("Optimization successful.")); } + + scrub_item_clean_state(sri, scrub_type); return CHECK_DONE; } /* * Prioritize action items in order of how long we can wait. - * 0 = do it now, 10000 = do it later. * * To minimize the amount of repair work, we want to prioritize metadata * objects by perceived corruptness. If CORRUPT is set, the fields are @@ -199,104 +210,34 @@ _("Read-only filesystem; cannot make changes.")); * in order. */ -/* Sort action items in severity order. */ -static int -PRIO( - struct action_item *aitem, - int order) -{ - if (aitem->flags & XFS_SCRUB_OFLAG_CORRUPT) - return order; - else if (aitem->flags & XFS_SCRUB_OFLAG_XCORRUPT) - return 100 + order; - else if (aitem->flags & XFS_SCRUB_OFLAG_XFAIL) - return 200 + order; - else if (aitem->flags & XFS_SCRUB_OFLAG_PREEN) - return 300 + order; - abort(); -} - -/* Sort the repair items in dependency order. 
*/ -static int -xfs_action_item_priority( - struct action_item *aitem) -{ - switch (aitem->type) { - case XFS_SCRUB_TYPE_SB: - case XFS_SCRUB_TYPE_AGF: - case XFS_SCRUB_TYPE_AGFL: - case XFS_SCRUB_TYPE_AGI: - case XFS_SCRUB_TYPE_BNOBT: - case XFS_SCRUB_TYPE_CNTBT: - case XFS_SCRUB_TYPE_INOBT: - case XFS_SCRUB_TYPE_FINOBT: - case XFS_SCRUB_TYPE_REFCNTBT: - case XFS_SCRUB_TYPE_RMAPBT: - case XFS_SCRUB_TYPE_INODE: - case XFS_SCRUB_TYPE_BMBTD: - case XFS_SCRUB_TYPE_BMBTA: - case XFS_SCRUB_TYPE_BMBTC: - return PRIO(aitem, aitem->type - 1); - case XFS_SCRUB_TYPE_DIR: - case XFS_SCRUB_TYPE_XATTR: - case XFS_SCRUB_TYPE_SYMLINK: - case XFS_SCRUB_TYPE_PARENT: - return PRIO(aitem, XFS_SCRUB_TYPE_DIR); - case XFS_SCRUB_TYPE_RTBITMAP: - case XFS_SCRUB_TYPE_RTSUM: - return PRIO(aitem, XFS_SCRUB_TYPE_RTBITMAP); - case XFS_SCRUB_TYPE_UQUOTA: - case XFS_SCRUB_TYPE_GQUOTA: - case XFS_SCRUB_TYPE_PQUOTA: - return PRIO(aitem, XFS_SCRUB_TYPE_UQUOTA); - case XFS_SCRUB_TYPE_QUOTACHECK: - /* This should always go after [UGP]QUOTA no matter what. */ - return PRIO(aitem, aitem->type); - case XFS_SCRUB_TYPE_FSCOUNTERS: - /* This should always go after AG headers no matter what. */ - return PRIO(aitem, INT_MAX); - } - abort(); -} - -/* Make sure that btrees get repaired before headers. */ -static int -xfs_action_item_compare( - void *priv, - struct list_head *a, - struct list_head *b) -{ - struct action_item *ra; - struct action_item *rb; - - ra = container_of(a, struct action_item, list); - rb = container_of(b, struct action_item, list); - - return xfs_action_item_priority(ra) - xfs_action_item_priority(rb); -} +struct action_item { + struct list_head list; + struct scrub_item sri; +}; /* * Figure out which AG metadata must be fixed before we can move on * to the inode scan. 
*/ void -action_list_find_mustfix( - struct action_list *alist, - struct action_list *immediate_alist) +repair_item_mustfix( + struct scrub_item *sri, + struct scrub_item *fix_now) { - struct action_item *n; - struct action_item *aitem; + unsigned int scrub_type; - list_for_each_entry_safe(aitem, n, &alist->list, list) { - if (!(aitem->flags & XFS_SCRUB_OFLAG_CORRUPT)) + assert(sri->sri_agno != -1U); + scrub_item_init_ag(fix_now, sri->sri_agno); + + foreach_scrub_type(scrub_type) { + if (!(sri->sri_state[scrub_type] & SCRUB_ITEM_CORRUPT)) continue; - switch (aitem->type) { + + switch (scrub_type) { case XFS_SCRUB_TYPE_AGI: case XFS_SCRUB_TYPE_FINOBT: case XFS_SCRUB_TYPE_INOBT: - alist->nr--; - list_move_tail(&aitem->list, &immediate_alist->list); - immediate_alist->nr++; + fix_now->sri_state[scrub_type] |= SCRUB_ITEM_CORRUPT; break; } } @@ -304,19 +245,19 @@ action_list_find_mustfix( /* Determine if primary or secondary metadata are inconsistent. */ unsigned int -action_list_difficulty( - const struct action_list *alist) +repair_item_difficulty( + const struct scrub_item *sri) { - struct action_item *aitem, *n; - unsigned int ret = 0; + unsigned int scrub_type; + unsigned int ret = 0; - list_for_each_entry_safe(aitem, n, &alist->list, list) { - if (!(aitem->flags & (XFS_SCRUB_OFLAG_CORRUPT | - XFS_SCRUB_OFLAG_XCORRUPT | - XFS_SCRUB_OFLAG_XFAIL))) + foreach_scrub_type(scrub_type) { + if (!(sri->sri_state[scrub_type] & (XFS_SCRUB_OFLAG_CORRUPT | + XFS_SCRUB_OFLAG_XCORRUPT | + XFS_SCRUB_OFLAG_XFAIL))) continue; - switch (aitem->type) { + switch (scrub_type) { case XFS_SCRUB_TYPE_RMAPBT: ret |= REPAIR_DIFFICULTY_SECONDARY; break; @@ -396,13 +337,19 @@ action_list_init( alist->sorted = false; } -/* Number of repairs in this list. */ +/* Number of pending repairs in this list. 
*/ unsigned long long action_list_length( struct action_list *alist) { - return alist->nr; -}; + struct action_item *aitem; + unsigned long long ret = 0; + + list_for_each_entry(aitem, &alist->list, list) + ret += repair_item_count_needsrepair(&aitem->sri); + + return ret; +} /* Add to the list of repairs. */ void @@ -415,60 +362,78 @@ action_list_add( alist->sorted = false; } -/* Splice two repair lists. */ -void -action_list_splice( - struct action_list *dest, - struct action_list *src) -{ - if (src->nr == 0) - return; - - list_splice_tail_init(&src->list, &dest->list); - dest->nr += src->nr; - src->nr = 0; - dest->sorted = false; -} - /* Repair everything on this list. */ int action_list_process( struct scrub_ctx *ctx, - int fd, struct action_list *alist, unsigned int repair_flags) +{ + struct action_item *aitem; + struct action_item *n; + int ret = 0; + + list_for_each_entry_safe(aitem, n, &alist->list, list) { + if (scrub_excessive_errors(ctx)) + return ECANCELED; + + ret = repair_item(ctx, &aitem->sri, repair_flags); + if (ret) + break; + + if (repair_item_count_needsrepair(&aitem->sri) == 0) { + list_del(&aitem->list); + free(aitem); + } + } + + return ret; +} + +/* + * For a given filesystem object, perform all repairs of a given class + * (corrupt, xcorrupt, xfail, preen) if the repair item says it's needed. + */ +static int +repair_item_class( + struct scrub_ctx *ctx, + struct scrub_item *sri, + int override_fd, + uint8_t repair_mask, + unsigned int flags) { struct xfs_fd xfd; struct xfs_fd *xfdp = &ctx->mnt; - struct action_item *aitem; - struct action_item *n; - enum check_outcome fix; + unsigned int scrub_type; + + if (ctx->mode < SCRUB_MODE_REPAIR) + return 0; /* * If the caller passed us a file descriptor for a scrub, use it * instead of scrub-by-handle because this enables the kernel to skip * costly inode btree lookups.
*/ - if (fd >= 0) { + if (override_fd >= 0) { memcpy(&xfd, xfdp, sizeof(xfd)); - xfd.fd = fd; + xfd.fd = override_fd; xfdp = &xfd; } - if (!alist->sorted) { - list_sort(NULL, &alist->list, xfs_action_item_compare); - alist->sorted = true; - } + foreach_scrub_type(scrub_type) { + enum check_outcome fix; - list_for_each_entry_safe(aitem, n, &alist->list, list) { - fix = xfs_repair_metadata(ctx, xfdp, aitem, repair_flags); + if (scrub_excessive_errors(ctx)) + return ECANCELED; + + if (!(sri->sri_state[scrub_type] & repair_mask)) + continue; + + fix = xfs_repair_metadata(ctx, xfdp, scrub_type, sri, flags); switch (fix) { case CHECK_DONE: - if (!(repair_flags & XRM_NOPROGRESS)) + if (!(flags & XRM_NOPROGRESS)) progress_add(1); - alist->nr--; - list_del(&aitem->list); - free(aitem); continue; case CHECK_ABORT: return ECANCELED; @@ -479,37 +444,113 @@ action_list_process( } } - if (scrub_excessive_errors(ctx)) - return ECANCELED; + return 0; +} + +/* + * Repair all parts (i.e. scrub types) of this filesystem object for which + * corruption has been observed directly. Other types of repair work (fixing + * cross referencing problems and preening) are deferred. + * + * This function should only be called to perform spot repairs of fs objects + * during phase 2 and 3 while we still have open handles to those objects. + */ +int +repair_item_corruption( + struct scrub_ctx *ctx, + struct scrub_item *sri) +{ + return repair_item_class(ctx, sri, -1, SCRUB_ITEM_CORRUPT, + XRM_REPAIR_ONLY | XRM_NOPROGRESS); +} + +/* Repair all parts of this file, similar to repair_item_corruption. */ +int +repair_file_corruption( + struct scrub_ctx *ctx, + struct scrub_item *sri, + int override_fd) +{ + return repair_item_class(ctx, sri, override_fd, SCRUB_ITEM_CORRUPT, + XRM_REPAIR_ONLY | XRM_NOPROGRESS); +} + +/* + * Repair everything in this filesystem object that needs it. This includes + * cross-referencing and preening. 
+ */ +int +repair_item( + struct scrub_ctx *ctx, + struct scrub_item *sri, + unsigned int flags) +{ + int ret; + + ret = repair_item_class(ctx, sri, -1, SCRUB_ITEM_CORRUPT, flags); + if (ret) + return ret; + + ret = repair_item_class(ctx, sri, -1, SCRUB_ITEM_XCORRUPT, flags); + if (ret) + return ret; + + ret = repair_item_class(ctx, sri, -1, SCRUB_ITEM_XFAIL, flags); + if (ret) + return ret; + + return repair_item_class(ctx, sri, -1, SCRUB_ITEM_PREEN, flags); +} + +/* Create an action item around a scrub item that needs repairs. */ +int +repair_item_to_action_item( + struct scrub_ctx *ctx, + const struct scrub_item *sri, + struct action_item **aitemp) +{ + struct action_item *aitem; + + if (repair_item_count_needsrepair(sri) == 0) + return 0; + + aitem = malloc(sizeof(struct action_item)); + if (!aitem) { + int error = errno; + + str_liberror(ctx, error, _("creating repair action item")); + return error; + } + + INIT_LIST_HEAD(&aitem->list); + memcpy(&aitem->sri, sri, sizeof(struct scrub_item)); + + *aitemp = aitem; return 0; } /* Defer all the repairs until phase 4. */ -void -action_list_defer( - struct scrub_ctx *ctx, - xfs_agnumber_t agno, - struct action_list *alist) +int +repair_item_defer( + struct scrub_ctx *ctx, + const struct scrub_item *sri) { + struct action_item *aitem = NULL; + unsigned int agno; + int error; + + error = repair_item_to_action_item(ctx, sri, &aitem); + if (error || !aitem) + return error; + + if (sri->sri_agno != -1U) + agno = sri->sri_agno; + else if (sri->sri_ino != -1ULL && sri->sri_gen != -1U) + agno = cvt_ino_to_agno(&ctx->mnt, sri->sri_ino); + else + agno = 0; ASSERT(agno < ctx->mnt.fsgeom.agcount); - action_list_splice(&ctx->action_lists[agno], alist); -} - -/* Run actions now and defer unfinished items for later. 
*/ -int -action_list_process_or_defer( - struct scrub_ctx *ctx, - xfs_agnumber_t agno, - struct action_list *alist) -{ - int ret; - - ret = action_list_process(ctx, -1, alist, - XRM_REPAIR_ONLY | XRM_NOPROGRESS); - if (ret) - return ret; - - action_list_defer(ctx, agno, alist); + action_list_add(&ctx->action_lists[agno], aitem); return 0; } diff --git a/scrub/repair.h b/scrub/repair.h index 4c3fd718575..b0b448cef7a 100644 --- a/scrub/repair.h +++ b/scrub/repair.h @@ -12,6 +12,8 @@ struct action_list { bool sorted; }; +struct action_item; + int action_lists_alloc(size_t nr, struct action_list **listsp); void action_lists_free(struct action_list **listsp); @@ -25,16 +27,14 @@ static inline bool action_list_empty(const struct action_list *alist) unsigned long long action_list_length(struct action_list *alist); void action_list_add(struct action_list *dest, struct action_item *item); void action_list_discard(struct action_list *alist); -void action_list_splice(struct action_list *dest, struct action_list *src); -void action_list_find_mustfix(struct action_list *actions, - struct action_list *immediate_alist); +void repair_item_mustfix(struct scrub_item *sri, struct scrub_item *fix_now); /* Primary metadata is corrupt */ #define REPAIR_DIFFICULTY_PRIMARY (1U << 0) /* Secondary metadata is corrupt */ #define REPAIR_DIFFICULTY_SECONDARY (1U << 1) -unsigned int action_list_difficulty(const struct action_list *actions); +unsigned int repair_item_difficulty(const struct scrub_item *sri); /* * Only ask the kernel to repair this object if the kernel directly told us it @@ -49,11 +49,36 @@ unsigned int action_list_difficulty(const struct action_list *actions); /* Don't call progress_add after repairing an item. 
*/ #define XRM_NOPROGRESS (1U << 2) -int action_list_process(struct scrub_ctx *ctx, int fd, - struct action_list *alist, unsigned int repair_flags); -void action_list_defer(struct scrub_ctx *ctx, xfs_agnumber_t agno, - struct action_list *alist); -int action_list_process_or_defer(struct scrub_ctx *ctx, xfs_agnumber_t agno, - struct action_list *alist); +int action_list_process(struct scrub_ctx *ctx, struct action_list *alist, + unsigned int repair_flags); +int repair_item_corruption(struct scrub_ctx *ctx, struct scrub_item *sri); +int repair_file_corruption(struct scrub_ctx *ctx, struct scrub_item *sri, + int override_fd); +int repair_item(struct scrub_ctx *ctx, struct scrub_item *sri, + unsigned int repair_flags); +int repair_item_to_action_item(struct scrub_ctx *ctx, + const struct scrub_item *sri, struct action_item **aitemp); +int repair_item_defer(struct scrub_ctx *ctx, const struct scrub_item *sri); + +static inline unsigned int +repair_item_count_needsrepair( + const struct scrub_item *sri) +{ + unsigned int scrub_type; + unsigned int nr = 0; + + foreach_scrub_type(scrub_type) + if (sri->sri_state[scrub_type] & SCRUB_ITEM_REPAIR_ANY) + nr++; + return nr; +} + +static inline int +repair_item_completely( + struct scrub_ctx *ctx, + struct scrub_item *sri) +{ + return repair_item(ctx, sri, XRM_FINAL_WARNING | XRM_NOPROGRESS); +} #endif /* XFS_SCRUB_REPAIR_H_ */ diff --git a/scrub/scrub.c b/scrub/scrub.c index 55653b31c4c..e3bfee40489 100644 --- a/scrub/scrub.c +++ b/scrub/scrub.c @@ -217,42 +217,6 @@ _("Optimizations of %s are possible."), _(xfrog_scrubbers[i].descr)); } } -/* Save a scrub context for later repairs. */ -static int -scrub_save_repair( - struct scrub_ctx *ctx, - struct action_list *alist, - struct xfs_scrub_metadata *meta) -{ - struct action_item *aitem; - - /* Schedule this item for later repairs. 
*/ - aitem = malloc(sizeof(struct action_item)); - if (!aitem) { - str_errno(ctx, _("adding item to repair list")); - return errno; - } - - memset(aitem, 0, sizeof(*aitem)); - aitem->type = meta->sm_type; - aitem->flags = meta->sm_flags; - switch (xfrog_scrubbers[meta->sm_type].group) { - case XFROG_SCRUB_GROUP_AGHEADER: - case XFROG_SCRUB_GROUP_PERAG: - aitem->agno = meta->sm_agno; - break; - case XFROG_SCRUB_GROUP_INODE: - aitem->ino = meta->sm_ino; - aitem->gen = meta->sm_gen; - break; - default: - break; - } - - action_list_add(alist, aitem); - return 0; -} - /* * Scrub a single XFS_SCRUB_TYPE_*, saving corruption reports for later. * @@ -272,7 +236,6 @@ scrub_meta_type( .sm_agno = agno, }; enum check_outcome fix; - int ret; background_sleep(); @@ -285,10 +248,7 @@ scrub_meta_type( return ECANCELED; case CHECK_REPAIR: scrub_item_save_state(sri, type, meta.sm_flags); - ret = scrub_save_repair(ctx, alist, &meta); - if (ret) - return ret; - fallthrough; + return 0; case CHECK_DONE: scrub_item_clean_state(sri, type); return 0; @@ -469,7 +429,7 @@ scrub_file( } scrub_item_save_state(sri, type, meta.sm_flags); - return scrub_save_repair(ctx, alist, &meta); + return 0; } /* Dump a scrub item for debugging purposes. */ diff --git a/scrub/scrub.h b/scrub/scrub.h index 546651b2818..95882eabedb 100644 --- a/scrub/scrub.h +++ b/scrub/scrub.h @@ -14,8 +14,6 @@ enum check_outcome { CHECK_RETRY, /* repair failed, try again later */ }; -struct action_item; - /* * These flags record the metadata object state that the kernel returned. * We want to remember if the object was corrupt, if the cross-referencing @@ -110,14 +108,4 @@ int scrub_file(struct scrub_ctx *ctx, int fd, const struct xfs_bulkstat *bstat, unsigned int type, struct action_list *alist, struct scrub_item *sri); -/* Repair parameters are the scrub inputs and retry count. 
*/ -struct action_item { - struct list_head list; - __u64 ino; - __u32 type; - __u32 flags; - __u32 gen; - __u32 agno; -}; - #endif /* XFS_SCRUB_SCRUB_H_ */ From patchwork Fri Dec 30 22:18:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085135 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C66D4C3DA7C for ; Sat, 31 Dec 2022 00:28:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235614AbiLaA2T (ORCPT ); Fri, 30 Dec 2022 19:28:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235939AbiLaA2F (ORCPT ); Fri, 30 Dec 2022 19:28:05 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7C3C1EADE for ; Fri, 30 Dec 2022 16:28:03 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 48B4561D32 for ; Sat, 31 Dec 2022 00:28:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A32B2C433EF; Sat, 31 Dec 2022 00:28:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672446482; bh=st3VsYxtjCs4C3MHvkBqyRaCYNgQBLVaGZEOKLovux0=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=hlM6t2Cj75PeOBBJZURpw77IKEgH20iaMFLlG02TIgXxIjMaSffcpriJtU2IJ2aIX 43nybULGjKPbegLk6NWNWsiHO9xUYC3EFpIt7oKL+Te525WgllkcZh0w00SmE6XmH9 YPKEnBB+qNIuoZs0RV3JyvXe9g8cysaT9D9nPFzbQSMyUl7pOY/mS87dMVIrSKnu3W 
kOyc7TdNF9HElmAwVaLxzn6KnJdD/T61Nr64zLmC5gEP2/3Y2XejkXGCm5kokFzoKe NdHnp81Ea3pdvMGYIhDbWUF/6TnDArbnkWJmmWzExGGgu7LyXQwzb2bBoF7zI1DSpm XMltijHJnJY+w== Subject: [PATCH 3/9] xfs_scrub: remove action lists from phaseX code From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:18:17 -0800 Message-ID: <167243869753.715746.12688530121907469172.stgit@magnolia> In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia> References: <167243869711.715746.14725730988345960302.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Now that we track repair schedules by filesystem object (and not individual repairs) we can get rid of all the onstack list heads and whatnot in the phaseX code. Signed-off-by: Darrick J. Wong --- scrub/phase1.c | 5 +---- scrub/phase2.c | 16 ++++------------ scrub/phase3.c | 19 ++++++++----------- scrub/phase4.c | 8 ++------ scrub/phase5.c | 8 ++------ scrub/phase7.c | 4 +--- scrub/scrub.c | 37 ++++++++++++++++++++----------------- scrub/scrub.h | 16 +++++----------- 8 files changed, 43 insertions(+), 70 deletions(-) diff --git a/scrub/phase1.c b/scrub/phase1.c index 2c0ff7c8327..6b2f6cdd5fa 100644 --- a/scrub/phase1.c +++ b/scrub/phase1.c @@ -53,7 +53,6 @@ report_to_kernel( struct scrub_ctx *ctx) { struct scrub_item sri; - struct action_list alist; int ret; if (!ctx->scrub_setup_succeeded || ctx->corruptions_found || @@ -62,8 +61,7 @@ report_to_kernel( return 0; scrub_item_init_fs(&sri); - action_list_init(&alist); - ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_HEALTHY, 0, &alist, &sri); + ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_HEALTHY, &sri); if (ret) return ret; @@ -74,7 +72,6 @@ report_to_kernel( if (repair_item_count_needsrepair(&sri) != 0 && !debug_tweak_on("XFS_SCRUB_FORCE_REPAIR")) { str_info(ctx, _("Couldn't upload clean bill of health."), NULL); - action_list_discard(&alist); } 
return 0; diff --git a/scrub/phase2.c b/scrub/phase2.c index 83c467347fe..656eccce449 100644 --- a/scrub/phase2.c +++ b/scrub/phase2.c @@ -61,8 +61,6 @@ scan_ag_metadata( struct scrub_item fix_now; struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx; struct scan_ctl *sctl = arg; - struct action_list alist; - struct action_list immediate_alist; char descr[DESCR_BUFSZ]; unsigned int difficulty; int ret; @@ -71,15 +69,13 @@ scan_ag_metadata( return; scrub_item_init_ag(&sri, agno); - action_list_init(&alist); - action_list_init(&immediate_alist); snprintf(descr, DESCR_BUFSZ, _("AG %u"), agno); /* * First we scrub and fix the AG headers, because we need * them to work well enough to check the AG btrees. */ - ret = scrub_ag_headers(ctx, agno, &alist, &sri); + ret = scrub_ag_headers(ctx, &sri); if (ret) goto err; @@ -89,7 +85,7 @@ scan_ag_metadata( goto err; /* Now scrub the AG btrees. */ - ret = scrub_ag_metadata(ctx, agno, &alist, &sri); + ret = scrub_ag_metadata(ctx, &sri); if (ret) goto err; @@ -126,7 +122,6 @@ scan_metafile( void *arg) { struct scrub_item sri; - struct action_list alist; struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx; struct scan_ctl *sctl = arg; unsigned int difficulty; @@ -136,8 +131,7 @@ scan_metafile( goto out; scrub_item_init_fs(&sri); - action_list_init(&alist); - ret = scrub_metadata_file(ctx, type, &alist, &sri); + ret = scrub_metadata_file(ctx, type, &sri); if (ret) { sctl->aborted = true; goto out; @@ -172,7 +166,6 @@ phase2_func( .aborted = false, .rbm_done = false, }; - struct action_list alist; struct scrub_item sri; const struct xfrog_scrub_descr *sc = xfrog_scrubbers; xfs_agnumber_t agno; @@ -196,8 +189,7 @@ phase2_func( * If errors occur, this function will log them and return nonzero. 
*/ scrub_item_init_ag(&sri, 0); - action_list_init(&alist); - ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_SB, 0, &alist, &sri); + ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_SB, &sri); if (ret) goto out_wq; ret = repair_item_completely(ctx, &sri); diff --git a/scrub/phase3.c b/scrub/phase3.c index 7e09c48ce18..01171de64d1 100644 --- a/scrub/phase3.c +++ b/scrub/phase3.c @@ -107,7 +107,6 @@ scrub_inode( struct xfs_bulkstat *bstat, void *arg) { - struct action_list alist; struct scrub_item sri; struct scrub_inode_ctx *ictx = arg; struct ptcounter *icount = ictx->icount; @@ -115,7 +114,6 @@ scrub_inode( int error; scrub_item_init_file(&sri, bstat); - action_list_init(&alist); background_sleep(); /* @@ -146,7 +144,7 @@ scrub_inode( fd = scrub_open_handle(handle); /* Scrub the inode. */ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_INODE, &alist, &sri); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_INODE, &sri); if (error) goto out; @@ -155,13 +153,13 @@ scrub_inode( goto out; /* Scrub all block mappings. */ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTD, &alist, &sri); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTD, &sri); if (error) goto out; - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTA, &alist, &sri); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTA, &sri); if (error) goto out; - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTC, &alist, &sri); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_BMBTC, &sri); if (error) goto out; @@ -172,22 +170,21 @@ scrub_inode( if (S_ISLNK(bstat->bs_mode)) { /* Check symlink contents. */ error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_SYMLINK, - &alist, &sri); + &sri); } else if (S_ISDIR(bstat->bs_mode)) { /* Check the directory entries. */ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_DIR, &alist, - &sri); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_DIR, &sri); } if (error) goto out; /* Check all the extended attributes. 
*/ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_XATTR, &alist, &sri); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_XATTR, &sri); if (error) goto out; /* Check parent pointers. */ - error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_PARENT, &alist, &sri); + error = scrub_file(ctx, fd, bstat, XFS_SCRUB_TYPE_PARENT, &sri); if (error) goto out; diff --git a/scrub/phase4.c b/scrub/phase4.c index 3afd04af47e..ee6aa90f326 100644 --- a/scrub/phase4.c +++ b/scrub/phase4.c @@ -129,7 +129,6 @@ phase4_func( struct scrub_ctx *ctx) { struct xfs_fsop_geom fsgeom; - struct action_list alist; struct scrub_item sri; int ret; @@ -144,8 +143,7 @@ phase4_func( * metadata. If repairs fails, we'll come back during phase 7. */ scrub_item_init_fs(&sri); - action_list_init(&alist); - ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_FSCOUNTERS, 0, &alist, &sri); + ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_FSCOUNTERS, &sri); if (ret) return ret; @@ -160,8 +158,7 @@ phase4_func( return ret; if (fsgeom.sick & XFS_FSOP_GEOM_SICK_QUOTACHECK) { - ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_QUOTACHECK, 0, - &alist, &sri); + ret = scrub_meta_type(ctx, XFS_SCRUB_TYPE_QUOTACHECK, &sri); if (ret) return ret; } @@ -170,7 +167,6 @@ phase4_func( ret = repair_item_corruption(ctx, &sri); if (ret) return ret; - action_list_discard(&alist); ret = repair_everything(ctx); if (ret) diff --git a/scrub/phase5.c b/scrub/phase5.c index b7801b46760..ea32d185751 100644 --- a/scrub/phase5.c +++ b/scrub/phase5.c @@ -385,7 +385,6 @@ check_fs_label( struct iscan_item { struct scrub_item sri; - struct action_list alist; bool *abortedp; unsigned int scrub_type; }; @@ -412,16 +411,14 @@ iscan_worker( nanosleep(&tv, NULL); } - ret = scrub_meta_type(ctx, item->scrub_type, 0, &item->alist, - &item->sri); + ret = scrub_meta_type(ctx, item->scrub_type, &item->sri); if (ret) { str_liberror(ctx, ret, _("checking iscan metadata")); *item->abortedp = true; goto out; } - ret = action_list_process(ctx, &item->alist, - 
XRM_FINAL_WARNING | XRM_NOPROGRESS); + ret = repair_item_completely(ctx, &item->sri); if (ret) { str_liberror(ctx, ret, _("repairing iscan metadata")); *item->abortedp = true; @@ -452,7 +449,6 @@ queue_iscan( return ret; } scrub_item_init_fs(&item->sri); - action_list_init(&item->alist); item->scrub_type = scrub_type; item->abortedp = abortedp; diff --git a/scrub/phase7.c b/scrub/phase7.c index 15540778ffa..98846a1566b 100644 --- a/scrub/phase7.c +++ b/scrub/phase7.c @@ -100,7 +100,6 @@ phase7_func( { struct summary_counts totalcount = {0}; struct scrub_item sri; - struct action_list alist; struct ptvar *ptvar; unsigned long long used_data; unsigned long long used_rt; @@ -119,8 +118,7 @@ phase7_func( /* Check and fix the summary metadata. */ scrub_item_init_fs(&sri); - action_list_init(&alist); - error = scrub_summary_metadata(ctx, &alist, &sri); + error = scrub_summary_metadata(ctx, &sri); if (error) return error; error = repair_item_completely(ctx, &sri); diff --git a/scrub/scrub.c b/scrub/scrub.c index e3bfee40489..5dd5cf67a8e 100644 --- a/scrub/scrub.c +++ b/scrub/scrub.c @@ -219,6 +219,7 @@ _("Optimizations of %s are possible."), _(xfrog_scrubbers[i].descr)); /* * Scrub a single XFS_SCRUB_TYPE_*, saving corruption reports for later. + * Do not call this function to repair file metadata. * * Returns 0 for success. If errors occur, this function will log them and * return a positive error code. 
@@ -227,18 +228,29 @@ int scrub_meta_type( struct scrub_ctx *ctx, unsigned int type, - xfs_agnumber_t agno, - struct action_list *alist, struct scrub_item *sri) { struct xfs_scrub_metadata meta = { .sm_type = type, - .sm_agno = agno, }; enum check_outcome fix; background_sleep(); + switch (xfrog_scrubbers[type].group) { + case XFROG_SCRUB_GROUP_AGHEADER: + case XFROG_SCRUB_GROUP_PERAG: + meta.sm_agno = sri->sri_agno; + break; + case XFROG_SCRUB_GROUP_METAFILES: + case XFROG_SCRUB_GROUP_SUMMARY: + case XFROG_SCRUB_GROUP_NONE: + break; + default: + assert(0); + break; + } + /* Check the item. */ fix = xfs_check_metadata(ctx, &ctx->mnt, &meta, false); progress_add(1); @@ -267,8 +279,6 @@ static bool scrub_group( struct scrub_ctx *ctx, enum xfrog_scrub_group group, - xfs_agnumber_t agno, - struct action_list *alist, struct scrub_item *sri) { const struct xfrog_scrub_descr *sc; @@ -281,7 +291,7 @@ scrub_group( if (sc->group != group) continue; - ret = scrub_meta_type(ctx, type, agno, alist, sri); + ret = scrub_meta_type(ctx, type, sri); if (ret) return ret; } @@ -293,22 +303,18 @@ scrub_group( int scrub_ag_headers( struct scrub_ctx *ctx, - xfs_agnumber_t agno, - struct action_list *alist, struct scrub_item *sri) { - return scrub_group(ctx, XFROG_SCRUB_GROUP_AGHEADER, agno, alist, sri); + return scrub_group(ctx, XFROG_SCRUB_GROUP_AGHEADER, sri); } /* Scrub each AG's metadata btrees. 
*/ int scrub_ag_metadata( struct scrub_ctx *ctx, - xfs_agnumber_t agno, - struct action_list *alist, struct scrub_item *sri) { - return scrub_group(ctx, XFROG_SCRUB_GROUP_PERAG, agno, alist, sri); + return scrub_group(ctx, XFROG_SCRUB_GROUP_PERAG, sri); } /* Scrub one metadata file */ @@ -316,22 +322,20 @@ int scrub_metadata_file( struct scrub_ctx *ctx, unsigned int type, - struct action_list *alist, struct scrub_item *sri) { ASSERT(xfrog_scrubbers[type].group == XFROG_SCRUB_GROUP_METAFILES); - return scrub_meta_type(ctx, type, 0, alist, sri); + return scrub_meta_type(ctx, type, sri); } /* Scrub all FS summary metadata. */ int scrub_summary_metadata( struct scrub_ctx *ctx, - struct action_list *alist, struct scrub_item *sri) { - return scrub_group(ctx, XFROG_SCRUB_GROUP_SUMMARY, 0, alist, sri); + return scrub_group(ctx, XFROG_SCRUB_GROUP_SUMMARY, sri); } /* How many items do we have to check? */ @@ -393,7 +397,6 @@ scrub_file( int fd, const struct xfs_bulkstat *bstat, unsigned int type, - struct action_list *alist, struct scrub_item *sri) { struct xfs_scrub_metadata meta = {0}; diff --git a/scrub/scrub.h b/scrub/scrub.h index 95882eabedb..e1e70b38b8e 100644 --- a/scrub/scrub.h +++ b/scrub/scrub.h @@ -80,18 +80,13 @@ void scrub_item_dump(struct scrub_item *sri, unsigned int group_mask, const char *tag); void scrub_report_preen_triggers(struct scrub_ctx *ctx); -int scrub_ag_headers(struct scrub_ctx *ctx, xfs_agnumber_t agno, - struct action_list *alist, struct scrub_item *sri); -int scrub_ag_metadata(struct scrub_ctx *ctx, xfs_agnumber_t agno, - struct action_list *alist, struct scrub_item *sri); +int scrub_ag_headers(struct scrub_ctx *ctx, struct scrub_item *sri); +int scrub_ag_metadata(struct scrub_ctx *ctx, struct scrub_item *sri); int scrub_metadata_file(struct scrub_ctx *ctx, unsigned int scrub_type, - struct action_list *alist, struct scrub_item *sri); -int scrub_iscan_metadata(struct scrub_ctx *ctx, struct action_list *alist, - struct scrub_item *sri); -int 
scrub_summary_metadata(struct scrub_ctx *ctx, struct action_list *alist, struct scrub_item *sri); +int scrub_iscan_metadata(struct scrub_ctx *ctx, struct scrub_item *sri); +int scrub_summary_metadata(struct scrub_ctx *ctx, struct scrub_item *sri); int scrub_meta_type(struct scrub_ctx *ctx, unsigned int type, - xfs_agnumber_t agno, struct action_list *alist, struct scrub_item *sri); bool can_scrub_fs_metadata(struct scrub_ctx *ctx); @@ -105,7 +100,6 @@ bool can_repair(struct scrub_ctx *ctx); bool can_force_rebuild(struct scrub_ctx *ctx); int scrub_file(struct scrub_ctx *ctx, int fd, const struct xfs_bulkstat *bstat, - unsigned int type, struct action_list *alist, - struct scrub_item *sri); + unsigned int type, struct scrub_item *sri); #endif /* XFS_SCRUB_SCRUB_H_ */ From patchwork Fri Dec 30 22:18:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085136 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D60B6C3DA7C for ; Sat, 31 Dec 2022 00:29:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231164AbiLaA2x (ORCPT ); Fri, 30 Dec 2022 19:28:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60136 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229749AbiLaA2V (ORCPT ); Fri, 30 Dec 2022 19:28:21 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B822F1E3FE for ; Fri, 30 Dec 2022 16:28:20 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org 
(Postfix) with ESMTPS id 74218B81EAD for ; Sat, 31 Dec 2022 00:28:19 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2FB53C433EF; Sat, 31 Dec 2022 00:28:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672446498; bh=THDiV6lz70du9OdKvWFTE8UhOfrdktdXp1k3ZnBz9pQ=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=jddZzBZ5EAZbw+wy19APHAy06yeLTEHyixBDhCx+xBFo5VH3YvMWskNNoAWUG2uij ZJQ8hNwE0K55nKTzBTNK0N6PMHqRBsPpOKGRP9+gytEURDXRqDsig14n3NTTVQgJ5w OHykNjunlY6NU/64do1LdMDUNkkxEHwjUzUI0WkFL+wYNcpJEwJZTJKl+GP03qJxSw 149I6yKz4RV9QM4d8HZxhEm8MN7pIo+8LydhMN6dOM20KncKPtucqweydquzjotuMH Y1W2gdtfe9M+K2vHowQJb06nP5pTqc4oQ55MmpI/5eKmSV9WBpAOLSaXq4FeNLCyJu NKhVnAYifUIgg== Subject: [PATCH 4/9] xfs_scrub: remove scrub_metadata_file From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:18:17 -0800 Message-ID: <167243869766.715746.17566397921177016618.stgit@magnolia> In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia> References: <167243869711.715746.14725730988345960302.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Collapse this function with scrub_meta_type. Signed-off-by: Darrick J. 
Wong --- scrub/phase2.c | 2 +- scrub/scrub.c | 12 ------------ scrub/scrub.h | 2 -- 3 files changed, 1 insertion(+), 15 deletions(-) diff --git a/scrub/phase2.c b/scrub/phase2.c index 656eccce449..138f0f8a8f3 100644 --- a/scrub/phase2.c +++ b/scrub/phase2.c @@ -131,7 +131,7 @@ scan_metafile( goto out; scrub_item_init_fs(&sri); - ret = scrub_metadata_file(ctx, type, &sri); + ret = scrub_meta_type(ctx, type, &sri); if (ret) { sctl->aborted = true; goto out; diff --git a/scrub/scrub.c b/scrub/scrub.c index 5dd5cf67a8e..b970d1cfe90 100644 --- a/scrub/scrub.c +++ b/scrub/scrub.c @@ -317,18 +317,6 @@ scrub_ag_metadata( return scrub_group(ctx, XFROG_SCRUB_GROUP_PERAG, sri); } -/* Scrub one metadata file */ -int -scrub_metadata_file( - struct scrub_ctx *ctx, - unsigned int type, - struct scrub_item *sri) -{ - ASSERT(xfrog_scrubbers[type].group == XFROG_SCRUB_GROUP_METAFILES); - - return scrub_meta_type(ctx, type, sri); -} - /* Scrub all FS summary metadata. */ int scrub_summary_metadata( diff --git a/scrub/scrub.h b/scrub/scrub.h index e1e70b38b8e..6e34ca2d7b3 100644 --- a/scrub/scrub.h +++ b/scrub/scrub.h @@ -82,8 +82,6 @@ void scrub_item_dump(struct scrub_item *sri, unsigned int group_mask, void scrub_report_preen_triggers(struct scrub_ctx *ctx); int scrub_ag_headers(struct scrub_ctx *ctx, struct scrub_item *sri); int scrub_ag_metadata(struct scrub_ctx *ctx, struct scrub_item *sri); -int scrub_metadata_file(struct scrub_ctx *ctx, unsigned int scrub_type, - struct scrub_item *sri); int scrub_iscan_metadata(struct scrub_ctx *ctx, struct scrub_item *sri); int scrub_summary_metadata(struct scrub_ctx *ctx, struct scrub_item *sri); int scrub_meta_type(struct scrub_ctx *ctx, unsigned int type, From patchwork Fri Dec 30 22:18:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085137 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F252C4332F for ; Sat, 31 Dec 2022 00:29:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229749AbiLaA2y (ORCPT ); Fri, 30 Dec 2022 19:28:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235656AbiLaA2h (ORCPT ); Fri, 30 Dec 2022 19:28:37 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 621FE1E3FE for ; Fri, 30 Dec 2022 16:28:36 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 0304EB81EAC for ; Sat, 31 Dec 2022 00:28:35 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B6C31C433EF; Sat, 31 Dec 2022 00:28:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672446513; bh=WYucsrfUj/a/i1rEGxY0w8reWZASbJ5VvVvHcldRpxE=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=OLzKIjlXSv69Zy1XDRYZVoEurBWenWm6b3Bzi2zbkiwJFNzipnVh5Q5SUh2SgoVHM wCzWZhRmh8Jctf3cCeCai+/GAc6AXs92Iys3SB3E5lankMDCuXdI65QXn+cpbsvPI/ fYTPcy3s+NPdeQjWZ8fbAf0URwW4n4t27KlPoyp8TWKc+V4mavW82FxOtsCn2oKk0U VSDOAeKcQ9fZUf25PSjSCp7bdaCTVzamZATqZ96vMXZJyOF+RUiZLik0bToOpwfUaa bDPzdc4RFMxrOGYdQXfEtI7qrxmAqATQ8QaH0bAZtdlyISR4quHszh4PCtJhlfi0QW 5Nbswx4QZWo8g== Subject: [PATCH 5/9] xfs_scrub: boost the repair priority of dependencies of damaged items From: "Darrick J. 
Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:18:17 -0800 Message-ID: <167243869779.715746.14845159623486485344.stgit@magnolia> In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia> References: <167243869711.715746.14725730988345960302.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong In XFS, certain types of metadata objects depend on the correctness of lower level metadata objects. For example, directory blocks are stored in the data fork of directory files, which means that any issues with the inode core and the data fork should be dealt with before we try to repair a directory. xfs_scrub prioritises repairs by the severity of what the kernel scrub function reports -- anything directly observed to be corrupt gets repaired first, then anything that had trouble with cross referencing, and finally anything that was correct but could be further optimised. Returning to the above example, if a directory data fork mapping offset is off by a bit flip, scrub will mark that as failing cross referencing, but it'll mark the directory as corrupt. Repair should check out the mapping problem before it tackles the directory. Do this by embedding a dependency table and using it to boost the priority of the repair_item fields as needed. Signed-off-by: Darrick J. Wong --- libfrog/scrub.c | 1 + scrub/repair.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++- scrub/scrub.h | 12 ++++++ scrub/scrub_private.h | 8 ++++ 4 files changed, 116 insertions(+), 3 deletions(-) diff --git a/libfrog/scrub.c b/libfrog/scrub.c index 7cd241d9bce..3e322b4717d 100644 --- a/libfrog/scrub.c +++ b/libfrog/scrub.c @@ -150,6 +150,7 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = { .group = XFROG_SCRUB_GROUP_NONE, }, }; +#undef DEP /* Invoke the scrub ioctl. Returns zero or negative error code.
*/ int diff --git a/scrub/repair.c b/scrub/repair.c index cadd2c20627..16acb0a0f10 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -22,6 +22,28 @@ /* General repair routines. */ +/* + * Bitmap showing the correctness dependencies between scrub types for repairs. + * There are no edges between AG btrees and AG headers because we can't mount + * the filesystem if the btree root pointers in the AG headers are wrong. + * Dependencies cannot cross scrub groups. + */ +#define DEP(x) (1U << (x)) +static const unsigned int repair_deps[XFS_SCRUB_TYPE_NR] = { + [XFS_SCRUB_TYPE_BMBTD] = DEP(XFS_SCRUB_TYPE_INODE), + [XFS_SCRUB_TYPE_BMBTA] = DEP(XFS_SCRUB_TYPE_INODE), + [XFS_SCRUB_TYPE_BMBTC] = DEP(XFS_SCRUB_TYPE_INODE), + [XFS_SCRUB_TYPE_DIR] = DEP(XFS_SCRUB_TYPE_BMBTD), + [XFS_SCRUB_TYPE_XATTR] = DEP(XFS_SCRUB_TYPE_BMBTA), + [XFS_SCRUB_TYPE_SYMLINK] = DEP(XFS_SCRUB_TYPE_BMBTD), + [XFS_SCRUB_TYPE_PARENT] = DEP(XFS_SCRUB_TYPE_BMBTD), + [XFS_SCRUB_TYPE_QUOTACHECK] = DEP(XFS_SCRUB_TYPE_UQUOTA) | + DEP(XFS_SCRUB_TYPE_GQUOTA) | + DEP(XFS_SCRUB_TYPE_PQUOTA), + [XFS_SCRUB_TYPE_RTSUM] = DEP(XFS_SCRUB_TYPE_RTBITMAP), +}; +#undef DEP + /* Repair some metadata. */ static enum check_outcome xfs_repair_metadata( @@ -34,8 +56,16 @@ xfs_repair_metadata( struct xfs_scrub_metadata meta = { 0 }; struct xfs_scrub_metadata oldm; DEFINE_DESCR(dsc, ctx, format_scrub_descr); + bool repair_only; int error; + /* + * If the caller boosted the priority of this scrub type on behalf of a + * higher level repair by setting IFLAG_REPAIR, turn off REPAIR_ONLY. 
+ */ + repair_only = (repair_flags & XRM_REPAIR_ONLY) && + scrub_item_type_boosted(sri, scrub_type); + assert(scrub_type < XFS_SCRUB_TYPE_NR); assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL")); meta.sm_type = scrub_type; @@ -55,7 +85,7 @@ xfs_repair_metadata( break; } - if (!is_corrupt(&meta) && (repair_flags & XRM_REPAIR_ONLY)) + if (!is_corrupt(&meta) && repair_only) return CHECK_RETRY; memcpy(&oldm, &meta, sizeof(oldm)); @@ -215,6 +245,60 @@ struct action_item { struct scrub_item sri; }; +/* + * The operation of higher level metadata objects depends on the correctness of + * lower level metadata objects. This means that if X depends on Y, we must + * investigate and correct all the observed issues with Y before we try to make + * a correction to X. For all scheduled repair activity on X, boost the + * priority of repairs on all the Ys to ensure this correctness. + */ +static void +repair_item_boost_priorities( + struct scrub_item *sri) +{ + unsigned int scrub_type; + + foreach_scrub_type(scrub_type) { + unsigned int dep_mask = repair_deps[scrub_type]; + unsigned int b; + + if (repair_item_count_needsrepair(sri) == 0 || !dep_mask) + continue; + + /* + * Check if the repairs for this scrub type depend on any other + * scrub types that have been flagged with cross-referencing + * errors and are not already tagged for the highest priority + * repair (SCRUB_ITEM_CORRUPT). If so, boost the priority of + * that scrub type (via SCRUB_ITEM_BOOST_REPAIR) so that any + * problems with the dependencies will (hopefully) be fixed + * before we start repairs on this scrub type. + * + * So far in the history of xfs_scrub we have maintained that + * lower numbered scrub types do not depend on higher numbered + * scrub types, so we need only process the bit mask once. 
+ */ + for (b = 0; b < XFS_SCRUB_TYPE_NR; b++, dep_mask >>= 1) { + if (!dep_mask) + break; + if (!(dep_mask & 1)) + continue; + if (!(sri->sri_state[b] & SCRUB_ITEM_REPAIR_XREF)) + continue; + if (sri->sri_state[b] & SCRUB_ITEM_CORRUPT) + continue; + sri->sri_state[b] |= SCRUB_ITEM_BOOST_REPAIR; + } + } +} + +/* + * These are the scrub item state bits that must be copied when scheduling + * a (per-AG) scrub type for immediate repairs. The original state tracking + * bits are left untouched to force a rescan in phase 4. + */ +#define MUSTFIX_STATES (SCRUB_ITEM_CORRUPT | \ + SCRUB_ITEM_BOOST_REPAIR) /* * Figure out which AG metadata must be fixed before we can move on * to the inode scan. @@ -227,17 +311,21 @@ repair_item_mustfix( unsigned int scrub_type; assert(sri->sri_agno != -1U); + repair_item_boost_priorities(sri); scrub_item_init_ag(fix_now, sri->sri_agno); foreach_scrub_type(scrub_type) { - if (!(sri->sri_state[scrub_type] & SCRUB_ITEM_CORRUPT)) + unsigned int state; + + state = sri->sri_state[scrub_type] & MUSTFIX_STATES; + if (!state) continue; switch (scrub_type) { case XFS_SCRUB_TYPE_AGI: case XFS_SCRUB_TYPE_FINOBT: case XFS_SCRUB_TYPE_INOBT: - fix_now->sri_state[scrub_type] |= SCRUB_ITEM_CORRUPT; + fix_now->sri_state[scrub_type] = state; break; } } @@ -471,6 +559,8 @@ repair_file_corruption( struct scrub_item *sri, int override_fd) { + repair_item_boost_priorities(sri); + return repair_item_class(ctx, sri, override_fd, SCRUB_ITEM_CORRUPT, XRM_REPAIR_ONLY | XRM_NOPROGRESS); } @@ -487,6 +577,8 @@ repair_item( { int ret; + repair_item_boost_priorities(sri); + ret = repair_item_class(ctx, sri, -1, SCRUB_ITEM_CORRUPT, flags); if (ret) return ret; diff --git a/scrub/scrub.h b/scrub/scrub.h index 6e34ca2d7b3..0d5738dc692 100644 --- a/scrub/scrub.h +++ b/scrub/scrub.h @@ -14,6 +14,14 @@ enum check_outcome { CHECK_RETRY, /* repair failed, try again later */ }; +/* + * This flag boosts the repair priority of a scrub item when a dependent scrub + * item is 
scheduled for repair. Use a separate flag to preserve the + * corruption state that we got from the kernel. Priority boost is cleared the + * next time xfs_repair_metadata is called. + */ +#define SCRUB_ITEM_BOOST_REPAIR (1 << 0) + /* * These flags record the metadata object state that the kernel returned. * We want to remember if the object was corrupt, if the cross-referencing @@ -31,6 +39,10 @@ enum check_outcome { SCRUB_ITEM_XFAIL | \ SCRUB_ITEM_XCORRUPT) +/* Cross-referencing failures only. */ +#define SCRUB_ITEM_REPAIR_XREF (SCRUB_ITEM_XFAIL | \ + SCRUB_ITEM_XCORRUPT) + struct scrub_item { /* * Information we need to call the scrub and repair ioctls. Per-AG diff --git a/scrub/scrub_private.h b/scrub/scrub_private.h index f91c65383d1..eafb750b0d1 100644 --- a/scrub/scrub_private.h +++ b/scrub/scrub_private.h @@ -71,4 +71,12 @@ scrub_item_clean_state( sri->sri_state[scrub_type] = 0; } +static inline bool +scrub_item_type_boosted( + struct scrub_item *sri, + unsigned int scrub_type) +{ + return sri->sri_state[scrub_type] & SCRUB_ITEM_BOOST_REPAIR; +} + #endif /* XFS_SCRUB_SCRUB_PRIVATE_H_ */ From patchwork Fri Dec 30 22:18:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085138 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A2CAC3DA7D for ; Sat, 31 Dec 2022 00:29:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235592AbiLaA3Z (ORCPT ); Fri, 30 Dec 2022 19:29:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235664AbiLaA2w (ORCPT ); Fri, 30 Dec 2022 19:28:52 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D874D1E3FE for ; Fri, 30 Dec 2022 16:28:51 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 8CA2CB81EAC for ; Sat, 31 Dec 2022 00:28:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 50123C433D2; Sat, 31 Dec 2022 00:28:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672446529; bh=M1FskjE7kjB8CrAVJEo+WtbH4hOUSyy4dRFaOF3F9a4=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=Zj/FEU9jEdzNxEvAkqK7yxuakPHaQzvqeVjqCgpEoWvtFdlOXrPYEOutxFh1ZJziC J5xjxDhQmMHcFR+RY/0BRW4ufeoGmYueT+zlpVan1wIpB7A5UxjGarzDdZFor2n1pX V5+JytN5qRwYJ5hwaUgl3RGX2JvplJ8/+0W0jhzJeULkVWgjUHeNuuRGfw71q/YAus dGzCxm+os3qOs5aWBJ6WnLRV4UUsp0KzjEchESXrgWq2bqhybDh5GFWi4ZCb7xkeXo IWs7LmzzI26kdfkmiDCTrlQGmWCv3QpL2wwNYGnfcdPwrmIMbpzUmKIbQtw2EhIirT 4HsH80lgYmGqA== Subject: [PATCH 6/9] xfs_scrub: clean up repair_item_difficulty a little From: "Darrick J. 
Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:18:17 -0800 Message-ID: <167243869793.715746.14142316853722833072.stgit@magnolia> In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia> References: <167243869711.715746.14725730988345960302.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Document the flags handling in repair_item_difficulty. Signed-off-by: Darrick J. Wong --- scrub/repair.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/scrub/repair.c b/scrub/repair.c index 16acb0a0f10..7ad4f6cfe8a 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -331,6 +331,15 @@ repair_item_mustfix( } } +/* + * These scrub item states correspond to metadata that is inconsistent in some + * way and must be repaired. If too many metadata objects share these states, + * this can make repairs difficult. + */ +#define HARDREPAIR_STATES (SCRUB_ITEM_CORRUPT | \ + SCRUB_ITEM_XCORRUPT | \ + SCRUB_ITEM_XFAIL) + /* Determine if primary or secondary metadata are inconsistent. */ unsigned int repair_item_difficulty( @@ -340,9 +349,10 @@ repair_item_difficulty( unsigned int ret = 0; foreach_scrub_type(scrub_type) { - if (!(sri->sri_state[scrub_type] & (XFS_SCRUB_OFLAG_CORRUPT | - XFS_SCRUB_OFLAG_XCORRUPT | - XFS_SCRUB_OFLAG_XFAIL))) + unsigned int state; + + state = sri->sri_state[scrub_type] & HARDREPAIR_STATES; + if (!state) continue; switch (scrub_type) { From patchwork Fri Dec 30 22:18:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085139 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 173FAC4332F for ; Sat, 31 Dec 2022 00:29:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231570AbiLaA30 (ORCPT ); Fri, 30 Dec 2022 19:29:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235583AbiLaA3G (ORCPT ); Fri, 30 Dec 2022 19:29:06 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CD9DE1E3FE for ; Fri, 30 Dec 2022 16:29:05 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 69EC561D3E for ; Sat, 31 Dec 2022 00:29:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CA231C433D2; Sat, 31 Dec 2022 00:29:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672446544; bh=khK6i/zrvDEnxOYr9g+ON7FbIBrB7/utzpBc0YK49Sc=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=nFZXmAqXwV/TMiO+9XCmRrkCJmwPGSEzlTeo82R+652XqFkpJs1ldvb90aZ1VrKEC y2UmFDr8tq1WoJHmU7XY18KNXK1e4w2ATsk+5H0DIIiBZeruIBC2SkwzeRjgIU1Ayv 26wA73BzhwIuyLjHVsxKHUkqkgszoHCz4+TDRiDvr0H/UCzAjhvHggUmIxEVuzlVZG lr22ScInB1Qs8hy+fsV7MxScb+p/TfJXa6U5EAQgQ5U4DLZHx/CVXeiqErSX7JyaKA kEIsCG973wbWn0vvOcjjaH5fdPe14Yx9qBXZxf02I3XSil+9KXjJ5tdAARN/1wY1f3 Yr2wDqT/qNwpQ== Subject: [PATCH 7/9] xfs_scrub: check dependencies of a scrub type before repairing From: "Darrick J. 
Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:18:18 -0800 Message-ID: <167243869806.715746.18228141058609604189.stgit@magnolia> In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia> References: <167243869711.715746.14725730988345960302.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Now that we have a map of a scrub type to its dependent scrub types, use this information to avoid trying to fix higher level metadata before the lower levels have passed. Signed-off-by: Darrick J. Wong --- scrub/repair.c | 32 ++++++++++++++++++++++++++++++++ scrub/scrub.h | 5 +++++ 2 files changed, 37 insertions(+) diff --git a/scrub/repair.c b/scrub/repair.c index 7ad4f6cfe8a..8624167246a 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -488,6 +488,29 @@ action_list_process( return ret; } +/* Decide if the dependent scrub types of the given scrub type are ok. */ +static bool +repair_item_dependencies_ok( + const struct scrub_item *sri, + unsigned int scrub_type) +{ + unsigned int dep_mask = repair_deps[scrub_type]; + unsigned int b; + + for (b = 0; dep_mask && b < XFS_SCRUB_TYPE_NR; b++, dep_mask >>= 1) { + if (!(dep_mask & 1)) + continue; + /* + * If this lower level object also needs repair, we can't fix + * the higher level item. + */ + if (sri->sri_state[b] & SCRUB_ITEM_NEEDSREPAIR) + return false; + } + + return true; +} + /* * For a given filesystem object, perform all repairs of a given class * (corrupt, xcorrupt, xfail, preen) if the repair item says it's needed. @@ -527,6 +550,15 @@ repair_item_class( if (!(sri->sri_state[scrub_type] & repair_mask)) continue; + /* + * Don't try to repair higher level items if their lower-level + * dependencies haven't been verified, unless this is our last + * chance to fix things without complaint. 
+ */ + if (!(flags & XRM_FINAL_WARNING) && + !repair_item_dependencies_ok(sri, scrub_type)) + continue; + fix = xfs_repair_metadata(ctx, xfdp, scrub_type, sri, flags); switch (fix) { case CHECK_DONE: diff --git a/scrub/scrub.h b/scrub/scrub.h index 0d5738dc692..75595f43ee9 100644 --- a/scrub/scrub.h +++ b/scrub/scrub.h @@ -43,6 +43,11 @@ enum check_outcome { #define SCRUB_ITEM_REPAIR_XREF (SCRUB_ITEM_XFAIL | \ SCRUB_ITEM_XCORRUPT) +/* Mask of bits signalling that a piece of metadata requires attention. */ +#define SCRUB_ITEM_NEEDSREPAIR (SCRUB_ITEM_CORRUPT | \ + SCRUB_ITEM_XFAIL | \ + SCRUB_ITEM_XCORRUPT) + struct scrub_item { /* * Information we need to call the scrub and repair ioctls. Per-AG From patchwork Fri Dec 30 22:18:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085140 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49EA6C4708E for ; Sat, 31 Dec 2022 00:29:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235513AbiLaA30 (ORCPT ); Fri, 30 Dec 2022 19:29:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235484AbiLaA3W (ORCPT ); Fri, 30 Dec 2022 19:29:22 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C2E71E3FE for ; Fri, 30 Dec 2022 16:29:21 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id ED10761D3E for ; Sat, 31 Dec 2022 00:29:20 
+0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5C2A5C433EF; Sat, 31 Dec 2022 00:29:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672446560; bh=SMX8a96RM7v3kAMtIgo4WE60AO6p59BH9Hn+vb1Djew=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=pcjEW1TkBgDLgihWl7FyksgkHjGOnHPZ/fLs25Ri+CeqLxY6cyWxaE478fiYeI9Dr r4dqKzli1r9DReziWDI0U406IjdvGyL9uSjD3/9tIt2hIblLCf/J0jUpEVaWYp+oQv hVtkZZNXPc9zxpM/oB7sa1Kwn8TSKAVCWfeJ3/16OJBj7IATeIzAwspW3d21sb53b5 cfzsZ1jD2Al1VgbmSMekE4ilKWRfY85Mj2i7xbT/fLgR8Za6h4jzcDlFUKgK8Euy5/ vBKBJxId5Z2Dh4ELj51XtyENWQ+YyMPdj3AUyP1xG1VI8HWP/ujbB/Amddr0U78VA4 caqT7dvlZSm0A== Subject: [PATCH 8/9] xfs_scrub: retry incomplete repairs From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:18:18 -0800 Message-ID: <167243869820.715746.7386680567080978081.stgit@magnolia> In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia> References: <167243869711.715746.14725730988345960302.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong If a repair says it didn't do anything on account of not being able to complete a scan of the metadata, retry the repair a few times; if even that doesn't work, we can delay it to phase 4. Signed-off-by: Darrick J. 
Wong --- scrub/repair.c | 15 ++++++++++++++- scrub/scrub.c | 3 +-- scrub/scrub_private.h | 10 ++++++++++ 3 files changed, 25 insertions(+), 3 deletions(-) diff --git a/scrub/repair.c b/scrub/repair.c index 8624167246a..c1ab03d6f02 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -57,6 +57,7 @@ xfs_repair_metadata( struct xfs_scrub_metadata oldm; DEFINE_DESCR(dsc, ctx, format_scrub_descr); bool repair_only; + unsigned int tries = 0; int error; /* @@ -98,6 +99,7 @@ xfs_repair_metadata( str_info(ctx, descr_render(&dsc), _("Attempting optimization.")); +retry: error = -xfrog_scrub_metadata(xfdp, &meta); switch (error) { case 0: @@ -176,9 +178,20 @@ _("Read-only filesystem; cannot make changes.")); return CHECK_DONE; } + /* + * If the kernel says the repair was incomplete or that there was a + * cross-referencing discrepancy but no obvious corruption, we'll try + * the repair again, just in case the fs was busy. Only retry so many + * times. + */ + if (want_retry(&meta) && tries < 10) { + tries++; + goto retry; + } + if (repair_flags & XRM_FINAL_WARNING) scrub_warn_incomplete_scrub(ctx, &dsc, &meta); - if (needs_repair(&meta)) { + if (needs_repair(&meta) || is_incomplete(&meta)) { /* * Still broken; if we've been told not to complain then we * just requeue this and try again later. Otherwise we diff --git a/scrub/scrub.c b/scrub/scrub.c index b970d1cfe90..699e9aa3940 100644 --- a/scrub/scrub.c +++ b/scrub/scrub.c @@ -137,8 +137,7 @@ _("Filesystem is shut down, aborting.")); * we'll try the scan again, just in case the fs was busy. * Only retry so many times. 
*/ - if (tries < 10 && (is_incomplete(meta) || - (xref_disagrees(meta) && !is_corrupt(meta)))) { + if (want_retry(meta) && tries < 10) { tries++; goto retry; } diff --git a/scrub/scrub_private.h b/scrub/scrub_private.h index eafb750b0d1..b54384c2091 100644 --- a/scrub/scrub_private.h +++ b/scrub/scrub_private.h @@ -49,6 +49,16 @@ static inline bool needs_repair(struct xfs_scrub_metadata *sm) return is_corrupt(sm) || xref_disagrees(sm); } +/* + * We want to retry an operation if the kernel says it couldn't complete the + * scan/repair; or if there were cross-referencing problems but the object was + * not obviously corrupt. + */ +static inline bool want_retry(struct xfs_scrub_metadata *sm) +{ + return is_incomplete(sm) || (xref_disagrees(sm) && !is_corrupt(sm)); +} + void scrub_warn_incomplete_scrub(struct scrub_ctx *ctx, struct descr *dsc, struct xfs_scrub_metadata *meta); From patchwork Fri Dec 30 22:18:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085141 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91A8DC4332F for ; Sat, 31 Dec 2022 00:30:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235435AbiLaA35 (ORCPT ); Fri, 30 Dec 2022 19:29:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32826 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235603AbiLaA3k (ORCPT ); Fri, 30 Dec 2022 19:29:40 -0500 Received: from sin.source.kernel.org (sin.source.kernel.org [IPv6:2604:1380:40e1:4800::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D4C21EACF for ; Fri, 30 Dec 2022 16:29:39 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id 9A61BCE1AC3 for ; Sat, 31 Dec 2022 00:29:37 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DC5BAC433EF; Sat, 31 Dec 2022 00:29:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672446575; bh=kL9Y7YuRzbEEYIQTD79/1YXgJsIq9YtNvFd6fSMYQdA=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=gZjVRUp/3XmxFPrQevE4i9dDgRjVqZd2ADvgMbvJakImMPDXPfdqzgGqIGdt9yyZ2 +35lSrKBSVWJWQB6mO/WfsrVT+syyxFs5vU41rupS8SgFESEFTJe3/FB/Eg4AcOrS9 7j4ksSxaqOxbNHDNez418wc7/bBdgu0N9+cyVS8G33ugeWl2CX7A9o+4d8KysxuPW6 c9nf9WSq2TPK8VilQezWuTpd8XoynoeFmSw222GVrqv5bdaXNfHrLZG0rMjHGFqU+Y vPtAst6DCeJzTKybTikuIMwO9SXjyZQw0iHFp4AI67SoI96JzTVOtmmjQU1HbjuBwD HxiamP/hmWNBQ== Subject: [PATCH 9/9] xfs_scrub: remove unused action_list fields From: "Darrick J. 
Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:18:18 -0800 Message-ID: <167243869833.715746.893789790108959642.stgit@magnolia> In-Reply-To: <167243869711.715746.14725730988345960302.stgit@magnolia> References: <167243869711.715746.14725730988345960302.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Remove some fields since we don't need them anymore. Signed-off-by: Darrick J. Wong --- scrub/repair.c | 5 ----- scrub/repair.h | 2 -- 2 files changed, 7 deletions(-) diff --git a/scrub/repair.c b/scrub/repair.c index c1ab03d6f02..a552b445e90 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -423,7 +423,6 @@ action_list_discard( struct action_item *n; list_for_each_entry_safe(aitem, n, &alist->list, list) { - alist->nr--; list_del(&aitem->list); free(aitem); } @@ -444,8 +443,6 @@ action_list_init( struct action_list *alist) { INIT_LIST_HEAD(&alist->list); - alist->nr = 0; - alist->sorted = false; } /* Number of pending repairs in this list. */ @@ -469,8 +466,6 @@ action_list_add( struct action_item *aitem) { list_add_tail(&aitem->list, &alist->list); - alist->nr++; - alist->sorted = false; } /* Repair everything on this list. */ diff --git a/scrub/repair.h b/scrub/repair.h index b0b448cef7a..d76bb963cdd 100644 --- a/scrub/repair.h +++ b/scrub/repair.h @@ -8,8 +8,6 @@ struct action_list { struct list_head list; - unsigned long long nr; - bool sorted; }; struct action_item;
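
For readers following the series, the dependency handling from patches 5/9 and 7/9 can be sketched in isolation. This is a minimal illustration and not the xfs_scrub code: every demo_* and DEMO_* name is hypothetical, the three-entry table stands in for repair_deps[], and the state bits only imitate the SCRUB_ITEM_* flags. One known simplification: the sketch decides per scrub type whether a repair is scheduled, whereas the patch calls repair_item_count_needsrepair(sri), whose definition is not shown in this part of the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of scrub types; not the XFS_SCRUB_TYPE_* values. */
enum demo_type {
	DEMO_INODE,
	DEMO_BMBTD,
	DEMO_DIR,
	DEMO_TYPE_NR,
};

/* Hypothetical state bits that imitate the SCRUB_ITEM_* flags. */
#define DEMO_BOOST_REPAIR	(1U << 0)
#define DEMO_CORRUPT		(1U << 1)
#define DEMO_XFAIL		(1U << 2)
#define DEMO_XCORRUPT		(1U << 3)
#define DEMO_REPAIR_XREF	(DEMO_XFAIL | DEMO_XCORRUPT)
#define DEMO_NEEDSREPAIR	(DEMO_CORRUPT | DEMO_XFAIL | DEMO_XCORRUPT)

/* Bit N set in demo_deps[X] means "X depends on type N". */
#define DEP(x)	(1U << (x))
static const unsigned int demo_deps[DEMO_TYPE_NR] = {
	[DEMO_BMBTD]	= DEP(DEMO_INODE),
	[DEMO_DIR]	= DEP(DEMO_BMBTD),
};
#undef DEP

struct demo_item {
	unsigned int	state[DEMO_TYPE_NR];
};

/*
 * For each type that needs repair, boost any dependency that only has
 * cross-referencing problems and is not already marked corrupt.
 */
void demo_boost_priorities(struct demo_item *it)
{
	unsigned int	type;

	for (type = 0; type < DEMO_TYPE_NR; type++) {
		unsigned int	dep_mask = demo_deps[type];
		unsigned int	b;

		/* Simplification: per-type check instead of an item-wide count. */
		if (!(it->state[type] & DEMO_NEEDSREPAIR) || !dep_mask)
			continue;

		for (b = 0; dep_mask && b < DEMO_TYPE_NR; b++, dep_mask >>= 1) {
			if (!(dep_mask & 1))
				continue;
			if (!(it->state[b] & DEMO_REPAIR_XREF))
				continue;
			if (it->state[b] & DEMO_CORRUPT)
				continue;
			it->state[b] |= DEMO_BOOST_REPAIR;
		}
	}
}

/* Return true if no dependency of the given type still needs repair. */
bool demo_dependencies_ok(const struct demo_item *it, unsigned int type)
{
	unsigned int	dep_mask;
	unsigned int	b;

	assert(type < DEMO_TYPE_NR);
	dep_mask = demo_deps[type];

	for (b = 0; dep_mask && b < DEMO_TYPE_NR; b++, dep_mask >>= 1) {
		if (!(dep_mask & 1))
			continue;
		if (it->state[b] & DEMO_NEEDSREPAIR)
			return false;
	}
	return true;
}
```

With an item whose DEMO_DIR state is DEMO_CORRUPT and whose DEMO_BMBTD state is DEMO_XFAIL, demo_boost_priorities() tags DEMO_BMBTD with DEMO_BOOST_REPAIR, and demo_dependencies_ok() refuses DEMO_DIR until the DEMO_BMBTD state is cleared. That mirrors the ordering the commit messages describe: fix the data fork mapping first, then the directory.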