From patchwork Fri May 26 00:50:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13255901
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5ED9DC77B7A
	for ; Fri, 26 May 2023 00:50:05 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229567AbjEZAuE (ORCPT ); Thu, 25 May 2023 20:50:04 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48478 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S235394AbjEZAuD (ORCPT );
	Thu, 25 May 2023 20:50:03 -0400
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 42469199
	for ; Thu, 25 May 2023 17:50:02 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dfw.source.kernel.org (Postfix) with ESMTPS id D3AD8615B4
	for ; Fri, 26 May 2023 00:50:01 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3E252C433D2;
	Fri, 26 May 2023 00:50:01 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1685062201;
	bh=qbhcu6uaHXs7rNX+rkew4Ot0Q/QepaN+xA986eTUEZQ=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=CZ0I8fosZixFicAhl/+rdNiFoBIh22LC5UmbU5l+VLNzxLvHt4B8Gvv2geRiUVO/S
	 dPpFwaABgBAQ4XipI/JyUtSUGOXHdyU0ua31Z5JzTl/SbX3sAguge53oiqZFwCIGhS
	 86Zk+cs2L5KjGAg0iw/6d6FmMhNNcnSZS0yOxEWZ7935Iq3qnnwjQfPUUC8gZ9QMrJ
	 l6Bf0QjW6jP915PRdBnGcn6sGdDQOq9yqApkFeQJjzwA5xM9udhHCDaXqrXNdKjD+l
	 Gfd1ca2bBQX1tR+zrB95DusKaFDoOm0B6AaLVKGN4w6cJnYfAuEsbb7cEtXF2s6T4u
	 MovoMkKOVnbFw==
Date: Thu, 25 May 2023 17:50:00 -0700
Subject: [PATCH 1/2] xfs: always rescan allegedly healthy per-ag metadata
 after repair
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <168506057238.3730021.5815780398185212485.stgit@frogsfrogsfrogs>
In-Reply-To: <168506057223.3730021.15237048674614006148.stgit@frogsfrogsfrogs>
References: <168506057223.3730021.15237048674614006148.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

After an online repair function runs for a per-AG metadata structure,
sc->sick_mask is supposed to reflect the per-AG metadata that the repair
function fixed. Our next move is to re-check the metadata to assess the
completeness of our repair, so we don't want the rebuilt structure to be
excluded from the rescan just because the health system previously
logged a problem with the data structure.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/scrub/health.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
index d2b2a1cb6533..5e2b09ed6e29 100644
--- a/fs/xfs/scrub/health.c
+++ b/fs/xfs/scrub/health.c
@@ -226,6 +226,16 @@ xchk_ag_btree_healthy_enough(
 		return true;
 	}
 
+	/*
+	 * If we just repaired some AG metadata, sc->sick_mask will reflect all
+	 * the per-AG metadata types that were repaired. Exclude these from
+	 * the filesystem health query because we have not yet updated the
+	 * health status and we want everything to be scanned.
+	 */
+	if ((sc->flags & XREP_ALREADY_FIXED) &&
+	    type_to_health_flag[sc->sm->sm_type].group == XHG_AG)
+		mask &= ~sc->sick_mask;
+
 	if (xfs_ag_has_sickness(pag, mask)) {
 		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XFAIL;
 		return false;

From patchwork Fri May 26 00:50:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13255902
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0985EC77B7E
	for ; Fri, 26 May 2023 00:50:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S233106AbjEZAuT (ORCPT ); Thu, 25 May 2023 20:50:19 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48502 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232645AbjEZAuS (ORCPT );
	Thu, 25 May 2023 20:50:18 -0400
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D401C12E
	for ; Thu, 25 May 2023 17:50:17 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dfw.source.kernel.org (Postfix) with ESMTPS id 704E6619B3
	for ; Fri, 26 May 2023 00:50:17 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id D5D6DC433D2;
	Fri, 26 May 2023 00:50:16 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1685062216;
	bh=5jXpceOI7p8QU+X8F7d8i7Cu3PaTZvwnrbG4PbbYF/Q=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=umuaPAG3XdWM1UzN2vBbi46+x0UfAZO8434o9aEm2VvFXa/cWZIFaEtg9kzzierYR
	 ghhRVMq5bxJATXctT+PCQUxrKpDk+e+OyXUXB2ou7svcJoemDLtgwA+z1HcgrFRHDz
	 TU44zQSLuLJUWjtuQyNgWqyY9pTWz6ZTyR+/PERiI2CaXTsTMCh8ht84vKTDw2p9RB
	 oSoCx4QPbLM+Rqn6zQ4DYv38B8ySoJ+orZ4wWuXmMaXjBbdm6POr9tUkNqOHiaYapx
	 ScK3Xv0j6t6ii5aoN8GlIrOYFVpLw42o46QWw90EWF9rBVSJYU230j8l+l6ibpHOtg
	 NruNAfQOYu9Lw==
Date: Thu, 25 May 2023 17:50:16 -0700
Subject: [PATCH 2/2] xfs: allow the user to cancel repairs before we start
 writing
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <168506057253.3730021.16355634028400743188.stgit@frogsfrogsfrogs>
In-Reply-To: <168506057223.3730021.15237048674614006148.stgit@frogsfrogsfrogs>
References: <168506057223.3730021.15237048674614006148.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

All online repair functions have the same structure: walk filesystem
metadata structures gathering enough data to rebuild the structure,
stage a new copy, and then commit the new copy.

The gathering steps do not write anything to disk, so they are peppered
with xchk_should_terminate calls to avoid softlockup warnings and to
provide an opportunity to abort the repair (by killing xfs_scrub).
However, it's not clear in the code base when is the last chance to
abort cleanly without having to undo a bunch of structure.

Therefore, add one more call to xchk_should_terminate (along with a
comment) providing the sysadmin with the ability to abort before it's
too late and to make it clear in the source code when it's no longer
convenient or safe to abort a repair. As there are only four repair
functions right now, this patch exists more to establish a precedent for
subsequent additions than to deliver practical functionality.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/scrub/agheader_repair.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 7874ae8149ca..d54edd0d8538 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -50,6 +50,10 @@ xrep_superblock(
 	if (error)
 		return error;
 
+	/* Last chance to abort before we start committing fixes. */
+	if (xchk_should_terminate(sc, &error))
+		return error;
+
 	/* Copy AG 0's superblock to this one. */
 	xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
@@ -425,6 +429,10 @@ xrep_agf(
 	if (error)
 		return error;
 
+	/* Last chance to abort before we start committing fixes. */
+	if (xchk_should_terminate(sc, &error))
+		return error;
+
 	/* Start rewriting the header and implant the btrees we found. */
 	xrep_agf_init_header(sc, agf_bp, &old_agf);
 	xrep_agf_set_roots(sc, agf, fab);
@@ -749,6 +757,10 @@ xrep_agfl(
 	if (error)
 		goto err;
 
+	/* Last chance to abort before we start committing fixes. */
+	if (xchk_should_terminate(sc, &error))
+		goto err;
+
 	/*
 	 * Update AGF and AGFL. We reset the global free block counter when
 	 * we adjust the AGF flcount (which can fail) so avoid updating any
@@ -996,6 +1008,10 @@ xrep_agi(
 	if (error)
 		return error;
 
+	/* Last chance to abort before we start committing fixes. */
+	if (xchk_should_terminate(sc, &error))
+		return error;
+
 	/* Start rewriting the header and implant the btrees we found. */
 	xrep_agi_init_header(sc, agi_bp, &old_agi);
 	xrep_agi_set_roots(sc, agi, fab);
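
For readers outside the kernel tree, the gather/stage/commit shape described in patch 2/2 can be modelled in userspace roughly as follows. This is only an illustrative sketch, not kernel code: struct repair_ctx, should_terminate() and repair() are hypothetical names, the kill_requested field stands in for a pending fatal signal, and returning -EINTR is an assumption about how the real xchk_should_terminate() reports an abort.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the repair context; not struct xfs_scrub. */
struct repair_ctx {
	bool	kill_requested;	/* models a pending fatal signal */
	int	staged;		/* rebuilt copy, not yet written anywhere */
	int	committed;	/* models the live on-disk structure */
};

/*
 * Hypothetical analogue of xchk_should_terminate(): returns true and sets
 * *error if the caller should stop. The -EINTR value is an assumption.
 */
static bool should_terminate(const struct repair_ctx *ctx, int *error)
{
	if (ctx->kill_requested) {
		if (*error == 0)
			*error = -EINTR;
		return true;
	}
	return false;
}

/* Hypothetical repair function following the gather/stage/commit shape. */
static int repair(struct repair_ctx *ctx, int rebuilt_value)
{
	int error = 0;

	/* Gather and stage: nothing live is modified, so abandoning is safe. */
	ctx->staged = rebuilt_value;

	/* Last chance to abort before we start committing fixes. */
	if (should_terminate(ctx, &error))
		return error;

	/* Commit: past this point, bailing out would require undo work. */
	ctx->committed = ctx->staged;
	return 0;
}
```

In this model, a context with kill_requested set leaves committed untouched and gets the error back, while an uninterrupted call overwrites committed with the staged value. The check sits immediately before the first write, which is the placement the patch establishes for each of the four repair functions.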