From patchwork Fri Dec 30 22:11:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 255B0C4332F for ; Fri, 30 Dec 2022 22:45:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235580AbiL3WpM (ORCPT ); Fri, 30 Dec 2022 17:45:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235648AbiL3WpJ (ORCPT ); Fri, 30 Dec 2022 17:45:09 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4366912D26 for ; Fri, 30 Dec 2022 14:45:08 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id C4CFC61C0D for ; Fri, 30 Dec 2022 22:45:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 30889C433D2; Fri, 30 Dec 2022 22:45:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672440307; bh=NbHBu/hkzzHRD/koSRZdMFxgLjq+6obDsKD16dB+UwA=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=IV6oqT4NCd0uJZxFJ6TSr1iE+ccdOHbT4qk3viEpif8ffmMRp7bpFwm45hZT1lZzS d3FxEq3Lk1UjWHgpzQ+IyA7YFDNc6u9RcZ8eOlPcfGHxreNqq/2hHknkQYMdnsSRHW 0QmwdUPyLZon0RpvORxkt5uY/rzd0uPpL1C0KDJvsWNpAHdhvgmfxBe1v8gIU4sVeT eHmALM/FaGsl/OFqntTDE9TOrp+zLk62hNoQf31Q28R2zAW6P8Ucb8eXUwqZaCVWCB u6KwfiQOkuLM8Ufyf9Wi+efDHsE0M/eOQ2CfIplEvQpw+vWGIOXlkUOANZNmK73ckx FsSMHz+lQ7XSA== Subject: [PATCH 1/4] xfs: manage inode DONTCACHE status at irele time From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:11:35 -0800 Message-ID: <167243829569.684831.11929810961803889833.stgit@magnolia> In-Reply-To: <167243829551.684831.7487988225134202107.stgit@magnolia> References: <167243829551.684831.7487988225134202107.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Right now, there are statements scattered all over the online fsck codebase about how we can't use XFS_IGET_DONTCACHE because of concerns about scrub's unusual practice of releasing inodes with transactions held. However, iget is the wrong place to handle this -- the DONTCACHE state doesn't matter at all until we try to *release* the inode, and here we get things wrong in multiple ways: First, if we /do/ have a transaction, we must NOT drop the inode, because the inode could have dirty pages, dropping the inode will trigger writeback, and writeback can trigger a nested transaction. Second, if the inode already had an active reference and the DONTCACHE flag set, the icache hit when scrub grabs another ref will not clear DONTCACHE. This is sort of by design, since DONTCACHE is now used to initiate cache drops so that sysadmins can change a file's access mode between pagecache and DAX. Third, if we do actually have the last active reference to the inode, we can set DONTCACHE to avoid polluting the cache. This is the /one/ case where we actually want that flag. Create an xchk_irele helper to encode all that logic and switch the online fsck code to use it. Since this now means that nearly all scrubbers use the same xfs_iget flags, we can wrap them too. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/common.c | 52 +++++++++++++++++++++++++++++++++++++++++++++---- fs/xfs/scrub/common.h | 3 +++ fs/xfs/scrub/dir.c | 2 +- fs/xfs/scrub/parent.c | 9 ++++---- fs/xfs/scrub/scrub.c | 2 +- 5 files changed, 57 insertions(+), 11 deletions(-) diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index b21d675dd158..28c43d9f1c56 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -710,6 +710,16 @@ xchk_checkpoint_log( return 0; } +/* Verify that an inode is allocated ondisk, then return its cached inode. */ +int +xchk_iget( + struct xfs_scrub *sc, + xfs_ino_t inum, + struct xfs_inode **ipp) +{ + return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp); +} + /* * Given an inode and the scrub control structure, grab either the * inode referenced in the control structure or the inode passed in. @@ -734,8 +744,7 @@ xchk_get_inode( /* Look up the inode, see if the generation number matches. */ if (xfs_internal_inum(mp, sc->sm->sm_ino)) return -ENOENT; - error = xfs_iget(mp, NULL, sc->sm->sm_ino, - XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip); + error = xchk_iget(sc, sc->sm->sm_ino, &ip); switch (error) { case -ENOENT: /* Inode doesn't exist, just bail out. */ @@ -757,7 +766,7 @@ xchk_get_inode( * that it no longer exists. */ error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap, - XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE); + XFS_IGET_UNTRUSTED); if (error) return -ENOENT; error = -EFSCORRUPTED; @@ -770,7 +779,7 @@ xchk_get_inode( return error; } if (VFS_I(ip)->i_generation != sc->sm->sm_gen) { - xfs_irele(ip); + xchk_irele(sc, ip); return -ENOENT; } @@ -778,6 +787,41 @@ xchk_get_inode( return 0; } +/* Release an inode, possibly dropping it in the process. */ +void +xchk_irele( + struct xfs_scrub *sc, + struct xfs_inode *ip) +{ + if (current->journal_info != NULL) { + ASSERT(current->journal_info == sc->tp); + + /* + * If we are in a transaction, we /cannot/ drop the inode + * ourselves, because the VFS will trigger writeback, which + * can require a transaction. Clear DONTCACHE to force the + * inode to the LRU, where someone else can take care of + * dropping it. + * + * Note that when we grabbed our reference to the inode, it + * could have had an active ref and DONTCACHE set if a sysadmin + * is trying to coerce a change in file access mode. icache + * hits do not clear DONTCACHE, so we must do it here. + */ + spin_lock(&VFS_I(ip)->i_lock); + VFS_I(ip)->i_state &= ~I_DONTCACHE; + spin_unlock(&VFS_I(ip)->i_lock); + } else if (atomic_read(&VFS_I(ip)->i_count) == 1) { + /* + * If this is the last reference to the inode and the caller + * permits it, set DONTCACHE to avoid thrashing. + */ + d_mark_dontcache(VFS_I(ip)); + } + + xfs_irele(ip); +} + /* Set us up to scrub a file's contents. */ int xchk_setup_inode_contents( diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 0efe6b947d88..7472c41d9cfe 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -137,6 +137,9 @@ int xchk_get_inode(struct xfs_scrub *sc); int xchk_setup_inode_contents(struct xfs_scrub *sc, unsigned int resblks); void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp); +int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp); +void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip); + /* * Don't bother cross-referencing if we already found corruption or cross * referencing discrepancies. diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c index d1b0f23c2c59..677b21c3c865 100644 --- a/fs/xfs/scrub/dir.c +++ b/fs/xfs/scrub/dir.c @@ -86,7 +86,7 @@ xchk_dir_check_ftype( xfs_mode_to_ftype(VFS_I(ip)->i_mode)); if (ino_dtype != dtype) xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset); - xfs_irele(ip); + xchk_irele(sdc->sc, ip); out: return error; } diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c index d8dff3fd8053..2696bb49324a 100644 --- a/fs/xfs/scrub/parent.c +++ b/fs/xfs/scrub/parent.c @@ -131,7 +131,6 @@ xchk_parent_validate( xfs_ino_t dnum, bool *try_again) { - struct xfs_mount *mp = sc->mp; struct xfs_inode *dp = NULL; xfs_nlink_t expected_nlink; xfs_nlink_t nlink; @@ -168,7 +167,7 @@ xchk_parent_validate( * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a * cross referencing error. Any other error is an operational error. */ - error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp); + error = xchk_iget(sc, dnum, &dp); if (error == -EINVAL || error == -ENOENT) { error = -EFSCORRUPTED; xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error); @@ -236,11 +235,11 @@ xchk_parent_validate( /* Drat, parent changed. Try again! */ if (dnum != dp->i_ino) { - xfs_irele(dp); + xchk_irele(sc, dp); *try_again = true; return 0; } - xfs_irele(dp); + xchk_irele(sc, dp); /* * '..' didn't change, so check that there was only one entry @@ -253,7 +252,7 @@ xchk_parent_validate( out_unlock: xfs_iunlock(dp, XFS_IOLOCK_SHARED); out_rele: - xfs_irele(dp); + xchk_irele(sc, dp); out: return error; } diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 7a3557a69fe0..bc9638c7a379 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -181,7 +181,7 @@ xchk_teardown( xfs_iunlock(sc->ip, sc->ilock_flags); if (sc->ip != ip_in && !xfs_internal_inum(sc->mp, sc->ip->i_ino)) - xfs_irele(sc->ip); + xchk_irele(sc, sc->ip); sc->ip = NULL; } if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) From patchwork Fri Dec 30 22:11:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084664 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 686DBC4332F for ; Fri, 30 Dec 2022 22:45:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235651AbiL3Wp1 (ORCPT ); Fri, 30 Dec 2022 17:45:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56184 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235648AbiL3Wp0 (ORCPT ); Fri, 30 Dec 2022 17:45:26 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A300E13CE7 for ; Fri, 30 Dec 2022 14:45:25 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 217EEB81D95 for ; Fri, 30 Dec 2022 22:45:24 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B0009C433EF; Fri, 30 Dec 2022 22:45:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672440322; bh=JIahKHH/l85s1b1gz16xSPI2G6MObjwEZ1+HUZP/OlU=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=MC02PLqxa0rbCAgrv3kmHpj7qj53j0OEkI+QuuO5ogTZp+Ff+/Jg35/ObArV9ij9p QFHN3Iz9Tu0mbyGk24q3W6AJWetAfJ3g1ViynmkLV5aHT1QShIlf5kR6vWeoz8GLxQ 5HbfoAN90KOk/0JJVPgovSKtSCGNfMPlgBMuFpKopjkkTbqOARNAt0UAryCS6uyA92 S9oIOaknDZMJxJw5MIhn1yGwTb3CgRby7y5tkQuxRcGC72E9oJXiFsGzcOTr0wgYSF bBKZIjG38cyt+i9aaVRldkAhxEmSJgq4fIzI2EjkYBP0zzDFr8vMKCPdVg8+hj3Jes EEIU+ObZgX6wQ== Subject: [PATCH 2/4] xfs: fix an inode lookup race in xchk_get_inode From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:11:35 -0800 Message-ID: <167243829583.684831.13639022660758037510.stgit@magnolia> In-Reply-To: <167243829551.684831.7487988225134202107.stgit@magnolia> References: <167243829551.684831.7487988225134202107.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong In commit d658e, we tried to improve the robustnes of xchk_get_inode in the face of EINVAL returns from iget by calling xfs_imap to see if the inobt itself thinks that the inode is allocated. Unfortunately, that commit didn't consider the possibility that the inode gets allocated after iget but before imap. In this case, the imap call will succeed, but we turn that into a corruption error and tell userspace the inode is corrupt. Avoid this false corruption report by grabbing the AGI header and retrying the iget before calling imap. If the iget succeeds, we can proceed with the usual scrub-by-handle code. Fix all the incorrect comments too, since unreadable/corrupt inodes no longer result in EINVAL returns. Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode") Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/common.c | 207 ++++++++++++++++++++++++++++++++++++++++--------- fs/xfs/scrub/common.h | 4 + fs/xfs/xfs_icache.c | 3 - fs/xfs/xfs_icache.h | 11 ++- 4 files changed, 182 insertions(+), 43 deletions(-) diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 28c43d9f1c56..70ee293bc58f 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -635,6 +635,14 @@ xchk_ag_init( /* Per-scrubber setup functions */ +void +xchk_trans_cancel( + struct xfs_scrub *sc) +{ + xfs_trans_cancel(sc->tp); + sc->tp = NULL; +} + /* * Grab an empty transaction so that we can re-grab locked buffers if * one of our btrees turns out to be cyclic. @@ -720,6 +728,84 @@ xchk_iget( return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp); } +/* + * Try to grab an inode in a manner that avoids races with physical inode + * allocation. If we can't, return the locked AGI buffer so that the caller + * can single-step the loading process to see where things went wrong. + * + * If the iget succeeds, return 0, a NULL AGI, and the inode. + * + * If the iget fails, return the error, the locked AGI, and a NULL inode. This + * can include -EINVAL and -ENOENT for invalid inode numbers or inodes that are + * no longer allocated; or any other corruption or runtime error. + * + * If the AGI read fails, return the error, a NULL AGI, and NULL inode. + * + * If a fatal signal is pending, return -EINTR, a NULL AGI, and a NULL inode. + */ +int +xchk_iget_agi( + struct xfs_scrub *sc, + xfs_ino_t inum, + struct xfs_buf **agi_bpp, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_trans *tp = sc->tp; + struct xfs_perag *pag; + int error; + +again: + *agi_bpp = NULL; + *ipp = NULL; + error = 0; + + if (xchk_should_terminate(sc, &error)) + return error; + + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum)); + error = xfs_ialloc_read_agi(pag, tp, agi_bpp); + xfs_perag_put(pag); + if (error) + return error; + + error = xfs_iget(mp, tp, inum, + XFS_IGET_NORETRY | XFS_IGET_UNTRUSTED, 0, ipp); + if (error == -EAGAIN) { + /* + * The inode may be in core but temporarily unavailable and may + * require the AGI buffer before it can be returned. Drop the + * AGI buffer and retry the lookup. + */ + xfs_trans_brelse(tp, *agi_bpp); + delay(1); + goto again; + } + if (error) + return error; + + /* We got the inode, so we can release the AGI. */ + ASSERT(*ipp != NULL); + xfs_trans_brelse(tp, *agi_bpp); + *agi_bpp = NULL; + return 0; +} + +/* Install an inode that we opened by handle for scrubbing. */ +static int +xchk_install_handle_inode( + struct xfs_scrub *sc, + struct xfs_inode *ip) +{ + if (VFS_I(ip)->i_generation != sc->sm->sm_gen) { + xchk_irele(sc, ip); + return -ENOENT; + } + + sc->ip = ip; + return 0; +} + /* * Given an inode and the scrub control structure, grab either the * inode referenced in the control structure or the inode passed in. @@ -731,60 +817,105 @@ xchk_get_inode( { struct xfs_imap imap; struct xfs_mount *mp = sc->mp; + struct xfs_buf *agi_bp; struct xfs_inode *ip_in = XFS_I(file_inode(sc->file)); struct xfs_inode *ip = NULL; + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, sc->sm->sm_ino); int error; + ASSERT(sc->tp == NULL); + /* We want to scan the inode we already had opened. */ if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) { sc->ip = ip_in; return 0; } - /* Look up the inode, see if the generation number matches. */ + /* Reject internal metadata files and obviously bad inode numbers. */ if (xfs_internal_inum(mp, sc->sm->sm_ino)) return -ENOENT; + if (!xfs_verify_ino(sc->mp, sc->sm->sm_ino)) + return -ENOENT; + + /* Try a regular untrusted iget. */ error = xchk_iget(sc, sc->sm->sm_ino, &ip); - switch (error) { - case -ENOENT: - /* Inode doesn't exist, just bail out. */ + if (!error) + return xchk_install_handle_inode(sc, ip); + if (error == -ENOENT) return error; - case 0: - /* Got an inode, continue. */ - break; - case -EINVAL: - /* - * -EINVAL with IGET_UNTRUSTED could mean one of several - * things: userspace gave us an inode number that doesn't - * correspond to fs space, or doesn't have an inobt entry; - * or it could simply mean that the inode buffer failed the - * read verifiers. - * - * Try just the inode mapping lookup -- if it succeeds, then - * the inode buffer verifier failed and something needs fixing. - * Otherwise, we really couldn't find it so tell userspace - * that it no longer exists. - */ - error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap, - XFS_IGET_UNTRUSTED); - if (error) - return -ENOENT; + if (error != -EINVAL) + goto out_error; + + /* + * EINVAL with IGET_UNTRUSTED probably means one of several things: + * userspace gave us an inode number that doesn't correspond to fs + * space; the inode btree lacks a record for this inode; or there is a + * record, and it says this inode is free. + * + * We want to look up this inode in the inobt to distinguish two + * scenarios: (1) the inobt says the inode is free, in which case + * there's nothing to do; and (2) the inobt says the inode is + * allocated, but loading it failed due to corruption. + * + * Allocate a transaction and grab the AGI to prevent inobt activity + * in this AG. Retry the iget in case someone allocated a new inode + * after the first iget failed. + */ + error = xchk_trans_alloc(sc, 0); + if (error) + goto out_error; + + error = xchk_iget_agi(sc, sc->sm->sm_ino, &agi_bp, &ip); + if (error == 0) { + /* Actually got the inode, so install it. */ + xchk_trans_cancel(sc); + return xchk_install_handle_inode(sc, ip); + } + if (error == -ENOENT) + goto out_gone; + if (error != -EINVAL) + goto out_cancel; + + /* Ensure that we have protected against inode allocation/freeing. */ + if (agi_bp == NULL) { + ASSERT(agi_bp != NULL); + error = -ECANCELED; + goto out_cancel; + } + + /* + * Untrusted iget failed a second time. Let's try an inobt lookup. + * If the inobt thinks this the inode neither can exist inside the + * filesystem nor is allocated, return ENOENT to signal that the check + * can be skipped. + * + * If the lookup returns corruption, we'll mark this inode corrupt and + * exit to userspace. There's little chance of fixing anything until + * the inobt is straightened out, but there's nothing we can do here. + * + * If the lookup encounters any other error, exit to userspace. + * + * If the lookup succeeds, something else must be very wrong in the fs + * such that setting up the incore inode failed in some strange way. + * Treat those as corruptions. + */ + error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap, + XFS_IGET_UNTRUSTED); + if (error == -EINVAL || error == -ENOENT) + goto out_gone; + if (!error) error = -EFSCORRUPTED; - fallthrough; - default: - trace_xchk_op_error(sc, - XFS_INO_TO_AGNO(mp, sc->sm->sm_ino), - XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino), - error, __return_address); - return error; - } - if (VFS_I(ip)->i_generation != sc->sm->sm_gen) { - xchk_irele(sc, ip); - return -ENOENT; - } - sc->ip = ip; - return 0; +out_cancel: + xchk_trans_cancel(sc); +out_error: + trace_xchk_op_error(sc, agno, XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino), + error, __return_address); + return error; +out_gone: + /* The file is gone, so there's nothing to check. */ + xchk_trans_cancel(sc); + return -ENOENT; } /* Release an inode, possibly dropping it in the process. */ diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 7472c41d9cfe..6a7fe2596841 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -32,6 +32,8 @@ xchk_should_terminate( } int xchk_trans_alloc(struct xfs_scrub *sc, uint resblks); +void xchk_trans_cancel(struct xfs_scrub *sc); + bool xchk_process_error(struct xfs_scrub *sc, xfs_agnumber_t agno, xfs_agblock_t bno, int *error); bool xchk_fblock_process_error(struct xfs_scrub *sc, int whichfork, @@ -138,6 +140,8 @@ int xchk_setup_inode_contents(struct xfs_scrub *sc, unsigned int resblks); void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp); int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp); +int xchk_iget_agi(struct xfs_scrub *sc, xfs_ino_t inum, + struct xfs_buf **agi_bpp, struct xfs_inode **ipp); void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip); /* diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index ddeaccc04aec..0d58d7b0d8ac 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -767,7 +767,8 @@ xfs_iget( return 0; out_error_or_again: - if (!(flags & XFS_IGET_INCORE) && error == -EAGAIN) { + if (!(flags & (XFS_IGET_INCORE | XFS_IGET_NORETRY)) && + error == -EAGAIN) { delay(1); goto again; } diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index 6cd180721659..87910191a9dd 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -34,10 +34,13 @@ struct xfs_icwalk { /* * Flags for xfs_iget() */ -#define XFS_IGET_CREATE 0x1 -#define XFS_IGET_UNTRUSTED 0x2 -#define XFS_IGET_DONTCACHE 0x4 -#define XFS_IGET_INCORE 0x8 /* don't read from disk or reinit */ +#define XFS_IGET_CREATE (1U << 0) +#define XFS_IGET_UNTRUSTED (1U << 1) +#define XFS_IGET_DONTCACHE (1U << 2) +/* don't read from disk or reinit */ +#define XFS_IGET_INCORE (1U << 3) +/* Return -EAGAIN immediately if the inode is unavailable. */ +#define XFS_IGET_NORETRY (1U << 4) int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino, uint flags, uint lock_flags, xfs_inode_t **ipp); From patchwork Fri Dec 30 22:11:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 546DEC4332F for ; Fri, 30 Dec 2022 22:45:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235648AbiL3Wpm (ORCPT ); Fri, 30 Dec 2022 17:45:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235652AbiL3Wpl (ORCPT ); Fri, 30 Dec 2022 17:45:41 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CEF6B13CE7 for ; Fri, 30 Dec 2022 14:45:40 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 8609FB81C22 for ; Fri, 30 Dec 2022 22:45:39 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3B5B6C433EF; Fri, 30 Dec 2022 22:45:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672440338; bh=x9z6zo/EXsYlr+IeQvB+VgiHTZHQHfVYShkX2j6qVeY=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=rkGu6lygb2mhQ7NpVxra4yJo7N6t0KHKLDuG+1JhoPYk9alXGmNmPBpgyU1uNARi/ nnbxphTPDsMIZYBOkwbAb2NDwvIsgFqvUmA1RzvaX6yPNVMmo1FVota8BbYK5PANaA 0RGTaahkIUVuTcePhru3Nu/iPHHnd8VdusTWdIqDsv9S90nUWMpYDtLxS3GlXIl3YZ 9twu8VZCRYC+djLd6CF9i2zpHfc5j5kn7DffPZ5YaMxv3P5tf4OnjML0xVfG+BAm/i mqgSEHg8A43rPEP6BI8P/IsXDT7hLLU/UTSkD3MIcBgYtPxHYXKq3Om4sx3X56QFgU /pEgzyx50kOLA== Subject: [PATCH 3/4] xfs: rename xchk_get_inode -> xchk_iget_for_scrubbing From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:11:36 -0800 Message-ID: <167243829597.684831.7183936100069813265.stgit@magnolia> In-Reply-To: <167243829551.684831.7487988225134202107.stgit@magnolia> References: <167243829551.684831.7487988225134202107.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Dave Chinner suggested renaming this function to make more obvious what it does. The function returns an incore inode to callers that want to scrub a metadata structure that hangs off an inode. If the iget fails with EINVAL, it will single-step the loading process to distinguish between actually free inodes or impossible inumbers (ENOENT); discrepancies between the inobt freemask and the free status in the inode record (EFSCORRUPTED). Any other negative errno is returned unchanged. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/bmap.c | 2 +- fs/xfs/scrub/common.c | 12 +++++++----- fs/xfs/scrub/common.h | 2 +- fs/xfs/scrub/inode.c | 2 +- 4 files changed, 10 insertions(+), 8 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index b195bc0e09a4..fe13da54e133 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -34,7 +34,7 @@ xchk_setup_inode_bmap( if (xchk_need_fshook_drain(sc)) xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN); - error = xchk_get_inode(sc); + error = xchk_iget_for_scrubbing(sc); if (error) goto out; diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 70ee293bc58f..90f53f415d99 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -807,12 +807,14 @@ xchk_install_handle_inode( } /* - * Given an inode and the scrub control structure, grab either the - * inode referenced in the control structure or the inode passed in. - * The inode is not locked. + * In preparation to scrub metadata structures that hang off of an inode, + * grab either the inode referenced in the scrub control structure or the + * inode passed in. If the inumber does not reference an allocated inode + * record, the function returns ENOENT to end the scrub early. The inode + * is not locked. */ int -xchk_get_inode( +xchk_iget_for_scrubbing( struct xfs_scrub *sc) { struct xfs_imap imap; @@ -961,7 +963,7 @@ xchk_setup_inode_contents( { int error; - error = xchk_get_inode(sc); + error = xchk_iget_for_scrubbing(sc); if (error) return error; diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 6a7fe2596841..5ef27e6bdac6 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -135,7 +135,7 @@ int xchk_count_rmap_ownedby_ag(struct xfs_scrub *sc, struct xfs_btree_cur *cur, const struct xfs_owner_info *oinfo, xfs_filblks_t *blocks); int xchk_setup_ag_btree(struct xfs_scrub *sc, bool force_log); -int xchk_get_inode(struct xfs_scrub *sc); +int xchk_iget_for_scrubbing(struct xfs_scrub *sc); int xchk_setup_inode_contents(struct xfs_scrub *sc, unsigned int resblks); void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp); diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 3b272c86d0ad..39ac7cc09fbd 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -39,7 +39,7 @@ xchk_setup_inode( * Try to get the inode. If the verifiers fail, we try again * in raw mode. */ - error = xchk_get_inode(sc); + error = xchk_iget_for_scrubbing(sc); switch (error) { case 0: break; From patchwork Fri Dec 30 22:11:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13084666 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD74CC4332F for ; Fri, 30 Dec 2022 22:45:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235655AbiL3Wp6 (ORCPT ); Fri, 30 Dec 2022 17:45:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235652AbiL3Wp5 (ORCPT ); Fri, 30 Dec 2022 17:45:57 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6749832F for ; Fri, 30 Dec 2022 14:45:56 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 14950B81C22 for ; Fri, 30 Dec 2022 22:45:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C982BC433D2; Fri, 30 Dec 2022 22:45:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672440353; bh=ss4HSUGiNE347TM7oxjMvA8FZDCGCRazOdKSvQITjaw=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=N4lp9w7RIY7TbPYYhbUP6S8n/1GSRWSwWAltW5VwaLgR4NmMv0bz2uZOD/r6zhuWY CIWqyFJ04dsLzGDbx0RaCvUF7Km1aGZAD+bGLyvzjY+Zy3GA8eOgiJakXi+rj2A6IQ qR7sy4ovC+J7hXj3rVKuPcIDQHWPjhqW57H4+sMcYXf21+3Nveyv9+fQmMAOh5+MOA UQPer7j14GA2cfrMVEGcJgIIZ5HLvdZgNlYLNVpVI5pr5FNwwZ8rowxyqt/Hm3SNUh e+MdAcXGypgPyRk0PnUS3kJC26e2gh388vB3ft41W11b2fuK/0a7OZIqTskSpMmj1r ILBtR2yBXbLEg== Subject: [PATCH 4/4] xfs: retain the AGI when we can't iget an inode to scrub the core From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:11:36 -0800 Message-ID: <167243829611.684831.4165413752588458060.stgit@magnolia> In-Reply-To: <167243829551.684831.7487988225134202107.stgit@magnolia> References: <167243829551.684831.7487988225134202107.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong xchk_get_inode is not quite the right function to be calling from the inode scrubber setup function. The common get_inode function either gets an inode and installs it in the scrub context, or it returns an error code explaining what happened. This is acceptable for most file scrubbers because it is not in their scope to fix corruptions in the inode core and fork areas that cause iget to fail. Dealing with these problems is within the scope of the inode scrubber, however. If iget fails with EFSCORRUPTED, we need to xchk_inode to flag that as corruption. Since we can't get our hands on an incore inode, we need to hold the AGI to prevent inode allocation activity so that nothing changes in the inode metadata. Looking ahead to the inode core repair patches, we will also need to hold the AGI buffer into xrep_inode so that we can make modifications to the xfs_dinode structure without any other thread swooping in to allocate or free the inode. Adapt the xchk_get_inode into xchk_setup_inode since this is a one-off use case where the error codes we check for are a little different, and the return state is much different from the common function. xchk_setup_inode prepares to check or repair an inode record, so it must continue the scrub operation even if the inode/inobt verifiers cause xfs_iget to return EFSCORRUPTED. This is done by attaching the locked AGI buffer to the scrub transaction and returning 0 to move on to the actual scrub. (Later, the online inode repair code will also want the xfs_imap structure so that it can reset the ondisk xfs_dinode structure.) xchk_get_inode retrieves an inode on behalf of a scrubber that operates on an incore inode -- data/attr/cow forks, directories, xattrs, symlinks, parent pointers, etc. If the inode/inobt verifiers fail and xfs_iget returns EFSCORRUPTED, we want to exit to userspace (because the caller should be fix the inode first) and drop everything we acquired along the way. A behavior common to both functions is that it's possible that xfs_scrub asked for a scrub-by-handle concurrent with the inode being freed or the passed-in inumber is invalid. In this case, we call xfs_imap to see if the inobt index thinks the inode is allocated, and return ENOENT ("nothing to check here") to userspace if this is not the case. The imap lookup is why both functions call xchk_iget_agi. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/common.c | 2 - fs/xfs/scrub/common.h | 1 fs/xfs/scrub/inode.c | 180 +++++++++++++++++++++++++++++++++++++++++-------- 3 files changed, 153 insertions(+), 30 deletions(-) diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 90f53f415d99..e0c1be0161f3 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -792,7 +792,7 @@ xchk_iget_agi( } /* Install an inode that we opened by handle for scrubbing. */ -static int +int xchk_install_handle_inode( struct xfs_scrub *sc, struct xfs_inode *ip) diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 5ef27e6bdac6..07daea2c7ab4 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -143,6 +143,7 @@ int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp); int xchk_iget_agi(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_buf **agi_bpp, struct xfs_inode **ipp); void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip); +int xchk_install_handle_inode(struct xfs_scrub *sc, struct xfs_inode *ip); /* * Don't bother cross-referencing if we already found corruption or cross diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 39ac7cc09fbd..51b8ba7037f3 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -11,8 +11,10 @@ #include "xfs_mount.h" #include "xfs_btree.h" #include "xfs_log_format.h" +#include "xfs_trans.h" #include "xfs_inode.h" #include "xfs_ialloc.h" +#include "xfs_icache.h" #include "xfs_da_format.h" #include "xfs_reflink.h" #include "xfs_rmap.h" @@ -20,48 +22,168 @@ #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" +#include "scrub/trace.h" -/* - * Grab total control of the inode metadata. It doesn't matter here if - * the file data is still changing; exclusive access to the metadata is - * the goal. - */ -int -xchk_setup_inode( +/* Prepare the attached inode for scrubbing. */ +static inline int +xchk_prepare_iscrub( struct xfs_scrub *sc) { int error; - if (xchk_need_fshook_drain(sc)) - xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN); - - /* - * Try to get the inode. If the verifiers fail, we try again - * in raw mode. - */ - error = xchk_iget_for_scrubbing(sc); - switch (error) { - case 0: - break; - case -EFSCORRUPTED: - case -EFSBADCRC: - return xchk_trans_alloc(sc, 0); - default: - return error; - } - - /* Got the inode, lock it and we're ready to go. */ sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; xfs_ilock(sc->ip, sc->ilock_flags); + error = xchk_trans_alloc(sc, 0); if (error) - goto out; + return error; + sc->ilock_flags |= XFS_ILOCK_EXCL; xfs_ilock(sc->ip, XFS_ILOCK_EXCL); + return 0; +} -out: - /* scrub teardown will unlock and release the inode for us */ +/* Install this scrub-by-handle inode and prepare it for scrubbing. */ +static inline int +xchk_install_handle_iscrub( + struct xfs_scrub *sc, + struct xfs_inode *ip) +{ + int error; + + error = xchk_install_handle_inode(sc, ip); + if (error) + return error; + + return xchk_prepare_iscrub(sc); +} + +/* + * Grab total control of the inode metadata. In the best case, we grab the + * incore inode and take all locks on it. If the incore inode cannot be + * constructed due to corruption problems, lock the AGI so that we can single + * step the loading process to fix everything that can go wrong. + */ +int +xchk_setup_inode( + struct xfs_scrub *sc) +{ + struct xfs_imap imap; + struct xfs_inode *ip; + struct xfs_mount *mp = sc->mp; + struct xfs_inode *ip_in = XFS_I(file_inode(sc->file)); + struct xfs_buf *agi_bp; + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, sc->sm->sm_ino); + int error; + + if (xchk_need_fshook_drain(sc)) + xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN); + + /* We want to scan the opened inode, so lock it and exit. */ + if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) { + sc->ip = ip_in; + return xchk_prepare_iscrub(sc); + } + + /* Reject internal metadata files and obviously bad inode numbers. */ + if (xfs_internal_inum(mp, sc->sm->sm_ino)) + return -ENOENT; + if (!xfs_verify_ino(sc->mp, sc->sm->sm_ino)) + return -ENOENT; + + /* Try a regular untrusted iget. */ + error = xchk_iget(sc, sc->sm->sm_ino, &ip); + if (!error) + return xchk_install_handle_iscrub(sc, ip); + if (error == -ENOENT) + return error; + if (error != -EFSCORRUPTED && error != -EFSBADCRC && error != -EINVAL) + goto out_error; + + /* + * EINVAL with IGET_UNTRUSTED probably means one of several things: + * userspace gave us an inode number that doesn't correspond to fs + * space; the inode btree lacks a record for this inode; or there is + * a record, and it says this inode is free. + * + * EFSCORRUPTED/EFSBADCRC could mean that the inode was mappable, but + * some other metadata corruption (e.g. inode forks) prevented + * instantiation of the incore inode. Or it could mean the inobt is + * corrupt. + * + * We want to look up this inode in the inobt directly to distinguish + * three different scenarios: (1) the inobt says the inode is free, + * in which case there's nothing to do; (2) the inobt is corrupt so we + * should flag the corruption and exit to userspace to let it fix the + * inobt; and (3) the inobt says the inode is allocated, but loading it + * failed due to corruption. + * + * Allocate a transaction and grab the AGI to prevent inobt activity in + * this AG. Retry the iget in case someone allocated a new inode after + * the first iget failed. + */ + error = xchk_trans_alloc(sc, 0); + if (error) + goto out_error; + + error = xchk_iget_agi(sc, sc->sm->sm_ino, &agi_bp, &ip); + if (error == 0) { + /* Actually got the incore inode, so install it and proceed. */ + xchk_trans_cancel(sc); + return xchk_install_handle_iscrub(sc, ip); + } + if (error == -ENOENT) + goto out_gone; + if (error != -EFSCORRUPTED && error != -EFSBADCRC && error != -EINVAL) + goto out_cancel; + + /* Ensure that we have protected against inode allocation/freeing. */ + if (agi_bp == NULL) { + ASSERT(agi_bp != NULL); + error = -ECANCELED; + goto out_cancel; + } + + /* + * Untrusted iget failed a second time. Let's try an inobt lookup. + * If the inobt doesn't think this is an allocated inode then we'll + * return ENOENT to signal that the check can be skipped. + * + * If the lookup signals corruption, we'll mark this inode corrupt and + * exit to userspace. There's little chance of fixing anything until + * the inobt is straightened out, but there's nothing we can do here. + * + * If the lookup encounters a runtime error, exit to userspace. + */ + error = xfs_imap(mp, sc->tp, sc->sm->sm_ino, &imap, + XFS_IGET_UNTRUSTED); + if (error == -EINVAL || error == -ENOENT) + goto out_gone; + if (error) + goto out_cancel; + + /* + * The lookup succeeded. Chances are the ondisk inode is corrupt and + * preventing iget from reading it. Retain the scrub transaction and + * the AGI buffer to prevent anyone from allocating or freeing inodes. + * This ensures that we preserve the inconsistency between the inobt + * saying the inode is allocated and the icache being unable to load + * the inode until we can flag the corruption in xchk_inode. The + * scrub function has to note the corruption, since we're not really + * supposed to do that from the setup function. + */ + return 0; + +out_cancel: + xchk_trans_cancel(sc); +out_error: + trace_xchk_op_error(sc, agno, XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino), + error, __return_address); return error; +out_gone: + /* The file is gone, so there's nothing to check. */ + xchk_trans_cancel(sc); + return -ENOENT; } /* Inode core */