From patchwork Fri Dec 30 22:14:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085008 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C279C4332F for ; Fri, 30 Dec 2022 23:55:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235683AbiL3Xzz (ORCPT ); Fri, 30 Dec 2022 18:55:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229733AbiL3Xzy (ORCPT ); Fri, 30 Dec 2022 18:55:54 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C6511740A for ; Fri, 30 Dec 2022 15:55:52 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 28DFC61C9A for ; Fri, 30 Dec 2022 23:55:52 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8CDDAC433EF; Fri, 30 Dec 2022 23:55:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444551; bh=Tw1OOxAo7nJQMaKlmNsNQ19tYLpxPxp7XKcecp+T0Gg=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=My2fC5QP3wGIs08NO1TW8AhOTw3VUZJkB0piN1gX8AwCyPcp0pnK1vg1s1ZxoMO72 sIBrr6TpW4TQ/o7Ym/yxQOaQ9BnHTsaKYyccPHGLOXFAxvkmqpQXPthBsugic7Xu2K uAgtsWVfhn+BdoBc8inBzyrN7i5MpSiqm7XsR9l1S9HW2i89Fm0t/iNrzZ3e+ceVqu D+688hhh6i6vhZWSkzNb/feE6HTUsx+0zpkuuKyHtVJFADF4UtjPS1IlZYRves3hEQ uTBEThMwBIGC71fmwSJ0AZk0MN7SSWysxh0bfTwt7kkJ53HU2wLliW/Lm7G/r+RCPy abTpLlw/QyGIQ== Subject: [PATCH 1/4] xfs: hide private inodes from bulkstat and handle functions From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:14:01 -0800 Message-ID: <167243844133.699982.515386507193418795.stgit@magnolia> In-Reply-To: <167243844115.699982.6114366012860370017.stgit@magnolia> References: <167243844115.699982.6114366012860370017.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong We're about to start adding functionality that uses internal inodes that are private to XFS. What this means is that userspace should never be able to access any information about these files, and should not be able to open these files by handle. Callers are not allowed to link these files into the directory tree, which should suffice to make these private inodes actually private. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_export.c | 2 +- fs/xfs/xfs_itable.c | 8 ++++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c index 1064c2342876..b6ba96e0dd75 100644 --- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -146,7 +146,7 @@ xfs_nfs_get_inode( return ERR_PTR(error); } - if (VFS_I(ip)->i_generation != generation) { + if (VFS_I(ip)->i_generation != generation || IS_PRIVATE(VFS_I(ip))) { xfs_irele(ip); return ERR_PTR(-ESTALE); } diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index a1c2bcf65d37..7a967cc78010 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -86,6 +86,14 @@ xfs_bulkstat_one_int( vfsuid = i_uid_into_vfsuid(mnt_userns, inode); vfsgid = i_gid_into_vfsgid(mnt_userns, inode); + /* If this is a private inode, don't leak its details to userspace. */ + if (IS_PRIVATE(inode)) { + xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_irele(ip); + error = -EINVAL; + goto out_advance; + } + /* xfs_iget returns the following without needing * further change. */ From patchwork Fri Dec 30 22:14:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085009 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED0D4C4332F for ; Fri, 30 Dec 2022 23:56:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235727AbiL3X4m (ORCPT ); Fri, 30 Dec 2022 18:56:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51930 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229733AbiL3X4L (ORCPT ); Fri, 30 Dec 2022 18:56:11 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D2ADE15701 for ; Fri, 30 Dec 2022 15:56:09 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 5CED4B81DD2 for ; Fri, 30 Dec 2022 23:56:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 21E6BC433EF; Fri, 30 Dec 2022 23:56:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444567; bh=jk2UFEqO3ShK0K15cb/w69SMbh9LngEb6mKU2T3JfYM=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=thpdBzTtmKJB70eCDRyyXbPsB/ldcA9ZoycrkYoR9wUBr9gbOxflCb7IHnDPuqeyH 6rKah9bckbnIurcD/XqFJbj0OhQUDcoLK2C27WJmbSqSHPaM0Rt1DVyccC7gbBeBTf FQMwwTurEfb+sDAfcZARJETHhs9Skfma5rNy1vHT59l91UiPcNODNMUAWsUg2pjuOG v/EiT3mD8A7XhXr0FBkcf4+tlre7cjXdADsik+/bUesa0hAv0spMWCEAvYhtijhSeH OrV4LHtWN8VD0p92WLu7xwqvyWtPOKVfMC2s98y9zIeR47CMQHT98hnf4MAYB9mM3+ T5cFbzChJ5IJg== Subject: [PATCH 2/4] xfs: create temporary files and directories for online repair From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:14:01 -0800 Message-ID: <167243844148.699982.16543964563257605304.stgit@magnolia> In-Reply-To: <167243844115.699982.6114366012860370017.stgit@magnolia> References: <167243844115.699982.6114366012860370017.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Teach the online repair code how to create temporary files or directories. These temporary files can be used to stage reconstructed information until we're ready to perform an atomic extent swap to commit the new metadata. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/common.c | 6 + fs/xfs/scrub/parent.c | 2 fs/xfs/scrub/scrub.c | 3 + fs/xfs/scrub/scrub.h | 4 + fs/xfs/scrub/tempfile.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/tempfile.h | 27 +++++ fs/xfs/scrub/trace.h | 33 +++++++ fs/xfs/xfs_inode.c | 3 - fs/xfs/xfs_inode.h | 2 10 files changed, 308 insertions(+), 4 deletions(-) create mode 100644 fs/xfs/scrub/tempfile.c create mode 100644 fs/xfs/scrub/tempfile.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 23b0c40620cf..6df1cd3b46ca 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -201,6 +201,7 @@ xfs-y += $(addprefix scrub/, \ refcount_repair.o \ repair.o \ rmap_repair.o \ + tempfile.o \ xfbtree.o \ ) diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index bde9159dca4a..7eade2567af6 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -37,6 +37,7 @@ #include "scrub/trace.h" #include "scrub/repair.h" #include "scrub/health.h" +#include "scrub/tempfile.h" /* Common code for the metadata scrubbers. */ @@ -1047,7 +1048,10 @@ xchk_setup_inode_contents( if (error) return error; - /* Got the inode, lock it and we're ready to go. */ + /* + * Prepare to scrub the file contents by locking out IO and page + * faults. + */ xchk_ilock(sc, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL); error = xchk_trans_alloc(sc, resblks); if (error) diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c index 84740ffee0d2..58d012252015 100644 --- a/fs/xfs/scrub/parent.c +++ b/fs/xfs/scrub/parent.c @@ -181,7 +181,7 @@ xchk_parent_validate( } if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error)) return error; - if (dp == sc->ip || !S_ISDIR(VFS_I(dp)->i_mode)) { + if (dp == sc->ip || dp == sc->tempip || !S_ISDIR(VFS_I(dp)->i_mode)) { xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); goto out_rele; } diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index c6eb692a0822..7bab30c2766e 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -18,12 +18,14 @@ #include "xfs_btree.h" #include "xfs_btree_staging.h" #include "xfs_rmap.h" +#include "xfs_xchgrange.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" #include "scrub/repair.h" #include "scrub/health.h" #include "scrub/xfile.h" +#include "scrub/tempfile.h" /* * Online Scrub and Repair @@ -211,6 +213,7 @@ xchk_teardown( sc->buf = NULL; } + xrep_tempfile_rele(sc); xchk_fshooks_disable(sc); return error; } diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index cf18bb4e8b35..7892901ad70b 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -105,6 +105,10 @@ struct xfs_scrub { /* Lock flags for @ip. */ uint ilock_flags; + /* A temporary file on this filesystem, for staging new metadata. */ + struct xfs_inode *tempip; + uint temp_ilock_flags; + /* See the XCHK/XREP state flags below. */ unsigned int flags; diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c new file mode 100644 index 000000000000..797088deb7cb --- /dev/null +++ b/fs/xfs/scrub/tempfile.c @@ -0,0 +1,231 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_inode.h" +#include "xfs_ialloc.h" +#include "xfs_quota.h" +#include "xfs_bmap_btree.h" +#include "xfs_trans_space.h" +#include "xfs_dir2.h" +#include "xfs_xchgrange.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/trace.h" +#include "scrub/tempfile.h" + +/* + * Create a temporary file for reconstructing metadata, with the intention of + * atomically swapping the temporary file's contents with the file that's + * being repaired. + */ +int +xrep_tempfile_create( + struct xfs_scrub *sc, + uint16_t mode) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_trans *tp = NULL; + struct xfs_dquot *udqp = NULL; + struct xfs_dquot *gdqp = NULL; + struct xfs_dquot *pdqp = NULL; + struct xfs_trans_res *tres; + struct xfs_inode *dp = mp->m_rootip; + xfs_ino_t ino; + unsigned int resblks; + bool is_dir = S_ISDIR(mode); + int error; + + if (xfs_is_shutdown(mp)) + return -EIO; + if (xfs_is_readonly(mp)) + return -EROFS; + + ASSERT(sc->tp == NULL); + ASSERT(sc->tempip == NULL); + + /* + * Make sure that we have allocated dquot(s) on disk. The temporary + * inode should be completely root owned so that we don't fail due to + * quota limits. + */ + error = xfs_qm_vop_dqalloc(dp, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0, + XFS_QMOPT_QUOTALL, &udqp, &gdqp, &pdqp); + if (error) + return error; + + if (is_dir) { + resblks = XFS_MKDIR_SPACE_RES(mp, 0); + tres = &M_RES(mp)->tr_mkdir; + } else { + resblks = XFS_IALLOC_SPACE_RES(mp); + tres = &M_RES(mp)->tr_create_tmpfile; + } + + error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks, + &tp); + if (error) + goto out_release_dquots; + + /* Allocate inode, set up directory. */ + error = xfs_dialloc(&tp, dp->i_ino, mode, &ino); + if (error) + goto out_trans_cancel; + error = xfs_init_new_inode(&init_user_ns, tp, dp, ino, mode, 0, 0, + 0, false, &sc->tempip); + if (error) + goto out_trans_cancel; + + /* Change the ownership of the inode to root. */ + VFS_I(sc->tempip)->i_uid = GLOBAL_ROOT_UID; + VFS_I(sc->tempip)->i_gid = GLOBAL_ROOT_GID; + sc->tempip->i_diflags &= ~(XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT); + xfs_trans_log_inode(tp, sc->tempip, XFS_ILOG_CORE); + + /* + * Mark our temporary file as private so that LSMs and the ACL code + * don't try to add their own metadata or reason about these files. + * The file should never be exposed to userspace. + */ + VFS_I(sc->tempip)->i_flags |= S_PRIVATE; + VFS_I(sc->tempip)->i_opflags &= ~IOP_XATTR; + + if (is_dir) { + error = xfs_dir_init(tp, sc->tempip, dp); + if (error) + goto out_trans_cancel; + } + + /* + * Attach the dquot(s) to the inodes and modify them incore. + * These ids of the inode couldn't have changed since the new + * inode has been locked ever since it was created. + */ + xfs_qm_vop_create_dqattach(tp, sc->tempip, udqp, gdqp, pdqp); + + /* + * Put our temp file on the unlinked list so it's purged automatically. + * Anything being reconstructed using this file must be atomically + * swapped with the original file because the contents here will be + * purged when the inode is dropped or log recovery cleans out the + * unlinked list. + */ + error = xfs_iunlink(tp, sc->tempip); + if (error) + goto out_trans_cancel; + + error = xfs_trans_commit(tp); + if (error) + goto out_release_inode; + + trace_xrep_tempfile_create(sc); + + xfs_qm_dqrele(udqp); + xfs_qm_dqrele(gdqp); + xfs_qm_dqrele(pdqp); + + /* Finish setting up the incore / vfs context. */ + xfs_setup_iops(sc->tempip); + xfs_finish_inode_setup(sc->tempip); + + sc->temp_ilock_flags = 0; + return error; + +out_trans_cancel: + xfs_trans_cancel(tp); +out_release_inode: + /* + * Wait until after the current transaction is aborted to finish the + * setup of the inode and release the inode. This prevents recursive + * transactions and deadlocks from xfs_inactive. + */ + if (sc->tempip) { + xfs_finish_inode_setup(sc->tempip); + xchk_irele(sc, sc->tempip); + } +out_release_dquots: + xfs_qm_dqrele(udqp); + xfs_qm_dqrele(gdqp); + xfs_qm_dqrele(pdqp); + + return error; +} + +/* Take IOLOCK_EXCL on the temporary file, maybe. */ +bool +xrep_tempfile_iolock_nowait( + struct xfs_scrub *sc) +{ + if (xfs_ilock_nowait(sc->tempip, XFS_IOLOCK_EXCL)) { + sc->temp_ilock_flags |= XFS_IOLOCK_EXCL; + return true; + } + + return false; +} + +/* Release IOLOCK_EXCL on the temporary file. */ +void +xrep_tempfile_iounlock( + struct xfs_scrub *sc) +{ + xfs_iunlock(sc->tempip, XFS_IOLOCK_EXCL); + sc->temp_ilock_flags &= ~XFS_IOLOCK_EXCL; +} + +/* Prepare the temporary file for metadata updates by grabbing ILOCK_EXCL. */ +void +xrep_tempfile_ilock( + struct xfs_scrub *sc) +{ + sc->temp_ilock_flags |= XFS_ILOCK_EXCL; + xfs_ilock(sc->tempip, XFS_ILOCK_EXCL); +} + +/* Try to grab ILOCK_EXCL on the temporary file. */ +bool +xrep_tempfile_ilock_nowait( + struct xfs_scrub *sc) +{ + if (xfs_ilock_nowait(sc->tempip, XFS_ILOCK_EXCL)) { + sc->temp_ilock_flags |= XFS_ILOCK_EXCL; + return true; + } + + return false; +} + +/* Unlock ILOCK_EXCL on the temporary file after an update. */ +void +xrep_tempfile_iunlock( + struct xfs_scrub *sc) +{ + xfs_iunlock(sc->tempip, XFS_ILOCK_EXCL); + sc->temp_ilock_flags &= ~XFS_ILOCK_EXCL; +} + +/* Release the temporary file. */ +void +xrep_tempfile_rele( + struct xfs_scrub *sc) +{ + if (!sc->tempip) + return; + + if (sc->temp_ilock_flags) { + xfs_iunlock(sc->tempip, sc->temp_ilock_flags); + sc->temp_ilock_flags = 0; + } + + xchk_irele(sc, sc->tempip); + sc->tempip = NULL; +} diff --git a/fs/xfs/scrub/tempfile.h b/fs/xfs/scrub/tempfile.h new file mode 100644 index 000000000000..f00a9ce43a32 --- /dev/null +++ b/fs/xfs/scrub/tempfile.h @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_SCRUB_TEMPFILE_H__ +#define __XFS_SCRUB_TEMPFILE_H__ + +#ifdef CONFIG_XFS_ONLINE_REPAIR +int xrep_tempfile_create(struct xfs_scrub *sc, uint16_t mode); +void xrep_tempfile_rele(struct xfs_scrub *sc); + +bool xrep_tempfile_iolock_nowait(struct xfs_scrub *sc); +void xrep_tempfile_iounlock(struct xfs_scrub *sc); + +void xrep_tempfile_ilock(struct xfs_scrub *sc); +bool xrep_tempfile_ilock_nowait(struct xfs_scrub *sc); +void xrep_tempfile_iunlock(struct xfs_scrub *sc); +#else +static inline void xrep_tempfile_iolock_both(struct xfs_scrub *sc) +{ + xchk_ilock(sc, XFS_IOLOCK_EXCL); +} +# define xrep_tempfile_rele(sc) +#endif /* CONFIG_XFS_ONLINE_REPAIR */ + +#endif /* __XFS_SCRUB_TEMPFILE_H__ */ diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 558bef72b569..b1a39a206730 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -2157,6 +2157,39 @@ TRACE_EVENT(xrep_rmap_live_update, __entry->flags) ); +TRACE_EVENT(xrep_tempfile_create, + TP_PROTO(struct xfs_scrub *sc), + TP_ARGS(sc), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino) + __field(unsigned int, type) + __field(xfs_agnumber_t, agno) + __field(xfs_ino_t, inum) + __field(unsigned int, gen) + __field(unsigned int, flags) + __field(xfs_ino_t, temp_inum) + ), + TP_fast_assign( + __entry->dev = sc->mp->m_super->s_dev; + __entry->ino = sc->file ? XFS_I(file_inode(sc->file))->i_ino : 0; + __entry->type = sc->sm->sm_type; + __entry->agno = sc->sm->sm_agno; + __entry->inum = sc->sm->sm_ino; + __entry->gen = sc->sm->sm_gen; + __entry->flags = sc->sm->sm_flags; + __entry->temp_inum = sc->tempip->i_ino; + ), + TP_printk("dev %d:%d ino 0x%llx type %s inum 0x%llx gen 0x%x flags 0x%x temp_inum 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, + __print_symbolic(__entry->type, XFS_SCRUB_TYPE_STRINGS), + __entry->inum, + __entry->gen, + __entry->flags, + __entry->temp_inum) +); + #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 04ceafb936bc..d81c864207bb 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -41,7 +41,6 @@ struct kmem_cache *xfs_inode_cache; -STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *); STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag, struct xfs_inode *); @@ -2103,7 +2102,7 @@ xfs_iunlink_insert_inode( * We place the on-disk inode on a list in the AGI. It will be pulled from this * list when the inode is freed. */ -STATIC int +int xfs_iunlink( struct xfs_trans *tp, struct xfs_inode *ip) diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 444c43571e31..3f6c63304ca3 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -575,6 +575,8 @@ extern struct kmem_cache *xfs_inode_cache; bool xfs_inode_needs_inactive(struct xfs_inode *ip); +int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip); + void xfs_end_io(struct work_struct *work); int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); From patchwork Fri Dec 30 22:14:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085010 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAD6FC3DA7C for ; Fri, 30 Dec 2022 23:56:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229733AbiL3X4n (ORCPT ); Fri, 30 Dec 2022 18:56:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235840AbiL3X4Y (ORCPT ); Fri, 30 Dec 2022 18:56:24 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B7A4815701 for ; Fri, 30 Dec 2022 15:56:23 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 55E7161C9A for ; Fri, 30 Dec 2022 23:56:23 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B113DC433EF; Fri, 30 Dec 2022 23:56:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444582; bh=F/LTdvlAVaR/4XCwZi1HyYdUVioX73bemd1u/8HGS+Q=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=BeHPJpb/DqJCRgsjfPZLOIdeWhWxySIaq85Up3C1qPPLevvJdLH176SOTiwlKQeGg d5V3bDQvSyTWJWd+3n47Tp0Zx9drMwtHNic7pedypCmO++m6HST3pFCy+m4C8Xl0Dj fM0JXPgAjsjN2/BWvldR1PpTsfZCbL/P0ghPpFANZRSwAAKkrJ7zAyTCJfokqVzYSW GCqBWSClRx0I5LDuAzYIXKCTQYxr0/l9dBYrPGezYVoJekxL1eXapJzJVC30pYkv5K NDDL4kKbQ4PZVfMTQDy1aOGA2bQQQqTDG1PaZNAeO7OLIzGjYMn4qe0A/UdJsXyP7F 7SbIiZ2MzTtMQ== Subject: [PATCH 3/4] xfs: refactor stale buffer scanning for repairs From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:14:01 -0800 Message-ID: <167243844163.699982.15212214018142194596.stgit@magnolia> In-Reply-To: <167243844115.699982.6114366012860370017.stgit@magnolia> References: <167243844115.699982.6114366012860370017.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong In an upcoming patch, we will need to be able to look for xfs_buf objects caching file-based metadata blocks without needing to walk the (possibly corrupt) structures to find all the buffers. Repair already has most of the code needed to scan the buffer cache, so hoist these utility functions. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/reap.c | 73 ++++++++++++++++++++++++++++++++++++--------------- fs/xfs/scrub/reap.h | 20 ++++++++++++++ 2 files changed, 71 insertions(+), 22 deletions(-) diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index ea7a274aa778..b6f89762b00c 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -209,6 +209,48 @@ static inline void xreap_defer_finish_reset(struct xreap_state *rs) rs->force_roll = false; } +/* + * Compute the maximum length of a buffer cache scan (in units of sectors), + * given a quantity of fs blocks. + */ +xfs_daddr_t +xrep_bufscan_max_sectors( + struct xfs_mount *mp, + xfs_extlen_t fsblocks) +{ + int max_fsbs; + + /* Remote xattr values are the largest buffers that we support. */ + max_fsbs = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX); + + return XFS_FSB_TO_BB(mp, min_t(xfs_extlen_t, fsblocks, max_fsbs)); +} + +/* + * Return an incore buffer from a sector scan, or NULL if there are no buffers + * left to return. + */ +struct xfs_buf * +xrep_bufscan_advance( + struct xfs_mount *mp, + struct xrep_bufscan *scan) +{ + scan->__sector_count += scan->daddr_step; + while (scan->__sector_count <= scan->max_sectors) { + struct xfs_buf *bp = NULL; + int error; + + error = xfs_buf_incore(mp->m_ddev_targp, scan->daddr, + scan->__sector_count, XBF_BCACHE_SCAN, &bp); + if (!error) + return bp; + + scan->__sector_count += scan->daddr_step; + } + + return NULL; +} + /* Try to invalidate the incore buffers for an extent that we're freeing. */ STATIC void xreap_agextent_binval( @@ -239,28 +281,15 @@ xreap_agextent_binval( * of any plausible size. */ while (bno < agbno_next) { - xfs_agblock_t fsbcount; - xfs_agblock_t max_fsbs; - - /* - * Max buffer size is the max remote xattr buffer size, which - * is one fs block larger than 64k. - */ - max_fsbs = min_t(xfs_agblock_t, agbno_next - bno, - xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX)); - - for (fsbcount = 1; fsbcount < max_fsbs; fsbcount++) { - struct xfs_buf *bp = NULL; - xfs_daddr_t daddr; - int error; - - daddr = XFS_AGB_TO_DADDR(mp, agno, bno); - error = xfs_buf_incore(mp->m_ddev_targp, daddr, - XFS_FSB_TO_BB(mp, fsbcount), - XBF_BCACHE_SCAN, &bp); - if (error) - continue; - + struct xrep_bufscan scan = { + .daddr = XFS_AGB_TO_DADDR(mp, agno, bno), + .max_sectors = xrep_bufscan_max_sectors(mp, + agbno_next - bno), + .daddr_step = XFS_FSB_TO_BB(mp, 1), + }; + struct xfs_buf *bp; + + while ((bp = xrep_bufscan_advance(mp, &scan)) != NULL) { xfs_trans_bjoin(sc->tp, bp); xfs_trans_binval(sc->tp, bp); rs->invalidated++; diff --git a/fs/xfs/scrub/reap.h b/fs/xfs/scrub/reap.h index 29a8b52a1a54..3c31c795fd1a 100644 --- a/fs/xfs/scrub/reap.h +++ b/fs/xfs/scrub/reap.h @@ -11,4 +11,24 @@ int xrep_reap_agblocks(struct xfs_scrub *sc, struct xagb_bitmap *bitmap, int xrep_reap_fsblocks(struct xfs_scrub *sc, struct xfsb_bitmap *bitmap, const struct xfs_owner_info *oinfo); +/* Buffer cache scan context. */ +struct xrep_bufscan { + /* Disk address for the buffers we want to scan. */ + xfs_daddr_t daddr; + + /* Maximum number of sectors to scan. */ + xfs_daddr_t max_sectors; + + /* Each round, increment the search length by this number of sectors. */ + xfs_daddr_t daddr_step; + + /* Internal scan state; initialize to zero. */ + xfs_daddr_t __sector_count; +}; + +xfs_daddr_t xrep_bufscan_max_sectors(struct xfs_mount *mp, + xfs_extlen_t fsblocks); +struct xfs_buf *xrep_bufscan_advance(struct xfs_mount *mp, + struct xrep_bufscan *scan); + #endif /* __XFS_SCRUB_REAP_H__ */ From patchwork Fri Dec 30 22:14:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085011 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15D0AC4708E for ; Fri, 30 Dec 2022 23:56:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235623AbiL3X4n (ORCPT ); Fri, 30 Dec 2022 18:56:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51986 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235843AbiL3X4k (ORCPT ); Fri, 30 Dec 2022 18:56:40 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54C791740A for ; Fri, 30 Dec 2022 15:56:39 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D85DF61C44 for ; Fri, 30 Dec 2022 23:56:38 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3ED7AC433D2; Fri, 30 Dec 2022 23:56:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672444598; bh=+vBdp18liCp5zxBRp8oHhGaFKcws7USBMaYbF8Gypd0=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=Uqc3g/1DUzN5sYIL7Y0vt0nqlnbdu1wOy/hkWd6PQ/kD9i2UtS9GvDANCJDgx+sNr dYiNgywMbKeUgN/v4U4ACmoN9bTkwKsMDq50d4mHSRB4gcxtUQUA3A+yJGX9UWlMyz c6+9e3YSqD5NdCfbuC0gqnMFFy6z5PAXH2qbNF2XfZq8QAnlKCweZbQcQlPjMduUEk 720SWNpeR8gOFrlfKKGKECju9AV7GHV/XKVoPN/5Hfmow9VjNYgQFhKac/3bBIXZoW EwHIW1yORruecTP8woN/DOfwCxXJbKVv9mu06+AffCAY8x17DAche2GhWNkUob3QOC DrzaMPQuIuMgg== Subject: [PATCH 4/4] xfs: add the ability to reap entire inode forks From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:14:01 -0800 Message-ID: <167243844177.699982.15913262492735757661.stgit@magnolia> In-Reply-To: <167243844115.699982.6114366012860370017.stgit@magnolia> References: <167243844115.699982.6114366012860370017.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong In preparation for supporting repair of indexed file-based metadata (such as realtime bitmaps, directories, and extended attribute data), add a function to reap the old blocks after a metadata repair finishes. IOWs, this is an elaborate bunmapi call that deals with crosslinked blocks by unmapping them without freeing them, and also scans for incore buffers to invalidate. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/reap.c | 354 ++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/reap.h | 1 fs/xfs/scrub/trace.h | 63 +++++++++ 3 files changed, 418 insertions(+) diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index b6f89762b00c..f43ad4dfc6f7 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -32,6 +32,7 @@ #include "xfs_da_btree.h" #include "xfs_attr.h" #include "xfs_attr_remote.h" +#include "xfs_defer.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -670,3 +671,356 @@ xrep_reap_fsblocks( return 0; } + +/* + * Metadata files are not supposed to share blocks with anything else. + * If blocks are shared, we remove the reverse mapping (thus reducing the + * crosslink factor); if blocks are not shared, we also need to free them. + * + * This first step determines the longest subset of the passed-in imap + * (starting at its beginning) that is either crosslinked or not crosslinked. + * The blockcount will be adjust down as needed. + */ +STATIC int +xreap_bmapi_select( + struct xfs_scrub *sc, + struct xfs_inode *ip, + int whichfork, + struct xfs_bmbt_irec *imap, + bool *crosslinked) +{ + struct xfs_owner_info oinfo; + struct xfs_btree_cur *cur; + xfs_filblks_t len = 1; + xfs_agblock_t bno; + xfs_agblock_t agbno; + xfs_agblock_t agbno_next; + int error; + + agbno = XFS_FSB_TO_AGBNO(sc->mp, imap->br_startblock); + agbno_next = agbno + imap->br_blockcount; + + cur = xfs_rmapbt_init_cursor(sc->mp, sc->tp, sc->sa.agf_bp, + sc->sa.pag); + + xfs_rmap_ino_owner(&oinfo, ip->i_ino, whichfork, imap->br_startoff); + error = xfs_rmap_has_other_keys(cur, agbno, 1, &oinfo, crosslinked); + if (error) + goto out_cur; + + bno = agbno + 1; + while (bno < agbno_next) { + bool also_crosslinked; + + oinfo.oi_offset++; + error = xfs_rmap_has_other_keys(cur, bno, 1, &oinfo, + &also_crosslinked); + if (error) + goto out_cur; + + if (also_crosslinked != *crosslinked) + break; + + len++; + bno++; + } + + imap->br_blockcount = len; + trace_xreap_bmapi_select(sc->sa.pag, agbno, len, *crosslinked); +out_cur: + xfs_btree_del_cursor(cur, error); + return error; +} + +/* + * Decide if this buffer can be joined to a transaction. This is true for most + * buffers, but there are two cases that we want to catch: large remote xattr + * value buffers are not logged and can overflow the buffer log item dirty + * bitmap size; and oversized cached buffers if things have really gone + * haywire. + */ +static inline bool +xreap_buf_loggable( + const struct xfs_buf *bp) +{ + int i; + + for (i = 0; i < bp->b_map_count; i++) { + int chunks; + int map_size; + + chunks = DIV_ROUND_UP(BBTOB(bp->b_maps[i].bm_len), + XFS_BLF_CHUNK); + map_size = DIV_ROUND_UP(chunks, NBWORD); + if (map_size > XFS_BLF_DATAMAP_SIZE) + return false; + } + + return true; +} + +/* + * Invalidate any buffers for this file mapping. The @imap blockcount may be + * adjusted downward if we need to roll the transaction. + */ +STATIC int +xreap_bmapi_binval( + struct xfs_scrub *sc, + struct xfs_inode *ip, + int whichfork, + struct xfs_bmbt_irec *imap) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_perag *pag = sc->sa.pag; + int bmap_flags = xfs_bmapi_aflag(whichfork); + xfs_fileoff_t off; + xfs_fileoff_t max_off; + xfs_extlen_t scan_blocks; + xfs_agnumber_t agno = sc->sa.pag->pag_agno; + xfs_agblock_t bno; + xfs_agblock_t agbno; + xfs_agblock_t agbno_next; + unsigned int invalidated = 0; + int error; + + /* + * Avoid invalidating AG headers and post-EOFS blocks because we never + * own those. + */ + agbno = bno = XFS_FSB_TO_AGBNO(sc->mp, imap->br_startblock); + agbno_next = agbno + imap->br_blockcount; + if (!xfs_verify_agbno(pag, agbno) || + !xfs_verify_agbno(pag, agbno_next - 1)) + return 0; + + /* + * Buffers for file blocks can span multiple contiguous mappings. This + * means that for each block in the mapping, there could exist an + * xfs_buf indexed by that block with any length up to the maximum + * buffer size (remote xattr values) or to the next hole in the fork. + * To set up our binval scan, first we need to figure out the location + * of the next hole. + */ + off = imap->br_startoff + imap->br_blockcount; + max_off = off + xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX); + while (off < max_off) { + struct xfs_bmbt_irec hmap; + int nhmaps = 1; + + error = xfs_bmapi_read(ip, off, max_off - off, &hmap, + &nhmaps, bmap_flags); + if (error) + return error; + if (nhmaps != 1 || hmap.br_startblock == DELAYSTARTBLOCK) { + ASSERT(0); + return -EFSCORRUPTED; + } + + if (!xfs_bmap_is_real_extent(&hmap)) + break; + + off = hmap.br_startoff + hmap.br_blockcount; + } + scan_blocks = off - imap->br_startoff; + + trace_xreap_bmapi_binval_scan(sc, imap, scan_blocks); + + /* + * If there are incore buffers for these blocks, invalidate them. If + * we can't (try)lock the buffer we assume it's owned by someone else + * and leave it alone. The buffer cache cannot detect aliasing, so + * employ nested loops to detect incore buffers of any plausible size. + */ + while (bno < agbno_next) { + struct xrep_bufscan scan = { + .daddr = XFS_AGB_TO_DADDR(mp, agno, bno), + .max_sectors = xrep_bufscan_max_sectors(mp, + scan_blocks), + .daddr_step = XFS_FSB_TO_BB(mp, 1), + }; + struct xfs_buf *bp; + + while ((bp = xrep_bufscan_advance(mp, &scan)) != NULL) { + if (xreap_buf_loggable(bp)) { + xfs_trans_bjoin(sc->tp, bp); + xfs_trans_binval(sc->tp, bp); + } else { + xfs_buf_stale(bp); + xfs_buf_relse(bp); + } + invalidated++; + + /* + * Stop invalidating if we've hit the limit; we should + * still have enough reservation left to free however + * much of the mapping we've seen so far. + */ + if (invalidated > XREAP_MAX_BINVAL) { + imap->br_blockcount = agbno_next - bno; + goto out; + } + } + + bno++; + scan_blocks--; + } + +out: + trace_xreap_bmapi_binval(sc->sa.pag, agbno, imap->br_blockcount); + return 0; +} + +/* + * Dispose of as much of this file extent as we can. Upon successful return, + * the imap will reflect the mapping that was removed from the fork. + */ +STATIC int +xreap_ifork_extent( + struct xfs_scrub *sc, + struct xfs_inode *ip, + int whichfork, + struct xfs_bmbt_irec *imap) +{ + xfs_agnumber_t agno; + bool crosslinked; + int error; + + ASSERT(sc->sa.pag == NULL); + + trace_xreap_ifork_extent(sc, ip, whichfork, imap); + + agno = XFS_FSB_TO_AGNO(sc->mp, imap->br_startblock); + sc->sa.pag = xfs_perag_get(sc->mp, agno); + if (!sc->sa.pag) + return -EFSCORRUPTED; + + error = xfs_alloc_read_agf(sc->sa.pag, sc->tp, 0, &sc->sa.agf_bp); + if (error) + goto out_pag; + + /* + * Decide the fate of the blocks at the beginning of the mapping, then + * update the mapping to use it with the unmap calls. + */ + error = xreap_bmapi_select(sc, ip, whichfork, imap, &crosslinked); + if (error) + goto out_agf; + + if (crosslinked) { + /* + * If there are other rmappings, this block is cross linked and + * must not be freed. Remove the reverse mapping, leave the + * buffer cache in its possibly confused state, and move on. + * We don't want to risk discarding valid data buffers from + * anybody else who thinks they own the block, even though that + * runs the risk of stale buffer warnings in the future. + */ + trace_xreap_dispose_unmap_extent(sc->sa.pag, + XFS_FSB_TO_AGBNO(sc->mp, imap->br_startblock), + imap->br_blockcount); + + /* + * Schedule removal of the mapping from the fork. We use + * deferred log intents in this function to control the exact + * sequence of metadata updates. + */ + xfs_bmap_unmap_extent(sc->tp, ip, whichfork, imap); + xfs_trans_mod_dquot_byino(sc->tp, ip, XFS_TRANS_DQ_BCOUNT, + -(int64_t)imap->br_blockcount); + xfs_rmap_unmap_extent(sc->tp, ip, whichfork, imap); + } else { + /* + * If the block is not crosslinked, we can invalidate all the + * incore buffers for the extent, and then free the extent. + * This is a bit of a mess since we don't detect discontiguous + * buffers that are indexed by a block starting before the + * first block of the extent but overlap anyway. + */ + trace_xreap_dispose_free_extent(sc->sa.pag, + XFS_FSB_TO_AGBNO(sc->mp, imap->br_startblock), + imap->br_blockcount); + + /* + * Invalidate as many buffers as we can, starting at the + * beginning of this mapping. If this function sets blockcount + * to zero, the transaction is full of logged buffer + * invalidations, so we need to return early so that we can + * roll and retry. + */ + error = xreap_bmapi_binval(sc, ip, whichfork, imap); + if (error || imap->br_blockcount == 0) + goto out_agf; + + /* + * Schedule removal of the mapping from the fork. We use + * deferred log intents in this function to control the exact + * sequence of metadata updates. + */ + xfs_bmap_unmap_extent(sc->tp, ip, whichfork, imap); + xfs_trans_mod_dquot_byino(sc->tp, ip, XFS_TRANS_DQ_BCOUNT, + -(int64_t)imap->br_blockcount); + __xfs_free_extent_later(sc->tp, imap->br_startblock, + imap->br_blockcount, NULL, true); + } + +out_agf: + xfs_trans_brelse(sc->tp, sc->sa.agf_bp); + sc->sa.agf_bp = NULL; +out_pag: + xfs_perag_put(sc->sa.pag); + sc->sa.pag = NULL; + return error; +} + +/* + * Dispose of each block mapped to the given fork of the given file. Callers + * must hold ILOCK_EXCL, and ip can only be sc->ip or sc->tempip. The fork + * must not have any delalloc reservations. + */ +int +xrep_reap_ifork( + struct xfs_scrub *sc, + struct xfs_inode *ip, + int whichfork) +{ + xfs_fileoff_t off = 0; + int bmap_flags = xfs_bmapi_aflag(whichfork); + int error; + + ASSERT(xfs_has_rmapbt(sc->mp)); + ASSERT(ip == sc->ip || ip == sc->tempip); + ASSERT(whichfork == XFS_ATTR_FORK || !XFS_IS_REALTIME_INODE(ip)); + + while (off < XFS_MAX_FILEOFF) { + struct xfs_bmbt_irec imap; + int nimaps = 1; + + /* Read the next extent, skip past holes and delalloc. */ + error = xfs_bmapi_read(ip, off, XFS_MAX_FILEOFF - off, &imap, + &nimaps, bmap_flags); + if (error) + return error; + if (nimaps != 1 || imap.br_startblock == DELAYSTARTBLOCK) { + ASSERT(0); + return -EFSCORRUPTED; + } + + /* + * If this is a real space mapping, reap as much of it as we + * can in a single transaction. + */ + if (xfs_bmap_is_real_extent(&imap)) { + error = xreap_ifork_extent(sc, ip, whichfork, &imap); + if (error) + return error; + + error = xfs_defer_finish(&sc->tp); + if (error) + return error; + } + + off = imap.br_startoff + imap.br_blockcount; + } + + return 0; +} diff --git a/fs/xfs/scrub/reap.h b/fs/xfs/scrub/reap.h index 3c31c795fd1a..6606b119b9ec 100644 --- a/fs/xfs/scrub/reap.h +++ b/fs/xfs/scrub/reap.h @@ -10,6 +10,7 @@ int xrep_reap_agblocks(struct xfs_scrub *sc, struct xagb_bitmap *bitmap, const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type); int xrep_reap_fsblocks(struct xfs_scrub *sc, struct xfsb_bitmap *bitmap, const struct xfs_owner_info *oinfo); +int xrep_reap_ifork(struct xfs_scrub *sc, struct xfs_inode *ip, int whichfork); /* Buffer cache scan context. */ struct xrep_bufscan { diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index b1a39a206730..b80de49ae831 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -1356,6 +1356,7 @@ DEFINE_EVENT(xrep_extent_class, name, \ DEFINE_REPAIR_EXTENT_EVENT(xreap_dispose_unmap_extent); DEFINE_REPAIR_EXTENT_EVENT(xreap_dispose_free_extent); DEFINE_REPAIR_EXTENT_EVENT(xreap_agextent_binval); +DEFINE_REPAIR_EXTENT_EVENT(xreap_bmapi_binval); DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert); DECLARE_EVENT_CLASS(xrep_reap_find_class, @@ -1389,6 +1390,7 @@ DEFINE_EVENT(xrep_reap_find_class, name, \ bool crosslinked), \ TP_ARGS(pag, agbno, len, crosslinked)) DEFINE_REPAIR_REAP_FIND_EVENT(xreap_agextent_select); +DEFINE_REPAIR_REAP_FIND_EVENT(xreap_bmapi_select); DECLARE_EVENT_CLASS(xrep_rmap_class, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, @@ -2190,6 +2192,67 @@ TRACE_EVENT(xrep_tempfile_create, __entry->temp_inum) ); +TRACE_EVENT(xreap_ifork_extent, + TP_PROTO(struct xfs_scrub *sc, struct xfs_inode *ip, int whichfork, + const struct xfs_bmbt_irec *irec), + TP_ARGS(sc, ip, whichfork, irec), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino) + __field(int, whichfork) + __field(xfs_fileoff_t, fileoff) + __field(xfs_filblks_t, len) + __field(xfs_agnumber_t, agno) + __field(xfs_agblock_t, agbno) + __field(int, state) + ), + TP_fast_assign( + __entry->dev = sc->mp->m_super->s_dev; + __entry->ino = ip->i_ino; + __entry->whichfork = whichfork; + __entry->fileoff = irec->br_startoff; + __entry->len = irec->br_blockcount; + __entry->agno = XFS_FSB_TO_AGNO(sc->mp, irec->br_startblock); + __entry->agbno = XFS_FSB_TO_AGBNO(sc->mp, irec->br_startblock); + __entry->state = irec->br_state; + ), + TP_printk("dev %d:%d ip 0x%llx whichfork %s agno 0x%x agbno 0x%x fileoff 0x%llx fsbcount 0x%llx state 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, + __print_symbolic(__entry->whichfork, XFS_WHICHFORK_STRINGS), + __entry->agno, + __entry->agbno, + __entry->fileoff, + __entry->len, + __entry->state) +); + +TRACE_EVENT(xreap_bmapi_binval_scan, + TP_PROTO(struct xfs_scrub *sc, const struct xfs_bmbt_irec *irec, + xfs_extlen_t scan_blocks), + TP_ARGS(sc, irec, scan_blocks), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_filblks_t, len) + __field(xfs_agnumber_t, agno) + __field(xfs_agblock_t, agbno) + __field(xfs_extlen_t, scan_blocks) + ), + TP_fast_assign( + __entry->dev = sc->mp->m_super->s_dev; + __entry->len = irec->br_blockcount; + __entry->agno = XFS_FSB_TO_AGNO(sc->mp, irec->br_startblock); + __entry->agbno = XFS_FSB_TO_AGBNO(sc->mp, irec->br_startblock); + __entry->scan_blocks = scan_blocks; + ), + TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%llx scan_blocks 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->agbno, + __entry->len, + __entry->scan_blocks) +); + #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */