From patchwork Tue Apr 16 01:41:40 2024
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13631074
Date: Mon, 15 Apr 2024 18:41:40 -0700
Subject: [PATCH 1/4] xfs: reduce the rate of cond_resched calls inside scrub
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: Christoph Hellwig, linux-xfs@vger.kernel.org, hch@lst.de, hch@infradead.org
Message-ID: <171323030260.253873.3709400504100908629.stgit@frogsfrogsfrogs>
In-Reply-To: <171323030233.253873.6726826444851242926.stgit@frogsfrogsfrogs>
References: <171323030233.253873.6726826444851242926.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

We really don't want to call cond_resched every single time we go through a
loop in scrub -- there may be billions of records, and probing into the
scheduler itself has overhead.  Reduce this overhead by only calling
cond_resched 10x per second; and add a counter so that we only check jiffies
once every 1000 records or so.

Surprisingly, this reduces scrub-only fstests runtime by about 2%.  I used
the bmapinflate xfs_db command to produce a billion-extent file and this
stupid gadget reduced the scrub runtime by about 4%.
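In outline, the pattern being introduced looks like the following minimal
sketch (hypothetical names; the real implementation is the xchk_relax widget
added to scrub.h in the diff below):

	/* Sketch only: amortize the clock read with a counter, yield to
	 * the scheduler at most ~10 times per second, then check for a
	 * fatal signal.  Kernel context; needs <linux/sched/signal.h>
	 * and <linux/jiffies.h>. */
	struct relax_state {
		unsigned long	next_resched;	/* jiffies deadline */
		unsigned int	nr_calls;	/* calls since last check */
	};

	static inline int maybe_relax(struct relax_state *rs)
	{
		if (likely(++rs->nr_calls < 100))
			return 0;		/* cheap path: one increment */
		rs->nr_calls = 0;

		if (time_after_eq(jiffies, rs->next_resched)) {
			cond_resched();
			rs->next_resched = jiffies + HZ / 10;
		}

		return fatal_signal_pending(current) ? -EINTR : 0;
	}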
From a stupid microbenchmark of calling these things 1 billion times, I estimate that cond_resched costs about 5.5ns per call; jiffes costs about 0.3ns per read; and fatal_signal_pending costs about 0.4ns per call. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/common.h | 25 ------------------- fs/xfs/scrub/scrub.c | 1 + fs/xfs/scrub/scrub.h | 64 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/xfarray.c | 10 ++++---- fs/xfs/scrub/xfarray.h | 3 ++ fs/xfs/scrub/xfile.c | 2 +- 6 files changed, 74 insertions(+), 31 deletions(-) diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 39465e39dc5fd..3d5f1f6b4b7bf 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -6,31 +6,6 @@ #ifndef __XFS_SCRUB_COMMON_H__ #define __XFS_SCRUB_COMMON_H__ -/* - * We /could/ terminate a scrub/repair operation early. If we're not - * in a good place to continue (fatal signal, etc.) then bail out. - * Note that we're careful not to make any judgements about *error. - */ -static inline bool -xchk_should_terminate( - struct xfs_scrub *sc, - int *error) -{ - /* - * If preemption is disabled, we need to yield to the scheduler every - * few seconds so that we don't run afoul of the soft lockup watchdog - * or RCU stall detector. - */ - cond_resched(); - - if (fatal_signal_pending(current)) { - if (*error == 0) - *error = -EINTR; - return true; - } - return false; -} - int xchk_trans_alloc(struct xfs_scrub *sc, uint resblks); int xchk_trans_alloc_empty(struct xfs_scrub *sc); void xchk_trans_cancel(struct xfs_scrub *sc); diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index e813b66b603a1..4a81f828f9f13 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -620,6 +620,7 @@ xfs_scrub_metadata( sc->sm = sm; sc->ops = &meta_scrub_ops[sm->sm_type]; sc->sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type); + sc->relax = INIT_XCHK_RELAX; retry_op: /* * When repairs are allowed, prevent freezing or readonly remount while diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 3910270471462..4e7e3edb6350c 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -8,6 +8,49 @@ struct xfs_scrub; +struct xchk_relax { + unsigned long next_resched; + unsigned int resched_nr; + bool interruptible; +}; + +/* Yield to the scheduler at most 10x per second. */ +#define XCHK_RELAX_NEXT (jiffies + (HZ / 10)) + +#define INIT_XCHK_RELAX \ + (struct xchk_relax){ \ + .next_resched = XCHK_RELAX_NEXT, \ + .resched_nr = 0, \ + .interruptible = true, \ + } + +/* + * Relax during a scrub operation and exit if there's a fatal signal pending. + * + * If preemption is disabled, we need to yield to the scheduler every now and + * then so that we don't run afoul of the soft lockup watchdog or RCU stall + * detector. cond_resched calls are somewhat expensive (~5ns) so we want to + * ratelimit this to 10x per second. Amortize the cost of the other checks by + * only doing it once every 100 calls. + */ +static inline int xchk_maybe_relax(struct xchk_relax *widget) +{ + /* Amortize the cost of scheduling and checking signals. */ + if (likely(++widget->resched_nr < 100)) + return 0; + widget->resched_nr = 0; + + if (unlikely(widget->next_resched <= jiffies)) { + cond_resched(); + widget->next_resched = XCHK_RELAX_NEXT; + } + + if (widget->interruptible && fatal_signal_pending(current)) + return -EINTR; + + return 0; +} + /* * Standard flags for allocating memory within scrub. NOFS context is * configured by the process allocation scope. 
Scrub and repair must be able @@ -123,6 +166,9 @@ struct xfs_scrub { */ unsigned int sick_mask; + /* next time we want to cond_resched() */ + struct xchk_relax relax; + /* State tracking for single-AG operations. */ struct xchk_ag sa; }; @@ -167,6 +213,24 @@ struct xfs_scrub_subord *xchk_scrub_create_subord(struct xfs_scrub *sc, unsigned int subtype); void xchk_scrub_free_subord(struct xfs_scrub_subord *sub); +/* + * We /could/ terminate a scrub/repair operation early. If we're not + * in a good place to continue (fatal signal, etc.) then bail out. + * Note that we're careful not to make any judgements about *error. + */ +static inline bool +xchk_should_terminate( + struct xfs_scrub *sc, + int *error) +{ + if (xchk_maybe_relax(&sc->relax)) { + if (*error == 0) + *error = -EINTR; + return true; + } + return false; +} + /* Metadata scrubbers */ int xchk_tester(struct xfs_scrub *sc); int xchk_superblock(struct xfs_scrub *sc); diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c index b65cd3fc5ac9b..9185ae7088d49 100644 --- a/fs/xfs/scrub/xfarray.c +++ b/fs/xfs/scrub/xfarray.c @@ -7,9 +7,9 @@ #include "xfs_fs.h" #include "xfs_shared.h" #include "xfs_format.h" +#include "scrub/scrub.h" #include "scrub/xfile.h" #include "scrub/xfarray.h" -#include "scrub/scrub.h" #include "scrub/trace.h" /* @@ -486,6 +486,9 @@ xfarray_sortinfo_alloc( xfarray_sortinfo_lo(si)[0] = 0; xfarray_sortinfo_hi(si)[0] = array->nr - 1; + si->relax = INIT_XCHK_RELAX; + if (flags & XFARRAY_SORT_KILLABLE) + si->relax.interruptible = false; trace_xfarray_sort(si, nr_bytes); *infop = si; @@ -503,10 +506,7 @@ xfarray_sort_terminated( * few seconds so that we don't run afoul of the soft lockup watchdog * or RCU stall detector. */ - cond_resched(); - - if ((si->flags & XFARRAY_SORT_KILLABLE) && - fatal_signal_pending(current)) { + if (xchk_maybe_relax(&si->relax)) { if (*error == 0) *error = -EINTR; return true; diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h index 8f54c8fc888fa..5eeeeed13ae24 100644 --- a/fs/xfs/scrub/xfarray.h +++ b/fs/xfs/scrub/xfarray.h @@ -127,6 +127,9 @@ struct xfarray_sortinfo { /* XFARRAY_SORT_* flags; see below. */ unsigned int flags; + /* next time we want to cond_resched() */ + struct xchk_relax relax; + /* Cache a folio here for faster scanning for pivots */ struct folio *folio; diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c index 4e254a0ba0036..d848222f802ba 100644 --- a/fs/xfs/scrub/xfile.c +++ b/fs/xfs/scrub/xfile.c @@ -10,9 +10,9 @@ #include "xfs_log_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" +#include "scrub/scrub.h" #include "scrub/xfile.h" #include "scrub/xfarray.h" -#include "scrub/scrub.h" #include "scrub/trace.h" #include From patchwork Tue Apr 16 01:41:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13631075 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5576FE545 for ; Tue, 16 Apr 2024 01:41:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713231717; cv=none; b=j3dVNtfMuAuYqe/GevYrbkhXNDNq5ChNpK1N1iTptV61OHhKU4KLsaVElSJ5UKxESHYQawTbO+NhMuaRdJirp7t3uKcj46Mot1MQi5SRyQIUOmShtrR9h/NlBd48JqA1Ogtbq+a/B2wB9uSTTOCYobTj9CLWnAxlhvt9Sp1Kcjs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713231717; c=relaxed/simple; bh=z6zhJT0U1xUzup3vwjjF5s0syDGfEghiBAORKW/Sr2U=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=kOEriyWHtKH3FaRa5/w0H0q+2Mj0zBFfrmxB4GoeI8Nn1M9MINKKJrqj+BwFFXi1/Kk8eTTFLediIRw02hZogPvm44dkIo5+bsf45KaptBQYA0OvaGaKiHtFetHYbvElYE40HWrx81RGpO9qlloI5b6N2mBhhmxIci3+YzSjYao= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=husROhD4; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="husROhD4" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E2714C113CC; Tue, 16 Apr 2024 01:41:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1713231716; bh=z6zhJT0U1xUzup3vwjjF5s0syDGfEghiBAORKW/Sr2U=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=husROhD4l7Ng1crObwl2YnqOD81d/fR0dUKk2uKD1PTnA1gaYM/UoKjmJBZinOiJU kqojW75p8QzaiiAOyeKH3O7kmbiKo8GzxfmfogsB0hKDOx3woNRMrXQzEgMxRvPL9K lfS3NVw7YEE3kmCmHc/kBWZ4ry+SVEPDor2eEiEJbAwRWI7vLawuKlyuKCmKuV4r1V itYs0FE/vi1eoNRT/qYPoRQdhpv1twKIYrzhVkOMXNfav7DsIeRDWGB5r5TuagzUll hpbM+ujNv5G8aTxkSyxqNDZKeS/28tp2Ohb4ThuL2ZykqLjeosA//3kDFMZhs7mA7T i+ndoDfbG2Rgg== Date: Mon, 15 Apr 2024 18:41:56 -0700 Subject: [PATCH 2/4] xfs: move xfs_ioc_scrub_metadata to scrub.c From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de, hch@infradead.org Message-ID: <171323030277.253873.12950334854150989191.stgit@frogsfrogsfrogs> In-Reply-To: <171323030233.253873.6726826444851242926.stgit@frogsfrogsfrogs> References: <171323030233.253873.6726826444851242926.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Move the scrub ioctl handler to scrub.c to keep the code together and to reduce unnecessary code when CONFIG_XFS_ONLINE_SCRUB=n. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/scrub.c | 27 ++++++++++++++++++++++++++- fs/xfs/scrub/xfs_scrub.h | 4 ++-- fs/xfs/xfs_ioctl.c | 24 ------------------------ 3 files changed, 28 insertions(+), 27 deletions(-) diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 4a81f828f9f13..1456cc11c406d 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -578,7 +578,7 @@ xchk_scrub_create_subord( } /* Dispatch metadata scrubbing. */ -int +STATIC int xfs_scrub_metadata( struct file *file, struct xfs_scrub_metadata *sm) @@ -724,3 +724,28 @@ xfs_scrub_metadata( run.retries++; goto retry_op; } + +/* Scrub one aspect of one piece of metadata. 
*/ +int +xfs_ioc_scrub_metadata( + struct file *file, + void __user *arg) +{ + struct xfs_scrub_metadata scrub; + int error; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&scrub, arg, sizeof(scrub))) + return -EFAULT; + + error = xfs_scrub_metadata(file, &scrub); + if (error) + return error; + + if (copy_to_user(arg, &scrub, sizeof(scrub))) + return -EFAULT; + + return 0; +} diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h index a39befa743ce0..02c930f175d0b 100644 --- a/fs/xfs/scrub/xfs_scrub.h +++ b/fs/xfs/scrub/xfs_scrub.h @@ -7,9 +7,9 @@ #define __XFS_SCRUB_H__ #ifndef CONFIG_XFS_ONLINE_SCRUB -# define xfs_scrub_metadata(file, sm) (-ENOTTY) +# define xfs_ioc_scrub_metadata(f, a) (-ENOTTY) #else -int xfs_scrub_metadata(struct file *file, struct xfs_scrub_metadata *sm); +int xfs_ioc_scrub_metadata(struct file *file, void __user *arg); #endif /* CONFIG_XFS_ONLINE_SCRUB */ #endif /* __XFS_SCRUB_H__ */ diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 6055053a8f6b2..87a45d4dbbd7c 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1055,30 +1055,6 @@ xfs_ioc_getfsmap( return error; } -STATIC int -xfs_ioc_scrub_metadata( - struct file *file, - void __user *arg) -{ - struct xfs_scrub_metadata scrub; - int error; - - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; - - if (copy_from_user(&scrub, arg, sizeof(scrub))) - return -EFAULT; - - error = xfs_scrub_metadata(file, &scrub); - if (error) - return error; - - if (copy_to_user(arg, &scrub, sizeof(scrub))) - return -EFAULT; - - return 0; -} - int xfs_ioc_swapext( xfs_swapext_t *sxp) From patchwork Tue Apr 16 01:42:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13631076 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A5F26AC2 for ; Tue, 16 Apr 2024 01:42:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713231733; cv=none; b=mqHnsJ8aSPCDMnuC2RZWSse/dAqQFNfB1XK3918G36P6IW4hqPYQLc78C1f9uLKsDehdIwRgC2D01nr1BWeLl7DUcXFIeMjOCA4h5eHH6QRsldSEo9QKAHiPZyg7cxmPSe1FcvwlXGHO70E1eZPVw35i5olIevNTuYteyh83omw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713231733; c=relaxed/simple; bh=1tuG0uOIjNnhS6hac858zj94UjTdz2s8Xx374vlGIek=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=K3gSyv8LWFhE/SgFzzlerLLxe6LwrzoQiMPpmSrmla5fqTUat4smBGrk0n7Iuh6RJ0spgKTbMux+yx9hTYl+y7ZXLoeHMJJsWsCnveOSaYx56jjXoQKoqhuVSQr+wVvTKyIdx1w7b564v1as8I/NinLC8T+Sxso/lhT4I2vtnmY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NeBBqFQk; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NeBBqFQk" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 86B3DC113CC; Tue, 16 Apr 2024 01:42:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1713231732; bh=1tuG0uOIjNnhS6hac858zj94UjTdz2s8Xx374vlGIek=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; 
Date: Mon, 15 Apr 2024 18:42:12 -0700
Subject: [PATCH 3/4] xfs: introduce vectored scrub mode
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, hch@lst.de, hch@infradead.org
Message-ID: <171323030293.253873.15581752242911696791.stgit@frogsfrogsfrogs>
In-Reply-To: <171323030233.253873.6726826444851242926.stgit@frogsfrogsfrogs>
References: <171323030233.253873.6726826444851242926.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored mode.
The caller specifies the principal metadata object that they want to scrub
(allocation group, inode, etc.) once, followed by an array of scrub types
they want called on that object.  The kernel runs the scrub operations and
writes the output flags and errno code to the corresponding array element.

A new pseudo scrub type BARRIER is introduced to force the kernel to return
to userspace if any corruptions have been found when scrubbing the previous
scrub types in the array.  This enables userspace to schedule, for example,
the sequence:

 1. data fork
 2. barrier
 3. directory

If the data fork scrub is clean, then the kernel will perform the directory
scrub.  If not, the barrier in 2 will exit back to userspace.

When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime due to fewer transitions across the
system call boundary.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_fs.h | 33 ++++++++++
 fs/xfs/scrub/scrub.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h | 79 ++++++++++++++++++++++++
 fs/xfs/scrub/xfs_scrub.h | 2 +
 fs/xfs/xfs_ioctl.c | 2 +
 5 files changed, 264 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index cc2ee5e0183d1..0071e6b57c561 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -725,6 +725,15 @@ struct xfs_scrub_metadata { /* Number of scrub subcommands. */ #define XFS_SCRUB_TYPE_NR 29 +/* + * This special type code only applies to the vectored scrub implementation. + * + * If any of the previous scrub vectors recorded runtime errors or have + * sv_flags bits set that match the OFLAG bits in the barrier vector's + * sv_flags, set the barrier's sv_ret to -ECANCELED and return to userspace. + */ +#define XFS_SCRUB_TYPE_BARRIER (0xFFFFFFFF) + /* i: Repair this metadata. */ #define XFS_SCRUB_IFLAG_REPAIR (1u << 0) @@ -769,6 +778,29 @@ struct xfs_scrub_metadata { XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED) #define XFS_SCRUB_FLAGS_ALL (XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT) +/* Vectored scrub calls to reduce the number of kernel transitions. */ + +struct xfs_scrub_vec { + __u32 sv_type; /* XFS_SCRUB_TYPE_* */ + __u32 sv_flags; /* XFS_SCRUB_FLAGS_* */ + __s32 sv_ret; /* 0 or a negative error code */ + __u32 sv_reserved; /* must be zero */ +}; + +/* Vectored metadata scrub control structure. */ +struct xfs_scrub_vec_head { + __u64 svh_ino; /* inode number. */ + __u32 svh_gen; /* inode generation.
*/ + __u32 svh_agno; /* ag number. */ + __u32 svh_flags; /* XFS_SCRUB_VEC_FLAGS_* */ + __u16 svh_rest_us; /* wait this much time between vector items */ + __u16 svh_nr; /* number of svh_vectors */ + __u64 svh_reserved; /* must be zero */ + __u64 svh_vectors; /* pointer to buffer of xfs_scrub_vec */ +}; + +#define XFS_SCRUB_VEC_FLAGS_ALL (0) + /* * ioctl limits */ @@ -951,6 +983,7 @@ struct xfs_getparents_by_handle { #define XFS_IOC_AG_GEOMETRY _IOWR('X', 61, struct xfs_ag_geometry) #define XFS_IOC_GETPARENTS _IOWR('X', 62, struct xfs_getparents) #define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 63, struct xfs_getparents_by_handle) +#define XFS_IOC_SCRUBV_METADATA _IOWR('X', 64, struct xfs_scrub_vec_head) /* * ioctl commands that replace IRIX syssgi()'s diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 1456cc11c406d..78b00ab85c9c9 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -21,6 +21,7 @@ #include "xfs_exchmaps.h" #include "xfs_dir2.h" #include "xfs_parent.h" +#include "xfs_icache.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -749,3 +750,151 @@ xfs_ioc_scrub_metadata( return 0; } + +/* Decide if there have been any scrub failures up to this point. */ +static inline int +xfs_scrubv_check_barrier( + struct xfs_mount *mp, + const struct xfs_scrub_vec *vectors, + const struct xfs_scrub_vec *stop_vec) +{ + const struct xfs_scrub_vec *v; + __u32 failmask; + + failmask = stop_vec->sv_flags & XFS_SCRUB_FLAGS_OUT; + + for (v = vectors; v < stop_vec; v++) { + if (v->sv_type == XFS_SCRUB_TYPE_BARRIER) + continue; + + /* + * Runtime errors count as a previous failure, except the ones + * used to ask userspace to retry. + */ + switch (v->sv_ret) { + case -EBUSY: + case -ENOENT: + case -EUSERS: + case 0: + break; + default: + return -ECANCELED; + } + + /* + * If any of the out-flags on the scrub vector match the mask + * that was set on the barrier vector, that's a previous fail. + */ + if (v->sv_flags & failmask) + return -ECANCELED; + } + + return 0; +} + +/* Vectored scrub implementation to reduce ioctl calls. */ +int +xfs_ioc_scrubv_metadata( + struct file *file, + void __user *arg) +{ + struct xfs_scrub_vec_head head; + struct xfs_scrub_vec_head __user *uhead = arg; + struct xfs_scrub_vec *vectors; + struct xfs_scrub_vec __user *uvectors; + struct xfs_inode *ip_in = XFS_I(file_inode(file)); + struct xfs_mount *mp = ip_in->i_mount; + struct xfs_scrub_vec *v; + size_t vec_bytes; + unsigned int i; + int error = 0; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&head, uhead, sizeof(head))) + return -EFAULT; + + if (head.svh_reserved) + return -EINVAL; + if (head.svh_flags & ~XFS_SCRUB_VEC_FLAGS_ALL) + return -EINVAL; + if (head.svh_nr == 0) + return 0; + + vec_bytes = array_size(head.svh_nr, sizeof(struct xfs_scrub_vec)); + if (vec_bytes > PAGE_SIZE) + return -ENOMEM; + + uvectors = (void __user *)(uintptr_t)head.svh_vectors; + vectors = memdup_user(uvectors, vec_bytes); + if (IS_ERR(vectors)) + return PTR_ERR(vectors); + + trace_xchk_scrubv_start(ip_in, &head); + + for (i = 0, v = vectors; i < head.svh_nr; i++, v++) { + if (v->sv_reserved) { + error = -EINVAL; + goto out_free; + } + + if (v->sv_type == XFS_SCRUB_TYPE_BARRIER && + (v->sv_flags & ~XFS_SCRUB_FLAGS_OUT)) { + error = -EINVAL; + goto out_free; + } + + trace_xchk_scrubv_item(mp, &head, i, v); + } + + /* Run all the scrubbers. 
*/ + for (i = 0, v = vectors; i < head.svh_nr; i++, v++) { + struct xfs_scrub_metadata sm = { + .sm_type = v->sv_type, + .sm_flags = v->sv_flags, + .sm_ino = head.svh_ino, + .sm_gen = head.svh_gen, + .sm_agno = head.svh_agno, + }; + + if (v->sv_type == XFS_SCRUB_TYPE_BARRIER) { + v->sv_ret = xfs_scrubv_check_barrier(mp, vectors, v); + if (v->sv_ret) { + trace_xchk_scrubv_barrier_fail(mp, &head, i, v); + break; + } + + continue; + } + + v->sv_ret = xfs_scrub_metadata(file, &sm); + v->sv_flags = sm.sm_flags; + + trace_xchk_scrubv_outcome(mp, &head, i, v); + + if (head.svh_rest_us) { + ktime_t expires; + + expires = ktime_add_ns(ktime_get(), + head.svh_rest_us * 1000); + set_current_state(TASK_KILLABLE); + schedule_hrtimeout(&expires, HRTIMER_MODE_ABS); + } + + if (fatal_signal_pending(current)) { + error = -EINTR; + goto out_free; + } + } + + if (copy_to_user(uvectors, vectors, vec_bytes) || + copy_to_user(uhead, &head, sizeof(head))) { + error = -EFAULT; + goto out_free; + } + +out_free: + kfree(vectors); + return error; +} diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index b3756722bee1d..8ce74bd8530a6 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -69,6 +69,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_QUOTACHECK); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_NLINKS); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_DIRTREE); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_BARRIER); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -99,7 +100,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_DIRTREE); { XFS_SCRUB_TYPE_QUOTACHECK, "quotacheck" }, \ { XFS_SCRUB_TYPE_NLINKS, "nlinks" }, \ { XFS_SCRUB_TYPE_HEALTHY, "healthy" }, \ - { XFS_SCRUB_TYPE_DIRTREE, "dirtree" } + { XFS_SCRUB_TYPE_DIRTREE, "dirtree" }, \ + { XFS_SCRUB_TYPE_BARRIER, "barrier" } #define XFS_SCRUB_FLAG_STRINGS \ { XFS_SCRUB_IFLAG_REPAIR, "repair" }, \ @@ -208,6 +210,81 @@ DEFINE_EVENT(xchk_fsgate_class, name, \ DEFINE_SCRUB_FSHOOK_EVENT(xchk_fsgates_enable); DEFINE_SCRUB_FSHOOK_EVENT(xchk_fsgates_disable); +DECLARE_EVENT_CLASS(xchk_vector_head_class, + TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_vec_head *vhead), + TP_ARGS(ip, vhead), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino) + __field(xfs_agnumber_t, agno) + __field(xfs_ino_t, inum) + __field(unsigned int, gen) + __field(unsigned int, flags) + __field(unsigned short, rest_us) + __field(unsigned short, nr_vecs) + ), + TP_fast_assign( + __entry->dev = ip->i_mount->m_super->s_dev; + __entry->ino = ip->i_ino; + __entry->agno = vhead->svh_agno; + __entry->inum = vhead->svh_ino; + __entry->gen = vhead->svh_gen; + __entry->flags = vhead->svh_flags; + __entry->rest_us = vhead->svh_rest_us; + __entry->nr_vecs = vhead->svh_nr; + ), + TP_printk("dev %d:%d ino 0x%llx agno 0x%x inum 0x%llx gen 0x%x flags 0x%x rest_us %u nr_vecs %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, + __entry->agno, + __entry->inum, + __entry->gen, + __entry->flags, + __entry->rest_us, + __entry->nr_vecs) +) +#define DEFINE_SCRUBV_HEAD_EVENT(name) \ +DEFINE_EVENT(xchk_vector_head_class, name, \ + TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_vec_head *vhead), \ + TP_ARGS(ip, vhead)) + +DEFINE_SCRUBV_HEAD_EVENT(xchk_scrubv_start); + +DECLARE_EVENT_CLASS(xchk_vector_class, + TP_PROTO(struct xfs_mount *mp, struct xfs_scrub_vec_head *vhead, + unsigned int vec_nr, struct xfs_scrub_vec *v), + TP_ARGS(mp, vhead, vec_nr, v), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, vec_nr) + __field(unsigned int, vec_type) + 
__field(unsigned int, vec_flags) + __field(int, vec_ret) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->vec_nr = vec_nr; + __entry->vec_type = v->sv_type; + __entry->vec_flags = v->sv_flags; + __entry->vec_ret = v->sv_ret; + ), + TP_printk("dev %d:%d vec[%u] type %s flags %s ret %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->vec_nr, + __print_symbolic(__entry->vec_type, XFS_SCRUB_TYPE_STRINGS), + __print_flags(__entry->vec_flags, "|", XFS_SCRUB_FLAG_STRINGS), + __entry->vec_ret) +) +#define DEFINE_SCRUBV_EVENT(name) \ +DEFINE_EVENT(xchk_vector_class, name, \ + TP_PROTO(struct xfs_mount *mp, struct xfs_scrub_vec_head *vhead, \ + unsigned int vec_nr, struct xfs_scrub_vec *v), \ + TP_ARGS(mp, vhead, vec_nr, v)) + +DEFINE_SCRUBV_EVENT(xchk_scrubv_barrier_fail); +DEFINE_SCRUBV_EVENT(xchk_scrubv_item); +DEFINE_SCRUBV_EVENT(xchk_scrubv_outcome); + TRACE_EVENT(xchk_op_error, TP_PROTO(struct xfs_scrub *sc, xfs_agnumber_t agno, xfs_agblock_t bno, int error, void *ret_ip), diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h index 02c930f175d0b..f17173b83e6f3 100644 --- a/fs/xfs/scrub/xfs_scrub.h +++ b/fs/xfs/scrub/xfs_scrub.h @@ -8,8 +8,10 @@ #ifndef CONFIG_XFS_ONLINE_SCRUB # define xfs_ioc_scrub_metadata(f, a) (-ENOTTY) +# define xfs_ioc_scrubv_metadata(f, a) (-ENOTTY) #else int xfs_ioc_scrub_metadata(struct file *file, void __user *arg); +int xfs_ioc_scrubv_metadata(struct file *file, void __user *arg); #endif /* CONFIG_XFS_ONLINE_SCRUB */ #endif /* __XFS_SCRUB_H__ */ diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 87a45d4dbbd7c..6e094208d80e2 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1413,6 +1413,8 @@ xfs_file_ioctl( case FS_IOC_GETFSMAP: return xfs_ioc_getfsmap(ip, arg); + case XFS_IOC_SCRUBV_METADATA: + return xfs_ioc_scrubv_metadata(filp, arg); case XFS_IOC_SCRUB_METADATA: return xfs_ioc_scrub_metadata(filp, arg); From patchwork Tue Apr 16 01:42:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13631077 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B4CBF6AB9 for ; Tue, 16 Apr 2024 01:42:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713231748; cv=none; b=Bjd4TGTht2ZAWauuIrta7awGsKxQIhkJ7cglMuyypyXB6bkafsq04A43fCLAp4Aztd2Np3OzDYEXH1f5AKh23Xsmk1LPb18BDoFVEXdowtvqVl4hQkGnAy+t78dMAqgmONn3xg2xTob8Up+zbpyerIldhEFRIcht10ZPvb5n3ws= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713231748; c=relaxed/simple; bh=JUy/LuvUD1YzrQCVTbIGHFA/kd3Wzrx2eXKRidV+2Rk=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=o2z2ONDufkUrde2ZyKnawYQ02vxn4P/ICH/Ez4LjDRUhhf2eQynq0iS4G9oBdu3tRKaVeE+d2AxePfXsyZD8KPJmHXLCAS9KNX0mbwyLdWb0Cq8FAOfOWhuZyzR4hLyDgKMGlp52fiqZEI3xqxzX+Dfjy1l483jRG2aC+mbWnk4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NJ04VrKq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NJ04VrKq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2CA7FC113CC; Tue, 16 Apr 2024 01:42:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1713231748; bh=JUy/LuvUD1YzrQCVTbIGHFA/kd3Wzrx2eXKRidV+2Rk=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=NJ04VrKqrqQ+evFI3G2UEj2296fcADqS9hxwB5sbNRfqVvZ4bWg8za1i342x9w7Wv NYn7Cbm3QCFjydwLJF4dTDw2iNtSp53L0yog6ojscWa/c4ME94ajrJ3tU0RKF7k8FQ hX8pDOjzV9xQFbU/M0oEX94OuxhMQvQVonuZMPHrAy0t6fsIGIOrALQobVj01K6NDu WYAxz3eHpMN7sip43VAKENtwg1EaSDwSG2R2NJT1SJegeXI4qGzH+xdXLTebOGMiF0 U35jqDeX78LtcjcWWCFvnn7bMWHpCA0Keisi4bJy4Ro/UKGBuPpmVhJkowp30wo4L5 QOi6gn4f0WCjg== Date: Mon, 15 Apr 2024 18:42:27 -0700 Subject: [PATCH 4/4] xfs: only iget the file once when doing vectored scrub-by-handle From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de, hch@infradead.org Message-ID: <171323030309.253873.8649027644659300452.stgit@frogsfrogsfrogs> In-Reply-To: <171323030233.253873.6726826444851242926.stgit@frogsfrogsfrogs> References: <171323030233.253873.6726826444851242926.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong If a program wants us to perform a scrub on a file handle and the fd passed to ioctl() is not the file referenced in the handle, iget the file once and pass it into the scrub code. This amortizes the untrusted iget lookup over /all/ the scrubbers mentioned in the scrubv call. When running fstests in "rebuild all metadata after each test" mode, I observed a 10% reduction in runtime on account of avoiding repeated inobt lookups. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/scrub.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 78b00ab85c9c9..87a5a728031fb 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -792,6 +792,31 @@ xfs_scrubv_check_barrier( return 0; } +/* + * If the caller provided us with a nonzero inode number that isn't the ioctl + * file, try to grab a reference to it to eliminate all further untrusted inode + * lookups. If we can't get the inode, let each scrub function try again. + */ +STATIC struct xfs_inode * +xchk_scrubv_open_by_handle( + struct xfs_mount *mp, + const struct xfs_scrub_vec_head *head) +{ + struct xfs_inode *ip; + int error; + + error = xfs_iget(mp, NULL, head->svh_ino, XFS_IGET_UNTRUSTED, 0, &ip); + if (error) + return NULL; + + if (VFS_I(ip)->i_generation != head->svh_gen) { + xfs_irele(ip); + return NULL; + } + + return ip; +} + /* Vectored scrub implementation to reduce ioctl calls. */ int xfs_ioc_scrubv_metadata( @@ -804,6 +829,7 @@ xfs_ioc_scrubv_metadata( struct xfs_scrub_vec __user *uvectors; struct xfs_inode *ip_in = XFS_I(file_inode(file)); struct xfs_mount *mp = ip_in->i_mount; + struct xfs_inode *handle_ip = NULL; struct xfs_scrub_vec *v; size_t vec_bytes; unsigned int i; @@ -848,6 +874,17 @@ xfs_ioc_scrubv_metadata( trace_xchk_scrubv_item(mp, &head, i, v); } + /* + * If the caller wants us to do a scrub-by-handle and the file used to + * call the ioctl is not the same file, load the incore inode and pin + * it across all the scrubv actions to avoid repeated UNTRUSTED + * lookups. The reference is not passed to deeper layers of scrub + * because each scrubber gets to decide its own strategy for getting an + * inode. + */ + if (head.svh_ino && head.svh_ino != ip_in->i_ino) + handle_ip = xchk_scrubv_open_by_handle(mp, &head); + /* Run all the scrubbers. */ for (i = 0, v = vectors; i < head.svh_nr; i++, v++) { struct xfs_scrub_metadata sm = { @@ -895,6 +932,15 @@ xfs_ioc_scrubv_metadata( } out_free: + /* + * If we're holding the only reference to an inode opened via handle, + * mark it dontcache so that we don't pollute the cache. + */ + if (handle_ip) { + if (atomic_read(&VFS_I(handle_ip)->i_count) == 1) + d_mark_dontcache(VFS_I(handle_ip)); + xfs_irele(handle_ip); + } kfree(vectors); return error; }
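For illustration, a userspace caller of the new interface might look like the
following sketch.  The structure, field, and constant names come from the
patches above; the include path and the helper name are hypothetical, and
error handling is trimmed:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <xfs/xfs_fs.h>		/* hypothetical install path for the updated header */

	/* Scrub an inode's data fork, stop at the barrier if it was corrupt
	 * (or failed outright), otherwise go on to scrub the directory --
	 * the sequence described in the patch 3 commit message.  fd is any
	 * open file on the target filesystem, e.g. the mount point. */
	static int scrubv_data_then_dir(int fd, uint64_t ino, uint32_t gen)
	{
		struct xfs_scrub_vec vecs[3] = {
			{ .sv_type = XFS_SCRUB_TYPE_BMBTD },
			{ .sv_type = XFS_SCRUB_TYPE_BARRIER,
			  .sv_flags = XFS_SCRUB_OFLAG_CORRUPT },
			{ .sv_type = XFS_SCRUB_TYPE_DIR },
		};
		struct xfs_scrub_vec_head head = {
			.svh_ino = ino,
			.svh_gen = gen,
			.svh_nr = 3,
			.svh_vectors = (uintptr_t)vecs,
		};

		if (ioctl(fd, XFS_IOC_SCRUBV_METADATA, &head))
			return -1;	/* errno set by the kernel */

		/* Each vecs[i].sv_ret and sv_flags now reports that
		 * scrubber's outcome; the barrier's sv_ret is -ECANCELED if
		 * the data fork scrub tripped it. */
		return 0;
	}

Because svh_ino/svh_gen identify the target by handle, patch 4 pins the
incore inode across the whole vector, so the untrusted lookup is amortized
over all the scrub types as long as fd does not already refer to that inode.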