[4/4] xfs: only iget the file once when doing vectored scrub-by-handle

Message ID 171323030309.253873.8649027644659300452.stgit@frogsfrogsfrogs (mailing list archive)
State Superseded
Series [1/4] xfs: reduce the rate of cond_resched calls inside scrub

Commit Message

Darrick J. Wong April 16, 2024, 1:42 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

If a program wants us to perform a scrub on a file handle and the fd
passed to ioctl() is not the file referenced in the handle, iget the
file once and pass it into the scrub code.  This amortizes the untrusted
iget lookup over /all/ the scrubbers mentioned in the scrubv call.

When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime on account of avoiding repeated
inobt lookups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/scrub.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

Comments

Christoph Hellwig April 16, 2024, 5:35 a.m. UTC | #1
>  out_free:
> +	/*
> +	 * If we're holding the only reference to an inode opened via handle,
> +	 * mark it dontcache so that we don't pollute the cache.
> +	 */
> +	if (handle_ip) {
> +		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> +			d_mark_dontcache(VFS_I(handle_ip));
> +		xfs_irele(handle_ip);

This still feels the wrong way around vs just using IGET_UNCACHED?

Or did I miss the killer argument why that can't work?
Darrick J. Wong April 16, 2024, 10:31 p.m. UTC | #2
On Mon, Apr 15, 2024 at 10:35:26PM -0700, Christoph Hellwig wrote:
> >  out_free:
> > +	/*
> > +	 * If we're holding the only reference to an inode opened via handle,
> > +	 * mark it dontcache so that we don't pollute the cache.
> > +	 */
> > +	if (handle_ip) {
> > +		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> > +			d_mark_dontcache(VFS_I(handle_ip));
> > +		xfs_irele(handle_ip);
> 
> This still feels the wrong way around vs just using IGET_UNCACHED?
> 
> Or did I miss the killer argument why that can't work?

No, I missed the killer argument.  In fact, I think the fsdax
reorganization into d_mark_dontcache makes this counterproductive.

Initially, I had thought that we'd be doing users a favor by only
marking inodes dontcache at the end of a scrub operation, and only if
there's only one reference to that inode.  This was more or less true
back when I_DONTCACHE was an XFS iflag and the only thing it did was
change the outcome of xfs_fs_drop_inode to 1.  If there are dentries
pointing to the inode when scrub finishes, the inode will have positive
i_count and stay around in cache until dentry reclaim.

But now we have d_mark_dontcache, which causes the inode *and* the
dentries attached to it all to be marked I_DONTCACHE, which means that
we drop the dentries ASAP, which in turn drops the inode ASAP.

This is bad if scrub found problems with the inode, because now it can
be scheduled for inactivation, which can cause inodegc to trip on the
corruption and shut down the filesystem.

Even if the inode isn't bad, this is still suboptimal because phases 3-7
each initiate inode scans.  Dropping the inode immediately during phase
3 is silly because phase 5 will reload it and drop it immediately, etc.
It's fine to mark the inodes dontcache, but if there have been accesses
to the file that set up dentries, we should keep them.

I validated this by setting up ftrace to capture xfs_iget_recycle*
tracepoints and ran xfs/285 for 30 seconds.  With current djwong-wtf I
saw ~30,000 recycle events.  I then dropped the d_mark_dontcache calls
and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per
30 seconds.

So I think I want to change xchk_irele to clear I_DONTCACHE if we're in
transaction context or if corruption was found; and to use
XFS_IGET_DONTCACHE instead of the i_count-based d_mark_dontcache calls.

--D
Darrick J. Wong April 16, 2024, 10:51 p.m. UTC | #3
On Tue, Apr 16, 2024 at 03:31:48PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 15, 2024 at 10:35:26PM -0700, Christoph Hellwig wrote:
> > >  out_free:
> > > +	/*
> > > +	 * If we're holding the only reference to an inode opened via handle,
> > > +	 * mark it dontcache so that we don't pollute the cache.
> > > +	 */
> > > +	if (handle_ip) {
> > > +		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> > > +			d_mark_dontcache(VFS_I(handle_ip));
> > > +		xfs_irele(handle_ip);
> > 
> > This still feels the wrong way around vs just using IGET_UNCACHED?
> > 
> > Or did I miss the killer argument why that can't work?
> 
> No, I missed the killer argument.  In fact, I think the fsdax
> reorganization into d_mark_dontcache makes this counterproductive.
> 
> Initially, I had thought that we'd be doing users a favor by only
> marking inodes dontcache at the end of a scrub operation, and only if
> there's only one reference to that inode.  This was more or less true
> back when I_DONTCACHE was an XFS iflag and the only thing it did was
> change the outcome of xfs_fs_drop_inode to 1.  If there are dentries
> pointing to the inode when scrub finishes, the inode will have positive
> i_count and stay around in cache until dentry reclaim.
> 
> But now we have d_mark_dontcache, which causes the inode *and* the
> dentries attached to it all to be marked I_DONTCACHE, which means that
> we drop the dentries ASAP, which in turn drops the inode ASAP.
> 
> This is bad if scrub found problems with the inode, because now it can
> be scheduled for inactivation, which can cause inodegc to trip on the
> corruption and shut down the filesystem.
> 
> Even if the inode isn't bad, this is still suboptimal because phases 3-7
> each initiate inode scans.  Dropping the inode immediately during phase
> 3 is silly because phase 5 will reload it and drop it immediately, etc.
> It's fine to mark the inodes dontcache, but if there have been accesses
> to the file that set up dentries, we should keep them.
> 
> I validated this by setting up ftrace to capture xfs_iget_recycle*
> tracepoints and ran xfs/285 for 30 seconds.  With current djwong-wtf I
> saw ~30,000 recycle events.  I then dropped the d_mark_dontcache calls
> and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per
> 30 seconds.
> 
> So I think I want to change xchk_irele to clear I_DONTCACHE if we're in
> transaction context or if corruption was found; and to use
> XFS_IGET_DONTCACHE instead of the i_count-based d_mark_dontcache calls.

Actually we don't even need to clear DONTCACHE on sick inodes; commit
7975e465af6b4 ("xfs: drop IDONTCACHE on inodes when we mark them sick")
already took care of that.

--D

> --D
>
Christoph Hellwig April 17, 2024, 5:01 a.m. UTC | #4
On Tue, Apr 16, 2024 at 03:31:48PM -0700, Darrick J. Wong wrote:
> So I think I want to change xchk_irele to clear I_DONTCACHE if we're in
> transaction context or if corruption was found; and to use
> XFS_IGET_DONTCACHE instead of the i_count-based d_mark_dontcache calls.

Yes, that's what I thought.
Christoph Hellwig April 17, 2024, 5:02 a.m. UTC | #5
On Tue, Apr 16, 2024 at 03:51:19PM -0700, Darrick J. Wong wrote:
> Actually we don't even need to clear DONTCACHE on sick inodes; commit
> 7975e465af6b4 ("xfs: drop IDONTCACHE on inodes when we mark them sick")
> already took care of that.

Even better.

Patch

diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 78b00ab85c9c9..87a5a728031fb 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -792,6 +792,31 @@  xfs_scrubv_check_barrier(
 	return 0;
 }
 
+/*
+ * If the caller provided us with a nonzero inode number that isn't the ioctl
+ * file, try to grab a reference to it to eliminate all further untrusted inode
+ * lookups.  If we can't get the inode, let each scrub function try again.
+ */
+STATIC struct xfs_inode *
+xchk_scrubv_open_by_handle(
+	struct xfs_mount		*mp,
+	const struct xfs_scrub_vec_head	*head)
+{
+	struct xfs_inode		*ip;
+	int				error;
+
+	error = xfs_iget(mp, NULL, head->svh_ino, XFS_IGET_UNTRUSTED, 0, &ip);
+	if (error)
+		return NULL;
+
+	if (VFS_I(ip)->i_generation != head->svh_gen) {
+		xfs_irele(ip);
+		return NULL;
+	}
+
+	return ip;
+}
+
 /* Vectored scrub implementation to reduce ioctl calls. */
 int
 xfs_ioc_scrubv_metadata(
@@ -804,6 +829,7 @@  xfs_ioc_scrubv_metadata(
 	struct xfs_scrub_vec		__user *uvectors;
 	struct xfs_inode		*ip_in = XFS_I(file_inode(file));
 	struct xfs_mount		*mp = ip_in->i_mount;
+	struct xfs_inode		*handle_ip = NULL;
 	struct xfs_scrub_vec		*v;
 	size_t				vec_bytes;
 	unsigned int			i;
@@ -848,6 +874,17 @@  xfs_ioc_scrubv_metadata(
 		trace_xchk_scrubv_item(mp, &head, i, v);
 	}
 
+	/*
+	 * If the caller wants us to do a scrub-by-handle and the file used to
+	 * call the ioctl is not the same file, load the incore inode and pin
+	 * it across all the scrubv actions to avoid repeated UNTRUSTED
+	 * lookups.  The reference is not passed to deeper layers of scrub
+	 * because each scrubber gets to decide its own strategy for getting an
+	 * inode.
+	 */
+	if (head.svh_ino && head.svh_ino != ip_in->i_ino)
+		handle_ip = xchk_scrubv_open_by_handle(mp, &head);
+
 	/* Run all the scrubbers. */
 	for (i = 0, v = vectors; i < head.svh_nr; i++, v++) {
 		struct xfs_scrub_metadata	sm = {
@@ -895,6 +932,15 @@  xfs_ioc_scrubv_metadata(
 	}
 
 out_free:
+	/*
+	 * If we're holding the only reference to an inode opened via handle,
+	 * mark it dontcache so that we don't pollute the cache.
+	 */
+	if (handle_ip) {
+		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
+			d_mark_dontcache(VFS_I(handle_ip));
+		xfs_irele(handle_ip);
+	}
 	kfree(vectors);
 	return error;
 }