Message ID | 171323030309.253873.8649027644659300452.stgit@frogsfrogsfrogs (mailing list archive)
State      | Superseded
Series     | [1/4] xfs: reduce the rate of cond_resched calls inside scrub
> out_free:
> +	/*
> +	 * If we're holding the only reference to an inode opened via handle,
> +	 * mark it dontcache so that we don't pollute the cache.
> +	 */
> +	if (handle_ip) {
> +		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> +			d_mark_dontcache(VFS_I(handle_ip));
> +		xfs_irele(handle_ip);

This still feels the wrong way around vs just using IGET_UNCACHED?

Or did I miss the killer argument why that can't work?
On Mon, Apr 15, 2024 at 10:35:26PM -0700, Christoph Hellwig wrote:
> > out_free:
> > +	/*
> > +	 * If we're holding the only reference to an inode opened via handle,
> > +	 * mark it dontcache so that we don't pollute the cache.
> > +	 */
> > +	if (handle_ip) {
> > +		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> > +			d_mark_dontcache(VFS_I(handle_ip));
> > +		xfs_irele(handle_ip);
>
> This still feels the wrong way around vs just using IGET_UNCACHED?
>
> Or did I miss the killer argument why that can't work?

No, I missed the killer argument.  In fact, I think the fsdax
reorganization into d_mark_dontcache makes this counterproductive.

Initially, I had thought that we'd be doing users a favor by only
marking inodes dontcache at the end of a scrub operation, and only if
there's only one reference to that inode.  This was more or less true
back when I_DONTCACHE was an XFS iflag and the only thing it did was
change the outcome of xfs_fs_drop_inode to 1.  If there are dentries
pointing to the inode when scrub finishes, the inode will have a
positive i_count and stay around in cache until dentry reclaim.

But now we have d_mark_dontcache, which causes the inode *and* the
dentries attached to it all to be marked I_DONTCACHE, which means that
we drop the dentries ASAP, which drops the inode ASAP.

This is bad if scrub found problems with the inode, because the inode
can now be scheduled for inactivation, which can cause inodegc to trip
over it and shut down the filesystem.

Even if the inode isn't bad, this is still suboptimal because phases 3-7
each initiate inode scans.  Dropping the inode immediately during phase
3 is silly because phase 5 will reload it and drop it immediately, etc.
It's fine to mark the inodes dontcache, but if there have been accesses
to the file that set up dentries, we should keep them.

I validated this by setting up ftrace to capture xfs_iget_recycle*
tracepoints and running xfs/285 for 30 seconds.  With current djwong-wtf
I saw ~30,000 recycle events.  I then dropped the d_mark_dontcache calls
and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per
30 seconds.

So I think I want to change xchk_irele to clear I_DONTCACHE if we're in
transaction context or if corruption was found; and to use
XFS_IGET_DONTCACHE instead of the i_count-based d_mark_dontcache calls.

--D
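The policy difference Darrick describes can be sketched in ordinary
userspace C.  Everything below is a hypothetical stand-in (the
fake_inode struct and its boolean flags model I_DONTCACHE and the
dentry cache); the real state lives in the VFS struct inode and the
XFS inode cache, not here:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for kernel state.  "dontcache" models the
 * I_DONTCACHE flag and "has_dentries" models cached dentries that
 * pin the inode in memory.
 */
struct fake_inode {
	atomic_int	i_count;	/* VFS reference count */
	bool		dontcache;	/* drop from cache ASAP */
	bool		has_dentries;	/* dentries still point here */
};

/*
 * Old policy: call d_mark_dontcache() when scrub holds the only
 * reference.  d_mark_dontcache() also marks the attached dentries,
 * so they are dropped ASAP, which drops the inode ASAP.
 */
static void irele_old(struct fake_inode *ip)
{
	if (atomic_load(&ip->i_count) == 1) {
		ip->dontcache = true;
		ip->has_dentries = false;	/* dentries marked, dropped */
	}
	atomic_fetch_sub(&ip->i_count, 1);
}

/*
 * Proposed policy: mark only the inode dontcache at iget time
 * (XFS_IGET_DONTCACHE); dentries set up by other accesses are left
 * alone and keep the inode cached across scrub phases.
 */
static void irele_new(struct fake_inode *ip)
{
	ip->dontcache = true;
	atomic_fetch_sub(&ip->i_count, 1);
}

/* An inode with live dentries or outstanding references stays cached. */
static bool stays_cached(struct fake_inode *ip)
{
	return ip->has_dentries || atomic_load(&ip->i_count) > 0;
}
```

Under the old policy, releasing the last scrub reference to an inode
that still has dentries attached evicts both; under the new one, the
dentries keep the inode alive across phases 3-7, which is consistent
with the drop in xfs_iget_recycle* events measured above.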
On Tue, Apr 16, 2024 at 03:31:48PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 15, 2024 at 10:35:26PM -0700, Christoph Hellwig wrote:
> > > out_free:
> > > +	/*
> > > +	 * If we're holding the only reference to an inode opened via handle,
> > > +	 * mark it dontcache so that we don't pollute the cache.
> > > +	 */
> > > +	if (handle_ip) {
> > > +		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> > > +			d_mark_dontcache(VFS_I(handle_ip));
> > > +		xfs_irele(handle_ip);
> >
> > This still feels the wrong way around vs just using IGET_UNCACHED?
> >
> > Or did I miss the killer argument why that can't work?
>
> No, I missed the killer argument.  In fact, I think the fsdax
> reorganization into d_mark_dontcache makes this counterproductive.
>
> Initially, I had thought that we'd be doing users a favor by only
> marking inodes dontcache at the end of a scrub operation, and only if
> there's only one reference to that inode.  This was more or less true
> back when I_DONTCACHE was an XFS iflag and the only thing it did was
> change the outcome of xfs_fs_drop_inode to 1.  If there are dentries
> pointing to the inode when scrub finishes, the inode will have a
> positive i_count and stay around in cache until dentry reclaim.
>
> But now we have d_mark_dontcache, which causes the inode *and* the
> dentries attached to it all to be marked I_DONTCACHE, which means that
> we drop the dentries ASAP, which drops the inode ASAP.
>
> This is bad if scrub found problems with the inode, because the inode
> can now be scheduled for inactivation, which can cause inodegc to trip
> over it and shut down the filesystem.
>
> Even if the inode isn't bad, this is still suboptimal because phases 3-7
> each initiate inode scans.  Dropping the inode immediately during phase
> 3 is silly because phase 5 will reload it and drop it immediately, etc.
> It's fine to mark the inodes dontcache, but if there have been accesses
> to the file that set up dentries, we should keep them.
>
> I validated this by setting up ftrace to capture xfs_iget_recycle*
> tracepoints and running xfs/285 for 30 seconds.  With current djwong-wtf
> I saw ~30,000 recycle events.  I then dropped the d_mark_dontcache calls
> and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per
> 30 seconds.
>
> So I think I want to change xchk_irele to clear I_DONTCACHE if we're in
> transaction context or if corruption was found; and to use
> XFS_IGET_DONTCACHE instead of the i_count-based d_mark_dontcache calls.

Actually we don't even need to clear DONTCACHE on sick inodes; commit
7975e465af6b4 ("xfs: drop IDONTCACHE on inodes when we mark them sick")
already took care of that.

--D

> --D
On Tue, Apr 16, 2024 at 03:31:48PM -0700, Darrick J. Wong wrote:
> So I think I want to change xchk_irele to clear I_DONTCACHE if we're in
> transaction context or if corruption was found; and to use
> XFS_IGET_DONTCACHE instead of the i_count-based d_mark_dontcache calls.

Yes, that's what I thought.
On Tue, Apr 16, 2024 at 03:51:19PM -0700, Darrick J. Wong wrote:
> Actually we don't even need to clear DONTCACHE on sick inodes; commit
> 7975e465af6b4 ("xfs: drop IDONTCACHE on inodes when we mark them sick")
> already took care of that.

Even better.
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 78b00ab85c9c9..87a5a728031fb 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -792,6 +792,31 @@ xfs_scrubv_check_barrier(
 	return 0;
 }
 
+/*
+ * If the caller provided us with a nonzero inode number that isn't the ioctl
+ * file, try to grab a reference to it to eliminate all further untrusted inode
+ * lookups.  If we can't get the inode, let each scrub function try again.
+ */
+STATIC struct xfs_inode *
+xchk_scrubv_open_by_handle(
+	struct xfs_mount		*mp,
+	const struct xfs_scrub_vec_head	*head)
+{
+	struct xfs_inode		*ip;
+	int				error;
+
+	error = xfs_iget(mp, NULL, head->svh_ino, XFS_IGET_UNTRUSTED, 0, &ip);
+	if (error)
+		return NULL;
+
+	if (VFS_I(ip)->i_generation != head->svh_gen) {
+		xfs_irele(ip);
+		return NULL;
+	}
+
+	return ip;
+}
+
 /* Vectored scrub implementation to reduce ioctl calls. */
 int
 xfs_ioc_scrubv_metadata(
@@ -804,6 +829,7 @@ xfs_ioc_scrubv_metadata(
 	struct xfs_scrub_vec __user	*uvectors;
 	struct xfs_inode		*ip_in = XFS_I(file_inode(file));
 	struct xfs_mount		*mp = ip_in->i_mount;
+	struct xfs_inode		*handle_ip = NULL;
 	struct xfs_scrub_vec		*v;
 	size_t				vec_bytes;
 	unsigned int			i;
@@ -848,6 +874,17 @@ xfs_ioc_scrubv_metadata(
 		trace_xchk_scrubv_item(mp, &head, i, v);
 	}
 
+	/*
+	 * If the caller wants us to do a scrub-by-handle and the file used to
+	 * call the ioctl is not the same file, load the incore inode and pin
+	 * it across all the scrubv actions to avoid repeated UNTRUSTED
+	 * lookups.  The reference is not passed to deeper layers of scrub
+	 * because each scrubber gets to decide its own strategy for getting an
+	 * inode.
+	 */
+	if (head.svh_ino && head.svh_ino != ip_in->i_ino)
+		handle_ip = xchk_scrubv_open_by_handle(mp, &head);
+
 	/* Run all the scrubbers. */
 	for (i = 0, v = vectors; i < head.svh_nr; i++, v++) {
 		struct xfs_scrub_metadata	sm = {
@@ -895,6 +932,15 @@ xfs_ioc_scrubv_metadata(
 	}
 
 out_free:
+	/*
+	 * If we're holding the only reference to an inode opened via handle,
+	 * mark it dontcache so that we don't pollute the cache.
+	 */
+	if (handle_ip) {
+		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
+			d_mark_dontcache(VFS_I(handle_ip));
+		xfs_irele(handle_ip);
+	}
 	kfree(vectors);
 	return error;
 }
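The generation check in xchk_scrubv_open_by_handle above is what keeps a
stale handle from matching a reused inode number.  That logic can be
modeled in plain C as follows; only the svh_ino/svh_gen field names come
from the patch, and the lookup table and struct layouts are purely
illustrative stand-ins for the kernel's inode cache:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel structures. */
struct fake_inode {
	unsigned long long	i_ino;		/* inode number */
	unsigned int		i_generation;	/* bumped on reuse */
};

struct fake_vec_head {
	unsigned long long	svh_ino;	/* handle: inode number */
	unsigned int		svh_gen;	/* handle: generation */
};

/*
 * Mirror of the patch's open-by-handle flow: find the inode by number,
 * then reject the handle if the generation doesn't match, meaning the
 * inode number has been freed and reused since the handle was created.
 * Returning NULL lets each scrubber fall back to its own lookup.
 */
static struct fake_inode *
open_by_handle(struct fake_inode *cache, size_t n,
	       const struct fake_vec_head *head)
{
	for (size_t i = 0; i < n; i++) {
		if (cache[i].i_ino != head->svh_ino)
			continue;
		if (cache[i].i_generation != head->svh_gen)
			return NULL;	/* stale handle */
		return &cache[i];
	}
	return NULL;			/* no such inode */
}
```

A matching generation pins the live inode for the whole scrubv call; a
mismatch (or a missing inode) simply degrades to the old behavior of
per-scrubber untrusted lookups.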