Message ID | ejy4ska7orznl75364ehskucg7ibo3j3biwkui6q6mv42im6o5@pzl7pwwxjrg3 (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
Series | xfs_release lock contention | expand |
On Wed, Aug 07, 2024 at 06:27:21AM +0200, Mateusz Guzik wrote: > I'm looking at false-sharing problems concerning multicore open + read + > close cycle on one inode and during my survey I found xfs is heavily > serializing on a spinlock in xfs_release, making it perform the worst > out of the btrfs/ext4/xfs trio. > > A trivial test case plopped into will-it-scale is at the end. > > bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] = count(); }' tells me: > [snip] > @[ > __pv_queued_spin_lock_slowpath+5 > _raw_spin_lock_irqsave+49 > rwsem_wake.isra.0+57 > up_write+69 > xfs_iunlock+244 > xfs_release+175 > __fput+238 > __x64_sys_close+60 > do_syscall_64+82 > entry_SYSCALL_64_after_hwframe+118 > ]: 41132 > @[ > __pv_queued_spin_lock_slowpath+5 > _raw_spin_lock_irq+42 > rwsem_down_read_slowpath+164 > down_read+72 > xfs_ilock+125 > xfs_file_buffered_read+71 > xfs_file_read_iter+115 > vfs_read+604 > ksys_read+103 > do_syscall_64+82 > entry_SYSCALL_64_after_hwframe+118 > ]: 137639 > @[ > __pv_queued_spin_lock_slowpath+5 > _raw_spin_lock+41 > xfs_release+196 > __fput+238 > __x64_sys_close+60 > do_syscall_64+82 > entry_SYSCALL_64_after_hwframe+118 > ]: 1432766 > > The xfs_release -> _raw_spin_lock thing is the XFS_ITRUNCATED flag test. Yeah, these all ring old bells in the back of my skull. > > test case (plop into will-it-scale, say tests/openreadclose3.c and run > ./openreadclose3_processes -t 24): > > #include <stdlib.h> > #include <unistd.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <fcntl.h> > #include <assert.h> > > #define BUFSIZE 1024 > > static char tmpfile[] = "/tmp/willitscale.XXXXXX"; > > char *testcase_description = "Same file open/read/close"; > > void testcase_prepare(unsigned long nr_tasks) > { > char buf[BUFSIZE]; > int fd = mkstemp(tmpfile); > > assert(fd >= 0); > memset(buf, 'A', sizeof(buf)); > assert(write(fd, buf, sizeof(buf)) == sizeof(buf)); > close(fd); > } > > void testcase(unsigned long long *iterations, unsigned long nr) > { > char buf[BUFSIZE]; > > while (1) { > int fd = open(tmpfile, O_RDONLY); > assert(fd >= 0); > assert(read(fd, buf, sizeof(buf)) == sizeof(buf)); > close(fd); Oh, yeah, I defintely sent patches once upon a time to address this. <scrummage around old patch stacks> Yep, there it is: https://lore.kernel.org/linux-xfs/20190207050813.24271-4-david@fromorbit.com/ This would completely remove the rwsem traffic from O_RDONLY file closes. None of it would address the XFS_ITRUNCATED contention issue, but that's just another of those "test, test-and-clear" cases that avoid the atomic ops by testing if the bit is set without the lock first.... Hmmm, I thought I saw these patches go past on the list again recently. Yeah, that was a coupl eof months ago: https://lore.kernel.org/linux-xfs/20240623053532.857496-1-hch@lst.de/ Christoph, any progress on merging that patchset? -Dave.
On Wed, Aug 07, 2024 at 03:43:53PM +1000, Dave Chinner wrote: > Hmmm, I thought I saw these patches go past on the list again > recently. Yeah, that was a coupl eof months ago: > > https://lore.kernel.org/linux-xfs/20240623053532.857496-1-hch@lst.de/ > > Christoph, any progress on merging that patchset? I expected them to go into 6.11 but they didn't. I'll resend them after retesting them as there weren't any actionable items except for a few typo fixes.
On Wed, Aug 07, 2024 at 04:37:59PM +0200, Christoph Hellwig wrote: > On Wed, Aug 07, 2024 at 03:43:53PM +1000, Dave Chinner wrote: > > Hmmm, I thought I saw these patches go past on the list again > > recently. Yeah, that was a coupl eof months ago: > > > > https://lore.kernel.org/linux-xfs/20240623053532.857496-1-hch@lst.de/ > > > > Christoph, any progress on merging that patchset? > > I expected them to go into 6.11 but they didn't. I'll resend them > after retesting them as there weren't any actionable items except for > a few typo fixes. Heh, I think there's like one patch out of the whole bunch that didn't get a RVB tag, so that's probably why it didn't go in. :( --D
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 7dc6f326936c..1cc62c21e709 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1079,6 +1079,8 @@ xfs_itruncate_extents_flags( return error; } +extern unsigned long magic_tunable; + int xfs_release( xfs_inode_t *ip) @@ -1089,6 +1091,13 @@ xfs_release( if (!S_ISREG(VFS_I(ip)->i_mode) || (VFS_I(ip)->i_mode == 0)) return 0; + if (magic_tunable) { + if (!(ip->i_flags & XFS_ITRUNCATED)) + return 0; + if (ip->i_delayed_blks == 0) + return 0; + } + /* If this is a read-only mount, don't do this (would generate I/O) */ if (xfs_is_readonly(mp)) return 0;