Message ID | 173706974288.1927324.17585931341351454094.stgit@frogsfrogsfrogs (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [01/23] generic/476: fix fsstress process management | expand |
On Thu, Jan 16, 2025 at 03:28:49PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > generic/032 now periodically fails with: > > --- /tmp/fstests/tests/generic/032.out 2025-01-05 11:42:14.427388698 -0800 > +++ /var/tmp/fstests/generic/032.out.bad 2025-01-06 18:20:17.122818195 -0800 > @@ -1,5 +1,7 @@ > QA output created by 032 > 100 iterations > -000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd >................< > -* > -100000 > +umount: /opt: target is busy. > +mount: /opt: /dev/sda4 already mounted on /opt. > + dmesg(1) may have more information after failed mount system call. > +cycle mount failed > +(see /var/tmp/fstests/generic/032.full for details) > > The root cause of this regression is the _syncloop subshell. This > background process runs _scratch_sync, which is actually an xfs_io > process that calls syncfs on the scratch mount. > > Unfortunately, while the test kills the _syncloop subshell, it doesn't > actually kill the xfs_io process. If the xfs_io process is in D state > running the syncfs, it won't react to the signal, but it will pin the > mount. Then the _scratch_cycle_mount fails because the mount is pinned. > > Prior to commit 8973af00ec212f the _syncloop ran sync(1) which avoided > pinning the scratch filesystem. How does running sync(1) prevent this? they run the same kernel code, so I'm a little confused as to why this is a problem caused by using the syncfs() syscall rather than the sync() syscall... > Fix this by pgrepping for the xfs_io process and killing and waiting for > it if necessary. Change looks fine, though. Reviewed-by: Dave Chinner <dchinner@redhat.com>
diff --git a/common/rc b/common/rc index 4419cfc3188374..d7f3c48eafe590 100644 --- a/common/rc +++ b/common/rc @@ -36,6 +36,12 @@ _pkill() pkill --session 0 "$@" } +# Find only the test processes started by this test +_pgrep() +{ + pgrep --session 0 "$@" +} + # Common execution handling for fsstress invocation. # # We need per-test fsstress binaries because of the way fsstress forks and diff --git a/tests/generic/032 b/tests/generic/032 index 30290c7225a2fa..48d594fe9315b8 100755 --- a/tests/generic/032 +++ b/tests/generic/032 @@ -81,6 +81,15 @@ echo $iters iterations kill $syncpid wait +# The xfs_io instance started by _scratch_sync could be stuck in D state when +# the subshell running _syncloop & is killed. That xfs_io process pins the +# mount so we must kill it and wait for it to die before cycling the mount. +dead_syncfs_pid=$(_pgrep xfs_io) +if [ -n "$dead_syncfs_pid" ]; then + _pkill xfs_io + wait $dead_syncfs_pid +fi + # clear page cache and dump the file _scratch_cycle_mount _hexdump $SCRATCH_MNT/file