[10/23] mkfs: don't hardcode log size

Message ID 173706974228.1927324.17714311358227511791.stgit@frogsfrogsfrogs (mailing list archive)
State New
Series [01/23] generic/476: fix fsstress process management

Commit Message

Darrick J. Wong Jan. 16, 2025, 11:27 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Commit 000813899afb46 hardcoded a log size of 256MB into xfs/501,
xfs/502, and generic/530.  This seems to be an attempt to reduce test
run times by increasing the log size so that more background threads can
run in parallel.  Unfortunately, this breaks a couple of my test
configurations:

 - External logs smaller than 256MB
 - Internal logs where the AG size is less than 256MB

For example, here's seqres.full from a failed xfs/501 invocation:

** mkfs failed with extra mkfs options added to " -m metadir=2,autofsck=1,uquota,gquota,pquota, -d rtinherit=1," by test 501 **
** attempting to mkfs using only test 501 options: -l size=256m **
size 256m specified for log subvolume is too large, maximum is 32768 blocks
<snip>
mount -ortdev=/dev/sdb4 -ologdev=/dev/sdb2 /dev/sda4 /opt failed
umount: /dev/sda4: not mounted.

Note that the format fails here, so we jettison the entire rt
configuration to force the log size option, but then the mount fails
because we didn't edit the rtdev option out of the mount command too.

Fortunately, mkfs.xfs already /has/ a few options for improving
parallelism in the filesystem: they avoid contention on the log grant
heads by scaling up the log size.  These options are aware of log and AG
size constraints, so they won't conflict with other geometry options.

Use them.
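
For illustration, the resulting mkfs invocation looks roughly like the
following (a sketch, assuming an xfsprogs new enough to accept the
concurrency options; /dev/sdX stands in for the scratch device):

	# Let mkfs scale the log (and related geometry) for this many
	# concurrent writers instead of hardcoding "-l size=256m".
	nr_cpus=$(getconf _NPROCESSORS_CONF)
	mkfs.xfs -f -d concurrency=$nr_cpus -l concurrency=$nr_cpus /dev/sdX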

Cc: <fstests@vger.kernel.org> # v2024.12.08
Fixes: 000813899afb46 ("fstests: scale some tests for high CPU count sanity")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 common/rc         |   27 +++++++++++++++++++++++++++
 tests/generic/530 |    6 +-----
 tests/generic/531 |    6 +-----
 tests/xfs/501     |    2 +-
 tests/xfs/502     |    2 +-
 5 files changed, 31 insertions(+), 12 deletions(-)

Comments

Dave Chinner Jan. 21, 2025, 3:58 a.m. UTC | #1
On Thu, Jan 16, 2025 at 03:27:46PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Commit 000813899afb46 hardcoded a log size of 256MB into xfs/501,
> xfs/502, and generic/530.  This seems to be an attempt to reduce test
> run times by increasing the log size so that more background threads can
> run in parallel.  Unfortunately, this breaks a couple of my test
> configurations:
> 
>  - External logs smaller than 256MB
>  - Internal logs where the AG size is less than 256MB
....

> diff --git a/common/rc b/common/rc
> index 9e34c301b0deb0..885669beeb5e26 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -689,6 +689,33 @@ _test_cycle_mount()
>      _test_mount
>  }
>  
> +# Are there mkfs options to try to improve concurrency?
> +_scratch_mkfs_concurrency_options()
> +{
> +	local nr_cpus="$(( $1 * LOAD_FACTOR ))"

caller does not need to pass a number of CPUs. This function can
simply do:

	local nr_cpus=$(getconf _NPROCESSORS_CONF)

And that will set concurrency to be "optimal" for the number of CPUs
in the machine the test is going to run on. That way tests don't
need to hard code some number that is going to be too large for
small systems and too small for large systems...

-Dave.
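
A rough sketch of that suggestion against the helper in this patch
(illustrative only, not a tested change):

	 _scratch_mkfs_concurrency_options()
	 {
	-	local nr_cpus="$(( $1 * LOAD_FACTOR ))"
	+	local nr_cpus=$(getconf _NPROCESSORS_CONF)

with callers then dropping the hardcoded thread count, e.g.:

	-_scratch_mkfs $(_scratch_mkfs_concurrency_options 32) >> $seqres.full 2>&1
	+_scratch_mkfs $(_scratch_mkfs_concurrency_options) >> $seqres.full 2>&1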
Theodore Ts'o Jan. 21, 2025, 12:44 p.m. UTC | #2
On Tue, Jan 21, 2025 at 02:58:25PM +1100, Dave Chinner wrote:
> > +# Are there mkfs options to try to improve concurrency?
> > +_scratch_mkfs_concurrency_options()
> > +{
> > +	local nr_cpus="$(( $1 * LOAD_FACTOR ))"
> 
> caller does not need to pass a number of CPUs. This function can
> simply do:
> 
> 	local nr_cpus=$(getconf _NPROCESSORS_CONF)
> 
> And that will set concurrency to be "optimal" for the number of CPUs
> in the machine the test is going to run on. That way tests don't
> need to hard code some number that is going to be too large for
> small systems and too small for large systems...

Hmm, but is this the right thing if you are using check-parallel?  If
you are running multiple tests that are all running some kind of load
or stress-testing antagonist at the same time, then having 3x to 5x
the number of necessary antagonist threads is going to unnecessarily
slow down the test run, which goes against the original goal of what
we were hoping to achieve with check-parallel.

How many tests are you currently able to run in parallel today, and
what's the ultimate goal?  We could have some kind of antagonist load
which is shared across multiple tests, but it's not clear to me that
it's worth the complexity.  (And note that it's not just fs and cpu
load antagonists; there could also be memory stress antagonists, where
having multiple antagonists could lead to OOM kills...)

							- Ted

Patch

diff --git a/common/rc b/common/rc
index 9e34c301b0deb0..885669beeb5e26 100644
--- a/common/rc
+++ b/common/rc
@@ -689,6 +689,33 @@  _test_cycle_mount()
     _test_mount
 }
 
+# Are there mkfs options to try to improve concurrency?
+_scratch_mkfs_concurrency_options()
+{
+	local nr_cpus="$(( $1 * LOAD_FACTOR ))"
+
+	case "$FSTYP" in
+	xfs)
+		# If any concurrency options are already specified, don't
+		# compute our own conflicting ones.
+		echo "$SCRATCH_OPTIONS $MKFS_OPTIONS" | \
+			grep -q 'concurrency=' &&
+			return
+
+		local sections=(d r)
+
+		# -l concurrency does not work with external logs
+		_has_logdev || sections+=(l)
+
+		for section in "${sections[@]}"; do
+			$MKFS_XFS_PROG -$section concurrency=$nr_cpus 2>&1 | \
+				grep -q "unknown option -$section" ||
+				echo "-$section concurrency=$nr_cpus "
+		done
+		;;
+	esac
+}
+
 _scratch_mkfs_options()
 {
     _scratch_options mkfs
diff --git a/tests/generic/530 b/tests/generic/530
index f2513156a920e8..7413840476b588 100755
--- a/tests/generic/530
+++ b/tests/generic/530
@@ -25,11 +25,7 @@  _require_test_program "t_open_tmpfiles"
 # For XFS, pushing 50000 unlinked inode inactivations through a small xfs log
 # can result in bottlenecks on the log grant heads, so try to make the log
 # larger to reduce runtime.
-if [ "$FSTYP" = "xfs" ] && ! _has_logdev; then
-    _scratch_mkfs "-l size=256m" >> $seqres.full 2>&1
-else
-    _scratch_mkfs >> $seqres.full 2>&1
-fi
+_scratch_mkfs $(_scratch_mkfs_concurrency_options 32) >> $seqres.full 2>&1
 _scratch_mount
 
 # Set ULIMIT_NOFILE to min(file-max / 2, 50000 files per LOAD_FACTOR)
diff --git a/tests/generic/531 b/tests/generic/531
index ed6c3f91153ecc..3ba2790c923464 100755
--- a/tests/generic/531
+++ b/tests/generic/531
@@ -23,11 +23,7 @@  _require_test_program "t_open_tmpfiles"
 
 # On high CPU count machines, this runs a -lot- of create and unlink
 # concurrency. Set the filesytsem up to handle this.
-if [ $FSTYP = "xfs" ]; then
-	_scratch_mkfs "-d agcount=32" >> $seqres.full 2>&1
-else
-	_scratch_mkfs >> $seqres.full 2>&1
-fi
+_scratch_mkfs $(_scratch_mkfs_concurrency_options 32) >> $seqres.full 2>&1
 _scratch_mount
 
 # Try to load up all the CPUs, two threads per CPU.
diff --git a/tests/xfs/501 b/tests/xfs/501
index 678c51b52948c5..4b29ef97d36c1a 100755
--- a/tests/xfs/501
+++ b/tests/xfs/501
@@ -33,7 +33,7 @@  _require_xfs_sysfs debug/log_recovery_delay
 _require_scratch
 _require_test_program "t_open_tmpfiles"
 
-_scratch_mkfs "-l size=256m" >> $seqres.full 2>&1
+_scratch_mkfs $(_scratch_mkfs_concurrency_options 32) >> $seqres.full 2>&1
 _scratch_mount
 
 # Set ULIMIT_NOFILE to min(file-max / 2, 30000 files per LOAD_FACTOR)
diff --git a/tests/xfs/502 b/tests/xfs/502
index 10b0017f6b2eb2..df3e7bcb17872d 100755
--- a/tests/xfs/502
+++ b/tests/xfs/502
@@ -23,7 +23,7 @@  _require_xfs_io_error_injection "iunlink_fallback"
 _require_scratch
 _require_test_program "t_open_tmpfiles"
 
-_scratch_mkfs "-l size=256m" | _filter_mkfs 2> $tmp.mkfs > /dev/null
+_scratch_mkfs $(_scratch_mkfs_concurrency_options 32) | _filter_mkfs 2> $tmp.mkfs > /dev/null
 cat $tmp.mkfs >> $seqres.full
 . $tmp.mkfs
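
For reference, with LOAD_FACTOR=1, an internal log, and an xfsprogs that
understands the concurrency options, the new helper call in these tests
expands to roughly the following (the exact set of sections depends on
what this mkfs.xfs build accepts):

	$ _scratch_mkfs_concurrency_options 32
	-d concurrency=32
	-r concurrency=32
	-l concurrency=32
	$ _scratch_mkfs -d concurrency=32 -r concurrency=32 -l concurrency=32 >> $seqres.full 2>&1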