diff mbox series

[RFC,2/5] check: Add -q <n> option to support unconditional looping.

Message ID 1826e6084fd71e3e9755b1d2750876eb5f0e1161.1736496620.git.nirjhar.roy.lists@gmail.com (mailing list archive)
State New
Headers show
Series Add support for central fsconfig and -q <n> unconditional loop | expand

Commit Message

Nirjhar Roy (IBM) Jan. 10, 2025, 9:10 a.m. UTC
This patch adds -q <n> option through which one can run a given test <n>
times unconditionally. It also prints pass/fail metrics at the end.

The advantage of this over -L <n> and -i/-I <n> is that:
    a. -L <n> will not re-run a flakey test if the test passes for the first time.
    b. -I/-i <n> sets up devices during each iteration and hence slower.
Note -q <n> will override -L <n>.

Also rename _stash_fail_loop_files() to _stash_loop_files()
because this function will now be used even when the test doesn't fail
(i.e. when ran with -q <n>).

Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 check | 46 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 10 deletions(-)

Comments

Theodore Ts'o Jan. 10, 2025, 4:08 p.m. UTC | #1
On Fri, Jan 10, 2025 at 09:10:26AM +0000, Nirjhar Roy (IBM) wrote:
> This patch adds -q <n> option through which one can run a given test <n>
> times unconditionally. It also prints pass/fail metrics at the end.
> 
> The advantage of this over -L <n> and -i/-I <n> is that:
>     a. -L <n> will not re-run a flakey test if the test passes for the first time.
>     b. -I/-i <n> sets up devices during each iteration and hence slower.
> Note -q <n> will override -L <n>.

This is great!  It's something that I've wanted for a while, since at
the moment I implement {gce,kvm}-xfstests -C 10 is to run check ten
times, and doing something which does the looping inside check instead
of outside will be much more efficient.

One other thing that has been on my todo list to update, but which
perhaps you might be willing to do while you are doing work in this
area (nudge, nudge :-), is an optional mode which interates but which
stops once a test fails.  This is essentially the reverse of -L, and
the reason why it's useful is when trying to bisect a flakey test,
which sometimes might only be failing 2-5% of the time, require
running a test 30-50 times.  But the moment the test fails, we don't
need to run the test any more, so this would speed up bisection tests
which today I do via:

   gce-xfstests ltm --repo linux.git --bisect-good v6.12 --bisect-bad \
	v6.13-rc1 -C 50 -c ext4/inline_data generic/273

Because of this, I wonder if we should have one option to specify the
number of interations, and then a different option which specifies the
iteration mode, which might be "unconditional", "until first failure",
"only if the test initially fails", etc, instead of separate options
for -q, -L, etc.

Thanks,

					- Ted
Ritesh Harjani (IBM) Jan. 11, 2025, 3:35 a.m. UTC | #2
"Theodore Ts'o" <tytso@mit.edu> writes:

> On Fri, Jan 10, 2025 at 09:10:26AM +0000, Nirjhar Roy (IBM) wrote:
>> This patch adds -q <n> option through which one can run a given test <n>
>> times unconditionally. It also prints pass/fail metrics at the end.
>> 
>> The advantage of this over -L <n> and -i/-I <n> is that:
>>     a. -L <n> will not re-run a flakey test if the test passes for the first time.
>>     b. -I/-i <n> sets up devices during each iteration and hence slower.
>> Note -q <n> will override -L <n>.
>
> This is great!  It's something that I've wanted for a while, since at
> the moment I implement {gce,kvm}-xfstests -C 10 is to run check ten
> times, and doing something which does the looping inside check instead
> of outside will be much more efficient.

Yup "-q 10" could be used which can give pass/fail metrics of how many
times the test passed v/s failed by doing unconditional looping within
xfstest's check script itself. 

>
> One other thing that has been on my todo list to update, but which
> perhaps you might be willing to do while you are doing work in this
> area (nudge, nudge :-), is an optional mode which interates but which
> stops once a test fails.  This is essentially the reverse of -L, and
> the reason why it's useful is when trying to bisect a flakey test,
> which sometimes might only be failing 2-5% of the time, require
> running a test 30-50 times.  But the moment the test fails, we don't
> need to run the test any more, so this would speed up bisection tests
> which today I do via:
>
>    gce-xfstests ltm --repo linux.git --bisect-good v6.12 --bisect-bad \
> 	v6.13-rc1 -C 50 -c ext4/inline_data generic/273

There is -I which stops iterating on encountering a failure.
Does that work for this usecase?

>
> Because of this, I wonder if we should have one option to specify the
> number of interations, and then a different option which specifies the
> iteration mode, which might be "unconditional", "until first failure",
> "only if the test initially fails", etc, instead of separate options
> for -q, -L, etc.

I like the idea of iteration mode here. I will let others determine on
how easy it is to kill any option in xfstests today and replace it with
other. However here is a summary of different iteration and looping
options.

We have 3 options in xfstests today:
1. -i <n>              iterate the test list <n> times
2. -I <n>              iterate the test list <n> times, but stops iterating further in case of any test failure
3. -L <n>              loop tests <n> times following a failure, measuring aggregate pass/fail metrics

So we have -i/-I which are iterations and -q/-L which are loops. 
Looping happens when we can just loop over a particular test <n> times
and give pass/fail metrics.
Whereas in case of iterations it goes over from the beginning which will
also source different rc/config files and prepares the device etc. (hence
it's a bit slow too)

Using -q will be faster over using -i or -I similar to how -L is faster. However -L
will only re-run when there is a failure the 1st time.

-q v/s -l: Can we kill current -l option if it is not in use by anyone?
I would prefer -l since it looks a short form for looping. I don't see
we using -l anymore. But I will let others comment. 
    -l                  line mode diff


-ritesh
diff mbox series

Patch

diff --git a/check b/check
index 607d2456..e8ac0ce9 100755
--- a/check
+++ b/check
@@ -11,6 +11,7 @@  needsum=true
 try=()
 sum_bad=0
 bad=()
+is_bad_test=false
 notrun=()
 interrupt=true
 diff="diff -u"
@@ -27,6 +28,7 @@  DUMP_OUTPUT=false
 iterations=1
 istop=false
 loop_on_fail=0
+loop_unconditional=0
 exclude_tests=()
 
 # This is a global variable used to pass test failure text to reporting gunk
@@ -83,6 +85,7 @@  check options
     -s section		run only specified section from config file
     -S section		exclude the specified section from the config file
     -L <n>		loop tests <n> times following a failure, measuring aggregate pass/fail metrics
+    -q <n>		loop tests <n> times irrespective of a pass or a failure, measuring aggregate pass/fail metrics
 
 testlist options
     -g group[,group...]	include tests from these groups
@@ -341,7 +344,9 @@  while [ $# -gt 0 ]; do
 	-L)	[[ $2 =~ ^[0-9]+$ ]] || usage
 		loop_on_fail=$2; shift
 		;;
-
+	-q)	[[ $2 =~ ^[0-9]+$ ]] || usage
+		loop_unconditional=$(($2 - 1)); shift
+		;;
 	-*)	usage ;;
 	*)	# not an argument, we've got tests now.
 		have_test_arg=true ;;
@@ -357,6 +362,11 @@  while [ $# -gt 0 ]; do
 	shift
 done
 
+# -q <n> overrides -L <m>
+if [ "$loop_unconditional" -gt 0 ]; then
+	loop_on_fail=0
+fi
+
 # we need common/rc, that also sources common/config. We need to source it
 # after processing args, overlay needs FSTYP set before sourcing common/config
 if ! . ./common/rc; then
@@ -609,7 +619,7 @@  _expunge_test()
 }
 
 # retain files which would be overwritten in subsequent reruns of the same test
-_stash_fail_loop_files() {
+_stash_loop_files() {
 	local seq_prefix="${REPORT_DIR}/${1}"
 	local cp_suffix="$2"
 
@@ -633,10 +643,18 @@  _stash_test_status() {
 	fi
 
 	if ((${#loop_status[*]} > 0)); then
-		# continuing or completing rerun-on-failure loop
-		_stash_fail_loop_files "$test_seq" ".rerun${#loop_status[*]}"
+		# continuing or completing rerun loop
+		_stash_loop_files "$test_seq" ".rerun${#loop_status[*]}"
 		loop_status+=("$test_status")
-		if ((${#loop_status[*]} > loop_on_fail)); then
+
+		# only stash @bad result for initial failure in loop
+		if [[ "$test_status" == "fail" ]] && ! $is_bad_test; then
+			bad+=("$test_seq")
+			is_bad_test=true
+		fi
+
+		if ((loop_on_fail && ${#loop_status[*]} > loop_on_fail)) || \
+		   ((loop_unconditional && ${#loop_status[*]} > loop_unconditional)); then
 			printf "%s aggregate results across %d runs: " \
 				"$test_seq" "${#loop_status[*]}"
 			awk "BEGIN {
@@ -650,23 +668,30 @@  _stash_test_status() {
 				}'
 			echo
 			loop_status=()
+			is_bad_test=false
 		fi
-		return	# only stash @bad result for initial failure in loop
+		return
 	fi
 
 	case "$test_status" in
 	fail)
-		if ((loop_on_fail > 0)); then
-			# initial failure, start rerun-on-failure loop
-			_stash_fail_loop_files "$test_seq" ".rerun0"
+		# re-run if either of the loop argument is set
+		if ((loop_on_fail > 0)) || ((loop_unconditional > 0)); then
+			_stash_loop_files "$test_seq" ".rerun0"
 			loop_status+=("$test_status")
 		fi
 		bad+=("$test_seq")
+		is_bad_test=true
 		;;
 	list|notrun)
 		notrun+=("$test_seq")
 		;;
 	pass|expunge)
+		# re-run if loop_unconditional argument is set
+		if ((loop_unconditional > 0)); then
+			_stash_loop_files "$test_seq" ".rerun0"
+			loop_status+=("$test_status")
+		fi
 		;;
 	*)
 		echo "Unexpected test $test_seq status: $test_status"
@@ -857,7 +882,8 @@  function run_section()
 	seqres="$check"
 	_check_test_fs
 
-	loop_status=()	# track rerun-on-failure state
+	is_bad_test=false
+	loop_status=()	# track loop rerun state
 	local tc_status ix
 	local -a _list=( $list )
 	for ((ix = 0; ix < ${#_list[*]}; !${#loop_status[*]} && ix++)); do