[RFC,0/2] add option to rerun failed tests

Message ID 20220621160153.29591-1-ddiss@suse.de (mailing list archive)

Message

David Disseldorp June 21, 2022, 4:01 p.m. UTC
This RFC patchset adds support to loop on failed tests, as proposed by
Ted Ts'o in https://lwn.net/Articles/897061/:
  add a mode that will immediately rerun a failed test 25 or 100 times
  to establish a failure percentage.

There are a couple of things which I'd like to resolve before dropping
the RFC flag, but would appreciate early feedback on the approach here.
The caveats are:
- rerun tests will be tracked as a single failure in @try and @bad
  + xunit reports do not include any rerun details
- .bad files generated on failure will be overwritten by test reruns

For xunit reports, I think it'll make sense to stash the aggregates in a
separate <test>.agg-results file or something. Similarly for .bad file
overwrites, I could add a .<rerun #> suffix for capturing all failure
data.

Cheers, David

Comments

Theodore Ts'o June 24, 2022, 4:42 a.m. UTC | #1
On Tue, Jun 21, 2022 at 06:01:51PM +0200, David Disseldorp wrote:
> This RFC patchset adds support to loop on failed tests, as proposed by
> Ted Ts'o in https://lwn.net/Articles/897061/:
>   add a mode that will immediately rerun a failed test 25 or 100 times
>   to establish a failure percentage.
> 
> There are a couple of things which I'd like to resolve before dropping
> the RFC flag, but would appreciate early feedback on the approach here.

This is really exciting!  I was hoping to try it out, but the first
patch doesn't apply to origin/master on xfstests-dev.

For example this patch hunk:

@@ -729,9 +750,7 @@ function run_section()
 	prev_seq=""
 	for seq in $list ; do
 		# Run report for previous test!
-		if [ "$tc_status" == "fail" ]; then
-			bad+=("$seqnum")
-		fi
+		_stash_test_status "$seqnum" "$tc_status"
 		if $do_report && [[ ! $tc_status =~ ^(init|expunge)$ ]]; then
 			_make_testcase_report "$prev_seq" "$tc_status"
 		fi

The relevant section of check looks like this:

		# Run report for previous test!
		if $err ; then
			bad="$bad $seqnum"
			n_bad=`expr $n_bad + 1`
			tc_status="fail"
		fi

And "git blame" shows that this part of check hasn't changed since
2018, and I'm on the latest upstream version of xfstests:

commit 568ac9fffeb6afec03e5d6c9936617232fd7fc6d (HEAD, tag: v2022.06.05, origin/master, origin/HEAD, kernel/master)
Author: Dave Chinner <dchinner@redhat.com>
Date:   Fri Jun 3 11:54:13 2022 +1000

    xfs/189: systemd monitoring of /etc/fstab sucks


Was your patch based on xfstests with some out-of-tree patches?

> The caveats are:
> - rerun tests will be tracked as a single failure in @try and @bad
>   + xunit reports do not include any rerun details
> - .bad files generated on failure will be overwritten by test reruns
> 
> For xunit reports, I think it'll make sense to stash the aggregates in a
> separate <test>.agg-results file or something. Similarly for .bad file
> overwrites, I could add a .<rerun #> suffix for capturing all failure
> data.

For the xunit results file, I was assuming that we would simply have
multiple repeated testcase entries stored in the single results.xml
file.  For example:

<testcase classname="xfstests.global" name="generic/476" time="354">
		<failure message="Test  failed, reason unknown" type="TestFail" />
		<system-out>
		...
	</testcase>
<testcase classname="xfstests.global" name="generic/476" time="343">
	</testcase>
<testcase classname="xfstests.global" name="generic/476" time="353">
	</testcase>
	...

I don't know that we need a separate file for the rerun tests, since
it's not that hard to create a Python script which parses the results
and calculates the pass/fail percentages for any test which is run
multiple times.
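
Something like the following rough sketch should be enough, assuming the
repeated <testcase> entries above all land in one results.xml (the file
path and the failure/error detection below are assumptions, not anything
the patchset currently provides):

#!/usr/bin/env python3
# Rough sketch: tally pass/fail per test name across repeated <testcase>
# entries in a single xunit results.xml.
import sys
import xml.etree.ElementTree as ET
from collections import Counter

def rerun_stats(path):
    runs, fails = Counter(), Counter()
    # Testcases may sit under one or more <testsuite> wrappers, so walk
    # the whole tree rather than only the root's direct children.
    for tc in ET.parse(path).getroot().iter("testcase"):
        name = tc.get("name")
        runs[name] += 1
        if tc.find("failure") is not None or tc.find("error") is not None:
            fails[name] += 1
    return runs, fails

if __name__ == "__main__":
    runs, fails = rerun_stats(sys.argv[1] if len(sys.argv) > 1 else "results.xml")
    for name in sorted(runs):
        if runs[name] > 1:  # only report tests that were rerun
            pct = 100.0 * fails[name] / runs[name]
            print(f"{name}: {fails[name]}/{runs[name]} failed ({pct:.0f}%)")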


As far as having the .bad and .full files goes, I agree that some kind
of .rerun-NN suffix would make a lot of sense.

Cheers,

						- Ted
Zorro Lang June 24, 2022, 6:15 a.m. UTC | #2
On Fri, Jun 24, 2022 at 12:42:03AM -0400, Theodore Ts'o wrote:
> On Tue, Jun 21, 2022 at 06:01:51PM +0200, David Disseldorp wrote:
> > This RFC patchset adds support to loop on failed tests, as proposed by
> > Ted Ts'o in https://lwn.net/Articles/897061/:
> >   add a mode that will immediately rerun a failed test 25 or 100 times
> >   to establish a failure percentage.
> > 
> > There are a couple of things which I'd like to resolve before dropping
> > the RFC flag, but would appreciate early feedback on the approach here.
> 
> This is really exciting!  I was hoping to try it out, but the first
> patch doesn't apply to origin/master on xfstests-dev.

Hi Ted,

The origin/for-next branch has the latest (testing) updates. But for
the issue you hit below, you might need this patchset:

  https://lore.kernel.org/fstests/20220620192934.21694-1-ddiss@suse.de/

David sent that first, to fix/change some things before adding this
feature. I've acked it, and will merge it this week. For this patchset,
I'd like to give it more time to gather more review comments.

Thanks,
Zorro

> 
> For example this patch hunk:
> 
> @@ -729,9 +750,7 @@ function run_section()
>  	prev_seq=""
>  	for seq in $list ; do
>  		# Run report for previous test!
> -		if [ "$tc_status" == "fail" ]; then
> -			bad+=("$seqnum")
> -		fi
> +		_stash_test_status "$seqnum" "$tc_status"
>  		if $do_report && [[ ! $tc_status =~ ^(init|expunge)$ ]]; then
>  			_make_testcase_report "$prev_seq" "$tc_status"
>  		fi
> 
> The relevant section of check looks like this:
> 
> 		# Run report for previous test!
> 		if $err ; then
> 			bad="$bad $seqnum"
> 			n_bad=`expr $n_bad + 1`
> 			tc_status="fail"
> 		fi
> 
> And "git blame" shows that this part of check hasn't changed since
> 2018, and I'm on the latest upstream version of xfstests:
> 
> commit 568ac9fffeb6afec03e5d6c9936617232fd7fc6d (HEAD, tag: v2022.06.05, origin/master, origin/HEAD, kernel/master)
> Author: Dave Chinner <dchinner@redhat.com>
> Date:   Fri Jun 3 11:54:13 2022 +1000
> 
>     xfs/189: systemd monitoring of /etc/fstab sucks
> 
> 
> Was your patch based on xfstests with some out-of-tree patches?
> 
> > The caveats are:
> > - rerun tests will be tracked as a single failure in @try and @bad
> >   + xunit reports do not include any rerun details
> > - .bad files generated on failure will be overwritten by test reruns
> > 
> > For xunit reports, I think it'll make sense to stash the aggregates in a
> > separate <test>.agg-results file or something. Similarly for .bad file
> > overwrites, I could add a .<rerun #> suffix for capturing all failure
> > data.
> 
> For the xunit results file, I was assuming that we would simply have
> multiple repeated testcase entries stored in the single results.xml
> file.  For example:
> 
> <testcase classname="xfstests.global" name="generic/476" time="354">
> 		<failure message="Test  failed, reason unknown" type="TestFail" />
> 		<system-out>
> 		...
> 	</testcase>
> <testcase classname="xfstests.global" name="generic/476" time="343">
> 	</testcase>
> <testcase classname="xfstests.global" name="generic/476" time="353">
> 	</testcase>
> 	...
> 
> I don't know that we need a separate file for the rerun tests, since
> it's not that hard to create a Python script which parses the results
> and calculates the pass/fail percentages for any test which is run
> multiple times.
> 
> 
> As far as having the .bad and .full files goes, I agree that some kind
> of .rerun-NN suffix would make a lot of sense.
> 
> Cheers,
> 
> 						- Ted
>
David Disseldorp June 24, 2022, 8:32 a.m. UTC | #3
Thanks for the feedback, Ted...

On Fri, 24 Jun 2022 00:42:03 -0400, Theodore Ts'o wrote:

> On Tue, Jun 21, 2022 at 06:01:51PM +0200, David Disseldorp wrote:
> > This RFC patchset adds support to loop on failed tests, as proposed by
> > Ted Ts'o in https://lwn.net/Articles/897061/:
> >   add a mode that will immediately rerun a failed test 25 or 100 times
> >   to establish a failure percentage.
> > 
> > There are a couple of things which I'd like to resolve before dropping
> > the RFC flag, but would appreciate early feedback on the approach here.  
> 
> This is really exciting!  I was hoping to try it out, but the first
> patch doesn't apply to origin/master on xfstests-dev.
> 
> For example this patch hunk:
> 
> @@ -729,9 +750,7 @@ function run_section()
>  	prev_seq=""
>  	for seq in $list ; do
>  		# Run report for previous test!
> -		if [ "$tc_status" == "fail" ]; then
> -			bad+=("$seqnum")
> -		fi
> +		_stash_test_status "$seqnum" "$tc_status"
>  		if $do_report && [[ ! $tc_status =~ ^(init|expunge)$ ]]; then
>  			_make_testcase_report "$prev_seq" "$tc_status"
>  		fi
> 
> The relevant section of check looks like this:
> 
> 		# Run report for previous test!
> 		if $err ; then
> 			bad="$bad $seqnum"
> 			n_bad=`expr $n_bad + 1`
> 			tc_status="fail"
> 		fi
> 
> And "git blame" shows that this part of check hasn't changed since
> 2018, and I'm on the latest upstream version of xfstests:
> 
> commit 568ac9fffeb6afec03e5d6c9936617232fd7fc6d (HEAD, tag: v2022.06.05, origin/master, origin/HEAD, kernel/master)
> Author: Dave Chinner <dchinner@redhat.com>
> Date:   Fri Jun 3 11:54:13 2022 +1000
> 
>     xfs/189: systemd monitoring of /etc/fstab sucks
> 
> 
> Was your patch based on xfstests with some out-of-tree patches?

Yes, I forgot to mention that, sorry. As Zorro indicated, these were
done atop the v2022.06.12 tag with the following series applied:
https://lore.kernel.org/fstests/20220620192934.21694-1-ddiss@suse.de/

> > The caveats are:
> > - rerun tests will be tracked as a single failure in @try and @bad
> >   + xunit reports do not include any rerun details
> > - .bad files generated on failure will be overwritten by test reruns
> > 
> > For xunit reports, I think it'll make sense to stash the aggregates in a
> > separate <test>.agg-results file or something. Similarly for .bad file
> > overwrites, I could add a .<rerun #> suffix for capturing all failure
> > data.  
> 
> For the xunit results file, I was assuming that we would simply have
> multiple repeated testcase entries stored in the single results.xml
> file.  For example:
> 
> <testcase classname="xfstests.global" name="generic/476" time="354">
> 		<failure message="Test  failed, reason unknown" type="TestFail" />
> 		<system-out>
> 		...
> 	</testcase>
> <testcase classname="xfstests.global" name="generic/476" time="343">
> 	</testcase>
> <testcase classname="xfstests.global" name="generic/476" time="353">
> 	</testcase>
> 	...

That seems sensible, I'll add this functionality.

> I don't know that we need a separate file for the rerun tests, since
> it's not that hard to create a Python script which parses the results
> and calculates the pass/fail percentages for any test which is run
> multiple times.

It's just a string, so it doesn't need to be in a separate file. I'll
add an extra parameter to _make_testcase_report so that it can be
stashed somewhere like <testcase ... status="$aggregate_stats"> on the
last rerun.
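
E.g. the aggregate could then be pulled back out of results.xml with a
couple of lines like this (rough sketch; "status" is only the placeholder
attribute name from above):

import xml.etree.ElementTree as ET

# Rough sketch: print the hypothetical aggregate status="..." attribute
# for each rerun testcase entry.
for tc in ET.parse("results.xml").getroot().iter("testcase"):
    agg = tc.get("status")
    if agg is not None:
        print(tc.get("name"), agg)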

> As far as having the .bad and .full files goes, I agree that some kind
> of .rerun-NN suffix would make a lot of sense.

I had a bit of a play with this and it does get a bit ugly if we want to
prefix things like .dmesg as well. The xunit rerun entries will already
capture everything, but I suppose it's still needed for those not using
xunit reports.

Cheers, David
Theodore Ts'o June 24, 2022, 3:18 p.m. UTC | #4
On Fri, Jun 24, 2022 at 10:32:43AM +0200, David Disseldorp wrote:
> Yes, I forgot to mention that, sorry. As Zorro indicated, these were
> done atop the v2022.06.12 tag with the following series applied:
> https://lore.kernel.org/fstests/20220620192934.21694-1-ddiss@suse.de/

Got it, thanks.  Sorry, I had forgotten that we had the next branch now.

I'll try to do a full review once I'm able to give the patches a spin.

> > <testcase classname="xfstests.global" name="generic/476" time="354">
> > 		<failure message="Test  failed, reason unknown" type="TestFail" />
> > 		<system-out>
> > 		...
> > 	</testcase>
> > <testcase classname="xfstests.global" name="generic/476" time="343">
> > 	</testcase>
> > <testcase classname="xfstests.global" name="generic/476" time="353">
> > 	</testcase>
> > 	...
> 
> That seems sensible, I'll add this functionality.

I *think* that should happen automatically when _make_testcase_report
gets called after each iteration.  So that might be easier than having
to do any kind of special case handling.  (That's how I was planning on
tackling it myself before you went ahead and implemented it --- thanks
for that!!)

> > As far as having the .bad and .full files goes, I agree that some
> > kind of .rerun-NN suffix would make a lot of sense.
> 
> I had a bit of a play with this and it does get a bit ugly if we want to
> prefix things like .dmesg as well. The xunit rerun entries will already
> capture everything, but I suppose it's still needed for those not using
> xunit reports.

Well, actually, one of the things on my TODO list was to implement a
new report type which would remove the xunit <system-out> fields from
the xunit file.  The reason behind that is that sometimes the
NNN.out.bad files can get huge --- and the Python library for parsing
junit XML files has a safety mechanism which will error out if a field
is larger than 10MB, to prevent some denial of service attacks.  And
I've had some XFS NNN.out.bad files get to be 30MB or larger!

When that happens, it causes the Python script I use to parse the XML
file to fail.  In addition, since I already have a different mechanism
for saving the full set of test artifacts --- sometimes having the
NNN.full file is really useful for root-causing the failure --- having
two copies of the out.bad files in both the xunit file and in my test
artifacts tarball is a bit of a waste.

I had a POC which implemented this, but then Darrick had a feature
request: for his workflow, it would be useful if it saved only the
first N lines and last N lines in the xunit file, since that's
typically sufficient to figure out what's going on.  And I haven't had
a chance to get back to it.
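
Roughly what I had in mind for the trimming, as a sketch (the helper
name and N are arbitrary, not anything that exists yet):

# Rough sketch: keep only the first and last N lines of a large
# NNN.out.bad capture before embedding it in <system-out>.
def trim_capture(text, n=50):
    lines = text.splitlines()
    if len(lines) <= 2 * n:
        return text
    elided = len(lines) - 2 * n
    return "\n".join(lines[:n] + [f"... [{elided} lines elided] ..."] + lines[-n:])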

						- Ted
David Disseldorp June 27, 2022, 10:02 p.m. UTC | #5
On Fri, 24 Jun 2022 11:18:58 -0400, Theodore Ts'o wrote:

> On Fri, Jun 24, 2022 at 10:32:43AM +0200, David Disseldorp wrote:
> > Yes, I forgot to mention that, sorry. As Zorro indicated, these were
> > done atop the v2022.06.12 tag with the following series applied:
> > https://lore.kernel.org/fstests/20220620192934.21694-1-ddiss@suse.de/  
> 
> Got it, thanks.  Sorry, I had forgotten that we had the next branch now.
> 
> I'll try to do a full review once I'm able to give the patches a spin.
> 
> > > <testcase classname="xfstests.global" name="generic/476" time="354">
> > > 		<failure message="Test  failed, reason unknown" type="TestFail" />
> > > 		<system-out>
> > > 		...
> > > 	</testcase>
> > > <testcase classname="xfstests.global" name="generic/476" time="343">
> > > 	</testcase>
> > > <testcase classname="xfstests.global" name="generic/476" time="353">
> > > 	</testcase>
> > > 	...  
> > 
> > That seems sensible, I'll add this functionality.  
> 
> I *think* that should happen automatically when _make_testcase_report
> gets called after each iteration.  So that might be easier than having
> to do any kind of special case handling.  (That's how I was planning on
> tackling it myself before you went ahead and implemented it --- thanks
> for that!!)

It does, but I've messed around with a few things in that code path, so
I just need to make sure that this works as expected :). It should be
working this way in the v2 patchset that I'm about to send...

> > > As far as having the .bad and .full files goes, I agree that some
> > > kind of .rerun-NN suffix would make a lot of sense.
> > 
> > I had a bit of a play with this and it does get a bit ugly if we want to
> > prefix things like .dmesg as well. The xunit rerun entries will already
> > capture everything, but I suppose it's still needed for those not using
> > xunit reports.  
> 
> Well, actually, one of the things on my TODO list was to implement a
> new report type which would remove the xunit <system-out> fields from
> the xunit file.  The reason behind that is that sometimes the
> NNN.out.bad files can get huge --- and the Python library for parsing
> junit XML files has a safety mechanism which will error out if a field
> is larger than 10MB, to prevent some denial of service attacks.  And
> I've had some XFS NNN.out.bad files get to be 30MB or larger!

Ouch, that does sound hard to parse. One thing I also noticed is that a
stray "]]>" CDATA terminator in any of the captured content will likely
also cause some parsing headaches, so it should be filtered.
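
E.g. something like the usual CDATA splitting trick should work (rough
sketch, not something the patchset does yet):

# Rough sketch: a literal "]]>" inside captured output would terminate
# the CDATA section early, so split it across two CDATA sections instead.
def cdata_wrap(text):
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"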

> When that happens, it causes the Python script I use to parse the XML
> file to fail.  In addition, since I already have a different mechanism
> for saving the full set of test artifacts --- sometimes having the
> NNN.full file is really useful for root-causing the failure --- having
> two copies of the out.bad files in both the xunit file and in my test
> artifacts tarball is a bit of a waste.
> 
> I had a POC which implemented this, but then Darrick had a feature
> request: for his workflow, it would be useful if it saved only the
> first N lines and last N lines in the xunit file, since that's
> typically sufficient to figure out what's going on.  And I haven't had
> a chance to get back to it.

Given that the extra debugging details are already there in the current
xunit output, I think we may be stuck with them. That said, it should be
pretty straightforward to add a new "xunit-brief" or similar report type
under the (currently single-purpose) report API.

Cheers, David