test-lib: make '--stress' more bisect-friendly

Message ID 20190208115045.13256-1-szeder.dev@gmail.com
State New
Series
  • test-lib: make '--stress' more bisect-friendly

Commit Message

SZEDER Gábor Feb. 8, 2019, 11:50 a.m. UTC
Let's suppose that a test somehow becomes flaky between 'master' and
'pu', and tends to fail within the first 50 repetitions when run with
'--stress'.  In such a case we could use 'git bisect' to find the
culprit: if the test script fails with '--stress', then the commit is
definitely bad, but if it survives, say, 300 repetitions, then we could
consider it good with reasonable confidence.

Unfortunately, all this could only be done manually, because
'--stress' would run the test script repeatedly for all eternity on a
good commit, and it would exit with success even when it found a
failure on a bad commit.

So let's make '--stress' usable with 'git bisect run':

  - Make it exit with failure if a failure is found.

  - Add the '--stress-limit=<N>' option to repeat the test script
    at most N times in each of the parallel jobs, and exit with
    success when the limit is reached.
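
Both changes can be illustrated with a minimal single-job sketch. The function name `stress_limited` is hypothetical, and the sketch deliberately leaves out what the real loop in test-lib.sh also does: running parallel jobs, backgrounding each run, and recording failures in a shared "$stressfail" file. It only shows the two behaviours above: it stops with a non-zero status at the first failure, and it succeeds once the limit is reached. An empty limit means "repeat forever", as without '--stress-limit'.

```shell
# stress_limited LIMIT CMD [ARGS...]
# Hypothetical single-job sketch of the patched '--stress' loop.
stress_limited () {
	limit=$1
	shift
	cnt=1	# 1-indexed, so LIMIT=300 ends at "1.300"
	while test -z "$limit" || test "$cnt" -le "$limit"
	do
		if "$@" >/dev/null 2>&1
		then
			echo "OK 1.$cnt"
		else
			echo "FAIL 1.$cnt"
			return 1	# a found failure is now reported as failure
		fi
		cnt=$((cnt + 1))
	done
	return 0	# limit reached without a failure: treat as good
}
```

For example, `stress_limited 300 ./t1234-foo.sh` would print up to 300 "OK" lines and return 0, or stop at the first "FAIL" line and return 1.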

And then we could simply run something like:

  $ git bisect start origin/pu master
  $ git bisect run sh -c 'make && cd t &&
                          ./t1234-foo.sh --stress --stress-limit=300'
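
'git bisect run' interprets the command's exit status as follows: 0 means good, 125 means skip, any other value from 1 to 127 means bad, and anything above 127 aborts the bisection. In the one-liner above, a commit that fails to build is therefore marked bad, because 'sh -c' exits with make's non-zero status. The hypothetical helper below is a sketch of one way to avoid that. It skips unbuildable commits and folds any test failure, including a status above 127 from a run killed by a signal, into a plain "bad".

```shell
# bisect_step BUILD_CMD TEST_CMD
# Hypothetical helper mapping outcomes onto 'git bisect run' exit codes:
#   0 -> good, 1 -> bad, 125 -> skip (commit cannot be tested)
bisect_step () {
	# A commit that does not build says nothing about flakiness: skip it.
	sh -c "$1" >/dev/null 2>&1 || return 125
	# Report every test failure as plain "bad"; a status above 127
	# (e.g. death by signal) would otherwise abort the bisection.
	if sh -c "$2"
	then
		return 0
	fi
	return 1
}
```

A script given to 'git bisect run' could define this and end with `bisect_step make 'cd t && ./t1234-foo.sh --stress --stress-limit=300'`, so that the script's exit status is the helper's.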

Sure, as a brand new feature it won't be of much use right now, but in
a release or three most cooking topics will already contain it, so
we could automatically bisect at least newly introduced flakiness.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---

This is a case when an external stress script works better, as it can
easily check commits in the past...  if someone has such a script,
that is.

Anyway, the approach works:

  https://public-inbox.org/git/20190129213533.GE13764@szeder.dev/
  https://public-inbox.org/git/20190208113059.GV10587@szeder.dev/

 t/README      |  5 +++++
 t/test-lib.sh | 18 ++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

Comments

Jeff King Feb. 8, 2019, 4:47 p.m. UTC | #1
On Fri, Feb 08, 2019 at 12:50:45PM +0100, SZEDER Gábor wrote:

>   - Make it exit with failure if a failure is found.
> 
>   - Add the '--stress-limit=<N>' option to repeat the test script
>     at most N times in each of the parallel jobs, and exit with
>     success when the limit is reached.
> [...]
> 
> This is a case when an external stress script works better, as it can
> easily check commits in the past...  if someone has such a script,
> that is.

Heh, I literally just implemented this kind of max-count in my own
"stress" script[1] to handle this recent t0025 testing. So certainly I
think it is a good idea.

Picking an <N> is tough. Too low and you get a false negative, too high
and you can wait forever, especially if the script is long. But I don't
think there's any real way to auto-scale it, except by seeing a few of
the failing cases and watching how long they take.

>  t/README      |  5 +++++
>  t/test-lib.sh | 18 ++++++++++++++++--
>  2 files changed, 21 insertions(+), 2 deletions(-)

Patch looks good. A few observations:

> @@ -237,8 +248,10 @@ then
>  				exit 1
>  			' TERM INT
>  
> -			cnt=0
> -			while ! test -e "$stressfail"
> +			cnt=1
> +			while ! test -e "$stressfail" &&
> +			      { test -z "$stress_limit" ||
> +				test $cnt -le $stress_limit ; }
>  			do
>  				$TEST_SHELL_PATH "$0" "$@" >"$TEST_RESULTS_BASE.stress-$job_nr.out" 2>&1 &
>  				test_pid=$!

You switch to 1-indexing the counts here. I think that makes sense,
since otherwise --stress-limit=300 would end at "1.299", etc.

> @@ -261,6 +274,7 @@ then
>  
>  	if test -f "$stressfail"
>  	then
> +		stress_exit=1
>  		echo "Log(s) of failed test run(s):"
>  		for failed_job_nr in $(sort -n "$stressfail")
>  		do

I think I'd argue that this missing stress_exit is a bug in the original
script, and somewhat orthogonal to the limit counter. But I don't think
it's worth the trouble to split it out (and certainly the theme of "now
you can run this via bisect" unifies the two changes).

-Peff
Jeff King Feb. 8, 2019, 4:49 p.m. UTC | #2
On Fri, Feb 08, 2019 at 11:47:33AM -0500, Jeff King wrote:

> > This is a case when an external stress script works better, as it can
> > easily check commits in the past...  if someone has such a script,
> > that is.
> 
> Heh, I literally just implemented this kind of max-count in my own
> "stress" script[1] to handle this recent t0025 testing. So certainly I
> think it is a good idea.

As usual, I forgot my footnote. ;)

It was:

  I've actually mostly given up my stress script in favor of --stress. I
  only used it here because of the bisection issue you mention.

  One other thing I've noticed with it: I forget to add my custom
  --root=/var/ram/git-tests when I invoke it, so my hard disk goes
  crazy (and the tests often run slower!). I'm not sure if there's a
  convenient fix.

-Peff
SZEDER Gábor Feb. 8, 2019, 6:23 p.m. UTC | #3
On Fri, Feb 08, 2019 at 11:47:33AM -0500, Jeff King wrote:
> On Fri, Feb 08, 2019 at 12:50:45PM +0100, SZEDER Gábor wrote:
> 
> >   - Make it exit with failure if a failure is found.
> > 
> >   - Add the '--stress-limit=<N>' option to repeat the test script
> >     at most N times in each of the parallel jobs, and exit with
> >     success when the limit is reached.
> > [...]
> > 
> > This is a case when an external stress script works better, as it can
> > easily check commits in the past...  if someone has such a script,
> > that is.
> 
> Heh, I literally just implemented this kind of max-count in my own
> "stress" script[1] to handle this recent t0025 testing. So certainly I
> think it is a good idea.
> 
> Picking an <N> is tough. Too low and you get a false negative, too high
> and you can wait forever, especially if the script is long. But I don't
> think there's any real way to auto-scale it, except by seeing a few of
> the failing cases and watching how long they take.

So far I've chosen <N> like this: run the test script with --stress
3-5 times to trigger the failure, take the highest repetition count
that was necessary for the failure, multiply it by 4-6 to get a round
number, and that's a good ballpark for <N>.  And once bisect came up
with the suspect commit, I double checked it by letting the test
script run with --stress on its parent commit for at least 5-10x <N>
repetitions.
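
The rule of thumb above is simple arithmetic, sketched here as a hypothetical helper. Rounding up to the next multiple of 50 is an arbitrary stand-in for "a round number". The factor of 4-6 comes from the description above.

```shell
# pick_stress_limit FACTOR COUNT...
# Hypothetical helper: take the highest repetition count that was needed
# to trigger the failure, multiply it by FACTOR, and round the result up
# to the next multiple of 50 (an assumed "round number").
pick_stress_limit () {
	factor=$1
	shift
	max=0
	for count in "$@"
	do
		if test "$count" -gt "$max"
		then
			max=$count
		fi
	done
	n=$((max * factor))
	echo $(( (n + 49) / 50 * 50 ))
}
```

For example, if three '--stress' runs needed 23, 41 and 37 repetitions to fail, a factor of 5 gives 205, which rounds up to 250. The follow-up check on the suspect's parent commit would then run for at least 1250 repetitions (5x).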

Anyway, I doubt that auto-scaling <N> is worth the effort.

> >  t/README      |  5 +++++
> >  t/test-lib.sh | 18 ++++++++++++++++--
> >  2 files changed, 21 insertions(+), 2 deletions(-)
> 
> Patch looks good. A few observations:
> 
> > @@ -237,8 +248,10 @@ then
> >  				exit 1
> >  			' TERM INT
> >  
> > -			cnt=0
> > -			while ! test -e "$stressfail"
> > +			cnt=1
> > +			while ! test -e "$stressfail" &&
> > +			      { test -z "$stress_limit" ||
> > +				test $cnt -le $stress_limit ; }
> >  			do
> >  				$TEST_SHELL_PATH "$0" "$@" >"$TEST_RESULTS_BASE.stress-$job_nr.out" 2>&1 &
> >  				test_pid=$!
> 
> You switch to 1-indexing the counts here. I think that makes sense,
> since otherwise --stress-limit=300 would end at "1.299", etc.

Yeah, that's exactly why I did it.

> 
> > @@ -261,6 +274,7 @@ then
> >  
> >  	if test -f "$stressfail"
> >  	then
> > +		stress_exit=1
> >  		echo "Log(s) of failed test run(s):"
> >  		for failed_job_nr in $(sort -n "$stressfail")
> >  		do
> 
> I think I'd argue that this missing stress_exit is a bug in the original
> script,

Well, yes, indeed.

Though being able to trigger an elusive test failure is a success in
my book ;)

> and somewhat orthogonal to the limit counter. But I don't think
> it's worth the trouble to split it out (and certainly the theme of "now
> you can run this via bisect" unifies the two changes).
> 
> -Peff
SZEDER Gábor Feb. 8, 2019, 6:33 p.m. UTC | #4
On Fri, Feb 08, 2019 at 11:49:37AM -0500, Jeff King wrote:
>   One other thing I've noticed with it: I forget to add my custom
>   --root=/var/ram/git-tests when I invoke it, so my hard disk goes
>   crazy (and the tests often run slower!). I'm not sure if there's a
>   convenient fix.

OTOH, that could introduce more variance in the timing of the test's
commands, thus potentially increasing the chances of a failure.  I
dunno.

Maybe ./t1234-foo.sh should learn to respect DEFAULT_TEST_OPTS
somehow?
Jeff King Feb. 8, 2019, 7:11 p.m. UTC | #5
On Fri, Feb 08, 2019 at 07:23:19PM +0100, SZEDER Gábor wrote:

> > Picking an <N> is tough. Too low and you get a false negative, too high
> > and you can wait forever, especially if the script is long. But I don't
> > think there's any real way to auto-scale it, except by seeing a few of
> > the failing cases and watching how long they take.
> 
> So far I've chosen <N> like this: run the test script with --stress
> 3-5 times to trigger the failure, take the highest repetition count
> that was necessary for the failure, multiply it by 4-6 to get a round
> number, and that's a good ballpark for <N>.  And once bisect came up
> with the suspect commit, I double checked it by letting the test
> script run with --stress on its parent commit for at least 5-10x <N>
> repetitions.

Heh. That's exactly my process, too. :)

> Anyway, I doubt that auto-scaling <N> is worth the effort.

Yeah, especially because as a concept it exists outside of the script
itself (i.e., you have to checkout a failing version and then run the
script a bunch of times; that's not something that test-lib.sh should
even know about).

So let's go with this for now. It's already a much nicer tool than we
had yesterday, so we can take some time to get used to it.

-Peff
Jeff King Feb. 8, 2019, 7:12 p.m. UTC | #6
On Fri, Feb 08, 2019 at 07:33:07PM +0100, SZEDER Gábor wrote:

> On Fri, Feb 08, 2019 at 11:49:37AM -0500, Jeff King wrote:
> >   One other thing I've noticed with it: I forget to add my custom
> >   --root=/var/ram/git-tests when I invoke it, so my hard disk goes
> >   crazy (and the tests often run slower!). I'm not sure if there's a
> >   convenient fix.
> 
> OTOH, that could introduce more variance in the timing of the test's
> commands, thus potentially increasing the chances of a failure.  I
> dunno.
> 
> Maybe ./t1234-foo.sh should learn to respect DEFAULT_TEST_OPTS
> somehow?

Yeah, that was what I was thinking. On the other hand, I'd actually find
that a little bit annoying for the non-stress case. I commonly do
"./t1234-foo.sh" in order to dig into a specific breakage, and having
the failing trash directory right there is convenient (and I don't care
as much about speed, since I'm just running it once).

I may just gut my "stress" script and make it a wrapper for calling
the script with "--stress --root=...". :)

-Peff
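
The wrapper Peff mentions could be as small as the following sketch. The function name `stress` and the `STRESS_ROOT` variable are assumptions. The default path is the RAM-backed directory from his message, and `--root` is the existing test-lib option he refers to.

```shell
# stress SCRIPT [OPTIONS...]
# Hypothetical wrapper: always run a test script under '--stress' with
# its trash directories on a RAM-backed filesystem, so the option cannot
# be forgotten. STRESS_ROOT is an assumed override variable.
stress () {
	script=$1
	shift
	"$script" --stress --root="${STRESS_ROOT:-/var/ram/git-tests}" "$@"
}
```

Usage would then be, e.g., `stress ./t1234-foo.sh --stress-limit=300`, while a plain `./t1234-foo.sh` keeps its trash directory in the usual place for debugging.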

Patch

diff --git a/t/README b/t/README
index 11ce7675e3..3aed321248 100644
--- a/t/README
+++ b/t/README
@@ -202,6 +202,11 @@  appropriately before running "make".
 	'.stress-<nr>' suffix, and the trash directory of the failed
 	test job is renamed to end with a '.stress-failed' suffix.
 
+--stress-limit=<N>::
+	When combined with --stress, run the test script repeatedly
+	this many times in each of the parallel jobs or until one of
+	them fails, whichever comes first.
+
 You can also set the GIT_TEST_INSTALLED environment variable to
 the bindir of an existing git installation to test that installation.
 You still need to have built this git sandbox, from which various
diff --git a/t/test-lib.sh b/t/test-lib.sh
index a1abb1177a..77eff04c92 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -152,6 +152,17 @@  do
 			;;
 		esac
 		;;
+	--stress-limit=*)
+		stress_limit=${opt#--*=}
+		case "$stress_limit" in
+		*[^0-9]*|0*|"")
+			echo "error: --stress-limit=<N> requires the number of repetitions" >&2
+			exit 1
+			;;
+		*)	# Good.
+			;;
+		esac
+		;;
 	*)
 		echo "error: unknown test option '$opt'" >&2; exit 1 ;;
 	esac
@@ -237,8 +248,10 @@  then
 				exit 1
 			' TERM INT
 
-			cnt=0
-			while ! test -e "$stressfail"
+			cnt=1
+			while ! test -e "$stressfail" &&
+			      { test -z "$stress_limit" ||
+				test $cnt -le $stress_limit ; }
 			do
 				$TEST_SHELL_PATH "$0" "$@" >"$TEST_RESULTS_BASE.stress-$job_nr.out" 2>&1 &
 				test_pid=$!
@@ -261,6 +274,7 @@  then
 
 	if test -f "$stressfail"
 	then
+		stress_exit=1
 		echo "Log(s) of failed test run(s):"
 		for failed_job_nr in $(sort -n "$stressfail")
 		do
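
The option parsing in the first test-lib.sh hunk works in two steps. First, `${opt#--*=}` strips the shortest prefix ending in `=`. Then the value is rejected unless it is a positive decimal number:

- `*[^0-9]*` catches non-digits.
- `0*` catches zero and leading zeros.
- `""` catches an empty value.

The standalone sketch below (the function name is hypothetical) repeats that check. It uses the POSIX negation `[!0-9]`, because POSIX leaves a leading `^` in shell bracket expressions unspecified, even though common shells accept it.

```shell
# parse_stress_limit OPTION
# Hypothetical standalone version of the '--stress-limit=<N>' check:
# print N if it is a positive decimal integer without leading zeros.
parse_stress_limit () {
	limit=${1#--*=}	# "--stress-limit=300" -> "300"
	case "$limit" in
	*[!0-9]*|0*|"")
		echo "error: --stress-limit=<N> requires the number of repetitions" >&2
		return 1
		;;
	esac
	echo "$limit"
}
```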