Message ID | 20190208115045.13256-1-szeder.dev@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | test-lib: make '--stress' more bisect-friendly |
On Fri, Feb 08, 2019 at 12:50:45PM +0100, SZEDER Gábor wrote:

>   - Make it exit with failure if a failure is found.
>
>   - Add the '--stress-limit=<N>' option to repeat the test script
>     at most N times in each of the parallel jobs, and exit with
>     success when the limit is reached.
> [...]
>
> This is a case when an external stress script works better, as it can
> easily check commits in the past...  if someone has such a script,
> that is.

Heh, I literally just implemented this kind of max-count in my own
"stress" script[1] to handle this recent t0025 testing. So certainly I
think it is a good idea.

Picking an <N> is tough. Too low and you get a false negative, too high
and you can wait forever, especially if the script is long. But I don't
think there's any real way to auto-scale it, except by seeing a few of
the failing cases and watching how long they take.

>  t/README      |  5 +++++
>  t/test-lib.sh | 18 ++++++++++++++++--
>  2 files changed, 21 insertions(+), 2 deletions(-)

Patch looks good. A few observations:

> @@ -237,8 +248,10 @@ then
>  		exit 1
>  	' TERM INT
>  
> -	cnt=0
> -	while ! test -e "$stressfail"
> +	cnt=1
> +	while ! test -e "$stressfail" &&
> +	      { test -z "$stress_limit" ||
> +		test $cnt -le $stress_limit ; }
>  	do
>  		$TEST_SHELL_PATH "$0" "$@" >"$TEST_RESULTS_BASE.stress-$job_nr.out" 2>&1 &
>  		test_pid=$!

You switch to 1-indexing the counts here. I think that makes sense,
since otherwise --stress-limit=300 would end at "1.299", etc.

> @@ -261,6 +274,7 @@ then
>  
>  	if test -f "$stressfail"
>  	then
> +		stress_exit=1
>  		echo "Log(s) of failed test run(s):"
>  		for failed_job_nr in $(sort -n "$stressfail")
>  		do

I think I'd argue that this missing stress_exit is a bug in the original
script, and somewhat orthogonal to the limit counter. But I don't think
it's worth the trouble to split it out (and certainly the theme of "now
you can run this via bisect" unifies the two changes).

-Peff
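
To make the 1-indexing point concrete, here is a simplified model of a single stress job's loop. It is only a sketch: `run_once` and `count_runs` are hypothetical names, `run_once` stands in for running the test script and always succeeds, and the parallel jobs and the "$stressfail" check from test-lib.sh are left out.

```shell
#!/bin/sh
# Simplified model of one stress job's loop (not the real test-lib.sh code).
# run_once is a hypothetical stand-in for running the test script.
run_once () { :; }

# count_runs <limit>: repeat run_once with the patched loop condition and
# report how many repetitions ran and the number of the last one.
# Note: an empty <limit> means "no limit" and would loop forever here.
count_runs () {
	stress_limit=$1
	cnt=1
	runs=0
	last=0
	while test -z "$stress_limit" || test "$cnt" -le "$stress_limit"
	do
		run_once || break
		last=$cnt
		runs=$((runs + 1))
		cnt=$((cnt + 1))
	done
	echo "runs=$runs last=$last"
}

count_runs 300
```

Starting at cnt=1 with a `-le` comparison means the last repetition carries the same number as the limit, which is the behavior described above.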
On Fri, Feb 08, 2019 at 11:47:33AM -0500, Jeff King wrote:

> > This is a case when an external stress script works better, as it can
> > easily check commits in the past...  if someone has such a script,
> > that is.
>
> Heh, I literally just implemented this kind of max-count in my own
> "stress" script[1] to handle this recent t0025 testing. So certainly I
> think it is a good idea.

As usual, I forgot my footnote. ;) It was:

I've actually mostly given up my stress script in favor of --stress. I
only used it here because of the bisection issue you mention.

One other thing I've noticed with it: I forget to add my custom
--root=/var/ram/git-tests when I invoke it, so my hard disk goes
crazy (and the tests often run slower!). I'm not sure if there's a
convenient fix.

-Peff
On Fri, Feb 08, 2019 at 11:47:33AM -0500, Jeff King wrote:
> On Fri, Feb 08, 2019 at 12:50:45PM +0100, SZEDER Gábor wrote:
>
> >   - Make it exit with failure if a failure is found.
> >
> >   - Add the '--stress-limit=<N>' option to repeat the test script
> >     at most N times in each of the parallel jobs, and exit with
> >     success when the limit is reached.
> > [...]
> >
> > This is a case when an external stress script works better, as it can
> > easily check commits in the past...  if someone has such a script,
> > that is.
>
> Heh, I literally just implemented this kind of max-count in my own
> "stress" script[1] to handle this recent t0025 testing. So certainly I
> think it is a good idea.
>
> Picking an <N> is tough. Too low and you get a false negative, too high
> and you can wait forever, especially if the script is long. But I don't
> think there's any real way to auto-scale it, except by seeing a few of
> the failing cases and watching how long they take.

So far I've chosen <N> like this: run the test script with --stress
3-5 times to trigger the failure, take the highest repetition count
that was necessary for the failure, multiply it by 4-6 to get a round
number, and that's a good ballpark for <N>.  And once bisect came up
with the suspect commit, I double checked it by letting the test
script run with --stress on its parent commit for at least 5-10x <N>
repetitions.

Anyway, I doubt that auto-scaling <N> is worth the effort.

> >  t/README      |  5 +++++
> >  t/test-lib.sh | 18 ++++++++++++++++--
> >  2 files changed, 21 insertions(+), 2 deletions(-)
>
> Patch looks good. A few observations:
>
> > @@ -237,8 +248,10 @@ then
> >  		exit 1
> >  	' TERM INT
> >  
> > -	cnt=0
> > -	while ! test -e "$stressfail"
> > +	cnt=1
> > +	while ! test -e "$stressfail" &&
> > +	      { test -z "$stress_limit" ||
> > +		test $cnt -le $stress_limit ; }
> >  	do
> >  		$TEST_SHELL_PATH "$0" "$@" >"$TEST_RESULTS_BASE.stress-$job_nr.out" 2>&1 &
> >  		test_pid=$!
>
> You switch to 1-indexing the counts here. I think that makes sense,
> since otherwise --stress-limit=300 would end at "1.299", etc.

Yeah, that's exactly why I did it.

> > @@ -261,6 +274,7 @@ then
> >  
> >  	if test -f "$stressfail"
> >  	then
> > +		stress_exit=1
> >  		echo "Log(s) of failed test run(s):"
> >  		for failed_job_nr in $(sort -n "$stressfail")
> >  		do
>
> I think I'd argue that this missing stress_exit is a bug in the original
> script,

Well, yes, indeed.  Though being able to trigger an elusive test
failure is a success in my book ;)

> and somewhat orthogonal to the limit counter. But I don't think
> it's worth the trouble to split it out (and certainly the theme of "now
> you can run this via bisect" unifies the two changes).
>
> -Peff
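
The rule of thumb for picking <N> described above can be written down as a small helper. This is only a sketch: `suggest_stress_limit` is a hypothetical name, not something test-lib.sh provides, and the sample counts are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of test-lib.sh: suggest a --stress-limit
# value from the repetition counts at which earlier --stress runs failed.
# Usage: suggest_stress_limit <factor> <count>...
suggest_stress_limit () {
	factor=$1
	shift
	max=0
	# Find the highest repetition count that was needed to see a failure.
	for count in "$@"
	do
		if test "$count" -gt "$max"
		then
			max=$count
		fi
	done
	# Multiply by the safety factor (4-6 in the description above).
	echo $((max * factor))
}

# Made-up example: three trial runs failed at repetitions 12, 47 and 31.
# With a factor of 6 this prints 282, which one might round up to 300.
suggest_stress_limit 6 12 47 31
```

Rounding to a "round number" and the 5-10x <N> double check on the parent commit are left to the person running the bisection.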
On Fri, Feb 08, 2019 at 11:49:37AM -0500, Jeff King wrote:
> One other thing I've noticed with it: I forget to add my custom
> --root=/var/ram/git-tests when I invoke it, so my hard disk goes
> crazy (and the tests often run slower!). I'm not sure if there's a
> convenient fix.

OTOH, that could introduce more variance in the timing of the test's
commands, thus potentially increasing the chances of a failure.  I
dunno.

Maybe ./t1234-foo.sh should learn to respect DEFAULT_TEST_OPTS
somehow?
On Fri, Feb 08, 2019 at 07:23:19PM +0100, SZEDER Gábor wrote:

> > Picking an <N> is tough. Too low and you get a false negative, too high
> > and you can wait forever, especially if the script is long. But I don't
> > think there's any real way to auto-scale it, except by seeing a few of
> > the failing cases and watching how long they take.
>
> So far I've chosen <N> like this: run the test script with --stress
> 3-5 times to trigger the failure, take the highest repetition count
> that was necessary for the failure, multiply it by 4-6 to get a round
> number, and that's a good ballpark for <N>.  And once bisect came up
> with the suspect commit, I double checked it by letting the test
> script run with --stress on its parent commit for at least 5-10x <N>
> repetitions.

Heh. That's exactly my process, too. :)

> Anyway, I doubt that auto-scaling <N> is worth the effort.

Yeah, especially because as a concept it exists outside of the script
itself (i.e., you have to checkout a failing version and then run the
script a bunch of times; that's not something that test-lib.sh should
even know about).

So let's go with this for now. It's already a much nicer tool than we
had yesterday, so we can take some time to get used to it.

-Peff
On Fri, Feb 08, 2019 at 07:33:07PM +0100, SZEDER Gábor wrote:

> On Fri, Feb 08, 2019 at 11:49:37AM -0500, Jeff King wrote:
> > One other thing I've noticed with it: I forget to add my custom
> > --root=/var/ram/git-tests when I invoke it, so my hard disk goes
> > crazy (and the tests often run slower!). I'm not sure if there's a
> > convenient fix.
>
> OTOH, that could introduce more variance in the timing of the test's
> commands, thus potentially increasing the chances of a failure.  I
> dunno.
>
> Maybe ./t1234-foo.sh should learn to respect DEFAULT_TEST_OPTS
> somehow?

Yeah, that was what I was thinking. On the other hand, I'd actually find
that a little bit annoying for the non-stress case. I commonly do
"./t1234-foo.sh" in order to dig into a specific breakage, and having
the failing trash directory right there is convenient (and I don't care
as much about speed, since I'm just running it once).

I may just gut my "stress" script and make it a wrapper for calling the
script with "--stress --root=...". :)

-Peff
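
A wrapper along those lines might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the function name `stress`, the STRESS_ROOT and STRESS_DRY_RUN variables, and the default root directory (borrowed from the path mentioned earlier in the thread). It is not the script discussed above.

```shell
#!/bin/sh
# Hypothetical "stress" wrapper; names and defaults are illustrative only.
# Usage: stress.sh <test-script> [<extra test options>...]
#
# STRESS_ROOT     directory passed to --root (assumed default below)
# STRESS_DRY_RUN  if set, print the command instead of running it
stress () {
	script=$1
	shift
	${STRESS_DRY_RUN:+echo} "./$script" --stress \
		--root="${STRESS_ROOT:-/var/ram/git-tests}" "$@"
}

# Only run when arguments are given, so the file can also be sourced.
test $# -eq 0 || stress "$@"
```

Run from the t/ directory, `STRESS_DRY_RUN=1 ./stress.sh t1234-foo.sh --stress-limit=300` would print the command line without running anything, which is a cheap way to check that --root is being added.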
diff --git a/t/README b/t/README
index 11ce7675e3..3aed321248 100644
--- a/t/README
+++ b/t/README
@@ -202,6 +202,11 @@ appropriately before running "make".
 	'.stress-<nr>' suffix, and the trash directory of the failed
 	test job is renamed to end with a '.stress-failed' suffix.
 
+--stress-limit=<N>::
+	When combined with --stress run the test script repeatedly
+	this many times in each of the parallel jobs or until one of
+	them fails, whichever comes first.
+
 You can also set the GIT_TEST_INSTALLED environment variable to
 the bindir of an existing git installation to test that installation.
 You still need to have built this git sandbox, from which various
diff --git a/t/test-lib.sh b/t/test-lib.sh
index a1abb1177a..77eff04c92 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -152,6 +152,17 @@ do
 			;;
 		esac
 		;;
+	--stress-limit=*)
+		stress_limit=${opt#--*=}
+		case "$stress_limit" in
+		*[^0-9]*|0*|"")
+			echo "error: --stress-limit=<N> requires the number of repetitions" >&2
+			exit 1
+			;;
+		*)	# Good.
+			;;
+		esac
+		;;
 	*)
 		echo "error: unknown test option '$opt'" >&2; exit 1 ;;
 	esac
@@ -237,8 +248,10 @@ then
 		exit 1
 	' TERM INT
 
-	cnt=0
-	while ! test -e "$stressfail"
+	cnt=1
+	while ! test -e "$stressfail" &&
+	      { test -z "$stress_limit" ||
+		test $cnt -le $stress_limit ; }
 	do
 		$TEST_SHELL_PATH "$0" "$@" >"$TEST_RESULTS_BASE.stress-$job_nr.out" 2>&1 &
 		test_pid=$!
@@ -261,6 +274,7 @@ then
 
 	if test -f "$stressfail"
 	then
+		stress_exit=1
 		echo "Log(s) of failed test run(s):"
 		for failed_job_nr in $(sort -n "$stressfail")
 		do
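
To see which values the new `case` pattern accepts, the check can be tried outside test-lib.sh. The sketch below copies the pattern from the hunk above into a hypothetical function, `valid_stress_limit`, which does not exist in the patch. The sample values are arbitrary.

```shell
#!/bin/sh
# Standalone copy of the pattern from the patch, wrapped in a hypothetical
# function so it can be tried outside test-lib.sh.
# The '[^0-9]' spelling is kept as in the patch; POSIX spells bracket
# negation '[!0-9]' and leaves '[^...]' unspecified.
valid_stress_limit () {
	case "$1" in
	*[^0-9]*|0*|"")
		# Contains a non-digit, starts with 0 (including "0"), or is empty.
		return 1
		;;
	*)
		return 0
		;;
	esac
}

for value in 300 1 0 007 "" 3x -5
do
	if valid_stress_limit "$value"
	then
		echo "'$value': accepted"
	else
		echo "'$value': rejected"
	fi
done
```

With a shell that treats `[^0-9]` as a negated bracket expression, only 300 and 1 are accepted. Rejecting a leading zero also rules out a limit of 0.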
Let's suppose that a test somehow becomes flaky between 'master' and
'pu', and tends to fail within the first 50 repetitions when run with
'--stress'.  In such a case we could use 'git bisect' to find the
culprit: if the test script fails with '--stress', then the commit is
definitely bad, but if it survives, say, 300 repetitions, then we
could consider it good with reasonable confidence.

Unfortunately, all this could only be done manually, because
'--stress' would run the test script repeatedly for all eternity on a
good commit, and it would exit with success even when it found a
failure on a bad commit.

So let's make '--stress' usable with 'git bisect run':

  - Make it exit with failure if a failure is found.

  - Add the '--stress-limit=<N>' option to repeat the test script
    at most N times in each of the parallel jobs, and exit with
    success when the limit is reached.

And then we could simply run something like:

  $ git bisect start origin/pu master
  $ git bisect run sh -c 'make && cd t &&
                          ./t1234-foo.sh --stress --stress-limit=300'

Sure, as a brand new feature it won't be any useful right now, but in
a release or three most cooking topics will already contain this, so
we could automatically bisect at least newly introduced flakiness.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---

This is a case when an external stress script works better, as it can
easily check commits in the past...  if someone has such a script,
that is.  Anyway, the approach works:

  https://public-inbox.org/git/20190129213533.GE13764@szeder.dev/
  https://public-inbox.org/git/20190208113059.GV10587@szeder.dev/

 t/README      |  5 +++++
 t/test-lib.sh | 18 ++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)
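
As a variation on the one-liner above, the bisect step could live in a small script. The sketch below is not part of the patch: the file name bisect-stress.sh, the function name `bisect_stress` and its arguments are all hypothetical. It relies on one documented behavior of 'git bisect run': exit code 125 marks the current commit as untestable and skips it, so a commit that fails to build is not blamed for the flakiness.

```shell
#!/bin/sh
# Hypothetical helper for 'git bisect run'; names are placeholders.
# Usage: git bisect run ./bisect-stress.sh <test-script> <limit>
bisect_stress () {
	test_script=$1
	limit=$2
	# 125 tells 'git bisect run' to skip a commit that cannot be tested,
	# for example because it does not build.
	make || return 125
	(
		cd t &&
		"./$test_script" --stress --stress-limit="$limit"
	) || return 1
}

# Only run when arguments are given, so the file can also be sourced.
test $# -eq 0 || bisect_stress "$@"
```

It would be started with something like `git bisect run ./bisect-stress.sh t1234-foo.sh 300` after `git bisect start origin/pu master`. Because the checked-out tree changes at every step, the script is probably best kept outside the repository or untracked.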