Message ID | 20181204163457.15717-4-szeder.dev@gmail.com (mailing list archive)
---|---
State | New, archived
Series | test-lib: add the '--stress' option to help reproduce occasional failures in flaky tests
On Tue, Dec 04 2018, SZEDER Gábor wrote: > The number of parallel invocations is determined by, in order of > precedence: the number specified as '--stress=<N>', or the value of > the GIT_TEST_STRESS_LOAD environment variable, or twice the number of > available processors in '/proc/cpuinfo', or 8. With this series we have at least 3 ways to get this number. First online_cpus() in the C code, then Peff's recent `getconf _NPROCESSORS_ONLN` in doc-diff, and now this /proc/cpuinfo parsing. Perhaps it makes sense to split online_cpus() into a helper to use from the shellscripts instead?
On Tue, Dec 04, 2018 at 06:04:14PM +0100, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Dec 04 2018, SZEDER Gábor wrote: > > > The number of parallel invocations is determined by, in order of > > precedence: the number specified as '--stress=<N>', or the value of > > the GIT_TEST_STRESS_LOAD environment variable, or twice the number of > > available processors in '/proc/cpuinfo', or 8. > > With this series we have at least 3 ways to get this number. First > online_cpus() in the C code, then Peff's recent `getconf > _NPROCESSORS_ONLN` in doc-diff, and now this /proc/cpuinfo parsing. > > Perhaps it makes sense to split online_cpus() into a helper to use from > the shellscripts instead? I don't think so, but I will update this patch to use 'getconf' instead.
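The precedence chain from the commit message, using 'getconf' as Szeder plans to in the updated patch, could be sketched roughly like this (the helper name is hypothetical; only the fallback order comes from the commit message):

```shell
# Pick the number of parallel jobs, in order of precedence:
# explicit --stress=<N> argument, GIT_TEST_STRESS_LOAD from the
# environment, twice the number of online CPUs, or 8 as a last resort.
stress_job_count () {
	if test -n "$1"
	then
		echo "$1"
	elif test -n "$GIT_TEST_STRESS_LOAD"
	then
		echo "$GIT_TEST_STRESS_LOAD"
	elif nproc=$(getconf _NPROCESSORS_ONLN 2>/dev/null) && test -n "$nproc"
	then
		echo $((nproc * 2))
	else
		echo 8
	fi
}
```

Note that `getconf _NPROCESSORS_ONLN` is not in POSIX, but works on Linux and modern BSD/macOS, as discussed below.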
On Tue, Dec 04 2018, SZEDER Gábor wrote: > Unfortunately, we have a few flaky tests, whose failures tend to be > hard to reproduce. We've found that the best we can do to reproduce > such a failure is to run the test repeatedly while the machine is > under load, and wait in the hope that the load creates enough variance > in the timing of the test's commands that a failure is eventually > triggered. I have a command to do that, and I noticed that two other > contributors have rolled their own scripts to do the same, all > choosing slightly different approaches. > > To help reproduce failures in flaky tests, introduce the '--stress' > option to run a test script repeatedly in multiple parallel > invocations until one of them fails, thereby using the test script > itself to increase the load on the machine. > > The number of parallel invocations is determined by, in order of > precedence: the number specified as '--stress=<N>', or the value of > the GIT_TEST_STRESS_LOAD environment variable, or twice the number of > available processors in '/proc/cpuinfo', or 8. > > To prevent the several parallel invocations of the same test from > interfering with each other: > > - Include the parallel job's number in the name of the trash > directory and the various output files under 't/test-results/' as > a '.stress-<Nr>' suffix. > > - Add the parallel job's number to the port number specified by the > user or to the test number, so even tests involving daemons > listening on a TCP socket can be stressed. > > - Make '--stress' imply '--verbose-log' and discard the test's > standard output and error; dumping the output of several parallel > tests to the terminal would create a big ugly mess. > > 'wait' for all parallel jobs before exiting (either because a failure > was found or because the user lost patience and aborted the stress > test), allowing the still running tests to finish. 
Otherwise the "OK X.Y" progress output from the last iteration would likely arrive after > the user got back the shell prompt, interfering with typing in the > next command. OTOH, this waiting might induce a considerable delay > between hitting ctrl-C and the test actually exiting; I'm not sure > this is the right tradeoff. I think it makes sense to generalize this and split it up into two features. It's a frequent annoyance of mine in the test suite that I'm e.g. running t*.sh with some parallel "prove" in one screen, and then I run tABCD*.sh manually, and get unlucky because they use the same trash dir, and both tests go boom. You can fix that with --root, which is much of what this patch does. My one-liner for doing --stress has been something like: perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh --root=trash-{} -v' But it would be great if I didn't have to worry about that and could just run two concurrent: ./t0000-basic.sh So I think we could just set some env variable where instead of having the predictable trash directory we have a $TRASHDIR.$N as this patch does, except we pick $N by locking some test-runs/tABCD.Nth file with flock() during the run. Then a stress mode like this would just be: GIT_TEST_FLOCKED_TRASH_DIRS=1 perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh' And sure, we could ship a --stress option too, but it wouldn't be magical in any way, just another "spawn N in a loop" implementation, but you could also e.g. use GNU parallel to drive it, and without needing to decide to stress test in advance, since we'd either flock() the trash dir, or just mktemp(1)-it.
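The flock()-based slot picking proposed above could look something like this rough sketch (the file layout and names are hypothetical): try slot 0, 1, 2, ... and keep the first lock file that flock(1) can grab; since the file descriptor stays open for the rest of the run, the slot is released automatically when the test exits.

```shell
# Pick the lowest free slot number for this test script by locking
# test-runs/<test>.<N>; concurrent runs of the same script get
# distinct slots, and therefore distinct trash directories.
command -v flock >/dev/null || { echo "flock(1) not available" >&2; exit 1; }

test_name=t0000-basic
mkdir -p test-runs

slot=0
while :
do
	exec 9>"test-runs/$test_name.$slot" || exit 1
	if flock -n 9
	then
		break	# slot is ours for the lifetime of this process
	fi
	slot=$((slot + 1))
done
echo "using trash directory trash.$slot"
```

flock(1) is a util-linux tool and not in POSIX, which is exactly the portability concern raised later in this thread.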
On Tue, Dec 04, 2018 at 05:34:57PM +0100, SZEDER Gábor wrote: > To prevent the several parallel invocations of the same test from > interfering with each other: > > - Include the parallel job's number in the name of the trash > directory and the various output files under 't/test-results/' as > a '.stress-<Nr>' suffix. Makes sense. > - Add the parallel job's number to the port number specified by the > user or to the test number, so even tests involving daemons > listening on a TCP socket can be stressed. In my script I just use an arbitrary sequence of ports. That means two stress runs may stomp on each other, but stress runs tend not to interfere with regular runs (whereas with your scheme, a stress run of t5561 is going to stomp on t5562). It probably doesn't matter much either way, as I tend not to do too much with the machine during a stress run. > - Make '--stress' imply '--verbose-log' and discard the test's > standard output and error; dumping the output of several parallel > tests to the terminal would create a big ugly mess. Makes sense. My script just redirects the output to a file, but it predates --verbose-log. My script always runs with "-x". I guess it's not that hard to add it if you want, but it is annoying to finally get a failure after several minutes only to realize that you are missing some important information. ;) Ditto for "-i", which leaves more readable output (you know the interesting part is at the end of the log), and leaves a trash directory state that is more likely to be useful for inspecting. My script also implies "--root". That's not strictly necessary, though I suspect it is much more likely to catch races when it's run on a ram disk (simply because it leaves the CPU as the main bottleneck). > 'wait' for all parallel jobs before exiting (either because a failure > was found or because the user lost patience and aborted the stress > test), allowing the still running tests to finish. 
Otherwise the "OK X.Y" progress output from the last iteration would likely arrive after > the user got back the shell prompt, interfering with typing in the > next command. OTOH, this waiting might induce a considerable delay > between hitting ctrl-C and the test actually exiting; I'm not sure > this is the right tradeoff. If there is a delay, it's actually quite annoying to _not_ wait; you start doing something in the terminal, and then a bunch of extra test statuses print at a random time. > + job_nr=0 > + while test $job_nr -lt "$job_count" > + do > + ( > + GIT_TEST_STRESS_STARTED=done > + GIT_TEST_STRESS_JOB_NR=$job_nr > + export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR > + > + cnt=0 > + while ! test -e "$stressfail" > + do > + if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 > + then > + printf >&2 "OK %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt OK, this adds a counter for each job number (compared to my script). Seems helpful. > + elif test -f "$stressfail" && > + test "$(cat "$stressfail")" = "aborted" > + then > + printf >&2 "ABORTED %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt > + else > + printf >&2 "FAIL %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt > + echo $GIT_TEST_STRESS_JOB_NR >>"$stressfail" > + fi Hmm. What happens if we see a failure _and_ there's an abort? That's actually pretty plausible if you see a failure whiz by, and you want to abort the remaining scripts because they take a long time to run. If the abort happens first, then the goal is to assume any errors are due to the abort. But there's a race between the individual jobs reading $stressfail and the parent signal handler writing it. So you may end up with "aborted\n5" or similar (after which any other jobs would fail to match "aborted" and declare themselves failures, as well). I think my script probably also has this race, too (it doesn't check the abort case explicitly, but it would race in checking "test -e fail"). 
If the fail happens first (which I think is the more likely case), then the abort handler will overwrite the file with "aborted", and we'll forget that there was a real failure. This works in my script because of the symlinking (if a real failure symlinked $fail to a directory, then the attempt to write "aborted" will just be a noop). > + job_nr=0 > + while test $job_nr -lt "$job_count" > + do > + wait > + job_nr=$(($job_nr + 1)) > + done Do we need to loop? Calling "wait" with no arguments should wait for all children. > + if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted" > + then > + echo "Log(s) of failed test run(s) be found in:" > + for f in $(cat "$stressfail") > + do > + echo " $TEST_RESULTS_BASE.stress-$f.out" > + done > + fi In my experience, outputting the failed log saves a step (especially with "-i"), since seeing the failed test and its output is often sufficient. I'm also sad to lose the symlink to the failed trash directory, which saves cutting and pasting when you have to inspect it. But I guess we can't rely on symlinks everywhere. -Peff
On Tue, Dec 04, 2018 at 06:04:14PM +0100, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Dec 04 2018, SZEDER Gábor wrote: > > > The number of parallel invocations is determined by, in order of > > precedence: the number specified as '--stress=<N>', or the value of > > the GIT_TEST_STRESS_LOAD environment variable, or twice the number of > > available processors in '/proc/cpuinfo', or 8. > > With this series we have at least 3 ways to get this number. First > online_cpus() in the C code, then Peff's recent `getconf > _NPROCESSORS_ONLN` in doc-diff, and now this /proc/cpuinfo parsing. To be fair, I only added the "getconf" thing because somebody complained that I was parsing /proc/cpuinfo, and suggested it instead. ;) I don't think it's especially portable, but it should work on Linux and modern BSD/macOS, which may be enough (unlike doc-diff, I'd be a little more inclined to ask somebody on a random platform to stress test if they're hitting a bug). > Perhaps it makes sense to split online_cpus() into a helper to use from > the shellscripts instead? I think somebody proposed that in a recent thread for other reasons, too. I'd be OK with that, but probably just using getconf is reasonable in the meantime. -Peff
On Tue, Dec 04, 2018 at 07:11:08PM +0100, Ævar Arnfjörð Bjarmason wrote: > It's a frequent annoyance of mine in the test suite that I'm > e.g. running t*.sh with some parallel "prove" in one screen, and then I > run tABCD*.sh manually, and get unlucky because they use the same trash > dir, and both tests go boom. > > You can fix that with --root, which is much of what this patch does. My > one-liner for doing --stress has been something like: > > perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh --root=trash-{} -v' > > But it would be great if I didn't have to worry about that and could > just run two concurrent: > > ./t0000-basic.sh > > So I think we could just set some env variable where instead of having > the predictable trash directory we have a $TRASHDIR.$N as this patch > does, except we pick $N by locking some test-runs/tABCD.Nth file with > flock() during the run. That actually sounds kind of annoying when doing a single run, since now you have this extra ".N". I guess it would at least be predictable, and I tend to tab-complete the trash dirs anyway. I accomplish the same thing by doing my "big" runs using --root specified in config.mak (which points to a RAM disk -- if you're not using one to run the tests, you really should look into it, as it's way faster). And then for one-offs, investigating failures, etc, I do "cd t && ./t0000-basic.sh -v -i", which naturally runs in the local directory. -Peff
Just a quick reply to this one point for now: On Wed, Dec 05, 2018 at 12:44:09AM -0500, Jeff King wrote: > On Tue, Dec 04, 2018 at 05:34:57PM +0100, SZEDER Gábor wrote: > > + job_nr=0 > > + while test $job_nr -lt "$job_count" > > + do > > + wait > > + job_nr=$(($job_nr + 1)) > > + done > > Do we need to loop? Calling "wait" with no arguments should wait for all > children. It should, but in dash, ksh, ksh93, Bash v4.2 or older it doesn't seem to do so, at least not on my machine, and I get back my shell prompt right after hitting ctrl-C or the first failed test, and then get the progress output from the remaining jobs later. Bash 4.3 or later are strange: I get back the shell prompt immediately after ctrl-C as well, so it doesn't appear to be waiting for all remaining jobs to finish either, but! I don't get any of the progress output from those jobs to mess up my next command. And mksh and zsh can't run our tests, and I don't have any more shells at hand to try.
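For reference, absent signals every POSIX shell should agree that a single argument-less 'wait' blocks until all children have exited; a minimal demonstration:

```shell
# Start three one-second jobs in the background; the single 'wait'
# returns only after every child has exited, so total elapsed time
# is about one second, not three.
start=$(date +%s)
for i in 1 2 3
do
	sleep 1 &
done
wait
elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${elapsed}s"
```

The divergence described above only shows up when the 'wait' is interrupted by a trapped signal, which is where shells differ in whether they keep waiting for the remaining children.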
On Tue, Dec 04, 2018 at 07:11:08PM +0100, Ævar Arnfjörð Bjarmason wrote: > It's a frequent annoyance of mine in the test suite that I'm > e.g. running t*.sh with some parallel "prove" in one screen, and then I > run tABCD*.sh manually, and get unlucky because they use the same trash > dir, and both tests go boom. > > You can fix that with --root, which is much of what this patch does. My > one-liner for doing --stress has been something like: > > perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh --root=trash-{} -v' > > But it would be great if I didn't have to worry about that and could > just run two concurrent: > > ./t0000-basic.sh > > So I think we could just set some env variable where instead of having > the predictable trash directory we have a $TRASHDIR.$N as this patch > does, except we pick $N by locking some test-runs/tABCD.Nth file with > flock() during the run. Is 'flock' portable? It doesn't appear so. > Then a stress mode like this would just be: > > GIT_TEST_FLOCKED_TRASH_DIRS=1 perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh' This doesn't address the issue of TCP ports for various daemons. > And sure, we could ship a --stress option too, but it wouldn't be > magical in any way, just another "spawn N in a loop" implementation, but > you could also e.g. use GNU parallel to drive it, and without needing to GNU 'parallel' may or may not be available on the platform, that's why I stuck with the barebones shell-loops-in-the-background approach. > decide to stress test in advance, since we'd either flock() the trash > dir, or just mktemp(1)-it. While 'mktemp' seems to be more portable than 'flock', it doesn't seem to be portable enough; at least it's not in POSIX. And in general I'd prefer deterministic trash directory names. If I re-run a failed test after fixing the bug, then currently the trash directory will be overwritten, and that's good. 
With 'mktemp's junk in the trash directory's name it won't be overwritten, and if my bugfix doesn't work, then I'll start accumulating trash directories and won't even know which one is from the last run.
On Wed, Dec 05, 2018 at 12:44:09AM -0500, Jeff King wrote: > On Tue, Dec 04, 2018 at 05:34:57PM +0100, SZEDER Gábor wrote: > > > To prevent the several parallel invocations of the same test from > > interfering with each other: > > > > - Include the parallel job's number in the name of the trash > > directory and the various output files under 't/test-results/' as > > a '.stress-<Nr>' suffix. > > Makes sense. > > > - Add the parallel job's number to the port number specified by the > > user or to the test number, so even tests involving daemons > > listening on a TCP socket can be stressed. > > In my script I just use an arbitrary sequence of ports. That means two > stress runs may stomp on each other, but stress runs tend not to > interfere with regular runs (whereas with your scheme, a stress run of > t5561 is going to stomp on t5562). It probably doesn't matter much > either way, as I tend not to do too much with the machine during a > stress run. A custom port can always be set via environment variables, though the user has to remember to set it (I doubt that I would remember it until reminded by test failures...) > > - Make '--stress' imply '--verbose-log' and discard the test's > > standard output and error; dumping the output of several parallel > > tests to the terminal would create a big ugly mess. > > Makes sense. My script just redirects the output to a file, but it > predates --verbose-log. > > My script always runs with "-x". I guess it's not that hard to add it if > you want, but it is annoying to finally get a failure after several > minutes only to realize that you are missing some important > information. ;) > > Ditto for "-i", which leaves more readable output (you know the > interesting part is at the end of the log), and leaves a trash directory > state that is more likely to be useful for inspecting. I wanted to imply only the bare minimum of options that are necessary to prevent the tests from stomping on each other. 
Other than that I agree and wouldn't run '--stress' without '-x -i' myself, and at one point considered recommending '-x -i' in the description of '--stress'. > My script also implies "--root". That's not strictly necessary, though I > suspect it is much more likely to catch races when it's run on a ram > disk (simply because it leaves the CPU as the main bottleneck). > > > 'wait' for all parallel jobs before exiting (either because a failure > > was found or because the user lost patience and aborted the stress > > test), allowing the still running tests to finish. Otherwise the "OK X.Y" progress output from the last iteration would likely arrive after > > the user got back the shell prompt, interfering with typing in the > > next command. OTOH, this waiting might induce a considerable delay > > between hitting ctrl-C and the test actually exiting; I'm not sure > > this is the right tradeoff. > > If there is a delay, it's actually quite annoying to _not_ wait; you > start doing something in the terminal, and then a bunch of extra test > statuses print at a random time. At least the INT/TERM/etc. handler should say something like "Waiting for background jobs to finish", to let the user know why it doesn't exit right away. An alternative would be to exit right away, and make the background loops a tad bit smarter to not print these progress lines when aborting. > > + job_nr=0 > > + while test $job_nr -lt "$job_count" > > + do > > + ( > > + GIT_TEST_STRESS_STARTED=done > > + GIT_TEST_STRESS_JOB_NR=$job_nr > > + export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR > > + > > + cnt=0 > > + while ! test -e "$stressfail" > > + do > > + if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 > > + then > > + printf >&2 "OK %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt > > OK, this adds a counter for each job number (compared to my script). > Seems helpful. 
> > > + elif test -f "$stressfail" && > > + test "$(cat "$stressfail")" = "aborted" > > + then > > + printf >&2 "ABORTED %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt > > + else > > + printf >&2 "FAIL %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt > > + echo $GIT_TEST_STRESS_JOB_NR >>"$stressfail" > > + fi > > Hmm. What happens if we see a failure _and_ there's an abort? That's > actually pretty plausible if you see a failure whiz by, and you want to > abort the remaining scripts because they take a long time to run. > > If the abort happens first, then the goal is to assume any errors are > due to the abort. Yeah, that's why I added this ABORTED case. When I hit ctrl-C, a couple of the tests tend to fail, but that's not the rare failure we are hoping to find when stress testing, so printing FAIL would be misleading. The test script exits with 1 all the same, so we can't tell the difference, but since rare failures are, well, rare, this failure is most likely due to the abort. > But there's a race between the individual jobs reading > $stressfail and the parent signal handler writing it. So you may end up > with "aborted\n5" or similar (after which any other jobs would fail to > match "aborted" and declare themselves failures, as well). I think my > script probably also has this race, too (it doesn't check the abort case > explicitly, but it would race in checking "test -e fail"). > > If the fail happens first (which I think is the more likely case), then > the abort handler will overwrite the file with "aborted", and we'll > forget that there was a real failure. This works in my script because of > the symlinking (if a real failure symlinked $fail to a directory, then > the attempt to write "aborted" will just be a noop). OK, I think the abort handler should append, not overwrite, and then the file's contents should be checked e.g. with a case statement looking for *aborted*. Or use two separate files for signaling abort and test failure. 
In any case, you could always grab the job number from the "FAIL X.Y" line, and then look at its logs and trash directory. > > + job_nr=0 > > + while test $job_nr -lt "$job_count" > > + do > > + wait > > + job_nr=$(($job_nr + 1)) > > + done > > Do we need to loop? Calling "wait" with no arguments should wait for all > children. > > > + if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted" > > + then > > + echo "Log(s) of failed test run(s) be found in:" > > + for f in $(cat "$stressfail") > > + do > > + echo " $TEST_RESULTS_BASE.stress-$f.out" > > + done > > + fi > > In my experience, outputting the failed log saves a step (especially > with "-i"), since seeing the failed test and its output is often > sufficient. I just don't like when test output, especially with '-x', is dumped on my terminal :) > I'm also sad to lose the symlink to the failed trash directory, which > saves cutting and pasting when you have to inspect it. But I guess we > can't rely on symlinks everywhere. You can tab-complete most of the trash directory, and then there are only those 1-2-3 digits of the job number that you have to type. That worked out well enough for me so far (but I've only used it for a few days, so...). We could rename the trash directory of the failed test to something short, sweet, and common, but that could be racy in the unlikely case that two tests fail at the same time. And I like the 'trash directory.' prefix, because then 'make clean' will delete that one, too, without modifying 't/Makefile' (which is also the reason for adding the 'stress-N' suffix instead of a prefix, and putting the fail file into the 'test-results' directory).
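The append-plus-case fix suggested above could look something like this sketch (the file name is hypothetical): the abort handler appends instead of overwriting, and readers match with a case glob, so a real failure recorded before the abort is not lost.

```shell
stressfail=stress-failed.tmp
rm -f "$stressfail"

# A failing job appends its number; the ctrl-C handler appends the
# marker instead of truncating the file, so both records survive.
echo 5 >>"$stressfail"        # job 5 fails first ...
echo aborted >>"$stressfail"  # ... then the abort handler fires

# Detect the abort without losing the failure record:
case "$(cat "$stressfail")" in
*aborted*) echo "run was aborted" ;;
esac

# The real failures are still listed alongside the abort marker:
grep -v '^aborted$' "$stressfail"
```

Two separate files (one for abort, one for failures), as also suggested above, would avoid even the need for the glob match.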
On Wed, Dec 05 2018, SZEDER Gábor wrote: > On Tue, Dec 04, 2018 at 07:11:08PM +0100, Ævar Arnfjörð Bjarmason wrote: >> It's a frequent annoyance of mine in the test suite that I'm >> e.g. running t*.sh with some parallel "prove" in one screen, and then I >> run tABCD*.sh manually, and get unlucky because they use the same trash >> dir, and both tests go boom. >> >> You can fix that with --root, which is much of what this patch does. My >> one-liner for doing --stress has been something like: >> >> perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh --root=trash-{} -v' >> >> But it would be great if I didn't have to worry about that and could >> just run two concurrent: >> >> ./t0000-basic.sh >> >> So I think we could just set some env variable where instead of having >> the predictable trash directory we have a $TRASHDIR.$N as this patch >> does, except we pick $N by locking some test-runs/tABCD.Nth file with >> flock() during the run. > > Is 'flock' portable? It doesn't appear so. Portable enough, and since it's just an alias for "grab lock atomically" it can devolve into other more basic FS functions on systems that don't have it. >> Then a stress mode like this would just be: >> >> GIT_TEST_FLOCKED_TRASH_DIRS=1 perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh' > > This doesn't address the issue of TCP ports for various daemons. Once we have a test ABCD and slot offset N (for a fixed size of N) we can pick a port from that. >> And sure, we could ship a --stress option too, but it wouldn't be >> magical in any way, just another "spawn N in a loop" implementation, but >> you could also e.g. use GNU parallel to drive it, and without needing to > > GNU 'parallel' may or may not be available on the platform, that's why > I stuck with the barebones shell-loops-in-the-background approach. 
Yes, my point is not that you should drop that part of your patch for using GNU parallel, but just to demonstrate that we can get pretty close to this for most tests with an external tool, and that it would make this sort of thing work for concurrent tests without needing to decide to concurrently test in advance. >> decide to stress test in advance, since we'd either flock() the trash >> dir, or just mktemp(1)-it. > > While 'mktemp' seems to be more portable than 'flock', it doesn't seem > to be portable enough; at least it's not in POSIX. We are already relying on stuff like mktemp() being reliable for e.g. the split index. > And in general I'd prefer deterministic trash directory names. If I > re-run a failed test after fixing the bug, then currently the trash > directory will be overwritten, and that's good. With 'mktemp's junk > in the trash directory's name it won't be overwritten, and if my bugfix > doesn't work, then I'll start accumulating trash directories and won't > even know which one is from the last run. In general you wouldn't need to set GIT_TEST_FLOCKED_TRASH_DIRS=1 and would get the same monolithic trash names as now, and for something like the flock() version it could use a job number as a suffix like your patch does.
On Wed, Dec 05, 2018 at 03:01:41PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> decide to stress test in advance, since we'd either flock() the trash > >> dir, or just mktemp(1)-it. > > > > While 'mktemp' seems to be more portable than 'flock', it doesn't seem > > to be portable enough; at least it's not in POSIX. > > We are already relying on stuff like mktemp() being reliable for > e.g. the split index. That's the mktemp() function from libc; I meant the 'mktemp' command that we would use this early in 'test-lib.sh', where PATH has not been set up for testing yet.
On Wed, Dec 05 2018, SZEDER Gábor wrote: > On Wed, Dec 05, 2018 at 03:01:41PM +0100, Ævar Arnfjörð Bjarmason wrote: >> >> decide to stress test in advance, since we'd either flock() the trash >> >> dir, or just mktemp(1)-it. >> > >> > While 'mktemp' seems to be more portable than 'flock', it doesn't seem >> > to be portable enough; at least it's not in POSIX. >> >> We are already relying on stuff like mktemp() being reliable for >> e.g. the split index. > > That's the mktemp() function from libc; I meant the 'mktemp' command > that we would use this early in 'test-lib.sh', where PATH has not been > set up for testing yet. Ah, if that ever becomes an issue it's a one-line test-tool wrapper.
On Wed, Dec 05, 2018 at 11:34:54AM +0100, SZEDER Gábor wrote: > > Just a quick reply to this one point for now: > > On Wed, Dec 05, 2018 at 12:44:09AM -0500, Jeff King wrote: > > On Tue, Dec 04, 2018 at 05:34:57PM +0100, SZEDER Gábor wrote: > > > + job_nr=0 > > > + while test $job_nr -lt "$job_count" > > > + do > > > + wait > > > + job_nr=$(($job_nr + 1)) > > > + done > > > > Do we need to loop? Calling "wait" with no arguments should wait for all > > children. > > It should, but in dash, ksh, ksh93, Bash v4.2 or older it doesn't seem > to do so, at least not on my machine, and I get back my shell prompt > right after hitting ctrl-C or the first failed test, and then get the > progress output from the remaining jobs later. That's weird. I run my stress script under dash with a single "wait", and it handles the failing case just fine. And switching to a single "wait" in your script works for me. But the ^C case is interesting. Try running your script under "sh -x" and hitting ^C. The signal interrupts the first wait. In my script (or yours modified to use a single wait), we then proceed immediately to the "exit", and get output from the subshells after we've exited. But in your script with the loop, the first wait is interrupted, and then the _second_ wait (in the second loop iteration) picks up all of the exiting processes. The subsequent waits are all noops that return immediately, because there are no processes left. So the right number of waits is either "1" or "2". Looping means we do too many (which is mostly a harmless noop) or too few (in the off chance that you have only a single job ;) ). So it works out in practice. But I think a more clear way to do it is to simply wait in the signal trap, to cover the case when our original wait was aborted. 
I.e., something like this: diff --git a/t/test-lib.sh b/t/test-lib.sh index 9b7f687396..5e0c41626f 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -98,7 +98,7 @@ done,*) mkdir -p "$(dirname "$TEST_RESULTS_BASE")" stressfail="$TEST_RESULTS_BASE.stress-failed" rm -f "$stressfail" - trap 'echo aborted >"$stressfail"' TERM INT HUP + trap 'echo aborted >"$stressfail"; wait' TERM INT HUP job_nr=0 while test $job_nr -lt "$job_count" @@ -128,12 +128,7 @@ done,*) job_nr=$(($job_nr + 1)) done - job_nr=0 - while test $job_nr -lt "$job_count" - do - wait - job_nr=$(($job_nr + 1)) - done + wait if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted" then which behaves well for me in all cases. > Bash 4.3 or later are strange: I get back the shell prompt immediately > after ctrl-C as well, so it doesn't appear to be waiting for all > remaining jobs to finish either, but! I don't get any of the progress > output from those jobs to mess up my next command. Interesting. My bash 4.4 seems to behave the same as dash. It almost sounds like the SIGINT is getting passed to the subshells for you. Probably not really worth digging into, though. In case anybody is playing with this and needed to simulate a stress failure, here's what I used: diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh index b6566003dd..a370cd9977 100755 --- a/t/t0000-basic.sh +++ b/t/t0000-basic.sh @@ -1171,4 +1171,11 @@ test_expect_success 'very long name in the index handled sanely' ' test $len = 4098 ' +test_expect_success 'roll those dice' ' + case "$(openssl rand -base64 1)" in + z*) + return 1 + esac +' + test_done
On Wed, Dec 05, 2018 at 03:01:06PM +0100, SZEDER Gábor wrote: > > > - Make '--stress' imply '--verbose-log' and discard the test's > > > standard output and error; dumping the output of several parallel > > > tests to the terminal would create a big ugly mess. > > > > Makes sense. My script just redirects the output to a file, but it > > predates --verbose-log. > > > > My script always runs with "-x". I guess it's not that hard to add it if > > you want, but it is annoying to finally get a failure after several > > minutes only to realize that you are missing some important > > information. ;) > > > > Ditto for "-i", which leaves more readable output (you know the > > interesting part is at the end of the log), and leaves a trash directory > > state that is more likely to be useful for inspecting. > > I wanted to imply only the bare minimum of options that are necessary > to prevent the tests from stomping on each other. Other than that I > agree and wouldn't run '--stress' without '-x -i' myself, and at one > point considered to recommend '-x -i' in the description of > '--stress'. Yeah, it probably is the right thing to make the options as orthogonal as possible, and let the user decide what they want. I was just wondering if I could replace my "stress" script with this. But I think I'd still want it to remember to use a sane set of options (including "--root" in my case). > > > 'wait' for all parallel jobs before exiting (either because a failure > > > was found or because the user lost patience and aborted the stress > > > test), allowing the still running tests to finish. Otherwise the "OK X.Y" progress output from the last iteration would likely arrive after > > > the user got back the shell prompt, interfering with typing in the > > > next command. OTOH, this waiting might induce a considerable delay > > > between hitting ctrl-C and the test actually exiting; I'm not sure > > > this is the right tradeoff. 
> >
> > If there is a delay, it's actually quite annoying to _not_ wait; you
> > start doing something in the terminal, and then a bunch of extra test
> > statuses print at a random time.
>
> At least the INT/TERM/etc. handler should say something like "Waiting
> for background jobs to finish", to let the user know why it doesn't
> exit right away.

That seems reasonable.

There's also no point in waiting for them to finish if we know we're
aborting anyway. Could we just kill them all?

I guess it's a little tricky, because $! is going to give us the pid of
each subshell. We actually want to kill the test sub-process. That takes
a few contortions, but the output is nice (every sub-job immediately
says "ABORTED ...", and the final process does not exit until the whole
tree is done):

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 9b7f687396..357dead3ff 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -98,8 +98,9 @@ done,*)
 	mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
 	stressfail="$TEST_RESULTS_BASE.stress-failed"
 	rm -f "$stressfail"
-	trap 'echo aborted >"$stressfail"' TERM INT HUP
+	trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait' TERM INT HUP
 
+	job_pids=
 	job_nr=0
 	while test $job_nr -lt "$job_count"
 	do
@@ -108,10 +109,13 @@ done,*)
 			GIT_TEST_STRESS_JOB_NR=$job_nr
 			export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
 
+			trap 'kill $test_pid 2>/dev/null' TERM
+
 			cnt=0
 			while ! test -e "$stressfail"
 			do
-				if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
+				$TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 & test_pid=$!
+				if wait
 				then
 					printf >&2 "OK %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
 				elif test -f "$stressfail" &&
@@ -124,16 +128,11 @@ done,*)
 				fi
 				cnt=$(($cnt + 1))
 			done
-		) &
+		) & job_pids="$job_pids $!"
 		job_nr=$(($job_nr + 1))
 	done
 
-	job_nr=0
-	while test $job_nr -lt "$job_count"
-	do
-		wait
-		job_nr=$(($job_nr + 1))
-	done
+	wait
 
 	if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
 	then

> > > +			elif test -f "$stressfail" &&
> > > +			     test "$(cat "$stressfail")" = "aborted"
> > > +			then
> > > +				printf >&2 "ABORTED %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
> > > +			else
> > > +				printf >&2 "FAIL %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
> > > +				echo $GIT_TEST_STRESS_JOB_NR >>"$stressfail"
> > > +			fi
> >
> > Hmm. What happens if we see a failure _and_ there's an abort? That's
> > actually pretty plausible if you see a failure whiz by, and you want to
> > abort the remaining scripts because they take a long time to run.
> >
> > If the abort happens first, then the goal is to assume any errors are
> > due to the abort.
>
> Yeah, that's why I added this ABORTED case.  When I hit ctrl-C, a
> couple of the tests tend to fail, but that's not the rare failure we
> are hoping to find when stress testing, so printing FAIL would be
> misleading.  The test script exits with 1 all the same, so we can't
> tell the difference, but since rare failures are, well, rare, this
> failure is most likely due to the abort.

With the patch as posted I actually see most of them still say "OK",
because the SIGINT does not make it to them (it would be a different
story if I killed the whole process group). But with the modification
above, I get "ABORTED" for most, which is what I'd want. And it should
be pretty race-free, since the "aborted" file is put into place before
triggering the signal cascade.

> > If the fail happens first (which I think is the more likely case), then
> > the abort handler will overwrite the file with "aborted", and we'll
> > forget that there was a real failure. This works in my script because of
> > the symlinking (if a real failure symlinked $fail to a directory, then
> > the attempt to write "aborted" will just be a noop).
>
> OK, I think the abort handler should append, not overwrite, and then
> the file's contents should be checked e.g. with a case statement
> looking for *aborted*.
>
> Or use two separate files for signaling abort and test failure.

Two separate files seems a lot simpler to me.

> > In my experience, outputting the failed log saves a step (especially
> > with "-i"), since seeing the failed test and its output is often
> > sufficient.
>
> I just don't like when test output, especially with '-x', is dumped on
> my terminal :)

Don't you then use your terminal to display the file? ;) (I'm kidding, I
assume you load it in "less" or whatever, which is distinct).

I wish this could be optional somehow, though. I really prefer the dump.
I guess if I continue to use my script as a wrapper, it can dump for me
(after digging in the $stressfail file, I guess).

> > I'm also sad to lose the symlink to the failed trash directory, which
> > saves cutting and pasting when you have to inspect it. But I guess we
> > can't rely on symlinks everywhere.
>
> You can tab-complete most of the trash directory, and then there are
> only those 1-2-3 digits of the job number that you have to type.  That
> worked out well enough for me so far (but I've only used it for a few
> days, so...).
>
> We could rename the trash directory of the failed test to something
> short, sweet, and common, but that could be racy in the unlikely case
> that two tests fail at the same time.  And I like the 'trash
> directory.' prefix, because then 'make clean' will delete that one,
> too, without modifying 't/Makefile' (which is also the reason for
> adding the 'stress-N' suffix instead of a prefix, and putting the fail
> file into the 'test-results' directory).

We could rename it to "trash directory.xxx.failed", which is at least
predictable. That can even be done atomically, though I think it's a
little tricky to do portably with shell tools.

I guess I can continue to do the symlink thing in my wrapper script.

-Peff
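The "two separate files" idea from the discussion can be sketched in a few lines. This is a toy illustration, not code from the patch; the flag file names and the `report` helper are hypothetical. With distinct flag files, an abort can no longer overwrite the record of a real failure:

```shell
# Toy sketch (hypothetical names): abort and real failure get separate
# flag files, so one can never clobber the other.
dir=$(mktemp -d)
abortflag="$dir/aborted"
failflag="$dir/failed"

report () {
	# A real failure is the interesting event, so check it first.
	if test -e "$failflag"
	then
		echo FAIL
	elif test -e "$abortflag"
	then
		echo ABORTED
	else
		echo OK
	fi
}

report			# nothing has happened yet: prints OK
: >"$failflag"		# a job records a real failure...
: >"$abortflag"		# ...and then the user hits ctrl-C
report			# the failure is not lost: prints FAIL
```

Because the failure flag survives the abort, the final "Log(s) of failed test run(s)" listing would still find the failed job's number.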
Jeff King <peff@peff.net> writes:

> But the ^C case is interesting. Try running your script under "sh -x"
> and hitting ^C. The signal interrupts the first wait. In my script (or
> yours modified to use a single wait), we then proceed immediately to the
> "exit", and get output from the subshells after we've exited.
>
> But in your script with the loop, the first wait is interrupted, and
> then the _second_ wait (in the second loop iteration) picks up all of
> the exiting processes. The subsequent waits are all noops that return
> immediately, because there are no processes left.
>
> So the right number of waits is either "1" or "2". Looping means we do
> too many (which is mostly a harmless noop) or too few (in the off chance
> that you have only a single job ;) ). So it works out in practice.

Well, if you time your ^C perfectly, no number of waits is right, I am
afraid. You spawn N processes and while looping N times waiting for
them, you can ^C out of wait before these N processes all die, no?

> But I think a more clear way to do it is to simply wait in the signal
> trap, to cover the case when our original wait was aborted.

Sounds good.
On Thu, Dec 06, 2018 at 09:22:23AM +0900, Junio C Hamano wrote:

> > So the right number of waits is either "1" or "2". Looping means we do
> > too many (which is mostly a harmless noop) or too few (in the off chance
> > that you have only a single job ;) ). So it works out in practice.
>
> Well, if you time your ^C perfectly, no number of waits is right, I
> am afraid. You spawn N processes and while looping N times waiting
> for them, you can ^C out of wait before these N processes all die,
> no?

Each "wait" will try to collect all processes, but may be interrupted by
a signal. So the correct number is actually "1 plus the number of times
the user hits ^C".

I had assumed the user was just hitting it once, though putting the wait
into the trap means we do that "1 plus" thing anyway. I could also see
an argument that subsequent ^C's should exit immediately, but I think we
are getting well into the realm of over-engineering.

-Peff
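The "wait in the trap" behavior being discussed can be seen in a tiny standalone demo (not part of the patch): the handler ends with `wait`, so even though a signal arrives, the shell does not get past the trap until its child has been collected:

```shell
# Toy demo of the thread's conclusion: ending the signal handler with
# 'wait' guarantees remaining children are collected before the shell
# proceeds, even if a top-level 'wait' was interrupted by the signal.
out=$(sh -c '
	(sleep 1; echo child) &
	trap "echo trapped; wait" TERM
	kill -TERM $$	# simulate the abort signal arriving
	echo after
')
echo "$out"
```

The trap fires first ("trapped"), its `wait` blocks until the background child prints "child", and only then does execution continue to print "after".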
Jeff King <peff@peff.net> writes:

> Each "wait" will try to collect all processes, but may be interrupted by
> a signal. So the correct number is actually "1 plus the number of times
> the user hits ^C".

Yeah, and that is not bounded. It is OK not to catch multiple ^C that
race with what we do, so having an extra wait in the clean-up procedure
after receiving a signal like you suggested would be both good enough
and the cleanest solution, I think.

Thanks.
On Wed, Dec 05, 2018 at 04:36:26PM -0500, Jeff King wrote:

> The signal interrupts the first wait.

Ah, of course.  I'm ashamed to say that this is not the first time I
forget about that...

> > Bash 4.3 or later are strange: I get back the shell prompt immediately
> > after ctrl-C as well, so it doesn't appear to be waiting for all
> > remaining jobs to finish either, but! I don't get any of the progress
> > output from those jobs to mess up my next command.
>
> Interesting. My bash 4.4 seems to behave the same as dash. It almost
> sounds like the SIGINT is getting passed to the subshells for you.
> Probably not really worth digging into, though.

The subshell does indeed get SIGINT.  I don't know why that happens; to
my understanding that should not happen.

> In case anybody is playing with this and needed to simulate a stress
> failure, here's what I used:
>
> diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
> index b6566003dd..a370cd9977 100755
> --- a/t/t0000-basic.sh
> +++ b/t/t0000-basic.sh
> @@ -1171,4 +1171,11 @@ test_expect_success 'very long name in the index handled sanely' '
>  	test $len = 4098
>  '
>
> +test_expect_success 'roll those dice' '
> +	case "$(openssl rand -base64 1)" in
> +	z*)
> +		return 1
> +	esac
> +'

Wasteful :)

	test $(($$ % 42)) -ne 0
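The cheaper dice roll relies on `$$`, the PID of the test script, which differs between parallel invocations. A parameterized version (the `roll_dice` name is made up here) makes the mechanism reproducible:

```shell
# The thread's cheap "dice roll", parameterized: the real one uses $$
# so each parallel run gets a different value; passing an explicit
# number shows exactly when it fails.
roll_dice () {
	# Fails (simulating a flaky test) when the value is a
	# multiple of 42, i.e. roughly once every 42 runs.
	test $(($1 % 42)) -ne 0
}

roll_dice 84 || echo "simulated rare failure"
roll_dice 85 && echo "passed this time"
```

Since PIDs on Linux typically wrap at 32768 or higher, `test $(($$ % 42)) -ne 0` fails for roughly one in 42 of the parallel jobs, which is exactly the kind of occasional failure `--stress` is meant to catch.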
On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:

> Could we just kill them all?
>
> I guess it's a little tricky, because $! is going to give us the pid of
> each subshell. We actually want to kill the test sub-process. That takes
> a few contortions, but the output is nice (every sub-job immediately
> says "ABORTED ...", and the final process does not exit until the whole
> tree is done):

It's trickier than that:

  - Nowadays our test lib translates SIGINT to exit, but not TERM (or
    HUP, in case a user decides to close the terminal window), which
    means that if the test runs any daemons in the background, then
    those won't be killed when the stress test is aborted.

    This is easy enough to address, and I've been meaning to do this
    in another patch series anyway.

  - With the (implied) '--verbose-log' the process killed in the
    background subshell's SIGTERM handler is not the actual test
    process, because '--verbose-log' runs the test again in a piped
    subshell to save its output:

      (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
       echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"

    That 'kill' kills the "outer" shell process.  And though I get
    "ABORTED..." immediately and I get back my prompt right away, the
    commands involved in the above construct are still running in the
    background:

      $ ./t3903-stash.sh --stress=1
      ^CABORTED  0.0
      $ ps a |grep t3903
       1280 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
       1281 pts/17   S      0:00 tee -a <...>/test-results/t3903-stash.stress-0.out
       1282 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
       4173 pts/17   S+     0:00 grep t3903

    I see this behavior with all shells I tried.  I haven't yet
    started to think it through why this happens :)

    Not implying '--verbose-log' but redirecting the test script's
    output, like you did in your 'stress' script, seems to work in
    dash, ksh, and ksh93, but not in Bash v4.3 or later, where, for
    whatever reason, the subshell gets SIGINT before the SIGTERM would
    arrive.
    While we could easily handle SIGINT in the subshell with the same
    signal handler as SIGTERM, it bothers me that something
    fundamental is wrong here.  Furthermore, then we should perhaps
    forbid '--stress' in combination with '--verbose-log' or '--tee'.

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 9b7f687396..357dead3ff 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -98,8 +98,9 @@ done,*)
>  	mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
>  	stressfail="$TEST_RESULTS_BASE.stress-failed"
>  	rm -f "$stressfail"
> -	trap 'echo aborted >"$stressfail"' TERM INT HUP
> +	trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait' TERM INT HUP
>
> +	job_pids=
>  	job_nr=0
>  	while test $job_nr -lt "$job_count"
>  	do
> @@ -108,10 +109,13 @@ done,*)
>  			GIT_TEST_STRESS_JOB_NR=$job_nr
>  			export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
>
> +			trap 'kill $test_pid 2>/dev/null' TERM
> +
>  			cnt=0
>  			while ! test -e "$stressfail"
>  			do
> -				if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
> +				$TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 & test_pid=$!
> +				if wait
>  				then
>  					printf >&2 "OK %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
>  				elif test -f "$stressfail" &&
> @@ -124,16 +128,11 @@ done,*)
>  				fi
>  				cnt=$(($cnt + 1))
>  			done
> -		) &
> +		) & job_pids="$job_pids $!"
>  		job_nr=$(($job_nr + 1))
>  	done
>
> -	job_nr=0
> -	while test $job_nr -lt "$job_count"
> -	do
> -		wait
> -		job_nr=$(($job_nr + 1))
> -	done
> +	wait
>
>  	if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
>  	then
>
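The quoted '--verbose-log' construct smuggles the test's exit code through a file because the pipe into `tee` would otherwise swallow it. A standalone sketch (with `sh -c 'exit 3'` standing in for the re-run test, `cat` standing in for `tee`, and a hypothetical temp file name) shows the mechanism:

```shell
# Standalone illustration of the exit-code-through-a-file trick used by
# --verbose-log: the test runs on the upstream side of a pipe, so its
# status would be lost; write it to a file and read it back instead.
exitfile=$(mktemp)
(sh -c 'echo some test output; exit 3' 2>&1
 echo $? >"$exitfile") | cat >/dev/null	# 'cat' stands in for 'tee -a ...'
read status <"$exitfile"
rm -f "$exitfile"
echo "real exit status: $status"
```

This also explains why killing `$!` is not enough here: `$!` names the outer subshell of the pipeline, while the process doing the actual work is a grandchild on the other side of an `exec`-less re-invocation.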
On Thu, Dec 06, 2018 at 11:56:01PM +0100, SZEDER Gábor wrote:

> > +test_expect_success 'roll those dice' '
> > +	case "$(openssl rand -base64 1)" in
> > +	z*)
> > +		return 1
> > +	esac
> > +'
>
> Wasteful :)
>
> 	test $(($$ % 42)) -ne 0

Oh, indeed, that is much more clever. :)

-Peff
On Fri, Dec 07, 2018 at 12:10:05AM +0100, SZEDER Gábor wrote:

> On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:
> > Could we just kill them all?
> >
> > I guess it's a little tricky, because $! is going to give us the pid of
> > each subshell. We actually want to kill the test sub-process. That takes
> > a few contortions, but the output is nice (every sub-job immediately
> > says "ABORTED ...", and the final process does not exit until the whole
> > tree is done):
>
> It's trickier than that:
>
>   - Nowadays our test lib translates SIGINT to exit, but not TERM (or
>     HUP, in case a user decides to close the terminal window), which
>     means that if the test runs any daemons in the background, then
>     those won't be killed when the stress test is aborted.
>
>     This is easy enough to address, and I've been meaning to do this
>     in another patch series anyway.

Yeah, trapping on more signals seems reasonable (or just propagating INT
instead of TERM). I'm more worried about this one:

>   - With the (implied) '--verbose-log' the process killed in the
>     background subshell's SIGTERM handler is not the actual test
>     process, because '--verbose-log' runs the test again in a piped
>     subshell to save its output:

Hmm, yeah. The patch I sent earlier already propagates through the
subshell, so this is really just another layer there. I.e., would it be
reasonable when we are using --verbose-log (or --tee) for the parent
process to propagate signals down to the child it spawned? That would
fix this case, and it would make things more predictable in general
(e.g., a user who manually runs kill).

It does feel like a lot of boilerplate to be propagating these signals
manually. I think the "right" Unix-y solution here is process groups:
each sub-job should get its own group, and then the top-level runner
script can kill the whole group.

We may run into portability issues, though (setsid is in util-linux, but
I don't know if there's an equivalent elsewhere; a small perl or C
helper could do it, but I have no idea what that would mean on Windows).

>       (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
>        echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"
>
>     That 'kill' kills the "outer" shell process.  And though I get
>     "ABORTED..." immediately and I get back my prompt right away, the
>     commands involved in the above construct are still running in the
>     background:
>
>       $ ./t3903-stash.sh --stress=1
>       ^CABORTED  0.0
>       $ ps a |grep t3903
>        1280 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
>        1281 pts/17   S      0:00 tee -a <...>/test-results/t3903-stash.stress-0.out
>        1282 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
>        4173 pts/17   S+     0:00 grep t3903
>
>     I see this behavior with all shells I tried.
>     I haven't yet started to think it through why this happens :)

I think you get ABORTED because we are just watching for the parent
process to exit (and reporting its status). The fact that the subshell
(and the tee) are still running is not important. That all makes sense
to me.

>     Not implying '--verbose-log' but redirecting the test script's
>     output, like you did in your 'stress' script, seems to work in
>     dash, ksh, and ksh93, but not in Bash v4.3 or later, where, for
>     whatever reason, the subshell gets SIGINT before the SIGTERM would
>     arrive.
>     While we could easily handle SIGINT in the subshell with the same
>     signal handler as SIGTERM, it bothers me that something
>     fundamental is wrong here.

Yeah, I don't understand the SIGINT behavior you're seeing with bash. I
can't replicate it with bash 4.4.23 (from Debian unstable).

-Peff
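The process-group idea can be sketched without setsid, using the small perl helper the thread suggests (this is a rough illustration, not code from any patch; it assumes perl is available, as it is in git's own test suite): the sub-job puts itself in a fresh process group, and one `kill` with a negative PID then takes down the whole tree at once.

```shell
# Rough sketch of killing a sub-job's whole process tree via its own
# process group (perl stands in for the "small perl helper" mentioned
# in the thread).
perl -e 'setpgrp(0, 0); exec @ARGV' sleep 30 &
pid=$!
sleep 1				# give the helper time to call setpgrp
kill -- -"$pid" 2>/dev/null	# negative PID: signal the whole group
wait "$pid" 2>/dev/null
if kill -0 "$pid" 2>/dev/null
then
	status=alive
else
	status=gone
fi
echo "sub-job group is $status"
```

Since `setpgrp(0, 0)` makes the child's process-group ID equal to its own PID, the `$!` recorded by the runner doubles as the group ID to kill, which sidesteps the "which descendant do I kill?" contortions entirely.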
diff --git a/t/README b/t/README
index 28711cc508..9851de25c2 100644
--- a/t/README
+++ b/t/README
@@ -186,6 +186,16 @@ appropriately before running "make".
 	this feature by setting the GIT_TEST_CHAIN_LINT environment
 	variable to "1" or "0", respectively.
 
+--stress::
+--stress=<N>::
+	Run the test script repeatedly in multiple parallel
+	invocations until one of them fails.  Useful for reproducing
+	rare failures in flaky tests.  The number of parallel
+	invocations is, in order of precedence: <N>, or the value of
+	the GIT_TEST_STRESS_LOAD environment variable, or twice the
+	number of available processors in '/proc/cpuinfo', or 8.
+	Implies `--verbose-log`.
+
 You can also set the GIT_TEST_INSTALLED environment variable to
 the bindir of an existing git installation to test that installation.
 You still need to have built this git sandbox, from which various
@@ -425,7 +435,8 @@ This test harness library does the following things:
  - Creates an empty test directory with an empty .git/objects database
    and chdir(2) into it.  This directory is 't/trash
    directory.$test_name_without_dotsh', with t/ subject to change by
-   the --root option documented above.
+   the --root option documented above, and a '.stress-<N>' suffix
+   appended by the --stress option.
 
  - Defines standard test helper functions for your scripts to use.
    These functions are designed to make all scripts behave
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index d9a602cd0f..9af11e3eed 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1288,8 +1288,6 @@ test_set_port () {
 			# root-only port, use a larger one instead.
 			port=$(($port + 10000))
 		fi
-
-		eval $var=$port
 		;;
 	*[^0-9]*)
 		error >&7 "invalid port number: $port"
@@ -1298,4 +1296,9 @@ test_set_port () {
 		# The user has specified the port.
 		;;
 	esac
+
+	# Make sure that parallel '--stress' test jobs get different
+	# ports.
+	port=$(($port + ${GIT_TEST_STRESS_JOB_NR:-0}))
+	eval $var=$port
 }
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 49e4563405..9b7f687396 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -71,8 +71,81 @@ then
 	exit 1
 fi
 
+TEST_STRESS_SFX="${GIT_TEST_STRESS_JOB_NR:+.stress-$GIT_TEST_STRESS_JOB_NR}"
 TEST_NAME="$(basename "$0" .sh)"
-TEST_RESULTS_BASE="$TEST_OUTPUT_DIRECTORY/test-results/$TEST_NAME"
+TEST_RESULTS_BASE="$TEST_OUTPUT_DIRECTORY/test-results/$TEST_NAME$TEST_STRESS_SFX"
+
+# If --stress was passed, run this test repeatedly in several parallel loops.
+case "$GIT_TEST_STRESS_STARTED, $* " in
+done,*)
+	# Don't stress test again.
+	;;
+*' --stress '*|*' '--stress=*' '*)
+	job_count=${*##*--stress=}
+	if test "$job_count" != "$*"
+	then
+		job_count=${job_count%% *}
+	elif test -n "$GIT_TEST_STRESS_LOAD"
+	then
+		job_count="$GIT_TEST_STRESS_LOAD"
+	elif test -r /proc/cpuinfo
+	then
+		job_count=$((2 * $(grep -c ^processor /proc/cpuinfo)))
+	else
+		job_count=8
+	fi
+
+	mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
+	stressfail="$TEST_RESULTS_BASE.stress-failed"
+	rm -f "$stressfail"
+	trap 'echo aborted >"$stressfail"' TERM INT HUP
+
+	job_nr=0
+	while test $job_nr -lt "$job_count"
+	do
+		(
+			GIT_TEST_STRESS_STARTED=done
+			GIT_TEST_STRESS_JOB_NR=$job_nr
+			export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
+
+			cnt=0
+			while ! test -e "$stressfail"
+			do
+				if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
+				then
+					printf >&2 "OK %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
+				elif test -f "$stressfail" &&
+				     test "$(cat "$stressfail")" = "aborted"
+				then
+					printf >&2 "ABORTED %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
+				else
+					printf >&2 "FAIL %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
+					echo $GIT_TEST_STRESS_JOB_NR >>"$stressfail"
+				fi
+				cnt=$(($cnt + 1))
+			done
+		) &
+		job_nr=$(($job_nr + 1))
+	done
+
+	job_nr=0
+	while test $job_nr -lt "$job_count"
+	do
+		wait
+		job_nr=$(($job_nr + 1))
+	done
+
+	if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
+	then
+		echo "Log(s) of failed test run(s) can be found in:"
+		for f in $(cat "$stressfail")
+		do
+			echo "  $TEST_RESULTS_BASE.stress-$f.out"
+		done
+	fi
+	exit
+	;;
+esac
 
 # if --tee was passed, write the output not only to the terminal, but
 # additionally to the file test-results/$BASENAME.out, too.
@@ -80,7 +153,7 @@ case "$GIT_TEST_TEE_STARTED, $* " in
 done,*)
 	# do not redirect again
 	;;
-*' --tee '*|*' --va'*|*' -V '*|*' --verbose-log '*)
+*' --tee '*|*' --va'*|*' -V '*|*' --verbose-log '*|*' --stress '*|*' '--stress=*' '*)
 	mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
 
 	# Make this filename available to the sub-process in case it is using
@@ -341,6 +414,9 @@ do
 	-V|--verbose-log)
 		verbose_log=t
 		shift ;;
+	--stress|--stress=*)
+		verbose_log=t
+		shift ;;
 	*)
 		echo "error: unknown test option '$1'" >&2; exit 1 ;;
 	esac
@@ -1028,7 +1104,7 @@ then
 fi
 
 # Test repository
-TRASH_DIRECTORY="trash directory.$TEST_NAME"
+TRASH_DIRECTORY="trash directory.$TEST_NAME$TEST_STRESS_SFX"
 test -n "$root" && TRASH_DIRECTORY="$root/$TRASH_DIRECTORY"
 case "$TRASH_DIRECTORY" in
 /*) ;; # absolute path is good
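The per-job port adjustment in the test_set_port hunk is just an offset by the job number from GIT_TEST_STRESS_JOB_NR. A standalone version (the `stress_port` helper name is made up here; the real code assigns the result with `eval $var=$port` instead of printing it) illustrates the effect:

```shell
# Standalone sketch of the port offset applied by test_set_port: each
# parallel --stress job shifts the chosen port by its job number, so
# daemons started by concurrent jobs listen on distinct ports.
stress_port () {
	echo $(($1 + ${GIT_TEST_STRESS_JOB_NR:-0}))
}

GIT_TEST_STRESS_JOB_NR=3
stress_port 5700	# job 3 gets port 5703
unset GIT_TEST_STRESS_JOB_NR
stress_port 5700	# outside --stress the port is unchanged: 5700
```

The `:-0` default means the same code path works both inside a stress job and in an ordinary single-test run.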
Unfortunately, we have a few flaky tests, whose failures tend to be
hard to reproduce.  We've found that the best we can do to reproduce
such a failure is to run the test repeatedly while the machine is
under load, and wait in the hope that the load creates enough variance
in the timing of the test's commands that a failure is eventually
triggered.  I have a command to do that, and I noticed that two other
contributors have rolled their own scripts to do the same, all
choosing slightly different approaches.

To help reproduce failures in flaky tests, introduce the '--stress'
option to run a test script repeatedly in multiple parallel
invocations until one of them fails, thereby using the test script
itself to increase the load on the machine.

The number of parallel invocations is determined by, in order of
precedence: the number specified as '--stress=<N>', or the value of
the GIT_TEST_STRESS_LOAD environment variable, or twice the number of
available processors in '/proc/cpuinfo', or 8.

To prevent the several parallel invocations of the same test from
interfering with each other:

  - Include the parallel job's number in the name of the trash
    directory and the various output files under 't/test-results/' as
    a '.stress-<Nr>' suffix.

  - Add the parallel job's number to the port number specified by the
    user or to the test number, so even tests involving daemons
    listening on a TCP socket can be stressed.

  - Make '--stress' imply '--verbose-log' and discard the test's
    standard output and error; dumping the output of several parallel
    tests to the terminal would create a big ugly mess.

'wait' for all parallel jobs before exiting (either because a failure
was found or because the user lost patience and aborted the stress
test), allowing the still running tests to finish.  Otherwise the "OK
X.Y" progress output from the last iteration would likely arrive after
the user got back the shell prompt, interfering with typing in the
next command.
OTOH, this waiting might induce a considerable delay between hitting
ctrl-C and the test actually exiting; I'm not sure this is the right
tradeoff.

Based on Jeff King's 'stress' script.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/README                | 13 ++++++-
 t/test-lib-functions.sh |  7 +++-
 t/test-lib.sh           | 82 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 96 insertions(+), 6 deletions(-)
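The job-count precedence described in the commit message can be sketched as a small standalone function (the `stress_job_count` name is made up here; the patch itself inlines this logic in test-lib.sh's option handling):

```shell
# Sketch of the job-count precedence: --stress=<N> wins, then the
# GIT_TEST_STRESS_LOAD environment variable, then twice the number of
# processors counted in /proc/cpuinfo, then a fallback of 8.
stress_job_count () {
	# $1 is the <N> from --stress=<N>, if any
	if test -n "$1"
	then
		echo "$1"
	elif test -n "$GIT_TEST_STRESS_LOAD"
	then
		echo "$GIT_TEST_STRESS_LOAD"
	elif test -r /proc/cpuinfo
	then
		echo $((2 * $(grep -c ^processor /proc/cpuinfo)))
	else
		echo 8
	fi
}

stress_job_count 4	# explicit --stress=4 wins: prints 4
GIT_TEST_STRESS_LOAD=16
stress_job_count	# no <N>, so the environment decides: prints 16
```

The "twice the processor count" branch is the only non-portable one (it depends on /proc/cpuinfo), which is what the review discussion above suggests replacing with `getconf _NPROCESSORS_ONLN`.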