diff mbox series

ci: avoid running the test suite _twice_

Message ID pull.1613.git.1699894837844.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series ci: avoid running the test suite _twice_ | expand

Commit Message

Johannes Schindelin Nov. 13, 2023, 5 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

This is a late amendment of 19ec39aab54 (ci: stop linking the `prove`
cache, 2022-07-10), fixing a bug that had been hidden so far.

The bug is that the `.prove` cache stores information about previous
`prove` runs (`save`) and uses them (`slow`, to run the tests in the
order from longer-running to shorter ones).

This bug can cause some surprising behavior: when the Prove cache
contains a reference to a test script, subsequent `prove` runs (with
`--state=slow`) will run the same test script again even if said script
is not specified on the `prove` command-line!

So far, this bug did not matter. Right until d8f416bbb87c (ci: run unit
tests in CI, 2023-11-09) it did not matter.

But starting with that commit, we run `prove` _twice_ in CI, and with
completely different sets of tests to run. Due to the bug, the second
invocation re-runs all of the tests that were already run as part of the
first invocation. This not only wastes build minutes, it also frequently
causes the `osx-*` jobs to fail because they already take a long time
and now are likely to run into a timeout.

The worst part about it is that there is actually no benefit to keep
running with `--state=slow,save`, ever since we decided no longer to
try to reuse the Prove cache between CI runs.

So let's just drop that Prove option and live happily ever after.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    ci: avoid running the test suite twice
    
    This
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166] is
    an example of a osx-* job that times out. Here
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166#step:4:839],
    it is running t0013, and here
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166#step:4:2765],
    it is run again (in the middle of the entire test suite, as part of make
    unit-tests).
    
    While this tries to fix a bug uncovered by js/doc-unit-tests, to avoid
    merge conflicts, this is based on ps/ci-gitlab.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1613%2Fdscho%2Favoid-running-test-suite-twice-in-ci-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1613/dscho/avoid-running-test-suite-twice-in-ci-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1613

 ci/lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 0e3b67e2aa25edb7e1a5c999c87b52a7b3a7649a

Comments

Jeff King Nov. 13, 2023, 6:49 p.m. UTC | #1
On Mon, Nov 13, 2023 at 05:00:37PM +0000, Johannes Schindelin via GitGitGadget wrote:

> This is a late amendment of 19ec39aab54 (ci: stop linking the `prove`
> cache, 2022-07-10), fixing a bug that had been hidden so far.

We don't seem to have that commit in Junio's tree; it is only in
git-for-windows.

Not that we should not fix things if they are broken, but I am trying
to understand if git/git is experiencing the same bug. It sounds like
not yet, though from looking at 19ec39aab54, I would expect to get these
doubled runs any time we store the prove state. But maybe without that
commit our state-file symlink is going somewhere invalid, and prove
fails to actually store anything?

> But starting with that commit, we run `prove` _twice_ in CI, and with
> completely different sets of tests to run. Due to the bug, the second
> invocation re-runs all of the tests that were already run as part of the
> first invocation. This not only wastes build minutes, it also frequently
> causes the `osx-*` jobs to fail because they already take a long time
> and now are likely to run into a timeout.
> 
> The worst part about it is that there is actually no benefit to keep
> running with `--state=slow,save`, ever since we decided no longer to
> try to reuse the Prove cache between CI runs.
> 
> So let's just drop that Prove option and live happily ever after.

Yes, I think this is the right thing to do regardless. If we are not
saving the state to use between two related runs, there is no point
storing it in the first place.

I do have to wonder, though, as somebody who did not follow the
unit-test topic closely: why are the unit tests totally separate from
the rest of the suite? I would think we'd want them run from one or more
t/t*.sh scripts. That would make bugs like this impossible, but also:

  1. They'd be run via "make test", so developers don't have to remember
     to run them separately.

  2. They can be run in parallel with all of the other tests when using
     "prove -j", etc.

-Peff
Junio C Hamano Nov. 13, 2023, 11:55 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> I do have to wonder, though, as somebody who did not follow the
> unit-test topic closely: why are the unit tests totally separate from
> the rest of the suite? I would think we'd want them run from one or more
> t/t*.sh scripts. That would make bugs like this impossible, but also:
>
>   1. They'd be run via "make test", so developers don't have to remember
>      to run them separately.
>
>   2. They can be run in parallel with all of the other tests when using
>      "prove -j", etc.

Very good points.  Josh?
Josh Steadmon Nov. 14, 2023, 9:29 p.m. UTC | #3
On 2023.11.14 08:55, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > I do have to wonder, though, as somebody who did not follow the
> > unit-test topic closely: why are the unit tests totally separate from
> > the rest of the suite? I would think we'd want them run from one or more
> > t/t*.sh scripts. That would make bugs like this impossible, but also:
> >
> >   1. They'd be run via "make test", so developers don't have to remember
> >      to run them separately.
> >
> >   2. They can be run in parallel with all of the other tests when using
> >      "prove -j", etc.
> 
> Very good points.  Josh?

In short, the last time I tried to add something to CI, it was not well
received, so I've been perhaps overly cautious in keeping the unit-tests
well-separated from other targets. But I can send a follow-up patch to
fold them into `make test`. Or would you prefer that I send a v11 of
js/doc-unit-tests instead?
Junio C Hamano Nov. 15, 2023, 1 a.m. UTC | #4
Josh Steadmon <steadmon@google.com> writes:

> On 2023.11.14 08:55, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>> 
>> > I do have to wonder, though, as somebody who did not follow the
>> > unit-test topic closely: why are the unit tests totally separate from
>> > the rest of the suite? I would think we'd want them run from one or more
>> > t/t*.sh scripts. That would make bugs like this impossible, but also:
>> >
>> >   1. They'd be run via "make test", so developers don't have to remember
>> >      to run them separately.
>> >
>> >   2. They can be run in parallel with all of the other tests when using
>> >      "prove -j", etc.
>> 
>> Very good points.  Josh?
>
> In short, the last time I tried to add something to CI, it was not well
> received, so I've been perhaps overly cautious in keeping the unit-tests
> well-separated from other targets. But I can send a follow-up patch to
> fold them into `make test`. Or would you prefer that I send a v11 of
> js/doc-unit-tests instead?

Incremental patches to update what is in 'next' would let us try out
the new arragement to drive the tests from the main "make test"
eaarlier.  Post release, a new iteration could replace the series
wholesale as we will have an opportunity to rebuild 'next', but it
would be nice for the end states to match, if you were to do both.

Thanks.
Josh Steadmon Nov. 15, 2023, 9:28 p.m. UTC | #5
On 2023.11.13 13:49, Jeff King wrote:
> On Mon, Nov 13, 2023 at 05:00:37PM +0000, Johannes Schindelin via GitGitGadget wrote:
> 
> > This is a late amendment of 19ec39aab54 (ci: stop linking the `prove`
> > cache, 2022-07-10), fixing a bug that had been hidden so far.
> 
> We don't seem to have that commit in Junio's tree; it is only in
> git-for-windows.
> 
> Not that we should not fix things if they are broken, but I am trying
> to understand if git/git is experiencing the same bug. It sounds like
> not yet, though from looking at 19ec39aab54, I would expect to get these
> doubled runs any time we store the prove state. But maybe without that
> commit our state-file symlink is going somewhere invalid, and prove
> fails to actually store anything?
> 
> > But starting with that commit, we run `prove` _twice_ in CI, and with
> > completely different sets of tests to run. Due to the bug, the second
> > invocation re-runs all of the tests that were already run as part of the
> > first invocation. This not only wastes build minutes, it also frequently
> > causes the `osx-*` jobs to fail because they already take a long time
> > and now are likely to run into a timeout.
> > 
> > The worst part about it is that there is actually no benefit to keep
> > running with `--state=slow,save`, ever since we decided no longer to
> > try to reuse the Prove cache between CI runs.
> > 
> > So let's just drop that Prove option and live happily ever after.
> 
> Yes, I think this is the right thing to do regardless. If we are not
> saving the state to use between two related runs, there is no point
> storing it in the first place.
> 
> I do have to wonder, though, as somebody who did not follow the
> unit-test topic closely: why are the unit tests totally separate from
> the rest of the suite? I would think we'd want them run from one or more
> t/t*.sh scripts. That would make bugs like this impossible, but also:
> 
>   1. They'd be run via "make test", so developers don't have to remember
>      to run them separately.
> 
>   2. They can be run in parallel with all of the other tests when using
>      "prove -j", etc.

The first part is easy, but I don't see a good way to get both shell
tests and unit tests executing under the same `prove` process. For shell
tests, we pass `--exec '$(TEST_SHELL_PATH_SQ)'` to prove, meaning that
we use the specified shell as an interpreter for the test files. That
will not work for unit test executables.

We could bundle all the unit tests into a single shell script, but then
we lose parallelization and add hoops to jump through to determine what
breaks. Or we could autogenerate a corresponding shell script to run
each individual unit test, but that seems gross. Of course, these are
hypothetical concerns for now, since we only have a single unit test at
the moment.

There's also the issue that the shell test arguments we pass on from
prove would be shared with the unit tests. That's fine for now, as
t-strbuf doesn't accept any runtime arguments, but it's possible that
either the framework or individual unit tests might grow to need
arguments, and it might not be convenient to stay compatible with the
shell tests.

Personally, I lean towards keeping things simple and just running a
second `prove` process as part of `make test`. If I was forced to pick a
way to get everything under one process, I'd lean towards autogenerating
individual shell script wrappers for each unit test. But I'm open to
discussion, especially if people have other approaches I haven't thought
of.
Johannes Schindelin Nov. 16, 2023, 8:42 a.m. UTC | #6
Hi Josh,

On Wed, 15 Nov 2023, Josh Steadmon wrote:

> On 2023.11.13 13:49, Jeff King wrote:
> >
> > why are the unit tests totally separate from the rest of the suite? I
> > would think we'd want them run from one or more t/t*.sh scripts. That
> > would make bugs like this impossible, but also:
> >
> >   1. They'd be run via "make test", so developers don't have to remember
> >      to run them separately.
> >
> >   2. They can be run in parallel with all of the other tests when using
> >      "prove -j", etc.
>
> The first part is easy, but I don't see a good way to get both shell
> tests and unit tests executing under the same `prove` process. For shell
> tests, we pass `--exec '$(TEST_SHELL_PATH_SQ)'` to prove, meaning that
> we use the specified shell as an interpreter for the test files. That
> will not work for unit test executables.

Probably my favorite aspect about the new unit tests is that they avoid
using the error-prone, unintuitive and slow shell scripts and stay within
the programming language of the code that is to be tested: C.

> We could bundle all the unit tests into a single shell script, but then
> we lose parallelization and add hoops to jump through to determine what
> breaks. Or we could autogenerate a corresponding shell script to run
> each individual unit test, but that seems gross. Of course, these are
> hypothetical concerns for now, since we only have a single unit test at
> the moment.

I totally agree with you, Josh, that it makes little sense to
try to contort the unit tests to be run in the same `prove` run as the
regression tests that need to be invoked so totally differently.

> There's also the issue that the shell test arguments we pass on from
> prove would be shared with the unit tests. That's fine for now, as
> t-strbuf doesn't accept any runtime arguments, but it's possible that
> either the framework or individual unit tests might grow to need
> arguments, and it might not be convenient to stay compatible with the
> shell tests.
>
> Personally, I lean towards keeping things simple and just running a
> second `prove` process as part of `make test`.

Agreed.

> If I was forced to pick a way to get everything under one process, I'd
> lean towards autogenerating individual shell script wrappers for each
> unit test. But I'm open to discussion, especially if people have other
> approaches I haven't thought of.

One alternative would be to avoid running the unit tests via `prove` in
the first place.

For example, we could use the helper from be5d88e11280 (test-tool
run-command: learn to run (parts of) the testsuite, 2019-10-04) [*1*]. It
would probably need a few improvements, but certainly no wizardry nor
witchcraft would be required. It would also help on Windows, where running
a simple test helper written in C is vastly faster than running a complex
Perl script (which `prove` is).

Ciao,
Johannes

Footnote *1*: I had always wanted to improve that test helper to the point
where it could replace our use of `prove`, at least on Windows. It seems,
however, that as of 4c2c38e800f3 (ci: modification of main.yml to use
cmake for vs-build job, 2020-06-26) we do not use the helper at all
anymore. Hopefully it can still be useful. 
Phillip Wood Nov. 16, 2023, 3:05 p.m. UTC | #7
On 16/11/2023 08:42, Johannes Schindelin wrote:
> On Wed, 15 Nov 2023, Josh Steadmon wrote:
>> On 2023.11.13 13:49, Jeff King wrote:
>> We could bundle all the unit tests into a single shell script, but then
>> we lose parallelization and add hoops to jump through to determine what
>> breaks. Or we could autogenerate a corresponding shell script to run
>> each individual unit test, but that seems gross. Of course, these are
>> hypothetical concerns for now, since we only have a single unit test at
>> the moment.
> 
> I totally agree with you, Josh, that it makes little sense to
> try to contort the unit tests to be run in the same `prove` run as the
> regression tests that need to be invoked so totally differently.

FWIW that's my feeling too. It makes sense for "make test" to run the 
unit tests, but wrapping the unit tests in one or more shell scripts 
adds unnecessary complexity.

Best Wishes

Phillip
Jeff King Nov. 16, 2023, 8:02 p.m. UTC | #8
On Wed, Nov 15, 2023 at 01:28:49PM -0800, Josh Steadmon wrote:

> The first part is easy, but I don't see a good way to get both shell
> tests and unit tests executing under the same `prove` process. For shell
> tests, we pass `--exec '$(TEST_SHELL_PATH_SQ)'` to prove, meaning that
> we use the specified shell as an interpreter for the test files. That
> will not work for unit test executables.

Yes, it's unfortunate that you can't set the "exec" flag per-script
(especially because without --exec it will auto-detect the right thing,
but then of course it won't use TEST_SHELL_PATH). But we can intercept
and do it ourselves, like:

diff --git a/t/Makefile b/t/Makefile
index 225aaf78ed..0b7c028eea 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -61,7 +61,7 @@ failed:
 	test -z "$$failed" || $(MAKE) $$failed
 
 prove: pre-clean check-chainlint $(TEST_LINT)
-	@echo "*** prove ***"; $(CHAINLINTSUPPRESS) $(PROVE) --exec '$(TEST_SHELL_PATH_SQ)' $(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)
+	@echo "*** prove ***"; TEST_SHELL_PATH='$(TEST_SHELL_PATH_SQ)' $(CHAINLINTSUPPRESS) $(PROVE) --exec ./run-test.sh $(GIT_PROVE_OPTS) $(T) $(UNIT_TESTS) :: $(GIT_TEST_OPTS)
 	$(MAKE) clean-except-prove-cache
 
 $(T):
diff --git a/t/run-test.sh b/t/run-test.sh
new file mode 100755
index 0000000000..69944029c8
--- /dev/null
+++ b/t/run-test.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+case "$1" in
+*.sh)
+	exec ${TEST_SHELL_PATH:-/bin/sh} "$@"
+	;;
+*)
+	exec "$@"
+	;;
+esac

You can actually do this inside the prove script using their plugin
interface, but the necessary bits are somewhat arcane.

> We could bundle all the unit tests into a single shell script, but then
> we lose parallelization and add hoops to jump through to determine what
> breaks. Or we could autogenerate a corresponding shell script to run
> each individual unit test, but that seems gross. Of course, these are
> hypothetical concerns for now, since we only have a single unit test at
> the moment.

We can't just stick them all in a single script; there must be exactly
one "plan" line in the TAP output from a given source. I had imagined
just manually adding a thin wrapper for each ("t9970-unit-strbuf" or
something). But it would also be easy to autogenerate them while
compiling. (Although all of that is moot with the wrapper I showed
above).

> There's also the issue that the shell test arguments we pass on from
> prove would be shared with the unit tests. That's fine for now, as
> t-strbuf doesn't accept any runtime arguments, but it's possible that
> either the framework or individual unit tests might grow to need
> arguments, and it might not be convenient to stay compatible with the
> shell tests.

Sharing the options between the two seems like a benefit to me. I'd
think that "-v" and "-i" would be useful, at least. Options which don't
apply (e.g., "--root") could be quietly ignored.

-Peff
Josh Steadmon Jan. 4, 2024, 11:54 p.m. UTC | #9
On 2023.11.16 09:42, Johannes Schindelin wrote:
> Hi Josh,
> 
> On Wed, 15 Nov 2023, Josh Steadmon wrote:

[snip]

> > If I was forced to pick a way to get everything under one process, I'd
> > lean towards autogenerating individual shell script wrappers for each
> > unit test. But I'm open to discussion, especially if people have other
> > approaches I haven't thought of.
> 
> One alternative would be to avoid running the unit tests via `prove` in
> the first place.
> 
> For example, we could use the helper from be5d88e11280 (test-tool
> run-command: learn to run (parts of) the testsuite, 2019-10-04) [*1*]. It
> would probably need a few improvements, but certainly no wizardry nor
> witchcraft would be required. It would also help on Windows, where running
> a simple test helper written in C is vastly faster than running a complex
> Perl script (which `prove` is).
> 
> Ciao,
> Johannes
> 
> Footnote *1*: I had always wanted to improve that test helper to the point
> where it could replace our use of `prove`, at least on Windows. It seems,
> however, that as of 4c2c38e800f3 (ci: modification of main.yml to use
> cmake for vs-build job, 2020-06-26) we do not use the helper at all
> anymore. Hopefully it can still be useful. 
diff mbox series

Patch

diff --git a/ci/lib.sh b/ci/lib.sh
index 6dfc90d7f53..307a8df0b5a 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -281,7 +281,7 @@  else
 fi
 
 MAKEFLAGS="$MAKEFLAGS --jobs=$JOBS"
-GIT_PROVE_OPTS="--timer --jobs $JOBS --state=failed,slow,save"
+GIT_PROVE_OPTS="--timer --jobs $JOBS"
 
 GIT_TEST_OPTS="$GIT_TEST_OPTS --verbose-log -x"
 case "$CI_OS_NAME" in