diff mbox series

[v2] ci: avoid running the test suite _twice_

Message ID pull.1613.v2.git.1699907108371.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit e7e03ef9950df84d60b093ab925ccf34a8453369
Headers show
Series [v2] ci: avoid running the test suite _twice_ | expand

Commit Message

Johannes Schindelin Nov. 13, 2023, 8:25 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

This is a late amendment of 4a6e4b960263 (CI: remove Travis CI support,
2021-11-23), whereby the `.prove` file (being written by the `prove`
command that is used to run the test suite) is no longer retained
between CI builds: This feature was only ever used in the Travis CI
builds, we tried for a while to do the same in Azure Pipelines CI runs
(but I gave up on it after a while), and we never used that feature in
GitHub Actions (nor does the new GitLab CI code use it).

Retaining the Prove cache has been fragile from the start, even though
the idea seemed good at the time, the idea being that the `.prove` file
caches information about previous `prove` runs (`save`) and uses them
(`slow`) to run the tests in the order from longer-running to shorter
ones, making optimal use of the parallelism implied by `--jobs=<N>`.

However, using a Prove cache can cause some surprising behavior: When
the `prove` caches information about a test script it has run,
subsequent `prove` runs (with `--state=slow`) will run the same test
script again even if said script is not specified on the `prove`
command-line!

So far, this bug did not matter. Right until d8f416bbb87c (ci: run unit
tests in CI, 2023-11-09) did it not matter.

But starting with that commit, we invoke `prove` _twice_ in CI, once to
run the regular test suite of regression test scripts, and once to run
the unit tests. Due to the bug, the second invocation re-runs all of the
tests that were already run as part of the first invocation. This not
only wastes build minutes, it also frequently causes the `osx-*` jobs to
fail because they already take a long time and now are likely to run
into a timeout.

The worst part about it is that there is actually no benefit to keep
running with `--state=slow,save`, ever since we decided no longer to
try to reuse the Prove cache between CI runs.

So let's just drop that Prove option and live happily ever after.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    ci: avoid running the test suite twice
    
    This
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166] is
    an example of a osx-* job that times out. Here
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166#step:4:839],
    it is running t0013, and here
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166#step:4:2765],
    it is run again (in the middle of the entire test suite, as part of make
    unit-tests).
    
    While this tries to fix a bug uncovered by js/doc-unit-tests, to avoid
    merge conflicts, this is based on ps/ci-gitlab.
    
    Changes since v1:
    
     * Reworded the commit message, specifically avoiding a reference to a
       patch that has not been upstreamed from Git for Windows yet.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1613%2Fdscho%2Favoid-running-test-suite-twice-in-ci-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1613/dscho/avoid-running-test-suite-twice-in-ci-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1613

Range-diff vs v1:

 1:  c74eda36246 ! 1:  880f2c19478 ci: avoid running the test suite _twice_
     @@ Metadata
       ## Commit message ##
          ci: avoid running the test suite _twice_
      
     -    This is a late amendment of 19ec39aab54 (ci: stop linking the `prove`
     -    cache, 2022-07-10), fixing a bug that had been hidden so far.
     -
     -    The bug is that the `.prove` cache stores information about previous
     -    `prove` runs (`save`) and uses them (`slow`, to run the tests in the
     -    order from longer-running to shorter ones).
     -
     -    This bug can cause some surprising behavior: when the Prove cache
     -    contains a reference to a test script, subsequent `prove` runs (with
     -    `--state=slow`) will run the same test script again even if said script
     -    is not specified on the `prove` command-line!
     +    This is a late amendment of 4a6e4b960263 (CI: remove Travis CI support,
     +    2021-11-23), whereby the `.prove` file (being written by the `prove`
     +    command that is used to run the test suite) is no longer retained
     +    between CI builds: This feature was only ever used in the Travis CI
     +    builds, we tried for a while to do the same in Azure Pipelines CI runs
     +    (but I gave up on it after a while), and we never used that feature in
     +    GitHub Actions (nor does the new GitLab CI code use it).
     +
     +    Retaining the Prove cache has been fragile from the start, even though
     +    the idea seemed good at the time, the idea being that the `.prove` file
     +    caches information about previous `prove` runs (`save`) and uses them
     +    (`slow`) to run the tests in the order from longer-running to shorter
     +    ones, making optimal use of the parallelism implied by `--jobs=<N>`.
     +
     +    However, using a Prove cache can cause some surprising behavior: When
     +    the `prove` caches information about a test script it has run,
     +    subsequent `prove` runs (with `--state=slow`) will run the same test
     +    script again even if said script is not specified on the `prove`
     +    command-line!
      
          So far, this bug did not matter. Right until d8f416bbb87c (ci: run unit
     -    tests in CI, 2023-11-09) it did not matter.
     -
     -    But starting with that commit, we run `prove` _twice_ in CI, and with
     -    completely different sets of tests to run. Due to the bug, the second
     -    invocation re-runs all of the tests that were already run as part of the
     -    first invocation. This not only wastes build minutes, it also frequently
     -    causes the `osx-*` jobs to fail because they already take a long time
     -    and now are likely to run into a timeout.
     +    tests in CI, 2023-11-09) did it not matter.
     +
     +    But starting with that commit, we invoke `prove` _twice_ in CI, once to
     +    run the regular test suite of regression test scripts, and once to run
     +    the unit tests. Due to the bug, the second invocation re-runs all of the
     +    tests that were already run as part of the first invocation. This not
     +    only wastes build minutes, it also frequently causes the `osx-*` jobs to
     +    fail because they already take a long time and now are likely to run
     +    into a timeout.
      
          The worst part about it is that there is actually no benefit to keep
          running with `--state=slow,save`, ever since we decided no longer to


 ci/lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 0e3b67e2aa25edb7e1a5c999c87b52a7b3a7649a

Comments

Junio C Hamano Nov. 14, 2023, 12:24 a.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

>     While this tries to fix a bug uncovered by js/doc-unit-tests, to avoid
>     merge conflicts, this is based on ps/ci-gitlab.

Excellent choice of the base, as dropping the state is the right
thing to do regardless of the "unit-tests" thing (as Peff said),
and the refactoring done by ps/ci-gitlab topic is what makes the
problematic state=failed,slow,save spread wider and affects more
jobs.

Will queue.  Thanks.
diff mbox series

Patch

diff --git a/ci/lib.sh b/ci/lib.sh
index 6dfc90d7f53..307a8df0b5a 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -281,7 +281,7 @@  else
 fi
 
 MAKEFLAGS="$MAKEFLAGS --jobs=$JOBS"
-GIT_PROVE_OPTS="--timer --jobs $JOBS --state=failed,slow,save"
+GIT_PROVE_OPTS="--timer --jobs $JOBS"
 
 GIT_TEST_OPTS="$GIT_TEST_OPTS --verbose-log -x"
 case "$CI_OS_NAME" in