mbox series

[v4,00/10] A couple of CI improvements

Message ID 20250110-b4-pks-ci-fixes-v4-0-6e4613446080@pks.im (mailing list archive)
Headers show
Series A couple of CI improvements | expand

Message

Patrick Steinhardt Jan. 10, 2025, 11:31 a.m. UTC
Hi,

this patch series addresses a couple of issues I've found while
investigating flaky CI jobs. Besides two more fixes for flaky jobs it
also removes some stale code and simplifies the setup on GitHub Actions
to always use containerized jobs on Linux.

Test runs can be found for GitLab [1] and GitHub [2].

Changes in v2:

  - Expand a bit on the reasoning behind the conversion to use
    containerized jobs.
  - Fix commit message typo.
  - Properly fix the race in t7422 via pipe stuffing, as proposed by
    Peff.
  - Link to v1: https://lore.kernel.org/r/20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im

Changes in v3:

  - Another iteration on the SIGPIPE test, which should now finally plug
    the race.
  - Link to v2: https://lore.kernel.org/r/20250106-b4-pks-ci-fixes-v2-0-06ae540771b7@pks.im

Changes in v4:

  - Improve the commit message of the SIGPIPE test commit to more
    accurately describe the race.
  - Link to v3: https://lore.kernel.org/r/20250107-b4-pks-ci-fixes-v3-0-546a0ebc8481@pks.im

Thanks!

Patrick

[1]: https://gitlab.com/gitlab-org/git/-/merge_requests/277
[2]: https://github.com/git/git/pull/1865

---
Patrick Steinhardt (10):
      t0060: fix EBUSY in MinGW when setting up runtime prefix
      t7422: fix flaky test caused by buffered stdout
      github: adapt containerized jobs to be rootless
      github: convert all Linux jobs to be containerized
      github: simplify computation of the job's distro
      gitlab-ci: remove the "linux-old" job
      gitlab-ci: add linux32 job testing against i386
      ci: stop special-casing for Ubuntu 16.04
      ci: use latest Ubuntu release
      ci: remove stale code for Azure Pipelines

 .github/workflows/main.yml  | 78 ++++++++++++++++++++++-----------------------
 .gitlab-ci.yml              | 19 ++++++-----
 ci/install-dependencies.sh  |  6 ++--
 ci/lib.sh                   | 34 +++-----------------
 ci/print-test-failures.sh   |  5 ---
 t/t0060-path-utils.sh       | 10 +++---
 t/t7422-submodule-output.sh | 43 ++++++++++++++++++++++---
 7 files changed, 100 insertions(+), 95 deletions(-)

Range-diff versus v3:

 1:  324b174988 =  1:  ca4dc636aa t0060: fix EBUSY in MinGW when setting up runtime prefix
 2:  cc095aa2b1 !  2:  d41c00feb1 t7422: fix flaky test caused by buffered stdout
    @@ Commit message
             error: last command exited with $?=1
             not ok 18 - git submodule status --recursive propagates SIGPIPE
     
    -    The issue is caused by us using grep(1) to terminate the pipe on the
    -    first matching line in the recursing git-submodule(1) process. Standard
    -    streams are typically buffered though, so this condition is racy and may
    -    cause us to terminate the pipe after git-submodule(1) has already
    -    exited, and in that case we wouldn't see the expected signal.
    +    The issue is caused by a race between git-submodule(1) and grep(1):
    +
    +      1. git-submodule(1) (or its child process) writes the first X/S line
    +         we're trying to match.
    +
    +      2. grep(1) matches the line.
    +
    +      3a. grep(1) exits, closing the pipe.
    +
    +      3b. git-submodule(1) (or its child process) writes the rest of its
    +      lines.
    +
    +    Steps 3a and 3b happen at the same time without any guarantees. If 3a
    +    happens first, we get SIGPIPE. Otherwise, we don't and the test fails.
     
         Fix the issue by generating a couple thousand nested submodules and
         matching on the first nested submodule. This ensures that the recursive
         git-submodule(1) process completely fills its stdout buffer, which makes
         subsequent writes block until the downstream consumer of the pipe either
    -    fully drains it or closes it.
    +    reads more or closes it.
     
         To verify that this works as expected one can apply the following patch
         to the preimage of this commit, which used to reliably trigger the race:
 3:  73c98e7628 =  3:  6593e3307c github: adapt containerized jobs to be rootless
 4:  4d8f2bdce7 =  4:  9997698c6e github: convert all Linux jobs to be containerized
 5:  4a987aa42e =  5:  65797c51dc github: simplify computation of the job's distro
 6:  bd44700668 =  6:  687ae0c3f2 gitlab-ci: remove the "linux-old" job
 7:  e095c6757c =  7:  974229776b gitlab-ci: add linux32 job testing against i386
 8:  f885740877 =  8:  112ea61a6b ci: stop special-casing for Ubuntu 16.04
 9:  17b19dc51e =  9:  465cd85898 ci: use latest Ubuntu release
10:  95ce4406c7 = 10:  d671ee1f7f ci: remove stale code for Azure Pipelines

---
base-commit: 1b4e9a5f8b5f048972c21fe8acafe0404096f694
change-id: 20250103-b4-pks-ci-fixes-2d0a23fb5c78

Comments

Jeff King Jan. 10, 2025, 12:03 p.m. UTC | #1
On Fri, Jan 10, 2025 at 12:31:56PM +0100, Patrick Steinhardt wrote:

> Changes in v4:
> 
>   - Improve the commit message of the SIGPIPE test commit to more
>     accurately describe the race.

Thank you for addressing my nits. :) The result looks good to me.

-Peff