GitHub ci(windows): speed up initializing Git for Windows' minimal SDK again

Message ID pull.1841.git.1734447458896.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 55d62306eeada186154fd538cb79efd579f7d9f6

Commit Message

Johannes Schindelin Dec. 17, 2024, 2:57 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

It used to be the case that initializing the minimal SDK (i.e. a
radically slimmed-down subset of Git for Windows' development
environment intended to perform the CI builds and little else) took
a bit over one minute, would then be cached, and subsequent jobs would
take at most half a dozen seconds to initialize said minimal SDK.

It is important that this step is fast because we have to run the test
suite in parallel, in a set of matrix jobs, to offset the slowness of
the shell-based test suite, and each and every job has to initialize the
very same minimal SDK.

While it may sound as if parallelizing the jobs merely spends more of
the generously-provided build minutes while the _wallclock_ time at
least passes quickly, in reality it matters a lot: frequently, Git for
Windows' or GitGitGadget PRs get stuck waiting for quite a while before
their CI builds start because other PRs' builds still take substantial
amounts of time to run and block them once the concurrency limit has
been reached.

Since 91839a88277 (ci: create script to set up Git for Windows SDK,
2024-10-09), the situation has worsened: every job that requires the
minimal Git for Windows SDK spends roughly two-and-a-half minutes doing
so.

With the switch away from the GitHub Action `setup-git-for-windows-sdk`,
we incurred more downsides:

- It is no longer possible for said Action to fix problems independently
  from the Git repository, e.g. when new rules about GitHub Actions
  require changes in the way the minimal SDK is initialized.

- The Action installed the minimal SDK outside of the worktree on
  purpose, so as neither to clutter it nor to incur an additional cost
  when verifying that the worktree is clean; the script installs it
  inside the worktree instead.

Therefore, even if it would be nice to have a shared process between
GitHub and GitLab based CI builds, let's switch the GitHub-based CI back
to the tried-and-tested `setup-git-for-windows-sdk` Action.

This commit partially reverts 91839a88277 (ci: create script to set up
Git for Windows SDK, 2024-10-09).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Speed up the Git for Windows SDK initialization again
    
    While waiting for way too many builds in
    https://github.com/gitgitgadget/git/actions to finish, I noticed that
    the minimal Git for Windows SDK initialization now takes a whopping 2.5
    minutes. That's way too much. It used to take a little over a minute
    when uncached, and 2-5 seconds when cached.
    
    Let's fix this regression by reverting to using the
    setup-git-for-windows-sdk GitHub Action (also because that Action will
    soon see another dramatic speed-up, see
    https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965).

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1841%2Fdscho%2Fci-windows-jobs-speedup-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1841/dscho/ci-windows-jobs-speedup-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1841

 .github/workflows/main.yml | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)


base-commit: 631ddbbcbd912530e1b78e5d782e72879f7f1fb2

Comments

Junio C Hamano Dec. 17, 2024, 8:33 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> This commit partially reverts 91839a88277 (ci: create script to set up
> Git for Windows SDK, 2024-10-09).

Thanks, will queue.

> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> index 9301a1edd6d..916a64b6736 100644
> --- a/.github/workflows/main.yml
> +++ b/.github/workflows/main.yml
> @@ -113,15 +113,13 @@ jobs:
>        cancel-in-progress: ${{ needs.ci-config.outputs.skip_concurrent == 'yes' }}
>      steps:
>      - uses: actions/checkout@v4
> -    - name: setup SDK
> -      shell: powershell
> -      run: ci/install-sdk.ps1
> +    - uses: git-for-windows/setup-git-for-windows-sdk@v1
>      - name: build
> -      shell: powershell
> +      shell: bash
>        env:
>          HOME: ${{runner.workspace}}
>          NO_PERL: 1
> -      run: git-sdk/usr/bin/bash.exe -l -c 'ci/make-test-artifacts.sh artifacts'
> +      run: . /etc/profile && ci/make-test-artifacts.sh artifacts
>      - name: zip up tracked files
>        run: git archive -o artifacts/tracked.tar.gz HEAD
>      - name: upload tracked files and build artifacts
> @@ -149,12 +147,10 @@ jobs:
>      - name: extract tracked files and build artifacts
>        shell: bash
>        run: tar xf artifacts.tar.gz && tar xf tracked.tar.gz
> -    - name: setup SDK
> -      shell: powershell
> -      run: ci/install-sdk.ps1
> +    - uses: git-for-windows/setup-git-for-windows-sdk@v1
>      - name: test
> -      shell: powershell
> -      run: git-sdk/usr/bin/bash.exe -l -c 'ci/run-test-slice.sh ${{matrix.nr}} 10'
> +      shell: bash
> +      run: . /etc/profile && ci/run-test-slice.sh ${{matrix.nr}} 10
>      - name: print test failures
>        if: failure() && env.FAILED_TEST_ARTIFACTS != ''
>        shell: bash
>
> base-commit: 631ddbbcbd912530e1b78e5d782e72879f7f1fb2

Patrick Steinhardt Dec. 18, 2024, 5:56 a.m. UTC | #2
On Tue, Dec 17, 2024 at 12:33:10PM -0800, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > This commit partially reverts 91839a88277 (ci: create script to set up
> > Git for Windows SDK, 2024-10-09).
> 
> Thanks, will queue.

Okay. Too bad that things regressed this badly with the change. I agree
that reverting is the right thing to do for now. I may revisit this
again in the next release cycle to investigate whether we can get it up
to par with the GitHub Actions. It would be great if the build infra was
shared between our CIs, so I think there's some value in it. But if the
answer is "no" then I guess that's ultimately fine, as well.

Thanks!

Patrick

Johannes Schindelin Dec. 19, 2024, 1:20 p.m. UTC | #3
Hi Patrick,

On Wed, 18 Dec 2024, Patrick Steinhardt wrote:

> On Tue, Dec 17, 2024 at 12:33:10PM -0800, Junio C Hamano wrote:
> > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> > writes:
> >
> > > This commit partially reverts 91839a88277 (ci: create script to set up
> > > Git for Windows SDK, 2024-10-09).
> >
> > Thanks, will queue.
>
> Okay. Too bad that things regressed this badly with the change. I agree
> that reverting is the right thing to do for now. I may revisit this
> again in the next release cycle to investigate whether we can get it up
> to par with the GitHub Actions.

The way I implemented it should be eminently possible in PowerShell, too.

Something like `Start-Process`, launching not `Invoke-WebRequest` but
instead `C:\Windows\system32\curl.exe` [*1*] to download the `.tar.gz`
file (not the `.zip` file, more about that below). A second
`Start-Process` should then execute `tar.exe` [*2*], with the `stdout`
of the former piped to the latter [*3*].

I do think that it makes a lot of sense to start extracting as soon as the
first byte arrives instead of storing the archive as a temporary file and
extracting it only once it has arrived completely.
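
To illustrate the streaming idea, here is a rough sketch in Python rather
than PowerShell. It is not taken from the linked PR; the helper names
`stream_into` and `download_and_extract` are made up for this example, and
it assumes that `curl` and `tar` are on the `PATH` and that the target
directory already exists.

```python
import subprocess
import sys


def stream_into(producer_cmd, consumer_cmd):
    """Run both commands concurrently, feeding the producer's stdout
    into the consumer's stdin.

    Returns the pair (producer exit code, consumer exit code).
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Drop our copy of the pipe's read end so that the producer notices
    # a broken pipe if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc


def download_and_extract(url, dest_dir):
    """Extract a .tar.gz while it is still downloading (no temporary file)."""
    # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
    download = ["curl", "-fsSL", url]
    # "-f -" makes tar read the archive from stdin; dest_dir must exist
    extract = ["tar", "-xzf", "-", "-C", dest_dir]
    return stream_into(download, extract) == (0, 0)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python stream_sdk.py <archive-url> <existing-dir>
    sys.exit(0 if download_and_extract(sys.argv[1], sys.argv[2]) else 1)
```

Both processes run at the same time, so extraction can begin as soon as the
first bytes arrive; checking both exit codes catches a failed download as
well as a failed extraction.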

Now, why not use the `.zip` file instead of the `.tar.gz` file? In my
analysis [*4*], I pointed out that the `.zip` file is about 10MB smaller,
after all, and BSD tar (at least the version in `C:\Windows\system32`) is
able to handle those, too, right? Not so fast. In my experiments, when
streaming the `.zip` file to the `tar.exe -xf -` process, the `etc/`
and `.sparse/` directories were consistently dropped. A bug, I guess. I
ran out of time to investigate this in more depth.

Since the artifacts are now hosted in a GitHub Release and updated
regularly, and since those updates cannot be atomic (you can only upload a
release asset if no asset of the same name exists already, read: the
automation has to _delete_ that asset before uploading a new version), it
would also be good to expect that the file may be intermittently absent
and to retry with a back-off strategy [*5*].
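
A minimal sketch of such a retry loop, again in Python for illustration
only. The exponential schedule, the default values and the name
`retry_with_backoff` are assumptions made for this example; the commit
referenced in the footnote may well do it differently.

```python
import time


def retry_with_backoff(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure.

    The last failure is re-raised once `attempts` tries have been used up.
    """
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits base_delay, then 2x, 4x, ... before the next try
            sleep(base_delay * 2 ** attempt)
```

Here `action` would be whatever performs the download and raises when the
release asset is temporarily missing; the `sleep` parameter exists only so
that the waiting can be replaced in tests.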

> It would be great if the build infra was shared between our CIs, so I
> think there's some value in it. But if the answer is "no" then I guess
> that's ultimately fine, as well.

It _could_ be done. But the advantages of having it versioned outside of
the Git repository outweigh the benefits of that shared infrastructure, I
believe.

Ciao,
Johannes

Footnote *1*:
https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965/commits/6db65223de699c4f75ab083f82f43947a53ad6ff#diff-6855ef61b94227f9264adab3ff9f2de95c2d7b4e451019cc0105896d32550eb0R58-R73

Footnote *2*:
https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965/commits/6db65223de699c4f75ab083f82f43947a53ad6ff#diff-6855ef61b94227f9264adab3ff9f2de95c2d7b4e451019cc0105896d32550eb0R77-R86

Footnote *3*:
https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965/commits/6db65223de699c4f75ab083f82f43947a53ad6ff#diff-6855ef61b94227f9264adab3ff9f2de95c2d7b4e451019cc0105896d32550eb0R88

Footnote *4*:
https://github.com/git-for-windows/git-sdk-64/pull/87/commits/fdb0cea373893ce7d40bcfcfbeb7fd091a3c4020

Footnote *5*:
https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965/commits/3d4ea07041d0740b21160a9d9a4181f569e706d8
Patch

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 9301a1edd6d..916a64b6736 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -113,15 +113,13 @@  jobs:
       cancel-in-progress: ${{ needs.ci-config.outputs.skip_concurrent == 'yes' }}
     steps:
     - uses: actions/checkout@v4
-    - name: setup SDK
-      shell: powershell
-      run: ci/install-sdk.ps1
+    - uses: git-for-windows/setup-git-for-windows-sdk@v1
     - name: build
-      shell: powershell
+      shell: bash
       env:
         HOME: ${{runner.workspace}}
         NO_PERL: 1
-      run: git-sdk/usr/bin/bash.exe -l -c 'ci/make-test-artifacts.sh artifacts'
+      run: . /etc/profile && ci/make-test-artifacts.sh artifacts
     - name: zip up tracked files
       run: git archive -o artifacts/tracked.tar.gz HEAD
     - name: upload tracked files and build artifacts
@@ -149,12 +147,10 @@  jobs:
     - name: extract tracked files and build artifacts
       shell: bash
       run: tar xf artifacts.tar.gz && tar xf tracked.tar.gz
-    - name: setup SDK
-      shell: powershell
-      run: ci/install-sdk.ps1
+    - uses: git-for-windows/setup-git-for-windows-sdk@v1
     - name: test
-      shell: powershell
-      run: git-sdk/usr/bin/bash.exe -l -c 'ci/run-test-slice.sh ${{matrix.nr}} 10'
+      shell: bash
+      run: . /etc/profile && ci/run-test-slice.sh ${{matrix.nr}} 10
     - name: print test failures
       if: failure() && env.FAILED_TEST_ARTIFACTS != ''
       shell: bash