diff mbox series

ci(linux32): make Javascript Actions work in x86 mode

Message ID pull.1790.git.1726274559928.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series ci(linux32): make Javascript Actions work in x86 mode | expand

Commit Message

Johannes Schindelin Sept. 14, 2024, 12:42 a.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

In February 2023, older `actions/upload-artifact` were deprecated:
https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/
This was recently followed by brown-outs.

However, the `linux32` job relied on those, as there are well-documented
problems (see https://github.com/actions/runner/issues/2115 for example)
running modern, Javascript-based Actions in 32-bit only containers.

To get the CI builds to work again, a work-around was implemented in
https://lore.kernel.org/git/20240912094841.GD589828@coredump.intra.peff.net
to let the 32-bit container make use of the 64-bit node 20 provided by
the Actions runner.

This, however, runs the risk of using 64-bit executables when we
purposefully chose a Docker image that only contains 32-bit bits and
pieces so that accidental use of 64-bit libraries or executables would
not happen.

Let's go about this the other way round instead, by overriding the amd64
version of node 20 the Actions runner provides with an x86 one (which is
"officially unofficial" by virtue of being hosted on
unofficial-builds.nodejs.org).

This allows us to stop using the now-deprecated versions of
`actions/checkout` and `actions/upload-artifact` before these Actions
became Javascript-based Actions.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    ci(linux32): do make Javascript Actions work
    
    I propose this alternative to 9c261856c91 (ci: use regular action
    versions for linux32 job, 2024-09-12), keeping the updates to Javascript
    Actions.
    
    The benefit is that it keeps the 32-bit container used in the linux32
    job clean of any 64-bit libraries so that we don't accidentally end up
    testing 64-bit stuff without wanting it.
    
    For good measure, I also reported this problem with that deprecation at
    https://github.com/actions/upload-artifact/issues/616 even though I know
    that the GitHub Actions team saw a headcount-losing reorg recently and
    therefore I do not really expect that they have any bandwidth to help
    with this. So this work-around is the best we can do for now, I believe.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1790%2Fdscho%2Fuse-x86-node-in-linux32-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1790/dscho/use-x86-node-in-linux32-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1790

 .github/workflows/main.yml | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)


base-commit: 87dc3914693da5febad427161fc6bdfeed3426c7

Comments

Jeff King Sept. 14, 2024, 7:29 a.m. UTC | #1
On Sat, Sep 14, 2024 at 12:42:39AM +0000, Johannes Schindelin via GitGitGadget wrote:

> To get the CI builds to work again, a work-around was implemented in
> https://lore.kernel.org/git/20240912094841.GD589828@coredump.intra.peff.net
> to let the 32-bit container make use of the 64-bit node 20 provided by
> the Actions runner.
> 
> This, however, runs the risk of using 64-bit executables when we
> purposefully chose a Docker image that only contains 32-bit bits and
> pieces so that accidental use of 64-bit libraries or executables would
> not happen.

How big a risk is this? In my experience with multiarch systems, the
difficulty is the opposite: convincing the non-base toolchain to work at
all, rather than using it accidentally. Especially as the solution in my
patch is not configuring apt for multiarch support, where you could
accidentally install gcc:amd64 versus gcc:i386. Even if we were to
accidentally bring in the cross-platform compiler via the
gcc-10-x86-64-linux-gnu package, you'd have to point $(CC) at it
explicitly.

But maybe others have had a different experience?

> Let's go about this the other way round instead, by overriding the amd64
> version of node 20 the Actions runner provides with an x86 one (which is
> "officially unofficial" by virtue of being hosted on
> unofficial-builds.nodejs.org).

I'm not totally opposed to this direction, but I'm a little concerned
about the stability/maintenance of the solution. In particular:

> +        NODE_URL=https://unofficial-builds.nodejs.org/download/release/v20.17.0/node-v20.17.0-linux-x86.tar.gz &&

Will this URL work forever? Looking at the release/ directory, it looks
like it should hang around. They have entries going back to 2019 (which
is not all that old, but I suspect that's when they started the build
repository).

The flip side is: will node20 be sufficient for Actions forever? Node16
was already deprecated in Actions in 2023, and it was released in 2021
(looks like Node releases have a 2-year lifespan in general). So node20
takes us to April 2026 or so.

Of course my solution has similar problems. Probably node24 or whatever
comes next will need another glibc bump.

I was mildly surprised that the build you reference didn't run into that
problem, actually. But I double-checked and it appears to run OK in
Xenial; it needs 2.17, according to some "objdump -T" sleuthing. I'm not
sure why that's so different from the official 64-bit builds.

> +        curl -Lo /tmp/node.tar.gz $NODE_URL &&
> +        tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz

This is pretty intimate with how Actions work (both the node20 version
and the "/__e/" magic). It's hard to say if/when that would bite us.

>     For good measure, I also reported this problem with that deprecation at
>     https://github.com/actions/upload-artifact/issues/616 even though I know
>     that the GitHub Actions team saw a headcount-losing reorg recently and
>     therefore I do not really expect that they have any bandwidth to help
>     with this. So this work-around is the best we can do for now, I believe.

I think it's reasonably well known there already. Here are some relevant
issues I came across:

  - https://github.com/actions/checkout/issues/334
  - https://github.com/actions/checkout/issues/1590
  - https://github.com/actions/setup-node/issues/922
  - https://github.com/actions/runner/issues/2906

Most folks are hitting the glibc issue rather than the i386 one, but the
core of the problem is the same.

IMHO the ultimate solution is a statically-linked node binary (you'd
still be relying on the kernel, but that has a very good track record of
userspace compatibility).

-Peff
Junio C Hamano Sept. 14, 2024, 5:17 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> I'm not totally opposed to this direction, but I'm a little concerned
> about the stability/maintenance of the solution. In particular:
>
>> +        NODE_URL=https://unofficial-builds.nodejs.org/download/release/v20.17.0/node-v20.17.0-linux-x86.tar.gz &&
> ...
>> +        curl -Lo /tmp/node.tar.gz $NODE_URL &&
>> +        tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz
>
> This is pretty intimate with how Actions work (both the node20 version
> and the "/__e/" magic). It's hard to say if/when that would bite us.

Thanks for clearly expressing the uneasiness I felt, which I could
not turn into concrete words, when I saw the patch first time.

Each of these approaches may have its pros and cons, but I somehow
do not see that the newly proposed alternative is 10x better than
what was reviewed and queued already to be worth the effort to
replace it.

Thanks, both.
Jeff King Sept. 15, 2024, 11:07 a.m. UTC | #3
On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote:

> Each of these approaches may have its pros and cons, but I somehow
> do not see that the newly proposed alternative is 10x better than
> what was reviewed and queued already to be worth the effort to
> replace it.

That's my feeling, too, but I'd reserve final judgement to see Dscho's
response; it's possible I am under-estimating the 32/64-bit confusion
risk.

I'd also note that his patch does not require bumping the distro
version, which would let us continue testing that old version in GitHub
Actions. That might be worth considering.

-Peff
Junio C Hamano Sept. 15, 2024, 4:25 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote:
>
>> Each of these approaches may have its pros and cons, but I somehow
>> do not see that the newly proposed alternative is 10x better than
>> what was reviewed and queued already to be worth the effort to
>> replace it.
>
> That's my feeling, too, but I'd reserve final judgement to see Dscho's
> response; it's possible I am under-estimating the 32/64-bit confusion
> risk.

FWIW, what you said matches my recollection from years ago ;-) back
when I had to deal with that.

> I'd also note that his patch does not require bumping the distro
> version, which would let us continue testing that old version in GitHub
> Actions. That might be worth considering.

Yes, that is true.

Considering that 16.04 has passed its expiration date for standard
support a few years ago, I am not sure how many more years of
practical/unsupported use and testing we would be getting by giving
cycles for the release in CI, though.

Thanks.
Johannes Schindelin Sept. 17, 2024, 12:20 p.m. UTC | #5
Hi,

On Sun, 15 Sep 2024, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote:
> >
> >> Each of these approaches may have its pros and cons, but I somehow
> >> do not see that the newly proposed alternative is 10x better than
> >> what was reviewed and queued already to be worth the effort to
> >> replace it.

Installing lib64stdc++ indeed does not rely on the implementation detail
that `/__e/node20/` contains the Node version used to execute the Actions
in those Docker containers.

Of course, the fact that installing lib64stdc++ (and no other 64-bit
library) "fixes" the 64-bit Node version is also an implementation detail.

Between GitHub Actions' and Node's development speed, I personally would
expect the latter implementation detail to be changed many times over
before the former implementation detail would be changed.

In particular given that mapping the "externals" by any other name than
`/__e/` risks breaking existing GitHub workflows that might make use of
exactly that directory name, I consider the chances for that name change
to be negligible. It probably won't change, ever.

Of course, my favorite solution would be for `actions/runner` to be fixed
so that it detects the situation and uses a 32-bit variant in that case
[*1*].

Business forces work against this wish, though, so I don't see this
happening.

The next best thing, in my mind, is to come up with a solution that is
general enough that other projects could follow this example, which is
what I did.

And yes, the idea of mixing 32-bit and 64-bit things in a container that
was specifically used to only have 32-bit things still does not convince
me, it still looks like a much better idea to either stick with a
32-bit-only container, or to just do away with the complexity of a
container altogether if the environment does not need to be free of 64-bit
anyway (but why did we bother with that in the first place, then?).

Ciao,
Johannes

Footnote *1*: I have added quite a few details about this here:
https://github.com/actions/upload-artifact/issues/616#issuecomment-2350667347
Junio C Hamano Sept. 17, 2024, 3:01 p.m. UTC | #6
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Of course, my favorite solution would be for `actions/runner` to be fixed
> so that it detects the situation and uses a 32-bit variant in that case
> [*1*].

Yeah, that would be ideal.

> The next best thing, in my mind, is to come up with a solution that is
> general enough that other projects could follow this example, which is
> what I did.

I guess both patterns can be followed if they discover either, and
it made me wonder if this isn't a problem already solved by others,
but at the end of the day, any solution that was good enough and
unblocks us quickly was needed; the ones that were reviewed earlier
have been fast tracked down to 'maint' as of last night.  That does
not mean we won't have to further improve it, of course.  At least
until 32-bit archs no longer matter, that is ;-).

Thanks.
Jeff King Sept. 19, 2024, 7:05 a.m. UTC | #7
On Tue, Sep 17, 2024 at 02:20:41PM +0200, Johannes Schindelin wrote:

> Installing lib64stdc++ indeed does not rely on the implementation detail
> that `/__e/node20/` contains the Node version used to execute the Actions
> in those Docker containers.
> 
> Of course, the fact that installing lib64stdc++ (and no other 64-bit
> library) "fixes" the 64-bit Node version is also an implementation detail.

Yes, though "provide a 64-bit c/c++ runtime" does not seem all that
exotic. I'd be much more worried about the implicit library version
dependencies that exist, but that is true even on 64-bit systems. As
shown in the issues I linked earlier, lots of people are hitting a
similar problem with containers that have an older glibc.

The real solution there (IMHO) is a statically linked node in the runner
images, but I'm not sure of the reasons they haven't pursued that yet.

In the absence of that, the fact that your solution uses a node build
which (for reasons I'm still not sure I understand) seems to be OK with
an older glibc feels like a compelling reason. In fact, I find that much
more compelling than the risk of 32/64-bit confusion. I think part of
what I was responding to was the focus on that in your commit message.

> In particular given that mapping the "externals" by any other name than
> `/__e/` risks breaking existing GitHub workflows that might make use of
> exactly that directory name, I consider the chances for that name change
> to be negligible. It probably won't change, ever.

Fair enough. Going into this whole problem, I was not clear where /__e/
was coming from. I had thought at first it was something being carried
along by the Actions themselves (and thus action-specific and likely to
change). But it looks like it is part of the runner image and just
mounted into the container volume, so it is a magic keyword that Actions
and runner images both have to depend on.

> Of course, my favorite solution would be for `actions/runner` to be fixed
> so that it detects the situation and uses a 32-bit variant in that case
> [*1*].

Yes, me too (and preferably statically linked :) ).

> And yes, the idea of mixing 32-bit and 64-bit things in a container that
> was specifically used to only have 32-bit things still does not convince
> me, it still looks like a much better idea to either stick with a
> 32-bit-only container, or to just do away with the complexity of a
> container altogether if the environment does not need to be free of 64-bit
> anyway (but why did we bother with that in the first place, then?).

I think 32-bit builds directly inside a 64-bit runner image is a good
way to do the thing you were initially worried about: accidentally using
tools of the wrong type. Doing the whole build and test within the
32-bit container is something we'd want to keep.

The other alternative, which neither of us shown patches for, but which
I mentioned (courtesy of Ed) in the original thread is: do the Actions
outside the 32-bit container, run docker ourselves mounting the repo,
and then build and test inside the container. That's apparently how
libgit2 does it. It sidesteps the issue entirely, as the container never
runs anything external. And it would be applicable to all projects, no
matter what's in their containers (32-bit, old distros, even qemu of
alternate architectures).

I'm not sure how much work it would be to do, though.

Anyway, it sounds like Junio has merged my patches down to get things
moving again. I'm OK if you want to rebase your proposed fix on top of
that.

-Peff
diff mbox series

Patch

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 97f9b063109..25a5f5f0e29 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -348,19 +348,27 @@  jobs:
       jobname: ${{matrix.vector.jobname}}
       distro: ${{matrix.vector.distro}}
     runs-on: ubuntu-latest
-    container: ${{matrix.vector.image}}
+    container:
+      image: ${{matrix.vector.image}}
+      volumes:
+        # override /__e/node20 on 32-bit because it is 64-bit
+        - /tmp:/__e${{matrix.vector.jobname != 'linux32' && '-x86' || ''}}/node20
     steps:
-    - uses: actions/checkout@v4
-      if: matrix.vector.jobname != 'linux32'
-    - uses: actions/checkout@v1 # cannot be upgraded because Node.js Actions aren't supported in this container
+    - name: prepare x86 variant of node20
       if: matrix.vector.jobname == 'linux32'
+      run: |
+        apt -q update && apt -q -y install curl &&
+        NODE_URL=https://unofficial-builds.nodejs.org/download/release/v20.17.0/node-v20.17.0-linux-x86.tar.gz &&
+        curl -Lo /tmp/node.tar.gz $NODE_URL &&
+        tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz
+    - uses: actions/checkout@v4
     - run: ci/install-dependencies.sh
     - run: ci/run-build-and-tests.sh
     - name: print test failures
       if: failure() && env.FAILED_TEST_ARTIFACTS != ''
       run: ci/print-test-failures.sh
     - name: Upload failed tests' directories
-      if: failure() && env.FAILED_TEST_ARTIFACTS != '' && matrix.vector.jobname != 'linux32'
+      if: failure() && env.FAILED_TEST_ARTIFACTS != ''
       uses: actions/upload-artifact@v4
       with:
         name: failed-tests-${{matrix.vector.jobname}}