Message ID | pull.1790.git.1726274559928.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | ci(linux32): make Javascript Actions work in x86 mode | expand |
On Sat, Sep 14, 2024 at 12:42:39AM +0000, Johannes Schindelin via GitGitGadget wrote: > To get the CI builds to work again, a work-around was implemented in > https://lore.kernel.org/git/20240912094841.GD589828@coredump.intra.peff.net > to let the 32-bit container make use of the 64-bit node 20 provided by > the Actions runner. > > This, however, runs the risk of using 64-bit executables when we > purposefully chose a Docker image that only contains 32-bit bits and > pieces so that accidental use of 64-bit libraries or executables would > not happen. How big a risk is this? In my experience with multiarch systems, the difficulty is the opposite: convincing the non-base toolchain to work at all, rather than using it accidentally. Especially as the solution in my patch is not configuring apt for multiarch support, where you could accidentally install gcc:amd64 versus gcc:i386. Even if we were to accidentally bring in the cross-platform compiler via the gcc-10-x86-64-linux-gnu package, you'd have to point $(CC) at it explicitly. But maybe others have had a different experience? > Let's go about this the other way round instead, by overriding the amd64 > version of node 20 the Actions runner provides with an x86 one (which is > "officially unofficial" by virtue of being hosted on > unofficial-builds.nodejs.org). I'm not totally opposed to this direction, but I'm a little concerned about the stability/maintenance of the solution. In particular: > + NODE_URL=https://unofficial-builds.nodejs.org/download/release/v20.17.0/node-v20.17.0-linux-x86.tar.gz && Will this URL work forever? Looking at the release/ directory, it looks like it should hang around. They have entries going back to 2019 (which is not all that old, but I suspect that's when they started the build repository). The flip side is: will node20 be sufficient for Actions forever? Node16 was already deprecated in Actions in 2023, and it was released in 2021 (looks like Node releases have a 2-year lifespan in general). So node20 takes us to April 2026 or so. Of course my solution has similar problems. Probably node24 or whatever comes next will need another glibc bump. I was mildly surprised that the build you reference didn't run into that problem, actually. But I double-checked and it appears to run OK in Xenial; it needs 2.17, according to some "objdump -T" sleuthing. I'm not sure why that's so different from the official 64-bit builds. > + curl -Lo /tmp/node.tar.gz $NODE_URL && > + tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz This is pretty intimate with how Actions work (both the node20 version and the "/__e/" magic). It's hard to say if/when that would bite us. > For good measure, I also reported this problem with that deprecation at > https://github.com/actions/upload-artifact/issues/616 even though I know > that the GitHub Actions team saw a headcount-losing reorg recently and > therefore I do not really expect that they have any bandwidth to help > with this. So this work-around is the best we can do for now, I believe. I think it's reasonably well known there already. Here are some relevant issues I came across: - https://github.com/actions/checkout/issues/334 - https://github.com/actions/checkout/issues/1590 - https://github.com/actions/setup-node/issues/922 - https://github.com/actions/runner/issues/2906 Most folks are hitting the glibc issue rather than the i386 one, but the core of the problem is the same. IMHO the ultimate solution is a statically-linked node binary (you'd still be relying on the kernel, but that has a very good track record of userspace compatibility). -Peff
Jeff King <peff@peff.net> writes: > I'm not totally opposed to this direction, but I'm a little concerned > about the stability/maintenance of the solution. In particular: > >> + NODE_URL=https://unofficial-builds.nodejs.org/download/release/v20.17.0/node-v20.17.0-linux-x86.tar.gz && > ... >> + curl -Lo /tmp/node.tar.gz $NODE_URL && >> + tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz > > This is pretty intimate with how Actions work (both the node20 version > and the "/__e/" magic). It's hard to say if/when that would bite us. Thanks for clearly expressing the uneasiness I felt, which I could not turn into concrete words, when I saw the patch first time. Each of these approaches may have its pros and cons, but I somehow do not see that the newly proposed alternative is 10x better than what was reviewed and queued already to be worth the effort to replace it. Thanks, both.
On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote: > Each of these approaches may have its pros and cons, but I somehow > do not see that the newly proposed alternative is 10x better than > what was reviewed and queued already to be worth the effort to > replace it. That's my feeling, too, but I'd reserve final judgement to see Dscho's response; it's possible I am under-estimating the 32/64-bit confusion risk. I'd also note that his patch does not require bumping the distro version, which would let us continue testing that old version in GitHub Actions. That might be worth considering. -Peff
Jeff King <peff@peff.net> writes: > On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote: > >> Each of these approaches may have its pros and cons, but I somehow >> do not see that the newly proposed alternative is 10x better than >> what was reviewed and queued already to be worth the effort to >> replace it. > > That's my feeling, too, but I'd reserve final judgement to see Dscho's > response; it's possible I am under-estimating the 32/64-bit confusion > risk. FWIW, what you said matches my recollection from years ago ;-) back when I had to deal with that. > I'd also note that his patch does not require bumping the distro > version, which would let us continue testing that old version in GitHub > Actions. That might be worth considering. Yes, that is true. Considering that 16.04 has passed its expiration date for standard support a few years ago, I am not sure how many more years of practical/unsupported use and testing we would be getting by giving cycles for the release in CI, though. Thanks.
Hi, On Sun, 15 Sep 2024, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote: > > > >> Each of these approaches may have its pros and cons, but I somehow > >> do not see that the newly proposed alternative is 10x better than > >> what was reviewed and queued already to be worth the effort to > >> replace it. Installing lib64stdc++ indeed does not rely on the implementation detail that `/__e/node20/` contains the Node version used to execute the Actions in those Docker containers. Of course, the fact that installing lib64stdc++ (and no other 64-bit library) "fixes" the 64-bit Node version is also an implementation detail. Between GitHub Actions' and Node's development speed, I personally would expect the latter implementation detail to be changed many times over before the former implementation detail would be changed. In particular given that mapping the "externals" by any other name than `/__e/` risks breaking existing GitHub workflows that might make use of exactly that directory name, I consider the chances for that name change to be negligible. It probably won't change, ever. Of course, my favorite solution would be for `actions/runner` to be fixed so that it detects the situation and uses a 32-bit variant in that case [*1*]. Business forces work against this wish, though, so I don't see this happening. The next best thing, in my mind, is to come up with a solution that is general enough that other projects could follow this example, which is what I did. And yes, the idea of mixing 32-bit and 64-bit things in a container that was specifically used to only have 32-bit things still does not convince me, it still looks like a much better idea to either stick with a 32-bit-only container, or to just do away with the complexity of a container altogether if the environment does not need to be free of 64-bit anyway (but why did we bother with that in the first place, then?). Ciao, Johannes Footnote *1*: I have added quite a few details about this here: https://github.com/actions/upload-artifact/issues/616#issuecomment-2350667347
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Of course, my favorite solution would be for `actions/runner` to be fixed > so that it detects the situation and uses a 32-bit variant in that case > [*1*]. Yeah, that would be ideal. > The next best thing, in my mind, is to come up with a solution that is > general enough that other projects could follow this example, which is > what I did. I guess both patterns can be followed if they discover either, and it made me wonder if this isn't a problem already solved by others, but at the end of the day, any solution that was good enough and unblocks us quickly was needed; the ones that were reviewed earlier have been fast tracked down to 'maint' as of last night. That does not mean we won't have to further improve it, of course. At least until 32-bit archs no longer matter, that is ;-). Thanks.
On Tue, Sep 17, 2024 at 02:20:41PM +0200, Johannes Schindelin wrote: > Installing lib64stdc++ indeed does not rely on the implementation detail > that `/__e/node20/` contains the Node version used to execute the Actions > in those Docker containers. > > Of course, the fact that installing lib64stdc++ (and no other 64-bit > library) "fixes" the 64-bit Node version is also an implementation detail. Yes, though "provide a 64-bit c/c++ runtime" does not seem all that exotic. I'd be much more worried about the implicit library version dependencies that exist, but that is true even on 64-bit systems. As shown in the issues I linked earlier, lots of people are hitting a similar problem with containers that have an older glibc. The real solution there (IMHO) is a statically linked node in the runner images, but I'm not sure of the reasons they haven't pursued that yet. In the absence of that, the fact that your solution uses a node build which (for reasons I'm still not sure I understand) seems to be OK with an older glibc feels like a compelling reason. In fact, I find that much more compelling than the risk of 32/64-bit confusion. I think part of what I was responding to was the focus on that in your commit message. > In particular given that mapping the "externals" by any other name than > `/__e/` risks breaking existing GitHub workflows that might make use of > exactly that directory name, I consider the chances for that name change > to be negligible. It probably won't change, ever. Fair enough. Going into this whole problem, I was not clear where /__e/ was coming from. I had thought at first it was something being carried along by the Actions themselves (and thus action-specific and likely to change). But it looks like it is part of the runner image and just mounted into the container volume, so it is a magic keyword that Actions and runner images both have to depend on. > Of course, my favorite solution would be for `actions/runner` to be fixed > so that it detects the situation and uses a 32-bit variant in that case > [*1*]. Yes, me too (and preferably statically linked :) ). > And yes, the idea of mixing 32-bit and 64-bit things in a container that > was specifically used to only have 32-bit things still does not convince > me, it still looks like a much better idea to either stick with a > 32-bit-only container, or to just do away with the complexity of a > container altogether if the environment does not need to be free of 64-bit > anyway (but why did we bother with that in the first place, then?). I think 32-bit builds directly inside a 64-bit runner image is a good way to do the thing you were initially worried about: accidentally using tools of the wrong type. Doing the whole build and test within the 32-bit container is something we'd want to keep. The other alternative, which neither of us shown patches for, but which I mentioned (courtesy of Ed) in the original thread is: do the Actions outside the 32-bit container, run docker ourselves mounting the repo, and then build and test inside the container. That's apparently how libgit2 does it. It sidesteps the issue entirely, as the container never runs anything external. And it would be applicable to all projects, no matter what's in their containers (32-bit, old distros, even qemu of alternate architectures). I'm not sure how much work it would be to do, though. Anyway, it sounds like Junio has merged my patches down to get things moving again. I'm OK if you want to rebase your proposed fix on top of that. -Peff
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 97f9b063109..25a5f5f0e29 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -348,19 +348,27 @@ jobs: jobname: ${{matrix.vector.jobname}} distro: ${{matrix.vector.distro}} runs-on: ubuntu-latest - container: ${{matrix.vector.image}} + container: + image: ${{matrix.vector.image}} + volumes: + # override /__e/node20 on 32-bit because it is 64-bit + - /tmp:/__e${{matrix.vector.jobname != 'linux32' && '-x86' || ''}}/node20 steps: - - uses: actions/checkout@v4 - if: matrix.vector.jobname != 'linux32' - - uses: actions/checkout@v1 # cannot be upgraded because Node.js Actions aren't supported in this container + - name: prepare x86 variant of node20 if: matrix.vector.jobname == 'linux32' + run: | + apt -q update && apt -q -y install curl && + NODE_URL=https://unofficial-builds.nodejs.org/download/release/v20.17.0/node-v20.17.0-linux-x86.tar.gz && + curl -Lo /tmp/node.tar.gz $NODE_URL && + tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz + - uses: actions/checkout@v4 - run: ci/install-dependencies.sh - run: ci/run-build-and-tests.sh - name: print test failures if: failure() && env.FAILED_TEST_ARTIFACTS != '' run: ci/print-test-failures.sh - name: Upload failed tests' directories - if: failure() && env.FAILED_TEST_ARTIFACTS != '' && matrix.vector.jobname != 'linux32' + if: failure() && env.FAILED_TEST_ARTIFACTS != '' uses: actions/upload-artifact@v4 with: name: failed-tests-${{matrix.vector.jobname}}