Message ID | pull.1072.v2.git.1635998463474.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | b4fce699145b488772820ec69f1654910e5c793d |
Headers | show |
Series | [v2] async_die_is_recursing: work around GCC v11.x issue on Fedora | expand |
On Thu, Nov 04, 2021 at 04:01:03AM +0000, Victoria Dye via GitGitGadget wrote: > Changes since V1 > ================ > > * Change &ret to &async_die_counter ("dummy" argument to > pthread_setspecific) so that it's using a global (rather than local) > variable > * Fix typo in commit message (pthread_getspecific -> > pthread_setspecific) Thanks, this version looks good to me! -Peff
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Victoria Dye <vdye@github.com> > > This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI > build, first appearing after the introduction of a new version of the Fedora > docker image version. This image includes a version of `glibc` with the > attribute `__attr_access_none` added to `pthread_setspecific` [1], the > implementation of which only exists for GCC 11.X - the version included in > the Fedora image. The attribute requires that the pointer provided in the > second argument of `pthread_getspecific` must, if not NULL, be a pointer to > a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not > valid, causing the error. > > This fix imitates a workaround added in SELinux [2] by using the pointer to > the static `async_die_counter` itself as the second argument to > `pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the > intent of the current usage while not triggering the build error. > > [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8 > [2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/ > > Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> > Signed-off-by: Victoria Dye <vdye@github.com> > --- Looks like they timed their update to disrupt us most effectively, but we are gifted with watchful eyes and competent hands ;-). Thanks for coming up with a clearly written description and a clean fix so quickly. Will apply.
Hi Victoria, excellent work! The patch looks very good to me. I just want to add a little context below, and thank you _so much_ for letting me sleep while you tie it all up in a neat patch. On Thu, 4 Nov 2021, Victoria Dye via GitGitGadget wrote: > [...] This image includes a version of `glibc` with the attribute > `__attr_access_none` added to `pthread_setspecific` [1], the > implementation of which only exists for GCC 11.X - the version included > in the Fedora image. The attribute requires that the pointer provided in > the second argument of `pthread_getspecific` must, if not NULL, be a > pointer to a valid object. My initial reading last night was that the `none` mode means that the pointer does not have to be valid, but you're right, the documentation at https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html clearly spells it out: Unless the pointer is null the pointed-to object must exist and have at least the size as denoted by the size-index argument. The object need not be initialized. The mode is intended to be used as a means to help validate the expected object size, for example in functions that call __builtin_object_size. Which means that yes, `(void *)1` was incorrect and _had_ to be changed. The patch is therefore a fix, not a work-around (contrary to my initial impression). So then I got puzzled by this while sleeping: why does it fail on Fedora, when it does _not_ fail in Git for Windows' SDK (where we also recently upgraded to GCC v11.x, thanks to the hard work of the MSYS2 project)? My best guess is that my GCC (which is v11.2.0) and Fedora's GCC (which is v11.2.1 plus the usual Fedora customization) behave slightly differently with respect to that optional `size-index` argument: `pthread_setspecific()` is essentially declared without a `size-index`, so I guess GCC v11.2.1 probably defaults to `size-index = 1`. > diff --git a/run-command.c b/run-command.c > index 7ef5cc712a9..f40df01c772 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -1099,7 +1099,7 @@ static NORETURN void die_async(const char *err, va_list params) > static int async_die_is_recursing(void) > { > void *ret = pthread_getspecific(async_die_counter); > - pthread_setspecific(async_die_counter, (void *)1); > + pthread_setspecific(async_die_counter, &async_die_counter); /* set to any non-NULL valid pointer */ This looks good! To make the intention even clearer, we could change the line above to `int ret = !!pthread_getspecific(async_die_counter);`, but as we are _really_ late in the -rc cycle, I am very much in favor of leaving out such clean-ups that do not fix any bug. Again, thank you so much for your hard work, it was fun debugging this with you, Dscho > return ret != NULL; > } > > > base-commit: 876b1423317071f43c99666f3fc3db3642dfbe14 > -- > gitgitgadget >
On Wed, Nov 03 2021, Junio C Hamano wrote: > "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Victoria Dye <vdye@github.com> >> >> This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI >> build, first appearing after the introduction of a new version of the Fedora >> docker image version. This image includes a version of `glibc` with the >> attribute `__attr_access_none` added to `pthread_setspecific` [1], the >> implementation of which only exists for GCC 11.X - the version included in >> the Fedora image. The attribute requires that the pointer provided in the >> second argument of `pthread_getspecific` must, if not NULL, be a pointer to >> a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not >> valid, causing the error. >> >> This fix imitates a workaround added in SELinux [2] by using the pointer to >> the static `async_die_counter` itself as the second argument to >> `pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the >> intent of the current usage while not triggering the build error. >> >> [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8 >> [2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/ >> >> Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> >> Signed-off-by: Victoria Dye <vdye@github.com> >> --- > > Looks like they timed their update to disrupt us most effectively, > but we are gifted with watchful eyes and competent hands ;-). > > Thanks for coming up with a clearly written description and a clean > fix so quickly. > > Will apply. I don't know what this would be for the "fedora" and other images, but it seems to me like the below is something we should do. This replaces "latest" with whatever "latest" currently maps onto. I.e. I don't think it's a good thing that the carpet gets swept from under you as far as these CI images go. We could subscribe to some feed of when these images are bumped to see when to update, but having our base change from under us just leads to a waste of time for a bunch of people wondering why their CI is failing, which now they'll need to rebase on some on-list or only-in-upstream-master patch to have it "pass". diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 6ed6a9e8076..6b7dab01269 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -7,7 +7,7 @@ env: jobs: ci-config: - runs-on: ubuntu-latest + runs-on: ubuntu-20.04 outputs: enabled: ${{ steps.check-ref.outputs.enabled }}${{ steps.skip-if-redundant.outputs.enabled }} steps: @@ -79,7 +79,7 @@ jobs: windows-build: needs: ci-config if: needs.ci-config.outputs.enabled == 'yes' - runs-on: windows-latest + runs-on: windows-2019 steps: - uses: actions/checkout@v2 - uses: git-for-windows/setup-git-for-windows-sdk@v1 @@ -97,7 +97,7 @@ jobs: name: windows-artifacts path: artifacts windows-test: - runs-on: windows-latest + runs-on: windows-2019 needs: [windows-build] strategy: fail-fast: false @@ -132,7 +132,7 @@ jobs: env: NO_PERL: 1 GIT_CONFIG_PARAMETERS: "'user.name=CI' 'user.email=ci@git'" - runs-on: windows-latest + runs-on: windows-2019 steps: - uses: actions/checkout@v2 - uses: git-for-windows/setup-git-for-windows-sdk@v1 @@ -178,7 +178,7 @@ jobs: name: vs-artifacts path: artifacts vs-test: - runs-on: windows-latest + runs-on: windows-2019 needs: vs-build strategy: fail-fast: false @@ -218,22 +218,22 @@ jobs: vector: - jobname: linux-clang cc: clang - pool: ubuntu-latest + pool: ubuntu-20.04 - jobname: linux-gcc cc: gcc - pool: ubuntu-latest + pool: ubuntu-20.04 - jobname: osx-clang cc: clang - pool: macos-latest + pool: macos-10.15 - jobname: osx-gcc cc: gcc - pool: macos-latest + pool: macos-10.15 - jobname: linux-gcc-default cc: gcc - pool: ubuntu-latest + pool: ubuntu-20.04 - jobname: linux-leaks cc: gcc - pool: ubuntu-latest + pool: ubuntu-20.04 env: CC: ${{matrix.vector.cc}} jobname: ${{matrix.vector.jobname}} @@ -265,7 +265,7 @@ jobs: image: fedora env: jobname: ${{matrix.vector.jobname}} - runs-on: ubuntu-latest + runs-on: ubuntu-20.04 container: ${{matrix.vector.image}} steps: - uses: actions/checkout@v1 @@ -314,7 +314,7 @@ jobs: if: needs.ci-config.outputs.enabled == 'yes' env: jobname: Documentation - runs-on: ubuntu-latest + runs-on: ubuntu-20.04 steps: - uses: actions/checkout@v2 - run: ci/install-dependencies.sh
On Thu, Nov 4, 2021 at 1:43 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > My best guess is that my GCC (which is v11.2.0) and Fedora's GCC (which is > v11.2.1 plus the usual Fedora customization) behave slightly differently > with respect to that optional `size-index` argument: > `pthread_setspecific()` is essentially declared without a `size-index`, so > I guess GCC v11.2.1 probably defaults to `size-index = 1`. Got to read the whole explanation that Victoria put together. for the warning to trigger on that gcc, you got also to have the attribute in the header as shown by the link[1] she provided, and which I am sure could be a nice thing to send toward those helpful folks of the MSYS2 project so it can be added to their WinPthread headers, as was done in glibc. Carlo [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
On 11/4/2021 5:42 AM, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Nov 03 2021, Junio C Hamano wrote: > >> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: >> >>> From: Victoria Dye <vdye@github.com> >>> >>> This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI >>> build, first appearing after the introduction of a new version of the Fedora >>> docker image version. This image includes a version of `glibc` with the >>> attribute `__attr_access_none` added to `pthread_setspecific` [1], the >>> implementation of which only exists for GCC 11.X - the version included in >>> the Fedora image. The attribute requires that the pointer provided in the >>> second argument of `pthread_getspecific` must, if not NULL, be a pointer to >>> a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not >>> valid, causing the error. >>> >>> This fix imitates a workaround added in SELinux [2] by using the pointer to >>> the static `async_die_counter` itself as the second argument to >>> `pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the >>> intent of the current usage while not triggering the build error. >>> >>> [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8 >>> [2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/ >>> >>> Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> >>> Signed-off-by: Victoria Dye <vdye@github.com> >>> --- >> >> Looks like they timed their update to disrupt us most effectively, >> but we are gifted with watchful eyes and competent hands ;-). >> >> Thanks for coming up with a clearly written description and a clean >> fix so quickly. >> >> Will apply. > > I don't know what this would be for the "fedora" and other images, but > it seems to me like the below is something we should do. This replaces > "latest" with whatever "latest" currently maps onto. > > I.e. I don't think it's a good thing that the carpet gets swept from > under you as far as these CI images go. We could subscribe to some feed > of when these images are bumped to see when to update, but having our > base change from under us just leads to a waste of time for a bunch of > people wondering why their CI is failing, which now they'll need to > rebase on some on-list or only-in-upstream-master patch to have it > "pass". Having our CI go red because the 'latest' feed changed something is probably the best "feed" to subscribe to, since we only get notified if it matters. Better to have automation go red than for us to not realize our code doesn't work on newer platforms because our CI hasn't been updated. Thanks, -Stolee
Hi Carlo, On Thu, 4 Nov 2021, Carlo Arenas wrote: > On Thu, Nov 4, 2021 at 1:43 AM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > My best guess is that my GCC (which is v11.2.0) and Fedora's GCC (which is > > v11.2.1 plus the usual Fedora customization) behave slightly differently > > with respect to that optional `size-index` argument: > > `pthread_setspecific()` is essentially declared without a `size-index`, so > > I guess GCC v11.2.1 probably defaults to `size-index = 1`. > > Got to read the whole explanation that Victoria put together. > > for the warning to trigger on that gcc, you got also to have the > attribute in the header as shown by the link[1] she provided, and > which I am sure could be a nice thing to send toward those helpful > folks of the MSYS2 project so it can be added to their WinPthread > headers, as was done in glibc. D'oh. Of course we do not have glibc in the Git for Windows SDK! Thanks for helping me understand this. Ciao, Dscho > > Carlo > > [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8 >
Derrick Stolee <stolee@gmail.com> writes: > Having our CI go red because the 'latest' feed changed something is > probably the best "feed" to subscribe to, since we only get notified > if it matters. > > Better to have automation go red than for us to not realize our code > doesn't work on newer platforms because our CI hasn't been updated. I consider this a better implementation of what Æver suggested ;-) And an incident like this is one of the reasons why I like the "CI does not stop after seeing the first problem" behaviour. In a short term, people can ignore a particular known failure and ensure they do not introduce any new ones. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Thanks for coming up with a clearly written description and a clean > fix so quickly. > > Will apply. FWIW, this is now in all my integration branches that may trigger CI tests. Again, thanks for a quick fix.
diff --git a/run-command.c b/run-command.c index 7ef5cc712a9..f40df01c772 100644 --- a/run-command.c +++ b/run-command.c @@ -1099,7 +1099,7 @@ static NORETURN void die_async(const char *err, va_list params) static int async_die_is_recursing(void) { void *ret = pthread_getspecific(async_die_counter); - pthread_setspecific(async_die_counter, (void *)1); + pthread_setspecific(async_die_counter, &async_die_counter); /* set to any non-NULL valid pointer */ return ret != NULL; }