diff mbox series

[v2] async_die_is_recursing: work around GCC v11.x issue on Fedora

Message ID pull.1072.v2.git.1635998463474.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit b4fce699145b488772820ec69f1654910e5c793d
Headers show
Series [v2] async_die_is_recursing: work around GCC v11.x issue on Fedora | expand

Commit Message

Victoria Dye Nov. 4, 2021, 4:01 a.m. UTC
From: Victoria Dye <vdye@github.com>

This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI
build, first appearing after the introduction of a new version of the Fedora
docker image version. This image includes a version of `glibc` with the
attribute `__attr_access_none` added to `pthread_setspecific` [1], the
implementation of which only exists for GCC 11.X - the version included in
the Fedora image. The attribute requires that the pointer provided in the
second argument of `pthread_getspecific` must, if not NULL, be a pointer to
a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not
valid, causing the error.

This fix imitates a workaround added in SELinux [2] by using the pointer to
the static `async_die_counter` itself as the second argument to
`pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the
intent of the current usage while not triggering the build error.

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
[2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/

Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
---
    async_die_is_recursing: fix use of pthread_getspecific for Fedora
    
    Following up on Johannes' report earlier [1], this patch fixes a
    compiler error in the Fedora CI build (the same issue was identified in
    a local developer build about a week ago [2]). This fix changes the
    second argument in the call to pthread_setspecific from '(void *)1' to a
    valid pointer, thus preventing the error in the use of
    __attr_access_none.
    
    
    Changes since V1
    ================
    
     * Change &ret to &async_die_counter ("dummy" argument to
       pthread_setspecific) so that it's using a global (rather than local)
       variable
     * Fix typo in commit message (pthread_getspecific ->
       pthread_setspecific)
    
    [1]
    https://lore.kernel.org/git/nycvar.QRO.7.76.6.2111040007170.56@tvgsbejvaqbjf.bet/
    [2]
    https://lore.kernel.org/git/43fe6d5c-bdb2-585c-c601-1da7a1b3ff8b@archlinux.org/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1072%2Fvdye%2Ffix-fedora-build-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1072/vdye/fix-fedora-build-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1072

Range-diff vs v1:

 1:  efe0b398f54 ! 1:  7f10e09269a async_die_is_recursing: fix use of pthread_getspecific for Fedora
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    async_die_is_recursing: fix use of pthread_getspecific for Fedora
     +    async_die_is_recursing: work around GCC v11.x issue on Fedora
      
     -    Correct an issue encountered in the `dockerized(pedantic, fedora)` CI build,
     -    first appearing after the release of v36 as the latest stable version of the
     -    Fedora docker image. This image includes a version of `glibc` with the
     -    attribute `__attr_access_none` added to `pthread_getspecific` [1], the
     +    This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI
     +    build, first appearing after the introduction of a new version of the Fedora
     +    docker image version. This image includes a version of `glibc` with the
     +    attribute `__attr_access_none` added to `pthread_setspecific` [1], the
          implementation of which only exists for GCC 11.X - the version included in
     -    Fedora. The attribute requires that the pointer provided in the second
     -    argument of `pthread_getspecific` must, if not NULL, be a pointer to a valid
     -    object. In the usage in `async_die_is_recursing`, `(void *)1` is not valid,
     -    resulting in the error.
     +    the Fedora image. The attribute requires that the pointer provided in the
     +    second argument of `pthread_getspecific` must, if not NULL, be a pointer to
     +    a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not
     +    valid, causing the error.
      
     -    The fix imitates a workaround added in SELinux [2] by using the pointer to
     -    `ret` as the second argument to `pthread_getspecific`. This guaranteed
     -    non-NULL, valid pointer adheres to the current intended usage of
     -    `pthread_getspecific` while not triggering the build error.
     +    This fix imitates a workaround added in SELinux [2] by using the pointer to
     +    the static `async_die_counter` itself as the second argument to
     +    `pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the
     +    intent of the current usage while not triggering the build error.
      
          [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
          [2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/
     @@ run-command.c: static NORETURN void die_async(const char *err, va_list params)
       {
       	void *ret = pthread_getspecific(async_die_counter);
      -	pthread_setspecific(async_die_counter, (void *)1);
     -+	pthread_setspecific(async_die_counter, &ret); /* set to any non-NULL valid pointer */
     ++	pthread_setspecific(async_die_counter, &async_die_counter); /* set to any non-NULL valid pointer */
       	return ret != NULL;
       }
       


 run-command.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 876b1423317071f43c99666f3fc3db3642dfbe14

Comments

Jeff King Nov. 4, 2021, 5:58 a.m. UTC | #1
On Thu, Nov 04, 2021 at 04:01:03AM +0000, Victoria Dye via GitGitGadget wrote:

>     Changes since V1
>     ================
>     
>      * Change &ret to &async_die_counter ("dummy" argument to
>        pthread_setspecific) so that it's using a global (rather than local)
>        variable
>      * Fix typo in commit message (pthread_getspecific ->
>        pthread_setspecific)

Thanks, this version looks good to me!

-Peff
Junio C Hamano Nov. 4, 2021, 6:35 a.m. UTC | #2
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI
> build, first appearing after the introduction of a new version of the Fedora
> docker image version. This image includes a version of `glibc` with the
> attribute `__attr_access_none` added to `pthread_setspecific` [1], the
> implementation of which only exists for GCC 11.X - the version included in
> the Fedora image. The attribute requires that the pointer provided in the
> second argument of `pthread_getspecific` must, if not NULL, be a pointer to
> a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not
> valid, causing the error.
>
> This fix imitates a workaround added in SELinux [2] by using the pointer to
> the static `async_die_counter` itself as the second argument to
> `pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the
> intent of the current usage while not triggering the build error.
>
> [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
> [2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/
>
> Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---

Looks like they timed their update to disrupt us most effectively,
but we are gifted with watchful eyes and competent hands ;-).

Thanks for coming up with a clearly written description and a clean
fix so quickly.

Will apply.
Johannes Schindelin Nov. 4, 2021, 8:43 a.m. UTC | #3
Hi Victoria,

excellent work! The patch looks very good to me. I just want to add a
little context below, and thank you _so much_ for letting me sleep while
you tie it all up in a neat patch.

On Thu, 4 Nov 2021, Victoria Dye via GitGitGadget wrote:

> [...] This image includes a version of `glibc` with the attribute
> `__attr_access_none` added to `pthread_setspecific` [1], the
> implementation of which only exists for GCC 11.X - the version included
> in the Fedora image. The attribute requires that the pointer provided in
> the second argument of `pthread_getspecific` must, if not NULL, be a
> pointer to a valid object.

My initial reading last night was that the `none` mode means that the
pointer does not have to be valid, but you're right, the documentation at
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
clearly spells it out:

	Unless the pointer is null the pointed-to object must exist and
	have at least the size as denoted by the size-index argument. The
	object need not be initialized. The mode is intended to be used as
	a means to help validate the expected object size, for example in
	functions that call __builtin_object_size.

Which means that yes, `(void *)1` was incorrect and _had_ to be changed.
The patch is therefore a fix, not a work-around (contrary to my initial
impression).

So then I got puzzled by this while sleeping: why does it fail on Fedora,
when it does _not_ fail in Git for Windows' SDK (where we also recently
upgraded to GCC v11.x, thanks to the hard work of the MSYS2 project)?

My best guess is that my GCC (which is v11.2.0) and Fedora's GCC (which is
v11.2.1 plus the usual Fedora customization) behave slightly differently
with respect to that optional `size-index` argument:
`pthread_setspecific()` is essentially declared without a `size-index`, so
I guess GCC v11.2.1 probably defaults to `size-index = 1`.

> diff --git a/run-command.c b/run-command.c
> index 7ef5cc712a9..f40df01c772 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1099,7 +1099,7 @@ static NORETURN void die_async(const char *err, va_list params)
>  static int async_die_is_recursing(void)
>  {
>  	void *ret = pthread_getspecific(async_die_counter);
> -	pthread_setspecific(async_die_counter, (void *)1);
> +	pthread_setspecific(async_die_counter, &async_die_counter); /* set to any non-NULL valid pointer */

This looks good! To make the intention even clearer, we could change the
line above to `int ret = !!pthread_getspecific(async_die_counter);`, but
as we are _really_ late in the -rc cycle, I am very much in favor of
leaving out such clean-ups that do not fix any bug.

Again, thank you so much for your hard work, it was fun debugging this
with you,
Dscho

>  	return ret != NULL;
>  }
>
>
> base-commit: 876b1423317071f43c99666f3fc3db3642dfbe14
> --
> gitgitgadget
>
Ævar Arnfjörð Bjarmason Nov. 4, 2021, 9:42 a.m. UTC | #4
On Wed, Nov 03 2021, Junio C Hamano wrote:

> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Victoria Dye <vdye@github.com>
>>
>> This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI
>> build, first appearing after the introduction of a new version of the Fedora
>> docker image version. This image includes a version of `glibc` with the
>> attribute `__attr_access_none` added to `pthread_setspecific` [1], the
>> implementation of which only exists for GCC 11.X - the version included in
>> the Fedora image. The attribute requires that the pointer provided in the
>> second argument of `pthread_getspecific` must, if not NULL, be a pointer to
>> a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not
>> valid, causing the error.
>>
>> This fix imitates a workaround added in SELinux [2] by using the pointer to
>> the static `async_die_counter` itself as the second argument to
>> `pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the
>> intent of the current usage while not triggering the build error.
>>
>> [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
>> [2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/
>>
>> Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>
> Looks like they timed their update to disrupt us most effectively,
> but we are gifted with watchful eyes and competent hands ;-).
>
> Thanks for coming up with a clearly written description and a clean
> fix so quickly.
>
> Will apply.

I don't know what this would be for the "fedora" and other images, but
it seems to me like the below is something we should do. This replaces
"latest" with whatever "latest" currently maps onto.

I.e. I don't think it's a good thing that the carpet gets swept from
under you as far as these CI images go. We could subscribe to some feed
of when these images are bumped to see when to update, but having our
base change from under us just leads to a waste of time for a bunch of
people wondering why their CI is failing, which now they'll need to
rebase on some on-list or only-in-upstream-master patch to have it
"pass".

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 6ed6a9e8076..6b7dab01269 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -7,7 +7,7 @@ env:
 
 jobs:
   ci-config:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-20.04
     outputs:
       enabled: ${{ steps.check-ref.outputs.enabled }}${{ steps.skip-if-redundant.outputs.enabled }}
     steps:
@@ -79,7 +79,7 @@ jobs:
   windows-build:
     needs: ci-config
     if: needs.ci-config.outputs.enabled == 'yes'
-    runs-on: windows-latest
+    runs-on: windows-2019
     steps:
     - uses: actions/checkout@v2
     - uses: git-for-windows/setup-git-for-windows-sdk@v1
@@ -97,7 +97,7 @@ jobs:
         name: windows-artifacts
         path: artifacts
   windows-test:
-    runs-on: windows-latest
+    runs-on: windows-2019
     needs: [windows-build]
     strategy:
       fail-fast: false
@@ -132,7 +132,7 @@ jobs:
     env:
       NO_PERL: 1
       GIT_CONFIG_PARAMETERS: "'user.name=CI' 'user.email=ci@git'"
-    runs-on: windows-latest
+    runs-on: windows-2019
     steps:
     - uses: actions/checkout@v2
     - uses: git-for-windows/setup-git-for-windows-sdk@v1
@@ -178,7 +178,7 @@ jobs:
         name: vs-artifacts
         path: artifacts
   vs-test:
-    runs-on: windows-latest
+    runs-on: windows-2019
     needs: vs-build
     strategy:
       fail-fast: false
@@ -218,22 +218,22 @@ jobs:
         vector:
           - jobname: linux-clang
             cc: clang
-            pool: ubuntu-latest
+            pool: ubuntu-20.04
           - jobname: linux-gcc
             cc: gcc
-            pool: ubuntu-latest
+            pool: ubuntu-20.04
           - jobname: osx-clang
             cc: clang
-            pool: macos-latest
+            pool: macos-10.15
           - jobname: osx-gcc
             cc: gcc
-            pool: macos-latest
+            pool: macos-10.15
           - jobname: linux-gcc-default
             cc: gcc
-            pool: ubuntu-latest
+            pool: ubuntu-20.04
           - jobname: linux-leaks
             cc: gcc
-            pool: ubuntu-latest
+            pool: ubuntu-20.04
     env:
       CC: ${{matrix.vector.cc}}
       jobname: ${{matrix.vector.jobname}}
@@ -265,7 +265,7 @@ jobs:
           image: fedora
     env:
       jobname: ${{matrix.vector.jobname}}
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-20.04
     container: ${{matrix.vector.image}}
     steps:
     - uses: actions/checkout@v1
@@ -314,7 +314,7 @@ jobs:
     if: needs.ci-config.outputs.enabled == 'yes'
     env:
       jobname: Documentation
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-20.04
     steps:
     - uses: actions/checkout@v2
     - run: ci/install-dependencies.sh
Carlo Marcelo Arenas Belón Nov. 4, 2021, 9:46 a.m. UTC | #5
On Thu, Nov 4, 2021 at 1:43 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:

> My best guess is that my GCC (which is v11.2.0) and Fedora's GCC (which is
> v11.2.1 plus the usual Fedora customization) behave slightly differently
> with respect to that optional `size-index` argument:
> `pthread_setspecific()` is essentially declared without a `size-index`, so
> I guess GCC v11.2.1 probably defaults to `size-index = 1`.

Got to read the whole explanation that Victoria put together.

for the warning to trigger on that gcc, you got also to have the
attribute in the header as shown by the link[1] she provided, and
which I am sure could be a nice thing to send toward those helpful
folks of the MSYS2 project so it can be added to their WinPthread
headers, as was done in glibc.

Carlo

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
Derrick Stolee Nov. 4, 2021, 1:08 p.m. UTC | #6
On 11/4/2021 5:42 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Nov 03 2021, Junio C Hamano wrote:
> 
>> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI
>>> build, first appearing after the introduction of a new version of the Fedora
>>> docker image version. This image includes a version of `glibc` with the
>>> attribute `__attr_access_none` added to `pthread_setspecific` [1], the
>>> implementation of which only exists for GCC 11.X - the version included in
>>> the Fedora image. The attribute requires that the pointer provided in the
>>> second argument of `pthread_getspecific` must, if not NULL, be a pointer to
>>> a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not
>>> valid, causing the error.
>>>
>>> This fix imitates a workaround added in SELinux [2] by using the pointer to
>>> the static `async_die_counter` itself as the second argument to
>>> `pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the
>>> intent of the current usage while not triggering the build error.
>>>
>>> [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
>>> [2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/
>>>
>>> Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>>> Signed-off-by: Victoria Dye <vdye@github.com>
>>> ---
>>
>> Looks like they timed their update to disrupt us most effectively,
>> but we are gifted with watchful eyes and competent hands ;-).
>>
>> Thanks for coming up with a clearly written description and a clean
>> fix so quickly.
>>
>> Will apply.
> 
> I don't know what this would be for the "fedora" and other images, but
> it seems to me like the below is something we should do. This replaces
> "latest" with whatever "latest" currently maps onto.
> 
> I.e. I don't think it's a good thing that the carpet gets swept from
> under you as far as these CI images go. We could subscribe to some feed
> of when these images are bumped to see when to update, but having our
> base change from under us just leads to a waste of time for a bunch of
> people wondering why their CI is failing, which now they'll need to
> rebase on some on-list or only-in-upstream-master patch to have it
> "pass".
Having our CI go red because the 'latest' feed changed something is
probably the best "feed" to subscribe to, since we only get notified
if it matters.

Better to have automation go red than for us to not realize our code
doesn't work on newer platforms because our CI hasn't been updated.

Thanks,
-Stolee
Johannes Schindelin Nov. 4, 2021, 4:33 p.m. UTC | #7
Hi Carlo,

On Thu, 4 Nov 2021, Carlo Arenas wrote:

> On Thu, Nov 4, 2021 at 1:43 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > My best guess is that my GCC (which is v11.2.0) and Fedora's GCC (which is
> > v11.2.1 plus the usual Fedora customization) behave slightly differently
> > with respect to that optional `size-index` argument:
> > `pthread_setspecific()` is essentially declared without a `size-index`, so
> > I guess GCC v11.2.1 probably defaults to `size-index = 1`.
>
> Got to read the whole explanation that Victoria put together.
>
> for the warning to trigger on that gcc, you got also to have the
> attribute in the header as shown by the link[1] she provided, and
> which I am sure could be a nice thing to send toward those helpful
> folks of the MSYS2 project so it can be added to their WinPthread
> headers, as was done in glibc.

D'oh. Of course we do not have glibc in the Git for Windows SDK! Thanks
for helping me understand this.

Ciao,
Dscho

>
> Carlo
>
> [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
>
Junio C Hamano Nov. 4, 2021, 5:46 p.m. UTC | #8
Derrick Stolee <stolee@gmail.com> writes:

> Having our CI go red because the 'latest' feed changed something is
> probably the best "feed" to subscribe to, since we only get notified
> if it matters.
>
> Better to have automation go red than for us to not realize our code
> doesn't work on newer platforms because our CI hasn't been updated.

I consider this a better implementation of what Æver suggested ;-)

And an incident like this is one of the reasons why I like the "CI
does not stop after seeing the first problem" behaviour.  In a short
term, people can ignore a particular known failure and ensure they
do not introduce any new ones.

Thanks.
Junio C Hamano Nov. 5, 2021, 9:45 p.m. UTC | #9
Junio C Hamano <gitster@pobox.com> writes:

> Thanks for coming up with a clearly written description and a clean
> fix so quickly.
>
> Will apply.

FWIW, this is now in all my integration branches that may trigger CI
tests.  Again, thanks for a quick fix.
diff mbox series

Patch

diff --git a/run-command.c b/run-command.c
index 7ef5cc712a9..f40df01c772 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1099,7 +1099,7 @@  static NORETURN void die_async(const char *err, va_list params)
 static int async_die_is_recursing(void)
 {
 	void *ret = pthread_getspecific(async_die_counter);
-	pthread_setspecific(async_die_counter, (void *)1);
+	pthread_setspecific(async_die_counter, &async_die_counter); /* set to any non-NULL valid pointer */
 	return ret != NULL;
 }