diff mbox series

[v3] http API: fix dangling pointer issue noted by GCC 12.0

Message ID patch-v3-1.1-69190804c67-20220325T143322Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v3] http API: fix dangling pointer issue noted by GCC 12.0 | expand

Commit Message

Ævar Arnfjörð Bjarmason March 25, 2022, 2:34 p.m. UTC
The pre-release GCC 12.0 development branch has a new warning about
dangling pointers in -Wall:

    http.c: In function ‘run_active_slot’:
    http.c:1332:24: error: storing the address of local variable ‘finished’ in ‘*slot.finished’ [-Werror=dangling-pointer=]
     1332 |         slot->finished = &finished;
          |         ~~~~~~~~~~~~~~~^~~~~~~~~~~
    http.c:1330:13: note: ‘finished’ declared here
     1330 |         int finished = 0;
          |             ^~~~~~~~

This is on a locally built "gcc (GCC) 12.0.1 20220120 (experimental)",
built from gcc.git's 8bc700f4c3f (Enhance vec_pack_trunc for integral
mode mask., 2022-01-17).

The GCC warning is specifically about pointers that survive the exit
of the function. See a comment added to
"pass_waccess::use_after_inval_p" in the GCC commit that added the
warning, or:

    /* The use is one of a dangling pointer if a clobber of the variable
      [the pointer points to] has not been found before the function exit
      point.  */
    [...]

There's a few possible ways to fix this, but the simplest is to assign
NULL to "slot->finished" at the end of run_active_slot().

This isn't the only caller that assigns to "slot->finished", see see
the assignments in http-walker.c:process_alternates_response() and
http.c:finish_active_slot().

But those assignments are both to the pointer to our local variable
here, so this fix is correct. The only way that code in http-walker.c
could have done its assignments is to the pointer to this specific
variable.

It was suggested[2] to guard that with "if (slot->finished ==
&finished)", but that'll still trigger the warning.

1. https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
2. https://lore.kernel.org/git/xmqq8rv2nggn.fsf@gitster.g/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

This should clarify the feedback on v2, sorry about the very late
re-roll.

I.e. I *meant* in v2 that it's the only assignment to slot->finished
itself is here, the other assignments in http-walker.c are to the
pointer to our variable here.

Range-diff against v2:
1:  777838267a5 ! 1:  69190804c67 http API: fix dangling pointer issue noted by GCC 12.0
    @@ Commit message
             [...]
     
         There's a few possible ways to fix this, but the simplest is to assign
    -    NULL to "slot->finished" at the end of run_active_slot(), it's the
    -    only caller that ever assigns non-NULL to it. It was suggested[2] to
    -    guard that with "if (slot->finished == &finished)", but that'll still
    -    trigger the warning.
    +    NULL to "slot->finished" at the end of run_active_slot().
    +
    +    This isn't the only caller that assigns to "slot->finished", see see
    +    the assignments in http-walker.c:process_alternates_response() and
    +    http.c:finish_active_slot().
    +
    +    But those assignments are both to the pointer to our local variable
    +    here, so this fix is correct. The only way that code in http-walker.c
    +    could have done its assignments is to the pointer to this specific
    +    variable.
    +
    +    It was suggested[2] to guard that with "if (slot->finished ==
    +    &finished)", but that'll still trigger the warning.
     
         1. https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
         2. https://lore.kernel.org/git/xmqq8rv2nggn.fsf@gitster.g/

 http.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Taylor Blau March 25, 2022, 6:11 p.m. UTC | #1
On Fri, Mar 25, 2022 at 03:34:49PM +0100, Ævar Arnfjörð Bjarmason wrote:
> This isn't the only caller that assigns to "slot->finished", see see

s/see see/see ?

> the assignments in http-walker.c:process_alternates_response() and
> http.c:finish_active_slot().
>
> But those assignments are both to the pointer to our local variable
> here, so this fix is correct. The only way that code in http-walker.c
> could have done its assignments is to the pointer to this specific
> variable.

Got it; this is the key piece that I was missing in my earlier review.
Sorry about that, this looks completely safe to me now.

Thanks,
Taylor
Junio C Hamano March 26, 2022, 12:13 a.m. UTC | #2
Taylor Blau <me@ttaylorr.com> writes:

> On Fri, Mar 25, 2022 at 03:34:49PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> This isn't the only caller that assigns to "slot->finished", see see
>
> s/see see/see ?
>
>> the assignments in http-walker.c:process_alternates_response() and
>> http.c:finish_active_slot().
>>
>> But those assignments are both to the pointer to our local variable
>> here, so this fix is correct. The only way that code in http-walker.c
>> could have done its assignments is to the pointer to this specific
>> variable.
>
> Got it; this is the key piece that I was missing in my earlier review.
> Sorry about that, this looks completely safe to me now.

It does not exactly look safe to me, though.

With a bit of rewrite, here is what I'd queue for now.  I really
hope that the pre-release compiler will be fixed soon so that we do
not have to worry about this patch.

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 -----
From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Date: Fri, 25 Mar 2022 15:34:49 +0100

The pre-release GCC 12.0 development branch has a new warning about
dangling pointers in -Wall:

    http.c: In function ‘run_active_slot’:
    http.c:1332:24: error: storing the address of local variable ‘finished’ in ‘*slot.finished’ [-Werror=dangling-pointer=]
     1332 |         slot->finished = &finished;
          |         ~~~~~~~~~~~~~~~^~~~~~~~~~~
    http.c:1330:13: note: ‘finished’ declared here
     1330 |         int finished = 0;
          |             ^~~~~~~~

This is on a locally built "gcc (GCC) 12.0.1 20220120 (experimental)",
built from gcc.git's 8bc700f4c3f (Enhance vec_pack_trunc for integral
mode mask., 2022-01-17).

The GCC warning is specifically about pointers that survive the exit
of the function. See a comment added to
"pass_waccess::use_after_inval_p" in the GCC commit that added the
warning, or:

    /* The use is one of a dangling pointer if a clobber of the variable
      [the pointer points to] has not been found before the function exit
      point.  */
    [...]

There's a few possible ways to fix this, but the simplest is to assign
NULL to "slot->finished" at the end of run_active_slot().

It was suggested[2] to guard that with "if (slot->finished ==
&finished)", but that'll still trigger the bogus warning from the
compiler.

This is the only place that assigns to "slot->finished"; other
mention of slot->finished in http-walker.c:process_alternates_response() and
http.c:finish_active_slot() are assignments through the pointer,
and not moving where the pointer points at.

As long as the same slot is never passed to run_active_slot()
recursively, clearing the member unconditionally when the control
leaves the function should not break the code.  Knock wood, as
nobody seems to have made sure if that is the case.

We could add

	if (slot->finished)
		BUG("the uncoditional clearing at the end is wrong");

early in run_active_slot(), before we assign the pointer to our
on-stack variable to this field, to give us such a guarantee,
though.

1. https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
2. https://lore.kernel.org/git/xmqq8rv2nggn.fsf@gitster.g/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 http.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/http.c b/http.c
index 229da4d148..2f67fbb33c 100644
--- a/http.c
+++ b/http.c
@@ -1367,6 +1367,7 @@ void run_active_slot(struct active_request_slot *slot)
 			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
 		}
 	}
+	slot->finished = NULL;
 }
 
 static void release_active_slot(struct active_request_slot *slot)
Ævar Arnfjörð Bjarmason April 14, 2022, 3:27 p.m. UTC | #3
On Fri, Mar 25 2022, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
>
>> On Fri, Mar 25, 2022 at 03:34:49PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>> This isn't the only caller that assigns to "slot->finished", see see
>>
>> s/see see/see ?
>>
>>> the assignments in http-walker.c:process_alternates_response() and
>>> http.c:finish_active_slot().
>>>
>>> But those assignments are both to the pointer to our local variable
>>> here, so this fix is correct. The only way that code in http-walker.c
>>> could have done its assignments is to the pointer to this specific
>>> variable.
>>
>> Got it; this is the key piece that I was missing in my earlier review.
>> Sorry about that, this looks completely safe to me now.
>
> It does not exactly look safe to me, though.
>
> With a bit of rewrite, here is what I'd queue for now.  I really
> hope that the pre-release compiler will be fixed soon so that we do
> not have to worry about this patch.

Late reply, sorry.

I don't think the compiler is broken in this case.

Having spelunked in the GCC docs, source, commits involved & in their
bugtracker I don't think they'd consider this broken. It's working as
designed.

Now, of course as with any new compiler warning you might argue that
it's too overzealous, but in this case it's included it a -Wall in GCC
12.0.

The warning is specifically about the local pointer being stored in a
structure that survives the exit of function, and therefore if you were
to dereference that pointer after the function exists the results are
undefined.

Now, we happen to be able to carefully read the code in this case and
reason that we won't be looking at it except in the code that's called
within this function, so in practice we're OK.

But we'd have a bug sneak up on us if that wasn't the case, so it's
safer to NULL this out, which is exactly what the new GCC warning is
concerning itself with.

I.e. it's not promising to narrow itself to only those cases where GCC
can know for absolute certain that it's a *practical* issue.

Which, basically would be asking for it to do as much work to emit it as
it has to do on -fanalyzer, which makes compilation of git.git take
somewhere in the mid-range of double digit minutes for me, instead of
~20s.

So I'd prefer:

 1. Adjust for release etc., but that what I submitted in
    https://lore.kernel.org/git/patch-v3-1.1-69190804c67-20220325T143322Z-avarab@gmail.com/
    go in as-is.

 2. If you're convinced this is a compiler bug could you please either
    drop the current version, or rewrite the commit envelope so that
    you're the author?

    I very much appreciate when you do local commit fix-ups for
    rewording, typos, or even rewriting something I did for the better.

    But in this case it's got my name on the envelope & has been
    reworded to say pretty much the exact opposite of what I
    think/believe about this warning.

    Which isn't something wrong or not permitted by the
    license/SOB/whatever.

    I'd just find it confusing when I'll dig this up at some point in
    the future to find my name on it, read the explanation, and then be
    perplexed why I ever thought that about this particular warning
    ... :)

Thanks.
Junio C Hamano April 14, 2022, 5:04 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Having spelunked in the GCC docs, source, commits involved & in their
> bugtracker I don't think they'd consider this broken. It's working as
> designed.
>
> Now, of course as with any new compiler warning you might argue that
> it's too overzealous, but in this case it's included it a -Wall in GCC
> 12.0.

Whatever.  I do not care if this is a broken or wai from GCC's point
of view.

I care more about the correct operation of the production code of
ours, than a workaround to squelch a warning, overzealous or not, by
a compiler, and if it is not clear that the workaround is obviously
free of negative side effect, we do not want to deliberately introduce
potential breakage in our code base just for that.

And I still do not see how it is safe to unconditionally clearing
the pointer in the slot instance to NULL.  It was asked late January
in https://lore.kernel.org/git/xmqqv8y52g3a.fsf@gitster.g/

In other words, what we should have been spelunking is *not* in the
GCC docs and code, but the http codepath in our code that depends on
the slot not being cleared when we didn't set up the pointer in the
current recursion of that function.  With a clear description on how
this change is safe, with a clear understanding on why the pointer
member "finished" was added in the first place to avoid the same
mistake as an earlier round of this patch [*1*], it would become an
acceptable patch, with or without GCC warning.

Namely, the finding in this part of a review comment [*2*]

    The only way the separation could make a difference is while
    step_active_slots(), the current slot is completed, our local
    "finished" gets marked as such thanks to the pointer-ness of the
    finished member, and then another pending request is started
    reusing the same slot object (properly initialized for that new
    request).  In such a case, the while loop we want to see exit
    will see that slot->in_use member is _still_ true, even though
    it is true only because it is now about a separate and unrelated
    request that is still waiting for completion, and the original
    request the caller is waiting for has already finished.

that was made to explain why the pointer member is there, and a
possible case that the code before the introduction of the pointer
would misbehave and today's code would behave better, worries me the
most, as unconditionally assigning NULL there like this patch does
without any guard would mean that we are breaking the code exactly
in such a case, I would think.

In short, I do not care who takes the credit, I care more about the
correctness of the code than a warning by a version of a compiler, I
do not care at all if the compiler writers considers the warning a
bug, and I worry that the change proposed, while it may certainly
squelch the bug, may break the code that has been working happily,
and I haven't seen a clear explanation why it is not the case.

As long as the same slot is never passed to run_active_slot()
recursively, clearing the member unconditionally when the control
leaves the function should not break the code.  Nobody seems to have
explained how it is the case.


[References]

*1* https://lore.kernel.org/git/patch-1.1-1cec367e805-20220126T212921Z-avarab@gmail.com/
*2* https://lore.kernel.org/git/xmqqczkengsg.fsf@gitster.g/
Ævar Arnfjörð Bjarmason April 15, 2022, 1:30 p.m. UTC | #5
On Thu, Apr 14 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> Having spelunked in the GCC docs, source, commits involved & in their
>> bugtracker I don't think they'd consider this broken. It's working as
>> designed.
>>
>> Now, of course as with any new compiler warning you might argue that
>> it's too overzealous, but in this case it's included it a -Wall in GCC
>> 12.0.
>
> Whatever.  I do not care if this is a broken or wai from GCC's point
> of view.
>
> I care more about the correct operation of the production code of
> ours, than a workaround to squelch a warning, overzealous or not, by
> a compiler, and if it is not clear that the workaround is obviously
> free of negative side effect, we do not want to deliberately introduce
> potential breakage in our code base just for that.
>
> And I still do not see how it is safe to unconditionally clearing
> the pointer in the slot instance to NULL.  It was asked late January
> in https://lore.kernel.org/git/xmqqv8y52g3a.fsf@gitster.g/

The v3 re-roll at
https://lore.kernel.org/git/patch-v3-1.1-69190804c67-20220325T143322Z-avarab@gmail.com/
(see range diff) was intended to address both your question there &
Taylor's at https://lore.kernel.org/git/YhprAb1f1WYIktCV@nand.local/.

> In other words, what we should have been spelunking is *not* in the
> GCC docs and code, but the http codepath in our code that depends on
> the slot not being cleared when we didn't set up the pointer in the
> current recursion of that function.  With a clear description on how
> this change is safe, with a clear understanding on why the pointer
> member "finished" was added in the first place to avoid the same
> mistake as an earlier round of this patch [*1*], it would become an
> acceptable patch, with or without GCC warning.
>
> Namely, the finding in this part of a review comment [*2*]
>
>     The only way the separation could make a difference is while
>     step_active_slots(), the current slot is completed, our local
>     "finished" gets marked as such thanks to the pointer-ness of the
>     finished member, and then another pending request is started
>     reusing the same slot object (properly initialized for that new
>     request).  In such a case, the while loop we want to see exit
>     will see that slot->in_use member is _still_ true, even though
>     it is true only because it is now about a separate and unrelated
>     request that is still waiting for completion, and the original
>     request the caller is waiting for has already finished.
>
> that was made to explain why the pointer member is there, and a
> possible case that the code before the introduction of the pointer
> would misbehave and today's code would behave better, worries me the
> most, as unconditionally assigning NULL there like this patch does
> without any guard would mean that we are breaking the code exactly
> in such a case, I would think.

I think that accurately describes a logic error in the v1 of this patch,
i.e. we can't remove the "finished" member, but per the v3 explanation I
believe (re-)setting it to NULL is safe.

> In short, I do not care who takes the credit, I care more about the
> correctness of the code than a warning by a version of a compiler, I
> do not care at all if the compiler writers considers the warning a
> bug, and I worry that the change proposed, while it may certainly
> squelch the bug, may break the code that has been working happily,
> and I haven't seen a clear explanation why it is not the case.
>
> As long as the same slot is never passed to run_active_slot()
> recursively, clearing the member unconditionally when the control
> leaves the function should not break the code.  Nobody seems to have
> explained how it is the case.

I tried to explain that in the v3, but that was part of what you
edited/amended in your applied version of it.

I don't know how to answer your concerns/questions other than as I've
already done there.
diff mbox series

Patch

diff --git a/http.c b/http.c
index 229da4d1488..2f67fbb33cd 100644
--- a/http.c
+++ b/http.c
@@ -1367,6 +1367,7 @@  void run_active_slot(struct active_request_slot *slot)
 			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
 		}
 	}
+	slot->finished = NULL;
 }
 
 static void release_active_slot(struct active_request_slot *slot)