Message ID | patch-v3-1.1-69190804c67-20220325T143322Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] http API: fix dangling pointer issue noted by GCC 12.0 | expand |
On Fri, Mar 25, 2022 at 03:34:49PM +0100, Ævar Arnfjörð Bjarmason wrote: > This isn't the only caller that assigns to "slot->finished", see see s/see see/see ? > the assignments in http-walker.c:process_alternates_response() and > http.c:finish_active_slot(). > > But those assignments are both to the pointer to our local variable > here, so this fix is correct. The only way that code in http-walker.c > could have done its assignments is to the pointer to this specific > variable. Got it; this is the key piece that I was missing in my earlier review. Sorry about that, this looks completely safe to me now. Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: > On Fri, Mar 25, 2022 at 03:34:49PM +0100, Ævar Arnfjörð Bjarmason wrote: >> This isn't the only caller that assigns to "slot->finished", see see > > s/see see/see ? > >> the assignments in http-walker.c:process_alternates_response() and >> http.c:finish_active_slot(). >> >> But those assignments are both to the pointer to our local variable >> here, so this fix is correct. The only way that code in http-walker.c >> could have done its assignments is to the pointer to this specific >> variable. > > Got it; this is the key piece that I was missing in my earlier review. > Sorry about that, this looks completely safe to me now. It does not exactly look safe to me, though. With a bit of rewrite, here is what I'd queue for now. I really hope that the pre-release compiler will be fixed soon so that we do not have to worry about this patch. ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----- From: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Date: Fri, 25 Mar 2022 15:34:49 +0100 The pre-release GCC 12.0 development branch has a new warning about dangling pointers in -Wall: http.c: In function ‘run_active_slot’: http.c:1332:24: error: storing the address of local variable ‘finished’ in ‘*slot.finished’ [-Werror=dangling-pointer=] 1332 | slot->finished = &finished; | ~~~~~~~~~~~~~~~^~~~~~~~~~~ http.c:1330:13: note: ‘finished’ declared here 1330 | int finished = 0; | ^~~~~~~~ This is on a locally built "gcc (GCC) 12.0.1 20220120 (experimental)", built from gcc.git's 8bc700f4c3f (Enhance vec_pack_trunc for integral mode mask., 2022-01-17). The GCC warning is specifically about pointers that survive the exit of the function. See a comment added to "pass_waccess::use_after_inval_p" in the GCC commit that added the warning, or: /* The use is one of a dangling pointer if a clobber of the variable [the pointer points to] has not been found before the function exit point. */ [...] There's a few possible ways to fix this, but the simplest is to assign NULL to "slot->finished" at the end of run_active_slot(). It was suggested[2] to guard that with "if (slot->finished == &finished)", but that'll still trigger the bogus warning from the compiler. This is the only place that assigns to "slot->finished"; other mention of slot->finished in http-walker.c:process_alternates_response() and http.c:finish_active_slot() are assignments through the pointer, and not moving where the pointer points at. As long as the same slot is never passed to run_active_slot() recursively, clearing the member unconditionally when the control leaves the function should not break the code. Knock wood, as nobody seems to have made sure if that is the case. We could add if (slot->finished) BUG("the uncoditional clearing at the end is wrong"); early in run_active_slot(), before we assign the pointer to our on-stack variable to this field, to give us such a guarantee, though. 1. https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9d6a0f388eb048f8d87f47af78f07b5ce513bfe6 2. https://lore.kernel.org/git/xmqq8rv2nggn.fsf@gitster.g/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- http.c | 1 + 1 file changed, 1 insertion(+) diff --git a/http.c b/http.c index 229da4d148..2f67fbb33c 100644 --- a/http.c +++ b/http.c @@ -1367,6 +1367,7 @@ void run_active_slot(struct active_request_slot *slot) select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); } } + slot->finished = NULL; } static void release_active_slot(struct active_request_slot *slot)
On Fri, Mar 25 2022, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > >> On Fri, Mar 25, 2022 at 03:34:49PM +0100, Ævar Arnfjörð Bjarmason wrote: >>> This isn't the only caller that assigns to "slot->finished", see see >> >> s/see see/see ? >> >>> the assignments in http-walker.c:process_alternates_response() and >>> http.c:finish_active_slot(). >>> >>> But those assignments are both to the pointer to our local variable >>> here, so this fix is correct. The only way that code in http-walker.c >>> could have done its assignments is to the pointer to this specific >>> variable. >> >> Got it; this is the key piece that I was missing in my earlier review. >> Sorry about that, this looks completely safe to me now. > > It does not exactly look safe to me, though. > > With a bit of rewrite, here is what I'd queue for now. I really > hope that the pre-release compiler will be fixed soon so that we do > not have to worry about this patch. Late reply, sorry. I don't think the compiler is broken in this case. Having spelunked in the GCC docs, source, commits involved & in their bugtracker I don't think they'd consider this broken. It's working as designed. Now, of course as with any new compiler warning you might argue that it's too overzealous, but in this case it's included it a -Wall in GCC 12.0. The warning is specifically about the local pointer being stored in a structure that survives the exit of function, and therefore if you were to dereference that pointer after the function exists the results are undefined. Now, we happen to be able to carefully read the code in this case and reason that we won't be looking at it except in the code that's called within this function, so in practice we're OK. But we'd have a bug sneak up on us if that wasn't the case, so it's safer to NULL this out, which is exactly what the new GCC warning is concerning itself with. I.e. it's not promising to narrow itself to only those cases where GCC can know for absolute certain that it's a *practical* issue. Which, basically would be asking for it to do as much work to emit it as it has to do on -fanalyzer, which makes compilation of git.git take somewhere in the mid-range of double digit minutes for me, instead of ~20s. So I'd prefer: 1. Adjust for release etc., but that what I submitted in https://lore.kernel.org/git/patch-v3-1.1-69190804c67-20220325T143322Z-avarab@gmail.com/ go in as-is. 2. If you're convinced this is a compiler bug could you please either drop the current version, or rewrite the commit envelope so that you're the author? I very much appreciate when you do local commit fix-ups for rewording, typos, or even rewriting something I did for the better. But in this case it's got my name on the envelope & has been reworded to say pretty much the exact opposite of what I think/believe about this warning. Which isn't something wrong or not permitted by the license/SOB/whatever. I'd just find it confusing when I'll dig this up at some point in the future to find my name on it, read the explanation, and then be perplexed why I ever thought that about this particular warning ... :) Thanks.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Having spelunked in the GCC docs, source, commits involved & in their > bugtracker I don't think they'd consider this broken. It's working as > designed. > > Now, of course as with any new compiler warning you might argue that > it's too overzealous, but in this case it's included it a -Wall in GCC > 12.0. Whatever. I do not care if this is a broken or wai from GCC's point of view. I care more about the correct operation of the production code of ours, than a workaround to squelch a warning, overzealous or not, by a compiler, and if it is not clear that the workaround is obviously free of negative side effect, we do not want to deliberately introduce potential breakage in our code base just for that. And I still do not see how it is safe to unconditionally clearing the pointer in the slot instance to NULL. It was asked late January in https://lore.kernel.org/git/xmqqv8y52g3a.fsf@gitster.g/ In other words, what we should have been spelunking is *not* in the GCC docs and code, but the http codepath in our code that depends on the slot not being cleared when we didn't set up the pointer in the current recursion of that function. With a clear description on how this change is safe, with a clear understanding on why the pointer member "finished" was added in the first place to avoid the same mistake as an earlier round of this patch [*1*], it would become an acceptable patch, with or without GCC warning. Namely, the finding in this part of a review comment [*2*] The only way the separation could make a difference is while step_active_slots(), the current slot is completed, our local "finished" gets marked as such thanks to the pointer-ness of the finished member, and then another pending request is started reusing the same slot object (properly initialized for that new request). In such a case, the while loop we want to see exit will see that slot->in_use member is _still_ true, even though it is true only because it is now about a separate and unrelated request that is still waiting for completion, and the original request the caller is waiting for has already finished. that was made to explain why the pointer member is there, and a possible case that the code before the introduction of the pointer would misbehave and today's code would behave better, worries me the most, as unconditionally assigning NULL there like this patch does without any guard would mean that we are breaking the code exactly in such a case, I would think. In short, I do not care who takes the credit, I care more about the correctness of the code than a warning by a version of a compiler, I do not care at all if the compiler writers considers the warning a bug, and I worry that the change proposed, while it may certainly squelch the bug, may break the code that has been working happily, and I haven't seen a clear explanation why it is not the case. As long as the same slot is never passed to run_active_slot() recursively, clearing the member unconditionally when the control leaves the function should not break the code. Nobody seems to have explained how it is the case. [References] *1* https://lore.kernel.org/git/patch-1.1-1cec367e805-20220126T212921Z-avarab@gmail.com/ *2* https://lore.kernel.org/git/xmqqczkengsg.fsf@gitster.g/
On Thu, Apr 14 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Having spelunked in the GCC docs, source, commits involved & in their >> bugtracker I don't think they'd consider this broken. It's working as >> designed. >> >> Now, of course as with any new compiler warning you might argue that >> it's too overzealous, but in this case it's included it a -Wall in GCC >> 12.0. > > Whatever. I do not care if this is a broken or wai from GCC's point > of view. > > I care more about the correct operation of the production code of > ours, than a workaround to squelch a warning, overzealous or not, by > a compiler, and if it is not clear that the workaround is obviously > free of negative side effect, we do not want to deliberately introduce > potential breakage in our code base just for that. > > And I still do not see how it is safe to unconditionally clearing > the pointer in the slot instance to NULL. It was asked late January > in https://lore.kernel.org/git/xmqqv8y52g3a.fsf@gitster.g/ The v3 re-roll at https://lore.kernel.org/git/patch-v3-1.1-69190804c67-20220325T143322Z-avarab@gmail.com/ (see range diff) was intended to address both your question there & Taylor's at https://lore.kernel.org/git/YhprAb1f1WYIktCV@nand.local/. > In other words, what we should have been spelunking is *not* in the > GCC docs and code, but the http codepath in our code that depends on > the slot not being cleared when we didn't set up the pointer in the > current recursion of that function. With a clear description on how > this change is safe, with a clear understanding on why the pointer > member "finished" was added in the first place to avoid the same > mistake as an earlier round of this patch [*1*], it would become an > acceptable patch, with or without GCC warning. > > Namely, the finding in this part of a review comment [*2*] > > The only way the separation could make a difference is while > step_active_slots(), the current slot is completed, our local > "finished" gets marked as such thanks to the pointer-ness of the > finished member, and then another pending request is started > reusing the same slot object (properly initialized for that new > request). In such a case, the while loop we want to see exit > will see that slot->in_use member is _still_ true, even though > it is true only because it is now about a separate and unrelated > request that is still waiting for completion, and the original > request the caller is waiting for has already finished. > > that was made to explain why the pointer member is there, and a > possible case that the code before the introduction of the pointer > would misbehave and today's code would behave better, worries me the > most, as unconditionally assigning NULL there like this patch does > without any guard would mean that we are breaking the code exactly > in such a case, I would think. I think that accurately describes a logic error in the v1 of this patch, i.e. we can't remove the "finished" member, but per the v3 explanation I believe (re-)setting it to NULL is safe. > In short, I do not care who takes the credit, I care more about the > correctness of the code than a warning by a version of a compiler, I > do not care at all if the compiler writers considers the warning a > bug, and I worry that the change proposed, while it may certainly > squelch the bug, may break the code that has been working happily, > and I haven't seen a clear explanation why it is not the case. > > As long as the same slot is never passed to run_active_slot() > recursively, clearing the member unconditionally when the control > leaves the function should not break the code. Nobody seems to have > explained how it is the case. I tried to explain that in the v3, but that was part of what you edited/amended in your applied version of it. I don't know how to answer your concerns/questions other than as I've already done there.
diff --git a/http.c b/http.c index 229da4d1488..2f67fbb33cd 100644 --- a/http.c +++ b/http.c @@ -1367,6 +1367,7 @@ void run_active_slot(struct active_request_slot *slot) select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); } } + slot->finished = NULL; } static void release_active_slot(struct active_request_slot *slot)
The pre-release GCC 12.0 development branch has a new warning about dangling pointers in -Wall: http.c: In function ‘run_active_slot’: http.c:1332:24: error: storing the address of local variable ‘finished’ in ‘*slot.finished’ [-Werror=dangling-pointer=] 1332 | slot->finished = &finished; | ~~~~~~~~~~~~~~~^~~~~~~~~~~ http.c:1330:13: note: ‘finished’ declared here 1330 | int finished = 0; | ^~~~~~~~ This is on a locally built "gcc (GCC) 12.0.1 20220120 (experimental)", built from gcc.git's 8bc700f4c3f (Enhance vec_pack_trunc for integral mode mask., 2022-01-17). The GCC warning is specifically about pointers that survive the exit of the function. See a comment added to "pass_waccess::use_after_inval_p" in the GCC commit that added the warning, or: /* The use is one of a dangling pointer if a clobber of the variable [the pointer points to] has not been found before the function exit point. */ [...] There's a few possible ways to fix this, but the simplest is to assign NULL to "slot->finished" at the end of run_active_slot(). This isn't the only caller that assigns to "slot->finished", see see the assignments in http-walker.c:process_alternates_response() and http.c:finish_active_slot(). But those assignments are both to the pointer to our local variable here, so this fix is correct. The only way that code in http-walker.c could have done its assignments is to the pointer to this specific variable. It was suggested[2] to guard that with "if (slot->finished == &finished)", but that'll still trigger the warning. 1. https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9d6a0f388eb048f8d87f47af78f07b5ce513bfe6 2. https://lore.kernel.org/git/xmqq8rv2nggn.fsf@gitster.g/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- This should clarify the feedback on v2, sorry about the very late re-roll. I.e. I *meant* in v2 that it's the only assignment to slot->finished itself is here, the other assignments in http-walker.c are to the pointer to our variable here. Range-diff against v2: 1: 777838267a5 ! 1: 69190804c67 http API: fix dangling pointer issue noted by GCC 12.0 @@ Commit message [...] There's a few possible ways to fix this, but the simplest is to assign - NULL to "slot->finished" at the end of run_active_slot(), it's the - only caller that ever assigns non-NULL to it. It was suggested[2] to - guard that with "if (slot->finished == &finished)", but that'll still - trigger the warning. + NULL to "slot->finished" at the end of run_active_slot(). + + This isn't the only caller that assigns to "slot->finished", see see + the assignments in http-walker.c:process_alternates_response() and + http.c:finish_active_slot(). + + But those assignments are both to the pointer to our local variable + here, so this fix is correct. The only way that code in http-walker.c + could have done its assignments is to the pointer to this specific + variable. + + It was suggested[2] to guard that with "if (slot->finished == + &finished)", but that'll still trigger the warning. 1. https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9d6a0f388eb048f8d87f47af78f07b5ce513bfe6 2. https://lore.kernel.org/git/xmqq8rv2nggn.fsf@gitster.g/ http.c | 1 + 1 file changed, 1 insertion(+)