Message ID | 20240208220604.140859-1-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Kconfig: Explicitly disable asm goto w/ outputs on gcc-11 (and earlier) | expand |
+ Andrew Andrew, here's the full thread, I cut most of it out: https://lore.kernel.org/lkml/20240208220604.140859-1-seanjc@google.com/ . On Thu, Feb 8, 2024 at 2:06 PM Sean Christopherson <seanjc@google.com> wrote: > > Explicitly require gcc-12+ to enable asm goto with outputs on gcc to avoid > what is effectively a data corruption bug on gcc-11. As per > https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, "asm goto" is > *supposed* be implicitly volatile, but gcc-11 fails to treat it as such. > When compiling with -O2, failure to treat the asm block as volatile can > result in the entire block being discarded during optimization. > > Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 Shoot, I was cc'ed on that bug (I think I noticed it in testing as well, and pointed it out to Andrew on IRC who then cc'ed me to it). I probably should have asked if that would cause issues at some point for the kernel. I took a look at the test case added in that bug; it doesn't compile until gcc-13 (specifically gcc 13.2, not gcc 13.1). I'm curious since the bug says the fix was backported to gcc-12 and gcc-13. Are there specific versions of those that contain the fix? If so, should Sean amend his version checks below? For instance, was the fix backported to gcc 12.3, so users of gcc 12.2 would still have issues? I can't tell in godbolt since the added test case doesn't compile until gcc 13.2. https://godbolt.org/z/eqaa7dfo3 My implementation in clang still has some issues, too. It's hard to get new control flow right, and there are minimal users outside the kernel to help us validate. So as much of a fan of feature detection as I am, I admit some of these edge cases aren't perfect, and we may need to result to version detection when such bugs become observable to users. I'm happy to ack this patch, but I would like to wait for feedback from Andrew as to whether we can be even more precise with avoiding more specific versions of gcc 12 and 13 (if necessary). > Fixes: 587f17018a2c ("Kconfig: add config option for asm goto w/ outputs") > Cc: Nick Desaulniers <ndesaulniers@google.com> > Cc: Masahiro Yamada <masahiroy@kernel.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: kvm@vger.kernel.org > Cc: stable@vger.kernel.org > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > > Linus, I'm sending to you directly as this seems urgent enough to apply > straightaway, and this obviously affects much more than the build system. > > init/Kconfig | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/init/Kconfig b/init/Kconfig > index deda3d14135b..f4e46d64c1e7 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -82,6 +82,11 @@ config CC_CAN_LINK_STATIC > default $(success,$(srctree)/scripts/cc-can-link.sh $(CC) $(CLANG_FLAGS) $(USERCFLAGS) $(USERLDFLAGS) $(m32-flag) -static) > > config CC_HAS_ASM_GOTO_OUTPUT > + # gcc-11 has a nasty bug where it doesn't treat asm goto as volatile, > + # which can result in asm blocks being dropped when compiling with -02. > + # Note, explicitly forcing volatile doesn't entirely fix the bug! > + # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 > + depends on !CC_IS_GCC || GCC_VERSION >= 120000 LGTM; but we might need to be more specific about avoiding certain min versions of gcc 13 and 12. > def_bool $(success,echo 'int foo(int x) { asm goto ("": "=r"(x) ::: bar); return x; bar: return 0; }' | $(CC) -x c - -c -o /dev/null) > > config CC_HAS_ASM_GOTO_TIED_OUTPUT > > base-commit: 047371968ffc470769f541d6933e262dc7085456 > -- > 2.43.0.687.g38aa6559b0-goog >
> -----Original Message----- > From: Nick Desaulniers <ndesaulniers@google.com> > Sent: Friday, February 9, 2024 9:03 AM > To: Sean Christopherson <seanjc@google.com>; Andrew Pinski (QUIC) > <quic_apinski@quicinc.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org>; linux- > kernel@vger.kernel.org; Masahiro Yamada <masahiroy@kernel.org>; Peter > Zijlstra <peterz@infradead.org>; kvm@vger.kernel.org > Subject: Re: [PATCH] Kconfig: Explicitly disable asm goto w/ outputs on gcc-11 > (and earlier) > > + Andrew > > Andrew, here's the full thread, I cut most of it out: > https://lore.kernel.org/lkml/20240208220604.140859-1- > seanjc@google.com/ So the exact versions of GCC where this is/was fixed are: 12.4.0 (not released yet) 13.2.0 14.1.0 (not released yet) This information is listed in the bug report via the "known to work" field though since 12.4.0 is not released yet, it is listed as 12.3.1. (13.2.0 was not released at the time it was backported so it is listed as 13.1.1). The target milestone in this case lists the lowest version # where the fix is/will be included. Thanks, Andrew Pinski > . > > On Thu, Feb 8, 2024 at 2:06 PM Sean Christopherson <seanjc@google.com> > wrote: > > > > Explicitly require gcc-12+ to enable asm goto with outputs on gcc to > > avoid what is effectively a data corruption bug on gcc-11. As per > > https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, "asm goto" is > > *supposed* be implicitly volatile, but gcc-11 fails to treat it as such. > > When compiling with -O2, failure to treat the asm block as volatile > > can result in the entire block being discarded during optimization. > > > > Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 > > Shoot, I was cc'ed on that bug (I think I noticed it in testing as well, and > pointed it out to Andrew on IRC who then cc'ed me to it). I probably should > have asked if that would cause issues at some point for the kernel. > > I took a look at the test case added in that bug; it doesn't compile until gcc-13 > (specifically gcc 13.2, not gcc 13.1). I'm curious since the bug says the fix was > backported to gcc-12 and gcc-13. Are there specific versions of those that > contain the fix? If so, should Sean amend his version checks below? For > instance, was the fix backported to gcc 12.3, so users of gcc 12.2 would still > have issues? I can't tell in godbolt since the added test case doesn't compile > until gcc 13.2. https://godbolt.org/z/eqaa7dfo3 > > My implementation in clang still has some issues, too. It's hard to get new > control flow right, and there are minimal users outside the kernel to help us > validate. > > So as much of a fan of feature detection as I am, I admit some of these edge > cases aren't perfect, and we may need to result to version detection when > such bugs become observable to users. > > I'm happy to ack this patch, but I would like to wait for feedback from Andrew > as to whether we can be even more precise with avoiding more specific > versions of gcc 12 and 13 (if necessary). > > > Fixes: 587f17018a2c ("Kconfig: add config option for asm goto w/ > > outputs") > > Cc: Nick Desaulniers <ndesaulniers@google.com> > > Cc: Masahiro Yamada <masahiroy@kernel.org> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: kvm@vger.kernel.org > > Cc: stable@vger.kernel.org > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > --- > > > > Linus, I'm sending to you directly as this seems urgent enough to > > apply straightaway, and this obviously affects much more than the build > system. > > > > init/Kconfig | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/init/Kconfig b/init/Kconfig index > > deda3d14135b..f4e46d64c1e7 100644 > > --- a/init/Kconfig > > +++ b/init/Kconfig > > @@ -82,6 +82,11 @@ config CC_CAN_LINK_STATIC > > default $(success,$(srctree)/scripts/cc-can-link.sh $(CC) > > $(CLANG_FLAGS) $(USERCFLAGS) $(USERLDFLAGS) $(m32-flag) -static) > > > > config CC_HAS_ASM_GOTO_OUTPUT > > + # gcc-11 has a nasty bug where it doesn't treat asm goto as volatile, > > + # which can result in asm blocks being dropped when compiling with - > 02. > > + # Note, explicitly forcing volatile doesn't entirely fix the bug! > > + # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 > > + depends on !CC_IS_GCC || GCC_VERSION >= 120000 > > LGTM; but we might need to be more specific about avoiding certain min > versions of gcc 13 and 12. > > > def_bool $(success,echo 'int foo(int x) { asm goto ("": > > "=r"(x) ::: bar); return x; bar: return 0; }' | $(CC) -x c - -c -o > > /dev/null) > > > > config CC_HAS_ASM_GOTO_TIED_OUTPUT > > > > base-commit: 047371968ffc470769f541d6933e262dc7085456 > > -- > > 2.43.0.687.g38aa6559b0-goog > > > > > -- > Thanks, > ~Nick Desaulniers
On Fri, 9 Feb 2024 at 09:14, Andrew Pinski (QUIC) <quic_apinski@quicinc.com> wrote: > > So the exact versions of GCC where this is/was fixed are: > 12.4.0 (not released yet) > 13.2.0 > 14.1.0 (not released yet) Looking at the patch that the bugzilla says is the fix, it *looks* like it's just the "mark volatile" that is missing. But Sean says that even if we mark "asm goto" as volatile manually, it still fails. So there seems to be something else going on in addition to just the volatile. Side note: the reason we have that "asm_volatile_goto()" define in the kernel is that we *used* to have a _different_ workaround for a gcc bug in this area: /* * GCC 'asm goto' miscompiles certain code sequences: * * http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 * * Work it around via a compiler barrier quirk suggested by Jakub Jelinek. * * (asm goto is automatically volatile - the naming reflects this.) */ #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) and looking at that (old) bugzilla there seems to be a lot of "seems to be fixed", but it's not entirely clear. We've removed that workaround in commit 43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR 58670"), I'm wondering if maybe that removal was a bit optimistic. Linus
On Fri, Feb 09, 2024, Linus Torvalds wrote: > On Fri, 9 Feb 2024 at 09:14, Andrew Pinski (QUIC) > <quic_apinski@quicinc.com> wrote: > > > > So the exact versions of GCC where this is/was fixed are: > > 12.4.0 (not released yet) > > 13.2.0 > > 14.1.0 (not released yet) > > Looking at the patch that the bugzilla says is the fix, it *looks* > like it's just the "mark volatile" that is missing. > > But Sean says that even if we mark "asm goto" as volatile manually, > it still fails. > > So there seems to be something else going on in addition to just the volatile. Aha! Yeah, there's a second bug that set things up so that the "not implicitly volatile" bug could rear its head. (And now I feel less bad for not suspecting the compiler sooner, because it didn't occur to me that gcc could possibly think the asm blob had no used outputs). With "volatile" forced, gcc generates code for the asm blob, but doesn't actually consume the output of the VMREAD. As a result, the optimization pass sees the unused output and throws it away because the blob isn't treated as volatile. vmread %rax,%rax <= output register is unused jbe 0xffffffff8109994a <sync_vmcs02_to_vmcs12+1898> xor %r12d,%r12d <= one of the "return 0" statements mov %r12,0xf0(%rbx) <= store the output > Side note: the reason we have that "asm_volatile_goto()" define in the > kernel is that we *used* to have a _different_ workaround for a gcc > bug in this area: > > /* > * GCC 'asm goto' miscompiles certain code sequences: > * > * http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 > * > * Work it around via a compiler barrier quirk suggested by Jakub Jelinek. > * > * (asm goto is automatically volatile - the naming reflects this.) > */ > #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) > > and looking at that (old) bugzilla there seems to be a lot of "seems > to be fixed", but it's not entirely clear. > > We've removed that workaround in commit 43c249ea0b1e ("compiler-gcc.h: > remove ancient workaround for gcc PR 58670"), I'm wondering if maybe > that removal was a bit optimistic. FWIW, reverting that does restore correct behavior on gcc-11. Note, this is 100% reproducible across multiple systems, though AFAICT it's somewhat dependent on the .config. Holler if anyone wants the .config.
On Fri, Feb 9, 2024 at 10:43 AM Sean Christopherson <seanjc@google.com> wrote: > > On Fri, Feb 09, 2024, Linus Torvalds wrote: > > On Fri, 9 Feb 2024 at 09:14, Andrew Pinski (QUIC) > > <quic_apinski@quicinc.com> wrote: > > > > > > So the exact versions of GCC where this is/was fixed are: > > > 12.4.0 (not released yet) > > > 13.2.0 > > > 14.1.0 (not released yet) > > > > Looking at the patch that the bugzilla says is the fix, it *looks* > > like it's just the "mark volatile" that is missing. > > > > But Sean says that even if we mark "asm goto" as volatile manually, > > it still fails. > > > > So there seems to be something else going on in addition to just the volatile. > > Aha! Yeah, there's a second bug that set things up so that the "not implicitly > volatile" bug could rear its head. (And now I feel less bad for not suspecting > the compiler sooner, because it didn't occur to me that gcc could possibly think > the asm blob had no used outputs). > > With "volatile" forced, gcc generates code for the asm blob, but doesn't actually > consume the output of the VMREAD. As a result, the optimization pass sees the > unused output and throws it away because the blob isn't treated as volatile. > > vmread %rax,%rax <= output register is unused > jbe 0xffffffff8109994a <sync_vmcs02_to_vmcs12+1898> > xor %r12d,%r12d <= one of the "return 0" statements > mov %r12,0xf0(%rbx) <= store the output > > > Side note: the reason we have that "asm_volatile_goto()" define in the > > kernel is that we *used* to have a _different_ workaround for a gcc > > bug in this area: > > > > /* > > * GCC 'asm goto' miscompiles certain code sequences: > > * > > * http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 > > * > > * Work it around via a compiler barrier quirk suggested by Jakub Jelinek. > > * > > * (asm goto is automatically volatile - the naming reflects this.) > > */ > > #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) > > > > and looking at that (old) bugzilla there seems to be a lot of "seems > > to be fixed", but it's not entirely clear. > > > > We've removed that workaround in commit 43c249ea0b1e ("compiler-gcc.h: Interesting, I thought I had proposed removing that earlier and Linus had yelled about doing so. https://lore.kernel.org/lkml/20180907222109.163802-1-ndesaulniers@google.com/ seems like in ~2018 I was trying to, but can't seem to find the thread where Linus pushed back. > > remove ancient workaround for gcc PR 58670"), I'm wondering if maybe > > that removal was a bit optimistic. > > FWIW, reverting that does restore correct behavior on gcc-11. And also pessimizes all asm gotos (for gcc) including ones that don't contain output as described in 43c249ea0b1e. The version checks would at least not pessimize those. > > Note, this is 100% reproducible across multiple systems, though AFAICT it's > somewhat dependent on the .config. Holler if anyone wants the .config.
On Fri, 9 Feb 2024 at 10:43, Sean Christopherson <seanjc@google.com> wrote: > > > We've removed that workaround in commit 43c249ea0b1e ("compiler-gcc.h: > > remove ancient workaround for gcc PR 58670"), I'm wondering if maybe > > that removal was a bit optimistic. > > FWIW, reverting that does restore correct behavior on gcc-11. Hmm. I suspect we'll just have to revert that commit then - although probably in some form where it only affects the "this has outputs" case. Which is much less common than the non-output "asm goto" case. It does cause gcc to generate fairly horrific code (as noted in the commit), but it's almost certainly still better code than what the non-asm-goto case results in. We have very few uses of CC_HAS_ASM_GOTO_OUTPUT (and the related CC_HAS_ASM_GOTO_TIED_OUTPUT), but unsafe_get_user() in particular generates horrid code without it. But it would be really good to understand *what* that secondary bug is, and the fix for it. Just to make sure that gcc is really fixed, and there isn't some pending bug that we just keep hiding with that extra empty volatile asm. Andrew? Linus
On Fri, 9 Feb 2024 at 10:55, Nick Desaulniers <ndesaulniers@google.com> wrote: > > And also pessimizes all asm gotos (for gcc) including ones that don't > contain output as described in 43c249ea0b1e. The version checks would > at least not pessimize those. Yeah, no, we should limit that workaround only to the asm goto with outputs case. We should also probably get rid of the existing "asm_volatile_goto()" macro name entirely. That name was always pretty horrific, in that it didn't even mark the asm as volatile even in the case where it did anything. So the name of that macro made little sense, and the new workaround should be only for asm goto with outputs. So I'd suggest jmaking a new macro with that name: #define asm_goto_output(x...) and on gcc use that old workaround, and on clang just make it be a plain "asm goto". Hmm? Linus
On Fri, Feb 9, 2024 at 11:01 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Fri, 9 Feb 2024 at 10:55, Nick Desaulniers <ndesaulniers@google.com> wrote: > > > > And also pessimizes all asm gotos (for gcc) including ones that don't > > contain output as described in 43c249ea0b1e. The version checks would > > at least not pessimize those. > > Yeah, no, we should limit that workaround only to the asm goto with > outputs case. > > We should also probably get rid of the existing "asm_volatile_goto()" > macro name entirely. That name was always pretty horrific, in that it > didn't even mark the asm as volatile even in the case where it did > anything. +1. 43c249ea0b1e could have done so, but I'm guessing "tree wide changes in Linux are not fun" was perhaps the reason it wasn't done so then. I also think folks are too aggressive putting volatile on asm statements that might not need them; it's definitely less cognitive burden to just always put `volatile` on inline asm but I suspect that's leaving some performance on the floor in certain cases. (I had a patch to clang to warn when the volatile was unnecessary in cases where it was explicit, but that was shot down in code review as being harassing to users). > > So the name of that macro made little sense, and the new workaround > should be only for asm goto with outputs. So I'd suggest jmaking a new > macro with that name: > > #define asm_goto_output(x...) > > and on gcc use that old workaround, and on clang just make it be a > plain "asm goto". > > Hmm? Thinking through the tradeoffs, the Kconfig approach would pessimize GCC versions with the "lack of implicit volatile" bug to not use asm goto w/ outputs at all. Having a new macro would just make the `volatile` qualifier explicit, which is a no-op on gcc versions that don't contain the bug (while allowing them to use asm goto with outputs, which is probably better for codegen). So I agree a new macro seems better than disabling things for kconfig. (Assuming the only issue is the need to make `volatile` explicit for a few GCC versions; it's not clear to me from Sean's latest response if there's more than one bug here). I'm not signing up to shave either yak though.
> -----Original Message----- > From: Linus Torvalds <torvalds@linux-foundation.org> > Sent: Friday, February 9, 2024 10:56 AM > To: Sean Christopherson <seanjc@google.com> > Cc: Andrew Pinski (QUIC) <quic_apinski@quicinc.com>; Nick Desaulniers > <ndesaulniers@google.com>; linux-kernel@vger.kernel.org; Masahiro Yamada > <masahiroy@kernel.org>; Peter Zijlstra <peterz@infradead.org>; > kvm@vger.kernel.org > Subject: Re: [PATCH] Kconfig: Explicitly disable asm goto w/ outputs on gcc-11 > (and earlier) > > On Fri, 9 Feb 2024 at 10:43, Sean Christopherson <seanjc@google.com> > wrote: > > > > > We've removed that workaround in commit 43c249ea0b1e ("compiler- > gcc.h: > > > remove ancient workaround for gcc PR 58670"), I'm wondering if maybe > > > that removal was a bit optimistic. > > > > FWIW, reverting that does restore correct behavior on gcc-11. > > Hmm. I suspect we'll just have to revert that commit then - although probably > in some form where it only affects the "this has outputs" > case. > > Which is much less common than the non-output "asm goto" case. > > It does cause gcc to generate fairly horrific code (as noted in the commit), but > it's almost certainly still better code than what the non-asm-goto case results > in. > > We have very few uses of CC_HAS_ASM_GOTO_OUTPUT (and the related > CC_HAS_ASM_GOTO_TIED_OUTPUT), but unsafe_get_user() in particular > generates horrid code without it. > > But it would be really good to understand *what* that secondary bug is, and > the fix for it. Just to make sure that gcc is really fixed, and there isn't some > pending bug that we just keep hiding with that extra empty volatile asm. > > Andrew? > Let me first state inline-asm without output has always been volatile; `asm goto` or not (this has been true since 2.95.3 or even before). The issue is with an output inline-asm is no longer volatile. But the documentation of GCC stated all `asm goto` was volatile. So the side effect of making all `asm goto` as volatile does have an effect on the ones without an output since they were already treated internally and treated similarly to the inline-asm without an output. The fix that was done to GCC was now treat all `asm goto` as volatile instead of just the ones without output. So basically what would happen is internally GCC would remove the `asm goto` with an output if the output was unused. This would cause havoc in the internal state of GCC because the CFG would not be up to date to what the rest of the IR was. Oh and there was a bug in GCC 12.1.0-12.3.0, and GCC 13.1.0 releases which might be the reason why folks are hitting issues even with the volatile added back, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 was the fix there. Note there was/is a new different for `asm goto` was fixed in the past few weeks where some of the optimizations was not ready to handle them (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422); this was just fixed on all of the open release branches but NOT in any release yet; it will be in 11.5.0, 12.4.0, 13.3.0 and 14.1.0. > Linus
On Fri, 9 Feb 2024 at 11:01, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > We should also probably get rid of the existing "asm_volatile_goto()" > macro name entirely. That name was always pretty horrific, in that it > didn't even mark the asm as volatile even in the case where it did > anything. > > So the name of that macro made little sense, and the new workaround > should be only for asm goto with outputs. So I'd suggest jmaking a new > macro with that name: > > #define asm_goto_output(x...) > > and on gcc use that old workaround, and on clang just make it be a > plain "asm goto". So here's a suggested patch that does this. It's largely done with "git grep" and "sed -i", plus some manual fixups for the (few) cases where we have outputs. It looks superficially sane to me, and it passed an allmodconfig build with gcc, but I'm not going to claim that it is really tested. Sean? Does this work for the case you noticed? Basically this gets rid of the old "asm_volatile_goto()" entirely as useless, but replaces it with "asm_goto_outputs()" for the places where we have outputs. And then for gcc, it makes those cases (a) use "asm volatile goto" to fix the fact that some versions of gcc will have missed the "volatile" (b) adds that extra empty asm as a second barrier after the "real" asm goto statement That (b) is very much voodoo programming, but it matches the old magic barrier thing that Jakub Jelinek suggested for the really *old* gcc bug wrt plain (non-output) "asm goto". The underlying bug for _that_ was fixed long ago: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 We removed that for plain "asm goto" workaround a couple of years ago, so "asm_volatile_goto()" has been a no-op since June 2022, but this now resurrects that hack for the output case. I'm not loving it, but Sean seemed to confirm that it fixes the code generation problem, so ... Adding Uros to the cc, since he is both involved with gcc and with the previous asm goto workaround removal, so maybe he has other suggestions. Uros, see https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ for background. Also adding Jakub since I'm re-using the hack he suggested for a different - but similar - case. He may have strong opinions too, and may object to that particular monkey-see-monkey-do voodoo programming. Linus
On Fri, Feb 09, 2024, Linus Torvalds wrote: > Sean? Does this work for the case you noticed? Yep. You can quite literally see the effect of the asm(""). A "good" sequence directly propagates the result from the VMREAD's destination register to its final destination <+1756>: mov $0x280e,%r13d <+1762>: vmread %r13,%r13 <+1766>: jbe 0x209fa <sync_vmcs02_to_vmcs12+1834> <+1768>: mov %r13,0xe8(%rbx) whereas the "bad" sequence bounces through a different register. <+1780>: mov $0x2810,%eax <+1785>: vmread %rax,%rax <+1788>: jbe 0x209e4 <sync_vmcs02_to_vmcs12+1812> <+1790>: mov %rax,%r12 <+1793>: mov %r12,0xf0(%rbx) > That (b) is very much voodoo programming, but it matches the old magic > barrier thing that Jakub Jelinek suggested for the really *old* gcc > bug wrt plain (non-output) "asm goto". The underlying bug for _that_ > was fixed long ago: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 > > We removed that for plain "asm goto" workaround a couple of years ago, > so "asm_volatile_goto()" has been a no-op since June 2022, but this > now resurrects that hack for the output case. > > I'm not loving it, but Sean seemed to confirm that it fixes the code > generation problem, so ... Yeah, I'm in the same boat.
From: Sean Christopherson > Sent: 09 February 2024 21:47 > > On Fri, Feb 09, 2024, Linus Torvalds wrote: > > Sean? Does this work for the case you noticed? > > Yep. You can quite literally see the effect of the asm(""). A "good" sequence > directly propagates the result from the VMREAD's destination register to its > final destination > > <+1756>: mov $0x280e,%r13d > <+1762>: vmread %r13,%r13 > <+1766>: jbe 0x209fa <sync_vmcs02_to_vmcs12+1834> > <+1768>: mov %r13,0xe8(%rbx) > > whereas the "bad" sequence bounces through a different register. > > <+1780>: mov $0x2810,%eax > <+1785>: vmread %rax,%rax > <+1788>: jbe 0x209e4 <sync_vmcs02_to_vmcs12+1812> > <+1790>: mov %rax,%r12 > <+1793>: mov %r12,0xf0(%rbx) ... Annoying, but I doubt it is measurable in this case. Firstly it could easily be a 'free' register rename. Secondly isn't vmread horribly slow anyway, so an extra clock or two won't matter? The double register move that OPTIMER_HIDE_VAR() often generates is another matter entirely :-) In the old days the peephole optimiser would (should?) have removed most of these. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, Feb 9, 2024 at 9:39 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Fri, 9 Feb 2024 at 11:01, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > We should also probably get rid of the existing "asm_volatile_goto()" > > macro name entirely. That name was always pretty horrific, in that it > > didn't even mark the asm as volatile even in the case where it did > > anything. > > > > So the name of that macro made little sense, and the new workaround > > should be only for asm goto with outputs. So I'd suggest jmaking a new > > macro with that name: > > > > #define asm_goto_output(x...) > > > > and on gcc use that old workaround, and on clang just make it be a > > plain "asm goto". > > So here's a suggested patch that does this. > > It's largely done with "git grep" and "sed -i", plus some manual > fixups for the (few) cases where we have outputs. > > It looks superficially sane to me, and it passed an allmodconfig build > with gcc, but I'm not going to claim that it is really tested. > > Sean? Does this work for the case you noticed? > > Basically this gets rid of the old "asm_volatile_goto()" entirely as > useless, but replaces it with "asm_goto_outputs()" for the places > where we have outputs. > > And then for gcc, it makes those cases > > (a) use "asm volatile goto" to fix the fact that some versions of gcc > will have missed the "volatile" > > (b) adds that extra empty asm as a second barrier after the "real" > asm goto statement > > That (b) is very much voodoo programming, but it matches the old magic > barrier thing that Jakub Jelinek suggested for the really *old* gcc > bug wrt plain (non-output) "asm goto". The underlying bug for _that_ > was fixed long ago: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 > > We removed that for plain "asm goto" workaround a couple of years ago, > so "asm_volatile_goto()" has been a no-op since June 2022, but this > now resurrects that hack for the output case. > > I'm not loving it, but Sean seemed to confirm that it fixes the code > generation problem, so ... > > Adding Uros to the cc, since he is both involved with gcc and with the > previous asm goto workaround removal, so maybe he has other > suggestions. Uros, see > > https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ I'd suggest the original poster to file a bug report in the GCC bugzilla. This way, the bug can be properly analysed and eventually fixed. The detailed instructions are available at https://gcc.gnu.org/bugs/ > for background. > > Also adding Jakub since I'm re-using the hack he suggested for a > different - but similar - case. He may have strong opinions too, and > may object to that particular monkey-see-monkey-do voodoo programming. This big-hammer approach will effectively disable all optimizations around the asm, and should really be used only as a last resort. As learned from the PR58670 (linked above), these workarounds can stay in the kernel forever, and even when the compiler is fixed, there is little incentive to remove them - developers from the kernel world do not want to touch them, because "the workaround just works", and developers from the compiler world do not want to touch them either, because of unknown hidden consequences of removal. Please note, that the kernel is a heavy user of asm statements, and in GCC some forms of asm statements were developed specifically for kernel use, e.g. CC-output, and asm goto in all of its flavors. They have little or no use outside the kernel. If the kernel disables all optimizations around these statements, there actually remain no real-world testcases of how compiler optimizations interact with asm statements, and when new optimizations are introduced to the compiler, asm statemets are left behind (*). (*) When the GCC compiler nears its release, people start to test it on their code bases, and the kernel is one of the most important code bases as far as the compiler is concerned. We count on bugreports from these tests to fix possible fall-out from new optimizations, and the big-hammer workaround will just hide the possible problems. So, I'd suggest at least limit the workaround to known-bad compilers. Thanks, Uros.
On Sun, 11 Feb 2024 at 03:12, Uros Bizjak <ubizjak@gmail.com> wrote: > > I'd suggest the original poster to file a bug report in the GCC > bugzilla. This way, the bug can be properly analysed and eventually > fixed. The detailed instructions are available at > https://gcc.gnu.org/bugs/ Yes, please. Sean? In order to *not* confuse it with the "asm goto with output doesn't imply volatile" bugs, could you make a bug report that talks purely about the code generation issue that happens even with a manually added volatile (your third code sequence in your original email)? > So, I'd suggest at least limit the workaround to known-bad compilers. Right now, I don't think we know which ones are bad. If it was just the "add a volatile by hand" issue, we'd have a good pointer to exactly which issue it is. The other bugzilla entries I saw talked about ICE errors - and one of them is quite likely to be the cause of this, because it really looks like the code issue is some internal confusion. The asm that Sean uses doesn't even have an unused result, so even the volatile shouldn't matter, and seems purely a case of "limiting optimizations partially hides the issue". And then us adding another empty asm obviously does just even more of the same, to the point that now the code works. So I don't think we actually know which compiler version is the fixed one. We only know that gcc-11 doesn't work for Sean. Linus
On Sun, Feb 11, 2024 at 11:59:49AM -0800, Linus Torvalds wrote: > On Sun, 11 Feb 2024 at 03:12, Uros Bizjak <ubizjak@gmail.com> wrote: > > > > I'd suggest the original poster to file a bug report in the GCC > > bugzilla. This way, the bug can be properly analysed and eventually > > fixed. The detailed instructions are available at > > https://gcc.gnu.org/bugs/ > > Yes, please. Sean? > > In order to *not* confuse it with the "asm goto with output doesn't > imply volatile" bugs, could you make a bug report that talks purely > about the code generation issue that happens even with a manually > added volatile (your third code sequence in your original email)? Preferably for all the different cases where you suspect a compiler bug. At minimum preprocessed source + compiler options + detailed description where do you think the bug is (or small runtime testcase but that is harder for issues derived from the kernel obviously). GCC 11 is still supported upstream, so bugs reproduceable even with just that should be filed in gcc.gnu.org/bugzilla/, if something is only reproduceable with older compilers, guess it belongs in some distribution's bugtrackers if those still support those compilers. Once filed we can bisect, analyze them, fix. ICE bugs are even easier to file, all we need is preprocessed source, command line options and gcc version/architecture. gcc -freport-bug in most cases should be able to create everything in one file for the bugreport. As for the workarounds in the kernel, I'd also like to see only workarounds for specific compiler versions (once filed, bisected and analyzed, workaround could be either based on affected compiler versions, or kernel could try to check for the compiler bug in question and only add workaround if that bug reproduces on a short testcase. Sure, this would be easier in autoconf style checks, but could be done even in kernel's makefiles. Jakub
On Sun, Feb 11, 2024, Linus Torvalds wrote: > On Sun, 11 Feb 2024 at 03:12, Uros Bizjak <ubizjak@gmail.com> wrote: > > > > I'd suggest the original poster to file a bug report in the GCC > > bugzilla. This way, the bug can be properly analysed and eventually > > fixed. The detailed instructions are available at > > https://gcc.gnu.org/bugs/ > > Yes, please. Sean? > > In order to *not* confuse it with the "asm goto with output doesn't > imply volatile" bugs, could you make a bug report that talks purely > about the code generation issue that happens even with a manually > added volatile (your third code sequence in your original email)? Will do. Got a bug report ready, just waiting for a GCC Bugzilla account to be created for me so I can file it...
On Mon, Feb 12, 2024, Sean Christopherson wrote: > On Sun, Feb 11, 2024, Linus Torvalds wrote: > > On Sun, 11 Feb 2024 at 03:12, Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > I'd suggest the original poster to file a bug report in the GCC > > > bugzilla. This way, the bug can be properly analysed and eventually > > > fixed. The detailed instructions are available at > > > https://gcc.gnu.org/bugs/ > > > > Yes, please. Sean? > > > > In order to *not* confuse it with the "asm goto with output doesn't > > imply volatile" bugs, could you make a bug report that talks purely > > about the code generation issue that happens even with a manually > > added volatile (your third code sequence in your original email)? > > Will do. Got a bug report ready, just waiting for a GCC Bugzilla account to be > created for me so I can file it... https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
On Sun, 11 Feb 2024 at 03:12, Uros Bizjak <ubizjak@gmail.com> wrote: > > So, I'd suggest at least limit the workaround to known-bad compilers. Based on the current state of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 I would suggest this attached kernel patch, which makes the manual "volatile" the default case (since it should make no difference except for the known gcc issue), and limits the extra empty asm serialization to gcc versions older than 12.1.0. But Jakub is clearly currently trying to figure out exactly what was going wrong, so things may change. Maybe the commit he bisected to happened to just accidentally hide the real issue. Linus
On Wed, 14 Feb 2024 at 10:43, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Based on the current state of > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 > > I would suggest this attached kernel patch [...] Well, that "current state" didn't last long, and it looks like Jakub found the real issue and posted a suggested fix. Anyway, the end result is that the current kernel situation - that adds the workaround for all gcc versions - is the best that we can do for now. Linus
On Wed, Feb 14, 2024 at 04:11:05PM -0800, Linus Torvalds wrote: > On Wed, 14 Feb 2024 at 10:43, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Based on the current state of > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 > > > > I would suggest this attached kernel patch [...] > > Well, that "current state" didn't last long, and it looks like Jakub > found the real issue and posted a suggested fix. > > Anyway, the end result is that the current kernel situation - that > adds the workaround for all gcc versions - is the best that we can do > for now. Can it be guarded with #if GCC_VERSION < 140100 so that it isn't forgotten? GCC 14.1 certainly will have a fix for this (so will GCC 13.3, 12.4 and 11.5). Maybe it would be helpful to use #if GCC_VERSION < 140000 while it is true that no GCC 14 snapshots until today (or whenever the fix will be committed) have the fix, for GCC trunk it is up to the distros to use the latest snapshot if they use it at all and would allow better testing of the kernel code without the workaround, so that if there are other issues they won't be discovered years later. Most userland code doesn't actually use asm goto with outputs... Jakub
On Thu, 15 Feb 2024 at 00:39, Jakub Jelinek <jakub@redhat.com> wrote: > > Can it be guarded with > #if GCC_VERSION < 140100 Ack. I'll update the workaround to do that, and add the new and improved bugzilla pointer. Thanks for fixing this so quickly. Linus
On Thu, 15 Feb 2024 at 10:25, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, 15 Feb 2024 at 00:39, Jakub Jelinek <jakub@redhat.com> wrote: > > > > Can it be guarded with > > #if GCC_VERSION < 140100 > > Ack. I'll update the workaround to do that, and add the new and > improved bugzilla pointer. .. and I also followed your suggestion to just consider any gcc-14 snapshots as fixed. That seemed safe, considering that in practice the actual bug that Sean reported seems to not actually trigger with any gcc version 12.1+ as per your bisect (and my minimal testing). HOWEVER, when I was working through this, I noted that the *other* part of the workaround (the "missing volatile") doesn't seem to have been backported as aggressively. IOW, I find that "Mark asm goto with outputs as volatile" in the gcc-12 and gcc-13 branches, but not in gcc-11. So I did end up making the default "asm_goto_output()" macro always use "asm volatile goto()", so that we don't have to worry about the other gcc issue. End result: the extra empty asm barrier is now dependent on gcc version in my tree, but the manual addition of 'volatile' is unconditional. Because it looks to me like gcc-11.5 will have your fix for the pseudo ordering, but not Andrew Pinski's fix for missing a volatile. Anyway, since the version dependencies were complex enough, I ended up going with putting that logic in our Kconfig files, rather than making the gcc-specific header file an ugly mess of #if's. Our Kconfig files are pretty much designed for having complicated configuration dependencies, so it ends up being quite natural there: config GCC_ASM_GOTO_OUTPUT_WORKAROUND bool depends on CC_IS_GCC && CC_HAS_ASM_GOTO_OUTPUT # Fixed in GCC 14.1, 13.3, 12.4 and 11.5 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 default y if GCC_VERSION < 110500 default y if GCC_VERSION >= 120000 && GCC_VERSION < 120400 default y if GCC_VERSION >= 130000 && GCC_VERSION < 130300 and having those kinds of horrid expressions as preprocessor code included in every single compilation unit seemed just nasty. Linus
On Thu, Feb 15, 2024 at 11:17:44AM -0800, Linus Torvalds wrote:
> but the manual addition of 'volatile' is unconditional.
That is just fine. The extra keyword is not wrong, it is just redundant
with fixed compilers, where it doesn't change anything in the behavior.
Jakub
diff --git a/init/Kconfig b/init/Kconfig index deda3d14135b..f4e46d64c1e7 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -82,6 +82,11 @@ config CC_CAN_LINK_STATIC default $(success,$(srctree)/scripts/cc-can-link.sh $(CC) $(CLANG_FLAGS) $(USERCFLAGS) $(USERLDFLAGS) $(m32-flag) -static) config CC_HAS_ASM_GOTO_OUTPUT + # gcc-11 has a nasty bug where it doesn't treat asm goto as volatile, + # which can result in asm blocks being dropped when compiling with -02. + # Note, explicitly forcing volatile doesn't entirely fix the bug! + # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 + depends on !CC_IS_GCC || GCC_VERSION >= 120000 def_bool $(success,echo 'int foo(int x) { asm goto ("": "=r"(x) ::: bar); return x; bar: return 0; }' | $(CC) -x c - -c -o /dev/null) config CC_HAS_ASM_GOTO_TIED_OUTPUT
Explicitly require gcc-12+ to enable asm goto with outputs on gcc to avoid what is effectively a data corruption bug on gcc-11. As per https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, "asm goto" is *supposed* be implicitly volatile, but gcc-11 fails to treat it as such. When compiling with -O2, failure to treat the asm block as volatile can result in the entire block being discarded during optimization. Even worse, forcing "asm volatile goto" keeps the block, but generates completely bogus code. Hardcode the gcc-12 or later requirement as trying to pipe the assembled output to stdout, e.g. to query the generated code via objdump, doesn't work due to the assembler wanting to seek throughout the output file. Note, gcc-11 is the first gcc version that supports goto w/ outputs (obviously with a loose definition of "supports"). E.g. given KVM's code sequence: vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0); vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1); vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2); vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3); where vmcs_read64() eventually becomes: asm_volatile_goto("1: vmread %[field], %[output]\n\t" "jna %l[do_fail]\n\t" _ASM_EXTABLE(1b, %l[do_exception]) : [output] "=r" (value) : [field] "r" (field) : "cc" : do_fail, do_exception); return value; do_fail: instrumentation_begin(); vmread_error(field); instrumentation_end(); return 0; do_exception: kvm_spurious_fault(); return 0; the sequence of VMREADs should generate: nopl 0x0(%rax,%rax,1) mov $0x280a,%eax vmread %rax,%rax jbe 0xffffffff81099849 <sync_vmcs02_to_vmcs12+1929> mov %rax,0xd8(%rbx) nopl 0x0(%rax,%rax,1) mov $0x280c,%eax vmread %rax,%rax jbe 0xffffffff8109982c <sync_vmcs02_to_vmcs12+1900> mov %rax,0xe0(%rbx) nopl 0x0(%rax,%rax,1) mov $0x280e,%eax vmread %rax,%rax jbe 0xffffffff8109980f <sync_vmcs02_to_vmcs12+1871> mov %rax,0xe8(%rbx) nopl 0x0(%rax,%rax,1) mov $0x2810,%eax vmread %rax,%rax jbe 0xffffffff810997f2 <sync_vmcs02_to_vmcs12+1842> mov %rax,0xf0(%rbx) jmp 0xffffffff81099297 <sync_vmcs02_to_vmcs12+471> but gcc-11 will omit the asm block for the VMREAD to GUEST_PDPTR3 and skip straight to one of the "return 0" statements: nopl 0x0(%rax,%rax,1) mov $0x280a,%r13d vmread %r13,%r13 jbe 0xffffffff810996cd <sync_vmcs02_to_vmcs12+1949> mov %r13,0xd8(%rbx) nopl 0x0(%rax,%rax,1) mov $0x280c,%r13d vmread %r13,%r13 jbe 0xffffffff810996ae <sync_vmcs02_to_vmcs12+1918> mov %r13,0xe0(%rbx) nopl 0x0(%rax,%rax,1) mov $0x280e,%r13d vmread %r13,%r13 jbe 0xffffffff8109968f <sync_vmcs02_to_vmcs12+1887> mov %r13,0xe8(%rbx) nopl 0x0(%rax,%rax,1) xor %r12d,%r12d <= return 0 mov %r12,0xf0(%rbx) <= store result to vmcs12->guest_pdptr3 jmp 0xffffffff8109912c <sync_vmcs02_to_vmcs12+508> and with "volatile" forced, gcc-11 generates the correct-at-first-glance, but terribly broken sequence of: nopl 0x0(%rax,%rax,1) mov $0x280a,%r13d vmread %r13,%r13 jbe 0xffffffff810999a4 <sync_vmcs02_to_vmcs12+1988> mov %r13,0xd8(%rbx) nopl 0x0(%rax,%rax,1) mov $0x280c,%r13d vmread %r13,%r13 jbe 0xffffffff81099985 <sync_vmcs02_to_vmcs12+1957> mov %r13,0xe0(%rbx) nopl 0x0(%rax,%rax,1) mov $0x280e,%r13d vmread %r13,%r13 jbe 0xffffffff81099966 <sync_vmcs02_to_vmcs12+1926> mov %r13,0xe8(%rbx) nopl 0x0(%rax,%rax,1) mov $0x2810,%eax vmread %rax,%rax jbe 0xffffffff8109994a <sync_vmcs02_to_vmcs12+1898> xor %r12d,%r12d <= WTF gcc!?!?! mov %r12,0xf0(%rbx) Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 Fixes: 587f17018a2c ("Kconfig: add config option for asm goto w/ outputs") Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> --- Linus, I'm sending to you directly as this seems urgent enough to apply straightaway, and this obviously affects much more than the build system. init/Kconfig | 5 +++++ 1 file changed, 5 insertions(+) base-commit: 047371968ffc470769f541d6933e262dc7085456