Message ID | 20200413033811.75074-1-maskray@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Delete the space separator in __emit_inst | expand |
On Sun, Apr 12, 2020 at 8:38 PM 'Fangrui Song' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > Many instances of __emit_inst(x) expand to a directive. In a few places > it is used as a macro argument, e.g. > > arch/arm64/include/asm/sysreg.h > #define __emit_inst(x) .inst (x) > > arch/arm64/include/asm/sysreg.h > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) > > arch/arm64/kvm/hyp/entry.S > ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) > > Clang integrated assembler parses `.inst (x)` as two arguments passing > to a macro. We delete the space separator so that `.inst(x)` will be > parsed as one argument. > > Note, GNU as parsing `.inst (x)` as one argument is unintentional (for > example the x86 backend will parse the construct as two arguments). > See https://sourceware.org/bugzilla/show_bug.cgi?id=25750#c10 Thanks for the patch and for leading the discussion with the binutils developers on this curious parsing case. > > Link: https://github.com/ClangBuiltLinux/linux/issues/939 > Cc: clang-built-linux@googlegroups.com > Signed-off-by: Fangrui Song <maskray@google.com> Shouldn't this have: Suggested-by: Ilie Halip <ilie.halip@gmail.com> Since Ilie sugguested this in https://github.com/ClangBuiltLinux/linux/issues/939#issuecomment-601776123? > --- > arch/arm64/include/asm/sysreg.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index ebc622432831..af21e2ec5e3e 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -49,7 +49,9 @@ > #ifndef CONFIG_BROKEN_GAS_INST > > #ifdef __ASSEMBLY__ > -#define __emit_inst(x) .inst (x) > +// The space separator is omitted so that __emit_inst(x) can be parsed as > +// either a directive or a macro argument. > +#define __emit_inst(x) .inst(x) > #else > #define __emit_inst(x) ".inst " __stringify((x)) "\n\t" What happens if someone starts using `__emit_inst` from inline assembly, and passes that subexpression to a macro? There are no users today in arch/arm64/, but I think it's better to change both. With this applied on -next, and testing via: $ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make LLVM=1 LLVM_IAS=1 -j71 defconfig $ qemu-system-aarch64 -kernel arch/arm64/boot/Image.gz -machine virt -cpu cortex-a57 -nographic --append "console=ttyAMA0" -m 2048 -initrd <path to my buildroot.cpio> I was able to build and boot, modulo: https://github.com/ClangBuiltLinux/linux/issues/988 (new) https://github.com/ClangBuiltLinux/linux/issues/716 https://github.com/ClangBuiltLinux/linux/issues/510 (more specific to LLD than AS) so we're pretty close to being able to assemble an arm64 defconfig with Clang.
Hi Fangrui, On Sun, Apr 12, 2020 at 08:38:11PM -0700, Fangrui Song wrote: > Many instances of __emit_inst(x) expand to a directive. In a few places > it is used as a macro argument, e.g. > > arch/arm64/include/asm/sysreg.h > #define __emit_inst(x) .inst (x) > > arch/arm64/include/asm/sysreg.h > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) > > arch/arm64/kvm/hyp/entry.S > ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) > > Clang integrated assembler parses `.inst (x)` as two arguments passing > to a macro. We delete the space separator so that `.inst(x)` will be > parsed as one argument. I'm a little confused by the above; sorry if the below sounds stupid or pedantic, but I just want to make sure I've understood the problem correctly. For the above, ALTERNATIVE() and SET_PSTATE_PAN() are both preprocessor macros, so I would expect those to be expanded before either the integrated assembler or an external assembler consumes any of the assembly (and both would see the same expanded text). Given that, I'm a bit confused as to why the integrated assembly would have an impact on preprocessing. Does compiling the pre-processed source using the integrated assembler result in the same behaviour? Can we see the expanded text to make that clear? ... at what stage exactly does this go wrong? Thanks, Mark. > > Note, GNU as parsing `.inst (x)` as one argument is unintentional (for > example the x86 backend will parse the construct as two arguments). > See https://sourceware.org/bugzilla/show_bug.cgi?id=25750#c10 > > Link: https://github.com/ClangBuiltLinux/linux/issues/939 > Cc: clang-built-linux@googlegroups.com > Signed-off-by: Fangrui Song <maskray@google.com> > --- > arch/arm64/include/asm/sysreg.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index ebc622432831..af21e2ec5e3e 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -49,7 +49,9 @@ > #ifndef CONFIG_BROKEN_GAS_INST > > #ifdef __ASSEMBLY__ > -#define __emit_inst(x) .inst (x) > +// The space separator is omitted so that __emit_inst(x) can be parsed as > +// either a directive or a macro argument. > +#define __emit_inst(x) .inst(x) > #else > #define __emit_inst(x) ".inst " __stringify((x)) "\n\t" > #endif > -- > 2.26.0.110.g2183baf09c-goog >
On 2020-04-14, Mark Rutland wrote: >Hi Fangrui, > >On Sun, Apr 12, 2020 at 08:38:11PM -0700, Fangrui Song wrote: >> Many instances of __emit_inst(x) expand to a directive. In a few places >> it is used as a macro argument, e.g. >> >> arch/arm64/include/asm/sysreg.h >> #define __emit_inst(x) .inst (x) >> >> arch/arm64/include/asm/sysreg.h >> #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) >> >> arch/arm64/kvm/hyp/entry.S >> ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) >> >> Clang integrated assembler parses `.inst (x)` as two arguments passing >> to a macro. We delete the space separator so that `.inst(x)` will be >> parsed as one argument. > >I'm a little confused by the above; sorry if the below sounds stupid or >pedantic, but I just want to make sure I've understood the problem >correctly. > >For the above, ALTERNATIVE() and SET_PSTATE_PAN() are both preprocessor >macros, so I would expect those to be expanded before either the >integrated assembler or an external assembler consumes any of the >assembly (and both would see the same expanded text). Given that, I'm a >bit confused as to why the integrated assembly would have an impact on >preprocessing. > >Does compiling the pre-processed source using the integrated assembler >result in the same behaviour? Can we see the expanded text to make that >clear? > >... at what stage exactly does this go wrong? > >Thanks, >Mark. Hi Mark, The C preprocessor expands arch/arm64/kvm/hyp/entry.S ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) to: alternative_insn nop, .inst (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1 `alternative_insn` is an assembler macro, not handled by the C preprocessor. Both comma and space are separators, with an exception that content inside a pair of parentheses/quotes is not split, so clang -cc1as or GNU as x86 splits arguments this way: nop, .inst, (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1 I actually feel that GNU as arm64's behavior is a little bit buggy. It works just because GNU as has another preprocessing step `do_scrub_chars` and its arm64 backend deletes the space before '(' alternative_insn nop,.inst(0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1 The x86 backend keeps the space before the outmost '(' alternative_insn nop,.inst (0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1 By reading its state machine, I think keeping the spaces will be the most reasonable behavior. If .inst were only used as arguments, alternative_insn nop, ".inst (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8))", 4, 1 would be the best to avoid parsing issues. >> >> Note, GNU as parsing `.inst (x)` as one argument is unintentional (for >> example the x86 backend will parse the construct as two arguments). >> See https://sourceware.org/bugzilla/show_bug.cgi?id=25750#c10 >> >> Link: https://github.com/ClangBuiltLinux/linux/issues/939 >> Cc: clang-built-linux@googlegroups.com >> Signed-off-by: Fangrui Song <maskray@google.com> >> --- >> arch/arm64/include/asm/sysreg.h | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h >> index ebc622432831..af21e2ec5e3e 100644 >> --- a/arch/arm64/include/asm/sysreg.h >> +++ b/arch/arm64/include/asm/sysreg.h >> @@ -49,7 +49,9 @@ >> #ifndef CONFIG_BROKEN_GAS_INST >> >> #ifdef __ASSEMBLY__ >> -#define __emit_inst(x) .inst (x) >> +// The space separator is omitted so that __emit_inst(x) can be parsed as >> +// either a directive or a macro argument. >> +#define __emit_inst(x) .inst(x) >> #else >> #define __emit_inst(x) ".inst " __stringify((x)) "\n\t" >> #endif >> -- >> 2.26.0.110.g2183baf09c-goog >>
On Tue, Apr 14, 2020 at 08:43:07AM -0700, Fangrui Song wrote: > > On 2020-04-14, Mark Rutland wrote: > > Hi Fangrui, > > > > On Sun, Apr 12, 2020 at 08:38:11PM -0700, Fangrui Song wrote: > > > Many instances of __emit_inst(x) expand to a directive. In a few places > > > it is used as a macro argument, e.g. > > > > > > arch/arm64/include/asm/sysreg.h > > > #define __emit_inst(x) .inst (x) > > > > > > arch/arm64/include/asm/sysreg.h > > > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) > > > > > > arch/arm64/kvm/hyp/entry.S > > > ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) > > > > > > Clang integrated assembler parses `.inst (x)` as two arguments passing > > > to a macro. We delete the space separator so that `.inst(x)` will be > > > parsed as one argument. > > > > I'm a little confused by the above; sorry if the below sounds stupid or > > pedantic, but I just want to make sure I've understood the problem > > correctly. > > > > For the above, ALTERNATIVE() and SET_PSTATE_PAN() are both preprocessor > > macros, so I would expect those to be expanded before either the > > integrated assembler or an external assembler consumes any of the > > assembly (and both would see the same expanded text). Given that, I'm a > > bit confused as to why the integrated assembly would have an impact on > > preprocessing. > > > > Does compiling the pre-processed source using the integrated assembler > > result in the same behaviour? Can we see the expanded text to make that > > clear? > > > > ... at what stage exactly does this go wrong? > > > > Thanks, > > Mark. > > Hi Mark, > > The C preprocessor expands arch/arm64/kvm/hyp/entry.S > ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) > > to: > alternative_insn nop, .inst (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1 > > `alternative_insn` is an assembler macro, not handled by the C preprocessor. > > Both comma and space are separators, with an exception that content > inside a pair of parentheses/quotes is not split, so clang -cc1as or GNU > as x86 splits arguments this way: > > nop, .inst, (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1 Thanks for this; I understand now. Could we fold that into the commit message? I think this is much clearer than the current wording. The explicit description of separator behaviour, the pre-expansion of the CPP macros, and the example of how the assembler will split this really help. > I actually feel that GNU as arm64's behavior is a little bit buggy. It > works just because GNU as has another preprocessing step `do_scrub_chars` > and its arm64 backend deletes the space before '(' > > alternative_insn nop,.inst(0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1 > > The x86 backend keeps the space before the outmost '(' > > alternative_insn nop,.inst (0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1 > > By reading its state machine, I think keeping the spaces will be the > most reasonable behavior. I think I agree. This deviation across architectures is unfortunate for such a low-level but common tool. > If .inst were only used as arguments, > > alternative_insn nop, ".inst (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8))", 4, 1 > > would be the best to avoid parsing issues. > > > > > > > Note, GNU as parsing `.inst (x)` as one argument is unintentional (for > > > example the x86 backend will parse the construct as two arguments). > > > See https://sourceware.org/bugzilla/show_bug.cgi?id=25750#c10 > > > > > > Link: https://github.com/ClangBuiltLinux/linux/issues/939 > > > Cc: clang-built-linux@googlegroups.com > > > Signed-off-by: Fangrui Song <maskray@google.com> > > > --- > > > arch/arm64/include/asm/sysreg.h | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > > > index ebc622432831..af21e2ec5e3e 100644 > > > --- a/arch/arm64/include/asm/sysreg.h > > > +++ b/arch/arm64/include/asm/sysreg.h > > > @@ -49,7 +49,9 @@ > > > #ifndef CONFIG_BROKEN_GAS_INST > > > > > > #ifdef __ASSEMBLY__ > > > -#define __emit_inst(x) .inst (x) > > > +// The space separator is omitted so that __emit_inst(x) can be parsed as > > > +// either a directive or a macro argument. > > > +#define __emit_inst(x) .inst(x) Can we make this a bit more explicit and say "assembler macro argument"? That way we can avoid any confusion with a CPP macro. With that (and with the details above folded into the commit message): Reviewed-by: Mark Rutland <mark.rutland@arm.com> Thanks, Mark. > > > #else > > > #define __emit_inst(x) ".inst " __stringify((x)) "\n\t" > > > #endif > > > -- > > > 2.26.0.110.g2183baf09c-goog > > >
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index ebc622432831..af21e2ec5e3e 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -49,7 +49,9 @@ #ifndef CONFIG_BROKEN_GAS_INST #ifdef __ASSEMBLY__ -#define __emit_inst(x) .inst (x) +// The space separator is omitted so that __emit_inst(x) can be parsed as +// either a directive or a macro argument. +#define __emit_inst(x) .inst(x) #else #define __emit_inst(x) ".inst " __stringify((x)) "\n\t" #endif
Many instances of __emit_inst(x) expand to a directive. In a few places it is used as a macro argument, e.g. arch/arm64/include/asm/sysreg.h #define __emit_inst(x) .inst (x) arch/arm64/include/asm/sysreg.h #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) arch/arm64/kvm/hyp/entry.S ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) Clang integrated assembler parses `.inst (x)` as two arguments passing to a macro. We delete the space separator so that `.inst(x)` will be parsed as one argument. Note, GNU as parsing `.inst (x)` as one argument is unintentional (for example the x86 backend will parse the construct as two arguments). See https://sourceware.org/bugzilla/show_bug.cgi?id=25750#c10 Link: https://github.com/ClangBuiltLinux/linux/issues/939 Cc: clang-built-linux@googlegroups.com Signed-off-by: Fangrui Song <maskray@google.com> --- arch/arm64/include/asm/sysreg.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)