Message ID | 20171108092555.ta4mkqolunyw6mdr@yury-thinkpad (mailing list archive) |
---|---|
State | New, archived |
> There's the series from Andi Kleen that enables LTO for Linux on x86:
> https://lwn.net/Articles/512548/
> https://github.com/andikleen/linux-misc/tree/lto-411-1
>
> It has solved many problems you also try to solve, and some patches
> are looking very similar.
>
> At now we have different patchsets for gcc and clang, and it would be
> better to have them together. One thing I'm worried is that you introduce
> CONFIG_CLANG_LTO and use it for all cases, including that where more
> generic CONFIG_LTO should be used.

Yes would be good to merge the two. I've been looking at updating
my old one.

I don't cover any ARM code, but lots of generic code. My patches
also worked on MIPS at least.

There's also older patches to enable single-pass-linking for kallsyms,
which is extremely useful for LTO build performance.

-Andi
> On Nov 9, 2017, at 3:02 AM, Andi Kleen <ak@linux.intel.com> wrote:
>
>> There's the series from Andi Kleen that enables LTO for Linux on x86:
>> https://lwn.net/Articles/512548/
>> https://github.com/andikleen/linux-misc/tree/lto-411-1
>>
>> It has solved many problems you also try to solve, and some patches
>> are looking very similar.
>>
>> At now we have different patchsets for gcc and clang, and it would be
>> better to have them together. One thing I'm worried is that you introduce
>> CONFIG_CLANG_LTO and use it for all cases, including that where more
>> generic CONFIG_LTO should be used.
>
> Yes would be good to merge the two. I've been looking at updating
> my old one.
>
> I don't cover any ARM code, but lots of generic code. My patches
> also worked on MIPS at least.
>
> There's also older patches to enable single-pass-linking for kallsyms,
> which is extremely useful for LTO build performance.

[Yury, thanks for the CC:]

Chiming in from the toolchain side, Linaro's Toolchain team will try to
help with any GCC or Clang issues that are exposed by building the kernel
with LTO on arm64 / arm.

Regarding CONFIG_* options, I would expect most of the configuration
changes to be equally valid for both GCC's and Clang's LTO support.
Sami, I don't think it's fair to ask you to support both Clang and GCC in
your patchset, but, where changes are obviously toolchain-agnostic, could
you use CONFIG_LTO? And use CONFIG_LTO_CLANG for Clang-specific parts?

This way we will be able to avoid most of the refactoring when adding
support for GCC's LTO.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org
On Wed, Nov 08, 2017 at 12:25:55PM +0300, Yury Norov wrote:
> The patch below uses trick with undefining mrs_s/msr_s immediately
> after use to solve the problem. It works for both gcc and clang.

Great, looks good to me! I tested the patch with LTO and clang's
integrated assembler seems to be happy with it.

> It has solved many problems you also try to solve, and some patches
> are looking very similar.

I haven't had a closer look at the gcc LTO patches yet, but I am
definitely all for using common code where possible.

Sami
On Wed, Nov 08, 2017 at 04:02:22PM -0800, Andi Kleen wrote:
> There's also older patches to enable single-pass-linking for kallsyms,
> which is extremely useful for LTO build performance.

Excellent, can you point me to the patch in question?

I worked around the build performance problem by reusing vmlinux.o for
kallsyms instead of linking all bitcode again in each step. I'm not
sure if this is feasible with gcc.

Sami
On Thu, Nov 09, 2017 at 07:48:06AM +0300, Maxim Kuvyrkov wrote:
> Regarding CONFIG_* options, I would expect most of the configuration
> changes to be equally valid for both GCC's and Clang's LTO support.
> Sami, I don't think it's fair to ask you to support both Clang and GCC in
> your patchset, but, where changes are obviously toolchain-agnostic, could
> you use CONFIG_LTO? And use CONFIG_LTO_CLANG for Clang-specific parts?

Sure, using CONFIG_LTO for common code and CONFIG_LTO_CLANG for
clang-specific parts sounds good.

Sami
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b560fa..20bfb8e676e0 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -29,7 +29,9 @@
         ({                                                              \
                 u64 reg;                                                \
                 asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\
-                                         "mrs_s %0, " __stringify(r##vh),\
+                                         DEFINE_MRS_S                   \
+                                         "mrs_s %0, " __stringify(r##vh) "\n"\
+                                         UNDEFINE_MRS_S,                \
                                          ARM64_HAS_VIRT_HOST_EXTN)      \
                              : "=r" (reg));                             \
                 reg;                                                    \
@@ -39,7 +41,9 @@
         do {                                                            \
                 u64 __val = (u64)(v);                                   \
                 asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\
-                                         "msr_s " __stringify(r##vh) ", %x0",\
+                                         DEFINE_MSR_S                   \
+                                         "msr_s " __stringify(r##vh) ", %x0\n"\
+                                         UNDEFINE_MSR_S,                \
                                          ARM64_HAS_VIRT_HOST_EXTN)      \
                              : : "rZ" (__val));                         \
         } while (0)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index f707fed5886f..a69b0ca9a3b4 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -463,20 +463,39 @@
 
 #include <linux/types.h>
 
-asm(
-"       .irp    num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n"
-"       .equ    .L__reg_num_x\\num, \\num\n"
-"       .endr\n"
+#define __DEFINE_MRS_MSR_S_REGNUM                               \
+"       .irp    num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n" \
+"       .equ    .L__reg_num_x\\num, \\num\n"                    \
+"       .endr\n"                                                \
 "       .equ    .L__reg_num_xzr, 31\n"
-"\n"
-"       .macro  mrs_s, rt, sreg\n"
-        __emit_inst(0xd5200000|(\\sreg)|(.L__reg_num_\\rt))
+
+#define DEFINE_MRS_S                                            \
+        __DEFINE_MRS_MSR_S_REGNUM                               \
+"       .macro  mrs_s, rt, sreg\n"                              \
+        __emit_inst(0xd5200000|(\\sreg)|(.L__reg_num_\\rt))     \
 "       .endm\n"
-"\n"
-"       .macro  msr_s, sreg, rt\n"
-        __emit_inst(0xd5000000|(\\sreg)|(.L__reg_num_\\rt))
+
+#define DEFINE_MSR_S                                            \
+        __DEFINE_MRS_MSR_S_REGNUM                               \
+"       .macro  msr_s, sreg, rt\n"                              \
+        __emit_inst(0xd5000000|(\\sreg)|(.L__reg_num_\\rt))     \
 "       .endm\n"
-);
+
+#define UNDEFINE_MRS_S                                          \
+"       .purgem mrs_s\n"
+
+#define UNDEFINE_MSR_S                                          \
+"       .purgem msr_s\n"
+
+#define __mrs_s(r, v)                                           \
+        DEFINE_MRS_S                                            \
+"       mrs_s %0, " __stringify(r) "\n"                         \
+        UNDEFINE_MRS_S : "=r" (v)
+
+#define __msr_s(r, v)                                           \
+        DEFINE_MSR_S                                            \
+"       msr_s " __stringify(r) ", %x0\n"                        \
+        UNDEFINE_MSR_S : : "rZ" (v)
 
 /*
  * Unlike read_cpuid, calls to read_sysreg are never expected to be
@@ -502,15 +521,15 @@ asm(
  * For registers without architectural names, or simply unsupported by
  * GAS.
  */
-#define read_sysreg_s(r) ({                                             \
-        u64 __val;                                                      \
-        asm volatile("mrs_s %0, " __stringify(r) : "=r" (__val));       \
-        __val;                                                          \
+#define read_sysreg_s(r) ({                                     \
+        u64 __val;                                              \
+        asm volatile(__mrs_s(r, __val));                        \
+        __val;                                                  \
 })
 
-#define write_sysreg_s(v, r) do {                                       \
-        u64 __val = (u64)(v);                                           \
-        asm volatile("msr_s " __stringify(r) ", %x0" : : "rZ" (__val)); \
+#define write_sysreg_s(v, r) do {                               \
+        u64 __val = (u64)(v);                                   \
+        asm volatile(__msr_s(r, __val));                        \
 } while (0)
 
 static inline void config_sctlr_el1(u32 clear, u32 set)