Message ID | 20220922053145.944786-1-denik@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] KVM: arm64: nvhe: Fix build with profile optimization |
On Thu, 22 Sep 2022 06:31:45 +0100,
Denis Nikitin <denik@chromium.org> wrote:
>
> Kernel build with clang and KCFLAGS=-fprofile-sample-use fails with:
>
> error: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o: Unexpected SHT_REL
> section ".rel.llvm.call-graph-profile"
>
> Starting from 13.0.0 llvm can generate SHT_REL section, see
> https://reviews.llvm.org/rGca3bdb57fa1ac98b711a735de048c12b5fdd8086.
> gen-hyprel does not support SHT_REL relocation section.
>
> Remove ".llvm.call-graph-profile" SHT_REL relocation from kvm_nvhe
> to fix the build.
>
> Signed-off-by: Denis Nikitin <denik@chromium.org>
> ---
> V1 -> V2: Remove the relocation instead of disabling the profile-use flags.
> ---
>  arch/arm64/kvm/hyp/nvhe/Makefile | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index b5c5119c7396..49ec950ac57b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -78,8 +78,10 @@ $(obj)/kvm_nvhe.o: $(obj)/kvm_nvhe.rel.o FORCE
>
>  # The HYPREL command calls `gen-hyprel` to generate an assembly file with
>  # a list of relocations targeting hyp code/data.
> +# Starting from 13.0.0 llvm emits SHT_REL section '.llvm.call-graph-profile'
> +# when profile optimization is applied. gen-hyprel does not support SHT_REL.
>  quiet_cmd_hyprel = HYPREL $@
> -      cmd_hyprel = $(obj)/gen-hyprel $< > $@
> +      cmd_hyprel = $(OBJCOPY) -R .llvm.call-graph-profile $<; $(obj)/gen-hyprel $< > $@

I was really hoping that you'd just drop the flags from the CFLAGS
instead of removing the generated section. Something like:

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index b5c5119c7396..e5b2d43925b4 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -88,7 +88,7 @@ quiet_cmd_hypcopy = HYPCOPY $@
 
 # Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
 # This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
-KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
+KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI) -fprofile-sample-use, $(KBUILD_CFLAGS))
 
 # KVM nVHE code is run at a different exception code with a different map, so
 # compiler instrumentation that inserts callbacks or checks into the code may

However, I even failed to reproduce your problem using LLVM 14 as
packaged by Debian (if that matters, I'm using an arm64 build
machine). I build the kernel with:

$ make LLVM=1 KCFLAGS=-fprofile-sample-use -j8 vmlinux

and the offending object only contains the following sections:

arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o: file format elf64-littleaarch64

Sections:
Idx Name                     Size      VMA               LMA               File off  Algn
  0 .hyp.idmap.text          00000ae4  0000000000000000  0000000000000000  00000800  2**11
                             CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .hyp.text                0000e988  0000000000000000  0000000000000000  00001800  2**11
                             CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 .hyp.data..ro_after_init 00000820  0000000000000000  0000000000000000  00010188  2**3
                             CONTENTS, ALLOC, LOAD, DATA
  3 .hyp.rodata              00002e70  0000000000000000  0000000000000000  000109a8  2**3
                             CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  4 .hyp.data..percpu        00001ee0  0000000000000000  0000000000000000  00013820  2**4
                             CONTENTS, ALLOC, LOAD, DATA
  5 .hyp.bss                 00001158  0000000000000000  0000000000000000  00015700  2**3
                             ALLOC
  6 .comment                 0000001f  0000000000000000  0000000000000000  00017830  2**0
                             CONTENTS, READONLY
  7 .llvm_addrsig            000000b8  0000000000000000  0000000000000000  0001784f  2**0
                             CONTENTS, READONLY, EXCLUDE
  8 .altinstructions         00001284  0000000000000000  0000000000000000  00015700  2**0
                             CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  9 __jump_table             00000960  0000000000000000  0000000000000000  00016988  2**3
                             CONTENTS, ALLOC, LOAD, RELOC, DATA
 10 __bug_table              0000051c  0000000000000000  0000000000000000  000172e8  2**2
                             CONTENTS, ALLOC, LOAD, RELOC, DATA
 11 __kvm_ex_table           00000028  0000000000000000  0000000000000000  00017808  2**3
                             CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
 12 .note.GNU-stack          00000000  0000000000000000  0000000000000000  00027370  2**0
                             CONTENTS, READONLY

So what am I missing to trigger this issue? Does it rely on something
like PGO, which is not upstream yet? A bit of handholding would be
much appreciated.

Thanks,

M.
Hi Mark,

On Thu, Sep 22, 2022 at 3:38 AM Marc Zyngier <maz@kernel.org> wrote:
>
> I was really hoping that you'd just drop the flags from the CFLAGS
> instead of removing the generated section. Something like:
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index b5c5119c7396..e5b2d43925b4 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -88,7 +88,7 @@ quiet_cmd_hypcopy = HYPCOPY $@
>
>  # Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
>  # This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
> -KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
> +KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI) -fprofile-sample-use, $(KBUILD_CFLAGS))
>
>  # KVM nVHE code is run at a different exception code with a different map, so
>  # compiler instrumentation that inserts callbacks or checks into the code may

Sorry, I moved on with a different approach and didn't explain the rationale.

Like you mentioned before, the flag `-fprofile-sample-use` does not appear
in the kernel. And it looks confusing when the flag is disabled or filtered out
here. This was the first reason.

The root cause of the build failure wasn't the compiler profile guided
optimization but the extra metadata in SHT_REL section which llvm injected
into kvm_nvhe.tmp.o for further link optimization.
If we remove the .llvm.call-graph-profile section we fix the build and avoid
potential problems with relocations optimized by the linker. The profile
guided optimization will still be applied by the compiler.

Let me know what you think about it.

>
> However, I even failed to reproduce your problem using LLVM 14 as
> packaged by Debian (if that matters, I'm using an arm64 build
> machine). I build the kernel with:
>
> $ make LLVM=1 KCFLAGS=-fprofile-sample-use -j8 vmlinux
>
> and the offending object only contains the following sections:
>
> arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o: file format elf64-littleaarch64
>
> Sections:
> Idx Name                     Size      VMA               LMA               File off  Algn
>   0 .hyp.idmap.text          00000ae4  0000000000000000  0000000000000000  00000800  2**11
>                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>   1 .hyp.text                0000e988  0000000000000000  0000000000000000  00001800  2**11
>                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>   2 .hyp.data..ro_after_init 00000820  0000000000000000  0000000000000000  00010188  2**3
>                              CONTENTS, ALLOC, LOAD, DATA
>   3 .hyp.rodata              00002e70  0000000000000000  0000000000000000  000109a8  2**3
>                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>   4 .hyp.data..percpu        00001ee0  0000000000000000  0000000000000000  00013820  2**4
>                              CONTENTS, ALLOC, LOAD, DATA
>   5 .hyp.bss                 00001158  0000000000000000  0000000000000000  00015700  2**3
>                              ALLOC
>   6 .comment                 0000001f  0000000000000000  0000000000000000  00017830  2**0
>                              CONTENTS, READONLY
>   7 .llvm_addrsig            000000b8  0000000000000000  0000000000000000  0001784f  2**0
>                              CONTENTS, READONLY, EXCLUDE
>   8 .altinstructions         00001284  0000000000000000  0000000000000000  00015700  2**0
>                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>   9 __jump_table             00000960  0000000000000000  0000000000000000  00016988  2**3
>                              CONTENTS, ALLOC, LOAD, RELOC, DATA
>  10 __bug_table              0000051c  0000000000000000  0000000000000000  000172e8  2**2
>                              CONTENTS, ALLOC, LOAD, RELOC, DATA
>  11 __kvm_ex_table           00000028  0000000000000000  0000000000000000  00017808  2**3
>                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>  12 .note.GNU-stack          00000000  0000000000000000  0000000000000000  00027370  2**0
>                              CONTENTS, READONLY
>
> So what am I missing to trigger this issue? Does it rely on something
> like PGO, which is not upstream yet? A bit of handholding would be
> much appreciated.

Right, it relies on the PGO profile.
On ChromeOS we collect the sample PGO profile from Arm devices with
enabled CoreSight/ETM. You can find more details on ETM at
https://www.kernel.org/doc/Documentation/trace/coresight/coresight.rst.

https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md
contains information about the pipeline of collecting, processing, and applying
the profile.

>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks,
Denis
Hi Marc,

Please let me know what you think about this approach.

Thanks,
Denis

On Thu, Sep 22, 2022 at 11:04 PM Manoj Gupta <manojgupta@google.com> wrote:
>
>
>
> On Thu, Sep 22, 2022 at 10:01 PM Denis Nikitin <denik@chromium.org> wrote:
>>
>> Hi Mark,
>>
>> On Thu, Sep 22, 2022 at 3:38 AM Marc Zyngier <maz@kernel.org> wrote:
>> >
>> > I was really hoping that you'd just drop the flags from the CFLAGS
>> > instead of removing the generated section. Something like:
>> >
>> > diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
>> > index b5c5119c7396..e5b2d43925b4 100644
>> > --- a/arch/arm64/kvm/hyp/nvhe/Makefile
>> > +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
>> > @@ -88,7 +88,7 @@ quiet_cmd_hypcopy = HYPCOPY $@
>> >
>> >  # Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
>> >  # This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
>> > -KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
>> > +KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI) -fprofile-sample-use, $(KBUILD_CFLAGS))
>> >
>> >  # KVM nVHE code is run at a different exception code with a different map, so
>> >  # compiler instrumentation that inserts callbacks or checks into the code may
>>
>> Sorry, I moved on with a different approach and didn't explain the rationale.
>>
>> Like you mentioned before, the flag `-fprofile-sample-use` does not appear
>> in the kernel. And it looks confusing when the flag is disabled or filtered out
>> here. This was the first reason.
>>
>> The root cause of the build failure wasn't the compiler profile guided
>> optimization but the extra metadata in SHT_REL section which llvm injected
>> into kvm_nvhe.tmp.o for further link optimization.
>> If we remove the .llvm.call-graph-profile section we fix the build and avoid
>> potential problems with relocations optimized by the linker. The profile
>> guided optimization will still be applied by the compiler.
>>
>> Let me know what you think about it.
>>
>> >
>> > However, I even failed to reproduce your problem using LLVM 14 as
>> > packaged by Debian (if that matters, I'm using an arm64 build
>> > machine). I build the kernel with:
>> >
>> > $ make LLVM=1 KCFLAGS=-fprofile-sample-use -j8 vmlinux
>> >
>> > and the offending object only contains the following sections:
>> >
>
>
> Just some comments based on my ChromeOS build experience.
>
> fprofile-sample-use needs the profile file name argument to read the pgo data from
> i.e. -fprofile-sample-use=/path/to/gcov.profile.
>
> Since the path to filename can change, it makes filtering out more difficult.
> It is certainly possible to find and filter the exact argument by some string search of KCFLAGS.
> But passing -fno-profile-sample-use is easier and less error prone which I believe the previous patch version tried to do.
>
>
>> > arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o: file format elf64-littleaarch64
>> >
>> > Sections:
>> > Idx Name                     Size      VMA               LMA               File off  Algn
>> >   0 .hyp.idmap.text          00000ae4  0000000000000000  0000000000000000  00000800  2**11
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> >   1 .hyp.text                0000e988  0000000000000000  0000000000000000  00001800  2**11
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> >   2 .hyp.data..ro_after_init 00000820  0000000000000000  0000000000000000  00010188  2**3
>> >                              CONTENTS, ALLOC, LOAD, DATA
>> >   3 .hyp.rodata              00002e70  0000000000000000  0000000000000000  000109a8  2**3
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>> >   4 .hyp.data..percpu        00001ee0  0000000000000000  0000000000000000  00013820  2**4
>> >                              CONTENTS, ALLOC, LOAD, DATA
>> >   5 .hyp.bss                 00001158  0000000000000000  0000000000000000  00015700  2**3
>> >                              ALLOC
>> >   6 .comment                 0000001f  0000000000000000  0000000000000000  00017830  2**0
>> >                              CONTENTS, READONLY
>> >   7 .llvm_addrsig            000000b8  0000000000000000  0000000000000000  0001784f  2**0
>> >                              CONTENTS, READONLY, EXCLUDE
>> >   8 .altinstructions         00001284  0000000000000000  0000000000000000  00015700  2**0
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>> >   9 __jump_table             00000960  0000000000000000  0000000000000000  00016988  2**3
>> >                              CONTENTS, ALLOC, LOAD, RELOC, DATA
>> >  10 __bug_table              0000051c  0000000000000000  0000000000000000  000172e8  2**2
>> >                              CONTENTS, ALLOC, LOAD, RELOC, DATA
>> >  11 __kvm_ex_table           00000028  0000000000000000  0000000000000000  00017808  2**3
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>> >  12 .note.GNU-stack          00000000  0000000000000000  0000000000000000  00027370  2**0
>> >                              CONTENTS, READONLY
>> >
>> > So what am I missing to trigger this issue? Does it rely on something
>> > like PGO, which is not upstream yet? A bit of handholding would be
>> > much appreciated.
>>
>> Right, it relies on the PGO profile.
>> On ChromeOS we collect the sample PGO profile from Arm devices with
>> enabled CoreSight/ETM. You can find more details on ETM at
>> https://www.kernel.org/doc/Documentation/trace/coresight/coresight.rst.
>>
>> https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md
>> contains information about the pipeline of collecting, processing, and applying
>> the profile.
>>
>
> Generally the difficult part is in collecting a good matching profile for the workload.
> So I think this patch is better than previous since it still keeps the compiler optimization for the hot code paths
> in the file but removes the problematic section.
>
> Thanks,
> Manoj
>
>
>>
>> >
>> > Thanks,
>> >
>> > M.
>> >
>> > --
>> > Without deviation from the norm, progress is not possible.
>>
>> Thanks,
>> Denis
Hi Mark,

This problem currently blocks the PGO roll on the ChromeOS kernel and
we need some kind of a solution.
Could you please take a look?

Thanks,
Denis

On Thu, Sep 29, 2022 at 9:13 AM Denis Nikitin <denik@chromium.org> wrote:
>
> Hi Marc,
>
> Please let me know what you think about this approach.
>
> Thanks,
> Denis
On Thu, 06 Oct 2022 17:28:17 +0100,
Denis Nikitin <denik@chromium.org> wrote:
>
> Hi Mark,

s/k/c/

>
> This problem currently blocks the PGO roll on the ChromeOS kernel and
> we need some kind of a solution.

I'm sorry, but I don't feel constrained by your internal deadlines. I
have my own...

> Could you please take a look?

I have asked for a reproducer. All I got for an answer is "this is
hard". Providing a profiling file would help, for example.

M.
On Sat, Oct 8, 2022 at 7:22 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 06 Oct 2022 17:28:17 +0100,
> Denis Nikitin <denik@chromium.org> wrote:
> >
> > Hi Mark,
>
> s/k/c/
>
> >
> > This problem currently blocks the PGO roll on the ChromeOS kernel and
> > we need some kind of a solution.
>
> I'm sorry, but I don't feel constrained by your internal deadlines. I
> have my own...
>
> > Could you please take a look?
>
> I have asked for a reproducer. All I got for an answer is "this is
> hard". Providing a profiling file would help, for example.

Could you please try the following profile on the 5.15 branch?

$ cat <<EOF > prof.txt
kvm_pgtable_walk:100:10
2: 5
3: 5
5: 5
6: 5
10: 5
10: _kvm_pgtable_walk:50
5: 5
7: 5
10: 5
13.2: 5
14: 5
16: 5 __kvm_pgtable_walk:5
13: kvm_pgd_page_idx:30
2: __kvm_pgd_page_idx:30
2: 5
3: 5
5: 5
2: kvm_granule_shift:5
3: 5
EOF

$ make LLVM=1 ARCH=arm64 KCFLAGS=-fprofile-sample-use=prof.txt -j8 vmlinux

Thanks,
Denis

>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
On Tue, 11 Oct 2022 03:15:36 +0100,
Denis Nikitin <denik@chromium.org> wrote:
>
> On Sat, Oct 8, 2022 at 7:22 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Thu, 06 Oct 2022 17:28:17 +0100,
> > Denis Nikitin <denik@chromium.org> wrote:
> > >
> > > Hi Mark,
> >
> > s/k/c/
> >
> > >
> > > This problem currently blocks the PGO roll on the ChromeOS kernel and
> > > we need some kind of a solution.
> >
> > I'm sorry, but I don't feel constrained by your internal deadlines. I
> > have my own...
> >
> > > Could you please take a look?
> >
> > I have asked for a reproducer. All I got for an answer is "this is
> > hard". Providing a profiling file would help, for example.
>
> Could you please try the following profile on the 5.15 branch?
>
> $ cat <<EOF > prof.txt
> kvm_pgtable_walk:100:10
> 2: 5
> 3: 5
> 5: 5
> 6: 5
> 10: 5
> 10: _kvm_pgtable_walk:50
> 5: 5
> 7: 5
> 10: 5
> 13.2: 5
> 14: 5
> 16: 5 __kvm_pgtable_walk:5
> 13: kvm_pgd_page_idx:30
> 2: __kvm_pgd_page_idx:30
> 2: 5
> 3: 5
> 5: 5
> 2: kvm_granule_shift:5
> 3: 5
> EOF
>
> $ make LLVM=1 ARCH=arm64 KCFLAGS=-fprofile-sample-use=prof.txt -j8 vmlinux

Thanks, this was helpful, as I was able to reproduce the build failure.

FWIW, it seems pretty easy to work around by filtering out the
offending option, making it consistent with the mechanism we already
use for tracing and the like.

I came up with the hack below, which does the trick and is IMHO better
than dropping the section (extra work) or adding the negation of this
option (which depends on the compiler option evaluation order).

M.

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 48f6ae7cc6e6..7df1b6afca7f 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -91,7 +91,7 @@ quiet_cmd_hypcopy = HYPCOPY $@
 
 # Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
 # This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
-KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
+KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI) -fprofile-sample-use=%, $(KBUILD_CFLAGS))
 
 # KVM nVHE code is run at a different exception code with a different map, so
 # compiler instrumentation that inserts callbacks or checks into the code may
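Editorial note: the difference between the two filter-out proposals in this thread comes down to how GNU make matches words. A word without `%` must match exactly, so `-fprofile-sample-use` does not remove `-fprofile-sample-use=prof.txt`, while the pattern `-fprofile-sample-use=%` removes the flag whatever path follows it. The sketch below emulates that behaviour in Python so it can be run standalone. It is an approximation for illustration only, not kbuild or make code; the helper names `make_pattern_matches` and `filter_out` and the sample flag string are made up here, and escaped `\%` is not handled.

```python
def make_pattern_matches(pattern: str, word: str) -> bool:
    """Approximate GNU make word matching: the first '%' acts as a wildcard."""
    if "%" not in pattern:
        # No wildcard: only an exact match counts.
        return pattern == word
    prefix, suffix = pattern.split("%", 1)
    return (
        len(word) >= len(prefix) + len(suffix)
        and word.startswith(prefix)
        and word.endswith(suffix)
    )


def filter_out(patterns: str, text: str) -> str:
    """Emulate $(filter-out PATTERNS,TEXT) on whitespace-separated words."""
    pats = patterns.split()
    kept = [
        word
        for word in text.split()
        if not any(make_pattern_matches(p, word) for p in pats)
    ]
    return " ".join(kept)


# Hypothetical flag list; the path is the one used in the reproducer.
cflags = "-O2 -fprofile-sample-use=prof.txt -Wall"

# Exact word: the flag carries '=prof.txt', so nothing is removed.
exact = filter_out("-fprofile-sample-use", cflags)

# Pattern with '%': the flag is removed regardless of the path.
pattern = filter_out("-fprofile-sample-use=%", cflags)

print(exact)
print(pattern)
```

With the exact word the flag list comes back unchanged; with the `%` pattern only `-O2 -Wall` remains, which is the effect the hunk above relies on.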
Thank you Marc for figuring out the filtering-out solution!
It fixed the build on ChromeOS.

I will update the patch and also filter out `-fprofile-use` which will
avoid a similar problem with the instrumented PGO in the future.

Thanks,
Denis

On Thu, Oct 13, 2022 at 4:09 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Thanks, this was helpful, as I was able to reproduce the build failure.
>
> FWIW, it seems pretty easy to work around by filtering out the
> offending option, making it consistent with the mechanism we already
> use for tracing and the like.
>
> I came up with the hack below, which does the trick and is IMHO better
> than dropping the section (extra work) or adding the negation of this
> option (which depends on the compiler option evaluation order).
>
> M.
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 48f6ae7cc6e6..7df1b6afca7f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -91,7 +91,7 @@ quiet_cmd_hypcopy = HYPCOPY $@
>
>  # Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
>  # This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
> -KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
> +KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI) -fprofile-sample-use=%, $(KBUILD_CFLAGS))
>
>  # KVM nVHE code is run at a different exception code with a different map, so
>  # compiler instrumentation that inserts callbacks or checks into the code may
>
>
> --
> Without deviation from the norm, progress is not possible.
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index b5c5119c7396..49ec950ac57b 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -78,8 +78,10 @@ $(obj)/kvm_nvhe.o: $(obj)/kvm_nvhe.rel.o FORCE
 
 # The HYPREL command calls `gen-hyprel` to generate an assembly file with
 # a list of relocations targeting hyp code/data.
+# Starting from 13.0.0 llvm emits SHT_REL section '.llvm.call-graph-profile'
+# when profile optimization is applied. gen-hyprel does not support SHT_REL.
 quiet_cmd_hyprel = HYPREL $@
-      cmd_hyprel = $(obj)/gen-hyprel $< > $@
+      cmd_hyprel = $(OBJCOPY) -R .llvm.call-graph-profile $<; $(obj)/gen-hyprel $< > $@
 
 # The HYPCOPY command uses `objcopy` to prefix all ELF symbol names
 # to avoid clashes with VHE code/data.
Kernel build with clang and KCFLAGS=-fprofile-sample-use fails with:

error: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o: Unexpected SHT_REL
section ".rel.llvm.call-graph-profile"

Starting from 13.0.0 llvm can generate SHT_REL section, see
https://reviews.llvm.org/rGca3bdb57fa1ac98b711a735de048c12b5fdd8086.
gen-hyprel does not support SHT_REL relocation section.

Remove ".llvm.call-graph-profile" SHT_REL relocation from kvm_nvhe
to fix the build.

Signed-off-by: Denis Nikitin <denik@chromium.org>
---
V1 -> V2: Remove the relocation instead of disabling the profile-use flags.
---
 arch/arm64/kvm/hyp/nvhe/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
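
Editorial note: the error quoted in the commit message hinges on the ELF section type. In the ELF specification, `SHT_RELA` (value 4) marks relocation sections whose entries carry explicit addends, and `SHT_REL` (value 9) marks those without. The sketch below is a minimal Python illustration of spotting `SHT_REL` sections in a little-endian ELF64 image by reading the section header table. It is not gen-hyprel's code; the helper names and the hand-built test image are assumptions made for the example, and it ignores details such as extended section numbering and section names.

```python
import struct

SHT_RELA = 4  # relocation entries with explicit addends
SHT_REL = 9   # relocation entries without addends


def section_types(elf: bytes) -> list:
    """Return sh_type for every section header of a little-endian ELF64 image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)             # section header table offset
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)  # entry size, entry count
    # sh_type is the second 32-bit field of each Elf64_Shdr (offset 4).
    return [
        struct.unpack_from("<I", elf, e_shoff + i * e_shentsize + 4)[0]
        for i in range(e_shnum)
    ]


def rel_section_indices(elf: bytes) -> list:
    """Indices of sections a RELA-only tool would have to reject."""
    return [i for i, t in enumerate(section_types(elf)) if t == SHT_REL]


def fake_elf(types) -> bytes:
    """Build a header-only ELF64 image (hypothetical test data, not a real object)."""
    header = bytearray(64)
    header[:6] = b"\x7fELF\x02\x01"                            # magic, ELFCLASS64, little-endian
    struct.pack_into("<Q", header, 0x28, 64)                   # table starts right after the header
    struct.pack_into("<HH", header, 0x3A, 64, len(types))
    body = b"".join(struct.pack("<II", 0, t) + bytes(56) for t in types)
    return bytes(header) + body


# Four sections: NULL, PROGBITS, one RELA and one REL.
image = fake_elf([0, 1, SHT_RELA, SHT_REL])
unsupported = rel_section_indices(image)
print(unsupported)
```

On a real object the same information is available from `objdump -h` or `readelf -S`; the point of the sketch is only that the distinction is a single field in each section header.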