Message ID | 20231109053029.1403552-1-yonghong.song@linux.dev (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | [bpf-next] selftests/bpf: Fix pyperf180 compilation failure with llvm18 |
On Wed, 2023-11-08 at 21:30 -0800, Yonghong Song wrote:
> [...]
> The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
> since cpu=v4 supports 32-bit branch target offset.
> [...]
> To workaround the issue, previously all 180 loop iterations are fully unrolled.
> Now, the fully unrolling count is changed to 90 for llvm18 and later. This reduced
> some otherwise long branch target distance, and fixed the compilation failure.
> [...]

Can confirm, the issue is present on clang main w/o this patch and
disappears after this patch.

Yonghong, is there a way to keep original UNROLL_COUNT if cpuv4 is used?

Tested-by: Eduard Zingerman <eddyz87@gmail.com>
On 11/9/23 3:47 AM, Eduard Zingerman wrote:
> [...]
> Can confirm, the issue is present on clang main w/o this patch and
> disappears after this patch.
>
> Yonghong, is there a way to keep original UNROLL_COUNT if cpuv4 is used?

I thought about this but was a little bit lazy and did not give it enough thought.
But since you mentioned this, I think having llvm add a macro to indicate the cpu
version is a good idea. This will give bpf developers some flexibility to
add new features (new cpu variant) or work around bugs (for a particular cpu variant,
without impacting others if they are fine), etc.

So here is the llvm patch: https://github.com/llvm/llvm-project/pull/71856

With the above llvm patch, the following code change should work:

diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..2473845d1ee2 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Facebook
 #define STACK_MAX_LEN 180
+
+/* llvm upstream commit at llvm18
+ *   https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a566eebb16e
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. Macro __bpf_cpu_version__ is implemented in llvm18 to specify
+ * which cpu version is used for compilation. So we can set a smaller
+ * unroll_count if __bpf_cpu_version__ is less than 4, which reduced
+ * some branch target distances and resolved the compilation failure.
+ */
+#if defined(__bpf_cpu_version__) && __bpf_cpu_version__ < 4
+#define UNROLL_COUNT 90
+#endif
+
 #include "pyperf.h"

> Tested-by: Eduard Zingerman <eddyz87@gmail.com>
On Thu, 2023-11-09 at 11:54 -0800, Yonghong Song wrote:
> [...]
> But since you mentioned this, I think adding a macro to indicate cpu version
> by llvm is a good idea. This will give bpf developers some flexibility to
> add new features (new cpu variant) or workaround bugs (for a particular cpu variant
> but not impacting others if they are fine), etc.
>
> So here is the llvm patch: https://github.com/llvm/llvm-project/pull/71856

Thank you, tried it locally, works as expected.
On Thu, Nov 9, 2023 at 11:55 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> [...]
> But since you mentioned this, I think adding a macro to indicate cpu version
> by llvm is a good idea. This will give bpf developers some flexibility to
> add new features (new cpu variant) or workaround bugs (for a particular cpu variant
> but not impacting others if they are fine), etc.
>
> So here is the llvm patch: https://github.com/llvm/llvm-project/pull/71856

Great idea. Commented on the diff.

> With the above llvm patch, the following code change should work:
> [...]
> +#if defined(__bpf_cpu_version__) && __bpf_cpu_version__ < 4

probably should be combined with __clang_major__ >= 18 check too.

> +#define UNROLL_COUNT 90
> +#endif
> +
>  #include "pyperf.h"
On 11/9/23 1:09 PM, Alexei Starovoitov wrote:
> On Thu, Nov 9, 2023 at 11:55 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> [...]
>> +#if defined(__bpf_cpu_version__) && __bpf_cpu_version__ < 4
> probably should be combined with __clang_major__ >= 18 check too.

Okay, I could do this to catch the case where somebody uses development
llvm18 which has this regression but __bpf_cpu_version__ is not
introduced yet.

>> +#define UNROLL_COUNT 90
>> +#endif
>> +
>>  #include "pyperf.h"
On Thu, Nov 9, 2023 at 1:53 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >> + */
> >> +#if defined(__bpf_cpu_version__) && __bpf_cpu_version__ < 4
> > probably should be combined with __clang_major__ >= 18 check too.
>
> Okay, I could do this to catch the case where somebody uses development
> llvm18 which has this regression but __bpf_cpu_version__ is not
> introduced yet.

Exactly. That's what I tried to say.
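For illustration, one possible reading of the combined check discussed above is sketched below. It is not the patch that was eventually applied. It assumes the macro keeps the name __bpf_cpu_version__ proposed in this thread, and pick_unroll_count() is a hypothetical host-side helper that only mirrors the preprocessor logic so the cases can be checked; a return value of 0 stands for "UNROLL_COUNT left undefined".

```c
#include <assert.h>

/* Preprocessor form, as it might appear before #include "pyperf.h".
 * Assumption: __bpf_cpu_version__ is the macro name proposed in this thread;
 * a clang 18 development build may not define it at all.
 */
#if defined(__clang_major__) && __clang_major__ >= 18
#if !defined(__bpf_cpu_version__) || __bpf_cpu_version__ < 4
#define UNROLL_COUNT 90
#endif
#endif

/* Hypothetical runtime mirror of the condition above.
 * cpu_version == 0 models "__bpf_cpu_version__ is not defined".
 * Returns 90 when the smaller unroll count would be chosen, 0 otherwise.
 */
static int pick_unroll_count(int clang_major, int cpu_version)
{
	if (clang_major < 18)
		return 0;	/* older clang: regression not present */
	if (cpu_version < 4)
		return 90;	/* v1/v2/v3, or macro missing on clang >= 18 */
	return 0;		/* v4: 32-bit branch offsets are available */
}

/* Small self-check covering the cases raised in the discussion. */
static void check_pick_unroll_count(void)
{
	assert(pick_unroll_count(17, 3) == 0);
	assert(pick_unroll_count(18, 0) == 90);	/* llvm18 without the macro */
	assert(pick_unroll_count(18, 3) == 90);
	assert(pick_unroll_count(18, 4) == 0);
}
```

With this shape, a development llvm18 that has the inlining regression but no __bpf_cpu_version__ still gets the reduced unroll count, while cpu=v4 builds keep the original behavior.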
diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..3c38f3e12836 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,17 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Facebook
 #define STACK_MAX_LEN 180
+
+/* llvm upstream commit at llvm18
+ *   https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a566eebb16e
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. To workaround this, for llvm18 and later, let us set unroll_count
+ * to be 90, which reduced some branch target distances and resolved the
+ * compilation failure.
+ */
+#if __clang_major__ >= 18
+#define UNROLL_COUNT 90
+#endif
+
 #include "pyperf.h"
With the latest llvm18 (main branch of the llvm-project repo), when building bpf selftests,
  [~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j

the following compilation error happens:
  fatal error: error in backend: Branch target out of insn range
  ...
  Stack dump:
  0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
    -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
    -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
    -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
    /home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
    -idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
    -c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
  1. <eof> parser at end of file
  2. Code generation
  ...

The compilation failure only happens with cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offsets.

The above failure is due to upstream llvm patch [1], which changed some inlining
behavior in llvm18.

Previously, all 180 loop iterations were fully unrolled. To work around the issue,
the unroll count is now changed to 90 for llvm18 and later. This reduces
some otherwise long branch target distances and fixes the compilation failure.

  [1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a566eebb16e

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/testing/selftests/bpf/progs/pyperf180.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
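As a rough illustration of why the branch distance matters: the commit message attributes the failure to branch targets that no longer fit a 16-bit offset, while cpu=v4 can use 32-bit offsets. The host-side sketch below only models that range check. The helper names are hypothetical and this is not code from the LLVM BPF backend; it assumes the distance is a signed count of instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a signed jump distance fits a 16-bit offset field,
 * the limit the commit message describes for cpu v2/v3.
 */
static bool fits_branch16(int64_t delta)
{
	return delta >= INT16_MIN && delta <= INT16_MAX;
}

/* True if the distance fits a 32-bit offset, as available with cpu=v4. */
static bool fits_branch32(int64_t delta)
{
	return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* Boundary cases: one step past INT16_MAX fails the 16-bit check
 * but is still well inside the 32-bit range.
 */
static void check_branch_ranges(void)
{
	assert(fits_branch16(INT16_MAX));
	assert(!fits_branch16((int64_t)INT16_MAX + 1));
	assert(fits_branch32((int64_t)INT16_MAX + 1));
}
```

Halving the unroll count shrinks the generated loop body, which is how the patch keeps the longest jumps inside the 16-bit range without changing the cpu version.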