selftests: Do not skip BPF selftests by default

Message ID	20241004095348.797020-1-bjorn@kernel.org (mailing list archive)
State	New
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DF91714659B; Fri, 4 Oct 2024 09:53:57 +0000 (UTC) From: =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= <bjorn@kernel.org> To: Mark Brown <broonie@kernel.org>, Shuah Khan <shuah@kernel.org>, linux-kselftest@vger.kernel.org, bpf@vger.kernel.org, Andrii Nakryiko <andrii@kernel.org>, Eduard Zingerman <eddyz87@gmail.com>, Mykola Lysenko <mykolal@fb.com> Cc: =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= <bjorn@rivosinc.com>, linux-kernel@vger.kernel.org Subject: [PATCH] selftests: Do not skip BPF selftests by default Date: Fri, 4 Oct 2024 11:53:47 +0200 Message-ID: <20241004095348.797020-1-bjorn@kernel.org> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Series	selftests: Do not skip BPF selftests by default \| expand selftests: Do not skip BPF selftests by default

Björn Töpel Oct. 4, 2024, 9:53 a.m. UTC

From: Björn Töpel <bjorn@rivosinc.com>

This effectively is a revert of commit 7a6eb7c34a78 ("selftests: Skip
BPF seftests by default"). At the time when this was added, BPF had
"build time dependencies on cutting edge versions". Since then a
number of BPF capable tests has been included in net, hid, sched_ext.

There is no reason not to include BPF by default in the build.

Remove BPF from the selftests skiplist.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
 tools/testing/selftests/Makefile | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)


base-commit: 0c559323bbaabee7346c12e74b497e283aaafef5

Mark Brown Oct. 4, 2024, 1:07 p.m. UTC | #1

On Fri, Oct 04, 2024 at 11:53:47AM +0200, Björn Töpel wrote:
> From: Björn Töpel <bjorn@rivosinc.com>
> 
> This effectively is a revert of commit 7a6eb7c34a78 ("selftests: Skip
> BPF seftests by default"). At the time when this was added, BPF had
> "build time dependencies on cutting edge versions". Since then a
> number of BPF capable tests has been included in net, hid, sched_ext.
> 
> There is no reason not to include BPF by default in the build.

The issue was always requiring a bleeding edge version of clang, not
sure if that's been relaxed yet, IIRC sometimes it required git
versions.  I have clang 20 installed here so that's not an issue for me
but given that that's not released yet it wouldn't be reasonable to
expect CI systems to install it.

There's a few other substantial issues with all of these suites now I
look, none of them build on arm64 since arm64 defconfig has
DEBUG_INFO_REDUCED=y which isn't compatible with CONFIG_DEBUG_INFO_BTF
so that gets turned off and the build splats trying to read the BTF out
of the kernel binary (which is a new build dep for the selftests
too...).  

   https://storage.kernelci.org/next/master/next-20241004/arm64/defconfig%2Bkselftest/gcc-12/config/

We also get a bunch of:

die__process_unit: DW_TAG_label (0xa) @ <0x58eb7> not handled!
die__process_unit: tag not supported 0xa (label)!

if we do turn enable CONFIG_DEBUG_INFO_BTF for arm64.  

The whole thing is also broken for cross compilation with clang since
everything is assuming that CROSS_COMPILE will be set for cross builds
but that's not the case for LLVM=1 builds - net gives:

  BPF_PROG nat6to4.bpf.o
  BPF_PROG sample_map_ret0.bpf.o
/usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/bin/ld: /home/broonie/git/linux/tools/testing/selftests/net/libynl.a(ynl.o): Relocations in generic ELF (EM: 62)
/usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/bin/ld: /home/broonie/git/linux/tools/testing/selftests/net/libynl.a(ynl.o): Relocations in generic ELF (EM: 62)
/usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/bin/ld: /home/broonie/git/linux/tools/testing/selftests/net/libynl.a(ynl.o): Relocations in generic ELF (EM: 62)
/usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/bin/ld: /home/broonie/git/linux/tools/testing/selftests/net/libynl.a: error adding symbols: file in wrong format
  BPF_PROG sample_ret0.bpf.o
collect2: error: ld returned 1 exit status

with similar errors in libbpf for HID:

/usr/bin/aarch64-linux-gnu-ld: /home/broonie/git/linux/tools/testing/selftests/hid/tools/build/libbpf/libbpf.a(libbpf-in.o): Relocations in generic ELF (EM: 62)
/usr/bin/aarch64-linux-gnu-ld: /home/broonie/git/linux/tools/testing/selftests/hid/tools/build/libbpf/libbpf.a(libbpf-in.o): Relocations in generic ELF (EM: 62)

KernelCI is seeing failures earlier with HID:

   https://storage.kernelci.org/next/master/next-20241004/arm64/defconfig%2Bkselftest/gcc-12/logs/kselftest.log

and an unrelated missing dependency on libssl for net that needs to be
fixed.

Björn Töpel Oct. 4, 2024, 1:34 p.m. UTC | #2

Mark!

Mark Brown <broonie@kernel.org> writes:

> On Fri, Oct 04, 2024 at 11:53:47AM +0200, Björn Töpel wrote:
>> From: Björn Töpel <bjorn@rivosinc.com>
>> 
>> This effectively is a revert of commit 7a6eb7c34a78 ("selftests: Skip
>> BPF seftests by default"). At the time when this was added, BPF had
>> "build time dependencies on cutting edge versions". Since then a
>> number of BPF capable tests has been included in net, hid, sched_ext.
>> 
>> There is no reason not to include BPF by default in the build.
>
> The issue was always requiring a bleeding edge version of clang, not
> sure if that's been relaxed yet, IIRC sometimes it required git
> versions.  I have clang 20 installed here so that's not an issue for me
> but given that that's not released yet it wouldn't be reasonable to
> expect CI systems to install it.

Yeah, but I'd say that is not the case anymore. LLVM 18 and 19 works.

> There's a few other substantial issues with all of these suites now I
> look, none of them build on arm64 since arm64 defconfig has
> DEBUG_INFO_REDUCED=y which isn't compatible with CONFIG_DEBUG_INFO_BTF
> so that gets turned off and the build splats trying to read the BTF out
> of the kernel binary (which is a new build dep for the selftests
> too...).  
>
>    https://storage.kernelci.org/next/master/next-20241004/arm64/defconfig%2Bkselftest/gcc-12/config/
>
> We also get a bunch of:
>
> die__process_unit: DW_TAG_label (0xa) @ <0x58eb7> not handled!
> die__process_unit: tag not supported 0xa (label)!
>
> if we do turn enable CONFIG_DEBUG_INFO_BTF for arm64.  

This is pahole version related.

> The whole thing is also broken for cross compilation with clang since
> everything is assuming that CROSS_COMPILE will be set for cross builds
> but that's not the case for LLVM=1 builds - net gives:
>
>   BPF_PROG nat6to4.bpf.o
>   BPF_PROG sample_map_ret0.bpf.o
> /usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/bin/ld: /home/broonie/git/linux/tools/testing/selftests/net/libynl.a(ynl.o): Relocations in generic ELF (EM: 62)
> /usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/bin/ld: /home/broonie/git/linux/tools/testing/selftests/net/libynl.a(ynl.o): Relocations in generic ELF (EM: 62)
> /usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/bin/ld: /home/broonie/git/linux/tools/testing/selftests/net/libynl.a(ynl.o): Relocations in generic ELF (EM: 62)
> /usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/bin/ld: /home/broonie/git/linux/tools/testing/selftests/net/libynl.a: error adding symbols: file in wrong format
>   BPF_PROG sample_ret0.bpf.o
> collect2: error: ld returned 1 exit status
>
> with similar errors in libbpf for HID:
>
> /usr/bin/aarch64-linux-gnu-ld: /home/broonie/git/linux/tools/testing/selftests/hid/tools/build/libbpf/libbpf.a(libbpf-in.o): Relocations in generic ELF (EM: 62)
> /usr/bin/aarch64-linux-gnu-ld: /home/broonie/git/linux/tools/testing/selftests/hid/tools/build/libbpf/libbpf.a(libbpf-in.o): Relocations in generic ELF (EM: 62)
>
> KernelCI is seeing failures earlier with HID:
>
>    https://storage.kernelci.org/next/master/next-20241004/arm64/defconfig%2Bkselftest/gcc-12/logs/kselftest.log
>
> and an unrelated missing dependency on libssl for net that needs to be
> fixed.

A lot can be said about kselftest, and cross-building. It's a bit of a
mess. Maybe we should move to meson or something for kselftest (that
requires less work for lazy developers like me). ;-)

I'm simply arguing that the *default* should be: BPF (and
hid/net/sched_ext) turned on. Default on would surface these kind of
problems, rather than hiding them. (And let the CI exclude tests it
cannot handle).


Cheers!
Björn

Mark Brown Oct. 4, 2024, 2:46 p.m. UTC | #3

On Fri, Oct 04, 2024 at 03:34:49PM +0200, Björn Töpel wrote:
> Mark Brown <broonie@kernel.org> writes:
> > On Fri, Oct 04, 2024 at 11:53:47AM +0200, Björn Töpel wrote:

> >> This effectively is a revert of commit 7a6eb7c34a78 ("selftests: Skip
> >> BPF seftests by default"). At the time when this was added, BPF had
> >> "build time dependencies on cutting edge versions". Since then a
> >> number of BPF capable tests has been included in net, hid, sched_ext.

> > The issue was always requiring a bleeding edge version of clang, not
> > sure if that's been relaxed yet, IIRC sometimes it required git
> > versions.  I have clang 20 installed here so that's not an issue for me
> > but given that that's not released yet it wouldn't be reasonable to
> > expect CI systems to install it.

> Yeah, but I'd say that is not the case anymore. LLVM 18 and 19 works.

Hrm, that's definitely a lot better then though still a little cutting
edge - the 24.10 Ubuntu release has clang 17, never mind any of the
stables or LTSs (Debian is very popular for build containers).  Not
quite at the "you can just install your distro package" level yet though
it's definitely substantial progress.  Is this requirement documented
somewhere someone could reasonably be expected to discover it?

It's a bit unfortunate having to pull clang into GCC build containers,
and needing a newer version than the minimum clang for the kernel itself
too :/

> > We also get a bunch of:

> > die__process_unit: DW_TAG_label (0xa) @ <0x58eb7> not handled!
> > die__process_unit: tag not supported 0xa (label)!

> > if we do turn enable CONFIG_DEBUG_INFO_BTF for arm64.

> This is pahole version related.

Which version is needed?  I've got 1.24 (from Debian) here...

> > The whole thing is also broken for cross compilation with clang since
> > everything is assuming that CROSS_COMPILE will be set for cross builds
> > but that's not the case for LLVM=1 builds - net gives:

> A lot can be said about kselftest, and cross-building. It's a bit of a
> mess. Maybe we should move to meson or something for kselftest (that
> requires less work for lazy developers like me). ;-)

AFAICT it pretty much works fine?  It's certainly widely used.

> I'm simply arguing that the *default* should be: BPF (and
> hid/net/sched_ext) turned on. Default on would surface these kind of
> problems, rather than hiding them. (And let the CI exclude tests it
> cannot handle).

The original motivation behind that patch was that there were a bunch of
CI systems all trying to run as many of the selftests as they can,
running into BPF and getting frustrated at the amount of time it was
consuming (or not managing to get it working at all).  Everyone was
assuming they were missing something or somehow doing the wrong thing to
satisfy the dependencies and it was burning a bunch of time and
discouraging people from using the selftests at all since it doesn't
create a good impression if stuff just doesn't build.  People did often
end up skipping BPF, but only after banging their heads against it for a
while, and then went and compared notes with other CI systems and found
everyone else had the same problem.

I think we before defaulting BPF stuff on we should at the very least
fix the builds for commonly covered architectures, it looks like as well
as arm64 we're also seeing BTF not generated on 32 bit arm:

   https://storage.kernelci.org/next/master/next-20241004/arm/multi_v7_defconfig%2Bkselftest/gcc-12/config/kernel.config

but everything else I spot checked looks fine.  It'd be much better to
skip gracefully if the kernel doesn't have BPF too.

We should probably also have explicit clang presence and feature/version
checks in the builds since clang is now fairly widely available in
distros but many of them have older versions than are needed so I
imagine a common failure pattern might be that people see clang is
needed, install their distro clang package and then run into errors from
clang.  That'd mean people would get a graceful skip with a clear
description of what's needed rather than build errors.

This is all a particular issue for the net tests where the addition of BPF
is a regression, not only can't you run the BPF based tests without
getting BPF working we've now lost the entire net testsuite if BPF isn't
working since it breaks the build.  TBH I didn't notice this getting
broken because I forgot that I was expecting to see net tests on
kernelci.org and the build break means they just disappear entirely from
the runtime results.  That really does need a graceful skip.

Shuah Khan Oct. 4, 2024, 6:52 p.m. UTC | #4

On 10/4/24 08:46, Mark Brown wrote:
> On Fri, Oct 04, 2024 at 03:34:49PM +0200, Björn Töpel wrote:
>> Mark Brown <broonie@kernel.org> writes:
>>> On Fri, Oct 04, 2024 at 11:53:47AM +0200, Björn Töpel wrote:
> 
>>>> This effectively is a revert of commit 7a6eb7c34a78 ("selftests: Skip
>>>> BPF seftests by default"). At the time when this was added, BPF had
>>>> "build time dependencies on cutting edge versions". Since then a
>>>> number of BPF capable tests has been included in net, hid, sched_ext.
> 
>>> The issue was always requiring a bleeding edge version of clang, not
>>> sure if that's been relaxed yet, IIRC sometimes it required git
>>> versions.  I have clang 20 installed here so that's not an issue for me
>>> but given that that's not released yet it wouldn't be reasonable to
>>> expect CI systems to install it.
> 
>> Yeah, but I'd say that is not the case anymore. LLVM 18 and 19 works.
> 
> Hrm, that's definitely a lot better then though still a little cutting
> edge - the 24.10 Ubuntu release has clang 17, never mind any of the
> stables or LTSs (Debian is very popular for build containers).  Not
> quite at the "you can just install your distro package" level yet though
> it's definitely substantial progress.  Is this requirement documented
> somewhere someone could reasonably be expected to discover it?
> 
> It's a bit unfortunate having to pull clang into GCC build containers,
> and needing a newer version than the minimum clang for the kernel itself
> too :/
> 
>>> We also get a bunch of:
> 
>>> die__process_unit: DW_TAG_label (0xa) @ <0x58eb7> not handled!
>>> die__process_unit: tag not supported 0xa (label)!
> 
>>> if we do turn enable CONFIG_DEBUG_INFO_BTF for arm64.
> 
>> This is pahole version related.
> 
> Which version is needed?  I've got 1.24 (from Debian) here...
> 
>>> The whole thing is also broken for cross compilation with clang since
>>> everything is assuming that CROSS_COMPILE will be set for cross builds
>>> but that's not the case for LLVM=1 builds - net gives:
> 
>> A lot can be said about kselftest, and cross-building. It's a bit of a
>> mess. Maybe we should move to meson or something for kselftest (that
>> requires less work for lazy developers like me). ;-)
> 
> AFAICT it pretty much works fine?  It's certainly widely used.
> 
>> I'm simply arguing that the *default* should be: BPF (and
>> hid/net/sched_ext) turned on. Default on would surface these kind of
>> problems, rather than hiding them. (And let the CI exclude tests it
>> cannot handle).
> 
> The original motivation behind that patch was that there were a bunch of
> CI systems all trying to run as many of the selftests as they can,
> running into BPF and getting frustrated at the amount of time it was
> consuming (or not managing to get it working at all).  Everyone was
> assuming they were missing something or somehow doing the wrong thing to
> satisfy the dependencies and it was burning a bunch of time and
> discouraging people from using the selftests at all since it doesn't
> create a good impression if stuff just doesn't build.  People did often
> end up skipping BPF, but only after banging their heads against it for a
> while, and then went and compared notes with other CI systems and found
> everyone else had the same problem.
> 
> I think we before defaulting BPF stuff on we should at the very least
> fix the builds for commonly covered architectures, it looks like as well
> as arm64 we're also seeing BTF not generated on 32 bit arm:
> 
>     https://storage.kernelci.org/next/master/next-20241004/arm/multi_v7_defconfig%2Bkselftest/gcc-12/config/kernel.config
> 
> but everything else I spot checked looks fine.  It'd be much better to
> skip gracefully if the kernel doesn't have BPF too.

+1 on this. I would rather have tests check for their dependencies
and skip them. Let's fix the tests to fail gracefully before adding
bpf and bpf dependent tests to default run.

I saw your patch to add sched_ext which shouldn't be added until
we get these resolved. Maybe you could include the checks to the
following patch to fail gracefully if kernel doesn't have BPF
and the environment doesn't have the dependencies installed.

Makefile could handle some of these and skip build instead of
failing the build. Individual test build failures shouldn't stop
kselftest-all build.

https://patchwork.kernel.org/project/linux-kselftest/patch/20241004094247.795385-1-bjorn@kernel.org/

> 
> We should probably also have explicit clang presence and feature/version
> checks in the builds since clang is now fairly widely available in
> distros but many of them have older versions than are needed so I
> imagine a common failure pattern might be that people see clang is
> needed, install their distro clang package and then run into errors from
> clang.  That'd mean people would get a graceful skip with a clear
> description of what's needed rather than build errors.
> 
> This is all a particular issue for the net tests where the addition of BPF
> is a regression, not only can't you run the BPF based tests without
> getting BPF working we've now lost the entire net testsuite if BPF isn't
> working since it breaks the build.  TBH I didn't notice this getting
> broken because I forgot that I was expecting to see net tests on
> kernelci.org and the build break means they just disappear entirely from
> the runtime results.  That really does need a graceful skip.

This is unfortunate. We have to fix the bpf dependent tests to do build
and run-time checks.

We can add the tests to default once they can work properly without their
dependencies.

thanks,
-- Shuah

Björn Töpel Oct. 5, 2024, 10:12 a.m. UTC | #5

Mark Brown <broonie@kernel.org> writes:

> On Fri, Oct 04, 2024 at 03:34:49PM +0200, Björn Töpel wrote:
>> Mark Brown <broonie@kernel.org> writes:
>> > On Fri, Oct 04, 2024 at 11:53:47AM +0200, Björn Töpel wrote:
>
>> >> This effectively is a revert of commit 7a6eb7c34a78 ("selftests: Skip
>> >> BPF seftests by default"). At the time when this was added, BPF had
>> >> "build time dependencies on cutting edge versions". Since then a
>> >> number of BPF capable tests has been included in net, hid, sched_ext.
>
>> > The issue was always requiring a bleeding edge version of clang, not
>> > sure if that's been relaxed yet, IIRC sometimes it required git
>> > versions.  I have clang 20 installed here so that's not an issue for me
>> > but given that that's not released yet it wouldn't be reasonable to
>> > expect CI systems to install it.
>
>> Yeah, but I'd say that is not the case anymore. LLVM 18 and 19 works.
>
> Hrm, that's definitely a lot better then though still a little cutting
> edge - the 24.10 Ubuntu release has clang 17, never mind any of the
> stables or LTSs (Debian is very popular for build containers).  Not
> quite at the "you can just install your distro package" level yet though
> it's definitely substantial progress.  Is this requirement documented
> somewhere someone could reasonably be expected to discover it?

I agree it would help having the minimal version stated somewhere. I'm
not aware of it.

> It's a bit unfortunate having to pull clang into GCC build containers,
> and needing a newer version than the minimum clang for the kernel itself
> too :/

I guess this boils down to the expecatation on the build environment. I
pull in Rust, various LLVM, and GCC versions into the build container.

Is the expectation the kernel and userland tooling must be the same?

>> > We also get a bunch of:
>
>> > die__process_unit: DW_TAG_label (0xa) @ <0x58eb7> not handled!
>> > die__process_unit: tag not supported 0xa (label)!
>
>> > if we do turn enable CONFIG_DEBUG_INFO_BTF for arm64.
>
>> This is pahole version related.
>
> Which version is needed?  I've got 1.24 (from Debian) here...

I bumped to 1.25!

>> > The whole thing is also broken for cross compilation with clang since
>> > everything is assuming that CROSS_COMPILE will be set for cross builds
>> > but that's not the case for LLVM=1 builds - net gives:
>
>> A lot can be said about kselftest, and cross-building. It's a bit of a
>> mess. Maybe we should move to meson or something for kselftest (that
>> requires less work for lazy developers like me). ;-)
>
> AFAICT it pretty much works fine?  It's certainly widely used.

Ugh, I guess we have very different views here. For me kselftest
cross-building is breaking all the time. The tests are a mix of using
the kselftest framework, and "having tests stored somewhere". Targets
have different semantics (e.g. missing things from "install"),
developers (I definitely include me self here!) seem to have a hard time
figuring out what should be included in the test Makefiles (copy and
paste, etc.).

A lot of tests are not included in the top-level kselftest Makefile
(maybe there's a rationale for that? I haven't found one).

I love kbuild for the *kernel*, but IMO it really feels bolted on for
kselftest (and much of tools/).

Tests don't get the same love as the kernel proper, and developers don't
want to spend a lot of time figuring out how kselftests works -- and
that shows in kselftest.

>> I'm simply arguing that the *default* should be: BPF (and
>> hid/net/sched_ext) turned on. Default on would surface these kind of
>> problems, rather than hiding them. (And let the CI exclude tests it
>> cannot handle).
>
> The original motivation behind that patch was that there were a bunch of
> CI systems all trying to run as many of the selftests as they can,
> running into BPF and getting frustrated at the amount of time it was
> consuming (or not managing to get it working at all).  Everyone was
> assuming they were missing something or somehow doing the wrong thing to
> satisfy the dependencies and it was burning a bunch of time and
> discouraging people from using the selftests at all since it doesn't
> create a good impression if stuff just doesn't build.  People did often
> end up skipping BPF, but only after banging their heads against it for a
> while, and then went and compared notes with other CI systems and found
> everyone else had the same problem.
>
> I think we before defaulting BPF stuff on we should at the very least
> fix the builds for commonly covered architectures, it looks like as well
> as arm64 we're also seeing BTF not generated on 32 bit arm:
>
>    https://storage.kernelci.org/next/master/next-20241004/arm/multi_v7_defconfig%2Bkselftest/gcc-12/config/kernel.config
>
> but everything else I spot checked looks fine.  It'd be much better to
> skip gracefully if the kernel doesn't have BPF too.
>
> We should probably also have explicit clang presence and feature/version
> checks in the builds since clang is now fairly widely available in
> distros but many of them have older versions than are needed so I
> imagine a common failure pattern might be that people see clang is
> needed, install their distro clang package and then run into errors from
> clang.  That'd mean people would get a graceful skip with a clear
> description of what's needed rather than build errors.

This is not only true for BPF/Clang. There are a number of kselftests
that make assumptions about architecture, and tools. I do agree that a
proper feature detection (what bpftool/perf is using, or move to that
new shiny build system ;-P) for kselftest would be great!

> This is all a particular issue for the net tests where the addition of BPF
> is a regression, not only can't you run the BPF based tests without
> getting BPF working we've now lost the entire net testsuite if BPF isn't
> working since it breaks the build.  TBH I didn't notice this getting
> broken because I forgot that I was expecting to see net tests on
> kernelci.org and the build break means they just disappear entirely from
> the runtime results.  That really does need a graceful skip.

...and adding net/hid/sched_ext to the default skip as well? What the
tests that only work on some platforms. I'm intentionally provoking
here. I don't like hiding tests, because it is bit "tricky" to setup the
tooling. BPF is very much in the core of the kernel, and leaving those
tests out seems odd.

Thanks for the discussion! I'll have a look into the feature detection.

Björn

Mark Brown Oct. 5, 2024, 12:10 p.m. UTC | #6

On Sat, Oct 05, 2024 at 12:12:15PM +0200, Björn Töpel wrote:
> Mark Brown <broonie@kernel.org> writes:
> > On Fri, Oct 04, 2024 at 03:34:49PM +0200, Björn Töpel wrote:
> >> Mark Brown <broonie@kernel.org> writes:

> > It's a bit unfortunate having to pull clang into GCC build containers,
> > and needing a newer version than the minimum clang for the kernel itself
> > too :/

> I guess this boils down to the expecatation on the build environment. I
> pull in Rust, various LLVM, and GCC versions into the build container.

> Is the expectation the kernel and userland tooling must be the same?

I'd say it's a likely expectation, or at least a case people are going
to want to try.  A bunch of the people doing builds have per toolchain
containers, as well as keeping the container size under control it makes
it easy to be sure that your build actually used the toolchain you
thought it used if there simply isn't any other toolchain in there.  You
have testing that is focused on making sure that all the puportedly
supported toolchain versions actually work.

> >> > The whole thing is also broken for cross compilation with clang since
> >> > everything is assuming that CROSS_COMPILE will be set for cross builds
> >> > but that's not the case for LLVM=1 builds - net gives:

> >> A lot can be said about kselftest, and cross-building. It's a bit of a
> >> mess. Maybe we should move to meson or something for kselftest (that
> >> requires less work for lazy developers like me). ;-)

> > AFAICT it pretty much works fine?  It's certainly widely used.

> Ugh, I guess we have very different views here. For me kselftest
> cross-building is breaking all the time. The tests are a mix of using
> the kselftest framework, and "having tests stored somewhere". Targets
> have different semantics (e.g. missing things from "install"),
> developers (I definitely include me self here!) seem to have a hard time
> figuring out what should be included in the test Makefiles (copy and
> paste, etc.).

I'm pretty much exclusively cross building kselftest, even for the cases
I end up doing native builds the build is set up like a cross build
because a lot of my stuff is running in a Kubernetes cluster which will
spin up whatever machines it can get cheapest at a given moment.

> A lot of tests are not included in the top-level kselftest Makefile
> (maybe there's a rationale for that? I haven't found one).

As far as I'm aware the ones that aren't included are ones that are (or
were, perhaps there's some cases that need revisiting) unreasonably hard
to get going or where the test programs just don't look at all like
selftests so break the framework when you try to run them.  I
occasionally go through and try to enable more selftests in both my CI
and KernelCI, I know there's some that are missing from the top level
Makefile that I didn't hook in because they were too broken but there's
a bunch of others where I've sent patches because it just looks like an
oversight.  There's other people doing similar stuff, Muhammad Usama
Anjum from Collabora has been doing a bunch of stuff recently for
example.

> I love kbuild for the *kernel*, but IMO it really feels bolted on for
> kselftest (and much of tools/).

> Tests don't get the same love as the kernel proper, and developers don't
> want to spend a lot of time figuring out how kselftests works -- and
> that shows in kselftest.

That's a big part of the pushback on things like the build issues with
the BPF tests and the way those issues manifest, part of the goal is to
get the selftests to the point where they mostly work without too much
effort.  Things like requiring a library are not too much of an issue,
you just search for what provides foo.h on your distro or whatever, and
similarly if it's a common tool that distros tend to carry.  Similarly
adding the tools/testing/selftests/suite/config fragments to a defconfig
should be what's needed to get the kernel configured appropriately to do
the tests.

It would be nice to do something better with the libraries, IIRC some of
the suites do already for cases where the libray is only used by a
subset of the tests, but other than that we do mostly seem to be at a
point where you can reasonably just expect things to work.

> > I think we before defaulting BPF stuff on we should at the very least
> > fix the builds for commonly covered architectures, it looks like as well
> > as arm64 we're also seeing BTF not generated on 32 bit arm:

> >    https://storage.kernelci.org/next/master/next-20241004/arm/multi_v7_defconfig%2Bkselftest/gcc-12/config/kernel.config

> > but everything else I spot checked looks fine.  It'd be much better to
> > skip gracefully if the kernel doesn't have BPF too.

> > We should probably also have explicit clang presence and feature/version
> > checks in the builds since clang is now fairly widely available in
> > distros but many of them have older versions than are needed so I
> > imagine a common failure pattern might be that people see clang is
> > needed, install their distro clang package and then run into errors from
> > clang.  That'd mean people would get a graceful skip with a clear
> > description of what's needed rather than build errors.

> This is not only true for BPF/Clang. There are a number of kselftests
> that make assumptions about architecture, and tools. I do agree that a
> proper feature detection (what bpftool/perf is using, or move to that
> new shiny build system ;-P) for kselftest would be great!

Do you have specific pointers to problem suites?  The general idea is
that if there's architecture dependencies the tests should be skipped
when not applicable (eg, the arm64, riscv and x86 suites do this) and at
the minute for straightforward library dependencies we just expect
people to sort them out (it's mildly annoying whenever someone starts
using a new library but it tends not to be too much of a barrier).

I'm not sure the problem here is really the language TBH.

> > This is all a particular issue for the net tests where the addition of BPF
> > is a regression, not only can't you run the BPF based tests without
> > getting BPF working we've now lost the entire net testsuite if BPF isn't
> > working since it breaks the build.  TBH I didn't notice this getting
> > broken because I forgot that I was expecting to see net tests on
> > kernelci.org and the build break means they just disappear entirely from
> > the runtime results.  That really does need a graceful skip.

> ...and adding net/hid/sched_ext to the default skip as well? What the
> tests that only work on some platforms. I'm intentionally provoking
> here. I don't like hiding tests, because it is bit "tricky" to setup the
> tooling. BPF is very much in the core of the kernel, and leaving those
> tests out seems odd.

I think the ideal thing would be to have these suites included in the
build and skipping gracefully with comprehensible error messages if
their dependencies aren't met.  Now that they are expected to work with
released versions of clang it's a lot more tractable for people to
resolve the dependency than it was in the past.  We also need to ensure
that they work for at least major architectures, even with their
dependencies installed and config fragments turned on no upstream
configuration can currently build any BPF tests on either arm or arm64
which clearly isn't at all OK.

Like I said above I think for net the current situation is a serious
regression and should be fixed so that only the BPF related net tests
are skipped when BPF isn't working.

selftests: Do not skip BPF selftests by default

Commit Message

Comments

Patch