[09/11] x86: rework CONFIG_GENERIC_CPU compiler flags

Message ID	20241204103042.1904639-10-arnd@kernel.org (mailing list archive)
State	New, archived
Headers	From: Arnd Bergmann <arnd@kernel.org>
	To: linux-kernel@vger.kernel.org, x86@kernel.org
	Cc: Arnd Bergmann <arnd@arndb.de>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, Linus Torvalds <torvalds@linux-foundation.org>, Andy Shevchenko <andy@kernel.org>, Matthew Wilcox <willy@infradead.org>, Sean Christopherson <seanjc@google.com>, Davide Ciminaghi <ciminaghi@gnudd.com>, Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org
	Subject: [PATCH 09/11] x86: rework CONFIG_GENERIC_CPU compiler flags
	Date: Wed, 4 Dec 2024 11:30:40 +0100
	Message-Id: <20241204103042.1904639-10-arnd@kernel.org>
	In-Reply-To: <20241204103042.1904639-1-arnd@kernel.org>
	References: <20241204103042.1904639-1-arnd@kernel.org>
Series	x86: 32-bit cleanups
	[00/11] x86: 32-bit cleanups
	[01/11] x86/Kconfig: Geode CPU has cmpxchg8b
	[02/11] x86: drop 32-bit "bigsmp" machine support
	[03/11] x86: Kconfig.cpu: split out 64-bit atom
	[04/11] x86: split CPU selection into 32-bit and 64-bit
	[05/11] x86: remove HIGHMEM64G support
	[06/11] x86: drop SWIOTLB and PHYS_ADDR_T_64BIT for PAE
	[07/11] x86: drop support for CONFIG_HIGHPTE
	[08/11] x86: document X86_INTEL_MID as 64-bit-only
	[09/11] x86: rework CONFIG_GENERIC_CPU compiler flags
	[10/11] x86: remove old STA2x11 support
	[11/11] x86: drop 32-bit KVM host support

Arnd Bergmann Dec. 4, 2024, 10:30 a.m. UTC

From: Arnd Bergmann <arnd@arndb.de>

Building an x86-64 kernel with CONFIG_GENERIC_CPU is documented to
run on all CPUs, but the Makefile does not actually pass an -march=
argument, instead relying on the default that was used to configure
the toolchain.

In many cases, gcc will be configured to -march=x86-64 or -march=k8
for maximum compatibility, but in other cases a distribution default
may be either raised to a more recent ISA, or set to -march=native
to build for the CPU used for compilation. This still works in the
case of building a custom kernel for the local machine.

The point where it breaks down is building a kernel for another
machine that is older than the default target. Changing the default
to -march=x86-64 would make it work reliably, but possibly produce
worse code on distros that intentionally default to a newer ISA.

To allow reliably building a kernel for either the oldest x86-64
CPUs or a more recent level, add three separate options for
v1, v2 and v3 of the architecture as defined by gcc and clang
and make them all turn on CONFIG_GENERIC_CPU. Based on this it
should be possible to change runtime feature detection into
build-time detection for things like cmpxchg16b, or possibly
gate features that are only available on older architectures.

Link: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143289.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/x86/Kconfig.cpu | 39 ++++++++++++++++++++++++++++++++++-----
 arch/x86/Makefile    |  6 ++++++
 2 files changed, 40 insertions(+), 5 deletions(-)
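A minimal sketch of the failure mode described in the commit message, assuming each x86-64 level can be modelled as a set of required CPU features: code built for a level only runs on a CPU that has every feature in that set. The bit numbering, the abbreviated per-level feature lists and the two example CPUs are illustrative assumptions, not the psABI definition and not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, illustrative bit numbering: this is NOT the CPUID
 * layout, and each level below lists only a subset of what the
 * corresponding x86-64 level actually requires. */
enum {
	FEAT_CMOV   = 1u << 0,
	FEAT_CX8    = 1u << 1,
	FEAT_SSE2   = 1u << 2,
	FEAT_CX16   = 1u << 3,	/* cmpxchg16b */
	FEAT_LAHF   = 1u << 4,	/* lahf/sahf in 64-bit mode */
	FEAT_POPCNT = 1u << 5,
	FEAT_SSE4_2 = 1u << 6,
	FEAT_AVX2   = 1u << 7,
	FEAT_BMI2   = 1u << 8,
	FEAT_MOVBE  = 1u << 9,
};

/* Each level is a superset of the previous one. */
#define LEVEL_V1 (FEAT_CMOV | FEAT_CX8 | FEAT_SSE2)
#define LEVEL_V2 (LEVEL_V1 | FEAT_CX16 | FEAT_LAHF | FEAT_POPCNT | FEAT_SSE4_2)
#define LEVEL_V3 (LEVEL_V2 | FEAT_AVX2 | FEAT_BMI2 | FEAT_MOVBE)

static_assert((LEVEL_V2 & LEVEL_V1) == LEVEL_V1, "v2 must include v1");
static_assert((LEVEL_V3 & LEVEL_V2) == LEVEL_V2, "v3 must include v2");

/* Hypothetical example CPUs (feature sets are illustrative only). */
#define CPU_EARLY_X86_64 LEVEL_V1	/* e.g. an early K8: no cx16/lahf/popcnt */
#define CPU_RECENT       LEVEL_V3

/* Code built for 'required' only runs on a CPU that has every bit. */
static bool can_run(uint32_t cpu, uint32_t required)
{
	return (cpu & required) == required;
}
```

With these definitions, can_run(CPU_EARLY_X86_64, LEVEL_V2) is false: a kernel built by a toolchain whose default was raised to v2 would not run on such a machine, which is why passing an explicit -march= matters.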

Tor Vic Dec. 4, 2024, 3:36 p.m. UTC | #1

On 12/4/24 11:30, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> Building an x86-64 kernel with CONFIG_GENERIC_CPU is documented to
> run on all CPUs, but the Makefile does not actually pass an -march=
> argument, instead relying on the default that was used to configure
> the toolchain.
> 
> In many cases, gcc will be configured to -march=x86-64 or -march=k8
> for maximum compatibility, but in other cases a distribution default
> may be either raised to a more recent ISA, or set to -march=native
> to build for the CPU used for compilation. This still works in the
> case of building a custom kernel for the local machine.
> 
> The point where it breaks down is building a kernel for another
> machine that is older than the default target. Changing the default
> to -march=x86-64 would make it work reliably, but possibly produce
> worse code on distros that intentionally default to a newer ISA.
> 
> To allow reliably building a kernel for either the oldest x86-64
> CPUs or a more recent level, add three separate options for
> v1, v2 and v3 of the architecture as defined by gcc and clang
> and make them all turn on CONFIG_GENERIC_CPU. Based on this it
> should be possible to change runtime feature detection into
> build-time detection for things like cmpxchg16b, or possibly
> gate features that are only available on older architectures.
> 

Hi Arnd,

Similar but not identical changes have been proposed several times in
the past, e.g. in [1] and [2], and likely even more often.

Your solution seems to be much cleaner, I like it.

That said, on my Skylake platform, there is no difference between 
-march=x86-64 and -march=x86-64-v3 in terms of kernel binary size or 
performance.
I think Boris also said that these settings make no real difference on 
code generation.

Other settings might make a small difference (numbers are from 2023):
   -generic:       85.089.784 bytes
   -core2:         85.139.932 bytes
   -march=skylake: 85.017.808 bytes

----
[1] https://lore.kernel.org/all/4_u6ZNYPbaK36xkLt8ApRhiRTyWp_-NExHCH_tTFO_fanDglEmcbfowmiB505heI4md2AuR9hS-VSkf4s90sXb5--AnNTOwvPaTmcgzRYSY=@proton.me/

[2] https://lore.kernel.org/all/20230707105601.133221-1-dimitri.ledkov@canonical.com/

Nathan Chancellor Dec. 4, 2024, 5:09 p.m. UTC | #2

Hi Arnd,

On Wed, Dec 04, 2024 at 11:30:40AM +0100, Arnd Bergmann wrote:
...
> +++ b/arch/x86/Kconfig.cpu
> +config X86_64_V1
> +config X86_64_V2
> +config X86_64_V3
...
> +++ b/arch/x86/Makefile
> +        cflags-$(CONFIG_MX86_64_V1)	+= -march=x86-64
> +        cflags-$(CONFIG_MX86_64_V2)	+= $(call cc-option,-march=x86-64-v2,-march=x86-64)
> +        cflags-$(CONFIG_MX86_64_V3)	+= $(call cc-option,-march=x86-64-v3,-march=x86-64)
...
> +        rustflags-$(CONFIG_MX86_64_V1)	+= -Ctarget-cpu=x86-64
> +        rustflags-$(CONFIG_MX86_64_V2)	+= -Ctarget-cpu=x86-64-v2
> +        rustflags-$(CONFIG_MX86_64_V3)	+= -Ctarget-cpu=x86-64-v3

There appears to be an extra 'M' when using these CONFIGs in Makefile,
so I don't think this works as is?

Cheers,
Nathan

Arnd Bergmann Dec. 4, 2024, 5:51 p.m. UTC | #3

On Wed, Dec 4, 2024, at 16:36, Tor Vic wrote:
> On 12/4/24 11:30, Arnd Bergmann wrote:
> Similar but not identical changes have been proposed several times in
> the past, e.g. in [1] and [2], and likely even more often.
>
> Your solution seems to be much cleaner, I like it.

Thanks. It looks like the other two did not actually
address the bug I'm fixing in my version.

> That said, on my Skylake platform, there is no difference between 
> -march=x86-64 and -march=x86-64-v3 in terms of kernel binary size or 
> performance.
> I think Boris also said that these settings make no real difference on 
> code generation.

As Nathan pointed out, I had a typo in my patch, so the
options didn't actually do anything at all. I fixed it now
and did a 'defconfig' test build with all three:

> Other settings might make a small difference (numbers are from 2023):
>    -generic:       85.089.784 bytes
>    -core2:         85.139.932 bytes
>    -march=skylake: 85.017.808 bytes

   text	   data	    bss	    dec	    hex	filename
26664466	10806622	1490948	38962036	2528374	obj-x86/vmlinux-v1
26664466	10806622	1490948	38962036	2528374	obj-x86/vmlinux-v2
26662504	10806654	1490948	38960106	2527bea	obj-x86/vmlinux-v3

which is a tiny 2KB saved between v2 and v3. I looked at
the object code and found that the v3 version takes advantage
of the BMI extension, which makes perfect sense. Not sure
if it has any real performance benefits.
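Working out the v2-to-v3 difference from the defconfig size table above (a single build, so only indicative; bss is unchanged):

```latex
\begin{aligned}
\Delta_{\text{text}} &= 26664466 - 26662504 = 1962\ \text{bytes saved}\\
\Delta_{\text{data}} &= 10806654 - 10806622 = 32\ \text{bytes added}\\
\Delta_{\text{dec}}  &= 1962 - 32 = 1930\ \text{bytes}
  \quad (\mathtt{0x2528374} - \mathtt{0x2527bea} = \mathtt{0x78a} = 1930)\\
\frac{1930}{38962036} &\approx 0.005\,\%
\end{aligned}
```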

Between v1 and v2, there is a chance to turn things like
system_has_cmpxchg128() into a constant on v2 and higher.
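A minimal sketch of what turning system_has_cmpxchg128() into a constant could look like, assuming a hypothetical build-time symbol HYPOTHETICAL_BUILD_REQUIRES_CX16 that would be defined only when the configured level guarantees cmpxchg16b. Without it, the check falls back to a runtime test; detected_cx16, cpu_has_cx16_runtime() and system_has_cmpxchg128_sketch() are hypothetical stand-ins, not the kernel's actual definitions (the kernel asks boot_cpu_has(X86_FEATURE_CX16) at run time).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of boot-time feature detection. */
static bool detected_cx16;

/* Hypothetical stand-in for boot_cpu_has(X86_FEATURE_CX16). */
static bool cpu_has_cx16_runtime(void)
{
	return detected_cx16;
}

#ifdef HYPOTHETICAL_BUILD_REQUIRES_CX16
/* Every CPU this kernel can boot on has cmpxchg16b: compile-time constant. */
#define system_has_cmpxchg128_sketch()	true
#else
/* CPUs without cmpxchg16b are still supported: ask at run time. */
#define system_has_cmpxchg128_sketch()	cpu_has_cx16_runtime()
#endif
```

When the symbol is defined, the macro expands to a constant true, so the compiler can discard any fallback path guarded by it.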

The v4 version is meaningless in practice since it only
adds AVX512 instructions that are present in very few
CPUs and are not that useful inside the kernel aside from
specialized crypto and raid helpers.

      Arnd

Arnd Bergmann Dec. 4, 2024, 5:52 p.m. UTC | #4

On Wed, Dec 4, 2024, at 18:09, Nathan Chancellor wrote:
> Hi Arnd,
>
> On Wed, Dec 04, 2024 at 11:30:40AM +0100, Arnd Bergmann wrote:
> ...
>> +++ b/arch/x86/Kconfig.cpu
>> +config X86_64_V1
>> +config X86_64_V2
>> +config X86_64_V3
> ...
>> +++ b/arch/x86/Makefile
>> +        cflags-$(CONFIG_MX86_64_V1)	+= -march=x86-64
>> +        cflags-$(CONFIG_MX86_64_V2)	+= $(call cc-option,-march=x86-64-v2,-march=x86-64)
>> +        cflags-$(CONFIG_MX86_64_V3)	+= $(call cc-option,-march=x86-64-v3,-march=x86-64)
> ...
>> +        rustflags-$(CONFIG_MX86_64_V1)	+= -Ctarget-cpu=x86-64
>> +        rustflags-$(CONFIG_MX86_64_V2)	+= -Ctarget-cpu=x86-64-v2
>> +        rustflags-$(CONFIG_MX86_64_V3)	+= -Ctarget-cpu=x86-64-v3
>
> There appears to be an extra 'M' when using these CONFIGs in Makefile,
> so I don't think this works as is?

Fixed now by adding the 'M' in the Kconfig file, thanks for
noticing it.

      Arnd

Linus Torvalds Dec. 4, 2024, 6:10 p.m. UTC | #5

"On second thought, let’s not go to x86-64 microarchitectural
levels. ‘Tis a silly place"

On Wed, 4 Dec 2024 at 02:31, Arnd Bergmann <arnd@kernel.org> wrote:
>
> To allow reliably building a kernel for either the oldest x86-64
> CPUs or a more recent level, add three separate options for
> v1, v2 and v3 of the architecture as defined by gcc and clang
> and make them all turn on CONFIG_GENERIC_CPU.

The whole "v2", "v3", "v4" etc naming seems to be some crazy glibc
artifact and is stupid and needs to die.

It has no relevance to anything. Please do *not* introduce that
mind-fart into the kernel sources.

I have no idea who came up with the "microarchitecture levels"
garbage, but as far as I can tell, it's entirely unofficial, and it's
a completely broken model.

There is a very real model for microarchitectural features, and it's
the CPUID bits. Trying to linearize those bits is technically wrong,
since these things simply aren't some kind of linear progression.

And worse, it's a "simplification" that literally adds complexity. Now
instead of asking "does this CPU support the cmpxchg16b instruction?",
the question instead becomes one of "what the hell does 'v3' mean
again?"

So no. We are *NOT* introducing that idiocy in the kernel.

                Linus
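To make the CPUID point concrete, here is a minimal sketch that answers "does this CPU support cmpxchg16b?" directly from CPUID register values (leaf 0x1 ECX bit 13 for cmpxchg16b, bit 23 for popcnt; leaf 0x80000001 ECX bit 0 for lahf/sahf in 64-bit mode). The register values are passed in so the sketch runs anywhere; on real hardware they would come from the CPUID instruction. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x00000001, ECX register */
#define CPUID_1_ECX_CX16      (1u << 13)	/* cmpxchg16b */
#define CPUID_1_ECX_POPCNT    (1u << 23)	/* popcnt */
/* CPUID leaf 0x80000001, ECX register */
#define CPUID_81_ECX_LAHF_LM  (1u << 0)	/* lahf/sahf in 64-bit mode */

/* Hypothetical container for the raw register values. */
struct cpuid_regs {
	uint32_t leaf1_ecx;
	uint32_t leaf81_ecx;
};

static bool has_cmpxchg16b(const struct cpuid_regs *r)
{
	return (r->leaf1_ecx & CPUID_1_ECX_CX16) != 0;
}

static bool has_popcnt(const struct cpuid_regs *r)
{
	return (r->leaf1_ecx & CPUID_1_ECX_POPCNT) != 0;
}

static bool has_lahf_sahf(const struct cpuid_regs *r)
{
	return (r->leaf81_ecx & CPUID_81_ECX_LAHF_LM) != 0;
}
```

Each question maps to exactly one bit, with no level name to decode; inside the kernel the same question is asked with boot_cpu_has(X86_FEATURE_CX16).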

Arnd Bergmann Dec. 4, 2024, 7:43 p.m. UTC | #6

On Wed, Dec 4, 2024, at 19:10, Linus Torvalds wrote:
> "On second thought, let’s not go to x86-64 microarchitectural
> levels. ‘Tis a silly place"

Fair enough. I'll just make it use -march=x86-64 to override
the compiler default then.

> On Wed, 4 Dec 2024 at 02:31, Arnd Bergmann <arnd@kernel.org> wrote:
>>
>> To allow reliably building a kernel for either the oldest x86-64
>> CPUs or a more recent level, add three separate options for
>> v1, v2 and v3 of the architecture as defined by gcc and clang
>> and make them all turn on CONFIG_GENERIC_CPU.
>
> The whole "v2", "v3", "v4" etc naming seems to be some crazy glibc
> artifact and is stupid and needs to die.
>
> It has no relevance to anything. Please do *not* introduce that
> mind-fart into the kernel sources.
>
> I have no idea who came up with the "microarchitecture levels"
> garbage, but as far as I can tell, it's entirely unofficial, and it's
> a completely broken model.

I agree that both the name and the concept are broken.
My idea was based on how distros (Red Hat Enterprise Linux
at least) already use the same levels for making userspace
require newer CPUs, so using the same flag for the kernel
makes some sense.

Making a point about the levels being stupid is a useful
goal as well.

> There is a very real model for microarchitectural features, and it's
> the CPUID bits. Trying to linearize those bits is technically wrong,
> since these things simply aren't some kind of linear progression.
>
> And worse, it's a "simplification" that literally adds complexity. Now
> instead of asking "does this CPU support the cmpxchg16b instruction?",
> the question instead becomes one of "what the hell does 'v3' mean
> again?"

I guess the other side of it is that the current selection
between pentium4/core2/k8/bonnell/generic is not much better,
given that in practice nobody has any of the
pentium4/core2/k8/bonnell variants any more.

A more radical solution would be to just drop the entire
menu for 64-bit kernels and always default to "-march=x86-64
-mtune=generic" and 64-byte L1 cachelines.

      Arnd

Linus Torvalds Dec. 4, 2024, 11:33 p.m. UTC | #7

On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@arndb.de> wrote:
>
> I guess the other side of it is that the current selection
> between pentium4/core2/k8/bonnell/generic is not much better,
> given that in practice nobody has any of the
> pentium4/core2/k8/bonnell variants any more.

Yeah, I think that whole part of the x86 Kconfig is almost entirely historical.

It's historical also in the sense that a lot of those decisions matter
a whole lot less these days.

The whole CPU tuning issue is happily mostly a thing of the past,
since all modern CPUs do fairly well, and you don't have the crazy
glass jaws of yesteryear with in-order cores and the insane
instruction choice sensitivity of the P4 uarch.

And on our side, we've just also basically turned to much more dynamic
models, with either instruction rewriting or static branches or both.

So I suspect:

> A more radical solution would be to just drop the entire
> menu for 64-bit kernels and always default to "-march=x86-64
> -mtune=generic" and 64-byte L1 cachelines.

would actually be perfectly acceptable. The non-generic choices are
all entirely historical and not really very interesting.

Absolutely nobody sane cares about instruction scheduling for the old P4 cores.

In the bad old 32-bit days, we had real code generation issues with
basic instruction set, i.e. the whole "some CPUs are P6-class, but
don't actually support the CMOVxx instruction". Those days are gone.

And yes, on x86-64, we still have the whole cmpxchg16b issue, which
really is a slight annoyance. But the emphasis is on "slight" - we
basically have one hack for this in the SLAB code, and a couple of
dynamic tests for one particular driver (iommu 128-bit IRTE mode).

So yeah, the cmpxchg16b thing is annoying, but _realistically_ I don't
think we care.

And some day we will forget about it, notice that those (few) AMD
early 64-bit CPUs can't possibly have been working for the last year
or two, and we'll finally just kill that code, but in the meantime the
cost of maintaining it is so slight that it's not worth actively going
out to kill it.

I do think that the *one* option we might have is "optimize for the
current CPU" for people who just want to build their own kernel for
their own machine. That's a nice easy choice to give people, and
'-march=native' is kind of simple to use.

Will that work when you cross-compile? No. Do we care? Also no. It's
basically a simple "you want to optimize for your own local machine"
switch.

Maybe that could replace some of the 32-bit choices too?

             Linus

Andy Shevchenko Dec. 5, 2024, 8:07 a.m. UTC | #8

On Wed, Dec 04, 2024 at 08:43:35PM +0100, Arnd Bergmann wrote:
> On Wed, Dec 4, 2024, at 19:10, Linus Torvalds wrote:

...

> I guess the other side of it is that the current selection
> between pentium4/core2/k8/bonnell/generic is not much better,
> given that in practice nobody has any of the
> pentium4/core2/k8/bonnell variants any more.

Just booted a Bonnell device (a WeTab) a day ago; pity that it has an old kernel
and I have no time to try anything recent on it...

(Just saying :-)

Andy Shevchenko Dec. 5, 2024, 8:13 a.m. UTC | #9

On Wed, Dec 04, 2024 at 03:33:19PM -0800, Linus Torvalds wrote:
> On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@arndb.de> wrote:

...

> Will that work when you cross-compile? No. Do we care? Also no. It's
> basically a simple "you want to optimize for your own local machine"
> switch.

Maybe it's okay for 64-bit machines, but what about cross-compiling for 32-bit
on 64-bit? I don't know what '-march=native -m32' (or equivalent) will give in
such cases.

> Maybe that could replace some of the 32-bit choices too?

Arnd Bergmann Dec. 5, 2024, 9:46 a.m. UTC | #10

On Thu, Dec 5, 2024, at 00:33, Linus Torvalds wrote:
> On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> I guess the other side of it is that the current selection
>> between pentium4/core2/k8/bonnell/generic is not much better,
>> given that in practice nobody has any of the
>> pentium4/core2/k8/bonnell variants any more.
>
> So I suspect:
>
>> A more radical solution would be to just drop the entire
>> menu for 64-bit kernels and always default to "-march=x86-64
>> -mtune=generic" and 64-byte L1 cachelines.
>
> would actually be perfectly acceptable. The non-generic choices are
> all entirely historical and not really very interesting.
>
> Absolutely nobody sane cares about instruction scheduling for the old P4 cores.

Ok, I'll do that instead then. This also means I can drop
the patch for CONFIG_MATOM.

> In the bad old 32-bit days, we had real code generation issues with
> basic instruction set, i.e. the whole "some CPUs are P6-class, but
> don't actually support the CMOVxx instruction". Those days are gone.

I did come across a remaining odd problem with this, as Crusoe and
GeodeLX both identify as Family 5 but have CMOV. Trying to run a
CONFIG_M686+CONFIG_X86_GENERIC kernel on these fails with the boot
error "This kernel requires a 686 CPU but only detected a 586 CPU".

As a result, the Debian 686 kernel binary gets built with
CONFIG_MGEODE_LX, which seems mildly wrong but not harmful enough
to require a change in how we handle the levels.
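The Crusoe/GeodeLX problem comes down to checking the CPU family instead of the feature that a 686 kernel actually relies on. A minimal sketch, assuming CPUID leaf 1 register values are already available: the family is taken from EAX bits 11:8 (plus the extended family in bits 27:20 when the base family is 0xf), and CMOV is EDX bit 15. The function names are hypothetical, and this is not the kernel's actual boot-time check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x00000001, EDX register */
#define CPUID_1_EDX_CMOV (1u << 15)

/* Family from CPUID leaf 1 EAX: base family in bits 11:8, plus the
 * extended family (bits 27:20) when the base family is 0xf. */
static unsigned int cpuid_family(uint32_t eax)
{
	unsigned int fam = (eax >> 8) & 0xf;

	if (fam == 0xf)
		fam += (eax >> 20) & 0xff;
	return fam;
}

/* Family-based check, in the spirit of "requires a 686 CPU". */
static bool family_check_686(uint32_t eax)
{
	return cpuid_family(eax) >= 6;
}

/* Feature-based check: what CMOV-using code actually needs. */
static bool feature_check_cmov(uint32_t edx)
{
	return (edx & CPUID_1_EDX_CMOV) != 0;
}
```

For a family-5 signature whose EDX reports CMOV, family_check_686() rejects the CPU while feature_check_cmov() accepts it, which matches the boot failure described above.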

> And yes, on x86-64, we still have the whole cmpxchg16b issue, which
> really is a slight annoyance. But the emphasis is on "slight" - we
> basically have one back for this in the SLAB code, and a couple of
> dynamic tests for one particular driver (iommu 128-bit IRTE mode).
>
> So yeah, the cmpxchg16b thing is annoying, but _realistically_ I don't
> think we care.
>
> And some day we will forget about it, notice that those (few) AMD
> early 64-bit CPUs can't possibly have been working for the last year
> or two, and we'll finally just kill that code, but in the meantime the
> cost of maintaining it is so slight that it's not worth actively going
> out to kill it.

Right, in particular my hope of turning the runtime detection into
always using compile-time configuration for cmpxchg16b no longer
works, as I noticed that RISC-V has also gained runtime detection
for system_has_cmpxchg128().

Besides cmpxchg16b, I can also see compile-time configuration
for some instructions (popcnt, tzcnt, movbe) and for 5-level
paging being useful, but not enough so to make up for the
configuration complexity.

I still think we will end up needing more compile time
configurability like this on arm64 to deal with small-memory
embedded systems, e.g. with a specialized cortex-a55 kernel
that leaves out support for other CPUs, but this is quite
different from the situation on x86-64.

> I do think that the *one* option we might have is "optimize for the
> current CPU" for people who just want to build their own kernel for
> their own machine. That's a nice easy choice to give people, and
> '-march=native' is kind of simple to use.
>
> Will that work when you cross-compile? No. Do we care? Also no. It's
> basically a simple "you want to optimize for your own local machine"
> switch.

Sure, I'll add that as a separate patch. Should it be -march=native
or -mtune=native though? Using -march= can be faster if it picks
up newer instructions, but it will eventually lead to users
running into a boot panic if it is accidentally turned on for
a kernel that runs on an older machine than it was built on.
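The -march=native versus -mtune=native trade-off can be sketched as a set difference, assuming the generated code may rely on every feature of the build host with -march=native but on nothing beyond the baseline with -mtune=native (which only changes scheduling). The feature bits, the BASELINE value and the example hosts in the note below are illustrative assumptions, not real CPUID data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, illustrative feature bits (not the CPUID layout). */
enum {
	FEAT_CX16   = 1u << 0,
	FEAT_POPCNT = 1u << 1,
	FEAT_MOVBE  = 1u << 2,
	FEAT_AVX2   = 1u << 3,
	FEAT_BMI2   = 1u << 4,
};

/* Simplified: none of the optional features above are assumed. */
#define BASELINE 0u

enum native_mode { MARCH_NATIVE, MTUNE_NATIVE };

/* Features the generated code may rely on for a given build host. */
static uint32_t required_features(enum native_mode mode, uint32_t build_host)
{
	return mode == MARCH_NATIVE ? build_host : BASELINE;
}

/* Required features the target lacks; nonzero means the kernel may
 * refuse to boot or hit an illegal instruction on that target. */
static uint32_t missing_features(uint32_t required, uint32_t target)
{
	return required & ~target;
}
```

For example, with a build host that has all five bits and a target that has only FEAT_CX16 and FEAT_MOVBE, the MARCH_NATIVE case reports FEAT_POPCNT | FEAT_AVX2 | FEAT_BMI2 as missing, while the MTUNE_NATIVE case reports nothing.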

> Maybe that could replace some of the 32-bit choices too?

Probably not. I spent hours looking through the 32-bit choices
in the hope of finding a way that is less of a mess. The current
menu mixes up instruction set level (486/586/686), optimization
(atom/k7/m3/pentiumm) and platform (elan/geode/pc) options.
This is needlessly confusing, but any change to the status quo
is going to cause more problems for existing users than it
solves. All the "interesting" embedded ones are likely to be
cross-compiled anyway, so -mtune=native or -march=native wouldn't
help them either.

     Arnd

Andy Shevchenko Dec. 5, 2024, 10:01 a.m. UTC | #11

On Thu, Dec 05, 2024 at 10:46:25AM +0100, Arnd Bergmann wrote:
> On Thu, Dec 5, 2024, at 00:33, Linus Torvalds wrote:
> > On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@arndb.de> wrote:

...

> I did come across a remaining odd problem with this, as Crusoe and
> GeodeLX both identify as Family 5 but have CMOV. Trying to run a
> CONFIG_M686+CONFIG_X86_GENERIC kernel on these fails with the boot
> error "This kernel requires a 686 CPU but only detected a 586 CPU".

It might also be that Intel Quark is affected the same way.

> As a result, the Debian 686 kernel binary gets built with
> CONFIG_MGEODE_LX, which seems mildly wrong but not harmful enough
> to require a change in how we handle the levels.

Arnd Bergmann Dec. 5, 2024, 10:09 a.m. UTC | #12

On Thu, Dec 5, 2024, at 09:13, Andy Shevchenko wrote:
> On Wed, Dec 04, 2024 at 03:33:19PM -0800, Linus Torvalds wrote:
>> On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@arndb.de> wrote:
>
> ...
>
>> Will that work when you cross-compile? No. Do we care? Also no. It's
>> basically a simple "you want to optimize for your own local machine"
>> switch.
>
> Maybe it's okay for 64-bit machines, but what about cross-compiling for 32-bit
> on 64-bit? I don't know what '-march=native -m32' (or equivalent) will give in
> such cases.

From the compiler's perspective this is nothing special, it just
builds a 32-bit binary that can use any instruction supported in
32-bit mode of that 64-bit CPU, the same as the 32-bit CONFIG_MCORE2
option that I disallow in patch 04/11.

     Arnd

Arnd Bergmann Dec. 5, 2024, 10:47 a.m. UTC | #13

On Thu, Dec 5, 2024, at 11:01, Andy Shevchenko wrote:
> On Thu, Dec 05, 2024 at 10:46:25AM +0100, Arnd Bergmann wrote:
>> On Thu, Dec 5, 2024, at 00:33, Linus Torvalds wrote:
>> > On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@arndb.de> wrote:
>
> ...
>
>> I did come across a remaining odd problem with this, as Crusoe and
>> GeodeLX both identify as Family 5 but have CMOV. Trying to run a
>> CONFIG_M686+CONFIG_X86_GENERIC kernel on these fails with the boot
>> error "This kernel requires a 686 CPU but only detected a 586 CPU".
>
> It might also be that Intel Quark is affected the same way.

No, as far as I can tell, Quark correctly identifies as Family 5
and is lacking CMOV. It does seem though that it's currently
impossible to configure a kernel for Quark that uses PAE/NX,
because there is no CONFIG_MQUARK and it relies on building
with CONFIG_M586TSC. If anyone still cared enough about it,
they could probably add an MQUARK option that lets
you build the kernel with -march=i586 -mtune=i486 and
optional PAE.

The only other one that perhaps gets misidentified is the IDT
WinChip that is claimed to support cmpxchg8b but only
identifies as Family 4. It's even less likely that anyone
cares about this one than the Quark.

     Arnd

Andy Shevchenko Dec. 5, 2024, 11:17 a.m. UTC | #14

On Thu, Dec 05, 2024 at 11:09:41AM +0100, Arnd Bergmann wrote:
> On Thu, Dec 5, 2024, at 09:13, Andy Shevchenko wrote:
> > On Wed, Dec 04, 2024 at 03:33:19PM -0800, Linus Torvalds wrote:
> >> On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@arndb.de> wrote:

...

> >> Will that work when you cross-compile? No. Do we care? Also no. It's
> >> basically a simple "you want to optimize for your own local machine"
> >> switch.
> >
> > Maybe it's okay for 64-bit machines, but what about cross-compiling for 32-bit
> > on 64-bit? I don't know what '-march=native -m32' (or equivalent) will give in
> > such cases.
> 
> From the compiler's perspective this is nothing special, it just
> builds a 32-bit binary that can use any instruction supported in
> 32-bit mode of that 64-bit CPU,

But does this affect building, e.g., for Quark on my Skylake desktop?

> the same as the 32-bit CONFIG_MCORE2 option that I disallow in patch 04/11.

Arnd Bergmann Dec. 5, 2024, 11:58 a.m. UTC | #15

On Thu, Dec 5, 2024, at 12:17, Andy Shevchenko wrote:
> On Thu, Dec 05, 2024 at 11:09:41AM +0100, Arnd Bergmann wrote:
>> On Thu, Dec 5, 2024, at 09:13, Andy Shevchenko wrote:
>> > On Wed, Dec 04, 2024 at 03:33:19PM -0800, Linus Torvalds wrote:
>> >> On Wed, 4 Dec 2024 at 11:44, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> >> Will that work when you cross-compile? No. Do we care? Also no. It's
>> >> basically a simple "you want to optimize for your own local machine"
>> >> switch.
>> >
>> > Maybe it's okay for 64-bit machines, but what about cross-compiling for 32-bit
>> > on 64-bit? I don't know what '-march=native -m32' (or equivalent) will give in
>> > such cases.
>> 
>> From the compiler's perspective this is nothing special, it just
>> builds a 32-bit binary that can use any instruction supported in
>> 32-bit mode of that 64-bit CPU,
>
> But does this affect building, e.g., for Quark on my Skylake desktop?

Not at the moment:

- the bug I'm fixing in the patch at hand is currently only present
  when building 64-bit kernels

- For a 64-bit target such as a Pineview Atom, it's only a problem
  if the toolchain default is -march=native and you build with
  CONFIG_GENERIC_CPU

- If we add support for configuring -march=native and you build
  using that option on a Skylake host, that would be equally
  broken for 32-bit Quark or 64-bit Pineview targets that are
  lacking some of the instructions present in Skylake.

As I said earlier, I don't think we should offer the 'native'
option for 32-bit targets at all. For 64-bit, we either decide
it's a user error to enable -march=native, or change it to
-mtune=native to avoid the problem.

     Arnd

Jason A. Donenfeld Dec. 5, 2024, 12:35 p.m. UTC | #16

On Thu, Dec 05, 2024 at 12:58:22PM +0100, Arnd Bergmann wrote:
> As I said earlier, I don't think we should offer the 'native'
> option for 32-bit targets at all. For 64-bit, we either decide
> it's a user error to enable -march=native, or change it to
> -mtune=native to avoid the problem.

I've been building my laptop's kernel with -march=native for years, and
I'd be happy if this capability were upstream.

Jason

David Laight Dec. 6, 2024, 1:56 p.m. UTC | #17

From: Arnd Bergmann
> Sent: 04 December 2024 10:31
> Building an x86-64 kernel with CONFIG_GENERIC_CPU is documented to
> run on all CPUs, but the Makefile does not actually pass an -march=
> argument, instead relying on the default that was used to configure
> the toolchain.
> 
> In many cases, gcc will be configured to -march=x86-64 or -march=k8
> for maximum compatibility, but in other cases a distribution default
> may be either raised to a more recent ISA, or set to -march=native
> to build for the CPU used for compilation. This still works in the
> case of building a custom kernel for the local machine.
> 
> The point where it breaks down is building a kernel for another
> machine that is older than the default target. Changing the default
> to -march=x86-64 would make it work reliably, but possibly produce
> worse code on distros that intentionally default to a newer ISA.
> 
> To allow reliably building a kernel for either the oldest x86-64
> CPUs or a more recent level, add three separate options for
> v1, v2 and v3 of the architecture as defined by gcc and clang
> and make them all turn on CONFIG_GENERIC_CPU. Based on this it
> should be possible to change runtime feature detection into
> build-time detection for things like cmpxchg16b, or possibly
> gate features that are only available on older architectures.
> 
> Link: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143289.html
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/x86/Kconfig.cpu | 39 ++++++++++++++++++++++++++++++++++-----
>  arch/x86/Makefile    |  6 ++++++
>  2 files changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
> index 139db904e564..1461a739237b 100644
> --- a/arch/x86/Kconfig.cpu
> +++ b/arch/x86/Kconfig.cpu
> @@ -260,7 +260,7 @@ endchoice
>  choice
>  	prompt "x86-64 Processor family"
>  	depends on X86_64
> -	default GENERIC_CPU
> +	default X86_64_V2
>  	help
>  	  This is the processor type of your CPU. This information is
>  	  used for optimizing purposes. In order to compile a kernel
> @@ -314,15 +314,44 @@ config MSILVERMONT
>  	  early Atom CPUs based on the Bonnell microarchitecture,
>  	  such as Atom 230/330, D4xx/D5xx, D2xxx, N2xxx or Z2xxx.
> 
> -config GENERIC_CPU
> -	bool "Generic-x86-64"
> +config X86_64_V1
> +	bool "Generic x86-64"
>  	depends on X86_64
>  	help
> -	  Generic x86-64 CPU.
> -	  Run equally well on all x86-64 CPUs.
> +	  Generic x86-64-v1 CPU.
> +	  Run equally well on all x86-64 CPUs, including early Pentium-4
> +	  variants lacking the sahf and cmpxchg16b instructions as well
> +	  as the AMD K8 and Intel Core 2 lacking popcnt.

The 'equally well' text was clearly always wrong (equally badly?)
but is now just 'plain wrong'.
Perhaps:
	Runs on all x86-64 CPUs, including early CPUs that lack the sahf,
	cmpxchg16b and popcnt instructions.

Then for V2 (or whatever it gets called)
	Requires support for the sahf, cmpxchg16b and popcnt instructions.
	This will not run on AMD K8 or Intel before Nehalem.

I think someone suggested that run-time detection of AVX/AVX2/AVX512
is fine?

	David

> +
> +config X86_64_V2
> +	bool "Generic x86-64 v2"
> +	depends on X86_64
> +	help
> +	  Generic x86-64-v2 CPU.
> +	  Run equally well on all x86-64 CPUs that meet the x86-64-v2
> +	  definition as well as those that only miss the optional
> +	  SSE3/SSSE3/SSE4.1 portions.
> +	  Examples of this include Intel Nehalem and Silvermont,
> +	  AMD Bulldozer (K10) and Jaguar as well as VIA Nano that
> +	  include popcnt, cmpxchg16b and sahf.
> +
> +config X86_64_V3
> +	bool "Generic x86-64 v3"
> +	depends on X86_64
> +	help
> +	  Generic x86-64-v3 CPU.
> +	  Run equally well on all x86-64 CPUs that meet the x86-64-v3
> +	  definition as well as those that only miss the optional
> +	  AVX/AVX2 portions.
> +	  Examples of this include the Intel Haswell and AMD Excavator
> +	  microarchitectures that include the bmi1/bmi2, lzcnt, movbe
> +	  and xsave instruction set extensions.
> 
>  endchoice
> 
> +config GENERIC_CPU
> +	def_bool X86_64_V1 || X86_64_V2 || X86_64_V3
> +
>  config X86_GENERIC
>  	bool "Generic x86 support"
>  	depends on X86_32
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 05887ae282f5..1fdc3fc6a54e 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -183,6 +183,9 @@ else
>          cflags-$(CONFIG_MPSC)		+= -march=nocona
>          cflags-$(CONFIG_MCORE2)		+= -march=core2
>          cflags-$(CONFIG_MSILVERMONT)	+= -march=silvermont
> +        cflags-$(CONFIG_MX86_64_V1)	+= -march=x86-64
> +        cflags-$(CONFIG_MX86_64_V2)	+= $(call cc-option,-march=x86-64-v2,-march=x86-64)
> +        cflags-$(CONFIG_MX86_64_V3)	+= $(call cc-option,-march=x86-64-v3,-march=x86-64)
>          cflags-$(CONFIG_GENERIC_CPU)	+= -mtune=generic
>          KBUILD_CFLAGS += $(cflags-y)
> 
> @@ -190,6 +193,9 @@ else
>          rustflags-$(CONFIG_MPSC)	+= -Ctarget-cpu=nocona
>          rustflags-$(CONFIG_MCORE2)	+= -Ctarget-cpu=core2
>          rustflags-$(CONFIG_MSILVERMONT)	+= -Ctarget-cpu=silvermont
> +        rustflags-$(CONFIG_MX86_64_V1)	+= -Ctarget-cpu=x86-64
> +        rustflags-$(CONFIG_MX86_64_V2)	+= -Ctarget-cpu=x86-64-v2
> +        rustflags-$(CONFIG_MX86_64_V3)	+= -Ctarget-cpu=x86-64-v3
>          rustflags-$(CONFIG_GENERIC_CPU)	+= -Ztune-cpu=generic
>          KBUILD_RUSTFLAGS += $(rustflags-y)
> 
> --
> 2.39.5
> 

