Message ID | 20231004002038.907778-1-jmattson@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86: KVM: Add feature flag for AMD's FsGsKernelGsBaseNonSerializing | expand |
On 10/3/23 17:20, Jim Mattson wrote: > Define an X86_FEATURE_* flag for > CPUID.80000021H:EAX.FsGsKernelGsBaseNonSerializing[bit 1], and > advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. ... > +#define X86_FEATURE_BASES_NON_SERIAL (20*32+ 1) /* "" FSBASE, GSBASE, and KERNELGSBASE are non-serializing */ This is failing to differentiate two *VERY* different things. FSBASE, GSBASE, and KERNELGSBASE themselves are registers. They have *NOTHING* to do with serialization. WRFSBASE, for instance is not serializing. Reading (with RDMSR) or using any of those three registers is not serializing. The *ONLY* thing that relates them to serialization is the WRMSR instruction which itself is (mostly) architecturally serializing and the fact that WRMSR has historically been the main way to write those three registers. The AMD docs call this out, which helps. But the changelog, comments and probably the feature naming need some work. Why does this matter, btw? Why do guests need this bit passed through?
On Tue, Oct 3, 2023 at 5:57 PM Dave Hansen <dave.hansen@intel.com> wrote: > > On 10/3/23 17:20, Jim Mattson wrote: > > Define an X86_FEATURE_* flag for > > CPUID.80000021H:EAX.FsGsKernelGsBaseNonSerializing[bit 1], and > > advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. > ... > > +#define X86_FEATURE_BASES_NON_SERIAL (20*32+ 1) /* "" FSBASE, GSBASE, and KERNELGSBASE are non-serializing */ > > This is failing to differentiate two *VERY* different things. > > FSBASE, GSBASE, and KERNELGSBASE themselves are registers. They have > *NOTHING* to do with serialization. WRFSBASE, for instance is not > serializing. Reading (with RDMSR) or using any of those three registers > is not serializing. > > The *ONLY* thing that relates them to serialization is the WRMSR > instruction which itself is (mostly) architecturally serializing and the > fact that WRMSR has historically been the main way to write those three > registers. > > The AMD docs call this out, which helps. But the changelog, comments > and probably the feature naming need some work. You're right; I was overly terse. I'll elucidate in v2. > Why does this matter, btw? Why do guests need this bit passed through? The business of declaring breaking changes to the architectural specification in a CPUID bit has never made much sense to me. Legacy software that depends on the original architectural specification isn't going to query the CPUID bit, because the CPUID bit didn't exist when it was written. New software probably isn't going to query the CPUID bit, either, because it has to have an implementation that works on newer processors regardless. Why, then, would a developer bother to provide an implementation that only works on older processors *and* the code to select an implementation based on a CPUID bit? Take, for example, CPUID.(EAX=7,ECX=0):EBX[bit 13], which, IIRC, was the first CPUID bit of the "Ha ha; we're changing the architectural specification" category. When Intel introduced this new behavior in Haswell, they broke WIN87EM.DLL in Windows XP (see https://communities.vmware.com/t5/Legacy-User-Blogs/General-Protection-Fault-in-module-WIN87EM-DLL-at-0001-02C6/ta-p/2770422). I know of at least three software packages commonly running in VMs that were broken as a result. The CPUID bit didn't solve any problems, and I doubt that any software queries that bit today. As a hypervisor developer, however, it's not up to me to make value judgments on individual CPUID bits. If a bit indicates an innate characteristic of the hardware, it should be passed through. No one is likely to query CPUID.80000021H.EAX[bit 21] today, but if someone does query the bit in the future, they can reasonably expect that WRMSR({FS,GS,KERNELGS}_BASE) is a serializing operation whenever this bit is clear. Therefore, any hypervisor that doesn't pass the bit through is broken. Sadly, this also means that for a heterogenous migration pool, the hypervisor must set this bit in the guest CPUID if it is set on any host in the pool. Yes, that means that the legacy behavior may sometimes be present in a VM that enumerates the CPUID bit, but that's the best we can do. I'm a little surprised at the pushback, TBH. Are you implying that there is some advantage to *not* passing this bit through?
On 10/3/23 19:44, Jim Mattson wrote: > I'm a little surprised at the pushback, TBH. Are you implying that > there is some advantage to *not* passing this bit through? I'm not really trying to push back. I'm honestly just curious. Linux obviously doesn't cat about the bit. So is this for some future Linux or some other OS?
On Tue, Oct 3, 2023 at 8:27 PM Dave Hansen <dave.hansen@intel.com> wrote: > > On 10/3/23 19:44, Jim Mattson wrote: > > I'm a little surprised at the pushback, TBH. Are you implying that > > there is some advantage to *not* passing this bit through? > > I'm not really trying to push back. I'm honestly just curious. Linux > obviously doesn't cat about the bit. So is this for some future Linux > or some other OS? It's not for any particular guest OS. It's just for correctness of the virtual CPU. Pedantically, hardware that enumerates this bit cannot run a guest that doesn't. Pragmatically, it almost certainly doesn't matter. Getting it right is trivial and has no impact on performance or code size, so why not just do it?
On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: > The business of declaring breaking changes to the architectural > specification in a CPUID bit has never made much sense to me. How else should they be expressed then? In some flaky PDF which changes URLs whenever the new corporate CMS gets installed? Or we should do f/m/s matching which doesn't make any sense for VMs? When you think about it, CPUID is the best thing we have. > No one is likely to query CPUID.80000021H.EAX[bit 21] today, but if > someone does query the bit in the future, they can reasonably expect > that WRMSR({FS,GS,KERNELGS}_BASE) is a serializing operation whenever > this bit is clear. Therefore, any hypervisor that doesn't pass the bit > through is broken. Sadly, this also means that for a heterogenous > migration pool, the hypervisor must set this bit in the guest CPUID if > it is set on any host in the pool. Yes, that means that the legacy > behavior may sometimes be present in a VM that enumerates the CPUID > bit, but that's the best we can do. Yes, add this to your commit message. > I'm a little surprised at the pushback, TBH. Are you implying that > there is some advantage to *not* passing this bit through? We don't add stuff which is not worth adding. There has to be *at* *least* some justification for it. Thx.
On Wed, Oct 4, 2023 at 12:59 AM Borislav Petkov <bp@alien8.de> wrote: > > On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: > > The business of declaring breaking changes to the architectural > > specification in a CPUID bit has never made much sense to me. > > How else should they be expressed then? > > In some flaky PDF which changes URLs whenever the new corporate CMS gets > installed? > > Or we should do f/m/s matching which doesn't make any sense for VMs? > > When you think about it, CPUID is the best thing we have. > > > No one is likely to query CPUID.80000021H.EAX[bit 21] today, but if > > someone does query the bit in the future, they can reasonably expect > > that WRMSR({FS,GS,KERNELGS}_BASE) is a serializing operation whenever > > this bit is clear. Therefore, any hypervisor that doesn't pass the bit > > through is broken. Sadly, this also means that for a heterogenous > > migration pool, the hypervisor must set this bit in the guest CPUID if > > it is set on any host in the pool. Yes, that means that the legacy > > behavior may sometimes be present in a VM that enumerates the CPUID > > bit, but that's the best we can do. > > Yes, add this to your commit message. > > > I'm a little surprised at the pushback, TBH. Are you implying that > > there is some advantage to *not* passing this bit through? > > We don't add stuff which is not worth adding. There has to be *at* > *least* some justification for it. Let me propose the following axiom as justification: KVM_GET_SUPPORTED_CPUID must pass through any defeature bits that are set on the host, unless KVM is prepared to emulate the missing feature. Here, a defeature bit is any CPUID bit where a value of '1' indicates the absence of a feature. > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 4, 2023 at 12:59 AM Borislav Petkov <bp@alien8.de> wrote: > > On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: > > The business of declaring breaking changes to the architectural > > specification in a CPUID bit has never made much sense to me. > > How else should they be expressed then? > > In some flaky PDF which changes URLs whenever the new corporate CMS gets > installed? > > Or we should do f/m/s matching which doesn't make any sense for VMs? > > When you think about it, CPUID is the best thing we have. Every time a new defeature bit is introduced, it breaks existing hypervisors, because no one can predict ahead of time that these bits have to be passed through. I wonder if we could convince x86 CPU vendors to put all defeature bits under a single leaf, so that we can just set the entire leaf to all 1's in KVM_GET_SUPPORTED_CPUID.
On 10/5/23 09:22, Jim Mattson wrote: > On Wed, Oct 4, 2023 at 12:59 AM Borislav Petkov <bp@alien8.de> wrote: >> On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: >>> The business of declaring breaking changes to the architectural >>> specification in a CPUID bit has never made much sense to me. >> How else should they be expressed then? >> >> In some flaky PDF which changes URLs whenever the new corporate CMS gets >> installed? >> >> Or we should do f/m/s matching which doesn't make any sense for VMs? >> >> When you think about it, CPUID is the best thing we have. > Every time a new defeature bit is introduced, it breaks existing > hypervisors, because no one can predict ahead of time that these bits > have to be passed through. > > I wonder if we could convince x86 CPU vendors to put all defeature > bits under a single leaf, so that we can just set the entire leaf to > all 1's in KVM_GET_SUPPORTED_CPUID. I hope I'm not throwing stones from a glass house here... But I'm struggling to think of cases where Intel has read-only "defeature bits" like this one. There are certainly things like MSR_IA32_MISC_ENABLE_FAST_STRING that can be toggled, but read-only indicators of a departure from established architecture seems ... suboptimal. It's arguable that TDX changed a bunch of architecture like causing exceptions on CPUID and MSRs that never caused exceptions before and _that_ constitutes a defeature. But that's the least of the problems for a TDX VM. :) (Seriously, I'm not trying to shame Intel's x86 fellow travelers here, just trying to make sure I'm not missing something).
On 10/4/23 09:58, Borislav Petkov wrote: > On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: >> The business of declaring breaking changes to the architectural >> specification in a CPUID bit has never made much sense to me. > How else should they be expressed then? > > In some flaky PDF which changes URLs whenever the new corporate CMS gets > installed? > > Or we should do f/m/s matching which doesn't make any sense for VMs? Nothing *needs* to be done other than documenting this retroactive change to what constitutes architectural behavior. It's not a CPUID that can be queried to change behavior; the user can use CPUID to diagnose that something has broken, but the broken program cannot know in the first place that the CPUID bit exists. I agree with Jim that it would be nice to have some bits from Intel, and some bits from AMD, that current processors always return as 1. Future processors can change those to 0 as desired. Intel did something similar with VMX. They have a bunch of bits for which we don't know the meaning, but we know it is something that "right now always causes vmexits". Even if in the future you might be able to disable it, the polarity of the bit is the same as for all other vmexit controls. Paolo
On Thu, Oct 5, 2023 at 9:35 AM Dave Hansen <dave.hansen@intel.com> wrote: > > On 10/5/23 09:22, Jim Mattson wrote: > > On Wed, Oct 4, 2023 at 12:59 AM Borislav Petkov <bp@alien8.de> wrote: > >> On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: > >>> The business of declaring breaking changes to the architectural > >>> specification in a CPUID bit has never made much sense to me. > >> How else should they be expressed then? > >> > >> In some flaky PDF which changes URLs whenever the new corporate CMS gets > >> installed? > >> > >> Or we should do f/m/s matching which doesn't make any sense for VMs? > >> > >> When you think about it, CPUID is the best thing we have. > > Every time a new defeature bit is introduced, it breaks existing > > hypervisors, because no one can predict ahead of time that these bits > > have to be passed through. > > > > I wonder if we could convince x86 CPU vendors to put all defeature > > bits under a single leaf, so that we can just set the entire leaf to > > all 1's in KVM_GET_SUPPORTED_CPUID. > > I hope I'm not throwing stones from a glass house here... > > But I'm struggling to think of cases where Intel has read-only > "defeature bits" like this one. There are certainly things like > MSR_IA32_MISC_ENABLE_FAST_STRING that can be toggled, but read-only > indicators of a departure from established architecture seems ... > suboptimal. > > It's arguable that TDX changed a bunch of architecture like causing > exceptions on CPUID and MSRs that never caused exceptions before and > _that_ constitutes a defeature. But that's the least of the problems > for a TDX VM. :) > > (Seriously, I'm not trying to shame Intel's x86 fellow travelers here, > just trying to make sure I'm not missing something). Intel's defeature bits that I know of are: CPUID.(EAX=7,ECX=0):EBX[bit 13] (Haswell) - "Deprecates FPU CS and FPU DS values if 1." CPUID.(EAX=7,ECX=0):EBX[bit 6] (Skylake) - "FDP_EXCPTN_ONLY. x87 FPU Data Pointer updated only on x87 exceptions if 1."
On 10/5/23 18:41, Jim Mattson wrote: >> I hope I'm not throwing stones from a glass house here... >> >> But I'm struggling to think of cases where Intel has read-only >> "defeature bits" like this one. There are certainly things like >> MSR_IA32_MISC_ENABLE_FAST_STRING that can be toggled, but read-only >> indicators of a departure from established architecture seems ... >> suboptimal. >> >> It's arguable that TDX changed a bunch of architecture like causing >> exceptions on CPUID and MSRs that never caused exceptions before and >> _that_ constitutes a defeature. But that's the least of the problems >> for a TDX VM.
On Thu, Oct 5, 2023 at 9:39 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > I agree with Jim that it would be nice to have some bits from Intel, and > some bits from AMD, that current processors always return as 1. Future > processors can change those to 0 as desired. That's not quite what I meant. Today, hypervisors will not pass through a non-zero CPUID bit that they don't know the definition of. This makes sense for positive features, and for multi-bit fields. I'm suggesting a leaf devoted to single bit negative features. If a bit is set in hardware, it means that something has been taken away. Hypervisors don't need to know exactly what was taken away. For this leaf only, hypervisors will always pass through a non-zero bit, even if they have know idea what it means.
On 10/5/23 19:06, Jim Mattson wrote: > On Thu, Oct 5, 2023 at 9:39 AM Paolo Bonzini<pbonzini@redhat.com> wrote: > >> I agree with Jim that it would be nice to have some bits from Intel, and >> some bits from AMD, that current processors always return as 1. Future >> processors can change those to 0 as desired. > That's not quite what I meant. > > I'm suggesting a leaf devoted to single bit negative features. If a > bit is set in hardware, it means that something has been taken away. > Hypervisors don't need to know exactly what was taken away. For this > leaf only, hypervisors will always pass through a non-zero bit, even > if they have know idea what it means. Understood, but I'm suggesting that these might even have the right polarity: if a bit is set it means that something is there and might not in the future, even if we don't know exactly what. We can pass through the bit, we can AND bits across the migration pool to define what to pass to the guest, we can force-set the leaves to zero (feature removed). Either way, the point is to group future defeatures together. That said, these bits are only for documentation/debugging purposes anyway. So I like the idea because it would educate the architects about this issue, more than because it is actually useful... Paolo
On Thu, Oct 5, 2023 at 10:14 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 10/5/23 19:06, Jim Mattson wrote: > > On Thu, Oct 5, 2023 at 9:39 AM Paolo Bonzini<pbonzini@redhat.com> wrote: > > > >> I agree with Jim that it would be nice to have some bits from Intel, and > >> some bits from AMD, that current processors always return as 1. Future > >> processors can change those to 0 as desired. > > That's not quite what I meant. > > > > I'm suggesting a leaf devoted to single bit negative features. If a > > bit is set in hardware, it means that something has been taken away. > > Hypervisors don't need to know exactly what was taken away. For this > > leaf only, hypervisors will always pass through a non-zero bit, even > > if they have know idea what it means. > > Understood, but I'm suggesting that these might even have the right > polarity: if a bit is set it means that something is there and might not > in the future, even if we don't know exactly what. We can pass through > the bit, we can AND bits across the migration pool to define what to > pass to the guest, we can force-set the leaves to zero (feature > removed). Either way, the point is to group future defeatures together. Oh, yeah. Your suggestion is better. :)
On 10/5/23 09:41, Jim Mattson wrote: >> >> But I'm struggling to think of cases where Intel has read-only >> "defeature bits" like this one. There are certainly things like >> MSR_IA32_MISC_ENABLE_FAST_STRING that can be toggled, but read-only >> indicators of a departure from established architecture seems ... >> suboptimal. ... > CPUID.(EAX=7,ECX=0):EBX[bit 13] (Haswell) - "Deprecates FPU CS and FPU > DS values if 1." > CPUID.(EAX=7,ECX=0):EBX[bit 6] (Skylake) - "FDP_EXCPTN_ONLY. x87 FPU > Data Pointer updated only on x87 exceptions if 1." Thanks! Trying to group these does make sense to me. I don't think people take architecture breakage lightly, but I certainly never considered that it might, for instance, be important enough to create a new VM migration pool. I'll try to keep an eye out for these.
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 58cb9495e40f..b53951c83d1d 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -443,6 +443,7 @@ /* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */ #define X86_FEATURE_NO_NESTED_DATA_BP (20*32+ 0) /* "" No Nested Data Breakpoints */ +#define X86_FEATURE_BASES_NON_SERIAL (20*32+ 1) /* "" FSBASE, GSBASE, and KERNELGSBASE are non-serializing */ #define X86_FEATURE_LFENCE_RDTSC (20*32+ 2) /* "" LFENCE always serializing / synchronizes RDTSC */ #define X86_FEATURE_NULL_SEL_CLR_BASE (20*32+ 6) /* "" Null Selector Clears Base */ #define X86_FEATURE_AUTOIBRS (20*32+ 8) /* "" Automatic IBRS */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0544e30b4946..5e776e8619be 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -761,7 +761,8 @@ void kvm_set_cpu_caps(void) kvm_cpu_cap_mask(CPUID_8000_0021_EAX, F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ | - F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ + F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ | + F(BASES_NON_SERIAL) ); if (cpu_feature_enabled(X86_FEATURE_SRSO_NO))
Define an X86_FEATURE_* flag for CPUID.80000021H:EAX.FsGsKernelGsBaseNonSerializing[bit 1], and advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. This feature is not yet documented in the APM. See AMD's "Processor Programming Reference (PPR) for AMD Family 19h Model 61h, Revision B1 Processors (56713-B1-PUB)." Signed-off-by: Jim Mattson <jmattson@google.com> --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kvm/cpuid.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-)