Message ID | 20200302195736.24777-3-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86: CPUID emulation and tracing fixes | expand |
On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > Extend the mask in cpuid_function_in_range() for finding the "class" of > the function to 0xfffffff00. While there is no official definition of > what constitutes a class, e.g. arguably bits 31:16 should be the class > and bits 15:0 the functions within that class, the Hypervisor logic > effectively uses bits 31:8 as the class by virtue of checking for > different bases in increments of 0x100, e.g. KVM advertises its CPUID > functions starting at 0x40000100 when HyperV features are advertised at > the default base of 0x40000000. This convention deserves explicit documentation outside of the commit message. > Masking against 0x80000000 only handles basic and extended leafs, which > results in Centaur and Hypervisor range checks being performed against > the basic CPUID range, e.g. if CPUID.0x40000000.EAX=0x4000000A and there > is no entry for CPUID.0x40000006, then function 0x40000006 would be > incorrectly reported as out of bounds. > > The bad range check doesn't cause function problems for any known VMM > because out-of-range semantics only come into play if the exact entry > isn't found, and VMMs either support a very limited Hypervisor range, > e.g. the official KVM range is 0x40000000-0x40000001 (effectively no > room for undefined leafs) or explicitly defines gaps to be zero, e.g. > Qemu explicitly creates zeroed entries up to the Cenatur and Hypervisor > limits (the latter comes into play when providing HyperV features). Does Centaur implement the bizarre Intel behavior for out-of-bound entries? It seems that if there are Centaur leaves defined, the CPUD semantics should be those specified by Centaur. > The bad behavior can be visually confirmed by dumping CPUID output in > the guest when running Qemu with a stable TSC, as Qemu extends the limit > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, > without defining zeroed entries for 0x40000002 - 0x4000000f. > > Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH") > Cc: Jim Mattson <jmattson@google.com> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > --- > arch/x86/kvm/cpuid.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 6be012937eba..c320126e0118 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -993,7 +993,7 @@ static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function) > { > struct kvm_cpuid_entry2 *max; > > - max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0); > + max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00u, 0); This assumes that CPUID.(function & 0xffffff00):EAX always contains the maximum input value for the 256-entry range sharing the high 24 bits. I don't believe that convention has ever been established or documented. > return max && function <= max->eax; > } > > -- > 2.24.1 >
On Mon, Mar 02, 2020 at 01:59:10PM -0800, Jim Mattson wrote: > On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > Extend the mask in cpuid_function_in_range() for finding the "class" of > > the function to 0xfffffff00. While there is no official definition of > > what constitutes a class, e.g. arguably bits 31:16 should be the class > > and bits 15:0 the functions within that class, the Hypervisor logic > > effectively uses bits 31:8 as the class by virtue of checking for > > different bases in increments of 0x100, e.g. KVM advertises its CPUID > > functions starting at 0x40000100 when HyperV features are advertised at > > the default base of 0x40000000. > > This convention deserves explicit documentation outside of the commit message. No argument there. > > Masking against 0x80000000 only handles basic and extended leafs, which > > results in Centaur and Hypervisor range checks being performed against > > the basic CPUID range, e.g. if CPUID.0x40000000.EAX=0x4000000A and there > > is no entry for CPUID.0x40000006, then function 0x40000006 would be > > incorrectly reported as out of bounds. > > > > The bad range check doesn't cause function problems for any known VMM > > because out-of-range semantics only come into play if the exact entry > > isn't found, and VMMs either support a very limited Hypervisor range, > > e.g. the official KVM range is 0x40000000-0x40000001 (effectively no > > room for undefined leafs) or explicitly defines gaps to be zero, e.g. > > Qemu explicitly creates zeroed entries up to the Cenatur and Hypervisor > > limits (the latter comes into play when providing HyperV features). > > Does Centaur implement the bizarre Intel behavior for out-of-bound > entries? It seems that if there are Centaur leaves defined, the CPUD > semantics should be those specified by Centaur. Ah, right, because this code triggers on !=AMD, not ==Intel. Your guess is as good as mine, I've dug around a few times trying to track down a spec for Centaur/VIA without success. I would say that KVM's emulation behavior should probably be all or nothing, i.e. either due Intel's silly logic for all ranges/classes or do it for none. > > The bad behavior can be visually confirmed by dumping CPUID output in > > the guest when running Qemu with a stable TSC, as Qemu extends the limit > > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, > > without defining zeroed entries for 0x40000002 - 0x4000000f. > > > > Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH") > > Cc: Jim Mattson <jmattson@google.com> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > > --- > > arch/x86/kvm/cpuid.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > index 6be012937eba..c320126e0118 100644 > > --- a/arch/x86/kvm/cpuid.c > > +++ b/arch/x86/kvm/cpuid.c > > @@ -993,7 +993,7 @@ static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function) > > { > > struct kvm_cpuid_entry2 *max; > > > > - max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0); > > + max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00u, 0); > > This assumes that CPUID.(function & 0xffffff00):EAX always contains > the maximum input value for the 256-entry range sharing the high 24 > bits. I don't believe that convention has ever been established or > documented. Not sure if it's formally documented, but it's well established. The closest thing I could find to documentation is the lkml thread where what's implemented today (AFAICT) was proposed. https://lore.kernel.org/lkml/48E3BBC1.2050607@goop.org/ The relevant linux code in Linux (arch/x86/include/asm/processor.h), where @leaves contains the kernel's required minimum leaf to enable paravirt stuff for the hypervisor. static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves) { uint32_t base, eax, signature[3]; for (base = 0x40000000; base < 0x40010000; base += 0x100) { cpuid(base, &eax, &signature[0], &signature[1], &signature[2]); if (!memcmp(sig, signature, 12) && (leaves == 0 || ((eax - base) >= leaves))) return base; } return 0; } > > return max && function <= max->eax; > > } > > > > -- > > 2.24.1 > >
On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > The bad behavior can be visually confirmed by dumping CPUID output in > the guest when running Qemu with a stable TSC, as Qemu extends the limit > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, > without defining zeroed entries for 0x40000002 - 0x4000000f. I think it could be reasonably argued that this is a userspace bug. Clearly, when userspace explicitly supplies the results for a leaf, those results override the default CPUID values for that leaf. But I haven't seen it documented anywhere that leaves *not* explicitly supplied by userspace will override the default CPUID values, just because they happen to appear in some magic range.
On Mon, Mar 2, 2020 at 7:25 PM Jim Mattson <jmattson@google.com> wrote: > > On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > The bad behavior can be visually confirmed by dumping CPUID output in > > the guest when running Qemu with a stable TSC, as Qemu extends the limit > > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, > > without defining zeroed entries for 0x40000002 - 0x4000000f. > > I think it could be reasonably argued that this is a userspace bug. > Clearly, when userspace explicitly supplies the results for a leaf, > those results override the default CPUID values for that leaf. But I > haven't seen it documented anywhere that leaves *not* explicitly > supplied by userspace will override the default CPUID values, just > because they happen to appear in some magic range. In fact, the more I think about it, the original change is correct, at least in this regard. Your "fix" introduces undocumented and unfathomable behavior.
On Mon, Mar 02, 2020 at 08:25:31PM -0800, Jim Mattson wrote: > On Mon, Mar 2, 2020 at 7:25 PM Jim Mattson <jmattson@google.com> wrote: > > > > On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson > > <sean.j.christopherson@intel.com> wrote: > > > > > The bad behavior can be visually confirmed by dumping CPUID output in > > > the guest when running Qemu with a stable TSC, as Qemu extends the limit > > > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, > > > without defining zeroed entries for 0x40000002 - 0x4000000f. > > > > I think it could be reasonably argued that this is a userspace bug. > > Clearly, when userspace explicitly supplies the results for a leaf, > > those results override the default CPUID values for that leaf. But I > > haven't seen it documented anywhere that leaves *not* explicitly > > supplied by userspace will override the default CPUID values, just > > because they happen to appear in some magic range. > > In fact, the more I think about it, the original change is correct, at > least in this regard. Your "fix" introduces undocumented and > unfathomable behavior. Heh, the takeaway from this is that whatever we decide on needs to be documented somewhere :-) I wouldn't say it's unfathomable, conceptually it seems like the intent of the hypervisor range was to mimic the basic and extended ranges. The whole thing is arbitrary behavior. Of course if Intel CPUs would just return 0s on undefined leafs it would be a lot less arbitrary :-) Anyways, I don't have a strong opinion on whether this patch stays or goes.
Unfathomable was the wrong word. I can see what you're trying to do. I just don't think it's defensible. I suspect that Intel CPU architects will be surprised and disappointed to find that the maximum effective value of CPUID.0H:EAX is now 255, and that they have to define CPUID.100H:EAX as the "maximum leaf between 100H and 1FFH" if they want to define any leaves between 100H and 1FFH. Furthermore, AMD has only ceded 4000_0000h through 4000_00FFh to hypervisors, so kvm's use of 40000100H through 400001FFH appears to be a land grab, akin to VIA's unilateral grab of the C0000000H leaves. Admittedly, one could argue that the 40000000H leaves are not AMD's to apportion, since AMD and Intel appear to have reached a detente by splitting the available space down the middle. Intel, who seems to be the recognized authority for this range, declares the entire range from 40000000H through 4FFFFFFFH to be invalid. Make of that what you will. In any event, no one has ever documented what's supposed to happen if you leave gaps in the 4xxxxxxxH range when defining synthesized CPUID leaves under kvm. On Mon, Mar 2, 2020 at 8:58 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Mon, Mar 02, 2020 at 08:25:31PM -0800, Jim Mattson wrote: > > On Mon, Mar 2, 2020 at 7:25 PM Jim Mattson <jmattson@google.com> wrote: > > > > > > On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson > > > <sean.j.christopherson@intel.com> wrote: > > > > > > > The bad behavior can be visually confirmed by dumping CPUID output in > > > > the guest when running Qemu with a stable TSC, as Qemu extends the limit > > > > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, > > > > without defining zeroed entries for 0x40000002 - 0x4000000f. > > > > > > I think it could be reasonably argued that this is a userspace bug. > > > Clearly, when userspace explicitly supplies the results for a leaf, > > > those results override the default CPUID values for that leaf. But I > > > haven't seen it documented anywhere that leaves *not* explicitly > > > supplied by userspace will override the default CPUID values, just > > > because they happen to appear in some magic range. > > > > In fact, the more I think about it, the original change is correct, at > > least in this regard. Your "fix" introduces undocumented and > > unfathomable behavior. > > Heh, the takeaway from this is that whatever we decide on needs to be > documented somewhere :-) > > I wouldn't say it's unfathomable, conceptually it seems like the intent > of the hypervisor range was to mimic the basic and extended ranges. The > whole thing is arbitrary behavior. Of course if Intel CPUs would just > return 0s on undefined leafs it would be a lot less arbitrary :-) > > Anyways, I don't have a strong opinion on whether this patch stays or goes.
On Tue, Mar 03, 2020 at 09:42:42AM -0800, Jim Mattson wrote: > Unfathomable was the wrong word. I dunno, one could argue that the behavior of Intel CPUs for CPUID is unfathomable and I was just trying to follow suit :-D > I can see what you're trying to do. I > just don't think it's defensible. I suspect that Intel CPU architects > will be surprised and disappointed to find that the maximum effective > value of CPUID.0H:EAX is now 255, and that they have to define > CPUID.100H:EAX as the "maximum leaf between 100H and 1FFH" if they > want to define any leaves between 100H and 1FFH. Hmm, ya, I agree that applying a 0xffffff00 mask to all classes of CPUID ranges is straight up wrong. > Furthermore, AMD has only ceded 4000_0000h through 4000_00FFh to > hypervisors, so kvm's use of 40000100H through 400001FFH appears to be > a land grab, akin to VIA's unilateral grab of the C0000000H leaves. > Admittedly, one could argue that the 40000000H leaves are not AMD's to > apportion, since AMD and Intel appear to have reached a detente by > splitting the available space down the middle. Intel, who seems to be > the recognized authority for this range, declares the entire range > from 40000000H through 4FFFFFFFH to be invalid. Make of that what you > will. > > In any event, no one has ever documented what's supposed to happen if > you leave gaps in the 4xxxxxxxH range when defining synthesized CPUID > leaves under kvm. Probably stating the obvious, but for me, the least suprising thing is for such leafs to output zeros. It also feels safer, e.g. a guest that's querying hypervisor support is less likely to be led astray by all zeros than by a random feature bits being set. What about something like this? Along with a comment and documentation... static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function) { struct kvm_cpuid_entry2 *max; if (function >= 0x40000000 && function <= 0x4fffffff) max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00, 0); else max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0); return max && function <= max->eax; } > On Mon, Mar 2, 2020 at 8:58 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > On Mon, Mar 02, 2020 at 08:25:31PM -0800, Jim Mattson wrote: > > > On Mon, Mar 2, 2020 at 7:25 PM Jim Mattson <jmattson@google.com> wrote: > > > > > > > > On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson > > > > <sean.j.christopherson@intel.com> wrote: > > > > > > > > > The bad behavior can be visually confirmed by dumping CPUID output in > > > > > the guest when running Qemu with a stable TSC, as Qemu extends the limit > > > > > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, > > > > > without defining zeroed entries for 0x40000002 - 0x4000000f. > > > > > > > > I think it could be reasonably argued that this is a userspace bug. > > > > Clearly, when userspace explicitly supplies the results for a leaf, > > > > those results override the default CPUID values for that leaf. But I > > > > haven't seen it documented anywhere that leaves *not* explicitly > > > > supplied by userspace will override the default CPUID values, just > > > > because they happen to appear in some magic range. > > > > > > In fact, the more I think about it, the original change is correct, at > > > least in this regard. Your "fix" introduces undocumented and > > > unfathomable behavior. > > > > Heh, the takeaway from this is that whatever we decide on needs to be > > documented somewhere :-) > > > > I wouldn't say it's unfathomable, conceptually it seems like the intent > > of the hypervisor range was to mimic the basic and extended ranges. The > > whole thing is arbitrary behavior. Of course if Intel CPUs would just > > return 0s on undefined leafs it would be a lot less arbitrary :-) > > > > Anyways, I don't have a strong opinion on whether this patch stays or goes.
On Tue, Mar 3, 2020 at 10:01 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Tue, Mar 03, 2020 at 09:42:42AM -0800, Jim Mattson wrote: > > Unfathomable was the wrong word. > > I dunno, one could argue that the behavior of Intel CPUs for CPUID is > unfathomable and I was just trying to follow suit :-D > > > I can see what you're trying to do. I > > just don't think it's defensible. I suspect that Intel CPU architects > > will be surprised and disappointed to find that the maximum effective > > value of CPUID.0H:EAX is now 255, and that they have to define > > CPUID.100H:EAX as the "maximum leaf between 100H and 1FFH" if they > > want to define any leaves between 100H and 1FFH. > > Hmm, ya, I agree that applying a 0xffffff00 mask to all classes of CPUID > ranges is straight up wrong. > > > Furthermore, AMD has only ceded 4000_0000h through 4000_00FFh to > > hypervisors, so kvm's use of 40000100H through 400001FFH appears to be > > a land grab, akin to VIA's unilateral grab of the C0000000H leaves. > > Admittedly, one could argue that the 40000000H leaves are not AMD's to > > apportion, since AMD and Intel appear to have reached a detente by > > splitting the available space down the middle. Intel, who seems to be > > the recognized authority for this range, declares the entire range > > from 40000000H through 4FFFFFFFH to be invalid. Make of that what you > > will. > > > > In any event, no one has ever documented what's supposed to happen if > > you leave gaps in the 4xxxxxxxH range when defining synthesized CPUID > > leaves under kvm. > > Probably stating the obvious, but for me, the least suprising thing is for > such leafs to output zeros. It also feels safer, e.g. a guest that's > querying hypervisor support is less likely to be led astray by all zeros > than by a random feature bits being set. > > What about something like this? Along with a comment and documentation... > > static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function) > { > struct kvm_cpuid_entry2 *max; > > if (function >= 0x40000000 && function <= 0x4fffffff) > max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00, 0); > else > max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0); > return max && function <= max->eax; > } I can get behind that. The behavior of the 4xxxxxxxH leaves under kvm is arguably up to kvm (though AMD may disagree).
On 03/03/20 19:01, Sean Christopherson wrote: > static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function) > { > struct kvm_cpuid_entry2 *max; > > if (function >= 0x40000000 && function <= 0x4fffffff) > max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00, 0); > else > max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0); > return max && function <= max->eax; > } Yes, this is a good idea (except it should be & 0xc0000000 to cover Centaur). Paolo
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 6be012937eba..c320126e0118 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -993,7 +993,7 @@ static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function) { struct kvm_cpuid_entry2 *max; - max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0); + max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00u, 0); return max && function <= max->eax; }
Extend the mask in cpuid_function_in_range() for finding the "class" of the function to 0xfffffff00. While there is no official definition of what constitutes a class, e.g. arguably bits 31:16 should be the class and bits 15:0 the functions within that class, the Hypervisor logic effectively uses bits 31:8 as the class by virtue of checking for different bases in increments of 0x100, e.g. KVM advertises its CPUID functions starting at 0x40000100 when HyperV features are advertised at the default base of 0x40000000. Masking against 0x80000000 only handles basic and extended leafs, which results in Centaur and Hypervisor range checks being performed against the basic CPUID range, e.g. if CPUID.0x40000000.EAX=0x4000000A and there is no entry for CPUID.0x40000006, then function 0x40000006 would be incorrectly reported as out of bounds. The bad range check doesn't cause function problems for any known VMM because out-of-range semantics only come into play if the exact entry isn't found, and VMMs either support a very limited Hypervisor range, e.g. the official KVM range is 0x40000000-0x40000001 (effectively no room for undefined leafs) or explicitly defines gaps to be zero, e.g. Qemu explicitly creates zeroed entries up to the Cenatur and Hypervisor limits (the latter comes into play when providing HyperV features). The bad behavior can be visually confirmed by dumping CPUID output in the guest when running Qemu with a stable TSC, as Qemu extends the limit of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, without defining zeroed entries for 0x40000002 - 0x4000000f. Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH") Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/kvm/cpuid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)