diff mbox series

[2/6] KVM: x86: Fix CPUID range check for Centaur and Hypervisor ranges

Message ID 20200302195736.24777-3-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: CPUID emulation and tracing fixes | expand

Commit Message

Sean Christopherson March 2, 2020, 7:57 p.m. UTC
Extend the mask in cpuid_function_in_range() for finding the "class" of
the function to 0xfffffff00.  While there is no official definition of
what constitutes a class, e.g. arguably bits 31:16 should be the class
and bits 15:0 the functions within that class, the Hypervisor logic
effectively uses bits 31:8 as the class by virtue of checking for
different bases in increments of 0x100, e.g. KVM advertises its CPUID
functions starting at 0x40000100 when HyperV features are advertised at
the default base of 0x40000000.

Masking against 0x80000000 only handles basic and extended leafs, which
results in Centaur and Hypervisor range checks being performed against
the basic CPUID range, e.g. if CPUID.0x40000000.EAX=0x4000000A and there
is no entry for CPUID.0x40000006, then function 0x40000006 would be
incorrectly reported as out of bounds.

The bad range check doesn't cause function problems for any known VMM
because out-of-range semantics only come into play if the exact entry
isn't found, and VMMs either support a very limited Hypervisor range,
e.g. the official KVM range is 0x40000000-0x40000001 (effectively no
room for undefined leafs) or explicitly defines gaps to be zero, e.g.
Qemu explicitly creates zeroed entries up to the Cenatur and Hypervisor
limits (the latter comes into play when providing HyperV features).

The bad behavior can be visually confirmed by dumping CPUID output in
the guest when running Qemu with a stable TSC, as Qemu extends the limit
of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
without defining zeroed entries for 0x40000002 - 0x4000000f.

Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH")
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jim Mattson March 2, 2020, 9:59 p.m. UTC | #1
On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Extend the mask in cpuid_function_in_range() for finding the "class" of
> the function to 0xfffffff00.  While there is no official definition of
> what constitutes a class, e.g. arguably bits 31:16 should be the class
> and bits 15:0 the functions within that class, the Hypervisor logic
> effectively uses bits 31:8 as the class by virtue of checking for
> different bases in increments of 0x100, e.g. KVM advertises its CPUID
> functions starting at 0x40000100 when HyperV features are advertised at
> the default base of 0x40000000.

This convention deserves explicit documentation outside of the commit message.

> Masking against 0x80000000 only handles basic and extended leafs, which
> results in Centaur and Hypervisor range checks being performed against
> the basic CPUID range, e.g. if CPUID.0x40000000.EAX=0x4000000A and there
> is no entry for CPUID.0x40000006, then function 0x40000006 would be
> incorrectly reported as out of bounds.
>
> The bad range check doesn't cause function problems for any known VMM
> because out-of-range semantics only come into play if the exact entry
> isn't found, and VMMs either support a very limited Hypervisor range,
> e.g. the official KVM range is 0x40000000-0x40000001 (effectively no
> room for undefined leafs) or explicitly defines gaps to be zero, e.g.
> Qemu explicitly creates zeroed entries up to the Cenatur and Hypervisor
> limits (the latter comes into play when providing HyperV features).

Does Centaur implement the bizarre Intel behavior for out-of-bound
entries? It seems that if there are Centaur leaves defined, the CPUD
semantics should be those specified by Centaur.

> The bad behavior can be visually confirmed by dumping CPUID output in
> the guest when running Qemu with a stable TSC, as Qemu extends the limit
> of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
> without defining zeroed entries for 0x40000002 - 0x4000000f.
>
> Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH")
> Cc: Jim Mattson <jmattson@google.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/cpuid.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 6be012937eba..c320126e0118 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -993,7 +993,7 @@ static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function)
>  {
>         struct kvm_cpuid_entry2 *max;
>
> -       max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0);
> +       max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00u, 0);

This assumes that CPUID.(function & 0xffffff00):EAX always contains
the maximum input value for the 256-entry range sharing the high 24
bits. I don't believe that convention has ever been established or
documented.

>         return max && function <= max->eax;
>  }
>
> --
> 2.24.1
>
Sean Christopherson March 3, 2020, 12:57 a.m. UTC | #2
On Mon, Mar 02, 2020 at 01:59:10PM -0800, Jim Mattson wrote:
> On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Extend the mask in cpuid_function_in_range() for finding the "class" of
> > the function to 0xfffffff00.  While there is no official definition of
> > what constitutes a class, e.g. arguably bits 31:16 should be the class
> > and bits 15:0 the functions within that class, the Hypervisor logic
> > effectively uses bits 31:8 as the class by virtue of checking for
> > different bases in increments of 0x100, e.g. KVM advertises its CPUID
> > functions starting at 0x40000100 when HyperV features are advertised at
> > the default base of 0x40000000.
> 
> This convention deserves explicit documentation outside of the commit message.

No argument there.
 
> > Masking against 0x80000000 only handles basic and extended leafs, which
> > results in Centaur and Hypervisor range checks being performed against
> > the basic CPUID range, e.g. if CPUID.0x40000000.EAX=0x4000000A and there
> > is no entry for CPUID.0x40000006, then function 0x40000006 would be
> > incorrectly reported as out of bounds.
> >
> > The bad range check doesn't cause function problems for any known VMM
> > because out-of-range semantics only come into play if the exact entry
> > isn't found, and VMMs either support a very limited Hypervisor range,
> > e.g. the official KVM range is 0x40000000-0x40000001 (effectively no
> > room for undefined leafs) or explicitly defines gaps to be zero, e.g.
> > Qemu explicitly creates zeroed entries up to the Cenatur and Hypervisor
> > limits (the latter comes into play when providing HyperV features).
> 
> Does Centaur implement the bizarre Intel behavior for out-of-bound
> entries? It seems that if there are Centaur leaves defined, the CPUD
> semantics should be those specified by Centaur.

Ah, right, because this code triggers on !=AMD, not ==Intel.  Your guess
is as good as mine, I've dug around a few times trying to track down a spec
for Centaur/VIA without success.

I would say that KVM's emulation behavior should probably be all or
nothing, i.e. either due Intel's silly logic for all ranges/classes or do
it for none.

> > The bad behavior can be visually confirmed by dumping CPUID output in
> > the guest when running Qemu with a stable TSC, as Qemu extends the limit
> > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
> > without defining zeroed entries for 0x40000002 - 0x4000000f.
> >
> > Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH")
> > Cc: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 6be012937eba..c320126e0118 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -993,7 +993,7 @@ static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function)
> >  {
> >         struct kvm_cpuid_entry2 *max;
> >
> > -       max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0);
> > +       max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00u, 0);
> 
> This assumes that CPUID.(function & 0xffffff00):EAX always contains
> the maximum input value for the 256-entry range sharing the high 24
> bits. I don't believe that convention has ever been established or
> documented.

Not sure if it's formally documented, but it's well established.  The
closest thing I could find to documentation is the lkml thread where what's
implemented today (AFAICT) was proposed.

https://lore.kernel.org/lkml/48E3BBC1.2050607@goop.org/


The relevant linux code in Linux (arch/x86/include/asm/processor.h), where
@leaves contains the kernel's required minimum leaf to enable paravirt
stuff for the hypervisor.

static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
{
        uint32_t base, eax, signature[3];

        for (base = 0x40000000; base < 0x40010000; base += 0x100) {
                cpuid(base, &eax, &signature[0], &signature[1], &signature[2]);

                if (!memcmp(sig, signature, 12) &&
                    (leaves == 0 || ((eax - base) >= leaves)))
                        return base;
        }

        return 0;
}

> >         return max && function <= max->eax;
> >  }
> >
> > --
> > 2.24.1
> >
Jim Mattson March 3, 2020, 3:25 a.m. UTC | #3
On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:

> The bad behavior can be visually confirmed by dumping CPUID output in
> the guest when running Qemu with a stable TSC, as Qemu extends the limit
> of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
> without defining zeroed entries for 0x40000002 - 0x4000000f.

I think it could be reasonably argued that this is a userspace bug.
Clearly, when userspace explicitly supplies the results for a leaf,
those results override the default CPUID values for that leaf. But I
haven't seen it documented anywhere that leaves *not* explicitly
supplied by userspace will override the default CPUID values, just
because they happen to appear in some magic range.
Jim Mattson March 3, 2020, 4:25 a.m. UTC | #4
On Mon, Mar 2, 2020 at 7:25 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
>
> > The bad behavior can be visually confirmed by dumping CPUID output in
> > the guest when running Qemu with a stable TSC, as Qemu extends the limit
> > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
> > without defining zeroed entries for 0x40000002 - 0x4000000f.
>
> I think it could be reasonably argued that this is a userspace bug.
> Clearly, when userspace explicitly supplies the results for a leaf,
> those results override the default CPUID values for that leaf. But I
> haven't seen it documented anywhere that leaves *not* explicitly
> supplied by userspace will override the default CPUID values, just
> because they happen to appear in some magic range.

In fact, the more I think about it, the original change is correct, at
least in this regard. Your "fix" introduces undocumented and
unfathomable behavior.
Sean Christopherson March 3, 2020, 4:58 a.m. UTC | #5
On Mon, Mar 02, 2020 at 08:25:31PM -0800, Jim Mattson wrote:
> On Mon, Mar 2, 2020 at 7:25 PM Jim Mattson <jmattson@google.com> wrote:
> >
> > On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> >
> > > The bad behavior can be visually confirmed by dumping CPUID output in
> > > the guest when running Qemu with a stable TSC, as Qemu extends the limit
> > > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
> > > without defining zeroed entries for 0x40000002 - 0x4000000f.
> >
> > I think it could be reasonably argued that this is a userspace bug.
> > Clearly, when userspace explicitly supplies the results for a leaf,
> > those results override the default CPUID values for that leaf. But I
> > haven't seen it documented anywhere that leaves *not* explicitly
> > supplied by userspace will override the default CPUID values, just
> > because they happen to appear in some magic range.
> 
> In fact, the more I think about it, the original change is correct, at
> least in this regard. Your "fix" introduces undocumented and
> unfathomable behavior.

Heh, the takeaway from this is that whatever we decide on needs to be
documented somewhere :-)

I wouldn't say it's unfathomable, conceptually it seems like the intent
of the hypervisor range was to mimic the basic and extended ranges.  The
whole thing is arbitrary behavior.  Of course if Intel CPUs would just
return 0s on undefined leafs it would be a lot less arbitrary :-)
 
Anyways, I don't have a strong opinion on whether this patch stays or goes.
Jim Mattson March 3, 2020, 5:42 p.m. UTC | #6
Unfathomable was the wrong word. I can see what you're trying to do. I
just don't think it's defensible. I suspect that Intel CPU architects
will be surprised and disappointed to find that the maximum effective
value of CPUID.0H:EAX is now 255, and that they have to define
CPUID.100H:EAX as the "maximum leaf between 100H and 1FFH" if they
want to define any leaves between 100H and 1FFH.

Furthermore, AMD has only ceded 4000_0000h through 4000_00FFh to
hypervisors, so kvm's use of 40000100H through 400001FFH appears to be
a land grab, akin to VIA's unilateral grab of the C0000000H leaves.
Admittedly, one could argue that the 40000000H leaves are not AMD's to
apportion, since AMD and Intel appear to have reached a detente by
splitting the available space down the middle. Intel, who seems to be
the recognized authority for this range, declares the entire range
from 40000000H through 4FFFFFFFH to be invalid. Make of that what you
will.

In any event, no one has ever documented what's supposed to happen if
you leave gaps in the 4xxxxxxxH range when defining synthesized CPUID
leaves under kvm.

On Mon, Mar 2, 2020 at 8:58 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Mar 02, 2020 at 08:25:31PM -0800, Jim Mattson wrote:
> > On Mon, Mar 2, 2020 at 7:25 PM Jim Mattson <jmattson@google.com> wrote:
> > >
> > > On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > >
> > > > The bad behavior can be visually confirmed by dumping CPUID output in
> > > > the guest when running Qemu with a stable TSC, as Qemu extends the limit
> > > > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
> > > > without defining zeroed entries for 0x40000002 - 0x4000000f.
> > >
> > > I think it could be reasonably argued that this is a userspace bug.
> > > Clearly, when userspace explicitly supplies the results for a leaf,
> > > those results override the default CPUID values for that leaf. But I
> > > haven't seen it documented anywhere that leaves *not* explicitly
> > > supplied by userspace will override the default CPUID values, just
> > > because they happen to appear in some magic range.
> >
> > In fact, the more I think about it, the original change is correct, at
> > least in this regard. Your "fix" introduces undocumented and
> > unfathomable behavior.
>
> Heh, the takeaway from this is that whatever we decide on needs to be
> documented somewhere :-)
>
> I wouldn't say it's unfathomable, conceptually it seems like the intent
> of the hypervisor range was to mimic the basic and extended ranges.  The
> whole thing is arbitrary behavior.  Of course if Intel CPUs would just
> return 0s on undefined leafs it would be a lot less arbitrary :-)
>
> Anyways, I don't have a strong opinion on whether this patch stays or goes.
Sean Christopherson March 3, 2020, 6:01 p.m. UTC | #7
On Tue, Mar 03, 2020 at 09:42:42AM -0800, Jim Mattson wrote:
> Unfathomable was the wrong word.

I dunno, one could argue that the behavior of Intel CPUs for CPUID is
unfathomable and I was just trying to follow suit :-D

>  I can see what you're trying to do. I
> just don't think it's defensible. I suspect that Intel CPU architects
> will be surprised and disappointed to find that the maximum effective
> value of CPUID.0H:EAX is now 255, and that they have to define
> CPUID.100H:EAX as the "maximum leaf between 100H and 1FFH" if they
> want to define any leaves between 100H and 1FFH.

Hmm, ya, I agree that applying a 0xffffff00 mask to all classes of CPUID
ranges is straight up wrong.

> Furthermore, AMD has only ceded 4000_0000h through 4000_00FFh to
> hypervisors, so kvm's use of 40000100H through 400001FFH appears to be
> a land grab, akin to VIA's unilateral grab of the C0000000H leaves.
> Admittedly, one could argue that the 40000000H leaves are not AMD's to
> apportion, since AMD and Intel appear to have reached a detente by
> splitting the available space down the middle. Intel, who seems to be
> the recognized authority for this range, declares the entire range
> from 40000000H through 4FFFFFFFH to be invalid. Make of that what you
> will.
> 
> In any event, no one has ever documented what's supposed to happen if
> you leave gaps in the 4xxxxxxxH range when defining synthesized CPUID
> leaves under kvm.

Probably stating the obvious, but for me, the least suprising thing is for
such leafs to output zeros.  It also feels safer, e.g. a guest that's
querying hypervisor support is less likely to be led astray by all zeros
than by a random feature bits being set.

What about something like this?  Along with a comment and documentation...

static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function)
{
	struct kvm_cpuid_entry2 *max;

	if (function >= 0x40000000 && function <= 0x4fffffff)
		max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00, 0);
	else
		max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0);
	return max && function <= max->eax;
}

> On Mon, Mar 2, 2020 at 8:58 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Mon, Mar 02, 2020 at 08:25:31PM -0800, Jim Mattson wrote:
> > > On Mon, Mar 2, 2020 at 7:25 PM Jim Mattson <jmattson@google.com> wrote:
> > > >
> > > > On Mon, Mar 2, 2020 at 11:57 AM Sean Christopherson
> > > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > > The bad behavior can be visually confirmed by dumping CPUID output in
> > > > > the guest when running Qemu with a stable TSC, as Qemu extends the limit
> > > > > of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
> > > > > without defining zeroed entries for 0x40000002 - 0x4000000f.
> > > >
> > > > I think it could be reasonably argued that this is a userspace bug.
> > > > Clearly, when userspace explicitly supplies the results for a leaf,
> > > > those results override the default CPUID values for that leaf. But I
> > > > haven't seen it documented anywhere that leaves *not* explicitly
> > > > supplied by userspace will override the default CPUID values, just
> > > > because they happen to appear in some magic range.
> > >
> > > In fact, the more I think about it, the original change is correct, at
> > > least in this regard. Your "fix" introduces undocumented and
> > > unfathomable behavior.
> >
> > Heh, the takeaway from this is that whatever we decide on needs to be
> > documented somewhere :-)
> >
> > I wouldn't say it's unfathomable, conceptually it seems like the intent
> > of the hypervisor range was to mimic the basic and extended ranges.  The
> > whole thing is arbitrary behavior.  Of course if Intel CPUs would just
> > return 0s on undefined leafs it would be a lot less arbitrary :-)
> >
> > Anyways, I don't have a strong opinion on whether this patch stays or goes.
Jim Mattson March 3, 2020, 6:08 p.m. UTC | #8
On Tue, Mar 3, 2020 at 10:01 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Mar 03, 2020 at 09:42:42AM -0800, Jim Mattson wrote:
> > Unfathomable was the wrong word.
>
> I dunno, one could argue that the behavior of Intel CPUs for CPUID is
> unfathomable and I was just trying to follow suit :-D
>
> >  I can see what you're trying to do. I
> > just don't think it's defensible. I suspect that Intel CPU architects
> > will be surprised and disappointed to find that the maximum effective
> > value of CPUID.0H:EAX is now 255, and that they have to define
> > CPUID.100H:EAX as the "maximum leaf between 100H and 1FFH" if they
> > want to define any leaves between 100H and 1FFH.
>
> Hmm, ya, I agree that applying a 0xffffff00 mask to all classes of CPUID
> ranges is straight up wrong.
>
> > Furthermore, AMD has only ceded 4000_0000h through 4000_00FFh to
> > hypervisors, so kvm's use of 40000100H through 400001FFH appears to be
> > a land grab, akin to VIA's unilateral grab of the C0000000H leaves.
> > Admittedly, one could argue that the 40000000H leaves are not AMD's to
> > apportion, since AMD and Intel appear to have reached a detente by
> > splitting the available space down the middle. Intel, who seems to be
> > the recognized authority for this range, declares the entire range
> > from 40000000H through 4FFFFFFFH to be invalid. Make of that what you
> > will.
> >
> > In any event, no one has ever documented what's supposed to happen if
> > you leave gaps in the 4xxxxxxxH range when defining synthesized CPUID
> > leaves under kvm.
>
> Probably stating the obvious, but for me, the least suprising thing is for
> such leafs to output zeros.  It also feels safer, e.g. a guest that's
> querying hypervisor support is less likely to be led astray by all zeros
> than by a random feature bits being set.
>
> What about something like this?  Along with a comment and documentation...
>
> static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function)
> {
>         struct kvm_cpuid_entry2 *max;
>
>         if (function >= 0x40000000 && function <= 0x4fffffff)
>                 max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00, 0);
>         else
>                 max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0);
>         return max && function <= max->eax;
> }

I can get behind that. The behavior of the 4xxxxxxxH leaves under kvm
is arguably up to kvm (though AMD may disagree).
Paolo Bonzini March 4, 2020, 11:18 a.m. UTC | #9
On 03/03/20 19:01, Sean Christopherson wrote:
> static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function)
> {
> 	struct kvm_cpuid_entry2 *max;
> 
> 	if (function >= 0x40000000 && function <= 0x4fffffff)
> 		max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00, 0);
> 	else
> 		max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0);
> 	return max && function <= max->eax;
> }

Yes, this is a good idea (except it should be & 0xc0000000 to cover
Centaur).

Paolo
diff mbox series

Patch

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 6be012937eba..c320126e0118 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -993,7 +993,7 @@  static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function)
 {
 	struct kvm_cpuid_entry2 *max;
 
-	max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0);
+	max = kvm_find_cpuid_entry(vcpu, function & 0xffffff00u, 0);
 	return max && function <= max->eax;
 }