[RFC] kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH
diff mbox series

Message ID 20190912232753.85969-1-jmattson@google.com
State New
Headers show
Series
  • [RFC] kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH
Related show

Commit Message

Jim Mattson Sept. 12, 2019, 11:27 p.m. UTC
If these CPUID leaves are implemented, the EDX output is always the
x2APIC ID, regardless of the ECX input. Furthermore, the low byte of
the ECX output is always identical to the low byte of the ECX input.

KVM's CPUID emulation doesn't report the correct ECX and EDX outputs
when the ECX input is greater than the first subleaf for which the
"level type" is zero. This is probably only significant in the case of
the x2APIC ID, which should be the result of CPUID(EAX=0BH):EDX or
CPUID(EAX=1FH):EDX, without even setting a particular ECX input value.

Create a "wildcard" kvm_cpuid_entry2 for leaves 0BH and 1FH in
response to the KVM_GET_SUPPORTED_CPUID ioctl. This entry does not
have the KVM_CPUID_FLAG_SIGNIFCANT_INDEX flag, so it matches all
subleaves for which there isn't a prior explicit index match.

Add a new KVM_CPUID flag that is only applicable to leaves 0BH and
1FH: KVM_CPUID_FLAG_CL_IS_PASSTHROUGH. When KVM's CPUID emulation
encounters this flag, it will fix up ECX[7:0] in the CPUID output. Add
this flag to the aforementioned "wildcard" kvm_cpuid_entry2.

Note that userspace is still responsible for setting EDX to the x2APIC
ID of the vCPU in each of these structures, *including* the wildcard.

Qemu doesn't pass the flags from KVM_GET_SUPPORTED_CPUID to
KVM_SET_CPUID2, so it will have to be modified to take advantage of
these changes. Note that passing the new flag to older kernels will
have no effect.

Unfortunately, the new flag bit was not previously reserved, so it is
possible that a userspace agent that already sets this bit will be
unhappy with the new behavior. Technically, I suppose, this should be
implemented as a new set of ioctls. Posting as an RFC to get comments
on the API breakage.

Fixes: 0771671749b59a ("KVM: Enhance guest cpuid management")
Fixes: a87f2d3a6eadab ("KVM: x86: Add Intel CPUID.1F cpuid emulation support")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Jacob Xu <jacobhxu@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Change-Id: I6b422427f78b530106af3f929518363895367e25
---
 Documentation/virt/kvm/api.txt  |  6 +++++
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/cpuid.c            | 39 +++++++++++++++++++++++++++------
 3 files changed, 39 insertions(+), 7 deletions(-)

Comments

Jim Mattson Sept. 18, 2019, 3:32 p.m. UTC | #1
On Thu, Sep 12, 2019 at 4:28 PM Jim Mattson <jmattson@google.com> wrote:
>
> If these CPUID leaves are implemented, the EDX output is always the
> x2APIC ID, regardless of the ECX input. Furthermore, the low byte of
> the ECX output is always identical to the low byte of the ECX input.
>
> KVM's CPUID emulation doesn't report the correct ECX and EDX outputs
> when the ECX input is greater than the first subleaf for which the
> "level type" is zero. This is probably only significant in the case of
> the x2APIC ID, which should be the result of CPUID(EAX=0BH):EDX or
> CPUID(EAX=1FH):EDX, without even setting a particular ECX input value.
>
> Create a "wildcard" kvm_cpuid_entry2 for leaves 0BH and 1FH in
> response to the KVM_GET_SUPPORTED_CPUID ioctl. This entry does not
> have the KVM_CPUID_FLAG_SIGNIFCANT_INDEX flag, so it matches all
> subleaves for which there isn't a prior explicit index match.
>
> Add a new KVM_CPUID flag that is only applicable to leaves 0BH and
> 1FH: KVM_CPUID_FLAG_CL_IS_PASSTHROUGH. When KVM's CPUID emulation
> encounters this flag, it will fix up ECX[7:0] in the CPUID output. Add
> this flag to the aforementioned "wildcard" kvm_cpuid_entry2.
>
> Note that userspace is still responsible for setting EDX to the x2APIC
> ID of the vCPU in each of these structures, *including* the wildcard.
>
> Qemu doesn't pass the flags from KVM_GET_SUPPORTED_CPUID to
> KVM_SET_CPUID2, so it will have to be modified to take advantage of
> these changes. Note that passing the new flag to older kernels will
> have no effect.
>
> Unfortunately, the new flag bit was not previously reserved, so it is
> possible that a userspace agent that already sets this bit will be
> unhappy with the new behavior. Technically, I suppose, this should be
> implemented as a new set of ioctls. Posting as an RFC to get comments
> on the API breakage.
>
> Fixes: 0771671749b59a ("KVM: Enhance guest cpuid management")
> Fixes: a87f2d3a6eadab ("KVM: x86: Add Intel CPUID.1F cpuid emulation support")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Reviewed-by: Jacob Xu <jacobhxu@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>

No comments on the API breakage? Shall I resubmit as an actual patch?
Sean Christopherson Sept. 18, 2019, 5:43 p.m. UTC | #2
On Thu, Sep 12, 2019 at 04:27:53PM -0700, Jim Mattson wrote:
> If these CPUID leaves are implemented, the EDX output is always the
> x2APIC ID, regardless of the ECX input. Furthermore, the low byte of
> the ECX output is always identical to the low byte of the ECX input.
> 
> KVM's CPUID emulation doesn't report the correct ECX and EDX outputs
> when the ECX input is greater than the first subleaf for which the
> "level type" is zero. This is probably only significant in the case of
> the x2APIC ID, which should be the result of CPUID(EAX=0BH):EDX or
> CPUID(EAX=1FH):EDX, without even setting a particular ECX input value.

At a glance, shouldn't leaf 0x1f be marked significant in do_host_cpuid()?

> Create a "wildcard" kvm_cpuid_entry2 for leaves 0BH and 1FH in
> response to the KVM_GET_SUPPORTED_CPUID ioctl. This entry does not
> have the KVM_CPUID_FLAG_SIGNIFCANT_INDEX flag, so it matches all
> subleaves for which there isn't a prior explicit index match.
> 
> Add a new KVM_CPUID flag that is only applicable to leaves 0BH and
> 1FH: KVM_CPUID_FLAG_CL_IS_PASSTHROUGH. When KVM's CPUID emulation
> encounters this flag, it will fix up ECX[7:0] in the CPUID output. Add
> this flag to the aforementioned "wildcard" kvm_cpuid_entry2.
> 
> Note that userspace is still responsible for setting EDX to the x2APIC
> ID of the vCPU in each of these structures, *including* the wildcard.
> 
> Qemu doesn't pass the flags from KVM_GET_SUPPORTED_CPUID to
> KVM_SET_CPUID2, so it will have to be modified to take advantage of
> these changes. Note that passing the new flag to older kernels will
> have no effect.
> 
> Unfortunately, the new flag bit was not previously reserved, so it is
> possible that a userspace agent that already sets this bit will be
> unhappy with the new behavior. Technically, I suppose, this should be
> implemented as a new set of ioctls. Posting as an RFC to get comments
> on the API breakage.
> 
> Fixes: 0771671749b59a ("KVM: Enhance guest cpuid management")
> Fixes: a87f2d3a6eadab ("KVM: x86: Add Intel CPUID.1F cpuid emulation support")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Reviewed-by: Jacob Xu <jacobhxu@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> Change-Id: I6b422427f78b530106af3f929518363895367e25
> ---
>  Documentation/virt/kvm/api.txt  |  6 +++++
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/cpuid.c            | 39 +++++++++++++++++++++++++++------
>  3 files changed, 39 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> index 2d067767b6170..be5cc42ad35f4 100644
> --- a/Documentation/virt/kvm/api.txt
> +++ b/Documentation/virt/kvm/api.txt
> @@ -1396,6 +1396,7 @@ struct kvm_cpuid2 {
>  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
>  #define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
>  #define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
> +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH	BIT(3)
>  
>  struct kvm_cpuid_entry2 {
>  	__u32 function;
> @@ -1447,6 +1448,8 @@ emulate them efficiently. The fields in each entry are defined as follows:
>          KVM_CPUID_FLAG_STATE_READ_NEXT:
>             for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
>             the first entry to be read by a cpu
> +	KVM_CPUID_FLAG_CL_IS_PASSTHROUGH:
> +	   If the output value of ECX[7:0] matches the input value of ECX[7:0]
>     eax, ebx, ecx, edx: the values returned by the cpuid instruction for
>           this function/index combination
>  
> @@ -2992,6 +2995,7 @@ The member 'flags' is used for passing flags from userspace.
>  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
>  #define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
>  #define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
> +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH	BIT(3)
>  
>  struct kvm_cpuid_entry2 {
>  	__u32 function;
> @@ -3040,6 +3044,8 @@ The fields in each entry are defined as follows:
>          KVM_CPUID_FLAG_STATE_READ_NEXT:
>             for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
>             the first entry to be read by a cpu
> +	KVM_CPUID_FLAG_CL_IS_PASSTHROUGH:
> +	   If the output value of ECX[7:0] matches the input value of ECX[7:0]
>     eax, ebx, ecx, edx: the values returned by the cpuid instruction for
>           this function/index combination
>  
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 503d3f42da167..3b67d21123946 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -223,6 +223,7 @@ struct kvm_cpuid_entry2 {
>  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		(1 << 0)
>  #define KVM_CPUID_FLAG_STATEFUL_FUNC		(1 << 1)
>  #define KVM_CPUID_FLAG_STATE_READ_NEXT		(1 << 2)
> +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH	(1 << 3)
>  
>  /* for KVM_SET_CPUID2 */
>  struct kvm_cpuid2 {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index e7d25f4364664..280a796159cb2 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -612,19 +612,41 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>  	 */
>  	case 0x1f:
>  	case 0xb: {
> -		int i, level_type;
> +		int i;
>  
> -		/* read more entries until level_type is zero */
> -		for (i = 1; ; ++i) {
> +		/*
> +		 * We filled in entry[0] for CPUID(EAX=<function>,
> +		 * ECX=00H) above.  If its level type (ECX[15:8]) is
> +		 * zero, then the leaf is unimplemented, and we're
> +		 * done.  Otherwise, continue to populate entries
> +		 * until the level type (ECX[15:8]) of the previously
> +		 * added entry is zero.
> +		 */
> +		for (i = 1; entry[i - 1].ecx & 0xff00; ++i) {
>  			if (*nent >= maxnent)
>  				goto out;
>  
> -			level_type = entry[i - 1].ecx & 0xff00;
> -			if (!level_type)
> -				break;
>  			do_host_cpuid(&entry[i], function, i);
>  			++*nent;
>  		}

This should be a standalone bugfix/enhancement path.  Bugfix because it
eliminates a false positive on *nent >= maxnent.

> +
> +		if (i > 1) {
> +			/*
> +			 * If this leaf has multiple entries, treat
> +			 * the final entry as a "wildcard." Clear the
> +			 * "significant index" flag so that the index
> +			 * will be ignored when we later look for an
> +			 * entry that matches a CPUID
> +			 * invocation. Since this entry will now match
> +			 * CPUID(EAX=<function>, ECX=<index>) for any
> +			 * <index> >= (i - 1), set the "CL
> +			 * passthrough" flag to ensure that ECX[7:0]
> +			 * will be set to (<index> & 0xff), per the SDM.
> +			 */
> +			entry[i - 1].flags &= ~KVM_CPUID_FLAG_SIGNIFCANT_INDEX;

If I'm reading the code correctly, this is fragile and subtle.  The order
of cpuid entries is controlled by userspace, which means that clearing
KVM_CPUID_FLAG_SIGNIFCANT_INDEX depends on this entry being kept after all
other entries for this function.  In practice I'm guessing userspaces
naturally sort entries with the same function by ascending index, but it
seems like avoidable issue.

Also, won't matching the last entry generate the wrong values for EAX, EBX
and ECX, i.e. the valid values for the last index instead of zeroes?

> +			entry[i - 1].flags |= KVM_CPUID_FLAG_CL_IS_PASSTHROUGH;

Lastly, do we actually need to enumerate this silliness to userspace?
What if we handle this as a one-off case in CPUID emulation and avoid the
ABI breakage that way?  E.g.:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index dd5985eb61b4..aaf5cdcb88c9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1001,6 +1001,16 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
        }

 out:
+       if (!best && (function == 0xb || function == 0x1f)) {
+               best = check_cpuid_limit(vcpu, function, 0);
+               if (best) {
+                       *eax = 0;
+                       *ebx = 0;
+                       *ecx &= 0xff;
+                       *edx = *best->edx;
+               }
+       }
+
        if (best) {
                *eax = best->eax;
                *ebx = best->ebx;

> +		}
> +
>  		break;
>  	}
>  	case 0xd: {
> @@ -1001,8 +1023,11 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>  		*ebx = best->ebx;
>  		*ecx = best->ecx;
>  		*edx = best->edx;
> -	} else
> +		if (best->flags & KVM_CPUID_FLAG_CL_IS_PASSTHROUGH)
> +			*ecx = (*ecx & ~0xff) | (index & 0xff);
> +	} else {
>  		*eax = *ebx = *ecx = *edx = 0;
> +	}
>  	trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx, entry_found);
>  	return entry_found;
>  }
> -- 
> 2.23.0.237.gc6a4ce50a0-goog
>
Jim Mattson Sept. 18, 2019, 6:22 p.m. UTC | #3
On Wed, Sep 18, 2019 at 10:43 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Sep 12, 2019 at 04:27:53PM -0700, Jim Mattson wrote:
> > If these CPUID leaves are implemented, the EDX output is always the
> > x2APIC ID, regardless of the ECX input. Furthermore, the low byte of
> > the ECX output is always identical to the low byte of the ECX input.
> >
> > KVM's CPUID emulation doesn't report the correct ECX and EDX outputs
> > when the ECX input is greater than the first subleaf for which the
> > "level type" is zero. This is probably only significant in the case of
> > the x2APIC ID, which should be the result of CPUID(EAX=0BH):EDX or
> > CPUID(EAX=1FH):EDX, without even setting a particular ECX input value.
>
> At a glance, shouldn't leaf 0x1f be marked significant in do_host_cpuid()?

Indeed. See my previous post, "[PATCH] kvm: x86: Add "significant
index" flag to a few CPUID leaves."

> > Create a "wildcard" kvm_cpuid_entry2 for leaves 0BH and 1FH in
> > response to the KVM_GET_SUPPORTED_CPUID ioctl. This entry does not
> > have the KVM_CPUID_FLAG_SIGNIFCANT_INDEX flag, so it matches all
> > subleaves for which there isn't a prior explicit index match.
> >
> > Add a new KVM_CPUID flag that is only applicable to leaves 0BH and
> > 1FH: KVM_CPUID_FLAG_CL_IS_PASSTHROUGH. When KVM's CPUID emulation
> > encounters this flag, it will fix up ECX[7:0] in the CPUID output. Add
> > this flag to the aforementioned "wildcard" kvm_cpuid_entry2.
> >
> > Note that userspace is still responsible for setting EDX to the x2APIC
> > ID of the vCPU in each of these structures, *including* the wildcard.
> >
> > Qemu doesn't pass the flags from KVM_GET_SUPPORTED_CPUID to
> > KVM_SET_CPUID2, so it will have to be modified to take advantage of
> > these changes. Note that passing the new flag to older kernels will
> > have no effect.
> >
> > Unfortunately, the new flag bit was not previously reserved, so it is
> > possible that a userspace agent that already sets this bit will be
> > unhappy with the new behavior. Technically, I suppose, this should be
> > implemented as a new set of ioctls. Posting as an RFC to get comments
> > on the API breakage.
> >
> > Fixes: 0771671749b59a ("KVM: Enhance guest cpuid management")
> > Fixes: a87f2d3a6eadab ("KVM: x86: Add Intel CPUID.1F cpuid emulation support")
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > Reviewed-by: Steve Rutherford <srutherford@google.com>
> > Reviewed-by: Jacob Xu <jacobhxu@google.com>
> > Reviewed-by: Peter Shier <pshier@google.com>
> > Change-Id: I6b422427f78b530106af3f929518363895367e25
> > ---
> >  Documentation/virt/kvm/api.txt  |  6 +++++
> >  arch/x86/include/uapi/asm/kvm.h |  1 +
> >  arch/x86/kvm/cpuid.c            | 39 +++++++++++++++++++++++++++------
> >  3 files changed, 39 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> > index 2d067767b6170..be5cc42ad35f4 100644
> > --- a/Documentation/virt/kvm/api.txt
> > +++ b/Documentation/virt/kvm/api.txt
> > @@ -1396,6 +1396,7 @@ struct kvm_cpuid2 {
> >  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX              BIT(0)
> >  #define KVM_CPUID_FLAG_STATEFUL_FUNC         BIT(1)
> >  #define KVM_CPUID_FLAG_STATE_READ_NEXT               BIT(2)
> > +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH     BIT(3)
> >
> >  struct kvm_cpuid_entry2 {
> >       __u32 function;
> > @@ -1447,6 +1448,8 @@ emulate them efficiently. The fields in each entry are defined as follows:
> >          KVM_CPUID_FLAG_STATE_READ_NEXT:
> >             for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
> >             the first entry to be read by a cpu
> > +     KVM_CPUID_FLAG_CL_IS_PASSTHROUGH:
> > +        If the output value of ECX[7:0] matches the input value of ECX[7:0]
> >     eax, ebx, ecx, edx: the values returned by the cpuid instruction for
> >           this function/index combination
> >
> > @@ -2992,6 +2995,7 @@ The member 'flags' is used for passing flags from userspace.
> >  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX              BIT(0)
> >  #define KVM_CPUID_FLAG_STATEFUL_FUNC         BIT(1)
> >  #define KVM_CPUID_FLAG_STATE_READ_NEXT               BIT(2)
> > +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH     BIT(3)
> >
> >  struct kvm_cpuid_entry2 {
> >       __u32 function;
> > @@ -3040,6 +3044,8 @@ The fields in each entry are defined as follows:
> >          KVM_CPUID_FLAG_STATE_READ_NEXT:
> >             for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
> >             the first entry to be read by a cpu
> > +     KVM_CPUID_FLAG_CL_IS_PASSTHROUGH:
> > +        If the output value of ECX[7:0] matches the input value of ECX[7:0]
> >     eax, ebx, ecx, edx: the values returned by the cpuid instruction for
> >           this function/index combination
> >
> > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> > index 503d3f42da167..3b67d21123946 100644
> > --- a/arch/x86/include/uapi/asm/kvm.h
> > +++ b/arch/x86/include/uapi/asm/kvm.h
> > @@ -223,6 +223,7 @@ struct kvm_cpuid_entry2 {
> >  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX              (1 << 0)
> >  #define KVM_CPUID_FLAG_STATEFUL_FUNC         (1 << 1)
> >  #define KVM_CPUID_FLAG_STATE_READ_NEXT               (1 << 2)
> > +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH     (1 << 3)
> >
> >  /* for KVM_SET_CPUID2 */
> >  struct kvm_cpuid2 {
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index e7d25f4364664..280a796159cb2 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -612,19 +612,41 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> >        */
> >       case 0x1f:
> >       case 0xb: {
> > -             int i, level_type;
> > +             int i;
> >
> > -             /* read more entries until level_type is zero */
> > -             for (i = 1; ; ++i) {
> > +             /*
> > +              * We filled in entry[0] for CPUID(EAX=<function>,
> > +              * ECX=00H) above.  If its level type (ECX[15:8]) is
> > +              * zero, then the leaf is unimplemented, and we're
> > +              * done.  Otherwise, continue to populate entries
> > +              * until the level type (ECX[15:8]) of the previously
> > +              * added entry is zero.
> > +              */
> > +             for (i = 1; entry[i - 1].ecx & 0xff00; ++i) {
> >                       if (*nent >= maxnent)
> >                               goto out;
> >
> > -                     level_type = entry[i - 1].ecx & 0xff00;
> > -                     if (!level_type)
> > -                             break;
> >                       do_host_cpuid(&entry[i], function, i);
> >                       ++*nent;
> >               }
>
> This should be a standalone bugfix/enhancement path.  Bugfix because it
> eliminates a false positive on *nent >= maxnent.

Sure.

> > +
> > +             if (i > 1) {
> > +                     /*
> > +                      * If this leaf has multiple entries, treat
> > +                      * the final entry as a "wildcard." Clear the
> > +                      * "significant index" flag so that the index
> > +                      * will be ignored when we later look for an
> > +                      * entry that matches a CPUID
> > +                      * invocation. Since this entry will now match
> > +                      * CPUID(EAX=<function>, ECX=<index>) for any
> > +                      * <index> >= (i - 1), set the "CL
> > +                      * passthrough" flag to ensure that ECX[7:0]
> > +                      * will be set to (<index> & 0xff), per the SDM.
> > +                      */
> > +                     entry[i - 1].flags &= ~KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>
> If I'm reading the code correctly, this is fragile and subtle.  The order
> of cpuid entries is controlled by userspace, which means that clearing
> KVM_CPUID_FLAG_SIGNIFCANT_INDEX depends on this entry being kept after all
> other entries for this function.  In practice I'm guessing userspaces
> naturally sort entries with the same function by ascending index, but it
> seems like avoidable issue.

Though not documented, the CPUID leaf matching code has always
depended on ordering.

> Also, won't matching the last entry generate the wrong values for EAX, EBX
> and ECX, i.e. the valid values for the last index instead of zeroes?

This entry has CH==0. According to the SDM, "For sub-leaves that
return an invalid level-type of 0 in ECX[15:8]; EAX and EBX will
return 0."
ECX[7:0] will be wrong, but that's fixed up by the flag below.
ECX[31:16] are reserved and perhaps should be cleared here, but I'm
not sure how I would interpret it if those bits started being non-zero
for the first leaf with CH==0.

> > +                     entry[i - 1].flags |= KVM_CPUID_FLAG_CL_IS_PASSTHROUGH;
>
> Lastly, do we actually need to enumerate this silliness to userspace?
> What if we handle this as a one-off case in CPUID emulation and avoid the
> ABI breakage that way?  E.g.:
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index dd5985eb61b4..aaf5cdcb88c9 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1001,6 +1001,16 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>         }
>
>  out:
> +       if (!best && (function == 0xb || function == 0x1f)) {
> +               best = check_cpuid_limit(vcpu, function, 0);
> +               if (best) {
> +                       *eax = 0;
> +                       *ebx = 0;
> +                       *ecx &= 0xff;
> +                       *edx = *best->edx;
> +               }
> +       }
> +

Aside from the fact that one should never call check_cpuid_limit on
AMD systems (they don't do the "last basic leaf" nonsense), an
approach like this should work.

>         if (best) {
>                 *eax = best->eax;
>                 *ebx = best->ebx;
>
> > +             }
> > +
> >               break;
> >       }
> >       case 0xd: {
> > @@ -1001,8 +1023,11 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> >               *ebx = best->ebx;
> >               *ecx = best->ecx;
> >               *edx = best->edx;
> > -     } else
> > +             if (best->flags & KVM_CPUID_FLAG_CL_IS_PASSTHROUGH)
> > +                     *ecx = (*ecx & ~0xff) | (index & 0xff);
> > +     } else {
> >               *eax = *ebx = *ecx = *edx = 0;
> > +     }
> >       trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx, entry_found);
> >       return entry_found;
> >  }
> > --
> > 2.23.0.237.gc6a4ce50a0-goog
> >
Jim Mattson Sept. 18, 2019, 6:41 p.m. UTC | #4
On Wed, Sep 18, 2019 at 11:22 AM Jim Mattson <jmattson@google.com> wrote:
>
> On Wed, Sep 18, 2019 at 10:43 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Thu, Sep 12, 2019 at 04:27:53PM -0700, Jim Mattson wrote:
> > > If these CPUID leaves are implemented, the EDX output is always the
> > > x2APIC ID, regardless of the ECX input. Furthermore, the low byte of
> > > the ECX output is always identical to the low byte of the ECX input.
> > >
> > > KVM's CPUID emulation doesn't report the correct ECX and EDX outputs
> > > when the ECX input is greater than the first subleaf for which the
> > > "level type" is zero. This is probably only significant in the case of
> > > the x2APIC ID, which should be the result of CPUID(EAX=0BH):EDX or
> > > CPUID(EAX=1FH):EDX, without even setting a particular ECX input value.
> >
> > At a glance, shouldn't leaf 0x1f be marked significant in do_host_cpuid()?
>
> Indeed. See my previous post, "[PATCH] kvm: x86: Add "significant
> index" flag to a few CPUID leaves."
>
> > > Create a "wildcard" kvm_cpuid_entry2 for leaves 0BH and 1FH in
> > > response to the KVM_GET_SUPPORTED_CPUID ioctl. This entry does not
> > > have the KVM_CPUID_FLAG_SIGNIFCANT_INDEX flag, so it matches all
> > > subleaves for which there isn't a prior explicit index match.
> > >
> > > Add a new KVM_CPUID flag that is only applicable to leaves 0BH and
> > > 1FH: KVM_CPUID_FLAG_CL_IS_PASSTHROUGH. When KVM's CPUID emulation
> > > encounters this flag, it will fix up ECX[7:0] in the CPUID output. Add
> > > this flag to the aforementioned "wildcard" kvm_cpuid_entry2.
> > >
> > > Note that userspace is still responsible for setting EDX to the x2APIC
> > > ID of the vCPU in each of these structures, *including* the wildcard.
> > >
> > > Qemu doesn't pass the flags from KVM_GET_SUPPORTED_CPUID to
> > > KVM_SET_CPUID2, so it will have to be modified to take advantage of
> > > these changes. Note that passing the new flag to older kernels will
> > > have no effect.
> > >
> > > Unfortunately, the new flag bit was not previously reserved, so it is
> > > possible that a userspace agent that already sets this bit will be
> > > unhappy with the new behavior. Technically, I suppose, this should be
> > > implemented as a new set of ioctls. Posting as an RFC to get comments
> > > on the API breakage.
> > >
> > > Fixes: 0771671749b59a ("KVM: Enhance guest cpuid management")
> > > Fixes: a87f2d3a6eadab ("KVM: x86: Add Intel CPUID.1F cpuid emulation support")
> > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > Reviewed-by: Steve Rutherford <srutherford@google.com>
> > > Reviewed-by: Jacob Xu <jacobhxu@google.com>
> > > Reviewed-by: Peter Shier <pshier@google.com>
> > > Change-Id: I6b422427f78b530106af3f929518363895367e25
> > > ---
> > >  Documentation/virt/kvm/api.txt  |  6 +++++
> > >  arch/x86/include/uapi/asm/kvm.h |  1 +
> > >  arch/x86/kvm/cpuid.c            | 39 +++++++++++++++++++++++++++------
> > >  3 files changed, 39 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> > > index 2d067767b6170..be5cc42ad35f4 100644
> > > --- a/Documentation/virt/kvm/api.txt
> > > +++ b/Documentation/virt/kvm/api.txt
> > > @@ -1396,6 +1396,7 @@ struct kvm_cpuid2 {
> > >  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX              BIT(0)
> > >  #define KVM_CPUID_FLAG_STATEFUL_FUNC         BIT(1)
> > >  #define KVM_CPUID_FLAG_STATE_READ_NEXT               BIT(2)
> > > +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH     BIT(3)
> > >
> > >  struct kvm_cpuid_entry2 {
> > >       __u32 function;
> > > @@ -1447,6 +1448,8 @@ emulate them efficiently. The fields in each entry are defined as follows:
> > >          KVM_CPUID_FLAG_STATE_READ_NEXT:
> > >             for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
> > >             the first entry to be read by a cpu
> > > +     KVM_CPUID_FLAG_CL_IS_PASSTHROUGH:
> > > +        If the output value of ECX[7:0] matches the input value of ECX[7:0]
> > >     eax, ebx, ecx, edx: the values returned by the cpuid instruction for
> > >           this function/index combination
> > >
> > > @@ -2992,6 +2995,7 @@ The member 'flags' is used for passing flags from userspace.
> > >  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX              BIT(0)
> > >  #define KVM_CPUID_FLAG_STATEFUL_FUNC         BIT(1)
> > >  #define KVM_CPUID_FLAG_STATE_READ_NEXT               BIT(2)
> > > +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH     BIT(3)
> > >
> > >  struct kvm_cpuid_entry2 {
> > >       __u32 function;
> > > @@ -3040,6 +3044,8 @@ The fields in each entry are defined as follows:
> > >          KVM_CPUID_FLAG_STATE_READ_NEXT:
> > >             for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
> > >             the first entry to be read by a cpu
> > > +     KVM_CPUID_FLAG_CL_IS_PASSTHROUGH:
> > > +        If the output value of ECX[7:0] matches the input value of ECX[7:0]
> > >     eax, ebx, ecx, edx: the values returned by the cpuid instruction for
> > >           this function/index combination
> > >
> > > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> > > index 503d3f42da167..3b67d21123946 100644
> > > --- a/arch/x86/include/uapi/asm/kvm.h
> > > +++ b/arch/x86/include/uapi/asm/kvm.h
> > > @@ -223,6 +223,7 @@ struct kvm_cpuid_entry2 {
> > >  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX              (1 << 0)
> > >  #define KVM_CPUID_FLAG_STATEFUL_FUNC         (1 << 1)
> > >  #define KVM_CPUID_FLAG_STATE_READ_NEXT               (1 << 2)
> > > +#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH     (1 << 3)
> > >
> > >  /* for KVM_SET_CPUID2 */
> > >  struct kvm_cpuid2 {
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index e7d25f4364664..280a796159cb2 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -612,19 +612,41 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> > >        */
> > >       case 0x1f:
> > >       case 0xb: {
> > > -             int i, level_type;
> > > +             int i;
> > >
> > > -             /* read more entries until level_type is zero */
> > > -             for (i = 1; ; ++i) {
> > > +             /*
> > > +              * We filled in entry[0] for CPUID(EAX=<function>,
> > > +              * ECX=00H) above.  If its level type (ECX[15:8]) is
> > > +              * zero, then the leaf is unimplemented, and we're
> > > +              * done.  Otherwise, continue to populate entries
> > > +              * until the level type (ECX[15:8]) of the previously
> > > +              * added entry is zero.
> > > +              */
> > > +             for (i = 1; entry[i - 1].ecx & 0xff00; ++i) {
> > >                       if (*nent >= maxnent)
> > >                               goto out;
> > >
> > > -                     level_type = entry[i - 1].ecx & 0xff00;
> > > -                     if (!level_type)
> > > -                             break;
> > >                       do_host_cpuid(&entry[i], function, i);
> > >                       ++*nent;
> > >               }
> >
> > This should be a standalone bugfix/enhancement path.  Bugfix because it
> > eliminates a false positive on *nent >= maxnent.
>
> Sure.
>
> > > +
> > > +             if (i > 1) {
> > > +                     /*
> > > +                      * If this leaf has multiple entries, treat
> > > +                      * the final entry as a "wildcard." Clear the
> > > +                      * "significant index" flag so that the index
> > > +                      * will be ignored when we later look for an
> > > +                      * entry that matches a CPUID
> > > +                      * invocation. Since this entry will now match
> > > +                      * CPUID(EAX=<function>, ECX=<index>) for any
> > > +                      * <index> >= (i - 1), set the "CL
> > > +                      * passthrough" flag to ensure that ECX[7:0]
> > > +                      * will be set to (<index> & 0xff), per the SDM.
> > > +                      */
> > > +                     entry[i - 1].flags &= ~KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> >
> > If I'm reading the code correctly, this is fragile and subtle.  The order
> > of cpuid entries is controlled by userspace, which means that clearing
> > KVM_CPUID_FLAG_SIGNIFCANT_INDEX depends on this entry being kept after all
> > other entries for this function.  In practice I'm guessing userspaces
> > naturally sort entries with the same function by ascending index, but it
> > seems like avoidable issue.
>
> Though not documented, the CPUID leaf matching code has always
> depended on ordering.
>
> > Also, won't matching the last entry generate the wrong values for EAX, EBX
> > and ECX, i.e. the valid values for the last index instead of zeroes?
>
> This entry has CH==0. According to the SDM, "For sub-leaves that
> return an invalid level-type of 0 in ECX[15:8]; EAX and EBX will
> return 0."
> ECX[7:0] will be wrong, but that's fixed up by the flag below.
> ECX[31:16] are reserved and perhaps should be cleared here, but I'm
> not sure how I would interpret it if those bits started being non-zero
> for the first leaf with CH==0.
>
> > > +                     entry[i - 1].flags |= KVM_CPUID_FLAG_CL_IS_PASSTHROUGH;
> >
> > Lastly, do we actually need to enumerate this silliness to userspace?
> > What if we handle this as a one-off case in CPUID emulation and avoid the
> > ABI breakage that way?  E.g.:
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index dd5985eb61b4..aaf5cdcb88c9 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -1001,6 +1001,16 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> >         }
> >
> >  out:
> > +       if (!best && (function == 0xb || function == 0x1f)) {
> > +               best = check_cpuid_limit(vcpu, function, 0);
> > +               if (best) {
> > +                       *eax = 0;
> > +                       *ebx = 0;
> > +                       *ecx &= 0xff;
> > +                       *edx = *best->edx;
> > +               }
> > +       }
> > +
>
> Aside from the fact that one should never call check_cpuid_limit on
> AMD systems (they don't do the "last basic leaf" nonsense), an
> approach like this should work.

The above proposal doesn't correctly handle a leaf outside of ([0,
maxBasicLeaf] union [80000000H, maxExtendedLeaf]) , where maxBasicLeaf
== (0BH or 1FH) on Intel hardware...but it could be fixed up to do so,
if hard-coding this behavior in kvm_cpuid() is preferable to the API
breakage.
Xiaoyao Li Sept. 19, 2019, 5:31 a.m. UTC | #5
On 9/19/2019 2:41 AM, Jim Mattson wrote:
> On Wed, Sep 18, 2019 at 11:22 AM Jim Mattson <jmattson@google.com> wrote:
[...]>>>
>>> If I'm reading the code correctly, this is fragile and subtle.  The order
>>> of cpuid entries is controlled by userspace, which means that clearing
>>> KVM_CPUID_FLAG_SIGNIFCANT_INDEX depends on this entry being kept after all
>>> other entries for this function.  In practice I'm guessing userspaces
>>> naturally sort entries with the same function by ascending index, but it
>>> seems like avoidable issue.
>>
>> Though not documented, the CPUID leaf matching code has always
>> depended on ordering.
>>
>>> Also, won't matching the last entry generate the wrong values for EAX, EBX
>>> and ECX, i.e. the valid values for the last index instead of zeroes?
>>
>> This entry has CH==0. According to the SDM, "For sub-leaves that
>> return an invalid level-type of 0 in ECX[15:8]; EAX and EBX will
>> return 0."
>> ECX[7:0] will be wrong, but that's fixed up by the flag below.
>> ECX[31:16] are reserved and perhaps should be cleared here, but I'm
>> not sure how I would interpret it if those bits started being non-zero
>> for the first leaf with CH==0.
>>
>>>> +                     entry[i - 1].flags |= KVM_CPUID_FLAG_CL_IS_PASSTHROUGH;
>>>
>>> Lastly, do we actually need to enumerate this silliness to userspace?
>>> What if we handle this as a one-off case in CPUID emulation and avoid the
>>> ABI breakage that way?  E.g.:
>>>
>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>> index dd5985eb61b4..aaf5cdcb88c9 100644
>>> --- a/arch/x86/kvm/cpuid.c
>>> +++ b/arch/x86/kvm/cpuid.c
>>> @@ -1001,6 +1001,16 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>>>          }
>>>
>>>   out:
>>> +       if (!best && (function == 0xb || function == 0x1f)) {
>>> +               best = check_cpuid_limit(vcpu, function, 0);
>>> +               if (best) {
>>> +                       *eax = 0;
>>> +                       *ebx = 0;
>>> +                       *ecx &= 0xff;
>>> +                       *edx = *best->edx;
>>> +               }
>>> +       }
>>> +
>>

Hi Sean,
your proposal above seems not work. check_cpuid_limit(vcpu, 0xb/01f, 0) 
always return NULL, if there are valid 0xb/0x1f leaves in 
vcpu->arch.cpuid_entries[] (maxBaiscleaf >= 0xb/0x1f)

>> Aside from the fact that one should never call check_cpuid_limit on
>> AMD systems (they don't do the "last basic leaf" nonsense), an
>> approach like this should work.
> 
> The above proposal doesn't correctly handle a leaf outside of ([0,
> maxBasicLeaf] union [80000000H, maxExtendedLeaf]) , where maxBasicLeaf
> == (0BH or 1FH) on Intel hardware...but it could be fixed up to do so,
> if hard-coding this behavior in kvm_cpuid() is preferable to the API
> breakage.
> 

I vote for Sean's one-off case, how about something like this:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 22c2720cd948..6af5febf7b12 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -976,11 +976,23 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, 
u32 *ebx,
                u32 *ecx, u32 *edx, bool check_limit)
  {
         u32 function = *eax, index = *ecx;
-       struct kvm_cpuid_entry2 *best;
+       struct kvm_cpuid_entry2 *best, tmp;
         bool entry_found = true;

         best = kvm_find_cpuid_entry(vcpu, function, index);

+       if (!best && (fuction == 0xb || function == 0x1f) && index > 0) {
+               best = kvm_find_cpuid_entry(vcpu, function, 0);
+               if (best) {
+                       tmp.eax = 0;
+                       tmp.ebx = 0;
+                       tmp.ecx = index & 0xff;
+                       tmp.edx = best->edx;
+                       best = &tmp;
+                       goto out;
+               }
+       }
+
Jim Mattson Sept. 19, 2019, 6:26 p.m. UTC | #6
On Wed, Sep 18, 2019 at 10:31 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> I vote for Sean's one-off case, how about something like this:
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 22c2720cd948..6af5febf7b12 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -976,11 +976,23 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax,
> u32 *ebx,
>                 u32 *ecx, u32 *edx, bool check_limit)
>   {
>          u32 function = *eax, index = *ecx;
> -       struct kvm_cpuid_entry2 *best;
> +       struct kvm_cpuid_entry2 *best, tmp;
>          bool entry_found = true;
>
>          best = kvm_find_cpuid_entry(vcpu, function, index);
>
> +       if (!best && (fuction == 0xb || function == 0x1f) && index > 0) {
> +               best = kvm_find_cpuid_entry(vcpu, function, 0);
> +               if (best) {
> +                       tmp.eax = 0;
> +                       tmp.ebx = 0;
> +                       tmp.ecx = index & 0xff;
> +                       tmp.edx = best->edx;
> +                       best = &tmp;
> +                       goto out;
> +               }
> +       }
> +

I don't believe this works for the case where 0BH or 1FH is the
maximum basic leaf, in which case all out-of-range leaves should have
this behavior. But I'll go ahead and work up a solution using this
two-off :-) approach.
Paolo Bonzini Sept. 24, 2019, 2:01 p.m. UTC | #7
On 18/09/19 20:22, Jim Mattson wrote:
> Aside from the fact that one should never call check_cpuid_limit on
> AMD systems (they don't do the "last basic leaf" nonsense), an
> approach like this should work.

Yeah, I agree it's enough.  If there's a complex or really weird
behavior that userspace would most definitely get wrong, we should
design the API to simplify its job.

Paolo

Patch
diff mbox series

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 2d067767b6170..be5cc42ad35f4 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -1396,6 +1396,7 @@  struct kvm_cpuid2 {
 #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
 #define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
 #define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
+#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH	BIT(3)
 
 struct kvm_cpuid_entry2 {
 	__u32 function;
@@ -1447,6 +1448,8 @@  emulate them efficiently. The fields in each entry are defined as follows:
         KVM_CPUID_FLAG_STATE_READ_NEXT:
            for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
            the first entry to be read by a cpu
+	KVM_CPUID_FLAG_CL_IS_PASSTHROUGH:
+	   If the output value of ECX[7:0] matches the input value of ECX[7:0]
    eax, ebx, ecx, edx: the values returned by the cpuid instruction for
          this function/index combination
 
@@ -2992,6 +2995,7 @@  The member 'flags' is used for passing flags from userspace.
 #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
 #define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
 #define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
+#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH	BIT(3)
 
 struct kvm_cpuid_entry2 {
 	__u32 function;
@@ -3040,6 +3044,8 @@  The fields in each entry are defined as follows:
         KVM_CPUID_FLAG_STATE_READ_NEXT:
            for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
            the first entry to be read by a cpu
+	KVM_CPUID_FLAG_CL_IS_PASSTHROUGH:
+	   If the output value of ECX[7:0] matches the input value of ECX[7:0]
    eax, ebx, ecx, edx: the values returned by the cpuid instruction for
          this function/index combination
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 503d3f42da167..3b67d21123946 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -223,6 +223,7 @@  struct kvm_cpuid_entry2 {
 #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		(1 << 0)
 #define KVM_CPUID_FLAG_STATEFUL_FUNC		(1 << 1)
 #define KVM_CPUID_FLAG_STATE_READ_NEXT		(1 << 2)
+#define KVM_CPUID_FLAG_CL_IS_PASSTHROUGH	(1 << 3)
 
 /* for KVM_SET_CPUID2 */
 struct kvm_cpuid2 {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e7d25f4364664..280a796159cb2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -612,19 +612,41 @@  static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 	 */
 	case 0x1f:
 	case 0xb: {
-		int i, level_type;
+		int i;
 
-		/* read more entries until level_type is zero */
-		for (i = 1; ; ++i) {
+		/*
+		 * We filled in entry[0] for CPUID(EAX=<function>,
+		 * ECX=00H) above.  If its level type (ECX[15:8]) is
+		 * zero, then the leaf is unimplemented, and we're
+		 * done.  Otherwise, continue to populate entries
+		 * until the level type (ECX[15:8]) of the previously
+		 * added entry is zero.
+		 */
+		for (i = 1; entry[i - 1].ecx & 0xff00; ++i) {
 			if (*nent >= maxnent)
 				goto out;
 
-			level_type = entry[i - 1].ecx & 0xff00;
-			if (!level_type)
-				break;
 			do_host_cpuid(&entry[i], function, i);
 			++*nent;
 		}
+
+		if (i > 1) {
+			/*
+			 * If this leaf has multiple entries, treat
+			 * the final entry as a "wildcard." Clear the
+			 * "significant index" flag so that the index
+			 * will be ignored when we later look for an
+			 * entry that matches a CPUID
+			 * invocation. Since this entry will now match
+			 * CPUID(EAX=<function>, ECX=<index>) for any
+			 * <index> >= (i - 1), set the "CL
+			 * passthrough" flag to ensure that ECX[7:0]
+			 * will be set to (<index> & 0xff), per the SDM.
+			 */
+			entry[i - 1].flags &= ~KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+			entry[i - 1].flags |= KVM_CPUID_FLAG_CL_IS_PASSTHROUGH;
+		}
+
 		break;
 	}
 	case 0xd: {
@@ -1001,8 +1023,11 @@  bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 		*ebx = best->ebx;
 		*ecx = best->ecx;
 		*edx = best->edx;
-	} else
+		if (best->flags & KVM_CPUID_FLAG_CL_IS_PASSTHROUGH)
+			*ecx = (*ecx & ~0xff) | (index & 0xff);
+	} else {
 		*eax = *ebx = *ecx = *edx = 0;
+	}
 	trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx, entry_found);
 	return entry_found;
 }