diff mbox series

KVM: x86: Add Intel CPUID.1F cpuid emulation support

Message ID 1555915234-2536-1-git-send-email-like.xu@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Add Intel CPUID.1F cpuid emulation support | expand

Commit Message

Like Xu April 22, 2019, 6:40 a.m. UTC
Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
host system has multiple software-visible die within each package.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Konrad Rzeszutek Wilk April 22, 2019, 3:53 p.m. UTC | #1
On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
> Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
> host system has multiple software-visible die within each package.

Is there some doc on this?

The https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration
has a date of 2012.


> 
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
>  arch/x86/kvm/cpuid.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index fd39516..9fc14f2 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
>  	return xcr0;
>  }
>  
> +/* We need to check if the host cpu has multi-chip packaging technology. */
> +static bool kvm_supported_intel_mcp(void)
> +{
> +	u32 eax, ignored;
> +
> +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
> +
> +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
> +}
> +
>  #define F(x) bit(X86_FEATURE_##x)
>  
>  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  	switch (function) {
>  	case 0:
>  		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
>  		break;
>  	case 1:
>  		entry->edx &= kvm_cpuid_1_edx_x86_features;
> @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		entry->edx = edx.full;
>  		break;
>  	}
> +	/* function 0x1f has additional index. */
> +	case 0x1f:
>  	/* function 0xb has additional index. */
>  	case 0xb: {
>  		int i, level_type;
> -- 
> 1.8.3.1
>
Sean Christopherson April 22, 2019, 4:56 p.m. UTC | #2
On Mon, Apr 22, 2019 at 11:53:33AM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
> > Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
> > host system has multiple software-visible die within each package.
> 
> Is there some doc on this?

It's in recent versions of the SDM, in the massive "Information Returned
by CPUID Instruction" table.
Sean Christopherson April 22, 2019, 6:35 p.m. UTC | #3
On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
> Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
> host system has multiple software-visible die within each package.
> 
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
>  arch/x86/kvm/cpuid.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index fd39516..9fc14f2 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
>  	return xcr0;
>  }
>  
> +/* We need to check if the host cpu has multi-chip packaging technology. */
> +static bool kvm_supported_intel_mcp(void)
> +{
> +	u32 eax, ignored;
> +
> +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
> +
> +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
> +}
> +
>  #define F(x) bit(X86_FEATURE_##x)
>  
>  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  	switch (function) {
>  	case 0:
>  		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;

This all seems unnecessary.  And by 'all', I mean the existing Intel PT
and XSAVE leaf checks, as well as the new mcp check.  entry->eax comes
directly from hardware, and unless I missed something, PT and XSAVE are
only exposed to the guest when they're supported in hardware.  In other
words, KVM will never need to adjust entry->eax to expose PT or XSAVE.

The original min() check was added by commit 0771671749b5 ("KVM: Enhance
guest cpuid management"), which doesn't provide any explicit information
on why KVM does min() in the first place.  Given that the original code
was "entry->eax = min(entry->eax, (u32)0xb);", my *guess* is that the
idea was to always report "Extended Topology Enumeration Leaf" as
supported so that userspace can enumerate the VM's topology to the guest
even when hardware itself doesn't do so.

Assuming we want to allow userspace to use "V2 Extended Topology
Enumeration Leaf" regardless of hardware support, then this can simply be:

  entry->eax = min(entry->eax, (u32)0xf);

Or am I completely missing something?

>  		break;
>  	case 1:
>  		entry->edx &= kvm_cpuid_1_edx_x86_features;
> @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		entry->edx = edx.full;
>  		break;
>  	}
> +	/* function 0x1f has additional index. */

The original comment is rather useless, it's obvious from the code that
it has additional indices.  No need to repeat its sins.  A more useful
comment would be to explain that 0x1f and 0xb have identical formats and
thus can be handled by common code.

Which begs the question, why does leaf 0x1f exist?  AFAICT the only
difference is that 0x1f supports additional "level types", but 0x1f's
types are backwards compatibile.  Any idea why leaf 0xb wasn't simply
extended for the new types?

> +	case 0x1f:
>  	/* function 0xb has additional index. */
>  	case 0xb: {
>  		int i, level_type;
> -- 
> 1.8.3.1
>
Like Xu April 23, 2019, 3:23 a.m. UTC | #4
On 2019/4/23 2:35, Sean Christopherson wrote:
> On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
>> Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
>> host system has multiple software-visible die within each package.
>>
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> ---
>>   arch/x86/kvm/cpuid.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index fd39516..9fc14f2 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
>>   	return xcr0;
>>   }
>>   
>> +/* We need to check if the host cpu has multi-chip packaging technology. */
>> +static bool kvm_supported_intel_mcp(void)
>> +{
>> +	u32 eax, ignored;
>> +
>> +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
>> +
>> +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
>> +}
>> +
>>   #define F(x) bit(X86_FEATURE_##x)
>>   
>>   int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>>   	switch (function) {
>>   	case 0:
>>   		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>> +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
> 
> This all seems unnecessary.  And by 'all', I mean the existing Intel PT
> and XSAVE leaf checks, as well as the new mcp check.  entry->eax comes
> directly from hardware, and unless I missed something, PT and XSAVE are
> only exposed to the guest when they're supported in hardware.  In other
> words, KVM will never need to adjust entry->eax to expose PT or XSAVE.

We call this function for both case KVM_GET_SUPPORTED_CPUID and 
KVM_GET_EMULATED_CPUID although kvm user could reconfig them via 
KVM_SET_CPUID* path.

> 
> The original min() check was added by commit 0771671749b5 ("KVM: Enhance
> guest cpuid management"), which doesn't provide any explicit information
> on why KVM does min() in the first place.  

Exposing cpuid.0.eax in a blind way (with host hardware support)
is not a good practice for guest migration and improves compatibility 
requirements.

> Given that the original code
> was "entry->eax = min(entry->eax, (u32)0xb);", my *guess* is that the
> idea was to always report "Extended Topology Enumeration Leaf" as
> supported so that userspace can enumerate the VM's topology to the guest
> even when hardware itself doesn't do so.

If the host cpu mode is too antiquated to support 0xb, it wouldn't 
report 0xb for sure. The host cpuid.0.eax has been over 0xb for a long 
time and reached 0x1f in the latest SDM.

AFAICT, the original code keeps minimum cpuid.0.eax out of features 
guest just used or at least it claimed to use.

> 
> Assuming we want to allow userspace to use "V2 Extended Topology
> Enumeration Leaf" regardless of hardware support, then this can simply be:
> 
>    entry->eax = min(entry->eax, (u32)0x1f);
> 
> Or am I completely missing something?
> 
>>   		break;
>>   	case 1:
>>   		entry->edx &= kvm_cpuid_1_edx_x86_features;
>> @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>>   		entry->edx = edx.full;
>>   		break;
>>   	}
>> +	/* function 0x1f has additional index. */
> 
> The original comment is rather useless, it's obvious from the code that
> it has additional indices.  No need to repeat its sins.  A more useful
> comment would be to explain that 0x1f and 0xb have identical formats and
> thus can be handled by common code.

I agree and let me fix it in next version.

> 
> Which begs the question, why does leaf 0x1f exist?  AFAICT the only
> difference is that 0x1f supports additional "level types", but 0x1f's
> types are backwards compatibile.  Any idea why leaf 0xb wasn't simply
> extended for the new types?

It's not just about backwards compatibility on numerical parsing.

So many software (whatever OS and applications) are using 0x1b
to get CPU topology. In most cases, they (legacy code) would assume that 
the next level of CORE is package (at lease for Intel) not die and it's 
a semantic conflict if we reuse 0xb.

As said in SDM, Intel recommends first checking for the existence of 
Leaf 1FH and using this if available.

> 
>> +	case 0x1f:
>>   	/* function 0xb has additional index. */
>>   	case 0xb: {
>>   		int i, level_type;
>> -- 
>> 1.8.3.1
>>
>
Sean Christopherson April 23, 2019, 5:44 p.m. UTC | #5
On Tue, Apr 23, 2019 at 11:23:59AM +0800, Like Xu wrote:
> On 2019/4/23 2:35, Sean Christopherson wrote:
> >>  #define F(x) bit(X86_FEATURE_##x)
> >>  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> >>@@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> >>  	switch (function) {
> >>  	case 0:
> >>  		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> >>+		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
> >
> >This all seems unnecessary.  And by 'all', I mean the existing Intel PT
> >and XSAVE leaf checks, as well as the new mcp check.  entry->eax comes
> >directly from hardware, and unless I missed something, PT and XSAVE are
> >only exposed to the guest when they're supported in hardware.  In other
> >words, KVM will never need to adjust entry->eax to expose PT or XSAVE.
> 
> We call this function for both case KVM_GET_SUPPORTED_CPUID and
> KVM_GET_EMULATED_CPUID although kvm user could reconfig them via
> KVM_SET_CPUID* path.

Not that it matters, but __do_cpuid_ent() is only used for the non-emulated
case, KVM_GET_EMULATED_CPUID goes to __do_cpuid_ent_emulated().
 
> >The original min() check was added by commit 0771671749b5 ("KVM: Enhance
> >guest cpuid management"), which doesn't provide any explicit information
> >on why KVM does min() in the first place.
> 
> Exposing cpuid.0.eax in a blind way (with host hardware support)
> is not a good practice for guest migration and improves compatibility
> requirements.

Right, but isn't the f_intel_pt check for example completely irrelevant?
f_intel_pt is true if and only if hardware supports PT, i.e. CPUID.0.EAX
and thus entry->eax will already be >=0x14.

I don't fully understand whether or not KVM needs to raise the minimum to
0xb regardless of h/w XSAVE support, but it's likely irrelevant in the end.

Anyways, back to 0x1f, kvm_supported_intel_mcp() returns true if and only
if hardware's CPUID.0.EAX >= 0x1f, i.e. adjusting entry->eax is always a
nop.  So if KVM wants to advertise leaf 0x1f only when it's supported in
hardware then adjusting entry->eax is unnecessary, and if KVM wants to
unconditionally advertise 0x1f then adjusting entry->eax should also be
done unconditionally.

> >Given that the original code
> >was "entry->eax = min(entry->eax, (u32)0xb);", my *guess* is that the
> >idea was to always report "Extended Topology Enumeration Leaf" as
> >supported so that userspace can enumerate the VM's topology to the guest
> >even when hardware itself doesn't do so.
> 
> If the host cpu mode is too antiquated to support 0xb, it wouldn't report
> 0xb for sure. The host cpuid.0.eax has been over 0xb for a long time and
> reached 0x1f in the latest SDM.
> 
> AFAICT, the original code keeps minimum cpuid.0.eax out of features guest
> just used or at least it claimed to use.
> 
> >
> >Assuming we want to allow userspace to use "V2 Extended Topology
> >Enumeration Leaf" regardless of hardware support, then this can simply be:
> >
> >   entry->eax = min(entry->eax, (u32)0x1f);
> >
> >Or am I completely missing something?
Like Xu April 24, 2019, 1:59 a.m. UTC | #6
On 2019/4/24 1:44, Sean Christopherson wrote:
> On Tue, Apr 23, 2019 at 11:23:59AM +0800, Like Xu wrote:
>> On 2019/4/23 2:35, Sean Christopherson wrote:
>>>>   #define F(x) bit(X86_FEATURE_##x)
>>>>   int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>>>> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>>>>   	switch (function) {
>>>>   	case 0:
>>>>   		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>>>> +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
>>>
>>> This all seems unnecessary.  And by 'all', I mean the existing Intel PT
>>> and XSAVE leaf checks, as well as the new mcp check.  entry->eax comes
>>> directly from hardware, and unless I missed something, PT and XSAVE are
>>> only exposed to the guest when they're supported in hardware.  In other
>>> words, KVM will never need to adjust entry->eax to expose PT or XSAVE.
>>
>> We call this function for both case KVM_GET_SUPPORTED_CPUID and
>> KVM_GET_EMULATED_CPUID although kvm user could reconfig them via
>> KVM_SET_CPUID* path.
> 
> Not that it matters, but __do_cpuid_ent() is only used for the non-emulated
> case, KVM_GET_EMULATED_CPUID goes to __do_cpuid_ent_emulated().

It's true and I have to mention we have two scenarios to get vCPUID:

1. For kvm_dev, we have KVM_GET_EMULATED_CPUID for 
kvm_dev_ioctl_get_cpuid; (we're talking about this)

2. For kvm_vcpu,we have KVM_GET_CPUID2 for kvm_vcpu_ioctl_get_cpuid2;

>   
>>> The original min() check was added by commit 0771671749b5 ("KVM: Enhance
>>> guest cpuid management"), which doesn't provide any explicit information
>>> on why KVM does min() in the first place.
>>
>> Exposing cpuid.0.eax in a blind way (with host hardware support)
>> is not a good practice for guest migration and improves compatibility
>> requirements.
> 
> Right, but isn't the f_intel_pt check for example completely irrelevant?
> f_intel_pt is true if and only if hardware supports PT, i.e. CPUID.0.EAX
> and thus entry->eax will already be >=0x14.

The f_intel_pt check is not only about hardware supports check but also 
module_param (pt_mode) supports check.

So the case is the host does have PT support which means (host 
CPUID.0.EAX already be >=0x14 for Intel CPUs) but kvm doesn't want 
advertise it and thus the min() operation is needed.

> 
> I don't fully understand whether or not KVM needs to raise the minimum to
> 0xb regardless of h/w XSAVE support, but it's likely irrelevant in the end.
> 
> Anyways, back to 0x1f, kvm_supported_intel_mcp() returns true if and only
> if hardware's CPUID.0.EAX >= 0x1f, 

According to latest SDM, the max hardware CPUID.0.EAX is 0x1f and BIOS 
would expose 0x1f only for multi-chip packaging CPUs (at least for now).

> i.e. adjusting entry->eax is always a
> nop.  So if KVM wants to advertise leaf 0x1f only when it's supported in
> hardware then adjusting entry->eax is unnecessary, and if KVM wants to
> unconditionally advertise 0x1f then adjusting entry->eax should also be
> done unconditionally.

It we have no check on kvm_supported_intel_mcp() in legacy code,
CPUID.0.EAX would be min() and thus less than 0x1f which means the 
cpuid.1f info is not exposed.

I know your point is to avoid min() totally (I thought so at the time) 
and I have pointed out it's necessary for kvm features setting.

If KVM wants to unconditionally advertise 0x1f (in EMULATED way),
kvm needs cover other side effects and this patch only advertises 0x1f
when hardware has it.

It's very common that guest wants to set 0x1f regardless of h/w support
and this is another story.

> 
>>> Given that the original code
>>> was "entry->eax = min(entry->eax, (u32)0xb);", my *guess* is that the
>>> idea was to always report "Extended Topology Enumeration Leaf" as
>>> supported so that userspace can enumerate the VM's topology to the guest
>>> even when hardware itself doesn't do so.
>>
>> If the host cpu mode is too antiquated to support 0xb, it wouldn't report
>> 0xb for sure. The host cpuid.0.eax has been over 0xb for a long time and
>> reached 0x1f in the latest SDM.
>>
>> AFAICT, the original code keeps minimum cpuid.0.eax out of features guest
>> just used or at least it claimed to use.
>>
>>>
>>> Assuming we want to allow userspace to use "V2 Extended Topology
>>> Enumeration Leaf" regardless of hardware support, then this can simply be:
>>>
>>>    entry->eax = min(entry->eax, (u32)0x1f);
>>>
>>> Or am I completely missing something?
>
Sean Christopherson April 24, 2019, 1:53 p.m. UTC | #7
On Wed, Apr 24, 2019 at 09:59:50AM +0800, Like Xu wrote:
> On 2019/4/24 1:44, Sean Christopherson wrote:
> >Right, but isn't the f_intel_pt check for example completely irrelevant?
> >f_intel_pt is true if and only if hardware supports PT, i.e. CPUID.0.EAX
> >and thus entry->eax will already be >=0x14.
> 
> The f_intel_pt check is not only about hardware supports check but also
> module_param (pt_mode) supports check.
> 
> So the case is the host does have PT support which means (host CPUID.0.EAX
> already be >=0x14 for Intel CPUs) but kvm doesn't want advertise it and thus
> the min() operation is needed.
> 
> >
> >I don't fully understand whether or not KVM needs to raise the minimum to
> >0xb regardless of h/w XSAVE support, but it's likely irrelevant in the end.
> >
> >Anyways, back to 0x1f, kvm_supported_intel_mcp() returns true if and only
> >if hardware's CPUID.0.EAX >= 0x1f,
> 
> According to latest SDM, the max hardware CPUID.0.EAX is 0x1f and BIOS would
> expose 0x1f only for multi-chip packaging CPUs (at least for now).
> 
> >i.e. adjusting entry->eax is always a
> >nop.  So if KVM wants to advertise leaf 0x1f only when it's supported in
> >hardware then adjusting entry->eax is unnecessary, and if KVM wants to
> >unconditionally advertise 0x1f then adjusting entry->eax should also be
> >done unconditionally.
> 
> It we have no check on kvm_supported_intel_mcp() in legacy code,
> CPUID.0.EAX would be min() and thus less than 0x1f which means the cpuid.1f
> info is not exposed.

Ah crud, I'm an idiot.  I just spent two days conflating min() and max().
So yeah, everything makes total sense now.  My apologies for wasting your
time, I'll re-review the patch. 

> 
> I know your point is to avoid min() totally (I thought so at the time) and I
> have pointed out it's necessary for kvm features setting.
> 
> If KVM wants to unconditionally advertise 0x1f (in EMULATED way),
> kvm needs cover other side effects and this patch only advertises 0x1f
> when hardware has it.
> 
> It's very common that guest wants to set 0x1f regardless of h/w support
> and this is another story.
> 
> >
> >>>Given that the original code
> >>>was "entry->eax = min(entry->eax, (u32)0xb);", my *guess* is that the
> >>>idea was to always report "Extended Topology Enumeration Leaf" as
> >>>supported so that userspace can enumerate the VM's topology to the guest
> >>>even when hardware itself doesn't do so.
> >>
> >>If the host cpu mode is too antiquated to support 0xb, it wouldn't report
> >>0xb for sure. The host cpuid.0.eax has been over 0xb for a long time and
> >>reached 0x1f in the latest SDM.
> >>
> >>AFAICT, the original code keeps minimum cpuid.0.eax out of features guest
> >>just used or at least it claimed to use.
> >>
> >>>
> >>>Assuming we want to allow userspace to use "V2 Extended Topology
> >>>Enumeration Leaf" regardless of hardware support, then this can simply be:
> >>>
> >>>   entry->eax = min(entry->eax, (u32)0x1f);
> >>>
> >>>Or am I completely missing something?
> >
>
Sean Christopherson April 24, 2019, 2:32 p.m. UTC | #8
Now that I understand how min() works...

On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
> Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
> host system has multiple software-visible die within each package.
> 
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
>  arch/x86/kvm/cpuid.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index fd39516..9fc14f2 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
>  	return xcr0;
>  }
>  
> +/* We need to check if the host cpu has multi-chip packaging technology. */
> +static bool kvm_supported_intel_mcp(void)
> +{
> +	u32 eax, ignored;
> +
> +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);

This is broken because of how CPUID works for unsupported input leafs:

  If a value entered for CPUID.EAX is higher than the maximum input value
  for basic or extended function for that processor then the data for the
  highest basic information leaf is returned. 

For example, my system with a max basic leaf of 0x16 returns 0x00000e74
for CPUID.1F.EAX.

> +
> +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);

Checking 'eax != 0' is broken as it will be '0' when SMT is disabled.  ecx
is the obvious choice since bits 15:8 are guaranteed to be non-zero when
the leaf is valid.

I think we can skip the vendor check.  AFAIK, CPUID.1F isn't used by AMD,
and since AMD and Intel try to maintain a semblance of CPUID compatibility
it seems more likely that AMD/Hygon would implement CPUID.1F as-is rather
than repurpose it to mean something else entirely.

> +}
> +
>  #define F(x) bit(X86_FEATURE_##x)
>  
>  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  	switch (function) {
>  	case 0:
>  		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;

If we put everything together, I think the code can be reduced to:

		/* comment about multi-chip leaf... */
		if (entry->eax >= 0x1f && cpuid_ecx(0x1f))
			entry->eax = 0x1f;
		else
			entry->eax = min(entry->eax,
					 (u32)(f_intel_pt ? 0x14 : 0xd));
>  		break;
>  	case 1:
>  		entry->edx &= kvm_cpuid_1_edx_x86_features;
> @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		entry->edx = edx.full;
>  		break;
>  	}
> +	/* function 0x1f has additional index. */
> +	case 0x1f:
>  	/* function 0xb has additional index. */
>  	case 0xb: {
>  		int i, level_type;
> -- 
> 1.8.3.1
>
Like Xu April 25, 2019, 2:58 a.m. UTC | #9
On 2019/4/24 22:32, Sean Christopherson wrote:
> Now that I understand how min() works...
> 
> On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
>> Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
>> host system has multiple software-visible die within each package.
>>
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> ---
>>   arch/x86/kvm/cpuid.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index fd39516..9fc14f2 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
>>   	return xcr0;
>>   }
>>   
>> +/* We need to check if the host cpu has multi-chip packaging technology. */
>> +static bool kvm_supported_intel_mcp(void)
>> +{
>> +	u32 eax, ignored;
>> +
>> +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
> 
> This is broken because of how CPUID works for unsupported input leafs:
> 
>    If a value entered for CPUID.EAX is higher than the maximum input value
>    for basic or extended function for that processor then the data for the
>    highest basic information leaf is returned.
> 
> For example, my system with a max basic leaf of 0x16 returns 0x00000e74
> for CPUID.1F.EAX.

You're right and the cpuid.1f.eax check is unreliable after I checked a 
few machines.

> 
>> +
>> +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
> 
> Checking 'eax != 0' is broken as it will be '0' when SMT is disabled.  ecx
> is the obvious choice since bits 15:8 are guaranteed to be non-zero when
> the leaf is valid.

I agree with this and ecx[15:8] makes sense.

> 
> I think we can skip the vendor check.  AFAIK, CPUID.1F isn't used by AMD,
> and since AMD and Intel try to maintain a semblance of CPUID compatibility
> it seems more likely that AMD/Hygon would implement CPUID.1F as-is rather
> than repurpose it to mean something else entirely.

If it's true, let's skip the vendor check.

// I have to mention that AMD already has MCP CPUs.

> 
>> +}
>> +
>>   #define F(x) bit(X86_FEATURE_##x)
>>   
>>   int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>>   	switch (function) {
>>   	case 0:
>>   		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>> +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
> 
> If we put everything together, I think the code can be reduced to:
> 
> 		/* comment about multi-chip leaf... */
> 		if (entry->eax >= 0x1f && cpuid_ecx(0x1f))
> 			entry->eax = 0x1f;
> 		else
> 			entry->eax = min(entry->eax,
> 					 (u32)(f_intel_pt ? 0x14 : 0xd));

Based on:

	ECX Bits 07 - 00: Level number. Same value in ECX input.
	Bits 15 - 08: Level type.
	Bits 31 - 16: Reserved.

how about using an increasing order:

	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));

	// ... more checks when eax is between 0x14 and 0x1f if any

	/* Check if the host cpu has multi-chip packaging technology.*/
	if (((cpuid_ecx(0x1f) >> 8) & 0xff) != 0)
		entry->eax = 0x1f;

	// ... more checks when eax greater than 0x1f if any

are we OK with it?

>>   		break;
>>   	case 1:
>>   		entry->edx &= kvm_cpuid_1_edx_x86_features;
>> @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>>   		entry->edx = edx.full;
>>   		break;
>>   	}
>> +	/* function 0x1f has additional index. */
>> +	case 0x1f:
>>   	/* function 0xb has additional index. */
>>   	case 0xb: {
>>   		int i, level_type;
>> -- 
>> 1.8.3.1
>>
>
Xiaoyao Li April 25, 2019, 4:18 a.m. UTC | #10
On Thu, 2019-04-25 at 10:58 +0800, Like Xu wrote:
> On 2019/4/24 22:32, Sean Christopherson wrote:
> > Now that I understand how min() works...
> > 
> > On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
> > > Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
> > > host system has multiple software-visible die within each package.
> > > 
> > > Signed-off-by: Like Xu <like.xu@linux.intel.com>
> > > ---
> > >   arch/x86/kvm/cpuid.c | 13 +++++++++++++
> > >   1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index fd39516..9fc14f2 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > >   	return xcr0;
> > >   }
> > >   
> > > +/* We need to check if the host cpu has multi-chip packaging technology.
> > > */
> > > +static bool kvm_supported_intel_mcp(void)
> > > +{
> > > +	u32 eax, ignored;
> > > +
> > > +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
> > 
> > This is broken because of how CPUID works for unsupported input leafs:
> > 
> >    If a value entered for CPUID.EAX is higher than the maximum input value
> >    for basic or extended function for that processor then the data for the
> >    highest basic information leaf is returned.
> > 
> > For example, my system with a max basic leaf of 0x16 returns 0x00000e74
> > for CPUID.1F.EAX.
> 
> You're right and the cpuid.1f.eax check is unreliable after I checked a 
> few machines.
> 
> > 
> > > +
> > > +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
> > 
> > Checking 'eax != 0' is broken as it will be '0' when SMT is disabled.  ecx
> > is the obvious choice since bits 15:8 are guaranteed to be non-zero when
> > the leaf is valid.
> 
> I agree with this and ecx[15:8] makes sense.
> 
> > 
> > I think we can skip the vendor check.  AFAIK, CPUID.1F isn't used by AMD,
> > and since AMD and Intel try to maintain a semblance of CPUID compatibility
> > it seems more likely that AMD/Hygon would implement CPUID.1F as-is rather
> > than repurpose it to mean something else entirely.
> 
> If it's true, let's skip the vendor check.
> 
> // I have to mention that AMD already has MCP CPUs.
> 
> > 
> > > +}
> > > +
> > >   #define F(x) bit(X86_FEATURE_##x)
> > >   
> > >   int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> > > @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct
> > > kvm_cpuid_entry2 *entry, u32 function,
> > >   	switch (function) {
> > >   	case 0:
> > >   		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 :
> > > 0xd));
> > > +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
> > 
> > If we put everything together, I think the code can be reduced to:
> > 
> > 		/* comment about multi-chip leaf... */
> > 		if (entry->eax >= 0x1f && cpuid_ecx(0x1f))
> > 			entry->eax = 0x1f;
> > 		else
> > 			entry->eax = min(entry->eax,
> > 					 (u32)(f_intel_pt ? 0x14 : 0xd));
> 
> Based on:
> 
> 	ECX Bits 07 - 00: Level number. Same value in ECX input.
> 	Bits 15 - 08: Level type.
> 	Bits 31 - 16: Reserved.
> 
> how about using an increasing order:
> 
> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> 
> 	// ... more checks when eax is between 0x14 and 0x1f if any
> 
> 	/* Check if the host cpu has multi-chip packaging technology.*/
> 	if (((cpuid_ecx(0x1f) >> 8) & 0xff) != 0)
> 		entry->eax = 0x1f;

As Sean pointed out, you cannot rely on the output of cpuid.1f to indicate the
existence of leaf 1f. If maximum basic leaf supported is smaller than 1f, the
data returned by cpuid_ecx(0x1f) is the actual highest basic information leaf of
the hardware.
So using "entry->eax >= 0x1f" from cpuid.0H is and only is the right way to
check the existence of leaf 1f.

We can simply use (cpuid_ecx(0x1f) & 0x0000ff00) to avoid the unnecessory
shifting operation.
Besides, the problem of simply using cpuid_exc(0x1f) in Sean's codes is that we
cannot assmue the reserved bits 31:16 of ECX is always 0 for the future
generation.

In my opinion, Sean's codes is OK and much simple and clear.
All need to do is using (cpuid_ecx(0x1f) & 0x0000ff00) to verify the leaf.1f is
valid.

Thanks,
-Xiaoyao
> 	// ... more checks when eax greater than 0x1f if any
> 
> are we OK with it?
> 
> > >   		break;
> > >   	case 1:
> > >   		entry->edx &= kvm_cpuid_1_edx_x86_features;
> > > @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct
> > > kvm_cpuid_entry2 *entry, u32 function,
> > >   		entry->edx = edx.full;
> > >   		break;
> > >   	}
> > > +	/* function 0x1f has additional index. */
> > > +	case 0x1f:
> > >   	/* function 0xb has additional index. */
> > >   	case 0xb: {
> > >   		int i, level_type;
> > > -- 
> > > 1.8.3.1
> > > 
> 
>
Like Xu April 25, 2019, 6:02 a.m. UTC | #11
On 2019/4/25 12:18, Xiaoyao Li wrote:
> On Thu, 2019-04-25 at 10:58 +0800, Like Xu wrote:
>> On 2019/4/24 22:32, Sean Christopherson wrote:
>>> Now that I understand how min() works...
>>>
>>> On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
>>>> Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
>>>> host system has multiple software-visible die within each package.
>>>>
>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>> ---
>>>>    arch/x86/kvm/cpuid.c | 13 +++++++++++++
>>>>    1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>>> index fd39516..9fc14f2 100644
>>>> --- a/arch/x86/kvm/cpuid.c
>>>> +++ b/arch/x86/kvm/cpuid.c
>>>> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
>>>>    	return xcr0;
>>>>    }
>>>>    
>>>> +/* We need to check if the host cpu has multi-chip packaging technology.
>>>> */
>>>> +static bool kvm_supported_intel_mcp(void)
>>>> +{
>>>> +	u32 eax, ignored;
>>>> +
>>>> +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
>>>
>>> This is broken because of how CPUID works for unsupported input leafs:
>>>
>>>     If a value entered for CPUID.EAX is higher than the maximum input value
>>>     for basic or extended function for that processor then the data for the
>>>     highest basic information leaf is returned.
>>>
>>> For example, my system with a max basic leaf of 0x16 returns 0x00000e74
>>> for CPUID.1F.EAX.
>>
>> You're right and the cpuid.1f.eax check is unreliable after I checked a
>> few machines.
>>
>>>
>>>> +
>>>> +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
>>>
>>> Checking 'eax != 0' is broken as it will be '0' when SMT is disabled.  ecx
>>> is the obvious choice since bits 15:8 are guaranteed to be non-zero when
>>> the leaf is valid.
>>
>> I agree with this and ecx[15:8] makes sense.
>>
>>>
>>> I think we can skip the vendor check.  AFAIK, CPUID.1F isn't used by AMD,
>>> and since AMD and Intel try to maintain a semblance of CPUID compatibility
>>> it seems more likely that AMD/Hygon would implement CPUID.1F as-is rather
>>> than repurpose it to mean something else entirely.
>>
>> If it's true, let's skip the vendor check.
>>
>> // I have to mention that AMD already has MCP CPUs.
>>
>>>
>>>> +}
>>>> +
>>>>    #define F(x) bit(X86_FEATURE_##x)
>>>>    
>>>>    int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>>>> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct
>>>> kvm_cpuid_entry2 *entry, u32 function,
>>>>    	switch (function) {
>>>>    	case 0:
>>>>    		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 :
>>>> 0xd));
>>>> +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
>>>
>>> If we put everything together, I think the code can be reduced to:
>>>
>>> 		/* comment about multi-chip leaf... */
>>> 		if (entry->eax >= 0x1f && cpuid_ecx(0x1f))
>>> 			entry->eax = 0x1f;
>>> 		else
>>> 			entry->eax = min(entry->eax,
>>> 					 (u32)(f_intel_pt ? 0x14 : 0xd));
>>
>> Based on:
>>
>> 	ECX Bits 07 - 00: Level number. Same value in ECX input.
>> 	Bits 15 - 08: Level type.
>> 	Bits 31 - 16: Reserved.
>>
>> how about using an increasing order:
>>
>> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>>
>> 	// ... more checks when eax is between 0x14 and 0x1f if any
>>
>> 	/* Check if the host cpu has multi-chip packaging technology.*/
>> 	if (((cpuid_ecx(0x1f) >> 8) & 0xff) != 0)
>> 		entry->eax = 0x1f;
> 
> As Sean pointed out, you cannot rely on the output of cpuid.1f to indicate the
> existence of leaf 1f. If maximum basic leaf supported is smaller than 1f, the
> data returned by cpuid_ecx(0x1f) is the actual highest basic information leaf of
> the hardware.

I don't think so.

> So using "entry->eax >= 0x1f" from cpuid.0H is and only is the right way to
> check the existence of leaf 1f.
> 
> We can simply use (cpuid_ecx(0x1f) & 0x0000ff00) to avoid the unnecessory
> shifting operation.

I borrowed this "unnecessory" shifting operation from host 
check_extended_topology_leaf() and we may do better on this.

> Besides, the problem of simply using cpuid_exc(0x1f) in Sean's codes is that we
> cannot assmue the reserved bits 31:16 of ECX is always 0 for the future
> generation.

It's true cause the statement in public spec is not "Reserved = 0" but 
"Bits 31 - 16: Reserved".

> 
> In my opinion, Sean's codes is OK and much simple and clear.

If the host cpuid.0.eax is greater than 0x1f but actually it doesn't
have multi-chip packaging technology and we may want to expose 
entry->eax to some value smaller than 0x1f but greater than 0x14, much 
effort needs to apply on Sean's code.

My improvement is good to overwrite cpuid.0.eax in future usage
from the perspective of kvm feature setting not just from value check.

> All need to do is using (cpuid_ecx(0x1f) & 0x0000ff00) to verify the leaf.1f is
> valid.
> 
> Thanks,
> -Xiaoyao
>> 	// ... more checks when eax greater than 0x1f if any
>>
>> are we OK with it?
>>
>>>>    		break;
>>>>    	case 1:
>>>>    		entry->edx &= kvm_cpuid_1_edx_x86_features;
>>>> @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct
>>>> kvm_cpuid_entry2 *entry, u32 function,
>>>>    		entry->edx = edx.full;
>>>>    		break;
>>>>    	}
>>>> +	/* function 0x1f has additional index. */
>>>> +	case 0x1f:
>>>>    	/* function 0xb has additional index. */
>>>>    	case 0xb: {
>>>>    		int i, level_type;
>>>> -- 
>>>> 1.8.3.1
>>>>
>>
>>
> 
>
Xiaoyao Li April 25, 2019, 6:30 a.m. UTC | #12
On Thu, 2019-04-25 at 14:02 +0800, Like Xu wrote:
> On 2019/4/25 12:18, Xiaoyao Li wrote:
> > On Thu, 2019-04-25 at 10:58 +0800, Like Xu wrote:
> > > On 2019/4/24 22:32, Sean Christopherson wrote:
> > > > Now that I understand how min() works...
> > > > 
> > > > On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
> > > > > Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
> > > > > host system has multiple software-visible die within each package.
> > > > > 
> > > > > Signed-off-by: Like Xu <like.xu@linux.intel.com>
> > > > > ---
> > > > >    arch/x86/kvm/cpuid.c | 13 +++++++++++++
> > > > >    1 file changed, 13 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > > index fd39516..9fc14f2 100644
> > > > > --- a/arch/x86/kvm/cpuid.c
> > > > > +++ b/arch/x86/kvm/cpuid.c
> > > > > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > > > >    	return xcr0;
> > > > >    }
> > > > >    
> > > > > +/* We need to check if the host cpu has multi-chip packaging
> > > > > technology.
> > > > > */
> > > > > +static bool kvm_supported_intel_mcp(void)
> > > > > +{
> > > > > +	u32 eax, ignored;
> > > > > +
> > > > > +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
> > > > 
> > > > This is broken because of how CPUID works for unsupported input leafs:
> > > > 
> > > >     If a value entered for CPUID.EAX is higher than the maximum input
> > > > value
> > > >     for basic or extended function for that processor then the data for
> > > > the
> > > >     highest basic information leaf is returned.
> > > > 
> > > > For example, my system with a max basic leaf of 0x16 returns 0x00000e74
> > > > for CPUID.1F.EAX.
> > > 
> > > You're right and the cpuid.1f.eax check is unreliable after I checked a
> > > few machines.
> > > 
> > > > 
> > > > > +
> > > > > +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax !=
> > > > > 0);
> > > > 
> > > > Checking 'eax != 0' is broken as it will be '0' when SMT is
> > > > disabled.  ecx
> > > > is the obvious choice since bits 15:8 are guaranteed to be non-zero when
> > > > the leaf is valid.
> > > 
> > > I agree with this and ecx[15:8] makes sense.
> > > 
> > > > 
> > > > I think we can skip the vendor check.  AFAIK, CPUID.1F isn't used by
> > > > AMD,
> > > > and since AMD and Intel try to maintain a semblance of CPUID
> > > > compatibility
> > > > it seems more likely that AMD/Hygon would implement CPUID.1F as-is
> > > > rather
> > > > than repurpose it to mean something else entirely.
> > > 
> > > If it's true, let's skip the vendor check.
> > > 
> > > // I have to mention that AMD already has MCP CPUs.
> > > 
> > > > 
> > > > > +}
> > > > > +
> > > > >    #define F(x) bit(X86_FEATURE_##x)
> > > > >    
> > > > >    int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> > > > > @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct
> > > > > kvm_cpuid_entry2 *entry, u32 function,
> > > > >    	switch (function) {
> > > > >    	case 0:
> > > > >    		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 :
> > > > > 0xd));
> > > > > +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry-
> > > > > >eax;
> > > > 
> > > > If we put everything together, I think the code can be reduced to:
> > > > 
> > > > 		/* comment about multi-chip leaf... */
> > > > 		if (entry->eax >= 0x1f && cpuid_ecx(0x1f))
> > > > 			entry->eax = 0x1f;
> > > > 		else
> > > > 			entry->eax = min(entry->eax,
> > > > 					 (u32)(f_intel_pt ? 0x14 :
> > > > 0xd));
> > > 
> > > Based on:
> > > 
> > > 	ECX Bits 07 - 00: Level number. Same value in ECX input.
> > > 	Bits 15 - 08: Level type.
> > > 	Bits 31 - 16: Reserved.
> > > 
> > > how about using an increasing order:
> > > 
> > > 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> > > 
> > > 	// ... more checks when eax is between 0x14 and 0x1f if any
> > > 
> > > 	/* Check if the host cpu has multi-chip packaging technology.*/
> > > 	if (((cpuid_ecx(0x1f) >> 8) & 0xff) != 0)
> > > 		entry->eax = 0x1f;
> > 
> > As Sean pointed out, you cannot rely on the output of cpuid.1f to indicate
> > the
> > existence of leaf 1f. If maximum basic leaf supported is smaller than 1f,
> > the
> > data returned by cpuid_ecx(0x1f) is the actual highest basic information
> > leaf of
> > the hardware.
> 
> I don't think so.
> 
> > So using "entry->eax >= 0x1f" from cpuid.0H is and only is the right way to
> > check the existence of leaf 1f.
> > 
> > We can simply use (cpuid_ecx(0x1f) & 0x0000ff00) to avoid the unnecessory
> > shifting operation.
> 
> I borrowed this "unnecessory" shifting operation from host 
> check_extended_topology_leaf() and we may do better on this.
> 
> > Besides, the problem of simply using cpuid_exc(0x1f) in Sean's codes is that
> > we
> > cannot assmue the reserved bits 31:16 of ECX is always 0 for the future
> > generation.
> 
> It's true cause the statement in public spec is not "Reserved = 0" but 
> "Bits 31 - 16: Reserved".
> 
> > 
> > In my opinion, Sean's codes is OK and much simple and clear.
> 
> If the host cpuid.0.eax is greater than 0x1f but actually it doesn't
> have multi-chip packaging technology and we may want to expose 
> entry->eax to some value smaller than 0x1f but greater than 0x14, much 
> effort needs to apply on Sean's code.
> 
> My improvement is good to overwrite cpuid.0.eax in future usage
> from the perspective of kvm feature setting not just from value check.

Alright, there is something wrong in your code that you haven't realised. 

When you do 
	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));

it changes the entry->eax if entry->eax > 0x14. So you cannot directly use
cpuid_ecx(0x1f). At least, you need to cache the value of entry->eax, like:
	
	u32 max_leaf = entry->eax;
	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
	
	//...leaf between 0x14 and 0x1f
	
	if (max_leaf >= 0x1f && (cpuid_ecx(0x1f) & 0x0000ff00))
 		entry->eax = 0x1f;

However, handling in increasing order in totally wrong. Since it's to report the
max the leaf supported, we should handle in descending order, which is what Sean
does.

> > All need to do is using (cpuid_ecx(0x1f) & 0x0000ff00) to verify the leaf.1f
> > is
> > valid.
> > 
> > Thanks,
> > -Xiaoyao
> > > 	// ... more checks when eax greater than 0x1f if any
> > > 
> > > are we OK with it?
> > > 
> > > > >    		break;
> > > > >    	case 1:
> > > > >    		entry->edx &= kvm_cpuid_1_edx_x86_features;
> > > > > @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct
> > > > > kvm_cpuid_entry2 *entry, u32 function,
> > > > >    		entry->edx = edx.full;
> > > > >    		break;
> > > > >    	}
> > > > > +	/* function 0x1f has additional index. */
> > > > > +	case 0x1f:
> > > > >    	/* function 0xb has additional index. */
> > > > >    	case 0xb: {
> > > > >    		int i, level_type;
> > > > > -- 
> > > > > 1.8.3.1
> > > > > 
> > > 
> > > 
> > 
> > 
> 
>
Like Xu April 25, 2019, 7:07 a.m. UTC | #13
On 2019/4/25 14:30, Xiaoyao Li wrote:
> On Thu, 2019-04-25 at 14:02 +0800, Like Xu wrote:
>> On 2019/4/25 12:18, Xiaoyao Li wrote:
>>> On Thu, 2019-04-25 at 10:58 +0800, Like Xu wrote:
>>>> On 2019/4/24 22:32, Sean Christopherson wrote:
>>>>> Now that I understand how min() works...
>>>>>
>>>>> On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
>>>>>> Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
>>>>>> host system has multiple software-visible die within each package.
>>>>>>
>>>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>>>> ---
>>>>>>     arch/x86/kvm/cpuid.c | 13 +++++++++++++
>>>>>>     1 file changed, 13 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>>>>> index fd39516..9fc14f2 100644
>>>>>> --- a/arch/x86/kvm/cpuid.c
>>>>>> +++ b/arch/x86/kvm/cpuid.c
>>>>>> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
>>>>>>     	return xcr0;
>>>>>>     }
>>>>>>     
>>>>>> +/* We need to check if the host cpu has multi-chip packaging
>>>>>> technology.
>>>>>> */
>>>>>> +static bool kvm_supported_intel_mcp(void)
>>>>>> +{
>>>>>> +	u32 eax, ignored;
>>>>>> +
>>>>>> +	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
>>>>>
>>>>> This is broken because of how CPUID works for unsupported input leafs:
>>>>>
>>>>>      If a value entered for CPUID.EAX is higher than the maximum input
>>>>> value
>>>>>      for basic or extended function for that processor then the data for
>>>>> the
>>>>>      highest basic information leaf is returned.
>>>>>
>>>>> For example, my system with a max basic leaf of 0x16 returns 0x00000e74
>>>>> for CPUID.1F.EAX.
>>>>
>>>> You're right and the cpuid.1f.eax check is unreliable after I checked a
>>>> few machines.
>>>>
>>>>>
>>>>>> +
>>>>>> +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax !=
>>>>>> 0);
>>>>>
>>>>> Checking 'eax != 0' is broken as it will be '0' when SMT is
>>>>> disabled.  ecx
>>>>> is the obvious choice since bits 15:8 are guaranteed to be non-zero when
>>>>> the leaf is valid.
>>>>
>>>> I agree with this and ecx[15:8] makes sense.
>>>>
>>>>>
>>>>> I think we can skip the vendor check.  AFAIK, CPUID.1F isn't used by
>>>>> AMD,
>>>>> and since AMD and Intel try to maintain a semblance of CPUID
>>>>> compatibility
>>>>> it seems more likely that AMD/Hygon would implement CPUID.1F as-is
>>>>> rather
>>>>> than repurpose it to mean something else entirely.
>>>>
>>>> If it's true, let's skip the vendor check.
>>>>
>>>> // I have to mention that AMD already has MCP CPUs.
>>>>
>>>>>
>>>>>> +}
>>>>>> +
>>>>>>     #define F(x) bit(X86_FEATURE_##x)
>>>>>>     
>>>>>>     int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>>>>>> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct
>>>>>> kvm_cpuid_entry2 *entry, u32 function,
>>>>>>     	switch (function) {
>>>>>>     	case 0:
>>>>>>     		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 :
>>>>>> 0xd));
>>>>>> +		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry-
>>>>>>> eax;
>>>>>
>>>>> If we put everything together, I think the code can be reduced to:
>>>>>
>>>>> 		/* comment about multi-chip leaf... */
>>>>> 		if (entry->eax >= 0x1f && cpuid_ecx(0x1f))
>>>>> 			entry->eax = 0x1f;
>>>>> 		else
>>>>> 			entry->eax = min(entry->eax,
>>>>> 					 (u32)(f_intel_pt ? 0x14 :
>>>>> 0xd));
>>>>
>>>> Based on:
>>>>
>>>> 	ECX Bits 07 - 00: Level number. Same value in ECX input.
>>>> 	Bits 15 - 08: Level type.
>>>> 	Bits 31 - 16: Reserved.
>>>>
>>>> how about using an increasing order:
>>>>
>>>> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>>>>
>>>> 	// ... more checks when eax is between 0x14 and 0x1f if any
>>>>
>>>> 	/* Check if the host cpu has multi-chip packaging technology.*/
>>>> 	if (((cpuid_ecx(0x1f) >> 8) & 0xff) != 0)
>>>> 		entry->eax = 0x1f;
>>>
>>> As Sean pointed out, you cannot rely on the output of cpuid.1f to indicate
>>> the
>>> existence of leaf 1f. If maximum basic leaf supported is smaller than 1f,
>>> the
>>> data returned by cpuid_ecx(0x1f) is the actual highest basic information
>>> leaf of
>>> the hardware.
>>
>> I don't think so.
>>
>>> So using "entry->eax >= 0x1f" from cpuid.0H is and only is the right way to
>>> check the existence of leaf 1f.
>>>
>>> We can simply use (cpuid_ecx(0x1f) & 0x0000ff00) to avoid the unnecessory
>>> shifting operation.
>>
>> I borrowed this "unnecessory" shifting operation from host
>> check_extended_topology_leaf() and we may do better on this.
>>
>>> Besides, the problem of simply using cpuid_exc(0x1f) in Sean's codes is that
>>> we
>>> cannot assmue the reserved bits 31:16 of ECX is always 0 for the future
>>> generation.
>>
>> It's true cause the statement in public spec is not "Reserved = 0" but
>> "Bits 31 - 16: Reserved".
>>
>>>
>>> In my opinion, Sean's codes is OK and much simple and clear.
>>
>> If the host cpuid.0.eax is greater than 0x1f but actually it doesn't
>> have multi-chip packaging technology and we may want to expose
>> entry->eax to some value smaller than 0x1f but greater than 0x14, much
>> effort needs to apply on Sean's code.
>>
>> My improvement is good to overwrite cpuid.0.eax in future usage
>> from the perspective of kvm feature setting not just from value check.
> 
> Alright, there is something wrong in your code that you haven't realised.
> 
> When you do
> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> 
> it changes the entry->eax if entry->eax > 0x14. So you cannot directly use
> cpuid_ecx(0x1f). At least, you need to cache the value of entry->eax, like:
> 	
> 	u32 max_leaf = entry->eax;
> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> 	
> 	//...leaf between 0x14 and 0x1f
> 	
> 	if (max_leaf >= 0x1f && (cpuid_ecx(0x1f) & 0x0000ff00))
>   		entry->eax = 0x1f;

The cache value make no sense on this.

> 
> However, handling in increasing order in totally wrong. Since it's to report the
> max the leaf supported, we should handle in descending order, which is what Sean
> does.

There is no need to check "entry->eax >= 0x1f" before "setting 
entry->eax = 0x1f" if and only if cpuid_ecx(0x1f) meets requirements.

An increasing manner helps to overwrite this value on demand in a flat 
code flow (easy to understand and maintain) not an if-else_if-else flow.

> 
>>> All need to do is using (cpuid_ecx(0x1f) & 0x0000ff00) to verify the leaf.1f
>>> is
>>> valid.
>>>
>>> Thanks,
>>> -Xiaoyao
>>>> 	// ... more checks when eax greater than 0x1f if any
>>>>
>>>> are we OK with it?
>>>>
>>>>>>     		break;
>>>>>>     	case 1:
>>>>>>     		entry->edx &= kvm_cpuid_1_edx_x86_features;
>>>>>> @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct
>>>>>> kvm_cpuid_entry2 *entry, u32 function,
>>>>>>     		entry->edx = edx.full;
>>>>>>     		break;
>>>>>>     	}
>>>>>> +	/* function 0x1f has additional index. */
>>>>>> +	case 0x1f:
>>>>>>     	/* function 0xb has additional index. */
>>>>>>     	case 0xb: {
>>>>>>     		int i, level_type;
>>>>>> -- 
>>>>>> 1.8.3.1
>>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
>
Sean Christopherson April 25, 2019, 2:19 p.m. UTC | #14
On Thu, Apr 25, 2019 at 03:07:35PM +0800, Like Xu wrote:
> On 2019/4/25 14:30, Xiaoyao Li wrote:
> >>>Besides, the problem of simply using cpuid_exc(0x1f) in Sean's codes is
> >>>that we cannot assmue the reserved bits 31:16 of ECX is always 0 for the
> >>>future generation.

It doesn't matter if future CPUs use 31:16 for other things since this
code only cares about whether or not CPUID.1F exists.  The check
entry->eax >= 0x1f ensures that CPUID.1F will return zeros if the leaf
does *NOT* exist, ergo the check against CPUID.1F.ECX only needs to
look for a non-zero value.  CPUID.1F.ECX is the logical choice for
detecting support because it is guaranteed to be non-zero if the leaf
actually exists (because bits 15:8 will be non-zero).  Whether or not
bits 31:16 are non-zero is irrelevant.

> >>
> >>It's true cause the statement in public spec is not "Reserved = 0" but
> >>"Bits 31 - 16: Reserved".
> >>
> >>>
> >>>In my opinion, Sean's codes is OK and much simple and clear.
> >>
> >>If the host cpuid.0.eax is greater than 0x1f but actually it doesn't
> >>have multi-chip packaging technology and we may want to expose
> >>entry->eax to some value smaller than 0x1f but greater than 0x14, much
> >>effort needs to apply on Sean's code.
> >>
> >>My improvement is good to overwrite cpuid.0.eax in future usage
> >>from the perspective of kvm feature setting not just from value check.
> >
> >Alright, there is something wrong in your code that you haven't realised.
> >
> >When you do
> >	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> >
> >it changes the entry->eax if entry->eax > 0x14. So you cannot directly use
> >cpuid_ecx(0x1f). At least, you need to cache the value of entry->eax, like:
> >	
> >	u32 max_leaf = entry->eax;
> >	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> >	
> >	//...leaf between 0x14 and 0x1f
> >	
> >	if (max_leaf >= 0x1f && (cpuid_ecx(0x1f) & 0x0000ff00))
> >  		entry->eax = 0x1f;
> 
> The cache value make no sense on this.

Xiaoyao is pointing out that by limiting entry->eax to 0x14 (or 0xd), then
it can't be used to detect support for CPUID.1F since KVM will have lost
the required information.

> >However, handling in increasing order in totally wrong. Since it's to report the
> >max the leaf supported, we should handle in descending order, which is what Sean
> >does.
> 
> There is no need to check "entry->eax >= 0x1f" before "setting entry->eax =
> 0x1f" if and only if cpuid_ecx(0x1f) meets requirements.

entry->eax absolutely needs to be checked, otherwise you have no idea what
CPUID leaf is actually being consumed.  For example, a CPU with a maximum
basic CPUID leaf of 0xB will report information that is indistinguishable
from the actual CPUID.1F, i.e. KVM will incorrectly think the CPU supports
CPUID.1F regardless of what heuristic is used to detect a "valid" CPUID.1F
if it only looks at the result of CPUID.1F.

> An increasing manner helps to overwrite this value on demand in a flat code
> flow (easy to understand and maintain) not an if-else_if-else flow.

Maybe in the future the code will need to be refactored as more cases are
added, but for now an if-else is quite readable.  Worry about the future
when it happens. :-)
Like Xu April 25, 2019, 3:33 p.m. UTC | #15
On 2019/4/25 22:19, Sean Christopherson wrote:
> On Thu, Apr 25, 2019 at 03:07:35PM +0800, Like Xu wrote:
>> On 2019/4/25 14:30, Xiaoyao Li wrote:
>>>>> Besides, the problem of simply using cpuid_exc(0x1f) in Sean's codes is
>>>>> that we cannot assmue the reserved bits 31:16 of ECX is always 0 for the
>>>>> future generation.
> 
> It doesn't matter if future CPUs use 31:16 for other things since this
> code only cares about whether or not CPUID.1F exists.  The check
> entry->eax >= 0x1f ensures that CPUID.1F will return zeros if the leaf
> does *NOT* exist, ergo the check against CPUID.1F.ECX only needs to
> look for a non-zero value.  CPUID.1F.ECX is the logical choice for
> detecting support because it is guaranteed to be non-zero if the leaf
> actually exists (because bits 15:8 will be non-zero).  Whether or not
> bits 31:16 are non-zero is irrelevant.

Here is a case:

one future CPU may have "cpuid.0.eax > 0x1f" but not use multi-chip 
packaging technology thus its CPUID.1F leaf **could** not exist to 
expose multi-die info and it still use cpuid.0B leaf.

So the entry->eax >= 0x1f check is true and cpuid_ecx(0x1f) check is 
true as well due to default return value. (one of my machines for 
cpuid.1f.ecx would return 0x00000064 without cpuid.1f support).

When we only cares about whether or not CPUID.1F exists,
we may need (cpuid_ecx(0x1f) & 0x0000ff00) != 0 or just follow what host 
check_extended_topology_leaf() does.

> 
>>>>
>>>> It's true cause the statement in public spec is not "Reserved = 0" but
>>>> "Bits 31 - 16: Reserved".
>>>>
>>>>>
>>>>> In my opinion, Sean's codes is OK and much simple and clear.
>>>>
>>>> If the host cpuid.0.eax is greater than 0x1f but actually it doesn't
>>>> have multi-chip packaging technology and we may want to expose
>>>> entry->eax to some value smaller than 0x1f but greater than 0x14, much
>>>> effort needs to apply on Sean's code.
>>>>
>>>> My improvement is good to overwrite cpuid.0.eax in future usage
>>> >from the perspective of kvm feature setting not just from value check.
>>>
>>> Alright, there is something wrong in your code that you haven't realised.
>>>
>>> When you do
>>> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>>>
>>> it changes the entry->eax if entry->eax > 0x14. So you cannot directly use
>>> cpuid_ecx(0x1f). At least, you need to cache the value of entry->eax, like:
>>> 	
>>> 	u32 max_leaf = entry->eax;
>>> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>>> 	
>>> 	//...leaf between 0x14 and 0x1f
>>> 	
>>> 	if (max_leaf >= 0x1f && (cpuid_ecx(0x1f) & 0x0000ff00))
>>>   		entry->eax = 0x1f;
>>
>> The cache value make no sense on this.
> 
> Xiaoyao is pointing out that by limiting entry->eax to 0x14 (or 0xd), then
> it can't be used to detect support for CPUID.1F since KVM will have lost
> the required information.

The point is that its existence does not really depend on cpuid.0.eax> = 
0x1f. My test machine (CLX-AP, two die in one package) has cpuid.0.eax = 
0x16 but does have CPUID.1F support (a bit strange on BIOS).

> 
>>> However, handling in increasing order in totally wrong. Since it's to report the
>>> max the leaf supported, we should handle in descending order, which is what Sean
>>> does.
>>
>> There is no need to check "entry->eax >= 0x1f" before "setting entry->eax =
>> 0x1f" if and only if cpuid_ecx(0x1f) meets requirements.
> 
> entry->eax absolutely needs to be checked, otherwise you have no idea what
> CPUID leaf is actually being consumed.  For example, a CPU with a maximum
> basic CPUID leaf of 0xB will report information that is indistinguishable
> from the actual CPUID.1F, i.e. KVM will incorrectly think the CPU supports
> CPUID.1F regardless of what heuristic is used to detect a "valid" CPUID.1F
> if it only looks at the result of CPUID.1F.

Based on what I mentioned, just reconsider this proposal:

case 0:
     entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
     if ((cpuid_ecx(0x1f) & 0x0000ff00)) != 0)
         entry->eax = 0x1f;

It increases the cpuid.0.ecx value along with the minimized feature 
requirement in a natural and readable way. Why don't we design a little 
bit future ahead of time?

> 
>> An increasing manner helps to overwrite this value on demand in a flat code
>> flow (easy to understand and maintain) not an if-else_if-else flow.
> 
> Maybe in the future the code will need to be refactored as more cases are
> added, but for now an if-else is quite readable.  Worry about the future
> when it happens. :-)
>
Xiaoyao Li April 25, 2019, 4:28 p.m. UTC | #16
On Thu, 2019-04-25 at 23:33 +0800, Like Xu wrote:
> On 2019/4/25 22:19, Sean Christopherson wrote:
> > On Thu, Apr 25, 2019 at 03:07:35PM +0800, Like Xu wrote:
> > > On 2019/4/25 14:30, Xiaoyao Li wrote:
> > > > > > Besides, the problem of simply using cpuid_exc(0x1f) in Sean's codes
> > > > > > is
> > > > > > that we cannot assmue the reserved bits 31:16 of ECX is always 0 for
> > > > > > the
> > > > > > future generation.
> > 
> > It doesn't matter if future CPUs use 31:16 for other things since this
> > code only cares about whether or not CPUID.1F exists.  The check
> > entry->eax >= 0x1f ensures that CPUID.1F will return zeros if the leaf
> > does *NOT* exist, ergo the check against CPUID.1F.ECX only needs to
> > look for a non-zero value.  CPUID.1F.ECX is the logical choice for
> > detecting support because it is guaranteed to be non-zero if the leaf
> > actually exists (because bits 15:8 will be non-zero).  Whether or not
> > bits 31:16 are non-zero is irrelevant.
> 
> Here is a case:
> 
> one future CPU may have "cpuid.0.eax > 0x1f" but not use multi-chip 
> packaging technology thus its CPUID.1F leaf **could** not exist to 
> expose multi-die info and it still use cpuid.0B leaf.
> 
> So the entry->eax >= 0x1f check is true and cpuid_ecx(0x1f) check is 
> true as well due to default return value. (one of my machines for 
> cpuid.1f.ecx would return 0x00000064 without cpuid.1f support).

You mean entry->eax >= 0x1f, and CPUID.1F leaf doesn't exist.
In this case, cpuid_ecx(0x1f) must be zero.

You should the descripton of CPUID instruction in SDM. It says:

   If a value entered for CPUID.EAX is less than or equal to the maximum input
   value and the leaf is not supported on that processor then 0 is returned in
   all the registers.

I can tell you why the cpuid.1f.exc return 0x00000064 in your machines.
That's the value of leaf 0x16, you can check the output of cpuid.0x0_eax, it
should be 0x00000016.

> When we only cares about whether or not CPUID.1F exists,
> we may need (cpuid_ecx(0x1f) & 0x0000ff00) != 0 or just follow what host 
> check_extended_topology_leaf() does.
> > 
> > > > > 
> > > > > It's true cause the statement in public spec is not "Reserved = 0" but
> > > > > "Bits 31 - 16: Reserved".
> > > > > 
> > > > > > 
> > > > > > In my opinion, Sean's codes is OK and much simple and clear.
> > > > > 
> > > > > If the host cpuid.0.eax is greater than 0x1f but actually it doesn't
> > > > > have multi-chip packaging technology and we may want to expose
> > > > > entry->eax to some value smaller than 0x1f but greater than 0x14, much
> > > > > effort needs to apply on Sean's code.
> > > > > 
> > > > > My improvement is good to overwrite cpuid.0.eax in future usage
> > > > > from the perspective of kvm feature setting not just from value check.
> > > > 
> > > > Alright, there is something wrong in your code that you haven't
> > > > realised.
> > > > 
> > > > When you do
> > > > 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> > > > 
> > > > it changes the entry->eax if entry->eax > 0x14. So you cannot directly
> > > > use
> > > > cpuid_ecx(0x1f). At least, you need to cache the value of entry->eax,
> > > > like:
> > > > 	
> > > > 	u32 max_leaf = entry->eax;
> > > > 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> > > > 	
> > > > 	//...leaf between 0x14 and 0x1f
> > > > 	
> > > > 	if (max_leaf >= 0x1f && (cpuid_ecx(0x1f) & 0x0000ff00))
> > > >   		entry->eax = 0x1f;
> > > 
> > > The cache value make no sense on this.
> > 
> > Xiaoyao is pointing out that by limiting entry->eax to 0x14 (or 0xd), then
> > it can't be used to detect support for CPUID.1F since KVM will have lost
> > the required information.
> 
> The point is that its existence does not really depend on cpuid.0.eax> = 
> 0x1f. My test machine (CLX-AP, two die in one package) has cpuid.0.eax = 
> 0x16 but does have CPUID.1F support (a bit strange on BIOS).

I think it must be something wrong with your CLX-AP.
I tested on my CLX-AP machine too, the cpuid.0x0_eax is 0x000000001f.

> > 
> > > > However, handling in increasing order in totally wrong. Since it's to
> > > > report the
> > > > max the leaf supported, we should handle in descending order, which is
> > > > what Sean
> > > > does.
> > > 
> > > There is no need to check "entry->eax >= 0x1f" before "setting entry->eax
> > > =
> > > 0x1f" if and only if cpuid_ecx(0x1f) meets requirements.
> > 
> > entry->eax absolutely needs to be checked, otherwise you have no idea what
> > CPUID leaf is actually being consumed.  For example, a CPU with a maximum
> > basic CPUID leaf of 0xB will report information that is indistinguishable
> > from the actual CPUID.1F, i.e. KVM will incorrectly think the CPU supports
> > CPUID.1F regardless of what heuristic is used to detect a "valid" CPUID.1F
> > if it only looks at the result of CPUID.1F.
> 
> Based on what I mentioned, just reconsider this proposal:
> 
> case 0:
>      entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>      if ((cpuid_ecx(0x1f) & 0x0000ff00)) != 0)
>          entry->eax = 0x1f;
> 
> It increases the cpuid.0.ecx value along with the minimized feature 
> requirement in a natural and readable way. Why don't we design a little 
> bit future ahead of time?

There are two things I want to state:

1. using cpuid_ecx(0x1f) to check the existence of leaf 1f is absolutely wrong.

Using the return value of E{A,B,C,D}X of cpuid leaf *N* to check the existence
of leaf *N* is totally wrong. We need first to ensure that cpuid.0_eax >= *N*,
that's the reason why we need cpuid.0_eax.

Specifically, about cpuid leaf 1f, in page 3-222 Vol.2A of latest SDM publish on
January 2019, the description of Input EAX = 1FH is:

   When CPUID executes with EAX set to 1FH, the processor returns information
   about extended topology enumeration data. Software must detect the presence
   of CPUID leaf 1FH by verifying (a) the highest leaf index supported by CPUID
   is >= 1FH, and (b) CPUID.1FH:EBX[15:0] reports a non-zero value.

2. Checking in descending manner is better that increasing manner.

In increasing manner, we have to check from the smallest one by one, there will
be some useless check for the smaller one. 
   
However, in descending manner, once it finds the supported maxmium one, there is
no need to check the smaller leaf. 

> > 
> > > An increasing manner helps to overwrite this value on demand in a flat
> > > code
> > > flow (easy to understand and maintain) not an if-else_if-else flow.
> > 
> > Maybe in the future the code will need to be refactored as more cases are
> > added, but for now an if-else is quite readable.  Worry about the future
> > when it happens. :-)
> > 
> 
>
Like Xu April 26, 2019, 1:30 a.m. UTC | #17
On 2019/4/26 0:28, Xiaoyao Li wrote:
> On Thu, 2019-04-25 at 23:33 +0800, Like Xu wrote:
>> On 2019/4/25 22:19, Sean Christopherson wrote:
>>> On Thu, Apr 25, 2019 at 03:07:35PM +0800, Like Xu wrote:
>>>> On 2019/4/25 14:30, Xiaoyao Li wrote:
>>>>>>> Besides, the problem of simply using cpuid_exc(0x1f) in Sean's codes
>>>>>>> is
>>>>>>> that we cannot assmue the reserved bits 31:16 of ECX is always 0 for
>>>>>>> the
>>>>>>> future generation.
>>>
>>> It doesn't matter if future CPUs use 31:16 for other things since this
>>> code only cares about whether or not CPUID.1F exists.  The check
>>> entry->eax >= 0x1f ensures that CPUID.1F will return zeros if the leaf
>>> does *NOT* exist, ergo the check against CPUID.1F.ECX only needs to
>>> look for a non-zero value.  CPUID.1F.ECX is the logical choice for
>>> detecting support because it is guaranteed to be non-zero if the leaf
>>> actually exists (because bits 15:8 will be non-zero).  Whether or not
>>> bits 31:16 are non-zero is irrelevant.
>>
>> Here is a case:
>>
>> one future CPU may have "cpuid.0.eax > 0x1f" but not use multi-chip
>> packaging technology thus its CPUID.1F leaf **could** not exist to
>> expose multi-die info and it still use cpuid.0B leaf.
>>
>> So the entry->eax >= 0x1f check is true and cpuid_ecx(0x1f) check is
>> true as well due to default return value. (one of my machines for
>> cpuid.1f.ecx would return 0x00000064 without cpuid.1f support).
> 
> You mean entry->eax >= 0x1f, and CPUID.1F leaf doesn't exist.
> In this case, cpuid_ecx(0x1f) must be zero.
> 
> You should the descripton of CPUID instruction in SDM. It says:
> 
>     If a value entered for CPUID.EAX is less than or equal to the maximum input
>     value and the leaf is not supported on that processor then 0 is returned in
>     all the registers.

It's true.

> 
> I can tell you why the cpuid.1f.exc return 0x00000064 in your machines.
> That's the value of leaf 0x16, you can check the output of cpuid.0x0_eax, it
> should be 0x00000016.

I have to mention in this case, the cpuid.1f.ecx[31:8] is 0 as well.

> 
>> When we only cares about whether or not CPUID.1F exists,
>> we may need (cpuid_ecx(0x1f) & 0x0000ff00) != 0 or just follow what host
>> check_extended_topology_leaf() does.
>>>
>>>>>>
>>>>>> It's true cause the statement in public spec is not "Reserved = 0" but
>>>>>> "Bits 31 - 16: Reserved".
>>>>>>
>>>>>>>
>>>>>>> In my opinion, Sean's codes is OK and much simple and clear.
>>>>>>
>>>>>> If the host cpuid.0.eax is greater than 0x1f but actually it doesn't
>>>>>> have multi-chip packaging technology and we may want to expose
>>>>>> entry->eax to some value smaller than 0x1f but greater than 0x14, much
>>>>>> effort needs to apply on Sean's code.
>>>>>>
>>>>>> My improvement is good to overwrite cpuid.0.eax in future usage
>>>>>> from the perspective of kvm feature setting not just from value check.
>>>>>
>>>>> Alright, there is something wrong in your code that you haven't
>>>>> realised.
>>>>>
>>>>> When you do
>>>>> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>>>>>
>>>>> it changes the entry->eax if entry->eax > 0x14. So you cannot directly
>>>>> use
>>>>> cpuid_ecx(0x1f). At least, you need to cache the value of entry->eax,
>>>>> like:
>>>>> 	
>>>>> 	u32 max_leaf = entry->eax;
>>>>> 	entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>>>>> 	
>>>>> 	//...leaf between 0x14 and 0x1f
>>>>> 	
>>>>> 	if (max_leaf >= 0x1f && (cpuid_ecx(0x1f) & 0x0000ff00))
>>>>>    		entry->eax = 0x1f;
>>>>
>>>> The cache value make no sense on this.
>>>
>>> Xiaoyao is pointing out that by limiting entry->eax to 0x14 (or 0xd), then
>>> it can't be used to detect support for CPUID.1F since KVM will have lost
>>> the required information.
>>
>> The point is that its existence does not really depend on cpuid.0.eax> =
>> 0x1f. My test machine (CLX-AP, two die in one package) has cpuid.0.eax =
>> 0x16 but does have CPUID.1F support (a bit strange on BIOS).
> 
> I think it must be something wrong with your CLX-AP.
> I tested on my CLX-AP machine too, the cpuid.0x0_eax is 0x000000001f.

You are suggesting that we should give up exposing cpuid.1f on this 
strange machine and it's not **practical**.

One of my CLX-AP machines is normal, but there is such another machine 
with a bad BIOS configuration. We are supposed to take advantage of the 
real hardware support instead of ignoring it just because of the 
cpuid.0.eax limit.

> 
>>>
>>>>> However, handling in increasing order in totally wrong. Since it's to
>>>>> report the
>>>>> max the leaf supported, we should handle in descending order, which is
>>>>> what Sean
>>>>> does.
>>>>
>>>> There is no need to check "entry->eax >= 0x1f" before "setting entry->eax
>>>> =
>>>> 0x1f" if and only if cpuid_ecx(0x1f) meets requirements.
>>>
>>> entry->eax absolutely needs to be checked, otherwise you have no idea what
>>> CPUID leaf is actually being consumed.  For example, a CPU with a maximum
>>> basic CPUID leaf of 0xB will report information that is indistinguishable
>>> from the actual CPUID.1F, i.e. KVM will incorrectly think the CPU supports
>>> CPUID.1F regardless of what heuristic is used to detect a "valid" CPUID.1F
>>> if it only looks at the result of CPUID.1F.
>>
>> Based on what I mentioned, just reconsider this proposal:
>>
>> case 0:
>>       entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
>>       if ((cpuid_ecx(0x1f) & 0x0000ff00)) != 0)
>>           entry->eax = 0x1f;
>>
>> It increases the cpuid.0.ecx value along with the minimized feature
>> requirement in a natural and readable way. Why don't we design a little
>> bit future ahead of time?
> 
> There are two things I want to state:
> 
> 1. using cpuid_ecx(0x1f) to check the existence of leaf 1f is absolutely wrong.
> 
> Using the return value of E{A,B,C,D}X of cpuid leaf *N* to check the existence
> of leaf *N* is totally wrong. We need first to ensure that cpuid.0_eax >= *N*,
> that's the reason why we need cpuid.0_eax.
> 
> Specifically, about cpuid leaf 1f, in page 3-222 Vol.2A of latest SDM publish on
> January 2019, the description of Input EAX = 1FH is:
> 
>     When CPUID executes with EAX set to 1FH, the processor returns information
>     about extended topology enumeration data. Software must detect the presence
>     of CPUID leaf 1FH by verifying (a) the highest leaf index supported by CPUID
>     is >= 1FH, and (b) CPUID.1FH:EBX[15:0] reports a non-zero value.

It's true and it's the official right way to check EBX rather than ECX. 
Nice move, thanks xiaoyao.

However we could do better considering all possible output of 
cpuid.0.eax and cpuid.1f and my proposal for whether or not CPUID.1F 
exists is to check cpuid.1f.ecx[31:8] practically as host check does.

I don't insist on it and what do you think, Sean?

> 
> 2. Checking in descending manner is better that increasing manner.
> 
> In increasing manner, we have to check from the smallest one by one, there will
> be some useless check for the smaller one.

You see the uselessness and I see the necessity.
The number of checks time is not critical.

>     
> However, in descending manner, once it finds the supported maxmium one, there is
> no need to check the smaller leaf.

It's not supposed to finds the maxmium one but the minimum value
just recall the purpose of why we apply min().

The descending if_else flow would be broken if kvm does not want to 
expose 0x1f but a expected value smaller than 0x1f but larger than 0x14.

We may add this kind of module parameter like f_intel_pt to apply min() 
again and if so, an increasing manner helps a lot.

The descending flat flow couldn't reduce the number of checks time
and a temp value for original eax is introduced unavoidably.

> 
>>>
>>>> An increasing manner helps to overwrite this value on demand in a flat
>>>> code
>>>> flow (easy to understand and maintain) not an if-else_if-else flow.
>>>
>>> Maybe in the future the code will need to be refactored as more cases are
>>> added, but for now an if-else is quite readable.  Worry about the future
>>> when it happens. :-)
>>>
>>
>>
> 
>
diff mbox series

Patch

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd39516..9fc14f2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -65,6 +65,16 @@  u64 kvm_supported_xcr0(void)
 	return xcr0;
 }
 
+/* We need to check if the host cpu has multi-chip packaging technology. */
+static bool kvm_supported_intel_mcp(void)
+{
+	u32 eax, ignored;
+
+	cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
+
+	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
+}
+
 #define F(x) bit(X86_FEATURE_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -426,6 +436,7 @@  static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	switch (function) {
 	case 0:
 		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
+		entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
 		break;
 	case 1:
 		entry->edx &= kvm_cpuid_1_edx_x86_features;
@@ -544,6 +555,8 @@  static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->edx = edx.full;
 		break;
 	}
+	/* function 0x1f has additional index. */
+	case 0x1f:
 	/* function 0xb has additional index. */
 	case 0xb: {
 		int i, level_type;