diff mbox series

[1/2] viridian: remove implicit limit of 64 VPs per partition

Message ID 1610066796-17266-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive)
State Superseded
Headers show
Series [1/2] viridian: remove implicit limit of 64 VPs per partition | expand

Commit Message

Igor Druzhinin Jan. 8, 2021, 12:46 a.m. UTC
TLFS 7.8.1 stipulates that "a virtual processor index must be less than
the maximum number of virtual processors per partition" that "can be obtained
through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
the Microsoft Hypervisor Interface" specifies that starting from Windows Server
2012, which allowed more than 64 CPUs to be brought up, this leaf may now
contain the value -1, meaning the hypervisor imposes no restriction, while
0 (which we currently expose) means the default restriction is still present.

Along with previous changes exposing ExProcessorMasks this allows a recent
Windows VM with Viridian extension enabled to have more than 64 vCPUs without
going into immediate BSOD.

Since we did not expose this leaf before, and to keep CPUID data consistent for
incoming migration streams from previous Xen versions, let's keep it behind an option.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 tools/libs/light/libxl_x86.c         |  2 +-
 xen/arch/x86/hvm/viridian/viridian.c | 23 +++++++++++++++++++++++
 xen/include/public/hvm/params.h      |  7 ++++++-
 3 files changed, 30 insertions(+), 2 deletions(-)
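For context, a guest affected by this change might be configured along the
following lines (an illustrative xl config sketch; the guest name, memory size
and exact viridian group list are assumptions, not taken from the patch):

```
# Illustrative xl guest config: an HVM guest with more than 64 vCPUs and
# Viridian enlightenments enabled. With this patch, enabling the "base"
# group also sets HVMPV_no_vp_limit, so CPUID.40000005.EAX reads as -1.
type = "hvm"
name = "win-large"
vcpus = 128
memory = 16384
viridian = [ "defaults" ]
```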

Comments

Durrant, Paul Jan. 8, 2021, 8:32 a.m. UTC | #1
> -----Original Message-----
> From: Igor Druzhinin <igor.druzhinin@citrix.com>
> Sent: 08 January 2021 00:47
> To: xen-devel@lists.xenproject.org
> Cc: paul@xen.org; wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com;
> andrew.cooper3@citrix.com; george.dunlap@citrix.com; jbeulich@suse.com; julien@xen.org;
> sstabellini@kernel.org; roger.pau@citrix.com; Igor Druzhinin <igor.druzhinin@citrix.com>
> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
> 
> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
> the maximum number of virtual processors per partition" that "can be obtained
> through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
> the Microsoft Hypervisor Interface" defines that starting from Windows Server
> 2012, which allowed more than 64 CPUs to be brought up, this leaf can now
> contain a value -1 basically assuming the hypervisor has no restriction while
> 0 (that we currently expose) means the default restriction is still present.
> 
> Along with previous changes exposing ExProcessorMasks this allows a recent
> Windows VM with Viridian extension enabled to have more than 64 vCPUs without
> going into immediate BSOD.
> 

This is very odd as I was happily testing with a 128 vCPU VM once ExProcessorMasks was implemented... no need for the extra leaf.
The documentation for 0x40000005 states "Describes the scale limits supported in the current hypervisor implementation. If any
value is zero, the hypervisor does not expose the corresponding information". It does not say (in section 7.8.1 or elsewhere AFAICT)
what implications that has for VP_INDEX.

In "Requirements for Implementing the Microsoft Hypervisor Interface" I don't see anything saying what the semantics of not
implementing leaf 0x40000005 are, only that if implementing it then -1 must be used to break the 64 VP limit. It also says that
reporting -1 in 0x40000005 means that 40000004.EAX bits 1 and 2 *must* be clear, which is clearly not true if ExProcessorMasks is
enabled. That document is dated June 13th 2012 so I think it should be taken with a pinch of salt.

Have you actually observed a BSOD with >64 vCPUs when ExProcessorMasks is enabled? If so, which version of Windows? I'd like to get
a repro myself.

  Paul

> Since we didn't expose the leaf before and to keep CPUID data consistent for
> incoming streams from previous Xen versions - let's keep it behind an option.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
>  tools/libs/light/libxl_x86.c         |  2 +-
>  xen/arch/x86/hvm/viridian/viridian.c | 23 +++++++++++++++++++++++
>  xen/include/public/hvm/params.h      |  7 ++++++-
>  3 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> index 86d2729..96c8bf1 100644
> --- a/tools/libs/light/libxl_x86.c
> +++ b/tools/libs/light/libxl_x86.c
> @@ -336,7 +336,7 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
>          LOG(DETAIL, "%s group enabled", libxl_viridian_enlightenment_to_string(v));
> 
>      if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_BASE)) {
> -        mask |= HVMPV_base_freq;
> +        mask |= HVMPV_base_freq | HVMPV_no_vp_limit;
> 
>          if (!libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_FREQ))
>              mask |= HVMPV_no_freq;
> diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
> index ed97804..ae1ea86 100644
> --- a/xen/arch/x86/hvm/viridian/viridian.c
> +++ b/xen/arch/x86/hvm/viridian/viridian.c
> @@ -209,6 +209,29 @@ void cpuid_viridian_leaves(const struct vcpu *v, uint32_t leaf,
>          res->b = viridian_spinlock_retry_count;
>          break;
> 
> +    case 5:
> +        /*
> +         * From "Requirements for Implementing the Microsoft Hypervisor
> +         *  Interface":
> +         *
> +         * "On Windows operating systems versions through Windows Server
> +         * 2008 R2, reporting the HV#1 hypervisor interface limits
> +         * the Windows virtual machine to a maximum of 64 VPs, regardless of
> +         * what is reported via CPUID.40000005.EAX.
> +         *
> +         * Starting with Windows Server 2012 and Windows 8, if
> +         * CPUID.40000005.EAX contains a value of -1, Windows assumes that
> +         * the hypervisor imposes no specific limit to the number of VPs.
> +         * In this case, Windows Server 2012 guest VMs may use more than 64
> +         * VPs, up to the maximum supported number of processors applicable
> +         * to the specific Windows version being used."
> +         *
> +         * For compatibility we hide it behind an option.
> +         */
> +        if ( viridian_feature_mask(d) & HVMPV_no_vp_limit )
> +            res->a = -1;
> +        break;
> +
>      case 6:
>          /* Detected and in use hardware features. */
>          if ( cpu_has_vmx_virtualize_apic_accesses )
> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
> index 3b0a0f4..805f4ca 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -168,6 +168,10 @@
>  #define _HVMPV_ex_processor_masks 10
>  #define HVMPV_ex_processor_masks (1 << _HVMPV_ex_processor_masks)
> 
> +/* Allow more than 64 VPs */
> +#define _HVMPV_no_vp_limit 11
> +#define HVMPV_no_vp_limit (1 << _HVMPV_no_vp_limit)
> +
>  #define HVMPV_feature_mask \
>          (HVMPV_base_freq | \
>           HVMPV_no_freq | \
> @@ -179,7 +183,8 @@
>           HVMPV_synic | \
>           HVMPV_stimer | \
>           HVMPV_hcall_ipi | \
> -         HVMPV_ex_processor_masks)
> +         HVMPV_ex_processor_masks | \
> +         HVMPV_no_vp_limit)
> 
>  #endif
> 
> --
> 2.7.4
Jan Beulich Jan. 8, 2021, 9:19 a.m. UTC | #2
On 08.01.2021 01:46, Igor Druzhinin wrote:
> --- a/tools/libs/light/libxl_x86.c
> +++ b/tools/libs/light/libxl_x86.c
> @@ -336,7 +336,7 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
>          LOG(DETAIL, "%s group enabled", libxl_viridian_enlightenment_to_string(v));
>  
>      if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_BASE)) {
> -        mask |= HVMPV_base_freq;
> +        mask |= HVMPV_base_freq | HVMPV_no_vp_limit;

Could you clarify the point of having the new control when at
the guest config level there's no way to prevent it from
getting enabled (short of disabling Viridian extensions
altogether)?

Jan
Andrew Cooper Jan. 8, 2021, 10:15 a.m. UTC | #3
(Sorry for webmail).

The forthcoming hotfix on Win10/Server2019 (Build 20270) runs into serious problems without these two fixes, and never starts secondary processors.

~Andrew

Igor Druzhinin Jan. 8, 2021, 11:27 a.m. UTC | #4
On 08/01/2021 09:19, Jan Beulich wrote:
> On 08.01.2021 01:46, Igor Druzhinin wrote:
>> --- a/tools/libs/light/libxl_x86.c
>> +++ b/tools/libs/light/libxl_x86.c
>> @@ -336,7 +336,7 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
>>          LOG(DETAIL, "%s group enabled", libxl_viridian_enlightenment_to_string(v));
>>  
>>      if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_BASE)) {
>> -        mask |= HVMPV_base_freq;
>> +        mask |= HVMPV_base_freq | HVMPV_no_vp_limit;
> 
> Could you clarify the point of having the new control when at
> the guest config level there's no way to prevent it from
> getting enabled (short of disabling Viridian extensions
> altogether)?

If you migrate in from a host without this code (previous version
of Xen)- you will keep the old behavior for the guest thus preserving
binary compatibility.

Igor
Igor Druzhinin Jan. 8, 2021, 11:48 a.m. UTC | #5
On 08/01/2021 08:32, Paul Durrant wrote:
>> -----Original Message-----
>> From: Igor Druzhinin <igor.druzhinin@citrix.com>
>> Sent: 08 January 2021 00:47
>> To: xen-devel@lists.xenproject.org
>> Cc: paul@xen.org; wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com;
>> andrew.cooper3@citrix.com; george.dunlap@citrix.com; jbeulich@suse.com; julien@xen.org;
>> sstabellini@kernel.org; roger.pau@citrix.com; Igor Druzhinin <igor.druzhinin@citrix.com>
>> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
>>
>> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
>> the maximum number of virtual processors per partition" that "can be obtained
>> through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
>> the Microsoft Hypervisor Interface" defines that starting from Windows Server
>> 2012, which allowed more than 64 CPUs to be brought up, this leaf can now
>> contain a value -1 basically assuming the hypervisor has no restriction while
>> 0 (that we currently expose) means the default restriction is still present.
>>
>> Along with previous changes exposing ExProcessorMasks this allows a recent
>> Windows VM with Viridian extension enabled to have more than 64 vCPUs without
>> going into immediate BSOD.
>>
> 
> This is very odd as I was happily testing with a 128 vCPU VM once ExProcessorMasks was implemented... no need for the extra leaf.
> The documentation for 0x40000005 states " Describes the scale limits supported in the current hypervisor implementation. If any
> value is zero, the hypervisor does not expose the corresponding information". It does not say (in section 7.8.1 or elsewhere AFAICT)
> what implications that has for VP_INDEX.

The text says that VP_INDEX "must" be below that limit - that is clear enough
for me.

I admit I have not tested with ExProcessorMasks exposed, but I disabled the TLB flush
and IPI extensions before that - Windows is mysterious in its handling of
this logic. Let me align all of the changes and perform some clean-cut testing.

> In " Requirements for Implementing the Microsoft Hypervisor Interface" I don't see anything saying what the semantics of not
> implementing leaf 0x40000005 are, only that if implementing it then -1 must be used to break the 64 VP limit.

Yes, that's exactly the info this change is based on.

> It also says that
> reporting -1 in 0x40000005 means that 40000004.EAX bits 1 and 2 *must* be clear, which is clearly not true if ExProcessorMasks is
> enabled. That document is dated June 13th 2012 so I think it should be taken with a pinch of salt.

True - I assumed that ExProcessorMasks is then necessary for them to work.

Igor
Jan Beulich Jan. 8, 2021, 1:17 p.m. UTC | #6
On 08.01.2021 12:27, Igor Druzhinin wrote:
> On 08/01/2021 09:19, Jan Beulich wrote:
>> On 08.01.2021 01:46, Igor Druzhinin wrote:
>>> --- a/tools/libs/light/libxl_x86.c
>>> +++ b/tools/libs/light/libxl_x86.c
>>> @@ -336,7 +336,7 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
>>>          LOG(DETAIL, "%s group enabled", libxl_viridian_enlightenment_to_string(v));
>>>  
>>>      if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_BASE)) {
>>> -        mask |= HVMPV_base_freq;
>>> +        mask |= HVMPV_base_freq | HVMPV_no_vp_limit;
>>
>> Could you clarify the point of having the new control when at
>> the guest config level there's no way to prevent it from
>> getting enabled (short of disabling Viridian extensions
>> altogether)?
> 
> If you migrate in from a host without this code (previous version
> of Xen)- you will keep the old behavior for the guest thus preserving
> binary compatibility.

Keeping things compatible like this is clearly a requirement. But
is this enough? As soon as the guest reboots, it will see differing
behavior, won't it?

Jan
Igor Druzhinin Jan. 8, 2021, 1:25 p.m. UTC | #7
On 08/01/2021 13:17, Jan Beulich wrote:
> On 08.01.2021 12:27, Igor Druzhinin wrote:
>> On 08/01/2021 09:19, Jan Beulich wrote:
>>> On 08.01.2021 01:46, Igor Druzhinin wrote:
>>>> --- a/tools/libs/light/libxl_x86.c
>>>> +++ b/tools/libs/light/libxl_x86.c
>>>> @@ -336,7 +336,7 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
>>>>          LOG(DETAIL, "%s group enabled", libxl_viridian_enlightenment_to_string(v));
>>>>  
>>>>      if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_BASE)) {
>>>> -        mask |= HVMPV_base_freq;
>>>> +        mask |= HVMPV_base_freq | HVMPV_no_vp_limit;
>>>
>>> Could you clarify the point of having the new control when at
>>> the guest config level there's no way to prevent it from
>>> getting enabled (short of disabling Viridian extensions
>>> altogether)?
>>
>> If you migrate in from a host without this code (previous version
>> of Xen)- you will keep the old behavior for the guest thus preserving
>> binary compatibility.
> 
> Keeping thing compatible like this is clearly a requirement. But
> is this enough? As soon as the guest reboots, it will see differing
> behavior, won't it?

Yes, that's what I was expecting - after reboot it should be fine.
Hence I put this option into the default set - I'd expect no limit to be
applied in the first place.

Igor
Igor Druzhinin Jan. 9, 2021, 12:55 a.m. UTC | #8
On 08/01/2021 08:32, Paul Durrant wrote:
>> -----Original Message-----
>> From: Igor Druzhinin <igor.druzhinin@citrix.com>
>> Sent: 08 January 2021 00:47
>> To: xen-devel@lists.xenproject.org
>> Cc: paul@xen.org; wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com;
>> andrew.cooper3@citrix.com; george.dunlap@citrix.com; jbeulich@suse.com; julien@xen.org;
>> sstabellini@kernel.org; roger.pau@citrix.com; Igor Druzhinin <igor.druzhinin@citrix.com>
>> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
>>
>> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
>> the maximum number of virtual processors per partition" that "can be obtained
>> through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
>> the Microsoft Hypervisor Interface" defines that starting from Windows Server
>> 2012, which allowed more than 64 CPUs to be brought up, this leaf can now
>> contain a value -1 basically assuming the hypervisor has no restriction while
>> 0 (that we currently expose) means the default restriction is still present.
>>
>> Along with previous changes exposing ExProcessorMasks this allows a recent
>> Windows VM with Viridian extension enabled to have more than 64 vCPUs without
>> going into immediate BSOD.
>>
> 
> This is very odd as I was happily testing with a 128 vCPU VM once ExProcessorMasks was implemented... no need for the extra leaf.
> The documentation for 0x40000005 states " Describes the scale limits supported in the current hypervisor implementation. If any
> value is zero, the hypervisor does not expose the corresponding information". It does not say (in section 7.8.1 or elsewhere AFAICT)
> what implications that has for VP_INDEX.
> 
> In " Requirements for Implementing the Microsoft Hypervisor Interface" I don't see anything saying what the semantics of not
> implementing leaf 0x40000005 are, only that if implementing it then -1 must be used to break the 64 VP limit. It also says that
> reporting -1 in 0x40000005 means that 40000004.EAX bits 1 and 2 *must* be clear, which is clearly not true if ExProcessorMasks is
> enabled. That document is dated June 13th 2012 so I think it should be taken with a pinch of salt.
> 
> Have you actually observed a BSOD with >64 vCPUs when ExProcessorMasks is enabled? If so, which version of Windows? I'd like to get
> a repro myself.

I return with more testing that confirms both my and your results.

1) with ExProcessorMasks and 66 vCPUs assigned, both the current WS19 build and
pre-release build 20270 of vNext boot correctly with no changes

and that would be fine but the existing documentation for ex_processor_masks
states that:
"Hence this enlightenment must be specified for guests with more
than 64 vCPUs if B<hcall_remote_tlb_flush> and/or B<hcall_ipi> are also
specified."

You would then expect a 64+ vCPU VM to boot without ex_processor_masks,
hcall_remote_tlb_flush and hcall_ipi.

So,
2) without ExProcessorMasks and 66 vCPUs assigned and with hcall_remote_tlb_flush
and hcall_ipi disabled: WS19 BSODs and vNext fails to initialize secondary CPUs

After applying my change,
3) without ExProcessorMasks and 66 vCPUs assigned and with hcall_remote_tlb_flush
and hcall_ipi disabled WS19 and vNext boot correctly

So we either need to change the documentation and require ExProcessorMasks for all
VMs with 64+ vCPUs, or go with my change.

Igor
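Test scenario (2) above corresponds to a guest configuration along these lines
(an illustrative sketch; per xl.cfg a '!' prefix disables a viridian group, but
the exact group spelling here is an assumption and some of these groups may
already be off by default):

```
# Illustrative xl config for scenario (2): >64 vCPUs, TLB flush and IPI
# enlightenments disabled, ExProcessorMasks not enabled. Without the new
# leaf, WS19 BSODs and vNext fails to start secondary CPUs.
vcpus = 66
viridian = [ "defaults", "!hcall_remote_tlb_flush", "!hcall_ipi" ]
```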
Durrant, Paul Jan. 11, 2021, 8:45 a.m. UTC | #9
> -----Original Message-----
> From: Igor Druzhinin <igor.druzhinin@citrix.com>
> Sent: 09 January 2021 00:56
> To: paul@xen.org; xen-devel@lists.xenproject.org
> Cc: wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com; andrew.cooper3@citrix.com;
> george.dunlap@citrix.com; jbeulich@suse.com; julien@xen.org; sstabellini@kernel.org;
> roger.pau@citrix.com
> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
> 
> On 08/01/2021 08:32, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Igor Druzhinin <igor.druzhinin@citrix.com>
> >> Sent: 08 January 2021 00:47
> >> To: xen-devel@lists.xenproject.org
> >> Cc: paul@xen.org; wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com;
> >> andrew.cooper3@citrix.com; george.dunlap@citrix.com; jbeulich@suse.com; julien@xen.org;
> >> sstabellini@kernel.org; roger.pau@citrix.com; Igor Druzhinin <igor.druzhinin@citrix.com>
> >> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
> >>
> >> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
> >> the maximum number of virtual processors per partition" that "can be obtained
> >> through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
> >> the Microsoft Hypervisor Interface" defines that starting from Windows Server
> >> 2012, which allowed more than 64 CPUs to be brought up, this leaf can now
> >> contain a value -1 basically assuming the hypervisor has no restriction while
> >> 0 (that we currently expose) means the default restriction is still present.
> >>
> >> Along with previous changes exposing ExProcessorMasks this allows a recent
> >> Windows VM with Viridian extension enabled to have more than 64 vCPUs without
> >> going into immediate BSOD.
> >>
> >
> > This is very odd as I was happily testing with a 128 vCPU VM once ExProcessorMasks was
> implemented... no need for the extra leaf.
> > The documentation for 0x40000005 states " Describes the scale limits supported in the current
> hypervisor implementation. If any
> > value is zero, the hypervisor does not expose the corresponding information". It does not say (in
> section 7.8.1 or elsewhere AFAICT)
> > what implications that has for VP_INDEX.
> >
> > In " Requirements for Implementing the Microsoft Hypervisor Interface" I don't see anything saying
> what the semantics of not
> > implementing leaf 0x40000005 are, only that if implementing it then -1 must be used to break the 64
> VP limit. It also says that
> > reporting -1 in 0x40000005 means that 40000004.EAX bits 1 and 2 *must* be clear, which is clearly
> not true if ExProcessorMasks is
> > enabled. That document is dated June 13th 2012 so I think it should be taken with a pinch of salt.
> >
> > Have you actually observed a BSOD with >64 vCPUs when ExProcessorMasks is enabled? If so, which
> version of Windows? I'd like to get
> > a repro myself.
> 
> I return with more testing that confirms both my and your results.
> 
> 1) with ExProcessorMasks and 66 vCPUs assigned both current WS19 build and
> pre-release 20270 of vNext boot correctly with no changes
> 
> and that would be fine but the existing documentation for ex_processor_masks
> states that:
> "Hence this enlightenment must be specified for guests with more
> than 64 vCPUs if B<hcall_remote_tlb_flush> and/or B<hcall_ipi> are also
> specified."
> 
> You would then expect a 64+ vCPU VM to boot without ex_processor_masks,
> hcall_remote_tlb_flush and hcall_ipi.

Indeed.

> 
> So,
> 2) without ExProcessorMasks and 66 vCPUs assigned and with hcall_remote_tlb_flush
> and hcall_ipi disabled: WS19 BSODs and vNext fails to initialize secondary CPUs
> 

This is not what I recall from testing but I can confirm I see the same now.

> After applying my change,
> 3) without ExProcessorMasks and 66 vCPUs assigned and with hcall_remote_tlb_flush
> and hcall_ipi disabled WS19 and vNext boot correctly
> 
> So we either need to change documentation and require ExProcessorMasks for all
> VMs with 64+ vCPUs or go with my change.
> 

I think we go with your change. I guess we can conclude then that the separate viridian flag as part of the base set is the right way to go (to stop the leaf magically appearing on migrate of existing guests) and leave ex_processor_masks (and the documentation) as-is.

You can add my R-b to the patch.

  Paul

> Igor
Jan Beulich Jan. 11, 2021, 8:59 a.m. UTC | #10
On 11.01.2021 09:45, Paul Durrant wrote:
>> -----Original Message-----
>> From: Igor Druzhinin <igor.druzhinin@citrix.com>
>> Sent: 09 January 2021 00:56
>> To: paul@xen.org; xen-devel@lists.xenproject.org
>> Cc: wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com; andrew.cooper3@citrix.com;
>> george.dunlap@citrix.com; jbeulich@suse.com; julien@xen.org; sstabellini@kernel.org;
>> roger.pau@citrix.com
>> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
>>
>> On 08/01/2021 08:32, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Igor Druzhinin <igor.druzhinin@citrix.com>
>>>> Sent: 08 January 2021 00:47
>>>> To: xen-devel@lists.xenproject.org
>>>> Cc: paul@xen.org; wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com;
>>>> andrew.cooper3@citrix.com; george.dunlap@citrix.com; jbeulich@suse.com; julien@xen.org;
>>>> sstabellini@kernel.org; roger.pau@citrix.com; Igor Druzhinin <igor.druzhinin@citrix.com>
>>>> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
>>>>
>>>> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
>>>> the maximum number of virtual processors per partition" that "can be obtained
>>>> through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
>>>> the Microsoft Hypervisor Interface" defines that starting from Windows Server
>>>> 2012, which allowed more than 64 CPUs to be brought up, this leaf can now
>>>> contain a value -1 basically assuming the hypervisor has no restriction while
>>>> 0 (that we currently expose) means the default restriction is still present.
>>>>
>>>> Along with previous changes exposing ExProcessorMasks this allows a recent
>>>> Windows VM with Viridian extension enabled to have more than 64 vCPUs without
>>>> going into immediate BSOD.
>>>>
>>>
>>> This is very odd as I was happily testing with a 128 vCPU VM once ExProcessorMasks was
>> implemented... no need for the extra leaf.
>>> The documentation for 0x40000005 states " Describes the scale limits supported in the current
>> hypervisor implementation. If any
>>> value is zero, the hypervisor does not expose the corresponding information". It does not say (in
>> section 7.8.1 or elsewhere AFAICT)
>>> what implications that has for VP_INDEX.
>>>
>>> In " Requirements for Implementing the Microsoft Hypervisor Interface" I don't see anything saying
>> what the semantics of not
>>> implementing leaf 0x40000005 are, only that if implementing it then -1 must be used to break the 64
>> VP limit. It also says that
>>> reporting -1 in 0x40000005 means that 40000004.EAX bits 1 and 2 *must* be clear, which is clearly
>> not true if ExProcessorMasks is
>>> enabled. That document is dated June 13th 2012 so I think it should be taken with a pinch of salt.
>>>
>>> Have you actually observed a BSOD with >64 vCPUs when ExProcessorMasks is enabled? If so, which
>> version of Windows? I'd like to get
>>> a repro myself.
>>
>> I return with more testing that confirms both my and your results.
>>
>> 1) with ExProcessorMasks and 66 vCPUs assigned both current WS19 build and
>> pre-release 20270 of vNext boot correctly with no changes
>>
>> and that would be fine but the existing documentation for ex_processor_masks
>> states that:
>> "Hence this enlightenment must be specified for guests with more
>> than 64 vCPUs if B<hcall_remote_tlb_flush> and/or B<hcall_ipi> are also
>> specified."
>>
>> You would then expect a 64+ vCPU VM to boot without ex_processor_masks,
>> hcall_remote_tlb_flush and hcall_ipi.
> 
> Indeed.
> 
>>
>> So,
>> 2) without ExProcessorMasks and 66 vCPUs assigned and with hcall_remote_tlb_flush
>> and hcall_ipi disabled: WS19 BSODs and vNext fails to initialize secondary CPUs
>>
> 
> This is not what I recall from testing but I can confirm I see the same now.
> 
>> After applying my change,
>> 3) without ExProcessorMasks and 66 vCPUs assigned and with hcall_remote_tlb_flush
>> and hcall_ipi disabled: WS19 and vNext boot correctly
>>
>> So we either need to change the documentation and require ExProcessorMasks for all
>> VMs with 64+ vCPUs or go with my change.
>>
> 
> I think we go with your change. I guess we can conclude then that the separate viridian flag as part of the base set is the right way to go (to stop the leaf magically appearing on migrate of existing guests) and leave ex_processor_masks (and the documentation) as-is.
> 
> You can add my R-b to the patch.

That's the unchanged patch then, including the libxl change that
I had asked about, where I have to admit I don't fully follow
Igor's responses. I'm hesitant to give an ack for that aspect of
the change, yet I suppose the libxl maintainers will defer to
the x86 ones there. Alternatively, Andrew or Roger could of course
ack this ...

Jan
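For reference, the leaf semantics being argued about above reduce to a few lines of logic. The sketch below is one reading of the quoted "Requirements for Implementing the Microsoft Hypervisor Interface" wording, not Xen or Windows source; the function and macro names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative reading of the "Requirements" text: CPUID.40000005.EAX
 * advertises the maximum number of virtual processors per partition.
 * 0 means the information is not exposed, so Windows falls back to its
 * default 64-VP limit; (uint32_t)-1 means the hypervisor imposes no
 * specific limit; any other value is the limit itself.
 */
#define HV_DEFAULT_VP_LIMIT 64u

static uint32_t effective_vp_limit(uint32_t leaf_40000005_eax)
{
    if ( leaf_40000005_eax == 0 )
        return HV_DEFAULT_VP_LIMIT;  /* leaf not exposed: legacy 64-VP cap */

    /* -1 (== UINT32_MAX) effectively means "no limit". */
    return leaf_40000005_eax;
}
```

Under this reading, the pre-patch Xen behaviour (exposing 0) pins Windows at 64 VPs regardless of how many vCPUs the toolstack configured.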
Durrant, Paul Jan. 11, 2021, 9:09 a.m. UTC | #11
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 11 January 2021 09:00
> To: paul@xen.org
> Cc: wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com; andrew.cooper3@citrix.com;
> george.dunlap@citrix.com; julien@xen.org; sstabellini@kernel.org; roger.pau@citrix.com; xen-
> devel@lists.xenproject.org; 'Igor Druzhinin' <igor.druzhinin@citrix.com>
> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
> 
> On 11.01.2021 09:45, Paul Durrant wrote:
> > I think we go with your change. I guess we can conclude then that the separate viridian flag as part
> of the base set is the right way to go (to stop the leaf magically appearing on migrate of existing
> guests) and leave ex_processor_masks (and the documentation) as-is.
> >
> > You can add my R-b to the patch.
> 
> That's the unchanged patch then, including the libxl change that
> I had asked about and that I have to admit I don't fully follow
> Igor's responses? I'm hesitant to give an ack for that aspect of
> the change, yet I suppose the libxl maintainers will defer to
> x86 ones there. Alternatively Andrew or Roger could of course
> ack this ...
> 

I don't think we really need specific control in xl.cfg as this is a fix for some poorly documented semantics in the spec. The flag simply prevents the leaf from magically appearing on migrate of existing guests, and I think that's enough.

  Paul

> Jan
Durrant, Paul Jan. 11, 2021, 9:12 a.m. UTC | #12
> -----Original Message-----
> From: Paul Durrant <xadimgnik@gmail.com>
> Sent: 11 January 2021 09:10
> To: 'Jan Beulich' <jbeulich@suse.com>
> Cc: wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com; andrew.cooper3@citrix.com;
> george.dunlap@citrix.com; julien@xen.org; sstabellini@kernel.org; roger.pau@citrix.com; xen-
> devel@lists.xenproject.org; 'Igor Druzhinin' <igor.druzhinin@citrix.com>
> Subject: RE: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
> 
> > > I think we go with your change. I guess we can conclude then that the separate viridian flag as
> part
> > of the base set is the right way to go (to stop the leaf magically appearing on migrate of existing
> > guests) and leave ex_processor_masks (and the documentation) as-is.
> > >
> > > You can add my R-b to the patch.
> >
> > That's the unchanged patch then, including the libxl change that
> > I had asked about and that I have to admit I don't fully follow
> > Igor's responses? I'm hesitant to give an ack for that aspect of
> > the change, yet I suppose the libxl maintainers will defer to
> > x86 ones there. Alternatively Andrew or Roger could of course
> > ack this ...
> >
> 
> I don't think we really need specific control in xl.cfg as this is a fix for some poorly documented
> semantics in the spec. The flag simply prevents the leaf magically appearing on migrate and I think
> that's enough.

... although adding an option in xl/libxl isn't that much work, I suppose.

Igor, would you be ok plumbing it through?

  Paul

> 
>   Paul
> 
> > Jan
Jan Beulich Jan. 11, 2021, 9:13 a.m. UTC | #13
On 11.01.2021 10:09, Paul Durrant wrote:
>>> I think we go with your change. I guess we can conclude then that the separate viridian flag as part
>> of the base set is the right way to go (to stop the leaf magically appearing on migrate of existing
>> guests) and leave ex_processor_masks (and the documentation) as-is.
>>>
>>> You can add my R-b to the patch.
>>
>> That's the unchanged patch then, including the libxl change that
>> I had asked about and that I have to admit I don't fully follow
>> Igor's responses? I'm hesitant to give an ack for that aspect of
>> the change, yet I suppose the libxl maintainers will defer to
>> x86 ones there. Alternatively Andrew or Roger could of course
>> ack this ...
>>
> 
> I don't think we really need specific control in xl.cfg as this is a fix for some poorly documented semantics in the spec. The flag simply prevents the leaf magically appearing on migrate and I think that's enough. 

Well, okay then. I can throw in this patch unchanged; it is my
understanding that there'll be a v2 for patch 2.

Jan
Jan Beulich Jan. 11, 2021, 9:16 a.m. UTC | #14
On 11.01.2021 10:12, Paul Durrant wrote:
>> From: Paul Durrant <xadimgnik@gmail.com>
>> Sent: 11 January 2021 09:10
>>
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: 11 January 2021 09:00
>>>
>>> On 11.01.2021 09:45, Paul Durrant wrote:
>>>> You can add my R-b to the patch.
>>>
>>> That's the unchanged patch then, including the libxl change that
>>> I had asked about and that I have to admit I don't fully follow
>>> Igor's responses? I'm hesitant to give an ack for that aspect of
>>> the change, yet I suppose the libxl maintainers will defer to
>>> x86 ones there. Alternatively Andrew or Roger could of course
>>> ack this ...
>>>
>>
>> I don't think we really need specific control in xl.cfg as this is a fix for some poorly documented
>> semantics in the spec. The flag simply prevents the leaf magically appearing on migrate and I think
>> that's enough.
> 
> ... although adding an option in xl/libxl isn't that much work, I suppose.
> 
> Igor, would you be ok plumbing it through?

This back and forth leaves it unclear to me what I should do. I
would have asked on IRC, but it seems you're not there.

Jan
Durrant, Paul Jan. 11, 2021, 9:20 a.m. UTC | #15
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 11 January 2021 09:16
> To: paul@xen.org
> Cc: wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com; andrew.cooper3@citrix.com;
> george.dunlap@citrix.com; julien@xen.org; sstabellini@kernel.org; roger.pau@citrix.com; xen-
> devel@lists.xenproject.org; 'Igor Druzhinin' <igor.druzhinin@citrix.com>
> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
> 
> On 11.01.2021 10:12, Paul Durrant wrote:
> >> From: Paul Durrant <xadimgnik@gmail.com>
> >> Sent: 11 January 2021 09:10
> >>
> >>> From: Jan Beulich <jbeulich@suse.com>
> >>> Sent: 11 January 2021 09:00
> >>>
> >>> On 11.01.2021 09:45, Paul Durrant wrote:
> >>>> You can add my R-b to the patch.
> >>>
> >>> That's the unchanged patch then, including the libxl change that
> >>> I had asked about and that I have to admit I don't fully follow
> >>> Igor's responses? I'm hesitant to give an ack for that aspect of
> >>> the change, yet I suppose the libxl maintainers will defer to
> >>> x86 ones there. Alternatively Andrew or Roger could of course
> >>> ack this ...
> >>>
> >>
> >> I don't think we really need specific control in xl.cfg as this is a fix for some poorly documented
> >> semantics in the spec. The flag simply prevents the leaf magically appearing on migrate and I think
> >> that's enough.
> >
> > ... although adding an option in xl/libxl isn't that much work, I suppose.
> >
> > Igor, would you be ok plumbing it through?
> 
> This back and forth leaves unclear to me what I should do. I
> would have asked on irc, but you're not there as it seems.

No, VPN issues make using IRC painful, I'm afraid. Let's see what Igor says.

  Paul

> 
> Jan
Igor Druzhinin Jan. 11, 2021, 1:34 p.m. UTC | #16
On 11/01/2021 09:16, Jan Beulich wrote:
> On 11.01.2021 10:12, Paul Durrant wrote:
>>>
>>> I don't think we really need specific control in xl.cfg as this is a fix for some poorly documented
>>> semantics in the spec. The flag simply prevents the leaf magically appearing on migrate and I think
>>> that's enough.
>>
>> ... although adding an option in xl/libxl isn't that much work, I suppose.
>>
>> Igor, would you be ok plumbing it through?
> 
> This back and forth leaves unclear to me what I should do. I
> would have asked on irc, but you're not there as it seems.

I don't see a scenario where somebody would want to opt out of unlimited
VPs per domain, given that the leaf value of -1 is supported on all Windows versions.
I can make it configurable in the future if reports surface that it causes
trouble somewhere.

I'd like to do the same with the CPU hotplug bit (given it's not configurable
in QEMU either), but wanted to hear opinions here.

Igor
Jan Beulich Jan. 11, 2021, 1:38 p.m. UTC | #17
On 11.01.2021 14:34, Igor Druzhinin wrote:
> 
> I don't see a scenario where somebody would want to opt out of unlimited
> VPs per domain given the leaf with -1 is supported on all Windows versions.

So Paul - commit patch as is then?

> I can make it configurable in the future if reports re-surface it causes
> troubles somewhere.

This is the slight concern I have: having to make it configurable
once someone has reported trouble would look a little late to me.
Otoh I agree it may end up being dead code if no problems ever
get encountered.

Jan
Durrant, Paul Jan. 11, 2021, 1:47 p.m. UTC | #18
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 11 January 2021 13:38
> To: Igor Druzhinin <igor.druzhinin@citrix.com>; paul@xen.org
> Cc: wl@xen.org; iwj@xenproject.org; anthony.perard@citrix.com; andrew.cooper3@citrix.com;
> george.dunlap@citrix.com; julien@xen.org; sstabellini@kernel.org; roger.pau@citrix.com; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
> 
> On 11.01.2021 14:34, Igor Druzhinin wrote:
> >
> > I don't see a scenario where somebody would want to opt out of unlimited
> > VPs per domain given the leaf with -1 is supported on all Windows versions.
> 
> So Paul - commit patch as is then?
> 
> > I can make it configurable in the future if reports re-surface it causes
> > troubles somewhere.
> 
> This is the slight concern I have: Having to make it configurable
> once someone has reported trouble would look a little late to me.
> Otoh I agree it may end up being dead code if no problems get
> ever encountered.
> 

I think I'm persuaded by your caution. Since it's not a massive amount of code, let's have flags for both, wired through to xl and defaulted to on; so I withdraw my R-b for the libxl_x86.c hunk.

  Paul

> Jan
Igor Druzhinin Jan. 11, 2021, 1:50 p.m. UTC | #19
On 11/01/2021 13:47, Paul Durrant wrote:
>> On 11.01.2021 14:34, Igor Druzhinin wrote:
>>> I don't see a scenario where somebody would want to opt out of unlimited
>>> VPs per domain given the leaf with -1 is supported on all Windows versions.
>>
>> So Paul - commit patch as is then?
>>
>>> I can make it configurable in the future if reports re-surface it causes
>>> troubles somewhere.
>>
>> This is the slight concern I have: Having to make it configurable
>> once someone has reported trouble would look a little late to me.
>> Otoh I agree it may end up being dead code if no problems get
>> ever encountered.
>>
> 
> I think I'm persuaded by your caution. Since it's not a massive amount of code, let's have flags for both wired through to xl and default them to on, so I withdraw my R-b for the libxl_x86.c hunk.

Ok, will re-work the patches.

Igor
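The shape of the fix itself is small. This sketch mirrors the patch below: the HVMPV_no_vp_limit bit position is taken from the params.h hunk, while leaf_40000005_eax() is a simplified restatement of the new "case 5" (the real code reads the mask via viridian_feature_mask(d) and writes res->a; the function here is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Bit position as in the params.h hunk below; everything else simplified. */
#define _HVMPV_no_vp_limit 11
#define HVMPV_no_vp_limit  (1 << _HVMPV_no_vp_limit)

/*
 * Simplified restatement of the new "case 5" in cpuid_viridian_leaves():
 * CPUID.40000005.EAX is reported as -1 ("no VP limit") only when the
 * toolstack opted in via HVMPV_no_vp_limit; otherwise the leaf stays 0,
 * preserving the old behaviour for incoming streams from older Xen.
 */
static uint32_t leaf_40000005_eax(uint64_t viridian_feature_mask)
{
    return (viridian_feature_mask & HVMPV_no_vp_limit) ? (uint32_t)-1 : 0;
}
```

The libxl hunk then simply ORs HVMPV_no_vp_limit into the mask whenever the BASE enlightenment group is enabled, which is why the leaf appears by default for newly created guests but not for migrated-in ones.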
diff mbox series

Patch

diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index 86d2729..96c8bf1 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -336,7 +336,7 @@  static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
         LOG(DETAIL, "%s group enabled", libxl_viridian_enlightenment_to_string(v));
 
     if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_BASE)) {
-        mask |= HVMPV_base_freq;
+        mask |= HVMPV_base_freq | HVMPV_no_vp_limit;
 
         if (!libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_FREQ))
             mask |= HVMPV_no_freq;
diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index ed97804..ae1ea86 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -209,6 +209,29 @@  void cpuid_viridian_leaves(const struct vcpu *v, uint32_t leaf,
         res->b = viridian_spinlock_retry_count;
         break;
 
+    case 5:
+        /*
+         * From "Requirements for Implementing the Microsoft Hypervisor
+         *  Interface":
+         *
+         * "On Windows operating systems versions through Windows Server
+         * 2008 R2, reporting the HV#1 hypervisor interface limits
+         * the Windows virtual machine to a maximum of 64 VPs, regardless of
+         * what is reported via CPUID.40000005.EAX.
+         *
+         * Starting with Windows Server 2012 and Windows 8, if
+         * CPUID.40000005.EAX contains a value of -1, Windows assumes that
+         * the hypervisor imposes no specific limit to the number of VPs.
+         * In this case, Windows Server 2012 guest VMs may use more than 64
+         * VPs, up to the maximum supported number of processors applicable
+         * to the specific Windows version being used."
+         *
+         * For compatibility we hide it behind an option.
+         */
+        if ( viridian_feature_mask(d) & HVMPV_no_vp_limit )
+            res->a = -1;
+        break;
+
     case 6:
         /* Detected and in use hardware features. */
         if ( cpu_has_vmx_virtualize_apic_accesses )
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 3b0a0f4..805f4ca 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -168,6 +168,10 @@ 
 #define _HVMPV_ex_processor_masks 10
 #define HVMPV_ex_processor_masks (1 << _HVMPV_ex_processor_masks)
 
+/* Allow more than 64 VPs */
+#define _HVMPV_no_vp_limit 11
+#define HVMPV_no_vp_limit (1 << _HVMPV_no_vp_limit)
+
 #define HVMPV_feature_mask \
         (HVMPV_base_freq | \
          HVMPV_no_freq | \
@@ -179,7 +183,8 @@ 
          HVMPV_synic | \
          HVMPV_stimer | \
          HVMPV_hcall_ipi | \
-         HVMPV_ex_processor_masks)
+         HVMPV_ex_processor_masks | \
+         HVMPV_no_vp_limit)
 
 #endif