diff mbox series

kvm: x86: Increase MAX_VCPUS to 710

Message ID 20210831204535.1594297-1-ehabkost@redhat.com (mailing list archive)
State New, archived
Headers show
Series kvm: x86: Increase MAX_VCPUS to 710 | expand

Commit Message

Eduardo Habkost Aug. 31, 2021, 8:45 p.m. UTC
Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.

For reference, visible effects of changing KVM_MAX_VCPUS are:
- KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
- Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
  will now be 710
- Bitmap stack variables that will grow:
  - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
    - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
    - vcpu_bitmap will be 92 bytes long
  - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
    once patch "KVM: x86: Fix stack-out-of-bounds memory access
    from ioapic_write_indirect()" is applied

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Vitaly Kuznetsov Sept. 1, 2021, 8:02 a.m. UTC | #1
Eduardo Habkost <ehabkost@redhat.com> writes:

> Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
> Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.
>
> For reference, visible effects of changing KVM_MAX_VCPUS are:
> - KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
> - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
>   will now be 710
> - Bitmap stack variables that will grow:
>   - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
>     - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
>     - vcpu_bitmap will be 92 bytes long
>   - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
>     once patch "KVM: x86: Fix stack-out-of-bounds memory access
>     from ioapic_write_indirect()" is applied
>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index af6ce8d4c86a..f76fae42bf45 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -37,8 +37,8 @@
>  
>  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>  
> -#define KVM_MAX_VCPUS 288
> -#define KVM_SOFT_MAX_VCPUS 240
> +#define KVM_MAX_VCPUS 710

Out of pure curiosity, where did 710 came from? Is this some particular
hardware which was used for testing (weird number btw). Should we maybe
go to e.g. 1024 for the sake of the beauty of powers of two? :-)

> +#define KVM_SOFT_MAX_VCPUS 710

Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?

Reading 

commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
Author: Sasha Levin <levinsasha928@gmail.com>
Date:   Mon Jul 18 17:17:15 2011 +0300

    KVM: x86: Raise the hard VCPU count limit

the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
vCPU numbers without claiming such configurations as supported.

I have two alternative suggestions:
1) Drop KVM_SOFT_MAX_VCPUS completely.
2) Raise it to a higher number (e.g. 2048)

>  #define KVM_MAX_VCPU_ID 1023

1023 may not be enough now. I rememeber there was a suggestion to make
max_vcpus configurable via module parameter and this question was
raised:

https://lore.kernel.org/lkml/878s292k75.fsf@vitty.brq.redhat.com/

TL;DR: to support EPYC-like topologies we need to keep
 KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS

>  /* memory slots that are not exposed to userspace */
>  #define KVM_PRIVATE_MEM_SLOTS 3
Igor Mammedov Sept. 1, 2021, 9:13 a.m. UTC | #2
On Wed, 01 Sep 2021 10:02:18 +0200
Vitaly Kuznetsov <vkuznets@redhat.com> wrote:

> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
> > Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.
> >
> > For reference, visible effects of changing KVM_MAX_VCPUS are:
> > - KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
> > - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
> >   will now be 710
> > - Bitmap stack variables that will grow:
> >   - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
> >     - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
> >     - vcpu_bitmap will be 92 bytes long
> >   - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
> >     once patch "KVM: x86: Fix stack-out-of-bounds memory access
> >     from ioapic_write_indirect()" is applied
> >
> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index af6ce8d4c86a..f76fae42bf45 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -37,8 +37,8 @@
> >  
> >  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
> >  
> > -#define KVM_MAX_VCPUS 288
> > -#define KVM_SOFT_MAX_VCPUS 240
> > +#define KVM_MAX_VCPUS 710  
> 
> Out of pure curiosity, where did 710 came from? Is this some particular
> hardware which was used for testing (weird number btw). Should we maybe
> go to e.g. 1024 for the sake of the beauty of powers of two? :-)
> 
> > +#define KVM_SOFT_MAX_VCPUS 710  
> 
> Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?
> 
> Reading 
> 
> commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
> Author: Sasha Levin <levinsasha928@gmail.com>
> Date:   Mon Jul 18 17:17:15 2011 +0300
> 
>     KVM: x86: Raise the hard VCPU count limit
> 
> the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
> vCPU numbers without claiming such configurations as supported.
> 
> I have two alternative suggestions:
> 1) Drop KVM_SOFT_MAX_VCPUS completely.
> 2) Raise it to a higher number (e.g. 2048)
> 
> >  #define KVM_MAX_VCPU_ID 1023  
> 
> 1023 may not be enough now. I rememeber there was a suggestion to make
> max_vcpus configurable via module parameter and this question was
> raised:
> 
> https://lore.kernel.org/lkml/878s292k75.fsf@vitty.brq.redhat.com/
> 
> TL;DR: to support EPYC-like topologies we need to keep
>  KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS

VCPU_ID (sequential 0-n range) is not APIC ID (sparse distribution),
so topology encoded in the later should be orthogonal to VCPU_ID.

> >  /* memory slots that are not exposed to userspace */
> >  #define KVM_PRIVATE_MEM_SLOTS 3  
>
Vitaly Kuznetsov Sept. 1, 2021, 10:13 a.m. UTC | #3
Igor Mammedov <imammedo@redhat.com> writes:

> On Wed, 01 Sep 2021 10:02:18 +0200
> Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>> 
>> > Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
>> > Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.
>> >
>> > For reference, visible effects of changing KVM_MAX_VCPUS are:
>> > - KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
>> > - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
>> >   will now be 710
>> > - Bitmap stack variables that will grow:
>> >   - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
>> >     - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
>> >     - vcpu_bitmap will be 92 bytes long
>> >   - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
>> >     once patch "KVM: x86: Fix stack-out-of-bounds memory access
>> >     from ioapic_write_indirect()" is applied
>> >
>> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>> > ---
>> >  arch/x86/include/asm/kvm_host.h | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> > index af6ce8d4c86a..f76fae42bf45 100644
>> > --- a/arch/x86/include/asm/kvm_host.h
>> > +++ b/arch/x86/include/asm/kvm_host.h
>> > @@ -37,8 +37,8 @@
>> >  
>> >  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>> >  
>> > -#define KVM_MAX_VCPUS 288
>> > -#define KVM_SOFT_MAX_VCPUS 240
>> > +#define KVM_MAX_VCPUS 710  
>> 
>> Out of pure curiosity, where did 710 came from? Is this some particular
>> hardware which was used for testing (weird number btw). Should we maybe
>> go to e.g. 1024 for the sake of the beauty of powers of two? :-)
>> 
>> > +#define KVM_SOFT_MAX_VCPUS 710  
>> 
>> Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?
>> 
>> Reading 
>> 
>> commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
>> Author: Sasha Levin <levinsasha928@gmail.com>
>> Date:   Mon Jul 18 17:17:15 2011 +0300
>> 
>>     KVM: x86: Raise the hard VCPU count limit
>> 
>> the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
>> vCPU numbers without claiming such configurations as supported.
>> 
>> I have two alternative suggestions:
>> 1) Drop KVM_SOFT_MAX_VCPUS completely.
>> 2) Raise it to a higher number (e.g. 2048)
>> 
>> >  #define KVM_MAX_VCPU_ID 1023  
>> 
>> 1023 may not be enough now. I rememeber there was a suggestion to make
>> max_vcpus configurable via module parameter and this question was
>> raised:
>> 
>> https://lore.kernel.org/lkml/878s292k75.fsf@vitty.brq.redhat.com/
>> 
>> TL;DR: to support EPYC-like topologies we need to keep
>>  KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS
>
> VCPU_ID (sequential 0-n range) is not APIC ID (sparse distribution),
> so topology encoded in the later should be orthogonal to VCPU_ID.

Why do we even have KVM_MAX_VCPU_ID which is != KVM[_SOFT]_MAX_VCPUS
then?

KVM_MAX_VCPU_ID is only checked in kvm_vm_ioctl_create_vcpu() which
passes 'id' down to kvm_vcpu_init() which, in its turn, sets
'vcpu->vcpu_id'. This is, for example, returned by kvm_x2apic_id():

static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
{
        return apic->vcpu->vcpu_id;
}

So I'm pretty certain this is actually APIC id and it has topology in
it.
Igor Mammedov Sept. 1, 2021, 1:36 p.m. UTC | #4
On Wed, 01 Sep 2021 12:13:55 +0200
Vitaly Kuznetsov <vkuznets@redhat.com> wrote:

> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > On Wed, 01 Sep 2021 10:02:18 +0200
> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >  
> >> Eduardo Habkost <ehabkost@redhat.com> writes:
> >>   
> >> > Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
> >> > Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.
> >> >
> >> > For reference, visible effects of changing KVM_MAX_VCPUS are:
> >> > - KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
> >> > - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
> >> >   will now be 710
> >> > - Bitmap stack variables that will grow:
> >> >   - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
> >> >     - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
> >> >     - vcpu_bitmap will be 92 bytes long
> >> >   - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
> >> >     once patch "KVM: x86: Fix stack-out-of-bounds memory access
> >> >     from ioapic_write_indirect()" is applied
> >> >
> >> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> >> > ---
> >> >  arch/x86/include/asm/kvm_host.h | 4 ++--
> >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> > index af6ce8d4c86a..f76fae42bf45 100644
> >> > --- a/arch/x86/include/asm/kvm_host.h
> >> > +++ b/arch/x86/include/asm/kvm_host.h
> >> > @@ -37,8 +37,8 @@
> >> >  
> >> >  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
> >> >  
> >> > -#define KVM_MAX_VCPUS 288
> >> > -#define KVM_SOFT_MAX_VCPUS 240
> >> > +#define KVM_MAX_VCPUS 710    
> >> 
> >> Out of pure curiosity, where did 710 came from? Is this some particular
> >> hardware which was used for testing (weird number btw). Should we maybe
> >> go to e.g. 1024 for the sake of the beauty of powers of two? :-)
> >>   
> >> > +#define KVM_SOFT_MAX_VCPUS 710    
> >> 
> >> Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?
> >> 
> >> Reading 
> >> 
> >> commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
> >> Author: Sasha Levin <levinsasha928@gmail.com>
> >> Date:   Mon Jul 18 17:17:15 2011 +0300
> >> 
> >>     KVM: x86: Raise the hard VCPU count limit
> >> 
> >> the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
> >> vCPU numbers without claiming such configurations as supported.
> >> 
> >> I have two alternative suggestions:
> >> 1) Drop KVM_SOFT_MAX_VCPUS completely.
> >> 2) Raise it to a higher number (e.g. 2048)
> >>   
> >> >  #define KVM_MAX_VCPU_ID 1023    
> >> 
> >> 1023 may not be enough now. I rememeber there was a suggestion to make
> >> max_vcpus configurable via module parameter and this question was
> >> raised:
> >> 
> >> https://lore.kernel.org/lkml/878s292k75.fsf@vitty.brq.redhat.com/
> >> 
> >> TL;DR: to support EPYC-like topologies we need to keep
> >>  KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS  
> >
> > VCPU_ID (sequential 0-n range) is not APIC ID (sparse distribution),
> > so topology encoded in the later should be orthogonal to VCPU_ID.  
> 
> Why do we even have KVM_MAX_VCPU_ID which is != KVM[_SOFT]_MAX_VCPUS
> then?
I'd say for compat reasons (8c3ba334f85 KVM: x86: Raise the hard VCPU count limit)

qemu warns users that they are out of recommended (tested) limit when
it sees requested maxcpus over soft limit.
See soft_vcpus_limit in qemu.


> KVM_MAX_VCPU_ID is only checked in kvm_vm_ioctl_create_vcpu() which
> passes 'id' down to kvm_vcpu_init() which, in its turn, sets
> 'vcpu->vcpu_id'. This is, for example, returned by kvm_x2apic_id():
> 
> static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> {
>         return apic->vcpu->vcpu_id;
> }
> 
> So I'm pretty certain this is actually APIC id and it has topology in
> it.
Yep, I mixed it up with cpu_index on QEMU side,
for x86 it fetches actual apic id and feeds that to kvm when creating vCPU.

It looks like KVM_MAX_VCPU_ID (KVM_SOFT_MAX_VCPUS) is essentially
MAX_[SOFT_]APIC_ID which in some places is treated as max number of vCPUs,
so actual max count of vCPUs could be less than that (in case of sparse APIC
IDs /non power of 2 thread|core|whatever count/).
Vitaly Kuznetsov Sept. 1, 2021, 2:42 p.m. UTC | #5
Igor Mammedov <imammedo@redhat.com> writes:

> On Wed, 01 Sep 2021 12:13:55 +0200
> Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
>> Igor Mammedov <imammedo@redhat.com> writes:
>> 
>> > On Wed, 01 Sep 2021 10:02:18 +0200
>> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >  
>> >> Eduardo Habkost <ehabkost@redhat.com> writes:
>> >>   
>> >> > Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
>> >> > Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.
>> >> >
>> >> > For reference, visible effects of changing KVM_MAX_VCPUS are:
>> >> > - KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
>> >> > - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
>> >> >   will now be 710
>> >> > - Bitmap stack variables that will grow:
>> >> >   - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
>> >> >     - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
>> >> >     - vcpu_bitmap will be 92 bytes long
>> >> >   - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
>> >> >     once patch "KVM: x86: Fix stack-out-of-bounds memory access
>> >> >     from ioapic_write_indirect()" is applied
>> >> >
>> >> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>> >> > ---
>> >> >  arch/x86/include/asm/kvm_host.h | 4 ++--
>> >> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> >> > index af6ce8d4c86a..f76fae42bf45 100644
>> >> > --- a/arch/x86/include/asm/kvm_host.h
>> >> > +++ b/arch/x86/include/asm/kvm_host.h
>> >> > @@ -37,8 +37,8 @@
>> >> >  
>> >> >  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>> >> >  
>> >> > -#define KVM_MAX_VCPUS 288
>> >> > -#define KVM_SOFT_MAX_VCPUS 240
>> >> > +#define KVM_MAX_VCPUS 710    
>> >> 
>> >> Out of pure curiosity, where did 710 came from? Is this some particular
>> >> hardware which was used for testing (weird number btw). Should we maybe
>> >> go to e.g. 1024 for the sake of the beauty of powers of two? :-)
>> >>   
>> >> > +#define KVM_SOFT_MAX_VCPUS 710    
>> >> 
>> >> Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?
>> >> 
>> >> Reading 
>> >> 
>> >> commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
>> >> Author: Sasha Levin <levinsasha928@gmail.com>
>> >> Date:   Mon Jul 18 17:17:15 2011 +0300
>> >> 
>> >>     KVM: x86: Raise the hard VCPU count limit
>> >> 
>> >> the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
>> >> vCPU numbers without claiming such configurations as supported.
>> >> 
>> >> I have two alternative suggestions:
>> >> 1) Drop KVM_SOFT_MAX_VCPUS completely.
>> >> 2) Raise it to a higher number (e.g. 2048)
>> >>   
>> >> >  #define KVM_MAX_VCPU_ID 1023    
>> >> 
>> >> 1023 may not be enough now. I rememeber there was a suggestion to make
>> >> max_vcpus configurable via module parameter and this question was
>> >> raised:
>> >> 
>> >> https://lore.kernel.org/lkml/878s292k75.fsf@vitty.brq.redhat.com/
>> >> 
>> >> TL;DR: to support EPYC-like topologies we need to keep
>> >>  KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS  
>> >
>> > VCPU_ID (sequential 0-n range) is not APIC ID (sparse distribution),
>> > so topology encoded in the later should be orthogonal to VCPU_ID.  
>> 
>> Why do we even have KVM_MAX_VCPU_ID which is != KVM[_SOFT]_MAX_VCPUS
>> then?
> I'd say for compat reasons (8c3ba334f85 KVM: x86: Raise the hard VCPU count limit)
>
> qemu warns users that they are out of recommended (tested) limit when
> it sees requested maxcpus over soft limit.
> See soft_vcpus_limit in qemu.
>

That's the reason why we have KVM_SOFT_MAX_VCPUS in addition to
KVM_MAX_VCPUS, not why we have KVM_MAX_VCPU_ID :-)

>
>> KVM_MAX_VCPU_ID is only checked in kvm_vm_ioctl_create_vcpu() which
>> passes 'id' down to kvm_vcpu_init() which, in its turn, sets
>> 'vcpu->vcpu_id'. This is, for example, returned by kvm_x2apic_id():
>> 
>> static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
>> {
>>         return apic->vcpu->vcpu_id;
>> }
>> 
>> So I'm pretty certain this is actually APIC id and it has topology in
>> it.
> Yep, I mixed it up with cpu_index on QEMU side,
> for x86 it fetches actual apic id and feeds that to kvm when creating vCPU.
>
> It looks like KVM_MAX_VCPU_ID (KVM_SOFT_MAX_VCPUS) is essentially
> MAX_[SOFT_]APIC_ID which in some places is treated as max number of vCPUs,
> so actual max count of vCPUs could be less than that (in case of sparse APIC
> IDs /non power of 2 thread|core|whatever count/).

Yes. To avoid the confusion, I'd suggest we re-define KVM_MAX_VCPU_ID as
something like:

#define KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO 4
#define KVM_MAX_VCPU_ID (KVM_MAX_VCPUS * KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO)

and add a comment about sparse APIC IDs/topology.
Eduardo Habkost Sept. 1, 2021, 3:25 p.m. UTC | #6
On Wed, Sep 01, 2021 at 04:42:08PM +0200, Vitaly Kuznetsov wrote:
> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > On Wed, 01 Sep 2021 12:13:55 +0200
> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >
> >> Igor Mammedov <imammedo@redhat.com> writes:
> >> 
> >> > On Wed, 01 Sep 2021 10:02:18 +0200
> >> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >> >  
> >> >> Eduardo Habkost <ehabkost@redhat.com> writes:
> >> >>   
> >> >> > Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
> >> >> > Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.
> >> >> >
> >> >> > For reference, visible effects of changing KVM_MAX_VCPUS are:
> >> >> > - KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
> >> >> > - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
> >> >> >   will now be 710
> >> >> > - Bitmap stack variables that will grow:
> >> >> >   - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
> >> >> >     - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
> >> >> >     - vcpu_bitmap will be 92 bytes long
> >> >> >   - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
> >> >> >     once patch "KVM: x86: Fix stack-out-of-bounds memory access
> >> >> >     from ioapic_write_indirect()" is applied
> >> >> >
> >> >> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> >> >> > ---
> >> >> >  arch/x86/include/asm/kvm_host.h | 4 ++--
> >> >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >> >> >
> >> >> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> >> > index af6ce8d4c86a..f76fae42bf45 100644
> >> >> > --- a/arch/x86/include/asm/kvm_host.h
> >> >> > +++ b/arch/x86/include/asm/kvm_host.h
> >> >> > @@ -37,8 +37,8 @@
> >> >> >  
> >> >> >  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
> >> >> >  
> >> >> > -#define KVM_MAX_VCPUS 288
> >> >> > -#define KVM_SOFT_MAX_VCPUS 240
> >> >> > +#define KVM_MAX_VCPUS 710    
> >> >> 
> >> >> Out of pure curiosity, where did 710 came from? Is this some particular
> >> >> hardware which was used for testing (weird number btw). Should we maybe
> >> >> go to e.g. 1024 for the sake of the beauty of powers of two? :-)

710 wasn't tested with real VMs yet due to userspace limitations
that still need to be addressed (specifically, due to SMBIOS 2.1
table size limits).

I would be more than happy to set it to 1024 or 2048 if the KVM
maintainers agree.  :)

For reference, RHEL-8.4 is compiled with KVM_MAX_VCPUS=2048, but
userspace enforces a 710 VCPU limit.

> >> >>   
> >> >> > +#define KVM_SOFT_MAX_VCPUS 710    
> >> >> 
> >> >> Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?
> >> >> 
> >> >> Reading 
> >> >> 
> >> >> commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
> >> >> Author: Sasha Levin <levinsasha928@gmail.com>
> >> >> Date:   Mon Jul 18 17:17:15 2011 +0300
> >> >> 
> >> >>     KVM: x86: Raise the hard VCPU count limit
> >> >> 
> >> >> the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
> >> >> vCPU numbers without claiming such configurations as supported.
> >> >> 
> >> >> I have two alternative suggestions:
> >> >> 1) Drop KVM_SOFT_MAX_VCPUS completely.
> >> >> 2) Raise it to a higher number (e.g. 2048)

I will send a RFC later proposing we make KVM_MAX_VCPUS
configurable by Kconfig, and dropping KVM_SOFT_MAX_VCPUS.

> >> >>   
> >> >> >  #define KVM_MAX_VCPU_ID 1023    
> >> >> 
> >> >> 1023 may not be enough now. I rememeber there was a suggestion to make
> >> >> max_vcpus configurable via module parameter and this question was
> >> >> raised:
> >> >> 
> >> >> https://lore.kernel.org/lkml/878s292k75.fsf@vitty.brq.redhat.com/
> >> >> 
> >> >> TL;DR: to support EPYC-like topologies we need to keep
> >> >>  KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS  

1024 seems to be enough on all the CPU topologies I have seen,
but I can happily implement the 4x rule below, just to be sure.

> >> >
> >> > VCPU_ID (sequential 0-n range) is not APIC ID (sparse distribution),
> >> > so topology encoded in the later should be orthogonal to VCPU_ID.  
> >> 
> >> Why do we even have KVM_MAX_VCPU_ID which is != KVM[_SOFT]_MAX_VCPUS
> >> then?
> > I'd say for compat reasons (8c3ba334f85 KVM: x86: Raise the hard VCPU count limit)
> >
> > qemu warns users that they are out of recommended (tested) limit when
> > it sees requested maxcpus over soft limit.
> > See soft_vcpus_limit in qemu.
> >
> 
> That's the reason why we have KVM_SOFT_MAX_VCPUS in addition to
> KVM_MAX_VCPUS, not why we have KVM_MAX_VCPU_ID :-)
> 
> >
> >> KVM_MAX_VCPU_ID is only checked in kvm_vm_ioctl_create_vcpu() which
> >> passes 'id' down to kvm_vcpu_init() which, in its turn, sets
> >> 'vcpu->vcpu_id'. This is, for example, returned by kvm_x2apic_id():
> >> 
> >> static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> >> {
> >>         return apic->vcpu->vcpu_id;
> >> }
> >> 
> >> So I'm pretty certain this is actually APIC id and it has topology in
> >> it.
> > Yep, I mixed it up with cpu_index on QEMU side,
> > for x86 it fetches actual apic id and feeds that to kvm when creating vCPU.
> >
> > It looks like KVM_MAX_VCPU_ID (KVM_SOFT_MAX_VCPUS) is essentially
> > MAX_[SOFT_]APIC_ID which in some places is treated as max number of vCPUs,
> > so actual max count of vCPUs could be less than that (in case of sparse APIC
> > IDs /non power of 2 thread|core|whatever count/).
> 
> Yes. To avoid the confusion, I'd suggest we re-define KVM_MAX_VCPU_ID as
> something like:
> 
> #define KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO 4
> #define KVM_MAX_VCPU_ID (KVM_MAX_VCPUS * KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO)
> 
> and add a comment about sparse APIC IDs/topology.

I will submit a new version of this patch with a rule like the
above.

A 4x ratio is very generous, but the only impact of a large
KVM_MAX_VCPU_ID is a larger struct kvm_ioapic.  Changing
KVM_MAX_VCPU_ID from 1024 to 4096 will make struct kvm_ioapic
grow from 1628 bytes to 5084 bytes, which I assume is OK.
Eduardo Habkost Sept. 1, 2021, 5:54 p.m. UTC | #7
On Wed, Sep 01, 2021 at 11:25:25AM -0400, Eduardo Habkost wrote:
> 710 wasn't tested with real VMs yet due to userspace limitations
> that still need to be addressed (specifically, due to SMBIOS 2.1
> table size limits).

Oops, I mean VMs _larger_ than 710 VCPUs weren't tested.  710 was
tested with real VMs, and was the higher we could go before
hitting the SMBIOS 2.1 table size limits.
Jürgen Groß Sept. 3, 2021, 8:13 a.m. UTC | #8
On 01.09.21 17:25, Eduardo Habkost wrote:
> On Wed, Sep 01, 2021 at 04:42:08PM +0200, Vitaly Kuznetsov wrote:
>> Igor Mammedov <imammedo@redhat.com> writes:
>>
>>> On Wed, 01 Sep 2021 12:13:55 +0200
>>> Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>
>>>> Igor Mammedov <imammedo@redhat.com> writes:
>>>>
>>>>> On Wed, 01 Sep 2021 10:02:18 +0200
>>>>> Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>>>   
>>>>>> Eduardo Habkost <ehabkost@redhat.com> writes:
>>>>>>    
>>>>>>> Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
>>>>>>> Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.
>>>>>>>
>>>>>>> For reference, visible effects of changing KVM_MAX_VCPUS are:
>>>>>>> - KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
>>>>>>> - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
>>>>>>>    will now be 710
>>>>>>> - Bitmap stack variables that will grow:
>>>>>>>    - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
>>>>>>>      - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
>>>>>>>      - vcpu_bitmap will be 92 bytes long
>>>>>>>    - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
>>>>>>>      once patch "KVM: x86: Fix stack-out-of-bounds memory access
>>>>>>>      from ioapic_write_indirect()" is applied
>>>>>>>
>>>>>>> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>>>>>>> ---
>>>>>>>   arch/x86/include/asm/kvm_host.h | 4 ++--
>>>>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>>>> index af6ce8d4c86a..f76fae42bf45 100644
>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>>> @@ -37,8 +37,8 @@
>>>>>>>   
>>>>>>>   #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>>>>>>>   
>>>>>>> -#define KVM_MAX_VCPUS 288
>>>>>>> -#define KVM_SOFT_MAX_VCPUS 240
>>>>>>> +#define KVM_MAX_VCPUS 710
>>>>>>
>>>>>> Out of pure curiosity, where did 710 came from? Is this some particular
>>>>>> hardware which was used for testing (weird number btw). Should we maybe
>>>>>> go to e.g. 1024 for the sake of the beauty of powers of two? :-)
> 
> 710 wasn't tested with real VMs yet due to userspace limitations
> that still need to be addressed (specifically, due to SMBIOS 2.1
> table size limits).
> 
> I would be more than happy to set it to 1024 or 2048 if the KVM
> maintainers agree.  :)
> 
> For reference, RHEL-8.4 is compiled with KVM_MAX_VCPUS=2048, but
> userspace enforces a 710 VCPU limit.
> 
>>>>>>    
>>>>>>> +#define KVM_SOFT_MAX_VCPUS 710
>>>>>>
>>>>>> Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?
>>>>>>
>>>>>> Reading
>>>>>>
>>>>>> commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
>>>>>> Author: Sasha Levin <levinsasha928@gmail.com>
>>>>>> Date:   Mon Jul 18 17:17:15 2011 +0300
>>>>>>
>>>>>>      KVM: x86: Raise the hard VCPU count limit
>>>>>>
>>>>>> the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
>>>>>> vCPU numbers without claiming such configurations as supported.
>>>>>>
>>>>>> I have two alternative suggestions:
>>>>>> 1) Drop KVM_SOFT_MAX_VCPUS completely.
>>>>>> 2) Raise it to a higher number (e.g. 2048)
> 
> I will send a RFC later proposing we make KVM_MAX_VCPUS
> configurable by Kconfig, and dropping KVM_SOFT_MAX_VCPUS.
> 
>>>>>>    
>>>>>>>   #define KVM_MAX_VCPU_ID 1023
>>>>>>
>>>>>> 1023 may not be enough now. I rememeber there was a suggestion to make
>>>>>> max_vcpus configurable via module parameter and this question was
>>>>>> raised:
>>>>>>
>>>>>> https://lore.kernel.org/lkml/878s292k75.fsf@vitty.brq.redhat.com/
>>>>>>
>>>>>> TL;DR: to support EPYC-like topologies we need to keep
>>>>>>   KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS
> 
> 1024 seems to be enough on all the CPU topologies I have seen,
> but I can happily implement the 4x rule below, just to be sure.
> 
>>>>>
>>>>> VCPU_ID (sequential 0-n range) is not APIC ID (sparse distribution),
>>>>> so topology encoded in the later should be orthogonal to VCPU_ID.
>>>>
>>>> Why do we even have KVM_MAX_VCPU_ID which is != KVM[_SOFT]_MAX_VCPUS
>>>> then?
>>> I'd say for compat reasons (8c3ba334f85 KVM: x86: Raise the hard VCPU count limit)
>>>
>>> qemu warns users that they are out of recommended (tested) limit when
>>> it sees requested maxcpus over soft limit.
>>> See soft_vcpus_limit in qemu.
>>>
>>
>> That's the reason why we have KVM_SOFT_MAX_VCPUS in addition to
>> KVM_MAX_VCPUS, not why we have KVM_MAX_VCPU_ID :-)
>>
>>>
>>>> KVM_MAX_VCPU_ID is only checked in kvm_vm_ioctl_create_vcpu() which
>>>> passes 'id' down to kvm_vcpu_init() which, in its turn, sets
>>>> 'vcpu->vcpu_id'. This is, for example, returned by kvm_x2apic_id():
>>>>
>>>> static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
>>>> {
>>>>          return apic->vcpu->vcpu_id;
>>>> }
>>>>
>>>> So I'm pretty certain this is actually APIC id and it has topology in
>>>> it.
>>> Yep, I mixed it up with cpu_index on QEMU side,
>>> for x86 it fetches actual apic id and feeds that to kvm when creating vCPU.
>>>
>>> It looks like KVM_MAX_VCPU_ID (KVM_SOFT_MAX_VCPUS) is essentially
>>> MAX_[SOFT_]APIC_ID which in some places is treated as max number of vCPUs,
>>> so actual max count of vCPUs could be less than that (in case of sparse APIC
>>> IDs /non power of 2 thread|core|whatever count/).
>>
>> Yes. To avoid the confusion, I'd suggest we re-define KVM_MAX_VCPU_ID as
>> something like:
>>
>> #define KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO 4
>> #define KVM_MAX_VCPU_ID (KVM_MAX_VCPUS * KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO)
>>
>> and add a comment about sparse APIC IDs/topology.
> 
> I will submit a new version of this patch with a rule like the
> above.
> 
> A 4x ratio is very generous, but the only impact of a large
> KVM_MAX_VCPU_ID is a larger struct kvm_ioapic.  Changing
> KVM_MAX_VCPU_ID from 1024 to 4096 will make struct kvm_ioapic
> grow from 1628 bytes to 5084 bytes, which I assume is OK.
> 

I'm just about to send V2 of my series [1] to support specifying the
max vcpu-id and max number of vcpus of a guest via command line
parameters.


Juergen

[1]: https://lore.kernel.org/kvm/20210701154105.23215-1-jgross@suse.com/
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af6ce8d4c86a..f76fae42bf45 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -37,8 +37,8 @@ 
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
-#define KVM_MAX_VCPUS 288
-#define KVM_SOFT_MAX_VCPUS 240
+#define KVM_MAX_VCPUS 710
+#define KVM_SOFT_MAX_VCPUS 710
 #define KVM_MAX_VCPU_ID 1023
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3