
[v2,0/3] kvm: x86: Set KVM_MAX_VCPUS=1024, KVM_SOFT_MAX_VCPUS=710

Message ID 20210903211600.2002377-1-ehabkost@redhat.com (mailing list archive)

Message

Eduardo Habkost Sept. 3, 2021, 9:15 p.m. UTC
This:
- Increases KVM_MAX_VCPUS from 288 to 1024.
- Increases KVM_MAX_VCPU_ID from 1023 to 4096.
- Increases KVM_SOFT_MAX_VCPUS from 240 to 710.

Note that this conflicts with:
  https://lore.kernel.org/lkml/20210903130808.30142-1-jgross@suse.com
  Date: Fri,  3 Sep 2021 15:08:01 +0200
  From: Juergen Gross <jgross@suse.com>
  Subject: [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs
  Message-Id: <20210903130808.30142-1-jgross@suse.com>

I don't intend to block Juergen's work.  I will be happy to
rebase and resubmit in case Juergen's series is merged first.
However, I do propose that we set the values above as the default
even if Juergen's series is merged.

The additional overhead (on x86_64) of the new defaults will be:
- struct kvm_ioapic will grow from 1628 to 5084 bytes.
- struct kvm will grow from 19328 to 22272 bytes.
- Bitmap stack variables that will grow:
  - At kvm_hv_flush_tlb() & kvm_hv_send_ipi(),
    vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
  - vcpu_bitmap at ioapic_write_indirect() will be 128 bytes long
    once patch "KVM: x86: Fix stack-out-of-bounds memory access
    from ioapic_write_indirect()" is applied

Changes v1 -> v2:
* KVM_MAX_VCPUS is now 1024 (v1 set it to 710)
* KVM_MAX_VCPU_ID is now 4096 (v1 left it unchanged, at 1023)

v1 of this series was:

  https://lore.kernel.org/lkml/20210831204535.1594297-1-ehabkost@redhat.com
  Date: Tue, 31 Aug 2021 16:45:35 -0400
  From: Eduardo Habkost <ehabkost@redhat.com>
  Subject: [PATCH] kvm: x86: Increase MAX_VCPUS to 710
  Message-Id: <20210831204535.1594297-1-ehabkost@redhat.com>

Eduardo Habkost (3):
  kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
  kvm: x86: Increase MAX_VCPUS to 1024
  kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710

 arch/x86/include/asm/kvm_host.h | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Paolo Bonzini Sept. 6, 2021, 10:12 a.m. UTC | #1
Queued, thanks.

Paolo
Sean Christopherson Sept. 11, 2021, 12:30 a.m. UTC | #2
On Mon, Sep 06, 2021, Paolo Bonzini wrote:
> On 03/09/21 23:15, Eduardo Habkost wrote:
> > - Increases KVM_MAX_VCPU_ID from 1023 to 4096.

...

> > Eduardo Habkost (3):
> >    kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS

...

> >    kvm: x86: Increase MAX_VCPUS to 1024
> >    kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
> > 
> >   arch/x86/include/asm/kvm_host.h | 18 +++++++++++++++---
> >   1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> 
> Queued, thanks.

Before we commit to this, can we sort out the off-by-one mess that is KVM_MAX_VCPU_ID?
As Eduardo pointed out[*], Juergen's commit 76b4f357d0e7 ("x86/kvm: fix vcpu-id
indexed array sizes") _shouldn't_ be necessary because kvm_vm_ioctl_create_vcpu()
rejects attempts to create id==KVM_MAX_VCPU_ID:

	if (id >= KVM_MAX_VCPU_ID)
		return -EINVAL;

which aligns with the docs for KVM_CREATE_VCPU:

	The vcpu id is an integer in the range [0, max_vcpu_id)

but the fix is "needed" because rtc_irq_eoi_tracking_reset() does

	bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPU_ID + 1);

and now we have this

	DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
	u8 vectors[KVM_MAX_VCPU_ID + 1];

which is wrong as well.  The "right" fix would have been to change
rtc_irq_eoi_tracking_reset(), but that looks all kinds of wrong on the surface.

Non-x86 really mucks it up because generic code does:

	#ifndef KVM_MAX_VCPU_ID
	#define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
	#endif

which means pretty much everything else can create more vCPUs than vCPU IDs.

Maybe fix KVM's internal KVM_MAX_VCPU_ID so that it's inclusive, and handle the
backwards compatibility mess in KVM_CAP_MAX_VCPU_ID?  Then have the max ID for x86
be (4*KVM_MAX_VCPUS - 1).  E.g. something like:

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index caaa0f592d8e..0292d8a5ce5e 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -434,7 +434,7 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu);
 #define SPLIT_HACK_OFFS                        0xfb000000

 /*
- * This packs a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the
+ * This packs a VCPU ID from the [0..KVM_MAX_VCPU_ID] space down to the
  * [0..KVM_MAX_VCPUS) space, using knowledge of the guest's core stride
  * (but not its actual threading mode, which is not available) to avoid
  * collisions.
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9f52f282b1aa..beeebace8d1c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -33,11 +33,11 @@

 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #include <asm/kvm_book3s_asm.h>                /* for MAX_SMT_THREADS */
-#define KVM_MAX_VCPU_ID                (MAX_SMT_THREADS * KVM_MAX_VCORES)
+#define KVM_MAX_VCPU_ID                ((MAX_SMT_THREADS * KVM_MAX_VCORES) - 1)
 #define KVM_MAX_NESTED_GUESTS  KVMPPC_NR_LPIDS

 #else
-#define KVM_MAX_VCPU_ID                KVM_MAX_VCPUS
+#define KVM_MAX_VCPU_ID                (KVM_MAX_VCPUS - 1)
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */

 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5d5c5ed7dd4..5c20c0bd4db6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4061,7 +4061,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
                r = KVM_MAX_VCPUS;
                break;
        case KVM_CAP_MAX_VCPU_ID:
-               r = KVM_MAX_VCPU_ID;
+               /* KVM's ABI is stupid. */
+               r = KVM_MAX_VCPU_ID + 1;
                break;
        case KVM_CAP_PV_MMU:    /* obsolete */
                r = 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae7735b490b4..37ef972caf4b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -40,7 +40,7 @@
 #include <linux/kvm_dirty_ring.h>

 #ifndef KVM_MAX_VCPU_ID
-#define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
+#define KVM_MAX_VCPU_ID (KVM_MAX_VCPUS - 1)
 #endif

 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b50dbe269f4b..ba46c42a4a6f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3460,7 +3460,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
        struct kvm_vcpu *vcpu;
        struct page *page;

-       if (id >= KVM_MAX_VCPU_ID)
+       if (id > KVM_MAX_VCPU_ID)
                return -EINVAL;

        mutex_lock(&kvm->lock);
Eduardo Habkost Sept. 11, 2021, 3:26 p.m. UTC | #3
On Fri, Sep 10, 2021 at 8:30 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Sep 06, 2021, Paolo Bonzini wrote:
> > On 03/09/21 23:15, Eduardo Habkost wrote:
> > > - Increases KVM_MAX_VCPU_ID from 1023 to 4096.
>
> ...
>
> > > Eduardo Habkost (3):
> > >    kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
>
> ...
>
> > >    kvm: x86: Increase MAX_VCPUS to 1024
> > >    kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
> > >
> > >   arch/x86/include/asm/kvm_host.h | 18 +++++++++++++++---
> > >   1 file changed, 15 insertions(+), 3 deletions(-)
> > >
> >
> > Queued, thanks.
>
> Before we commit to this, can we sort out the off-by-one mess that is KVM_MAX_VCPU_ID?
> As Eduardo pointed out[*], Juergen's commit 76b4f357d0e7 ("x86/kvm: fix vcpu-id
> indexed array sizes") _shouldn't_ be necessary because kvm_vm_ioctl_create_vcpu()
> rejects attempts to create id==KVM_MAX_VCPU_ID
>
>         if (id >= KVM_MAX_VCPU_ID)
>                 return -EINVAL;
>
> which aligns with the docs for KVM_CREATE_VCPU:
>
>         The vcpu id is an integer in the range [0, max_vcpu_id)
>
> but the fix is "needed" because rtc_irq_eoi_tracking_reset() does
>
>         bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPU_ID + 1);
>
> and now we have this
>
>         DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
>         u8 vectors[KVM_MAX_VCPU_ID + 1];
>
> which is wrong as well.  The "right" fix would have been to change
> rtc_irq_eoi_tracking_reset(), but that looks all kinds of wrong on the surface.
>
> Non-x86 really mucks it up because generic code does:
>
>         #ifndef KVM_MAX_VCPU_ID
>         #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
>         #endif
>
> which means pretty much everything else can create more vCPUs than vCPU IDs.
>
> Maybe fix KVM's internal KVM_MAX_VCPU_ID so that it's inclusive, and handle the
> backwards compatibility mess in KVM_CAP_MAX_VCPU_ID?  Then have the max ID for x86
> be (4*KVM_MAX_VCPUS - 1).  E.g. something like:


Wouldn't it be simpler to just revert 76b4f357d0e7 and rename
KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS?

As far as I can see, every single line of code in KVM (except the 3
lines touched by 76b4f357d0e7) treats both KVM_MAX_VCPU_ID and
KVM_CAP_MAX_VCPU_ID as exclusive. Including the API documentation.

Jürgen Groß Sept. 13, 2021, 5:09 a.m. UTC | #4
On 11.09.21 17:26, Eduardo Habkost wrote:
> Wouldn't it be simpler to just revert 76b4f357d0e7 and rename
> KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS?
> 
> As far as I can see, every single line of code in KVM (except the 3
> lines touched by 76b4f357d0e7) treats both KVM_MAX_VCPU_ID and
> KVM_CAP_MAX_VCPU_ID as exclusive. Including the API documentation.

I agree.

Will send patches.


Juergen