
[5/5] KVM: Convert the kvm->vcpus array to an xarray

Message ID 20211105192101.3862492-6-maz@kernel.org (mailing list archive)
State Superseded
Series KVM: Turn the vcpu array into an xarray

Commit Message

Marc Zyngier Nov. 5, 2021, 7:21 p.m. UTC
At least on arm64 and x86, the vcpus array is pretty huge (512 entries),
and is mostly empty in most cases (running 512 vcpu VMs is not that
common). This means that we end up with a 4kB block of unused memory
in the middle of the kvm structure.

Instead of wasting this memory, let's use an xarray,
which gives us almost the same flexibility as a normal array, but
with reduced memory usage for smaller VMs.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 include/linux/kvm_host.h |  5 +++--
 virt/kvm/kvm_main.c      | 15 +++++++++------
 2 files changed, 12 insertions(+), 8 deletions(-)
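
For reference, the xarray calls the conversion relies on behave roughly as
sketched below (illustrative only; the identifiers are made up, and
include/linux/xarray.h is the authoritative documentation):

#include <linux/xarray.h>
#include <linux/gfp.h>

static DEFINE_XARRAY(example_xa);

static int example_store(unsigned long idx, void *obj)
{
	/* Fails with -EBUSY if an entry is already present at idx. */
	return xa_insert(&example_xa, idx, obj, GFP_KERNEL_ACCOUNT);
}

static void *example_lookup(unsigned long idx)
{
	/* Returns NULL when nothing is stored at idx. */
	return xa_load(&example_xa, idx);
}

static void example_remove(unsigned long idx)
{
	/* Removes the entry at idx (the previous value is ignored here). */
	xa_erase(&example_xa, idx);
}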

Comments

Sean Christopherson Nov. 5, 2021, 8:21 p.m. UTC | #1
On Fri, Nov 05, 2021, Marc Zyngier wrote:
> At least on arm64 and x86, the vcpus array is pretty huge (512 entries),
> and is mostly empty in most cases (running 512 vcpu VMs is not that
> common). This means that we end up with a 4kB block of unused memory
> in the middle of the kvm structure.

Heh, x86 is now up to 1024 entries.
 
> Instead of wasting this memory, let's use an xarray,
> which gives us almost the same flexibility as a normal array, but
> with reduced memory usage for smaller VMs.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
> @@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
>  
>  	/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
>  	smp_rmb();
> -	return kvm->vcpus[i];
> +	return xa_load(&kvm->vcpu_array, i);
>  }

It'd be nice for this series to convert kvm_for_each_vcpu() to use xa_for_each()
as well.  Maybe as a patch on top so that potential explosions from that are
isolated from the initial conversion?

Or maybe even use xa_for_each_range() to cap at online_vcpus?  That's technically
a functional change, but IMO it's easier to reason about iterating over a snapshot
of vCPUs as opposed to being able to iterate over vCPUs as they're being added.  In
practice I doubt it matters.

#define kvm_for_each_vcpu(idx, vcpup, kvm) \
	xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, atomic_read(&kvm->online_vcpus))
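
For comparison, the array-based iterator being replaced looks roughly like
this (quoted from memory, so treat it as a sketch rather than the exact
kvm_host.h text):

#define kvm_for_each_vcpu(idx, vcpup, kvm) \
	for (idx = 0; \
	     idx < atomic_read(&kvm->online_vcpus) && \
	     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
	     idx++)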
Marc Zyngier Nov. 6, 2021, 11:48 a.m. UTC | #2
On Fri, 05 Nov 2021 20:21:36 +0000,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Fri, Nov 05, 2021, Marc Zyngier wrote:
> > At least on arm64 and x86, the vcpus array is pretty huge (512 entries),
> > and is mostly empty in most cases (running 512 vcpu VMs is not that
> > common). This means that we end up with a 4kB block of unused memory
> > in the middle of the kvm structure.
> 
> Heh, x86 is now up to 1024 entries.

Humph. I don't want to know whether people are actually using that in
practice. The only time I create VMs with 512 vcpus is to check
whether it still works...

>  
> > Instead of wasting this memory, let's use an xarray,
> > which gives us almost the same flexibility as a normal array, but
> > with reduced memory usage for smaller VMs.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> > @@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> >  
> >  	/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
> >  	smp_rmb();
> > -	return kvm->vcpus[i];
> > +	return xa_load(&kvm->vcpu_array, i);
> >  }
> 
> It'd be nice for this series to convert kvm_for_each_vcpu() to use
> xa_for_each() as well.  Maybe as a patch on top so that potential
> explosions from that are isolated from the initial conversion?
> 
> Or maybe even use xa_for_each_range() to cap at online_vcpus?
> That's technically a functional change, but IMO it's easier to
> reason about iterating over a snapshot of vCPUs as opposed to being
> able to iterate over vCPUs as they're being added.  In practice I
> doubt it matters.
> 
> #define kvm_for_each_vcpu(idx, vcpup, kvm) \
> 	xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, atomic_read(&kvm->online_vcpus))
>

I think that's already the behaviour of this iterator (we stop at the
first empty slot, capped to online_vcpus). The only change in behaviour
is that vcpup currently holds a pointer to the last vcpu if no empty
slot has been encountered, while xa_for_each{,_range}() would leave the
pointer set to NULL on exit.

I doubt anyone relies on that, but it is probably worth eyeballing
some of the use cases...
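
As a purely hypothetical example of code that would notice the difference
(do_something_with() is made up for illustration):

static void example_use_after_loop(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu = NULL;
	int i;

	kvm_for_each_vcpu(i, vcpu, kvm)
		; /* walk all online vcpus */

	/*
	 * With the current array-based iterator, vcpu still points at
	 * the last online vcpu here; with xa_for_each{,_range}() it
	 * would always be NULL on loop exit.
	 */
	if (vcpu)
		do_something_with(vcpu);
}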

Thanks,

	M.
Marc Zyngier Nov. 8, 2021, 8:23 a.m. UTC | #3
On 2021-11-06 11:48, Marc Zyngier wrote:
> On Fri, 05 Nov 2021 20:21:36 +0000,
> Sean Christopherson <seanjc@google.com> wrote:
>> 
>> On Fri, Nov 05, 2021, Marc Zyngier wrote:
>> > At least on arm64 and x86, the vcpus array is pretty huge (512 entries),
>> > and is mostly empty in most cases (running 512 vcpu VMs is not that
>> > common). This means that we end up with a 4kB block of unused memory
>> > in the middle of the kvm structure.
>> 
>> Heh, x86 is now up to 1024 entries.
> 
> Humph. I don't want to know whether people are actually using that in
> practice. The only time I create VMs with 512 vcpus is to check
> whether it still works...
> 
>> 
>> > Instead of wasting this memory, let's use an xarray,
>> > which gives us almost the same flexibility as a normal array, but
>> > with reduced memory usage for smaller VMs.
>> >
>> > Signed-off-by: Marc Zyngier <maz@kernel.org>
>> > ---
>> > @@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
>> >
>> >  	/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
>> >  	smp_rmb();
>> > -	return kvm->vcpus[i];
>> > +	return xa_load(&kvm->vcpu_array, i);
>> >  }
>> 
>> It'd be nice for this series to convert kvm_for_each_vcpu() to use
>> xa_for_each() as well.  Maybe as a patch on top so that potential
>> explosions from that are isolated from the initial conversion?
>> 
>> Or maybe even use xa_for_each_range() to cap at online_vcpus?
>> That's technically a functional change, but IMO it's easier to
>> reason about iterating over a snapshot of vCPUs as opposed to being
>> able to iterate over vCPUs as they're being added.  In practice I
>> doubt it matters.
>> 
>> #define kvm_for_each_vcpu(idx, vcpup, kvm) \
>> 	xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, 
>> atomic_read(&kvm->online_vcpus))
>> 
> 
> I think that's already the behaviour of this iterator (we stop at the
> first empty slot, capped to online_vcpus). The only change in behaviour
> is that vcpup currently holds a pointer to the last vcpu if no empty
> slot has been encountered, while xa_for_each{,_range}() would leave the
> pointer set to NULL on exit.
> 
> I doubt anyone relies on that, but it is probably worth eyeballing
> some of the use cases...

This turned out to be an interesting exercise, as we always use an
int for the index, and the xarray iterators insist on an unsigned
long (and even on a pointer to it). On the other hand, I couldn't
spot any case where we'd rely on the last value of the vcpu pointer.

I'll repost the series once we have a solution for patch #4, and
we can then decide whether we want the iterator churn.
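
To make the index-type churn concrete, a converted caller would have to
look something like the sketch below (count_online_vcpus() is a made-up
example, not part of the series):

static int count_online_vcpus(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu;
	unsigned long idx;	/* must be unsigned long: the iterator
				 * passes &idx to xa_find()/xa_find_after() */
	int n = 0;

	xa_for_each_range(&kvm->vcpu_array, idx, vcpu, 0,
			  atomic_read(&kvm->online_vcpus))
		n++;

	return n;
}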

Patch

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 36967291b8c6..3933d825e28b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -29,6 +29,7 @@ 
 #include <linux/refcount.h>
 #include <linux/nospec.h>
 #include <linux/notifier.h>
+#include <linux/xarray.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -552,7 +553,7 @@  struct kvm {
 	struct mutex slots_arch_lock;
 	struct mm_struct *mm; /* userspace tied to this vm */
 	struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
-	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+	struct xarray vcpu_array;
 
 	/* Used to wait for completion of MMU notifiers.  */
 	spinlock_t mn_invalidate_lock;
@@ -693,7 +694,7 @@  static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 
 	/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
 	smp_rmb();
-	return kvm->vcpus[i];
+	return xa_load(&kvm->vcpu_array, i);
 }
 
 #define kvm_for_each_vcpu(idx, vcpup, kvm) \
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d83553eeea21..4c18d7911fa5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -461,7 +461,7 @@  void kvm_destroy_vcpus(struct kvm *kvm)
 
 	mutex_lock(&kvm->lock);
 	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-		kvm->vcpus[i] = NULL;
+		xa_erase(&kvm->vcpu_array, i);
 
 	atomic_set(&kvm->online_vcpus, 0);
 	mutex_unlock(&kvm->lock);
@@ -1066,6 +1066,7 @@  static struct kvm *kvm_create_vm(unsigned long type)
 	mutex_init(&kvm->slots_arch_lock);
 	spin_lock_init(&kvm->mn_invalidate_lock);
 	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
+	xa_init(&kvm->vcpu_array);
 
 	INIT_LIST_HEAD(&kvm->devices);
 
@@ -3661,7 +3662,10 @@  static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	}
 
 	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
-	BUG_ON(kvm->vcpus[vcpu->vcpu_idx]);
+	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
+	BUG_ON(r == -EBUSY);
+	if (r)
+		goto unlock_vcpu_destroy;
 
 	/* Fill the stats id string for the vcpu */
 	snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d",
@@ -3671,15 +3675,14 @@  static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	kvm_get_kvm(kvm);
 	r = create_vcpu_fd(vcpu);
 	if (r < 0) {
+		xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
 		kvm_put_kvm_no_destroy(kvm);
 		goto unlock_vcpu_destroy;
 	}
 
-	kvm->vcpus[vcpu->vcpu_idx] = vcpu;
-
 	/*
-	 * Pairs with smp_rmb() in kvm_get_vcpu.  Write kvm->vcpus
-	 * before kvm->online_vcpu's incremented value.
+	 * Pairs with smp_rmb() in kvm_get_vcpu.  Store the vcpu
+	 * pointer before kvm->online_vcpu's incremented value.
 	 */
 	smp_wmb();
 	atomic_inc(&kvm->online_vcpus);