diff mbox series

[1/5] KVM: arm64: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS

Message ID 20211111162746.100598-2-vkuznets@redhat.com (mailing list archive)
State Superseded
Headers show
Series KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS and re-purpose it on x86 | expand

Commit Message

Vitaly Kuznetsov Nov. 11, 2021, 4:27 p.m. UTC
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/arm64/kvm/arm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Marc Zyngier Nov. 11, 2021, 7:36 p.m. UTC | #1
Hi Vitaly,

On 2021-11-11 16:27, Vitaly Kuznetsov wrote:
> It doesn't make sense to return the recommended maximum number of
> vCPUs which exceeds the maximum possible number of vCPUs.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/arm64/kvm/arm.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 7838e9fb693e..391dc7a921d5 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -223,7 +223,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> long ext)
>  		r = 1;
>  		break;
>  	case KVM_CAP_NR_VCPUS:
> -		r = num_online_cpus();
> +		if (kvm)
> +			r = min_t(unsigned int, num_online_cpus(),
> +				  kvm->arch.max_vcpus);
> +		else
> +			r = min_t(unsigned int, num_online_cpus(),
> +				  kvm_arm_default_max_vcpus());
>  		break;
>  	case KVM_CAP_MAX_VCPUS:
>  	case KVM_CAP_MAX_VCPU_ID:

This looks odd. This means that depending on the phase userspace is
in while initialising the VM, KVM_CAP_NR_VCPUS can return one thing
or the other.

For example, I create a VM on a 32 CPU system, NR_VCPUS says 32.
I create a GICv2 interrupt controller, it now says 8.

That's a change in behaviour that is visible by userspace, which
I'm keen on avoiding. I'd rather have the kvm and !kvm cases
return the same thing.

Thanks,

         M.
Vitaly Kuznetsov Nov. 12, 2021, 9:51 a.m. UTC | #2
Marc Zyngier <maz@kernel.org> writes:

> Hi Vitaly,
>
> On 2021-11-11 16:27, Vitaly Kuznetsov wrote:
>> It doesn't make sense to return the recommended maximum number of
>> vCPUs which exceeds the maximum possible number of vCPUs.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/arm64/kvm/arm.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 7838e9fb693e..391dc7a921d5 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -223,7 +223,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
>> long ext)
>>  		r = 1;
>>  		break;
>>  	case KVM_CAP_NR_VCPUS:
>> -		r = num_online_cpus();
>> +		if (kvm)
>> +			r = min_t(unsigned int, num_online_cpus(),
>> +				  kvm->arch.max_vcpus);
>> +		else
>> +			r = min_t(unsigned int, num_online_cpus(),
>> +				  kvm_arm_default_max_vcpus());
>>  		break;
>>  	case KVM_CAP_MAX_VCPUS:
>>  	case KVM_CAP_MAX_VCPU_ID:
>
> This looks odd. This means that depending on the phase userspace is
> in while initialising the VM, KVM_CAP_NR_VCPUS can return one thing
> or the other.
>
> For example, I create a VM on a 32 CPU system, NR_VCPUS says 32.
> I create a GICv2 interrupt controller, it now says 8.
>
> That's a change in behaviour that is visible by userspace

Yes, I realize this is a userspace visible change. The reason I suggest
it is that logically, it seems very odd that the maximum recommended
number of vCPUs (KVM_CAP_NR_VCPUS) can be higher, than the maximum
supported number of vCPUs (KVM_CAP_MAX_VCPUS). All userspaces which use
this information somehow should already contain some workaround for this
case. (maybe it's a rare one and nobody hit it yet or maybe there are no
userspaces using KVM_CAP_NR_VCPUS for anything besides complaining --
like QEMU).

I'd like KVM to be consistent across architectures and have the same
(similar) meaning for KVM_CAP_NR_VCPUS.

> which I'm keen on avoiding. I'd rather have the kvm and !kvm cases
> return the same thing.

Forgive me my (ARM?) ignorance but what would it be then? If we go for
min(num_online_cpus(), kvm_arm_default_max_vcpus()) in both cases, cat
this can still go above KVM_CAP_MAX_VCPUS after vGIC is created?

Thanks for the feedback!
Andrew Jones Nov. 12, 2021, 10:38 a.m. UTC | #3
On Fri, Nov 12, 2021 at 10:51:10AM +0100, Vitaly Kuznetsov wrote:
> Marc Zyngier <maz@kernel.org> writes:
> 
> > Hi Vitaly,
> >
> > On 2021-11-11 16:27, Vitaly Kuznetsov wrote:
> >> It doesn't make sense to return the recommended maximum number of
> >> vCPUs which exceeds the maximum possible number of vCPUs.
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/arm64/kvm/arm.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >> index 7838e9fb693e..391dc7a921d5 100644
> >> --- a/arch/arm64/kvm/arm.c
> >> +++ b/arch/arm64/kvm/arm.c
> >> @@ -223,7 +223,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> >> long ext)
> >>  		r = 1;
> >>  		break;
> >>  	case KVM_CAP_NR_VCPUS:
> >> -		r = num_online_cpus();
> >> +		if (kvm)
> >> +			r = min_t(unsigned int, num_online_cpus(),
> >> +				  kvm->arch.max_vcpus);
> >> +		else
> >> +			r = min_t(unsigned int, num_online_cpus(),
> >> +				  kvm_arm_default_max_vcpus());
> >>  		break;
> >>  	case KVM_CAP_MAX_VCPUS:
> >>  	case KVM_CAP_MAX_VCPU_ID:
> >
> > This looks odd. This means that depending on the phase userspace is
> > in while initialising the VM, KVM_CAP_NR_VCPUS can return one thing
> > or the other.
> >
> > For example, I create a VM on a 32 CPU system, NR_VCPUS says 32.
> > I create a GICv2 interrupt controller, it now says 8.
> >
> > That's a change in behaviour that is visible by userspace
> 
> Yes, I realize this is a userspace visible change. The reason I suggest
> it is that logically, it seems very odd that the maximum recommended
> number of vCPUs (KVM_CAP_NR_VCPUS) can be higher, than the maximum
> supported number of vCPUs (KVM_CAP_MAX_VCPUS). All userspaces which use
> this information somehow should already contain some workaround for this
> case. (maybe it's a rare one and nobody hit it yet or maybe there are no
> userspaces using KVM_CAP_NR_VCPUS for anything besides complaining --
> like QEMU).
> 
> I'd like KVM to be consistent across architectures and have the same
> (similar) meaning for KVM_CAP_NR_VCPUS.

KVM_CAP_NR_VCPUS seems pretty useless if we just want to tell userspace
the same thing it can get with _SC_NPROCESSORS_ONLN. In fact, if userspace
knows something we don't about the future onlining of some pcpus, then
maybe userspace would prefer to check _SC_NPROCESSORS_CONF.

> 
> > which I'm keen on avoiding. I'd rather have the kvm and !kvm cases
> > return the same thing.
> 
> Forgive me my (ARM?) ignorance but what would it be then? If we go for
> min(num_online_cpus(), kvm_arm_default_max_vcpus()) in both cases, cat
> this can still go above KVM_CAP_MAX_VCPUS after vGIC is created?

So the GIC version case looks like the type of thing that could make
KVM_CAP_NR_VCPUS useful, i.e. being able to tell userspace a maximum
number of vcpus supported for a given configuration. However, even
in that case the concept of "recommended" number doesn't make sense,
because, for the GICv2 example, a VM cannot configure more than 8 VCPUs,
so it's a real limit, not a recommendation. Maybe KVM_CAP_NR_VCPUS should
just be left alone, but deprecated, and, if there's need, a new CAP could
be created for a config-vcpu-max.

Thanks,
drew
Paolo Bonzini Nov. 12, 2021, 10:51 a.m. UTC | #4
On 11/12/21 11:38, Andrew Jones wrote:
>>
>> I'd like KVM to be consistent across architectures and have the same
>> (similar) meaning for KVM_CAP_NR_VCPUS.
> KVM_CAP_NR_VCPUS seems pretty useless if we just want to tell userspace
> the same thing it can get with _SC_NPROCESSORS_ONLN. In fact, if userspace
> knows something we don't about the future onlining of some pcpus, then
> maybe userspace would prefer to check _SC_NPROCESSORS_CONF.

It's the same for most architectures, but for example PPC has to take 
into account the handling of threads, which can exist while the VMs run 
but not in the host.  So KVM_CAP_NR_VCPUS on PPC is _SC_NPROCESSORS_CONF 
times the number of threads per core.

If you don't count PPC (not sure about s390), it _is_ pretty useless indeed.

Paolo
Marc Zyngier Nov. 12, 2021, 2:02 p.m. UTC | #5
On Fri, 12 Nov 2021 09:51:10 +0000,
Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> > Hi Vitaly,
> >
> > On 2021-11-11 16:27, Vitaly Kuznetsov wrote:
> >> It doesn't make sense to return the recommended maximum number of
> >> vCPUs which exceeds the maximum possible number of vCPUs.
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/arm64/kvm/arm.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >> index 7838e9fb693e..391dc7a921d5 100644
> >> --- a/arch/arm64/kvm/arm.c
> >> +++ b/arch/arm64/kvm/arm.c
> >> @@ -223,7 +223,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> >> long ext)
> >>  		r = 1;
> >>  		break;
> >>  	case KVM_CAP_NR_VCPUS:
> >> -		r = num_online_cpus();
> >> +		if (kvm)
> >> +			r = min_t(unsigned int, num_online_cpus(),
> >> +				  kvm->arch.max_vcpus);
> >> +		else
> >> +			r = min_t(unsigned int, num_online_cpus(),
> >> +				  kvm_arm_default_max_vcpus());
> >>  		break;
> >>  	case KVM_CAP_MAX_VCPUS:
> >>  	case KVM_CAP_MAX_VCPU_ID:
> >
> > This looks odd. This means that depending on the phase userspace is
> > in while initialising the VM, KVM_CAP_NR_VCPUS can return one thing
> > or the other.
> >
> > For example, I create a VM on a 32 CPU system, NR_VCPUS says 32.
> > I create a GICv2 interrupt controller, it now says 8.
> >
> > That's a change in behaviour that is visible by userspace
> 
> Yes, I realize this is a userspace visible change. The reason I suggest
> it is that logically, it seems very odd that the maximum recommended
> number of vCPUs (KVM_CAP_NR_VCPUS) can be higher, than the maximum
> supported number of vCPUs (KVM_CAP_MAX_VCPUS).

I'm all for this change.

> All userspaces which use
> this information somehow should already contain some workaround for this
> case. (maybe it's a rare one and nobody hit it yet or maybe there are no
> userspaces using KVM_CAP_NR_VCPUS for anything besides complaining --
> like QEMU).
> 
> I'd like KVM to be consistent across architectures and have the same
> (similar) meaning for KVM_CAP_NR_VCPUS.

Sure, but this is a pretty useless piece of information anyway. As
Andrew pointed out, the information is available somewhere else, and
all we need to do is to cap it to the number of supported vcpus, which
is effectively a KVM limitation.

Also, we are talking about representing the architecture to userspace.
No amount of massaging is going to make an arm64 box look like an x86.

> > which I'm keen on avoiding. I'd rather have the kvm and !kvm cases
> > return the same thing.
> 
> Forgive me my (ARM?) ignorance but what would it be then? If we go for
> min(num_online_cpus(), kvm_arm_default_max_vcpus()) in both cases, cat
> this can still go above KVM_CAP_MAX_VCPUS after vGIC is created?

"min(num_online_cpus(), kvm_arm_default_max_vcpus())" is probably the
right thing in all cases. Yes, KVM_CAP_NR_VCPUS will keep reporting
more than the VM can actually support. But that's why we have
KVM_CAP_MAX_VCPUS, which tells you now many vcpus you can create for a
given configuration.

This shows how useless KVM_CAP_NR_VCPUS is, and I wouldn't mind a
documentation patch stating this.

Thanks,

	M.
Paolo Bonzini Nov. 12, 2021, 2:10 p.m. UTC | #6
On 11/12/21 15:02, Marc Zyngier wrote:
>> I'd like KVM to be consistent across architectures and have the same
>> (similar) meaning for KVM_CAP_NR_VCPUS.
> Sure, but this is a pretty useless piece of information anyway. As
> Andrew pointed out, the information is available somewhere else, and
> all we need to do is to cap it to the number of supported vcpus, which
> is effectively a KVM limitation.
> 
> Also, we are talking about representing the architecture to userspace.
> No amount of massaging is going to make an arm64 box look like an x86.

Not sure what you mean?  The API is about providing a piece of 
information independent of the architecture, while catering for a ppc 
weirdness.  Yes it's mostly useless if you don't care about ppc, but 
it's not about making arm64 look like x86 or ppc; it's about not having 
to special case ppc in userspace.

If anything, if KVM_CAP_NR_VCPUS returns the same for kvm and !kvm, then 
*that* is making an arm64 box look like an x86.  On ARM the max vCPUs 
depends on VM's GIC configuration, so KVM_CAP_NR_VCPUS should take that 
into account.  Or KVM_CAP_NR_VCPUS should have been only for !kvm; but 
the ship for that has sailed.

Paolo

>>> which I'm keen on avoiding. I'd rather have the kvm and !kvm cases
>>> return the same thing.
>> Forgive me my (ARM?) ignorance but what would it be then? If we go for
>> min(num_online_cpus(), kvm_arm_default_max_vcpus()) in both cases, cat
>> this can still go above KVM_CAP_MAX_VCPUS after vGIC is created?
> "min(num_online_cpus(), kvm_arm_default_max_vcpus())" is probably the
> right thing in all cases. Yes, KVM_CAP_NR_VCPUS will keep reporting
> more than the VM can actually support. But that's why we have
> KVM_CAP_MAX_VCPUS, which tells you now many vcpus you can create for a
> given configuration.
Vitaly Kuznetsov Nov. 16, 2021, 1:23 p.m. UTC | #7
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/12/21 15:02, Marc Zyngier wrote:
>>> I'd like KVM to be consistent across architectures and have the same
>>> (similar) meaning for KVM_CAP_NR_VCPUS.
>> Sure, but this is a pretty useless piece of information anyway. As
>> Andrew pointed out, the information is available somewhere else, and
>> all we need to do is to cap it to the number of supported vcpus, which
>> is effectively a KVM limitation.
>> 
>> Also, we are talking about representing the architecture to userspace.
>> No amount of massaging is going to make an arm64 box look like an x86.
>
> Not sure what you mean?  The API is about providing a piece of 
> information independent of the architecture, while catering for a ppc 
> weirdness.  Yes it's mostly useless if you don't care about ppc, but 
> it's not about making arm64 look like x86 or ppc; it's about not having 
> to special case ppc in userspace.
>
> If anything, if KVM_CAP_NR_VCPUS returns the same for kvm and !kvm, then 
> *that* is making an arm64 box look like an x86.  On ARM the max vCPUs 
> depends on VM's GIC configuration, so KVM_CAP_NR_VCPUS should take that 
> into account.

(I'm about to send v2 as we have s390 sorted out.)

So what do we decide about ARM? 
- Current approach (kvm->arch.max_vcpus/kvm_arm_default_max_vcpus()
 depending on 'if (kvm)') - that would be my preference.
- Always kvm_arm_default_max_vcpus to make the output independent on 'if
 (kvm)'.
- keep the status quo (drop the patch).

Please advise)
Paolo Bonzini Nov. 16, 2021, 3:50 p.m. UTC | #8
On 11/16/21 14:23, Vitaly Kuznetsov wrote:
> (I'm about to send v2 as we have s390 sorted out.)
> 
> So what do we decide about ARM?
> - Current approach (kvm->arch.max_vcpus/kvm_arm_default_max_vcpus()
>   depending on 'if (kvm)') - that would be my preference.

That would be mine too.

Paolo

> - Always kvm_arm_default_max_vcpus to make the output independent on 'if
>   (kvm)'.
> - keep the status quo (drop the patch).
Marc Zyngier Nov. 16, 2021, 3:55 p.m. UTC | #9
On Tue, 16 Nov 2021 13:23:25 +0000,
Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 11/12/21 15:02, Marc Zyngier wrote:
> >>> I'd like KVM to be consistent across architectures and have the same
> >>> (similar) meaning for KVM_CAP_NR_VCPUS.
> >> Sure, but this is a pretty useless piece of information anyway. As
> >> Andrew pointed out, the information is available somewhere else, and
> >> all we need to do is to cap it to the number of supported vcpus, which
> >> is effectively a KVM limitation.
> >> 
> >> Also, we are talking about representing the architecture to userspace.
> >> No amount of massaging is going to make an arm64 box look like an x86.
> >
> > Not sure what you mean?  The API is about providing a piece of 
> > information independent of the architecture, while catering for a ppc 
> > weirdness.  Yes it's mostly useless if you don't care about ppc, but 
> > it's not about making arm64 look like x86 or ppc; it's about not having 
> > to special case ppc in userspace.
> >
> > If anything, if KVM_CAP_NR_VCPUS returns the same for kvm and !kvm, then 
> > *that* is making an arm64 box look like an x86.  On ARM the max vCPUs 
> > depends on VM's GIC configuration, so KVM_CAP_NR_VCPUS should take that 
> > into account.
> 
> (I'm about to send v2 as we have s390 sorted out.)
> 
> So what do we decide about ARM? 

[...]

> - Always kvm_arm_default_max_vcpus to make the output independent on 'if
>  (kvm)'.

This. Between two useless numbers, I prefer the one that doesn't
introduce any userspace visible changes.

Thanks,

	M.
Paolo Bonzini Nov. 16, 2021, 3:58 p.m. UTC | #10
On 11/16/21 16:55, Marc Zyngier wrote:
>> - Always kvm_arm_default_max_vcpus to make the output independent on 'if
>>   (kvm)'.
> This. Between two useless numbers, I prefer the one that doesn't
> introduce any userspace visible changes.

Fair enough, I'm not going to override you---but please add a comment 
that says

	/*
	 * arm64 treats KVM_CAP_NR_CPUS different from all other
	 * architectures, as it does not bound it to num_online_cpus().
	 * It should not matter much because this is just an advisory
	 * value.
	 */

or something like that.

Paolo
diff mbox series

Patch

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7838e9fb693e..391dc7a921d5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -223,7 +223,12 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 	case KVM_CAP_NR_VCPUS:
-		r = num_online_cpus();
+		if (kvm)
+			r = min_t(unsigned int, num_online_cpus(),
+				  kvm->arch.max_vcpus);
+		else
+			r = min_t(unsigned int, num_online_cpus(),
+				  kvm_arm_default_max_vcpus());
 		break;
 	case KVM_CAP_MAX_VCPUS:
 	case KVM_CAP_MAX_VCPU_ID: