diff mbox

[4/4] KVM: x86: allow hotplug of VCPU with APIC ID over 0xff

Message ID 20161202194401.10038-5-rkrcmar@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Radim Krčmář Dec. 2, 2016, 7:44 p.m. UTC
LAPIC after reset is in xAPIC mode, which poses a problem for hotplug of
VCPUs with high APIC ID, because reset VCPU is waiting for INIT/SIPI,
but there is no way to uniquely address it using xAPIC.

From many possible options, we chose the one that also works on real
hardware: accepting interrupts addressed to LAPIC's x2APIC ID even in
xAPIC mode.

KVM intentionally differs from real hardware, because real hardware
(Knights Landing) does just "x2apic_id & 0xff" to decide whether to
accept the interrupt in xAPIC mode and it can deliver one interrupt to
more than one physical destination, e.g. 0x123 to 0x123 and 0x23.

Add a capability to let userspace know that we do something now.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/lapic.c     | 13 +++++++++++--
 arch/x86/kvm/x86.c       |  1 +
 include/uapi/linux/kvm.h |  1 +
 3 files changed, 13 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Dec. 5, 2016, 2:37 p.m. UTC | #1
Am 02.12.2016 um 20:44 schrieb Radim Krčmář:
> LAPIC after reset is in xAPIC mode, which poses a problem for hotplug of
> VCPUs with high APIC ID, because reset VCPU is waiting for INIT/SIPI,
> but there is no way to uniquely address it using xAPIC.
>
> From many possible options, we chose the one that also works on real
> hardware: accepting interrupts addressed to LAPIC's x2APIC ID even in
> xAPIC mode.
>
> KVM intentionally differs from real hardware, because real hardware
> (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
> accept the interrupt in xAPIC mode and it can deliver one interrupt to
> more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
>
> Add a capability to let userspace know that we do something now.

Should we allow user space to turn it on/off for compatibility handling? 
Or do we just not care? (or how will this capability be used later on?)
Radim Krčmář Dec. 5, 2016, 4:02 p.m. UTC | #2
2016-12-05 15:37+0100, David Hildenbrand:
> Am 02.12.2016 um 20:44 schrieb Radim Krčmář:
>> LAPIC after reset is in xAPIC mode, which poses a problem for hotplug of
>> VCPUs with high APIC ID, because reset VCPU is waiting for INIT/SIPI,
>> but there is no way to uniquely address it using xAPIC.
>> 
>> From many possible options, we chose the one that also works on real
>> hardware: accepting interrupts addressed to LAPIC's x2APIC ID even in
>> xAPIC mode.
>> 
>> KVM intentionally differs from real hardware, because real hardware
>> (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
>> accept the interrupt in xAPIC mode and it can deliver one interrupt to
>> more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
>> 
>> Add a capability to let userspace know that we do something now.
> 
> Should we allow user space to turn it on/off for compatibility handling? Or
> do we just not care?

There should be no guest that relies on the previous behavior, so I'd
forgo the toggle, because it would be extra conditions in the code.
I'd add it as a flag to KVM_CAP_X2APIC_API if you have reasons to let
userspace choose.

>                      (or how will this capability be used later on?)

New userspace should check this capability and disable hotplug of VCPUs
with id over 255 if KVM doesn't support it.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Hildenbrand Dec. 5, 2016, 6 p.m. UTC | #3
Am 05.12.2016 um 17:02 schrieb Radim Krčmář:
> 2016-12-05 15:37+0100, David Hildenbrand:
>> Am 02.12.2016 um 20:44 schrieb Radim Krčmář:
>>> LAPIC after reset is in xAPIC mode, which poses a problem for hotplug of
>>> VCPUs with high APIC ID, because reset VCPU is waiting for INIT/SIPI,
>>> but there is no way to uniquely address it using xAPIC.
>>>
>>> From many possible options, we chose the one that also works on real
>>> hardware: accepting interrupts addressed to LAPIC's x2APIC ID even in
>>> xAPIC mode.
>>>
>>> KVM intentionally differs from real hardware, because real hardware
>>> (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
>>> accept the interrupt in xAPIC mode and it can deliver one interrupt to
>>> more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
>>>
>>> Add a capability to let userspace know that we do something now.
>>
>> Should we allow user space to turn it on/off for compatibility handling? Or
>> do we just not care?
>
> There should be no guest that relies on the previous behavior, so I'd
> forgo the toggle, because it would be extra conditions in the code.
> I'd add it as a flag to KVM_CAP_X2APIC_API if you have reasons to let
> userspace choose.

Okay I see. So if existing user space/guests don't break, there is no 
reason to make it configurable. I was just not sure if user space might 
want to decide whether to act "the old way".

>
>>                      (or how will this capability be used later on?)
>
> New userspace should check this capability and disable hotplug of VCPUs
> with id over 255 if KVM doesn't support it.
>

Wonder if this is actually a bugfix for allowing KVM_MAX_VCPU_ID to
be > 255. Currently it is somewhat like

"yes, I support VCPU ids with > 255, but no, you can't really hotplug
such CPUs".

(fix for older kernels would then be limiting the VCPU ID to 255 and
not introducing a new capability).

But I am no expert on that topic, so feel free to ignore.

The general idea of this patch makes sense to me (x2apic hack)!
Radim Krčmář Dec. 5, 2016, 8:57 p.m. UTC | #4
2016-12-05 19:00+0100, David Hildenbrand:
> Am 05.12.2016 um 17:02 schrieb Radim Krčmář:
>> 2016-12-05 15:37+0100, David Hildenbrand:
>> > Am 02.12.2016 um 20:44 schrieb Radim Krčmář:
>> > > LAPIC after reset is in xAPIC mode, which poses a problem for hotplug of
>> > > VCPUs with high APIC ID, because reset VCPU is waiting for INIT/SIPI,
>> > > but there is no way to uniquely address it using xAPIC.
>> > > 
>> > > From many possible options, we chose the one that also works on real
>> > > hardware: accepting interrupts addressed to LAPIC's x2APIC ID even in
>> > > xAPIC mode.
>> > > 
>> > > KVM intentionally differs from real hardware, because real hardware
>> > > (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
>> > > accept the interrupt in xAPIC mode and it can deliver one interrupt to
>> > > more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
>> > > 
>> > > Add a capability to let userspace know that we do something now.
>> > 
>> > Should we allow user space to turn it on/off for compatibility handling? Or
>> > do we just not care?
>> 
>> There should be no guest that relies on the previous behavior, so I'd
>> forgo the toggle, because it would be extra conditions in the code.
>> I'd add it as a flag to KVM_CAP_X2APIC_API if you have reasons to let
>> userspace choose.
> 
> Okay I see. So if existing user space/guests don't break, there is no reason
> to make it configurable. I was just not sure if user space might want to
> decide whether to act "the old way".

I also don't see a reason for userspace to want it disabled -- it just
shouldn't matter even if userspace implements another solution (e.g. it
hotplugs VCPUs in x2APIC mode) or KVM ends up with a better solution.
Any change can break some guest, but I couldn't with anything reasonable
that would be broken.

>> >                      (or how will this capability be used later on?)
>> 
>> New userspace should check this capability and disable hotplug of VCPUs
>> with id over 255 if KVM doesn't support it.
>> 
> 
> Wonder if this is actually a bugfix for allowing KVM_MAX_VCPU_ID to
> be > 255. Currently it is somewhat like

Good point, it is, for guests that want hotplug.  I'll add Fixes: line;
thanks!

> "yes, I support VCPU ids with > 255, but no, you can't really hotplug
> such CPUs".

My bad, offline/online in Linux worked fine so I didn't think enough
about hotplug.

> (fix for older kernels would then be limiting the VCPU ID to 255 and
> not introducing a new capability).
> 
> But I am no expert on that topic, so feel free to ignore.

I think the agreement is to embrace compatibility, so we pile new
mistakes to hide known ones.
(Rewriting the past requires far more power than accepting it:
 If we didn't force unfixed kernels out of existence, then userspace
 couldn't tell if hotplug up to high VCPU ID limit is supported.)

> The general idea of this patch makes sense to me (x2apic hack)!

The situation would be a bit better if xAPIC ID was read-only (we'd
behave more like real-hardware then), but no major OS changes the ID,
which makes it a secondary concern with weird corner-cases.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Hildenbrand Dec. 6, 2016, 9:37 a.m. UTC | #5
> I think the agreement is to embrace compatibility, so we pile new
> mistakes to hide known ones.
> (Rewriting the past requires far more power than accepting it:
>  If we didn't force unfixed kernels out of existence, then userspace
>  couldn't tell if hotplug up to high VCPU ID limit is supported.)

I agree, the question is how old the bug is (you should know better than 
me :) ) and if introducing a capability is strictly necessary. Do we 
have to do the check in QEMU or can we simply fix implementations out 
there silently.

(especially as hotplugging cpuid > 255 doesn't sound like setups wildly 
used already today - and it doesn't work ;) ). But as I said, I don't 
know the history, so you decide if this check in QEMU is necessary.

Fix all QEMUs (introduce capability check) vs fix all relevant kernels 
(limiting VCPU id to 255).
Radim Krčmář Dec. 6, 2016, 12:52 p.m. UTC | #6
2016-12-06 10:37+0100, David Hildenbrand:
>> I think the agreement is to embrace compatibility, so we pile new
>> mistakes to hide known ones.
>> (Rewriting the past requires far more power than accepting it:
>>  If we didn't force unfixed kernels out of existence, then userspace
>>  couldn't tell if hotplug up to high VCPU ID limit is supported.)
> 
> I agree, the question is how old the bug is (you should know better than me
> :) )

Just half a year old, since v4.7.

>      and if introducing a capability is strictly necessary. Do we have to do
> the check in QEMU or can we simply fix implementations out there silently.

This fix is too big for stable and I don't think that patches outside of
stable get backported much.

> (especially as hotplugging cpuid > 255 doesn't sound like setups wildly used
> already today - and it doesn't work ;) ).

Yes, it seems that no-one using high APIC ID noticed/cared.

>                                           But as I said, I don't know the
> history, so you decide if this check in QEMU is necessary.

QEMU can decide not to check (I actually expect it won't :]).
I think the option to check is worth two lines of code in KVM, though.

> Fix all QEMUs (introduce capability check) vs fix all relevant kernels
> (limiting VCPU id to 255).

APID ID over 255 works without hotplug and has few users, so lowering
the limit would regress cases that are more important, IMO.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Dec. 7, 2016, 12:03 p.m. UTC | #7
On 02/12/2016 20:44, Radim Krčmář wrote:
> LAPIC after reset is in xAPIC mode, which poses a problem for hotplug of
> VCPUs with high APIC ID, because reset VCPU is waiting for INIT/SIPI,
> but there is no way to uniquely address it using xAPIC.
> 
> From many possible options, we chose the one that also works on real
> hardware: accepting interrupts addressed to LAPIC's x2APIC ID even in
> xAPIC mode.
> 
> KVM intentionally differs from real hardware, because real hardware
> (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
> accept the interrupt in xAPIC mode and it can deliver one interrupt to
> more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
> 
> Add a capability to let userspace know that we do something now.

I wouldn't even bother with the capability.  It's a bit borderline for
stable, but I think it's okay.  We can always Cc and let the maintainer
reject it.

Paolo

> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/lapic.c     | 13 +++++++++++--
>  arch/x86/kvm/x86.c       |  1 +
>  include/uapi/linux/kvm.h |  1 +
>  3 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 42edf1ea2909..b85985985ac8 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -199,10 +199,15 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		xapic_id = kvm_xapic_id(apic);
>  		x2apic_id = kvm_x2apic_id(apic);
>  
> -		if (apic_x2apic_mode(apic) &&
> +		/* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
> +		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
>  				x2apic_id <= new->max_apic_id)
>  			new->phys_map[x2apic_id] = apic;
> -		else if (!apic_x2apic_mode(apic))
> +		/*
> +		 * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
> +		 * prevent them from masking VCPUs with APIC ID <= 0xff.
> +		 */
> +		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
>  			new->phys_map[xapic_id] = apic;
>  
>  		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> @@ -612,6 +617,10 @@ static bool kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 mda)
>  	if (apic_x2apic_mode(apic))
>  		return mda == kvm_x2apic_id(apic);
>  
> +	/* Hotplug hack: LAPIC in xAPIC mode also accepts x2APIC. */
> +	if (kvm_x2apic_id(apic) > 0xff && mda == kvm_x2apic_id(apic))
> +		return true;
> +
>  	return mda == kvm_xapic_id(apic);
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7770d77c828d..945e8eeb4eb1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2618,6 +2618,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_DISABLE_QUIRKS:
>  	case KVM_CAP_SET_BOOT_CPU_ID:
>   	case KVM_CAP_SPLIT_IRQCHIP:
> +	case KVM_CAP_X86_ALWAYS_ACCEPT_X2APIC_DEST:
>  #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
>  	case KVM_CAP_ASSIGN_DEV_IRQ:
>  	case KVM_CAP_PCI_2_3:
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index e9f5ceffd741..f25efa375255 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -871,6 +871,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_S390_USER_INSTR0 130
>  #define KVM_CAP_MSI_DEVID 131
>  #define KVM_CAP_PPC_HTM 132
> +#define KVM_CAP_X86_ALWAYS_ACCEPT_X2APIC_DEST 133
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Dec. 7, 2016, 3:47 p.m. UTC | #8
2016-12-07 13:03+0100, Paolo Bonzini:
> On 02/12/2016 20:44, Radim Krčmář wrote:
>> LAPIC after reset is in xAPIC mode, which poses a problem for hotplug of
>> VCPUs with high APIC ID, because reset VCPU is waiting for INIT/SIPI,
>> but there is no way to uniquely address it using xAPIC.
>> 
>> From many possible options, we chose the one that also works on real
>> hardware: accepting interrupts addressed to LAPIC's x2APIC ID even in
>> xAPIC mode.
>> 
>> KVM intentionally differs from real hardware, because real hardware
>> (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
>> accept the interrupt in xAPIC mode and it can deliver one interrupt to
>> more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
>> 
>> Add a capability to let userspace know that we do something now.
> 
> I wouldn't even bother with the capability.  It's a bit borderline for
> stable, but I think it's okay.  We can always Cc and let the maintainer
> reject it.

Ok, David also mentioned that I should tone down compatibility ...
Thanks to both, I'll prepare v2.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 42edf1ea2909..b85985985ac8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -199,10 +199,15 @@  static void recalculate_apic_map(struct kvm *kvm)
 		xapic_id = kvm_xapic_id(apic);
 		x2apic_id = kvm_x2apic_id(apic);
 
-		if (apic_x2apic_mode(apic) &&
+		/* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
+		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
 				x2apic_id <= new->max_apic_id)
 			new->phys_map[x2apic_id] = apic;
-		else if (!apic_x2apic_mode(apic))
+		/*
+		 * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
+		 * prevent them from masking VCPUs with APIC ID <= 0xff.
+		 */
+		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
 			new->phys_map[xapic_id] = apic;
 
 		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
@@ -612,6 +617,10 @@  static bool kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 mda)
 	if (apic_x2apic_mode(apic))
 		return mda == kvm_x2apic_id(apic);
 
+	/* Hotplug hack: LAPIC in xAPIC mode also accepts x2APIC. */
+	if (kvm_x2apic_id(apic) > 0xff && mda == kvm_x2apic_id(apic))
+		return true;
+
 	return mda == kvm_xapic_id(apic);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7770d77c828d..945e8eeb4eb1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2618,6 +2618,7 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_DISABLE_QUIRKS:
 	case KVM_CAP_SET_BOOT_CPU_ID:
  	case KVM_CAP_SPLIT_IRQCHIP:
+	case KVM_CAP_X86_ALWAYS_ACCEPT_X2APIC_DEST:
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 	case KVM_CAP_ASSIGN_DEV_IRQ:
 	case KVM_CAP_PCI_2_3:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e9f5ceffd741..f25efa375255 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -871,6 +871,7 @@  struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_USER_INSTR0 130
 #define KVM_CAP_MSI_DEVID 131
 #define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_X86_ALWAYS_ACCEPT_X2APIC_DEST 133
 
 #ifdef KVM_CAP_IRQ_ROUTING