diff mbox

[14/21] KVM: x86: Software disabled APIC should still deliver NMIs

Message ID 02C5C22B-2809-438B-BD42-8C235F1E5CEE@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nadav Amit Nov. 26, 2014, 5:01 p.m. UTC
Paolo Bonzini <pbonzini@redhat.com> wrote:

> 
> 
> On 06/11/2014 17:45, Radim Kr?má? wrote:
>> 2014-11-06 10:34+0100, Paolo Bonzini:
>>> On 05/11/2014 21:45, Nadav Amit wrote:
>>>> If I understand the SDM correctly, in such scenario (all APICs are
>>>> software disabled) the mode is left as the default - flat mode (see
>> 
>> APIC doesn't have any global mode (it is just KVM's simplification), so
>> when a message lands on the system bus, it just compares MDA with LDR
>> and DFR ...
>> 
>>>> section 10.6.2.2 "Logical Destination Mode”): "All processors that
>>>> have their APIC software enabled (using the spurious vector
>>>> enable/disable bit) must have their DFRs (Destination Format
>>>> Registers) programmed identically. The default mode for DFR is flat
>>>> mode.”
>> 
>> I think the "default mode" points to reset state, which is flat DFR;
>> and it is mentioned only because of the following sentence
>>  If you are using cluster mode, DFRs must be programmed before the APIC
>>  is software enabled.
>> 
>>> That's not what either Bochs or QEMU do, though.  (Though in the case of
>>> Bochs I cannot find the place where reception of IPIs is prevented for
>>> software-disabled APICs, so I'm not sure how much to trust it in this case).
>>> 
>>> I'm not sure why software-disabled APICs could have different DFRs, if
>>> the APICs can receive NMI IPIs.  I'll ask Intel.
>> 
>> When changing the mode, we can't switch DFR synchronously, so it has to
>> happen and NMI may be needed (watchdog?) before APIC configuration.
>> Explicit statement might have been a hint to be _extra_ careful when
>> using logical destination for INIT, NMI, ... I wonder what they'll say.
>> 
>> Anyway, Paolo's patch seems to be in the right direction, I'd modify it
>> a bit though:
>> 
>> LDR=0 isn't addressable in any logical mode, so we can ignore APICs that
>> don't set it and decide the mode by the last nonzero one.
>> This works in a situation, where one part is configured for cluster and
>> the rest is still in reset state.
>> 
>> (It gets harder if we allow nonzero LDRs with different DFR ...
>> we'd need to split our logical map to handle it.)
>> 
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 758f838..6da303e1 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 		goto out;
>> 
>> 	new->ldr_bits = 8;
>> -	/* flat mode is default */
>> -	new->cid_shift = 8;
>> -	new->cid_mask = 0;
>> -	new->lid_mask = 0xff;
>> 	new->broadcast = APIC_BROADCAST;
>> 
>> 	kvm_for_each_vcpu(i, vcpu, kvm) {
>> @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>> 			new->lid_mask = 0xffff;
>> 			new->broadcast = X2APIC_BROADCAST;
>> -		} else if (kvm_apic_hw_enabled(apic)) {
>> +		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
>> 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>> 							APIC_DFR_CLUSTER) {
>> 				new->cid_shift = 4;
> 
> I merged this patch and Nadav’s.

Sorry for the late and long reply, but I got an issue with the new version
(and my previous version as well). Indeed, the SDM states that DFR should
be the same for enabled CPUs, and that the BIOS should get all CPUs in
either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
in xAPIC/x2APIC mode.

In my tests (which pass on bare-metal), I got a scenario in which some CPUs
are in xAPIC mode, the BSP changed (which is currently not handled correctly
by KVM) and the BSP has x2APIC enabled. All the core APICs are
software-enabled. The expected behaviour is that the CPUs with x2APIC
enabled would work in x2APIC mode.

I think such a transitory scenario is possible on real-systems as well,
perhaps during CPU hot-plug. It appears the previous version (before all of
our changes) handled it better. I presume the most efficient way is to start
determining the APIC logical mode from the BSP, and if it is disabled,
traverse the rest of the CPUs until finding the first one with APIC enabled.
Yet, I have not finished doing and checking the BSP fix and other dependent
INIT signal handling fixes.

In the meanwhile, would you be ok with restoring some of the previous
behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
whether APIC is software enabled), otherwise use the configuration of the
last enabled APIC?

-- >8 —
Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map

---
 arch/x86/kvm/lapic.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

Comments

Paolo Bonzini Nov. 26, 2014, 6 p.m. UTC | #1
On 26/11/2014 18:01, Nadav Amit wrote:
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>>
>>
>> On 06/11/2014 17:45, Radim Kr?má? wrote:
>>> 2014-11-06 10:34+0100, Paolo Bonzini:
>>>> On 05/11/2014 21:45, Nadav Amit wrote:
>>>>> If I understand the SDM correctly, in such scenario (all APICs are
>>>>> software disabled) the mode is left as the default - flat mode (see
>>>
>>> APIC doesn't have any global mode (it is just KVM's simplification), so
>>> when a message lands on the system bus, it just compares MDA with LDR
>>> and DFR ...
>>>
>>>>> section 10.6.2.2 "Logical Destination Mode”): "All processors that
>>>>> have their APIC software enabled (using the spurious vector
>>>>> enable/disable bit) must have their DFRs (Destination Format
>>>>> Registers) programmed identically. The default mode for DFR is flat
>>>>> mode.”
>>>
>>> I think the "default mode" points to reset state, which is flat DFR;
>>> and it is mentioned only because of the following sentence
>>>  If you are using cluster mode, DFRs must be programmed before the APIC
>>>  is software enabled.
>>>
>>>> That's not what either Bochs or QEMU do, though.  (Though in the case of
>>>> Bochs I cannot find the place where reception of IPIs is prevented for
>>>> software-disabled APICs, so I'm not sure how much to trust it in this case).
>>>>
>>>> I'm not sure why software-disabled APICs could have different DFRs, if
>>>> the APICs can receive NMI IPIs.  I'll ask Intel.
>>>
>>> When changing the mode, we can't switch DFR synchronously, so it has to
>>> happen and NMI may be needed (watchdog?) before APIC configuration.
>>> Explicit statement might have been a hint to be _extra_ careful when
>>> using logical destination for INIT, NMI, ... I wonder what they'll say.
>>>
>>> Anyway, Paolo's patch seems to be in the right direction, I'd modify it
>>> a bit though:
>>>
>>> LDR=0 isn't addressable in any logical mode, so we can ignore APICs that
>>> don't set it and decide the mode by the last nonzero one.
>>> This works in a situation, where one part is configured for cluster and
>>> the rest is still in reset state.
>>>
>>> (It gets harder if we allow nonzero LDRs with different DFR ...
>>> we'd need to split our logical map to handle it.)
>>>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 758f838..6da303e1 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm)
>>> 		goto out;
>>>
>>> 	new->ldr_bits = 8;
>>> -	/* flat mode is default */
>>> -	new->cid_shift = 8;
>>> -	new->cid_mask = 0;
>>> -	new->lid_mask = 0xff;
>>> 	new->broadcast = APIC_BROADCAST;
>>>
>>> 	kvm_for_each_vcpu(i, vcpu, kvm) {
>>> @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>>> 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>>> 			new->lid_mask = 0xffff;
>>> 			new->broadcast = X2APIC_BROADCAST;
>>> -		} else if (kvm_apic_hw_enabled(apic)) {
>>> +		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
>>> 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>>> 							APIC_DFR_CLUSTER) {
>>> 				new->cid_shift = 4;
>>
>> I merged this patch and Nadav’s.
> 
> Sorry for the late and long reply, but I got an issue with the new version
> (and my previous version as well). Indeed, the SDM states that DFR should
> be the same for enabled CPUs, and that the BIOS should get all CPUs in
> either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
> in xAPIC/x2APIC mode.
> 
> In my tests (which pass on bare-metal), I got a scenario in which some CPUs
> are in xAPIC mode, the BSP changed (which is currently not handled correctly
> by KVM) and the BSP has x2APIC enabled. All the core APICs are
> software-enabled. The expected behaviour is that the CPUs with x2APIC
> enabled would work in x2APIC mode.
> 
> I think such a transitory scenario is possible on real-systems as well,
> perhaps during CPU hot-plug. It appears the previous version (before all of
> our changes) handled it better. I presume the most efficient way is to start
> determining the APIC logical mode from the BSP, and if it is disabled,
> traverse the rest of the CPUs until finding the first one with APIC enabled.
> Yet, I have not finished doing and checking the BSP fix and other dependent
> INIT signal handling fixes.
> 
> In the meanwhile, would you be ok with restoring some of the previous
> behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
> whether APIC is software enabled), otherwise use the configuration of the
> last enabled APIC?
> 
> -- >8 —
> Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map
> 
> ---
>  arch/x86/kvm/lapic.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9c90d31..6dc2be6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>  	struct kvm_apic_map *new, *old = NULL;
>  	struct kvm_vcpu *vcpu;
>  	int i;
> +	bool any_enabled = false;
>  
>  	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
>  
> @@ -160,13 +161,21 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		if (!kvm_apic_present(vcpu))
>  			continue;
>  
> +		/*
> +		 * All APICs DFRs have to be configured the same mode by an OS.
> +		 * We take advatage of this while building logical id lookup
> +		 * table. After reset APICs are in software disabled mode, so if
> +		 * we find apic with different setting we assume this is the mode
> +		 * OS wants all apics to be in; build lookup table accordingly.
> +		 */
>  		if (apic_x2apic_mode(apic)) {
>  			new->ldr_bits = 32;
>  			new->cid_shift = 16;
>  			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>  			new->lid_mask = 0xffff;
>  			new->broadcast = X2APIC_BROADCAST;
> -		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
> +			break;
> +		} else if (!any_enabled && kvm_apic_get_reg(apic, APIC_LDR)) {
>  			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>  							APIC_DFR_CLUSTER) {
>  				new->cid_shift = 4;
> @@ -179,15 +188,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>  			}
>  		}
>  
> -		/*
> -		 * All APICs have to be configured in the same mode by an OS.
> -		 * We take advatage of this while building logical id loockup
> -		 * table. After reset APICs are in software disabled mode, so if
> -		 * we find apic with different setting we assume this is the mode
> -		 * OS wants all apics to be in; build lookup table accordingly.
> -		 */
>  		if (kvm_apic_sw_enabled(apic))
> -			break;
> +			any_enabled = true;
>  	}
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
> 

Yes, this looks good.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Nov. 27, 2014, 1:39 p.m. UTC | #2
2014-11-26 19:01+0200, Nadav Amit:
> Sorry for the late and long reply, but I got an issue with the new version
> (and my previous version as well). Indeed, the SDM states that DFR should
> be the same for enabled CPUs, and that the BIOS should get all CPUs in
> either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
> in xAPIC/x2APIC mode.
> 
> In my tests (which pass on bare-metal), I got a scenario in which some CPUs
> are in xAPIC mode, the BSP changed (which is currently not handled correctly
> by KVM) and the BSP has x2APIC enabled.

How many (V)CPUs were you using?
(We fail hard with logical destination x2APIC and 16+ VCPUs.)

>                                         All the core APICs are
> software-enabled. The expected behaviour is that the CPUs with x2APIC
> enabled would work in x2APIC mode.

(Nice, I bet that made some Intel designers happy.)

There shouldn't be any message conflict when using APIC IDs <255, so it
might be possible if the x2APIC isn't programmed to issue weird
messages, like physical to nonexistent APIC ID 0xff000000, which would
be also interpreted as xAPIC broadcast.

> I think such a transitory scenario is possible on real-systems as well,
> perhaps during CPU hot-plug. It appears the previous version (before all of
> our changes) handled it better. I presume the most efficient way is to start
> determining the APIC logical mode from the BSP, and if it is disabled,
> traverse the rest of the CPUs until finding the first one with APIC enabled.
> Yet, I have not finished doing and checking the BSP fix and other dependent
> INIT signal handling fixes.
> 
> In the meanwhile, would you be ok with restoring some of the previous
> behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
> whether APIC is software enabled), otherwise use the configuration of the
> last enabled APIC?

I don't think this patch improves anything.
(Both behaviors are wrong and I think the current one is a bit less so.)

Our x2APIC implementation is a hack that allowed faster IPI thanks to 1
MSR exit instead of 2 MMIO ones.  No OS, that doesn't know KVM's
limitations, should have enabled it because we didn't emulate interrupt
remapping, which is an architectural requirement for x2APIC ...

And for more concrete points:
- Physical x2APIC isn't affected (only broadcast, which is incorrect
  either way)

- Logical x2APIC and xAPIC don't work at the same time
  - Btw. logical x2APIC isn't supposed to work (see KVM_X2APIC_CID_BITS)
  - Logical xAPIC is shifted incorrectly in x2APIC mode, so they are all
    going to be inaccessible (ldr = 0)
  - Our map isn't designed to allow x2APIC and xAPIC at the same time

- Your patch does not cover the case where sw-disabled x2APIC is
  "before" sw-enabled xAPIC, only if it is after.

> -- >8 —
> Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map
> 
> ---
>  arch/x86/kvm/lapic.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9c90d31..6dc2be6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>  	struct kvm_apic_map *new, *old = NULL;
>  	struct kvm_vcpu *vcpu;
>  	int i;
> +	bool any_enabled = false;
>  
>  	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
>  
> @@ -160,13 +161,21 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		if (!kvm_apic_present(vcpu))
>  			continue;
>  
> +		/*
> +		 * All APICs DFRs have to be configured the same mode by an OS.
> +		 * We take advatage of this while building logical id lookup
> +		 * table. After reset APICs are in software disabled mode, so if
> +		 * we find apic with different setting we assume this is the mode
> +		 * OS wants all apics to be in; build lookup table accordingly.
> +		 */
>  		if (apic_x2apic_mode(apic)) {
>  			new->ldr_bits = 32;
>  			new->cid_shift = 16;
>  			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>  			new->lid_mask = 0xffff;
>  			new->broadcast = X2APIC_BROADCAST;
> -		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
> +			break;
> +		} else if (!any_enabled && kvm_apic_get_reg(apic, APIC_LDR)) {
>  			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>  							APIC_DFR_CLUSTER) {
>  				new->cid_shift = 4;
> @@ -179,15 +188,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>  			}
>  		}
>  
> -		/*
> -		 * All APICs have to be configured in the same mode by an OS.
> -		 * We take advatage of this while building logical id loockup
> -		 * table. After reset APICs are in software disabled mode, so if
> -		 * we find apic with different setting we assume this is the mode
> -		 * OS wants all apics to be in; build lookup table accordingly.
> -		 */
>  		if (kvm_apic_sw_enabled(apic))
> -			break;
> +			any_enabled = true;
>  	}
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
> -- 
> 1.9.1
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nadav Amit Nov. 27, 2014, 9:45 p.m. UTC | #3
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2014-11-26 19:01+0200, Nadav Amit:
>> Sorry for the late and long reply, but I got an issue with the new version
>> (and my previous version as well). Indeed, the SDM states that DFR should
>> be the same for enabled CPUs, and that the BIOS should get all CPUs in
>> either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
>> in xAPIC/x2APIC mode.
>> 
>> In my tests (which pass on bare-metal), I got a scenario in which some CPUs
>> are in xAPIC mode, the BSP changed (which is currently not handled correctly
>> by KVM) and the BSP has x2APIC enabled.
> 
> How many (V)CPUs were you using?
> (We fail hard with logical destination x2APIC and 16+ VCPUs.)
2 at the moment. What failure do you refer to?

> 
>>                                        All the core APICs are
>> software-enabled. The expected behaviour is that the CPUs with x2APIC
>> enabled would work in x2APIC mode.
> 
> (Nice, I bet that made some Intel designers happy.)
> 
> There shouldn't be any message conflict when using APIC IDs <255, so it
> might be possible if the x2APIC isn't programmed to issue weird
> messages, like physical to nonexistent APIC ID 0xff000000, which would
> be also interpreted as xAPIC broadcast.
> 
>> I think such a transitory scenario is possible on real-systems as well,
>> perhaps during CPU hot-plug. It appears the previous version (before all of
>> our changes) handled it better. I presume the most efficient way is to start
>> determining the APIC logical mode from the BSP, and if it is disabled,
>> traverse the rest of the CPUs until finding the first one with APIC enabled.
>> Yet, I have not finished doing and checking the BSP fix and other dependent
>> INIT signal handling fixes.
>> 
>> In the meanwhile, would you be ok with restoring some of the previous
>> behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
>> whether APIC is software enabled), otherwise use the configuration of the
>> last enabled APIC?
> 
> I don't think this patch improves anything.
> (Both behaviors are wrong and I think the current one is a bit less so.)
> 
> Our x2APIC implementation is a hack that allowed faster IPI thanks to 1
> MSR exit instead of 2 MMIO ones.  No OS, that doesn't know KVM's
> limitations, should have enabled it because we didn't emulate interrupt
> remapping, which is an architectural requirement for x2APIC …
It is a shame - I was under the impression QEMU emulation of the Intel IOMMU
would include it as well, and I now see they only did DMAR…

> And for more concrete points:
> - Physical x2APIC isn't affected (only broadcast, which is incorrect
>  either way)
> 
> - Logical x2APIC and xAPIC don't work at the same time
No, but it is important to determine what is the “consensus” APIC mode.

>  - Btw. logical x2APIC isn't supposed to work (see KVM_X2APIC_CID_BITS)
Why? It is as if there is only a single cluster. You can still send an APIC
message to multiple CPUs within the same cluster (0).

>  - Logical xAPIC is shifted incorrectly in x2APIC mode, so they are all
>    going to be inaccessible (ldr = 0)
>  - Our map isn't designed to allow x2APIC and xAPIC at the same time
> 
> - Your patch does not cover the case where sw-disabled x2APIC is
>  "before" sw-enabled xAPIC, only if it is after.
I thought I covered it. The rationale was that if any lapic is in x2APIC
mode, then the are all in x2APIC mode. It is done similarly to the previous
version (3.18).

Anyhow, I have my workarounds, so do as you find appropriately. Once I deal
with the BSP issues, I may resubmit another version.

Nadav

>> -- >8 —
>> Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map
>> 
>> ---
>> arch/x86/kvm/lapic.c | 20 +++++++++++---------
>> 1 file changed, 11 insertions(+), 9 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 9c90d31..6dc2be6 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 	struct kvm_apic_map *new, *old = NULL;
>> 	struct kvm_vcpu *vcpu;
>> 	int i;
>> +	bool any_enabled = false;
>> 
>> 	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
>> 
>> @@ -160,13 +161,21 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 		if (!kvm_apic_present(vcpu))
>> 			continue;
>> 
>> +		/*
>> +		 * All APICs DFRs have to be configured the same mode by an OS.
>> +		 * We take advatage of this while building logical id lookup
>> +		 * table. After reset APICs are in software disabled mode, so if
>> +		 * we find apic with different setting we assume this is the mode
>> +		 * OS wants all apics to be in; build lookup table accordingly.
>> +		 */
>> 		if (apic_x2apic_mode(apic)) {
>> 			new->ldr_bits = 32;
>> 			new->cid_shift = 16;
>> 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>> 			new->lid_mask = 0xffff;
>> 			new->broadcast = X2APIC_BROADCAST;
>> -		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
>> +			break;
>> +		} else if (!any_enabled && kvm_apic_get_reg(apic, APIC_LDR)) {
>> 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>> 							APIC_DFR_CLUSTER) {
>> 				new->cid_shift = 4;
>> @@ -179,15 +188,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 			}
>> 		}
>> 
>> -		/*
>> -		 * All APICs have to be configured in the same mode by an OS.
>> -		 * We take advatage of this while building logical id loockup
>> -		 * table. After reset APICs are in software disabled mode, so if
>> -		 * we find apic with different setting we assume this is the mode
>> -		 * OS wants all apics to be in; build lookup table accordingly.
>> -		 */
>> 		if (kvm_apic_sw_enabled(apic))
>> -			break;
>> +			any_enabled = true;
>> 	}
>> 
>> 	kvm_for_each_vcpu(i, vcpu, kvm) {
>> -- 
>> 1.9.1


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Nov. 27, 2014, 10:26 p.m. UTC | #4
2014-11-27 23:45+0200, Nadav Amit:
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
> > 2014-11-26 19:01+0200, Nadav Amit:
> >> Sorry for the late and long reply, but I got an issue with the new version
> >> (and my previous version as well). Indeed, the SDM states that DFR should
> >> be the same for enabled CPUs, and that the BIOS should get all CPUs in
> >> either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
> >> in xAPIC/x2APIC mode.
> >> 
> >> In my tests (which pass on bare-metal), I got a scenario in which some CPUs
> >> are in xAPIC mode, the BSP changed (which is currently not handled correctly
> >> by KVM) and the BSP has x2APIC enabled.
> > 
> > How many (V)CPUs were you using?
> > (We fail hard with logical destination x2APIC and 16+ VCPUs.)
> 2 at the moment. What failure do you refer to?

(I'll cover it under KVM_X2APIC_CID_BITS.)

xAPIC shouldn't have ever made it into the logical map under x2APIC ...
Were you testing with broadcasts?

> > Our x2APIC implementation is a hack that allowed faster IPI thanks to 1
> > MSR exit instead of 2 MMIO ones.  No OS, that doesn't know KVM's
> > limitations, should have enabled it because we didn't emulate interrupt
> > remapping, which is an architectural requirement for x2APIC …
> It is a shame - I was under the impression QEMU emulation of the Intel IOMMU
> would include it as well, and I now see they only did DMAR…

(and we had this x2APIC for years ...)

> > And for more concrete points:
> > - Physical x2APIC isn't affected (only broadcast, which is incorrect
> >  either way)
> > 
> > - Logical x2APIC and xAPIC don't work at the same time
> No, but it is important to determine what is the “consensus” APIC mode.

Only for our abstraction, SDM's APICs don't need it and I'd rather see
us not depend on it either ...
(Sanity check: if you do xAPIC broadcast when there is xAPIC and x2APIC
 on real hw, does the xAPIC receive it? And if x2APIC sends 0xff000000?)

> >  - Btw. logical x2APIC isn't supposed to work (see KVM_X2APIC_CID_BITS)
> Why? It is as if there is only a single cluster. You can still send an APIC
> message to multiple CPUs within the same cluster (0).

KVM_X2APIC_CID_BITS = 0 meant that all VCPUs and messages got mapped
into cluster 0.
If you had 32 VCPUs, at least half of them wouldn't have a pointer in
the map -- and those left out would most likely be within first 16 APIC
ids, so messages would go completely off.

> >  - Logical xAPIC is shifted incorrectly in x2APIC mode, so they are all
> >    going to be inaccessible (ldr = 0)
> >  - Our map isn't designed to allow x2APIC and xAPIC at the same time
> > 
> > - Your patch does not cover the case where sw-disabled x2APIC is
> >  "before" sw-enabled xAPIC, only if it is after.
> I thought I covered it. The rationale was that if any lapic is in x2APIC
> mode, then the are all in x2APIC mode. It is done similarly to the previous
> version (3.18).

True, sorry, I missed the 'break;' in x2apic path.

We can't deliver xAPIC and x2APIC broadcasts/logical messages at the
same time with current KVM and this patch just switches the working case
in favour of x2APIC, which is why I didn't think it was necessary ...
(And I didn't understand why prefer disabled x2APIC to enabled xAPIC.)

> Anyhow, I have my workarounds, so do as you find appropriately. Once I deal
> with the BSP issues, I may resubmit another version.

I don't really mind having it, guests worked even with more broken code,
and this patch helps at least one use case :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Dec. 1, 2014, 4:30 p.m. UTC | #5
On 27/11/2014 23:26, Radim Kr?má? wrote:
> We can't deliver xAPIC and x2APIC broadcasts/logical messages at the
> same time with current KVM and this patch just switches the working case
> in favour of x2APIC, which is why I didn't think it was necessary ...
> (And I didn't understand why prefer disabled x2APIC to enabled xAPIC.)

Indeed.  A possible thing to do perhaps would be to build two logical
maps, one for x2APIC and one for xAPIC, and consult both...  That would
slow down the common case when one map is empty, though.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Dec. 1, 2014, 5:49 p.m. UTC | #6
2014-12-01 17:30+0100, Paolo Bonzini:
> On 27/11/2014 23:26, Radim Kr?má? wrote:
> > We can't deliver xAPIC and x2APIC broadcasts/logical messages at the
> > same time with current KVM and this patch just switches the working case
> > in favour of x2APIC, which is why I didn't think it was necessary ...
> > (And I didn't understand why prefer disabled x2APIC to enabled xAPIC.)
> 
> Indeed.  A possible thing to do perhaps would be to build two logical
> maps, one for x2APIC and one for xAPIC, and consult both...  That would
> slow down the common case when one map is empty, though.

Yeah, that seems like the cleanest solution ... I'll prepare the patch.
(so we can then throw it away :)

We'll need to take a look at the source (IO)APIC now.
Some design points I'm not sure about now:
 - Does xAPIC ignore reserved bytes in destination?
   (= Is x2APIC broadcast also delivered to xAPICs?)
 - we'll probably still have to hack it to deliver IOAPIC requests to
   low x2APIC logical addresses for compatibility reasons
 - to gain speed, we could try to encode them into the same cache-line
     struct {kvm_apic_t *xapic, kvm_apic_t *x2apic} logical_map[16][16];
   (and waste some space instead)


---
If we wanted to be super correct, with respect to sw-disabled xAPIC,
there should be three: x2APIC, flat xAPIC, and cluster xAPIC :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9c90d31..6dc2be6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -139,6 +139,7 @@  static void recalculate_apic_map(struct kvm *kvm)
 	struct kvm_apic_map *new, *old = NULL;
 	struct kvm_vcpu *vcpu;
 	int i;
+	bool any_enabled = false;
 
 	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
 
@@ -160,13 +161,21 @@  static void recalculate_apic_map(struct kvm *kvm)
 		if (!kvm_apic_present(vcpu))
 			continue;
 
+		/*
+		 * All APICs DFRs have to be configured the same mode by an OS.
+		 * We take advatage of this while building logical id lookup
+		 * table. After reset APICs are in software disabled mode, so if
+		 * we find apic with different setting we assume this is the mode
+		 * OS wants all apics to be in; build lookup table accordingly.
+		 */
 		if (apic_x2apic_mode(apic)) {
 			new->ldr_bits = 32;
 			new->cid_shift = 16;
 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
 			new->lid_mask = 0xffff;
 			new->broadcast = X2APIC_BROADCAST;
-		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
+			break;
+		} else if (!any_enabled && kvm_apic_get_reg(apic, APIC_LDR)) {
 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
 							APIC_DFR_CLUSTER) {
 				new->cid_shift = 4;
@@ -179,15 +188,8 @@  static void recalculate_apic_map(struct kvm *kvm)
 			}
 		}
 
-		/*
-		 * All APICs have to be configured in the same mode by an OS.
-		 * We take advatage of this while building logical id loockup
-		 * table. After reset APICs are in software disabled mode, so if
-		 * we find apic with different setting we assume this is the mode
-		 * OS wants all apics to be in; build lookup table accordingly.
-		 */
 		if (kvm_apic_sw_enabled(apic))
-			break;
+			any_enabled = true;
 	}
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {