
[v2,2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt

Message ID 1512461786-6465-3-git-send-email-liran.alon@oracle.com (mailing list archive)
State New, archived

Commit Message

Liran Alon Dec. 5, 2017, 8:16 a.m. UTC
If a posted-interrupt is delivered to the CPU while it is in host mode
(outside guest), posted-interrupt delivery is done by calling
sync_pir_to_irr() at vmentry, after interrupts are disabled.

sync_pir_to_irr() checks whether the ON bit of vmx->pi_desc.control is
set and, if so, syncs vmx->pi_desc.pir into the IRR and then updates
RVI to ensure virtual-interrupt-delivery dispatches the interrupt to
the guest.

However, L1 may receive a posted-interrupt while the CPU runs in host
mode and is about to enter L2. In this case, the call to
sync_pir_to_irr() will indeed update L1's APIC IRR, but
vcpu_enter_guest() will then just resume into the L2 guest without
re-evaluating whether it should exit from L2 to L1 as a result of this
new pending L1 event.

To address this case, if sync_pir_to_irr() observes a new injectable
L1 interrupt while the CPU is running L2, we set KVM_REQ_EVENT. This
causes vcpu_enter_guest() to run another iteration of evaluating
pending KVM requests, which consumes KVM_REQ_EVENT and in turn calls
check_nested_events() to handle the pending L1 event properly.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/vmx.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
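
For orientation, here is an abridged sketch of the vcpu_enter_guest()
flow in arch/x86/kvm/x86.c that the commit message relies on. It is a
paraphrase, not verbatim kernel code, and the name
vcpu_enter_guest_sketch() is just a label for the illustration:

static int vcpu_enter_guest_sketch(struct kvm_vcpu *vcpu, bool req_int_win)
{
        if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
                /*
                 * inject_pending_event() calls check_nested_events()
                 * when is_guest_mode(vcpu), so a pending L1 interrupt
                 * can force an L2->L1 exit at this point.
                 */
                inject_pending_event(vcpu, req_int_win);
        }

        local_irq_disable();
        vcpu->mode = IN_GUEST_MODE;

        /* Last chance to notice a posted interrupt before vmentry. */
        if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
                kvm_x86_ops->sync_pir_to_irr(vcpu);

        /*
         * If sync_pir_to_irr() raised KVM_REQ_EVENT (this patch), the
         * check below cancels the entry; the caller loops and the
         * KVM_REQ_EVENT block above then handles the new L1 interrupt.
         */
        if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)) {
                vcpu->mode = OUTSIDE_GUEST_MODE;
                local_irq_enable();
                return 1;       /* cancel injection and retry */
        }

        /* ... VMLAUNCH/VMRESUME happens here ... */
        return 0;
}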

Comments

Radim Krčmář Dec. 6, 2017, 6:52 p.m. UTC | #1
2017-12-05 10:16+0200, Liran Alon:
> If a posted-interrupt is delivered to the CPU while it is in host mode
> (outside guest), posted-interrupt delivery is done by calling
> sync_pir_to_irr() at vmentry, after interrupts are disabled.
> 
> sync_pir_to_irr() checks whether the ON bit of vmx->pi_desc.control is
> set and, if so, syncs vmx->pi_desc.pir into the IRR and then updates
> RVI to ensure virtual-interrupt-delivery dispatches the interrupt to
> the guest.
> 
> However, L1 may receive a posted-interrupt while the CPU runs in host
> mode and is about to enter L2. In this case, the call to
> sync_pir_to_irr() will indeed update L1's APIC IRR, but
> vcpu_enter_guest() will then just resume into the L2 guest without
> re-evaluating whether it should exit from L2 to L1 as a result of this
> new pending L1 event.
> 
> To address this case, if sync_pir_to_irr() observes a new injectable
> L1 interrupt while the CPU is running L2, we set KVM_REQ_EVENT. This
> causes vcpu_enter_guest() to run another iteration of evaluating
> pending KVM requests, which consumes KVM_REQ_EVENT and in turn calls
> check_nested_events() to handle the pending L1 event properly.
> 
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/kvm/vmx.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f5074ec5701b..47bbb8b691e8 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9031,20 +9031,33 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>  static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int prev_max_irr;
>  	int max_irr;
>  
>  	WARN_ON(!vcpu->arch.apicv_active);
> +
> +	prev_max_irr = kvm_lapic_find_highest_irr(vcpu);
>  	if (pi_test_on(&vmx->pi_desc)) {
>  		pi_clear_on(&vmx->pi_desc);
> +
>  		/*
>  		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>  		 * But on x86 this is just a compiler barrier anyway.
>  		 */
>  		smp_mb__after_atomic();
>  		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);

I think the optimization (partly livelock protection) is not worth the
overhead of two IRR scans for non-nested guests.  Please make
kvm_apic_update_irr() return both prev_max_irr and max_irr in one pass.
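
One possible one-pass shape (a sketch, not necessarily what was merged;
the _sketch suffix is only to distinguish it from the real function):

static int __kvm_apic_update_irr_sketch(u32 *pir, void *regs, int *max_irr)
{
        int prev_max_irr = -1;
        u32 i, vec;

        *max_irr = -1;
        for (i = vec = 0; i <= 7; i++, vec += 32) {
                u32 irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
                u32 pir_val = READ_ONCE(pir[i]);

                /* Highest pending vector before merging PIR. */
                if (irr_val)
                        prev_max_irr = __fls(irr_val) + vec;

                if (pir_val) {
                        irr_val |= xchg(&pir[i], 0);
                        *((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
                }

                /* Highest pending vector after merging PIR. */
                if (irr_val)
                        *max_irr = __fls(irr_val) + vec;
        }
        return prev_max_irr;
}

The caller could then compare the return value with *max_irr instead of
doing a separate kvm_lapic_find_highest_irr() scan up front.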

> +
> +		/*
> +		 * If we are running L2 and L1 has a new pending interrupt
> +		 * which can be injected, we should re-evaluate
> +		 * what should be done with this new L1 interrupt.
> +		 */
> +		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
> +			kvm_make_request(KVM_REQ_EVENT, vcpu);

We don't need anything from KVM_REQ_EVENT and only use it to abort the
VM entry; kvm_vcpu_exiting_guest_mode() is better for that.
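
For reference, kvm_vcpu_exiting_guest_mode() just cmpxchg()es
vcpu->mode from IN_GUEST_MODE to EXITING_GUEST_MODE, which the
pre-entry check in vcpu_enter_guest() treats as "cancel and retry".
A hypothetical replacement of the hunk above would be:

        if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
                /* Abort the vmentry; no event state to recompute. */
                kvm_vcpu_exiting_guest_mode(vcpu);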

>  	} else {
> -		max_irr = kvm_lapic_find_highest_irr(vcpu);
> +		max_irr = prev_max_irr;
>  	}
> +
>  	vmx_hwapic_irr_update(vcpu, max_irr);

We should also just inject the interrupt if L2 runs without
nested_exit_on_intr(), maybe reusing the check in vmx_hwapic_irr_update()?

Thanks.
Liran Alon Dec. 7, 2017, 2:29 a.m. UTC | #2
On 06/12/17 20:52, Radim Krčmář wrote:
> 2017-12-05 10:16+0200, Liran Alon:
>> If a posted-interrupt is delivered to the CPU while it is in host mode
>> (outside guest), posted-interrupt delivery is done by calling
>> sync_pir_to_irr() at vmentry, after interrupts are disabled.
>>
>> sync_pir_to_irr() checks whether the ON bit of vmx->pi_desc.control is
>> set and, if so, syncs vmx->pi_desc.pir into the IRR and then updates
>> RVI to ensure virtual-interrupt-delivery dispatches the interrupt to
>> the guest.
>>
>> However, L1 may receive a posted-interrupt while the CPU runs in host
>> mode and is about to enter L2. In this case, the call to
>> sync_pir_to_irr() will indeed update L1's APIC IRR, but
>> vcpu_enter_guest() will then just resume into the L2 guest without
>> re-evaluating whether it should exit from L2 to L1 as a result of this
>> new pending L1 event.
>>
>> To address this case, if sync_pir_to_irr() observes a new injectable
>> L1 interrupt while the CPU is running L2, we set KVM_REQ_EVENT. This
>> causes vcpu_enter_guest() to run another iteration of evaluating
>> pending KVM requests, which consumes KVM_REQ_EVENT and in turn calls
>> check_nested_events() to handle the pending L1 event properly.
>>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>   arch/x86/kvm/vmx.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f5074ec5701b..47bbb8b691e8 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9031,20 +9031,33 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>>   static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>>   {
>>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +	int prev_max_irr;
>>   	int max_irr;
>>
>>   	WARN_ON(!vcpu->arch.apicv_active);
>> +
>> +	prev_max_irr = kvm_lapic_find_highest_irr(vcpu);
>>   	if (pi_test_on(&vmx->pi_desc)) {
>>   		pi_clear_on(&vmx->pi_desc);
>> +
>>   		/*
>>   		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>>   		 * But on x86 this is just a compiler barrier anyway.
>>   		 */
>>   		smp_mb__after_atomic();
>>   		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
>
> I think the optimization (partly livelock protection) is not worth the
> overhead of two IRR scans for non-nested guests.  Please make
> kvm_apic_update_irr() return both prev_max_irr and max_irr in one pass.

OK. I will modify kvm_apic_update_irr().

>
>> +
>> +		/*
>> +		 * If we are running L2 and L1 has a new pending interrupt
>> +		 * which can be injected, we should re-evaluate
>> +		 * what should be done with this new L1 interrupt.
>> +		 */
>> +		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
>> +			kvm_make_request(KVM_REQ_EVENT, vcpu);
>
> We don't need anything from KVM_REQ_EVENT and only use it to abort the
> VM entry; kvm_vcpu_exiting_guest_mode() is better for that.

Yes, you are right. I will change it to kvm_vcpu_exiting_guest_mode().

>
>>   	} else {
>> -		max_irr = kvm_lapic_find_highest_irr(vcpu);
>> +		max_irr = prev_max_irr;
>>   	}
>> +
>>   	vmx_hwapic_irr_update(vcpu, max_irr);
>
> We should also just inject the interrupt if L2 runs without
> nested_exit_on_intr(), maybe reusing the check in vmx_hwapic_irr_update()?
See next patch in series :)

>
> Thanks.
>

Regards,
-Liran
Paolo Bonzini Dec. 11, 2017, 10:53 p.m. UTC | #3
On 06/12/2017 19:52, Radim Krčmář wrote:
>>  		smp_mb__after_atomic();
>>  		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> I think the optimization (partly livelock protection) is not worth the
> overhead of two IRR scans for non-nested guests.  Please make
> kvm_apic_update_irr() return both prev_max_irr and max_irr in one pass.

You could also return max_irr in an int*, and give the function a "bool"
return type for max_irr > prev_max_irr.  That is more efficient because
you can do the check in the "if (pir_val)" conditional of
__kvm_apic_update_irr.

Paolo
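
A sketch of the shape Paolo describes (assumed, not verbatim upstream;
the _sketch suffix is illustrative): max_irr comes back through a
pointer, and the bool result says whether the PIR merge produced the
new highest vector, tracked inside the "if (pir_val)" arm so the
PIR-free path costs nothing extra:

static bool __kvm_apic_update_irr_sketch(u32 *pir, void *regs, int *max_irr)
{
        int max_updated_irr = -1;
        u32 i, vec;

        *max_irr = -1;
        for (i = vec = 0; i <= 7; i++, vec += 32) {
                u32 irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
                u32 pir_val = READ_ONCE(pir[i]);

                if (pir_val) {
                        irr_val |= xchg(&pir[i], 0);
                        *((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
                        /* Highest vector the PIR merge touched. */
                        max_updated_irr = __fls(irr_val) + vec;
                }
                if (irr_val)
                        *max_irr = __fls(irr_val) + vec;
        }
        /* True iff the merge produced the new highest pending vector. */
        return max_updated_irr != -1 && max_updated_irr == *max_irr;
}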

>> +
>> +		/*
>> +		 * If we are running L2 and L1 has a new pending interrupt
>> +		 * which can be injected, we should re-evaluate
>> +		 * what should be done with this new L1 interrupt.
>> +		 */
>> +		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
>> +			kvm_make_request(KVM_REQ_EVENT, vcpu);
> We don't need anything from KVM_REQ_EVENT and only use it to abort the
> VM entry; kvm_vcpu_exiting_guest_mode() is better for that.
> 
>>  	} else {
>> -		max_irr = kvm_lapic_find_highest_irr(vcpu);
>> +		max_irr = prev_max_irr;
>>  	}
>> +
>>  	vmx_hwapic_irr_update(vcpu, max_irr);
> We should also just inject the interrupt if L2 runs without
> nested_exit_on_intr(), maybe reusing the check in vmx_hwapic_irr_update()?

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f5074ec5701b..47bbb8b691e8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9031,20 +9031,33 @@  static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int prev_max_irr;
 	int max_irr;
 
 	WARN_ON(!vcpu->arch.apicv_active);
+
+	prev_max_irr = kvm_lapic_find_highest_irr(vcpu);
 	if (pi_test_on(&vmx->pi_desc)) {
 		pi_clear_on(&vmx->pi_desc);
+
 		/*
 		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
 		 * But on x86 this is just a compiler barrier anyway.
 		 */
 		smp_mb__after_atomic();
 		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
+
+		/*
+		 * If we are running L2 and L1 has a new pending interrupt
+		 * which can be injected, we should re-evaluate
+		 * what should be done with this new L1 interrupt.
+		 */
+		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
 	} else {
-		max_irr = kvm_lapic_find_highest_irr(vcpu);
+		max_irr = prev_max_irr;
 	}
+
 	vmx_hwapic_irr_update(vcpu, max_irr);
 	return max_irr;
 }