
KVM: nVMX: Fix bug of injecting L2 exception into L1

Message ID 2d56b269-0d21-10a3-80ce-19e989d6903b@redhat.com (mailing list archive)
State New, archived

Commit Message

Paolo Bonzini Nov. 20, 2017, 9:47 p.m. UTC
On 19/11/2017 17:25, Liran Alon wrote:
> L2 RDMSR could be handled as described below:
> 1) L2 exits to L0 on RDMSR and calls handle_rdmsr().
> 2) handle_rdmsr() calls kvm_inject_gp() which sets
> KVM_REQ_EVENT, exception.pending=true and exception.injected=false.
> 3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls
> inject_pending_event() which calls vmx_check_nested_events()
> which sees that exception.pending=true but
> nested_vmx_check_exception() returns 0 and therefore does nothing at
> this point. However, let's assume it later sees the vmx-preemption-timer
> has expired and therefore exits from L2 to L1 by calling
> nested_vmx_vmexit().
> 4) nested_vmx_vmexit() calls prepare_vmcs12()
> which calls vmcs12_save_pending_event() but it does nothing as
> exception.injected is false. Also prepare_vmcs12() calls
> kvm_clear_exception_queue() which does nothing as
> exception.injected is already false.
> 5) We now return from vmx_check_nested_events() with 0 while still
> having exception.pending=true!
> 6) Therefore inject_pending_event() continues
> and we inject L2 exception to L1!...

But this is buggy as well, because the #GP is lost, isn't it?

Is the bug that the preemption timer vmexit should only be injected if 
there are no pending exceptions?  In fact, the same bug could also 
happen for NMIs or interrupts, I think.

So, something like (101% untested):


Paolo

> This commit fixes the above issue by changing step (4) to
> clear exception.pending in kvm_clear_exception_queue().
> 
> Fixes: 664f8e26b00c ("KVM: X86: Fix loss of exception which has not
> yet been injected")
> 
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> ---
>  arch/x86/kvm/vmx.c | 1 -
>  arch/x86/kvm/x86.h | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 7c3522a989d0..bee08782c781 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11035,7 +11035,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  		if (vmx->nested.nested_run_pending)
>  			return -EBUSY;
>  		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> -		vcpu->arch.exception.pending = false;
>  		return 0;
>  	}
>  
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index d0b95b7a90b4..6d112d8f799c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -12,6 +12,7 @@
>  
>  static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
>  {
> +	vcpu->arch.exception.pending = false;
>  	vcpu->arch.exception.injected = false;
>  }
>  
> 

Should kvm_clear_interrupt_queue do the same?

Thanks,

Paolo

Comments

Liran Alon Nov. 20, 2017, 10:46 p.m. UTC | #1
On 20/11/17 23:47, Paolo Bonzini wrote:
> On 19/11/2017 17:25, Liran Alon wrote:
>> L2 RDMSR could be handled as described below:
>> 1) L2 exits to L0 on RDMSR and calls handle_rdmsr().
>> 2) handle_rdmsr() calls kvm_inject_gp() which sets
>> KVM_REQ_EVENT, exception.pending=true and exception.injected=false.
>> 3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls
>> inject_pending_event() which calls vmx_check_nested_events()
>> which sees that exception.pending=true but
>> nested_vmx_check_exception() returns 0 and therefore does nothing at
>> this point. However, let's assume it later sees the vmx-preemption-timer
>> has expired and therefore exits from L2 to L1 by calling
>> nested_vmx_vmexit().
>> 4) nested_vmx_vmexit() calls prepare_vmcs12()
>> which calls vmcs12_save_pending_event() but it does nothing as
>> exception.injected is false. Also prepare_vmcs12() calls
>> kvm_clear_exception_queue() which does nothing as
>> exception.injected is already false.
>> 5) We now return from vmx_check_nested_events() with 0 while still
>> having exception.pending=true!
>> 6) Therefore inject_pending_event() continues
>> and we inject L2 exception to L1!...
>
> But this is buggy as well, because the #GP is lost, isn't it?

I don't think so.

Since commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has 
not yet been injected"), there is a fundamental difference between a 
pending exception and an injected exception.
A pending exception means that no side effects of the exception have 
been applied yet, including incrementing the RIP past the instruction 
which caused the exception. In our case, for example, handle_rdmsr() 
calls kvm_inject_gp() and returns without calling 
kvm_skip_emulated_instruction(), which increments the RIP.

Therefore, when we exit from L2 to L1 on the vmx-preemption-timer, we can 
safely clear exception.pending, because when L1 resumes L2 the 
exception will be raised again (the same RDMSR instruction will run 
again and raise #GP again).
This is also why vmcs12_save_pending_event() only makes sure to save in 
VMCS12 idt-vectoring-info the "injected" events and not the "pending" 
events (interrupt.pending is a misleading name; I would rename it to 
interrupt.injected in an upcoming patch, see explanation below). And this 
is also why exception.pending is intercepted but exception.injected is not.

I can confirm this patch works because I have written a kvm-unit-test 
which reproduces this issue. After the fix, the #GP is not lost and is 
correctly raised in L2.
(I haven't posted the unit-test yet because it is very dependent on a 
vmx-preemption-timer config that varies between environments.)

>
> Is the bug that the preemption timer vmexit should only be injected if
> there are no pending exceptions?  In fact, the same bug could also
> happen for NMIs or interrupts, I think.
As explained above, I don't think so.

In general there is a bit of a mess in the KVM code regarding the clean 
separation between a "pending" event and an "injected" event. NMIs & 
exceptions are now separated nicely. However, interrupt.pending is 
actually interrupt.injected, as it is signaled after the side effects 
have occurred (the bit has moved from IRR to ISR, for example).

I am going to post another series (a v2 of a previous series I posted 
here) tomorrow that will attempt to clean this up and, along the way, 
fix a couple of bugs in this area.

>
> So, something like (101% untested):
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5b436f4e6e3e..64eecb8b126d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11137,8 +11137,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>   	bool block_nested_events =
>   	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
>
> -	if (vcpu->arch.exception.pending &&
> -		nested_vmx_check_exception(vcpu, &exit_qual)) {
> +	if (vcpu->arch.exception.pending) {
> +		if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +			return 0;
>   		if (block_nested_events)
>   			return -EBUSY;
>   		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -11146,15 +11147,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>   		return 0;
>   	}
>
> -	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
> -	    vmx->nested.preemption_timer_expired) {
> -		if (block_nested_events)
> -			return -EBUSY;
> -		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> -		return 0;
> -	}
> -
> -	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> +	if (vcpu->arch.nmi_pending) {
> +		if (!nested_exit_on_nmi(vcpu))
> +			return 0;
>   		if (block_nested_events)
>   			return -EBUSY;
>   		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> @@ -11169,14 +11164,23 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>   		return 0;
>   	}
>
> -	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> -	    nested_exit_on_intr(vcpu)) {
> +	if (kvm_cpu_has_interrupt(vcpu) || external_intr) {
> +		if (!nested_exit_on_intr(vcpu))
> +			return 0;
>   		if (block_nested_events)
>   			return -EBUSY;
>   		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>   		return 0;
>   	}
>
> +	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
> +	    vmx->nested.preemption_timer_expired) {
> +		if (block_nested_events)
> +			return -EBUSY;
> +		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> +		return 0;
> +	}
> +
>   	vmx_complete_nested_posted_interrupt(vcpu);
>   	return 0;
>   }
>
> Paolo
>
>> This commit fixes the above issue by changing step (4) to
>> clear exception.pending in kvm_clear_exception_queue().
>>
>> Fixes: 664f8e26b00c ("KVM: X86: Fix loss of exception which has not
>> yet been injected")
>>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> ---
>>   arch/x86/kvm/vmx.c | 1 -
>>   arch/x86/kvm/x86.h | 1 +
>>   2 files changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 7c3522a989d0..bee08782c781 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11035,7 +11035,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>   		if (vmx->nested.nested_run_pending)
>>   			return -EBUSY;
>>   		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
>> -		vcpu->arch.exception.pending = false;
>>   		return 0;
>>   	}
>>
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index d0b95b7a90b4..6d112d8f799c 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -12,6 +12,7 @@
>>
>>   static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
>>   {
>> +	vcpu->arch.exception.pending = false;
>>   	vcpu->arch.exception.injected = false;
>>   }
>>
>>
>
> Should kvm_clear_interrupt_queue do the same?
Interrupts currently only have interrupt.pending (which, as I said above, 
is actually interrupt.injected). Therefore 
kvm_clear_interrupt_queue() already clears all there is...
>
> Thanks,
>
> Paolo
>

Regards,
-Liran
Paolo Bonzini Nov. 20, 2017, 10:58 p.m. UTC | #2
On 20/11/2017 23:46, Liran Alon wrote:
>>>
>>
>> But this is buggy as well, because the #GP is lost, isn't it?
> 
> I don't think so.
> 
> Since commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has
> not yet been injected"), there is a fundamental difference between a
> pending exception and an injected exception.
> A pending exception means that no side effects of the exception have
> been applied yet, including incrementing the RIP past the instruction
> which caused the exception. In our case, for example, handle_rdmsr()
> calls kvm_inject_gp() and returns without calling
> kvm_skip_emulated_instruction(), which increments the RIP.

Ok, I was almost sure this was going to be the case if the #GP wasn't
lost (but couldn't convince myself 100%).

> Therefore, when we exit from L2 to L1 on vmx-preemption-timer, we can
> safely clear exception.pending, because when L1 resumes L2 the
> exception will be raised again (the same RDMSR instruction will run
> again and raise #GP again).
> This is also why vmcs12_save_pending_event() only makes sure to save in
> VMCS12 idt-vectoring-info the "injected" events and not the "pending"
> events (interrupt.pending is a misleading name; I would rename it to
> interrupt.injected in an upcoming patch, see explanation below).

Indeed.  And then kvm_event_needs_reinjection starts making sense.

> I can confirm this patch works because I have written a kvm-unit-test
> which reproduces this issue. After the fix, the #GP is not lost and is
> correctly raised in L2.
> (I haven't posted the unit-test yet because it is very dependent on a
> vmx-preemption-timer config that varies between
> environments.)

Can you post it anyway?  Tests always help with understanding the code.

Paolo

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5b436f4e6e3e..64eecb8b126d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11137,8 +11137,9 @@  static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 	bool block_nested_events =
 	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
 
-	if (vcpu->arch.exception.pending &&
-		nested_vmx_check_exception(vcpu, &exit_qual)) {
+	if (vcpu->arch.exception.pending) {
+		if (!nested_vmx_check_exception(vcpu, &exit_qual))
+			return 0;
 		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
@@ -11146,15 +11147,9 @@  static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 		return 0;
 	}
 
-	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
-	    vmx->nested.preemption_timer_expired) {
-		if (block_nested_events)
-			return -EBUSY;
-		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
-		return 0;
-	}
-
-	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
+	if (vcpu->arch.nmi_pending) {
+		if (!nested_exit_on_nmi(vcpu))
+			return 0;
 		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
@@ -11169,14 +11164,23 @@  static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 		return 0;
 	}
 
-	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
-	    nested_exit_on_intr(vcpu)) {
+	if (kvm_cpu_has_interrupt(vcpu) || external_intr) {
+		if (!nested_exit_on_intr(vcpu))
+			return 0;
 		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
 		return 0;
 	}
 
+	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
+	    vmx->nested.preemption_timer_expired) {
+		if (block_nested_events)
+			return -EBUSY;
+		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
+		return 0;
+	}
+
 	vmx_complete_nested_posted_interrupt(vcpu);
 	return 0;
 }