diff mbox

[3/7] KVM: x86: Add emulation_type to not raise #UD on CPL=3 emulation failure

Message ID 1513590316-4702-4-git-send-email-liran.alon@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Liran Alon Dec. 18, 2017, 9:45 a.m. UTC
Next commits are going introduce support for accessing VMware backdoor
ports even though guest's TSS I/O permissions bitmap doesn't allow
access. This mimic VMware hypervisor behavior.

In order to support this, next commits will change VMX/SVM to
intercept #GP which was raised by such access and handle it by calling
the x86 emulator to emulate instruction. Since commit "KVM: x86:
Always allow access to VMware backdoor I/O ports", the x86 emulator
handles access to these I/O ports by not checking these ports against
the TSS I/O permission bitmap.

It turns out that the x86 emulator is incomplete and therefore
certain instructions that can cause #GP cannot be emulated.
Such an example is "INT x" (opcode 0xcd) which reach emulate_int()
which can only emulate instruction if vCPU is in real-mode.

In those cases, we would like the #GP intercept to just forward #GP
as-is to guest as if there was no intercept to begin with.
However, current emulator code always queue #UD exception in case
emulator fails which is not what is wanted in this flow.

This commit address this issue by adding a new emulation_type flag
that will allow the #GP intercept handler to specify it wish to just
be aware of when instruction emulation fails and doesn't want #UD
exception to be queued.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

Comments

Paolo Bonzini Dec. 21, 2017, 3:11 p.m. UTC | #1
On 18/12/2017 10:45, Liran Alon wrote:
> Next commits are going introduce support for accessing VMware backdoor
> ports even though guest's TSS I/O permissions bitmap doesn't allow
> access. This mimic VMware hypervisor behavior.
> 
> In order to support this, next commits will change VMX/SVM to
> intercept #GP which was raised by such access and handle it by calling
> the x86 emulator to emulate instruction. Since commit "KVM: x86:
> Always allow access to VMware backdoor I/O ports", the x86 emulator
> handles access to these I/O ports by not checking these ports against
> the TSS I/O permission bitmap.
> 
> It turns out that the x86 emulator is incomplete and therefore
> certain instructions that can cause #GP cannot be emulated.
> Such an example is "INT x" (opcode 0xcd) which reach emulate_int()
> which can only emulate instruction if vCPU is in real-mode.
> 
> In those cases, we would like the #GP intercept to just forward #GP
> as-is to guest as if there was no intercept to begin with.
> However, current emulator code always queue #UD exception in case
> emulator fails which is not what is wanted in this flow.
> 
> This commit address this issue by adding a new emulation_type flag
> that will allow the #GP intercept handler to specify it wish to just
> be aware of when instruction emulation fails and doesn't want #UD
> exception to be queued.
> 
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 12 ++++++++----
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 516798431328..2b7ea1ac4f86 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1168,6 +1168,7 @@ enum emulation_result {
>  #define EMULTYPE_SKIP		    (1 << 2)
>  #define EMULTYPE_RETRY		    (1 << 3)
>  #define EMULTYPE_NO_REEXECUTE	    (1 << 4)
> +#define EMULTYPE_NO_UD_ON_FAIL	    (1 << 5)
>  int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
>  			    int emulation_type, void *insn, int insn_len);
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5fef09674de1..8fd2d3e1bcd4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5425,7 +5425,7 @@ int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
>  
> -static int handle_emulation_failure(struct kvm_vcpu *vcpu)
> +static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
>  {
>  	int r = EMULATE_DONE;
>  
> @@ -5437,7 +5437,11 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
>  		vcpu->run->internal.ndata = 0;
>  		r = EMULATE_USER_EXIT;
>  	}
> -	kvm_queue_exception(vcpu, UD_VECTOR);
> +
> +	if (emulation_type & EMULTYPE_NO_UD_ON_FAIL)
> +		r = EMULATE_FAIL;

There seems to be some overlap between EMULTYPE_VMWARE and
EMULTYPE_NO_UD_ON_FAIL.

Even if you don't specify EMULTYPE_VMWARE, the emulation could fail, but
there should have been no writeback and injecting the original #GP
exception should be safe.  Maybe should EMULTYPE_NO_UD_ON_FAIL return
straight away, skipping even the user-mode exit.

On the other hand, you may want to have EMULTYPE_VMWARE so as to reduce
the attack surface from the emulator (as the emulator would then be very
easy to trigger, just by executing an instruction that causes #GP).  In
that case, however, emulation of the {in,out}{s,} instructions shouldn't
fail and you shouldn't need EMULTYPE_NO_UD_ON_FAIL.

Paolo


> +	else
> +		kvm_queue_exception(vcpu, UD_VECTOR);
>  
>  	return r;
>  }
> @@ -5731,7 +5735,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>  				return EMULATE_DONE;
>  			if (emulation_type & EMULTYPE_SKIP)
>  				return EMULATE_FAIL;
> -			return handle_emulation_failure(vcpu);
> +			return handle_emulation_failure(vcpu, emulation_type);
>  		}
>  	}
>  
> @@ -5766,7 +5770,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>  					emulation_type))
>  			return EMULATE_DONE;
>  
> -		return handle_emulation_failure(vcpu);
> +		return handle_emulation_failure(vcpu, emulation_type);
>  	}
>  
>  	if (ctxt->have_exception) {
>
Liran Alon Dec. 22, 2017, 1:11 a.m. UTC | #2
On 21/12/17 17:11, Paolo Bonzini wrote:
> On 18/12/2017 10:45, Liran Alon wrote:
>> Next commits are going introduce support for accessing VMware backdoor
>> ports even though guest's TSS I/O permissions bitmap doesn't allow
>> access. This mimic VMware hypervisor behavior.
>>
>> In order to support this, next commits will change VMX/SVM to
>> intercept #GP which was raised by such access and handle it by calling
>> the x86 emulator to emulate instruction. Since commit "KVM: x86:
>> Always allow access to VMware backdoor I/O ports", the x86 emulator
>> handles access to these I/O ports by not checking these ports against
>> the TSS I/O permission bitmap.
>>
>> It turns out that the x86 emulator is incomplete and therefore
>> certain instructions that can cause #GP cannot be emulated.
>> Such an example is "INT x" (opcode 0xcd) which reach emulate_int()
>> which can only emulate instruction if vCPU is in real-mode.
>>
>> In those cases, we would like the #GP intercept to just forward #GP
>> as-is to guest as if there was no intercept to begin with.
>> However, current emulator code always queue #UD exception in case
>> emulator fails which is not what is wanted in this flow.
>>
>> This commit address this issue by adding a new emulation_type flag
>> that will allow the #GP intercept handler to specify it wish to just
>> be aware of when instruction emulation fails and doesn't want #UD
>> exception to be queued.
>>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  1 +
>>   arch/x86/kvm/x86.c              | 12 ++++++++----
>>   2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 516798431328..2b7ea1ac4f86 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1168,6 +1168,7 @@ enum emulation_result {
>>   #define EMULTYPE_SKIP		    (1 << 2)
>>   #define EMULTYPE_RETRY		    (1 << 3)
>>   #define EMULTYPE_NO_REEXECUTE	    (1 << 4)
>> +#define EMULTYPE_NO_UD_ON_FAIL	    (1 << 5)
>>   int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
>>   			    int emulation_type, void *insn, int insn_len);
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 5fef09674de1..8fd2d3e1bcd4 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5425,7 +5425,7 @@ int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
>>
>> -static int handle_emulation_failure(struct kvm_vcpu *vcpu)
>> +static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
>>   {
>>   	int r = EMULATE_DONE;
>>
>> @@ -5437,7 +5437,11 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
>>   		vcpu->run->internal.ndata = 0;
>>   		r = EMULATE_USER_EXIT;
>>   	}
>> -	kvm_queue_exception(vcpu, UD_VECTOR);
>> +
>> +	if (emulation_type & EMULTYPE_NO_UD_ON_FAIL)
>> +		r = EMULATE_FAIL;
>
> There seems to be some overlap between EMULTYPE_VMWARE and
> EMULTYPE_NO_UD_ON_FAIL.
>
> Even if you don't specify EMULTYPE_VMWARE, the emulation could fail, but
> there should have been no writeback and injecting the original #GP
> exception should be safe.

What you mention is true in case x86_emulate_instruction() fails on 
instruction emulation but it could also fail on instruction disassembly.
(See more details below).

> Maybe should EMULTYPE_NO_UD_ON_FAIL return
> straight away, skipping even the user-mode exit.

I agree it is possible to check
"if (emulation_type & EMULTYPE_NO_UD_ON_FAIL)"
immediately after statistics & tracing, as if this flag is set, we will 
return EMULATE_FAIL anyway (maybe overwriting EMULATE_USER_EXIT).
If that's important, I can do such change in a v2 of this patch.

>
> On the other hand, you may want to have EMULTYPE_VMWARE so as to reduce
> the attack surface from the emulator (as the emulator would then be very
> easy to trigger, just by executing an instruction that causes #GP).  In
> that case, however, emulation of the {in,out}{s,} instructions shouldn't
> fail and you shouldn't need EMULTYPE_NO_UD_ON_FAIL.

Consider the case where the CPU raises a #GP on some instruction which 
is now intercepted by KVM. The #GP intercept will call 
x86_emulate_instruction(). If the x86 emulator disassembly engine is 
incomplete and therefore doesn't know how to parse the instruction which 
caused the #GP, x86_decode_insn() will fail which will also reach 
handle_emulation_failure(). If there is no EMULTYPE_NO_UD_ON_FAIL flag, 
this will cause a #UD exception to be queued which is not what we want.
(We would like to preserve behaviour of raising a #GP to guest)

Therefore we can summarize these flags usage as follows:
1. EMULTYPE_NO_UD_ON_FAIL is used to tell emulator "if you fail to 
disassemble the instruction, I just want you to return failure. Do not 
queue a #UD and let me decide what should be the proper response".
2. EMULTYPE_VMWARE is indeed used to avoid making all instructions that 
could raise #GP to reach instruction-emulation as the x86 emulator is 
incomplete anyway and it just, as you say, increase attack surface.

Having said that, I agree the commit messages of the 2 commits 
introducing these flags may not be indicative enough. If we agree on the 
written above, I can fix them in v2 of this series.

What do you think?

Regards,
-Liran

>
> Paolo
>
>
>> +	else
>> +		kvm_queue_exception(vcpu, UD_VECTOR);
>>
>>   	return r;
>>   }
>> @@ -5731,7 +5735,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>>   				return EMULATE_DONE;
>>   			if (emulation_type & EMULTYPE_SKIP)
>>   				return EMULATE_FAIL;
>> -			return handle_emulation_failure(vcpu);
>> +			return handle_emulation_failure(vcpu, emulation_type);
>>   		}
>>   	}
>>
>> @@ -5766,7 +5770,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>>   					emulation_type))
>>   			return EMULATE_DONE;
>>
>> -		return handle_emulation_failure(vcpu);
>> +		return handle_emulation_failure(vcpu, emulation_type);
>>   	}
>>
>>   	if (ctxt->have_exception) {
>>
>
Paolo Bonzini Dec. 22, 2017, 3:16 p.m. UTC | #3
On 22/12/2017 02:11, Liran Alon wrote:
> Consider the case where the CPU raises a #GP on some instruction
> which is now intercepted by KVM. The #GP intercept will call
> x86_emulate_instruction(). If the x86 emulator disassembly engine is
> incomplete and therefore doesn't know how to parse the instruction
> which caused the #GP, x86_decode_insn() will fail which will also
> reach handle_emulation_failure(). If there is no
> EMULTYPE_NO_UD_ON_FAIL flag, this will cause a #UD exception to be
> queued which is not what we want.

Yup, however EMULTYPE_VMWARE has filtered the opcodes, hasn't it?  So in
this case you shouldn't fail the decoding.

> Therefore we can summarize these flags usage as follows: 1.
> EMULTYPE_NO_UD_ON_FAIL is used to tell emulator "if you fail to 
> disassemble the instruction, I just want you to return failure. Do
> not queue a #UD and let me decide what should be the proper
> response".
>
> 2. EMULTYPE_VMWARE is indeed used to avoid making all
> instructions that could raise #GP to reach instruction-emulation as
> the x86 emulator is incomplete anyway and it just, as you say,
> increase attack surface.
> 
> Having said that, I agree the commit messages of the 2 commits 
> introducing these flags may not be indicative enough. If we agree on
> the written above, I can fix them in v2 of this series.

Yeah, that's good.  In particular it's important to note that
EMULTYPE_VMWARE is not for correctness, only for hardening.

Thanks,

Paolo
Liran Alon Dec. 22, 2017, 3:53 p.m. UTC | #4
On 22/12/17 17:16, Paolo Bonzini wrote:
> On 22/12/2017 02:11, Liran Alon wrote:
>> Consider the case where the CPU raises a #GP on some instruction
>> which is now intercepted by KVM. The #GP intercept will call
>> x86_emulate_instruction(). If the x86 emulator disassembly engine is
>> incomplete and therefore doesn't know how to parse the instruction
>> which caused the #GP, x86_decode_insn() will fail which will also
>> reach handle_emulation_failure(). If there is no
>> EMULTYPE_NO_UD_ON_FAIL flag, this will cause a #UD exception to be
>> queued which is not what we want.
>
> Yup, however EMULTYPE_VMWARE has filtered the opcodes, hasn't it?  So in
> this case you shouldn't fail the decoding.

In my current implementation EMULTYPE_VMWARE is considered only after 
the disassembly engine (x86_decode_insn()) has succeeded. It is true I 
could have filtered the opcodes before invoking the disassembly engine 
but that will make code a bit more complex. In addition, I didn't saw a 
lot of value in reducing the attack surface from the disassembly engine 
itself. Only from the emulation.

Therefore, I decided to make the EMULTYPE_NO_UD_ON_FAIL flag which may 
be also useful in the future for other use cases.

Regards,
-Liran

>
>> Therefore we can summarize these flags usage as follows: 1.
>> EMULTYPE_NO_UD_ON_FAIL is used to tell emulator "if you fail to
>> disassemble the instruction, I just want you to return failure. Do
>> not queue a #UD and let me decide what should be the proper
>> response".
>>
>> 2. EMULTYPE_VMWARE is indeed used to avoid making all
>> instructions that could raise #GP to reach instruction-emulation as
>> the x86 emulator is incomplete anyway and it just, as you say,
>> increase attack surface.
>>
>> Having said that, I agree the commit messages of the 2 commits
>> introducing these flags may not be indicative enough. If we agree on
>> the written above, I can fix them in v2 of this series.
>
> Yeah, that's good.  In particular it's important to note that
> EMULTYPE_VMWARE is not for correctness, only for hardening.
>
> Thanks,
>
> Paolo
>
Paolo Bonzini Dec. 22, 2017, 3:59 p.m. UTC | #5
On 22/12/2017 16:53, Liran Alon wrote:
>>
>> Yup, however EMULTYPE_VMWARE has filtered the opcodes, hasn't it?  So in
>> this case you shouldn't fail the decoding.
> 
> In my current implementation EMULTYPE_VMWARE is considered only after
> the disassembly engine (x86_decode_insn()) has succeeded. It is true I
> could have filtered the opcodes before invoking the disassembly engine
> but that will make code a bit more complex. In addition, I didn't saw a
> lot of value in reducing the attack surface from the disassembly engine
> itself. Only from the emulation.

We've had oopses from incorrect decoding, so it can be useful.  You're
right that the RDPMC case has a two-byte opcode and therefore it makes
decoding a bit more complex.

I'd say, let's keep it simple and start with EMULTYPE_NO_UD_ON_FAIL.

Thanks,

Paolo

> Therefore, I decided to make the EMULTYPE_NO_UD_ON_FAIL flag which may
> be also useful in the future for other use cases.
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 516798431328..2b7ea1ac4f86 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1168,6 +1168,7 @@  enum emulation_result {
 #define EMULTYPE_SKIP		    (1 << 2)
 #define EMULTYPE_RETRY		    (1 << 3)
 #define EMULTYPE_NO_REEXECUTE	    (1 << 4)
+#define EMULTYPE_NO_UD_ON_FAIL	    (1 << 5)
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
 			    int emulation_type, void *insn, int insn_len);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5fef09674de1..8fd2d3e1bcd4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5425,7 +5425,7 @@  int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
 
-static int handle_emulation_failure(struct kvm_vcpu *vcpu)
+static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 {
 	int r = EMULATE_DONE;
 
@@ -5437,7 +5437,11 @@  static int handle_emulation_failure(struct kvm_vcpu *vcpu)
 		vcpu->run->internal.ndata = 0;
 		r = EMULATE_USER_EXIT;
 	}
-	kvm_queue_exception(vcpu, UD_VECTOR);
+
+	if (emulation_type & EMULTYPE_NO_UD_ON_FAIL)
+		r = EMULATE_FAIL;
+	else
+		kvm_queue_exception(vcpu, UD_VECTOR);
 
 	return r;
 }
@@ -5731,7 +5735,7 @@  int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 				return EMULATE_DONE;
 			if (emulation_type & EMULTYPE_SKIP)
 				return EMULATE_FAIL;
-			return handle_emulation_failure(vcpu);
+			return handle_emulation_failure(vcpu, emulation_type);
 		}
 	}
 
@@ -5766,7 +5770,7 @@  int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 					emulation_type))
 			return EMULATE_DONE;
 
-		return handle_emulation_failure(vcpu);
+		return handle_emulation_failure(vcpu, emulation_type);
 	}
 
 	if (ctxt->have_exception) {