diff mbox series

KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)

Message ID 20190212144354.7694-1-brijesh.singh@amd.com (mailing list archive)
State New, archived
Headers show
Series KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) | expand

Commit Message

Brijesh Singh Feb. 12, 2019, 2:44 p.m. UTC
Errata#1090:

On a nested data page fault when CR.SMAP=1 and the guest data read
generates a SMAP violation, GuestInstrBytes field of the VMCB on a
VMEXIT will incorrectly return 0h instead the correct guest
instruction bytes .

Recommend Workaround:

To determine what instruction the guest was executing the hypervisor
will have to decode the instruction at the instruction pointer.

The recommended workaround can not be implemented for the SEV
guest because guest memory is encrypted with the guest specific key,
and instruction decoder will not be able to decode the instruction
bytes. If we hit this errata in the SEV guest then inject #GP into
the guest and log the message.

Cc: Jim Mattson <jmattson@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---

Errata link:
https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf

 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu.c              |  6 ++++--
 arch/x86/kvm/svm.c              | 32 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c          |  6 ++++++
 4 files changed, 44 insertions(+), 2 deletions(-)

Comments

Paolo Bonzini Feb. 12, 2019, 3:06 p.m. UTC | #1
On 12/02/19 15:44, Singh, Brijesh wrote:
> -	if (unlikely(insn && !insn_len))
> -		return 1;
> +	if (unlikely(insn && !insn_len)) {
> +		if (!kvm_x86_ops->emulate_instruction_possible(vcpu))
> +			return 1;
> +	}

Are the instruction bytes valid, that is can we just ignore insn_len and
use the bytes but not the length?  That would work for SEV too.

Paolo

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 95d618045001..6767bad8367e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7530,6 +7530,11 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> +static bool emulate_instruction_possible(struct kvm_vcpu *vcpu)
> +{
> +	return 1;

This should be "return 0;" to keep previous behavior.

> +}
> +
>  static __init int hardware_setup(void)
>  {
>  	unsigned long host_bndcfgs;
> @@ -7832,6 +7837,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.set_nested_state = NULL,
>  	.get_vmcs12_pages = NULL,
>  	.nested_enable_evmcs = NULL,
> +	.emulate_instruction_possible = emulate_instruction_possible,
>  };
>
Jim Mattson Feb. 12, 2019, 5:41 p.m. UTC | #2
On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote:
>
> Errata#1090:
>
> On a nested data page fault when CR.SMAP=1 and the guest data read
> generates a SMAP violation, GuestInstrBytes field of the VMCB on a
> VMEXIT will incorrectly return 0h instead the correct guest
> instruction bytes .
>
> Recommend Workaround:
>
> To determine what instruction the guest was executing the hypervisor
> will have to decode the instruction at the instruction pointer.
>
> The recommended workaround can not be implemented for the SEV
> guest because guest memory is encrypted with the guest specific key,
> and instruction decoder will not be able to decode the instruction
> bytes. If we hit this errata in the SEV guest then inject #GP into
> the guest and log the message.
>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>

This was ...
Reported-by: Venkatesh Srinivas <venkateshs@google.com>

I'm curious why you chose to inject #GP rather than, say, requesting a
guest shutdown. Is the guest #GP handler expected to be able to
recover from this?
Brijesh Singh Feb. 12, 2019, 8:06 p.m. UTC | #3
On 2/12/19 9:06 AM, Paolo Bonzini wrote:
> On 12/02/19 15:44, Singh, Brijesh wrote:
>> -	if (unlikely(insn && !insn_len))
>> -		return 1;
>> +	if (unlikely(insn && !insn_len)) {
>> +		if (!kvm_x86_ops->emulate_instruction_possible(vcpu))
>> +			return 1;
>> +	}
> 
> Are the instruction bytes valid, that is can we just ignore insn_len and
> use the bytes but not the length?  That would work for SEV too.
> 


The instruction bytes are not valid so we will not able to workaround
for the SEV.


> Paolo
> 
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 95d618045001..6767bad8367e 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -7530,6 +7530,11 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
>>   	return 0;
>>   }
>>   
>> +static bool emulate_instruction_possible(struct kvm_vcpu *vcpu)
>> +{
>> +	return 1;
> 
> This should be "return 0;" to keep previous behavior.
> 
Sure, I will fix in v2.


>> +}
>> +
>>   static __init int hardware_setup(void)
>>   {
>>   	unsigned long host_bndcfgs;
>> @@ -7832,6 +7837,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>   	.set_nested_state = NULL,
>>   	.get_vmcs12_pages = NULL,
>>   	.nested_enable_evmcs = NULL,
>> +	.emulate_instruction_possible = emulate_instruction_possible,
>>   };
>>   
>
Brijesh Singh Feb. 12, 2019, 8:12 p.m. UTC | #4
On 2/12/19 11:41 AM, Jim Mattson wrote:
> On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>
>> Errata#1090:
>>
>> On a nested data page fault when CR.SMAP=1 and the guest data read
>> generates a SMAP violation, GuestInstrBytes field of the VMCB on a
>> VMEXIT will incorrectly return 0h instead the correct guest
>> instruction bytes .
>>
>> Recommend Workaround:
>>
>> To determine what instruction the guest was executing the hypervisor
>> will have to decode the instruction at the instruction pointer.
>>
>> The recommended workaround can not be implemented for the SEV
>> guest because guest memory is encrypted with the guest specific key,
>> and instruction decoder will not be able to decode the instruction
>> bytes. If we hit this errata in the SEV guest then inject #GP into
>> the guest and log the message.
>>
>> Cc: Jim Mattson <jmattson@google.com>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> 
> This was ...
> Reported-by: Venkatesh Srinivas <venkateshs@google.com>
> 

I will add the tag in next rev.


> I'm curious why you chose to inject #GP rather than, say, requesting a
> guest shutdown. Is the guest #GP handler expected to be able to
> recover from this?
> 


We will *not* be able to recover from this, I wanted to abort the
guest and I should admit that I was not ware of requesting a SHUTDOWN
method so decided to inject #GP so that guest does not continue.
Browsing further, I see that kvm_make_request(KVM_REQ_TRIPLE_FAULT,
vcpu) can be used to request a SHUTDOWN. I will use it in next
rev. thanks for the hint .

-Brijeshh
Venkatesh Srinivas Feb. 12, 2019, 8:38 p.m. UTC | #5
On Tue, Feb 12, 2019 at 12:12 PM Singh, Brijesh <brijesh.singh@amd.com> wrote:
> On 2/12/19 11:41 AM, Jim Mattson wrote:
> > On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote:
> >>
> >> Errata#1090:
> >>
> >> On a nested data page fault when CR.SMAP=1 and the guest data read
> >> generates a SMAP violation, GuestInstrBytes field of the VMCB on a
> >> VMEXIT will incorrectly return 0h instead the correct guest
> >> instruction bytes

Do you mean Errata #1096?
(https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf
v1.12 pg 61)

> >>
> >> Recommend Workaround:
> >>
> >> To determine what instruction the guest was executing the hypervisor
> >> will have to decode the instruction at the instruction pointer.
> >>
> >> The recommended workaround can not be implemented for the SEV
> >> guest because guest memory is encrypted with the guest specific key,
> >> and instruction decoder will not be able to decode the instruction
> >> bytes. If we hit this errata in the SEV guest then inject #GP into
> >> the guest and log the message.
> >>
> >> Cc: Jim Mattson <jmattson@google.com>
> >> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> >> Cc: Borislav Petkov <bp@alien8.de>
> >> Cc: Joerg Roedel <joro@8bytes.org>
> >> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >
> > This was ...
> > Reported-by: Venkatesh Srinivas <venkateshs@google.com>
> >
>
> I will add the tag in next rev.
>
>
> > I'm curious why you chose to inject #GP rather than, say, requesting a
> > guest shutdown. Is the guest #GP handler expected to be able to
> > recover from this?
> >
>
>
> We will *not* be able to recover from this, I wanted to abort the
> guest and I should admit that I was not ware of requesting a SHUTDOWN
> method so decided to inject #GP so that guest does not continue.
> Browsing further, I see that kvm_make_request(KVM_REQ_TRIPLE_FAULT,
> vcpu) can be used to request a SHUTDOWN. I will use it in next
> rev. thanks for the hint .
>
> -Brijeshh

Should the pr_err() be ratelimited? Otherwise a guest suppressing #GP
could spam the host dmesg.

Thanks,
-- vs;
Brijesh Singh Feb. 13, 2019, 12:30 a.m. UTC | #6
On 2/12/19 2:38 PM, Venkatesh Srinivas wrote:
> On Tue, Feb 12, 2019 at 12:12 PM Singh, Brijesh <brijesh.singh@amd.com> wrote:
>> On 2/12/19 11:41 AM, Jim Mattson wrote:
>>> On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>>> Errata#1090:
>>>>
>>>> On a nested data page fault when CR.SMAP=1 and the guest data read
>>>> generates a SMAP violation, GuestInstrBytes field of the VMCB on a
>>>> VMEXIT will incorrectly return 0h instead the correct guest
>>>> instruction bytes
> Do you mean Errata #1096?
> (https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf
> v1.12 pg 61)


Yes, I did noticed after sending the patch. Will fix in next rev.


>>>> Recommend Workaround:
>>>>
>>>> To determine what instruction the guest was executing the hypervisor
>>>> will have to decode the instruction at the instruction pointer.
>>>>
>>>> The recommended workaround can not be implemented for the SEV
>>>> guest because guest memory is encrypted with the guest specific key,
>>>> and instruction decoder will not be able to decode the instruction
>>>> bytes. If we hit this errata in the SEV guest then inject #GP into
>>>> the guest and log the message.
>>>>
>>>> Cc: Jim Mattson <jmattson@google.com>
>>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>>>> Cc: Borislav Petkov <bp@alien8.de>
>>>> Cc: Joerg Roedel <joro@8bytes.org>
>>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>> This was ...
>>> Reported-by: Venkatesh Srinivas <venkateshs@google.com>
>>>
>> I will add the tag in next rev.
>>
>>
>>> I'm curious why you chose to inject #GP rather than, say, requesting a
>>> guest shutdown. Is the guest #GP handler expected to be able to
>>> recover from this?
>>>
>>
>> We will *not* be able to recover from this, I wanted to abort the
>> guest and I should admit that I was not ware of requesting a SHUTDOWN
>> method so decided to inject #GP so that guest does not continue.
>> Browsing further, I see that kvm_make_request(KVM_REQ_TRIPLE_FAULT,
>> vcpu) can be used to request a SHUTDOWN. I will use it in next
>> rev. thanks for the hint .
>>
>> -Brijeshh
> Should the pr_err() be ratelimited? Otherwise a guest suppressing #GP
> could spam the host dmesg.


Agree, ratelimited is good idea to suppressing the spam of host dmesg.
Will use it in next rev. thanks


-Brijesh

> Thanks,
> -- vs;
Paolo Bonzini Feb. 13, 2019, 2:11 p.m. UTC | #7
On 12/02/19 18:41, Jim Mattson wrote:
> On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>
>> Errata#1090:
>>
>> On a nested data page fault when CR.SMAP=1 and the guest data read
>> generates a SMAP violation, GuestInstrBytes field of the VMCB on a
>> VMEXIT will incorrectly return 0h instead the correct guest
>> instruction bytes .
>>
>> Recommend Workaround:
>>
>> To determine what instruction the guest was executing the hypervisor
>> will have to decode the instruction at the instruction pointer.
>>
>> The recommended workaround can not be implemented for the SEV
>> guest because guest memory is encrypted with the guest specific key,
>> and instruction decoder will not be able to decode the instruction
>> bytes. If we hit this errata in the SEV guest then inject #GP into
>> the guest and log the message.

Actually this is not the workaround that KVM is implementing; KVM is
simply retrying the instruction after fixing the page fault.  This would
cause an infinite loop in the guest if the instruction is actually MMIO,
however in that case KVM will get an RSVD page fault rather than a SMAP
page fault and the errata would not be triggered.  So why is this patch
needed?

Thanks,

Paolo
Brijesh Singh Feb. 13, 2019, 4:40 p.m. UTC | #8
Hi Paolo,


On 2/13/19 8:11 AM, Paolo Bonzini wrote:
> On 12/02/19 18:41, Jim Mattson wrote:
>> On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>>
>>> Errata#1090:
>>>
>>> On a nested data page fault when CR.SMAP=1 and the guest data read
>>> generates a SMAP violation, GuestInstrBytes field of the VMCB on a
>>> VMEXIT will incorrectly return 0h instead the correct guest
>>> instruction bytes .
>>>
>>> Recommend Workaround:
>>>
>>> To determine what instruction the guest was executing the hypervisor
>>> will have to decode the instruction at the instruction pointer.
>>>
>>> The recommended workaround can not be implemented for the SEV
>>> guest because guest memory is encrypted with the guest specific key,
>>> and instruction decoder will not be able to decode the instruction
>>> bytes. If we hit this errata in the SEV guest then inject #GP into
>>> the guest and log the message.
> 
> Actually this is not the workaround that KVM is implementing; KVM is
> simply retrying the instruction after fixing the page fault.  This would
> cause an infinite loop in the guest if the instruction is actually MMIO,
> however in that case KVM will get an RSVD page fault rather than a SMAP
> page fault and the errata would not be triggered.  So why is this patch
> needed?
> 


KVM is not getting a SMAP page fault. The HW gets a SMAP fault when
trying to fetch the instruction bytes, and that fault is ignored (and
instead the ucode just doesn't save any instruction bytes). The fault
code to KVM is the RSVD. While handling the RSVD fault, KVM may come to
the code path which requires instruction emulation. At this time we will
check to see if SMAP was enabled and insn_len is zero. If so, we are
hitting this errata. If we skip the emulation and restart the guest
then in this particular case we will get the *same* fault again and
enter into infinite loop. In case of non SEV, we can workaround it by
letting the KVM read the instruction bytes from the guest RIP. But
in case of SEV we will have no choice other than requesting to restart
and thus causing guest an infinite loop. In SEV, my attempt was to
detect this errata and log it and request the SHUTDOWN of the
guest so that we continue further (in v1 I am doing #GP instead
of the SHUTDOWN request, based on Jim's comment I am switching to use
kvm_make_request(...)).

thanks
Brijesh
Paolo Bonzini Feb. 13, 2019, 4:44 p.m. UTC | #9
On 13/02/19 17:40, Singh, Brijesh wrote:
> 
> KVM is not getting a SMAP page fault. The HW gets a SMAP fault when
> trying to fetch the instruction bytes

How does the hardware get a SMAP fault when trying to fetch
instructions?  SMAP faults are only generated for data accesses, not
instructions.  Now I'm even more confused. :)

Paolo

, and that fault is ignored (and
> instead the ucode just doesn't save any instruction bytes). The fault
> code to KVM is the RSVD. While handling the RSVD fault, KVM may come to
> the code path which requires instruction emulation. At this time we will
> check to see if SMAP was enabled and insn_len is zero. If so, we are
> hitting this errata. If we skip the emulation and restart the guest
> then in this particular case we will get the *same* fault again and
> enter into infinite loop. In case of non SEV, we can workaround it by
> letting the KVM read the instruction bytes from the guest RIP. But
> in case of SEV we will have no choice other than requesting to restart
> and thus causing guest an infinite loop. In SEV, my attempt was to
> detect this errata and log it and request the SHUTDOWN of the
> guest so that we continue further (in v1 I am doing #GP instead
> of the SHUTDOWN request, based on Jim's comment I am switching to use
> kvm_make_request(...)).
Brijesh Singh Feb. 13, 2019, 5:19 p.m. UTC | #10
On 2/13/19 10:44 AM, Paolo Bonzini wrote:
> On 13/02/19 17:40, Singh, Brijesh wrote:
>>
>> KVM is not getting a SMAP page fault. The HW gets a SMAP fault when
>> trying to fetch the instruction bytes
> 
> How does the hardware get a SMAP fault when trying to fetch
> instructions?  SMAP faults are only generated for data accesses, not
> instructions.  Now I'm even more confused. :)
> 

I was hoping to clarify instead of causing more confusion :)

Here's is how it works:

1. Guest does the MMIO access which causes a rsvd page fault
2. Hardware processes this as a VMEXIT
3. During the processing, hardware attempts to read the instruction
    bytes. This is done by issuing a data read request from the RIP
    that the guest was at.
4. The result of these reads are then stored in the VMCB.

So in step #3 there can be a SMAP fault because internally the CPU
is doing a data read from the RIP to get these bytes. Hardware didn't
save them from the actual instruction execution or anything, it
actually go and re-read them, which is why this can cause a SMAP
fault.


> Paolo
> 
> , and that fault is ignored (and
>> instead the ucode just doesn't save any instruction bytes). The fault
>> code to KVM is the RSVD. While handling the RSVD fault, KVM may come to
>> the code path which requires instruction emulation. At this time we will
>> check to see if SMAP was enabled and insn_len is zero. If so, we are
>> hitting this errata. If we skip the emulation and restart the guest
>> then in this particular case we will get the *same* fault again and
>> enter into infinite loop. In case of non SEV, we can workaround it by
>> letting the KVM read the instruction bytes from the guest RIP. But
>> in case of SEV we will have no choice other than requesting to restart
>> and thus causing guest an infinite loop. In SEV, my attempt was to
>> detect this errata and log it and request the SHUTDOWN of the
>> guest so that we continue further (in v1 I am doing #GP instead
>> of the SHUTDOWN request, based on Jim's comment I am switching to use
>> kvm_make_request(...)).
>
Paolo Bonzini Feb. 13, 2019, 5:23 p.m. UTC | #11
On 13/02/19 18:19, Singh, Brijesh wrote:
> 1. Guest does the MMIO access which causes a rsvd page fault
> 2. Hardware processes this as a VMEXIT
> 3. During the processing, hardware attempts to read the instruction
>     bytes. This is done by issuing a data read request from the RIP
>     that the guest was at.
> 4. The result of these reads are then stored in the VMCB.
> 
> So in step #3 there can be a SMAP fault because internally the CPU
> is doing a data read from the RIP to get these bytes. Hardware didn't
> save them from the actual instruction execution or anything, it
> actually go and re-read them, which is why this can cause a SMAP
> fault.

What combination of NPF bits, EFLAGS and CR4 is causing a SMAP fault?
Does KVM actually use that combination?

Paolo
Brijesh Singh Feb. 13, 2019, 6:28 p.m. UTC | #12
On 2/13/19 11:23 AM, Paolo Bonzini wrote:
> On 13/02/19 18:19, Singh, Brijesh wrote:
>> 1. Guest does the MMIO access which causes a rsvd page fault
>> 2. Hardware processes this as a VMEXIT
>> 3. During the processing, hardware attempts to read the instruction
>>      bytes. This is done by issuing a data read request from the RIP
>>      that the guest was at.
>> 4. The result of these reads are then stored in the VMCB.
>>
>> So in step #3 there can be a SMAP fault because internally the CPU
>> is doing a data read from the RIP to get these bytes. Hardware didn't
>> save them from the actual instruction execution or anything, it
>> actually go and re-read them, which is why this can cause a SMAP
>> fault.
> 
> What combination of NPF bits, EFLAGS and CR4 is causing a SMAP fault?

e.g

1. Guest has SMAP enabled
2. RIP is mapped in guest page tables with U/S=1

The CPU always does the these insn reads as if CPL=0. In the above case
the insn fetch #3 will cause SMAP fault in the hardware. On VMEXIT,
KVM will get the #NPF with a RSVD fault.


> Does KVM actually use that combination?
>

Sorry, I am not sure if I understand you question. From KVM point-of-
view, we get a #NPF (a RSVD fault).

I have not encountered this in any of my guest runs (typically
we don't do MMIO at CPL=3).

-Brijesh
Paolo Bonzini Feb. 13, 2019, 6:33 p.m. UTC | #13
On 13/02/19 19:28, Singh, Brijesh wrote:
> 
> 
> On 2/13/19 11:23 AM, Paolo Bonzini wrote:
>> On 13/02/19 18:19, Singh, Brijesh wrote:
>>> 1. Guest does the MMIO access which causes a rsvd page fault
>>> 2. Hardware processes this as a VMEXIT
>>> 3. During the processing, hardware attempts to read the instruction
>>>      bytes. This is done by issuing a data read request from the RIP
>>>      that the guest was at.
>>> 4. The result of these reads are then stored in the VMCB.
>>>
>>> So in step #3 there can be a SMAP fault because internally the CPU
>>> is doing a data read from the RIP to get these bytes. Hardware didn't
>>> save them from the actual instruction execution or anything, it
>>> actually go and re-read them, which is why this can cause a SMAP
>>> fault.
>>
>> What combination of NPF bits, EFLAGS and CR4 is causing a SMAP fault?
> 
> e.g
> 
> 1. Guest has SMAP enabled
> 2. RIP is mapped in guest page tables with U/S=1

Ok so that's pretty messed up.  Is this fixed with newer microcode, and
how widespread is this silicon?  I'm tempted to just blacklist SMAP on
the affected steppings.

Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1,
and set CR4.SMAP back to 1 in the #DB handler?

Thanks,

Paolo

> The CPU always does the these insn reads as if CPL=0. In the above case
> the insn fetch #3 will cause SMAP fault in the hardware. On VMEXIT,
> KVM will get the #NPF with a RSVD fault.
> 
>> Does KVM actually use that combination?
> 
> Sorry, I am not sure if I understand you question. From KVM point-of-
> view, we get a #NPF (a RSVD fault).
>
> I have not encountered this in any of my guest runs (typically
> we don't do MMIO at CPL=3).
> 
> -Brijesh
>
Brijesh Singh Feb. 13, 2019, 7:02 p.m. UTC | #14
On 2/13/19 12:33 PM, Paolo Bonzini wrote:
> On 13/02/19 19:28, Singh, Brijesh wrote:
>>
>>
>> On 2/13/19 11:23 AM, Paolo Bonzini wrote:
>>> On 13/02/19 18:19, Singh, Brijesh wrote:
>>>> 1. Guest does the MMIO access which causes a rsvd page fault
>>>> 2. Hardware processes this as a VMEXIT
>>>> 3. During the processing, hardware attempts to read the instruction
>>>>       bytes. This is done by issuing a data read request from the RIP
>>>>       that the guest was at.
>>>> 4. The result of these reads are then stored in the VMCB.
>>>>
>>>> So in step #3 there can be a SMAP fault because internally the CPU
>>>> is doing a data read from the RIP to get these bytes. Hardware didn't
>>>> save them from the actual instruction execution or anything, it
>>>> actually go and re-read them, which is why this can cause a SMAP
>>>> fault.
>>>
>>> What combination of NPF bits, EFLAGS and CR4 is causing a SMAP fault?
>>
>> e.g
>>
>> 1. Guest has SMAP enabled
>> 2. RIP is mapped in guest page tables with U/S=1
> 
> Ok so that's pretty messed up.  Is this fixed with newer microcode, and
> how widespread is this silicon?  I'm tempted to just blacklist SMAP on
> the affected steppings.
> 


This errata is applicable to Family 17h model 00_0fh only, and it is 
definitely planned to fix in the upcoming CPUs. For now, we just need to
workaround for the Fam 17h_00_0Fh.

I am not sure about blacklist SMAP all together, for non SEV its
very easy to workaround. For SEV case, I am just trying to say that
we will *not* workaround (it does not mean that its not possible)


> Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1,
> and set CR4.SMAP back to 1 in the #DB handler?
> 


Theoretically we should be able to do this to workaround. But since
the errata is not wide spread hence I am not sure about going to
this path. Also there are not many apps doing MMIO from userspace
hence I thought its okay to not workaround for the SEV guest.

Having said so, if we find the actual need to workaround then we can
go to that path.

thanks
Paolo Bonzini Feb. 14, 2019, 12:26 p.m. UTC | #15
On 13/02/19 20:02, Singh, Brijesh wrote:
>> Ok so that's pretty messed up.  Is this fixed with newer microcode, and
>> how widespread is this silicon?  I'm tempted to just blacklist SMAP on
>> the affected steppings.
> 
> This errata is applicable to Family 17h model 00_0fh only, and it is 
> definitely planned to fix in the upcoming CPUs. For now, we just need to
> workaround for the Fam 17h_00_0Fh.

So it's on all steppings; no microcode fix. :(

> I am not sure about blacklist SMAP all together, for non SEV its
> very easy to workaround. For SEV case, I am just trying to say that
> we will *not* workaround (it does not mean that its not possible)
>
>> Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1,
>> and set CR4.SMAP back to 1 in the #DB handler?
> 
> Theoretically we should be able to do this to workaround. But since
> the errata is not wide spread hence I am not sure about going to
> this path. Also there are not many apps doing MMIO from userspace
> hence I thought its okay to not workaround for the SEV guest.
> 
> Having said so, if we find the actual need to workaround then we can
> go to that path.

If it's not too hard, having the workaround would be nice(r).

Paolo
Brijesh Singh Feb. 14, 2019, 2:31 p.m. UTC | #16
On 2/14/19 6:26 AM, Paolo Bonzini wrote:
> On 13/02/19 20:02, Singh, Brijesh wrote:
>>> Ok so that's pretty messed up.  Is this fixed with newer microcode, and
>>> how widespread is this silicon?  I'm tempted to just blacklist SMAP on
>>> the affected steppings.
>>
>> This errata is applicable to Family 17h model 00_0fh only, and it is
>> definitely planned to fix in the upcoming CPUs. For now, we just need to
>> workaround for the Fam 17h_00_0Fh.
> 
> So it's on all steppings; no microcode fix. :(

Unfortunately yes :(

> 
>> I am not sure about blacklist SMAP all together, for non SEV its
>> very easy to workaround. For SEV case, I am just trying to say that
>> we will *not* workaround (it does not mean that its not possible)
>>
>>> Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1,
>>> and set CR4.SMAP back to 1 in the #DB handler?
>>
>> Theoretically we should be able to do this to workaround. But since
>> the errata is not wide spread hence I am not sure about going to
>> this path. Also there are not many apps doing MMIO from userspace
>> hence I thought its okay to not workaround for the SEV guest.
>>
>> Having said so, if we find the actual need to workaround then we can
>> go to that path.
> 
> If it's not too hard, having the workaround would be nice(r).
> 

For now I am inclined to go with what we have. Will submit v2
with fixes.

Theoretically we can use the single stepping with CR4.SMAP=0.
Since this is not a suggested workaround in errata doc hence I
am afraid that we may run into some unknown (mainly with SEV);
I will check with HW folks to get some acknowledgment. If we
all feel comfortable then I will submit a follow up patch.

thanks

-Brijesh
Jim Mattson Feb. 14, 2019, 5:31 p.m. UTC | #17
On Thu, Feb 14, 2019 at 6:31 AM Singh, Brijesh <brijesh.singh@amd.com> wrote:
>
>
>
> On 2/14/19 6:26 AM, Paolo Bonzini wrote:
> > On 13/02/19 20:02, Singh, Brijesh wrote:
> >>> Ok so that's pretty messed up.  Is this fixed with newer microcode, and
> >>> how widespread is this silicon?  I'm tempted to just blacklist SMAP on
> >>> the affected steppings.
> >>
> >> This errata is applicable to Family 17h model 00_0fh only, and it is
> >> definitely planned to fix in the upcoming CPUs. For now, we just need to
> >> workaround for the Fam 17h_00_0Fh.
> >
> > So it's on all steppings; no microcode fix. :(
>
> Unfortunately yes :(
>
> >
> >> I am not sure about blacklist SMAP all together, for non SEV its
> >> very easy to workaround. For SEV case, I am just trying to say that
> >> we will *not* workaround (it does not mean that its not possible)
> >>
> >>> Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1,
> >>> and set CR4.SMAP back to 1 in the #DB handler?
> >>
> >> Theoretically we should be able to do this to workaround. But since
> >> the errata is not wide spread hence I am not sure about going to
> >> this path. Also there are not many apps doing MMIO from userspace
> >> hence I thought its okay to not workaround for the SEV guest.
> >>
> >> Having said so, if we find the actual need to workaround then we can
> >> go to that path.
> >
> > If it's not too hard, having the workaround would be nice(r).
> >
>
> For now I am inclined to go with what we have. Will submit v2
> with fixes.
>
> Theoretically we can use the single stepping with CR4.SMAP=0.

SEV allows the hypervisor to override the guest OS's CR4.SMAP
setting?!? That seems counter-intuitive, given SEV's intended use.
Doesn't this potentially give a ring-3 agent in the guest an avenue to
privilege escalation through collusion with the hypervisor?
Paolo Bonzini Feb. 14, 2019, 5:53 p.m. UTC | #18
On 14/02/19 18:31, Jim Mattson wrote:
>> Theoretically we can use the single stepping with CR4.SMAP=0.
> 
> SEV allows the hypervisor to override the guest OS's CR4.SMAP
> setting?!? That seems counter-intuitive, given SEV's intended use.
> Doesn't this potentially give a ring-3 agent in the guest an avenue to
> privilege escalation through collusion with the hypervisor?

The first version does not protect any register content.  See this paper
for an example: https://arxiv.org/pdf/1612.01119.pdf

Paolo
Brijesh Singh Feb. 14, 2019, 5:58 p.m. UTC | #19
On 2/14/19 11:31 AM, Jim Mattson wrote:
...

>>>
>>
>> For now I am inclined to go with what we have. Will submit v2
>> with fixes.
>>
>> Theoretically we can use the single stepping with CR4.SMAP=0.
> 
> SEV allows the hypervisor to override the guest OS's CR4.SMAP
> setting?!? That seems counter-intuitive, given SEV's intended use.
> Doesn't this potentially give a ring-3 agent in the guest an avenue to
> privilege escalation through collusion with the hypervisor?
> 


The guest register state is protected with the SEV-ES feature.
The SEV-ES feature is supported in current HW but its not supported
in KVM yet. We are actively working to add this feature very soon.


-Brijesh
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4660ce90de7f..536acf622af9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1194,6 +1194,8 @@  struct kvm_x86_ops {
 	int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu,
 				   uint16_t *vmcs_version);
 	uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
+
+	bool (*emulate_instruction_possible)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index da9c42349b1f..8ebc1bfcabd3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5384,8 +5384,10 @@  int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 	 * page is not present in memory). In those cases we simply restart the
 	 * guest.
 	 */
-	if (unlikely(insn && !insn_len))
-		return 1;
+	if (unlikely(insn && !insn_len)) {
+		if (!kvm_x86_ops->emulate_instruction_possible(vcpu))
+			return 1;
+	}
 
 	er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f13a3a24d360..1f359667e529 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7098,6 +7098,36 @@  static int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 	return -ENODEV;
 }
 
+static bool emulate_instruction_possible(struct kvm_vcpu *vcpu)
+{
+	bool is_user, smap;
+
+	is_user = svm_get_cpl(vcpu) == 3;
+	smap = !kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
+
+	/*
+	 * Detect and workaround Errata 1090 Fam_17h_00_0Fh
+	 *
+	 * In non SEV guest, hypervisor will be able to read the guest
+	 * memory to decode the instruction pointer when insn_len is zero
+	 * so we return true to indicate that decoding is possible.
+	 *
+	 * But in the SEV guest, the guest memory is encrypted with the
+	 * guest specific key and hypervisor will not be able to decode the
+	 * instruction pointer so we will not able to workaround it. Lets
+	 * print the error and inject a #GP in the guest.
+	 */
+	if (is_user && smap) {
+		if (!sev_guest(vcpu->kvm))
+			return true;
+
+		pr_err("ERROR: encountered errata #1090, injecting #GP\n");
+		kvm_inject_gp(vcpu, 0);
+	}
+
+	return false;
+}
+
 static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -7231,6 +7261,8 @@  static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 
 	.nested_enable_evmcs = nested_enable_evmcs,
 	.nested_get_evmcs_version = nested_get_evmcs_version,
+
+	.emulate_instruction_possible = emulate_instruction_possible,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 95d618045001..6767bad8367e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7530,6 +7530,11 @@  static int enable_smi_window(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static bool emulate_instruction_possible(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+
 static __init int hardware_setup(void)
 {
 	unsigned long host_bndcfgs;
@@ -7832,6 +7837,7 @@  static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.set_nested_state = NULL,
 	.get_vmcs12_pages = NULL,
 	.nested_enable_evmcs = NULL,
+	.emulate_instruction_possible = emulate_instruction_possible,
 };
 
 static void vmx_cleanup_l1d_flush(void)