diff mbox series

[1/2] KVM: SVM: Fix workaround for AMD Errata 1096

Message ID 20190715203043.100483-2-liran.alon@oracle.com (mailing list archive)
State New, archived
Headers show
Series [1/2] KVM: SVM: Fix workaround for AMD Errata 1096 | expand

Commit Message

Liran Alon July 15, 2019, 8:30 p.m. UTC
According to AMD Errata 1096:
"On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
bytes."

As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.

To avoid future confusion and improve code readbility, comment errata details in code and not
just in commit message.

Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 arch/x86/kvm/svm.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Comments

Brijesh Singh July 16, 2019, 3:48 p.m. UTC | #1
On 7/15/19 3:30 PM, Liran Alon wrote:
> According to AMD Errata 1096:
> "On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
> GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
> bytes."
> 
> As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
> with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.
> 

The SMAP violation will occur from CPL3 so CPL==3 is a valid check.

See [1] for complete discussion

https://patchwork.kernel.org/patch/10808075/#22479271

However, code mistakenly checked CR4.SMAP==0, it should be CR4.SMAP==1

> To avoid future confusion and improve code readbility, comment errata details in code and not
> just in commit message.
> 
> Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
>   arch/x86/kvm/svm.c | 17 +++++++++++++----
>   1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 735b8c01895e..79023a41f7a7 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7123,10 +7123,19 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
>   	bool is_user, smap;
>   
>   	is_user = svm_get_cpl(vcpu) == 3;
> -	smap = !kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
> +	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
>   

Ah good catch. thank


>   	/*
> -	 * Detect and workaround Errata 1096 Fam_17h_00_0Fh
> +	 * Detect and workaround Errata 1096 Fam_17h_00_0Fh.
> +	 *
> +	 * Errata:
> +	 * On a nested page fault when CR4.SMAP=1 and the guest data read generates
> +	 * a SMAP violation, GuestIntrBytes field of the VMCB on a VMEXIT will
> +	 * incorrectly return 0 instead the correct guest instruction bytes.
> +	 *
> +	 * Workaround:
> +	 * To determine what instruction the guest was executing, the hypervisor
> +	 * will have to decode the instruction at the instruction pointer.
>   	 *
>   	 * In non SEV guest, hypervisor will be able to read the guest
>   	 * memory to decode the instruction pointer when insn_len is zero
> @@ -7137,11 +7146,11 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
>   	 * instruction pointer so we will not able to workaround it. Lets
>   	 * print the error and request to kill the guest.
>   	 */
> -	if (is_user && smap) {
> +	if (!is_user && smap) {
>   		if (!sev_guest(vcpu->kvm))
>   			return true;
>   
> -		pr_err_ratelimited("KVM: Guest triggered AMD Erratum 1096\n");
> +		pr_err_ratelimited("KVM: SEV Guest triggered AMD Erratum 1096\n");
>   		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>   	}
>   
>
Liran Alon July 16, 2019, 3:56 p.m. UTC | #2
> On 16 Jul 2019, at 18:48, Singh, Brijesh <brijesh.singh@amd.com> wrote:
> 
> On 7/15/19 3:30 PM, Liran Alon wrote:
>> According to AMD Errata 1096:
>> "On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
>> GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
>> bytes."
>> 
>> As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
>> with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.
>> 
> 
> The SMAP violation will occur from CPL3 so CPL==3 is a valid check.
> 
> See [1] for complete discussion
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10808075_-2322479271&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=RAt8t8nBaCxUPy5OTDkO0n8BMQ5l9oSfLMiL0TLTu6c&s=Nkwe8rTJhygBCIPz27LXrylptjnWyMwB-nJaiowWpWc&e= 

I still don’t understand. SMAP is a mechanism which is meant to protect a CPU running in CPL<3 from mistakenly referencing data controllable by CPL==3.
Therefore, SMAP violation should be raised when CPL<3 and data referenced is mapped in page-tables with PTE with U/S bit set to 1. (i.e. User accessible).

Thus, we should check if CPL<3 and CR4.SMAP==1.

-Liran

> 
> However, code mistakenly checked CR4.SMAP==0, it should be CR4.SMAP==1
> 
>> To avoid future confusion and improve code readbility, comment errata details in code and not
>> just in commit message.
>> 
>> Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> ---
>>  arch/x86/kvm/svm.c | 17 +++++++++++++----
>>  1 file changed, 13 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 735b8c01895e..79023a41f7a7 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -7123,10 +7123,19 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
>>  	bool is_user, smap;
>> 
>>  	is_user = svm_get_cpl(vcpu) == 3;
>> -	smap = !kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
>> +	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
>> 
> 
> Ah good catch. thank
> 
> 
>>  	/*
>> -	 * Detect and workaround Errata 1096 Fam_17h_00_0Fh
>> +	 * Detect and workaround Errata 1096 Fam_17h_00_0Fh.
>> +	 *
>> +	 * Errata:
>> +	 * On a nested page fault when CR4.SMAP=1 and the guest data read generates
>> +	 * a SMAP violation, GuestIntrBytes field of the VMCB on a VMEXIT will
>> +	 * incorrectly return 0 instead the correct guest instruction bytes.
>> +	 *
>> +	 * Workaround:
>> +	 * To determine what instruction the guest was executing, the hypervisor
>> +	 * will have to decode the instruction at the instruction pointer.
>>  	 *
>>  	 * In non SEV guest, hypervisor will be able to read the guest
>>  	 * memory to decode the instruction pointer when insn_len is zero
>> @@ -7137,11 +7146,11 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
>>  	 * instruction pointer so we will not able to workaround it. Lets
>>  	 * print the error and request to kill the guest.
>>  	 */
>> -	if (is_user && smap) {
>> +	if (!is_user && smap) {
>>  		if (!sev_guest(vcpu->kvm))
>>  			return true;
>> 
>> -		pr_err_ratelimited("KVM: Guest triggered AMD Erratum 1096\n");
>> +		pr_err_ratelimited("KVM: SEV Guest triggered AMD Erratum 1096\n");
>>  		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>>  	}
>> 
>>
Liran Alon July 16, 2019, 4:07 p.m. UTC | #3
> On 16 Jul 2019, at 18:56, Liran Alon <liran.alon@oracle.com> wrote:
> 
> 
> 
>> On 16 Jul 2019, at 18:48, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>> 
>> On 7/15/19 3:30 PM, Liran Alon wrote:
>>> According to AMD Errata 1096:
>>> "On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
>>> GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
>>> bytes."
>>> 
>>> As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
>>> with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.
>>> 
>> 
>> The SMAP violation will occur from CPL3 so CPL==3 is a valid check.
>> 
>> See [1] for complete discussion
>> 
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10808075_-2322479271&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=RAt8t8nBaCxUPy5OTDkO0n8BMQ5l9oSfLMiL0TLTu6c&s=Nkwe8rTJhygBCIPz27LXrylptjnWyMwB-nJaiowWpWc&e= 
> 
> I still don’t understand. SMAP is a mechanism which is meant to protect a CPU running in CPL<3 from mistakenly referencing data controllable by CPL==3.
> Therefore, SMAP violation should be raised when CPL<3 and data referenced is mapped in page-tables with PTE with U/S bit set to 1. (i.e. User accessible).
> 
> Thus, we should check if CPL<3 and CR4.SMAP==1.
> 
> -Liran
> 

To clarify, I would assume that to simulate this Errata we should perform the following:
1) Guest maps code in page-tables as user-accessible (i.e. PTE with U/S bit set to 1).
2) Guest executes this code with CPL<3 (even though mapped as user-accessible which is a security vulnerability in itself…) which access data that is not mapped or marked as reserved in NPT and therefore cause #NPF.
3) Physical CPU DecodeAssist feature attempts to fill-in guest instruction bytes. So it reads as data the guest instructions while CPU is currently with CPL<3, CR4.SMAP=1 and code is mapped as user-accessible. Therefore, this fill-in raise a SMAP violation which cause #NPF to be raised to KVM with 0 instruction bytes.

Am I mistaken in my analysis?

-Liran
Brijesh Singh July 16, 2019, 4:10 p.m. UTC | #4
On 7/16/19 10:56 AM, Liran Alon wrote:
> 
> 
>> On 16 Jul 2019, at 18:48, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>
>> On 7/15/19 3:30 PM, Liran Alon wrote:
>>> According to AMD Errata 1096:
>>> "On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
>>> GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
>>> bytes."
>>>
>>> As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
>>> with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.
>>>
>>
>> The SMAP violation will occur from CPL3 so CPL==3 is a valid check.
>>
>> See [1] for complete discussion
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10808075_-2322479271&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=RAt8t8nBaCxUPy5OTDkO0n8BMQ5l9oSfLMiL0TLTu6c&s=Nkwe8rTJhygBCIPz27LXrylptjnWyMwB-nJaiowWpWc&e=
> 
> I still don’t understand. SMAP is a mechanism which is meant to protect a CPU running in CPL<3 from mistakenly referencing data controllable by CPL==3.
> Therefore, SMAP violation should be raised when CPL<3 and data referenced is mapped in page-tables with PTE with U/S bit set to 1. (i.e. User accessible).
> 
> Thus, we should check if CPL<3 and CR4.SMAP==1.
> 


In this particular case we are dealing with NPF and not SMAP fault per
say.

What typically has happened here is:

- user space does the MMIO access which causes a fault
- hardware processes this as a VMEXIT
- during processing, hardware attempts to read the instruction bytes to
provide decode assist. This is typically done by data read request from
the RIP that the guest was at. While doing so, we may hit SMAP fault
because internally CPU is doing a data read from the RIP to get those
instruction bytes. Since it hit the SMAP fault hence it was not able
to decode the instruction to provide the insn_len. So we are first
checking if it was a fault caused from CPL==3 and SMAP is enabled.
If so, we are hitting this errata and it can be workaround.

-Brijesh
Liran Alon July 16, 2019, 4:20 p.m. UTC | #5
> On 16 Jul 2019, at 19:10, Singh, Brijesh <brijesh.singh@amd.com> wrote:
> 
> 
> 
> On 7/16/19 10:56 AM, Liran Alon wrote:
>> 
>> 
>>> On 16 Jul 2019, at 18:48, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>> 
>>> On 7/15/19 3:30 PM, Liran Alon wrote:
>>>> According to AMD Errata 1096:
>>>> "On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
>>>> GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
>>>> bytes."
>>>> 
>>>> As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
>>>> with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.
>>>> 
>>> 
>>> The SMAP violation will occur from CPL3 so CPL==3 is a valid check.
>>> 
>>> See [1] for complete discussion
>>> 
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10808075_-2322479271&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=RAt8t8nBaCxUPy5OTDkO0n8BMQ5l9oSfLMiL0TLTu6c&s=Nkwe8rTJhygBCIPz27LXrylptjnWyMwB-nJaiowWpWc&e=
>> 
>> I still don’t understand. SMAP is a mechanism which is meant to protect a CPU running in CPL<3 from mistakenly referencing data controllable by CPL==3.
>> Therefore, SMAP violation should be raised when CPL<3 and data referenced is mapped in page-tables with PTE with U/S bit set to 1. (i.e. User accessible).
>> 
>> Thus, we should check if CPL<3 and CR4.SMAP==1.
>> 
> 
> In this particular case we are dealing with NPF and not SMAP fault per
> say.
> 
> What typically has happened here is:
> 
> - user space does the MMIO access which causes a fault
> - hardware processes this as a VMEXIT
> - during processing, hardware attempts to read the instruction bytes to
> provide decode assist. This is typically done by data read request from
> the RIP that the guest was at. While doing so, we may hit SMAP fault

How can a SMAP fault occur when CPL==3? One of the conditions for SMAP is that CPL<3.

I think the confusion is that I believe a code mapped as user-accessible in page-tables but runs with CPL<3
should be the one which does the MMIO. Rather then code running in CPL==3.

The sequence of events I imagine to trigger the Errata is as follows:
1) Guest maps code in page-tables as user-accessible (i.e. PTE with U/S bit set to 1).
2) Guest executes this code with CPL<3 (even though mapped as user-accessible which is a security vulnerability in itself…) which access data that is not mapped or marked as reserved in NPT and therefore cause #NPF.
3) Physical CPU DecodeAssist feature attempts to fill-in guest instruction bytes. So it reads as data the guest instructions while CPU is currently with CPL<3, CR4.SMAP=1 and code is mapped as user-accessible. Therefore, this fill-in raise a SMAP violation which cause #NPF to be raised to KVM with 0 instruction bytes.

BTW, this also means that in order to trigger this, CR4.SMEP should be set to 0. As otherwise, instruction couldn’t have been executed to raise #NPF in the first place. Maybe we can add this as another condition to recognise the Errata?

-Liran

> because internally CPU is doing a data read from the RIP to get those
> instruction bytes. Since it hit the SMAP fault hence it was not able
> to decode the instruction to provide the insn_len. So we are first
> checking if it was a fault caused from CPL==3 and SMAP is enabled.
> If so, we are hitting this errata and it can be workaround.
> 
> -Brijesh
> 
> 
>
Sean Christopherson July 16, 2019, 4:41 p.m. UTC | #6
On Tue, Jul 16, 2019 at 07:20:42PM +0300, Liran Alon wrote:
> How can a SMAP fault occur when CPL==3? One of the conditions for SMAP is
> that CPL<3.

The CPU is effectively at CPL0 when it does the decode assist, e.g.:

  1. CPL3 guest hits reserved bit NPT fault (MMIO access)
  2. CPU transitions to CPL0 on VM-Exit
  3. CPU performs data access on **%rip**, encounters SMAP violation
  4. CPU squashes SMAP violation, sets VMCB.insn_len=0
  5. CPU delivers VM-Exit to software for original NPT fault

The original NPT fault is due to a reserved bit (or not present) entry for
a MMIO GPA, *not* the GPA corresponding to %rip.  The fault on the decode
assist is never delivered to software, it simply results in having invalid
info in the VMCB's insn_bytes and insn_len fields.
Liran Alon July 16, 2019, 4:56 p.m. UTC | #7
> On 16 Jul 2019, at 19:41, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Tue, Jul 16, 2019 at 07:20:42PM +0300, Liran Alon wrote:
>> How can a SMAP fault occur when CPL==3? One of the conditions for SMAP is
>> that CPL<3.
> 
> The CPU is effectively at CPL0 when it does the decode assist, e.g.:
> 
>  1. CPL3 guest hits reserved bit NPT fault (MMIO access)
>  2. CPU transitions to CPL0 on VM-Exit
>  3. CPU performs data access on **%rip**, encounters SMAP violation
>  4. CPU squashes SMAP violation, sets VMCB.insn_len=0
>  5. CPU delivers VM-Exit to software for original NPT fault
> 
> The original NPT fault is due to a reserved bit (or not present) entry for
> a MMIO GPA, *not* the GPA corresponding to %rip.  The fault on the decode
> assist is never delivered to software, it simply results in having invalid
> info in the VMCB's insn_bytes and insn_len fields.

If the CPU performs the VMExit transition of state before doing the data read for DecodeAssist,
then I agree that CPL will be 0 on data-access regardless of vCPU CPL. But this also means that SMAP
violation should be raised based on host CR4.SMAP value and not vCPU CR4.SMAP value as KVM code checks.

Furthermore, vCPU CPL of guest doesn’t need to be 3 in order to trigger this Errata.
It’s only important that guest page-tables maps the guest RIP as user-accessible. i.e. U/S bit in PTE set to 1.

So I’m still left a bit confused rather the correctness of KVM code handling this Errata.

-Liran
Sean Christopherson July 16, 2019, 5:27 p.m. UTC | #8
On Tue, Jul 16, 2019 at 07:56:31PM +0300, Liran Alon wrote:
> 
> > On 16 Jul 2019, at 19:41, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > 
> > On Tue, Jul 16, 2019 at 07:20:42PM +0300, Liran Alon wrote:
> >> How can a SMAP fault occur when CPL==3? One of the conditions for SMAP is
> >> that CPL<3.
> > 
> > The CPU is effectively at CPL0 when it does the decode assist, e.g.:
> > 
> >  1. CPL3 guest hits reserved bit NPT fault (MMIO access)
> >  2. CPU transitions to CPL0 on VM-Exit
> >  3. CPU performs data access on **%rip**, encounters SMAP violation
> >  4. CPU squashes SMAP violation, sets VMCB.insn_len=0
> >  5. CPU delivers VM-Exit to software for original NPT fault
> > 
> > The original NPT fault is due to a reserved bit (or not present) entry for
> > a MMIO GPA, *not* the GPA corresponding to %rip.  The fault on the decode
> > assist is never delivered to software, it simply results in having invalid
> > info in the VMCB's insn_bytes and insn_len fields.
> 
> If the CPU performs the VMExit transition of state before doing the data read
> for DecodeAssist, then I agree that CPL will be 0 on data-access regardless
> of vCPU CPL. But this also means that SMAP violation should be raised based
> on host CR4.SMAP value and not vCPU CR4.SMAP value as KVM code checks.
> Furthermore, vCPU CPL of guest doesn’t need to be 3 in order to trigger this
> Errata.

Doh, #2 above is likely wrong.  My *guess* is that the DecodeAssist walk
starts with %rip, i.e. does a NPT walk using the guest context, and that
the access is always an implicit system (CPL0) access.

I'll get out of the way and let Brijesh answer :-)
Paolo Bonzini July 16, 2019, 5:27 p.m. UTC | #9
On 16/07/19 18:56, Liran Alon wrote:
> If the CPU performs the VMExit transition of state before doing the data read for DecodeAssist,
> then I agree that CPL will be 0 on data-access regardless of vCPU CPL. But this also means that SMAP
> violation should be raised based on host CR4.SMAP value and not vCPU CR4.SMAP value as KVM code checks.
> 
> Furthermore, vCPU CPL of guest doesn’t need to be 3 in order to trigger this Errata.

Under the conditions in the code, if CPL were <3 then the SMAP fault
would have been sent to the guest.  But I agree that if we need to
change it to check host CR4, then the CPL of the guest should not be
checked.

Paolo

> It’s only important that guest page-tables maps the guest RIP as user-accessible. i.e. U/S bit in PTE set to 1.
Liran Alon July 16, 2019, 5:35 p.m. UTC | #10
> On 16 Jul 2019, at 20:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 16/07/19 18:56, Liran Alon wrote:
>> If the CPU performs the VMExit transition of state before doing the data read for DecodeAssist,
>> then I agree that CPL will be 0 on data-access regardless of vCPU CPL. But this also means that SMAP
>> violation should be raised based on host CR4.SMAP value and not vCPU CR4.SMAP value as KVM code checks.
>> 
>> Furthermore, vCPU CPL of guest doesn’t need to be 3 in order to trigger this Errata.
> 
> Under the conditions in the code, if CPL were <3 then the SMAP fault
> would have been sent to the guest.
> But I agree that if we need to
> change it to check host CR4, then the CPL of the guest should not be
> checked.

Yep.
Well it all depends on how AMD CPU actually works.
We need some clarification from AMD but for sure the current code in KVM is not only wrong, but probably have never been tested. :P

Looking for further clarifications from AMD before submitting v2…

-Liran

> 
> Paolo
> 
>> It’s only important that guest page-tables maps the guest RIP as user-accessible. i.e. U/S bit in PTE set to 1.
>
Brijesh Singh July 16, 2019, 6:05 p.m. UTC | #11
Here is a thread.. but more recent is available

https://marc.info/?t=156322283300001&r=1&w=2

Paolo, Sean and others have also replied to it which you can see on 
marc.info.

-Brijesh

On 7/16/19 11:20 AM, Liran Alon wrote:
> 
> 
>> On 16 Jul 2019, at 19:10, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>
>>
>>
>> On 7/16/19 10:56 AM, Liran Alon wrote:
>>>
>>>
>>>> On 16 Jul 2019, at 18:48, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>>>
>>>> On 7/15/19 3:30 PM, Liran Alon wrote:
>>>>> According to AMD Errata 1096:
>>>>> "On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
>>>>> GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
>>>>> bytes."
>>>>>
>>>>> As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
>>>>> with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.
>>>>>
>>>>
>>>> The SMAP violation will occur from CPL3 so CPL==3 is a valid check.
>>>>
>>>> See [1] for complete discussion
>>>>
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10808075_-2322479271&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=RAt8t8nBaCxUPy5OTDkO0n8BMQ5l9oSfLMiL0TLTu6c&s=Nkwe8rTJhygBCIPz27LXrylptjnWyMwB-nJaiowWpWc&e=
>>>
>>> I still don’t understand. SMAP is a mechanism which is meant to protect a CPU running in CPL<3 from mistakenly referencing data controllable by CPL==3.
>>> Therefore, SMAP violation should be raised when CPL<3 and data referenced is mapped in page-tables with PTE with U/S bit set to 1. (i.e. User accessible).
>>>
>>> Thus, we should check if CPL<3 and CR4.SMAP==1.
>>>
>>
>> In this particular case we are dealing with NPF and not SMAP fault per
>> say.
>>
>> What typically has happened here is:
>>
>> - user space does the MMIO access which causes a fault
>> - hardware processes this as a VMEXIT
>> - during processing, hardware attempts to read the instruction bytes to
>> provide decode assist. This is typically done by data read request from
>> the RIP that the guest was at. While doing so, we may hit SMAP fault
> 
> How can a SMAP fault occur when CPL==3? One of the conditions for SMAP is that CPL<3.
> 
> I think the confusion is that I believe a code mapped as user-accessible in page-tables but runs with CPL<3
> should be the one which does the MMIO. Rather then code running in CPL==3.
> 
> The sequence of events I imagine to trigger the Errata is as follows:
> 1) Guest maps code in page-tables as user-accessible (i.e. PTE with U/S bit set to 1).
> 2) Guest executes this code with CPL<3 (even though mapped as user-accessible which is a security vulnerability in itself…) which access data that is not mapped or marked as reserved in NPT and therefore cause #NPF.
> 3) Physical CPU DecodeAssist feature attempts to fill-in guest instruction bytes. So it reads as data the guest instructions while CPU is currently with CPL<3, CR4.SMAP=1 and code is mapped as user-accessible. Therefore, this fill-in raise a SMAP violation which cause #NPF to be raised to KVM with 0 instruction bytes.
> 
> BTW, this also means that in order to trigger this, CR4.SMEP should be set to 0. As otherwise, instruction couldn’t have been executed to raise #NPF in the first place. Maybe we can add this as another condition to recognise the Errata?
> 
> -Liran
> 
>> because internally CPU is doing a data read from the RIP to get those
>> instruction bytes. Since it hit the SMAP fault hence it was not able
>> to decode the instruction to provide the insn_len. So we are first
>> checking if it was a fault caused from CPL==3 and SMAP is enabled.
>> If so, we are hitting this errata and it can be workaround.
>>
>> -Brijesh
>>
>>
>>
>
Brijesh Singh July 16, 2019, 6:06 p.m. UTC | #12
Ooops sorry, I was fwding thread HW folks to correct my understanding.
I will update you with result. thanks


On 7/16/19 1:05 PM, Singh, Brijesh wrote:
> Here is a thread.. but more recent is available
> 
> https://marc.info/?t=156322283300001&r=1&w=2
> 
> Paolo, Sean and others have also replied to it which you can see on
> marc.info.
> 
> -Brijesh
> 
> On 7/16/19 11:20 AM, Liran Alon wrote:
>>
>>
>>> On 16 Jul 2019, at 19:10, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>>
>>>
>>>
>>> On 7/16/19 10:56 AM, Liran Alon wrote:
>>>>
>>>>
>>>>> On 16 Jul 2019, at 18:48, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>>>>
>>>>> On 7/15/19 3:30 PM, Liran Alon wrote:
>>>>>> According to AMD Errata 1096:
>>>>>> "On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
>>>>>> GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
>>>>>> bytes."
>>>>>>
>>>>>> As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
>>>>>> with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.
>>>>>>
>>>>>
>>>>> The SMAP violation will occur from CPL3 so CPL==3 is a valid check.
>>>>>
>>>>> See [1] for complete discussion
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10808075_-2322479271&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=RAt8t8nBaCxUPy5OTDkO0n8BMQ5l9oSfLMiL0TLTu6c&s=Nkwe8rTJhygBCIPz27LXrylptjnWyMwB-nJaiowWpWc&e=
>>>>
>>>> I still don’t understand. SMAP is a mechanism which is meant to protect a CPU running in CPL<3 from mistakenly referencing data controllable by CPL==3.
>>>> Therefore, SMAP violation should be raised when CPL<3 and data referenced is mapped in page-tables with PTE with U/S bit set to 1. (i.e. User accessible).
>>>>
>>>> Thus, we should check if CPL<3 and CR4.SMAP==1.
>>>>
>>>
>>> In this particular case we are dealing with NPF and not SMAP fault per
>>> say.
>>>
>>> What typically has happened here is:
>>>
>>> - user space does the MMIO access which causes a fault
>>> - hardware processes this as a VMEXIT
>>> - during processing, hardware attempts to read the instruction bytes to
>>> provide decode assist. This is typically done by data read request from
>>> the RIP that the guest was at. While doing so, we may hit SMAP fault
>>
>> How can a SMAP fault occur when CPL==3? One of the conditions for SMAP is that CPL<3.
>>
>> I think the confusion is that I believe a code mapped as user-accessible in page-tables but runs with CPL<3
>> should be the one which does the MMIO. Rather then code running in CPL==3.
>>
>> The sequence of events I imagine to trigger the Errata is as follows:
>> 1) Guest maps code in page-tables as user-accessible (i.e. PTE with U/S bit set to 1).
>> 2) Guest executes this code with CPL<3 (even though mapped as user-accessible which is a security vulnerability in itself…) which access data that is not mapped or marked as reserved in NPT and therefore cause #NPF.
>> 3) Physical CPU DecodeAssist feature attempts to fill-in guest instruction bytes. So it reads as data the guest instructions while CPU is currently with CPL<3, CR4.SMAP=1 and code is mapped as user-accessible. Therefore, this fill-in raise a SMAP violation which cause #NPF to be raised to KVM with 0 instruction bytes.
>>
>> BTW, this also means that in order to trigger this, CR4.SMEP should be set to 0. As otherwise, instruction couldn’t have been executed to raise #NPF in the first place. Maybe we can add this as another condition to recognise the Errata?
>>
>> -Liran
>>
>>> because internally CPU is doing a data read from the RIP to get those
>>> instruction bytes. Since it hit the SMAP fault hence it was not able
>>> to decode the instruction to provide the insn_len. So we are first
>>> checking if it was a fault caused from CPL==3 and SMAP is enabled.
>>> If so, we are hitting this errata and it can be workaround.
>>>
>>> -Brijesh
>>>
>>>
>>>
>>
Brijesh Singh July 16, 2019, 7:28 p.m. UTC | #13
On 7/16/19 12:35 PM, Liran Alon wrote:
> 
> 
>> On 16 Jul 2019, at 20:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 16/07/19 18:56, Liran Alon wrote:
>>> If the CPU performs the VMExit transition of state before doing the data read for DecodeAssist,
>>> then I agree that CPL will be 0 on data-access regardless of vCPU CPL. But this also means that SMAP
>>> violation should be raised based on host CR4.SMAP value and not vCPU CR4.SMAP value as KVM code checks.
>>>
>>> Furthermore, vCPU CPL of guest doesn’t need to be 3 in order to trigger this Errata.
>>
>> Under the conditions in the code, if CPL were <3 then the SMAP fault
>> would have been sent to the guest.
>> But I agree that if we need to
>> change it to check host CR4, then the CPL of the guest should not be
>> checked.
> 
> Yep.
> Well it all depends on how AMD CPU actually works.
> We need some clarification from AMD but for sure the current code in KVM is not only wrong, but probably have never been tested. :P
> 
> Looking for further clarifications from AMD before submitting v2…
> 

When this errata is hit, the CPU will be at CPL3. From hardware
point-of-view the below sequence happens:

1. CPL3 guest hits reserved bit NPT fault (MMIO access)

2. Microcode uses special opcode which attempts to read data using the
CPL0 privileges. The microcode read CS:RIP, when it hits SMAP fault,
it gives up and returns no instruction bytes.

(Note: vCPU is still at CPL3)

3. CPU causes #VMEXIT for original fault address.

The SMAP fault occurred while we are still in guest context. It will be
nice to have code test example to triggers this errata.

> -Liran
> 
>>
>> Paolo
>>
>>> It’s only important that guest page-tables maps the guest RIP as user-accessible. i.e. U/S bit in PTE set to 1.
>>
>
Liran Alon July 16, 2019, 7:34 p.m. UTC | #14
> On 16 Jul 2019, at 22:28, Singh, Brijesh <brijesh.singh@amd.com> wrote:
> 
> 
> 
> On 7/16/19 12:35 PM, Liran Alon wrote:
>> 
>> 
>>> On 16 Jul 2019, at 20:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> 
>>> On 16/07/19 18:56, Liran Alon wrote:
>>>> If the CPU performs the VMExit transition of state before doing the data read for DecodeAssist,
>>>> then I agree that CPL will be 0 on data-access regardless of vCPU CPL. But this also means that SMAP
>>>> violation should be raised based on host CR4.SMAP value and not vCPU CR4.SMAP value as KVM code checks.
>>>> 
>>>> Furthermore, vCPU CPL of guest doesn’t need to be 3 in order to trigger this Errata.
>>> 
>>> Under the conditions in the code, if CPL were <3 then the SMAP fault
>>> would have been sent to the guest.
>>> But I agree that if we need to
>>> change it to check host CR4, then the CPL of the guest should not be
>>> checked.
>> 
>> Yep.
>> Well it all depends on how AMD CPU actually works.
>> We need some clarification from AMD but for sure the current code in KVM is not only wrong, but probably have never been tested. :P
>> 
>> Looking for further clarifications from AMD before submitting v2…
>> 
> 
> When this errata is hit, the CPU will be at CPL3. From hardware
> point-of-view the below sequence happens:
> 
> 1. CPL3 guest hits reserved bit NPT fault (MMIO access)

Why CPU needs to be at CPL3?
The requirement for SMAP should be that this page is user-accessible in guest page-tables.
Think on a case where guest have CR4.SMAP=1 and CR4.SMEP=0.

> 
> 2. Microcode uses special opcode which attempts to read data using the
> CPL0 privileges. The microcode read CS:RIP, when it hits SMAP fault,
> it gives up and returns no instruction bytes.
> 
> (Note: vCPU is still at CPL3)

So at this point guest vCPU CR4.SMAP is what matters right? Not host CR4.SMAP.

> 
> 3. CPU causes #VMEXIT for original fault address.
> 
> The SMAP fault occurred while we are still in guest context. It will be
> nice to have code test example to triggers this errata.

I can write such code in kvm-unit-tests for you to run on relevant hardware if you have such a machine present.
I don’t have relevant machine with me and therefore I wrote a disclaimer I couldn’t test it in cover letter.

So to sum-up what KVM needs to do:
1) Check guest vCPU CR4.SMAP is set to 1. (As I fixed in this commit).
2) Remove the check for CPL==3. If we really want to be pedantic, we can parse guest page-tables to see if PTE have U/S bit set to 1.
What do you think?

-Liran

> 
>> -Liran
>> 
>>> 
>>> Paolo
>>> 
>>>> It’s only important that guest page-tables maps the guest RIP as user-accessible. i.e. U/S bit in PTE set to 1.
>>> 
>>
Paolo Bonzini July 16, 2019, 7:39 p.m. UTC | #15
On 16/07/19 21:34, Liran Alon wrote:
>> When this errata is hit, the CPU will be at CPL3. From hardware
>> point-of-view the below sequence happens:
>>
>> 1. CPL3 guest hits reserved bit NPT fault (MMIO access)
> Why CPU needs to be at CPL3?
> The requirement for SMAP should be that this page is user-accessible in guest page-tables.
> Think on a case where guest have CR4.SMAP=1 and CR4.SMEP=0.
> 

If you are not at CPL3, you'd get a SMAP NPF, not a RSVD NPF.

Paolo
Sean Christopherson July 16, 2019, 7:41 p.m. UTC | #16
On Tue, Jul 16, 2019 at 10:34:08PM +0300, Liran Alon wrote:
> If we really want to be pedantic, we can parse guest page-tables to see if PTE
> have U/S bit set to 1.  What do you think?

Performance aside, walking the guest page tables would fall apart if a
different vCPU modified the guest's page tables.
Sean Christopherson July 16, 2019, 7:45 p.m. UTC | #17
On Tue, Jul 16, 2019 at 09:39:48PM +0200, Paolo Bonzini wrote:
> On 16/07/19 21:34, Liran Alon wrote:
> >> When this errata is hit, the CPU will be at CPL3. From hardware
> >> point-of-view the below sequence happens:
> >>
> >> 1. CPL3 guest hits reserved bit NPT fault (MMIO access)
> > Why CPU needs to be at CPL3?
> > The requirement for SMAP should be that this page is user-accessible in guest page-tables.
> > Think on a case where guest have CR4.SMAP=1 and CR4.SMEP=0.
> > 
> 
> If you are not at CPL3, you'd get a SMAP NPF, not a RSVD NPF.

I think Liran is right.  When software is executing, the %rip access is
a code fetch (SMEP), but the ucode assist is a data access (SMAP).

This likely has only been observed in a CPL3 scenario because no sane OS
exercises the case of the kernel executing from a user page with SMAP=1
and SMEP=0.
Liran Alon July 16, 2019, 7:47 p.m. UTC | #18
> On 16 Jul 2019, at 22:39, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 16/07/19 21:34, Liran Alon wrote:
>>> When this errata is hit, the CPU will be at CPL3. From hardware
>>> point-of-view the below sequence happens:
>>> 
>>> 1. CPL3 guest hits reserved bit NPT fault (MMIO access)
>> Why CPU needs to be at CPL3?
>> The requirement for SMAP should be that this page is user-accessible in guest page-tables.
>> Think on a case where guest have CR4.SMAP=1 and CR4.SMEP=0.
>> 
> 
> If you are not at CPL3, you'd get a SMAP NPF, not a RSVD NPF.

If CR4.SMEP=0, guest vCPU can execute a user-accessible page in guest page-tables with CPL<3.
This instruction will successfully execute and can cause by the data it references any type of #NPF. Including RSVD #NPF.
When hardware DecodeAssist microcode will attempt to read guest RIP though, it will get a SMAP violation because
data read is done by microcode with CPL<3 and is accessing user-accessible page.

Therefore, I still don’t think that guest vCPU CPL matters at all. Only whether code page is mapped in guest page-tables as user-accessible or not.

-Liran 

> 
> Paolo
Liran Alon July 16, 2019, 7:50 p.m. UTC | #19
> On 16 Jul 2019, at 22:45, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Tue, Jul 16, 2019 at 09:39:48PM +0200, Paolo Bonzini wrote:
>> On 16/07/19 21:34, Liran Alon wrote:
>>>> When this errata is hit, the CPU will be at CPL3. From hardware
>>>> point-of-view the below sequence happens:
>>>> 
>>>> 1. CPL3 guest hits reserved bit NPT fault (MMIO access)
>>> Why CPU needs to be at CPL3?
>>> The requirement for SMAP should be that this page is user-accessible in guest page-tables.
>>> Think on a case where guest have CR4.SMAP=1 and CR4.SMEP=0.
>>> 
>> 
>> If you are not at CPL3, you'd get a SMAP NPF, not a RSVD NPF.
> 
> I think Liran is right.  When software is executing, the %rip access is
> a code fetch (SMEP), but the ucode assist is a data access (SMAP).
> 
> This likely has only been observed in a CPL3 scenario because no sane OS
> exercises the case of the kernel executing from a user page with SMAP=1
> and SMEP=0.

True. I’m trying to be pedantic and accurate here. :)
I think we should just remove the vCPU CPL check and remain only with the CR4.SMAP check.
Don’t you agree that having a #NPF that returns 0 instruction bytes with DecodeAssist enabled and CR4.SMAP=1
is sufficient for finger-printing this Errata? With which other use-case it’s expected to collide?

-Liran
Liran Alon July 16, 2019, 7:52 p.m. UTC | #20
> On 16 Jul 2019, at 22:41, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Tue, Jul 16, 2019 at 10:34:08PM +0300, Liran Alon wrote:
>> If we really want to be pedantic, we can parse guest page-tables to see if PTE
>> have U/S bit set to 1.  What do you think?
> 
> Performance aside, walking the guest page tables would fall apart if a
> different vCPU modified the guest's page tables.

True. :)
So let’s just stick with checking only CR4.SMAP=1 then.

-Liran
Brijesh Singh July 16, 2019, 8:02 p.m. UTC | #21
On 7/16/19 2:34 PM, Liran Alon wrote:
> 
> 
>> On 16 Jul 2019, at 22:28, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>
>>
>>
>> On 7/16/19 12:35 PM, Liran Alon wrote:
>>>
>>>
>>>> On 16 Jul 2019, at 20:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>
>>>> On 16/07/19 18:56, Liran Alon wrote:
>>>>> If the CPU performs the VMExit transition of state before doing the data read for DecodeAssist,
>>>>> then I agree that CPL will be 0 on data-access regardless of vCPU CPL. But this also means that SMAP
>>>>> violation should be raised based on host CR4.SMAP value and not vCPU CR4.SMAP value as KVM code checks.
>>>>>
>>>>> Furthermore, vCPU CPL of guest doesn’t need to be 3 in order to trigger this Errata.
>>>>
>>>> Under the conditions in the code, if CPL were <3 then the SMAP fault
>>>> would have been sent to the guest.
>>>> But I agree that if we need to
>>>> change it to check host CR4, then the CPL of the guest should not be
>>>> checked.
>>>
>>> Yep.
>>> Well it all depends on how AMD CPU actually works.
>>> We need some clarification from AMD but for sure the current code in KVM is not only wrong, but probably have never been tested. :P
>>>
>>> Looking for further clarifications from AMD before submitting v2…
>>>
>>
>> When this errata is hit, the CPU will be at CPL3. From hardware
>> point-of-view the below sequence happens:
>>
>> 1. CPL3 guest hits reserved bit NPT fault (MMIO access)
> 
> Why CPU needs to be at CPL3?
> The requirement for SMAP should be that this page is user-accessible in guest page-tables.
> Think on a case where guest have CR4.SMAP=1 and CR4.SMEP=0.
> 

We are discussing reserved NPF so we need to be at CPL3.

>>
>> 2. Microcode uses special opcode which attempts to read data using the
>> CPL0 privileges. The microcode read CS:RIP, when it hits SMAP fault,
>> it gives up and returns no instruction bytes.
>>
>> (Note: vCPU is still at CPL3)
> 
> So at this point guest vCPU CR4.SMAP is what matters right? Not host CR4.SMAP.
> 

Yes, its guest vCPU SMAP.

>>
>> 3. CPU causes #VMEXIT for original fault address.
>>
>> The SMAP fault occurred while we are still in guest context. It will be
>> nice to have code test example to triggers this errata.
> 
> I can write such code in kvm-unit-tests for you to run on relevant hardware if you have such a machine present.
> I don’t have relevant machine with me and therefore I wrote a disclaimer I couldn’t test it in cover letter.
> 

I have required hardware and should be able to run some test for you.

> So to sum-up what KVM needs to do:
> 1) Check guest vCPU CR4.SMAP is set to 1. (As I fixed in this commit).
> 2) Remove the check for CPL==3. If we really want to be pedantic, we can parse guest page-tables to see if PTE have U/S bit set to 1.


Remember in the case of SEV guest, the page-tables are encrypted with
the guest specific key and we will not be able to walk it to inspect
the U/S bit. We want to detect the errata and if its SEV guest then
we can't do much, for non-SEV fallback to instruction decode which
will walk the guest page-tables to fetch the instruction bytes.


> What do you think?
> 
> -Liran
> 
>>
>>> -Liran
>>>
>>>>
>>>> Paolo
>>>>
>>>>> It’s only important that guest page-tables maps the guest RIP as user-accessible. i.e. U/S bit in PTE set to 1.
>>>>
>>>
>
Sean Christopherson July 16, 2019, 8:07 p.m. UTC | #22
On Tue, Jul 16, 2019 at 08:02:43PM +0000, Singh, Brijesh wrote:
> 
> On 7/16/19 2:34 PM, Liran Alon wrote:
> > Why CPU needs to be at CPL3?
> > The requirement for SMAP should be that this page is user-accessible in guest page-tables.
> > Think on a case where guest have CR4.SMAP=1 and CR4.SMEP=0.
> > 
> 
> We are discussing reserved NPF so we need to be at CPL3.

For my own education, why does reserved NPF require CPL3?  I.e. what
happens if a reserved bit is encountered at CPL<3 (or do they simply not
exist)?
Liran Alon July 16, 2019, 8:09 p.m. UTC | #23
> On 16 Jul 2019, at 23:02, Singh, Brijesh <brijesh.singh@amd.com> wrote:
> 
> 
> 
> On 7/16/19 2:34 PM, Liran Alon wrote:
>> 
>> 
>>> On 16 Jul 2019, at 22:28, Singh, Brijesh <brijesh.singh@amd.com> wrote:
>>> 
>>> 
>>> 
>>> On 7/16/19 12:35 PM, Liran Alon wrote:
>>>> 
>>>> 
>>>>> On 16 Jul 2019, at 20:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>> 
>>>>> On 16/07/19 18:56, Liran Alon wrote:
>>>>>> If the CPU performs the VMExit transition of state before doing the data read for DecodeAssist,
>>>>>> then I agree that CPL will be 0 on data-access regardless of vCPU CPL. But this also means that SMAP
>>>>>> violation should be raised based on host CR4.SMAP value and not vCPU CR4.SMAP value as KVM code checks.
>>>>>> 
>>>>>> Furthermore, vCPU CPL of guest doesn’t need to be 3 in order to trigger this Errata.
>>>>> 
>>>>> Under the conditions in the code, if CPL were <3 then the SMAP fault
>>>>> would have been sent to the guest.
>>>>> But I agree that if we need to
>>>>> change it to check host CR4, then the CPL of the guest should not be
>>>>> checked.
>>>> 
>>>> Yep.
>>>> Well it all depends on how AMD CPU actually works.
>>>> We need some clarification from AMD but for sure the current code in KVM is not only wrong, but probably have never been tested. :P
>>>> 
>>>> Looking for further clarifications from AMD before submitting v2…
>>>> 
>>> 
>>> When this errata is hit, the CPU will be at CPL3. From hardware
>>> point-of-view the below sequence happens:
>>> 
>>> 1. CPL3 guest hits reserved bit NPT fault (MMIO access)
>> 
>> Why CPU needs to be at CPL3?
>> The requirement for SMAP should be that this page is user-accessible in guest page-tables.
>> Think on a case where guest have CR4.SMAP=1 and CR4.SMEP=0.
>> 
> 
> We are discussing reserved NPF so we need to be at CPL3.

I don’t see the connection between a reserved #NPF and the need to be at CPL3.
A vCPU can execute at CPL<3 a page that is mapped user-accessible in guest page-tables in case CR4.SMEP=0
and then instruction will execute successfully and can dereference a page that is mapped in NPT using an entry with a reserved bit set.
Thus, reserved #NPF will be raised while vCPU is at CPL<3 and DecodeAssist microcode will still raise SMAP violation
as CR4.SMAP=1 and microcode perform data-fetch with CPL<3. This leading exactly to Errata condition as far as I understand.

> 
>>> 
>>> 2. Microcode uses special opcode which attempts to read data using the
>>> CPL0 privileges. The microcode read CS:RIP, when it hits SMAP fault,
>>> it gives up and returns no instruction bytes.
>>> 
>>> (Note: vCPU is still at CPL3)
>> 
>> So at this point guest vCPU CR4.SMAP is what matters right? Not host CR4.SMAP.
>> 
> 
> Yes, its guest vCPU SMAP.
> 
>>> 
>>> 3. CPU causes #VMEXIT for original fault address.
>>> 
>>> The SMAP fault occurred while we are still in guest context. It will be
>>> nice to have code test example to triggers this errata.
>> 
>> I can write such code in kvm-unit-tests for you to run on relevant hardware if you have such a machine present.
>> I don’t have relevant machine with me and therefore I wrote a disclaimer I couldn’t test it in cover letter.
>> 
> 
> I have required hardware and should be able to run some test for you.

Ok. Will write such kvm-unit-test and Cc you.

> 
>> So to sum-up what KVM needs to do:
>> 1) Check guest vCPU CR4.SMAP is set to 1. (As I fixed in this commit).
>> 2) Remove the check for CPL==3. If we really want to be pedantic, we can parse guest page-tables to see if PTE have U/S bit set to 1.
> 
> 
> Remember in the case of SEV guest, the page-tables are encrypted with
> the guest specific key and we will not be able to walk it to inspect
> the U/S bit. We want to detect the errata and if its SEV guest then
> we can't do much, for non-SEV fallback to instruction decode which
> will walk the guest page-tables to fetch the instruction bytes.
> 
> 
>> What do you think?
>> 
>> -Liran
>> 
>>> 
>>>> -Liran
>>>> 
>>>>> 
>>>>> Paolo
>>>>> 
>>>>>> It’s only important that guest page-tables maps the guest RIP as user-accessible. i.e. U/S bit in PTE set to 1.
>>>>> 
>>>> 
>>
Paolo Bonzini July 16, 2019, 8:13 p.m. UTC | #24
On 16/07/19 22:07, Sean Christopherson wrote:
>> We are discussing reserved NPF so we need to be at CPL3.
> For my own education, why does reserved NPF require CPL3?  I.e. what
> happens if a reserved bit is encountered at CPL<3 (or do they simply not
> exist)?

Better: what happens if a reserved bit is encountered at CPL<3 *with
CR4.SMEP=0*?

Paolo
Brijesh Singh July 16, 2019, 8:27 p.m. UTC | #25
On 7/16/19 3:09 PM, Liran Alon wrote:
...


>>
>> We are discussing reserved NPF so we need to be at CPL3.
> 
> I don’t see the connection between a reserved #NPF and the need to be at CPL3.
> A vCPU can execute at CPL<3 a page that is mapped user-accessible in guest page-tables in case CR4.SMEP=0
> and then instruction will execute successfully and can dereference a page that is mapped in NPT using an entry with a reserved bit set.
> Thus, reserved #NPF will be raised while vCPU is at CPL<3 and DecodeAssist microcode will still raise SMAP violation
> as CR4.SMAP=1 and microcode perform data-fetch with CPL<3. This leading exactly to Errata condition as far as I understand.
> 

Yes, vCPU at CPL<3 can raise the SMAP violation. When SMEP is disabled,
the guest kernel never should be executing from code in user-mode pages,
that'd be insecure. So I am not sure if kernel code can cause this
errata. Having said so, I'm OK to remove the CPL check if it makes code
more readable.

-Brijesh
Sean Christopherson July 16, 2019, 8:54 p.m. UTC | #26
On Tue, Jul 16, 2019 at 08:27:12PM +0000, Singh, Brijesh wrote:
> 
> On 7/16/19 3:09 PM, Liran Alon wrote:
> >>
> >> We are discussing reserved NPF so we need to be at CPL3.
> > 
> > I don’t see the connection between a reserved #NPF and the need to be at
> > CPL3.  A vCPU can execute at CPL<3 a page that is mapped user-accessible in
> > guest page-tables in case CR4.SMEP=0 and then instruction will execute
> > successfully and can dereference a page that is mapped in NPT using an
> > entry with a reserved bit set.  Thus, reserved #NPF will be raised while
> > vCPU is at CPL<3 and DecodeAssist microcode will still raise SMAP violation
> > as CR4.SMAP=1 and microcode perform data-fetch with CPL<3. This leading
> > exactly to Errata condition as far as I understand.
> > 
> 
> Yes, vCPU at CPL<3 can raise the SMAP violation. When SMEP is disabled,
> the guest kernel never should be executing from code in user-mode pages,
> that'd be insecure. So I am not sure if kernel code can cause this
> errata.

From KVM's perspective, it's not a question of what is *likely* to happen
so much as it's a question of what *can* happen.  Architecturally there is
nothing that prevents CPL<3 code from encountering the SMAP fault.
Liran Alon July 16, 2019, 9:53 p.m. UTC | #27
> On 16 Jul 2019, at 23:54, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Tue, Jul 16, 2019 at 08:27:12PM +0000, Singh, Brijesh wrote:
>> 
>> On 7/16/19 3:09 PM, Liran Alon wrote:
>>>> 
>>>> We are discussing reserved NPF so we need to be at CPL3.
>>> 
>>> I don’t see the connection between a reserved #NPF and the need to be at
>>> CPL3.  A vCPU can execute at CPL<3 a page that is mapped user-accessible in
>>> guest page-tables in case CR4.SMEP=0 and then instruction will execute
>>> successfully and can dereference a page that is mapped in NPT using an
>>> entry with a reserved bit set.  Thus, reserved #NPF will be raised while
>>> vCPU is at CPL<3 and DecodeAssist microcode will still raise SMAP violation
>>> as CR4.SMAP=1 and microcode perform data-fetch with CPL<3. This leading
>>> exactly to Errata condition as far as I understand.
>>> 
>> 
>> Yes, vCPU at CPL<3 can raise the SMAP violation. When SMEP is disabled,
>> the guest kernel never should be executing from code in user-mode pages,
>> that'd be insecure. So I am not sure if kernel code can cause this
>> errata.
> 
> From KVM's perspective, it's not a question of what is *likely* to happen
> so much as it's a question of what *can* happen.  Architecturally there is
> nothing that prevents CPL<3 code from encountering the SMAP fault.

Exactly. :)
I will submit a v2 patch which also clarifies the details we understood in this email thread.

Thanks,
-Liran
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 735b8c01895e..79023a41f7a7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7123,10 +7123,19 @@  static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
 	bool is_user, smap;
 
 	is_user = svm_get_cpl(vcpu) == 3;
-	smap = !kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
+	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
 
 	/*
-	 * Detect and workaround Errata 1096 Fam_17h_00_0Fh
+	 * Detect and workaround Errata 1096 Fam_17h_00_0Fh.
+	 *
+	 * Errata:
+	 * On a nested page fault when CR4.SMAP=1 and the guest data read generates
+	 * a SMAP violation, GuestIntrBytes field of the VMCB on a VMEXIT will
+	 * incorrectly return 0 instead the correct guest instruction bytes.
+	 *
+	 * Workaround:
+	 * To determine what instruction the guest was executing, the hypervisor
+	 * will have to decode the instruction at the instruction pointer.
 	 *
 	 * In non SEV guest, hypervisor will be able to read the guest
 	 * memory to decode the instruction pointer when insn_len is zero
@@ -7137,11 +7146,11 @@  static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
 	 * instruction pointer so we will not able to workaround it. Lets
 	 * print the error and request to kill the guest.
 	 */
-	if (is_user && smap) {
+	if (!is_user && smap) {
 		if (!sev_guest(vcpu->kvm))
 			return true;
 
-		pr_err_ratelimited("KVM: Guest triggered AMD Erratum 1096\n");
+		pr_err_ratelimited("KVM: SEV Guest triggered AMD Erratum 1096\n");
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 	}