Message ID | 20190212144354.7694-1-brijesh.singh@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) | expand |
On 12/02/19 15:44, Singh, Brijesh wrote: > - if (unlikely(insn && !insn_len)) > - return 1; > + if (unlikely(insn && !insn_len)) { > + if (!kvm_x86_ops->emulate_instruction_possible(vcpu)) > + return 1; > + } Are the instruction bytes valid, that is can we just ignore insn_len and use the bytes but not the length? That would work for SEV too. Paolo > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 95d618045001..6767bad8367e 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7530,6 +7530,11 @@ static int enable_smi_window(struct kvm_vcpu *vcpu) > return 0; > } > > +static bool emulate_instruction_possible(struct kvm_vcpu *vcpu) > +{ > + return 1; This should be "return 0;" to keep previous behavior. > +} > + > static __init int hardware_setup(void) > { > unsigned long host_bndcfgs; > @@ -7832,6 +7837,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { > .set_nested_state = NULL, > .get_vmcs12_pages = NULL, > .nested_enable_evmcs = NULL, > + .emulate_instruction_possible = emulate_instruction_possible, > }; >
On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote: > > Errata#1090: > > On a nested data page fault when CR.SMAP=1 and the guest data read > generates a SMAP violation, GuestInstrBytes field of the VMCB on a > VMEXIT will incorrectly return 0h instead the correct guest > instruction bytes . > > Recommend Workaround: > > To determine what instruction the guest was executing the hypervisor > will have to decode the instruction at the instruction pointer. > > The recommended workaround can not be implemented for the SEV > guest because guest memory is encrypted with the guest specific key, > and instruction decoder will not be able to decode the instruction > bytes. If we hit this errata in the SEV guest then inject #GP into > the guest and log the message. > > Cc: Jim Mattson <jmattson@google.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Joerg Roedel <joro@8bytes.org> > Cc: "Radim Krčmář" <rkrcmar@redhat.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> This was ... Reported-by: Venkatesh Srinivas <venkateshs@google.com> I'm curious why you chose to inject #GP rather than, say, requesting a guest shutdown. Is the guest #GP handler expected to be able to recover from this?
On 2/12/19 9:06 AM, Paolo Bonzini wrote: > On 12/02/19 15:44, Singh, Brijesh wrote: >> - if (unlikely(insn && !insn_len)) >> - return 1; >> + if (unlikely(insn && !insn_len)) { >> + if (!kvm_x86_ops->emulate_instruction_possible(vcpu)) >> + return 1; >> + } > > Are the instruction bytes valid, that is can we just ignore insn_len and > use the bytes but not the length? That would work for SEV too. > The instruction bytes are not valid so we will not able to workaround for the SEV. > Paolo > >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> index 95d618045001..6767bad8367e 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -7530,6 +7530,11 @@ static int enable_smi_window(struct kvm_vcpu *vcpu) >> return 0; >> } >> >> +static bool emulate_instruction_possible(struct kvm_vcpu *vcpu) >> +{ >> + return 1; > > This should be "return 0;" to keep previous behavior. > Sure, I will fix in v2. >> +} >> + >> static __init int hardware_setup(void) >> { >> unsigned long host_bndcfgs; >> @@ -7832,6 +7837,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { >> .set_nested_state = NULL, >> .get_vmcs12_pages = NULL, >> .nested_enable_evmcs = NULL, >> + .emulate_instruction_possible = emulate_instruction_possible, >> }; >> >
On 2/12/19 11:41 AM, Jim Mattson wrote: > On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote: >> >> Errata#1090: >> >> On a nested data page fault when CR.SMAP=1 and the guest data read >> generates a SMAP violation, GuestInstrBytes field of the VMCB on a >> VMEXIT will incorrectly return 0h instead the correct guest >> instruction bytes . >> >> Recommend Workaround: >> >> To determine what instruction the guest was executing the hypervisor >> will have to decode the instruction at the instruction pointer. >> >> The recommended workaround can not be implemented for the SEV >> guest because guest memory is encrypted with the guest specific key, >> and instruction decoder will not be able to decode the instruction >> bytes. If we hit this errata in the SEV guest then inject #GP into >> the guest and log the message. >> >> Cc: Jim Mattson <jmattson@google.com> >> Cc: Tom Lendacky <thomas.lendacky@amd.com> >> Cc: Borislav Petkov <bp@alien8.de> >> Cc: Joerg Roedel <joro@8bytes.org> >> Cc: "Radim Krčmář" <rkrcmar@redhat.com> >> Cc: Paolo Bonzini <pbonzini@redhat.com> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > > This was ... > Reported-by: Venkatesh Srinivas <venkateshs@google.com> > I will add the tag in next rev. > I'm curious why you chose to inject #GP rather than, say, requesting a > guest shutdown. Is the guest #GP handler expected to be able to > recover from this? > We will *not* be able to recover from this, I wanted to abort the guest and I should admit that I was not ware of requesting a SHUTDOWN method so decided to inject #GP so that guest does not continue. Browsing further, I see that kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu) can be used to request a SHUTDOWN. I will use it in next rev. thanks for the hint . -Brijeshh
On Tue, Feb 12, 2019 at 12:12 PM Singh, Brijesh <brijesh.singh@amd.com> wrote: > On 2/12/19 11:41 AM, Jim Mattson wrote: > > On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote: > >> > >> Errata#1090: > >> > >> On a nested data page fault when CR.SMAP=1 and the guest data read > >> generates a SMAP violation, GuestInstrBytes field of the VMCB on a > >> VMEXIT will incorrectly return 0h instead the correct guest > >> instruction bytes Do you mean Errata #1096? (https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf v1.12 pg 61) > >> > >> Recommend Workaround: > >> > >> To determine what instruction the guest was executing the hypervisor > >> will have to decode the instruction at the instruction pointer. > >> > >> The recommended workaround can not be implemented for the SEV > >> guest because guest memory is encrypted with the guest specific key, > >> and instruction decoder will not be able to decode the instruction > >> bytes. If we hit this errata in the SEV guest then inject #GP into > >> the guest and log the message. > >> > >> Cc: Jim Mattson <jmattson@google.com> > >> Cc: Tom Lendacky <thomas.lendacky@amd.com> > >> Cc: Borislav Petkov <bp@alien8.de> > >> Cc: Joerg Roedel <joro@8bytes.org> > >> Cc: "Radim Krčmář" <rkrcmar@redhat.com> > >> Cc: Paolo Bonzini <pbonzini@redhat.com> > >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > > > > This was ... > > Reported-by: Venkatesh Srinivas <venkateshs@google.com> > > > > I will add the tag in next rev. > > > > I'm curious why you chose to inject #GP rather than, say, requesting a > > guest shutdown. Is the guest #GP handler expected to be able to > > recover from this? > > > > > We will *not* be able to recover from this, I wanted to abort the > guest and I should admit that I was not ware of requesting a SHUTDOWN > method so decided to inject #GP so that guest does not continue. > Browsing further, I see that kvm_make_request(KVM_REQ_TRIPLE_FAULT, > vcpu) can be used to request a SHUTDOWN. I will use it in next > rev. thanks for the hint . > > -Brijeshh Should the pr_err() be ratelimited? Otherwise a guest suppressing #GP could spam the host dmesg. Thanks, -- vs;
On 2/12/19 2:38 PM, Venkatesh Srinivas wrote: > On Tue, Feb 12, 2019 at 12:12 PM Singh, Brijesh <brijesh.singh@amd.com> wrote: >> On 2/12/19 11:41 AM, Jim Mattson wrote: >>> On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote: >>>> Errata#1090: >>>> >>>> On a nested data page fault when CR.SMAP=1 and the guest data read >>>> generates a SMAP violation, GuestInstrBytes field of the VMCB on a >>>> VMEXIT will incorrectly return 0h instead the correct guest >>>> instruction bytes > Do you mean Errata #1096? > (https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf > v1.12 pg 61) Yes, I did noticed after sending the patch. Will fix in next rev. >>>> Recommend Workaround: >>>> >>>> To determine what instruction the guest was executing the hypervisor >>>> will have to decode the instruction at the instruction pointer. >>>> >>>> The recommended workaround can not be implemented for the SEV >>>> guest because guest memory is encrypted with the guest specific key, >>>> and instruction decoder will not be able to decode the instruction >>>> bytes. If we hit this errata in the SEV guest then inject #GP into >>>> the guest and log the message. >>>> >>>> Cc: Jim Mattson <jmattson@google.com> >>>> Cc: Tom Lendacky <thomas.lendacky@amd.com> >>>> Cc: Borislav Petkov <bp@alien8.de> >>>> Cc: Joerg Roedel <joro@8bytes.org> >>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com> >>>> Cc: Paolo Bonzini <pbonzini@redhat.com> >>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> >>> This was ... >>> Reported-by: Venkatesh Srinivas <venkateshs@google.com> >>> >> I will add the tag in next rev. >> >> >>> I'm curious why you chose to inject #GP rather than, say, requesting a >>> guest shutdown. Is the guest #GP handler expected to be able to >>> recover from this? >>> >> >> We will *not* be able to recover from this, I wanted to abort the >> guest and I should admit that I was not ware of requesting a SHUTDOWN >> method so decided to inject #GP so that guest does not continue. >> Browsing further, I see that kvm_make_request(KVM_REQ_TRIPLE_FAULT, >> vcpu) can be used to request a SHUTDOWN. I will use it in next >> rev. thanks for the hint . >> >> -Brijeshh > Should the pr_err() be ratelimited? Otherwise a guest suppressing #GP > could spam the host dmesg. Agree, ratelimited is good idea to suppressing the spam of host dmesg. Will use it in next rev. thanks -Brijesh > Thanks, > -- vs;
On 12/02/19 18:41, Jim Mattson wrote: > On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote: >> >> Errata#1090: >> >> On a nested data page fault when CR.SMAP=1 and the guest data read >> generates a SMAP violation, GuestInstrBytes field of the VMCB on a >> VMEXIT will incorrectly return 0h instead the correct guest >> instruction bytes . >> >> Recommend Workaround: >> >> To determine what instruction the guest was executing the hypervisor >> will have to decode the instruction at the instruction pointer. >> >> The recommended workaround can not be implemented for the SEV >> guest because guest memory is encrypted with the guest specific key, >> and instruction decoder will not be able to decode the instruction >> bytes. If we hit this errata in the SEV guest then inject #GP into >> the guest and log the message. Actually this is not the workaround that KVM is implementing; KVM is simply retrying the instruction after fixing the page fault. This would cause an infinite loop in the guest if the instruction is actually MMIO, however in that case KVM will get an RSVD page fault rather than a SMAP page fault and the errata would not be triggered. So why is this patch needed? Thanks, Paolo
Hi Paolo, On 2/13/19 8:11 AM, Paolo Bonzini wrote: > On 12/02/19 18:41, Jim Mattson wrote: >> On Tue, Feb 12, 2019 at 6:44 AM Singh, Brijesh <brijesh.singh@amd.com> wrote: >>> >>> Errata#1090: >>> >>> On a nested data page fault when CR.SMAP=1 and the guest data read >>> generates a SMAP violation, GuestInstrBytes field of the VMCB on a >>> VMEXIT will incorrectly return 0h instead the correct guest >>> instruction bytes . >>> >>> Recommend Workaround: >>> >>> To determine what instruction the guest was executing the hypervisor >>> will have to decode the instruction at the instruction pointer. >>> >>> The recommended workaround can not be implemented for the SEV >>> guest because guest memory is encrypted with the guest specific key, >>> and instruction decoder will not be able to decode the instruction >>> bytes. If we hit this errata in the SEV guest then inject #GP into >>> the guest and log the message. > > Actually this is not the workaround that KVM is implementing; KVM is > simply retrying the instruction after fixing the page fault. This would > cause an infinite loop in the guest if the instruction is actually MMIO, > however in that case KVM will get an RSVD page fault rather than a SMAP > page fault and the errata would not be triggered. So why is this patch > needed? > KVM is not getting a SMAP page fault. The HW gets a SMAP fault when trying to fetch the instruction bytes, and that fault is ignored (and instead the ucode just doesn't save any instruction bytes). The fault code to KVM is the RSVD. While handling the RSVD fault, KVM may come to the code path which requires instruction emulation. At this time we will check to see if SMAP was enabled and insn_len is zero. If so, we are hitting this errata. If we skip the emulation and restart the guest then in this particular case we will get the *same* fault again and enter into infinite loop. In case of non SEV, we can workaround it by letting the KVM read the instruction bytes from the guest RIP. But in case of SEV we will have no choice other than requesting to restart and thus causing guest an infinite loop. In SEV, my attempt was to detect this errata and log it and request the SHUTDOWN of the guest so that we continue further (in v1 I am doing #GP instead of the SHUTDOWN request, based on Jim's comment I am switching to use kvm_make_request(...)). thanks Brijesh
On 13/02/19 17:40, Singh, Brijesh wrote: > > KVM is not getting a SMAP page fault. The HW gets a SMAP fault when > trying to fetch the instruction bytes How does the hardware get a SMAP fault when trying to fetch instructions? SMAP faults are only generated for data accesses, not instructions. Now I'm even more confused. :) Paolo , and that fault is ignored (and > instead the ucode just doesn't save any instruction bytes). The fault > code to KVM is the RSVD. While handling the RSVD fault, KVM may come to > the code path which requires instruction emulation. At this time we will > check to see if SMAP was enabled and insn_len is zero. If so, we are > hitting this errata. If we skip the emulation and restart the guest > then in this particular case we will get the *same* fault again and > enter into infinite loop. In case of non SEV, we can workaround it by > letting the KVM read the instruction bytes from the guest RIP. But > in case of SEV we will have no choice other than requesting to restart > and thus causing guest an infinite loop. In SEV, my attempt was to > detect this errata and log it and request the SHUTDOWN of the > guest so that we continue further (in v1 I am doing #GP instead > of the SHUTDOWN request, based on Jim's comment I am switching to use > kvm_make_request(...)).
On 2/13/19 10:44 AM, Paolo Bonzini wrote: > On 13/02/19 17:40, Singh, Brijesh wrote: >> >> KVM is not getting a SMAP page fault. The HW gets a SMAP fault when >> trying to fetch the instruction bytes > > How does the hardware get a SMAP fault when trying to fetch > instructions? SMAP faults are only generated for data accesses, not > instructions. Now I'm even more confused. :) > I was hoping to clarify instead of causing more confusion :) Here's is how it works: 1. Guest does the MMIO access which causes a rsvd page fault 2. Hardware processes this as a VMEXIT 3. During the processing, hardware attempts to read the instruction bytes. This is done by issuing a data read request from the RIP that the guest was at. 4. The result of these reads are then stored in the VMCB. So in step #3 there can be a SMAP fault because internally the CPU is doing a data read from the RIP to get these bytes. Hardware didn't save them from the actual instruction execution or anything, it actually go and re-read them, which is why this can cause a SMAP fault. > Paolo > > , and that fault is ignored (and >> instead the ucode just doesn't save any instruction bytes). The fault >> code to KVM is the RSVD. While handling the RSVD fault, KVM may come to >> the code path which requires instruction emulation. At this time we will >> check to see if SMAP was enabled and insn_len is zero. If so, we are >> hitting this errata. If we skip the emulation and restart the guest >> then in this particular case we will get the *same* fault again and >> enter into infinite loop. In case of non SEV, we can workaround it by >> letting the KVM read the instruction bytes from the guest RIP. But >> in case of SEV we will have no choice other than requesting to restart >> and thus causing guest an infinite loop. In SEV, my attempt was to >> detect this errata and log it and request the SHUTDOWN of the >> guest so that we continue further (in v1 I am doing #GP instead >> of the SHUTDOWN request, based on Jim's comment I am switching to use >> kvm_make_request(...)). >
On 13/02/19 18:19, Singh, Brijesh wrote: > 1. Guest does the MMIO access which causes a rsvd page fault > 2. Hardware processes this as a VMEXIT > 3. During the processing, hardware attempts to read the instruction > bytes. This is done by issuing a data read request from the RIP > that the guest was at. > 4. The result of these reads are then stored in the VMCB. > > So in step #3 there can be a SMAP fault because internally the CPU > is doing a data read from the RIP to get these bytes. Hardware didn't > save them from the actual instruction execution or anything, it > actually go and re-read them, which is why this can cause a SMAP > fault. What combination of NPF bits, EFLAGS and CR4 is causing a SMAP fault? Does KVM actually use that combination? Paolo
On 2/13/19 11:23 AM, Paolo Bonzini wrote: > On 13/02/19 18:19, Singh, Brijesh wrote: >> 1. Guest does the MMIO access which causes a rsvd page fault >> 2. Hardware processes this as a VMEXIT >> 3. During the processing, hardware attempts to read the instruction >> bytes. This is done by issuing a data read request from the RIP >> that the guest was at. >> 4. The result of these reads are then stored in the VMCB. >> >> So in step #3 there can be a SMAP fault because internally the CPU >> is doing a data read from the RIP to get these bytes. Hardware didn't >> save them from the actual instruction execution or anything, it >> actually go and re-read them, which is why this can cause a SMAP >> fault. > > What combination of NPF bits, EFLAGS and CR4 is causing a SMAP fault? e.g 1. Guest has SMAP enabled 2. RIP is mapped in guest page tables with U/S=1 The CPU always does the these insn reads as if CPL=0. In the above case the insn fetch #3 will cause SMAP fault in the hardware. On VMEXIT, KVM will get the #NPF with a RSVD fault. > Does KVM actually use that combination? > Sorry, I am not sure if I understand you question. From KVM point-of- view, we get a #NPF (a RSVD fault). I have not encountered this in any of my guest runs (typically we don't do MMIO at CPL=3). -Brijesh
On 13/02/19 19:28, Singh, Brijesh wrote: > > > On 2/13/19 11:23 AM, Paolo Bonzini wrote: >> On 13/02/19 18:19, Singh, Brijesh wrote: >>> 1. Guest does the MMIO access which causes a rsvd page fault >>> 2. Hardware processes this as a VMEXIT >>> 3. During the processing, hardware attempts to read the instruction >>> bytes. This is done by issuing a data read request from the RIP >>> that the guest was at. >>> 4. The result of these reads are then stored in the VMCB. >>> >>> So in step #3 there can be a SMAP fault because internally the CPU >>> is doing a data read from the RIP to get these bytes. Hardware didn't >>> save them from the actual instruction execution or anything, it >>> actually go and re-read them, which is why this can cause a SMAP >>> fault. >> >> What combination of NPF bits, EFLAGS and CR4 is causing a SMAP fault? > > e.g > > 1. Guest has SMAP enabled > 2. RIP is mapped in guest page tables with U/S=1 Ok so that's pretty messed up. Is this fixed with newer microcode, and how widespread is this silicon? I'm tempted to just blacklist SMAP on the affected steppings. Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1, and set CR4.SMAP back to 1 in the #DB handler? Thanks, Paolo > The CPU always does the these insn reads as if CPL=0. In the above case > the insn fetch #3 will cause SMAP fault in the hardware. On VMEXIT, > KVM will get the #NPF with a RSVD fault. > >> Does KVM actually use that combination? > > Sorry, I am not sure if I understand you question. From KVM point-of- > view, we get a #NPF (a RSVD fault). > > I have not encountered this in any of my guest runs (typically > we don't do MMIO at CPL=3). > > -Brijesh >
On 2/13/19 12:33 PM, Paolo Bonzini wrote: > On 13/02/19 19:28, Singh, Brijesh wrote: >> >> >> On 2/13/19 11:23 AM, Paolo Bonzini wrote: >>> On 13/02/19 18:19, Singh, Brijesh wrote: >>>> 1. Guest does the MMIO access which causes a rsvd page fault >>>> 2. Hardware processes this as a VMEXIT >>>> 3. During the processing, hardware attempts to read the instruction >>>> bytes. This is done by issuing a data read request from the RIP >>>> that the guest was at. >>>> 4. The result of these reads are then stored in the VMCB. >>>> >>>> So in step #3 there can be a SMAP fault because internally the CPU >>>> is doing a data read from the RIP to get these bytes. Hardware didn't >>>> save them from the actual instruction execution or anything, it >>>> actually go and re-read them, which is why this can cause a SMAP >>>> fault. >>> >>> What combination of NPF bits, EFLAGS and CR4 is causing a SMAP fault? >> >> e.g >> >> 1. Guest has SMAP enabled >> 2. RIP is mapped in guest page tables with U/S=1 > > Ok so that's pretty messed up. Is this fixed with newer microcode, and > how widespread is this silicon? I'm tempted to just blacklist SMAP on > the affected steppings. > This errata is applicable to Family 17h model 00_0fh only, and it is definitely planned to fix in the upcoming CPUs. For now, we just need to workaround for the Fam 17h_00_0Fh. I am not sure about blacklist SMAP all together, for non SEV its very easy to workaround. For SEV case, I am just trying to say that we will *not* workaround (it does not mean that its not possible) > Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1, > and set CR4.SMAP back to 1 in the #DB handler? > Theoretically we should be able to do this to workaround. But since the errata is not wide spread hence I am not sure about going to this path. Also there are not many apps doing MMIO from userspace hence I thought its okay to not workaround for the SEV guest. Having said so, if we find the actual need to workaround then we can go to that path. thanks
On 13/02/19 20:02, Singh, Brijesh wrote: >> Ok so that's pretty messed up. Is this fixed with newer microcode, and >> how widespread is this silicon? I'm tempted to just blacklist SMAP on >> the affected steppings. > > This errata is applicable to Family 17h model 00_0fh only, and it is > definitely planned to fix in the upcoming CPUs. For now, we just need to > workaround for the Fam 17h_00_0Fh. So it's on all steppings; no microcode fix. :( > I am not sure about blacklist SMAP all together, for non SEV its > very easy to workaround. For SEV case, I am just trying to say that > we will *not* workaround (it does not mean that its not possible) > >> Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1, >> and set CR4.SMAP back to 1 in the #DB handler? > > Theoretically we should be able to do this to workaround. But since > the errata is not wide spread hence I am not sure about going to > this path. Also there are not many apps doing MMIO from userspace > hence I thought its okay to not workaround for the SEV guest. > > Having said so, if we find the actual need to workaround then we can > go to that path. If it's not too hard, having the workaround would be nice(r). Paolo
On 2/14/19 6:26 AM, Paolo Bonzini wrote: > On 13/02/19 20:02, Singh, Brijesh wrote: >>> Ok so that's pretty messed up. Is this fixed with newer microcode, and >>> how widespread is this silicon? I'm tempted to just blacklist SMAP on >>> the affected steppings. >> >> This errata is applicable to Family 17h model 00_0fh only, and it is >> definitely planned to fix in the upcoming CPUs. For now, we just need to >> workaround for the Fam 17h_00_0Fh. > > So it's on all steppings; no microcode fix. :( Unfortunately yes :( > >> I am not sure about blacklist SMAP all together, for non SEV its >> very easy to workaround. For SEV case, I am just trying to say that >> we will *not* workaround (it does not mean that its not possible) >> >>> Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1, >>> and set CR4.SMAP back to 1 in the #DB handler? >> >> Theoretically we should be able to do this to workaround. But since >> the errata is not wide spread hence I am not sure about going to >> this path. Also there are not many apps doing MMIO from userspace >> hence I thought its okay to not workaround for the SEV guest. >> >> Having said so, if we find the actual need to workaround then we can >> go to that path. > > If it's not too hard, having the workaround would be nice(r). > For now I am inclined to go with what we have. Will submit v2 with fixes. Theoretically we can use the single stepping with CR4.SMAP=0. Since this is not a suggested workaround in errata doc hence I am afraid that we may run into some unknown (mainly with SEV); I will check with HW folks to get some acknowledgment. If we all feel comfortable then I will submit a follow up patch. thanks -Brijesh
On Thu, Feb 14, 2019 at 6:31 AM Singh, Brijesh <brijesh.singh@amd.com> wrote: > > > > On 2/14/19 6:26 AM, Paolo Bonzini wrote: > > On 13/02/19 20:02, Singh, Brijesh wrote: > >>> Ok so that's pretty messed up. Is this fixed with newer microcode, and > >>> how widespread is this silicon? I'm tempted to just blacklist SMAP on > >>> the affected steppings. > >> > >> This errata is applicable to Family 17h model 00_0fh only, and it is > >> definitely planned to fix in the upcoming CPUs. For now, we just need to > >> workaround for the Fam 17h_00_0Fh. > > > > So it's on all steppings; no microcode fix. :( > > Unfortunately yes :( > > > > >> I am not sure about blacklist SMAP all together, for non SEV its > >> very easy to workaround. For SEV case, I am just trying to say that > >> we will *not* workaround (it does not mean that its not possible) > >> > >>> Would it work to retry the instruction with CR4.SMAP=0 and EFLAGS.TF=1, > >>> and set CR4.SMAP back to 1 in the #DB handler? > >> > >> Theoretically we should be able to do this to workaround. But since > >> the errata is not wide spread hence I am not sure about going to > >> this path. Also there are not many apps doing MMIO from userspace > >> hence I thought its okay to not workaround for the SEV guest. > >> > >> Having said so, if we find the actual need to workaround then we can > >> go to that path. > > > > If it's not too hard, having the workaround would be nice(r). > > > > For now I am inclined to go with what we have. Will submit v2 > with fixes. > > Theoretically we can use the single stepping with CR4.SMAP=0. SEV allows the hypervisor to override the guest OS's CR4.SMAP setting?!? That seems counter-intuitive, given SEV's intended use. Doesn't this potentially give a ring-3 agent in the guest an avenue to privilege escalation through collusion with the hypervisor?
On 14/02/19 18:31, Jim Mattson wrote: >> Theoretically we can use the single stepping with CR4.SMAP=0. > > SEV allows the hypervisor to override the guest OS's CR4.SMAP > setting?!? That seems counter-intuitive, given SEV's intended use. > Doesn't this potentially give a ring-3 agent in the guest an avenue to > privilege escalation through collusion with the hypervisor? The first version does not protect any register content. See this paper for an example: https://arxiv.org/pdf/1612.01119.pdf Paolo
On 2/14/19 11:31 AM, Jim Mattson wrote: ... >>> >> >> For now I am inclined to go with what we have. Will submit v2 >> with fixes. >> >> Theoretically we can use the single stepping with CR4.SMAP=0. > > SEV allows the hypervisor to override the guest OS's CR4.SMAP > setting?!? That seems counter-intuitive, given SEV's intended use. > Doesn't this potentially give a ring-3 agent in the guest an avenue to > privilege escalation through collusion with the hypervisor? > The guest register state is protected with the SEV-ES feature. The SEV-ES feature is supported in current HW but its not supported in KVM yet. We are actively working to add this feature very soon. -Brijesh
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4660ce90de7f..536acf622af9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1194,6 +1194,8 @@ struct kvm_x86_ops { int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu, uint16_t *vmcs_version); uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu); + + bool (*emulate_instruction_possible)(struct kvm_vcpu *vcpu); }; struct kvm_arch_async_pf { diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index da9c42349b1f..8ebc1bfcabd3 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -5384,8 +5384,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, * page is not present in memory). In those cases we simply restart the * guest. */ - if (unlikely(insn && !insn_len)) - return 1; + if (unlikely(insn && !insn_len)) { + if (!kvm_x86_ops->emulate_instruction_possible(vcpu)) + return 1; + } er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f13a3a24d360..1f359667e529 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -7098,6 +7098,36 @@ static int nested_enable_evmcs(struct kvm_vcpu *vcpu, return -ENODEV; } +static bool emulate_instruction_possible(struct kvm_vcpu *vcpu) +{ + bool is_user, smap; + + is_user = svm_get_cpl(vcpu) == 3; + smap = !kvm_read_cr4_bits(vcpu, X86_CR4_SMAP); + + /* + * Detect and workaround Errata 1090 Fam_17h_00_0Fh + * + * In non SEV guest, hypervisor will be able to read the guest + * memory to decode the instruction pointer when insn_len is zero + * so we return true to indicate that decoding is possible. + * + * But in the SEV guest, the guest memory is encrypted with the + * guest specific key and hypervisor will not be able to decode the + * instruction pointer so we will not able to workaround it. Lets + * print the error and inject a #GP in the guest. + */ + if (is_user && smap) { + if (!sev_guest(vcpu->kvm)) + return true; + + pr_err("ERROR: encountered errata #1090, injecting #GP\n"); + kvm_inject_gp(vcpu, 0); + } + + return false; +} + static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .cpu_has_kvm_support = has_svm, .disabled_by_bios = is_disabled, @@ -7231,6 +7261,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .nested_enable_evmcs = nested_enable_evmcs, .nested_get_evmcs_version = nested_get_evmcs_version, + + .emulate_instruction_possible = emulate_instruction_possible, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 95d618045001..6767bad8367e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7530,6 +7530,11 @@ static int enable_smi_window(struct kvm_vcpu *vcpu) return 0; } +static bool emulate_instruction_possible(struct kvm_vcpu *vcpu) +{ + return 1; +} + static __init int hardware_setup(void) { unsigned long host_bndcfgs; @@ -7832,6 +7837,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .set_nested_state = NULL, .get_vmcs12_pages = NULL, .nested_enable_evmcs = NULL, + .emulate_instruction_possible = emulate_instruction_possible, }; static void vmx_cleanup_l1d_flush(void)
Errata#1090: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then inject #GP into the guest and log the message. Cc: Jim Mattson <jmattson@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> --- Errata link: https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.c | 6 ++++-- arch/x86/kvm/svm.c | 32 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 6 ++++++ 4 files changed, 44 insertions(+), 2 deletions(-)