Message ID | 20190215172356.20385-1-brijesh.singh@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) | expand |
On Fri, Feb 15, 2019 at 05:24:12PM +0000, Singh, Brijesh wrote: > Errata#1096: > > On a nested data page fault when CR.SMAP=1 and the guest data read > generates a SMAP violation, GuestInstrBytes field of the VMCB on a > VMEXIT will incorrectly return 0h instead the correct guest > instruction bytes . > > Recommend Workaround: > > To determine what instruction the guest was executing the hypervisor > will have to decode the instruction at the instruction pointer. > > The recommended workaround can not be implemented for the SEV > guest because guest memory is encrypted with the guest specific key, > and instruction decoder will not be able to decode the instruction > bytes. If we hit this errata in the SEV guest then log the message > and request a guest shutdown. > > Reported-by: Venkatesh Srinivas <venkateshs@google.com> > Cc: Jim Mattson <jmattson@google.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Joerg Roedel <joro@8bytes.org> > Cc: "Radim Krčmář" <rkrcmar@redhat.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > --- > > Changes since v1: > * request to shutdown the guest instead of injecting #GP > * use the pr_err_ratelimited to supress the host dmesg > > arch/x86/include/asm/kvm_host.h | 2 ++ > arch/x86/kvm/mmu.c | 6 ++++-- > arch/x86/kvm/svm.c | 32 ++++++++++++++++++++++++++++++++ > arch/x86/kvm/vmx/vmx.c | 6 ++++++ > 4 files changed, 44 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 4660ce90de7f..536acf622af9 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1194,6 +1194,8 @@ struct kvm_x86_ops { > int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu, > uint16_t *vmcs_version); > uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu); > + > + bool (*emulate_instruction_possible)(struct kvm_vcpu *vcpu); > }; > > struct kvm_arch_async_pf { > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index da9c42349b1f..8ebc1bfcabd3 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -5384,8 +5384,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, > * page is not present in memory). In those cases we simply restart the > * guest. > */ > - if (unlikely(insn && !insn_len)) > - return 1; > + if (unlikely(insn && !insn_len)) { > + if (!kvm_x86_ops->emulate_instruction_possible(vcpu)) > + return 1; > + } IMO "insn && !insn_len" belongs in emulate_instruction_possible(). The cost of the indirect function call can be avoided by making the callback optional and only attaching it when necessary, e.g. if the CPU is affected by the errata. E.g.: For brevity, maybe emulation_impossible() as the ops' name? E.g.: if (kvm_x86_ops->emulation_impossible && kvm_x86_ops->emulation_impossible(vcpu, insn, insn_len)) return 1; > > er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);
On 2/15/19 3:32 PM, Sean Christopherson wrote: > On Fri, Feb 15, 2019 at 05:24:12PM +0000, Singh, Brijesh wrote: >> Errata#1096: >> >> On a nested data page fault when CR.SMAP=1 and the guest data read >> generates a SMAP violation, GuestInstrBytes field of the VMCB on a >> VMEXIT will incorrectly return 0h instead the correct guest >> instruction bytes . >> >> Recommend Workaround: >> >> To determine what instruction the guest was executing the hypervisor >> will have to decode the instruction at the instruction pointer. >> >> The recommended workaround can not be implemented for the SEV >> guest because guest memory is encrypted with the guest specific key, >> and instruction decoder will not be able to decode the instruction >> bytes. If we hit this errata in the SEV guest then log the message >> and request a guest shutdown. >> >> Reported-by: Venkatesh Srinivas <venkateshs@google.com> >> Cc: Jim Mattson <jmattson@google.com> >> Cc: Tom Lendacky <thomas.lendacky@amd.com> >> Cc: Borislav Petkov <bp@alien8.de> >> Cc: Joerg Roedel <joro@8bytes.org> >> Cc: "Radim Krčmář" <rkrcmar@redhat.com> >> Cc: Paolo Bonzini <pbonzini@redhat.com> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> >> --- >> >> Changes since v1: >> * request to shutdown the guest instead of injecting #GP >> * use the pr_err_ratelimited to supress the host dmesg >> >> arch/x86/include/asm/kvm_host.h | 2 ++ >> arch/x86/kvm/mmu.c | 6 ++++-- >> arch/x86/kvm/svm.c | 32 ++++++++++++++++++++++++++++++++ >> arch/x86/kvm/vmx/vmx.c | 6 ++++++ >> 4 files changed, 44 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >> index 4660ce90de7f..536acf622af9 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -1194,6 +1194,8 @@ struct kvm_x86_ops { >> int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu, >> uint16_t *vmcs_version); >> uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu); >> + >> + bool (*emulate_instruction_possible)(struct kvm_vcpu *vcpu); >> }; >> >> struct kvm_arch_async_pf { >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c >> index da9c42349b1f..8ebc1bfcabd3 100644 >> --- a/arch/x86/kvm/mmu.c >> +++ b/arch/x86/kvm/mmu.c >> @@ -5384,8 +5384,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, >> * page is not present in memory). In those cases we simply restart the >> * guest. >> */ >> - if (unlikely(insn && !insn_len)) >> - return 1; >> + if (unlikely(insn && !insn_len)) { >> + if (!kvm_x86_ops->emulate_instruction_possible(vcpu)) >> + return 1; >> + } > > IMO "insn && !insn_len" belongs in emulate_instruction_possible(). The > cost of the indirect function call can be avoided by making the callback > optional and only attaching it when necessary, e.g. if the CPU is > affected by the errata. E.g.: > Looking at the recent commits, it seems that the preference it to avoid making callback optional. I don't have strong opinion on it. Since most of time the insn_len is *not* zero, IMO we should keep the unlikely macro check here to avoid unnecessary function call. > For brevity, maybe emulation_impossible() as the ops' name? E.g.: > > if (kvm_x86_ops->emulation_impossible && > kvm_x86_ops->emulation_impossible(vcpu, insn, insn_len)) > return 1; > >> >> er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);
On 15/02/19 18:24, Singh, Brijesh wrote: > Errata#1096: > > On a nested data page fault when CR.SMAP=1 and the guest data read > generates a SMAP violation, GuestInstrBytes field of the VMCB on a > VMEXIT will incorrectly return 0h instead the correct guest > instruction bytes . > > Recommend Workaround: > > To determine what instruction the guest was executing the hypervisor > will have to decode the instruction at the instruction pointer. > > The recommended workaround can not be implemented for the SEV > guest because guest memory is encrypted with the guest specific key, > and instruction decoder will not be able to decode the instruction > bytes. If we hit this errata in the SEV guest then log the message > and request a guest shutdown. > > Reported-by: Venkatesh Srinivas <venkateshs@google.com> > Cc: Jim Mattson <jmattson@google.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Joerg Roedel <joro@8bytes.org> > Cc: "Radim Krčmář" <rkrcmar@redhat.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> > --- > > Changes since v1: > * request to shutdown the guest instead of injecting #GP > * use the pr_err_ratelimited to supress the host dmesg > > arch/x86/include/asm/kvm_host.h | 2 ++ > arch/x86/kvm/mmu.c | 6 ++++-- > arch/x86/kvm/svm.c | 32 ++++++++++++++++++++++++++++++++ > arch/x86/kvm/vmx/vmx.c | 6 ++++++ > 4 files changed, 44 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 4660ce90de7f..536acf622af9 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1194,6 +1194,8 @@ struct kvm_x86_ops { > int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu, > uint16_t *vmcs_version); > uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu); > + > + bool (*emulate_instruction_possible)(struct kvm_vcpu *vcpu); > }; > > struct kvm_arch_async_pf { > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index da9c42349b1f..8ebc1bfcabd3 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -5384,8 +5384,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, > * page is not present in memory). In those cases we simply restart the > * guest. > */ > - if (unlikely(insn && !insn_len)) > - return 1; > + if (unlikely(insn && !insn_len)) { > + if (!kvm_x86_ops->emulate_instruction_possible(vcpu)) > + return 1; I am confused. As the comment says, the "return 1" will _skip_ emulation, so if you enter the "if" you are not going to do any emulation in the first place. You could get an infinite loop though. What you want is, methinks, if (unlikely(insn && !insn_len)) { if (!kvm_x86_ops->need_emulation_on_page_fault(vcpu)) return 1; } Is this name accurate? I am okay with keeping the if here; it is unlikely to happen, or even impossible on Intel, so there won't be a performance impact from the (never invoked) callback on Intel. Paolo
On 2/22/19 11:46 AM, Paolo Bonzini wrote: > On 15/02/19 18:24, Singh, Brijesh wrote: >> Errata#1096: >> >> On a nested data page fault when CR.SMAP=1 and the guest data read >> generates a SMAP violation, GuestInstrBytes field of the VMCB on a >> VMEXIT will incorrectly return 0h instead the correct guest >> instruction bytes . >> >> Recommend Workaround: >> >> To determine what instruction the guest was executing the hypervisor >> will have to decode the instruction at the instruction pointer. >> >> The recommended workaround can not be implemented for the SEV >> guest because guest memory is encrypted with the guest specific key, >> and instruction decoder will not be able to decode the instruction >> bytes. If we hit this errata in the SEV guest then log the message >> and request a guest shutdown. >> >> Reported-by: Venkatesh Srinivas <venkateshs@google.com> >> Cc: Jim Mattson <jmattson@google.com> >> Cc: Tom Lendacky <thomas.lendacky@amd.com> >> Cc: Borislav Petkov <bp@alien8.de> >> Cc: Joerg Roedel <joro@8bytes.org> >> Cc: "Radim Krčmář" <rkrcmar@redhat.com> >> Cc: Paolo Bonzini <pbonzini@redhat.com> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> >> --- >> >> Changes since v1: >> * request to shutdown the guest instead of injecting #GP >> * use the pr_err_ratelimited to supress the host dmesg >> >> arch/x86/include/asm/kvm_host.h | 2 ++ >> arch/x86/kvm/mmu.c | 6 ++++-- >> arch/x86/kvm/svm.c | 32 ++++++++++++++++++++++++++++++++ >> arch/x86/kvm/vmx/vmx.c | 6 ++++++ >> 4 files changed, 44 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >> index 4660ce90de7f..536acf622af9 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -1194,6 +1194,8 @@ struct kvm_x86_ops { >> int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu, >> uint16_t *vmcs_version); >> uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu); >> + >> + bool (*emulate_instruction_possible)(struct kvm_vcpu *vcpu); >> }; >> >> struct kvm_arch_async_pf { >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c >> index da9c42349b1f..8ebc1bfcabd3 100644 >> --- a/arch/x86/kvm/mmu.c >> +++ b/arch/x86/kvm/mmu.c >> @@ -5384,8 +5384,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, >> * page is not present in memory). In those cases we simply restart the >> * guest. >> */ >> - if (unlikely(insn && !insn_len)) >> - return 1; >> + if (unlikely(insn && !insn_len)) { >> + if (!kvm_x86_ops->emulate_instruction_possible(vcpu)) >> + return 1; > > I am confused. > > As the comment says, the "return 1" will _skip_ emulation, so if you > enter the "if" you are not going to do any emulation in the first place. > You could get an infinite loop though. > If emulation is required and we are not able to read the instruction bytes then we would like to restart the guest. For this particular errata if its SEV guest then yes we will enter into infinite loop hoping that the KVM_SHUTDOWN request will eventually kill the guest. > What you want is, methinks, > > if (unlikely(insn && !insn_len)) { > if (!kvm_x86_ops->need_emulation_on_page_fault(vcpu)) > return 1; > } > > Is this name accurate? I am okay with keeping the if here; it is > unlikely to happen, or even impossible on Intel, so there won't be a > performance impact from the (never invoked) callback on Intel. > Yes, that seems a bit better name than what I have in the patch.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4660ce90de7f..536acf622af9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1194,6 +1194,8 @@ struct kvm_x86_ops { int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu, uint16_t *vmcs_version); uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu); + + bool (*emulate_instruction_possible)(struct kvm_vcpu *vcpu); }; struct kvm_arch_async_pf { diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index da9c42349b1f..8ebc1bfcabd3 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -5384,8 +5384,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, * page is not present in memory). In those cases we simply restart the * guest. */ - if (unlikely(insn && !insn_len)) - return 1; + if (unlikely(insn && !insn_len)) { + if (!kvm_x86_ops->emulate_instruction_possible(vcpu)) + return 1; + } er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f13a3a24d360..32eba0dd1da6 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -7098,6 +7098,36 @@ static int nested_enable_evmcs(struct kvm_vcpu *vcpu, return -ENODEV; } +static bool emulate_instruction_possible(struct kvm_vcpu *vcpu) +{ + bool is_user, smap; + + is_user = svm_get_cpl(vcpu) == 3; + smap = !kvm_read_cr4_bits(vcpu, X86_CR4_SMAP); + + /* + * Detect and workaround Errata 1096 Fam_17h_00_0Fh + * + * In non SEV guest, hypervisor will be able to read the guest + * memory to decode the instruction pointer when insn_len is zero + * so we return true to indicate that decoding is possible. + * + * But in the SEV guest, the guest memory is encrypted with the + * guest specific key and hypervisor will not be able to decode the + * instruction pointer so we will not able to workaround it. Lets + * print the error and request to kill the guest. + */ + if (is_user && smap) { + if (!sev_guest(vcpu->kvm)) + return true; + + pr_err_ratelimited("KVM: Guest triggered AMD Erratum 1096\n"); + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + } + + return false; +} + static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .cpu_has_kvm_support = has_svm, .disabled_by_bios = is_disabled, @@ -7231,6 +7261,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .nested_enable_evmcs = nested_enable_evmcs, .nested_get_evmcs_version = nested_get_evmcs_version, + + .emulate_instruction_possible = emulate_instruction_possible, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 95d618045001..6d64b7620741 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7530,6 +7530,11 @@ static int enable_smi_window(struct kvm_vcpu *vcpu) return 0; } +static bool emulate_instruction_possible(struct kvm_vcpu *vcpu) +{ + return 0; +} + static __init int hardware_setup(void) { unsigned long host_bndcfgs; @@ -7832,6 +7837,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .set_nested_state = NULL, .get_vmcs12_pages = NULL, .nested_enable_evmcs = NULL, + .emulate_instruction_possible = emulate_instruction_possible, }; static void vmx_cleanup_l1d_flush(void)
Errata#1096: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then log the message and request a guest shutdown. Reported-by: Venkatesh Srinivas <venkateshs@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> --- Changes since v1: * request to shutdown the guest instead of injecting #GP * use the pr_err_ratelimited to supress the host dmesg arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.c | 6 ++++-- arch/x86/kvm/svm.c | 32 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 6 ++++++ 4 files changed, 44 insertions(+), 2 deletions(-)