Message ID | cover.1600114548.git.thomas.lendacky@amd.com (mailing list archive)
---|---
Series | SEV-ES hypervisor support
On Mon, Sep 14, 2020 at 03:15:14PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> This patch series provides support for running SEV-ES guests under KVM.

From the x86/VMX side of things, the GPR hooks are the only changes that I strongly dislike.

For the vmsa_encrypted flag and related things like allow_debug(), I'd really like to aim for a common implementation between SEV-ES and TDX[*] from the get go, within reason obviously. From a code perspective, I don't think it will be too onerous as the basic tenets are quite similar, e.g. guest state is off limits, FPU state is autoswitched, etc..., but I suspect (or maybe worry?) that there are enough minor differences that we'll want a more generic way of marking ioctls() as disallowed to avoid having one-off checks all over the place.

That being said, it may also be that there are some ioctls() that should be disallowed under SEV-ES, but aren't in this series. E.g. I assume kvm_vcpu_ioctl_smi() should be rejected as KVM can't do the necessary emulation (I assume this applies to vanilla SEV as well?).

One thought to try and reconcile the differences between SEV-ES and TDX would be to explicitly list which ioctls() are and aren't supported and go from there? E.g. if there is 95% overlap then we probably don't need to get fancy with generic allow/deny.

Given that we don't yet have publicly available KVM code for TDX, what if I generate and post a list of ioctls() that are denied by either SEV-ES or TDX, organized by the denier(s)? Then for the ioctls() that are denied by one and not the other, we add a brief explanation of why it's denied?

If that sounds ok, I'll get the list and the TDX side of things posted tomorrow.

Thanks!

[*] https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
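To make the "generic allow/deny" idea above concrete, a minimal standalone C sketch (not code from this series or from KVM; the vm_type values, ioctl identifiers and table contents are hypothetical): one per-VM-type table consulted at the top of the ioctl path instead of one-off checks in every handler.

/*
 * Standalone sketch of a generic allow/deny table, not KVM code.
 * The VM types, ioctl identifiers and table contents are hypothetical.
 */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum vm_type { VM_TYPE_LEGACY, VM_TYPE_SEV_ES, VM_TYPE_TDX, VM_TYPE_MAX };

/* Hypothetical ioctl identifiers, for illustration only. */
#define X86_IOCTL_SMI           1
#define X86_IOCTL_GET_REGS      2

struct ioctl_policy {
        unsigned int nr;                /* ioctl being checked */
        bool denied[VM_TYPE_MAX];       /* per-VM-type deny flag */
};

static const struct ioctl_policy policies[] = {
        /*                        LEGACY  SEV-ES  TDX   */
        { X86_IOCTL_SMI,        { false,  true,   true } },
        { X86_IOCTL_GET_REGS,   { false,  false,  true } },
};

/* One table lookup instead of one-off checks scattered across handlers. */
static int ioctl_allowed(enum vm_type type, unsigned int nr)
{
        size_t i;

        for (i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
                if (policies[i].nr == nr)
                        return policies[i].denied[type] ? -EINVAL : 0;

        return 0;       /* unlisted ioctls are allowed for every VM type */
}

int main(void)
{
        printf("SMI on SEV-ES:      %d\n", ioctl_allowed(VM_TYPE_SEV_ES, X86_IOCTL_SMI));
        printf("GET_REGS on SEV-ES: %d\n", ioctl_allowed(VM_TYPE_SEV_ES, X86_IOCTL_GET_REGS));
        printf("GET_REGS on TDX:    %d\n", ioctl_allowed(VM_TYPE_TDX, X86_IOCTL_GET_REGS));
        return 0;
}

The table contents mirror the thread's conclusions (SMI rejected for SEV-ES and TDX, register retrieval blocked only for TDX), but whether a table or per-flag checks is the right mechanism is exactly the open question above.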
On 9/14/20 5:59 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:14PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> This patch series provides support for running SEV-ES guests under KVM.
>
> From the x86/VMX side of things, the GPR hooks are the only changes that I strongly dislike.
>
> For the vmsa_encrypted flag and related things like allow_debug(), I'd really like to aim for a common implementation between SEV-ES and TDX[*] from the get go, within reason obviously. From a code perspective, I don't think it will be too onerous as the basic tenets are quite similar, e.g. guest state is off limits, FPU state is autoswitched, etc..., but I suspect (or maybe worry?) that there are enough minor differences that we'll want a more generic way of marking ioctls() as disallowed to avoid having one-off checks all over the place.
>
> That being said, it may also be that there are some ioctls() that should be disallowed under SEV-ES, but aren't in this series. E.g. I assume kvm_vcpu_ioctl_smi() should be rejected as KVM can't do the necessary emulation (I assume this applies to vanilla SEV as well?).

Right, SMM isn't currently supported under SEV-ES. SEV does support SMM, though, since the register state can be altered to change over to the SMM register state. So the SMI ioctl() is ok for SEV.

>
> One thought to try and reconcile the differences between SEV-ES and TDX would be to explicitly list which ioctls() are and aren't supported and go from there? E.g. if there is 95% overlap then we probably don't need to get fancy with generic allow/deny.
>
> Given that we don't yet have publicly available KVM code for TDX, what if I generate and post a list of ioctls() that are denied by either SEV-ES or TDX, organized by the denier(s)? Then for the ioctls() that are denied by one and not the other, we add a brief explanation of why it's denied?
>
> If that sounds ok, I'll get the list and the TDX side of things posted tomorrow.

That sounds good.

Thanks,
Tom

>
> Thanks!
>
>
> [*] https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
>
On Tue, Sep 15, 2020 at 12:22:05PM -0500, Tom Lendacky wrote:
> On 9/14/20 5:59 PM, Sean Christopherson wrote:
> > On Mon, Sep 14, 2020 at 03:15:14PM -0500, Tom Lendacky wrote:
> >> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>
> >> This patch series provides support for running SEV-ES guests under KVM.
> >
> > From the x86/VMX side of things, the GPR hooks are the only changes that I strongly dislike.
> >
> > For the vmsa_encrypted flag and related things like allow_debug(), I'd really like to aim for a common implementation between SEV-ES and TDX[*] from the get go, within reason obviously. From a code perspective, I don't think it will be too onerous as the basic tenets are quite similar, e.g. guest state is off limits, FPU state is autoswitched, etc..., but I suspect (or maybe worry?) that there are enough minor differences that we'll want a more generic way of marking ioctls() as disallowed to avoid having one-off checks all over the place.
> >
> > That being said, it may also be that there are some ioctls() that should be disallowed under SEV-ES, but aren't in this series. E.g. I assume kvm_vcpu_ioctl_smi() should be rejected as KVM can't do the necessary emulation (I assume this applies to vanilla SEV as well?).
>
> Right, SMM isn't currently supported under SEV-ES. SEV does support SMM, though, since the register state can be altered to change over to the SMM register state. So the SMI ioctl() is ok for SEV.

But isn't guest memory inaccessible for SEV? E.g. how does KVM emulate the save/restore to/from SMRAM?
On 9/15/20 12:32 PM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 12:22:05PM -0500, Tom Lendacky wrote:
>> On 9/14/20 5:59 PM, Sean Christopherson wrote:
>>> On Mon, Sep 14, 2020 at 03:15:14PM -0500, Tom Lendacky wrote:
>>>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>>>
>>>> This patch series provides support for running SEV-ES guests under KVM.
>>>
>>> From the x86/VMX side of things, the GPR hooks are the only changes that I strongly dislike.
>>>
>>> For the vmsa_encrypted flag and related things like allow_debug(), I'd really like to aim for a common implementation between SEV-ES and TDX[*] from the get go, within reason obviously. From a code perspective, I don't think it will be too onerous as the basic tenets are quite similar, e.g. guest state is off limits, FPU state is autoswitched, etc..., but I suspect (or maybe worry?) that there are enough minor differences that we'll want a more generic way of marking ioctls() as disallowed to avoid having one-off checks all over the place.
>>>
>>> That being said, it may also be that there are some ioctls() that should be disallowed under SEV-ES, but aren't in this series. E.g. I assume kvm_vcpu_ioctl_smi() should be rejected as KVM can't do the necessary emulation (I assume this applies to vanilla SEV as well?).
>>
>> Right, SMM isn't currently supported under SEV-ES. SEV does support SMM, though, since the register state can be altered to change over to the SMM register state. So the SMI ioctl() is ok for SEV.
>
> But isn't guest memory inaccessible for SEV? E.g. how does KVM emulate the save/restore to/from SMRAM?

In SEV, to support SMM, the guest BIOS (OVMF) maps the SMM state save area as unencrypted. This allows KVM to access the SMM state save area unencrypted. SVM also provides intercepts for RSM, so KVM does not need to fetch and decode the instruction bytes to know whether the VMEXIT was due to exiting from SMM mode.
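As a rough illustration of the rule described above, a standalone sketch (not OVMF or KVM code; the structure and field names are hypothetical) of the host refusing to touch an SMM state-save area unless the guest mapped it unencrypted:

/*
 * Standalone model of the access rule above, not OVMF/KVM code: the host
 * may read an SMM state-save area only when the guest has mapped it
 * unencrypted (shared).  Structure and field names are hypothetical.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct smm_save_area {
        bool mapped_encrypted;  /* derived from the guest's page encryption bit */
        uint8_t data[512];      /* register state saved on SMI entry */
};

/* Copies the save area out, or fails if the guest kept it encrypted. */
int host_read_smm_state(const struct smm_save_area *sa, uint8_t *out)
{
        if (sa->mapped_encrypted)
                return -EPERM;  /* SEV-ES: register state is never host-readable */

        memcpy(out, sa->data, sizeof(sa->data));
        return 0;               /* plain SEV with OVMF's unencrypted mapping */
}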
On Tue, Sep 15, 2020 at 12:22:05PM -0500, Tom Lendacky wrote:
> On 9/14/20 5:59 PM, Sean Christopherson wrote:
> > Given that we don't yet have publicly available KVM code for TDX, what if I generate and post a list of ioctls() that are denied by either SEV-ES or TDX, organized by the denier(s)? Then for the ioctls() that are denied by one and not the other, we add a brief explanation of why it's denied?
> >
> > If that sounds ok, I'll get the list and the TDX side of things posted tomorrow.
>
> That sounds good.

TDX completely blocks the following ioctl()s:

  kvm_vcpu_ioctl_interrupt
  kvm_vcpu_ioctl_smi
  kvm_vcpu_ioctl_x86_setup_mce
  kvm_vcpu_ioctl_x86_set_mce
  kvm_vcpu_ioctl_x86_get_debugregs
  kvm_vcpu_ioctl_x86_set_debugregs
  kvm_vcpu_ioctl_x86_get_xsave
  kvm_vcpu_ioctl_x86_set_xsave
  kvm_vcpu_ioctl_x86_get_xcrs
  kvm_vcpu_ioctl_x86_set_xcrs
  kvm_arch_vcpu_ioctl_get_regs
  kvm_arch_vcpu_ioctl_set_regs
  kvm_arch_vcpu_ioctl_get_sregs
  kvm_arch_vcpu_ioctl_set_sregs
  kvm_arch_vcpu_ioctl_set_guest_debug
  kvm_arch_vcpu_ioctl_get_fpu
  kvm_arch_vcpu_ioctl_set_fpu

Looking through the code, I think kvm_arch_vcpu_ioctl_get_mpstate() and kvm_arch_vcpu_ioctl_set_mpstate() should also be disallowed, we just haven't actually done so.

There are also two helper functions that are "blocked". dm_request_for_irq_injection() returns false if guest_state_protected, and post_kvm_run_save() shoves dummy state.

TDX also selectively blocks/skips portions of other ioctl()s so that the TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event injection restrictions are due to direct injection not being allowed (except for NMIs); all IRQs have to be routed through APICv (posted interrupts) and exception injection is completely disallowed.

kvm_vcpu_ioctl_x86_get_vcpu_events:
        if (!vcpu->kvm->arch.guest_state_protected)
                events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);

kvm_arch_vcpu_put:
        if (vcpu->preempted && !vcpu->kvm->arch.guest_state_protected)
                vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);

kvm_vcpu_ioctl_x86_set_vcpu_events:
        u32 allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING |
                            KVM_VCPUEVENT_VALID_SIPI_VECTOR |
                            KVM_VCPUEVENT_VALID_SHADOW |
                            KVM_VCPUEVENT_VALID_SMM |
                            KVM_VCPUEVENT_VALID_PAYLOAD;

        if (vcpu->kvm->arch.guest_state_protected)
                allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING;

kvm_arch_vcpu_ioctl_run:
        if (vcpu->kvm->arch.guest_state_protected)
                kvm_sync_valid_fields = KVM_SYNC_X86_EVENTS;
        else
                kvm_sync_valid_fields = KVM_SYNC_X86_VALID_FIELDS;

In addition to the more generic guest_state_protected, we also (obviously tentatively) have a few other flags to deal with aspects of TDX that I'm fairly certain don't apply to SEV-ES:

  tsc_immutable - KVM doesn't have write access to the TSC offset of the guest.

  eoi_intercept_unsupported - KVM can't intercept EOIs (doesn't have access to EOI bitmaps) and so can't support level triggered interrupts, at least not without extra pain.

  readonly_mem_unsupported - Secure EPT (analogous to SNP) requires RWX permissions for all private/encrypted memory. S-EPT isn't optional, so we get the joy of adding this right off the bat...
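The quoted hunks share one shape: a set of allowed capabilities that collapses when guest_state_protected is set. A standalone model of that pattern (placeholder flag names, not the real KVM_VCPUEVENT_* values; only the narrowing logic mirrors the kvm_vcpu_ioctl_x86_set_vcpu_events() hunk above):

/*
 * Standalone model of the flag-narrowing pattern quoted above, not the
 * actual x86.c code.  Flag names and values are placeholders.
 */
#include <stdbool.h>
#include <stdint.h>

#define EVT_VALID_NMI_PENDING   (1u << 0)
#define EVT_VALID_SIPI_VECTOR   (1u << 1)
#define EVT_VALID_SHADOW        (1u << 2)
#define EVT_VALID_SMM           (1u << 3)
#define EVT_VALID_PAYLOAD       (1u << 4)

uint32_t allowed_event_flags(bool guest_state_protected)
{
        /* Everything userspace may normally set... */
        uint32_t allowed = EVT_VALID_NMI_PENDING | EVT_VALID_SIPI_VECTOR |
                           EVT_VALID_SHADOW | EVT_VALID_SMM |
                           EVT_VALID_PAYLOAD;

        /* ...shrinks to "NMIs only" once the guest's state is protected. */
        if (guest_state_protected)
                allowed = EVT_VALID_NMI_PENDING;

        return allowed;
}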
Apologies, Sean. I thought I had replied to this but found it instead in my drafts folder... I've taken much of your feedback and incorporated that into the next version of the patches that I submitted and updated this response based on that, too.

On 9/15/20 7:19 PM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 12:22:05PM -0500, Tom Lendacky wrote:
>> On 9/14/20 5:59 PM, Sean Christopherson wrote:
>>> Given that we don't yet have publicly available KVM code for TDX, what if I generate and post a list of ioctls() that are denied by either SEV-ES or TDX, organized by the denier(s)? Then for the ioctls() that are denied by one and not the other, we add a brief explanation of why it's denied?
>>>
>>> If that sounds ok, I'll get the list and the TDX side of things posted tomorrow.
>>
>> That sounds good.
>
> TDX completely blocks the following ioctl()s:

SEV-ES doesn't need to completely block these ioctls. SEV-SNP is likely to do more of that. SEV-ES will still allow interrupts to be injected, or registers to be retrieved (which will only contain what was provided in the GHCB exchange), etc.

>
>   kvm_vcpu_ioctl_interrupt
>   kvm_vcpu_ioctl_smi
>   kvm_vcpu_ioctl_x86_setup_mce
>   kvm_vcpu_ioctl_x86_set_mce
>   kvm_vcpu_ioctl_x86_get_debugregs
>   kvm_vcpu_ioctl_x86_set_debugregs
>   kvm_vcpu_ioctl_x86_get_xsave
>   kvm_vcpu_ioctl_x86_set_xsave
>   kvm_vcpu_ioctl_x86_get_xcrs
>   kvm_vcpu_ioctl_x86_set_xcrs
>   kvm_arch_vcpu_ioctl_get_regs
>   kvm_arch_vcpu_ioctl_set_regs
>   kvm_arch_vcpu_ioctl_get_sregs
>   kvm_arch_vcpu_ioctl_set_sregs
>   kvm_arch_vcpu_ioctl_set_guest_debug
>   kvm_arch_vcpu_ioctl_get_fpu
>   kvm_arch_vcpu_ioctl_set_fpu

Of the listed ioctls, really the only ones I've updated are:

  kvm_vcpu_ioctl_x86_get_xsave
  kvm_vcpu_ioctl_x86_set_xsave
  kvm_arch_vcpu_ioctl_get_sregs
    This allows reading of the tracking value registers
  kvm_arch_vcpu_ioctl_set_sregs
    This prevents setting of register values
  kvm_arch_vcpu_ioctl_set_guest_debug
  kvm_arch_vcpu_ioctl_get_fpu
  kvm_arch_vcpu_ioctl_set_fpu

>
> Looking through the code, I think kvm_arch_vcpu_ioctl_get_mpstate() and kvm_arch_vcpu_ioctl_set_mpstate() should also be disallowed, we just haven't actually done so.

I haven't done anything with these either.

>
> There are also two helper functions that are "blocked". dm_request_for_irq_injection() returns false if guest_state_protected, and post_kvm_run_save() shoves dummy state.

... and these.

>
> TDX also selectively blocks/skips portions of other ioctl()s so that the TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event injection restrictions are due to direct injection not being allowed (except for NMIs); all IRQs have to be routed through APICv (posted interrupts) and exception injection is completely disallowed.

For SEV-ES, we don't have those restrictions.
>
> kvm_vcpu_ioctl_x86_get_vcpu_events:
>         if (!vcpu->kvm->arch.guest_state_protected)
>                 events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
>
> kvm_arch_vcpu_put:
>         if (vcpu->preempted && !vcpu->kvm->arch.guest_state_protected)
>                 vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);
>
> kvm_vcpu_ioctl_x86_set_vcpu_events:
>         u32 allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING |
>                             KVM_VCPUEVENT_VALID_SIPI_VECTOR |
>                             KVM_VCPUEVENT_VALID_SHADOW |
>                             KVM_VCPUEVENT_VALID_SMM |
>                             KVM_VCPUEVENT_VALID_PAYLOAD;
>
>         if (vcpu->kvm->arch.guest_state_protected)
>                 allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING;
>
> kvm_arch_vcpu_ioctl_run:
>         if (vcpu->kvm->arch.guest_state_protected)
>                 kvm_sync_valid_fields = KVM_SYNC_X86_EVENTS;
>         else
>                 kvm_sync_valid_fields = KVM_SYNC_X86_VALID_FIELDS;
>
> In addition to the more generic guest_state_protected, we also (obviously tentatively) have a few other flags to deal with aspects of TDX that I'm fairly certain don't apply to SEV-ES:
>
>   tsc_immutable - KVM doesn't have write access to the TSC offset of the guest.
>
>   eoi_intercept_unsupported - KVM can't intercept EOIs (doesn't have access to EOI bitmaps) and so can't support level triggered interrupts, at least not without extra pain.
>
>   readonly_mem_unsupported - Secure EPT (analogous to SNP) requires RWX permissions for all private/encrypted memory. S-EPT isn't optional, so we get the joy of adding this right off the bat...

Yes, most of the above stuff doesn't apply to SEV-ES.

Thanks,
Tom

>
On 16/09/20 02:19, Sean Christopherson wrote:
>
> TDX also selectively blocks/skips portions of other ioctl()s so that the TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event injection restrictions are due to direct injection not being allowed (except for NMIs); all IRQs have to be routed through APICv (posted interrupts) and exception injection is completely disallowed.
>
> kvm_vcpu_ioctl_x86_get_vcpu_events:
>         if (!vcpu->kvm->arch.guest_state_protected)
>                 events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);

Perhaps an alternative implementation can enter the vCPU with immediate exit until no events are pending, and then return all zeroes?

Paolo
On 11/30/20 9:31 AM, Paolo Bonzini wrote:
> On 16/09/20 02:19, Sean Christopherson wrote:
>>
>> TDX also selectively blocks/skips portions of other ioctl()s so that the TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event injection restrictions are due to direct injection not being allowed (except for NMIs); all IRQs have to be routed through APICv (posted interrupts) and exception injection is completely disallowed.
>>
>> kvm_vcpu_ioctl_x86_get_vcpu_events:
>>         if (!vcpu->kvm->arch.guest_state_protected)
>>                 events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
>
> Perhaps an alternative implementation can enter the vCPU with immediate exit until no events are pending, and then return all zeroes?

SEV-SNP has support for restricting injections, but SEV-ES does not. Perhaps a new boolean, guest_restricted_injection, can be used instead of basing it on guest_state_protected.

Thanks,
Tom

>
> Paolo
>
On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 16/09/20 02:19, Sean Christopherson wrote:
> >
> > TDX also selectively blocks/skips portions of other ioctl()s so that the TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event injection restrictions are due to direct injection not being allowed (except for NMIs); all IRQs have to be routed through APICv (posted interrupts) and exception injection is completely disallowed.
> >
> > kvm_vcpu_ioctl_x86_get_vcpu_events:
> >         if (!vcpu->kvm->arch.guest_state_protected)
> >                 events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
>
> Perhaps an alternative implementation can enter the vCPU with immediate exit until no events are pending, and then return all zeroes?

This can't work. If the guest has STI blocking, e.g. it did STI->TDVMCALL with a valid vIRQ in GUEST_RVI, then events->interrupt.shadow should technically be non-zero to reflect the STI blocking. But, the immediate exit (a hardware IRQ for TDX guests) will cause VM-Exit before the guest can execute any instructions and thus the guest will never clear STI blocking and never consume the pending event. Or there could be a valid vIRQ, but GUEST_RFLAGS.IF=0, in which case KVM would need to run the guest for an indeterminate amount of time to wait for the vIRQ to be consumed.

Tangentially related, I haven't looked through the official external TDX docs, but I suspect that vmcs.GUEST_RVI is listed as inaccessible for production TDs. This will be changed as the VMM needs access to GUEST_RVI to handle STI->TDVMCALL(HLT), otherwise the VMM may incorrectly put the vCPU into a blocked (not runnable) state even though it has a pending wake event.
On Mon, Nov 30, 2020, Tom Lendacky wrote:
> On 11/30/20 9:31 AM, Paolo Bonzini wrote:
> > On 16/09/20 02:19, Sean Christopherson wrote:
> >>
> >> TDX also selectively blocks/skips portions of other ioctl()s so that the TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event injection restrictions are due to direct injection not being allowed (except for NMIs); all IRQs have to be routed through APICv (posted interrupts) and exception injection is completely disallowed.
> >>
> >> kvm_vcpu_ioctl_x86_get_vcpu_events:
> >>         if (!vcpu->kvm->arch.guest_state_protected)
> >>                 events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
> >
> > Perhaps an alternative implementation can enter the vCPU with immediate exit until no events are pending, and then return all zeroes?
>
> SEV-SNP has support for restricting injections, but SEV-ES does not. Perhaps a new boolean, guest_restricted_injection, can be used instead of basing it on guest_state_protected.

Ya, that probably makes sense. I suspect the easiest way to resolve these conflicts will be to land the SEV-ES series and then tweak things as needed for TDX. Easiest in the sense that it should be fairly obvious what can be covered by guest_state_protected and what needs a dedicated flag.
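A tiny standalone sketch of the two-flag split being converged on here (hypothetical struct and helper names; which KVM paths end up checking which flag is precisely what would be decided once the TDX code lands):

/*
 * Standalone sketch: two independent per-VM flags instead of overloading
 * guest_state_protected for everything.  Names are hypothetical.
 */
#include <errno.h>
#include <stdbool.h>

struct protected_vm_flags {
        bool guest_state_protected;      /* registers/FPU hidden from the host (SEV-ES, TDX) */
        bool guest_restricted_injection; /* direct event injection disallowed (TDX, SEV-SNP) */
};

int may_read_interrupt_shadow(const struct protected_vm_flags *f)
{
        return f->guest_state_protected ? -EPERM : 0;
}

int may_inject_event_directly(const struct protected_vm_flags *f)
{
        return f->guest_restricted_injection ? -EPERM : 0;
}

/* SEV-ES today: state is protected, but injection is still allowed. */
const struct protected_vm_flags sev_es_flags = { true, false };
/* TDX as described earlier in the thread: both restrictions apply. */
const struct protected_vm_flags tdx_flags = { true, true };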
On 30/11/20 19:14, Sean Christopherson wrote:
>>> TDX also selectively blocks/skips portions of other ioctl()s so that the TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event injection restrictions are due to direct injection not being allowed (except for NMIs); all IRQs have to be routed through APICv (posted interrupts) and exception injection is completely disallowed.
>>>
>>> kvm_vcpu_ioctl_x86_get_vcpu_events:
>>>         if (!vcpu->kvm->arch.guest_state_protected)
>>>                 events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
>> Perhaps an alternative implementation can enter the vCPU with immediate exit until no events are pending, and then return all zeroes?
>
> This can't work. If the guest has STI blocking, e.g. it did STI->TDVMCALL with a valid vIRQ in GUEST_RVI, then events->interrupt.shadow should technically be non-zero to reflect the STI blocking. But, the immediate exit (a hardware IRQ for TDX guests) will cause VM-Exit before the guest can execute any instructions and thus the guest will never clear STI blocking and never consume the pending event. Or there could be a valid vIRQ, but GUEST_RFLAGS.IF=0, in which case KVM would need to run the guest for an indeterminate amount of time to wait for the vIRQ to be consumed.

Delayed interrupts are fine, since they are injected according to RVI and the posted interrupt descriptor. I'm thinking more of events (exceptions and interrupts) that caused an EPT violation exit and were recorded in the IDT-vectored info field.

Paolo

> Tangentially related, I haven't looked through the official external TDX docs, but I suspect that vmcs.GUEST_RVI is listed as inaccessible for production TDs. This will be changed as the VMM needs access to GUEST_RVI to handle STI->TDVMCALL(HLT), otherwise the VMM may incorrectly put the vCPU into a blocked (not runnable) state even though it has a pending wake event.
>
+Isaku and Xiaoyao

On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 30/11/20 19:14, Sean Christopherson wrote:
> > > > TDX also selectively blocks/skips portions of other ioctl()s so that the TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event injection restrictions are due to direct injection not being allowed (except for NMIs); all IRQs have to be routed through APICv (posted interrupts) and exception injection is completely disallowed.
> > > >
> > > > kvm_vcpu_ioctl_x86_get_vcpu_events:
> > > >         if (!vcpu->kvm->arch.guest_state_protected)
> > > >                 events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
> > > Perhaps an alternative implementation can enter the vCPU with immediate exit until no events are pending, and then return all zeroes?
> >
> > This can't work. If the guest has STI blocking, e.g. it did STI->TDVMCALL with a valid vIRQ in GUEST_RVI, then events->interrupt.shadow should technically be non-zero to reflect the STI blocking. But, the immediate exit (a hardware IRQ for TDX guests) will cause VM-Exit before the guest can execute any instructions and thus the guest will never clear STI blocking and never consume the pending event. Or there could be a valid vIRQ, but GUEST_RFLAGS.IF=0, in which case KVM would need to run the guest for an indeterminate amount of time to wait for the vIRQ to be consumed.
>
> Delayed interrupts are fine, since they are injected according to RVI and the posted interrupt descriptor. I'm thinking more of events (exceptions and interrupts) that caused an EPT violation exit and were recorded in the IDT-vectored info field.

Ah. As is, I don't believe KVM has access to this information. TDX-Module handles the actual EPT violation, as well as event reinjection. The EPT violation reported by SEAMRET is synthesized, and IIRC the IDT-vectoring field is not readable.

Regardless, is there an actual problem with having a "pending" exception that isn't reported to userspace? Obviously the info needs to be migrated, but that will be taken care of by virtue of migrating the VMCS.
On 30/11/20 20:35, Sean Christopherson wrote:
>> Delayed interrupts are fine, since they are injected according to RVI and the posted interrupt descriptor. I'm thinking more of events (exceptions and interrupts) that caused an EPT violation exit and were recorded in the IDT-vectored info field.
>
> Ah. As is, I don't believe KVM has access to this information. TDX-Module handles the actual EPT violation, as well as event reinjection. The EPT violation reported by SEAMRET is synthesized, and IIRC the IDT-vectoring field is not readable.
>
> Regardless, is there an actual problem with having a "pending" exception that isn't reported to userspace? Obviously the info needs to be migrated, but that will be taken care of by virtue of migrating the VMCS.

No problem, I suppose we would just have to get used to not being able to look into the state of migrated VMs.

Paolo
From: Tom Lendacky <thomas.lendacky@amd.com>

This patch series provides support for running SEV-ES guests under KVM.

Secure Encrypted Virtualization - Encrypted State (SEV-ES) expands on the SEV support to protect the guest register state from the hypervisor. See "AMD64 Architecture Programmer's Manual Volume 2: System Programming", section "15.35 Encrypted State (SEV-ES)" [1].

In order to allow a hypervisor to perform functions on behalf of a guest, there is architectural support for notifying a guest's operating system when certain types of VMEXITs are about to occur. This allows the guest to selectively share information with the hypervisor to satisfy the requested function. The notification is performed using a new exception, the VMM Communication exception (#VC). The information is shared through the Guest-Hypervisor Communication Block (GHCB) using the VMGEXIT instruction. The GHCB format and the protocol for using it is documented in "SEV-ES Guest-Hypervisor Communication Block Standardization" [2].

Under SEV-ES, a vCPU save area (VMSA) must be encrypted. SVM is updated to build the initial VMSA and then encrypt it before running the guest. Once encrypted, it must not be modified by the hypervisor. Modification of the VMSA will result in the VMRUN instruction failing with a SHUTDOWN exit code. KVM must support the VMGEXIT exit code in order to perform the necessary functions required of the guest. The GHCB is used to exchange the information needed by both the hypervisor and the guest.

To simplify access to the VMSA and the GHCB, SVM uses an accessor function to obtain the address of either the VMSA or the GHCB, depending on the stage of execution of the guest.

There are changes to some of the intercepts that are needed under SEV-ES. For example, CR0 writes cannot be intercepted, so the code needs to ensure that the intercept is not enabled during execution or that the hypervisor does not try to read the register as part of exit processing. Another example is shutdown processing, where the vCPU cannot be directly reset.

Support is added to handle VMGEXIT events and implement the GHCB protocol. This ranges from supporting standard exit events, like a CPUID instruction intercept, to new support for things like AP processor booting. Much of the existing SVM intercept support can be re-used by setting the exit code information from the VMGEXIT and calling the appropriate intercept handlers.

Finally, to launch and run an SEV-ES guest requires changes to the vCPU initialization, loading and execution.

[1] https://www.amd.com/system/files/TechDocs/24593.pdf
[2] https://developer.amd.com/wp-content/resources/56421.pdf

---

These patches are based on a commit of the KVM next branch. However, I had to backport recent SEV-ES guest patches (a previous series to the actual patches that are now in the tip tree) into my development branch, since there are prereq patches needed by this series. As a result, this patch series will not successfully build or apply to the KVM next branch as is.
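As a rough model of the VMGEXIT dispatch described in the cover letter above (not the actual svm.c/sev.c implementation; the structures, exit-code value and handler are hypothetical), a VMGEXIT handler can copy the GHCB's exit information into the usual exit fields and fall through to the existing per-exit-code handlers:

/*
 * Standalone sketch of the VMGEXIT dispatch pattern, not KVM code.
 * Structures, exit-code value and handler are hypothetical models.
 */
#include <errno.h>
#include <stdint.h>

struct ghcb_model {
        uint64_t sw_exit_code;          /* exit reason chosen by the guest's #VC handler */
        uint64_t sw_exit_info_1;
        uint64_t sw_exit_info_2;
};

struct vcpu_model {
        uint64_t exit_code;
        uint64_t exit_info_1;
        uint64_t exit_info_2;
};

#define MODEL_EXIT_CPUID        0x72    /* placeholder exit-code value */

static int handle_cpuid(struct vcpu_model *vcpu)
{
        (void)vcpu;                     /* an existing intercept handler would run here */
        return 1;
}

int handle_vmgexit(struct vcpu_model *vcpu, const struct ghcb_model *ghcb)
{
        /* Make the VMGEXIT look like a normal VMEXIT... */
        vcpu->exit_code   = ghcb->sw_exit_code;
        vcpu->exit_info_1 = ghcb->sw_exit_info_1;
        vcpu->exit_info_2 = ghcb->sw_exit_info_2;

        /* ...so the existing per-exit-code handlers can be reused. */
        switch (vcpu->exit_code) {
        case MODEL_EXIT_CPUID:
                return handle_cpuid(vcpu);
        default:
                return -EINVAL;         /* a real implementation validates the GHCB contents */
        }
}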
A version of the tree can be found at:
https://github.com/AMDESE/linux/tree/sev-es-5.8-v3

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>

Tom Lendacky (35):
  KVM: SVM: Remove the call to sev_platform_status() during setup
  KVM: SVM: Add support for SEV-ES capability in KVM
  KVM: SVM: Add indirect access to the VM save area
  KVM: SVM: Make GHCB accessor functions available to the hypervisor
  KVM: SVM: Add initial support for SEV-ES GHCB access to KVM
  KVM: SVM: Add required changes to support intercepts under SEV-ES
  KVM: SVM: Modify DRx register intercepts for an SEV-ES guest
  KVM: SVM: Prevent debugging under SEV-ES
  KVM: SVM: Do not emulate MMIO under SEV-ES
  KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES
  KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file
  KVM: SVM: Add initial support for a VMGEXIT VMEXIT
  KVM: SVM: Create trace events for VMGEXIT processing
  KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002
  KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004
  KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100
  KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
  KVM: SVM: Support MMIO for an SEV-ES guest
  KVM: SVM: Support port IO operations for an SEV-ES guest
  KVM: SVM: Add SEV/SEV-ES support for intercepting INVD
  KVM: SVM: Add support for EFER write traps for an SEV-ES guest
  KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  KVM: SVM: Add support for booting APs for an SEV-ES guest
  KVM: X86: Update kvm_skip_emulated_instruction() for an SEV-ES guest
  KVM: SVM: Add NMI support for an SEV-ES guest
  KVM: SVM: Set the encryption mask for the SVM host save area
  KVM: SVM: Update ASID allocation to support SEV-ES guests
  KVM: SVM: Provide support for SEV-ES vCPU creation/loading
  KVM: SVM: Provide support for SEV-ES vCPU loading
  KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
  KVM: SVM: Provide support to launch and run an SEV-ES guest

 arch/x86/include/asm/kvm_host.h  |  16 +
 arch/x86/include/asm/msr-index.h |   1 +
 arch/x86/include/asm/svm.h       |  35 +-
 arch/x86/include/uapi/asm/svm.h  |  28 ++
 arch/x86/kernel/cpu/vmware.c     |  12 +-
 arch/x86/kvm/Kconfig             |   3 +-
 arch/x86/kvm/cpuid.c             |   1 +
 arch/x86/kvm/kvm_cache_regs.h    |  30 +-
 arch/x86/kvm/mmu/mmu.c           |   7 +
 arch/x86/kvm/svm/nested.c        | 125 +++---
 arch/x86/kvm/svm/sev.c           | 590 ++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.c           | 704 ++++++++++++++++++++++++-------
 arch/x86/kvm/svm/svm.h           | 357 ++++++++++++++--
 arch/x86/kvm/svm/vmenter.S       |  50 +++
 arch/x86/kvm/trace.h             |  99 +++++
 arch/x86/kvm/vmx/vmx.c           |   7 +
 arch/x86/kvm/x86.c               | 357 ++++++++++++++--
 arch/x86/kvm/x86.h               |   8 +
 18 files changed, 2070 insertions(+), 360 deletions(-)