Message ID | e08f56496a52a3a974310fbe05bb19100fd6c1d8.1600114548.git.thomas.lendacky@amd.com |
---|---|
State | New, archived |
Series | SEV-ES hypervisor support |
On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> Since many of the registers used by the SEV-ES are encrypted and cannot
> be read or written, adjust the __get_sregs() / __set_sregs() to only get
> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
> is encrypted.

Is there an actual use case for writing said registers after the VMSA is
encrypted? Assuming there's a separate "debug mode" and live migration has
special logic, can KVM simply reject the ioctl() if guest state is protected?
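The alternative raised here, rejecting the set request outright once guest state is protected, can be modelled in plain C. This is a minimal userspace sketch, not KVM code: `struct model_vcpu`, `struct model_sregs`, the `guest_state_protected` flag and `model_set_sregs()` are hypothetical names, and the only kernel convention borrowed is returning a negative errno (`-EINVAL`) on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct kvm_sregs. */
struct model_sregs {
	uint64_t cr0, cr3, cr4, cr8, efer;
};

/* Hypothetical vCPU: the hypervisor's copy of the registers plus a flag. */
struct model_vcpu {
	bool guest_state_protected;	/* true once the VMSA is encrypted */
	struct model_sregs regs;
};

/*
 * "Just reject it" policy: once guest state is protected, a
 * SET_SREGS-style request fails and the hypervisor's copy is untouched.
 */
int model_set_sregs(struct model_vcpu *vcpu, const struct model_sregs *sregs)
{
	assert(vcpu != NULL && sregs != NULL);

	if (vcpu->guest_state_protected)
		return -EINVAL;

	vcpu->regs = *sregs;
	return 0;
}
```

The cost of this policy is that the hypervisor's cached efer/cr0/cr4/cr8 can no longer be refreshed through this path, which is the concern Tom raises later in the thread.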
On 9/14/20 4:37 PM, Sean Christopherson wrote:
> Is there an actual use case for writing said registers after the VMSA is
> encrypted? Assuming there's a separate "debug mode" and live migration has
> special logic, can KVM simply reject the ioctl() if guest state is protected?

Yeah, I originally had it that way but one of the folks looking at live
migration for SEV-ES thought it would be easier given the way Qemu does
things. But I think it's easy enough to batch the tracking registers into
the VMSA state that is being transferred during live migration. Let me
check that out and likely the SET ioctl() could just skip all the regs.

Thanks,
Tom
On Tue, Sep 15, 2020 at 09:19:46AM -0500, Tom Lendacky wrote:
> Yeah, I originally had it that way but one of the folks looking at live
> migration for SEV-ES thought it would be easier given the way Qemu does
> things. But I think it's easy enough to batch the tracking registers into
> the VMSA state that is being transferred during live migration. Let me
> check that out and likely the SET ioctl() could just skip all the regs.

Hmm, that would be ideal. How are the tracked registers validated when they're
loaded at the destination? It seems odd/dangerous that KVM would have full
control over efer/cr0/cr4/cr8. I.e. why is KVM even responsible for migrating
that information, e.g. as opposed to migrating an opaque blob that contains
encrypted versions of those registers?
On 9/15/20 11:33 AM, Sean Christopherson wrote:
> Hmm, that would be ideal. How are the tracked registers validated when they're
> loaded at the destination? It seems odd/dangerous that KVM would have full
> control over efer/cr0/cr4/cr8. I.e. why is KVM even responsible for migrating
> that information, e.g. as opposed to migrating an opaque blob that contains
> encrypted versions of those registers?

KVM doesn't have control of them. They are part of the guest's encrypted
state and that is what the guest uses. KVM can't alter the value that the
guest is using for them once the VMSA is encrypted. However, KVM makes
some decisions based on the values it thinks it knows. For example, early
on I remember the async PF support failing because the CR0 that KVM
thought the guest had didn't have the PE bit set, even though the guest
was in protected mode. So KVM didn't include the error code in the
exception it injected (is_protmode() was false) and things failed. Without
syncing these values after live migration, things also fail (probably for
the same reason). So the idea is to just keep KVM apprised of the values
that the guest has.

Thanks,
Tom
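The failure described above can be reduced to a few lines. This is a simplified, hypothetical model rather than KVM's injection path: it relies only on two architectural facts, that CR0.PE is bit 0 and that exceptions delivered in real mode push no error code. `model_is_protmode()` and `model_queue_exception()` are invented stand-ins for `is_protmode()` and KVM's exception queueing, and they only ever see the hypervisor's tracked CR0, never the encrypted one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PE (1ULL << 0)	/* protected-mode enable (local definition) */
#define PF_VECTOR 14		/* page fault, #PF */

/* Hypothetical exception record the hypervisor would hand to the CPU. */
struct model_exception {
	uint8_t vector;
	bool has_error_code;
	uint32_t error_code;
};

/* Stand-in for is_protmode(): it can only consult the tracked CR0. */
bool model_is_protmode(uint64_t tracked_cr0)
{
	return (tracked_cr0 & X86_CR0_PE) != 0;
}

/*
 * Real-mode delivery pushes no error code, so the error code is dropped
 * whenever the *tracked* CR0 says "real mode", even if the guest's
 * encrypted CR0 has PE set.
 */
struct model_exception model_queue_exception(uint64_t tracked_cr0,
					     uint8_t vector, uint32_t error_code)
{
	struct model_exception e = { .vector = vector };

	assert(vector < 32);	/* architectural exception vectors */
	if (model_is_protmode(tracked_cr0)) {
		e.has_error_code = true;
		e.error_code = error_code;
	}
	return e;
}
```

With a stale tracked CR0 of 0, a #PF with error code 0x6 is queued without it; once the tracked value includes PE, the same request carries the error code, although the guest's real CR0 never changed.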
On Tue, Sep 15, 2020 at 03:37:21PM -0500, Tom Lendacky wrote:
> KVM doesn't have control of them. They are part of the guest's encrypted
> state and that is what the guest uses. KVM can't alter the value that the
> guest is using for them once the VMSA is encrypted. However, KVM makes
> some decisions based on the values it thinks it knows. [...] So the idea
> is to just keep KVM apprised of the values that the guest has.

Ah, gotcha. Migrating tracked state through the VMSA would probably be ideal.
The semantics of __set_sregs() kinda setting state but not reaaaally setting
state would be weird.
On 16/09/20 00:44, Sean Christopherson wrote:
> Ah, gotcha. Migrating tracked state through the VMSA would probably be ideal.
> The semantics of __set_sregs() kinda setting state but not reaaaally setting
> state would be weird.

How would that work with TDX?

Paolo
On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 16/09/20 00:44, Sean Christopherson wrote:
> > Ah, gotcha. Migrating tracked state through the VMSA would probably be ideal.
> > The semantics of __set_sregs() kinda setting state but not reaaaally setting
> > state would be weird.
>
> How would that work with TDX?

Can you elaborate? I.e. how would what work with TDX?
```diff
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6e445a76b691..76efe70cd635 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9090,6 +9090,9 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 {
 	struct desc_ptr dt;
 
+	if (vcpu->arch.vmsa_encrypted)
+		goto tracking_regs;
+
 	kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
 	kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
 	kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
@@ -9107,12 +9110,15 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	sregs->gdt.limit = dt.size;
 	sregs->gdt.base = dt.address;
 
-	sregs->cr0 = kvm_read_cr0(vcpu);
 	sregs->cr2 = vcpu->arch.cr2;
 	sregs->cr3 = kvm_read_cr3(vcpu);
+
+tracking_regs:
+	sregs->cr0 = kvm_read_cr0(vcpu);
 	sregs->cr4 = kvm_read_cr4(vcpu);
 	sregs->cr8 = kvm_get_cr8(vcpu);
 	sregs->efer = vcpu->arch.efer;
+
 	sregs->apic_base = kvm_get_apic_base(vcpu);
 
 	memset(sregs->interrupt_bitmap, 0, sizeof(sregs->interrupt_bitmap));
@@ -9248,18 +9254,6 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	if (kvm_set_apic_base(vcpu, &apic_base_msr))
 		goto out;
 
-	dt.size = sregs->idt.limit;
-	dt.address = sregs->idt.base;
-	kvm_x86_ops.set_idt(vcpu, &dt);
-	dt.size = sregs->gdt.limit;
-	dt.address = sregs->gdt.base;
-	kvm_x86_ops.set_gdt(vcpu, &dt);
-
-	vcpu->arch.cr2 = sregs->cr2;
-	mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
-	vcpu->arch.cr3 = sregs->cr3;
-	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
-
 	kvm_set_cr8(vcpu, sregs->cr8);
 
 	mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
@@ -9276,6 +9270,14 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	if (cpuid_update_needed)
 		kvm_update_cpuid_runtime(vcpu);
 
+	if (vcpu->arch.vmsa_encrypted)
+		goto tracking_regs;
+
+	vcpu->arch.cr2 = sregs->cr2;
+	mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
+	vcpu->arch.cr3 = sregs->cr3;
+	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
+
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 	if (is_pae_paging(vcpu)) {
 		load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));
@@ -9283,16 +9285,12 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	}
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
-	if (mmu_reset_needed)
-		kvm_mmu_reset_context(vcpu);
-
-	max_bits = KVM_NR_INTERRUPTS;
-	pending_vec = find_first_bit(
-		(const unsigned long *)sregs->interrupt_bitmap, max_bits);
-	if (pending_vec < max_bits) {
-		kvm_queue_interrupt(vcpu, pending_vec, false);
-		pr_debug("Set back pending irq %d\n", pending_vec);
-	}
+	dt.size = sregs->idt.limit;
+	dt.address = sregs->idt.base;
+	kvm_x86_ops.set_idt(vcpu, &dt);
+	dt.size = sregs->gdt.limit;
+	dt.address = sregs->gdt.base;
+	kvm_x86_ops.set_gdt(vcpu, &dt);
 
 	kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
 	kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
@@ -9312,6 +9310,18 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	    !is_protmode(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
+tracking_regs:
+	if (mmu_reset_needed)
+		kvm_mmu_reset_context(vcpu);
+
+	max_bits = KVM_NR_INTERRUPTS;
+	pending_vec = find_first_bit(
+		(const unsigned long *)sregs->interrupt_bitmap, max_bits);
+	if (pending_vec < max_bits) {
+		kvm_queue_interrupt(vcpu, pending_vec, false);
+		pr_debug("Set back pending irq %d\n", pending_vec);
+	}
+
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 	ret = 0;
```
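The control flow the patch introduces can be exercised outside the kernel. A minimal sketch under stated assumptions: `struct es_vcpu`, `struct es_sregs`, `es_get_sregs()` and `es_set_sregs()` are hypothetical, only a few fields are modelled, and the APIC base, MMU reset, PDPTR load, segment and interrupt handling are omitted. It keeps the patch's `goto tracking_regs` shape: once `vmsa_encrypted` is set, only efer, cr0, cr4 and cr8 move in either direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced register set: tracked and untracked fields. */
struct es_sregs {
	uint64_t cr0, cr4, cr8, efer;	/* transferred even after encryption */
	uint64_t cr2, cr3, idt_base;	/* only transferred before encryption */
};

/* Hypothetical vCPU holding the hypervisor's view of guest registers. */
struct es_vcpu {
	bool vmsa_encrypted;
	struct es_sregs regs;
};

/* Mirrors the patched __get_sregs(): skip untracked state once encrypted. */
void es_get_sregs(const struct es_vcpu *vcpu, struct es_sregs *out)
{
	assert(vcpu != NULL && out != NULL);

	if (vcpu->vmsa_encrypted)
		goto tracking_regs;

	out->cr2 = vcpu->regs.cr2;
	out->cr3 = vcpu->regs.cr3;
	out->idt_base = vcpu->regs.idt_base;

tracking_regs:
	out->cr0 = vcpu->regs.cr0;
	out->cr4 = vcpu->regs.cr4;
	out->cr8 = vcpu->regs.cr8;
	out->efer = vcpu->regs.efer;
}

/* Mirrors the patched __set_sregs(): tracked registers first, then bail. */
int es_set_sregs(struct es_vcpu *vcpu, const struct es_sregs *in)
{
	assert(vcpu != NULL && in != NULL);

	vcpu->regs.cr8 = in->cr8;
	vcpu->regs.efer = in->efer;
	vcpu->regs.cr0 = in->cr0;
	vcpu->regs.cr4 = in->cr4;

	if (vcpu->vmsa_encrypted)
		goto tracking_regs;

	vcpu->regs.cr2 = in->cr2;
	vcpu->regs.cr3 = in->cr3;
	vcpu->regs.idt_base = in->idt_base;

tracking_regs:
	/*
	 * The real __set_sregs() resets the MMU context if needed, queues any
	 * pending interrupt from interrupt_bitmap and raises KVM_REQ_EVENT here.
	 */
	return 0;
}
```

Fields skipped on the encrypted get path keep whatever the caller put in the output structure, so a caller of this model should zero-initialize it. The set path writes the tracked registers before the check, which is what keeps the hypervisor's view of CR0/EFER current for decisions such as `is_protmode()`.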