[RFC,25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES

Message ID e08f56496a52a3a974310fbe05bb19100fd6c1d8.1600114548.git.thomas.lendacky@amd.com (mailing list archive)
State New, archived
Series SEV-ES hypervisor support

Commit Message

Tom Lendacky Sept. 14, 2020, 8:15 p.m. UTC
From: Tom Lendacky <thomas.lendacky@amd.com>

Since many of the registers used by the SEV-ES are encrypted and cannot
be read or written, adjust the __get_sregs() / __set_sregs() to only get
or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
is encrypted.

For __get_sregs(), return the actual value that is in use by the guest
as determined by the write trap support of the registers.

For __set_sregs(), set the arch specific value that KVM believes the guest
to be using. Note, this will not set the guest's actual value so it might
only be useful for such things as live migration.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/x86.c | 56 +++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 23 deletions(-)

Comments

Sean Christopherson Sept. 14, 2020, 9:37 p.m. UTC | #1
On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Since many of the registers used by the SEV-ES are encrypted and cannot
> be read or written, adjust the __get_sregs() / __set_sregs() to only get
> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
> is encrypted.

Is there an actual use case for writing said registers after the VMSA is
encrypted?  Assuming there's a separate "debug mode" and live migration has
special logic, can KVM simply reject the ioctl() if guest state is protected?
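
For illustration, the "simply reject the ioctl()" alternative raised above could look roughly like the following. This is a standalone sketch, not KVM code: `struct vcpu_sketch`, `struct sregs_sketch` and `set_sregs_sketch()` are hypothetical stand-ins, only the `vmsa_encrypted` flag name is borrowed from the patch, and the choice of -EINVAL as the error is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the KVM structures. */
struct sregs_sketch {
	uint64_t cr0, cr4, cr8, efer;
};

struct vcpu_sketch {
	bool vmsa_encrypted;        /* flag name borrowed from the patch */
	struct sregs_sketch cached; /* values KVM believes the guest uses */
};

/*
 * Refuse to set registers once guest state is protected, instead of
 * silently updating only KVM's cached copy. The errno is an assumption.
 */
static int set_sregs_sketch(struct vcpu_sketch *vcpu,
			    const struct sregs_sketch *sregs)
{
	if (vcpu->vmsa_encrypted)
		return -EINVAL;

	vcpu->cached = *sregs;
	return 0;
}
```

With this shape, userspace gets an explicit failure rather than a call that appears to succeed while leaving the guest's real register values untouched.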
Tom Lendacky Sept. 15, 2020, 2:19 p.m. UTC | #2
On 9/14/20 4:37 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Since many of the registers used by the SEV-ES are encrypted and cannot
>> be read or written, adjust the __get_sregs() / __set_sregs() to only get
>> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
>> is encrypted.
> 
> Is there an actual use case for writing said registers after the VMSA is
> encrypted?  Assuming there's a separate "debug mode" and live migration has
> special logic, can KVM simply reject the ioctl() if guest state is protected?

Yeah, I originally had it that way but one of the folks looking at live
migration for SEV-ES thought it would be easier given the way Qemu does
things. But I think it's easy enough to batch the tracking registers into
the VMSA state that is being transferred during live migration. Let me
check that out and likely the SET ioctl() could just skip all the regs.

Thanks,
Tom

Sean Christopherson Sept. 15, 2020, 4:33 p.m. UTC | #3
On Tue, Sep 15, 2020 at 09:19:46AM -0500, Tom Lendacky wrote:
> On 9/14/20 4:37 PM, Sean Christopherson wrote:
> > On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
> >> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>
> >> Since many of the registers used by the SEV-ES are encrypted and cannot
> >> be read or written, adjust the __get_sregs() / __set_sregs() to only get
> >> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
> >> is encrypted.
> > 
> > Is there an actual use case for writing said registers after the VMSA is
> > encrypted?  Assuming there's a separate "debug mode" and live migration has
> > special logic, can KVM simply reject the ioctl() if guest state is protected?
> 
> Yeah, I originally had it that way but one of the folks looking at live
> migration for SEV-ES thought it would be easier given the way Qemu does
> things. But I think it's easy enough to batch the tracking registers into
> the VMSA state that is being transferred during live migration. Let me
> check that out and likely the SET ioctl() could just skip all the regs.

Hmm, that would be ideal.  How are the tracked registers validated when they're
loaded at the destination?  It seems odd/dangerous that KVM would have full
control over efer/cr0/cr4/cr8.  I.e. why is KVM even responsible for migrating
that information, e.g. as opposed to migrating an opaque blob that contains
encrypted versions of those registers?
Tom Lendacky Sept. 15, 2020, 8:37 p.m. UTC | #4
On 9/15/20 11:33 AM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 09:19:46AM -0500, Tom Lendacky wrote:
>> On 9/14/20 4:37 PM, Sean Christopherson wrote:
>>> On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
>>>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>>>
>>>> Since many of the registers used by the SEV-ES are encrypted and cannot
>>>> be read or written, adjust the __get_sregs() / __set_sregs() to only get
>>>> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
>>>> is encrypted.
>>>
>>> Is there an actual use case for writing said registers after the VMSA is
>>> encrypted?  Assuming there's a separate "debug mode" and live migration has
>>> special logic, can KVM simply reject the ioctl() if guest state is protected?
>>
>> Yeah, I originally had it that way but one of the folks looking at live
>> migration for SEV-ES thought it would be easier given the way Qemu does
>> things. But I think it's easy enough to batch the tracking registers into
>> the VMSA state that is being transferred during live migration. Let me
>> check that out and likely the SET ioctl() could just skip all the regs.
> 
> Hmm, that would be ideal.  How are the tracked registers validated when they're
> loaded at the destination?  It seems odd/dangerous that KVM would have full
> control over efer/cr0/cr4/cr8.  I.e. why is KVM even responsible for migrating
> that information, e.g. as opposed to migrating an opaque blob that contains
> encrypted versions of those registers?
> 

KVM doesn't have control of them. They are part of the guest's encrypted
state and that is what the guest uses. KVM can't alter the value that the
guest is using for them once the VMSA is encrypted. However, KVM makes
some decisions based on the values it thinks it knows.  For example, early
on I remember the async PF support failing because the CR0 that KVM
thought the guest had didn't have the PE bit set, even though the guest
was in protected mode. So KVM didn't include the error code in the
exception it injected (is_protmode() was false) and things failed. Without
syncing these values after live migration, things also fail (probably for
the same reason). So the idea is to just keep KVM apprised of the values
that the guest has.

Thanks,
Tom
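
The failure mode described above can be made concrete with a small standalone sketch. It is not KVM code: `struct cr0_view_sketch`, `is_protmode_sketch()` and `queue_exception_sketch()` are hypothetical names, and the model assumes only what the message states, namely that the injection path consults KVM's cached CR0 and omits the error code when CR0.PE appears clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1ULL << 0) /* CR0.PE, protected mode enable, is bit 0 */

/* Hypothetical stand-in: only KVM's cached view of the guest's CR0. */
struct cr0_view_sketch {
	uint64_t cached_cr0;
};

struct injected_exception {
	uint8_t vector;
	bool has_error_code;
	uint32_t error_code;
};

/* Mirrors the idea of is_protmode(): it consults the cached CR0 only. */
static bool is_protmode_sketch(const struct cr0_view_sketch *vcpu)
{
	return vcpu->cached_cr0 & CR0_PE;
}

/* Real-mode exceptions carry no error code, so drop it in that case. */
static struct injected_exception
queue_exception_sketch(const struct cr0_view_sketch *vcpu, uint8_t vector,
		       uint32_t error_code)
{
	struct injected_exception e = {
		.vector = vector,
		.has_error_code = is_protmode_sketch(vcpu),
		.error_code = error_code,
	};

	if (!e.has_error_code)
		e.error_code = 0;
	return e;
}
```

If the cached CR0 is stale (PE clear) while the guest is actually in protected mode, a page fault (vector 14) is queued without its error code, which is the mismatch the message describes; once the cached value is synced, the error code is kept.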
Sean Christopherson Sept. 15, 2020, 10:44 p.m. UTC | #5
On Tue, Sep 15, 2020 at 03:37:21PM -0500, Tom Lendacky wrote:
> On 9/15/20 11:33 AM, Sean Christopherson wrote:
> > On Tue, Sep 15, 2020 at 09:19:46AM -0500, Tom Lendacky wrote:
> >> On 9/14/20 4:37 PM, Sean Christopherson wrote:
> >>> On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
> >>>> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>>>
> >>>> Since many of the registers used by the SEV-ES are encrypted and cannot
> >>>> be read or written, adjust the __get_sregs() / __set_sregs() to only get
> >>>> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
> >>>> is encrypted.
> >>>
> >>> Is there an actual use case for writing said registers after the VMSA is
> >>> encrypted?  Assuming there's a separate "debug mode" and live migration has
> >>> special logic, can KVM simply reject the ioctl() if guest state is protected?
> >>
> >> Yeah, I originally had it that way but one of the folks looking at live
> >> migration for SEV-ES thought it would be easier given the way Qemu does
> >> things. But I think it's easy enough to batch the tracking registers into
> >> the VMSA state that is being transferred during live migration. Let me
> >> check that out and likely the SET ioctl() could just skip all the regs.
> > 
> > Hmm, that would be ideal.  How are the tracked registers validated when they're
> > loaded at the destination?  It seems odd/dangerous that KVM would have full
> > control over efer/cr0/cr4/cr8.  I.e. why is KVM even responsible for migrating
> > that information, e.g. as opposed to migrating an opaque blob that contains
> > encrypted versions of those registers?
> > 
> 
> KVM doesn't have control of them. They are part of the guest's encrypted
> state and that is what the guest uses. KVM can't alter the value that the
> guest is using for them once the VMSA is encrypted. However, KVM makes
> some decisions based on the values it thinks it knows.  For example, early
> on I remember the async PF support failing because the CR0 that KVM
> thought the guest had didn't have the PE bit set, even though the guest
> was in protected mode. So KVM didn't include the error code in the
> exception it injected (is_protmode() was false) and things failed. Without
> syncing these values after live migration, things also fail (probably for
> the same reason). So the idea is to just keep KVM apprised of the values
> that the guest has.

Ah, gotcha.  Migrating tracked state through the VMSA would probably be ideal.
The semantics of __set_sregs() kinda setting state but not reaaaally setting
state would be weird.
Paolo Bonzini Nov. 30, 2020, 6:28 p.m. UTC | #6
On 16/09/20 00:44, Sean Christopherson wrote:
>> KVM doesn't have control of them. They are part of the guest's encrypted
>> state and that is what the guest uses. KVM can't alter the value that the
>> guest is using for them once the VMSA is encrypted. However, KVM makes
>> some decisions based on the values it thinks it knows.  For example, early
>> on I remember the async PF support failing because the CR0 that KVM
>> thought the guest had didn't have the PE bit set, even though the guest
>> was in protected mode. So KVM didn't include the error code in the
>> exception it injected (is_protmode() was false) and things failed. Without
>> syncing these values after live migration, things also fail (probably for
>> the same reason). So the idea is to just keep KVM apprised of the values
>> that the guest has.
> 
> Ah, gotcha.  Migrating tracked state through the VMSA would probably be ideal.
> The semantics of __set_sregs() kinda setting state but not reaaaally setting
> state would be weird.

How would that work with TDX?

Paolo
Sean Christopherson Nov. 30, 2020, 7:39 p.m. UTC | #7
On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 16/09/20 00:44, Sean Christopherson wrote:
> > > KVM doesn't have control of them. They are part of the guest's encrypted
> > > state and that is what the guest uses. KVM can't alter the value that the
> > > guest is using for them once the VMSA is encrypted. However, KVM makes
> > > some decisions based on the values it thinks it knows.  For example, early
> > > on I remember the async PF support failing because the CR0 that KVM
> > > thought the guest had didn't have the PE bit set, even though the guest
> > > was in protected mode. So KVM didn't include the error code in the
> > > exception it injected (is_protmode() was false) and things failed. Without
> > > syncing these values after live migration, things also fail (probably for
> > > the same reason). So the idea is to just keep KVM apprised of the values
> > > that the guest has.
> > 
> > Ah, gotcha.  Migrating tracked state through the VMSA would probably be ideal.
> > The semantics of __set_sregs() kinda setting state but not reaaaally setting
> > state would be weird.
> 
> How would that work with TDX?

Can you elaborate?  I.e. how would what work with TDX?
Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6e445a76b691..76efe70cd635 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9090,6 +9090,9 @@  static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 {
 	struct desc_ptr dt;
 
+	if (vcpu->arch.vmsa_encrypted)
+		goto tracking_regs;
+
 	kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
 	kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
 	kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
@@ -9107,12 +9110,15 @@  static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	sregs->gdt.limit = dt.size;
 	sregs->gdt.base = dt.address;
 
-	sregs->cr0 = kvm_read_cr0(vcpu);
 	sregs->cr2 = vcpu->arch.cr2;
 	sregs->cr3 = kvm_read_cr3(vcpu);
+
+tracking_regs:
+	sregs->cr0 = kvm_read_cr0(vcpu);
 	sregs->cr4 = kvm_read_cr4(vcpu);
 	sregs->cr8 = kvm_get_cr8(vcpu);
 	sregs->efer = vcpu->arch.efer;
+
 	sregs->apic_base = kvm_get_apic_base(vcpu);
 
 	memset(sregs->interrupt_bitmap, 0, sizeof(sregs->interrupt_bitmap));
@@ -9248,18 +9254,6 @@  static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	if (kvm_set_apic_base(vcpu, &apic_base_msr))
 		goto out;
 
-	dt.size = sregs->idt.limit;
-	dt.address = sregs->idt.base;
-	kvm_x86_ops.set_idt(vcpu, &dt);
-	dt.size = sregs->gdt.limit;
-	dt.address = sregs->gdt.base;
-	kvm_x86_ops.set_gdt(vcpu, &dt);
-
-	vcpu->arch.cr2 = sregs->cr2;
-	mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
-	vcpu->arch.cr3 = sregs->cr3;
-	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
-
 	kvm_set_cr8(vcpu, sregs->cr8);
 
 	mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
@@ -9276,6 +9270,14 @@  static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	if (cpuid_update_needed)
 		kvm_update_cpuid_runtime(vcpu);
 
+	if (vcpu->arch.vmsa_encrypted)
+		goto tracking_regs;
+
+	vcpu->arch.cr2 = sregs->cr2;
+	mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
+	vcpu->arch.cr3 = sregs->cr3;
+	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
+
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 	if (is_pae_paging(vcpu)) {
 		load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));
@@ -9283,16 +9285,12 @@  static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	}
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
-	if (mmu_reset_needed)
-		kvm_mmu_reset_context(vcpu);
-
-	max_bits = KVM_NR_INTERRUPTS;
-	pending_vec = find_first_bit(
-		(const unsigned long *)sregs->interrupt_bitmap, max_bits);
-	if (pending_vec < max_bits) {
-		kvm_queue_interrupt(vcpu, pending_vec, false);
-		pr_debug("Set back pending irq %d\n", pending_vec);
-	}
+	dt.size = sregs->idt.limit;
+	dt.address = sregs->idt.base;
+	kvm_x86_ops.set_idt(vcpu, &dt);
+	dt.size = sregs->gdt.limit;
+	dt.address = sregs->gdt.base;
+	kvm_x86_ops.set_gdt(vcpu, &dt);
 
 	kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
 	kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
@@ -9312,6 +9310,18 @@  static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	    !is_protmode(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
+tracking_regs:
+	if (mmu_reset_needed)
+		kvm_mmu_reset_context(vcpu);
+
+	max_bits = KVM_NR_INTERRUPTS;
+	pending_vec = find_first_bit(
+		(const unsigned long *)sregs->interrupt_bitmap, max_bits);
+	if (pending_vec < max_bits) {
+		kvm_queue_interrupt(vcpu, pending_vec, false);
+		pr_debug("Set back pending irq %d\n", pending_vec);
+	}
+
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 	ret = 0;