diff mbox series

KVM: x86: Allow restore of some sregs with protected state

Message ID 20230320225159.92771-1-graf@amazon.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Allow restore of some sregs with protected state | expand

Commit Message

Alexander Graf March 20, 2023, 10:51 p.m. UTC
With protected state (like SEV-ES and SEV-SNP), KVM does not have direct
access to guest registers. However, we deflect modifications to CR0,
CR4 and EFER to the host. We also carry the apic_base register and learn
about CR8 directly from a VMCB field.

That means these bits of information do exist in the host's KVM data
structures. If we ever want to resume consumption of an already
initialized VMSA (guest state), we will need to also restore these
additional bits of KVM state.

Prepare ourselves for such a world by allowing set_sregs to set the
relevant fields. This way, it mirrors get_sregs properly that already
exposes them to user space.

Signed-off-by: Alexander Graf <graf@amazon.com>
---
 arch/x86/kvm/x86.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Sean Christopherson March 21, 2023, 5:03 p.m. UTC | #1
+Peter

On Mon, Mar 20, 2023, Alexander Graf wrote:
> With protected state (like SEV-ES and SEV-SNP), KVM does not have direct
> access to guest registers. However, we deflect modifications to CR0,

Please avoid pronouns in changelogs and comments.

> CR4 and EFER to the host. We also carry the apic_base register and learn
> about CR8 directly from a VMCB field.
> 
> That means these bits of information do exist in the host's KVM data
> structures. If we ever want to resume consumption of an already
> initialized VMSA (guest state), we will need to also restore these
> additional bits of KVM state.

For some definitions of "need".  I've looked at this code multiple times in the
past, and even posted patches[1], but I'm still unconvinced that trapping
CR0, CR4, and EFER updates is necessary[2], which is partly why series to fix
this stalled out.

 : If KVM chugs along happily without these patches, I'd love to pivot and yank out
 : all of the CR0/4/8 and EFER trapping/tracking, and then make KVM_GET_SREGS a nop
 : as well.

[1] https://lore.kernel.org/all/20210507165947.2502412-1-seanjc@google.com
[2] https://lore.kernel.org/all/YJla8vpwqCxqgS8C@google.com

> Prepare ourselves for such a world by allowing set_sregs to set the
> relevant fields. This way, it mirrors get_sregs properly that already
> exposes them to user space.
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> ---
>  arch/x86/kvm/x86.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7713420abab0..88fa8b657a2f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11370,7 +11370,8 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
>  	int idx;
>  	struct desc_ptr dt;
>  
> -	if (!kvm_is_valid_sregs(vcpu, sregs))
> +	if (!vcpu->arch.guest_state_protected &&
> +	    !kvm_is_valid_sregs(vcpu, sregs))

This is broken, userspace shouldn't be allowed to stuff complete garbage just
because guest state is protected.

>  		return -EINVAL;
>  
>  	apic_base_msr.data = sregs->apic_base;
> @@ -11378,8 +11379,19 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
>  	if (kvm_set_apic_base(vcpu, &apic_base_msr))
>  		return -EINVAL;
>  
> -	if (vcpu->arch.guest_state_protected)
> +	if (vcpu->arch.guest_state_protected) {
> +		/*
> +		 * While most actual guest state is hidden from us,
> +		 * CR{0,4,8}, efer and apic_base still hold KVM state
> +		 * with protection enabled, so let's allow restoring
> +		 */
> +		kvm_set_cr8(vcpu, sregs->cr8);
> +		kvm_x86_ops.set_efer(vcpu, sregs->efer);
> +		kvm_x86_ops.set_cr0(vcpu, sregs->cr0);

Use static_call().  This code is also broken in that it doesn't set mmu_reset_needed.
This patch also misses the problematic behavior in svm_set_cr0().

And I don't like having a completely separate one-off flow for protected guests.
It's more code and arguably uglier, but I would prefer to explicitly skip each
chunk as needed so that we aren't effectively maintaining multiple flows.

Easiest thing is probably to just resurrect my aforementioned series, pending the
result of the discussion on whether or not KVM should be trapping CR0/CR4/EFER
writes in the first place.
Peter Gonda March 21, 2023, 6:15 p.m. UTC | #2
On Tue, Mar 21, 2023 at 11:03 AM Sean Christopherson <seanjc@google.com> wrote:
>
> +Peter
>
> On Mon, Mar 20, 2023, Alexander Graf wrote:
> > With protected state (like SEV-ES and SEV-SNP), KVM does not have direct
> > access to guest registers. However, we deflect modifications to CR0,
>
> Please avoid pronouns in changelogs and comments.
>
> > CR4 and EFER to the host. We also carry the apic_base register and learn
> > about CR8 directly from a VMCB field.
> >
> > That means these bits of information do exist in the host's KVM data
> > structures. If we ever want to resume consumption of an already
> > initialized VMSA (guest state), we will need to also restore these
> > additional bits of KVM state.
>
> For some definitions of "need".  I've looked at this code multiple times in the
> past, and even posted patches[1], but I'm still unconvinced that trapping
> CR0, CR4, and EFER updates is necessary[2], which is partly why series to fix
> this stalled out.
>
>  : If KVM chugs along happily without these patches, I'd love to pivot and yank out
>  : all of the CR0/4/8 and EFER trapping/tracking, and then make KVM_GET_SREGS a nop
>  : as well.
>
> [1] https://lore.kernel.org/all/20210507165947.2502412-1-seanjc@google.com
> [2] https://lore.kernel.org/all/YJla8vpwqCxqgS8C@google.com

Yea we are using similar patches to do intra-host migration for SNP VMs.

I have dropped the ball on my AI from that thread. Let me look/test this patch.

>
> > Prepare ourselves for such a world by allowing set_sregs to set the
> > relevant fields. This way, it mirrors get_sregs properly that already
> > exposes them to user space.
> >
> > Signed-off-by: Alexander Graf <graf@amazon.com>
> > ---
> >  arch/x86/kvm/x86.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 7713420abab0..88fa8b657a2f 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -11370,7 +11370,8 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
> >       int idx;
> >       struct desc_ptr dt;
> >
> > -     if (!kvm_is_valid_sregs(vcpu, sregs))
> > +     if (!vcpu->arch.guest_state_protected &&
> > +         !kvm_is_valid_sregs(vcpu, sregs))
>
> This is broken, userspace shouldn't be allowed to stuff complete garbage just
> because guest state is protected.

Agreed, we've been using something like this:

kvm_valid_sregs()
-       if (sregs->fer & EFER_LME && !vcpu->arch.guest_state_protected) {
+       if (sregs->efer & EFER_LME) {


>
> >               return -EINVAL;
> >
> >       apic_base_msr.data = sregs->apic_base;
> > @@ -11378,8 +11379,19 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
> >       if (kvm_set_apic_base(vcpu, &apic_base_msr))
> >               return -EINVAL;
> >
> > -     if (vcpu->arch.guest_state_protected)
> > +     if (vcpu->arch.guest_state_protected) {
> > +             /*
> > +              * While most actual guest state is hidden from us,
> > +              * CR{0,4,8}, efer and apic_base still hold KVM state
> > +              * with protection enabled, so let's allow restoring
> > +              */
> > +             kvm_set_cr8(vcpu, sregs->cr8);
> > +             kvm_x86_ops.set_efer(vcpu, sregs->efer);
> > +             kvm_x86_ops.set_cr0(vcpu, sregs->cr0);
>
> Use static_call().  This code is also broken in that it doesn't set mmu_reset_needed.
> This patch also misses the problematic behavior in svm_set_cr0().
>
> And I don't like having a completely separate one-off flow for protected guests.
> It's more code and arguably uglier, but I would prefer to explicitly skip each
> chunk as needed so that we aren't effectively maintaining multiple flows.
>
> Easiest thing is probably to just resurrect my aforementioned series, pending the
> result of the discussion on whether or not KVM should be trapping CR0/CR4/EFER
> writes in the first place.
Alexander Graf March 21, 2023, 11:46 p.m. UTC | #3
On 21.03.23 19:15, Peter Gonda wrote:

> On Tue, Mar 21, 2023 at 11:03 AM Sean Christopherson <seanjc@google.com> wrote:
>> +Peter
>>
>> On Mon, Mar 20, 2023, Alexander Graf wrote:
>>> With protected state (like SEV-ES and SEV-SNP), KVM does not have direct
>>> access to guest registers. However, we deflect modifications to CR0,
>> Please avoid pronouns in changelogs and comments.
>>
>>> CR4 and EFER to the host. We also carry the apic_base register and learn
>>> about CR8 directly from a VMCB field.
>>>
>>> That means these bits of information do exist in the host's KVM data
>>> structures. If we ever want to resume consumption of an already
>>> initialized VMSA (guest state), we will need to also restore these
>>> additional bits of KVM state.
>> For some definitions of "need".  I've looked at this code multiple times in the
>> past, and even posted patches[1], but I'm still unconvinced that trapping
>> CR0, CR4, and EFER updates is necessary[2], which is partly why series to fix
>> this stalled out.
>>
>>   : If KVM chugs along happily without these patches, I'd love to pivot and yank out
>>   : all of the CR0/4/8 and EFER trapping/tracking, and then make KVM_GET_SREGS a nop
>>   : as well.
>>
>> [1] https://lore.kernel.org/all/20210507165947.2502412-1-seanjc@google.com
>> [2] https://lore.kernel.org/all/YJla8vpwqCxqgS8C@google.com
> Yea we are using similar patches to do intra-host migration for SNP VMs.
>
> I have dropped the ball on my AI from that thread. Let me look/test this patch.


Awesome, thanks. If we can get away without any of the above states and 
make sregs completely useless for protected state, I'd be even happier :)


Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
diff mbox series

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7713420abab0..88fa8b657a2f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11370,7 +11370,8 @@  static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
 	int idx;
 	struct desc_ptr dt;
 
-	if (!kvm_is_valid_sregs(vcpu, sregs))
+	if (!vcpu->arch.guest_state_protected &&
+	    !kvm_is_valid_sregs(vcpu, sregs))
 		return -EINVAL;
 
 	apic_base_msr.data = sregs->apic_base;
@@ -11378,8 +11379,19 @@  static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
 	if (kvm_set_apic_base(vcpu, &apic_base_msr))
 		return -EINVAL;
 
-	if (vcpu->arch.guest_state_protected)
+	if (vcpu->arch.guest_state_protected) {
+		/*
+		 * While most actual guest state is hidden from us,
+		 * CR{0,4,8}, efer and apic_base still hold KVM state
+		 * with protection enabled, so let's allow restoring
+		 */
+		kvm_set_cr8(vcpu, sregs->cr8);
+		kvm_x86_ops.set_efer(vcpu, sregs->efer);
+		kvm_x86_ops.set_cr0(vcpu, sregs->cr0);
+		vcpu->arch.cr0 = sregs->cr0;
+		kvm_x86_ops.set_cr4(vcpu, sregs->cr4);
 		return 0;
+	}
 
 	dt.size = sregs->idt.limit;
 	dt.address = sregs->idt.base;