KVM: nSVM: vmentry ignores EFER.LMA and possibly RFLAGS.VM

Message ID 20200709095525.907771-1-pbonzini@redhat.com (mailing list archive)
State New, archived
Series KVM: nSVM: vmentry ignores EFER.LMA and possibly RFLAGS.VM

Commit Message

Paolo Bonzini July 9, 2020, 9:55 a.m. UTC
AMD doesn't specify (unlike Intel) that EFER.LME, CR0.PG and
EFER.LMA must be consistent, and for SMM state restore they say that
"The EFER.LMA register bit is set to the value obtained by logically
ANDing the SMRAM values of EFER.LME, CR0.PG, and CR4.PAE".  It turns
out that this is also true for vmentry: the EFER.LMA value in the VMCB
is completely ignored, and so is EFLAGS.VM if the processor is in
long mode or real mode.

Implement these quirks; the EFER.LMA part is needed because svm_set_efer
looks at the LMA bit in order to support EFER.NX=0, while the EFLAGS.VM
part is just because we can.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

Comments

Jim Mattson July 9, 2020, 5:12 p.m. UTC | #1
On Thu, Jul 9, 2020 at 2:55 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> AMD doesn't specify (unlike Intel) that EFER.LME, CR0.PG and
> EFER.LMA must be consistent, and for SMM state restore they say that
> "The EFER.LMA register bit is set to the value obtained by logically
> ANDing the SMRAM values of EFER.LME, CR0.PG, and CR4.PAE".  It turns
> out that this is also true for vmentry: the EFER.LMA value in the VMCB
> is completely ignored, and so is EFLAGS.VM if the processor is in
> long mode or real mode.
>
> Implement these quirks; the EFER.LMA part is needed because svm_set_efer
> looks at the LMA bit in order to support EFER.NX=0, while the EFLAGS.VM
> part is just because we can.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 402ea5b412f0..1c82a1789e0e 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -337,6 +337,24 @@ static void nested_vmcb_save_pending_event(struct vcpu_svm *svm,
>
>  static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
>  {
> +       u64 efer = nested_vmcb->save.efer;
> +
> +       /* The processor ignores EFER.LMA, but svm_set_efer needs it.  */
> +       efer &= ~EFER_LMA;
> +       if ((nested_vmcb->save.cr0 & X86_CR0_PG)
> +           && (nested_vmcb->save.cr4 & X86_CR4_PAE)
> +           && (efer & EFER_LME))
> +               efer |= EFER_LMA;

The CR4.PAE check is unnecessary, isn't it? The combination CR0.PG=1,
EFER.LMA=1, and CR4.PAE=0 is not a legal processor state.

According to the SDM,

* IA32_EFER.LME cannot be modified while paging is enabled (CR0.PG =
1). Attempts to do so using WRMSR cause a general-protection exception
(#GP(0)).
* Paging cannot be enabled (by setting CR0.PG to 1) while CR4.PAE = 0
and IA32_EFER.LME = 1. Attempts to do so using MOV to CR0 cause a
general-protection exception (#GP(0)).
* CR4.PAE and CR4.LA57 cannot be modified while either 4-level paging
or 5-level paging is in use (when CR0.PG = 1 and IA32_EFER.LME = 1).
Attempts to do so using MOV to CR4 cause a general-protection
exception (#GP(0)).

> +
> +       /*
> +        * Likewise RFLAGS.VM is cleared if inconsistent with other processor
> +        * state.  This is sort-of documented in "10.4 Leaving SMM" but applies
> +        * to SVM as well.
> +        */
> +       if (!(nested_vmcb->save.cr0 & X86_CR0_PE)
> +           || (efer & EFER_LMA))
> +               nested_vmcb->save.rflags &= ~X86_EFLAGS_VM;
> +
>         /* Load the nested guest state */
>         svm->vmcb->save.es = nested_vmcb->save.es;
>         svm->vmcb->save.cs = nested_vmcb->save.cs;
> @@ -345,7 +363,7 @@ static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_v
>         svm->vmcb->save.gdtr = nested_vmcb->save.gdtr;
>         svm->vmcb->save.idtr = nested_vmcb->save.idtr;
>         kvm_set_rflags(&svm->vcpu, nested_vmcb->save.rflags);
> -       svm_set_efer(&svm->vcpu, nested_vmcb->save.efer);
> +       svm_set_efer(&svm->vcpu, efer);
>         svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0);
>         svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
>         (void)kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
> --
> 2.26.2
>
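
For illustration, the illegal combination Jim points out is the kind of
state nested_vmcb_checks can reject up front, along these lines.  This is
a minimal sketch, not the actual KVM code: the helper name is hypothetical,
while EFER_LME, X86_CR0_PG and X86_CR4_PAE are the kernel's existing bit
definitions.

/*
 * Sketch: per AMD's APM vol. 2, section 15.5, EFER.LME and CR0.PG
 * both set while CR4.PAE is zero is illegal VMCB state, so VMRUN
 * fails with exit code -1 instead of entering the guest.
 */
static bool nested_vmcb_long_mode_consistent(struct vmcb *vmcb)
{
	if ((vmcb->save.efer & EFER_LME) &&
	    (vmcb->save.cr0 & X86_CR0_PG) &&
	    !(vmcb->save.cr4 & X86_CR4_PAE))
		return false;

	return true;
}

With such a check in place before nested_prepare_vmcb_save runs, the
CR4.PAE test in the patch above is indeed redundant, which is Jim's point.
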
Paolo Bonzini July 9, 2020, 5:25 p.m. UTC | #2
On 09/07/20 19:12, Jim Mattson wrote:
>> +
>> +       /* The processor ignores EFER.LMA, but svm_set_efer needs it.  */
>> +       efer &= ~EFER_LMA;
>> +       if ((nested_vmcb->save.cr0 & X86_CR0_PG)
>> +           && (nested_vmcb->save.cr4 & X86_CR4_PAE)
>> +           && (efer & EFER_LME))
>> +               efer |= EFER_LMA;
> The CR4.PAE check is unnecessary, isn't it? The combination CR0.PG=1,
> EFER.LMA=1, and CR4.PAE=0 is not a legal processor state.

Yeah, I was being a bit cautious because this is the nested VMCB and it
can be filled in with invalid state, but indeed I added that very
condition to nested_vmcb_checks just yesterday (while reviewing Krish's
CR0/CR3/CR4 reserved bit check series).

That said, the VMCB here is guest memory and it can change under our
feet between nested_vmcb_checks and nested_prepare_vmcb_save.  Copying
the whole save area is overkill, but we probably should copy at least
EFER/CR0/CR3/CR4 in a struct at the beginning of nested_svm_vmrun; this
way there'd be no TOC/TOU issues between nested_vmcb_checks and
nested_svm_vmrun.  This would also make it easier to reuse the checks in
svm_set_nested_state.  Maybe Maxim can look at it while I'm on vacation,
as he's eager to do more nSVM stuff. :D

I'll drop this patch for now.

Thanks for the speedy review!

Paolo

> According to the SDM,
> 
> * IA32_EFER.LME cannot be modified while paging is enabled (CR0.PG =
> 1). Attempts to do so using WRMSR cause a general-protection exception
> (#GP(0)).
> * Paging cannot be enabled (by setting CR0.PG to 1) while CR4.PAE = 0
> and IA32_EFER.LME = 1. Attempts to do so using MOV to CR0 cause a
> general-protection exception (#GP(0)).
> * CR4.PAE and CR4.LA57 cannot be modified while either 4-level paging
> or 5-level paging is in use (when CR0.PG = 1 and IA32_EFER.LME = 1).
> Attempts to do so using MOV to CR4 cause a general-protection
> exception (#GP(0)).
>
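
A minimal sketch of the snapshot Paolo suggests above, assuming only these
four fields need to be stable; the struct and helper names are hypothetical,
but the point stands: guest memory is read exactly once, and everything
after nested_vmcb_checks consumes only the copy.

/* Hypothetical one-shot copy of the checked VMCB fields, taken at the
 * start of nested_svm_vmrun. */
struct nested_vmcb_snapshot {
	u64 efer;
	u64 cr0;
	u64 cr3;
	u64 cr4;
};

static void nested_svm_snapshot(struct nested_vmcb_snapshot *snap,
				struct vmcb *nested_vmcb)
{
	/* READ_ONCE so a concurrent guest write cannot be observed
	 * twice with different values (the TOC/TOU window). */
	snap->efer = READ_ONCE(nested_vmcb->save.efer);
	snap->cr0  = READ_ONCE(nested_vmcb->save.cr0);
	snap->cr3  = READ_ONCE(nested_vmcb->save.cr3);
	snap->cr4  = READ_ONCE(nested_vmcb->save.cr4);
}
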
Jim Mattson July 9, 2020, 6:28 p.m. UTC | #3
On Thu, Jul 9, 2020 at 10:25 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 09/07/20 19:12, Jim Mattson wrote:
> >> +
> >> +       /* The processor ignores EFER.LMA, but svm_set_efer needs it.  */
> >> +       efer &= ~EFER_LMA;
> >> +       if ((nested_vmcb->save.cr0 & X86_CR0_PG)
> >> +           && (nested_vmcb->save.cr4 & X86_CR4_PAE)
> >> +           && (efer & EFER_LME))
> >> +               efer |= EFER_LMA;
> > The CR4.PAE check is unnecessary, isn't it? The combination CR0.PG=1,
> > EFER.LMA=1, and CR4.PAE=0 is not a legal processor state.

Oops, I meant EFER.LME above.

Krish pointed out that I was quoting from Intel's documentation. The
same constraints are covered in Table 14-5 of AMD's APM, volume 2.

> Yeah, I was being a bit cautious because this is the nested VMCB and it
> can be filled in with invalid state, but indeed I added that very
> condition to nested_vmcb_checks just yesterday (while reviewing Krish's
> CR0/CR3/CR4 reserved bit check series).

From "Canonicalization and Consistency Checks" in section 15.5 of AMD's
APM, volume 2:

The following conditions are considered illegal state combinations:
...
EFER.LME and CR0.PG are both set and CR4.PAE is zero.

This VMCB state should result in an immediate #VMEXIT with exit code -1.

> That said, the VMCB here is guest memory and it can change under our
> feet between nested_vmcb_checks and nested_prepare_vmcb_save.  Copying
> the whole save area is overkill, but we probably should copy at least
> EFER/CR0/CR3/CR4 in a struct at the beginning of nested_svm_vmrun; this
> way there'd be no TOC/TOU issues between nested_vmcb_checks and
> nested_svm_vmrun.  This would also make it easier to reuse the checks in
> svm_set_nested_state.  Maybe Maxim can look at it while I'm on vacation,
> as he's eager to do more nSVM stuff. :D

I fear that nested SVM is rife with TOCTTOU issues.
Paolo Bonzini July 9, 2020, 6:31 p.m. UTC | #4
On 09/07/20 20:28, Jim Mattson wrote:
>> That said, the VMCB here is guest memory and it can change under our
>> feet between nested_vmcb_checks and nested_prepare_vmcb_save.  Copying
>> the whole save area is overkill, but we probably should copy at least
>> EFER/CR0/CR3/CR4 in a struct at the beginning of nested_svm_vmrun; this
>> way there'd be no TOC/TOU issues between nested_vmcb_checks and
>> nested_svm_vmrun.  This would also make it easier to reuse the checks in
>> svm_set_nested_state.  Maybe Maxim can look at it while I'm on vacation,
>> as he's eager to do more nSVM stuff. :D
>
> I fear that nested SVM is rife with TOCTTOU issues.

I am pretty sure about that, actually. :)

Another possibility to stomp them in a more efficient manner could be to
rely on the dirty flags, and use them to set up an in-memory copy of the
VMCB.

Paolo
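
A rough sketch of the clean-bits approach Paolo describes; the cache struct
and helper are hypothetical, while the clean field in vmcb_control_area and
the VMCB_CR clean-bit index (covering CR0, CR3, CR4 and EFER) already exist
in KVM's SVM code.

/* Hypothetical cache of the guest VMCB fields KVM consumes. */
struct nested_save_cache {
	u64 efer, cr0, cr3, cr4;
};

static void nested_svm_refresh_cache(struct nested_save_cache *cache,
				     struct vmcb *nested_vmcb)
{
	/*
	 * A set clean bit means L1 left that field group untouched
	 * since the last VMRUN on this VMCB, so the cached copy is
	 * still valid; otherwise re-read the group from guest memory
	 * exactly once.
	 */
	if (!(nested_vmcb->control.clean & (1 << VMCB_CR))) {
		cache->cr0  = nested_vmcb->save.cr0;
		cache->cr3  = nested_vmcb->save.cr3;
		cache->cr4  = nested_vmcb->save.cr4;
		cache->efer = nested_vmcb->save.efer;
	}
	/* ...likewise for the other clean-bit groups. */
}

Architecturally this should be safe to trust: a guest that modifies a field
group and still marks it clean cannot expect the new values to be observed.
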
Jim Mattson July 9, 2020, 6:40 p.m. UTC | #5
On Thu, Jul 9, 2020 at 11:31 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 09/07/20 20:28, Jim Mattson wrote:
> >> That said, the VMCB here is guest memory and it can change under our
> >> feet between nested_vmcb_checks and nested_prepare_vmcb_save.  Copying
> >> the whole save area is overkill, but we probably should copy at least
> >> EFER/CR0/CR3/CR4 in a struct at the beginning of nested_svm_vmrun; this
> >> way there'd be no TOC/TOU issues between nested_vmcb_checks and
> >> nested_svm_vmrun.  This would also make it easier to reuse the checks in
> >> svm_set_nested_state.  Maybe Maxim can look at it while I'm on vacation,
> >> as he's eager to do more nSVM stuff. :D
> >
> > I fear that nested SVM is rife with TOCTTOU issues.
>
> I am pretty sure about that, actually. :)
>
> Another possibility to stomp them in a more efficient manner could be to
> rely on the dirty flags, and use them to set up an in-memory copy of the
> VMCB.

That sounds like a great idea! Is Maxim going to look into that?
Paolo Bonzini July 9, 2020, 6:41 p.m. UTC | #6
On 09/07/20 20:40, Jim Mattson wrote:
> On Thu, Jul 9, 2020 at 11:31 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 09/07/20 20:28, Jim Mattson wrote:
>>>> That said, the VMCB here is guest memory and it can change under our
>>>> feet between nested_vmcb_checks and nested_prepare_vmcb_save.  Copying
>>>> the whole save area is overkill, but we probably should copy at least
>>>> EFER/CR0/CR3/CR4 in a struct at the beginning of nested_svm_vmrun; this
>>>> way there'd be no TOC/TOU issues between nested_vmcb_checks and
>>>> nested_svm_vmrun.  This would also make it easier to reuse the checks in
>>>> svm_set_nested_state.  Maybe Maxim can look at it while I'm on vacation,
>>>> as he's eager to do more nSVM stuff. :D
>>>
>>> I fear that nested SVM is rife with TOCTTOU issues.
>>
>> I am pretty sure about that, actually. :)
>>
>> Another possibility to stomp them in a more efficient manner could be to
>> rely on the dirty flags, and use them to set up an in-memory copy of the
>> VMCB.
> 
> That sounds like a great idea! Is Maxim going to look into that?
> 

Now he is!

Paolo
Maxim Levitsky July 10, 2020, 8:37 a.m. UTC | #7
On Thu, 2020-07-09 at 20:41 +0200, Paolo Bonzini wrote:
> On 09/07/20 20:40, Jim Mattson wrote:
> > On Thu, Jul 9, 2020 at 11:31 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > On 09/07/20 20:28, Jim Mattson wrote:
> > > > > That said, the VMCB here is guest memory and it can change under our
> > > > > feet between nested_vmcb_checks and nested_prepare_vmcb_save.  Copying
> > > > > the whole save area is overkill, but we probably should copy at least
> > > > > EFER/CR0/CR3/CR4 in a struct at the beginning of nested_svm_vmrun; this
> > > > > way there'd be no TOC/TOU issues between nested_vmcb_checks and
> > > > > nested_svm_vmrun.  This would also make it easier to reuse the checks in
> > > > > svm_set_nested_state.  Maybe Maxim can look at it while I'm on vacation,
> > > > > as he's eager to do more nSVM stuff. :D
> > > > 
> > > > I fear that nested SVM is rife with TOCTTOU issues.
> > > 
> > > I am pretty sure about that, actually. :)
> > > 
> > > Another possibility to stomp them in a more efficient manner could be to
> > > rely on the dirty flags, and use them to set up an in-memory copy of the
> > > VMCB.
> > 
> > That sounds like a great idea! Is Maxim going to look into that?
> > 
> 
> Now he is!

Yep :-)

Best regards,
	Maxim Levitsky

> 
> Paolo
>

Patch

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 402ea5b412f0..1c82a1789e0e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -337,6 +337,24 @@  static void nested_vmcb_save_pending_event(struct vcpu_svm *svm,
 
 static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
 {
+	u64 efer = nested_vmcb->save.efer;
+
+	/* The processor ignores EFER.LMA, but svm_set_efer needs it.  */
+	efer &= ~EFER_LMA;
+	if ((nested_vmcb->save.cr0 & X86_CR0_PG)
+	    && (nested_vmcb->save.cr4 & X86_CR4_PAE)
+	    && (efer & EFER_LME))
+		efer |= EFER_LMA;
+
+	/*
+	 * Likewise RFLAGS.VM is cleared if inconsistent with other processor
+	 * state.  This is sort-of documented in "10.4 Leaving SMM" but applies
+	 * to SVM as well.
+	 */
+	if (!(nested_vmcb->save.cr0 & X86_CR0_PE)
+	    || (efer & EFER_LMA))
+		nested_vmcb->save.rflags &= ~X86_EFLAGS_VM;
+
 	/* Load the nested guest state */
 	svm->vmcb->save.es = nested_vmcb->save.es;
 	svm->vmcb->save.cs = nested_vmcb->save.cs;
@@ -345,7 +363,7 @@  static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_v
 	svm->vmcb->save.gdtr = nested_vmcb->save.gdtr;
 	svm->vmcb->save.idtr = nested_vmcb->save.idtr;
 	kvm_set_rflags(&svm->vcpu, nested_vmcb->save.rflags);
-	svm_set_efer(&svm->vcpu, nested_vmcb->save.efer);
+	svm_set_efer(&svm->vcpu, efer);
 	svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0);
 	svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
 	(void)kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);