diff mbox series

[1/2,v2] KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it

Message ID 20200415183047.11493-2-krish.sadhukhan@oracle.com (mailing list archive)
State New, archived
Headers show
Series kvm-unit-test: nVMX: Test Selector and Base Address fields of Guest Segment registers | expand

Commit Message

Krish Sadhukhan April 15, 2020, 6:30 p.m. UTC
Currently, prepare_vmcs02_early() does not check if the "unrestricted guest"
VM-execution control in vmcs12 is turned off and leaves the corresponding
bit on in vmcs02. Due to this setting, vmentry checks which are supposed to
render the nested guest state as invalid when this VM-execution control is
not set, are passing in hardware.

This patch turns off the "unrestricted guest" VM-execution control in vmcs02
if vmcs12 has turned it off.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
---
 arch/x86/kvm/vmx/nested.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Sean Christopherson April 15, 2020, 7:30 p.m. UTC | #1
On Wed, Apr 15, 2020 at 02:30:46PM -0400, Krish Sadhukhan wrote:
> Currently, prepare_vmcs02_early() does not check if the "unrestricted guest"
> VM-execution control in vmcs12 is turned off and leaves the corresponding
> bit on in vmcs02. Due to this setting, vmentry checks which are supposed to
> render the nested guest state as invalid when this VM-execution control is
> not set, are passing in hardware.
> 
> This patch turns off the "unrestricted guest" VM-execution control in vmcs02
> if vmcs12 has turned it off.
> 
> Suggested-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index cbc9ea2de28f..4fa5f8b51c82 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2224,6 +2224,9 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
>  			vmcs_write16(GUEST_INTR_STATUS,
>  				vmcs12->guest_intr_status);
>  
> +		if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
> +			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;

Has anyone worked through all the flows to verify this won't break any
assumptions with respect to enable_unrestricted_guest?  I would be
(pleasantly) surprised if this was sufficient to run L2 without
unrestricted guest when it's enabled for L1, e.g. vmx_set_cr0() looks
suspect.

> +
>  		secondary_exec_controls_set(vmx, exec_control);
>  	}
>  
> -- 
> 2.20.1
>
Jim Mattson April 15, 2020, 8:18 p.m. UTC | #2
On Wed, Apr 15, 2020 at 12:30 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Wed, Apr 15, 2020 at 02:30:46PM -0400, Krish Sadhukhan wrote:
> > Currently, prepare_vmcs02_early() does not check if the "unrestricted guest"
> > VM-execution control in vmcs12 is turned off and leaves the corresponding
> > bit on in vmcs02. Due to this setting, vmentry checks which are supposed to
> > render the nested guest state as invalid when this VM-execution control is
> > not set, are passing in hardware.
> >
> > This patch turns off the "unrestricted guest" VM-execution control in vmcs02
> > if vmcs12 has turned it off.
> >
> > Suggested-by: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index cbc9ea2de28f..4fa5f8b51c82 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2224,6 +2224,9 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> >                       vmcs_write16(GUEST_INTR_STATUS,
> >                               vmcs12->guest_intr_status);
> >
> > +             if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
> > +                     exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;

Better, I think, would be to add SECONDARY_EXEC_UNRESTRICTED_GUEST to
the mask here:

/* Take the following fields only from vmcs12 */
exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
  SECONDARY_EXEC_ENABLE_INVPCID |
  SECONDARY_EXEC_RDTSCP |
  SECONDARY_EXEC_XSAVES |
  SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
  SECONDARY_EXEC_APIC_REGISTER_VIRT |
  SECONDARY_EXEC_ENABLE_VMFUNC);

> Has anyone worked through all the flows to verify this won't break any
> assumptions with respect to enable_unrestricted_guest?  I would be
> (pleasantly) surprised if this was sufficient to run L2 without
> unrestricted guest when it's enabled for L1, e.g. vmx_set_cr0() looks
> suspect.

I think you're right to be concerned.
Paolo Bonzini April 16, 2020, 9:18 a.m. UTC | #3
On 15/04/20 22:18, Jim Mattson wrote:
>> Has anyone worked through all the flows to verify this won't break any
>> assumptions with respect to enable_unrestricted_guest?  I would be
>> (pleasantly) surprised if this was sufficient to run L2 without
>> unrestricted guest when it's enabled for L1, e.g. vmx_set_cr0() looks
>> suspect.
> 
> I think you're right to be concerned.

Thirded, but it shouldn't be too hard.  Basically,
enable_unrestricted_guest must be moved into loaded_vmcs for this to
work.  It may be more work to write the test cases for L2 real mode <->
protected mode switch, which do not entirely fit into the vmx_tests.c
framework (but with the v2 tests it should not be hard to adapt).

Paolo
Krish Sadhukhan April 18, 2020, 1:29 a.m. UTC | #4
On 4/16/20 2:18 AM, Paolo Bonzini wrote:
> On 15/04/20 22:18, Jim Mattson wrote:
>>> Has anyone worked through all the flows to verify this won't break any
>>> assumptions with respect to enable_unrestricted_guest?  I would be
>>> (pleasantly) surprised if this was sufficient to run L2 without
>>> unrestricted guest when it's enabled for L1, e.g. vmx_set_cr0() looks
>>> suspect.
>> I think you're right to be concerned.
> Thirded, but it shouldn't be too hard.  Basically,
> enable_unrestricted_guest must be moved into loaded_vmcs for this to
> work.  It may be more work to write the test cases for L2 real mode <->
> protected mode switch, which do not entirely fit into the vmx_tests.c
> framework (but with the v2 tests it should not be hard to adapt).


OK, I will move enable_unrestricted_guest  to loaded_vmcs.

I also see that enable_ept controls the setting of 
enable_unrestricted_guest. Perhaps both need to be moved to loaded_vmcs ?

About testing, I am thinking the test will first vmlaunch L2 in real 
mode or in protected mode, then vmexit on vmcall and then vmresume in 
the other mode. Is that how the test should flow ?

>
> Paolo
>
Sean Christopherson April 18, 2020, 1:55 a.m. UTC | #5
On Fri, Apr 17, 2020 at 06:29:23PM -0700, Krish Sadhukhan wrote:
> 
> On 4/16/20 2:18 AM, Paolo Bonzini wrote:
> >On 15/04/20 22:18, Jim Mattson wrote:
> >>>Has anyone worked through all the flows to verify this won't break any
> >>>assumptions with respect to enable_unrestricted_guest?  I would be
> >>>(pleasantly) surprised if this was sufficient to run L2 without
> >>>unrestricted guest when it's enabled for L1, e.g. vmx_set_cr0() looks
> >>>suspect.
> >>I think you're right to be concerned.
> >Thirded, but it shouldn't be too hard.  Basically,
> >enable_unrestricted_guest must be moved into loaded_vmcs for this to
> >work.  It may be more work to write the test cases for L2 real mode <->
> >protected mode switch, which do not entirely fit into the vmx_tests.c
> >framework (but with the v2 tests it should not be hard to adapt).
> 
> 
> OK, I will move enable_unrestricted_guest  to loaded_vmcs.

Hmm, enable_unrestricted_guest doesn't _need_ to be moved to loaded_vmcs,
L1 can never diverge from enable_unrestricted_guest.  E.g. the main control
variable can stay global, we just need a flag in nested_vmx to override the
main control.  A simple wrapper can then take care of the check, e.g.

  static inline bool is_unrestricted_guest(struct kvm_vcpu *vcpu)
  {
	return enable_unrestricted_guest && (!is_guest_mode(vcpu) ||
	       to_vmx(vcpu)->nested.unrestricted_guest);
  }

Putting the flag in loaded_vmcs might be more performant?  My guess is it'd
be in the noise, at which point I'd rather have it be clear the override is
only possible/necessary for nested guests.

> I also see that enable_ept controls the setting of
> enable_unrestricted_guest. Perhaps both need to be moved to loaded_vmcs ?

No, letting L1 disable EPT in L0 would be pure insanity, and the overall
paging mode of L2 is already reflected in the MMU.

The dependency on EPT is that VMX requires paging of some form and
unrestricted guest allows entering non-root with CR0.PG=0, i.e. requires EPT
be enabled.

> About testing, I am thinking the test will first vmlaunch L2 in real mode or
> in protected mode, then vmexit on vmcall and then vmresume in the other
> mode. Is that how the test should flow ?
> 
> >
> >Paolo
> >
Paolo Bonzini April 18, 2020, 9:53 a.m. UTC | #6
On 18/04/20 03:55, Sean Christopherson wrote:
> 
>   static inline bool is_unrestricted_guest(struct kvm_vcpu *vcpu)
>   {
> 	return enable_unrestricted_guest && (!is_guest_mode(vcpu) ||
> 	       to_vmx(vcpu)->nested.unrestricted_guest);
>   }
>
> Putting the flag in loaded_vmcs might be more performant?  My guess is it'd
> be in the noise, at which point I'd rather have it be clear the override is
> only possible/necessary for nested guests.

Even better: you can use secondary_exec_controls_get, which does get the
flag from the loaded_vmcs :) but without actually having to add one.

>> I also see that enable_ept controls the setting of
>> enable_unrestricted_guest. Perhaps both need to be moved to loaded_vmcs ?
>
> No, letting L1 disable EPT in L0 would be pure insanity, and the overall
> paging mode of L2 is already reflected in the MMU.

Absolutely.  Unrestricted guest requires EPT, but EPT is invisible to
the guest.  (Currently EPT requires guest MAXPHYADDR = host MAXPHYADDR,
in the sense that the guest can detect that the host is lying about
MAXPHYADDR; but that is really a bug that I hope will be fixed in 5.8,
relaxing the requirement to guest MAXPHYADDR <= host PHYADDR).

Paolo

> The dependency on EPT is that VMX requires paging of some form and
> unrestricted guest allows entering non-root with CR0.PG=0, i.e. requires EPT
> be enabled.
Sean Christopherson April 20, 2020, 3:12 p.m. UTC | #7
On Sat, Apr 18, 2020 at 11:53:36AM +0200, Paolo Bonzini wrote:
> On 18/04/20 03:55, Sean Christopherson wrote:
> > 
> >   static inline bool is_unrestricted_guest(struct kvm_vcpu *vcpu)
> >   {
> > 	return enable_unrestricted_guest && (!is_guest_mode(vcpu) ||
> > 	       to_vmx(vcpu)->nested.unrestricted_guest);
> >   }
> >
> > Putting the flag in loaded_vmcs might be more performant?  My guess is it'd
> > be in the noise, at which point I'd rather have it be clear the override is
> > only possible/necessary for nested guests.
> 
> Even better: you can use secondary_exec_controls_get, which does get the
> flag from the loaded_vmcs :) but without actually having to add one.

I keep forgetting we have those shadows.  Definitely the best solution.
Krish Sadhukhan April 28, 2020, 7:25 a.m. UTC | #8
On 4/18/20 2:53 AM, Paolo Bonzini wrote:
> On 18/04/20 03:55, Sean Christopherson wrote:
>>    static inline bool is_unrestricted_guest(struct kvm_vcpu *vcpu)
>>    {
>> 	return enable_unrestricted_guest && (!is_guest_mode(vcpu) ||
>> 	       to_vmx(vcpu)->nested.unrestricted_guest);
>>    }
>>
>> Putting the flag in loaded_vmcs might be more performant?  My guess is it'd
>> be in the noise, at which point I'd rather have it be clear the override is
>> only possible/necessary for nested guests.
> Even better: you can use secondary_exec_controls_get, which does get the
> flag from the loaded_vmcs :) but without actually having to add one.
>
>>> I also see that enable_ept controls the setting of
>>> enable_unrestricted_guest. Perhaps both need to be moved to loaded_vmcs ?
>> No, letting L1 disable EPT in L0 would be pure insanity, and the overall
>> paging mode of L2 is already reflected in the MMU.
> Absolutely.  Unrestricted guest requires EPT, but EPT is invisible to
> the guest.  (Currently EPT requires guest MAXPHYADDR = host MAXPHYADDR,
> in the sense that the guest can detect that the host is lying about
> MAXPHYADDR; but that is really a bug that I hope will be fixed in 5.8,
> relaxing the requirement to guest MAXPHYADDR <= host PHYADDR).


Should EPT for the nested guest be set up in the normal way (PML4E -> 
PDPTE-> PDE -> PTE) when GUEST_CR0.PE is zero ? Or does it have to be a 
special set up like only the PTEs are needed because no protection and 
no paging are used ?

> Paolo
>
>> The dependency on EPT is that VMX requires paging of some form and
>> unrestricted guest allows entering non-root with CR0.PG=0, i.e. requires EPT
>> be enabled.
Paolo Bonzini April 28, 2020, 8:14 a.m. UTC | #9
On 28/04/20 09:25, Krish Sadhukhan wrote:
>>>
>> Absolutely.  Unrestricted guest requires EPT, but EPT is invisible to
>> the guest.  (Currently EPT requires guest MAXPHYADDR = host MAXPHYADDR,
>> in the sense that the guest can detect that the host is lying about
>> MAXPHYADDR; but that is really a bug that I hope will be fixed in 5.8,
>> relaxing the requirement to guest MAXPHYADDR <= host PHYADDR).
> 
> Should EPT for the nested guest be set up in the normal way (PML4E ->
> PDPTE-> PDE -> PTE) when GUEST_CR0.PE is zero ? Or does it have to be a
> special set up like only the PTEs are needed because no protection and
> no paging are used ?

I don't understand.  When EPT is in use, the vmcs02 CR3 is simply set to
the vmcs12 CR3.

Paolo
Krish Sadhukhan April 28, 2020, 5:38 p.m. UTC | #10
On 4/28/20 1:14 AM, Paolo Bonzini wrote:
> On 28/04/20 09:25, Krish Sadhukhan wrote:
>>> Absolutely.  Unrestricted guest requires EPT, but EPT is invisible to
>>> the guest.  (Currently EPT requires guest MAXPHYADDR = host MAXPHYADDR,
>>> in the sense that the guest can detect that the host is lying about
>>> MAXPHYADDR; but that is really a bug that I hope will be fixed in 5.8,
>>> relaxing the requirement to guest MAXPHYADDR <= host PHYADDR).
>> Should EPT for the nested guest be set up in the normal way (PML4E ->
>> PDPTE-> PDE -> PTE) when GUEST_CR0.PE is zero ? Or does it have to be a
>> special set up like only the PTEs are needed because no protection and
>> no paging are used ?
> I don't understand.  When EPT is in use, the vmcs02 CR3 is simply set to
> the vmcs12 CR3.


Sorry, I should have framed my question in a better way.

My question is  how should L1 in the test code set up EPTP for L2 when 
L2 is an unrestricted guest with no protection (GUEST_CR0.PE = 0) and no 
paging (GUEST_CR0.PG = 0) ? Should EPTP in test code be set up in the 
same way as when L2 is an unrestricted guest with protection and paging 
enabled ?

Getting confused by legacy 16-bit Real Mode and an unrestricted guest in 
Real Mode. :-)
> Paolo
>
Jim Mattson April 28, 2020, 6 p.m. UTC | #11
On Tue, Apr 28, 2020 at 10:39 AM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 4/28/20 1:14 AM, Paolo Bonzini wrote:
> > On 28/04/20 09:25, Krish Sadhukhan wrote:
> >>> Absolutely.  Unrestricted guest requires EPT, but EPT is invisible to
> >>> the guest.  (Currently EPT requires guest MAXPHYADDR = host MAXPHYADDR,
> >>> in the sense that the guest can detect that the host is lying about
> >>> MAXPHYADDR; but that is really a bug that I hope will be fixed in 5.8,
> >>> relaxing the requirement to guest MAXPHYADDR <= host PHYADDR).
> >> Should EPT for the nested guest be set up in the normal way (PML4E ->
> >> PDPTE-> PDE -> PTE) when GUEST_CR0.PE is zero ? Or does it have to be a
> >> special set up like only the PTEs are needed because no protection and
> >> no paging are used ?
> > I don't understand.  When EPT is in use, the vmcs02 CR3 is simply set to
> > the vmcs12 CR3.
>
>
> Sorry, I should have framed my question in a better way.
>
> My question is  how should L1 in the test code set up EPTP for L2 when
> L2 is an unrestricted guest with no protection (GUEST_CR0.PE = 0) and no
> paging (GUEST_CR0.PG = 0) ? Should EPTP in test code be set up in the
> same way as when L2 is an unrestricted guest with protection and paging
> enabled ?

The EPT maps guest physical addresses to host physical addresses. That
mapping is the same regardless of whether the guest is using paging or
not.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cbc9ea2de28f..4fa5f8b51c82 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2224,6 +2224,9 @@  static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 			vmcs_write16(GUEST_INTR_STATUS,
 				vmcs12->guest_intr_status);
 
+		if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
+			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+
 		secondary_exec_controls_set(vmx, exec_control);
 	}