diff mbox series

[v3,04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT

Message ID 20200320212833.3507-5-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Series KVM: x86: TLB flushing fixes and enhancements

Commit Message

Sean Christopherson March 20, 2020, 9:28 p.m. UTC
From: Junaid Shahid <junaids@google.com>

Free all roots when emulating INVVPID for L1 and EPT is disabled, as
outstanding changes to the page tables managed by L1 need to be
recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
because VPID is not tracked by the MMU role, all roots in the current
MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
stale SPTEs.

Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
Signed-off-by: Junaid Shahid <junaids@google.com>
[sean: ported to upstream KVM, reworded the comment and changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Vitaly Kuznetsov March 23, 2020, 3:34 p.m. UTC | #1
Sean Christopherson <sean.j.christopherson@intel.com> writes:

> From: Junaid Shahid <junaids@google.com>
>
> Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> outstanding changes to the page tables managed by L1 need to be
> recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
> because VPID is not tracked by the MMU role, all roots in the current
> MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> stale SPTEs.
>
> Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> Signed-off-by: Junaid Shahid <junaids@google.com>
> [sean: ported to upstream KVM, reworded the comment and changelog]
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 9624cea4ed9f..bc74fbbf33c6 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>  		return kvm_skip_emulated_instruction(vcpu);
>  	}
>  
> +	/*
> +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
> +	 * VPIDs are not tracked in the MMU role.
> +	 *
> +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> +	 * an MMU when EPT is disabled.
> +	 *
> +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
> +	 */
> +	if (!enable_ept)
> +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> +				   KVM_MMU_ROOTS_ALL);
> +

This is related to my remark on the previous patch; the comment above
makes me think I'm missing something obvious, so please enlighten me.

My understanding is that L1 and L2 share arch.root_mmu not only when EPT
is globally disabled: we seem to switch between root_mmu and guest_mmu
only when nested_cpu_has_ept(vmcs12), and different L2 guests may differ
on this. Do we need to handle this somehow?

>  	return nested_vmx_succeed(vcpu);
>  }
Sean Christopherson March 23, 2020, 4:04 p.m. UTC | #2
On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > From: Junaid Shahid <junaids@google.com>
> >
> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> > outstanding changes to the page tables managed by L1 need to be
> > recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
> > because VPID is not tracked by the MMU role, all roots in the current
> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> > stale SPTEs.
> >
> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> > Signed-off-by: Junaid Shahid <junaids@google.com>
> > [sean: ported to upstream KVM, reworded the comment and changelog]
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 9624cea4ed9f..bc74fbbf33c6 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
> >  		return kvm_skip_emulated_instruction(vcpu);
> >  	}
> >  
> > +	/*
> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> > +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
> > +	 * VPIDs are not tracked in the MMU role.
> > +	 *
> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> > +	 * an MMU when EPT is disabled.
> > +	 *
> > +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
> > +	 */
> > +	if (!enable_ept)
> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> > +				   KVM_MMU_ROOTS_ALL);
> > +
> 
> This is related to my remark on the previous patch; the comment above
> makes me think I'm missing something obvious, enlighten me please)
> 
> My understanding is that L1 and L2 will share arch.root_mmu not only
> when EPT is globally disabled, we seem to switch between
> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
> guests may be different on this. Do we need to handle this somehow?

guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
enable_ept is global and cannot be changed without reloading kvm_intel.

This most definitely over-invalidates, e.g. it blasts away L1's page
tables.  But, fixing that requires tracking VPID in mmu_role and/or adding
support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
disabled.  Assuming the vast majority of nested deployments enable EPT in
L0, the cost of both options likely outweighs the benefits.
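
The selection rule described above can be sketched as a small standalone model. This is only an illustration under stated assumptions: the `toy_*` names are hypothetical and are not KVM types or functions, and the two booleans stand in for the global `enable_ept` parameter and for `nested_cpu_has_ept(vmcs12)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM's two MMU contexts; not kernel types. */
enum toy_mmu { TOY_ROOT_MMU, TOY_GUEST_MMU };

/*
 * Which MMU context backs L2 in this toy model.
 * enable_ept: L0's global setting (fixed while kvm_intel is loaded).
 * nested_ept: whether L1 enabled EPT for this particular L2.
 */
static enum toy_mmu toy_mmu_for_l2(bool enable_ept, bool nested_ept)
{
	/* Nested EPT requires enable_ept=1, so this combination is invalid. */
	assert(enable_ept || !nested_ept);

	return nested_ept ? TOY_GUEST_MMU : TOY_ROOT_MMU;
}

/*
 * True when L0 shadows L2's page tables in the MMU it shares with L1,
 * i.e. the only case where INVVPID emulation must also drop the roots.
 */
static bool toy_l0_shadows_l2_in_root_mmu(bool enable_ept, bool nested_ept)
{
	return !enable_ept &&
	       toy_mmu_for_l2(enable_ept, nested_ept) == TOY_ROOT_MMU;
}
```

In this model, `toy_l0_shadows_l2_in_root_mmu(false, false)` is the only input for which the extra root free is needed; with `enable_ept` set, L2 may still run on the root MMU, but nothing is being shadowed there.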

> >  	return nested_vmx_succeed(vcpu);
> >  }
> 
> -- 
> Vitaly
>
Vitaly Kuznetsov March 23, 2020, 4:33 p.m. UTC | #3
Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> 
>> > From: Junaid Shahid <junaids@google.com>
>> >
>> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
>> > outstanding changes to the page tables managed by L1 need to be
>> > recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
>> > because VPID is not tracked by the MMU role, all roots in the current
>> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
>> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
>> > stale SPTEs.
>> >
>> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
>> > Signed-off-by: Junaid Shahid <junaids@google.com>
>> > [sean: ported to upstream KVM, reworded the comment and changelog]
>> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> > ---
>> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>> >  1 file changed, 14 insertions(+)
>> >
>> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> > index 9624cea4ed9f..bc74fbbf33c6 100644
>> > --- a/arch/x86/kvm/vmx/nested.c
>> > +++ b/arch/x86/kvm/vmx/nested.c
>> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>> >  		return kvm_skip_emulated_instruction(vcpu);
>> >  	}
>> >  
>> > +	/*
>> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
>> > +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
>> > +	 * VPIDs are not tracked in the MMU role.
>> > +	 *
>> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
>> > +	 * an MMU when EPT is disabled.
>> > +	 *
>> > +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
>> > +	 */
>> > +	if (!enable_ept)
>> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
>> > +				   KVM_MMU_ROOTS_ALL);
>> > +
>> 
>> This is related to my remark on the previous patch; the comment above
>> makes me think I'm missing something obvious, enlighten me please)
>> 
>> My understanding is that L1 and L2 will share arch.root_mmu not only
>> when EPT is globally disabled, we seem to switch between
>> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
>> guests may be different on this. Do we need to handle this somehow?
>
> guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
> enable_ept is global and cannot be changed without reloading kvm_intel.
>
> This most definitely over-invalidates, e.g. it blasts away L1's page
> tables.  But, fixing that requires tracking VPID in mmu_role and/or adding
> support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
> disabled.  Assuming the vast majority of nested deployments enable EPT in
> L0, the cost of both options likely outweighs the benefits.
>

Yes, but my question was rather: what if the global 'enable_ept' is true
but nested EPT is not being used by L1? Don't we still need to do
kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?
Sean Christopherson March 23, 2020, 4:50 p.m. UTC | #4
On Mon, Mar 23, 2020 at 05:33:08PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> >> 
> >> > From: Junaid Shahid <junaids@google.com>
> >> >
> >> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> >> > outstanding changes to the page tables managed by L1 need to be
> >> > recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
> >> > because VPID is not tracked by the MMU role, all roots in the current
> >> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> >> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> >> > stale SPTEs.
> >> >
> >> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> >> > Signed-off-by: Junaid Shahid <junaids@google.com>
> >> > [sean: ported to upstream KVM, reworded the comment and changelog]
> >> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >> > ---
> >> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
> >> >  1 file changed, 14 insertions(+)
> >> >
> >> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> >> > index 9624cea4ed9f..bc74fbbf33c6 100644
> >> > --- a/arch/x86/kvm/vmx/nested.c
> >> > +++ b/arch/x86/kvm/vmx/nested.c
> >> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
> >> >  		return kvm_skip_emulated_instruction(vcpu);
> >> >  	}
> >> >  
> >> > +	/*
> >> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> >> > +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
> >> > +	 * VPIDs are not tracked in the MMU role.
> >> > +	 *
> >> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> >> > +	 * an MMU when EPT is disabled.
> >> > +	 *
> >> > +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
> >> > +	 */
> >> > +	if (!enable_ept)
> >> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> >> > +				   KVM_MMU_ROOTS_ALL);
> >> > +
> >> 
> >> This is related to my remark on the previous patch; the comment above
> >> makes me think I'm missing something obvious, enlighten me please)
> >> 
> >> My understanding is that L1 and L2 will share arch.root_mmu not only
> >> when EPT is globally disabled, we seem to switch between
> >> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
> >> guests may be different on this. Do we need to handle this somehow?
> >
> > guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
> > enable_ept is global and cannot be changed without reloading kvm_intel.
> >
> > This most definitely over-invalidates, e.g. it blasts away L1's page
> > tables.  But, fixing that requires tracking VPID in mmu_role and/or adding
> > support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
> > disabled.  Assuming the vast majority of nested deployments enable EPT in
> > L0, the cost of both options likely outweighs the benefits.
> >
> 
> Yes but my question rather was: what if global 'enable_ept' is true but
> nested EPT is not being used by L1, don't we still need to do
> kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?

No, because L0 isn't shadowing the L1->L2 page tables, i.e. there can't be
unsync'd SPTEs for L2.  The vpid_sync_*() above flushes the TLB for L2's
effective VPID, which is all that's required.
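
As a rough illustration of the answer above, the two side effects of a successful INVVPID emulation can be modelled with counters. This is a sketch, not the kernel code: `struct toy_vcpu` and `toy_emulate_invvpid()` are hypothetical names, and the counters merely stand in for the `vpid_sync_*()` flush and for `kvm_mmu_free_roots(..., KVM_MMU_ROOTS_ALL)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the side effects of the real calls. */
struct toy_vcpu {
	unsigned int vpid_flushes; /* models vpid_sync_*() on L2's effective VPID */
	unsigned int root_frees;   /* models freeing all roots of root_mmu */
};

/* Model of the successful path only; error handling is left out. */
static void toy_emulate_invvpid(struct toy_vcpu *vcpu, bool enable_ept)
{
	assert(vcpu);

	/* The hardware TLB flush for L2's VPID is needed in every configuration. */
	vcpu->vpid_flushes++;

	/*
	 * Only without EPT does L0 shadow L2's page tables, so only then can
	 * stale or unsynced SPTEs exist that the TLB flush does not cover.
	 */
	if (!enable_ept)
		vcpu->root_frees++;
}
```

With `enable_ept` true the model performs the flush and nothing else, which matches the statement that the VPID flush is all that is required in that case.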
Vitaly Kuznetsov March 23, 2020, 4:57 p.m. UTC | #5
Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Mar 23, 2020 at 05:33:08PM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> 
>> > On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
>> >> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> >> 
>> >> > From: Junaid Shahid <junaids@google.com>
>> >> >
>> >> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
>> >> > outstanding changes to the page tables managed by L1 need to be
>> >> > recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
>> >> > because VPID is not tracked by the MMU role, all roots in the current
>> >> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
>> >> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
>> >> > stale SPTEs.
>> >> >
>> >> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
>> >> > Signed-off-by: Junaid Shahid <junaids@google.com>
>> >> > [sean: ported to upstream KVM, reworded the comment and changelog]
>> >> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> >> > ---
>> >> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>> >> >  1 file changed, 14 insertions(+)
>> >> >
>> >> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> >> > index 9624cea4ed9f..bc74fbbf33c6 100644
>> >> > --- a/arch/x86/kvm/vmx/nested.c
>> >> > +++ b/arch/x86/kvm/vmx/nested.c
>> >> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>> >> >  		return kvm_skip_emulated_instruction(vcpu);
>> >> >  	}
>> >> >  
>> >> > +	/*
>> >> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
>> >> > +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
>> >> > +	 * VPIDs are not tracked in the MMU role.
>> >> > +	 *
>> >> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
>> >> > +	 * an MMU when EPT is disabled.
>> >> > +	 *
>> >> > +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
>> >> > +	 */
>> >> > +	if (!enable_ept)
>> >> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
>> >> > +				   KVM_MMU_ROOTS_ALL);
>> >> > +
>> >> 
>> >> This is related to my remark on the previous patch; the comment above
>> >> makes me think I'm missing something obvious, enlighten me please)
>> >> 
>> >> My understanding is that L1 and L2 will share arch.root_mmu not only
>> >> when EPT is globally disabled, we seem to switch between
>> >> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
>> >> guests may be different on this. Do we need to handle this somehow?
>> >
>> > guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
>> > enable_ept is global and cannot be changed without reloading kvm_intel.
>> >
>> > This most definitely over-invalidates, e.g. it blasts away L1's page
>> > tables.  But, fixing that requires tracking VPID in mmu_role and/or adding
>> > support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
>> > disabled.  Assuming the vast majority of nested deployments enable EPT in
>> > L0, the cost of both options likely outweighs the benefits.
>> >
>> 
>> Yes but my question rather was: what if global 'enable_ept' is true but
>> nested EPT is not being used by L1, don't we still need to do
>> kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?
>
> No, because L0 isn't shadowing the L1->L2 page tables, i.e. there can't be
> unsync'd SPTEs for L2.  The vpid_sync_*() above flushes the TLB for L2's
> effective VPID, which is all that's required.

Ah, stupid me, it's actually EPT and not nested EPT which we care about
here. Thank you for the clarification!

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9624cea4ed9f..bc74fbbf33c6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5250,6 +5250,20 @@  static int handle_invvpid(struct kvm_vcpu *vcpu)
 		return kvm_skip_emulated_instruction(vcpu);
 	}
 
+	/*
+	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
+	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
+	 * VPIDs are not tracked in the MMU role.
+	 *
+	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
+	 * an MMU when EPT is disabled.
+	 *
+	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
+	 */
+	if (!enable_ept)
+		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
+				   KVM_MMU_ROOTS_ALL);
+
 	return nested_vmx_succeed(vcpu);
 }
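
The failure mode named in the changelog (a fast CR3 switch reusing a cached root and consuming stale SPTEs) can be illustrated with a toy root cache. This is a deliberately simplified sketch under stated assumptions: every name below is hypothetical, a single integer stands in for a whole shadow page table, and the cache is keyed on CR3 alone to mirror the point that VPID is not part of the MMU role.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_ROOTS 3

/* One cached shadow root; 'spte' stands in for the whole shadow table. */
struct toy_root {
	bool valid;
	unsigned long cr3; /* guest CR3 this root shadows */
	int spte;          /* shadow copy of one guest PTE */
};

struct toy_root_cache {
	struct toy_root roots[TOY_NUM_ROOTS];
};

/*
 * Load a root for 'cr3'. A cached root is reused as-is (the "fast switch",
 * no sync); otherwise a new root is built from the current guest PTE.
 * Returns the SPTE value the guest would then run with.
 */
static int toy_load_root(struct toy_root_cache *c, unsigned long cr3,
			 int guest_pte)
{
	struct toy_root *slot = NULL;
	size_t i;

	assert(c);
	for (i = 0; i < TOY_NUM_ROOTS; i++) {
		struct toy_root *r = &c->roots[i];

		if (r->valid && r->cr3 == cr3)
			return r->spte; /* fast path: possibly stale */
		if (!r->valid && !slot)
			slot = r;
	}
	if (!slot)
		slot = &c->roots[0]; /* trivial eviction policy for the sketch */

	slot->valid = true;
	slot->cr3 = cr3;
	slot->spte = guest_pte;
	return slot->spte;
}

/* Models freeing every root, forcing a rebuild on the next load. */
static void toy_free_all_roots(struct toy_root_cache *c)
{
	size_t i;

	assert(c);
	for (i = 0; i < TOY_NUM_ROOTS; i++)
		c->roots[i].valid = false;
}
```

In this model, loading the same CR3 after the guest PTE has changed still returns the old value until `toy_free_all_roots()` is called, which is the behavior the added `kvm_mmu_free_roots()` call is meant to rule out when EPT is disabled.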