[v2,31/32] KVM: nVMX: Don't flush TLB on nested VM transition with EPT enabled

Message ID 20200317045238.30434-32-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Series KVM: x86: TLB flushing fixes and enhancements

Commit Message

Sean Christopherson March 17, 2020, 4:52 a.m. UTC
Don't flush the TLB if a cached EPTP root can be reused for nested
VM-Enter/VM-Exit.  Each unique EPTP constitutes a different ASID,
cached roots are invalidated on eviction, and KVM invalidates all EPTP
contexts on global events, e.g. mmu_notifier invalidate or memslot
updates.
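
For reference, the skip_tlb_flush argument is consumed in the cached-root
path of fast_cr3_switch(); roughly (simplified, not verbatim, using the
request names as they exist after this series, other details elided):

	if (cached_root_available(vcpu, new_cr3, new_role)) {
		/* Reusing a cached root: only sync/flush if the caller asks. */
		if (!skip_tlb_flush) {
			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
		}
		...
	}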

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu/mmu.c    | 2 +-
 arch/x86/kvm/vmx/nested.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Paolo Bonzini March 17, 2020, 5:18 p.m. UTC | #1
On 17/03/20 05:52, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d816f1366943..a77eab5b0e8a 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1123,7 +1123,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>  	}
>  
>  	if (!nested_ept)
> -		kvm_mmu_new_cr3(vcpu, cr3, false);
> +		kvm_mmu_new_cr3(vcpu, cr3, enable_ept);

Even if enable_ept == false, we could have already scheduled or flushed
the TLB soon due to one of 1) nested_vmx_transition_tlb_flush 2)
vpid_sync_context in prepare_vmcs02 3) the processor doing it for
!enable_vpid.

So for !enable_ept only KVM_REQ_MMU_SYNC is needed, not
KVM_REQ_TLB_FLUSH_CURRENT I think.  Worth adding a TODO?

Paolo

>  	vcpu->arch.cr3 = cr3;
>  	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
> --
> 2.24.1
Sean Christopherson March 17, 2020, 6:22 p.m. UTC | #2
On Tue, Mar 17, 2020 at 06:18:37PM +0100, Paolo Bonzini wrote:
> On 17/03/20 05:52, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index d816f1366943..a77eab5b0e8a 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -1123,7 +1123,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
> >  	}
> >  
> >  	if (!nested_ept)
> > -		kvm_mmu_new_cr3(vcpu, cr3, false);
> > +		kvm_mmu_new_cr3(vcpu, cr3, enable_ept);
> 
> Even if enable_ept == false, we could have already scheduled or flushed
> the TLB soon due to one of 1) nested_vmx_transition_tlb_flush 2)
> vpid_sync_context in prepare_vmcs02 3) the processor doing it for
> !enable_vpid.
> 
> So for !enable_ept only KVM_REQ_MMU_SYNC is needed, not
> KVM_REQ_TLB_FLUSH_CURRENT I think.  Worth adding a TODO?

Now that you point it out, I think it makes sense to unconditionally pass
%true here, i.e. rely 100% on nested_vmx_transition_tlb_flush() to do the
right thing.
Paolo Bonzini March 18, 2020, 10:36 a.m. UTC | #3
On 17/03/20 19:22, Sean Christopherson wrote:
> On Tue, Mar 17, 2020 at 06:18:37PM +0100, Paolo Bonzini wrote:
>> On 17/03/20 05:52, Sean Christopherson wrote:
>>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>>> index d816f1366943..a77eab5b0e8a 100644
>>> --- a/arch/x86/kvm/vmx/nested.c
>>> +++ b/arch/x86/kvm/vmx/nested.c
>>> @@ -1123,7 +1123,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>>  	}
>>>  
>>>  	if (!nested_ept)
>>> -		kvm_mmu_new_cr3(vcpu, cr3, false);
>>> +		kvm_mmu_new_cr3(vcpu, cr3, enable_ept);
>>
>> Even if enable_ept == false, we could have already scheduled or flushed
>> the TLB soon due to one of 1) nested_vmx_transition_tlb_flush 2)
>> vpid_sync_context in prepare_vmcs02 3) the processor doing it for
>> !enable_vpid.
>>
>> So for !enable_ept only KVM_REQ_MMU_SYNC is needed, not
>> KVM_REQ_TLB_FLUSH_CURRENT I think.  Worth adding a TODO?
> 
> Now that you point it out, I think it makes sense to unconditionally pass
> %true here, i.e. rely 100% on nested_vmx_transition_tlb_flush() to do the
> right thing.

Why doesn't it need KVM_REQ_MMU_SYNC either?  All this should be in a
comment as well, of course.

(All patches I didn't comment on look good).

Paolo
Sean Christopherson March 18, 2020, 5:02 p.m. UTC | #4
On Wed, Mar 18, 2020 at 11:36:04AM +0100, Paolo Bonzini wrote:
> On 17/03/20 19:22, Sean Christopherson wrote:
> > On Tue, Mar 17, 2020 at 06:18:37PM +0100, Paolo Bonzini wrote:
> >> On 17/03/20 05:52, Sean Christopherson wrote:
> >>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> >>> index d816f1366943..a77eab5b0e8a 100644
> >>> --- a/arch/x86/kvm/vmx/nested.c
> >>> +++ b/arch/x86/kvm/vmx/nested.c
> >>> @@ -1123,7 +1123,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
> >>>  	}
> >>>  
> >>>  	if (!nested_ept)
> >>> -		kvm_mmu_new_cr3(vcpu, cr3, false);
> >>> +		kvm_mmu_new_cr3(vcpu, cr3, enable_ept);
> >>
> >> Even if enable_ept == false, we could have already scheduled or flushed
> >> the TLB soon due to one of 1) nested_vmx_transition_tlb_flush 2)
> >> vpid_sync_context in prepare_vmcs02 3) the processor doing it for
> >> !enable_vpid.
> >>
> >> So for !enable_ept only KVM_REQ_MMU_SYNC is needed, not
> >> KVM_REQ_TLB_FLUSH_CURRENT I think.  Worth adding a TODO?
> > 
> > Now that you point it out, I think it makes sense to unconditionally pass
> > %true here, i.e. rely 100% on nested_vmx_transition_tlb_flush() to do the
> > right thing.
> 
> Why doesn't it need KVM_REQ_MMU_SYNC either?

Hmm, so if L1 is using VPID, we're ok without a sync.  Junaid's INVVPID
patch earlier in this series ensures cached roots won't retain unsync'd
SPTEs when L1 does INVVPID.  If L1 doesn't flush L2's TLB on VM-Entry, it
can't expect L2 to recognize changes in the PTEs since the last INVVPID.

Per Intel's SDM, INVLPG (and INVPCID) are only required to invalidate
entries for the current VPID, i.e. virtual VPID=0 when executed by L1.

  Operations that architecturally invalidate entries in the TLBs or
  paging-structure caches independent of VMX operation (e.g., the INVLPG and
  INVPCID instructions) invalidate linear mappings and combined mappings.
  They are required to do so only for the current VPID.
                             ^^^^^^^^^^^^^^^^^^^^^^^^^

If L1 isn't using VPID and L0 isn't using EPT, then a sync is required as
L1 would expect PTE changes to be recognized without an explicit INVLPG
prior to VM-Enter.

So something like this?

	if (!nested_ept)
		kvm_mmu_new_cr3(vcpu, cr3, enable_ept ||
					   nested_cpu_has_vpid(vmcs12));

The KVM_REQ_TLB_FLUSH_CURRENT request would be redundant with
nested_vmx_transition_tlb_flush() when VPID is enabled, and is a (big) nop
when VPID is disabled.  In either case the overhead is negligible.  Ideally
this logic would tie into nested_vmx_transition_tlb_flush() in some way,
but making that happen may be wishful thinking.

> All this should be in a comment as well, of course.

Heh, in hindsight that's painfully obvious.
Paolo Bonzini March 18, 2020, 5:11 p.m. UTC | #5
On 18/03/20 18:02, Sean Christopherson wrote:
> So something like this?
> 
> 	if (!nested_ept)
> 		kvm_mmu_new_cr3(vcpu, cr3, enable_ept ||
> 					   nested_cpu_has_vpid(vmcs12));

... which is exactly nested_has_guest_tlb_tag(vcpu).  Well, not exactly
but it's a bug in your code above. :)

It completely makes sense to use that as the third argument, and while a
comment is still needed it will be much smaller.

Paolo
Sean Christopherson March 18, 2020, 5:26 p.m. UTC | #6
On Wed, Mar 18, 2020 at 06:11:28PM +0100, Paolo Bonzini wrote:
> On 18/03/20 18:02, Sean Christopherson wrote:
> > So something like this?
> > 
> > 	if (!nested_ept)
> > 		kvm_mmu_new_cr3(vcpu, cr3, enable_ept ||
> > 					   nested_cpu_has_vpid(vmcs12));
> 
> ... which is exactly nested_has_guest_tlb_tag(vcpu).  Well, not exactly
> but it's a bug in your code above. :)

I don't think it's a bug, it's intentionally different.  When enable_ept=0,
nested_has_guest_tlb_tag() returns true if and only if L1 has enabled VPID
for L2 *and* L2 has been assigned a unique VPID by L0.

For sync purposes, whether or not L2 has been assigned a unique VPID is
irrelevant.  L0 needs to invalidate TLB entries to prevent reusing L1's
entries (assuming L1 has been assigned a VPID), but L0 doesn't need to sync
SPTEs because L2 doesn't expect them to be refreshed.
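
For reference, nested_has_guest_tlb_tag() is roughly (paraphrasing the
version added earlier in this series, not a verbatim copy):

	static bool nested_has_guest_tlb_tag(struct kvm_vcpu *vcpu)
	{
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

		/* L2 has its own TLB tag if EPT is on, or if it got a real VPID. */
		return enable_ept ||
		       (nested_cpu_has_vpid(vmcs12) && to_vmx(vcpu)->nested.vpid02);
	}

The vpid02 check is exactly the "has L2 been assigned a unique VPID by L0"
part that doesn't matter for sync purposes.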

> It completely makes sense to use that as the third argument, and while a
> comment is still needed it will be much smaller.

Ya, agreed.
Paolo Bonzini March 18, 2020, 5:38 p.m. UTC | #7
On 18/03/20 18:26, Sean Christopherson wrote:
>>>
>>> 	if (!nested_ept)
>>> 		kvm_mmu_new_cr3(vcpu, cr3, enable_ept ||
>>> 					   nested_cpu_has_vpid(vmcs12));
>>
>> ... which is exactly nested_has_guest_tlb_tag(vcpu).  Well, not exactly
>> but it's a bug in your code above. :)
>
> I don't think it's a bug, it's intentionally different.  When enable_ept=0,
> nested_has_guest_tlb_tag() returns true if and only if L1 has enabled VPID
> for L2 *and* L2 has been assigned a unique VPID by L0.
> 
> For sync purposes, whether or not L2 has been assigned a unique VPID is
> irrelevant.  L0 needs to invalidate TLB entries to prevent resuing L1's
> entries (assuming L1 has been assigned a VPID), but L0 doesn't need to sync
> SPTEs because L2 doesn't expect them to be refreshed.
                ^^
                L1

Yes you're right.  So I would say keep your code, but we can simplify
the comment.  Something like:

/*
 * We can skip the TLB flush if we have EPT enabled (because...)  and
 * also if L1 is using VPID, because then it doesn't expect SPTEs for L2
 * to be refreshed.
 *
 * This is almost the same as nested_has_guest_tlb_tag(vcpu), but here
 * we don't care if L2 has been assigned a unique VPID; L1 doesn't know,
 * and will nevertheless do INVVPID to avoid reuse of stale page
 * table entries.
 */

Nevertheless it's scary in that this is a potential attack vector for
reusing stale L0 SPTEs, so we should make sure it's all properly commented.

Thanks,

Paolo

>> It completely makes sense to use that as the third argument, and while a
>> comment is still needed it will be much smaller.
> Ya, agreed.
>

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b98482b60748..de2bd07bad7f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5047,7 +5047,7 @@  void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
-	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, false);
+	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, true);
 
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d816f1366943..a77eab5b0e8a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1123,7 +1123,7 @@  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 	}
 
 	if (!nested_ept)
-		kvm_mmu_new_cr3(vcpu, cr3, false);
+		kvm_mmu_new_cr3(vcpu, cr3, enable_ept);
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);