diff mbox series

KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page

Message ID 20191120143307.59906-1-liran.alon@oracle.com (mailing list archive)
State New, archived
Headers show
Series KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page | expand

Commit Message

Liran Alon Nov. 20, 2019, 2:33 p.m. UTC
According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use
of the INVVPID/INVEPT Instruction, there is a need to execute
INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and
either: "Virtualize APIC accesses" VM-execution control was changed
from 0 to 1, OR the value of apic_access_page was changed.

Examining prepare_vmcs02(), one could note that L0 won't flush
physical TLB only in case: L0 use VPID, L1 use VPID and L0
can guarantee TLB entries populated while running L1 are tagged
differently than TLB entries populated while running L2.
The last condition can only occur if either L0 use EPT or
L0 use different VPID for L1 and L2
(i.e. vmx->vpid != vmx->nested.vpid02).

If L0 use EPT, L0 use different EPTP when running L2 than L1
(Because guest_mode is part of mmu-role) and therefore SDM section
28.3.3.4 doesn't apply. Otherwise, L0 use different VPID when
running L2 than L1 and therefore SDM section 28.3.3.3 doesn't
apply.

Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(),
one could note that L0 won't flush TLB only in cases that doesn't
apply to SDM sections 28.3.3.3 and 28.3.3.4.

Considering the above, there is therefore no need to specifically
flush TLB in case L1 don't use EPT and use "Virtualize APIC accesses".
Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit().

Side-note: This patch can be viewed as removing parts of commit
fb6c81984313 ("kvm: vmx: Flush TLB when the APIC-access address changes”)
that is not relevant anymore since commit
1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”).
i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT,
then L0 will use same EPTP for both L0 and L1. Which indeed required
L0 to execute INVEPT before entering L2 guest. This assumption is
not true anymore since when guest_mode was added to mmu-role.

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 arch/x86/kvm/vmx/nested.c | 7 -------
 1 file changed, 7 deletions(-)

Comments

Paolo Bonzini Nov. 20, 2019, 3:05 p.m. UTC | #1
On 20/11/19 15:33, Liran Alon wrote:
> "Virtualize APIC accesses" VM-execution control was changed
> from 0 to 1, OR the value of apic_access_page was changed.
> 
> Examining prepare_vmcs02(), one could note that L0 won't flush
> physical TLB only in case: L0 use VPID, L1 use VPID and L0
> can guarantee TLB entries populated while running L1 are tagged
> differently than TLB entries populated while running L2.
> The last condition can only occur if either L0 use EPT or
> L0 use different VPID for L1 and L2
> (i.e. vmx->vpid != vmx->nested.vpid02).
> 
> If L0 use EPT, L0 use different EPTP when running L2 than L1
> (Because guest_mode is part of mmu-role) and therefore SDM section
> 28.3.3.4 doesn't apply. Otherwise, L0 use different VPID when
> running L2 than L1 and therefore SDM section 28.3.3.3 doesn't
> apply.

I don't understand this.  You could still have a stale EPTP entry from a
previous L2 vmenter.   If L1 uses neither EPT nor VPID, it expects a TLB
flush to occur on every vmentry, but this won't happen if L0 uses EPT.

Paolo
Liran Alon Nov. 20, 2019, 3:25 p.m. UTC | #2
> On 20 Nov 2019, at 17:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 20/11/19 15:33, Liran Alon wrote:
>> "Virtualize APIC accesses" VM-execution control was changed
>> from 0 to 1, OR the value of apic_access_page was changed.
>> 
>> Examining prepare_vmcs02(), one could note that L0 won't flush
>> physical TLB only in case: L0 use VPID, L1 use VPID and L0
>> can guarantee TLB entries populated while running L1 are tagged
>> differently than TLB entries populated while running L2.
>> The last condition can only occur if either L0 use EPT or
>> L0 use different VPID for L1 and L2
>> (i.e. vmx->vpid != vmx->nested.vpid02).
>> 
>> If L0 use EPT, L0 use different EPTP when running L2 than L1
>> (Because guest_mode is part of mmu-role) and therefore SDM section
>> 28.3.3.4 doesn't apply. Otherwise, L0 use different VPID when
>> running L2 than L1 and therefore SDM section 28.3.3.3 doesn't
>> apply.
> 
> I don't understand this.  You could still have a stale EPTP entry from a
> previous L2 vmenter.   If L1 uses neither EPT nor VPID, it expects a TLB
> flush to occur on every vmentry, but this won't happen if L0 uses EPT.

I don’t seem to get your concern.
In case L1 don’t use VPID, prepare_vmcs02() will request KVM_REQ_TLB_FLUSH.
(As it needs to emulate to L1 that on every L1<->L2 switch, the entire physical TLB is flushed)
As explained in commit message.

> 
> Paolo
Paolo Bonzini Nov. 20, 2019, 5:49 p.m. UTC | #3
On 20/11/19 16:25, Liran Alon wrote:
>>> If L0 use EPT, L0 use different EPTP when running L2 than L1
>>> (Because guest_mode is part of mmu-role) and therefore SDM section
>>> 28.3.3.4 doesn't apply. Otherwise, L0 use different VPID when
>>> running L2 than L1 and therefore SDM section 28.3.3.3 doesn't
>>> apply.
>> I don't understand this.  You could still have a stale EPTP entry from a
>> previous L2 vmenter.   If L1 uses neither EPT nor VPID, it expects a TLB
>> flush to occur on every vmentry, but this won't happen if L0 uses EPT.
> I don’t seem to get your concern.
> In case L1 don’t use VPID, prepare_vmcs02() will request KVM_REQ_TLB_FLUSH.
> (As it needs to emulate to L1 that on every L1<->L2 switch, the entire physical TLB is flushed)
> As explained in commit message.
> 

You're right.  I'll rewrite some parts of the commit message, but the
patch is correct.

Paolo
Paolo Bonzini Nov. 21, 2019, 9:39 a.m. UTC | #4
On 20/11/19 18:49, Paolo Bonzini wrote:
> On 20/11/19 16:25, Liran Alon wrote:
>>>> If L0 use EPT, L0 use different EPTP when running L2 than L1
>>>> (Because guest_mode is part of mmu-role) and therefore SDM section
>>>> 28.3.3.4 doesn't apply. Otherwise, L0 use different VPID when
>>>> running L2 than L1 and therefore SDM section 28.3.3.3 doesn't
>>>> apply.
>>> I don't understand this.  You could still have a stale EPTP entry from a
>>> previous L2 vmenter.   If L1 uses neither EPT nor VPID, it expects a TLB
>>> flush to occur on every vmentry, but this won't happen if L0 uses EPT.
>> I don’t seem to get your concern.
>> In case L1 don’t use VPID, prepare_vmcs02() will request KVM_REQ_TLB_FLUSH.
>> (As it needs to emulate to L1 that on every L1<->L2 switch, the entire physical TLB is flushed)
>> As explained in commit message.
> 
> You're right.  I'll rewrite some parts of the commit message, but the
> patch is correct.

I got confused because of "Otherwise, L0 use different VPID when running
L2 than L1 and therefore SDM section 28.3.3.3 doesn't apply".  In fact
SDM section 28.3.3.3 never applies to nested virtualization because:

* if L1 doesn't use VPID, the architecture requires a TLB flush on
vmentry/vmexit anyway

* if L1 uses VPID, it already has to do an INVVPID.  L0 only needs to
take care of combined mappings, which can be done with INVEPT but also
by using a different EP4TA for L1 and L2, as is the case after 1313cc2bd8f6.

Here is the rewritten commit message:

----
According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use
of the INVVPID/INVEPT Instruction, the hypervisor needs to execute
INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and
either: "Virtualize APIC accesses" VM-execution control was changed
from 0 to 1, OR the value of apic_access_page was changed.

In the nested case, the burden falls on L1, unless L0 enables EPT in
vmcs02 but L1 enables neither EPT nor VPID in vmcs12.  For this reason
prepare_vmcs02() and load_vmcs12_host_state() have special code to
request a TLB flush in case L1 does not use EPT but it uses
"virtualize APIC accesses".

This special case however is not necessary. On a nested vmentry the
physical TLB will already be flushed except if all the following apply:

* L0 uses VPID

* L1 uses VPID

* L0 can guarantee TLB entries populated while running L1 are tagged
differently than TLB entries populated while running L2.

If the first condition is false, the processor will flush the TLB
on vmentry to L2.  If the second or third condition are false,
prepare_vmcs02() will request KVM_REQ_TLB_FLUSH.  However, even
if both are true, no extra TLB flush is needed to handle the APIC
access page:

* if L1 doesn't use VPID, the second condition doesn't hold and the
TLB will be flushed anyway.

* if L1 uses VPID, it has to flush the TLB itself with INVVPID and
section 28.3.3.3 doesn't apply to L0.

* even INVEPT is not needed because, if L0 uses EPT, it uses different
EPTP when running L2 than L1 (because guest_mode is part of mmu-role).
In this case SDM section 28.3.3.4 doesn't apply.

Therefore, there is no need to specifically flush TLB in case L1 does
not use EPT but it uses "Virtualize APIC accesses".

Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(),
one could note that L0 won't flush TLB only in cases where SDM sections
28.3.3.3 and 28.3.3.4 don't apply.  In particular, if L0 uses different
VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section
28.3.3.3 doesn't apply.

Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit().

Side-note: This patch can be viewed as removing parts of commit
fb6c81984313 ("kvm: vmx: Flush TLB when the APIC-access address changes”)
that is not relevant anymore since commit
1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”).
i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT,
then L0 will use same EPTP for both L0 and L1. Which indeed required
L0 to execute INVEPT before entering L2 guest. This assumption is
not true anymore since when guest_mode was added to mmu-role.
----

Paolo
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fdcead2d4dd6..20692e442d13 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2384,9 +2384,6 @@  static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 
 	if (nested_cpu_has_ept(vmcs12))
 		nested_ept_init_mmu_context(vcpu);
-	else if (nested_cpu_has2(vmcs12,
-				 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
-		vmx_flush_tlb(vcpu, true);
 
 	/*
 	 * This sets GUEST_CR0 to vmcs12->guest_cr0, possibly modifying those
@@ -4125,10 +4122,6 @@  void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	if (vmx->nested.change_vmcs01_virtual_apic_mode) {
 		vmx->nested.change_vmcs01_virtual_apic_mode = false;
 		vmx_set_virtual_apic_mode(vcpu);
-	} else if (!nested_cpu_has_ept(vmcs12) &&
-		   nested_cpu_has2(vmcs12,
-				   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
-		vmx_flush_tlb(vcpu, true);
 	}
 
 	/* Unpin physical memory we referred to in vmcs02 */