mbox series

[v4,00/11] KVM: VMX: Clean up Hyper-V PV TLB flush

Message ID 20210305183123.3978098-1-seanjc@google.com (mailing list archive)
Headers show
Series KVM: VMX: Clean up Hyper-V PV TLB flush | expand

Message

Sean Christopherson March 5, 2021, 6:31 p.m. UTC
Clean up KVM's PV TLB flushing when running with EPT on Hyper-V, i.e. as
a nested VMM.  No real goal in mind other than the sole patch in v1, which
is a minor change to avoid a future mixup when TDX also wants to define
.remote_flush_tlb.  Everything else is opportunistic clean up.

NOTE: Based on my NPT+SME bug fix series[*] due to multiple conflicts with
      non-trivial resolutions.

[*] https://lkml.kernel.org/r/20210305011101.3597423-1-seanjc@google.com


Patch 1 legitimately tested on VMX and SVM (including i386).  Patches 2+
smoke tested by hacking usage of the relevant flows without actually
routing to the Hyper-V hypercalls (partial hack-a-patch below).

-static inline int hv_remote_flush_root_ept(hpa_t root_ept,
+static inline int hv_remote_flush_root_ept(struct kvm *kvm, hpa_t root_ept,
                                           struct kvm_tlb_range *range)
 {
-       if (range)
-               return hyperv_flush_guest_mapping_range(root_ept,
-                               kvm_fill_hv_flush_list_func, (void *)range);
-       else
-               return hyperv_flush_guest_mapping(root_ept);
+       if (range) {
+               kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH);
+               return 0;
+       }
+
+       return -ENOMEM;
 }
 
 static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
@@ -7753,8 +7754,7 @@ static __init int hardware_setup(void)
                vmx_x86_ops.update_cr8_intercept = NULL;
 
 #if IS_ENABLED(CONFIG_HYPERV)
-       if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
-           && enable_ept) {
+       if (enable_ept) {
                vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
                vmx_x86_ops.tlb_remote_flush_with_range =
                                hv_remote_flush_tlb_with_range;

v4: 
  - Rebased to kvm/queue, commit fe5f0041c026 ("KVM/SVM: Move vmenter.S
    exception fixups out of line"), plus the aforementioned series.
  - Don't grab PCID for nested_cr3 (NPT). [Paolo]
  - Collect reviews. [Vitaly]

v3:
  - https://lkml.kernel.org/r/20201027212346.23409-1-sean.j.christopherson@intel.com
  - Add a patch to pass the root_hpa instead of pgd to vmx_load_mmu_pgd()
    and retrieve the active PCID only when necessary.  [Vitaly]
  - Selectively collects reviews (skipped a few due to changes). [Vitaly]
  - Explicitly invalidate hv_tlb_eptp instead of leaving it valid when
    the mismatch tracker "knows" it's invalid. [Vitaly]
  - Change the last patch to use "hv_root_ept" instead of "hv_tlb_pgd"
    to better reflect what is actually being tracked.

v2:
  - Rewrite everything.
  - https://lkml.kernel.org/r/20201020215613.8972-1-sean.j.christopherson@intel.com

v1: ???

Sean Christopherson (11):
  KVM: x86: Get active PCID only when writing a CR3 value
  KVM: VMX: Track common EPTP for Hyper-V's paravirt TLB flush
  KVM: VMX: Stash kvm_vmx in a local variable for Hyper-V paravirt TLB
    flush
  KVM: VMX: Fold Hyper-V EPTP checking into it's only caller
  KVM: VMX: Do Hyper-V TLB flush iff vCPU's EPTP hasn't been flushed
  KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch
  KVM: VMX: Don't invalidate hv_tlb_eptp if the new EPTP matches
  KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd
  KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is
    enabled
  KVM: VMX: Skip additional Hyper-V TLB EPTP flushes if one fails
  KVM: VMX: Track root HPA instead of EPTP for paravirt Hyper-V TLB
    flush

 arch/x86/include/asm/kvm_host.h |   4 +-
 arch/x86/kvm/mmu.h              |   4 +-
 arch/x86/kvm/svm/svm.c          |  10 ++-
 arch/x86/kvm/vmx/vmx.c          | 134 ++++++++++++++++++--------------
 arch/x86/kvm/vmx/vmx.h          |  19 ++---
 5 files changed, 92 insertions(+), 79 deletions(-)

Comments

Sean Christopherson March 5, 2021, 6:33 p.m. UTC | #1
I'm an idiot and Cc'd my old @intel.com address on everything.  Apologies in
advance for the inevitable bounces. :-/

On Fri, Mar 05, 2021, Sean Christopherson wrote:
> Clean up KVM's PV TLB flushing when running with EPT on Hyper-V, i.e. as
> a nested VMM.  No real goal in mind other than the sole patch in v1, which
> is a minor change to avoid a future mixup when TDX also wants to define
> .remote_flush_tlb.  Everything else is opportunistic clean up.
Paolo Bonzini March 8, 2021, 11:50 a.m. UTC | #2
On 05/03/21 19:31, Sean Christopherson wrote:
> Clean up KVM's PV TLB flushing when running with EPT on Hyper-V, i.e. as
> a nested VMM.  No real goal in mind other than the sole patch in v1, which
> is a minor change to avoid a future mixup when TDX also wants to define
> .remote_flush_tlb.  Everything else is opportunistic clean up.
> 
> NOTE: Based on my NPT+SME bug fix series[*] due to multiple conflicts with
>        non-trivial resolutions.
> 
> [*] https://lkml.kernel.org/r/20210305011101.3597423-1-seanjc@google.com
> 
> 
> Patch 1 legitimately tested on VMX and SVM (including i386).  Patches 2+
> smoke tested by hacking usage of the relevant flows without actually
> routing to the Hyper-V hypercalls (partial hack-a-patch below).
> 
> -static inline int hv_remote_flush_root_ept(hpa_t root_ept,
> +static inline int hv_remote_flush_root_ept(struct kvm *kvm, hpa_t root_ept,
>                                             struct kvm_tlb_range *range)
>   {
> -       if (range)
> -               return hyperv_flush_guest_mapping_range(root_ept,
> -                               kvm_fill_hv_flush_list_func, (void *)range);
> -       else
> -               return hyperv_flush_guest_mapping(root_ept);
> +       if (range) {
> +               kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH);
> +               return 0;
> +       }
> +
> +       return -ENOMEM;
>   }
>   
>   static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
> @@ -7753,8 +7754,7 @@ static __init int hardware_setup(void)
>                  vmx_x86_ops.update_cr8_intercept = NULL;
>   
>   #if IS_ENABLED(CONFIG_HYPERV)
> -       if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
> -           && enable_ept) {
> +       if (enable_ept) {
>                  vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
>                  vmx_x86_ops.tlb_remote_flush_with_range =
>                                  hv_remote_flush_tlb_with_range;
> 
> v4:
>    - Rebased to kvm/queue, commit fe5f0041c026 ("KVM/SVM: Move vmenter.S
>      exception fixups out of line"), plus the aforementioned series.
>    - Don't grab PCID for nested_cr3 (NPT). [Paolo]
>    - Collect reviews. [Vitaly]
> 
> v3:
>    - https://lkml.kernel.org/r/20201027212346.23409-1-sean.j.christopherson@intel.com
>    - Add a patch to pass the root_hpa instead of pgd to vmx_load_mmu_pgd()
>      and retrieve the active PCID only when necessary.  [Vitaly]
>    - Selectively collects reviews (skipped a few due to changes). [Vitaly]
>    - Explicitly invalidate hv_tlb_eptp instead of leaving it valid when
>      the mismatch tracker "knows" it's invalid. [Vitaly]
>    - Change the last patch to use "hv_root_ept" instead of "hv_tlb_pgd"
>      to better reflect what is actually being tracked.
> 
> v2:
>    - Rewrite everything.
>    - https://lkml.kernel.org/r/20201020215613.8972-1-sean.j.christopherson@intel.com
> 
> v1: ???
> 
> Sean Christopherson (11):
>    KVM: x86: Get active PCID only when writing a CR3 value
>    KVM: VMX: Track common EPTP for Hyper-V's paravirt TLB flush
>    KVM: VMX: Stash kvm_vmx in a local variable for Hyper-V paravirt TLB
>      flush
>    KVM: VMX: Fold Hyper-V EPTP checking into it's only caller
>    KVM: VMX: Do Hyper-V TLB flush iff vCPU's EPTP hasn't been flushed
>    KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch
>    KVM: VMX: Don't invalidate hv_tlb_eptp if the new EPTP matches
>    KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd
>    KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is
>      enabled
>    KVM: VMX: Skip additional Hyper-V TLB EPTP flushes if one fails
>    KVM: VMX: Track root HPA instead of EPTP for paravirt Hyper-V TLB
>      flush
> 
>   arch/x86/include/asm/kvm_host.h |   4 +-
>   arch/x86/kvm/mmu.h              |   4 +-
>   arch/x86/kvm/svm/svm.c          |  10 ++-
>   arch/x86/kvm/vmx/vmx.c          | 134 ++++++++++++++++++--------------
>   arch/x86/kvm/vmx/vmx.h          |  19 ++---
>   5 files changed, 92 insertions(+), 79 deletions(-)
> 

Huh, I was sure I had queued this already for 5.12.  Well, done so now.

Paolo
Sean Christopherson March 9, 2021, 1:18 a.m. UTC | #3
On Mon, Mar 08, 2021, Paolo Bonzini wrote:
> On 05/03/21 19:31, Sean Christopherson wrote:
> > Sean Christopherson (11):
> >    KVM: x86: Get active PCID only when writing a CR3 value

...

> Huh, I was sure I had queued this already for 5.12.  Well, done so now.

Maybe this series is cursed.  The first patch got mangled and broke SME.
It shows up as two commits with the same changelog, so maybe you intended to
split the patch and things went sideways?

Anyways, commit a16241ae56fa ("KVM: x86: Get active PCID only when writing a
CR3 value") breaks SME and PCID.  The kvm/queue code looks like this:


	cr3 = __sme_set(root_hpa);
	if (npt_enabled) {
		svm->vmcb->control.nested_cr3 = root_hpa;
		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);

		/* Loading L2's CR3 is handled by enter_svm_guest_mode.  */
		if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
			return;
		cr3 = vcpu->arch.cr3;
	}

	svm->vmcb->save.cr3 = cr3;
	vmcb_mark_dirty(svm->vmcb, VMCB_CR);

but it should look like this:

	if (npt_enabled) {
		svm->vmcb->control.nested_cr3 = __sme_set(root);
		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);

		/* Loading L2's CR3 is handled by enter_svm_guest_mode.  */
		if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
			return;
		cr3 = vcpu->arch.cr3;
	} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
		cr3 = __sme_set(root);
	} else {
		cr3 = root;
	}

	svm->vmcb->save.cr3 = cr3;
	vmcb_mark_dirty(svm->vmcb, VMCB_CR);

I'll generate a delta patch, and test and post, just in case there is other
stuff that got lost.
Paolo Bonzini March 9, 2021, 8:35 a.m. UTC | #4
On 09/03/21 02:18, Sean Christopherson wrote:
> Maybe this series is cursed.  The first patch got mangled and broke SME.
> It shows up as two commits with the same changelog, so maybe you intended to
> split the patch and things went sideways?

There was a conflict.  I admit kvm/queue is not always that good, 
usually I try to test it but yesterday I just didn't have time.

I'll fix up everything (also 20/24 in the other series).

Oh well, you have to break eggs to make an omelette. :)

Paolo

> Anyways, commit a16241ae56fa ("KVM: x86: Get active PCID only when writing a
> CR3 value") breaks SME and PCID.  The kvm/queue code looks like this:
> 
> 
> 	cr3 = __sme_set(root_hpa);
> 	if (npt_enabled) {
> 		svm->vmcb->control.nested_cr3 = root_hpa;
> 		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
> 
> 		/* Loading L2's CR3 is handled by enter_svm_guest_mode.  */
> 		if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> 			return;
> 		cr3 = vcpu->arch.cr3;
> 	}
> 
> 	svm->vmcb->save.cr3 = cr3;
> 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
> 
> but it should look like this:
> 
> 	if (npt_enabled) {
> 		svm->vmcb->control.nested_cr3 = __sme_set(root);
> 		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
> 
> 		/* Loading L2's CR3 is handled by enter_svm_guest_mode.  */
> 		if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> 			return;
> 		cr3 = vcpu->arch.cr3;
> 	} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
> 		cr3 = __sme_set(root);
> 	} else {
> 		cr3 = root;
> 	}
> 
> 	svm->vmcb->save.cr3 = cr3;
> 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
> 
> I'll generate a delta patch, and test and post, just in case there is other
> stuff that got lost.
>