
[5/6] KVM: nSVM: avoid loading PDPTRs after migration when possible

Message ID 20210401141814.1029036-6-mlevitsk@redhat.com (mailing list archive)
State New, archived
Series Introduce KVM_{GET|SET}_SREGS2 and fix PDPTR migration

Commit Message

Maxim Levitsky April 1, 2021, 2:18 p.m. UTC
If the new KVM_*_SREGS2 ioctls are used, the PDPTRs are
part of the migration state and are thus loaded
by those ioctls.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Sean Christopherson April 5, 2021, 5:01 p.m. UTC | #1
On Thu, Apr 01, 2021, Maxim Levitsky wrote:
> If the new KVM_*_SREGS2 ioctls are used, the PDPTRs are
> part of the migration state and are thus loaded
> by those ioctls.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index ac5e3e17bda4..b94916548cfa 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -373,10 +373,9 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
>  		return -EINVAL;
>  
>  	if (!nested_npt && is_pae_paging(vcpu) &&
> -	    (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu))) {
> +	    (cr3 != kvm_read_cr3(vcpu) || !kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR)))
>  		if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)))

What if we ditch the optimizations[*] altogether and just do:

	if (!nested_npt && is_pae_paging(vcpu) &&
	    CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)))
		return -EINVAL;

Won't that obviate the need for KVM_{GET|SET}_SREGS2, since KVM will always load
the PDPTRs from memory?  IMO, nested migration with shadow paging doesn't
warrant this level of optimization complexity.

[*] For some definitions of "optimization", since the extra pdptrs_changed()
    check in the existing code is likely a net negative.

>  			return -EINVAL;
> -	}
>  
>  	/*
>  	 * TODO: optimize unconditional TLB flush/MMU sync here and in
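For context on the footnote above: pdptrs_changed() itself reads the PDPTEs
from guest memory before comparing them against the cached values, so the
"check" already pays most of the cost of an unconditional reload. A condensed
sketch of the helper as it stood in arch/x86/kvm/x86.c (error handling and the
nested-guest page-read plumbing simplified):

	/* Condensed sketch of pdptrs_changed(). Returns true if the
	 * cached PDPTRs no longer match what guest memory contains.
	 */
	bool pdptrs_changed(struct kvm_vcpu *vcpu)
	{
		u64 pdpte[ARRAY_SIZE(vcpu->arch.walk_mmu->pdptrs)];
		int offset;
		gfn_t gfn;
		int r;

		if (!is_pae_paging(vcpu))
			return false;

		if (!kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR))
			return true;

		gfn = (kvm_read_cr3(vcpu) & 0xffffffe0ul) >> PAGE_SHIFT;
		offset = (kvm_read_cr3(vcpu) & 0xffffffe0ul) & (PAGE_SIZE - 1);

		/* This guest-memory read is the expensive part: the check
		 * fetches the PDPTEs, which is most of what load_pdptrs()
		 * would have done anyway.
		 */
		r = kvm_read_nested_guest_page(vcpu, gfn, pdpte, offset,
					       sizeof(pdpte),
					       PFERR_USER_MASK | PFERR_WRITE_MASK);
		if (r < 0)
			return true;

		return memcmp(pdpte, vcpu->arch.walk_mmu->pdptrs,
			      sizeof(pdpte)) != 0;
	}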
Maxim Levitsky April 6, 2021, 10:12 a.m. UTC | #2
On Mon, 2021-04-05 at 17:01 +0000, Sean Christopherson wrote:
> On Thu, Apr 01, 2021, Maxim Levitsky wrote:
> > If the new KVM_*_SREGS2 ioctls are used, the PDPTRs are
> > part of the migration state and are thus loaded
> > by those ioctls.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/nested.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index ac5e3e17bda4..b94916548cfa 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -373,10 +373,9 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
> >  		return -EINVAL;
> >  
> >  	if (!nested_npt && is_pae_paging(vcpu) &&
> > -	    (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu))) {
> > +	    (cr3 != kvm_read_cr3(vcpu) || !kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR)))
> >  		if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)))
> 
> What if we ditch the optimizations[*] altogether and just do:
> 
> 	if (!nested_npt && is_pae_paging(vcpu) &&
> 	    CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)))
> 		return -EINVAL;
> 
> Won't that obviate the need for KVM_{GET|SET}_SREGS2, since KVM will always load
> the PDPTRs from memory?  IMO, nested migration with shadow paging doesn't
> warrant this level of optimization complexity.

It's not an optimization; it was done to stay 100% within the x86 spec.
The PDPTRs are internal CPU registers which are loaded only when
CR3/CR0/CR4 are written by the guest, or when guest entry or
guest exit loads CR3 (I checked both the Intel and AMD manuals).

In addition, when NPT is enabled, AMD drops this silliness and
just treats the PDPTRs as normal paging entries, while on the
Intel side, when EPT is enabled, the PDPTRs are stored in the VMCS.

Nested migration is neither of these cases, thus the PDPTRs should be
stored out of band.
The same applies to non-nested migration.
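
As an illustration of the intended out-of-band flow, here is a minimal
userspace sketch. The struct layout, flag name, and ioctl names follow what
this series proposes (a kvm_sregs-like struct extended with a flags word and
a pdptrs[4] array); treat them as assumptions, since the final ABI may differ,
and a real VMM would first probe KVM_CAP_SREGS2 and fall back to the legacy
KVM_GET/SET_SREGS when it is absent:

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	/* Source side: the PDPTRs travel with the rest of the register
	 * state instead of being re-derived from guest memory.
	 */
	static int save_vcpu_sregs(int vcpu_fd, struct kvm_sregs2 *s2)
	{
		return ioctl(vcpu_fd, KVM_GET_SREGS2, s2);
	}

	/* Destination side: KVM_SREGS2_FLAGS_PDPTRS_VALID (name per this
	 * series) tells KVM to use pdptrs[] as-is rather than re-reading
	 * the PDPTEs from the page tables, which could have diverged.
	 */
	static int restore_vcpu_sregs(int vcpu_fd, struct kvm_sregs2 *s2)
	{
		s2->flags |= KVM_SREGS2_FLAGS_PDPTRS_VALID;
		return ioctl(vcpu_fd, KVM_SET_SREGS2, s2);
	}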

This was requested by Jim Mattson, and I went ahead and
implemented it, even though I do understand that no sane OS
relies on the PDPTRs being out of sync with the actual page
table that contains them.

Best regards,
	Maxim Levitsky


> 
> [*] For some definitions of "optimization", since the extra pdptrs_changed()
>     check in the existing code is likely a net negative.
> 
> >  			return -EINVAL;
> > -	}
> >  
> >  	/*
> >  	 * TODO: optimize unconditional TLB flush/MMU sync here and in

Patch

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ac5e3e17bda4..b94916548cfa 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -373,10 +373,9 @@  static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 		return -EINVAL;
 
 	if (!nested_npt && is_pae_paging(vcpu) &&
-	    (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu))) {
+	    (cr3 != kvm_read_cr3(vcpu) || !kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR)))
 		if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)))
 			return -EINVAL;
-	}
 
 	/*
 	 * TODO: optimize unconditional TLB flush/MMU sync here and in
@@ -552,6 +551,8 @@  int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 	nested_vmcb02_prepare_control(svm);
 	nested_vmcb02_prepare_save(svm, vmcb12);
 
+	kvm_register_clear_available(&svm->vcpu, VCPU_EXREG_PDPTR);
+
 	ret = nested_svm_load_cr3(&svm->vcpu, vmcb12->save.cr3,
 				  nested_npt_enabled(svm));
 	if (ret)
@@ -779,6 +780,8 @@  int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	nested_svm_uninit_mmu_context(vcpu);
 
+	kvm_register_clear_available(&svm->vcpu, VCPU_EXREG_PDPTR);
+
 	rc = nested_svm_load_cr3(vcpu, svm->vmcb->save.cr3, false);
 	if (rc)
 		return 1;
@@ -1301,6 +1304,14 @@  static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 	if (WARN_ON(!is_guest_mode(vcpu)))
 		return true;
 
+	if (vcpu->arch.reload_pdptrs_on_nested_entry) {
+		/* If the legacy KVM_SET_SREGS API was used, it might have
+		 * loaded the wrong PDPTRs from memory, so we have to reload
+		 * them here (which is against the x86 spec).
+		 */
+		kvm_register_clear_available(vcpu, VCPU_EXREG_PDPTR);
+	}
+
 	if (nested_svm_load_cr3(&svm->vcpu, vcpu->arch.cr3,
 				nested_npt_enabled(svm)))
 		return false;