[v3,8/8] KVM: x86: avoid loading PDPTRs after migration when possible

Message ID 20210607090203.133058-9-mlevitsk@redhat.com (mailing list archive)
State New, archived
Series Introduce KVM_{GET|SET}_SREGS2 and fix PDPTR migration

Commit Message

Maxim Levitsky June 7, 2021, 9:02 a.m. UTC
If the new KVM_*_SREGS2 ioctls are used, the PDPTRs are
part of the migration state and are correctly restored
by those ioctls.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 6 ++++++
 arch/x86/kvm/svm/nested.c       | 3 ++-
 arch/x86/kvm/vmx/nested.c       | 3 ++-
 arch/x86/kvm/x86.c              | 3 +++
 4 files changed, 13 insertions(+), 2 deletions(-)
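
For context, a minimal userspace sketch of how the new ioctls carry the PDPTRs
across migration (illustrative only; error handling and the actual transport
are elided, and the flow is an assumption about typical VMM usage rather than
code from this series):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative only: vcpu fds for the source and destination VMs. */
static void migrate_vcpu_sregs2(int src_vcpu_fd, int dst_vcpu_fd)
{
	struct kvm_sregs2 sregs2;

	/* Source: the PDPTRs travel together with the rest of the sregs. */
	ioctl(src_vcpu_fd, KVM_GET_SREGS2, &sregs2);

	/* ... send sregs2 over the migration stream ... */

	/*
	 * Destination: when KVM_SREGS2_FLAGS_PDPTRS_VALID is set, KVM
	 * restores the saved PDPTRs instead of re-reading them from
	 * guest memory, which may not be fully restored yet.
	 */
	ioctl(dst_vcpu_fd, KVM_SET_SREGS2, &sregs2);
}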

Comments

Sean Christopherson June 18, 2021, 8:53 p.m. UTC | #1
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 11260e83518f..eadfc9caf500 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -815,6 +815,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
>  
>  	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
>  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> +	vcpu->arch.pdptrs_restored_oob = false;
> +
>  out:
>  
>  	return ret;
> @@ -10113,6 +10115,7 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
>  
>  		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
>  		mmu_reset_needed = 1;
> +		vcpu->arch.pdptrs_restored_oob = true;

Setting pdptrs_restored_oob[*] here and _only_ clearing it on successful
load_pdptrs() is not robust.  Potential problems once the flag is set:

  1.  Userspace calls KVM_SET_SREGS{,2} without valid PDPTRs.  Flag is now stale.
  2.  kvm_check_nested_events() VM-Exits to L1 before the flag is processed.
      Flag is now stale.

(2) might not be problematic in practice since the "normal" load_pdptrs()
should reset the flag on the next VM-Enter, but it's really, really hard to tell.
E.g. what if an SMI causes an exit and _that_ non-VM-Enter reload of L2 state
is the first to trip the flag?  The bool is essentially an extension of
KVM_REQ_GET_NESTED_STATE_PAGES; I think it makes sense to clear the flag whenever
KVM_REQ_GET_NESTED_STATE_PAGES is cleared.
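
E.g., an untested sketch of that idea, clearing the flag where vcpu_enter_guest()
consumes the request (the paths that kvm_clear_request() it would need the same
treatment; this is a sketch of the suggestion, not code from the series):

	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
		if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
			r = 0;
			goto out;
		}
		/* OOB PDPTRs were either consumed or are no longer relevant. */
		vcpu->arch.pdptrs_restored_oob = false;
	}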

Another thing that's not obvious is the required ordering between KVM_SET_SREGS2
and KVM_SET_NESTED_STATE.  AFAICT it's not documented, but that may be PEBKAC on
my end.  E.g. what happens if walk_mmu == &root_mmu (L1 active in target KVM)
when SET_SREGS2 is called, and _then_ KVM_SET_NESTED_STATE is called?

[*] pdptrs_from_userspace in Paolo's tree.
Paolo Bonzini June 19, 2021, 7:03 a.m. UTC | #2
On 18/06/21 22:53, Sean Christopherson wrote:
> The bool is essentially an extension of
> KVM_REQ_GET_NESTED_STATE_PAGES; I think it makes sense to clear the flag whenever
> KVM_REQ_GET_NESTED_STATE_PAGES is cleared.

So in vcpu_enter_guest?

> Another thing that's not obvious is the required ordering between KVM_SET_SREGS2
> and KVM_SET_NESTED_STATE.  AFAICT it's not documented, but that may be PEBKAC on
> my end.  E.g. what happens if walk_mmu == &root_mmu (L1 active in target KVM)
> when SET_SREGS2 is called, and _then_ KVM_SET_NESTED_STATE is called?

Either ordering should work.

Paolo
Maxim Levitsky June 20, 2021, 10:25 p.m. UTC | #3
On Fri, 2021-06-18 at 20:53 +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 11260e83518f..eadfc9caf500 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -815,6 +815,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
> >  
> >  	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
> >  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> > +	vcpu->arch.pdptrs_restored_oob = false;
> > +
> >  out:
> >  
> >  	return ret;
> > @@ -10113,6 +10115,7 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
> >  
> >  		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> >  		mmu_reset_needed = 1;
> > +		vcpu->arch.pdptrs_restored_oob = true;
> 
> Setting pdptrs_restored_oob[*] here and _only_ clearing it on successful
> load_pdptrs() is not robust.  Potential problems once the flag is set:

Hi Sean Christopherson!
Thanks for the review!

I also thought about exactly the same thing when I submitted the last version of
these patches (the prior version didn't clear the flag at all, but I noticed that
while doing a self-review of the patches).


> 
>   1.  Userspace calls KVM_SET_SREGS{,2} without valid PDPTRs.  Flag is now stale.

True. It isn't that big an issue though, since the only way to do this is to also
disable PAE mode in CR4 during or before the call, which makes the PDPTRs irrelevant.
Once PAE is enabled again, the PDPTRs will be loaded again, resetting this flag.

Something to note is that we also don't clear the available/dirty status of
VCPU_EXREG_PDPTR in this case.



>   2.  kvm_check_nested_events() VM-Exits to L1 before the flag is processed.
>       Flag is now stale.

Also true. However, this means that we enter L1 now, and once we are ready to enter
L2 again, we will load the PDPTRs from guest memory, as thankfully VM entries in PAE
mode do load the PDPTRs from guest memory (on both Intel and AMD).
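
Roughly, the nested-entry path already does this today; paraphrasing the existing
SVM nested CR3 load (the VMX side is analogous; not a verbatim quote):

	if (!nested_npt && is_pae_paging(vcpu) &&
	    !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
		return -EINVAL;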



> 
> (2) might not be problematic in practice since the "normal" load_pdptrs()
> should reset the flag on the next VM-Enter, but it's really, really hard to tell.
> E.g. what if an SMI causes an exit and _that_ non-VM-Enter reload of L2 state
> is the first to trip the flag?  The bool is essentially an extension of
> KVM_REQ_GET_NESTED_STATE_PAGES; I think it makes sense to clear the flag whenever
> KVM_REQ_GET_NESTED_STATE_PAGES is cleared.

Could you explain the SMM case a bit more? When an SMM entry is done, at least the
Intel spec is silent on whether the PDPTRs are preserved in SMRAM (and it doesn't
have any space allocated for them).

We currently don't preserve the PDPTRs on SMM entry, and we reload them via a
CR3/CR0 write when we exit SMM.

Ah, I see now: on VMX the non-VM-Enter code path is also used for returns from SMM,
and it sets KVM_REQ_GET_NESTED_STATE_PAGES, so this is a real issue.
If SMM code enabled PAE, then KVM_SET_SREGS2 was used, followed by RSM,
we indeed risk not loading the PDPTRs.

I think this is best fixed by resetting this flag in vmx_leave_smm,
since RSM otherwise always loads the PDPTRs from memory.
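
Something along these lines (untested; exact placement TBD):

	/* In vmx_leave_smm(): RSM always reloads the PDPTRs from guest
	 * memory, so any PDPTRs restored out of band are stale here. */
	vcpu->arch.pdptrs_restored_oob = false;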

I also note that on SVM we don't raise KVM_REQ_GET_NESTED_STATE_PAGES on return
from SMM at all, so on SVM I think I broke resuming from SMM to a guest if the
guest is PAE.
Oh well....

I will now extend the testing I usually do to SMM and prepare a patch to fix this.

> 
> Another thing that's not obvious is the required ordering between KVM_SET_SREGS2
> and KVM_SET_NESTED_STATE.  AFAICT it's not documented, but that may be PEBKAC on
> my end.  E.g. what happens if walk_mmu == &root_mmu (L1 active in target KVM)
> when SET_SREGS2 is called, and _then_ KVM_SET_NESTED_STATE is called?

Isn't that exactly the current ordering (and the reason why I had to do this patch
series)? First, KVM_SET_SREGS is indeed called prior to KVM_SET_NESTED_STATE, and it
can potentially load wrong PDPTRs; then KVM_SET_NESTED_STATE is called, which used
to 'fix' this by always reloading them.



> 
> [*] pdptrs_from_userspace in Paolo's tree.
> 


I think that, strictly speaking, this flag should be cleared when PAE mode is
disabled, together with clearing the availability of VCPU_EXREG_PDPTR.
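
I.e. something like this untested sketch (the placement in the CR0/CR4 update
path is just illustrative):

	if (!is_pae_paging(vcpu)) {
		vcpu->arch.regs_avail &= ~BIT(VCPU_EXREG_PDPTR);
		vcpu->arch.regs_dirty &= ~BIT(VCPU_EXREG_PDPTR);
		vcpu->arch.pdptrs_restored_oob = false;
	}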

I don't agree that this flag is an extension of KVM_REQ_GET_NESTED_STATE_PAGES.
I think this flag is more like an extra property of VCPU_EXREG_PDPTR:
in addition to being dirty/available, this "register" can be loaded from memory
or restored from the migration stream.

Thanks again for the review,
Best regards,
	Maxim Levitsky
Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 83f948bdc59a..8eb107ceb45a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -851,6 +851,12 @@  struct kvm_vcpu_arch {
 
 	/* Protected Guests */
 	bool guest_state_protected;
+
+	/*
+	 * Set when the PDPTRs were loaded directly by userspace without
+	 * reading guest memory.
+	 */
+	bool pdptrs_restored_oob;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e3e5775b8f1c..0f80d68a45e1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1369,7 +1369,8 @@  static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 	if (WARN_ON(!is_guest_mode(vcpu)))
 		return true;
 
-	if (!nested_npt_enabled(svm) && is_pae_paging(vcpu))
+	if (!vcpu->arch.pdptrs_restored_oob &&
+	    !nested_npt_enabled(svm) && is_pae_paging(vcpu))
 		/*
 		 * Reload the guest's PDPTRs since after a migration
 		 * the guest CR3 might be restored prior to setting the nested
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0acdda85f36a..78d6c71ab03b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3113,7 +3113,8 @@  static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 	struct page *page;
 	u64 hpa;
 
-	if (!nested_cpu_has_ept(vmcs12) && is_pae_paging(vcpu)) {
+	if (!vcpu->arch.pdptrs_restored_oob &&
+	    !nested_cpu_has_ept(vmcs12) && is_pae_paging(vcpu)) {
 		/*
 		 * Reload the guest's PDPTRs since after a migration
 		 * the guest CR3 might be restored prior to setting the nested
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 11260e83518f..eadfc9caf500 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -815,6 +815,8 @@  int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
 
 	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
+	vcpu->arch.pdptrs_restored_oob = false;
+
 out:
 
 	return ret;
@@ -10113,6 +10115,7 @@  static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
 
 		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
 		mmu_reset_needed = 1;
+		vcpu->arch.pdptrs_restored_oob = true;
 	}
 	if (mmu_reset_needed)
 		kvm_mmu_reset_context(vcpu);