[10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check

Message ID 20210622175739.3610207-11-seanjc@google.com (mailing list archive)
State New, archived
Series KVM: x86/mmu: Bug fixes and summer cleaning

Commit Message

Sean Christopherson June 22, 2021, 5:56 p.m. UTC
Replace the hack to identify nested EPT shadow pages with a simple check
that the size of the guest PTEs associated with the shadow page and the
current MMU match, which is the intent of the "8 bytes == PAE" test.
The nested EPT hack existed to avoid a false negative due to the is_pae()
check not matching for 32-bit L2 guests; checking the MMU role directly
avoids the indirect calculation of the guest PTE size entirely.

Note, this should be a glorified nop now that __kvm_sync_page() is called
if and only if the role is an exact match (kvm_mmu_get_page()) or is part
of the current MMU context (kvm_mmu_sync_roots()).  A future commit will
convert the likely-pointless check into a meaningful WARN to enforce that
the mmu_roles of the current context and the shadow page are compatible.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/mmu.rst |  3 ---
 arch/x86/kvm/mmu/mmu.c         | 16 +++-------------
 2 files changed, 3 insertions(+), 16 deletions(-)
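
The crux of the change, condensed from __kvm_sync_page() in the diff at
the bottom of this message.  Before, with the is_ept_sp() hack exempting
nested EPT shadow pages, whose guest PTEs are 8 bytes even when a 32-bit
L2 runs with CR4.PAE=0:

	if ((!is_ept_sp(sp) && sp->role.gpte_is_8_bytes != !!is_pae(vcpu)) ||
	    vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {

After, taking the guest PTE size straight from the current MMU role
instead of recomputing it via is_pae(), with no EPT special case:

	union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base;

	if (sp->role.gpte_is_8_bytes != mmu_role.gpte_is_8_bytes ||
	    vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {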

Comments

Paolo Bonzini June 23, 2021, 3:49 p.m. UTC | #1
On 22/06/21 19:56, Sean Christopherson wrote:
> Replace the hack to identify nested EPT shadow pages with a simple check
> that the size of the guest PTEs associated with the shadow page and the
> current MMU match, which is the intent of the "8 bytes == PAE" test.
> The nested EPT hack existed to avoid a false negative due to the is_pae()
> check not matching for 32-bit L2 guests; checking the MMU role directly
> avoids the indirect calculation of the guest PTE size entirely.

What the commit message doesn't say is, did we miss this opportunity all
along, or has there been a change since commit 47c42e6b4192 ("KVM: x86:
fix handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28)
that allows this?

I think the only change needed would be making the commit something like
this:

==========
KVM: x86/mmu: Use MMU role to check for matching guest page sizes

Originally, __kvm_sync_page used to check the cr4_pae bit in the role
to avoid zapping 4-byte kvm_mmu_pages when the guest page size is 8 bytes,
or the other way round.  However, in commit 47c42e6b4192 ("KVM: x86: fix
handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28) it
was observed that this did not work for nested EPT, where the page table
size would be 8 bytes even if CR4.PAE=0.  (Note that the check still
has to be done for nested *NPT*, so it is not possible to use tdp_enabled
or similar).

Therefore, a hack was introduced to identify nested EPT shadow pages
and unconditionally call __kvm_sync_page() on them.  However, it is
possible to do without the hack to identify nested EPT shadow pages:
if EPT is active, there will be no shadow pages in non-EPT format,
and all of them will have gpte_is_8_bytes set to true; we can just
check the MMU role directly, and the test will always be true.

Even for non-EPT shadow MMUs, this test should really always be true
now that __kvm_sync_page() is called if and only if the role is an
exact match (kvm_mmu_get_page()) or is part of the current MMU context
(kvm_mmu_sync_roots()).  A future commit will convert the likely-pointless
check into a meaningful WARN to enforce that the mmu_roles of the current
context and the shadow page are compatible.
==========


Paolo

> Note, this should be a glorified nop now that __kvm_sync_page() is called
> if and only if the role is an exact match (kvm_mmu_get_page()) or is part
> of the current MMU context (kvm_mmu_sync_roots()).  A future commit will
> convert the likely-pointless check into a meaningful WARN to enforce that
> the mmu_roles of the current context and the shadow page are compatible.
> 
> Cc: Vitaly Kuznetsov<vkuznets@redhat.com>
> Signed-off-by: Sean Christopherson<seanjc@google.com>
Sean Christopherson June 23, 2021, 4:17 p.m. UTC | #2
On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:56, Sean Christopherson wrote:
> > Replace the hack to identify nested EPT shadow pages with a simple check
> > that the size of the guest PTEs associated with the shadow page and the
> > current MMU match, which is the intent of the "8 bytes == PAE" test.
> > The nested EPT hack existed to avoid a false negative due to the is_pae()
> > check not matching for 32-bit L2 guests; checking the MMU role directly
> > avoids the indirect calculation of the guest PTE size entirely.
> 
> What the commit message doesn't say is, did we miss this opportunity all
> along, or has there been a change since commit 47c42e6b4192 ("KVM: x86:
> fix handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28)
> that allows this?

The code was wrong from the initial "unsync" commit.  The 4-byte vs. 8-byte check
papered over the real bug, which was that the roles were not checked for
compatibility.  I suspect that the bug only manifested as an observable problem
when the GPTE sizes mismatched, thus the PAE check was added.

So yes, there was an "opportunity" that was missed all along.
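
E.g., a hypothetical sketch (not the actual future patch) of the WARN
alluded to in the changelog, reusing the mmu_role local this patch adds:

	/* Enforce role compatibility rather than silently zapping on a
	 * guest PTE size mismatch; zap as today if the WARN fires.
	 */
	if (WARN_ON_ONCE(sp->role.gpte_is_8_bytes != mmu_role.gpte_is_8_bytes)) {
		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
		return false;
	}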

> I think the only change needed would be making the commit something like
> this:
> 
> ==========
> KVM: x86/mmu: Use MMU role to check for matching guest page sizes
> 
> Originally, __kvm_sync_page used to check the cr4_pae bit in the role
> to avoid zapping 4-byte kvm_mmu_pages when the guest page size is 8 bytes,
> or the other way round.  However, in commit 47c42e6b4192 ("KVM: x86: fix
> handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28) it
> was observed that this did not work for nested EPT, where the page table
> size would be 8 bytes even if CR4.PAE=0.  (Note that the check still
> has to be done for nested *NPT*, so it is not possible to use tdp_enabled
> or similar).
> 
> Therefore, a hack was introduced to identify nested EPT shadow pages
> and unconditionally call __kvm_sync_page() on them.  However, it is
> possible to do without the hack to identify nested EPT shadow pages:
> if EPT is active, there will be no shadow pages in non-EPT format,
> and all of them will have gpte_is_8_bytes set to true; we can just
> check the MMU role directly, and the test will always be true.
> 
> Even for non-EPT shadow MMUs, this test should really always be true
> now that __kvm_sync_page() is called if and only if the role is an
> exact match (kvm_mmu_get_page()) or is part of the current MMU context
> (kvm_mmu_sync_roots()).  A future commit will convert the likely-pointless
> check into a meaningful WARN to enforce that the mmu_roles of the current
> context and the shadow page are compatible.
> ==========
> 
> 
> Paolo
> 
> > Note, this should be a glorified nop now that __kvm_sync_page() is called
> > if and only if the role is an exact match (kvm_mmu_get_page()) or is part
> > of the current MMU context (kvm_mmu_sync_roots()).  A future commit will
> > convert the likely-pointless check into a meaningful WARN to enforce that
> > the mmu_roles of the current context and the shadow page are compatible.
> > 
> > Cc: Vitaly Kuznetsov<vkuznets@redhat.com>
> > Signed-off-by: Sean Christopherson<seanjc@google.com>
>
Paolo Bonzini June 23, 2021, 4:41 p.m. UTC | #3
On 23/06/21 18:17, Sean Christopherson wrote:
>> What the commit message doesn't say is, did we miss this
>> opportunity all along, or has there been a change since commit
>> 47c42e6b4192 ("KVM: x86: fix handling of role.cr4_pae and rename it
>> to 'gpte_size'", 2019-03-28) that allows this?
>
> The code was wrong from the initial "unsync" commit.  The 4-byte vs.
> 8-byte check papered over the real bug, which was that the roles were
> not checked for compatibility.  I suspect that the bug only
> manifested as an observable problem when the GPTE sizes mismatched,
> thus the PAE check was added.

I meant that we really never needed is_ept_sp, and you could have used 
the simpler check already at the time you introduced gpte_is_8_bytes. 
But anyway I think we're in agreement.

Paolo
Sean Christopherson June 23, 2021, 4:54 p.m. UTC | #4
On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 23/06/21 18:17, Sean Christopherson wrote:
> > > What the commit message doesn't say is, did we miss this
> > > opportunity all along, or has there been a change since commit
> > > 47c42e6b4192 ("KVM: x86: fix handling of role.cr4_pae and rename it
> > > to 'gpte_size'", 2019-03-28) that allows this?
> > 
> > The code was wrong from the initial "unsync" commit.  The 4-byte vs.
> > 8-byte check papered over the real bug, which was that the roles were
> > not checked for compatibility.  I suspect that the bug only
> > manifested as an observable problem when the GPTE sizes mismatched,
> > thus the PAE check was added.
> 
> I meant that we really never needed is_ept_sp, and you could have used the
> simpler check already at the time you introduced gpte_is_8_bytes. But anyway
> I think we're in agreement.

Ah, yes, I was too clever :-/

Patch

diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 20d85daed395..ddbb23998742 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -192,9 +192,6 @@  Shadow pages contain the following information:
     Contains the value of cr4.smap && !cr0.wp for which the page is valid
     (pages for which this is true are different from other pages; see the
     treatment of cr0.wp=0 below).
-  role.ept_sp:
-    This is a virtual flag to denote a shadowed nested EPT page.  ept_sp
-    is true if "cr0_wp && smap_andnot_wp", an otherwise invalid combination.
   role.smm:
     Is 1 if the page is valid in system management mode.  This field
     determines which of the kvm_memslots array was used to build this
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 99d26859021d..9f277c5bab76 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1780,16 +1780,13 @@  static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
 		if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
 
-static inline bool is_ept_sp(struct kvm_mmu_page *sp)
-{
-	return sp->role.cr0_wp && sp->role.smap_andnot_wp;
-}
-
 /* @sp->gfn should be write-protected at the call site */
 static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			    struct list_head *invalid_list)
 {
-	if ((!is_ept_sp(sp) && sp->role.gpte_is_8_bytes != !!is_pae(vcpu)) ||
+	union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base;
+
+	if (sp->role.gpte_is_8_bytes != mmu_role.gpte_is_8_bytes ||
 	    vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
 		return false;
@@ -4721,13 +4718,6 @@  kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
 	role.base.guest_mode = true;
 	role.base.access = ACC_ALL;
 
-	/*
-	 * WP=1 and NOT_WP=1 is an impossible combination, use WP and the
-	 * SMAP variation to denote shadow EPT entries.
-	 */
-	role.base.cr0_wp = true;
-	role.base.smap_andnot_wp = true;
-
 	role.ext = kvm_calc_mmu_role_ext(vcpu);
 	role.ext.execonly = execonly;