Message ID: 20190926231824.149014-3-bgardon@google.com (mailing list archive)
State: New, archived
Series: kvm: mmu: Rework the x86 TDP direct mapped case
On Thu, Sep 26, 2019 at 04:17:58PM -0700, Ben Gardon wrote:
> Separate the functions for generating leaf page table entries from the
> function that inserts them into the paging structure. This refactoring
> will allow changes to the MMU sychronization model to use atomic
> compare / exchanges (which are not guaranteed to succeed) instead of a
> monolithic MMU lock.
>
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>  arch/x86/kvm/mmu.c | 93 ++++++++++++++++++++++++++++------------------
>  1 file changed, 57 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 781c2ca7455e3..7e5ab9c6e2b09 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2964,21 +2964,15 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
>  #define SET_SPTE_WRITE_PROTECTED_PT BIT(0)
>  #define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
>
> -static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> -        unsigned pte_access, int level,
> -        gfn_t gfn, kvm_pfn_t pfn, bool speculative,
> -        bool can_unsync, bool host_writable)
> +static int generate_pte(struct kvm_vcpu *vcpu, unsigned pte_access, int level,

Similar comment on "generate".  Note, I don't necessarily like the names
get_mmio_spte_value() or get_spte_value() either as they could be
misinterpreted as reading the value from memory.  Maybe
calc_{mmio_}spte_value()?

> +        gfn_t gfn, kvm_pfn_t pfn, u64 old_pte, bool speculative,
> +        bool can_unsync, bool host_writable, bool ad_disabled,
> +        u64 *ptep)
>  {
> -    u64 spte = 0;
> +    u64 pte;

Renames and unrelated refactoring (leaving the variable uninitialized and
setting it directly to shadow_present_mask) belong in separate patches.
The renames especially make this patch much more difficult to review.

And, I disagree with the rename; I think it's important to keep the "spte"
nomenclature, even though it's a bit of a misnomer for TDP entries, so that
it is easy to differentiate data that is coming from the host PTEs versus
data that is for KVM's MMU.

>      int ret = 0;
> -    struct kvm_mmu_page *sp;
> -
> -    if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
> -        return 0;
>
> -    sp = page_header(__pa(sptep));
> -    if (sp_ad_disabled(sp))
> -        spte |= shadow_acc_track_value;
> +    *ptep = 0;
>
>      /*
>       * For the EPT case, shadow_present_mask is 0 if hardware
> @@ -2986,36 +2980,39 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>       * ACC_USER_MASK and shadow_user_mask are used to represent
>       * read access. See FNAME(gpte_access) in paging_tmpl.h.
>       */
> -    spte |= shadow_present_mask;
> +    pte = shadow_present_mask;
> +
> +    if (ad_disabled)
> +        pte |= shadow_acc_track_value;
> +
>      if (!speculative)
> -        spte |= spte_shadow_accessed_mask(spte);
> +        pte |= spte_shadow_accessed_mask(pte);
>
>      if (pte_access & ACC_EXEC_MASK)
> -        spte |= shadow_x_mask;
> +        pte |= shadow_x_mask;
>      else
> -        spte |= shadow_nx_mask;
> +        pte |= shadow_nx_mask;
>
>      if (pte_access & ACC_USER_MASK)
> -        spte |= shadow_user_mask;
> +        pte |= shadow_user_mask;
>
>      if (level > PT_PAGE_TABLE_LEVEL)
> -        spte |= PT_PAGE_SIZE_MASK;
> +        pte |= PT_PAGE_SIZE_MASK;
>      if (tdp_enabled)
> -        spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> +        pte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>              kvm_is_mmio_pfn(pfn));
>
>      if (host_writable)
> -        spte |= SPTE_HOST_WRITEABLE;
> +        pte |= SPTE_HOST_WRITEABLE;
>      else
>          pte_access &= ~ACC_WRITE_MASK;
>
>      if (!kvm_is_mmio_pfn(pfn))
> -        spte |= shadow_me_mask;
> +        pte |= shadow_me_mask;
>
> -    spte |= (u64)pfn << PAGE_SHIFT;
> +    pte |= (u64)pfn << PAGE_SHIFT;
>
>      if (pte_access & ACC_WRITE_MASK) {
> -
>          /*
>           * Other vcpu creates new sp in the window between
>           * mapping_level() and acquiring mmu-lock. We can
> @@ -3024,9 +3021,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>           */
>          if (level > PT_PAGE_TABLE_LEVEL &&
>              mmu_gfn_lpage_is_disallowed(vcpu, gfn, level))
> -            goto done;
> +            return 0;
>
> -        spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
> +        pte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
>
>          /*
>           * Optimization: for pte sync, if spte was writable the hash
> @@ -3034,30 +3031,54 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>           * is responsibility of mmu_get_page / kvm_sync_page.
>           * Same reasoning can be applied to dirty page accounting.
>           */
> -        if (!can_unsync && is_writable_pte(*sptep))
> -            goto set_pte;
> +        if (!can_unsync && is_writable_pte(old_pte)) {
> +            *ptep = pte;
> +            return 0;
> +        }
>
>          if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>              pgprintk("%s: found shadow page for %llx, marking ro\n",
>                   __func__, gfn);
> -            ret |= SET_SPTE_WRITE_PROTECTED_PT;
> +            ret = SET_SPTE_WRITE_PROTECTED_PT;

More unnecessary refactoring.

>              pte_access &= ~ACC_WRITE_MASK;
> -            spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
> +            pte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
>          }
>      }
>
> -    if (pte_access & ACC_WRITE_MASK) {
> -        kvm_vcpu_mark_page_dirty(vcpu, gfn);
> -        spte |= spte_shadow_dirty_mask(spte);
> -    }
> +    if (pte_access & ACC_WRITE_MASK)
> +        pte |= spte_shadow_dirty_mask(pte);
>
>      if (speculative)
> -        spte = mark_spte_for_access_track(spte);
> +        pte = mark_spte_for_access_track(pte);
> +
> +    *ptep = pte;
> +    return ret;
> +}
> +
> +static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
> +           int level, gfn_t gfn, kvm_pfn_t pfn, bool speculative,
> +           bool can_unsync, bool host_writable)
> +{
> +    u64 spte;
> +    int ret;
> +    struct kvm_mmu_page *sp;
> +
> +    if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
> +        return 0;
> +
> +    sp = page_header(__pa(sptep));
> +
> +    ret = generate_pte(vcpu, pte_access, level, gfn, pfn, *sptep,
> +              speculative, can_unsync, host_writable,
> +              sp_ad_disabled(sp), &spte);

Yowsers, that's a big prototype.

This is something that came up in an unrelated internal discussion.  I
wonder if it would make sense to define a struct to hold all of the data
needed to insert an spte and pass that on the stack instead of having a
bajillion parameters.  Just spitballing, no idea if it's feasible and/or
reasonable.

> +    if (!spte)
> +        return 0;
> +
> +    if (spte & PT_WRITABLE_MASK)
> +        kvm_vcpu_mark_page_dirty(vcpu, gfn);
>
> -set_pte:
>      if (mmu_spte_update(sptep, spte))
>          ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
> -done:
>      return ret;
>  }
>
> --
> 2.23.0.444.g18eeb5a265-goog
>
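As a purely illustrative sketch of the bundled-parameter idea raised in the
review above, the inputs to the SPTE calculation could be collected into a
single context struct. The struct, field, and function names below are
hypothetical; they are not part of this series or of KVM.

/*
 * Hypothetical sketch only: gather everything needed to compute a leaf
 * SPTE into one context so the helper takes a single argument instead of
 * roughly ten scalars.  None of these names exist in KVM.
 */
struct spte_calc_ctx {
	unsigned int pte_access;
	int level;
	gfn_t gfn;
	kvm_pfn_t pfn;
	u64 old_spte;
	bool speculative;
	bool can_unsync;
	bool host_writable;
	bool ad_disabled;
};

/* Would play the same role as generate_pte() in the patch, reading ctx. */
static int calc_spte_value(struct kvm_vcpu *vcpu,
			   const struct spte_calc_ctx *ctx, u64 *new_spte);

A caller such as set_spte() would fill the struct on its stack and pass a
pointer, which also makes it possible to add or drop an input without
rewriting every prototype in the call chain.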
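The commit message motivates this split with a future synchronization model
in which leaf entries are installed with atomic compare/exchanges that can
fail and be retried, rather than under the monolithic MMU lock. A minimal
sketch of what such an install step could look like, assuming a hypothetical
helper name (this is not code from the series):

/*
 * Illustrative only: atomically install a pre-computed leaf SPTE.  A
 * failed exchange means another thread updated the entry first, so the
 * caller must re-read the SPTE and either retry or bail out.
 */
static bool try_install_spte(u64 *sptep, u64 old_spte, u64 new_spte)
{
	/* cmpxchg64() returns the value it observed; success iff it matched. */
	return cmpxchg64(sptep, old_spte, new_spte) == old_spte;
}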