
[v2,6/7] KVM: arm64: Break the table entries using TLBI range instructions

Message ID 20230206172340.2639971-7-rananta@google.com (mailing list archive)
State New, archived
Series KVM: arm64: Add support for FEAT_TLBIRANGE

Commit Message

Raghavendra Rao Ananta Feb. 6, 2023, 5:23 p.m. UTC
Currently, when breaking up the stage-2 table entries, KVM
would flush the entire VM's context using 'vmalls12e1is'
TLBI operation. One of the problematic situation is collapsing
table entries into a hugepage, specifically if the VM is
faulting on many hugepages (say after dirty-logging). This
creates a performance penality for the guest whose pages have
already been faulted earlier as they would have to refill their
TLBs again.

Hence, if the system supports it, use __kvm_tlb_flush_range_vmid_ipa()
to flush only the range of pages governed by the table entry,
while leaving other TLB entries alone. An upcoming patch also
takes advantage of this when breaking up table entries during
the unmap operation.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
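
For illustration (not part of the patch): the range a broken table entry governs follows directly from its level. Assuming a 4KiB translation granule, a level-2 table entry maps a 2MiB region, so the new helper is asked to invalidate only that window:

	/*
	 * Illustration only, restating the hunk in this patch: with a 4KiB
	 * granule, kvm_granule_size(2) is 2MiB, so breaking a level-2 table
	 * entry invalidates just [ctx->addr, ctx->addr + SZ_2M) instead of
	 * the whole VMID.
	 */
	u64 start = ctx->addr;
	u64 end   = ctx->addr + kvm_granule_size(ctx->level);

	kvm_pgtable_stage2_flush_range(mmu, start, end, ctx->level, 0);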

Comments

Oliver Upton March 30, 2023, 12:17 a.m. UTC | #1
nit: s/break/invalidate/g

There is a rather important degree of nuance there. 'Break' as it
relates to break-before-make implies that the PTE is made invalid and
visible to hardware _before_ a subsequent invalidation. There will be
systems that relax this requirement and also support TLBIRANGE.
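
For reference, a rough sketch of the ordering break-before-make implies (illustration only, not the KVM implementation; the helper names are borrowed from the patch below and bbm_sketch() is made up):

	static void bbm_sketch(u64 *ptep, u64 new_pte, struct kvm_s2_mmu *mmu,
			       u64 ipa, u32 level)
	{
		/* 1. "Break": write an invalid PTE, visible to the hardware walker. */
		WRITE_ONCE(*ptep, 0);
		dsb(ishst);

		/* 2. Invalidate any TLB entries caching the old translation. */
		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, level);

		/* 3. "Make": only then install the replacement entry. */
		WRITE_ONCE(*ptep, new_pte);
	}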

On Mon, Feb 06, 2023 at 05:23:39PM +0000, Raghavendra Rao Ananta wrote:

Some nitpicking on the changelog:

> Currently, when breaking up the stage-2 table entries, KVM

'breaking up stage-2 table entries' is rather ambiguous. Instead
describe the operation taking place on the page tables (i.e. hugepage
collapse).

> would flush the entire VM's context using 'vmalls12e1is'
> TLBI operation. One of the problematic situation is collapsing
> table entries into a hugepage, specifically if the VM is
> faulting on many hugepages (say after dirty-logging). This
> creates a performance penality for the guest whose pages have

typo: penalty

> already been faulted earlier as they would have to refill their
> TLBs again.
> 
> Hence, if the system supports it, use __kvm_tlb_flush_range_vmid_ipa()

> to flush only the range of pages governed by the table entry,
> while leaving other TLB entries alone. An upcoming patch also
> takes advantage of this when breaking up table entries during
> the unmap operation.

Language regarding an upcoming patch isn't necessary, as this one stands
on its own (implements and uses a range-based invalidation helper).

> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b11cf2c618a6c..0858d1fa85d6b 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -686,6 +686,20 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
>  	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
>  }
>  
> +static void kvm_pgtable_stage2_flush_range(struct kvm_s2_mmu *mmu, u64 start, u64 end,
> +						u32 level, u32 tlb_level)
> +{
> +	if (system_supports_tlb_range())

You also check this in __kvm_tlb_flush_range(), ideally this should be
done exactly once per call.
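
One hypothetical way to structure it so the check happens only here (just a sketch, not a request for this exact shape; it depends on what __kvm_tlb_flush_range() ends up doing):

	static void kvm_pgtable_stage2_flush_range(struct kvm_s2_mmu *mmu, u64 start, u64 end,
						   u32 level, u32 tlb_level)
	{
		/* No range instructions: fall back to invalidating the whole stage-2. */
		if (!system_supports_tlb_range()) {
			kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
			return;
		}

		kvm_call_hyp(__kvm_tlb_flush_range_vmid_ipa, mmu, start, end, level, tlb_level);
	}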

> +		kvm_call_hyp(__kvm_tlb_flush_range_vmid_ipa, mmu, start, end, level, tlb_level);
> +	else
> +		/*
> +		 * Invalidate the whole stage-2, as we may have numerous leaf
> +		 * entries below us which would otherwise need invalidating
> +		 * individually.
> +		 */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +}
> +
>  /**
>   * stage2_try_break_pte() - Invalidates a pte according to the
>   *			    'break-before-make' requirements of the
> @@ -721,10 +735,13 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
>  	 * Perform the appropriate TLB invalidation based on the evicted pte
>  	 * value (if any).
>  	 */
> -	if (kvm_pte_table(ctx->old, ctx->level))
> -		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> -	else if (kvm_pte_valid(ctx->old))
> +	if (kvm_pte_table(ctx->old, ctx->level)) {
> +		u64 end = ctx->addr + kvm_granule_size(ctx->level);
> +
> +		kvm_pgtable_stage2_flush_range(mmu, ctx->addr, end, ctx->level, 0);
> +	} else if (kvm_pte_valid(ctx->old)) {
>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> +	}
>  
>  	if (stage2_pte_is_counted(ctx->old))
>  		mm_ops->put_page(ctx->ptep);
> -- 
> 2.39.1.519.gcb327c4b5f-goog
> 
>
Raghavendra Rao Ananta April 3, 2023, 9:25 p.m. UTC | #2
On Wed, Mar 29, 2023 at 5:17 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> nit: s/break/invalidate/g
>
> There is a rather important degree of nuance there. 'Break' as it
> relates to break-before-make implies that the PTE is made invalid and
> visible to hardware _before_ a subsequent invalidation. There will be
> systems that relax this requirement and also support TLBIRANGE.
>
> On Mon, Feb 06, 2023 at 05:23:39PM +0000, Raghavendra Rao Ananta wrote:
>
> Some nitpicking on the changelog:
>
> > Currently, when breaking up the stage-2 table entries, KVM
>
> 'breaking up stage-2 table entries' is rather ambiguous. Instead
> describe the operation taking place on the page tables (i.e. hugepage
> collapse).
>
> > would flush the entire VM's context using 'vmalls12e1is'
> > TLBI operation. One of the problematic situation is collapsing
> > table entries into a hugepage, specifically if the VM is
> > faulting on many hugepages (say after dirty-logging). This
> > creates a performance penality for the guest whose pages have
>
> typo: penalty
>
> > already been faulted earlier as they would have to refill their
> > TLBs again.
> >
> > Hence, if the system supports it, use __kvm_tlb_flush_range_vmid_ipa()
>
> > to flush only the range of pages governed by the table entry,
> > while leaving other TLB entries alone. An upcoming patch also
> > takes advantage of this when breaking up table entries during
> > the unmap operation.
>
> Language regarding an upcoming patch isn't necessary, as this one stands
> on its own (implements and uses a range-based invalidation helper).
>
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 23 ++++++++++++++++++++---
> >  1 file changed, 20 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index b11cf2c618a6c..0858d1fa85d6b 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -686,6 +686,20 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
> >       return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
> >  }
> >
> > +static void kvm_pgtable_stage2_flush_range(struct kvm_s2_mmu *mmu, u64 start, u64 end,
> > +                                             u32 level, u32 tlb_level)
> > +{
> > +     if (system_supports_tlb_range())
>
> You also check this in __kvm_tlb_flush_range(), ideally this should be
> done exactly once per call.
>
> > +             kvm_call_hyp(__kvm_tlb_flush_range_vmid_ipa, mmu, start, end, level, tlb_level);
> > +     else
> > +             /*
> > +              * Invalidate the whole stage-2, as we may have numerous leaf
> > +              * entries below us which would otherwise need invalidating
> > +              * individually.
> > +              */
> > +             kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> > +}
> > +
> >  /**
> >   * stage2_try_break_pte() - Invalidates a pte according to the
> >   *                       'break-before-make' requirements of the
> > @@ -721,10 +735,13 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
> >        * Perform the appropriate TLB invalidation based on the evicted pte
> >        * value (if any).
> >        */
> > -     if (kvm_pte_table(ctx->old, ctx->level))
> > -             kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> > -     else if (kvm_pte_valid(ctx->old))
> > +     if (kvm_pte_table(ctx->old, ctx->level)) {
> > +             u64 end = ctx->addr + kvm_granule_size(ctx->level);
> > +
> > +             kvm_pgtable_stage2_flush_range(mmu, ctx->addr, end, ctx->level, 0);
> > +     } else if (kvm_pte_valid(ctx->old)) {
> >               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> > +     }
> >
> >       if (stage2_pte_is_counted(ctx->old))
> >               mm_ops->put_page(ctx->ptep);
> > --
> > 2.39.1.519.gcb327c4b5f-goog
> >
> >
ACK on all of the comments. I'll address them in the next revision.

Thank you.
Raghavendra
>
> --
> Thanks,
> Oliver

Patch

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b11cf2c618a6c..0858d1fa85d6b 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -686,6 +686,20 @@  static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
 	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
 }
 
+static void kvm_pgtable_stage2_flush_range(struct kvm_s2_mmu *mmu, u64 start, u64 end,
+						u32 level, u32 tlb_level)
+{
+	if (system_supports_tlb_range())
+		kvm_call_hyp(__kvm_tlb_flush_range_vmid_ipa, mmu, start, end, level, tlb_level);
+	else
+		/*
+		 * Invalidate the whole stage-2, as we may have numerous leaf
+		 * entries below us which would otherwise need invalidating
+		 * individually.
+		 */
+		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+}
+
 /**
  * stage2_try_break_pte() - Invalidates a pte according to the
  *			    'break-before-make' requirements of the
@@ -721,10 +735,13 @@  static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
 	 * Perform the appropriate TLB invalidation based on the evicted pte
 	 * value (if any).
 	 */
-	if (kvm_pte_table(ctx->old, ctx->level))
-		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
-	else if (kvm_pte_valid(ctx->old))
+	if (kvm_pte_table(ctx->old, ctx->level)) {
+		u64 end = ctx->addr + kvm_granule_size(ctx->level);
+
+		kvm_pgtable_stage2_flush_range(mmu, ctx->addr, end, ctx->level, 0);
+	} else if (kvm_pte_valid(ctx->old)) {
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
+	}
 
 	if (stage2_pte_is_counted(ctx->old))
 		mm_ops->put_page(ctx->ptep);