KVM: x86/mmu: Rename slot_handle_leaf to slot_handle_level_4k

Message ID 20211011204418.162846-1-dmatlack@google.com (mailing list archive)
State New, archived
Series: KVM: x86/mmu: Rename slot_handle_leaf to slot_handle_level_4k

Commit Message

David Matlack Oct. 11, 2021, 8:44 p.m. UTC
slot_handle_leaf is a misnomer because it only operates on 4K SPTEs
whereas "leaf" is used to describe any valid terminal SPTE (4K or
large page). Rename slot_handle_leaf to slot_handle_level_4k to
avoid confusion.

This change makes it more obvious that there is a benign discrepancy
between the legacy MMU and the TDP MMU when it comes to dirty logging.
The legacy MMU only operates on 4K SPTEs when zapping for collapsing
and when clearing D-bits. The TDP MMU, on the other hand, operates on
SPTEs on all levels. The TDP MMU behavior is technically overkill but
not incorrect. So opportunistically add comments to explain the
difference.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)
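
To make the rename concrete, here is a simplified standalone model of the two helpers. This is not the kernel code: the names and the x86 level numbering (1 = 4K, 2 = 2M, 3 = 1G) mirror KVM's, but the handler signature and data are invented for illustration. The point is that slot_handle_level() walks an inclusive range of levels, and the renamed slot_handle_level_4k() pins both ends of that range to PG_LEVEL_4K, so "4K only" is visible in the name.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 KVM-style page-table level numbering. */
#define PG_LEVEL_4K 1
#define PG_LEVEL_2M 2
#define PG_LEVEL_1G 3

typedef bool (*slot_level_handler)(int level, void *data);

/* Invoke fn once per level in [start_level, end_level]; OR the results. */
static bool slot_handle_level(slot_level_handler fn, int start_level,
			      int end_level, void *data)
{
	bool flush = false;
	int level;

	for (level = start_level; level <= end_level; level++)
		flush |= fn(level, data);

	return flush;
}

/* The renamed helper: the same walk, restricted to the 4K level only. */
static bool slot_handle_level_4k(slot_level_handler fn, void *data)
{
	return slot_handle_level(fn, PG_LEVEL_4K, PG_LEVEL_4K, data);
}

/* Example handler: records which levels were visited. */
static bool record_level(int level, void *data)
{
	int *visited = data;

	visited[level] = 1;
	return true;
}
```

The old name, slot_handle_leaf(), suggested the full-range walk even though it performed the pinned one.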

Comments

Ben Gardon Oct. 11, 2021, 9:07 p.m. UTC | #1
On Mon, Oct 11, 2021 at 1:44 PM David Matlack <dmatlack@google.com> wrote:
>
> slot_handle_leaf is a misnomer because it only operates on 4K SPTEs
> whereas "leaf" is used to describe any valid terminal SPTE (4K or
> large page). Rename slot_handle_leaf to slot_handle_level_4k to
> avoid confusion.
>
> This change makes it more obvious that there is a benign discrepancy
> between the legacy MMU and the TDP MMU when it comes to dirty logging.
> The legacy MMU only operates on 4K SPTEs when zapping for collapsing
> and when clearing D-bits. The TDP MMU, on the other hand, operates on
> SPTEs on all levels. The TDP MMU behavior is technically overkill but
> not incorrect. So opportunistically add comments to explain the
> difference.

Note that at least in the zapping case when disabling dirty logging,
the TDP MMU will still only zap pages if they're mapped smaller than
the highest granularity they could be. As a result it uses a slower
check, but shouldn't be doing many (if any) extra zaps.
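
Ben's point can be sketched as a toy model (invented for illustration, not KVM's actual data structures): the TDP MMU walks leaf SPTEs at every level, but only zaps an SPTE when it is mapped at a smaller granularity than the largest page size its backing memory allows. When dirty logging is the only thing that demoted mappings, every collapsible SPTE is a 4K SPTE, so the all-levels walk and the 4K-only walk zap the same set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_LEVEL_4K 1
#define PG_LEVEL_2M 2
#define PG_LEVEL_1G 3

/* Hypothetical stand-in for a leaf SPTE and its size constraints. */
struct spte {
	int mapped_level;	/* level this leaf SPTE is mapped at */
	int max_level;		/* largest level the backing page allows */
};

/* Worth zapping only if it could be recreated as a larger mapping. */
static bool spte_is_collapsible(const struct spte *s)
{
	return s->mapped_level < s->max_level;
}

/* TDP-MMU-style walk: visit every leaf, zap only collapsible ones. */
static int zap_collapsible_all_levels(const struct spte *sptes, size_t n)
{
	int zaps = 0;

	for (size_t i = 0; i < n; i++)
		if (spte_is_collapsible(&sptes[i]))
			zaps++;
	return zaps;
}

/* Legacy-MMU-style walk: visit only 4K leaves. */
static int zap_collapsible_4k_only(const struct spte *sptes, size_t n)
{
	int zaps = 0;

	for (size_t i = 0; i < n; i++)
		if (sptes[i].mapped_level == PG_LEVEL_4K &&
		    spte_is_collapsible(&sptes[i]))
			zaps++;
	return zaps;
}
```

In this model the extra work of the all-levels walk is visiting the huge SPTEs at all; the collapsibility check keeps them from turning into extra zaps.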

>
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  arch/x86/kvm/mmu/mmu.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 24a9f4c3f5e7..f00644e79ef5 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5382,8 +5382,8 @@ slot_handle_level(struct kvm *kvm, const struct kvm_memory_slot *memslot,
>  }
>
>  static __always_inline bool
> -slot_handle_leaf(struct kvm *kvm, const struct kvm_memory_slot *memslot,
> -                slot_level_handler fn, bool flush_on_yield)
> +slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
> +                    slot_level_handler fn, bool flush_on_yield)
>  {
>         return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
>                                  PG_LEVEL_4K, flush_on_yield);
> @@ -5772,7 +5772,12 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>
>         if (kvm_memslots_have_rmaps(kvm)) {
>                 write_lock(&kvm->mmu_lock);
> -               flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
> +               /*
> +                * Strictly speaking only 4k SPTEs need to be zapped because
> +                * KVM never creates intermediate 2m mappings when performing
> +                * dirty logging.
> +                */
> +               flush = slot_handle_level_4k(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
>                 if (flush)
>                         kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
>                 write_unlock(&kvm->mmu_lock);
> @@ -5809,8 +5814,11 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
>
>         if (kvm_memslots_have_rmaps(kvm)) {
>                 write_lock(&kvm->mmu_lock);
> -               flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
> -                                        false);
> +               /*
> +                * Strictly speaking only 4k SPTEs need to be cleared because
> +                * KVM always performs dirty logging at a 4k granularity.
> +                */
> +               flush = slot_handle_level_4k(kvm, memslot, __rmap_clear_dirty, false);
>                 write_unlock(&kvm->mmu_lock);
>         }
>
> --
> 2.33.0.882.g93a45727a2-goog
>
David Matlack Oct. 11, 2021, 9:25 p.m. UTC | #2
On Mon, Oct 11, 2021 at 2:07 PM Ben Gardon <bgardon@google.com> wrote:
>
> On Mon, Oct 11, 2021 at 1:44 PM David Matlack <dmatlack@google.com> wrote:
> >
> > slot_handle_leaf is a misnomer because it only operates on 4K SPTEs
> > whereas "leaf" is used to describe any valid terminal SPTE (4K or
> > large page). Rename slot_handle_leaf to slot_handle_level_4k to
> > avoid confusion.
> >
> > This change makes it more obvious that there is a benign discrepancy
> > between the legacy MMU and the TDP MMU when it comes to dirty logging.
> > The legacy MMU only operates on 4K SPTEs when zapping for collapsing
> > and when clearing D-bits. The TDP MMU, on the other hand, operates on
> > SPTEs on all levels. The TDP MMU behavior is technically overkill but
> > not incorrect. So opportunistically add comments to explain the
> > difference.
>
> Note that at least in the zapping case when disabling dirty logging,
> the TDP MMU will still only zap pages if they're mapped smaller than
> the highest granularity they could be. As a result it uses a slower
> check, but shouldn't be doing many (if any) extra zaps.

Agreed. The legacy MMU implementation relies on the fact that
collapsible 2M SPTEs are never generated by dirty logging so it only
needs to check 4K SPTEs.

The TDP MMU implementation is actually more robust, since it checks
every SPTE for collapsibility. The only reason it would be doing extra
zaps is if something other than dirty logging can cause an SPTE to be
collapsible. (HugePage NX comes to mind.)
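
The "4K granularity" assumption the legacy MMU leans on can be sketched as follows. This is a hypothetical, heavily simplified model (the constants and helpers are invented; only the one-bit-per-4K-page idea reflects KVM): dirty state is tracked in a bitmap with one bit per 4K page, which is why writable mappings are kept at 4K while logging is enabled.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SHIFT_4K	12	/* 4K pages */
#define SLOT_PAGES	64	/* toy memslot size, in 4K pages */

static unsigned char dirty_bitmap[SLOT_PAGES / 8];

/* Mark the 4K page containing byte offset 'off' dirty. */
static void mark_page_dirty(unsigned long off)
{
	unsigned long pfn = off >> PAGE_SHIFT_4K;

	dirty_bitmap[pfn / 8] |= 1u << (pfn % 8);
}

static int page_is_dirty(unsigned long pfn)
{
	return dirty_bitmap[pfn / 8] & (1u << (pfn % 8));
}
```

Because a write can only dirty one 4K-sized bit, mappings must be no larger than 4K for the bitmap to stay precise; once logging is disabled, those demoted 4K SPTEs are exactly the ones worth re-collapsing.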

>
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
>
> Reviewed-by: Ben Gardon <bgardon@google.com>
>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 18 +++++++++++++-----
> >  1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 24a9f4c3f5e7..f00644e79ef5 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5382,8 +5382,8 @@ slot_handle_level(struct kvm *kvm, const struct kvm_memory_slot *memslot,
> >  }
> >
> >  static __always_inline bool
> > -slot_handle_leaf(struct kvm *kvm, const struct kvm_memory_slot *memslot,
> > -                slot_level_handler fn, bool flush_on_yield)
> > +slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
> > +                    slot_level_handler fn, bool flush_on_yield)
> >  {
> >         return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
> >                                  PG_LEVEL_4K, flush_on_yield);
> > @@ -5772,7 +5772,12 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> >
> >         if (kvm_memslots_have_rmaps(kvm)) {
> >                 write_lock(&kvm->mmu_lock);
> > -               flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
> > +               /*
> > +                * Strictly speaking only 4k SPTEs need to be zapped because
> > +                * KVM never creates intermediate 2m mappings when performing
> > +                * dirty logging.
> > +                */
> > +               flush = slot_handle_level_4k(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
> >                 if (flush)
> >                         kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
> >                 write_unlock(&kvm->mmu_lock);
> > @@ -5809,8 +5814,11 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> >
> >         if (kvm_memslots_have_rmaps(kvm)) {
> >                 write_lock(&kvm->mmu_lock);
> > -               flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
> > -                                        false);
> > +               /*
> > +                * Strictly speaking only 4k SPTEs need to be cleared because
> > +                * KVM always performs dirty logging at a 4k granularity.
> > +                */
> > +               flush = slot_handle_level_4k(kvm, memslot, __rmap_clear_dirty, false);
> >                 write_unlock(&kvm->mmu_lock);
> >         }
> >
> > --
> > 2.33.0.882.g93a45727a2-goog
> >
David Matlack Oct. 11, 2021, 9:27 p.m. UTC | #3
On Mon, Oct 11, 2021 at 2:25 PM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, Oct 11, 2021 at 2:07 PM Ben Gardon <bgardon@google.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 1:44 PM David Matlack <dmatlack@google.com> wrote:
> > >
> > > slot_handle_leaf is a misnomer because it only operates on 4K SPTEs
> > > whereas "leaf" is used to describe any valid terminal SPTE (4K or
> > > large page). Rename slot_handle_leaf to slot_handle_level_4k to
> > > avoid confusion.
> > >
> > > This change makes it more obvious that there is a benign discrepancy
> > > between the legacy MMU and the TDP MMU when it comes to dirty logging.
> > > The legacy MMU only operates on 4K SPTEs when zapping for collapsing
> > > and when clearing D-bits. The TDP MMU, on the other hand, operates on
> > > SPTEs on all levels. The TDP MMU behavior is technically overkill but
> > > not incorrect. So opportunistically add comments to explain the
> > > difference.
> >
> > Note that at least in the zapping case when disabling dirty logging,
> > the TDP MMU will still only zap pages if they're mapped smaller than
> > the highest granularity they could be. As a result it uses a slower
> > check, but shouldn't be doing many (if any) extra zaps.
>
> Agreed. The legacy MMU implementation relies on the fact that
> collapsible 2M SPTEs are never generated by dirty logging so it only
> needs to check 4K SPTEs.
>
> The TDP MMU implementation is actually more robust, since it checks
> every SPTE for collapsibility. The only reason it would be doing extra
> zaps is if something other than dirty logging can cause an SPTE to be
> collapsible. (HugePage NX comes to mind.)

Ah, but HugePage NX does not create 2M SPTEs, so this wouldn't actually
result in extra zaps.

>
> >
> > >
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> >
> > Reviewed-by: Ben Gardon <bgardon@google.com>
> >
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 18 +++++++++++++-----
> > >  1 file changed, 13 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 24a9f4c3f5e7..f00644e79ef5 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -5382,8 +5382,8 @@ slot_handle_level(struct kvm *kvm, const struct kvm_memory_slot *memslot,
> > >  }
> > >
> > >  static __always_inline bool
> > > -slot_handle_leaf(struct kvm *kvm, const struct kvm_memory_slot *memslot,
> > > -                slot_level_handler fn, bool flush_on_yield)
> > > +slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
> > > +                    slot_level_handler fn, bool flush_on_yield)
> > >  {
> > >         return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
> > >                                  PG_LEVEL_4K, flush_on_yield);
> > > @@ -5772,7 +5772,12 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> > >
> > >         if (kvm_memslots_have_rmaps(kvm)) {
> > >                 write_lock(&kvm->mmu_lock);
> > > -               flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
> > > +               /*
> > > +                * Strictly speaking only 4k SPTEs need to be zapped because
> > > +                * KVM never creates intermediate 2m mappings when performing
> > > +                * dirty logging.
> > > +                */
> > > +               flush = slot_handle_level_4k(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
> > >                 if (flush)
> > >                         kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
> > >                 write_unlock(&kvm->mmu_lock);
> > > @@ -5809,8 +5814,11 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> > >
> > >         if (kvm_memslots_have_rmaps(kvm)) {
> > >                 write_lock(&kvm->mmu_lock);
> > > -               flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
> > > -                                        false);
> > > +               /*
> > > +                * Strictly speaking only 4k SPTEs need to be cleared because
> > > +                * KVM always performs dirty logging at a 4k granularity.
> > > +                */
> > > +               flush = slot_handle_level_4k(kvm, memslot, __rmap_clear_dirty, false);
> > >                 write_unlock(&kvm->mmu_lock);
> > >         }
> > >
> > > --
> > > 2.33.0.882.g93a45727a2-goog
> > >
Sean Christopherson Oct. 15, 2021, 4:02 p.m. UTC | #4
On Mon, Oct 11, 2021, David Matlack wrote:
> On Mon, Oct 11, 2021 at 2:07 PM Ben Gardon <bgardon@google.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 1:44 PM David Matlack <dmatlack@google.com> wrote:
> > >
> > > slot_handle_leaf is a misnomer because it only operates on 4K SPTEs
> > > whereas "leaf" is used to describe any valid terminal SPTE (4K or
> > > large page). Rename slot_handle_leaf to slot_handle_level_4k to
> > > avoid confusion.
> > >
> > > This change makes it more obvious that there is a benign discrepancy
> > > between the legacy MMU and the TDP MMU when it comes to dirty logging.
> > > The legacy MMU only operates on 4K SPTEs when zapping for collapsing
> > > and when clearing D-bits. The TDP MMU, on the other hand, operates on
> > > SPTEs on all levels. The TDP MMU behavior is technically overkill but
> > > not incorrect. So opportunistically add comments to explain the
> > > difference.
> >
> > Note that at least in the zapping case when disabling dirty logging,
> > the TDP MMU will still only zap pages if they're mapped smaller than
> > the highest granularity they could be. As a result it uses a slower
> > check, but shouldn't be doing many (if any) extra zaps.

FWIW, the legacy MMU now has the same guard against spurious zaps.

> Agreed. The legacy MMU implementation relies on the fact that
> collapsible 2M SPTEs are never generated by dirty logging so it only
> needs to check 4K SPTEs.

I think it makes sense to send a v2 to further clarify the TDP MMU with these
details, e.g. expand the last sentences to something like

  The TDP MMU behavior of zapping SPTEs at all levels is technically overkill for
  its current dirty logging implementation, which always demotes to 4k SPTES, but
  both the TDP MMU and legacy MMU zap if and only if the SPTE can be replaced by
  a larger page, i.e. will not spuriously zap 2m (or larger) SPTEs.

> > > @@ -5772,7 +5772,12 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> > >
> > >         if (kvm_memslots_have_rmaps(kvm)) {
> > >                 write_lock(&kvm->mmu_lock);
> > > -               flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
> > > +               /*
> > > +                * Strictly speaking only 4k SPTEs need to be zapped because
> > > +                * KVM never creates intermediate 2m mappings when performing
> > > +                * dirty logging.

And then also tweak this comment to clarify that the _legacy_ MMU never creates
2m SPTEs when dirty logging, so that the comment doesn't become incorrect if KVM
ever supports 2mb+ granularity in the TDP MMU but not the legacy MMU, e.g.

			/*
			 * Zap only 4k SPTEs, the legacy MMU only supports dirty
			 * logging at 4k granularity, i.e. doesn't create 2m
			 * SPTEs when dirty logging is enabled.
			 */

> > > +                */
> > > +               flush = slot_handle_level_4k(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
> > >                 if (flush)
> > >                         kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
> > >                 write_unlock(&kvm->mmu_lock);
> > > @@ -5809,8 +5814,11 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> > >
> > >         if (kvm_memslots_have_rmaps(kvm)) {
> > >                 write_lock(&kvm->mmu_lock);
> > > -               flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
> > > -                                        false);
> > > +               /*
> > > +                * Strictly speaking only 4k SPTEs need to be cleared because
> > > +                * KVM always performs dirty logging at a 4k granularity.

And "s/KVM/the legacy MMU" here as well.

> > > +                */
> > > +               flush = slot_handle_level_4k(kvm, memslot, __rmap_clear_dirty, false);
> > >                 write_unlock(&kvm->mmu_lock);
> > >         }
> > >
> > > --
> > > 2.33.0.882.g93a45727a2-goog
> > >

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 24a9f4c3f5e7..f00644e79ef5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5382,8 +5382,8 @@  slot_handle_level(struct kvm *kvm, const struct kvm_memory_slot *memslot,
 }
 
 static __always_inline bool
-slot_handle_leaf(struct kvm *kvm, const struct kvm_memory_slot *memslot,
-		 slot_level_handler fn, bool flush_on_yield)
+slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
+		     slot_level_handler fn, bool flush_on_yield)
 {
 	return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
 				 PG_LEVEL_4K, flush_on_yield);
@@ -5772,7 +5772,12 @@  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
+		/*
+		 * Strictly speaking only 4k SPTEs need to be zapped because
+		 * KVM never creates intermediate 2m mappings when performing
+		 * dirty logging.
+		 */
+		flush = slot_handle_level_4k(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
 		if (flush)
 			kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
 		write_unlock(&kvm->mmu_lock);
@@ -5809,8 +5814,11 @@  void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
-					 false);
+		/*
+		 * Strictly speaking only 4k SPTEs need to be cleared because
+		 * KVM always performs dirty logging at a 4k granularity.
+		 */
+		flush = slot_handle_level_4k(kvm, memslot, __rmap_clear_dirty, false);
 		write_unlock(&kvm->mmu_lock);
 	}