diff mbox series

[v6,05/11] mm: Add fast_only bool to test_young and clear_young MMU notifiers

Message ID 20240724011037.3671523-6-jthoughton@google.com (mailing list archive)
State New, archived
Headers show
Series mm: multi-gen LRU: Walk secondary MMU page tables while aging | expand

Commit Message

James Houghton July 24, 2024, 1:10 a.m. UTC
For implementers, the fast_only bool indicates that the age information
needs to be harvested such that we do not slow down other MMU operations,
and ideally that we are not ourselves slowed down by other MMU
operations.  Usually this means that the implementation should be
lockless.

Also add mmu_notifier_test_young_fast_only() and
mmu_notifier_clear_young_fast_only() helpers to set fast_only for these
notifiers.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/mmu_notifier.h | 46 +++++++++++++++++++++++++++++++-----
 include/trace/events/kvm.h   | 19 +++++++++------
 mm/mmu_notifier.c            | 12 ++++++----
 virt/kvm/kvm_main.c          | 12 ++++++----
 4 files changed, 67 insertions(+), 22 deletions(-)

Comments

David Hildenbrand Aug. 1, 2024, 9:36 a.m. UTC | #1
On 24.07.24 03:10, James Houghton wrote:
> For implementers, the fast_only bool indicates that the age information
> needs to be harvested such that we do not slow down other MMU operations,
> and ideally that we are not ourselves slowed down by other MMU
> operations.  Usually this means that the implementation should be
> lockless.

But what are the semantics if "fast_only" cannot be achieved by the 
implementer?

Can we add some documentation to the new functions that explain what 
this mysterious "fast_only" is and what the expected semantics are? 
Please? :)
James Houghton Aug. 1, 2024, 11:13 p.m. UTC | #2
On Thu, Aug 1, 2024 at 2:36 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.07.24 03:10, James Houghton wrote:
> > For implementers, the fast_only bool indicates that the age information
> > needs to be harvested such that we do not slow down other MMU operations,
> > and ideally that we are not ourselves slowed down by other MMU
> > operations.  Usually this means that the implementation should be
> > lockless.
>
> But what are the semantics if "fast_only" cannot be achieved by the
> implementer?
>
> Can we add some documentation to the new functions that explain what
> this mysterious "fast_only" is and what the expected semantics are?
> Please? :)

Thanks for pointing out the missing documentation. How's this?

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 45c5995ebd84..c21992036dd3 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -106,6 +106,18 @@ struct mmu_notifier_ops {
         * clear_young is a lightweight version of clear_flush_young. Like the
         * latter, it is supposed to test-and-clear the young/accessed bitflag
         * in the secondary pte, but it may omit flushing the secondary tlb.
+        *
+        * The fast_only parameter indicates that this call should not block,
+        * and this function should not cause other MMU notifier calls to
+        * block. Usually this means that the implementation should be
+        * lockless.
+        *
+        * When called with fast_only, this notifier will be a no-op unless
+        * has_fast_aging is set on the struct mmu_notifier.
+        *
+        * When fast_only is true, if the implementer cannot determine that a
+        * range is young without blocking, it should return 0 (i.e.,
+        * that the range is NOT young).
         */
        int (*clear_young)(struct mmu_notifier *subscription,
                           struct mm_struct *mm,
@@ -118,6 +130,8 @@ struct mmu_notifier_ops {
         * the secondary pte. This is used to know if the page is
         * frequently used without actually clearing the flag or tearing
         * down the secondary mapping on the page.
+        *
+        * The fast_only parameter has the same meaning as with clear_young.
         */
        int (*test_young)(struct mmu_notifier *subscription,
                          struct mm_struct *mm,

I've also moved the commit that follows this one (the one that adds
has_fast_aging) to be before this one so that the comment makes sense.
David Hildenbrand Aug. 2, 2024, 3:57 p.m. UTC | #3
On 02.08.24 01:13, James Houghton wrote:
> On Thu, Aug 1, 2024 at 2:36 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 24.07.24 03:10, James Houghton wrote:
>>> For implementers, the fast_only bool indicates that the age information
>>> needs to be harvested such that we do not slow down other MMU operations,
>>> and ideally that we are not ourselves slowed down by other MMU
>>> operations.  Usually this means that the implementation should be
>>> lockless.
>>
>> But what are the semantics if "fast_only" cannot be achieved by the
>> implementer?
>>
>> Can we add some documentation to the new functions that explain what
>> this mysterious "fast_only" is and what the expected semantics are?
>> Please? :)
> 
> Thanks for pointing out the missing documentation. How's this?
> 
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index 45c5995ebd84..c21992036dd3 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -106,6 +106,18 @@ struct mmu_notifier_ops {
>           * clear_young is a lightweight version of clear_flush_young. Like the
>           * latter, it is supposed to test-and-clear the young/accessed bitflag
>           * in the secondary pte, but it may omit flushing the secondary tlb.
> +        *

Probably makes sense to highlight the parameters like @fast_only

> +        * The fast_only parameter indicates that this call should not block,
> +        * and this function should not cause other MMU notifier calls to
> +        * block. Usually this means that the implementation should be
> +        * lockless.
> +        *
> +        * When called with fast_only, this notifier will be a no-op unless
> +        * has_fast_aging is set on the struct mmu_notifier.

"... and will return 0 (NOT young)." ?

> +        *
> +        * When fast_only is true, if the implementer cannot determine that a
> +        * range is young without blocking, it should return 0 (i.e.,
> +        * that the range is NOT young).
>           */
>          int (*clear_young)(struct mmu_notifier *subscription,
>                             struct mm_struct *mm,
> @@ -118,6 +130,8 @@ struct mmu_notifier_ops {
>           * the secondary pte. This is used to know if the page is
>           * frequently used without actually clearing the flag or tearing
>           * down the secondary mapping on the page.
> +        *
> +        * The fast_only parameter has the same meaning as with clear_young.
>           */
>          int (*test_young)(struct mmu_notifier *subscription,
>                            struct mm_struct *mm,
> 
> I've also moved the commit that follows this one (the one that adds
> has_fast_aging) to be before this one so that the comment makes sense.


Makes sense, thanks!
James Houghton Aug. 5, 2024, 4:54 p.m. UTC | #4
On Fri, Aug 2, 2024 at 8:57 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 02.08.24 01:13, James Houghton wrote:
> > On Thu, Aug 1, 2024 at 2:36 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 24.07.24 03:10, James Houghton wrote:
> >>> For implementers, the fast_only bool indicates that the age information
> >>> needs to be harvested such that we do not slow down other MMU operations,
> >>> and ideally that we are not ourselves slowed down by other MMU
> >>> operations.  Usually this means that the implementation should be
> >>> lockless.
> >>
> >> But what are the semantics if "fast_only" cannot be achieved by the
> >> implementer?
> >>
> >> Can we add some documentation to the new functions that explain what
> >> this mysterious "fast_only" is and what the expected semantics are?
> >> Please? :)
> >
> > Thanks for pointing out the missing documentation. How's this?
> >
> > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > index 45c5995ebd84..c21992036dd3 100644
> > --- a/include/linux/mmu_notifier.h
> > +++ b/include/linux/mmu_notifier.h
> > @@ -106,6 +106,18 @@ struct mmu_notifier_ops {
> >           * clear_young is a lightweight version of clear_flush_young. Like the
> >           * latter, it is supposed to test-and-clear the young/accessed bitflag
> >           * in the secondary pte, but it may omit flushing the secondary tlb.
> > +        *
>
> Probably makes sense to highlight the parameters like @fast_only

Will do.

>
> > +        * The fast_only parameter indicates that this call should not block,
> > +        * and this function should not cause other MMU notifier calls to
> > +        * block. Usually this means that the implementation should be
> > +        * lockless.
> > +        *
> > +        * When called with fast_only, this notifier will be a no-op unless
> > +        * has_fast_aging is set on the struct mmu_notifier.
>
> "... and will return 0 (NOT young)." ?

Thanks, I'll add this.

>
> > +        *
> > +        * When fast_only is true, if the implementer cannot determine that a
> > +        * range is young without blocking, it should return 0 (i.e.,
> > +        * that the range is NOT young).
> >           */
> >          int (*clear_young)(struct mmu_notifier *subscription,
> >                             struct mm_struct *mm,
> > @@ -118,6 +130,8 @@ struct mmu_notifier_ops {
> >           * the secondary pte. This is used to know if the page is
> >           * frequently used without actually clearing the flag or tearing
> >           * down the secondary mapping on the page.
> > +        *
> > +        * The fast_only parameter has the same meaning as with clear_young.
> >           */
> >          int (*test_young)(struct mmu_notifier *subscription,
> >                            struct mm_struct *mm,
> >
> > I've also moved the commit that follows this one (the one that adds
> > has_fast_aging) to be before this one so that the comment makes sense.
>
>
> Makes sense, thanks!

Thanks David!
Jason Gunthorpe Aug. 6, 2024, 5:23 p.m. UTC | #5
On Thu, Aug 01, 2024 at 04:13:40PM -0700, James Houghton wrote:
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -106,6 +106,18 @@ struct mmu_notifier_ops {
>          * clear_young is a lightweight version of clear_flush_young. Like the
>          * latter, it is supposed to test-and-clear the young/accessed bitflag
>          * in the secondary pte, but it may omit flushing the secondary tlb.
> +        *
> +        * The fast_only parameter indicates that this call should not block,
> +        * and this function should not cause other MMU notifier calls to
> +        * block. Usually this means that the implementation should be
> +        * lockless.
> +        *
> +        * When called with fast_only, this notifier will be a no-op unless
> +        * has_fast_aging is set on the struct mmu_notifier.

If you add a has_fast_aging I wonder if it is better to introduce new
ops instead? The semantics are a bit easier to explain that way

Jason
James Houghton Aug. 7, 2024, 3:02 p.m. UTC | #6
On Tue, Aug 6, 2024 at 10:23 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Aug 01, 2024 at 04:13:40PM -0700, James Houghton wrote:
> > --- a/include/linux/mmu_notifier.h
> > +++ b/include/linux/mmu_notifier.h
> > @@ -106,6 +106,18 @@ struct mmu_notifier_ops {
> >          * clear_young is a lightweight version of clear_flush_young. Like the
> >          * latter, it is supposed to test-and-clear the young/accessed bitflag
> >          * in the secondary pte, but it may omit flushing the secondary tlb.
> > +        *
> > +        * The fast_only parameter indicates that this call should not block,
> > +        * and this function should not cause other MMU notifier calls to
> > +        * block. Usually this means that the implementation should be
> > +        * lockless.
> > +        *
> > +        * When called with fast_only, this notifier will be a no-op unless
> > +        * has_fast_aging is set on the struct mmu_notifier.
>
> If you add a has_fast_aging I wonder if it is better to introduce new
> ops instead? The semantics are a bit easier to explain that way

v5 implemented these with a new op[1]. *Just* having the new op is
kind of problematic -- we have yet another op to do something very
similar to what already exists. We are left with two options:
consolidate everything into a single notifier[2] or add a new
parameter to test/clear_young()[3]. The latter, implemented in this
v6, is somewhat simpler to implement (fewer LoC, reduces some
duplication in KVM), though it does indeed make the explanation for
test/clear_young() slightly more complex. I don't feel very strongly,
but unless you do, I think I just ought to stick with how the v6 does
it. :)

Thanks Jason!

[1]: https://lore.kernel.org/linux-mm/20240611002145.2078921-5-jthoughton@google.com/
[2]: https://lore.kernel.org/linux-mm/CADrL8HVHcKSW3hiHzKTit07gzo36jtCZCnM9ZpueyifgNdGggw@mail.gmail.com/
[3]: https://lore.kernel.org/linux-mm/CADrL8HXhGFWwHt728Bg15x1YxJmS=WD8z=KJc_Koaah=OvHDwg@mail.gmail.com/
Jason Gunthorpe Aug. 7, 2024, 6:26 p.m. UTC | #7
On Wed, Aug 07, 2024 at 08:02:26AM -0700, James Houghton wrote:
> On Tue, Aug 6, 2024 at 10:23 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Aug 01, 2024 at 04:13:40PM -0700, James Houghton wrote:
> > > --- a/include/linux/mmu_notifier.h
> > > +++ b/include/linux/mmu_notifier.h
> > > @@ -106,6 +106,18 @@ struct mmu_notifier_ops {
> > >          * clear_young is a lightweight version of clear_flush_young. Like the
> > >          * latter, it is supposed to test-and-clear the young/accessed bitflag
> > >          * in the secondary pte, but it may omit flushing the secondary tlb.
> > > +        *
> > > +        * The fast_only parameter indicates that this call should not block,
> > > +        * and this function should not cause other MMU notifier calls to
> > > +        * block. Usually this means that the implementation should be
> > > +        * lockless.
> > > +        *
> > > +        * When called with fast_only, this notifier will be a no-op unless
> > > +        * has_fast_aging is set on the struct mmu_notifier.
> >
> > If you add a has_fast_aging I wonder if it is better to introduce new
> > ops instead? The semantics are a bit easier to explain that way
> 
> v5 implemented these with a new op[1]. *Just* having the new op is
> kind of problematic -- we have yet another op to do something very
> similar to what already exists. We are left with two options:
> consolidate everything into a single notifier[2] or add a new
> parameter to test/clear_young()[3]. The latter, implemented in this
> v6, is somewhat simpler to implement (fewer LoC, reduces some
> duplication in KVM), though it does indeed make the explanation for
> test/clear_young() slightly more complex. I don't feel very strongly,
> but unless you do, I think I just ought to stick with how the v6 does
> it. :)

If it does makes the code simpler then it is probably the better choice

Jason
diff mbox series

Patch

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index e2dd57ca368b..45c5995ebd84 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -110,7 +110,8 @@  struct mmu_notifier_ops {
 	int (*clear_young)(struct mmu_notifier *subscription,
 			   struct mm_struct *mm,
 			   unsigned long start,
-			   unsigned long end);
+			   unsigned long end,
+			   bool fast_only);
 
 	/*
 	 * test_young is called to check the young/accessed bitflag in
@@ -120,7 +121,8 @@  struct mmu_notifier_ops {
 	 */
 	int (*test_young)(struct mmu_notifier *subscription,
 			  struct mm_struct *mm,
-			  unsigned long address);
+			  unsigned long address,
+			  bool fast_only);
 
 	/*
 	 * invalidate_range_start() and invalidate_range_end() must be
@@ -380,9 +382,11 @@  extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 					  unsigned long end);
 extern int __mmu_notifier_clear_young(struct mm_struct *mm,
 				      unsigned long start,
-				      unsigned long end);
+				      unsigned long end,
+				      bool fast_only);
 extern int __mmu_notifier_test_young(struct mm_struct *mm,
-				     unsigned long address);
+				     unsigned long address,
+				     bool fast_only);
 extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r);
 extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r);
 extern void __mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm,
@@ -416,7 +420,16 @@  static inline int mmu_notifier_clear_young(struct mm_struct *mm,
 					   unsigned long end)
 {
 	if (mm_has_notifiers(mm))
-		return __mmu_notifier_clear_young(mm, start, end);
+		return __mmu_notifier_clear_young(mm, start, end, false);
+	return 0;
+}
+
+static inline int mmu_notifier_clear_young_fast_only(struct mm_struct *mm,
+						     unsigned long start,
+						     unsigned long end)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_clear_young(mm, start, end, true);
 	return 0;
 }
 
@@ -424,7 +437,15 @@  static inline int mmu_notifier_test_young(struct mm_struct *mm,
 					  unsigned long address)
 {
 	if (mm_has_notifiers(mm))
-		return __mmu_notifier_test_young(mm, address);
+		return __mmu_notifier_test_young(mm, address, false);
+	return 0;
+}
+
+static inline int mmu_notifier_test_young_fast_only(struct mm_struct *mm,
+						    unsigned long address)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_test_young(mm, address, true);
 	return 0;
 }
 
@@ -613,12 +634,25 @@  static inline int mmu_notifier_clear_young(struct mm_struct *mm,
 	return 0;
 }
 
+static inline int mmu_notifier_clear_young_fast_only(struct mm_struct *mm,
+						     unsigned long start,
+						     unsigned long end)
+{
+	return 0;
+}
+
 static inline int mmu_notifier_test_young(struct mm_struct *mm,
 					  unsigned long address)
 {
 	return 0;
 }
 
+static inline int mmu_notifier_test_young_fast_only(struct mm_struct *mm,
+						    unsigned long address)
+{
+	return 0;
+}
+
 static inline void
 mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 {
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 74e40d5d4af4..6d9485cf3e51 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -457,36 +457,41 @@  TRACE_EVENT(kvm_unmap_hva_range,
 );
 
 TRACE_EVENT(kvm_age_hva,
-	TP_PROTO(unsigned long start, unsigned long end),
-	TP_ARGS(start, end),
+	TP_PROTO(unsigned long start, unsigned long end, bool fast_only),
+	TP_ARGS(start, end, fast_only),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	start		)
 		__field(	unsigned long,	end		)
+		__field(	bool,		fast_only	)
 	),
 
 	TP_fast_assign(
 		__entry->start		= start;
 		__entry->end		= end;
+		__entry->fast_only	= fast_only;
 	),
 
-	TP_printk("mmu notifier age hva: %#016lx -- %#016lx",
-		  __entry->start, __entry->end)
+	TP_printk("mmu notifier age hva: %#016lx -- %#016lx fast_only: %d",
+		  __entry->start, __entry->end, __entry->fast_only)
 );
 
 TRACE_EVENT(kvm_test_age_hva,
-	TP_PROTO(unsigned long hva),
-	TP_ARGS(hva),
+	TP_PROTO(unsigned long hva, bool fast_only),
+	TP_ARGS(hva, fast_only),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	hva		)
+		__field(	bool,		fast_only	)
 	),
 
 	TP_fast_assign(
 		__entry->hva		= hva;
+		__entry->fast_only	= fast_only;
 	),
 
-	TP_printk("mmu notifier test age hva: %#016lx", __entry->hva)
+	TP_printk("mmu notifier test age hva: %#016lx fast_only: %d",
+		  __entry->hva, __entry->fast_only)
 );
 
 #endif /* _TRACE_KVM_MAIN_H */
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 8982e6139d07..f9a0ca6ffe65 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -384,7 +384,8 @@  int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 
 int __mmu_notifier_clear_young(struct mm_struct *mm,
 			       unsigned long start,
-			       unsigned long end)
+			       unsigned long end,
+			       bool fast_only)
 {
 	struct mmu_notifier *subscription;
 	int young = 0, id;
@@ -395,7 +396,8 @@  int __mmu_notifier_clear_young(struct mm_struct *mm,
 				 srcu_read_lock_held(&srcu)) {
 		if (subscription->ops->clear_young)
 			young |= subscription->ops->clear_young(subscription,
-								mm, start, end);
+								mm, start, end,
+								fast_only);
 	}
 	srcu_read_unlock(&srcu, id);
 
@@ -403,7 +405,8 @@  int __mmu_notifier_clear_young(struct mm_struct *mm,
 }
 
 int __mmu_notifier_test_young(struct mm_struct *mm,
-			      unsigned long address)
+			      unsigned long address,
+			      bool fast_only)
 {
 	struct mmu_notifier *subscription;
 	int young = 0, id;
@@ -414,7 +417,8 @@  int __mmu_notifier_test_young(struct mm_struct *mm,
 				 srcu_read_lock_held(&srcu)) {
 		if (subscription->ops->test_young) {
 			young = subscription->ops->test_young(subscription, mm,
-							      address);
+							      address,
+							      fast_only);
 			if (young)
 				break;
 		}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33f8997a5c29..959b6d5d8ce4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -874,7 +874,7 @@  static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 					      unsigned long start,
 					      unsigned long end)
 {
-	trace_kvm_age_hva(start, end);
+	trace_kvm_age_hva(start, end, false);
 
 	return kvm_handle_hva_range(mn, start, end, kvm_age_gfn);
 }
@@ -882,9 +882,10 @@  static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 					struct mm_struct *mm,
 					unsigned long start,
-					unsigned long end)
+					unsigned long end,
+					bool fast_only)
 {
-	trace_kvm_age_hva(start, end);
+	trace_kvm_age_hva(start, end, fast_only);
 
 	/*
 	 * Even though we do not flush TLB, this will still adversely
@@ -904,9 +905,10 @@  static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 
 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
 				       struct mm_struct *mm,
-				       unsigned long address)
+				       unsigned long address,
+				       bool fast_only)
 {
-	trace_kvm_test_age_hva(address);
+	trace_kvm_test_age_hva(address, fast_only);
 
 	return kvm_handle_hva_range_no_flush(mn, address, address + 1,
 					     kvm_test_age_gfn);