diff mbox series

[v3,10/12] mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting

Message ID 20200201034029.4063170-11-jhubbard@nvidia.com (mailing list archive)
State New, archived
Headers show
Series mm/gup: track FOLL_PIN pages | expand

Commit Message

John Hubbard Feb. 1, 2020, 3:40 a.m. UTC
Now that pages are "DMA-pinned" via pin_user_page*(), and unpinned via
unpin_user_pages*(), we need some visibility into whether all of this is
working correctly.

Add two new fields to /proc/vmstat:

    nr_foll_pin_requested
    nr_foll_pin_returned

These are documented in Documentation/core-api/pin_user_pages.rst.
They represent the number of pages (since boot time) that have been
pinned ("nr_foll_pin_requested") and unpinned ("nr_foll_pin_returned"),
via pin_user_pages*() and unpin_user_pages*().

In the absence of long-running DMA or RDMA operations that hold pages
pinned, the above two fields will normally be equal to each other.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/mmzone.h |  2 ++
 mm/gup.c               | 21 +++++++++++++++++++++
 mm/vmstat.c            |  2 ++
 3 files changed, 25 insertions(+)

Comments

Kirill A . Shutemov Feb. 3, 2020, 1:53 p.m. UTC | #1
On Fri, Jan 31, 2020 at 07:40:27PM -0800, John Hubbard wrote:
> diff --git a/mm/gup.c b/mm/gup.c
> index c10d0d051c5b..9fe61d15fc0e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,6 +29,19 @@ struct follow_page_context {
>  	unsigned int page_mask;
>  };
>  
> +#ifdef CONFIG_DEBUG_VM

Why under CONFIG_DEBUG_VM? There's nothing about this in the cover letter.

> +static inline void __update_proc_vmstat(struct page *page,
> +					enum node_stat_item item, int count)
> +{
> +	mod_node_page_state(page_pgdat(page), item, count);
> +}
> +#else
> +static inline void __update_proc_vmstat(struct page *page,
> +					enum node_stat_item item, int count)
> +{
> +}
> +#endif
> +
>  static void hpage_pincount_add(struct page *page, int refs)
>  {
>  	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
John Hubbard Feb. 3, 2020, 9:04 p.m. UTC | #2
On 2/3/20 5:53 AM, Kirill A. Shutemov wrote:
> On Fri, Jan 31, 2020 at 07:40:27PM -0800, John Hubbard wrote:
>> diff --git a/mm/gup.c b/mm/gup.c
>> index c10d0d051c5b..9fe61d15fc0e 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -29,6 +29,19 @@ struct follow_page_context {
>>  	unsigned int page_mask;
>>  };
>>  
>> +#ifdef CONFIG_DEBUG_VM
> 
> Why under CONFIG_DEBUG_VM? There's nothing about this in the cover letter.
> 

Early on, gup_benchmark showed a really significant slowdown from using these 
counters. And I don't doubt that it's still the case.

I'll re-measure and add a short summary and a few numbers to the patch commit
description, and to the v4 cover letter.


thanks,
Kirill A . Shutemov Feb. 3, 2020, 9:30 p.m. UTC | #3
On Mon, Feb 03, 2020 at 01:04:04PM -0800, John Hubbard wrote:
> On 2/3/20 5:53 AM, Kirill A. Shutemov wrote:
> > On Fri, Jan 31, 2020 at 07:40:27PM -0800, John Hubbard wrote:
> >> diff --git a/mm/gup.c b/mm/gup.c
> >> index c10d0d051c5b..9fe61d15fc0e 100644
> >> --- a/mm/gup.c
> >> +++ b/mm/gup.c
> >> @@ -29,6 +29,19 @@ struct follow_page_context {
> >>  	unsigned int page_mask;
> >>  };
> >>  
> >> +#ifdef CONFIG_DEBUG_VM
> > 
> > Why under CONFIG_DEBUG_VM? There's nothing about this in the cover letter.
> > 
> 
> Early on, gup_benchmark showed a really significant slowdown from using these 
> counters. And I don't doubt that it's still the case.
> 
> I'll re-measure and add a short summary and a few numbers to the patch commit
> description, and to the v4 cover letter.

Looks like you'll show zeros for these counters if debug is off. It can be
confusing to the user. I think these counters should go away if you don't
count them.
John Hubbard Feb. 3, 2020, 9:34 p.m. UTC | #4
On 2/3/20 1:30 PM, Kirill A. Shutemov wrote:
> On Mon, Feb 03, 2020 at 01:04:04PM -0800, John Hubbard wrote:
>> On 2/3/20 5:53 AM, Kirill A. Shutemov wrote:
>>> On Fri, Jan 31, 2020 at 07:40:27PM -0800, John Hubbard wrote:
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index c10d0d051c5b..9fe61d15fc0e 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -29,6 +29,19 @@ struct follow_page_context {
>>>>  	unsigned int page_mask;
>>>>  };
>>>>  
>>>> +#ifdef CONFIG_DEBUG_VM
>>>
>>> Why under CONFIG_DEBUG_VM? There's nothing about this in the cover letter.
>>>
>>
>> Early on, gup_benchmark showed a really significant slowdown from using these 
>> counters. And I don't doubt that it's still the case.
>>
>> I'll re-measure and add a short summary and a few numbers to the patch commit
>> description, and to the v4 cover letter.
> 
> Looks like you'll show zeros for these counters if debug is off. It can be
> confusing to the user. I think these counters should go away if you don't
> count them.
> 

OK, that's a good point. (And in fact, the counters==0 situation already led me
astray briefly while debugging with Leon R, even. heh.) I'll remove them entirely for
the !CONFIG_DEBUG_VM case.

thanks,
John Hubbard Feb. 3, 2020, 11:16 p.m. UTC | #5
On 2/3/20 1:34 PM, John Hubbard wrote:
> On 2/3/20 1:30 PM, Kirill A. Shutemov wrote:
>> On Mon, Feb 03, 2020 at 01:04:04PM -0800, John Hubbard wrote:
>>> On 2/3/20 5:53 AM, Kirill A. Shutemov wrote:
>>>> On Fri, Jan 31, 2020 at 07:40:27PM -0800, John Hubbard wrote:
>>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>>> index c10d0d051c5b..9fe61d15fc0e 100644
>>>>> --- a/mm/gup.c
>>>>> +++ b/mm/gup.c
>>>>> @@ -29,6 +29,19 @@ struct follow_page_context {
>>>>>  	unsigned int page_mask;
>>>>>  };
>>>>>  
>>>>> +#ifdef CONFIG_DEBUG_VM
>>>>
>>>> Why under CONFIG_DEBUG_VM? There's nothing about this in the cover letter.
>>>>
>>>
>>> Early on, gup_benchmark showed a really significant slowdown from using these 
>>> counters. And I don't doubt that it's still the case.
>>>
>>> I'll re-measure and add a short summary and a few numbers to the patch commit
>>> description, and to the v4 cover letter.
>>
>> Looks like you'll show zeros for these counters if debug is off. It can be
>> confusing to the user. I think these counters should go away if you don't
>> count them.
>>
> 
> OK, that's a good point. (And in fact, the counters==0 situation already led me
> astray briefly while debugging with Leon R, even. heh.) I'll remove them entirely for
> the !CONFIG_DEBUG_VM case.
> 

On second thought, let me do some more careful performance testing. I don't recall
now if I was just removing every possible perf slowdown item, when I made this decision.
It could be that the perf is not affected, and I could just leave this feature enabled
at all times, which would be nicer.

And after all, these counters were designed for pretty hot-path items. I'll report back
with results...


thanks,
John Hubbard Feb. 3, 2020, 11:43 p.m. UTC | #6
On 2/3/20 3:16 PM, John Hubbard wrote:
> On 2/3/20 1:34 PM, John Hubbard wrote:
>> On 2/3/20 1:30 PM, Kirill A. Shutemov wrote:
>>> On Mon, Feb 03, 2020 at 01:04:04PM -0800, John Hubbard wrote:
>>>> On 2/3/20 5:53 AM, Kirill A. Shutemov wrote:
>>>>> On Fri, Jan 31, 2020 at 07:40:27PM -0800, John Hubbard wrote:
>>>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>>>> index c10d0d051c5b..9fe61d15fc0e 100644
>>>>>> --- a/mm/gup.c
>>>>>> +++ b/mm/gup.c
>>>>>> @@ -29,6 +29,19 @@ struct follow_page_context {
>>>>>>  	unsigned int page_mask;
>>>>>>  };
>>>>>>  
>>>>>> +#ifdef CONFIG_DEBUG_VM
>>>>>
>>>>> Why under CONFIG_DEBUG_VM? There's nothing about this in the cover letter.
>>>>>
>>>>
>>>> Early on, gup_benchmark showed a really significant slowdown from using these 
>>>> counters. And I don't doubt that it's still the case.
>>>>
>>>> I'll re-measure and add a short summary and a few numbers to the patch commit
>>>> description, and to the v4 cover letter.
>>>
>>> Looks like you'll show zeros for these counters if debug is off. It can be
>>> confusing to the user. I think these counters should go away if you don't
>>> count them.
>>>
>>
>> OK, that's a good point. (And in fact, the counters==0 situation already led me
>> astray briefly while debugging with Leon R, even. heh.) I'll remove them entirely for
>> the !CONFIG_DEBUG_VM case.
>>
> 
> On second thought, let me do some more careful performance testing. I don't recall
> now if I was just removing every possible perf slowdown item, when I made this decision.
> It could be that the perf is not affected, and I could just leave this feature enabled
> at all times, which would be nicer.
> 
> And after all, these counters were designed for pretty hot-path items. I'll report back
> with results...


Sure enough, any perf effects are hidden in the noise. I'll just remove the CONFIG_DEBUG_VM
checks. Glad you asked about this!



thanks,
diff mbox series

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c2bc309d1634..01d690586206 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -243,6 +243,8 @@  enum node_stat_item {
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
 	NR_KERNEL_MISC_RECLAIMABLE,	/* reclaimable non-slab kernel pages */
+	NR_FOLL_PIN_REQUESTED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
+	NR_FOLL_PIN_RETURNED,	/* pages returned via unpin_user_page() */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/mm/gup.c b/mm/gup.c
index c10d0d051c5b..9fe61d15fc0e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,6 +29,19 @@  struct follow_page_context {
 	unsigned int page_mask;
 };
 
+#ifdef CONFIG_DEBUG_VM
+static inline void __update_proc_vmstat(struct page *page,
+					enum node_stat_item item, int count)
+{
+	mod_node_page_state(page_pgdat(page), item, count);
+}
+#else
+static inline void __update_proc_vmstat(struct page *page,
+					enum node_stat_item item, int count)
+{
+}
+#endif
+
 static void hpage_pincount_add(struct page *page, int refs)
 {
 	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
@@ -86,6 +99,8 @@  static __maybe_unused struct page *try_grab_compound_head(struct page *page,
 	if (flags & FOLL_GET)
 		return try_get_compound_head(page, refs);
 	else if (flags & FOLL_PIN) {
+		int orig_refs = refs;
+
 		/*
 		 * When pinning a compound page of order > 1 (which is what
 		 * hpage_pincount_available() checks for), use an exact count to
@@ -104,6 +119,7 @@  static __maybe_unused struct page *try_grab_compound_head(struct page *page,
 		if (hpage_pincount_available(page))
 			hpage_pincount_add(page, refs);
 
+		__update_proc_vmstat(page, NR_FOLL_PIN_REQUESTED, orig_refs);
 		return page;
 	}
 
@@ -158,6 +174,8 @@  bool __must_check try_grab_page(struct page *page, unsigned int flags)
 		 * once, so that the page really is pinned.
 		 */
 		page_ref_add(page, refs);
+
+		__update_proc_vmstat(page, NR_FOLL_PIN_REQUESTED, 1);
 	}
 
 	return true;
@@ -178,6 +196,7 @@  static bool __unpin_devmap_managed_user_page(struct page *page)
 
 	count = page_ref_sub_return(page, refs);
 
+	__update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1);
 	/*
 	 * devmap page refcounts are 1-based, rather than 0-based: if
 	 * refcount is 1, then the page is free and the refcount is
@@ -228,6 +247,8 @@  void unpin_user_page(struct page *page)
 
 	if (page_ref_sub_and_test(page, refs))
 		__put_page(page);
+
+	__update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1);
 }
 EXPORT_SYMBOL(unpin_user_page);
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 78d53378db99..b56808bae1b4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1168,6 +1168,8 @@  const char * const vmstat_text[] = {
 	"nr_dirtied",
 	"nr_written",
 	"nr_kernel_misc_reclaimable",
+	"nr_foll_pin_requested",
+	"nr_foll_pin_returned",
 
 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",