diff mbox series

mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM

Message ID 54b0b07a-c178-9ffe-b5af-088f3c21696c@kernel.dk (mailing list archive)
State New, archived
Headers show
Series mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM | expand

Commit Message

Jens Axboe Jan. 31, 2023, 3:36 p.m. UTC
Using FOLL_PIN for mapping user pages caused a performance regression of
about 2.7%. Looking at profiles, we see:

+2.71%  [kernel.vmlinux]  [k] mod_node_page_state

which wasn't there before. The node page state counters are percpu, but
with a very low threshold. On my setup, every 108th update ends up
needing to punt to two atomic_lond_add()'s, which is causing this above
regression.

As these counters are purely for debug purposes, move them under
CONFIG_DEBUG_VM rather than do them unconditionally.

Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

I added fixes tags, even though it's not a strict fix for this commits.
But it does fix a performance regression introduced by those commits.
It's a useful hint for backporting.

I'd prefer sticking this at the end of the iov-extract series that
is already pulled in, so it can go with those patches.

Comments

David Hildenbrand Jan. 31, 2023, 3:48 p.m. UTC | #1
On 31.01.23 16:36, Jens Axboe wrote:
> Using FOLL_PIN for mapping user pages caused a performance regression of
> about 2.7%. Looking at profiles, we see:
> 
> +2.71%  [kernel.vmlinux]  [k] mod_node_page_state
> 
> which wasn't there before. The node page state counters are percpu, but
> with a very low threshold. On my setup, every 108th update ends up
> needing to punt to two atomic_lond_add()'s, which is causing this above
> regression.
> 
> As these counters are purely for debug purposes, move them under
> CONFIG_DEBUG_VM rather than do them unconditionally.
> 
> Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
> Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
> Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> I added fixes tags, even though it's not a strict fix for this commits.
> But it does fix a performance regression introduced by those commits.
> It's a useful hint for backporting.

I'd just mention them in the commit log instead, but I don't 
particularly care here as long as the commit ID's are stable.

If still possible, I'd include this as a preparational change for these 
commits instead. Anyhow

Acked-by: David Hildenbrand <david@redhat.com>
Jens Axboe Jan. 31, 2023, 3:50 p.m. UTC | #2
On 1/31/23 8:48?AM, David Hildenbrand wrote:
> On 31.01.23 16:36, Jens Axboe wrote:
>> Using FOLL_PIN for mapping user pages caused a performance regression of
>> about 2.7%. Looking at profiles, we see:
>>
>> +2.71%  [kernel.vmlinux]  [k] mod_node_page_state
>>
>> which wasn't there before. The node page state counters are percpu, but
>> with a very low threshold. On my setup, every 108th update ends up
>> needing to punt to two atomic_lond_add()'s, which is causing this above
>> regression.
>>
>> As these counters are purely for debug purposes, move them under
>> CONFIG_DEBUG_VM rather than do them unconditionally.
>>
>> Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
>> Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
>> Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> I added fixes tags, even though it's not a strict fix for this commits.
>> But it does fix a performance regression introduced by those commits.
>> It's a useful hint for backporting.
> 
> I'd just mention them in the commit log instead, but I don't
> particularly care here as long as the commit ID's are stable.

Sure, I can move that bit into the commit message.

> If still possible, I'd include this as a preparational change for
> these commits instead. Anyhow

That would be preferable in general, but I don't think it's worth
rebasing the series just for that.

> Acked-by: David Hildenbrand <david@redhat.com>

Thanks! Will add in conjunction with updating the commit message.
John Hubbard Jan. 31, 2023, 5:47 p.m. UTC | #3
On 1/31/23 07:36, Jens Axboe wrote:
> Using FOLL_PIN for mapping user pages caused a performance regression of
> about 2.7%. Looking at profiles, we see:
> 
> +2.71%  [kernel.vmlinux]  [k] mod_node_page_state
> 
> which wasn't there before. The node page state counters are percpu, but
> with a very low threshold. On my setup, every 108th update ends up
> needing to punt to two atomic_lond_add()'s, which is causing this above
> regression.
> 
> As these counters are purely for debug purposes, move them under
> CONFIG_DEBUG_VM rather than do them unconditionally.
> 
> Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
> Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
> Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 

Yes, that's the exact right fix (in the absence of some magical
high-volume, high performance per-cpu counters anyway). In fact,
originally these counter operations were behind CONFIG_DEBUG_VM in an
early patchset, but the FOLL_PIN feature was new and potentially flaky,
and also not yet in the Direct IO fast path. So during code review we
decided to enable the debugging information unconditionally.

But now we can no longer afford the debug counters in release
builds--but we also no longer need them, because the FOLL_PIN feature
has settled in and had enough soak time to be more confident about it.

Over time, I'd kind of started assuming that these counters were
necessary for release builds, and it required someone else to wake me up
and point out the obvious, so thanks! :)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
diff mbox series

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cd28a100d9e4..0153ec8a54ae 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -195,8 +195,10 @@  enum node_stat_item {
 	NR_WRITTEN,		/* page writings since bootup */
 	NR_THROTTLED_WRITTEN,	/* NR_WRITTEN while reclaim throttled */
 	NR_KERNEL_MISC_RECLAIMABLE,	/* reclaimable non-slab kernel pages */
+#ifdef CONFIG_DEBUG_VM
 	NR_FOLL_PIN_ACQUIRED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
 	NR_FOLL_PIN_RELEASED,	/* pages returned via unpin_user_page() */
+#endif
 	NR_KERNEL_STACK_KB,	/* measured in KiB */
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
diff --git a/mm/gup.c b/mm/gup.c
index f45a3a5be53a..41abb16286ec 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -168,7 +168,9 @@  struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 		 */
 		smp_mb__after_atomic();
 
+#ifdef CONFIG_DEBUG_VM
 		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
+#endif
 
 		return folio;
 	}
@@ -180,7 +182,9 @@  struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
+#ifdef CONFIG_DEBUG_VM
 		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
+#endif
 		if (folio_test_large(folio))
 			atomic_sub(refs, folio_pincount_ptr(folio));
 		else
@@ -236,8 +240,9 @@  int __must_check try_grab_page(struct page *page, unsigned int flags)
 		} else {
 			folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
 		}
-
+#ifdef CONFIG_DEBUG_VM
 		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+#endif
 	}
 
 	return 0;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1ea6a5ce1c41..5cbd9a1924bf 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1227,8 +1227,10 @@  const char * const vmstat_text[] = {
 	"nr_written",
 	"nr_throttled_written",
 	"nr_kernel_misc_reclaimable",
+#ifdef CONFIG_DEBUG_VM
 	"nr_foll_pin_acquired",
 	"nr_foll_pin_released",
+#endif
 	"nr_kernel_stack",
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	"nr_shadow_call_stack",