[1/2] mm: support fastpath if NUMA is enabled with numa off

Message ID 20210616083745.14288-1-janghyuck.kim@samsung.com
State New, archived

Commit Message

Janghyuck Kim June 16, 2021, 8:37 a.m. UTC
Architecture might support a fake node when CONFIG_NUMA is enabled but
no node settings are provided by ACPI or the device tree. In this case,
getting the memory policy during the memory allocation path is meaningless.

Moreover, performance degradation was observed in the minor page fault
test provided by https://lkml.org/lkml/2006/8/29/294. Average
faults/sec with NUMA enabled and a fake node was 5~6% worse than with
NUMA disabled. To reduce this performance regression, a fastpath is
introduced. The fastpath can skip the memory policy check if NUMA is
enabled but a fake node is in use. If the architecture doesn't support
a fake node, the fastpath has no effect on the memory allocation path.

Signed-off-by: Janghyuck Kim <janghyuck.kim@samsung.com>
---
 mm/internal.h  | 4 ++++
 mm/mempolicy.c | 3 +++
 2 files changed, 7 insertions(+)
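
Patch 2/2 of this series, which adds the arm64 definition of
numa_off_fastpath(), is not shown on this page. As a rough sketch only
(the flag name is illustrative, not taken from the actual patch), the
arch side presumably amounts to something like:

  /* arch/arm64/include/asm/numa.h -- hypothetical sketch of patch 2/2 */
  extern bool arm64_numa_off;	/* assumed: set by dummy_numa_init() */
  #define numa_off_fastpath()	unlikely(arm64_numa_off)

so that a CONFIG_NUMA kernel booted without any ACPI/DT node description
takes the early return added to alloc_pages_vma() below.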

Comments

Vlastimil Babka June 16, 2021, 5:10 p.m. UTC | #1
On 6/16/21 10:37 AM, Janghyuck Kim wrote:
> Architecture might support a fake node when CONFIG_NUMA is enabled but

I suppose you mean the dummy node, i.e. dummy_numa_init()?

Because fakenuma is something different and I think if someone defines fakenuma
nodes they actually would want for the mempolicies to be honored as if there was
a real NUMA setup.

> no node settings are provided by ACPI or the device tree. In this case,
> getting the memory policy during the memory allocation path is meaningless.
> 
> Moreover, performance degradation was observed in the minor page fault
> test provided by https://lkml.org/lkml/2006/8/29/294. Average
> faults/sec with NUMA enabled and a fake node was 5~6% worse than with
> NUMA disabled. To reduce this performance regression, a fastpath is

So you have measured this overhead is all due to mempolicy evaluation?
Interesting, sounds like a lot.

> introduced. The fastpath can skip the memory policy check if NUMA is
> enabled but a fake node is in use. If the architecture doesn't support
> a fake node, the fastpath has no effect on the memory allocation path.
> 
> Signed-off-by: Janghyuck Kim <janghyuck.kim@samsung.com>

Sounds like an interesting direction to improve CONFIG_NUMA built kernels on
single-node systems, but why restrict it only to arm64 and not make it generic
for all systems with a single node?
We could also probably use a static key instead of this #define.
That would even make it possible to switch in case memory hotplug onlines
another node, etc.
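
For reference, a minimal sketch of that static-key variant (illustrative
only, not part of this series; the key name and call sites are
assumptions):

  /* A static key instead of the per-arch #define. */
  #include <linux/jump_label.h>

  DEFINE_STATIC_KEY_FALSE(numa_off_key);

  static inline bool numa_off_fastpath(void)
  {
  	return static_branch_unlikely(&numa_off_key);
  }

dummy_numa_init() would do static_branch_enable(&numa_off_key), and
memory hotplug onlining a second node could call
static_branch_disable(&numa_off_key) to restore mempolicy handling.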

> ---
>  mm/internal.h  | 4 ++++
>  mm/mempolicy.c | 3 +++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 31ff935b2547..3b6c21814fbc 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -36,6 +36,10 @@ void page_writeback_init(void);
>  
>  vm_fault_t do_swap_page(struct vm_fault *vmf);
>  
> +#ifndef numa_off_fastpath
> +#define numa_off_fastpath()	false
> +#endif
> +
>  void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>  		unsigned long floor, unsigned long ceiling);
>  
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index e32360e90274..21156671d941 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2152,6 +2152,9 @@ struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
>  	int preferred_nid;
>  	nodemask_t *nmask;
>  
> +	if (numa_off_fastpath())
> +		return __alloc_pages_nodemask(gfp, order, 0, NULL);
> +
>  	pol = get_vma_policy(vma, addr);
>  
>  	if (pol->mode == MPOL_INTERLEAVE) {
>
Matthew Wilcox June 16, 2021, 5:32 p.m. UTC | #2
On Wed, Jun 16, 2021 at 05:37:41PM +0900, Janghyuck Kim wrote:
> Architecture might support a fake node when CONFIG_NUMA is enabled but
> no node settings are provided by ACPI or the device tree. In this case,
> getting the memory policy during the memory allocation path is meaningless.
> 
> Moreover, performance degradation was observed in the minor page fault
> test provided by https://lkml.org/lkml/2006/8/29/294. Average
> faults/sec with NUMA enabled and a fake node was 5~6% worse than with
> NUMA disabled. To reduce this performance regression, a fastpath is
> introduced. The fastpath can skip the memory policy check if NUMA is
> enabled but a fake node is in use. If the architecture doesn't support
> a fake node, the fastpath has no effect on the memory allocation path.

This patch doesn't even apply to the current kernel, but putting that
aside, what's the expensive part of the current code?  That is,
comparing performance stats between numa_off enabled and numa_off
disabled, where do you see it taking a lot of time?
Janghyuck Kim June 17, 2021, 11:42 a.m. UTC | #3
Hi,

On Wed, Jun 16, 2021 at 07:10:06PM +0200, Vlastimil Babka wrote:
> On 6/16/21 10:37 AM, Janghyuck Kim wrote:
> > Architecture might support a fake node when CONFIG_NUMA is enabled but
> 
> I suppose you mean the dummy node, i.e. dummy_numa_init()?
> 
> Because fakenuma is something different and I think if someone defines fakenuma
> nodes they actually would want for the mempolicies to be honored as if there was
> a real NUMA setup.
> 

You are correct. I meant the dummy node, which prints the "Faking a node
at ..." message at boot time; that's why I called it a fake node.

> > no node settings are provided by ACPI or the device tree. In this case,
> > getting the memory policy during the memory allocation path is meaningless.
> > 
> > Moreover, performance degradation was observed in the minor page fault
> > test provided by https://lkml.org/lkml/2006/8/29/294. Average
> > faults/sec with NUMA enabled and a fake node was 5~6% worse than with
> > NUMA disabled. To reduce this performance regression, a fastpath is
> 
> So you have measured this overhead is all due to mempolicy evaluation?
> Interesting, sounds like a lot.
> 

It's too early to conclude, but mempolicy evaluation seems to account
for a large portion of the total overhead. With this patch, the
performance regression has decreased from 5~6% to 2~3%. It is still
unclear whether the remainder is within the margin of error of the
measurements or is caused by other NUMA-related code.

> > introduced. The fastpath can skip the memory policy check if NUMA is
> > enabled but a fake node is in use. If the architecture doesn't support
> > a fake node, the fastpath has no effect on the memory allocation path.
> > 
> > Signed-off-by: Janghyuck Kim <janghyuck.kim@samsung.com>
> 
> Sounds like an interesting direction to improve CONFIG_NUMA built kernels on
> single-node systems, but why restrict it only to arm64 and not make it generic
> for all systems with a single node?
> We could also probably use a static key instead of this #define.
> That would even make it possible to switch in case memory hotplug onlines
> another node, etc.
> 

I'm participating in an arm64 project now, so I'm not sure whether
other architectures would accept this approach, and I tried not to
touch them. Of course, it can be changed to a generic approach if we
agree.

> > ---
> >  mm/internal.h  | 4 ++++
> >  mm/mempolicy.c | 3 +++
> >  2 files changed, 7 insertions(+)
> > 
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 31ff935b2547..3b6c21814fbc 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -36,6 +36,10 @@ void page_writeback_init(void);
> >  
> >  vm_fault_t do_swap_page(struct vm_fault *vmf);
> >  
> > +#ifndef numa_off_fastpath
> > +#define numa_off_fastpath()	false
> > +#endif
> > +
> >  void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
> >  		unsigned long floor, unsigned long ceiling);
> >  
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index e32360e90274..21156671d941 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2152,6 +2152,9 @@ struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
> >  	int preferred_nid;
> >  	nodemask_t *nmask;
> >  
> > +	if (numa_off_fastpath())
> > +		return __alloc_pages_nodemask(gfp, order, 0, NULL);
> > +
> >  	pol = get_vma_policy(vma, addr);
> >  
> >  	if (pol->mode == MPOL_INTERLEAVE) {
> > 
> 
>
Janghyuck Kim June 17, 2021, 11:55 a.m. UTC | #4
On Wed, Jun 16, 2021 at 06:32:50PM +0100, Matthew Wilcox wrote:
> On Wed, Jun 16, 2021 at 05:37:41PM +0900, Janghyuck Kim wrote:
> > Architecture might support a fake node when CONFIG_NUMA is enabled but
> > no node settings are provided by ACPI or the device tree. In this case,
> > getting the memory policy during the memory allocation path is meaningless.
> > 
> > Moreover, performance degradation was observed in the minor page fault
> > test provided by https://lkml.org/lkml/2006/8/29/294. Average
> > faults/sec with NUMA enabled and a fake node was 5~6% worse than with
> > NUMA disabled. To reduce this performance regression, a fastpath is
> > introduced. The fastpath can skip the memory policy check if NUMA is
> > enabled but a fake node is in use. If the architecture doesn't support
> > a fake node, the fastpath has no effect on the memory allocation path.
> 
> This patch doesn't even apply to the current kernel, but putting that
> aside, what's the expensive part of the current code?  That is,
> comparing performance stats between numa_off enabled and numa_off
> disabled, where do you see it taking a lot of time?
> 

The mempolicy-related code that I skipped with this patch takes only a
few tens of nanoseconds, which is difficult to measure with
sched_clock's precision. But it can affect the minor page fault test
with a large buffer size, because handling one page fault takes several
ms. As I replied in the previous mail, the performance regression has
been reduced from 5~6% to 2~3%.

>
Matthew Wilcox June 17, 2021, 12:40 p.m. UTC | #5
On Thu, Jun 17, 2021 at 08:55:44PM +0900, Janghyuck Kim wrote:
> On Wed, Jun 16, 2021 at 06:32:50PM +0100, Matthew Wilcox wrote:
> > On Wed, Jun 16, 2021 at 05:37:41PM +0900, Janghyuck Kim wrote:
> > > Architecture might support a fake node when CONFIG_NUMA is enabled but
> > > no node settings are provided by ACPI or the device tree. In this case,
> > > getting the memory policy during the memory allocation path is meaningless.
> > > 
> > > Moreover, performance degradation was observed in the minor page fault
> > > test provided by https://lkml.org/lkml/2006/8/29/294. Average
> > > faults/sec with NUMA enabled and a fake node was 5~6% worse than with
> > > NUMA disabled. To reduce this performance regression, a fastpath is
> > > introduced. The fastpath can skip the memory policy check if NUMA is
> > > enabled but a fake node is in use. If the architecture doesn't support
> > > a fake node, the fastpath has no effect on the memory allocation path.
> > 
> > This patch doesn't even apply to the current kernel, but putting that
> > aside, what's the expensive part of the current code?  That is,
> > comparing performance stats between numa_off enabled and numa_off
> > disabled, where do you see it taking a lot of time?
> > 
> 
> The mempolicy-related code that I skipped with this patch takes only a
> few tens of nanoseconds, which is difficult to measure with
> sched_clock's precision. But it can affect the minor page fault test
> with a large buffer size, because handling one page fault takes several
> ms. As I replied in the previous mail, the performance regression has
> been reduced from 5~6% to 2~3%.

I'm not proposing you use sched_clock.  Try perf.
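
Profiling the fault path with perf would show that directly; something
along these lines (a sketch, with 'pft' standing in for the minor page
fault test binary and its options):

  perf record -g ./pft
  perf report --sort symbol

and then comparing how much get_vma_policy() and the rest of the
mempolicy path account for between the two kernels.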

Patch

diff --git a/mm/internal.h b/mm/internal.h
index 31ff935b2547..3b6c21814fbc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -36,6 +36,10 @@ void page_writeback_init(void);
 
 vm_fault_t do_swap_page(struct vm_fault *vmf);
 
+#ifndef numa_off_fastpath
+#define numa_off_fastpath()	false
+#endif
+
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e32360e90274..21156671d941 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2152,6 +2152,9 @@ struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	int preferred_nid;
 	nodemask_t *nmask;
 
+	if (numa_off_fastpath())
+		return __alloc_pages_nodemask(gfp, order, 0, NULL);
+
 	pol = get_vma_policy(vma, addr);
 
 	if (pol->mode == MPOL_INTERLEAVE) {