diff mbox series

[v3] SLUB: Add support for per object memory policies

Message ID 20241001-strict_numa-v3-1-ee31405056ee@gentwo.org (mailing list archive)
State New
Headers show
Series [v3] SLUB: Add support for per object memory policies | expand

Commit Message

Christoph Lameter via B4 Relay Oct. 1, 2024, 7:08 p.m. UTC
From: Christoph Lameter <cl@gentwo.org>

    The old SLAB allocator used to support memory policies on a per
    allocation basis. In SLUB the memory policies are applied on a
    per page frame / folio basis. Doing so avoids having to check memory
    policies in critical code paths for kmalloc and friends.

    This generally worked well on Intel/AMD/PowerPC because the
    interconnect technology is mature and can minimize the latencies
    through intelligent caching even if a small object is not
    placed optimally.

    However, on ARM we have an emergence of new NUMA interconnect
    technology based more on embedded devices. Caching of remote content
    can currently be ineffective using the standard building blocks / mesh
    available on that platform. Such architectures benefit if each slab
    object is individually placed according to memory policies
    and other restrictions.

    This patch adds another kernel parameter

            slab_strict_numa

    If that is set then a static branch is activated that will cause
    the hotpaths of the allocator to evaluate the current memory
    allocation policy. Each object will be properly placed by
    paying the price of extra processing and SLUB will no longer
    defer to the page allocator to apply memory policies at the
    folio level.

    This patch improves performance of memcached running
    on Ampere Altra 2P system (ARM Neoverse N1 processor)
    by 3.6% due to accurate placement of small kernel objects.

Tested-by: Huang Shijie <shijie@os.amperecomputing.com>
Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>
---
Changes in v3:
- Make the static key a static in slub.c
- Use pr_warn / pr_info instead of printk
- Link to v2: https://lore.kernel.org/r/20240906-strict_numa-v2-1-f104e6de6d1e@gentwo.org

Changes in v2:
- Fix various issues
- Testing
---
 mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)


---
base-commit: e32cde8d2bd7d251a8f9b434143977ddf13dcec6
change-id: 20240819-strict_numa-fc59b33123a2

Best regards,

Comments

Vlastimil Babka Oct. 2, 2024, 10:32 a.m. UTC | #1
On 10/1/24 21:08, Christoph Lameter via B4 Relay wrote:
> From: Christoph Lameter <cl@gentwo.org>
> 
>     The old SLAB allocator used to support memory policies on a per
>     allocation bases. In SLUB the memory policies are applied on a
>     per page frame / folio bases. Doing so avoids having to check memory
>     policies in critical code paths for kmalloc and friends.
> 
>     This worked on general well on Intel/AMD/PowerPC because the
>     interconnect technology is mature and can minimize the latencies
>     through intelligent caching even if a small object is not
>     placed optimally.
> 
>     However, on ARM we have an emergence of new NUMA interconnect
>     technology based more on embedded devices. Caching of remote content
>     can currently be ineffective using the standard building blocks / mesh
>     available on that platform. Such architectures benefit if each slab
>     object is individually placed according to memory policies
>     and other restrictions.
> 
>     This patch adds another kernel parameter
> 
>             slab_strict_numa
> 
>     If that is set then a static branch is activated that will cause
>     the hotpaths of the allocator to evaluate the current memory
>     allocation policy. Each object will be properly placed by
>     paying the price of extra processing and SLUB will no longer
>     defer to the page allocator to apply memory policies at the
>     folio level.
> 
>     This patch improves performance of memcached running
>     on Ampere Altra 2P system (ARM Neoverse N1 processor)
>     by 3.6% due to accurate placement of small kernel objects.
> 
> Tested-by: Huang Shijie <shijie@os.amperecomputing.com>
> Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>

OK, but we should document this parameter in:
Documentation/admin-guide/kernel-parameters.rst
Documentation/mm/slab.rst

Thanks,
Vlastimil

> ---
> Changes in v3:
> - Make the static key a static in slub.c
> - Use pr_warn / pr_info instead of printk
> - Link to v2: https://lore.kernel.org/r/20240906-strict_numa-v2-1-f104e6de6d1e@gentwo.org
> 
> Changes in v2:
> - Fix various issues
> - Testing
> ---
>  mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 21f71cb6cc06..7ae94f79740d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -218,6 +218,10 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
>  #endif
>  #endif		/* CONFIG_SLUB_DEBUG */
>  
> +#ifdef CONFIG_NUMA
> +static DEFINE_STATIC_KEY_FALSE(strict_numa);
> +#endif
> +
>  /* Structure holding parameters for get_partial() call chain */
>  struct partial_context {
>  	gfp_t flags;
> @@ -3957,6 +3961,28 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
>  	object = c->freelist;
>  	slab = c->slab;
>  
> +#ifdef CONFIG_NUMA
> +	if (static_branch_unlikely(&strict_numa) &&
> +			node == NUMA_NO_NODE) {
> +
> +		struct mempolicy *mpol = current->mempolicy;
> +
> +		if (mpol) {
> +			/*
> +			 * Special BIND rule support. If existing slab
> +			 * is in permitted set then do not redirect
> +			 * to a particular node.
> +			 * Otherwise we apply the memory policy to get
> +			 * the node we need to allocate on.
> +			 */
> +			if (mpol->mode != MPOL_BIND || !slab ||
> +					!node_isset(slab_nid(slab), mpol->nodes))
> +
> +				node = mempolicy_slab_node();
> +		}
> +	}
> +#endif
> +
>  	if (!USE_LOCKLESS_FAST_PATH() ||
>  	    unlikely(!object || !slab || !node_match(slab, node))) {
>  		object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
> @@ -5601,6 +5627,22 @@ static int __init setup_slub_min_objects(char *str)
>  __setup("slab_min_objects=", setup_slub_min_objects);
>  __setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
>  
> +#ifdef CONFIG_NUMA
> +static int __init setup_slab_strict_numa(char *str)
> +{
> +	if (nr_node_ids > 1) {
> +		static_branch_enable(&strict_numa);
> +		pr_info("SLUB: Strict NUMA enabled.\n");
> +	} else
> +		pr_warn("slab_strict_numa parameter set on non NUMA system.\n");
> +
> +	return 1;
> +}
> +
> +__setup("slab_strict_numa", setup_slab_strict_numa);
> +#endif
> +
> +
>  #ifdef CONFIG_HARDENED_USERCOPY
>  /*
>   * Rejects incorrectly sized objects and objects that are to be copied
> 
> ---
> base-commit: e32cde8d2bd7d251a8f9b434143977ddf13dcec6
> change-id: 20240819-strict_numa-fc59b33123a2
> 
> Best regards,
Christoph Lameter (Ampere) Oct. 2, 2024, 5:52 p.m. UTC | #2
On Wed, 2 Oct 2024, Vlastimil Babka wrote:

> OK, but we should document this parameter in:
> Documentation/admin-guide/kernel-parameters.rst
> Documentation/mm/slab.rst

mm/slab.rst is empty? I used slub.rst instead.

Here is a patch to add documentation:


From 510a95b00355fcbf3fb9e0325c1a0f0ef80c6278 Mon Sep 17 00:00:00 2001
From: Christoph Lameter <cl@gentwo.org>
Date: Wed, 2 Oct 2024 10:27:00 -0700
Subject: [PATCH] Add documentation for the new slab_strict_numa kernel command
 line option

Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
 Documentation/mm/slub.rst                       |  9 +++++++++
 2 files changed, 19 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1518343bbe22..89a4c0ec290c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6544,6 +6544,16 @@
 	stifb=		[HW]
 			Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]

+	slab_strict_numa	[MM]
+			Support memory policies on a per object level
+			in the slab allocator. The default is for memory
+			policies to be applied at the folio level when
+			a new folio is needed or a partial folio is
+			retrieved from the lists. Increases overhead
+			in the slab fastpaths but gains more accurate
+			NUMA kernel object placement which helps with slow
+			interconnects in NUMA systems.
+
         strict_sas_size=
 			[X86]
 			Format: <bool>
diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
index 60d350d08362..84ca1dc94e5e 100644
--- a/Documentation/mm/slub.rst
+++ b/Documentation/mm/slub.rst
@@ -175,6 +175,15 @@ can be influenced by kernel parameters:
 	``slab_max_order`` to 0, what cause minimum possible order of
 	slabs allocation.

+``slab_strict_numa``
+        Enables the application of memory policies on each
+        allocation. This results in more accurate placement of
+        objects which may result in the reduction of accesses
+        to remote nodes. The default is to only apply memory
+        policies at the folio level when a new folio is acquired
+        or a folio is retrieved from the lists. Enabling this
+        option reduces the fastpath performance of the slab allocator.
+
 SLUB Debug output
 =================
Vlastimil Babka Oct. 3, 2024, 9:51 a.m. UTC | #3
On 10/2/24 19:52, Christoph Lameter (Ampere) wrote:
> On Wed, 2 Oct 2024, Vlastimil Babka wrote:
> 
>> OK, but we should document this parameter in:
>> Documentation/admin-guide/kernel-parameters.rst
>> Documentation/mm/slab.rst
> 
> mm/slab.rst is empty? I used slub.rst instead.

Ah yes.

> Here is a patch to add documentation:

Thanks, amended into the commit
Hyeonggon Yoo Oct. 6, 2024, 2:37 p.m. UTC | #4
On Wed, Oct 2, 2024 at 4:08 AM Christoph Lameter via B4 Relay
<devnull+cl.gentwo.org@kernel.org> wrote:
>
> From: Christoph Lameter <cl@gentwo.org>
>
>     The old SLAB allocator used to support memory policies on a per
>     allocation bases. In SLUB the memory policies are applied on a
>     per page frame / folio bases. Doing so avoids having to check memory
>     policies in critical code paths for kmalloc and friends.
>
>     This worked on general well on Intel/AMD/PowerPC because the
>     interconnect technology is mature and can minimize the latencies
>     through intelligent caching even if a small object is not
>     placed optimally.
>
>     However, on ARM we have an emergence of new NUMA interconnect
>     technology based more on embedded devices. Caching of remote content
>     can currently be ineffective using the standard building blocks / mesh
>     available on that platform. Such architectures benefit if each slab
>     object is individually placed according to memory policies
>     and other restrictions.
>
>     This patch adds another kernel parameter
>
>             slab_strict_numa
>
>     If that is set then a static branch is activated that will cause
>     the hotpaths of the allocator to evaluate the current memory
>     allocation policy. Each object will be properly placed by
>     paying the price of extra processing and SLUB will no longer
>     defer to the page allocator to apply memory policies at the
>     folio level.
>
>     This patch improves performance of memcached running
>     on Ampere Altra 2P system (ARM Neoverse N1 processor)
>     by 3.6% due to accurate placement of small kernel objects.
>
> Tested-by: Huang Shijie <shijie@os.amperecomputing.com>
> Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>
> ---
> Changes in v3:
> - Make the static key a static in slub.c
> - Use pr_warn / pr_info instead of printk
> - Link to v2: https://lore.kernel.org/r/20240906-strict_numa-v2-1-f104e6de6d1e@gentwo.org
>
> Changes in v2:
> - Fix various issues
> - Testing
> ---
>  mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 21f71cb6cc06..7ae94f79740d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -218,6 +218,10 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
>  #endif
>  #endif         /* CONFIG_SLUB_DEBUG */
>
> +#ifdef CONFIG_NUMA
> +static DEFINE_STATIC_KEY_FALSE(strict_numa);
> +#endif
> +
>  /* Structure holding parameters for get_partial() call chain */
>  struct partial_context {
>         gfp_t flags;
> @@ -3957,6 +3961,28 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
>         object = c->freelist;
>         slab = c->slab;
>
> +#ifdef CONFIG_NUMA
> +       if (static_branch_unlikely(&strict_numa) &&
> +                       node == NUMA_NO_NODE) {
> +
> +               struct mempolicy *mpol = current->mempolicy;
> +
> +               if (mpol) {
> +                       /*
> +                        * Special BIND rule support. If existing slab
> +                        * is in permitted set then do not redirect
> +                        * to a particular node.
> +                        * Otherwise we apply the memory policy to get
> +                        * the node we need to allocate on.
> +                        */
> +                       if (mpol->mode != MPOL_BIND || !slab ||
> +                                       !node_isset(slab_nid(slab), mpol->nodes))
> +
> +                               node = mempolicy_slab_node();
> +               }

Is it intentional to allow the local node only (via
mempolicy_slab_node()) in interrupt contexts?

> +       }
> +#endif
> +
>         if (!USE_LOCKLESS_FAST_PATH() ||
>             unlikely(!object || !slab || !node_match(slab, node))) {
>                 object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
> @@ -5601,6 +5627,22 @@ static int __init setup_slub_min_objects(char *str)
>  __setup("slab_min_objects=", setup_slub_min_objects);
>  __setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
>
> +#ifdef CONFIG_NUMA
> +static int __init setup_slab_strict_numa(char *str)
> +{
> +       if (nr_node_ids > 1) {
> +               static_branch_enable(&strict_numa);
> +               pr_info("SLUB: Strict NUMA enabled.\n");
> +       } else
> +               pr_warn("slab_strict_numa parameter set on non NUMA system.\n");

nit: this statement should be enclosed within braces per coding style guideline.
Otherwise everything looks good to me (including the document amended).

Best,
Hyeonggon
Christoph Lameter (Ampere) Oct. 7, 2024, 4:19 p.m. UTC | #5
On Sun, 6 Oct 2024, Hyeonggon Yoo wrote:

> > +                        */
> > +                       if (mpol->mode != MPOL_BIND || !slab ||
> > +                                       !node_isset(slab_nid(slab), mpol->nodes))
> > +
> > +                               node = mempolicy_slab_node();
> > +               }
>
> Is it intentional to allow the local node only (via
> mempolicy_slab_node()) in interrupt contexts?

Yes that is the general approach since the task context is generally not
valid for the interrupt which is usually from a device that is not task
specific.
Vlastimil Babka Oct. 8, 2024, 9:48 a.m. UTC | #6
On 10/6/24 16:37, Hyeonggon Yoo wrote:

>> +
>>         if (!USE_LOCKLESS_FAST_PATH() ||
>>             unlikely(!object || !slab || !node_match(slab, node))) {
>>                 object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
>> @@ -5601,6 +5627,22 @@ static int __init setup_slub_min_objects(char *str)
>>  __setup("slab_min_objects=", setup_slub_min_objects);
>>  __setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
>>
>> +#ifdef CONFIG_NUMA
>> +static int __init setup_slab_strict_numa(char *str)
>> +{
>> +       if (nr_node_ids > 1) {
>> +               static_branch_enable(&strict_numa);
>> +               pr_info("SLUB: Strict NUMA enabled.\n");
>> +       } else
>> +               pr_warn("slab_strict_numa parameter set on non NUMA system.\n");
> 
> nit: this statement should be enclosed within braces per coding style guideline.
> Otherwise everything looks good to me (including the document amended).

Right, amended locally, thanks.

> Best,
> Hyeonggon
diff mbox series

Patch

diff --git a/mm/slub.c b/mm/slub.c
index 21f71cb6cc06..7ae94f79740d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -218,6 +218,10 @@  DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
 #endif
 #endif		/* CONFIG_SLUB_DEBUG */
 
+#ifdef CONFIG_NUMA
+static DEFINE_STATIC_KEY_FALSE(strict_numa);
+#endif
+
 /* Structure holding parameters for get_partial() call chain */
 struct partial_context {
 	gfp_t flags;
@@ -3957,6 +3961,28 @@  static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
 	object = c->freelist;
 	slab = c->slab;
 
+#ifdef CONFIG_NUMA
+	if (static_branch_unlikely(&strict_numa) &&
+			node == NUMA_NO_NODE) {
+
+		struct mempolicy *mpol = current->mempolicy;
+
+		if (mpol) {
+			/*
+			 * Special BIND rule support. If existing slab
+			 * is in permitted set then do not redirect
+			 * to a particular node.
+			 * Otherwise we apply the memory policy to get
+			 * the node we need to allocate on.
+			 */
+			if (mpol->mode != MPOL_BIND || !slab ||
+					!node_isset(slab_nid(slab), mpol->nodes))
+
+				node = mempolicy_slab_node();
+		}
+	}
+#endif
+
 	if (!USE_LOCKLESS_FAST_PATH() ||
 	    unlikely(!object || !slab || !node_match(slab, node))) {
 		object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
@@ -5601,6 +5627,22 @@  static int __init setup_slub_min_objects(char *str)
 __setup("slab_min_objects=", setup_slub_min_objects);
 __setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
 
+#ifdef CONFIG_NUMA
+static int __init setup_slab_strict_numa(char *str)
+{
+	if (nr_node_ids > 1) {
+		static_branch_enable(&strict_numa);
+		pr_info("SLUB: Strict NUMA enabled.\n");
+	} else
+		pr_warn("slab_strict_numa parameter set on non NUMA system.\n");
+
+	return 1;
+}
+
+__setup("slab_strict_numa", setup_slab_strict_numa);
+#endif
+
+
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
  * Rejects incorrectly sized objects and objects that are to be copied