| Message ID | 20241001-strict_numa-v3-1-ee31405056ee@gentwo.org |
|---|---|
| State | New |
| Series | [v3] SLUB: Add support for per object memory policies |
On 10/1/24 21:08, Christoph Lameter via B4 Relay wrote:
> From: Christoph Lameter <cl@gentwo.org>
>
> The old SLAB allocator used to support memory policies on a
> per-allocation basis. In SLUB the memory policies are applied on a
> per page frame / folio basis. Doing so avoids having to check memory
> policies in critical code paths for kmalloc and friends.
>
> In general this worked well on Intel/AMD/PowerPC because the
> interconnect technology is mature and can minimize the latencies
> through intelligent caching even if a small object is not
> placed optimally.
>
> However, on ARM we see the emergence of new NUMA interconnect
> technology based more on embedded devices. Caching of remote content
> can currently be ineffective using the standard building blocks / mesh
> available on that platform. Such architectures benefit if each slab
> object is individually placed according to memory policies
> and other restrictions.
>
> This patch adds another kernel parameter:
>
>     slab_strict_numa
>
> If that is set then a static branch is activated that will cause
> the hotpaths of the allocator to evaluate the current memory
> allocation policy. Each object will be properly placed by
> paying the price of extra processing and SLUB will no longer
> defer to the page allocator to apply memory policies at the
> folio level.
>
> This patch improves performance of memcached running
> on an Ampere Altra 2P system (ARM Neoverse N1 processor)
> by 3.6% due to accurate placement of small kernel objects.
>
> Tested-by: Huang Shijie <shijie@os.amperecomputing.com>
> Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>

OK, but we should document this parameter in:

Documentation/admin-guide/kernel-parameters.rst
Documentation/mm/slab.rst

Thanks,
Vlastimil

> ---
> Changes in v3:
> - Make the static key a static in slub.c
> - Use pr_warn / pr_info instead of printk
> - Link to v2: https://lore.kernel.org/r/20240906-strict_numa-v2-1-f104e6de6d1e@gentwo.org
>
> Changes in v2:
> - Fix various issues
> - Testing
> ---
>  mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 21f71cb6cc06..7ae94f79740d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -218,6 +218,10 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
>  #endif
>  #endif /* CONFIG_SLUB_DEBUG */
>
> +#ifdef CONFIG_NUMA
> +static DEFINE_STATIC_KEY_FALSE(strict_numa);
> +#endif
> +
>  /* Structure holding parameters for get_partial() call chain */
>  struct partial_context {
>  	gfp_t flags;
> @@ -3957,6 +3961,28 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
>  	object = c->freelist;
>  	slab = c->slab;
>
> +#ifdef CONFIG_NUMA
> +	if (static_branch_unlikely(&strict_numa) &&
> +			node == NUMA_NO_NODE) {
> +
> +		struct mempolicy *mpol = current->mempolicy;
> +
> +		if (mpol) {
> +			/*
> +			 * Special BIND rule support. If existing slab
> +			 * is in permitted set then do not redirect
> +			 * to a particular node.
> +			 * Otherwise we apply the memory policy to get
> +			 * the node we need to allocate on.
> +			 */
> +			if (mpol->mode != MPOL_BIND || !slab ||
> +					!node_isset(slab_nid(slab), mpol->nodes))
> +
> +				node = mempolicy_slab_node();
> +		}
> +	}
> +#endif
> +
>  	if (!USE_LOCKLESS_FAST_PATH() ||
>  	    unlikely(!object || !slab || !node_match(slab, node))) {
>  		object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
> @@ -5601,6 +5627,22 @@ static int __init setup_slub_min_objects(char *str)
>  __setup("slab_min_objects=", setup_slub_min_objects);
>  __setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
>
> +#ifdef CONFIG_NUMA
> +static int __init setup_slab_strict_numa(char *str)
> +{
> +	if (nr_node_ids > 1) {
> +		static_branch_enable(&strict_numa);
> +		pr_info("SLUB: Strict NUMA enabled.\n");
> +	} else
> +		pr_warn("slab_strict_numa parameter set on non NUMA system.\n");
> +
> +	return 1;
> +}
> +
> +__setup("slab_strict_numa", setup_slab_strict_numa);
> +#endif
> +
> +
>  #ifdef CONFIG_HARDENED_USERCOPY
>  /*
>   * Rejects incorrectly sized objects and objects that are to be copied
>
> ---
> base-commit: e32cde8d2bd7d251a8f9b434143977ddf13dcec6
> change-id: 20240819-strict_numa-fc59b33123a2
>
> Best regards,
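As a usage illustration (a sketch under assumptions, not part of the patch: node 0 is an invented node number, and the program links against libnuma), a task can request MPOL_BIND via set_mempolicy(2); on a kernel booted with slab_strict_numa on a multi-node machine, slab objects allocated in that task's context then follow the policy per object:

```c
/*
 * Minimal user-space sketch, not from the patch itself: bind the
 * calling task's memory policy to node 0 (an assumed node number)
 * so that kernel objects allocated on its behalf are candidates for
 * per-object placement under strict NUMA. Build with: cc demo.c -lnuma
 */
#include <numaif.h>	/* set_mempolicy(), MPOL_BIND (libnuma) */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	unsigned long nodemask = 1UL << 0;	/* node 0, assumed to exist */

	if (set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask) * 8)) {
		perror("set_mempolicy");
		return 1;
	}

	/*
	 * Kernel allocations made in this task's context from here on
	 * (e.g. the struct file behind this open()) follow the policy.
	 */
	int fd = open("/dev/null", O_RDONLY);
	if (fd >= 0)
		close(fd);
	return 0;
}
```

Running the workload under `numactl --membind` would establish the same policy without code changes.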
On Wed, 2 Oct 2024, Vlastimil Babka wrote:

> OK, but we should document this parameter in:
> Documentation/admin-guide/kernel-parameters.rst
> Documentation/mm/slab.rst

mm/slab.rst is empty? I used slub.rst instead.

Here is a patch to add documentation:

From 510a95b00355fcbf3fb9e0325c1a0f0ef80c6278 Mon Sep 17 00:00:00 2001
From: Christoph Lameter <cl@gentwo.org>
Date: Wed, 2 Oct 2024 10:27:00 -0700
Subject: [PATCH] Add documentation for the new slab_strict_numa kernel
 command line option

Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
 Documentation/mm/slub.rst                       |  9 +++++++++
 2 files changed, 19 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1518343bbe22..89a4c0ec290c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6544,6 +6544,16 @@
 	stifb=		[HW]
 			Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
 
+	slab_strict_numa	[MM]
+			Support memory policies on a per object level
+			in the slab allocator. The default is for memory
+			policies to be applied at the folio level when
+			a new folio is needed or a partial folio is
+			retrieved from the lists. Increases overhead
+			in the slab fastpaths but gains more accurate
+			NUMA kernel object placement which helps with slow
+			interconnects in NUMA systems.
+
 	strict_sas_size=
 			[X86]
 			Format: <bool>
diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
index 60d350d08362..84ca1dc94e5e 100644
--- a/Documentation/mm/slub.rst
+++ b/Documentation/mm/slub.rst
@@ -175,6 +175,15 @@ can be influenced by kernel parameters:
 ``slab_max_order`` to 0, what cause minimum possible order of slabs
 allocation.
 
+``slab_strict_numa``
+	Enables the application of memory policies on each
+	allocation. This results in more accurate placement of
+	objects which may result in the reduction of accesses
+	to remote nodes. The default is to only apply memory
+	policies at the folio level when a new folio is acquired
+	or a folio is retrieved from the lists. Enabling this
+	option reduces the fastpath performance of the slab allocator.
+
 SLUB Debug output
 =================
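To check on a live system what this documentation describes, a small hedged helper (not from the thread) can report whether the flag was passed at boot; the authoritative sign that the static key actually fired is the "SLUB: Strict NUMA enabled." pr_info line from the patch appearing in dmesg:

```c
/*
 * Hedged helper, not from the thread: report whether the running
 * kernel was booted with slab_strict_numa. Note the static key is
 * only enabled when nr_node_ids > 1, so presence on the command line
 * alone does not guarantee it is active on a single-node box.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char cmdline[4096];
	FILE *f = fopen("/proc/cmdline", "r");

	if (!f || !fgets(cmdline, sizeof(cmdline), f)) {
		perror("/proc/cmdline");
		return 1;
	}
	fclose(f);

	printf("slab_strict_numa %s on the kernel command line\n",
	       strstr(cmdline, "slab_strict_numa") ? "present" : "absent");
	return 0;
}
```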
On 10/2/24 19:52, Christoph Lameter (Ampere) wrote:
> On Wed, 2 Oct 2024, Vlastimil Babka wrote:
>
>> OK, but we should document this parameter in:
>> Documentation/admin-guide/kernel-parameters.rst
>> Documentation/mm/slab.rst
>
> mm/slab.rst is empty? I used slub.rst instead.

Ah yes.

> Here is a patch to add documentation:

Thanks, amended into the commit
On Wed, Oct 2, 2024 at 4:08 AM Christoph Lameter via B4 Relay
<devnull+cl.gentwo.org@kernel.org> wrote:
>
> From: Christoph Lameter <cl@gentwo.org>
>
> The old SLAB allocator used to support memory policies on a
> per-allocation basis. In SLUB the memory policies are applied on a
> per page frame / folio basis. Doing so avoids having to check memory
> policies in critical code paths for kmalloc and friends.
>
> In general this worked well on Intel/AMD/PowerPC because the
> interconnect technology is mature and can minimize the latencies
> through intelligent caching even if a small object is not
> placed optimally.
>
> However, on ARM we see the emergence of new NUMA interconnect
> technology based more on embedded devices. Caching of remote content
> can currently be ineffective using the standard building blocks / mesh
> available on that platform. Such architectures benefit if each slab
> object is individually placed according to memory policies
> and other restrictions.
>
> This patch adds another kernel parameter:
>
>     slab_strict_numa
>
> If that is set then a static branch is activated that will cause
> the hotpaths of the allocator to evaluate the current memory
> allocation policy. Each object will be properly placed by
> paying the price of extra processing and SLUB will no longer
> defer to the page allocator to apply memory policies at the
> folio level.
>
> This patch improves performance of memcached running
> on an Ampere Altra 2P system (ARM Neoverse N1 processor)
> by 3.6% due to accurate placement of small kernel objects.
>
> Tested-by: Huang Shijie <shijie@os.amperecomputing.com>
> Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>
> ---
> Changes in v3:
> - Make the static key a static in slub.c
> - Use pr_warn / pr_info instead of printk
> - Link to v2: https://lore.kernel.org/r/20240906-strict_numa-v2-1-f104e6de6d1e@gentwo.org
>
> Changes in v2:
> - Fix various issues
> - Testing
> ---
>  mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 21f71cb6cc06..7ae94f79740d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -218,6 +218,10 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
>  #endif
>  #endif /* CONFIG_SLUB_DEBUG */
>
> +#ifdef CONFIG_NUMA
> +static DEFINE_STATIC_KEY_FALSE(strict_numa);
> +#endif
> +
>  /* Structure holding parameters for get_partial() call chain */
>  struct partial_context {
>  	gfp_t flags;
> @@ -3957,6 +3961,28 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
>  	object = c->freelist;
>  	slab = c->slab;
>
> +#ifdef CONFIG_NUMA
> +	if (static_branch_unlikely(&strict_numa) &&
> +			node == NUMA_NO_NODE) {
> +
> +		struct mempolicy *mpol = current->mempolicy;
> +
> +		if (mpol) {
> +			/*
> +			 * Special BIND rule support. If existing slab
> +			 * is in permitted set then do not redirect
> +			 * to a particular node.
> +			 * Otherwise we apply the memory policy to get
> +			 * the node we need to allocate on.
> +			 */
> +			if (mpol->mode != MPOL_BIND || !slab ||
> +					!node_isset(slab_nid(slab), mpol->nodes))
> +
> +				node = mempolicy_slab_node();
> +		}

Is it intentional to allow the local node only (via
mempolicy_slab_node()) in interrupt contexts?

> +	}
> +#endif
> +
>  	if (!USE_LOCKLESS_FAST_PATH() ||
>  	    unlikely(!object || !slab || !node_match(slab, node))) {
>  		object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
> @@ -5601,6 +5627,22 @@ static int __init setup_slub_min_objects(char *str)
>  __setup("slab_min_objects=", setup_slub_min_objects);
>  __setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
>
> +#ifdef CONFIG_NUMA
> +static int __init setup_slab_strict_numa(char *str)
> +{
> +	if (nr_node_ids > 1) {
> +		static_branch_enable(&strict_numa);
> +		pr_info("SLUB: Strict NUMA enabled.\n");
> +	} else
> +		pr_warn("slab_strict_numa parameter set on non NUMA system.\n");

nit: this statement should be enclosed within braces per the
coding-style guidelines. Otherwise everything looks good to me
(including the amended documentation).

Best,
Hyeonggon
On Sun, 6 Oct 2024, Hyeonggon Yoo wrote:

> > +			 */
> > +			if (mpol->mode != MPOL_BIND || !slab ||
> > +					!node_isset(slab_nid(slab), mpol->nodes))
> > +
> > +				node = mempolicy_slab_node();
> > +		}
>
> Is it intentional to allow the local node only (via
> mempolicy_slab_node()) in interrupt contexts?

Yes, that is the general approach, since the task context is generally
not meaningful for an interrupt, which usually comes from a device that
is not specific to any one task.
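For background on that answer: the local-node fallback lives in mempolicy_slab_node() itself. A simplified paraphrase of its entry logic (condensed from mm/mempolicy.c around v6.x; not a verbatim copy, policy dispatch elided — check your tree):

```c
/*
 * Simplified paraphrase of mm/mempolicy.c:mempolicy_slab_node(), not
 * a verbatim copy. In a non-task (e.g. interrupt) context there is no
 * meaningful current->mempolicy to honor, so the local memory node is
 * returned and the strict-NUMA fast path keeps the allocation local.
 */
unsigned int mempolicy_slab_node(void)
{
	struct mempolicy *policy;
	int node = numa_mem_id();	/* nearest node with memory */

	if (!in_task())			/* interrupt or other atomic context */
		return node;

	policy = current->mempolicy;
	if (!policy)
		return node;

	/* ... otherwise pick a node per MPOL_PREFERRED / MPOL_BIND /
	 * MPOL_INTERLEAVE and so on (elided here) ... */
	return node;
}
```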
On 10/6/24 16:37, Hyeonggon Yoo wrote:
>> +
>>  	if (!USE_LOCKLESS_FAST_PATH() ||
>>  	    unlikely(!object || !slab || !node_match(slab, node))) {
>>  		object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
>> @@ -5601,6 +5627,22 @@ static int __init setup_slub_min_objects(char *str)
>>  __setup("slab_min_objects=", setup_slub_min_objects);
>>  __setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
>>
>> +#ifdef CONFIG_NUMA
>> +static int __init setup_slab_strict_numa(char *str)
>> +{
>> +	if (nr_node_ids > 1) {
>> +		static_branch_enable(&strict_numa);
>> +		pr_info("SLUB: Strict NUMA enabled.\n");
>> +	} else
>> +		pr_warn("slab_strict_numa parameter set on non NUMA system.\n");
>
> nit: this statement should be enclosed within braces per the
> coding-style guidelines. Otherwise everything looks good to me
> (including the amended documentation).

Right, amended locally, thanks.

> Best,
> Hyeonggon
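The amended hunk itself is not shown in the thread, but per Documentation/process/coding-style.rst (if one branch of an if/else needs braces, both branches get them) the fix is presumably along these lines:

```c
	/* Assumed style-corrected form of the __setup handler branch;
	 * behavior is unchanged, only the else gains braces to match
	 * the braced if branch.
	 */
	if (nr_node_ids > 1) {
		static_branch_enable(&strict_numa);
		pr_info("SLUB: Strict NUMA enabled.\n");
	} else {
		pr_warn("slab_strict_numa parameter set on non NUMA system.\n");
	}
```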
diff --git a/mm/slub.c b/mm/slub.c
index 21f71cb6cc06..7ae94f79740d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -218,6 +218,10 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
 #endif
 #endif /* CONFIG_SLUB_DEBUG */
 
+#ifdef CONFIG_NUMA
+static DEFINE_STATIC_KEY_FALSE(strict_numa);
+#endif
+
 /* Structure holding parameters for get_partial() call chain */
 struct partial_context {
 	gfp_t flags;
@@ -3957,6 +3961,28 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
 	object = c->freelist;
 	slab = c->slab;
 
+#ifdef CONFIG_NUMA
+	if (static_branch_unlikely(&strict_numa) &&
+			node == NUMA_NO_NODE) {
+
+		struct mempolicy *mpol = current->mempolicy;
+
+		if (mpol) {
+			/*
+			 * Special BIND rule support. If existing slab
+			 * is in permitted set then do not redirect
+			 * to a particular node.
+			 * Otherwise we apply the memory policy to get
+			 * the node we need to allocate on.
+			 */
+			if (mpol->mode != MPOL_BIND || !slab ||
+					!node_isset(slab_nid(slab), mpol->nodes))
+
+				node = mempolicy_slab_node();
+		}
+	}
+#endif
+
 	if (!USE_LOCKLESS_FAST_PATH() ||
 	    unlikely(!object || !slab || !node_match(slab, node))) {
 		object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
@@ -5601,6 +5627,22 @@ static int __init setup_slub_min_objects(char *str)
 __setup("slab_min_objects=", setup_slub_min_objects);
 __setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
 
+#ifdef CONFIG_NUMA
+static int __init setup_slab_strict_numa(char *str)
+{
+	if (nr_node_ids > 1) {
+		static_branch_enable(&strict_numa);
+		pr_info("SLUB: Strict NUMA enabled.\n");
+	} else
+		pr_warn("slab_strict_numa parameter set on non NUMA system.\n");
+
+	return 1;
+}
+
+__setup("slab_strict_numa", setup_slab_strict_numa);
+#endif
+
+
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
  * Rejects incorrectly sized objects and objects that are to be copied
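To make the MPOL_BIND special case in the hunk above concrete, here is a worked walkthrough with invented node numbers (illustration only, not from the thread):

```c
/*
 * Worked example of the MPOL_BIND short-circuit, with invented values:
 * task policy = MPOL_BIND to nodes {0,1}; the current cpu slab sits
 * on node 1; the caller passed node == NUMA_NO_NODE.
 *
 *   mpol->mode != MPOL_BIND                 -> false (it is BIND)
 *   !slab                                   -> false (we have a slab)
 *   !node_isset(slab_nid(slab) = 1, {0,1})  -> false (node 1 permitted)
 *
 * All three disjuncts are false, so node stays NUMA_NO_NODE and the
 * existing cpu slab keeps serving allocations: the BIND policy is
 * already satisfied and no refill is forced. Had the slab been on
 * node 2 instead, node_isset(2, {0,1}) would fail and
 * node = mempolicy_slab_node() would redirect the allocation back
 * into the permitted set.
 */
```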