diff mbox series

mm/slab: add new flag SLAB_NO_MERGE to avoid merging per slab

Message ID 20230524101748.30714-1-dsterba@suse.com (mailing list archive)
State New
Headers show
Series mm/slab: add new flag SLAB_NO_MERGE to avoid merging per slab | expand

Commit Message

David Sterba May 24, 2023, 10:17 a.m. UTC
Add a flag that allows to disable merging per slab. This can be used for
more fine grained control over the caches or for debugging builds where
separate slabs can verify that no objects leak.

The slab_nomerge boot option is too coarse and would need to be enabled
on all testing hosts. There are some other ways how to disable merging,
e.g. a slab constructor but this disables poisoning besides that it adds
additional overhead. Other flags are internal and may have other
semantics.

A concrete example what motivates the flag. During 'btrfs balance' slab
top reported huge increase in caches like

  1330095 1330095 100%    0.10K  34105       39    136420K Acpi-ParseExt
  1734684 1734684 100%    0.14K  61953       28    247812K pid_namespace
  8244036 6873075  83%    0.11K 229001       36    916004K khugepaged_mm_slot

which was confusing and that it's because of slab merging was not the
first idea.  After rebooting with slab_nomerge all the caches were from
btrfs_ namespace as expected.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 include/linux/slab.h | 3 +++
 mm/slab_common.c     | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Hyeonggon Yoo May 24, 2023, 12:56 p.m. UTC | #1
On Wed, May 24, 2023 at 12:17:48PM +0200, David Sterba wrote:
> Add a flag that allows to disable merging per slab. This can be used for
> more fine grained control over the caches or for debugging builds where
> separate slabs can verify that no objects leak.
> The slab_nomerge boot option is too coarse and would need to be enabled
> on all testing hosts. 

Hello David,

There is no users nor interface to set this flag, I guess you're going
to use it by modifying source code, when debugging?

Does introducing new slub_debug option (i.e. slub_debug=N,pid_namespace)
work for your use case? (there are some boot-time slub_debug options described in
Documentation/mm/slub.rst)

> There are some other ways how to disable merging,
> e.g. a slab constructor but this disables poisoning besides that it adds
> additional overhead. Other flags are internal and may have other
> semantics.
>
> A concrete example what motivates the flag. During 'btrfs balance' slab
> top reported huge increase in caches like
> 
>   1330095 1330095 100%    0.10K  34105       39    136420K Acpi-ParseExt
>   1734684 1734684 100%    0.14K  61953       28    247812K pid_namespace
>   8244036 6873075  83%    0.11K 229001       36    916004K khugepaged_mm_slot
> 
> which was confusing and that it's because of slab merging was not the
> first idea.  After rebooting with slab_nomerge all the caches were from
> btrfs_ namespace as expected.
>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  include/linux/slab.h | 3 +++
>  mm/slab_common.c     | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h

Thanks,
Vlastimil Babka May 24, 2023, 1:53 p.m. UTC | #2
On 5/24/23 14:56, Hyeonggon Yoo wrote:
> On Wed, May 24, 2023 at 12:17:48PM +0200, David Sterba wrote:
>> Add a flag that allows to disable merging per slab. This can be used for
>> more fine grained control over the caches or for debugging builds where
>> separate slabs can verify that no objects leak.
>> The slab_nomerge boot option is too coarse and would need to be enabled
>> on all testing hosts. 
> 
> Hello David,
> 
> There is no users nor interface to set this flag, I guess you're going
> to use it by modifying source code, when debugging?
> 
> Does introducing new slub_debug option (i.e. slub_debug=N,pid_namespace)
> work for your use case? (there are some boot-time slub_debug options described in
> Documentation/mm/slub.rst)

Yeah, it supports globbing so it would be e.g. slub_debug=N,btrfs*
That would deal with the "too coarse" aspect slab_nomerge. As for "need to
be enabled on all testing hosts", is it more convenient to deploy a debug
kernel build on them? Might be because you do that for other reasons
already? Just want to clarify.

BTW this was proposed as RFC few months ago but not pursued:
https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/

I have no big objections, just wouldn't like to see its usage proliferate
unconditionally into non-debug builds.

>> There are some other ways how to disable merging,
>> e.g. a slab constructor but this disables poisoning besides that it adds
>> additional overhead. Other flags are internal and may have other
>> semantics.
>>
>> A concrete example what motivates the flag. During 'btrfs balance' slab
>> top reported huge increase in caches like
>> 
>>   1330095 1330095 100%    0.10K  34105       39    136420K Acpi-ParseExt
>>   1734684 1734684 100%    0.14K  61953       28    247812K pid_namespace
>>   8244036 6873075  83%    0.11K 229001       36    916004K khugepaged_mm_slot
>> 
>> which was confusing and that it's because of slab merging was not the
>> first idea.  After rebooting with slab_nomerge all the caches were from
>> btrfs_ namespace as expected.
>>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> ---
>>  include/linux/slab.h | 3 +++
>>  mm/slab_common.c     | 2 +-
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/linux/slab.h b/include/linux/slab.h
> 
> Thanks,
>
David Sterba May 25, 2023, 11:59 a.m. UTC | #3
On Wed, May 24, 2023 at 09:56:02PM +0900, Hyeonggon Yoo wrote:
> On Wed, May 24, 2023 at 12:17:48PM +0200, David Sterba wrote:
> > Add a flag that allows to disable merging per slab. This can be used for
> > more fine grained control over the caches or for debugging builds where
> > separate slabs can verify that no objects leak.
> > The slab_nomerge boot option is too coarse and would need to be enabled
> > on all testing hosts. 
> 
> There is no users nor interface to set this flag, I guess you're going
> to use it by modifying source code, when debugging?

An example usage

--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -12,6 +12,12 @@
 #include "async-thread.h"
 #include "block-rsv.h"
 
+#ifdef CONFIG_BTRFS_DEBUG
+#define SLAB_DEBUG_NO_MERGE            0
+#else
+#define SLAB_DEBUG_NO_MERGE            SLAB_NO_MERGE
+#endif
+
 #define BTRFS_MAX_EXTENT_SIZE SZ_128M
 
 #define BTRFS_OLDEST_GENERATION        0ULL
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5049,7 +5049,7 @@ int __init btrfs_ctree_init(void)
 {
        btrfs_path_cachep = kmem_cache_create("btrfs_path",
                        sizeof(struct btrfs_path), 0,
-                       SLAB_MEM_SPREAD, NULL);
+                       SLAB_MEM_SPREAD | SLAB_DEBUG_NO_MERGE, NULL);
        if (!btrfs_path_cachep)
                return -ENOMEM;
        return 0;
---

and this will be a permanent change, not added as needed.

> Does introducing new slub_debug option (i.e. slub_debug=N,pid_namespace)
> work for your use case? (there are some boot-time slub_debug options described in
> Documentation/mm/slub.rst)

I'd like to keep boot parameters unchanged, the testing setups are
different, physical, local VM, hosted. For the same reason the config
option CONFIG_SLUB_DEBUG_ON=y is very convenient.
David Sterba May 25, 2023, 12:05 p.m. UTC | #4
On Wed, May 24, 2023 at 03:53:11PM +0200, Vlastimil Babka wrote:
> On 5/24/23 14:56, Hyeonggon Yoo wrote:
> > On Wed, May 24, 2023 at 12:17:48PM +0200, David Sterba wrote:
> > work for your use case? (there are some boot-time slub_debug options described in
> > Documentation/mm/slub.rst)
> 
> Yeah, it supports globbing so it would be e.g. slub_debug=N,btrfs*
> That would deal with the "too coarse" aspect slab_nomerge. As for "need to
> be enabled on all testing hosts", is it more convenient to deploy a debug
> kernel build on them? Might be because you do that for other reasons
> already? Just want to clarify.

Yeah, agreed.

> BTW this was proposed as RFC few months ago but not pursued:
> https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/
> 
> I have no big objections, just wouldn't like to see its usage proliferate
> unconditionally into non-debug builds.

It would be fine for me to make it conditionally available, e.g.
depending on SLUB_DEBUG.
diff mbox series

Patch

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6b3e155b70bf..06b94dfbce65 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -106,6 +106,9 @@ 
 /* Avoid kmemleak tracing */
 #define SLAB_NOLEAKTRACE	((slab_flags_t __force)0x00800000U)
 
+/* Don't merge slab */
+#define SLAB_NO_MERGE		((slab_flags_t __force)0x01000000U)
+
 /* Fault injection mark */
 #ifdef CONFIG_FAILSLAB
 # define SLAB_FAILSLAB		((slab_flags_t __force)0x02000000U)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 607249785c07..0e0a617eae7d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -47,7 +47,7 @@  static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
  */
 #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
 		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
-		SLAB_FAILSLAB | kasan_never_merge())
+		SLAB_FAILSLAB | SLAB_NO_MERGE | kasan_never_merge())
 
 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
 			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)