
[v2,31/33] mm/sl*b: Differentiate struct slab fields by sl*b implementations

Message ID 20211201181510.18784-32-vbabka@suse.cz (mailing list archive)
State New
Series: Separate struct slab from struct page

Commit Message

Vlastimil Babka Dec. 1, 2021, 6:15 p.m. UTC
With a struct slab definition separate from struct page, we can go further and
define only fields that the chosen sl*b implementation uses. This means
everything between __page_flags and __page_refcount placeholders now depends on
the chosen CONFIG_SL*B. Some fields exist in all implementations (slab_list)
but can be part of a union in some, so it's simpler to repeat them than
complicate the definition with ifdefs even more.

The patch doesn't change physical offsets of the fields, although it could be
done later - for example it's now clear that tighter packing in SLOB could be
possible.

This should also prevent accidental use of fields that don't exist in the given
implementation. Before this patch, virt_to_cache() and cache_from_obj() were
visible for SLOB (albeit not used), although they rely on the slab_cache field
that isn't set by SLOB. With this patch it's now a compile error, so these
functions are now hidden behind #ifndef CONFIG_SLOB.
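
For illustration (this snippet is not part of the patch and the helper name is
made up), an accidental use of the now-absent field under CONFIG_SLOB would
look like this and fail to build:

static inline struct kmem_cache *bogus_virt_to_cache(const void *obj)
{
	struct slab *slab = virt_to_slab(obj);

	/* error: 'struct slab' has no member named 'slab_cache' under SLOB */
	return slab->slab_cache;
}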

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Marco Elver <elver@google.com> # kfence
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <kasan-dev@googlegroups.com>
---
 mm/kfence/core.c |  9 +++++----
 mm/slab.h        | 46 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 41 insertions(+), 14 deletions(-)

Comments

Vlastimil Babka Dec. 10, 2021, 6:26 p.m. UTC | #1
On 12/10/21 17:37, Hyeonggon Yoo wrote:
> On Wed, Dec 01, 2021 at 07:15:08PM +0100, Vlastimil Babka wrote:
>> With a struct slab definition separate from struct page, we can go further and
>> define only fields that the chosen sl*b implementation uses. This means
>> everything between __page_flags and __page_refcount placeholders now depends on
>> the chosen CONFIG_SL*B.
> 
> When I first read this patch series, I thought struct slab was allocated
> separately from struct page.
> 
> But after reading it again, I see it uses the same allocated space as struct page.

Yes. Allocating it elsewhere is something that can be discussed later. It's
not a simple clear win - more memory used, more overhead, complicated code...

> So the code should take care of the fields that the page allocator cares
> about when freeing the page (->mapping, ->refcount, ->flags, ...).
> 
> And we can change the offset of fields between page->flags and page->refcount,
> if we take care of the value of page->mapping before freeing it.
> 
> Did I get it right?

Yeah. Also whatever aliases with compound_head must not have bit zero set as
that means a tail page.
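
For reference, this is roughly how bit 0 of compound_head is interpreted on
the page allocator side (a simplified sketch along the lines of
include/linux/page-flags.h; the helper names here are illustrative, not the
real API):

/* bit 0 set means "tail page"; the rest of the word points at the head page */
static inline int page_is_tail_sketch(const struct page *page)
{
	return READ_ONCE(page->compound_head) & 1;
}

static inline struct page *page_head_sketch(const struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	return (head & 1) ? (struct page *)(head - 1) : (struct page *)page;
}

So anything in struct slab that overlays compound_head (slab_list / rcu_head
here) must keep bit 0 clear, which pointer-aligned list pointers do.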

>> Some fields exist in all implementations (slab_list)
>> but can be part of a union in some, so it's simpler to repeat them than
>> complicate the definition with ifdefs even more.
> 
> Before this patch I always ran the preprocessor in my head.
> Now it's MUCH easier to understand than before!
> 
>> 
>> The patch doesn't change physical offsets of the fields, although it could be
>> done later - for example it's now clear that tighter packing in SLOB could be
>> possible.
>>
> 
> Is there a benefit if we pack SLOB's struct slab tighter?

I don't see any immediate benefit, except avoiding the page->mapping alias
as you suggested.

> ...
> 
>>  #ifdef CONFIG_MEMCG
>>  	unsigned long memcg_data;
>> @@ -47,7 +69,9 @@ struct slab {
>>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>>  SLAB_MATCH(flags, __page_flags);
>>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
>> +#ifndef CONFIG_SLOB
>>  SLAB_MATCH(rcu_head, rcu_head);
> 
> Because SLUB and SLAB set slab->slab_cache = NULL (to set page->mapping = NULL),

Hm, now that you mention it, maybe it would be better to do a
"folio->mapping = NULL" instead, as we now have a clearer view of where we
operate on struct slab, and where we transition between that and a plain
folio. This is IMHO part of preparing the folio for freeing, not a struct
slab cleanup, as struct slab doesn't need this cleanup.

> What about adding this?:
> 
> SLAB_MATCH(mapping, slab_cache);
> 
> There was SLAB_MATCH(slab_cache, slab_cache) but it was removed.

With the change suggested above, it wouldn't be needed as a safety check
anymore.

>> +#endif
>>  SLAB_MATCH(_refcount, __page_refcount);
>>  #ifdef CONFIG_MEMCG
>>  SLAB_MATCH(memcg_data, memcg_data);
> 
> I couldn't find any functional problem with this patch,
> but it seems there are some style issues.
> 
> Below is what checkpatch.pl complains about.
> It would be better to fix them!

Not all checkpatch suggestions are correct and have to be followed, but I'll
check what I missed. Thanks.

> WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
> #7: 
> With a struct slab definition separate from struct page, we can go further and
> 
> WARNING: Possible repeated word: 'and'
> #19: 
> implementation. Before this patch virt_to_cache() and and cache_from_obj() was
> 
> WARNING: space prohibited between function name and open parenthesis '('
> #49: FILE: mm/kfence/core.c:432:
> +#elif defined (CONFIG_SLAB)
> 
> ERROR: "foo * bar" should be "foo *bar"
> #73: FILE: mm/slab.h:20:
> +void * s_mem;/* first object */
> 
> ERROR: "foo * bar" should be "foo *bar"
> #111: FILE: mm/slab.h:53:
> +void * __unused_1;
> 
> ERROR: "foo * bar" should be "foo *bar"
> #113: FILE: mm/slab.h:55:
> +void * __unused_2;
> 
> ---
> Thanks,
> Hyeonggon.
Matthew Wilcox Dec. 11, 2021, 4:23 p.m. UTC | #2
On Fri, Dec 10, 2021 at 07:26:11PM +0100, Vlastimil Babka wrote:
> > Because SLUB and SLAB set slab->slab_cache = NULL (to set page->mapping = NULL),
> 
> Hm, now that you mention it, maybe it would be better to do a
> "folio->mapping = NULL" instead, as we now have a clearer view of where we
> operate on struct slab, and where we transition between that and a plain
> folio. This is IMHO part of preparing the folio for freeing, not a struct
> slab cleanup, as struct slab doesn't need this cleanup.

Yes, I did that as part of "mm/slub: Convert slab freeing to struct slab"
in my original series:

-       __ClearPageSlabPfmemalloc(page);
+       __slab_clear_pfmemalloc(slab);
        __ClearPageSlab(page);
-       /* In union with page->mapping where page allocator expects NULL */
-       page->slab_cache = NULL;
+       page->mapping = NULL;
Matthew Wilcox Dec. 11, 2021, 4:52 p.m. UTC | #3
On Sat, Dec 11, 2021 at 11:55:27AM +0000, Hyeonggon Yoo wrote:
> On Fri, Dec 10, 2021 at 07:26:11PM +0100, Vlastimil Babka wrote:
> > On 12/10/21 17:37, Hyeonggon Yoo wrote:
> > > On Wed, Dec 01, 2021 at 07:15:08PM +0100, Vlastimil Babka wrote:
> > >> With a struct slab definition separate from struct page, we can go further and
> > >> define only fields that the chosen sl*b implementation uses. This means
> > >> everything between __page_flags and __page_refcount placeholders now depends on
> > >> the chosen CONFIG_SL*B.
> > > 
> > > When I first read this patch series, I thought struct slab was allocated
> > > separately from struct page.
> > > 
> > > But after reading it again, I see it uses the same allocated space as struct page.
> > 
> > Yes. Allocating it elsewhere is something that can be discussed later. It's
> > not a simple clear win - more memory used, more overhead, complicated code...
> >
> 
> Right. That is something that can be discussed,
> but I don't think there will be much of a win.

Oh no, there's a substantial win.  If we can reduce struct page to a
single pointer, that shrinks it from 64 bytes/4k to 8 bytes/4k.  Set
against that, you have to allocate the struct folio / struct slab / ...
but then it's one _per allocation_ rather than one per page.  So for
an order-2 allocation, it takes 32 bytes + 64 bytes (= 96 bytes)
rather than 4*64 = 256 bytes.  It's an even bigger win for larger
allocations, and it lets us grow the memory descriptors independently
of each other.
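
A purely illustrative back-of-the-envelope check of those numbers (the 64-byte
and 8-byte figures are just the ones quoted above, nothing here is derived
from a real header):

#include <stdio.h>

int main(void)
{
	unsigned int order = 2;
	unsigned int pages = 1u << order;	/* order-2 => 4 pages */
	unsigned int now = pages * 64;		/* one struct page per page */
	unsigned int then = pages * 8 + 64;	/* one pointer per page plus one
						 * separately allocated descriptor */

	printf("order-%u: %u bytes now vs %u bytes with descriptors\n",
	       order, now, then);		/* prints 256 vs 96 */
	return 0;
}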

But it's also a substantial amount of work, so don't expect us to get
there any time soon.  Everything currently using struct page needs to
be converted to use another type, and that's just the pre-requisite
step.

Some more thoughts on this here:
https://lore.kernel.org/linux-mm/YXcLqcFhDq3uUwIj@casper.infradead.org/

> > Yeah. Also whatever aliases with compound_head must not have bit zero set as
> > that means a tail page.
> > 
> 
> Oh I was missing that. Thank you.
> 
> Hmm, then in struct slab, page->compound_head and slab->slab_list (or
> slab->rcu_head) have the same offset. And list_head / rcu_head both store pointers.
> 
> Then there is an alignment requirement (the address saved in list_head/rcu_head
> should be a multiple of 2).
> 
> Anyway, that was required long before this patch,
> so it is not a problem introduced by this patch.

Yes, that's why there's an assert that the list_heads all line up.  This
requirement will go away if we do get separately allocated memory
descriptors (because that bottom bit is no longer PageTail).
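
For context, the assert referred to here is the SLAB_MATCH() compile-time
check in mm/slab.h (its body is visible as a context line in the hunk below).
Spelled out, it boils down to:

#define SLAB_MATCH(pg, sl)						\
	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))

/*
 * slab_list must overlay compound_head exactly, so the list pointers stored
 * there (at least word-aligned, i.e. bit 0 clear) never look like a tail page.
 */
SLAB_MATCH(compound_head, slab_list);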

Patch

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 4eb60cf5ff8b..46103a7628a6 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -427,10 +427,11 @@  static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g
 	/* Set required slab fields. */
 	slab = virt_to_slab((void *)meta->addr);
 	slab->slab_cache = cache;
-	if (IS_ENABLED(CONFIG_SLUB))
-		slab->objects = 1;
-	if (IS_ENABLED(CONFIG_SLAB))
-		slab->s_mem = addr;
+#if defined(CONFIG_SLUB)
+	slab->objects = 1;
+#elif defined (CONFIG_SLAB)
+	slab->s_mem = addr;
+#endif
 
 	/* Memory initialization. */
 	for_each_canary(meta, set_canary_byte);
diff --git a/mm/slab.h b/mm/slab.h
index 2d50c099a222..8c5a8c005896 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -8,9 +8,24 @@ 
 /* Reuses the bits in struct page */
 struct slab {
 	unsigned long __page_flags;
+
+#if defined(CONFIG_SLAB)
+
+	union {
+		struct list_head slab_list;
+		struct rcu_head rcu_head;
+	};
+	struct kmem_cache *slab_cache;
+	void *freelist;	/* array of free object indexes */
+	void * s_mem;	/* first object */
+	unsigned int active;
+
+#elif defined(CONFIG_SLUB)
+
 	union {
 		struct list_head slab_list;
-		struct {	/* Partial pages */
+		struct rcu_head rcu_head;
+		struct {
 			struct slab *next;
 #ifdef CONFIG_64BIT
 			int slabs;	/* Nr of slabs left */
@@ -18,25 +33,32 @@  struct slab {
 			short int slabs;
 #endif
 		};
-		struct rcu_head rcu_head;
 	};
-	struct kmem_cache *slab_cache; /* not slob */
+	struct kmem_cache *slab_cache;
 	/* Double-word boundary */
 	void *freelist;		/* first free object */
 	union {
-		void *s_mem;	/* slab: first object */
-		unsigned long counters;		/* SLUB */
-		struct {			/* SLUB */
+		unsigned long counters;
+		struct {
 			unsigned inuse:16;
 			unsigned objects:15;
 			unsigned frozen:1;
 		};
 	};
+	unsigned int __unused;
+
+#elif defined(CONFIG_SLOB)
+
+	struct list_head slab_list;
+	void * __unused_1;
+	void *freelist;		/* first free block */
+	void * __unused_2;
+	int units;
+
+#else
+#error "Unexpected slab allocator configured"
+#endif
 
-	union {
-		unsigned int active;		/* SLAB */
-		int units;			/* SLOB */
-	};
 	atomic_t __page_refcount;
 #ifdef CONFIG_MEMCG
 	unsigned long memcg_data;
@@ -47,7 +69,9 @@  struct slab {
 	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
 SLAB_MATCH(flags, __page_flags);
 SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
+#ifndef CONFIG_SLOB
 SLAB_MATCH(rcu_head, rcu_head);
+#endif
 SLAB_MATCH(_refcount, __page_refcount);
 #ifdef CONFIG_MEMCG
 SLAB_MATCH(memcg_data, memcg_data);
@@ -623,6 +647,7 @@  static inline void memcg_slab_free_hook(struct kmem_cache *s,
 }
 #endif /* CONFIG_MEMCG_KMEM */
 
+#ifndef CONFIG_SLOB
 static inline struct kmem_cache *virt_to_cache(const void *obj)
 {
 	struct slab *slab;
@@ -669,6 +694,7 @@  static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
 		print_tracking(cachep, x);
 	return cachep;
 }
+#endif /* CONFIG_SLOB */
 
 static inline size_t slab_ksize(const struct kmem_cache *s)
 {