diff mbox

arm64: Increase the max granular size

Message ID 20151104123640.GK7637@e104818-lin.cambridge.arm.com (mailing list archive)
State Not Applicable
Delegated to: Simon Horman
Headers show

Commit Message

Catalin Marinas Nov. 4, 2015, 12:36 p.m. UTC
(+ linux-mm)

On Tue, Nov 03, 2015 at 05:33:25PM -0600, Christoph Lameter wrote:
> On Tue, 3 Nov 2015, Catalin Marinas wrote:
> > (cc'ing Jonsoo and Christoph; summary: slab failure with L1_CACHE_BYTES
> > of 128 and sizeof(kmem_cache_node) of 152)
> 
> Hmmm... Yes that would mean use the 196 sized kmalloc array which is not a
> power of two slab. But the code looks fine to me.

I'm not entirely sure that gets used (or even created).
kmalloc_index(152) returns 8 (INDEX_NODE==8) since KMALLOC_MIN_SIZE==128
and the "kmalloc-node" cache size is 256.

> > If I revert commit 8fc9cf420b36 ("slab: make more slab management
> > structure off the slab") it works but I still need to figure out how
> > slab indices are calculated. The size_index[] array is overridden so
> > that 0..15 are 7 and 16..23 are 8. But the kmalloc_caches[7] has never
> > been populated, hence the BUG_ON. Another option may be to change
> > kmalloc_size and kmalloc_index to cope with KMALLOC_MIN_SIZE of 128.
> >
> > I'll do some more investigation tomorrow.
> 
> The commit allows off slab management for PAGE_SIZE >> 5 that is 128.

This means that the first kmalloc cache to be created, "kmalloc-128", is
off slab.

> After that commit kmem_cache_create would try to allocate an off slab
> management structure which is not available during early boot.
> But the slab_early_init is set which should prevent the use of an off slab
> management infrastructure in kmem_cache_create().
> 
> However, the failure in line 2283 shows that the OFF SLAB flag was
> mistakenly set anyways!!!! Something must havee cleared slab_early_init?

slab_early_init is cleared after "kmem_cache" and "kmalloc-node" caches
are successfully created. Following this, the minimum kmalloc allocation
goes off-slab when KMALLOC_MIN_SIZE == 128.

When trying to create "kmalloc-128" (via create_kmalloc_caches(),
slab_early_init is already 0), __kmem_cache_create() requires an
allocation of 32 bytes (freelist_size) which has index 7, hence exactly
the kmalloc_caches[7] we are trying to create.

The simplest option would be to make sure that off slab isn't allowed
for caches of KMALLOC_MIN_SIZE or smaller, with the drawback that not
only "kmalloc-128" but any other such caches will be on slab.

I think a better option would be to first check that there is a
kmalloc_caches[] entry for freelist_size before deciding to go off-slab.
See below:

-----8<------------------------------

From ce27c5c6d156522ceaff20de8a7af281cf079b6f Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 4 Nov 2015 12:19:00 +0000
Subject: [PATCH] mm: slab: Avoid BUG when KMALLOC_MIN_SIZE == (PAGE_SIZE >> 5)

The slab allocator, following commit 8fc9cf420b36 ("slab: make more slab
management structure off the slab"), tries to place slab management
off-slab when the object size is PAGE_SIZE >> 5 or larger. On arm64 with
KMALLOC_MIN_SIZE = L1_CACHE_BYTES = 128, "kmalloc-128" is the smallest
cache to be created after slab_early_init = 0. The current
__kmem_cache_create() implementation aims to place the management
structure off-slab. However, the kmalloc_slab(freelist_size) has not
been populated yet, triggering a bug on !cachep->freelist_cache.

This patch addresses the problem by keeping the management structure
on-slab if the corresponding kmalloc_caches[] is not populated yet.

Fixes: 8fc9cf420b36 ("slab: make more slab management structure off the slab")
Cc: <stable@vger.kernel.org> # 3.15+
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 mm/slab.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christoph Lameter (Ampere) Nov. 4, 2015, 1:53 p.m. UTC | #1
On Wed, 4 Nov 2015, Catalin Marinas wrote:

> The simplest option would be to make sure that off slab isn't allowed
> for caches of KMALLOC_MIN_SIZE or smaller, with the drawback that not
> only "kmalloc-128" but any other such caches will be on slab.

The reason for an off slab configuration is denser object packing.

> I think a better option would be to first check that there is a
> kmalloc_caches[] entry for freelist_size before deciding to go off-slab.

Hmmm.. Yes seems to be an option.

Maybe we simply revert commit 8fc9cf420b36 instead? That does not seem to
make too much sense to me and the goal of the commit cannot be
accomplished on ARM. Your patch essentially reverts the effect anyways.

Smaller slabs really do not need off slab management anyways since they
will only loose a few objects per slab page.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Nov. 4, 2015, 2:54 p.m. UTC | #2
On Wed, Nov 04, 2015 at 07:53:50AM -0600, Christoph Lameter wrote:
> On Wed, 4 Nov 2015, Catalin Marinas wrote:
> 
> > The simplest option would be to make sure that off slab isn't allowed
> > for caches of KMALLOC_MIN_SIZE or smaller, with the drawback that not
> > only "kmalloc-128" but any other such caches will be on slab.
> 
> The reason for an off slab configuration is denser object packing.
> 
> > I think a better option would be to first check that there is a
> > kmalloc_caches[] entry for freelist_size before deciding to go off-slab.
> 
> Hmmm.. Yes seems to be an option.
> 
> Maybe we simply revert commit 8fc9cf420b36 instead?

I'm fine with this. Also note that the arm64 commit changing
L1_CACHE_BYTES to 128 hasn't been pushed yet (it's queued for 4.4).

> That does not seem to make too much sense to me and the goal of the
> commit cannot be accomplished on ARM. Your patch essentially reverts
> the effect anyways.

In theory it only reverts the effect for the first kmalloc_cache
("kmalloc-128" in the arm64 case). Any other bigger cache which would
not be mergeable with an existing one still has the potential of
off-slab management.

> Smaller slabs really do not need off slab management anyways since they
> will only loose a few objects per slab page.

IIUC, starting with 128 slab size for a 4KB page, you have 32 objects
per page. The freelist takes 32 bytes (or 31), therefore you waste a
single slab object. However, only 1/4 of it is used for freelist and the
waste gets bigger with 256 slab size, hence the original commit.

BTW, assuming L1_CACHE_BYTES is 512 (I don't ever see this happening but
just in theory), we potentially have the same issue. What would save us
is that INDEX_NODE would match the first "kmalloc-512" cache, so we have
it pre-populated.
Christoph Lameter (Ampere) Nov. 4, 2015, 3:28 p.m. UTC | #3
On Wed, 4 Nov 2015, Catalin Marinas wrote:

> BTW, assuming L1_CACHE_BYTES is 512 (I don't ever see this happening but
> just in theory), we potentially have the same issue. What would save us
> is that INDEX_NODE would match the first "kmalloc-512" cache, so we have
> it pre-populated.

Ok maybe add some BUILD_BUG_ONs to ensure that builds fail until we have
addressed that.

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/mm/slab.c b/mm/slab.c
index 4fcc5dd8d5a6..d4a21736eb5d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2246,16 +2246,33 @@  __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
-		freelist_size = calculate_freelist_size(cachep->num, 0);
+		size_t off_freelist_size = calculate_freelist_size(cachep->num, 0);
+
+		cachep->freelist_cache = kmalloc_slab(off_freelist_size, 0u);
+		if (ZERO_OR_NULL_PTR(cachep->freelist_cache)) {
+			/*
+			 * We don't have kmalloc_caches[] populated for
+			 * off_freelist_size yet. This can happen during
+			 * create_kmalloc_caches() when KMALLOC_MIN_SIZE >=
+			 * (PAGE_SHIFT >> 5) and CFLGS_OFF_SLAB is set. Move
+			 * the cache on-slab.
+			 */
+			flags &= ~CFLGS_OFF_SLAB;
+			left_over = calculate_slab_order(cachep, size, cachep->align, flags);
+		} else {
+			freelist_size = off_freelist_size;
 
 #ifdef CONFIG_PAGE_POISONING
-		/* If we're going to use the generic kernel_map_pages()
-		 * poisoning, then it's going to smash the contents of
-		 * the redzone and userword anyhow, so switch them off.
-		 */
-		if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
-			flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
+			/*
+			 * If we're going to use the generic kernel_map_pages()
+			 * poisoning, then it's going to smash the contents of
+			 * the redzone and userword anyhow, so switch them off.
+			 */
+			if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
+				flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
 #endif
+		}
+
 	}
 
 	cachep->colour_off = cache_line_size();
@@ -2271,18 +2288,6 @@  __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	cachep->size = size;
 	cachep->reciprocal_buffer_size = reciprocal_value(size);
 
-	if (flags & CFLGS_OFF_SLAB) {
-		cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
-		/*
-		 * This is a possibility for one of the kmalloc_{dma,}_caches.
-		 * But since we go off slab only for object size greater than
-		 * PAGE_SIZE/8, and kmalloc_{dma,}_caches get created
-		 * in ascending order,this should not happen at all.
-		 * But leave a BUG_ON for some lucky dude.
-		 */
-		BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
-	}
-
 	err = setup_cpu_cache(cachep, gfp);
 	if (err) {
 		__kmem_cache_shutdown(cachep);