From patchwork Mon Aug 26 11:16:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11114383 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E241716B1 for ; Mon, 26 Aug 2019 11:16:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B8561206B7 for ; Mon, 26 Aug 2019 11:16:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B8561206B7 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9A31D6B0567; Mon, 26 Aug 2019 07:16:44 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 97A826B0569; Mon, 26 Aug 2019 07:16:44 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 88FBA6B056A; Mon, 26 Aug 2019 07:16:44 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0069.hostedemail.com [216.40.44.69]) by kanga.kvack.org (Postfix) with ESMTP id 5FB226B0567 for ; Mon, 26 Aug 2019 07:16:44 -0400 (EDT) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id 01F2C181AC9B4 for ; Mon, 26 Aug 2019 11:16:44 +0000 (UTC) X-FDA: 75864326328.06.front88_1d9eb5a38ec5b X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:akpm@linux-foundation.org::linux-kernel@vger.kernel.org:cl@linux.com:penberg@kernel.org:rientjes@google.com:ming.lei@redhat.com:david@fromorbit.com:willy@infradead.org:darrick.wong@oracle.com:hch@lst.de:linux-xfs@vger.kernel.org:linux-fsdevel@vger.kernel.org:linux-block@vger.kernel.org:james.bottomley@hansenpartnership.com:linux-btrfs@vger.kernel.org:vbabka@suse.cz,RULES_HIT:30001:30054:30055:30070,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: front88_1d9eb5a38ec5b X-Filterd-Recvd-Size: 5779 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf05.hostedemail.com (Postfix) with ESMTP for ; Mon, 26 Aug 2019 11:16:43 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id DF8EFAC28; Mon, 26 Aug 2019 11:16:41 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton , linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Christoph Lameter , Pekka Enberg , David Rientjes , Ming Lei , Dave Chinner , Matthew Wilcox , "Darrick J . Wong" , Christoph Hellwig , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, James Bottomley , linux-btrfs@vger.kernel.org, Vlastimil Babka Subject: [PATCH v2 1/2] mm, sl[ou]b: improve memory accounting Date: Mon, 26 Aug 2019 13:16:26 +0200 Message-Id: <20190826111627.7505-2-vbabka@suse.cz> X-Mailer: git-send-email 2.22.1 In-Reply-To: <20190826111627.7505-1-vbabka@suse.cz> References: <20190826111627.7505-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: SLOB currently doesn't account its pages at all, so in /proc/meminfo the Slab field shows zero. Modifying a counter on page allocation and freeing should be acceptable even for the small system scenarios SLOB is intended for. Since reclaimable caches are not separated in SLOB, account everything as unreclaimable. SLUB currently doesn't account kmalloc() and kmalloc_node() allocations larger than order-1 page, that are passed directly to the page allocator. As they also don't appear in /proc/slabinfo, it might look like a memory leak. For consistency, account them as well. (SLAB doesn't actually use page allocator directly, so no change there). Ideally SLOB and SLUB would be handled in separate patches, but due to the shared kmalloc_order() function and different kfree() implementations, it's easier to patch both at once to prevent inconsistencies. Signed-off-by: Vlastimil Babka --- mm/slab_common.c | 8 ++++++-- mm/slob.c | 20 ++++++++++++++++---- mm/slub.c | 14 +++++++++++--- 3 files changed, 33 insertions(+), 9 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 807490fe217a..929c02a90fba 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1250,12 +1250,16 @@ void __init create_kmalloc_caches(slab_flags_t flags) */ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) { - void *ret; + void *ret = NULL; struct page *page; flags |= __GFP_COMP; page = alloc_pages(flags, order); - ret = page ? page_address(page) : NULL; + if (likely(page)) { + ret = page_address(page); + mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE, + 1 << order); + } ret = kasan_kmalloc_large(ret, size, flags); /* As ret might get tagged, call kmemleak hook after KASAN. */ kmemleak_alloc(ret, size, 1, flags); diff --git a/mm/slob.c b/mm/slob.c index 7f421d0ca9ab..3dcde9cf2b17 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -190,7 +190,7 @@ static int slob_last(slob_t *s) static void *slob_new_pages(gfp_t gfp, int order, int node) { - void *page; + struct page *page; #ifdef CONFIG_NUMA if (node != NUMA_NO_NODE) @@ -202,14 +202,21 @@ static void *slob_new_pages(gfp_t gfp, int order, int node) if (!page) return NULL; + mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE, + 1 << order); return page_address(page); } static void slob_free_pages(void *b, int order) { + struct page *sp = virt_to_page(b); + if (current->reclaim_state) current->reclaim_state->reclaimed_slab += 1 << order; - free_pages((unsigned long)b, order); + + mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE, + -(1 << order)); + __free_pages(sp, order); } /* @@ -521,8 +528,13 @@ void kfree(const void *block) int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN); unsigned int *m = (unsigned int *)(block - align); slob_free(m, *m + align); - } else - __free_pages(sp, compound_order(sp)); + } else { + unsigned int order = compound_order(sp); + mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE, + -(1 << order)); + __free_pages(sp, order); + + } } EXPORT_SYMBOL(kfree); diff --git a/mm/slub.c b/mm/slub.c index 8834563cdb4b..74365d083a1e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3819,11 +3819,15 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) { struct page *page; void *ptr = NULL; + unsigned int order = get_order(size); flags |= __GFP_COMP; - page = alloc_pages_node(node, flags, get_order(size)); - if (page) + page = alloc_pages_node(node, flags, order); + if (page) { ptr = page_address(page); + mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE, + 1 << order); + } return kmalloc_large_node_hook(ptr, size, flags); } @@ -3949,9 +3953,13 @@ void kfree(const void *x) page = virt_to_head_page(x); if (unlikely(!PageSlab(page))) { + unsigned int order = compound_order(page); + BUG_ON(!PageCompound(page)); kfree_hook(object); - __free_pages(page, compound_order(page)); + mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE, + -(1 << order)); + __free_pages(page, order); return; } slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_); From patchwork Mon Aug 26 11:16:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11114387 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E9AE1399 for ; Mon, 26 Aug 2019 11:16:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 383F2206B7 for ; Mon, 26 Aug 2019 11:16:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 383F2206B7 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A3F796B0569; Mon, 26 Aug 2019 07:16:45 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9CBE26B056B; Mon, 26 Aug 2019 07:16:45 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 757966B056C; Mon, 26 Aug 2019 07:16:45 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0244.hostedemail.com [216.40.44.244]) by kanga.kvack.org (Postfix) with ESMTP id 469416B0569 for ; Mon, 26 Aug 2019 07:16:45 -0400 (EDT) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with SMTP id E2C68180AD7C1 for ; Mon, 26 Aug 2019 11:16:44 +0000 (UTC) X-FDA: 75864326328.06.toes92_1dbd2cd5ec25c X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:akpm@linux-foundation.org::linux-kernel@vger.kernel.org:cl@linux.com:penberg@kernel.org:rientjes@google.com:ming.lei@redhat.com:david@fromorbit.com:willy@infradead.org:darrick.wong@oracle.com:hch@lst.de:linux-xfs@vger.kernel.org:linux-fsdevel@vger.kernel.org:linux-block@vger.kernel.org:james.bottomley@hansenpartnership.com:linux-btrfs@vger.kernel.org:vbabka@suse.cz,RULES_HIT:30003:30012:30029:30054:30056:30062:30070:30090,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:1:0,LFtime:29,LUA_SUMMARY:none X-HE-Tag: toes92_1dbd2cd5ec25c X-Filterd-Recvd-Size: 11174 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf20.hostedemail.com (Postfix) with ESMTP for ; Mon, 26 Aug 2019 11:16:44 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id A3E48AC64; Mon, 26 Aug 2019 11:16:42 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton , linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Christoph Lameter , Pekka Enberg , David Rientjes , Ming Lei , Dave Chinner , Matthew Wilcox , "Darrick J . Wong" , Christoph Hellwig , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, James Bottomley , linux-btrfs@vger.kernel.org, Vlastimil Babka Subject: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Date: Mon, 26 Aug 2019 13:16:27 +0200 Message-Id: <20190826111627.7505-3-vbabka@suse.cz> X-Mailer: git-send-email 2.22.1 In-Reply-To: <20190826111627.7505-1-vbabka@suse.cz> References: <20190826111627.7505-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In most configurations, kmalloc() happens to return naturally aligned (i.e. aligned to the block size itself) blocks for power of two sizes. That means some kmalloc() users might unknowingly rely on that alignment, until stuff breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then developers have to devise workaround such as own kmem caches with specified alignment [1], which is not always practical, as recently evidenced in [2]. The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()' variant would not help with code unknowingly relying on the implicit alignment. For slab implementations it would either require creating more kmalloc caches, or allocate a larger size and only give back part of it. That would be wasteful, especially with a generic alignment parameter (in contrast with a fixed alignment to size). Ideally we should provide to mm users what they need without difficult workarounds or own reimplementations, so let's make the kmalloc() alignment to size explicitly guaranteed for power-of-two sizes under all configurations. What this means for the three available allocators? * SLAB object layout happens to be mostly unchanged by the patch. The implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for caches with alignment larger than unsigned long long. Practically on at least x86 this includes kmalloc caches as they use cache line alignment, which is larger than that. Still, this patch ensures alignment on all arches and cache sizes. * SLUB layout is also unchanged unless redzoning is enabled through CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With this patch, explicit alignment is guaranteed with redzoning as well. This will result in more memory being wasted, but that should be acceptable in a debugging scenario. * SLOB has no implicit alignment so this patch adds it explicitly for kmalloc(). The potential downside is increased fragmentation. While pathological allocation scenarios are certainly possible, in my testing, after booting a x86_64 kernel+userspace with virtme, around 16MB memory was consumed by slab pages both before and after the patch, with difference in the noise. [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/ [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/ [3] https://lwn.net/Articles/787740/ Signed-off-by: Vlastimil Babka Reviewed-by: Matthew Wilcox (Oracle) Acked-by: Michal Hocko Acked-by: Kirill A. Shutemov --- Documentation/core-api/memory-allocation.rst | 4 ++ include/linux/slab.h | 4 ++ mm/slab_common.c | 11 ++++- mm/slob.c | 42 +++++++++++++++----- 4 files changed, 49 insertions(+), 12 deletions(-) diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst index 7744aa3bf2e0..27c54854b508 100644 --- a/Documentation/core-api/memory-allocation.rst +++ b/Documentation/core-api/memory-allocation.rst @@ -98,6 +98,10 @@ limited. The actual limit depends on the hardware and the kernel configuration, but it is a good practice to use `kmalloc` for objects smaller than page size. +The address of a chunk allocated with `kmalloc` is aligned to at least +ARCH_KMALLOC_MINALIGN bytes. For sizes of power of two bytes, the +alignment is also guaranteed to be at least to the respective size. + For large allocations you can use :c:func:`vmalloc` and :c:func:`vzalloc`, or directly request pages from the page allocator. The memory allocated by `vmalloc` and related functions is diff --git a/include/linux/slab.h b/include/linux/slab.h index 56c9c7eed34e..0d4c26395785 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -493,6 +493,10 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags) * kmalloc is the normal method of allocating memory * for objects smaller than page size in the kernel. * + * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN + * bytes. For @size of power of two bytes, the alignment is also guaranteed + * to be at least to the size. + * * The @flags argument may be one of the GFP flags defined at * include/linux/gfp.h and described at * :ref:`Documentation/core-api/mm-api.rst ` diff --git a/mm/slab_common.c b/mm/slab_common.c index 929c02a90fba..b9ba93ad5c7f 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -993,10 +993,19 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name, unsigned int useroffset, unsigned int usersize) { int err; + unsigned int align = ARCH_KMALLOC_MINALIGN; s->name = name; s->size = s->object_size = size; - s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size); + + /* + * For power of two sizes, guarantee natural alignment for kmalloc + * caches, regardless of SL*B debugging options. + */ + if (is_power_of_2(size)) + align = max(align, size); + s->align = calculate_alignment(flags, align, size); + s->useroffset = useroffset; s->usersize = usersize; diff --git a/mm/slob.c b/mm/slob.c index 3dcde9cf2b17..07a39047aa54 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -224,6 +224,7 @@ static void slob_free_pages(void *b, int order) * @sp: Page to look in. * @size: Size of the allocation. * @align: Allocation alignment. + * @align_offset: Offset in the allocated block that will be aligned. * @page_removed_from_list: Return parameter. * * Tries to find a chunk of memory at least @size bytes big within @page. @@ -234,7 +235,7 @@ static void slob_free_pages(void *b, int order) * true (set to false otherwise). */ static void *slob_page_alloc(struct page *sp, size_t size, int align, - bool *page_removed_from_list) + int align_offset, bool *page_removed_from_list) { slob_t *prev, *cur, *aligned = NULL; int delta = 0, units = SLOB_UNITS(size); @@ -243,8 +244,17 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align, for (prev = NULL, cur = sp->freelist; ; prev = cur, cur = slob_next(cur)) { slobidx_t avail = slob_units(cur); + /* + * 'aligned' will hold the address of the slob block so that the + * address 'aligned'+'align_offset' is aligned according to the + * 'align' parameter. This is for kmalloc() which prepends the + * allocated block with its size, so that the block itself is + * aligned when needed. + */ if (align) { - aligned = (slob_t *)ALIGN((unsigned long)cur, align); + aligned = (slob_t *) + (ALIGN((unsigned long)cur + align_offset, align) + - align_offset); delta = aligned - cur; } if (avail >= units + delta) { /* room enough? */ @@ -288,7 +298,8 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align, /* * slob_alloc: entry point into the slob allocator. */ -static void *slob_alloc(size_t size, gfp_t gfp, int align, int node) +static void *slob_alloc(size_t size, gfp_t gfp, int align, int node, + int align_offset) { struct page *sp; struct list_head *slob_list; @@ -319,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node) if (sp->units < SLOB_UNITS(size)) continue; - b = slob_page_alloc(sp, size, align, &page_removed_from_list); + b = slob_page_alloc(sp, size, align, align_offset, &page_removed_from_list); if (!b) continue; @@ -356,7 +367,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node) INIT_LIST_HEAD(&sp->slab_list); set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE)); set_slob_page_free(sp, slob_list); - b = slob_page_alloc(sp, size, align, &_unused); + b = slob_page_alloc(sp, size, align, align_offset, &_unused); BUG_ON(!b); spin_unlock_irqrestore(&slob_lock, flags); } @@ -458,7 +469,7 @@ static __always_inline void * __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller) { unsigned int *m; - int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN); + int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN); void *ret; gfp &= gfp_allowed_mask; @@ -466,19 +477,28 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller) fs_reclaim_acquire(gfp); fs_reclaim_release(gfp); - if (size < PAGE_SIZE - align) { + if (size < PAGE_SIZE - minalign) { + int align = minalign; + + /* + * For power of two sizes, guarantee natural alignment for + * kmalloc()'d objects. + */ + if (is_power_of_2(size)) + align = max(minalign, (int) size); + if (!size) return ZERO_SIZE_PTR; - m = slob_alloc(size + align, gfp, align, node); + m = slob_alloc(size + minalign, gfp, align, node, minalign); if (!m) return NULL; *m = size; - ret = (void *)m + align; + ret = (void *)m + minalign; trace_kmalloc_node(caller, ret, - size, size + align, gfp, node); + size, size + minalign, gfp, node); } else { unsigned int order = get_order(size); @@ -579,7 +599,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node) fs_reclaim_release(flags); if (c->size < PAGE_SIZE) { - b = slob_alloc(c->size, flags, c->align, node); + b = slob_alloc(c->size, flags, c->align, node, 0); trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size, SLOB_UNITS(c->size) * SLOB_UNIT, flags, node);