From patchwork Tue Sep 13 06:54:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Tang X-Patchwork-Id: 12974475 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 732E9C6FA82 for ; Tue, 13 Sep 2022 06:54:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 09C118D0001; Tue, 13 Sep 2022 02:54:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 04A936B0074; Tue, 13 Sep 2022 02:54:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DDF478D0001; Tue, 13 Sep 2022 02:54:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id CF4586B0073 for ; Tue, 13 Sep 2022 02:54:52 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 9F7C280335 for ; Tue, 13 Sep 2022 06:54:52 +0000 (UTC) X-FDA: 79906149624.12.CD59797 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf11.hostedemail.com (Postfix) with ESMTP id E630C40088 for ; Tue, 13 Sep 2022 06:54:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663052092; x=1694588092; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/rWF7oJWaQ0MvKDkckIq2WnWpbUmXfHiQf4LPdFStXU=; b=YekPzm/79sLJIGeqw05gDQaIrVEAbfjCz5BmjHDRDwOn0FL9n6RdmXhS QQyXpbCxm8YFQgwKemlD5yQ7/Xgt+z6Z5CKwkbmQzlIMgV6fP0O3Gp5cA jtQbC7jijs3eSEp0giP7vE0D4uf87egD3KnofzODxS6lmKZmSFKY2vj1b geI1VshuwXVGpzqrt5zSUnnVD4gXbQCXtdvh3+2/Mq+vKfTQlS8KT39bR xs+Bb/EamB2szzqVW+xGhaUlzN6XVdcmUEWo4DsXKB8HBquDa/XTCpjLx 73RbO1gUd9RBhfZvurCK14HuQb4G5b4QeeicRVLOgW2hx0CxRqKAF4gfv g==; X-IronPort-AV: E=McAfee;i="6500,9779,10468"; a="285079372" X-IronPort-AV: E=Sophos;i="5.93,312,1654585200"; d="scan'208";a="285079372" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Sep 2022 23:54:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,312,1654585200"; d="scan'208";a="861440711" Received: from feng-clx.sh.intel.com ([10.238.200.228]) by fmsmga006.fm.intel.com with ESMTP; 12 Sep 2022 23:54:46 -0700 From: Feng Tang To: Andrew Morton , Vlastimil Babka , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Dmitry Vyukov , Jonathan Corbet , Andrey Konovalov Cc: Dave Hansen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com, Feng Tang , Robin Murphy , John Garry , Kefeng Wang Subject: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc Date: Tue, 13 Sep 2022 14:54:20 +0800 Message-Id: <20220913065423.520159-2-feng.tang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220913065423.520159-1-feng.tang@intel.com> References: <20220913065423.520159-1-feng.tang@intel.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1663052092; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=RcXsgfFXEF3riXkzvKo6+RnVpYllkUNgb1bKfGwzJCk=; b=FTw2hHekZLywIJExivX+wbZjflEYZfTBchovsxp8RAsFrutRq3QO4ROQ98d2ehzXqr1IYg LGWoDsPd0uasgNyzCyPEyAbQVDPAcCuTX5cqy/5EQa3BsvjfyzalsuP3mhcSKvmqyRO/H3 kM+Sw/Dm2H5i3TnfrWjkN+bZo2aZAc0= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b="YekPzm/7"; spf=pass (imf11.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1663052092; a=rsa-sha256; cv=none; b=PvuQNZX5KKBqPGnceGxd/HNC8DpRgsU8AcoUBumOF4D0xIWr8q5WdkHdb6KB8pZiOQnWBa Vpy6gTz7/G2VVgbSTNh3/zA2mF7hPDvAn+mw8OVvupuFf24aconn8QPJG5Thd8LupoDJ8u 2KmpM8ygbS7HdJWq1pg3DMUUcOBYYak= X-Rspamd-Queue-Id: E630C40088 X-Rspam-User: Authentication-Results: imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b="YekPzm/7"; spf=pass (imf11.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam02 X-Stat-Signature: 4acuqxk5goesmi94jqy1heiso7qzn7fg X-HE-Tag: 1663052091-228432 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: kmalloc's API family is critical for mm, with one nature that it will round up the request size to a fixed one (mostly power of 2). Say when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so in worst case, there is around 50% memory space waste. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debug we found there are huge number of 'struct iova_magazine', whose size is 1032 bytes (1024 + 8), so each allocation will waste 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related with the similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add an way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show its call stack of original allocation, so that user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. The waste info is integrated into existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means in 'kmalloc-4k' slab, there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). And when system starts some real workload like multiple docker instances, there could are more severe waste. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Robin Murphy Cc: John Garry Cc: Kefeng Wang Signed-off-by: Feng Tang --- Documentation/mm/slub.rst | 33 +++++--- include/linux/slab.h | 2 + mm/slub.c | 156 ++++++++++++++++++++++++++++---------- 3 files changed, 141 insertions(+), 50 deletions(-) diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst index 43063ade737a..4e1578186b4f 100644 --- a/Documentation/mm/slub.rst +++ b/Documentation/mm/slub.rst @@ -400,21 +400,30 @@ information: allocated objects. The output is sorted by frequency of each trace. Information in the output: - Number of objects, allocating function, minimal/average/maximal jiffies since alloc, - pid range of the allocating processes, cpu mask of allocating cpus, and stack trace. + Number of objects, allocating function, possible memory wastage of + kmalloc objects(total/per-object), minimal/average/maximal jiffies + since alloc, pid range of the allocating processes, cpu mask of + allocating cpus, numa node mask of origins of memory, and stack trace. Example::: - 1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1:: - __slab_alloc+0x6d/0x90 - kmem_cache_alloc_trace+0x2eb/0x300 - populate_error_injection_list+0x97/0x110 - init_error_injection+0x1b/0x71 - do_one_initcall+0x5f/0x2d0 - kernel_init_freeable+0x26f/0x2d7 - kernel_init+0xe/0x118 - ret_from_fork+0x22/0x30 - + 338 pci_alloc_dev+0x2c/0xa0 waste=521872/1544 age=290837/291891/293509 pid=1 cpus=106 nodes=0-1 + __kmem_cache_alloc_node+0x11f/0x4e0 + kmalloc_trace+0x26/0xa0 + pci_alloc_dev+0x2c/0xa0 + pci_scan_single_device+0xd2/0x150 + pci_scan_slot+0xf7/0x2d0 + pci_scan_child_bus_extend+0x4e/0x360 + acpi_pci_root_create+0x32e/0x3b0 + pci_acpi_scan_root+0x2b9/0x2d0 + acpi_pci_root_add.cold.11+0x110/0xb0a + acpi_bus_attach+0x262/0x3f0 + device_for_each_child+0xb7/0x110 + acpi_dev_for_each_child+0x77/0xa0 + acpi_bus_attach+0x108/0x3f0 + device_for_each_child+0xb7/0x110 + acpi_dev_for_each_child+0x77/0xa0 + acpi_bus_attach+0x108/0x3f0 2. free_traces:: diff --git a/include/linux/slab.h b/include/linux/slab.h index 9b592e611cb1..6dc495f76644 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -29,6 +29,8 @@ #define SLAB_RED_ZONE ((slab_flags_t __force)0x00000400U) /* DEBUG: Poison objects */ #define SLAB_POISON ((slab_flags_t __force)0x00000800U) +/* Indicate a kmalloc slab */ +#define SLAB_KMALLOC ((slab_flags_t __force)0x00001000U) /* Align objs on cache lines */ #define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x00002000U) /* Use GFP_DMA memory */ diff --git a/mm/slub.c b/mm/slub.c index fe4fe0e72daf..c8ba16b3a4db 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -194,11 +194,24 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled); #endif #endif /* CONFIG_SLUB_DEBUG */ +/* Structure holding parameters for get_partial() call chain */ +struct partial_context { + struct slab **slab; + gfp_t flags; + unsigned int orig_size; +}; + static inline bool kmem_cache_debug(struct kmem_cache *s) { return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS); } +static inline bool slub_debug_orig_size(struct kmem_cache *s) +{ + return (kmem_cache_debug_flags(s, SLAB_STORE_USER) && + (s->flags & SLAB_KMALLOC)); +} + void *fixup_red_left(struct kmem_cache *s, void *p) { if (kmem_cache_debug_flags(s, SLAB_RED_ZONE)) @@ -785,6 +798,39 @@ static void print_slab_info(const struct slab *slab) folio_flags(folio, 0)); } +/* + * kmalloc caches has fixed sizes (mostly power of 2), and kmalloc() API + * family will round up the real request size to these fixed ones, so + * there could be an extra area than what is requested. Save the original + * request size in the meta data area, for better debug and sanity check. + */ +static inline void set_orig_size(struct kmem_cache *s, + void *object, unsigned int orig_size) +{ + void *p = kasan_reset_tag(object); + + if (!slub_debug_orig_size(s)) + return; + + p += get_info_end(s); + p += sizeof(struct track) * 2; + + *(unsigned int *)p = orig_size; +} + +static unsigned int get_orig_size(struct kmem_cache *s, void *object) +{ + void *p = kasan_reset_tag(object); + + if (!slub_debug_orig_size(s)) + return s->object_size; + + p += get_info_end(s); + p += sizeof(struct track) * 2; + + return *(unsigned int *)p; +} + static void slab_bug(struct kmem_cache *s, char *fmt, ...) { struct va_format vaf; @@ -844,6 +890,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p) if (s->flags & SLAB_STORE_USER) off += 2 * sizeof(struct track); + if (slub_debug_orig_size(s)) + off += sizeof(unsigned int); + off += kasan_metadata_size(s); if (off != size_from_object(s)) @@ -977,7 +1026,8 @@ static int check_bytes_and_report(struct kmem_cache *s, struct slab *slab, * * A. Free pointer (if we cannot overwrite object on free) * B. Tracking data for SLAB_STORE_USER - * C. Padding to reach required alignment boundary or at minimum + * C. Original request size for kmalloc object (SLAB_STORE_USER enabled) + * D. Padding to reach required alignment boundary or at minimum * one word if debugging is on to be able to detect writes * before the word boundary. * @@ -995,10 +1045,14 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p) { unsigned long off = get_info_end(s); /* The end of info */ - if (s->flags & SLAB_STORE_USER) + if (s->flags & SLAB_STORE_USER) { /* We also have user information there */ off += 2 * sizeof(struct track); + if (s->flags & SLAB_KMALLOC) + off += sizeof(unsigned int); + } + off += kasan_metadata_size(s); if (size_from_object(s) == off) @@ -1293,7 +1347,7 @@ static inline int alloc_consistency_checks(struct kmem_cache *s, } static noinline int alloc_debug_processing(struct kmem_cache *s, - struct slab *slab, void *object) + struct slab *slab, void *object, int orig_size) { if (s->flags & SLAB_CONSISTENCY_CHECKS) { if (!alloc_consistency_checks(s, slab, object)) @@ -1302,6 +1356,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s, /* Success. Perform special debug activities for allocs */ trace(s, slab, object, 1); + set_orig_size(s, object, orig_size); init_object(s, object, SLUB_RED_ACTIVE); return 1; @@ -1570,7 +1625,10 @@ static inline void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {} static inline int alloc_debug_processing(struct kmem_cache *s, - struct slab *slab, void *object) { return 0; } + struct slab *slab, void *object, int orig_size) { return 0; } + +static inline void set_orig_size(struct kmem_cache *s, + void *object, unsigned int orig_size) {} static inline void free_debug_processing( struct kmem_cache *s, struct slab *slab, @@ -1999,7 +2057,7 @@ static inline void remove_partial(struct kmem_cache_node *n, * it to full list if it was the last free object. */ static void *alloc_single_from_partial(struct kmem_cache *s, - struct kmem_cache_node *n, struct slab *slab) + struct kmem_cache_node *n, struct slab *slab, int orig_size) { void *object; @@ -2009,7 +2067,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s, slab->freelist = get_freepointer(s, object); slab->inuse++; - if (!alloc_debug_processing(s, slab, object)) { + if (!alloc_debug_processing(s, slab, object, orig_size)) { remove_partial(n, slab); return NULL; } @@ -2028,7 +2086,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s, * and put the slab to the partial (or full) list. */ static void *alloc_single_from_new_slab(struct kmem_cache *s, - struct slab *slab) + struct slab *slab, int orig_size) { int nid = slab_nid(slab); struct kmem_cache_node *n = get_node(s, nid); @@ -2040,7 +2098,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, slab->freelist = get_freepointer(s, object); slab->inuse = 1; - if (!alloc_debug_processing(s, slab, object)) + if (!alloc_debug_processing(s, slab, object, orig_size)) /* * It's not really expected that this would fail on a * freshly allocated slab, but a concurrent memory @@ -2118,7 +2176,7 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags); * Try to allocate a partial slab from a specific node. */ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n, - struct slab **ret_slab, gfp_t gfpflags) + struct partial_context *pc) { struct slab *slab, *slab2; void *object = NULL; @@ -2138,11 +2196,12 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n, list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) { void *t; - if (!pfmemalloc_match(slab, gfpflags)) + if (!pfmemalloc_match(slab, pc->flags)) continue; if (kmem_cache_debug(s)) { - object = alloc_single_from_partial(s, n, slab); + object = alloc_single_from_partial(s, n, slab, + pc->orig_size); if (object) break; continue; @@ -2153,7 +2212,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n, break; if (!object) { - *ret_slab = slab; + *pc->slab = slab; stat(s, ALLOC_FROM_PARTIAL); object = t; } else { @@ -2177,14 +2236,13 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n, /* * Get a slab from somewhere. Search in increasing NUMA distances. */ -static void *get_any_partial(struct kmem_cache *s, gfp_t flags, - struct slab **ret_slab) +static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc) { #ifdef CONFIG_NUMA struct zonelist *zonelist; struct zoneref *z; struct zone *zone; - enum zone_type highest_zoneidx = gfp_zone(flags); + enum zone_type highest_zoneidx = gfp_zone(pc->flags); void *object; unsigned int cpuset_mems_cookie; @@ -2212,15 +2270,15 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags, do { cpuset_mems_cookie = read_mems_allowed_begin(); - zonelist = node_zonelist(mempolicy_slab_node(), flags); + zonelist = node_zonelist(mempolicy_slab_node(), pc->flags); for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) { struct kmem_cache_node *n; n = get_node(s, zone_to_nid(zone)); - if (n && cpuset_zone_allowed(zone, flags) && + if (n && cpuset_zone_allowed(zone, pc->flags) && n->nr_partial > s->min_partial) { - object = get_partial_node(s, n, ret_slab, flags); + object = get_partial_node(s, n, pc); if (object) { /* * Don't check read_mems_allowed_retry() @@ -2241,8 +2299,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags, /* * Get a partial slab, lock it and return it. */ -static void *get_partial(struct kmem_cache *s, gfp_t flags, int node, - struct slab **ret_slab) +static void *get_partial(struct kmem_cache *s, int node, struct partial_context *pc) { void *object; int searchnode = node; @@ -2250,11 +2307,11 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node, if (node == NUMA_NO_NODE) searchnode = numa_mem_id(); - object = get_partial_node(s, get_node(s, searchnode), ret_slab, flags); + object = get_partial_node(s, get_node(s, searchnode), pc); if (object || node != NUMA_NO_NODE) return object; - return get_any_partial(s, flags, ret_slab); + return get_any_partial(s, pc); } #ifdef CONFIG_PREEMPTION @@ -2974,11 +3031,12 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab) * already disabled (which is the case for bulk allocation). */ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, - unsigned long addr, struct kmem_cache_cpu *c) + unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size) { void *freelist; struct slab *slab; unsigned long flags; + struct partial_context pc; stat(s, ALLOC_SLOWPATH); @@ -3092,7 +3150,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, new_objects: - freelist = get_partial(s, gfpflags, node, &slab); + pc.flags = gfpflags; + pc.slab = &slab; + pc.orig_size = orig_size; + freelist = get_partial(s, node, &pc); if (freelist) goto check_new_slab; @@ -3108,7 +3169,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, stat(s, ALLOC_SLAB); if (kmem_cache_debug(s)) { - freelist = alloc_single_from_new_slab(s, slab); + freelist = alloc_single_from_new_slab(s, slab, orig_size); if (unlikely(!freelist)) goto new_objects; @@ -3140,6 +3201,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, */ if (s->flags & SLAB_STORE_USER) set_track(s, freelist, TRACK_ALLOC, addr); + return freelist; } @@ -3182,7 +3244,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, * pointer. */ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, - unsigned long addr, struct kmem_cache_cpu *c) + unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size) { void *p; @@ -3195,7 +3257,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, c = slub_get_cpu_ptr(s->cpu_slab); #endif - p = ___slab_alloc(s, gfpflags, node, addr, c); + p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size); #ifdef CONFIG_PREEMPT_COUNT slub_put_cpu_ptr(s->cpu_slab); #endif @@ -3280,7 +3342,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l if (!USE_LOCKLESS_FAST_PATH() || unlikely(!object || !slab || !node_match(slab, node))) { - object = __slab_alloc(s, gfpflags, node, addr, c); + object = __slab_alloc(s, gfpflags, node, addr, c, orig_size); } else { void *next_object = get_freepointer_safe(s, object); @@ -3747,7 +3809,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, * of re-populating per CPU c->freelist */ p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, - _RET_IP_, c); + _RET_IP_, c, s->object_size); if (unlikely(!p[i])) goto error; @@ -4150,12 +4212,17 @@ static int calculate_sizes(struct kmem_cache *s) } #ifdef CONFIG_SLUB_DEBUG - if (flags & SLAB_STORE_USER) + if (flags & SLAB_STORE_USER) { /* * Need to store information about allocs and frees after * the object. */ size += 2 * sizeof(struct track); + + /* Save the original kmalloc request size */ + if (flags & SLAB_KMALLOC) + size += sizeof(unsigned int); + } #endif kasan_cache_create(s, &size, &s->flags); @@ -4770,7 +4837,7 @@ void __init kmem_cache_init(void) /* Now we can use the kmem_cache to allocate kmalloc slabs */ setup_kmalloc_cache_index_table(); - create_kmalloc_caches(0); + create_kmalloc_caches(SLAB_KMALLOC); /* Setup random freelists for each cache */ init_freelist_randomization(); @@ -4937,6 +5004,7 @@ struct location { depot_stack_handle_t handle; unsigned long count; unsigned long addr; + unsigned long waste; long long sum_time; long min_time; long max_time; @@ -4983,13 +5051,15 @@ static int alloc_loc_track(struct loc_track *t, unsigned long max, gfp_t flags) } static int add_location(struct loc_track *t, struct kmem_cache *s, - const struct track *track) + const struct track *track, + unsigned int orig_size) { long start, end, pos; struct location *l; - unsigned long caddr, chandle; + unsigned long caddr, chandle, cwaste; unsigned long age = jiffies - track->when; depot_stack_handle_t handle = 0; + unsigned int waste = s->object_size - orig_size; #ifdef CONFIG_STACKDEPOT handle = READ_ONCE(track->handle); @@ -5007,11 +5077,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s, if (pos == end) break; - caddr = t->loc[pos].addr; - chandle = t->loc[pos].handle; - if ((track->addr == caddr) && (handle == chandle)) { + l = &t->loc[pos]; + caddr = l->addr; + chandle = l->handle; + cwaste = l->waste; + if ((track->addr == caddr) && (handle == chandle) && + (waste == cwaste)) { - l = &t->loc[pos]; l->count++; if (track->when) { l->sum_time += age; @@ -5036,6 +5108,9 @@ static int add_location(struct loc_track *t, struct kmem_cache *s, end = pos; else if (track->addr == caddr && handle < chandle) end = pos; + else if (track->addr == caddr && handle == chandle && + waste < cwaste) + end = pos; else start = pos; } @@ -5059,6 +5134,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s, l->min_pid = track->pid; l->max_pid = track->pid; l->handle = handle; + l->waste = waste; cpumask_clear(to_cpumask(l->cpus)); cpumask_set_cpu(track->cpu, to_cpumask(l->cpus)); nodes_clear(l->nodes); @@ -5077,7 +5153,7 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s, for_each_object(p, s, addr, slab->objects) if (!test_bit(__obj_to_index(s, addr, p), obj_map)) - add_location(t, s, get_track(s, p, alloc)); + add_location(t, s, get_track(s, p, alloc), get_orig_size(s, p)); } #endif /* CONFIG_DEBUG_FS */ #endif /* CONFIG_SLUB_DEBUG */ @@ -5942,6 +6018,10 @@ static int slab_debugfs_show(struct seq_file *seq, void *v) else seq_puts(seq, ""); + if (l->waste) + seq_printf(seq, " waste=%lu/%lu", + l->count * l->waste, l->waste); + if (l->sum_time != l->min_time) { seq_printf(seq, " age=%ld/%llu/%ld", l->min_time, div_u64(l->sum_time, l->count), From patchwork Tue Sep 13 06:54:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Tang X-Patchwork-Id: 12974476 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95A51C6FA86 for ; Tue, 13 Sep 2022 06:54:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2E53E8D0002; Tue, 13 Sep 2022 02:54:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 294DF6B0074; Tue, 13 Sep 2022 02:54:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 138CF8D0002; Tue, 13 Sep 2022 02:54:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 061D16B0073 for ; Tue, 13 Sep 2022 02:54:56 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id CCFC5A0939 for ; Tue, 13 Sep 2022 06:54:55 +0000 (UTC) X-FDA: 79906149750.10.B3A9B6E Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf11.hostedemail.com (Postfix) with ESMTP id 32E3640088 for ; Tue, 13 Sep 2022 06:54:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663052095; x=1694588095; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jNIjARJaqhZDNzVjt1J74PS/eC/mhxmnJkoZtfdgppw=; b=Hd0aQ5hB+m8RpR5gnQyg5/Dcu0cXreFj+Fb2Zez17CdxB/7bBIMutSw9 erTe+dt2ZvNtMSR+FksiQsQtjp4mxvcFNr4HNLWSxgjmRWB55EnvlDwHs UJn724vhVr+NNb29N5ROHfUi3W9/a0UP9v+cHqsoXlwxt3NpGxmUQJJeY 3E1fwh2LvkbtdEvNb4RRlXOddvT8ojZlFFtU98sFoTRnsZqUKzdb4WQVE kWfienSg4es4wfmGFHy7SmbHBPfxPCZL7dokdWcfzN/QJn3yGwIpNTiX4 0SURNbxE4H5YdoXrqmU2HEQahGgf6BNGhYH3l8HcMHd01W2f1Wu20CPKD Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10468"; a="285079390" X-IronPort-AV: E=Sophos;i="5.93,312,1654585200"; d="scan'208";a="285079390" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Sep 2022 23:54:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,312,1654585200"; d="scan'208";a="861440725" Received: from feng-clx.sh.intel.com ([10.238.200.228]) by fmsmga006.fm.intel.com with ESMTP; 12 Sep 2022 23:54:51 -0700 From: Feng Tang To: Andrew Morton , Vlastimil Babka , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Dmitry Vyukov , Jonathan Corbet , Andrey Konovalov Cc: Dave Hansen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com, Feng Tang Subject: [PATCH v6 2/4] mm/slub: only zero the requested size of buffer for kzalloc Date: Tue, 13 Sep 2022 14:54:21 +0800 Message-Id: <20220913065423.520159-3-feng.tang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220913065423.520159-1-feng.tang@intel.com> References: <20220913065423.520159-1-feng.tang@intel.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1663052095; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WygvbcU+57VUtDe0idm3K8vvy6IHtOeCKf1SmJPZZYI=; b=kt11PmAJrTFbRbi1VyhYRHr0uSgxTddlTL7H9ss50fP9U+oXbezrMJOad3Ns0cibLYNQ+R joHSxf0zC7y/mup6wZ6K3KuBjlqvI6nir7sSS9XbIZ25wPefTe27JApX9C0X0tVL4qOHZJ USheRWCigcqeAx/IgVspJSNvXjshBIY= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Hd0aQ5hB; spf=pass (imf11.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1663052095; a=rsa-sha256; cv=none; b=z8lja88osOA1Z4l+OH4aJ65MOh/ymGjpXc1sW/e2SkRU8VSTmYkElQKy/oyut3vB21SOYS LfIeF745Ov7hWxcGvXqJRIOi+OCoiaB/ggJaIpmo7Ho0uJ90y9XcVvVCPs3y2wX/oGnzLg +W19MWG/+OlqN1t2/gsEmNnhchhxtPQ= X-Rspamd-Queue-Id: 32E3640088 X-Rspam-User: Authentication-Results: imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Hd0aQ5hB; spf=pass (imf11.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam02 X-Stat-Signature: 8hfmzo3k5ummq9hj5gpyw5x33u7di9s5 X-HE-Tag: 1663052095-71704 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: kzalloc/kmalloc will round up the request size to a fixed size (mostly power of 2), so the allocated memory could be more than requested. Currently kzalloc family APIs will zero all the allocated memory. To detect out-of-bound usage of the extra allocated memory, only zero the requested part, so that sanity check could be added to the extra space later. Performance wise, smaller zeroing length also brings shorter execution time, as shown from test data on various server/desktop platforms. For kzalloc users who will call ksize() later and utilize this extra space, please be aware that the space is not zeroed any more. Signed-off-by: Feng Tang --- mm/slab.c | 7 ++++--- mm/slab.h | 5 +++-- mm/slub.c | 10 +++++++--- 3 files changed, 14 insertions(+), 8 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index a5486ff8362a..4594de0e3d6b 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3253,7 +3253,8 @@ slab_alloc_node(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags, init = slab_want_init_on_alloc(flags, cachep); out: - slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init); + slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init, + cachep->object_size); return objp; } @@ -3506,13 +3507,13 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, * Done outside of the IRQ disabled section. */ slab_post_alloc_hook(s, objcg, flags, size, p, - slab_want_init_on_alloc(flags, s)); + slab_want_init_on_alloc(flags, s), s->object_size); /* FIXME: Trace call missing. Christoph would like a bulk variant */ return size; error: local_irq_enable(); cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_); - slab_post_alloc_hook(s, objcg, flags, i, p, false); + slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size); kmem_cache_free_bulk(s, i, p); return 0; } diff --git a/mm/slab.h b/mm/slab.h index d0ef9dd44b71..3cf5adf63f48 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -730,7 +730,8 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, static inline void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, gfp_t flags, - size_t size, void **p, bool init) + size_t size, void **p, bool init, + unsigned int orig_size) { size_t i; @@ -746,7 +747,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s, for (i = 0; i < size; i++) { p[i] = kasan_slab_alloc(s, p[i], flags, init); if (p[i] && init && !kasan_has_integrated_init()) - memset(p[i], 0, s->object_size); + memset(p[i], 0, orig_size); kmemleak_alloc_recursive(p[i], s->object_size, 1, s->flags, flags); } diff --git a/mm/slub.c b/mm/slub.c index c8ba16b3a4db..6f823e99d8b4 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3376,7 +3376,11 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l init = slab_want_init_on_alloc(gfpflags, s); out: - slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init); + /* + * When init equals 'true', like for kzalloc() family, only + * @orig_size bytes will be zeroed instead of s->object_size + */ + slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init, orig_size); return object; } @@ -3833,11 +3837,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, * Done outside of the IRQ disabled fastpath loop. */ slab_post_alloc_hook(s, objcg, flags, size, p, - slab_want_init_on_alloc(flags, s)); + slab_want_init_on_alloc(flags, s), s->object_size); return i; error: slub_put_cpu_ptr(s->cpu_slab); - slab_post_alloc_hook(s, objcg, flags, i, p, false); + slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size); kmem_cache_free_bulk(s, i, p); return 0; } From patchwork Tue Sep 13 06:54:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Tang X-Patchwork-Id: 12974477 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3563AC54EE9 for ; Tue, 13 Sep 2022 06:55:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C831A6B0073; Tue, 13 Sep 2022 02:54:59 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C3168940008; Tue, 13 Sep 2022 02:54:59 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AF9BA940007; Tue, 13 Sep 2022 02:54:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id A00E86B0073 for ; Tue, 13 Sep 2022 02:54:59 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 7B32F1608BA for ; Tue, 13 Sep 2022 06:54:59 +0000 (UTC) X-FDA: 79906149918.18.D02ECD2 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf11.hostedemail.com (Postfix) with ESMTP id CAC5940091 for ; Tue, 13 Sep 2022 06:54:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663052098; x=1694588098; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DytKD4r9+X/L41exc0OVKMw7LZ0hR4YVrRBK5Geu9bo=; b=IoR8MGtQ6sN3mUy9jWov3JLEXeQdhItD2wIiQCy5NyG8gsWVO6ldg3vx rMfDaJf1Spg2tmT2vnAu2u3626tP/+hIRbtjbfFvtqyPKb2w46whY5Ls9 Iu43CI2f/6HLU7aOl9BGTAtO6OwMZNermMkoVkHCN1eS2XEYcbucSnTVK D448JFaySJ55MU4N9tLasuckSMTAWkfpf+bo+8DIsX7nN7j7HC5XDwWut iMcMjEp4xceq6FJiyTrNwth5Jtt+TtDxOHWG/Xy5QS/sjefkgJxoXPZxK SFZ87oGRZ+6nLu2Ykh4DCf+oe1+PfzphFaanqDdBXP31HFtDeEvjnyIvi Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10468"; a="285079409" X-IronPort-AV: E=Sophos;i="5.93,312,1654585200"; d="scan'208";a="285079409" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Sep 2022 23:54:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,312,1654585200"; d="scan'208";a="861440734" Received: from feng-clx.sh.intel.com ([10.238.200.228]) by fmsmga006.fm.intel.com with ESMTP; 12 Sep 2022 23:54:54 -0700 From: Feng Tang To: Andrew Morton , Vlastimil Babka , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Dmitry Vyukov , Jonathan Corbet , Andrey Konovalov Cc: Dave Hansen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com, Feng Tang , kernel test robot Subject: [PATCH v6 3/4] mm: kasan: Add free_meta size info in struct kasan_cache Date: Tue, 13 Sep 2022 14:54:22 +0800 Message-Id: <20220913065423.520159-4-feng.tang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220913065423.520159-1-feng.tang@intel.com> References: <20220913065423.520159-1-feng.tang@intel.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1663052099; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=w4DAXzPXkCbDYCXb48OkMTjrkf0UJXQmQ3pLbnSy+bk=; b=3vumt5mXMtggIc59dxdgctkR2Ypm/oup6jysrbV83q4HB4sfhX4wTvJMGTPKe+naF8cQpX NGLg11sDQ3cmfdBwZCp11qiVdWz6Rt7+qdMB1hwwfxT3PpWn9onr5jevM/+OxuWisGmaH0 7kRSXnACrPo5uOwQyPHqMLBDBUakU+8= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=IoR8MGtQ; spf=pass (imf11.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1663052099; a=rsa-sha256; cv=none; b=gFV+POF8mDV4dMF9ox2CQqisyIU4NUEO+iGNwRTtt2kFAlQ9cLeRc/aGP2D8uGsS+grbER u4WXwA8+QwMUxRzpT8DsQBXxuaHpjOD3+dASKeYyxYBt+lDXKwxghPgeOmmvTVgokj2wo1 9AuFz+Gz2dpjdNh5NTc4JlbLZeEbtbE= X-Rspamd-Queue-Id: CAC5940091 X-Rspam-User: Authentication-Results: imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=IoR8MGtQ; spf=pass (imf11.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam02 X-Stat-Signature: g57a5j765mqwowayp8fo9ezyr8eiki58 X-HE-Tag: 1663052098-724201 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When kasan is enabled for slab/slub, it may save kasan' free_meta data in the former part of slab object data area in slab object's free path, which works fine. There is ongoing effort to extend slub's debug function which will redzone the latter part of kmalloc object area, and when both of the debug are enabled, there is possible conflict, especially when the kmalloc object has small size, as caught by 0Day bot [1] For better information for slab/slub, add free_meta's data size into 'struct kasan_cache', so that its users can take right action to avoid data conflict. [1]. https://lore.kernel.org/lkml/YuYm3dWwpZwH58Hu@xsang-OptiPlex-9020/ Reported-by: kernel test robot Signed-off-by: Feng Tang Acked-by: Dmitry Vyukov Reviewed-by: Andrey Konovalov Reported-by: kernel test robot Signed-off-by: Feng Tang Reviewed-by: Andrey Konovalov --- include/linux/kasan.h | 2 ++ mm/kasan/common.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/include/linux/kasan.h b/include/linux/kasan.h index b092277bf48d..49af9513e8ed 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -100,6 +100,8 @@ static inline bool kasan_has_integrated_init(void) struct kasan_cache { int alloc_meta_offset; int free_meta_offset; + /* size of free_meta data saved in object's data area */ + int free_meta_size; bool is_kmalloc; }; diff --git a/mm/kasan/common.c b/mm/kasan/common.c index 69f583855c8b..0cb867e92524 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -201,6 +201,8 @@ void __kasan_cache_create(struct kmem_cache *cache, unsigned int *size, cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META; *size = ok_size; } + } else { + cache->kasan_info.free_meta_size = sizeof(struct kasan_free_meta); } /* Calculate size with optimal redzone. */ From patchwork Tue Sep 13 06:54:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Tang X-Patchwork-Id: 12974478 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E304C6FA82 for ; Tue, 13 Sep 2022 06:55:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EA5596B0074; Tue, 13 Sep 2022 02:55:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E5487940008; Tue, 13 Sep 2022 02:55:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D1CAA940007; Tue, 13 Sep 2022 02:55:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id C339E6B0074 for ; Tue, 13 Sep 2022 02:55:03 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 9CDE1AACE7 for ; Tue, 13 Sep 2022 06:55:03 +0000 (UTC) X-FDA: 79906150086.05.7226CB1 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf11.hostedemail.com (Postfix) with ESMTP id B31CD40088 for ; Tue, 13 Sep 2022 06:55:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663052102; x=1694588102; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MwGHd/pOZOwzdadrxPsU8z6L/u7Dv9xis33Cv0S9Lpk=; b=XrvVyepKNEjVOWgZ/ZKYl8PhiWD2PROnQRoTCilESgqu/5b0XOhgNn7K lj5u7GffXI+3DReR+82t5Pozj9LOtmKUnsERFYzUnGwR+AlzHFOzp6Fuo 01xNR1mWnl+zbQXzeFmsy3ZpgBNy2ONJ87l9YUvWKcP9guiS/5tsDsApC jawdtuqTXKHpe4/RFANR0NAVMylG6g9QKPvxyqe6zTtQYhaS8K20QHMds 86mn8vPlIF8DZ9DwrAht+vQjWvEzj7FPILmcKCxsOBOW1H+lGSXOfri1M g3QAOcHYciLEqMjHF/o4YIBhBmJhCACj3qikv/36GplVVE0bycefEG7Jr w==; X-IronPort-AV: E=McAfee;i="6500,9779,10468"; a="285079435" X-IronPort-AV: E=Sophos;i="5.93,312,1654585200"; d="scan'208";a="285079435" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Sep 2022 23:55:02 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,312,1654585200"; d="scan'208";a="861440749" Received: from feng-clx.sh.intel.com ([10.238.200.228]) by fmsmga006.fm.intel.com with ESMTP; 12 Sep 2022 23:54:58 -0700 From: Feng Tang To: Andrew Morton , Vlastimil Babka , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Dmitry Vyukov , Jonathan Corbet , Andrey Konovalov Cc: Dave Hansen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com, Feng Tang Subject: [PATCH v6 4/4] mm/slub: extend redzone check to extra allocated kmalloc space than requested Date: Tue, 13 Sep 2022 14:54:23 +0800 Message-Id: <20220913065423.520159-5-feng.tang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220913065423.520159-1-feng.tang@intel.com> References: <20220913065423.520159-1-feng.tang@intel.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1663052103; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=17ru2aScMcm/B0OnyQ31ceHqJ7mFMov0F/bGaFPOtys=; b=5NDfNsN1emJFTa7QPk4VNxJzzCw2+f6TLZfhfxwEslszrhMrp3qe3NTDpJouKXVILOGhTj iuuwZx4u2ZS8LNm9jcfb6TbjIlfZt8Yez3D+eehpQKwWCEGi4IwU6X/Li91gzdNtqIO05R AfOMM7a3Vg5gOy53bsm/Hryu3NwjmPs= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=XrvVyepK; spf=pass (imf11.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1663052103; a=rsa-sha256; cv=none; b=j0WMa8FmHya1iw1FKD1OVA+LUskCccpzThrM9gFoxqBadOO2voqTLnFteqSaVHw1vY2top kaQgXzMt02n3saxGCWr0HGlz5r7FJk4vvZg2B6iM/MriElR6Tl/kzNLxWHpjeN5Z1ZwXcu 08HX5VBKoefdwvJiJx2vQltAweMPjmo= X-Rspamd-Queue-Id: B31CD40088 X-Rspam-User: Authentication-Results: imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=XrvVyepK; spf=pass (imf11.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam02 X-Stat-Signature: rzgf58cm1uezgsxny4np1z1i54rwbp8u X-HE-Tag: 1663052102-351484 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: kmalloc will round up the request size to a fixed size (mostly power of 2), so there could be a extra space than what is requested, whose size is the actual buffer size minus original request size. To better detect out of bound access or abuse of this space, add redzone sanity check for it. And in current kernel, some kmalloc user already knows the existence of the space and utilizes it after calling 'ksize()' to know the real size of the allocated buffer. So we skip the sanity check for objects which have been called with ksize(), as treating them as legitimate users. In some cases, the free pointer could be saved inside the latter part of object data area, which may overlap the redzone part(for small sizes of kmalloc objects). As suggested by Hyeonggon Yoo, force the free pointer to be in meta data area when kmalloc redzone debug is enabled, to make all kmalloc objects covered by redzone check. Suggested-by: Vlastimil Babka Signed-off-by: Feng Tang Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> --- mm/slab.h | 4 ++++ mm/slab_common.c | 4 ++++ mm/slub.c | 51 ++++++++++++++++++++++++++++++++++++++++++++---- 3 files changed, 55 insertions(+), 4 deletions(-) diff --git a/mm/slab.h b/mm/slab.h index 3cf5adf63f48..5ca04d9c8bf5 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -881,4 +881,8 @@ void __check_heap_object(const void *ptr, unsigned long n, } #endif +#ifdef CONFIG_SLUB_DEBUG +void skip_orig_size_check(struct kmem_cache *s, const void *object); +#endif + #endif /* MM_SLAB_H */ diff --git a/mm/slab_common.c b/mm/slab_common.c index 8e13e3aac53f..5106667d6adb 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1001,6 +1001,10 @@ size_t __ksize(const void *object) return folio_size(folio); } +#ifdef CONFIG_SLUB_DEBUG + skip_orig_size_check(folio_slab(folio)->slab_cache, object); +#endif + return slab_ksize(folio_slab(folio)->slab_cache); } diff --git a/mm/slub.c b/mm/slub.c index 6f823e99d8b4..546b30ed5afd 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -812,12 +812,28 @@ static inline void set_orig_size(struct kmem_cache *s, if (!slub_debug_orig_size(s)) return; +#ifdef CONFIG_KASAN_GENERIC + /* + * KASAN could save its free meta data in object's data area at + * offset 0, if the size is larger than 'orig_size', it could + * overlap the data redzone(from 'orig_size+1' to 'object_size'), + * where the check should be skipped. + */ + if (s->kasan_info.free_meta_size > orig_size) + orig_size = s->object_size; +#endif + p += get_info_end(s); p += sizeof(struct track) * 2; *(unsigned int *)p = orig_size; } +void skip_orig_size_check(struct kmem_cache *s, const void *object) +{ + set_orig_size(s, (void *)object, s->object_size); +} + static unsigned int get_orig_size(struct kmem_cache *s, void *object) { void *p = kasan_reset_tag(object); @@ -949,13 +965,27 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab, static void init_object(struct kmem_cache *s, void *object, u8 val) { u8 *p = kasan_reset_tag(object); + unsigned int orig_size = s->object_size; - if (s->flags & SLAB_RED_ZONE) + if (s->flags & SLAB_RED_ZONE) { memset(p - s->red_left_pad, val, s->red_left_pad); + if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) { + orig_size = get_orig_size(s, object); + + /* + * Redzone the extra allocated space by kmalloc + * than requested. + */ + if (orig_size < s->object_size) + memset(p + orig_size, val, + s->object_size - orig_size); + } + } + if (s->flags & __OBJECT_POISON) { - memset(p, POISON_FREE, s->object_size - 1); - p[s->object_size - 1] = POISON_END; + memset(p, POISON_FREE, orig_size - 1); + p[orig_size - 1] = POISON_END; } if (s->flags & SLAB_RED_ZONE) @@ -1103,6 +1133,7 @@ static int check_object(struct kmem_cache *s, struct slab *slab, { u8 *p = object; u8 *endobject = object + s->object_size; + unsigned int orig_size; if (s->flags & SLAB_RED_ZONE) { if (!check_bytes_and_report(s, slab, object, "Left Redzone", @@ -1112,6 +1143,17 @@ static int check_object(struct kmem_cache *s, struct slab *slab, if (!check_bytes_and_report(s, slab, object, "Right Redzone", endobject, val, s->inuse - s->object_size)) return 0; + + if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) { + orig_size = get_orig_size(s, object); + + if (s->object_size > orig_size && + !check_bytes_and_report(s, slab, object, + "kmalloc Redzone", p + orig_size, + val, s->object_size - orig_size)) { + return 0; + } + } } else { if ((s->flags & SLAB_POISON) && s->object_size < s->inuse) { check_bytes_and_report(s, slab, p, "Alignment padding", @@ -4187,7 +4229,8 @@ static int calculate_sizes(struct kmem_cache *s) */ s->inuse = size; - if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || + if (slub_debug_orig_size(s) || + (flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) || s->ctor) { /*