From patchwork Fri Feb 14 16:27:42 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 13975227
From: Vlastimil Babka
Date: Fri, 14 Feb 2025 17:27:42 +0100
Subject: [PATCH RFC v2 06/10] slab: sheaf prefilling for guaranteed allocations
Message-Id: <20250214-slub-percpu-caches-v2-6-88592ee0966a@suse.cz>
References: <20250214-slub-percpu-caches-v2-0-88592ee0966a@suse.cz>
In-Reply-To: <20250214-slub-percpu-caches-v2-0-88592ee0966a@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Cc: Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, Vlastimil Babka
X-Mailer: b4 0.14.2
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Add functions for efficient guaranteed allocations e.g. in a critical
section that cannot sleep, when the exact number of allocations is not
known beforehand, but an upper limit can be calculated.

kmem_cache_prefill_sheaf() returns a sheaf containing at least the given
number of objects.

kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf
and is guaranteed not to fail until depleted.

kmem_cache_return_sheaf() is for giving the sheaf back to the slab
allocator after the critical section. This will also attempt to refill
it to the cache's sheaf capacity for better efficiency of sheaves
handling, but the refill is not strictly required to succeed.

kmem_cache_refill_sheaf() can be used to refill a previously obtained
sheaf to the requested size. If the current size is sufficient, it does
nothing. If the requested size exceeds the cache's sheaf_capacity and the
sheaf's current capacity, the sheaf will be replaced with a new one,
hence the indirect pointer parameter.

kmem_cache_sheaf_size() can be used to query the current size.

The implementation supports requesting sizes that exceed the cache's
sheaf_capacity, but it is not efficient - such sheaves are allocated
fresh in kmem_cache_prefill_sheaf() and flushed and freed immediately by
kmem_cache_return_sheaf(). kmem_cache_refill_sheaf() might be especially
inefficient when replacing a sheaf with a new one of a larger capacity.
It is therefore better to size the cache's sheaf_capacity accordingly.
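As a rough illustration of the calling pattern described above, here is a
minimal userspace sketch. It is not part of this patch and does not use the
kernel API: every model_* name is hypothetical, malloc()/free() stand in for
the slab allocator, and "returning" a sheaf simply frees it instead of
refilling it or handing it to a barn. It only models the contract that
prefilling may fail, while taking objects from a prefilled sheaf cannot fail
until the sheaf is depleted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct slab_sheaf: a LIFO object stack. */
struct model_sheaf {
	unsigned int capacity;	/* number of slots in objects[] */
	unsigned int size;	/* number of objects currently held */
	void *objects[];
};

/* Models kmem_cache_prefill_sheaf(): all-or-nothing, returns NULL on failure. */
struct model_sheaf *model_prefill_sheaf(size_t object_size, unsigned int size)
{
	struct model_sheaf *sheaf;

	sheaf = calloc(1, sizeof(*sheaf) + size * sizeof(void *));
	if (!sheaf)
		return NULL;
	sheaf->capacity = size;

	while (sheaf->size < size) {
		void *obj = malloc(object_size);

		if (!obj) {
			/* undo the partial fill so the caller sees a clean failure */
			while (sheaf->size)
				free(sheaf->objects[--sheaf->size]);
			free(sheaf);
			return NULL;
		}
		sheaf->objects[sheaf->size++] = obj;
	}
	return sheaf;
}

/* Models kmem_cache_alloc_from_sheaf(): no allocation, NULL once depleted. */
void *model_alloc_from_sheaf(struct model_sheaf *sheaf)
{
	if (sheaf->size == 0)
		return NULL;
	return sheaf->objects[--sheaf->size];
}

/* Models kmem_cache_sheaf_size(). */
unsigned int model_sheaf_size(const struct model_sheaf *sheaf)
{
	return sheaf->size;
}

/* Models kmem_cache_return_sheaf(); this sketch only frees the leftovers. */
void model_return_sheaf(struct model_sheaf *sheaf)
{
	while (sheaf->size)
		free(sheaf->objects[--sheaf->size]);
	free(sheaf);
}

/*
 * Prefill for an upper bound, consume 'needed' objects (needed <= upper_bound),
 * then return the sheaf. Returns the number of unused objects, or -1 if the
 * prefill itself failed.
 */
int model_demo(unsigned int upper_bound, unsigned int needed)
{
	struct model_sheaf *sheaf = model_prefill_sheaf(64, upper_bound);
	unsigned int i;
	int left;

	if (!sheaf)
		return -1;	/* the only point where failure is possible */

	/* the "critical section": these allocations cannot fail */
	for (i = 0; i < needed; i++) {
		void *obj = model_alloc_from_sheaf(sheaf);

		assert(obj != NULL);
		free(obj);	/* a real user would keep and use the object */
	}

	left = (int)model_sheaf_size(sheaf);
	model_return_sheaf(sheaf);
	return left;
}
```

In this model, calling model_demo(4, 3) prefills four objects, consumes
three and hands one unused object back with the sheaf. With the real API the
prefill would be done with a gfp mask that may sleep, before entering the
critical section.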
Signed-off-by: Vlastimil Babka
---
 include/linux/slab.h |  16 ++++
 mm/slub.c            | 227 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 243 insertions(+)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0e1b25228c77140d05b5b4433c9d7923de36ec05..dd01b67982e856b1b02f4f0e6fc557726e7f02a8 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -829,6 +829,22 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
 				   int node) __assume_slab_alignment __malloc;
 #define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
 
+struct slab_sheaf *
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size);
+
+int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
+		struct slab_sheaf **sheafp, unsigned int size);
+
+void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
+		struct slab_sheaf *sheaf);
+
+void *kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *cachep, gfp_t gfp,
+		struct slab_sheaf *sheaf) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_from_sheaf(...)	\
+		alloc_hooks(kmem_cache_alloc_from_sheaf_noprof(__VA_ARGS__))
+
+unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf);
+
 /*
  * These macros allow declaring a kmem_buckets * parameter alongside size, which
  * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call
diff --git a/mm/slub.c b/mm/slub.c
index 3d7345e7e938d53950ed0d6abe8eb0e93cf8f5b1..c1df7cf22267f28f743404531bef921e25fac086 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -443,6 +443,8 @@ struct slab_sheaf {
 	union {
 		struct rcu_head rcu_head;
 		struct list_head barn_list;
+		/* only used for prefilled sheaves */
+		unsigned int capacity;
 	};
 	struct kmem_cache *cache;
 	unsigned int size;
@@ -2735,6 +2737,30 @@ static int barn_put_full_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf,
 	return ret;
 }
 
+static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
+{
+	struct slab_sheaf *sheaf = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	if (barn->nr_full) {
+		sheaf = list_first_entry(&barn->sheaves_full, struct slab_sheaf,
+					 barn_list);
+		list_del(&sheaf->barn_list);
+		barn->nr_full--;
+	} else if (barn->nr_empty) {
+		sheaf = list_first_entry(&barn->sheaves_empty,
+					 struct slab_sheaf, barn_list);
+		list_del(&sheaf->barn_list);
+		barn->nr_empty--;
+	}
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+
+	return sheaf;
+}
+
 /*
  * If a full sheaf is available, return it and put the supplied empty one to
  * barn. We ignore the limit on empty sheaves as the number of sheaves doesn't
@@ -4831,6 +4857,207 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int nod
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);
 
+
+/*
+ * returns a sheaf that has at least the requested size
+ * when prefilling is needed, do so with given gfp flags
+ *
+ * return NULL if sheaf allocation or prefilling failed
+ */
+struct slab_sheaf *
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct slab_sheaf *sheaf = NULL;
+
+	if (unlikely(size > s->sheaf_capacity)) {
+		sheaf = kzalloc(struct_size(sheaf, objects, size), gfp);
+		if (!sheaf)
+			return NULL;
+
+		sheaf->cache = s;
+		sheaf->capacity = size;
+
+		if (!__kmem_cache_alloc_bulk(s, gfp, size,
+					     &sheaf->objects[0])) {
+			kfree(sheaf);
+			return NULL;
+		}
+
+		sheaf->size = size;
+
+		return sheaf;
+	}
+
+	localtry_lock(&s->cpu_sheaves->lock);
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (pcs->spare) {
+		sheaf = pcs->spare;
+		pcs->spare = NULL;
+	}
+
+	if (!sheaf)
+		sheaf = barn_get_full_or_empty_sheaf(pcs->barn);
+
+	localtry_unlock(&s->cpu_sheaves->lock);
+
+	if (!sheaf) {
+		sheaf = alloc_empty_sheaf(s, gfp);
+	}
+
+	if (sheaf && sheaf->size < size) {
+		if (refill_sheaf(s, sheaf, gfp)) {
+			sheaf_flush(s, sheaf);
+			free_empty_sheaf(s, sheaf);
+			sheaf = NULL;
+		}
+	}
+
+	if (sheaf)
+		sheaf->capacity = s->sheaf_capacity;
+
+	return sheaf;
+}
+
+/*
+ * Use this to return a sheaf obtained by kmem_cache_prefill_sheaf()
+ * It tries to refill the sheaf back to the cache's sheaf_capacity
+ * to avoid handling partially full sheaves.
+ *
+ * If the refill fails because gfp is e.g. GFP_NOWAIT, the sheaf is
+ * instead dissolved
+ */
+void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
+			     struct slab_sheaf *sheaf)
+{
+	struct slub_percpu_sheaves *pcs;
+	bool refill = false;
+	struct node_barn *barn;
+
+	if (unlikely(sheaf->capacity != s->sheaf_capacity)) {
+		sheaf_flush(s, sheaf);
+		kfree(sheaf);
+		return;
+	}
+
+	localtry_lock(&s->cpu_sheaves->lock);
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (!pcs->spare) {
+		pcs->spare = sheaf;
+		sheaf = NULL;
+	} else if (pcs->barn->nr_full < MAX_FULL_SHEAVES) {
+		/* racy check */
+		barn = pcs->barn;
+		refill = true;
+	}
+
+	localtry_unlock(&s->cpu_sheaves->lock);
+
+	if (!sheaf)
+		return;
+
+	/*
+	 * if the barn is full of full sheaves or we fail to refill the sheaf,
+	 * simply flush and free it
+	 */
+	if (!refill || refill_sheaf(s, sheaf, gfp)) {
+		sheaf_flush(s, sheaf);
+		free_empty_sheaf(s, sheaf);
+		return;
+	}
+
+	/* we racily determined the sheaf would fit, so now force it */
+	barn_put_full_sheaf(barn, sheaf, true);
+}
+
+/*
+ * refill a sheaf previously returned by kmem_cache_prefill_sheaf to at least
+ * the given size
+ *
+ * the sheaf might be replaced by a new one when requesting more than
+ * s->sheaf_capacity objects. If such replacement is necessary, but the refill
+ * fails (with -ENOMEM), the existing sheaf is left intact
+ */
+int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
+			    struct slab_sheaf **sheafp, unsigned int size)
+{
+	struct slab_sheaf *sheaf;
+
+	/*
+	 * TODO: do we want to support *sheafp == NULL to be equivalent of
+	 * kmem_cache_prefill_sheaf() ?
+	 */
+	if (!sheafp || !(*sheafp))
+		return -EINVAL;
+
+	sheaf = *sheafp;
+	if (sheaf->size >= size)
+		return 0;
+
+	if (likely(sheaf->capacity >= size)) {
+		if (likely(sheaf->capacity == s->sheaf_capacity))
+			return refill_sheaf(s, sheaf, gfp);
+
+		if (!__kmem_cache_alloc_bulk(s, gfp, sheaf->capacity - sheaf->size,
+					     &sheaf->objects[sheaf->size])) {
+			return -ENOMEM;
+		}
+		sheaf->size = sheaf->capacity;
+
+		return 0;
+	}
+
+	/*
+	 * We had a regular sized sheaf and need an oversize one, or we had an
+	 * oversize one already but need a larger one now.
+	 * This should be a very rare path so let's not complicate it.
+	 */
+	sheaf = kmem_cache_prefill_sheaf(s, gfp, size);
+	if (!sheaf)
+		return -ENOMEM;
+
+	kmem_cache_return_sheaf(s, gfp, *sheafp);
+	*sheafp = sheaf;
+	return 0;
+}
+
+/*
+ * Allocate from a sheaf obtained by kmem_cache_prefill_sheaf()
+ *
+ * Guaranteed not to fail for as many allocations as the requested size.
+ * After the sheaf is emptied, it fails - no fallback to the slab cache itself.
+ *
+ * The gfp parameter is meant only to specify __GFP_ZERO or __GFP_ACCOUNT;
+ * memcg charging is forced over limit if necessary, to avoid failure.
+ */
+void *
+kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
+				   struct slab_sheaf *sheaf)
+{
+	void *ret = NULL;
+	bool init;
+
+	if (sheaf->size == 0)
+		goto out;
+
+	ret = sheaf->objects[--sheaf->size];
+
+	init = slab_want_init_on_alloc(gfp, s);
+
+	/* add __GFP_NOFAIL to force successful memcg charging */
+	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);
+out:
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
+
+	return ret;
+}
+
+unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf)
+{
+	return sheaf->size;
+}
 /*
  * To avoid unnecessary overhead, we pass through large allocation requests
  * directly to the page allocator. We use __GFP_COMP, because we will need to