From patchwork Thu Jan 23 10:37:20 2025
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 13948126
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 23 Jan 2025 11:37:20 +0100
Subject: [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu
MIME-Version: 1.0
Message-Id: <20250123-slub-tiny-kfree_rcu-v1-3-0e386ef1541a@suse.cz>
References: <20250123-slub-tiny-kfree_rcu-v1-0-0e386ef1541a@suse.cz>
In-Reply-To: <20250123-slub-tiny-kfree_rcu-v1-0-0e386ef1541a@suse.cz>
To: Christoph Lameter, David Rientjes, "Paul E. McKenney", Joel Fernandes,
 Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Andrew Morton, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 linux-mm@kvack.org, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
 Zqiang, rcu@vger.kernel.org, Vlastimil Babka
X-Mailer: b4 0.14.2

RCU has been special-casing callback function pointers that are
integers lower than 4096 as offsets of rcu_head for kvfree() instead.
The tree RCU implementation no longer does that as the batched
kvfree_rcu() is not a simple call_rcu(). The tiny RCU still does, and
the plan is also to make tree RCU use call_rcu() for SLUB_TINY
configurations.

Instead of teaching tree RCU again to special-case the offsets, let's
remove the special casing completely. Since there's no SLOB anymore, it
is possible to create a callback function that can take a pointer into
the middle of a slab object at an unknown offset and determine the
object's starting pointer before freeing it, so implement that as
kvfree_rcu_cb().

Large kmalloc and vmalloc allocations are handled simply by aligning
down to page size. For that we retain the requirement that the offset
is smaller than 4096. But we can remove __is_kvfree_rcu_offset()
completely and instead just opencode the condition in the
BUILD_BUG_ON() check.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/rcupdate.h | 24 +++++++++---------------
 kernel/rcu/tiny.c        | 13 -------------
 mm/slab.h                |  2 ++
 mm/slab_common.c         |  5 +----
 mm/slub.c                | 42 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3f70d1c8144426f40553c8c589f07097ece8a706..7ff16a70ca1c0fb1012c4118388f60687c5e5b3f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define RCU_POINTER_INITIALIZER(p, v) \
 	.p = RCU_INITIALIZER(v)
 
-/*
- * Does the specified offset indicate that the corresponding rcu_head
- * structure can be handled by kvfree_rcu()?
- */
-#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
-
 /**
  * kfree_rcu() - kfree an object after a grace period.
  * @ptr: pointer to kfree for double-argument invocations.
@@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * when they are used in a kernel module, that module must invoke the
  * high-latency rcu_barrier() function at module-unload time.
  *
- * The kfree_rcu() function handles this issue. Rather than encoding a
- * function address in the embedded rcu_head structure, kfree_rcu() instead
- * encodes the offset of the rcu_head structure within the base structure.
- * Because the functions are not allowed in the low-order 4096 bytes of
- * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
+ * The kfree_rcu() function handles this issue. In order to have a universal
+ * callback function handling different offsets of rcu_head, the callback needs
+ * to determine the starting address of the freed object, which can be a large
+ * kmalloc or vmalloc allocation. To allow simply aligning the pointer down to
+ * page boundary for those, only offsets up to 4095 bytes can be accommodated.
  * If the offset is larger than 4095 bytes, a compile-time error will
  * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
 * either fall back to use of call_rcu() or rearrange the structure to
@@ -1091,10 +1085,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr);
 do {									\
 	typeof (ptr) ___p = (ptr);					\
 									\
-	if (___p) {							\
-		BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
-		kvfree_call_rcu(&((___p)->rhf), (void *) (___p));	\
-	}								\
+	if (___p) {							\
+		BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096);	\
+		kvfree_call_rcu(&((___p)->rhf), (void *) (___p));	\
+	}								\
 } while (0)
 
 #define kvfree_rcu_arg_1(ptr)						\
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 0ec27093d0e14a4b1060ea08932c4ac13f9b0f26..77e0db0221364376a99ebeb17485650879385a6e 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -88,12 +88,6 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
 	unsigned long offset = (unsigned long)head->func;
 
 	rcu_lock_acquire(&rcu_callback_map);
-	if (__is_kvfree_rcu_offset(offset)) {
-		trace_rcu_invoke_kvfree_callback("", head, offset);
-		kvfree((void *)head - offset);
-		rcu_lock_release(&rcu_callback_map);
-		return true;
-	}
 
 	trace_rcu_invoke_callback("", head);
 	f = head->func;
@@ -159,10 +153,6 @@ void synchronize_rcu(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
-static void tiny_rcu_leak_callback(struct rcu_head *rhp)
-{
-}
-
 /*
  * Post an RCU callback to be invoked after the end of an RCU grace
  * period. But since we have but one CPU, that would be after any
@@ -178,9 +168,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 			pr_err("%s(): Double-freed CB %p->%pS()!!!  ", __func__, head, head->func);
 			mem_dump_obj(head);
 		}
-
-		if (!__is_kvfree_rcu_offset((unsigned long)head->func))
-			WRITE_ONCE(head->func, tiny_rcu_leak_callback);
 		return;
 	}
diff --git a/mm/slab.h b/mm/slab.h
index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 			    int objects, struct slabobj_ext *obj_exts);
 #endif
 
+void kvfree_rcu_cb(struct rcu_head *head);
+
 size_t __ksize(const void *objp);
 
 static inline size_t slab_ksize(const struct kmem_cache *s)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 330cdd8ebc5380090ee784c58e8ca1d1a52b3758..f13d2c901daf1419993620459fbd5845eecb85f1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1532,9 +1532,6 @@ kvfree_rcu_list(struct rcu_head *head)
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kvfree_callback("slab", head, offset);
 
-		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
-			kvfree(ptr);
-
 		rcu_lock_release(&rcu_callback_map);
 		cond_resched_tasks_rcu_qs();
 	}
@@ -1867,7 +1864,7 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 {
 	if (head) {
 		kasan_record_aux_stack_noalloc(ptr);
-		call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
+		call_rcu(head, kvfree_rcu_cb);
 		return;
 	}
diff --git a/mm/slub.c b/mm/slub.c
index c2151c9fee228d121a9cbcc220c3ae054769dacf..651381bf05566e88de8493e0550f121d23b757a1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include "slab.h"
+#include 
 #include 
 #include 
 #include 
@@ -4732,6 +4733,47 @@ static void free_large_kmalloc(struct folio *folio, void *object)
 	folio_put(folio);
 }
 
+void kvfree_rcu_cb(struct rcu_head *head)
+{
+	void *obj = head;
+	struct folio *folio;
+	struct slab *slab;
+	struct kmem_cache *s;
+	void *slab_addr;
+
+	if (unlikely(is_vmalloc_addr(obj))) {
+		obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
+		vfree(obj);
+		return;
+	}
+
+	folio = virt_to_folio(obj);
+	if (unlikely(!folio_test_slab(folio))) {
+		/*
+		 * The rcu_head offset can only be less than the page size, so
+		 * there is no need to consider the folio order.
+		 */
+		obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
+		free_large_kmalloc(folio, obj);
+		return;
+	}
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+	slab_addr = folio_address(folio);
+
+	if (is_kfence_address(obj)) {
+		obj = kfence_object_start(obj);
+	} else {
+		unsigned int idx = __obj_to_index(s, slab_addr, obj);
+
+		obj = slab_addr + s->size * idx;
+		obj = fixup_red_left(s, obj);
+	}
+
+	slab_free(s, slab, obj, _RET_IP_);
+}
+
 /**
  * kfree - free previously allocated memory
  * @object: pointer returned by kmalloc() or kmem_cache_alloc()
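
[Illustration, not part of the patch] The commit message describes two ways
kvfree_rcu_cb() recovers an object's start from a pointer to an rcu_head
embedded at an arbitrary offset below 4096: aligning down to a page boundary
for page-aligned (large kmalloc / vmalloc) allocations, and an index
computation from the slab base address and object size for slab objects.
The following minimal userspace C sketch shows both ideas under stated
assumptions; the demo_* names, the 512-byte object size and the fake slab
layout are hypothetical stand-ins, not kernel helpers.

/*
 * Illustration only: mimics the two pointer-recovery strategies used by
 * kvfree_rcu_cb().  Compile with: cc -o demo demo.c
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096UL

/*
 * Case 1: page-aligned allocations (large kmalloc, vmalloc).  Because the
 * rcu_head offset is guaranteed to be smaller than 4096, aligning the head
 * pointer down to a page boundary recovers the allocation start
 * (the PAGE_ALIGN_DOWN() step in the patch).
 */
static void *demo_page_align_down(const void *p)
{
	return (void *)((uintptr_t)p & ~(DEMO_PAGE_SIZE - 1));
}

/*
 * Case 2: slab objects.  Knowing the slab's base address and the cache's
 * object size, the object index and hence the object start can be computed
 * from any pointer into the object (the __obj_to_index() step in the patch).
 */
static void *demo_obj_start(const void *slab_addr, size_t obj_size, const void *p)
{
	size_t idx = ((uintptr_t)p - (uintptr_t)slab_addr) / obj_size;

	return (char *)slab_addr + idx * obj_size;
}

int main(void)
{
	/* a page-aligned allocation with a "head" embedded at offset 200 */
	char *big = aligned_alloc(DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);

	assert(demo_page_align_down(big + 200) == big);

	/* a fake "slab" holding 8 objects of 512 bytes each */
	char *slab_base = aligned_alloc(DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);
	char *third_obj = slab_base + 2 * 512;

	/* a pointer 40 bytes into the third object maps back to its start */
	assert(demo_obj_start(slab_base, 512, third_obj + 40) == third_obj);

	puts("both recovery strategies work");
	free(big);
	free(slab_base);
	return 0;
}

The real callback additionally uses virt_to_folio()/folio_test_slab() to tell
the two cases apart, and fixup_red_left()/kfence_object_start() to handle
debug redzones and KFENCE objects, as seen in the mm/slub.c hunk above.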