From patchwork Mon Feb 3 09:28:49 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 13957131
From: Vlastimil Babka
Date: Mon, 03 Feb 2025 10:28:49 +0100
Subject: [PATCH 3/4] rcu, slab: use a regular callback function for kvfree_rcu
Message-Id: <20250203-slub-tiny-kfree_rcu-v1-3-d4428bf9a8a1@suse.cz>
References: <20250203-slub-tiny-kfree_rcu-v1-0-d4428bf9a8a1@suse.cz>
In-Reply-To: <20250203-slub-tiny-kfree_rcu-v1-0-d4428bf9a8a1@suse.cz>
To: Christoph Lameter, David Rientjes, "Paul E. McKenney", Joel Fernandes,
 Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Andrew Morton, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 linux-mm@kvack.org, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
 Zqiang, rcu@vger.kernel.org, Vlastimil Babka
X-Mailer: b4 0.14.2

RCU has been special-casing callback function pointers that are integers
lower than 4096 as offsets of rcu_head for kvfree() instead. The tree
RCU implementation no longer does that as the batched kvfree_rcu() is
not a simple call_rcu(). The tiny RCU still does, and the plan is also
to make tree RCU use call_rcu() for SLUB_TINY configurations.

Instead of teaching tree RCU again to special case the offsets, let's
remove the special casing completely. Since there's no SLOB anymore, it
is possible to create a callback function that can take a pointer to a
middle of slab object with unknown offset and determine the object's
pointer before freeing it, so implement that as kvfree_rcu_cb().

Large kmalloc and vmalloc allocations are handled simply by aligning
down to page size. For that we retain the requirement that the offset is
smaller than 4096. But we can remove __is_kvfree_rcu_offset() completely
and instead just opencode the condition in the BUILD_BUG_ON() check.

Reviewed-by: Joel Fernandes (Google)
Signed-off-by: Vlastimil Babka
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 include/linux/rcupdate.h | 28 +++++++++++++---------------
 kernel/rcu/tiny.c        | 14 --------------
 mm/slab.h                |  2 ++
 mm/slab_common.c         |  5 ++---
 mm/slub.c                | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 63 insertions(+), 32 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3f70d1c8144426f40553c8c589f07097ece8a706..23bcf71ffb06ecf60d42690803ffc5adb7d1aedd 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define RCU_POINTER_INITIALIZER(p, v) \
 		.p = RCU_INITIALIZER(v)
 
-/*
- * Does the specified offset indicate that the corresponding rcu_head
- * structure can be handled by kvfree_rcu()?
- */
-#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
-
 /**
  * kfree_rcu() - kfree an object after a grace period.
  * @ptr: pointer to kfree for double-argument invocations.
@@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * when they are used in a kernel module, that module must invoke the
  * high-latency rcu_barrier() function at module-unload time.
  *
- * The kfree_rcu() function handles this issue.  Rather than encoding a
- * function address in the embedded rcu_head structure, kfree_rcu() instead
- * encodes the offset of the rcu_head structure within the base structure.
- * Because the functions are not allowed in the low-order 4096 bytes of
- * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
+ * The kfree_rcu() function handles this issue. In order to have a universal
+ * callback function handling different offsets of rcu_head, the callback needs
+ * to determine the starting address of the freed object, which can be a large
+ * kmalloc or vmalloc allocation. To allow simply aligning the pointer down to
+ * page boundary for those, only offsets up to 4095 bytes can be accommodated.
  * If the offset is larger than 4095 bytes, a compile-time error will
  * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
  * either fall back to use of call_rcu() or rearrange the structure to
@@ -1087,14 +1081,18 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  */
 void kvfree_call_rcu(struct rcu_head *head, void *ptr);
 
+/*
+ * The BUILD_BUG_ON() makes sure the rcu_head offset can be handled. See the
+ * comment of kfree_rcu() for details.
+ */
 #define kvfree_rcu_arg_2(ptr, rhf)	\
 do {	\
 	typeof (ptr) ___p = (ptr);	\
 	\
-	if (___p) {	\
-		BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf)));	\
-		kvfree_call_rcu(&((___p)->rhf), (void *) (___p));	\
-	}	\
+	if (___p) {	\
+		BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096);	\
+		kvfree_call_rcu(&((___p)->rhf), (void *) (___p));	\
+	}	\
 } while (0)
 
 #define kvfree_rcu_arg_1(ptr)	\
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 0ec27093d0e14a4b1060ea08932c4ac13f9b0f26..7a34a99d4664f3fc050ec22378782fe26e8d1e95 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -85,15 +85,8 @@ void rcu_sched_clock_irq(int user)
 static inline bool rcu_reclaim_tiny(struct rcu_head *head)
 {
 	rcu_callback_t f;
-	unsigned long offset = (unsigned long)head->func;
 
 	rcu_lock_acquire(&rcu_callback_map);
-	if (__is_kvfree_rcu_offset(offset)) {
-		trace_rcu_invoke_kvfree_callback("", head, offset);
-		kvfree((void *)head - offset);
-		rcu_lock_release(&rcu_callback_map);
-		return true;
-	}
 
 	trace_rcu_invoke_callback("", head);
 	f = head->func;
@@ -159,10 +152,6 @@ void synchronize_rcu(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
-static void tiny_rcu_leak_callback(struct rcu_head *rhp)
-{
-}
-
 /*
  * Post an RCU callback to be invoked after the end of an RCU grace
  * period. But since we have but one CPU, that would be after any
@@ -178,9 +167,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 			pr_err("%s(): Double-freed CB %p->%pS()!!!  ", __func__, head, head->func);
 			mem_dump_obj(head);
 		}
-
-		if (!__is_kvfree_rcu_offset((unsigned long)head->func))
-			WRITE_ONCE(head->func, tiny_rcu_leak_callback);
 		return;
 	}
 
diff --git a/mm/slab.h b/mm/slab.h
index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 			    void **p, int objects, struct slabobj_ext *obj_exts);
 #endif
 
+void kvfree_rcu_cb(struct rcu_head *head);
+
 size_t __ksize(const void *objp);
 
 static inline size_t slab_ksize(const struct kmem_cache *s)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 81a0ce77b11c2ef8db9164cdca5853069402f161..6438a38aa5dc2ede0b5afa04bc3fbff5a4697d87 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1290,7 +1290,7 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 {
 	if (head) {
 		kasan_record_aux_stack(ptr);
-		call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
+		call_rcu(head, kvfree_rcu_cb);
 		return;
 	}
 
@@ -1551,8 +1551,7 @@ kvfree_rcu_list(struct rcu_head *head)
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kvfree_callback("slab", head, offset);
 
-		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
-			kvfree(ptr);
+		kvfree(ptr);
 
 		rcu_lock_release(&rcu_callback_map);
 		cond_resched_tasks_rcu_qs();
diff --git a/mm/slub.c b/mm/slub.c
index 1f50129dcfb3cd1fc76ac9398fa7718cedb42385..e8273f28656936c05d015c53923f8fe69cd161b2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include "slab.h"
+#include
 #include
 #include
 #include
@@ -4728,6 +4729,51 @@ static void free_large_kmalloc(struct folio *folio, void *object)
 	folio_put(folio);
 }
 
+/*
+ * Given an rcu_head embedded within an object obtained from kvmalloc at an
+ * offset < 4k, free the object in question.
+ */
+void kvfree_rcu_cb(struct rcu_head *head)
+{
+	void *obj = head;
+	struct folio *folio;
+	struct slab *slab;
+	struct kmem_cache *s;
+	void *slab_addr;
+
+	if (is_vmalloc_addr(obj)) {
+		obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
+		vfree(obj);
+		return;
+	}
+
+	folio = virt_to_folio(obj);
+	if (!folio_test_slab(folio)) {
+		/*
+		 * rcu_head offset can be only less than page size so no need to
+		 * consider folio order
+		 */
+		obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
+		free_large_kmalloc(folio, obj);
+		return;
+	}
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+	slab_addr = folio_address(folio);
+
+	if (is_kfence_address(obj)) {
+		obj = kfence_object_start(obj);
+	} else {
+		unsigned int idx = __obj_to_index(s, slab_addr, obj);
+
+		obj = slab_addr + s->size * idx;
+		obj = fixup_red_left(s, obj);
+	}
+
+	slab_free(s, slab, obj, _RET_IP_);
+}
+
 /**
  * kfree - free previously allocated memory
  * @object: pointer returned by kmalloc() or kmem_cache_alloc()