From patchwork Mon Jul 15 20:29:31 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 13733873
From: Vlastimil Babka
Date: Mon, 15 Jul 2024 22:29:31 +0200
Subject: [PATCH RFC 5/6] mm, slab: asynchronously destroy caches with
 outstanding objects
Message-Id: <20240715-b4-slab-kfree_rcu-destroy-v1-5-46b2984c2205@suse.cz>
References: <20240715-b4-slab-kfree_rcu-destroy-v1-0-46b2984c2205@suse.cz>
In-Reply-To: <20240715-b4-slab-kfree_rcu-destroy-v1-0-46b2984c2205@suse.cz>
To: "Paul E. McKenney", Joel Fernandes, Josh Triplett, Boqun Feng,
 Christoph Lameter, David Rientjes
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
 Julia Lawall, Jakub Kicinski, "Jason A. Donenfeld",
 "Uladzislau Rezki (Sony)", Andrew Morton, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, rcu@vger.kernel.org, Vlastimil Babka
X-Mailer: b4 0.14.0

We would like to replace call_rcu() users with kfree_rcu() where the
existing callback is just a kmem_cache_free(). However this causes
issues when the cache can be destroyed (such as due to module unload).

Currently such modules should be issuing rcu_barrier() before
kmem_cache_destroy() to have their call_rcu() callbacks processed first.
This barrier is however not sufficient for kfree_rcu() in flight due
to the batching introduced by a35d16905efc ("rcu: Add basic support for
kfree_rcu() batching").

This is not a problem for kmalloc caches, which are never destroyed, but
since removing SLOB, kfree_rcu() is also allowed for any other cache,
which might be destroyed.

In order not to complicate the API, put the responsibility for handling
outstanding kfree_rcu() in kmem_cache_destroy() itself. Use the result
of __kmem_cache_shutdown() to determine if there are still allocated
objects in the cache, and if there are, assume it's due to kfree_rcu().
In that case schedule a work item that will use the appropriate barrier
and then attempt __kmem_cache_shutdown() again. Only if that fails as
well, produce the usual warning about non-freed objects.

Sysfs and debugfs directories are removed immediately, so the cache can
be recreated with the same name without issues, while the previous
instance is still pending removal.

Users of call_rcu() with arbitrary callbacks should still perform their
own synchronous barrier before destroying the cache and unloading the
module, as the callbacks may be invoking module code or performing other
actions that are necessary for a successful unload.

Note that another non-bug reason why there might be objects outstanding
is the kasan quarantine. In that case the cleanup also becomes
asynchronous, and flushing the quarantine by kasan_cache_shutdown(s) is
only done in the workfn.

Signed-off-by: Vlastimil Babka
---
 mm/slab.h        |  4 +++-
 mm/slab_common.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
 mm/slub.c        |  9 +++++----
 3 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index ece18ef5dd04..390a4e265f03 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -279,6 +279,8 @@ struct kmem_cache {
 	unsigned int red_left_pad;	/* Left redzone padding size */
 	const char *name;		/* Name (only for display!) */
 	struct list_head list;		/* List of slab caches */
+	struct work_struct async_destroy_work;
+
 #ifdef CONFIG_SYSFS
 	struct kobject kobj;		/* For sysfs */
 #endif
@@ -478,7 +480,7 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
 			      SLAB_NO_USER_FLAGS)
 
 bool __kmem_cache_empty(struct kmem_cache *);
-int __kmem_cache_shutdown(struct kmem_cache *);
+int __kmem_cache_shutdown(struct kmem_cache *, bool);
 void __kmem_cache_release(struct kmem_cache *);
 int __kmem_cache_shrink(struct kmem_cache *);
 void slab_kmem_cache_release(struct kmem_cache *);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 57962e1a5a86..3e15525819b6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -44,6 +44,8 @@ static LIST_HEAD(slab_caches_to_rcu_destroy);
 static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work);
 static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
 		    slab_caches_to_rcu_destroy_workfn);
+static void kmem_cache_kfree_rcu_destroy_workfn(struct work_struct *work);
+
 
 /*
  * Set of flags that will prevent slab merging
@@ -235,6 +237,7 @@ static struct kmem_cache *create_cache(const char *name,
 
 	s->refcount = 1;
 	list_add(&s->list, &slab_caches);
+	INIT_WORK(&s->async_destroy_work, kmem_cache_kfree_rcu_destroy_workfn);
 	return s;
 
 out_free_cache:
@@ -535,6 +538,47 @@ void slab_kmem_cache_release(struct kmem_cache *s)
 	kmem_cache_free(kmem_cache, s);
 }
 
+static void kmem_cache_kfree_rcu_destroy_workfn(struct work_struct *work)
+{
+	struct kmem_cache *s;
+	bool rcu_set;
+	int err;
+
+	s = container_of(work, struct kmem_cache, async_destroy_work);
+
+	// XXX use the real kmem_cache_free_barrier() or similar thing here
+	rcu_barrier();
+
+	cpus_read_lock();
+	mutex_lock(&slab_mutex);
+
+	rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
+
+	/* free asan quarantined objects */
+	kasan_cache_shutdown(s);
+
+	err = __kmem_cache_shutdown(s, true);
+	WARN(err, "kmem_cache_destroy %s: Slab cache still has objects",
+	     s->name);
+
+	if (err)
+		goto out_unlock;
+
+	list_del(&s->list);
+
+	if (rcu_set) {
+		list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
+		schedule_work(&slab_caches_to_rcu_destroy_work);
+	}
+
+out_unlock:
+	mutex_unlock(&slab_mutex);
+	cpus_read_unlock();
+
+	if (!err && !rcu_set)
+		kmem_cache_release(s);
+}
+
 void kmem_cache_destroy(struct kmem_cache *s)
 {
 	bool rcu_set;
@@ -558,9 +602,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	/* free asan quarantined objects */
 	kasan_cache_shutdown(s);
 
-	err = __kmem_cache_shutdown(s);
-	WARN(err, "%s %s: Slab cache still has objects when called from %pS",
-	     __func__, s->name, (void *)_RET_IP_);
+	err = __kmem_cache_shutdown(s, false);
 
 	if (!err)
 		list_del(&s->list);
@@ -573,8 +615,10 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	}
 	debugfs_slab_release(s);
 
-	if (err)
+	if (err) {
+		schedule_work(&s->async_destroy_work);
 		return;
+	}
 
 	if (rcu_set) {
 		mutex_lock(&slab_mutex);
diff --git a/mm/slub.c b/mm/slub.c
index aa4d80109c49..c1222467c346 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5352,7 +5352,8 @@ static void list_slab_objects(struct kmem_cache *s, struct slab *slab,
  * This is called from __kmem_cache_shutdown(). We must take list_lock
  * because sysfs file might still access partial list after the shutdowning.
  */
-static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
+static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n,
+			 bool warn_inuse)
 {
 	LIST_HEAD(discard);
 	struct slab *slab, *h;
@@ -5363,7 +5364,7 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
 		if (!slab->inuse) {
 			remove_partial(n, slab);
 			list_add(&slab->slab_list, &discard);
-		} else {
+		} else if (warn_inuse) {
 			list_slab_objects(s, slab,
 			  "Objects remaining in %s on __kmem_cache_shutdown()");
 		}
@@ -5388,7 +5389,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
 /*
  * Release all resources used by a slab cache.
  */
-int __kmem_cache_shutdown(struct kmem_cache *s)
+int __kmem_cache_shutdown(struct kmem_cache *s, bool warn_inuse)
 {
 	int node;
 	struct kmem_cache_node *n;
@@ -5396,7 +5397,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	flush_all_cpus_locked(s);
 	/* Attempt to free all objects */
 	for_each_kmem_cache_node(s, node, n) {
-		free_partial(s, n);
+		free_partial(s, n, warn_inuse);
 		if (n->nr_partial || node_nr_slabs(n))
 			return 1;
 	}