From patchwork Fri Mar 1 17:07:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 13578811 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45561C5478C for ; Fri, 1 Mar 2024 17:07:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BA2246B009F; Fri, 1 Mar 2024 12:07:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B52EC6B009D; Fri, 1 Mar 2024 12:07:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 97D3E6B009F; Fri, 1 Mar 2024 12:07:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 736896B009C for ; Fri, 1 Mar 2024 12:07:16 -0500 (EST) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id E1B671C0CCD for ; Fri, 1 Mar 2024 17:07:15 +0000 (UTC) X-FDA: 81849100830.26.89C4276 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by imf06.hostedemail.com (Postfix) with ESMTP id 4BC6F180024 for ; Fri, 1 Mar 2024 17:07:12 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=none; spf=pass (imf06.hostedemail.com: domain of vbabka@suse.cz designates 195.135.223.130 as permitted sender) smtp.mailfrom=vbabka@suse.cz; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1709312832; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XQGU1Qk0c+C8g3iQCGZYbUkVzv136rhWCBBorzn4hKY=; b=3+Yck4fVmMKy8eSE+lRXW3ZbbWJsgbonzx5uS31n1v/g8aAgTSm3/fvC7cJ7kbg/Ebz2mL ZzB1o3DWInZi4pHSKnE8ZxuoyuPGUoTqTW/sJFOpzXbEjt8wX62L49PPUd4IVR4S/UHD3r Vwt/ZBdc0g+66E6/RNKloPHyeUJfBT8= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=none; spf=pass (imf06.hostedemail.com: domain of vbabka@suse.cz designates 195.135.223.130 as permitted sender) smtp.mailfrom=vbabka@suse.cz; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1709312832; a=rsa-sha256; cv=none; b=G7yDmRN346N57b1TZJUol+IsSH0zZHoNBklx9yxuI4e8rByZ+TX6iV/+Il+EfNzC2jQHzf iyJxtEaA15hassNc1Ojn95VhxTJleFa3QtfX2VJvmA510CFaxhwde8PHAI3p6CMn3L788X 9mBOXrZi2QPoUKteeLYTiIu+AU/RIGY= Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id B170733D2C; Fri, 1 Mar 2024 17:07:10 +0000 (UTC) Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 7CD6813AA3; Fri, 1 Mar 2024 17:07:10 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id 6PIzHj4L4mUcGQAAD6G6ig (envelope-from ); Fri, 01 Mar 2024 17:07:10 +0000 From: Vlastimil Babka Date: Fri, 01 Mar 2024 18:07:08 +0100 Subject: [PATCH RFC 1/4] mm, slab: move memcg charging to post-alloc hook MIME-Version: 1.0 Message-Id: <20240301-slab-memcg-v1-1-359328a46596@suse.cz> References: <20240301-slab-memcg-v1-0-359328a46596@suse.cz> In-Reply-To: <20240301-slab-memcg-v1-0-359328a46596@suse.cz> To: Linus Torvalds , Josh Poimboeuf , Jeff Layton , Chuck Lever , Kees Cook , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Andrew Morton , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , Alexander Viro , Christian Brauner , Jan Kara Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-fsdevel@vger.kernel.org, Vlastimil Babka X-Mailer: b4 0.13.0 X-Rspamd-Queue-Id: 4BC6F180024 X-Rspam-User: X-Stat-Signature: rywhcnit4f8sjbosi7tqu91a7n5obwnu X-Rspamd-Server: rspam01 X-HE-Tag: 1709312832-44656 X-HE-Meta: U2FsdGVkX1/IFuBMFK4ykNDd3ZMDoxMqHS71Pd31NNYxHeikuO5eAwpbYRkUxkF2fw0i0B2bMdj0aZdRMF9UvTmRRCv2Ch0Zy09q11U6megikczItBiFatGGwvrYSeCKUZP5VWApx9p5lpEtrH0d8w0otg+bT7HZfONeI3wzOuRen2zLRIT2due++2FzGrLiTvQ2MQfQChF3mOsNqifooAKbG4Ve+GQGDY4yqNQ4qrEZe8UunUxxEUVYDuP681VDZC9Hir4yjwj25nuSDjFpCdsoUO0Rwv8QTJIdYJIiGkOstqH3P2oo7C4n+IEpsCPcZkU50TqpS3TUA6lKw9KbBKc9oUhivQ9+70krn1claP6mws1HclryW3+xs6zKk4ZPyuRFx+otp05VrWL+k1YihH4GTYVc7BL2T2QWbfOQE0/tBtPETPmMavsXPVSy4Onz5SxP32paon+picIty6wToZrQUZp6KkYcepbBTfg+EJ5BTuF5qphyFn0uxgLp7yoNG9MwvwiHgJCgx1CKmgw5kAN738N17hiAGbE4XzMCcmIhDx4vuv33EwykvsXHHZtPSIen9MgT38NQIuBXdp/6EV5K5zgdEdS/GN3w5xix/JZ/DNStC69w/Ld/VvitQkXKF+xXTqGG3CC4AWoqJtoZ6Tj7yUC9wDIfJPEF38DtB5MQh6ivForsXLJd6HdL5WEEfOpkkjX6Xcge2ANMxLm0Zqxf14jL7v0To1zVMcMGPPyMnf+lxrwRauyeb31lzWQ+qaOm6YUvaovyXcmffVp+US722xzBHs1QLuGJKv05LxGaJGBqCLyYRyFbz4Gf4eL3GtAS2TiPOFQroh0OYyO39VSVRxEs9ymhN4cszL7rmgA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The MEMCG_KMEM integration with slab currently relies on two hooks during allocation. memcg_slab_pre_alloc_hook() determines the objcg and charges it, and memcg_slab_post_alloc_hook() assigns the objcg pointer to the allocated object(s). As Linus pointed out, this is unnecessarily complex. Failing to charge due to memcg limits should be rare, so we can optimistically allocate the object(s) and do the charging together with assigning the objcg pointer in a single post_alloc hook. In the rare case the charging fails, we can free the object(s) back. This simplifies the code (no need to pass around the objcg pointer) and potentially allows to separate charging from allocation in cases where it's common that the allocation would be immediately freed, and the memcg handling overhead could be saved. Suggested-by: Linus Torvalds Link: https://lore.kernel.org/all/CAHk-=whYOOdM7jWy5jdrAm8LxcgCMFyk2bt8fYYvZzM4U-zAQA@mail.gmail.com/ Signed-off-by: Vlastimil Babka Reviewed-by: Roman Gushchin Reviewed-by: Chengming Zhou --- mm/slub.c | 180 +++++++++++++++++++++++++++----------------------------------- 1 file changed, 77 insertions(+), 103 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 2ef88bbf56a3..7022a1246bab 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1897,23 +1897,36 @@ static inline size_t obj_full_size(struct kmem_cache *s) return s->size + sizeof(struct obj_cgroup *); } -/* - * Returns false if the allocation should fail. - */ -static bool __memcg_slab_pre_alloc_hook(struct kmem_cache *s, - struct list_lru *lru, - struct obj_cgroup **objcgp, - size_t objects, gfp_t flags) +static bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, + struct list_lru *lru, + gfp_t flags, size_t size, + void **p) { + struct obj_cgroup *objcg; + struct slab *slab; + unsigned long off; + size_t i; + /* * The obtained objcg pointer is safe to use within the current scope, * defined by current task or set_active_memcg() pair. * obj_cgroup_get() is used to get a permanent reference. */ - struct obj_cgroup *objcg = current_obj_cgroup(); + objcg = current_obj_cgroup(); if (!objcg) return true; + /* + * slab_alloc_node() avoids the NULL check, so we might be called with a + * single NULL object. kmem_cache_alloc_bulk() aborts if it can't fill + * the whole requested size. + * return success as there's nothing to free back + */ + if (unlikely(*p == NULL)) + return true; + + flags &= gfp_allowed_mask; + if (lru) { int ret; struct mem_cgroup *memcg; @@ -1926,71 +1939,51 @@ static bool __memcg_slab_pre_alloc_hook(struct kmem_cache *s, return false; } - if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) + if (obj_cgroup_charge(objcg, flags, size * obj_full_size(s))) return false; - *objcgp = objcg; + for (i = 0; i < size; i++) { + slab = virt_to_slab(p[i]); + + if (!slab_objcgs(slab) && + memcg_alloc_slab_cgroups(slab, s, flags, false)) { + obj_cgroup_uncharge(objcg, obj_full_size(s)); + continue; + } + + off = obj_to_index(s, slab, p[i]); + obj_cgroup_get(objcg); + slab_objcgs(slab)[off] = objcg; + mod_objcg_state(objcg, slab_pgdat(slab), + cache_vmstat_idx(s), obj_full_size(s)); + } + return true; } -/* - * Returns false if the allocation should fail. - */ +static void memcg_alloc_abort_single(struct kmem_cache *s, void *object); + static __fastpath_inline -bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, struct list_lru *lru, - struct obj_cgroup **objcgp, size_t objects, - gfp_t flags) +bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru, + gfp_t flags, size_t size, void **p) { - if (!memcg_kmem_online()) + if (likely(!memcg_kmem_online())) return true; if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))) return true; - return likely(__memcg_slab_pre_alloc_hook(s, lru, objcgp, objects, - flags)); -} - -static void __memcg_slab_post_alloc_hook(struct kmem_cache *s, - struct obj_cgroup *objcg, - gfp_t flags, size_t size, - void **p) -{ - struct slab *slab; - unsigned long off; - size_t i; - - flags &= gfp_allowed_mask; - - for (i = 0; i < size; i++) { - if (likely(p[i])) { - slab = virt_to_slab(p[i]); - - if (!slab_objcgs(slab) && - memcg_alloc_slab_cgroups(slab, s, flags, false)) { - obj_cgroup_uncharge(objcg, obj_full_size(s)); - continue; - } + if (likely(__memcg_slab_post_alloc_hook(s, lru, flags, size, p))) + return true; - off = obj_to_index(s, slab, p[i]); - obj_cgroup_get(objcg); - slab_objcgs(slab)[off] = objcg; - mod_objcg_state(objcg, slab_pgdat(slab), - cache_vmstat_idx(s), obj_full_size(s)); - } else { - obj_cgroup_uncharge(objcg, obj_full_size(s)); - } + if (likely(size == 1)) { + memcg_alloc_abort_single(s, p); + *p = NULL; + } else { + kmem_cache_free_bulk(s, size, p); } -} - -static __fastpath_inline -void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, - gfp_t flags, size_t size, void **p) -{ - if (likely(!memcg_kmem_online() || !objcg)) - return; - return __memcg_slab_post_alloc_hook(s, objcg, flags, size, p); + return false; } static void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, @@ -2029,14 +2022,6 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p, __memcg_slab_free_hook(s, slab, p, objects, objcgs); } - -static inline -void memcg_slab_alloc_error_hook(struct kmem_cache *s, int objects, - struct obj_cgroup *objcg) -{ - if (objcg) - obj_cgroup_uncharge(objcg, objects * obj_full_size(s)); -} #else /* CONFIG_MEMCG_KMEM */ static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr) { @@ -2047,31 +2032,18 @@ static inline void memcg_free_slab_cgroups(struct slab *slab) { } -static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, - struct list_lru *lru, - struct obj_cgroup **objcgp, - size_t objects, gfp_t flags) -{ - return true; -} - -static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, - struct obj_cgroup *objcg, +static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s, + struct list_lru *lru, gfp_t flags, size_t size, void **p) { + return true; } static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p, int objects) { } - -static inline -void memcg_slab_alloc_error_hook(struct kmem_cache *s, int objects, - struct obj_cgroup *objcg) -{ -} #endif /* CONFIG_MEMCG_KMEM */ /* @@ -3751,10 +3723,7 @@ noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags) ALLOW_ERROR_INJECTION(should_failslab, ERRNO); static __fastpath_inline -struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, - struct list_lru *lru, - struct obj_cgroup **objcgp, - size_t size, gfp_t flags) +struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags) { flags &= gfp_allowed_mask; @@ -3763,14 +3732,11 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, if (unlikely(should_failslab(s, flags))) return NULL; - if (unlikely(!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))) - return NULL; - return s; } static __fastpath_inline -void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, +bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru, gfp_t flags, size_t size, void **p, bool init, unsigned int orig_size) { @@ -3819,7 +3785,7 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, kmsan_slab_alloc(s, p[i], init_flags); } - memcg_slab_post_alloc_hook(s, objcg, flags, size, p); + return memcg_slab_post_alloc_hook(s, lru, flags, size, p); } /* @@ -3836,10 +3802,9 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list gfp_t gfpflags, int node, unsigned long addr, size_t orig_size) { void *object; - struct obj_cgroup *objcg = NULL; bool init = false; - s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags); + s = slab_pre_alloc_hook(s, gfpflags); if (unlikely(!s)) return NULL; @@ -3856,8 +3821,10 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list /* * When init equals 'true', like for kzalloc() family, only * @orig_size bytes might be zeroed instead of s->object_size + * In case this fails due to memcg_slab_post_alloc_hook(), + * object is set to NULL */ - slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init, orig_size); + slab_post_alloc_hook(s, lru, gfpflags, 1, &object, init, orig_size); return object; } @@ -4300,6 +4267,16 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object, do_slab_free(s, slab, object, object, 1, addr); } +#ifdef CONFIG_MEMCG_KMEM +/* Do not inline the rare memcg charging failed path into the allocation path */ +static noinline +void memcg_alloc_abort_single(struct kmem_cache *s, void *object) +{ + if (likely(slab_free_hook(s, object, slab_want_init_on_free(s)))) + do_slab_free(s, virt_to_slab(object), object, object, 1, _RET_IP_); +} +#endif + static __fastpath_inline void slab_free_bulk(struct kmem_cache *s, struct slab *slab, void *head, void *tail, void **p, int cnt, unsigned long addr) @@ -4635,29 +4612,26 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p) { int i; - struct obj_cgroup *objcg = NULL; if (!size) return 0; - /* memcg and kmem_cache debug support */ - s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags); + s = slab_pre_alloc_hook(s, flags); if (unlikely(!s)) return 0; i = __kmem_cache_alloc_bulk(s, flags, size, p); + if (unlikely(i == 0)) + return 0; /* * memcg and kmem_cache debug support and memory initialization. * Done outside of the IRQ disabled fastpath loop. */ - if (likely(i != 0)) { - slab_post_alloc_hook(s, objcg, flags, size, p, - slab_want_init_on_alloc(flags, s), s->object_size); - } else { - memcg_slab_alloc_error_hook(s, size, objcg); + if (unlikely(!slab_post_alloc_hook(s, NULL, flags, size, p, + slab_want_init_on_alloc(flags, s), s->object_size))) { + return 0; } - return i; } EXPORT_SYMBOL(kmem_cache_alloc_bulk);