From patchwork Thu Sep 2 21:52:09 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12472715
Date: Thu, 02 Sep 2021 14:52:09 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, bigeasy@linutronix.de, brouer@redhat.com,
 cl@linux.com, efault@gmx.de, iamjoonsoo.kim@lge.com, jannh@google.com,
 linux-mm@kvack.org, mgorman@techsingularity.net, mm-commits@vger.kernel.org,
 penberg@kernel.org, rientjes@google.com, tglx@linutronix.de,
 torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 041/212] mm, slub: protect put_cpu_partial() with disabled
 irqs instead of cmpxchg
Message-ID: <20210902215209.n4o2elMhD%akpm@linux-foundation.org>
In-Reply-To: <20210902144820.78957dff93d7bea620d55a89@linux-foundation.org>
User-Agent: s-nail v14.8.16
From: Vlastimil Babka
Subject: mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg

Jann Horn reported [1] the following theoretically possible race:

  task A: put_cpu_partial() calls preempt_disable()
  task A: oldpage = this_cpu_read(s->cpu_slab->partial)
  interrupt: kfree() reaches unfreeze_partials() and discards the page
  task B (on another CPU): reallocates page as page cache
  task A: reads page->pages and page->pobjects, which are actually halves
    of the pointer page->lru.prev
  task B (on another CPU): frees page
  interrupt: allocates page as SLUB page and places it on the percpu
    partial list
  task A: this_cpu_cmpxchg() succeeds

  which would cause page->pages and page->pobjects to end up containing
  halves of pointers that would then influence when put_cpu_partial()
  happens and show up in root-only sysfs files.  Maybe that's acceptable,
  I don't know.  But there should probably at least be a comment for now
  to point out that we're reading union fields of a page that might be in
  a completely different state.

Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only
safe against s->cpu_slab->partial manipulation in ___slab_alloc() if the
latter disables irqs, otherwise a __slab_free() in an irq handler could
call put_cpu_partial() in the middle of ___slab_alloc() manipulating
->partial and corrupt it.  This becomes an issue on RT after a local_lock
is introduced in a later patch.  The fix means also taking the local_lock
in put_cpu_partial() on RT.

After debugging this issue, Mike Galbraith suggested [2] that to avoid
different locking schemes on RT and !RT, we can just protect
put_cpu_partial() with disabled irqs (to be converted to
local_lock_irqsave() later) everywhere.  This should be acceptable as it's
not a fast path, and moving the actual partial unfreezing outside of the
irq-disabled section keeps that section short, and with the retry loop
gone the code can also be simplified.  In addition, the race reported by
Jann should no longer be possible.
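As an illustration of the pattern described above (not SLUB code), here is a
minimal userspace C sketch of the same idea: do only the cheap list
bookkeeping inside the critical section, detach a full batch there, and run
the expensive drain only after the lock is dropped.  A pthread mutex stands
in for local_irq_save()/local_irq_restore(), and page_stub, drain_batch()
and PARTIAL_LIMIT are names invented for the sketch.

/*
 * Illustrative userspace analogue of the reworked put_cpu_partial() flow.
 * Nothing here is SLUB code; the structures and helpers are made up.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct page_stub {
	struct page_stub *next;
	int objects;
};

static struct page_stub *partial_list;	/* analogue of s->cpu_slab->partial */
static int partial_objects;		/* analogue of oldpage->pobjects     */
static pthread_mutex_t partial_lock = PTHREAD_MUTEX_INITIALIZER;

#define PARTIAL_LIMIT 4

/* Expensive work done outside the critical section, like __unfreeze_partials(). */
static void drain_batch(struct page_stub *batch)
{
	while (batch) {
		struct page_stub *next = batch->next;

		printf("draining page with %d objects\n", batch->objects);
		free(batch);
		batch = next;
	}
}

static void put_partial(struct page_stub *page)
{
	struct page_stub *to_drain = NULL;

	pthread_mutex_lock(&partial_lock);	/* ~ local_irq_save() */

	if (partial_objects > PARTIAL_LIMIT) {
		/* List is full: detach it now, drain it later. */
		to_drain = partial_list;
		partial_list = NULL;
		partial_objects = 0;
	}

	page->next = partial_list;
	partial_list = page;
	partial_objects += page->objects;

	pthread_mutex_unlock(&partial_lock);	/* ~ local_irq_restore() */

	/* The expensive part runs with the lock dropped, keeping the critical section short. */
	if (to_drain)
		drain_batch(to_drain);
}

int main(void)
{
	for (int i = 0; i < 10; i++) {
		struct page_stub *p = calloc(1, sizeof(*p));

		p->objects = i + 1;
		put_partial(p);
	}
	drain_batch(partial_list);	/* single-threaded here, no lock needed */
	return 0;
}

Built with "cc -pthread", the sketch drains a batch each time the list
exceeds PARTIAL_LIMIT, mirroring how the patch below defers
__unfreeze_partials() until after the irq-disabled region.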
[1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/
[2] https://lore.kernel.org/linux-rt-users/e3470ab357b48bccfbd1f5133b982178a7d2befb.camel@gmx.de/

Link: https://lkml.kernel.org/r/20210805152000.12817-34-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Reported-by: Jann Horn
Suggested-by: Mike Galbraith
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Jesper Dangaard Brouer
Cc: Joonsoo Kim
Cc: Mel Gorman
Cc: Pekka Enberg
Cc: Sebastian Andrzej Siewior
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
---

 mm/slub.c |   83 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 38 deletions(-)

--- a/mm/slub.c~mm-slub-protect-put_cpu_partial-with-disabled-irqs-instead-of-cmpxchg
+++ a/mm/slub.c
@@ -2006,7 +2006,12 @@ static inline void *acquire_slab(struct
 	return freelist;
 }
 
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+#else
+static inline void put_cpu_partial(struct kmem_cache *s, struct page *page,
+				   int drain) { }
+#endif
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
 
 /*
@@ -2440,14 +2445,6 @@ static void unfreeze_partials_cpu(struct
 	__unfreeze_partials(s, partial_page);
 }
 
-#else	/* CONFIG_SLUB_CPU_PARTIAL */
-
-static inline void unfreeze_partials(struct kmem_cache *s) { }
-static inline void unfreeze_partials_cpu(struct kmem_cache *s,
-				  struct kmem_cache_cpu *c) { }
-
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
-
 /*
  * Put a page that was just frozen (in __slab_free|get_partial_node) into a
  * partial page slot if available.
@@ -2457,46 +2454,56 @@ static inline void unfreeze_partials_cpu
  */
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
-#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
-	int pages;
-	int pobjects;
+	struct page *page_to_unfreeze = NULL;
+	unsigned long flags;
+	int pages = 0;
+	int pobjects = 0;
 
-	preempt_disable();
-	do {
-		pages = 0;
-		pobjects = 0;
-		oldpage = this_cpu_read(s->cpu_slab->partial);
+	local_irq_save(flags);
+
+	oldpage = this_cpu_read(s->cpu_slab->partial);
 
-		if (oldpage) {
+	if (oldpage) {
+		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
+			/*
+			 * Partial array is full. Move the existing set to the
+			 * per node partial list. Postpone the actual unfreezing
+			 * outside of the critical section.
+			 */
+			page_to_unfreeze = oldpage;
+			oldpage = NULL;
+		} else {
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
-			if (drain && pobjects > slub_cpu_partial(s)) {
-				/*
-				 * partial array is full. Move the existing
-				 * set to the per node partial list.
-				 */
-				unfreeze_partials(s);
-				oldpage = NULL;
-				pobjects = 0;
-				pages = 0;
-				stat(s, CPU_PARTIAL_DRAIN);
-			}
 		}
+	}
 
-		pages++;
-		pobjects += page->objects - page->inuse;
+	pages++;
+	pobjects += page->objects - page->inuse;
 
-		page->pages = pages;
-		page->pobjects = pobjects;
-		page->next = oldpage;
-
-	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
-								!= oldpage);
-	preempt_enable();
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+	page->pages = pages;
+	page->pobjects = pobjects;
+	page->next = oldpage;
+
+	this_cpu_write(s->cpu_slab->partial, page);
+
+	local_irq_restore(flags);
+
+	if (page_to_unfreeze) {
+		__unfreeze_partials(s, page_to_unfreeze);
+		stat(s, CPU_PARTIAL_DRAIN);
+	}
 }
 
+#else	/* CONFIG_SLUB_CPU_PARTIAL */
+
+static inline void unfreeze_partials(struct kmem_cache *s) { }
+static inline void unfreeze_partials_cpu(struct kmem_cache *s,
+				  struct kmem_cache_cpu *c) { }
+
+#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c,
 			      bool lock)
 {
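For ease of review, this is approximately how put_cpu_partial() reads once
the hunks above are applied (reconstructed from the diff; the function is
only compiled when CONFIG_SLUB_CPU_PARTIAL is enabled, otherwise the empty
stub added by the first hunk is used):

static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
{
	struct page *oldpage;
	struct page *page_to_unfreeze = NULL;
	unsigned long flags;
	int pages = 0;
	int pobjects = 0;

	local_irq_save(flags);

	oldpage = this_cpu_read(s->cpu_slab->partial);

	if (oldpage) {
		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
			/*
			 * Partial array is full. Move the existing set to the
			 * per node partial list. Postpone the actual unfreezing
			 * outside of the critical section.
			 */
			page_to_unfreeze = oldpage;
			oldpage = NULL;
		} else {
			pobjects = oldpage->pobjects;
			pages = oldpage->pages;
		}
	}

	pages++;
	pobjects += page->objects - page->inuse;

	page->pages = pages;
	page->pobjects = pobjects;
	page->next = oldpage;

	this_cpu_write(s->cpu_slab->partial, page);

	local_irq_restore(flags);

	if (page_to_unfreeze) {
		__unfreeze_partials(s, page_to_unfreeze);
		stat(s, CPU_PARTIAL_DRAIN);
	}
}

Only pointer and counter updates remain inside the irq-disabled region; the
potentially expensive __unfreeze_partials() call runs after
local_irq_restore(), and the old cmpxchg retry loop is gone.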