From patchwork Mon Apr 27 23:56:18 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11513397
From: Waiman Long
To: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Johannes Weiner, Michal Hocko, Vladimir Davydov
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Juri Lelli, Qian Cai, Waiman Long
Subject: [PATCH v2 1/4] mm, slab: Revert
"extend slab/shrink to shrink all memcg caches" Date: Mon, 27 Apr 2020 19:56:18 -0400 Message-Id: <20200427235621.7823-2-longman@redhat.com> In-Reply-To: <20200427235621.7823-1-longman@redhat.com> References: <20200427235621.7823-1-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When the slub shrink sysfs file is written into, the function call sequence is as follows: kernfs_fop_write => slab_attr_store => shrink_store => kmem_cache_shrink_all It turns out that doing a memcg cache scan in kmem_cache_shrink_all() is redundant as the same memcg cache scan is being done in slab_attr_store(). So revert the commit 04f768a39d55 ("mm, slab: extend slab/shrink to shrink all memcg caches") except the documentation change which is still valid. Signed-off-by: Waiman Long --- mm/slab.h | 1 - mm/slab_common.c | 37 ------------------------------------- mm/slub.c | 2 +- 3 files changed, 1 insertion(+), 39 deletions(-) diff --git a/mm/slab.h b/mm/slab.h index 207c83ef6e06..0937cb2ae8aa 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -237,7 +237,6 @@ int __kmem_cache_shrink(struct kmem_cache *); void __kmemcg_cache_deactivate(struct kmem_cache *s); void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s); void slab_kmem_cache_release(struct kmem_cache *); -void kmem_cache_shrink_all(struct kmem_cache *s); struct seq_file; struct file; diff --git a/mm/slab_common.c b/mm/slab_common.c index 23c7500eea7d..2e367ab8c15c 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -995,43 +995,6 @@ int kmem_cache_shrink(struct kmem_cache *cachep) } EXPORT_SYMBOL(kmem_cache_shrink); -/** - * kmem_cache_shrink_all - shrink a cache and all memcg caches for root cache - * @s: The cache pointer - */ -void kmem_cache_shrink_all(struct kmem_cache *s) -{ - struct kmem_cache *c; - - if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || 
!is_root_cache(s)) { - kmem_cache_shrink(s); - return; - } - - get_online_cpus(); - get_online_mems(); - kasan_cache_shrink(s); - __kmem_cache_shrink(s); - - /* - * We have to take the slab_mutex to protect from the memcg list - * modification. - */ - mutex_lock(&slab_mutex); - for_each_memcg_cache(c, s) { - /* - * Don't need to shrink deactivated memcg caches. - */ - if (s->flags & SLAB_DEACTIVATED) - continue; - kasan_cache_shrink(c); - __kmem_cache_shrink(c); - } - mutex_unlock(&slab_mutex); - put_online_mems(); - put_online_cpus(); -} - bool slab_is_available(void) { return slab_state >= UP; diff --git a/mm/slub.c b/mm/slub.c index 9bf44955c4f1..183ccc364ccf 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5343,7 +5343,7 @@ static ssize_t shrink_store(struct kmem_cache *s, const char *buf, size_t length) { if (buf[0] == '1') - kmem_cache_shrink_all(s); + kmem_cache_shrink(s); else return -EINVAL; return length; From patchwork Mon Apr 27 23:56:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 11513399 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 172F615AB for ; Mon, 27 Apr 2020 23:56:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CB6D721D7B for ; Mon, 27 Apr 2020 23:56:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="J1HHi/UF" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB6D721D7B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id AA8588E0008; Mon, 27 Apr 2020 19:56:52 -0400 (EDT) Delivered-To: 
From patchwork Mon Apr 27 23:56:19 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11513399
From: Waiman Long
To: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Johannes Weiner, Michal Hocko, Vladimir Davydov
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Juri Lelli, Qian Cai, Waiman Long
Subject: [PATCH v2 2/4] mm/slub: Fix slab_mutex circular locking problem in
    slab_attr_store()
Date: Mon, 27 Apr 2020 19:56:19 -0400
Message-Id: <20200427235621.7823-3-longman@redhat.com>
In-Reply-To: <20200427235621.7823-1-longman@redhat.com>
References: <20200427235621.7823-1-longman@redhat.com>

The following lockdep splat was reported:

[  176.241923] ======================================================
[  176.241924] WARNING: possible circular locking dependency detected
[  176.241926] 4.18.0-172.rt13.29.el8.x86_64+debug #1 Not tainted
[  176.241927] ------------------------------------------------------
[  176.241929] slub_cpu_partia/5371 is trying to acquire lock:
[  176.241930] ffffffffa0b83718 (slab_mutex){+.+.}, at: slab_attr_store+0x6b/0xe0
[  176.241941] but task is already holding lock:
[  176.241942] ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
[  176.241947] which lock already depends on the new lock.
[  176.241949] the existing dependency chain (in reverse order) is:
[  176.241949] -> #1 (kn->count#103){++++}:
[  176.241955]        __kernfs_remove+0x616/0x800
[  176.241957]        kernfs_remove_by_name_ns+0x3e/0x80
[  176.241959]        sysfs_slab_add+0x1c6/0x330
[  176.241961]        __kmem_cache_create+0x15f/0x1b0
[  176.241964]        create_cache+0xe1/0x220
[  176.241966]        kmem_cache_create_usercopy+0x1a3/0x260
[  176.241967]        kmem_cache_create+0x12/0x20
[  176.242076]        mlx5_init_fs+0x18d/0x1a00 [mlx5_core]
[  176.242100]        mlx5_load_one+0x3b4/0x1730 [mlx5_core]
[  176.242124]        init_one+0x901/0x11b0 [mlx5_core]
[  176.242127]        local_pci_probe+0xd4/0x180
[  176.242131]        work_for_cpu_fn+0x51/0xa0
[  176.242133]        process_one_work+0x91a/0x1ac0
[  176.242134]        worker_thread+0x536/0xb40
[  176.242136]        kthread+0x30c/0x3d0
[  176.242140]        ret_from_fork+0x27/0x50
[  176.242140] -> #0 (slab_mutex){+.+.}:
[  176.242145]        __lock_acquire+0x22cb/0x48c0
[  176.242146]        lock_acquire+0x134/0x4c0
[  176.242148]        _mutex_lock+0x28/0x40
[  176.242150]        slab_attr_store+0x6b/0xe0
[  176.242151]        kernfs_fop_write+0x251/0x400
[  176.242154]        vfs_write+0x157/0x460
[  176.242155]        ksys_write+0xb8/0x170
[  176.242158]        do_syscall_64+0x13c/0x710
[  176.242160]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[  176.242161] other info that might help us debug this:
[  176.242161]  Possible unsafe locking scenario:
[  176.242162]        CPU0                    CPU1
[  176.242163]        ----                    ----
[  176.242163]   lock(kn->count#103);
[  176.242165]                                lock(slab_mutex);
[  176.242166]                                lock(kn->count#103);
[  176.242167]   lock(slab_mutex);
[  176.242169]  *** DEADLOCK ***
[  176.242170] 3 locks held by slub_cpu_partia/5371:
[  176.242170]  #0: ffff888705e3a800 (sb_writers#4){.+.+}, at: vfs_write+0x31c/0x460
[  176.242174]  #1: ffff889aeec4d658 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1a9/0x400
[  176.242177]  #2: ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
[  176.242180] stack backtrace:
[  176.242183] CPU: 36 PID: 5371 Comm: slub_cpu_partia Not tainted 4.18.0-172.rt13.29.el8.x86_64+debug #1
[  176.242184] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RDY1005C 11/22/2019
[  176.242185] Call Trace:
[  176.242190]  dump_stack+0x9a/0xf0
[  176.242193]  check_noncircular+0x317/0x3c0
[  176.242195]  ? print_circular_bug+0x1e0/0x1e0
[  176.242199]  ? native_sched_clock+0x32/0x1e0
[  176.242202]  ? sched_clock+0x5/0x10
[  176.242205]  ? sched_clock_cpu+0x238/0x340
[  176.242208]  __lock_acquire+0x22cb/0x48c0
[  176.242213]  ? trace_hardirqs_on+0x10/0x10
[  176.242215]  ? trace_hardirqs_on+0x10/0x10
[  176.242218]  lock_acquire+0x134/0x4c0
[  176.242220]  ? slab_attr_store+0x6b/0xe0
[  176.242223]  _mutex_lock+0x28/0x40
[  176.242225]  ? slab_attr_store+0x6b/0xe0
[  176.242227]  slab_attr_store+0x6b/0xe0
[  176.242229]  ? sysfs_file_ops+0x160/0x160
[  176.242230]  kernfs_fop_write+0x251/0x400
[  176.242232]  ? __sb_start_write+0x26a/0x3f0
[  176.242234]  vfs_write+0x157/0x460
[  176.242237]  ksys_write+0xb8/0x170
[  176.242239]  ? __ia32_sys_read+0xb0/0xb0
[  176.242242]  ? do_syscall_64+0xb9/0x710
[  176.242245]  do_syscall_64+0x13c/0x710
[  176.242247]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf

There was another lockdep splat generated by echoing "1" to
"/sys/kernel/slab/fs_cache/shrink":

[  445.231443] Chain exists of:
[  445.231443]   cpu_hotplug_lock --> mem_hotplug_lock --> slab_mutex
[  445.242025]  Possible unsafe locking scenario:
[  445.247977]        CPU0                    CPU1
[  445.252529]        ----                    ----
[  445.257082]   lock(slab_mutex);
[  445.260239]                                lock(mem_hotplug_lock);
[  445.266452]                                lock(slab_mutex);
[  445.272141]   lock(cpu_hotplug_lock);

So it is problematic to use slab_mutex to iterate the list of child
memcg caches with for_each_memcg_cache(). Fortunately, the child memcg
caches can also be iterated by going through the array entries in
memcg_params.memcg_caches while holding a read lock on
memcg_cache_ids_sem. To avoid other possible circular locking problems,
we only take a reference to the child memcg caches and store their
addresses while holding memcg_cache_ids_sem.
The actual store method is called for each of the child caches after
releasing the lock.

Signed-off-by: Waiman Long
---
 mm/slub.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 183ccc364ccf..44cb5215c17f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5567,13 +5567,32 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 		return -EIO;
 
 	err = attribute->store(s, buf, len);
-#ifdef CONFIG_MEMCG
-	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
-		struct kmem_cache *c;
+#ifdef CONFIG_MEMCG_KMEM
+	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
+	    !list_empty(&s->memcg_params.children)) {
+		struct kmem_cache *c, **pcaches;
+		int idx, max, cnt = 0;
+		size_t size, old = s->max_attr_size;
+		struct memcg_cache_array *arr;
+
+		/*
+		 * Make atomic update to s->max_attr_size.
+		 */
+		do {
+			size = old;
+			if (len <= size)
+				break;
+			old = cmpxchg(&s->max_attr_size, size, len);
+		} while (old != size);
 
-		mutex_lock(&slab_mutex);
-		if (s->max_attr_size < len)
-			s->max_attr_size = len;
+		memcg_get_cache_ids();
+		max = memcg_nr_cache_ids;
+
+		pcaches = kmalloc_array(max, sizeof(void *), GFP_KERNEL);
+		if (!pcaches) {
+			memcg_put_cache_ids();
+			return -ENOMEM;
+		}
 
 		/*
 		 * This is a best effort propagation, so this function's return
@@ -5591,10 +5610,33 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 		 * has well defined semantics. The cache being written to
 		 * directly either failed or succeeded, in which case we loop
 		 * through the descendants with best-effort propagation.
+		 *
+		 * To avoid potential circular lock dependency problems, we
+		 * just get a reference and store child cache pointers while
+		 * holding the memcg_cache_ids_sem read lock. The store
+		 * method is then called for each child cache after releasing
+		 * the lock. Code sequence partly borrowed from
+		 * memcg_kmem_get_cache().
 		 */
-		for_each_memcg_cache(c, s)
+		rcu_read_lock();
+		arr = rcu_dereference(s->memcg_params.memcg_caches);
+		for (idx = 0; idx < max; idx++) {
+			c = READ_ONCE(arr->entries[idx]);
+			if (!c)
+				continue;
+			if (!percpu_ref_tryget(&c->memcg_params.refcnt))
+				continue;
+			pcaches[cnt++] = c;
+		}
+		rcu_read_unlock();
+		memcg_put_cache_ids();
+
+		for (idx = 0; idx < cnt; idx++) {
+			c = pcaches[idx];
 			attribute->store(c, buf, len);
-		mutex_unlock(&slab_mutex);
+			percpu_ref_put(&c->memcg_params.refcnt);
+		}
+		kfree(pcaches);
 	}
 #endif
 	return err;
From patchwork Mon Apr 27 23:56:20 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11513403
From: Waiman Long
To: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Johannes Weiner, Michal Hocko, Vladimir Davydov
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Juri Lelli, Qian Cai, Waiman Long
Subject: [PATCH v2 3/4] mm/slub: Fix another circular locking dependency in
    slab_attr_store()
Date: Mon, 27 Apr 2020 19:56:20 -0400
Message-Id: <20200427235621.7823-4-longman@redhat.com>
In-Reply-To: <20200427235621.7823-1-longman@redhat.com>
References: <20200427235621.7823-1-longman@redhat.com>

It turns out that switching from slab_mutex to memcg_cache_ids_sem in
slab_attr_store() does not completely eliminate the circular locking
dependency, as shown by the following lockdep splat when the system is
shut down:

[ 2095.079697] Chain exists of:
[ 2095.079697]   kn->count#278 --> memcg_cache_ids_sem --> slab_mutex
[ 2095.090278]  Possible unsafe locking scenario:
[ 2095.096227]        CPU0                    CPU1
[ 2095.100779]        ----                    ----
[ 2095.105331]   lock(slab_mutex);
[ 2095.108486]                                lock(memcg_cache_ids_sem);
[ 2095.114961]                                lock(slab_mutex);
[ 2095.120649]   lock(kn->count#278);
[ 2095.124068]  *** DEADLOCK ***

To eliminate this possibility, we have to use trylock to acquire
memcg_cache_ids_sem. Unlike slab_mutex, which can be acquired in many
places, the memcg_cache_ids_sem write lock is only acquired in
memcg_alloc_cache_id() to double the size of memcg_nr_cache_ids. So the
chance of successive calls to memcg_alloc_cache_id() within a short
time is pretty low.
As a result, we can retry the read lock acquisition a few times if the
first attempt fails.

Signed-off-by: Waiman Long
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            |  5 +++++
 mm/slub.c                  | 25 +++++++++++++++++++++++--
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d275c72c4f8e..9285f14965b1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1379,6 +1379,7 @@ extern struct workqueue_struct *memcg_kmem_cache_wq;
 extern int memcg_nr_cache_ids;
 void memcg_get_cache_ids(void);
 void memcg_put_cache_ids(void);
+int memcg_tryget_cache_ids(void);
 
 /*
  * Helper macro to loop through all memcg-specific caches. Callers must still
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5beea03dd58a..9fa8535ff72a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -279,6 +279,11 @@ void memcg_get_cache_ids(void)
 	down_read(&memcg_cache_ids_sem);
 }
 
+int memcg_tryget_cache_ids(void)
+{
+	return down_read_trylock(&memcg_cache_ids_sem);
+}
+
 void memcg_put_cache_ids(void)
 {
 	up_read(&memcg_cache_ids_sem);
diff --git a/mm/slub.c b/mm/slub.c
index 44cb5215c17f..cf2114ca27f7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -5572,6 +5573,7 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 	    !list_empty(&s->memcg_params.children)) {
 		struct kmem_cache *c, **pcaches;
 		int idx, max, cnt = 0;
+		int retries = 3;
 		size_t size, old = s->max_attr_size;
 		struct memcg_cache_array *arr;
 
@@ -5585,9 +5587,28 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 			old = cmpxchg(&s->max_attr_size, size, len);
 		} while (old != size);
 
-		memcg_get_cache_ids();
-		max = memcg_nr_cache_ids;
+		/*
+		 * To avoid the following circular lock chain
+		 *
+		 *   kn->count#278 --> memcg_cache_ids_sem --> slab_mutex
+		 *
+		 * we need to use trylock to acquire memcg_cache_ids_sem.
+		 *
+		 * Since the write lock is acquired only in
+		 * memcg_alloc_cache_id() to double the size of
+		 * memcg_nr_cache_ids, the chance of successive
+		 * memcg_alloc_cache_id() calls within a short time is
+		 * very low except at the beginning where the number of
+		 * memory cgroups is low. So we retry a few times to get
+		 * the memcg_cache_ids_sem read lock.
+		 */
+		while (!memcg_tryget_cache_ids()) {
+			if (retries-- <= 0)
+				return -EBUSY;
+			msleep(100);
+		}
 
+		max = memcg_nr_cache_ids;
 		pcaches = kmalloc_array(max, sizeof(void *), GFP_KERNEL);
 		if (!pcaches) {
 			memcg_put_cache_ids();
From patchwork Mon Apr 27 23:56:21 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11513401
From: Waiman Long
To: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Johannes Weiner, Michal Hocko, Vladimir Davydov
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Juri Lelli, Qian Cai, Waiman Long
Subject: [PATCH v2 4/4] mm/slub: Fix sysfs shrink circular locking dependency
Date: Mon, 27 Apr 2020 19:56:21 -0400
Message-Id: <20200427235621.7823-5-longman@redhat.com>
In-Reply-To: <20200427235621.7823-1-longman@redhat.com>
References: <20200427235621.7823-1-longman@redhat.com>

A lockdep splat is observed by echoing "1" to the shrink sysfs file
and then shutting down the system:

[  167.473392] Chain exists of:
[  167.473392]   kn->count#279 --> mem_hotplug_lock.rw_sem --> slab_mutex
[  167.484323]  Possible unsafe locking scenario:
[  167.490273]        CPU0                    CPU1
[  167.494825]        ----                    ----
[  167.499376]   lock(slab_mutex);
[  167.502530]                                lock(mem_hotplug_lock.rw_sem);
[  167.509356]                                lock(slab_mutex);
[  167.515044]   lock(kn->count#279);
[  167.518462]  *** DEADLOCK ***

This is caused by the get_online_cpus() and get_online_mems() calls in
kmem_cache_shrink(), invoked via the shrink sysfs file. To fix that, we
have to use trylock to get the memory and cpu hotplug read locks. Since
hotplug events are rare, it should be fine to refuse a kmem cache
shrink operation when a hotplug event is in progress.
Signed-off-by: Waiman Long
---
 include/linux/memory_hotplug.h |  2 ++
 mm/memory_hotplug.c            |  5 +++++
 mm/slub.c                      | 19 +++++++++++++++----
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 93d9ada74ddd..4ec4b0a2f0fa 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -231,6 +231,7 @@ extern void get_page_bootmem(unsigned long ingo, struct page *page,
 
 void get_online_mems(void);
 void put_online_mems(void);
+int tryget_online_mems(void);
 
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
@@ -274,6 +275,7 @@ static inline int try_online_node(int nid)
 
 static inline void get_online_mems(void) {}
 static inline void put_online_mems(void) {}
+static inline int tryget_online_mems(void) { return 1; }
 static inline void mem_hotplug_begin(void) {}
 static inline void mem_hotplug_done(void) {}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fc0aad0bc1f5..38f9ccec9259 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -59,6 +59,11 @@ void get_online_mems(void)
 	percpu_down_read(&mem_hotplug_lock);
 }
 
+int tryget_online_mems(void)
+{
+	return percpu_down_read_trylock(&mem_hotplug_lock);
+}
+
 void put_online_mems(void)
 {
 	percpu_up_read(&mem_hotplug_lock);
diff --git a/mm/slub.c b/mm/slub.c
index cf2114ca27f7..c4977ac3271b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5343,10 +5343,20 @@ static ssize_t shrink_show(struct kmem_cache *s, char *buf)
 static ssize_t shrink_store(struct kmem_cache *s,
 			const char *buf, size_t length)
 {
-	if (buf[0] == '1')
-		kmem_cache_shrink(s);
-	else
+	if (buf[0] != '1')
 		return -EINVAL;
+
+	if (!cpus_read_trylock())
+		return -EBUSY;
+	if (!tryget_online_mems()) {
+		length = -EBUSY;
+		goto cpus_unlock_out;
+	}
+	kasan_cache_shrink(s);
+	__kmem_cache_shrink(s);
+	put_online_mems();
+cpus_unlock_out:
+	cpus_read_unlock();
 	return length;
 }
 SLAB_ATTR(shrink);
@@ -5654,7 +5664,8 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 
 		for (idx = 0; idx < cnt; idx++) {
 			c = pcaches[idx];
-			attribute->store(c, buf, len);
+			if (attribute->store(c, buf, len) == -EBUSY)
+				err = -EBUSY;
 			percpu_ref_put(&c->memcg_params.refcnt);
 		}
 		kfree(pcaches);