From patchwork Mon Aug 7 11:09:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13343699
From: Qi Zheng
To: akpm@linux-foundation.org, david@fromorbit.com, tkhai@ya.ru,
    vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org,
    brauner@kernel.org, paulmck@kernel.org, tytso@mit.edu,
    steven.price@arm.com, cel@kernel.org, senozhatsky@chromium.org,
    yujie.liu@intel.com, gregkh@linuxfoundation.org, muchun.song@linux.dev,
    simon.horman@corigine.com, dlemoal@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
    kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
    linux-erofs@lists.ozlabs.org, linux-f2fs-devel@lists.sourceforge.net,
    cluster-devel@redhat.com, linux-nfs@vger.kernel.org,
    linux-mtd@lists.infradead.org, rcu@vger.kernel.org,
    netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
    linux-arm-msm@vger.kernel.org, dm-devel@redhat.com,
    linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org,
    virtualization@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, Qi Zheng
Subject: [PATCH v4 46/48] mm: shrinker: make memcg slab shrink lockless
Date: Mon, 7 Aug 2023 19:09:34 +0800
Message-Id: <20230807110936.21819-47-zhengqi.arch@bytedance.com>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
In-Reply-To: <20230807110936.21819-1-zhengqi.arch@bytedance.com>
References: <20230807110936.21819-1-zhengqi.arch@bytedance.com>

Like global slab shrink, this commit also uses the refcount+RCU method to
make memcg slab shrink lockless.

Use the following script to run a slab shrink stress test:

```

DIR="/root/shrinker/memcg/mnt"

do_create()
{
    mkdir -p /sys/fs/cgroup/memory/test
    echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
    for i in `seq 0 $1`;
    do
        mkdir -p /sys/fs/cgroup/memory/test/$i;
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        mkdir -p $DIR/$i;
    done
}

do_mount()
{
    for i in `seq $1 $2`;
    do
        mount -t tmpfs $i $DIR/$i;
    done
}

do_touch()
{
    for i in `seq $1 $2`;
    do
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
    done
}

case "$1" in
  touch)
    do_touch $2 $3
    ;;
  test)
    do_create 4000
    do_mount 0 4000
    do_touch 0 3000
    ;;
  *)
    exit 1
    ;;
esac
```

Save the above script, then run its test and touch commands.
Then we can use the following perf command to view hotspots:

perf top -U -F 999

1) Before applying this patchset:

  40.44%  [kernel]            [k] down_read_trylock
  17.59%  [kernel]            [k] up_read
  13.64%  [kernel]            [k] pv_native_safe_halt
  11.90%  [kernel]            [k] shrink_slab
   8.21%  [kernel]            [k] idr_find
   2.71%  [kernel]            [k] _find_next_bit
   1.36%  [kernel]            [k] shrink_node
   0.81%  [kernel]            [k] shrink_lruvec
   0.80%  [kernel]            [k] __radix_tree_lookup
   0.50%  [kernel]            [k] do_shrink_slab
   0.21%  [kernel]            [k] list_lru_count_one
   0.16%  [kernel]            [k] mem_cgroup_iter

2) After applying this patchset:

  60.17%  [kernel]           [k] shrink_slab
  20.42%  [kernel]           [k] pv_native_safe_halt
   3.03%  [kernel]           [k] do_shrink_slab
   2.73%  [kernel]           [k] shrink_node
   2.27%  [kernel]           [k] shrink_lruvec
   2.00%  [kernel]           [k] __rcu_read_unlock
   1.92%  [kernel]           [k] mem_cgroup_iter
   0.98%  [kernel]           [k] __rcu_read_lock
   0.91%  [kernel]           [k] osq_lock
   0.63%  [kernel]           [k] mem_cgroup_calculate_protection
   0.55%  [kernel]           [k] shrinker_put
   0.46%  [kernel]           [k] list_lru_count_one

We can see that the first perf hotspot becomes shrink_slab, which is what we
expect.

Signed-off-by: Qi Zheng
---
 mm/shrinker.c | 80 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 54 insertions(+), 26 deletions(-)

diff --git a/mm/shrinker.c b/mm/shrinker.c
index d318f5621862..fee6f62904fb 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -107,6 +107,12 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
 					 lockdep_is_held(&shrinker_rwsem));
 }
 
+static struct shrinker_info *shrinker_info_rcu(struct mem_cgroup *memcg,
+					       int nid)
+{
+	return rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+}
+
 static int expand_one_shrinker_info(struct mem_cgroup *memcg, int new_size,
 				    int old_size, int new_nr_max)
 {
@@ -198,7 +204,7 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
 		struct shrinker_info_unit *unit;
 
 		rcu_read_lock();
-		info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+		info = shrinker_info_rcu(memcg, nid);
 		unit = info->unit[shriner_id_to_index(shrinker_id)];
 		if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
 			/* Pairs with smp mb in shrink_slab() */
@@ -211,7 +217,7 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
 
 static DEFINE_IDR(shrinker_idr);
 
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+static int shrinker_memcg_alloc(struct shrinker *shrinker)
 {
 	int id, ret = -ENOMEM;
 
@@ -219,7 +225,6 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
 		return -ENOSYS;
 
 	down_write(&shrinker_rwsem);
-	/* This may call shrinker, so it must use down_read_trylock() */
 	id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto unlock;
@@ -237,7 +242,7 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
 	return ret;
 }
 
-static void unregister_memcg_shrinker(struct shrinker *shrinker)
+static void shrinker_memcg_remove(struct shrinker *shrinker)
 {
 	int id = shrinker->id;
 
@@ -253,10 +258,15 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
 {
 	struct shrinker_info *info;
 	struct shrinker_info_unit *unit;
+	long nr_deferred;
 
-	info = shrinker_info_protected(memcg, nid);
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
 	unit = info->unit[shriner_id_to_index(shrinker->id)];
-	return atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
+	nr_deferred = atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
+	rcu_read_unlock();
+
+	return nr_deferred;
 }
 
 static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
@@ -264,10 +274,16 @@ static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
 {
 	struct shrinker_info *info;
 	struct shrinker_info_unit *unit;
+	long nr_deferred;
 
-	info = shrinker_info_protected(memcg, nid);
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
 	unit = info->unit[shriner_id_to_index(shrinker->id)];
-	return atomic_long_add_return(nr, &unit->nr_deferred[shriner_id_to_offset(shrinker->id)]);
+	nr_deferred =
+		atomic_long_add_return(nr, &unit->nr_deferred[shriner_id_to_offset(shrinker->id)]);
+	rcu_read_unlock();
+
+	return nr_deferred;
 }
 
 void reparent_shrinker_deferred(struct mem_cgroup *memcg)
@@ -299,12 +315,12 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
 	up_read(&shrinker_rwsem);
 }
 #else
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+static int shrinker_memcg_alloc(struct shrinker *shrinker)
 {
 	return -ENOSYS;
 }
 
-static void unregister_memcg_shrinker(struct shrinker *shrinker)
+static void shrinker_memcg_remove(struct shrinker *shrinker)
 {
 }
 
@@ -464,18 +480,23 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 	if (!mem_cgroup_online(memcg))
 		return 0;
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		return 0;
-
-	info = shrinker_info_protected(memcg, nid);
+again:
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
 	if (unlikely(!info))
 		goto unlock;
 
-	for (; index < shriner_id_to_index(info->map_nr_max); index++) {
+	if (index < shriner_id_to_index(info->map_nr_max)) {
 		struct shrinker_info_unit *unit;
 
 		unit = info->unit[index];
 
+		/*
+		 * The shrinker_info_unit will not be freed, so we can
+		 * safely release the RCU lock here.
+		 */
+		rcu_read_unlock();
+
 		for_each_set_bit(offset, unit->map, SHRINKER_UNIT_BITS) {
 			struct shrink_control sc = {
 				.gfp_mask = gfp_mask,
@@ -485,12 +506,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 			struct shrinker *shrinker;
 			int shrinker_id = calc_shrinker_id(index, offset);
 
+			rcu_read_lock();
 			shrinker = idr_find(&shrinker_idr, shrinker_id);
-			if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
-				if (!shrinker)
-					clear_bit(offset, unit->map);
+			if (unlikely(!shrinker || !shrinker_try_get(shrinker))) {
+				clear_bit(offset, unit->map);
+				rcu_read_unlock();
 				continue;
 			}
+			rcu_read_unlock();
 
 			/* Call non-slab shrinkers even though kmem is disabled */
 			if (!memcg_kmem_online() &&
@@ -523,15 +546,20 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 					set_shrinker_bit(memcg, nid, shrinker_id);
 			}
 			freed += ret;
-
-			if (rwsem_is_contended(&shrinker_rwsem)) {
-				freed = freed ? : 1;
-				goto unlock;
-			}
+			shrinker_put(shrinker);
 		}
+
+		/*
+		 * We have already exited the read-side of rcu critical section
+		 * before calling do_shrink_slab(), the shrinker_info may be
+		 * released in expand_one_shrinker_info(), so reacquire the
+		 * shrinker_info.
+		 */
+		index++;
+		goto again;
 	}
 unlock:
-	up_read(&shrinker_rwsem);
+	rcu_read_unlock();
 	return freed;
 }
 #else /* !CONFIG_MEMCG */
@@ -638,7 +666,7 @@ struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...)
 	shrinker->flags = flags | SHRINKER_ALLOCATED;
 
 	if (flags & SHRINKER_MEMCG_AWARE) {
-		err = prealloc_memcg_shrinker(shrinker);
+		err = shrinker_memcg_alloc(shrinker);
 		if (err == -ENOSYS)
 			shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
 		else if (err == 0)
@@ -731,7 +759,7 @@ void shrinker_free(struct shrinker *shrinker)
 	}
 
 	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
-		unregister_memcg_shrinker(shrinker);
+		shrinker_memcg_remove(shrinker);
 	up_write(&shrinker_rwsem);
 
 	if (debugfs_entry)