From patchwork Thu Jun 22 08:53:31 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13288657
From: Qi Zheng
To: akpm@linux-foundation.org, david@fromorbit.com, tkhai@ya.ru,
 vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org,
 brauner@kernel.org, paulmck@kernel.org, tytso@mit.edu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
 linux-arm-msm@vger.kernel.org, dm-devel@redhat.com,
 linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org,
 virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
 linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org,
 linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
 Qi Zheng
Subject: [PATCH 25/29] mm: vmscan: make memcg slab shrink lockless
Date: Thu, 22 Jun 2023 16:53:31 +0800
Message-Id: <20230622085335.77010-26-zhengqi.arch@bytedance.com>
In-Reply-To: <20230622085335.77010-1-zhengqi.arch@bytedance.com>
References: <20230622085335.77010-1-zhengqi.arch@bytedance.com>

Like global slab shrink, this commit also uses refcount+RCU method to make
memcg slab shrink lockless.
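
For readers unfamiliar with the pattern, the refcount half of refcount+RCU
can be sketched in userspace C as below. This is an illustration only and is
not part of the patch: struct obj, obj_try_get() and obj_put() are
hypothetical names, and the sketch assumes that a try-get helper such as the
shrinker_try_get() called in the diff succeeds only while the count is
non-zero. RCU itself is not modelled here; in the kernel it is what keeps the
object's memory valid while the try-get is attempted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object; not struct shrinker. */
struct obj {
	atomic_int refcount;	/* 0 means the object is being torn down */
};

/* Take a reference only if the count has not already dropped to zero. */
static bool obj_try_get(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is updated to the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A walker would call obj_try_get() inside the read-side critical section,
leave the critical section, do the slow work, and then call obj_put(). A
failed try-get means the object is going away and should be skipped.
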

We can reproduce the down_read_trylock() hotspot through the following script:

```

DIR="/root/shrinker/memcg/mnt"

do_create()
{
    mkdir -p /sys/fs/cgroup/memory/test
    mkdir -p /sys/fs/cgroup/perf_event/test
    echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
    for i in `seq 0 $1`;
    do
        mkdir -p /sys/fs/cgroup/memory/test/$i;
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs;
        mkdir -p $DIR/$i;
    done
}

do_mount()
{
    for i in `seq $1 $2`;
    do
        mount -t tmpfs $i $DIR/$i;
    done
}

do_touch()
{
    for i in `seq $1 $2`;
    do
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs;
        dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
    done
}

case "$1" in
  touch)
    do_touch $2 $3
    ;;
  test)
    do_create 4000
    do_mount 0 4000
    do_touch 0 3000
    ;;
  *)
    exit 1
    ;;
esac
```

Save the above script, then run test and touch commands. Then we can use the
following perf command to view hotspots:

perf top -U -F 999 [-g]

1) Before applying this patchset:

  35.34%  [kernel]  [k] down_read_trylock
  18.44%  [kernel]  [k] shrink_slab
  15.98%  [kernel]  [k] pv_native_safe_halt
  15.08%  [kernel]  [k] up_read
   5.33%  [kernel]  [k] idr_find
   2.71%  [kernel]  [k] _find_next_bit
   2.21%  [kernel]  [k] shrink_node
   1.29%  [kernel]  [k] shrink_lruvec
   0.66%  [kernel]  [k] do_shrink_slab
   0.33%  [kernel]  [k] list_lru_count_one
   0.33%  [kernel]  [k] __radix_tree_lookup
   0.25%  [kernel]  [k] mem_cgroup_iter

-   82.19%    19.49%  [kernel]  [k] shrink_slab
   - 62.00% shrink_slab
        36.37% down_read_trylock
        15.52% up_read
        5.48% idr_find
        3.38% _find_next_bit
      + 0.98% do_shrink_slab

2) After applying this patchset:

  46.83%  [kernel]  [k] shrink_slab
  20.52%  [kernel]  [k] pv_native_safe_halt
   8.85%  [kernel]  [k] do_shrink_slab
   7.71%  [kernel]  [k] _find_next_bit
   1.72%  [kernel]  [k] xas_descend
   1.70%  [kernel]  [k] shrink_node
   1.44%  [kernel]  [k] shrink_lruvec
   1.43%  [kernel]  [k] mem_cgroup_iter
   1.28%  [kernel]  [k] xas_load
   0.89%  [kernel]  [k] super_cache_count
   0.84%  [kernel]  [k] xas_start
   0.66%  [kernel]  [k] list_lru_count_one

-   65.50%    40.44%  [kernel]  [k] shrink_slab
   - 22.96% shrink_slab
        13.11% _find_next_bit
      - 9.91% do_shrink_slab
         - 1.59% super_cache_count
              0.92% list_lru_count_one

We can see that the first perf hotspot becomes shrink_slab, which is what we
expect.

Signed-off-by: Qi Zheng
---
 mm/vmscan.c | 58 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 767569698946..357a1f2ad690 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -213,6 +213,12 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
 					 lockdep_is_held(&shrinker_rwsem));
 }
 
+static struct shrinker_info *shrinker_info_rcu(struct mem_cgroup *memcg,
+					       int nid)
+{
+	return rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+}
+
 static int expand_one_shrinker_info(struct mem_cgroup *memcg,
 				    int map_size, int defer_size,
 				    int old_map_size, int old_defer_size,
@@ -339,7 +345,7 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
 		struct shrinker_info *info;
 
 		rcu_read_lock();
-		info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+		info = shrinker_info_rcu(memcg, nid);
 		if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
 			/* Pairs with smp mb in shrink_slab() */
 			smp_mb__before_atomic();
@@ -359,7 +365,6 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
 		return -ENOSYS;
 
 	down_write(&shrinker_rwsem);
-	/* This may call shrinker, so it must use down_read_trylock() */
 	id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto unlock;
@@ -392,18 +397,28 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
 				   struct mem_cgroup *memcg)
 {
 	struct shrinker_info *info;
+	long nr_deferred;
 
-	info = shrinker_info_protected(memcg, nid);
-	return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
+	nr_deferred = atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
+	rcu_read_unlock();
+
+	return nr_deferred;
 }
 
 static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
 				  struct mem_cgroup *memcg)
 {
 	struct shrinker_info *info;
+	long nr_deferred;
+
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
+	nr_deferred = atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
+	rcu_read_unlock();
 
-	info = shrinker_info_protected(memcg, nid);
-	return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
+	return nr_deferred;
 }
 
 void reparent_shrinker_deferred(struct mem_cgroup *memcg)
@@ -955,19 +970,18 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 {
 	struct shrinker_info *info;
 	unsigned long ret, freed = 0;
-	int i;
+	int i = 0;
 
 	if (!mem_cgroup_online(memcg))
 		return 0;
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		return 0;
-
-	info = shrinker_info_protected(memcg, nid);
+again:
+	rcu_read_lock();
+	info = shrinker_info_rcu(memcg, nid);
 	if (unlikely(!info))
 		goto unlock;
 
-	for_each_set_bit(i, info->map, info->map_nr_max) {
+	for_each_set_bit_from(i, info->map, info->map_nr_max) {
 		struct shrink_control sc = {
 			.gfp_mask = gfp_mask,
 			.nid = nid,
@@ -982,6 +996,10 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 			continue;
 		}
 
+		if (!shrinker_try_get(shrinker))
+			continue;
+		rcu_read_unlock();
+
 		/* Call non-slab shrinkers even though kmem is disabled */
 		if (!memcg_kmem_online() &&
 		    !(shrinker->flags & SHRINKER_NONSLAB))
@@ -1014,13 +1032,19 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 		}
 		freed += ret;
 
-		if (rwsem_is_contended(&shrinker_rwsem)) {
-			freed = freed ? : 1;
-			break;
-		}
+		shrinker_put(shrinker);
+
+		/*
+		 * We have already exited the read-side of rcu critical section
+		 * before calling do_shrink_slab(), the shrinker_info may be
+		 * released in expand_one_shrinker_info(), so restart the
+		 * iteration.
+		 */
+		i++;
+		goto again;
 	}
 unlock:
-	up_read(&shrinker_rwsem);
+	rcu_read_unlock();
 	return freed;
 }
 #else /* CONFIG_MEMCG */