From patchwork Thu Mar 30 19:17:54 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13194745
Date: Thu, 30 Mar 2023 19:17:54 +0000
In-Reply-To: <20230330191801.1967435-1-yosryahmed@google.com>
References: <20230330191801.1967435-1-yosryahmed@google.com>
Message-ID: <20230330191801.1967435-2-yosryahmed@google.com>
Subject: [PATCH v3 1/8] cgroup: rename cgroup_rstat_flush_"irqsafe" to "atomic"
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
    Andrew Morton, Michal Koutný
Cc: Vasily Averin,
    cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    Yosry Ahmed

cgroup_rstat_flush_irqsafe() can be a confusing name. It may read as
"irqs are disabled throughout", which is what the current implementation
does (and is currently under discussion [1]), but it is not the
intention. The intention is that this function is safe to call from
atomic contexts. Name it as such.

Suggested-by: Johannes Weiner
Signed-off-by: Yosry Ahmed
Acked-by: Shakeel Butt
Acked-by: Johannes Weiner
---
 include/linux/cgroup.h | 2 +-
 kernel/cgroup/rstat.c  | 4 ++--
 mm/memcontrol.c        | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3410aecffdb4..885f5395fcd0 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -692,7 +692,7 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
  */
 void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
 void cgroup_rstat_flush(struct cgroup *cgrp);
-void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
+void cgroup_rstat_flush_atomic(struct cgroup *cgrp);
 void cgroup_rstat_flush_hold(struct cgroup *cgrp);
 void cgroup_rstat_flush_release(void);

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 831f1f472bb8..d3252b0416b6 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -241,12 +241,12 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
 }

 /**
- * cgroup_rstat_flush_irqsafe - irqsafe version of cgroup_rstat_flush()
+ * cgroup_rstat_flush_atomic - atomic version of cgroup_rstat_flush()
  * @cgrp: target cgroup
  *
  * This function can be called from any context.
  */
-void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp)
+void cgroup_rstat_flush_atomic(struct cgroup *cgrp)
 {
 	unsigned long flags;

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5abffe6f8389..0205e58ea430 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -642,7 +642,7 @@ static void __mem_cgroup_flush_stats(void)
 		return;

 	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
-	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
+	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
 	spin_unlock_irqrestore(&stats_flush_lock, flag);
 }
From patchwork Thu Mar 30 19:17:55 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13194746
Date: Thu, 30 Mar 2023 19:17:55 +0000
In-Reply-To: <20230330191801.1967435-1-yosryahmed@google.com>
References: <20230330191801.1967435-1-yosryahmed@google.com>
Message-ID: <20230330191801.1967435-3-yosryahmed@google.com>
Subject: [PATCH v3 2/8] memcg: rename mem_cgroup_flush_stats_"delayed" to "ratelimited"
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
    Andrew Morton, Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    Yosry Ahmed, Michal Hocko

mem_cgroup_flush_stats_delayed() suggests this is using a delayed_work,
but it actually sometimes flushes directly from the callsite. What it is
really doing is ratelimiting calls. A better name is
mem_cgroup_flush_stats_ratelimited().
Suggested-by: Johannes Weiner
Signed-off-by: Yosry Ahmed
Acked-by: Shakeel Butt
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
---
 include/linux/memcontrol.h | 4 ++--
 mm/memcontrol.c            | 2 +-
 mm/workingset.c            | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b6eda2ab205d..ac3f3b3a45e2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1037,7 +1037,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 }

 void mem_cgroup_flush_stats(void);
-void mem_cgroup_flush_stats_delayed(void);
+void mem_cgroup_flush_stats_ratelimited(void);

 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
@@ -1535,7 +1535,7 @@ static inline void mem_cgroup_flush_stats(void)
 {
 }

-static inline void mem_cgroup_flush_stats_delayed(void)
+static inline void mem_cgroup_flush_stats_ratelimited(void)
 {
 }

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0205e58ea430..c3b6aae78901 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -653,7 +653,7 @@ void mem_cgroup_flush_stats(void)
 		__mem_cgroup_flush_stats();
 }

-void mem_cgroup_flush_stats_delayed(void)
+void mem_cgroup_flush_stats_ratelimited(void)
 {
 	if (time_after64(jiffies_64, flush_next_time))
 		mem_cgroup_flush_stats();
diff --git a/mm/workingset.c b/mm/workingset.c
index 00c6f4d9d9be..af862c6738c3 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -462,7 +462,7 @@ void workingset_refault(struct folio *folio, void *shadow)

 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);

-	mem_cgroup_flush_stats_delayed();
+	mem_cgroup_flush_stats_ratelimited();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if
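
[Editor's illustration, not part of the series: the ratelimiting scheme that
mem_cgroup_flush_stats_ratelimited() implements is simple enough to sketch
outside the kernel. The minimal userspace C sketch below shows the same idea:
a flush records the next allowed flush time, and the ratelimited entry point
only flushes once that deadline has passed, otherwise accepting slightly
stale stats. The names (stats_flush, FLUSH_PERIOD_NS) are hypothetical
stand-ins for the kernel's jiffies/FLUSH_TIME logic.]

/* Minimal sketch of time-based flush ratelimiting (hypothetical names). */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define FLUSH_PERIOD_NS (2ULL * 1000 * 1000 * 1000)	/* 2s, like 2*FLUSH_TIME */

static uint64_t flush_next_time;	/* absolute deadline, in ns */

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

static void stats_flush(void)
{
	/* push the next allowed flush out by one period, then do the work */
	flush_next_time = now_ns() + FLUSH_PERIOD_NS;
	printf("flushing stats\n");
}

static void stats_flush_ratelimited(void)
{
	/* only flush once the previous deadline has passed */
	if (now_ns() > flush_next_time)
		stats_flush();
}

int main(void)
{
	for (int i = 0; i < 5; i++)
		stats_flush_ratelimited();	/* only the first call flushes */
	return 0;
}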
From patchwork Thu Mar 30 19:17:56 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13194747
Date: Thu, 30 Mar 2023 19:17:56 +0000
In-Reply-To: <20230330191801.1967435-1-yosryahmed@google.com>
References: <20230330191801.1967435-1-yosryahmed@google.com>
Message-ID: <20230330191801.1967435-4-yosryahmed@google.com>
Subject: [PATCH v3 3/8] memcg: do not flush stats in irq context
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
    Andrew Morton, Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    Yosry Ahmed, Michal Hocko

Currently, the only context in which we can invoke an rstat flush from
irq context is through mem_cgroup_usage() on the root memcg when called
from memcg_check_events(). An rstat flush is an expensive operation that
should not be done in irq context, so do not flush stats and use the
stale stats in this case. Arguably, usage threshold events are not
reliable on the root memcg anyway since its usage is ill-defined.

Suggested-by: Johannes Weiner
Suggested-by: Shakeel Butt
Signed-off-by: Yosry Ahmed
Acked-by: Shakeel Butt
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c3b6aae78901..ff39f78f962e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3669,7 +3669,21 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 	unsigned long val;

 	if (mem_cgroup_is_root(memcg)) {
-		mem_cgroup_flush_stats();
+		/*
+		 * We can reach here from irq context through:
+		 * uncharge_batch()
+		 * |--memcg_check_events()
+		 * |--mem_cgroup_threshold()
+		 * |--__mem_cgroup_threshold()
+		 * |--mem_cgroup_usage
+		 *
+		 * rstat flushing is an expensive operation that should not be
+		 * done from irq context; use stale stats in this case.
+		 * Arguably, usage threshold events are not reliable on the root
+		 * memcg anyway since its usage is ill-defined.
+		 */
+		if (in_task())
+			mem_cgroup_flush_stats();
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)
From patchwork Thu Mar 30 19:17:57 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13194748
Date: Thu, 30 Mar 2023 19:17:57 +0000
In-Reply-To: <20230330191801.1967435-1-yosryahmed@google.com>
References: <20230330191801.1967435-1-yosryahmed@google.com>
Message-ID: <20230330191801.1967435-5-yosryahmed@google.com>
Subject: [PATCH v3 4/8] memcg: replace stats_flush_lock with an atomic
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
    Andrew Morton, Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    Yosry Ahmed, Michal Hocko

As Johannes notes in [1], stats_flush_lock is currently used to:
(a) Protect updates to stats_flush_threshold.
(b) Protect updates to flush_next_time.
(c) Serialize calls to cgroup_rstat_flush() based on those ratelimits.

However:

1. stats_flush_threshold is already an atomic.

2. flush_next_time is not atomic. The writer is locked, but the reader
   is lockless. If the reader races with a flush, you could see this:

                                if (time_after(jiffies, flush_next_time))
        spin_trylock()
        flush_next_time = now + delay
        flush()
        spin_unlock()
                                spin_trylock()
                                flush_next_time = now + delay
                                flush()
                                spin_unlock()

   which means we already can get flushes at a higher frequency than
   FLUSH_TIME during races. But it isn't really a problem.

   The reader could also see garbled partial updates if the compiler
   decides to split the write, so it needs at least READ_ONCE and
   WRITE_ONCE protection.

3. Serializing cgroup_rstat_flush() calls against the ratelimit
   factors is currently broken because of the race in 2. But the race
   is actually harmless, all we might get is the occasional earlier
   flush. If there is no delta, the flush won't do much. And if there
   is, the flush is justified.

So the lock can be removed altogether. However, the lock also served
the purpose of preventing a thundering herd problem for concurrent
flushers, see [2]. Use an atomic instead to serve the purpose of
unifying concurrent flushers.

[1] https://lore.kernel.org/lkml/20230323172732.GE739026@cmpxchg.org/
[2] https://lore.kernel.org/lkml/20210716212137.1391164-2-shakeelb@google.com/

Signed-off-by: Yosry Ahmed
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ff39f78f962e..65750f8b8259 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -585,8 +585,8 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
-static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
+static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 static u64 flush_next_time;

@@ -636,15 +636,19 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	}
 }

 static void __mem_cgroup_flush_stats(void)
 {
-	unsigned long flag;
-
-	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
+	/*
+	 * We always flush the entire tree, so concurrent flushers can just
+	 * skip. This avoids a thundering herd problem on the rstat global lock
+	 * from memcg flushers (e.g. reclaim, refault, etc).
+	 */
+	if (atomic_read(&stats_flush_ongoing) ||
+	    atomic_xchg(&stats_flush_ongoing, 1))
 		return;

-	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
+	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
 	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
-	spin_unlock_irqrestore(&stats_flush_lock, flag);
+	atomic_set(&stats_flush_ongoing, 0);
 }

 void mem_cgroup_flush_stats(void)
@@ -655,7 +659,7 @@ void mem_cgroup_flush_stats(void)
 }

 void mem_cgroup_flush_stats_ratelimited(void)
 {
-	if (time_after64(jiffies_64, flush_next_time))
+	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
 		mem_cgroup_flush_stats();
 }
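
[Editor's illustration, not part of the series: the atomic_read() + atomic_xchg()
pair above is the core of this change -- exactly one caller does the expensive
flush while concurrent callers skip and read slightly stale stats, and the plain
load in front of the exchange keeps the losing threads from bouncing the flag's
cache line. A minimal, hypothetical userspace C sketch of the same gate using
C11 atomics follows; all names (flush_ongoing, expensive_flush) are invented.]

/* Minimal sketch of a "concurrent flushers just skip" gate (hypothetical names). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int flush_ongoing;
static atomic_int flush_count;

static void expensive_flush(void)
{
	atomic_fetch_add(&flush_count, 1);	/* stand-in for the rstat walk */
}

static void flush_stats(void)
{
	/* cheap read first, then pick a single winner via exchange */
	if (atomic_load(&flush_ongoing) ||
	    atomic_exchange(&flush_ongoing, 1))
		return;		/* someone else is flushing; use stale stats */

	expensive_flush();
	atomic_store(&flush_ongoing, 0);
}

static void *worker(void *arg)
{
	(void)arg;
	flush_stats();
	return NULL;
}

int main(void)
{
	pthread_t threads[8];

	for (int i = 0; i < 8; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (int i = 0; i < 8; i++)
		pthread_join(threads[i], NULL);

	/* typically far fewer flushes than callers */
	printf("flushes: %d\n", atomic_load(&flush_count));
	return 0;
}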
From patchwork Thu Mar 30 19:17:58 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13194749
Date: Thu, 30 Mar 2023 19:17:58 +0000
In-Reply-To: <20230330191801.1967435-1-yosryahmed@google.com>
References: <20230330191801.1967435-1-yosryahmed@google.com>
Message-ID: <20230330191801.1967435-6-yosryahmed@google.com>
Subject: [PATCH v3 5/8] memcg: sleep during flushing stats in safe contexts
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
    Andrew Morton, Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    Yosry Ahmed, Michal Hocko

Currently, all contexts that flush memcg stats do so with sleeping not
allowed. Some of these contexts are perfectly safe to sleep in, such as
reading cgroup files from userspace or the background periodic flusher.
Flushing is an expensive operation that scales with the number of cpus
and the number of cgroups in the system, so avoid doing it atomically
where possible.

Refactor the code to make mem_cgroup_flush_stats() non-atomic (aka
sleepable), and provide a separate atomic version. The atomic version is
used in reclaim, refault, writeback, and in mem_cgroup_usage(). All
other code paths are left to use the non-atomic version. This includes
callbacks for userspace reads and the periodic flusher.

Since refault is the only caller of mem_cgroup_flush_stats_ratelimited(),
change it to mem_cgroup_flush_stats_atomic_ratelimited(). Reclaim and
refault code paths are modified to do non-atomic flushing in separate
later patches -- so it will eventually be changed back to
mem_cgroup_flush_stats_ratelimited().
Signed-off-by: Yosry Ahmed
Acked-by: Shakeel Butt
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
---
 include/linux/memcontrol.h |  9 ++++++--
 mm/memcontrol.c            | 45 ++++++++++++++++++++++++++++++--------
 mm/vmscan.c                |  2 +-
 mm/workingset.c            |  2 +-
 4 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ac3f3b3a45e2..b424ba3ebd09 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1037,7 +1037,8 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 }

 void mem_cgroup_flush_stats(void);
-void mem_cgroup_flush_stats_ratelimited(void);
+void mem_cgroup_flush_stats_atomic(void);
+void mem_cgroup_flush_stats_atomic_ratelimited(void);

 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
@@ -1535,7 +1536,11 @@ static inline void mem_cgroup_flush_stats(void)
 {
 }

-static inline void mem_cgroup_flush_stats_ratelimited(void)
+static inline void mem_cgroup_flush_stats_atomic(void)
+{
+}
+
+static inline void mem_cgroup_flush_stats_atomic_ratelimited(void)
 {
 }

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 65750f8b8259..a2ce3aa10d94 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -634,7 +634,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	}
 }

-static void __mem_cgroup_flush_stats(void)
+static void do_flush_stats(bool atomic)
 {
 	/*
 	 * We always flush the entire tree, so concurrent flushers can just
@@ -646,26 +646,46 @@ static void __mem_cgroup_flush_stats(void)
 		return;

 	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
-	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
+
+	if (atomic)
+		cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
+	else
+		cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
+
 	atomic_set(&stats_flush_threshold, 0);
 	atomic_set(&stats_flush_ongoing, 0);
 }

+static bool should_flush_stats(void)
+{
+	return atomic_read(&stats_flush_threshold) > num_online_cpus();
+}
+
 void mem_cgroup_flush_stats(void)
 {
-	if (atomic_read(&stats_flush_threshold) > num_online_cpus())
-		__mem_cgroup_flush_stats();
+	if (should_flush_stats())
+		do_flush_stats(false);
 }

-void mem_cgroup_flush_stats_ratelimited(void)
+void mem_cgroup_flush_stats_atomic(void)
+{
+	if (should_flush_stats())
+		do_flush_stats(true);
+}
+
+void mem_cgroup_flush_stats_atomic_ratelimited(void)
 {
 	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
-		mem_cgroup_flush_stats();
+		mem_cgroup_flush_stats_atomic();
 }

 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
-	__mem_cgroup_flush_stats();
+	/*
+	 * Always flush here so that flushing in latency-sensitive paths is
+	 * as cheap as possible.
+	 */
+	do_flush_stats(false);
 	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
 }

@@ -3685,9 +3705,12 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 		 * done from irq context; use stale stats in this case.
 		 * Arguably, usage threshold events are not reliable on the root
 		 * memcg anyway since its usage is ill-defined.
+		 *
+		 * Additionally, other call paths through memcg_check_events()
+		 * disable irqs, so make sure we are flushing stats atomically.
 		 */
 		if (in_task())
-			mem_cgroup_flush_stats();
+			mem_cgroup_flush_stats_atomic();
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)
@@ -4610,7 +4633,11 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
 	struct mem_cgroup *parent;

-	mem_cgroup_flush_stats();
+	/*
+	 * wb_writeback() takes a spinlock and calls
+	 * wb_over_bg_thresh()->mem_cgroup_wb_stats(). Do not sleep.
+	 */
+	mem_cgroup_flush_stats_atomic();

 	*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);

 	*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c1c5e8b24b8..a9511ccb936f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc)
 	 * Flush the memory cgroup stats, so that we read accurate per-memcg
 	 * lruvec stats for heuristics.
 	 */
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats_atomic();

 	/*
 	 * Determine the scan balance between anon and file LRUs.
diff --git a/mm/workingset.c b/mm/workingset.c
index af862c6738c3..dab0c362b9e3 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -462,7 +462,7 @@ void workingset_refault(struct folio *folio, void *shadow)

 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);

-	mem_cgroup_flush_stats_ratelimited();
+	mem_cgroup_flush_stats_atomic_ratelimited();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if
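
[Editor's illustration, not part of the series: the refactoring in this patch
follows a small API-shaping pattern -- one internal worker takes a flag saying
whether it may sleep, and two thin entry points pick the flavor, so callers in
atomic context never reach the sleepable path. A minimal, hypothetical
userspace C sketch of that shape follows; names such as do_flush,
flush_sleepable and flush_atomic are invented for illustration.]

/* Minimal sketch of one worker with sleepable and atomic entry points. */
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static void do_flush(bool atomic)
{
	if (atomic) {
		/* non-blocking path: must not sleep */
		puts("atomic flush");
	} else {
		/* sleepable path: free to block, e.g. yield the CPU briefly */
		usleep(1000);
		puts("sleepable flush");
	}
}

/* what most callers use (userspace reads, the periodic worker, ...) */
static void flush_sleepable(void)
{
	do_flush(false);
}

/* for callers that hold spinlocks or run with irqs disabled */
static void flush_atomic(void)
{
	do_flush(true);
}

int main(void)
{
	flush_sleepable();
	flush_atomic();
	return 0;
}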
From patchwork Thu Mar 30 19:17:59 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13194768
Date: Thu, 30 Mar 2023 19:17:59 +0000
In-Reply-To: <20230330191801.1967435-1-yosryahmed@google.com>
References: <20230330191801.1967435-1-yosryahmed@google.com>
Message-ID: <20230330191801.1967435-7-yosryahmed@google.com>
Subject: [PATCH v3 6/8] workingset: memcg: sleep when flushing stats in workingset_refault()
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
    Andrew Morton, Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    Yosry Ahmed, Michal Hocko

In workingset_refault(), we call
mem_cgroup_flush_stats_atomic_ratelimited() to read accurate stats
within an RCU read section and with sleeping disallowed. Move the call
above the RCU read section to make it non-atomic.

Flushing is an expensive operation that scales with the number of cpus
and the number of cgroups in the system, so avoid doing it atomically
where possible.

Since workingset_refault() is the only caller of
mem_cgroup_flush_stats_atomic_ratelimited(), just make it non-atomic,
and rename it to mem_cgroup_flush_stats_ratelimited().
Signed-off-by: Yosry Ahmed
Acked-by: Shakeel Butt
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
---
 include/linux/memcontrol.h | 4 ++--
 mm/memcontrol.c            | 4 ++--
 mm/workingset.c            | 5 +++--
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b424ba3ebd09..a4bc3910a2eb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1038,7 +1038,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,

 void mem_cgroup_flush_stats(void);
 void mem_cgroup_flush_stats_atomic(void);
-void mem_cgroup_flush_stats_atomic_ratelimited(void);
+void mem_cgroup_flush_stats_ratelimited(void);

 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
@@ -1540,7 +1540,7 @@ static inline void mem_cgroup_flush_stats_atomic(void)
 {
 }

-static inline void mem_cgroup_flush_stats_atomic_ratelimited(void)
+static inline void mem_cgroup_flush_stats_ratelimited(void)
 {
 }

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a2ce3aa10d94..361c0bbf7283 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -673,10 +673,10 @@ void mem_cgroup_flush_stats_atomic(void)
 		do_flush_stats(true);
 }

-void mem_cgroup_flush_stats_atomic_ratelimited(void)
+void mem_cgroup_flush_stats_ratelimited(void)
 {
 	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
-		mem_cgroup_flush_stats_atomic();
+		mem_cgroup_flush_stats();
 }

 static void flush_memcg_stats_dwork(struct work_struct *w)
diff --git a/mm/workingset.c b/mm/workingset.c
index dab0c362b9e3..3025beee9b34 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -406,6 +406,9 @@ void workingset_refault(struct folio *folio, void *shadow)
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
 	eviction <<= bucket_order;

+	/* Flush stats (and potentially sleep) before holding RCU read lock */
+	mem_cgroup_flush_stats_ratelimited();
+
 	rcu_read_lock();
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -461,8 +464,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);

 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
-
-	mem_cgroup_flush_stats_atomic_ratelimited();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if
, " =?utf-8?q?Michal_Koutn=C3=BD?= " Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed , Michal Hocko Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Memory reclaim is a sleepable context. Flushing is an expensive operaiton that scales with the number of cpus and the number of cgroups in the system, so avoid doing it atomically unnecessarily. This can slow down reclaim code if flushing stats is taking too long, but there is already multiple cond_resched()'s in reclaim code. Signed-off-by: Yosry Ahmed Acked-by: Shakeel Butt Acked-by: Johannes Weiner Acked-by: Michal Hocko --- mm/vmscan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index a9511ccb936f..9c1c5e8b24b8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc) * Flush the memory cgroup stats, so that we read accurate per-memcg * lruvec stats for heuristics. */ - mem_cgroup_flush_stats_atomic(); + mem_cgroup_flush_stats(); /* * Determine the scan balance between anon and file LRUs. From patchwork Thu Mar 30 19:18:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13194770 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D3B3C6FD1D for ; Thu, 30 Mar 2023 19:19:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230088AbjC3TTH (ORCPT ); Thu, 30 Mar 2023 15:19:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232217AbjC3TSj (ORCPT ); Thu, 30 Mar 2023 15:18:39 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B625EB55 for ; Thu, 30 Mar 2023 12:18:19 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-546422bd3ceso21437277b3.21 for ; Thu, 30 Mar 2023 12:18:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1680203898; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=WbNC07VGnJucfMQdaVru/HWxeCTMcNM6hFkgQ/MCeFU=; b=PUv9MA0n+Z8fm7ErrGvHySjECm+JmVMmb7yIewlfjR/61XxilLPkk2HWdbFMTfUOaL VEd5S3Db2lj6fZaM1EVkiIBt/SsoMNdbAydQDGkFf9C+290T1e1uwhZ4tZ9DFf8Phq0h P4OVDvENTCX+3IndK3Cvs6OkRD4foG1c4Yew7PAGOLNrfjOHkmrszhM9+d8Istzt+wK/ krUK3yorRZOqS1BM/g9ye1FRB5jRNaXtQwUxDYcXcRNeFj0C93RhkVtRFak+8MvkC/U9 MW/LMGQgSZbJ8G2s7XbIxfyCgArJtDyQ9vNCBWNE1utNTCLeMAHTcanEcYFbU2EbD5hm feog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680203898; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=WbNC07VGnJucfMQdaVru/HWxeCTMcNM6hFkgQ/MCeFU=; b=VRMreukfjDhDToHglRfUHYbybrdkjf8Skt55QtgYDFvRisd5/oCzgEPLnvTTlJtcUC 0yKD/oKcgYq4gFUJkfx4pq6MmZiySXunDicb7Oz+yGUojjNNaRS2jYjRvAT68xyMohUP 1Klt+sjGajTeNGdH/aSgZG9n/0v+N+XM5DIbXgsbMHU4v0ggaUDHrb022kOVbDm+37T/ 
From patchwork Thu Mar 30 19:18:01 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13194770
Date: Thu, 30 Mar 2023 19:18:01 +0000
In-Reply-To: <20230330191801.1967435-1-yosryahmed@google.com>
References: <20230330191801.1967435-1-yosryahmed@google.com>
Message-ID: <20230330191801.1967435-9-yosryahmed@google.com>
Subject: [PATCH v3 8/8] memcg: do not modify rstat tree for zero updates
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
    Andrew Morton, Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    Yosry Ahmed, Michal Hocko

In some situations, we may end up calling memcg_rstat_updated() with a
value of 0, which means the stat was not actually updated. An example is
if we fail to reclaim any pages in shrink_folio_list(). Do not add the
cgroup to the rstat updated tree in this case, to avoid unnecessarily
flushing it.

Signed-off-by: Yosry Ahmed
Acked-by: Shakeel Butt
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Reviewed-by: Michal Koutný
---
 mm/memcontrol.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 361c0bbf7283..a63ee2efa780 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -618,6 +618,9 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
 	unsigned int x;

+	if (!val)
+		return;
+
 	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());

 	x = __this_cpu_add_return(stats_updates, abs(val));