From patchwork Wed Sep 22 22:49:06 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 12511563
Date: Wed, 22 Sep 2021 15:49:06 -0700
Message-Id: <20210922224906.676151-1-shakeelb@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog
Subject: [PATCH] memcg: flush lruvec stats in the refault
From: Shakeel Butt
To: Johannes Weiner
Cc: Roman Gushchin, Michael Larabel, Feng Tang, Michal Hocko,
 Hillf Danton, Michal Koutný, Andrew Morton, Linus Torvalds,
 cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Shakeel Butt

Prior to the commit 7e1c0d6f5820 ("memcg: switch lruvec stats to rstat")
and the commit aa48e47e3906 ("memcg: infrastructure to flush memcg
stats"), each lruvec memcg stat could be off by (nr_cgroups * nr_cpus *
32) at worst and for an unbounded amount of time. The commit
7e1c0d6f5820 moved the lruvec stats to the rstat infrastructure and the
commit aa48e47e3906 bounded the error for all the lruvec stats to
(nr_cpus * 32) at worst for at most 2 seconds. More specifically, it
decoupled the number of stats and the number of cgroups from the error
rate.

However, this reduction in error comes with the cost of triggering the
slowpath of the stats update more frequently. Previously, in the
slowpath the kernel added the stats up the memcg tree. After
aa48e47e3906, the kernel triggers the async lruvec stats flush through
queue_work().
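
As a rough illustration of the two error bounds and of the slowpath
trigger described above, here is a minimal user-space sketch. It is not
kernel code: BATCH stands in for the value 32 quoted above (the diff
uses MEMCG_CHARGE_BATCH), and struct cpu_counter and all function names
are hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed batch size: the text quotes 32 as the per-CPU threshold. */
#define BATCH 32u

/* Worst-case drift of one stat when every (cgroup, CPU) pair may hold
 * up to one unflushed batch. */
static unsigned long error_per_cgroup_and_cpu(unsigned long nr_cgroups,
					      unsigned long nr_cpus)
{
	return nr_cgroups * nr_cpus * BATCH;
}

/* Worst-case drift once only the per-CPU term remains. */
static unsigned long error_per_cpu(unsigned long nr_cpus)
{
	return nr_cpus * BATCH;
}

/* Hypothetical stand-in for the per-CPU stats_flush_threshold counter. */
struct cpu_counter {
	unsigned int updates;
};

/* Returns true on every BATCH-th update on this CPU, i.e. at the point
 * where the code removed by this patch would have called queue_work(). */
static bool update_hits_slowpath(struct cpu_counter *c)
{
	assert(c != NULL);
	return (++c->updates % BATCH) == 0;
}
```

With 100 cgroups and 64 CPUs the two bounds evaluate to 204800 and 2048
respectively, and 100 consecutive updates on one CPU hit the slowpath
three times (at updates 32, 64 and 96).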

This caused regression reports from the 0day kernel bot [1] as well as
from the Phoronix Test Suite [2].

We tried two options to fix the regression:

1) Increase the threshold to trigger the slowpath in the lruvec stats
   update codepath from 32 to 512.

2) Remove the slowpath from the lruvec stats update codepath and instead
   flush the stats in the page refault codepath. The assumption is that
   the kernel flushes the stats in a timely manner, so the update tree
   would be small in the refault codepath and would not cause a
   performance impact.

Following are the results of the will-it-scale/page_fault[1|2|3]
benchmark on four settings, i.e. (1) 5.15-rc1 as baseline (2) 5.15-rc1
with aa48e47e3906 and 7e1c0d6f5820 reverted (3) 5.15-rc1 with option-1
(4) 5.15-rc1 with option-2.

test       (1)      (2)               (3)               (4)
pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)

From the above results, it seems that option-2 not only solves the
regression but also improves the performance for at least these
benchmarks.

Feng Tang (Intel) ran the aim7 benchmark with these two options and
confirmed that option-1 reduces the regression but option-2 removes the
regression.

Michael Larabel (Phoronix) ran multiple benchmarks with these options
and reported the results at [3], which show that for most benchmarks
option-2 removes the regression introduced by the commit aa48e47e3906
("memcg: infrastructure to flush memcg stats").

Based on the experiment results, this patch proposes option-2 as the
solution to resolve the regression.
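
The percentages in parentheses in the results above appear to be the
change relative to the baseline column (1); that reading is an
assumption, but the arithmetic agrees with the reported figures. A
minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Percentage change of `result` relative to `baseline`.
 * Assumes column (1) is the baseline for columns (2)-(4). */
static double pct_over_baseline(double baseline, double result)
{
	assert(baseline > 0.0);
	return (result - baseline) / baseline * 100.0;
}
```

For example, pct_over_baseline(368563, 406277) gives roughly 10.23 and
pct_over_baseline(500853, 576083) gives roughly 15.02, matching the
pg_f1 entry for (2) and the pg_f3 entry for (4).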

[1] https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020
[2] https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress
[3] https://openbenchmarking.org/result/2109226-DEBU-LINUX5104

Fixes: aa48e47e3906 ("memcg: infrastructure to flush memcg stats")
Signed-off-by: Shakeel Butt
Tested-by: Michael Larabel
---
 mm/memcontrol.c | 10 ----------
 mm/workingset.c |  1 +
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b762215d73eb..6da5020a8656 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -106,9 +106,6 @@ static bool do_memsw_account(void)
 /* memcg and lruvec stats flushing */
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
-static void flush_memcg_stats_work(struct work_struct *w);
-static DECLARE_WORK(stats_flush_work, flush_memcg_stats_work);
-static DEFINE_PER_CPU(unsigned int, stats_flush_threshold);
 static DEFINE_SPINLOCK(stats_flush_lock);
 
 #define THRESHOLDS_EVENTS_TARGET 128
@@ -682,8 +679,6 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 
 	/* Update lruvec */
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
-	if (!(__this_cpu_inc_return(stats_flush_threshold) % MEMCG_CHARGE_BATCH))
-		queue_work(system_unbound_wq, &stats_flush_work);
 }
 
 /**
@@ -5361,11 +5356,6 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
 	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
 }
 
-static void flush_memcg_stats_work(struct work_struct *w)
-{
-	mem_cgroup_flush_stats();
-}
-
 static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
diff --git a/mm/workingset.c b/mm/workingset.c
index d4268d8e9a82..d5b81e4f4cbe 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -352,6 +352,7 @@ void workingset_refault(struct page *page, void *shadow)
 
 	inc_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file);
 
+	mem_cgroup_flush_stats();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if