From patchwork Tue Dec 29 14:35:13 2020
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 11992443
From: Feng Tang
To: Andrew Morton, Michal Hocko, Johannes Weiner, Vladimir Davydov,
    linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, andi.kleen@intel.com, tim.c.chen@intel.com,
    dave.hansen@intel.com, ying.huang@intel.com, Feng Tang, Roman Gushchin
Subject: [PATCH 1/2] mm: page_counter: relayout structure to reduce false sharing
Date: Tue, 29 Dec 2020 22:35:13 +0800
Message-Id: <1609252514-27795-1-git-send-email-feng.tang@intel.com>
When checking a memory-cgroup-related performance regression [1], the
perf c2c profiling data showed high false sharing between accesses to
'usage' and 'parent'.

On 64-bit systems, 'usage' and 'parent' sit close to each other in
struct page_counter and easily end up in the same cache line (for a
cache line size of 64 B or larger). 'usage' is mostly written, while
'parent' is mostly read, owing to the hierarchical nature of cgroup
accounting. So move 'parent' to the end of the structure to make sure
the two fields are in different cache lines.

Following are some performance data with the patch, against v5.11-rc1,
on several generations of Xeon platforms. Most of the results are
improvements, with only one malloc case on one platform showing a -4.0%
regression. Each category below has several subcases run on different
platforms, and only the worst and best scores are listed:

fio:                       +1.8% ~  +8.3%
will-it-scale/malloc1:     -4.0% ~  +8.9%
will-it-scale/page_fault1: no change
will-it-scale/page_fault2: +2.4% ~ +20.2%

[1] https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Signed-off-by: Feng Tang
Cc: Roman Gushchin
Cc: Johannes Weiner
Reviewed-by: Roman Gushchin
---
 include/linux/page_counter.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 85bd413..6795913 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -12,7 +12,6 @@ struct page_counter {
 	unsigned long low;
 	unsigned long high;
 	unsigned long max;
-	struct page_counter *parent;
 
 	/* effective memory.min and memory.min usage tracking */
 	unsigned long emin;
@@ -27,6 +26,14 @@ struct page_counter {
 	/* legacy */
 	unsigned long watermark;
 	unsigned long failcnt;
+
+	/*
+	 * 'parent' is placed here to be far from 'usage' to reduce
+	 * cache false sharing, as 'usage' is written mostly while
+	 * parent is frequently read for cgroup's hierarchical
+	 * counting nature.
+	 */
+	struct page_counter *parent;
 };
 
 #if BITS_PER_LONG == 32
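
As a quick aside, the layout effect can be checked with a minimal
user-space sketch (not part of the patch). The struct below is a
simplified stand-in for struct page_counter with plain longs, and the
64-byte cache line size is an assumption for illustration:

#include <stdio.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed cache line size */

/* simplified stand-in for struct page_counter */
struct mock_counter {
	long usage;			/* mostly written */
	long min, low, high, max;
	long emin, emin_usage;
	long elow, elow_usage;
	long watermark, failcnt;
	struct mock_counter *parent;	/* mostly read; now at the end */
};

int main(void)
{
	size_t u = offsetof(struct mock_counter, usage);
	size_t p = offsetof(struct mock_counter, parent);

	printf("usage -> cache line %zu, parent -> cache line %zu\n",
	       u / CACHELINE, p / CACHELINE);

	/* exit non-zero if the two fields share a cache line */
	return u / CACHELINE == p / CACHELINE;
}

With this mock layout, 'usage' lands in cache line 0 and 'parent' in
cache line 1, so a writer dirtying 'usage' no longer invalidates the
line that hierarchy walks read 'parent' from.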

From patchwork Tue Dec 29 14:35:14 2020
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 11992445
From: Feng Tang
To: Andrew Morton, Michal Hocko, Johannes Weiner, Vladimir Davydov,
    linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, andi.kleen@intel.com, tim.c.chen@intel.com,
    dave.hansen@intel.com, ying.huang@intel.com, Feng Tang, Shakeel Butt,
    Roman Gushchin
Subject: [PATCH 2/2] mm: memcg: add a new MEMCG_UPDATE_BATCH
Date: Tue, 29 Dec 2020 22:35:14 +0800
Message-Id: <1609252514-27795-2-git-send-email-feng.tang@intel.com>
In-Reply-To: <1609252514-27795-1-git-send-email-feng.tang@intel.com>
References: <1609252514-27795-1-git-send-email-feng.tang@intel.com>

When profiling benchmarks that involve memory cgroups, the stats
updates sometimes take quite a few CPU cycles. The current
MEMCG_CHARGE_BATCH is used for both charging and statistics/events
updating, and is set to 32, which may be good for the accuracy of memcg
charging, but is too small for stats updating, causing frequent access
to the global stats data instead of the per-cpu ones.

So handle the two cases differently, by adding a new, bigger batch
number for stats updating, while keeping the current value for charging
(though the comments in memcontrol.h suggest considering a bigger value
there too).

The new batch is set to 512, chosen with 2MB huge pages (512 base
pages) in mind, as the check logic is mostly: if (x > BATCH), then
flush to the global data, otherwise stay in the per-cpu data. With a
512-page batch, a 512-page (2MB) update crosses the threshold only
every other time, saving about 50% of the global data updates for 2MB
pages.

Following are some performance data with the patch, against v5.11-rc1,
on several generations of Xeon platforms. Each category below has
several subcases run on different platforms, and only the worst and
best scores are listed:

fio:                       +2.0% ~  +6.8%
will-it-scale/malloc:      -0.9% ~  +6.2%
will-it-scale/page_fault1: no change
will-it-scale/page_fault2: +13.7% ~ +26.2%

One thought is that the batch could be calculated dynamically from the
memcg limit and the number of CPUs; another is to add periodic syncing
of the data for accuracy, similar to vmstat, as suggested by Ying.

Signed-off-by: Feng Tang
Cc: Shakeel Butt
Cc: Roman Gushchin
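
The batched-update pattern described above can be sketched in plain C
as follows. This is a user-space mock, not the kernel implementation;
mod_state, percpu_delta, and global are illustrative names, and the
symmetric negative-side check is a simplification:

#include <stdio.h>
#include <stdatomic.h>

#define UPDATE_BATCH 512L

static atomic_long global;		/* shared, expensive to touch */
static _Thread_local long percpu_delta;	/* cheap, CPU-local */

static void mod_state(long val)
{
	long x = percpu_delta + val;

	if (x > UPDATE_BATCH || x < -UPDATE_BATCH) {
		/* flush the batched delta to the shared counter */
		atomic_fetch_add(&global, x);
		x = 0;
	}
	percpu_delta = x;	/* keep the remainder local */
}

int main(void)
{
	/* two 2MB-page charges of 512 pages each: only the second
	 * one crosses the 512 threshold and touches 'global' */
	mod_state(512);
	mod_state(512);
	printf("global=%ld local=%ld\n",
	       atomic_load(&global), percpu_delta);
	return 0;
}

Two back-to-back 512-page charges produce a single flush to the shared
counter here, whereas with a 32-page batch every charge would flush,
which is the ~50% saving the message refers to.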
---
 include/linux/memcontrol.h | 2 ++
 mm/memcontrol.c            | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d827bd7..d58bf28 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -335,6 +335,8 @@ struct mem_cgroup {
  */
 #define MEMCG_CHARGE_BATCH 32U
 
+#define MEMCG_UPDATE_BATCH 512U
+
 extern struct mem_cgroup *root_mem_cgroup;
 
 enum page_memcg_data_flags {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 605f671..01ca85d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -760,7 +760,7 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 {
-	long x, threshold = MEMCG_CHARGE_BATCH;
+	long x, threshold = MEMCG_UPDATE_BATCH;
 
 	if (mem_cgroup_disabled())
 		return;
@@ -800,7 +800,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 {
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup *memcg;
-	long x, threshold = MEMCG_CHARGE_BATCH;
+	long x, threshold = MEMCG_UPDATE_BATCH;
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
@@ -905,7 +905,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 		return;
 
 	x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]);
-	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
+	if (unlikely(x > MEMCG_UPDATE_BATCH)) {
 		struct mem_cgroup *mi;
 
 		/*
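
As for the closing thought about computing the batch dynamically, one
purely hypothetical shape of such a heuristic is sketched below; the
function name, constants, and formula are illustrative assumptions, not
anything proposed by the patch. The trade-off is that the worst-case
drift of a global stat grows to roughly batch * nr_cpus pages:

/*
 * Hypothetical sketch only: scale the stats batch with the memcg limit
 * and CPU count, clamped to sane bounds. The name, constants, and
 * formula are illustrative assumptions, not part of the patch.
 */
unsigned long calc_update_batch(unsigned long limit_pages,
				unsigned int nr_cpus)
{
	/* let each CPU batch up to ~0.1% of the limit */
	unsigned long batch = limit_pages / (nr_cpus * 1000UL);

	if (batch < 32)		/* no lower than the charge batch */
		batch = 32;
	if (batch > 4096)	/* bound the worst-case stats drift */
		batch = 4096;
	return batch;
}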