From patchwork Tue Jul 24 22:46:35 2018
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 10543265
From: Shakeel Butt
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Greg Thelen, Bruce Merry, Andrew Morton
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Shakeel Butt
Subject: [PATCH] memcg: reduce memcg tree traversals for stats collection
Date: Tue, 24 Jul 2018 15:46:35 -0700
Message-Id: <20180724224635.143944-1-shakeelb@google.com>
X-Mailer: git-send-email 2.18.0.233.g985f88cf7e-goog

Currently cgroup-v1's memcg_stat_show traverses the memcg tree ~17 times
to collect the stats, while cgroup-v2's memory_stat_show traverses the
tree three times. On a large machine, a couple thousand memcgs is very
normal, and if the churn is high and memcgs stick around due to several
reasons, the memcg tree can grow to tens of thousands of nodes.

This patch refactors and shares the stat collection code between
cgroup-v1 and cgroup-v2 and reduces the tree traversal to just one.

I ran a simple benchmark which reads the root_mem_cgroup's stat file
1000 times in the presence of 2500 memcgs on cgroup-v1.
The results are:

Without the patch:
$ time ./read-root-stat-1000-times

real	0m1.663s
user	0m0.000s
sys	0m1.660s

With the patch:
$ time ./read-root-stat-1000-times

real	0m0.468s
user	0m0.000s
sys	0m0.467s

Signed-off-by: Shakeel Butt
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 150 +++++++++++++++++++++++-------------------------
 1 file changed, 73 insertions(+), 77 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a5869b9d5194..d90993ef1d7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3122,29 +3122,34 @@ static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css,
 	return retval;
 }
 
-static void tree_stat(struct mem_cgroup *memcg, unsigned long *stat)
-{
-	struct mem_cgroup *iter;
-	int i;
-
-	memset(stat, 0, sizeof(*stat) * MEMCG_NR_STAT);
-
-	for_each_mem_cgroup_tree(iter, memcg) {
-		for (i = 0; i < MEMCG_NR_STAT; i++)
-			stat[i] += memcg_page_state(iter, i);
-	}
-}
+struct accumulated_stats {
+	unsigned long stat[MEMCG_NR_STAT];
+	unsigned long events[NR_VM_EVENT_ITEMS];
+	unsigned long lru_pages[NR_LRU_LISTS];
+	const unsigned int *stats_array;
+	const unsigned int *events_array;
+	int stats_size;
+	int events_size;
+};
 
-static void tree_events(struct mem_cgroup *memcg, unsigned long *events)
+static void accumulate_memcg_tree(struct mem_cgroup *memcg,
+				  struct accumulated_stats *acc)
 {
-	struct mem_cgroup *iter;
+	struct mem_cgroup *mi;
 	int i;
 
-	memset(events, 0, sizeof(*events) * NR_VM_EVENT_ITEMS);
+	for_each_mem_cgroup_tree(mi, memcg) {
+		for (i = 0; i < acc->stats_size; i++)
+			acc->stat[i] += memcg_page_state(mi,
+				acc->stats_array ? acc->stats_array[i] : i);
 
-	for_each_mem_cgroup_tree(iter, memcg) {
-		for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
-			events[i] += memcg_sum_events(iter, i);
+		for (i = 0; i < acc->events_size; i++)
+			acc->events[i] += memcg_sum_events(mi,
+				acc->events_array ? acc->events_array[i] : i);
+
+		for (i = 0; i < NR_LRU_LISTS; i++)
+			acc->lru_pages[i] +=
+				mem_cgroup_nr_lru_pages(mi, BIT(i));
 	}
 }
 
@@ -3555,6 +3560,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
 	unsigned long memory, memsw;
 	struct mem_cgroup *mi;
 	unsigned int i;
+	struct accumulated_stats acc;
 
 	BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats));
 	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
@@ -3587,32 +3593,27 @@ static int memcg_stat_show(struct seq_file *m, void *v)
 		seq_printf(m, "hierarchical_memsw_limit %llu\n",
 			   (u64)memsw * PAGE_SIZE);
 
-	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
-		unsigned long long val = 0;
+	memset(&acc, 0, sizeof(acc));
+	acc.stats_size = ARRAY_SIZE(memcg1_stats);
+	acc.stats_array = memcg1_stats;
+	acc.events_size = ARRAY_SIZE(memcg1_events);
+	acc.events_array = memcg1_events;
+	accumulate_memcg_tree(memcg, &acc);
 
+	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
 		if (memcg1_stats[i] == MEMCG_SWAP && !do_memsw_account())
 			continue;
-		for_each_mem_cgroup_tree(mi, memcg)
-			val += memcg_page_state(mi, memcg1_stats[i]) *
-			PAGE_SIZE;
-		seq_printf(m, "total_%s %llu\n", memcg1_stat_names[i], val);
+		seq_printf(m, "total_%s %llu\n", memcg1_stat_names[i],
+			   (u64)acc.stat[i] * PAGE_SIZE);
 	}
 
-	for (i = 0; i < ARRAY_SIZE(memcg1_events); i++) {
-		unsigned long long val = 0;
-
-		for_each_mem_cgroup_tree(mi, memcg)
-			val += memcg_sum_events(mi, memcg1_events[i]);
-		seq_printf(m, "total_%s %llu\n", memcg1_event_names[i], val);
-	}
-
-	for (i = 0; i < NR_LRU_LISTS; i++) {
-		unsigned long long val = 0;
+	for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
+		seq_printf(m, "total_%s %llu\n", memcg1_event_names[i],
+			   (u64)acc.events[i]);
 
-		for_each_mem_cgroup_tree(mi, memcg)
-			val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
-		seq_printf(m, "total_%s %llu\n", mem_cgroup_lru_names[i], val);
-	}
+	for (i = 0; i < NR_LRU_LISTS; i++)
+		seq_printf(m, "total_%s %llu\n", mem_cgroup_lru_names[i],
+			   (u64)acc.lru_pages[i] * PAGE_SIZE);
 
 #ifdef CONFIG_DEBUG_VM
 	{
@@ -5737,8 +5738,7 @@ static int memory_events_show(struct seq_file *m, void *v)
 static int memory_stat_show(struct seq_file *m, void *v)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
-	unsigned long stat[MEMCG_NR_STAT];
-	unsigned long events[NR_VM_EVENT_ITEMS];
+	struct accumulated_stats acc;
 	int i;
 
 	/*
@@ -5752,66 +5752,62 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	 * Current memory state:
 	 */
 
-	tree_stat(memcg, stat);
-	tree_events(memcg, events);
+	memset(&acc, 0, sizeof(acc));
+	acc.stats_size = MEMCG_NR_STAT;
+	acc.events_size = NR_VM_EVENT_ITEMS;
+	accumulate_memcg_tree(memcg, &acc);
 
 	seq_printf(m, "anon %llu\n",
-		   (u64)stat[MEMCG_RSS] * PAGE_SIZE);
+		   (u64)acc.stat[MEMCG_RSS] * PAGE_SIZE);
 	seq_printf(m, "file %llu\n",
-		   (u64)stat[MEMCG_CACHE] * PAGE_SIZE);
+		   (u64)acc.stat[MEMCG_CACHE] * PAGE_SIZE);
 	seq_printf(m, "kernel_stack %llu\n",
-		   (u64)stat[MEMCG_KERNEL_STACK_KB] * 1024);
+		   (u64)acc.stat[MEMCG_KERNEL_STACK_KB] * 1024);
 	seq_printf(m, "slab %llu\n",
-		   (u64)(stat[NR_SLAB_RECLAIMABLE] +
-		   stat[NR_SLAB_UNRECLAIMABLE]) * PAGE_SIZE);
+		   (u64)(acc.stat[NR_SLAB_RECLAIMABLE] +
+		   acc.stat[NR_SLAB_UNRECLAIMABLE]) * PAGE_SIZE);
 	seq_printf(m, "sock %llu\n",
-		   (u64)stat[MEMCG_SOCK] * PAGE_SIZE);
+		   (u64)acc.stat[MEMCG_SOCK] * PAGE_SIZE);
 	seq_printf(m, "shmem %llu\n",
-		   (u64)stat[NR_SHMEM] * PAGE_SIZE);
+		   (u64)acc.stat[NR_SHMEM] * PAGE_SIZE);
 	seq_printf(m, "file_mapped %llu\n",
-		   (u64)stat[NR_FILE_MAPPED] * PAGE_SIZE);
+		   (u64)acc.stat[NR_FILE_MAPPED] * PAGE_SIZE);
 	seq_printf(m, "file_dirty %llu\n",
-		   (u64)stat[NR_FILE_DIRTY] * PAGE_SIZE);
+		   (u64)acc.stat[NR_FILE_DIRTY] * PAGE_SIZE);
 	seq_printf(m, "file_writeback %llu\n",
-		   (u64)stat[NR_WRITEBACK] * PAGE_SIZE);
+		   (u64)acc.stat[NR_WRITEBACK] * PAGE_SIZE);
 
-	for (i = 0; i < NR_LRU_LISTS; i++) {
-		struct mem_cgroup *mi;
-		unsigned long val = 0;
-
-		for_each_mem_cgroup_tree(mi, memcg)
-			val += mem_cgroup_nr_lru_pages(mi, BIT(i));
-		seq_printf(m, "%s %llu\n",
-			   mem_cgroup_lru_names[i], (u64)val * PAGE_SIZE);
-	}
+	for (i = 0; i < NR_LRU_LISTS; i++)
+		seq_printf(m, "%s %llu\n", mem_cgroup_lru_names[i],
+			   (u64)acc.lru_pages[i] * PAGE_SIZE);
 
 	seq_printf(m, "slab_reclaimable %llu\n",
-		   (u64)stat[NR_SLAB_RECLAIMABLE] * PAGE_SIZE);
+		   (u64)acc.stat[NR_SLAB_RECLAIMABLE] * PAGE_SIZE);
 	seq_printf(m, "slab_unreclaimable %llu\n",
-		   (u64)stat[NR_SLAB_UNRECLAIMABLE] * PAGE_SIZE);
+		   (u64)acc.stat[NR_SLAB_UNRECLAIMABLE] * PAGE_SIZE);
 
 	/* Accumulated memory events */
 
-	seq_printf(m, "pgfault %lu\n", events[PGFAULT]);
-	seq_printf(m, "pgmajfault %lu\n", events[PGMAJFAULT]);
+	seq_printf(m, "pgfault %lu\n", acc.events[PGFAULT]);
+	seq_printf(m, "pgmajfault %lu\n", acc.events[PGMAJFAULT]);
 
-	seq_printf(m, "pgrefill %lu\n", events[PGREFILL]);
-	seq_printf(m, "pgscan %lu\n", events[PGSCAN_KSWAPD] +
-		   events[PGSCAN_DIRECT]);
-	seq_printf(m, "pgsteal %lu\n", events[PGSTEAL_KSWAPD] +
-		   events[PGSTEAL_DIRECT]);
-	seq_printf(m, "pgactivate %lu\n", events[PGACTIVATE]);
-	seq_printf(m, "pgdeactivate %lu\n", events[PGDEACTIVATE]);
-	seq_printf(m, "pglazyfree %lu\n", events[PGLAZYFREE]);
-	seq_printf(m, "pglazyfreed %lu\n", events[PGLAZYFREED]);
+	seq_printf(m, "pgrefill %lu\n", acc.events[PGREFILL]);
+	seq_printf(m, "pgscan %lu\n", acc.events[PGSCAN_KSWAPD] +
+		   acc.events[PGSCAN_DIRECT]);
+	seq_printf(m, "pgsteal %lu\n", acc.events[PGSTEAL_KSWAPD] +
+		   acc.events[PGSTEAL_DIRECT]);
+	seq_printf(m, "pgactivate %lu\n", acc.events[PGACTIVATE]);
+	seq_printf(m, "pgdeactivate %lu\n", acc.events[PGDEACTIVATE]);
+	seq_printf(m, "pglazyfree %lu\n", acc.events[PGLAZYFREE]);
+	seq_printf(m, "pglazyfreed %lu\n", acc.events[PGLAZYFREED]);
 
 	seq_printf(m, "workingset_refault %lu\n",
-		   stat[WORKINGSET_REFAULT]);
+		   acc.stat[WORKINGSET_REFAULT]);
 	seq_printf(m, "workingset_activate %lu\n",
-		   stat[WORKINGSET_ACTIVATE]);
+		   acc.stat[WORKINGSET_ACTIVATE]);
 	seq_printf(m, "workingset_nodereclaim %lu\n",
-		   stat[WORKINGSET_NODERECLAIM]);
+		   acc.stat[WORKINGSET_NODERECLAIM]);
 
 	return 0;
 }