From patchwork Thu Sep 10 20:26:55 2020
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11769149
From: Roman Gushchin
To: Andrew Morton
CC: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin, Vlastimil Babka
Subject: [PATCH rfc 1/5] mm: memcg/slab: fix racy access to page->mem_cgroup in mem_cgroup_from_obj()
Date: Thu, 10 Sep 2020 13:26:55 -0700
Message-ID: <20200910202659.1378404-2-guro@fb.com>
In-Reply-To: <20200910202659.1378404-1-guro@fb.com>
References: <20200910202659.1378404-1-guro@fb.com>

mem_cgroup_from_obj() checks the lowest bit of the page->mem_cgroup
pointer to determine whether the page has an attached obj_cgroup vector
instead of a regular memcg pointer. If the bit is not set, it simply
returns the page->mem_cgroup value as a struct mem_cgroup pointer.

Commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches
for all allocations") changed the moment when this bit is set:
previously it was set when the slab page was allocated, but now it can
be set well after that, when the first accounted object is allocated on
the page. This opened a race: if page->mem_cgroup is set concurrently
after the first page_has_obj_cgroups(page) check, a pointer to the
obj_cgroups array can be returned as a memory cgroup pointer.

A simple NULL check of the page->mem_cgroup pointer before the
page_has_obj_cgroups() check fixes the race. Indeed, if the pointer is
not NULL, it's either a plain mem_cgroup pointer or a pointer to an
obj_cgroup vector. The pointer can be asynchronously changed from NULL
to (obj_cgroup_vec | 0x1UL), but it can't be changed from a valid memcg
pointer to an objcg vector or back. If the object passed to
mem_cgroup_from_obj() is a slab object and page->mem_cgroup is NULL, it
means that the object is not accounted, so the function must return
NULL.

I discovered the race by looking at the code; so far I haven't seen it
in the wild.
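To make the race window concrete, here is a minimal userspace model of
the two transitions involved; struct page, the helper names and the
atomics below are simplified stand-ins for illustration, not the kernel
implementation:

#include <stdatomic.h>

struct page { _Atomic unsigned long memcg_word; };

/* Writer (first accounted allocation on a slab page): the word can
 * only ever transition NULL -> (vec | 0x1UL); a plain memcg pointer
 * never turns into a tagged vector or back.
 */
static void attach_objcg_vector(struct page *page, void *vec)
{
	atomic_store(&page->memcg_word, (unsigned long)vec | 0x1UL);
}

/* Reader: without the early NULL check, the word could change from 0
 * to a tagged vector between a "has objcgs?" test and a later load,
 * and the tagged vector would be returned as a memcg pointer. With
 * the check, any non-NULL value keeps its kind from here on.
 */
static void *memcg_from_word(struct page *page)
{
	unsigned long val = atomic_load(&page->memcg_word);

	if (!val)
		return NULL;	/* the fix: not accounted (yet) */
	if (val & 0x1UL)
		return NULL;	/* objcgs vector, not a memcg */
	return (void *)val;	/* plain memcg pointer */
}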
Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Roman Gushchin
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Shakeel Butt
---
 mm/memcontrol.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 75cd1a1e66c8..093526fec4bf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2923,6 +2923,17 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
 	page = virt_to_head_page(p);
 
+	/*
+	 * If page->mem_cgroup is set, it's either a simple mem_cgroup pointer
+	 * or a pointer to obj_cgroup vector. In the latter case the lowest
+	 * bit of the pointer is set.
+	 * The page->mem_cgroup pointer can be asynchronously changed
+	 * from NULL to (obj_cgroup_vec | 0x1UL), but can't be changed
+	 * from a valid memcg pointer to objcg vector or back.
+	 */
+	if (!page->mem_cgroup)
+		return NULL;
+
 	/*
 	 * Slab objects are accounted individually, not per-page.
 	 * Memcg membership data for each individual object is saved in

From patchwork Thu Sep 10 20:26:56 2020
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11769139
From: Roman Gushchin
To: Andrew Morton
CC: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin
Subject: [PATCH rfc 2/5] mm: memcontrol: use helpers to access page's memcg data
Date: Thu, 10 Sep 2020 13:26:56 -0700
Message-ID: <20200910202659.1378404-3-guro@fb.com>
In-Reply-To: <20200910202659.1378404-1-guro@fb.com>
References: <20200910202659.1378404-1-guro@fb.com>

Currently there are many open-coded reads and writes of the
page->mem_cgroup pointer, as well as a couple of barely used read
helpers. This creates an obstacle to reusing some bits of the pointer
for storing additional information. In fact, we already do this for
slab pages, where the lowest bit indicates that the pointer has an
attached vector of objcg pointers instead of a regular memcg pointer;
a minimal sketch of this pointer-tagging trick follows below.
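The sketch models the tagging idea only; the names are illustrative,
not the kernel's. Word-aligned pointers have their low bit free, so it
can carry a "what kind of pointer is this" flag:

#define OBJCGS_TAG 0x1UL

static unsigned long tag_objcgs(void *vec)
{
	/* alignment guarantees the low bit of a real pointer is 0 */
	return (unsigned long)vec | OBJCGS_TAG;
}

static int is_objcgs(unsigned long word)
{
	return word & OBJCGS_TAG;
}

static void *untag_ptr(unsigned long word)
{
	return (void *)(word & ~OBJCGS_TAG);
}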
This commit introduces 4 new helper functions and converts all raw
accesses to page->mem_cgroup into calls to these helpers:

struct mem_cgroup *page_mem_cgroup(struct page *page);
struct mem_cgroup *page_mem_cgroup_check(struct page *page);
void set_page_mem_cgroup(struct page *page, struct mem_cgroup *memcg);
void clear_page_mem_cgroup(struct page *page);

page_mem_cgroup_check() is intended for cases when the page can be a
slab page whose memcg pointer points at an objcg vector. It checks the
lowest bit and, if it is set, returns NULL. page_mem_cgroup() contains
a VM_BUG_ON_PAGE() check that the page is not a slab page, and so do
set_page_mem_cgroup() and clear_page_mem_cgroup().

To make sure nobody uses direct access, struct page's
mem_cgroup/obj_cgroups union is converted to an unsigned long
memcg_data field. Only the new helpers and a couple of slab-accounting
related functions access this field directly.

The page_memcg() and page_memcg_rcu() helpers defined in mm.h are
removed. The new page_mem_cgroup() is a direct analog of page_memcg(),
while page_memcg_rcu() has a single call site in a small rcu-read-lock
section, so it's not worth keeping a separate helper; it's replaced
with page_mem_cgroup() too.

Signed-off-by: Roman Gushchin
Reviewed-by: Shakeel Butt
---
 include/linux/memcontrol.h       |  72 +++++++++++++++++---
 include/linux/mm.h               |  22 ------
 include/linux/mm_types.h         |   5 +-
 include/trace/events/writeback.h |   2 +-
 mm/debug.c                       |   4 +-
 mm/huge_memory.c                 |   4 +-
 mm/memcontrol.c                  | 111 +++++++++++++------------------
 mm/migrate.c                     |   2 +-
 mm/page_alloc.c                  |   4 +-
 mm/page_io.c                     |   4 +-
 mm/slab.h                        |   7 +-
 mm/workingset.c                  |   4 +-
 12 files changed, 124 insertions(+), 117 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 924177502479..0997220c84ce 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -340,6 +340,41 @@ struct mem_cgroup {
 
 extern struct mem_cgroup *root_mem_cgroup;
 
+static inline struct mem_cgroup *page_mem_cgroup(struct page *page)
+{
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	return (struct mem_cgroup *)page->memcg_data;
+}
+
+static inline struct mem_cgroup *page_mem_cgroup_check(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	/*
+	 * The lowest bit set means that memcg isn't a valid
+	 * memcg pointer, but a obj_cgroups pointer.
+	 * In this case the page is shared and doesn't belong
+	 * to any specific memory cgroup.
+ */ + if (memcg_data & 0x1UL) + return NULL; + + return (struct mem_cgroup *)memcg_data; +} + +static inline void set_page_mem_cgroup(struct page *page, + struct mem_cgroup *memcg) +{ + VM_BUG_ON_PAGE(PageSlab(page), page); + page->memcg_data = (unsigned long)memcg; +} + +static inline void clear_page_mem_cgroup(struct page *page) +{ + VM_BUG_ON_PAGE(PageSlab(page), page); + page->memcg_data = 0; +} + static __always_inline bool memcg_stat_item_in_bytes(int idx) { if (idx == MEMCG_PERCPU_B) @@ -740,15 +775,15 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg, static inline void __mod_memcg_page_state(struct page *page, int idx, int val) { - if (page->mem_cgroup) - __mod_memcg_state(page->mem_cgroup, idx, val); + if (page_mem_cgroup(page)) + __mod_memcg_state(page_mem_cgroup(page), idx, val); } static inline void mod_memcg_page_state(struct page *page, int idx, int val) { - if (page->mem_cgroup) - mod_memcg_state(page->mem_cgroup, idx, val); + if (page_mem_cgroup(page)) + mod_memcg_state(page_mem_cgroup(page), idx, val); } static inline unsigned long lruvec_page_state(struct lruvec *lruvec, @@ -835,12 +870,12 @@ static inline void __mod_lruvec_page_state(struct page *page, struct lruvec *lruvec; /* Untracked pages have no memcg, no lruvec. Update only the node */ - if (!head->mem_cgroup) { + if (!page_mem_cgroup(head)) { __mod_node_page_state(pgdat, idx, val); return; } - lruvec = mem_cgroup_lruvec(head->mem_cgroup, pgdat); + lruvec = mem_cgroup_lruvec(page_mem_cgroup(head), pgdat); __mod_lruvec_state(lruvec, idx, val); } @@ -875,8 +910,8 @@ static inline void count_memcg_events(struct mem_cgroup *memcg, static inline void count_memcg_page_event(struct page *page, enum vm_event_item idx) { - if (page->mem_cgroup) - count_memcg_events(page->mem_cgroup, idx, 1); + if (page_mem_cgroup(page)) + count_memcg_events(page_mem_cgroup(page), idx, 1); } static inline void count_memcg_event_mm(struct mm_struct *mm, @@ -938,6 +973,25 @@ void mem_cgroup_split_huge_fixup(struct page *head); struct mem_cgroup; +static inline struct mem_cgroup *page_mem_cgroup(struct page *page) +{ + return NULL; +} + +static inline struct mem_cgroup *page_mem_cgroup_check(struct page *page) +{ + return NULL; +} + +static inline void set_page_mem_cgroup(struct page *page, + struct mem_cgroup *memcg) +{ +} + +static inline void clear_page_mem_cgroup(struct page *page) +{ +} + static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg) { return true; @@ -1427,7 +1481,7 @@ static inline void mem_cgroup_track_foreign_dirty(struct page *page, if (mem_cgroup_disabled()) return; - if (unlikely(&page->mem_cgroup->css != wb->memcg_css)) + if (unlikely(&page_mem_cgroup(page)->css != wb->memcg_css)) mem_cgroup_track_foreign_dirty_slowpath(page, wb); } diff --git a/include/linux/mm.h b/include/linux/mm.h index 517751310dd2..dc4eb3b150fe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1474,28 +1474,6 @@ static inline void set_page_links(struct page *page, enum zone_type zone, #endif } -#ifdef CONFIG_MEMCG -static inline struct mem_cgroup *page_memcg(struct page *page) -{ - return page->mem_cgroup; -} -static inline struct mem_cgroup *page_memcg_rcu(struct page *page) -{ - WARN_ON_ONCE(!rcu_read_lock_held()); - return READ_ONCE(page->mem_cgroup); -} -#else -static inline struct mem_cgroup *page_memcg(struct page *page) -{ - return NULL; -} -static inline struct mem_cgroup *page_memcg_rcu(struct page *page) -{ - WARN_ON_ONCE(!rcu_read_lock_held()); - return NULL; -} -#endif - /* * Some inline functions in 
vmstat.h depend on page_zone() */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 496c3ff97cce..4856d23b1161 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -199,10 +199,7 @@ struct page { atomic_t _refcount; #ifdef CONFIG_MEMCG - union { - struct mem_cgroup *mem_cgroup; - struct obj_cgroup **obj_cgroups; - }; + unsigned long memcg_data; #endif /* diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h index e7cbccc7c14c..b1fa3ac64fa5 100644 --- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -257,7 +257,7 @@ TRACE_EVENT(track_foreign_dirty, __entry->ino = inode ? inode->i_ino : 0; __entry->memcg_id = wb->memcg_css->id; __entry->cgroup_ino = __trace_wb_assign_cgroup(wb); - __entry->page_cgroup_ino = cgroup_ino(page->mem_cgroup->css.cgroup); + __entry->page_cgroup_ino = cgroup_ino(page_mem_cgroup(page)->css.cgroup); ), TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%lu page_cgroup_ino=%lu", diff --git a/mm/debug.c b/mm/debug.c index ccca576b2899..55d1c42c7da8 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -182,8 +182,8 @@ void __dump_page(struct page *page, const char *reason) pr_warn("page dumped because: %s\n", reason); #ifdef CONFIG_MEMCG - if (!page_poisoned && page->mem_cgroup) - pr_warn("page->mem_cgroup:%px\n", page->mem_cgroup); + if (!page_poisoned && page_mem_cgroup(page)) + pr_warn("page->mem_cgroup:%px\n", page_mem_cgroup(page)); #endif } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2a468a4acb0a..7ca3baea9810 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -470,7 +470,7 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) #ifdef CONFIG_MEMCG static inline struct deferred_split *get_deferred_split_queue(struct page *page) { - struct mem_cgroup *memcg = compound_head(page)->mem_cgroup; + struct mem_cgroup *memcg = page_mem_cgroup(compound_head(page)); struct pglist_data *pgdat = NODE_DATA(page_to_nid(page)); if (memcg) @@ -2728,7 +2728,7 @@ void deferred_split_huge_page(struct page *page) { struct deferred_split *ds_queue = get_deferred_split_queue(page); #ifdef CONFIG_MEMCG - struct mem_cgroup *memcg = compound_head(page)->mem_cgroup; + struct mem_cgroup *memcg = page_mem_cgroup(compound_head(page)); #endif unsigned long flags; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 093526fec4bf..19180674e38a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -541,7 +541,7 @@ struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page) { struct mem_cgroup *memcg; - memcg = page->mem_cgroup; + memcg = page_mem_cgroup(page); if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) memcg = root_mem_cgroup; @@ -568,16 +568,7 @@ ino_t page_cgroup_ino(struct page *page) unsigned long ino = 0; rcu_read_lock(); - memcg = page->mem_cgroup; - - /* - * The lowest bit set means that memcg isn't a valid - * memcg pointer, but a obj_cgroups pointer. - * In this case the page is shared and doesn't belong - * to any specific memory cgroup. 
- */ - if ((unsigned long) memcg & 0x1UL) - memcg = NULL; + memcg = page_mem_cgroup_check(page); while (memcg && !(memcg->css.flags & CSS_ONLINE)) memcg = parent_mem_cgroup(memcg); @@ -1058,7 +1049,7 @@ EXPORT_SYMBOL(get_mem_cgroup_from_mm); */ struct mem_cgroup *get_mem_cgroup_from_page(struct page *page) { - struct mem_cgroup *memcg = page->mem_cgroup; + struct mem_cgroup *memcg = page_mem_cgroup(page); if (mem_cgroup_disabled()) return NULL; @@ -1343,7 +1334,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg, * @page: the page * @pgdat: pgdat of the page * - * This function relies on page->mem_cgroup being stable - see the + * This function relies on page and memcg binding being stable - see the * access rules in commit_charge(). */ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat) @@ -1357,7 +1348,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd goto out; } - memcg = page->mem_cgroup; + memcg = page_mem_cgroup(page); /* * Swapcache readahead pages are added to the LRU - and * possibly migrated - before they are charged. @@ -2100,7 +2091,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg) } /** - * lock_page_memcg - lock a page->mem_cgroup binding + * lock_page_memcg - lock a page and memcg binding * @page: the page * * This function protects unlocked LRU pages from being moved to @@ -2132,7 +2123,7 @@ struct mem_cgroup *lock_page_memcg(struct page *page) if (mem_cgroup_disabled()) return NULL; again: - memcg = head->mem_cgroup; + memcg = page_mem_cgroup(head); if (unlikely(!memcg)) return NULL; @@ -2140,7 +2131,7 @@ struct mem_cgroup *lock_page_memcg(struct page *page) return memcg; spin_lock_irqsave(&memcg->move_lock, flags); - if (memcg != head->mem_cgroup) { + if (memcg != page_mem_cgroup(head)) { spin_unlock_irqrestore(&memcg->move_lock, flags); goto again; } @@ -2178,14 +2169,14 @@ void __unlock_page_memcg(struct mem_cgroup *memcg) } /** - * unlock_page_memcg - unlock a page->mem_cgroup binding + * unlock_page_memcg - unlock a page and memcg binding * @page: the page */ void unlock_page_memcg(struct page *page) { struct page *head = compound_head(page); - __unlock_page_memcg(head->mem_cgroup); + __unlock_page_memcg(page_mem_cgroup(head)); } EXPORT_SYMBOL(unlock_page_memcg); @@ -2875,16 +2866,16 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages) static void commit_charge(struct page *page, struct mem_cgroup *memcg) { - VM_BUG_ON_PAGE(page->mem_cgroup, page); + VM_BUG_ON_PAGE(page_mem_cgroup(page), page); /* - * Any of the following ensures page->mem_cgroup stability: + * Any of the following ensures page and memcg binding stability: * * - the page lock * - LRU isolation * - lock_page_memcg() * - exclusive reference */ - page->mem_cgroup = memcg; + set_page_mem_cgroup(page, memcg); } #ifdef CONFIG_MEMCG_KMEM @@ -2899,8 +2890,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, if (!vec) return -ENOMEM; - if (cmpxchg(&page->obj_cgroups, NULL, - (struct obj_cgroup **) ((unsigned long)vec | 0x1UL))) + if (cmpxchg(&page->memcg_data, 0, (unsigned long)vec | 0x1UL)) kfree(vec); else kmemleak_not_leak(vec); @@ -2923,17 +2913,6 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p) page = virt_to_head_page(p); - /* - * If page->mem_cgroup is set, it's either a simple mem_cgroup pointer - * or a pointer to obj_cgroup vector. In the latter case the lowest - * bit of the pointer is set. 
- * The page->mem_cgroup pointer can be asynchronously changed - * from NULL to (obj_cgroup_vec | 0x1UL), but can't be changed - * from a valid memcg pointer to objcg vector or back. - */ - if (!page->mem_cgroup) - return NULL; - /* * Slab objects are accounted individually, not per-page. * Memcg membership data for each individual object is saved in @@ -2952,7 +2931,7 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p) } /* All other pages use page->mem_cgroup */ - return page->mem_cgroup; + return page_mem_cgroup_check(page); } __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void) @@ -3090,7 +3069,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order) if (memcg && !mem_cgroup_is_root(memcg)) { ret = __memcg_kmem_charge(memcg, gfp, 1 << order); if (!ret) { - page->mem_cgroup = memcg; + set_page_mem_cgroup(page, memcg); __SetPageKmemcg(page); return 0; } @@ -3106,7 +3085,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order) */ void __memcg_kmem_uncharge_page(struct page *page, int order) { - struct mem_cgroup *memcg = page->mem_cgroup; + struct mem_cgroup *memcg = page_mem_cgroup(page); unsigned int nr_pages = 1 << order; if (!memcg) @@ -3114,7 +3093,7 @@ void __memcg_kmem_uncharge_page(struct page *page, int order) VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page); __memcg_kmem_uncharge(memcg, nr_pages); - page->mem_cgroup = NULL; + clear_page_mem_cgroup(page); css_put(&memcg->css); /* slab pages do not have PageKmemcg flag set */ @@ -3265,7 +3244,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size) */ void mem_cgroup_split_huge_fixup(struct page *head) { - struct mem_cgroup *memcg = head->mem_cgroup; + struct mem_cgroup *memcg = page_mem_cgroup(head); int i; if (mem_cgroup_disabled()) @@ -3273,7 +3252,7 @@ void mem_cgroup_split_huge_fixup(struct page *head) for (i = 1; i < HPAGE_PMD_NR; i++) { css_get(&memcg->css); - head[i].mem_cgroup = memcg; + set_page_mem_cgroup(&head[i], memcg); } } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -4649,7 +4628,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, void mem_cgroup_track_foreign_dirty_slowpath(struct page *page, struct bdi_writeback *wb) { - struct mem_cgroup *memcg = page->mem_cgroup; + struct mem_cgroup *memcg = page_mem_cgroup(page); struct memcg_cgwb_frn *frn; u64 now = get_jiffies_64(); u64 oldest_at = now; @@ -5641,14 +5620,14 @@ static int mem_cgroup_move_account(struct page *page, /* * Prevent mem_cgroup_migrate() from looking at - * page->mem_cgroup of its source page while we change it. + * page_mem_cgroup(page) of its source page while we change it. */ ret = -EBUSY; if (!trylock_page(page)) goto out; ret = -EINVAL; - if (page->mem_cgroup != from) + if (page_mem_cgroup(page) != from) goto out_unlock; pgdat = page_pgdat(page); @@ -5703,13 +5682,13 @@ static int mem_cgroup_move_account(struct page *page, /* * All state has been migrated, let's switch to the new memcg. * - * It is safe to change page->mem_cgroup here because the page + * It is safe to change page's memcg here because the page * is referenced, charged, isolated, and locked: we can't race * with (un)charging, migration, LRU putback, or anything else - * that would rely on a stable page->mem_cgroup. + * that would rely on a stable page_mem_cgroup(page). * * Note that lock_page_memcg is a memcg lock, not a page lock, - * to save space. As soon as we switch page->mem_cgroup to a + * to save space. 
As soon as we switch page_mem_cgroup(page) to a * new memcg that isn't locked, the above state can change * concurrently again. Make sure we're truly done with it. */ @@ -5718,7 +5697,7 @@ static int mem_cgroup_move_account(struct page *page, css_get(&to->css); css_put(&from->css); - page->mem_cgroup = to; + set_page_mem_cgroup(page, to); __unlock_page_memcg(from); @@ -5784,7 +5763,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, * mem_cgroup_move_account() checks the page is valid or * not under LRU exclusion. */ - if (page->mem_cgroup == mc.from) { + if (page_mem_cgroup(page) == mc.from) { ret = MC_TARGET_PAGE; if (is_device_private_page(page)) ret = MC_TARGET_DEVICE; @@ -5828,7 +5807,7 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, VM_BUG_ON_PAGE(!page || !PageHead(page), page); if (!(mc.flags & MOVE_ANON)) return ret; - if (page->mem_cgroup == mc.from) { + if (page_mem_cgroup(page) == mc.from) { ret = MC_TARGET_PAGE; if (target) { get_page(page); @@ -6739,12 +6718,12 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) /* * Every swap fault against a single page tries to charge the * page, bail as early as possible. shmem_unuse() encounters - * already charged pages, too. page->mem_cgroup is protected - * by the page lock, which serializes swap cache removal, which - * in turn serializes uncharging. + * already charged pages, too. page and memcg binding is + * protected by the page lock, which serializes swap cache + * removal, which in turn serializes uncharging. */ VM_BUG_ON_PAGE(!PageLocked(page), page); - if (compound_head(page)->mem_cgroup) + if (page_mem_cgroup(compound_head(page))) goto out; id = lookup_swap_cgroup_id(ent); @@ -6828,21 +6807,21 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug) VM_BUG_ON_PAGE(PageLRU(page), page); - if (!page->mem_cgroup) + if (!page_mem_cgroup(page)) return; /* * Nobody should be changing or seriously looking at - * page->mem_cgroup at this point, we have fully + * page_mem_cgroup(page) at this point, we have fully * exclusive access to the page. */ - if (ug->memcg != page->mem_cgroup) { + if (ug->memcg != page_mem_cgroup(page)) { if (ug->memcg) { uncharge_batch(ug); uncharge_gather_clear(ug); } - ug->memcg = page->mem_cgroup; + ug->memcg = page_mem_cgroup(page); /* pairs with css_put in uncharge_batch */ css_get(&ug->memcg->css); @@ -6859,7 +6838,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug) } ug->dummy_page = page; - page->mem_cgroup = NULL; + clear_page_mem_cgroup(page); css_put(&ug->memcg->css); } @@ -6902,7 +6881,7 @@ void mem_cgroup_uncharge(struct page *page) return; /* Don't touch page->lru of any random page, pre-check: */ - if (!page->mem_cgroup) + if (!page_mem_cgroup(page)) return; uncharge_gather_clear(&ug); @@ -6952,11 +6931,11 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage) return; /* Page cache replacement: new page already charged? 
*/ - if (newpage->mem_cgroup) + if (page_mem_cgroup(newpage)) return; /* Swapcache readahead pages can get replaced before being charged */ - memcg = oldpage->mem_cgroup; + memcg = page_mem_cgroup(oldpage); if (!memcg) return; @@ -7151,7 +7130,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) return; - memcg = page->mem_cgroup; + memcg = page_mem_cgroup(page); /* Readahead page, never charged */ if (!memcg) @@ -7172,7 +7151,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) VM_BUG_ON_PAGE(oldid, page); mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries); - page->mem_cgroup = NULL; + clear_page_mem_cgroup(page); if (!mem_cgroup_is_root(memcg)) page_counter_uncharge(&memcg->memory, nr_entries); @@ -7215,7 +7194,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry) if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return 0; - memcg = page->mem_cgroup; + memcg = page_mem_cgroup(page); /* Readahead page, never charged */ if (!memcg) @@ -7296,7 +7275,7 @@ bool mem_cgroup_swap_full(struct page *page) if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) return false; - memcg = page->mem_cgroup; + memcg = page_mem_cgroup(page); if (!memcg) return false; diff --git a/mm/migrate.c b/mm/migrate.c index 659d3d8a3e1f..6c3b542395e7 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -493,7 +493,7 @@ int migrate_page_move_mapping(struct address_space *mapping, struct lruvec *old_lruvec, *new_lruvec; struct mem_cgroup *memcg; - memcg = page_memcg(page); + memcg = page_mem_cgroup(page); old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat); new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0d9f9bd0e06c..a707671f3b6c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1056,7 +1056,7 @@ static inline bool page_expected_state(struct page *page, if (unlikely((unsigned long)page->mapping | page_ref_count(page) | #ifdef CONFIG_MEMCG - (unsigned long)page->mem_cgroup | + (unsigned long)page_mem_cgroup(page) | #endif (page->flags & check_flags))) return false; @@ -1081,7 +1081,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags) bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set"; } #ifdef CONFIG_MEMCG - if (unlikely(page->mem_cgroup)) + if (unlikely(page_mem_cgroup(page))) bad_reason = "page still charged to cgroup"; #endif return bad_reason; diff --git a/mm/page_io.c b/mm/page_io.c index dc6de6962612..ffa3a7d20c58 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -282,11 +282,11 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) { struct cgroup_subsys_state *css; - if (!page->mem_cgroup) + if (!page_mem_cgroup(page)) return; rcu_read_lock(); - css = cgroup_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys); + css = cgroup_e_css(page_mem_cgroup(page)->css.cgroup, &io_cgrp_subsys); bio_associate_blkg_from_css(bio, css); rcu_read_unlock(); } diff --git a/mm/slab.h b/mm/slab.h index 4a24e1702923..b9787c4d6e78 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -247,13 +247,12 @@ static inline struct obj_cgroup **page_obj_cgroups(struct page *page) * that the page is a slab page (e.g. page_cgroup_ino()), let's * always set the lowest bit of obj_cgroups. 
 	 */
-	return (struct obj_cgroup **)
-		((unsigned long)page->obj_cgroups & ~0x1UL);
+	return (struct obj_cgroup **)(page->memcg_data & ~0x1UL);
 }
 
 static inline bool page_has_obj_cgroups(struct page *page)
 {
-	return ((unsigned long)page->obj_cgroups & 0x1UL);
+	return page->memcg_data & 0x1UL;
 }
 
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
@@ -262,7 +261,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 static inline void memcg_free_page_obj_cgroups(struct page *page)
 {
 	kfree(page_obj_cgroups(page));
-	page->obj_cgroups = NULL;
+	page->memcg_data = 0;
 }
 
 static inline size_t obj_full_size(struct kmem_cache *s)
diff --git a/mm/workingset.c b/mm/workingset.c
index 92e66113a577..c0b7a4faa0d8 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -345,7 +345,7 @@ void workingset_refault(struct page *page, void *shadow)
 	 * However, the cgroup that will own the page is the one that
 	 * is actually experiencing the refault event.
 	 */
-	memcg = page_memcg(page);
+	memcg = page_mem_cgroup(page);
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	inc_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file);
@@ -407,7 +407,7 @@ void workingset_activation(struct page *page)
 	 * XXX: See workingset_refault() - this should return
 	 * root_mem_cgroup even for !CONFIG_MEMCG.
 	 */
-	memcg = page_memcg_rcu(page);
+	memcg = page_mem_cgroup(page);
 	if (!mem_cgroup_disabled() && !memcg)
 		goto out;
 	lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));

From patchwork Thu Sep 10 20:26:57 2020
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11769135
From: Roman Gushchin
To: Andrew Morton
CC: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin
Subject: [PATCH rfc 3/5] mm: memcontrol/slab: use helpers to access slab page's memcg_data
Date: Thu, 10 Sep 2020 13:26:57 -0700
Message-ID: <20200910202659.1378404-4-guro@fb.com>
In-Reply-To: <20200910202659.1378404-1-guro@fb.com>
References: <20200910202659.1378404-1-guro@fb.com>

To gather all direct accesses to struct page's memcg_data field in one
place, let's introduce 4 new helper functions to use in the slab
accounting code: struct obj_cgroup **page_obj_cgroups(struct page *page); struct obj_cgroup
**page_obj_cgroups_check(struct page *page); bool set_page_obj_cgroups(struct page *page, struct obj_cgroup **objcgs); void clear_page_obj_cgroups(struct page *page); They are similar to the corresponding API for generic pages, except that the setter can return false, indicating that the value has been already set from a different thread. Signed-off-by: Roman Gushchin Reviewed-by: Shakeel Butt --- include/linux/memcontrol.h | 48 ++++++++++++++++++++++++++++++++++++++ mm/memcontrol.c | 4 ++-- mm/slab.h | 27 +++------------------ 3 files changed, 53 insertions(+), 26 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0997220c84ce..48d4c2c1ce81 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -375,6 +375,54 @@ static inline void clear_page_mem_cgroup(struct page *page) page->memcg_data = 0; } +#ifdef CONFIG_MEMCG_KMEM +static inline struct obj_cgroup **page_obj_cgroups(struct page *page) +{ + return (struct obj_cgroup **)(page->memcg_data & ~0x1UL); +} + +static inline struct obj_cgroup **page_obj_cgroups_check(struct page *page) +{ + unsigned long memcg_data = page->memcg_data; + + if (memcg_data && (memcg_data & 0x1UL)) + return (struct obj_cgroup **)memcg_data; + + return NULL; +} + +static inline bool set_page_obj_cgroups(struct page *page, + struct obj_cgroup **objcgs) +{ + return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs | 0x1UL); +} + +static inline void clear_page_obj_cgroups(struct page *page) +{ + page->memcg_data = 0; +} +#else +static inline struct obj_cgroup **page_obj_cgroups(struct page *page) +{ + return NULL; +} + +static inline struct obj_cgroup **page_obj_cgroups_check(struct page *page) +{ + return NULL; +} + +static inline bool set_page_obj_cgroups(struct page *page, + struct obj_cgroup **objcgs) +{ + return true; +} + +static inline void clear_page_obj_cgroups(struct page *page) +{ +} +#endif + static __always_inline bool memcg_stat_item_in_bytes(int idx) { if (idx == MEMCG_PERCPU_B) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 19180674e38a..ba9b053b1b88 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2890,7 +2890,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, if (!vec) return -ENOMEM; - if (cmpxchg(&page->memcg_data, 0, (unsigned long)vec | 0x1UL)) + if (!set_page_obj_cgroups(page, vec)) kfree(vec); else kmemleak_not_leak(vec); @@ -2918,7 +2918,7 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p) * Memcg membership data for each individual object is saved in * the page->obj_cgroups. */ - if (page_has_obj_cgroups(page)) { + if (page_obj_cgroups_check(page)) { struct obj_cgroup *objcg; unsigned int off; diff --git a/mm/slab.h b/mm/slab.h index b9787c4d6e78..9a46ab76cb61 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -239,29 +239,13 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla } #ifdef CONFIG_MEMCG_KMEM -static inline struct obj_cgroup **page_obj_cgroups(struct page *page) -{ - /* - * page->mem_cgroup and page->obj_cgroups are sharing the same - * space. To distinguish between them in case we don't know for sure - * that the page is a slab page (e.g. page_cgroup_ino()), let's - * always set the lowest bit of obj_cgroups. 
-	 */
-	return (struct obj_cgroup **)(page->memcg_data & ~0x1UL);
-}
-
-static inline bool page_has_obj_cgroups(struct page *page)
-{
-	return page->memcg_data & 0x1UL;
-}
-
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 				 gfp_t gfp);
 
 static inline void memcg_free_page_obj_cgroups(struct page *page)
 {
 	kfree(page_obj_cgroups(page));
-	page->memcg_data = 0;
+	clear_page_obj_cgroups(page);
 }
 
 static inline size_t obj_full_size(struct kmem_cache *s)
@@ -322,7 +306,7 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 
 		if (likely(p[i])) {
 			page = virt_to_head_page(p[i]);
-			if (!page_has_obj_cgroups(page) &&
+			if (!page_obj_cgroups(page) &&
 			    memcg_alloc_page_obj_cgroups(page, s, flags)) {
 				obj_cgroup_uncharge(objcg, obj_full_size(s));
 				continue;
@@ -349,7 +333,7 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
 	if (!memcg_kmem_enabled())
 		return;
 
-	if (!page_has_obj_cgroups(page))
+	if (!page_obj_cgroups(page))
 		return;
 
 	off = obj_to_index(s, page, p);
@@ -367,11 +351,6 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
 }
 
 #else /* CONFIG_MEMCG_KMEM */
-static inline bool page_has_obj_cgroups(struct page *page)
-{
-	return false;
-}
-
 static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
 {
 	return NULL;

From patchwork Thu Sep 10 20:26:58 2020
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11769151
From: Roman Gushchin
To: Andrew Morton
CC: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin
Subject: [PATCH rfc 4/5] mm: introduce page memcg flags
Date: Thu, 10 Sep 2020 13:26:58 -0700
Message-ID: <20200910202659.1378404-5-guro@fb.com>
In-Reply-To: <20200910202659.1378404-1-guro@fb.com>
References: <20200910202659.1378404-1-guro@fb.com>

The lowest bit in page->memcg_data is used to distinguish between a
struct mem_cgroup pointer and a pointer to an objcgs array. All checks
and modifications of this bit are open-coded. Let's formalize it using
page memcg flags, defined in the page_memcg_flags enum, and replace all
open-coded accesses with test_bit()/__set_bit(). A few additional flags
might be added later. The flags are intended to be mutually exclusive;
a minimal model of the scheme follows below.
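In the model below, plain shifts stand in for the kernel's
test_bit()/__set_bit(); only the enum mirrors the patch, the helper
names are illustrative:

enum page_memcg_flags {
	PG_MEMCG_OBJ_CGROUPS,	/* memcg_data points to an objcgs vector */
};

/* Userspace stand-ins for test_bit()/__set_bit() on a local word. */
int memcg_data_test_bit(int nr, unsigned long word)
{
	return (word >> nr) & 1UL;
}

unsigned long memcg_data_set_bit(int nr, unsigned long word)
{
	return word | (1UL << nr);
}

/* A set PG_MEMCG_OBJ_CGROUPS bit means "not a memcg pointer", so the
 * checked accessor returns NULL, as page_mem_cgroup_check() does in
 * the patch.
 */
void *memcg_from_data(unsigned long memcg_data)
{
	if (memcg_data_test_bit(PG_MEMCG_OBJ_CGROUPS, memcg_data))
		return NULL;
	return (void *)memcg_data;
}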
Signed-off-by: Roman Gushchin
---
 include/linux/memcontrol.h | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 48d4c2c1ce81..7ab5f92bb686 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -340,23 +340,25 @@ struct mem_cgroup {
 
 extern struct mem_cgroup *root_mem_cgroup;
 
+enum page_memcg_flags {
+	/* page->memcg_data is a pointer to a objcgs vector */
+	PG_MEMCG_OBJ_CGROUPS,
+};
+
 static inline struct mem_cgroup *page_mem_cgroup(struct page *page)
 {
+	unsigned long memcg_data = page->memcg_data;
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
-	return (struct mem_cgroup *)page->memcg_data;
+
+	return (struct mem_cgroup *)memcg_data;
 }
 
 static inline struct mem_cgroup *page_mem_cgroup_check(struct page *page)
 {
 	unsigned long memcg_data = page->memcg_data;
 
-	/*
-	 * The lowest bit set means that memcg isn't a valid
-	 * memcg pointer, but a obj_cgroups pointer.
-	 * In this case the page is shared and doesn't belong
-	 * to any specific memory cgroup.
-	 */
-	if (memcg_data & 0x1UL)
+	if (test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data))
 		return NULL;
 
 	return (struct mem_cgroup *)memcg_data;
@@ -378,14 +380,20 @@ static inline void clear_page_mem_cgroup(struct page *page)
 #ifdef CONFIG_MEMCG_KMEM
 static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
 {
-	return (struct obj_cgroup **)(page->memcg_data & ~0x1UL);
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(memcg_data && !test_bit(PG_MEMCG_OBJ_CGROUPS,
+					       &memcg_data), page);
+	__clear_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data);
+
+	return (struct obj_cgroup **)memcg_data;
 }
 
 static inline struct obj_cgroup **page_obj_cgroups_check(struct page *page)
 {
 	unsigned long memcg_data = page->memcg_data;
 
-	if (memcg_data && (memcg_data & 0x1UL))
+	if (memcg_data && test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data))
 		return (struct obj_cgroup **)memcg_data;
 
 	return NULL;
@@ -394,7 +402,11 @@ static inline struct obj_cgroup **page_obj_cgroups_check(struct page *page)
 static inline bool set_page_obj_cgroups(struct page *page,
 					struct obj_cgroup **objcgs)
 {
-	return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs | 0x1UL);
+	unsigned long memcg_data = (unsigned long)objcgs;
+
+	__set_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data);
+
+	return !cmpxchg(&page->memcg_data, 0, memcg_data);
 }
 
 static inline void clear_page_obj_cgroups(struct page *page)

From patchwork Thu Sep 10 20:26:59 2020
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11769141
From: Roman Gushchin
To: Andrew Morton
CC: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin
Subject: [PATCH rfc 5/5] mm: convert page kmemcg type to a page memcg flag
Date: Thu, 10 Sep 2020 13:26:59 -0700
Message-ID: <20200910202659.1378404-6-guro@fb.com>
In-Reply-To: <20200910202659.1378404-1-guro@fb.com>
References: <20200910202659.1378404-1-guro@fb.com>
PageKmemcg flag is currently defined as a page type (like buddy, offline, table and guard). Semantically it means that the page was accounted as kernel memory by the page allocator and has to be uncharged on release.

As a side effect of defining the flag as a page type, the accounted page can't be mapped to userspace (see page_has_type() and the comments above it). In particular, this blocks the accounting of vmalloc-backed memory used by some bpf maps, because these maps do map the memory to userspace.

One option is to fix it by complicating the access to page->mapcount, which provides some free bits for page->page_type. But it's way better to move this flag into page->memcg_data flags. Indeed, the flag makes no sense without memory cgroups enabled and, in particular, without a memory cgroup pointer set.

This commit replaces PageKmemcg() and __SetPageKmemcg() with PageMemcgKmem() and SetPageMemcgKmem(). __ClearPageKmemcg() can simply be deleted, because clear_page_mem_cgroup() already does the job. As a bonus, on a !CONFIG_MEMCG build the PageMemcgKmem() check will be compiled out.
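The diff below packs these flags into the low bits of page->memcg_data and strips them with a mask before the value is used as a pointer. As a rough standalone sketch of that encode/decode arithmetic — the constants mirror the patch, while encode()/decode() are invented here for illustration:

/*
 * Standalone sketch of the memcg_data layout this patch introduces:
 * flags live in the low bits, the pointer in the rest. Pointer
 * alignment is what leaves the low bits free. Not kernel code.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum page_memcg_flags {
	PG_MEMCG_OBJ_CGROUPS,	/* bit 0: memcg_data holds an objcgs vector */
	PG_MEMCG_KMEM,		/* bit 1: accounted as a non-slab kernel page */
	PG_MEMCG_LAST_FLAG,	/* first bit past the real flags */
};

#define MEMCG_FLAGS_MASK ((1UL << PG_MEMCG_LAST_FLAG) - 1)

/* Encode: word alignment of the object leaves the low bits free. */
static uintptr_t encode(void *memcg, int kmem)
{
	uintptr_t data = (uintptr_t)memcg;

	assert((data & MEMCG_FLAGS_MASK) == 0);
	if (kmem)
		data |= 1UL << PG_MEMCG_KMEM;

	return data;
}

/* Decode: strip the flags, as page_mem_cgroup() does with ~MEMCG_FLAGS_MASK. */
static void *decode(uintptr_t data)
{
	return (void *)(data & ~MEMCG_FLAGS_MASK);
}

int main(void)
{
	long dummy_memcg;	/* aligned object standing in for a memcg */
	uintptr_t data = encode(&dummy_memcg, 1);

	printf("kmem flag set:  %lu\n",
	       (unsigned long)((data >> PG_MEMCG_KMEM) & 1));
	printf("pointer intact: %d\n", decode(data) == (void *)&dummy_memcg);

	return 0;
}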
Signed-off-by: Roman Gushchin
---
 include/linux/memcontrol.h | 35 ++++++++++++++++++++++++++++++++---
 include/linux/page-flags.h | 11 ++---------
 mm/memcontrol.c            | 14 ++++----------
 mm/page_alloc.c            |  2 +-
 4 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7ab5f92bb686..430d1ca925c9 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -343,15 +343,22 @@ extern struct mem_cgroup *root_mem_cgroup;
 enum page_memcg_flags {
 	/* page->memcg_data is a pointer to a objcgs vector */
 	PG_MEMCG_OBJ_CGROUPS,
+	/* page has been accounted as a non-slab kernel page */
+	PG_MEMCG_KMEM,
+	/* the next bit after the last actual flag */
+	PG_MEMCG_LAST_FLAG,
 };
 
+#define MEMCG_FLAGS_MASK ((1UL << PG_MEMCG_LAST_FLAG) - 1)
+
 static inline struct mem_cgroup *page_mem_cgroup(struct page *page)
 {
 	unsigned long memcg_data = page->memcg_data;
 
 	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data), page);
 
-	return (struct mem_cgroup *)memcg_data;
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_FLAGS_MASK);
 }
 
 static inline struct mem_cgroup *page_mem_cgroup_check(struct page *page)
@@ -361,7 +368,7 @@ static inline struct mem_cgroup *page_mem_cgroup_check(struct page *page)
 	if (test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data))
 		return NULL;
 
-	return (struct mem_cgroup *)memcg_data;
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_FLAGS_MASK);
 }
 
 static inline void set_page_mem_cgroup(struct page *page,
@@ -377,6 +384,16 @@ static inline void clear_page_mem_cgroup(struct page *page)
 	page->memcg_data = 0;
 }
 
+static inline bool PageMemcgKmem(struct page *page)
+{
+	return test_bit(PG_MEMCG_KMEM, &page->memcg_data);
+}
+
+static inline void SetPageMemcgKmem(struct page *page)
+{
+	__set_bit(PG_MEMCG_KMEM, &page->memcg_data);
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
 {
@@ -385,6 +402,7 @@ static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
 	VM_BUG_ON_PAGE(memcg_data && !test_bit(PG_MEMCG_OBJ_CGROUPS,
 					       &memcg_data), page);
 	__clear_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data);
+	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_KMEM, &memcg_data), page);
 
 	return (struct obj_cgroup **)memcg_data;
 }
@@ -393,8 +411,10 @@ static inline struct obj_cgroup **page_obj_cgroups_check(struct page *page)
 {
 	unsigned long memcg_data = page->memcg_data;
 
-	if (memcg_data && test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data))
+	if (memcg_data && test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data)) {
+		VM_BUG_ON_PAGE(test_bit(PG_MEMCG_KMEM, &memcg_data), page);
 		return (struct obj_cgroup **)memcg_data;
+	}
 
 	return NULL;
 }
@@ -1052,6 +1072,15 @@ static inline void clear_page_mem_cgroup(struct page *page)
 {
 }
 
+static inline bool PageMemcgKmem(struct page *page)
+{
+	return false;
+}
+
+static inline void SetPageMemcgKmem(struct page *page)
+{
+}
+
 static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 {
 	return true;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index fbbb841a9346..a7ca01ae78d9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -712,9 +712,8 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PAGE_MAPCOUNT_RESERVE	-128
 #define PG_buddy	0x00000080
 #define PG_offline	0x00000100
-#define PG_kmemcg	0x00000200
-#define PG_table	0x00000400
-#define PG_guard	0x00000800
+#define PG_table	0x00000200
+#define PG_guard	0x00000400
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -765,12 +764,6 @@ PAGE_TYPE_OPS(Buddy, buddy)
  */
PAGE_TYPE_OPS(Offline, offline)
 
-/*
- * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
- * pages allocated with __GFP_ACCOUNT. It gets cleared on page free.
- */
-PAGE_TYPE_OPS(Kmemcg, kmemcg)
-
 /*
  * Marks pages in use as page tables.
  */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba9b053b1b88..d4c21870dab9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3070,7 +3070,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
 	if (!ret) {
 		set_page_mem_cgroup(page, memcg);
-		__SetPageKmemcg(page);
+		SetPageMemcgKmem(page);
 		return 0;
 	}
 	css_put(&memcg->css);
@@ -3095,10 +3095,6 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 	__memcg_kmem_uncharge(memcg, nr_pages);
 	clear_page_mem_cgroup(page);
 	css_put(&memcg->css);
-
-	/* slab pages do not have PageKmemcg flag set */
-	if (PageKmemcg(page))
-		__ClearPageKmemcg(page);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6830,12 +6826,10 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	nr_pages = compound_nr(page);
 	ug->nr_pages += nr_pages;
 
-	if (!PageKmemcg(page)) {
-		ug->pgpgout++;
-	} else {
+	if (PageMemcgKmem(page))
 		ug->nr_kmem += nr_pages;
-		__ClearPageKmemcg(page);
-	}
+	else
+		ug->pgpgout++;
 
 	ug->dummy_page = page;
 	clear_page_mem_cgroup(page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a707671f3b6c..3a61868113ec 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1197,7 +1197,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	}
 	if (PageMappingFlags(page))
 		page->mapping = NULL;
-	if (memcg_kmem_enabled() && PageKmemcg(page))
+	if (memcg_kmem_enabled() && PageMemcgKmem(page))
 		__memcg_kmem_uncharge_page(page, order);
 	if (check_free)
 		bad += check_free_page(page);
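A brief aside on the "compiled out" claim in the commit message: with the !CONFIG_MEMCG stub returning a constant false, the branch in a caller like the free_pages_prepare() hunk above is dead code that any optimizing compiler removes. A toy illustration — hypothetical standalone code shaped like the kernel helpers, not the kernel's actual build:

/*
 * Toy illustration of the "compiled out" claim: when PageMemcgKmem()
 * is the constant-false !CONFIG_MEMCG stub, the branch below is dead
 * code and the optimizer drops it. Not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

struct page { unsigned long memcg_data; };

#ifdef CONFIG_MEMCG
static inline bool PageMemcgKmem(struct page *page)
{
	return page->memcg_data & 0x2UL;	/* the PG_MEMCG_KMEM bit */
}
#else
static inline bool PageMemcgKmem(struct page *page)
{
	(void)page;
	return false;	/* lets the caller's branch fold away */
}
#endif

/* Shaped like the free_pages_prepare() hunk above. */
static void free_one_page(struct page *page)
{
	if (PageMemcgKmem(page))	/* eliminated when !CONFIG_MEMCG */
		puts("uncharging kmem page");
	puts("freeing page");
}

int main(void)
{
	struct page page = { .memcg_data = 0x2UL };

	/* build with: cc -O2 [-DCONFIG_MEMCG] sketch.c */
	free_one_page(&page);
	return 0;
}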