From patchwork Tue Jun 23 01:58:36 2020
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11619569
From: Roman Gushchin
To: Andrew Morton, Christoph Lameter
CC: Johannes Weiner, Michal Hocko, Shakeel Butt, Vlastimil Babka,
 Roman Gushchin
Subject: [PATCH v7 09/19] mm: memcg/slab: charge individual slab objects
 instead of pages
Date: Mon, 22 Jun 2020 18:58:36 -0700
Message-ID: <20200623015846.1141975-10-guro@fb.com>
In-Reply-To: <20200623015846.1141975-1-guro@fb.com>
References: <20200623015846.1141975-1-guro@fb.com>

Switch to per-object accounting of non-root slab objects.

Charging is performed using the obj_cgroup API in the pre_alloc hook.
The obj_cgroup is charged with the size of the object plus the size of
its metadata, which for now is the size of an obj_cgroup pointer.
If the amount of memory has been charged successfully, the actual
allocation code is executed. Otherwise, -ENOMEM is returned.

In the post_alloc hook, if the actual allocation succeeded, the
corresponding vmstats are bumped and the obj_cgroup pointer is saved.
Otherwise, the charge is canceled.

On the free path, the obj_cgroup pointer is obtained and used to
uncharge the size of the object being released.

Memcg and lruvec counters now represent only the memory used by active
slab objects and do not include the free space. The free space is
shared and doesn't belong to any specific cgroup.

Global per-node slab vmstats are still modified from the
(un)charge_slab_page() functions. The idea is to keep all slab pages
accounted as slab pages at the system level.

Signed-off-by: Roman Gushchin
Reviewed-by: Vlastimil Babka
Reviewed-by: Shakeel Butt
---
 mm/slab.h | 173 ++++++++++++++++++++++++------------------------------
 1 file changed, 77 insertions(+), 96 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index c37a50f26e41..09d2c659cb68 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -381,72 +381,6 @@ static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
 	return NULL;
 }
 
-/*
- * Charge the slab page belonging to the non-root kmem_cache.
- * Can be called for non-root kmem_caches only.
- */
-static __always_inline int memcg_charge_slab(struct page *page,
-					     gfp_t gfp, int order,
-					     struct kmem_cache *s)
-{
-	int nr_pages = 1 << order;
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-	int ret;
-
-	rcu_read_lock();
-	memcg = READ_ONCE(s->memcg_params.memcg);
-	while (memcg && !css_tryget_online(&memcg->css))
-		memcg = parent_mem_cgroup(memcg);
-	rcu_read_unlock();
-
-	if (unlikely(!memcg || mem_cgroup_is_root(memcg))) {
-		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-				    nr_pages << PAGE_SHIFT);
-		percpu_ref_get_many(&s->memcg_params.refcnt, nr_pages);
-		return 0;
-	}
-
-	ret = memcg_kmem_charge(memcg, gfp, nr_pages);
-	if (ret)
-		goto out;
-
-	lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
-	mod_lruvec_state(lruvec, cache_vmstat_idx(s), nr_pages << PAGE_SHIFT);
-
-	percpu_ref_get_many(&s->memcg_params.refcnt, nr_pages);
-out:
-	css_put(&memcg->css);
-	return ret;
-}
-
-/*
- * Uncharge a slab page belonging to a non-root kmem_cache.
- * Can be called for non-root kmem_caches only.
- */
-static __always_inline void memcg_uncharge_slab(struct page *page, int order,
-						struct kmem_cache *s)
-{
-	int nr_pages = 1 << order;
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	rcu_read_lock();
-	memcg = READ_ONCE(s->memcg_params.memcg);
-	if (likely(!mem_cgroup_is_root(memcg))) {
-		lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
-		mod_lruvec_state(lruvec, cache_vmstat_idx(s),
-				 -(nr_pages << PAGE_SHIFT));
-		memcg_kmem_uncharge(memcg, nr_pages);
-	} else {
-		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-				    -(nr_pages << PAGE_SHIFT));
-	}
-	rcu_read_unlock();
-
-	percpu_ref_put_many(&s->memcg_params.refcnt, nr_pages);
-}
-
 static inline int memcg_alloc_page_obj_cgroups(struct page *page,
 					       struct kmem_cache *s, gfp_t gfp)
 {
@@ -469,6 +403,47 @@ static inline void memcg_free_page_obj_cgroups(struct page *page)
 	page->obj_cgroups = NULL;
 }
 
+static inline size_t obj_full_size(struct kmem_cache *s)
+{
+	/*
+	 * For each accounted object there is an extra space which is used
+	 * to store obj_cgroup membership. Charge it too.
+	 */
+	return s->size + sizeof(struct obj_cgroup *);
+}
+
+static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+						struct obj_cgroup **objcgp,
+						size_t objects, gfp_t flags)
+{
+	struct kmem_cache *cachep;
+
+	cachep = memcg_kmem_get_cache(s, objcgp);
+	if (is_root_cache(cachep))
+		return s;
+
+	if (obj_cgroup_charge(*objcgp, flags, objects * obj_full_size(s))) {
+		memcg_kmem_put_cache(cachep);
+		cachep = NULL;
+	}
+
+	return cachep;
+}
+
+static inline void mod_objcg_state(struct obj_cgroup *objcg,
+				   struct pglist_data *pgdat,
+				   int idx, int nr)
+{
+	struct mem_cgroup *memcg;
+	struct lruvec *lruvec;
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	mod_memcg_lruvec_state(lruvec, idx, nr);
+	rcu_read_unlock();
+}
+
 static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 					      struct obj_cgroup *objcg,
 					      size_t size, void **p)
@@ -483,6 +458,10 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 			off = obj_to_index(s, page, p[i]);
 			obj_cgroup_get(objcg);
 			page_obj_cgroups(page)[off] = objcg;
+			mod_objcg_state(objcg, page_pgdat(page),
+					cache_vmstat_idx(s), obj_full_size(s));
+		} else {
+			obj_cgroup_uncharge(objcg, obj_full_size(s));
 		}
 	}
 	obj_cgroup_put(objcg);
@@ -501,6 +480,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
 	off = obj_to_index(s, page, p);
 	objcg = page_obj_cgroups(page)[off];
 	page_obj_cgroups(page)[off] = NULL;
+
+	obj_cgroup_uncharge(objcg, obj_full_size(s));
+	mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
+			-obj_full_size(s));
+
 	obj_cgroup_put(objcg);
 }
 
@@ -542,17 +526,6 @@ static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
 	return NULL;
 }
 
-static inline int memcg_charge_slab(struct page *page, gfp_t gfp, int order,
-				    struct kmem_cache *s)
-{
-	return 0;
-}
-
-static inline void memcg_uncharge_slab(struct page *page, int order,
-				       struct kmem_cache *s)
-{
-}
-
 static inline int memcg_alloc_page_obj_cgroups(struct page *page,
 					       struct kmem_cache *s, gfp_t gfp)
 {
@@ -563,6 +536,13 @@ static inline void memcg_free_page_obj_cgroups(struct page *page)
 {
 }
 
+static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+						struct obj_cgroup **objcgp,
+						size_t objects, gfp_t flags)
+{
+	return NULL;
+}
+
 static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 					      struct obj_cgroup *objcg,
 					      size_t size, void **p)
@@ -600,32 +580,33 @@ static __always_inline int charge_slab_page(struct page *page,
 					    gfp_t gfp, int order,
 					    struct kmem_cache *s)
 {
-	int ret;
-
-	if (is_root_cache(s)) {
-		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-				    PAGE_SIZE << order);
-		return 0;
-	}
+#ifdef CONFIG_MEMCG_KMEM
+	if (memcg_kmem_enabled() && !is_root_cache(s)) {
+		int ret;
 
-	ret = memcg_alloc_page_obj_cgroups(page, s, gfp);
-	if (ret)
-		return ret;
+		ret = memcg_alloc_page_obj_cgroups(page, s, gfp);
+		if (ret)
+			return ret;
 
-	return memcg_charge_slab(page, gfp, order, s);
+		percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);
+	}
+#endif
+	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
+			    PAGE_SIZE << order);
+	return 0;
 }
 
 static __always_inline void uncharge_slab_page(struct page *page, int order,
 					       struct kmem_cache *s)
 {
-	if (is_root_cache(s)) {
-		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-				    -(PAGE_SIZE << order));
-		return;
+#ifdef CONFIG_MEMCG_KMEM
+	if (memcg_kmem_enabled() && !is_root_cache(s)) {
+		memcg_free_page_obj_cgroups(page);
+		percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
 	}
-
-	memcg_free_page_obj_cgroups(page);
-	memcg_uncharge_slab(page, order, s);
+#endif
+	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
+			    -(PAGE_SIZE << order));
 }
 
 static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
@@ -691,7 +672,7 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 
 	if (memcg_kmem_enabled() &&
 	    ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT)))
-		return memcg_kmem_get_cache(s, objcgp);
+		return memcg_slab_pre_alloc_hook(s, objcgp, size, flags);
 
 	return s;
 }
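
Illustration only, not part of the patch: the charge lifecycle described
in the commit message (charge in pre_alloc, cancel in post_alloc for
objects that failed to allocate, uncharge on free) can be modelled in
userspace roughly as below. This is a minimal sketch under stated
assumptions. Every toy_* name is hypothetical, the byte counter with a
fixed limit merely stands in for the obj_cgroup charge, and the vmstat
updates, RCU and reference counting done by the real hooks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup's byte-granular charge counter. */
struct toy_objcg {
	size_t charged;	/* bytes currently charged */
	size_t limit;	/* maximum bytes that may be charged */
};

/* Mirrors the idea of obj_full_size(): object size plus one pointer. */
static size_t toy_obj_full_size(size_t object_size)
{
	return object_size + sizeof(struct toy_objcg *);
}

/*
 * Pre-alloc step: charge all requested objects up front.
 * Returns 0 on success, -1 if the charge would exceed the limit.
 */
static int toy_pre_alloc(struct toy_objcg *cg, size_t object_size,
			 size_t objects)
{
	size_t bytes = objects * toy_obj_full_size(object_size);

	if (bytes > cg->limit - cg->charged)
		return -1;
	cg->charged += bytes;
	return 0;
}

/* Post-alloc step: cancel the charge for every slot that is NULL. */
static void toy_post_alloc(struct toy_objcg *cg, size_t object_size,
			   void **p, size_t objects)
{
	size_t i;

	for (i = 0; i < objects; i++)
		if (!p[i])
			cg->charged -= toy_obj_full_size(object_size);
}

/* Free step: uncharge exactly what was charged for one object. */
static void toy_free(struct toy_objcg *cg, size_t object_size)
{
	assert(cg->charged >= toy_obj_full_size(object_size));
	cg->charged -= toy_obj_full_size(object_size);
}
```

In this model, charging two 64-byte objects, failing one of them in the
allocator and later freeing the other brings the counter back to zero,
which is the property the per-object scheme relies on: each object is
charged and uncharged with the same obj_full_size() amount.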